The impact of RNA structure on coding sequence evolution in both bacteria and eukaryotes
© Gu et al.; licensee BioMed Central Ltd. 2014
Received: 30 January 2014
Accepted: 7 April 2014
Published: 23 April 2014
Many studies have found functional RNA secondary structures are selectively conserved among species. But, the effect of RNA structure selection on coding sequence evolution remains unknown. To address this problem, we systematically investigated the relationship between nucleotide conservation level and its structural sensitivity in four model organisms, Escherichia coli, yeast, fly, and mouse.
We define structurally sensitive sites as those with putative local structure-disruptive mutations. Using both the Mantel-Haenszel procedure and association test, we found structurally sensitive nucleotide sites evolved more slowly than non-sensitive sites in all four organisms. Furthermore, we observed that this association is more obvious in highly expressed genes and region near the start codon.
We conclude that structurally sensitive sites in mRNA sequences normally have less nucleotide divergence in all species we analyzed. This study extends our understanding of the impact of RNA structure on coding sequence evolution, and is helpful to the development of a codon model with RNA structure information.
Messenger RNA (mRNA) encodes functional information with linear nucleotide sequences for amino acids in a cell. In addition to mRNA primary linear structure, base pairing of local nucleotides in mRNAs creates specific secondary structures, such as stems and loops. It has been reported that mRNA structures encode several regulatory information in different biological processes , including DNA transcription , pre-mRNA splicing , microRNA (miRNA) mediated gene regulation [4, 5], gene translation [6–8], and cellular localization [9, 10]. Experimental profiling of mRNA structure at genome scale both in vitro[6, 11–13] and in vivo has confirmed regulatory roles of mRNA secondary structure in various organisms. Given the essentiality of RNA structure in regulating gene expression, it is important to perform mRNA structure analysis from the perspective of evolution.
Current evolutionary studies on RNA structures can be largely classified into two groups. The first group focuses on the conservation level of RNA structures in a genome. The basic method is to compare RNA structures within species in a phylogenetic tree. A set of functional RNAs (fRNAs) with conserved structures have been identified in human [15–19], Drosophila, and yeast . Although different algorithms were applied among these studies, the consensus conclusion of these studies is that RNA structures experienced widespread purifying selection in organisms. Unlike the studies we mentioned above, the main issue addressed in the second group of studies is the effect of RNA secondary structure change caused by a single-point mutation. Some SNPs in mRNA coding [22, 23] and non-coding [24, 25] regions can cause aberrant gene expression by affecting mRNA secondary structures. Also, a point mutation in or close to miRNA target sites would disrupt normal gene regulation by affecting local mRNA accessibility [5, 26, 27]. A more recent study investigated accumulated mutations in Escherichia coli (E. coli) genes over 40,000 generations of evolution, and found mutations that may disrupt mRNA secondary structure are selectively filtered out in the course of evolution .
As RNA secondary structure is conserved among species and the fitness of structurally disruptive mutations is low, it is reasonable to hypothesize that selection on RNA secondary structure should lead to less nucleotide sequence divergence in the genome. However, little is known on this topic so far. The only study, to our knowledge, was performed by Warden et al.. They predicted fRNAs in coding region of yeast genes and found significant effects of RNA secondary structure on protein evolutionary rates . Notably, RNA structures are selectively conserved in protein coding regions in many organisms, such as Drosophila, yeast  and human [16, 17, 31]. In a recent study, Smith et al. proposed that the relative enrichment of conserved RNA structure was the highest in protein coding region than that in any other genomic region. To understand the effect of RNA structure conservation on coding sequence evolution, we systematically investigated the relationship between nucleotide conservation level and mRNA secondary structure in four model organisms, including one prokaryote and three eukaryotes, E. coli, Saccharomyces cerevisiae (S. cerevisiae), Drosophila melanogaster (D. melanogaster), and Mus musculus (M. musculus). We define structurally sensitive sites in mRNA as those with putative local structure-disruptive mutations. We first assess whether structurally sensitive nucleotide sites are more conserved than non-sensitive sites. Next, we compare the above effect between genes with different expression level and codon usage bias, respectively. To further elucidate sequence constraint in different local regions along mRNA, we analyze the effect in translation initiation and elongation regions. Our analyses present a deep view of structure-associated nucleotide divergence in coding region. This study extends our understanding on the evolutionary process of coding sequences and helps develop a better model for coding sequence evolution.
Structurally sensitive sites in mRNA are more evolutionarily conserved
We first assessed whether the mRNA nucleotide sites with putative structurally disruptive mutations are more evolutionarily conserved. According to the table of genetic code, most substitutions at the first codon position and all the substitutions at the second codon position are nonsynonymous. The conservation level of nucleotide at the first and second codon position is inevitably governed by strong purifying selection against amino acid replacement. To avoid the confounding factors caused by the selection on nonsynonymous sites, only the 4-fold degenerate sites in coding sequences were investigated in this study, which means we focused on the synonymous sites without any nonsynonymous mutational opportunity.
We evaluated the conservation level of each 4-fold degenerate site by weighted entropy (E w ), which was calculated by multiple sequence alignment of widely diverged orthologs (see Methods for details). Lower E w means higher conservation level and E w = 0 means the no variation in nucleotide type in the alignment column. Here, we considered sites with E w < 0.5 as conserved. For comparison, we also computed phyloP score  for each nucleotide site, which is a conservation score based on a model of neutral evolution (see Methods for details). Higher phyloP score means higher conservation level. We found a very strong negative correlation between E w and phyloP score, with the mean of Pearson correlation coefficient < −0.85 across all the genes in each species (Additional file 1: Figure S1).
We used RNAsnp  program to assess the effect of single-point mutation on local mRNA secondary structure. RNAsnp helps screen the putative structure-disruptive mutations in RNA sequences by estimating the structural changes of all three possible substitutions at each nucleotide site. The structural distance (d max ) between wild-type and mutant sequences was calculated from base pairing probability matrices . We measured the structural sensitivity for a 4-fold degenerate site by the mean d max for all 3 possible mutations at this site, assessing the likelihood that a mutation at this site is structurally disruptive. We considered a nucleotide site as structurally sensitive if the structural sensitivity was larger than 0.1.
To investigate the reason why E. coli shows the most significant signal while yeast shows the least significant signal, we counted the number of structurally sensitive sites for each gene within each species. E. coli shows the highest fraction of sensitive sites, while the proportion in yeast was significantly lower than that in the other species (P < 10−8 by t-test; see also Additional file 3: Figure S2). Since the power of Mantel-Haenszel test is affected by the number of sensitive sites, the lowest number of sensitive sites in yeast may partly explain the least significant signal in this species.
To determine whether the conservation level at structurally sensitive sites was affected by expression level, we calculated the OR MH separately for the genes with the highest 50% and the lowest 50% expression level. In all the species except fly, the OR MH for the highest-expressed genes tended to be higher than that for the genes with the lowest expression level (Figure 1B). The corresponding P-values were also more significant in highly expressed genes in all species except fly (Figure 1B).
We also tested whether gene codon bias could affect the conservation level at structurally sensitive sites. Effective number of codons (ENC) is usually used to measure gene codon usage bias . Here, we used an improved version of ENC, ENC’, which takes background nucleotide composition into account [37, 38]. Lower ENC’ values indicate stronger codon bias. By comparing the bottom 50% of genes with the lowest ENC’ to the top 50% of genes with the highest ENC’, we found that, in all the species with the exception of fly, the P-values for the genes with stronger codon bias tended to be more significant than those for the genes with the lowest codon bias (Figure 1C).
In addition, we repeated the above analyses using phyloP score as the measure of nucleotide conservation level (Additional file 4: Figure S3). We considered sites with phyloP score > 0 as conserved. Side-by-side comparison between Figure 1 and Additional file 4: Figure S3 indicated that the results generated based on phyloP score mirrored what we found when using weighed entropy as the measure of conservation level.
Stronger association between conservation level and structural sensitivity at translation initiation region
To investigate whether window size affected our results, we redid our analysis for the four species using sliding windows of 45 nucleotides. Results for the alternate window size were compatible to those obtained with a window size of 36 nucleotides (Additional file 5: Figure S4).
To address why there is an exceptional pattern in yeast, we checked the composition of structurally sensitive sites for each window. Similar to the results mentioned in the previous subsection, the fraction of sensitive sites of the 5′ sliding windows was lower in yeast compared with the other species (Additional file 6: Figure S5). The lowest number of sensitive sites in yeast may interpret the least significant signal in this species. Also, we observed a trend that there are more structurally sensitive sites in the 5′ end windows, comparing with the downstream windows from the same species (Additional file 6: Figure S5).
Weighted entropy correlates negatively with structural sensitivity
All above analyses were based on categorized data, such as a classification of all nucleotide sites into conserved/non-conserved or structurally sensitive/non-sensitive. Weighted entropy and structural sensitivity are continuous quantities. Lower weighted entropy denotes higher conservation level while higher structural sensitivity indicates more severe structural constraints. Therefore, if forcing both variables into dichotomous categories, we may lose statistical power.
To make use of the continuous values of structural sensitivity and weighted entropy for each nucleotide site, we calculated the Pearson correlation coefficient between structural sensitivity and weighted entropy of the 4-fold degenerate sites in each gene. As test statistic, we used the mean of all these correlation coefficients. We calculated the sampling distribution of this statistic by randomly permuting weighted entropy of 4-fold degenerate sites with identical nucleotide within each gene. Since we expected weighted entropy to decrease with structural sensitivity, we calculated one-tailed P-values for the left tail of the sampling distribution of the mean correlation coefficient. Our alternative hypothesis was that the mean correlation coefficient should be more negative than expected by chance if structural sensitive sites are more evolutionarily conserved.
We examined the relationship between the conservation level of 4-fold degenerate sites and the corresponding structural sensitivity in the mRNAs in four model organisms. Using both categorized and continuous analyses, we found that the conservation level is increased for the nucleotide sites with putative structurally disruptive single-point mutations. In E. coli, yeast, and mouse, the association is stronger in highly expressed genes than in genes with low expression level. Also, we found that the association is stronger at 5′ translation initiation region comparing with downstream elongation sequence. All these findings suggest that structurally important sites tend to experience stronger purifying selection at the nucleotide level from prokaryotes to eukaryotes.
In Mantel-Haenszel procedure, we used a cutoff to categorize the 4-fold codon sites into two groups: structurally sensitive vs. non-sensitive. It was suggested by the authors of RNAsnp that roughly 8-9% single-point mutations in RNA are structurally disruptive . Because there are three possible mutations at a nucleotide site, we can reasonably expect that, on average, there are less than ~25% nucleotide sites in mRNA with potentially disruptive mutations. In our study, the proportion of structurally sensitive site in most mRNAs was lower than 25% (Additional file 3: Figure S2), which suggests that we chose a reasonable cutoff for structural sensitivity. In addition, less stringent cutoffs for structural sensitivity only slightly changed the results (Additional file 2: Table S1), which suggest that our results were independent of the cutoff choice.
Protein functional and structural constraints govern the evolution at nonsynonymous sites in coding sequences. Therefore, we didn’t take the first and second codon positions into account. We only focused on the 4-fold degenerate sites. However, it is important to note that 4-fold degenerate sites are not essentially free of selective constraints. Several mechanisms have been reported, which cause selective pressure on synonymous sites, such as selection for accurate and fast translation [42–47], selection for RNA global stability , selection for miRNA binding , selection for splicing efficiency [49, 50], and selection for protein co-translational folding [51–53]. All these factors may weaken or bias the association between local structural sensitivity and site conservation level. In the resampling test, we kept the amino acid sequence, codon usage bias, and nucleotide composition for each gene, which helps avoid some of the confounding selective factors acting on synonymous sites.
There is a general observation that evolutionary constraints appear to increase with gene expression level [45, 54–56]. Our results also indicate that highly expressed genes seem to exhibit a stronger association between conservation level and structural importance than genes with low expression level. The exception of fly may be due to the fact that the fly dataset with available expression information was extremely shrunk. Given the fact that codon bias somewhat reflects gene expression level [57–59], the effect of ENC’ largely mirrors our findings on gene expression.
We found a stronger association between nucleotide conservation level and structural sensitivity at 5′ translation initiation region comparing with downstream elongation region, which is likely due to the enhanced importance of mRNA secondary structure for translation initiation. Several recent studies have demonstrated various structure-related regulatory mechanisms in mRNA translational process [7, 31, 60–64]. Especially, a universal selection on reduced RNA stability at translation initiation region has been reported from prokaryotes to eukaryotes by in silico studies . Experimental studies also confirmed the key role of RNA structure near the start codon for translation initiation [61, 62, 65]. Some regulatory structures are also observed near the translation initiation region, such as internal ribosomal entry sites (IRES) in some eukaryotic genes  and PKR activating structure in inflammation-related genes . Unlike translation initiation, codon usage and corresponding tRNA abundance, rather than RNA secondary structure, are the more important factors that regulates translation elongation and the final output of gene expression . Therefore, it’s not surprising that the association of RNA structural sensitivity with nucleotide conservation is stronger at translation initiation region.
Our results suggest a universal trend of increased nucleotide conservation at structurally sensitive nucleotide sites. But, the statistical significance is weakest in yeast, which may be due to the lower fraction of sensitive sites in yeast. Both global and sliding window analyses indicate that the proportion of structurally sensitive sites was significantly reduced in yeast compared with the other species (Additional file 3: Figure S2 and Additional file 6: Figure S5). The relatively low number of sensitive sites could decrease the statistical power of our analysis, which may partly explain why the signal in yeast is kind of blurred.
Although this study is not the first to provide the evidence that protein-coding sequences are under evolutionary selection in keeping functional RNA secondary structure, we found a relatively strong and pervasive signal that structurally important sites tend to be more evolutionarily conserved from prokaryotes to eukaryotes, which is stronger for highly expressed genes and for translation initiation region.
Our results highlight the importance of local RNA secondary structure in coding sequence evolution, and suggest that mRNA sequences are experiencing purifying selection in keeping functional RNA secondary structures. The inclusion of local RNA secondary structure information in a codon model should be beneficial for the detection of purifying/positive selection in coding sequences.
We obtained genomic sequences from the following sources: the Comprehensive Microbial Resource (http://cmr.tigr.org/) for E. coli, the Saccharomyces Genome Database (ftp://genome-ftp.stanford.edu/) for S. cerevisiae, the Eisen Lab (http://rana.lbl.gov/drosophila/) for D. melanogaster, and Ensembl (http://www.ensembl.org/) for M. musculus.
For E. coli, we obtained orthologs in Shigella sonnei, Shigella flexneri, Shigella boydii, Shigella dysenteriae, Klebsiella pneumoniae, Salmonella typhimurium, Salmonella enterica, Photorhabdus luminescens, and Sodalis glossinidius from TIGR’s Comprehensive Microbial Resource (http://cmr.tigr.org/). For S. cerevisiae, we obtained orthologs in Saccharomyces paradoxus, Saccharomyces mikatae, Saccharomyces bayanus, Saccharomyces kudriavzevii, Saccharomyces castellii, and Saccharomyces kluyveri from the Saccharomyces Genome Database (ftp://genome-ftp.stanford.edu/). For D. melanogaster, we obtained orthologs in Drosophila simulans, Drosophila sechellia, Drosophila yakuba, Drosophila erecta, Drosophila ananassae, Drosophila pseudoobscura, Drosophila persimilis, Drosophila willistoni, Drosophila mojavensis, Drosophila virilis, and Drosophila grimshawi from the Drosophila 12-genome project AAAWiki at http://rana.lbl.gov/drosophila/. For mouse, we obtained orthologs in human, chimp, macaque, rat, cow, dog, and horse from Biomart through the Ensembl Homology track (http://www.ensembl.org/). We built multiple alignments of orthologous sequences based on the peptide sequences with MUSCLE . We excluded from our data set those ortholog pairs for which less than 80% of either sequence could be aligned to the other sequence. We only saved the alignments in which each species has its corresponding ortholog. This step yielded 1,156, 1,164, 3,047, and 6,324 alignments in E. coli, yeast, fly, and mouse, respectively.
Nucleotide site conservation level
Here, N is the set of unique nucleotides in the column and p i is the weighted fraction of sequences carrying a particular nucleotide i. Lower E w means higher conservation level and E w = 0 means the no variation in nucleotide type in the column. We considered sites with E w < 0.5 as conserved.
We also applied phyloP program  to compute conservation score for each nucleotide site. The conservation P-values were computed using the likelihood ratio test (LRT) method with “--wig-scores” option. The phylogenetic model was produced by the phyloFit program  using “REV” nucleotide substitution model. The site specific conservation score was computed as “-log(P)”. Higher phyloP score means higher conservation level.
We used previously published expression data for each species: for E. coli, we obtained gene expression levels measured in mRNAs per cell from ; for S. cerevisiae, we used expression data from ; for D. melanogaster, we used as expression level the geometric mean of expression data from different tissues obtained by ; and for M. musculus, we measured expression level as the breadth of expression among different tissues .
Mutation-induced mRNA structural change
We used the RNAsnp to estimate local mRNA secondary structural changes induced by mutations. This program focuses on the local regions of maximal structural change between mutant and wild-type . We applied “Mode 3″ in RNAsnp with default settings to screen putative structure-disruptive mutations in mRNA sequences. The mutation effects were quantified by maximum structural distance (d max ). We measured the structural sensitivity for a nucleotide site by the mean d max for all 3 possible mutations at this site. We considered a nucleotide site as structurally sensitive if the structural sensitivity was larger than 0.1.
To avoid the confounding factors, such as amino acid composition and strong purifying selection on nonsynonymous sites, we only focused on the synonymous sites without any nonsynonymous mutational opportunity. This means only the 4-fold degenerate sites were taken into account in this study (Additional file 8: Table S2). The percentage of 4-fold degenerate sites among the third codon positions of each gene varies from roughly 20% to 70% (Additional file 9: Figure S7). In total, 200,786, 160,079, 657,900, and 1,598,517 4-fold degenerate sites were included for E. coli, yeast, fly, and mouse, respectively.
2 × 2 contingency table for one particular gene in E. coli
Counts for the i th 2 × 2 contingency table
The null hypothesis in this analysis assumes that the conservation status of 4-fold degenerate sites (e.g. conserved or non-conserved) is independent of the corresponding structural status (e.g. structurally sensitive or non-sensitive) in any given stratum. The Mantel-Haenszel procedure was conducted by “mantelhaen.test” function in R plotform with the options of continuity correction and “two.sided” alternative hypothesis.
The second method was to calculate the Pearson correlation coefficient between the two continuous variables (weighed entropy and structural sensitivity) for each gene. As test statistic, we used the mean of the correlation coefficients over all genes. We calculated the sampling distribution by randomly reshuffling, separately for each gene, weighted entropy among 4-fold degenerate sites with identical nucleotide and recalculating all correlation coefficients. We generated 1,000 resampled sequences for each gene. All the statistical analyses were conducted using the R platform (version 2.15.1).
Availability of supporting data
The data sets supporting the results of this article are available in TreeBASE, http://purl.org/phylo/treebase/phylows/study/TB2:S15642.
This work was supported by the National Basic Research Program of China (2012CB316501 to WG), the National High Technology Research and Development Program of China (2012AA020401 to WG), National Natural Science Foundation of China (61171143 to WG), and Tsinghua National Laboratory for Information Science and Technology (TNList) Cross-Discipline Foundation. This work was also supported by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD, Basic Research Promotion Fund) (KRF-2011-0016587 to JHK).
- Wan Y, Kertesz M, Spitale RC, Segal E, Chang HY: Understanding the transcriptome through RNA structure. Nat Rev Genet. 2011, 12 (9): 641-655. 10.1038/nrg3049.PubMedView ArticleGoogle Scholar
- Nagalakshmi U: The transcriptional landscape of the yeast genome defined by RNA sequencing. Science. 2008, 320: 1344-1349. 10.1126/science.1158441.PubMedPubMed CentralView ArticleGoogle Scholar
- Warf MB, Berglund JA: Role of RNA structure in regulating pre-mRNA splicing. Trends Biochem Sci. 2010, 35: 169-178. 10.1016/j.tibs.2009.10.004.PubMedPubMed CentralView ArticleGoogle Scholar
- Meijer HA, Kong YW, Lu WT, Wilczynska A, Spriggs RV, Robinson SW, Godfrey JD, Willis AE, Bushell M: Translational repression and eIF4A2 activity Are critical for MicroRNA-mediated gene regulation. Science. 2013, 340 (6128): 82-85. 10.1126/science.1231197.PubMedView ArticleGoogle Scholar
- Gu W, Wang X, Zhai C, Xie X, Zhou T: Selection on synonymous sites for increased accessibility around miRNA binding sites in plants. Mol Biol Evol. 2012, 29 (10): 3037-3044. 10.1093/molbev/mss109.PubMedView ArticleGoogle Scholar
- Li F, Zheng Q, Vandivier LE, Willmann MR, Chen Y, Gregory BD: Regulatory impact of RNA secondary structure across the Arabidopsis transcriptome. Plant Cell. 2012, 24 (11): 4346-4359. 10.1105/tpc.112.104232.PubMedPubMed CentralView ArticleGoogle Scholar
- Gu W, Zhou T, Wilke CO: A universal trend of reduced mRNA stability near the translation-initiation site in prokaryotes and eukaryotes. PLoS Comput Biol. 2010, 6 (2): e1000664-10.1371/journal.pcbi.1000664.PubMedPubMed CentralView ArticleGoogle Scholar
- Kozak M: Regulation of translation via mRNA structure in prokaryotes and eukaryotes. Gene. 2005, 361: 13-37.PubMedView ArticleGoogle Scholar
- Martin KC, Ephrussi A: mRNA localization: gene expression in the spatial dimension. Cell. 2009, 136: 719-730. 10.1016/j.cell.2009.01.044.PubMedPubMed CentralView ArticleGoogle Scholar
- Gonsalvez GB, Urbinati CR, Long RM: RNA localization in yeast: moving towards a mechanism. Biol Cell. 2005, 97: 75-86. 10.1042/BC20040066.PubMedView ArticleGoogle Scholar
- Kertesz M: Genome-wide measurement of RNA secondary structure in yeast. Nature. 2010, 467: 103-107. 10.1038/nature09322.PubMedView ArticleGoogle Scholar
- Wan Y, Qu K, Ouyang Z, Kertesz M, Li J, Tibshirani R, Makino DL, Nutter RC, Segal E, Chang HY: Genome-wide measurement of RNA folding energies. Mol Cell. 2012, 48 (2): 169-181. 10.1016/j.molcel.2012.08.008.PubMedPubMed CentralView ArticleGoogle Scholar
- Zheng Q, Ryvkin P, Li F, Dragomir I, Valladares O, Yang J, Cao K, Wang L-S, Gregory BD: Genome-wide double-stranded RNA sequencing reveals the functional significance of base-paired RNAs in Arabidopsis. PLoS Genet. 2010, 6 (9): e1001141-10.1371/journal.pgen.1001141.PubMedPubMed CentralView ArticleGoogle Scholar
- Ding Y, Tang Y, Kwok CK, Zhang Y, Bevilacqua PC, Assmann SM: In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features. Nature. 2014, 505: 696-700.PubMedView ArticleGoogle Scholar
- Pedersen JS: Identification and classification of conserved RNA secondary structures in the human genome. PLoS Comput Biol. 2006, 2: e33-10.1371/journal.pcbi.0020033.PubMedPubMed CentralView ArticleGoogle Scholar
- Smith MA, Gesell T, Stadler PF, Mattick JS: Widespread purifying selection on RNA structure in mammals. Nucleic Acids Res. 2013, 41 (17): 8220-8236. 10.1093/nar/gkt596.PubMedPubMed CentralView ArticleGoogle Scholar
- Washietl S, Pedersen JS, Korbel JO, Stocsits C, Gruber AR, Hackermüller J, Hertel J, Lindemeyer M, Reiche K, Tanzer A, Ucla C, Wyss C, Antonarakis SE, Denoeud F, Lagarde J, Drenkow J, Kapranov P, Gingeras TR, Guigó R, Snyder M, Gerstein MB, Reymond A, Hofacker IL, Stadler PF: Structured RNAs in the ENCODE selected regions of the human genome. Genome Res. 2007, 17 (6): 852-864. 10.1101/gr.5650707.PubMedPubMed CentralView ArticleGoogle Scholar
- Torarinsson E, Yao Z, Wiklund ED, Bramsen JB, Hansen C, Kjems J, Tommerup N, Ruzzo WL, Gorodkin J: Comparative genomics beyond sequence-based alignments: RNA structures in the ENCODE regions. Genome Res. 2008, 18 (2): 242-251. 10.1101/gr.6887408.PubMedPubMed CentralView ArticleGoogle Scholar
- Washietl S, Hofacker IL, Lukasser M, Huttenhofer A, Stadler PF: Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome. Nat Biotechnol. 2005, 23 (11): 1383-1390. 10.1038/nbt1144.PubMedView ArticleGoogle Scholar
- Stark A, Lin MF, Kheradpour P, Pedersen JS, Parts L, Carlson JW, Crosby MA, Rasmussen MD, Roy S, Deoras AN, Ruby JG, Brennecke J, Hodges E, Hinrichs AS, Caspi A, Paten B, Park S-W, Han MV, Maeder ML, Polansky BJ, Robson BE, Aerts S, van Helden J, Hassan B, Gilbert DG, Eastman DA, Rice M, Weir M, Hahn MW, Park Y: Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures. Nature. 2007, 450 (7167): 219-232. 10.1038/nature06340.PubMedPubMed CentralView ArticleGoogle Scholar
- Steigele S, Huber W, Stocsits C, Stadler P, Nieselt K: Comparative analysis of structured RNAs in S. cerevisiae indicates a multitude of different functions. BMC Biol. 2007, 5 (1): 25-10.1186/1741-7007-5-25.PubMedPubMed CentralView ArticleGoogle Scholar
- Nackley AG, Shabalina SA, Tchivileva IE, Satterfield K, Korchynskyi O, Makarov SS, Maixner W, Diatchenko L: Human catechol-O-methyltransferase haplotypes modulate protein expression by altering mRNA secondary structure. Science. 2006, 314 (5807): 1930-1933. 10.1126/science.1131262.PubMedView ArticleGoogle Scholar
- Bartoszewski RA, Jablonsky M, Bartoszewska S, Stevenson L, Dai Q, Kappes J, Collawn JF, Bebok Z: A synonymous single nucleotide polymorphism in ΔF508 CFTR alters the secondary structure of the mRNA and the expression of the mutant protein. J Biol Chem. 2010, 285 (37): 28741-28748. 10.1074/jbc.M110.154575.PubMedPubMed CentralView ArticleGoogle Scholar
- Chen J-M, Férec C, Cooper D: A systematic analysis of disease-associated variants in the 3′ regulatory regions of human protein-coding genes II: the importance of mRNA secondary structure in assessing the functionality of 3′ UTR variants. Hum Genet. 2006, 120 (3): 301-333. 10.1007/s00439-006-0218-x.PubMedView ArticleGoogle Scholar
- Naslavsky MS, Crovella S, Filho JLL, Rocha CRC: The sound of silence: human β-defensin-1 gene untranslated SNPs change the predicted mRNA secondary structure in a length-dependent manner. Immunol Lett. 2010, 129 (1): 53-55. 10.1016/j.imlet.2009.12.024.PubMedView ArticleGoogle Scholar
- Thomas LF, Saito T, Sætrom P: Inferring causative variants in microRNA target sites. Nucleic Acids Res. 2011, 39 (16): e109-10.1093/nar/gkr414.PubMedPubMed CentralView ArticleGoogle Scholar
- Hurst LD: Molecular genetics: the sound of silence. Nature. 2011, 471 (7340): 582-583. 10.1038/471582a.PubMedView ArticleGoogle Scholar
- Chursov A, Frishman D, Shneider A: Conservation of mRNA secondary structures may filter out mutations in Escherichia coli evolution. Nucleic Acids Res. 2013, 41 (16): 7854-7860. 10.1093/nar/gkt507.PubMedPubMed CentralView ArticleGoogle Scholar
- Warden CD, Kim S-H, Yi SV: Predicted functional RNAs within coding regions constrain evolutionary rates of yeast proteins. PLoS One. 2008, 3 (2): e1559-10.1371/journal.pone.0001559.PubMedPubMed CentralView ArticleGoogle Scholar
- Findeiß S, Engelhardt J, Prohaska SJ, Stadler PF: Protein-coding structured RNAs: A computational survey of conserved RNA secondary structures overlapping coding regions in drosophilids. Biochimie. 2011, 93 (11): 2019-2023. 10.1016/j.biochi.2011.07.023.PubMedView ArticleGoogle Scholar
- Meyer IM, Miklós I: Statistical evidence for conserved, local secondary structure in the coding regions of eukaryotic mRNAs and pre-mRNAs. Nucleic Acids Res. 2005, 33 (19): 6338-6348. 10.1093/nar/gki923.PubMedPubMed CentralView ArticleGoogle Scholar
- Siepel A, Haussler D: Combining phylogenetic and hidden Markov models in biosequence analysis. J Comput Biol. 2004, 11 (2–3): 413-428.PubMedView ArticleGoogle Scholar
- Sabarinathan R, Tafer H, Seemann SE, Hofacker IL, Stadler PF, Gorodkin J: RNAsnp: efficient detection of local RNA secondary structure changes induced by SNPs. Hum Mutat. 2013, 34 (4): 546-556.PubMedPubMed CentralGoogle Scholar
- Mantel N, Haenszel W: Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst. 1959, 22 (4): 719-748.PubMedGoogle Scholar
- Mantel N: Chi-Square Tests with One Degree of Freedom; Extensions of the Mantel- Haenszel Procedure. J Am Stat Assoc. 1963, 58 (303): 690-700.Google Scholar
- Wright F: The ‘effective number of codons’ used in a gene. Gene. 1990, 87 (1): 23-29. 10.1016/0378-1119(90)90491-9.PubMedView ArticleGoogle Scholar
- Fuglsang A: Accounting for background nucleotide composition when measuring codon usage bias: brilliant idea, difficult in practice. Mol Biol Evol. 2006, 23 (7): 1345-1347. 10.1093/molbev/msl009.PubMedView ArticleGoogle Scholar
- Novembre JA: Accounting for background nucleotide composition when measuring codon usage bias. Mol Biol Evol. 2002, 19 (8): 1390-1394. 10.1093/oxfordjournals.molbev.a004201.PubMedView ArticleGoogle Scholar
- Zhou T, Wilke CO: Reduced stability of mRNA secondary structure near the translation-initiation site in dsDNA viruses. BMC Evol Biol. 2011, 11: 59-10.1186/1471-2148-11-59.PubMedPubMed CentralView ArticleGoogle Scholar
- Kudla G, Murray AW, Tollervey D, Plotkin JB: Coding-sequence determinants of gene expression in Escherichia coli. Science. 2009, 324 (5924): 255-258. 10.1126/science.1170160.PubMedPubMed CentralView ArticleGoogle Scholar
- Bazykin GA, Kochetov AV: Alternative translation start sites are conserved in eukaryotic genomes. Nucleic Acids Res. 2011, 39 (2): 567-577. 10.1093/nar/gkq806.PubMedPubMed CentralView ArticleGoogle Scholar
- Akashi H: Gene expression and molecular evolution. Curr Opin Genet Dev. 2001, 11 (6): 660-666. 10.1016/S0959-437X(00)00250-1.PubMedView ArticleGoogle Scholar
- Sharp PM, Tuohy TM, Mosurski KR: Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res. 1986, 14 (13): 5125-5143. 10.1093/nar/14.13.5125.PubMedPubMed CentralView ArticleGoogle Scholar
- Zhou T, Weems M, Wilke CO: Translationally optimal codons associate with structurally sensitive sites in proteins. Mol Biol Evol. 2009, 26 (7): 1571-1580. 10.1093/molbev/msp070.PubMedPubMed CentralView ArticleGoogle Scholar
- Drummond DA, Wilke CO: Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell. 2008, 134 (2): 341-352. 10.1016/j.cell.2008.05.042.PubMedPubMed CentralView ArticleGoogle Scholar
- Duret L: Evolution of synonymous codon usage in metazoans. Curr Opin Genet Dev. 2002, 12 (6): 640-649. 10.1016/S0959-437X(02)00353-2.PubMedView ArticleGoogle Scholar
- Zhou T, Gu W, Wilke CO: Detecting positive and purifying selection at synonymous sites in yeast and worm. Mol Biol Evol. 2010, 27 (8): 1912-1922. 10.1093/molbev/msq077.PubMedPubMed CentralView ArticleGoogle Scholar
- Chamary JV, Hurst LD: Evidence for selection on synonymous mutations affecting stability of mRNA secondary structure in mammals. Genome Biol. 2005, 6 (9): R75-10.1186/gb-2005-6-9-r75.PubMedPubMed CentralView ArticleGoogle Scholar
- Parmley JL, Chamary JV, Hurst LD: Evidence for purifying selection against synonymous mutations in mammalian exonic splicing enhancers. Mol Biol Evol. 2006, 23 (2): 301-309.PubMedView ArticleGoogle Scholar
- Warnecke T, Hurst LD: Evidence for a trade-off between translational efficiency and splicing regulation in determining synonymous codon usage in Drosophila melanogaster. Mol Biol Evol. 2007, 24 (12): 2755-2762. 10.1093/molbev/msm210.PubMedView ArticleGoogle Scholar
- Komar AA, Lesnik T, Reiss C: Synonymous codon substitutions affect ribosome traffic and protein folding during in vitro translation. FEBS Lett. 1999, 462 (3): 387-391. 10.1016/S0014-5793(99)01566-5.PubMedView ArticleGoogle Scholar
- Kimchi-Sarfaty C, Oh JM, Kim IW, Sauna ZE, Calcagno AM, Ambudkar SV, Gottesman MM: A “silent” polymorphism in the MDR1 gene changes substrate specificity. Science. 2007, 315 (5811): 525-528. 10.1126/science.1135308.PubMedView ArticleGoogle Scholar
- Zhang G, Hubalewska M, Ignatova Z: Transient ribosomal attenuation coordinates protein synthesis and co-translational folding. Nat Struct Mol Biol. 2009, 16 (3): 274-280. 10.1038/nsmb.1554.PubMedView ArticleGoogle Scholar
- Duret L, Mouchiroud D: Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol Biol Evol. 2000, 17 (1): 68-74. 10.1093/oxfordjournals.molbev.a026239.PubMedView ArticleGoogle Scholar
- Pal C, Papp B, Hurst LD: Highly expressed genes in yeast evolve slowly. Genetics. 2001, 158 (2): 927-931.PubMedPubMed CentralGoogle Scholar
- Drummond DA, Raval A, Wilke CO: A single determinant dominates the rate of yeast protein evolution. Mol Biol Evol. 2006, 23 (2): 327-337.PubMedView ArticleGoogle Scholar
- Kurland CG: Codon bias and gene expression. FEBS Lett. 1991, 285 (2): 165-169. 10.1016/0014-5793(91)80797-7.PubMedView ArticleGoogle Scholar
- Ikemura T: Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol. 1985, 2 (1): 13-34.PubMedGoogle Scholar
- Henry I, Sharp PM: Predicting gene expression level from codon usage bias. Mol Biol Evol. 2007, 24 (1): 10-12.PubMedView ArticleGoogle Scholar
- Silverman IM, Li F, Gregory BD: Genomic era analyses of RNA secondary structure and RNA-binding proteins reveal their significance to post-transcriptional regulation in plants. Plant Sci. 2013, 205–206: 55-62.PubMedView ArticleGoogle Scholar
- Shah P, Ding Y, Niemczyk M, Kudla G, Plotkin JB: Rate-limiting steps in yeast protein translation. Cell. 2013, 153 (7): 1589-1601. 10.1016/j.cell.2013.05.049.PubMedPubMed CentralView ArticleGoogle Scholar
- Goodman DB, Church GM, Kosuri S: Causes and effects of N-terminal codon bias in bacterial genes. Science. 2013, 342 (6157): 475-479. 10.1126/science.1241934.PubMedView ArticleGoogle Scholar
- Dvir S, Velten L, Sharon E, Zeevi D, Carey LB, Weinberger A, Segal E: Deciphering the rules by which 5′-UTR sequences affect protein expression in yeast. Proc Natl Acad Sci. 2013, 110 (30): E2792-E2801. 10.1073/pnas.1222534110.PubMedPubMed CentralView ArticleGoogle Scholar
- Bentele K, Saffert P, Rauscher R, Ignatova Z, Bluthgen N: Efficient translation initiation dictates codon usage at gene start. Mol Syst Biol. 2013, 9: 675-PubMedPubMed CentralView ArticleGoogle Scholar
- Kim Y, Lee G, Jeon E, Sohn E, Lee Y, Kang H, Lee D, Kim DH, Hwang I: The immediate upstream region of the 5′-UTR from the AUG start codon has a pronounced effect on the translational efficiency in Arabidopsis thaliana. Nucleic Acids Res. 2014, 42 (1): 485-498. 10.1093/nar/gkt864.PubMedPubMed CentralView ArticleGoogle Scholar
- Mokrejš M, Vopálenský V, Kolenatý O, Mašek T, Feketová Z, Sekyrová P, Skaloudová B, Kříž V, Pospíšek M: IRESite: the database of experimentally verified IRES structures. Nucleic Acids Res. 2006, 34 (1): D125-D130. http://www.iresite.org,PubMedPubMed CentralView ArticleGoogle Scholar
- Kaempfer R: Interferon- mRNA attenuates its own translation by activating PKR: A molecular basis for the therapeutic effect of interferon- in multiple sclerosis. Cell Res. 2006, 16 (2): 148-153. 10.1038/sj.cr.7310020.PubMedView ArticleGoogle Scholar
- Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32 (5): 1792-1797. 10.1093/nar/gkh340.PubMedPubMed CentralView ArticleGoogle Scholar
- Stamatakis A: RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006, 22 (21): 2688-2690. 10.1093/bioinformatics/btl446.PubMedView ArticleGoogle Scholar
- Stone EA, Sidow A: Constructing a meaningful evolutionary average at the phylogenetic center of mass. BMC Bioinformatics. 2007, 8: 222-10.1186/1471-2105-8-222.PubMedPubMed CentralView ArticleGoogle Scholar
- Siepel A, Haussler D: Phylogenetic estimation of context-dependent substitution rates by maximum likelihood. Mol Biol Evol. 2004, 21 (3): 468-488.PubMedView ArticleGoogle Scholar
- Covert MW, Knight EM, Reed JL, Herrgard MJ, Palsson BO: Integrating high-throughput and computational data elucidates bacterial networks. Nature. 2004, 429 (6987): 92-96. 10.1038/nature02456.PubMedView ArticleGoogle Scholar
- Holstege FC, Jennings EG, Wyrick JJ, Lee TI, Hengartner CJ, Green MR, Golub TR, Lander ES, Young RA: Dissecting the regulatory circuitry of a eukaryotic genome. Cell. 1998, 95 (5): 717-728. 10.1016/S0092-8674(00)81641-4.PubMedView ArticleGoogle Scholar
- Stolc V, Gauhar Z, Mason C, Halasz G, van Batenburg MF, Rifkin SA, Hua S, Herreman T, Tongprasit W, Barbano PE, Bussemaker HJ, White KP: A gene expression map for the euchromatic genome of Drosophila melanogaster. Science. 2004, 306 (5696): 655-660. 10.1126/science.1101312.PubMedView ArticleGoogle Scholar
- Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G, Cooke MP, Walker JR, Hogenesch JB: A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci U S A. 2004, 101 (16): 6062-6067. 10.1073/pnas.0400782101.PubMedPubMed CentralView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.