Exploring the selective constraint on the sizes of insertions and deletions in 5' untranslated regions in mammals
© Chen et al; licensee BioMed Central Ltd. 2011
Received: 10 February 2011
Accepted: 5 July 2011
Published: 5 July 2011
Small insertions and deletions ("indels" with size ≤ 100 bp) whose lengths are not multiples of three (non-3n) are strongly constrained and depleted in protein-coding sequences. Such a constraint has never been reported in noncoding genomic regions. In 5' untranslated regions (5'UTRs) in mammalian genomes, upstream start codons (uAUGs) and upstream open reading frames (uORFs) can regulate protein translation. Non-3n indels in uORFs can potentially disrupt the functions of these regulatory elements. We thus hypothesize that natural selection disfavors non-3n indels in 5'UTRs when these regulatory elements are present.
We design the Indel Selection Index to measure the selective constraint on non-3n indels in 5'UTRs. The index controls for the genomic compositions of the analyzed 5'UTRs and measures the probability of non-3n indel depletion downstream of uAUGs. By comparing the experimentally supported transcripts of human-mouse orthologous genes, we demonstrate that non-3n indels downstream of two types of uAUGs (alternative translation initiation sites and the uAUGs of coding sequence-overlapping uORFs) are underrepresented. The results hold well regardless of differences in alignment tool, gene structures between human and mouse, or the criteria in selecting alternatively spliced isoforms used for the analysis.
To our knowledge, this is the first study to demonstrate selective constraints on non-3n indels in 5'UTRs. Such constraints may be associated with the regulatory functions of uAUGs/uORFs in translational regulation or the generation of protein isoforms. Our study thus brings a new perspective to the evolution of 5'UTRs in mammals.
Insertion and deletion mutations (indels) frequently occur during the evolution of mammalian genomes [1–4]. Most of these indels are selectively neutral when they occur in noncoding genomic regions [5, 6]. Meanwhile, indels whose lengths are not divisible by three (designated as "non-3n" indels, in contrast to "3n indels", whose lengths are divisible by three) were found to be underrepresented in coding sequences (CDS) [4, 7]. This is because non-3n indels cause frameshifts in CDS (which frequently result in pseudogenization), while 3n indels do not. Therefore, non-3n indels in CDS are more likely to be removed by natural selection. If reading frame preservation is the only evolutionary constraint on indel size, 3n indels and non-3n indels should have the same probability of being retained in noncoding regions. Indeed, a previous study found an overrepresentation of 3n indels only in CDS, but not in noncoding sequences, in the human genome.
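The 3n versus non-3n distinction can be illustrated with a minimal Python sketch. The function names are hypothetical and chosen only for illustration; the 100 bp upper bound follows the definition of small indels given in the abstract.

```python
from collections import Counter

MAX_INDEL_BP = 100  # small indels are defined here as at most 100 bp


def indel_class(length: int) -> str:
    """Return '3n' if the indel length is a multiple of three, else 'non-3n'."""
    if not 1 <= length <= MAX_INDEL_BP:
        raise ValueError(f"indel length {length} outside 1..{MAX_INDEL_BP} bp")
    # A length divisible by three preserves the reading frame in a CDS.
    return "3n" if length % 3 == 0 else "non-3n"


def count_classes(lengths):
    """Tally 3n and non-3n indels in a list of indel lengths."""
    return Counter(indel_class(n) for n in lengths)


counts = count_classes([1, 3, 4, 6, 12, 100])
```

Under the neutral expectation described above, the two classes would be retained in noncoding regions in proportion to how often they arise, so a deficit of the "non-3n" tally relative to a suitable background is the signal of interest.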
To test our hypothesis, we develop the Indel Selection Index (ISI, see Methods) to examine whether non-3n indels between the first uAUG and the translation initiation site ("TIS", also known as the main start codon) of the main CDS have been selectively disfavored during the evolution of mammalian genes after the primate-rodent divergence. Since non-3n indels may affect protein expression more seriously when they occur downstream of the uAUGs of alternative uORFs and overlapping uORFs (Figure 1B), purifying selection on non-3n indels is expected to be particularly stringent in these regions, and relatively relaxed in the case of strict uORFs. Our results clearly support this hypothesis. This study thus offers a new perspective on the evolution of 5'UTRs, in that the sizes of indels can be subject to selective constraint in these genomic regions conditional on the presence of certain regulatory elements.
We compared human and mouse orthologous genes to examine whether non-3n indels are selectively constrained downstream of uAUGs. Because the extent of the 5'UTR varies among alternatively spliced mRNA isoforms, we used three different criteria to select one transcript for each gene for this analysis (see Methods): (1) a randomly selected transcript, (2) the transcript with the longest 5'UTR, and (3) the transcript with a "pure" 5'UTR (i.e. a 5'UTR that does not overlap with the CDS of any other splicing isoform). These three selection criteria have distinct biological implications. The first dataset assumes that, for each gene, all of the isoforms are equally important to the organism. The assumption may not be true, but random sampling helps to rule out potential sampling biases. The second dataset contains the largest number of uAUGs. However, some uAUGs in these 5'UTRs may overlap with the CDS of other isoforms of the same gene. In contrast, although the third dataset enables us to explore the selection pressure that works exclusively on 5'UTRs, only a small proportion of transcripts with biased properties can be included in the analysis. These transcripts tend to be the products of genes with short 5'UTRs, genes that are lowly expressed, or genes that are not alternatively spliced, although > 90% of human genes have multiple isoforms. Given the distinct properties of the selected transcripts in the three datasets, we reason that if our hypothesis is correct, all three datasets will yield consistent results.
Table 1. Transcripts of human-mouse orthologous genes analyzed in this study (column headings: No. of genes (%); Single uORF type; Multiple types of uORF)
To evaluate the evolutionary constraints on indel lengths in different parts of a 5'UTR, we developed the Indel Selection Index ("ISI"; see Methods). In brief, ISI measures the probability of observing a higher frequency of non-3n indels downstream of a uAUG (or a reference point), as compared to the 5'UTR region upstream of the uAUG (or reference point). In other words, a small ISI indicates a depletion of non-3n indels downstream of a certain 5'UTR position. The use of ISI thus controls for the properties potentially specific to 5'UTRs. We first analyzed the ISI distribution of the G0 transcripts (those without uAUGs), where the ratios of non-3n to 3n indels are expected to be approximately equal between the regions upstream and downstream of a given position in the 5'UTR. We assigned a reference point to each transcript and shifted the point from 10% to 90% of the 5'UTR length (at intervals of 10%) from the cap to obtain the ISI values. ISI values of the G0 transcripts vary with the reference point position (Additional file 1). For both human and mouse, the median ISI values remain approximately equal when the reference point is located at 30~70% of the 5'UTR length from the 5' cap, but drop toward both ends of the 5'UTR. Therefore, controlling for the ratio of upstream to downstream length is necessary when analyzing the ISI values. We then compared the ISI values of uAUG-containing transcripts (Ga, Gs, and Gv, i.e. transcripts with alternative, strict, and overlapping uORFs, respectively) with those of G0 transcripts with corresponding upstream/downstream ratios. Note that the uAUG-containing transcripts may have multiple uAUGs of the same type. In such cases, we used the first uAUG from the cap as the reference point.
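The reference-point sweep described for uAUG-free transcripts can be sketched as follows. This is only an illustration under stated assumptions: indels are represented as hypothetical (position, length) pairs with positions counted in bases from the cap, and an indel lying exactly at the reference point is assigned to the downstream side, a convention the text does not specify.

```python
def side_counts(indels, utr_len, percent):
    """Count 3n and non-3n indels upstream (u) and downstream (d) of a reference point.

    indels  : list of (position, length); position is 0-based from the cap (assumed layout)
    percent : location of the reference point as a percentage of the 5'UTR length
    """
    ref = utr_len * percent / 100
    counts = {"n3n_u": 0, "3n_u": 0, "n3n_d": 0, "3n_d": 0}
    for pos, length in indels:
        kind = "3n" if length % 3 == 0 else "n3n"
        side = "d" if pos >= ref else "u"  # boundary convention is an assumption
        counts[f"{kind}_{side}"] += 1
    return counts


def sweep(indels, utr_len):
    """Move the reference point from 10% to 90% of the 5'UTR in steps of 10%."""
    return {p: side_counts(indels, utr_len, p) for p in range(10, 100, 10)}


example = [(10, 1), (40, 3), (60, 2), (90, 6)]
table = sweep(example, utr_len=100)
```

Reference points near either end leave very few indels on one side, which is one plausible reason why index values computed from such counts behave differently toward the ends of the 5'UTR.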
Notably, the locations of the reference points of the corresponding G0 transcripts differ significantly among Ga, Gs, and Gv. In view of the variation of ISI values with reference point location (Additional file 1), the comparisons among the three transcript groups may appear unfair. In particular, all of the Gs transcripts are compared against the G0 transcripts with their reference points located at ~25% from the cap, whereas the percentages for Ga and Gv fall between 54% and 69% (Figure 2). To address this issue, we divided the Gv transcripts into three equal-sized groups according to the relative positions of their uAUGs (we did not perform the analysis for Ga because of its small sample size). As shown in Additional file 3, the first Gv subgroup (Gv_1) was compared against G0 transcripts with reference points located at 14%~16% from the cap for human, and 24%~26% for mouse. These Gv transcripts have uAUGs located closer to the cap than their Gs counterparts, and they still have ISI values significantly lower than the corresponding G0 transcripts. Similar results are also observed for Gv_2 (Additional file 3). Gv_3 transcripts show a similar trend, although the differences in ISI values are not statistically significant, possibly due to reduced sample sizes.
Possible reasons for non-3n indel depletion in 5'UTRs
We have demonstrated that non-3n indels are subject to purifying selection in mammalian 5'UTRs conditional on the presence of uORFs. We show that both alternative and overlapping uORFs contribute to decreased non-3n indels downstream of their uAUGs, and that strict uORFs have only minor effects in this regard. These results hold well when technical issues in transcript isoform selection, difference in alignment tool, and differences in transcript structures between human and mouse are controlled.
The suppression of non-3n indels downstream of AISs and VuAUGs implies the functional importance of these two uAUG types. Two possible reasons may explain this observation. The first is the functional importance of uORF-associated protein products. Translation of overlapping uORFs always generates radically different peptides from those translated from the main CDS because of the difference in reading frame. Strict uORFs can also produce functional proteins when translated [28, 29]. Therefore, additional constraints on non-3n indels unrelated to peptide coding may have separated strict uORFs from the other two types of uORFs. Another explanation is that overlapping uORFs can give rise to in-frame N-truncated peptides, which may have different molecular functions from the original, full-length peptides. Such N-truncated peptides may change in length or simply disappear if non-3n indels occur downstream of VuAUGs. On the other hand, alternative uORFs can lead to the generation of N-extended peptides, which could have different functions from the original peptides (e.g. the human regulator of G-protein signalling 2 (RGS2)). The functional disruption of such N-extended or N-truncated peptides by non-3n indels may be detrimental to the organism in general, and the responsible indels could thus have been removed by natural selection. A recent study provides evidence of the importance of alternative translation start sites by showing that start codons downstream of TISs are evolutionarily conserved. It is suggested that alternative start sites may work as "backup" translational initiation sites and thus may increase the efficiency of translation. The same comment likely also applies to the AISs analyzed in this study.
The second possible explanation for the uAUG-related selection pressure on non-3n indels is the evolutionary conservation of protein abundance. As shown in Figure 1B, non-3n indels may cause interchanges between the three types of uORFs. In the case of alternative uORFs, downstream non-3n indels can lead to the generation of either strict or overlapping uORFs, causing strong inhibition of protein production of the main CDS in both cases. Such drastic changes in protein abundance are likely deleterious. By contrast, when a non-3n indel occurs downstream of an SuAUG, the affected strict uORF may either become an alternative or overlapping uORF, or remain a strict uORF (but with a different length) (Figure 1B). We suggest that the latter case is more likely, for changing a strict uORF to an alternative or overlapping uORF requires that the reading frame starting from the uAUG remain open until it reaches the TIS. Furthermore, in the case of an alternative uORF, the reading frame must be the same as that used by the main CDS. These requirements are difficult to fulfil considering that SuAUGs are relatively distant from the TIS (about 70~75% of the 5'UTR length). As such, non-3n indels that occur downstream of SuAUGs may not lead to changes in uORF type in most cases. Such indels may have no significant effect on the protein abundance of the downstream CDS, and thus may have small fitness effects. Lastly, in the case of overlapping uORFs, non-3n indels are more likely to change them into alternative uORFs than in the case of strict uORFs, because VuAUGs are typically closer to the TIS (about 40% of the 5'UTR length). Overlapping uORFs can result in nearly complete inhibition of protein translation or generation of N-truncated proteins [10, 18, 19]. Furthermore, overlapping uORFs can serve important regulatory roles. Therefore, non-3n indels in overlapping uORFs may be selectively disfavoured.
A non-3n indel may also change an overlapping uORF to a strict uORF, or simply change the length of the original uORF (without changing its type). In these cases, non-3n indels may not have significant effects in changing protein abundance, and thus may be tolerated by selection.
One unexpected observation from our results is that four of the six datasets demonstrate lower P values in the Gv transcripts than in the Ga transcripts (Figure 3), indicating that overlapping uORFs may have contributed stronger constraints on non-3n indels in 5'UTRs than alternative uORFs. That said, the real cause of this Gv-Ga difference remains unclear.
Notably, it has been recently reported that 3'UTRs actually have a larger effect on protein abundance than 5'UTRs, which appear to account for ~1% of the variation in protein abundance. How can we observe any selection pressure on non-3n indels in 5'UTRs if these noncoding regions have such a "small" effect on protein abundance? There are three possible explanations. Firstly, as we mentioned above, non-3n indels in 5'UTRs may affect both the abundance and the peptide sequence of the affected gene product. The "qualitative" change may also be subject to selection pressure. Secondly, even though on average 5'UTRs account for only a small proportion of the genome-wide variation in protein abundance, in individual genes the variation can be very large, as was demonstrated by a recent study. The cases where 5'UTRs have very small effects on protein abundance may actually add to the noise in our analysis. However, we have found clear signals of selection pressure on non-3n indels despite this potential noise, which in fact reflects the strength of the "real" signals. Finally, the uORFs per se may be biologically functional, in terms of either their peptide products or their regulatory roles. The disruption of functional uORFs is thus likely subject to selective constraint.
Limitations of the study
The current analysis has a few limitations. Firstly, determining which transcript of a gene to analyze is controversial. This study used three different criteria for transcript selection. In the case of the transcripts with the longest 5'UTRs, a uORF may partly overlap with the CDS of another isoform. In this case, the ISI value may be smaller than expected because of the strong constraint on reading frame preservation in coding sequences. In addition, the classification of the uORFs (alternative, strict, or overlapping) in these 5'UTRs is sometimes ambiguous (see Additional file 3 for an example). Analyzing the transcripts with pure 5'UTRs avoids this problem. However, this practice leads to significantly decreased numbers of uORFs and severely reduced sample sizes, which in turn may result in decreased statistical power and potential sampling biases. These two criteria for transcript selection represent two extremes. The results derived using randomly selected transcripts fall in between. Nevertheless, this study obtained consistent results across all datasets, indicating that the depletion of non-3n indels is unlikely to result solely from constraints in the main coding sequences.
Secondly, since our study is based on pairwise sequence alignments, we cannot distinguish between insertions and deletions, nor can we infer the lineage specificity of the identified indels. We do not know exactly in which lineage the indels have affected the structures of the 5'UTRs (i.e. the types of uORFs). This is important because the 5'UTRs of human and mouse transcripts may have different lengths and uORFs. When a non-3n indel occurs in a lineage whose transcript does not contain any uORFs or 5'UTR exons, this indel adds to the noise in our analysis. We cannot eliminate such noise without using multiple-species sequence alignments (which, however, would dramatically decrease the sample size and render the analysis infeasible). To mitigate this problem, this study performed analyses using the transcript structures of human and mouse separately. The results from the two species turn out to be consistent with each other. Therefore, in spite of the above limitation, our results appear to reflect a genuine biological signal.
To the best of our knowledge, this is the first study to demonstrate the selective constraint on non-3n indels in 5'UTRs. This constraint may result from the requirement to preserve either the translational regulatory elements (uORFs) in 5'UTRs or the functions of the peptides whose translation is associated with uORFs. Our results thus demonstrate the impact of indels on the evolution of 5'UTRs in mammalian genomes and reaffirm the functional importance of uORFs from a different angle.
In this study, we analyze the human and mouse transcripts because their genomes have been fully sequenced and extensively curated [33, 34]. In addition, the transcriptomes of these two species have been well characterized. The annotations for 5'UTRs are thus fairly accurate for the two species. The sequences of experimentally verified transcripts with known protein products of one-to-one human-mouse orthologous genes, based on the Ensembl release 54 http://www.ensembl.org, were retrieved through BioMart. Non-protein-coding genes and protein-coding genes whose transcripts did not contain 5'UTRs were excluded. Potentially misannotated transcripts (whose TIS locations were inconsistent with those observed in the DNA sequences) were also excluded. To avoid repetitive counts of the same indels, only one transcript was selected for each gene by three different criteria: (1) a randomly selected transcript, (2) the transcript with the longest 5'UTR, and (3) the transcript with a "pure" 5'UTR. A pure 5'UTR is one that does not overlap with any coding sequences in other splicing isoforms (see Additional file 4 for an example).
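The three transcript selection criteria can be sketched in Python as below. The `Transcript` record, its field names, and the use of a single genomic interval per 5'UTR are simplifying assumptions made for illustration; real 5'UTRs may span several exons, and the text does not say how ties among several pure isoforms were resolved, so the sketch simply takes the first one.

```python
import random
from dataclasses import dataclass


@dataclass
class Transcript:
    name: str
    utr5: tuple  # (start, end) half-open interval; single-exon 5'UTR assumed
    cds: list    # list of (start, end) half-open CDS intervals


def overlaps(a, b):
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def is_pure(t, isoforms):
    """A pure 5'UTR does not overlap the CDS of any other isoform of the gene."""
    return not any(
        overlaps(t.utr5, seg) for o in isoforms if o is not t for seg in o.cds
    )


def pick_random(isoforms, rng):
    return rng.choice(isoforms)


def pick_longest_utr(isoforms):
    return max(isoforms, key=lambda t: t.utr5[1] - t.utr5[0])


def pick_pure(isoforms):
    pure = [t for t in isoforms if is_pure(t, isoforms)]
    return pure[0] if pure else None  # tie-breaking rule is an assumption


a = Transcript("isoform_a", (0, 100), [(100, 200)])
b = Transcript("isoform_b", (0, 40), [(40, 200)])
gene = [a, b]
chosen_random = pick_random(gene, random.Random(0))
```

In this toy gene, the longest 5'UTR belongs to `isoform_a`, but part of it is coding in `isoform_b`, so only `isoform_b` qualifies as pure. This mirrors the trade-off between the second and third datasets.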
Sequence alignments and identification of indels and uORFs
Indels were identified based on the human-mouse pairwise genomic sequence alignments downloaded from the University of California, Santa Cruz (UCSC) Genome Browser http://genome.ucsc.edu/. The versions of the human and mouse genomes (hg18 and mm9, respectively) correspond to Ensembl release 54.
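As a rough illustration of how indels can be read off a pairwise alignment, the sketch below scans two gapped sequences of equal length and reports each run of gap characters. The function name and the use of "-" as the gap symbol are assumptions; the actual UCSC alignment files use their own format and would need a parser first. Positions are alignment columns, not 5'UTR coordinates.

```python
def find_gaps(aligned_a: str, aligned_b: str):
    """Return (start_column, length) for every gap run in either aligned sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    gaps = []
    run_start, run_seq = None, None  # run_seq records which sequence carries the gap
    for i, (a, b) in enumerate(zip(aligned_a, aligned_b)):
        gapped = "a" if a == "-" else "b" if b == "-" else None
        if gapped != run_seq:
            if run_seq is not None:
                gaps.append((run_start, i - run_start))  # close the previous run
            run_start, run_seq = i, gapped
    if run_seq is not None:
        gaps.append((run_start, len(aligned_a) - run_start))  # run reaching the end
    return gaps


gaps = find_gaps("ACG--TACGT", "ACGTTTA-GT")
```

The function deliberately does not report which sequence carries the gap: with only two species, a gap in one sequence may be a deletion in that lineage or an insertion in the other, so only the gap length is used.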
To ensure that human-mouse orthologous sequences were compared in our study, this work only retained the alignments that include one-to-one human-mouse orthologous genes annotated by Ensembl. In addition, to avoid mis-assignment of gaps (indels), the alignable exonic sequences in one species were required to overlap with > 80% of the annotated exonic sequences of the other species. The alignments were also required to cover the entirety of the annotated 5'UTRs. Consequently, this study obtained ~6,000 human and mouse genes for subsequent analyses (Table 1). To examine whether different alignment tools affect the overall results, the Pecan alignment program was used with default parameters to re-align the retrieved human-mouse orthologous sequences.
The 5'UTRs of the retrieved transcripts were then scanned for the existence of uAUGs. Around half of the analyzed human and mouse transcripts were found to have at least one uAUG (Additional file 5). These proportions are similar to those observed in previous studies [16, 24]. Here, a uORF is defined as a putative open reading frame in 5'UTR starting with a uAUG and composed of at least nine nucleotides, including the stop codon. Human and mouse orthologous genes may have different 5'UTR structures and different numbers and types of uORFs. Therefore, we performed our analyses according to the human and mouse gene annotations separately.
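The uAUG scan and the three uORF types discussed in this study can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes the input string is the full 5'UTR in DNA letters with the TIS starting immediately after its last base, and the label "too_short" for ORFs below the nine-nucleotide minimum is a hypothetical name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_UORF_NT = 9  # minimum uORF length, including the stop codon


def classify_uaug(utr: str, start: int) -> str:
    """Classify the uORF that begins at the ATG located at index `start`."""
    # Walk codon by codon, considering only codons that lie fully inside the 5'UTR.
    for i in range(start + 3, len(utr) - 2, 3):
        if utr[i:i + 3] in STOP_CODONS:
            length = i + 3 - start
            return "strict" if length >= MIN_UORF_NT else "too_short"
    # The frame stays open up to the TIS: in-frame means alternative,
    # out-of-frame means the uORF runs into the main CDS (overlapping).
    return "alternative" if (len(utr) - start) % 3 == 0 else "overlapping"


def scan_uaugs(utr: str):
    """Return (position, type) for every ATG in the 5'UTR, ordered from the cap."""
    utr = utr.upper()
    found = []
    start = utr.find("ATG")
    while start != -1:
        found.append((start, classify_uaug(utr, start)))
        start = utr.find("ATG", start + 1)
    return found


strict_example = scan_uaugs("CCATGAAATAGCC")
```

Because the classification depends on `(len(utr) - start) % 3`, inserting or deleting a number of bases that is not a multiple of three downstream of the uAUG changes the frame relationship with the TIS, which is how a non-3n indel can convert one uORF type into another.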
Measurement of selection pressure on indel lengths -- the indel selection index
The index is built on an R value that compares the ratio of non-3n to 3n indels downstream of a uAUG (or reference point) with the corresponding ratio upstream. In the R value, N represents the number of indels, and the subscripts "n3n" and "3n" represent non-3n and 3n indels. The subscripts "d" and "u" indicate the 5'UTR regions downstream and upstream of the uAUGs (or reference points) of interest. Adding the pseudocount 0.5 ensures that the denominator is not zero. Using different pseudocounts changes the magnitude, but not the sign (positive or negative), of the R value (Additional file 6). The ratio of non-3n to 3n indels upstream of uAUGs serves as the "background" to measure the depletion (or enrichment) of non-3n indels downstream of uAUGs.
ISI is then defined as the proportion of the randomized R-values (R_shuffled) that are smaller than the observed R-value. The distribution of R_shuffled was generated by randomly shuffling the locations of the indels 1,000 times for each transcript, while retaining the lengths and numbers of indels of the 5'UTR. An ISI value smaller than the neutral expectation indicates depletion of non-3n indels downstream of a uAUG. The neutral distribution of ISIs was derived from the transcripts without uAUGs (see Additional file 1), with the upstream/downstream differentiation point moving from the cap to the TIS in increments of 1% of the 5'UTR length. Note that we use ISI rather than comparing the non-3n to 3n indel ratios between 5'UTRs and intergenic/intronic regions, in order to control for the potential biological differences between 5'UTRs and the other noncoding regions.
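The shuffling procedure can be sketched in Python as below. Several details are assumptions and should not be read as the authors' exact method: the R value is taken to be the natural logarithm of the downstream non-3n to 3n ratio divided by the upstream ratio, each count carrying the 0.5 pseudocount (a form consistent with the description of R having a sign); shuffled positions are drawn uniformly and independently along the 5'UTR; and indels are hypothetical (position, length) pairs.

```python
import math
import random

PSEUDOCOUNT = 0.5  # keeps every ratio finite when a count is zero


def r_value(indels, ref):
    """Assumed form of R: log of (downstream non-3n/3n ratio) over (upstream ratio)."""
    n = {"n3n_d": 0, "3n_d": 0, "n3n_u": 0, "3n_u": 0}
    for pos, length in indels:
        kind = "3n" if length % 3 == 0 else "n3n"
        side = "d" if pos >= ref else "u"
        n[f"{kind}_{side}"] += 1
    down = (n["n3n_d"] + PSEUDOCOUNT) / (n["3n_d"] + PSEUDOCOUNT)
    up = (n["n3n_u"] + PSEUDOCOUNT) / (n["3n_u"] + PSEUDOCOUNT)
    return math.log(down / up)


def isi(indels, utr_len, ref, n_shuffles=1000, seed=0):
    """Proportion of shuffled R values that are smaller than the observed R value."""
    rng = random.Random(seed)
    observed = r_value(indels, ref)
    lengths = [length for _, length in indels]  # numbers and lengths are retained
    smaller = 0
    for _ in range(n_shuffles):
        shuffled = [(rng.randrange(utr_len), length) for length in lengths]
        if r_value(shuffled, ref) < observed:
            smaller += 1
    return smaller / n_shuffles


# All non-3n indels upstream and all 3n indels downstream of the reference point.
depleted = [(5, 1), (10, 2), (80, 3), (90, 6)]
isi_depleted = isi(depleted, utr_len=100, ref=50)
```

In this toy case the observed arrangement is the most extreme depletion possible for two non-3n and two 3n indels, so no shuffled arrangement yields a smaller R and the index is zero. Real transcripts would give intermediate values, to be compared against the distribution obtained from uAUG-free transcripts at a matching reference position.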
The authors wish to thank Hsuan-Yu Lin for his assistance in data processing and analysis. We also thank Dr. Wen-Chang Wang and Tsung-Kai Chang for statistical advice. Intramural funding from the National Health Research Institutes, Taiwan, to BYL and FCC, supported this work.
- Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, Barlow K, Beck S, Berry E, Birren B, Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown SD, Bult C, Burton J, Butler J, Campbell RD, Carninci P, et al: Initial sequencing and comparative analysis of the mouse genome. Nature. 2002, 420 (6915): 520-562. 10.1038/nature01262.
- Consortium TCSaA: Initial sequence of the chimpanzee genome and comparison with the human genome. Nature. 2005, 437 (7055): 69-87. 10.1038/nature04072.
- Mills RE, Luttig CT, Larkins CE, Beauchamp A, Tsui C, Pittard WS, Devine SE: An initial map of insertion and deletion (INDEL) variation in the human genome. Genome Res. 2006, 16 (9): 1182-1190. 10.1101/gr.4565806.
- Chen FC, Chen CJ, Li WH, Chuang TJ: Human-specific insertions and deletions inferred from mammalian genome sequences. Genome Res. 2007, 17 (1): 16-22.
- Chen CH, Chuang TJ, Liao BY, Chen FC: Scanning for the signatures of positive selection for human-specific insertions and deletions. Genome Biol Evol. 2009, 1 (1):
- Lunter G, Ponting CP, Hein J: Genome-wide identification of human functional DNA using a neutral indel model. PLoS Comput Biol. 2006, 2 (1): e5. 10.1371/journal.pcbi.0020005.
- de la Chaux N, Messer PW, Arndt PF: DNA indels in coding regions reveal selective constraints on protein evolution in the human lineage. BMC Evol Biol. 2007, 7 (1): 191. 10.1186/1471-2148-7-191.
- Wilkie GS, Dickson KS, Gray NK: Regulation of mRNA translation by 5'- and 3'-UTR-binding factors. Trends Biochem Sci. 2003, 28 (4): 182-188. 10.1016/S0968-0004(03)00051-3.
- Sonenberg N, Hinnebusch AG: Regulation of translation initiation in eukaryotes: mechanisms and biological targets. Cell. 2009, 136 (4): 731-745. 10.1016/j.cell.2009.01.042.
- Jackson RJ, Hellen CU, Pestova TV: The mechanism of eukaryotic translation initiation and principles of its regulation. Nat Rev Mol Cell Biol. 2010, 11 (2): 113-127. 10.1038/nrm2838.
- Kozak M: Effects of intercistronic length on the efficiency of reinitiation by eucaryotic ribosomes. Mol Cell Biol. 1987, 7 (10): 3438-3445.
- van der Velden AW, Thomas AA: The role of the 5' untranslated region of an mRNA in translation regulation during development. Int J Biochem Cell Biol. 1999, 31 (1): 87-106. 10.1016/S1357-2725(98)00134-4.
- Mignone F, Gissi C, Liuni S, Pesole G: Untranslated regions of mRNAs. Genome Biol. 2002, 3 (3): REVIEWS0004.
- Kochetov AV: Alternative translation start sites and hidden coding potential of eukaryotic mRNAs. Bioessays. 2008, 30 (7): 683-691. 10.1002/bies.20771.
- Kozak M: The scanning model for translation: an update. J Cell Biol. 1989, 108 (2): 229-241. 10.1083/jcb.108.2.229.
- Calvo SE, Pagliarini DJ, Mootha VK: Upstream open reading frames cause widespread reduction of protein expression and are polymorphic among humans. Proc Natl Acad Sci USA. 2009, 106 (18): 7507-7512. 10.1073/pnas.0810916106.
- Kochetov AV, Sarai A, Rogozin IB, Shumny VK, Kolchanov NA: The role of alternative translation start sites in the generation of human protein diversity. Mol Genet Genomics. 2005, 273 (6): 491-496. 10.1007/s00438-005-1152-7.
- Kozak M: Regulation of translation via mRNA structure in prokaryotes and eukaryotes. Gene. 2005, 361: 13-37.
- Kozak M: Pushing the limits of the scanning mechanism for initiation of translation. Gene. 2002, 299 (1-2): 1-34.
- Sarrazin S, Starck J, Gonnet C, Doubeikovski A, Melet F, Morle F: Negative and translation termination-dependent positive control of FLI-1 protein synthesis by conserved overlapping 5' upstream open reading frames in Fli-1 mRNA. Mol Cell Biol. 2000, 20 (9): 2959-2969. 10.1128/MCB.20.9.2959-2969.2000.
- Song KY, Choi HS, Hwang CK, Kim CS, Law PY, Wei LN, Loh HH: Differential use of an in-frame translation initiation codon regulates human mu opioid receptor (OPRM1). Cell Mol Life Sci. 2009, 66 (17): 2933-2942. 10.1007/s00018-009-0082-7.
- Schrimpf SP, Weiss M, Reiter L, Ahrens CH, Jovanovic M, Malmstrom J, Brunner E, Mohanty S, Lercher MJ, Hunziker PE, Aebersold R, von Mering C, Hengartner MO: Comparative functional analysis of the Caenorhabditis elegans and Drosophila melanogaster proteomes. PLoS Biol. 2009, 7 (3): e48. 10.1371/journal.pbio.1000048.
- Sultan M, Schulz MH, Richard H, Magen A, Klingenhoff A, Scherf M, Seifert M, Borodina T, Soldatov A, Parkhomchuk D, Schmidt D, O'Keeffe S, Haas S, Vingron M, Lehrach H, Yaspo ML: A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science. 2008, 321 (5891): 956-960. 10.1126/science.1160342.
- Resch AM, Ogurtsov AY, Rogozin IB, Shabalina SA, Koonin EV: Evolution of alternative and constitutive regions of mammalian 5'UTRs. BMC Genomics. 2009, 10: 162. 10.1186/1471-2164-10-162.
- Chen X, Tompa M: Comparative assessment of methods for aligning multiple genome sequences. Nat Biotechnol. 2010, 28 (6): 567-572. 10.1038/nbt.1637.
- Paten B, Herrero J, Beal K, Fitzgerald S, Birney E: Enredo and Pecan: genome-wide mammalian consistency-based multiple alignment with paralogs. Genome Res. 2008, 18 (11): 1814-1828. 10.1101/gr.076554.108.
- Klemke M, Kehlenbach RH, Huttner WB: Two overlapping reading frames in a single exon encode interacting proteins--a novel way of gene usage. EMBO J. 2001, 20 (14): 3849-3860. 10.1093/emboj/20.14.3849.
- Cvijovic M, Dalevi D, Bilsland E, Kemp GJ, Sunnerhagen P: Identification of putative regulatory upstream ORFs in the yeast genome using heuristics and evolutionary conservation. BMC Bioinformatics. 2007, 8: 295. 10.1186/1471-2105-8-295.
- Tautz D: Polycistronic peptide coding genes in eukaryotes--how widespread are they?. Brief Funct Genomic Proteomic. 2009, 8 (1): 68-74.
- Gu S, Anton A, Salim S, Blumer KJ, Dessauer CW, Heximer SP: Alternative translation initiation of human regulators of G-protein signaling-2 yields a set of functionally distinct proteins. Mol Pharmacol. 2008, 73 (1): 1-11.
- Bazykin GA, Kochetov AV: Alternative translation start sites are conserved in eukaryotic genomes. Nucleic Acids Res. 2010, 39 (2): 567-577.
- Vogel C, Abreu Rde S, Ko D, Le SY, Shapiro BA, Burns SC, Sandhu D, Boutz DR, Marcotte EM, Penalva LO: Sequence signatures and mRNA concentration can explain two-thirds of protein abundance variation in a human cell line. Mol Syst Biol. 2010, 6: 400.
- Church DM, Goodstadt L, Hillier LW, Zody MC, Goldstein S, She X, Bult CJ, Agarwala R, Cherry JL, DiCuccio M, Hlavina W, Kapustin Y, Meric P, Maglott D, Birtle Z, Marques AC, Graves T, Zhou S, Teague B, Potamousis K, Churas C, Place M, Herschleb J, Runnheim R, Forrest D, Amos-Landgraf J, Schwartz DC, Cheng Z, Lindblad-Toh K, Eichler EE, et al: Lineage-specific biology revealed by a finished genome assembly of the mouse. PLoS Biol. 2009, 7 (5): e1000112. 10.1371/journal.pbio.1000112.
- Consortium IHGS: Finishing the euchromatic sequence of the human genome. Nature. 2004, 431 (7011): 931-945. 10.1038/nature03001.
- Kasprzyk A, Keefe D, Smedley D, London D, Spooner W, Melsopp C, Hammond M, Rocca-Serra P, Cox T, Birney E: EnsMart: a generic system for fast and flexible access to biological data. Genome Res. 2004, 14 (1): 160-169.
- Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D: The human genome browser at UCSC. Genome Res. 2002, 12 (6): 996-1006.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.