The evolution of the coding exome of the Arabidopsis species - the influences of DNA methylation, relative exon position, and exon length
© Chen et al.; licensee BioMed Central Ltd. 2014
Received: 27 March 2014
Accepted: 19 June 2014
Published: 25 June 2014
The evolution of the coding exome is a major driving force of functional divergence both between species and between protein isoforms. Exons at different positions in the transcript or in different transcript isoforms may (1) mutate at different rates due to variations in DNA methylation level; and (2) serve distinct biological roles, and thus be differentially targeted by natural selection. Furthermore, intrinsic exonic features, such as exon length, may also affect the evolution of individual exons. Importantly, the evolutionary effects of these intrinsic/extrinsic features may differ significantly between animals and plants. Such inter-lineage differences, however, have not been systematically examined.
Here we examine how DNA methylation at CpG dinucleotides (CpG methylation), in the context of intrinsic exonic features (exon length and relative exon position in the transcript), influences the evolution of coding exons of Arabidopsis thaliana. We observed fairly different evolutionary patterns in A. thaliana as compared with those reported for animals. Firstly, the mutagenic effect of CpG methylation is the strongest for internal exons and the weakest for first exons despite the stringent selective constraints on the former group. Secondly, the mutagenic effect of CpG methylation increases significantly with length in first exons but not in the other two exon groups. Thirdly, CpG methylation level is correlated with evolutionary rates (dS, dN, and the dN/dS ratio) with markedly different patterns among the three exon groups. The correlations are generally positive, negative, and mixed for first, last, and internal exons, respectively. Fourthly, exon length is a CpG methylation-independent indicator of evolutionary rates, particularly for dN and the dN/dS ratio in last and internal exons. Finally, the evolutionary patterns of coding exons with regard to CpG methylation differ significantly between Arabidopsis species and mammals.
Our results suggest that intrinsic features, including relative exonic position in the transcript and exon length, play an important role in the evolution of A. thaliana coding exons. Furthermore, CpG methylation is correlated with exonic evolutionary rates differentially between A. thaliana and animals, and may have served different biological roles in the two lineages.
Keywords: DNA methylation; Exon evolution; Evolutionary rate; Relative exon position; Exon length
The evolution of the coding exome is a major driving force of functional divergence. In the past, a coding gene was considered the basic unit of biological regulation and molecular function. As such, in the majority of evolutionary studies, the “functional unit” targeted by natural selection is presumed to be a gene. However, with the advances in molecular biology and high-throughput sequencing technologies, it has gradually become clear that alternative transcript isoforms of the same gene (and the corresponding protein products) can be spatio-temporally regulated, and convey fairly divergent biological functions [1–5]. In other words, in many cases, a “transcript” rather than a “gene” is the biologically functional unit. Transcript isoforms are particularly important in complex organisms because these organisms have highly developed networks of transcript/protein isoforms.
Transcript isoforms of the same gene differ from each other by alternatively spliced exonic regions. In cases where transcript isoforms convey distinct biological functions, the alternatively spliced exonic regions are crucial for the between-isoform functional divergences. These exonic regions should be accordingly targeted by natural selection. Therefore, the biological functions of alternative (and non-alternative) exonic sequences and the selection pressure thereon can be revealed by examining the evolutionary patterns of these sequences [6–11].
We previously examined the determinants of exonic evolutionary rates in mammals and Arabidopsis species. The biological factors that affect exonic evolutionary rates were found to differ between these two lineages [6, 7]. In addition, we discovered that in mammals, the position of an exon (first, last, or internal exon) in the transcript is significantly associated with the evolution of the exonic sequence in accordance with the level of DNA methylation at CpG dinucleotides (“CpG methylation” for short). This is probably because the position of an exon is related to its biological function (or lack of function), thus making the exon either selectively constrained for the function mediated by CpG methylation, or prone to the mutagenic effect of CpG methylation. However, whether this proposition is also true for Arabidopsis remains unexplored.
Plant coding exons differ from their mammalian counterparts in several aspects. Firstly, alternative RNA splicing is less well developed, and plays a less important role in exon evolution in plants than in mammals [5, 7]. Secondly, on average, a plant gene includes fewer but longer exons than a mammalian gene [13–15]. Thirdly, the effective population sizes of plants (Arabidopsis thaliana as an example) are considerably larger than those of mammals (e.g., human and mouse), giving rise to a higher efficiency of natural selection on plant exonic sequences. Given these differences, we expect the evolutionary patterns of Arabidopsis exons at different positions to diverge from those of their mammalian counterparts.
In this study, we systematically examined the mutational effects of CpG methylation and its correlations with exonic evolutionary rates for A. thaliana coding exons at different positions. Our results indicate that first, last, and internal coding exons of A. thaliana have fairly different evolutionary patterns in this regard. The three exon groups diverge significantly in their liability to CpG methylation-related mutagenesis. Furthermore, the CpG methylation-evolutionary rate correlations differ significantly among the three exon groups. These correlations also differ significantly between Arabidopsis species and mammals. In addition, we found exon length to be a CpG methylation-independent indicator of exonic evolutionary rates in Arabidopsis species. Our results suggest that intrinsic exonic features (relative position and length) may be important determinants for the evolution of A. thaliana coding exons, and that CpG methylation may play different biological roles in the coding exons of mammals and Arabidopsis species.
The mutagenic effect of CpG methylation for exons at different positions
Table 1. The methylome datasets and the background exome dataset analyzed in this study
[Table body not recoverable. Column headers: bisulfite-seq read depth; average mCG density (per 100 sampled CpGs); average mCG density (per 100 sampled bp). One surviving cell: 282.9 ± 325.0.]
The next question is whether the mutagenic effect of CpG methylation differs for first, last, and internal exons. To address this issue, we evaluated the Pearson’s correlations as described above separately for each of the three exon groups. Unexpectedly, as shown in Figure 1B, although the mCG-CpG O/E correlations remain negative across the three exon groups, the strongest and the weakest mutagenic effects occur in internal and first exons, respectively. This is contrary to what was previously observed for mammals, where the strongest mutagenic effect of CpG methylation occurs in first exons and the weakest in internal exons.
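As a minimal sketch of the per-group analysis described here, the following Python snippet computes the Pearson correlation between mCG density and CpG O/E separately for each exon group. The record layout `(group, mcg_density, cpg_oe)`, the function names, and the toy values are assumptions made purely for illustration; this is not the authors' pipeline.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_by_group(exons):
    """Group (group, mcg_density, cpg_oe) records and correlate within each group."""
    grouped = defaultdict(lambda: ([], []))
    for group, mcg_density, cpg_oe in exons:
        grouped[group][0].append(mcg_density)
        grouped[group][1].append(cpg_oe)
    return {g: pearson(xs, ys) for g, (xs, ys) in grouped.items()}


# Invented toy records (hypothetical values, not data from the study).
toy_exons = [
    ("first", 0.0, 1.0), ("first", 50.0, 0.5), ("first", 100.0, 0.0),
    ("internal", 10.0, 0.9), ("internal", 40.0, 0.7), ("internal", 90.0, 0.2),
]
for group, r in correlation_by_group(toy_exons).items():
    print(group, round(r, 3))
```

A more negative coefficient within a group corresponds to a stronger apparent mutagenic effect of CpG methylation in that group, which is how the three exon groups are compared in the text.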
The evolutionary rate profiles apparently are inconsistent with the mCG-related mutagenic effect profiles in first, last, and internal exons. Specifically, in internal exons, we observe the co-occurrence of a high median mCG density (Additional file 1), a strong mutagenic effect of CpG methylation (Figure 1B), and low evolutionary rates as compared with the other two exon groups (Figure 2). One possible explanation is that the strong selection pressure imposed on internal exons has significantly constrained the mCG-related mutations from occurring in this exon group. This appears to be true judging from the low dN/dS ratio in internal exons as compared with the other two exon groups (Figure 2). Interestingly, the median dS is also the lowest in internal exons, suggesting that synonymous substitutions are subject to strong purifying selection in this exon group.
The correlations between CpG methylation and exonic evolutionary rates
The associations between exon length and exonic sequence evolution
Table 2. The Spearman’s coefficient of correlation (ρ) between exon length and the dN/dS ratio, dN, and dS before (upper row; “Original”) and after (lower row; “Control”) controlling for four potential confounding factors (ASE/CSE exon type, proportion of repetitive elements/disordered regions, and exonic expression level)
The above observations may be confounded by other biological factors. For example, ASEs are known to have increased dN and dN/dS ratios as compared with CSEs [8, 10, 18, 19]. Therefore, the increase in dN and the dN/dS ratio in longer exons might have resulted from an increase in the proportion of ASEs. Meanwhile, the proportion of repetitive elements (in terms of length) is also correlated with evolutionary rates because these elements are subject to relaxed selective constraints. Similar comments also apply to intrinsically disordered protein regions [21–23]. The next factor to consider is exonic expression level, which has been shown to be an important determinant of dN and the dN/dS ratio [6, 7]. We thus conducted partial Spearman’s correlation analyses while simultaneously controlling for all four of these factors (the ASE/CSE exon type, proportion of repetitive elements/disordered regions, and exonic expression level). As shown in Table 2 (“Control”), the results remain virtually the same.
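The idea of a partial Spearman's correlation can be sketched as follows: rank-transform every variable (averaging tied ranks), then compute a partial Pearson correlation on the ranks. The sketch below uses the standard recursive formula for partial correlation; the function names are hypothetical, the toy vectors are invented, and the authors' actual software is not stated in the text.

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, controls):
    """Partial correlation of x and y given a list of control variables."""
    if not controls:
        return pearson(x, y)
    z, rest = controls[0], controls[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_spearman(x, y, controls):
    """Partial Spearman correlation: partial Pearson correlation on ranks."""
    return partial_corr(average_ranks(x), average_ranks(y),
                        [average_ranks(c) for c in controls])


# Invented toy vectors: x and y rise together overall, but within each
# level of the control variable z they are ordered in opposite directions.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]
z = [1, 1, 2, 2, 3, 3]
print(partial_spearman(x, y, []), partial_spearman(x, y, [z]))
```

In this toy case the uncontrolled coefficient is positive while the controlled one is negative, which illustrates why the text reports results both before and after controlling for confounders. The recursion evaluates three lower-order correlations per control, so it is only practical for a handful of controls, as here.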
The last but critical factor to control for is the level of CpG methylation. To evaluate the influence of CpG methylation, we have to employ the sperm methylome datasets (S1 to S4), which include considerably fewer but longer exons as compared with the background dataset (Table 1). Using the four sperm methylome datasets, we again conducted partial Spearman’s correlation analysis between exon length and evolutionary rates while simultaneously controlling for mCG density, ASE/CSE exon type, proportion of repetitive elements/disordered regions, and exonic expression level. As shown in Additional file 3, for first exons, the results are similar but the ρ values are decreased. This observation indicates that the length dependence of the mCG-related mutagenic effect in first exons (Figure 4) accounts for part but not all of the length dependence of dN and dS in this exon group. Meanwhile, for last exons, the exon length-dS correlation becomes statistically insignificant in S2 and S4, whereas the exon length-dN and the exon length-dN/dS correlations remain statistically significant with decreased ρ values. These results imply that mCG-related mutations may account for part of the length dependence of dS, dN, and the dN/dS ratio in last exons. However, the decreases in ρ value and the level of statistical significance may also be ascribable to the decrease in sample size and the bias in exon length. By comparison, for internal exons, all of the correlations remain statistically significant with two notable changes as compared with the results in Table 2: (1) the ρ values of the exon length-dN and the exon length-dN/dS correlations are increased; and (2) the ρ values of the exon length-dS correlations turn negative. Therefore, for internal exons, mCG-related mutations appear to be an important factor affecting dS. Nevertheless, mCG-related mutations cannot explain the length dependence of dN and the dN/dS ratio in this exon group.
Taken together, our results indicate that the correlations between exon length and the evolutionary measurements (dN, dS and the dN/dS ratio) are unaffected by the ASE/CSE exon type, proportion of repetitive elements/disordered region, and exonic expression level in any of the three exon groups. However, the level of CpG methylation may account for part of the exon length-evolutionary rate correlations differentially for first, last, and internal exons. In summary, exon length appears to be a CpG methylation-independent indicator for dN in all of the three exon groups, and for the dN/dS ratio in last and internal exons of A. thaliana.
We have shown that for the coding sequences of A. thaliana, the mutagenic effects of CpG methylation differ between exons at different relative positions. Among the three compared exon groups (first, last, and internal), the highest CpG methylation level and the strongest mutagenic effect of CpG methylation both occur in internal coding exons (Figure 1 and Additional file 1) despite the most stringent selective constraint (lowest dN/dS ratio) on this exon group (Figure 2). First coding exons, by contrast, have the lowest level of CpG methylation and suffer the weakest mutagenic effect of CpG methylation, yet evolve the most rapidly. Interestingly, we show that mCG density is (weakly) positively correlated with dS, dN, and the dN/dS ratio in first exons, yet the same correlations are significantly negative for last exons. For internal exons, the correlations are negative, weakly negative, and positive for dS, dN, and the dN/dS ratio, respectively (Figure 4 and Additional file 2). The mutagenic effect of CpG methylation cannot fully explain these observations. Apparently, selection pressure has played a major role here. We have previously reported that in mammals, CpG methylation may have different biological roles in first, last, and internal coding exons. Similar comments may also apply to Arabidopsis species: first exons are more liable to the mutagenic effects, whereas the other two exon groups are more affected by the regulatory functions of CpG methylation. Notably, however, the correlations between mCG density and evolutionary rates diverge significantly between Arabidopsis species and mammals. One striking difference is that for internal exons, the mCG density-dS and mCG density-dN/dS correlations run in opposite directions in the two lineages. Such divergences suggest that the biological roles of CpG methylation in coding exons have diverged significantly between the two lineages.
We also report here that exon length is an indicator of evolutionary rates of coding exons in Arabidopsis species. And this is not confounded by the ASE/CSE exon type, the proportion of repetitive elements, the proportion of intrinsically disordered regions, or exonic expression level. One may suspect that this observation has resulted from alignment errors, leading to increased dN and dN/dS ratios in longer exons. However, this is unlikely to be the case for two reasons. Firstly, the compared species - A. thaliana and A. lyrata - are very closely related. The median dN value of first exons (which evolve the most rapidly among the three groups) is smaller than 0.03 (Figure 2). Alignment errors may be a minor issue for sequence pairs with such a high level of similarity. Secondly, the length dependence of dN and dN/dS ratio is unlikely to result from the alignment between paralogous exonic sequences. This is because to observe such length dependence, we should have systematically aligned orthologous sequences for shorter exons but paralogous sequences for longer exons. We perceive no possible reasons why this may happen. Another possible explanation for the length dependence of dN and dN/dS ratio is annotation error. However, this may not be a major problem judging from the small evolutionary rates as shown in Figure 2.
The coding exons of animal and plant genes differ from each other in a number of biological features. One example is microRNA (miRNA) targeting sites. Previous studies have reported that genes targeted by more miRNAs tend to be under stronger selective constraints [24–26]. A recent study indicated that in mammals, approximately 2% of the synonymous sites were selectively constrained for such regulatory sequences as splicing motifs, enhancers, and miRNA target sites. For A. thaliana, it was predicted that ~75% of miRNA target sites were located in CDS. In comparison, only 53.4% and 56.5% of miRNA targets were predicted to reside in CDS in human and mouse, respectively. One important question is whether differential miRNA targeting is the true reason for the differences in the mCG density-evolutionary rate correlations between Arabidopsis species and mammals (Figure 3). Recall that the differences between the two lineages lie mainly in the mCG density-dS correlations in internal and last exons. These correlations are significantly positive in mammals but negative in Arabidopsis. This divergence implies that for internal and last exons in mammals, the principal biological role of mCG is mutagenesis. In Arabidopsis, however, mCG density may be associated with other selection-constrained biological functions. If the divergence in mCG density-dS correlations is to be ascribed to the higher proportion of miRNA target sites in the CDS of Arabidopsis, three prerequisites should be fulfilled: (1) in the internal and last exons of Arabidopsis, mCG density must be positively correlated with the probability of miRNA targeting; (2) miRNA targeting must be significantly constrained by selection in the two exon groups of Arabidopsis; and (3) this miRNA targeting-related selection affects only synonymous sites in internal and last exons of Arabidopsis. An example of miRNA-mediated DNA methylation has been reported for rice.
The authors discovered that a specific group of 24-nucleotide (nt) miRNAs could mediate DNA methylation within a ~80-nt region around the target sites. However, only five such targets were identified, and most of the methylation occurred in the CHH or CHG context. A recent follow-up study showed that 65 of the 24-nt miRNAs exhibited elevated CHH methylation (but not CpG methylation) around their target sites. These studies imply that miRNA targeting may lead to an increased level of DNA methylation in the gene body of plants (which, in fact, was also observed in human). Of note, nevertheless, each miRNA was predicted to have only one target site in the target gene. Furthermore, only 13 of the 65 target sites were located in CDS. Meanwhile, a recent study suggested that the miRNA target sites in CDS were subject to negative selection. These observations seem to suggest a connection between miRNA targeting and the mCG density-dS correlations in plants. However, we speculate that the influences of miRNA targeting might be insubstantial for three reasons. First, only a relatively small number (tens) of miRNAs have been reported to cause DNA methylation at the target sites, and most of these sites lie outside of CDS. miRNA-mediated methylation in CDS thus may be uncommon. Second, the sequences that are subject to miRNA-mediated methylation account for a minority (~80 nt or ~200 nt) of the average CDS length of ~1300 bp in the A. thaliana genome. Certainly, we cannot exclude the possibility that a methylation-inducing miRNA has multiple target sites in one gene, or that a gene is targeted by multiple methylation-inducing miRNAs. In such cases, the effects of miRNA targeting will undoubtedly be non-negligible. Nevertheless, these scenarios were not observed in the recent studies [30, 31]. The overall influences of miRNA targeting on CDS methylation thus might be immaterial.
Third, the identified miRNA-mediated DNA methylation occurred mostly in the CHH or CHG contexts [30, 31]. Since we focus on methylation at CpG dinucleotides, the influences of miRNA-mediated methylation on our analysis should be fairly limited.
Another potential confounding factor in the mCG density-evolutionary rate analysis is the level of protein phosphorylation. Phosphorylated amino acid residues are known to evolve more slowly than unphosphorylated ones [34–37]. Since the motifs for phosphorylation differ between Arabidopsis and mammals [38, 39], the evolutionary rates of coding exons in the two lineages may be differentially affected by phosphorylation-related constraints. However, phosphorylation occurs at amino acid residues. The selective constraints at the amino acid level influence dN but not dS. Note that the mCG density-dN correlations are generally similar between mammals and Arabidopsis (Figure 3). Therefore, phosphorylation appears to have no significant effects on the differences in the mCG density-dN correlations between the two lineages.
One may suspect that the correlations between exon length and dN and the dN/dS ratio have resulted from functional biases. This is because exons of different lengths may belong to genes of different functional categories. To examine this possibility, we divided the background dataset (Table 1) into five length subgroups and conducted an all-to-all pairwise comparison of gene ontology functional categories between the five subgroups of internal exons using FatiGO. As shown in Additional file 4, although the five length subgroups of internal exons differ from one another in view of gene ontology annotations, we do not observe any particular trend that may cause the length dependence of dN and the dN/dS ratio. We also examined whether the correlations between mCG density and evolutionary rate could differ between different functional categories. We classified the analyzed genes according to the third level of “Molecular Function” of Gene Ontology, and calculated the correlations for nine functional groups that included ≥ 1000 genes. Note that one gene can be assigned to multiple functional groups. The sum of genes in all of the functional groups thus outnumbers the analyzed genes. The mCG density-evolutionary rate correlations in individual functional groups are similar to what we observed in Figure 3 (Additional file 5). Therefore, functional bias may not be a major concern in our analysis.
The correlations between exon length and evolutionary rates in Arabidopsis species have been previously observed. However, the underlying mechanism remains unclear. Here we show that first, last, and internal coding exons diverge from each other in terms of the exon length-dN/dS ratio correlation: the correlation is stronger in internal exons than in last exons, and is statistically insignificant in first exons. The length dependence of the dN/dS ratio in last and internal exons remains statistically significant after controlling for potential confounding factors (the ASE/CSE exon type, the content of repetitive elements/disordered regions, exonic expression level, and the level of CpG methylation). Of note, for last and internal exons, this length dependence occurs because longer exons have a larger increase in dN than in dS when compared with shorter exons. This increase in dN is probably unrelated to structural-functional reasons, for the proportion of disordered protein region (which is an indicator of protein structural flexibility and is strongly associated with the content of protein domains) does not significantly affect the exon length-dN/dS ratio correlations. It will be interesting to test the evolutionary neutrality of exons of different lengths when adequate polymorphism data become available.
Meanwhile, it has been recently reported that in human, transcription factor binding sites (TFBS) frequently reside in coding exons, and may significantly affect the evolution of these exonic sequences. The same comment may also apply to A. thaliana. However, currently no base-resolution TFBS datasets are available for A. thaliana. We may revisit this issue and investigate whether the density of TFBS is associated with the observed length dependence of dN and the dN/dS ratio when such datasets are accessible.
One important issue is that we analyzed only one plant species in this study. Whether the observations in A. thaliana can be applied to other plant species remains unknown. To address this issue, we retrieved three genome-scale methylome datasets of rice (Oryza sativa L. ssp. japonica). Two of the datasets were derived from young panicles, and the other was derived from leaves (Additional file 6). Our analysis confirmed the mutagenic effect of mCG on coding exons and the stronger mutagenic effect on non-first exons than on first exons in rice (Additional file 7). The evolutionary rates of first, last, and internal exons were similar to what we observed in A. thaliana (Additional file 8). Intriguingly, however, the correlations between mCG density and evolutionary rates were fairly different between rice (Additional file 9) and A. thaliana (Figure 3). Particularly, in view of the mCG density-dS correlations in last and internal exons, rice was similar to mammals but not to A. thaliana. Of note, the rice methylome data were derived from panicles and leaves but not gamete cells. Whether the identified mCGs and the associated substitutions are heritable is therefore questionable. To be sure, we cannot rule out the possibility that the differences in mCG density-dS correlations between A. thaliana and rice represent genuine divergences in the biological roles of mCG. Adding to the complexity of this issue is that the domesticated rice (O. sativa ssp. japonica) has been artificially selected. It will be interesting to re-examine this topic when the gamete methylome datasets of both cultivated and wild rice are available.
The mammal-Arabidopsis divergence in the association between DNA methylation and coding exon evolution is unexpected. DNA methylation is a major source of genomic sequence mutation on the one hand, and an important transcriptional/splicing regulator on the other. Our results imply that the balance between these biological roles of DNA methylation in coding exons may have differed significantly between Arabidopsis and mammals in a length- and position-dependent manner. The detailed evolutionary mechanisms and functional outcomes are worth further exploration.
Measurement of CpG methylation level and the CpG O/E
The genome-scale, single base-resolution DNA methylation datasets of A. thaliana sperm were retrieved from a recent study under accession number SRX156133 (Table 1). The bisulfite sequence reads were mapped to the genome of A. thaliana (TAIR10), and the methylated CpGs were identified by BS-Seeker with default parameters. To ensure data quality, only the CpG dinucleotides that are covered by ≥5 bisulfite reads were retained (such CpG dinucleotides are designated as “sampled CpGs”). The methylation status of a CpG was represented as the percentage of reads that support the methylation of this CpG site. Only the CpGs with a methylation frequency of ≥80% were regarded as methylated [46, 47], and designated as “mCGs”. Since the accuracy of evolutionary rate estimates may be compromised in the case of short exons (e.g., <50 bp) [18, 21, 48], we only considered the CDSs that are longer than 80 bp and contain ≥10 sampled CpGs to ensure that the CDSs contain sufficient information. Here we focus on CpG methylation because the other types (CHG and CHH) of methylation are relatively rare, and may have only minor effects on the evolution of Arabidopsis exons.
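The filtering thresholds stated above can be sketched in a few lines of Python. The `CpgSite` record and the function names are hypothetical and exist only for illustration; the thresholds (≥5 reads, ≥80% methylation frequency, CDS longer than 80 bp, ≥10 sampled CpGs) are taken from the text. In the study itself the per-site read counts would come from the BS-Seeker output, whose format is not reproduced here.

```python
from dataclasses import dataclass

MIN_READS = 5            # coverage needed for a "sampled CpG"
MIN_METH_PERCENT = 80    # methylation frequency needed for an "mCG"
MIN_CDS_LENGTH = 80      # CDS must be longer than this many bp
MIN_SAMPLED_CPGS = 10    # CDS must contain at least this many sampled CpGs


@dataclass
class CpgSite:
    """Read counts at one CpG dinucleotide (hypothetical record layout)."""
    methylated_reads: int
    total_reads: int


def classify_sites(sites):
    """Return (sampled CpGs, mCGs) according to the stated thresholds."""
    sampled = [s for s in sites if s.total_reads >= MIN_READS]
    # Integer comparison avoids floating-point issues at exactly 80%.
    mcgs = [s for s in sampled
            if 100 * s.methylated_reads >= MIN_METH_PERCENT * s.total_reads]
    return sampled, mcgs


def passes_cds_filter(cds_length, n_sampled_cpgs):
    """True if a CDS is long enough and has enough sampled CpGs to be analyzed."""
    return cds_length > MIN_CDS_LENGTH and n_sampled_cpgs >= MIN_SAMPLED_CPGS
```

Note that a fully methylated site with fewer than five reads is dropped before the 80% rule is applied, so it counts neither as a sampled CpG nor as an mCG.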
The level of CpG methylation of a particular exonic region was represented by the “mCG density”, which was measured by calculating the number of mCGs per 100 CpG dinucleotides, and was defined as $\text{mCG density} = \frac{\text{number of mCGs}}{\text{number of sampled CpGs}} \times 100$.
The CpG O/E was defined as $\text{CpG O/E} = \frac{P_{CpG}}{P_{C} \times P_{G}}$, where $P_{CpG}$, $P_{C}$, and $P_{G}$ represent the frequency of CpG dinucleotides, C nucleotides, and G nucleotides, respectively.
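The two measures defined in this section can be sketched in Python as follows. The function names are hypothetical. One detail is an assumption rather than something stated in the text: the CpG dinucleotide frequency is computed over the n - 1 overlapping dinucleotide positions of a sequence of length n, and other normalizations are possible.

```python
def mcg_density(n_mcgs, n_sampled_cpgs):
    """Number of mCGs per 100 sampled CpG dinucleotides."""
    if n_sampled_cpgs <= 0:
        raise ValueError("need at least one sampled CpG")
    return 100.0 * n_mcgs / n_sampled_cpgs


def cpg_oe(sequence):
    """Observed-to-expected CpG ratio: P_CpG / (P_C * P_G)."""
    seq = sequence.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence too short")
    p_c = seq.count("C") / n
    p_g = seq.count("G") / n
    if p_c == 0 or p_g == 0:
        raise ValueError("CpG O/E is undefined without both C and G")
    n_cpg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    p_cpg = n_cpg / (n - 1)  # assumption: frequency over n - 1 dinucleotide positions
    return p_cpg / (p_c * p_g)


print(mcg_density(3, 12))   # 3 mCGs among 12 sampled CpGs
print(cpg_oe("ACGT"))
```

A CpG O/E well below 1 indicates depletion of CpG dinucleotides relative to base composition, which is why a negative correlation between mCG density and CpG O/E is read as a mutagenic effect of methylation.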
Classification of coding exons
The A. thaliana gene annotations and the corresponding coding sequences were downloaded from the Ensembl genome browser at http://www.ensembl.org/. The CDSs that overlap with non-coding RNAs or pseudogenes were excluded. Single-exon genes were also excluded. According to the relative positions of exons in the Ensembl-annotated genes, the retrieved coding exonic regions were divided into three groups: first, internal, and last exons. Briefly, all of the transcript isoforms of a gene were collated (except for those that overlapped non-coding RNAs or pseudogenes), and the coordinates of the exons were compared. The coding exon that was closest to the most downstream 5’UTR and the most upstream 3’UTR was classified as the first and last coding exon, respectively. However, in the case where a stand-alone 5’UTR exon was followed by a second 5’UTR juxtaposed to a coding exon, this coding exon was excluded. This is because in this case, the first coding exon is not part of the most upstream exonic region. The same comment also applied to the last exon. The remaining exons that were neither first nor last coding exons were considered as internal exons. The retrieved exons were also classified into constitutively and alternatively spliced exons (CSEs and ASEs, respectively) according to whether they were always present in different transcript isoforms of a gene.
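A simplified sketch of this classification is given below. It makes several assumptions that the actual procedure does not: exons are given as `(start, end)` coordinates on the plus strand, they do not overlap, UTR-related exclusions are ignored, and position is decided simply by coordinate order in the collated set of coding exons of one gene. The function name and the toy isoforms are hypothetical.

```python
from collections import Counter


def classify_exons(isoforms):
    """Classify the coding exons of one gene.

    isoforms: dict mapping isoform id -> list of (start, end) coding exons.
    Returns {exon: (position, splicing_type)} with position in
    {"first", "internal", "last"} and splicing_type in {"CSE", "ASE"}.
    """
    usage = Counter()
    for exons in isoforms.values():
        usage.update(set(exons))      # count each exon once per isoform
    ordered = sorted(usage)           # collated exons, 5' to 3' (plus strand assumed)
    if len(ordered) < 2:
        return {}                     # single-exon genes are excluded
    result = {}
    for i, exon in enumerate(ordered):
        if i == 0:
            position = "first"
        elif i == len(ordered) - 1:
            position = "last"
        else:
            position = "internal"
        # Constitutive if present in every isoform, alternative otherwise.
        splicing = "CSE" if usage[exon] == len(isoforms) else "ASE"
        result[exon] = (position, splicing)
    return result


# Invented toy gene with two isoforms; the middle exon is skipped in t2.
toy_gene = {
    "t1": [(100, 200), (300, 400), (500, 600)],
    "t2": [(100, 200), (500, 600)],
}
for exon, labels in classify_exons(toy_gene).items():
    print(exon, labels)
```

A faithful implementation would additionally need the UTR annotations, since the text excludes a coding exon when a stand-alone 5’UTR exon precedes it.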
Measurement of exonic expression level
The transcriptome data for A. thaliana pollen derived from a recent study were retrieved from the Gene Expression Omnibus database under accession number SRP022162. The sequencing reads were mapped to the A. thaliana genome by using TopHat 2, and then analyzed by using eXpress to obtain exonic expression levels.
Predictions of intrinsically disordered regions and repetitive elements
The genomic and peptide sequences of A. thaliana retrieved from the ENSEMBL Plants website were submitted to RepeatMasker and Disopred, respectively, for predictions of repetitive elements and intrinsically disordered regions. The prediction tools were applied with default parameters. The proportions of exonic regions that overlapped repetitive elements and disordered regions were then calculated separately.
Calculation of evolutionary rates
The one-to-one gene orthology between A. thaliana and A. lyrata was retrieved from ENSEMBL Plants (Version 18). The protein sequences of the orthologous genes were aligned using MUSCLE and then back-translated to nucleotide sequences. The aligned sequences were then separated exon-wise according to the annotations of ENSEMBL. The exonic sequence alignments were checked for the correctness of reading frame before being submitted to the CodeML program of PAML4 for the calculations of dN, dS and the dN/dS ratio.
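The back-translation step can be illustrated with a short Python sketch: each residue of the aligned protein is replaced by its original codon, and each alignment gap by three gap characters, so that the nucleotide alignment stays in frame. The function names are hypothetical and the toy sequences are invented; the tool the authors actually used for this step is not named in the text.

```python
def back_translate(aligned_protein, cds):
    """Thread a coding sequence onto a gapped protein alignment row.

    aligned_protein: protein sequence with '-' for alignment gaps.
    cds: the ungapped coding sequence encoding that protein
         (a trailing stop codon, if present, is ignored).
    """
    if len(cds) % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(codons) < n_residues:
        raise ValueError("CDS is shorter than the protein it should encode")
    remaining = iter(codons)
    return "".join("---" if aa == "-" else next(remaining)
                   for aa in aligned_protein)


def is_in_frame(codon_alignment_row):
    """Basic reading-frame check: length divisible by three, gaps in whole codons."""
    if len(codon_alignment_row) % 3 != 0:
        return False
    triplets = (codon_alignment_row[i:i + 3]
                for i in range(0, len(codon_alignment_row), 3))
    return all(t == "---" or "-" not in t for t in triplets)


# Invented toy example: the second alignment column is a gap in this sequence.
row = back_translate("M-K", "ATGAAA")
print(row, is_in_frame(row))
```

Aligning at the protein level and threading codons back keeps gaps in multiples of three, which is what the codon-based dN and dS estimation requires.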
- CpG methylation: The level of DNA methylation at CpG dinucleotides
- CpG O/E ratio: The observed-to-expected ratio of the number of CpG dinucleotides
- dN: Nonsynonymous substitution rate
- dS: Synonymous substitution rate
- mCG: Methylated CpG dinucleotide
- TFBS: Transcription factor binding site
We thank Dr. Pao-Yang Chen and Mr. Wen-Wei Liao at Academia Sinica, and Dr. Wen Wang and Xin Li at the Kunming Institute of Zoology for technical assistance in processing the methylome data. We are also grateful to Dr. Ben-Yang Liao for constructive comments. This study was supported by the Ministry of Science and Technology under contract number NSC-102-2311-B-400-003 (to FCC) and NSC-102-2621-B-001-003 (to TJC).
- Kelemen O, Convertini P, Zhang Z, Wen Y, Shen M, Falaleeva M, Stamm S: Function of alternative splicing. Gene. 2013, 514 (1): 1-30.
- Singh RK, Cooper TA: Pre-mRNA splicing in disease and therapeutics. Trends Mol Med. 2012, 18 (8): 472-482.
- Carvalho RF, Feijao CV, Duque P: On the physiological significance of alternative splicing events in higher plants. Protoplasma. 2012, [Epub ahead of print]
- Kalsotra A, Cooper TA: Functional consequences of developmentally regulated alternative splicing. Nat Rev Genet. 2011, 12 (10): 715-729.
- Keren H, Lev-Maor G, Ast G: Alternative splicing and evolution: diversification, exon definition and function. Nat Rev Genet. 2010, 11 (5): 345-355.
- Chen FC, Liao BY, Pan CL, Lin HY, Chang AY: Assessing determinants of exonic evolutionary rates in mammals. Mol Biol Evol. 2012, 29 (10): 3121-3129.
- Wu GC, Chen FC: Determinants of exon-level evolutionary rates in Arabidopsis species. Evol BioInform Online. 2012, 8: 389-415.
- Chen FC, Chaw SM, Tzeng YH, Wang SS, Chuang TJ: Opposite evolutionary effects between different alternative splicing patterns. Mol Biol Evol. 2007, 24 (7): 1443-1446.
- Chen FC, Chen CJ, Ho JY, Chuang TJ: Identification and evolutionary analysis of novel exons and alternative splicing events using cross-species EST-to-genome comparisons in human, mouse and rat. BMC Bioinformatics. 2006, 7: 136.
- Chen FC, Chuang TJ: Different alternative splicing patterns are subject to opposite selection pressure for protein reading frame preservation. BMC Evol Biol. 2007, 7 (1): 179.
- Gelly JC, Lin HY, de Brevern AG, Chuang TJ, Chen FC: Selective constraint on human pre-mRNA splicing by protein structural properties. Genome Biol Evol. 2012, 4 (9): 966-975.
- Chuang TJ, Chen FC, Chen YZ: Position-dependent correlations between DNA methylation and the evolutionary rates of mammalian coding exons. Proc Natl Acad Sci U S A. 2012, 109 (39): 15841-15846.
- Mouse Genome Sequence Consortium: Initial sequencing and comparative analysis of the mouse genome. Nature. 2002, 420 (6915): 520-562.
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, et al: Initial sequencing and analysis of the human genome. Nature. 2001, 409 (6822): 860-921.
- Arabidopsis Genome Initiative: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000, 408 (6814): 796-815.
- Gossmann TI, Woolfit M, Eyre-Walker A: Quantifying the variation in the effective population size within a genome. Genetics. 2011, 189 (4): 1389-1402.
- Cokus SJ, Feng S, Zhang X, Chen Z, Merriman B, Haudenschild CD, Pradhan S, Nelson SF, Pellegrini M, Jacobsen SE: Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature. 2008, 452 (7184): 215-219.
- Chen FC, Chuang TJ: The effects of multiple features of alternatively spliced exons on the K (A)/K (S) ratio test. BMC Bioinformatics. 2006, 7: 259.
- Chen FC, Wang SS, Chen CJ, Li WH, Chuang TJ: Alternatively and constitutively spliced exons are subject to different evolutionary forces. Mol Biol Evol. 2006, 23 (3): 675-682.
- Graur D, Li W-H: Fundamentals of Molecular Evolution. 2000, Sunderland, Massachusetts: Sinauer Associates, 2
- Chen FC, Pan CL, Lin HY: Independent effects of alternative splicing and structural constraint on the evolution of mammalian coding exons. Mol Biol Evol. 2011, 29 (1): 187-193.
- Brown CJ, Johnson AK, Daughdrill GW: Comparing models of evolution for ordered and disordered proteins. Mol Biol Evol. 2010, 27 (3): 609-621.
- Brown CJ, Takayama S, Campen AM, Vise P, Marshall TW, Oldfield CJ, Williams CJ, Dunker AK: Evolutionary rate heterogeneity in proteins with long disordered regions. J Mol Evol. 2002, 55 (1): 104-110.
- Cheng C, Bhardwaj N, Gerstein M: The relationship between the evolution of microRNA targets and the length of their UTRs. BMC Genomics. 2009, 10: 431.
- Chen SC, Chuang TJ, Li WH: The relationships among microRNA regulation, intrinsically disordered regions, and other indicators of protein evolutionary rate. Mol Biol Evol. 2011, 28 (9): 2513-2520.
- Chen YC, Cheng JH, Tsai ZT, Tsai HK, Chuang TJ: The impact of trans-regulation on the evolutionary rates of metazoan proteins. Nucleic Acids Res. 2013, 41 (13): 6371-6380.
- Lin MF, Kheradpour P, Washietl S, Parker BJ, Pedersen JS, Kellis M: Locating protein-coding sequences under selection for additional, overlapping functions in 29 mammalian genomes. Genome Res. 2011, 21 (11): 1916-1928.
- Ding J, Li D, Ohler U, Guan J, Zhou S: Genome-wide search for miRNA-target interactions in Arabidopsis thaliana with an integrated approach. BMC Genomics. 2012, 13 (Suppl 3): S3.
- Dweep H, Sticht C, Pandey P, Gretz N: miRWalk–database: prediction of possible miRNA binding sites by “walking” the genes of three genomes. J Biomed Inform. 2011, 44 (5): 839-847.
- Wu L, Zhou H, Zhang Q, Zhang J, Ni F, Liu C, Qi Y: DNA methylation mediated by a microRNA pathway. Mol Cell. 2010, 38 (3): 465-475.
- Hu W, Wang T, Xu J, Li H: MicroRNA mediates DNA methylation of target genes. Biochem Biophys Res Commun. 2014, 444 (4): 676-681.
- Chuang TJ, Chiang TW: Pre-transcriptional DNA Methylation, Transcriptional Transcription Factor and Post-transcriptional microRNA Regulations on Protein Evolutionary Rate. Genome Biol Evol. 2014, In press
- Fang Z, Rajewsky N: The impact of miRNA target sites in coding sequences and in 3'UTRs. PLoS One. 2011, 6 (3): e18067.
- Chen SC, Chen FC, Li WH: Phosphorylated and nonphosphorylated serine and threonine residues evolve at different rates in mammals. Mol Biol Evol. 2010, 27 (11): 2548-2554.
- Aivaliotis M, Macek B, Gnad F, Reichelt P, Mann M, Oesterhelt D: Ser/Thr/Tyr protein phosphorylation in the archaeon Halobacterium salinarum–a representative of the third domain of life. PLoS One. 2009, 4 (3): e4777.
- Levy ED, Michnick SW, Landry CR: Protein abundance is key to distinguish promiscuous from functional phosphorylation based on evolutionary information. Philos Trans R Soc Lond B Biol Sci. 2012, 367 (1602): 2594-2606.
- Freschi L, Osseni M, Landry CR: Functional divergence and evolutionary turnover in mammalian phosphoproteomes. PLoS Genet. 2014, 10 (1): e1004062.
- Villen J, Beausoleil SA, Gerber SA, Gygi SP: Large-scale phosphorylation analysis of mouse liver. Proc Natl Acad Sci U S A. 2007, 104 (5): 1488-1493.
- Wang X, Bian Y, Cheng K, Gu LF, Ye M, Zou H, Sun SS, He JX: A large-scale protein phosphorylation analysis reveals novel phosphorylation motifs and phosphoregulatory networks in Arabidopsis. J Proteomics. 2013, 78: 486-498.
- Al-Shahrour F, Minguez P, Tarraga J, Medina I, Alloza E, Montaner D, Dopazo J: FatiGO +: a functional profiling tool for genomic data. Integration of functional annotation, regulatory motifs and interaction data with microarray experiments. Nucleic Acids Res. 2007, 35 (Web Server issue): W91-96.
- Stergachis AB, Haugen E, Shafer A, Fu W, Vernot B, Reynolds A, Raubitschek A, Ziegler S, LeProust EM, Akey JM, Stamatoyannopoulos JA: Exonic transcription factor binding directs codon choice and affects protein evolution. Science. 2013, 342 (6164): 1367-1372.
- Li X, Zhu J, Hu F, Ge S, Ye M, Xiang H, Zhang G, Zheng X, Zhang H, Zhang S, Li Q, Luo R, Yu C, Yu J, Sun J, Zou X, Cao X, Xie X, Wang J, Wang W: Single-base resolution maps of cultivated and wild rice methylomes and regulatory roles of DNA methylation in plant gene expression. BMC Genomics. 2012, 13: 300.
- Chodavarapu RK, Feng S, Ding B, Simon SA, Lopez D, Jia Y, Wang GL, Meyers BC, Jacobsen SE, Pellegrini M: Transcriptome and methylome interactions in rice hybrids. Proc Natl Acad Sci U S A. 2012, 109 (30): 12040-12045.
- Ibarra CA, Feng X, Schoft VK, Hsieh TF, Uzawa R, Rodrigues JA, Zemach A, Chumak N, Machlicova A, Nishimura T, Rojas D, Fischer RL, Tamaru H, Zilberman D: Active DNA demethylation in plant companion cells reinforces transposon methylation in gametes. Science. 2012, 337 (6100): 1360-1364.
- Chen PY, Cokus SJ, Pellegrini M: BS Seeker: precise mapping for bisulfite sequencing. BMC Bioinformatics. 2010, 11: 203.
- Meissner A, Mikkelsen TS, Gu H, Wernig M, Hanna J, Sivachenko A, Zhang X, Bernstein BE, Nusbaum C, Jaffe DB, Gnirke A, Jaenisch R, Lander ES: Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature. 2008, 454 (7205): 766-770.
- Laurent L, Wong E, Li G, Huynh T, Tsirigos A, Ong CT, Low HM, Kin Sung KW, Rigoutsos I, Loring J, Wei CL: Dynamic changes in the human methylome during differentiation. Genome Res. 2010, 20 (3): 320-331.
- Nekrutenko A, Makova KD, Li WH: The K (A)/K (S) ratio test for assessing the protein-coding potential of genomic regions: an empirical and simulation study. Genome Res. 2002, 12 (1): 198-202.
- Loraine AE, McCormick S, Estrada A, Patel K, Qin P: RNA-seq of Arabidopsis pollen uncovers novel transcription and alternative splicing. Plant Physiol. 2013, 162 (2): 1092-1109.
- Xi L, Feber A, Gupta V, Wu M, Bergemann AD, Landreneau RJ, Litle VR, Pennathur A, Luketich JD, Godfrey TE: Whole genome exon arrays identify differential expression of alternatively spliced, cancer-related genes in lung cancer. Nucleic Acids Res. 2008, 36 (20): 6535-6547.
- Roberts A, Pachter L: Streaming fragment assignment for real-time analysis of sequencing experiments. Nat Methods. 2013, 10 (1): 71-73.
- Smit AFA, Hubley R, Green P: RepeatMasker Open-3.0. 1996–2010, http://www.repeatmasker.org/faq.html
- Ward JJ, McGuffin LJ, Bryson K, Buxton BF, Jones DT: The DISOPRED server for the prediction of protein disorder. Bioinformatics. 2004, 20 (13): 2138-2139.
- Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32 (5): 1792-1797.
- Yang Z: PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007, 24 (8): 1586-1591.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.