Different alternative splicing patterns are subject to opposite selection pressure for protein reading frame preservation

Background: Alternative splicing (AS) has been regarded as capable of altering selection pressure on protein subsequences. In particular, the frequency of reading frame preservation (FRFP), as a measure of selection pressure, has been reported to be higher in alternatively spliced exons (ASEs) than in constitutively spliced exons (CSEs). However, it has recently been reported that different ASE types, namely simple and complex ASEs, may be subject to opposite selection forces. Therefore, it is necessary to re-evaluate the evolutionary effects of such splicing patterns on frame preservation.
Results: Here we show that simple and complex ASEs have, respectively, higher and lower FRFPs than CSEs. Since complex ASEs may alter the ends of their flanking exons, the selection pressure on frame preservation is likely relaxed in this ASE type. Furthermore, conservation of the ASE/CSE splicing pattern increases the FRFPs of simple ASEs but decreases those of complex ASEs. Contrary to the well-recognized concept of strong selection pressure on conserved ASEs for protein reading frame preservation, our results show that conserved complex ASEs are relaxed from such pressure and that the frame-disrupting effect caused by the insertion of complex ASEs can be offset by compensatory changes in their flanking exons.
Conclusion: In this study, we find that simple and complex ASEs undergo opposite selection pressure for protein reading frame preservation, with CSEs in between. Simple ASEs have much higher FRFPs than complex ones. We further find that the FRFPs of complex ASEs coupled with flanking exons are close to those of simple ASEs, indicating that neighboring exons of an ASE may evolve in a coordinated way to avoid protein dysfunction. Therefore, we suggest that evolutionary analyses of AS should take into consideration the effects of different splicing patterns and the joint effects of multiple AS events.


Background
Alternative splicing (AS) is a topic of increasing interest because it has been suggested to be an important contributor to transcriptome/proteome complexity, gene function, and a wide variety of biological processes [1-7]. Previous studies have reported that as many as 40-80% of human genes undergo AS [8-12]. Among the AS events observed in mammals, the most common is the "cassette exon" event, which adds or removes an individual exon in a transcript [13-15]. Cassette exons are sometimes referred to as alternatively spliced exons (ASEs) [16-23]. It has been suggested that ASEs and constitutively spliced exons (CSEs, exons that are always included in the transcript) are under different selection pressures and evolve at distinct rates: the former have higher nonsynonymous (Ka) substitution rates but lower synonymous (Ks) substitution rates than the latter [16,18,19,24-26]. ASEs are regarded as being under relaxed selection pressure because of their dispensability in transcripts. In addition, conserved ASEs (i.e., exons that are alternatively spliced in both of a pair of compared species) have been suggested to be constrained for preservation of the reading frame [18,22,27]. Many studies have pointed out that preservation of the reading frame may indicate functional selection pressure on an AS event [18,22,27-29].
Recently, the Alternative Splicing Database (ASD) project at the European Bioinformatics Institute (EBI) [30] has further classified cassette exons into simple and complex cassette exons ("simple ASEs" and "complex ASEs"). Complex ASEs differ from simple ones in that the former change the lengths of one or both of their flanking exons when they are included in the transcripts, whereas the latter do not (see Fig. 1). Therefore, inclusion of a complex ASE results in simultaneous changes to two or three exons. In contrast, inclusion of a simple ASE does not alter its flanking exon(s) and appears to cause fewer changes. Chen et al. have reported that simple ASEs have higher Ka and lower Ks than CSEs, whereas complex ASEs show the opposite trend relative to CSEs (lower Ka and higher Ks) [31]. They also found that GC content and codon usage bias are associated with increased Ks values in complex ASEs but not in simple ones [31]. These observations modified the previous view that ASEs accelerate the evolution of protein subsequences. However, whether the simple/complex splicing pattern is related to preservation of the reading frame has not been investigated.

Results and discussion
Since ASEs in one species may be CSEs in the other (i.e., lineage-specific ASEs/CSEs), it is necessary to specify the splicing pattern of the exons studied. Therefore, we classify ASEs into three major groups according to splicing pattern conservation (see Methods). Each group is subsequently divided into four subsets (Table 1). We then compare the frequencies of reading frame preservation (designated as "FRFP", i.e., the proportion of exons whose lengths are divisible by 3) between simple and complex ASEs (Fig. 2). For Group A, the FRFPs for human (mouse) simple and complex ASEs are 43.0% (45.7%) and 37.9% (35.5%), respectively. In comparison, the FRFPs of CSEs approximate 40% (39.7% in human and 39.5% in mouse [22]). It has been well recognized that CSEs have lower FRFPs than ASEs. However, we find that although this is true for simple ASEs (P-values < 0.01 in both human and mouse; all statistical tests used in this section are Fisher's exact tests), it does not seem to hold for complex ASEs. The FRFPs are higher in CSEs than in complex ASEs in both species, though the differences are not highly significant (both P-values > 0.01). Overall, our results indicate that simple and complex ASEs are under opposite selection pressure for protein reading frame preservation. In particular, complex ASEs differ significantly in FRFP from ASEs as commonly regarded, most of which are simple ASEs. We then extract conserved ASEs (Group C) from Group A.
Note that "conservation" here refers to the conservation of the ASE/CSE splicing pattern between human and mouse, rather than the simple/complex pattern. We find that the FRFP of Group C simple ASEs increases to 49.8% in human and 53.4% in mouse (Fig. 2). Meanwhile, for Group C complex ASEs, the FRFPs decrease to <35% (34.3% for human; 33.3% for mouse) (Fig. 2). Thus, simple ASEs in Group C have higher FRFPs than in Group A, whereas the reverse is true for complex ASEs in both human and mouse. We then compare the FRFPs of simple and complex ASEs with those of CSEs. For simple ASEs, Group A has lower FRFPs than Group C, while both groups have higher FRFPs than CSEs (Fig. 3). However, for complex ASEs, the trend is reversed. Even if the expected FRFP of CSEs is set at 45% [32], the trends still hold well in conserved ASEs. Therefore, simple and complex ASEs seem to shift FRFPs in opposite directions relative to CSEs. Note that the "CSEs" stated above are those with unspecified splicing pattern conservation. We therefore retrieve 21,669 pairs of conserved CSEs for comparison. The FRFPs of conserved CSEs are 38.4% in human and 38.3% in mouse. These figures further confirm our observation that CSEs tend to have higher FRFPs than complex ASEs but lower FRFPs than simple ones. Overall, our result supports Chen et al.'s suggestion that simple and complex ASEs cause evolutionary changes in opposite directions, with CSEs in between [31].

Figure 1. Two kinds of ASEs analyzed in the study: (A) simple ASEs and (B) complex ASEs. Complex ASEs change the boundaries of one or both of their flanking exons when they are included in transcripts, while simple ASEs do not. Therefore, a complex ASE looks like a simple ASE plus exon extension/truncation events. The length difference between Transcripts 1 and 2 is d1 + d2 + d3 - d4 - d5.

Figure 2. Comparisons of frame-preserving frequencies of simple ASEs and complex ASEs in the ASE conservation unspecified group (Group A), the lineage-specific ASE group (Group B), and the conserved ASE group (Group C).

To further probe the effects of splicing pattern conservation on frame preservation, we compare the FRFPs between conserved and lineage-specific ASEs (Groups C and B). As shown in Figure 2, for simple ASEs, conservation of the ASE/CSE splicing pattern results in an increase in FRFP. In contrast, splicing pattern conservation causes the FRFP to drop in complex ASEs. This observation contradicts the previous view [18,22,27] that conserved ASEs have a higher probability of being frame-preserving than lineage-specific ones.
In addition, we find that >70% of the ASEs (either simple or complex) have CSE counterparts in the other species (Table 1), indicating that AS patterns tend not to be evolutionarily conserved between human and mouse. If only conserved ASEs are considered, the simple splicing pattern has a much higher probability of being conserved between human and mouse than the complex splicing pattern (Table 2). The result indicates that most complex ASEs are lineage-specific.
Another issue of interest is that, since a complex ASE looks like a simple event plus one (or two) exon extension/truncation event(s) (see Fig. 1B), the FRFPs of complex ASEs may in fact reflect the effects of exon extension/truncation events. However, as shown in Table 3, we find that the FRFPs in the lineage-specific exon extension/truncation events are around 50%, whereas in conserved events, the FRFPs significantly increase to over 60% (both P-values < 0.001; Table 3). Such an increase in FRFP towards conserved events is similar to what is observed in simple ASEs. Therefore, exon extension/truncation events and complex ASEs may be under different selection pressures for reading frame preservation. We speculate that a complex splicing event is an integrated "module" that requires synchronized changes in neighboring exons, rather than merely a simple ASE accidentally coupled with exon extension/truncation events. To find support for this hypothesis, we further analyze whether the length changes caused by complex ASEs and their flanking exons can offset each other's frame-shifting effects and retain the upstream reading frame. We find that the FRFPs of complex ASEs coupled with flanking exons (complex+flanking exons) are close to those of simple ASEs (Fig. 3). In Group C, the FRFPs of complex+flanking exons (49.2% in human and 47.8% in mouse) are significantly higher than those of conserved CSEs (dashed lines in Fig. 3; both P-values < 0.01). Therefore, the selection pressure for frame preservation may apply to transcripts as a whole, but not to complex ASEs per se. Furthermore, our results imply that in an alternatively spliced transcript, neighboring exons of an ASE may evolve in a coordinated way to avoid protein dysfunction.

Figure 3. Comparisons of frame-preserving frequencies of simple ASEs, complex ASEs, and complex+flanking exons in Groups A and C.
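The compensation idea can be made concrete with a minimal Python sketch, assuming that the net effect of a complex splicing event is the ASE length plus the signed length changes of its flanking exons (the form of the length difference given in the Figure 1 caption). The function names, argument layout, and example numbers are hypothetical; which flanking change corresponds to which d term in the figure is not assumed here.

```python
def net_length_change(ase_length, flank_changes=()):
    """Net transcript length difference caused by including an ASE.

    flank_changes holds signed length changes of the flanking exons:
    positive for an extension, negative for a truncation. For a simple
    ASE the sequence is empty.
    """
    return ase_length + sum(flank_changes)


def preserves_frame(length_change):
    """True when the length change is a multiple of 3."""
    return length_change % 3 == 0


# Hypothetical complex ASE: the exon alone shifts the frame (100 nt),
# but a 5-nt extension and a 3-nt truncation of the flanking exons
# bring the total change to 102 nt, a multiple of 3.
ase_alone = preserves_frame(net_length_change(100))
ase_with_flanks = preserves_frame(net_length_change(100, [5, -3]))
print(ase_alone, ase_with_flanks)
```

Counting frame preservation on `net_length_change(...)` rather than on the ASE length alone corresponds to the "complex+flanking exons" measure discussed above.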

Conclusion
In sum, one surprising finding of this study is that the FRFP of complex ASEs is lower than that of CSEs. Our results suggest that the frame-shifting effects of complex ASEs are rescued by compensatory changes in the flanking exons, thus leaving the downstream protein reading frames unaltered. Therefore, complex ASEs appear to be more relaxed from selection pressure than simple ones in terms of reading frame preservation. One possible reason why this was not observed previously is that most observed ASEs (>80%) are simple ASEs (see Table 1), so the previously reported results are likely dominated by the effects of these exons. When ASEs are divided into simple and complex ASEs, opposite evolutionary effects are observed between them. Previously, we reported that complex ASEs are under stronger selection pressure against amino acid changes than simple ones [31]. In addition, we find that exons that participate in both simple and complex AS events have intermediate FRFPs, which fall between those of simple and complex ASEs (data not shown). Overall, our results reveal that simple and complex ASEs have quite distinct evolutionary features. It appears that both simple and complex AS patterns have functional importance in view of the two different forms of selection pressure (protein sequence conservation and reading frame preservation) by which they are constrained. Although the biology of complex ASEs has rarely been documented, it is likely that this ASE type has resulted from a different molecular mechanism and has played a different role from that of simple ASEs. Note that "reading frame preservation" here means that the changes in exon length caused by such events are multiples of 3.

Methods
We used 5,176 orthologous gene pairs of human and mouse from the EBI database [33] and extracted reciprocal best-hit coding exon pairs using the BLAST package (version 2.2.11 from the NCBI website). The human and mouse files used to annotate exon types (including the ASE types) were downloaded from ASD (AltSplice Human Release 2 based on Ensembl 27.35a.1 and AltSplice Mouse Release 2 based on Ensembl 27.33c.1 [30,34]). Based on this information (see also Table 1), we divided the extracted human-mouse exon pairs into three groups: A. ASE conservation unspecified (i.e., simple/complex ASEs vs. all exons; the ASE/CSE splicing pattern of the orthologous exon is not restricted), B. lineage-specific ASE (i.e., simple/complex ASEs vs. CSEs; the orthologous exons are CSEs), and C. conserved ASE (i.e., simple/complex ASEs vs. all ASEs). Note that "all exons" include CSEs and all ASEs, whereas "all ASEs" include simple ASEs, complex ASEs, and ASEs of uncertain type. Groups B and C are subsets of Group A.
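The grouping rule above can be sketched in Python as follows. This is an illustrative reading of the group definitions, not the authors' pipeline: the `ExonPair` record, its field names, and the type labels are hypothetical, and the step that produces the pairs (reciprocal best-hit BLAST plus ASD annotation) is assumed to have been done already.

```python
from dataclasses import dataclass

# Hypothetical labels for annotated exon types.
ASE_TYPES = {"simple", "complex", "uncertain"}


@dataclass(frozen=True)
class ExonPair:
    focal_type: str     # exon type in the species being analyzed
    ortholog_type: str  # type of the orthologous exon in the other species


def groups_for(pair):
    """Return the set of groups (A, B, C) to which an exon pair belongs."""
    # Only simple and complex ASEs in the focal species are analyzed.
    if pair.focal_type not in ("simple", "complex"):
        return set()
    groups = {"A"}                 # A: ortholog may be any exon type
    if pair.ortholog_type == "CSE":
        groups.add("B")            # B: lineage-specific ASE
    elif pair.ortholog_type in ASE_TYPES:
        groups.add("C")            # C: conserved ASE (ortholog is any ASE)
    return groups


pairs = [
    ExonPair("simple", "CSE"),
    ExonPair("complex", "uncertain"),
    ExonPair("CSE", "simple"),
]
for p in pairs:
    print(p, sorted(groups_for(p)))
```

Because every pair in Group B or Group C is also placed in Group A, the sketch reproduces the statement that Groups B and C are subsets of Group A.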