In sexual species, genetic parasites such as transposable elements (TEs) can proliferate to the detriment of the host. Natural selection is widely considered the dominant force limiting TE proliferation [2–4]. The selective forces limiting TE abundance are thought to act against three primary consequences of TE proliferation: gene mutation by TE insertion, chromosomal rearrangement caused by ectopic recombination among dispersed copies, and the energetic burden imposed on the host by the replication, transcription and translation of TE copies. TE insertion alleles often segregate at low frequencies in populations, consistent with natural selection limiting their increase. However, studies in recent years have shown that different modes of RNA silencing also play an important role in constraining TE proliferation [5–7]. In particular, within the germline of animals, the piRNA machinery functions as an immune system to protect the genome against TE proliferation [8–10]. TE copies that have inserted into distinct chromosomal regions known as piRNA clusters are recognized as aberrant, and their transcripts are directed to the piRNA biogenesis machinery. In complex with Argonaute proteins, 26–31 nt piRNAs use sequence identity to target transcripts of dispersed TE copies for degradation. In turn, this generates secondary piRNAs that feed into the cycle of piRNA biogenesis and TE silencing.
These two forces, natural selection and genome defense by small RNAs, both limit TE proliferation directly, but in distinct ways. Natural selection primarily limits the increase in frequency of harmful TE insertion alleles. In contrast, the piRNA machinery limits the transposition rate itself, and thus the rate at which new insertion alleles are produced. How these forces jointly determine the rate of TE proliferation is poorly understood. In fact, the distinction between natural selection and genome defense as separable forces may be artificial, since the genome defense machinery is itself the product of natural selection.
To what degree does the strength of natural selection explain variation in TE abundance across species? Population genetic theory predicts that the strength of selection relative to genetic drift will be greater in larger populations. Thus, as indicated by Ohta, mildly deleterious substitutions will fix at a greater rate in smaller populations. This prediction has been confirmed in a variety of systems where purifying selection was estimated using the rate ratio of non-synonymous to synonymous substitution (ω). Consistent with natural selection acting more strongly against deleterious non-synonymous substitutions, larger populations tend to have smaller ω values. For example, in mammals there is a strong negative correlation between population size and ω. A large study comparing branch-specific ω between closely related mainland and island species found that ω estimates were higher for island species, likely due to their smaller population size. This was observed in both vertebrates and invertebrates. If variation in TE abundance across species is also influenced by the strength of selection relative to drift, TE abundance and genome-wide ω values for protein-coding genes should be positively correlated. This is because differences in population size will modulate the efficacy of selection against both types of mutation.
This hypothesis, however, relies on several assumptions. First, it assumes that variation in ω is explained more strongly by mildly deleterious substitutions than by adaptive substitutions. If beneficial mutations are common and much more likely to fix in larger populations, the negative relationship between ω and population size will be weakened. Studies in Drosophila, however, have indicated that population size has little effect on the rate of adaptive fixation [15, 16]. Second, it assumes that the rates of both forms of mutation are independent of population size. Among related species, this is likely for mutations at the nucleotide level, but may be violated for TE insertions if exposure to TE invasion is greater in larger populations. Finally, it assumes that the distribution of fitness effects is similar between TE insertions and non-synonymous mutations, and it does not consider the non-linear scaling of the probability of fixation of mildly deleterious alleles with effective population size. When the product of the effective population size and the deleterious selection coefficient ($N_e s$) is substantially greater than one, the chance that such a deleterious allele fixes becomes vanishingly small. Only nearly neutral deleterious mutations for which $N_e s$ is less than one are expected to fix. Thus, the degree to which variation in population size governs the accumulation of deleterious alleles, such as TE insertions, itself depends on population size. In organisms with very large population sizes, drift becomes extremely weak, and modest variation in population size among related species may only affect the rate of accumulation of very mildly deleterious alleles with selection coefficients very close to zero. In this case, the distribution of fitness effects of mildly deleterious mutations will be an important factor in determining the relationship between population size and ω. Setting aside the effects of beneficial mutations, a negative correlation will be observed across all population sizes only if there are a sufficient number of nearly neutral mutations at all population sizes. If the distribution of fitness effects for deleterious mutations does not display this characteristic, modest increases in population size in very large populations may not always lead to decreased ω. For similar reasons, one might not expect a simple relationship between TE accumulation and population size when population sizes are large.
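The non-linear scaling described above can be illustrated numerically. The following is a minimal sketch, not part of the analysis in this study, and it rests on assumptions not stated in the text: Kimura's diffusion approximation for a new mutation under additive (genic) selection, census size equal to effective size, and a small selection coefficient. Under those assumptions, the fixation probability of a new mutation relative to a neutral one is approximately S / (1 - exp(-S)), where S = 4 × (effective population size) × (selection coefficient) and the selection coefficient is negative for deleterious alleles. The function name and the example parameter values are hypothetical.

```python
import math


def relative_fixation_rate(ne: float, s: float) -> float:
    """Fixation probability of a new mutation relative to a neutral one.

    Assumes Kimura's diffusion approximation with additive selection,
    census size equal to effective size, and small |s|:
        ratio = S / (1 - exp(-S)),  with S = 4 * ne * s.
    Use s < 0 for a deleterious mutation.
    """
    scaled = 4.0 * ne * s
    if scaled == 0.0:
        # Neutral limit of S / (1 - exp(-S)).
        return 1.0
    if -scaled > 700.0:
        # exp(-S) would overflow a float; the ratio is effectively zero.
        return 0.0
    # -expm1(-S) equals 1 - exp(-S) and stays accurate for small |S|.
    return scaled / -math.expm1(-scaled)


if __name__ == "__main__":
    s = -1e-5  # hypothetical mildly deleterious selection coefficient
    for ne in (1e3, 1e4, 1e5, 1e6):
        ratio = relative_fixation_rate(ne, s)
        print(f"Ne = {ne:>9.0f}  Ne*|s| = {ne * abs(s):>6.2f}  relative rate = {ratio:.3g}")
```

With a fixed selection coefficient, the relative rate stays close to one while the scaled coefficient is well below one and then collapses toward zero once it exceeds one. This is the sense in which a further increase in an already large population changes the fate only of mutations with selection coefficients very close to zero.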
This latter point is highlighted by studies of TE dynamics yielding contrasting results for different species. In Drosophila melanogaster, studies have suggested that the deterministic forces of selection against TEs may greatly outweigh genetic drift as a factor [19, 20]. D. melanogaster also has a large effective population size. Thus, modest variation in population size among species across the genus Drosophila may not be an important factor contributing to variation in TE abundance. In contrast, studies in vertebrates with much smaller effective population sizes have shown that genetic drift can be an important factor contributing to TE accumulation. For example, the frequency distribution of TE insertion alleles in the pufferfish is consistent with neutrality. This indicates that variation in effective population size at these low population sizes may have a greater impact on the fate of mildly deleterious TE insertions, relative to species with larger population sizes such as Drosophila. It also indicates that in species with larger population sizes, there may only be a weak correlation between the rate of non-synonymous substitution and TE abundance. This may explain why, in one study, after correcting for phylogenetic signal, no apparent relationship between genomic TE number and population size was found. Considering these issues, we aimed to test the simple hypothesis that TE abundance is positively correlated with genome-wide ω across the Drosophila genus. Rejection of this hypothesis would support alternative models and provide further testable hypotheses in the study of TE dynamics in large populations.
An added level of complexity is the evolutionary dynamic between TEs and the piRNA machinery. In several species of Drosophila, many, but not all, components of the piRNA machinery show a high rate of adaptive evolution [24–26]. This has been proposed to arise from an evolutionary arms race between the host and TEs. Evolutionary arms races between hosts and parasites drive cycles of adaptation and counter-adaptation, leading to increased rates of adaptive evolution in host immune systems. In the case of TEs, there is likely strong selection to avoid silencing by the piRNA machinery. This may select for functions analogous to those observed in viruses, which have mechanisms that directly antagonize the machinery of RNA silencing [27, 28]. Reciprocally, natural selection acting on the host likely selects for changes that counteract these strategies, driving a high rate of adaptive evolution in the proteins of the piRNA machinery. Importantly, however, the classic evolutionary arms race is not sufficient to explain all evolutionary dynamics between host and parasite [29–31]. For example, trench warfare may occur when a diversity of parasites selects for the maintenance of multiple defense strategies, thus favoring a mode of balancing selection. While an evolutionary arms race with TEs is suggested by the high rate of evolution in the piRNA machinery in some Drosophila species, it is not clear that increasing TE abundance could drive ever-increasing levels of amino-acid evolution. Constraint on core function will eventually pose some limit to rates of amino-acid evolution. Additionally, with increasing TE abundance, the balance of forces may begin to favor purifying selection over adaptation. To explore these issues, we test the simple hypothesis that in the Drosophila genus, increasing TE abundance drives a higher rate of amino-acid substitution in the piRNA machinery as measured by ω.
In this study, we found that genome-wide levels of purifying selection are greater (smaller ω) in Drosophila species with higher TE abundance, inconsistent with a model in which increasing TE abundance and ω are jointly explained by weaker selection relative to drift. We also found that this pattern is more evident in the piRNA machinery than in control genes. Strikingly, TE abundance and levels of codon bias are positively correlated in the piRNA machinery but not in control genes or in the rest of the genome. Rather than an increasing rate of amino-acid evolution, the primary response of the piRNA machinery to increasing TE abundance appears to be improved codon usage for increased translational efficiency.