- Research article
Widespread gene duplication and adaptive evolution in the RNA interference pathways of the Drosophila obscura group
BMC Evolutionary Biologyvolume 19, Article number: 99 (2019)
RNA interference (RNAi) related pathways provide defense against viruses and transposable elements, and have been implicated in the suppression of meiotic drive elements. Genes in these pathways often exhibit high levels of adaptive substitution, and over longer timescales show gene duplication and loss—most likely as a consequence of their role in mediating conflict with these parasites. This is particularly striking for Argonaute 2 (Ago2), which is ancestrally the key effector of antiviral RNAi in insects, but has repeatedly formed new testis-specific duplicates in the recent history of the obscura species-group of Drosophila.
Here we take advantage of publicly available genomic and transcriptomic data to identify six further RNAi-pathway genes that have duplicated in this clade of Drosophila, and examine their evolutionary history. As seen for Ago2, we observe high levels of adaptive amino-acid substitution and changes in sex-biased expression in many of the paralogs. However, our phylogenetic analysis suggests that co-duplications of the RNAi machinery were not synchronous, and our expression analysis fails to identify consistent male-specific expression.
These results confirm that RNAi genes, including genes of the antiviral and piRNA pathways, have undergone multiple independent duplications and that their history has been particularly labile within the obscura group. However, they also suggest that the selective pressures driving these changes have not been consistent, implying that more than one selective agent may be responsible.
Gene duplication is an important process in molecular evolution, providing raw genetic material for evolutionary innovation. The evolutionary dynamics following gene duplication are often described in terms of two alternative models, ‘neofunctionalization’ and ‘subfunctionalization’ . Under neofunctionalization, the functional redundancy following duplication provides relaxed selective constraint, and allows new mutations to accumulate through genetic drift. Most such mutations will reduce the functionality of the gene (resulting in pseudogenization), but some paralogs can be selected for new or derived functions. Under subfunctionalization, the duplicates independently accumulate mutations that allow them to specialise in a subset of ancestral functions of a pleiotropic gene. Neofunctionalization leads to asymmetrical evolutionary rates among paralogs (with faster evolution in paralogs that gain derived function), whereas equal rates are expected for the latter . It has been suggested that both processes have played an important role in the rapid evolution of RNA interference-related pathways, including the long- and short-term evolutionary history of the Argonautes, the effectors of RNAi [3,4,5].
The RNAi-related pathways comprise a range of small-RNA mechanisms best known for their roles in mediating the control of gene expression, antiviral responses, and defence against mobile genetic elements. Respectively, these include the miRNA pathway (Dicer-1 and Argonaute-1 in insects ), the siRNA pathway (Dcr-2 and Argounate 2 in insects ), and the piRNA pathway (piwi-family Argonaute AGO3 and Piwi/Aub in insects [8, 9]). In addition, RNAi-related pathways have been implicated in a variety of other biological processes, such as the control of dosage compensation [10,11,12] and the suppression of genetic drive [13,14,15,16,17,18]. Several genes involved in the defensive piRNA and siRNA pathways, but not genes of the miRNA pathway, display elevated rates of adaptive protein evolution (but see  for an exception). This is best studied in Drosophila [20,21,22], but is also detectable in other invertebrates . It has been hypothesized that this may be a consequence of parasite-mediated ‘arms-race’ coevolution [21, 24], either through conflict with parasite-encoded immune suppressors—as widely seen in RNA viruses —or in the case of the piRNA pathway, through selection for ‘re-tuning’ suppression mechanisms .
Adaptive evolution of RNAi pathways is partly reflected in the gain, loss, and functional divergence of Argonaute-family duplications . For example, within the Drosophilidae—an important model for RNAi-related pathways of animals—Piwi has been duplicated in the lineages leading to Phortica variegata and Scaptodrosophila deflexa , and Ago2 has been duplicated in those leading to S. deflexa, D. willistoni, D. melanogaster (where only one paralog remains – the canonical Ago2) and D. pseudoobscura . This is particularly striking in the obscura group of Drosophila species , which has experienced at least 6 independent duplications of Ago2 over the last 20 million years, with all but one of the resulting duplicates becoming testis-specific, and most displaying evidence of recent and/or ongoing positive selection .
Recently it has been noted that several accessory components of the siRNA and piRNA pathways have also been duplicated in D. pseudoobscura, which has previously been noted to have a high rate of gene turnover, including in some RNAi genes . These include armitage, asterix, cutoff, maelstrom, tejas and vreteno . In D. melanogaster, these proteins are engaged in a number of roles in the piRNA pathway (Table 1). Here we use publicly available data to reconfirm the history and expression of Ago2 in the obscura group, and to test whether duplications in the other genes also show male-specific expression, whether duplications are contemporaneous with those of Ago2, and whether they too show strong signatures of adaptive protein evolution. We find no clear pattern of these duplications being coincident with Ago2 duplications, but both asterix and cutoff duplications display increased sexual dimorphism relative to their ancestral copies, through decreased female expression. In addition, several of the gene duplicates show evidence of adaptive protein evolution in D. pseudoobscura, including both copies of cutoff, the ancestral copy of asterix, and the new duplicates of tejas, maelstrom and vreteno.
Obscura group Argonaute 2 genes are duplicated and show male-biased expression
The obscura group has experienced multiple duplications of Ago2 and it has previously been shown that these are associated with positive selection and testis-specific expression . Here we reanalyzed the expression patterns and evolutionary history of these genes using publicly available RNAseq and genomic data, additionally including newly available genomic sequences from D. algonquin , D. athabasca , D. bifasciata and D. miranda .
In contrast to the previous qPCR analysis that failed to identify substantial or easily detectable expression of the ancestral copy in D. pseudoobscura (Ago2d ), we found the expression of all Ago2 homologs in D. pseudoobscura was detectable at a high level in RNAseq data, and that all show significant male-bias (Fig. 1). The Ago2d expression detected here is unlikely to be an artefact of cross-mapping between paralogs as we observed the reads that mapped uniquely across the gene. The male bias was largest for Ago2e, where expression in males is approximately 1000-fold higher than females (pMCMC< 0.001; Fig. 1), and smallest in Ago2c and the ancestral copy Ago2d, consistent with the ca. two-fold enrichment of the single copy of Ago2 in male D. melanogaster. We also confirmed that D. miranda, a close relative of D. pseudoobscura that has not previously been analyzed, displayed a qualitatively similar pattern among those paralogs represented (Fig. 1). In D. obscura, we found the ancestral copy (Ago2a) again showed slightly, but marginally significant higher expression in males (pMCMC = 0.014), but that other Ago2 proteins showed a strong male biased expression, with the largest effect for Ago2f, where male expression was 2000-fold higher (Fig. 1; pMCMC< 0.001).
Six piRNA pathway genes are duplicated, and asterix and cutoff duplicates show increased male-bias in their expression
Palmer et al.  recently identified six accessory piRNA pathway genes that have also experienced duplication in the obscura group (Additional file 1: Table S1). We could locate the duplicates for all genes in D. pseudoobscura, except for armitage where we instead identified a duplicate in the affinis subgroup but not in the pseudoobscura, obscura or subobscura subgroups. At the time of analysis, assembled genomic resources were not available for the obscura and subobscura subgroups, and our analysis of transcriptomic data did not identify duplications in those genes. Following submission, draft genome assemblies became available for D. obscura  and D. subobscura . A search of these genomes confirmed that duplicates are undetectable for all genes except vreteno, but that two copies of vreteno are detectable in D. obscura (not shown).
Where chromosomal locations of the new duplications could be determined by synteny in D. pseudoobscura, we found that cutoff, maelstrom and vreteno were duplicated from an autosome to the X chromosome, asterix duplicated from the X chromosome to an autosome, and tejas duplicated between autosomal locations. Two duplicates (asterix and tejas) lack introns, suggesting they are retro-transcribed copies created through an mRNA intermediate.
Using public RNAseq data from D. pseudoobscura and D. miranda, we found that all of the gene duplicates were expressed (Fig. 2). Armitage, which was not duplicated within the newly examined lineages for which RNAseq data were available, did not show strong sex-biased expression. Similarly, maelstrom, tejas, and vreteno were not strongly differentially expressed between the sexes, and nor were their duplicates in D. pseudoobscura and D. miranda. In contrast, both asterix and cutoff duplicates displayed substantially reduced expression in females and slightly increased expression in males relative to the ancestral copy (Fig. 2). For example, as previously reported from qPCR analysis  the paralog of asterix in D. pseudoobscura displays ca. 1000-fold higher expression in males than females. In both D. pseudoobscura and D. miranda those genes with overall strongly increased male-biased expression (Argonaute-2, asterix, cutoff, and their paralogs) had the highest expression in testis, and had reduced expression in ovaries (Additional file 2: Figure S1).
Adaptive amino-acid substitutions are generally more common in the duplicates
Using population genetic data from D. pseudoobscura, with D. miranda as an outgroup, we used the McDonald-Kreitman framework and a maximum-likelihood extension to estimate the rate of adaptive substitution in protein sequences, and to test whether this rate differed between the ancestral and duplicated copies [44, 45] . Treating genes individually, we found evidence for positive selection acting on at least one paralog for each of the genes except asterix and Ago2c (p < 0.05; Additional file 3: Table S2). Among the ancestral copies, only cutoff displayed evidence of positive selection. We then tested whether the paralogs generally showed a different pattern of selection to the ancestral copies by dividing the genes into two classes (6 ancestral copies and 8 paralogs) and comparing the likelihood of models that allowed the classes to differ in the adaptive rate α (Table 2) . The best-supported model allowed α to differ between ancestral and duplicate copies (Akaike weight: 0.81), and the second-best supported model was that in which ancestral copies experienced no ongoing positive selection (i.e. α = 0; Akaike weight: 0.19), providing overall evidence that the paralogs have experienced more adaptive protein evolution. In the best-supported model, the α value was estimated to be 0.68 for the duplicate group, which more than three times larger than the α value of the ancestral group (0.20). In case segregating weakly-deleterious variants led to a downward bias in estimates of α, we repeated this analysis excluding all alleles with a minor allele frequency < 0.125 , although this reduced power to the extent that few genes remained individually significant (Additional file 4: Table S3). We also repeated the analysis with a larger dataset PRJNA326536  (Additional file 4: Table S3), and obtained qualitatively similar results (R2 = 0.946 for α estimates between the analyses; the second dataset, while larger, is less suitable for analysis as only the third chromosome is a direct sample from a wild population).
Gene duplications were unlikely to be contemporaneous
Given the multiple duplications of Ago2 and the piRNA pathway components in the obscura group, we hypothesized that some duplications may have occurred near-simultaneously, duplicating whole components of a pathway together. We therefore used relaxed-clock phylogenetic methods to estimate the relative timings of each duplication. In agreement with the previous analysis of Ago2 , we found that the duplications giving rise to Ago2e and Ago2f predated the split between the obscura and pseudoobscura subgroups, with a subsequent loss of Ago2f from the pseudoobscura subgroup (Fig. 3). In contrast, we found that duplications in five of the other six genes unambiguously occurred after the obscura/pseudoobscura split, with the timing of duplication in maelstrom being uncertain. Briefly, armitage displayed a single duplication shared by members of the affinis subgroup, asterix and tejas a single duplication each in the lineage leading to D. pseudoobscura (which were subsequently lost in the affinis subgroup), cutoff a single duplication recently in the pseudoobscura subgroup, and vreteno a single duplication at the base of the obscura group (Fig. 4). For maelstrom, the maximum clade credibility tree suggests duplication occurred very slightly prior to this split, followed by subsequent loss of one paralog the obscura subgroup (Fig. 4). However, this was poorly supported, and a duplication that post-dates the split between the affinis and pseudoobscura subgroups, and so does not require a hypothetical loss of one paralog from the affinis subgroup, may be more parsimonious. We used the posterior distributions of split times, relative to the divergence time of the obscura and pseudoobscura subgroups, to infer whether or not duplications occurred at approximately the same time (Fig. 5). Although the small amount of information available from single genes made relative timings highly uncertain, it is clear that few of the Ago2 duplications could have been concurrent with the piRNA-pathway duplications (Additional file 5: Figure S2). However, the recent and rapid duplications within the piRNA pathway could have been concurrent, with vreteno, tejas, maelstrom and asterix not differing significantly, all having duplicated very close to the split between D. obscura and D. pseudoobscura (posterior overlap > 0.1 in each case; Additional file 5: Figure S2).
Although four of the six piRNA pathway duplicates did not display altered tissue specificity compared to the ancestral copy, asterix and cutoff both became significantly more male biased, as did each of the Ago2 duplicates ; Fig. 1, Fig. 2). In each case, this was due to higher (or exclusive) expression in the testis. The duplicated genes also showed higher rates of adaptive amino acid substitution, together and individually, whereas only two (asterix and armitage) displayed evidence of positive selection when single-copy in D. melanogaster (Additional file 4: Table S3).
This new tissue specificity and the rapid evolution of duplicated copies broadly suggest that gene duplication in these pathways may be associated with functional diversification through neofunctionalization, for example by testis-specific selective pressure. Three main selective pressures seem likely candidates to have driven this process. First, given the role of Ago2 in Drosophila antiviral defense , and the role of the piRNA pathway in antiviral defense in mosquitoes , it is possible that these duplications have specialized to a virus that is active in the male germline, such as D. obscura and D. affinis Sigmaviruses . Second, given the role of all of these genes in the suppression of transposable elements (TEs), their evolution may have been shaped by the invasion of TEs that are more active in testis, as seen for Penelope  and copia . Such a ‘duplication arms-race’ in response to TE invasion is thought to occur in mammals, where repeated duplications of KRAB-ZNF family are selected following the invasion of novel TEs, and subsequently provide defence [53, 54]. Alternatively, duplicates may quantitatively enhance the pre-existing response to TEs, as suggested for another rapidly-evolving piRNA-pathway component, Rhino .
The third, and arguably most compelling, hypothesis is that selection is mediated by conflict between meiotic drive elements and their suppressors, such as sex ratio distorting X-chromosomes [56, 57]. Most directly, meiotic drive elements are common in Drosophila, and RNAi-related pathways have been widely implicated in their action and suppression [15, 16, 18]. In addition, sex-chromosome drive is widespread in the obscura group: X-chromosome drive was first described in D. obscura, has also been reported in pseudoobscura, persimilis, affinis, azteca, subobscura and athabasca , and is mediated through a testis-specific function (Y-bearing sperm have reduced function). Finally, a testis-specific class of hairpin (endo) siRNAs is required for male fertility in D. melanogaster , testes-restricted clustered miRNAs show rapid evolutionary turnover and are represented in large numbers in D. pseudoobscura , and suppression of sex-specific duplicates of S-Lap1 via a small-RNA mechanism has recently been implicated in the meiotic drive mechanism of D. pseudoobscura . In this context, it is also interesting to note that Ago2 is involved in directing heterochromatin formation in Drosophila dosage compensation [10,11,12], and that in D. melanogaster sex-ratio distorting Spiroplasma achieve male-killing through the disruption of dosage compensation (although this acts at the embryonic stage ). Nevertheless, in the absence of mechanistic studies, the link with meiotic drive remains speculative, as testis is generally more permissive to gene expression and testis-specific expression may be a transient state (i.e. the “Out of Testis Hypothesis” ).
In this study we have built on previous work that showed Drosophila RNAi-pathway genes evolve rapidly and adaptively [20,21,22,23]—including multiple duplications of Ago2 in the obscura group —to identify duplicates of several other RNAi-pathway genes in these species. We have shown that some of these gene duplicates have altered sex-biased expression, and that some have experienced positive selection following duplication. We suggest that this may have been driven by selection mediated through meiotic drive. Very recently, after submission of this article, improved sequencing of the D. miranda neo-Y and neo-X chromosomes identified a large number of previously undetected RNAi-pathway duplications (including homologs of Argonaute 2, Dicer-2, shutdown, cutoff, and Panoramix) within this species that also display testis-bias in their expression, further supporting a role for meiotic drive in shaping their evolution .
Sequence collation and paralog identification
The full-length sequences for 7 RNA inteference genes; Argonaute 2 (Ago2), armitage (armi), asterix (arx), cutoff (cuff), tejas (tej), maelstrom (mael), and vreteno (vret) from 12 obscura group species were identified using tBLASTn (BLAST+ 2.6.0)  with a local BLAST database (see below for the details of the construction of local genomic database). Known gene sequences from D. pseudoobscura and D. melanogaster were used as a query with a stringent e-value threshold (1e-40). Genes were inferred to have been duplicated when BLAST indicated that there were multiple full-length hits located in different genomic regions. The sequences were manually inspected, introns removed and the coding frame identified using Bioedit v 7.2 . Genes in D. pseudoobscura were classified as ancestral or duplicate copies based on the syntenic orthology with D. melanogaster using Flybase Genome Browser . High quality genomes are not available for other members of the obscura groups, and in those cases ancestral/derived status was assigned based on homology with D. pseudoobscura. To provide a comprehensive overview of the evolution of the RNAi paralogs, we included 24 Drosophila species outside of obscura group with assembled genomes already available in public databases. The Flybase and NCBI tblastn online portal were used to identify the target genes with queries from D. melanogaster or closely related species.
Five obscura group species had assembled genomes at the time of this study: D. pseudoobscura (assembly Dpse_3.0 ), D. miranda (assembly DroMir_2.2  ), D. persimilis (assembly dper_caf1 ), D. affinis (Drosophila affinis Genome Release 1.0 ) and D. lowei (Drosophila lowei Genome Release 1.0 [69, 70]) and in these cases the genome was directly used for local BLAST database. For four species (D. obscura, D. subobscura, D. subsilvestris, D. tristis) we used de novo assembled transcriptomes based on paired RNA-seq reads data from wild-collected males  (Accession: PRJNA312496). Assembly was performed using Trinity  with ‘--trimmomatric’ and otherwise default parameters, and the assembled transcriptome was searched locally using BLAST. For three other species: D. athabasca , D. algonquin  (Accession: PRJNA274695) and D. bifasciata (Accession: PRJDB4817), only unassembled genomic reads were available. For these species we applied a targeted assembly approach as follows: (i) reads that had local similarity with all known duplicated RNAi proteins were identified using Diamond  with relaxed e-value of 1; (ii) hits from Diamond were then retained and used for assembly using Spades v3.10.1 ; and (iii) scaffolds produced by Spades were then used as references in local BLAST database.
Phylogenetic analysis and the relative timing of duplications
Bayesian relaxed clock trees were used to infer the evolutionary relationship among paralogs. First, the sequences were aligned as translated nucleotide in Clustal W  with default parameters. Regions with ambiguous alignment were identified and removed manually by eye. A total of 7 gene trees were then inferred using Beast v1.7.0 . Inference used a relaxed clock model with an uncorrelated lognormal distribution among branches, and an HKY substitution model with empirical base frequencies and rate variation among sites was modelled as a gamma distribution with four categories. The site model allowed for third codon position to have different substitution model from the other positions.
The trees were scaled by setting the time to most recent common ancestor of the obscura group to have lognormal distribution with a data-scale mean of 1, and a very small standard deviation of 0.01. This had the advantage of scaling all duplications to the same relative timescale, while allowing different genes and different paralogs to vary in their rate. To record the posterior ages of duplication, we specified the ancestral and duplicated genes as a distinct taxon set. The Monte Carlo Markov Chain analysis was run for at least 100 million states and posterior sample was recorded every 10,000 states. Log files were then inspected in Tracer v1.6  for parameter stationarity, and adequate sampling as indicated by an effective sample size over 200. Finally, 25% of initial trees were discarded as burn-in, and maximum clade credibility trees were summarized using Tree Annotator. Parameter MCMC files were processed using a custom R script  to infer the posterior distribution the age of duplication for each gene and to quantify the degree overlapping between these age distributions. To provide a reference species tree (Fig. 3, in-set), we created a concatenated dataset from the ancestral copies of each duplicated gene, and inferred the gene tree in the same way.
Differential expression analysis of the duplicated RNAi genes
For this analysis, we used obscura group transcriptome datasets available in EBI ENA (European Nucleotide Archive, http://www.ebi.ac.uk/ena) and DDBJ (DNA DataBank of Japan, http://www.ddbj.nig.ac.jp/) that included the sex and tissue annotation. The datasets comprised 163, 42 and 34 RNA-seqs datasets of D. pseudoobscura, D. miranda and D. obscura respectively; Bioproject: DRA004463, PRJEB1227 , PRJNA226598, PRJNA219224 , PRJNA326536 , PRJNA74723, PRJNA321079, PRJNA291085 , PRJNA268967 . Since our main interest was the comparison expression between sex, but not its absolute expression value, we performed a simple read-counting analysis. In outline, each RNA-seq dataset was mapped to the full-length CDS using Bowtie2 v2.3.2  with mode ‘--very sensitive’ and otherwise default parameters. The reads mapped to reference were counted using combination of SAMtools view flag -F 4 and SAMtools idxstats v1.4 . The count data were then normalized by gene length and read depth, where it was then scaled relative to the expression of RpL32. To determine the statistical significance of difference gene expression, generalised linear mixed models were fitted using R package MCMCglmm  with sex as fixed effect and tissue as a random effect, and log-transformed normalised expression as the response variable. The natural logarithm transformation (loge) was used to reduce the skewness of the distributions. To allow for zero value for non-expressed genes, the genes with read count 0 was replaced with 1.
Where Y is loge transformed normalized expression data (response variable), μ is mean of loge transformed expression and ε is residual error. The random effects (tissue) and the residual were assumed to be distributed multivariate normal with mean 0 and uncorrelated covariance matrix MVN (0, Iσ2). Sex was modelled as a factor with 2 variables (male-female) and tissue contained 13 variables of different tissue.
Population genetic analysis of the RNAi duplicated genes
We used the McDonald-Krietman test  to compare the rate of adaptive evolution between ancestral and duplicate genes using polymorphism data from publicly-available sequencing datasets: Pseudobase (12 strains of pseudoobscura, Accession list: SRP007802 ) and 12 strains D. miranda (Bioproject: PRJNA277849 ).
Genomic reads for each strain were mapped to the genomic reference using Bowtie2  with ‘--very-sensitive’ mode and otherwise default parameters and reads mapped to the genes of interest were extracted using SAMtools view (flag -F 4). Duplicate reads were marked using MarkDuplicates (Picard Tools ). To reduce the excessive variants surrounding indel, we then applied GATK IndelRealigner , which discards the original mapping and performs local-realignment around indel. The output was then sorted and indexed and the BAM file was used for ‘mpileup’ variant calling (SAMtools v1.4 ). The output VCF files were then filtered to only include SNP (GATK SelectVariants ), and variants that were covered by less than five reads were masked with ‘N’ (undetermined bases, −-snpmask GATK v3.5 ). The variant files were then converted to FASTA format using GATK FastaAlternateReferenceMaker, which replaced genomic reference with variants defined in VCF files  and output the heterozygous calls with IUPAC ambiguous code. Finally, FastPHASE  was used to generate pseudo-haplotypes, although haplotype information was not utilized by the analysis.
MK tests were performed for each gene on D. pseudoobscura-D miranda. DNAsp v5.0  was used to estimate the statistics for the MK test and Fisher’s exact test was used to calculate the statistical significance for single-gene analyses. Genes were then grouped into ancestral and duplicate genes, and a cross-gene analysis was performed using a maximum likelihood extension of the MK test . Five different models were fitted that differed in the constraint of α (proportion of non-synonymous subtitutions estimated to be adaptive), and the relative support between models was compared using Akaike Weights.
Binary Alignment Map
Basic Alignment Search Tool
Genome Analysis Toolkit
Hasegawa, Kishino and Yano model
Monte Carlo Markov Chain
Single Nucleotide Polymorphism
Variant Call Format
Hahn MW. Distinguishing among evolutionary models for the maintenance of gene duplicates. J Hered. 2009;100:605–17. https://doi.org/10.1093/jhered/esp047.
Innan H, Kondrashov F. The evolution of gene duplications: classifying and distinguishing between models. Nat Rev Genet. 2010;11:97–108. https://doi.org/10.1038/nrg2689.
Lewis SH, Salmela H, Obbard DJ. Duplication and diversification of dipteran argonaute genes, and the evolutionary divergence of Piwi and Aubergine. Genome Biol Evol. 2016;8:507–18.
Lewis SH, Webster CL, Salmela H, Obbard DJ. Repeated duplication of Argonaute2 is associated with strong selection and testis specialization in Drosophila. Genetics. 2016;204:757–69.
Singh RK, Gase K, Baldwin IT, Pandey SP. Molecular evolution and diversification of the Argonaute family of proteins in plants. BMC Plant Biol. 2015;15:23. https://doi.org/10.1186/s12870-014-0364-6.
Vidigal JA, Ventura A. The biological functions of miRNAs: lessons from in vivo studies. Trends Cell Biol. 2015;25:137–47. https://doi.org/10.1016/j.tcb.2014.11.004.
Bronkhorst AW, van Rij RP. The long and short of antiviral defense: small RNA-based immunity in insects. Curr Opin Virol. 2014;7:19–28. https://doi.org/10.1016/J.COVIRO.2014.03.010.
Czech B, Hannon GJ. One loop to rule them all: the ping-pong cycle and piRNA-guided silencing. Trends Biochem Sci. 2016;41:324–37. https://doi.org/10.1016/j.tibs.2015.12.008.
Lewis SH, Quarles KA, Yang Y, Tanguy M, Frézal L, Smith SA, et al. Pan-arthropod analysis reveals somatic piRNAs as an ancestral defence against transposable elements. Nat Ecol Evol. 2018;2:174–81. https://doi.org/10.1038/s41559-017-0403-4.
Menon DU, Meller VH. A role for siRNA in X-chromosome dosage compensation in Drosophila melanogaster. Genetics. 2012;191:1023–8. https://doi.org/10.1534/genetics.112.140236.
Tang W, Seth M, Tu S, Shen E-Z, Li Q, Shirayama M, et al. A sex chromosome piRNA promotes robust dosage compensation and sex determination in C. elegans. Dev Cell. 2018;44:762–770.e3. https://doi.org/10.1016/j.devcel.2018.01.025.
Deshpande N, Meller VH. Chromatin that guides dosage compensation is modulated by the siRNA pathway in Drosophila melanogaster. Genetics. 2018;209:1085–97. https://doi.org/10.1534/genetics.118.301173.
Tao Y, Masly JP, Araripe L, Ke Y, Hartl DL. A sex-ratio meiotic drive system in Drosophila simulans. I: an autosomal suppressor. PLoS Biol. 2007;5:e292. https://doi.org/10.1371/journal.pbio.0050292.
Tao Y, Araripe L, Kingan SB, Ke Y, Xiao H, Hartl DL. A sex-ratio meiotic drive system in Drosophila simulans. II: an X-linked distorter. PLoS Biol. 2007;5:e293. https://doi.org/10.1371/journal.pbio.0050293.
Gell SL, Reenan RA. Mutations to the piRNA pathway component aubergine enhance meiotic drive of segregation distorter in Drosophila melanogaster. Genetics. 2013;193:771–84. https://doi.org/10.1534/genetics.112.147561.
Aravin AA, Klenov MS, Vagin VV, Bantignies F, Cavalli G, Gvozdev VA. Dissection of a natural RNA silencing process in the Drosophila melanogaster germ line. Mol Cell Biol. 2004;24:6742–50. https://doi.org/10.1128/MCB.24.15.6742-6750.2004.
Bachtrog D, Ellison C. Recurrent gene amplification on Drosophila Y chromosomes suggests cryptic sex chromosome drive is common on young sex chromosomes. bioRxiv. 2018:324368. https://doi.org/10.1101/324368.
Smibert P, Lai EC, Lin C-J, Hu F, Wen J, Dubruille R, et al. The hpRNA/RNAi pathway is essential to resolve Intragenomic conflict in the Drosophila male Germline. Dev Cell. 2018;46:316–326.e5. https://doi.org/10.1016/j.devcel.2018.07.004.
Hill T, Koseva BS, Unckless RL. The genome of Drosophila innubila reveals lineage-specific patterns of selection in immune genes. Mol Biol Evol. 2019. https://doi.org/10.1093/molbev/msz059.
Obbard DJ, Jiggins FM, Halligan DL, Little TJ. Natural selection drives extremely rapid evolution in antiviral RNAi genes. Curr Biol. 2006;16:580–5.
Obbard DJ, Gordon KHJ, Buck AH, Jiggins FM. The evolution of RNAi as a defence against viruses and transposable elements. Philos Trans R Soc Lond Ser B Biol Sci. 2009;364:99–115. https://doi.org/10.1098/rstb.2008.0168.
Kolaczkowski B, Hupalo DN, Kern AD. Recurrent adaptation in RNA interference genes across the Drosophila phylogeny. Mol Biol Evol. 2011;28:1033–42. https://doi.org/10.1093/molbev/msq284.
Palmer WH, Hadfield JD, Obbard DJ. RNA-interference pathways display high rates of adaptive protein evolution in multiple invertebrates. Genetics. 2018;208:1585–99. https://doi.org/10.1534/genetics.117.300567.
Marques JT, Carthew RW. A call to arms: coevolution of animal viruses and host innate immune responses. Trends Genet. 2007;23:359–64. https://doi.org/10.1016/J.TIG.2007.04.004.
van Mierlo JT, Overheul GJ, Obadia B, van Cleef KWR, Webster CL, Saleh MC, et al. Novel Drosophila viruses encode host-specific suppressors of RNAi. PLoS Pathog. 2014;10:e1004256. https://doi.org/10.1371/journal.ppat.1004256.
Blumenstiel JP, Erwin AA, Hemmer LW. What drives positive selection in the Drosophila piRNA machinery? The genomic autoimmunity hypothesis. Yale J Biol Med. 2016;89:499–512.
Dowling D, Pauli T, Donath A, Meusemann K, Podsiadlowski L, Petersen M, et al. Phylogenetic origin and diversification of RNAi pathway genes in insects. Genome Biol Evol. 2017;1:evw281. https://doi.org/10.1093/gbe/evw281.
O’Grady PM, DeSalle R. Phylogeny of the genus Drosophila. Genetics. 2018;209:1–25. https://doi.org/10.1534/genetics.117.300583.
Hahn MW, Han MV, Han SG. Gene family evolution across 12 Drosophila genomes. PLoS Genet. 2007;3:2135–46. https://doi.org/10.1371/journal.pgen.0030197.
Saito K, Ishizu H, Komai M, Kotani H, Kawamura Y, Nishida KM, et al. Roles for the Yb body components Armitage and Yb in primary piRNA biogenesis in Drosophila. Genes Dev. 2010;24:2493–8. https://doi.org/10.1101/gad.1989510.
Vourekas A, Zheng K, Fu Q, Maragkakis M, Alexiou P, Ma J, et al. The RNA helicase MOV10L1 binds piRNA precursors to initiate piRNA processing. Genes Dev. 2015;29:617–29. https://doi.org/10.1101/gad.254631.114.
Ohtani H, Iwasaki YW, Shibuya A, Siomi H, Siomi MC, Saito K. DmGTSF1 is necessary for Piwi-piRISC-mediated transcriptional transposon silencing in the Drosophila ovary. Genes Dev. 2013;27:1656–61. https://doi.org/10.1101/gad.221515.113.
Dönertas D, Sienski G, Brennecke J. Drosophila Gtsf1 is an essential component of the Piwi-mediated transcriptional silencing complex. Genes Dev. 2013;27:1693–705. https://doi.org/10.1101/gad.221150.113.
Mohn F, Sienski G, Handler D, Brennecke J. The rhino-deadlock-cutoff complex licenses noncanonical transcription of dual-Strand piRNA clusters in Drosophila. Cell. 2014;157:1364–79. https://doi.org/10.1016/j.cell.2014.04.031.
Sato K, Siomi MC. Functional and structural insights into the piRNA factor maelstrom. FEBS Lett. 2015;589:1688–93. https://doi.org/10.1016/j.febslet.2015.03.023.
Patil VS, Kai T. Repression of Retroelements in Drosophila Germline via piRNA pathway by the Tudor domain protein Tejas. Curr Biol. 2010;20:724–30. https://doi.org/10.1016/j.cub.2010.02.046.
Zamparini AL, Davis MY, Malone CD, Vieira E, Zavadil J, Sachidanandam R, et al. Vreteno, a gonad-specific protein, is essential for germline development and primary piRNA biogenesis in Drosophila. Development. 2011;138:4039–50. https://doi.org/10.1242/dev.069187.
Leader DP, Krause SA, Pandit A, Davies SA, Dow JAT. FlyAtlas 2: a new version of the Drosophila melanogaster expression atlas with RNA-Seq, miRNA-Seq and sex-specific data. Nucleic Acids Res. 2018;46:D809–15. https://doi.org/10.1093/nar/gkx976.
Wong Miller KM, Bracewell RR, Eisen MB, Bachtrog D. Patterns of genome-wide diversity and population structure in the Drosophila athabasca species complex. Mol Biol Evol. 2017. https://doi.org/10.1093/molbev/msx134.
Alekseyenko AA, Ellison CE, Gorchakov AA, Zhou Q, Kaiser VB, Toda N, et al. Conservation and de novo acquisition of dosage compensation on newly evolved sex chromosomes in Drosophila. Genes Dev. 2013;27:853–8. https://doi.org/10.1101/gad.215426.113.
Nozawa M, Onizuka K, Fujimi M, Ikeo K, Gojobori T. Accelerated pseudogenization on the neo-X chromosome in Drosophila miranda. Nat Commun. 2016;7:13659. https://doi.org/10.1038/ncomms13659.
Seabra SG, Fragata I, Antunes MA, Faria GS, Santos MA, Sousa VC, et al. Different genomic changes underlie adaptive evolution in populations of contrasting history. Mol Biol Evol. 2018;35:549–63. https://doi.org/10.1093/molbev/msx247.
Meisel RP, Hilldorfer BB, Koch JL, Lockton S, Schaeffer SW. Adaptive evolution of genes duplicated from the Drosophila pseudoobscura neo-X chromosome. Mol Biol Evol. 2010;27:1963–78. https://doi.org/10.1093/molbev/msq085.
McDonald JH, Kreitman M. Adaptive protein evolution at the Adh locus in Drosophila. Nature. 1991;351:652–4. https://doi.org/10.1038/350055a0.
Welch JJ. Estimating the genomewide rate of adaptive protein evolution in drosophila. Genetics. 2006;173:821–37. https://doi.org/10.1534/genetics.106.056911.
Charlesworth J, Eyre-Walker A. The McDonald-Kreitman test and slightly deleterious mutations. Mol Biol Evol. 2008;25:1007–15. https://doi.org/10.1093/molbev/msn005.
Fuller ZL, Haynes GD, Richards S, Schaeffer SW. Genomics of natural populations: how differentially expressed genes shape the evolution of chromosomal inversions in Drosophila pseudoobscura. Genetics. 2016;204:287–301. https://doi.org/10.1534/genetics.116.191429.
Pang J-F, Gao J-J, Zhang Y-P, Watabe H-A, Aotsuka T. Molecular phylogeny of the Drosophila obscura species group, with emphasis on the Old World species. BMC Evol Biol. 2007;7:87. https://doi.org/10.1186/1471-2148-7-87.
Campbell CL, Black WC, Hess AM, Foy BD. Comparative genomics of small RNA regulatory pathway components in vector mosquitoes. BMC Genomics. 2008;9:425. https://doi.org/10.1186/1471-2164-9-425.
Longdon B, Wilfert L, Obbard DJ, Jiggins FM. Rhabdoviruses in two species of drosophila: vertical transmission and a recent sweep. Genetics. 2011;188:141–50. https://doi.org/10.1534/genetics.111.127696.
Rozhkov NV, Aravin AA, Zelentsova ES, Schostak NG, Sachidanandam R, McCombie WR, et al. Small RNA-based silencing strategies for transposons in the process of invading Drosophila species. RNA. 2010;16:1634–45. https://doi.org/10.1261/rna.2217810.
Pasyukova E, Nuzhdin S, Li W, Flavell AJ. Germ line transposition of the copia retrotransposon in Drosophila melanogaster is restricted to males by tissue-specific control of copia RNA levels. Mol Gen Genet. 1997;255:115–24. https://doi.org/10.1007/s004380050479.
Thomas JH, Schneider S. Coevolution of retroelements and tandem zinc finger genes. Genome Res. 2011;21:1800–12. https://doi.org/10.1101/gr.121749.111.
Jacobs FMJ, Greenberg D, Nguyen N, Haeussler M, Ewing AD, Katzman S, et al. An evolutionary arms race between KRAB zinc-finger genes ZNF91/93 and SVA/L1 retrotransposons. Nature. 2014;516:242–5. https://doi.org/10.1038/nature13760.
Levine MT, Van der Wende HM, Hsieh E, Baker E-CPEP, Malik HS, Wende Vander HM, et al. Recurrent gene duplication diversifies genome defense repertoire in Drosophila. Mol Biol. 2016;33:1–13.
Jaenike J. Sex chromosome meiotic drive. Annu Rev Ecol Syst. 2001;32:25–49. https://doi.org/10.1146/annurev.ecolsys.32.081501.113958.
Lindholm AK, Dyer KA, Firman RC, Fishman L, Forstmeier W, Holman L, et al. The ecology and evolutionary dynamics of meiotic drive. Trends Ecol Evol. 2016;31:315–26. https://doi.org/10.1016/J.TREE.2016.02.001.
Wen J, Duan H, Bejarano F, Okamura K, Fabian L, Brill JA, et al. Adaptive regulation of testis gene expression and control of male fertility by the drosophila harpin RNA pathway. Mol Cell. 2015;57:165–78. https://doi.org/10.1016/j.molcel.2014.11.025.
Mohammed J, Flynt AS, Panzarino AM, Mondal MMH, DeCruz M, Siepel A, et al. Deep experimental profiling of microRNA diversity, deployment, and evolution across the Drosophila genus. Genome Res. 2018;28:52–65. https://doi.org/10.1101/gr.226068.117.
Harumoto T, Anbutsu H, Lemaitre B, Fukatsu T. Male-killing symbiont damages host’s dosage-compensated sex chromosome to induce embryonic apoptosis. Nat Commun. 2016;7:12781. https://doi.org/10.1038/ncomms12781.
Kaessmann H. Origins, evolution, and phenotypic impact of new genes. Genome Res. 2010;20:1313–26. https://doi.org/10.1101/gr.101386.109.
Bachtrog D, Mahajan S. Massive gene amplification on a recently formed Drosophila Y chromosome. bioRxiv. 2018:451005. https://doi.org/10.1101/451005.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–10. https://doi.org/10.1016/S0022-2836(05)80360-2.
Thomas H. BioEdit: a user-firendly biological sequence alignment editor and analysis program for windows 95/95/NT. Nucleic Acids Symp Ser. 1999;41:95–8 doi:citeulike-article-id:691774.
St. Pierre SE, Ponting L, Stefancsik R, McQuilton P. FlyBase 102—advanced approaches to interrogating FlyBase. Nucleic Acids Res. 2014;42:D780–8. https://doi.org/10.1093/nar/gkt1092.
English AC, Richards S, Han Y, Wang M, Vee V, Qu J, et al. Mind the gap: upgrading genomes with Pacific biosciences RS long-read sequencing technology. PLoS One. 2012;7:e47768. https://doi.org/10.1371/journal.pone.0047768.
Zhou Q, Bachtrog D. Sex-specific adaptation drives early sex chromosome evolution in Drosophila. Science (80- ). 2012;337:341–5. https://doi.org/10.1126/science.1225385.
Drosophila 12 Genomes Consortium AG, Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, et al. Evolution of genes and genomes on the Drosophila phylogeny. Nature. 2007;450:203–18. https://doi.org/10.1038/nature06341.
Palmieri N, Kosiol C, Schlötterer C. The life cycle of Drosophila orphan genes. Elife. 2014;3:e01311. https://doi.org/10.7554/eLife.01311.
McGaugh SE, Heil CSS, Manzano-Winkler B, Loewe L, Goldstein S, Himmel TL, et al. Recombination modulates how selection affects linked sites in Drosophila. PLoS Biol. 2012;10:e1001422. https://doi.org/10.1371/journal.pbio.1001422.
Webster CL, Waldron FM, Robertson S, Crowson D, Ferrari G, Quintana JF, et al. The discovery, distribution, and evolution of viruses associated with Drosophila melanogaster. PLoS Biol. 2015;13:e1002210. https://doi.org/10.1371/journal.pbio.1002210.
Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011;29:644–52.
Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat Methods. 2014;12:59–60. https://doi.org/10.1038/nmeth.3176.
Nurk S, Bankevich A, Antipov D, Gurevich A, Korobeynikov A, Lapidus A, et al. Assembling genomes and mini-metagenomes from highly chimeric reads. In: Research in computational molecular biology. RECOMB 2013. Lecture notes in computer science. Berlin, Heidelberg: Springer; 2013. p. 158–70. https://doi.org/10.1007/978-3-642-37195-0_13.
Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22:4673–80. https://doi.org/10.1093/nar/22.22.4673.
Drummond AJ, Rambaut A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol. 2007;7:214. https://doi.org/10.1186/1471-2148-7-214.
Rambaut A. Tracer v1.6. http://tree.bio.ed.ac.uk/software/tracer/. 2013.
R Core Team. R: A Language and Environment for statistical Computing. 2017. https://www.r-project.org.
Chen Z-X, Sturgill D, Qu J, Jiang H, Park S, Boley N, et al. Comparative validation of the D. melanogaster modENCODE transcriptome annotation. Genome Res. 2014;24:1209–23. https://doi.org/10.1101/gr.159384.113.
VanKuren NW, Vibranovski MD. A novel dataset for identifying sex-biased genes in Drosophila. J Genomics. 2014;2:64–7. https://doi.org/10.7150/jgen.7955.
Nyberg KG, Machado CA. Comparative expression dynamics of Intergenic long noncoding RNAs in the genus Drosophila. Genome Biol Evol. 2016;8:1839–58. https://doi.org/10.1093/gbe/evw116.
Gomes S, Civetta A. Hybrid male sterility and genome-wide misexpression of male reproductive proteases. Sci Rep. 2015;5:11976. https://doi.org/10.1038/srep11976.
Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10:R25. https://doi.org/10.1186/gb-2009-10-3-r25.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25:2078–9. https://doi.org/10.1093/bioinformatics/btp352.
Hadfield JD. MCMC methods for multi-response generalized linear mixed models: the MCMCglmm R package. J Stat Softw. 2010;33:1–22. https://doi.org/10.18637/jss.v033.i02.
Smukowski Heil CS, Ellison C, Dubin M, Noor MAF. Recombining without hotspots: a comprehensive evolutionary portrait of recombination in two closely related species of Drosophila. Genome Biol Evol. 2015;7:2829–42. https://doi.org/10.1093/gbe/evv182.
Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, del Angel G, Levy-Moonshine A, et al. From fastQ data to high-confidence variant calls: the genome analysis toolkit best practices pipeline. Curr Protoc Bioinforma. 2013; SUPL;43:11.10.1–11.10.33. https://doi.org/10.1002/0471250953.bi1110s43.
Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27:2987–93. https://doi.org/10.1093/bioinformatics/btr509.
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–8. https://doi.org/10.1093/bioinformatics/btr330.
Scheet P, Stephens M. A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and Haplotypic phase. Am J Hum Genet. 2006;78:629–44. https://doi.org/10.1086/502802.
Librado P, Rozas J. DnaSP v5: a software for comprehensive analysis of DNA polymorphism data. Bioinformatics. 2009;25:1451–2. https://doi.org/10.1093/bioinformatics/btp187.
Li WH. Unbiased estimation of the rates of synonymous and nonsynonymous substitution. J Mol Evol. 1993;36:96–9. https://doi.org/10.1007/BF02407308.
We thank Billy Palmer for initial discussion on the variant calling, and Billy Palmer and Samuel Lewis for comments on an earlier version of this manuscript. We thank the many people who made their published data publicly available, and Shu Kondo for permission to use unpublished data from D. bifasciata.
DC was financially supported by a Master’s Training Scholarship from the Indonesian Endowment Fund for Education (LPDP) and the University of Edinburgh School of Biological Science Bursary for MSc in Quantitative Genetics and Genome Analysis. The funding body had no role in the design of the study or in the collection, analysis, or interpretation of data, or in the writing of the manuscript.
Availability of data and materials
Fasta alignment (both for phylogenetic and MK analysis) and raw expression data are available via Figshare (DOI: https://doi.org/10.6084/m9.figshare.7145720).
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Table S1. The detailed RNAi genes and its duplicate in D. pseudoobscura. The genomic position is based on the D. pseudoobscura assembly 3.0. (XLSX 10 kb)
Figure S1. The expression profile of RNAi across tissue. The normalized expression plotted across tissues. The error bars denote the standard error for the given tissue. Blue bar indicates male and female is indicated by red bar. The plot is shown for D. pseudoobscura (n = 163), D. miranda (n = 42) and D. obscura (n = 34), respectively. (PDF 131 kb)
Table S2. MK test of duplicated RNAi genes in D. pseudoobscura-D. miranda. Ds is synonymous divergence, Dn is non-synonymous divergence, Pn is the non-synonymous polymorphisms, Ps is the synonymous polymorphisms, α represents proportion of substitutions that are adaptive, a is the absolute number of adaptive substitutions. Ln is the number of non-synonymous sites, Ls is the number of synonymous sites. Ka is the number of non-synonymous mutations per non-synonymous sites and Ks the number of synonymous mutations per synonymous sites from single randomly chosen strain for each species (Li, 1993  calculated using R package seqinr). Parameter ωa is identical to Ka/Ks ratio except that the numerator only takes adaptive divergence (α * Dn)/Ln)/(Ds/Ls). (XLSX 10 kb)
Table S3. Additional MK test analysis. Table the results of additional MK test analysis where minor allele frequency is removed, repetition with larger dataset and results in D. melanogaster. (XLSX 12 kb)
Figure S2. The heat-map p-value of the difference between pairwise posterior distributions. Large p-value (> 0.05) indicates the overlapping distribution and the duplication time might be shared (blue box). Red boxes denote comparison with p-value < 0.05 which indicate non-overlapping posterior distribution and an asynchronous duplication event. Pink colored box indicates the marginally significant (0.01 < p-value < 0.05). (PDF 221 kb)