Selection shapes turnover and magnitude of sex-biased expression in Drosophila gonads

Background Sex-biased gene expression is thought to drive the phenotypic differences in males and females in metazoans. Drosophila has served as a primary model for studying male-female differences in gene expression, and its effects on protein sequence divergence. However, the forces shaping evolution of sex-biased expression remain largely unresolved, including the roles of selection and pleiotropy. Research on sex organs in Drosophila, employing original approaches and multiple-species contrasts, provides a means to gain insights into factors shaping the turnover and magnitude (fold-bias) of sex-biased expression. Results Here, using recent RNA-seq data, we studied sex-biased gonadal expression in 10,740 protein coding sequences in four species of Drosophila, D. melanogaster, D. simulans, D. yakuba and D. ananassae (5 to 44 My divergence). Using an approach wherein we identified genes with lineage-specific transitions (LSTs) in sex-biased status (amongst testis-biased, ovary-biased and unbiased; thus, six transition types) standardized to the number of genes with the ancestral state (S-LSTs), and those with clade-wide expression bias status, we reveal several key findings. First, the six categorical types of S-LSTs in sex-bias showed disparate rates of turnover, consistent with differential selection pressures. Second, the turnover in sex-biased status was largely unrelated to cross-tissue expression breadth, suggesting pleiotropy does not restrict evolution of sex-biased expression. Third, the fold-sex-biased expression, for both testis-biased and ovary-biased genes, evolved directionally over time toward higher values, a crucial finding that could be interpreted as a selective advantage of greater sex-bias, and sexual antagonism. Fourth, in terms of protein divergence, genes with LSTs to testis-biased expression exhibited weak signals of elevated rates of evolution (than ovary-biased) in as little as 5 My, which strengthened over time. Moreover, genes with clade-wide testis-specific expression (44 My), a status not observed for any ovary-biased genes, exhibited striking acceleration of protein divergence, which was linked to low pleiotropy. Conclusions By studying LSTs and clade-wide sex-biased gonadal expression in a multi-species clade of Drosophila, we describe evidence that interspecies turnover and magnitude of sex-biased expression have been influenced by selection. Further, whilst pleiotropy was not connected to turnover in sex-biased gonadal expression, it likely explains protein sequence divergence. Electronic supplementary material The online version of this article (10.1186/s12862-019-1377-4) contains supplementary material, which is available to authorized users.

The RNA-seq data listed were used for expression analyses with respect to tissue type, noting that when more than one RNA-seq dataset was available, the largest sample was used for study for parallel data sets in all contrasts. See Methods and Text File S5 for more details on RNA-seq datasets and analyses.           Table S4. Only genes studied with dN and dS<1.5 and dS>0.001 were included, and thus these are conservative estimates.

Text File S2: Fold Bias
For testis-biased genes, the largest percentage belonged to the ≥10 fold-biased class (>59% of each testis-biased gene set), whereas the majority of ovary-biased genes were contained within the ≥2-5 fold category for each of the four species. This indicates that testis-biased gene expression is skewed toward higher fold-bias than ovary-biased expression (Table S8; Chi 2 -tests of the ≥2-5 fold and of the ≥10-fold bias classes between testis-and ovary-biased genes, P<0.0001 per species).
As shown in Fig. S2, we found that an increase in fold testis-biased expression was associated with elevated expression in the testis, with the highest expression observed in the ≥10-fold class for each of four species (Ranked ANOVA and Dunn's paired contrast P<0.05 Fig. S2A-D). These high Our findings concur with those of our previous study on A. aegypti, wherein fold-bias in gonad expression was correlated with changes in transcript levels in both sexes [5]. Further interspecies gonadal data from a wider range of insect genera will help to determine the generality of these patterns.

Text File S3: Functions of genes with conserved testis-specific expression in all four species
Genes with conserved testis-specific expression across all four species are of particular interest as their conserved specificity implies that there has been a long-term selective advantage of exclusive expression in the male gonad. We thus conducted to GO analysis to assess the function of the universally testis-specific genes (N=171) and found that genes with the highest enrichment scores were are involved ATPase activity and membrane functions (Table S9). This is consistent with findings from mice showing that ATPase activity affects male fertility by regulating normal spermatogenesis and apoptosis [6], and from D. melanogaster showing that testis-biased genes contain an overrepresentation of genes involved in ATP biosynthesis [7].

Text File S4: Minimal differences in positive selection between testis-and ovary-biased genes
To further evaluate the role of pleiotropy on dN/dS, particularly with respect to the other feasible hypothesis of adaptive evolution, we assessed positive selection for the universally testis-biased, ovarybiased and unbiased genes using maximum likelihood codon "sites" analyses in PAML [3]. For this, we compared the models M7 versus M8. The former model permits negative selection and neutral evolution, and the latter additionally allows for positive selection at codon sites [3]. As positive selection is highly sensitive to alignments and divergence level, we studied only the subset of genes with dN and dS <1.5 and dS>0.001 in all of the four species, including in the divergent species Dana, making this analysis highly conservative. As shown in Table S10, using genes matching these criteria, we found signatures of positive selection in 12.4%, 10.2% and 9.5% of genes that were universally testis-biased, ovary biased and unbiased respectively (P<0.05 for 2∆lnL). These signatures were consistent with mildly, but not statistically significantly, higher frequency of genes with positive selection in the testisbiased gene set (Chi 2 P>0.05 for paired contrasts).
For additional rigor, we examined positive selection tests for the melanogaster group available from flyDIVaS [4,8]. This database includes two additional species in addition to the four examined here (D. sechellia and D. erecta), and should be an indicator of the proportion of genes evolving adaptively in this clade. Using that database (and genes with successful orthology calls and appropriately unsaturated alignments therein), we found signatures of positive selection in 16.0, 13.6, and 10.8% of the universally testis-biased, ovary-biased and unbiased genes respectively. For each of these categories, the proportion of genes with such signatures was higher than our calculated estimates based on our four study species, consistent with inclusion of more species. Nonetheless, whilst the group of universally testis-biased genes had more instances of positive selection than the comparable ovarybiased genes, only its comparison to unbiased genes was statistically significant (Chi -2 P<0.05). Further, the actual difference between testis-biased and ovary-biased genes was again small (net difference in percentages was <2.5%). Together, these data suggest that while positive selection is more common overall in testis-biased than ovary-biased genes, this difference only affects a small subset of genes (difference ≤2.5% using either approach). Extensive positive selection has been thought to be characteristic of male-sexual genes for Drosophila [9][10][11], but this has not always been found in this taxon [12], nor in other organisms [5]. Our results suggest a weak effect for the universally testis-biased genes examined here.
In order to test for adaptive evolution accompanying transitions to sex-biased expression, we examined those genes exhibiting LSTs from unbiased to sex-biased status, that is unb-ts and unb-ov (Table 1), using branch-site analyses [3], where the species branch containing the LST was chosen as the foreground (tested) branch for each gene [3]. We report that branch-site positive selection was observed for both types of transitions in each species, with a low of 5.4% for genes with unb-ts LSTs in Dmel up to 27.2% for unb-ts LSTs in the comparatively much longer branch Dana. However, there were minimal differences in the proportion of unb-ts and unb-ov genes exhibiting positive selection within each species terminal branch. Specifically, the net differences observed between unb-ts and unb-ov were less than 2.7% between the two LST categories for each species, with unb-ov having mildly higher values than unb-ts for Dmel, Dsim and Dyak and the opposite trend detected for Dana. This analysis thus indicates no evidence of greater instances of positive selection for LSTs to testis-biased expression than to ovary-biased expression (Table S11). In sum, a mildly elevated level of positive selection was observed for universally testis-biased genes, and no effect for LSTs, suggesting that the strong and persistent rapid evolution of testis-biased genes observed here (Figs. 5, S3) is better explained by low pleiotropy (Fig. 6) than by adaptive sequence evolution.

Text File S5: The current approach and RNA-seq data
The RNA-seq data in Table S1 used for the present study comprise deep sequence datasets, with >42 million paired-end reads per sample/tissue, which were utilized to generate FPKM per gonadal tissue per species. Use of deep large-scale and single RNA-seq samples (per tissue) has been repeatedly employed to comparatively study tissue expression in metazoans [13][14][15][16], including Drosophila studies based on public data such as Drosophila modENCODE [15,17] and sex-biased gene expression [1,14]; the latter is facilitated by very high correlations observed among replicates of sex-related samples (see below, e.g., [18]). In a similar manner, single expressed-sequence tags (ESTs) datasets (per tissue) have been repeatedly and effectively used to characterize and compare expression within and among tissues, including sexual tissues [7,[19][20][21][22][23][24].
In our assessment, we determined testis and ovary expression levels using mapping of the deep RNA-seq samples (Table S1) to species-specific CDS using Geneious 11.0.3 [25] as outlined in Methods. Very strong correlations were found across all 10,770 genes in ovary FPKM (Spearman's R=0.93, P<2X10 -7 ) and in testis FPKM (R=0.87, P<2X10 -7 ) between the two most closely related species Dmel and Dsim (Fig. 2B) affirming the effectiveness of these large-scale read samples for precisely quantifying sex-related expression profiles (Fig. 2B) (see also below).
While Dmel and Dana RNA-seq had unitary samples in the available gonadal data sets [1], Dsim and Dyak had two or three replicates (that were provided at the SRA); for comparability of sampling per species, we used the single or the largest RNA-seq dataset that was available per taxon for study (Table   S1). For additional stringency, we wished to assess FPKM for a species using replicated data for comparison to our results obtained using data from Table S1. For this assessment, we downloaded the second largest ovary and testis RNA-seq sets for Dsim, as LSTs in that branch occurred most recently (5 My, than Dyak). Identifiers for ovary and testis replicates used for Dsim at the SRA database are SRR1511608, SRR1548741; the testes RNA-seq sample had 14 million fewer paired-reads than replicate 1 (Table S1) while ovaries had similar read counts (less than 300,000 fewer reads than in Table   S1).
We found FPKM values using the two RNA-seq data sets for the ovaries and for the testes were each very strongly correlated in Dsim. Specifically, the Spearman's ranked R for FPKM between replicates across all 10,740 genes was 0.96 for the ovaries and was 0.94 for testes, P<2X10 -7 .
Furthermore, the ratio of testis:ovary expression, measured as log2 (testis FPKM/ovary FPKM), across all genes was also strongly correlated R=0.86, P<2X10 -7 between the two replicates. Thus, FPKM values in testes and in ovaries were remarkably consistent between the two replicates in Dsim, as was the degree of gonadal sex-biased expression (fold testis:ovary bias), affirming very high reproducibility of each of the single sex-related gene sets.
We next conducted differential expression analyses using P-values from Deseq2 [26] and the two replicates (average values per tissue, FPKM ≥1 in one tissue type) for testes and for ovaries in Dsim (denoted as Approach 2), and compared the findings to our main results in our study (denoted as Approach 1; Fig. 1). We found that the vast majority of genes, 88.9%, had the same categorical SBS using the Approach 1 in our main study (Table S1) and using Approach 2 of the replicated datasets.
Most of the variation, 8.9%, between the two approaches was, as expected, from genes near the threshold of sex-biased (2 to 5 fold) and unbiased status. No genes, that is 0%, had opposite (sex-bias) calls between testis-biased and ovary-biased status between the two methods (Table S1). Thus, using replicated samples and Deseq2 resulted in a good agreement with the main analyses; nonetheless, the former more conservative approach tended to classify more genes near the cutoff of sex-biased expression as unbiased.
Finally, we compared our classifications of LSTs using data in Table S1 to that obtained using the replicated testis-ovary datasets. Using the more conservative Approach 2, we found that the majority of Dsim genes with LSTs (75.1%) retained the same categorical LST status as found in our main study (Approach 1; Table S4). However, a subset of genes exhibited their ancestral state under Approach 2 (values of 0, 0, 3.7, and 17.3% of the observed LSTs for ov-ts, ts-ov ts-unb, ov-unb respectively), particularly for unb-ov and unb-ts (48.3 and 22.4% per transition type). This is consistent with more conservative calls (under Approach 2) near the sex-biased expression threshold, and particularly for unbov as those LSTs had had lower fold-bias than unb-ts (see Fig. 4; note that sex-bias status changes are sensitive to methods and cutoffs, the latter of which are in-effect arbitrary [27]; thus the same approach should be employed for all contrasts, as conducted in our main study). Excluding all genes with any variation in SBS between the two approaches, that is using the largest RNA-seq dataset in Table S1 and using replicated (averaged expression) and Deseq2 analyses, still yielded a more than three-fold higher level of ts-ov than ov-ts (one-sided Chi 2 P=0.025) and a markedly lower rate of reversals in SBS than gains/losses (at least 6-fold for each type of contrast, P<0.0001). Thus, both approaches yielded similar patterns as observed in Table 2 and Table S4, indicating the results are robust to the method employed.
Using results from the RNA-seq data in our study (Table S1), we found strong correlations in gonad (for testis and for ovary) expression (FKPM) across all genes per sex between Dmel and Dsim