Population size may shape the accumulation of functional mutations following domestication

Background Population genetics theory predicts an important role of differences in the effective population size (Ne) among species on shaping the accumulation of functional mutations by regulating the selection efficiency. However, this correlation has never been tested in domesticated animals. Results Here, we synthesized 62 whole genome data in eight domesticated species (cat, dog, pig, goat, sheep, chicken, cattle and horse) and compared domesticates with their wild (or ancient) relatives. Genes with significantly different selection pressures (revealed by nonsynonymous/synonymous substitution rate ratios, Ka/Ks or ω) between domesticated (Dω) and wild animals (Wω) were determined by likelihood-ratio tests. Species-level effective population sizes (Ne) were evaluated by the pairwise sequentially Markovian coalescent (PSMC) model, and Dω/Wω were calculated for each species to evaluate the changes in accumulation of functional mutations after domestication relative to pre-domestication period. Correlation analysis revealed that the most recent (~ 10.000 years ago) Ne(s) are positively correlated with Dω/Wω. This result is consistent with the corollary of the nearly neutral theory, that higher Ne could boost the efficiency of positive selection, which might facilitate the overall accumulation of functional mutations. In addition, we also evaluated the accumulation of radical and conservative mutations during the domestication transition as: Dradical/Wradical and Dconservative/Wconservative, respectively. Surprisingly, only Dradical/Wradical ratio exhibited a positive correlation with Ne (p < 0.05), suggesting that domestication process might magnify the accumulation of radical mutations in species with larger Ne. Conclusions Our results confirm the classical population genetics theory prediction and highlight the important role of species’ Ne in shaping the patterns of accumulation of functional mutations, especially radical mutations, in domesticated animals. The results aid our understanding of the mechanisms underlying the accumulation of functional mutations after domestication, which is critical for understanding the phenotypic diversification associated with this process. Electronic supplementary material The online version of this article (10.1186/s12862-018-1120-6) contains supplementary material, which is available to authorized users.


Background
In one of his major works, The Variation of Animals and Plants Under Domestication, Darwin observed that domestication is the process during which striking phenotypic variation burgeons [1]. Much later, it has been suggested that the diversification of phenotypic variation in domesticated species might be attributed to the faster accumulation of functional (nonsynonymous) variants [2][3][4]. However, genome-wide patterns of accumulation of functional mutations pre-and post-domestication across different domesticated species are still poorly understood. According to the population genetics theory, fates of genetic variations may lie in the coupled effect of changes in selection intensities and effective population sizes (N e ) [5]. Against this backdrop of theoretical prediction, it is reasonable to evaluate the relative efficiency of mutation accumulation before and after domestication in the context of both selection and N e .
It has been theorized that selection may act upon domesticates in a stage-dependent manner [3,6]. More specifically, domestication may begin with an unintentional process (unconscious selection), characterized by the relaxation of selection constraints vital in wild environments, alongside the introduction of novel selection forces [7]. These early shifts in selection constraints may have contributed to the emergence of domesticationfacilitating traits, such as increased docility and tameness, which are thought to be prerequisites for the whole domestication process [8]. Although the early (unconscious) domestication began thousands of years ago, deliberate human selection is a process that emerged within the recent three centuries through intensive breeding, which has led to rapid improvement of desirable traits and creation of most modern breeds [9,10]. In addition to selection, another critical factor influencing the accumulation of mutations is the changes in N e . Unlike selection, the effect of population size on genome evolution is independent of specific domestication episodes. Once domestic populations formed and became isolated from their wild relatives, genetic drift, characterized by diminished N e , came to influence the domestication processes [11]. In this sense, although the eye-catching feature of domestication is selection itself, it (selection) has to "dance with shackles on" as the end results of selection are largely shaped within the frame of lineage N e .
Quantitatively, the magnitude of selection is commonly measured by the ratio of the number of nonsynonymous substitutions per nonsynonymous site (Ka) to the number of synonymous substitutions per synonymous site (Ks) -Ka/Ks (orω); where ω < 1 indicates purifying selection,ω = 1 -neutral selection, andω > 1 -positive selection [12][13][14]. Variations in selection strength may tune the amount of mutations: studies have found that domesticated animals accumulate functional mutations in some mitochondrial genes much faster than their wild relatives, in part due to the relaxed purifying selection [2,[15][16][17]. However, apparently, relaxation of purifying selection is only one of the possible directions of changes in selection strength, especially for nuclear genomes. In this synthesized study, based on all arithmetic possibilities of Ka/Ks changes following the domestication, we have identified six possible directions of selection pressure dynamics (also referred to as "selection dynamics" in this study) in domesticated animals: relaxed purifying selection (RPR; 1 > D ω > W ω ), intensified purifying selection (IPR; 1 > W ω > D ω ), intensified positive selection (IPS; D ω > W ω > 1), relaxed positive selection (RPS; W ω > D ω > 1), positive selection transition (PST; W ω < 1; D ω > 1), and purifying selection transition (PRT; W ω > 1; D ω < 1). In this way, we can trace the changes of accumulation of mutations from a broader perspective of selection dynamics.
Although the role of selection in determining the accumulation or even fixation of functional mutations has always been one of the focal points in evolutionary biology, the efficiency of selection is believed to be largely influenced by demographic changes in N e [5]. On the species level, different N e may be the key factor influencing the overall efficiency of selection. For example, primates (humans and orangutans) have 1.5 times higher Ka/Ks than rodent (mice and rats), probably due to differences in N e [14]. Likewise, human genomes may exhibit lower levels of both purifying and positive selection than chimpanzee genomes, probably owing to a smaller N e in humans [18]. It has been suggested that changes in the N e may have influenced the efficiency of selection for functional mutations from the very beginning of domestication [3,19]. However, the relationship between N e and accumulation of functional mutations has never been formally tested. In addition, differences in patterns of mutation accumulation of nuclear genes among domesticated species remain poorly understood. In this study, we have compared ω ratios, conservative mutations, and radical mutations among the eight domesticated species (pigs, dogs, goats, sheep, chicken, cats, cattle and horses) for which a relatively large amount of genomic data is available. These domesticated animals may serve as an appropriate model to understand how differences in the N e have influenced the efficiency of selection on the accumulation of mutations.

Datasets
We used genomic data from both domesticated animals and their progenitors to compare their differences. In total, 62 whole-genome datasets, including 36 genome assemblies and 36 genomic re-sequencing short reads (SRA), were downloaded from the NCBI database. To increase the reliability of variant calling, only the resequencing data with a comparatively high read depth in the current database (ranging from 6.23× to 57.31×) were included (Additional file 1). Since resequencing genomic data usually do not have publicly available gene annotations, we designed a "reference-mapping assembly" strategy to facilitate local annotation, by incorporating reference CDS and gene annotations (gtf or gff ), which were retrieved from the Ensembl database [20]. The "gff" file for goat (Table 1) was obtained from GigaDB database (http://gigadb.org/) [21,22].

Reference-mapping assembly and CDS extraction
Given that some of the species included in the analysis (cattle, dog, sheep, goat, cat, horse and chicken) have only one or two publicly available genome assemblies and gene annotations, a reference-mapping assembly approach was used to generate genome sequences of all downloaded short reads [24]. Sequencing adaptors removal and data quality control were performed using Trimmomatic-0.35 and FastQC [25,26]. Reads were discarded if Phred quality was < 20 over three consecutive base pairs (bp) and shorter than 40 bp. Referencemapping was carried out with Bowtie2, with a very sensitive alignment setting following suggestions in the manual [27]. After sorting the aligned bam files by Samtools v1.1, other tools in this package, including mpileup, bcftools and vcfutils, were invoked to produce the target genomes [28,29]. These target genomes were then used to extract the corresponding CDS sequences with gKaKs v1.3 program by incorporating both CDS and gene annotation (gff or gtf ) of public reference [30].

The dynamics of selective pressure
To calculate phylogeny-based Ka/Ks (ω) for each CDS in each individual of a species, we generated phylogenies and alignments as input files. Non-homologous sequences, multiple frame shift mutations and early stop codons were deleted by BLAT [31] and bl2seq [32]. In total, the number of sequence alignments varied from 14,441 in chicken to 23,019 in dog (Fig. 1). We produced the phylogeny-aware alignments by invoking the codon model in PRANK [33]. We randomly selected 1000 alignments of orthologs (determined using 1:1 orthologs from the BioMart database [34]) with at least 1 k bp to compute the maximum likelihood gene trees using RAxML [35], with 100 fast bootstrap replicates. Based on these gene trees, we used STAR [36] to infer phylogenetic trees for all studied species (Additional file 2).
Based on these alignments and phylogenetic trees, we calculated Ka/Ks ratios for two types of branches, "wild" and "domesticates", using PAML [37]. To further determine the significance level between them, we used "likelihood ratio test" (LRT) to compare two models, "two-ratios model" (TRM) and "one-ratio model" (ORM), with the chi-square approximation. TRM hypothesis assumes a different ratio between domesticates and wild branches, whereas ORM assumes the same ω for all branches. For the Ka/Ks ratios with extreme values, where only nonsynonymous or only synonymous mutations were detected, we kept them only if they were statistically significant [14]. For both domesticated and wild/ancient branches, the mean ω values of significantly different genes (Table 1, Fig. 2) were measured. In addition, accumulation levels of functional mutations at post-domestication stages relative to pre-domestication stages were compared using a novel metric: D ω /W ω , where D is a domesticated group and W a wild group (Fig. 3a). Since annotation artefacts of reference genomes equally affect domesticated and wild groups, this metric has the advantage of avoiding biases caused by different annotations. Selection dynamics types (see Introduction) were identified based on all arithmetic possibilities of Ka/Ks changes ( Table 2). By doing so, we can examine (a) whether the overall proportion of nonsynonymous mutations has increased after domestication in different species; and (b) whether these changes might be due to specific type(s) of selection dynamics.

Conservative and radical functional changes
It has been suggested that a high frequency of nonsynonymous mutations can lead to an increased ratio of radical mutations [38]. Here we categorized and compared radical and conservative nonsynonymous mutations based on physiochemical properties of proteins, such as amino acid charge, polarity and volume [39]. Conservative mutations are the changes wherein proteins retain similar physiochemical properties, whereas radical mutations are the ones with radical changes in physiochemical features of proteins. We evaluated the occurrence of radical and conservative changes per lineage based on a previously proposed method [39]. Subsequently, the changes after domestication relative to before domestication were calculated by two metrics: D conservative /W conservative and D radical /W radical (Fig. 3a). Significance tests were performed using G-test (Fig. 4), and Pearson correlation analysis was conducted to evaluate whether the patterns of D ω /W ω, D conservative / Number of genomes analyzed per species, all transcripts used, transcripts with statistically significant differences, and genome-wide Ka/Ks ratios in domesticates ("D") and their wild relatives ("W") W conservative and D radical /W radical across different species might be correlated with N e (Fig. 3b, c, d).

Results and discussion
Previous studies have observed faster accumulation of non-silent mutations following domestication in some animals, including dog, pig, yak and chicken, which is believed to be a consequence of a decrease in N e associated with domestication and relaxation of purifying selection on mitochondrial genes in some domesticated species [2,16,17], but the debate on whether all domesticated animals exhibit a consistent trend is still ongoing [40]. Considering large N e differences across species, it would be very interesting to investigate whether the Fig. 1 Historical demography of the eight species. Generation time and mutation rate are based on previous reports [52][53][54][55][56][57] accumulation of functional mutations post-and predomestication might exhibit interspecific heterogeneities. Although more than 14,441 transcripts were analyzed for each species, LRT detected less than 600 genes exhibiting significant (p < 0.05) differences between domesticates and their wild or ancient relatives (Table 1,  Additional file 3). Intriguingly, two opposite patterns were observed among the significant Ka/Ks for the eight species: dog, cat, pig, goat, sheep and chicken have higher ω in domesticates than in their corresponding wild relatives, whereas in horse and cattle this trend is reversed ( Fig. 2 and Table 1). This pattern was further confirmed by D ω /W ω ratio (Fig. 3a), which was used to avoid biases caused by putative differences in the level of annotation among species. These ratios are lower than 1 in cattle and horse but higher than 1 in the other six species. Interestingly, studies have revealed that, in comparison to small animals, large animals exhibit higher levels of slightly deleterious mutations, which may lead to population decline or even extinction [41]. In this study, the ancestors of the two largest animals, cattle and horse, exhibited the highest ω ratios among the wild relatives of the eight studied species, indicating that they had accumulated the highest amount of (slightly) deleterious mutations.
To evaluate the functional effects of nonsynonymous mutations, we categorized them as radical or conservative. We found that the numbers of radical mutations are universally higher than the numbers of conservative mutations in both domesticates and their progenitors for all eight species (Fig. 4). When we further compared D conservative /W conservative and D radical /W radical ratios across species, we observed that both of these metrics are lower than 1 for cattle and horse and higher than 1 for the remaining six species (Fig. 3a). Hence, these results suggest that domesticates may not share a common trend in terms of the accumulation of non-silent mutations.
Interestingly, it seems that the most parsimonious distinction between the two groups of species revealed from our results is the body-mass, as cattle and horse a b c d Fig. 3 Selection pressure, accumulation of radical and conservative mutations after domestication relative to before domestication. a D ω /W ω, D radical /W radical and D conservative /W conservative ratios of the eight species. All significantly different genes were incorporated. Values shown in the horizontal axis are raw data for body mass of the eight species based on previous studies [45]. b The Pearson correlation between D ω /W ω and the most recent N e (~10,000 years ago). c The Pearson correlation between D conservative /W conservative and the most recent N e (~10,000 years ago). d The Pearson correlation between D radical /W radical and the most recent N e (~10,000 years ago) have more than three times higher body-mass than any of the remaining six species (Fig. 3a). Thus, for the sake of convenience, we can term the two groups as "LD" (large body-mass domesticates, including cattle and horse) and "SD" (small body-mass domesticates, including cat, dog, pig, goat, sheep, and chicken). In the field of evolutionary genetics, body-mass (or generation time) has usually been used as a proxy for N e due to the inverse relationship between them [42][43][44][45][46][47].
To analyze how different N e may be involved in affecting the selection efficiency, we evaluated their N e (s) using the PSMC method. Since this method can only infer historical demography on an ancient timescale (~10,000 years ago) [23], it is appropriate for comparisons of long-term interspecies differences in N e . PSMC analysis revealed that LD species have lower most recent (~10,000 years ago) N e than SD species (Fig. 1). This difference in N e is roughly consistent with the predicted negative relationship between body-mass (or generation time) and N e [42][43][44][45][46][47] (Fig. 3a). Pearson correlation revealed that D ω /W ω is significantly correlated with N e (s) in all eight species (p < 0.05, Fig. 3b), which is consistent with theoretical population genetics expectations. Nearly neutral theory suggests that the effect of selection depends on the product of the effective population size N e and selection coefficient s (N e s) [5,48]. Later, the relationship between ω, N e and selection coefficient (s) has been formulated as ω ¼ S 1−e −S ; where S = 4N e s [49]. According to the formula, to achieve the same changes in ω, species with smaller N e would have to undergo much bigger changes in selection coefficient. In other words, selection is expected to be inefficient in species with small N e , either when it comes to accumulation of beneficial functional mutations (through positive selection), or to removal of deleterious functional mutations (by purifying selection). In contrast, selection efficiency would be higher in populations with higher N e [50]. Thus, the main factor contributing to the lower Ka/Ks after domestication in LD species, observed in this study, might be their lower N e (s), which resulted in lesser efficiency of positive selection to accumulate beneficial mutations. We also observed a positive relationship between D radical /W radical and N e (Fig. 3d), which suggests that higher N e might also  Fig. 4 Numbers of radical and conservative mutations in domesticates and corresponding extant or ancient wild relatives. Stars above the horse indicate a significant difference (G-test, p = 0.048) between radical and conservative mutations. Note: red and blue bars represent the numbers of conservative and radical mutations per lineage, respectively; to save space, they have been partially overlapped. "D" represents domesticated species and "W" represents wild species drive a faster accumulation of radical mutations, as a result of a more efficient positive selection. This conclusion was further confirmed by our selection dynamics analysis (Table 2), where we found that the SD species with higher N e have more genes under higher positive selection (PST + IPS). Thus, our results revealed that under the frame of higher N e the efficiency of positive selection may be promoted at the post-domestication stage. Taken together, we detected a positive relationship between the interspecies variation in N e and the tempo of accumulation of functional mutations, indicating the existence of interspecific heterogeneity in the efficiency of selection. It is worth noting that, since our study was only limited to protein-coding regions, future efforts should be made to explore how the effects of regulatory elements might be influenced by population parameters, considering their important roles in domestication [51].

Conclusions
Animal domestication presents a unique opportunity to study how the joint effects of selection and drift influence the accumulation of mutations, especially on a genome-wide scale. Rapidly-increasing amount of available genomic data offers us an opportunity to explore whether the differences in interspecific demography might result in different rates of accumulation of functional mutations, as predicted by theoretical population genetics. In this study, we found that D ω /W ω and D radical /W radical are positively correlated with the species-level effective population sizes. Our results suggest that the impact of N e on the accumulation of functional (including radical) mutations might be underestimated, and emphasize the importance of maintaining a large population size for strengthening the efficiency of selection in animal breeding.

Additional files
Additional file 1: The data for 62 genomes from eight species. Data used in the study, including species, status (domestic or wild, where breed is indicated for domestic animals), accession ID with the corresponding reference, and read depth. Abbreviations D conservative /W conservative : The conservative changes after domestication relative to before domestication; D radical /W radical : The radical changes after domestication relative to before domestication; D ω : Ka/Ks of domesticated species or animals; D ω /W ω : Ka/Ks of domesticated species or animals relative to that of wild species or animals; IPR: Intensified purifying selection (1 > W ω > D ω ); IPS: Intensified positive selection (D ω > W ω > 1); Ka/Ks: The ratio of the number of nonsynonymous substitutions per nonsynonymous site (Ka) to the number of synonymous substitutions per synonymous site (Ks); LD: Large body-mass domesticates, including cattle and horse; LRT: Likelihood ratio test; N e : Effective population sizes; ORM: One-ratio model; PRT: Purifying selection transition (W ω > 1; D ω < 1); PSMC: Pairwise sequentially Markovian coalescent; PST: Positive selection transition (W ω < 1; D ω > 1); RPR: Relaxed purifying selection (1 > D ω > W ω ); RPS: Relaxed positive selection (W ω > D ω > 1); s: Selection coefficient; SD: Small body-mass domesticates, including cat, dog, pig, goat, sheep, and chicken; TRM: Two-ratios model; W ω : Ka/Ks of wild species or animals