Differential strengths of selection on S-RNases from Physalis and Solanum(Solanaceae)
© Paape and Kohn; licensee BioMed Central Ltd. 2011
Received: 27 October 2010
Accepted: 19 August 2011
Published: 19 August 2011
The S-RNases of the Solanaceae are highly polymorphic self-incompatibility (S-) alleles subject to strong balancing selection. Relatively recent diversification of S-alleles has occurred in the genus Physalis following a historical restriction of S-allele diversity. In contrast, the genus Solanum did not undergo a restriction of S-locus diversity and its S-alleles are generally much older. Because recovery from reduced S-locus diversity should involve increased selection, we employ a statistical framework to ask whether S-locus selection intensities are higher in Physalis than Solanum. Because different S-RNase lineages diversify in Physalis and Solanum, we also ask whether different sites are under selection in different lineages.
Maximum-likelihood and Bayesian coalescent methods found higher intensities of selection and more sites under significant positive selection in the 48 Physalis S-RNase alleles than the 49 from Solanum. Highest posterior densities of dN/dS (ω) estimates show that the strength of selection is greater for Physalis at 36 codons. A nested maximum likelihood method was more conservative, but still found 16 sites with greater selection in Physalis. Neither method found any codons under significantly greater selection in Solanum. A random effects likelihood method that examines data from both taxa jointly confirmed higher selection intensities in Physalis, but did not find different proportions of sites under selection in the two datasets. The greatest differences in strengths of selection were found in the most variable regions of the S-RNases, as expected if these regions encode self-recognition specificities. Clade-specific likelihood models indicated some codons were under greater selection in background Solanum lineages than in specific lineages of Physalis implying that selection on sites may differ among lineages.
Likelihood and Bayesian methods provide a statistical approach to testing differential selection across populations or species. These tests appear robust to the levels of polymorphism found in diverse S-allele collections subject to strong balancing selection. As predicted, the intensity of selection at the S-locus was higher in the taxon with more recent S-locus diversification. This is the first confirmation by statistical test of differing selection intensities among self-incompatibility alleles from different populations or species.
Keywordspositive selection non-synonymous and synonymous substitution S-RNase polymorphism Physalis, Solanum
Self-incompatibility (SI) polymorphisms are maintained by balancing selection over long evolutionary time scales. Selection continually favors rare alleles because they are less frequently rejected as mates [1, 2]. Shared ancestral polymorphism is commonly observed as a result of strong balancing selection with alleles from different species and genera clustering together in phylogenetic reconstructions [3–5]. This implies that S-alleles are often much older than the species from which they are sampled. Coalescence times of S-locus polymorphisms are often estimated as a few tens of millions of years, far longer than coalescence times of polymorphism at loci not subject to balancing selection [6, 7]. Sequence divergence at S-loci is also extreme, with stylar S-alleles often differeing at 40% or more of their amino acids. This is another sign of their great age, as well as the rarity of recombination at known S-loci. Also of importance for the current study, alleles undergoing diversification can leave distinct signatures of positive selection among amino acid sites across related taxa.
Richman et al.  detected a remarkable reduction in the extent of shared ancestral polymorphism among alleles from the S-RNase locus, which encodes the stylar specificity component of the gametophytic SI system of Solanaceae. In particular, Physalis crassifolia alleles, while numerous, all belonged to just three trans-generic lineages while alleles sampled from most other Solanaceae represented far more ancient lineages. Estimates of historical effective population sizes of Solanum carolinense and P. crassifolia showed at least an order of magnitude decrease in Physalis relative to Solanum . The pattern found in P. crassifolia, in which all S-alleles within the species represent only three ancient lineages, is shared by other SI Physalis species and by SI members of the closely related genus Witheringia [7–14]. These findings have been interpreted as the result of a historical restriction of S-locus diversity that occurred approximately 15 MYA  in a common ancestor of Physalis and Witheringia that is not shared with Solanum or other sampled genera of Solanceae [3, 7, 13].
Genealogical patterns suggest that Physalis S-RNase alleles underwent rapid re-diversification following the historical restriction at the S-locus [8, 13, 14]. Because allele numbers in Physalis species are comparable to those found among species of Solanum, it is thought that post-bottleneck rediversification has returned allele numbers to equilibrium or nearly so . This provides an opportunity to examine patterns of selection on sets of S-RNase alleles that have different evolutionary histories. The more recently diversified S-alleles of Physalis might be expected to show greater rates of non-synonymous substitutions because of the increased strength of recent diversifying selection . The intensity of selection on S-alleles is inversely proportional to their number. So when the number of alleles is below equilibrium, as after a severe bottleneck, selection intensity is predicted to be higher than it is after equilibrium in allele number is achieved . The time frame over which a period of heightened selection would be evident at the self-incompatibility locus is not known.
Here we compare selective regimes acting on the S-RNase alleles drawn from species of Physalis and Solanum (Solanaceae). Positive selection has been estimated among self-incompatibility alleles of several taxa using various methods [13, 15–19], most commonly the maximum likelihood phylogenetic approaches first proposed by Nielsen and Yang  and more recently by coalescent-based methods described by Wilson and McVean . These methods use the ratio of non-synonymous (dN) to synonymous (dS) nucleotide substitutions (ω) to estimate patterns of selection at individual codons. In this study, we investigate positive selection on amino acids among S-RNases both within and across species of Physalis and Solanum (Solanaceae). These polymorphic S-alleles provide useful contrasts because diversification at the S-locus in the different genera took place during different time periods and among different S-allele lineages.
Several previous studies [19, 22, 23] have utilized PAML  to assess which codons within S-allele sequences were subject to positive selection in different taxa. However, none of these studies have been able to statistically determine how the strength and location of selection differs between groups of sequences. For instance, Castric and Vekemens  compared patterns of selection among several taxa at the S-receptor kinase (SRK) locus which controls stylar recognition in the sporophytic SI system found in Brassicaceae. Using PAML on separate datasets from each taxon, a higher intensity of selection (higher ω) was estimated among positively selected sites in Brassica relative to those in two self-incompatible species of Arabidopsis. This was attributed to post-bottleneck diversification of SRK alleles in Brassica. However, given the methods used, the statistical significance of the difference in estimates of selection intensity could not be evaluated.
PAML analyses  also found different sites under significant positive selection in different sets of S-alleles. It was concluded, however, that this was poor evidence for selection occurring on different sites. In their study , the power to detect selection was shown to be low so non-overlap in the codons found to be under selection in different datasets would be expected, even if selection acted on the same sites in each set of alleles. Similarly, Vieira et al.  looked at positive selection across S-RNases and found evidence for different positively selected sites in S-RNases from different families and sub-families of flowering plants. Again however, they did not employ a statistical framework capable of testing the significance of differences in selective pressures acting on the same codons in different taxa.
In this study we apply both phylogenetic maximum likelihood and coalescent Bayesian methods, treating S-allele alignments and phylogenies from species in each genus either as a) distinct datasets compared using a series of nested maximum likelihood and Bayesian models of selection or b) as a combined data set in which specific clades of interest within single phylogenies are examined. Our primary goal is to apply statistical frameworks using formal hypothesis tests to answer the following questions: 1) Can we detect significant differences in the strength of selection between genera? 2) Do the proportions of sites under selection differ among genera? 3) Which sites show significantly different selection intensities between genera? 4) Are differences in the strength of selection due to significantly higher dN or dS in one dataset relative to the other? 5) Do sites under selection differ among S-allele lineages?
Average pairwise nucleotide divergence (π) among S-alleles for each species and genus estimated using DNASP 4
P. cinerascens (12)
P. longifolia (37)
S. carolinense (17)
S. chilense (32)
Do selection intensities or the proportion of sites under selection differ among S-alleles from Solanum versus Physalis?
Comparative rate distribution tests of non-synonymous and synonymous substitutions across datasets
HA: Rates free to vary
Log likelihood: -16749.63
Inferred rates for Physalis:
Inferred rates for Solanum:
a) H 0 : Same strength of selection
Log likelihood: -16765.49
Inferred rates for Physalis:
Inferred rates for Solanum:
Are selection strengths (dN/dS) different?
LRT = 31.722 p < 0.0001; DF = 1
b) H 0 : Same proportion of selected sites
Log likelihood: -16750.60
Inferred rates for Physalis:
Inferred rates for Solanum:
Are the proportions of codons under selection different?
LRT = 1.929 p < 0.165; DF = 1
c) H 0 : Same dN/dS and proportions
Log likelihood: -16766.96
Inferred rates for Physalis:
Inferred rates for Solanum:
Are selective regimes (dN/dS and proportions) different?
LRT = 34.647 p < 0.0001; DF = 2
d) H 0 : Shared distributions of rates
Log likelihood: -16764.30
Inferred joint rates:
Are the distributions different?
LRT = 29.350 p < 0.001; DF = 10
Which sites show significant differences in strengths of positive selection?
Sites predicted to be under differential positive selection using the Bayesian ratio of omegas (ω1/ω2) test, the fixed effects likelihood comparison of selective pressures (FEL-CSP), and FEL-Clade tests
S-RNase Region (codon positions)
15, 26, 33
44, 46, 49
56 (0.97), 59
69, 71, 84
9, 101, 104, 110, 111,
112, 119 (0.96), 120 (0.96),
123, 124 (0.97), 125
127 (0.95), 129, 131
8, 9,13, 24,
26 (0.08), 31
96, 99, 116, 125
127, 129, 131
FEL Clade A
3, 6, 9,
87, 89, 90
96, 99, 110, 121
FEL Clade C
9, 14, 24
97, 127, 129
FEL Clade Sc
21, 22, 23, 29
30, 32, 34, 35, 38
Clade A as forground
103, 106, 109
FEL Clade Sc
47, 48, 50
Clade C as forground
Do different S-allele lineages experience greater selection intensities?
Clade model likelihood ratio tests comparing Physalis Clade A (subtree) and its subtending branch to all other S-RNasesa
1) Global dN/dS rate (whole tree)
dNdS Clade A = dNdS Alla
Global dN/dS = 0.90; CI = (0.87, 0.94)
2) Separating Branch Versus Two Clades
dNdSdNdS Shared Clades A + Alla = 0.90;
CI = (0.86, 0.94)
Phys dNdS Branch A = 5.18; CI = (1.64, 10.49)
LRT p-value vs the single rate model = 0.266
3) Clade A + Branch vs Alla
dNdS Branch A = Clade A = 2.19; CI = (2.03, 2.36)
dNdS Clade S = 0.70; CI = (0.73, 0.77)
LRT p-value vs the single rate model < 0.001
4) Clade A (subtree) vs Branch + Alla
Phys dNdS Clade A = 2.04; CI = (2.20, 2.37)
Phys dNdS Branch A = dNdS; Clade Alla = 0.70;
CI = (0.73, 0.77)
LRT p-value vs the single rate model < 0.001
5) Clade A, Solanum , and Branch
Phys dNdS Clade A = 2.04; CI = (2.20, 2.37)
dNdS Clade Alla 0.70; CI = (0.73, 0.77)
Phys dNdS Branch A = 2.96; CI = (0.83, 6.17)
LRT p-value vs the single rate model < 0.001
Do selected sites differ among lineages?
What causes higher dN/dS ratios in Physalis?
When allele numbers at the S-locus are below equilibrium, as after recovery from a demographic restriction, selection favoring new alleles is expected to increase . We have used a series of statistical methods to determine if the intensity of selection acting on S-RNases differed among taxa and lineages, and whether the number and positions of sites under selection differed. As indicated by the distributions of dN and dS along the entire S-RNase gene in the initial REL models (Table 2), there is a significantly greater dN/dS ratio in Physalis. This method is similar to PAML models that begin by categorizing dN and dS rates into discrete distributions, but with the added use of a framework of nested models that compare those rates across two taxa with homologous polymorphism. Subsequent likelihood (PAML) and coalescent (OmegaMap) analyses found more sites under significant positive selection in Physalis rendering the second result of the REL analysis somewhat surprising: that no significant difference in the proportion of sites under selection was detected. The REL method may be less sensitive in detecting differences in local processes than in overall selective pressure, but the main difference we can confirm between the genera is in the intensity of selection rather than the proportion of sites subject to it.
We used a novel adaptation of OmegaMap  to determine which codons are subject to stronger selection in one genus versus the other. The Markov chain process of the Bayesian method produces a distribution of ω values around a mean for each codon that allows one to establish upper and lower 95% confidence intervals. This feature of Bayesian statistics makes this method useful for hypothesis tests regarding dN/dS ratios across taxa, something that is not possible using existing maximum likelihood methods such as PAML. These tests found 36 codons under significantly higher selection in Physalis. We also used an alternative fixed effects maximum likelihood method to compare selective pressures (FEL-CSP) using likelihood ratio tests for increased dN/dS in one genus relative to the other. This method detected roughly half as many sites under differential selection as the Bayesian method, suggesting that either the Bayesian approach is prone to high false positive rates or that the FEL-CSP method has reduced power. Based on our power analysis, we suspect the latter as the Bayesian method appears to perform similarly to a REL method (i.e. PAML). Previous simulations  comparing both REL and FEL methods on individual datasets showed that FEL is less powerful when the number of sequences is below 64 as are each of our datasets.
As expected, both the Bayesian and FEL-CSP methods predict that the greatest differences in the magnitudes of positive selection on individual codons occur in the previously identified hyper-variable regions HVa and HVb . The hyper-variable regions are thought to play a major role in determining specificity [31–36]. For example, Matton et al.  demonstrated alteration of specificity using mutagenesis experiments involving these hypervariable regions. These studies showed that as few as 4 amino acid changes in corresponding positions of the S11 and S13 S-RNases of S. chacoense could alter specificity to that of the alternative allele. However, entire domain swapping in studies [32, 33] using S-RNases of Petunia inflata and Nicotiana alata, suggest that while HVa and HVb are important, other regions are also likely involved in recognition at least in some alleles or lineages. Consistent with this idea, both codon-based methods used here also show considerable differential selection in the V2 region near the 3' end of the S-RNases, supporting previous analyses of both Lycium [16, 23] and Solanum chilense  S-RNases which also found evidence of selection in this region.
The genealogy of S-alleles from Physalis suggests that extant S-RNases evolved from only 3 lineages, giving rise to the expectation of strong selection within each of these three clades. Indeed, Physalis clade A shows the highest dN/dS as expected during early strong selection on a reduced number of S-alleles. These results suggest that the clade model captures increased post-bottleneck diversifying selection intensities. Clade C also shows increased selection pressure relative to background lineages while clade B contains too few alleles for testing by this method. This test confirms the findings of the REL test but on isolated foreground lineages and shows that selection is generally stronger in each re-diversified clade relative to average selection estimated for background lineages.
All methods used found higher dN/dS ratios in Physalis, as expected following a severe reduction in S-allele numbers. However, due to saturation, dN may be more severely underestimated in long branches potentially leading to reduced estimates of dN/dS ratios . Because its alleles are generally older, this could providing a potential alternative to greater selection for lower dN/dS estimates from Solanum. We therefore estimated dN and dS at terminal branches for the two species with the most alleles (P. longifolia and S. chilense) to a) estimate dN and dS in the absence of interspecific branch lengths, b) gain insight into non-synonymous substitution rates of similarly aged S-alleles, and c) estimate recent selection by ignoring internal branches. For alleles separated by equivalent amounts of synonyomous change, Physalis alleles have accumulated non-synonymous substitutions at about twice the rate for Solanum (Figure 7). Evidence for increased dN/dS ratios is apparent even at relatively low levels of divergence (dN and dS < 0.15). This is strong evidence that saturation of non-synonyous substitutions is not the cause of higher inferred intensity of selection in Physalis.
In comparison to tests for increases in selection across the gene or at specific codons, methods for testing whether the same or different codons are under selection in different groups or lineages are considerably less well developed. The FEL-Clade models returned the only evidence suggesting that sites under positive selection in a particular clade might be under neutral or purifying selection in the background phylogeny (Figure 6 and Table 3). FEL-Clade analyses also showed mostly different sites under selection across the two main Physalis clades examined (A and C; Table 3). Finding different sites under selection in different clades might indicate that different residues contribute to specificity differentiation in different groups of alleles. However, this finding could also reflect low power to detect selection, given the reduced sample sizes represented within each clade. With low power, the expected overlap in sites predicted to be under selection would also be low .
The FEL-Clade models also indicated several sites where the strength of positive selection in Solanum was greater than in the contrasted clade (A or C) from Physalis. This is in contrast to other methods explored here where all significant differences in the strength of positive selection at specific sites showed increased selection intensity in Physalis. If clades differ in sites subject to positive selection, analyses combining all Physalis clades might mask these effects while the FEL-Clade method may expose these differences.
Several methods detected increased selection intensities acting on the alleles from Physalis when compared to those from Solanum, consistent with recovery from a historical restriction in S-locus diversity in Physalis. However, another question, whether the same or different residues were under selection in alleles from the two sources was more difficult to answer. The REL method did not detect a higher proportion of sites under selection in Physalis and the method cannot detect whether selection acts on the same or different codons. Other methods found more sites under significant positive selection and higher selection intensities acting on selected sites in Physalis, but both may result from increased selection intensities rather than differences in sites subject to positive selection. The FEL clade-specific approach provided some evidence that different sites were under selection in specified Physalis clades than across the background Solanum alleles but the assumption of this test, that selection on the background clade is uniform, may not hold and these results should be treated cautiously. While the methods explored here for testing differential strengths of selection across a gene or at specific codons appear adequate, further development of statistical methods for testing whether the same or different sites are under selection is needed.
Sequences and Phylogeny Construction
Amino acid and nucleotide S-RNase sequences were obtained from GenBank for 12 Physalis cinerascens, 36 P. longifolia, 17 Solanum carolinense, 32 S. chilense and one Antirrhinum hispanicum (Ahis5) allele used as an outgroup sequence. Automated alignment of the complete dataset containing all S-alleles was performed using ClustalX  and manually adjusted using Se-Al v2.0 . A nucleotide alignment was matched with corresponding amino acids to produce a codon alignment using PAL2NAL  that resulted in 131 codons. A phylogeny of all S-alleles (n = 98) was created using Mr. Bayes v3.1  to generate a 50% majority consensus topology. The analysis was run under a GTR+ Г + I substitution model for 1,000,000 generations, sampling every 100th tree for a total of 10,000 trees. The initial 2501 trees were discarded as the burn-in phase. The remaining trees represent generations on which posterior probabilities were calculated.
Separate datasets were compiled for each genus: one that contained 48 Physalis and the other with 49 Solanum S-alleles. Corresponding topologies for each dataset were pruned from the Bayesian consensus tree using TreeEdit v1.0a10  to maintain genealogical relationships found when all taxa's alleles were included. The use of 2 species from each genus simply enlarges each dataset as the genealogical patterns exhibited for congeners are shared because of trans-specific polymorphism. The same tree topology for each dataset was used in all subsequent selection analyses that utilize phylogenies unless otherwise stated. A general time reversible (GTR) model of nucleotide substitution is used for all subsequent phylogenetic selection analyses so that direct comparisons can be made across models and datasets. Pairwise nucleotide divergence π was estimated for synonymous and non-synonymous substitutions for all taxa using DNASP 4.0 . Sequence alignments, Newick string tree topologies and HYPHY likelihood functions for Physalis and Solanum datasets can be found as Nexus files in online Supplementary data.
Distribution of dN and dS Rates
The most general test of the relative strength of selection across two datasets compares the distribution of synonymous and non-synonymous substitution rates using a random effects likelihood (REL) approach  implemented in the program HYPHY . This consists of several nested models for hypothesis testing, similar to the likelihood ratio tests (LRTs) described by Nielsen and Yang  and implemented in PAML , that begin by estimating general discrete distributions of four rate classes for each dataset. Rate classes are as follows: two bins for negative selection where dS1 > dN1 and dS2 > dN2; one for neutral evolution dS3 = dN3 ; and one for positive selection dS4 < dN4.
Null hypotheses comparing both datasets are as follows: a) H0: dN4p/dS4p = dN4s /dS4s for the same strength of selection where subscripts indicate bin 4 (dN4 > dS4) and Physalis 'p' or Solanum 's', b) H0: p4p = p4s for the same proportion of positively selected sites, c) the same selective regime which combines both a) and b) (H0: dN4p/dS4p = dN4s /dS4s and p4p = p4s), and finally d) H0: rates derived from the combined dataset equal to rates estimated for each taxon separately. An independent distribution model of rates that are free to vary for both datasets is set as the alternative hypothesis against which the null model likelihoods (a, b, c and d) are tested. Models are rejected by -2ΔlnL (ΔlnL = the difference in log likelihoods of the two models) where significance is determined by χ2 distribution with the degrees of freedom (df) equal to the difference in the number of parameters between models.
Codon Selection Estimates
To estimate the ratio (ω) of non-synonymous (dN) to synonymous (dS) substitutions at individual amino acid sites we first used the program codeml in PAML 3.15 . Values of ω < 1 for individual codons indicates purifying selection while sites with ω = 1 are considered neutral. Positive selection at the amino acid level is predicted when ω > 1. A series of nested neutral and selection models first developed by Nielsen and Yang  use likelihood ratio tests (LRT) to determine the model that best fits the data. The null model M1 (neutral) constrains all sites to be either of class ω = 0 or ω = 1 while the alternative model M2a (selection) adds a third class in which ω > 1 at individual sites. Model M3 (selection) assumes three discrete site classes (ω0 , ω1, and ω2) with three corresponding proportions (p0, p1, p2) estimated from the data. Models are then compared and rejected by likelihood ratio tests as described in the section above. Sites estimated to be under positive selection are determined by an empirical Bayes approach  where posterior probabilities are estimated from rates within each site class. Because we are primarily concerned with comparing posterior probabilities from the robust general discrete (M3) model with a subsequent coalescent analysis, we forgo full analyses including models with more complex rate distributions (i.e. M7 and M8).
The Bayesian coalescent method was conducted using OmegaMap v0.5  which implements a population genetics likelihood approximation to the coalescent to infer recombination and estimate ω. The model of base substitution including transition/transversion rates among codons was adopted from Nielsen and Yang . Rather than using a maximum likelihood approach to estimate the selection parameter, OmegaMap employs a Bayesian method with a Markov Chain Monte Carlo (MCMC) process to estimate posterior distributions of parameters. This allows the use of posterior densities of ω to investigate whether dN/dS is greater at any particular codon in one dataset versus the other without the need for nested models. This can only be done if datasets are the same length, encode for homologous genes, and have reliable alignments of codon positions. By sampling from the distribution of ω values we are able to determine the ratio of ω estimated from Physalis relative to Solanum. Rejection of the null hypothesis that sites have equivalent ω values is observed when the 95% posterior density of ratios exclude 1 (H0: w1HPD w2HPD = 1).
Rather than estimating ω for each dataset using a variable model along pre-defined blocks of adjacent codons, we assumed an independent model for each site with an improper inverse distribution of rates. The MCMC chain was iterated over 500,000 generations sampling every 100th generation. We ran each dataset twice to check for convergence and removed a burn in of 50,000 generations using R http://www.r-project.org/. The chain generates upper and lower posterior densities (highest posterior density HPD) to determine mean point estimates of ω at each codon position for each dataset. Because the independent model is computationally intensive, we ran the OmegaMap analyses using the Cornell BioHPC server http://cbsuapps.tc.cornell.edu/omegamap.aspx. The upper and lower HPD of ω values from each dataset were then combined and re-sampled after a burn in of 25,000 generations to get HPD's and the geometric mean for the ratio of ω's using R.
FEL-CSP (Fixed Effects Likelihood-Compare Selective Pressures)
We also used a fixed-effects likelihood (FEL) method to infer differential selection at individual sites among datasets . FEL differs from the REL type models of PAML and the coalescent method of OmegaMap in that dN and dS are estimated at individual sites directly rather than using pre-defined distributions of rates . Alignments of each dataset were first used to estimate global parameters such as nucleotide frequencies, topology, and branch lengths. We use separate trees for each dataset (rather than a single phylogeny including both genera). These parameters were then fixed throughout the selection estimate procedure. The null model H0: dN1/dS1 = dN2/dS2 and alternative model HA: where dS1, dN1, dS2, dN2 are free to vary are fitted to every codon and, because they are nested, likelihood ratio tests can be used to determine significantly different selection pressures on individual sites. We estimated selection using the CompareSelectivePressure batch file in HYPHY v0.99. Actual dN/dS values for each dataset were then checked for any potential false positive estimates of differential positive selection. Here it is possible for the model to reject the null hypothesis that dN/dS ratios are equivalent across datasets but codons may not actually have ω estimates > 1.
We conducted simulations for Physalis and Solanum datasets independently to determine the power of the FEL test for given p-values. We simulated 100 replicates of each dataset and corresponding phylogeny using the site-by-site rate estimates from the FEL method with 25% of sites evolving neutrally. This produced 13100 sites with non-zero rates (131 codons × 100 replicates) to estimate false positive rates over bins of p-values of width 0.01. The power analysis was conducted using a batch command program in the HYPHY v0.99 package.
Lineage-specific selection pressures
A phylogeny of Physalis and Solanum compartmentalized into all Solanum lineages versus Physalis clade A and its subtending branch was used to determine equality of dN/dS between them. Physalis clade A represents the largest re-diversification among Physalis S-alleles, and this method compares rate estimates for one specified clade against those for a background phylogeny. The HKY85 model of nucleotide substitution was used along with phylogenies containing all Solanum S-RNases (49) and the S-RNases found within clade A (Figure 1). Comparison among five models using LRT's are as follows: Model 1) allows one global dN/dS value, Model 2) constrains the specified subclade and background dN/dS values to be equal but adds a new parameter for dN/dS along the branch leading to the clade. Model 3) constrains dN/dS values of the specified clade and its subtending branch to be equal but allows background branches to have a distinct dN/dS value. Model 4) constrains background branch's dN/dS and the subtending branch to be equal while the clade is allowed to vary, and Model 5) allows all compartments (specified clade, its subtending branch, and background branches) to have dN/dS values free to vary. Log likelihood scores were used to determine best fit models and Akaike information criterion (AIC) values were used to adjust for differences in parameters among likelihood ratio tests . The process was then repeated with Physalis clade C compared to background lineages from Solanum. Phyalis clade B contains too few alleles for useful analysis by this method.
FEL-Clade Test (subtree selection comparison)
To ask whether different codons were under selection in different lineages we used a FEL approach comparing the selection on individual codons in background lineages with that on a particular Physalis clade (A or C). In this case the alternative Physalis clade (A or C) was included as part of the background phylogeny. For the class of codons with dN/dS > 1, the null model H0 has 3 rate classes for each codon: dN for the background lineages = dN for the Physalis clade of interest, dS background lineages = dS Physalis clade of interest, dN/dS background lineages = dN/dS Physalis clade of interest. The alternative hypothesis HA: has one rate class for dN for all background lineages, another dN rate class for Physalis clade being compaired, a single dS rate for all lineages, and one dN/dS for all background lineages, and another dN/dS > 1 ratio for the Physalis clade of interest. Likelihood ratio tests are conducted for each codon position where significance is determined at the p ≤ 0.05 level.
random effects likelihood
fixed effects likelihood
likelihood ratio test
highest posterior density
Markov Chain Monte Carlo
Thanks to Danny Wilson for R code to estimate ratio of omegas as well as valuable suggestions and assistance with software. Thanks also to Sergei Kosakovsky Pond for consultation regarding various methods and for support with simulations and the use of HYPHY. The manuscript was greatly improved by the suggestions of two anonymous reviewers. Support for this work was provided by NSF DEB-0639984 to J.R.K.
- Wright S: The distribution of self-sterility alleles in populations. Genetics. 1939, 24: 538-552.PubMedPubMed Central
- Clark AG: Evolutionary inferences from molecular characterization of self-incompatibility alleles. Mechanisms of Molecular Evolution. Edited by: Takahata N, Clark AG. 1993, Sunderland, MA: Sinauer, 79-108.
- Richman AD, Uyenoyama MK, Kohn JR: Allelic diversity and gene genealogy at the self-incompatibility locus in the Solanaceae. Science. 1996, 273: 1212-1216. 10.1126/science.273.5279.1212.View ArticlePubMed
- Kusaba M, Nishio T, Satta Y, Hinata K, Ockendon D: Striking sequence similarity in inter- and intra-specific comparisons of class I SLG alleles from Brassica oleracea and Brassica campestris: Implications for the evolution and recognition mechanism. Proceedings of the National Academy of Sciences USA. 1997, 94: 7673-7678. 10.1073/pnas.94.14.7673.View Article
- Sonneveld T, Robbins TP, Bošković R, Tobutt KR: Cloning of six cherry self-incompatibility alleles and development of allele-specific PCR detection. Theoretical and Applied Genetics. 2001, 102: 1046-1055. 10.1007/s001220000525.View Article
- Bechsgaard JS, Castric V, Charlesworth D, Vekemans X, Schierup MH: The transition to self-Compatibility in Arabidopsis thaliana and evolution within S-haplotypes over 10 Myr. Molecular Biology and Evolution. 2006, 23: 1741-1750. 10.1093/molbev/msl042.View ArticlePubMed
- Paape T, Igic B, Smith S, Olmstead R, Bohs L, Kohn JR: A 15-Million-Year-Old Genetic Bottleneck at the S-locus of the Solanaceae. Molecular Biology and Evolution. 2008, 25: 655-663. 10.1093/molbev/msn016.View ArticlePubMed
- Richman AD, Kohn JR: Self-incompatibility alleles from Physalis: implications for historical inference from balanced genetic polymorphisms. Proceedings of the National Academy of Sciences USA. 1999, 96: 168-172. 10.1073/pnas.96.1.168.View Article
- Lu Y: Roles of lineage sorting and phylogenetic relationship in the genetic diversity at the self-incompatibility locus of Solanaceae. Heredity. 2001, 86: 195-205. 10.1046/j.1365-2540.2001.00823.x.View ArticlePubMed
- Lu Y: Historical events and allelic polymorphism at the gametophytice self incompatibility locus in Solanaceae. Heredity. 2006, 96: 22-28.PubMed
- Igic B, Bohs L, Kohn JR: Historical inferences from the self-incompatibility locus. New Phytologist. 2004, 161: 97-105.View Article
- Igic B, Smith WA, Robertson K, Schaal BA, Kohn JR: The population genetics of the self-incompatibility polymorphism in wild tomatoes: I. S-RNase diversity in Solanum chilense (Dun.) Reiche (Solanaceae). Heredity. 2007, 99: 553-561. 10.1038/sj.hdy.6801035.View ArticlePubMed
- Richman A: Evolution of balanced genetic polymorphism. Molecular Ecology. 2000, 9: 1953-1963. 10.1046/j.1365-294X.2000.01125.x.View ArticlePubMed
- Stone JL, Pierce SE: Rapid recent radiation of S-RNase lineages in Witheringia solanacea (Solanaceae). Heredity. 2005, 94: 547-555. 10.1038/sj.hdy.6800657.View ArticlePubMed
- Takebayashi N, Brewer PB, Newbigin E, Uyenoyama MK: Patterns of variation within self-incompatibility loci. Molecular Biology and Evolution. 2003, 20: 1778-1794. 10.1093/molbev/msg209.View ArticlePubMed
- Savage AE, Miller JS: Gametophytic self-incompatibility in Lycium parishii (Solanaceae): allelic diversity, genealogical structure, and patterns of molecular evolution at the S-RNase locus. Heredity. 2006, 96: 434-444. 10.1038/sj.hdy.6800818.View ArticlePubMed
- Nunes MDS, Santos RAM, Ferreira SM, Vieira J, Vieira CP: Variability patterns and positively selected sites at the gametophytic self-incompatibility pollen SFB gene in a wild self-incompatible Prunus spinosa (Rosaceae) population. New Phytologist. 2006, 172: 577-587. 10.1111/j.1469-8137.2006.01838.x.View ArticlePubMed
- Sato KT, et al: Coevolution of the S-locus genes SRK, SLG and SP11/SCR in Brassica oleracea and B. rapa. Genetics. 2002, 162: 931-940.PubMedPubMed Central
- Castric V, Vekemans X: Evolution under strong balancing selection: How many codons determine specificity at the female self-incompatibility gene SRK in Brassicaceae?. BMC Evolutionary Biology. 2007, 7: 132-10.1186/1471-2148-7-132.View ArticlePubMedPubMed Central
- Nielsen R, Yang Z: Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics. 1998, 148: 929-36.PubMedPubMed Central
- Wilson DJ, McVean G: Estimating diversifying selection and functional constraint in the presence of recombination. Genetics. 2006, 172: 1411-1425.View ArticlePubMedPubMed Central
- Vieira J, Morales-Hojas R, Santos RAM, Vieira CP: Different positively selected sites at the gametophytic self-incompatibility pistil S-RNase gene in the Solanaceae and Rosaceae (Prunus, Pyrus, and Malus). Journal of Molecular Evolution. 2007, 65: 175-185. 10.1007/s00239-006-0285-6.View ArticlePubMed
- Miller JS, Levin RA, Feliciano NM: A tale of two continents: Baker's rule and the maintenance of self-incompatibility in Lycium (Solanaceae). Evolution. 2008, 62: 1052-1065. 10.1111/j.1558-5646.2008.00358.x.View ArticlePubMed
- Kosakovsky Pond SL, Poon AFY, Frost SDW: Estimating selection pressures on alignments of coding sequences: Analyses using HyPhy. The Phylogenetic Handbook: A Practical Approach to DNA and Protein Phylogeny. Edited by: Salemi M, Vandamme A-M.
- Kosakovsky Pond SL, Frost SDW, Muse SV: HyPhy: hypothesis testing using phylogenies. Bioinformatics. 2005, 21: 676-679. 10.1093/bioinformatics/bti079.View Article
- McVean G, Awadalla P, Fearnhead P: A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics. 2002, 160: 1231-1241.PubMedPubMed Central
- Kosakovsky Pond SL, Frost SDW, Grossman Z, Gravenor MB, Richman DD, et al: Adaptation to different human populations by HIV-1 revealed by codon-based analyses. PLoS Computational Biology. 2006, 2 (6): e62-10.1371/journal.pcbi.0020062.View ArticlePubMed Central
- Kosakovsky Pond SL, Poon AFY, Frost SDW: Estimating selection pressures on alignments of coding sequences: Analyses using HyPhy. The Phylogenetic Handbook: A Practical Approach to DNA and Protein Phylogeny. Edited by: Salemi M, Vandamme A-M. 2009, Cambridge: Cambridge University Press, 419-450.View Article
- Nielsen R: Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Molecular Biology and Evolution. 2002, 19: 908-917.View ArticlePubMed
- Bielawski JP, Yang Z: A maximum likelihood method for detecting functional divergence at individual codon sites, with application to gene family evolution. Journal of Molecular Evolution. 2004, 59: 121-132.View ArticlePubMed
- Ioerger TR, Gohlke JR, Xu B, Kao T-h: Primary structural features of the self-incompatibility protein in Solanaceae. Sexual Plant Reproduction. 1991, 4: 81-87.View Article
- McCubbin A: Molecular and biochemical bases of gametophytic self-incompatibility in Solanaceae. Plant Physiology and Biochemistry. 1997, 35: 171-176.
- Zurek D, Mou B, Beecher B, McClure B: Exchanging sequence domains between S RNases from Nicotiana alata disrupts pollen recognition. The Plant Journal. 1997, 11: 797-808. 10.1046/j.1365-313X.1997.11040797.x.View ArticlePubMed
- Ishimizu T, Endo T, Yamaguchi-Kabata Y, Nakamura KT, Sakiyama F, Norioka S: Identification of regions in which positive selection may operate in S-RNase of Rosaceae: Implication for S-allele-specific recognition sites in S-RNase. FEBS Letters. 1998, 440: 337-342. 10.1016/S0014-5793(98)01470-7.View ArticlePubMed
- Matton DP, Maes O, Laublin G, Qin X, Bertrand C, Morse D, Cappadocia M: Hypervariable domains of self-incompatibility RNases mediate allele-specific pollen recognition. The Plant Cell. 1997, 9: 1757-1766.View ArticlePubMedPubMed Central
- Matton DP, Luu DT, Qin X, Laublin G, O'Brien M, Maes O, Morse D, Cappadocia M: Production of an S-RNase with dual specificity suggests a novel hypothesis for the generation of new S alleles. The Plant Cell. 1999, 11: 2087-2097.View ArticlePubMedPubMed Central
- Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG: The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Research. 1997, 25: 4876-4882. 10.1093/nar/25.24.4876.View ArticlePubMedPubMed Central
- Rambaut A: Se-Al: Sequence Alignment Editor. 2002, [http://evolve.zoo.ox.ac.uk/]
- Suyama M, Torrents D, Bork P: PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Research. 2006, 34: W609-W612. 10.1093/nar/gkl315.View ArticlePubMedPubMed Central
- Ronquist F, Huelsenbeck JP: MrBayes 3.1: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003, 19: 1572-1574. 10.1093/bioinformatics/btg180.View ArticlePubMed
- Rambaut A, Charleston M: Tree Edit. Phylogenetic Tree Editor v1.0 alpha 8. 2001, [http://tree.bio.ed.ac.uk/software/treeedit/]
- Rozas J, Sanchez-DelBarrio JC, Messeguer X, Rozas R: DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics. 2003, 19: 2496-2497. 10.1093/bioinformatics/btg359.View ArticlePubMed
- Yang Z: Phylogenetic analysis by maximum likelihood (PAML). 2000, London: University College
- Yang Z, Wong WSW, Nielsen R: Bayes empirical Bayes inference of amino acid sites under positive selection. Molecular Biology and Evolution. 2005, 22: 1107-18. 10.1093/molbev/msi097.View ArticlePubMed
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.