Gene duplications play an important role in the evolution of functional protein diversity. Some models of duplicate gene evolution predict complex forms of paralog divergence; orthologous proteins may diverge as well, further complicating patterns of divergence among and within gene families. Consequently, studying the link between protein sequence evolution and duplication requires the use of flexible substitution models that can accommodate multiple shifts in selection across a phylogeny. Here, we employed a variety of codon substitution models, primarily Clade models, to explore how selective constraint evolved following the duplication of a green-sensitive (RH2a) visual pigment protein (opsin) in African cichlids. Past studies have linked opsin divergence to ecological and sexual divergence within the African cichlid adaptive radiation. Furthermore, biochemical and regulatory differences between the RH2aα and RH2aβ paralogs have been documented. It thus seems likely that selection varies in complex ways throughout this gene family.
Clade model analysis of African cichlid RH2a opsins revealed a large increase in the nonsynonymous-to-synonymous substitution rate ratio (ω) following the duplication, as well as an even larger increase, one consistent with positive selection, for Lake Tanganyikan cichlid RH2aβ opsins. Analysis using the popular Branch-site models, by contrast, revealed no such alteration of constraint. Several amino acid sites known to influence spectral and non-spectral aspects of opsin biochemistry were found to be evolving divergently, suggesting that orthologous RH2a opsins may vary in terms of spectral sensitivity and response kinetics. Divergence appears to be occurring despite intronic gene conversion among the tandemly-arranged duplicates.
Our findings indicate that variation in selective constraint is associated with both gene duplication and divergence among orthologs in African cichlid RH2a opsins. At least some of this variation may reflect an adaptive response to differences in light environment. Interestingly, these patterns only became apparent through the use of Clade models, not through the use of the more widely employed Branch-site models; we suggest that this difference stems from the increased flexibility associated with Clade models. Our results thus bear both on studies of cichlid visual system evolution and on studies of gene family evolution in general.
Gene duplication is known to play major roles in genomic and phenotypic evolution, and often precipitates divergent evolution of protein structure and function [1, 2]. A number of models have been proposed to explain the retention and evolution of duplicated genes in the face of deleterious, pseudogenizing mutations [3–5]; these models differ in the predictions they make about post-duplication sequence evolution and, consequently, in how amenable they are to investigation. The classic neofunctionalization model , for example, predicts a fairly simple pattern of post-duplication protein sequence evolution, with one paralog diverging while the other retains the ancestral function under a regime of purifying selection. However, other models, such as the duplication-degeneration-complementation model  and the escape from adaptive conflict model [7, 8], predict more complex forms of divergence. Furthermore, it is becoming increasingly recognized that divergent protein evolution can occur without duplication—that is, among orthologs—and that this can contribute to adaptive phenotypic evolution [9–11], contrary to classical assumptions . Distinguishing among models of gene duplication, and determining the relative roles played by adaptive and non-adaptive processes in protein evolution, thus requires approaches that can accommodate complex patterns of sequence evolution.
Recent advances in codon substitution models that account for variation in site-specific selective constraint among multiple clades or lineages [13, 14] provide a promising approach for distinguishing among models of gene duplication and evolution. Branch-site models [15, 16] are commonly used to detect the signature of strong site-specific positive selection along a pre-specified lineage after gene duplication (cf. 'conserved-but-different' or 'Type II' divergence patterns) [17, 18]. Clade models [19, 20], meanwhile, can be used to detect more subtle differences in site-specific selective constraint among entire clades or partitions of a phylogeny (c.f. 'covarion-like' or 'Type I' divergence patterns) [17, 18]. Clade models are not restricted to detecting strict cases of positive selection and can be used to consider variation among multiple clades simultaneously . As such, Clade models may be better suited to detecting complex forms of divergence in selective constraint across gene families than the Branch-site models . However, compared to the popular Branch-site models, Clade models have been relatively under used.
Species derived from recent adaptive radiations are intriguing systems for studying patterns of evolution at the molecular level, as rapid phenotypic evolution implies a comparable degree of change in the underlying genome [23, 24]. The endemic and diverse cichlid fishes of the Rift Valley of eastern Africa are thought to be the result of multiple, young, adaptive radiations [25, 26]. The high numbers of species found within the Rift Valley’s lakes and rivers, as well as the impressive degree of phenotypic variation present among closely related species, make these fishes ideal for studies on functional diversification and speciation. Recently, progress has been made associating adaptive phenotypic evolution in African cichlids with variation at the molecular level, for example with regard to jaw morphology or colour patterning, [27–30]. Perhaps most notably, a number of studies have linked ecological and sexual divergence among African cichlids to divergence in colour vision genes, the opsins [31, 32]. Opsins form the protein component of visual pigments, the photosensitive compounds expressed in the rod and cone photoreceptor cells of the retina that absorb and transduce photons of light into the biochemical signals that ultimately underlie the visual sense [33–35]. Amino acid substitutions that affect the opsin’s retinal chromophore binding pocket can alter the pigment's absorbance spectrum, generating variation in spectral sensitivity [36, 37]. Recent studies of African cichlid opsins have linked variation in opsin protein sequences and expression patterns to ecologically- and sexually-selected divergence among closely related populations and species [38, 39] and, as a result, African cichlids have emerged as model systems for study of the molecular biology, evolution, and ecology of opsins [31, 32].
Compared to most vertebrates, African cichlids possess a large number of opsin genes, with seven cone opsins and one rod opsin . The resulting visual pigments vary in spectral sensitivity, with the wavelength of maximal absorbance (λmax) ranging from the ultraviolet (UV) to the yellow. Most of these opsins are evolutionarily ancient, with orthologs present in most teleosts, if not most vertebrates. However, the green-sensitive RH2aα and RH2aβ opsins are relatively young [41, 42], and descend from a duplication event specific to the African cichlid clade . Spectrophotometric study of RH2aα and RH2aβ pigments from Oreochromis niloticus (Nile tilapia) and Metriaclima zebra (Lake Malawi zebra mbuna), expressed in vitro, revealed functional divergence between the paralogs, with the RH2aα pigments red-shifted (λmax ≈ 528 nm) compared to the RH2aβ pigments (λmax ≈ 518 nm) [41, 42]. However, it has proven difficult to broadly survey for variation in RH2aα/β λmax via microspectrophotometry (MSP) due to their fairly close λmax values and the noise inherent in MSP . Regulatory differences are also apparent [42, 44] but the high level of sequence similarity between these paralogs (~95% identical at the nucleotide level) makes quantitative PCR studies challenging. As a result of these methodological difficulties, it is sometimes simply assumed that the RH2aα and RH2aβ paralogs are sufficiently similar to justify treating them equivalently [45, 46]. However, there is reason to believe that comparably small differences in λmax can be ecologically and evolutionarily important in African cichlids , and whether or not these opsins are functionally equivalent from the perspective of African cichlid visual biology and fitness is not clear. Comparative sequence analysis may be able to provide useful insights into this system.
Given the important role vision plays in the African cichlid adaptive radiation [31, 32] as well as the important role gene duplication plays in functional diversification in general , we set out to explore patterns of sequence evolution associated with the RH2aα-RH2aβ gene duplication event using both Branch-site and Clade codon-substitution model approaches. We document complex patterns of divergence among duplicated African cichlid RH2aα and RH2aβ opsins, reflecting both paralog-specific and species-specific processes. Importantly, positive selection was documented not using Branch-site models, but the less widely employed Clade models. We discuss the implications of our findings in light of gene duplication theory, cichlid visual ecology, and opsin structure and function.
Phylogenetic analyses were carried out on a data set of 48 fish RH2 opsin sequences from 29 species; species names and accession numbers are provided in Additional file 1: Figure S1. Translated amino acid sequences were assembled in MEGA 4  and aligned using ClustalW , after which the extreme N- and C-termini of the opsin sequences were trimmed, leaving an alignment 343 codons in length. Bayesian phylogenetic analysis was carried out using MrBayes 3.2  using the GTR+I+Γ nucleotide substitution model, which was selected based on AIC rank, as calculated by MrModeltest 2.2 . Four runs, each consisting of four chains (three heated and one cold), were run for 5 x 106 generations, sampling every 100 generations. The first 25% of the samples were considered ‘burn-in’ and discarded. Adequate sampling and convergence were assured by ensuring that (1) the standard-deviation of split frequencies was less than 0.01 by the end of the analysis, (2) post-scale reduction factors were approximately 1.000 for all parameter estimates and topological partitions, (3) parameter estimate-by-generation plots were stationary, and (4) effective sample sizes were greater than 100. These checks were carried out through direct examination of the MrBayes output file and using Tracer 1.5 . A codon-partitioned approach was employed as well, assuming separate GTR+I+Γ substitution models for each of the three codon positions. Maximum likelihood (ML) phylogenetic analyses were carried out using PhyML 3.0  assuming the GTR+I+Γ nucleotide substitution model. Ten random trees plus a tree inferred using the BIONJ algorithm were used as starting trees, and both nearest neighbour interchange (NNI) and subtree-pruning and regrafting (SPR) tree-search approaches were used to explore tree space. Node support for the ML tree was evaluated using the SH-like approximate likelihood ratio test approach . For both Bayesian and ML analyses, four rate categories were assumed for the +Γ distribution used to described among-site rate variation .
After estimating the fish RH2 phylogeny, we focused on the RH2a opsin clade for subsequent molecular evolutionary analyses. Specifically, we extracted and analyzed the monophyletic sub-tree containing the duplicated African cichlid RH2aα and RH2aβ opsins plus five outgroup RH2a sequences; species names and accession numbers for these sequences can be found in Figure 1 and Additional file 1: Figure S1. Due to uneven taxonomic coverage in online genetic databases, the RH2aβ clade possesses opsin sequences from three additional species compared to the RH2aα clade. The three extra sequences in the RH2aβ clade are all derived from Lake Tanganyikan cichlids, which were surveyed prior to the discovery of the RH2a duplication event ; whether these species have retained or lost their RH2aα opsins is currently not known, though current phylogenetic hypotheses for Lake Tanganyikan cichlids  suggest that multiple deletions would be required for all three species to truly lack RH2aα opsins in their respective genomes (see Additional file 1: Figure S2). Note that the RH2aβ sequences from Ophthalmotilapia ventralis and Neolamprologous brichardi were obtained from cDNA even though relative RH2a gene expression is skewed towards the RH2aα paralog in Oreochromis niloticus and Lake Malawi haplochromine cichlids [42, 44]. RH2a opsin sequences are available for many other African cichlids, but often differ from one another at just one or two positions; because including all such sequences would greatly increase computational time without adding much information, we chose to focus on key sequences representing well-studied species and key cichlid lineages.
We explored patterns of selective constraint across this RH2a data set through the use of codon substitution models that include the nonsynonymous to synonymous substitution rate ratio (ω or dN/dS) as a parameter [56, 57]. The ω ratio speaks to the form and strength of selective constraint operating on protein-coding DNA: 0 < ω < 1 is consistent with purifying selection (nonsynonymous substitutions are accumulating more slowly than synonymous substitutions), ω = 1 suggests neutrality (nonsynonymous and synonymous substitutions are accumulating at equivalent rates), and ω > 1 indicates positive selection (nonsynonymous substitutions are accumulating faster than synonymous substitutions). Three different approaches were used, each of which makes different assumptions about how ω varies across the alignment and/or across the phylogeny: (1) Branch models , (2) Branch-site models , and (3) Clade models . Models were fit to the data using the codeml program of the PAML 4.2 software package . Within each of these model classes, likelihood ratio tests (LRTs) were used to compare the fit of complex models against simpler, nested models [60, 61]. LRTs were carried out by comparing twice the difference in ln likelihood scores of nested models against a χ2 distribution with the degrees of freedom equal to the number of extra parameters estimated by the more complex model. However, as LRTs can only be used to compare nested models, we also used AIC scores  to help convey the relative fit of the different Clade models applied to our fish RH2a data set. In addition to the selection pressure parameters (dN/dS, ω; proportions, p), the transition-to-transversion substitution rate ratio (κ) and branch lengths were optimized as well. Codon frequencies were approximated using the F3x4 calculation. Each model was fit to the data multiple times from different starting parameter values to help ensure local optima were avoided, with either ω or κ perturbed, as needed, depending on the particular model. As these methods assume that the aligned sequences are related by a phylogenetic tree, not by a reticulating network, the signature of gene conversion within this data set was searched for using Phi , as implemented in PhiPack. Significance was assessed either assuming a normally approximated null distribution or via a permutation approach (with 1000 permutations), and analyses were run with the window size left at the recommended default value of w = 100 and at w = 50. The results were qualitatively equivalent in spite of these changes; as such, only results derived using the normal approximation and w = 100 are shown. Dot plots were created using eBioX 1.5.1 .
Branch models  assume that the ω ratio varies across branches of the phylogeny (specified a priori) but that it is invariant across sites of the alignment; comparing complex Branch models (i.e., ones with multiple ω ratios) against simpler, nested models tests whether ω varies significantly between sections of the phylogeny. Since Branch models make the unrealistic assumption of among-site homogeneity, they often lack power to detect subtle patterns of divergence across phylogenies, and we conducted post-hoc Branch model analysis simply to help demonstrate our Clade model partitioning schemes (described below). Branch-site models and Clade models similarly allow for variation in ω among pre-specified branches of the phylogeny, but, unlike Branch models, also incorporate among-site variation in selective constraint. The signature of positive selection (ω > 1) along pre-specified lineages was tested for using the Branch-site approach of Zhang, Nielsen, and Yang . This model assumes that there are four classes of sites and that the phylogeny can be divided into ‘background’ and ‘foreground’ lineages based on an a priori hypothesis about when positive selection may have occurred. The first two classes of sites correspond to codons that experience selection consistently across the entire phylogeny, experiencing either purifying selection (0 < ω0 < 1) or neutral pressure (ω1 = 1), respectively. The final two classes of sites correspond to codons that experience purifying or neutral selection on the background lineages, but positive selection (ω2 > 1) on the foreground lineage. These four site classes comprise proportions p0 (universally-purifying site class), p1 (universally-neutral site class), p0*p2/(1 – p2) (purifying-to-positive selection site class), and p1*p2/(1 – p2) (neutral-to-positive selection site class) of the total data set (where p2 = 1 – p0 – p1). The goodness-of-fit of this Branch-site model is established by comparing it via a LRT against a constrained null model where ω2 = 1; this LRT thus tests for the presence of positively selected sites.
The signature of divergent selective constraint across the phylogeny was tested for using the Clade model C (CmC) approach of Bielawski and Yang  as modified by Yang, Wong, and Nielsen . In its simplest form, CmC assumes that the branches of the phylogeny can be divided into two partitions, the ‘background’ branches and the ‘foreground’ branches. CmC accommodates among-site variation in the substitution process by assuming three site classes. As with the Branch-site approach described above, the first two classes of sites correspond to codons that experience selection consistently across the entire phylogeny, experiencing either purifying selection (0 < ω0 < 1) or neutral pressure (ω1 = 1). The third site class accounts for codons that experience divergent selection pressures in different, pre-defined partitions (i.e., ω2 > 0 for the background branches and ω3 > 0 for the foreground branches). These site classes correspond to proportions p0, p1, and p2 of the total data set (where p2 = 1 – p0 – p1). Models M1a and M2a_rel, neither of which incorporates among-lineage variation in ω, were used as null models to test for the presence of divergently selected sites. M1a is the standard null model for CmC analyses ; M1a possesses only two site classes: one for sites subject to purifying selection (0 < ω0 < 1), and one for neutral sites (ω1 = 1). However, our previous analyses of simulated data sets revealed that the CmC versus M1a LRT is prone to false positive test results when faced with moderate among-site variation in selective constraint. We therefore also employed our newly proposed M2a_rel null model for CmC analyses ; M2a_rel possesses purifying and neutral site classes, like the M1a model, but also possesses a third site class under which a single ω ratio is estimated for all branches of the phylogeny (ω2 > 0). We checked the robustness of our results to slight changes in model framework by reanalyzing the data using the Clade model D (CmD) framework ; like CmC, CmD assumes three site classes with the final class modeling divergent selection among clades but, unlike CmC, no constraints are placed on the ω estimates for any of the site classes.
Yoshida et al.  recently extended CmC to allow for more than two tree partitions, each with a separately estimated ω ratio. We refer to this as a ‘multi-clade’ approach, and we used this approach to examine complex patterns of divergence in selection across the phylogeny by comparing such models against simpler, nested models with fewer tree partitions. Assuming a phylogeny can be partitioned into three clades (X, Y, and Z), the multi-clade approach could be used to estimate three separate ω ratios for the three tree partitions (ω2 > 0 for clade X, ω3 > 0 for clade Y, and ω4 > 0 for clade Z). Comparing this model against a simpler, null model with only two tree partitions (say, ω2 > 0 for clade X, and ω3 > 0 for both clade Y and clade Z) would constitute a test of whether selective constraint is equivalent in clades Y and Z (i.e., whether or not ω3 ≠ ω4). This LRT’s null model is formed by imposing a single, non-boundary constraint on the alternative model (i.e., the constraint that ω3 = ω4), reducing model size by one estimated parameter. As a result, the null distribution for this LRT should follow a χ2 distribution with one degree of freedom. We therefore generated simulated data sets assuming a CmC framework (with two tree partitions), and used these data sets to evaluate this multi-clade LRT’s false-positive rate.
Simulated data sets were generated using the evolver program of the PAML 4.2 software package . Following our earlier simulation study of CmC LRTs , 100 data sets of 10 taxa and 500 codons were simulated under CmC assuming the topology, branch lengths, and partitioning shown in Figure 2a. Half of the codons of each simulated data set (p0 = 0.5) were generated assuming strong purifying selection (ω = 0.0), p1 = 0.20 were generated assuming neutrality (ω1 = 1.0), and the remaining codons (p2 = 0.3) were simulated assuming divergent selection pressure between the solid (ω2 = 0.15) and dashed (ω3 = 0.65) tree partitions. Additional parameters for the simulations were set as follows: the transition to transversion substitution rate ratio (κ) was set to κ = 2.0; the total tree length (TL) was set to TL = 3.0 substitutions per codon; and the equilibrium frequency for each sense codon was set to 1/61. For null model analyses, we ran CmC assuming the two partitions shown in Figure 2a (i.e., correct partitioning), while for alternative model analyses, we ran CmC assuming the three partitions shown in Figure 2b (i.e., overly complex partitioning). Branch lengths were freely estimated, but κ and the equilibrium codon frequencies were fixed at their simulated values. Analyses were run multiple times from different starting ω values to help detect and avoid local optima in the likelihood surface, as described in Weadick and Chang . The LRT test statistics for each of the 100 simulated data sets—twice the difference in ln likelihood scores from the alternative and null model analyses—were calculated, compiled, and compared against the expected χ21 null distribution. Observed and expected cumulative density functions were compared via a one-sided Kolmogorov-Smirnov test and a one-sided binomial test. We carried out additional analyses on the 100 simulated data sets where the phylogenetic partitioning of the alternative and null models was misspecified, with branch 10 of the tree shown in Figure 2a,b included in the ‘dashed’ partition, rather than the correct ‘solid’ partition.
Following Clade model analysis, a Bayes empirical Bayes (BEB) approach was used to identify specific codons with high posterior probability (PP) of being in the ‘divergent selection’ site class . BEB-identified sites were then mapped on to the three-dimensional crystal structure of bovine rhodopsin (PDB accession 1u19)  using MacPyMol (Delano Scientific). The phylogenetic location of specific amino acid substitutions was inferred by using ancestral reconstruction methods  to estimate the most probable residue at each node under the WAG+F+Γ amino acid substitution model [53, 71]. Site numbering is based on alignment against bovine RH1 opsin (rhodopsin).
Phylogenetic analyses recovered reciprocally monophyletic clades of African cichlid RH2aα and RH2aβ opsins and a sister relationship between the African cichlid RH2a opsin clade and the RH2a opsin of the Neotropical cichlid Crenicichla frenata (Figure 1). Trees estimated via the codon-partitioned Bayesian approach and the ML approach were highly similar, and only differed with respect to the arrangement of a few of the highly similar haplochromine cichlid RH2a opsin sequences; the full Bayesian (codon-partitioned) and ML trees are provided in Additional file 1: Figure S1. The non-partitioned and codon-partitioned Bayesian methods yielded trees with equivalent branching patterns (result not shown). For molecular evolutionary analyses, we focused on the subtree corresponding to the RH2a opsins of cichlids and those of closely related atherinomorph fishes (guppy, Poecilia reticulata; bluefin killifish, Lucania goodei; and medaka, Oryzias latipes) from the codon-partitioned Bayesian phylogeny (Figure 1). Fitting the simple M0 codon substitution model to this data set provided an overall ω estimate of 0.1476, indicating that purifying selection is the predominant force shaping the evolution of these RH2a opsin sequences. Estimates of dS, calculated for each branch given the branch length, the number of nonsynonymous and synonymous sites in the sequence, and the overall ω estimate calculated under the M0 model, were always well below one, indicating that saturation of synonymous substitutions is unlikely to adversely affect our analyses.
Gene conversion between paralogs violates the assumptions of ML methods for estimating ω, and can even cause spurious signatures of positive selection under some conditions . It has previously been suggested that gene conversion is unlikely to affect the African cichlid RH2a opsins because the paralogs are arranged in a head-to-head manner . However, recent work on rodent genomes has shown that duplicates oriented in such a fashion are as prone to gene conversion as those oriented in the typical head-to-tail manner . We therefore used Phi to test for local correlations in phylogenetic incompatibility across the data set (i.e., across the alignment). While we did detect a significant signature of recombination (P = 0.028), this reflected gene conversion between the distantly related RH2B and RH2C paralogs of the medaka (Oryzies latipes), not gene conversion between the African cichlid RH2a opsins. Unlike the African cichlid RH2a paralogs, these duplicated medaka opsins are oriented in a head-to-tail manner, indicating an independent duplication event . Visual inspection revealed that the third and fourth introns of the medaka’s RH2B and RH2C paralogs are highly similar, while the first and second introns are divergent (Additional file 1: Figure S3), and removing either (or both) of the medaka RH2B/C paralogs from the data set eliminated the signature of conversion (P > 0.50 in all cases). Furthermore, strong gene conversion should result in sequences clustering by species, not by paralog, and this pattern is not observed within the African cichlid RH2a opsin portion of the estimated phylogeny (Figure 1). These results suggest that gene conversion has not had a notable impact on the evolutionary trajectories of the coding sequences of the duplicated African cichlid RH2a opsins and thus should not have adverse effects on our analyses. Visual inspection of the RH2aα and RH2aβ opsins of Oreochromis niloticus, however, revealed that introns one and four are highly similar, while introns two and three are relatively divergent (Additional file 1: Figure S3). This may indicate that natural selection is maintaining distinct opsin coding sequences in spite of at least some intronic gene conversion, as occurs in human red and green opsins . Alternatively, the fact that these introns are highly similar may reflect strong purifying selection on non-coding motifs with roles in gene regulation or splicing control; more intronic RH2a sequence data, obtained from numerous species, will be needed to address these possibilities.
Several amino acid substitutions occurred along the RH2aα and RH2aβ post-duplication branches (Additional file 1: Table S1), but in neither case did we obtain evidence indicative of adaptive divergence following gene duplication using Branch-sites methods (Table 1). The Branch-site LRT for positive selection  was applied four times, with different branches set as the foreground partition: (1) the RH2aα post-duplication branch; (2) the RH2aβ post-duplication branch; (3) both post-duplication branches, combined; and (4) the branch joining the paralogous clades in a reduced data set for which outgroup sequences were excluded. Our Branch-site tests remained non-significant even when a more liberal null distribution (a 50:50 mixture of 0 and χ21) was employed instead. We do note, however, that the substitutions inferred along the RH2aβ post-duplication branch were densely clustered (substitutions occurred at six sites in a 13 site stretch: sites 27–39; Additional file 1: Table S1). This region of the protein has recently been proposed to serve as the entry channel for the retinal chromophore into the protein’s binding pocket . One of these substitutions (I36F) involved the replacement of a non-aromatic isoleucine residue with an aromatic phenylalanine residue at a site located at the base of the proposed entry channel; aromatic residues are thought to assist in retinal uptake , suggesting that this change may enhance visual pigment regeneration rate.
Parameter estimates, log-likelihood scores, and likelihood ratio test (LRT)Pvalues obtained from Branch-site analyses of the RH2a data set
Site class 0
Site class 1
Site class 2
LRT P value
BrS-A α (37)
BrS-N α (36)
BrS-A β (37)
BrS-N β (36)
BrS-A αβ (37)
BrS-N αβ (36)
BrS-A αβ reduced (27)
BrS-N αβ reduced (26)
NOTE—n.p number of parameters.
Given our initial findings using Branch-site methods, we explored alternative signatures of divergence in selective constraint using the Clade model C (CmC) approach [20, 21, 66]. First, we built on our earlier simulation-based study of CmC LRTs  in order to evaluate the appropriateness of the χ21 null distribution for the multi-clade LRT comparing CmC with three tree partitions against a simpler, nested version of CmC with only two tree partitions . The empirical and expected distributions of LRT test statistics from our simulation analyses are plotted in Figures 2c and 2d, and it can be seen that the two distributions follow one another quite closely. Parameter estimates under the alternative and null model are summarized in Additional file 1: Figure S4. Although eight of the 100 tests indicated positive test results, this value is not significantly different from the expected 5% (one-sided binomial test: P = 0.1280). Furthermore, a Kolmogorov-Smirnov test comparing the observed and expected cumulative density functions was non-significant, indicating a good fit to the expected null distribution (one-sided test for an empirical distribution falling below the χ21 distribution: D = 0.0609; P = 0.4759). Our simulation results therefore suggest that the LRT comparing CmC with three tree partitions against a simpler, nested version of CmC with two tree partitions can be evaluated using a χ21 null distribution, though we note that further analyses are needed to fully evaluate the reliability and power of this approach. Recently, Gossmann and Schmid  carried out similar simulation-based analyses of this LRT, concluding that it has fair-to-good power and a relatively low false positive rate; however, these analyses were carried out on smaller data sets than we employed here.
Additional analyses of the simulated data sets were carried out using an incorrect phylogenetic partitioning strategy. Given the tree shown in Figure 2b, we treated the branch leading to tip #10 as part of the ‘dashed’ partition, rather than the correct ‘solid’ partition, and we then tested for divergence (i.e., ω3 ≠ ω4) by comparing the ‘dotted’ partition against the expanded, heterogeneous ‘dashed’ partition. Based on the simulated branch lengths and ω parameter values, misspecifying the phylogenetic partitioning in this way should reduce the ω3 estimate (to ω3 ≈ 0.52) compared to the ω4 estimate (ω4 ≈ 0.65), generating a signature of divergence. We found that 23 of the 100 LRTs were significant, suggesting weak power for this test under the given conditions. Maximum likelihood estimates of the parameter values appeared to be accurate for most of the parameters (ω0, ω2, ω3, p0, p2), though slightly upwardly biased for the ω4 estimate (Additional file 1: Figure S4). Given these results, we conclude that the properly specified multi-clade LRT is statistically sound, but caution that care must be taken when designating partitions for Clade model analyses. Specifically, we recommend that partitioning choices be carefully based on external considerations, such as gene duplication theory, taxonomic sampling, or phylogenetic patterns of niche variation. Of course, additional simulation-based studies of this LRT’s properties will be beneficial, and future work should address the performance of this test using larger data sets and assuming more complex evolutionary scenarios designed to challenge the assumptions of the alternative and null models.
We first applied CmC with either the entire RH2aα clade (‘CmC α’) or the entire RH2aβ clade (‘CmC β’) set as the foreground partition (Figure 1); all other branches comprised the background partition. In both cases, the ω ratio for the divergent selection site class changed from less than one on the background lineages (ω2 ≈ 0.55–0.65) to greater than one on the foreground lineages (ω3 ≈ 1.15–1.53), with approximately 21% of the data set assigned to the divergent selection site class (Table 2). These estimates suggest that selective constraint was relaxed following duplication, and may indicate the action of weak positive selection as well. Both models fit the data significantly better than the M1a null model, but ‘CmC α’ did not fit the data significantly better than the more reliable M2a_rel null model (Table 3). However, this ‘CmC α’ model includes the RH2aβ clade as part of the ‘background’ partition alongside the outgroup orthologs of non-African cichlids. This may be inappropriate, as the ‘CmC β’ vs. M2a_rel LRT suggested a large increase in ω for the RH2aβ clade. To evaluate this possibility, we employed a multi-clade CmC approach assuming three partitions: the RH2aα branches, the RH2aβ branches, and the outgroup orthologous branches. Using this model, which we call the ‘CmC α & β’ model, it was estimated that approximately 21% of the data set evolved under divergent selective constraint across the three partitions, with a ω ratio less than one along the outgroup branches (ω2 = 0.50), slightly above one along the RH2aα branches (ω3 = 1.13), and somewhat higher still along the RH2aβ branches (ω4 = 1.54) (Table 2). Comparison against the ‘CmC β’ model yielded a significant LRT result (Table 3), indicating that selective constraint did indeed change after duplication in the RH2aα clade; this difference only became statistically significant once the elevated ω ratio for the RH2aβ clade was accounted for in the model. Comparison against a different null model, one where all African cichlid RH2a branches were considered as a single foreground partition (termed the ‘CmC αβ’ model), revealed that the ω ratios estimated for the RH2aα and RH2aβ partitions under the ‘CmC α & β’ model were not significantly different from one another (Table 3), with a common ω ratio estimate of ω3 = 1.39 (Table 2).
Parameter estimates, log-likelihood scores, and AIC weights obtained from Clade model analyses of the RH2a data set
ω2, ω3, ω4
CmC αβMVR & βT (39)
CmC αβ (38)
CmC α & β (39)
CmC β (38)
CmC βT (38)
CmC α (38)
NOTE—n.p. = number of parameters; SC = site class.
Models are ranked according to AIC score.
Likelihood ratio test (LRT)Pvalues for nested Clade model C (CmC) comparisons
CmC αβMVR & βT
< 0.0001 (4)
< 0.0001 (2)
CmC α & β
< 0.0001 (4)
< 0.0001 (2)
< 0.0001 (1)
< 0.0001 (3)
< 0.0001 (1)
< 0.0001 (3)
< 0.0001 (3)
Degrees of freedom for each LRT are indicated in parentheses.
Our RH2a data set was taxonomically unbalanced, possessing three Lake Tanganyikan cichlid RH2aβ sequences without corresponding RH2aα paralogs, and our Branch model analyses revealed that selective constraint was significantly different between the branches corresponding to these Lake Tanganyikan cichlid opsin RH2aβ lineages and the remaining RH2aβ branches (Figure 1, inset). It is therefore possible that the increased ω ratio observed for the RH2aβ clade in our CmC analyses was driven by these Lake Tanganyikan RH2aβ sequences. In lieu of removing these taxonomically-unbalancing sequences from the alignment, which would represent an unfortunate loss of data, we carried out further analyses using the recently developed multi-Clade approach . Specifically, we added a third partition to the ‘CmC αβ’ model that separated the Lake Tanganyikan RH2aβ branches (ββ) from the Lake Malawi, Lake Victoria, and riverine RH2aα and RH2aβ branches (αβMVR). Using this model, which we call the ‘CmC αβMVR & ββ’ model, it was estimated that approximately 21% of the data set evolved under divergent selective constraint, with a ω ratio less than one along the outgroup branches (ω2 = 0.50), slightly above one along the αβMVR branches (ω3 = 1.09), and substantially higher along the βT branches (ω4 = 2.61) (Table 2). Out of the Clade models considered, the ‘CmC αβMVR & βT’ model was the best fitting according to AIC scores, and comparison against the ‘CmC αβ’ model yielded a significant LRT result (Table 3), indicating that the ω ratio differs significantly between the Lake Tanganyikan cichlid RH2aβ opsins (the βT partition) and the RH2aα and RH2aβ opsins of other African cichlids (the αβMVR partition). Importantly, partitioning the tree in this manner did not eliminate the ω ratio increase observed for African cichlid RH2a opsin clade compared to the outgroup orthologs (LRT against ‘CmC βT’; Tables 2, 3). Finally, the divergent ω ratio estimate for the Lake Tanganyikan RH2aβ branches (the βT partition) was significantly greater than ω = 1 (Table 4), indicating that the increase in ω seen for the βT partition is due to site-specific positive selection and not simply relaxed functional constraint. Conversely, the divergent ω ratio estimated for the αβMVR partition, though elevated, was not found to significantly exceed ω = 1 (Table 4).
Likelihood ratio tests (LRTs) to determine whether foreground ω estimates from the ‘CmC αβMVR& βT’ model significantly differ from ω = 1
Unconstrained ω estimate
lnL with constraint ω = 1
αβMVR (RH2aα and non-Tanganyikan RH2aβ branches)
βT (Tanganyikan RH2aβ branches)
Both LRTs have 1 degree of freedom.
Reanalyzing the data under the CmD framework gave broadly similar results (compare Tables 2 and 3 with Additional file 1: Tables S2 and S3). First, parameter estimates were qualitatively similar between CmC and CmD analogues. Second, the rank order of CmC and CmD analogues was almost completely equivalent, with the only difference being the relative position of the two and three site-class null models (M1a and M2a_rel for CmC; M3 (K = 2) and M3 (K = 3) for CmD), which in both cases were very poor fits to the data. And, finally, P values for the various LRTs were generally similar, with only three tests providing qualitatively different results under the two model frameworks (one test achieved significance under CmD but not under CmC, and two others achieved significance under CmC but not CmD); notably, non-significant P values were still less than P = 0.10 for these each of these three cases. Notwithstanding these few differences, the broad similarity of AIC ranks, the majority of the LRTs, and the qualitative agreement in parameter estimates suggest that our CmC analyses have provided reliable inferences on cichlid RH2a opsin evolution.
Eighty-one of the 343 total codons in the alignment were variable at the amino acid level and, of these, 27 were identified by BEB analysis as members of the divergently evolving site class under the best fitting ‘CmC αβMVR & βT’ model (PP > 0.75 for each site) (Table 5). Included among these identified sites are seven of the 10 sites that substituted along the RH2aα and RH2aβ post-duplication branches (Additional file 1: Table S1). The position of these sites within the opsin protein's secondary and tertiary structure are shown in Figure 3. Most of these sites are situated within the opsin protein’s extracellular half (Figure 3) and several are, or are adjacent to, known RH2 pigment spectral tuning sites (Table 5). Most prominent among these is site 122; all African cichlid species in our data set possess a glutamate residue (E122) at this site except for the Lake Tanganyikan species Ophthalmotilapia ventralis, which instead possesses a glutamine residue (Q122). The E122Q substitution is known to have a large effect on RH2 pigment spectral sensitivity, shifting λmax to shorter wavelengths by ~12-16 nm [78–80]. Moreover, this substitution is known to dramatically affect non-spectral properties of visual pigments (discussed below). Two other notable substitutions occurred along this same branch: F213V and G273V. Mutating site 273 has been shown to affect retinal uptake [76, 81], while substitutions at site 213 are known to have slight effects on λmax in zebrafish (Danio rerio) RH2 opsins . Furthermore, substitutions at site 213 and 273 have been experimentally shown to influence the decay rate of the activated visual pigment in zebrafish RH1 opsins (J.M. Morrow and B.S.W. Chang, unpublished data). Interestingly, these three sites (122, 213, and 273) all substituted independently along the branch terminating with the Oryzies latipes (medaka) RH2C sequence, which suggests that some non-spectral opsin properties may be evolving convergently.
Sites identified as divergently evolving by Bayes empirical Bayes inference under the ‘CmC αβMVR& βT’ Clade model
Notes on Opsin Structure–Function
N-term / TM1
RH2 spectral tuning site . Situated on edge of retinal uptake/release channel .
Adjacent to RH2 spectral tuning site (site 108) .
E1 / TM3
Adjacent to RH2 spectral tuning site (site 108) . Adjacent to cysteine bond site (site 110) .
RH2 spectral tuning site . Adjacent to opsin counterion (site 113) .
Major RH2 spectral tuning site . Also influences G protein activation efficiency, active-state decay rate, and visual pigment regeneration rate .
NOTE— Site numbering follows bovine RH1 opsin. Approximate location follows Sakmar et al. . Abbreviations: N-term N-terminal tail, TM Transmembrane helix, E, Extracellular loop, C Cytoplasmic loop, C-term C-terminal tail.
Only sites with posterior probability (PP) > 0.75 are shown. Underlined sites are those which substituted along branches within the Lake Tanganyikan cichlid RH2aβ partition of the phylogeny.
A total of 178.5 amino acid changes were estimated to have occurred across the entire tree (given an alignment of 343 codons and a total tree length of 0.52032 amino acid substitutions per site, as estimated under the WAG+F+Γ amino acid substitution model). By reconstructing the ancestral amino acid residues at each node, we infer that 86 nonsynonymous changes, at minimum, occurred at the 27 BEB identified sites (mean ± SD = 3.19 ± 1.11 amino acid changes per BEB site; range = 2–6). For 18 of these 27 BEB sites, substitutions occurred along branches within the Lake Tanganyikan cichlid RH2aβ partition. Examining the changes at these sites in the context of the phylogeny reveals a few interesting patterns (Figure 4, Additional file 1: Table S4). First, BEB identified sites 273 and 277 both substituted along the branch terminating in the Ophthalmotilapia ventralis RH2aβ opsin sequence; these two sites are situated approximately one α helical turn apart in the opsin’s sixth transmembrane α helix, suggesting coevolutionary change. Possibly, this pattern reflects compensatory evolution, as one of the substitutions (G273V) introduced a larger amino acid while the other (M277L) introduced a smaller one. Similarly, BEB identified sites 158 and 162 are both situated one α helical turn apart in the fourth transmembrane α helix, and both substituted along the branch terminating in the Tropheus duboisi RH2aβ opsin sequence. Again, we see one substitution introduce a larger reside (L158F) and the other introduce a smaller residue (I162V). These two sites both project outwards into a proposed opsin dimerization interface , and position 162 has been experimentally shown to contribute to dimerization in the homologous dopamine D2 receptor proteins . Changes at these sites may thus influence dimerization strength, which, in turn, can influence G protein binding . Interestingly, these changes also occurred along the RH2aα post-duplication branch, suggesting not simply coevolution, but convergence as well. Finally, a cluster of BEB identified sites (sites 107, 109, and 112) found at the boundary between the opsin’s first extracellular loop and third transmembrane α helix all substituted along the branch leading to the last common ancestor of the Ophthalmotilapia ventralis and Neolamprologus brichardi RH2aβ opsin sequences. Site 112 is a known RH2 pigment spectral tuning site , while sites 107 and 109 surround another known RH2 spectral tuning site (site 108) . The substitution at site 109 is particularly interesting, as the inferred change (F109S) is quite physicochemically severe, replacing a bulky, hydrophobic phenylalanine residue with a smaller, hydroxyl-bearing serine residue. More broadly, it can be seen that more sites known to affect spectral sensitivity (or that are adjacent to such sites) substitute within the RH2aβ clade compared to the RH2aα clade. Most of this difference stems from changes along the branch ancestral to Ophthalmotilapia ventralis and Neolamprologus brichardi, and along the branch terminating in Ophthalmotilapia ventralis.
Evolution after gene duplication is often characterized by some combination of relaxed selective constraint and the positively selected fixation of advantageous mutations, ultimately resulting in functional divergence among paralogs [3–5, 89]. Consistent with these expectations, our results (1) revealed a dramatic change in selective constraint following the African cichlid RH2a opsin duplication event and (2) identified the signature of divergent evolution at several amino acid sites of known functional importance in RH2 visual pigments. Clade model analyses revealed that, after duplication, the selective regime experienced by many alignment sites changed from weak purifying selection to either neutral evolution or weak positive selection. Interestingly, this switch in constraint applied to both duplicated clades relative to the outgroup orthologs, and this pattern was only detected once divergence among entire clades was considered but not when just the branches immediately following the duplication event were considered.
While the patterns of evolution we observed in the fish RH2a opsin data set are consistent with a dramatic change in selective constraint following the African cichlid RH2a opsin duplication event, it is not obvious which models of gene duplication are operating here, as the observed patterns do not neatly fit the predictions of most models. The adaptive and non-adaptive (Dykhuizen-Hartl) neofunctionalization models [3, 90] both posit that one copy accumulates previously deleterious substitutions, potentially leading to the evolution of a new function, while the other retains the ancestral function under a regime of purifying selection. These neofunctionalization models thus predict asymmetrical rates of evolution after duplication between paralogs, but our results indicate approximately equal shifts in selective constraint after duplication in both duplicates (with ω changing from ω ≈ 0.5 before duplication to ω ≈ 1.1–1.5, afterwards). The segregation avoidance model [3, 91] proposes that duplication may beneficially fix both alleles at loci harbouring balanced polymorphisms, thus eliminating costs associated with segregation load. This model predicts that functional divergence occurs among alleles before duplication, not long after the duplicate loci have fixed as we have found. The dosage model operates when possessing multiple loci provides a beneficial increase in gene product [3, 92]; this model seems inappropriate as well, as RH2aβ opsin expression is generally quite low in both Oreochromis niloticus and Lake Malawi haplochromine cichlids [42, 44], and as the paralogs are known to be functionally divergent, contrary to model predictions. The popular duplication-degeneration-complementation model applies to multifunctional proteins, and describes a scenario by which each daughter protein neutrally loses one or more of the multiple functions such that both copies are then needed to perform all tasks . This model could explain increases in ω seen in both daughter clades, but this model seems unlikely to apply to opsins, at least at the protein coding level, as it is difficult to conceive of a way opsin protein biochemistry could be subfunctionalized. Opsin proteins have several measurable biochemical phenotypes (including λmax, active state stability, and regeneration rate), but proper visual pigment functioning requires an integrated protein for successful phototransduction. Subfunctionalization could occur at the regulatory level, however .
The ‘gene sharing’ model (also referred to as the ‘specialization’ or ‘escape from adaptive conflict’ model) [7, 8] seems to be the most appropriate model for our cichlid opsin data set. If a single-copy protein’s ability to efficiently serve two roles is compromised due to pleiotropy then, after duplication, each copy can adaptively specialize on one of the two roles (assuming both roles are suboptimal in the ancestor). The post-duplication ω increases we observed for our RH2a data set may indicate weak positive selection in both paralogs, which would be consistent with this prediction. While this model is typically applied to multifunctional proteins that carry out two totally different roles (e.g., a structural role and an enzymatic role, as in α lens crystallins), it can also apply to proteins that perform a single biochemical task (e.g., a particular enzymatic reaction) if there is a benefit to having copies with subtly different rates or efficiencies . With regard to opsins, this could mean that a property such as λmax diverged in both copies compared to the ancestor, one to a longer wavelength and one to a shorter wavelength. Similarly, structural constraints may prevent co-optimization of different aspects of the protein’s overall function. Such predictions are testable through ancestral reconstruction and functional characterization of the single-copy ancestor. Of course, it should be noted that many of these models of gene duplicate retention and evolution can act in concert or subsequent to one another . Indeed, the patterns of amino acid substitutions inferred along the RH2aβ post-duplication branch are quite different than those inferred along the RH2aα post-duplication branch, with six highly-clustered sites substituting along the RH2aβ branch (Additional file 1: Table S1). Changes at these sites could conceivably influence pigment regeneration rate , and this may indicate that both neofunctionalization and specialization occurred following the RH2a duplication event in this system. Furthermore, here we have only considered the evolution of protein biochemistry, but these models of duplicate gene evolution can also be considered with regard to gene expression patterns.
The fact that we uncovered among-lineage ω variation when Clade models were used, but not when Branch-site models were used, opens the possibility that divergence among African cichlid RH2a opsins is also influenced by a collection of lineage-specific processes. Consistent with this hypothesis, we documented a large increase in ω along Lake Tanganyikan cichlid RH2aβ opsin branches that was above and beyond the increase already described for the RH2aβ clade. The estimated ω ratio for this tree partition was significantly above one, clearly indicating the action of positive selection. This finding is of note given that our initial focus was only on divergence associated with gene duplication, not divergence among orthologs. Our study thus serves as an example of how the evolution of duplicated genes can be driven by both paralog-specific and species-specific processes . This point has practical importance as well. Many studies within the field of visual ecology assume functional equivalence among ortholgous pigments and model focal species’ perceptual abilities using data from close relatives . Our results suggest that such an approach should only be applied tentatively for studies on cichlid visual ecology.
At this point, it is difficult to say what factors are behind the positive selection operating along the Lake Tanganyikan cichlid RH2aβ opsin lineages, as the visual niches inhabited by the three Lake Tanganyikan species included in our data set have surely evolved over the time-scale captured by our phylogeny. These three species inhabit distinct visual niches, varying in colour patterning, habitat depth, and diet , and these ecological differences could precipitate divergent selection on opsin biochemistry and expression; the detection of divergent sexually selected courtship signals or food sources may select for divergence in λmax, while vision under brighter or dimmer conditions could select for divergence in non-spectral, kinetic properties of visual pigments. Interestingly, some of the sites identified as positively selected along Lake Tanganyikan RH2aβ opsin lineages are known to effect both spectral (i.e., λmax) and non-spectral attributes of visual pigments (Table 5). Most notably, the E122Q substitution, which occurred along the terminal branch leading to the Lake Tanganyikan cichlid Ophthalmotilapia ventralis, is known to increase the λmax of RH2 pigments by a large amount (~12-16 nm) [78–80], and to affect a number of non-spectral, kinetic properties, including the photoactivated pigment’s decay rate, the efficiency with which the activated pigment activates the downstream G protein, and the rate of visual pigment reformation following retinal release [84, 97]. For each of these properties, experimentally adding the E122Q substitution to rod pigments produces mutant pigments that behave in a more cone-like manner (i.e., with a faster rate of active state decay and faster pigment regeneration). Ophthalmotilapia ventralis is known to generally reside at shallower depths than the other two Lake Tanganyikan cichlids in our data set (approximate depth range: O. ventralis 2–10 m; N. brichardi 5–30 m, T. duboisi 3–15 m) , where the visual environment is expected to be somewhat brighter. We thus see a compatible pattern between opsin molecular evolution and comparative visual ecology in this system. Overall, our results suggest that the RH2aβ opsins play an important role in vision in at least some Lake Tanganyikan species, despite the fact that the RH2aβ opsin has generally been found to be lowly expressed in other African cichlids.
It is notable that we were only able to uncover among-lineage ω variation once Clade models were employed, but not when the more popular Branch-site models were used. A number of factors are expected to affect the power of the Branch-site test , and it may be that future analyses of larger data sets will yield different results, though we suspect otherwise, as substitutions occurred at several amino acids along both of the post-duplication branches. It appears that functional divergence simply occurred in a manner undetectable by the Branch-site methods. Branch-site models assume a very specific form of functional divergence—that is, a punctuated burst of adaptive sequence turnover—but functional divergence can instead manifest as variation in the overall strength of constraint or residue conservation between clades . Most of the sites that substituted along the RH2aα and RH2aα post-duplication branches also substituted along other branches of the phylogeny, and such substitution patterns may not fit neatly within the site classes defined by the Branch-site models. The design of Clade models makes them better able to detect this alternative signature of functional divergence. Furthermore, the patterns we uncovered were only detectable because the Clade model approach was recently expanded to allow for multiple foreground partitions , which allowed us to fit models that accommodate multiple shifts in the ω ratio across the phylogeny. To date, only three other studies have employed this new approach: Yoshida et al.  uncovered variation in the strength of positive selection affecting HIV env genes sampled from different decades; Wei and Ge  documented divergent selective constraint among duplicated MADS-box transcription factors in grasses; and, finally, Gossmann and Schmid  studied genome-wide patterns of divergence among duplicated Arabidopsis thaliana genes. We note that our study is the first, to our knowledge, to explicitly investigate divergence among both orthologs and paralogs in the same data set. Finally, it is noteworthy that several studies have used the results of Clade model analyses to support arguments of positive selection [101, 102], but whether or not the relevant ω estimates significantly exceed ω = 1 has not been explicitly tested, as we have done here; this point, while at first glance technical in nature, is of large importance, as ω estimates larger than ω = 1 may occur due to chance.
In conclusion, our results are indicative of functional divergence among African cichlid RH2a opsins driven, at least in part, by positive selection. Combined with the insights of past studies, which indicate biochemical and expression differences among paralogs and among species, our results suggest that there is much to be learned by distinguishing among African cichlid RH2a opsin sequences, rather than grouping them together on account of their high sequence similarity. Furthermore, these results provide a framework for mechanistic studies of functional diversification among cichlid RH2a opsins, and help establish African cichlid RH2a opsins as a useful system for research on how function and linkage shape the evolution of young tandem duplicates . Finally, our study adds to a growing body of research directed towards uncovering the molecular signature of diversification within the rapidly speciating African cichlid clade.
Helen Rodd provided feedback and support throughout all stages of this project. We thank Bonnie Fraser, Asher Cutter, Karen Carleton, Marla Sokolowski, and two anonymous reviewers for providing feedback on earlier versions of this manuscript, David Yu, Frances Hauser, and Shannon Refvik for proofreading, and the members of the Chang lab for informative discussions. This work was supported by the National Sciences and Engineering Research Council of Canada (CJW, BSWC), and a University of Toronto Vision Science Research Program Fellowship (CJW).
Department of Evolutionary Biology, Max Planck Institute for Developmental Biology
Department of Ecology and Evolutionary Biology, University of Toronto
Department of Ecology and Evolutionary Biology, Department of Cell and Systems Biology, and Centre for the Analysis of Genome Evolution and Function, University of Toronto
Hanada K, Kuromori T, Myouga F, Toyoda T, Shinozaki K: Increased expression and protein divergence in duplicate genes is associated with morphological diversification.PLoS Genet 2009,5(12):e1000781.PubMedView Article
Lynch M, Conery JS: The origins of genome complexity.Science 2003,302(5649):1401–1404.PubMedView Article
Ohno S: Evolution by Gene Duplication. Springer-Verlag, New York; 1970.
Hahn MW: Distinguishing Among Evolutionary Models for the Maintenance of Gene Duplicates.J Hered 2009,100(5):605–617.PubMedView Article
Conant GC, Wolfe KH: Turning a hobby into a job: How duplicated genes find new functions.Nat Rev Gen 2008,9(12):938–950.View Article
Force A, Lynch M, Pickett FB, Amores A, Yan YL, Postlethwait J: Preservation of duplicate genes by complementary, degenerative mutations.Genetics 1999,151(4):1531–1545.PubMed
Hughes AL: The evolution of functionally novel proteins after gene duplication.Proc R Soc Lond B Biol Sci 1994,256(1346):119–124.View Article
Levasseur A, Orlando L, Bailly X, Milinkovitch MC, Danchin EGJ, Pontarotti P: Conceptual bases for quantifying the role of the environment on gene evolution: the participation of positive selection and neutral evolution.Biol Rev 2007,82(4):551–572.PubMedView Article
Hoekstra HE, Coyne JA: The locus of evolution: Evo devo and the genetics of adaptation.Evolution 2007,61(5):995–1016.PubMedView Article
Studer RA, Robinson-Rechavi M: How confident can we be that orthologs are similar, but paralogs differ?Trends Genet 2009,25(5):210–216.PubMedView Article
Kimura M, Ohta T: Some principles governing molecular evolution.Proc Natl Acad Sci U S A 1974,71(7):2848–2852.PubMedView Article
Anisimova M, Liberles DA: The quest for natural selection in the age of comparative genomics.Heredity 2007,99(6):567–579.PubMedView Article
Anisimova M: Parametric models of codon evolution. In Codon Evolution: Mechanisms and Models. Edited by: Cannarozii GM, Schneider A. Oxford University Press, Oxford; 2012.
Yang Z, Nielsen R: Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages.Mol Biol Evol 2002,19(6):908–917.PubMedView Article
Zhang JZ, Nielsen R, Yang ZH: Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level.Mol Biol Evol 2005,22(12):2472–2479.PubMedView Article
Studer RA, Robinson-Rechavi M: Large-scale analysis of orthologs and paralogs under covarion-like and constant-but-different models of amino acid evolution.Mol Biol Evol 2010,27(11):2618–2627.PubMedView Article
Gu X: A simple statistical method for estimating type-II (cluster-specific) functional divergence of protein sequences.Mol Biol Evol 2006,23(10):1937–1945.PubMedView Article
Forsberg R, Christiansen FB: A codon-based model of host-specific selection in parasites, with an application to the influenza A virus.Mol Biol Evol 2003,20(8):1252–1259.PubMedView Article
Bielawski JP, Yang ZH: A maximum likelihood method for detecting functional divergence at individual codon sites, with application to gene family evolution.J Mol Evol 2004,59(1):121–132.PubMedView Article
Yoshida I, Sugiura W, Shibata J, Ren FR, Yang ZH, Tanaka H: Change of Positive Selection Pressure on HIV-1 Envelope Gene Inferred by Early and Recent Samples.PLoS One 2011,6(4):e18630.PubMedView Article
Chang BSW, Du J, Weadick CJW, Muller J, Bickelmann C, Yu DD, Morrow JM: The Future of Codon Models in Studies of Molecular Function: Ancestral Reconstruction, and Clade Models of Functional Divergence. In Codon Evolution: Mechanisms and Models. Edited by: Cannarozii GM, Schneider A. Oxford University Press, Oxford; 2012.
Hodges SA, Derieg NJ: Adaptive radiations: from field to genomic studies.Proc Natl Acad Sci U S A 2009,106(Suppl 1):9947–9954.PubMedView Article
Jeukens J, Bittner D, Knudsen R, Bernatchez L: Candidate genes and adaptive radiation: insights from transcriptional adaptation to the limnetic niche among coregonine fishes (Coregonus spp., Salmonidae).Mol Biol Evol 2009,26(1):155–166.PubMedView Article
Kocher TD: Adaptive evolution and explosive speciation: the cichlid fish model.Nat Rev Gen 2004,5(4):288–298.View Article
Salzburger W: The interaction of sexually and naturally selected traits in the adaptive radiations of cichlid fishes.Mol Ecol 2009,18(2):169–185.PubMedView Article
Roberts RB, Hu Y, Albertson RC, Kocher TD: Craniofacial divergence and ongoing adaptation via the hedgehog pathway.Proc Natl Acad Sci 2011,108(32):13194–13199.PubMedView Article
Gunter HM, Clabaut C, Salzburger W, Meyer A: Identification and characterization of gene expression involved in the coloration of Cichlid fish using microarray and qRT-PCR approaches.J Mol Evol 2011,72(2):127–137.PubMedView Article
Salzburger W, Braasch I, Meyer A: Adaptive sequence evolution in a color gene involved in the formation of the characteristic egg-dummies of male haplochromine cichlid fishes.BMC Biol 2007, 5:51.PubMedView Article
Albertson RC, Streelman JT, Kocher TD, Yelick PC: Integration and evolution of the cichlid mandible: The molecular basis of alternate feeding strategies.Proc Natl Acad Sci U S A 2005,102(45):16287–16292.PubMedView Article
Terai Y, Okada N: Speciation of Cichlid Fishes by Sensory Drive. In From Genes to Animal Behavior. Edited by: Inoue-Murayama M, Kawamura S, Weiss A. Springer Japan, Tokyo; 2011:311–328.View Article
Carleton K: Cichlid fish visual systems: mechanisms of spectral tuning.Integr Zool 2009,4(1):75–86.PubMedView Article
Yau KW, Hardie RC: Phototransduction motifs and variations.Cell 2009,139(2):246–264.PubMedView Article
Yokoyama S: Evolution of dim-light and color vision pigments.Annu Rev Genom Hum Genet 2008, 9:259–282.View Article
Bowmaker JK: Evolution of vertebrate visual pigments.Vision Res 2008,48(20):2022–2041.PubMedView Article
Seehausen O, Terai Y, Magalhaes IS, Carleton KL, Mrosso HDJ, Miyagi R, van der Sluijs I, Schneider MV, Maan ME, Tachida H, et al.: Speciation through sensory drive in cichlid fish.Nature 2008,455(7213):620-U623.PubMedView Article
Carleton KL, Parry JW, Bowmaker JK, Hunt DM, Seehausen O: Colour vision and speciation in Lake Victoria cichlids of the genus Pundamilia.Mol Ecol 2005,14(14):4341–4353.PubMedView Article
Hofmann CM, Carleton KL: Gene duplication and differential gene expression play an important role in the diversification of visual pigments in fish.Integrative and Comparative Biology 2009,49(6):630–643.PubMedView Article
Parry JWL, Carleton KL, Spady T, Carboo A, Hunt DM, Bowmaker JK: Mix and match color vision: Tuning spectral sensitivity by differential opsin gene expression in Lake Malawi Cichlids.Curr Biol 2005,15(19):1734–1739.PubMedView Article
Spady TC, Parry JWL, Robinson PR, Hunt DM, Bowmaker JK, Carleton KL: Evolution of the cichlid visual palette through ontogenetic subfunctionalization of the opsin gene arrays.Mol Biol Evol 2006,23(8):1538–1547.PubMedView Article
Weadick CJ, Loew E, Rodd FH, Chang BSW: Visual pigment molecular evolution in the Trinidadian pike cichlid (Crenicichla frenata): A less colourful world for Neotropical cichlids?Mol Biol Evol 2012,29(10):3045–3060.PubMedView Article
Smith AR, D'Annunzio L, Smith AE, Sharma A, Hofmann CM, Marshall NJ, Carleton KL: Intraspecific cone opsin expression variation in the cichlids of Lake Malawi.Mol Ecol 2011,20(2):299–310.PubMedView Article
O'Quin KE, Smith AR, Sharma A, Carleton KL: New evidence for the role of heterochrony in the repeated evolution of cichlid opsin expression.Evol Dev 2011,13(2):193–203.PubMedView Article
Sabbah S, Laria RL, Gray SM, Hawryshyn CW: Functional diversity in the color vision of cichlid fishes.BMC Biol 2011, 8:133.View Article
Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular evolutionary genetics analysis (MEGA) software version 4.0.Mol Biol Evol 2007,24(8):1596–1599.PubMedView Article
Thompson JD, Higgins DG, Gibson TJ: Clustal-W - Improving the Sensitivity of Progressive Multiple Sequence Alignment through Sequence Weighting, Position-Specific Gap Penalties and Weight Matrix Choice.Nucleic Acids Res 1994,22(22):4673–4680.PubMedView Article
Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O: New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0.Syst Biol 2010,59(3):307–321.PubMedView Article
Yang Z: Maximum-likelihood phylogenetic estimation from DNA-sequences with variable rates over sites - approximate methods.J Mol Evol 1994,39(3):306–314.PubMedView Article
Spady TC, Seehausen O, Loew ER, Jordan RC, Kocher TD, Carleton KL: Adaptive molecular evolution in the opsin genes of rapidly speciating cichlid species.Mol Biol Evol 2005,22(6):1412–1422.PubMedView Article
Koblmuller S, Sefc KM, Sturmbauer C: The Lake Tanganyika cichlid species assemblage: recent advances in molecular phylogenetics.Hydrobiologia 2008, 615:5–20.View Article
Anisimova M, Kosiol C: Investigating protein-coding sequence evolution with probabilistic codon substitution models.Mol Biol Evol 2009,26(2):255–271.PubMedView Article
Yang Z, Bielawski JP: Statistical methods for detecting molecular adaptation.Trends Ecol Evol 2000,15(12):496–503.PubMedView Article
Yang Z: Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution.Mol Biol Evol 1998,15(5):568–573.PubMedView Article
Yang Z: Computational Molecular Evolution. Oxford University Press, Oxford; 2006.View Article
Huelsenbeck JP, Bollback JP: Application of the Likelihood Function in Phylogenetic Analysis. In Handbook of Statistical Genetics. Volume 1. 3rd edition. Edited by: Balding DJ, Bishop M, Cannings C. Wiley, West Sussex; 2007:460–488.View Article
Huelsenbeck JP, Rannala B: Phylogenetic methods come of age: testing hypotheses in an evolutionary context.Science 1997,276(5310):227–232.PubMedView Article
Akaike H: New look at statistical-model identification.IEEE Trans Autom Control 1974,AC19(6):716–723.View Article
Bruen TC, Philippe H, Bryant D: A simple and robust statistical test for detecting the presence of recombination.Genetics 2006,172(4):2665–2681.PubMedView Article
Yang Z, Nielsen R: Synonymous and nonsynonymous rate variation in nuclear genes of mammals.J Mol Evol 1998,46(4):409–418.PubMedView Article
Yang ZH, Wong WSW, Nielsen R: Bayes empirical Bayes inference of amino acid sites under positive selection.Mol Biol Evol 2005,22(4):1107–1118.PubMedView Article
Weadick CJ, Chang BSW: An improved likelihood ratio test for detecting site-specific functional divergence among clades of protein-coding genes.Mol Biol Evol 2012,29(5):1297–1300.PubMedView Article
Yang Z: PAML 4: phylogenetic analysis by maximum likelihood.Mol Biol Evol 2007,24(8):1586–1591.PubMedView Article
Okada T, Sugihara M, Bondar AN, Elstner M, Entel P, Buss V: The retinal conformation and its environment in rhodopsin in light of a new 2.2 angstrom crystal structure.J Mol Biol 2004,342(2):571–583.PubMedView Article
Yang Z, Kumar S, Nei M: A new method of inference of ancestral nucleotide and amino-acid-sequences.Genetics 1995,141(4):1641–1650.PubMed
Whelan S, Goldman N: A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach.Mol Biol Evol 2001,18(5):691–699.PubMedView Article
Casola C, Hahn MW: Gene Conversion Among Paralogs Results in Moderate False Detection of Positive Selection Using Likelihood Methods.J Mol Evol 2009,68(6):679–687.PubMedView Article
Ezawa K, Oota S, Saitou N: Genome-wide search of gene conversions in duplicated genes of mouse and rat.Mol Biol Evol 2006,23(5):927–940.PubMedView Article
O'Quin KE, Smith D, Naseer Z, Schulte J, Engel SD, Loh YHE, Streelman JT, Boore JL, Carleton KL: Divergence in cis-regulatory sequences surrounding the opsin gene arrays of African cichlid fishes.BMC Evol Biol 2011, 11:120.PubMedView Article
Zhao ZM, Hewett-Emmett D, Li WH: Frequent gene conversion between human red and green opsin genes.J Mol Evol 1998,46(4):494–496.PubMedView Article
Hildebrand PW, Scheerer P, Park JH, Choe HW, Piechnick R, Ernst OP, Hofmann KP, Heck M: A ligand channel through the g protein coupled receptor opsin.PLoS One 2009,4(2):e4382.PubMedView Article
Gossmann TI, Schmid KJ: Selection-driven divergence after gene duplication in Arabidopsis thaliana.J Mol Evol 2011,73(3–4):153–165.PubMedView Article
Yokoyama S, Zhang H, Radlwimmer FB, Blow NS: Adaptive evolution of color vision of the common coelacanth (Latimeria chalumnae).Proc Natl Acad Sci U S A 1999,96(11):6279–6284.PubMedView Article
Takenaka N, Yokoyama S: Mechanisms of spectral tuning in the RH2 pigments of Tokay gecko and American chameleon.Gene 2007,399(1):26–32.PubMedView Article
Chinen A, Matsumoto Y, Kawamura S: Reconstitution of ancestral green visual pigments of zebrafish and molecular mechanism of their spectral differentiation.Mol Biol Evol 2005,22(4):1001–1010.PubMedView Article
Piechnick R, Ritter E, Hildebrand PW, Ernst OP, Scheerer P, Hofmann KP, Heck M: Effect of channel mutations on the uptake and release of the retinal ligand in opsin.Proc Natl Acad Sci U S A 2012,109(14):5247–5252.PubMedView Article
Karnik SS, Khorana HG: Assembly of functional rhodopsin requires a disulfide bond between cysteine residues 110 and 187.J Biol Chem 1990,265(29):17520–17524.PubMed
Zhukovsky EA, Oprian DD: Effect of carboxylic acid side chains on the absorption maximum of visual pigments.Science 1989,246(4932):928–930.PubMedView Article
Imai H, Kojima D, Oura T, Tachibanaki S, Terakita A, Shichida Y: Single amino acid residue as a functional determinant of rod and cone visual pigments.Proc Natl Acad Sci U S A 1997,94(6):2322–2326.PubMedView Article
Zhang L, Sports CD, Osawa S, Weiss ER: Rhodopsin phosphorylation sites and their role in arrestin binding.J Biol Chem 1997,272(23):14762–14768.PubMedView Article
Shi W, Osawa S, Dickerson CD, Weiss ER: Rhodopsin Mutants Discriminate Sites Important for the Activation of Rhodopsin Kinase and G(T).J Biol Chem 1995,270(5):2112–2119.PubMedView Article
Fotiadis D, Jastrzebska B, Philippsen A, Muller DJ, Palczewski K, Engel A: Structure of the rhodopsin dimer: a working model for G-protein-coupled receptors.Curr Opin Struct Biol 2006,16(2):252–259.PubMedView Article
Guo W, Shi L, Filizola M, Weinstein H, Javitch JA: Crosstalk in G protein-coupled receptors: changes at the transmembrane homodimer interface determine activation.Proc Natl Acad Sci U S A 2005,102(48):17495–17500.PubMedView Article
Taylor JS, Raes J: Duplication and divergence: the evolution of new genes and old ideas.Annu Rev Genet 2004, 38:615–643.PubMedView Article
Kimura M: The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, U.K.; 1983.View Article
Innan H, Kondrashov F: The evolution of gene duplications: classifying and distinguishing between models.Nat Rev Gen 2010,11(2):97–108.
Temple SE: Why different regions of the retina have different spectral sensitivities: a review of mechanisms and functional signicance of intraretinal variability in spectral sensitivity in vertebrates.Vis Neurosci 2011.
Rastogi S, Liberles DA: Subfunctionalization of duplicated genes as a transition state to neofunctionalization.BMC Evol Biol 2005, 5:28.PubMedView Article
Renoult JP, Courtiol A, Kjellberg F: When assumptions on visual system evolution matter: nestling colouration and parental visual performance in birds.J Evol Biol 2010,23(1):220–225.PubMedView Article
Brichard P: Cichlids and All the Other Fishes of Lake Tanganyika. TFH Publications, Neptune City; 1989.
Imai H, Kefalov V, Sakurai K, Chisaka O, Ueda Y, Onishi A, Morizumi T, Fu YB, Ichikawa K, Nakatani K, et al.: Molecular properties of rhodopsin and rod function.J Biol Chem 2007,282(9):6677–6684.PubMedView Article
Yang Z, dos Reis M: Statistical properties of the branch-site test of positive selection.Mol Biol Evol 2011,28(3):1217–1228.PubMedView Article
Gaucher EA, Gu X, Miyamoto MM, Benner SA: Predicting functional divergence in protein evolution by site-specific rate shifts.Trends Biochem Sci 2002,27(6):315–321.PubMedView Article
Wei RX, Ge S: Evolutionary history and complementary selective relaxation of the duplicated PI genes in grasses.J Integr Plant Biol 2011,53(8):682–693.PubMedView Article
Liu Y, Cotton JA, Shen B, Han X, Rossiter SJ, Zhang S: Convergent sequence evolution between echolocating bats and dolphins.Curr Biol 2010,20(2):R53–54.PubMedView Article
Li Z, Gan X, He S: Distinct evolutionary patterns between two duplicated color vision genes within cyprinid fishes.J Mol Evol 2009,69(4):346–359.PubMedView Article
Hasselmann M, Lechner S, Schulte C, Beye M: Origin of a function by tandem gene duplication limits the evolutionary capability of its sister copy.Proc Natl Acad Sci U S A 2010,107(30):13378–13383.PubMedView Article
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.