Coevolution of amino acid residues in the key photosynthetic enzyme Rubisco
© Wang et al; licensee BioMed Central Ltd. 2011
Received: 15 June 2011
Accepted: 23 September 2011
Published: 23 September 2011
One of the key forces shaping proteins is coevolution of amino acid residues. Knowing which residues coevolve in a particular protein may facilitate our understanding of protein evolution, structure and function, and help to identify substitutions that may lead to desired changes in enzyme kinetics. Rubisco, the most abundant enzyme in biosphere, plays an essential role in the process of carbon fixation through photosynthesis, thus facilitating life on Earth. This makes Rubisco an important model system for studying the dynamics of protein fitness optimization on the evolutionary landscape. In this study we investigated the selective and coevolutionary forces acting on large subunit of land plants Rubisco using Markov models of codon substitution and clustering approaches applied to amino acid substitution histories.
We found that both selection and coevolution shape Rubisco, and that positively selected and coevolving residues have their specifically favored amino acid composition and pairing preference. The mapping of these residues on the known Rubisco tertiary structures showed that the coevolving residues tend to be in closer proximity with each other compared to the background, while positively selected residues tend to be further away from each other. This study also reveals that the residues under positive selection or coevolutionary force are located within functionally important regions and that some residues are targets of both positive selection and coevolution at the same time.
Our results demonstrate that coevolution of residues is common in Rubisco of land plants and that there is an overlap between coevolving and positively selected residues. Knowledge of which Rubisco residues are coevolving and positively selected could be used for further work on structural modeling and identification of substitutions that may be changed in order to improve efficiency of this important enzyme in crops.
KeywordsRubisco coevolution phylogeny positive selection
Coevolution is one of the few paramount forces acting on all levels of biological organization from bioms to nucleotides. Observations of the complementary adaptations in two or more species caused by mutual selection pressures have started from Darwin's (1862) work on orchids and their pollinators and resulted in theoretical generalizations such as 'Red Queen Hypothesis' [1, 2]. More recently concepts and methodologies developed for the study of species coevolution were applied to the growing wealth of molecular data, in particular for detection of coevolution between and within proteins . Identifying coevolving positions in proteins allows better understanding of their structure and function and paves the road to engineering proteins with desired properties. Several computational methods have been proposed to detect coevolving residues from multiple sequence alignments (e.g., [4–8]). Best approaches strive to disentangle patterns created by coevolution and those due to shared ancestry (phylogenetic correlation) and stochasticity (random error). Based on recent comparative evaluation of the state-of-art techniques to detect coevolution , here we use one of the top performing approaches implemented in CoMap  to study the coevolution of residues in a key photosynthetic enzyme Rubisco.
Rubisco (ribulose-1,5-bisphosphate carboxylase/oxygenase, EC 18.104.22.168) is the key enzyme of the Calvin cycle, catalyzing the fixation of inorganic carbon dioxide to organic sugars. Rubisco is a gateway for inorganic carbon, which is present in all light-dependent ecosystems. However, due to the poor turnover rate and competition between O2 and CO2 at the active site, Rubisco is often the rate-limiting step of the photosynthesis . These properties of Rubisco coupled with its high concentration in photosynthesizing organs make it the most abundant enzyme on Earth . Both biospheric importance and intracellular abundance of Rubisco stimulated plenitude of molecular studies using Rubisco as a model system (reviewed in ), but despite significant progress our understanding of Rubisco functioning and evolution is still far from complete .
Land plants and green algae have type IB Rubisco, which is a hexadecamer consisting of eight large, plastid encoded, and eight small, nuclear encoded, subunits. Large subunits which possess the active site of Rubisco are encoded by the chloroplast gene rbcL, which over three decades ago was among the first fully sequenced genes  and since that time became one of the most often sequenced genes thanks to its wide use in phylogenetics of plants and algae (e.g. ). Plant systematists have mainly used rbcL paying little attention to its function. However, recently positive selection acting on rbcL was found in the most lineages of land plants . The mapping of the positively selected residues on Rubisco tertiary structure revealed that they are located in regions important for dimer-dimer, intradimer, large subunit-small subunit and Rubisco-Rubisco activase interactions, and that some of the positively selected residues are close to the active site . Positive selection on Rubisco is in concert with well-known variation in Rubisco kinetics found in different species (e.g. ) and its correlation with environmental parameters (e.g. ). Positive selection has been shown as a driving force for kinetic differences in Rubiscos of C3 and C4 plants [19, 20].
Coevolutionary studies have been applied to a few important proteins and provided new information about protein-protein interactions, ligand-receptor bindings, and the 3D protein structure [8, 21–23]. Here we study the coevolution and positive selection on Rubisco large subunit using 142 data sets of the rbcL gene representing the main lineages of land plants (for detailed description see ). Our aim is to provide a better insight into the patterns of groups of non-independent sites and positively selected sites as well as to find their amino acid composition, pairing preference, and spatial distribution.
Results and discussion
About half of Rubisco residues coevolve
Known interactions of the inferred coevolving residues
15, 63, 64, 106, 109, 121, 126, 128, 129, 131, 176, 180, 205, 207, 208, 209, 211, 271, 297, 408, 413, 461
34, 105, 142**, 143, 146, 147, 162, 164, 216, 249, 285, 286
Small Subunit (SSU)
76, 163, 166, 223, 226, 227, 229, 230, 260, 261*, 397, 433, 453, 454
DD and ID
SSU and DD
219, 258, 288
SSU and ID
Presumably, sites coevolving for certain biochemical properties (charge, volume, polarity and Grantham distance) may have different amino acid composition preferences. Thus, we studied how amino acid frequency was different in residues that coevolve compared to the whole sequences. All coevolving sites had higher proportion of A, C, E, F, K, V, T compared with whole sequences (Figure 2 and Additional file 1). Among these residues, A, C, F, V are hydrophobic and E, K, T are hydrophilic. Amino acids W, Q, M and I were underrepresented in all coevolving sites. Proportion of A, D, K, L, T and V were significantly higher in coevolving sites detected by polarity (Figure 1a). Proportion of D, F, H, K, P were significantly higher in coevolving sites detected by Grantham (Figure 1b). Proportion of K, D and G were significantly higher in coevolving sites detected by charge (Figure 1c) compared with whole sequences. Frequencies of D, F, K, L V, Y were significantly higher in sites coevolving for volume (Figure 1d). Thus, our results demonstrate that frequencies of certain amino acids at coevolving sites are significantly different from the amino acid composition found in the whole sequence.
The amino acid composition of sites under positive selection is different from that of other sites
Fourteen of the most often positively selected residues of the Rubisco large subunit
Location of residues
Residues within 9 Å4
374, 375, 376, 396, 399, 400, 401, 402, 407, 410, 411, 445, 446, 447, 448, 449,451, 452, 453, 454
ID, SSU, DD
154, 155, 184, 187, 189, 219, 220, 221, 223, 224, 226, 227
DD, SSU, ID
209, 210, 213, 217,247, 248, 249, 250, 252, 253
DD, ID, SSU
142**, 143, 144, 146, 147, 148, 149, 150, 281, 282, 314, 315, 316, 364, 365, 366
140,141, 143, 144, 145, 272, 276, 311, 312, 313, 314, 315, 364, 365
23, 25, 26, 27, 54, 70, 71, 72, 73, 74, 93, 94, 96. 97. 98. 99
23, 86**, 326*, 332
413, 414, 417, 435, 436, 437, 438, 440, 441, 442, 443, 444, 446, 447
33*, 152, 151, 153, 135*, 281*, 310*, 440*, 470*
180, 181, 182, 184, 185, 186, 215, 216, 217, 218, 220, 221, 222, 223, 224, 225,227
121, 388, 423
DD, SSU, ID
143, 144, 152, 249, 253, 254, 274, 285, 286, 277, 278, 280, 281
319, 320, 321, 322, 323, 324, 326, 327, 329, 330, 332, 333, 462, 464
373, 374, 376, 377, 378, 379, 393, 396, 410, 411, 414, 436, 446, 449, 450, 453
AS, ID, SSU
190, 228, 229, 230, 231, 248, 253, 254, 256, 257, 280, 281, 282, 283, 315, 316
101, 86**, 167, 149*, 169*, 256*, 320*, 371*, 398
26, 27, 29, 30, 76, 91, 94, 128, 129, 130, 131, 132
19, 355*, 93*
33, 34, 35, 36, 37, 39, 41, 81, 84, 85, 87, 88
149*, 256*, 167, 169*, 222*, 317*, 320*, 371*, 398, 23, 30*
The coevolving and positively selected sites are preferably located in helices
The protein secondary structure elements helix, strand and coil have different physical and chemical properties, thus play distinct roles in the protein tertiary structure and function. So the evolutionary force may vary among different secondary structures. In Drosophila proteins the coil regions are more likely to be under positive selection than expected, while the helices and strands undergo less positive selection .
The secondary structure of the large subunits of Rubisco is conserved throughout land plants, despite the variation in primary sequences . The helix parts are usually amphipathic with one side hydrophobic and the other side hydrophilic, thus the structured regions can occur anywhere in the protein and involve the largest proportion of residues in Rubisco large subunit. The strands often contain hydrophobic residues and could form a well-structured parallel or anti-parallel beta sheet. The active site of Rubisco is located at the carboxy-terminal end of the beta strand . The coil is the most flexible element without ordered structure and assists the conformational change of the protein. Loop 6 of Rubisco large subunit is conserved in land plants and green algae. It is crucial for the catalytic process because it controls the opened or closed state of the enzyme, which influences the association of the substrate . It was shown that residues in mobile regions of the protein tend to evolve in highly correlated fashion, participating in physical and functional contacts during their motion .
Overall, this shows that evolutionary forces are unevenly distributed on the large subunit of Rubisco, with the helical parts of the structure more frequently affected by coevolution and positive selection compared to other parts. Interestingly, our results differ from ones obtained for Drosophila proteins where less than expected selection was found in helices .
Coevolving residues are closer in 3D structure
In order to compare the distribution of physical distances between the coevolving residues and all the residues in the LSU of Rubisco, we used four known 3D structures of spinach and tobacco from PDB both in activated and non-activated states. For each PDB record, distances between the center masses of any two residues in the protein 3D structure were calculated. The coevolving residues were mapped onto the PDB structures, and all the pair-wise combinations of the coevolving sites within a group were listed. The corresponding distances of all the coevolving pairs were collected. The minimum pair-wise distances between residues for both the activated and unactivated state of each species were calculated and the smallest value was chosen for further comparisons. It is said that two residues are in physical contact, if the distance between them is under a certain threshold. In some studies, they use the distance between two beta carbons (Cβ) or two alpha carbons (Cα) of the corresponding residues, with the direct contact threshold of 8Å [22, 23]. However, this method only considers one point of the residue, so that the possible position conformations of the other part of the molecule are neglected. In this study, distances between the center mass of the residues were calculated, thus the residue molecule was considered as a whole.
Physical distances between residue pairs
Coevolving sites distance
Positively selected sites distance
Positively selected site/active site distance
Next we analyzed physical proximity of the 14 residues most frequently found under positive selection in the LSU. Interestingly, these positively selected sites showed an opposite trend, as they were significantly further from each other than the background (Table 3), again showing different pattern from the Drosophila proteins . The distances between active sites and sites under positive selection tended to be shorter compared with the background, although not significantly (possibly due to small samples).
The functionally and structurally important sites in the protein are usually more conserved than other sites. But in some cases, one mutation at a crucial site may be compensated by mutations at other sites, so to maintain vital interactions and functions . Our study shows that the coevolving and positively selected sites tend to be located within the functionally and structurally important regions of Rubisco. Substitutions have to be compatible with the protein function and be structurally stable. Therefore, the amino acid composition and the residues pairing preference of the sites under coevolution and positive selection can provide a better insight in terms of protein evolution. Our molecular evolutionary analysis reveals that different evolutionary forces may have distinct amino acid composition and pairing preferences. The coevolving residue composition may not be too different from that of the background because they are wide spread across the sequence, while the positive selection is quite different. Moreover, the groups of non-independent residues have their pairing preference. Based on the amino acid pairing frequency matrix with different biochemical properties, the distinct patterns of coevolving pairs could provide a hint for the further analysis about the mutual information.
Our study indicates that coevolving sites are in closer proximity in the tertiary structure of the Rubisco large subunit. Predicting protein tertiary structure from the primary sequence is a crucial problem in computational biology. The physical interactions between coevolving residues could help to build a residue contact map in protein tertiary structure analysis. Moreover, many residues which coevolve or under positive selection are found in the functionally or structurally important locations, such as dimer-dimer, intradimer, active site and small subunit interactions. Our results appear to be in agreement with the study of Yeang and Haussler , who proposed that in large protein families coevolving positions are spatially coupled and many of the coevolving positions are located at the functionally or structurally important positions. Furthermore, we find that many sites are both under positive selection and coevolution, suggesting that selection towards a new optimum may require more than one substitution. Indeed, multiple neutral changes along the mutational landscape of a protein may precede mutations with high advantageous fitness effect .
Because of the importance of Rubisco, it has been the target of genetic engineering for a long time. Aspects including structure, function and evolution of this enzyme have been studied with the aim to improve its kinetics. Nowadays, the experimental way of random mutagenesis and bioselection could be used to identify mutations that influence important properties of Rubisco [13, 28–30], but the vast amount of candidates and the repetitive lab work make the process slow, unpredictable and tedious. Knowledge of location of coevolving or positively selected residues may be used to design future mutagenesis experiments and accelerate efforts to engineer better Rubisco, which would potentially increase the yield of agriculturally important crops.
1. Sequence data and phylogeny estimation
We used 142 rbcL data sets from . The sequences were assigned to each data set according to their phylogenetic relations. Each data set had 11 to 40 sequences. The codon alignments were constructed from DNA sequences by back-translating from amino acid alignments. Sequences within each data set were truncated to the same length. Angiosperms, gymnosperms, ferns, and mosses were represented by 122, 8, 9 and 4 datasets, respectively.
For each alignment a phylogeny was reconstructed using maximum likelihood as implemented in PhyML v3.0 [31, 32]. During the inference we used amino acid models WAG  and LG , both with Γ-rate variation. The tree space was traversed using the combination of NNI and SPR heuristics . The inferred phylogenies were used for further coevolution and positive selection analyses.
2. Coevolution Analysis
To detect coevolving residues we used a clustering approach that searches for ancestral co-substitutions or for compensatory changes by correlating amino acid substitution histories, as implemented in the R-program CoMap [6, 35]. Substitution numbers for each branch were sampled from a posterior distribution based on a Markov substitution model and a phylogeny with branch lengths relating the sequences. Parameters of the models and tree branches were estimated by maximum likelihood prior to the sampling. Substitutions were weighted by different biochemical properties (charge, polarity, volume and Grantham) to detect coevolutionary trends specific to amino acid properties . The amount of the biochemical change for one site was represented by weighted substitution vectors, containing weighted substitution counts for each branch of a phylogeny. The correlated or compensatory evolution was estimated based on the correlation coefficient of the substitution vectors. To select candidate groups, the complete linkage hierarchical clustering was applied to distance matrices based on the correlation and compensation statistics . To asses the significance of inferred clusters, the parametric bootstrap with 1000 replicates was used to generate the joint null distribution of minimum site variability together with coevolution or compensation statistic ρ, as described in . From such empirical distribution, p-values for each candidate cluster with observed statistic ρobs were computed as Pr(ρ > ρobs | Nmin), i.e., by conditioning on minimum site variability Nmin of the cluster. Clusters with p-value ≤ 0.05 were considered as evolving non-independently. A simulation procedure described in  was used to correct for multiple non-independent tests.
3. Positive Selection Analyses
Positive selection on the protein level was measured using the ω ratio, which is the ratio of nonsynonymous to synonymous substitution rates per site. Negative selection results in lower nonsynonymous rate relative to synonymous and so ω < 1. If the nonsynonymous mutations are favored, the nonsynonymous rate should be higher than synonymous, and so ω > 1 indicates evolution by positive selection. Here we estimated selective pressure on rbcL using different types of Markov models of codon evolution: (1) site-specific codon models that allow variation of selective pressure among sites in a sequence, and (2) switching codon models that allow variation of selective pressure among sites and over the evolutionary time. All branch lengths of inferred phylogenies were re-optimized under codon models.
3.1. Detecting selection with site-specific codon models
Site-specific codon models were used to test each alignment for positive selection using likelihood ratio tests (LRTs) of nested models: M0 (one ratio) vs M3 (discrete), M1a (neutral) vs M2a (selection), and M7 (beta) vs M8 (beta & ω) . These analyses were performed using the codeml program of the PAML package [37, 38]. If the LRT for positive selection was significant, Bayes naïve empirical Bayesian approach was used to infer sites under positive selection . Sites that were inferred to be under positive selection using model M8, if site's posterior probability for the positive selection class was ≥ 0.95.
3.2. Detecting selective episodes with switching codon models
We also used switching Markov modulated codon models to detect episodes of positive selection during the evolution of Rubisco, as implemented in the program FitModel . Each switching model used in our study allows three possible selective regimes for codon sites, for example like site model M2a with classes for positive (ω > 1), neutral (ω = 1), and negative selection (ω < 1), or like model M2 with 3 classes with no constraints on the ω ratio. Unlike site model, switching models allow each codon site to change the selective regime, and thus be affected by different selective pressures at different time points. This is accomplished by using an additional Markov process to describe the switches between selection regimes at any individual site. We used models both with bias (+S2) and with no bias (+S1) for switching between selective regimes. For each alignment we used LRTs to test whether switches of selective pressure over time occurred (M2a vs M2a+S1 and M3 vs M3+S1), and for alignments with significant evidence for switches we also tested whether there was switching bias (M2a+S1 vs M2a+S2 and M3+S1 vs M3+S2). Sites with episodes under positive selection were detected a posteriori using the Bayesian approach .
4. Mapping sites on 3D Structure
The analyses of location, properties and the distance analysis of residues in the protein structure were performed using VMD viewer , a program for visualization, manipulation and analysis of large molecules in three dimensions. Command options were used to extract information about sets of molecules, vectors and coordinates. The center mass of a molecule was computed using the Tcl language of VMD.
M.W. was supported by the ETH Zurich scholarship for international master students. M.V.K. was supported by Natural Environment Research Council UK. M.A. was supported by ETH Zurich and was also receiving funding from the Swiss National Science Foundation (research grant 31003A_127325).
- Van Valen L: The red queen. The American Naturalist. 1977, 111 (980): 12-View Article
- Van Valen L: A new evolutionary law. Evolutionary Theory. 1973, 1 (1): 30-
- Pazos F, Valencia A: Protein co-evolution, co-adaptation and interactions. EMBO J. 2008, 27 (20): 2648-2655. 10.1038/emboj.2008.189.View ArticlePubMedPubMed Central
- Gloor GB, Martin LC, Wahl LM, Dunn SD: Mutual information in protein multiple sequence alignments reveals two classes of coevolving positions. Biochemistry. 2005, 44 (19): 7156-7165. 10.1021/bi050293e.View ArticlePubMed
- Fares MA, Travers SA: A novel method for detecting intramolecular coevolution: adding a further dimension to selective constraints analyses. Genetics. 2006, 173 (1): 9-23. 10.1534/genetics.105.053249.View ArticlePubMedPubMed Central
- Dutheil J, Galtier N: Detecting groups of coevolving positions in a molecule: a clustering approach. BMC Evol Biol. 2007, 7: 242-10.1186/1471-2148-7-242.View ArticlePubMedPubMed Central
- Dutheil J: Detecting site-specific biochemical constraints through substitution mapping. J Mol Evol. 2008, 67 (3): 257-265. 10.1007/s00239-008-9139-8.View ArticlePubMed
- Yeang CH, Haussler D: Detecting coevolution in and among protein domains. PLoS Comput Biol. 2007, 3 (11): e211-10.1371/journal.pcbi.0030211.View ArticlePubMedPubMed Central
- Dutheil JY: Detecting coevolving positions in a molecule: Why, when and how to account for phylogeny?. Briefings in Bioinformatics. 2011
- Andersson I, Backlund A: Structure and function of Rubisco. Plant Physiol Biochem. 2008, 46 (3): 275-291. 10.1016/j.plaphy.2008.01.001.View ArticlePubMed
- Ellis RJ: The most abundant protein in the world. Trends in Biochemical Sciences. 1979, 4 (11): 4-View Article
- Portis AR, Parry MA: Discoveries in Rubisco (Ribulose 1,5-bisphosphate carboxylase/oxygenase): a historical perspective. Photosynth Res. 2007, 94 (1): 121-143. 10.1007/s11120-007-9225-6.View ArticlePubMed
- Spreitzer RJ, Salvucci ME: Rubisco: structure, regulatory interactions, and possibilities for a better enzyme. Annu Rev Plant Biol. 2002, 53: 449-475. 10.1146/annurev.arplant.53.100301.135233.View ArticlePubMed
- McIntosh L, Poulsen C, Bogorad L: Chloroplast gene sequence for the large subunit of ribulose bisphosphatecarboxylase of maize. Nature. 1980, 288: 5-View Article
- Group TAP: An ordinal classification for the families of flowering plants. Annals of the Missouri Botanical Garden. 1998, 85: 3-
- Kapralov MV, Filatov DA: Widespread positive selection in the photosynthetic Rubisco enzyme. BMC Evol Biol. 2007, 7: 73-10.1186/1471-2148-7-73.View ArticlePubMedPubMed Central
- Jordan DB, Ogren WL: Species variation in the specificity of ribulose biphosphate carboxylase/oxygenase. Nature. 1981, 291: 3-View Article
- Galmés J, Flexas J, Keys AJ, Cifre J, Mitchell RAC: Rubisco specificity factor tends to be larger in plant species from drier habitats and in species with persistent leaves. Plant, Cell and Environment. 2005, 28: 9-View Article
- Christin PA, Salamin N, Muasya AM, Roalson EH, Russier F, Besnard G: Evolutionary switch and genetic convergence on rbcL following the evolution of C4 photosynthesis. Mol Biol Evol. 2008, 25 (11): 2361-2368. 10.1093/molbev/msn178.View ArticlePubMed
- Kapralov MV, Kubien DS, Andersson I, Filatov DA: Changes in Rubisco kinetics during the evolution of C4 photosynthesis in Flaveria (Asteraceae) are associated with positive selection on genes encoding the enzyme. Mol Biol Evol. 2011, 28 (4): 1491-1503. 10.1093/molbev/msq335.View ArticlePubMed
- Shindyalov IN, Kolchanov NA, Sander C: Can three-dimensional contacts in protein structures be predicted by analysis of correlated mutations?. Protein Eng. 1994, 7 (3): 349-358. 10.1093/protein/7.3.349.View ArticlePubMed
- Saraf MC, Moore GL, Maranas CD: Using multiple sequence correlation analysis to characterize functionally important protein regions. Protein Eng. 2003, 16 (6): 397-406. 10.1093/protein/gzg053.View ArticlePubMed
- Wang ZO, Pollock DD: Coevolutionary patterns in cytochrome c oxidase subunit I depend on structural and functional context. J Mol Evol. 2007, 65 (5): 485-495. 10.1007/s00239-007-9018-8.View ArticlePubMed
- Glaser F, Steinberg DM, Vakser IA, Ben-Tal N: Residue frequencies and pairing preferences at protein-protein interfaces. Proteins. 2001, 43 (2): 89-102. 10.1002/1097-0134(20010501)43:2<89::AID-PROT1021>3.0.CO;2-H.View ArticlePubMed
- Anisimova M, Kosiol C: Investigating protein-coding sequence evolution with probabilistic codon substitution models. Mol Biol Evol. 2009, 26 (2): 255-271. 10.1093/molbev/msn232.View ArticlePubMed
- Ridout KE, Dixon CJ, Filatov DA: Positive selection differs between protein secondary structure elements in Drosophila . Genome Biol Evol. 2010, 2: 166-179. 10.1093/gbe/evq008.View ArticlePubMedPubMed Central
- Wagner A: Neutralism and selectionism: a network-based reconciliation. Nat Rev Genet. 2008, 9 (12): 965-974.View ArticlePubMed
- Smith SA, Tabita FR: Positive and negative selection of mutant forms of prokaryotic (cyanobacterial) ribulose-1,5-bisphosphate carboxylase/oxygenase. J Mol Biol. 2003, 331 (3): 557-569. 10.1016/S0022-2836(03)00786-1.View ArticlePubMed
- Mueller-Cajar O, Morell M, Whitney SM: Directed evolution of Rubisco in Escherichia coli reveals a specificity-determining hydrogen bond in the form II enzyme. Biochemistry. 2007, 46 (49): 14067-14074. 10.1021/bi700820a.View ArticlePubMed
- Whitney SM, Houtz RL, Alonso H: Advancing our understanding and capacity to engineer nature's CO2-sequestering enzyme, Rubisco. Plant Physiol. 2011, 155 (1): 27-35. 10.1104/pp.110.164814.View ArticlePubMedPubMed Central
- Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 2003, 52 (5): 696-704. 10.1080/10635150390235520.View ArticlePubMed
- Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O: New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 2010, 59 (3): 307-321. 10.1093/sysbio/syq010.View ArticlePubMed
- Whelan S, Goldman N: A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol Biol Evol. 2001, 18 (5): 691-699.View ArticlePubMed
- Le SQ, Gascuel O: An improved general amino acid replacement matrix. Mol Biol Evol. 2008, 25 (7): 1307-1320. 10.1093/molbev/msn067.View ArticlePubMed
- Dutheil J, Pupko T, Jean-Marie A, Galtier N: A model-based approach for detecting coevolving positions in a molecule. Mol Biol Evol. 2005, 22 (9): 1919-1928. 10.1093/molbev/msi183.View ArticlePubMed
- Yang Z, Nielsen R, Goldman N, Pedersen AM: Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics. 2000, 155 (1): 431-449.PubMedPubMed Central
- Yang Z: PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 1997, 13 (5): 555-556.PubMed
- Yang Z: PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007, 24 (8): 1586-1591. 10.1093/molbev/msm088.View ArticlePubMed
- Yang Z, Wong WS, Nielsen R: Bayes empirical bayes inference of amino acid sites under positive selection. Mol Biol Evol. 2005, 22 (4): 1107-1118. 10.1093/molbev/msi097.View ArticlePubMed
- Guindon S, Rodrigo AG, Dyer KA, Huelsenbeck JP: Modeling the site-specific variation of selection patterns along lineages. Proc Natl Acad Sci USA. 2004, 101 (35): 12957-12962. 10.1073/pnas.0402177101.View ArticlePubMedPubMed Central
- Humphrey W, Dalke A, Schulten K: VMD: visual molecular dynamics. J Mol Graph. 1996, 14 (1): 33-38. 10.1016/0263-7855(96)00018-5. 27-38View ArticlePubMed
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.