The structurally constrained protein evolution model accounts for sequence patterns of the LβH superfamily
© Parisi and Echave; licensee BioMed Central Ltd. 2004
Received: 23 December 2003
Accepted: 22 October 2004
Published: 22 October 2004
Structure conservation constrains evolutionary sequence divergence, resulting in observable sequence patterns. Most current models of protein evolution do not take structure into account explicitly, being unsuitable for investigating the effects of structure conservation on sequence divergence. To this end, we recently developed the Structurally Constrained Protein Evolution (SCPE) model. The model starts with the coding sequence of a protein with known three-dimensional structure. At each evolutionary time-step of an SCPE simulation, a trial sequence is generated by introducing a random point mutation in the current coding DNA sequence. Then, a "score" for the trial sequence is calculated and the mutation is accepted only if its score is under a given cutoff, λ. The SCPE score measures the distance between the trial sequence and a given reference sequence, given the structure. In our first brief report we used a "global score", in which the same reference sequence, the ancestral one, was used at each evolutionary step. Here, we introduce a new scoring function, the "local score", in which the sequence accepted at the previous evolutionary time-step is used as the reference. We assess the model on the UDP-N-acetylglucosamine acyltransferase (LPXA) family, as in our previous report, and we extend this study to all other members of the left-handed parallel beta helix fold (LβH) superfamily whose structure has been determined.
We studied site-dependent entropies, amino acid probability distributions, and substitution matrices predicted by SCPE and compared with experimental data for several members of the LβH superfamily. We also evaluated structure conservation during simulations. Overall, SCPE outperforms JTT in the description of sequence patterns observed in structurally constrained sites. Maximum Likelihood calculations show that the local-score and global-score SCPE substitution matrices obtained for LPXA outperform the JTT model for the LPXA family and for the structurally constrained sites of class i of other members within the LβH superfamily.
We extended the SCPE model by introducing a new scoring function, the local score. We performed a thorough assessment of the SCPE model on the LPXA family and extended it to all other members of known structure of the LβH superfamily.
Protein structure is more conserved than protein sequence during molecular evolution [1–3]. Remote homologous proteins constitute an extreme example of sequence divergence where proteins with similar function and no apparent sequence similarity present almost the same fold . However, protein sequences are far from being random. Rather, they are selected through evolution in such a way that functional constraints modulate sequence variability. Usually, only a few residues are directly related to the protein function. However, these residues must maintain adequate spatial relationships for the protein to remain functional, so that the whole 3D structure is conserved. In turn, structure conservation constrains sequence variability in such a way that residue substitution does not disturb the overall 3D structure of the protein. This results in emergent non-random sequence patterns.
The restrictions imposed by the environment of a given protein site onto its pattern of amino acid substitutions have been largely discussed [2, 5–9]. Briefly, highly constrained positions are more conserved. Furthermore, each site has a biased composition related to its structural environment. Models of protein evolution that take this into account outperform other, simpler, models [10–13]. Recently, a number of models of protein evolution have been developed that take explicit account of protein structure, stability, and/or foldability [14–22]. Even though such models have not been used yet for phylogenetic inference purposes, they are useful to gain insight into the detailed mechanism of protein evolution. Noteworthy, some of these models have been able to reproduce quantitatively observed amino acid substitution patterns [12, 14, 23].
To study how protein structure conservation modulates sequence divergence, we recently developed the Structurally Constrained Protein Evolution (SCPE) model . The starting point of an SCPE simulation is the coding-sequence of a protein of known three-dimensional structure, which we shall call the "ancestral sequence". At each evolutionary time-step, a new "trial sequence" is generated by random mutation, at DNA level, of the "current sequence" (accepted at the previous time-step). Then, the trial DNA is translated using the universal genetic code and a "score" that estimates the protein structure perturbation introduced by the mutation is evaluated. The trial sequence is accepted, becoming the new current sequence, only if its score is below a certain "cut-off", λ, that measures the amount of structural perturbation allowed by natural selection. In this way, for λ = 0 only synonymous mutations are accepted, whereas for λ ~ ∞ all mutations are accepted. The procedure is repeated until a desired number of mutations are reached. In the present work the DNA is mutated using the Jukes-Cantor model, so that each nucleotide substitution occurs with the same probability.
The model depends on one parameter, the cut-off λ, that must be fit by comparison to actual sequence data. Different properties could be used to fit the cut-off. As we will show below, the model is quite robust with respect to the property used. Therefore, we have used the simplest way, which is to fit λ such that the acceptance rate, ω, inferred for actual sequences is reproduced. The acceptance rate is the probability that an amino acid mutation is accepted. Thus, it can be estimated by the ratio between the number of amino acid substitutions (accepted mutations) and the total number of trial amino acid mutations. The acceptance rate has been extensively used to characterize the strength of the selective pressure under which proteins evolve [24–27]. If all mutations were neutral they would be accepted and ω would be 1. In general the ω values are usually below 0.5 due to the deleterious effects of most amino acid mutations . In proteins under very strong selective pressure ω can take values very close to zero.
One of the main factors determining the quality of the SCPE model is the scoring function. Given the structure of the ancestral protein, which we assume constant throughout the simulation, the score of a given trial sequence is defined as the RMSD between the mean-field energy profile of the trial sequence and that of a reference sequence. In our previous work, the same reference sequence, the ancestral one, was used for each time-step. Therefore, the score of each trial sequence was a measure of the dissimilarity between the trial sequence and the ancestral sequence, given the structure. Such a score depends only on the trial sequence and the ancestral sequence, but not on the particular sequence mutated to obtain the trial. Hence, it does not depend on the precise evolutionary path between the ancestral and the trial. Therefore, this will be called from now on "global score".
Even though the global score has been proved to be very good at reproducing the sequence patterns of a test case, it also shows some problems. Mainly, at the beginning of a simulation most mutations fall below the optimum cut-off. This results in too high values of the acceptance rate. Only after about 5% of the sites have been substituted, the cut-off is purifying enough to reproduce the acceptance rate inferred for the actual family. From a more qualitative point of view, since at the beginning of global-score simulations almost all mutations are accepted, erroneous amino acids, which are not found in the natural sequences of the family studied, can be introduced with relatively high probability during the first few steps of a global-score simulation. We shall see below that these are unwanted artefacts of the global-score SCPE simulations.
To tackle the problems described in the previous paragraph, in this paper we introduce a "local score", in which the reference sequence for a given trial is that accepted in the previous evolutionary time step, the current sequence, rather than the ancestral one. Thus, the local score measures the mutational perturbation introduced in a given time-step, rather than the global difference between the trial and ancestral sequences.
We shall show below that when the local score is used, the acceptance rate averaged over independent runs does not depend on the amount of divergence from the ancestral sequence. Furthermore, no erroneous amino acids are accepted during the simulations. Thus, these artefacts of the global-score simulations are absent when the new scheme is used. To further compare both schemes, other properties were analysed. Specifically, we evaluated and compared structure conservation, entropy profiles, amino acid distributions, and substitution matrices. We show that SCPE simulations that use the LPXA from E. coli as ancestral sequence can be used to estimate site-dependent amino acid substitution matrices [32, 33] which outperform the usually used JTT model . Moreover, we consider the applicability of the SCPE substitution matrices obtained from LPXA simulations to other protein families which adopt the LβH fold.
Results and discussion
Determination of optimal λ
Using the method yn00+w+f, which best estimates the calculated ω, we obtained the ω of the reference alignment of 25 sequences homologous to the UDP-N-acetylglucosamine acyltransferase from Escherichia coli (LPXA reference alignment). The average ω for this alignment is 0.22. Using this value in Figure 3a and 3b the optimal values of λ obtained are 1.10 and 7.00 for local and global score, respectively.
We note here that the optimal λ values for local and global score are very different. Thus, for the sake of comparison, we take advantage of the one-to-one relationship between λ and ω, shown in Figures 3a and 3b, and use the calculated acceptance rate ω instead of λ as model parameter. In Figure 3c we plot the inferred ω versus the calculated ω for local-score and global-score simulations. Using this plot and the inferred ω value for the LPXA reference alignment, ω = 0.22 ± 0.11 (0.11 is the standard deviation of ω), we calculate an optimal ω of 0.15 (0.12–0.27) for local score and 0.19 (0.12–0.26) for global score.
Assessment of structure conservation
Evaluation of structure conservation. The table shows the percentage of output sequences that recognize correctly the LβH fold for local-score SCPE, global-score SCPE, and JTT for two different amounts of amino acid substitutions per site (Ka).
Amount of Divergence
Ka = 0.28
Ka = 1.7
Local-score SCPE λ = 1.10, ω = 0.15
Local-score SCPE λ = 8.00, ω = 0.92
Global-score SCPE λ = 7.00, ω = 0.19
Global-score SCPE λ = 90.00, ω = 0.95
When both SCPE schemes are compared, Table 1 shows that local-score simulations perform better than global-score ones. This result is counterintuitive, because one might expect, a priori, that in the long term the global-score would be better at conserving structure than the local-score, since in the later case the reference sequence is reset at each step so that it would be easier to lose memory of the ancestral protein. One of the reasons of the global-score SCPE being worse at conserving structure could be the erroneous amino acid substitutions introduced at the beginning of the simulations (see above). To gain more insight into this issue, further work involving much longer simulations would be needed. However, for long enough evolutionary time it is not longer reasonable to assume that structure remains constant. In this limit, any model based on assuming structural conservation will break down.
A more detailed analysis shows that the maximum of the local-score plot corresponds to a ω = 0.12, that is in good agreement with the optimum cut-off determined from the acceptance rates, as explained previously. In contrast, for the global-score case the cut-off at the maximum of the similarity score plot is significantly below the optimum ω value previously obtained. This difference would be due to the wrong behaviour of the global-score scheme for small amounts of divergence (see Figure 2), which will affect the SCPE substitution pattern and, therefore, the amino acid probability distributions. The same behaviour, though less marked, is found in the plots of Figure 5.
Finally, it is interesting to note that the similarity score for ω = 0 is much better than JTT. Since ω = 0 corresponds to a simulation where no nonsynonymous substitutions are accepted, this is the score obtained using just the initial sequence. Memory of this sequence might favour the good agreement observed for SCPE. However, it is noteworthy that the actual agreement increases for ω > 0, showing that the good fit is not due exclusively to a memory effect. The substitution matrix assessment described in the next section should be less sensitive to memory effects.
Even though it has long been recognized that substitution patterns are site-specific and depend on protein family, it is in general very difficult to estimate site-specific and family-specific substitution matrices due to a lack-of-data problem. As we reported previously, a possible strategy to overcome this obstacle is to obtain site-specific substitution matrices from SCPE simulations . To further evaluate how the SCPE model is able to reproduce the substitution pattern of the LPXA family, a maximum likelihood analysis was used. SCPE runs were used to obtain a substitution matrix Q c for each structural class. Then, these matrices were used to calculate the maximum likelihood of the LPXA reference alignment using a given topology (see Methods).
Figure 7a shows that for LPXA, local-score simulations lead to better substitution matrices than global-score ones. Inspection of Figure 7b reveals that this is mainly due to the local-score SCPE giving better results for sites i+4 and, to a lesser degree, i+2. Figure 7a also reveals that both, local and global, SCPE models outperform JTT (dotted line of Figure 7a) for almost the whole ω range studied. This is due to the fact that site-specific amino acid substitution patterns, especially for constrained structural classes i and i+4, are not well described by general models such as JTT.
Other LβH families
LβH superfamily members studied.
Gene name or synonym
Number of sequences aligned
Streptogramin A Acetyltransferase
Comparison of models on 7 families of the LβH superfamily. Logarithm of the Maximum Likelihood per site obtained with different models for the families studied. Better models lead to larger ML values. The three numbers reported for each case correspond to structural classes i and i+4, considered separately, and to the average over the six structural classes.
For LPXA, SCPE (both local and global) are clearly better than JTT for all sites considered. CAT and SATA behave similarly, though the advantage of SCPE over JTT is less marked. For other families, SCPE (local and global) is better than JTT for class i sites. For other structural classes there is no definite advantage of SCPE over JTT.
When comparing local-score SCPE with global-score SCPE one finds no definite advantage of either one over the other. For sites i, where the more meaningful results are expected, local and global give very similar results for all families except for LACA where global is better than local.
We presented in full detail the Structurally Constrained Protein Evolution Model (SCPE), developed recently. We improved on our previous model by introducing a new scoring function. Our previous work was based on a "global" score, which measures how a trial sequence differs from the ancestral sequence in its ability to fit a reference structure assumed constant. In contrast, the "local" score measures the perturbation introduced by a given mutation with respect to the previously accepted sequence, rather than the ancestral one.
Both schemes, global and local, were compared in their ability to match the substitution patterns of the protein family LPXA. We performed a thorough assessment comparing structure conservation, entropy profiles, amino acid distributions, and substitution matrices. LβH proteins were found to be particularly suited for such a detailed characterization of the sequence pattern, because of the fact that most of their sites belong to one of only six different structural classes. Furthermore, these properties were studied as a function of the single parameter of the model: a cutoff that measures selection pressure against structural divergence. Finally, we applied the model to all other members of the LβH superfamily whose structure is known, extending previous studies performed only on the LPXA family.
In general, we found that the local-score SCPE behaves either similarly or better than the global-score scheme, depending on the property considered. Furthermore, for LPXA, and for sites of the structurally constrained class i of all other families studied, both SCPE models clearly outperform the widely used JTT model, showing the power of the SCPE model to account for substitution patterns conditioned by structural constraints.
Currently, we are using the SCPE model to investigate several issues important in protein evolution, such as overdispersion of the molecular clock, correlation between the evolution of different sites, and heterotachy. Also, we are testing the applicability of the SCPE model to other protein families, in order to assess its generality. Nevertheless, we should mention that since most protein families do not display the regularity of LβH proteins, it is more difficult to perform a detailed quantification of sequence patterns, which makes such tests at the same time more difficult and less demanding than the LβH superfamily.
The LPXA family belongs to a large and diverse group of proteins , the LβH (Left-handed parallel β Helix) superfamily. All the sequences of this superfamily contain an imperfect tandem-repetition of a hexapeptide motif . This motif is typically described by [LIVMA]-X3- [ASCVTN]-X. The first position of the hexapeptide is called i, and the following i+1, i+2, up to i+5. The sequence forms a left-handed parallel β helix, forming an equilateral triangular prism  (Figure 1a). Each coil of the helix is formed by three hexapeptides. Equivalent positions of different hexapeptides fall into similar structural environments. Residues at positions i and i+4, for example, point towards the inside of the β helix (Figure 1b). Thus, each site of the hexapeptide pattern corresponds to a different structural class. In this study we did not analyse sites that are at loop regions. Also, the first and last coils of the β helix of LPXA were not considered, since the structural environments of sites in these coils are not exactly the same as those of the other coils. Although all the LβH members have a homo-trimeric active form, we only use the monomer form in this study. We also analyse other LβH families, which are summarized, with a brief description, in Table 2.
The first step in the calculation of the SCPE score is the calculation of a profile of mean energies per position. In the present case we used the Cβ-Cβ potential of the program PROSA II . The original coordinates of the ancestral sequence were modified in order to provide with Cβ coordinates to those residues without them. Thus, all the GLY residues were substituted for ALA residues and an adequate rotamer was chosen using the program SCWRL . Later, the substituted ALA residues were converted back to the original GLY, keeping the Cβ coordinate of ALA to use when a GLY mutates to a residue with Cβ. Once the energy per position is obtained the score is calculated using:
where N is the length of the protein sequence, E mut (p) is the mean-field energy of position p in the trial (mutated) sequence and E ref (p) is the corresponding value of the reference sequence. The "global score" is calculated using the ancestral sequence as reference. The "local score" is calculated using the sequence accepted in the previous step in the simulation (i.e. the sequence that is mutated to obtain the trial).
The ancestral sequence was the UDP-N-acetylglucosamine acyltransferase (LPXA) from Escherichia coli. The coordinates were obtained from the PDB database  (ID code 1lxa). The cutoff range covered was 0–2.00 with a step of 0.1 for local score and 0–20 with a step of 1.00 for global score. For each cutoff value we performed 300 independent simulations, each one of 2500 mutational steps.
Using the LPXA from Escherichia coli as the reference protein, we recovered 25 homologous sequences using sequence similarity searches. This set constitutes the reference LPXA family. For each of the other members of the LβH superfamily for which at least one member has known structure, we used this member's sequence to characterize putative homologous proteins. See Table 2 for details. All the similarity searches were performed using the program BLASTP  at the NCBI server and the sequence alignments were obtained using Clustal X .
Estimation of acceptance rates
To assess the optimal selective pressure in our SCPE simulations, we inferred the mean ω value in the homologous LPXA family. Also, we inferred the ω in our SCPE simulations for different cut-offs. All the ω inferences were made using the program yn00 from PAML . We used options "w", which applies a weighting scheme between codons, and "f", which takes into account the codon frequencies of the data.
In the SCPE simulations, we also estimated ω directly by counting: ω is the ratio between the number of amino acid substitutions (accepted mutations) and the total number of amino acid mutation trials. We use "calculated", as opposed to "inferred" to designate the acceptance rates obtained in this way.
Estimation of the amount of divergence
Some of the comparisons performed depend on the amount of divergence. For these cases, we estimated the average divergence of the LPXA family using the program PAML. Maximum likelihood distances were estimated using the JTT model with the frequencies estimated from the data and a gamma distribution with 8 categories to estimate the relative rates (JTT+F+Γ). The average time calculated was Ka = 0.28 amino acid substitutions per site.
Assessment of structure conservation
We evaluated whether sequences produced by evolutionary simulations using SCPE recognize the correct structure using THREADER 3 . We considered the following schemes: local-score SCPE with λ = 1.10 (ω = 0.15); local-score SCPE with λ = 8.00 (ω = 0.92); global-score SCPE with λ = 7.00 (ω = 0.19); global-score SCPE with λ = 90.00 (ω = 0.95). To compare, we also ran simulations using JTT. For each model, we performed 50 independent runs of lengths Ka = 0.28 and Ka = 1.7 amino acid substitutions per site. For each sequence, structure recognition using THREADER 3 was performed. The ability of models to conserve structure was measured by the percentage of sequences which recognized correctly (Z-score > 2.7) the LβH fold.
Site-specific replacement matrices are obtained straightforwardly by "counting" substitutions in SCPE simulations. For the test system considered, sites can be classified into c = 1,2,...6 site classes. Then, for each class we set up a matrix of counts: for i ≠ j, is half the number of mutational steps that result in either i → j or j → i amino-acid replacements at site class c, and is the number of mutational steps for which amino acid i remains constant (i → i replacement). Then, for each class, the matrix of substitution rates, Q c , is obtained using:
Given the rate matrices, Q c , the probability matrices are obtained using
P c = exp(t Q c )
The vector of amino acid equilibrium frequencies of class c is, then, obtained with
Since there are some substitutions that do not occur during the simulations (very low probabilities), we have found it convenient to re-calculate each Q c using a pseudocounts procedure similar to that developed by Tatusov  as follows
Entropies and amino acid distributions
To study the sequence variability profile, we calculated the entropy for each structural class using:
For SCPE, we used the equilibrium probabilities obtained from the substitution matrices, as described in the previous section. For the reference alignment, we grouped all columns of the same structural class together, counted the number of times each amino acid occurred in each class, and obtained the corresponding amino acid frequencies.
To assess the similarity between the equilibrium SCPE amino acid distributions and those obtained from the reference alignment, we used the similarity score based on information theory proposed by Yona and Levitt . The score is calculated by adding together the similarity scores of the six structural classes.
JTT distributions and entropies
The equilibrium SCPE distributions and their corresponding entropies were compared with JTT distributions and entropies. In contrast to SCPE, the equilibrium JTT distribution does not depend on structural class. Therefore, instead of the equilibrium distributions, we chose to use the distributions and entropies from the alignment of sequences obtained from simulations with the JTT model. To this end, we performed 100 independent simulations using the JTT substitution matrix. The simulation length was set to the average number of substitutions obtained for the LPXA family (Ka = 0.28). We aligned the 100 output sequences, grouped all columns of the same structural class together, counted the number of times each amino acid occurred in each class, and obtained the corresponding amino acid frequencies.
Maximum likelihood calculations
In order to assess the SCPE substitution patterns, we performed Maximum Likelihood (ML) calculations using the site-dependent SCPE substitution matrices, Q c . The maximum likelihood of a model, Q, given the data, s, for topology, T, is obtained by maximizing the probability L = Pr(s|T, Q).
For the SCPE model, the reference alignment was partitioned into 6 sub-alignments corresponding to the 6 structural classes. Using these sub-alignments and the corresponding SCPE Q c matrices, we calculated the maximum likelihood using PAML. In all cases a gamma distribution was used to take into consideration the rate heterogeneity among sites of the same class. Similarly, we performed ML calculations using the JTT substitution matrix with gamma distribution of rates (JTT+Γ), for each of the six structural classes. The ML values obtained for each class were added together to obtain the total ML, as was done with the SCPE models.
It has been shown that as long as the tree topology is reasonable, model comparison is robust with respect to variations in topology . In the present case, topologies were obtained using the program FITCH  of PHYLIP 3.57c  with ML distances obtained using JTT with PAML.
All the models compared here have the same number of parameters. Therefore, models were compared by comparing ML values. One should note, however, that when models with different number of parameters are compared, one should use a statistic that takes explicit account the number of parameters of each model [42, 43].
GP and JE developed the mathematical model. GP implemented the model, run the simulations, performed the analysis and wrote the first draft. JE edited and wrote the revised versions. All authors read and approved the final manuscript.
We thank Jeff Thorne and an anonymous reviewer for their useful comments. This work was supported by the Universidad Nacional de Quilmes, the Fundación Antorchas, and the Agencia Nacional de Promoción Científica, Tecnológica y de Innovación.
- Lesk AM, Chothia C: How different amino acid sequences determine similar protein structures: the structure and evolutionary dynamics of the globins. J Mol Biol. 1980, 136: 225-270. 10.1016/0022-2836(80)90373-3.View ArticlePubMedGoogle Scholar
- Bajaj M, Blundell T: Evolution and the tertiary structure of proteins. Annu Rev Biophys Bioeng. 1984, 13: 453-492. 10.1146/annurev.bb.13.060184.002321.View ArticlePubMedGoogle Scholar
- Chothia C, Lesk AM: The relation between the divergence of sequence and structure in proteins. EMBO J. 1986, 5: 823-826.PubMed CentralPubMedGoogle Scholar
- Russell RB, Saqi MA, Sayle RA, Bates PA, Sternberg MJ: Recognition of analogous and homologous protein folds: analysis of sequence and structure conservation. J Mol Biol. 1997, 269: 423-439. 10.1006/jmbi.1997.1019.View ArticlePubMedGoogle Scholar
- Kimura M, Ota T: On some principles governing molecular evolution. Proc Natl Acad Sci USA. 1974, 71: 2848-2852.PubMed CentralView ArticlePubMedGoogle Scholar
- Ponder JW, Richards FM: Tertiary templates for proteins. Use of packing criteria in the enumeration of allowed sequences for different structural classes. J Mol Biol. 1987, 193: 775-791.View ArticlePubMedGoogle Scholar
- Overington J, Johnson MS, Sali A, Blundell TL: Tertiary structural constraints on protein evolutionary diversity: templates, key residues and structure prediction. Proc R Soc Lond B Biol Sci. 1990, 241: 132-145.View ArticleGoogle Scholar
- Lim WA, Sauer RT: The role of internal packing interactions in determining the structure and stability of a protein. J Mol Biol. 1991, 219: 359-376.View ArticlePubMedGoogle Scholar
- Luthy R, McLachlan AD, Eisenberg D: Secondary structure-based profiles: use of structure-conserving scoring tables in searching protein sequence databases for structural similarities. Proteins. 1991, 10: 229-239.View ArticlePubMedGoogle Scholar
- Thorne JL, Goldman N, Jones DT: Combining protein evolution and secondary structure. Mol Biol Evol. 1996, 13: 666-673.View ArticlePubMedGoogle Scholar
- Lió P, Goldman N: Using protein structural information in evolutionary inference: transmembrane proteins. Mol Biol Evol. 1999, 16: 1696-1710.View ArticlePubMedGoogle Scholar
- Fornasari MS, Parisi G, Echave J: Site-specific amino acid replacement matrices from structurally constrained protein evolution simulations. Mol Biol Evol. 2002, 19: 352-356.View ArticlePubMedGoogle Scholar
- Robinson D, Jones D, Kishino H, Goldman N, Thorne J: Protein evolution with dependence among codons due to tertiary structure. Mol Biol Evol. 2003, 20: 1692-1704. 10.1093/molbev/msg184.View ArticlePubMedGoogle Scholar
- Parisi G, Echave J: Structural constraints and emergence of sequence patterns in protein evolution. Mol Biol Evol. 2001, 18: 750-756.View ArticlePubMedGoogle Scholar
- Govindarajan S, Goldstein RA: Evolution of model proteins on a foldability landscape. Proteins. 1997, 29: 461-466. 10.1002/(SICI)1097-0134(199712)29:4<461::AID-PROT6>3.0.CO;2-B.View ArticlePubMedGoogle Scholar
- Dokholyan NV, Shakhnovich EI: Understanding hierarchical protein evolution from first principles. J Mol Biol. 2001, 312: 289-307. 10.1006/jmbi.2001.4949.View ArticlePubMedGoogle Scholar
- Bastolla U, Porto M, Eduardo Roman M, Vendruscolo M: Connectivity of neutral networks, overdispersion, and structural conservation in protein evolution. J Mol Evol. 2003, 56: 243-254. 10.1007/s00239-002-2350-0.View ArticlePubMedGoogle Scholar
- Bastolla U, Porto M, Roman H, Vendruscolo M: Lack of self-averaging in neutral evolution of proteins. Phys Rev Lett. 2002, 89: 208101-1-10.1103/PhysRevLett.89.208101.View ArticleGoogle Scholar
- Bastolla U, Roman H, Vendruscolo M: Neutral evolution of model proteins: diffusion in sequence space and overdispersion. J Theor Biol. 1999, 200: 49-64. 10.1006/jtbi.1999.0975.View ArticlePubMedGoogle Scholar
- Babajide A, Farber R, Hofacker I, Inman J, Lapedes A, Stadler P: Exploring protein sequence space using knowledge-based potentials. J Theor Biol. 2001, 212: 35-46. 10.1006/jtbi.2001.2343.View ArticlePubMedGoogle Scholar
- Babajide A, Hofacker I, Sippl M, Stadler P: Neutral networks in protein space: a computational study based on knowledge-based potentials of mean force. Fold Des. 1997, 2: 261-269.View ArticlePubMedGoogle Scholar
- Bornberg-Bauer E, Chan H: Modeling evolutionary landscapes: mutational stability, topology, and superfunnels in sequence space. Proc Natl Acad Sci USA. 1999, 96: 10689-10694. 10.1073/pnas.96.19.10689.PubMed CentralView ArticlePubMedGoogle Scholar
- Pei J, Dokholyan NV, Shakhnovich EI, Grishin NV: Using protein design for homology detection and active site searches. Proc Natl Acad Sci USA. 2003, 100: 11361-11366. 10.1073/pnas.2034878100.PubMed CentralView ArticlePubMedGoogle Scholar
- Kimura M: The neutral theory of molecular evolution. Cambdige: Cambridge University Press. 1983View ArticleGoogle Scholar
- Gillespie JH: The causes of molecular evolution. 1991, Oxford: Oxford University PressGoogle Scholar
- Ohta T: The nearly neutral theory of molecular evolution. Annual Review of Ecology and Systematics. 1992, 23: 263-286. 10.1146/annurev.es.23.110192.001403.View ArticleGoogle Scholar
- Li WH: Molecular Evolution. 1997, Massachusetts: Sinauer AssociatesGoogle Scholar
- Patthy L: Protein Evolution. 1999, Oxford: Blackwell Science LtdGoogle Scholar
- Vaara M: Eight bacterial proteins, including UDP-N-acetylglucosamine acyltransferase (LPXA) and three other transferases of Escherichia coli, consist of a six-residue periodicity theme. FEMS Microbiol Lett. 1992, 76: 249-254. 10.1016/0378-1097(92)90344-N.View ArticlePubMedGoogle Scholar
- Vuorio R, Harkonen T, Tolvanen M, Vaara M: The novel hexapeptide motif found in the acyltransferases LPXA and LpxD of lipid A biosynthesis is conserved in various bacteria. FEBS Lett. 1994, 337: 289-292. 10.1016/0014-5793(94)80211-4.View ArticlePubMedGoogle Scholar
- Parisi G, Fornasari M, Echave J: Evolutionary analysis of gamma-carbonic anhydrase and structurally related proteins. Mol Phylogenet Evol. 2000, 14: 323-334. 10.1006/mpev.1999.0734.View ArticlePubMedGoogle Scholar
- Thorne JL: Models of protein sequence evolution and their applications. Curr Opin Genet Dev. 2000, 10: 602-605. 10.1016/S0959-437X(00)00142-8.View ArticlePubMedGoogle Scholar
- Whelan S, Lió P, Goldman N: Molecular phylogenetics: state-of-the-art methods for looking into the past. Trends Genet. 2001, 17: 262-272. 10.1016/S0168-9525(01)02272-7.View ArticlePubMedGoogle Scholar
- Jones DT, Taylor WR, Thornton JM: The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci. 1992, 8: 275-282.PubMedGoogle Scholar
- Yang Z: Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J Mol Evol. 1994, 39: 306-314. 10.1007/BF00178256.View ArticlePubMedGoogle Scholar
- Atchley WR, Terhalle W, Dress A: Positional dependence, cliques, and predictive motifs in the bHLH protein domain. J Mol Evol. 1999, 48: 501-516.View ArticlePubMedGoogle Scholar
- Mirny LA, Shakhnovich EI: Universally conserved positions in protein folds: reading evolutionary signals about stability, folding kinetics and function. J Mol Biol. 1999, 291: 177-196. 10.1006/jmbi.1999.2911.View ArticlePubMedGoogle Scholar
- Larson SM, Ruczinski I, Davidson AR, Baker D, Plaxco KW: Residues participating in the protein folding nucleus do not exhibit preferential evolutionary conservation. J Mol Biol. 2002, 316: 225-233. 10.1006/jmbi.2001.5344.View ArticlePubMedGoogle Scholar
- Dokholyan NV, Shakhnovich EI: Understanding hierarchical protein evolution from first principles. J Mol Biol. 2001, 312: 289-307. 10.1006/jmbi.2001.4949.View ArticlePubMedGoogle Scholar
- Voigt CA, Mayo SL, Arnold FH, Wang ZG: Computational method to reduce the search space for directed protein evolution. Proc Natl Acad Sci U S A. 2001, 98: 3778-3783. 10.1073/pnas.051614498.PubMed CentralView ArticlePubMedGoogle Scholar
- Yona G, Levitt M: Within the twilight zone: a sensitive profile-profile comparison tool based on information theory. J Mol Biol. 2002, 315: 1257-1275. 10.1006/jmbi.2001.5293.View ArticlePubMedGoogle Scholar
- Schwarz G: Estimating the dimension of a model. Ann Stat. 1974, 6: 461-464.View ArticleGoogle Scholar
- Posada D: The effect of branch length variation on the selection of models of molecular evolution. J Mol Evol. 2001, 52: 434-444.PubMedGoogle Scholar
- Raetz CR, Roderick SL: A left-handed parallel beta helix in the structure of UDP-N-acetylglucosamine acyltransferase. Science. 1995, 270: 997-1000.View ArticlePubMedGoogle Scholar
- Sippl MJ: Calculation of conformational ensembles from potentials of mean force. An approach to the knowledge-based prediction of local structures in globular proteins. J Mol Biol. 1990, 213: 859-883.View ArticlePubMedGoogle Scholar
- Dunbrack RL, Karplus M: Backbone-dependent rotamer library for proteins. Application to side- chain prediction. J Mol Biol. 1993, 230: 543-574. 10.1006/jmbi.1993.1170.View ArticlePubMedGoogle Scholar
- Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE: The Protein Data Bank. Nucleic Acids Res. 2000, 28: 235-242. 10.1093/nar/28.1.235.PubMed CentralView ArticlePubMedGoogle Scholar
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.PubMed CentralView ArticlePubMedGoogle Scholar
- Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG: The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 1997, 25: 4876-4882. 10.1093/nar/25.24.4876.PubMed CentralView ArticlePubMedGoogle Scholar
- Yang Z: PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 1997, 13: 555-556.PubMedGoogle Scholar
- Jones DT, Taylor WR, Thornton JM: A new approach to protein fold recognition. Nature. 1992, 358: 86-89. 10.1038/358086a0.View ArticlePubMedGoogle Scholar
- Tatusov RL, Altschul SF, Koonin EV: Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks. Proc Natl Acad Sci U S A. 1994, 91: 12091-12095.PubMed CentralView ArticlePubMedGoogle Scholar
- Fitch WM, Margoliash E: Construction of phylogenetic trees. Science. 1974, 155: 279-284.View ArticleGoogle Scholar
- Felsenstein J: PHYLIP (Phylogeny Inference Package) version 3.5c. Distributed by the author. Department of Genetics, Univ. of Washington, Seatle. 1993Google Scholar
This article is published under license to BioMed Central Ltd. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.