Genome-wide acceleration of protein evolution in flies (Diptera)
© Savard et al; licensee BioMed Central Ltd. 2006
Received: 05 July 2005
Accepted: 25 January 2006
Published: 25 January 2006
The rate of molecular evolution varies widely between proteins, both within and among lineages. To what extent is this variation influenced by genome-wide, lineage-specific effects? To answer this question, we assess the rate variation between insect lineages for a large number of orthologous genes.
When compared to the beetle Tribolium castaneum, we find that the stem lineage of flies and mosquitoes (Diptera) has experienced on average a 3-fold increase in the rate of evolution. Pairwise gene comparisons between Drosophila and Tribolium show a high correlation between evolutionary rates of orthologous proteins.
Gene-specific divergence rates remain roughly constant over long evolutionary times, modulated by genome-wide, lineage-specific effects. Among the insects analysed so far, it appears that the Tribolium genes show the lowest rates of divergence. This has the practical consequence that homology searches for human genes yield significantly better matches in Tribolium than in Drosophila. We therefore suggest that Tribolium is better suited for comparisons between phyla than the widely employed dipterans.
Understanding the causes of rate variation in protein evolution is central to many fields, including molecular evolution, comparative genomics and structural biology. A widely accepted principle is that more important proteins evolve more slowly. However, evolutionary rate variation exists not only between different proteins, but also between lineages. This was already observed in the earliest comparative studies, both within [3–5] and across animal phyla.
Following the initial observation that Drosophilids are fast evolving, it was shown that the rate of synonymous substitutions in Drosophila melanogaster is approximately two times higher than in rodents and ten times higher than in primates. However, the estimated acceleration seemed to depend on the types of proteins examined, with another study reporting only a 3-fold difference between Drosophila and mammalian rates. A study of the relative rates of ribosomal RNA evolution in insect lineages showed that there was an episodic substitution rate increase of about 20-fold in the stem lineage of dipterans, suggesting that high evolutionary rates may be characteristic of the whole dipteran order. However, it is currently unclear whether the observed rate accelerations encompassed the whole nuclear genome, or whether they were restricted to certain classes of genes.
In this study, we assess the genome-wide variation of evolutionary rates between insect lineages, by comparing the beetle Tribolium castaneum, and the dipterans Drosophila melanogaster and Anopheles gambiae. We test (i) whether there is an acceleration of protein evolution in dipterans that is observable on a genome-wide scale, (ii) whether this acceleration is confined to an episodic burst of changes at the base of the dipteran lineage, and (iii) whether this acceleration affects all genes to a similar extent.
Genomic rate estimates confirm an acceleration in dipterans
Table 1. Relative evolutionary rates of Tribolium castaneum, and the dipterans Anopheles gambiae and Drosophila melanogaster. Amino acid distances, divergence time estimates, and substitution rates for the holometabolous insect branches and the base of Diptera are shown.
Branch (from node..to node) | Distance [aa subs/site] | Time [My] | Rate [10⁻³ aa subs/site/My] | Relative rate
[label missing] | 0.180 ± 0.003 | 280.0 ± 4.4 | 0.643 ± 0.015 | [missing]
Anopheles – base (3..4) | 0.222 ± 0.003 | 280.0 ± 4.4 | 0.793 ± 0.016 | 1.233 ± 0.038
Drosophila – base (2..4) | 0.245 ± 0.003 | 280.0 ± 4.4 | 0.875 ± 0.017 | 1.361 ± 0.041
[label missing] | 0.145 ± 0.002 | 241.0 ± 4.0 | 0.602 ± 0.013 | 0.936 ± 0.030
[label missing] | 0.168 ± 0.002 | 241.0 ± 4.0 | 0.697 ± 0.014 | 1.084 ± 0.033
Base of Diptera (4..5) | 0.077 ± 0.002 | 39.0 ± 5.9 | 1.974 ± 0.303 | 3.071 ± 0.477
In order to assign absolute rates of evolution, we combined the sequence distances in Figure 1 with absolute dates obtained from palaeontological estimates. Based on the oldest coleopteran fossil and the timing of the primary radiation of holometabolous insect orders, the divergence of dipterans (Drosophila and Anopheles) from coleopterans (Tribolium) has been dated to the early Permian (the Artinskian; about 284.4-275.6 Mya). The divergence between the Brachycera (Drosophila) and the Nematocera (Anopheles) lineages has been dated to the Middle Triassic (the Anisian; about 245.0-237.0 Mya). Combining these dates with the maximum-likelihood estimates of branch lengths, we obtained the absolute evolutionary rates listed in Table 1. This shows that the 39 My time interval starting with the separation of coleopterans and dipterans and culminating in the radiation of Diptera was characterised by an episodic increase of evolutionary rate, averaging 3.07 times the mean rate found for the Tribolium lineage (95% CI: 2.39–4.34).
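The arithmetic behind the absolute and relative rates can be retraced with a short sketch. It assumes that each rate is simply the branch distance divided by the elapsed time, and that the unlabelled first row of Table 1 (distance 0.180, 280 My) is the Tribolium reference branch, as the text implies; the function name and dictionary keys are illustrative only.

```python
# Sketch: absolute rate = distance / time, relative rate = rate / reference.
# Assumption: the first (unlabelled) row of Table 1 is the Tribolium branch.

def rate_per_my(distance, time_my):
    """Amino acid substitutions per site per million years."""
    return distance / time_my

# (distance [aa subs/site], time [My]) taken from Table 1
branches = {
    "Tribolium (label assumed)": (0.180, 280.0),
    "Drosophila - base": (0.245, 280.0),
    "Base of Diptera": (0.077, 39.0),
}

reference = rate_per_my(*branches["Tribolium (label assumed)"])

for name, (distance, time_my) in branches.items():
    rate = rate_per_my(distance, time_my)
    print(f"{name}: {rate * 1e3:.3f} x 10^-3 subs/site/My, "
          f"relative rate {rate / reference:.3f}")
```

Under these assumptions the point estimates in Table 1 are recovered; the confidence intervals are not, because they depend on the uncertainty of both the distances and the dates.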
Deviations from clock-like evolution
Table 2. Maximum likelihood estimates of different molecular clock models for concatenated sequences. The models are compared to the model without a clock via likelihood ratio tests.
Clock model | P
[label missing] | 2 × 10⁻⁶⁵
Base of Diptera | 1 × 10⁻³⁷
Drosophila and Anopheles | 2 × 10⁻²⁷
Tribolium and Drosophila | [missing]
Tribolium and Anopheles | [missing]
Tribolium and Diptera | 5 × 10⁻¹³
Tribolium and base of Diptera | 5 × 10⁻¹³
Tribolium and tips of Diptera | 5 × 10⁻¹³
Further comparison of increasingly complex models of evolution shows that only models involving separate local clocks for the Tribolium and Drosophila lineages fit the data as well as the model without a clock assumption (P = 0.1, Table 2). All other combinations of local clocks between the Tribolium, Drosophila and Anopheles lineages were found to be inferior to the model without the clock (P < 0.01, Table 2 and data not shown). Thus, although the accelerated evolution of dipterans was most pronounced during a burst of changes that occurred at the base of the order (Table 1), a sustained increase is also detected in the branches separating Drosophila from Anopheles. Remarkably, the need for a coleopteran local clock also demonstrates that the Tribolium lineage has generally evolved much more slowly than most of the other taxa considered here.
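As an illustration of how such a comparison works, a likelihood ratio test between a clock model (the constrained, null model) and the model without a clock can be sketched as follows. This is a generic sketch, not the procedure implemented in PAML: the log-likelihoods and degrees of freedom passed in are placeholders, and the chi-square tail probability is computed with the standard library only.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed forms for df = 1 and df = 2, then the recurrence
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0 - 1e-9:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

def lrt_p_value(lnl_null, lnl_alt, df):
    """P-value for twice the log-likelihood difference of nested models."""
    statistic = 2.0 * (lnl_alt - lnl_null)
    return chi2_sf(statistic, df)

# Hypothetical values: the clock model is 3 log-likelihood units worse
# and has 2 fewer free parameters than the model without a clock.
print(lrt_p_value(lnl_null=-1000.0, lnl_alt=-997.0, df=2))
```

A small P-value means the clock model fits significantly worse than the model without a clock, so the clock is rejected.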
It should be noted that Table 2 contains multiple statistical comparisons, thereby decreasing the overall specificity of the statistical test. A conservative strategy to control for this is to divide the P-value cutoff for significance by the number of comparisons (Bonferroni correction), i.e., to use P0 = 0.05/8 ≈ 0.0063, or, if accounting for all comparisons in data not shown, P0 = 0.05/41 ≈ 0.0012. This correction does not affect our conclusions.
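The corrected cutoffs can be checked with a few lines. The sketch below uses the P-values that survive in Table 2 plus the P = 0.1 quoted in the text; the function name is illustrative.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance cutoff under Bonferroni correction."""
    return alpha / n_comparisons

# P-values listed in Table 2, plus P = 0.1 quoted in the text
p_values = [2e-65, 1e-37, 2e-27, 5e-13, 0.1]

for n in (8, 41):
    cutoff = bonferroni_threshold(0.05, n)
    rejected = [p for p in p_values if p < cutoff]
    print(f"n = {n}: cutoff = {cutoff:.4f}, clock models rejected: {len(rejected)}")
```

With either cutoff the same clock models are rejected, which is why the correction leaves the conclusions unchanged.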
The dipteran acceleration affects the majority of individual genes
When calculated from single gene distances, the mean evolutionary rate of the Drosophila lineage is significantly higher than that of the Tribolium lineage (0.284 > 0.245; two-tailed Mann-Whitney U-test, P < 10⁻³). The corresponding constant of proportionality is 1.31, in close agreement with the relative rate increase of 1.36 obtained from the concatenated sequences (Table 1). Thus, although evolutionary rates among genes of a genome can vary by several orders of magnitude, these rates are nevertheless correlated between species even after more than 280 My of independent evolution.
Using 439 nuclear transcripts, we find that the dipteran lineage (represented by Drosophila and Anopheles) has experienced an episodic increase in evolutionary rate when compared to the coleopteran lineage (represented by Tribolium). This rate subsequently dropped in the diversifying dipteran lineage, but remained above average in Drosophila. This is consistent with previous findings from studies of ribosomal RNAs [9, 17]. The rate increases found in our genome-scale analysis of protein sequences are lower than those reported for rRNAs, with an average increase of only 1.3-fold during dipteran evolution versus an at least threefold average increase for rRNAs. However, our analysis is necessarily biased towards parts of genes that are sufficiently conserved to be detected in the outgroup (human) and that could be unequivocally aligned. Still, Figure 2 demonstrates that the averaged rate increase is representative for a large range of genes.
The Neutral Theory of Evolution [1, 18, 19] predicts that the accumulation of molecular changes is driven only by the mutation rate and the degree of purifying selection. Accordingly, genome-wide variation in the rate of protein evolution between species might be caused by physiological and ecological factors that affect the mutation rate (e.g. metabolic rate [20, 21], temperature), or by differences in the efficiency of selection (notably variation in effective population size [22–24] or the level of outbreeding). In the case of the rRNA comparisons, we could indeed show that there has been a mutational bias towards incorporating more A/T nucleotides than G/C nucleotides at the base of the Diptera, which has probably caused the episodic acceleration of evolutionary rates. Such a mutational bias is likely to have affected the whole genome. However, the GC content of the Tribolium genes in this study (46.0%) is very close to that of the Drosophila orthologs (47.1%), suggesting that mutational biases are unlikely to be responsible for the continued acceleration in dipterans. This interpretation is supported by the fact that it is the Tribolium proteins which show a slight bias towards AT-rich amino acids when compared to the Drosophila orthologs (28.1% AT-rich amino acids for Tribolium, compared to 27.2% for Drosophila).
We observe a strong correlation between evolutionary rates in the Tribolium and the Drosophila lineages (Figure 2). This indicates that the selective forces acting on the majority of these genes have been similar between the two lineages, consistent with a neutral model. The correlation would be expected to become weaker or even disappear if positive selection occurred episodically among these genes. Our finding that most proteins diverge in a clock-like fashion, with clock speeds that differ among lineages only due to genome-wide effects, suggests that episodic changes are an exception, although such changes are known to occur in some cases [26, 27]. However, we note that the class of fast evolving genes could not be analysed here, as they diverge too fast to allow identification as orthologs in distant comparisons. In a dedicated study of such genes between closely related Drosophila lineages, we found that there are indeed a significant number of genes that must have undergone episodic changes in substitution rates on a gene by gene basis. Thus, we emphasize that the above conclusions relate only to relatively conserved (or non-orphan) genes, while the generalized evolutionary patterns of fast evolving (or orphan) genes will need further study.
We report here the analysis of evolutionary rate variation of a large number of orthologous genes between insect lineages. Variation in the rate of evolution has genome-wide effects, and is correlated between orthologous genes over very long evolutionary time scales.
Because Drosophila melanogaster is among the best studied animal model organisms, the peculiar evolutionary pattern confirmed here has practical implications. In BLAST searches of human sequences against Tribolium and Drosophila, the Tribolium sequence is more similar to the human sequence than the Drosophila sequence is in 70% of cases (N = 1221, P = 10⁻⁵⁰ from sign test). Thus, when attempting to link human genes to their Drosophila homologs, data from Tribolium will be helpful to provide a more conservative reference sequence. This approach has been used, e.g., to resolve the evolutionary relationship between the Drosophila zen gene and human HOX3 genes. The slowly evolving beetle Tribolium castaneum is thus likely to play a significant role for comparisons between phyla.
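The logic of the sign test can be sketched with an exact two-sided binomial calculation. The count of 855 cases favouring Tribolium is an assumption (70% of 1221, rounded), not a figure taken from the data, so the resulting value need not coincide with the P-value reported above; the function names are illustrative.

```python
import math

def log_binom_pmf_half(k, n):
    """Log probability of exactly k successes under Binomial(n, 0.5)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1)
            - math.lgamma(n - k + 1) - n * math.log(2.0))

def sign_test_two_sided(k, n):
    """Exact two-sided sign test P-value for k wins among n untied pairs."""
    extreme = max(k, n - k)
    tail = sum(math.exp(log_binom_pmf_half(i, n))
               for i in range(extreme, n + 1))
    return min(1.0, 2.0 * tail)

# Assumed count: about 70% of the 1221 comparisons favour Tribolium.
n_pairs = 1221
tribolium_wins = 855
print(sign_test_two_sided(tribolium_wins, n_pairs))
```

Under the null hypothesis that either species is equally likely to give the better match, an excess of this size is extremely unlikely.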
Drosophila melanogaster, Anopheles gambiae and Homo sapiens peptides were obtained from Ensembl. For Tribolium castaneum, EST data available through NCBI dbEST were assembled into contigs using phrap and manually curated to ensure high quality of the data set. For the pea aphid Acyrthosiphon pisum, a publicly available EST-based unigene set was obtained from the INFOBIOGEN GnpSeq database. Tribolium and Acyrthosiphon pisum contigs were then searched against all Drosophila melanogaster proteins using BLASTx. The reading frame from the best hit was assumed to be the correct reading frame. We then chose the longest run of peptides uninterrupted by a stop codon as the peptide corresponding to each EST contig.
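The last step can be illustrated with a minimal sketch. It assumes the contig has already been translated in the reading frame of the best BLASTx hit, with stop codons written as "*"; the function name and the example sequence are hypothetical.

```python
def longest_open_run(translated, stop="*"):
    """Longest stretch of residues between stop codons in a translated frame.

    If several stretches share the maximum length, the first one is returned.
    """
    return max(translated.split(stop), key=len)

# Hypothetical translation of one EST contig in the chosen reading frame
frame = "MKT*AVLLGSRQP*QP"
print(longest_open_run(frame))
```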
Identification of orthologs
We performed BLASTp searches of all proteome pairs among Tribolium castaneum, Drosophila melanogaster, Anopheles gambiae, Acyrthosiphon pisum and Homo sapiens. Orthologs were selected based on reciprocal best BLAST hits using an E-value cut-off of 10⁻¹⁰. A group of sequences with exactly one member in each species was accepted as an orthologous family if each sequence had each of the other family sequences as the best BLASTp hit in the respective proteome. This requirement of all-against-all reciprocal best hits is very stringent, and thus gives good confidence in the inferred orthology.
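The all-against-all criterion can be sketched as follows. This is an illustration under stated assumptions, not the original pipeline: it assumes the BLASTp output has already been filtered by E-value and reduced to one best hit per query, stored in a hypothetical dictionary keyed by (query species, target species).

```python
from itertools import permutations

def orthologous_families(best_hit, species):
    """Families with one member per species and mutual best hits throughout.

    best_hit[(a, b)][gene] is the best hit of `gene` (species a) in the
    proteome of species b, after E-value filtering.
    """
    seed_species = species[0]
    seeds = set()
    for other in species[1:]:
        seeds.update(best_hit.get((seed_species, other), {}))

    families = []
    for seed in sorted(seeds):
        # Candidate family: the seed plus its best hit in every other species
        family = {seed_species: seed}
        for other in species[1:]:
            hit = best_hit.get((seed_species, other), {}).get(seed)
            if hit is None:
                break
            family[other] = hit
        if len(family) != len(species):
            continue
        # Accept only if every member has every other member as best hit
        if all(best_hit.get((a, b), {}).get(family[a]) == family[b]
               for a, b in permutations(species, 2)):
            families.append(family)
    return families

# Toy example with three hypothetical proteomes A, B and C
toy_hits = {
    ("A", "B"): {"a1": "b1", "a2": "b2"},
    ("A", "C"): {"a1": "c1", "a2": "c2"},
    ("B", "A"): {"b1": "a1", "b2": "a2"},
    ("B", "C"): {"b1": "c1", "b2": "c3"},  # b2 prefers c3: family 2 fails
    ("C", "A"): {"c1": "a1", "c2": "a2"},
    ("C", "B"): {"c1": "b1", "c2": "b2"},
}
print(orthologous_families(toy_hits, ["A", "B", "C"]))
```

In the toy data only the first family passes, because a single non-reciprocal pair is enough to reject a candidate family.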
Alignments and distance estimation
Multiple sequence alignments were performed with MUSCLE using default settings. Resulting alignments were purged of putatively misaligned positions as well as gap positions with Gblocks using default settings.
Branch lengths were calculated with the maximum likelihood model by Goldman and Yang as implemented in the PAML package. We used the empirical transition matrix compiled by Jones et al. The distribution of evolutionary rates across sites was approximated by a discrete Γ-distribution, with the shape parameter as an additional free parameter. When calculating rates for individual genes, we assumed a uniform rate across sites. Clusters containing genes with zero branch length were discarded from further analysis.
List of abbreviations
df: degrees of freedom
Mya: million years ago
We thank Laurence Hurst for helpful discussions. Ziheng Yang and Joe Felsenstein provided valuable help on implementation details of their program packages, PAML and PHYLIP, respectively. JS and DT acknowledge support through grants from the HFSPO and the DFG. MJL acknowledges support from the Royal Society and a Heisenberg Fellowship of the DFG.
- Smith AB, Lafay B, Christen R: Comparative variation of morphological and molecular evolution through geologic time: 28S ribosomal RNA versus morphology in echinoids. Philos Trans R Soc Lond B Biol Sci. 1992, 338 (1286): 365-382.
- Bromham L, Rambaut A, Harvey PH: Determinants of rate variation in mammalian DNA sequence evolution. J Mol Evol. 1996, 43 (6): 610-621. 10.1007/BF02202109.
- Adachi J, Cao Y, Hasegawa M: Tempo and mode of mitochondrial DNA evolution in vertebrates at the amino acid sequence level: rapid evolution in warm-blooded vertebrates. J Mol Evol. 1993, 36 (3): 270-281. 10.1007/BF00160483.
- Catzeflis FM, Sheldon FH, Ahlquist JE, Sibley CG: DNA-DNA hybridization evidence of the rapid rate of muroid rodent DNA evolution. Mol Biol Evol. 1987, 4 (3): 242-253.
- Gu X, Li WH: Higher rates of amino acid substitution in rodents than in humans. Mol Phylogenet Evol. 1992, 1 (3): 211-214. 10.1016/1055-7903(92)90017-B.
- Britten RJ: Rates of DNA sequence evolution differ between taxonomic groups. Science. 1986, 231 (4744): 1393-1398.
- Moriyama EN: Higher rates of nucleotide substitution in Drosophila than in mammals. Jpn J Genetics. 1987, 62: 139-147.
- Sharp PM, Li WH: On the rate of DNA sequence evolution in Drosophila. J Mol Evol. 1989, 28 (5): 398-402.
- Friedrich M, Tautz D: An episodic change of rDNA nucleotide substitution rate has occurred during the emergence of the insect order Diptera. Mol Biol Evol. 1997, 14 (6): 644-653.
- The Tree of Life web project. [http://tolweb.org/tree/phylogeny.html]
- Goldman N, Yang ZH: Codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol. 1994, 11: 725-736.
- Ponomarenko AG: Superorder Scarabaeidea Laicharting, 1781. Order Coleoptera Linné, 1758. The Beetles. History of Insects. Edited by: Rasnitsyn AP, Quicke DLJ. 2002, Dordrecht, Kluwer Academic Publishers, 164-176.
- Rasnitsyn AP: Cohors Scarabaeiformes Laicharting, 1781. The Holometabolans. History of Insects. Edited by: Rasnitsyn AP, Quicke DLJ. 2002, Dordrecht, Kluwer Academic Publishers, 157-159.
- Blagoderov VA, Lukashevich ED, Mostovski MB: Order Diptera Linné, 1758. The True Flies. History of Insects. Edited by: Rasnitsyn AP, Quicke DLJ. 2002, Dordrecht, Kluwer Academic Publishers, 227-241.
- Yoder AD, Yang ZH: Estimation of primate speciation dates using local molecular clocks. Mol Biol Evol. 2000, 17 (7): 1081-1090.
- Huelsenbeck JP, Rannala B: Phylogenetic methods come of age: testing hypotheses in an evolutionary context. Science. 1997, 276 (5310): 227-232. 10.1126/science.276.5310.227.
- Carmean D, Kimsey LS, Berbee ML: 18S rDNA sequences and the holometabolous insects. Mol Phylogenet Evol. 1992, 1 (4): 270-278. 10.1016/1055-7903(92)90002-X.
- Ohta T: Very slightly deleterious mutations and the molecular clock. J Mol Evol. 1987, 26: 1-6. 10.1007/BF02111276.
- Kimura M, Ohta T: On the rate of molecular evolution. J Mol Evol. 1971, 1: 1-17. 10.1007/BF01659390.
- Gillooly JF, Allen AP, West GB, Brown JH: The rate of DNA evolution: effects of body size and temperature on the molecular clock. Proc Natl Acad Sci U S A. 2005, 102 (1): 140-145. 10.1073/pnas.0407735101.
- Martin AP: Substitution rates of organelle and nuclear genes in sharks: implicating metabolic rate (again). Mol Biol Evol. 1999, 16 (7): 996-1002.
- Jordan IK, Rogozin IB, Wolf YI, Koonin EV: Microevolutionary genomics of bacteria. Theoretical Population Biology. 2002, 61 (4): 435-447. 10.1006/tpbi.2002.1588.
- Eyre-Walker A, Keightley PD, Smith NG, Gaffney D: Quantifying the slightly deleterious mutation model of molecular evolution. Mol Biol Evol. 2002, 19 (12): 2142-2149.
- Gu Z, David L, Petrov D, Jones T, Davis RW, Steinmetz LM: Elevated evolutionary rates in the laboratory strain of Saccharomyces cerevisiae. Proc Natl Acad Sci U S A. 2005, 102 (4): 1092-1097. 10.1073/pnas.0409159102.
- Bustamante CD, Nielsen R, Sawyer SA, Olsen KM, Purugganan MD, Hartl DL: The cost of inbreeding in Arabidopsis. Nature. 2002, 416 (6880): 531-534. 10.1038/416531a.
- Ayala FJ: Molecular clock mirages. Bioessays. 1999, 21 (1): 71-75. 10.1002/(SICI)1521-1878(199901)21:1<71::AID-BIES9>3.0.CO;2-B.
- Rodriguez-Trelles F, Tarrio R, Ayala FJ: Erratic overdispersion of three molecular clocks: GPDH, SOD, and XDH. Proc Natl Acad Sci USA. 2001, 98 (20): 11405-11410. 10.1073/pnas.201392198.
- Schmid KJ, Tautz D: A screen for fast evolving genes from Drosophila. Proc Natl Acad Sci USA. 1997, 94 (18): 9746-9750. 10.1073/pnas.94.18.9746.
- Domazet-Loso T, Tautz D: An evolutionary analysis of orphan genes in Drosophila. Genome Res. 2003, 13 (10): 2213-2219. 10.1101/gr.1311003.
- Falciani F, Hausdorf B, Schroder R, Akam M, Tautz D, Denell R, Brown S: Class 3 Hox genes in insects and the origin of zen. Proc Natl Acad Sci USA. 1996, 93 (16): 8479-8484. 10.1073/pnas.93.16.8479.
- Project Ensembl. [http://www.ensembl.org/]
- Expressed Sequence Tags database. [http://www.ncbi.nlm.nih.gov/projects/dbEST/]
- INFOBIOGEN GnpSeq database. [http://urgi.infobiogen.fr/data/gnpSeq/run.php]
- Tatusov RL, Koonin EV, Lipman DJ: A genomic perspective on protein families. Science. 1997, 278 (5338): 631-637. 10.1126/science.278.5338.631.
- Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucl Acids Res. 2004, 32 (5): 1792-1797. 10.1093/nar/gkh340.
- Castresana J: Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol. 2000, 17 (4): 540-552.
- Yang ZH: PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 1997, 13 (5): 555-556.
- Jones DT, Taylor WR, Thornton JM: The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci. 1992, 8 (3): 275-282.
- Gradstein F, Ogg J, Smith A: A Geologic Time Scale 2004. 2004, Cambridge University Press, UK.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.