- Research article
- Open Access
Revealing pancrustacean relationships: Phylogenetic analysis of ribosomal protein genes places Collembola (springtails) in a monophyletic Hexapoda and reinforces the discrepancy between mitochondrial and nuclear DNA markers
BMC Evolutionary Biology volume 8, Article number: 83 (2008)
In recent years, several new hypotheses on phylogenetic relations among arthropods have been proposed on the basis of DNA sequences. One of the challenged hypotheses is the monophyly of hexapods. This discussion originated from analyses based on mitochondrial DNA datasets that, due to an unusual positioning of Collembola, suggested that the hexapod body plan evolved at least twice. Here, we re-evaluate the position of Collembola using ribosomal protein gene sequences.
In total 48 ribosomal proteins were obtained for the collembolan Folsomia candida. These 48 sequences were aligned with sequence data on 35 other ecdysozoans. Each ribosomal protein gene was available for 25% to 86% of the taxa. However, the total sequence information was unequally distributed over the taxa and ranged between 4% and 100%. A concatenated dataset was constructed (5034 inferred amino acids in length), of which ~66% of the positions were filled. Phylogenetic tree reconstructions, using Maximum Likelihood, Maximum Parsimony, and Bayesian methods, resulted in a topology that supports monophyly of Hexapoda.
Although ribosomal proteins in general may not evolve independently, they once more appear highly valuable for phylogenetic reconstruction. Our analyses clearly suggest that Hexapoda is monophyletic. This underpins the inconsistency between nuclear and mitochondrial datasets when analyzing pancrustacean relationships. Caution is needed when applying mitochondrial markers in deep phylogeny.
General hypotheses on arthropod phylogeny are rapidly being altered by DNA sequence data [1–3]. For instance, the Atelocerata concept held that hexapods and myriapods are united in one clade, but under the influence of molecular data (e.g. ) this concept was replaced by the view that crustaceans and hexapods constitute a monophyletic group, which is known as Pancrustacea (e.g. [2, 3, 5]).
Another recently proposed, but still highly debated viewpoint is the diphyletic origin of Hexapoda, which was initially raised by Nardi and co-workers in 2003 . Based on four mitochondrial genes, they  observed that two species of Collembola (Tetrodontophora bielanensis and Gomphiocephalus hodgsoni) branched off before the other pancrustacean groups that were included in their study (Insecta and Crustacea), suggesting paraphyly of Hexapoda. Their thesis was that the six-legged body plan of Collembola and other hexapods evolved at least twice: once in the group of wingless hexapods and another time in the true insects.
The conclusions of Nardi et al.  resulted in a lively scientific debate, and many studies have addressed the phylogenetic placement of Collembola since then. Some authors focused on mitochondrial sequences, others analyzed nuclear genes. Analyses of additional mitochondrial sequences again placed Collembola apart from the other hexapods, implying that Hexapoda are not monophyletic [3, 7]. However, after thorough analyses exploring the effects of outgroup and gene choice, sequence handling and optimality criteria on inferred trees, Cameron and co-workers  concluded that the mitochondrial data as available at the time were inadequate to fully resolve hexapod relationships . Hassanin  arrived at a similar conclusion in a more recent study focusing on the effects of reverse strand-bias. Most recently, Carapelli and co-workers  reported new analyses on a very large dataset, consisting of no fewer than a hundred almost-complete mitochondrial genomes. These new analyses, which were based on a novel model of amino acid sequence evolution (MtPan), supported the non-monophyly of hexapod groups.
It has gradually become clear in pancrustacean phylogeny that nuclear and mitochondrial datasets tell different stories, and often result in different conclusions . Remarkably, studies that addressed the question using nuclear genomic data (ribosomal RNA and protein-encoding genes) indicate that the Collembola group between crustaceans and insects and that Hexapoda is monophyletic [2, 5, 11–18]. However, most of those studies included a relatively small number of loci , most likely because obtaining data on protein-encoding DNA sequences is not always straightforward for groups for which little genomic information is available. Here we try to fill this gap by re-evaluating the position of Collembola using a relatively large number of nuclear protein-encoding sequences that are, although all for ribosomal proteins, assumed to be distributed throughout the genome (see for example ).
Several authors have shown that publicly available data can be useful when conducting a large-scale phylogenetic study (e.g. ), and that expressed sequence tags (ESTs) can be extremely valuable for phylogenetic purposes [21, 22]. Here, we combine data from a recently finished EST sequencing project on the collembolan Folsomia candida , with data on 34 ecdysozoan species (Chelicerata, Hexapoda, Tardigrada, Nematoda and Crustacea) available in the public GenBank repository , and with data from a smaller EST dataset of the collembolan Orchesella cincta. We focus on ribosomal proteins to avoid the problem of analyzing paralogous genes (sensu ).
In total, gene sequences for 48 ribosomal proteins were obtained from the Folsomia candida EST dataset. This is almost two-thirds of the total set of 79 ribosomal proteins  found in the genome of Drosophila melanogaster. Four D. melanogaster ribosomal protein sequences (RpL15, RpL32, RpL36 and RpL39) showed high similarity with two F. candida transcript clusters in the EST dataset, instead of one. Comparison of the F. candida transcripts with those of D. melanogaster revealed insertions/deletions resulting in frame shifts in one of the two F. candida EST clusters for RpL15, RpL32 and RpL36. Transcripts with a frame shift were discarded. Two highly diverse F. candida EST clusters (one consisting of six EST sequences and one singleton sequence) showed homology with D. melanogaster RpL39. The F. candida RpL39 singleton sequence was excluded from further analysis. The discarded RpL15, RpL32, RpL36 and RpL39 transcripts may stem from duplications in the F. candida genome (for example, in D. melanogaster nine ribosomal proteins are represented by two separate functional genes ), or from constitutively expressed pseudogenes. This situation may be analogous to the apparent amplification of many mammalian ribosomal proteins; for instance, the human genome contains over 2000 ribosomal protein pseudogenes . Still, it seems that only one copy of each ribosomal protein is actually functional [26, 27].
As described in the methods section, the remaining 48 ribosomal protein sequences were used to retrieve ribosomal protein sequence information on 32 additional ecdysozoan species. In addition, ribosomal protein sequences of D. melanogaster, Apis mellifera and Caenorhabditis elegans were retrieved from the Ribosomal Protein Gene-database (RPG; ). The number of usable (partial) ribosomal protein gene sequences that were obtained per species ranged from two (4% of the 48 genes: Amblyomma variegatum) to 48 (100% of the 48 genes: D. melanogaster and Apis mellifera) (Table 1). Redundancy for a given ribosomal protein gene in a given species was often low, and many gene sequences were represented by one or a few EST sequences only. It should be mentioned that due to this rather low sequence coverage the dataset is most probably not free from sequencing errors. Furthermore, none of the 48 ribosomal proteins that were included in the dataset were observed in all of the 36 species investigated (Table 1 and Additional file 1). In summary, for each ribosomal protein gene information was available for 25% to 86% of the taxa.
We calculated "similarity" values between the amino acid sequences of F. candida and three well-represented species (C. elegans (most-distant outgroup), Daphnia magna and A. mellifera). These values were mapped onto a ternary graph (Figure 1). Almost all points cluster in the lower region of the ternary graph, showing that for almost all genes the distance between F. candida and C. elegans is greater than the distance between F. candida and A. mellifera. The graph also shows that most genes of F. candida are more "similar" to A. mellifera, while some have more in common with D. magna.
The individual alignments were concatenated and phylogenetic analyses were conducted to investigate the position of Collembola. Two species were excluded from the analyses (Table 1). The final alignment had a length of 5034 inferred amino acids, representing in total 15,102 nucleotides. Information was available for 66% of the amino acid positions.
Likelihood mapping was applied to obtain estimates of phylogenetic signal. The concatenated dataset contained more phylogenetic signal (89% fully resolved quartets) than each of the independent ribosomal protein alignments (9–72% fully resolved quartets; data not shown).
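To make the "fully resolved quartets" percentage concrete, the following is a minimal sketch of how a likelihood-mapping tally could be computed. It assumes the seven-region scheme usually associated with likelihood mapping, in which each quartet's three posterior weights are assigned to the nearest of seven attractor points (three corners, three edge midpoints and the centre), and only the corner regions count as fully resolved. The function names and the toy log-likelihoods are hypothetical; this is not the Tree-Puzzle implementation.

```python
import math

# Attractor points of the seven regions (assumed scheme, see lead-in).
ATTRACTORS = {
    "resolved_1": (1.0, 0.0, 0.0),
    "resolved_2": (0.0, 1.0, 0.0),
    "resolved_3": (0.0, 0.0, 1.0),
    "partly_12": (0.5, 0.5, 0.0),
    "partly_13": (0.5, 0.0, 0.5),
    "partly_23": (0.0, 0.5, 0.5),
    "unresolved": (1 / 3, 1 / 3, 1 / 3),
}


def posterior_weights(log_likelihoods):
    """Convert the log-likelihoods of the three quartet topologies to weights summing to 1."""
    best = max(log_likelihoods)
    scaled = [math.exp(value - best) for value in log_likelihoods]
    total = sum(scaled)
    return tuple(value / total for value in scaled)


def region(weights):
    """Name of the attractor closest (Euclidean distance) to the weight vector."""
    return min(ATTRACTORS, key=lambda name: math.dist(weights, ATTRACTORS[name]))


def fraction_fully_resolved(quartets):
    """Share of quartets whose weights fall in one of the three corner regions."""
    regions = [region(posterior_weights(q)) for q in quartets]
    return sum(r.startswith("resolved") for r in regions) / len(regions)


# Toy input: one clearly resolved quartet and one star-like quartet.
toy_quartets = [(-100.0, -110.0, -110.0), (-100.0, -100.0, -100.0)]
print(fraction_fully_resolved(toy_quartets))
```

Under this scheme a higher fraction indicates that more quartets favour a single topology, which is how the 89% for the concatenated dataset compares with the 9–72% for the individual alignments.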
The trees obtained by the different tree-reconstruction algorithms were highly comparable (Figure 2). In all reconstructions (MP, ML and Bayesian), Chelicerata and Pancrustacea each formed a monophyletic group, with relatively high support (Bayesian posterior probabilities both 100%). The two branchiopods included in this study (D. magna and Artemia franciscana) grouped together, and remained separate from the other crustaceans (Malacostraca).
The relationships within the Insecta were weakly resolved, although Diptera was recovered as a monophyletic clade, as were Lepidoptera and Coleoptera (Figure 2). The Hemiptera, however, were resolved as a paraphyletic group: Homalodisca coagulata grouped with Locusta migratoria (Orthoptera), rather than with the other hemipterans Acyrthosiphon pisum and Toxoptera citricida. The highly supported, but obviously incorrect, positioning of Homalodisca coagulata does not seem to be an artefact of the method that allowed for missing data, since all three Hemiptera, as well as Locusta migratoria, were represented by a large number of ribosomal protein gene sequences (32 to 47). The incorrect placement of H. coagulata could be a consequence of the inability of ribosomal protein genes to resolve more recent evolutionary splits, which may be a trade-off of their suitability for deeper phylogenies.
Hexapoda was clearly monophyletic: Both collembolans (F. candida and O. cincta) grouped together and formed the sister-group to the Insecta in all analyses conducted (ML bootstrap RtRev+G+F = 91%, ML bootstrap Wag+G+F = 86%, MP bootstrap = 78%, Bayesian posterior probabilities = 100%).
Several C. elegans and D. melanogaster ribosomal proteins are duplicated (see also RPG database). MP analyses of a second concatenated dataset that contained D. melanogaster homologs for ribosomal protein RpS5, RpS15A, RpS19, RpS28, RpL34 and RpL10A resulted in a similar topology (data not shown).
In this study we reassessed the position of Collembola, using (partial) genes for 48 nuclear encoded proteins. The main result of our study is clear evidence of monophyly of Hexapoda. All phylogenetic reconstruction methods employed in this study support this hypothesis (Figure 2). Based on our nuclear dataset we conclude that the six-legged body plan, as found among insects and Collembola, evolved only once in the course of evolution. This is in contrast to results obtained using large mitochondrial datasets [3, 6, 9, 10] that by and large suggest that the characteristic hexapod body plan was acquired in parallel by Collembola and insects due to convergent evolution, rather than by descent.
Discrepancies between pancrustacean relationships inferred from nuclear and from mitochondrial datasets seem almost universal. It is therefore important to identify the causes of these discrepancies and to establish whether one of the two types of markers is superior. Detailed discussions of the pros and cons of both marker types, and of possible approaches to correct for ambiguous signals, are given in several recent papers [8, 9, 29–32]. Comparative studies that contrast nuclear and mitochondrial datasets suggest that nuclear markers are preferable in deep arthropod molecular phylogenetics, as mitochondrial genes tend to be more substitutionally biased and, in general, evolve much faster .
Already in 1999 Curole and Kocher  stated in a review paper that the value of mitochondrial genes in deep-level phylogeny is debatable and that "controversial" mitochondrial DNA (mtDNA) results should be verified with nuclear encoded genes. This was also the final conclusion of Springer and co-workers . These authors compared the usability of nuclear and mitochondrial encoded genes in inferring deep-level mammalian phylogenies. The authors report that nuclear encoded genes (exons) outperform mitochondrial markers in resolving deep splits. Springer and co-workers suggest that the reason for this dissimilarity in resolving-power might be found, among others, in the rate of nucleotide substitution .
Still, although the nuclear protein-encoding sequences in the study of Springer et al.  outperformed the mitochondrial genes, mtDNA-based studies are not necessarily useless for deep phylogeny. They are only problematic if mitochondrial genomes evolve at such a rate that saturation of substitutions makes actual phylogenetic signals from deeper nodes hazy . Otherwise, analyses using appropriate models should still be able to retrieve a plausible tree . In a recent study, Kjer and Honeycutt  used an approach that included all data found in mitochondrial genomes (including for instance 3rd codon positions, but excluding the control region). After applying a site-specific rate model, these authors retrieved a phylogenetic tree of mammals that was in accordance with recent nuclear DNA based phylogenies .
When investigating cheliceratan relationships Jones et al.  arrived at a comparable conclusion. These authors state that mtDNA can be applied in molecular phylogenetics, but only when an appropriate substitution model (e.g. to correct for strand-bias) is used. These authors state as a final remark that earlier mtDNA studies that focused on deep-phylogenetic questions should be thoroughly re-evaluated . However, such models of mitochondrial sequence evolution might first need to be developed before Collembola can be placed with certainty in the arthropod phylogenetic tree. As mentioned before, Carapelli and co-workers  investigated an innovative pancrustacean-model of mitochondrial protein change. This model significantly aided the tree building, but did not yield a monophyletic Hexapoda .
An advantage of ribosomal protein genes is that the sequences of different species can be relatively easily homologized due to their conserved nature. However, there are also disadvantages. Although ribosomal protein genes are distributed all over the genome, they definitely do not evolve independently. Coevolving sites are known to exist in ribosomal proteins . For example, amino acid residues that are near tRNA binding sites in the ribosome appear to evolve in a related manner .
It has to be mentioned that we included only two Collembola in our analyses. Preferably, more springtail species, and maybe even more importantly, proturans and diplurans, should be included. Those latter basal hexapod groups were excluded from the current analysis as they lack available (EST) data. While earlier work suggests that proturan and dipluran genes might be fairly divergent from other arthropods , this and other papers (e.g. [21, 22]) suggest that it should be relatively easy to obtain phylogenetically relevant sequence information on those groups by EST sequencing.
Another intriguing result of this study is the non-monophyly of the crustaceans. The branchiopods D. magna and A. franciscana clustered with the hexapods rather than with the other crustaceans in the malacostracan group. This is in accordance with studies by Regier and co-workers  and Mallatt and Giribet , which suggest that the hexapod lineage evolved from within the crustaceans . The observed close relationship between hexapods and branchiopods, in combination with some other characteristics, led Glenner and co-workers  to suggest that branchiopod groups colonized terrestrial ecosystems as insects.
As a final remark we would like to point out that this study shows that Collembola occupy a crucial position. Obtaining additional (EST) sequence information on Collembola, as well as other basal hexapods (Protura, Diplura and Microcoryphia) will definitely result in a better understanding of the phylogenetic origin of insects.
The phylogenetic efforts presented here clearly show that Collembola is a sister group of Insecta (Figure 2). Our results reinforce the discrepancy between results obtained using mitochondrial and nuclear datasets. It seems of major importance to unravel the underlying causes of the disagreements observed, or otherwise focus on nuclear encoded genes.
EST dataset and ribosomal protein selection
Recently, approximately 9,000 F. candida EST sequences were generated (see  for additional information). To select springtail ribosomal protein gene sequences from this EST dataset, ribosomal protein cDNA sequences of Drosophila melanogaster were retrieved from the Ribosomal Protein Gene database (RPG ). These sequences were then compared with the F. candida EST dataset using TBlastX . Springtail sequences showing significant similarity (E value < 10⁻¹⁰) were used for further analysis. All F. candida sequences are stored in dbEST.
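The E-value selection step can be illustrated with a short sketch. It assumes that the TBlastX hits are available as 12-column tab-separated text (the layout written by BLAST+ with `-outfmt 6`, where the eleventh column is the E value); the output format actually used in the study is not stated, and the cluster identifiers below are hypothetical.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-10  # threshold stated in the text


def significant_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return (query, subject, evalue) for every hit with an E value below the cutoff.

    Assumes 12-column tab-separated BLAST output; column 11 holds the E value.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])
        if evalue < cutoff:
            hits.append((row[0], row[1], evalue))
    return hits


# Two made-up hits for one D. melanogaster query; only the first passes the cutoff.
example = (
    "RpL15\tFc_cluster_0012\t78.5\t200\t43\t0\t1\t600\t10\t609\t3e-45\t180\n"
    "RpL15\tFc_cluster_0933\t35.0\t60\t39\t0\t1\t180\t5\t184\t2e-04\t41.2\n"
)
print(significant_hits(example))
```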
Sequence retrieval and DNA alignment
For this study 35 additional species, comprising nineteen hexapod species, nine crustacean species, five chelicerates, and two non-arthropod ecdysozoans (one nematode, and one tardigrade) were selected (Table 1). For 31 species all available nucleotide sequences were retrieved from NCBI GenBank (including ESTs) using a Perl script, BioPerl  and NCBI Entrez Programming Utilities . Species-specific BLAST databases were constructed. The F. candida ribosomal protein gene sequences, obtained as described above, were compared to these databases using TBlastX (E value < 10⁻¹⁰). For every species, the sequences showing significant resemblance to a specific ribosomal protein were retrieved using Perl and BioPerl  and grouped in a FASTA file (with a maximum of 24 sequences per ribosomal protein per species). Additional file 2 shows all GenBank accessions that were used. The software program Phrap (P. Green, pers. comm. ) was applied to assemble a "consensus" sequence for each of these FASTA files: Phrap combines all available sequences and takes sequence coverage into account, which results in more precise consensus sequences. When Phrap created more than one sequence for a given ribosomal protein gene in a given species, the sequence part that was most abundant in the original sequence dataset was used for further analysis. All obtained nucleotide sequences were automatically translated to high quality peptides using the software program prot4EST .
The 48 ribosomal protein sequences were also compared to three smaller and unpublished collembolan (Orchesella cincta) EST datasets. These O. cincta ESTs were generated from libraries constructed by Roelofs and co-workers , Ellers and co-workers  and T.K.S. Janssens. Finally, D. melanogaster, C. elegans and Apis mellifera protein sequences were obtained from RPG as well.
For each ribosomal protein gene, the protein or the prot4EST inferred amino-acid sequences of the different species were aligned using ClustalW  and inspected with GeneDoc . If for a certain species a ribosomal protein was represented by more than one locus in the RPG database, one ribosomal protein was randomly taken. Additional alignments were made in which the chosen D. melanogaster sequences were replaced by their homologous counterparts. This was done for RpS5, RpS15A, RpS19, RpS28, RpL34 and RpL10A. Sequences that aligned poorly were subjected to visual inspection, and those sequences that appeared to be out of frame from an identifiable amino-acid position were manually corrected and re-aligned. This implied that insertions and deletions causing frame-shifts were characterized as missing or were removed. All alignments were trimmed to the length of the F. candida sequence. Finally, inadequately aligned regions were excluded from further analysis using the program Gblocks .
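The trimming step can be sketched as follows. The text does not specify how trimming was carried out, so this example assumes one plausible reading: columns before the first and after the last F. candida residue are removed from every sequence in the alignment. The taxon labels and toy sequences are hypothetical.

```python
def trim_to_reference(alignment, reference, gap_chars="-?"):
    """Trim all aligned sequences to the span covered by the reference sequence.

    alignment: dict mapping taxon name to an aligned sequence (equal lengths).
    reference: name of the taxon whose first and last residues define the span.
    """
    ref_seq = alignment[reference]
    residue_columns = [i for i, ch in enumerate(ref_seq) if ch not in gap_chars]
    if not residue_columns:
        raise ValueError("reference sequence contains no residues")
    start, end = residue_columns[0], residue_columns[-1] + 1
    return {name: seq[start:end] for name, seq in alignment.items()}


toy_alignment = {
    "F_candida": "--MKV-LT--",
    "D_melanogaster": "MSMKVALTGK",
}
print(trim_to_reference(toy_alignment, "F_candida"))
```

Internal gaps in the reference are kept, so that the column positions of the remaining taxa stay aligned.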
First, to obtain insight into the information contained by each of the 48 inferred ribosomal protein sequences, the distances (Kimura's distance ) between F. candida and three well-represented species (C. elegans (outgroup), D. magna (Crustacea) and A. mellifera (Insecta)) were calculated using the PHYLIP package Protdist . Those values were used to calculate "similarity" values by subtracting the distance value from one (similarity = 1 - distance). Similarities were visualized in a ternary graph in Microsoft Excel, using TernPlot . All the individual alignments were additionally subjected to a likelihood mapping analysis using Tree-Puzzle [52, 53] (max. 10,000 quartets, WAG model  of substitution) in order to assess the phylogenetic signal in the dataset.
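The similarity transformation described above (similarity = 1 - distance) can be written out as a small sketch. The normalization of the three similarities to fractions that sum to one is an assumption made here because a ternary diagram needs three components of a whole; how TernPlot scales its input is not described in the text. The distances in the example are invented for illustration.

```python
def similarity(distance):
    """Similarity as defined in the text: one minus the pairwise distance."""
    return 1.0 - distance


def ternary_fractions(d_celegans, d_dmagna, d_amellifera):
    """Turn three distances from F. candida into ternary-plot fractions.

    The fractions are the three similarities divided by their sum (assumed
    normalization). Distances above 1 would give negative similarities, so
    such genes would need separate handling.
    """
    sims = [similarity(d) for d in (d_celegans, d_dmagna, d_amellifera)]
    total = sum(sims)
    if total <= 0:
        raise ValueError("similarities must sum to a positive value")
    return tuple(s / total for s in sims)


# Invented distances: C. elegans is twice as distant as the two arthropods.
print(ternary_fractions(0.5, 0.25, 0.25))
```

A gene that is equally similar to all three species would land at the centre of the triangle, while a gene far from C. elegans is pulled away from the C. elegans corner.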
Second, all the individual alignments were concatenated into a single alignment. If due to the presence of paralogous D. melanogaster sequences two alignments were available for one ribosomal protein, only one was included. This procedure resulted in a dataset with spaces of missing data (sensu ). The alignment is available from . To check if the final outcome depended critically on the choice for one or the other paralog, a second concatenated alignment was made in which each D. melanogaster homolog was replaced by its counterpart (for RpS5, RpS15A, RpS19, RpS28, RpL34 and RpL10A).
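Concatenation with missing data can be sketched as below: a taxon that lacks a given gene receives a block of missing-data symbols of that gene's alignment length, and the filled fraction of the resulting supermatrix can then be computed. This is an illustrative sketch rather than the procedure used to build the published alignment; the `?` symbol, the function names and the toy data are assumptions.

```python
def concatenate(gene_alignments, taxa, missing="?"):
    """Concatenate per-gene alignments into one supermatrix.

    gene_alignments: list of dicts mapping taxon name to aligned sequence.
    taxa: all taxon names; taxa absent from a gene are padded with `missing`.
    """
    parts = {taxon: [] for taxon in taxa}
    for alignment in gene_alignments:
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a gene must have equal length")
        length = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(alignment.get(taxon, missing * length))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


def filled_fraction(supermatrix, missing_chars="?-"):
    """Fraction of supermatrix cells that hold a residue."""
    total = sum(len(seq) for seq in supermatrix.values())
    filled = sum(
        1 for seq in supermatrix.values() for ch in seq if ch not in missing_chars
    )
    return filled / total


genes = [{"A": "MK", "B": "MR"}, {"A": "LLV"}]  # taxon B lacks the second gene
matrix = concatenate(genes, taxa=["A", "B"])
print(matrix, filled_fraction(matrix))
```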
The first concatenated dataset was analyzed with Tree-Puzzle  as described above. Subsequently, this alignment was analyzed with Maximum Parsimony (MP), Maximum Likelihood (ML) and Bayesian methods. ML analyses (100 bootstrap replicates) were conducted using the Linux version of Phyml v2.4.4 , applying substitution models that were selected with ModelGenerator (gamma distribution with four rate categories). The selected model for the translated dataset was RtREV+G+F . The ML analysis of the inferred amino acid dataset was repeated using the WAG+G+F substitution model; this model is appropriate for soluble proteins like ribosomal proteins , and provided the third-best data fit after the RtREV+G+F and the RtREV+I+G+F models. Bayesian analysis (RtREV+G+F) was conducted using the Windows version of MrBayes . Analyses were run for 1,000,000 generations (MCMC sampling without heating, "one chain" and tree-sampling every 100 generations). The log likelihood values for the different generations were used to determine stationarity by plotting them, and the first 50,000 generations were discarded as "burn-in".
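The burn-in bookkeeping for the Bayesian run can be checked with a short sketch. It uses the run length, sampling interval and burn-in stated above; whether a sample at generation 0 is recorded and whether the sample at exactly 50,000 generations is discarded are assumptions, and the log-likelihood values are placeholders rather than results.

```python
def discard_burnin(samples, burnin_generations=50_000):
    """Keep only samples taken after the burn-in period.

    samples: list of (generation, log_likelihood) tuples. Samples at or before
    `burnin_generations` are discarded (assumed boundary convention).
    """
    return [(gen, lnl) for gen, lnl in samples if gen > burnin_generations]


# One sample every 100 generations over 1,000,000 generations, including
# generation 0 (assumed). The log-likelihood is a placeholder value.
samples = [(gen, -1.0) for gen in range(0, 1_000_001, 100)]
kept = discard_burnin(samples)
print(len(samples), len(kept))
```

With these assumptions, about 5% of the sampled trees are discarded as burn-in before the posterior probabilities are summarized.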
The ML and Bayesian analyses used the same model of sequence evolution for all the concatenated genes. However, likelihood methods restricted to only one model might perform inadequately when analyzing concatenated datasets . Therefore, the data were also analyzed using Maximum Parsimony, which might address this problem. The MP analysis was performed in the software MEGA  under Windows using all available sites (1,000 bootstrap replicates; starting tree obtained by random addition). The second concatenated dataset, which contained the homologous counterparts of duplicated D. melanogaster ribosomal proteins, was analyzed using MP only.
Bootstrap values above 70% (ML and MP) and Bayesian posterior probabilities above 95% were deemed significant.
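The significance criteria can be expressed as a small lookup, sketched here with the support values reported for the Collembola plus Insecta grouping in the Results. The dictionary keys and the function name are hypothetical, and "above" is interpreted as a strict inequality.

```python
# Thresholds from the text: 70% for ML and MP bootstrap, 95% for Bayesian support.
THRESHOLDS = {"ML": 70.0, "MP": 70.0, "Bayesian": 95.0}


def is_significant(method, support_percent):
    """True if the support value exceeds the threshold for the given method."""
    return support_percent > THRESHOLDS[method]


# Support for Collembola as sister group to Insecta, as reported in the Results.
hexapoda_support = [
    ("ML", 91.0),         # RtREV+G+F
    ("ML", 86.0),         # WAG+G+F
    ("MP", 78.0),
    ("Bayesian", 100.0),
]
print([is_significant(method, value) for method, value in hexapoda_support])
```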
Bitsch J, Bitsch C, Bourgoin T, D'Haese C: The phylogenetic position of early hexapod lineages: morphological data contradict molecular data. Systematic Entomology. 2004, 29 (4): 433-440. 10.1111/j.0307-6970.2004.00261.x.
Carapelli A, Nardi F, Dallai R, Frati F: A review of molecular data for the phylogeny of basal hexapods. Pedobiologia. 2006, 50 (2): 191-204. 10.1016/j.pedobi.2006.01.001.
Carapelli A, Nardi F, Dallai R, Boore JL, Liò P, Frati F: Relationships between hexapods and crustaceans based on 4 mitochondrial genes. Crustacean Issues. Edited by: R.A.Jenner SK. 2005, 16 (Crustacea and Arthropod Relationships): 295-306.
Boore JL, Lavrov DV, Brown WM: Gene translocation links insects and crustaceans. Nature. 1998, 392 (6677): 667-668. 10.1038/33577.
Regier JC, Shultz JW, Kambic RE: Pancrustacean phylogeny: hexapods are terrestrial crustaceans and maxillopods are not monophyletic. Proceedings of the Royal Society of London Series B-Biological Sciences. 2005, 272 (1561): 395-401. 10.1098/rspb.2004.2917.
Nardi F, Spinsanti G, Boore JL, Carapelli A, Dallai R, Frati F: Hexapod origins: Monophyletic or paraphyletic?. Science. 2003, 299 (5614): 1887-1889. 10.1126/science.1078607.
Cook CE, Yue QY, Akam M: Mitochondrial genomes suggest that hexapods and crustaceans are mutually paraphyletic. Proceedings of the Royal Society of London Series B-Biological Sciences. 2005, 272 (1569): 1295-1304. 10.1098/rspb.2004.3042.
Cameron SL, Miller KB, D'Haese CA, Whiting MF, Barker SC: Mitochondrial genome data alone are not enough to unambiguously resolve the relationships of Entognatha, Insecta and Crustacea sensu lato (Arthropoda). Cladistics. 2004, 20 (6): 534-557. 10.1111/j.1096-0031.2004.00040.x.
Hassanin A: Phylogeny of Arthropoda inferred from mitochondrial sequences: Strategies for limiting the misleading effects of multiple changes in pattern and rates of substitution. Molecular Phylogenetics and Evolution. 2006, 38 (1): 100-116. 10.1016/j.ympev.2005.09.012.
Carapelli A, Lio P, Nardi F, van der Wath E, Frati F: Phylogenetic analysis of mitochondrial protein coding genes confirms the reciprocal paraphyly of Hexapoda and Crustacea. BMC Evolutionary Biology. 2007, 7 (Suppl 2): S8-10.1186/1471-2148-7-S2-S8.
Luan YX, Mallatt JM, Xie RD, Yang YM, Yin WY: The phylogenetic positions of three basal-hexapod groups (Protura, Diplura, and Collembola) based on ribosomal RNA gene sequences. Molecular Biology and Evolution. 2005, 22 (7): 1579-1592. 10.1093/molbev/msi148.
Mallatt J, Giribet G: Further use of nearly complete, 28S and 18S rRNA genes to classify Ecdysozoa: 37 more arthropods and a kinorhynch. Molecular Phylogenetics and Evolution. 2006, 40 (3): 772-794. 10.1016/j.ympev.2006.04.021.
Mallatt JM, Garey JR, Shultz JW: Ecdysozoan phylogeny and Bayesian inference: first use of nearly complete 28S and 18S rRNA gene sequences to classify the arthropods and their kin. Molecular Phylogenetics and Evolution. 2004, 31 (1): 178-191. 10.1016/j.ympev.2003.07.013.
Shultz JW, Regier JC: Phylogenetic analysis of arthropods using two nuclear protein-encoding genes supports a crustacean plus hexapod clade. Proceedings of the Royal Society of London Series B-Biological Sciences. 2000, 267 (1447): 1011-1019. 10.1098/rspb.2000.1104.
Regier JC, Shultz JW, Kambic RE: Phylogeny of basal hexapod lineages and estimates of divergence times. Annals of the Entomological Society of America. 2004, 97 (3): 411-419. 10.1603/0013-8746(2004)097[0411:POBHLA]2.0.CO;2.
Giribet G, Edgecombe GD, Carpenter JM, D'Haese CA, Wheeler WC: Is Ellipura monophyletic? A combined analysis of basal hexapod relationships with emphasis on the origin of insects. Organism Diversity & Evolution. 2004, 4 (4): 319-340. 10.1016/j.ode.2004.05.001.
Colgan DJ, McLauchlan A, Wilson GDF, Livingston SP, Edgecombe GD, Macaranas J, Cassis G, Gray MR: Histone H3 and U2 snRNA DNA sequences and arthropod molecular evolution. Australian Journal of Zoology. 1998, 46 (5): 419-437. 10.1071/ZO98048.
Cook CE, Smith ML, Telford MJ, Bastianello A, Akam M: Hox genes and the phylogeny of the arthropods. Current Biology. 2001, 11 (10): 759-763. 10.1016/S0960-9822(01)00222-6.
Marygold S, Roote J, Reuter G, Lambertsson A, Ashburner M, Millburn G, Harrison P, Yu Z, Kenmochi N, Kaufman T, Leevers S, Cook K: The ribosomal protein genes and Minute loci of Drosophila melanogaster. Genome Biology. 2007, 8 (10): R216-10.1186/gb-2007-8-10-r216.
Driskell AC, Ane C, Burleigh JG, McMahon MM, O'Meara BC, Sanderson MJ: Prospects for building the tree of life from large sequence databases. Science. 2004, 306 (5699): 1172-1174. 10.1126/science.1102036.
Hughes J, Longhorn SJ, Papadopoulou A, Theodorides K, de Riva A, Mejia-Chang M, Foster PG, Vogler AP: Dense taxonomic EST sampling and its applications for molecular systematics of the Coleoptera (beetles). Molecular Biology and Evolution. 2006, 23 (2): 268-278. 10.1093/molbev/msj041.
Parkinson J, Mitreva M, Whitton C, Thomson M, Daub J, Martin J, Schmid R, Hall N, Barrell B, Waterston RH, McCarter JP, Blaxter ML: A transcriptomic analysis of the phylum Nematoda. Nature Genetics. 2004, 36 (12): 1259-1267. 10.1038/ng1472.
Timmermans MJ, de Boer ME, Nota B, de Boer TE, Marien J, Klein-Lankhorst RM, van Straalen NM, Roelofs D: Collembase: a repository for springtail genomics and soil quality assessment. BMC Genomics. 2007, 8: 341-10.1186/1471-2164-8-341.
National Center of Biotechnology. [http://www.ncbi.nlm.nih.gov/]
Zhang ZL, Harrison P, Gerstein M: Identification and analysis of over 2000 ribosomal protein pseudogenes in the human genome. Genome Research. 2002, 12 (10): 1466-1482. 10.1101/gr.331902.
Chan YL, Suzuki K, Wool IG: The Carboxyl Extensions of 2 Rat Ubiquitin Fusion Proteins Are Ribosomal-Proteins S27a and L40. Biochemical and Biophysical Research Communications. 1995, 215 (2): 682-690. 10.1006/bbrc.1995.2518.
Kenmochi N, Kawaguchi T, Rozen S, Davis E, Goodman N, Hudson TJ, Tanaka T, Page DC: A map of 75 human ribosomal protein genes. Genome Research. 1998, 8 (5): 509-523.
Nakao A, Yoshihama M, Kenmochi N: RPG: the Ribosomal Protein Gene database. Nucleic Acids Research. 2004, 32: D168-D170. 10.1093/nar/gkh004.
Simon C, Buckley TR, Frati F, Stewart JB, Beckenbach AT: Incorporating molecular evolution into phylogenetic analysis, and a new compilation of conserved polymerase chain reaction primers for animal mitochondrial DNA. Annual Review of Ecology Evolution and Systematics. 2006, 37: 545-579. 10.1146/annurev.ecolsys.37.091305.110018.
Lin CP, Danforth BN: How do insect nuclear and mitochondrial gene substitution patterns differ? Insights from Bayesian analyses of combined datasets. Molecular Phylogenetics and Evolution. 2004, 30 (3): 686-702. 10.1016/S1055-7903(03)00241-0.
Cameron SL, Beckenbach AT, Dowton M, Whiting MF: Evidence from mitochondrial genomics on interordinal relationships in insects. Arthropod Systematics and Phylogeny. 2006, 64 (1): 27-34.
Delsuc F, Phillips MJ, Penny D: Comment on "Hexapod origins: Monophyletic or paraphyletic?". Science. 2003, 301: 5639-10.1126/science.1086558.
Curole JP, Kocher TD: Mitogenomics: digging deeper with complete mitochondrial genomes. Trends in Ecology & Evolution. 1999, 14 (10): 394-398. 10.1016/S0169-5347(99)01660-2.
Springer MS, DeBry RW, Douady C, Amrine HM, Madsen O, de Jong WW, Stanhope MJ: Mitochondrial versus nuclear gene sequences in deep-level mammalian phylogeny reconstruction. Molecular Biology and Evolution. 2001, 18 (2): 132-143.
Kjer KM, Honeycutt RL: Site specific rates of mitochondrial genomes and the phylogeny of eutheria. BMC Evolutionary Biology. 2007, 7: 8-10.1186/1471-2148-7-8.
Jones M, Gantenbein B, Fet V, Blaxter M: The effect of model choice on phylogenetic inference using mitochondrial sequence data: Lessons from the scorpions. Molecular Phylogenetics and Evolution. 2007, 43 (2): 583-595. 10.1016/j.ympev.2006.11.017.
Yeang CH, Haussler D: Detecting coevolution in and among protein domains. PLoS Computational Biology. 2007, 3 (11): 2122-2134. 10.1371/journal.pcbi.0030211.
Glenner H, Thomsen PF, Hebsgaard MB, Sorensen MV, Willerslev E: The origin of insects. Science. 2006, 314 (5807): 1883-1884. 10.1126/science.1129844.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic Local Alignment Search Tool. Journal of Molecular Biology. 1990, 215 (3): 403-410.
Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JGR, Korf I, Lapp H, Lehvaslaiho H, Matsalla C, Mungall CJ, Osborne BI, Pocock MR, Schattner P, Senger M, Stein LD, Stupka E, Wilkinson MD, Birney E: The bioperl toolkit: Perl modules for the life sciences. Genome Research. 2002, 12 (10): 1611-1618. 10.1101/gr.361602.
Entrez Programming Utilities. [http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html]
Wasmuth JD, Blaxter ML: Prot4EST: Translating Expressed Sequence Tags from neglected genomes. BMC Bioinformatics. 2004, 5: 187. 10.1186/1471-2105-5-187.
Roelofs D, Marien J, van Straalen NM: Differential gene expression profiles associated with heavy metal tolerance in the soil insect Orchesella cincta. Insect Biochemistry and Molecular Biology. 2007, 37 (4): 287-295. 10.1016/j.ibmb.2006.11.013.
Ellers J, Mariën J, Driessen G, van Straalen NM: Temperature-induced gene expression associated with different thermal reaction norms for growth rate. Journal of Experimental Zoology Part B: Molecular and Developmental Evolution. 2008, 310B: 137-147. 10.1002/jez.b.21194.
Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG, Thompson JD: Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Research. 2003, 31 (13): 3497-3500. 10.1093/nar/gkg500.
Nicholas KB, Nicholas HB: GeneDoc: a tool for editing and annotating multiple sequence alignments. Distributed by the author. 1997.
Castresana J: Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Molecular Biology and Evolution. 2000, 17 (4): 540-552.
Kimura M: The Neutral Theory of Molecular Evolution. 1983, Cambridge, Cambridge University Press.
Felsenstein J: PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics. 1989, 5: 164-166.
Marshall D: TernPlot: An Excel spreadsheet for ternary diagrams. Computers & Geosciences. 1996, 22 (6): 697-699. 10.1016/0098-3004(96)00012-X.
Schmidt HA, Strimmer K, Vingron M, von Haeseler A: TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics. 2002, 18 (3): 502-504. 10.1093/bioinformatics/18.3.502.
Strimmer K, von Haeseler A: Likelihood-mapping: A simple method to visualize phylogenetic content of a sequence alignment. Proceedings of the National Academy of Sciences of the United States of America. 1997, 94 (13): 6815-6819. 10.1073/pnas.94.13.6815.
Whelan S, Goldman N: A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Molecular Biology and Evolution. 2001, 18 (5): 691-699.
Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic Biology. 2003, 52 (5): 696-704. 10.1080/10635150390235520.
Keane TM, Creevey CJ, Pentony MM, Naughton TJ, McInerney JO: Assessment of methods for amino acid matrix selection and their use on empirical data shows that ad hoc assumptions for choice of matrix are not justified. BMC Evolutionary Biology. 2006, 6: 29. 10.1186/1471-2148-6-29.
Dimmic MW, Rest JS, Mindell DP, Goldstein RA: rtREV: An amino acid substitution matrix for inference of retrovirus and reverse transcriptase phylogeny. Journal of Molecular Evolution. 2002, 55 (1): 65-73. 10.1007/s00239-001-2304-y.
Ronquist F, Huelsenbeck JP: MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003, 19 (12): 1572-1574. 10.1093/bioinformatics/btg180.
Kolaczkowski B, Thornton JW: Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous. Nature. 2004, 431 (7011): 980-984. 10.1038/nature02917.
Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular evolutionary genetics analysis (MEGA) software version 4.0. Molecular Biology and Evolution. 2007, 24 (8): 1596-1599. 10.1093/molbev/msm092.
The authors would like to thank C.A.M. van Gestel, Hélène Beauchamp and four anonymous reviewers for valuable comments on previous versions of the manuscript. In addition, the authors thank F.N. Soto-Adames for suggestions that aided the phylogenetic reconstruction, and Muriel de Boer, Tjalf de Boer and Ben Nota, who were involved in the general set-up of the project. Finally, the authors thank Thierry Janssens and Jacintha Ellers for allowing access to their Orchesella cincta EST datasets. This project was partly financed through a Bsik Research grant (BSIK03011) from the Netherlands Genomics Initiative.
MJTNT, DR and NMvS designed the study. MJTNT conducted the bioinformatic and phylogenetic analyses and wrote the manuscript. JM generated the Orchesella cincta EST data. DR, NMvS and JM commented on the manuscript. All authors read and approved the final version of the manuscript.