Rodent phylogeny revised: analysis of six nuclear genes from all major rodent clades
© Blanga-Kanfi et al. 2009
Received: 04 June 2008
Accepted: 02 April 2009
Published: 02 April 2009
Rodentia is the most diverse order of placental mammals, with extant rodent species representing about half of all placental diversity. In spite of many morphological and molecular studies, the family-level relationships among rodents and the location of the rodent root are still debated. Although various datasets have already been analyzed to solve rodent phylogeny at the family level, these are difficult to combine because they involve different taxa and genes.
We present here the largest protein-coding dataset used to study rodent relationships. It comprises six nuclear genes, 41 rodent species, and eight outgroups. Our phylogenetic reconstructions strongly support the division of Rodentia into three clades: (1) a "squirrel-related clade", (2) a "mouse-related clade", and (3) Ctenohystrica. Almost all evolutionary relationships within these clades are also highly supported. The primary remaining uncertainty is the position of the root. Applying various models and techniques designed to remove non-phylogenetic signal did not resolve the basal rodent trifurcation.
Sequencing and analyzing a large sequence dataset enabled us to resolve most of the evolutionary relationships among Rodentia. Our findings suggest that the uncertainty regarding the position of the rodent root reflects the rapid rodent radiation that occurred in the Paleocene rather than the presence of conflicting phylogenetic and non-phylogenetic signals in the dataset.
The order Rodentia is the most diverse among placental mammals: extant rodent species represent half of the placental diversity (2,277 species divided into 33 families). Morphological phylogenetic approaches have identified characters supporting a common origin (monophyly) of rodents, and clustered rodents and lagomorphs (rabbits, pikas) in a clade called Glires. Morphological studies also generally agree on the number and content of rodent families [1, 3, 4]. However, the description of the relationships among rodent families has been confounded by rampant convergent evolution of morphological characters. Based on morphological characters, rodents have been divided into either two or three suborders. The first system, suggested by Brandt, divides rodents into three suborders, Myomorpha, Sciuromorpha, and Hystricomorpha, based on the position of masticatory muscles (the masseters). However, it has since been proven that this character is homoplasic and that this classification does not reflect evolutionary relationships [7, 8]. The second system, proposed by Tullberg, divides rodents into two suborders, Sciurognathi and Hystricognathi, based on the position of the incisors and the angle of the jaw. The monophyly of Hystricognathi has been accepted, based on the identification of additional morphological synapomorphies, but the Sciurognathi are usually considered to be paraphyletic. Debates on the relationships within Sciurognathi and their relationships with Hystricognathi are the subject of numerous morphological papers [reviewed in ]. Molecular studies were expected to clarify the relationships among rodents. However, early studies based on molecular data complicated the understanding of rodent evolution by suggesting that rodents are paraphyletic [12–14]. These results initiated lively debates concerning evolutionary relationships among rodents and their place among placental mammals [15–17].
Phylogenetic conclusions supporting rodent paraphyly have been criticized, because they were based on a very limited taxonomic sampling. It has been suggested that increasing the sampling of rodent diversity  and/or mammalian diversity  would have supported rodent monophyly. Additionally, over-simplified models have been shown to erroneously support rodent paraphyly . Recent analyses based on a representative sampling of rodent taxonomic diversity and using model-based methods of sequence analysis have strongly supported the monophyly of rodents [20–24].
Within Rodentia, molecular analyses suggest that rodents are divided into seven well-supported clades: 1-Anomaluromorpha (scaly-tailed flying squirrels, springhares), 2-Castoridae (beavers), 3-Ctenohystrica (gundi, porcupines, guinea-pigs), 4-Geomyoidea (pocket gophers, pocket mice), 5-Gliridae (dormice), 6-Myodonta (rats, mice, jerboas), and 7-Sciuroidea (mountain beavers, squirrels, woodchucks) [25–28]. However, several evolutionary relationships within Rodentia are still debated. Recent studies have suggested that these seven clades are clustered into three main lineages: 1 – Anomaluromorpha, Castoridae, Geomyoidea, and Myodonta together form the "mouse-related clade"; 2 – Sciuroidea and Gliridae form the "squirrel-related clade"; and 3 – Ctenohystrica forms the third lineage [29–32]. However, most studies have not been able to solve the relationships among these three clades. Recently, Montgelard et al. analyzed mitochondrial genes as well as nuclear exonic and intronic sequences, and found significant support in favor of a basal position of the "mouse-related clade". This result was dependent on the removal of the fastest evolving characters from the dataset, suggesting that mutational saturation might explain the inconclusive placement of the rodent root.
More generally, Rodriguez-Ezpeleta et al. have shown that weakly supported nodes can sometimes be explained by the presence of conflicting phylogenetic and non-phylogenetic signal in a dataset. Three methods to reduce the non-phylogenetic information have been suggested: identification and removal of fast-evolving positions, character recoding (e.g., RY coding), and the use of a site-heterogeneous mixture model (e.g., CAT).
Here, we aimed to resolve rodent relationships at the family level and above. We established a comprehensive dataset including six nuclear gene fragments from 41 rodent species together with eight outgroup species. We were able to solve most evolutionary relationships among rodent families. In order to minimize conflicting signals and thus solve the debated basal rodent relationships, we applied the three methods suggested by Rodriguez-Ezpeleta et al. . We show that none of these methods, nor the use of more complex evolutionary models, can significantly solve basal rodent relationships. Additionally, some of our analyses, surprisingly, suggest a basal position of the squirrel-related clade and significantly reject the basal position of the "mouse-related clade" supported by Montgelard et al. . We thus propose that the lack of resolution at the base of the rodent tree may reflect rapid rodent radiation, rather than conflicting phylogenetic signals.
Table 1. Results of likelihood-based tests of alternative topologies. Reported column: Diff -ln L. Topologies compared: unconstrained (ML tree); Ctenohystrica at the base of the rodent tree; Myodonta at the base of the mouse-related clade; mouse-related clade at the base of the rodent tree; Anomaluromorpha at the base of the mouse-related clade; [Cavioidea+Erethizontoidea] not monophyletic; [Chinchilloidea+Octodontoidea] not monophyletic; paraphyly of the squirrel-related clade; Caviomorpha at the base of Hystricognathi; Phiomorpha at the base of Hystricognathi; paraphyly of the mouse-related clade; paraphyly of Ctenohystrica.
Table 2. Maximum likelihood bootstrap support of main rodent relationships under different coding models. Column groups include third-codon-position treatments, RY coding, and partitioned DNA models. Relationships evaluated: squirrel-related clade at the base of the rodent tree; mouse-related clade at the base of the rodent tree (bootstrap support less than 1 in two analyses); Ctenohystrica at the base of the rodent tree; monophyly of the squirrel-related clade; monophyly of the mouse-related clade; monophyly of Ctenohystrica; [Anomaluromorpha + Myodonta] monophyly; Hystricidae at the base of the Hystricognathi.
Previous phylogenetic reconstructions were unable to solve the relationships among the three main lineages of the mouse-related clade (Myodonta, Anomaluromorpha, and Castorimorpha), and all three possible evolutionary relationships have been suggested [22, 25, 26, 28, 31, 32, 36]. Our phylogenetic inference based on the full nucleotide dataset suggests the grouping of Anomaluromorpha with Myodonta (Figure 1). However, bootstrap and Bayesian support is at best moderate across the analyses considered (Table 2, BP = 37–72, PP = 0.58). In agreement with the bootstrap analysis, an AU test does not reject either alternative hypothesis (Table 1, p-value = 0.159–0.604). Additional data are thus needed to resolve the relationships at the base of the mouse-related clade. All other nodes within the mouse-related clade are well supported and alternatives are rejected based on an AU test (data not shown).
The grouping of Gliridae and Sciuridae has been recognized in morphological studies based on middle ear features, arterial pattern, and by most molecular analyses. Nevertheless, high support values have seldom been obtained to support this relationship [22, 25–27, 29, 31, 32]. This node is well supported in our study (BP = 86, PP = 1.0). It is also supported in our analyses using different coding and partitioning approaches (Table 2, BP = 86–98). However, alternatives to the monophyly of this clade are not rejected according to the AU test (Table 1, p-value = 0.123).
The clustering of Ctenodactylidae and Hystricognathi is highly supported (BP = 100, PP = 1.0). Previous knowledge of relationships within hystricognaths has been based either on a single gene (vWF or 12S rRNA) for many hystricognath species (22–23 species) [39, 40] or on multiple genes (3–6 genes) for fewer species (8–13 species) [22, 29, 32, 41]. The present dataset expands that of Huchon et al. by the addition of two nuclear gene fragments and four hystricognath taxa (in particular, a second representative of the Hystricidae). This expanded dataset allows us to solve the debated relationships within Hystricognathi. We find strong support for a basal position of Hystricidae within Hystricognathi (Figure 1, Table 2, BP = 89–95, PP = 1.0), while this position was previously only weakly supported [25, 26, 29, 40]. However, AU tests do not reject alternative positions of Hystricidae (Table 1, p-value = 0.128).
Phylogenetic relationships among South American hystricognaths (i.e., Caviomorpha) have long been debated. Caviomorphs have been found to comprise four distinct lineages (Cavioidea, Chinchilloidea, Erethizontoidea, and Octodontoidea). Our results confirm that chinchilla rats (Abrocoma) are not related to Chinchilla but rather belong to the Octodontoidea (BP = 100, PP = 1.0) [40, 42, 43]. Previous molecular trees did not resolve the relationships among the four caviomorph lineages with high bootstrap support, and various alternative topologies have been suggested [22, 25, 26, 29, 40]. Our data support a sister clade relationship between Cavioidea and Erethizontoidea (BP = 95, PP = 0.98), and a sister clade relationship between Chinchilloidea and Octodontoidea (Figure 1, BP = 88, PP = 0.92). In spite of these high support values, AU tests indicate that the best alternatives to these arrangements within Caviomorpha cannot be rejected (Table 1, p-value = 0.148–0.206). Similarly, analyses using RY coding or removal of third codon positions, as well as protein sequence analysis, support other relationships within Caviomorpha (data not shown). This suggests that additional species sampling is needed in order to robustly solve caviomorph relationships at the superfamily level.
The most important unresolved relationship in rodent systematics is the one at the base of the rodent tree. To date, no phylogenetic analysis has been able to resolve this question with strong support, whether based on nucleotide sequence data [24, 25, 29, 31], SINE data [36, 44], or morphological data [8, 10]. The only exception is the analysis of Montgelard et al. , which supports a basal position of the mouse-related clade after removal of fast-evolving nucleotide positions. Our nucleotide-based ML and Bayesian analyses (all three codon positions; Figure 1) place the squirrel-related clade at the base of the rodent tree. Our Bayesian analysis with the data partitioned by gene and partially partitioned by codon position (1st- and 2nd-position sites combined within genes, 3rd-position sites for each gene separate) appears to provide strong support for this relationship (PP > 0.90), but the partitioned ML bootstrap support values are much lower (Table 2, BP = 51). It is possible for Bayesian PP values to be artificially inflated under circumstances of a near-trichotomy . With this single Bayesian analysis being the only suggestion of strong support, and with the corresponding ML bootstrap support being so low, we hesitate to give much weight to the partitioned Bayesian result at the present time.
Recently, Rodriguez-Ezpeleta et al.  have shown that the presence of conflicting phylogenetic and non-phylogenetic signal in a dataset may result in weakly supported nodes. They suggested various approaches to remove the non-phylogenetic signal and thus increase the ability to resolve difficult phylogenetic relationships. To evaluate whether the resolution of the basal relationships among rodents could be improved by reducing non-phylogenetic signal in our dataset, we tested all the approaches suggested by Rodriguez-Ezpeleta et al. .
Third codon-position sites evolve the fastest, and are thus the most likely source of non-phylogenetic signal. In an attempt to reduce the non-phylogenetic signal, we performed analyses using RY coding for these positions. We also explored the extreme solution of removing all third codon-position sites. However, none of the three possible basal branching topologies was highly supported under these alternatives (Table 2). The only signal that can be seen is that a basal position of the mouse-related clade is not supported by the analysis of either the nucleotide dataset with only the first two codon positions, or the protein sequence dataset (Table 2, BP < 1).
The phylogenetic trees obtained under the CAT model did not help resolve the basal rodent relationships. The CAT analysis suggests that the squirrel-related clade is the first rodent lineage to diverge. However, no strong support in favor of this relationship is found, whether reconstructions are based on nucleotide or protein sequences (PP = 0.57 for DNA; PP = 0.75 for protein).
Table 3. Maximum log-likelihood scores and AU test p-values under different models of sequence evolution for three possible basal rodent relationships. Models of sequence evolution: JTT + Γ4; rate shift model + Γ4; codon model + Γ4 (without positive selection). Diff -ln L is reported under each model. Topologies compared: squirrel-related clade at the base; Ctenohystrica at the base; mouse-related clade at the base.
The nucleotide sequences were also analyzed using codon models. No support for positive selection was found, and hence, we only report the results obtained using the M8a model, which does not allow sites to evolve under positive selection. Under this model, a basal position of the Ctenohystrica is the most likely. However, the fit of the data to this topology is not significantly better than alternative topologies (0.9 and 4.0 log-likelihood point differences, for the topology with a basal position of the squirrel-related clade and the topology with a basal position of the mouse-related clade, respectively). The three possible rootings of the rodent tree are thus not statistically different based on AU tests (Table 3).
Our phylogenetic reconstructions provide a well-resolved rodent tree, except for a few nodes and the basal relationships among the main rodent clades. Unlike Montgelard et al. , removing fast evolving characters did not improve the resolution at the base of the rodent tree. This lack of resolution remained when all the other methods suggested by Rodriguez-Ezpeleta et al.  to increase tree resolution were applied. Surprisingly, using the JTT and the rate-shift models, we were able to reject a basal position of the mouse-related clade supported by Montgelard et al.  and support instead a basal position of the squirrel-related clade (a topology rejected by Montgelard et al. ). This suggests that removing fast evolving positions is not a panacea to solve phylogenetic conflicts, since different datasets can lead to significantly different results when using this approach.
More generally, our results suggest that the low support at the base of the rodent tree cannot be attributed only to the presence of conflicting non-phylogenetic signal, since removing such non-phylogenetic signal failed to significantly increase the tree resolution. We thus hypothesize that this lack of resolution reflects rapid radiation at the base of the rodent tree and possibly incomplete lineage sorting. Indeed, rodents were already highly diversified in the Paleocene and Early Eocene. Many extinct families are identified in these geological periods (i.e., Decipomyidae, Alagomyidae, Ivanantoniidae, Sciuravidae, Ischyromyidae, Theridomorpha, and Yuomyidae). According to recent phylogenetic work based on fossils and extant taxa , some of these ancient families are sister clades of extant clades. In particular, Theridomorpha might be related to Sciuroidea, Sciuravidae to the mouse-related clade, and Yuomyidae to the Ctenohystrica . This supports the idea that the divergences among Ctenohystrica, the mouse-related clade, and the squirrel-related clade occurred during the explosive radiation of rodents in the Paleocene.
Our results further suggest that a basal position of the mouse-related clade is the least likely, while a basal position of the squirrel-related clade may be the most likely. Interestingly, structural analysis of B1 retroposon elements also provides additional support in favor of an early divergence of the squirrel-related clade [35, 44]. The basal position of the squirrel-related clade may further be supported by the fact that the earliest fossils representative of the Gliridae, Aplodontidae, and Sciuridae families are protrogomorphous, while most early Ctenohystrica and most early representatives of the mouse-related clade are hystricomorphous [see review of character states in ]. Consequently, an early divergence of the squirrel-related clade appears to be the most parsimonious evolutionary scenario, given our current knowledge.
All suborders and superfamilies of the order Rodentia listed by Carleton and Musser are included in the analysis (Additional file 1). The tree was rooted with the closest rodent outgroups: representative lagomorphs, representative primates, Cynocephalus (the flying lemur, order Dermoptera), and Tupaia (tree shrew, order Scandentia). Rodents together with lagomorphs, primates, flying lemurs, and tree shrews form a clade called Euarchontoglires or Supraprimates [21, 47, 48, 49].
Ethanol-preserved samples, frozen tissue samples, or previously purified genomic DNAs were obtained from the donor institutions listed in Additional file 2. Total DNA was extracted according to Sambrook, Fritsch, and Maniatis  with slight modifications. Fragments from the following six nuclear genes were sequenced: the alpha 2B adrenergic receptor (ADRA2B); the cannabinoid receptor 1 (CB1); the growth hormone receptor (GHR); the interphotoreceptor retinoid binding protein (IRBP); the recombination activating gene 2 (RAG2); and the von Willebrand factor (vWF). These nuclear genes were chosen for the following reasons: (i) a large number of sequences are already available for those genes, especially within rodents; (ii) these genes have been shown to contain phylogenetic information within rodents and between mammalian orders [21, 26, 30]; (iii) these genes are not genetically linked to one another (their location is variable, on chromosomes 2, 4, 15, 14, 2, and 6 in Mus, chromosomes 3, 5, 2, 16, 3, and 4 in Rattus, and chromosomes 2, 6, 5, 10, 11, and 12 in Homo); and (iv) no interactions among these proteins were previously reported.
Amplification of ADRA2B, IRBP, and vWF was performed as described in Huchon et al. Amplification of CB1 was performed in two steps. A first amplification was performed with primers CB1-D1: 5'-GGCTCAAATGACATTCAGTACGAA-3' and CB1-R1: 5'-GAGTCCCCCATGCTGTTATCTAGAGGCTG-3', followed by a re-amplification of the initial PCR product using primers CB1-D2: 5'-CAGTACGAAGATATCAAAGGAGACATGGC-3' and CB1-R2: 5'-GAGTCCCCCATGCTGTTATCTAGAGGCTG-3'. Amplification of RAG2 and GHR was performed similarly. For RAG2, the first amplification was performed with primers RAG2-D1: 5'-CGCTGCACAGAGAAAGACTT-3' and RAG2-R1: 5'-AAGGATTTCTTGGCAGGAGT-3', followed by a re-amplification of the initial PCR product using primers RAG2-D2: 5'-TAYAGYCGAGGGAAAAGYATGGG-3' and RAG2-R2: 5'-GACAAGTGGATGAGTGTGCGTTC-3'. For GHR, the first amplification was performed with primers GHR-D1: 5'-TAGGAAGGAAAATTRGARGARGTNAA-3' and GHR-R1: 5'-AAGGCTANGGCATGATRTTRTT-3', followed by a re-amplification of the initial PCR product using primers GHR-D2: 5'-GGAAAATTRGAGGAGGTGAAYACNATHTT-3' and GHR-R2: 5'-GATTTTGTTCAGTTGGTCRGTRCTNAC-3' or GHR-R1. Purification of the PCR products and sequencing were performed according to Huchon et al. Sequence accession numbers are available in Additional file 1.
Number of positions in each gene partition.
Phylogenetic tree reconstructions were performed on the concatenated nucleotide dataset using the ML criterion. The program MODELTEST 3.07  was used to determine the best probabilistic model of DNA sequence evolution using the Akaike Information Criterion. The best model was found to be GTR+Γ+I. ML searches for the best trees were performed using the program PAUP* . The parameters of the model and the ML tree were then determined by successive approximation . The initial parameter values were those estimated by MODELTEST 3.07, and those values were used for a first round of heuristic search starting with a Neighbor-Joining (NJ) tree and using TBR branch-swapping. Parameters were then estimated on the resulting tree and used for another round of heuristic search. The process was repeated until all parameter values were stable. Bootstrap percentages were estimated from 100 pseudo-replicates using the best estimated parameters, a NJ starting tree, and TBR branch-swapping.
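The model-selection criterion mentioned above can be illustrated with a short sketch. It assumes the standard definition AIC = 2k - 2 ln L, where k is the number of free parameters and ln L the maximized log-likelihood; the function names and any numbers passed to them are hypothetical and are not taken from the MODELTEST output of this study.

```python
def aic(log_likelihood, n_free_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_free_params - 2 * log_likelihood


def best_by_aic(candidates):
    """Return the name of the model with the lowest AIC.

    candidates maps a model name to a (log_likelihood, n_free_params) pair.
    The values are supplied by the caller; nothing here is study data.
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))
```

A more parameter-rich model is preferred only when its gain in log-likelihood outweighs the penalty of two units per additional parameter.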
Phylogenetic trees were also reconstructed based on protein sequences. The protein sequence alignment is provided in Additional file 4. The program PROTTEST 1.3  was used to estimate the best model of protein sequence evolution. The best model was found to be JTT+Γ+I. Phylogenetic trees were then reconstructed with the program PHYML  using the ML model identified by PROTTEST.
Three different partitioned ML analyses were conducted on the nucleotide dataset with RAxML. The first analysis considered each gene as an independent partition (six partitions). The second analysis considered each codon position as an independent partition (three partitions). The third analysis considered each codon position of each gene as an independent partition (18 partitions). The GTR+Γ+I model was applied to all partitions; individual α-shape parameters, substitution rates, and base frequencies were estimated and optimized separately for each partition. Bootstrap support was estimated using 100 pseudo-replicates.
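The three partitioning schemes described above can be sketched as partition definitions in the style of a RAxML partition file (lines such as `DNA, name = 1-300\3`). This is a minimal sketch, not the authors' script: the function name and scheme labels are hypothetical, gene lengths must be supplied by the caller, and each gene fragment is assumed to be in frame, starting at a first codon position.

```python
def codon_partitions(gene_lengths, scheme):
    """Build RAxML-style partition lines for a concatenated alignment.

    gene_lengths: list of (gene_name, length_in_bp) in concatenation order.
    scheme: "gene" (one partition per gene), "codon" (one per codon
            position across genes), or "gene_codon" (one per codon
            position of each gene).
    Assumes every gene starts at codon position 1 and has a length
    that is a multiple of three.
    """
    blocks, start = [], 1
    for name, length in gene_lengths:
        if length % 3 != 0:
            raise ValueError(f"{name}: length is not a multiple of 3")
        blocks.append((name, start, start + length - 1))
        start += length

    if scheme == "gene":
        return [f"DNA, {name} = {s}-{e}" for name, s, e in blocks]
    if scheme == "gene_codon":
        return [f"DNA, {name}_pos{p} = {s + p - 1}-{e}\\3"
                for name, s, e in blocks for p in (1, 2, 3)]
    if scheme == "codon":
        lines = []
        for p in (1, 2, 3):
            ranges = ", ".join(f"{s + p - 1}-{e}\\3" for _, s, e in blocks)
            lines.append(f"DNA, pos{p} = {ranges}")
        return lines
    raise ValueError(f"unknown scheme: {scheme}")
```

With six genes, the three schemes yield 6, 3, and 18 partition lines respectively, matching the counts given in the text.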
Bayesian analyses were performed on the nucleotide dataset using the program MrBayes v3.1.2. Prior distributions for parameters in the Bayesian analyses were: topology, uniform; branch lengths, exponential (λ = 10); alpha parameter of the Γ distribution, uniform (0.05,50.0); pinv, uniform (0,1); κ, beta (1.0,1.0); R-matrix, Dirichlet (1,1,1,1,1,1); base frequencies, Dirichlet (1,1,1,1). Each run included two independently started chains, each beginning with a different, randomly chosen tree. From each starting tree, four related MCMC chains (one cold and three incrementally heated) were run. The temperature parameter λ was set to produce chain-swap frequencies in the range of 10–30%. Posterior distribution estimates were based on sampling the cold chain every 250 generations. Initial runs were allowed to stop at 10^7 cycles if the percent standard deviation among bifurcation split probabilities for the two separate chains was less than 0.01. In those cases, the first 5 × 10^6 cycles were discarded as burn-in. Additional longer runs were performed if needed, with the first 50% of samples discarded as burn-in. The dataset was divided into 12 partitions, two for each gene. For each gene, the first- and second-position sites were combined into a single partition, and the third-position sites in a separate partition. Analyses were conducted with each partition assigned either the HKY85+Γ or the GTR+Γ model, with the exception of the 1st- plus 2nd-position site partition for CB1, which used either HKY85+I or GTR+I, because of the extremely low number of variable sites in that partition. The GTR models were preferred by Bayes' Factors, while the HKY85 models were favored by the Bayesian information criterion (BIC). However, both models gave very similar results, as did a separate set of analyses with the CB1 1st- plus 2nd-position sites excluded.
Bayesian analyses under the CAT +Γ4 model were performed using the program Phylobayes 2.1c [34, 62]. Both for the DNA and protein datasets, two chains were run for 100,000 cycles and trees were sampled every 100 cycles after the first 25,000 cycles. As recommended in Phylobayes, the maximum difference in bipartition frequencies between the two chains was below 0.1, indicating a "good run" (for DNA, maxdiff = 0.051; for protein, maxdiff = 0.044). The phylogenetic trees obtained under the CAT +Γ4 model are available in Additional file 5.
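The convergence criterion above compares bipartition frequencies between the two chains. The following is a minimal sketch of that comparison, assuming each chain has already been summarized as a mapping from a bipartition label to its posterior frequency; the function name and the label format are hypothetical, and this is not the Phylobayes implementation.

```python
def max_bipartition_diff(freqs_chain1, freqs_chain2):
    """Largest absolute difference in bipartition frequency between two chains.

    Each argument maps a bipartition label to its frequency (0 to 1) among
    the sampled trees of one chain. A bipartition absent from one chain is
    treated as having frequency 0 in that chain.
    """
    labels = set(freqs_chain1) | set(freqs_chain2)
    return max(
        (abs(freqs_chain1.get(b, 0.0) - freqs_chain2.get(b, 0.0)) for b in labels),
        default=0.0,
    )
```

Under the threshold quoted in the text, a value below 0.1 would be read as an acceptable run.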
The ML tree was compared to several constrained topologies using various likelihood-based tests as implemented in the program CONSEL v0.1i. Eleven alternative topologies were considered. 1 – The best tree placing the Ctenohystrica at the base of Rodentia. 2 – The best tree placing the mouse-related clade at the base of Rodentia. 3 – The best tree placing Myodonta at the base of the mouse-related clade. 4 – The best tree placing Anomaluromorpha at the base of the mouse-related clade. 5 – The best alternative that does not support monophyly of Chinchilloidea+Octodontoidea. 6 – The best alternative that does not support monophyly of Cavioidea+Erethizontoidea. 7 – The best tree placing Phiomorpha at the base of the Hystricognathi. 8 – The best tree placing Caviomorpha at the base of the Hystricognathi. 9 – The best alternative that does not support monophyly of the mouse-related clade. 10 – The best alternative that does not support monophyly of the squirrel-related clade. 11 – The best alternative that does not support monophyly of the Ctenohystrica. The best alternatives were built using constrained ML heuristic searches. Each search was conducted starting with an NJ tree, using the TBR branch-swapping option, and the parameters of the unconstrained ML tree. Site-wise log-likelihoods were computed with PAUP* using the parameters of the best ML tree.
Following the approach of Rodriguez-Ezpeleta et al., fast-evolving sites were determined according to their site-wise rates calculated with the program Rate4Site using the Tamura-Nei substitution model with 16 discrete rate categories used to approximate the Gamma distribution. Rates were computed for three topologies: the ML tree topology obtained as described above (i.e., the topology with the squirrel-related clade at the base), and the two alternative topologies (the mouse-related clade at the base and the Ctenohystrica at the base). All other nodes were identical between the topologies. Nucleotide sites were classified according to their average rate over the three topologies. Rates were normalized so that the average rate across all sites was 0. Rates ranged from -0.698 to 3.989. The fastest-evolving sites were then progressively removed to create nine datasets: all sites; sites with rates ≤ 3.5; sites with rates ≤ 3.0; sites with rates ≤ 2.5; sites with rates ≤ 2.0; sites with rates ≤ 1.5; sites with rates ≤ 1.0; sites with rates ≤ 0.5; and sites with rates ≤ 0. Bootstrap values were computed with the program Treefinder. For each dataset, the ML tree and parameters were estimated by Treefinder under the GTR+Γ+I model. These ML parameters were then used to perform a bootstrap analysis using 500 replicates.
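The progressive removal of fast-evolving sites can be sketched as follows. This is an illustrative sketch under stated assumptions, not the pipeline used in the study: it assumes the normalized per-site rates for each topology are already available as plain lists (parsing of Rate4Site output is not shown), and the function name and the use of `None` to mean "all sites" are hypothetical conventions.

```python
def sites_by_rate_threshold(rates_per_topology, thresholds):
    """Select alignment columns whose mean rate does not exceed each threshold.

    rates_per_topology: list of per-site rate lists, one list per topology,
                        all of the same length.
    thresholds: iterable of cutoffs; None means "keep all sites".
    Returns (mean_rates, datasets), where datasets maps each threshold to
    the list of retained 0-based site indices.
    """
    n_topologies = len(rates_per_topology)
    n_sites = len(rates_per_topology[0])
    # Average each site's rate over the topologies, as described in the text.
    mean_rates = [
        sum(topology[i] for topology in rates_per_topology) / n_topologies
        for i in range(n_sites)
    ]
    datasets = {}
    for cutoff in thresholds:
        datasets[cutoff] = [
            i for i, rate in enumerate(mean_rates)
            if cutoff is None or rate <= cutoff
        ]
    return mean_rates, datasets


# The nine cutoffs listed in the text, from all sites down to rates <= 0.
CUTOFFS = [None, 3.5, 3.0, 2.5, 2.0, 1.5, 1.0, 0.5, 0.0]
```

Each list of retained indices would then be used to extract the corresponding columns of the alignment before bootstrapping.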
Removing sites based on their evolutionary rate does not distinguish sites with few character states (e.g. sites with only purines or only pyrimidines) from sites with all possible character states (e.g. sites with all four bases). However, if two positions evolve under the same substitution rate, we can expect the one with more character states to be less homoplasious. Consequently, we used consistency index (CI) values as a measure of the level of homoplasy. It is worth noting that CI and site-specific rates are weakly correlated (Additional file 6). For each of the three topologies described above, the CI of each site was computed using PAUP* and sites were classified according to their average CI values across the three topologies. Seven datasets were constructed by eliminating some sites based on their CI values (retaining all sites; retaining sites with CI > 0.1; retaining sites with CI > 0.2; retaining sites with CI > 0.3; retaining sites with CI > 0.4; retaining sites with CI > 0.5; and retaining sites with CI > 0.6). Bootstrap values were then computed as described above.
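To make the per-site measure concrete, the sketch below computes a consistency index for one alignment column on a fixed tree, using the usual definition CI = m / s, where m is the minimum conceivable number of changes (number of observed states minus one) and s is the number of changes required on the tree, counted here with Fitch parsimony. It is a simplified illustration, not PAUP*'s implementation: the tree is assumed to be strictly binary and written as nested tuples of taxon names, ambiguity codes and gaps are not handled, and the function names are hypothetical.

```python
def fitch_steps(tree, states):
    """Fitch parsimony on a binary tree given as nested 2-tuples of leaf names.

    states maps each leaf name to its character state at one site.
    Returns (candidate_state_set, number_of_changes) for the subtree.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch_steps(tree[0], states)
    right_set, right_steps = fitch_steps(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_steps + right_steps
    # No shared state: one change is required at this node.
    return left_set | right_set, left_steps + right_steps + 1


def consistency_index(tree, states):
    """CI = (observed states - 1) / parsimony steps; None for constant sites."""
    _, steps = fitch_steps(tree, states)
    if steps == 0:
        return None  # constant site: 0/0, CI is undefined
    return (len(set(states.values())) - 1) / steps
```

A site whose states change only once on the tree has CI = 1, while a site with two states that requires two changes has CI = 0.5, indicating homoplasy.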
Since the third codon position generally evolves at the highest rate, we performed two types of ML analyses that were intended to reduce the impact of this high rate. Phylogenetic trees were reconstructed either without third codon position or using RY coding at the third codon position. The program MODELTEST 3.07 was used to infer the model of sequence evolution and phylogenetic trees were reconstructed using PAUP* as described above.
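The two treatments of third codon positions described above can be sketched as simple sequence transformations. The sketch assumes an in-frame sequence whose first character is a first codon position; the function names are hypothetical. Under RY coding, purines (A, G) are recoded as R and pyrimidines (C, T) as Y, so that only transversions remain visible at the recoded positions.

```python
def ry_code_third_positions(seq):
    """Recode third codon positions as R (A/G) or Y (C/T).

    First and second positions are left unchanged. Characters other than
    A, C, G, T at third positions (gaps, ambiguity codes) are kept as-is.
    """
    recoded = []
    for i, base in enumerate(seq):
        if i % 3 == 2:
            upper = base.upper()
            if upper in ("A", "G"):
                recoded.append("R")
            elif upper in ("C", "T"):
                recoded.append("Y")
            else:
                recoded.append(base)
        else:
            recoded.append(base)
    return "".join(recoded)


def drop_third_positions(seq):
    """Return the sequence with every third codon position removed."""
    return "".join(base for i, base in enumerate(seq) if i % 3 != 2)
```

For example, `ry_code_third_positions("ATGCCA")` gives `"ATRCCR"`, and `drop_third_positions("ATGCCA")` gives `"ATCC"`.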
It has been shown that using codon-based models can improve phylogenetic inference from protein-coding genes. This method is computationally intensive, thus precluding the computation of bootstrap values for a dataset of 49 species. Instead, maximal log-likelihoods of the three topologies involving basal rodent relationships were compared using the program CONSEL. For this comparison, site-wise log-likelihoods of each topology under two codon models were computed using the program SELECTON version 2.3. The first model, M8, allows for positive selection operating on the protein. The second model, M8a, does not allow for positive selection. In both codon analyses, four Gamma-rate categories were used to account for among-site rate variation.
Heterotachy (covarion) is defined as the variation over time of the evolutionary rate at a given site, and it was shown to have an impact on phylogenetic inferences [e.g., ]. Consequently, the covarion model developed by Galtier, extended to amino acids, was used to compute the site-wise log-likelihoods of the three topologies involving basal rodent relationships. Four Gamma-rate categories were used to account for among-site rate variation. The C++ code for computing site-wise log-likelihoods under the covarion model is available from the authors upon request.
The support for each topology under the codon and covarion models was compared to the support under the JTT model. The program Rate4Site was used to obtain the site-wise log-likelihood of the three topologies tested, and these site-wise log-likelihoods were used as input to CONSEL.
We thank F. Catzeflis (curator of the tissue collection of the Institut des Sciences de l'Evolution de Montpellier), the University of Alaska Museum, the Cleveland Metroparks, the Louisiana State University Museum of Natural Science, the Texas Cooperative Wildlife Collection, the Museum of Southwestern Biology, the Rotterdam Zoo and the Zoological Society of Philadelphia, as well as all donors and collectors of tissue: T. Arrizabalaga, M. R. Banta, M. Bebehani, C.J. Bonar, J. Cook, M. Corti, D. L. Dittmann, D. Eilam, C. G. Faulkes, P. Gouat, L. Granjon, J. Hayes, R.L. Honeycutt, R. Hoyt, J. Jarvis, N. Kronfeld-Schor, M. Mensink, P. Perret, D. Schlitter, D.S. Semple, M.S. Springer, R. Stuebing, J. Terkel, J. Trupkiewicz, E. Pelé, J.-C. Vié, and V. Volobouev. We would also like to thank Frida Belinky for running the CAT model analysis, Lily Kredy-Farhan for designing primers, and Naomi Paz for revising the English text. OP is a fellow of the Converging Technologies scholarship program. This work was supported by the United States-Israel Binational Science Foundation (BSF; 2004-407 to DH and RWD) the National Science Foundation (NSF; DEB-0075306 to RWD) and the High Council for Scientific and Technological Cooperation between France-Israel (to DH and TP).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.