### Taxonomic sampling and molecular biological methods

Our study is based on a total of 25 arvicoline species representing 14 of the 28 genera and 7 of the 10 tribes according to [33]. For the genus *Microtus*, we included 12 specimens belonging to different subgenera or species whose taxonomic status is questionable [35, 36] (see Table 1). We used as outgroups, representatives of two lineages phylogenetically close to the Arvicolinae, *Mesocricetus auratus* (Cricetinae), and *Phyllotis darwinii* (Sigmodontinae) [42, 71, 30]. Two other taxa belonging to different subfamilies of Muridae, *Rattus norvegicus* (Murinae) and *Meriones* sp. (*M. shawi* + *M. unguiculatus*; Gerbillinae), were chosen as more distant outgroups.

All ethanol-preserved arvicoline samples were stored in the mammalian tissue collection of the Institut des Sciences de l'Evolution de Montpellier [72]. Total DNA was extracted using the QIAamp DNA mini kit (Qiagen). Part of the GHR exon 10 was amplified and sequenced using the primers GHR5 forward (5' GGCRTTCATGAYAACTACAAACCTGACYTC 3') and GHR6 reverse (5' GAGGAGAGGAACCTTCTTTTTWTCAGGC 3'), and GHR3 forward (5' GACTTTATGCYCARGTRAG 3') and GHR4 reverse (5'-CTYACYTGRGCATAAAAGTC 3'). PCR conditions were 95°C 5 min, followed by 95°C 30 sec, 61°C 30 sec, 72°C 1 min (5 times), then 95°C 30 sec, 59°C 30 sec, 72°C 1 min (5 times), followed by 95°C 30 sec, 57°C 30 sec, 72°C 1 min (5 times), then 95°C 30 sec, 55°C 30 sec, 72°C 1 min (5 times), and then 95°C 30 sec, 53°C 30 sec, 72°C 1 min (20 times), with a final extension at 72°C 5 min.

The amplification and sequencing of the CYB were conducted using primers MVZ05 and MVZ14 [73] and additional internal ones MVZ16 [73] and H8 [74]. PCR products for GHR and CYB were purified from 1% agarose gels using Amicon Ultrafree-DNA columns (Millipore) and sequenced on both strands using automatic sequencing (Big Dye Terminator cycle kit) on an ABI 310 (PE Applied Biosystems).

All taxa included in our study were represented by both CYB and GHR sequences. A 921 bp segment of the exon 10 of GHR was sequenced for 22 specimens and the complete CYB (1140 bp) for 7 specimens. The arvicoline sequences new to this study have been deposited in the EMBL data bank, and we also used previously published sequences when available (see Table 1).

### Sequence alignment and phylogenetic analyses

Sequences were manually aligned with the ED editor of the MUST package [75]. Non-sequenced positions as well as introduced gaps were treated as missing data in subsequent analyses. Heterozygotic bases found in GHR sequences were coded following the IUPAC nucleotide ambiguity code.

Phylogenetic analyses of CYB and GHR alignments were conducted under the maximum likelihood (ML) and Bayesian methods, using PAUP* [76] version 4b10, PHYML [77] version 2.4.4, and MrBayes version 3.04 [78]. Moreover, MrBayes provided the opportunity to run analyses assuming different models of sequence evolution for each predefined partition, thus permitting the parameters estimated for each model of sequence evolution to be directly compared among codon positions and between genes. When CYB and GHR sequences were combined, sequences from different specimens of the same species were sometimes used. We assumed that the phylogeographic structure detected in some arvicoline species [45, 79] was negligible as compared to the level of genetic divergence between the distinct *Microtus* subgenera and even arvicoline tribes here compared.

The program Modeltest [80] version 3.06, was used to determine the sequence evolution model that best fits our data using the Akaike Information Criterion (AIC). This program examined the fit of 56 models, with either a proportion of invariable sites (I), a gamma distribution of among-sites variation of substitution rates (Γ), or both (I + Γ). The best-fitting substitution models were TrN93 + Γ + I [81] for the GHR data set, and GTR + Γ + I [82] for the CYB data set. However, to run the same model of sequence evolution under PHYML and MrBayes, GTR was chosen for all phylogenetic analyses (the optimal GHR topology was not model-dependent, as it appeared that both GTR and TrN93 topologies were identical). To allow a fair comparison of α estimates of Γ-shape among genes and partitions, we did not use a proportion of invariable sites, but rather assigned eight discrete Γ categories (Γ_{8}). The Γ distribution allows some sites to evolve at a very low rate, and the incorporation of a fraction of invariable sites does not necessarily lead to a significant increase in likelihood [83].

To avoid excessive calculation times, our PAUP* ML analyses were conducted in two steps. First, we estimated ML parameters on a neighbor-joining (NJ) starting tree. Second, a ML heuristic search was conducted by Tree Bisection Reconnection (TBR) branch swapping to identify the optimal tree under these constrained GTR + Γ_{8} parameter estimates. This tree was re-used for a new round of parameter estimation/branch swapping, and the procedure was repeated until there was a stabilization of both topologies and parameters. The stability of nodes was estimated by ML bootstrap percentages (BP_{ML}) [84], computed by PHYML after 100 replicates of site resampling, with BioNJ starting trees. Because of its rapidity, PHYML was preferred over PAUP* for bootstrap analyses. To assess the amount of phylogenetic signal contained within an individual partition (each codon position of each gene), 500 replications of ML bootstrapping were also independently performed for each partition.

Bayesian analyses were performed with one distinct GTR + Γ_{8} model per gene and codon position, with unlinking base frequencies, GTR, and Γ parameters. Metropolis-Coupled Markov chain Monte Carlo (MCMCMC) sampling was conducted during 10,000,000 generations with five incrementally heated chains. We used Dirichlet priors for base frequencies (1,1,1,1) and for GTR parameters (1,1,1,1,1) scaled to the G-T rate, a Uniform(0.05,50.00) prior for the Γ shape, and an Exponential(10.0) prior for branch lengths. Bayesian posterior probabilities (PP) were computed from trees sampled every 100 generations, after removing the 50,000 first trees as the "burn-in" stage. In order to discriminate between moderately and strongly supported nodes – for which initial PP were superior to 0.95 – we also calculated bootstrapped Bayesian posterior probabilities (BP_{bay}) as suggested by [85] and [86]. Due to computing time limitations, BP_{bay} were only computed for the combined data set (GHR + CYB). First, 100 bootstrap pseudo-replicates were independently generated from each of the six partitions (the three CYB and the three GHR codon positions) using the SEQBOOT program 3.6a2.1 [87] of the PHYLIP package. Second, for each of the 100 concatenated bootstrap data sets, MCMCMC sampling of trees was performed as previously described for the original data under the six GTR + Γ_{8} partitioned model, except that trees were sampled every 100 generations for only 500,000 generations. To maximize the probability that the chains reached stationarity in each bootstrap replicate, one-half of the 5,000 trees sampled from the posterior probability distribution was systematically removed as the burnin [86]. BP_{bay} resulted from the overall 50% majority rule consensus of the 500,000 saved trees.

Following [34], sequences were also analyzed at the amino-acid level. We respectively used the JTT + Γ + I and mtREV + Γ + I ML models of protein evolution for GHR and CYB sequences as implemented in PHYML.

### Sensitivity analyses

We performed sensitivity analyses using PAUP* to assess the relative contribution of the various model parameters to the log-likelihood increase when more complex models are considered. To take the evolutionary properties of each codon position into account, nucleotide sites were categorized into each of the three codon positions. The procedure was conducted separately for CYB and GHR sequences. All sets of parameters – base composition (BC), substitution rate matrix (GTR), heterogeneity of substitution rate among sites (Γ), and branch lengths (BL) – were estimated independently for each codon partition. AIC values were compared to assess the significance of likelihood variation between global and partitioned models. For instance, to test the contribution of BC, only BC parameters were computed for each partition whereas other parameter values (GTR, Γ, BL) were fixed for the whole gene.

### Evaluation of the saturation of nucleotide substitutions

The nucleotide saturation of the phylogenetic markers was assessed graphically according to the procedure of [88], by plotting the number of observed differences as a function of the ML inferred number of substitutions for all 351 pairwise comparisons for 27 sequences (the partial CYB sequence of *Lasiopodomys mandarinus* was removed). The inferred number of substitutions was estimated from the ML tree as the sum of the branch lengths linking two terminals. The level of saturation was estimated by the slope (S) of the linear regression between the observed and inferred substitutions. Substitutions saturation is evidenced when the number of inferred substitutions increased, whereas the number of observed differences remained constant. We performed saturation analyses independently for each phylogenetic marker and for each codon position. ML branch lengths were obtained for each partition using PAUP* and by enforcing the topology as identical to the combined CYB + GHR tree.

### Variable Length Bootstrap

To compare the phylogenetic resolving power of GHR and CYB at varying taxonomic levels, we used the variable-length bootstrap (VLB). In this method, bootstrap support is estimated as a function of a variable number of resampled characters [89]. For each data set, nucleotide sites were resampled to generate bootstrap pseudomatrices of 100, 250, and until 5000 characters with increasing steps of 250 sites. All bootstrap searches were then performed using ML analyses with PAUP*.