Mitochondrial genome sequencing in the aurochs
Various preliminary biochemical tests indicated that a phalanx bone sample excavated in Vado all'Arancio rock shelter (Massa Marittima, Grosseto, Italy, Figure 1) and ascribed to Bos primigenius sp. by morphological and morphometrical analysis , was well-preserved for molecular analysis (Table S5, Additional File 8 and Figure S4, Additional File 9). Three lines of evidence suggested that endogenous biomolecules were likely to be well-preserved [41, 42]. Firstly, the degree of racemization for three amino acids (aspartic acid, alanine and glutamine) was low (Table S5, Additional file 8), and this has been suggested to be compatible with DNA survival [42, 43], though concerns have been raised about the utility of this approach [44, 45]. Second, we calculated that, under our reaction conditions, the estimated copy number of a 180 bp mtDNA target was greater than 5000 copies per reaction (5127 ± 912; standard curve fit values: Slope = -3.391, Y-intercept = 45.34, R2 = 0.977, Efficiency = 0.97). This value is much larger than the threshold under which consensus sequence determination is thought to be seriously affected by nucleotide modifications present in the ancient DNA molecules . Third, using a system of three overlapping primer pairs we obtained the same sequence for the hypervariable segment of the mtDNA control region from different bone fragments and in multiple PCRs (Figure S4, Additional File 9).
The Vado all'Arancio rock shelter is a well known Italian site with a clear stratigraphy, and was occupied only for a short period of time during the Late Palaeolithic. The presence of specific artifacts and the radiocarbon dating of associated faunal remains (11,300 ± 150BP obtained in Rome for R1333, and 11,600 ± 130 obtained in Lyon for Ly3415; average of 11,450 years BP) unequivocally date Vado all'Arancio rockshelter to a pre-Neolithic context preceding the spread of domestication . This further confirms that the specimen belongs to an aurochs, rather than an early Holocene Bos taurus.
We applied a combined typing strategy for mtDNA genome sequencing. Multiplex PCR amplifications of short overlapping fragments covering the whole mtDNA genome were used to enrich the total extracted DNA, prior to 454 pyrosequencing of the pooled enriched material. A multiplex PCR approach has previously been used for the reconstruction of ancient mithocondrial genomes [1, 47, 48, 17]. Compared with a singleplex PCR strategy, multiplex PCR has two advantages: first, a small quantity of starting material can be used to retrieve multiple DNA fragments simultaneously, and second, limited manipulation of the original template reduces the risk of sporadic contamination by exogenous DNA.
We designed 130 overlapping primer pairs on the basis of Bos taurus reference sequence (BRS) . Primer pairs (designed to produce 155 to 230-bp long DNA sequences) were arranged into four sets (A,B,C,D) with overlapping pairs in different sets (see Additional File 10). Each multiplex PCR set was performed on each different extract following stringent criteria for ancient DNA analysis and sequence authentication . The amplification success for each primer pair was checked with secondary singleplex PCRs. All the amplification products were then diluted to equal concentration, pooled, and used as a substrate for the FLX library preparation and pyrosequencing reaction using the Roche FLX/454 technology. After identifying sequence reads with the PCR primers at their termini, the primers were masked and the resulting portion mapped against the reference sequence [NCBI accession: V00654] using the Amplicon Variant Analyzer application (AVA, Roche) with default parameters. Finally, starting from the AVA multi-alignments, we generated consensus sequences using a home-made Python script, which assigned the most frequent base at each position. A large number of mtDNA amplicons were sequenced to allow the consensus sequence to be determined accurately, without laborious cloning and sequencing of PCR products. Amplicons with no or low coverage (< 10 reads) after the first round of sequencing were pooled again and pyrosequenced separately.
Standard ancient DNA singleplex PCRs, cloning and Sanger sequencing approaches were used to fill remaining gaps and resolve ambiguous sequences after the first assembly of the total reads generated with the 454 technology. Particular attention was paid to insertions or deletions in homopolymer regions that are problematic for the 454 pyrosequencing technology , and to potential misincorporations due to degradation in the original template (see Additional File 10).
Finally, to check the reproducibility of the results, we replicated 16 selected amplicons in three different laboratories in Italy, Sweden and Australia (see Table S3, Additional File 5, and the Additional File 10). Amplicons that showed polymorphic positions with respect to the BRS sequence, or with critical phylogenetic markers, were replicated at Uppsala University (Sweden) and at the Innovation and Research Centre (San Michele all'Adige, Trento, Italy). Control region fragments and a small portion of the 12S gene were sequenced at Adelaide University (Australia).
Additional details regarding the sample and the molecular procedures are reported in the Additional File 10.
Phylogenetic analysis and demography
Phylogenetic inferences were performed on the coding region (15440 bp) of the mtDNA alignment. The model of nucleotide substitution was selected by means of the MODELTEST software , according to the Akaike information criterion (AIC). The model resulting with the lowest AIC score was the GTR+γ+I (general time reversible model with heterogeneity in substitution rates plus a proportion of invariable sites). We performed Bayesian phylogenetic inference under two prior models: a) an unrooted, strictly bifurcating model, and b) an unrooted model allowing for polytomies (following the algorithm proposed in ). Both analyses were performed using the software PHYCAS , and the input file for this analysis is provided as Additional File 11.
The first model is similar to the standard unrooted model implemented in MRBAYES 3.1.2  with the difference that a hyperprior parameter is used to model the rate of the exponential prior distribution on branch lengths (while in MRBAYES such parameter is fixed and set by the user). We used as hyperprior distribution an inverse gamma with mean 1.0 and variance 10.0, following . In the second model, a polytomy prior  needs to be set. This value indicates the relative strength of a less resolved topology compared to a more resolved one: a value equal to 1 gives same prior to polytomic or strictly bifurcating topologies, while values greater than 1 places more weight on polytomic topologies. We set the polytomy prior to e (Nepero's number), following . PHYCAS uses slice sampling to update continuous parameters and the LOCAL move of Larget and Simon  to update topology and branch lengths. In the second model that allowed for polytomies, a RJ-MCMC scheme was used because each iteration the number of parameters can change (as the number of branches vary due to the presence of polytomies).
PHYCAS was run twice for 200,000 cycles for both models. A cycle on PHYCAS roughly corresponds to 100 iterations in MRBAYES. We sampled one cycle every 100, after discarding the first 100,000 cycles as burn-in. We compared the traces of the two runs to check for convergence. After the burn-in was removed, we compared both models using Bayes Factors. The Bayes Factor was computed as twice the difference between the log of the marginal likelihoods, which were approximated using the harmonic mean as suggested in . To check if the heating procedure could produce different results (which would in turn implies that a single MCMC chain could not converge properly to the correct posterior distribution), we ran the first model using MRBAYES. The analysis was run twice using four incrementally heated chains with the default temperature, over 2*108 generations long with a 20% burnin. Convergence was checked by examining the generation plot visualized with TRACER  and we computed the potential scale reduction factor  with the sump function in MRBAYES. The resulting tree topology as well as the posterior probabilities of the various nodes were almost identical to the results obtained with PHYCAS, suggesting that the posterior distribution was correctly explored using PHYCAS even without the heating procedure.
Molecular dating is subject to many sources of errors. Indeed, usually topology and branch lengths are not known a priori and they need to be estimated from data with the associated uncertainty in the estimation procedure. In addition, the calibration of molecular clock relies on a known divergence time which is often assumed as a fixed value. The importance of a correct calibration point (usually a fossil divergence) has been well acknowledged in the phylogenetic literature, though it is difficult to obtain a reliable estimate that can be readily translated into an accurate molecular clock rate. The Bos spp. phylogeny is no exception to this rule, and several different paleontological data have been used to calibrate the bovid mitochondrial clock. For this reason, we employed three divergent, though widely used calibration points and performed the following analyses for all of them. The three points were: a split between B. taurus and B. bison at 1 MYA , at 2 MYA  and at 5 MYA .
We first performed phylogenetic dating on all our dataset employing a bayesian algorithm implemented in BEAST . The input file for BEAST is provided as Additional File 12. We choose a Yule prior  on topology and branch lengths and constrained the root of our phylogeny (which coincides with the split B. taurus-B. bison) to one of the three calibration points above (i.e., 1MY, 2MY, and 5MY). Since the new B. primigenius specimen is dated at more than 11,000 years B.P., we excluded it from this analysis as the Yule prior has not been adapted to handle serial data. We ran each of these analyses twice for 20,000,000 MCMC, with a thinning value of 1,000. Since these models differ only in the age of one node, we could have just estimated the scaled branch lengths and used the three calibration points to translate them into years. The three analyses can be however considered as an additional check for convergence. We removed the first 10% MCMC iterations as burnin. Under this phylogenetic dating approach, we computed the TMRCA of several nodes of interest and, at the same time, estimated the posterior distribution of the molecular clock rate.
We then selected only Italian cattle and performed bayesian based population genetic inference and molecular dating under two coalescent priors: constant population size and the Bayesian skyline plot , with gene genealogies divided into three internode groups and effective population size function fitted with a piecewise constant function of population size change. Both analyses were run using BEAST software for 20,000,000 MCMC iterations with a 10% burn-in and a thinning interval of 1,000. The input file for BEAST is provided as Additional File 13. As local domestication and/or introgression from aurochs into domestic breeds in Italy cannot be excluded [24, 27–29], these analyses were perfomed both without the new Bos primigenius sequence and including it (i.e., assuming that Italian aurochs were direct ancestor of modern Italian breeds). The constant population size model and the Bayesian skyline were compared by means of Bayes Factor, computed as described above. In these analyses, the mutation rate was fixed as the median values estimated from the previous BEAST phylogenetic analyses. These values, corresponding respectively to 1, 2, and 5 million years of divergence between B. taurus and B. bison, are 3.3*10-8, 1.6*10-8, and 6.6*10-9. Population sizes were estimated assuming a generation time of 7 years .