Sample collection, DNA extraction, amplification and sequencing
Epidermal tissue was obtained by remote biopsying of free-ranging animals  and stored in 20% dimethyl sulphoxide (DMSO) saturated with salt .
Genomic DNA was extracted from epidermis using the Qiagen DNeasy (Qiagen DNeasy, Valencia, CA, USA) kit following the manufacturer's guidelines. The mitochondrial genome was amplified in 3-5 overlapping amplicons (dependent on DNA preservation) using previously published long-range PCR primers  and primers designed specifically for this study (see Additional file 3). Additional sequencing primers were also designed for gap filling using conventional Sanger sequencing at the commercial service offered by Macrogen (Seoul, South Korea). Each 25 μl PCR contained 1 μl extracted DNA, 1× PCR buffer, 2 mM MgSO4, 0.4 μM of each primer, 100 nM mixed dNTPs and 0.1 μl High Fidelity Platinum Taq (Invitrogen, Carlsbad, California). PCR amplifications were performed using an MJ Thermocycler with a 4 minute activation step at 94°C, followed by 35 cycles of 94°C for 30 seconds, 62°C for 30 seconds, 68°C for 6 minutes 30 seconds, followed by a final extension period of 72°C for 7 minutes. For amplifications under 6 kb in size the extension time was reduced to 3-5 minutes.
The amplified PCR products were purified using an Invitek PCRapace purification kit (PCRapace, Invitek, Berlin, Germany) and quantified using a NanoDrop spectrophotometer (NanoDrop Products, Wilmington, DE) to determine DNA concentration used to balance and pool amplicons in equimolar ratios. Length of fragment, ratio of fragment lengths per individual, and DNA concentration was taken into account when balancing the samples. Samples were either individually tagged according to Meyer et al.  and built into shotgun sequencing libraries following the manufacturer's instructions (454 Life Sciences, Branford, CT), or grouped into sets of 8-10, where within each sample set individual libraries were made to contain a different 10 bp multiplexing identifier (MID) tag, allowing libraries to be combined prior to emulsion PCR. Sequencing libraries were quantified by qPCR  and pooled at equimolar concentrations. Library pools were divided among regions on GS FLX sequencing runs, using either LR70 or Titanium chemistry (454 Life Sciences). Sequencing data was parsed into individual extractions and identifier tags were removed using a custom tag-removal Perl script (M. Rasmussen, unpublished, University of Copenhagen).
Sequences were assembled using gsMapper (Roche Applied Science, Indianapolis, IN, USA) and aligned by eye using SE-AL v2.0a11 (A. Rambaut, University of Oxford), while Geneious (Biomatters Ltd., Auckland, New Zealand) and Sequencher v4.8 (Genes Code Corporation, Ann Arbor, MI) were used to check coverage and sequence reliability. Conspicuous indels, base position differences, and differences in homopolymeric regions were double-checked and sequences with higher coverage were generally given preference. In order to ensure that the data was not affected by the erroneous incorporation of nuclear pseudogenes (numts) we visually assessed the recovered sequences for the presence of stop codons or frame-shift mutations in the aligned protein-coding genes. We observed no evidence that numts might be present in the data. This may be explained by a combination of (a) the general difficulty with PCR amplifying long amplicons, requiring relatively high levels of template for successful amplication, and thus (b) the fact that mtDNA template copy numbers are much higher than those for nuDNA templates, leading to preferential mtDNA over nuDNA amplification.
A total of 35 mitogenome sequences were used in the analyses, representing 21 species (Table 1). Of these, 18 mitogenomes were amplified and sequenced for this study and 17 mitogenomes attained from Genbank [17, 30, 48, 49], which included 4 outgroup sequences (narwhal, Monodon monoceros; harbor porpoise, Phocoena phocoena; Yangtze river dolphin, Lipotes vexillifer; Amazon river dolphin, Inia geoffrensis). The 18 generated sequences consist of 5 species (pygmy killer whale, Feresa attenuata; melon-headed whale, Peponocephala electra; Irrawaddy dolphin, Orcaella brevirostris; Australian snubfin dolphin, Orcaella heinsohni; rough-toothed dolphin, Steno bredanensis) whose mitogenomes had not been previously sequenced prior to this study.
A single representative mitogenome from each of the 21 species was used for the initial Bayesian phylogenetic analysis and divergence date estimation. This step was taken so that a speciation prior could be used for the tree topology and node times. Sequences were aligned and the 2 rRNA and 12 protein-encoding genes (excluding ND6) were used to form a data set comprising 13,958 sites. Stop codons were removed from all genes and the control region was excluded from analysis due to saturation, repetitive sequences, and alignment ambiguities. The resulting alignment was divided into four partitions: first, second and third codon sites of the protein-coding genes (3,792 bp per partition), and rRNA genes (2,582 bp). A comparison of Bayesian information criterion values in Modelgenerator  were used to find the optimal time-reversible substitution model for each partition. This criterion has been found to perform well in relation to other criteria used in evolutionary model selection . The selected models were GTR+I+G for first and third codon sites, HKY+I+G for second codon sites, and TN93+I+G for the rRNA partition. In all cases, rate variation among sites was modelled using a gamma distribution with six categories . There was little variation in the base frequencies across taxa (see Additional file 4).
The Bayesian phylogenetic analysis was performed using BEAST v1.6 . An uncorrelated lognormal relaxed-clock model was used to allow rate variation among branches . A Bayes-factor analysis indicated that this model received decisive support in comparison to a strict-clock model. The four data partitions shared the same relaxed clock but were allowed to have different relative rates. An exponential prior with a mean of 1/3 was used for the standard deviation of the lognormal distribution of rates, and a Yule prior was specified for the tree topology and relative divergence times. To enable the estimation of absolute divergence times in the tree, four calibrations based primarily on fossil calibrations, along with estimated divergence dates from published studies [27, 28, 55] (see Additional file 5), were incorporated into the analysis. The calibrations were implemented in the form of uniform prior distributions for the ages of the four nodes, and monophyly was enforced on the clades defined by these four nodes.
Posterior distributions of parameters, including the tree topology and divergence times, were estimated by Markov chain Monte Carlo (MCMC) sampling. Samples were drawn every 5,000 MCMC steps over a total of 50,000,000 steps. The first 10% of samples were discarded as burn-in. Convergence to the stationary distribution and acceptable mixing were investigated using the diagnostic software Tracer v1.5 (Rambaut and Drummond, 2007, University of Oxford). From the set of posterior samples, the tree with the highest product of clade credibilities was identified and the branch lengths were rescaled to match mean posterior estimates.
Additional phylogenetic analyses were performed in order to examine the effect of data partitioning. First, the analysis was repeated without data partitioning, so that the protein-coding genes and the rRNA genes shared the same substitution model and mean evolutionary rate. Second, analyses were performed on three data sets in which sites were randomly assigned to the four data partitions mentioned earlier. Randomisation of sites (sampling without replacement) was performed using the Java application SiteSampler v1.1  Support for the different partitioning schemes was examined by assessing Bayes factors, calculated using a harmonic-mean estimator in the software Tracer v1.6 [57, 58].
A second set of phylogenetic analyses was performed on the full dataset (35 mitogenomes), including multiple representatives per species, coding missing data as N and using data partitioning for each codon position, rRNA, and control region. For each partition, the best model of molecular evolution that was compatible with models implemented in MrBayes 3.12  was selected using the Bayesian information criterion. The selected models were TN93+I+G for the first and second codon sites, HKY+I+G for third codon sites and the control region, and HKY+I+G for the rRNA genes. Similar models were recovered after RY-recoding a hypervariable part of the control region (nucleotides 15556-15588 according to the Globicephala macrorhynchus, Genbank accession number HM060334, reference mitogenome). Bayesian phylogenetic analysis was performed using the 5-partition datasets (with and without RY-recoding of the control region) using MrBayes 3.12 . Posterior distributions of parameters were estimated using two independent MCMC analyses, each comprising one cold and three heated chains. Samples from the posterior were drawn every 1,000 steps over a total of 10 million steps, which appeared to be sufficient to keep the average standard deviation of split frequencies below the critical value of 0.01. The first 25% of samples were discarded as burn-in. A majority-rule consensus tree was constructed from the posterior sample of trees. A supplemental set of analyses was performed with MrBayes 3.12 and PhyML 3.0  using an unpartitioned dataset under a GTR+I+G model as selected using ModelTest  and Akaike Information Criterion, with and without RY-recoding of the hypervariable segment of the control region. All analyses yielded identical tree topologies and similar node support values. For maximum-likelihood analyses, the strength of the phylogenetic signal was assessed via non-parametric bootstrapping with 250 pseudo-replicates. In addition, using Consel 0.1 k  and site-wise likelihood values recovered from PhyML analyses, levels of statistical support for alternative topologies were evaluated from the p-values of approximately unbiased tests, weighted or unweighted unilateral Kishino-Hasegawa and Shimodeira-Hasegawa tests, and bootstrap and Bayesian posterior probabilities for the selected topologies (Table 2). All trees were drawn using Dendroscope .