Analysis of Canis mitochondrial DNA demonstrates high concordance between the control region and ATPase genes

Background Phylogenetic studies of wild Canis species have relied heavily on the mitochondrial DNA control region (mtDNA CR) to infer species relationships and evolutionary lineages. Previous analyses of the CR provided evidence for a North American evolved eastern wolf (C. lycaon), that is more closely related to red wolves (C. rufus) and coyotes (C. latrans) than grey wolves (C. lupus). Eastern wolf origins, however, continue to be questioned. Therefore, we analyzed mtDNA from 89 wolves and coyotes across North America and Eurasia at 347 base pairs (bp) of the CR and 1067 bp that included the ATPase6 and ATPase8 genes. Phylogenies and divergence estimates were used to clarify the evolutionary history of eastern wolves, and regional comparisons of nonsynonomous to synonomous substitutions (dN/dS) at the ATPase6 and ATPase8 genes were used to elucidate the potential role of selection in shaping mtDNA geographic distribution. Results We found high concordance across analyses between the mtDNA regions studied. Both had a high percentage of variable sites (CR = 14.6%; ATP = 9.7%) and both phylogenies clustered eastern wolf haplotypes monophyletically within a North American evolved lineage apart from coyotes. Divergence estimates suggest the putative red wolf sequence is more closely related to coyotes (DxyCR = 0.01982 ± 0.00494 SD; DxyATP = 0.00332 ± 0.00097 SD) than the eastern wolf sequences (DxyCR = 0.03047 ± 0.00664 SD; DxyATP = 0.00931 ± 0.00205 SD). Neutrality tests on both genes were indicative of the population expansion of coyotes across eastern North America, and dN/dS ratios suggest a possible role for purifying selection in the evolution of North American lineages. dN/dS ratios were higher in European evolved lineages from northern climates compared to North American evolved lineages from temperate regions, but these differences were not statistically significant. Conclusions These results demonstrate high concordance between coding and non-coding regions of mtDNA, and provide further evidence that the eastern wolf possessed distinct mtDNA lineages prior to recent coyote introgression. Purifying selection may have influenced North American evolved Canis lineages, but detection of adaptive selection in response to climate is limited by the power of current statistical tests. Increased sampling and development of alternative analytical tools will be necessary to disentangle demographic history from processes of natural selection.


Background
Mitochondrial DNA (mtDNA) has been widely used in phylogenetic studies aimed at answering questions related to ecology and evolution. Its maternal inheritance, lack of recombination, high copy number, variable substitution rates across regions, high mutation rate compared to nuclear DNA, and role in energy production [1] make it an attractive genome for research that aims to understand species relationships, evolutionary history, and demographic patterns within both contemporary and historic contexts. The control region of the mitochondria can be particularly useful in understanding genetic relationships of recently diverged species because it contains hypervariable regions [2]. The high variation can, however, be problematic for inferring phylogenetic relationships due to mutation rate heterogeneity among nucleotide sites [3] and high rates of homoplasy [4,5] that can lead to ambiguous phylogeographic patterns [6]. Although not without its own peculiarities [1], coding regions of the mtDNA genome may help clarify genetic and spatial relationships of species inferred from the control region alone. Although all regions of mtDNA are linked and the entire mtDNA genome is inherited as a single molecule without recombination, coding and non-coding sections exhibit different mutation rates due to higher selective forces acting on genes that code for functional proteins [1]. Thus, different patterns of diversity, divergence, and phylogenetic clustering may be evident when comparing regions under divergent selective forces.
In addition to complementing control region phylogenies, analysis of mtDNA coding regions may help resolve geographical distribution patterns because coding regions of the mtDNA are under strong selection due to their fundamental role in energy and heat production [7,8]. There is growing evidence that purifying selection on mtDNA coding regions has been important in shaping the evolution and distribution of mtDNA [8][9][10][11]. Additionally, adaptive selection may be important [12] with climatic adaptation acting as an influential factor in mtDNA geographic distribution [13][14][15], although some have disputed the climate hypothesis [8,16,17]. Despite this controversy, most agree that the evolution of mtDNA is likely more complex than any single factor could account for. Recent research, however, suggests that the mtDNA ATPase genes in particular, may be influenced by positive selection [8].
To date, studies of North American Canis phylogenetics have relied heavily on the mtDNA control region to infer species relationships and evolutionary history [18][19][20][21][22][23][24][25][26][27]. Phylogenetic analysis of the control region provided initial evidence for a North American-evolved wolf, the eastern wolf (Canis lycaon), that shared an evolutionary history with red wolves (C. rufus) and coyotes (C. latrans) independent of grey wolves (C. lupus) that evolved in Eurasia and dispersed into North American approximately 300,000 years ago [19,20]. Since then, various research has added to the growing evidence supporting the eastern wolf as a distinct species [28], including genetic analysis of historic [20] and ancient [25] samples. Despite this, the lineage of the eastern wolf continues to be challenged, in part because evidence from coding regions that are under selection is lacking [29]. Indeed, phylogenetic research on wild canids has rarely ventured beyond the mtDNA control region.
Phylogenetic analysis of nuclear Canis markers is complicated by historic and recent hybridization between coyotes, eastern wolves, and gray wolves [24,25,27]. The advantage of studying a circular, nonrecombining marker like mtDNA is that patterns of evolutionarily independent lineages can be more easily identified under complicated demographic histories. Here, we use both the control region and the ATPase coding region of the mtDNA genome to infer phylogenetic relationships of the eastern wolf to other Canis species. High levels of hybridization and probable incomplete lineage sorting in eastern North American Canis species [24,25,27] also complicate species inferences from phylogenetic analysis of mtDNA. We, therefore, focus on divergence of phylogenetic clades to provide an understanding of the evolutionary relationships and the role that selection has played in the geographic distribution of Canis mtDNA haplotypes. Our study is the first to provide a substantial analysis of the mtDNA ATPase region in wild Canis populations. For clarity, a list of abbreviations is included at the end of the manuscript.

Diversity and phylogenetic analysis
After combining sequences obtained from our study with 9 from Genbank we analyzed 83 control region sequences (347 bp; h = 29) and 89 ATPase sequences (1067 bp; h = 26) ( Table 1). Genbank accession numbers for sequences generated in this study are HM755678-HM755718. Similar to other studies on the control region [18], we found a higher proportion of NW coyote clustering haplotypes per sample size at both the control and ATPase region (0.38, 0.33) compared to OW haplotypes (0.29, 0.26). (Haplotype assignments to specific samples are shown in Additional File 1: Summary of sample locations and mtDNA control region and ATPase region haplotypes). Overall, we found high concordance between results from the control region and those from the ATPase region despite the different selective forces acting on the two regions. The control region had high variability but only 1.5 times more variable sites and 1.6 times higher nucleotide diversity per site (Pi) compared to the ATPase region (Table 1). Within specific phylogenetic clades, however, Pi was noticeably higher in the control region than the ATPase region ( Table 1). The higher diversity in the control region is expected since it is not known to code for functional proteins [30] and has been identified as a mutational hot spot [2]. Recent work, however, suggests that mutational hotspots also occur in coding regions of the mtDNA [31] making the high variation reported here for ATPase genes somewhat less surprising. There were fewer ATPase8 than ATPase6 haplotypes, due to the shorter sequence length of ATPase8, and grey wolves from Sweden, Russian, Spain, and Canada all shared an ATPase8 haplotype (Additional File 1), suggesting that ATPase8 is a highly conserved gene region.
There was a wide geographic distribution of NW coyote-like ATPase haplotypes ( Figure 1). Cladograms from each genetic region had very similar topologies, which is indicative of the haplotype association of the linked regions. Eastern wolf sequences (Ccr13, Ccr12; Catp13, Catp16) clustered monophyletically with high (> 0.9) posterior probability within the NW clade, but apart from coyotes, whereas the putative red wolf sequence clustered among coyote sequences within the NW CR2coyI /NW ATP2coyI clade (Figures 2a, b). Similar clustering of the red wolf control region sequence among coyote sequences has been reported [22], but the haplotype is attributed to red wolves because it is not known to occur in non-hybridizing coyotes from western North America. A similar argument has been made for coyote-like eastern wolf haplotypes [27,32]. These geographic distinctions of coyote-like sequences in eastern and red wolves combined with evidence of coyote-like sequences in eastern wolves prior to European settlement in North America [25] provide evidence for incomplete lineage sorting within the NW lineages, although ancient (~11,000 years ago) hybridization during the Wisconsin glaciation is difficult to rule out. This, combined with extensive hybridization in eastern Canis populations [23,24,26,27,33,34], makes species designations of wild Canis difficult when based on phylogenetic inference from mtDNA alone. For example, a conspecific nature of eastern wolves and red wolves is Analysis was conducted in DnaSP v. 5.10 with nucleotide diversity calculated according to [60].   suggested by nuclear data [19], but is not demonstrated by analysis of mtDNA. These differences are not unexpected and do not undermine the conspecific nature of eastern wolves and red wolves because gene trees based on mtDNA are not necessarily indicative of specific species relationships, and discordance is often found when comparing mtDNA and nuclear genetic signatures [35]. Given the recent divergence of NW lineages (see below) these issues are not unexpected [35]. New approaches to phylogenetic analysis that utilize multiple loci, including nuclear genes, may help reconcile inferred Canis species relationships [35,36], although extensive hybridization will likely continue to plague contemporary species designations based on nuclear markers. Regardless, the distinction of the two eastern wolf haplotypes shown in both the control region and ATPase region is indisputable, providing clear evidence for the presence of a North American evolved wolf lineage, distinct from coyotes and grey wolves.

Divergence and TMRCA
Fixed differences occurred in all but one comparison between groups (NW ATP2coyI vs. NW ATP3coyII ) ( Table 2). Whereas comparisons of the percentage of nucleotide differences were similar at both mtDNA regions when comparing deep divergences (ie. NW vs. OW), differences were substantially lower for more recent divergence comparisons within NW or OW evolved sequences ( Table 2), indicative of possible homoplasy in the hypervariable sections of the control region [4,5].
Overall patterns of divergence (D xyJC ) were similar for the control and ATPase regions, but estimates were consistently lower for ATPase compared to the control region ( Figure 3), reflecting the lower observed mutation rate in ATPase. Based on the control region, divergence of eastern wolves from coyotes was approximately 3.0%, that of the red wolf sequence from coyotes was 2.0%, and eastern wolves compared to red wolves was 2.7% ( Figure 3). These values are consistent with those reported for a 238 bp control region fragment (3.2%, 2.3%, 2.1%, respectively) [19]. As expected, divergence estimates from the ATPase region were lower (0.9%, 0.3%, and 0.8%, respectively) but were proportionally consistent with that from the control region. These results complement the phylogenetic analysis and show further confirmation for the eastern wolf lineage. Differences between OW evolved North American and Eurasian wolves were 1.8% for the control region and 0.6% at the ATPase region, suggesting a closer relationship between these lineages than between NW lineages.
TMRCA estimates from both genetic regions were similar and suggest divergence of eastern wolf sequences from other North American evolved sequences at approximately 486,300 -548,400 years ago (ya), and North American grey wolves from Eurasian wolves at 465,600 -518,100 ya, although the 95% highest posterior densities (HPD) had wide intervals (Table 3). These values for NW divergence are higher than previous TMRCA estimates of 150,000 -300,000 ya [19] but results are not entirely inconsistent when HPD is considered. Our estimate of NW coalescence is closer to that proposed for coyotes of 420,000 ya in [18], although that study did not identify distinct eastern wolf haplotypes because it occurred prior to identification of eastern wolf sequences [19].

Selection
Tests of neutrality showed significance in NW lineages, particularly in the ATPase6 region; for ATPase8, only Fu's Fs identified departure from neutrality (Table 4). Over the past 100 years, grey wolves have experienced a genetic bottleneck [37], whereas coyote populations have expanded across North America [23,27], presenting the two extremes of demographic history. Significantly negative values of neutrality statistics can be indicative of selection but are also consistent with either population subdivision or expansion [38], and Fu's Fs is a particularly powerful test of population growth [39,40]. Therefore, it is difficult to disentangle selection from demographic history when interpreting neutrality tests [1,41] and care should be taken not to over interpret results from a rejection of the neutrality hypothesis. The known demographic history, however, suggests that results from neutrality tests presented here are indicative of the population expansion of coyotes. The high rate of synonomous substitutions (SS) in the ATPase genes, particularly in the ATPase6 region where  dN/dS ratios were all < 0.3, indicate that purifying selection has been influential in Canis mtDNA evolution, particularly in NW lineages (Table 5). This excess of synonomous substitutions in mtDNA coding regions is consistent with that previously found in wolves, coyotes and dogs [42] and in other mammals including humans [17,43] and mice [11]. This pattern is thought to mainly affect terminal branches of phylogenetic trees suggesting recently diverged groups show a stronger synonomous substitution signal [16], which is consistent with the pattern observed in our dataset. The ratio of nonsynonomous to synonomous substitutions (dN/dS ) was highest for OW lineages from northern climates in the ATPase6 region, but the same pattern was not observed for ATPase8 ( Table 5). The difference in dN/dS ratios between OW and NW lineages is not unusual given that mutations can be neutral in some lineages but non-neutral in others [44]. It has been proposed that amino acid variation in the ATPase genes may reduce the efficiency of oxidative phosphorylation, thereby decreasing ATP production while increasing heat production thus conferring a selective advantage for certain haplotypes in colder climates [13][14][15], although this hypothesis is not without controversy [8,16,17]. The higher dN/dS ratios for ATPase6 in OW evolved lineages in northern climates compared to NW lineages from more temperate and subtropical climates (Table 5) hints at a possible role for adaptive selection in response to climate for Canis mtDNA   evolution [13], but statistical tests of nonsynonomous and synonomous substitutions did not support adaptive selection in northern regions (Table 6). Although a wolf poisoning campaign in the NWT during the 1950s [45] may have decreased genetic diversity of NWT grey wolves somewhat, it is unlikely to have impacted haplotype diversity to the extent that targeted extermination efforts did in the US [37]. Further complicating interpretation is the recent suggestion that analysis of dN/dS ratios and the McDonald-Kreitman test are ineffective at detecting positive selection [46,47]. The discrepancies in our analysis and in the human literature does not necessarily mean that climatic adaptation has not influenced the evolution of mtDNA lineages, but suggests rather that mtDNA evolution is more complex than climatic variation alone can explain [8]. Although further investigation on a larger dataset with an alternative approach [15] may help clarify the role of climate in shaping the distribution of Canis mtDNA, and could provide valuable insight into the observed patterns of mtDNA introgression in eastern North American populations, it will remain difficult to separate a signal for adaptive selection from the dramatic and contrasting demographic histories of Canis populations. The development of novel analytical tools will be required to adequately disentangle natural selection from demographic processes.

Conclusions
Here, we provide important new data for phylogenetic inference of wolves and coyotes in North America. We know of no other study that reports as extensively on the ATPase region in wild Canis species. Similar patterns of diversity and divergence between the control region and ATPase regions suggest that evolutionary patterns can be inferred from non-coding regions of mtDNA. Overall phylogenetic concordance between the control and ATPase regions suggests that the control region can be an informative marker for inferring gene trees when dealing with recent divergence. Of particular importance is the monophyletic clustering of eastern wolf sequences under a new Bayesian analytical approach, thereby providing further evidence for a distinct North American evolved wolf, independent of coyotes and grey wolves, that inhabited the temperate forests of eastern North America prior to colonization by European settlers. In addition, eastern wolf sequences are further diverged from coyotes than the red wolf sequence. This does not necessarily imply that the red wolf is not a distinct species but rather supports the assignment of coyote-like sequences as eastern wolf specific.
Understanding the role that selection has had on mtDNA evolution and distribution is a more difficult task. Although the high rate of synonomous substitutions provides evidence that purifying selection may have influenced the evolution of Canis mtDNA, especially in NW lineages, the role of adaptive selection in response to climate is more ambiguous. Adaptive selection may play a role in the geographic distribution of OW mtDNA sequences in North America, but alternative analytical approaches will no doubt be required to adequately test this hypothesis.
Based on the human literature, however, it seems probable that climate is influential in adaptive selection of mtDNA. Further research on adaptive selection of Canis mtDNA is particularly important because it provides a mechanism by which eastern wolf and coyote like mtDNA have introgressed extensively into greyeastern wolf hybrids in northern Ontario and the Great Lakes region [24,26,27,32,48] and why eastern wolf mtDNA is prevalent in eastern coyote populations [23,27,33,34]. It is important to note that mtDNA introgression can occur with little or no obvious nuclear introgression [49][50][51], and in some cases completely replace mtDNA in the absence of apparent nuclear introgression [50,52]. It is possible, therefore, that introgression of NW mtDNA lineages into grey-eastern wolf hybrids in eastern North America, and introgression of eastern wolf mtDNA into eastern coyotes, reflect chance or rare events on which selection then acted creating species discordance between mtDNA and the nuclear genome [1]. Overall, this research provides a new and important framework with which to study patterns of mtDNA introgression and geographic distribution in species where taxonomy has been blurred by incomplete lineage sorting and/or hybridization.  Figure 2b.
Higher dN/dS values in samples from more northern climates suggest a selective advantage at colder temperatures.

Sequencing
Previously extracted DNA from 83 individuals from various North American Canis species (eastern wolves, grey wolves, red wolves, and coyotes) were selected to represent a broad spectrum of geographic regions and mtDNA haplotypes. Extraction methods are provided elsewhere [19,53]. Nine additional sequences analyzed in [42] were obtained from GenBank (Accession Numbers:

Diversity and phylogenetic analysis
Measures of DNA sequence variation within and among groups, including number of haplotypes (h), variable sites, nucleotide diversity per site (Pi), and average number of nucleotide differences between groups were calculated for the control region, the full ATPase sequence, ATPase6, and ATPase8 in the software program DnaSP v5.10 [54]. Calculations for ATPase6 and ATPase8 were limited to a subset of groups analyzed with the full 1067 bp sequence because the number of variable sites was lower in the specific gene regions, especially in ATPase8, due to the shorter fragment size. Phylogenetic analysis was conducted under a Bayesian framework implemented in the program BEAST v. 1.4.8 [55]. We combined 3 independent runs, each with 10,000,000 MCMC iterations while sampling from the chain every 1000 steps. We used a relaxed uncorrelated lognormal molecular clock [56] with a substitution rate of 3.8 × 10 -8 /year for the control region [37,48] and 1.5 × 10 -9 for the ATPase region based on the median rate for substitution at the cytochrome b region in carnivores [57]. Based on Bayesian Information Criteria (BIC) generated in ModelGenerator [58] we used an HKY [59]

Divergence and TMRCA
Divergence between clades was estimated by comparing the number of fixed differences, the average number of nucleotide differences, and the average number of nucleotide substitutions per site with a Jukes and Cantor correction (D xy_JC ) [61] calculated in DnaSP v5.10 [54].
We also compared divergence of the putative red wolf sequence with the eastern wolf clade and other sequences in the broader coyote clade. Time to most recent common ancestor (TMRCA) was estimated in BEAST software with parameters described above. Four sequences were selected at random to represent the main clades: DQ480503 (Ccr15; Catp09) represented the OW Eurasian lineage, CAN001806 (Ccr03; Catp03) represented the OW North American lineage, CAN004377 (Ccr12; Catp16) represented the eastern wolf lineage, and CAN000142 (Ccr26; Catp22) represented the coyote lineage. Tree calibration was done by setting the divergence distribution between OW and NW lineages based on fossil evidence at 1.5 million years ago and a standard deviation of 0.5 million years such that the 95% range would be 1 -2 mya [19,62].

Selection
For all the haplotypes identified in the ATPase phylogenetic tree (n = 25, dog excluded) we examined the ATPase6 (681 bp) and ATPase8 (204 bp) regions to test for selection. To test the hypothesis of neutral evolution, we used DnaSP v 5.10 [54] to calculate Tajima's D [63], Fu and Li's D*, Fu and Li's F* [64], and Fu's Fs [65] overall and within clades. Significance values for each test are based on the confidence limits of D, the critical values of D* and F* [64], or with 1000 replicates in the coalescent simulations approach of DnaSP v. 5.10 for Fu's Fs. We also tested for neutrality in a group of OW sequences from northern climates (Russia, Sweden, and Northwest Territories, Canada).
To determine whether purifying or adaptive selection influenced the evolution of Canis mtDNA, we compared the number of nonsynonomous substitutions (NSS) to synonomous substitutions (SS), and the ratio of nonsynonomous substitutions per nonsynonomous site to synonomous substitutions per synonomous site (dN/dS) in ATPase6 and ATPase8 overall, within clades, and in the group of OW sequences from northern climates. To test whether climate may have influenced Canis mtDNA distribution, we compared dN/dS at ATPase6 and ATPase8 for all NW sequences compared to a) all OW sequences and b) compared to OW sequences from northern climates. If climate were a factor in the adaptive selection of mtDNA, one would expect higher dN/dS ratios in northern climates compared to more temperate regions [13]. Significance of differences was determined with a two-tailed Fisher's exact test on the raw data (NSS and SS) with the online calculator available at http://faculty.vassar.edu/lowry/VassarStats.html (accessed October 22, 2009) and a McDonald-Kreitman test [66] conducted in DnaSP v. 5.10.

Additional material
Additional file 1: Sample information and haplotype summary. Canis mtDNA control region and ATPase haplotype summary with comparisons of control region haplotypes found in this study to previously published literature.