Mitochondrial phylogenomics of the Bivalvia (Mollusca): searching for the origin and mitogenomic correlates of doubly uniparental inheritance of mtDNA

Background Doubly uniparental inheritance (DUI) is an atypical system of animal mtDNA inheritance found only in some bivalves. Under DUI, maternally (F genome) and paternally (M genome) transmitted mtDNAs yield two distinct gender-associated mtDNA lineages. The oldest distinct M and F genomes are found in freshwater mussels (order Unionoida). Comparative analyses of unionoid mitochondrial genomes and a robust phylogenetic framework are necessary to elucidate the origin, function and molecular evolutionary consequences of DUI. Herein, F and M genomes from three unionoid species, Venustaconcha ellipsiformis, Pyganodon grandis and Quadrula quadrula have been sequenced. Comparative genomic analyses were carried out on these six genomes along with two F and one M unionoid genomes from GenBank (F and M genomes of Inversidens japanensis and F genome of Lampsilis ornata). Results Compared to their unionoid F counterparts, the M genomes contain some unique features including a novel localization of the trnH gene, an inversion of the atp8-trnD genes and a unique 3'coding extension of the cytochrome c oxidase subunit II gene. One or more of these unique M genome features could be causally associated with paternal transmission. Unionoid bivalves are characterized by extreme intraspecific sequence divergences between gender-associated mtDNAs with an average of 50% for V. ellipsiformis, 50% for I. japanensis, 51% for P. grandis and 52% for Q. quadrula (uncorrected amino acid p-distances). Phylogenetic analyses of 12 protein-coding genes from 29 bivalve and five outgroup mt genomes robustly indicate bivalve monophyly and the following branching order within the autolamellibranch bivalves: ((Pteriomorphia, Veneroida) Unionoida). Conclusion The basal nature of the Unionoida within the autolamellibranch bivalves and the previously hypothesized single origin of DUI suggest that (1) DUI arose in the ancestral autolamellibranch bivalve lineage and was subsequently lost in multiple descendant lineages and (2) the mitochondrial genome characteristics observed in unionoid bivalves could more closely resemble the DUI ancestral condition. Descriptions and comparisons presented in this paper are fundamental to a more complete understanding regarding the origins and consequences of DUI.

Results: Compared to their unionoid F counterparts, the M genomes contain some unique features including a novel localization of the trnH gene, an inversion of the atp8-trnD genes and a unique 3'coding extension of the cytochrome c oxidase subunit II gene. One or more of these unique M genome features could be causally associated with paternal transmission. Unionoid bivalves are characterized by extreme intraspecific sequence divergences between gender-associated mtDNAs with an average of 50% for V. ellipsiformis, 50% for I. japanensis, 51% for P. grandis and 52% for Q. quadrula (uncorrected amino acid p-distances). Phylogenetic analyses of 12 protein-coding genes from 29 bivalve and five outgroup mt genomes robustly indicate bivalve monophyly and the following branching order within the autolamellibranch bivalves: ((Pteriomorphia, Veneroida) Unionoida). Conclusion: The basal nature of the Unionoida within the autolamellibranch bivalves and the previously hypothesized single origin of DUI suggest that (1) DUI arose in the ancestral autolamellibranch bivalve lineage and was subsequently lost in multiple descendant lineages and (2) the mitochondrial genome characteristics observed in unionoid bivalves could more closely resemble the DUI ancestral condition. Descriptions and comparisons presented in this paper are fundamental to a more complete understanding regarding the origins and consequences of DUI.

Background
Mitochondrial DNA (mtDNA) is the only extranuclear genome in animal cytoplasm. Located in the matrix of mitochondria, metazoan mtDNA is normally a small circular DNA molecule about 14-16 kilobases (kb) long usually encoding the same 37 genes ( [1,2]; but see [3] for exceptions). Typically, all mtDNAs in the zygote come from the oocyte and even though evidence for occasional paternal leakage has been reported [4,5], animal mtDNA is thought to strictly follow maternal inheritance [6]. This clonal inheritance coupled with the successive cell divisions that represent sequential bottlenecks for the mitochondrial population [6][7][8] result in an essentially homoplasmic state for mtDNA. An extreme exception to the paradigm of strict maternal inheritance of animal mtDNA (SMI) is found in three bivalve lineages (i.e., the orders Mytiloida, Unionoida and Veneroida), which possess an unusual system termed doubly uniparental inheritance of mtDNA (DUI) (see [9,10] for reviews).
In DUI-possessing organisms, distinct gender-associated mitochondrial DNA lineages coexist: a female-transmitted (F) genome and a male-transmitted (M) genome. Under DUI, female bivalves transmit their mitochondria (carrying F mtDNA) to both sons and daughters, as in SMI, but males pass on their mitochondria (via sperm carrying M mtDNA) to only sons (e.g., [11] but see [12]). At the organismal level, male bivalves with DUI are thus heteroplasmic and contain both M and F genomes. In male somatic tissues, the F genome predominates while in male gonadal tissues, the M genome is predominant [13] and it appears to be the exclusive type in sperm [14]. In females, both somatic and gonadal tissues typically contain the F genome, but the occasional presence of a small amount of the M genome has been demonstrated in somatic tissues and ovaries of some species [12][13][14][15][16].
The broad taxonomic distribution of DUI within the Bivalvia (e.g., [17][18][19][20][21][22][23][24][25][26]) reinforces the idea that it evolved once in an ancestral bivalve lineage, from standard uniparental inheritance, and was lost in some descendant bivalve lineages (e.g., oysters and probably scallops) [23,27,28]. DUI could then be the ancestral condition for the Bivalvia, however, a more definitive statement to this effect rests on producing a more reliable bivalve phylogeny along with clarifying the distribution of DUI in additional bivalve lineages. Although many of the essential elements of DUI have been described, (i.e., distinct M and F lineages, heteroplasmy in males, rapid molecular evolution particularly of M types [17][18][19]21,22,24,[29][30][31], the current and/or historical function of DUI still remains a mystery. Comparisons of entire F and M genomes (as opposed to partial sequences of a few genes) will enable the characterization of potential gene content/organizational/functional differences between the M and F genomes, and will help to reconstruct the history of any possible recombination and/or gene translocation events in these distinct, gender-associated lineages.
To date, 15 complete or nearly complete F and M mtDNA genome sequences are available for species with DUI but these are numerically biased towards marine taxa (i.e., species from the Mytiloida and Veneroida) [32][33][34][35][36][37] (Table 1). While the vertebrate mitochondrial gene order is almost invariant, mollusks, and bivalves in particular, exhibit radical rearrangements of mitochondrial genes and extensive mtDNA variability at the intrageneric level [2,38,39]. For example, the two congeneric oyster species Crassostrea virginica and C. gigas, both lacking DUI [28], show broad differences in gene content and gene order with relocation of most tRNA genes [2,40]. At the species level, the extent of genome rearrangement between the two distinct gender-associated mitochondrial genomes appears to vary greatly among the three divergent bivalve lineages. In the Mytiloida, the gene order and content of F and M genomes from a species are conserved but both lack the gene for ATPase subunit 8 (atp8) and have a second tRNA gene for methionine (trnM) [31,34,36]. By contrast, in the marine clam Venerupis philippinarum, gene content differs between M and F genomes as we observe a gene duplication for the cytochrome c oxidase subunit II gene (cox2) in the F genome and an extra trnM in the M genome, and both genomes have a short atp8 gene (37 amino acids) the function of which is unclear [2,41]. In freshwater mussels, an M genome-specific 3' extension of the cytochrome c oxidase subunit II gene (Mcox2) has been clearly demonstrated [22,25,42]. This functional extension is a unique feature of unionoid M genomes and typically yields an~80% increase in gene length relative to the female-transmitted cox2 gene [43]. In unionoid bivalves, the presence of atp8 has been confirmed in both M and F genomes [2,37]. Also, the F genomes of Inversidens japanensis and Hyriopsis cumingii (both in the subfamily Gonideinae) exhibit a different gene order compared with the M genome of I. japanensis and the F genome of Lampsilis ornata ( [37]; Zheng RL and Li JL, personal communication) (see Table 1). Analysis of gene order for additional F and M mtDNA genomes from freshwater mussels will allow us to test the following alternative hypotheses: (1) the translocation of several genes as observed in the F genomes of I. japanensis and H. cumingii represents an idiosyncratic gene rearrangement unique to these species or to the subfamily Gonideinae or (2) that this gene arrangement is in fact a characteristic shared with other unionoid species' F genomes or other subfamily.
Having additional unionoid F and M genomes available for comparative analyses would also significantly illuminate investigations into the likely unique origin of DUI [26]. Analyses of morphological and molecular datasets indicate that unionoid bivalves, together with trigonioid bivalves, compose a monophyletic subclass, the Paleoheterodonta, [44][45][46][47][48][49]. The relative antiquity of this subclass within the Bivalvia is supported by the molecular sequence-based phylogenies, presented in , which suggest that the Paleoheterodonta are a product of an early cladogenic event in extant autolamellibranch (~suspension-feeding) bivalves. Given the hypothesized relatively basal position of unionoids in the bivalve phylogeny, their mt genomes could retain ancestral character states that are informative with respect to the initial mt genome duplication event (i.e., the formation of a distinct male-transmitted lineage in addition to a female-transmitted lineage) that led to the evolution of DUI.
In the present study, six new complete mitochondrial genomes, namely, the F and M genomes of the unionoid bivalves Venustaconcha ellipsiformis (Unionoida: Unionidae: Ambleminae: Lampsilini), Pyganodon grandis (Unionoida: Unionidae: Unioninae: Anodontini) and Quadrula quadrula (Unionoida: Unionidae: Ambleminae: Quadrulini), were compared with the available complete genomes of DUI species deposited in GenBank and their gene order, gene content and variation were analyzed. Additionally, complete bivalve mt genomes were phylogenetically analyzed to further test the hypothesized basal position of the Paleoheterodonta among extant autolamellibranch bivalves and to evaluate the evolutionary history of mitogenomic character state changes. The aim is to provide a context for comparisons of mt genomes among DUI and non-DUI bivalve lineages, and ultimately to identify the gene region(s) involved in the manifestation of DUI. Such descriptions and comparisons will contribute to a more complete picture of the evolution not only of the DUI system per se, but also of the factors involved in the near universal presence of SMI in animals.

Phylogenetic analysis
The majority-rule codon-based BI tree (Figure 1), derived from using concatenated sequences of mitochondrial protein-coding genes, is well resolved and very similar in topology to the best BI tree produced from analysis of amino acids as well as to the best nucleotide-and amino acid-based ML and parsimony trees (not shown). In summation, these trees clearly  * * * * Figure 1 Bayesian inference majority-rule tree of bivalve mt genome relationships based on an analysis using the M3 codon substitution model and a nucleotide alignment of 12 mitochondrial protein-coding genes (atp8 excluded). Numbers above an internal branch, from top to bottom, indicate nodal support values from BI, ML and MP nucleotide-based analyses, respectively. Numbers below an internal branch, from top to bottom, indicate nodal support values from BI, ML and MP amino acid-based analyses, respectively. Only nodal support values > 50% are presented. An asterisk above an internal branch indicates that all three nucleotide-based nodal support values are 100; an asterisk below an internal branch indicates that all three amino acid-based nodal support values are 100. Branch lengths reflect substitutions per site and the taxonomic and gender-specific transmission affiliations of the individual sequences are indicated at the right. All phylogenetic analyses strongly indicate that the unionoids represent the basal lineage for the bivalve taxa represented in this analysis.
(P. grandis) and 16,970 bp (Q. quadrula). The length differences are mainly due to the presence of a unique M genome-specific 3' extension of the cytochrome c oxidase subunit 2 (cox2) gene and longer unassigned/noncoding regions in M genomes. The A+T composition is very similar among the six newly sequenced genomes but higher than F and M I. japanensis (Table 2). Ribosomal RNA genes and protein-coding genes (except atp8) of M and F genomes of V. ellipsiformis, P. grandis and Q. quadrula are arranged identically but tRNA order differs among the analyzed genomes ( Figure 3). In unionoids, ten or eleven genes out of~37 are located on one strand and all the other genes on the opposite (Figure 3).

tRNA Histidine
In the three newly sequenced M genomes of this study, tRNA histidine (trnH) is positioned between nd5 and nd1 while in F genomes it is located between nd2 and nd3 ( Figure 3). The location of the trnH gene in the I. japanensis M genome has previously been identified between cox1 and cox2 (Okazaki M and Ueshima R, personal communication). This location also corresponds to Mcox2e. However, our reannotation of the I. japanensis M genome identifies trnH between nd5 and nd1 as in the three new M genomes. Examination of 41 entire mt genomes across the Mollusca allows us to group some classes of mollusks according to the position of trnH. For example, nd5-trnH-nd4 (encoded on the heavy strand) is the common organization in the Cephalopoda while nd4-trnH-nd5 (encoded on the light strand) and cox2-trnG-trnH-trnQ-trnL2-atp8 (encoded on both strands) are two most common arrangements found in the Gastropoda. Except for unionoids genomes, no common arrangement is found in bivalves and the arrangement of the trnH-containing region appears to be unique to each genus or species sequenced to date. Nd2-trnR-trnH-nd4 is found in the genus Crassostrea spp. (oysters) whereas the marine mussels Mytilus spp. possess the arrangement nd2-trnR-trnW-trnA-trnS-trnH-trnP-nd3. Because of the current uncertainty regarding molluscan phylogeny, a rigorous ancestral character state reconstruction is not possible. However, the arrangement observed in Mytilus spp. could be an inversion (+ strand reversion) of the  nd3-trnH-trnA-trnS1-trnS2-nd2 observed in the unionoid F genomes. In the four unionoid M genomes, trnH is located between nd5 and nd1 and the only other molluscan species with a similar location for its trnH (i.e., nd5-trnL-trnH-nd1) is the patellogastropod limpet Lottia digitalis. The gene order of the L. digitalis mt genome is the most divergent among all gastropod mtDNAs sequenced thus far [50].

Extension of the M cytochrome c oxidase subunit II gene
The three analyzed M genomes possess the unique 3' extension of the cytochrome c oxidase subunit II gene (Mcox2e) [22]. In the three newly sequenced M genomes, the extension is 187 codons (V. ellipsiformis and P. grandis) or 186 codons (Q. quadrula) in length while the I. japanensis extension is slightly shorter with 181 codons.

Atp8 gene
As in the F genome of L. ornata [37] and the reannotated F and M genomes of I. japanensis [2], the newly sequenced F and M genomes contain the 13 protein-coding genes commonly found in other animal mtDNAs.
Only the M genome of P. grandis appears to lack a complete atp8. In this species, a remnant of the atp8 gene that corresponds to the first 15 amino acids (MPQLSPVYWVSIFFL) of the protein, and that shows similarities with other atp8 genes sequenced in this study, has been identified between trnD and atp6 ( Figure 3). Those 15 amino acids are followed by a complete stop codon. After the stop codon, we also identified an open reading frame, in a different frame than the first 15 amino acids, which could correspond to the remainder of atp8.  V. ellipsiformis Figure 3 Gene maps of the M and F mitochondrial genomes of Venustaconcha ellipsiformis, Pyganodon grandis, Inversidens japanensis and Quadrula quadrula. Protein and rRNA genes are named as in the text while tRNA genes are abbreviated by the one-letter code of the corresponding amino acid (L1 = trnL (cua), L2 = trnL (uaa), S1 = trnS (aga), and S2 = trnS (uaa)). Genes positioned inside the plain line are encoded on the heavy strand and genes outside the line are encoded on the light strand. Atp8* (= genomes lacking full size atp8 gene). Black arrows on the V. ellipsiformis M genome indicate regions that differ between male-and female-transmitted genomes and the arrow on the I. japanensis F genome indicates the region with a gene order distinct from that of the other figured F genomes. The circular gene maps of the genomes were drawn by GenomeVx [113] followed by manual modification.

NADH dehydrogenase subunits 4 and 4L genes
Most unionoid mt genomes examined in this study have an overlap of 7 bp for subunits 4 and 4L of the NADH dehydrogenase complex (nd4 and nd4l). Two exceptions are the M genome of V. ellipsiformis, which contains a noncoding region of 120 bp between nd4 and nd4l, and the F genome of I. japanensis, which possesses one nucleotide between those two genes.

Base composition and codon usage
The base composition bias of an individual strand can be described by skewness [51], where CG-skew = (C -G)/(C + G) and AT-skew = (A -T)/(A + T). The strand encoding most of the proteins (including cox1) from the F and M genomes of all unionoid species has strong negative CG-and AT-skews (Table 2). Skews calculated for the opposite strand in all six genomes indicate complementary strand bias, with positive CG-and AT-skew values ( Table 2), an expected result since, for example, A-skew on one strand is usually balanced by T-skew on the other [52].

Transfer RNA genes
In all eight unionoid mt genomes, we identified all 22 tRNA genes according to their secondary structure features and their corresponding anticodons. Most have the potential to fold into a normal cloverleaf structure, although some do not have paired DHU arms, and a few others have mismatched bp. The putative cloverleaf secondary structures of unionoid tRNAs are available in the additional files (Additional file 1, Figures S1, S2, S3, S4, S5, S6, S7 and S8). The tRNA genes are~60-70 bp long and the mean GC content varies between 35.2% and 37.4%. In the eight mt genomes, most of the tRNA genes are located on the light strand; only trnH (Histidine) and trnD (Aspartate) are located on the heavy strand along with most of the protein-coding genes. As specified earlier, trnH has distinct localizations in F and M unionoid genomes ( Figure 3). For all mt genomes (except M I. japanensis), the DHU arm of trnS1 (Serine) is unpaired. Unpaired DHU arms are also observed for the second Serine trnS2 (tct) and Threonine trnT in the M genome of P. grandis, for the Arginine trnR and Threonine trnT in the M genome of V. ellipsiformis and for the Cysteine trnC and Threonine trnT in the M genome of Q. quadrula. DHU arm for the Lysine trnK in the F genome of Q. quadrula is also unpaired. No unpaired DHU arm has been observed in tRNAs of M I. japanensis.

Unassigned regions and putative control regions
Twenty-two or thirty-three unassigned regions were detected in the six genomes, with sizes ranging from 1 to 1196 bp. The three newly sequenced F genomes present a more compact arrangement than the three M genomes (Table 3). We observed the opposite trend in I. japanensis where the F genome contains more unassigned sequences (12.9% of the genome) than the M genome (8.9% of the genome). The abundance of unassigned sequences in both F and M genomes of the four unionoid species analyzed here is similar with the results observed in their mytiloid F and M counterparts (~10% of unassigned sequences) [31,34,36]. In comparison, the veneroid clam V. philippinarum M and F genomes have a higher proportion of unassigned sequences (i.e., > 15.8% for F and > 21.3% for M) (Okazaki M, Ueshima R, personal communication). Table 4 contains Intra-and interspecific comparisons of the concatenated nucleotide and amino acid sequences of 12 protein-coding genes (atp8 has been excluded and only species for which both F and M genomes are available were used) from the M-and F-transmitted mitochondrial genomes of unionoid bivalves. The smallest nucleotide and amino acid distances are observed between the two F genomes of the most closely-related species, i.e. V. ellipsiformis and Q. quadrula (P of 0.207 and P(aa) of 0.157). Overall, nucleotide and amino acid sequence divergences between all pairs of F genomes are considerably lower than between M pairs or between the M and F genomes of the same species. Early estimations of the nucleotide divergence between M and F genomes of unionoid bivalves were based on p-distances of partial cox1 sequences [20] and were around 28 to 33%. Our results show that these early estimations were conservative and the average uncorrected divergence between the F and M concatenated nucleotide sequences for the 12 mitochondrial protein-coding genes is 41% for V. ellipsiformis, 42% for Q. quadrula and 43% for P. grandis and I. japanensis. These very high intraspecific divergences are observed at both the DNA and protein levels ( Table 4).

Phylogenetic analysis
The seemingly anomalous difference in branching pattern between the M and F Mytilus genomes is due to an asymmetric introgression of M. edulis M mtDNA into the Baltic M. trossulus [36]. Nevertheless, the ingroup (= bivalves) topology in Figure 1 is consistent with other sequence-based phylogenetic reconstructions in that the Unionoida is basal to Pteriomorphia+Veneroida (e.g., [41,45,48]) thus reinforcing the hypothesis that the Unionoida is a relatively ancient bivalve lineage potentially harboring the ancestral characteristics of DUI. Although the phylogenetic hypothesis ((Pteriomorphia, Veneroida) Unionoida) is not typically portrayed as the best estimate of evolutionary relationships for these lineages at this time (e.g., [53]: Figure six point height), the statistical robustness of our phylogenetic analyses with regards to the Bivalvia (Figure 1) indicates that it should be seriously evaluated in future, higher level bivalve phylogenetic studies.
The three origins of DUI for the taxa included in this study ( Figure 2A) runs counter to the prevailing hypothesis of a single origin for this complex trait with subsequent reversals to SMI [23,[26][27][28] but it is not unexpected given the bias toward "DUI absence" stemming from the difficulties in confirming the presence of DUI (e.g., [20,25,26,54]). The complexity of the cytonuclear interactions involved in DUI and its very narrow taxonomic distribution are consistent with the hypothesis that the gain of DUI is a relatively rare event with subsequent losses being potentially more common. If a low ratio of rate of DUI gain to rate of DUI loss actually holds, then the use of the Dollo parsimony model [55,56] is more appropriate than the use of the MK1 model and the former indicates a single gain of DUI with three subsequent losses ( Figure 2B). A much more accurate understanding of the actual taxonomic distribution of DUI combined with a taxonomically expanded version of our robustly supported bivalve phylogeny (Figure 1) would allow a rigorous evaluation of the single vs. multiple origins of DUI hypotheses as well as the rates of DUI gain vs. loss.

Genome structural features
Overall, the most notable differences observed between M and F unionoid genomes are (i) the position of trnH, (ii) an inversion of the trnD and atp8 genes, (iii) the length of the cox2 gene (the M genomes possess a 3' extension of cox2) as well as (iv) a noncoding region between nd4 and nd4l in the M mtDNA genome of V. ellipsiformis (Figure 3). The female-transmitted mtDNAs of V. ellipsiformis, P. grandis and Q. quadrula are comparable in many respects to the F mtDNA of L. ornata, which is unique in gene arrangement relative to all other molluscan and metazoan mt genomes [37]. The mitochondrial gene order rearrangement in the F genomes of I. japanensis and H. cumingii (Unionoida: Unionidae: Ambleminae: Gonideini), i.e. the relative positions of the nd2, trnM to nd3 genes, appear to be unique to the Gonideini as neither L. ornata, V. ellipsiformis (Unionidae: Unionidae: Ambleminae: Lampsilini), Q. quadrula (Unionoida: Unionidae: Ambleminae: Quadrulini) nor P. grandis (Unionidae: Unionidae: Unioninae: Anodontini) show this rearrangement. We suggest that this distinct gene order in the F genomes of I. japanensis and H. cumingii resulted from a tandem duplication of the gene region followed by the deletion of segments of the duplicated gene region. Losses and gains of genes, gene rearrangements and unusually large amounts of duplicated or noncoding nucleotides are common in mollusk mitochondrial genomes [39,57,58]. When looking across other complete bivalve genomes, which include species from the orders Pectinoida, Ostreoida, Veneroida and Mytiloida (Organellar Genome Retrieval database OGRe; [59]), all genes are characteristically on the same strand ( Figure 4A). Among the Bivalvia, only in unionoids are the genes transcribed in both directions. Robust phylogenies are necessary to compare bivalve mt genome arrangements in an evolutionary context and our understanding of the phylogenetic relationships among the major lineages within the Bivalvia and among molluscan classes is still limited and controversial [60]. Nonetheless, based on bivalve phylogenetic analyses presented in Hoeh et al. [45], Giribet and Distel [48], Dreyer and Steiner [41] and herein (Figure 1), the "all-on-one-strand" phenotype likely represents a shared, derived characteristic that evolved once in the common ancestor of the Pteriomorphia and Veneroida with the unionoid model of genes on different strands representing the ancestral state for the Bivalvia ( Figure 4A).

tRNA Histidine
While the gene boundary nd3-trH on the heavy strand observed in unionoid F genomes is not shared by any other mollusk taxon studied so far, the gene boundary nd5-trnH observed in the four unionoid M genomes is also shared by nine species of cephalopods, the polyplacophoran Katharina tunicata and the gastropod Haliotis rubra. This particular gene boundary could represent the ancestral character state for the Mollusca. The tRNA genes are the most evolutionarily mobile elements of the animal mitochondrial genome and variation in mitochondrial tRNA gene organization have been found in multiple divergent taxa [61,62]. Rearrangement of tRNAs occurs frequently because their secondary structure facilitates their translocation [63]; alternatively, rearrangements can also result from a duplication event [64].

Extension of the M cytochrome c oxidase subunit II gene
All unionoid bivalve M genomes examined to date contain an Mcox2e region [25], which is not present in other DUI-possessing bivalve lineages nor, apparently, in any other animal mitochondrial genomes [43]. Structural characterization of the MCOX2e region predicted the presence of an interspecifically variable number of transmembrane helices [43], and immunohistochemistry-and immunoelectronmicroscopy-based analyses revealed that MCOX2e is expressed in sperm mitochondria [65] and is sub-cellularly localized to both inner and outer mitochondrial membranes [16]. The latter localization, which possibly "tags" the outer surface of unionoid M genomebearing mitochondria, could facilitate the differential segregation of the M genome-containing mt, derived from the fertilizing sperm, in male and female embryos (as observed in Mytilus; [66,67]). Consistent with the above, seasonal variation in expression profiles suggest that unionoid MCOX2e functions in reproduction [16,43,64].

Atp8 gene
In animal mtDNAs, the atp8 gene is the smallest protein-coding gene (≈ 50 to 65 aa) with only a few highly conserved amino acid residues. It encodes a protein subunit of the F 0 portion of the mitochondrial ATP synthase, which is the enzymatic complex that drives the phosphorylation of ADP to ATP. The ATP synthase comprises the F 1 catalytic domain situated in the mitochondrial matrix and the F 0 proton pore embedded in the mitochondrial inner membrane. Although the specific function of ATP8 is not yet known, in yeast, it is thought to play an important role in the assembly of the F 0 portion of ATP synthase and in determining ATP synthase activity (reviewed in [68]). In mammals, it is the most rapidly evolving mitochondrial protein-coding gene [69]. Atp8 has been lost independently from the mt genomes of several lineages including some bivalves [e.g., marine mussels possessing DUI [31,34,36] and oysters [70], secernentean nematodes [71], and platyhelminths [72]. Interestingly, all other mollusk species (i.e., gastropods, cephalopods, polyplacophorans and scaphopods) studied to date possess an atp8 gene [2,70]  philippinarum also possess a short putative atp8 gene (37 aa; [41]) and a potential remnant of the atp8 gene has been found in the eastern oyster Crassostrea virginica [40]. These observations and the phylogeny displayed in Figure 1 reinforce the hypothesis that unionoid mt genomes possess the molluscan ancestral character state (= the presence of atp8) and that two losses of atp8 in veneroids and another in the common ancestor of the Pteriomorphia could have occurred during bivalve phylogenesis ( Figure 4B). In the M genome of P. grandis, even though we identified an open reading frame that corresponds to a portion of atp8, the complete stop codon early in the sequence could yield a non-functional protein. Further analysis will be necessary to confirm or refute the existence of a functional atp8 in the M genome of this species. For now, the presence/absence of atp8 seems extremely labile across bivalve taxa, but this phenomenon does not appear to be related to the presence/absence of doubly uniparental inheritance.

NADH dehydrogenase subunits 4 and 4L genes
NADH dehydrogenase subunits 4 and 4L genes generally overlap or are adjacent to one another in animal mt genomes [57,73]. This is also the case for most unionoid mt genomes examined in this study except for the M genome of V. ellipsiformis where those genes are separated by a large noncoding region as well as for the F genome of I. japanensis, where both genes are separated by one nucleotide. We cannot exclude the possibility that this single nucleotide in the F I. japanensis sequence represents a sequencing error.
In vertebrates nd4 and nd4l are transcribed as one bicistronic mRNA, and are therefore localized together [73]. Moreover, in several mollusks (i.e., one scaphopod, some gastropods and all 12 cephalopods studied to date (Organellar Genome Retrieval database OGRe; [59]), these two genes are also adjacent to one another or overlap. However, in all other non-unionoid bivalve species studied to date (7 genera), nd4 and nd4l have several intervening coding genes (e.g., Crassostera gigas [40] and Hiatella arctica [41]). Again, the overlap observed between nd4-nd4l in most of the new unionoid genomes analyzed herein and the phylogeny displayed in Figure 1 support the hypothesis that unionoid mt genomes possess the molluscan ancestral character state and that the derived state, "intervening genes", occurred once in the common ancestor of pteriomorph and veneroid bivalves ( Figure 4C).

Base composition and codon usage
Although the exact mechanisms responsible for creating CG-and AT-skews like those observed in this study are still poorly understood, it is most likely created by the biases in mutational pressure owing to differences in the time spent as single-stranded DNA during both transcription and replication [74]). The negative CG-and AT-skews observed in the strand that encodes most of the proteins (i.e., cox1 -cox3 -atp6 -atp8 -nd4L -nd4 -nd5 -nd3 -cox2 (Mcox2e)) and that make it G+T rich is reflected in the use of synonymous codons (Additional file 2, Table S1). This is particularly evident at the third codon positions of protein-coding genes where Cand A-ending codons are used less frequently. Overall, TTT (Phe), TTG (Leu) and TTA (Leu) are the most frequent codons, a result consistent with other invertebrate mtDNAs [34]. Except for the stop codons TAA and TAG, TGC, CGC and ACG are among the least used codons. Of these, CGC is also the least common codon in the mtDNA of other mollusks [34].

Transfer RNA genes
Interestingly, unionoid bivalves do not possess an extra trnM, a situation that is present in both pteriomorph and veneroid bivalves [2,31,34]. The presence of an extra trnM within the latter two lineages could represent a character state that was (1) independently derived multiple times, (2) derived once with multiple independent secondary losses or (3) both derived and lost multiple times independently ( Figure 4D). The absence of a second trnM in all other molluscan species studied to date [2] reinforces the hypothesis that unionoids likely possess the molluscan ancestral character state for this character ( Figure 4D).

Unassigned regions and putative control regions
The presence of multiple unassigned regions is not uncommon in mollusk mitochondrial genomes [75,76] and is usually suggestive of molecular rearrangements [77]. The large unassigned region located between nd5 and trnQ in F genomes and between trnH and trnQ in M genomes has been identified as a potential heavy strand origin of replication (OH) [78]. Otherwise, F and M unionoid mitochondrial genomes appear to contain multiple and potentially bidirectional OL control regions [78].

Levels of intra-and interspecies sequence divergences
It should be stressed that the measured divergences between unionoid F and M genomes considerably surpass intra-or inter-species values reported in classical model systems used for the study of intergenomic coevolution [9,79,80]. From a nucleo-mitochondrial evolutionary perspective, the question of how male freshwater mussels can tolerate heteroplasmy characterized by such variability remains to be solved.
Among species with DUI, freshwater mussels exhibit the greatest nucleotide and amino acid divergences between their gender-associated mtDNAs. For example, the average uncorrected nucleotide divergence observed between the F and M concatenated sequences of the 12 mitochondrial protein-coding genes of the marine mussel Mytilus edulis is about 23% [31]. The smaller level of divergence observed between the M and F mtDNAs in Mytilus is likely associated with periodic "role-reversal" or "masculinization" events, which are characterized by an invasion of the male route of inheritance by an F-like genome that becomes transmitted through sperm as a standard M genome [14,27,81,82]. Specifically, the Flike, "recently-masculinized" M genome is only significantly different from a standard Mytilus F genome in that it possesses a so-called "standard M genome control region" and, as it's name implies, it is paternally transmitted (see [31] and [83] for details). Therefore, such masculinization events reset to zero the level of mitochondrial gene sequence divergence between the M and F genomes. Complete absence of masculinization events, for over 200 million years, can explain the considerably greater divergences between unionoid M and F mtDNAs [22,25,42]. It has been proposed that the unionoid M-specific extension of the cytochrome c oxidase subunit II gene represents such specialization of the unionoid M genome that recombination (i.e. the addition of an M type genome's control region to an F genome) leading to role reversals are no longer possible in this taxon [83].
According to our results,~50% amino acid divergence between unionoid F and M genomes can be tolerated by a species' nuclear environment without any major disruption of cytonuclear co-adaptation or impairment of mitochondrial function. This level of divergence could hardly be explained only by relaxation of selective pressure induced by a loss of metabolic function of M mtDNA since two recent studies have clearly shown the importance of M mtDNA gene products on sperm performance in Mytilus edulis [84,85]. Further characterization of the conserved versus radical amino acid changes in evolutionarily conserved or non-conserved positions of mitochondrial proteins will help to delineate the levels/types of divergence in mtDNA encoded peptides that can be tolerated by a species' nuclear genome.

Conclusions
The basal position of the Unionoida within the autolamellibranch bivalves (Figure 1) and the hypothesized single origin of DUI ( Figure 2B; [23,[26][27][28]) suggest that (1) DUI arose in the ancestral autolamellibranch bivalve lineage and was subsequently lost in multiple descendant lineages and (2) the DUI characteristics observed in unionoid bivalves could more closely resemble the DUI ancestral condition. We described the general features of eight mt genomes from unionoid bivalve species with the doubly uniparental mode of mitochondrial inheritance and highlighted several unusual characteristics of the M genomes, compared to their female-transmitted counterparts, e.g., the presence of Mcox2e and a novel localization of trnH. Based on the concatenated nucleotide sequences of 12 mitochondrial protein-coding genes, we determined an uncorrected amino acid pdistance between the M and F genomes of~50%. From a nucleo-mitochondrial functional perspective, the question of how male freshwater mussels can tolerate heteroplasmy characterized by such variability remains to be solved as does the function(s) of DUI. Finally, the presence of the Mcox2e is one important feature that distinguishes markedly, but not solely, the unionoid M from the F genomes, but also the unionoid M from all other DUI-possessing bivalves as well as all other metazoan mtDNAs. This suggests that it could have been a facilitator of the transition from SMI to DUI in the ancestral autolamellibranch (assuming a single origin of DUI) or ancestral unionoid (assuming multiple origins of DUI) lineage. If the former hypothesis is corroborated, Mcox2e was subsequently lost from the M genome in the ancestor of the Pteriomorphia+Veneroida. Irrespective of a single vs. multiple origins of DUI, the ancestral character state reconstructions in Figure 4 imply that significant mt genomic reorganization occurred in the Bivalvia subsequent to the divergence of the unionoid lineage. Studying additional complete bivalve mt genomes will give us the best hope of unraveling the origin(s) and function(s) of DUI as well as the origins and consequences of the unique mt genomic variation in the Bivalvia.
Long-PCR amplifications were performed in 50 μl reaction volumes using the QIAGEN LongRange PCR Kit in similar conditions to the manufacturers' suggestions: 1× LongRange PCR Buffer with 2.5 mM of Mg2+, 500 μM of each dNTP, 0.4 μM each primer, 2 U of LongRange PCR Enzyme Mix and~25 ng of template DNA. For the M and the F genomes of V. ellipsiformis and Q. quadrula, reactions were cycled at 85°C for 60 s, 93°C for 60 s, and 37 cycles of 93°C for 15 s, 53°C for 30 s and 68°C for 6 min for the short fragment or 11 min for the longest one. Thermal cycling conditions for the M and F genomes of P. grandis were as follows: 93°C for 3 min, followed by 35 cycles of 93°C for 15 s, 46-54°C for 30 s and 68°C for 7-12 min and a final extension at 72°C for 10 min. Each amplicon appeared as one abundant band of the appropriate size on an agarose gel. The resulting PCR products were gel purified using QIAGEN QIAquick Gel Extraction Kit. Following DNA quantification for each amplicon, the two amplicons (~5 μg from each) for each genome were pooled and then processed for direct sequencing in a single reaction by the 454 Life Sciences Massively Parallel Pyrosequencing Platform (whole genome sequencing protocol) of the McGill University and Genome Quebec Innovation Center.
For the M and the F genomes of V. ellipsiformis, amplifications were pooled and a total of 10,413 reads were produced to provide at least 45× coverage of the complete mitochondrial genomes. The sequences were then assembled into a single contig of 15,975 base pairs (bp) for the F genome and 17,174 bp for the M genome. For the M and the F genomes of Q. quadrula, amplifications were pooled and a total of 11,978 reads were produced to provide at least 66× coverage of the complete mitochondrial genomes. The sequences were assembled into a single contig of 16,033 bp for the F genome and 16,970 bp for the M genome. For the P. grandis F genome, draft assemblies were based on 14,794 total reads. The initial assembly of the 454 pyrosequencing data into two predominant contigs (~6.7 kb) and a small one (834 bp) was provided by 454 Life Sciences (Branford, CT, USA), and corresponded to a mitochondrial genome coverage of 115× and 437× respectively. The final assembly in one large contig of 15,848 bp was performed using SeqMan (DNAStar Inc., Madison, WI, USA). The complete M genome of P. grandis was generated from assembly of 7,652 successful sequence reads into a single contig of 17,071 bp which corresponded to an overall mitochondrial genome coverage of > 100×.
The complete sequences of the F and M mitochondrial genomes for Venustaconcha ellipsiformis, Pyganodon grandis and Quadrula quadrula can be accessed under the GenBank accession numbers FJ809753, FJ809752, FJ809754, FJ809755, FJ809750, and FJ809751, respectively.

Gene annotation and analysis
The complete F and M mitochondrial genomes for each species were initially analyzed with the NCBI Open Reading Frame Finder using the invertebrate mitochondrial code. Protein-coding and ribosomal RNA genes were annotated using DOGMA [88] and then aligned with the mtDNA genes annotated in GenBank using ClustalW [89]. The 5' and 3' ends of both rrnL and rrnS genes were assumed to be adjacent to the ends of bordering tRNA genes. Mitochondrial tRNA genes were identified and confirmed using a combination of programs: tRNAscan-SE 1.21 [90] with a COVE cutoff score of 0.1, DOGMA [89] and ARWEN [91]. Mitochondrial gene order comparisons were facilitated by the use of the OGRe web site at http://drake.physics. mcmaster.ca/ogre/index.shtml [59].
Basic sequence statistics and evolutionary distances among genes were performed using MEGA version 4.0 [92] and DnaSP version 4.0 [93]. To estimate evolutionary distance between pairwise comparisons, the following parameters were used: uncorrected nucleotide divergence (Pi = uncorrected nucleotide diversity), nucleotide divergence using the Jukes-Cantor (JC), Kimura two-parameter (K2P), and Tamura and Nei (TrN) models of nucleotide substitution. Estimated parameters also included total amino acid differences (Na), uncorrected amino acid distances (p(aa)), poisson-corrected amino acid distances (D), number of synonymous substitutions per synonymous site (K s ) and number of nonsynonymous substitution per nonsynonymous site (K a ) [94]. The Jukes-Cantor correction for multiple substitutions was applied. Strand asymmetry was measured using the formulas AT-skew = (A -T)/(A + T) and CGskew = (C -G)/(C + G) [51,95] and calculated with MEGA 4.0 [92] at fourfold redundant sites for each mitochondrial protein-coding gene.
Phylogenetic trees for the Bivalvia, using Bayesian inference (BI), maximum likelihood (ML) and maximum parsimony (MP), were constructed using concatenated nucleotide and amino acid sequences from 12 proteincoding genes (we excluded atp8 due to alignment issues and its apparent absence in multiple bivalve species). We used both Clustal W [96] and Dialign version 2.2.1 [97] for the alignments, with subsequent manual adjustments, and the amino acid alignment was used as a template to align the corresponding codons. Amino acids from 29 complete bivalve mitochondrial genomes and those from five outgroup species ( Table 1). Regions of ambiguous alignment were excluded prior to the phylogenetic analyses. The analyzed matrices had either 7,704 nucleotide positions or 2,568 amino acid positions and these files are available from the authors.
The codon-and amino acid-based BI analyses were conducted with Mr. Bayes (v. 3.1.2; [98,99]). The codonbased analysis invoked the M3 model [100] with two simultaneous runs of 5 million generations each (a total of 50,000 saved trees/run). The amino acid-based BI analysis invoked the variable rate "glorified GTR model" (see the MrBayes manual; [101]) with two simultaneous runs of 2.9 million generations each (a total of 29,000 saved trees/run). Both sets of BI analyses reached convergence (average standard deviation of the split frequencies was <0.01) and the burnin for each set was determined by reference to the log probability of observing the data × generation plot (codon-based BI run burnin = 4 million generations [= the last 10,000 trees/ run saved contributed to the majority-rule tree], amino acid-based BI run burnin = 1.9 million generations [= the last 10,000 trees/run saved contributed to the majority-rule tree]).
Codon and amino acid-based ML analyses were conducted with Garli (v. 0.96; [102]). The M3 model was used in the codon-based ML analysis which was set to use the observed nucleotide frequencies at each codon position separately. A non-parametric bootstrap [103] analysis was performed, using 300 replicates, to assess nodal support for the codon analysis-based trees. The program ProtTest http://darwin.uvigo.es/software/prottest.html was used to evaluate the best amino acid model for our data from those models available in Garli. Both the Akaike information criterion and Bayesian information criterion selected the WAG+F model [104] as the best available model which was therefore used in the ML amino acid analysis. A non-parametric bootstrap was performed, using 600 replicates, to assess nodal support for the a.a. analysis-based trees.
Maximum parsimony analyses were conducted with PAUP* [105]. The nucleotide-based MP analysis utilized equally weighted transversion parsimony (= only purines vs. pyrimidines were coded) and 1000 random addition runs for estimating the most parsimonious tree. A nonparametric bootstrap transversion parsimony analysis was run (with 1000 replicates) using 10 random addition runs per replicate. The amino acid-based MP analysis was carried out with equal weighting and 1000 random addition runs were used to estimate the most parsimonious tree. Lastly, an equally weighted parsimony, nonparametric, bootstrap analysis was run on the a.a. matrix (with 1000 replicates) using 10 random addition runs per replicate.
The estimation of ancestral mitogenomic character states and the presence/absence of DUI, based on the majority-rule codon-based BI tree, was carried out using the ML algorithm in Mesquite (v.2.6; [106]). An estimation of ancestral character states for the presence/ absence of DUI, using the best codon-based BI tree and Dollo parsimony, was done with MacClade (v.4.07; [107]). The asymmetry likelihood ratio test was used to determine whether the AsymmMK model was significantly better than the MK1 model (see the Mesquite manual). The MK1 model was used in all likelihood reconstructions because in all cases, the AsymmMK model was not a significantly better model, therefore we used the simpler model (the MK1 model has one less parameter). The use of a likelihood ratio test to calculate P-values for ancestral states is not possible because hypotheses regarding the likelihoods of each possible state at a given node are non-nested. Therefore, to make decisions regarding the significance of ancestral character states Pagel ([108] following [109]), recommended that ancestral character state estimates with a log likelihood two or more units lower than the best state estimate (decision threshold [T] set to T = 2) be rejected. Generally viewed as a conservative cutoff, this threshold has been used by numerous recent authors (e. g., [110][111][112]). For the data presented herein, this protocol ensures that all of the character states judged to be significant have proportional likelihoods at least 10 times greater than that of any other state.
Additional file 1: Supplemental figures. Figure S1. Inferred secondary structures of the 22 mitochondrial tRNAs from F Venustaconcha ellipsiformis, shown in the order they occur in the genome, beginning with trnH. Amino acid identities are given above each sequence. Figure  S2. Inferred secondary structures of the 22 mitochondrial tRNAs from M Venustaconcha ellipsiformis, shown in the order they occur in the genome, beginning with trnA. Amino acid identities are given above each sequence. Figure S3. Inferred secondary structures of the 22 mitochondrial tRNAs from F Pyganodon grandis, shown in the order they occur in the genome, beginning with trnH. Amino acid identities are given above each sequence. Figure S4. Inferred secondary structures of the 22 mitochondrial tRNAs from M Pyganodon grandis, shown in the order they occur in the genome, beginning with trnA. Amino acid identities are given above each sequence. Figure S5. Inferred secondary structures of the 22 mitochondrial tRNAs from F Inversidens japanensis, shown in the order they occur in the genome, beginning with trnH. Amino acid identities are given above each sequence. Figure S6. Inferred secondary structures of the 22 mitochondrial tRNAs from M Inversidens japanensis, shown in the order they occur in the genome, beginning with trnA. Amino acid identities are given above each sequence. Figure S7. Inferred secondary structures of the 22 mitochondrial tRNAs from F Quadrula quadrula, shown in the order they occur in the genome, beginning with trnH. Amino acid identities are given above each sequence. Figure S8. Inferred secondary structures of the 22 mitochondrial tRNAs from M Quadrula quadrula, shown in the order they occur in the genome, beginning with trnA. Amino acid identities are given above each sequence. Click here for file [ http://www.biomedcentral.com/content/supplementary/1471-2148-10-50-S1.PDF ] Additional file 2: Table S1. Codon usage in the female-and maletransmitted mitochondrial genomes of Venustaconcha ellipsiformis, Pyganodon grandis, Inversidens japanensis and Quadrula quadrula.