Detecting the symplesiomorphy trap: a multigene phylogenetic analysis of terebelliform annelids
© Zhong et al; licensee BioMed Central Ltd. 2011
Received: 3 August 2011
Accepted: 20 December 2011
Published: 20 December 2011
Conflicting signal is a potential problem for phylogenetic reconstruction. For instance, molecular data from different cellular components, such as the mitochondrion and nucleus, may be inconsistent with each other. Mammalian studies provide one such case of conflict where mitochondrial data, which display compositional biases, support the Marsupionta hypothesis, but nuclear data confirm the Theria hypothesis. Most observations of compositional biases in tree reconstruction have focused on lineages whose composition differs from that of the majority of the lineages under analysis. However, in some situations, the position of taxa that lack compositional bias may be influenced rather than the position of taxa that possess compositional bias. This situation is due to apparent symplesiomorphic characters and is known as "the symplesiomorphy trap".
Herein, we report an example of the symplesiomorphy trap and how to detect it. Worms within Terebelliformia (sensu Rouse & Pleijel 2001) are mainly tube-dwelling annelids comprising five 'families': Alvinellidae, Ampharetidae, Terebellidae, Trichobranchidae and Pectinariidae. Using mitochondrial genomic data, as well as data from the nuclear 18S, 28S rDNA and elongation factor-1α genes, we revealed incongruence between mitochondrial and nuclear data regarding the placement of Trichobranchidae. Mitochondrial data favored a sister relationship between Terebellidae and Trichobranchidae, but nuclear data placed Trichobranchidae as sister to an Ampharetidae/Alvinellidae clade. Both positions have been proposed based on morphological data.
Our investigation revealed that mitochondrial data of Ampharetidae and Alvinellidae exhibited strong compositional biases. However, these biases resulted in a misplacement of Trichobranchidae, rather than Alvinellidae and Ampharetidae. Herein, we document that Trichobranchidae was apparently caught in the symplesiomorphy trap, suggesting that in certain situations even homologies can be misleading.
This problem is common for morphological data, and several instances are known. One well-known annelid example is the position of Clitellata as sister to Polychaeta due to the lack of typical polychaete characters such as parapodia and nuchal organs. However, molecular data clearly place Clitellata within polychaetes [e.g., 2, 3, 19]. In theory, the symplesiomorphy trap is not restricted to morphological data, but can also apply to sequence data. However, studies addressing this problem in molecular data are scarce because detection of the trap is not straightforward. First, the misplaced taxa are not themselves affected by compositional biases or increased substitution rates. Second, support for monophyly of misplaced taxa is based on apomorphies for a higher taxonomic unit and hence not artificial. Third, knowledge of the 'true' phylogeny is needed to directly detect the symplesiomorphy trap. Typically, detection of the trap occurs indirectly by excluding other possibilities of incongruence and revealing characteristic signatures in the data. For example, Wägele and Mayer's study showed that misplacement of Acrothoracica barnacles in an 18S parsimony analysis was due to symplesiomorphic characters shared exclusively by Ascothoracida (a non-barnacle outgroup) and Acrothoracica (Figure 1B). These characters overwhelmed the phylogenetic signal for the monophyly of Cirripedia. This phenomenon is known as the symplesiomorphy trap.
Here we report another instance of the symplesiomorphy trap in molecular data discovered while examining Terebelliformia (Annelida) phylogeny. Terebelliform worms (sensu Rouse & Pleijel 2001) are typically tube-dwelling annelids, found in diverse marine habitats, including intertidal, deep-sea and even hydrothermal vent areas. Terebelliformia include about 800 species within five 'families': Alvinellidae, Ampharetidae, Terebellidae, Trichobranchidae and Pectinariidae [20–22]. Based on thorough investigations using data partitioning, topology tests, removal and addition of taxa, spectral analyses, detection of compositional biases, models of non-stationary sequence evolution, and recoding of characters, we were able to pinpoint the source of the incongruence between mitochondrial and nuclear data and relate it to the symplesiomorphy trap. Ampharetidae and Alvinellidae exhibit strong compositional biases in their mitochondrial genomes. However, these biases affect placement of Trichobranchidae and Terebellidae rather than Ampharetidae and Alvinellidae.
Sample and Data Collection
Table 1. Taxa used in phylogenetic analyses with 17 taxa. (Table body not recovered; surviving entries are the collection sites 63°30.84'N/10°25.01'E Storgrunnen (Norway); 39°53.88'N/69°39.64'W Southern New England (USA); 41°37.91'N/70°53.34'W Egypt Lane, Fairhaven, MA (USA); 47°57.001'N/129°05.851'W Juan de Fuca (Canada); 47°56.947'N/129°05.878'W Juan de Fuca (Canada); and the taxon Scoloplos cf. armiger.)
Genomic Assembly and Gene Identification
Sequences were edited and aligned using DNASTAR™ Lasergene programs SeqMan and MegAlign. Protein-coding genes and ribosomal RNA genes were identified by BLAST. All tRNA genes were identified using the tRNAscan-SE web server (http://lowelab.ucsc.edu/tRNAscan-SE/) under default settings and source = "mito/chloroplast", or by hand based on their potential secondary structures and anticodon sequences.
Datasets consisted of mitochondrial and nuclear data. All alignments are available at TreeBASE (http://www.treebase.org). Seventeen available annelid mitochondrial genomes with about 50% coverage or greater were used for the phylogenetic analyses (Table 1). The alignment of Zhong et al. was employed with the addition of Nephtys sp., Pectinaria gouldi, Paralvinella sulfincola and Auchenoplax crinita. Because we were interested in relationships within Terebelliformia, we deleted the mitochondrial data of Katharina (Mollusca) and Terebratalia (Brachiopoda) and used all other annelids as outgroup taxa.
Both nucleotide and amino acid datasets were created for mitochondrial phylogenetic analyses. In the nucleotide dataset, all protein-coding genes (except for the atp6, atp8 and nad6 genes, which exhibit high variability) and the two rRNA genes (mLSU and mSSU) were included. Clustal X under default settings was used to align rRNA genes. Gblocks 0.91b was used to identify ambiguously aligned regions in the rRNA genes. These regions and the 3rd positions of protein-coding genes, which are saturated with substitutions for family-level analyses, were excluded from the analyses with the aid of MacClade4.08 and Se-Al v2.0a11. The amino acid dataset was created from the aligned nucleotide dataset by translation of protein-coding genes with the Drosophila mitochondrial genetic code and exclusion of rRNA genes. The mitochondrial nucleotide and amino acid datasets comprised 6,287 and 2,990 positions, respectively.
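The exclusion of 3rd codon positions described above was done in MacClade and Se-Al; the same operation can be illustrated with a minimal Python sketch. It assumes an in-frame aligned coding sequence whose length is a multiple of three, and the function names (drop_third_positions, strip_alignment) and taxon labels are hypothetical, introduced here only for illustration.

```python
def drop_third_positions(aligned_cds: str) -> str:
    """Keep codon positions 1 and 2 of an in-frame aligned coding sequence.

    Assumption: the alignment starts at codon position 1 and gaps were
    inserted codon-wise, so every third column is a 3rd codon position.
    """
    if len(aligned_cds) % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3")
    # Zero-based indices 2, 5, 8, ... are the 3rd codon positions.
    return "".join(base for i, base in enumerate(aligned_cds) if i % 3 != 2)


def strip_alignment(alignment: dict[str, str]) -> dict[str, str]:
    """Apply drop_third_positions to every sequence of an alignment."""
    return {taxon: drop_third_positions(seq) for taxon, seq in alignment.items()}


if __name__ == "__main__":
    # Hypothetical two-taxon toy alignment, not data from this study.
    toy = {"taxon_A": "ATGGCCTTA", "taxon_B": "ATGGCTTTG"}
    print(strip_alignment(toy))
```

After this step both toy sequences are reduced from nine to six columns, leaving only 1st and 2nd codon positions.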
Additionally, a combined data matrix was constructed with the addition of 18S, 28S and EF-1α sequences to the mitochondrial data for the above 17 taxa (Table 1). Because we employed data from GenBank and collected data in two different laboratories (Univ. of Osnabrück and Auburn Univ.), in some cases we concatenated data from as closely related species as possible to generate Operational Taxonomic Units (OTUs) with a more complete coverage (see Table 1). Sequences were aligned as above. Due to the addition of nuclear data, the combined datasets comprised 11,813 nucleotide and 3,331 amino acid positions. The amino acid dataset comprised only the protein-coding genes.
Moreover, we also constructed a nuclear dataset comprising only 18S, 28S and EF-1α sequences at the nucleotide level for these 17 taxa (Table 1). The nuclear dataset comprised 5,526 nucleotide positions. Analyses of nuclear ribosomal gene datasets were also based on 32 and 61 taxa to reveal if taxon sampling had a substantial impact on the phylogenetic reconstruction of the nuclear data. By comparison, taxon sampling was far more limited for mitochondrial genome sequences. Additional File 2 provides a summary of the construction of these datasets with more than 17 taxa.
Maximum likelihood (ML) and Bayesian inference (BI) approaches were employed for all mitochondrial, nuclear and combined datasets. For all nucleotide datasets with 17 taxa, ML analyses were performed in PAUP4.0b10 with a GTR+Γ+I model as determined by Modeltest v3.7 based on the Akaike information criterion (AIC) [33, 34]. Heuristic searches were run with random-taxon addition (10 replicates) using Tree-Bisection-Reconnection (TBR) swapping. All model parameters used fixed values as determined by Modeltest v3.7. Bootstrap analyses employed 1,000 iterations using heuristic searches with 10 random-taxon addition replicates. Partitioned ML analyses were conducted with RAxML 7.2.8 using a GTR+Γ+I model for each individual gene and 200 bootstrap replicates followed by a best tree search. Partitioned BI invoked independent substitution models for each gene in MrBayes version 3.1.2 and ran for 5 × 10⁶ (mitochondrial and nuclear) or 2 × 10⁶ (combined) generations, respectively, with 2 runs of 4 chains (3 heated and 1 cold). Trees were sampled every 100 generations. The built-in diagnostic comparing the 2 runs, the average standard deviation of split frequencies, was computed every 10,000 generations. Under the AIC in MrModeltest [37, 38], GTR+Γ+I models were selected for 18S and 28S rDNA, EF-1α, cox1, cox2, cob, nad1, nad3, and nad4; GTR+I models for both 12S and 16S rDNA; a GTR+Γ model for cox3; and an HKY+Γ model for nad2, nad4L and nad5. Convergence of -ln likelihood scores and tree length was determined using Tracer v1.4.1 to identify the burnin point at which all estimated parameters reached equilibrium (burnin = 100 trees). The majority-rule consensus tree containing posterior probabilities (PP) was determined from the remaining trees. Additional File 2 provides a more detailed description of the analyses and results for the datasets with more than 17 taxa.
For both amino acid datasets (mitochondrial and combined data with 17 taxa), non-partitioned and partitioned ML, and partitioned BI analyses were run. For ML analyses, model selection was performed in RAxML 7.2.8 and the MtZOA+Γ+I+F model was chosen as the best-fitting one for both non-partitioned datasets. For individual genes, MtZOA+Γ+I models were selected for cox1, cox2 (additionally +F), cox3 and cob, and DAYHOFF+Γ+I for nad1, nad2, nad3, nad4, nad4L, nad5 and EF-1α. Maximum likelihood searches were implemented with 200 bootstrap replicates using RAxML followed by a ML tree search for both non-partitioned and partitioned ML analyses. For partitioned BI of amino acid datasets, the mixed amino acid substitution model option plus a Γ distribution and a proportion of invariant sites was assigned to each partition individually and unlinked in MrBayes v3.1.2. BI ran for 2 × 10⁶ generations and trees were sampled every 500 generations (burnin = 20 trees). In the mixed model option, a specific model is not specified a priori, but each model is chosen during the run based on its posterior probability.
Non-stationary sequence evolution
To analyze data in a non-stationary Bayesian framework, we used PHASE 2.0 to allow usage of different compositional vectors along branches of the tree. As in stationary Bayesian inferences using MrBayes, we conducted partitioned analyses for nucleotide datasets with 17 taxa of both mitochondrial and nuclear data invoking previously mentioned substitution models for each gene (except that the proportion-of-invariant-sites parameter is not available in PHASE 2.0). We performed analyses based on 3, 6 or 9 different compositional vectors. For each number of compositional vectors, we performed 4 independent runs in parallel, with one cold chain each and different random seeds (i.e., 3, 11, 88, and 1000). Each run comprised 12 × 10⁶ generations and trees were sampled every 1,000 generations. The first 2 × 10⁶ generations were discarded as burnin because convergence of -ln likelihood scores and tree length was indicated by Tracer v1.4.1.
To further understand congruence and incongruence in our datasets, the Approximately Unbiased (AU) topology test of CONSEL [41, 42] was employed to assess support for alternative hypotheses. More specifically under the ML criterion, AU tests compared the three possible terebelliform hypotheses with respect to incongruence for each possible combination of partitions in the 17-taxa case (i.e., 18S, 28S, mtDNA, 18S/28S, 18S/EF-1α, 18S/mtDNA, 28S/EF-1α, 28S/mtDNA, EF-1α/mtDNA, 18S/28S/EF-1α, 18S/28S/mtDNA, 18S/EF-1α/mtDNA, 28S/EF-1α/mtDNA, and 18S/28S/EF-1α/mtDNA). Based on initial results, the following hypotheses were tested: 1) Trichobranchidae as sister to Alvinellidae/Ampharetidae (TriAA), 2) Trichobranchidae as sister to Terebellidae (TriTer), and 3) Terebellidae as sister to Alvinellidae/Ampharetidae (TerAA). PAUP analyses were constrained to obtain only the best trees congruent with the particular hypothesis. Settings for the analyses were as described above.
We conducted spectral analyses to gain further insights into the support for specific bipartitions (or splits) [43, 44] because they have been useful in the detection of the symplesiomorphy trap. A bipartition splits a set of OTUs into two groups. In the context of spectral analyses, we use the term ingroup (italicized here to distinguish its usage in spectral analyses from common systematic usage) to define the group of the bipartition we are interested in, and outgroup for the other group of that bipartition. For example, Trichobranchidae, Alvinellidae and Ampharetidae in one group of the bipartition, the ingroup, and all others including Terebellidae in the other, the outgroup, would be congruent with the TriAA hypothesis. To calculate and visualize the bipartition support, we used Splits Analyses MethodS (SAMS) and Microsoft Excel for mitochondrial, nuclear and combined datasets with 17 taxa. SAMS is a split-decomposition tool that does not require Hadamard conjugations. Hence, there is no need to consider the complete split space. SAMS differentiates support for a bipartition into three categories: 1) binary (i.e., both groups exhibit only one character state each, but different from each other); 2) noisy outgroup (i.e., while the ingroup exhibits only one state the outgroup exhibits more than one state, though a majority state within the group can still be identified); 3) noisy ingroup and outgroup. Because we were only interested in bipartitions regarding relationships within Terebelliformia, we only retrieved bipartitions from the results that were relevant regarding these relationships. The PERL script to retrieve these bipartitions is available from THS upon request.
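To make the three support categories concrete, the following Python sketch classifies a single alignment column with respect to one bipartition. It is not the SAMS implementation: the strict-majority rule (a state held by more than half of the taxa in a group), the requirement that the two majority states differ, and all function names and taxon labels are assumptions made here for illustration only.

```python
from collections import Counter


def majority_state(states):
    """Return the state held by more than half of the taxa, else None.

    Assumption: a strict majority (> 50%) defines the majority state.
    """
    state, count = Counter(states).most_common(1)[0]
    return state if count * 2 > len(states) else None


def classify_column(column, ingroup):
    """Classify one column (dict: taxon -> state) for a given bipartition.

    `ingroup` is the set of taxa on the side of interest; all remaining
    taxa form the outgroup of the bipartition.
    """
    in_states = [s for taxon, s in column.items() if taxon in ingroup]
    out_states = [s for taxon, s in column.items() if taxon not in ingroup]
    if not in_states or not out_states:
        raise ValueError("both sides of the bipartition need at least one taxon")

    in_major = majority_state(in_states)
    out_major = majority_state(out_states)
    # No support if a majority state is missing or both sides share it.
    if in_major is None or out_major is None or in_major == out_major:
        return "none"

    in_uniform = len(set(in_states)) == 1
    out_uniform = len(set(out_states)) == 1
    if in_uniform and out_uniform:
        return "binary"
    if in_uniform:
        return "noisy outgroup"
    if out_uniform:
        # Noisy ingroup with a uniform outgroup is not among the three
        # categories described in the text, so it is left unclassified here.
        return "none"
    return "noisy ingroup and outgroup"


if __name__ == "__main__":
    # Hypothetical column; labels abbreviate taxa only for readability.
    tri_aa = {"Tri", "Alv", "Amp"}
    column = {"Tri": "A", "Alv": "A", "Amp": "A", "Ter": "G", "Pec": "G", "Out": "C"}
    print(classify_column(column, tri_aa))
```

Counting, for each hypothesis, how many columns fall into each category gives a support spectrum of the kind summarized in the spectral analyses.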
Determination of Compositional Biases
ML and partitioned BI of the 17-taxa, three-nuclear-gene (i.e., 18S, 28S and EF-1α) dataset inferred an identical topology with respect to terebelliform relationships (Figure 2a). Interestingly, monophyly of Terebelliformia was not recovered as Pectinaria gouldi was placed as sister to the sipunculid Phascolopsis gouldi, albeit with weak support (Figure 2a). The other four terebelliform taxa formed a clade with stronger nodal support (BS: 86 for nNuc, 100 for pNuc; PP: 1.00) than in mitochondrial analyses (BS: 69 for nNuc, <50 for pNuc; PP: 0.92, Figure 2b). As for the mitochondrial analyses, a sistergroup relationship of Alvinellidae and Ampharetidae is well corroborated (BS: 98 for nNuc, 99 for pNuc; PP: 1.00). Moreover, the TriAA hypothesis was supported (BS: 96 for nNuc, 92 for pNuc; PP: 1.00) and topology testing significantly rejects the alternative TriTer (favored by the mitochondrial data) and TerAA hypotheses (p = 0.038 and p = 0.006, respectively).
Phylogenetic trees from combined analyses (Figure 2c & Additional File 3) were similar to the ones from mitochondrial data (Figure 2b) with differences occurring in outgroup relationships. Monophyly of Terebelliformia is significantly supported in these analyses (BS: 99 for nNuc, 100 for pNuc, 98 for nAA and 93 for pAA; PP: 1.00 for both; Figure 2c, Additional File 3). Pectinariidae branched off first within terebelliforms (BS: 95 for nNuc, 100 for pNuc, 96 for nAA and 72 for pAA; PP: 1.00 for both). Alvinellidae was recovered as sister to Ampharetidae (BS: 100 for all four; PP: 1.00 for both). Trichobranchidae was placed as sister to Terebellidae, the TriTer hypothesis, in all analyses. However, bootstrap support for the TriTer hypothesis in the combined analyses was generally lower than in mtDNA alone analyses (83 in nNuc, 95 in pNuc, 41 in nAA, and 74 in pAA compared to 95, 100, 62, and 84, respectively; Figure 2 & Additional File 3). Furthermore in contrast to the mitochondrial Nuc dataset, topology testing did not significantly reject the alternative TriAA hypothesis favored by the nuclear dataset (p = 0.184), though the TerAA hypothesis is still significantly rejected (p = 0.012).
Congruence and Incongruence between Partitions regarding Terebelliformia
Besides the number of positions, the quality of supporting positions is different for these three alternative hypotheses in both 17-taxon datasets. For the nuclear dataset, two binary positions support the TriAA hypothesis (black color in Figure 4a) and no binary positions support the TriTer and TerAA hypotheses. In contrast, no binary positions are found to support any of the three hypotheses in the mitochondrial dataset. All other positions consistent with the TriAA or TerAA hypothesis are either noisy only in the outgroup (dark grey in Figure 4) or in both ingroup and outgroup (light grey in Figure 4), with more positions belonging to the latter class. Conversely, positions consistent with the TriTer hypothesis are exclusively based on a single class of positions, noisy in the outgroup only (Figure 4).
Source of Incongruence
Based on analyses herein, placement of Trichobranchidae is incongruent between mitochondrial and nuclear data. To further investigate possible sources of incongruence with regard to Trichobranchidae placement, we examined two properties known to mislead placement of taxa: placement of the root and base composition heterogeneity.
Placement of the root
Poor taxon sampling can also influence taxon placement and rooting [58, 59]. As we could not easily increase the available number of mitochondrial genomes for Terebelliformia, we focused on adding more nuclear data and included 18 new 18S and 13 28S sequences for Terebelliformia and one cirratulid to the available data (Additional File 2). Phylogenetic analyses of this dataset comprising 32 taxa also recovered a sistergroup relationship of Trichobranchidae to Alvinellidae/Ampharetidae (BS: 80; PP: 0.95) within a monophyletic Terebelliformia. Additionally, the 61-taxon dataset based only on 18S rRNA data failed to provide resolution within Terebelliformia (Additional File 2); thus, neither exclusion of long-branched taxa nor an increased taxon sampling had an influence on the placement of the root for the nuclear data.
Ampharetidae exhibited a strong G-C skew value towards guanine relative to cytosine (Figure 6c). Moreover, for mitochondrial data, C-T skews indicated that Ampharetidae was biased towards thymine, and Alvinellidae away from it, relative to other taxa. The same pattern could be observed in A-T skews driven by the differences in thymine frequencies. Thus, Ampharetidae and Alvinellidae showed strong but opposite biases in frequencies of pyrimidines, and Ampharetidae also showed a strong skew towards guanine. These evaluations were based on the mitochondrial dataset we used for phylogenetic analyses (i.e., excluding 3rd positions), but examining either 3rd positions alone or with 3rd positions included resulted in similar patterns (Additional File 4). Codon usage reflected biases in base frequencies with deviations in Ampharetidae and Alvinellidae compared to the other taxa (Additional File 1).
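The skew values discussed above can be illustrated with a short Python sketch. It assumes the commonly used definition skew(X, Y) = (X - Y) / (X + Y), where X and Y are the counts of two nucleotides in a sequence; the exact formula used for the figures is not restated in this section, so this definition and the function name skew are assumptions.

```python
def skew(sequence: str, x: str, y: str) -> float:
    """Return (count_x - count_y) / (count_x + count_y) for one sequence.

    Positive values indicate an excess of `x` over `y`; gaps and
    ambiguity codes are ignored because only `x` and `y` are counted.
    Assumption: this is the conventional skew definition, not
    necessarily the exact calculation behind the published figures.
    """
    seq = sequence.upper()
    n_x, n_y = seq.count(x.upper()), seq.count(y.upper())
    if n_x + n_y == 0:
        raise ValueError(f"sequence contains neither {x} nor {y}")
    return (n_x - n_y) / (n_x + n_y)


if __name__ == "__main__":
    # Hypothetical toy sequence, not data from this study.
    toy = "GGGCATTTAC"
    print("G-C skew:", skew(toy, "G", "C"))
    print("C-T skew:", skew(toy, "C", "T"))
    print("A-T skew:", skew(toy, "A", "T"))
```

Under this definition, a taxon biased towards thymine has negative C-T and A-T skews, whereas a taxon biased away from thymine shifts both values in the positive direction.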
Amelioration of Incongruence
Non-stationary sequence evolution
Using models of non-stationary sequence evolution has successfully ameliorated misleading effects of compositional biases in mitochondrial genomes of beetles. Therefore, we also employed such models for both our mitochondrial and nuclear datasets using PHASE 2.0. For both datasets and each number of different compositional vectors, 4 independent chains starting from different random seeds failed to converge upon the same score, indicating a structured tree-space with several local optima. Nonetheless, for mitochondrial data, the majority-rule consensus topologies derived from the best run (i.e., judged by -lnL values) for each number of different compositional vectors (i.e., 3, 6, or 9) were identical except for the position of the outgroup taxon Clymenella torquata (Additional File 5). As before with mitochondrial data, Terebellidae and Trichobranchidae were sister to each other (PP: 1.00 for all three; Additional File 5). For nuclear data, the three topologies derived from the best runs invoking 3, 6 or 9 different vectors placed Trichobranchidae as sister to Alvinellidae/Ampharetidae (PP: 1.00 for all three; Additional File 5). Thus, using different compositional vectors along the branches did not reduce incongruence between datasets.
In contrast, RY coding of the mitochondrial partition and combined dataset (inset in Figure 7) yielded different ingroup relationships (see Figures 2b & 2c for standard nucleotide coding) with Terebellidae as sister to Ampharetidae/Alvinellidae rather than Trichobranchidae. Notably, bootstrap support for this clade was below 50 in the analyses of both mitochondrial and combined data and all previous topology tests clearly rejected this relationship (Figures 3b & 3c). Besides this difference in ingroup relationships, RY coding of mitochondrial and combined data also differed in several outgroup relationships.
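RY coding replaces each purine (A, G) with R and each pyrimidine (C, T) with Y, so that only transversions remain visible and frequency differences within purines or within pyrimidines no longer contribute signal. A minimal Python sketch of this recoding follows; the name ry_code is hypothetical, and leaving gaps and ambiguity codes unchanged is an assumption about how such characters would be treated.

```python
# Translation table: purines -> R, pyrimidines -> Y (U is treated like T).
_RY_TABLE = str.maketrans("AGCTUagctu", "RRYYYRRYYY")


def ry_code(sequence: str) -> str:
    """Recode a nucleotide sequence into purine (R) / pyrimidine (Y) states.

    Characters other than A, C, G, T and U (gaps, N and other ambiguity
    codes) are passed through unchanged; this is an assumption.
    """
    return sequence.translate(_RY_TABLE)


if __name__ == "__main__":
    # Hypothetical toy sequences, not data from this study.
    print(ry_code("ACGT-N"))
    # A C/T difference between two taxa disappears after recoding:
    print(ry_code("ATCG") == ry_code("ACTG"))
```

Because the opposite thymine biases reported for Ampharetidae and Alvinellidae are differences within pyrimidines, they collapse into the single state Y under this scheme, at the cost of discarding all transition signal.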
Biases in nucleotide frequencies influenced placement of Trichobranchidae and Terebellidae in both mitochondrial and combined analyses. Misplacement of these taxa is interesting because the taxa themselves did not exhibit compositional biases; instead, biases in Alvinellidae and Ampharetidae influenced their placement. This misplacement can be related to the "symplesiomorphy trap" for which few molecular examples have been elucidated [16, 17]. In the Cirripedia example by Wägele and Mayer (Figure 1B), Acrothoracica and Ascothoracida grouped together due to symplesiomorphic characters because of the long branch uniting the remaining Cirripedia. Though no long branches could be observed in our analyses based on mitochondrial data regarding terebelliform taxa, biases in base composition and codon usage detected in Ampharetidae and Alvinellidae pointing in opposite directions appear to have had a similar effect. These directional biases affected nucleotides in all three coding positions of mitochondrial genes in Ampharetidae and Alvinellidae, presumably due to differences in substitution rate or pattern.
In our case, the symplesiomorphy trap appears to have misrooted a terebelliform subtree, rendering a paraphyletic assemblage as a monophyletic group. The misinterpretation appears due to basal homologies, or symplesiomorphies, rather than an artificial signal due to homoplasy (e.g., long branches). First of all, though Alvinellidae and Ampharetidae are affected by opposite biases in mitochondrial nucleotide frequencies, their sistergroup relationship, which is independently confirmed by the nuclear data, is still strongly supported by mitochondrial data as judged by bootstrap and spectral analyses. Hence, the placement of these two taxa appears unaffected by the opposite biases. Second, we could exclude that the nuclear partition is affected by an artificial signal; the nuclear data exhibited no biases with respect to terebelliform taxa. The root of the subtree comprising Terebellidae, Trichobranchidae and Ampharetidae/Alvinellidae, which was supported by all our analyses as well as several previous ones [e.g., 19, 57, 62], was not placed differently by the inclusion or exclusion of taxa. Moreover, the spectral analysis of the nuclear partition is in agreement with the reconstructed nodes regarding the relations of these three taxa to each other. The number of supporting positions in the spectral analysis is in agreement with support by bootstrap and topology test p values for nuclear data. Third and contrasting with the nuclear data, the spectral analyses of the mitochondrial data are not congruent with tree reconstructions. Whereas the TriTer hypothesis was recovered in all best trees that included mtDNA data and was strongly supported by bootstrap and topology test results, spectral analyses revealed that this hypothesis was consistent with the fewest numbers of positions in the mitochondrial data. Using mitochondrial data, these characters overwhelmed the larger numbers of positions supporting the alternative placement of Trichobranchidae.
The process of deamination of the non-coding strand may be responsible for biases observed herein for pyrimidines and purines. Compositional biases in our mitochondrial data were greater within pyrimidines than in purines; guanine had the lowest average frequency (16%) of all nucleotides. This is similar to the situation found in mammals though their guanine frequency can be considerably lower [15, 55, 63, 64]. In mammals, this is due to spontaneous deamination of cytosine to uracil and adenine to hypoxanthine on the complementary strand during replication of mitochondrial genomes. The former deamination occurs more often than the latter, explaining the low level of guanines in mammals on the coding strand and the stronger bias observed in pyrimidines than in purines, because the low guanine frequency allows for little variation.
The best strategy to ameliorate the effect of the symplesiomorphy trap is to increase ingroup taxon sampling. However, increasing the taxon sampling might not always be easily achieved or possible. For example, sampling of nearly complete mitochondrial genomes in annelids is time consuming and expensive, but new sequencing technologies are changing this. In other cases, taxon sampling will be limited by the number of extant taxa from which genetic material can be obtained. Therefore, we tested different strategies with respect to their capabilities to ameliorate the effect of the symplesiomorphy trap given a limited taxon sampling. In the Cirripedia example, using appropriate methods such as ML and increased outgroup sampling ameliorated the symplesiomorphy problem because this misplacement was due to long branches. In the Mammalia example, the problem could be solved by the RY coding strategy and partitioned analyses, which resulted in weak support for the Theria hypothesis even using mitochondrial data. Moreover, usage of non-stationary models of sequence evolution was able to adjust for compositional biases in mitochondrial genomes in the reconstruction of the beetle phylogeny.
In our case, the most effective strategy was RY coding, which reduced the effects of compositional biases within pyrimidines and purines. However, we still did not recover strong support for Trichobranchidae as sister to Ampharetidae/Alvinellidae with either mitochondrial or combined data. Moreover, phylogenetic signal in all datasets was substantially decreased by RY coding. Addition of nuclear data was only able to slightly minimize the effects of the symplesiomorphy trap as indicated, for example, by the slight decrease in bootstrap support for the presumed 'incorrect' hypothesis. Therefore, substantially more unbiased nuclear data would have been necessary to turn the tide. On the other hand, herein partitioned analyses always obtained the same topology as non-partitioned ML analyses, and PHASE analyses did not resolve incongruence either. The poor performance of non-stationary models of sequence evolution in our analyses, in comparison to Sheffield et al., might be due to the limited sampling of ingroup taxa. Increased sampling may allow better adjustment to biases along the branches [58, 59]. Finally, we also tested if exclusion of biased taxa in turn would alter the results, but there was no noticeable effect. Thus, though several approaches were tried, none completely ameliorated the influence of the symplesiomorphy trap.
Interestingly, results based on combined data seem to be congruent with morphological and mitochondrial gene order data and, therefore, the underlying incongruence in the data was not apparent at first. Trichobranchidae strongly resemble Terebellidae and, thus, were placed as sister to or within Terebellidae [18, 20, 67]. However, only one non-homoplastic character supports their common origin: prostomium on peristomium with fused frontal edges. In contrast, others did not support a sister relationship of Terebellidae and Trichobranchidae [68, 69]. The position of two adjacent trnM genes also seemed to support such a relationship of Terebellidae and Trichobranchidae. However, two adjacent trnM genes are also found in the pectinariid P. gouldi (Additional File 1) and in some but not all sipunculids [70–72]. Thus, no unequivocal character supports a sistergroup relationship of Terebellidae and Trichobranchidae. Analyses herein revealed that support by mitochondrial and combined data was only due to symplesiomorphic characters. On the other hand, although a close relationship between alvinellids and ampharetids has been long suspected based on morphology [e.g., 18, 69, 73], until now strong support by molecular data [e.g., 19, 68] has been lacking.
Herein we report the detection of the symplesiomorphy trap in molecular data, one of a few known examples to date. Mitochondrial data placed Trichobranchidae as sister to Terebellidae in contrast to the nuclear data, which placed Trichobranchidae as sister to Ampharetidae and Alvinellidae. These latter two taxa exhibited strong compositional biases in the mitochondrial data as shown by spectral analyses as well as skew and RCFV values. However, Ampharetidae and Alvinellidae themselves were not misplaced but caused Trichobranchidae to be erroneously placed. This taxon exhibits no obvious compositional bias. Unfortunately, several state-of-the-art approaches (i.e., partitioning the dataset, performing ML analyses and partitioned analyses, use of several outgroup taxa, exclusion of biased taxa, use of different numbers of compositional vectors to implement time-heterogeneous models) were not able to ameliorate the influence of the symplesiomorphy trap in the mitochondrial data. Therefore, more sophisticated substitution models have to be developed to appropriately address this peculiar tree reconstruction artifact. In the meantime, partitioned and careful analyses can be used to detect the trap and to be aware of incongruences in the molecular data even if nodal support is high as in our case. Given the advent of next generation sequencing technologies, we hope that analyses, such as those done here, will be better able to detect artifacts due to systematic errors because much more data will be brought to bear on such issues. Hence, these approaches may add strength and confidence to results of phylogenomic studies by allowing more in-depth understanding of the sources of signal and noise.
This study was funded by the NSF-WormNet grant (EAR-0120646; DEB-1036537) and the German Science Foundation DFG STR683/5-2 from the priority program 1174 "Deep Metazoan Phylogeny" and DFG STR683/6-1. Contribution #86 to the AU Marine Biology Program and #6 to the Molette Biology Laboratory for Environmental and Climate Change Studies.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.