Phylogenomics of strongylocentrotid sea urchins
© Kober and Bernardi; licensee BioMed Central Ltd. 2013
Received: 19 December 2012
Accepted: 9 April 2013
Published: 23 April 2013
Strongylocentrotid sea urchins have a long tradition as model organisms for studying many fundamental processes in biology including fertilization, embryology, development and genome regulation but the phylogenetic relationships of the group remain largely unresolved. Although the differing isolating mechanisms of vicariance and rapidly evolving gamete recognition proteins have been proposed, a stable and robust phylogeny is unavailable.
We used a phylogenomic approach with mitochondrial and nuclear genes taking advantage of the whole-genome sequencing of nine species in the group to establish a stable (i.e. concordance in tree topology among multiple lies of evidence) and robust (i.e. high nodal support) phylogenetic hypothesis for the family Strongylocentrotidae. We generated eight draft mitochondrial genome assemblies and obtained 13 complete mitochondrial genes for each species. Consistent with previous studies, mitochondrial sequences failed to provide a reliable phylogeny. In contrast, we obtained a very well-supported phylogeny from 2301 nuclear genes without evidence of positive Darwinian selection both from the majority of most-likely gene trees and the concatenated fourfold degenerate sites: ((P. depressus, (M. nudus, M. franciscanus), (H. pulcherrimus, (S. purpuratus, (S. fragilis, (S. pallidus, (S. droebachiensis, S. intermedius)). This phylogeny was consistent with a single invasion of deep-water environments followed by a holarctic expansion by Strongylocentrotus. Divergence times for each species estimated with reference to the divergence times between the two major clades of the group suggest a correspondence in the timing with the opening of the Bering Strait and the invasion of the holarctic regions.
Nuclear genome data contains phylogenetic signal informative for understanding the evolutionary history of this group. However, mitochondrial genome data does not. Vicariance can explain major patterns observed in the phylogeny. Other isolating mechanisms are appropriate to explore in this system to help explain divergence patterns not well supported by vicariance, such as the effects of rapidly evolving gamete recognition proteins on isolating populations. Our findings of a stable and robust phylogeny, with the increase in mitochondrial and nuclear comparative genomic data, provide a system in which we can enhance our understanding of molecular evolution and adaptation in this group of sea urchins.
KeywordsStrongylocentrotus Phylogenomics Sea urchin Holarctic expansion Bering Strait Vicariant divergence
Sea urchins are benthic marine echinoderms distributed across all of the world’s oceans . Despite their unusual appearance, they have been a component of human diets since at least the ancient Greeks  and are still experiencing a vigorous fisheries industry today [3, 4]. The contribution of sea urchins to our understanding of many aspects of basic biology cannot be understated . Sea urchins are a primary research model for embryology , fertilization , bilaterian development , genomic regulatory systems [8, 9] and our basic understanding of fundamental properties of genomes [10, 11]. They provide broadly useful natural systems in which we investigate central evolutionary questions of natural selection [12, 13], reproductive isolation  and speciation [15–17] and ecological questions of population responses to disease  and global scale habitat distribution patterns . Indeed, our first coherent view of cancer was provided by studying embryonic development in sea urchins  and origins of the phagocytic theory, a key process in the idea of an immune system, were based on observations of the movement and engulfing of foreign particles in echinoderm tissue .
The location of Echinodermata as an early branch in the deuterostome phylogeny serves as an important node with which to infer ancestral states of vertebrate biology [22, 23]. This placement is useful for addressing broad reaching questions on the origins and evolution of animal immunity  and development . Among sea urchins, the family Strongylocentrotidae is arguably the best studied group  and includes the well-annotated genome of the representative model species Strongylocentrotus purpuratus. The Strongylocentrotidae are abundant marine echinoids with members living in the northern Pacific, northern Atlantic and the holarctic regions . The group is comprised of four genera: Strongylocentrotus, Hemicentrotus, Pseudocentrotus and Mesocentrotus.
The phylogenetic position of strongylocentrotids relative to other sea urchins is well understood [29–31]. The genus Strongylocentrotus comprises five species: S. purpuratus, S. pallidus, S. droebachiensis, S. intermedius, S. fragilis and S. polyacanthus. Strongylocentrotus djakonovi has been assigned as a junior synonym for S. droebachiensis, S. pulchellus a junior synonym for S. intermedius and A. fragilis is a junior synonym for S. fragilis. Mesocentrotus comprises M. franciscanus (nee S. franciscanus) and M. nudus (nee S. nudus). Hemicentrotus and Pseudocentrotus are monotypic with H. pulcherrimus and P. depressus, respectively.
Recent mitochondrial molecular phylogenies have identified two clades, one consisting of members of Strongylocentrotus and Hemicentrotus and the other consisting of Mesocentrotus and Pseudocentrotus[34–36]. However, the branching orders within the Strongylocentrotus and Hemicentrotus clades are largely incongruent. Specifically, the relationships of S. intermedius, S. droebachiensis and S. pallidus, the relative placements of S. purpuratus and S. fragilis, and the positions of S. intermedius and H. pulcherrimus are unresolved.
This issue is outstanding because an accurate and robust phylogeny is essential for correctly interpreting the broad range of contemporary biological research being performed on this group. The unresolved phylogenetic relationships among strongylocentrotids underscores the problem of using few loci in a group with large effective population sizes and complex histories that may involve hybridization . This is particularly relevant in this group since sea urchins are broadcast spawners, where fertilization occurs in the water column. Many strongylocentrotid species live in sympatry, display overlapping spawning seasons, and have unequal gametic compatibilities . The fertilization efficiencies of eggs and sperm between species are often asymmetric and gamete recognition loci are thought to play an important role in post-mating pre-zygotic isolation . Selection on components of gamete interactions are thought to be particularly important early on in the speciation process. In Strongylocentrotidae, however, gametes from distantly related sympatric M. franciscanus and S. purpuratus readily fertilize in the lab, but hybrids are seldom seen in nature and no introgression has been observed between these two species . Therefore, the rapid evolution of gamete recognition proteins is of particular interest in this group and is under intense study [13, 14, 40]. An accurate phylogeny is integral to this work.
The combined action of incomplete lineage sorting and introgression of genes between species are known to greatly complicate the resolution of species trees [41–43] weakening single loci phylogenetic inferences . Congruence among multiple genes and morphology has been suggested as a robust approach to reconstruct a reliable phylogeny . Multi-locus analyses at genome-wide scales offer a remarkable opportunity for powerful improvements in molecular phylogenetic inference . The advent of next-generation sequencing and genome assembly makes such analyses possible. A high quality, well-annotated, draft genome for S. purpuratus is available [27, 47, 48] and high coverage, whole-genome sequencing, has been completed for nine of the ten species comprising the family Stron gylocentrotidae (Kober and G. H. Pogson, unpublished data).
The objective of this study was to establish a strong phylogenetic hypothesis for the family Strongylocentrotidae based on alignments of nuclear and mitochondrial genes from 9 (out of the 10) species of the family. The development of a robust and stable phylogeny in this group will provide essential comparative tools to a vast group of scientists including those interested in ecology, evolution, developmental biology and physiology.
Mitochondrial DNA genome assemblies
Genera and species of sea urchins used in this study
Depth range (m)
Mitochondrial genome size
S. purpuratus (Stimpson)
S. pallidus (Sars)
S. droebachiensis (O. F. Müller)
S. intermedius (A. Agassiz)
S. fragilis (Jackson)
M. franciscanus (A. Agassiz)
M. nudus (A. Agassiz)
H. pulcherrimus (A. Agassiz)
P. depressus (A. Agassiz)
Alignments and model selection
The data partitions used for phylogenetic analysis
Tests for molecular adaptation of mitochondrial genes
We found no evidence for positive selection acting on any of the protein coding mitochondrial genes tested (ATPase6, COI, COII, CytB, ND1, ND2, ND3, ND4, ND4L). Internal stop codons were found for at least one site in the alignments for ND4 and ND6 and the approximate likelihood calculation in ATPase8 was unreliable. For these reasons we did not test these three genes for evidence of positive selection.
Phylogenic relationships inferred from neutral nuclear genes
The BI 50% majority-rule consensus phylogram of the stationary tree inferred from fourfold degenerate sites of nuclear genes without selection (‘N4Ds tree’) had complete (BI PP = 1, BSML = BSMP = 100) or very strong support from all three methods at all nodes with the N4N and N4A datasets except at the divergence of S. fragilis (BSMP = 74 and BSMP = 69, respectively). As such, we observed an effect on the phylogenetic inference when including genes found to be under positive selection. Indeed, the tree obtained from the N4S data produced a similar topography except S. purpuratus and S. fragilis branching locations were swapped, with S. fragilis the earlier branching of the two with low support for the node (not shown).
Comparing the mtDNA and nuclear results, we observed very strong support for H. pulcherrimus sister to the clade containing S. fragilis, S. purpuratus, S. intermedius, S. pallidus and S. droebachiensis across our analysis of concatenated datasets. We also found very strong support for a monophyletic clade of S. intermedius, S. droebachiensis and S. pallidus across our analysis of concatenated datasets. However, MA and M4 datasets produced S. pallidus sister to S. droebachiensis, but without strong support. In contrast, the N4A, N4S and N4N concatenated datasets found very strong support S. intermedius as sister to S. pallidus.
Phylogenic relationships inferred from mitochondrial genes
In all mitochondrial ML gene trees except ATPase8, 12S and ND6, the S. pallidus and S. droebachiensis individuals from GenBank and our de novo assemblies resolved as sister taxa as expected. However, the putative outgroup, P. lividus, consistently produced a very long branch and that shifted to different locations among the ML gene trees. Ignoring a P. lividus root, the individual mitochondrial ML gene trees topologies were consistent in resolving Clade M and Clade S (Additional file 2: Figures S2, Additional file 3: Figure S3, Additional file 4: Figure S4). However, the branching order within these clades was inconsistent and in conflict with the nuclear data. We found contradictory topologies for the relative positions of S. fragilis and S. purpuratus among mitochondrial genes trees. H. pulcherrimus was placed sister to Strongylocentrotus species in all gene trees except ATPase6, ND4L and ND6. No single mitochondrial gene returned a topology corroborating with the N4Ds tree.
The locations of S. purpuratus and S. fragilis were discordant between the MP method and BI and ML methods in the MA and M4 datasets. BI and ML trees had these two species sister to S. intermedius, S. pallidus and S. droebachiensis (Figure 3), whereas the MP method has S. purpuratus branching earliest, then S. fragilis and then the S. intermedius, S. pallidus and S. droebachiensis observed with the nuclear concatenated datasets (not shown). The monophyly of a S. intermedius, S. pallidus and S. droebachiensis clade was recovered in both MA and M4, but we found conflicting support for a sister relationship between S. pallidus and S. droebachiensis versus S. pallidus and S. intermedius (Figure 3).
The 12S sequences used by Lee (2003) were collected, aligned, and used to construct an ML tree as described above for rRNA mitochondrial genes. The proposed relationship shown in Figure 2 of Lee (2003) was found to be no better at explaining these data (P > 0.505) than our proposed species tree (Figure 1). The proposed tree of Lee (2003) differed significantly from our proposed species tree (Figure 1) in 762 of 2301 nuclear genes tested. The N4N tree was significantly better at explaining the data for 685 genes, while Lee (2003) Figure 2 was significantly better for 77. When we included the 12S sequences of Lee (2003) with our 12S data, our alignment and ML method produced a different tree (Additional file 5: Figure S5). Here, H. pulcherrimus and S. nudus individuals resolved as sister taxa, but S. intermedius falls in sister to the S. pallidus sequences rather than with the S. intermedius of Lee (2003). The S. intermedius sequence of Lee (2003) falls sister to S. fragilis in a clade with S. purpuratus.
The combined dataset (COI, COII, tRNA-Lys, ATPase8 and ATPase6) of Biermann et al. (2003) was collected and concatenated after removing S. polyacanthus. An alignment was generated and ML trees reconstructed as described above for the protein-coding mitochondrial genes. The proposed relationship in Figure 2 of Biermann et al. (2003) was found to be significantly better at explaining these data (P < 0.001) than our proposed species tree (Figure 1). However, the proposed tree of Figure 2 of Biermann et al. (2003) differed significantly from our proposed species tree (Figure 1) in 1865 of 2301 nuclear genes tested. Our species tree was significantly better at explaining the data for 1855 genes, while Figure 2 of Biermann et al. (2003) was significantly better for 10 genes.
Using RNA secondary structure in phylogeny reconstruction has been shown to have significant utility in resolving relationships in metazoan taxa [51–53]. However, our results from 12S and 16S mixed model and un-partitioned analyses produced very similar trees (not shown).
Rate of molecular evolution and divergence times
Divergence time estimates
12SrDNA rate estimate
12SrDNA Lee (2003)
Numerous processes, including horizontal gene transfer (HGT), gene duplication, introgressive hybridization, incomplete lineage sorting and natural selection may all contribute to gene tree histories that do not represent the true species tree , resulting in gene trees that do not necessarily reflect species trees ., In this group of sea urchins, introgression has been documented between some taxa , and of the primary mechanisms of HGT, the possibility of HGT by viral transfer exists but is likely to be extremely rare (G. H. Pogson, personal communication). Despite these factors, integrating information from large numbers of independent loci offers considerable promise to generate robust phylogenies in situations where small number of loci failed to do so , although care must be taken to assess the robustness of results in the proper biological context . The two multi-locus molecular phylogenies previously published for Strongylocentrotidae provided strong support for the composition of the major groups, but were unable to resolve the relationships of the species [34, 35].
The variation in the evolutionary histories of multiple independent genes are typically addressed with either data partitions with different nucleotide substitution models, or with mixture models allowing for random variation between sites . Recent phylogenomic work has demonstrated the potential poor performance of standard phylogenetic methods due to among-site rate variation, causing shifts in the phylogenetic positions of terminal taxa in well-supported trees generated from different models of nucleotide substitution or by different methods . Our analyses evaluated both the gene level support-based evidence and a concatenated site approach including the implicit model of nucleotide evolution in MP, an explicit model of GTR + I + G with BI and a mixture model allowing for rate variation among sites under ML. Our results did not find discordance between the topologies inferred between methods, or with the nodal support based on the different usage of nucleotide substitution models between the ML, BI and MP analysis of nuclear fourfold degenerate sites of genes without evidence of positive selection. We take the complete concordance between such disparate methods and the morphological data as strong support for the biological significance of these proposed species relationships.
Mitochondrial genes offer potential utility as molecular markers for reconstructing phylogenetic relationships, as the order and number of mitochondrial genes are typically conserved over large phylogenetic distances and orthology is clear . However, mitochondrial phylogenies may be misleading , a fact we find in our data best represented by the incongruence and limited node support between BI, MP and MP methods with the concatenated mitochondrial MA and M4 data. Our results using mitochondrial genes, and those of previous studies in this group, produce conflicting topologies and do not demonstrate clear or consistent signals of phylogenetic relationships.
Strongylocentrotus and Mesocentrotus
For this study, we have chosen to follow the taxon details of the World Echinoidea Database  and acknowledge the Mesocentrotus genus and identify S. fragilis (nee Allocentrotus fragilis). Indeed, the molecular evidence from this study strongly support the membership of M. franciscanus and M. nudus to a group sister to Strongylocentrotus. Our results confirm the two major clades of Strongylocentrotus and Mesocentrotus previously identified by mitochondrial gene studies [34, 35]. Clade S forms a monophyletic Strongylocentrotus and Hemicentrotus group supporting the inclusion of S. fragilis. Clade M conforms to the proposed Mesocentrotus group , including M. franciscanus, M. nudus and P. depressus. The molecular distinction between Strongylocentrotus and Mesocentrotus taxa is also supported by recent morphological classifications of the cross-section of the ultrastructure of primary spines .
Previous studies suggested H. pulcherrimus was an early branching member of clade S [34, 35]. Our data support H. pulcherrimus as an early branching member of this clade , rather than sister to S. intermedius.
Divergence patterns and speciation
Population disjunctions, such as vicariant events and limitations to dispersal, are important first steps towards allopatric speciation . Vicariant events due to sea level changes are well documented across the Isthmus of Panama , Baja California  and the Bering Strait . Sea levels experienced a severe decline at 10.5 Ma with regular fluctuations occurring since . This fluctuation broadly corresponds to the “Vicariant Pacific Pattern” (VPP), where amphi-Pacific taxa gave rise to eastern and western Pacific forms  during the Neogene.
Parsimoniously, our phylogeny suggests a western Pacific (WP) last common ancestor living in shallow, warmer waters followed by an expansion into the WP ancestor of the two major clades. Descendants of each clade experienced two separate eastern Pacific (EP) invasions (S. purpuratus and M. franciscanus). In Clade S, a single deep, cold-water invasion at the ancestor of S. fragilis and S. pallidus occurred, with the LCA of S. pallidus, S. droebachiensis and S. intermedius invading the Arctic and becoming holarctic (HA) in range. Surprisingly, our data provide strong support for a sister grouping of S. droebachiensis and S. intermedius. This suggests that S. intermedius has invaded the WP and moved into shallower and warmer water.
The sister species of M. nudus and M franciscanus show disjunct distributions, with one species inhabiting the northwest and the other the northeast Pacific, respectively. The estimated divergence time between these two species of 2.7-4.8 Ma using the fossil record calibration is more recent than the 5.7-8.1 Ma estimated from 12S mitochondrial DNA but still corresponds with the sea level fluctuations and fit with the VPP . This estimated time of divergence also corresponds to the split between P. depressus and the ancestor of M. nudus and M. franciscanus, suggesting a corresponding event occurring the northern Pacific. In the other sister pair, S. droebachiensis inhabits the holarctic region and overlaps the distribution of it’s sister, S. intermedius. Extant sister species, however, may not be true sisters as other lineages may be extinct. In addition, current ranges many not reflect historical ones. It is not clear from this phylogeny as to whether these two species likely diverged through allopatric or sympatric means . Interestingly, this habitat overlap becomes marginal if S. pulchellus is a distinct species, and not a synonym of S. droebachiensis. Major morphological work on the group found S. pulchellus agrees with S. intermedius in all morphological structures examined except for the tooth skeleton . Future molecular and morphological work will certainly shed light on this divergence.
The members of Clade S show rapid evolutionary divergence along with habitat expansions and changes following a split from a WP ancestor, conforming to the VPP. Isolated spines fossil evidence places undefined Strongylocentrotus in the northeast Pacific in the late Miocene  though the reliability of these identifications remain suspect . The opening of the Bering Strait would provide the access into arctic habitats necessary for a holarctic expansion [34, 35]. The Bering Strait opened at the end of the Miocene, 5.32 Ma , overlapping the early bounds of our estimated divergence time of 3.1-5.6 Ma from fossil calibration for the clade containing S. purpuratus, S. fragilis, S. pallidus, S. droebachiensis and S. intermedius. Furthermore, distinct S. purpuratus, S. droebachiensis, and M. franciscanus fossils are seen in California formations of the middle Pliocene and S. droebachiensis fossils reached western Europe by the late Pliocene , supporting a late Miocene, early Pliocene divergence.
The Strongylocentrotidae has two deep-water species, S. fragilis and S. pallidus. Our inferred phylogeny provides evidence for a single radiation into the deep-water habitat. S. pallidus is typically found in lower depths . S. droebachiensis is also know to reach depths of 1150 m, but is typically found in the shallow sub tidal zone from 0 to 50 m [26, 70]. These species coexist in the same geographic range with S. droebachiensis in the shallow and S. pallidus in the deep habitats. Our tree suggests that S. pallidus and S. fragilis share a recent common ancestor from a single deep-water invasion and as such may share adaptations to this environment. Indeed, adaptations for the deep-water habitat invasions of S. fragilis have been proposed based on genome-wide comparative analysis of three species (not including S. pallidus or S. droebachiensis) . However, gamete production declines with depth, and the very deep-water individuals of S. fragilis aren’t expected to be spawning (John Pearse, personal communication). If that is the case, then natural selection may not reach the very deep-water habitats and deep-water adaptations would be based on selection pressures found at the more shallow depths. With these genome-wide comparative data, future research can test for molecular adaptations along the branch leading to the ancestor of these taxa as well identify adaptations unique to the branches leading to these extant taxa.
Vicariance is insufficient to completely explain our observed pattern of divergence between these taxa, and much work has been done in this group to explore the effects of rapidly evolving gamete recognition proteins on isolating populations [14, 36, 39, 71]. However, the putative egg receptor protein, EBR1, for the sperm bindin gamete recognition protein in sea urchins is prohibitively long for traditional sequencing methods . The phylogenetic relationships inferred from our extended genomic sampling offer a unique opportunity to expand hypothesis of molecular evolution and adaptation in this group of sea urchins.
This phylogeny was consistent with a single invasion of deep-water environments followed by a holarctic expansion by Strongylocentrotus. Divergence times for each species estimated with reference to the divergence times between the two major clades of the group suggest a correspondence in the timing with the opening of the Bering Strait and the invasion of the holarctic regions. However, vicariance is insufficient to completely explain the divergence between these taxa and other isolating mechanisms are appropriate to explore in this system. In particular, much work has been done to explore the effects of rapidly evolving gamete recognition proteins on isolating populations in sea urchins. The phylogenetic relationships inferred in this study and the comparative genomic data now available provide a unique opportunity to explore hypothesis of molecular evolution and adaptation in natural populations.
Mitochondrial genome nucleotide sequences
Class Echinoidea has been found to be monophyletic in Echinoderm phylogenies inferred from both mitochondrial DNA (mtDNA) sequences and morphological data [29–31]. The family Strongylocentrotidae consistently resolves as sister to Paracentrotus lividus, Echinocardium cordatum and Arbacia lixula in both molecular and morphological phylogenies [31, 72].
The complete mitochondrial genomes available for six urchin species were obtained from GenBank (Strongylocentrotus droebachensis NC 009940; Strongylocentrotus purpuratus, NC 001453; Strongylocentrotus pallidus, NC 009941; Paracentrotus lividus, NC 001572; Arbacia lixula, NC 001770; Echinocardium cordatum, FN562581.1).
The published sequences for all nine strongylocentrotid species were collected for regions covering COI, COII, tRNA-Lys, ATPase8 and ATPase6 [GenBank:AY220998-AY221021] . The published sequences for S. intermedius, S. nudus and H. pulcherrimus were collected for COI [GenBank:AF525455, GenBank:AF525452 and GenBank:AF525453, respectively], NDI [GenBank:AF525454, GenBank:AF525450 and GenBank:AF525451, respectively] and 12S [GenBank:AF525769, GenBank:AF525767 and GenBank:AF525768, respectively) .
We supplemented the mitochondrial genomes collected from GenBank with de novo assemblies from Illumina paired-end reads of genomic. Reference sequences for twelve protein-coding (COI, COII, ND1, ND2, ND3, ND4, ND4L, ND5, ND6, CytB, ATPase6 and ATPase8) and two ribosomal RNA (12S and 16S) genes were identified in each mitochondrial genome based on the annotated nucleotide sequence from S. purpuratus [GenBank:NC001543].
Mitochondrial genome de novo assemblies and annotation
Mitochondrial genomic sequences were assembled de novo from Illumina paired-end reads of genomic DNA (Kober and Pogson, unpublished data). First, to obtain a set of putative mitochondrial DNA reads, all reads for each species were aligned to all six GenBank mitochondrial genome sequence with SSAHA2  using parameters ‘-solexa -skip 6 -pair 20,3000’. All reads that mapped to any of the six molecules with a mapping quality of greater than 5 were collected for each species.
The collected reads were used as input for de novo assembly of the molecule for each species using velvet . Hash size values between 11 and 99 were evaluated using VelvetOptimiser.pl  and optimized for the total number of base pairs in large contigs. For S. fragilis, previously sequenced 454 reads  were also aligned to S. purpuratus [GenBank:NC 001453] with SSAHA2. 454 reads aligning with a mapping quality >5 were also included with the Illumina paired end reads as input to the S. fragilis de novo assembly. Our assembled M. franciscanus molecule was included as an additional reference sequence for obtaining putative mitochondrial reads for the de novo assembly for M. nudus, P. depressus and H. pulcherrimus. De novo contigs over 1000 bp for P. depressus were collected and the partial mitochondrial genome was assembled from two non-overlapping contigs generated with CAP3 . These assembled mitochondrial genomes then provided the template from which we identified and extracted the nucleotide sequence of each gene for each species.
Reference sequences for twelve protein-coding (COI, COII, ND1, ND2, ND3, ND4, ND4L, ND5, ND6, CytB, ATPase6 and ATPase8) and two ribosomal RNA (12S and 16S) genes were identified in each mitochondrial genome based on the annotated nucleotide sequence from S. purpuratus [GenBank: NC 001543]. We identified the start and stop coordinates for each gene location on the de novo assembled mitochondrial genomes for each species by aligning the S. purpuratus gene reference sequence to our de novo assembled mitochondrial genome for each species using BLAT  with DNA sequences translated in six frames to protein and allowing one mismatch in the tile.
The sequence of protein coding mitochondrial genes we identified were aligned using transAlign  using the echinoderm mitochondrial code. The mitochondrial ribosomal RNA (rRNA) 12S and 16S gene sequences were aligned using clustalw2 . Ambiguously aligned regions were identified and removed with Gblocks  with default parameters and no gap positions allowed. Two sequences for each gene were obtained for S. droebachiensis and S. pallidus: one from the GenBank mitochondrial genome sequence and the other from our de novo assembly. The de novo assembled mitochondrial genomes and the predicted gene models were submitted to GenBank [GenBank: KC898196-KC898203].
Nuclear genome nucleotide sequences
Briefly, we defined nuclear genes based on the gene model coordinates defined in SpBase Build 6 based on the Spur v3.1 genome assembly (SpBase.org). The alignments generated from the genomic reads of S. purpuratus were used to represent that species, rather than the reference genome sequence. We discarded any gene model that was not manually annotated, incomplete (i.e. no internal stop codons, missing valid start or stop codon) or were putative in-paralogs (i.e. annotated as paralogs or of overlapping coordinates). Partial alignments of nuclear genes including ambiguous sites (i.e. heterozygote) were constructed from alignments of Illumina paired-end reads of nine species (Kober and Pogson, unpublished data) aligned to the S. purpuratus (Table 1). Paired-end short reads were aligned to Spur v3.1 using SSAHA2. Reads with a mapping quality of <30 were discarded. Nucleotides with a quality score of <25 were ignored. Heterozygotes sites were called when more than one allele was represented by a frequency of >0.20 and >10 valid nucleotides were present from aligned reads. We excluded alignments of greater than 100 unambiguous codons across all nine taxa, leaving 3,180 for analysis.
Additional tools used in the analyses included James Kent’s source tools , Biopython (http://biopython.org), FigTree (http://tree.bio.ed.ac.uk/software/figtree/), R , bedtools , EMBOSS  and the Newick utilites .
Concatenated alignments and tests for positive Darwinian selection
We created a concatenated alignment from the Gblocks masked alignments of mitochondrial genes (“MA”) and a concatenated alignment of mitochondrial genes from fourfold degenerate sites (“M4”) identified by codeml from PAML version 4.5 . Fourfold degenerate sites are identified in codeml as third position sites of a codon which are synonymous across all taxa in the alignment (Ziheng Yang, personal communication). Inference of positive Darwinian selection on mitochondrial protein coding genes was performed with codeml from the PAML package . The M7 and M8 models were used in an LRT test and significance was assessed based on a chi-square distribution with two degrees of freedom.
We identified nuclear genes with alignments of greater than 100 unambiguous codons across all nine taxa. A signal of positive Darwinian selection for a gene was defined as having a q-value < 0.05 based on the likelihood ratio test between models M7 and M8 implemented in codeml as described above for the mtDNA alignments. The most likely ML tree for these genes was used to represent the inferred phylogeny of that gene. We created concatenated alignments of 4-fold degenerate sites identified by codeml for all nuclear genes (“N4A”), nuclear genes with evidence for positive Darwinian selection (“N4S”) and those without any signal of positive selection (“N4N”).
The alignment of mitochondrial gene sequences newly obtained in the present study and the concatenated alignments of the fourfold degenerate sites of nuclear genes have been deposited in TreeBase (http://purl.org/phylo/treebase/phylows/study/TB2:S13990 ).
We used two main approaches to reconstruct the phylogeny of the group using mitochondrial and nuclear genes. The first was a support-based method, which evaluated the individual trees generated for each gene in the nuclear or mitochondrial genome. For the mitochondrial genes, this included an additional analysis accounting for pairing in the RNA secondary structure. The second main approach used the concatenation of sites between all genes in the nuclear or mitochondrial genome, respectively.
The ML tree for each mitochondrial gene was generated using PhyML with the best-fitting nucleotide substitution model, optimized tree topology, branch length and rate parameters, the best tree topology of NNI and SPR search operations, and 10 bootstrap replicates. The best fitting nucleotide substitution model was identified for each mitochondrial gene based on the AICc criterion evaluating 56 models using pmraic version 1.1 (http://www.abc.se/~nylander/mraic/pmraic.html) and PhyML 3.0 v. 20110919 .
For 12S and 16S, a Bayesian Inference (BI) partitioned analysis of RNA paired stem and unpaired loop sites  in were performed using PHASE 2.0 . We predicted a consensus secondary structure from each alignment using RNAalifold . Unpaired regions were analyzed under the general time-reversal REV  and paired stem regions were analyzed under the time reversible seven state RNA7D [92, 93]. We used a discrete-gamma model with six categories to approximate the Γ-distribution with no invariant sites allowed. We performed 1.5 million sampling iterations with a sampling period of 150 and burn-in iterations of 750,000. The remaining parameters followed Hudelot and colleagues .
For each nuclear gene consensus alignment, an ML tree was generated using PhyML using the estimated rate and probability of each class from the data (‘free_rates’), optimized tree topologies, branch length and rate parameters, the best tree topology of NNI and SPR search operations, and 100 bootstrap replicates.
For the concatenated ‘MA, ‘M4’, ‘N4A’, ‘N4S’ and ‘N4N’ alignments, we performed phylogenetic reconstructions using Maximum Parsimony (MP), Maximum Likelihood (ML) and Bayesian Inference (BI) methods in a uniform fashion. The ML analyses of concatenated datasets were performed with PhyML using the estimated rate and probability of each class from the data (‘free_rates’), optimized tree topologies, branch length and rate parameters, the best tree topology of NNI and SPR search operations, and 100 bootstrap replicates. The MP analyses of concatenated datasets were conducted using PAUP* version 4b10  and consisted of heuristic searches with 100,000 replicates of random stepwise addition and TBR branch swapping. Bootstrapping was done using 500,000 ‘fast-bootstrap’ pseudo-replicates. The BI analyses of concatenated datasets were performed using MrBayes v. 3.2.1  assuming a nucleotide substitution model with a gamma-distributed rate variation across sites and a proportion of sites invariable (GTR + I + G). Each dataset was run with four Markov chains for one million generations and sampled every 100 generations. Each analysis was run four times. The first 2500 trees from each run were discarded so that the final consensus tree was based on the combination of accepted trees from each run (a total of 30,004 trees). We tested the convergence between the four runs by examining the potential scale reduction factors (PSRF) produced by the ‘sump’ command in MrBayes. Support for nodes was determined using posterior probabilities (PP, calculated by MrBayes).
Assessment of significance of differences between trees
To determine if there were significant differences between two proposed trees given the data, we performed the SH test  using RELL bootstrap with 1000 replicates  and the HKY85 model of nucleotide substitution in PAUP*. We ascribed significance to a P-value < 0.05 as provided by the output.
Molecular clock and divergence times
A strict molecular-clock was tested against a non-clock model assuming a nucleotide substitution model with gamma-distributed rate variation across sites and a proportion of sites invariable (GTR + I + G) using MrBayes. Each dataset was run with four Markov chains for 500,000 generations to confirm PP convergence. The harmonic means of the likelihoods of the MCMC sampling were used as the marginal model likelihoods. A ratio exceeding 5 was considered very strong evidence favoring one model over the other . A strict clock-enforced BI tree with uniform branch lengths was used to estimate the divergence time of each species with MrBayes. We estimated a rate of substitutions per site per million years of 0.01 ± 0.005 and an exponential distribution with a rate of 0.1 for the tree age prior. One topology was tested with two divergence times. One calibration had a divergence time of 5–12 Ma for the ancestor S. intermedius and S. droebachiensis (Figure 4) based on the fossil record appearance of S. droebachiensis in the mid-Pliocene and identifiable Strongylocentrotus in the late Miocene . A second calibration used a divergence time between the (Strongylocentrotus, Hemicentrotus) clades and the (Paracentrotus, Mesocentrotus) clades of 13–19 Ma from 12S mitochondrial genes calibrated using a reference point estimated from the fossil record . Both of these calibrations remain within the Echinidae–Strongylocentrotidae divergence tentatively estimated to be at 25 Ma. .
We are grateful to Grant Pogson (UCSC) for discussion and comments on the manuscript and to John Pearse (UCSC) for discussion on the ecology and relationships of the group. We sincerely thank two anonymous reviewers for their helpful and constructive feedback. We thank David Haussler for access to computational resources. This manuscript is dedicated to memory of Roy J. Britten (1919–2012). This work was supported by the National Science Foundation (grant number DEB-1011061), the STEPS Foundation and the Myer’s Trust. Funding for Open Access provided by the UC Santa Cruz Open Access Fund.
- Pearse JS: Ecological role of purple Sea urchins. Science. 2006, 314: 940-941. 10.1126/science.1131888.PubMedView Article
- Soyer A: The Pantropheon : or, a history of food and its preparation in ancient times. 1977, New York: Paddington Press
- Wilen JE, Reynolds JA: The Sea urchin fishery: harvesting, processing and the market. Mar Resour Econ. 2000, 15: 115-126. 10.1023/A:1008300206313.View Article
- Berkes F, Hughes TP, Steneck RS, Wilson JA, Bellwood DR, Crona B, Folke C, Gunderson LH, Leslie HM, Norberg J, et al: Ecology. Globalization, roving bandits, and marine resources. Science. 2006, 311: 1557-1558. 10.1126/science.1122804.PubMedView Article
- Pederson T: The sea urchin’s siren. Dev Biol. 2006, 300: 9-14. 10.1016/j.ydbio.2006.10.006.PubMedView Article
- Briggs E, Wessel GM: In the beginning…animal fertilization and sea urchin development. Dev Biol. 2006, 300: 15-26. 10.1016/j.ydbio.2006.07.014.PubMedView Article
- Howard-Ashby M, Materna SC, Brown CT, Tu Q, Oliveri P, Cameron RA, Davidson EH: High regulatory gene use in sea urchin embryogenesis: Implications for bilaterian development and evolution. Dev Biol. 2006, 300: 27-34. 10.1016/j.ydbio.2006.10.016.PubMed CentralPubMedView Article
- Davidson EH: The Sea urchin genome: where will It lead Us?. Science. 2006, 314: 939-940. 10.1126/science.1136252.PubMedView Article
- Britten RJ, Davidson EH: Gene regulation for higher cells: a theory. Science. 1969, 165: 349-357. 10.1126/science.165.3891.349.PubMedView Article
- Britten RJ: Rates of DNA sequence evolution differ between taxonomic groups. Science. 1986, 231: 1393-1398. 10.1126/science.3082006.PubMedView Article
- Britten RJ, Kohne DE: Repeated sequences in DNA. Hundreds of thousands of copies of DNA sequences have been incorporated into the genomes of higher organisms. Science. 1968, 161: 529-540. 10.1126/science.161.3841.529.PubMedView Article
- Oliver TA, Garfield DA, Manier MK, Haygood R, Wray GA, Palumbi SR: Whole-genome positive selection and habitat-driven evolution in a shallow and a deep-Sea urchin. Genome Biol Evol. 2010, 2: 800-814. 10.1093/gbe/evq063.PubMed CentralPubMedView Article
- Pujolar JM, Pogson GH: Positive Darwinian selection in gamete recognition proteins of Strongylocentrotus sea urchins. Mol Ecol. 2011, 20: 4968-4982. 10.1111/j.1365-294X.2011.05336.x.PubMedView Article
- Lessios HA: Speciation genes in free-spawning marine invertebrates. Integr Comp Biol. 2011, 51: 456-465. 10.1093/icb/icr039.PubMedView Article
- Lessios H: Speciation in sea urchins. Echinoderms: DurhamProceedings of the 12th International Echinoderm Conference, 7–11 August 2006, Durham, New Hampshire, USA. Edited by: Harris L, Böttger S, Walker C, Lesser M. 2009, London: CRC Press, 91-101.View Article
- Palumbi S, Grabowsky G, Duda T, Geyer L, Tachino N: Speciation and population genetic structure in tropical pacific sea urchins. Evolution. 1997, 51: 1506-1517. 10.2307/2411203.View Article
- Lessios HA, Lockhart S, Collin R, Sotil G, Sanchez-Jerez P, Zigler KS, Perez AF, Garrido MJ, Geyer LB, Bernardi G, et al: Phylogeography and bindin evolution in Arbacia, a sea urchin genus with an unusual distribution. Mol Ecol. 2012, 21: 130-144. 10.1111/j.1365-294X.2011.05303.x.PubMedView Article
- Miller RJ, Colodey AG: Widespread mass mortalities of the green sea urchin in Nova Scotia, Canada. Mar Biol. 1983, 73: 263-267. 10.1007/BF00392252.View Article
- Iken K, Konar B, Benedetti-Cecchi L, Cruz-Motta JJ, Knowlton A, Pohle G, Mead A, Miloslavich P, Wong M, Trott T, et al: Large-scale spatial distribution patterns of echinoderms in nearshore rocky habitats. PLoS One. 2010, 5: e13845-10.1371/journal.pone.0013845.PubMed CentralPubMedView Article
- Knudson AG: Of sea urchins and worms: development and cancer. Cell Death Differ. 2004, 11: 11-12. 10.1038/sj.cdd.4401295.PubMedView Article
- Tajima K, Silva J, Lawrence J: Disease in Sea Urchins. Edible Sea Urchins: Biology and Ecology. Edited by: Lawrence J. 2007, Oxford, UK: Elsevier. , 167-182. Developments in Aquaculture and Fisheries ScienceView Article
- Andrew Cameron R, Davidson EH: A basal deuterostome genome viewed as a natural experiment. Gene. 2007, 406: 1-7. 10.1016/j.gene.2007.04.031.PubMed CentralPubMedView Article
- Materna SC, Berney K, Cameron RA: The S. purpuratus genome: a comparative perspective. Dev Biol. 2006, 300: 485-495. 10.1016/j.ydbio.2006.09.033.PubMedView Article
- Rast JP, Messier-Solek C: Marine invertebrate genome sequences and our evolving understanding of animal immunity. Biol Bull. 2008, 214: 274-283. 10.2307/25470669.PubMedView Article
- Pearson JC, Lemons D, McGinnis W: Modulating Hox gene functions during animal body patterning. Nat Rev Genet. 2005, 6: 893-904. 10.1038/nrg1726.PubMedView Article
- Lawrence JM: Edible sea urchins : biology and ecology. 2007, Amsterdam ; Boston: Elsevier, 2
- Sodergren E, Weinstock GM, Davidson EH, Cameron RA, Gibbs RA, Angerer RC, Angerer LM, Arnone MI, Burgess DR, Burke RD, et al: The Genome of the Sea Urchin Strongylocentrotus purpuratus. Science. 2006, 314: 941-952.PubMedView Article
- World Echinoidea Database. http://www.marinespecies.org/echinoidea/,
- Janies DA, Voight JR, Daly M: Echinoderm phylogeny including Xyloplax, a progenetic asteroid. Syst Biol. 2011, 60: 420-438. 10.1093/sysbio/syr044.PubMedView Article
- Littlewood DT, Smith AB: A combined morphological and molecular phylogeny for sea urchins (Echinoidea: Echinodermata). Philos Trans R Soc Lond B Biol Sci. 1995, 347: 213-234. 10.1098/rstb.1995.0023.PubMedView Article
- Perseke M, Bernhard D, Fritzsch G, Brümmer F, Stadler PF, Schlegel M: Mitochondrial genome evolution in Ophiuroidea, Echinoidea, and Holothuroidea: insights in phylogenetic relationships of Echinodermata. Mol Phylogenet Evol. 2010, 56: 201-211. 10.1016/j.ympev.2010.01.035.PubMedView Article
- Bazhin A: The sea urchin genus Strongylocentrotus in the seas of Russia: taxonomy and ranges. Echinoderms(Proceedings of the 9th International Echinoderm Conference). Edited by: Mooi R, Telford M. 1996, San Francisco: Balkema, 536-566.
- Tatarenko D, Poltaraus A: Affiliation of sea urchin, Pseudocentrotus depressus, to the family Strongylocentrotidae and description of Mesocentrotus new genus belonging to this group based on DNA-DNA hybridization and comparative morphological data. Zoologicheskii Zhurnal. 1993, 67: 61-72.
- Biermann CH, Kessing BD, Palumbi SR: Phylogeny and development of marine model species: strongylocentrotid sea urchins. Evol Dev. 2003, 5: 360-371. 10.1046/j.1525-142X.2003.03043.x.PubMedView Article
- Lee YH: Molecular phylogenies and divergence times of sea urchin species of Strongylocentrotidae, Echinoida. Mol Biol Evol. 2003, 20: 1211-1221. 10.1093/molbev/msg125.PubMedView Article
- Biermann CH: The molecular evolution of sperm bindin in six species of sea urchins (Echinoida : Strongylocentrotidae). Mol Biol Evol. 1998, 15: 1761-1771. 10.1093/oxfordjournals.molbev.a025902.PubMedView Article
- Addison JA, Pogson GH: Multiple gene genealogies reveal asymmetrical hybridization and introgression among strongylocentrotid sea urchins. Mol Ecol. 2009, 18: 1239-1251. 10.1111/j.1365-294X.2009.04094.x.PubMedView Article
- Lessios HA: Reproductive isolation between species of sea urchins. Bull Mar Sci. 2007, 81: 191-208.
- Palumbi SR: All males are not created equal: Fertility differences depend on gamete recognition polymorphisms in sea urchins. Proc Natl Acad Sci. 1999, 96: 12632-12637. 10.1073/pnas.96.22.12632.PubMed CentralPubMedView Article
- Zigler KS, McCartney MA, Levitan DR, Lessios HA: Sea urchin bindin divergence predicts gamete compatibility. Evolution. 2005, 59: 2399-2404.PubMedView Article
- Takahata N, Slatkin M: Genealogy of neutral genes in two partially isolated populations. Theor Popul Biol. 1990, 38: 331-350. 10.1016/0040-5809(90)90018-Q.PubMedView Article
- Maddison WP: Gene trees in species trees. Syst Biol. 1997, 46: 523-536. 10.1093/sysbio/46.3.523.View Article
- Maddison WP, Knowles LL: Inferring phylogeny despite incomplete lineage sorting. Syst Biol. 2006, 55: 21-30. 10.1080/10635150500354928.PubMedView Article
- Avise J, Ball R: Principles of genealogical concordance in species concepts and biological taxonomy. Oxford surveys in evolutionary biology. Edited by: Futuyma D, Antonovics J. 1990, Oxford, U.K: Oxford Univ. Press, 45-67.
- Avise JC, Wollenberg K: Phylogenetics and the origin of species. Proc Natl Acad Sci. 1997, 94: 7748-7755. 10.1073/pnas.94.15.7748.PubMed CentralPubMedView Article
- Rannala B, Yang Z: Phylogenetic inference using whole genomes. Annu Rev Genom Hum Genet. 2008, 9: 217-231. 10.1146/annurev.genom.9.081307.164407.View Article
- Karolchik D, Kuhn RM, Baertsch R, Barber GP, Clawson H, Diekhans M, Giardine B, Harte RA, Hinrichs AS, Hsu F, et al: The UCSC genome browser database: 2008 update. Nulceic Acids Res. 2008, 36: D773-D779.View Article
- Cameron RA, Samanta M, Yuan A, He D, Davidson E: SpBase: the sea urchin genome database and web site. Nucleic Acids Res. 2009, 37: D750-D754. 10.1093/nar/gkn887.PubMed CentralPubMedView Article
- Perseke M, Fritzsch G, Ramsch K, Bernt M, Merkle D, Middendorf M, Bernhard D, Stadler PF, Schlegel M: Evolution of mitochondrial gene orders in echinoderms. Mol Phylogenet Evol. 2008, 47: 855-864. 10.1016/j.ympev.2007.11.034.PubMedView Article
- Pamilo P, Nei M: Relationships between gene trees and species trees. Mol Biol Evol. 1988, 5: 568-583.PubMed
- Kober KM, Nichols SA: On the phylogenetic relationships of hadromerid and poecilosclerid sponges. J Mar Biol Assoc U K. 2007, 87: 1585-1598.View Article
- Telford MJ, Wise MJ, Gowri-Shankar V: Consideration of RNA secondary structure significantly improves likelihood-based estimates of phylogeny: examples from the bilateria. Mol Biol Evol. 2005, 22: 1129-1136. 10.1093/molbev/msi099.PubMedView Article
- Erpenbeck D, Nichols SA, Voigt O, Dohrmann M, Degnan BM, Hooper JN, Worheide G: Phylogenetic analyses under secondary structure-specific substitution models outperform traditional approaches - case studies with diploblast LSU. J Mol Evol. 2007, 64 (5): 543-57. 10.1007/s00239-006-0146-3. Epub 2007 Apr 24PubMedView Article
- Kumar S, Filipski AJ, Battistuzzi FU, Kosakovsky Pond SL, Tamura K: Statistics and truth in phylogenomics. Mol Biol Evol. 2011, 39: 457-472.
- Yang Z, Rannala B: Molecular phylogenetics: principles and practice. Nat Rev Genet. 2012, 13: 303-314. 10.1038/nrg3186.PubMedView Article
- Song H, Sheffield NC, Cameron SL, Miller KB, Whiting MF: When phylogenetic assumptions are violated: base compositional heterogeneity and among-site rate variation in beetle mitochondrial phylogenomics. Syst Entomol. 2010, 35: 429-448. 10.1111/j.1365-3113.2009.00517.x.View Article
- Boore JL: Animal mitochondrial genomes. Nucleic Acids Res. 1999, 27: 1767-1780. 10.1093/nar/27.8.1767.PubMed CentralPubMedView Article
- Shaw KL: Conflict between nuclear and mitochondrial DNA phylogenies of a recent species radiation: What mtDNA reveals and conceals about modes of speciation in Hawaiian crickets. Proc Natl Acad Sci. 2002, 99: 16122-16127. 10.1073/pnas.242585899.PubMed CentralPubMedView Article
- Vinnikova V, Drozdov A: The Ultrastructure of Spines in Sea Urchins of the Family Strongylocentrotidae. Biol Bull. 2011, 38: 861-867. 10.1134/S1062359011090093.View Article
- Palumbi SR: Marine speciation on a small planet. Trends Ecol Evol. 1992, 7: 114-118. 10.1016/0169-5347(92)90144-Z.PubMedView Article
- Zigler KS, Lessios HA: Speciation on the coasts of the new world: phylogeography and the evolution of bindin in the sea urchin genus Lytechinus. Evolution. 2004, 58: 1225-1241.PubMedView Article
- Bernardi G, Findley L, Rocha-Olivares A: Vicariance and dispersal across Baja California in disjunct marine fish populations. Evolution. 2003, 57: 1599-1609.PubMedView Article
- Lindstrom SC: The bering strait connection: dispersal and speciation in boreal macroalgae. J Biogeogr. 2001, 28: 243-251. 10.1046/j.1365-2699.2001.00529.x.View Article
- Haq BU, Hardenbol J, Vail PR: Chronology of fluctuating Sea levels since the triassic. Science. 1987, 235: 1156-1167. 10.1126/science.235.4793.1156.PubMedView Article
- Vermeij GJ: Geographical restriction as a guide to the causes of extinction: the case of the cold northern oceans during the neogene. Paleobiology. 1989, 15: 335-356.
- Jensen M: The Strongylocentrotidae (Echinoidea), a morphological and systematic study. Sarsia. 1974, 57: 113-148.
- Durham J, MacNeil F: Cenozoic migrations of marine invertebrates through the Bearing Strait region. The Bering Land Bridge. Edited by: Hopkins D. 1967, Palo Alto, CA: Stanford University Press, 326-349.
- Smith A: Phylogenetic relationship, divergence times, and rates of molecular evolution for camarodont Sea urchins. Mol Biol Evol. 1988, 5: 345-365.
- Gladenkov AY, Oleinik AE, Marincovich L, Barinov KB: A refined age for the earliest opening of Bering Strait. Palaeogeogr Palaeoclimatol Palaeoecol. 2002, 183: 321-328. 10.1016/S0031-0182(02)00249-3.View Article
- Bazhin AG: Distribution patterns of strongylocentrotus Sea urchins along the coast of eastern kamchatka. Russ J Mar Biol. 2002, 28: 298-307. 10.1023/A:1020951316734.View Article
- Palumbi SR, Lessios HA: Evolutionary animation: How do molecular phylogenies compare to Mayr’s reconstruction of speciation patterns in the sea?. Proc Natl Acad Sci USA. 2005, 102: 6566-6572. 10.1073/pnas.0501806102.PubMed CentralPubMedView Article
- Kroh A, Smith AB: The phylogeny and classification of post-Palaeozoic echinoids. J Syst Palaeontol. 2010, 8: 147-212. 10.1080/14772011003603556.View Article
- Ning Z, Cox AJ, Mullikin JC: SSAHA: a fast search method for large DNA databases. Genome Res. 2001, 11: 1725-1729. 10.1101/gr.194201.PubMed CentralPubMedView Article
- Zerbino DR, Birney E: Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 2008, 18: 821-829. 10.1101/gr.074492.107.PubMed CentralPubMedView Article
- Zerbino DR: Using the Velvet de novo assembler for short-read sequencing technologies. Curr Protoc Bioinformatics. 2010, 11: 11-15.
- Huang X, Madan A: CAP3: A DNA sequence assembly program. Genome Res. 1999, 9: 868-877. 10.1101/gr.9.9.868.PubMed CentralPubMedView Article
- Kent WJ: BLAT–-The BLAST-Like Alignment Tool. Genome Res. 2002, 12: 656-664.PubMed CentralPubMedView Article
- Bininda-Emonds OR: transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences. BMC Bioinforma. 2005, 6: 156-10.1186/1471-2105-6-156.View Article
- Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, et al: Clustal W and Clustal X version 2.0. Bioinformatics. 2007, 23: 2947-2948. 10.1093/bioinformatics/btm404.PubMedView Article
- Castresana J: Selection of conserved blocks from multiple alignments for their Use in phylogenetic analysis. Mol Biol Evol. 2000, 17: 540-552. 10.1093/oxfordjournals.molbev.a026334.PubMedView Article
- Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D: The human genome browser at UCSC. Genome Res. 2002, 12: 996-1006.PubMed CentralPubMedView Article
- Ihaka R, Gentleman R: R: a language for data analysis and graphics. J Comput Graph Stat. 1996, 5: 299-314.
- Quinlan AR, Hall IM: BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010, 26: 841-842. 10.1093/bioinformatics/btq033.PubMed CentralPubMedView Article
- Rice P, Longden I, Bleasby A: EMBOSS: the european molecular biology open software suite. Trends Genet. 2000, 16: 276-277. 10.1016/S0168-9525(00)02024-2.PubMedView Article
- Junier T, Zdobnov EM: The Newick utilities: high-throughput phylogenetic tree processing in the Unix shell. Bioinformatics. 2010, 26: 1669-1670. 10.1093/bioinformatics/btq243.PubMed CentralPubMedView Article
- Yang Z: PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007, 24: 1586-1591. 10.1093/molbev/msm088.PubMedView Article
- Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O: New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 2010, 59: 307-321. 10.1093/sysbio/syq010.PubMedView Article
- Yang Z: Among-site rate variation and its impact on phylogenetic analyses. Trends Ecol Evol. 1996, 11: 367-372. 10.1016/0169-5347(96)10041-0.PubMedView Article
- Gowri-Shankar V, Rattray M: On the correlation between composition and site-specific evolutionary rate: implications for phylogenetic inference. Mol Biol Evol. 2006, 23: 352-364.PubMedView Article
- Hofacker IL, Stadler PF: Automatic detection of conserved base pairing patterns in RNA virus genomes. Comput Chem. 1999, 23: 401-414. 10.1016/S0097-8485(99)00013-3.PubMedView Article
- Tavare S: Some probabilistic and statistical problems in the analysis of DNA sequences. Lectures Math Life Sci. 1986, 17: 57-86.
- Tillier ERM, Collins RA: High apparent rate of simultaneous compensatory base-pair substitutions in ribosomal RNA. Genetics. 1998, 148: 1993-2002.PubMed CentralPubMed
- Savill NJ, Hoyle DC, Higgs PG: RNA sequence evolution with secondary structure constraints: comparison of substitution rate models using maximum-likelihood methods. Genetics. 2001, 157: 399-411.PubMed CentralPubMed
- Hudelot C, Gowri-Shankar V, Jow H, Rattray M, Higgs PG: RNA-based phylogenetic methods: application to mammalian mitochondrial RNA sequences. Mol Phylogenet Evol. 2003, 28: 241-252. 10.1016/S1055-7903(03)00061-7.PubMedView Article
- Swofford DL: PAUP*: Phylogenetic Analysis Using Parsimony (and Other Methods) 4.0 Beta. 2002, Sunderland, MA: Sinauer
- Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP: MrBayes 3.2: efficient bayesian phylogenetic inference and model choice across a large model space. Syst Biol. 2012, 61: 539-542. 10.1093/sysbio/sys029.PubMed CentralPubMedView Article
- Shimodaira H, Hasegawa M: Multiple comparisons of Log-likelihoods with applications to phylogenetic inference. Mol Biol Evol. 1999, 16: 1114-10.1093/oxfordjournals.molbev.a026201.View Article
- Kishino H, Miyata T, Hasegawa M: Maximum likelihood inference of protein phylogeny and the origin of chloroplasts. J Mol Evol. 1990, 31: 151-160. 10.1007/BF02109483.View Article
- Kass RE, Raftery AE: Bayes factors. J Am Stat Assoc. 1995, 90: 773-795. 10.1080/01621459.1995.10476572.View Article
- Smith AB, Pisani D, Mackenzie-Dodds JA, Stockley B, Webster BL, Littlewood DTJ: Testing the molecular clock: molecular and paleontological estimates of divergence times in the echinoidea (Echinodermata). Mol Biol Evol. 2006, 23: 1832-1851. 10.1093/molbev/msl039.PubMedView Article
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.