Phylogenomics of strongylocentrotid sea urchins
© Kober and Bernardi; licensee BioMed Central Ltd. 2013
Received: 19 December 2012
Accepted: 9 April 2013
Published: 23 April 2013
Skip to main content
© Kober and Bernardi; licensee BioMed Central Ltd. 2013
Received: 19 December 2012
Accepted: 9 April 2013
Published: 23 April 2013
Strongylocentrotid sea urchins have a long tradition as model organisms for studying many fundamental processes in biology including fertilization, embryology, development and genome regulation but the phylogenetic relationships of the group remain largely unresolved. Although the differing isolating mechanisms of vicariance and rapidly evolving gamete recognition proteins have been proposed, a stable and robust phylogeny is unavailable.
We used a phylogenomic approach with mitochondrial and nuclear genes taking advantage of the whole-genome sequencing of nine species in the group to establish a stable (i.e. concordance in tree topology among multiple lies of evidence) and robust (i.e. high nodal support) phylogenetic hypothesis for the family Strongylocentrotidae. We generated eight draft mitochondrial genome assemblies and obtained 13 complete mitochondrial genes for each species. Consistent with previous studies, mitochondrial sequences failed to provide a reliable phylogeny. In contrast, we obtained a very well-supported phylogeny from 2301 nuclear genes without evidence of positive Darwinian selection both from the majority of most-likely gene trees and the concatenated fourfold degenerate sites: ((P. depressus, (M. nudus, M. franciscanus), (H. pulcherrimus, (S. purpuratus, (S. fragilis, (S. pallidus, (S. droebachiensis, S. intermedius)). This phylogeny was consistent with a single invasion of deep-water environments followed by a holarctic expansion by Strongylocentrotus. Divergence times for each species estimated with reference to the divergence times between the two major clades of the group suggest a correspondence in the timing with the opening of the Bering Strait and the invasion of the holarctic regions.
Nuclear genome data contains phylogenetic signal informative for understanding the evolutionary history of this group. However, mitochondrial genome data does not. Vicariance can explain major patterns observed in the phylogeny. Other isolating mechanisms are appropriate to explore in this system to help explain divergence patterns not well supported by vicariance, such as the effects of rapidly evolving gamete recognition proteins on isolating populations. Our findings of a stable and robust phylogeny, with the increase in mitochondrial and nuclear comparative genomic data, provide a system in which we can enhance our understanding of molecular evolution and adaptation in this group of sea urchins.
Sea urchins are benthic marine echinoderms distributed across all of the world’s oceans . Despite their unusual appearance, they have been a component of human diets since at least the ancient Greeks  and are still experiencing a vigorous fisheries industry today [3, 4]. The contribution of sea urchins to our understanding of many aspects of basic biology cannot be understated . Sea urchins are a primary research model for embryology , fertilization , bilaterian development , genomic regulatory systems [8, 9] and our basic understanding of fundamental properties of genomes [10, 11]. They provide broadly useful natural systems in which we investigate central evolutionary questions of natural selection [12, 13], reproductive isolation  and speciation [15–17] and ecological questions of population responses to disease  and global scale habitat distribution patterns . Indeed, our first coherent view of cancer was provided by studying embryonic development in sea urchins  and origins of the phagocytic theory, a key process in the idea of an immune system, were based on observations of the movement and engulfing of foreign particles in echinoderm tissue .
The location of Echinodermata as an early branch in the deuterostome phylogeny serves as an important node with which to infer ancestral states of vertebrate biology [22, 23]. This placement is useful for addressing broad reaching questions on the origins and evolution of animal immunity  and development . Among sea urchins, the family Strongylocentrotidae is arguably the best studied group  and includes the well-annotated genome of the representative model species Strongylocentrotus purpuratus. The Strongylocentrotidae are abundant marine echinoids with members living in the northern Pacific, northern Atlantic and the holarctic regions . The group is comprised of four genera: Strongylocentrotus, Hemicentrotus, Pseudocentrotus and Mesocentrotus.
The phylogenetic position of strongylocentrotids relative to other sea urchins is well understood [29–31]. The genus Strongylocentrotus comprises five species: S. purpuratus, S. pallidus, S. droebachiensis, S. intermedius, S. fragilis and S. polyacanthus. Strongylocentrotus djakonovi has been assigned as a junior synonym for S. droebachiensis, S. pulchellus a junior synonym for S. intermedius and A. fragilis is a junior synonym for S. fragilis. Mesocentrotus comprises M. franciscanus (nee S. franciscanus) and M. nudus (nee S. nudus). Hemicentrotus and Pseudocentrotus are monotypic with H. pulcherrimus and P. depressus, respectively.
Recent mitochondrial molecular phylogenies have identified two clades, one consisting of members of Strongylocentrotus and Hemicentrotus and the other consisting of Mesocentrotus and Pseudocentrotus[34–36]. However, the branching orders within the Strongylocentrotus and Hemicentrotus clades are largely incongruent. Specifically, the relationships of S. intermedius, S. droebachiensis and S. pallidus, the relative placements of S. purpuratus and S. fragilis, and the positions of S. intermedius and H. pulcherrimus are unresolved.
This issue is outstanding because an accurate and robust phylogeny is essential for correctly interpreting the broad range of contemporary biological research being performed on this group. The unresolved phylogenetic relationships among strongylocentrotids underscores the problem of using few loci in a group with large effective population sizes and complex histories that may involve hybridization . This is particularly relevant in this group since sea urchins are broadcast spawners, where fertilization occurs in the water column. Many strongylocentrotid species live in sympatry, display overlapping spawning seasons, and have unequal gametic compatibilities . The fertilization efficiencies of eggs and sperm between species are often asymmetric and gamete recognition loci are thought to play an important role in post-mating pre-zygotic isolation . Selection on components of gamete interactions are thought to be particularly important early on in the speciation process. In Strongylocentrotidae, however, gametes from distantly related sympatric M. franciscanus and S. purpuratus readily fertilize in the lab, but hybrids are seldom seen in nature and no introgression has been observed between these two species . Therefore, the rapid evolution of gamete recognition proteins is of particular interest in this group and is under intense study [13, 14, 40]. An accurate phylogeny is integral to this work.
The combined action of incomplete lineage sorting and introgression of genes between species are known to greatly complicate the resolution of species trees [41–43] weakening single loci phylogenetic inferences . Congruence among multiple genes and morphology has been suggested as a robust approach to reconstruct a reliable phylogeny . Multi-locus analyses at genome-wide scales offer a remarkable opportunity for powerful improvements in molecular phylogenetic inference . The advent of next-generation sequencing and genome assembly makes such analyses possible. A high quality, well-annotated, draft genome for S. purpuratus is available [27, 47, 48] and high coverage, whole-genome sequencing, has been completed for nine of the ten species comprising the family Stron gylocentrotidae (Kober and G. H. Pogson, unpublished data).
The objective of this study was to establish a strong phylogenetic hypothesis for the family Strongylocentrotidae based on alignments of nuclear and mitochondrial genes from 9 (out of the 10) species of the family. The development of a robust and stable phylogeny in this group will provide essential comparative tools to a vast group of scientists including those interested in ecology, evolution, developmental biology and physiology.
Genera and species of sea urchins used in this study
Depth range (m)
Mitochondrial genome size
S. purpuratus (Stimpson)
S. pallidus (Sars)
S. droebachiensis (O. F. Müller)
S. intermedius (A. Agassiz)
S. fragilis (Jackson)
M. franciscanus (A. Agassiz)
M. nudus (A. Agassiz)
H. pulcherrimus (A. Agassiz)
P. depressus (A. Agassiz)
The data partitions used for phylogenetic analysis
We found no evidence for positive selection acting on any of the protein coding mitochondrial genes tested (ATPase6, COI, COII, CytB, ND1, ND2, ND3, ND4, ND4L). Internal stop codons were found for at least one site in the alignments for ND4 and ND6 and the approximate likelihood calculation in ATPase8 was unreliable. For these reasons we did not test these three genes for evidence of positive selection.
The BI 50% majority-rule consensus phylogram of the stationary tree inferred from fourfold degenerate sites of nuclear genes without selection (‘N4Ds tree’) had complete (BI PP = 1, BSML = BSMP = 100) or very strong support from all three methods at all nodes with the N4N and N4A datasets except at the divergence of S. fragilis (BSMP = 74 and BSMP = 69, respectively). As such, we observed an effect on the phylogenetic inference when including genes found to be under positive selection. Indeed, the tree obtained from the N4S data produced a similar topography except S. purpuratus and S. fragilis branching locations were swapped, with S. fragilis the earlier branching of the two with low support for the node (not shown).
Comparing the mtDNA and nuclear results, we observed very strong support for H. pulcherrimus sister to the clade containing S. fragilis, S. purpuratus, S. intermedius, S. pallidus and S. droebachiensis across our analysis of concatenated datasets. We also found very strong support for a monophyletic clade of S. intermedius, S. droebachiensis and S. pallidus across our analysis of concatenated datasets. However, MA and M4 datasets produced S. pallidus sister to S. droebachiensis, but without strong support. In contrast, the N4A, N4S and N4N concatenated datasets found very strong support S. intermedius as sister to S. pallidus.
In all mitochondrial ML gene trees except ATPase8, 12S and ND6, the S. pallidus and S. droebachiensis individuals from GenBank and our de novo assemblies resolved as sister taxa as expected. However, the putative outgroup, P. lividus, consistently produced a very long branch and that shifted to different locations among the ML gene trees. Ignoring a P. lividus root, the individual mitochondrial ML gene trees topologies were consistent in resolving Clade M and Clade S (Additional file 2: Figures S2, Additional file 3: Figure S3, Additional file 4: Figure S4). However, the branching order within these clades was inconsistent and in conflict with the nuclear data. We found contradictory topologies for the relative positions of S. fragilis and S. purpuratus among mitochondrial genes trees. H. pulcherrimus was placed sister to Strongylocentrotus species in all gene trees except ATPase6, ND4L and ND6. No single mitochondrial gene returned a topology corroborating with the N4Ds tree.
The locations of S. purpuratus and S. fragilis were discordant between the MP method and BI and ML methods in the MA and M4 datasets. BI and ML trees had these two species sister to S. intermedius, S. pallidus and S. droebachiensis (Figure 3), whereas the MP method has S. purpuratus branching earliest, then S. fragilis and then the S. intermedius, S. pallidus and S. droebachiensis observed with the nuclear concatenated datasets (not shown). The monophyly of a S. intermedius, S. pallidus and S. droebachiensis clade was recovered in both MA and M4, but we found conflicting support for a sister relationship between S. pallidus and S. droebachiensis versus S. pallidus and S. intermedius (Figure 3).
The 12S sequences used by Lee (2003) were collected, aligned, and used to construct an ML tree as described above for rRNA mitochondrial genes. The proposed relationship shown in Figure 2 of Lee (2003) was found to be no better at explaining these data (P > 0.505) than our proposed species tree (Figure 1). The proposed tree of Lee (2003) differed significantly from our proposed species tree (Figure 1) in 762 of 2301 nuclear genes tested. The N4N tree was significantly better at explaining the data for 685 genes, while Lee (2003) Figure 2 was significantly better for 77. When we included the 12S sequences of Lee (2003) with our 12S data, our alignment and ML method produced a different tree (Additional file 5: Figure S5). Here, H. pulcherrimus and S. nudus individuals resolved as sister taxa, but S. intermedius falls in sister to the S. pallidus sequences rather than with the S. intermedius of Lee (2003). The S. intermedius sequence of Lee (2003) falls sister to S. fragilis in a clade with S. purpuratus.
The combined dataset (COI, COII, tRNA-Lys, ATPase8 and ATPase6) of Biermann et al. (2003) was collected and concatenated after removing S. polyacanthus. An alignment was generated and ML trees reconstructed as described above for the protein-coding mitochondrial genes. The proposed relationship in Figure 2 of Biermann et al. (2003) was found to be significantly better at explaining these data (P < 0.001) than our proposed species tree (Figure 1). However, the proposed tree of Figure 2 of Biermann et al. (2003) differed significantly from our proposed species tree (Figure 1) in 1865 of 2301 nuclear genes tested. Our species tree was significantly better at explaining the data for 1855 genes, while Figure 2 of Biermann et al. (2003) was significantly better for 10 genes.
Using RNA secondary structure in phylogeny reconstruction has been shown to have significant utility in resolving relationships in metazoan taxa [51–53]. However, our results from 12S and 16S mixed model and un-partitioned analyses produced very similar trees (not shown).
Divergence time estimates
12S rDNA rate estimate
12S rDNA Lee (2003)
Numerous processes, including horizontal gene transfer (HGT), gene duplication, introgressive hybridization, incomplete lineage sorting and natural selection may all contribute to gene tree histories that do not represent the true species tree , resulting in gene trees that do not necessarily reflect species trees ., In this group of sea urchins, introgression has been documented between some taxa , and of the primary mechanisms of HGT, the possibility of HGT by viral transfer exists but is likely to be extremely rare (G. H. Pogson, personal communication). Despite these factors, integrating information from large numbers of independent loci offers considerable promise to generate robust phylogenies in situations where small number of loci failed to do so , although care must be taken to assess the robustness of results in the proper biological context . The two multi-locus molecular phylogenies previously published for Strongylocentrotidae provided strong support for the composition of the major groups, but were unable to resolve the relationships of the species [34, 35].
The variation in the evolutionary histories of multiple independent genes are typically addressed with either data partitions with different nucleotide substitution models, or with mixture models allowing for random variation between sites . Recent phylogenomic work has demonstrated the potential poor performance of standard phylogenetic methods due to among-site rate variation, causing shifts in the phylogenetic positions of terminal taxa in well-supported trees generated from different models of nucleotide substitution or by different methods . Our analyses evaluated both the gene level support-based evidence and a concatenated site approach including the implicit model of nucleotide evolution in MP, an explicit model of GTR + I + G with BI and a mixture model allowing for rate variation among sites under ML. Our results did not find discordance between the topologies inferred between methods, or with the nodal support based on the different usage of nucleotide substitution models between the ML, BI and MP analysis of nuclear fourfold degenerate sites of genes without evidence of positive selection. We take the complete concordance between such disparate methods and the morphological data as strong support for the biological significance of these proposed species relationships.
Mitochondrial genes offer potential utility as molecular markers for reconstructing phylogenetic relationships, as the order and number of mitochondrial genes are typically conserved over large phylogenetic distances and orthology is clear . However, mitochondrial phylogenies may be misleading , a fact we find in our data best represented by the incongruence and limited node support between BI, MP and MP methods with the concatenated mitochondrial MA and M4 data. Our results using mitochondrial genes, and those of previous studies in this group, produce conflicting topologies and do not demonstrate clear or consistent signals of phylogenetic relationships.
For this study, we have chosen to follow the taxon details of the World Echinoidea Database  and acknowledge the Mesocentrotus genus and identify S. fragilis (nee Allocentrotus fragilis). Indeed, the molecular evidence from this study strongly support the membership of M. franciscanus and M. nudus to a group sister to Strongylocentrotus. Our results confirm the two major clades of Strongylocentrotus and Mesocentrotus previously identified by mitochondrial gene studies [34, 35]. Clade S forms a monophyletic Strongylocentrotus and Hemicentrotus group supporting the inclusion of S. fragilis. Clade M conforms to the proposed Mesocentrotus group , including M. franciscanus, M. nudus and P. depressus. The molecular distinction between Strongylocentrotus and Mesocentrotus taxa is also supported by recent morphological classifications of the cross-section of the ultrastructure of primary spines .
Previous studies suggested H. pulcherrimus was an early branching member of clade S [34, 35]. Our data support H. pulcherrimus as an early branching member of this clade , rather than sister to S. intermedius.
Population disjunctions, such as vicariant events and limitations to dispersal, are important first steps towards allopatric speciation . Vicariant events due to sea level changes are well documented across the Isthmus of Panama , Baja California  and the Bering Strait . Sea levels experienced a severe decline at 10.5 Ma with regular fluctuations occurring since . This fluctuation broadly corresponds to the “Vicariant Pacific Pattern” (VPP), where amphi-Pacific taxa gave rise to eastern and western Pacific forms  during the Neogene.
Parsimoniously, our phylogeny suggests a western Pacific (WP) last common ancestor living in shallow, warmer waters followed by an expansion into the WP ancestor of the two major clades. Descendants of each clade experienced two separate eastern Pacific (EP) invasions (S. purpuratus and M. franciscanus). In Clade S, a single deep, cold-water invasion at the ancestor of S. fragilis and S. pallidus occurred, with the LCA of S. pallidus, S. droebachiensis and S. intermedius invading the Arctic and becoming holarctic (HA) in range. Surprisingly, our data provide strong support for a sister grouping of S. droebachiensis and S. intermedius. This suggests that S. intermedius has invaded the WP and moved into shallower and warmer water.
The sister species of M. nudus and M franciscanus show disjunct distributions, with one species inhabiting the northwest and the other the northeast Pacific, respectively. The estimated divergence time between these two species of 2.7-4.8 Ma using the fossil record calibration is more recent than the 5.7-8.1 Ma estimated from 12S mitochondrial DNA but still corresponds with the sea level fluctuations and fit with the VPP . This estimated time of divergence also corresponds to the split between P. depressus and the ancestor of M. nudus and M. franciscanus, suggesting a corresponding event occurring the northern Pacific. In the other sister pair, S. droebachiensis inhabits the holarctic region and overlaps the distribution of it’s sister, S. intermedius. Extant sister species, however, may not be true sisters as other lineages may be extinct. In addition, current ranges many not reflect historical ones. It is not clear from this phylogeny as to whether these two species likely diverged through allopatric or sympatric means . Interestingly, this habitat overlap becomes marginal if S. pulchellus is a distinct species, and not a synonym of S. droebachiensis. Major morphological work on the group found S. pulchellus agrees with S. intermedius in all morphological structures examined except for the tooth skeleton . Future molecular and morphological work will certainly shed light on this divergence.
The members of Clade S show rapid evolutionary divergence along with habitat expansions and changes following a split from a WP ancestor, conforming to the VPP. Isolated spines fossil evidence places undefined Strongylocentrotus in the northeast Pacific in the late Miocene  though the reliability of these identifications remain suspect . The opening of the Bering Strait would provide the access into arctic habitats necessary for a holarctic expansion [34, 35]. The Bering Strait opened at the end of the Miocene, 5.32 Ma , overlapping the early bounds of our estimated divergence time of 3.1-5.6 Ma from fossil calibration for the clade containing S. purpuratus, S. fragilis, S. pallidus, S. droebachiensis and S. intermedius. Furthermore, distinct S. purpuratus, S. droebachiensis, and M. franciscanus fossils are seen in California formations of the middle Pliocene and S. droebachiensis fossils reached western Europe by the late Pliocene , supporting a late Miocene, early Pliocene divergence.
The Strongylocentrotidae has two deep-water species, S. fragilis and S. pallidus. Our inferred phylogeny provides evidence for a single radiation into the deep-water habitat. S. pallidus is typically found in lower depths . S. droebachiensis is also know to reach depths of 1150 m, but is typically found in the shallow sub tidal zone from 0 to 50 m [26, 70]. These species coexist in the same geographic range with S. droebachiensis in the shallow and S. pallidus in the deep habitats. Our tree suggests that S. pallidus and S. fragilis share a recent common ancestor from a single deep-water invasion and as such may share adaptations to this environment. Indeed, adaptations for the deep-water habitat invasions of S. fragilis have been proposed based on genome-wide comparative analysis of three species (not including S. pallidus or S. droebachiensis) . However, gamete production declines with depth, and the very deep-water individuals of S. fragilis aren’t expected to be spawning (John Pearse, personal communication). If that is the case, then natural selection may not reach the very deep-water habitats and deep-water adaptations would be based on selection pressures found at the more shallow depths. With these genome-wide comparative data, future research can test for molecular adaptations along the branch leading to the ancestor of these taxa as well identify adaptations unique to the branches leading to these extant taxa.
Vicariance is insufficient to completely explain our observed pattern of divergence between these taxa, and much work has been done in this group to explore the effects of rapidly evolving gamete recognition proteins on isolating populations [14, 36, 39, 71]. However, the putative egg receptor protein, EBR1, for the sperm bindin gamete recognition protein in sea urchins is prohibitively long for traditional sequencing methods . The phylogenetic relationships inferred from our extended genomic sampling offer a unique opportunity to expand hypothesis of molecular evolution and adaptation in this group of sea urchins.
This phylogeny was consistent with a single invasion of deep-water environments followed by a holarctic expansion by Strongylocentrotus. Divergence times for each species estimated with reference to the divergence times between the two major clades of the group suggest a correspondence in the timing with the opening of the Bering Strait and the invasion of the holarctic regions. However, vicariance is insufficient to completely explain the divergence between these taxa and other isolating mechanisms are appropriate to explore in this system. In particular, much work has been done to explore the effects of rapidly evolving gamete recognition proteins on isolating populations in sea urchins. The phylogenetic relationships inferred in this study and the comparative genomic data now available provide a unique opportunity to explore hypothesis of molecular evolution and adaptation in natural populations.
Class Echinoidea has been found to be monophyletic in Echinoderm phylogenies inferred from both mitochondrial DNA (mtDNA) sequences and morphological data [29–31]. The family Strongylocentrotidae consistently resolves as sister to Paracentrotus lividus, Echinocardium cordatum and Arbacia lixula in both molecular and morphological phylogenies [31, 72].
The complete mitochondrial genomes available for six urchin species were obtained from GenBank (Strongylocentrotus droebachensis NC 009940; Strongylocentrotus purpuratus, NC 001453; Strongylocentrotus pallidus, NC 009941; Paracentrotus lividus, NC 001572; Arbacia lixula, NC 001770; Echinocardium cordatum, FN562581.1).
The published sequences for all nine strongylocentrotid species were collected for regions covering COI, COII, tRNA-Lys, ATPase8 and ATPase6 [GenBank:AY220998-AY221021] . The published sequences for S. intermedius, S. nudus and H. pulcherrimus were collected for COI [GenBank:AF525455, GenBank:AF525452 and GenBank:AF525453, respectively], NDI [GenBank:AF525454, GenBank:AF525450 and GenBank:AF525451, respectively] and 12S [GenBank:AF525769, GenBank:AF525767 and GenBank:AF525768, respectively) .
We supplemented the mitochondrial genomes collected from GenBank with de novo assemblies from Illumina paired-end reads of genomic. Reference sequences for twelve protein-coding (COI, COII, ND1, ND2, ND3, ND4, ND4L, ND5, ND6, CytB, ATPase6 and ATPase8) and two ribosomal RNA (12S and 16S) genes were identified in each mitochondrial genome based on the annotated nucleotide sequence from S. purpuratus [GenBank:NC001543].
Mitochondrial genomic sequences were assembled de novo from Illumina paired-end reads of genomic DNA (Kober and Pogson, unpublished data). First, to obtain a set of putative mitochondrial DNA reads, all reads for each species were aligned to all six GenBank mitochondrial genome sequence with SSAHA2  using parameters ‘-solexa -skip 6 -pair 20,3000’. All reads that mapped to any of the six molecules with a mapping quality of greater than 5 were collected for each species.
The collected reads were used as input for de novo assembly of the molecule for each species using velvet . Hash size values between 11 and 99 were evaluated using VelvetOptimiser.pl  and optimized for the total number of base pairs in large contigs. For S. fragilis, previously sequenced 454 reads  were also aligned to S. purpuratus [GenBank:NC 001453] with SSAHA2. 454 reads aligning with a mapping quality >5 were also included with the Illumina paired end reads as input to the S. fragilis de novo assembly. Our assembled M. franciscanus molecule was included as an additional reference sequence for obtaining putative mitochondrial reads for the de novo assembly for M. nudus, P. depressus and H. pulcherrimus. De novo contigs over 1000 bp for P. depressus were collected and the partial mitochondrial genome was assembled from two non-overlapping contigs generated with CAP3 . These assembled mitochondrial genomes then provided the template from which we identified and extracted the nucleotide sequence of each gene for each species.
Reference sequences for twelve protein-coding (COI, COII, ND1, ND2, ND3, ND4, ND4L, ND5, ND6, CytB, ATPase6 and ATPase8) and two ribosomal RNA (12S and 16S) genes were identified in each mitochondrial genome based on the annotated nucleotide sequence from S. purpuratus [GenBank: NC 001543]. We identified the start and stop coordinates for each gene location on the de novo assembled mitochondrial genomes for each species by aligning the S. purpuratus gene reference sequence to our de novo assembled mitochondrial genome for each species using BLAT  with DNA sequences translated in six frames to protein and allowing one mismatch in the tile.
The sequence of protein coding mitochondrial genes we identified were aligned using transAlign  using the echinoderm mitochondrial code. The mitochondrial ribosomal RNA (rRNA) 12S and 16S gene sequences were aligned using clustalw2 . Ambiguously aligned regions were identified and removed with Gblocks  with default parameters and no gap positions allowed. Two sequences for each gene were obtained for S. droebachiensis and S. pallidus: one from the GenBank mitochondrial genome sequence and the other from our de novo assembly. The de novo assembled mitochondrial genomes and the predicted gene models were submitted to GenBank [GenBank: KC898196-KC898203].
Briefly, we defined nuclear genes based on the gene model coordinates defined in SpBase Build 6 based on the Spur v3.1 genome assembly (SpBase.org). The alignments generated from the genomic reads of S. purpuratus were used to represent that species, rather than the reference genome sequence. We discarded any gene model that was not manually annotated, incomplete (i.e. no internal stop codons, missing valid start or stop codon) or were putative in-paralogs (i.e. annotated as paralogs or of overlapping coordinates). Partial alignments of nuclear genes including ambiguous sites (i.e. heterozygote) were constructed from alignments of Illumina paired-end reads of nine species (Kober and Pogson, unpublished data) aligned to the S. purpuratus (Table 1). Paired-end short reads were aligned to Spur v3.1 using SSAHA2. Reads with a mapping quality of <30 were discarded. Nucleotides with a quality score of <25 were ignored. Heterozygotes sites were called when more than one allele was represented by a frequency of >0.20 and >10 valid nucleotides were present from aligned reads. We excluded alignments of greater than 100 unambiguous codons across all nine taxa, leaving 3,180 for analysis.
Additional tools used in the analyses included James Kent’s source tools , Biopython (http://biopython.org), FigTree (http://tree.bio.ed.ac.uk/software/figtree/), R , bedtools , EMBOSS  and the Newick utilites .
We created a concatenated alignment from the Gblocks masked alignments of mitochondrial genes (“MA”) and a concatenated alignment of mitochondrial genes from fourfold degenerate sites (“M4”) identified by codeml from PAML version 4.5 . Fourfold degenerate sites are identified in codeml as third position sites of a codon which are synonymous across all taxa in the alignment (Ziheng Yang, personal communication). Inference of positive Darwinian selection on mitochondrial protein coding genes was performed with codeml from the PAML package . The M7 and M8 models were used in an LRT test and significance was assessed based on a chi-square distribution with two degrees of freedom.
We identified nuclear genes with alignments of greater than 100 unambiguous codons across all nine taxa. A signal of positive Darwinian selection for a gene was defined as having a q-value < 0.05 based on the likelihood ratio test between models M7 and M8 implemented in codeml as described above for the mtDNA alignments. The most likely ML tree for these genes was used to represent the inferred phylogeny of that gene. We created concatenated alignments of 4-fold degenerate sites identified by codeml for all nuclear genes (“N4A”), nuclear genes with evidence for positive Darwinian selection (“N4S”) and those without any signal of positive selection (“N4N”).
The alignment of mitochondrial gene sequences newly obtained in the present study and the concatenated alignments of the fourfold degenerate sites of nuclear genes have been deposited in TreeBase (http://purl.org/phylo/treebase/phylows/study/TB2:S13990 ).
We used two main approaches to reconstruct the phylogeny of the group using mitochondrial and nuclear genes. The first was a support-based method, which evaluated the individual trees generated for each gene in the nuclear or mitochondrial genome. For the mitochondrial genes, this included an additional analysis accounting for pairing in the RNA secondary structure. The second main approach used the concatenation of sites between all genes in the nuclear or mitochondrial genome, respectively.
The ML tree for each mitochondrial gene was generated using PhyML with the best-fitting nucleotide substitution model, optimized tree topology, branch length and rate parameters, the best tree topology of NNI and SPR search operations, and 10 bootstrap replicates. The best fitting nucleotide substitution model was identified for each mitochondrial gene based on the AICc criterion evaluating 56 models using pmraic version 1.1 (http://www.abc.se/~nylander/mraic/pmraic.html) and PhyML 3.0 v. 20110919 .
For 12S and 16S, a Bayesian Inference (BI) partitioned analysis of RNA paired stem and unpaired loop sites  in were performed using PHASE 2.0 . We predicted a consensus secondary structure from each alignment using RNAalifold . Unpaired regions were analyzed under the general time-reversal REV  and paired stem regions were analyzed under the time reversible seven state RNA7D [92, 93]. We used a discrete-gamma model with six categories to approximate the Γ-distribution with no invariant sites allowed. We performed 1.5 million sampling iterations with a sampling period of 150 and burn-in iterations of 750,000. The remaining parameters followed Hudelot and colleagues .
For each nuclear gene consensus alignment, an ML tree was generated using PhyML using the estimated rate and probability of each class from the data (‘free_rates’), optimized tree topologies, branch length and rate parameters, the best tree topology of NNI and SPR search operations, and 100 bootstrap replicates.
For the concatenated ‘MA, ‘M4’, ‘N4A’, ‘N4S’ and ‘N4N’ alignments, we performed phylogenetic reconstructions using Maximum Parsimony (MP), Maximum Likelihood (ML) and Bayesian Inference (BI) methods in a uniform fashion. The ML analyses of concatenated datasets were performed with PhyML using the estimated rate and probability of each class from the data (‘free_rates’), optimized tree topologies, branch length and rate parameters, the best tree topology of NNI and SPR search operations, and 100 bootstrap replicates. The MP analyses of concatenated datasets were conducted using PAUP* version 4b10  and consisted of heuristic searches with 100,000 replicates of random stepwise addition and TBR branch swapping. Bootstrapping was done using 500,000 ‘fast-bootstrap’ pseudo-replicates. The BI analyses of concatenated datasets were performed using MrBayes v. 3.2.1  assuming a nucleotide substitution model with a gamma-distributed rate variation across sites and a proportion of sites invariable (GTR + I + G). Each dataset was run with four Markov chains for one million generations and sampled every 100 generations. Each analysis was run four times. The first 2500 trees from each run were discarded so that the final consensus tree was based on the combination of accepted trees from each run (a total of 30,004 trees). We tested the convergence between the four runs by examining the potential scale reduction factors (PSRF) produced by the ‘sump’ command in MrBayes. Support for nodes was determined using posterior probabilities (PP, calculated by MrBayes).
To determine if there were significant differences between two proposed trees given the data, we performed the SH test  using RELL bootstrap with 1000 replicates  and the HKY85 model of nucleotide substitution in PAUP*. We ascribed significance to a P-value < 0.05 as provided by the output.
A strict molecular-clock was tested against a non-clock model assuming a nucleotide substitution model with gamma-distributed rate variation across sites and a proportion of sites invariable (GTR + I + G) using MrBayes. Each dataset was run with four Markov chains for 500,000 generations to confirm PP convergence. The harmonic means of the likelihoods of the MCMC sampling were used as the marginal model likelihoods. A ratio exceeding 5 was considered very strong evidence favoring one model over the other . A strict clock-enforced BI tree with uniform branch lengths was used to estimate the divergence time of each species with MrBayes. We estimated a rate of substitutions per site per million years of 0.01 ± 0.005 and an exponential distribution with a rate of 0.1 for the tree age prior. One topology was tested with two divergence times. One calibration had a divergence time of 5–12 Ma for the ancestor S. intermedius and S. droebachiensis (Figure 4) based on the fossil record appearance of S. droebachiensis in the mid-Pliocene and identifiable Strongylocentrotus in the late Miocene . A second calibration used a divergence time between the (Strongylocentrotus, Hemicentrotus) clades and the (Paracentrotus, Mesocentrotus) clades of 13–19 Ma from 12S mitochondrial genes calibrated using a reference point estimated from the fossil record . Both of these calibrations remain within the Echinidae–Strongylocentrotidae divergence tentatively estimated to be at 25 Ma. .
We are grateful to Grant Pogson (UCSC) for discussion and comments on the manuscript and to John Pearse (UCSC) for discussion on the ecology and relationships of the group. We sincerely thank two anonymous reviewers for their helpful and constructive feedback. We thank David Haussler for access to computational resources. This manuscript is dedicated to memory of Roy J. Britten (1919–2012). This work was supported by the National Science Foundation (grant number DEB-1011061), the STEPS Foundation and the Myer’s Trust. Funding for Open Access provided by the UC Santa Cruz Open Access Fund.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.