Duplication and expression of Sox genes in spiders
BMC Evolutionary Biology volume 18, Article number: 205 (2018)
The Sox family of transcription factors is an important part of the genetic ‘toolbox’ of all metazoans examined to date and is known to play important developmental roles in vertebrates and insects. However, outside the commonly studied Drosophila model little is known about the repertoire of Sox family transcription factors in other arthropod species. Here we characterise the Sox family in two chelicerate species, the spiders Parasteatoda tepidariorum and Stegodyphus mimosarum, which have experienced a whole genome duplication (WGD) in their evolutionary history.
We find that virtually all of the duplicate Sox genes have been retained in these spiders after the WGD. Analysis of the expression of Sox genes in P. tepidariorum embryos suggests that it is likely that some of these genes have neofunctionalised after duplication. Our expression analysis also strengthens the view that an orthologue of vertebrate Group B1 genes, SoxNeuro, is implicated in the earliest events of CNS specification in both vertebrates and invertebrates. In addition, a gene in the Dichaete/Sox21b class is dynamically expressed in the spider segment addition zone, suggestive of an ancient regulatory mechanism controlling arthropod segmentation as recently suggested for flies and beetles. Together with the recent analysis of Sox gene expression in the embryos of other arthropods, our findings support the idea of conserved functions for some of these genes, including a potential role for SoxC and SoxD genes in CNS development and SoxF in limb development.
Our study provides a new chelicerate perspective to understanding the evolution and function of Sox genes and how the retention of duplicates of such important tool-box genes after WGD has contributed to different aspects of spider embryogenesis. Future characterisation of the function of these genes in spiders will help us to better understand the evolution of the regulation of important developmental processes in arthropods and other metazoans including neurogenesis and segmentation.
The evolution of metazoan life forms was in part driven by the acquisition of novel families of transcription factors and signalling molecules that were subsequently expanded by gene duplications and evolved new functions [1, 2]. One such family, encoded by Sox genes, encompasses a set of conserved metazoan specific transcriptional regulators that play critical roles in a range of important developmental processes, in particular, aspects of stem cell biology and nervous system development [3,4,5].
The Sox family is defined by a set of genes containing an HMG class DNA binding domain sharing greater than 50% sequence identity with that of SRY, the Y-linked sex determining factor in eutherian mammals . In the chordates the family is represented by approximately 20 genes, which have been subdivided into eight groups (A-H) based mainly on homology within the DNA binding domain but also related group-specific domains outwith the HMG domain [7, 8]. In all metazoans examined to date representatives of the Sox family have been identified and these are largely restricted to Groups B to F with other groups specific to particular lineages . While Sox-like sequences have been reported in the genome of the choanoflagellate Monosiga brevicollis, these are more closely related to the non-sequence specific HMG1/2 class of DNA binding domain and thus true Sox genes are restricted to metazoans [10,11,12].
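The identity criterion above can be illustrated with a minimal sketch. It assumes two HMG-domain sequences that have already been aligned (gaps written as "-") and counts identity only over columns where neither sequence has a gap; other denominators are in common use, so the exact percentage depends on that choice. The function names and the toy strings are hypothetical and are not real Sox or SRY sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns (an assumption; conventions differ)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def passes_sox_threshold(aligned_candidate: str, aligned_sry: str,
                         threshold: float = 50.0) -> bool:
    """True if identity to the SRY HMG domain is strictly greater than the threshold."""
    return percent_identity(aligned_candidate, aligned_sry) > threshold


if __name__ == "__main__":
    # Toy aligned strings, for illustration only.
    print(percent_identity("ACDE-G", "ACDFHG"))
    print(passes_sox_threshold("AAAA", "AABB"))
```

In practice the alignment itself would come from a dedicated tool; this sketch only shows how the "greater than 50%" rule would be applied once an alignment exists.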
While vertebrate Sox genes have been intensively studied due to their critical roles in development, they are less well characterised in invertebrates, with the exception of the fruit fly Drosophila melanogaster. D. melanogaster has eight Sox genes (four in Group B and one each in Groups C to F), a complement that is generally consistent across the insect genomes examined to date [9, 13, 14]. Of particular interest are the Group B genes of insects, which share a common genomic organisation that has been conserved across all insects examined to date, with three genes closely linked in a cluster [13,14,15]. Dichaete (D) plays critical roles in early segmentation and nervous system development, while SoxNeuro (SoxN) is essential for CNS development, and where the expression of these two genes overlaps in the embryonic CNS they exhibit phenotypic redundancy [16,17,18,19].
The evolutionary conservation of Sox protein sequence and function has been shown in rescue or swap experiments, where mouse Sox2 rescues Dichaete null mutant phenotypes in the D. melanogaster embryo and Drosophila SoxN can replace Sox2 in mouse ES cells [20, 21]. Furthermore, a comparison of Dichaete and SoxN genomic binding in the D. melanogaster embryo with Sox2 and Sox3 binding in mouse embryonic or neural stem cells indicates that these proteins share a common set of over 1000 core target genes [22,23,24]. These and other studies suggest that Sox proteins have ancient roles, particularly in the CNS, where their functions have been conserved from flies to mammals.
Of the other two D. melanogaster group B genes, Sox21a plays a repressive role in maintaining adult intestinal stem cell populations but there is no known function for Sox21b [25, 26]. The group C gene, Sox14, is involved in the response to the steroid hormone ecdysone and is necessary for metamorphosis ; Sox102F (Group D) has a role in late neuronal differentiation ; Sox100B (Group E) is involved in male testis development  and Sox15 (Group F) is involved in wing metamorphosis and adult sensory organ development [30, 31].
While functional studies are lacking in other insects, gene expression analysis in Apis mellifera and Bombyx mori indicates that aspects of Sox function are likely to be conserved across species [13, 14]. More recently, a similar role for Dichaete in the early segmentation of both Drosophila and the flour beetle Tribolium castaneum suggests that aspects of regulatory function as well as genomic organisation may have been conserved across insects. Outside the insects little is known; however, genome sequence analysis and gene expression studies suggest key roles for Sox family members in stem cell and cell fate processes in Ctenophores and Porifera, as well as neural progenitor development in Cnidarians and a Diplopod. Taken together with the extensive work in vertebrate systems, it is clear that Sox genes play critical roles in many aspects of metazoan development, at least some of which appear to be deeply conserved.
Arthropods comprise approximately 80% of living animal species , exhibiting a huge range of biological and morphological diversity that is believed to have originated during the Cambrian Period over 500 million years ago . While the analysis of traditional model arthropods such as D. melanogaster has taught us much about conserved developmental genes and processes, it is only more recently that genomic and other experimental approaches are beginning to shed light on the way genes and regulatory networks are deployed to generate the diversity of body plans found in other insects  and more widely in chelicerates and myriapods . In terms of the Sox family, recent work indicates conserved Group B expression in the early neuroectoderm of the myriapod Glomeris marginata  and neuroectodermal expression of a Group B gene has been reported in the chelicerate P. tepidariorum .
Chelicerates in particular offer an interesting system for exploring the evolution and diversification of developmental genes since it has emerged that some arachnid lineages, including spiders and scorpions, have undergone a whole genome duplication (WGD) . Interestingly, duplicated copies of many developmental genes, including Hox genes and other regulatory factors such as microRNAs, have been retained in P. tepidariorum and other arachnids [41, 42]. Thus, chelicerate genomes provide an opportunity to explore issues of gene retention, loss or diversification .
Here we report an analysis of the Sox gene family in the spiders, P. tepidariorum and S. mimosarum, and show that most duplicate Sox genes have been retained in the genomes of these spiders after the WGD, as well as retention of some paralogs generated from tandem duplications. Furthermore, while group B genes show highly conserved expression in the developing CNS, the expression of other spider Sox genes suggests they have evolved potentially novel functions in other aspects of embryogenesis.
Results and discussion
Characterisation of Sox genes in spiders
In order to characterise the Sox gene complement of spiders we conducted TBLASTN searches of the genomes of P. tepidariorum  and S. mimosarum  using the HMG domain of the mouse Sox2 protein, recovering 15 and 14 sequences respectively. All but three of these contained the highly conserved RPMNAFMVW motif that is characteristic of Sox proteins and the three exceptions (ptSoxC-2, ptSoxB-like and ptSox21b-2) only show minor conservative substitutions in this motif (see Fig. 3 for full alignments). 14 of the P. tepidariorum sequences corresponded to annotated gene models. Moreover, two sequences were identical (ptSox21b-1, aug3.24914.t1 and aug3.g24896.t1) and since the latter maps to a genomic scaffold of only ~ 7 kb, we presume this represents an assembly error and thus consider them as a single gene. One genomic scaffold encoding a Sox domain (ptSoxB-like, Scaffold3643:28071..28299) is in a region of poor sequence quality and we cannot be sure it represents a bona fide gene but have nevertheless included it in our subsequent analysis.
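A check for the RPMNAFMVW motif, allowing a small number of substitutions, could be sketched as follows. This is an illustration rather than the procedure used in the study: the function name, the mismatch tolerance and the toy protein strings are assumptions, and a simple mismatch count does not assess whether a substitution is conservative.

```python
SOX_MOTIF = "RPMNAFMVW"  # motif named in the text


def find_motif(protein: str, motif: str = SOX_MOTIF, max_mismatches: int = 0):
    """Return (start, window, mismatches) for the best-matching window, or None.

    'start' is a 0-based index into the protein sequence. Only windows with at
    most 'max_mismatches' substitutions relative to the motif are accepted.
    """
    protein = protein.upper()
    best = None
    for start in range(len(protein) - len(motif) + 1):
        window = protein[start:start + len(motif)]
        mismatches = sum(1 for x, y in zip(window, motif) if x != y)
        if mismatches <= max_mismatches and (best is None or mismatches < best[2]):
            best = (start, window, mismatches)
    return best


if __name__ == "__main__":
    # Hypothetical sequences, not real spider proteins.
    print(find_motif("MKKRPMNAFMVWSR"))
    print(find_motif("MKKRPMNAFIVWSR"))
    print(find_motif("MKKRPMNAFIVWSR", max_mismatches=1))
```

Any window reported with one or more mismatches would still need to be inspected by eye to decide whether the substitution is a minor conservative change of the kind described for the three exceptions.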
In the case of S. mimosarum we identified 14 genomic regions, 11 of which correspond to annotated Sox genes. Reciprocal BLAST searches of D. melanogaster or vertebrate genes recovered Sox proteins as top scoring hits. In addition to these true Sox gene sequences, we also recovered sequences that correspond to the D. melanogaster capicua (cic) and bobby sox (bbx) genes in both spider species but here we do not consider these Sox-related genes further.
To classify the spider Sox proteins we generated MUSCLE sequence alignments and PhyML maximum likelihood phylogenies using the HMG domains recovered from the BLAST searches, along with those from the eight D. melanogaster Sox genes and representatives of each subgroup from mouse (Additional file 1: Table S1). These analyses resulted in a clear classification of spider Sox genes into groups B-F as found in other invertebrate genomes (Fig. 1). Note that Group A only contains the SRY gene specific to eutherian mammals and there are no Group G, H or I Sox genes found outside the vertebrates. Supporting this classification, phylogenetic trees constructed with the full-length sequences of the predicted spider Sox proteins and those from D. melanogaster yielded virtually identical results (Additional file 2: Figure S1). Following the recommended nomenclature for Sox genes , we have named the spider Sox genes as indicated in Additional file 1: Table S1. The naming of D. melanogaster Sox genes is confusing with some carrying historic names based on their phenotype (Dichaete and SoxN), others named after cytological locations (Sox100B and Sox102F) and others with inappropriate numerical designations (D. melanogaster Sox14 is a Group C gene while in vertebrates Sox14 is in Group B and D. melanogaster Sox15 is in group F, while vertebrate Sox15 is in Group G). For these reasons we propose renaming the D. melanogaster group C-F genes according to the standard nomenclature used in the Sox field: these designations are already recognised as synonyms in FlyBase . With respect to the Group B genes, since the sequence and organisation of these appears to be invertebrate specific, we propose a nomenclature based on the current D. melanogaster gene names: SoxN, Dichaete, Sox21a and Sox21b (Additional file 1: Table S1).
In common with many other gene families in spiders, the Sox genes are mostly represented by two or more copies in each group (Fig. 2). In other arthropods examined to date, as well as the onychophoran Euperipatoides kanangrensis, there is usually only a single copy of each gene, although there is a recent report of two Group E genes in the millipede G. marginata. In the case of spider Groups D and E, the duplications likely predate the divergence of the two spider species we analysed since the duplicates group together in the phylogenetic analysis and show extensive homology across the length of the coding sequence (Fig. 1). With Group F, there is only one gene identified in S. mimosarum but two in P. tepidariorum. In the case of Group C, there appear to have been additional duplication events in S. mimosarum. When we consider the full-length protein sequences (Additional file 2: Figure S1), ptSoxC-1 groups with smSoxC-1 and ptSoxC-2 with smSoxC-2. smSoxC-2 has undergone a local head-to-head duplication, with smSoxC-2 and smSoxC-3 adjacent in the genome. smSoxC-4 has no predicted gene model but the region of the genome encodes an uninterrupted HMG domain closely related to those of the smSoxC-2 and C-3 duplicates. Whether this is a bona fide gene remains to be determined.
In many organisms, some genes in Groups D, E and F contain an intron within the DNA binding domain sequence in a position that is highly conserved and specific for each group : our analysis indicates that this is also the case for the spider genes in these three groups (see arrows in Fig. 3). While there is an intron within the region encoding the DNA binding domains of spider Group D genes, it has been lost in the D. melanogaster orthologue. Secondary intron loss is also observed in Group F, where mouse Sox7 has no intron but the related Sox17 and Sox18 genes do. The location of these HMG domain introns suggests they were present in the common ancestor of the vertebrates and the arthropods.
While the Group B genes of insects and vertebrates show considerable sequence similarity in their DNA binding domains, they are clearly different in terms of their genomic organisation and functions. Vertebrate Group B genes are not linked in the genome and are subdivided into B1 (Sox1, 2 and 3) and B2 (Sox14 and 21). This classification manifests both at sequence and functional levels, with Group B1 proteins acting as transcriptional activators particularly important for nervous system specification, while the Group B2 proteins act as transcriptional repressors [47,48,49]. In contrast, the organisation and functional classification of Group B genes in insects is subject to some debate. There is a clear orthologue of the Group B1 proteins, represented by SoxN in D. melanogaster and genes named SoxB1 or Sox2 in every invertebrate genome examined. The remaining three D. melanogaster Group B genes (Dichaete, Sox21a and Sox21b) have been characterised as Group B2 based on sequence alignments with vertebrate proteins. In D. melanogaster these three genes are arranged in a cluster on Chromosome 3L, an organisation that is conserved across at least 300 MY of evolution, with a similar gene arrangement found in flies, mosquitoes, wasps, bees and beetles [11, 13, 15]. While there is evidence that Sox21a has a repressive role consistent with the vertebrate B2 class [25, 26], considerable genomic evidence shows that Dichaete mainly acts as a transcriptional activator, a role inconsistent with that observed for vertebrate SoxB2 proteins [22, 50].
The phylogenies generated with the HMG domains from a range of species (Fig. 1; Additional file 2: Figure S1) or full-length protein sequences from spiders and D. melanogaster (Additional file 3: Figure S2) support a classification of arthropod Group B genes where there is a single SoxN gene, one or more Sox21a genes and two or more Dichaete-Sox21b genes. In spiders, we find strong support for a single SoxN gene, duplications of the Sox21a class and a single Dichaete-like gene in both species. In P. tepidariorum we find a duplication of the Sox21b genes and the possibility of a further tandem duplication of the ptSox21b-2 gene if the ptSoxB-like ORF is a genuine gene. S. mimosarum, in contrast, has a single Sox21b class gene. Intriguingly, we find that two P. tepidariorum Group B genes (ptDichaete and ptSox21a-1) are located in the same genomic region, separated by over 200 kb of intervening DNA that is devoid of other predicted genes (Fig. 4), an organisation reminiscent of that found in insects. Indeed, the linkage of ptDichaete and ptSox21a-1 supports the idea that these genes were formed by a tandem duplication in the protostome/deuterostome ancestor [11, 15]. The separation of SoxN from the Dichaete/Sox21a-1 cluster in the spider suggests that either this fragmentation happened early in arthropod evolution or that the duplication and separation of SoxN and Dichaete (or Sox21a) occurred early in Sox evolution [11, 15] (Fig. 4).
Taken together, our analysis clearly shows that the spider genomes we examined have the full complement of Sox genes found in insects, have mostly retained duplicates in Groups C, D, E and F after the WGD, and have a Group B organisation that more closely resembles insects than vertebrates.
Arrangement of P. tepidariorum and S. mimosarum Sox genes after WGD
The phylogenetic relationships of Sox genes in P. tepidariorum suggest that there are two paralogs of each Sox gene in groups C to F, the exception being in Group B where we found single copies of SoxN and Dichaete, but duplicates of Sox21a and Sox21b (Figs. 1 and 2). To investigate if all of these duplicated Sox genes arose from the WGD event in the ancestor of these animals , the synteny of Sox genes was analysed in the P. tepidariorum and S. mimosarum genomes (Fig. 4).
Most of the Sox genes in P. tepidariorum and S. mimosarum were found dispersed in the genome on separate scaffolds, consistent with the expectation that they arose via WGD. Analysis of the five upstream and five downstream genes flanking each Sox gene, however, revealed that dispersed duplicated Sox genes are generally not closely linked to other duplicated genes (Fig. 4, Additional file 4: Table S2 and Additional file 5: Table S3). While it is likely that this is a consequence of extensive loss of ohnologs and genomic rearrangements since the WGD 430 MYA, we cannot rule out that at least some of the duplicated Sox genes in this spider arose via tandem duplication followed by rearrangements after the WGD. The only obvious evidence for retention of similar synteny between the two spiders was observed between ptSoxD-2 and smSoxD-1, which both have RIOK and KRR1 genes located directly upstream with a conserved transcriptional orientation (Additional file 4: Table S2 and Additional file 5: Table S3). In conjunction with the phylogenetic relationships, these observations provide further evidence that Group D genes were duplicated in the ancestor of both spiders.
The only tentative example of retained synteny within a species was in the SoxF group, where we found that the two SoxF genes of P. tepidariorum have an upstream flanking sequence with homology to a transposable element (TE) with matching transcriptional orientation. Interestingly, six of the thirteen P. tepidariorum Sox-containing scaffolds also have TE-like sequences nearby (Fig. 4). Furthermore, of the nine S. mimosarum scaffolds that have flanking gene information, three have TEs flanking Sox genes (Additional file 5: Table S3). TEs have previously been linked to the expansion of genes and their rearrangements [51, 52]; however, further analysis is needed to determine if the TEs identified in this synteny analysis are involved in the evolution of Sox genes in spiders.
The exceptions to the dispersion of Sox genes in P. tepidariorum are ptDichaete and ptSox21a-1 on scaffold #756 (as discussed above), ptSox21b-2 and SoxB-like on scaffold #642 (Fig. 4), as well as smSoxC-2 and smSoxC-3 that are adjacent on scaffold #4648 (Additional file 4: Table S2). The sequences of the HMG domains of the clustered ptSox21b-2 and SoxB-like genes grouped together with high bootstrap confidence, indicative of a head-to-head tandem duplication (Figs. 1 and 4). However, the HMG domain of SoxB-like is split across two reading frames and, although the sequence quality is poor in parts of this scaffold, its sequence similarity to ptSox21b-2 suggests that SoxB-like may have been pseudogenised (Fig. 4).
Sox gene expression during P. tepidariorum embryogenesis
We next studied the expression of Sox genes during embryogenesis in P. tepidariorum using in situ hybridisation. For the SoxB family genes ptSox21a-1, ptSox21a-2, ptSox21b-2 and Dichaete, we did not detect any expression during embryogenesis. This might indicate that they are only expressed at very low levels, only in a few cells or that these genes are used during post-embryonic development.
ptSoxN expression is visible from late stage 7 in the most anterior part of the germ band, a region corresponding to the presumptive neuroectoderm (Fig. 5a). This head-specific expression in P. tepidariorum is similar to early expression of SoxN observed in D. melanogaster  and in A. mellifera, where SoxB1 is expressed in the gastrulation fold and the anterior part of the presumptive neuroectoderm . ptSoxN is subsequently expressed broadly in the developing head and follows neurogenesis in a progressive anterior-to-posterior pattern as new segments are added (Fig. 5b). By mid stage 9, ptSoxN is strongly expressed in the head lobes and in the ventral nerve cord (Fig. 5c), however, after this stage no further expression was detected. In both D. melanogaster and A. mellifera, SoxN expression is also observed throughout the neuroectoderm and becomes restricted to the neuroblasts [13, 18, 19].
In chelicerates, neurogenic progenitors delaminate in clusters of cells rather than single neuroblast-like cells found in dipterans and some hymenopterans . However, even with these different modes of neurogenic differentiation, the expression of SoxN orthologues suggests this gene performs the same function. Indeed, the recent study of T. castaneum, E. kanangrensis and G. marginata also shows that the SoxN orthologues in these species have widespread and early neuroectodermal expression . Taken together with published SoxN expression, our results clearly support the view that throughout the Bilateria a SoxN class protein is a marker of the earliest stages of neural specification.
Another member of the B group, ptSox21b-1, shows expression in the nascent prosomal segments and in the posterior segment addition zone (SAZ) from stage 7 (Fig. 6a and b). At stage 8.2 expression is observed in the most anterior part of the germ band, which corresponds to the presumptive neuroectoderm in the future head and prosomal segments (Fig. 6c). At stages 9 and 10, strong expression is apparent throughout the ventral nerve cord, similar to ptSoxN. Comparing expression in the SAZ at different stages in these fixed preparations suggests that Sox21b-1 expression may be dynamic in this region (Fig. 6d and e).
In T. castaneum, Sox21b has similar expression to insect Dichaete genes, early in the SAZ and then in the developing CNS. In E. kanangrensis and G. marginata, there is no early Sox21b expression; however, in these species Dichaete is expressed during segmentation and then later in the CNS. This suggests that the role of Dichaete in D. melanogaster and T. castaneum segmentation could extend to E. kanangrensis and G. marginata, whereas in spiders the closely related Sox21b-1 gene may play this role. The widespread expression of both SoxN and Sox21b-1 throughout the neuroectoderm strongly suggests that, as has been shown in vertebrates and flies, many cells in the developing CNS co-express two related SoxB genes. We confirmed their overlapping expression in the CNS, but not in the SAZ, with double in situ hybridisations, using SoxN and Sox21b-1 probes (Additional file 6: Figure S3). While both genes clearly show extensive expression overlap throughout the developing CNS, we were interested to note that at the very lateral regions of the neuroectoderm, Sox21b-1 is uniquely expressed. This is similar to the situation in Drosophila where SoxN has a unique lateral expression domain [18, 19].
In the case of the Sox C genes, we did not detect any expression for ptSoxC-2. However, ptSoxC-1 expression was found at mid-stage 6, in a pattern similar to that of ptSoxN in the most anterior part of the germ band in the presumptive neuroectoderm (Fig. 7a). By stage 8.2 expression is apparent in neuroectodermal progenitors along the germ band and at the anterior region of the SAZ (Fig. 7b), however by stage 9.1 (Fig. 7c) expression is lost from the SAZ. Interestingly, from stage 9.1, ptSoxC-1 is expressed in the ventral nerve cord, from the head to the SAZ, however unlike the uniform expression of ptSoxN, ptSoxC-1 is observed in clusters of cells, presumably undergoing neurogenic differentiation, progressively from the head through to opisthosomal segments as they differentiate in an anterior to posterior manner (Fig. 7c).
In D. melanogaster, the single SoxC gene has been shown to play a role in the response to ecdysone at the onset of metamorphosis and has no known role in the embryonic CNS . In contrast, the vertebrate SoxC genes (Sox4, 11 and 12) play critical roles in the differentiation of post-mitotic neurons, acting after the Group B genes, which specify neural progenitors . In A. mellifera, late expression of the SoxC gene was observed in the embryonic cephalic lobes and in the mushroom bodies . The expression of SoxC orthologues in the embryonic CNS of other invertebrates  suggests that this class of Sox gene may play a conserved role in aspects of neuronal differentiation, which has been lost in D. melanogaster. Interestingly, a comparison of target genes bound by Sox11 in differentiating mouse neurons and SoxN in the D. melanogaster embryo shows a conserved set of neural differentiation genes, suggesting that in D. melanogaster the role of SoxC in neuronogenesis has been taken over by SoxN [23, 56].
We identified two genes in each of the SoxD, E and F families; however, we found no in situ evidence for expression of SoxD-2, SoxE-2 or SoxF-1 during the P. tepidariorum embryonic stages we examined. For ptSoxD-1 we found no expression prior to stage 10, but we then observed expression in the ventral nerve cord from the head to the most posterior part of the opisthosoma (Fig. 8a). The D. melanogaster SoxD gene is also expressed at later stages of embryonic CNS development and has been shown to play roles in neurogenesis in the larval CNS. While SoxD has been reported to be ubiquitously expressed in A. mellifera embryos, it is also expressed in the mushroom bodies of the adult brain. Embryonic brain expression of SoxD orthologues in beetles, myriapods and velvet worms, as well as a known role for SoxD genes in aspects of vertebrate neurogenesis [55, 58], again suggests conserved roles for SoxD during metazoan evolution.
ptSoxE-1 is expressed in the developing limbs from stage 9 in small regions of the chelicerae, pedipalps and L1 buds, with broader expression in L2 and L3, and in two prominent foci in the L4 limbs that correspond to the differentiating peripheral nervous system (PNS) (Fig. 8b). At the stages we examined we did not observe any expression of ptSoxE-1 in opisthosomal segments 2 to 6, where the germline is believed to originate.
In D. melanogaster, the SoxE orthologue is associated with both endodermal and mesodermal differentiation, is expressed in the embryonic gut, malpighian tubules and gonad , and has been shown to be required for testis differentiation during metamorphosis . Both the A. mellifera SoxE genes are also expressed in the testis . Janssen and colleagues observed expression of SoxE genes in other invertebrates, associated with limb buds as we observed in the spider, but they also detected posterior expression associated with gonadogenesis . These observations are particularly intriguing since the vertebrate Sox9 gene has a crucial function in testis development . Therefore, while we did not observe SoxE expression associated with early gonadogenesis it remains possible that the spider genes are used later in this process. We note that while the fly SoxE gene is expressed from the earliest stages of gonadogenesis, null mutant phenotypes are not apparent until the onset of metamorphosis . In vertebrates, Group E genes are required in neural crest cells that contribute to the PNS [3, 62, 63] and we suggest the spider orthologue may have a similar function in the mechanoreceptors. These receptors are distributed all over the body, but the trichobothria only appear on the extremities of the limbs  where they differentiate from PNS progenitors.
Finally, the expression of ptSoxF-2 is only detected at stage 9, in single foci at the tips of the L1 segment limb buds (Fig. 8c). In D. melanogaster the SoxF gene is expressed in the embryonic PNS  and plays a role in the differentiation of sensory organ precursors , whereas in A. mellifera, the SoxF orthologue is expressed ubiquitously throughout the embryo . In T. castaneum, E. kanangrensis and G. marginata, SoxF expression is also associated with the embryonic limb buds , again suggesting that this was an ancestral function of this Sox family in the Euarthropoda.
Taken together, our study expands our understanding of a highly conserved family of transcriptional regulators that appear to have played prominent roles in metazoan evolution. Our analysis indicates that the classification of Sox genes in the invertebrates appears to be robust and that genes in all Groups have aspects of their expression patterns that suggest evolutionary conservation across the Bilateria. In particular, it is becoming increasingly clear that a SoxN orthologue (SoxB1 in vertebrates) has a prominent role in the earliest aspects of CNS development. The finding that a Dichaete/Sox21b class gene is implicated in the segmentation of both long and short germ band insects as well as the spider, and more widely in other arthropods, supports the view that formation of the segmented arthropod body plan is driven by an ancient mechanism involving these Sox genes.
Our analysis provides insights into the fate of duplicate genes in organisms that have undergone WGD. We find that virtually all the duplicates have been retained in the spider genome but the expression analysis suggests that some have possibly been subject to subfunctionalisation and/or neofunctionalisation. It is interesting to note that the pattern we observe for the Sox family in spiders is mirrored in teleost fish, which have also undergone WGD events, with considerable gene retention and lineage-specific neofunctionalisation. Future functional studies in P. tepidariorum will help to reveal the precise roles played by Sox genes during spider embryogenesis and how these relate to other metazoans.
Materials and methods
TBLASTN searches of the P. tepidariorum and S. mimosarum genomes were performed with the HMG domain of mouse Sox2 (UniProtKB - P48432) at http://bioinf.uni-greifswald.de/blast/parasteatoda/blast.php and http://metazoa.ensembl.org/Stegodyphus_mimosarum/Info/Index respectively. Gene models were retrieved from the P. tepidariorum Web Apollo genome annotations via https://apollo.nal.usda.gov/partep/jbrowse/ and from http://metazoa.ensembl.org/Stegodyphus_mimosarum/Info/Index. Sox gene sequences for other insects and vertebrates were retrieved from UniProt https://www.uniprot.org.
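The searches described here were run on web servers. For readers who run TBLASTN locally instead, hits are often exported in the standard 12-column tabular format (BLAST+ "-outfmt 6"). The sketch below assumes that format and shows one way candidate genomic regions might be filtered by E-value. The E-value cut-off, the class and function names and the example rows are hypothetical and are not taken from the study.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str      # genomic scaffold carrying the hit
    pident: float     # percent identity of the aligned region
    sstart: int       # subject start (sstart > send indicates the minus strand)
    send: int
    evalue: float


def parse_tabular_hits(text: str, max_evalue: float = 1e-5) -> List[Hit]:
    """Parse 12-column tabular BLAST output and keep hits at or below max_evalue.

    Assumed column order: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        hit = Hit(f[0], f[1], float(f[2]), int(f[8]), int(f[9]), float(f[10]))
        if hit.evalue <= max_evalue:
            hits.append(hit)
    return hits


if __name__ == "__main__":
    # Invented rows for illustration only.
    example = (
        "Sox2_HMG\tscaffoldA\t75.0\t76\t19\t0\t1\t76\t1000\t1227\t1e-30\t120\n"
        "Sox2_HMG\tscaffoldB\t30.0\t40\t28\t0\t5\t44\t500\t619\t0.5\t25\n"
    )
    for hit in parse_tabular_hits(example):
        print(hit.subject, hit.sstart, hit.send, hit.evalue)
```

Each retained region would then be compared against the annotated gene models, and checked by reciprocal BLAST, as described in the Results.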
Multiple sequence alignments and phylogenetic analysis were performed with Clustal Omega at http://www.ebi.ac.uk/Tools/msa/clustalo/ or with MUSCLE and PhyML 3.0 [67, 68] at http://www.phylogeny.fr/index.cgi. Pairwise sequence alignments were performed with SIM at http://web.expasy.org/sim/.
Synteny analysis of Sox genes in P. tepidariorum and S. mimosarum
The synteny of Sox genes was analysed to determine whether Sox genes were duplicated during the reported WGD [41].
For P. tepidariorum, the AUGUSTUS gene models are already mapped against the DoveTail/HiRise genome assembly, and using these data the locations of Sox genes, along with five upstream and five downstream flanking genes, were compared. Gene models were removed if they were partial, chimeric or artefacts of the AUGUSTUS annotation to the HiRise assembly. To infer putative homology of flanking genes, their protein sequences were compared with BLASTP to the NCBI non-redundant protein sequence database [70].
For S. mimosarum, the Sox gene models and their locations in the genome were obtained from [44]. As for P. tepidariorum, the synteny of the five upstream and five downstream genes relative to each Sox gene was compared. Annotation of flanking genes was previously performed by Sanggaard et al. [44].
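The flanking-gene comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis pipeline used here: the function names, gene identifiers and homology labels are hypothetical, and each scaffold is assumed to be available as a list of gene IDs in genomic order.

```python
def flanking_genes(ordered_genes, target, n=5):
    """Return up to n genes on each side of target along one scaffold.

    ordered_genes lists gene IDs in genomic order; fewer than n genes are
    returned on a side when the target lies near a scaffold end.
    """
    i = ordered_genes.index(target)
    upstream = ordered_genes[max(0, i - n):i]
    downstream = ordered_genes[i + 1:i + 1 + n]
    return upstream, downstream


def shared_neighbours(flank_a, flank_b, homology):
    """Return homology labels found in the flanks of both loci.

    homology maps gene ID -> label (for example, a best BLASTP hit);
    genes without a label are ignored.
    """
    labels_a = {homology[g] for g in flank_a if g in homology}
    labels_b = {homology[g] for g in flank_b if g in homology}
    return labels_a & labels_b


# Hypothetical scaffolds and homology labels, for illustration only.
scaffold_1 = ["g1", "g2", "g3", "g4", "g5", "soxA-1", "g6", "g7"]
scaffold_2 = ["h1", "soxA-2", "h2", "h3", "h4", "h5", "h6", "h7"]
homology = {"g5": "kinase", "g6": "transporter",
            "h1": "kinase", "h2": "transporter", "h3": "protease"}

up1, down1 = flanking_genes(scaffold_1, "soxA-1")
up2, down2 = flanking_genes(scaffold_2, "soxA-2")
shared = shared_neighbours(up1 + down1, up2 + down2, homology)
```

Homologous neighbours shared between the loci of two Sox paralogues would be consistent with duplication of a larger genomic region rather than of the Sox gene alone.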
Embryo collection and procedures
Embryos were collected from adult female spiders from the temperature-controlled (25 °C) laboratory culture at Oxford Brookes University. Embryos at stages 5 to 12 were fixed as described in [71] and staged according to [72].
In situ hybridisation
RNA in situ hybridisation was carried out as described previously, with the following minor modifications: the Proteinase K treatment and post-fixation steps in the original protocol were omitted and, prior to hybridisation, the probes were heated to 80 °C for 5 min and immediately put on ice before being added to the pre-hybridisation buffer. Fluorescent in situ hybridisation was performed following a previously described protocol. Tyramide Signal Amplification (TSA) was performed with TSA kits from PerkinElmer (TSA Fluorescein and TSA Cyanine). After hybridisation, nuclei were stained by incubating embryos in 1 μg/ml 4′,6-diamidino-2-phenylindole (DAPI) in PBS with 0.1% Tween-20 for 15 min. Embryos were mounted in glycerol on Poly-L-lysine (Sigma-Aldrich) coated coverslips, to which the germband tissue attaches, making it easier to remove the yolk before imaging. Images were taken with an AxioZoom V16 stereomicroscope (Zeiss) equipped with an Axiocam 506 mono and colour digital camera. Brightness and intensity of the images were adjusted in Corel PhotoPaint X5 (CorelDraw).
Gene isolation and cloning
Gene-specific cDNA fragments were amplified with primers designed with Primer-BLAST (https://www.ncbi.nlm.nih.gov/tools/primer-blast/) and the PCR products were cloned into the pCR4-TOPO vector (Invitrogen, Life Technologies). The primers used to generate probe fragments for RNA in situ hybridisation were designed to regions outside the consensus HMG domain to produce DNA fragments between 500 and 800 bp. The probes were transcribed in vitro as described previously. Primers and fragment sizes are described in Additional file 7: Table S4.
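The two design constraints on probe fragments (a length of 500 to 800 bp and no overlap with the HMG domain) can be expressed as a simple check. The Python sketch below is only an illustration: the function name and all coordinates are hypothetical, and positions are assumed to be 0-based, end-exclusive nucleotide coordinates on the cDNA.

```python
def fragment_ok(frag_start, frag_end, hmg_start, hmg_end,
                min_len=500, max_len=800):
    """Return True if a fragment is 500-800 bp long and avoids the HMG domain.

    All coordinates are 0-based, end-exclusive positions on the cDNA.
    """
    length = frag_end - frag_start
    # Two half-open intervals overlap when each starts before the other ends.
    overlaps_hmg = frag_start < hmg_end and hmg_start < frag_end
    return min_len <= length <= max_len and not overlaps_hmg


# Hypothetical cDNA with an HMG domain encoded at positions 300-537.
checks = [
    fragment_ok(900, 1500, 300, 537),   # 600 bp, 3' of the domain
    fragment_ok(400, 1000, 300, 537),   # 600 bp, but overlaps the domain
    fragment_ok(600, 1000, 300, 537),   # outside the domain, but only 400 bp
]
```

Excluding the conserved HMG domain from probes reduces the chance of cross-hybridisation between paralogues, which matters when most duplicates have been retained.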
Shimeld SM, Holland PW. Vertebrate innovations. Proc Natl Acad Sci U S A. 2000;97(9):4449–52.
Larroux C, Luke GN, Koopman P, Rokhsar DS, Shimeld SM, Degnan BM. Genesis and expansion of metazoan transcription factor gene classes. Mol Biol Evol. 2008;25(5):980–96.
Kamachi Y, Kondoh H. Sox proteins: regulators of cell fate specification and differentiation. Development. 2013;140(20):4129–44.
Sarkar A, Hochedlinger K. The Sox family of transcription factors: versatile regulators of stem and progenitor cell fate. Cell Stem Cell. 2013;12(1):15–30.
Reiprich S, Wegner M. From CNS stem cells to neurons and glia: Sox for everyone. Cell Tissue Res. 2015;359(1):111–24.
Sinclair A, Berta P, Palmer M, Hawkins J, Griffiths B, Smith M, Foster J, Frischauf A, Lovell-Badge R, Goodfellow P. A gene from the human sex determining region encodes a protein with homology to a conserved DNA binding motif. Nature. 1990;346:240–4.
Bowles J, Schepers G, Koopman P. Phylogeny of the SOX family of developmental transcription factors based on sequence and structural indicators. Dev Biol. 2000;227:239–55.
Heenan P, Zondag L, Wilson MJ. Evolution of the Sox gene family within the chordate phylum. Gene. 2016;575(2 Pt 2):385–92.
Phochanukul N, Russell S. No backbone but lots of Sox: invertebrate Sox genes. Int J Biochem Cell Biol. 2010;42(3):453–64.
King N, Westbrook MJ, Young SL, Kuo A, Abedin M, Chapman J, Fairclough S, Hellsten U, Isogai Y, Letunic I, et al. The genome of the choanoflagellate Monosiga brevicollis and the origin of metazoans. Nature. 2008;451(7180):783–8.
Zhong L, Wang D, Gan X, Yang T, He S. Parallel expansions of Sox transcription factor group B predating the diversifications of the arthropods and jawed vertebrates. PLoS One. 2011;6(1):e16570.
Schnitzler CE, Simmons DK, Pang K, Martindale MQ, Baxevanis AD. Expression of multiple Sox genes through embryonic development in the ctenophore Mnemiopsis leidyi is spatially restricted to zones of cell proliferation. EvoDevo. 2014;5:15.
Wilson MJ, Dearden PK. Evolution of the insect Sox genes. BMC Evol Biol. 2008;8(1):120.
Wei L, Cheng D, Li D, Meng M, Peng L, Tang L, Pan M, Xiang Z, Xia Q, Lu C. Identification and characterization of Sox genes in the silkworm. Mol Biol Rep. 2010:1–12.
McKimmie C, Woerfel G, Russell S. Conserved genomic organisation of group B Sox genes in insects. BMC Genet. 2005;6:26.
Russell SRH, Sanchez-Soriano N, Wright CR, Ashburner M. The Dichaete gene of Drosophila melanogaster encodes a SOX-domain protein required for embryonic segmentation. Development. 1996;122:3669–76.
Nambu P, Nambu J. The Drosophila fishhook gene encodes a HMG domain protein essential for segmentation and CNS development. Development. 1996;122:3467–75.
Buescher M, Hing FS, Chia W. Formation of neuroblasts in the embryonic central nervous system of Drosophila melanogaster is controlled by SoxNeuro. Development. 2002;129:4193–203.
Overton P, Meadows L, Urban J, Russell S. Evidence for differential and redundant function of the Sox genes Dichaete and SoxN during CNS development in Drosophila. Development. 2002;129:4219–28.
Sanchez-Soriano N, Russell S. The Drosophila Sox-domain protein Dichaete is required for the development of the central nervous system midline. Development. 1998;125:3989–96.
Niwa H, Nakamura A, Urata M, Shirae-Kurabayashi M, Kuraku S, Russell S, Ohtsuka S. The evolutionally-conserved function of group B1 Sox family members confers the unique role of Sox2 in mouse ES cells. BMC Evol Biol. 2016:1–12.
Aleksic J, Ferrero E, Fischer B, Shen SP, Russell S. The role of Dichaete in transcriptional regulation during Drosophila embryonic development. BMC Genomics. 2013;14:861.
Ferrero E, Fischer B, Russell S. SoxNeuro orchestrates central nervous system specification and differentiation in Drosophila and is only partially redundant with Dichaete. Genome Biol. 2014;15(5):R74.
Carl S, Russell S. Common binding by redundant group B Sox proteins is evolutionarily conserved in Drosophila. BMC Genomics. 2015;16:292.
Meng FW, Biteau B. A Sox transcription factor is a critical regulator of adult stem cell proliferation in the Drosophila intestine. Cell Rep. 2015;13(5):906–14.
Chen J, Xu N, Huang H, Cai T, Xi R. A feedback amplification loop between stem cells and their progeny promotes tissue regeneration and tumorigenesis. eLife. 2016;5:e14330.
Ritter AR, Beckstead RB. Sox14 is required for transcriptional and developmental responses to 20-hydroxyecdysone at the onset of Drosophila metamorphosis. Dev Dyn. 2010;239(10):2685–94.
Le R, Bubnys A, Kirchner R, Chapman B, Hofmann O, Hide W, Tanzi RE. Silencing of the Drosophila ortholog of Sox5 leads to abnormal neuronal development and behavioural impairment. Hum Mol Genet. 2017;26(8):1472–82.
Nanda S, Defalco T, Hui Yong Loh S, Phochanukul N, Camara N, Van Doren M, Russell S. Sox100B, a Drosophila group E Sox-domain gene, is required for somatic testis differentiation. Sex Dev. 2009;3(1):26–37.
Dichtel-Danjoy M-L, Caldeira J, Casares F. SoxF is part of a novel negative-feedback loop in the wingless pathway that controls proliferation in the Drosophila wing disc. Development. 2009;136(5):761–9.
Miller SW, Avidor-Reiss T, Polyanovsky A, Posakony JW. Complex interplay of three transcription factors in controlling the tormogen differentiation program of Drosophila mechanoreceptors. Dev Biol. 2009;329(2):386–99.
Clark E, Peel A. Evidence for the temporal regulation of insect segmentation by a conserved sequence of transcription factors. Development. 2018;145. https://doi.org/10.1242/dev.155580.
Fortunato S, Adamski M, Bergum B, Guder C, Jordal S, Leininger S, Zwafink C, Rapp HT, Adamska M. Genome-wide analysis of the Sox family in the calcareous sponge Sycon ciliatum: multiple genes with unique expression patterns. EvoDevo. 2012;3(1):14.
Richards GS, Rentzsch F. Regulation of Nematostella neural progenitors by SoxB, Notch and bHLH genes. Development. 2015;142(19):3332–42.
Pioro HL, Stollewerk A. The expression pattern of genes involved in early neurogenesis suggests distinct and conserved functions in the diplopod Glomeris marginata. Dev Genes Evol. 2006;216(7–8):417–30.
Stork NE, McBroom J, Gely C, Hamilton AJ. New approaches narrow global species estimates for beetles, insects, and terrestrial arthropods. Proc Natl Acad Sci U S A. 2015;112(24):7519–23.
Valentine JW, Jablonski D, Erwin DH. Fossils, molecules and embryos: new perspectives on the Cambrian explosion. Development. 1999;126(5):851–9.
Schmidt-Ott U, Lynch JA. Emerging developmental genetic model systems in holometabolous insects. Curr Opin Genet Dev. 2016;39:116–28.
Leite DJ, McGregor AP. Arthropod evolution and development: recent insights from chelicerates and myriapods. Curr Opin Genet Dev. 2016;39:93–100.
Akiyama-Oda Y, Oda H. Multi-color FISH facilitates analysis of cell-type diversification and developmental gene regulation in the Parasteatoda spider embryo. Dev Growth Differ. 2016;58(2):215–24.
Schwager EE, Sharma PP, Clarke T, Leite DJ, Wierschin T, Pechmann M, Akiyama-Oda Y, Esposito L, Bechsgaard J, Bilde T, et al. The house spider genome reveals an ancient whole-genome duplication during arachnid evolution. BMC Biol. 2017;15(1):62.
Leite DJ, Ninova M, Hilbrant M, Arif S, Griffiths-Jones S, Ronshaugen M, McGregor AP. Pervasive microRNA duplication in chelicerates: insights from the embryonic microRNA repertoire of the spider Parasteatoda tepidariorum. Genome Biol Evol. 2016;8(7):2133–44.
Hilbrant M, Damen WG, McGregor AP. Evolutionary crossroads in developmental biology: the spider Parasteatoda tepidariorum. Development. 2012;139(15):2655–62.
Sanggaard KW, Bechsgaard JS, Fang X, Duan J, Dyrlund TF, Gupta V, Jiang X, Cheng L, Fan D, Feng Y, et al. Spider genomes provide insight into composition and evolution of venom and silk. Nat Commun. 2014;5:3765.
Gramates LS, Marygold SJ, dos Santos G, Urbano J-M, Antonazzo G, Matthews BB, Rey AJ, Tabone CJ, Crosby MA, Emmert DB, Falls K, Goodman JL, Hu Y, Ponting L, Schroeder AJ, Strelets VB, Thurmond J, Zhou P, the FlyBase Consortium. FlyBase at 25: looking to the future. Nucleic Acids Res. 2017;45(D1):D663–71.
Janssen R, Andersson E, S B, Fowler W, Höök L, Leyhr J, Landström E, Mannelqvist A, Panara V, Smith K, et al. Embryonic expression patterns and phylogenetic analysis of panarthropod Sox genes: insight into nervous system development, segmentation and gonadogenesis. BMC Evol Biol. 2018;18(1):88.
Pevny L, Placzek M. Sox genes and neural progenitor identity. Curr Opin Neurobiol. 2005;15:7–13.
Uchikawa M, Kamachi Y, Kondoh H. Two distinct subgroups of group B Sox genes for transcriptional activators and repressors: their expression during embryonic organogenesis of the chicken. Mech Dev. 1999;84:103–20.
Popovic J, Stanisavljevic D, Schwirtlich M, Klajn A, Marjanovic J, Stevanovic M. Expression analysis of SOX14 during retinoic acid induced neural differentiation of embryonal carcinoma cells and assessment of the effect of its ectopic expression on SOXB members in HeLa cells. PLoS One. 2014;9(3):e91852.
Shen SP, Aleksic J, Russell S. Identifying targets of the Sox domain protein Dichaete in the Drosophila CNS via targeted expression of dominant negative proteins. BMC Dev Biol. 2013;13:1.
Faino L, Seidl MF, Shi-Kunne X, Pauper M, van den Berg GC, Wittenberg AH, Thomma BP. Transposons passively and actively contribute to evolution of the two-speed genome of a fungal pathogen. Genome Res. 2016;26(8):1091–100.
Levine MT, Vander Wende HM, Hsieh E, Baker EP, Malik HS. Recurrent gene duplication diversifies genome defense repertoire in Drosophila. Mol Biol Evol. 2016;33(7):1641–53.
Cremazy F, Berta P, Girard F. SoxNeuro, a new Drosophila Sox gene expressed in the developing central nervous system. Mech Dev. 2000;93:215–9.
Stollewerk A, Chipman AD. Neurogenesis in myriapods and chelicerates and its importance for understanding arthropod relationships. Integr Comp Biol. 2006;46(2):195–206.
Tanaka S, Kamachi Y, Tanouchi A, Hamada H, Jing N, Kondoh H. Interplay of SOX and POU factors in regulation of the nestin gene in neural primordial cells. Mol Cell Biol. 2004;24:8834–46.
Bergsland M, Ramskold D, Zaouter C, Klum S, Sandberg R, Muhr J. Sequentially acting Sox transcription factors in neural lineage development. Genes Dev. 2011;25:2453–64.
Cremazy F, Berta P, Girard F. Genome-wide analysis of Sox genes in Drosophila melanogaster. Mech Dev. 2001;109:371–5.
Lefebvre V. The SoxD transcription factors – Sox5, Sox6, and Sox13 – are key cell fate modulators. Int J Biochem Cell Biol. 2010;42:429–32.
Schwager EE, Meng Y, Extavour CG. Vasa and piwi are required for mitotic integrity in early embryogenesis in the spider Parasteatoda tepidariorum. Dev Biol. 2015;402(2):276–90.
Loh SHY, Russell S. A Drosophila group E Sox gene is dynamically expressed in the embryonic alimentary canal. Mech Dev. 2000;93:4.
Vidal VPI, Charboissier M-C, deRooij DG, Schedl A. Sox9 induces testis development in XX transgenic mice. Nat Genet. 2001;28:216–7.
Bell DM, Leung KK, Wheatley SC, Ng LJ, Zhou S, Ling KW, Sham MH, Koopman P, Tam PP, Cheah KS. SOX9 directly regulates the type-II collagen gene. Nat Genet. 1997;16(2):174–8.
Stolt CC, Wegner M. SoxE function in vertebrate nervous system development. Int J Biochem Cell Biol. 2010;42:437–40.
Stollewerk A, Weller M, Tautz D. Neurogenesis in the spider Cupiennius salei. Development. 2001;128(14):2673–88.
Voldoire E, Brunet F, Naville M, Volff JN, Galiana D. Expansion by whole genome duplication and evolution of the Sox gene family in teleost fish. PLoS One. 2017;12(7):e0180936.
Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Soding J, et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol. 2011;7:539.
Edgar RC. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 2004;5:113.
Dereeper A, Guignon V, Blanc G, Audic S, Buffet S, Chevenet F, Dufayard JF, Guindon S, Lefort V, Lescot M, et al. Phylogeny.fr: robust phylogenetic analysis for the non-specialist. Nucleic Acids Res. 2008;36(Web Server issue):W465–9.
Huang X, Miller W. A time-efficient linear-space local similarity algorithm. Adv Appl Math. 1991;12:337–57.
Altschul SF, Wootton JC, Gertz EM, Agarwala R, Morgulis A, Schäffer AA, Yu YK. Protein database searches using compositionally adjusted substitution matrices. FEBS J. 2005;272:5101–9.
Akiyama-Oda Y, Oda H. Early patterning of the spider embryo: a cluster of mesenchymal cells at the cumulus produces Dpp signals received by germ disc epithelial cells. Development. 2003;130:1735–47.
Mittmann B, Wolff C. Embryonic development and staging of the cobweb spider Parasteatoda tepidariorum C. L. Koch, 1841 (syn.: Achaearanea tepidariorum; Araneomorphae; Theridiidae). Dev Genes Evol. 2012;222(4):189–216.
Acknowledgements
We thank Evelyn Schwager for assistance in identifying Sox genes in the P. tepidariorum genome. CLBP is immensely grateful for the help of, and discussions with, the members of the Embryology course (Woods Hole, 2016), especially Joaquin Navajas Acedo (Stowers Institute, USA) for pushing to keep the ball rolling.
Funding
This research was funded by a CNPq scholarship to CLBP (234586/2014–1), a grant from The Leverhulme Trust (RPG-2016-234) to APM and AS, and in part by a BBSRC grant (BB/N007069/1) to SR.
Availability of data and materials
Gene models for P. tepidariorum and S. mimosarum were retrieved from https://www.hgsc.bcm.edu/arthropods/common-house-spider-genome-project and from http://metazoa.ensembl.org/Stegodyphus_mimosarum/Info/Index. Sox gene sequences for animals were retrieved from UniProt. The annotated P. tepidariorum genome is available at https://i5k.nal.usda.gov/JBrowse-partep and the assembly is deposited at NCBI: BioProject PRJNA167405 (Accession: AOMJ00000000).
Ethics approval and consent to participate
Consent for publication
Competing interests
The authors declare that they have no competing interests.
Table S1. HMG-domain and, where available, full-length protein sequences from D. melanogaster, P. tepidariorum, S. mimosarum and M. musculus. Gene indicates the proposed names (or defined names for mouse). DB_Name indicates gene or gene model name from databases. DB_ID is the gene or protein accession. Scaffold indicates chromosome or genomic scaffold location. Annotation is the designation from spider annotations. (XLSX 53 kb)
Figure S1. Phylogeny of Group B Sox HMG domains. PhyML tree and multiple sequence alignment of group B HMG domains from Mus musculus (Mm), Drosophila melanogaster (Dm), Anopheles gambiae (Ag), Tribolium castaneum (Tc), Parasteatoda tepidariorum (Pt) and Stegodyphus mimosarum (Sm). Branch support values from PhyML are indicated in red. The arrow indicates the conserved isoleucine residue indicative of invertebrate Dichaete/Sox21b class genes. (PNG 849 kb)
Figure S2. Phylogeny of full-length Sox proteins from Drosophila and spiders. PhyML tree of Sox genes from D. melanogaster (Dm), P. tepidariorum (Pt) and S. mimosarum (Sm) based on available full-length protein sequences (Additional file 1: Table S1). Branch support values from PhyML are indicated in red. (PNG 1624 kb)
Table S2. Gene and scaffold IDs of Sox and linked genes in the P. tepidariorum genome. (TXT 8 kb)
Table S3. Gene and scaffold IDs of Sox and linked genes in the S. mimosarum genome. (TXT 3 kb)
Figure S3. Double fluorescent in situ hybridisation. Double in situ hybridisation with (A) digoxigenin-labelled pt-sox21b-1 in red and (B) fluorescein-labelled pt-SoxN in green. (C) Merge of panels A and B showing the overlap. (PNG 2968 kb)
Table S4. Genes, primer sequences and sizes of all the fragments used for in situ hybridisation. (DOCX 15 kb)