Evolution of the insect Sox genes
© Wilson and Dearden. 2008
Received: 14 January 2008
Accepted: 26 April 2008
Published: 26 April 2008
The Sox gene family of transcriptional regulators has essential roles during development and has been extensively studied in vertebrates. The mouse, human and fugu genomes contain at least 20 Sox genes, which are subdivided into groups based on sequence similarity of the highly conserved HMG domain. In the well-studied insect Drosophila melanogaster, eight Sox genes have been identified and are involved in processes such as neurogenesis, dorsal-ventral patterning and segmentation.
We examined the available genome sequences of Apis mellifera, Nasonia vitripennis, Tribolium castaneum and Anopheles gambiae and identified Sox family members, which were classified by phylogenetics using the HMG domains. Using in situ hybridisation we determined the expression patterns of eight honeybee Sox genes in honeybee embryo, adult brain and queen ovary. AmSoxB group genes were expressed in the nervous system, brain and Malpighian tubules. The restricted localization of AmSox21b and AmSoxB1 mRNAs within the oocyte suggested a role in, or that they are regulated by, dorsal-ventral patterning. AmSoxC, D and F were expressed ubiquitously in late embryos and in the follicle cells of the queen ovary. Expression of AmSoxF and two AmSoxE genes was detected in the drone testis.
Insect genomes contain between eight and nine Sox genes, with at least four members belonging to Sox group B and other Sox subgroups each being represented by a single Sox gene. Hymenopteran insects have an additional SoxE gene, which may have arisen by gene duplication. Expression analyses of honeybee SoxB genes imply that this group of genes may be able to rapidly evolve new functions and expression domains, while the combined expression pattern of all the SoxB genes is maintained.
The SOX gene family is a group of related transcription factors that play critical roles in embryonic development. This family was originally identified in mammals based on sequence similarity to SRY, the sex-determining region Y chromosome. SOX proteins regulate gene expression by binding to DNA via a conserved DNA binding domain, the HMG (high mobility group) box (reviewed in ). Phylogenetic studies have determined that SOX family members segregate into ten groups (named A-J) on the basis of sequence similarities within the HMG box [3–5], with many groups containing multiple members from the same organism with related gene function. Human and mouse genomes each encode 20 Sox genes [3, 6] and analysis of the genomes of many model organisms including chicken, Drosophila, Xenopus and zebrafish reveals that the Sox gene family is conserved between animal phyla. Recently Sox genes have been identified in the genomes of the cnidarian Nematostella vectensis, ctenophores, and the sponge species Reniera, indicating these are ancestral animal genes [7–9]. In vertebrates, SOX proteins have been shown to have essential roles in the formation of many body systems including the central nervous system, eye and heart development, bone cartilage, vasculature, sex determination and testis development [10–14]. Molecular and biochemical studies have shown that SOX proteins regulate cell fate and differentiation during development. Mutations in Sox genes have been shown to be the underlying cause of a number of human disorders and Sox genes are expressed during cancer progression [12, 15–18].
In the arthropod model organism, Drosophila melanogaster, eight Sox genes have been identified and their expression patterns determined [19, 20]. Drosophila Sox genes are expressed in the brain, developing eye, hindgut, nervous system and testes. Group B SOX proteins are present in the developing Drosophila central nervous system (CNS), and also in the CNS of vertebrates, implying that some Sox genes maintain a conserved role throughout evolution [19, 21]. The phenotypes of Drosophila Sox gene mutants indicate that Sox genes are involved in dorsal-ventral patterning, segmentation and neurogenesis [22–28]. Collectively, these studies demonstrate that the SOX family is an evolutionarily conserved group of proteins essential for development. In insects, however, only the Sox genes from Drosophila have been characterised.
Recent whole-genome sequencing projects for Apis mellifera (the honeybee), Nasonia vitripennis (a parasitic wasp), Tribolium castaneum (the red flour beetle) and Anopheles gambiae (the malaria mosquito) have been completed or are near completion [8, 9, 29–32] allowing the identification and classification of the complete complement of a gene family from several holometabolous insects. Here we identify Sox gene family members in the genomes of these insects and examine their relationship through phylogenetics. Additionally, we study the expression of the honeybee Sox genes by in situ hybridisation and RT-PCR.
An advanced social insect, the honeybee is fast becoming an important model organism for the study of behaviour, longevity, learning and memory, immunity, polyphenisms, evolution and development. Recently the honeybee genome has been sequenced and analysis of developmental genes has revealed that some early acting developmental genes are absent. Furthermore, the development of molecular techniques including in situ hybridisation and RNA interference (RNAi) [34–37] allows us to examine gene expression and the biological role of genes during honeybee embryogenesis and development, about which little is known despite the importance of the honeybee both scientifically and economically. Given the wide range of organ systems in which Sox genes are expressed, we aimed to identify and examine the expression of honeybee Sox genes. We identified nine Sox genes and used in situ hybridisation and RT-PCR to determine their expression patterns in the honeybee embryo, ovary and adult brain.
In insects much of the diversity in Sox genes is found within the SoxB clade. Our phylogenetic analyses indicate that this clade is split into four groups (Fig. 2A), but that the details of the groupings are not well-resolved, due in part to the high sequence similarity of the HMG domains, and sequence divergence between SOX proteins outside of this domain. The SOX21/Dichaete clade is unresolved, but separate from the rest of the SOXB proteins. SOX21b proteins form a separate clade, as do the SOX21/Neuro orthologues. The SOXB2/21a clade is less well defined, with Drosophila SOX21a proteins being significantly different from SOXB2, perhaps indicating rapid evolution of these proteins in the lineage leading to Diptera.
The phylogenetic analysis demonstrates the evolutionary stability of the Sox gene complement in insect evolution. Most of the sequence diversification appears in the SoxB group and in the duplication of the SoxE genes, which is seen only in Hymenoptera. Given the stability in sequence, we examined the expression of these genes in honeybees to determine if sequence stability is matched with constancy of predicted function.
Phylogenetic analysis reveals that the honeybee genome contains four group B Sox genes. These were also identified by McKimmie et al., who investigated the genomic organisation of group SoxB genes in insects. We examined the expression patterns of these Sox genes in the queen ovariole, honeybee worker embryos and adult brains.
During embryo development, AmSoxB1 is expressed along ventral gastrulation folds of stage 6 embryos (Fig. 4C), and in the procephalic neurogenic region. After gastrulation, AmSoxB1 expression continues in neuroblasts that arise from neuroectoderm along the ventral midline (Fig. 4E). At later stages these AmSoxB1-positive cells migrate to lateral positions along the ventral axis to differentiate and take up positions within the CNS. Strong expression of AmSoxB1 is also found in neurons of embryonic brain cephalic lobes. This expression continues in the brain of the adult worker honeybee, where AmSoxB1 is expressed in Kenyon cells in each calyx of the mushroom bodies (data not shown), the key region of the honeybee brain required for sensory processing and memory formation.
AmSox21b mRNA was detected in late embryos, in the CNS in paired/segmented ganglia on either side of the ventral nerve cord. Expression was also detected in the embryonic brain, intercalary head region and mandibles (Fig. 5C and 5D) and in the mushroom bodies of the adult worker brain (Fig. 5F). Strong AmSox21b expression is detected at the ventral tip of the developing mandible, implying that it may play a role in dorsal-ventral patterning of this appendage. In queen ovarioles, AmSox21b is strongly expressed by the nurse cells and its mRNA is present in the oocyte, localized to both dorsal and ventral surfaces of the egg (Fig. 5E).
During late honeybee embryogenesis, the expression patterns of AmSoxB1 and AmSox21b group B genes do not overlap in the CNS (Figs 4F and 5C). These genes appear to be expressed in different neuronal cells along the ventral midline, implying that they play separate roles in the developing CNS. In the embryonic and adult brain, however, AmSoxB1 and AmSox21b are both expressed by the Kenyon cells of the mushroom bodies.
AmSoxC, AmSoxD and AmSoxF were all expressed by nurse cells of the queen ovariole and the follicle cells that surround the oocyte (Fig. 6A). All three were also expressed ubiquitously throughout late stage embryos (Fig. 6B), although AmSoxC expression was slightly higher in the embryonic brain (Fig. 6C). AmSoxC and AmSoxD were also expressed by the Kenyon cells in the calyces of the mushroom bodies (MB) (Fig. 6D).
As SOX proteins play key roles in gonad differentiation, we used RT-PCR to determine if the honeybee Sox genes were also expressed in the testis of the drone (Fig. 7). RNA was isolated from the testis of drone pupae, as the adult drone testis degenerates shortly after emergence. Strong expression of AmSoxE1, AmSoxE2 and AmSoxF, and weak expression of AmSox21b, was detected in testis (Fig. 7). AmSoxF was also expressed in queen ovaries (Fig. 6A), so only AmSoxE group gene expression appears to be testis-specific. No AmSoxE expression was detected in queen ovaries and only weak ubiquitous expression was found in late stage worker embryos.
Summary of honeybee Sox group expression analysis.

Sox group | Honeybee expression summary | Drosophila expression summary | Vertebrate expression summary
B | Adult brain: MB*. Embryo: neuroectoderm, CNS (ventral midline), brain, Malpighian tubules, mandibles, intercalary. Queen ovary: oocyte and nurse cells. | Embryo: neuroectoderm, CNS, brain, hindgut, segments. Ovary: oocyte and nurse cells. | Embryo: CNS, lens, brain, stem cells, pituitary.
C | Embryo: ubiquitous expression. Adult brain: MB. Queen ovary: nurse and follicle cells of the ovary. | Embryo: ubiquitous. | Embryo: many tissues including CNS, heart, lung.
D | Embryo: ubiquitous. Adult brain: MB. Queen ovary: nurse and follicle cells. | Embryo: brain. | Embryo: many tissues including chondrocytes, spermatogenesis, CNS, brain, thymus, ovary.
E | Adult testis. Embryo: no expression detected. | Embryo: alimentary canal, gonadal mesoderm. | Embryo: CNS, brain, limbs, heart, testes, chondrocytes, kidney, neural crest.
F | Queen ovary: nurse and follicle cells. | Embryo: peripheral nervous system. | Embryo: endoderm, blood vessel and hair follicles.
The general features of group B gene expression are conserved for the honeybee, as their expression patterns suggest roles in neurogenesis and dorsal-ventral patterning. However, orthology based on phylogenetic evidence does not predict the expression pattern of an individual gene. Despite conservation in genomic organisation and sequence in insects, expression of the individual SoxB genes has changed considerably through the evolution of insects.
None of the AmSox B group genes show identical expression patterns to any of their orthologous DmSox B genes. For example, the AmSox21b expression pattern in the CNS is different to that of DmSox21b, which is expressed in abdominal epidermal stripes. The AmSoxB1 expression pattern overlaps with both DmSoxB1 and DmDichaete (DmSoxB2.1) expression patterns. No expression was detected for AmSox21, which had been suggested to be an orthologue of DmDichaete by McKimmie et al. based on phylogenetics and genome position, and is Dichaete's nearest neighbour in our phylogenetic analysis.
Recently, in Drosophila, examination of a DmDichaete (DmSoxB2.1) loss of function mutant found that Dichaete influenced dorsal-ventral patterning. Mutant eggs had defects in Gurken-dependent formation of dorsal appendages and differentiation of dorsal/anterior follicle cells. Additionally, in zebrafish, both knock-down and ectopic expression of the SOX protein SOX21a indicate that it acts in dorsal-ventral patterning. In the honeybee, two Sox genes appear to have a role in, or are regulated by, dorsal-ventral patterning. AmSoxB1 mRNA is localized to the dorsal surface of the oocyte and AmSox21b mRNA is localized to both the dorsal and ventral surface of the oocyte. As mRNA localization plays a critical part in axis specification in other insects [43, 44], it is likely that these AmSox genes have roles in dorsal-ventral patterning in the oocyte and they may have overlapping functions. It is currently unknown how axes are specified in the honeybee oocyte and early embryo, as the honeybee genome is missing several key genes essential for axis organisation in Drosophila. While these expression patterns suggest a conserved role for SOX group B proteins in dorsal-ventral patterning, the actual SoxB genes involved are not orthologous. The direct orthologue of DmDichaete, according to our phylogenetic analysis (and that of ) is AmSox21, which has no expression in the oocyte, while the honeybee orthologues of zebrafish Sox21a are AmSoxB2 and AmSox21a.
We have also found a novel expression pattern for a group B SOX protein, AmSOXB2, in the formation of the Malpighian tubules. The Drosophila group E SOX protein (DmSox100b) is also expressed in Malpighian tubules, while SOX proteins in mammals are expressed in analogous tissues, the foetal kidneys. AmSOXB2 sequence is highly divergent outside of the HMG box, perhaps reflecting different selective pressure on its sequence due to its co-option into a possible role in Malpighian tubule formation.
AmSoxC, AmSoxD and AmSoxF were expressed throughout late stage embryos. The Drosophila orthologue for SoxC is also ubiquitously expressed [19, 46], although SoxD and SoxF orthologues show specific nervous system expression. Vertebrate SoxD orthologues are expressed broadly in embryonic tissues and more specifically in bone and pancreas.
There is little conservation in expression of SOXF group members between species. Vertebrate SoxF family members are involved in a range of activities including endoderm specification, blood and hair follicle development. DmSoxF is found in the peripheral nervous system, whereas C. elegans does not have a SoxF group gene.
Drosophila DmSox100b is expressed in gonadal mesoderm and its expression becomes male-specific after stage 15, but it is also expressed in other tissues including the alimentary canal, intestinal cells and Malpighian tubules [45, 48]. Upregulation of AmSOXE proteins solely in the drone testis implies that they may play a specific role in honeybee testis differentiation. Group F SOX proteins, a group closely related to SOXE proteins (Fig. 1), are also expressed in both testis and ovaries in other species including the eel and human (sox17; ), indicating that SOXF proteins play a conserved evolutionary role in both male and female gonads.
Sequence analyses revealed that the honeybee and Nasonia genomes encode two SoxE group members, whereas there is only one in Drosophila (DmSox100b) and none in C. elegans. Expression of both AmSoxE genes was upregulated in the testis of honeybee drones, suggesting they play a role in testicular development. SOXE group proteins are expressed during testis determination in many species [48, 51–53]. Sequences outside of the HMG domains of SOXE1 and SOXE2 show little similarity. These sequence changes may have been necessary for interactions with other testis-related factors. Non-HMG domain sequences can play a role in protein partner selection between different SOX groups, but SOX proteins within the same subgroup often interact with the same protein partners despite having sequences that are different outside of the HMG domain.
We identified and classified Sox genes in the genomes of Apis mellifera, Nasonia and Tribolium and examined the expression patterns of eight honeybee Sox genes by in situ hybridisation. The expression patterns of honeybee Sox genes suggest that members of this family are likely to play an essential role in embryogenesis and neural specification. Further studies, including knock-down of gene expression, are required to confirm the predicted roles of Sox genes in the honeybee.
SOX homologues were identified in insect genome sequences using tBlastN searches. Each putative AmSOX protein was analysed for the presence of the sequence motif RPMNAFMVW, located within the HMG box, which is conserved for all SOX sequences, confirming that the genes identified were members of the SOX group of HMG domain transcription factors. Multiple alignments of honeybee SOX HMG sequences with SOX domains from other species were carried out in ClustalX (see Additional files 1, 2 and 3). For Figure 1 the multiple alignment was analysed using MrBAYES 3.1.2 under the WAG model with default priors. The WAG model was chosen as the most appropriate model of amino-acid substitution after preliminary analysis using MrBAYES with mixed models. The Monte Carlo Markov Chain search was run with four chains over 1500000 generations with trees sampled every 1000 generations. Trees from the first 375000 generations were discarded as 'burn-in'. The trees in Figures 2 and 3 were constructed using the PHYLIP package of programs from alignments bootstrapped using SEQBOOT. Maximum Likelihood trees were estimated using PROTML and majority-rule consensus trees derived using CONSENSE. Dendrograms were displayed using TreeViewPPC or Dendroscope.
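The motif screen and the burn-in arithmetic described above can be illustrated with a minimal sketch. It is not the authors' code: the function names and the two peptide fragments are hypothetical and invented for illustration, and the burn-in figure of 375000 is interpreted here as generations (an assumption, since only 1500 trees are sampled when 1500000 generations are sampled every 1000).

```python
# Sketch only: the paper reports the motif and the MCMC settings, not this code.
HMG_MOTIF = "RPMNAFMVW"  # conserved SOX motif quoted in the text


def has_sox_motif(protein_seq: str, motif: str = HMG_MOTIF) -> bool:
    """Return True if the exact motif occurs in the sequence (case-insensitive)."""
    return motif in protein_seq.upper()


def retained_samples(generations: int, sample_every: int, burnin_generations: int) -> int:
    """Count sampled trees kept after discarding those drawn during burn-in.

    Simplification: the tree recorded at generation 0 is ignored.
    """
    total = generations // sample_every
    discarded = burnin_generations // sample_every
    return total - discarded


# Hypothetical peptide fragments, invented for illustration (not real SOX sequences).
candidates = {
    "candidate_1": "mkdhikRPMNAFMVWsrgqrrk",  # contains the motif
    "candidate_2": "mkdhikRPANAFMVWsrgqrrk",  # one substitution inside the motif
}
for name, seq in candidates.items():
    print(name, has_sox_motif(seq))

# Settings from the text: 1500000 generations, sampled every 1000, 375000 burn-in.
print(retained_samples(1_500_000, 1000, 375_000))
```

An exact-match test like this would reject a genuine homologue carrying a single substitution in the motif, so in practice a tolerant comparison or an alignment-based check may be preferable.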
Genome sequence information for insects and other species was retrieved from their genome project websites [32, 61–64]. Exon/intron gene structure was either predicted by Genemachine or had already been predicted during the genome assembly using sets of reference sequences (including Drosophila) to help identify transcripts. Insect Sox genes were named based on their placement within each SOX group (see Additional file 4 for GenBank accession numbers).
Total RNA was extracted from honeybee embryos or testis dissected from drone pupae using the RNeasy Mini Kit (Qiagen) and cDNA was produced using Superscript II reverse transcriptase (Invitrogen). AmSox gene fragments were amplified by RT-PCR from embryo cDNA using oligonucleotide primers corresponding to non-HMG box encoding regions from the coding sequence of each predicted AmSox gene. Oligonucleotide primers used were: SoxC – 5'AGAAGCTGAGGAAATCGGGT3' and 5'AATTCCATCTTCATCTTTCCGTC3'; SoxB1 – 5'GCTCAAGAAGGATAAATTCCCC3' and 5'AATCGCCGTGTGATGCTG3'; SoxB2 – 5'TCACACGTTGATGAGCCAC3' and 5'GACGACGACAAATTCTCCTCTTC3'; Sox21 – 5'TCCAGGATCGAAGACCACC3' and 5'CTAGAATATTACGGAGACTGGCC3'; Sox21b – 5'GAAGTATTCGATGGAAGCGG3' and 5'GATGACAGTGAGCGGTCGT3'; SoxE1 – 5'CCAGAGCAACGTGACTTTCA3' and 5'CCACCTCGCACTCCTGAA3'; SoxE2 – 5'GAACGCGTTCATGGTCTG3' and 5'TCCTCGTGCACCGTGTAC3'; SoxF – 5'CTGAATTCAGGAAGACCAGTGG3' and 5'GACGGCTGTCTCTCGAAATT3'; SoxD – 5'GGAAGAAGATGCGCATATCC3' and 5'-TCAATCCTCGTCGTGGTG. Actin was amplified using 5'CTCTCTTTGATTGGGCTTCG3' and 5'TGACGAAGAAGTTGCTGCAC3' oligonucleotide primers as a positive control for RT-PCR. Amplified AmSox DNA fragments were then cloned into the pGEM-T Easy vector (Promega). The sequence and orientation of each cloned gene fragment were confirmed by DNA sequencing.
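As a rough illustration of how primer pairs such as those listed above might be sanity-checked, the sketch below reports length, GC fraction and an approximate melting temperature for the SoxC pair. These checks were not part of the published protocol. The function names are hypothetical, and the melting temperature uses the simple Wallace rule of thumb (2 °C per A/T, 4 °C per G/C), which is only a coarse estimate for short oligonucleotides.

```python
# Sketch only: illustrative primer checks, not part of the published protocol.

def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in a primer."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature: 2 degC per A/T plus 4 degC per G/C."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# SoxC primer pair copied from the text, in the order listed there.
soxc_primers = {
    "SoxC first primer": "AGAAGCTGAGGAAATCGGGT",
    "SoxC second primer": "AATTCCATCTTCATCTTTCCGTC",
}
for name, seq in soxc_primers.items():
    print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.2f}, Tm ~{wallace_tm(seq)} degC")
```

Nearest-neighbour thermodynamic models give more reliable melting temperatures than this rule of thumb and would be the usual choice for actual primer design.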
Honeybee embryos were collected and fixed as described. Brains were dissected from worker honeybees, fixed in 4% PFA overnight at 4°C and stored in methanol. Anti-sense or sense digoxigenin (DIG)-labeled RNA probes were produced by in vitro transcription from linearized DNA templates containing AmSox cDNA fragments. In situ hybridization on honeybee embryos, oocytes and worker brains was performed as described.
This work was supported by a Royal Society of New Zealand Marsden Grant (UOO0401) to PKD and a University of Otago Research Grant to PKD. We thank James Smith and Elizabeth Duncan for critical reading of this manuscript. We also thank Lucas Smith and William Dearden for their comments on the manuscript.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.