Since the first basic helix-loop-helix (bHLH) motif with DNA-binding and dimerization capabilities was reported, numerous bHLH proteins have been found to be intimately involved in the regulation of a wide range of developmental processes, including neurogenesis, myogenesis, hematopoiesis, sex determination, gut development, and cell differentiation and proliferation, as well as other essential processes in organisms ranging from yeast to humans
[2, 3]. Hence, it is crucial to understand the relationships among the various bHLH members and to classify them into well-defined categories. These transcription factors share a signature bHLH structural motif of approximately 60 amino acids, 19 of which are highly conserved, consisting of a basic region followed by two α-helices separated by a loop of variable length (HLH). The basic region functions as the DNA-binding domain: in a dimer, the two basic regions create a DNA interaction surface that recognizes the consensus hexanucleotide sequence, while the HLH domain interacts with other bHLH proteins to form homodimers or heterodimers between different bHLH family members.
In 1997, in an attempt to classify bHLH proteins functionally, a phylogenetic analysis of 122 bHLH sequences yielded a natural classification of bHLH transcription factors into four monophyletic protein groups named A, B, C and D. As more bHLH proteins were identified in animals, plants and fungi, 44 orthologous families and six higher-order groups were defined based on phylogenetic analyses of the then-available bHLH proteins [3, 6–8]. Following the revision by Simionato et al. in 2007, animal bHLH proteins are classified into 45 families, of which 22, 12, 7, 1, 2 and 1 belong to the higher-order groups A, B, C, D, E and F, respectively. Briefly, group A and B bHLH proteins tend to bind core DNA sequences typical of E boxes (CANNTG): group A recognizes and binds CACCTG or CAGCTG, and group B recognizes and binds CACGTG or CATGTG. Group A bHLH proteins mainly regulate neurogenesis, myogenesis and mesoderm formation, while group B proteins mainly regulate cell proliferation and differentiation, sterol metabolism and adipocyte formation, and the expression of glucose-responsive genes. Group C proteins, complex molecules with one or two PAS domains following the bHLH motif, tend to bind the core sequence ACGTG or GCGTG. They are responsible for the regulation of midline and tracheal development and circadian rhythms, and for the activation of gene transcription in response to environmental toxins. Group D proteins are bHLH proteins that cannot bind DNA because they lack a basic domain; they act as antagonists of group A proteins. Group E proteins, which mainly regulate embryonic segmentation, somitogenesis and organogenesis, bind preferentially to sequences referred to as N boxes (CACGCG or CACGAG) and usually contain two characteristic elements in the carboxyl terminus, the “Orange” domain and the “WRPW” peptide. Group F proteins contain the COE domain, an additional domain involved in both dimerization and DNA binding. This group has only one family and mainly regulates head development and the formation of olfactory sensory neurons.
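The binding preferences described above can be made concrete with a short sketch. It is illustrative only: the function names are hypothetical, the group B sites are taken as the hexamers CACGTG and CATGTG, and the five-nucleotide group C cores are left out because ACGTG is contained in the group B site CACGTG, so a hexamer alone cannot separate them.

```python
import re

# Core sequences as listed in the text; N in CANNTG stands for any nucleotide.
E_BOX = re.compile(r"CA[ACGT]{2}TG")
GROUP_A_SITES = {"CACCTG", "CAGCTG"}
GROUP_B_SITES = {"CACGTG", "CATGTG"}
N_BOX_SITES = {"CACGCG", "CACGAG"}  # preferred by group E


def classify_hexamer(site):
    """Return the group whose preferred core sequence equals `site`."""
    site = site.upper()
    if site in GROUP_A_SITES:
        return "A"
    if site in GROUP_B_SITES:
        return "B"
    if site in N_BOX_SITES:
        return "E"
    if E_BOX.fullmatch(site):
        # A valid E box that the text does not assign to group A or B.
        return "E box (unassigned)"
    return "none"


def find_e_boxes(seq):
    """Yield (position, hexamer, group) for every E box, overlaps included."""
    seq = seq.upper()
    # The lookahead lets overlapping matches be reported.
    for match in re.finditer(r"(?=(CA[ACGT]{2}TG))", seq):
        hexamer = match.group(1)
        yield match.start(), hexamer, classify_hexamer(hexamer)
```

Calling `list(find_e_boxes(seq))` on a promoter fragment gives the 0-based position of each E box and the group expected to prefer it. Real binding specificity also depends on flanking bases and on the dimerization partner, which this sketch ignores.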
Given the pivotal regulatory functions of bHLH proteins in various organisms and the completion of genome sequencing projects for an increasing number of organisms, a more refined classification of the various types of bHLH motifs is desirable, as is a better understanding of their evolutionary relationships both within and among species. Several studies have therefore sought to identify the full complement of bHLH family members encoded in completely sequenced genomes. The putative full set of bHLH genes has been reported to comprise 8 in Saccharomyces cerevisiae, 16 in Amphimedon queenslandica, 33 in Hydra magnipapillata, 42 in Caenorhabditis elegans, 46 in Ciona intestinalis, 50 in Strongylocentrotus purpuratus, 50 in Tribolium castaneum, 51 in Apis mellifera, 52 in Bombyx mori, 54 in Acyrthosiphon pisum, 57 in Daphnia pulex, 59 in Drosophila melanogaster, 63 in Lottia gigantea, 64 in Capitella sp. 1, 68 in Nematostella vectensis, 70 in Acropora digitifera, 78 in Branchiostoma floridae, 87 in Tetraodon nigroviridis, 104 in Gallus gallus, 107 in Ailuropoda melanoleuca, 114 in Mus musculus, 114 in Rattus norvegicus, 118 in Homo sapiens, 139 in Danio rerio, 162 in Arabidopsis thaliana, and 167 in Oryza sativa [9–21].
The ponerine ant Harpegnathos saltator (Jerdon, 1851) has recently been introduced as a model organism for studying the relationship between stress resistance and longevity in eusocial insects, as well as the role of epigenetics in behavior, aging and development. Several studies have recently been conducted to elucidate the developmental processes that give rise to its particular characters [22, 23]. However, the H. saltator bHLH proteins have not yet been studied or characterized in detail. The H. saltator genome was the first ant genome to be sequenced. The draft assembly, sequenced on the Illumina Genome Analyzer platform, was submitted by the Beijing Genomics Institute-Shenzhen in August 2010. It reached a scaffold N50 of ~600 kb and covered more than 90% of the genome.
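For readers unfamiliar with the scaffold N50 statistic quoted for the assembly, the following minimal sketch shows how it is commonly computed. The function name and the toy lengths in the usage note are illustrative assumptions, not data from the H. saltator assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one scaffold length")
```

For the toy lengths `[2, 3, 4, 5, 6, 7, 8, 9, 10]`, the total is 54 and the three longest scaffolds (10 + 9 + 8 = 27) reach half of it, so the N50 is 8.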
The comprehensive identification of bHLH proteins encoded in the H. saltator genome would facilitate experimental studies on the biological functions of bHLH proteins in the regulation of H. saltator development, as well as evolutionary analyses of the diversification of insect bHLH genes. In this study, tblastn searches against the H. saltator genome sequence database were conducted using the amino acid sequences of both the 59 Drosophila melanogaster bHLH (DmbHLH) motifs and the 45 representative bHLH families (Additional file) to retrieve candidate bHLH members. Subsequent examination and phylogenetic analysis enabled us to identify the putative full set of bHLH members encoded in H. saltator and to define orthologous families with sufficient confidence. These results should aid further investigations into the structure and function of bHLH proteins in the regulation of H. saltator development.
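As an illustration of how candidate loci might be collected from such searches, the sketch below parses tblastn hits and merges overlapping hits on the same scaffold into single candidate regions. It is not the procedure used in this study. It assumes the searches were saved in BLAST+ tabular format (`-outfmt 6`, with its 12 default columns), and the E-value cutoff, function names and sequence identifiers are hypothetical.

```python
from collections import defaultdict


def parse_hits(lines, max_evalue=1e-5):
    """Parse BLAST+ tabular lines, keeping hits with E-value <= max_evalue.

    Default columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    The cutoff is an assumed value, not one taken from the study.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        evalue = float(fields[10])
        if evalue > max_evalue:
            continue
        # On the minus strand sstart > send, so normalise the interval.
        start, end = sorted((int(fields[8]), int(fields[9])))
        hits.append((fields[1], start, end, fields[0], evalue))
    return hits


def merge_loci(hits):
    """Merge overlapping hits per scaffold into candidate loci."""
    by_scaffold = defaultdict(list)
    for scaffold, start, end, _query, _evalue in hits:
        by_scaffold[scaffold].append((start, end))
    loci = []
    for scaffold in sorted(by_scaffold):
        merged = []
        for start, end in sorted(by_scaffold[scaffold]):
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        loci.extend((scaffold, s, e) for s, e in merged)
    return loci
```

Because many query motifs hit the same genomic region, merging by coordinate gives one candidate per locus. Each candidate would still need the manual examination and phylogenetic analysis described above before being accepted as a bHLH member.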