The evolution of Runx genes I. A comparative study of sequences from phylogenetically diverse model organisms
© Rennert et al; licensee BioMed Central Ltd. 2003
Received: 16 October 2002
Accepted: 24 March 2003
Published: 24 March 2003
Runx genes encode proteins defined by the highly conserved Runt DNA-binding domain. Studies of Runx genes and proteins in model organisms indicate that they are key transcriptional regulators of animal development. However, little is known about Runx gene evolution.
A phylogenetically broad sampling of publicly available Runx gene sequences was collected. In addition to the published sequences from mouse, sea urchin, Drosophila melanogaster and Caenorhabditis elegans, we collected several previously uncharacterised Runx sequences from public genome sequence databases. Among deuterostomes, mouse and pufferfish each contain three Runx genes, while the tunicate Ciona intestinalis and the sea urchin Strongylocentrotus purpuratus were each found to have only one Runx gene. Among protostomes, C. elegans has a single Runx gene, while Anopheles gambiae has three and D. melanogaster has four, including two genes that have not been previously described. Comparative sequence analysis reveals two highly conserved introns, one within and one just downstream of the Runt domain. All vertebrate Runx genes utilize two alternative promoters.
In the current public sequence database, the Runt domain is found only in bilaterians, suggesting that it may be a metazoan invention. Bilaterians appear to ancestrally contain a single Runx gene, suggesting that the multiple Runx genes in vertebrates and insects arose by independent duplication events within those respective lineages. At least two introns were present in the primordial bilaterian Runx gene. Alternative promoter usage arose prior to the duplication events that gave rise to three Runx genes in vertebrates.
Runx genes encode the sequence-specific DNA binding subunit of a heterodimeric transcription factor, the defining feature of which is the Runt domain, a highly conserved 128 amino acid sequence involved in DNA binding, heterodimerization, nucleotide binding, and nuclear localization [1, 2]. The Runt domain is named after the first member of the family to be discovered, the regulatory gene runt from Drosophila melanogaster. Runx genes have also been discovered and functionally characterized in mammals, sea urchins and nematodes, and in general are involved in the transcriptional control of developmental processes [3, 4]. In humans, mutations in each of the three Runx genes are associated with disease caused by defective control of cell proliferation and/or differentiation [4, 5]. Most studies of Runx gene function and regulation have been carried out in mammals and in D. melanogaster, each of which has multiple Runx genes. It is not currently known how the Runx gene family evolved, nor is it known how many Runx genes the first animal possessed. Answering these questions will facilitate identification of primitive (general) and derived (specialized) aspects of Runx gene structure, function and regulation.
Results and discussion
The collection of Runx gene sequences from phylogenetically diverse model organisms
Runx gene copy number
Two of the invertebrates in our collection with completely sequenced genomes, the protostome C. elegans and the deuterostome C. intestinalis, were each found to contain a single Runx gene. Screens of genomic libraries using the Runt domain sequence as probe indicate that the same is apparently true of the sea urchin S. purpuratus, although final verification of this awaits the completion of the sea urchin genome. In contrast, the model organisms in which most functional studies of Runx genes have been performed (i.e. D. melanogaster and mouse) each contain multiple Runx genes. Both of the vertebrate species surveyed (mouse and T. rubripes) were found to contain three Runx genes, as was the mosquito A. gambiae, whereas D. melanogaster contains four.
The simplest explanation of such a gene distribution is that possession of a single Runx gene is the primitive condition for bilaterians, i.e., that the multiplicity of Runx genes in the vertebrate and arthropod lineages resulted from independent gene duplications within those respective lineages. The results of a phylogenetic analysis of Runt domain sequences are consistent with this hypothesis (see below). Since fruit flies and mice each contain multiple Runx genes, it is probable that each of those genes has acquired specialized (e.g., region- or tissue-specific) functions that were derived subsequent to the duplications, and are hence peculiar to the taxonomic group to which each of those model organisms belongs. Experimental studies of invertebrates that contain only a single Runx gene (sea urchin, tunicate, and nematode) are therefore likely to highlight primitive (general) functions of this family of transcription factors in the control of cell fate, proliferation, and differentiation during metazoan development.
Exon and intron positions among Runx genes
The gross structural features of the collected Runx genes, including positions of exons and introns (excepting those upstream of the proximal promoter of the vertebrate Runx genes – see below), are depicted schematically in Figure 1 (see also Supplemental Table 2 and Supplemental Figure 1). To locate exon-intron positions in previously uncharacterised genes (or predicted genes) that have not yet been associated with cDNAs, we performed spliced alignment, maintaining maximum similarity between homologues, as described in Methods. As can be seen in Figure 1, all of the Runx genes contain a central, highly conserved exon, which encodes the C-terminal third of the Runt domain (black box in Fig. 1). The N-terminal end of this exon is bounded by an intron whose position is absolutely conserved among all Runx genes except DmRunt and AgRunt (which have apparently lost the intron), while the C-terminal end is bounded by an intron that is conserved among all deuterostome species in our collection, and shifted but a few nucleotides downstream in the insect genes and upstream in the C. elegans gene. The existence and locations of all of the other introns are variable, but some of them are characteristic of specific clades.
All of the chordate Runx genes (including CiRunt) contain an intron within the N-terminal half of the Runt domain, the position of which is invariant with respect to nucleotide sequence. This intron is missing in the other Runx genes, including the sea urchin representative (SpRunt), and is therefore likely to be a chordate-specific feature of Runx genes that arose prior to the Runx gene duplications in the vertebrate lineage. Upstream of the Runt domain, TrRunx1 is predicted to have two short introns that are not found in its mouse orthologue (MmRunx1) or in any of the other chordate Runx genes. The mammalian Runx genes have longer introns than their counterparts in Takifugu, as might be expected based on relative genome sizes. With the exception of DmRunt, all of the Drosophila Runx genes contain one (DmLozenge and DmRunxB) or two (DmRunxA) introns separating exons N-terminal to the Runt domain, but all of these have different positions, and were thus likely to have been incorporated subsequent to the divergence of these genes. The position of the first intron in DmLozenge is conserved in one of the mosquito genes (AgLozenge), indicating that these two genes are orthologues, which is confirmed by a phylogenetic analysis of sequences (see below). CeRun contains an intron within the N-terminal half of the Runt domain that is not found in any other Runx gene, and contains the largest number of exons of any of the genes in our collection. In general, the structure of the C. elegans gene is the most divergent of all of the Runx genes.
A variable number of exons are found 3' of the Runt domain among different species, and these reveal group-specific patterns. Except for the lozenge orthologues, which contain two such exons, all of the insect genes for which complete sequences are available contain only a single exon downstream of the Runt domain. The same is apparently true for the single sea urchin gene (although this is provisionally based on the fragmentary evidence of a single small BAC clone), suggesting that a single exon downstream of the Runt domain may be the primitive condition for bilaterians. In vertebrates (from the proximal promoter), the Runx3 orthologues each contain 2 exons downstream of the Runt domain, while the Runx2 orthologues contain 4 and the Runx1 orthologues contain 3. CiRunt contains 5 exons downstream of the Runt domain, while CeRun contains 4. In vertebrates at least, the multiplicity of downstream exons is reflected in a large variety of alternatively spliced transcripts that give rise to multiple protein isoforms that differ in the C-terminal sequences appended to the Runt domain [12, 13]. It is important to note that the C-terminal exon of all Runx genes identified to date encodes the amino acid sequence VWRPY (or the functionally equivalent IWRPF in the case of C. elegans), which acts as a recruitment motif for the co-repressor Groucho/TLE. This motif is at the C-terminus within the 3'-most exon of all of the genes, possibly with the exception of CiRunt (where the open reading frame apparently continues beyond the VWRPY in the genome sequence), and was used in our analysis to identify the C-terminal exons of previously uncharacterised Runx genes.
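The use of the VWRPY/IWRPF motif as a marker for the C-terminal exon can be illustrated with a minimal sketch. The function name and the toy peptide strings below are hypothetical and are not taken from the study; the sketch only checks whether a translated open reading frame contains one of the two motifs and whether that motif sits at the very C-terminus.

```python
# Motifs named in the text: VWRPY, or IWRPF in C. elegans.
MOTIFS = ("VWRPY", "IWRPF")


def find_c_terminal_motif(protein):
    """Locate the last occurrence of a Groucho/TLE recruitment motif.

    Returns (motif, index, at_c_terminus), or None if no motif is present.
    A trailing '*' (stop symbol in some translations) is ignored.
    """
    protein = protein.upper().rstrip("*")
    best = None
    for motif in MOTIFS:
        idx = protein.rfind(motif)
        if idx != -1 and (best is None or idx > best[1]):
            at_end = idx + len(motif) == len(protein)
            best = (motif, idx, at_end)
    return best


# Hypothetical toy peptides, for illustration only.
print(find_c_terminal_motif("MASNSVWRPY"))   # motif at the C-terminus
print(find_c_terminal_motif("MKIWRPFGG"))    # motif present, frame continues
print(find_c_terminal_motif("MKL"))          # no motif
```

A result with `at_c_terminus` equal to `False` corresponds to the situation described for CiRunt, where the open reading frame apparently continues beyond the motif.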
Runt domain sequences across phylogeny
The amino acid sequences of the Runt domains of each of the collected genes, together with two previously described sequences from the spider Cupiennius salei [15], a sequence from the crayfish Pacifastacus leniusculus, and one from the nematode Meloidogyne hapla, were aligned using Clustal W [16], as shown in Figure 2. Functionally, the Runt domain is required both for DNA binding and for interaction with its heterodimeric partner (the beta subunit), which serves to allosterically enhance the DNA binding of the Runt domain [1, 14]. Superposition of the alignment and the known crystal structure of mouse Runx1 [14, 17] reveals that residues that make either direct or indirect contact with DNA are invariant (marked by asterisks in Fig. 2). Sequence motifs that interact with the beta subunit are also highly conserved.
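As an illustration of how invariant positions can be read off a multiple sequence alignment, the following minimal sketch reports the columns in which every sequence carries the same residue. The function name and the three short aligned strings are hypothetical placeholders, not Runt domain sequences from this study.

```python
def invariant_columns(aligned):
    """Return 0-based indices of alignment columns that are identical
    in every sequence and are not gap characters ('-')."""
    seqs = list(aligned.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    return [
        i
        for i, column in enumerate(zip(*seqs))
        if len(set(column)) == 1 and column[0] != "-"
    ]


# Hypothetical toy alignment, for illustration only.
toy = {"seqA": "MKRV-", "seqB": "MKRVA", "seqC": "MRRV-"}
print(invariant_columns(toy))
```

Mapping such column indices onto residue numbers of a solved structure would be a separate step that this sketch does not attempt.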
Table 1. Inferences from phylogenetic analysis of Runt domain sequences*
Monophyly of deuterostome homologues?
Yes (71 % bootstrap support for the deepest branch)
Yes for vertebrates and urochordates (74%), ambiguous for echinoderms
Yes (93 % posterior probability)
Runx3 basal in vertebrates?
Deepest deuterostome branch is a single-copy gene?
Deepest protostome branch is a single-copy gene?
Unclear (unresolved topology of the nematode and spider paralogues)
Unclear (unresolved topology of the nematode and spider paralogues)
Unclear (unresolved topology of the nematode and spider paralogues)
Unclear (unresolved topology of the nematode and spider paralogues)
While the consensus of trees summarized in Table 1 is for the most part consistent with the expected monophyly of the protostome genes in our collection (nematodes, arthropods), the branching order is ambiguous. Among the insect representatives, each of the mosquito genes can be confidently assigned orthologues from Drosophila (Fig. 3), as was also suggested by intron positions (Fig. 1). Beyond that, low bootstrap values prohibit a confident assessment of evolutionary branching order of the arthropod Runx paralogues, and it is not clear whether they are monophyletic (although the position of the crayfish gene suggests a deep duplication event within the insect/crustacean clade at least). Resolution of the evolutionary relationships among the protostome Runx genes will require a broader sampling of Runx genes from arthropods as well as from protostomes in general.
The occurrence of three Runx genes in vertebrates, each on a different chromosome, is consistent with a scenario of multiple gene duplications, or two successive rounds of genome duplication that may have occurred near the base of the vertebrate branch of chordates. In contrast, the multiple Runx genes in Drosophila are all physically linked to runt on the X chromosome, and were almost certainly generated by local gene duplication events (for the location of the Drosophila genes, see FlyBase Genome Browser: http://www.bdgp.org/cgi-bin/annot/gbrowse). The two newly discovered Drosophila Runx genes both contain a complete open reading frame encoding the Runt domain (Fig. 2) and conserved exon-intron structure (Fig. 1 and Supplemental Fig. 1), suggesting that they are probably not pseudogenes. This is further supported by our finding that each of the three mosquito Runx genes has an orthologue in Drosophila (Figs. 1, 2, and 3).
Alternative promoter usage among vertebrate Runx genes
In the public databases, the Runt domain is currently found only in sequences from bilaterians. This suggests that it may be a metazoan (and possibly a bilaterian) invention, and that the genomic regulatory networks through which Runx genes control the fate, proliferation and differentiation of cells are unique to animals.
The primitive condition in bilaterians is most likely a single Runx gene, represented in the deuterostomes of our collection by the sea urchin S. purpuratus and the tunicate C. intestinalis, and in the protostomes of our collection by C. elegans. Runx gene duplications appear to have occurred independently in the lineages leading to vertebrates (which have at least three Runx genes) and insects (which have three or four Runx genes), suggesting that Runx genes in these latter organisms have probably acquired a number of specialized, taxon-specific regulatory functions (e.g., segmentation in arthropods, bone development in vertebrates, etc.). Thus, studies of Runx genes in sea urchins, tunicates, and nematodes are likely to highlight primitive, pan-bilaterian regulatory functions of this family of genes in the cell biology of animal development.
The ancestral bilaterian Runx gene was apparently assembled from three exons and contained two introns, one within the sequence that encodes the Runt domain, and a second that borders the sequence encoding the C-terminal end of the Runt domain.
The ancestor of all three vertebrate Runx genes utilized two alternative promoters that generate alternative N-terminal sequences.
Collection and assembly of previously uncharacterised Runx genes
The Ciona intestinalis Runx gene was assembled with sequence files obtained from the Trace Archive database. A "seed" sequence was found in C. intestinalis gDNA, Ti 119616831 (Supplemental Table 1), by a MegaBLAST search using nucleotide sequence encoding exon 3 of SpRunt in the NCBI Trace server site http://www.ncbi.nlm.nih.gov/blast/mmtrace.html. A translation of the seed sequence also showed significant similarity to the Runt domain. The seed sequence showing Runx homology was in turn used as the query to obtain overlapping sequences, which then replaced the query. This repetitive BLAST process was continued such that the seed sequence was extended 25 kb in both the 5' and 3' directions relative to the coding orientation of the seed. The intron position assignments were aided by an alignment of the acquired C. intestinalis Runx gene with partial cDNA sequences obtained from a TBLASTN search using the S. purpuratus SpRunt-1 protein sequence to query the C. intestinalis cDNA project site at http://ghost.zool.kyoto-u.ac.jp/indexr1.html. Introns in those regions of the gene upstream of the Runt domain, not found in any cDNA, were manually assigned by spliced alignment, maintaining maximum homologue similarity. Specifically, putative introns were located by searching for the termini of open reading frames, and then positioned in such a way that the resulting open reading frame had maximum similarity to that of a homologous gene while the intron-exon junctions had the expected consensus 5'-GT/AG-3' sequence. A known intron in a similar position in an orthologous gene was considered a validation of correct position assignment. Highly conserved sequence blocks or residues, such as the Runt domain or the C-terminal Groucho-binding site VWRPY, served as anchoring reference points. Multiple overlapping sequences in the C. intestinalis contig provided confidence in the final sequence assembly.
Close inspection of the sequence alignments showed no variation among high-confidence regions of the sequence, defined here as regions at least 40–50 nucleotides from the end of a read or with few unassigned base calls nearby. The complete absence of sequence variation, the multiple sequence coverage, and the high sequence quality all indicate that no additional Runx genes are present in the C. intestinalis genome, which was confirmed by inspection of the recently completed genome at http://genome.jgi-psf.org/ciona4/ciona4.home.html.
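The splice-junction part of the manual intron assignment described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the procedure actually used: it enumerates spans that begin with GT and end with AG and keeps those whose removal leaves an uninterrupted reading frame, whereas the study additionally required maximum similarity to a homologous gene, which is not modelled here. All function names and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def candidate_introns(seq, min_len=4, max_len=1000):
    """Yield half-open (start, end) spans that begin with GT and end with AG."""
    seq = seq.upper()
    for start in range(len(seq) - 1):
        if seq[start:start + 2] != "GT":
            continue
        last = min(len(seq), start + max_len)
        for end in range(start + min_len, last + 1):
            if seq[end - 2:end] == "AG":
                yield start, end


def is_open(cds):
    """True if cds is a whole number of codons with no in-frame stop codon."""
    if len(cds) % 3:
        return False
    return all(cds[i:i + 3] not in STOP_CODONS for i in range(0, len(cds), 3))


def spliced_open_frames(seq, **limits):
    """Return (start, end, spliced) for each GT...AG span whose removal
    leaves an open reading frame in frame 0."""
    seq = seq.upper()
    hits = []
    for start, end in candidate_introns(seq, **limits):
        spliced = seq[:start] + seq[end:]
        if is_open(spliced):
            hits.append((start, end, spliced))
    return hits


# Hypothetical toy sequence: exon ATGGCC, intron GTAAATAG, exon CCATAC.
toy_gene = "ATGGCC" + "GTAAATAG" + "CCATAC"
print(spliced_open_frames(toy_gene))
```

In practice each surviving candidate would then be ranked by the similarity of its translation to a homologue, with conserved blocks such as the Runt domain serving as anchors, as the text describes.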
Three Runt domain genes in the A. gambiae genome and two previously unidentified Runt domain genes in the D. melanogaster genome were found by TBLASTN search using the S. purpuratus Runt protein sequence as the query. The newly identified Drosophila Runx gene sequences (CG1379 and CG15455) are present in a single sequence file (see Supplemental Table 1) with opposite coding orientations, and were named RunxA (CG1379) and RunxB (CG15455) for this study. No cDNAs specific for the mosquito Runx genes or for the Drosophila RunxA or RunxB genes were found despite an extensive search, so putative intron positions were manually located in these genes by spliced alignment as described above. Unlike in the deuterostome genes, the positions of introns among the insect genes were not conserved to the nucleotide; they were thus assigned with less confidence and should be considered provisional.
Three Runx genes were identified in the recently completed T. rubripes genome sequence by a TBLASTN search using the S. purpuratus Runt protein sequence as the query sequence at the Fugu BLAST Server of the UK Human Genome Mapping Project Resource Centre http://fugu.hgmp.mrc.ac.uk/blast/. cDNA sequences from the zebrafish (D. rerio) were used to place the start of the coding region in the distal promoter of TrRunx1 and TrRunx3, and cDNA sequences from the teleost O. latipes were used to place the start of the coding region in the distal promoter of TrRunx2 (Supplemental Table 1). The promoter structure of TrRunx2 published by Eggers et al. [11] during the course of our study was in agreement with our results.
In addition to the gene sequences, two published partial cDNA sequences from a spider (Cupiennius salei) and an EST from a nematode (Meloidogyne hapla) were obtained from the NCBI database and used in the multiple sequence alignment. Since the nematode sequence and one of the spider sequences (run-2) contain only part of the Runt domain, these were not used in the phylogenetic analysis.
Multiple sequence alignments and phylogenetic tree construction
The alignment of the amino acid sequences of all Runt domains used in this study was performed using the modified Clustal W [16] program in the AlignX® module of Vector NTI (InforMax, Inc.). The trees were calculated using programs from the PHYLIP package [Felsenstein, J. 1993–2002. PHYLIP (Phylogeny Inference Package) version 3.6a3. Distributed by the author. Department of Genetics, University of Washington, Seattle] or by the MrBayes program [18]. Specifically, 100 bootstrap replicates of the alignment were constructed using the SEQBOOT program, and the distances between sequences were computed by the PROTDIST program using the PAM substitution model. Neighbour joining trees were built using the NEIGHBOR program, and the consensus tree was derived using the CONSENSE program. For the maximum-likelihood trees, the experimental versions of the programs PROML (no assumption of molecular clock) and PROMLK (molecular clock assumed) from PHYLIP version 3.6a3 were used with the JTT evolutionary model and with the assumption of a constant rate of change between sites. The maximum parsimony trees were constructed using the PROTPARS program.
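The bootstrap step can be illustrated with a minimal sketch of column resampling, the general idea behind generating bootstrap replicates of an alignment. This is not the SEQBOOT implementation and does not reproduce its file formats or options; the function names, the fixed random seed, and the toy alignment are assumptions made for illustration.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one pseudo-replicate by sampling alignment columns
    with replacement; every sequence receives the same columns."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("aligned sequences must all have the same length")
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def bootstrap_replicates(alignment, n=100, seed=0):
    """Return n pseudo-replicate alignments (100 were used in the text)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [bootstrap_alignment(alignment, rng) for _ in range(n)]


# Hypothetical toy alignment, for illustration only.
toy = {"seqA": "MKRV", "seqB": "MRRV"}
replicates = bootstrap_replicates(toy, n=100, seed=1)
print(len(replicates), replicates[0])
```

Each replicate would then be passed through distance calculation and tree building, and the frequency with which a grouping recurs across the replicate trees gives its bootstrap support.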
This research was made possible by a Stowers Institute Functional Genomics Fellowship awarded to J.R., and was funded entirely by the Stowers Institute for Medical Research. We thank Dr. Galina Glazko (Pennsylvania State University) for assistance with the phylogenetic analysis, and anonymous reviewers for providing a number of constructive criticisms that improved the manuscript.
1. Kagoshima H, Shigesada K, Satake M, Ito Y, Miyoshi H, Ohki M, Pepling M, Gergen P: The Runt domain identifies a new family of heteromeric transcriptional regulators. Trends Genet. 1993, 9: 338-341. 10.1016/0168-9525(93)90026-E.
2. Crute BE, Lewis AF, Wu Z, Bushweller JH, Speck NA: Biochemical and biophysical properties of the core-binding factor alpha2 (AML1) DNA-binding domain. J Biol Chem. 1996, 271: 26251-26260. 10.1074/jbc.271.42.26251.
3. Wheeler JC, Shigesada K, Gergen JP, Ito Y: Mechanisms of transcriptional regulation by Runt domain proteins. Semin Cell Dev Biol. 2000, 11: 369-375. 10.1006/scdb.2000.0184.
4. Coffman JA: Runx transcription factors and the developmental balance between cell proliferation and differentiation. Cell Biology International. 2003, In Press.
5. Lund AH, van Lohuizen M: RUNX: a trilogy of cancer genes. Cancer Cell. 2002, 1: 213-215. 10.1016/S1535-6108(02)00049-1.
6. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410. 10.1006/jmbi.1990.9999.
7. Canon J, Banerjee U: Runt and Lozenge function in Drosophila development. Semin Cell Dev Biol. 2000, 11: 327-336. 10.1006/scdb.2000.0185.
8. Nam S, Jin YH, Li QL, Lee KY, Jeong GB, Ito Y, Lee J, Bae SC: Expression pattern, regulation, and biological role of runt domain transcription factor, run, in Caenorhabditis elegans. Mol Cell Biol. 2002, 22: 547-554. 10.1128/MCB.22.2.547-554.2002.
9. Coffman JA, Kirchhamer CV, Harrington MG, Davidson EH: SpRunt-1, a new member of the runt domain family of transcription factors, is a positive regulator of the aboral ectoderm-specific CyIIIA gene in sea urchin embryos. Dev Biol. 1996, 174: 43-54. 10.1006/dbio.1996.0050.
10. Robertson AJ, Dickey CE, McCarthy JM, Coffman JA: The expression of SpRunt during sea urchin embryogenesis. Mech Dev. 2002, 117: 327-330. 10.1016/S0925-4773(02)00201-0.
11. Eggers JH, Stock M, Fliegauf M, Vonderstrass B, Otto F: Genomic characterization of the RUNX2 gene of Fugu rubripes. Gene. 2002, 291: 159-167. 10.1016/S0378-1119(02)00592-9.
12. Levanon D, Glusman G, Bangsow T, Ben-Asher E, Male DA, Avidan N, Bangsow C, Hattori M, Taylor TD, Taudien S, Blechschmidt K, Shimizu N, Rosenthal A, Sakaki Y, Lancet D, Groner Y: Architecture and anatomy of the genomic locus encoding the human leukemia-associated transcription factor RUNX1/AML1. Gene. 2001, 262: 23-33. 10.1016/S0378-1119(00)00532-1.
13. Bangsow C, Rubins N, Glusman G, Bernstein Y, Negreanu V, Goldenberg D, Lotem J, Ben-Asher E, Lancet D, Levanon D, Groner Y: The RUNX3 gene--sequence, structure and regulated expression. Gene. 2001, 279: 221-232. 10.1016/S0378-1119(01)00760-0.
14. Tahirov TH, Inoue-Bungo T, Morii H, Fujikawa A, Sasaki M, Kimura K, Shiina M, Sato K, Kumasaka T, Yamamoto M, Ishii S, Ogata K: Structural analyses of DNA recognition by the AML1/Runx-1 Runt domain and its allosteric control by CBFbeta. Cell. 2001, 104: 755-767.
15. Damen WG, Weller M, Tautz D: Expression patterns of hairy, even-skipped, and runt in the spider Cupiennius salei imply that these genes were segmentation genes in a basal arthropod. Proc Natl Acad Sci U S A. 2000, 97: 4515-4519. 10.1073/pnas.97.9.4515.
16. Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22: 4673-4680.
17. Backstrom S, Wolf-Watz M, Grundstrom C, Hard T, Grundstrom T, Sauer U: The RUNX1 Runt Domain at 1.25A Resolution: A Structural Switch and Specifically Bound Chloride Ions Modulate DNA Binding. J Mol Biol. 2002, 322: 259. 10.1016/S0022-2836(02)00702-7.
18. Huelsenbeck JP, Ronquist F: MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics. 2001, 17: 754-755. 10.1093/bioinformatics/17.8.754.
19. Holland PW, Garcia-Fernandez J, Williams NA, Sidow A: Gene duplications and the origins of vertebrate development. Dev Suppl. 1994, 125-133.
20. Komori T, Yagi H, Nomura S, Yamaguchi A, Sasaki K, Deguchi K, Shimizu Y, Bronson RT, Gao YH, Inada M, Sato M, Okamoto R, Kitamura Y, Yoshiki S, Kishimoto T: Targeted disruption of Cbfa1 results in a complete lack of bone formation owing to maturational arrest of osteoblasts. Cell. 1997, 89: 755-764.
This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.