Identification and characterization of novel human tissue-specific RFX transcription factors
© Aftab et al; licensee BioMed Central Ltd. 2008
Received: 01 April 2008
Accepted: 01 August 2008
Published: 01 August 2008
Five regulatory factor X (RFX) transcription factors (TFs), RFX1-5, have been previously characterized in the human genome. They have been shown to be critical for development and are associated with an expanding list of serious human disease conditions, including major histocompatibility complex (MHC) class II deficiency and ciliopathies.
In this study, we have identified two additional RFX genes–RFX6 and RFX7–in the current human genome sequences. Both RFX6 and RFX7 are demonstrated to be winged-helix TFs and have well conserved RFX DNA binding domains (DBDs), which are also found in winged-helix TFs RFX1-5. Phylogenetic analysis suggests that the RFX family in the human genome has undergone at least three gene duplications in evolution and the seven human RFX genes can be clearly categorized into three subgroups: (1) RFX1-3, (2) RFX4 and RFX6, and (3) RFX5 and RFX7. Our functional genomics analysis suggests that RFX6 and RFX7 have distinct expression profiles. RFX6 is expressed almost exclusively in the pancreatic islets, while RFX7 has high ubiquitous expression in nearly all tissues examined, particularly in various brain tissues.
The identification and further characterization of these two novel RFX genes hold promise for gaining critical insight into development and many disease conditions in mammals, potentially leading to identification of disease genes and biomarkers.
The regulatory factor X (RFX) gene family transcription factors (TFs) were first detected in mammals about 20 years ago as the regulatory factor that binds to a conserved cis-regulatory element called the X-box motif. The X-box motifs, which are typically 14-mer DNA sequences, were initially identified through alignment and inspection of the promoter regions of major histocompatibility complex (MHC) class II genes for conserved DNA elements [2, 3]. Further investigations revealed that the X-box motif is highly conserved in the promoter regions of various MHC class II genes. The first RFX gene (RFX1) was later characterized as a candidate MHC class II promoter binding protein. RFX1 was later found to function also as a transactivator of the hepatitis B virus enhancer. Subsequent studies revealed that RFX1 is not alone. Instead, it became the founding member of a novel family of homodimeric and heterodimeric DNA-binding proteins, which also includes RFX2 and RFX3. More members of this gene family were subsequently identified. A fourth RFX gene (RFX4) was discovered in human breast tumor tissue, and the fifth, RFX5, was identified as a DNA-binding regulatory factor that is mutated in primary MHC class II deficiency (bare lymphocyte syndrome, BLS). The identification of RFX1-5 and of RFX genes in other genomes, including those of the lower eukaryotes Saccharomyces cerevisiae and Schizosaccharomyces pombe and of the nematode Caenorhabditis elegans, helped to understand both the evolution of the RFX gene family and the DNA binding domains. Notably, while previous studies reported five RFX genes (RFX1-5) in human, only one RFX gene has been identified in most invertebrate animals and yeast. In contrast, the fruit fly (Drosophila melanogaster) genome has been found to have two RFX genes, dRFX and dRFX2.
All of these RFX genes encode transcription factors possessing a novel and highly conserved DNA binding domain (DBD) called the RFX DNA binding domain, the defining feature of all members of the RFX gene family, suggesting that these RFX TFs all bind to X-box motifs.
In addition to the defining DBD, most of these previously identified RFX genes also contain other conserved domains, including the B, C, and D domains. The D domain is also called the dimerization domain. The B and C domains also play a role in dimerization and are thus called the extended dimerization domains. Another important domain found in many members of the RFX family is the RFX activation domain (AD). For instance, RFX1 contains a well-defined AD. However, an AD is not found in many other members of the RFX family, including human RFX5 and C. elegans DAF-19. Outside of these conserved domains, RFX genes from different species, or even from the same species, show little similarity in other regions, which is consistent with their diverse functions and distinct expression profiles.
In humans, RFX1 is primarily found in the brain, with high expression in the cerebral cortex and Purkinje cells. RFX2 and RFX4 are heavily expressed in the testis. RFX4 is also expressed in the brain. RFX3 is expressed in ciliated cells and is required for growth and function of cilia in cells including pancreatic endocrine cells, ependymal cells, and neuronal cells. RFX3-deficient mice show left-right (L-R) asymmetry defects, developmental defects, diabetes, and congenital hydrocephalus. RFX5 is the most extensively studied RFX gene so far, primarily because it serves as a transcription activator of the clinically important MHC II genes and mediates formation of an enhanceosome, a complex containing RFXANK (also known as RFX-B), RFXAP, CREB, and CIITA. Mutation in any one of these complex members leads to bare lymphocyte syndrome (BLS). In C. elegans and S. cerevisiae only one copy of the RFX gene exists. In C. elegans it is called DAF-19 and in S. cerevisiae it is called Crt1. DAF-19 is involved in regulation of sensory neuron cilium formation, whereas Crt1 is involved in regulating DNA replication and damage checkpoint pathways [10, 12]. In D. melanogaster, two RFX genes have been identified, dRFX and dRFX2. dRFX is expressed in the spermatid and brain and is necessary for ciliated sensory neuron differentiation [14, 26]. dRFX2 has not been studied extensively and its function in Drosophila remains unclear; however, there is evidence suggesting that dRFX2 plays a role in the cell cycle of the eye imaginal discs.
In this project, we have identified and characterized two novel RFX genes in genomes of human and many other mammals, which have now been sequenced, annotated, and analyzed.
Results and discussions
Table: Names and protein IDs of representative RFX genes. Columns: accession number (RefSeq), ENSEMBL protein ID, number of exons, number of isoforms.
Because all known human RFX genes (RFX1-5) are well conserved and have been identified in other mammalian genomes, we hypothesized that orthologs of RFX6 and RFX7 also exist in other mammalian genomes. As expected, we retrieved all seven RFX genes in the genomes of five other mammalian species, namely chimpanzee (Pan troglodytes), monkey (Macaca mulatta), dog (Canis familiaris), mouse (Mus musculus), and rat (Rattus norvegicus), with one exception: in the rat genome, RFX2 was not found despite extensive searches (Additional file 1). Most identified RFX genes are expressed and their transcripts can be found in existing EST libraries. Interestingly, existing EST evidence suggests that RFX6 and RFX7, like RFX1, have no or very few alternative isoforms. In contrast, RFX2-4 usually have more alternative isoforms (Additional file 1).
To gain insight into the function of these two newly identified RFX genes, we explored the expression profiles of RFX6 and RFX7 and compared them to those of RFX1-5. We analyzed two independent datasets. First, we searched the dbEST database in GenBank (http://www.ncbi.nlm.nih.gov/dbEST/) to examine which EST libraries express transcripts of these RFX genes. The results indicate that the expression profiles of RFX1-5 match well with previously published data (see Introduction): RFX1 is found in many different tissue types including white blood cells, heart, eye, testis, and cancerous cells; RFX2 appears to be expressed in testis and brain; RFX3 appears to be expressed in the placenta and brain (i.e., medulla); RFX4, like RFX2, is found in testis, and also in the brain; and RFX5 expression has been observed in various tissues including thymus, T-cells, kidney, brain, and lymph. The consistency of the RFX1-5 expression data obtained from dbEST with previous observations suggests that dbEST provides good estimates of the expression profiles of RFX genes. Using the same method, we found that RFX6 is primarily expressed in pancreas, with minor expression in liver, while RFX7 is widely and heavily expressed in many different tissue types including kidney (tumor tissues), thymus, brain, and placenta.
Examining additional gene expression databases, including publicly available Genomics Institute of the Novartis Research Foundation (GNF) Gene Expression Database http://symatlas.gnf.org/SymAtlas/, revealed very similar results.
Our results show that we have identified two novel RFX genes in the human genome, RFX6 and RFX7, thus expanding the human RFX gene family from five members (RFX1-5) to seven members (RFX1-7). In addition to their possession of highly conserved DBDs, RFX6 and RFX7 show similarity to known human RFX TFs in their functional domains. In particular, RFX6 and RFX4 both have B, C, and D domains, while RFX7 and RFX5 have only DBDs. Studies carried out over the past 20 years have demonstrated that RFX1-5 are critical for development and many additional biological processes and play an important role in various devastating disease conditions. For example, RFX3-deficient mice show left-right (L-R) asymmetry defects, developmental defects, diabetes, and congenital hydrocephalus. RFX3 may regulate the transcription of many genes that, when mutated, cause cilia defects and many disease conditions collectively called ciliopathies. Many known ciliopathy genes, including Bardet-Biedl syndrome (BBS) genes, are well conserved, and the transcription of their C. elegans orthologs is regulated by the only RFX gene in C. elegans, DAF-19 [12, 39–41]. Mutation in any one of the RFX5 enhanceosome members (RFXANK, RFXAP, CREB, and CIITA) leads to bare lymphocyte syndrome (BLS). We hypothesize that RFX6 and RFX7 are as important as RFX1-5. The fact that RFX6 is primarily expressed in pancreatic tissues and is expressed at a low level compared to all other RFX genes (Figure 5) is particularly interesting. RFX6 may function as a key component of a transcriptional regulatory complex that regulates pancreas development and function.
Data source and data mining
Gene sets were obtained from the FTP site of the ENSEMBL database (http://www.ensembl.org/index.html). In this project, the genomes of six mammals were analyzed: human (Homo sapiens, NCBI36.44), chimpanzee (Pan troglodytes, CHIMP2.1.44), dog (Canis familiaris, BROADD2.44), monkey (Macaca mulatta, MMUL_1.44), mouse (Mus musculus, NCBIM36.44), and rat (Rattus norvegicus, RGSC3.4.44). DBD sequences in human RFX1-5 were manually identified and extracted to a file. The sequences were aligned using ClustalW. The alignment was used as input to the profile building program hmmbuild, which is part of the HMMER package (http://hmmer.janelia.org). The resulting profile was used to search the curated proteomes of the six mammals described above using hmmsearch, another program in the HMMER package.
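The idea of building a profile from aligned DBDs and scanning proteomes with it can be illustrated with a much simpler stand-in. The sketch below is not a profile HMM and is not what hmmbuild or hmmsearch compute: it builds a position-specific log-odds matrix with no insert or delete states, assumes a uniform amino-acid background, and uses made-up six-residue toy sequences rather than real RFX DBDs. The function names `build_profile` and `best_hit` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Return one dict of log-odds scores per column of a gap-free alignment."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    background = 1.0 / len(AMINO_ACIDS)  # uniform background: an assumption
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_hit(profile, protein):
    """Slide the profile along a protein; return (best_score, start_index)."""
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(protein) - width + 1):
        window = protein[start:start + width]
        # Unknown residues (e.g. X) contribute a neutral score of 0.
        score = sum(col.get(aa, 0.0) for col, aa in zip(profile, window))
        best = max(best, (score, start))
    return best


# Made-up toy "domain" sequences, for illustration only.
toy_alignment = ["KWLEEN", "KWLEDN", "RWLEEN"]
toy_profile = build_profile(toy_alignment)
toy_score, toy_start = best_hit(toy_profile, "MMSTKWLEENGGA")
```

A real search would rank every protein in a proteome by such a score and apply a significance threshold, which HMMER handles with E-values.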
Gene model improvement
All RFX genes except one, dog (Cfa) RFX7, show good alignment with their corresponding orthologs. The dog RFX7 gene is truncated at the N-terminus, missing 37 residues compared to other RFX7 genes. We attempted to use GeneWise (http://www.ebi.ac.uk/Wise2/) [44, 45] to remodel this RFX gene. Using human (Hsa) RFX7 as the reference protein sequence, GeneWise recovered the missing residues. However, the first codon so identified was not the typical Met, and extending the coding sequence upstream did not help. This is likely due to a sequencing error.
Protein domain analysis
We retrieved DBDs and ADs from RFX genes using InterProScan (version 4.3.1). To identify B, C, and D domains, we used the HMMER program as described above. Briefly, for HMMER searches, we used sequences of the B, C, and D domains from known RFX genes (RFX1-3) to generate a profile for each domain. We then searched for candidate B, C, and D domains in RFX6 and RFX7 using these profiles.
RFX interactome network analysis
Data were obtained from the HiMAP database (http://www.himap.org/) following the online search instructions. All types of interactions were selected for searching. All seven interactions between RFX6 and other genes (DAT1, DTX2, FHL3, SS18L1, CCNK, RFX2, and RFX3) were previously reported by Rual et al.
Sequence alignment and phylogenetic analysis
Multiple-sequence alignment was carried out using the program ClustalW (version 1.83). Phylogenetic tree construction was performed using PHYLIP (version 3.66; http://evolution.genetics.washington.edu/phylip.html). Briefly, a sequence alignment in PHYLIP format was first created using ClustalW and used as input for PHYLIP. The PHYLIP programs used were, in order, seqboot, protdist, neighbor, and consense. The phylogenetic tree file was visualized using TreeView (http://taxonomy.zoology.gla.ac.uk/rod/treeview.html).
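The first two stages of this pipeline can be sketched in a few lines. The example below is an illustration under stated assumptions, not a reimplementation of PHYLIP: `bootstrap_alignment` resamples alignment columns with replacement, which is the idea behind seqboot, and `p_distance` uses the plain fraction of differing sites as a simple stand-in for the model-based protein distances that protdist computes. Tree building (neighbor) and consensus (consense) are not shown. All function names and the toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement to make one bootstrap replicate."""
    names = list(alignment)
    length = len(alignment[names[0]])
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def p_distance(seq_a, seq_b):
    """Fraction of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(alignment):
    """Pairwise distances keyed by (name_a, name_b)."""
    names = sorted(alignment)
    return {(a, b): p_distance(alignment[a], alignment[b])
            for a in names for b in names}


# Made-up toy alignment, for illustration only.
toy = {"geneA": "ACDE", "geneB": "ACDF", "geneC": "A-DF"}
replicate = bootstrap_alignment(toy, random.Random(0))
matrix = distance_matrix(toy)
```

In a full analysis, one distance matrix would be computed per bootstrap replicate, one tree built from each matrix, and the resulting trees summarized into a consensus tree with branch support values.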
Expression profile of mammalian RFX genes using ESTs and SAGE libraries
The EST database from NCBI was used to perform tblastn. The queries used for this tblastn were RFX1-7 of H. sapiens, M. musculus, and R. norvegicus. Hits with identity greater than or equal to 95% were selected.
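The identity cutoff can be applied programmatically to tabular search results. The sketch below assumes the tblastn hits were exported in the 12-column tabular layout produced by BLAST+ with `-outfmt 6`; the paper does not state which output format was used, and the query and EST identifiers in the sample are invented placeholders. The function name `filter_hits` is hypothetical.

```python
import csv
import io

# Column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, min_identity=95.0):
    """Return hit rows whose percent identity is at least min_identity."""
    reader = csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t")
    kept = []
    for row in reader:
        if row["qseqid"].startswith("#"):  # skip comment lines, if present
            continue
        if float(row["pident"]) >= min_identity:
            kept.append(row)
    return kept


# Invented sample rows, for illustration only.
sample = (
    "RFX6_query\tEST_A\t98.50\t200\t3\t0\t1\t200\t10\t609\t1e-100\t400\n"
    "RFX6_query\tEST_B\t80.00\t150\t30\t0\t1\t150\t5\t454\t1e-20\t120\n"
    "RFX7_query\tEST_C\t95.00\t120\t6\t0\t1\t120\t7\t366\t1e-60\t250\n"
)
hits = filter_hits(io.StringIO(sample))
```

Grouping the retained hits by the tissue annotation of each EST library would then give a rough per-tissue expression profile for each query gene.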
This project was supported by an NSERC Discovery Award to NC. SA is supported by an NSERC USRA. LS is a Pacific Century Graduate Scholar. JSCC is supported by a Hemingway Nelson Architects Graduate Scholarship and a Weyerhaeuser Molecular Biology Graduate Scholarship. NC is also a Michael Smith Foundation for Health Research Scholar. We thank Drs. Robert Johnsen and Maja Tarailo for critical reading of the manuscript and insightful suggestions.
- Reith W, Satola S, Sanchez CH, Amaldi I, Lisowska-Grospierre B, Griscelli C, Hadam MR, Mach B: Congenital immunodeficiency with a regulatory defect in MHC class II gene expression lacks a specific HLA-DR promoter binding protein, RF-X. Cell. 1988, 53: 897-906.
- Dorn A, Durand B, Marfing C, Le Meur M, Benoist C, Mathis D: Conserved major histocompatibility complex class II boxes–X and Y–are transcriptional control elements and specifically bind nuclear proteins. Proc Natl Acad Sci U S A. 1987, 84: 6249-6253.
- Sherman PA, Basta PV, Ting JP: Upstream DNA sequences required for tissue-specific expression of the HLA-DR alpha gene. Proc Natl Acad Sci U S A. 1987, 84 (12): 4254-4258.
- Kara CJ, Glimcher LH: Regulation of MHC class II gene transcription. Curr Opin Immunol. 1991, 3: 16-21.
- Reith W, Barras E, Satola S, Kobr M, Reinhart D, Sanchez CH, Mach B: Cloning of the major histocompatibility complex class II promoter binding protein affected in a hereditary defect in class II gene regulation. Proc Natl Acad Sci U S A. 1989, 86 (11): 4200-4204.
- Siegrist CA, Durand B, Emery P, David E, Hearing P, Mach B, Reith W: RFX1 is identical to enhancer factor C and functions as a transactivator of the hepatitis B virus enhancer. Mol Cell Biol. 1993, 13: 6375-6384.
- Reith W, Ucla C, Barras E, Gaud A, Durand B, Herrero-Sanchez C, Kobr M, Mach B: RFX1, a transactivator of hepatitis B virus enhancer I, belongs to a novel family of homodimeric and heterodimeric DNA-binding proteins. Mol Cell Biol. 1994, 14: 1230-1244.
- Dotzlaw H, Alkhalaf M, Murphy LC: Characterization of estrogen receptor variant mRNAs from human breast cancers. Mol Endocrinol. 1992, 6 (5): 773-785.
- Steimle V, Durand B, Barras E, Zufferey M, Hadam MR, Mach B, Reith W: A novel DNA-binding regulatory factor is mutated in primary MHC class II deficiency (bare lymphocyte syndrome). Genes Dev. 1995, 9 (9): 1021-1032.
- Huang M, Zhou Z, Elledge SJ: The DNA replication and damage checkpoint pathways induce transcription by inhibition of the Crt1 repressor. Cell. 1998, 94 (5): 595-605.
- Wu SY, McLeod M: The sak1+ gene of Schizosaccharomyces pombe encodes an RFX family DNA-binding protein that positively regulates cyclic AMP-dependent protein kinase-mediated exit from the mitotic cell cycle. Mol Cell Biol. 1995, 15 (3): 1479-1488.
- Swoboda P, Adler HT, Thomas JH: The RFX-type transcription factor DAF-19 regulates sensory neuron cilium formation in C. elegans. Mol Cell. 2000, 5: 411-421.
- Emery P, Durand B, Mach B, Reith W: RFX proteins, a novel family of DNA binding proteins conserved in the eukaryotic kingdom. Nucleic Acids Res. 1996, 24 (5): 803-807.
- Dubruille R, Laurencon A, Vandaele C, Shishido E, Coulon-Bublex M, Swoboda P, Couble P, Kernan M, Durand B: Drosophila regulatory factor X is necessary for ciliated sensory neuron differentiation. Development. 2002, 129 (23): 5487-5498.
- Otsuki K, Hayashi Y, Kato M, Yoshida H, Yamaguchi M: Characterization of dRFX2, a novel RFX family protein in Drosophila. Nucleic Acids Res. 2004, 32 (18): 5636-5648.
- Katan-Khaykovich Y, Shaul Y: RFX1, a single DNA-binding protein with a split dimerization domain, generates alternative complexes. J Biol Chem. 1998, 273: 24504-24512.
- Ma K, Zheng S, Zuo Z: The transcription factor regulatory factor X1 increases the expression of neuronal glutamate transporter type 3. J Biol Chem. 2006, 281 (30): 21250-21255.
- Wolfe SA, van Wert J, Grimes SR: Transcription factor RFX2 is abundant in rat testis and enriched in nuclei of primary spermatocytes where it appears to be required for transcription of the testis-specific histone H1t gene. J Cell Biochem. 2006, 99 (3): 735-746.
- Morotomi-Yano K, Yano K, Saito H, Sun Z, Iwama A, Miki Y: Human regulatory factor X 4 (RFX4) is a testis-specific dimeric DNA-binding protein that cooperates with other human RFX members. J Biol Chem. 2002, 277 (1): 836-842.
- Blackshear PJ, Graves JP, Stumpo DJ, Cobos I, Rubenstein JL, Zeldin DC: Graded phenotypic response to partial and complete deficiency of a brain-specific transcript variant of the winged helix transcription factor RFX4. Development. 2003, 130 (19): 4539-4552.
- Ait-Lounis A, Baas D, Barras E, Benadiba C, Charollais A, Nlend Nlend R, Liegeois D, Meda P, Durand B, Reith W: Novel function of the ciliogenic transcription factor RFX3 in development of the endocrine pancreas. Diabetes. 2007, 56 (4): 950-959.
- Baas D, Meiniel A, Benadiba C, Bonnafe E, Meiniel O, Reith W, Durand B: A deficiency in RFX3 causes hydrocephalus associated with abnormal differentiation of ependymal cells. Eur J Neurosci. 2006, 24 (4): 1020-1030.
- Bonnafe E, Touka M, AitLounis A, Baas D, Barras E, Ucla C, Moreau A, Flamant F, Dubruille R, Couble P, et al: The transcription factor RFX3 directs nodal cilium development and left-right asymmetry specification. Mol Cell Biol. 2004, 24 (10): 4417-4427.
- Villard J, Peretti M, Masternak K, Barras E, Caretti G, Mantovani R, Reith W: A functionally essential domain of RFX5 mediates activation of major histocompatibility complex class II promoters by promoting cooperative binding between RFX and NF-Y. Mol Cell Biol. 2000, 20 (10): 3364-3376.
- Reith W, Mach B: The bare lymphocyte syndrome and the regulation of MHC expression. Annu Rev Immunol. 2001, 19: 331-373.
- Vandaele C, Coulon-Bublex M, Couble P, Durand B: Drosophila regulatory factor X is an embryonic type I sensory neuron marker also expressed in spermatids and in the brain of Drosophila. Mech Dev. 2001, 103 (1-2): 159-162.
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al: Initial sequencing and analysis of the human genome. Nature. 2001, 409: 860-921.
- Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, et al: The sequence of the human genome. Science. 2001, 291: 1304-1351.
- Durbin R, Eddy S, Krogh A, Mitchison G: Biological sequence analysis. probabilistic models of proteins and nucleic acids. 1998, Cambridge, United Kingdom: Cambridge University Press, 356-
- Wolberger C, Campbell R: New perch for the winged helix. Nat Struct Biol. 2000, 7 (4): 261-262.
- Gajiwala KS, Chen H, Cornille F, Roques BP, Reith W, Mach B, Burley SK: Structure of the winged-helix protein hRFX1 reveals a new mode of DNA binding. Nature. 2000, 403 (6772): 916-921.
- Mulder N, Apweiler R: InterPro and InterProScan: Tools for Protein Sequence Classification and Comparison. Methods Mol Biol. 2007, 396: 59-70.
- Katan-Khaykovich Y, Spiegel I, Shaul Y: The dimerization/repression domain of RFX1 is related to a conserved region of its yeast homologues Crt1 and Sak1: a new function for an ancient motif. J Mol Biol. 1999, 294: 121-137.
- Rual JF, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N, et al: Towards a proteome-scale map of the human protein-protein interaction network. Nature. 2005, 437 (7062): 1173-1178.
- Emery P, Strubin M, Hofmann K, Bucher P, Mach B, Reith W: A consensus motif in the RFX DNA binding domain and binding domain mutants with altered specificity. Mol Cell Biol. 1996, 16 (8): 4486-4494.
- Rodriguez-Tome P: Searching the dbEST database. Methods Mol Biol. 1997, 69: 269-283.
- Siddiqui AS, Khattra J, Delaney AD, Zhao Y, Astell C, Asano J, Babakaiff R, Barber S, Beland J, Bohacec S, et al: A mouse atlas of gene expression: large-scale digital gene-expression profiles from precisely defined developing C57BL/6J mouse tissues and cells. Proc Natl Acad Sci U S A. 2005, 102 (51): 18485-18490.
- Badano JL, Mitsuma N, Beales PL, Katsanis N: The Ciliopathies: An Emerging Class of Human Genetic Disorders. Annu Rev Genomics Hum Genet. 2006, 7: 125-148.
- Blacque OE, Perens EA, Boroevich KA, Inglis PN, Li C, Warner A, Khattra J, Holt RA, Ou G, Mah AK, et al: Functional genomics of the cilium, a sensory organelle. Curr Biol. 2005, 15 (10): 935-941.
- Chen N, Mah A, Blacque OE, Chu J, Phgora K, Bakhoum MW, Newbury CR, Khattra J, Chan S, Go A, et al: Identification of ciliary and ciliopathy genes in Caenorhabditis elegans through comparative genomics. Genome Biol. 2006, 7 (12): R126.
- Efimenko E, Bubb K, Mak HY, Holzman T, Leroux MR, Ruvkun G, Thomas JH, Swoboda P: Analysis of xbx genes in C. elegans. Development. 2005, 132: 1923-1934.
- Flicek P, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, et al: Ensembl 2008. Nucleic Acids Res. 2007
- Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG, Thompson JD: Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res. 2003, 31 (13): 3497-3500.
- Birney E, Clamp M, Durbin R: GeneWise and Genomewise. Genome Res. 2004, 14 (5): 988-995.
- Birney E, Durbin R: Using GeneWise in the Drosophila annotation experiment. Genome Res. 2000, 10 (4): 547-548.
- Rhodes DR, Tomlins SA, Varambally S, Mahavisno V, Barrette T, Kalyana-Sundaram S, Ghosh D, Pandey A, Chinnaiyan AM: Probabilistic model of the human protein-protein interaction network. Nat Biotechnol. 2005, 23: 951-959.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.