The architecture of mammalian ribosomal protein promoters
BMC Evolutionary Biology volume 5, Article number: 15 (2005)
Mammalian ribosomes contain 79 different proteins encoded by widely scattered single copy genes. Coordinate expression of these genes at transcriptional and post-transcriptional levels is required to ensure a roughly equimolar accumulation of ribosomal proteins. To date, detailed studies of only a very few ribosomal protein (rp) promoters have been made. To elucidate the general features of rp promoter architecture, I made a detailed sequence comparison of the promoter regions of the entire set of orthologous human and mouse rp genes.
A striking evolutionarily conserved feature of most rp genes is the separation by an intron of the sequences involved in transcriptional and translational regulation from the sequences with protein encoding function. Another conserved feature is the polypyrimidine initiator, which conforms to the consensus (Y)2C+1TY(T)2(Y)3. At least 60 % of the rp promoters contain a largely conserved TATA box or A/T-rich motif, which should theoretically have TBP-binding capability. A remarkably high proportion of the promoters contain conserved binding sites for transcription factors that were previously implicated in rp gene expression, namely upstream GABP and Sp1 sites and downstream YY1 sites. Over 80 % of human and mouse rp genes contain a transposable element residue within 900 bp of 5' flanking sequence; very little sequence identity between human and mouse orthologues was evident more than 200 bp upstream of the transcriptional start point.
This analysis has provided some valuable insights into the general architecture of mammalian rp promoters and has identified parameters that might coordinately regulate the transcriptional activity of certain subsets of rp genes.
Ribosomes are vital organelles, which catalyze protein synthesis in all living organisms. Eukaryotic ribosomes consist of four RNA molecules (rRNAs) and 79 different proteins. The mammalian genes encoding the rRNAs are multicopy and clustered at a few loci, whereas those encoding the ribosomal proteins (rp genes) are single copy and scattered throughout the genome. In addition to the functional rp genes, all of which contain introns, mammalian genomes contain many nonfunctional intronless rp pseudogenes. The earliest determinations of mouse rp gene sequences and of transcriptional start points (tsp's) revealed a salient feature of rp genes, namely that transcription is initiated at a C residue within a polypyrimidine tract [3–5]. A recent study by Kenmochi and coworkers has demonstrated that this is a general property of virtually all human rp genes. Because of this novel feature, the rp mRNAs contain a 5' terminal oligopyrimidine sequence (TOP), which is essential for their translational control.
Coordinated expression of the rp genes at transcriptional and post-transcriptional levels is required to ensure a roughly equimolar accumulation of ribosomal proteins. Transcriptional run-on measurements with nuclei of rapidly proliferating cells indicated equivalent loading of RNA polymerases on three unlinked mouse rp genes, consistent with the equal abundance of the corresponding rp mRNAs. Moreover, the promoters of these genes were of comparable strength in driving the expression of a common reporter gene. These results suggested that similar promoter strength and mRNA processing efficiency might provide a basis for the coordinated expression of rp genes. However, whether this concept applies to all rp genes or to distinctive subsets of genes is presently unclear.
Despite the obvious importance of rp gene expression for cell viability, there have been very few experimental studies of rp promoter architecture and transcriptional regulation in higher eukaryotes. To date, efforts to identify functionally relevant cis-acting regulatory elements and transcription factor binding sites have been made for only 9 mammalian [8, 10–23] and 2 amphibian [24, 25] rp genes, less than 15 % of the total rp gene complement. These studies identified regulatory elements in the promoter-proximal regions, both upstream and downstream of the tsp. Some of these elements contained binding sites for known transcription factors, notably GABP, Sp1 and YY1. When the binding of any individual factor was eliminated by a site-specific mutation, transcriptional activity was reduced, but not abolished, indicating that the overall transcription efficiency is determined by a combinatorial effect of multiple factors. No regulatory element common to all rp genes was found, although certain elements were present in several of the genes that were studied. None of these rp genes contained a canonical TATA box in the -25 to -30 region, although some had a "TATA-like" A/T-rich sequence, which might bind the general transcription factor TBP under certain circumstances.
Because the previous experimental studies were limited to a small subset of rp genes, it is important to know whether the results of those studies can be used to predict special features of promoter architecture that characterize the entire set of rp genes or that could be used to sort the rp genes into classes with similar features. This applies not only to transcription factor-binding sites, but also to the polypyrimidine tract that spans the tsp. As an initiator element, this tract is atypical because the tsp of initiators is normally an adenine residue, flanked on both sides by pyrimidines.
To address these issues, I have applied the principle of "phylogenetic footprinting", which holds that important regulatory sequences will have a tendency to be evolutionarily conserved and thus revealed by a sequence comparison of corresponding regions of orthologous genes. Some regulatory promoter elements may escape detection by this approach, but a substantial majority will probably be recognized with confidence. Although previously done on a small scale for a few selected rp genes [19, 21, 29–32], such an analysis has not heretofore been made for the entire rp gene population. I have therefore compared the promoter regions of all 79 orthologous human and mouse rp genes and have extended the comparisons to include chicken, amphibian and fish counterparts when these sequences were available. I have compared the sequence organization of the rp promoters with that of the promoters of non-rp TOP genes and other genes in the housekeeping category. This analysis has provided some useful insights into the general architecture of mammalian rp promoters and has identified parameters that might coordinately regulate the transcriptional activity of certain subsets of rp genes.
Compilation of the promoter sequences of orthologous rp genes
To make a study of the evolutionarily conserved features of ribosomal protein gene promoters, I first had to assemble a set of orthologous human and mouse rp gene sequences, in which the transcription start sites are reasonably well defined. For all of the human rp genes, annotated sequences are now available owing to the studies of Kato, Kenmochi and coworkers [6, 33], who systematically determined the 5' termini of human rp mRNAs by the oligo-capping method. While these studies indicated some small variations in the exact 5' ends of individual rp mRNAs, the tsp's could usually be specified to within a few nucleotides. Independent determinations of the human tsp's by the oligo-capping procedure are also available on the database of transcription start sites (DBTSS) compiled by Sugano and colleagues. The agreement between the two sets of oligo-capping data and the results of primer extension/nuclease protection (PE/S1) experiments is generally very good [see Additional file 1]. In all cases, the human tsp's selected in the present study correspond either to the most abundant oligo-capped cDNA or to an observed variant.
In contrast to the situation with the human rp genes, accurate determinations of the tsp's of mouse rp genes have been made in only a few cases, and therefore the annotations of rp genes in the mouse genome databases are generally incomplete. To overcome this problem, I used the mVISTA alignment program to compare mouse sequences extending several kb upstream of the coding regions with the sequences of the corresponding human orthologues. Fortunately, the strong conservation of sequences in the proximal promoter regions and in the first exons enabled me to identify the probable tsp's of most mouse rp genes with confidence. For the most part, the tsp's that were predicted by this strategy corresponded exactly to those previously determined by primer extension and/or nuclease protection assays of murine rp genes [see Additional file 1]. Some information on putative tsp's of mouse rp genes is available on the DBTSS database. However, the most prominent of these tsp's did not agree with the PE/S1 data. Moreover, they were not coincident with the aligned C tsp's of the human orthologues, but were frequently at adjacent or nearby T residues. Conceivably, these discrepancies are caused by a technical bias, as is sometimes encountered in the generation of genome-wide full-length cDNA libraries. Therefore, relying on maximum sequence conservation and the best agreement with previous experimental data, I compiled aligned sequences of all 79 pairs of orthologous human and mouse rp genes, and used these sequences for further analysis of the promoter regions.
In a preliminary analysis, I used the mVISTA sequence alignment program to examine six to ten kb of 5' flanking sequence of three pairs of orthologous rp genes (rpS16, rpL30 and rpL32). These genes were selected because the mouse orthologues have been previously studied extensively in my laboratory. Scans with a 50 bp window indicated that the conserved (> 75 % identity) flanking sequences are largely confined to regions within a few hundred base pairs of the tsp. There was very little alignment of the upstream region sequences and many gaps caused by the presence of numerous insertion elements. With this analysis, I detected only one short block of conserved sequence at -2 kb in rpL30 (77 % identity over 84 bp), one short block at -1.5 kb in rpL32 (82 % identity over 60 bp), and none in rpS16. Neither of the conserved blocks contained any recognizable transcription factor binding sites or other remarkable characteristics. The lack of any credible long-range regulatory elements is consistent with earlier conclusions based on transient transfection experiments with these genes [8, 17, 37]. Given these results, I restricted the analysis of the full set of rp genes to the transcribed portions of the genes and about one kilobase of 5' flanking sequence [see Additional file 2].
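The windowed scan described above can be illustrated with a minimal sketch. It assumes a pairwise alignment is already available as two equal-length strings with gaps written as '-'; the function names are hypothetical, and the scoring is a plain identity count, which may differ in detail from what mVISTA computes internally.

```python
def windowed_identity(aln_a, aln_b, window=50):
    """Percent identity for every window position along an aligned pair.

    Gap columns never count as matches, so insertion elements lower the score.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for start in range(len(aln_a) - window + 1):
        pairs = zip(aln_a[start:start + window], aln_b[start:start + window])
        matches = sum(1 for x, y in pairs if x == y and x != "-")
        scores.append(100.0 * matches / window)
    return scores


def conserved_windows(aln_a, aln_b, window=50, threshold=75.0):
    """Start offsets of windows whose identity is strictly above the threshold."""
    scores = windowed_identity(aln_a, aln_b, window)
    return [i for i, score in enumerate(scores) if score > threshold]
```

With the defaults, a window is reported only if more than 75 % of its 50 columns are identical, matching the cutoff quoted in the text.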
An initial analysis of the large data set revealed the presence of repetitive sequence insertion elements (sines for human and B1/B2 for mouse) in the 5' flanks of a very high proportion of the rp genes [see Additional file 2]. Indeed, as seen in Figure 1, half of the rp genes contain such an element within 500 bp of the tsp and over 80 % contain elements within 900 bp. The distribution is very similar in the human and mouse rp genes. Both ancestral elements, which are moderately conserved between mouse and human, and lineage-specific elements could be identified. As in the case of the three examples discussed above, very little sequence conservation was evident more than 200 bp upstream of the tsp. Accordingly, for the refined analysis of rp promoter architecture, I analyzed segments extending from 200 bp upstream to 100 bp downstream of the tsp. These segments were scrutinized for the quality of TATA box motifs in the -25 region, for conserved sequences in the initiator region, for transcription factor binding sites, and for the location of the AUG translation start codons.
Criteria for the annotation of rp promoter sequences
To evaluate the quality of TATA box motifs, I established criteria based on rules derived from a crystallographic analysis of TBP-DNA complexes. With these rules one can classify each nucleotide in the motif as being "preferred" or "acceptable" or "incompatible" with TBP binding (Figure 2a). I considered motifs with a string of 6 or more compatible nucleotides, of which at least 5 are "preferred", as being capable of binding TBP with high affinity (+ motifs). Those with a string of 6 compatible nucleotides, 3 or 4 of which are "preferred", were considered as possible low affinity sites (+/- motifs). Sequences that do not satisfy these criteria were judged to be incapable of unaided binding to TBP (- motifs).
Employing stringent criteria (Figure 2b), I used the rVISTA program to search systematically for conserved (aligned) sites that would be predicted to bind the three transcription factors (GABP, Sp1, and YY1) that were identified in earlier experimental studies of rp promoters and also for highly conserved sites that might bind other ubiquitously expressed factors. In addition, I scanned both the human and mouse rp promoter sequences for unaligned optimal sites for the above-mentioned factors and for a motif termed Box A, which was previously implicated in rp gene expression. I used the results of these analyses to complete the annotation of all 79 pairs of aligned human and mouse rp promoter sequences, which can be viewed individually on pages 1 through 80 of a supplementary file [see Additional file 3].
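The three-way scoring rule can be written down as a short sketch. It assumes each nucleotide of the candidate motif has already been labelled 'P' (preferred), 'A' (acceptable) or 'I' (incompatible) by the position-specific rules of Figure 2a, which are not reproduced here. It also adopts one particular reading of the criteria: the best 6-nucleotide window with no incompatible position decides the class. The function name and the label encoding are hypothetical.

```python
def classify_tata(labels):
    """Score a candidate motif as '+', '+/-' or '-'.

    labels: string over 'P' (preferred), 'A' (acceptable), 'I' (incompatible),
    one character per nucleotide of the motif.
    """
    if set(labels) - set("PAI"):
        raise ValueError("labels may contain only 'P', 'A' and 'I'")
    best = -1  # highest 'preferred' count in any fully compatible 6-mer
    for start in range(len(labels) - 5):
        window = labels[start:start + 6]
        if "I" not in window:
            best = max(best, window.count("P"))
    if best >= 5:
        return "+"    # predicted high-affinity TBP site
    if best >= 3:
        return "+/-"  # possible low-affinity site
    return "-"        # not expected to bind TBP unaided
```

Under this reading, a 7-nucleotide compatible run with five preferred positions that are not all within one 6-mer would score '+/-' rather than '+'; a run-based reading would differ on such edge cases.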
Annotated comparisons of human/mouse promoter sequences
Illustrative examples of four annotated promoter sequence comparisons are shown in Figure 3. All four of these examples illustrate the strong sequence conservation of the non coding portions of exon 1, which is a general feature of the entire rp gene family [see Additional file 3]. The AUG translation initiation codon is located within exon 1 of rpL13a (panel d) and within exon 2 of rpL30 (panel b), whereas in rpS18 (panel a) and rpS4 (panel c), it is at the extreme 3' end of exon 1. Many rp genes have this latter feature, which may be relevant to the evolution of vertebrate rp promoters (see below).
The rpS18 promoter contains a conserved TATA (+) motif, while the TATA motifs of rpS4 and rpL13a were scored as (+/-) and that of rpL30 as (-). Well-conserved (aligned) consensus binding sites for GABP and/or Sp1 are located upstream of the tsp in all four examples; downstream YY1 sites are evident in rpL30 and rpL13a. The rpS18 and rpL13a promoters also contain conserved consensus binding sites for the ubiquitous AP1/ATF factors. The promoter elements that were found to have functional significance in experimental studies of the mouse rpL30 gene are all conserved in the human orthologue (panel b). This includes sites for RFX1 and the Gamma Factor as well as the GABP and YY1 sites. In addition to the conserved sites, a few non-conserved (unaligned) GABP and Sp1 sites of uncertain relevance are evident in rpL30 and rpL13a.
The results of this type of analysis for the full set of rp promoters are tabulated in Table 1. The identity between human and mouse sequences in the 300 bp segment (-200 to +100) that was analyzed ranged from 50 to 75 % with an average of 61 %. The location of the translation initiation codon was conserved in all except one rp gene (rpL29). In 29 % of the rp genes the AUG codon is at the extreme 3' end of exon 1 and in 47 % it is in exon 2. Thus, in 3/4 of the rp genes, the genetic elements involved in transcriptional/translational regulation are spatially separated from those with protein encoding function.
Contrary to previous impressions based on an incomplete set of rp genes, the rp promoters cannot generally be classified as "TATA-less". Thirty-five percent of the promoters contain, in the -25 region, a TATA box that should theoretically bind TBP with high affinity. An additional 25 % have an A/T-rich tract in this region, which might bind TBP with lower affinity, and the remainder would not be predicted to bind TBP without help from other proteins. For the most part, the TATA box quality is well conserved between the two mammalian species: the assessed quality of human and mouse TATA motifs was the same in 82 % of the rp promoters.
The prevalence in rp promoters of evolutionarily conserved GABP, Sp1 and YY1 binding sites is readily apparent from these results. Conserved upstream GABP and Sp1 sites are present in 54 % and 48 % of the rp promoters, respectively. Conserved downstream YY1 sites are present in 52 % of the rp promoters. The ratio of aligned to unaligned sites in the human rp promoters is approximately 2:1 for GABP, 1:1 for Sp1 and 4:1 for YY1. The occurrence of unaligned sites in the mouse rp promoters is similar to that in the human rp promoters [see Additional file 3]. It is noteworthy that 76 % of the rp promoters contain at least one conserved upstream GABP and/or Sp1 site within 200 bp of the tsp. If unaligned sites are included, the proportion of human rp genes with an upstream GABP and/or Sp1 site is 92 %. Conserved consensus motifs for other ubiquitous transcription factors are present in a much lower proportion of the rp promoters, namely 12 % for the Box A-binding factor, 11 % for AP1, 10 % for ATF/CREB and 8 % for Nrf1.
The rp gene initiator
The existence of a novel polypyrimidine initiator sequence in which the conventional A residue at the tsp is replaced by a C residue is well known. Moreover, the roles of this sequence in rp gene transcription and rp mRNA translation have been previously demonstrated experimentally for a few rp genes [7, 40–42]. What is not known is whether there is a consensus initiator sequence that characterizes the entire rp gene set. To address this issue, I used the pairwise alignments of human and mouse orthologues to produce an occupancy matrix for positions -8 to +10 of the initiator [see Additional file 4]. With this matrix and the standard consensus rules, I determined that the rp consensus initiator sequence is (Y)2C+1TY(T)2(Y)3. A striking result is that T's are strongly preferred over C's at positions +2, +4 and +5. This preference, which is graphically illustrated in Figure 4, might be related to a transcriptional function of the initiator or a translational function of the TOP or to both functions.
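As a minimal sketch of how a candidate initiator could be checked against this consensus, the pattern can be expanded position by position, assuming it spans -2 to +8 with no position 0 (two Y, C at +1, T at +2, Y at +3, T at +4 and +5, then three Y). The names `CONSENSUS` and `conformance` are hypothetical, and the score is a simple fraction of matching positions rather than the occupancy-matrix procedure used to derive the consensus.

```python
# Allowed bases at positions -2, -1, +1, +2, ..., +8; Y means C or T.
CONSENSUS = ["CT", "CT", "C", "T", "CT", "T", "T", "CT", "CT", "CT"]


def conformance(site):
    """Fraction of the ten consensus positions that `site` satisfies.

    site: the 10 nucleotides from -2 to +8, with the tsp as the third character.
    """
    site = site.upper()
    if len(site) != len(CONSENSUS):
        raise ValueError("expected exactly 10 nucleotides (-2 to +8)")
    hits = sum(1 for base, allowed in zip(site, CONSENSUS) if base in allowed)
    return hits / len(CONSENSUS)
```

A site with an A at the tsp but pyrimidines elsewhere would score 0.9 on this scale, which shows why a positional score alone does not capture how central the +1 C is; weighting positions differently is a design choice left open here.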
The extent of conservation of rp promoter features in non-mammalian vertebrates
To determine the features of rp promoter architecture that are conserved over large evolutionary periods, I compared the promoter sequences of five chicken, six amphibian and five fish rp genes to their mammalian orthologues using the Clustal W multiple sequence alignment program. An example of such an alignment for rpS3 (Figure 5) shows strong conservation of both the coding and non-coding portions of the first exon, the TATA box motif, the initiator sequence, and a downstream YY1 site that spans the translation initiation codon. In contrast, the overlapping GABP and Sp1 sites, which are aligned in the human and mouse promoters, are not conserved in the amphibian and fish promoters. However, each of these promoters contains an unaligned consensus site for GABP or Sp1. A summary of the multi-sequence analysis for 11 rp genes (Table 2) indicates that the extent of conservation observed for rpS3 is fairly typical, although some small variations are evident. The conformance to the consensus initiator sequence is the same (90 %) for mammals and lower vertebrates. Yet, the particular C residue within the initiator that is used as the major tsp may differ slightly. Also the adjudged quality of the TATA box for TBP binding is not always the same. Nevertheless, except for the location of upstream transcription factor-binding sites, the general features of rp promoter architecture are usually well conserved over large evolutionary distances.
Comparison of rp promoter features with those of non-rp TOP genes and other housekeeping genes
It is of interest to know which features of rp promoter architecture are specific and which might be common to other ubiquitously expressed genes. To this end, I examined the promoters of two additional sets of genes. One set consisted of non-rp genes that also produce translationally controlled mRNAs with 5'-terminal oligopyrimidine tracts (non-rp TOP genes). At present, there are nine known genes in this category for which the tsp of at least one orthologue has been experimentally determined [see Additional file 5]. The non-rp TOP set contains genes encoding translation elongation factors (eEF1A1, eEF1B and eEF2), RNA-binding proteins (PABPc1 and hnRNPA1), a major nucleolar protein (nucleophosmin/B23), a protein with tubulin-binding properties (TCTP/p23), and two genes that do not encode proteins, but rather have small nucleolar RNA-encoding sequences embedded within their introns (gas5 and U17HG). Annotated aligned sequences of the nine human and mouse non-rp TOP genes are presented on pages 1 through 9 of a supplementary file [see Additional file 6] and the results are summarized in Table 3.
There are some notable differences in the promoter architecture of rp genes and non-rp TOP genes. First, GABP- and YY1-binding motifs, which are prevalent in the rp promoters, are rarely found in the non-rp TOP promoters (Table 4a). Second, eight of the nine non-rp TOP promoters have conserved TATA boxes that would be expected to bind TBP with high affinity, whereas only a third of the rp promoters have such TATA boxes (Table 4b). These differences were considered to be statistically significant when analyzed by Fisher's exact test (Table 4). The initiator sequences of the non-rp TOP genes resemble the rp consensus (89 % identical from -2 to +7) except for the roughly equal occurrence of C's and T's at +2.
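For readers who want to see how such a comparison is computed, the following is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table, using only the standard library. The function name is hypothetical. The example counts in the comment are an assumption: 8 of 9 non-rp TOP promoters with a (+) TATA box is stated in the text, but 28 of 79 rp promoters is only an approximation of the reported 35 %, and the exact counts in Table 4 may differ.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Unnormalised hypergeometric weights, kept as exact integers so the
    # "no more likely than observed" comparison has no rounding issues.
    weights = {k: comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(lo, hi + 1)}
    observed = weights[a]
    extreme = sum(w for w in weights.values() if w <= observed)
    return extreme / comb(n, row1)


# Illustrative call with assumed counts (see lead-in):
# rows = non-rp TOP, rp; columns = TATA (+), not TATA (+)
p_value = fisher_exact_two_sided(8, 1, 28, 51)
```

With these assumed counts the p-value falls well below 0.05, in line with the statement that the difference in TATA box prevalence is significant.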
The second set of ubiquitously expressed genes consisted of 20 housekeeping genes randomly selected from the eukaryotic promoter database [see Additional file 7]. The promoters of these genes also have an under-representation of YY1 sites compared to the rp promoters, and like both the rp and non-rp TOP genes, contain abundant motifs for Sp1 (Table 4a). The proportion of these genes that have TATA boxes with TBP-binding capability is similar to that observed for the rp genes (Table 4b).
The foregoing analysis of 79 mammalian ribosomal protein genes has revealed several features of rp promoter architecture, some of which are largely conserved over long periods of vertebrate evolution (about 450 million years) and others that are strongly conserved only in mammals (about 90 million years). One highly conserved feature, present in over 3/4 of the rp genes, is the separation by an intron of the sequences involved in transcriptional and/or translational regulation from the sequences with protein-encoding function. In 47 % of the rp genes, the AUG translation initiation codon is in exon 2, and in 29 % of the genes, it is at the extreme 3' end of exon 1. It would seem that at an early stage of vertebrate evolution, these regulatory sequences were appended as discrete units to the loci containing the protein-encoding sequences.
The polypyrimidine tract that spans the tsp is present in all vertebrate rp genes. This tract can function as a transcriptional initiator [40, 41] and also embraces the TOP sequence, which is essential for the translational control of rp mRNAs [7, 42]. Based on the assignments of human and mouse tsp's used in the present study, the average lengths of the polypyrimidine tracts and TOP sequences are 12.2 bp and 8.2 bp, respectively. A compilation of conserved sequences in the -8 to +10 regions of orthologous human and mouse rp genes revealed the consensus sequence (Y)2C+1TY(T)2(Y)3. Thus, in addition to the C at the tsp, there is a clear preference for T over C at positions +2, +4, and +5. This preference may reflect a structural bias for transcription, e.g., ease of strand separation, or for translation, e.g., affinity of rp mRNAs for a putative repressor.
The presence of transposable element residues in the 5' flanks of the rp genes is noteworthy. Half of the rp genes contain an element (sines for humans, B1/B2 for mouse) within 500 bp of the tsp, and over 80 % of the genes contain an element within 900 bp. Some elements are moderately conserved between mouse and human, but most appear to be lineage-specific. These elements are unlikely to have any specific role in rp promoter function. They may be passively tolerated because the vast majority of conserved 5' sequence is confined to within 200 bp of the tsp. Within a segment from -200 to +100, the sequence identity between human and mouse rp orthologues ranges from 50 to 75 % with an average of 61 %, whereas the sequence match beyond -200 is of borderline significance.
The observation that 35 % of the rp promoters contain a TATA box motif at -25, which would be predicted to bind TBP with high affinity, and that an additional 25 % possess A/T-rich motifs, which might bind TBP with lower affinity, was unexpected. The assessed quality of these motifs for TBP binding, made according to rules established by a detailed structural study of TBP-DNA complexes, was largely conserved between human and mouse rp orthologues. Some of the promoters classified as poor (-) binders, e.g., rpL32, might bind TBP weakly without help from an additional protein [43, 44], so that the true proportion of rp promoters with TBP-binding capability might actually be greater than 60 %. Thus, contrary to earlier views based on an analysis of a small subset of rp genes, many of the rp promoters should not be classified in the "TATA-less" category.
When human and mouse rp promoter regions from -200 to +100 were scanned for conserved (aligned) transcription factor-binding sites with the rVISTA program, which is based on consensus sequences and matrix tables in the TRANSFAC database, three ubiquitously expressed factors that had previously been implicated in rp promoter activity predominated. Using high stringency criteria, I detected aligned GABP- and Sp1-binding sites upstream of the tsp and aligned YY1-binding sites downstream of the tsp in approximately half of the rp promoters. The occurrence of aligned motifs for other ubiquitously expressed factors is considerably less, i.e., only about 10 % for any single factor. Whereas Sp1 sites are also commonly found in the promoters of many housekeeping genes, including the non-rp TOP genes, the prevalence of GABP and YY1 sites appears to be a more prominent feature of the rp promoters.
GABP is a heteromeric protein consisting of an ETS family member, which has DNA-binding capability, and an ankyrin repeat-containing subunit, which greatly improves the stability of GABP-DNA interactions [45, 46]. Two-thirds of the human rp promoters contain one or more potential GABP-binding sites upstream of the tsp, 79 % of which are perfectly aligned in the orthologous mouse rp promoters. Previous experimental studies have implicated GABP as a positive transcriptional regulator of the mouse rpL30 and rpL32 genes [10, 14], the human rpS14 and rpS6 genes [20, 22], and the Xenopus rpL18 gene. The GABP sites that positively contribute to transcriptional activity are generally located upstream of the tsp, although in some cases, sites overlapping the initiator or downstream of the tsp may also be relevant. In the mouse rpS16 gene, GABP binding at the initiator decreases transcription activity both in vivo and in vitro. For simplicity, only upstream, presumably positively acting, GABP sites were included in the tabulation.
YY1 is a zinc finger-containing protein with a variety of gene-specific functions, including transcriptional activation and repression, positioning of RNA polymerase II, and chromatin modification [47, 48]. Fifty-nine percent of the human rp genes contain at least one YY1 site downstream of the tsp, 88 % of which are conserved in the mouse orthologues. Two functionally relevant binding sites for YY1 (originally termed "delta") were detected downstream of the tsp in mouse rpL32. In mouse rpL30, a downstream YY1 interaction also contributed positively to transcriptional activity, but only when a strong upstream interaction with GABP was eliminated. Interestingly, the vast majority of YY1 sites are located downstream of the tsp, and therefore, for simplicity, only downstream YY1 sites were included in the tabulation. The alignment of mammalian YY1 sites is frequently preserved in chickens, frogs and fish, in contrast to the GABP and Sp1 sites, which rarely have aligned counterparts in those species (Table 2). The aligned downstream site in Xenopus rpL18 was shown to interact with the frog homologue of YY1, but the functional significance of this interaction was not demonstrable with the reporter constructs that were used in these experiments. The diverse activities of YY1 in different promoter contexts and its propensity for interactions with a wide variety of proteins have led to the idea that it may have multiple mechanistic roles in transcriptional regulation. Which of these roles apply to the rp promoters remains to be established.
Based on the results of the analysis of rp promoter sequences, I have sorted the different rp promoters into eight classes according to whether they possess conserved binding sites for the three prevalent transcription factors, GABP, Sp1 and YY1 (Figure 6). Only 10 of the rp promoters do not appear to contain conserved binding sites for any of the three factors, and considering the high stringency used for the analysis, the true number may even be lower. Moreover, 8 of these 10 promoters contain non-conserved GABP, Sp1 or YY1 sites and/or conserved sites for other ubiquitously expressed factors. TATA boxes with predicted affinity for TBP are distributed among the promoters of all eight classes. This classification might prove useful for interpreting the results of experiments in which cell-specific or physiologically induced variations in the expression of different subsets of rp genes are observed. In addition, when more extensive data on the relative transcription rates of rp genes or the relative abundance of rp mRNAs become available, this classification might help account for any as yet undetected variability.
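The eight classes follow from the presence or absence of a conserved site for each of the three factors (2 x 2 x 2 combinations). A minimal sketch of that bookkeeping is given below; the function names and the input format (a mapping from promoter name to the set of factors with a conserved site) are hypothetical, and no entries from Table 1 or Figure 6 are reproduced.

```python
from itertools import product

FACTORS = ("GABP", "Sp1", "YY1")

# Every presence/absence combination for the three factors: 8 classes.
ALL_CLASSES = list(product((True, False), repeat=len(FACTORS)))


def promoter_class(sites):
    """Presence/absence tuple, in FACTORS order, for one promoter.

    sites: set of factor names with a conserved (aligned) binding site.
    """
    return tuple(factor in sites for factor in FACTORS)


def group_by_class(annotations):
    """Sort promoters into the eight classes; empty classes are kept."""
    groups = {cls: [] for cls in ALL_CLASSES}
    for name, sites in annotations.items():
        groups[promoter_class(sites)].append(name)
    return groups
```

Keeping empty classes in the result makes it easy to see at a glance which combinations of factors are not represented in a given gene set.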
It is worth noting that some ribosomal proteins can also have extraribosomal functions [50, 51]. Among the mammalian proteins that have been demonstrated or presumed to have additional functions are rpS3, rpS4, rpL5, rpL7, rpL10, rpL13a, rpLP0 and rpLP2. While, collectively, the promoters of the genes encoding these proteins do not fall into any particular class, most contain conserved binding sites for one or more of the three prevalent transcription factors and have TATA boxes with TBP-binding capability. Recently, the protein RACK1/Asc1p, which had previously been implicated in various signal transduction processes, was shown to have the properties of an authentic 40S ribosomal protein. When I analyzed the promoter structures of the orthologous human (NM_006098) and mouse (NM_008143) genes that encode this protein, I observed several features in common with the rp promoters. The tsp of the mammalian RACK1 gene is embedded in a polypyrimidine tract that conforms perfectly to the rp initiator consensus sequence. Moreover, the promoter contains a TATA box of +/- quality, an aligned upstream binding motif for Sp1 and an aligned downstream motif for YY1, but no readily detectable motifs for other ubiquitous transcription factors. Interestingly, the features of this promoter resemble those of the rpL13a gene, which also encodes a protein with apparent pleiotropic function.
This analysis has highlighted features of rp promoter architecture that are shared by a high proportion of the rp genes. The evolutionary conservation of these features lends strong support to their functional relevance. Yet, superimposed on this general design are variations that confer certain idiosyncratic characteristics on each promoter. There does not seem to be a single master switch that co-regulates all rp genes at the transcriptional level. Rather, the rp promoters are tuned to respond to a combination of factors, including components of the general transcription machinery, a relatively small group of sequence-specific transcription factors, and modifiers of chromatin structure. The inherent functional redundancy and lack of dependence on any single factor are useful design features for genes that must be expressed in a broad spectrum of cell types and environmental situations.
A sequence comparison of the promoter regions of all 79 orthologous human and mouse ribosomal protein genes has revealed several evolutionarily conserved features that are characteristic of a high proportion of the rp gene set. One such feature, which is also evident in the rp genes of lower vertebrates, is the separation by an intron of the sequences involved in transcriptional and translational regulation from the sequences with protein encoding function. Another conserved feature is the polypyrimidine initiator, which in mammals conforms to the consensus (Y)2C+1TY(T)2(Y)3. Contrary to previous impressions based on studies of a small subset of rp genes, the majority of rp promoters contain a TATA box or an A/T-rich motif at -25 that should theoretically have TBP-binding capability. Similarly, approximately half of the rp promoters contain conserved binding motifs for transcription factors previously implicated in rp gene expression, namely upstream GABP and Sp1 sites and downstream YY1 sites. Conserved motifs for other ubiquitous factors occurred much less frequently. Transposable element residues within 900 bp of 5'-flanking sequence were present in over 80 % of the rp genes; very little sequence conservation was evident more than 200 bp upstream of the tsp. Some of these architectural features were seen to be specific for rp promoters. From the results of this analysis, it was possible to sort the rp promoters into eight classes according to their possession of putative binding sites for GABP, Sp1 and YY1, and also to specify which promoters should have intrinsic affinity for TBP. This classification might prove useful for interpreting the results of experiments in which cell-specific or physiologically induced variations in the expression of different subsets of rp genes are observed.
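The eight classes follow directly from the presence or absence of each of three putative binding sites (2 × 2 × 2 = 8). The sketch below enumerates these combinations. The label format and the function name are hypothetical and do not reproduce the class names used in this paper.

```python
from itertools import product

# The three factors used for classification in the text.
FACTORS = ("GABP", "Sp1", "YY1")


def promoter_class(sites: set[str]) -> str:
    """Build an illustrative class label from the set of factors with a putative site.

    Hypothetical labelling scheme: '+' means a conserved site is present,
    '-' means it is absent.
    """
    return " ".join(f"{factor}{'+' if factor in sites else '-'}" for factor in FACTORS)


# Enumerate every presence/absence combination: 2**3 = 8 classes.
ALL_CLASSES = [
    promoter_class({f for f, present in zip(FACTORS, flags) if present})
    for flags in product((True, False), repeat=len(FACTORS))
]

if __name__ == "__main__":
    for label in ALL_CLASSES:
        print(label)
```

A promoter with conserved Sp1 and YY1 sites but no GABP site would, under this illustrative scheme, be labelled by `promoter_class({"Sp1", "YY1"})`.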
The rp gene sequences were extracted from three database sources [see Additional file 2]. The vast majority of sequences were obtained from the UCSC database [53], which conveniently uses uppercase and lowercase letters to distinguish exon and flanking/intron sequences, respectively. The remaining sequences were obtained from the ncbi [54], Celera [55] and the recently available RPG [56] databases. For alignment of human and mouse rp promoter sequences I used mVISTA [57], which is based on the AVID global alignment program [58]. The locations of repetitive insertion sequence elements were determined by the RepeatMasker program supplied with mVISTA and, in many cases, corroborated by ncbi annotations. Transcription factor-binding sites were detected with the rVISTA program and the FindPatterns tool of the GCG program. The analysis of non-rp TOP genes was made similarly to that of the rp genes. A graphic representation of the rp initiator consensus sequence was obtained with the Weblogo program [59]. For alignment and viewing of three or more orthologous rp genes, I used the ClustalW program [60] and SeqVu shareware 1.0.1 (Garvan Institute, Sydney, Australia).
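The uppercase/lowercase convention mentioned above lends itself to simple parsing. The following minimal sketch splits a mixed-case sequence into exon (uppercase) and flanking/intron (lowercase) runs. It is not part of the original analysis; the function name and segment labels are assumptions, and any non-letter character would be grouped with the lowercase runs.

```python
from itertools import groupby


def split_by_case(seq: str) -> list[tuple[str, int, str]]:
    """Split a mixed-case sequence into (label, 0-based start, text) segments.

    Hypothetical helper: uppercase runs are labelled 'exon' and all other
    runs 'flank_or_intron', following the case convention described in the text.
    """
    segments = []
    position = 0
    for is_upper, run in groupby(seq, key=str.isupper):
        text = "".join(run)
        segments.append(("exon" if is_upper else "flank_or_intron", position, text))
        position += len(text)
    return segments


if __name__ == "__main__":
    # Made-up sequence: 5' flank, first exon, intron, start of second exon.
    for label, start, text in split_by_case("acgtACGGtttaCC"):
        print(label, start, text)
```

With segments of this kind, the 5' flanking region and the first exon of each gene can be isolated before alignment.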
tsp: transcriptional start point
TOP: terminal oligopyrimidine sequence
Uechi T, Tanaka T, Kenmochi N: A complete map of the human ribosomal protein genes: assignment of 80 genes to the cytogenetic map and implications for human disorders. Genomics. 2001, 72: 223-230. 10.1006/geno.2000.6470.
Zhang Z, Harrison P, Gerstein M: Identification and analysis of over 2000 ribosomal protein pseudogenes in the human genome. Genome Res. 2002, 12: 1466-1482. 10.1101/gr.331902.
Dudov KP, Perry RP: The gene family encoding the mouse ribosomal protein-L32 contains a uniquely expressed intron-containing gene and an unmutated processed gene. Cell. 1984, 37: 457-468. 10.1016/0092-8674(84)90376-3.
Wiedmann LM, Perry RP: Characterization of the expressed gene and several processed pseudogenes for the mouse ribosomal protein L30 gene family. Mol Cell Biol. 1984, 4: 2518-2528.
Wagner M, Perry RP: Characterization of the multigene family encoding the mouse S16 ribosomal protein: strategy for distinguishing an expressed gene from its processed pseudogene counterparts by an analysis of total genomic DNA. Mol Cell Biol. 1985, 5: 3560-3576.
Yoshihama M, Uechi T, Asakawa S, Kawasaki K, Kato S, Higa S, Maeda N, Minoshima S, Tanaka T, Shimizu N: The human ribosomal protein genes: Sequencing and comparative analysis of 73 genes. Genome Res. 2002, 12: 379-390. 10.1101/gr.214202.
Meyuhas O: Synthesis of the translational apparatus is regulated at the translational level. Eur J Biochem. 2000, 267: 6321-6330. 10.1046/j.1432-1327.2000.01719.x.
Hariharan N, Kelley DE, Perry RP: Equipotent mouse ribosomal protein promoters have a similar architecture that includes internal sequence elements. Genes Dev. 1989, 3: 1789-1800.
Meyuhas O, Perry RP: Construction and identification of cDNA clones for mouse ribosomal proteins: application for the study of r-protein gene expression. Gene. 1980, 10: 113-129. 10.1016/0378-1119(80)90129-8.
Genuario RR, Kelley DE, Perry RP: Comparative utilization of transcription factor GABP by the promoters of ribosomal protein genes rpL30 and rpL32. Gene Expr. 1993, 3: 279-288.
Safrany G, Perry RP: The relative contributions of various transcription factors to the overall promoter strength of the mouse ribosomal protein L30 gene. Eur J Biochem. 1995, 230: 1066-1072. 10.1111/j.1432-1033.1995.tb20657.x.
Genuario RR, Perry RP: The GA-binding protein can serve as both an activator and repressor of ribosomal protein gene transcription. J Biol Chem. 1996, 271: 4388-4395. 10.1074/jbc.271.8.4388.
Harris SA, Dudov KP, Bowman LH: Comparison of the mouse L32 ribosomal protein promoter elements in mouse myoblasts, fibers, and L cells. J Cell Biochem. 1992, 50: 178-189.
Yoganathan T, Bhat NK, Sells BH: A positive regulator of the ribosomal protein gene, beta factor, belongs to the ETS oncoprotein family. Biochem J. 1992, 287: 349-353.
Chung S, Perry RP: The importance of downstream delta-factor binding elements for the activity of the RPL32 promoter. Nucleic Acids Res. 1993, 21: 3301-3308.
Curcic D, Glibetic M, Larson DE, Sells BH: GA-binding protein is involved in altered expression of ribosomal protein L32 gene. J Cell Biochem. 1997, 65: 287-307.
Hariharan N, Perry RP: A characterization of the elements comprising the promoter of the mouse ribosomal protein gene RPS16. Nucleic Acids Res. 1989, 17: 5323-5337.
Meyuhas O, Klein A: The mouse ribosomal protein L7 gene. J Biol Chem. 1990, 265: 11465-11473.
Colombo P, Fried M: Functional elements of the ribosomal protein L7a (rpL7a) gene promoter region and their conservation between mammals and birds. Nucleic Acids Res. 1992, 20: 3367-3373.
Overman PF, Rhoads DD, Tasheva ES, Pyle MM, Roufa DJ: Multiple regulatory elements ensure accurate transcription of a human ribosomal protein gene. Somat Cell Mol Genet. 1993, 19: 347-362. 10.1007/BF01232747.
Davies B, Fried M: The L19 ribosomal protein gene (RPL19): Gene organization, chromosomal mapping, and novel promoter region. Genomics. 1995, 25: 372-380. 10.1016/0888-7543(95)80036-L.
Antoine M, Kiefer P: Functional characterization of transcriptional regulatory elements in the upstream region and intron 1 of the human S6 ribosomal protein gene. Biochem J. 1998, 336: 327-335.
Kirn-Safran CB, Dayal S, Martin-DeLeon PA, Carson DD: Cloning, expression, and chromosome mapping of the murine Hip/Rpl29 gene. Genomics. 2000, 68: 210-219. 10.1006/geno.2000.6283.
Marchioni M, Morabito S, Salvati AL, Beccari E, Carnevali F: XrpFI, an amphibian transcription factor composed of multiple polypeptides immunologically related to the GA-binding protein α and β subunits, is differentially expressed during Xenopus laevis development. Mol Cell Biol. 1993, 13: 6479-6489.
DeRinaldis E, Pisaneschi G, Camacho-Vanegas O, Beccari E: The binding sites for Xenopus laevis FIII/YY1 in the first exon of L1 and L14 ribosomal protein genes are dispensable for promoter expression. Eur J Biochem. 1998, 255: 563-569. 10.1046/j.1432-1327.1998.2550563.x.
Smale ST, Kadonaga JT: The RNA polymerase II core promoter. Annu Rev Biochem. 2003, 72: 449-479. 10.1146/annurev.biochem.72.121801.161520.
Duret L, Bucher P: Searching for regulatory elements in human noncoding sequences. Curr Opin Struct Biol. 1997, 7: 399-406. 10.1016/S0959-440X(97)80058-9.
Dermitzakis ET, Clark AG: Evolution of transcription factor binding sites in mammalian gene regulatory regions: Conservation and turnover. Mol Biol Evol. 2002, 19: 1114-1117.
Zinn AR, Alagappan RK, Brown LG, Wool I, Page DC: Structure and function of ribosomal protein S4 genes on the human and mouse sex chromosomes. Mol Cell Biol. 1994, 14: 2485-2492.
Pata I, Metspalu A: Structural characterization of the mouse ribosomal protein S6-encoding gene. Gene. 1996, 175: 241-245. 10.1016/0378-1119(96)00157-6.
Rebane A, Tamme R, Laan M, Pata I, Metspalu A: A novel snoRNA (U73) is encoded within the introns of the human and mouse ribosomal protein S3a genes. Gene. 1998, 210: 255-263. 10.1016/S0378-1119(98)00070-5.
Higa S, Yoshihama M, Tanaka T, Kenmochi N: Gene organization and sequence of the region containing the ribosomal protein genes RPL13A and RPS11 in the human genome and conserved features in the mouse genome. Gene. 1999, 240: 371-377. 10.1016/S0378-1119(99)00429-1.
Kato S, Sekine S, Oh SW, Kim NS, Umezawa Y, Abe N, Yokoyama-Kobayashi M, Aoki T: Construction of a human full-length cDNA bank. Gene. 1994, 150: 243-250. 10.1016/0378-1119(94)90433-2.
Database of Transcriptional Start Sites. [http://dbtss.hgc.jp]
Mayor C, Brudno M, Schwartz JR, Poliakov A, Rubin EM, Frazer KA, Pachter LS, Dubchak I: VISTA: visualizing global DNA sequence alignments of arbitrary length. Bioinformatics. 2000, 16: 1046-1047. 10.1093/bioinformatics/16.11.1046.
Sugahara Y, Carninci P, Itoh M, Shibata K, Konno H, Endo T, Muramatsu M, Hayashizaki Y: Comparative evaluation of 5'-end-sequence quality of clones in CAP trapper and other full-length-cDNA libraries. Gene. 2001, 263: 93-102. 10.1016/S0378-1119(00)00557-6.
Moura-Neto R, Dudov KP, Perry RP: An element downstream of the cap site is required for transcription of the gene encoding mouse ribosomal protein L32. Proc Natl Acad Sci USA. 1989, 86: 3997-4001.
Patikoglou GA, Kim JL, Sun L, Yang SH, Kodadek T, Burley SK: TATA element recognition by the TATA box-binding protein has been conserved throughout evolution. Genes Dev. 1999, 13: 3217-3230. 10.1101/gad.13.24.3217.
Loots G, Ovcharenko I, Pachter L, Dubchak I, Rubin E: rVISTA for comparative sequence-based discovery of functional transcription factor binding sites. Genome Res. 2002, 12: 832-839. 10.1101/gr.225502. Article published online before print in April 2002.
Hariharan N, Perry RP: Functional dissection of a mouse ribosomal protein promoter: significance of the polypyrimidine initiator and an element in the TATA-box region. Proc Natl Acad Sci USA. 1990, 87: 1526-1530.
Chung S, Perry RP: Cell-free transcription of a mouse ribosomal-protein-encoding gene: the effects of promoter mutations. Gene. 1991, 100: 173-180. 10.1016/0378-1119(91)90363-G.
Levy S, Avni D, Hariharan N, Perry RP, Meyuhas O: Oligopyrimidine tract at the 5' end of mammalian ribosomal protein messenger RNAs is required for their translational control. Proc Natl Acad Sci USA. 1991, 88: 3319-3323.
Wiley SR, Kraus RJ, Mertz JE: Functional binding of the "TATA" box binding component of transcription factor TFIID to the -30 region of TATA-less promoters. Proc Natl Acad Sci USA. 1992, 89: 5814-5818.
Yoganathan T, Horikoshi M, Roeder RG, Sells BH: Direct binding of yeast transcription factor (TFIID) to the ribosomal protein L32 (rpL32) TATA-less promoter sequence. FEBS Lett. 1993, 326: 163-166. 10.1016/0014-5793(93)81783-V.
Thompson CC, Brown TA, McKnight SL: Convergence of Ets- and notch-related structural motifs in a heteromeric DNA binding complex. Science. 1991, 253: 762-768.
Batchelor AH, Piper DE, de la Brousse FC, McKnight SL, Wolberger C: The structure of GABPα/β: An ETS domain-ankyrin repeat heterodimer bound to DNA. Science. 1998, 279: 1037-1041. 10.1126/science.279.5353.1037.
Shi Y, Lee JS, Galvin KM: Everything you have ever wanted to know about Yin Yang 1. Biochim Biophys Acta. 1997, 1332: F49-F66.
Rezai-Zadeh N, Zhang X, Namour F, Fejer G, Wen YD, Yao YL, Gyory I, Wright K, Seto E: Targeted recruitment of a histone H4-specific methyltransferase by the transcription factor YY1. Genes Dev. 2003, 17: 1019-1029. 10.1101/gad.1068003.
Thomas MJ, Seto E: Unlocking the mechanisms of transcription factor YY1: are chromatin modifying enzymes the key?. Gene. 1999, 236: 197-208. 10.1016/S0378-1119(99)00261-9.
Wool IG: Extra-ribosomal functions of ribosomal proteins. Trends Biochem Sci. 1996, 21: 164-165. 10.1016/0968-0004(96)20011-8.
Zimmermann RA: The double life of ribosomal proteins. Cell. 2003, 115: 130-132. 10.1016/S0092-8674(03)00804-3.
Gerbasi VR, Weaver CM, Hill S, Friedman DB, Link AJ: Yeast Asc1p and mammalian RACK1 are functionally orthologous core 40S ribosomal proteins that repress gene expression. Mol Cell Biol. 2004, 24: 8276-8287. 10.1128/MCB.24.18.8276-8287.2004.
UCSC Database. [http://genome.ucsc.edu]
The ncbi database. [http://www.ncbi.nlm.nih.gov/genomes/static/euk_g]
Celera Database. [http://celeradiscoverysystem.com]
Ribosomal Protein Gene Database. [http://ribosome.miyazaki-med.ac.jp]
mVista Program. [http://genome.lbl.gov/vista]
Bray N, Dubchak I, Pachter L: AVID: A Global Alignment Program. Genome Res. 2003, 13: 97-102. 10.1101/gr.789803.
Weblogo Program. [http://weblogo.berkeley.edu/logo.cgi]
Clustal W Program. [http://www.ebi.ac.uk/clustalw]
The author would like to thank Dr. Tom Moloshok of the FCCC Bioinformatics Facility for his valuable help with various bioinformatics tools and Dr. Sam Litwin for help with statistical analyses. He is also indebted to Drs. Oded Meyuhas, Francisco Amaldi, Ken Zaret and John Burch for their critical reviews and comments concerning this paper. This research was supported by a grant from The National Institutes of Health (CA006927) and an appropriation from the Commonwealth of Pennsylvania. Support to R.P. Perry from the Stanley P. Reimann Endowed Chair in Research is also acknowledged.