Position and sequence conservation in Amniota of polymorphic enhancer HS1.2 within the palindrome of IgH 3'Regulatory Region

Background The immunoglobulin heavy chain (IgH) 3' Regulatory Region (3'RR), located 3' of the constant alpha gene, plays a crucial role in immunoglobulin production. In humans, there are 2 copies of the 3'RR, each composed of 4 main elements: 3 enhancers and a 20 bp tandem repeat. The single mouse 3'RR differs from the two human ones in having 4 more regulatory elements, with a double copy of one enhancer at the border of a palindromic region.
Results We compared the 3'RR organization in vertebrate genomes to depict the evolutionary history of the region and highlight its shared features. We found that in the 8 species in which the whole region was included in a fully assembled contig (mouse, rat, dog, rabbit, panda, orangutan, chimpanzee, and human), the shared elements showed synteny and a highly conserved sequence, suggesting a strong evolutionary constraint. In these species, the wide 3'RR (~30 kb in human) bears a large palindromic sequence, consisting of two ~3 kb complementary branches spaced by a ~3 kb sequence that always includes the HS1.2 enhancer. In mouse and rat, the palindrome also encompasses HS3, so that one copy of this enhancer is present on each side. A second relevant feature of the present work concerns the human polymorphism of the HS1.2 enhancer, which is associated with immune diseases in our species. We detected a similar polymorphism in all the studied Catarrhini (a primate parvorder). The polymorphism consists of multiple copies of a 40 bp element (up to 12 in chimpanzee, 8 in baboon, 6 in macaque, 5 in gibbon, and 4 in human and orangutan), separated by stretches of cytosine. We show specific binding of this element to nuclear factors.
Conclusions The nucleotide sequence of the palindrome is not conserved among evolutionarily distant species, suggesting pressure to maintain two self-matching regions that drive a three-dimensional structure despite the inter-specific divergence at the sequence level. The conservation of the palindromic structure and the establishment in primates of the polymorphic feature of HS1.2 indicate the relevance of these structures in the control and modulation of Ig production, possibly through the formation of three-dimensional structures.


Background
The immunoglobulin genes appeared, during evolution, in vertebrates. Because of their increasing physiological relevance, the evolution of these genes in fish, amphibians, birds and mammals involved several rounds of duplication that added copies and complexity to these genes. The class switch, in particular, was important in producing diversity [1], and constituted a crucial step in B cell maturation [2]. The region that undergoes the somatic rearrangements allowing the class switch is the immunoglobulin heavy chain (IgH) locus. This domain is present as a single copy in the genome of most extant species (for example, see the mouse locus in Figure 1A). Hominoidea (human, chimpanzee, gorilla and gibbon) are an exception, because the constant genes of the IgH locus underwent duplication in their common ancestor [3]. Studies in humans have shown that the duplication of the IgH locus included the Regulatory Region (3'RR) located immediately downstream of the constant alpha exons [2]. Portions of the 3'RR were first cloned in 1990 and 1991 [4,5], but they were assembled into a complete contig sequence only later, because they are unstable, repeat-rich regions that also contain palindromic sequences [6,7]. In humans, the 2 copies of the 3'RR have been reported as 3'RR1 and 3'RR2 (Figure 1B) [8]. Each human 3'RR copy harbors three different enhancers. The mouse and rat 3'RR instead possess 4 more boundary regulatory regions. Their existence in other organisms may be hypothesized but has not been demonstrated yet [2] (Figure 1A). The 3'RR has a crucial role in recruiting transcription factors for the initiation of germ line transcription of the constant genes to induce IgH switch [9]. The role of the 3'RR enhancers was studied in mice transgenic for the c-myc translocation, showing an active role of HS3.B and HS4 in the progression of peripheral B-cell lymphomas but not of pro-B lymphomas [10,11].
Chromosome conformation capture studies have demonstrated the presence of a three-dimensional structure arising from a loop among Regulatory Regions during class switch recombination [12]. More recent studies on mice carrying 3'RR deletions report impairment of class switch and Ig expression [13,14]. Activation of the mouse 3'RR begins with selective demethylation of enhancers [15]. Variation in the binding sites within the enhancer sequences can lead to different epigenetic changes and cause cells to behave differently. Indeed, in our recent population studies we found that some variants of the 3'RR enhancer HS1.2 (Figure 2) were associated with a higher risk of onset of autoimmune diseases and other immune disorders, such as IgA deficiency, systemic sclerosis, rheumatoid arthritis, psoriasis and celiac disease [16-20]. We hypothesized that the cause was a change in a binding consensus for NF-κB and other transcription factors, as predicted in silico or determined experimentally [16,21]. In humans the presence of an allele with the NF-κB consensus site was associated with an increased blood concentration of IgM, suggesting a contribution to the mechanism of class switch [16,17].
Little is known about the presence and organization of the 3'RR in Amniota other than human and mouse [22,23]. Comparative studies of this regulatory region can provide hypotheses on which elements are crucial for its function [24]. To fill this gap, we investigated the genomic organization of the region, taking advantage of the sequence data present in GenBank and Trace Archive (see Methods below). The most relevant outcome of the analysis was the discovery that the palindrome surrounding the HS1.2 enhancer is present in every mammalian species for which enough sequence data were available. This finding has important implications for the understanding of how HS1.2 functions. In addition, our data support the view that HS1.2 polymorphisms are widespread in the primate parvorder Catarrhini (Cercopithecoidea and Hominoidea). Improving comparative studies of the non-coding genome is a relevant task for gaining new insight into epigenetics and the mechanisms of genome regulation [25].

3'RR genomic organization
The mouse 3'RR region contains 7 enhancers (HS3.A-HS1.2-HS3.B-HS4-HS5-HS6-HS7), while the human one has only three enhancers (Figure 1). For this reason the mouse was used as the reference genome for the preliminary analysis. The IgA class exons are the transcribed sequence closest to the 3'RR, so we included a portion of this DNA in our analysis. It should be kept in mind that IgA was the last class to appear during the evolution of IgH, because it is shared only among Amniota species (reptiles, birds and mammals) [26]. Finally, we also surveyed a satellite repeat. This is a conserved stretch of DNA (812 bp in human) composed of a tandemly repeated 20 bp element and located inside the 3'UTR of the IgA gene, close to the 3'RR enhancer.
The "Comparative Genomics" tracks of the UCSC mouse genome browser http://genome.ucsc.edu/cgi-bin/hgGateway?org=Mouse report graphical representations of Lastz comparisons http://www.bx.psu.edu/miller_lab/ between mouse and each one of 19 Amniota genomes (rat, guinea pig, rabbit, human, chimp, orangutan, rhesus, marmoset, panda, dog, cat, horse, elephant, cow, pig, opossum, platypus, lizard, chicken). Some of these assemblies, human and mouse in particular, are very accurate. In contrast, some others are based on relatively low sequence coverage, with several unresolved gaps. This caveat has to be kept in mind when dealing with negative results of sequence comparisons among genome drafts.
Figure 1 Schematic map of the mouse and human IgH genetic cluster. The IgH cluster and the closest genes (dark blue) are highly conserved in the mouse and human chromosomes. The same transcription direction, from telomere towards centromere, was also detected in both species. The mouse IgH cluster harbors 8 constant genes and one copy of the 3' Regulatory Region (3'RR) with 7 enhancers, 3 of them (HS5, HS6, HS7) with possible insulator function at the 3' boundary of the cluster. The human IgH region differs in its constant genes, since a wide duplication of the gamma, epsilon and alpha genes occurred in the ancestor of Hominoidea. This duplication also included the 3'RR at the 3' of the constant alpha gene. Both copies of the 3'RR in humans have just three enhancers: the human loci lack both the duplicated HS3.B enhancer and the insulators present in the mouse locus.
The results of our search for the main elements of the mouse 3'RR in Amniota genomes are summarized in Figure 3. The full set of elements present in mouse (IgH alpha exons, 20 bp tandem repeat and 7 enhancers) was detected only in the rat genome. HS5, HS6, and HS7 were absent from all the remaining species. The HS3, HS1.2, and HS4 set was detected with certainty in mouse, rat, dog, rabbit, panda, human, chimpanzee, and orangutan. HS3 and the 20 bp repeat were present in 12 mammals, but were undetected in cow, elephant, opossum, and platypus. The region delimited by the alpha exons and the HS3.A enhancer, and encompassing the 20 bp repeat, appeared to be highly conserved in placental mammals. We remark that, in contrast, the Alpha marker remains entirely undetected in chicken and pig, because the similarity to rodent IgA is low even at the level of the peptide sequence.
As expected, a search of the Neanderthal genome [27], carried out by inspecting the related track in the UCSC human genome browser, showed that HS3, HS1.2 and HS4 were also present in the genome of our extinct relative (data not shown).

Enhancers in the sequenced species
Figure 3 shows comparisons among assembled genome drafts. If the sequence of a specific region is still missing from the genome draft of a particular species, then the comparison against the mouse genome will not find any match, even though one is expected. These missing sequences, however, can be present in the shotgun sequence databases as unassembled short sequences. We searched the GenBank genome-related databases, by BLAST, for the presence of murine (data not shown) and human IgH alpha exons (Table 1). By querying for this transcribed sequence, we tested the limits of the method. All the 35 positive species were mammals (8 primates). Then we BLASTed the murine (data not shown) and human (Table 1) enhancers against the same positive databases. The analysis showed that at least one 3'RR enhancer was present in 23 species, while all three enhancers were present in only 8 species (in bold in Table 1), apart from human. The negative findings may be ascribed either to the incompleteness of the available genomic drafts/wgs/htgs databases or to actual sequence divergence. It is worth noting that the longer the map distance between a 3'RR feature and the IgA gene, the fewer the species in which it was detected (Table 1). Finally, the analysis confirmed that the mouse HS5, HS6, and HS7 were detectable by sequence similarity only in rat (data not shown).

Dot plot analysis of the 3'RR
It has already been reported, in human and mouse, that each HS1.2 enhancer is flanked, at some distance on both sides, by a 3 kb segment, and that these segments are in opposite orientation (palindromic), as is evident from the dot plot analysis reported in Figure 4 (palindromic sequence in light blue). We found that this organization is shared by the 8 species in which the whole region was included in a fully assembled contig. The human versus non-primate dot plots are reported in Figure 5. It is worth noting that, while the similarity between the two components of the same palindrome is always very high (94% in human, Figure 4), the sequence itself varies almost completely among species (Figure 5). HS1.2 always lies in the center of the palindrome. In the human 3'RR1, the region internal to the two components of the palindrome is inversely oriented with respect to the corresponding sequence of 3'RR2, as shown by the secondary diagonal line present at the core of the light blue frame (Figure 4). In addition, sequence comparisons of the human 3'RRs with non-primate mammals harboring a single copy of the 3'RR (panda, rabbit, mouse, and dog) showed that in 3 of them, the exception being the mouse, the orientation of the region internal to the palindrome (containing HS1.2) was identical to that of the human 3'RR2 (Figure 5). This finding suggests that the 3'RR2 is ancestral with respect to the 3'RR1. The mouse showed the opposite orientation of the region internal to the palindrome. Moreover, the mouse palindrome is larger than the human one and includes HS3 and the 20 bp repeat, thus giving rise to HS3.A and HS3.B (Figure 5). Very likely, an inversion event was triggered by the palindrome both in the mouse 3'RR and in the human 3'RR1.
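As an illustration of how a self dot plot reveals a palindrome of this kind, the following minimal Python sketch lists the positions at which a word on one strand matches the reverse complement of a word further along the same sequence; plotted as dots, these matches line up along the secondary diagonal. The function names, the word size k and the toy sequence are assumptions introduced here for illustration; they are not the software or the parameters used to produce Figures 4 and 5.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def inverted_repeat_dots(seq, k=8):
    """List (i, j) pairs, i < j, where the k-mer starting at i equals the
    reverse complement of the k-mer starting at j.

    On a self dot plot these pairs trace the secondary diagonal that marks
    two segments lying in opposite orientation.
    """
    seq = seq.upper()
    # Index every k-mer of the sequence by its start positions.
    index = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    dots = []
    for j in range(len(seq) - k + 1):
        rc_word = reverse_complement(seq[j:j + k])
        for i in index.get(rc_word, []):
            if i < j:
                dots.append((i, j))
    return dots


# Toy sequence (hypothetical): one arm, a spacer, then the inverted arm.
arm = "ATGGCTAGCTTACGGATCCA"
spacer = "TTTTGGGGTTTT"
toy = arm + spacer + reverse_complement(arm)
for i, j in inverted_repeat_dots(toy, k=8):
    print(i, j)
```

For two perfectly matching arms, every reported pair satisfies i + j = constant, which is the signature of an anti-diagonal; sequence divergence between the arms would break the line into shorter runs.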

HS1.2 enhancer in Trace Archive
While no polymorphisms have been reported for the enhancers HS3 and HS4 [28], the 2 human HS1.2 copies share a set of variant forms (Figure 2) [21,29]. The main polymorphic feature of human HS1.2 consists of a tandemly repeated pair of elements, i.e. a 40 bp sequence (40 mer, yellow boxes in Figure 2) and a ~15 bp cytosine-rich stretch (green boxes), which may or may not be separated from the enhancer core (purple boxes) by a 29 bp sequence (red boxes). The human HS1.2 variants with more copies of the 40 mer showed an increasing effect on the transcription of a reporter gene in transfected cells [30]. In mouse there is just one copy of HS1.2, which constantly harbors a single copy of the 40 mer.
The Trace Archive databases of primate sequences were searched by BLAST using the human HS1.2 sequence as query, to investigate the evolutionary history of this enhancer and to search for potential polymorphisms. Figure 6 summarizes the results, along with all the available data from previous works [21] and from our previously unpublished sequencing data. This figure shows the organization of HS1.2 in the different species. The highly conserved core of the enhancer (113 bp, purple) is constantly flanked, in the 11 non-primate mammalian species (olive green background), by a partial 29 mer stretch (red element). Among the non-primate mammals, the 102 bp terminal element (blue in Figure 6), constantly found in primates, was detected in its entirety only in panda.
The most interesting finding of this analysis is the presence of a variable number of copies of the 40 mer (yellow in Figure 6) in all species of the Catarrhini parvorder, i.e. in both the Hominoidea and Cercopithecoidea superfamilies. In contrast, the duplication of the whole locus of the constant genes was found only in Hominoidea [3]. This observation strongly suggests that the polymorphism emerged before the duplication. The number of 40 mers varies from the 12 copies found in chimpanzee HS1.2 to a single one, as detected in some alleles in human, chimpanzee, and gorilla and in all the non-primate mammals. An additional source of variability found in Hominoidea is the occasional absence of the 29 mer, replaced by an 18 bp stretch of cytosine (green in Figure 6).
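The copy-number feature described above can be sketched in a few lines of Python: given a repeat unit, the function below returns the largest number of consecutive copies found in a sequence, allowing neighbouring copies to be separated by a run of cytosines. This is only a sketch under stated assumptions: the unit used in the example is a short hypothetical placeholder, not the real 40 mer, matching is exact (no mismatches tolerated), and the function name is introduced here for illustration.

```python
import re


def max_tandem_copies(seq, unit):
    """Largest number of consecutive copies of `unit` in `seq`, where
    adjacent copies may be separated by a stretch of C (exact matching)."""
    seq = seq.upper()
    unit = unit.upper()
    # One or more copies of the unit, each optionally followed by a C run.
    pattern = re.compile("(?:" + re.escape(unit) + "C*)+")
    best = 0
    for match in pattern.finditer(seq):
        best = max(best, match.group(0).count(unit))
    return best


# Hypothetical placeholder unit standing in for the 40 mer.
unit = "GATTACAGGT"
allele = "AAA" + unit + "CCCCC" + unit + "CCC" + unit + "TTT" + unit
print(max_tandem_copies(allele, unit))
```

Real trace reads would call for approximate matching, since sequencing errors and allelic substitutions would interrupt an exact pattern.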

HS1.2 Transcription Factor Binding Sites (TFBS)
All the HS1.2 forms found in the different species (Figure 6) were searched for transcription factor binding sites using the AliBaba2 software. Relevant results are summarized in Additional file 1 (full list in Additional file 2). Four TFBS (C/EBPalp, AP-2alpha, SP1, Oct1) are present in all the analyzed species; moreover, the NF variants are almost ubiquitous. Additional file 1 indicates that, while the number of C/EBPalp and Oct1 TFBS is substantially constant across the different HS1.2 forms, the number of AP-2alpha and SP1 TFBS is proportional to the number of copies of the 40 mer present in that specific HS1.2 form. Note that 40 mers containing a c-myc site appear only in the Catarrhini HS1.2 forms and in dog.

Phylogenetic analysis
The four structures clustered in the 3'RR were analyzed for their sequence variation in 9 species comprising human, gorilla, orangutan, mouse, rat, rabbit, panda, dog and cat (see Additional file 3). The phylogenetic analysis obtained with the maximum likelihood method for C-alpha, HS3, HS1.2 and HS4 (Figure 7A, B, C and D, respectively) showed in all cases a similar variation from the standard reconstruction of mammalian phylogeny. Rodents and lagomorphs diverged from each other and from primates and carnivores, confirming at the nucleotide level the hypothesis of different evolutionary routes taken by the different groups, as shown by the structural analysis. The concordance between the coding region (C-alpha) and the three enhancers in the observed divergence furthermore indicates that similar forces shaped the evolution of the whole 3' regulatory region, suggesting potential functional constraints also for the non-coding sections.

Discussion
In the present article we have compared the genomic structure of the 3'RR domain of the IgH gene cluster in various species. We have confirmed that in all the analyzed species the order of the 3'RR elements is largely maintained. Two main results were also achieved: (i) a palindromic structured sequence flanks each HS1.2 enhancer; (ii) HS1.2 is polymorphic in all analyzed Catarrhini species and therefore arose before the IgH locus duplication. The most relevant result of our analysis was the finding that each HS1.2 enhancer, in all the examined species, is flanked by two 3 kb segments forming a palindromic structure (see Figures 4 and 5). Notably, while the similarity within each pair of segments is extremely high, the similarity of the palindromic sequences among the different species is strikingly low. These findings suggest that the evolutionary pressure to maintain the palindromic structure was much higher than the pressure to conserve the sequence. As a consequence, we conclude that the palindrome plays a conformational role in the functioning of the 3'RR. The fact that the HS1.2 enhancer is constantly placed in the middle of the sequence spacing the two inverted elements further supports the crucial role of the conformation of the region. We hypothesize that the palindrome triggers the formation of a hairpin structure that exposes the HS1.2 enhancer externally (Figure 8). The orientation of HS1.2 is therefore irrelevant for the enhancer function. Indeed, it was found in different orientations in different species, and also in different orientations in the two human 3'RR domains. Moreover, the opposite orientations of the two HS1.2 copies in human add support to the actual formation of the hairpin in vivo. The paired inverted sequences could form the stem of the hairpin. This is a fragile site that could be involved in rearrangements and translocations, as in c-myc relocation [31].
An exchange involving the stem may result in the inversion of the loop region, changing the HS1.2 orientation. We can hypothesize that an inversion occurred at least twice since the divergence between Homo sapiens and Mus musculus (as suggested by the dot plot analysis, see Figure 5 and Results above). The boundaries of the inversions at least partially overlapped the two palindromic regions, suggesting a cause/effect relationship. The palindrome could facilitate the inversion event, and the latter could contribute to perpetuating the palindrome.

HS1.2 polymorphisms
Population genetics data on HS1.2 polymorphisms are available only for humans, in which six distinct variants have been sequenced (AY530201, AY530200, AJ544220, AJ544219, AJ544218 [21]; HM756255, our previously unpublished data). The human variants result from (i) a variable number of copies of the 40 mer and its flanking cytosine-rich box (yellow and green, respectively, in Figure 6); (ii) the sequence connecting the constant core of HS1.2 (purple in Figure 6) to the stretch of 40 mer repeats, which consists of either an 18 bp cytosine-rich box or a 29 mer (green and red, respectively, in Figure 6). HS1.2 polymorphisms have also been detected in 6 out of the 8 non-human primates for which this enhancer was identified in genomic databases (Figure 6). We acknowledge that neither the number of individuals of each species present in these archives nor the sequence coverage is known. We can suppose that very few individuals, maybe a single one, are present in GenBank or Trace Archive. The variants we have detected are, therefore, very likely just the most frequent ones in each species. Nevertheless, it is worth noting that the two Platyrrhini (marmoset and titi) share the same HS1.2 form as the panda. We therefore hypothesize that this shared form of the enhancer was also ancestral to all the Catarrhini variants. IgA and the 3'RR are relevant in the response to infections and in diseases [3,16-18,20,32]. The most remarkable polymorphism found in the 3'RR lies within HS1.2, which occupies the central position in the palindromic structure, on top of the hairpin (Figure 8). We hypothesize that it can influence the modulation of the Ig switch through an interaction between the extruded enhancer and protein factors. The resulting molecular complexes may affect the mobility of the entire 3'RR and, finally, the formation of loops joining different constant and variable Ig portions.
It would be interesting to investigate the role of the variants we have found in differentially modulating the Ig switch and production in different species, especially in animal models such as macaque and mouse.

Conclusions
We remark that both coding sequences and wide non-coding regulatory regions have undergone some evolutionary pressure, and that part of this pressure was aimed at preserving the three-dimensional structure of the 3'RR, thereby conserving the regulatory function necessary for class switch recombination [12].
Figure 7 Phylogenetic analysis of the C-alpha gene, HS3, HS1.2 and HS4 enhancers. Unrooted phylogenetic trees for C-Alpha (A), HS3 (B), HS1.2 (C) and HS4 (D). Branch length is scaled according to the evolutionary distance, shown as the number of base substitutions per site. The percentage of 100 replicate trees in which the taxa clustered together after the bootstrap analysis is shown at the root of the branches when significant (i.e. when higher than 50%). As examples, the value '100' at the base of the separation between the taxa Mus musculus and Rattus norvegicus in panel D means that the two taxa were clustered together in all the replicate trees, while '72' between Homo sapiens and Gorilla gorilla in the same panel means that 72% of the 100 replicate trees clustered these two groups together.
Inspecting the homologous human region in the UCSC human genome browser http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg18, we searched the tracks related to the Neanderthal genome [27] for the presence of HS3, HS1.2 and HS4.

Transfac analysis for transcription factors
The search for transcription factor consensus sites was performed on the variant sequences of HS1.2 with the software AliBaba2.1 http://www.gene-regulation.com/pub/programs/alibaba2/index.html. Additional file 1 lists the transcription factors detected in at least ten loci. The full list can be inspected in Additional file 2.

Phylogenetic analysis
Sequences of the IgH constant alpha genes and of the enhancers HS3, HS1.2 and HS4, retrieved by BLAST search, were used for the phylogenetic analysis (Figure 7; accession numbers and limits reported in Additional file 3). Multiple alignments of the sequences were obtained with Opal [37] and the results were manually inspected. The best-fitting substitution model was selected using ModelGenerator [38], under the Akaike information criterion (AIC1), as implemented in MultiPhyl online [39]. The following models were used in the phylogenetic analysis: GTR + I + G for C-alpha; HKY + I for HS3 and HS4; HKY + G for HS1.2.
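For readers unfamiliar with the criterion, the standard Akaike score of a model is AIC = 2k - 2 ln L, where k is the number of free parameters and ln L the maximized log-likelihood; the model with the lowest score is preferred. The Python sketch below shows only this arithmetic. The model labels, log-likelihoods and parameter counts are invented placeholders, not values from this study, and the sketch does not reproduce ModelGenerator itself or its AIC1 option.

```python
def aic(log_likelihood, n_params):
    """Standard Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def best_model(candidates):
    """Return the name of the candidate with the lowest AIC.

    `candidates` maps a model name to (log_likelihood, n_params).
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))


# Invented placeholder values, for illustration only.
candidates = {
    "model_A": (-2450.0, 4),
    "model_B": (-2441.5, 5),
    "model_C": (-2441.0, 9),
}
for name, (lnl, k) in candidates.items():
    print(name, aic(lnl, k))
print("selected:", best_model(candidates))
```

In this toy case the richest model has the best likelihood but is not selected, because its gain in fit does not offset the penalty for its extra parameters.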
An unrooted tree was constructed using the maximum likelihood method applied to nucleotides, as implemented in Garli version 0.96 http://www.bio.utexas.edu/faculty/antisense/garli/Garli.html, with bootstrap percentages obtained as a consensus after 100 replicates.
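The bootstrap percentages can be understood through the following minimal Python sketch, which shows the two generic steps of a nonparametric bootstrap: resampling alignment columns with replacement to build a replicate data set, and expressing the support for a grouping as the percentage of replicates in which it is recovered. The function names and the toy alignment are assumptions made for illustration; tree inference on each replicate (performed here by Garli) is outside the sketch.

```python
import random


def bootstrap_columns(alignment, rng):
    """Build one bootstrap replicate by sampling alignment columns with
    replacement; `alignment` maps taxon name to an aligned sequence."""
    length = len(next(iter(alignment.values())))
    picks = [rng.randrange(length) for _ in range(length)]
    return {taxon: "".join(seq[i] for i in picks)
            for taxon, seq in alignment.items()}


def support_percentage(clade_found):
    """Percentage of replicates in which a given grouping was recovered;
    `clade_found` holds one boolean per replicate tree."""
    return 100.0 * sum(clade_found) / len(clade_found)


# Toy alignment (hypothetical sequences, equal length).
toy_alignment = {
    "taxon_1": "ACGTACGTAC",
    "taxon_2": "ACGTTCGTAC",
    "taxon_3": "ACGAACGTTC",
}
replicate = bootstrap_columns(toy_alignment, random.Random(0))
print(replicate)
```

With 100 replicates, a grouping recovered in 72 of them receives a support value of 72, which is how the values reported in Figure 7 are to be read.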