High divergence in primate-specific duplicated regions: Human and chimpanzee Chorionic Gonadotropin Beta genes
© Hallast et al. 2008
Received: 29 August 2007
Accepted: 07 July 2008
Published: 07 July 2008
Low nucleotide divergence between human and chimpanzee does not sufficiently explain the species-specific morphological, physiological and behavioral traits. As gene duplication is a major prerequisite for the emergence of new genes and novel biological processes, comparative studies of human and chimpanzee duplicated genes may assist in understanding the mechanisms behind primate evolution. We addressed the divergence between human and chimpanzee duplicated genomic regions by using the Luteinizing Hormone Beta (LHB)/Chorionic Gonadotropin Beta (CGB) gene cluster as a model. The placental CGB genes, which are essential for implantation, have evolved from an ancestral pituitary LHB gene by duplications in the primate lineage.
We shotgun sequenced and compared the human (45,165 bp) and chimpanzee (39,876 bp) LHB/CGB regions and present evidence for structural variation resulting in a discordant number of CGB genes (6 in human, 5 in chimpanzee). The scenario of species-specific parallel duplications was supported (i) as the most parsimonious solution, requiring the fewest rearrangement events to explain the interspecies structural differences; (ii) by phylogenetic trees constructed with fragments of intergenic regions; and (iii) by sequence similarity calculations. Across the orthologous regions of the LHB/CGB cluster, substitutions and indels contributed approximately equally to the interspecies divergence, and the distribution of nucleotide identity was correlated with the regional repeat content. Intraspecies gene conversion may have shaped the LHB/CGB gene cluster. The substitution divergence (1.8–2.59%) exceeded the estimates for single-copy loci two- to threefold, and the fraction of transversional mutations was increased compared with unique sequences (43% versus ~30%). Despite the high sequence identity among LHB/CGB genes, there are signs of functional differentiation among the gene copies. Estimates of the dn/ds rate ratio suggested purifying selection on LHB and CGB8, and positive evolution of CGB1.
If generalized, our data suggest that, in addition to species-specific deletions and duplications, parallel duplication events may have contributed to the genetic differences separating humans from their closest relatives. Compared with unique genomic segments, duplicated regions are characterized by high divergence promoted by intraspecies gene conversion and species-specific chromosomal rearrangements, including alterations in gene copy number.
Gene duplication has long been considered one of the main mechanisms of adaptive evolution and an important source of genetic novelty. Differential duplications and deletions of chromosomal regions that include coding genes provide a powerful source for the evolution of species-specific biological differences. Compared with other mammals, primate genomes show an enrichment of large segmental duplications with high (>90%) sequence identity. In the human genome, particularly pronounced copy-number expansions have been reported for genes involved in the structure and function of the brain. In the comparison of human with its closest relative, the chimpanzee, large duplications contribute considerably to the overall divergence (2.7%) compared with single-base-pair substitutions (1.2–1.5%; [2, 6–12]). In addition to providing the substrate for non-allelic homologous recombination mediating genomic disorders (reviewed by ), the duplication architecture of a genome may also influence normal phenotypic variation. It has been estimated that ~20% of segmental duplications are polymorphic within human and chimpanzee populations, contributing to intraspecies diversity [5, 14]. Although segmental duplications cover a substantial fraction of great ape genomes, experimental data on the divergence and detailed evolutionary dynamics of duplicated gene regions are still limited. Sequence comparison of duplicated genes in sister species would assist in understanding the mechanisms behind primate evolution and in associating genetic divergence with phenotypic diversification.
One of the genomic regions that has evolved through several gene duplication events in the primate lineage is the Luteinizing Hormone Beta (LHB)/Chorionic Gonadotropin Beta (CGB) gene cluster, located at 19q13.32 in human. The LHB/CGB genes have an essential role in reproduction: placentally expressed HCG contributes to the implantation of the embryo during the early stages of pregnancy, whereas pituitary-expressed luteinizing hormone promotes ovulation and luteinization of follicles and stimulates steroidogenesis. In human, the cluster consists of seven highly homologous genes: an ancestral LHB and six duplicated CGB genes. The data from other primates support the hypothesis of several sequential duplication events gradually increasing the number of CGB genes among primates from one (New World monkeys: the owl monkey, Aotus trivirgatus, and the dusky titi monkey, Callicebus moloch) to six in human (Figure 1A). The mapped copy number of the CGB gene varies among Old World primates: three in rhesus macaque (Macaca mulatta), five in guereza monkey (Colobus guereza) and dusky leaf monkey (Presbytis obscura), and four in orangutan (Pongo pygmaeus). It has been suggested that the CGB gene first arose in the common ancestor of the anthropoid primates after their divergence from tarsiers.
The total length of the sequenced chimpanzee (Ch) LHB/CGB genomic region obtained from two overlapping BAC clones was 43,945 bp. It encompasses 1,029 bp of the flanking RUVBL2 gene (centromeric side), 39,876 bp of the entire LHB/CGB cluster and 2,986 bp of the flanking NTF5 gene (telomeric side) (Figure 1B; GenBank submission: EU000308). Compared with the human (Hu) LHB/CGB region (GenBank: NG_000019), the ChLHB/CGB cluster is 5,289 bp shorter. The sequence characteristics of the LHB/CGB genomic regions are similar in the two species: an extremely high GC content (57%, compared with the genome average of 41% for Hu and Ch), a high fraction of CpG islands (Hu 6.6%, Ch 6.1%, compared with an estimated 1–3.5% for Hu and Ch [6, 8]) and of repetitive sequences (Hu 26.9%, Ch 25.15%), especially SINEs (Hu 23.23%, Ch 21.81%) (Additional file 1). High repeat content is also characteristic of several other duplicated regions, such as the MHC class I region and the Apolipoprotein CI genomic segment.
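The GC and CpG figures above are simple composition statistics. The sketch below shows one common way to compute them; it is not the authors' pipeline, since the text does not say how these values were obtained. The observed/expected CpG ratio and the island thresholds (window of at least 200 bp, GC above 50%, observed/expected ratio above 0.6) follow the widely used Gardiner-Garden and Frommer criteria and are assumptions here; the function names are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G+C among unambiguous A/C/G/T bases (gaps and Ns ignored)."""
    s = seq.upper()
    acgt = sum(s.count(b) for b in "ACGT")
    if acgt == 0:
        return 0.0
    return (s.count("G") + s.count("C")) / acgt


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count)."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
    return s.count("CG") * len(s) / (c * g)


def looks_like_cpg_island(window: str, min_len: int = 200,
                          min_gc: float = 0.5, min_oe: float = 0.6) -> bool:
    """Apply Gardiner-Garden/Frommer-style thresholds to a single window."""
    return (len(window) >= min_len
            and gc_fraction(window) > min_gc
            and cpg_obs_exp(window) > min_oe)


print(gc_fraction("CGCGAATT"), cpg_obs_exp("CGCGAATT"))  # 0.5 4.0
```

Scanning a region for CpG islands would slide such a window along the sequence; the percentage of CpG-island sequence reported for a region depends on the detection program and the thresholds chosen.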
As expected, the genomic organization of the human and chimpanzee LHB/CGB clusters is highly similar (Figure 1B). We identified two highly identical, apparently orthologous segments within the cluster: RUVBL2/LHB/intergenic region A (Ch 8,084 bp, Hu 7,973 bp; 96% sequence identity) and the region spanning from CGB1 to NTF5 (Ch 29,136 bp, Hu 28,568 bp; 94.8% sequence identity). However, a large species-specific structural rearrangement was localized between intergenic region A and the CGB1 gene, resulting in different sizes of the human (45,165 bp) and chimpanzee (39,876 bp) clusters as well as a species-specific number of duplicated gene copies: seven in human (1 LHB + 6 CGB genes) and six (1 LHB + 5 CGB) in chimpanzee. In human, the rearranged region (12,700 bp) harbors one HCG beta-coding CGB gene and one CGB1/2-like gene (CGB2) recognized by a specific promoter segment [18, 19], whereas in chimpanzee (rearranged region 6,725 bp) only a CGB1/2-like gene (CGB1B) is present, in an inverted orientation compared with human. In addition, the ChLHB/CGB cluster lacks the whole intergenic region C' and has a considerably shorter inverted intergenic region B' (Figure 1B).
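The identity values above come from pairwise alignments (the Methods name ClustalW and EMBOSS Stretcher). As a minimal sketch, assuming an already-aligned pair of equal-length strings with '-' for gaps, percent identity can be computed as below. Whether gap columns enter the denominator changes the result; the default here counts them, as EMBOSS pairwise alignment reports do, but this is an assumption about convention, not a statement of how the 96% and 94.8% values were obtained. The function name is hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str, count_gaps: bool = True) -> float:
    """Percent identical columns in a pairwise alignment ('-' marks a gap).

    count_gaps=True divides by the full alignment length (gap columns count
    as mismatches); count_gaps=False divides only by gap-free columns.
    """
    if len(aln_a) != len(aln_b) or not aln_a:
        raise ValueError("expected two non-empty aligned strings of equal length")
    a, b = aln_a.upper(), aln_b.upper()
    identical = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    if count_gaps:
        denominator = len(a)
    else:
        denominator = sum(1 for x, y in zip(a, b) if x != "-" and y != "-")
    return 100.0 * identical / denominator if denominator else 0.0


print(round(percent_identity("ACGT-ACGT", "ACGTTACGA"), 1))           # 77.8
print(percent_identity("ACGT-ACGT", "ACGTTACGA", count_gaps=False))  # 87.5
```

For long, indel-rich regions such as these, the two conventions can differ by several percentage points, which is one reason identity and divergence figures are best compared only when computed the same way.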
Alternative scenarios leading to the discordant gene number in the LHB/CGB gene clusters of the two species are less well supported. A chimpanzee-specific deletion would have required a minimum of three rearrangement events (Figure 1B): loss of the CGB and psNTF6A genes accompanied by the inversion of CGB2 and intergenic region B' (giving rise to CGB1B and chimpanzee B'), and a separate deletion of region C' in chimpanzee. Similarly, a human-specific duplication (Figure 1B) would have required at least three events: an inversion of CGB1B and intergenic region Ch-B' (giving rise to part of Hu-B' and CGB2); either a direct duplication and translocation of the CGB8 gene along with psNTF6A' or an inverted duplication and translocation of CGB5 along with psNTF6G' (resulting in CGB and psNTF6A); and a direct duplication and translocation of intergenic region C, creating Hu-C' next to the CGB1 gene.
A number of gene families have been characterized in which the gene number differs between human and chimpanzee owing to species-specific indels [7, 21]. To our knowledge, this is the first report in which parallel, independent duplications that arose within the same region of the human and chimpanzee genomes give the best explanation for the observed structural differences between two sister species. However, there are examples of independent duplications among primates resulting in convergent functions. A duplication of the X-linked opsin gene in New World howler monkeys (Alouatta seniculus and Alouatta caraya), more recent than the one in Old World primates, has led to full trichromacy [22–24]; likewise, functionally similar genes have arisen independently within the Growth Hormone/Somatomammotropin gene cluster in the New World monkey and Old World monkey/hominoid lineages [20, 25].
Among the transitions, we observed an excess of C⇔T substitutions in LHB/CGB genes (38%) compared with the whole region (28%). It is generally accepted that a high proportion of transitions are C to T substitutions in CpG dinucleotides, which exhibit an approximately 10-fold higher mutation rate than the genomic average [9, 32]. The higher GC content of the genes (64% vs 57%) and the presence of CpG islands could explain the excess of C⇔T substitutions in LHB/CGB genes compared with intergenic regions.
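The C⇔T and transversion percentages reported here come from classifying each substitution in the human-chimpanzee alignment. A minimal sketch of that bookkeeping is shown below, assuming a gapped pairwise alignment as input. Two simplifications are assumptions rather than the authors' stated procedure: C⇔T is counted literally on the strand given (G⇔A changes, the same mutation read on the other strand, are not pooled), and the CpG check looks only at the next alignment column of the sequence carrying the C. The names and toy sequences are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_spectrum(aln_a: str, aln_b: str) -> dict:
    """Count transitions, transversions and C<->T changes in a pairwise alignment.

    Gap and ambiguous columns are skipped. A C<->T change is flagged as CpG-context
    when, in the sequence carrying the C, the next alignment column is a G.
    """
    a, b = aln_a.upper(), aln_b.upper()
    counts = {"transitions": 0, "transversions": 0, "C<->T": 0, "C<->T at CpG": 0}
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y or x not in "ACGT" or y not in "ACGT":
            continue
        pair = {x, y}
        if pair <= PURINES or pair <= PYRIMIDINES:
            counts["transitions"] += 1
        else:
            counts["transversions"] += 1
        if pair == {"C", "T"}:
            counts["C<->T"] += 1
            c_seq = a if x == "C" else b  # the sequence that carries the C
            if i + 1 < len(c_seq) and c_seq[i + 1] == "G":
                counts["C<->T at CpG"] += 1
    return counts


counts = substitution_spectrum("ACGTACGA", "ATGTGCTA")
total = counts["transitions"] + counts["transversions"]
print(counts)
print("transversion fraction:", counts["transversions"] / total)
print("C<->T share of transitions:", counts["C<->T"] / counts["transitions"])
```

Applied separately to genic and intergenic alignment columns, the same counts would give the kind of genic-versus-regional comparison described above.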
Notably, when polymorphism data from resequencing studies of human (n = 95 individuals) and chimpanzee (n = 11 individuals) [Hallast et al, unpublished] LHB/CGB genes were incorporated into the calculations, the sequence divergence in genes dropped from 1.8% to 1.26% in LHB, from 2.59% to 1.84% in CGB5, from 2.59% to 1.9% in CGB8 and from 2.18% to 1.57% in CGB7 (Figure 5; Additional file 3). Still, the divergence remained higher than published estimates for single-copy genes.
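The text does not spell out how the resequencing data entered the divergence calculation. One straightforward rule that produces this kind of drop is to count only fixed differences: a site where the two reference sequences differ is discounted when the alleles observed in one species overlap those observed in the other. The sketch below assumes that rule, which may differ from the authors' procedure; the input format (a dictionary of extra alleles per alignment column) and all names are hypothetical.

```python
from typing import Dict, Set, Tuple


def divergence_estimates(hu_ref: str, ch_ref: str,
                         hu_snps: Dict[int, Set[str]],
                         ch_snps: Dict[int, Set[str]]) -> Tuple[float, float]:
    """Return (reference-only divergence %, fixed-difference divergence %).

    hu_ref/ch_ref: aligned reference sequences of equal length ('-' = gap).
    hu_snps/ch_snps: alignment column -> extra alleles seen in resequencing.
    Gap columns are excluded from the comparison.
    """
    compared = ref_diffs = fixed = 0
    for i, (h, c) in enumerate(zip(hu_ref.upper(), ch_ref.upper())):
        if h == "-" or c == "-":
            continue
        compared += 1
        if h == c:
            continue
        ref_diffs += 1
        hu_alleles = {h} | hu_snps.get(i, set())
        ch_alleles = {c} | ch_snps.get(i, set())
        # A difference counts as fixed only if no allele is shared between species.
        if hu_alleles.isdisjoint(ch_alleles):
            fixed += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * ref_diffs / compared, 100.0 * fixed / compared


# Toy example: two reference differences, one of which (column 3) is also
# segregating in the human sample, so only one difference is fixed.
print(divergence_estimates("ACGTACGTAC", "ACGAACGTTC", {3: {"A"}}, {}))  # (20.0, 10.0)
```

Under this rule the estimate can only stay the same or decrease as more individuals are resequenced, because each newly observed allele can only turn an apparent fixed difference into a shared polymorphism.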
Most importantly, our data indicate that divergence estimates between human and chimpanzee might be substantially lower than reported when intraspecies variation is taken into account.
Maximum likelihood estimation of ω (= dn/ds) values by PAML analysis and amino acid divergence in human and chimpanzee orthologous genes.
The CGB1 gene (ω = 2.658; 1 synonymous and 4 non-synonymous changes; amino acid divergence 3.03%) stood out as the only locus in the gene cluster with an estimated ω > 1, which would be consistent with positive or adaptive evolution. It has been suggested that CGB1 arose in the common ancestor of the African great apes through a duplication event accompanied by the insertion of a novel putative promoter, 5'UTR and exon 1. So far, attempts to detect the CGB1 gene have been unsuccessful in orangutan (Pongo pygmaeus), using a PCR approach, and in macaque (Macaca mulatta; GenBank: AC202849), by in silico search of the current genome assembly. In human placenta, the contribution of CGB1 and its human-specific duplicate CGB2 to the total expression of the six CGB genes has been shown to be much lower (1/1000 to 1/10,000) than expected from their gene dosage (two of six genes) [43, 45]. In testis, however, CGB1/2 contribute as much as 1/3 of the total CGB transcript pool, which may indicate a role for these genes in the male reproductive tract. Indeed, a recent study has shown that free HCG alpha and HCG beta subunits are produced in large amounts in the prostate and testes and are subsequently found in seminal plasma.
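For orientation, ω compares two per-site rates. Purely as an illustration of what the ratio measures, the simple counting approach of Nei and Gojobori is written out below; the estimates above come from maximum-likelihood CODEML (PAML) and from the Li93 method, which count sites and substitution pathways differently, so these formulas will not reproduce the reported values. Here S_d and N_d are the observed synonymous and non-synonymous differences, S and N the numbers of synonymous and non-synonymous sites, and the Jukes-Cantor correction converts each proportion p into a distance d:

```latex
p_s = \frac{S_d}{S}, \qquad p_n = \frac{N_d}{N}, \qquad
d = -\frac{3}{4}\ln\!\left(1 - \frac{4}{3}\,p\right), \qquad
\omega = \frac{d_n}{d_s}
```

Values of ω < 1 point to purifying selection, ω ≈ 1 to neutral evolution and ω > 1 to positive selection. With a single synonymous difference in CGB1, d_s rests on one event, so the point estimate of ω is very sensitive to small changes in the counts.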
To address which parts of the studied genes show signals of positive selection, we used CRANN analysis, calculating dn and ds values for sliding, overlapping windows along individual genes (Additional file 5) [47, 48]. For the HCG beta subunit-coding genes (CGB5, CGB8 and CGB7), the patterns of nucleotide differences between the sister species were similar: across the protein, synonymous substitutions exceeded non-synonymous ones and were concentrated in the N- and C-terminal regions. In contrast, in CGB1 non-synonymous substitutions dominated and were located in the signal peptide (amino acids 1–20) and the centre of the protein.
Primate-specific gene duplications have involved loci regulating immunity (e.g. MHC, beta-defensin and CD33rSiglec gene clusters), reproduction (e.g. LHB/CGB, GH/CSH and PRAME genes; Y-chromosomal gene families), development and adaptation (e.g. Beta Globin, Opsin, Rh blood group, Class 1 ADH, PRDM and FAM90A gene families), and brain functions (e.g. NAIP, ROCK1, USP10 and MGC8902 genes) [4, 16, 20, 22, 49–59]. It has been suggested that these duplication events may have been facilitated by non-allelic homologous recombination between Alu sequences, which have expanded to millions of copies throughout primate genomes [54, 60]. There are several examples of independent duplication events in the New World monkey (NWM) and Old World monkey (OWM)/hominoid lineages, as well as in distinct primate species. For example, OWM and apes have three Opsin genes and are trichromats owing to a gene duplication at the base of the OWM/hominoid lineage. In NWM the situation is more variable: most species have two Opsin genes, but in the howler monkey an additional gene duplication has led to full trichromacy [23, 24]. In the Growth Hormone gene cluster (five to eight gene copies), some of the duplicate genes in the OWM/hominoid lineages have acquired a novel function and code for Chorionic Somatomammotropin (CSH genes), which is involved in the glucose metabolism of the fetus and the mother. However, the CSH genes are missing from NWM genomes. Further species-specific duplications of GH/CSH genes have been reported for gibbon, macaque, chimpanzee and human [62, 63]. In the human-chimpanzee comparison, only three GH/CSH genes are clearly orthologous. Other gene clusters with independent gene duplications in the human, ape and macaque lineages are the MHC and the testis-expressed PRAME genes [53, 64]. The ancestral MHC-B duplicated into MHC-B and MHC-C in hominoids. While orthologs of human MHC-C are found in African apes and orangutans, they are not present in macaque or any other OWM [65, 66].
Species-specific evolutionary scenarios have also been reported for LHB/CGB genes. In addition to the structural differences between human and chimpanzee LHB/CGB genes reported in this study, an expansion of CGB genes up to 50 gene copies has been shown in gorilla (Gorilla gorilla) [4, 49]. Despite the large number of structural differences between the human and chimpanzee genomes revealed by in silico whole-genome analyses (reviewed in ), experimental data on copy-number differences between these sister species have been reported for only a few gene clusters (e.g. GH/CSH, LHB/CGB, PRAME, MGC8902, CXYorf1, KGF, CD33rSiglec, NANOG) [16, 20, 52, 53, 59, 68–70].
In addition to creating structural divergence among species, duplications also provide the basis for the diversification of gene functions. For primate-specific gene duplications, there is evidence of variable evolutionary rates among gene copies within and among species, and of different selective constraints acting on different members of gene clusters such as MHC, beta-globin, GH/CSH, PRAME, Rh blood group, CD33rSiglec and beta-defensin genes [71–73]. For example, in the human-chimpanzee comparison, the chimpanzee MHC class I loci A, B and C show lower intraspecies allelic variation than their human counterparts, providing evidence that ancestral chimpanzee populations may have experienced a selective sweep [74, 75]. In the marmoset LHB/CGB region, a switch of functions has occurred between the ancestral LHB and the derived CGB gene. Although both LHB and CGB genes are present at the genomic level, only the CGB gene is expressed, in both pituitary and placenta. Thus, in marmoset, chorionic gonadotropin also carries out the luteinizing function fulfilled by luteinizing hormone in other mammals [76, 77].
Duplicated genes tend to evolve in concert, facilitated by active inter-locus gene conversion that increases and preserves sequence similarity among the gene copies [37, 38, 78]. Concerted evolution within species may lead to erroneous phylogenetic trees and to the overestimation of interspecies divergence dates [20, 79, 80]. Incorporating gene conversion data into the calculation of interspecies divergence remains a challenge that requires detailed knowledge of the particular genomic region.
So far, only a limited number of reports have focused on detailed variation patterns of duplicated gene families in primates (MHC, Globin, GH/CSH and LHB/CGB genes) [7, 29, 81–83]. Nevertheless, the common observation is that, compared with single-copy segments, duplicated regions tend to exhibit higher interspecies divergence, which could be explained by relaxed selective pressures and/or gene conversion spreading mutations. Thus, when calculating divergence in duplicated primate-specific regions, including or excluding intraspecies variation data may have a substantial impact on the divergence estimates.
We compared the human and chimpanzee duplicated LHB/CGB genome clusters and present detailed evidence for parallel, independent duplication events in the two sister species, resulting in a discordant number of CGB genes (6 in human, 5 in chimpanzee). To our knowledge, this is the first detailed report of parallel duplications in these sister species leading to structural divergence. The evolutionary fate of duplicated genes is shaped by the interplay of gene conversion and selection. In the LHB/CGB gene cluster, active gene conversion may have contributed to higher interspecies sequence divergence (in both genic and intergenic sequences) and an altered transition/transversion ratio compared with single-copy loci. This higher divergence remained when intraspecies variation was taken into account. However, the drop in divergence estimates after incorporating intraspecies variation data calls for a reanalysis of previously studied loci, where the human-chimpanzee divergence may be substantially lower than initially calculated. Despite the high sequence homology among LHB/CGB genes (85–99%), there are signs of functional differentiation among the gene copies. Reconstructing the full evolutionary history of the LHB/CGB gene cluster will require further studies with high-quality sequence data from several primate species.
The BAC library of the common chimpanzee (Pan troglodytes), RPCI-43, was obtained from the BACPAC Resource Center at the Children's Hospital Oakland Research Institute (Oakland, CA). To identify BAC clones containing the LHB/CGB genome cluster, we followed the recommended protocols and performed hybridization screening with a PCR product containing LHB-specific sequence amplified from chimpanzee genomic DNA. Probe DNA was labeled with [32P]dCTP by random primer extension using the DecaLabel™ DNA Labeling Kit (MBI Fermentas, Vilnius, Lithuania). BAC DNA was isolated using the NucleoBond® BAC 100 plasmid purification kit (Macherey-Nagel GmbH & Co. KG, Düren, Germany). Two overlapping BAC clones (68P2 and 109B10) containing the LHB/CGB genome cluster were sheared by nebulization into fragments of approximately 5 kb and used for shotgun library construction with the TOPO® Shotgun Subcloning Kit (Invitrogen, Carlsbad, CA) according to the manufacturer's instructions.
Plasmid DNA was purified with the NucleoSpin® Plasmid kit (Macherey-Nagel GmbH & Co. KG) and sequenced on an ABI 3730xl sequencer using the BigDye® Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, Foster City, CA). Plasmid ends were sequenced using M13F and M13R primers; additional primers for primer walking were designed with the web-based version of the Primer3 software. Sequencing primers are available in Additional file 6. The LHB/CGB genome cluster from the two chimpanzee BAC clones was sequenced with an average redundancy of 7×, which was sufficient for assembly. Sequences were assembled using the ContigExpress program from Vector NTI Suite 9 (Invitrogen), and the chimpanzee sequence was compared with the human LHB/CGB genome cluster (GenBank: NG_000019). The full chimpanzee LHB/CGB cluster sequence has been deposited in GenBank (accession number EU000308). Sequence alignments were performed and homologies determined with the web-based ClustalW and with Stretcher, implemented in the EMBOSS package. The aligned sequences of the major transcripts of the chimpanzee and human LHB/CGB genes are given in Additional file 7. Substitution and indel divergences were calculated as the number of substitutions and the number of nucleotides in indels, respectively, divided by the total number of aligned nucleotides in the specific genomic region and expressed as a percentage. Phylogenetic trees were constructed with MEGA3.1, using the Kimura two-parameter model to infer neighbor-joining trees and the branch-and-bound algorithm to find maximum parsimony trees, with 1000 bootstrap replicates.
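The divergence definition and the Kimura two-parameter (K2P) model named above can be sketched as follows, assuming a gapped pairwise alignment as input. The treatment of gap columns (each counted as one indel nucleotide, with all alignment columns in the denominator) is an assumption made for this sketch, not a detail the Methods state; the function names are hypothetical, and MEGA3.1's own implementation may handle gaps and ambiguous bases differently. The K2P distance is the standard d = -½ ln(1 - 2P - Q) - ¼ ln(1 - 2Q), with P and Q the proportions of transitional and transversional differences among compared sites.

```python
import math


def aligned_divergence(aln_a: str, aln_b: str) -> tuple:
    """Substitution and indel divergence (%) following the Methods definition.

    Assumption: "total number of aligned nucleotides" is taken to be the number
    of alignment columns, and each gap column counts as one indel nucleotide.
    """
    if len(aln_a) != len(aln_b) or not aln_a:
        raise ValueError("expected two non-empty aligned strings of equal length")
    subs = indel_nt = 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        if x == "-" or y == "-":
            indel_nt += 1
        elif x != y:
            subs += 1
    n = len(aln_a)
    return 100.0 * subs / n, 100.0 * indel_nt / n


def kimura_2p(aln_a: str, aln_b: str) -> float:
    """Kimura two-parameter distance over gap-free, unambiguous columns."""
    purines, pyrimidines = {"A", "G"}, {"C", "T"}
    sites = ts = tv = 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        sites += 1
        if x == y:
            continue
        if {x, y} <= purines or {x, y} <= pyrimidines:
            ts += 1
        else:
            tv += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = ts / sites, tv / sites
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("distance undefined: sequences too divergent")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


a, b = "ACGTACGTAC", "ACGTGCGT-C"
print(aligned_divergence(a, b))  # (10.0, 10.0)
print(round(kimura_2p(a, b), 4))
```

A matrix of such pairwise K2P distances is the usual input to neighbor-joining tree construction.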
For coding regions, the maximum likelihood method was used to estimate the non-synonymous/synonymous rate ratio ω (= dn/ds) with CODEML, implemented in the PAML package version 4 [88, 89]. Codon frequencies were estimated from the dataset using the F3×4 option; other settings were left at their defaults. The simplest model, M0 (the one-ratio model), was used to estimate ω as an average over all sites. For CGB1, alternative reading frames have been predicted and no functional protein has been characterized so far; we therefore defined the CGB1 mRNA sequence, and subsequently the predicted reading frame, as supported by published experimental data [18, 90]. In parallel, the number of non-synonymous substitutions per non-synonymous site (dn) and of synonymous substitutions per synonymous site (ds) were estimated using an alternative method, the Li93 method [40, 41]. The significance of the difference between dn and ds was examined by a two-tailed Z-test using MEGA3.1. To address which segments of the genes are evolving more rapidly, and to visualize rate heterogeneity in dn and ds, we performed CRANN analysis [47, 48] with the Li93 method using sliding, overlapping windows; the window size was set to 20 codons and the shift size to 10 codons.
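The windowing described above (20-codon windows shifted by 10 codons) can be sketched as follows. This is not CRANN and does not implement the Li93 estimator: it only tallies, per window, codon differences at a single position as synonymous or non-synonymous under the standard genetic code, and reports codons with multiple differences separately. It assumes two aligned, in-frame, gap-free coding sequences containing only A, C, G and T; the function and table names are hypothetical. When the gene length minus the window size is not a multiple of the shift, trailing codons fall outside every window in this sketch.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' = stop).
CODON_TABLE = {
    x + y + z: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, x in enumerate(BASES)
    for j, y in enumerate(BASES)
    for k, z in enumerate(BASES)
}


def sliding_window_counts(cds_a: str, cds_b: str, window: int = 20, step: int = 10):
    """Per-window counts of synonymous / non-synonymous single-base codon changes.

    Returns (first codon, last codon, syn, nonsyn, multi) with 1-based codon
    coordinates; 'multi' counts codons differing at more than one position,
    which this sketch does not resolve into mutational pathways.
    """
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("expected two in-frame coding sequences of equal length")
    codons_a = [cds_a[i:i + 3].upper() for i in range(0, len(cds_a), 3)]
    codons_b = [cds_b[i:i + 3].upper() for i in range(0, len(cds_b), 3)]
    n = len(codons_a)
    rows = []
    for start in range(0, max(n - window, 0) + 1, step):
        end = min(start + window, n)
        syn = nonsyn = multi = 0
        for ca, cb in zip(codons_a[start:end], codons_b[start:end]):
            diffs = sum(x != y for x, y in zip(ca, cb))
            if diffs == 0:
                continue
            if diffs > 1:
                multi += 1
            elif CODON_TABLE[ca] == CODON_TABLE[cb]:
                syn += 1
            else:
                nonsyn += 1
        rows.append((start + 1, end, syn, nonsyn, multi))
    return rows


codons = ["CTG"] * 30
codons[0], codons[5], codons[25] = "CTA", "ATG", "CCC"  # syn, non-syn, two-base change
for row in sliding_window_counts("CTG" * 30, "".join(codons)):
    print(row)
```

A real dn/ds profile would additionally divide these counts by the numbers of synonymous and non-synonymous sites in each window and correct for multiple hits, which is what the Li93 method provides.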
Repetitive elements were detected with the RepeatMasker program.
chorionic gonadotropin beta subunit gene
family with sequence similarity 90
human chorionic gonadotropin
keratinocyte growth factor
luteinizing hormone beta subunit gene
major histocompatibility complex
neuronal apoptosis inhibitory protein
neurotrophin 5 gene
preferentially expressed antigen of melanoma
neurotrophin 6 pseudogene(s)
Rho-dependent protein kinase
RuvB-like 2, homologue of the bacterial RuvB gene
We thank Tõnu Margus, Siim Sõber, Tarmo Annilo, Pekka Ellonen, Mari Kaunisto, Maija Wessman and Verneri Anttila for discussions and advice, and Kärt Tomberg for editing the English language. M.L. is a Wellcome Trust International Senior Research Fellow (grant no. 070191/Z/03/Z) in Biomedical Science in Central Europe and an HHMI International Scholar (grant #55005617). Additionally, the study has been supported by the Estonian Ministry of Education and Science core grant no. 0182721s06 and the Estonian Science Foundation grant no. 5796 (M.L., P.H.), as well as by personal scholarships from the Centre for International Mobility (CIMO), the Kristjan Jaak Stipend Program and the World Federation of Scientists (P.H.), the Center of Excellence grant of Complex Disease Genetics of the Academy of Finland and the Sigrid Juselius Foundation (J.S., A.P.).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.