- Research article
Chaperonin genes on the rise: new divergent classes and intense duplication in human and other vertebrate genomes
BMC Evolutionary Biology volume 10, Article number: 64 (2010)
Chaperonin proteins are well known for the critical role they play in protein folding and in disease. However, the recent identification of three diverged chaperonin paralogs associated with the human Bardet-Biedl and McKusick-Kaufman Syndromes (BBS and MKKS, respectively) indicates that the eukaryotic chaperonin-gene family is larger and more differentiated than previously thought. The availability of complete genome sequences makes possible a definitive characterization of the complete set of chaperonin sequences in human and other species.
We identified fifty-four chaperonin-like sequences in the human genome and similar numbers in the genomes of the model organisms mouse and rat. In mammal genomes we identified, besides the well-known CCT chaperonin genes and the three genes associated with the MKKS and BBS pathological conditions, a newly-defined class of chaperonin genes named CCT8L, represented in human by the two sequences CCT8L1 and CCT8L2. Comparative analyses from several vertebrate genomes established the monophyletic origin of chaperonin-like MKKS and BBS genes from the CCT8 lineage. The CCT8L gene originated from a later duplication also in the CCT8 lineage at the onset of mammal evolution and duplicated in primate genomes. The functionality of CCT8L genes in different species was confirmed by evolutionary analyses and in human by expression data. Detailed sequence analysis and structural predictions of MKKS, BBS and CCT8L proteins strongly suggested that they conserve a typical chaperonin-like core structure but that they are unlikely to form a CCT-like oligomeric complex. The characterization of many newly-discovered chaperonin pseudogenes uncovered the intense duplication activity of eukaryotic chaperonin genes.
In vertebrates, chaperonin genes, driven by intense duplication processes, have diversified into multiple classes and functionalities that extend beyond their well-known protein-folding role as part of the typical oligomeric chaperonin complex, emphasizing previous observations on the involvement of individual CCT monomers in microtubule elongation. The functional characterization of newly identified chaperonin genes will be a challenge for future experimental analyses.
Hsp60-like chaperonin proteins are well known for their role in assisting protein folding and in protecting cells from the deleterious effects of stress [1–5]. The eukaryotic cell expresses representatives of two distinct groups of chaperonin genes that are otherwise typical of bacteria (Group I) or archaea (Group II). In eukaryotes, Group I chaperonins are mostly expressed in mitochondria and chloroplasts, and Group II chaperonins are found in the eukaryotic cytosol [1, 6–10]. Chaperonin proteins form typical multi-subunit double-ringed structures collectively called "chaperonins" [9–13]. The Group I chaperonins are typically formed by the products of a single gene (groEL in bacteria; hsp60/cpn60 in mitochondria) assembled into a 14-subunit double-ringed structure in bacteria and into a double- or single-ringed structure in mitochondria. Eukaryotic Group II chaperonin proteins assemble in a similar double-ringed oligomeric structure, called TRiC or CCT complex, composed of 16 subunits that in human are encoded by nine distinct genes (tcp1/cct1, cct2-5, cct6A-B, cct7-8) [8–10]. The CCT complex is mostly known for its role in folding the cytoskeleton proteins actin and tubulin [7, 16], and mutations in individual CCT subunits lead to defects in the functioning of the cytoskeleton and to mitotic arrest.
As for other chaperones, the malfunctioning of chaperonin proteins has been associated with various human pathological conditions, the chaperonopathies [18–20]. In this respect, besides the canonical cct and cpn60 genes described above, three divergent hsp60-like genes have been more recently identified [21–23] in association with pathological conditions. One gene, MKKS, was named for its association with the developmental disease McKusick-Kaufman Syndrome and was soon after also identified as BBS6 for its association with the Bardet-Biedl Syndrome (BBS), another developmental condition involving cilium-related dysfunction. More recently two other hsp60-like BBS genes, named BBS10 and BBS12, have been identified among fourteen genes (BBS1 to BBS14) so far associated with BBS. The protein products of MKKS/BBS6, BBS10 and BBS12 localize to the basal body of cilia and to the centrosome [26–28]. We will hereafter refer to the MKKS/BBS6 gene as MKKS, and collectively to the three hsp60-like BBS genes as the "BBS genes". The identification of these genes provides new perspectives on the spectrum of functionalities of Hsp60-like proteins in eukaryotes and on their role in development.
The recognition of chaperonopathies has increased the importance of elucidating the entire set of chaperone genes present in the human genome. The work reported here was conceived to: a) identify all Hsp60-like sequences encoded in the human and other genomes including all diverged chaperonin genes; b) reconstruct the evolutionary origins and relations of diverged chaperonin genes; c) distinguish with bioinformatics methods functional genes from pseudogenes; d) characterize structural properties of the corresponding proteins. We mostly devoted our attention to the characterization of the evolutionary history and structural properties of newly or recently identified sequences, referring the reader to the vast amount of published literature for information on functional/structural properties and the evolutionary history of mitochondrial Cpn60 or CCT-complex proteins.
Exhaustive searches of hsp60-like sequences were carried out in human and other genomes following and extending our "chaperonomics" methodological protocol. The extensive analysis of the genomes of human and other vertebrate species led to the identification and characterization of many previously unknown sequences and to the discovery of a new, mammal-specific class of chaperonin proteins. Classification, evolutionary analysis and structural characterization of diverged chaperonin-like sequences should provide valuable information for future studies on the functional roles of these proteins.
Chaperonin sequences in the human genome
To identify all human hsp60-like sequences we queried the human genome using the nine human CCT subunit and mitochondrial Cpn60 sequences. Analogous extensive searches were performed in the mouse and rat genomes using corresponding queries. In the human genome, we found a total of 54 sequences with significant similarity to Hsp60 proteins (Tables 1 and 2). Fifteen sequences had an NCBI Entrez gene descriptor assigned. Nine of these corresponded to the canonical CCT-subunit sequences and one, HSPD1, encoded the mitochondrial Cpn60 protein. Three sequences corresponded to the BBS genes MKKS, BBS10 and BBS12. We recovered two additional uncharacterized sequences designated in the NCBI Entrez Gene database as CCT8L1 and CCT8L2. Besides these complete Hsp60-like sequences, a sequence domain conserved across eukaryote species with highest similarity to the apical domain of the CCT3 protein has also been reported in PIKFYVE, a kinase belonging to the Fab1p protein family involved in corneal pathological conditions. In addition, we identified 39 other human hsp60 sequences that did not correspond to a gene descriptor in the NCBI Entrez Gene database (Table 2). All of these sequences contained in-frame stop codons or frame-shifts, suggesting that they were most likely pseudogenes. Thirty-five of these had not been described in the Pseudogene.org pseudogene database and 33 were not listed in the Ensembl database, and are here annotated and classified for the first time. In analogous searches of the complete genomes of mouse and rat, we identified in each genome 14 chaperonin genes (nine for the canonical CCT monomers, one for the mitochondrial Cpn60, three BBS genes and one CCT8L gene), 38 pseudogenes in mouse and 61 pseudogenes in rat (see additional file 1: Table S1, for mouse sequences, and additional file 2: Table S2, for rat sequences).
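The corruption criterion used above (in-frame stop codons or frame-shifts) can be illustrated with a minimal sketch. It is not the pipeline used in this work: the function names are hypothetical, the input is assumed to be a coding-frame nucleotide string already extracted from an alignment to the parental gene, and a length that is not a multiple of three is used only as a crude proxy for a frame-shift, which in practice would be called from the alignment itself.

```python
# Standard-code stop codons (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def inframe_stops(cds):
    """Return 0-based codon indices of stop codons found before the final codon."""
    seq = cds.upper().replace("-", "")  # drop alignment gaps, if any
    usable = len(seq) - len(seq) % 3    # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # The last complete codon is allowed to be the natural terminator.
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def looks_corrupted(cds):
    """Crude flag (assumption): premature stop, or length not a multiple of three."""
    seq = cds.upper().replace("-", "")
    return bool(inframe_stops(seq)) or len(seq) % 3 != 0
```

A sequence flagged in this way is only a pseudogene candidate; as discussed later in the text, truncated products may still be expressed, so the flag is best combined with Ka/Ks and expression evidence.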
Evolutionary origins of human BBS and CCT8L genes
A maximum-likelihood (ML) phylogenetic tree of human chaperonin-like proteins (Figure 1a) indicated that Hsp60-like BBS proteins are monophyletic (bootstrap support 86%) and that their common ancestor derived from a duplication event in the CCT8 lineage (bootstrap support 88%). The tree also showed that the unique ancestor of the two closely related genes CCT8L1 and CCT8L2 also originated in the CCT8 lineage from a more recent duplication event (bootstrap support 75%). The relation of BBS and CCT8L proteins with the CCT8 chaperonin subunit was confirmed with strong conditional probability support (0.99) by Bayesian tree construction (Figure 1b).
Although the association of BBS and CCT8L proteins with the CCT lineage was robustly supported, the high divergence of these sequences could produce clustering in the trees due to long-branch attraction. To address this concern, we built independent ML trees for each BBS or CCT8L sequence adding them separately to the tree of CCT subunits. All individual trees confirmed with strong bootstrap support the association of each BBS or CCT8L lineage with the CCT8 lineage (see additional file 3: Figure S1, additional file 4: Figure S2, additional file 5: Figure S3 and additional file 6: Figure S4). A ML evolutionary tree including hsp60-gene homologs found in the genomes of eighteen other vertebrate species, including representatives of several mammals, chicken, frogs, and fish, also confirmed the origin of BBS and CCT8L genes from the CCT8 lineage (see additional file 7: Figure S5).
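For readers unfamiliar with the bootstrap percentages quoted above, the following sketch shows how support for a clade is conventionally tallied: it is the percentage of replicate trees in which the clade is recovered as a monophyletic group. The representation of a tree as a set of clades, the function name and the four toy replicates are illustrative assumptions and do not reproduce the trees of Figure 1.

```python
def clade_support(replicate_trees, clade):
    """Percentage of replicate trees that contain `clade` as a monophyletic group.

    Each tree is represented (assumption) as a set of clades,
    and each clade as a frozenset of tip labels.
    """
    target = frozenset(clade)
    hits = sum(1 for tree in replicate_trees if target in tree)
    return 100.0 * hits / len(replicate_trees)


# Toy replicates, invented for illustration only.
replicates = [
    {frozenset({"MKKS", "BBS10", "BBS12"}),
     frozenset({"MKKS", "BBS10", "BBS12", "CCT8"})},
    {frozenset({"MKKS", "BBS10", "BBS12"}), frozenset({"BBS10", "BBS12"})},
    {frozenset({"MKKS", "CCT8"}), frozenset({"BBS10", "BBS12"})},
    {frozenset({"MKKS", "BBS10", "BBS12"}),
     frozenset({"MKKS", "BBS10", "BBS12", "CCT8"})},
]

bbs_support = clade_support(replicates, {"MKKS", "BBS10", "BBS12"})
```

In this toy set the BBS clade is recovered in three of four replicates. Real analyses use hundreds of replicates produced by resampling alignment columns.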
We did not find CCT8L genes in the genomes of chicken, Xenopus laevis, or Danio rerio, representatives respectively of the reptile/bird, amphibian and fish lineages. However, among mammals we identified orthologs of CCT8L genes in genomes not only of placental mammals (Eutheria), but also of the marsupial opossum (Metatheria) and of the egg-laying platypus (Prototheria), suggesting that the CCT8L gene class originated at the onset of mammal evolution. All CCT8L gene orthologs were intron-less, indicating that their ancestor originated from a retro-transposition event. Two copies of CCT8L sequences were found in human and chimp and one CCT8L gene in all other genomes examined, including those from the other primate rhesus monkey (Macaca mulatta) and gray mouse lemur (Microcebus murinus) (Figure 2), suggesting that a duplication of the CCT8L gene occurred in Hominoidea after their separation from old world monkeys. However, the lone gene copy of CCT8L identified in rhesus monkey clustered with CCT8L1 in evolutionary trees (Figure 2), suggesting an earlier duplication of the gene and successive loss of the CCT8L2 copy from the genome of rhesus monkey. Close inspection of protein alignments revealed that the rhesus monkey CCT8L sequence included an anomalously diverged segment of about 50 amino acids of uncertain alignment. Excluding this segment from the analysis we obtained a different and more robustly supported tree topology (75% vs. 20% bootstrap value, see additional file 8: Figure S6, panels a and b), consistent with a later duplication of the CCT8L gene in Hominoidea. The tree also indicated that the removed segment was alone responsible for the overall higher evolutionary rate predicted for this sequence (see additional file 8: Figure S6).
Differentiation rate of BBS and CCT8L proteins
The branch lengths of the trees shown in Figure 1 indicate that BBS and CCT8L proteins have differentiated at much higher rates than CCT subunits. We applied a newly-developed, unbiased measure of differentiation called "B-index" (see Methods) to calculate differentiation of MKKS, BBS10 and BBS12 proteins from their respective last ancestor common to Actinopterygii (ray-finned fishes) and Sarcopterygii (including tetrapods), determined by rooting the trees with CCT8 proteins from corresponding fish and tetrapod species. Similarly, we calculated differentiation of CCT8L proteins from a eutherial ancestor rooting their tree with corresponding sets of CCT8 proteins (see footnotes of Table 3 and legend for Figure 2 for species represented in each tree). We estimated for the MKKS family an average evolutionary distance from their root of almost 0.7 substitutions per site, corresponding to a 6-fold higher rate of differentiation compared to the number of substitutions estimated in CCT8 proteins over the same period of time. For BBS10 and BBS12, we calculated a distance of about 1.0-1.2 substitutions per site, corresponding to a substitution rate about 8-10 times higher than in CCT8. Finally, for the mammal-specific family of CCT8L proteins, we estimated an evolutionary distance from their mammal root of about 0.3 substitutions per site. The smaller divergence of CCT8L proteins compared to BBS proteins reflects the more recent origin of the CCT8L gene. However, when scaled to the evolution of CCT8 sequences over the same periods of time, the substitution rate of CCT8L proteins was about 14-15 times higher than in CCT8 and 1.4-2.3 times higher than in BBS proteins.
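The relative figures quoted above can be checked with simple arithmetic. The sketch below does not implement the B-index (which is defined in Methods); it only divides the fold-increases over CCT8 reported in this paragraph. Under the assumption that the lower CCT8L figure (14-fold) is the one compared with the BBS families, the quoted range of 1.4 to 2.3 is approximately recovered. Variable and function names are illustrative.

```python
def relative_rate(fold_a, fold_b):
    """Ratio of two substitution rates, each already scaled to the CCT8 rate."""
    return fold_a / fold_b


# Fold-increases over CCT8 as quoted in the text.
MKKS_FOLD = 6.0          # MKKS family
BBS10_12_FOLD_HIGH = 10.0  # upper figure for BBS10/BBS12 (8-10 fold)
CCT8L_FOLD_LOW = 14.0    # lower figure for CCT8L (14-15 fold)

cct8l_vs_bbs10_12 = relative_rate(CCT8L_FOLD_LOW, BBS10_12_FOLD_HIGH)
cct8l_vs_mkks = relative_rate(CCT8L_FOLD_LOW, MKKS_FOLD)

# Distance implied for CCT8 over the MKKS time span: 0.7 substitutions/site
# divided by the 6-fold difference (a back-calculation, not a reported value).
implied_cct8_distance = 0.7 / MKKS_FOLD
```

The back-calculated CCT8 distance is small, roughly a tenth of a substitution per site, which agrees qualitatively with the short CCT branches in Figure 1.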
Functional constraints in the evolution of CCT8L genes
We tested the functionality of CCT8L genes from several species by estimating ratios of non-synonymous to synonymous substitution rates (Ka/Ks) along their respective lineages (see Methods). The results of this analysis are shown in Table 4, which indicates the gene(s) analyzed (foreground), the two genes used to identify foreground and background branches, the estimated Ka/Ks values and their significance. The evolutionary lineages for which Ka/Ks values were evaluated correspond to the branch numbers identified in the overall tree topology shown in Figure 3. In this tree are represented the "molecular tree" of mammal phylogenetic relations, the gene duplication event involving the CCT8L gene family in primates as inferred by this analysis, and the pre-mammal separations of the CCT7, CCT8 and CCT8L families of paralogs. This topology is in agreement with the evolutionary tree of CCT8L genes (Figure 2) with the only exception of the weakly supported position of the CCT8L sequence from rhesus monkey (see above). The highly significant constraints in non-synonymous substitution rates (Ka/Ks < 1.0) estimated in the overall evolution of the CCT8L family (Table 4, foreground genes: "All CCT8L1/2") indicated that the CCT8L sequences are genes generally expressing functional proteins. In evaluating Ka/Ks ratios for individual CCT8L gene lineages (Table 4), significantly constrained evolution (Ka/Ks < 1.0) was detected for branches leading to most sequences, including those of murids, lemur, cow, dog, elephant, marsupial, and to the human CCT8L1 and CCT8L2 group along the hominoid lineage. Constrained evolution was also estimated for the CCT8L genes of armadillo and rhesus monkey, and for human CCT8L1 and human and chimp CCT8L2 after divergence of human and chimp, although in these cases Ka/Ks values did not reach significance.
In the cases of the human and chimp CCT8L1 and CCT8L2 genes, the lack of significance can be attributed to the low power of the test, since few mutations accumulated after separation of these sequences (see additional file 9: Table S3). In the case of rhesus monkey CCT8L, we found that its relatively high estimate of Ka/Ks (= 0.73) was due to the previously mentioned 50-amino-acid diverged region within this sequence. After removing this region we estimated Ka/Ks = 0.55. Only for the lineage of chimp CCT8L1 did we estimate Ka/Ks ≅ 1, consistent with differentiation of a non-functional sequence. Since this sequence was also characterized by an internal stop codon and a frame-shift, all evidence strongly suggests that chimp CCT8L1 is a pseudogene.
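The significance test itself is described in Methods and is not reproduced here. As a generic illustration of how a departure of Ka/Ks from 1.0 on a foreground branch is commonly assessed, the sketch below performs a likelihood-ratio test between two nested models, one with the foreground Ka/Ks fixed at 1 and one with it free, assuming a chi-square null distribution with one degree of freedom. The function name and any log-likelihood values passed to it are hypothetical.

```python
import math


def lrt_pvalue(lnl_fixed, lnl_free):
    """Likelihood-ratio test for one extra free parameter.

    lnl_fixed: log-likelihood with foreground Ka/Ks constrained to 1.0
    lnl_free:  log-likelihood with foreground Ka/Ks estimated freely
    Returns (test statistic, p-value under chi-square with 1 d.f.).
    """
    stat = max(2.0 * (lnl_free - lnl_fixed), 0.0)
    # For 1 d.f., the chi-square survival function equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

With few substitutions on a branch, the two log-likelihoods differ little and the test has low power, which is the situation described above for the human and chimp CCT8L copies after their divergence.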
To assess the functionality of human CCT8L sequences we investigated their expression profiles in comparison to those of human CCT monomers and BBS genes (see additional file 10: Table S4). Expression of CCT8L2 was confirmed by fifteen ESTs mostly identified from the testis, whereas only one EST identified as a CCT8L1 transcript has been so far reported (NCBI UniGene database, November 20, 2009). Querying the NCBI GEO microarray database, we found 542 expression-profile records identifying expression of CCT8L2, and none identifying expression of CCT8L1 (as of November 20, 2009). It must be noted, however, that CCT8L2 and CCT8L1 have similarity of 97.3% at the DNA level. Similarly to CCT8L2, another mammal-specific chaperonin gene, CCT6B, is also expressed almost exclusively in the testis, from which 160 ESTs have been reported versus an average of 4.4 ESTs (from 0 to 10 per tissue) found in all other tissues.
We identified in the human genome 39 sequences with significant similarity to CCT or HSPD1 genes that either were short fragments or were characterized by in-frame stop codons or frame-shifts. Based on their corruption, we classified these sequences as pseudogenes (Table 2). Similarly, searching the mouse and rat genomes we identified 38 and 61 pseudogenes, respectively (see additional file 1: Table S1 and additional file 2: Table S2). Most of these sequences have not been previously reported and are here systematically annotated and classified for the first time.
Based on phylogenetic-tree reconstructions (see additional file 11: Figure S7) or on similarity for the most corrupted sequences, we identified the association of 17 pseudogenes from human, 16 from mouse and 29 from rat with one of the nine CCT genes. None of the pseudogenes were related to MKKS, BBS10, BBS12 or CCT8L. To estimate the time of origin of the pseudogenes, we constructed trees using their translated sequences and chaperonin subunits from various vertebrate species (see additional file 12: Figure S8, and additional file 13: Figure S9). The trees indicated that all recognizable human CCT pseudogenes originated in the mammal lineage after separation from the reptile/bird lineage.
Of particular interest were the evolutionary relations of CCT6 genes and pseudogenes. Two CCT6 gene copies (CCT6A and CCT6B) were found, besides placental mammals, also in platypus and in opossum (see additional file 11: Figure S7), suggesting that the duplication of the CCT6 gene occurred in mammal evolution before separation of Theria (marsupial and placental mammals) and Prototheria (monotremes). We constructed an evolutionary tree of mammal CCT6 genes and pseudogenes (Figure 4) rooted by the corresponding gene sequences from chicken and frog (the diverged sequence Oa_con2651 from platypus was excluded from this tree to avoid long-branch attraction). Surprisingly, all recognizable human, mouse, and rat pseudogenes belonging to the CCT6 class branched in the tree from the CCT6A lineage after separation of the platypus, marsupial and placental mammal lineages.
Twenty-two pseudogenes in human (Table 2), and 22 and 32 pseudogenes in mouse and rat, respectively (see additional file 1: Table S1 and additional file 2: Table S2), associated with the mitochondrial HSPD1 gene (Group I cpn60). Evolutionary trees incorporating all pseudogenes from different vertebrate species were uninformative due to the presence among the pseudogenes of highly corrupted sequences, resulting in extensive long-branch attraction (not shown). An ML tree built using only translations of the most conserved pseudogenes (Figure 5) showed weakly supported but consistent association of the human pseudogenes with HSPD1 from primates, whereas pseudogenes from mouse and rat all associated with murid Hspd1 sequences, also indicating their relatively recent origin.
Ka/Ks ratio in the evolution of putative pseudogene sequences
Our characterization of many hsp60 sequences as pseudogenes was based on the presence of signs of corruption in the sequence (in-frame stop codons and frame-shifts). However, in-frame stop codons and frame-shifts may correspond to truncated proteins that are still functional. For example, although human HSPD1-5P and HSPD1-6P sequences contain signs of sequence corruption, EST data indicate that these sequences are expressed and possibly functional (see additional file 14: Table S5). To confirm our characterization, we estimated Ka/Ks ratios in trees that identified the pseudogene-sequence lineage (branch) including as out-group its parental gene and the orthologous gene sequence from chicken (see Methods). The results of these analyses (Table 2) showed in most cases Ka/Ks values not significantly different from 1.0, as expected in the differentiation of pseudogene sequences not constrained by coding of functional amino acids. Significant differences in mutation rate were estimated in the case of four sequences. These sequences, however, contained multiple in-frame stop codons and frame-shifts (Table 2).
Structural features of BBS and CCT8L proteins
Because of their high sequence divergence, it is unclear whether BBS and CCT8L Hsp60-like proteins conserve the typical fold of chaperonin subunits and their ability to assemble into typical oligomeric chaperonin complexes. Chaperonin monomers are characterized by three structural domains (apical, intermediate and equatorial) with distinct functional roles and it was relevant to investigate whether BBS and CCT8L proteins conserve each of the domains typical of chaperonins. Experimental models of eukaryotic Group II chaperonins are not available but their structural properties can be inferred by comparison with their closest relative, the archaeal thermosome. To infer tertiary-structure conservation in BBS and CCT8L proteins we predicted the secondary structure for each family from alignments of multiple sequences, excluding structure and sequence information from other families. The results of these predictions are schematically represented in Figure 6a, in relation to the secondary structure description of the PDB structure 1a6d chain A of the thermosome subunit ThsA from Thermoplasma acidophilum  (see additional file 15: Figure S10, additional file 16: Figure S11, additional file 17: Figure S12, additional file 18: Figure S13, additional file 19: Figure S14, and additional file 20: Figure S15 for detailed representations of multiple alignments, secondary structure predictions and alignments to the secondary-structure elements of ThsA). In Figure 6a, the secondary structure description of ThsA is shown (line "1a6d") in relation to the position of the equatorial, intermediate, and apical domains. The position of these elements in the tertiary structure of ThsA is represented in Figure 6b. Results of a blind test of the performance of the method on the corresponding ThsA sequence are also shown (Figure 6a, line "Ta_ThsA"). 
In this test most strand and helix elements (all "core" helices) described in the crystal structure were correctly predicted by the method, increasing our confidence in the reliability of other predictions. As expected, extensive conservation of predicted secondary-structure elements was also obtained from the alignment of human CCT sequences (Figure 6a, line "CCT") with only a few discrepancies involving mostly short beta strands (4, 5, 18, and 21) and one short helix (P) exposed at the external surface of the archaeal thermosome complex. Secondary-structure predictions for mammal CCT8L and for vertebrate MKKS, BBS10 or BBS12 sequences were also largely consistent with the secondary-structure description of thermosome proteins. In the equatorial domain, CCT8L and BBS structure predictions corresponded to the mostly alpha-helical composition of this region. Variations were more obvious in BBS12 and involved mostly terminal elements of helices (most notably helices P and Q) and exposed beta-strands (strands 19-21). In the intermediate domain the core helical-bundle elements (helices F, G, and K) as well as the extensive beta-sheet composition of this region were predicted in all BBS and CCT8L proteins. Exceptions were, in all sequences, the two short strands 5 and 6, which are part of an external elongated loop in the thermosome structure, and, in BBS12, the N-terminal part of helix K, which in the thermosome protrudes towards the central cavity covering the ATP hydrolysis site (Figure 6b). The apical domain is formed in the thermosome by a 4-strand anti-parallel beta-sheet (strands 9, 10, 15, and 16) with strand 10 extending into a second parallel beta-sheet (strands 10, 12, 13, and 14). The two sheets are flanked by a helix (J) and are surmounted by a structure composed of two contacting helices (H and I) and an extended loop including strand 11. All helices and most strands of the apical domain were recognized in BBS sequences.
Most obvious differences were observed in BBS12 proteins, where the long apical helix H was predicted to be shortened, and in CCT8L, where helix I and strand 11 were not predicted.
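The agreement between a predicted secondary structure and a crystallographic description, as in the blind test on ThsA, can be summarized numerically. The sketch below computes a simple per-residue agreement over three states (H helix, E strand, C coil), often called Q3. This measure is introduced here only for illustration; the comparison reported above was made element by element (Figure 6a), and the function name and example strings are assumptions.

```python
def q3_agreement(predicted, observed):
    """Fraction of aligned positions assigned the same state (H, E or C).

    Both strings must have the same length; positions where either string
    carries a gap character '-' are skipped.
    """
    if len(predicted) != len(observed):
        raise ValueError("state strings must be aligned to the same length")
    pairs = [(p, o) for p, o in zip(predicted, observed)
             if p != "-" and o != "-"]
    if not pairs:
        return 0.0
    return sum(1 for p, o in pairs if p == o) / len(pairs)


# Invented example: one helix predicted one residue too short.
example_q3 = q3_agreement("CHHHHCCEEEC", "CHHHHHCEEEC")
```

A per-residue score penalizes shortened helix termini, such as those predicted for BBS12, only mildly, which is why an element-level comparison is more informative for the question addressed here.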
Differentiation of monomer-monomer interaction regions in BBS and CCT8L proteins
To investigate the potential of CCT8L and BBS proteins to establish intra-ring and inter-ring monomer-monomer contacts, we investigated the relative conservation of predicted contact positions in CCT, BBS and CCT8L sequences. We identified potential contact positions in these families based on homology to the positions involved in inter-monomer contacts in the crystal structure of the T. acidophilum thermosome complex (PDB code 1a6d). After identifying all contact positions in CCT monomers, we distinguished among them those that conserved similar amino acid types across the nine monomers. We counted how many amino acid types observed in all or in conserved contact positions of CCT monomers were also observed in the T. acidophilum ThsA sequence, in human CCT8Ls or in human BBS sequences (Table 5). A complete list of all and conserved positions considered and of the residue types observed in these positions in all sequences can be found in additional file 21: Table S6. ThsA and CCT subunits conserve 89% similarity in monomer-monomer contact positions, which is substantially higher than the average similarity (62%-66%) of all homologous positions between the two families. The higher similarity of monomer-monomer contact regions is consistent with functional conservation of these positions between the two families. In contrast, the high rate of differentiation, relative to the global average, shown in putative monomer-monomer contact positions in BBS or CCT8L sequences (Table 5) suggests a loss of capability to associate into a typical CCT-like oligomeric complex. This result is consistent with the presence in BBS proteins of inserted elements (Figure 6) that would interfere with formation of the complex [22, 23].
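The counting procedure described above can be sketched as follows. For each contact position, the residues observed across the reference (CCT) rows of an alignment define an allowed set, and a query sequence scores a hit if its residue at that column belongs to the set. This is a simplification: the analysis above compares amino acid types, whereas the sketch uses exact residue identity. The function name, the toy alignment and the column indices are illustrative assumptions and are not taken from Table S6.

```python
def shared_fraction(reference_rows, query_row, positions):
    """Fraction of `positions` where the query residue is also seen in the references.

    reference_rows: aligned reference sequences (equal length strings)
    query_row:      aligned query sequence
    positions:      0-based alignment columns treated as contact positions
    """
    hits = 0
    for pos in positions:
        allowed = {row[pos] for row in reference_rows} - {"-"}
        if query_row[pos] in allowed:
            hits += 1
    return hits / len(positions)


# Toy alignment, invented for illustration only.
cct_rows = ["GDGTT", "GDGST", "ADGTT"]
query = "GEGTA"
contact_columns = [0, 1, 3, 4]

toy_fraction = shared_fraction(cct_rows, query, contact_columns)
```

Running the same function over all homologous columns gives the global background against which the contact-position value is judged, which is the comparison underlying Table 5.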
Conservation of ATP-binding and hydrolysis residues in BBS and CCT8L proteins
We compared conservation in CCT, BBS and CCT8L sequences of the ATP-binding and ATP-hydrolysis motifs typical of chaperonins of Group II (Figure 7). Although there is considerable variation among BBS and CCT8L sequences at some of the ATP-binding positions, we observed complete conservation of the crucial ATP-binding dipeptide Gly-Pro, suggesting that these otherwise divergent proteins conserve ATP-binding ability. In the ATP-hydrolysis sites, substantial loss of conservation has been reported in MKKS and in BBS12. In the CCT8L, MKKS and BBS10 families, unusual substitutions are observed in phosphate-binding positions and within the catalytic triad, where only Asp is conserved in MKKS. The effect that these mutations may have on the hydrolytic activity in these protein families is unclear. The high level of differentiation of this region in BBS12 (where the ATP-hydrolysis motif is not recognizable) strongly suggests that BBS12 has lost hydrolytic activity.
Conservation of substrate-binding positions
Three positions crucial in determining substrate-specificity of CCT monomers have been identified in the distal region of helix I in the apical domain. We analyzed conservation at these positions across vertebrate species in all Group II chaperonin families and in the Fab1_TCP domain across vertebrate orthologs of the PIKFYVE protein kinase (Table 6). These positions are strikingly conserved within each CCT monomer type (with the exception of CCT6B) across species and are characteristically different between monomer types. They are mostly conserved also in the Fab1_TCP domain across vertebrate sequences. In contrast, in BBS and, particularly, in CCT8L sequences, the homologous positions are significantly more differentiated.
We identified the full complement of chaperonin hsp60 genes and pseudogenes encoded in the human genome and, for comparison, in the genomes of the model organisms mouse and rat. We delimited the set of hsp60 genes encoded in the human genome to: a) nine canonical cct genes (CCT1 to CCT8 including CCT6A and CCT6B) involved in formation of the CCT complex; b) the cpn60 gene (HSPD1) of mitochondrial origin; c) the three highly diverged hsp60-like BBS genes MKKS, BBS10 and BBS12; and d) a newly characterized class of genes, CCT8L, represented in human by CCT8L1 and CCT8L2. We also identified a plethora of pseudogene sequences, many of which had not been previously reported. The comparative analyses of these families of functional genes and of their pseudogenes revealed their evolutionary history and relationships.
In contrast to the uncertainty of the duplication pattern of canonical CCT subunits (our results and [38, 39]) the origin of Hsp60-like BBS and CCT8L proteins was unambiguously identified by phylogenetic tree reconstructions. Our analyses indicated that hsp60-like BBS genes originated monophyletically from a gene duplication event in the CCT8 gene lineage. In addition, we determined that the CCT8L family also originated in the CCT8 lineage, from a more recent retrotransposition event. The presence of this gene family in placental mammals, marsupials and monotremes but not in reptiles/birds or other vertebrate species, indicates that this family originated at the onset of mammal evolution, before divergence of Theria and Prototheria. Presence of two highly similar CCT8L genes (CCT8L1 and CCT8L2) in the genomes of human and chimp and of a single copy in other mammal genomes, including rhesus monkey, suggests that the duplication of this gene occurred in the ape lineage (Hominoidea) after its divergence from the old-world monkeys (Cercopithecidae). Multiple lines of evidence gathered in this work indicate that CCT8L sequences (and at least one of the two paralogs in Hominoidea) are functional genes: (i) reduced rates of non-synonymous mutation were estimated along their lineages, as expected for functionally-constrained protein-coding genes; (ii) pseudogenes as ancient as or more recent than the CCT8L genes were heavily degenerated and no pseudogenes pre-dating mammal evolution could be identified. In contrast, although CCT8L sequences originated early in mammal evolution, they did not show signs of degeneration (with the exception of the chimp CCT8L1 ortholog); (iii) multiple EST and microarray data have been collected for CCT8L2, mostly from testis, and one EST for CCT8L1 has been reported from placental tissue (as per the UniGene EST and GEO expression data, November 23, 2009).
These features taken together are strong evidence that at least CCT8L2 in Hominoidea and the lone CCT8L gene in other mammal lineages encode functional proteins. The sparse expression of CCT8L1 in human and the presence of one in-frame stop codon and one frame-shift in its orthologous sequence from chimp raise doubts about the functionality of this sequence.
Numerous sequences associated with cct or cpn60 genes found in the human, mouse or rat genomes were classified as pseudogenes based on the presence of internal stop codons, frame-shifts and non-significant difference in synonymous and non-synonymous mutation rates. Among them, the sequences HSPD1-5P and HSPD1-6P appear to be expressed based on EST analysis (see additional file 14: Table S5) and may represent instances of expressed pseudogenes. A general explosion of pseudogene generation in the human and murid lineages after they separated from the carnivore lineage has been reported. Our analysis of chaperonin pseudogenes is consistent with this observation, although their relatively high rate of degeneration suggests that pseudogenes generated before the origin of mammals may have degraded beyond recognition. The intense duplication of chaperonin sequences, witnessed by the many pseudogenes identified in the human and murid genomes, very likely provided opportunities for multiple paralogy, resulting in the proliferation of chaperonin classes in the vertebrate and mammal lineages.
Although the Hsp60-like BBS and CCT8L protein families have differentiated considerably from the canonical CCT subunits and among themselves, our analyses indicated that they still conserve the overall three-domain structure typical of CCT proteins. Structure and sequence variations predicted for their apical domains may reflect distinctive substrate specificities. In particular, the lack of conservation at positions crucial in providing substrate specificity to CCT monomers suggests that BBS and CCT8L proteins may interact with their substrate(s) in regions different from those used by the canonical CCT subunits. Sequence differentiation patterns and the acquisition of inserted elements at potential monomer-monomer contact regions suggested that BBS and CCT8L proteins do not assemble into a CCT-like complex. This prediction is supported by experimental evidence showing that MKKS localizes as a free monomer at the pericentriolar material of centrosomes. In this respect, it is also interesting to observe that among BBS and CCT8L sequences the ATP-hydrolysis motif "Gly-Asp-Gly-Thr", remarkably conserved among canonical chaperonins, has differentiated in MKKS and in BBS12 [23, 27]. This condition may indicate that these families have lost the hydrolytic activity necessary for the functionality of the chaperonin complex [43–52]. It has been shown for the archaeal thermosome complex that mutation of the Asp residue of the ATP-hydrolysis motif prevents hydrolysis and productive protein folding, and that some CCT subunits, including CCT8, dissociate in vitro from the complex under conditions that prevent hydrolysis of ATP.
Functionalities independent of complex formation have also been reported for canonical CCT subunits. TCP1 monomers not in complex confer enhanced salt tolerance in plants. Individual CCT subunits have been reported to associate in vitro with cytoskeleton structures, selectively binding to microtubule filaments or to polymerizing actin filaments. The localization of Hsp60-like BBS proteins at the cilium basal body and at the centrosome [26–28] suggests that they may also interact and associate with, for example, cytoskeleton structures in promoting the correct development of cilia [28, 57]. The multiple lines of structural and experimental evidence that BBS and CCT8L proteins do not form a canonical CCT-like complex strongly indicate that eukaryotic Group II chaperonin-protein functionalities extend beyond those of the typical oligomeric complex.
Chaperonin proteins are key players in ensuring and preserving cell and organism functionality under normal and stressful conditions and their biological and medical importance is undeniable. The recent discovery of hsp60 genes directly implicated in specific pathological conditions, the chaperonopathies, extends our understanding of the roles of chaperonin proteins in cellular processes and enhances awareness of their importance in pathology [18–20]. Here, we have provided a comprehensive, unifying framework encompassing all members of the extended hsp60 family of genes and pseudogenes. This unifying framework contributes to our understanding of the evolutionary history of the extended hsp60 family and widens our perspectives on the multiple roles that chaperonin proteins have acquired in vertebrates. Our findings highlight how differentiation of the chaperonin protein family in mammals has been facilitated by intense processes of gene duplication. The roles, mechanisms of action, and involvement in pathogenesis of individual chaperonin molecules beyond those typical of their canonical oligomeric complexes constitute aspects of chaperonin physiology particularly promising for future experimental testing.
Identification of chaperonin genes in eukaryotic genomes
Exhaustive searches for genes encoding Hsp60-like proteins were performed using TBLASTN at Ensembl and BLAT at UCSC on the genome sequences of human (NCBI Assembly 36, Genebuild Ensembl Dec 2006), mouse (NCBI Assembly m37, Genebuild Ensembl Apr 2007) and rat (Assembly RGSC 3.4, Genebuild Ensembl Feb 2006). We used the nine canonical human CCT proteins and the Cpn60 protein (mitochondrial Hsp60) as queries. We recursively queried the genomes with the sequences recovered from previous searches until no other Hsp60 sequences were detected. We also used both search engines to recover the full list of annotated hsp60-like genes in several other mammal genomes and in chicken. Sequences from frog (Xenopus sp.) were retrieved from the NCBI nr (non-redundant) database using PSI-BLAST with Cpn60 and the individual CCT subunits as queries. To recover complete hsp60 gene and pseudogene sequences, after the TBLASTN searches the genomic sequences from approximately 2,000 nt upstream to 2,000 nt downstream of the hit regions were excised, and the hsp60 sequences were extracted using the homology-based gene prediction method implemented in FGENESH+ at the Softberry web site. For pseudogenes, when FGENESH+ failed to recognize the complete sequence because of in-frame stop codons or frame-shifts, the coding region was manually reconstructed by aligning the three-frame translations of the genomic sequence to the query sequence with the multiple protein alignment program ITERALIGN. The Pseudogene.org [33, 65] database and the Ensembl, Entrez and HUGO annotations were consulted for the presence of annotated human pseudogenes, as recorded in our tables of results.
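The recursive querying step described above amounts to iterating a search until the set of recovered sequences stops growing. The following is a minimal sketch of that loop only, not the authors' pipeline: the function name `exhaustive_search`, the `search` callback and the toy hit table are hypothetical stand-ins for a real TBLASTN or BLAT query, and the sequence labels are invented placeholders rather than actual search results.

```python
from typing import Callable, Dict, Iterable, Set


def exhaustive_search(seeds: Iterable[str],
                      search: Callable[[str], Set[str]]) -> Set[str]:
    """Re-query with every newly recovered sequence until no new hit appears."""
    found: Set[str] = set(seeds)
    pending = list(found)          # sequences not yet used as queries
    while pending:
        query = pending.pop()
        for hit in search(query):
            if hit not in found:   # only new sequences trigger further queries
                found.add(hit)
                pending.append(hit)
    return found


# Hypothetical toy data: each query "recovers" a fixed set of placeholder hits.
TOY_HITS: Dict[str, Set[str]] = {
    "seed": {"seed", "hitA", "hitB"},
    "hitA": {"seed", "hitC"},
    "hitB": {"hitA"},
    "hitC": {"hitD"},
}


def toy_search(query: str) -> Set[str]:
    """Stand-in for a homology search; returns placeholder hits for a query."""
    return TOY_HITS.get(query, set())


recovered = exhaustive_search(["seed"], toy_search)
print(sorted(recovered))
```

Because every sequence is used as a query at most once, the loop terminates as soon as a round of queries returns nothing new, which mirrors the stopping rule stated in the text.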
Multiple sequence alignment and secondary structure prediction
Multiple sequence alignments were obtained using MUSCLE, which in previous analyses [68, 69] performed well when aligning divergent sequences. Alignments were manually adjusted as needed. Predictions of secondary structure for each protein family were performed from their multiple alignment using the Jnet algorithm as implemented in the JPRED-3 secondary structure prediction server [70, 71].
Evolutionary tree reconstructions
To infer phylogenetic relationships, evolutionary trees were obtained using the maximum-likelihood (ML) tree-building procedure implemented in PHYML, using the default JTT substitution model and 100 bootstrap resampling replicates (each ML tree reconstruction being quite time consuming). Selected trees were compared with those obtained with the Bayesian approach implemented in MrBayes 3.1, using the WAG substitution model and 10,000 iterations for the MCMC process. Conditional probabilities were estimated by sampling the MCMC process every 10 iterations after 2,500 burn-in iterations (sample size 750).
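The sample size quoted for the Bayesian analysis follows directly from the chain length, the burn-in and the sampling interval. The short check below makes that arithmetic explicit; the function name `retained_samples` is a hypothetical helper introduced here for illustration and is not part of MrBayes.

```python
def retained_samples(total_iterations: int, burn_in: int, interval: int) -> int:
    """Number of MCMC states kept when sampling every `interval` iterations after burn-in."""
    return (total_iterations - burn_in) // interval


# Values stated in the text: 10,000 iterations, 2,500 burn-in, one sample every 10.
sample_size = retained_samples(10_000, 2_500, 10)
print(sample_size)
```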
Estimates of evolutionary divergence of sequence families
We obtained rates of divergence among families of sequences using a newly developed estimator, called "B-index". The B-index is an unbiased estimator of the average divergence of a family of sequences from its last common ancestor (root) that takes into consideration the correlations among sequences determined by their phylogenetic tree. Briefly, given a rooted tree, a terminal branch of length $d_i$ of the original tree is considered a "cluster" of size $w_i = 1$ and length $d = d_i$. Each fork-structure comprising two terminal branches (clusters) of lengths $d_1$ and $d_2$ and sizes $w_1$ and $w_2$ bifurcating from a stem-branch of length $d_s$ is considered in turn. The average length $d$ of each fork-structure is computed as $d = (d_1 + d_2)/2 + d_s$ and the average size $w$ of the structure is defined as $w = [2 \cdot (d_1 + d_2)/2 + 1 \cdot d_s]/[(d_1 + d_2)/2 + d_s] = (d_1 + d_2 + d_s)/d$. Each fork-structure is progressively replaced by a corresponding cluster of length $d$ and size $w$. The procedure is repeated, merging bifurcating clusters of lengths $d_1$ and $d_2$ and sizes $w_1$ and $w_2$ connected to a stem-branch of length $d_s$ into a larger cluster of average length $d = (w_1 d_1 + w_2 d_2)/(w_1 + w_2) + d_s$ and average size $w = (d_1 w_1 + d_2 w_2 + d_s)/d$, until the tree is reduced to two clusters connected to the root ($d_s = 0$). The global average differentiation $D$ ("B-index") and size $W$ can finally be computed as $D = (w_1 d_1 + w_2 d_2)/(w_1 + w_2)$ and $W = w_1 + w_2$. It can be shown that $DW = L$ is the length of the tree (sum of all branch lengths). If two sequence families A and B are sampled from the same set of species and $W_A = W_B$, then $D_B/D_A = L_B/L_A$ and the relative rate of differentiation of the two families of sequences can be estimated by the ratio of their tree lengths.
The B-index has several advantages compared to the most commonly used average pair-wise sequence-similarity measure: (i) it takes into account the correlation among sequences imposed by the topology of the evolutionary tree; (ii) in contrast to average pair-wise similarity, its expectations are invariant over the number and phylogenetic relations of sequences sampled from a cluster with the same common ancestor and evolutionary model; and (iii) with the B-index, the average differentiation rate of a protein family relative to a reference family sharing the same evolutionary relations (e.g., sampled from the same set of species) is simply estimated by the ratio of the lengths of the evolutionary trees of the two families.
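The cluster-merging recursion that defines the B-index can be sketched in a few lines. The code below is an illustrative reading of the formulas given above, assuming a strictly bifurcating rooted tree with positive branch lengths; the `Node` class, the function names and the example tree are hypothetical and are not taken from the authors' software.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    length: float                      # branch leading to this node (0 for the root)
    children: List["Node"] = field(default_factory=list)


def cluster(node: Node) -> Tuple[float, float]:
    """Return (average length d, size w) of the cluster rooted at `node`, stem included."""
    if not node.children:              # terminal branch: d = d_i, w = 1
        return node.length, 1.0
    left, right = node.children        # assumes exactly two children per internal node
    d1, w1 = cluster(left)
    d2, w2 = cluster(right)
    d_s = node.length
    d = (w1 * d1 + w2 * d2) / (w1 + w2) + d_s
    w = (d1 * w1 + d2 * w2 + d_s) / d
    return d, w


def b_index(root: Node) -> Tuple[float, float]:
    """Return (D, W) for a rooted tree; the root carries no stem (d_s = 0)."""
    return cluster(Node(0.0, root.children))


def tree_length(node: Node) -> float:
    """Sum of all branch lengths below and including `node`."""
    return node.length + sum(tree_length(c) for c in node.children)


# Hypothetical example tree ((A:1, B:1):1, C:2), written with made-up branch lengths.
example = Node(0.0, [Node(1.0, [Node(1.0), Node(1.0)]), Node(2.0)])
D, W = b_index(example)
L = tree_length(example)
print(D, W, L, math.isclose(D * W, L))
```

For this example the inner fork has $d = 2$ and $w = 1.5$, so $D = 2$, $W = 2.5$ and $DW = 5$, which equals the tree length, as the text states.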
Estimates of ratios of non-synonymous vs. synonymous mutation rate (Ka/Ks)
Classification of hsp60 sequences as functional genes or pseudogenes was supported by the absence or presence of in-frame stop codons and frame-shifts, and by estimating non-synonymous vs. synonymous mutation-rate ratios (Ka/Ks) along relevant branches of evolutionary trees. Estimates were obtained using the maximum-likelihood branch-specific model implemented in PAML4. In the case of pseudogenes, Ka/Ks values are expected not to differ significantly from 1 (absence of positive or negative selection at the protein level), whereas protein-coding genes, whose evolution is dominated by negative or positive selection, are expected to be characterized, respectively, by Ka/Ks < 1 or Ka/Ks > 1. Briefly, we applied the PAML4 "branch-specific model" by creating an evolutionary tree that included the sequences whose evolutionary lineage was tested, the appropriate sister sequence (in the case of pseudogenes, the gene sequence from whose lineage the pseudogene originated) and an out-group sequence. The tree branch(es) to be tested are designated as "foreground" and the other branches as "background." Using the branch-specific model, the Ka/Ks ratio is estimated for the foreground branch(es) and an analogous ratio is estimated for the background branches. The likelihood L1 generated using this evolutionary model is compared to the likelihood L0 of a null model in which Ka/Ks for foreground branches is fixed to 1.0. In the likelihood-ratio test (LRT), the significance of the likelihood difference between the model with free estimate of Ka/Ks and the null model is assessed with the quantity 2·ln(L1/L0), which approximately follows a χ2 distribution.
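The likelihood-ratio comparison can be illustrated with a short calculation. This is a sketch under stated assumptions, not the PAML4 procedure itself: it assumes one degree of freedom (the null model fixes a single parameter, the foreground Ka/Ks), takes log-likelihoods as input, and uses made-up log-likelihood values. The function name `lrt_one_df` is hypothetical.

```python
import math
from typing import Tuple


def lrt_one_df(lnL_free: float, lnL_null: float) -> Tuple[float, float]:
    """Return (statistic, p-value) for 2*ln(L1/L0), assuming a chi-square with 1 df."""
    # 2*ln(L1/L0) equals 2*(lnL1 - lnL0); clamp tiny negative values from optimisation noise.
    statistic = max(2.0 * (lnL_free - lnL_null), 0.0)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods, for illustration only (not values from this study).
stat, p = lrt_one_df(lnL_free=-1000.0, lnL_null=-1003.0)
print(stat, p)
```

A small p-value rejects Ka/Ks = 1 on the foreground branch, which, together with an estimated ratio below 1, would be read as evidence of functional constraint. A non-significant result is what the text expects for a pseudogene.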
All relevant gene and pseudogene information, including start and end positions, chromosomal location, strand, number of exons, GenBank accession number for functional genes, and Ensembl or Pseudogene.org ID for pseudogenes, can be found in additional file 22: Table S7. Newly annotated sequences have been approved and deposited in the Human Genome Organization (HUGO) database.
CCT: Chaperonin Containing TCP1
TRiC: TCP1 Ring Complex.
Hartl FU, Hayer-Hartl M: Molecular chaperones in the cytosol: from nascent chain to folded protein. Science. 2002, 295: 1852-1858. 10.1126/science.1068408.
Frydman J: Folding of newly translated proteins in vivo: the role of molecular chaperones. Annu Rev Biochem. 2001, 70: 603-647. 10.1146/annurev.biochem.70.1.603.
Sigler PB, Xu Z, Rye HS, Burston SG, Fenton WA, Horwich AL: Structure and function in GroEL-mediated protein folding. Annu Rev Biochem. 1998, 67: 581-608. 10.1146/annurev.biochem.67.1.581.
Bukau B, Horwich AL: The Hsp70 and Hsp60 chaperone machines. Cell. 1998, 92: 351-366. 10.1016/S0092-8674(00)80928-9.
Hemmingsen SM, Woolford C, van der Vies SM, Tilly K, Dennis DT, Georgopoulos CP, Hendrix RW, Ellis RJ: Homologous plant and bacterial proteins chaperone oligomeric protein assembly. Nature. 1988, 333: 330-334. 10.1038/333330a0.
Trent JD, Nimmesgern E, Wall JS, Hartl FU, Horwich AL: A molecular chaperone from a thermophilic archaebacterium is related to the eukaryotic protein t-complex polypeptide-1. Nature. 1991, 354: 490-493. 10.1038/354490a0.
Kubota H, Hynes G, Willison K: The chaperonin containing t-complex polypeptide 1 (TCP-1). Multisubunit machinery assisting in protein folding and assembly in the eukaryotic cytosol. Eur J Biochem. 1995, 230: 3-16. 10.1111/j.1432-1033.1995.tb20527.x.
Macario AJL, Malz M, Conway de Macario E: Evolution of assisted protein folding: the distribution of the main chaperoning systems within the phylogenetic domain archaea. Front Biosci. 2004, 9: 1318-1332. 10.2741/1328.
Carrascosa JL, Llorca O, Valpuesta JM: Structural comparison of prokaryotic and eukaryotic chaperonins. Micron. 2001, 32: 43-50. 10.1016/S0968-4328(00)00027-5.
Large AT, Lund PA: Archaeal chaperonins. Front Biosci. 2009, 14: 1304-1324. 10.2741/3310.
Ranson NA, Clare DK, Farr GW, Houldershaw D, Horwich AL, Saibil HR: Allosteric signaling of ATP hydrolysis in GroEL-GroES complexes. Nat Struct Mol Biol. 2006, 13: 147-152. 10.1038/nsmb1046.
Ranson NA, Dunster NJ, Burston SG, Clarke AR: Chaperonins can catalyse the reversal of early aggregation steps when a protein misfolds. J Mol Biol. 1995, 250: 581-586. 10.1006/jmbi.1995.0399.
Ranson NA, White HE, Saibil HR: Chaperonins. Biochem J. 1998, 333 (Pt 2): 233-242.
Levy-Rimler G, Bell RE, Ben-Tal N, Azem A: Type I chaperonins: not all are created equal. FEBS Lett. 2002, 529: 1-5. 10.1016/S0014-5793(02)03178-2.
Frydman J, Nimmesgern E, Erdjument-Bromage H, Wall JS, Tempst P, Hartl FU: Function in protein folding of TRiC, a cytosolic ring complex containing TCP-1 and structurally related subunits. EMBO J. 1992, 11: 4767-4778.
Kubota H, Hynes G, Carne A, Ashworth A, Willison K: Identification of six Tcp-1-related genes encoding divergent subunits of the TCP-1-containing chaperonin. Curr Biol. 1994, 4: 89-99. 10.1016/S0960-9822(94)00024-2.
Stoldt V, Rademacher F, Kehren V, Ernst JF, Pearce DA, Sherman F: Review: the Cct eukaryotic chaperonin subunits of Saccharomyces cerevisiae and other yeasts. Yeast. 1996, 12: 523-529. 10.1002/(SICI)1097-0061(199605)12:6<523::AID-YEA962>3.0.CO;2-C.
Cappello F, Conway de Macario E, Marasa L, Zummo G, Macario AJL: Hsp60 expression, new locations, functions and perspectives for cancer diagnosis and therapy. Cancer Biol Ther. 2008, 7: 801-809.
Macario AJL, Conway de Macario E: Chaperonopathies by defect, excess, or mistake. Ann N Y Acad Sci. 2007, 1113: 178-191. 10.1196/annals.1391.009.
Macario AJL, Conway de Macario E: Sick chaperones, cellular stress, and disease. N Engl J Med. 2005, 353: 1489-1501. 10.1056/NEJMra050111.
Stone DL, Slavotinek A, Bouffard GG, Banerjee-Basu S, Baxevanis AD, Barr M, Biesecker LG: Mutation of a gene encoding a putative chaperonin causes McKusick-Kaufman syndrome. Nat Genet. 2000, 25: 79-82. 10.1038/75637.
Stoetzel C, Laurier V, Davis EE, Muller J, Rix S, Badano JL, Leitch CC, Salem N, Chouery E, Corbani S, et al: BBS10 encodes a vertebrate-specific chaperonin-like protein and is a major BBS locus. Nat Genet. 2006, 38: 521-524. 10.1038/ng1771.
Stoetzel C, Muller J, Laurier V, Davis EE, Zaghloul NA, Vicaire S, Jacquelin C, Plewniak F, Leitch CC, Sarda P, et al: Identification of a novel BBS gene (BBS12) highlights the major role of a vertebrate-specific branch of chaperonin-related proteins in Bardet-Biedl syndrome. Am J Hum Genet. 2007, 80: 1-11. 10.1086/510256.
Katsanis N, Beales PL, Woods MO, Lewis RA, Green JS, Parfrey PS, Ansley SJ, Davidson WS, Lupski JR: Mutations in MKKS cause obesity, retinal dystrophy and renal malformations associated with Bardet-Biedl syndrome. Nat Genet. 2000, 26: 67-70. 10.1038/79201.
Blacque OE, Leroux MR: Bardet-Biedl syndrome: an emerging pathomechanism of intracellular transport. Cell Mol Life Sci. 2006, 63: 2145-2161. 10.1007/s00018-006-6180-x.
Hirayama S, Yamazaki Y, Kitamura A, Oda Y, Morito D, Okawa K, Kimura H, Cyr DM, Kubota H, Nagata K: MKKS is a centrosome-shuttling protein degraded by disease-causing mutations via CHIP-mediated ubiquitination. Mol Biol Cell. 2008, 19: 899-911. 10.1091/mbc.E07-07-0631.
Kim JC, Ou YY, Badano JL, Esmail MA, Leitch CC, Fiedrich E, Beales PL, Archibald JM, Katsanis N, Rattner JB, et al: MKKS/BBS6, a divergent chaperonin-like protein linked to the obesity disorder Bardet-Biedl syndrome, is a novel centrosomal component required for cytokinesis. J Cell Sci. 2005, 118: 1007-1020. 10.1242/jcs.01676.
Marion V, Stoetzel C, Schlicht D, Messaddeq N, Koch M, Flori E, Danse JM, Mandel JL, Dollfus H: Transient ciliogenesis involving Bardet-Biedl syndrome proteins is a fundamental characteristic of adipogenic differentiation. Proc Natl Acad Sci USA. 2009, 106: 1820-1825. 10.1073/pnas.0812518106.
Brocchieri L, Conway de Macario E, Macario AJL: Chaperonomics, a new tool to study ageing and associated diseases. Mech Ageing Dev. 2007, 128: 125-136. 10.1016/j.mad.2006.11.019.
Entrez Gene. [http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene]
Shisheva A, Sbrissa D, Ikonomov O: Cloning, characterization, and expression of a novel Zn2+-binding FYVE finger-containing phosphoinositide kinase in insulin-sensitive cells. Mol Cell Biol. 1999, 19: 623-634.
Li S, Tiab L, Jiao X, Munier FL, Zografos L, Frueh BE, Sergeev Y, Smith J, Rubin B, Meallet MA, et al: Mutations in PIP5K3 are associated with Francois-Neetens mouchetee fleck corneal dystrophy. Am J Hum Genet. 2005, 77: 54-63. 10.1086/431346.
Karro JE, Yan Y, Zheng D, Zhang Z, Carriero N, Cayting P, Harrison P, Gerstein M: Pseudogene.org: a comprehensive database and comparison platform for pseudogene annotation. Nucleic Acids Res. 2007, 35: D55-60. 10.1093/nar/gkl851.
Springer MS, Stanhope MJ, Madsen O, de Jong WW: Molecules consolidate the placental mammal tree. Trends Ecol Evol. 2004, 19: 430-438. 10.1016/j.tree.2004.05.006.
Ditzel L, Lowe J, Stock D, Stetter KO, Huber H, Huber R, Steinbacher S: Crystal structure of the thermosome, the archaeal chaperonin and homolog of CCT. Cell. 1998, 93: 125-138. 10.1016/S0092-8674(00)81152-6.
Spiess C, Miller EJ, McClellan AJ, Frydman J: Identification of the TRiC/CCT substrate binding sites uncovers the function of subunit diversity in eukaryotic chaperonins. Mol Cell. 2006, 24: 25-37. 10.1016/j.molcel.2006.09.003.
Fares MA, Wolfe KH: Positive selection and subfunctionalization of duplicated CCT chaperonin subunits. Mol Biol Evol. 2003, 20: 1588-1597. 10.1093/molbev/msg160.
Archibald JM, Logsdon JM, Doolittle WF: Origin and evolution of eukaryotic chaperonins: phylogenetic evidence for ancient duplications in CCT genes. Mol Biol Evol. 2000, 17: 1456-1466.
Harrison PM, Zheng D, Zhang Z, Carriero N, Gerstein M: Transcribed processed pseudogenes in the human genome: an intermediate form of expressed retrosequence lacking protein-coding ability. Nucleic Acids Res. 2005, 33: 2374-2383. 10.1093/nar/gki531.
Yu Z, Morais D, Ivanga M, Harrison PM: Analysis of the role of retrotransposition in gene evolution in vertebrates. BMC Bioinformatics. 2007, 8: 308. 10.1186/1471-2105-8-308.
Brocchieri L, Karlin S: Conservation among HSP60 sequences in relation to structure, function, and evolution. Protein Sci. 2000, 9: 476-486.
Bigotti MG, Bellamy SR, Clarke AR: The asymmetric ATPase cycle of the thermosome: elucidation of the binding, hydrolysis and product-release steps. J Mol Biol. 2006, 362: 835-843. 10.1016/j.jmb.2006.07.064.
Bigotti MG, Clarke AR: Cooperativity in the thermosome. J Mol Biol. 2005, 348: 13-26. 10.1016/j.jmb.2005.01.066.
Cliff MJ, Kad NM, Hay N, Lund PA, Webb MR, Burston SG, Clarke AR: A kinetic analysis of the nucleotide-induced allosteric transitions of GroEL. J Mol Biol. 1999, 293: 667-684. 10.1006/jmbi.1999.3138.
Jackson GS, Staniforth RA, Halsall DJ, Atkinson T, Holbrook JJ, Clarke AR, Burston SG: Binding and hydrolysis of nucleotides in the chaperonin catalytic cycle: implications for the mechanism of assisted protein folding. Biochemistry. 1993, 32: 2554-2563. 10.1021/bi00061a013.
Kafri G, Horovitz A: Transient kinetic analysis of ATP-induced allosteric transitions in the eukaryotic chaperonin containing TCP-1. J Mol Biol. 2003, 326: 981-987. 10.1016/S0022-2836(03)00046-9.
Kafri G, Willison KR, Horovitz A: Nested allosteric interactions in the cytoplasmic chaperonin containing TCP-1. Protein Sci. 2001, 10: 445-449. 10.1110/ps.44401.
Kanzaki T, Iizuka R, Takahashi K, Maki K, Masuda R, Sahlan M, Yebenes H, Valpuesta JM, Oka T, Furutani M, et al: Sequential action of ATP-dependent subunit conformational change and interaction between helical protrusions in the closure of the built-in lid of group II chaperonins. J Biol Chem. 2008, 283: 34773-34784. 10.1074/jbc.M805303200.
Staniforth RA, Burston SG, Atkinson T, Clarke AR: Affinity of chaperonin-60 for a protein substrate and its modulation by nucleotides and chaperonin-10. Biochem J. 1994, 300 (Pt 3): 651-658.
Todd MJ, Viitanen PV, Lorimer GH: Dynamics of the chaperonin ATPase cycle: implications for facilitated protein folding. Science. 1994, 265: 659-666. 10.1126/science.7913555.
Yifrach O, Horovitz A: Coupling between protein folding and allostery in the GroE chaperonin system. Proc Natl Acad Sci USA. 2000, 97: 1521-1524. 10.1073/pnas.040449997.
Roobol A, Grantham J, Whitaker HC, Carden MJ: Disassembly of the cytosolic chaperonin in mammalian cell extracts at intracellular levels of K+ and ATP. J Biol Chem. 1999, 274: 19220-19227. 10.1074/jbc.274.27.19220.
Yamada A, Sekiguchi M, Mimura T, Ozeki Y: The role of plant CCTalpha in salt- and osmotic-stress tolerance. Plant Cell Physiol. 2002, 43: 1043-1048. 10.1093/pcp/pcf120.
Roobol A, Sahyoun ZP, Carden MJ: Selected subunits of the cytosolic chaperonin associate with microtubules assembled in vitro. J Biol Chem. 1999, 274: 2408-2415. 10.1074/jbc.274.4.2408.
Grantham J, Ruddock LW, Roobol A, Carden MJ: Eukaryotic chaperonin containing T-complex polypeptide 1 interacts with filamentous actin and reduces the initial rate of actin polymerization in vitro. Cell Stress Chaperones. 2002, 7: 235-242. 10.1379/1466-1268(2002)007<0235:ECCTCP>2.0.CO;2.
Shah AS, Farmen SL, Moninger TO, Businga TR, Andrews MP, Bugge K, Searby CC, Nishimura D, Brogden KA, Kline JN, et al: Loss of Bardet-Biedl syndrome proteins alters the morphology and function of motile cilia in airway epithelia. Proc Natl Acad Sci USA. 2008, 105: 3380-3385. 10.1073/pnas.0712327105.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
Kent WJ: BLAT--the BLAST-like alignment tool. Genome Res. 2002, 12: 656-664.
BLAT Search Genome. [http://genome.ucsc.edu/cgi-bin/hgBlat]
Altschul SF, Koonin EV: Iterated profile searches with PSI-BLAST--a tool for discovery in protein databases. Trends Biochem Sci. 1998, 23: 444-447. 10.1016/S0968-0004(98)01298-5.
Salamov AA, Solovyev VV: Ab initio gene finding in Drosophila genomic DNA. Genome Res. 2000, 10: 516-522. 10.1101/gr.10.4.516.
Brocchieri L, Karlin S: A symmetric-iterated multiple alignment of protein sequences. J Mol Biol. 1998, 276: 249-264. 10.1006/jmbi.1997.1527.
HUGO Gene Nomenclature Committee. [http://www.genenames.org]
Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32: 1792-1797. 10.1093/nar/gkh340.
Mukherjee K, Bürglin TR: MEKHLA, a novel domain with similarity to PAS domains, is fused to plant homeodomain-leucine zipper III proteins. Plant Physiol. 2006, 140: 1142-1150. 10.1104/pp.105.073833.
Mukherjee K, Bürglin TR: Comprehensive analysis of animal TALE homeobox genes: new conserved motifs and cases of accelerated evolution. J Mol Evol. 2007, 65: 137-153. 10.1007/s00239-006-0023-0.
Cole C, Barber JD, Barton GJ: The Jpred 3 secondary structure prediction server. Nucleic Acids Res. 2008, 36: W197-201. 10.1093/nar/gkn238.
Jpred 3. A Secondary Structure Prediction Server. [http://www.compbio.dundee.ac.uk/www-jpred/]
Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 2003, 52: 696-704. 10.1080/10635150390235520.
Ronquist F, Huelsenbeck JP: MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003, 19: 1572-1574. 10.1093/bioinformatics/btg180.
Yang Z: PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007, 24: 1586-1591. 10.1093/molbev/msm088.
The authors thank an anonymous reviewer for providing valuable information. AJLM and EC de M thank Wesley Harlow for his help in the initial stages of this work and the San Francisco Foundation for support. LB and KM thank Mr. Steve Oden and Ms. Shaina R. Wallach for critical proofreading of the manuscript. LB thanks the University of Florida Genetics Institute for financial support.
KM participated in research and methodological approach design, carried out all searches and most data analyses, wrote drafts of the manuscript and participated in its refinement, compiled all tables and produced most figures; EC de M and AJLM envisioned the research project, started data collection and participated in research design and in manuscript preparation; LB participated in research design and methodological approach, produced differentiation and mutation-accumulation estimates and analyses and participated in writing the manuscript. All authors read and approved the final manuscript.