- Research article
- Open Access
Evolutionary and functional relationships within the DJ1 superfamily
BMC Evolutionary Biology volume 4, Article number: 6 (2004)
Inferences about protein function are often made based on sequence homology to other gene products of known activities. This approach is valuable for small families of conserved proteins but can be difficult to apply to large superfamilies of proteins with diverse function. In this study we looked at sequence homology between members of the DJ-1/ThiJ/PfpI superfamily, which includes a human protein of unclear function, DJ-1, associated with inherited Parkinson's disease.
DJ-1 orthologs in a variety of eukaryotic species cluster together in a single group. The most closely related group is the bacterial ThiJ genes. These are kinases involved in the biosynthesis of thiamine, a function that has been dispensed with evolutionarily in most eukaryotes where thiamine is an essential nutrient. The similarity with other characterized members of the superfamily, including proteases, is more remote. This is congruent with the recently solved crystal structures that fail to demonstrate the presence of a catalytic triad required for protease activity.
DJ-1 may have evolved from the bacterial gene encoding ThiJ kinase. However, as this function has been dispensed with in eukaryotes it appears that the gene has been co-opted for another function.
Mutations in DJ-1 have been described recently that are associated with recessively inherited Parkinson's disease (PD). Evidence to date suggests that the mutations cause disease by a loss of function mechanism. The reported mutations either delete several exons and result in an effective gene knockout  or are point mutations that destabilize the protein . Therefore, the normal cellular function of DJ-1 is a critical piece of information in understanding how these mutations cause PD. DJ-1 has a number of reported functions, including cellular transformation , transcriptional effects , control of mRNA stability  and response to oxidative stress [6, 7] and it is unclear quite how all of these relate to the pathways involved in PD .
One way to understand protein function is to find other proteins of known function with sequence or structural homology. This approach has helped in understanding other PD proteins; parkin was found to be an E3 protein-ubiquitin ligase based on homology to other proteins with similar domain structures . DJ-1 shows sequence homology to a number of proteins that contain a ThiJ domain, including protein chaperones , catalases , proteases [12, 13] and the ThiJ kinases [14, 15]. Previous analyses have suggested that the ThiJ domain may be a member of the large glutamine amidotransferase (GAT) superfamily . Crystal structures of DJ-1 [16–20] and other members of this DJ-1/ThiJ/PfpI superfamily, including the protease PH1704 , have been reported. The proteins have an overall α/β sandwich structure, arranged similarly to the Rossmann fold, which is also present in members of the GAT superfamily . The structure is similar to that of another protein of much lower sequence homology, the E. coli chaperone Hsp31 [10, 20].
The multitude of functional groupings within the DJ-1/ThiJ/PfpI superfamily limits our ability to make predictions about the cellular role of the human ortholog. A putative catalytic cysteine, Cys106, is present, which has led to the suggestion that DJ-1 may be a protease . However, structural data generally argue against DJ-1 having protease activity, as the invariant catalytic triad seen in other cysteine proteases is either absent or present in an unfavorable conformation [18, 20]. On the other hand, one recent report suggests human DJ-1 possesses weak protease activity , disputing another claim of chaperone activity . In an attempt to gain further insight into possible roles of DJ-1 we performed a detailed analysis of several hundred sequences of the DJ-1/ThiJ/PfpI superfamily members. These include orthologs (sequences that are separated by speciation) and paralogs, i.e., those that are separated by other types of rearrangements. Surprisingly, we found that the nearest homologous sequences are the bacterial ThiJ genes, suggesting that DJ-1 may have evolved from thiamine synthesis genes that have been dispensed with in eukaryotes.
Using human DJ-1 as a seed sequence for PSI-BLAST, we identified 311 sequences of proteins with significant homology (see additional file 1 for a list of all the sequences we identified). Within this large group there are several distinct subgroups supported by bootstrap analysis. Proteins with similar annotations across different species generally clustered together into distinct clades (Figure 1). Those proteins annotated as DJ-1 clustered into a specific node that included several eukaryotic species, which are likely to be orthologs of each other. As expected, primate members (Homo sapiens and Cercopithecus aethiops, 100% identity) clustered together, with progressively lower similarity to rodent (Rattus norvegicus, Mus musculus and Mesocricetus auratus; 91–97% identity) or other vertebrate (Gallus gallus, Salmo salar and Xenopus laevis; 80–89% identity) homologues. Within each of two invertebrate species with reported sequences (Caenorhabditis elegans and Drosophila melanogaster) there appear to be two distinct DJ-1 paralogs, each with about 40% identity to the human protein. The closest grouping to the eukaryote DJ-1 orthologs is the ThiJ family of 4-methyl-5(β-hydroxyethyl)-thiazol monophosphate biosynthesis enzymes, which we have analyzed in more detail (discussed below).
Outside of the DJ-1/ThiJ group, there are a number of distinct clades that have at least one member whose function is known. Of these, three can be separated from the DJ-1/ThiJ proteins by the presence of diagnostic structural elements. Firstly, a series of plant homologues group together and appear to be paralogs, as each has a duplicated DJ-1/ThiJ (Pfam PF01965) domain, as described in Chinese cabbage . These proteins, from Arabidopsis thaliana, Oryza sativa and Brassica rapa subsp. pekinensis, are annotated as ThiJ or protease-related, but cluster close to the ThiJ family. Secondly, there are a number of bacterial proteins containing a catalase domain and a DJ-1/ThiJ domain. These are large subunit catalases (EC 1.11.1.6), the structure of one of which has been solved . Thirdly, another prominent clade includes the AraC type transcriptional regulators from bacteria. These proteins can be defined by the presence of one or more helix-turn-helix (HTH) motifs in the C-terminal portion of the protein. The HTH motif is thought to mediate DNA binding, whilst the ThiJ-like domain may be an amidase, although this is unproven.
Other families have a single DJ-1/ThiJ domain with variable extensions at the C- or N-termini. A major grouping includes two proteases from hyperthermophilic archaea, PfpI and PH1704, whose ATP-independent protease activity has been demonstrated [12, 23]. The crystal structure has been solved for PH1704 . This protein is hexameric, in contrast to dimeric human DJ-1, and this difference in oligomer formation is mediated by differences at the C-terminus of the two proteins . The proteases that grouped together in this analysis all lack the most C-terminal α-helix found in DJ-1, and thus are also likely to be hexameric, distinguishing them from the DJ-1/ThiJ clade.
Several proteins annotated as sigma cross-reacting proteins cluster together. This family has a unique conserved composition (91.4% identity) that distinguishes it from neighboring families. This group also appears to be most similar to a larger family that includes E. coli Hsp31, a chaperone [10, 20], which we have annotated as ThiJ/PfpI-like proteins including chaperones. A Saccharomyces cerevisiae protein, YDR533C, whose transcription is upregulated when yeast cells enter the quiescent state after carbon starvation or in the presence of misfolded proteins [24, 25], is also present in this larger clade. Further analysis (see discussion) supports this group as having protease activities and we have annotated this family as ThiJ/PfpI-like protease/chaperones. How distinct this grouping is from the PfpI-like proteases is unclear, and they might be regarded as a single group. However, identity between these groups is only moderate (approximately 20%), and we have therefore annotated them separately (figure 1). The sigma cross-reacting proteins have a distinct ElbB domain (COG3155.1), related to ThiJ but with only moderate overall homology. Hence we have kept these as a separate branch. The structure of a member from E. coli has been solved (PDB entry 1OY1) and is dimeric.
To further assess the similarities and differences between the DJ-1 and ThiJ families, we extracted the sequences and realigned them. The resulting cladogram is shown in figure 2. The member of the prokaryotic ThiJ enzymes with highest homology to eukaryotic DJ-1 proteins is from Leptospira interrogans, which has 42% identity with human DJ-1. This emphasizes the high degree of conservation between these two groups and indicates that there is likely to be structural conservation. Figures 3 and 4 show a multiple alignment of the DJ-1 and ThiJ families. A series of key residues that have been suggested to be important in DJ-1 function are well conserved between these two groups (see discussion).
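As an illustration of how a pairwise identity figure such as the 42% quoted above can be derived from two rows of a multiple alignment, the following is a minimal Python sketch. It is not the authors' procedure: the function name `percent_identity` is hypothetical, and the choice to ignore columns containing a gap in either sequence is an assumption (other denominators, such as the full alignment length, are also in common use and give different values).

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap.

    Assumption: both strings are rows taken from the same alignment,
    so they must have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either row
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy rows, not real DJ-1 or ThiJ sequences: 3 of 4 ungapped columns match.
print(percent_identity("ACDE", "ACDF"))  # 75.0
```

Because the result depends on the denominator chosen, identity values are only comparable when computed the same way across all sequence pairs.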
The aim of this study was to compare sequence homologies between the clearly identifiable DJ-1 homologues and other members of this superfamily whose function or activity is known. The results of a search using PSI-BLAST yielded many genes with significant homology to human DJ-1. We were particularly interested in examining whether sequence analysis would support the previous suggestions that DJ-1 is a chaperone  or a protease . Our analysis provides some degree of separation of members of the DJ-1/ThiJ superfamily with these functions.
Although human DJ-1 does not contain a strong catalytic Cys-His-Asp/Glu triad found in proteases such as PH1704, C106 and H126 have been suggested to contribute a catalytic diad . C106 is absolutely conserved in all of the ThiJ and DJ-1 sequences, as it is within most members of the superfamily (data not shown). However, H126 is conserved only within the DJ-1 family. All higher eukaryotic members have an equivalent histidine with the exception of one of the Drosophila genes, which has a phenylalanine. H126 is probably not involved in catalysis, based on the 1.1 Å crystal structure , and the significance of conservation of this residue is therefore unclear.
Further evidence that the DJ-1/ThiJ families would have only minor protease activity comes from examination of the sequence around this conserved cysteine. In the protease family, a consensus sequence AIC HGP is found. In the case of PH1704, the equivalent Cys100/His101 pair forms part of the catalytic triad . In contrast, the equivalent sequence in human DJ-1 is AIC AGPT, and is conserved in all DJ-1 homologues. These data are consistent with the lack of protease activity in different assays [18, 20]. The AIC AGPT sequence may, however, contribute to the weak protease activity reported recently in vitro . E. coli Hsp31 has both protease and chaperone activities , and contains the sequence SLC HGP. This prompted us to analyze all the members of the protease/chaperone family (as annotated in figure 1). The consensus sequence is [Aliphatic] [Aliphatic]CH [SAG], with the Cys/His pair being invariant. As all known proteases contain adjacent Cys/His residues whereas substitution with Cys/X is found in all non-protease members, we predict that most of the "Hsp31-like proteases/chaperones" will have protease activity. However, our analysis supports the contention that DJ-1 has only minor protease activity. Further experimental evidence is required to assess the physiological relevance of this weak activity.
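The Cys/His versus Cys/X distinction described above can be expressed as a simple check on the residue that follows the conserved cysteine. The sketch below is illustrative only: the function name `cys_neighbor_class` is hypothetical, and it assumes the caller supplies a short window already centred on the conserved cysteine (it inspects the first cysteine in the window, which would not be safe on a full-length sequence).

```python
import re


def cys_neighbor_class(window: str) -> str:
    """Classify a short motif window by the residue following its first Cys.

    Returns "Cys-His" when the cysteine is immediately followed by histidine,
    "Cys-X" for any other following residue, and "undetermined" when no
    cysteine with a following residue is found.
    """
    # The motifs are written with a space in the text (e.g. "AIC HGP").
    motif = window.replace(" ", "").upper()
    match = re.search(r"C(.)", motif)
    if match is None:
        return "undetermined"
    return "Cys-His" if match.group(1) == "H" else "Cys-X"


# Motif windows quoted in the text above.
for name, window in [("PH1704-type protease consensus", "AIC HGP"),
                     ("human DJ-1", "AIC AGPT"),
                     ("E. coli Hsp31", "SLC HGP")]:
    print(name, cys_neighbor_class(window))
```

A classification of this kind only restates the sequence pattern; whether a given protein actually has protease activity remains an experimental question.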
It should be noted that using PSI-BLAST with human DJ-1 as a seed sequence has limitations. DJ-1 and related proteins represent part of the much larger type I glutamine amidotransferase (GATase; Pfam PF00117) superfamily based on structure and sequence similarities . No type I GATase enzymes were identified with the methods we have used. This might not be a substantial limitation, as it is not possible to integrate all members of such large and divergent superfamilies in a single tree without loss of predictive value . Equally, there may be important groups of enzymes within the DJ-1/ThiJ/PfpI superfamily, distinct from type I GATases, that have not been highlighted and that could be instructive for finding the function of DJ-1. An example is the phosphoribosylformylglycinamidine synthases (FGAM synthases; Pfam PF02700, EC 6.3.5.3). There are at least 50 enzymes with similar annotations within the DJ-1/PfpI superfamily annotated in the public database (PF01965 at http://www.sanger.ac.uk/Software/Pfam/index.shtml). The public domain superfamily was constructed using 44 seed sequences, including an FGAM synthase, and identifies a larger grouping (497 sequences) than found in our analysis (311 unique sequences). This is likely because our search generated a more specific sequence searching profile than the more broadly inclusive methods used to form Pfam families. The sequence identity between FGAM synthases and human DJ-1 is comparable to that between human DJ-1 and some proteins in additional file 1. Therefore, the limits of the DJ-1 "superfamily" are unclear and the dataset generated in this study may represent the most tractable set of similar sequences rather than the largest possible grouping.
One area that this analysis has allowed us to highlight is the degree of conservation of specific residues that are mutated in PD. As noted previously , leucine 166, which is mutated to proline, is highly conserved throughout the DJ-1 proteins and ThiJ enzymes (with the exception of a phenylalanine in Fusobacterium nucleatum). It appears that L166P, in the penultimate α-helix of human DJ-1, destabilizes the protein , perhaps by disrupting this α-helix. The site of another putative mutation, A104T , is almost completely conserved throughout all DJ-1 and ThiJ members. The site of the M26I mutation  is also absolutely conserved in all vertebrate orthologs, although a leucine is present in invertebrates and in the ThiJ enzymes.
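Statements such as "almost completely conserved" correspond to the frequency of the most common residue in a single alignment column. The following Python sketch shows one way to compute that, under stated assumptions: the function name `column_conservation` is hypothetical, gaps are written as "-" and excluded from the denominator, and the example rows are toy strings rather than sequences from figures 3 and 4.

```python
from collections import Counter
from typing import Sequence, Tuple


def column_conservation(alignment: Sequence[str], column: int) -> Tuple[str, float]:
    """Return the most common residue in one alignment column and the
    fraction of non-gap sequences that carry it.

    Assumption: all rows have the same length and gaps are written as "-".
    """
    residues = [row[column] for row in alignment if row[column] != "-"]
    if not residues:
        raise ValueError("column contains only gaps")
    residue, count = Counter(residues).most_common(1)[0]
    return residue, count / len(residues)


# Toy alignment (hypothetical rows, zero-based column index).
toy = ["MAL", "MAL", "LAF", "MA-"]
print(column_conservation(toy, 0))  # ('M', 0.75)
print(column_conservation(toy, 1))  # ('A', 1.0)
```

Note that alignment columns are zero-based here and do not correspond directly to residue numbers such as 166 in the human protein; mapping between the two requires counting non-gap positions in the human row.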
Our analyses demonstrate that the eukaryotic DJ-1 and prokaryotic ThiJ families are closely related. However, they also demonstrate the difficulty of predicting function based on sequence. ThiJ was cloned as an enzyme in the biosynthesis of thiamine in E. coli [14, 15]. As thiamine is an essential vitamin for many eukaryotes, presumably another use for the gene family has evolved. The mechanistic details of the enzymatic reaction of ThiJ have not been fully elucidated, but it catalyses a phosphorylation reaction of hydroxymethylpyrimidine phosphate, a precursor to thiamine . An equivalent kinase activity has not been detected in human DJ-1 . In contrast, human DJ-1 has been suggested to have either chaperone  or weak protease activity . A tentative conclusion is that as ThiJ activity was dispensed with, the eukaryotic DJ-1 orthologs have converged on a function that was present in one of the archaic paralogs, namely protein chaperone activity. However, it is equally feasible that the major function of DJ-1 is in binding RNA  or an unrecognized function. The role of the conserved cysteine residue, catalytic in other members of the family, is unclear.
We performed a homology search with iterative PSI-BLAST , using human DJ-1 (NP_009193.2) as the seed sequence. PSI-BLAST was performed using default parameters from the NCBI site. The search converged in 7 iterations and the results were trimmed for duplicates and hypothetical results. The resulting 311 sequences were then aligned using CLUSTALW. The sequences were also aligned with T-COFFEE  and SAM 3.4 , neither of which offered a significant difference in quality (data not shown).
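The trimming step is described only briefly, so the following Python sketch shows one plausible reading rather than the procedure actually used. It assumes that "duplicates" means records with identical sequences and that "hypothetical results" means records whose description contains the word "hypothetical"; the function name `trim_hits`, the record layout and the example records are all illustrative.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str, str]  # (identifier, description, sequence)


def trim_hits(records: Iterable[Record]) -> List[Record]:
    """Drop hits annotated as hypothetical and hits whose sequence has
    already been seen, keeping the first occurrence of each sequence.

    Both filtering criteria are assumptions made for this sketch.
    """
    seen = set()
    kept = []
    for identifier, description, sequence in records:
        if "hypothetical" in description.lower():
            continue
        key = sequence.upper()
        if key in seen:
            continue
        seen.add(key)
        kept.append((identifier, description, sequence))
    return kept


# Made-up records for illustration only.
hits = [
    ("id1", "DJ-1 protein", "MASKR"),
    ("id2", "hypothetical protein", "MTTLV"),
    ("id3", "DJ-1 protein, duplicate entry", "maskr"),
    ("id4", "ThiJ protein", "MSAKV"),
]
print([record[0] for record in trim_hits(hits)])  # ['id1', 'id4']
```

Keeping the first occurrence makes the result depend on input order, so a fixed ordering of the hit list would be needed for the trimmed set to be reproducible.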
In order to assess the similarity and quality of subgroups in this alignment, different trees were first made with 1,000 bootstrap replicates using neighbor joining on the alignments from all three methods (CLUSTALW, T-COFFEE and SAM 3.4). Each of these methods gave similar subgroups. Subsequently, the final consensus tree was constructed by maximum parsimony using protpars of the PHYLIP package (version 3.5 c, distributed by J Felsenstein, Department of Genetics, University of Washington, Seattle) with 100 bootstrap replicates. This second method adjusted the position of the subgroups relative to each other compared with the neighbor joining but did not change the overall subgroup membership. The TREEVIEW program  was used to render the tree used for figure 1. For figure 2, the subgroup containing human DJ-1 was extracted by removing the most specific tree with 100% bootstrap support containing human DJ-1 and its neighbors, the ThiJ group. The resultant subgroup was realigned using T-COFFEE, visually inspected and manually corrected. One sequence was removed to obtain a higher quality semi-gapless alignment.
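Bootstrap replicates of the kind referred to above are conventionally generated by resampling alignment columns with replacement, then building a tree from each resampled alignment. The Python sketch below shows only the resampling step and is not the PHYLIP implementation; the function name `bootstrap_alignment` and the toy alignment are illustrative.

```python
import random
from typing import List, Sequence


def bootstrap_alignment(alignment: Sequence[str], rng: random.Random) -> List[str]:
    """Build one bootstrap replicate by sampling alignment columns with
    replacement, keeping the number of sequences and columns unchanged.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have equal length")
    # Draw column indices with replacement; the same column may recur.
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(row[i] for i in columns) for row in alignment]


# Toy alignment; a seeded generator makes the replicates reproducible.
toy = ["ACDE", "ACDF", "GCDE"]
rng = random.Random(0)
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]
print(len(replicates), replicates[0])
```

The bootstrap support for a grouping is then the fraction of replicate trees in which that grouping appears, which is what a figure such as 100% bootstrap support summarises.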
Bonifati V, Rizzu P, van Baren MJ, Schaap O, Breedveld GJ, Krieger E, Dekker MC, Squitieri F, Ibanez P, Joosse M, van Dongen JW, Vanacore N, van Swieten JC, Brice A, Meco G, van Duijn CM, Oostra BA, Heutink P: Mutations in the DJ-1 gene associated with autosomal recessive early-onset parkinsonism. Science. 2003, 299: 256-259. 10.1126/science.1077209.
Miller DW, Ahmad R, Hague S, Baptista MJ, Canet-Aviles R, McLendon C, Carter DM, Zhu PP, Stadler J, Chandran J, Klinefelter GR, Blackstone C, Cookson MR: L166P mutant DJ-1, causative for recessive Parkinson's disease, is degraded through the ubiquitin-proteasome system. J Biol Chem. 2003, 278: 36588-36595. 10.1074/jbc.M304272200.
Nagakubo D, Taira T, Kitaura H, Ikeda M, Tamai K, Iguchi-Ariga SM, Ariga H: DJ-1, a novel oncogene which transforms mouse NIH3T3 cells in cooperation with ras. Biochem Biophys Res Commun. 1997, 231: 509-513. 10.1006/bbrc.1997.6132.
Takahashi K, Taira T, Niki T, Seino C, Iguchi-Ariga SM, Ariga H: DJ-1 positively regulates the androgen receptor by impairing the binding of PIASx alpha to the receptor. J Biol Chem. 2001, 276: 37556-37563. 10.1074/jbc.M101730200.
Hod Y, Pentyala SN, Whyard TC, El-Maghrabi MR: Identification and characterization of a novel protein that regulates RNA-protein interaction. J Cell Biochem. 1999, 72: 435-444. 10.1002/(SICI)1097-4644(19990301)72:3<435::AID-JCB12>3.3.CO;2-8.
Mitsumoto A, Nakagawa Y, Takeuchi A, Okawa K, Iwamatsu A, Takanezawa Y: Oxidized forms of peroxiredoxins and DJ-1 on two-dimensional gels increased in response to sublethal levels of paraquat. Free Radic Res. 2001, 35: 301-310.
Mitsumoto A, Nakagawa Y: DJ-1 is an indicator for endogenous reactive oxygen species elicited by endotoxin. Free Radic Res. 2001, 35: 885-893.
Cookson MR: Pathways to Parkinsonism. Neuron. 2003, 37: 7-10. 10.1016/S0896-6273(02)01166-2.
Zhang Y, Gao J, Chung KK, Huang H, Dawson VL, Dawson TM: Parkin functions as an E2-dependent ubiquitin-protein ligase and promotes the degradation of the synaptic vesicle-associated protein, CDCrel-1. Proc Natl Acad Sci U S A. 2000, 97: 13354-13359. 10.1073/pnas.240347797.
Quigley PM, Korotkov K, Baneyx F, Hol WG: The 1.6-A crystal structure of the class of chaperones represented by Escherichia coli Hsp31 reveals a putative catalytic triad. Proc Natl Acad Sci U S A. 2003, 100: 3137-3142. 10.1073/pnas.0530312100.
Horvath MM, Grishin NV: The C-terminal domain of HPII catalase is a member of the type I glutamine amidotransferase superfamily. Proteins. 2001, 42: 230-236. 10.1002/1097-0134(20010201)42:2<230::AID-PROT100>3.3.CO;2-A.
Halio SB, Blumentals II, Short SA, Merrill BM, Kelly RM: Sequence, expression in Escherichia coli, and analysis of the gene encoding a novel intracellular protease (PfpI) from the hyperthermophilic archaeon Pyrococcus furiosus. J Bacteriol. 1996, 178: 2605-2612.
Du X, Choi IG, Kim R, Wang W, Jancarik J, Yokota H, Kim SH: Crystal structure of an intracellular protease from Pyrococcus horikoshii at 2-A resolution. Proc Natl Acad Sci U S A. 2000, 97: 14079-14084. 10.1073/pnas.260503597.
Mizote T, Tsuda M, Nakazawa T, Nakayama H: The thiJ locus and its relation to phosphorylation of hydroxymethylpyrimidine in Escherichia coli. Microbiology. 1996, 142 ( Pt 10): 2969-2974.
Mizote T, Tsuda M, Smith DD, Nakayama H, Nakazawa T: Cloning and characterization of the thiD/J gene of Escherichia coli encoding a thiamin-synthesizing bifunctional enzyme, hydroxymethylpyrimidine kinase/phosphomethylpyrimidine kinase. Microbiology. 1999, 145 ( Pt 2): 495-501.
Huai Q, Sun Y, Wang H, Chin LS, Li L, Robinson H, Ke H: Crystal structure of DJ-1/RS and implication on familial Parkinson's disease. FEBS Lett. 2003, 549: 171-175. 10.1016/S0014-5793(03)00764-6.
Honbou K, Suzuki NN, Horiuchi M, Niki T, Taira T, Ariga H, Inagaki F: The crystal structure of DJ-1, a protein related to male fertility and Parkinson's disease. J Biol Chem. 2003, 278: 31380-31384. 10.1074/jbc.M305878200.
Wilson MA, Collins JL, Hod Y, Ringe D, Petsko GA: The 1.1-A resolution crystal structure of DJ-1, the protein mutated in autosomal recessive early onset Parkinson's disease. Proc Natl Acad Sci U S A. 2003, 100: 9256-9261. 10.1073/pnas.1133288100.
Tao X, Tong L: Crystal structure of human DJ-1, a protein associated with early-onset Parkinson's disease. J Biol Chem. 2003, 278: 31372-31379. 10.1074/jbc.M304221200.
Lee SJ, Kim SJ, Kim IK, Ko J, Jeong CS, Kim GH, Park C, Kang SO, Suh PG, Lee HS, Cha SS: Crystal structures of human DJ-1 and Escherichia coli Hsp31 that share an evolutionarily conserved domain. J Biol Chem. 2003, 278: 44552-44559. 10.1074/jbc.M304517200.
Olzmann JA, Brown K, Wilkinson KD, Rees HD, Huai Q, Ke H, Levey AI, Li L, Chin LS: Familial Parkinson's disease-associated L166P mutation disrupts DJ-1 protein folding and function. J Biol Chem. 2003, in press (epub doi:10.1074/jbc.M311017200).
Park YS, Min HJ, Ryang SH, Oh KJ, Cha JS, Kim HY, Cho TJ: Characterization of salicylic acid-induced genes in Chinese cabbage. Plant Cell Rep. 2003, 21: 1027-1034. 10.1007/s00299-003-0606-9.
Snowden L, Blumentals II, Kelly R: Regulation of proteolytic activity in the hyperthermophile Pyrococcus furiosus. Appl Environ Microbiol. 1992, 58: 1134-1141.
de Nobel H, Lawrie L, Brul S, Klis F, Davis M, Alloush H, Coote P: Parallel and comparative analysis of the proteome and transcriptome of sorbic acid-stressed Saccharomyces cerevisiae. Yeast. 2001, 18: 1413-1428. 10.1002/yea.793.
Trotter EW, Kao CM, Berenfeld L, Botstein D, Petsko GA, Gray JV: Misfolded proteins are competent to mediate a subset of the responses to heat shock in Saccharomyces cerevisiae. J Biol Chem. 2002, 277: 44817-44825. 10.1074/jbc.M204686200.
Sjolander K: Phylogenomic inference of protein molecular function: advances and challenges. Bioinformatics. 2004, 20: 170-179. 10.1093/bioinformatics/bth021.
Hague S, Rogaeva E, Hernandez D, Gulick C, Singleton A, Hanson M, Johnson J, Weiser R, Gallardo M, Ravina B, Gwinn-Hardy K, Crawley A, St George-Hyslop PH, Lang AE, Heutink P, Bonifati V, Hardy J: Early-onset Parkinson's disease caused by a compound heterozygous DJ-1 mutation. Ann Neurol. 2003, 54: 271-274. 10.1002/ana.10663.
Abou-Sleiman PM, Healy DG, Quinn N, Lees AJ, Wood NW: The role of pathogenic DJ-1 mutations in Parkinson's disease. Ann Neurol. 2003, 54: 283-286. 10.1002/ana.10675.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
Notredame C, Higgins DG, Heringa J: T-Coffee: A novel method for fast and accurate multiple sequence alignment. J Mol Biol. 2000, 302: 205-217. 10.1006/jmbi.2000.4042.
Karplus K, Barrett C, Hughey R: Hidden Markov models for detecting remote protein homologies. Bioinformatics. 1998, 14: 846-856. 10.1093/bioinformatics/14.10.846.
Page RD: TreeView: an application to display phylogenetic trees on personal computers. Comput Appl Biosci. 1996, 12: 357-358.
The authors would like to thank Dr Andrew Singleton for his helpful comments. This study utilized the high-performance computational capabilities of the Helix Systems at the National Institutes of Health, Bethesda, MD http://helix.nih.gov.
SB performed all of the multiple alignments and tree constructions. MRC participated in the study design and drafted the manuscript.