Evolutionary and functional relationships within the DJ1 superfamily
© Bandyopadhyay and Cookson 2004
Received: 25 September 2003
Accepted: 19 February 2004
Published: 19 February 2004
Inferences about protein function are often made based on sequence homology to other gene products of known activities. This approach is valuable for small families of conserved proteins but can be difficult to apply to large superfamilies of proteins with diverse function. In this study we looked at sequence homology between members of the DJ-1/ThiJ/PfpI superfamily, which includes a human protein of unclear function, DJ-1, associated with inherited Parkinson's disease.
DJ-1 orthologs in a variety of eukaryotic species cluster together in a single group. The most closely related group comprises the bacterial ThiJ genes, which encode kinases involved in the biosynthesis of thiamine, a function that has been dispensed with evolutionarily in most eukaryotes where thiamine is an essential nutrient. The similarity to other characterized members of the superfamily, including proteases, is more remote. This is congruent with recently solved crystal structures, which fail to show a catalytic triad in the conformation required for protease activity.
DJ-1 may have evolved from the bacterial gene encoding ThiJ kinase. However, as this function has been dispensed with in eukaryotes, it appears that the gene has been co-opted for another function.
Mutations in DJ-1 associated with recessively inherited Parkinson's disease (PD) have been described recently. Evidence to date suggests that the mutations cause disease by a loss-of-function mechanism. The reported mutations either delete several exons, resulting in an effective gene knockout , or are point mutations that destabilize the protein . Therefore, the normal cellular function of DJ-1 is critical to understanding how these mutations cause PD. DJ-1 has a number of reported functions, including cellular transformation , transcriptional effects , control of mRNA stability  and response to oxidative stress [6, 7], and it is unclear how all of these relate to the pathways involved in PD .
One way to understand protein function is to find other proteins of known function with sequence or structural homology. This approach has helped in understanding other PD proteins; for example, parkin was identified as an E3 ubiquitin-protein ligase based on homology to other proteins with similar domain structures . DJ-1 shows sequence homology to a number of proteins that contain a ThiJ domain, including protein chaperones , catalases , proteases [12, 13] and the ThiJ kinases [14, 15]. Previous analyses have suggested that the ThiJ domain may be a member of the large glutamine amidotransferase (GAT) superfamily . Crystal structures of DJ-1 [16–20] and of other members of this DJ-1/ThiJ/PfpI superfamily, including the protease PH1704 , have been reported. The proteins have an overall α/β sandwich structure, arranged similarly to the Rossmann fold, which is also present in members of the GAT superfamily . The structure is similar to that of another protein with much lower sequence similarity, the E. coli chaperone Hsp31 [10, 20].
The multitude of functional groupings within the DJ-1/ThiJ/PfpI superfamily limits our ability to make predictions about the cellular role of the human ortholog. A putative catalytic cysteine, Cys106, is present, which has led to the suggestion that DJ-1 may be a protease . However, structural data generally argue against DJ-1 having protease activity: the invariant catalytic triad seen in other cysteine proteases is present but in an unfavorable conformation [18, 20]. On the other hand, one recent report suggests that human DJ-1 possesses weak protease activity , disputing another claim of chaperone activity . In an attempt to gain further insight into possible roles of DJ-1, we performed a detailed analysis of several hundred sequences of DJ-1/ThiJ/PfpI superfamily members. These include orthologs (sequences separated by speciation) and paralogs (sequences separated by gene duplication). Surprisingly, we found that the most closely related sequences are the bacterial ThiJ genes, suggesting that DJ-1 may have evolved from thiamine synthesis genes that have been dispensed with in eukaryotes.
Outside of the DJ-1/ThiJ group, there are a number of distinct clades that have at least one member whose function is known. Of these, three can be separated from the DJ-1/ThiJ proteins by the presence of diagnostic structural elements. Firstly, a series of plant homologues group together and appear to be paralogs, as each contains a duplicated DJ-1/ThiJ (Pfam PF01965) domain, as described for Chinese cabbage . These proteins, from Arabidopsis thaliana, Oryza sativa and Brassica rapa subsp. pekinensis, are annotated as ThiJ or protease-related, but cluster close to the ThiJ family. Secondly, there are a number of bacterial proteins containing both a catalase domain and a DJ-1/ThiJ domain. These are large subunit catalases (EC 1.11.1.6), the structure of one of which has been solved . Thirdly, another prominent clade includes the AraC-type transcriptional regulators from bacteria. These proteins are defined by the presence of one or more helix-turn-helix (HTH) motifs in the C-terminal portion of the protein. The HTH motif is thought to mediate DNA binding, whilst the ThiJ-like domain may be an amidase, although this is unproven.
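The three diagnostic architectures described above amount to a small set of rules. A minimal Python sketch, assuming each protein's domains have already been annotated and listed from N- to C-terminus; apart from PF01965, the domain labels and the function name are hypothetical, and the order in which the rules are checked is an assumption rather than the procedure used in this study.

```python
from typing import List

# Only PF01965 (the DJ-1/ThiJ domain) is named in the text; the other two labels
# are hypothetical placeholders for whatever a domain-annotation step reports.
THIJ = "PF01965"
CATALASE = "catalase"
HTH = "HTH"


def assign_architecture_group(domains: List[str]) -> str:
    """Assign a protein to a diagnostic group from its N- to C-terminal domain list."""
    n_thij = domains.count(THIJ)
    if n_thij == 0:
        return "no DJ-1/ThiJ domain"
    if n_thij >= 2:
        # Duplicated DJ-1/ThiJ domains: the plant paralog group
        return "duplicated DJ-1/ThiJ"
    if CATALASE in domains:
        # A catalase domain plus a DJ-1/ThiJ domain: large subunit catalases
        return "large subunit catalase"
    # AraC-type regulators carry HTH motif(s) C-terminal to the ThiJ-like domain
    if HTH in domains[domains.index(THIJ) + 1:]:
        return "AraC-type regulator"
    # A single DJ-1/ThiJ domain: placement has to come from the sequence tree
    return "single DJ-1/ThiJ domain"
```

For example, a domain list of ["PF01965", "HTH"] is assigned to the AraC-type regulators, while ["PF01965"] alone is left for sequence-based placement. The text does not say how a protein with, say, both a catalase domain and an HTH motif would be treated, so the rule order here is a guess.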
Other families have a single DJ-1/ThiJ domain with variable extensions at the C- or N-termini. A major grouping includes two proteases from thermophilic archaea, PfpI and PH1704, whose ATP-independent protease activity has been demonstrated [12, 23]. The crystal structure of PH1704 has been solved . This protein is hexameric, in contrast to dimeric human DJ-1, and this difference in oligomeric state is mediated by differences at the C-termini of the two proteins . The proteases that grouped together in this analysis all lack the most C-terminal α-helix found in DJ-1 and are therefore also likely to be hexameric, distinguishing them from the DJ-1/ThiJ clade.
Several proteins annotated as sigma cross-reacting proteins cluster together. This family has a distinctive, highly conserved sequence (91.4% identity) that distinguishes it from neighboring families. This group also appears most similar to a larger family that includes E. coli Hsp31, a chaperone [10, 20], which we have annotated as ThiJ/PfpI-like proteins including chaperones. A Saccharomyces cerevisiae protein, YDR533C, whose transcription is upregulated when yeast cells enter the quiescent state after carbon starvation or in the presence of misfolded proteins [24, 25], is also present in this larger clade. Further analysis (see Discussion) supports this group as having protease activity, and we have annotated this family as ThiJ/PfpI-like protease/chaperones. How distinct this grouping is from the PfpI-like proteases is unclear, and they might be regarded as a single group. However, identity between these groups is only moderate (approximately 20%), so we have annotated them separately (Figure 1). The sigma cross-reacting proteins have a distinct ElbB domain (COG3155.1), related to ThiJ but with only moderate overall homology, so we have kept these as a separate branch. The structure of a member from E. coli has been solved (PDB entry 1OY1) and is dimeric.
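Identity figures such as the approximately 20% quoted here depend on how gapped columns are handled. The Python sketch below shows one common convention (any column containing a gap is ignored) applied to pre-aligned sequences; the function names are hypothetical, and this is not necessarily how the values in this study were computed.

```python
from typing import Sequence


def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two aligned sequences over columns where neither has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # skip any column containing a gap (one convention of several)
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def mean_between_group_identity(group1: Sequence[str], group2: Sequence[str]) -> float:
    """Average pairwise identity over every cross-group pair of aligned sequences."""
    scores = [percent_identity(a, b) for a in group1 for b in group2]
    if not scores:
        raise ValueError("both groups must be non-empty")
    return sum(scores) / len(scores)
```

Counting gapped columns as mismatches, or dividing by the length of the shorter ungapped sequence, would give lower values for the same pair, so the convention should be stated alongside any reported identity.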
The aim of this study was to compare sequence homologies between the clearly identifiable DJ-1 homologues and other members of this superfamily whose function or activity is known. A PSI-BLAST search identified many genes with significant homology to human DJ-1. We were particularly interested in whether sequence analysis would support previous suggestions that DJ-1 is a chaperone  or a protease . Our analysis provides some degree of separation between members of the DJ-1/ThiJ superfamily with these functions.
Although human DJ-1 does not contain a functional catalytic Cys-His-Asp/Glu triad of the kind found in proteases such as PH1704, C106 and H126 have been suggested to form a catalytic dyad . C106 is absolutely conserved in all of the ThiJ and DJ-1 sequences, as it is in most members of the superfamily (data not shown). However, H126 is conserved only within the DJ-1 family. All higher eukaryotic members have an equivalent histidine, with the exception of one of the Drosophila genes, which has a phenylalanine. Based on the 1.1 Å crystal structure , H126 is probably not involved in catalysis, and the significance of its conservation is therefore unclear.
Further evidence that the DJ-1/ThiJ families have at most minor protease activity comes from examination of the sequence around this conserved cysteine. In the protease family, the consensus sequence AICHGP is found. In PH1704, the equivalent Cys100/His101 pair forms part of the catalytic triad . In contrast, the equivalent sequence in human DJ-1 is AICAGPT, and it is conserved in all DJ-1 homologues. These data are consistent with the lack of protease activity in several assays [18, 20]. The AICAGPT sequence may, however, contribute to the weak protease activity reported recently in vitro . E. coli Hsp31 has both protease and chaperone activities  and contains the sequence SLCHGP. This prompted us to analyze all members of the protease/chaperone family (as annotated in Figure 1). The consensus sequence is [aliphatic][aliphatic]CH[SAG], with the Cys/His pair invariant. As all known proteases contain adjacent Cys/His residues, whereas all non-protease members have the cysteine followed by a different residue (Cys/X), we predict that most of the "Hsp31-like proteases/chaperones" will have protease activity. However, our analysis supports the contention that DJ-1 has only minor protease activity. Further experimental evidence is required to assess the physiological relevance of this weak activity.
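The Cys/His versus Cys/X rule, and the protease consensus, can be expressed as simple pattern checks on the segment around the conserved cysteine. A minimal Python sketch, assuming that segment has already been extracted from the alignment; the function names are hypothetical, and treating A, V, L, I and M as the aliphatic residues is an assumption.

```python
import re

# Assumption: "aliphatic" is taken as A, V, L, I and M; published groupings vary.
ALIPHATIC = "AVLIM"
PROTEASE_CONSENSUS = re.compile(rf"[{ALIPHATIC}][{ALIPHATIC}]CH[SAG]")


def cys_context(segment: str) -> str:
    """Classify the residue immediately after the first Cys in a motif segment."""
    seg = segment.upper()
    i = seg.find("C")
    if i == -1:
        return "no Cys"
    if i + 1 == len(seg):
        return "incomplete"  # Cys is the last residue, so its neighbor is unknown
    return "Cys/His" if seg[i + 1] == "H" else "Cys/X"


def matches_protease_consensus(segment: str) -> bool:
    """True if the segment contains the [aliphatic][aliphatic]CH[SAG] consensus."""
    return PROTEASE_CONSENSUS.search(segment.upper()) is not None


for name, seg in [("PfpI-like protease consensus", "AICHGP"),
                  ("human DJ-1", "AICAGPT"),
                  ("E. coli Hsp31", "SLCHGP")]:
    print(name, cys_context(seg), matches_protease_consensus(seg))
```

Under this strict grouping, the Hsp31 segment SLCHGP scores as Cys/His but does not match the full consensus, because serine is not aliphatic; only the adjacent Cys/His pair is needed for the prediction made here.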
It should be noted that using PSI-BLAST with human DJ-1 as a seed sequence has limitations. DJ-1 and related proteins form part of the much larger type I glutamine amidotransferase (GATase; Pfam PF00117) superfamily, based on structure and sequence similarities . No type I GATase enzymes were identified with the methods we used. This might not be a substantial limitation, as it is not possible to integrate all members of such large and divergent superfamilies in a single tree without loss of predictive value . Equally, there may be important groups of enzymes within the DJ-1/ThiJ/PfpI superfamily, distinct from type I GATases, that have not been highlighted here and that could be instructive for finding the function of DJ-1. An example is the phosphoribosylformylglycinamidine synthases (FGAM synthases; Pfam PF02700, EC 6.3.5.3). There are at least 50 enzymes with similar annotations within the DJ-1/PfpI superfamily in the public database (PF01965 at http://www.sanger.ac.uk/Software/Pfam/index.shtml). This Pfam family was constructed using 44 seed sequences, including an FGAM synthase, and identifies a larger grouping (497 sequences) than found in our analysis (311 unique sequences). This is likely because our search generated a more specific profile than the broadly inclusive methods used to build Pfam families. The sequence identity between FGAM synthases and human DJ-1 is comparable to that between human DJ-1 and some proteins in Additional file 1. Therefore, the limits of the DJ-1 "superfamily" are unclear, and the dataset generated in this study may represent the most tractable set of similar sequences rather than the largest possible grouping.
One area that this analysis has allowed us to highlight is the degree of conservation of specific residues that are mutated in PD. As noted previously , leucine 166, which is mutated to proline, is highly conserved throughout the DJ-1 proteins and ThiJ enzymes (with the exception of a phenylalanine in Fusobacterium nucleatum). It appears that L166P, in the penultimate α-helix of human DJ-1, destabilizes the protein , perhaps by disrupting this α-helix. The residue affected by another putative mutation, A104T , is almost completely conserved throughout all DJ-1 and ThiJ members. The site of the M26I mutation  is also absolutely conserved in all vertebrate orthologs, although a leucine is present in invertebrates and in the ThiJ enzymes.
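Statements such as "leucine 166 is highly conserved" depend on mapping human DJ-1 residue numbers onto alignment columns and then tallying the residues found in that column. A minimal Python sketch of both steps, assuming an alignment held as a dictionary of equal-length gapped strings; the function names and the toy data in the usage note are hypothetical.

```python
from typing import Dict


def column_for_residue(aligned_ref: str, residue_number: int, gap: str = "-") -> int:
    """Return the 0-based alignment column of a 1-based residue in the ungapped reference."""
    count = 0
    for col, ch in enumerate(aligned_ref):
        if ch != gap:
            count += 1
            if count == residue_number:
                return col
    raise ValueError(f"reference has fewer than {residue_number} residues")


def residue_conservation(alignment: Dict[str, str], ref_id: str,
                         residue_number: int) -> Dict[str, float]:
    """Frequency of each character (gaps included) at the column of a reference residue."""
    if len({len(s) for s in alignment.values()}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    col = column_for_residue(alignment[ref_id], residue_number)
    counts: Dict[str, int] = {}
    for seq in alignment.values():
        counts[seq[col]] = counts.get(seq[col], 0) + 1
    total = len(alignment)
    return {aa: n / total for aa, n in sorted(counts.items())}
```

With the toy alignment {"human": "M-KAL", "s2": "MQKAF", "s3": "M-RAL"}, residue 4 of "human" sits in column 4, where L occurs at a frequency of 2/3 and F at 1/3; applied to a real alignment with residue 166 of human DJ-1, the same tally would show the leucine and any exceptions such as the Fusobacterium phenylalanine.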
Our analyses demonstrate that the eukaryotic DJ-1 and prokaryotic ThiJ families are closely related. However, they also demonstrate the difficulty of predicting function based on sequence. ThiJ was cloned as an enzyme in the biosynthesis of thiamine in E. coli [14, 15]. As thiamine is an essential vitamin for many eukaryotes, presumably another use for the gene family has evolved. The mechanistic details of the ThiJ reaction have not been fully elucidated, but it catalyses the phosphorylation of hydroxymethylpyrimidine phosphate, a precursor of thiamine . An equivalent kinase activity has not been detected in human DJ-1 . Instead, human DJ-1 has been suggested to have either chaperone  or weak protease activity . A tentative conclusion is that, as ThiJ activity was dispensed with, the eukaryotic DJ-1 orthologs converged on a function that was present in one of the ancient paralogs, namely protein chaperone activity. However, it is equally feasible that the major function of DJ-1 is RNA binding  or an as yet unrecognized function. The role of the conserved cysteine residue, which is catalytic in other members of the family, is unclear.
We performed a homology search by iterative PSI-BLAST  using human DJ-1 (NP_009193.2) as the seed sequence. PSI-BLAST was run with default parameters on the NCBI site. The search converged after 7 iterations, and the results were trimmed to remove duplicates and hypothetical proteins. The resulting 311 sequences were aligned using CLUSTALW. They were also aligned with T-COFFEE  and SAM 3.4 , neither of which gave a significant difference in alignment quality (data not shown).
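The criteria used to trim duplicates and hypothetical results are not specified. A minimal Python sketch, assuming PSI-BLAST hits have already been parsed into records, that a duplicate means an identical sequence, and that a hypothetical result is one whose description contains the word "hypothetical"; the Hit record and trim_hits function are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Hit:
    accession: str
    description: str
    sequence: str


def trim_hits(hits: Iterable[Hit],
              exclude_terms: Tuple[str, ...] = ("hypothetical",)) -> List[Hit]:
    """Drop hits whose description contains an excluded term (case-insensitive)
    or whose sequence repeats an earlier kept hit; the first occurrence wins."""
    seen = set()
    kept = []
    for hit in hits:
        desc = hit.description.lower()
        if any(term in desc for term in exclude_terms):
            continue
        key = hit.sequence.upper()
        if key in seen:
            continue
        seen.add(key)
        kept.append(hit)
    return kept
```

Keeping the first occurrence preserves the order in which hits were reported; a stricter notion of redundancy (for example, collapsing near-identical sequences above an identity threshold) would need a pairwise comparison step instead of exact matching.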
To assess the similarity and quality of subgroups in this alignment, trees were first built by neighbor joining with 1,000 bootstrap replicates from each of the three alignments (CLUSTALW, T-COFFEE and SAM 3.4). Each gave similar subgroups. The final consensus tree was then constructed by maximum parsimony using PROTPARS from the PHYLIP package (version 3.5c, distributed by J Felsenstein, Department of Genetics, University of Washington, Seattle) with 100 bootstrap replicates. This second method adjusted the positions of the subgroups relative to each other compared with neighbor joining but did not change subgroup membership. The TREEVIEW program  was used to render the tree shown in Figure 1. For Figure 2, the subgroup containing human DJ-1 was extracted by taking the smallest clade with 100% bootstrap support that contained human DJ-1 and its nearest neighbors, the ThiJ group. The resulting subgroup was realigned using T-COFFEE, then inspected visually and manually corrected. One sequence was removed to obtain a higher-quality, nearly gapless alignment.
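Bootstrap support for a subgroup comes from resampling alignment columns with replacement, building a tree from each pseudo-replicate, and counting how often the subgroup reappears (PHYLIP's SEQBOOT program performs the resampling step). A minimal Python sketch of the resampling and counting steps, assuming an alignment stored as a dictionary of equal-length strings and replicate trees already reduced to sets of clades; tree building itself is not shown, and the function names are hypothetical.

```python
import random
from typing import Dict, FrozenSet, Iterable, List, Optional, Set


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Build one pseudo-replicate by resampling alignment columns with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("alignment must be non-empty with equal-length sequences")
    n_cols = next(iter(lengths))
    # The same column indices are used for every sequence, so columns stay intact
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def bootstrap_replicates(alignment: Dict[str, str], n: int = 100,
                         seed: Optional[int] = None) -> List[Dict[str, str]]:
    """Generate n pseudo-replicates from a reproducible random generator."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n)]


def clade_support(clade: Iterable[str],
                  replicate_clades: List[Set[FrozenSet[str]]]) -> float:
    """Percentage of replicate trees whose clade set contains the given clade."""
    target = frozenset(clade)
    hits = sum(1 for clades in replicate_clades if target in clades)
    return 100.0 * hits / len(replicate_clades)
```

Resampling whole columns, rather than individual residues, keeps each site's pattern across all sequences intact, which is what makes the replicate trees comparable. A clade found in every replicate tree scores 100%, the support level used above to delimit the human DJ-1 subgroup.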
The authors would like to thank Dr Andrew Singleton for his helpful comments. This study utilized the high-performance computational capabilities of the Helix Systems at the National Institutes of Health, Bethesda, MD http://helix.nih.gov.
This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.