Molecular phylogenetics and comparative modeling of HEN1, a methyltransferase involved in plant microRNA biogenesis
© Tkaczuk et al; licensee BioMed Central Ltd. 2006
Received: 09 November 2005
Accepted: 24 January 2006
Published: 24 January 2006
Recently, HEN1 protein from Arabidopsis thaliana was discovered as an essential enzyme in plant microRNA (miRNA) biogenesis. HEN1 transfers a methyl group from S-adenosylmethionine to the 2'-OH or 3'-OH group of the last nucleotide of miRNA/miRNA* duplexes produced by the nuclease Dicer. Previously it was found that HEN1 possesses a Rossmann-fold methyltransferase (RFM) domain and a long N-terminal extension including a putative double-stranded RNA-binding motif (DSRM). However, little is known about the details of the structure and the mechanism of action of this enzyme, and about its phylogenetic origin.
Extensive database searches were carried out to identify orthologs and close paralogs of HEN1. Based on the multiple sequence alignment a phylogenetic tree of the HEN1 family was constructed. The fold-recognition approach was used to identify related methyltransferases with experimentally solved structures and to guide the homology modeling of the HEN1 catalytic domain. Additionally, we identified a La-like predicted RNA binding domain located C-terminally to the DSRM domain and a domain with a peptide prolyl cis/trans isomerase (PPIase) fold, but without the conserved PPIase active site, located N-terminally to the catalytic domain.
The bioinformatics analysis revealed that the catalytic domain of HEN1 is not closely related to any known RNA:2'-OH methyltransferases (e.g. to the RrmJ/fibrillarin superfamily), but rather to small-molecule methyltransferases. The structural model was used as a platform to identify the putative active site and substrate-binding residues of HEN1 and to propose its mechanism of action.
MicroRNAs (miRNAs) are small (~22 nt), single-stranded, noncoding RNAs that have recently emerged as important regulatory factors during growth and development in Eukaryota. To date, miRNAs have been described in animals, plants, and viruses (reviews: [1–3]). miRNAs are processed from longer precursor RNAs transcribed by RNA polymerase II that form stem-loop structures, in which the mature miRNAs reside in the stems. In animals, long primary transcripts (pri-miRNAs) are first cropped in the nucleus by the RNase III homolog Drosha to release the hairpin intermediates (pre-miRNAs). Following their export to the cytoplasm, pre-miRNAs are subjected to the second processing step, which is carried out by another RNase III homolog, Dicer. In plants, which lack Drosha, it has been suggested that miRNA processing is executed by Dicer-like protein 1 (DCL1, also called CARPEL FACTORY or CAF) (reviews: [4, 5]). miRNAs down-regulate gene expression by binding to complementary mRNAs and either triggering mRNA elimination or arresting mRNA translation into protein. Thus far, miRNAs have been implicated in the control of several pathways, including developmental timing, haematopoiesis, organogenesis, apoptosis, cell proliferation and possibly even tumorigenesis (reviews: [6–8]). However, the mechanisms of miRNA generation and function are still poorly understood and the molecular details are only beginning to be revealed.
HEN1 was identified as a gene that plays a role in the specification of stamen and carpel identities during flower development in Arabidopsis thaliana. Mutations in HEN1 resulted in defects similar to those observed for mutations in CAF, suggesting that both genes are involved in miRNA metabolism. Recently, it was found that the product of HEN1 is a methyltransferase (MTase) that acts on miRNA duplexes in vitro and methylates the last nucleotide of both strands in the substrate. Methylation by HEN1 was found to protect plant miRNAs against 3'-end uridylation and subsequent degradation. Both the 2'-OH and 3'-OH groups of the ribose of the last nucleoside were found to be essential for methylation by the HEN1 protein; hence both are considered possible methylation sites, and they may also play a crucial role in substrate recognition. The 2'-OH group is the most commonly methylated target in RNA, while 3'-methylated ribonucleosides have not been identified. However, it remains to be determined which of the OH groups of the last nucleoside of the miRNA/miRNA* duplex is the target of methylation by HEN1. Of note, HEN1 and its homologs analyzed in this article are completely unrelated to the human gene HEN1, which encodes a 20-kDa neuron-specific DNA-binding polypeptide (pp20HEN1) with the basic helix-loop-helix (bHLH) motif.
HEN1 is a long protein (942 aa), which was found to comprise a putative double-stranded RNA-binding motif (DSRM) at the very N-terminus and a C-terminal domain (CTD, aa ~694–911), which exhibits significant similarity to a group of uncharacterized proteins from bacteria, fungi, and metazoa. These proteins are, however, much shorter: they lack the DSRM and the long central region of HEN1. HEN1-CTD was found to be related to the Rossmann-fold MTase (RFM) superfamily, suggesting that it is responsible for the RNA MTase activity of this protein. It is noteworthy that sequences of HEN1 and its homologs are so strongly diverged from other proteins that HEN1 was not recognized as an MTase homolog when it was first discovered. Thus, apart from generic features common to all MTases, the molecular mechanism of specific interactions of HEN1 with its substrate RNA remains unknown. In particular, the three-dimensional structure, the identity of potential catalytic and substrate-binding residues, and the phylogenetic origin of HEN1-CTD have not yet been inferred. We have therefore carried out bioinformatics analyses to collect the most complete possible set of HEN1 orthologs in current sequence databases as well as to identify the closest paralogs amongst MTases with known structure and mechanism of action. The results were used to construct a tertiary model of the catalytic domain of HEN1 and to predict the architecture of the substrate-binding region and the active site.
Results and discussion
Sequence analyses of HEN1
In order to identify orthologs of A. thaliana HEN1, we used its full-length sequence as a query to search the non-redundant (nr) protein database using PSI-BLAST as well as genomic databases using tBLASTn. A complete homologous sequence with significant similarity to the entire query was found only in Oryza sativa (gi: 50510095). We also searched the EST and genomic databases using tBLASTn and found sequences from several different plant species that covered various segments of the query, but from which we could not assemble any contiguous fragment that would cover the full-length protein. Only the HTG sequence from Lotus corniculatus var. japonicus (gi: 17736840) displayed similarity to the entire query sequence, but we decided to omit it from further analyses due to uncertainties in the positions of intron-exon boundaries (data not shown).
To identify domains in the primary structure of HEN1, we carried out an RPS-BLAST search of the CDD database of conserved domain alignments, which confirmed the presence of the N-terminal DSRM, albeit with a low score (e-value 0.73, only 67.6% aligned), and the C-terminal RFM domain (aa 690–940; best match to the UbiG MTase family, e-value 6 × 10^-5), but did not reveal any new domains in the large central region. Therefore, we divided the HEN1 sequence into a set of overlapping sequence fragments of < 500 aa and submitted them to the GeneSilico protein structure prediction MetaServer to carry out predictions of secondary structure, protein order/disorder and possible three-dimensional folds (see Methods for details). Fragments of 100–200 aa with apparent similarity to conserved domains were resubmitted as individual jobs.
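The fragmentation step described above can be illustrated with a minimal sketch. The text only states that the fragments overlapped and were shorter than 500 aa; the window length of 499 and the overlap of 100 residues used here are assumptions chosen for illustration, and the function name is hypothetical.

```python
def overlapping_fragments(sequence, max_len=499, overlap=100):
    """Split a sequence into overlapping windows no longer than max_len.

    Returns (start, end, fragment) tuples with 1-based inclusive coordinates.
    max_len and overlap are illustrative values, not the ones used in the study.
    """
    if max_len <= overlap:
        raise ValueError("max_len must exceed overlap")
    step = max_len - overlap
    fragments = []
    start = 0
    while True:
        end = min(start + max_len, len(sequence))
        fragments.append((start + 1, end, sequence[start:end]))
        if end == len(sequence):
            break
        start += step
    return fragments


# Placeholder sequence with the length of A. thaliana HEN1 (942 aa);
# the real sequence would be read from a FASTA file instead.
demo_sequence = "A" * 942
for first, last, fragment in overlapping_fragments(demo_sequence):
    print(first, last, len(fragment))
```

With these assumed settings a 942-aa protein yields three windows, each sharing 100 residues with its neighbor, so that a domain spanning a window boundary is still fully contained in at least one fragment if it is shorter than the overlap.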
To study the origin of the HEN1 enzyme, we carried out additional searches of the non-redundant sequence database using only HEN1-CTD (aa 694–911), with a stringent e (expectation) value threshold of 10^-20. The search converged in the 4th iteration, revealing a family of sequences with well-conserved regions along the entire sequence. All sequences with scores below that threshold were reported with significantly shorter alignments, and a preliminary visual analysis suggested that they lacked many of the residues apparently conserved among the close homologs of HEN1. They were also annotated as involved in distinct processes (typically methylation of quinones); hence they were regarded as potential paralogs.
To identify the closest paralogs (and potential ancestors) of HEN1, we converted the multiple sequence alignment of the HEN1 family into a profile hidden Markov model (HMM) using HHpred and compared it with similar profile HMMs pre-calculated for protein families collected in the Clusters of Orthologous Groups (COG) database. Interestingly, HHpred analysis suggested that the closest relatives of HEN1 are not MTases acting on nucleic acids, but enzymes acting on small molecules. The top three matches, which obtained significantly higher similarity scores than other families, were: COG2227 (UbiG, "2-polyprenyl-3-methyl-5-hydroxy-6-methoxy-1,4-benzoquinol methylase"; e-value 3.1 × 10^-24), COG2230 ("cyclopropane fatty acid synthase and related methyltransferases"; e-value 1.8 × 10^-21), and COG2226 (UbiE, "methylase involved in ubiquinone/menaquinone biosynthesis"; e-value 6.9 × 10^-20). The fourth match, with an already significantly lower score, was COG4106 ("trans-aconitate MTase"; e-value 8.3 × 10^-16). The best-scoring nucleic acid MTase family on the list of HEN1 homologs was found only at the fifth position (COG2519, "GCD14 tRNA (1-methyladenosine) methyltransferase and related methyltransferases"), with e-value 1.2 × 10^-12. It is remarkable that no known ribose MTase families were reported at the top positions of the ranking.
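The ranking argument in this paragraph can be reproduced with a short sketch that sorts the five quoted COG matches by e-value and reports where the first nucleic acid MTase family appears. The e-values are those quoted in the text; the substrate-class labels are an annotation added here for illustration and are not part of the HHpred output, and the function names are hypothetical.

```python
# E-values as quoted in the paragraph above; the last field is an
# illustrative substrate-class label, not something HHpred reports.
hits = [
    ("COG2519", "GCD14 tRNA (1-methyladenosine) MTase", 1.2e-12, "nucleic acid"),
    ("COG2226", "UbiE", 6.9e-20, "small molecule"),
    ("COG4106", "trans-aconitate MTase", 8.3e-16, "small molecule"),
    ("COG2227", "UbiG", 3.1e-24, "small molecule"),
    ("COG2230", "cyclopropane fatty acid synthase", 1.8e-21, "small molecule"),
]


def rank_hits(hit_list):
    """Order hits from the most to the least significant e-value."""
    return sorted(hit_list, key=lambda hit: hit[2])


def first_rank_of(ranked_hits, substrate_class):
    """Return the 1-based rank of the first hit with the given class, or None."""
    for rank, hit in enumerate(ranked_hits, start=1):
        if hit[3] == substrate_class:
            return rank
    return None


ranked = rank_hits(hits)
for rank, (cog, name, e_value, acts_on) in enumerate(ranked, start=1):
    print(f"{rank}. {cog:8s} {e_value:.1e}  {acts_on:14s} {name}")
print("first nucleic acid family at rank", first_rank_of(ranked, "nucleic acid"))
```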
Phylogenetic analysis of HEN1-CTD
The relationships between different eukaryotic lineages of HEN1 are in general agreement with the topology of the "Tree of Life". In Fungi they are present both in Basidiomycota (e.g. U. maydis) and Ascomycota (e.g. S. pombe), but they appear to have been lost from many lineages, e.g. Saccharomycotina. It is noteworthy that HEN1 orthologs could not be detected in Archaea or in primitive Eukaryota with fully sequenced genomes, such as Alveolata (e.g. Plasmodium) or Euglenozoa (e.g. Trypanosoma). On the other hand, the distribution of HEN1 homologs in Bacteria is very limited and quite erratic (only in Firmicutes – Clostridium and Streptococcus, Cyanobacteria – Nostoc and Anabaena and Actinobacteria – Kineococcus radiotolerans). This distribution would suggest that HEN1 originated in the common ancestor of Eukaryota, before the divergence of the Viridiplantae and Metazoa/Fungi branches, and has been transferred horizontally to Bacteria. However, the sequences of bacterial members of the HEN1 family appear to be more similar to the closest paralogous family UbiG than the eukaryotic members (Figure 3). This suggests that HEN1-CTD could have evolved in Bacteria by duplication of the UbiG-encoding gene and neofunctionalization of the second copy and only then was horizontally transferred to the ancestor of contemporary Eukaryota. In order to fully understand the origin of HEN1, it will be useful to characterize the molecular function of the short forms (lacking the N-terminal extensions of A. thaliana HEN1) from Bacteria as well as from other eukaryotic species (in particular animals and fungi).
Structure prediction of the HEN1-CTD
In the absence of an experimentally determined protein structure, comparative modeling may provide a structural platform for the investigation of sequence-structure-function relationships. This technique requires a homologous template structure to be identified and the sequence of the modeled protein (a target) to be correctly aligned to the template. The C-terminal catalytic domain of HEN1 showed distant similarity to many different structures of class-I MTases in standard database searches. It is known, however, that despite the common fold and conserved cofactor-binding site, different subfamilies of MTases exhibit significant differences in the architecture of their substrate-binding pocket and the active site (e.g. ref. [29, 30]). Thus, modeling of HEN1 based on a randomly selected MTase structure could introduce large errors in the functionally most important parts of the protein and mislead the structure-based functional predictions.
In order to identify the optimal set of template structures for modeling of HEN1, we used the fold-recognition (FR) approach, which assesses the compatibility of the target sequence with the available protein structures based not only on sequence similarity, but also on structural considerations (match of secondary structure elements, compatibility of residue-residue contacts, etc.). As mentioned earlier, the sequence of HEN1 CTD was therefore submitted to the GeneSilico protein fold-recognition metaserver. As expected, all FR methods reported RFM structures as the potentially best templates. Interestingly, none of them reported any of the known RNA:2'-OH MTase structures from the RrmJ/fibrillarin superfamily, or indeed any known RNA or DNA MTases, at the top positions of the ranking. Instead, all FR algorithms suggested that the potentially best templates for modeling of HEN1 (i.e. its closest homologs among proteins of known structure) are either known small-molecule MTases or uncharacterized proteins from structural genomics projects, which show strongest similarity to small-molecule MTases. In particular, PDB-BLAST reported 1kpg (a mycolic acid cyclopropane synthase Cmaa1 from Mycobacterium) with a score of 2 × 10^-42, FFAS reported 1xxl (an uncharacterized protein YcgI from Bacillus subtilis) with a score of -44.1, mGENTHREADER reported 1xxl with a score of 0.949, and SPARKS reported 1vl5 (an uncharacterized protein Bh2331 from Bacillus halodurans) and 1y8c (an uncharacterized, predicted MTase from Clostridium acetobutylicum) with a score of -4.42 (these scores are not normalized, as each server uses a different evaluation system; see the individual references for details). Ultimately, the consensus server Pcons2 assigned the highest scores (2.673–2.42) to the small-molecule MTase structures 1xxl, 1vl5, and 1kpg as the potentially best templates for modeling of HEN1.
This result is in very good agreement with the profile-HMM analysis, which suggested that HEN1 is most closely related to small-molecule MTase families, including those with unknown structures such as UbiE and UbiG (which are thus unavailable for detection by the structure-based FR). Thus, bioinformatics analyses strongly suggest that HEN1 CTD exhibits sequence and structural features characteristic of the "small molecule" branch of the MTase superfamily.
Comparative modeling of the HEN1-CTD
A comparative model of HEN1 was constructed based on the alignments reported by fold-recognition methods, using the "FRankenstein's Monster" approach (see Methods). The C-terminal residues 912–942 were predicted to be disordered by DISOPRED and PONDR (data not shown), and therefore they were omitted from the analysis. The final model comprising residues 694–911 was constructed by iterating the homology modeling procedure (initially based on the raw FR alignments to the top-scoring templates 1kpg, 1vl5, 1y8c, and 1xxl), evaluation of the sequence-structure fit by VERIFY3D, merging of fragments with the best scores, and local realignment in poorly scored regions. Local realignments were constrained to maintain the overlap between the secondary structure elements found in the MTase structures used as modeling templates and those predicted for HEN1. This procedure was stopped when all regions in the protein core obtained an acceptable VERIFY3D score (>0.3) or their score could not be improved by any manipulations, while the average VERIFY3D score for the whole model could not be improved.
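The "poorly scored regions" that trigger local realignment can be located with a simple scan over per-residue scores. The sketch below is only an illustration of that bookkeeping step under the >0.3 acceptance threshold quoted above: the score values are invented, the function name is hypothetical, and VERIFY3D itself is not called.

```python
def low_scoring_regions(scores, first_residue=694, threshold=0.3, min_len=1):
    """Return (start, end) residue ranges whose score is not above threshold.

    scores: per-residue values listed in sequence order, starting at first_residue.
    A residue is acceptable only if its score is strictly greater than threshold.
    """
    regions = []
    run_start = None
    for offset, score in enumerate(scores):
        below = score <= threshold
        if below and run_start is None:
            run_start = offset
        elif not below and run_start is not None:
            if offset - run_start >= min_len:
                regions.append((first_residue + run_start,
                                first_residue + offset - 1))
            run_start = None
    # Close a run that extends to the end of the model.
    if run_start is not None and len(scores) - run_start >= min_len:
        regions.append((first_residue + run_start,
                        first_residue + len(scores) - 1))
    return regions


# Invented scores for six consecutive residues starting at position 694.
example_scores = [0.5, 0.2, 0.1, 0.4, 0.3, 0.6]
print(low_scoring_regions(example_scores))
```

In an iterative protocol of the kind described, the returned ranges would mark the segments to realign or to replace with better-scoring fragments from alternative models.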
Model-based identification of amino acid residues important for substrate-binding and catalysis
In agreement with the type of the template structures used, the spatial configuration of the C-terminal, surface-exposed part of motif IV of HEN1 at the bottom of the putative substrate-binding pocket is characteristic of small-molecule MTases. In particular, the peptide EhhEHh (where h indicates a hydrophobic or aromatic residue) forms a small α-helix that is nearly perpendicular to all other secondary structure elements, which is commonly found in small-molecule MTases, but thus far has not been identified in any nucleic acid MTase. Moreover, it conforms to the consensus sequence XhhEHh found in numerous small-molecule MTases, but is rather dissimilar to motif IV of typical MTases acting on nucleic acids (e.g. the (D/N/S)PP(Y/F/W/H) tetrapeptide of base MTases or the DXXX motif of ribose MTases). However, HEN1 contains an invariant Glu (E796) at the position commonly occupied by a carboxylate residue that participates in catalysis in nucleic acid MTases (e.g. by stabilization of the cofactor and the target in the preferred orientation and/or deprotonation of the substrate's attacking group), but is rarely present in small-molecule MTases.
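The motif notation used above (h for a hydrophobic or aromatic residue, X for any residue) can be translated into a regular expression for scanning sequences. This is a minimal sketch: the residue set assigned to h is an assumption made here, the helper names are hypothetical, and the test peptides are invented strings rather than fragments of HEN1.

```python
import re

# Assumed membership of the "h" class (hydrophobic or aromatic residues).
HYDROPHOBIC = "AVILMFWY"


def motif_regex(pattern):
    """Compile a motif such as 'EhhEHh' or 'XhhEHh' into a regular expression.

    'h' matches any residue in HYDROPHOBIC, 'X' matches any residue, and
    every other character is taken as a literal one-letter amino acid code.
    A lookahead is used so that overlapping occurrences are all reported.
    """
    parts = []
    for symbol in pattern:
        if symbol == "h":
            parts.append("[" + HYDROPHOBIC + "]")
        elif symbol == "X":
            parts.append(".")
        else:
            parts.append(re.escape(symbol))
    return re.compile("(?=(" + "".join(parts) + "))")


def find_motif(sequence, pattern):
    """Return (1-based position, matched peptide) for each occurrence."""
    return [(m.start() + 1, m.group(1))
            for m in motif_regex(pattern).finditer(sequence)]


# Invented peptides used only to exercise the pattern.
print(find_motif("GGELFEHVAA", "EhhEHh"))
print(find_motif("GGKLFEHVAA", "XhhEHh"))
```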
The presumed catalytic pocket is also formed by conserved residues from motifs VI and X. Interestingly, the N-terminus of motif X in HEN1 reveals an invariant Arg residue (R701), which is located in a position similar to the invariant Lys residue in "orthodox" ribose MTases (e.g. K41 in VP39 or K38 in RrmJ). On the other hand, the surface-exposed C-terminal end of motif VI (located on the β-strand next to motif IV) is characterized by the pattern TPNXE(F/Y)N, which bears no similarity to its counterparts in other MTase families. In particular, it does not contain a Lys residue conserved and essential for catalysis in "orthodox" ribose MTases (e.g. K175 in VP39), which is proposed to position the hydroxyl oxygen toward the cofactor methyl group [45, 46]. Thus, the K-D-K triad of residues from motifs X, IV, and VI found in "orthodox" ribose MTases [29, 45] is definitely not conserved in the HEN1 family, although there is a certain resemblance between the chemical character of invariant residues K175/D138 in VP39 and R701/E796 in HEN1.
Comparison of the putative active site of HEN1 with the ribose MTases from the SPOUT superfamily is even more difficult, as these proteins exhibit different folds and by definition do not share any homologous residues. The catalytic mechanism of SPOUT MTases is also much less understood than the mechanism of the RFM superfamily members, in part because of the lack of structural information on enzyme-substrate interactions. Nonetheless, several residues identified by the analysis of crystal structures and multiple sequence alignments have been found to be indispensable for the ribose MTase activity [47–49]. In particular, it has been proposed that the invariant Arg residue from one subunit in the SPOUT dimer (e.g. R145 in AviRb from S. viridochromogenes) is involved in steering the 2'-OH group of the target ribose towards the cofactor [48, 49], in analogy to K175 in VP39 . Here, we predict an analogous role also for R701 in HEN1. Important for the catalysis in SPOUT MTases are also two Asn residues (N139 and N262 in AviRb) that probably make contacts with the base of the methylated nucleoside. This role could be fulfilled by T826 and/or N828 from motif VI in HEN1.
In summary, we predict that the active site of HEN1 comprises R701, which orients the target hydroxyl group, and E796, which stabilizes the cofactor and/or aids in deprotonation of the attacking oxygen atom. Other invariant or highly conserved residues of HEN1 such as T826 and/or N828 may be involved in binding of other regions of the substrate miRNA molecule (Figure 8). These predictions can be tested by site-directed mutagenesis of the respective residues.
It is remarkable that the predicted catalytic pocket of HEN1 is different from the "K-D-K" active site triad of known ribose 2'-O-MTases from the RFM superfamily, e.g. VP39, RrmJ, or fibrillarin [29, 31, 45], even though these proteins share the three-dimensional fold with the HEN1 CTD. The active site of HEN1 (as well as of the RrmJ-related MTases) is of course also different from the active site of ribose 2'-O-MTases that belong to the unrelated SPOUT superfamily, e.g. TrmH. This suggests that ribose MTases evolved independently at least 3 times. Such independent origin of a particular type of MTase has been postulated also for enzymes that generate m7G in mRNA, rRNA, and tRNA [30, 50, 51], m1G in rRNA and different positions of tRNA [52–54], and m2G in different positions of tRNA [55, 56]. Thus, convergent evolution of the reaction specificity appears to be very frequent among RNA MTases. Unfortunately, crystal structures of enzyme-substrate complexes are not yet available for comparison of any of these apparent functional analogs among base MTases. Among ribose MTases, only a crystal structure of a VP39-RNA complex is available, which has served as a template for functional analyses of other members of the RrmJ/fibrillarin variety [29, 58], as well as unbound forms of structurally unrelated but functionally analogous enzymes from the SPOUT superfamily (e.g. [47, 48]). Hence, until a high-resolution structure of HEN1 or one of its homologs is obtained (preferably as a co-crystal with the RNA substrate), our model will serve as a convenient platform to study sequence-structure-function relationships in this enzyme and its relation to other MTases.
Our analyses reveal that HEN1 shares a number of structural features and most likely a closer phylogenetic origin with small-molecule MTases rather than with other known RNA MTases. The phylogeny of HEN1-CTD suggests that the ancestor of this protein family appeared already before the divergence of plants and animals/fungi, by duplication and subfunctionalization of a small-molecule MTase similar to UbiG. Perhaps the ancient HEN1-CTD has been transferred to Eukaryota by horizontal gene transfer from a bacterium.
It remains to be determined whether the additional region present only in the plant members of the HEN1 family and composed of the DSRM domain, La-like domain, unknown central domain, and PPI-like domain is essential for the MTase specificity for miRNAs, and what the exact role of the individual domains is. It is interesting to note that this extension is present only in HEN1 from A. thaliana and O. sativa and not in HEN1 orthologs from other organisms. It can be speculated that the DSRM and La-like domains may be responsible for substrate binding by the orthodox HEN1 from plants. So far, no miRNAs have been identified in Bacteria. Besides, no 2'- or 3'-methylation was found in miRNAs from C. elegans or D. melanogaster. This suggests that the non-plant orthologs of HEN1 may be involved in methylation of some other substrates, which is particularly relevant given that HEN1 has apparently evolved from small-molecule MTases. Functional characterization of the short HEN1 orthologs, especially identification of their preferred substrates, and mutagenesis of the putative RNA-binding domains of plant HEN1 delineated in this work may shed light on the evolution of specificity determinants in this interesting family of enzymes. It would be exciting to elucidate the evolutionary pathway leading from a small-molecule MTase to a nucleic acid MTase.
Sequence database searches
The BLAST family of algorithms [15, 59] was used to search the non-redundant version of current sequence databases (nr), the publicly available complete and incomplete genome sequences, and the EST (expressed sequence tag) database at the NCBI. Fragments of amino acid sequences (especially putative translations of the DNA sequences) were assembled into contiguous pieces using the sequence of A. thaliana HEN1 (GI 15638615) as a guide. The putative splicing sites were verified in reciprocal BLAST searches against the database comprising sequences of HEN1 homologs. All sequences were subsequently realigned using MUSCLE. Manual adjustments were introduced into the multiple sequence alignment (MSA) based on the BLAST pairwise comparisons, secondary structure prediction, and results of the fold-recognition analyses (see below).
The final alignments were used to generate a set of query profile HMMs using HHmake from the HHsearch package. The profile HMMs corresponding to all COG, KOG, PFAM, PDB70, and CDD entries were downloaded from the home site of HHsearch. Comparison of the profile HMMs (sequence+structure) was carried out using HHsearch, with default parameters.
To visualize pairwise similarities between and within protein families we used CLANS (CLuster ANalysis of Sequences), a Java utility that applies a version of the Fruchterman-Reingold graph layout algorithm. CLANS uses the P-values of high-scoring segment pairs (HSPs) obtained from an N × N BLAST search to compute attractive and repulsive forces between each sequence pair in a user-defined dataset. A three-dimensional representation is achieved by randomly seeding sequences in space. The sequences are then moved within this environment according to the force vectors resulting from all pairwise interactions, and the process is repeated to convergence.
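The layout principle described above can be sketched in a few lines. This is a simplified Fruchterman-Reingold-style procedure written for illustration, not the CLANS implementation: the mapping of P-values to attraction weights (-log10 P), the force laws, the cooling schedule and the fixed iteration count (used here in place of a convergence test) are all assumptions, and the function names are hypothetical.

```python
import math
import random


def attraction_weight(p_value, floor=1e-300):
    """Map a BLAST P-value to a non-negative weight (assumed -log10 scale)."""
    return max(0.0, -math.log10(max(p_value, floor)))


def force_layout(names, p_values, iterations=200, k=1.0, t0=0.1, seed=0, start=None):
    """Place sequences in 3D so that significantly similar pairs end up close.

    p_values maps frozenset({a, b}) to the P-value of the pair; pairs that
    are absent are treated as unrelated (P = 1, no attraction).
    start optionally maps each name to fixed initial coordinates.
    """
    rng = random.Random(seed)
    pos = {n: list(start[n]) if start else [rng.random() for _ in range(3)]
           for n in names}
    for step in range(iterations):
        temperature = t0 * (1.0 - step / iterations)  # caps the move per step
        disp = {n: [0.0, 0.0, 0.0] for n in names}
        for a in names:
            for b in names:
                if a == b:
                    continue
                delta = [pa - pb for pa, pb in zip(pos[a], pos[b])]
                dist = max(math.sqrt(sum(c * c for c in delta)), 1e-9)
                weight = attraction_weight(p_values.get(frozenset((a, b)), 1.0))
                # Repulsion (k^2 / d) pushes a away from b;
                # attraction (w * d^2 / k) pulls a towards b.
                magnitude = k * k / dist - weight * dist * dist / k
                for axis in range(3):
                    disp[a][axis] += delta[axis] / dist * magnitude
        for n in names:
            length = math.sqrt(sum(c * c for c in disp[n]))
            if length > 0.0:
                scale = min(length, temperature) / length
                for axis in range(3):
                    pos[n][axis] += disp[n][axis] * scale
    return pos


# Toy dataset: two related pairs and no similarity between the pairs.
toy = {frozenset(("a", "b")): 1e-50, frozenset(("c", "d")): 1e-40}
for name, xyz in force_layout(["a", "b", "c", "d"], toy).items():
    print(name, [round(c, 2) for c in xyz])
```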
The refined multiple sequence alignment was used to calculate the phylogenetic tree of the HEN1 family using SplitsTree, which employs the split decomposition model developed by Bandelt and Dress. The number of amino acid replacements per sequence position in the alignment was estimated using the JTT model. To determine the sampling variance of the distance values, 1000 bootstrap resamplings of the alignment columns were performed.
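Bootstrap resampling of alignment columns, as mentioned above, draws columns with replacement until a pseudo-alignment of the original width is obtained. The following is a minimal sketch of that step only, with a toy two-sequence alignment and hypothetical function names; distance estimation and tree building on each replicate are not shown.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    alignment maps sequence names to aligned rows of equal length; the
    same column indices are applied to every row, so columns stay intact.
    """
    lengths = {len(row) for row in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    columns = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(row[c] for c in columns)
            for name, row in alignment.items()}


def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Generate n_replicates pseudo-alignments from one seeded generator."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Toy alignment; real input would be the refined HEN1 family alignment.
toy_alignment = {"seq1": "ACDE", "seq2": "AC-E"}
replicates = bootstrap_replicates(toy_alignment, n_replicates=1000)
print(len(replicates), replicates[0])
```

The spread of a distance value across the replicates then gives an estimate of its sampling variance.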
Protein structure prediction
Prediction of domains in the primary structure was carried out using the NCBI Conserved Domain Search utility. Prediction of secondary structure, protein order/disorder, solvent accessibility, and tertiary fold recognition was carried out via the GeneSilico meta-server gateway (see  and  for details). Secondary structure was predicted using PSIPRED, PROFsec, PROF, SABLE, JNET, JUFO, and SAM-T02. Protein disorder was predicted using PONDR and DISOPRED. Solvent accessibility for the individual residues was predicted with SABLE and JPRED. The fold-recognition analysis (an attempt to match the query sequence to known protein structures) was carried out using FFAS03, SAM-T02, 3DPSSM, INBGU, FUGUE, mGENTHREADER, and SPARKS. Fold-recognition alignments reported by these methods were compared, evaluated, and ranked by the Pcons server.
The alignments between the sequence of HEN1 and the structures of selected templates (members of the RFM fold identified by Pcons) were used as a starting point for modeling of the HEN1 CTD tertiary structure using the "FRankenstein's Monster" approach, comprising cycles of model building by MODELLER, evaluation by VERIFY3D via the COLORADO3D server, realignment in poorly scored regions and merging of best scoring fragments. The positions of predicted catalytic residues and secondary structure elements were used as spatial restraints. This strategy has previously helped us to build accurate, experimentally validated models of other RNA MTases, including 16S tRNA:2'-OH MTase Trm7p, sno/snRNA:cap hypermethylase Tgs1, tRNA:m5C MTase Trm4p, tRNA:m1A MTase TrmI, tRNA:m2G MTases from Archaea and Eukaryota, and tRNA:m7G MTase TrmB. Here, the refined comparative model comprised regions that could not be aligned to any of the templates and obtained unacceptably low scores in all models. Thus, they were re-modeled using a mixed "comparative/de novo" protocol, which has been successfully applied in the recent CASP6 competition to accurately model a variety of different proteins.
De novo modeling
The insertion between motifs VI and VII (aa 829–858) was modeled de novo using ROSETTA in the context of the rest of the HEN1 CTD modeled by homology (and kept invariant during modeling of the insertion). Briefly, fragment selection based on profile-profile and secondary structure comparison with the ROSETTA database was performed, and 3- and 9-residue fragment lists were generated for the re-modeled regions. Fragment assembly was performed with default options and a medium level of side-chain rotamer optimization. The set of 8000 preliminary models (decoys) was clustered and representatives of the 5 largest clusters were selected as the final structures.
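The final selection step (representatives of the largest decoy clusters) amounts to grouping decoys by cluster label and ranking the groups by size. The sketch below assumes that cluster assignments are already available as (decoy, cluster) pairs; the identifiers are invented, the function name is hypothetical, and taking the first listed member as the representative is a placeholder, since the text does not state how representatives were chosen.

```python
from collections import defaultdict


def largest_cluster_representatives(assignments, n_clusters=5):
    """Return (cluster id, representative decoy, cluster size) for the largest clusters.

    assignments: iterable of (decoy_id, cluster_id) pairs.
    The representative is simply the first decoy listed for the cluster,
    which stands in for whatever criterion a real protocol would apply.
    """
    clusters = defaultdict(list)
    for decoy_id, cluster_id in assignments:
        clusters[cluster_id].append(decoy_id)
    # Largest clusters first; ties are broken by cluster id for determinism.
    ranked = sorted(clusters.items(),
                    key=lambda item: (-len(item[1]), str(item[0])))
    return [(cluster_id, members[0], len(members))
            for cluster_id, members in ranked[:n_clusters]]


# Invented assignments for six decoys in three clusters.
example = [("d1", "c1"), ("d2", "c2"), ("d3", "c2"),
           ("d4", "c3"), ("d5", "c3"), ("d6", "c3")]
print(largest_cluster_representatives(example, n_clusters=2))
```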
List of abbreviations
DCL1: Dicer-like protein 1
DSRM: double-stranded RNA-binding motif
product of an open reading frame
PPIase: peptidyl prolyl cis-trans isomerase
This analysis was funded by the E.U. 6th Framework Programme (grant LSHG-CT-2003-503238). JMB was additionally supported by the EMBO/HHMI Young Investigator Award.
- Chen X: microRNA biogenesis and function in plants. FEBS Lett. 2005, 579: 5923-5931. 10.1016/j.febslet.2005.07.071.View ArticlePubMed
- Millar AA, Waterhouse PM: Plant and animal microRNAs: similarities and differences. Funct Integr Genomics. 2005, 5: 129-135. 10.1007/s10142-005-0145-2.View ArticlePubMed
- Sullivan CS, Ganem D: MicroRNAs and viral infection. Mol Cell. 2005, 20: 3-7. 10.1016/j.molcel.2005.09.012.View ArticlePubMed
- Bartel DP: MicroRNAs: genomics, biogenesis, mechanism, and function. Cell. 2004, 116: 281-297. 10.1016/S0092-8674(04)00045-5.View ArticlePubMed
- Kim VN: MicroRNA biogenesis: coordinated cropping and dicing. Nat Rev Mol Cell Biol. 2005, 6: 376-385. 10.1038/nrm1644.View ArticlePubMed
- Ambros V: The functions of animal microRNAs. Nature. 2004, 431: 350-355. 10.1038/nature02871.View ArticlePubMed
- Gregory RI, Shiekhattar R: MicroRNA biogenesis and cancer. Cancer Res. 2005, 65: 3509-3512. 10.1158/0008-5472.CAN-05-0298.View ArticlePubMed
- Kidner CA, Martienssen RA: The developmental role of microRNA in plants. Curr Opin Plant Biol. 2005, 8: 38-44. 10.1016/j.pbi.2004.11.008.View ArticlePubMed
- Chen X, Liu J, Cheng Y, Jia D: HEN1 functions pleiotropically in Arabidopsis development and acts in C function in the flower. Development. 2002, 129: 1085-1094. 10.1242/dev.00114.
- Park W, Li J, Song R, Messing J, Chen X: CARPEL FACTORY, a Dicer homolog, and HEN1, a novel protein, act in microRNA metabolism in Arabidopsis thaliana. Curr Biol. 2002, 12: 1484-1495. 10.1016/S0960-9822(02)01017-5.
- Yu B, Yang Z, Li J, Minakhina S, Yang M, Padgett RW, Steward R, Chen X: Methylation as a crucial step in plant microRNA biogenesis. Science. 2005, 307: 932-935. 10.1126/science.1107130.
- Li J, Yang Z, Yu B, Liu J, Chen X: Methylation protects miRNAs and siRNAs from a 3'-end uridylation activity in Arabidopsis. Curr Biol. 2005, 15: 1501-1507. 10.1016/j.cub.2005.07.029.
- Dunin-Horkawicz S, Czerwoniec A, Gajda MJ, Feder M, Grosjean H, Bujnicki JM: MODOMICS: a database of RNA modification pathways. Nucleic Acids Res. 2006, 34: D145-9. 10.1093/nar/gkj084.
- Anantharaman V, Koonin EV, Aravind L: Comparative genomics and evolution of proteins involved in RNA metabolism. Nucleic Acids Res. 2002, 30: 1427-1464. 10.1093/nar/30.7.1427.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410. 10.1006/jmbi.1990.9999.
- Marchler-Bauer A, Anderson JB, DeWeese-Scott C, Fedorova ND, Geer LY, He S, Hurwitz DI, Jackson JD, Jacobs AR, Lanczycki CJ, Liebert CA, Liu C, Madej T, Marchler GH, Mazumder R, Nikolskaya AN, Panchenko AR, Rao BS, Shoemaker BA, Simonyan V, Song JS, Thiessen PA, Vasudevan S, Wang Y, Yamashita RA, Yin JJ, Bryant SH: CDD: a curated Entrez database of conserved domain alignments. Nucleic Acids Res. 2003, 31: 383-387. 10.1093/nar/gkg087.
- Kurowski MA, Bujnicki JM: GeneSilico protein structure prediction meta-server. Nucleic Acids Res. 2003, 31: 3305-3307. 10.1093/nar/gkg557.
- Stefano JE: Purified lupus antigen La recognizes an oligouridylate stretch common to the 3' termini of RNA polymerase III transcripts. Cell. 1984, 36: 145-154. 10.1016/0092-8674(84)90083-7.
- Gothel SF, Marahiel MA: Peptidyl-prolyl cis-trans isomerases, a superfamily of ubiquitous folding catalysts. Cell Mol Life Sci. 1999, 55: 423-436. 10.1007/s000180050299.
- Kuzuhara T, Horikoshi M: A nuclear FK506-binding protein is a histone chaperone regulating rDNA silencing. Nat Struct Mol Biol. 2004, 11: 275-283. 10.1038/nsmb733.
- Stebbins CE, Borukhov S, Orlova M, Polyakov A, Goldfarb A, Darst SA: Crystal structure of the GreA transcript cleavage factor from Escherichia coli. Nature. 1995, 373: 636-640. 10.1038/373636a0.
- Laptenko O, Lee J, Lomakin I, Borukhov S: Transcript cleavage factors GreA and GreB act as transient catalytic components of RNA polymerase. Embo J. 2003, 22: 6322-6334. 10.1093/emboj/cdg610.
- Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32: 1792-1797. 10.1093/nar/gkh340.
- Soding J: Protein homology detection by HMM-HMM comparison. Bioinformatics. 2005, 21: 951-960. 10.1093/bioinformatics/bti125.
- Tatusov RL, Galperin MY, Natale DA, Koonin EV: The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 2000, 28: 33-36. 10.1093/nar/28.1.33.
- Frickey T, Lupas A: CLANS: a Java application for visualizing protein families based on pairwise similarity. Bioinformatics. 2004, 20: 3702-3704.
- Bandelt HJ, Dress AW: Split decomposition: a new and useful approach to phylogenetic analysis of distance data. Mol Phylogenet Evol. 1992, 1: 242-252. 10.1016/1055-7903(92)90021-8.
- Bujnicki JM, Rychlewski L: Reassignment of specificities of two cap methyltransferase domains in the reovirus lambda 2 protein. Genome Biol. 2001, 2: RESEARCH0038. 10.1186/gb-2001-2-9-research0038.
- Bujnicki JM, Rychlewski L: Sequence analysis and structure prediction of aminoglycoside-resistance 16S rRNA:m7G methyltransferases. Acta Microbiol Pol. 2001, 50: 7-17.
- Feder M, Pas J, Wyrwicz LS, Bujnicki JM: Molecular phylogenetics of the RrmJ/fibrillarin superfamily of ribose 2'-O-methyltransferases. Gene. 2003, 302: 129-138. 10.1016/S0378-1119(02)01097-1.
- Rychlewski L, Jaroszewski L, Li W, Godzik A: Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Sci. 2000, 9: 232-241.
- Jones DT: GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences. J Mol Biol. 1999, 287: 797-815. 10.1006/jmbi.1999.2583.
- Zhou H, Zhou Y: Single-body residue-level knowledge-based energy score combined with sequence-profile and secondary structure information for fold recognition. Proteins. 2004, 55: 1005-1013. 10.1002/prot.20007.
- Lundstrom J, Rychlewski L, Bujnicki JM, Elofsson A: Pcons: a neural-network-based consensus predictor that improves fold recognition. Protein Sci. 2001, 10: 2354-2362. 10.1110/ps.08501.
- Kosinski J, Cymerman IA, Feder M, Kurowski MA, Sasin JM, Bujnicki JM: A "FRankenstein's monster" approach to comparative modeling: merging the finest fragments of Fold-Recognition models and iterative model refinement aided by 3D structure evaluation. Proteins. 2003, 53 Suppl 6: 369-379. 10.1002/prot.10545.
- Ward JJ, McGuffin LJ, Bryson K, Buxton BF, Jones DT: The DISOPRED server for the prediction of protein disorder. Bioinformatics. 2004, 20: 2138-2139. 10.1093/bioinformatics/bth195.
- Peng K, Vucetic S, Radivojac P, Brown CJ, Dunker AK, Obradovic Z: Optimizing long intrinsic disorder predictors with protein evolutionary information. J Bioinform Comput Biol. 2005, 3: 35-60. 10.1142/S0219720005000886.
- Sasin JM, Bujnicki JM: COLORADO3D, a web server for the visual analysis of protein structures. Nucleic Acids Res. 2004, 32: W586-9. 10.1093/nar/gkh032.
- Simons KT, Kooperberg C, Huang E, Baker D: Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J Mol Biol. 1997, 268: 209-225. 10.1006/jmbi.1997.0959.
- Rohl CA, Strauss CE, Chivian D, Baker D: Modeling structurally variable regions in homologous proteins with ROSETTA. Proteins. 2004, 55: 656-677. 10.1002/prot.10629.
- Fauman EB, Blumenthal RM, Cheng X: Structure and evolution of AdoMet-dependent methyltransferases. S-Adenosylmethionine-dependent methyltransferases: structures and functions. Edited by: Cheng X and Blumenthal RM. 1999, NJ, World Scientific Publishing, 1-38.
- Glaser F, Pupko T, Paz I, Bell RE, Bechor-Shental D, Martz E, Ben-Tal N: ConSurf: identification of functional regions in proteins by surface-mapping of phylogenetic information. Bioinformatics. 2003, 19: 163-164. 10.1093/bioinformatics/19.1.163.
- Bujnicki JM: Phylogenomic analysis of 16S rRNA:(guanine-N2) methyltransferases suggests new family members and reveals highly conserved motifs and a domain structure similar to other nucleic acid amino-methyltransferases. Faseb J. 2000, 14: 2365-2368. 10.1096/fj.00-0076com.
- Hager J, Staker BL, Bugl H, Jakob U: Active site in RrmJ, a heat shock-induced methyltransferase. J Biol Chem. 2002, 277: 41978-41986. 10.1074/jbc.M205423200.
- Li C, Xia Y, Gao X, Gershon PD: Mechanism of RNA 2'-O-methylation: evidence that the catalytic lysine acts to steer rather than deprotonate the target nucleophile. Biochemistry. 2004, 43: 5680-5687. 10.1021/bi0359980.
- Nureki O, Watanabe K, Fukai S, Ishii R, Endo Y, Hori H, Yokoyama S: Deep knot structure for construction of active site and cofactor binding site of tRNA modification enzyme. Structure (Camb). 2004, 12: 593-602. 10.1016/j.str.2004.03.003.
- Mosbacher TG, Bechthold A, Schulz GE: Structure and function of the antibiotic resistance-mediating methyltransferase AviRb from Streptomyces viridochromogenes. J Mol Biol. 2005, 345: 535-545. 10.1016/j.jmb.2004.10.051.
- Watanabe K, Nureki O, Fukai S, Ishii R, Okamoto H, Yokoyama S, Endo Y, Hori H: Roles of conserved amino acid sequence motifs in the SpoU (TrmH) RNA methyltransferase family. J Biol Chem. 2005, 280: 10368-10377. 10.1074/jbc.M411209200.
- Bujnicki JM, Feder M, Radlinska M, Rychlewski L: mRNA:guanine-N7 cap methyltransferases: identification of novel members of the family, evolutionary analysis, homology modeling, and analysis of sequence-structure-function relationships. BMC Bioinformatics. 2001, 2: 2. 10.1186/1471-2105-2-2.
- Purta E, van Vliet F, Tricot C, De Bie LG, Feder M, Skowronek K, Droogmans L, Bujnicki JM: Sequence-structure-function relationships of a tRNA (m(7)G46) methyltransferase studied by homology modeling and site-directed mutagenesis. Proteins. 2005, 59: 482-488. 10.1002/prot.20454.
- Bujnicki JM, Blumenthal RM, Rychlewski L: Sequence analysis and structure prediction of 23S rRNA:m1G methyltransferases reveals a conserved core augmented with a putative Zn-binding domain in the N-terminus and family-specific elaborations in the C-terminus. J Mol Microbiol Biotechnol. 2002, 4: 93-99.
- Jackman JE, Montange RK, Malik HS, Phizicky EM: Identification of the yeast gene encoding the tRNA m1G methyltransferase responsible for modification at position 9. Rna. 2003, 9: 574-585. 10.1261/rna.5070303.
- Christian T, Evilia C, Williams S, Hou YM: Distinct origins of tRNA(m1G37) methyltransferase. J Mol Biol. 2004, 339: 707-719. 10.1016/j.jmb.2004.04.025.
- Bujnicki JM, Leach RA, Debski J, Rychlewski L: Bioinformatic analyses of the tRNA: (guanine:26, N2,N2)-dimethyltransferase (Trm1) family. J Mol Microbiol Biotechnol. 2002, 4: 405-415.
- Armengaud J, Urbonavicius J, Fernandez B, Chaussinand G, Bujnicki JM, Grosjean H: N2-methylation of guanosine at position 10 in tRNA is catalyzed by a THUMP domain-containing, S-adenosylmethionine-dependent methyltransferase, conserved in Archaea and Eukaryota. J Biol Chem. 2004, 279: 37142-37152. 10.1074/jbc.M403845200.
- Hodel AE, Gershon PD, Quiocho FA: Structural basis for sequence-nonspecific recognition of 5'-capped mRNA by a cap-modifying enzyme. Mol Cell. 1998, 1: 443-447. 10.1016/S1097-2765(00)80044-1.
- Bugl H, Fauman EB, Staker BL, Zheng F, Kushner SR, Saper MA, Bardwell JC, Jakob U: RNA methylation under heat shock control. Mol Cell. 2000, 6: 349-360. 10.1016/S1097-2765(00)00035-6.
- Altschul SF, Lipman DJ: Protein database searches for multiple alignments. Proc Natl Acad Sci U S A. 1990, 87: 5509-5513.
- Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Church DM, DiCuccio M, Edgar R, Federhen S, Helmberg W, Kenton DL, Khovayko O, Lipman DJ, Madden TL, Maglott DR, Ostell J, Pontius JU, Pruitt KD, Schuler GD, Schriml LM, Sequeira E, Sherry ST, Sirotkin K, Starchenko G, Suzek TO, Tatusov R, Tatusova TA, Wagner L, Yaschenko E: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2005, 33: D39-45. 10.1093/nar/gki062.
- Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA: The COG database: an updated version includes eukaryotes. BMC Bioinformatics. 2003, 4: 41. 10.1186/1471-2105-4-41.
- Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer EL, Studholme DJ, Yeats C, Eddy SR: The Pfam protein families database. Nucleic Acids Res. 2004, 32: D138-41. 10.1093/nar/gkh121.
- Deshpande N, Addess KJ, Bluhm WF, Merino-Ott JC, Townsend-Merino W, Zhang Q, Knezevich C, Xie L, Chen L, Feng Z, Green RK, Flippen-Anderson JL, Westbrook J, Berman HM, Bourne PE: The RCSB Protein Data Bank: a redesigned query system and relational database based on the mmCIF schema. Nucleic Acids Res. 2005, 33 Database Issue: D233-7.
- The home site of HHsearch at the Department of Developmental Biology (MPI): http://protevo.eb.tuebingen.mpg.de/toolkit/index.php?view=hhpred.
- Huson DH: SplitsTree: analyzing and visualizing evolutionary data. Bioinformatics. 1998, 14: 68-73. 10.1093/bioinformatics/14.1.68.
- Jones DT, Taylor WR, Thornton JM: The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci. 1992, 8: 275-282.
- NCBI CDS website: http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi.
- GeneSilico protein structure prediction MetaServer website: http://genesilico.pl/meta/.
- McGuffin LJ, Bryson K, Jones DT: The PSIPRED protein structure prediction server. Bioinformatics. 2000, 16: 404-405. 10.1093/bioinformatics/16.4.404.
- Rost B, Yachdav G, Liu J: The PredictProtein server. Nucleic Acids Res. 2004, 32: W321-6.
- Ouali M, King RD: Cascaded multiple classifiers for secondary structure prediction. Protein Sci. 2000, 9: 1162-1176.
- Adamczak R, Porollo A, Meller J: Accurate prediction of solvent accessibility using neural networks-based regression. Proteins. 2004, 56: 753-767. 10.1002/prot.20176.
- Cuff JA, Barton GJ: Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins. 2000, 40: 502-511. 10.1002/1097-0134(20000815)40:3<502::AID-PROT170>3.0.CO;2-Q.
- Meiler J, Baker D: Coupled prediction of protein secondary and tertiary structure. Proc Natl Acad Sci U S A. 2003, 100: 12105-12110. 10.1073/pnas.1831973100.
- Karplus K, Karchin R, Draper J, Casper J, Mandel-Gutfreund Y, Diekhans M, Hughey R: Combining local-structure, fold-recognition, and new fold methods for protein structure prediction. Proteins. 2003, 53 Suppl 6: 491-496. 10.1002/prot.10540.
- Romero P, Obradovic Z, Dunker AK: Natively disordered proteins: functions and predictions. Appl Bioinformatics. 2004, 3: 105-113. 10.2165/00822942-200403020-00005.
- Cuff JA, Clamp ME, Siddiqui AS, Finlay M, Barton GJ: JPred: a consensus secondary structure prediction server. Bioinformatics. 1998, 14: 892-893. 10.1093/bioinformatics/14.10.892.
- Kelley LA, MacCallum RM, Sternberg MJ: Enhanced genome annotation using structural profiles in the program 3D-PSSM. J Mol Biol. 2000, 299: 499-520. 10.1006/jmbi.2000.3741.
- Fischer D: Hybrid fold recognition: combining sequence derived properties with evolutionary information. Pacific Symp Biocomp. 2000, 119-130.
- Shi J, Blundell TL, Mizuguchi K: FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J Mol Biol. 2001, 310: 243-257. 10.1006/jmbi.2001.4762.
- Fiser A, Sali A: Modeller: generation and refinement of homology-based protein structure models. Methods Enzymol. 2003, 374: 461-491.
- Luthy R, Bowie JU, Eisenberg D: Assessment of protein models with three-dimensional profiles. Nature. 1992, 356: 83-85. 10.1038/356083a0.
- Pintard L, Lecointe F, Bujnicki JM, Bonnerot C, Grosjean H, Lapeyre B: Trm7p catalyses the formation of two 2'-O-methylriboses in yeast tRNA anticodon loop. Embo J. 2002, 21: 1811-1820. 10.1093/emboj/21.7.1811.
- Mouaikel J, Bujnicki JM, Tazi J, Bordonne R: Sequence-structure-function relationships of Tgs1, the yeast snRNA/snoRNA cap hypermethylase. Nucleic Acids Res. 2003, 31: 4899-4909. 10.1093/nar/gkg656.
- Bujnicki JM, Feder M, Ayres CL, Redman KL: Sequence-structure-function studies of tRNA:m5C methyltransferase Trm4p and its relationship to DNA:m5C and RNA:m5U methyltransferases. Nucleic Acids Res. 2004, 32: 2453-2463. 10.1093/nar/gkh564.
- Roovers M, Wouters J, Bujnicki JM, Tricot C, Stalon V, Grosjean H, Droogmans L: A primordial RNA modification enzyme: the case of tRNA (m1A) methyltransferase. Nucleic Acids Res. 2004, 32: 465-476. 10.1093/nar/gkh191.
- Purushothaman SK, Bujnicki JM, Grosjean H, Lapeyre B: Trm11p and Trm112p are both required for the formation of 2-methylguanosine at position 10 in yeast tRNA. Mol Cell Biol. 2005, 25: 4359-4370. 10.1128/MCB.25.11.4359-4370.2005.
- Kosinski J, Gajda MJ, Cymerman IA, Kurowski MA, Pawlowski M, Boniecki M, Obarska A, Papaj G, Sroczynska-Obuchowicz P, Tkaczuk KL, Sniezynska P, Sasin JM, Augustyn A, Bujnicki JM, Feder M: FRankenstein becomes a cyborg: the automatic recombination and realignment of Fold-Recognition models in CASP6. Proteins. 2005, 61 Suppl 7: 106-113. 10.1002/prot.20726.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.