Evolutionary divergence of chloroplast FAD synthetase proteins
© Yruela et al. 2010
Received: 30 April 2010
Accepted: 18 October 2010
Published: 18 October 2010
Flavin adenine dinucleotide synthetases (FADSs) - a group of bifunctional enzymes that carry out the dual functions of riboflavin phosphorylation to produce flavin mononucleotide (FMN) and its subsequent adenylation to generate FAD in most prokaryotes - were studied in plants in terms of sequence, structure and evolutionary history.
Using a variety of bioinformatics methods we have found that FADS enzymes localized to the chloroplasts, which we term plant-like FADS proteins, are distributed across a variety of green plant lineages and constitute a divergent protein family clearly of cyanobacterial origin. The C-terminal module of these enzymes does not contain the typical riboflavin kinase active site sequence, while the N-terminal module is broadly conserved. These results agree with previous work reported by Sandoval et al. in 2008. Furthermore, our observations and preliminary experimental results indicate that the C-terminus of plant-like FADS proteins may have a catalytic activity, albeit one different from that of their prokaryotic counterparts. In fact, homology models predict that plant-specific conserved residues constitute a distinct active site in the C-terminus.
A structure-based sequence alignment and an in-depth evolutionary survey of FADS proteins, thought to be crucial in plant metabolism, are reported, which will be essential for the correct annotation of plant genomes and further structural and functional studies. This work is a contribution to our understanding of the evolutionary history of plant-like FADS enzymes, which constitute a new family of FADS proteins whose C-terminal module might be involved in a distinct catalytic activity.
Flavin mononucleotide (FMN) and flavin adenine dinucleotide (FAD) are essential cofactors for numerous enzymes (e.g., dehydrogenases, oxidases, reductases) that participate in one- and two-electron oxidation-reduction processes critical to the major metabolic routes in all living organisms [1–4]. Riboflavin (RF), the precursor of FMN and FAD, can be synthesized de novo by plants, fungi and many bacteria, but in mammals the only known RF source is exogenous riboflavin (vitamin B2) obtained through the diet [5–7].
In most prokaryotes, the synthesis of FMN and FAD from RF and ATP is catalyzed by a single bifunctional enzyme, usually known as FAD synthetase (FADS), through the sequential action of its two enzymatic activities: an ATP:riboflavin kinase (RFK, EC 2.7.1.26) that transforms RF and ATP into FMN, and an ATP:FMN adenylyltransferase (FMNAT, EC 2.7.7.2) that catalyzes the subsequent adenylylation of FMN to FAD. Thus, FADS is a bifunctional RFK/FMNAT enzyme. FADSs are typically 310-340 residues in length and are folded into two modules [9–11], each one mainly involved in one of the activities. The RFK reaction has been associated with the C-terminal module (RFK-module), while the N-terminal module is mainly related to the FMNAT activity (FMNAT-module); hence, each activity has its own independent substrate-binding and catalytic site [11, 12]. On the one hand, the RFK-module (~ 180 aa) folds into a globular domain whose overall topology is similar to that found in the RFKs from Homo sapiens (HsRFK) and Schizosaccharomyces pombe (SpRFK), with differences observed only in the loops connecting secondary structure elements [13, 14]. Furthermore, the substrate binding motifs PTAN and GxY of the RFK-module are conserved among FADSs and eukaryotic RFKs. On the other hand, the FMNAT-module consists of an α/β dinucleotide binding domain with a typical Rossmann fold topology (~ 150 aa) [9–11]. Moreover, it seems to be remotely similar to the nucleotidyltransferase (NT) superfamily and contains the typical (H/T)xGH and xSST/SxxR motifs involved in binding nucleotide and phosphate groups. Interestingly, monofunctional enzymes with only RFK activity have been described in Bacillus subtilis and Streptococcus agalactiae, but no monofunctional FMNAT enzymes have been reported in prokaryotes.
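The motifs named above can be written as regular expressions for a quick scan of a candidate sequence. The following is a minimal sketch rather than part of the reported analysis: the regex translations (with "x" read as any residue), the function name `find_motifs` and the toy sequence are all assumptions made for illustration, and the xSST/SxxR motif is left out because its notation is ambiguous in plain text. A match to such short patterns is, on its own, only weak evidence of function.

```python
import re

# Motifs from the text, written as regexes ("x" = any residue -> ".").
MOTIFS = {
    "(H/T)xGH": re.compile(r"[HT].GH"),
    "PTAN": re.compile(r"PTAN"),
    "GxY": re.compile(r"G.Y"),
}

def find_motifs(seq):
    """Return {motif name: [1-based start positions]} for one protein sequence."""
    seq = seq.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead lets overlapping occurrences be reported as well.
        lookahead = re.compile(f"(?={pattern.pattern})")
        hits[name] = [m.start() + 1 for m in lookahead.finditer(seq)]
    return hits

# Hypothetical toy sequence, for illustration only (not a real FADS).
toy = "MKAHYGHLLSAAPTANLKVGEYAA"
print(find_motifs(toy))
```

Positions are reported 1-based so that they can be compared directly with residue numbering in an alignment.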
A different scenario is found in eukaryotes, where both activities are generally split in two different enzymes with either RFK or FMNAT activity [17–20]. As mentioned above, the RFK enzymes show sequence and structure similarity to the RFK-module of prokaryotic FADS [13, 14]. However, eukaryotic FMNATs share little or no sequence similarity to the FMNAT-module of FADS, as these enzymes belong to two different protein superfamilies, which are thought to require different sets of active-site residues to carry out the same chemistry [21–23]. The eukaryotic FMNAT-module is currently classified as a member of the 3'-phosphoadenosine 5'-phosphosulfate (PAPS) reductase-like family belonging to the "adenosine nucleotide α-hydrolase-like" superfamily, which has motifs different from those of NTs.
In plants only a few efforts have been devoted to this system. Early studies characterized apparently monofunctional enzymes with either RFK or FMNAT activities in several plant species [24–27]. Those studies did not address the subcellular localization of RFK and FADS, although it is known that plants use flavin nucleotides in mitochondria, plastids and the cytosol. In an earlier work RFK activity was associated with the cytosol and with an organellar fraction containing chloroplasts and mitochondria. More recently, a bifunctional enzyme with both FMN hydrolase and RFK activities has been described in Arabidopsis thaliana (AtFHy/RFK), whose N-terminal module, responsible for the FMN hydrolase activity, shares sequence similarity with members of the haloacid dehalogenase (HAD) superfamily. The AtFHy/RFK enzyme was predicted to be cytosolic. Additionally, two more enzymes with FMNAT activity have been identified, cloned and characterized in the same species. These enzymes, AtRibF1 and AtRibF2, herein termed plant-like FADS proteins, have an N-terminal module that is homologous to the FMNAT-module of FADS, but their C-terminal module does not catalyze RF phosphorylation. AtRibF1 and AtRibF2 were localized to the chloroplast. In mitochondria, the catalytic conversion of RF into FMN and FAD has been reported, owing to the existence of mitochondrial RFK and FADS enzymes, although FADS activity was much lower than in chloroplasts. These results agree with the cited confocal microscopy studies, but the localization of FADS isoforms (AtRibF1 and AtRibF2) in mitochondria cannot be ruled out on the basis of bioinformatics (TAIR) analysis. The mitochondrial FAD-forming activities reside in two distinct monofunctional enzymes, which can be separated into soluble and membrane-enriched fractions. It is worth mentioning that the genes encoding organellar RFK activity remain unidentified.
In order to investigate RFK and FMNAT activities in plants we have conducted an extensive bioinformatics survey using the available genomes in public databases. Here we report the identification of a conserved C-terminal module in plant FADS enzymes, which does not contain the typical RFK active site sequence, suggesting that it belongs to a new family of FADS proteins. The activity of this module is discussed.
Table 1. Bacterial genomes containing FADS-like proteins (1)
Type of protein
Chthoniobacter flavus Ellin428 ctg76
Mesoplasma florum L1
Mycoplasma capricolum subsp. capricolum
Mycoplasma mycoides subsp. capricolum
Most bacterial genomes
Bacillus cereus 03BB102
Bacillus cereus ATCC 10987
Bacillus thuringiensis str. Al Hakam
Geobacillus thermodenitrificans NG80-2
Listeria monocytogenes EGD-e
Bacillus subtilis subsp. subtilis
Haemophilus influenzae 86-028NP
Arthrobacter chlorophenolicus A6
Bacillus cereus subsp. cytotoxis NVH 391-98
Lactobacillus plantarum JDM1
Lactobacillus plantarum WCFS1
Listeria monocytogenes HCC23
Listeria welshimeri serovar
Oceanobacillus iheyensis HTE831
Treponema denticola ATCC 35405
Eubacterium saphenum ATCC 49989
Treponema pallidum subsp. pallidum str. Nichols
Alistipes putredinis DSM 17216
Furthermore, a few bacterial parasites and pathogens isolated from plant, human or soil material and belonging to the phyla Firmicutes, Actinobacteria, Tenericutes and Spirochaetes contain extra sequences with significant similarity to FADS-type I (E-values ≤ 1.5×10⁻¹⁰). However, as shown in Figures 1 and 2, these sequences do not conserve the catalytic PTAN motif and have shorter C-terminal modules, similar in length to those of plant-like FADS, suggesting that they might constitute another divergent type of FADS, which we label FADS-type II (see Table 1).
Table 2. Eukaryotic genomes containing FADS-like proteins (1)
Type of protein
Caenorhabditis japonica strain DF5081
Caenorhabditis remanei strain PB4641
Culex pipiens quinquefasciatus
Most eukaryotic genomes
Micromonas sp. RCC299
Land plant genomes
Moreover, these results reveal that most bacteria containing FADS-type II sequences also have typical FADS-type I proteins (see Table 1), and the tree in Figure 3 shows that these two types of sequences cluster together, implying that they might actually be paralogous genes. Only the genomes of Eubacterium saphenum ATCC 49989, Mycoplasma conjunctivae and Treponema pallidum subsp. pallidum contain exclusively FADS-type II proteins. Although the tree does not support FADS-type II proteins as a distinct evolutionary class, their shorter and non-conserved C-terminal domains still set them apart as a distinct functional group, which might have lost the C-terminal activity typical of FADS-type I proteins.
We also note the observed variability in terms of RFK and FMNAT enzymatic activities in bacterial genomes. While most prokaryotes have a single copy of a typical FADS-type I sequence, in 4 species the two enzymatic activities are separated into monofunctional proteins, which correspond to RFK or FMNAT modules, respectively. In other cases the FADS-type I sequence is accompanied by either a monofunctional prokaryotic FMNAT (5 genomes) or a monofunctional prokaryotic RFK (2 genomes). For instance, the genome of Alistipes putredinis contains both a monofunctional RFK and a FADS-type II sequence. Furthermore, although most bacterial FADS and RFK proteins include the conserved PTAN motif, some sequence variants can be found, including PTLK, PTLN, PTIN and KTAN, which nevertheless conserve the C-terminal module length. As these genomes do not contain any other RFK-related proteins, these sequence variants are presumed to remain responsible for the RFK activity.
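To make the relationship between the canonical PTAN motif and the variants listed above explicit, the sketch below tabulates the substitutions position by position. It is an illustration only, not part of the reported analysis; the function name `substitutions` is an assumption introduced here.

```python
CANONICAL = "PTAN"
VARIANTS = ["PTLK", "PTLN", "PTIN", "KTAN"]  # variants listed in the text

def substitutions(variant, reference=CANONICAL):
    """List (1-based position, reference residue, variant residue) differences."""
    if len(variant) != len(reference):
        raise ValueError("motifs must have equal length")
    return [
        (i + 1, ref, var)
        for i, (ref, var) in enumerate(zip(reference, variant))
        if ref != var
    ]

for motif in VARIANTS:
    diffs = substitutions(motif)
    print(motif, len(diffs), diffs)
```

Among the four listed variants, three differ from PTAN at a single position and one (PTLK) at two, and the threonine at position 2 is unchanged in all of them.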
FADS-type I sequences were also found in 8 eukaryotic species (Table 2), including Anopheles gambiae, Caenorhabditis sp., Trichoplax adhaerens, which is considered to be the most primitive multi-cellular animal known, or the freshwater amoeba Paulinella chromatophora, which harbours a cyanobacterial endosymbiont.
It has been proposed that the double enzymatic activity of FADS proteins might be the result of a gene fusion event that genetically perpetuated an ancient protein-protein interaction [8, 34]. If this hypothesis holds true, it is remarkable that 1190 out of 1194 bacterial genomes have a copy of this fused gene (Table 1), while monofunctional RFKs are vastly predominant (658/755) across eukaryotic genomes (Table 2). This imbalance suggests that the fusion event, or functional coupling, would have been evolutionarily favoured only in unicellular organisms, from which chloroplasts are thought to be derived.
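The two counts quoted above are easier to compare as percentages. The short sketch below simply converts them; the dictionary labels and the helper name `percentage` are assumptions introduced for illustration, and the counts are taken from the text as given.

```python
# Counts quoted in the text: (genomes with the feature, genomes surveyed).
counts = {
    "bacterial genomes with a fused FADS gene": (1190, 1194),
    "eukaryotic genomes with monofunctional RFK": (658, 755),
}

def percentage(part, total):
    """Share of `part` in `total`, expressed as a percentage."""
    return 100.0 * part / total

for label, (part, total) in counts.items():
    print(f"{label}: {part}/{total} = {percentage(part, total):.1f}%")
```

This gives roughly 99.7% for the fused gene in bacteria and roughly 87.2% for monofunctional RFKs in eukaryotes.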
We would like to remark that FADS proteins are annotated in sequence databases with confusing or contradictory names such as riboflavin biosynthesis protein RibF (e.g., YP_002487514.1), FMN adenylylate transferase (e.g., NP_692523.1), FMN adenylyltransferase (e.g., YP_001623829.1), FAD synthase (e.g., YP_518746), riboflavin kinase/FMN adenylyltransferase (e.g., YP_932710), flavokinase/FAD synthetase (e.g., YP_002783884) and riboflavin kinase/FAD synthetase (e.g., NP_975116.1). Indeed, sequences that are not strictly FADS are also annotated as such (e.g., YP_003062293.1). In the case of plants, plant-like FADS sequences are found annotated as riboflavin kinase (e.g., gb|CO899788.1|, gb|BG509026.1|, gb|CN491424.1|) or protein-s isoprenylcysteine o-methyltransferase (e.g., PTHR12714, gb|GR935784.1|, cassava1385). This misleading variability in names is of no benefit to users, and a consensus nomenclature would clearly be desirable. We hope this work makes a contribution in this direction.
PSI-BLAST searches with both the complete sequence of the plant-like FADS AtRibF1 and its C-terminus matched only NTs and RFKs (10 iterations, E-value < 3×10⁻⁸) from bacteria, cyanobacteria, yeast and human. No other family was identified as related to the C-terminus of plant-like FADS. The similarity between the newly identified C-terminal module and NTs was further explored in the pdb70 structural library using the fold-recognition algorithm HHPred in local and global mode. Local searches provided significant matches (E-value ≤ 1.4×10⁻¹⁷) to: the RFK-module of TmFADS (pdb 1mrz, 1s4m, 1t6x, 1t6y, 1t6z, 2i1l; [9, 10]), SpRFK (pdb 1n08, 1n05, 1n07, 1n06), HsRFK (pdb 1nb0, 1nb9, 1p4m, 1q9s) and Trypanosoma brucei RFK (pdb 3bnw). Apart from these hits, global searches with the C-terminal domain yielded significant matches (E-values ≤ 2.5×10⁻⁶) to: nicotinamide mononucleotide (NMN) adenylyltransferase/ribosylnicotinamide kinase from Haemophilus influenzae (pdb 1lw7), ethanolamine-phosphate cytidylyltransferase from H. sapiens (pdb 3elb), nicotinamide-nucleotide adenylyltransferase from Francisella tularensis (pdb 2qjt) and the C-terminal module of the bifunctional NMN adenylyltransferase/Nudix hydrolase from Synechocystis sp. (SyNadM-Nudix) (pdb 2qjo).
Figure 4 shows the plant-specific conserved residues Leu295, Asn296, Leu297, Pro298 and Pro299 (295LNLPP299 motif), Cys307, Cys319, Glu331, Gln344, Glu352, Phe353 and Gly354. The LNLPP motif is located in a flexible loop, on the side opposite to the site that binds FMN or ADP in TmFADS, and is oriented towards a cavity. Furthermore, the conserved residues Ser292, Cys307, Glu331 and Glu352 also point towards this cavity, suggesting that it could be a new binding site in plant-like FADSs. It is also worth mentioning that the Glu331 residue, which is also invariant in the FADS and RFK families (e.g., Glu268 in FADS-type I from Corynebacterium ammoniagenes), has been proposed to act as a catalytic base.
As mentioned above, the C-terminal module of plant-like FADSs shows remote similarity to the C-terminal domain of other families such as SyNadM-Nudix (E-value = 6.8×10⁻⁸), which belongs to a large superfamily of pyrophosphohydrolases (see Additional file 1; Figure S4). In Arabidopsis, 27 Nudix hydrolase genes have been found, and the proteins they encode are able to hydrolyze various nucleoside diphosphate derivatives such as ADP-glucose, ADP-ribose and a wide range of its derivatives, FAD, NADH, NADPH, and diadenosine polyphosphates. Moreover, a remote sequence consensus for this protein region, including the LNxPP motif, was found with serine/threonine phosphatases 2C and members of the hydrolase superfamily. These observations suggest that the C-terminal module of AtRibF1 could have a function other than RFK enzymatic activity. Sandoval et al. showed that purified recombinant AtRibF1 and AtRibF2 enzymes display only FADS activity, with undetectable RFK activity, and hence assumed that these are indeed monofunctional enzymes. However, they were able to measure FMN hydrolase, FAD pyrophosphatase and RFK activities in Percoll-isolated chloroplasts.
As mentioned above, our bioinformatic analyses suggest that plant-like FADS proteins could be bifunctional enzymes. More precisely, structural similarities predict a hydrolase and phosphatase activity for the C-terminal module, although the possibility that it has a non-enzymatic regulatory role, or is simply an evolutionary relic, should not be dismissed. Nevertheless, considering the results of Sandoval et al. and our own, we could speculate that some of the activities measured in isolated chloroplasts (e.g., FMN hydrolase or FAD pyrophosphatase) could be associated with this C-terminal module. To test this hypothesis we have designed experiments with recombinant plant-like FADS from soybean (Glycine max), and preliminary results seem to indicate that its C-terminal module might have a hydrolytic activity, since GmFADS was able to convert FMN into RF (data not shown). Interestingly, this activity was not detected in purified FADS from C. ammoniagenes, a typical FADS-type I protein. While these preliminary results seem to agree with our theoretical analyses, further investigation is clearly necessary to establish the possible enzymatic role of the C-terminal module of plant-like FADS. Future work with recombinant GmFADS will aim to confirm the observed hydrolytic activity.
Plant-like FADS enzymes are distributed across a variety of green plant lineages and constitute a divergent protein family clearly of cyanobacterial origin. Homology models predict that plant-specific conserved residues are orientated towards a cavity, building a distinct active site when compared to that involved in substrate binding and catalysis in the C-terminus of typical FADS-type I enzymes. The remote relationship reported here between plant-like FADS proteins and members of pyrophosphohydrolase or phosphatase superfamilies as well as preliminary experimental results suggest that the C-terminal module of these proteins, clearly of bacterial origin, might be involved in a catalytic function.
The NCBI non-redundant protein sequences (nr), nucleotide collection (nr/nt) and concise microbial protein databases, and the CyanoBase (http://genome.kazusa.or.jp/cyanobase/) sequence library, were scanned with PSI-BLAST and TBLASTN in order to retrieve sequences similar (E-values < 10⁻¹⁴) to: i) FADS from Thermotoga maritima (UniProtKB Q9WZW1 [9, 10]), ii) RFK from Bacillus subtilis (GenBank AAC00333.1) and iii) the plant-like FADS AtRibF1 (At5g23330, NP_568429, GenBank ACH56223.1) or AtRibF2 (At5g08340, NP_568192, GenBank ACH56224.1). To increase sensitivity, further similar sequences were retrieved by scanning with either the N-terminal or the C-terminal modules of prokaryotic and plant-like FADS and RFK proteins.
In order to increase the recovery of plant sequences, which are currently distributed across a variety of repositories, the AtRibF1 sequence was also scanned against the JGI Genome portal (http://genome.jgi-psf.org/), Phytozome (http://www.phytozome.net/) and PLAZA (http://bioinformatics.psb.ugent.be/plaza/) with E-values < 10⁻⁵⁰. Finally, further searches were performed against the NCBI Expressed Sequence Tags (filter: Viridiplantae) and TIGR Plant Transcript Assembly databases, with E-values < 10⁻²⁰.
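The searches above apply a different E-value cut-off to each group of databases. The sketch below shows one way such source-specific filtering could be expressed; it is not the authors' pipeline. The source labels, the hit records and the function name `passes` are hypothetical and exist only to exercise the filter, while the three cut-offs are those quoted in the text.

```python
# E-value cut-offs quoted in the text, keyed by assumed short labels.
THRESHOLDS = {
    "ncbi_cyanobase": 1e-14,        # NCBI databases and CyanoBase
    "plant_genome_portals": 1e-50,  # JGI, Phytozome, PLAZA
    "est_transcripts": 1e-20,       # NCBI EST (Viridiplantae), TIGR assemblies
}

def passes(hit, thresholds=THRESHOLDS):
    """Keep a hit only if its E-value is strictly below its source's cut-off."""
    return hit["evalue"] < thresholds[hit["source"]]

# Hypothetical hits, invented purely for illustration.
hits = [
    {"id": "hit_A", "source": "ncbi_cyanobase", "evalue": 3e-20},
    {"id": "hit_B", "source": "plant_genome_portals", "evalue": 2e-31},
    {"id": "hit_C", "source": "est_transcripts", "evalue": 5e-25},
]
kept = [h["id"] for h in hits if passes(h)]
print(kept)
```

Here hit_B is discarded even though its E-value is small in absolute terms, because the genome portals were searched with the stricter 10⁻⁵⁰ cut-off.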
A sequence profile of plant-like proteins was compiled with ClustalW.
A selection of bacterial and eukaryotic sequences was aligned to the profile.
The sequence of Thermotoga maritima was added following the fold recognition alignment produced by HHPred using the Protein Data Bank structure 1mrz. This template was predicted to be the best modelling template by the BioInfoBank Meta Server (see below).
1) A representative set of FADS-type I and FADS-type II sequences was multiply aligned with ClustalW and their secondary structure was predicted with PSIPRED, taking the Thermotoga maritima sequence as a representative. The sequences selected are representative of bacterial species having FADS-type I and/or FADS-type II, and belonging to the phyla Actinobacteria, Firmicutes, Spirochaetes and Tenericutes. Sequences from species containing only FADS-type I, which belong to the phyla Chlamydiae, Chlorobi, Chloroflexi (green non-sulfur bacteria), Cyanobacteria, Proteobacteria (purple bacteria) and Thermotogae, are also included, providing good coverage of diverse phylogenetic bacterial groups.
2) The sequence of the cytosolic protein AtFHy/RFK from Arabidopsis thaliana was added and aligned as an outgroup, and the resulting multiple alignment was converted to a hidden Markov model in HHSearch format with hhmake.
3) All plant-like FADS protein sequences that covered most of both domains (from the HxGH to the GxY motif) were considered complete, aligned with ClustalW and converted to a hidden Markov model, including the PSIPRED secondary structure prediction of AtRibF1. The plant sequences selected cover the diverse phylogenetic groups of green plants, as shown in Additional file 1; Figure S1.
4) The profiles 2) and 3) were globally aligned with hhalign and the resulting alignment was trimmed by removing the poorly aligned segments, following the protocol "automated1" of the trimAl software (http://trimal.cgenomics.org/). The original and trimmed alignments are available in Additional file 1; Figures S2 and S3.
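To illustrate what trimming an alignment means in practice, the sketch below removes columns that consist mostly of gaps. This is a deliberately simplified stand-in and not the "automated1" heuristic of trimAl, which chooses its criteria from the alignment itself; the function name `trim_gappy_columns`, the 0.5 gap-fraction cut-off and the toy alignment are all assumptions made for illustration.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.5):
    """Drop alignment columns whose gap fraction exceeds max_gap_fraction.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    Returns a new dict with the same names and the retained columns only.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        col for col in range(length)
        if sum(s[col] == "-" for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {name: "".join(s[c] for c in keep) for name, s in alignment.items()}

# Hypothetical toy alignment (not real FADS sequences).
toy_alignment = {
    "seq1": "MK-AHGH--L",
    "seq2": "MKTAHGH--L",
    "seq3": "M--AHGHQ-L",
}
print(trim_gappy_columns(toy_alignment))
```

Columns in which more than half of the sequences carry a gap are discarded, so the three toy sequences shrink from ten to seven columns; a real analysis would rely on trimAl itself rather than on a fixed threshold.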
The trimmed multiple alignment described above was used to infer a maximum likelihood phylogenetic tree with PhyML, using the best-fitting amino acid substitution model selected with ProtTest. The tree was midpoint-rooted and plotted with FigTree (http://tree.bio.ed.ac.uk/software/figtree).
To further investigate possible molecular functions of the C-terminal module of plant-like FADS proteins, the complete protein sequence of AtRibF1 as well as its C-terminal domain were submitted to the BioInfoBank Meta Server. The best aligned templates provided by FUGUE and PSI-BLAST were subsequently employed to drive homology modelling with Modeller. Further templates were found with HHPred scans of the pdb70 library. Structural superposition and alignments were performed with the software MAMMOTH. Molecular structures and models were inspected, analyzed and plotted with PyMol. Secondary structure predictions were made with PSIPRED.
GmFADS gene synthesis, and E. coli protein over-expression and purification, were carried out by GenScript USA Inc. Conversion of FMN into RF was qualitatively assayed by addition of GmFADS or CaFADS (final enzyme concentration ~ 0.2 μM) to a solution (final volume, 150 μl) containing 50 μM FMN, either 0 or 0.2 mM ATP and 10 mM MgCl2, in 50 mM Tris-HCl, pH 8.0. After incubation overnight at 25°C or for 5 min at 37°C, the reaction was stopped by boiling the preparations for 5 minutes. Transformation of FMN was visualized by resolving the reaction products at room temperature and in the dark by TLC on Silica Gel SIL-G-25 plates (20 cm × 20 cm, thickness 0.25 mm). The mobile phase was a solution of butanol:acetic acid:water (12:3:5). Flavin TLC spots were visually examined and scanned by determining their fluorescence under ultraviolet light.
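The concentrations above translate into absolute amounts per 150 μl reaction as sketched below. This is a back-of-the-envelope check rather than part of the protocol; the helper name `amount_nmol` and the variable names are assumptions, and the enzyme concentration is the approximate value (~ 0.2 μM) given in the text.

```python
def amount_nmol(conc_uM, volume_ul):
    """Amount in nmol from a concentration in µM and a volume in µl.

    µM x µl gives pmol (1e-6 mol/l x 1e-6 l), so divide by 1000 for nmol.
    """
    return conc_uM * volume_ul / 1000.0

VOLUME_UL = 150.0  # final reaction volume from the text

fmn_nmol = amount_nmol(50.0, VOLUME_UL)      # 50 µM FMN
enzyme_nmol = amount_nmol(0.2, VOLUME_UL)    # ~0.2 µM GmFADS or CaFADS
atp_nmol = amount_nmol(200.0, VOLUME_UL)     # 0.2 mM ATP, when present
mgcl2_nmol = amount_nmol(10000.0, VOLUME_UL) # 10 mM MgCl2

print(f"FMN: {fmn_nmol:.2f} nmol")
print(f"enzyme: {enzyme_nmol * 1000:.0f} pmol")
print(f"ATP: {atp_nmol:.0f} nmol, MgCl2: {mgcl2_nmol:.0f} nmol")
print(f"FMN:enzyme molar ratio: {fmn_nmol / enzyme_nmol:.0f}")
```

Each reaction therefore contains about 7.5 nmol of FMN and about 30 pmol of enzyme, a roughly 250-fold molar excess of substrate over enzyme.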
FADS-type I: bifunctional prokaryotic enzyme with riboflavin kinase and FMN adenylyltransferase activities
FADS-type II: prokaryotic enzyme with FMN adenylyltransferase activity of FADS in the N-terminal module and a putative different activity in the C-terminal module
FMNAT: monofunctional prokaryotic enzyme with FMN adenylyltransferase activity
FMNAT-module: module of FADS with FMN adenylyltransferase activity
Plant-like FADS: bifunctional enzyme found in plants with FMN adenylyltransferase activity of FADS in the N-terminal domain and a putative different activity to that of FADS-type I in the C-terminal domain
RFK: monofunctional prokaryotic enzyme with riboflavin kinase activity
RFK-module: module of FADS with riboflavin kinase activity
We thank L. Sánchez-Pulido, P. Vinuesa and D. Moreno for comments on the manuscript. S. Arilla-Luna holds a fellowship from the Spanish Ministry of Science and Innovation (FPU program). This work was supported by CONSI+D, DGA (Grant PM062/2007 to M.M. and I.Y.), the Spanish Ministry of Science and Innovation (BIO2007-65890-C02-01 and BIO2010-14983 to M.M.) and Gobierno de Aragón (DGA-GE B18 to M. M. and I.Y. and DGA-GC A06 to B.C-M).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.