Evolutionary divergence of chloroplast FAD synthetase proteins

Background: Flavin adenine dinucleotide synthetases (FADSs), a group of bifunctional enzymes that in most prokaryotes carry out both riboflavin phosphorylation to produce flavin mononucleotide (FMN) and its subsequent adenylylation to generate FAD, were studied in plants in terms of sequence, structure and evolutionary history.
Results: Using a variety of bioinformatics methods we have found that FADS enzymes localized to the chloroplasts, which we term plant-like FADS proteins, are distributed across a variety of green plant lineages and constitute a divergent protein family clearly of cyanobacterial origin. The C-terminal module of these enzymes does not contain the typical riboflavin kinase active site sequence, while the N-terminal module is broadly conserved. These results agree with previous work reported by Sandoval et al. in 2008. Furthermore, our observations and preliminary experimental results indicate that the C-terminus of plant-like FADS proteins may have a catalytic activity, but one different from that of their prokaryotic counterparts. In fact, homology models predict that plant-specific conserved residues constitute a distinct active site in the C-terminus.
Conclusions: A structure-based sequence alignment and an in-depth evolutionary survey of FADS proteins, thought to be crucial in plant metabolism, are reported, which will be essential for the correct annotation of plant genomes and for further structural and functional studies. This work contributes to our understanding of the evolutionary history of plant-like FADS enzymes, which constitute a new family of FADS proteins whose C-terminal module might be involved in a distinct catalytic activity.


Background
Flavin mononucleotide (FMN) and flavin adenine dinucleotide (FAD) are essential cofactors for numerous enzymes (e.g., dehydrogenases, oxidases, reductases) that participate in one- and two-electron oxidation-reduction processes critical to the major metabolic routes in all living organisms [1][2][3][4]. Riboflavin (RF), the precursor of FMN and FAD, can be synthesized de novo by plants, fungi and many bacteria, but in mammals the only known RF source is exogenous riboflavin (vitamin B2) obtained through the diet [5][6][7].
In most prokaryotes, the synthesis of FMN and FAD from RF and ATP is catalyzed by a single bifunctional enzyme, usually known as FAD synthetase (FADS), through the sequential action of its two enzymatic activities: an ATP:riboflavin kinase (RFK, EC 2.7.1.26) that transforms RF and ATP into FMN, and an ATP:FMN adenylyltransferase (FMNAT, EC 2.7.7.2) that catalyzes the subsequent adenylylation of FMN to FAD. Thus, FADS is a bifunctional RFK/FMNAT enzyme [8]. FADSs are typically 310-340 residues in length and fold into two modules [9][10][11], each mainly involved in one of the activities. The RFK reaction has been associated with the C-terminal module (RFK-module), while the N-terminal module is mainly associated with the FMNAT activity (FMNAT-module); hence, two independent substrate-binding and catalytic sites are responsible for the two activities [11,12]. On the one hand, the RFK-module (~180 aa) folds into a globular domain whose overall topology is similar to that of the RFKs from Homo sapiens (HsRFK) and Schizosaccharomyces pombe (SpRFK), with differences observed only in the loops connecting secondary structure elements [13,14]. Furthermore, the substrate-binding motifs PTAN and GxY of the RFK-module are conserved among FADSs and eukaryotic RFKs. On the other hand, the FMNAT-module consists of an α/β dinucleotide-binding domain with a typical Rossmann fold topology (~150 aa) [9][10][11]. Moreover, it appears to be remotely similar to the nucleotidyltransferase (NT) superfamily and contains the typical (H/T)xGH and xSST/SxxR motifs involved in binding nucleotide and phosphate groups. Interestingly, monofunctional enzymes with only RFK activity have been described in Bacillus subtilis [15] and Streptococcus agalactiae [16], but no monofunctional FMNAT enzymes have been reported in prokaryotes.
A different scenario is found in eukaryotes, where the two activities are generally split between two different enzymes with either RFK or FMNAT activity [17][18][19][20]. As mentioned above, the RFK enzymes show sequence and structure similarity to the RFK-module of prokaryotic FADS [13,14]. However, eukaryotic FMNATs share little or no sequence similarity with the FMNAT-module of FADS, as these enzymes belong to two different protein superfamilies, which are thought to require different sets of active-site residues to carry out the same chemistry [21][22][23]. The eukaryotic FMNAT-module is currently classified as a member of the 3'-phosphoadenosine 5'-phosphosulfate (PAPS) reductase-like family belonging to the "adenosine nucleotide α-hydrolase-like" superfamily, which has motifs different from those of NTs.
In plants only a few efforts have been devoted to this system. Early studies characterized apparently monofunctional enzymes with either RFK or FMNAT activities in several plant species [24][25][26][27]. In those studies the subcellular localization of RFK and FADS was not addressed, although it is known that plants use flavin nucleotides in mitochondria, plastids and the cytosol. In an earlier work RFK activity was associated with the cytosol and with an organellar fraction containing chloroplasts and mitochondria [28]. More recently, a bifunctional enzyme with both FMN hydrolase and RFK activities has been described in Arabidopsis thaliana (AtFHy/RFK) [29], whose N-terminal module, responsible for the FMN hydrolase activity, shares sequence similarity with members of the haloacid dehalogenase (HAD) superfamily. The AtFHy/RFK enzyme was predicted to be cytosolic [29]. Additionally, two more enzymes with FMNAT activity have been identified, cloned and characterized in the same species [30]. These AtRibF1 and AtRibF2 enzymes, herein plant-like FADS proteins, have an N-terminal module that is homologous to the FMNAT-module of FADS, whereas their C-terminal module does not catalyze RF phosphorylation. AtRibF1 and AtRibF2 were localized to the chloroplast [30]. In mitochondria, the catalytic conversion of RF into FMN and FAD has been reported, owing to the existence of mitochondrial RFK and FADS enzymes [31], although FADS activity was much lower than in chloroplasts. These results agree with the cited confocal microscopy studies [30], but the hypothesis that FADS isoforms (AtRibF1 and AtRibF2) localize to mitochondria cannot be ruled out on the basis of bioinformatics (TAIR) analysis [31]. The mitochondrial FAD-forming activities reside in two distinct monofunctional enzymes, which can be separated into soluble and membrane-enriched fractions. It is worth mentioning that the genes encoding organellar RFK activity remain unidentified.
In order to investigate RFK and FMNAT activities in plants we have conducted an extensive bioinformatics survey using the available genomes in public databases. Here we report the identification of a conserved C-terminal module in plant FADS enzymes, which does not contain the typical RFK active site sequence, suggesting that it belongs to a new family of FADS proteins. The activity of this module is discussed.

Sequence and evolutionary analysis
As shown in Table 1, most prokaryotic genomes (1178/1194) surveyed in this study, including cyanobacteria, contain a single gene encoding a bifunctional FADS enzyme, hereafter identified as FADS-type I protein.
Sequence searches in a variety of repositories of green plant sequences allowed us to identify a related group of genes, which contain two domains with high similarity (see Methods section) to FADS-type I sequences (Figures 1 and 2), and which are also present as a single copy in most of the currently available complete genomes (14/18). This result agrees with previous work by Sandoval et al. [30]. The N-terminal module of these proteins displays high similarity to the FMNAT-module of prokaryotic FADS-type I, showing the typical motifs HxGH and xSST/SxxR involved in FMNAT activity, which are also common to other NTs. However, several observations can be made with respect to the C-terminal module in plant proteins: i) it is 40 to 60 residues shorter; ii) the PTAN motif characteristic of the RFK activity is replaced by PxS; iii) an LNxPP motif is conserved in plants, next to the invariant GxY motif. AtRibF1 and AtRibF2 belong to this group of proteins and were recently characterized by Sandoval et al. [30], who did not detect any RFK activity in these enzymes. This experimental observation, together with the absence of the PTAN motif, suggests a different enzymatic activity for this module. We therefore named these proteins plant-like FADS. Furthermore, a few bacterial parasites and pathogens isolated from plant, human or soil material and belonging to the phyla Firmicutes, Actinobacteria, Tenericutes and Spirochaetes contain extra sequences with significant similarity to FADS-type I (E-values ≤ 1.5×10^-10). However, as shown in Figures 1 and 2, these sequences do not conserve the catalytic PTAN motif and have shorter C-terminal modules, similar in length to those of plant-like FADS, suggesting that they might constitute another divergent type of FADS, which we label FADS-type II (see Table 1).
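The motif comparison described above can be illustrated with a simple pattern scan. This is only a minimal sketch and not the search procedure used in this study: the regular expressions are direct transcriptions of the motifs named in the text (with x standing for any residue), while the function name `find_motifs` and the toy fragment are invented for the example and do not correspond to any real protein.

```python
import re

# Motifs named in the text, transcribed as regular expressions ("x" = any residue).
MOTIFS = {
    "(H/T)xGH": r"[HT].GH",  # nucleotide-binding motif of the FMNAT-module
    "PTAN": r"PTAN",         # canonical RFK-module motif
    "PxS": r"P.S",           # replaces PTAN in plant-like FADS
    "LNxPP": r"LN.PP",       # conserved in plant-like FADS
    "GxY": r"G.Y",           # invariant flavin-binding motif
}

def find_motifs(seq):
    """Return {motif name: [1-based start positions]} for one protein sequence."""
    seq = seq.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead is used so that overlapping occurrences are all reported.
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?={pattern})", seq)]
    return hits

# Invented toy fragment (NOT a real sequence) carrying PxS, LNxPP and GxY but no PTAN.
toy_fragment = "AAPGSAALNLPPAAGVYAA"
toy_hits = find_motifs(toy_fragment)
for motif, positions in toy_hits.items():
    print(motif, positions)
```

A sequence that reports hits for PxS, LNxPP and GxY but none for PTAN would show the plant-like pattern described in points ii) and iii); short patterns such as PxS and GxY match by chance quite often, so in practice positions would need to be checked against the alignment.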
In our sequence searches, plant-like FADS proteins are distributed across a variety of green plant lineages. Among land plants we found 84 matches (62 in Eudicots, 11 in Monocots, 6 in Coniferophyta, 3 in Magnoliids, 1 in Bryophyta, 1 in Lycopodiophyta). Other than in land plants, plant-like FADS proteins were found only in unicellular photosynthetic organisms belonging to the phylum Chlorophyta (Micromonas pusilla, Chlamydomonas reinhardtii, Coccomyxa sp. and Ostreococcus lucimarinus) (see Table 2). All plant genomes surveyed encode these proteins in the nuclear genome (sequence searches in chloroplast genomes did not produce matches). Our search strategies did not find plant-like FADS proteins in any other eukaryotic genomes. These observations might imply that these genes have a prokaryotic origin, related in some way to the endosymbiotic origin of chloroplasts. On the other hand, all green plant genomes explored have a copy of the cytosolic bifunctional FHy/RFK protein, except two Micromonas species that have a monofunctional RFK enzyme, as in most eukaryotes. Proteins related to the HAD domain of FHy/RFK have been found in both bacteria and eukaryotes, but this enzyme has been suggested to be unique to plant lineages, probably having originated by fusion of a HAD domain to a eukaryotic-type RFK [29].
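As a quick consistency check on the land plant figures quoted above, the per-lineage counts can be tallied. The snippet below is just a small illustration using the numbers from the text; the variable names are arbitrary.

```python
# Counts of plant-like FADS matches among land plants, as quoted in the text.
land_plant_matches = {
    "Eudicots": 62,
    "Monocots": 11,
    "Coniferophyta": 6,
    "Magnoliids": 3,
    "Bryophyta": 1,
    "Lycopodiophyta": 1,
}

total = sum(land_plant_matches.values())
print("total land plant matches:", total)

# Share of each lineage among the land plant matches.
for lineage, n in land_plant_matches.items():
    print(f"{lineage}: {n} ({100 * n / total:.1f}%)")
```

The breakdown sums to the stated total of 84 and is dominated by Eudicots.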
With the aim of further exploring the origin of plant-like FADS proteins, we carried out a phylogenetic analysis, which is summarized in the phylogram in Figure 3. According to this tree, which was rooted by taking the sequence of the cytosolic protein AtFHy/RFK from Arabidopsis thaliana as an outgroup, plant-like FADS proteins are closer to the cyanobacterial group than to any other bacterial species, which were selected to represent the taxa included in Table 1. Indeed, proteins from both cyanobacteria and green plants are enclosed in a clade with an associated approximate likelihood ratio test (aLRT) support of 0.80. These observations suggest that plant-like FADS proteins have a prokaryotic origin closely related to cyanobacteria, although they form a divergent group of sequences, as illustrated in Figure 3. Note that AtFHy/RFK clusters apart from prokaryotic FADS, confirming a different origin for this enzyme. The tree also suggests that plant-like FADS proteins diverged from bacterial FADS probably before the separation of the two major plant phyla (Streptophyta, comprising land plants and their closest green algal relatives, and Chlorophyta, comprising the rest of the green algae), since they are present in species from both. In order to investigate this further, we searched for putative plant-like FADS homologues in Mesostigma viride, proposed to be the earliest plant lineage, predating the divergence of Streptophyta and Chlorophyta [32]. Unfortunately, the nuclear genome of this species is not available, and the chloroplast and mitochondrial genomes yielded no sequence matches. Our findings could indicate that plant-like FADS indeed derived from cyanobacterial FADS, despite the fact that they are now encoded in the nuclear genome [33].
Moreover, these results reveal that most bacteria containing FADS-type II sequences also have typical FADS-type I proteins (see Table 1), and the tree in Figure 3 shows that these two types of sequences cluster together, implying that they might actually be paralogous genes. Only the genomes of Eubacterium saphenum ATCC 49989, Mycoplasma conjunctivae and Treponema pallidum subsp. pallidum contain exclusively FADS-type II proteins. Although the tree does not support FADS-type II proteins as a distinct evolutionary class, their shorter and non-conserved C-terminal domains still mark them clearly as a distinct functional group, which might have lost the C-terminal activity typical of FADS-type I proteins.
We also note the observed variability in terms of RFK and FMNAT enzymatic activities in bacterial genomes. While most prokaryotes have a single copy of a typical FADS-type I sequence, in four species the two enzymatic activities are split between monofunctional proteins, which correspond to the RFK and FMNAT modules, respectively. In other cases the FADS-type I sequence was accompanied by either a monofunctional prokaryotic FMNAT (5 genomes) or a monofunctional prokaryotic RFK (2 genomes). For instance, the genome of Alistipes putredinis contains both a monofunctional RFK and a FADS-type II sequence. Furthermore, although most bacterial FADS and RFK proteins include the conserved PTAN motif, some sequence variants can be found, including PTLK, PTLN, PTIN or KTAN, which nevertheless conserve the C-terminal module length. As these genomes do not contain any other RFK-related proteins, these sequence variants are presumed still to be responsible for the RFK activity. FADS-type I sequences were also found in 8 eukaryotic species (Table 2), including Anopheles gambiae, Caenorhabditis sp., Trichoplax adhaerens, which is considered to be the most primitive multicellular animal known, and the freshwater amoeba Paulinella chromatophora, which harbours a cyanobacterial endosymbiont.
To evaluate the possible role of the C-terminal module of plant-like FADSs and the molecular arrangement of this protein region, we modelled the AtRibF1 structure using all the above-mentioned protein structures as templates. The best predicted models, as expected in terms of sequence similarity and alignment quality in putative binding and catalytic regions, were obtained with the X-ray structures of TmFADS, SpRFK and HsRFK. In order to annotate putative functional residues, the comparative models were superposed on the original templates, including all ligands present in the experimental coordinates. The superposition in Figure 4 indicates that the plant-conserved residues PxS (positions 290-292) and GVY (positions 302-304) in the model occupy equivalent positions with respect to the ligands (ADP and FMN) present in the crystallographic structure of TmFADS. Similar results were obtained with SpRFK and HsRFK (data not shown). The predicted secondary structure of AtRibF1 corresponds well with that of TmFADS (see Figures 1 and 2), although it lacks the last α-helix at the N-terminal end, which is also conserved in SpRFK and HsRFK. This helix appears to be crucial for the correct orientation of the bound flavin substrate [14], and its absence in AtRibF1 is in agreement with the observed lack of RFK activity [30]. Figure 4 shows the plant-specific conserved residues Leu295, Asn296, Leu297, Pro298 and Pro299 (the LNLPP motif, positions 295-299), Cys307, Cys319, Glu331, Gln344, Glu352, Phe353 and Gly354. It can be observed that the LNLPP motif is located in a flexible loop, on the side opposite to the FMN and ADP binding sites of TmFADS, and is oriented towards a cavity. Furthermore, the conserved residues Ser292, Cys307, Glu331 and Glu352 appear oriented towards this cavity, suggesting that this site could be a new binding site in plant-like FADSs.
It is also worth mentioning that the Glu331 residue, which is also invariant in the FADS and RFK families (e.g., Glu268 in FADS-type I from Corynebacterium ammoniagenes [11]), has been proposed to act as a catalytic base.
As mentioned above, remote similarity of the C-terminal module of plant-like FADSs was found with the C-terminal domain of other families such as SyNadM-Nudix (E-value = 6.8×10^-8), which belongs to a large superfamily of pyrophosphohydrolases (see Additional file 1; Figure S4). In Arabidopsis 27 Nudix hydrolase genes have been found, and the proteins they encode are able to hydrolyze various types of nucleoside diphosphate derivatives such as ADP-glucose, ADP-ribose and a wide range of its derivatives, FAD, NADH, NADPH, and diadenosine polyphosphates [36]. Moreover, a remote sequence consensus of this protein region, including the LNxPP motif, was found with serine/threonine phosphatases 2C and members of the hydrolase superfamily. These observations suggest that the C-terminal module of AtRibF1 could have a function other than RFK enzymatic activity. Sandoval et al. [30] showed that purified recombinant AtRibF1 and AtRibF2 enzymes display only FADS activity, with undetectable RFK activity, and hence assumed that these are indeed monofunctional enzymes. However, they were able to measure FMN hydrolase, FAD pyrophosphatase and RFK activities in Percoll-isolated chloroplasts.
As mentioned above, our bioinformatic analyses indicate that plant-like FADS proteins could be bifunctional enzymes. More precisely, structural similarities suggest a hydrolase or phosphatase activity for the C-terminal module, although the possibility that it has a non-enzymatic regulatory role, or is simply an evolutionary relic, should not be dismissed. Nevertheless, considering the results of Sandoval et al. [30] and ours, we could speculate that some of the activities measured in isolated chloroplasts (i.e., FMN hydrolase or FAD pyrophosphatase) could be associated with this C-terminal module. In order to test this hypothesis we have designed experiments with recombinant plant-like FADS from soybean (Glycine max), and preliminary results seem to indicate that its C-terminal module might have a hydrolytic activity, since GmFADS was able to convert FMN into RF (data not shown). Interestingly, this activity was not detected in purified FADS from C. ammoniagenes, a typical FADS-type I protein [11]. While these preliminary results seem to agree with our theoretical analyses, further investigations are clearly necessary to confirm the possible enzymatic role of the C-terminal module of plant-like FADS. Future work with recombinant plant-like FADS (GmFADS) will aim to confirm this observed hydrolytic activity.

Conclusions
Plant-like FADS enzymes are distributed across a variety of green plant lineages and constitute a divergent protein family clearly of cyanobacterial origin. Homology models predict that plant-specific conserved residues are orientated towards a cavity, building a distinct active site when compared to that involved in substrate binding and catalysis in the C-terminus of typical FADS-type I enzymes. The remote relationship reported here between plant-like FADS proteins and members of pyrophosphohydrolase or phosphatase superfamilies, as well as preliminary experimental results, suggest that the C-terminal module of these proteins, clearly of bacterial origin, might be involved in a catalytic function.

Multiple alignments
The multiple alignment in Figures 1 and 2 was constructed in three steps, using the AtRibF1 protein as seed:
1) A sequence profile of plant-like proteins was compiled with ClustalW [38].
2) A selection of bacterial and eukaryotic sequences was aligned to the profile.
3) The sequence of Thermotoga maritima was added following the fold recognition alignment produced by HHpred [39] using the Protein Data Bank structure 1mrz. This template was predicted to be the best modelling template by the BioInfoBank Meta Server (see below).
The multiple alignment used to drive the phylogenetic analysis summarized in Figure 3 was constructed as follows:
1) A representative set of FADS-type I and FADS-type II sequences was multiply aligned with ClustalW [38], and the secondary structure was predicted with PSIPRED [40] taking the Thermotoga maritima sequence as a representative. The sequences selected are representative of bacterial species having FADS-type I and/or FADS-type II, and belonging to the phyla Actinobacteria, Firmicutes, Spirochaetes and Tenericutes. Sequences from species containing only FADS-type I, which belong to the phyla Chlamydiae, Chlorobi, Chloroflexi (green non-sulfur bacteria), Cyanobacteria, Proteobacteria (purple bacteria) and Thermotogae, are also included, providing good coverage of diverse phylogenetic bacterial groups.
2) The sequence of the cytosolic protein AtFHy/RFK from Arabidopsis thaliana [29] was added and aligned as an outgroup, and the resulting multiple alignment was converted to a hidden Markov model in HHsearch format with hhmake [39].
3) All plant-like FADS protein sequences that covered most of both domains (from the HxGH to the GxY motif) were considered complete, aligned with ClustalW [38] and converted to a hidden Markov model, including the PSIPRED secondary structure prediction of AtRibF1. The plant sequences selected cover the diverse phylogenetic groups of green plants, as shown in Additional file 1; Figure S1.
4) The profiles from steps 2) and 3) were globally aligned with hhalign [39], and the resulting alignment was trimmed by removing the poorly aligned segments, following the "automated1" protocol of the trimAl software (http://trimal.cgenomics.org/) [41]. The original and trimmed alignments are available in Additional file 1; Figures S2 and S3.
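To give an idea of what trimming poorly aligned segments involves, the sketch below removes alignment columns that are mostly gaps. It is a deliberately simplified illustration and not the "automated1" heuristic of trimAl, which chooses its own criteria and thresholds; the function name, the 0.5 gap threshold and the toy alignment are all assumptions made for the example.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.5):
    """Drop alignment columns whose gap fraction exceeds max_gap_fraction.

    alignment: dict mapping sequence name -> aligned string (all of equal length).
    Returns a new dict with the same names and the retained columns only.
    """
    names = list(alignment)
    rows = [alignment[name] for name in names]
    if not rows:
        return {}
    if len({len(row) for row in rows}) > 1:
        raise ValueError("aligned sequences must have equal length")
    n_rows = len(rows)
    # Keep a column when the proportion of gap characters is at or below the threshold.
    keep = [
        i for i in range(len(rows[0]))
        if sum(row[i] == "-" for row in rows) / n_rows <= max_gap_fraction
    ]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}

# Invented toy alignment (not real FADS sequences); the third column is mostly gaps.
toy_alignment = {"seqA": "MK-TAN", "seqB": "M--TAN", "seqC": "MKLT-N"}
trimmed = trim_gappy_columns(toy_alignment)
for name, row in trimmed.items():
    print(name, row)
```

Only the third column is removed here, because two of its three positions are gaps; columns with a single gap are retained.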

Phylogenetic analysis
The trimmed multiple alignment described above was used to build a maximum likelihood phylogenetic tree with PhyML [42], using the best-fitting amino acid substitution model selected with ProtTest [43]. The tree was midpoint-rooted and plotted with FigTree (http://tree.bio.ed.ac.uk/software/figtree).
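Midpoint rooting places the root halfway along the longest path between any two tips of the tree. The following is a minimal sketch of that idea on a tree stored as a dictionary of branch lengths; it is not FigTree's implementation, and the function names, data layout and toy tree are assumptions made for illustration.

```python
def farthest(tree, start):
    """Return (node, distance, parent map) for the node farthest from start."""
    dist = {start: 0.0}
    parent = {start: None}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour, length in tree[node].items():
            if neighbour not in dist:
                dist[neighbour] = dist[node] + length
                parent[neighbour] = node
                stack.append(neighbour)
    far = max(dist, key=dist.get)
    return far, dist[far], parent

def midpoint_root(tree):
    """Locate the midpoint of the longest tip-to-tip path.

    tree: dict node -> {neighbour: branch length}, listed in both directions.
    Returns (node, next_node, offset): the root lies on the branch between
    node and next_node, at distance `offset` from node.
    """
    # Two passes find the two ends (a, b) of the longest path in a tree.
    a, _, _ = farthest(tree, next(iter(tree)))
    b, longest, parent = farthest(tree, a)
    half = longest / 2.0
    walked = 0.0
    node = b
    # Walk from b back towards a until the branch containing the midpoint is reached.
    while parent[node] is not None:
        step = tree[node][parent[node]]
        if walked + step >= half:
            return node, parent[node], half - walked
        walked += step
        node = parent[node]
    raise ValueError("tree has no branches")

# Invented toy tree: three tips (A, B, C) joined at one internal node X.
toy_tree = {
    "A": {"X": 1.0},
    "B": {"X": 3.0},
    "C": {"X": 2.0},
    "X": {"A": 1.0, "B": 3.0, "C": 2.0},
}
root_position = midpoint_root(toy_tree)
print(root_position)
```

In the toy tree the longest path runs between B and C (length 5.0), so the root falls on the branch between X and B, 0.5 units from X, leaving both B and C at a distance of 2.5 from the root.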

Molecular modelling
To further investigate possible molecular functions of the C-terminal module of plant-like FADS proteins, the complete protein sequence of AtRibF1 as well as its C-terminal domain were submitted to the BioInfoBank Meta Server [44]. The best aligned templates provided by FUGUE [45] and PSI-BLAST [37] were subsequently employed to drive homology modelling with Modeller [46]. Further templates were found with HHpred [39] scans of the pdb70 library. Structural superpositions and alignments were performed with the software MAMMOTH [47]. Molecular structures and models were inspected, analyzed and plotted with PyMOL [48]. Secondary structure predictions were made with PSIPRED [40].
List of Abbreviations
FADS-type I: bifunctional prokaryotic enzyme with riboflavin kinase and FMN adenylyltransferase activities
FADS-type II: prokaryotic enzyme with FMN adenylyltransferase activity of FADS in the N-terminal module and a putative different activity in the C-terminal module
FMNAT: monofunctional prokaryotic enzyme with FMN adenylyltransferase activity
FMNAT-module: module of FADS with FMN adenylyltransferase activity
plant-like FADS: bifunctional enzyme found in plants with FMN adenylyltransferase activity of FADS in the N-terminal domain and a putative different activity to that of FADS-type I in the C-terminal domain
RFK: monofunctional prokaryotic enzyme with riboflavin kinase activity
RFK-module: module of FADS with riboflavin kinase activity
Tris: Tris(hydroxymethyl)aminomethane