Evolution of plant phage-type RNA polymerases: the genome of the basal angiosperm Nuphar advena encodes two mitochondrial and one plastid phage-type RNA polymerases

Background In mono- and eudicotyledonous plants, a small nuclear gene family (RpoT, RNA polymerase of the T3/T7 type) encodes mitochondrial as well as chloroplast RNA polymerases homologous to the T-odd bacteriophage enzymes. RpoT genes from angiosperms are well characterized, whereas data from deeper branching plant species are limited to the moss Physcomitrella and the spikemoss Selaginella. To further elucidate the molecular evolution of the RpoT polymerases in the plant kingdom and to get more insight into the potential importance of having more than one phage-type RNA polymerase (RNAP) available, we searched for the respective genes in the basal angiosperm Nuphar advena. Results By screening a set of BAC library filters, three RpoT genes were identified. Both genomic gene sequences and full-length cDNAs were determined. The NaRpoT mRNAs specify putative polypeptides of 996, 990 and 985 amino acids, respectively. All three genes comprise 19 exons and 18 introns, conserved in their positions with those known from RpoT genes of other land plants. The encoded proteins show a high degree of conservation at the amino acid sequence level, including all functionally crucial regions and residues known from the phage T7 RNAP. The N-terminal transit peptides of two of the encoded polymerases, NaRpoTm1 and NaRpoTm2, conferred targeting of green fluorescent protein (GFP) exclusively to mitochondria, whereas the third polymerase, NaRpoTp, was targeted to chloroplasts. Remarkably, translation of NaRpoTp mRNA has to be initiated at a CUG codon to generate a functional plastid transit peptide. Thus, besides AGAMOUS in Arabidopsis and the Nicotiana RpoTp gene, N. advena RpoTp provides another example of a plant mRNA that is exclusively translated from a non-AUG codon.
In contrast to the RpoT of the lycophyte Selaginella and those of the moss Physcomitrella, which, according to phylogenetic analyses, are in sister positions to all other phage-type polymerases of angiosperms, the Nuphar RpoTs clustered with the well-separated clades of mitochondrial (NaRpoTm1 and NaRpoTm2) and plastid (NaRpoTp) polymerases. Conclusions Nuphar advena encodes two mitochondrial and one plastid phage-type RNAP. Identification of a plastid-localized phage-type RNAP in this basal angiosperm, orthologous to all other RpoTp enzymes of flowering plants, suggests that the duplication event giving rise to a nuclear gene-encoded plastid RNA polymerase, not present in lycopods, took place after the split of lycopods from all other tracheophytes. A dual-targeted mitochondrial and plastid RNA polymerase (RpoTmp), as present in eudicots but not monocots, was not detected in Nuphar, suggesting that its occurrence is an evolutionary novelty of eudicotyledonous plants like Arabidopsis.


Background
In the mitochondria of all eukaryotes, with the exception of jacobids, the bacterial-type RNA polymerase of the former endosymbiont has been replaced by a T-odd phage-type RNA polymerase (for review, see [1]). The mitochondrial genome of the jacobid Reclinomonas americana encodes a bacterial-type RNAP [2,3], whose expression has yet to be demonstrated. Likewise, chloroplast genomes have retained the rpoA, B, and C genes of their cyanobacterial ancestor, which encode the core subunits of the plastid-encoded plastid RNAP (PEP). Additionally, mono- and eudicotyledonous plants were found to require a second, nuclear gene-encoded plastid RNAP activity (NEP) to transcribe their chloroplast genes [1,4,5]. Phage-type RNA polymerases were identified as representing this NEP activity [6][7][8]. Thus, in mono- and eudicots, nuclear gene-encoded phage-type RNA polymerases (RpoT polymerases) not only transcribe the mitochondrial genome but are also involved in the transcription of the plastid genome [1,5,9]. Genes encoding phage-type RNA polymerases have been identified in the nuclear genomes of various flowering plants, like Chenopodium album [10], Arabidopsis thaliana [7,11], Nicotiana spp. [12][13][14], Zea mays [15], wheat [16], barley [17], and rice [18]. The moss Physcomitrella patens contains three RpoT genes ([19,20]; genome project data, http://www.phytozome.net/physcomitrella). Two of the Physcomitrella RpoTs are potentially capable of being targeted to both mitochondria and chloroplasts [19], whereas the third gene encodes an RNAP of exclusively mitochondrial localization (U. Richter, unpublished data). Eudicots like Arabidopsis and Nicotiana harbor three phage-type RNA polymerases as well, but their localization within the cell differs from that of the Physcomitrella enzymes.
Eudicots possess a mitochondrial (RpoTm), a plastid (RpoTp) and a dual-targeted phage-type RNA polymerase (RpoTmp; [11,13,14]), the latter involved in the transcription of mitochondrial and plastid genes [21][22][23][24]. No phage-type NEP has been detected in algae thus far. In Chlamydomonas, only one RpoT gene was identified (Weihe et al., unpublished data; genome project data, http://genome.jgi-psf.org/Chlre4/Chlre4.home.html), presumably encoding a mitochondrial-localized RNAP. The single-copy RpoT genes identified in the genomes of other green algae (Ostreococcus, Micromonas) most likely encode mitochondrial RNA polymerases. Multiple phage-type RNA polymerases are only found in land plant species. Maier and colleagues [25] proposed that this feature could be either a prerequisite for the spatio-temporal regulatory needs of embryophytes and an adaptation to the peculiar requirements of a terrestrial life style, or merely the result of the specifics of the plant organelle genetic systems in interaction with the nuclear genome (transgenomic suppression of point mutations). In this context it is interesting to note that the lycophyte Selaginella moellendorffii also possesses only a single RpoT polymerase, which is likely active exclusively in mitochondria [26]. Thus, there seems to be no NEP activity in the lycophytes. Like the Physcomitrella RpoTs, the Selaginella polymerase is separated in phylogenetic trees from the angiosperm clade, which forms two groups: plastid-localized enzymes on one hand, and mitochondrial and dual-targeted polymerases on the other [1,5]. The origin of the NEP activity as found in mono- and eudicots and of the dual-targeted RpoT polymerases observed in eudicots remains unclear.
To gain a deeper insight into the evolution of phage-type RNA polymerases in the plant lineage and to deepen our understanding of the significance of multiple phage-type RNAP activities in both mitochondria and plastids, we have investigated the waterlily Nuphar advena. Together with Amborella, Liriodendron and Acorus, Nuphar is one of the most studied basal angiosperms. As one of the deepest branching angiosperms, Nuphar has become an important model plant for understanding the origin of key angiosperm innovations. Here, we report the identification and characterization of three RpoT genes from Nuphar advena. Our data indicate that Nuphar advena (and possibly other basal angiosperms) already possesses a plastid-localized polymerase in addition to two mitochondrial-localized phage-type RNAPs.

Nuphar advena possesses three RpoT genes
Screening of a BAC library identified three different RpoT genes in N. advena. Twenty-four BAC clones hybridized with an RpoT cDNA fragment from Selaginella used as probe. PCR and sequencing suggested that they represented three similar, yet individual genes. Two of these genes have been sequenced completely, the third one in large portions, including all exons (see Figure 1). The genes were named, according to the subcellular localization (see below) of their gene products, NaRpoTm1, NaRpoTm2, and NaRpoTp. The sequences of the three NaRpoT genes were deposited in the EMBL database under accession numbers FN811768 (NaRpoTm1), FN820498 (NaRpoTm2) and FN811769 (NaRpoTp), respectively. The lengths of the three genes were 28.5 kb for NaRpoTm1, > 16.2 kb for NaRpoTm2, and 13.6 kb for NaRpoTp.

Isolation of Nuphar RpoT cDNAs
Full-length cDNAs were obtained by RACE (rapid amplification of cDNA ends) reactions using specific primers (for primer sequences, see Additional file 1) derived from the genomic sequences as shown in Figure 1. All angiosperm nuclear RpoT genes identified thus far comprise 18 introns at conserved positions [1]. Comparison of genomic and cDNA sequences (see Figure 1) shows that these 18 introns are also present, at the same insertion sites (see Figure 2), in the three Nuphar RpoT genes. None of the additional introns found in the 5' part of the Physcomitrella and Selaginella RpoT genes, respectively, were found in the Nuphar genes. The lengths of the introns vary considerably among the three Nuphar RpoTs, and most of the introns are much longer than those of other land plant RpoT genes. All exon-intron junctions contain conserved GT and AG sequences at the 5' and 3' ends of the introns, respectively. Remarkably, NaRpoTp did not exhibit the canonical translation start codon ATG (AUG). Instead, a CTG (CUG) codon was found at position +148, from which translation could be initiated. The following findings are indicative of a translation start from this position: Stop codons in the 5' region exclude further upstream translation initiation sites. The methionine encoded by the most upstream in-frame ATG (nt 466 of NaRpoTp) aligns to amino acid residue 125 of Arabidopsis RpoTp, and the amino terminus derived from this position displayed neither plastid nor mitochondrial targeting properties (see below). On the other hand, the deduced amino acid sequence starting at +148 is enriched in hydroxylated amino acids, but is virtually lacking acidic residues, thus exhibiting features of stroma-targeting plastid transit peptides [27]. Interestingly, a translational start from a CUG codon has also been found in the RpoTp gene of tobacco [12]. Thus, we assume that translation of NaRpoTp starts from a non-canonical CUG at position +148.
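The reasoning about upstream stop codons can be illustrated with a minimal Python sketch. It is not the procedure used in this study; the function names and the toy sequence are hypothetical, and only CTG is considered as a non-ATG candidate. Starting from the first in-frame ATG, the sketch walks upstream codon by codon to the nearest in-frame stop codon, which bounds the region where translation could begin, and then lists in-frame CTG triplets inside that region.

```python
# Illustrative sketch only: names and the toy sequence are hypothetical,
# not taken from the NaRpoTp cDNA.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def upstream_inframe_stop(seq, anchor):
    """Return the 0-based index of the nearest stop codon that lies upstream
    of `anchor` and in the same reading frame, or None if there is none."""
    seq = seq.upper()
    for pos in range(anchor - 3, -1, -3):
        if seq[pos:pos + 3] in STOP_CODONS:
            return pos
    return None


def inframe_candidates(seq, anchor, codons=("CTG",)):
    """List 0-based positions of candidate non-ATG start codons that are in
    frame with `anchor` and downstream of the nearest upstream stop codon."""
    seq = seq.upper()
    stop = upstream_inframe_stop(seq, anchor)
    # Without an upstream stop, begin at the first in-frame position.
    first = anchor % 3 if stop is None else stop + 3
    return [p for p in range(first, anchor, 3) if seq[p:p + 3] in codons]


# Toy example: stop codon, two sense codons around a CTG, then the first ATG.
TOY = "TAAGCACTGTCTACCATGGCC"
first_atg = TOY.find("ATG")
print(upstream_inframe_stop(TOY, first_atg))  # index of the bounding stop
print(inframe_candidates(TOY, first_atg))     # in-frame CTG positions
```

In a real analysis the anchor would be the most upstream in-frame ATG of the cDNA, and the candidate list would then be checked against targeting predictions and experimental data, as described in the text.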
The predicted NaRpoT proteins comprise 996 (NaRpoTm1), 990 (NaRpoTm2) and 985 (NaRpoTp) amino acids, respectively. NaRpoTm1 and NaRpoTm2 exhibit a remarkably high identity of 96.8%; NaRpoTp has 63.1% and 64.6% identical residues compared with NaRpoTm1 and NaRpoTm2, respectively. The alignment of the RpoT polymerases from N. advena with those from Arabidopsis, Physcomitrella and Selaginella (see Figure 2) demonstrates a high degree of conservation at the amino acid sequence level, most striking in the C-terminal part, including all functionally crucial regions and residues known from the phage T7 RNA polymerase [28,29].
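For readers who wish to reproduce identity values of this kind from an alignment, the following Python sketch shows one common way to compute pairwise percent identity. It rests on assumptions not stated in the text: the two sequences are already aligned to equal length, gaps are written as "-", and columns containing a gap are excluded from the denominator. Other conventions (for example, dividing by the length of the shorter sequence) give slightly different numbers. The function name and the toy peptides are hypothetical.

```python
# Illustrative sketch only: the gap convention and the toy peptides are
# assumptions, not the method or data of this study.
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over alignment columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Toy aligned pair: six gap-free columns, five of them identical.
print(round(percent_identity("MKT-LLA", "MRTALLA"), 1))
```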
Figure 2. Comparison of the deduced amino acid sequences of RpoT polymerases. Sequences from Nuphar (NaRpoTm1, NaRpoTm2 and NaRpoTp), Selaginella (SmRpoTm), Arabidopsis (AtRpoTm, AtRpoTp and AtRpoTmp) and Physcomitrella (PpRpoT1mp, PpRpoT2mp and PpRpoT3) were aligned using ClustalW. Accession numbers are as follows: AtRpoTm, P92969; AtRpoTmp, CAC17120; AtRpoTp, O24600; PpRpoTmp1, CAC95163; and PpRpoTmp2, CAC95164. PpRpoT3 is an RpoT amino acid sequence derived from the database of the Physcomitrella patens genome project (http://www.phytozome.net/physcomitrella). In silico analysis of the genome as well as expressed sequence tag (EST) data strongly suggest that the sequence, designated as PpRpoT3, is a product of an RpoT gene with the conserved intron-exon structure of land plants that encodes a functional RNA polymerase (U. Richter, unpublished data). Black lines indicate conserved blocks in the RpoT polymerase family; functionally crucial residues [28,29] are indicated by asterisks. The position of common introns is designated by filled triangles and [...]

Targeting of the N. advena RpoTm1 and RpoTm2 polymerases
Subcellular localization of the Nuphar RpoT gene products was predicted using the algorithms TargetP [30] (www.cbs.dtu.dk/services/TargetP) and Predotar [31] (http://urgi.versailles.inra.fr/predotar/predotar.html). For NaRpoTm1 and NaRpoTm2, both algorithms specified a mitochondrial import of the proteins, whereas analysis of NaRpoTp clearly indicated plastid targeting properties. To verify the subcellular localization, the amino termini of the Nuphar RpoT sequences were translationally fused to GFP (Figure 3).
Assuming that translation starts from the first encoded methionine, the following constructs were generated: Na-RpoTm1met-GFP and Na-RpoTm2met-GFP, with the first encoded methionine cloned immediately downstream of the 35S promoter for forced translation initiation; Na-RpoTm1utr-GFP and Na-RpoTm2utr-GFP, containing the whole 5' untranslated region; and Na-RpoTm1mut-GFP and Na-RpoTm2mut-GFP, in which the encoded methionine had been substituted by isoleucine (see Figure 3). The fusion proteins were expressed in Arabidopsis protoplasts. The results of the subcellular import studies are presented in Figure 4. Transformation with the mitochondrial control CoxIV-GFP [32] resulted in accumulation of GFP in punctate structures of about 1 μm in size (Figure 4A) identified as mitochondria [7,11]. A GFP fusion of the amino terminus of Arabidopsis RecA [32] was employed as a plastid control (Figure 4B). In accordance with the targeting predictions, both Na-RpoTm1-GFP and Na-RpoTm2-GFP constructs exhibited the same characteristic subcellular localization: in the case of Na-RpoTm1met-GFP (Figure 4D) and Na-RpoTm2met-GFP (Figure 4G), with forced translation from the first encoded methionine, GFP fluorescence was observed exclusively in mitochondria. The constructs containing the full-length 5' untranslated leader sequence, Na-RpoTm1utr-GFP (Figure 4E) and Na-RpoTm2utr-GFP (Figure 4H), showed exclusive mitochondrial targeting as well. When the mutated transit peptides (preventing recognition of the AUG codon) Na-RpoTm1mut (Figure 4F) and Na-RpoTm2mut (Figure 4I) were used, GFP fluorescence was detectable neither in mitochondria nor in chloroplasts. It was concluded that the AUG codons at positions +177 (NaRpoTm1) and +253 (NaRpoTm2), respectively, are the only available RpoT start codons from which translation of polypeptides with mitochondrial targeting properties is initiated.

Nuphar RpoTp translation is efficiently initiated at a CUG codon
Examination of NaRpoTp upstream sequences revealed a CTG triplet at nucleotide position +148 (see above). Translation initiation at this CUG codon would give rise to an RpoTp protein of 985 residues, the amino terminus of which was predicted in silico to possess plastid targeting properties. To experimentally test whether translation indeed initiates at this non-canonical codon, the following three Na-RpoTp-GFP constructs were generated (see Figure 3): Na-RpoTpmet*-GFP, with the wild-type CUG (+148) cloned immediately downstream of the 35S promoter for forced translation; Na-RpoTputr-GFP, containing the whole 5' untranslated region of 236 nt and thus preserving the sequence context, known to be crucial for initiation at non-AUG codons in plants [33]; and Na-RpoTpmut-GFP, in which the CUG was modified to CAC to prevent the recognition of CUG as a start codon. The Na-RpoTpmet*-GFP construct gave rise to GFP fluorescence that overlapped with the red chlorophyll autofluorescence, confirming localization of the fusion protein in chloroplasts (Figure 4J). An identical fluorescence pattern was observed using construct Na-RpoTputr-GFP (Figure 4K), whereas expression of Na-RpoTpmut-GFP (Figure 4L) completely abolished import of GFP into the chloroplasts. These data provide convincing evidence that translation of NaRpoTp is solely initiated from the CUG codon at position +148.
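The compositional hallmark of stroma-targeting transit peptides mentioned earlier (many hydroxylated residues, few acidic ones) can be checked with a few lines of Python. This is a rough illustration only and not a substitute for TargetP or Predotar. It assumes that "hydroxylated" refers to serine and threonine and "acidic" to aspartate and glutamate; the function name and the toy peptide are hypothetical and are not derived from NaRpoTp.

```python
# Illustrative sketch only: residue classes and the toy peptide are
# assumptions made for this example.
HYDROXYLATED = set("ST")  # serine, threonine
ACIDIC = set("DE")        # aspartate, glutamate


def transit_peptide_composition(peptide):
    """Return the fractions of hydroxylated and acidic residues in a peptide."""
    pep = peptide.upper()
    if not pep:
        raise ValueError("peptide must not be empty")
    n = len(pep)
    return {
        "hydroxylated": sum(1 for aa in pep if aa in HYDROXYLATED) / n,
        "acidic": sum(1 for aa in pep if aa in ACIDIC) / n,
    }


# Toy N-terminal stretch: six of ten residues are Ser/Thr, none are acidic.
print(transit_peptide_composition("MASSTLSRTS"))
```

A high hydroxylated fraction combined with a near-zero acidic fraction is only suggestive; localization still has to be confirmed experimentally, as was done here with the GFP fusions.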

Phylogenetic analysis
Phylogenetic trees were reconstructed using Bayesian inference, maximum likelihood (ML) and maximum parsimony (MP) to elucidate the molecular phylogeny of the RpoT polymerases and to determine the evolutionary position of the polymerases identified and described in the present study. Tree reconstruction was based on a multiple alignment of 41 RpoT sequences (see "Methods"). Bayesian as well as ML and MP analyses resulted in essentially the same topology (not shown). Figure 5 shows the consensus tree of a Bayesian analysis in which angiosperm RpoT polymerases constitute two clearly discernible groups: one consisting of plastid-localized polymerases, and the other of mitochondrial-localized and dual-targeted enzymes. Whereas the Selaginella and Physcomitrella polymerases do not belong to the branches of well-separated plastid and mitochondrial (and dual-targeted) polymerases, the RpoT polymerases from the basal angiosperm N. advena cluster with the branches of plastid and mitochondrial/dual-targeted sequences: NaRpoTm1 and NaRpoTm2 within the mitochondrial, and NaRpoTp within the plastid branch.

Discussion
Genes encoding phage-type mitochondrial and plastid RNA polymerases have been identified from numerous monocotyledonous and eudicotyledonous angiosperm species (for review, see [1]). In contrast, knowledge of RpoT polymerases of deep branching land plants is so far limited to the moss Physcomitrella patens [19,20] and the lycophyte Selaginella moellendorffii [26], and no information at all is available about phage-type RNA polymerases from the basal angiosperm lineages that precede the monocot-eudicot divergence. Here we show that the waterlily Nuphar advena, a basal angiosperm, encodes three RpoT polymerases. The encoded proteins of 996, 990, and 985 amino acids, respectively, exhibit the characteristic domains that are highly conserved between all RpoT polymerases, including the residues shown to be essential and located within the catalytic pocket of the polymerase (D537, K631, Y639, G640, D812, residue numbers as given for T7 RNA polymerase). The high conservation of amino acid sequences and the identical position of the introns in the RpoT genes of Selaginella, Physcomitrella, Nuphar and monocotyledonous and eudicotyledonous angiosperms (see Figure 2) suggest a common ancestral gene giving rise to all land plant RpoT genes. Phylogenetic analysis (see Figure 5) supports this hypothesis.
Although Physcomitrella (one mitochondrial and two dual-targeted) and eudicots (one mitochondrial, one plastid and one dual-targeted) also possess three phage-type RNA polymerases, the localization of the three Nuphar RpoT polymerases shows a new pattern. The N-termini of two of the three RpoT gene products of N. advena show properties of mitochondrial transit peptides. Using translational fusions of the putative NaRpoT transit peptides with GFP, we demonstrated that these transit peptides confer exclusively mitochondrial import.
Mitochondrial import of NaRpoTm1- and NaRpoTm2-GFP was also maintained when the fusion constructs contained the full-length 5'-UTRs of the genes (Figure 4). We included these constructs in our study since the presence of the 5'-UTR may alter the targeting of proteins [34]. Thus, we conclude that N. advena encodes two phage-type mitochondrial RNA polymerases. Phylogenetic analysis (see Figure 5) indicates that the third RpoT gene of Nuphar, NaRpoTp, encodes a plastid phage-type RNA polymerase. In the 5' part of the NaRpoTp cDNA no canonical start codon was identified, with the first ATG triplet occurring only at position 466. However, a potential non-AUG initiation codon (CUG) was revealed at position 148. Translation from this codon would yield an N-terminal leader peptide with genuine plastid targeting properties, as predicted by two prediction algorithms (TargetP and Predotar). Three different GFP fusions were designed to test the translation initiation capacity of this CUG codon. The results demonstrated plastid import of the derived amino terminus (Figure 4J), as well as efficient translation initiation at the CUG within the context of the full-length 5'-UTR (Figure 4K) that could be abolished by modifying the codon to CAC (Figure 4L). Thus, Nuphar RpoTp belongs to the rare cases of non-viral plant genes [35][36][37] that initiate translation exclusively at a non-AUG codon. Interestingly, this is the second case of non-AUG translation initiation among RpoT genes specifying plastid-localized RNA polymerases: translation of the tobacco RpoTp gene also starts from a CUG codon [12]. Both mono- and eudicotyledonous plants possess a solely plastid-localized phage-type RNAP (RpoTp) together with a purely mitochondrial-localized RpoT enzyme (RpoTm) and, in the case of eudicots, a third phage-type RNAP with dual localization in both organelles is found.
The data presented here suggest that all RpoTp proteins descend from a common duplication event that took place in a common ancestor of all flowering plants. Thus far it is unknown whether ferns or gymnosperms contain nuclear genes encoding plastid-localized phage-type RNAPs as well. Since the duplication event giving rise to the second NEP activity in eudicots is clearly more recent, identification of a purely plastid-localized phage-type RNAP in the basal angiosperm Nuphar advena, orthologous to all other purely plastid-targeted enzymes (RpoTp) of flowering plants, suggests that the acquisition of a nuclear gene-encoded transcriptional activity for plastids, not present in lycopods, took place after the split of lycopods from all other tracheophytes, with or before the rise of flowering plants. Moreover, the lack of a dual-targeted RpoTmp both in Nuphar and in monocots suggests that the RpoTmp enzyme detected in eudicots is an 'invention' due to an RpoTm gene duplication that might have occurred only after the separation of monocots and eudicots. The putative plastid targeting sequences as present in two of the three Physcomitrella RpoT proteins are therefore clearly species- or lineage-specific convergent inventions. Interestingly, multiple mitochondrial RNA polymerases as found in Physcomitrella and eudicots are identified in Nuphar as well. The fixation of duplicated RpoT genes has led to a convergent multiplicity of mitochondrial RNAPs in Nuphar, Physcomitrella and eudicots, not found in any other eukaryotic lineage.
Recently it was shown that in Arabidopsis RpoTmp null mutants transcription of a specific set of mitochondrial genes is strongly reduced. Moreover, accumulation of respiratory complexes was affected to very different levels, suggesting that the presence of multiple transcriptional activities in mitochondria may allow plants to regulate mitochondrial gene expression in a complex-specific manner [24]. Further investigations will be necessary to show whether a similar division of labor evolved in the case of the two mitochondrial RNA polymerases in Nuphar and to address the specific impact of NEP and PEP transcriptional activities on gene expression in Nuphar chloroplasts.

Conclusions
Identification of three RpoT genes in Nuphar advena, specifying two mitochondrial and one plastid-localized polymerases, suggests that multiple phage-type organellar RNAPs already exist among basal angiosperms. From the high similarity of the encoded amino acid sequences, the conservation of intron positions and phylogenetic analysis we conclude that the RpoT genes of Nuphar, like those of Selaginella, Physcomitrella and monocotyledonous and eudicotyledonous angiosperms, trace back to a common ancestral gene giving rise to all land plant RpoT genes. The presence of a plastid-localized phage-type RNAP in this basal angiosperm, orthologous to all other RpoTp enzymes of flowering plants, suggests that the duplication event giving rise to a nuclear gene-encoded plastid RNA polymerase, not present in lycopods, took place after the split of lycopods from all other tracheophytes. A dual-targeted mitochondrial and plastid RNA polymerase (RpoTmp), as present in eudicots but not monocots, was not detected in Nuphar, suggesting that this additional NEP activity (RpoTmp) is an evolutionary novelty of eudicotyledonous plants like Arabidopsis. Our results support the idea that RpoT gene duplications occurred independently of each other several times during the evolution of plants and led to different subcellular localization patterns of organellar RNA polymerases. These data substantially extend our knowledge about the evolution of the transcriptional machineries in plant organelles.

Plant material and growth conditions
Nuphar advena plants were purchased from a commercial supplier (Seerosen Shop, Eschede, Germany). The plants were grown in a growth chamber at 23°C with a light/dark regime of 8/16 h. The light intensity in all experiments was 210 μmol photons s⁻¹ m⁻².

DNA and RNA isolation
Leaves of N. advena were ground to fine powder under liquid nitrogen and incubated in three volumes of CTAB