Evolution of RLSB, a nuclear-encoded S1 domain RNA binding protein associated with post-transcriptional regulation of plastid-encoded rbcL mRNA in vascular plants
- Pradeep Yerramsetty†1,
- Matt Stata†2,
- Rebecca Siford1,
- Tammy L. Sage2,
- Rowan F. Sage2,
- Gane Ka-Shu Wong3, 4, 5,
- Victor A. Albert1 (corresponding author) and
- James O. Berry1 (corresponding author)
© The Author(s). 2016
Received: 20 January 2016
Accepted: 14 June 2016
Published: 29 June 2016
RLSB, an S1 domain RNA binding protein of Arabidopsis, selectively binds rbcL mRNA and co-localizes with ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) within chloroplasts of C3 and C4 plants. Previous studies using both Arabidopsis (C3) and maize (C4) suggest RLSB homologs are post-transcriptional regulators of plastid-encoded rbcL mRNA. While RLSB accumulates in all Arabidopsis leaf chlorenchyma cells, in C4 leaves RLSB-like proteins accumulate only within Rubisco-containing bundle sheath chloroplasts of Kranz-type species, and only within central compartment chloroplasts in the single-cell C4 plant Bienertia. Our recent evidence implicates this mRNA binding protein as a primary determinant of rbcL expression, cellular localization/compartmentalization, and photosynthetic function in all multicellular green plants. This study addresses the hypothesis that RLSB is a highly conserved Rubisco regulatory factor that occurs in the chloroplasts of all higher plants.
Phylogenetic analysis has identified RLSB orthologs and paralogs in all major plant groups, from ancient liverworts to recent angiosperms. RLSB homologs were also identified in algae of the division Charophyta, a lineage closely related to land plants. RLSB-like sequences were not identified in any other algae, suggesting that the family may be specific to the evolutionary line leading to land plants. The RLSB family occurs in single copy across most angiosperms. A few species with two copies were identified; these were seemingly randomly distributed throughout the various taxa, but in some cases may correlate with known ancient whole genome duplications. Monocots of the order Poales (Poaceae and Cyperaceae) were found to contain two copies, designated here as RLSB-a and RLSB-b, with only RLSB-a implicated in the regulation of rbcL across the maize developmental gradient. Analysis of microsynteny in angiosperms revealed high levels of conservation across eudicot species and for both paralogs in grasses, highlighting the possible importance of maintaining this gene and its surrounding genomic regions.
Findings presented here indicate that the RLSB family originated as a unique gene in land plant evolution, perhaps in the common ancestor of charophytes and higher plants. Purifying selection has maintained this as a highly conserved single- or two-copy gene across most extant species, with several conserved gene duplications. Together with previous findings, this study suggests that RLSB has been sustained as an important regulatory protein throughout the course of land plant evolution. While only RLSB-a has been directly implicated in rbcL regulation in maize, RLSB-b could have an overlapping function in the co-regulation of rbcL, or may have diverged as a regulator of one or more other plastid-encoded mRNAs. This analysis confirms that RLSB is an important and unique photosynthetic regulatory protein that has been continuously expressed in land plants as they emerged and diversified from their ancient common ancestor.
As photosynthetic organelles, chloroplasts perform several functions that are ultimately essential for all life on Earth. In higher plants and eukaryotic algae, their most biologically significant activities are the conversion of solar energy into organic energy and the release of oxygen. The resulting energy molecules ATP and NADPH support biological carbon fixation, initiated through the carboxylation activity of the chloroplastic enzyme ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) and mediated through the Calvin-Benson cycle [1, 2]. The plastids of higher plants originated from ancient photosynthetic prokaryotes through endosymbiosis approximately 1.5 billion years ago. Organelle evolution has incorporated significant plastid genome reduction, so that only about 100–200 genes are encoded on a small circular genome of approximately 150 kilobases in size. The rest of the 2000–3000 proteins utilized within each chloroplast are encoded by the nuclear genome, translated in the cytoplasm, and imported into the chloroplasts via a plastid targeting/transit sequence [1–3]. Anterograde (nucleus to organelle) and retrograde (organelle to nucleus) signaling processes ensure the coordination of gene expression between the two compartmentalized genomes, so that the protein composition and biological processes confined within the chloroplasts themselves are appropriately integrated with the many other processes occurring throughout the entire plant cell.
Plastid-encoded genes are regulated primarily at post-transcriptional levels, with mRNA translation, processing, and stability being primary regulatory determinants [1, 4–7]. Anterograde signaling is dependent on nuclear-encoded, plastid-targeted RNA-binding proteins that interact directly with cis-acting sequences of plastid-encoded mRNAs, usually within their untranslated regions (UTRs). There are several classes of sequence-specific binding proteins, the most predominant being the pentatricopeptide repeat (PPR) proteins, with about 450 transcript-specific forms enabling many aspects of RNA metabolism [1, 4]. There are many other types of nuclear-encoded RNA binding proteins that affect chloroplast gene expression, including the CRM, PORR, and APO1 families [1, 4], which for the most part have not been well characterized. Recently, the list of categories for RNA binding proteins with demonstrated effects on plastid gene expression was expanded through the identification of the RLSB (rbcL RNA S1-Binding domain) protein family, which is defined by its distinct nucleic acid binding domain. RLSB homologs have been associated with post-transcriptional expression of the plastid-encoded Rubisco rbcL gene in both C3 and C4 plant species [8, 9].
Rubisco is the principal enzyme of photosynthetic carbon fixation and is central to the viability, growth, and productivity of all plants. Compartmentalized within chloroplasts, it consists of eight large (LSU; 51–58 kDa) and eight small (SSU; 12–18 kDa) subunits [1, 10, 11]. The LSU-encoding rbcL gene is transcribed and translated within chloroplasts, while the nuclear SSU-encoding RbcS gene family is translated on cytoplasmic ribosomes as a precursor containing an N-terminal plastid transit sequence. The rbcL and RbcS mRNAs, as well as their encoded proteins, are coordinately regulated so that equal amounts of both subunits accumulate in each chloroplast. Regulation of Rubisco gene expression at post-transcriptional levels, including regulation of mRNA processing (degradation, stabilization, or maturation of transcripts) and control of translation, has been documented in many plant species [1, 10–12]. Post-transcriptional control has been implicated in the regulation and coordination of RbcS and rbcL genes in response to a variety of developmental and environmental signals [1, 10, 11]. Post-transcriptional regulation also plays a significant role in the cell-type specific compartmentalization of rbcL gene expression in plants that use the highly efficient C4 photosynthetic pathway for carbon fixation [1, 10, 11, 13]. This pathway requires that rbcL and RbcS gene expression become specifically localized to the internal leaf bundle sheath (B) cells that surround the vascular tissue, while the outer layer of photosynthetic cells, the leaf mesophyll (M) cells, does not express either subunit. While multiple examples of post-transcriptional Rubisco regulation have been described, very little is known about specific trans-acting factors involved in the regulation of either subunit. RLSB proteins represent rare examples of potential anterograde regulatory factors associated with post-transcriptional rbcL gene expression.
Encoded by the nuclear RLSB gene family, RLSB-like proteins appear to be highly conserved among plant species. The S1 binding domain that distinguishes this protein family was first identified in ribosomal protein S1, and is found in a large number of RNA binding proteins. While non-ribosomal proteins known to possess S1 binding domains are widespread among a variety of organisms, including plants, animals, and prokaryotes [8, 14], very little is currently known about the function of most proteins containing this domain. Previous studies identified RLSB orthologs in more than 100 plant species, including eudicots, monocots, C3 and C4 species; similarities range from 60 % (maize-Arabidopsis) to 90 % (maize-sorghum) [8, 9]. All of these contain a plastid transit sequence, and RLSB homologs have been shown to co-localize with the LSU to leaf chloroplasts in both C3 and C4 plants [8, 9]. Most significantly, RLSB accumulates only within Rubisco-containing chloroplasts of B cells (and not M cells, which lack Rubisco) in the leaves of several C4 plants, providing additional correlative evidence for its association with rbcL gene expression. Even within the unique single-cell C4 chlorenchyma cells of Bienertia sinuspersici leaves, the RLSB homolog is highly specific to LSU-containing central compartment chloroplasts, and not to peripheral compartment chloroplasts that lack Rubisco. The co-localization of RLSB proteins with Rubisco within the chloroplasts of C3 and C4 plants, their selective in vitro and in vivo binding to rbcL mRNA, the correlation with reduced rbcL mRNA and protein accumulation in C3 and C4 RLSB mutants (as seen in insertion and RNA-silenced lines), and their strong conservation across many plant species together provide support for a model in which RLSB proteins function as trans-acting regulatory determinants for Rubisco gene expression in all higher plants [8, 9].
As a step toward understanding how RLSB proteins relate to chloroplast development, rbcL gene expression, and overall photosynthetic function within the many different groups of plants, the evolutionary analyses presented here have focused on the distribution, copy number, and variation for genes encoding this protein across a highly diverse sampling of higher plant species. These analyses address the hypothesis that, as a central regulator of Rubisco expression, one or more copies of RLSB-like genes will be present, expressed, and highly conserved across a very broad range of plant genomes. Our findings show that nuclear-encoded RLSB-like genes are very highly conserved in higher plants. They occur as a single-copy gene in nearly all of the eudicot species examined, with a few rare species possessing two paralogs seemingly randomly distributed throughout this clade, but possibly correlating with some known ancient whole genome duplication events. Duplications of RLSB-like genes were also found in a few lower plant species. However, monocot species in the family Poaceae (grasses) contain a conserved paralog, the function of which has not yet been determined. With regard to the paralogs found in Poaceae, we have designated the previously identified gene associated with Rubisco regulation that is present in all higher plants as RLSB-a, and the newly identified grass-specific paralog as RLSB-b. RLSB-a was found to occur within a region with high levels of local synteny, suggesting purifying selection has influenced its copy number and regional localization following whole genome duplication events. Our data identify RLSB-like proteins as highly conserved regulatory determinants associated with photosynthetic carbon fixation in all plants, including C3 as well as C4 species.
Understanding RLSB protein family evolution throughout the plant kingdom provides a new window into the evolution of regulatory mechanisms responsible for the synthesis of Rubisco, and accordingly, primary productivity throughout Earth’s biosphere. Identification of molecular evolutionary processes responsible for photosynthetic carbon assimilation is an important step for bioengineering crop plants to enhance food and biofuel production.
Identification and analysis of expressed RLSB orthologs and paralogs in transcriptome databases
Conserved RLSB gene family sequences from a highly diverse range of plant species were obtained using the BLAST [16, 17] tblastn algorithm with Zea mays (GRMZM2G087628) and Arabidopsis (JX843767) RLSB as query sequences. Multiple databases were screened, including the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov), Phytozome 10.2 (http://phytozome.jgi.doe.gov/pz/portal.html#!search), the 1000 Plants project (https://sites.google.com/a/ualberta.ca/onekp/home) and the CoGe server (https://genomevolution.org/coge/). RLSB-like orthologs and paralogs were identified as having significant E-values (usually less than 10⁻⁵) and preserving the known conserved S1 RNA binding motifs. We identified homologs in more than 245 plant species using the aforementioned databases. This search also identified an RLSB paralog (RLSB-b) in several C3 and C4 grasses and sedges. Among these sequences, some represented complete full-length mRNAs, while others were scaffolds of partially sequenced genes aligned using the T-coffee sequence (TCS) aligner software (http://tcoffee.crg.cat/apps/tcoffee/index.html). This software revealed conserved regions of the orthologs, particularly within the S1 RNA binding domain region. TCS analysis of the alignment showed a score higher than 85 for all species, indicating a highly reliable alignment.
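The E-value screen described above can be illustrated with a minimal Python sketch. It assumes the BLAST results were saved in the standard 12-column tabular format (-outfmt 6), in which the eleventh column holds the E-value; the function name and the hit identifiers are hypothetical, and the second criterion (preservation of the S1 RNA binding motifs) is not checked here.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-5  # threshold quoted in the text ("usually less than 10^-5")


def filter_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return (query, subject, evalue) tuples for hits below the cutoff.

    Assumes default BLAST tabular columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])
        if evalue < cutoff:
            hits.append((row[0], row[1], evalue))
    return hits


# Hypothetical example rows; the identifiers and values are made up for illustration.
example = (
    "AtRLSB\thypothetical_hit_1\t62.5\t200\t75\t0\t1\t200\t1\t200\t3e-40\t150\n"
    "AtRLSB\thypothetical_hit_2\t31.0\t60\t41\t1\t10\t69\t5\t64\t0.002\t35.1\n"
)
kept = filter_hits(example)
```

Only the first example row passes the cutoff. In practice, each retained hit would still need to be inspected for the conserved S1 motifs before being treated as an RLSB homolog.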
Phylogenetic analysis of RLSB gene family transcripts
Orthologous transcriptome sequences were identified in the 1KP datasets and Phytozome 10.2, using BLAST for initial identification and then for validation via reciprocal BLAST back to the Arabidopsis sequence. Since de novo transcriptome assemblies are often fragmented, when multiple sequences were recovered from a given species the alignment and a preliminary approximate maximum-likelihood tree generated using FastTree were used to manually distinguish fragments from paralogs. If multiple sequences from a single species did not overlap in the alignment and the tree showed no evidence of duplication in that lineage the sequences were assumed to be transcript fragments and were combined in the alignment. If these sequences did overlap, they were assumed to be paralogs if they differed at the amino acid level. Overlapping sequences without amino acid substitutions were assumed to represent allelic variation and were combined. Some key taxa lacked a hit against RLSB, e.g., the two hornwort species Nothoceros aenigmaticus and N. vincentianus, as well as Welwitschia mirabilis. For these species, targeted assemblies were attempted using BLAST to identify short reads that mapped to RLSB family protein sequences from related taxa and de novo assembly of the identified reads using Geneious. This approach successfully assembled an RLSB-like coding sequence for Welwitschia, but failed to assemble sequences for the hornworts. It is possible that the hornworts lack RLSB homologs, or that they simply do not express it in tissues that were sampled for RNA-seq.
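The decision rules for multiple sequences recovered from one species can be sketched as follows. This is a simplified illustration, not the procedure as actually run: the classification in the study was done manually and also drew on the preliminary FastTree tree, which this sketch ignores. The function names are hypothetical, and the inputs are assumed to be rows of the same amino acid alignment with '-' marking gaps.

```python
GAP = "-"


def classify_pair(seq_a, seq_b):
    """Classify two aligned sequences from one species.

    Returns "fragments" if the rows share no aligned column,
    "paralogs" if they overlap and differ at any shared residue,
    and "allelic" if they overlap without amino acid differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    shared = [(a, b) for a, b in zip(seq_a, seq_b) if a != GAP and b != GAP]
    if not shared:
        return "fragments"
    if any(a != b for a, b in shared):
        return "paralogs"
    return "allelic"


def merge(seq_a, seq_b):
    """Combine two rows column by column, preferring residues over gaps."""
    return "".join(a if a != GAP else b for a, b in zip(seq_a, seq_b))


# Toy alignment rows, made up for illustration.
print(classify_pair("MKV---", "---LTA"))  # no shared columns
print(classify_pair("MKVL--", "--VLTA"))  # overlap, identical residues
print(classify_pair("MKVL--", "--ILTA"))  # overlap, V/I difference
print(merge("MKV---", "---LTA"))
```

Rows classified as fragments or allelic would be combined with `merge`, while rows classified as paralogs would be kept as separate entries.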
Bayesian phylogenetic reconstruction was performed on the trimmed amino acid alignment using MrBayes 3.2.5 with 2 independent runs, 32 chains per run. The amino acid substitution model was set to mixed, and thus determined by the MCMC run, and the favoured model was that of Jones et al. The analysis was run for over 6.5 million generations, and plateaued with average standard deviation of split frequencies of approximately 4.2 % for the final 2 million generations. Six thousand five hundred trees from these last 2 million generations were sampled from each run and a majority rule consensus tree was made using the consensus program included with ExaBayes 1.4.2, since MrBayes lacks such a stand-alone program. A consensus support threshold of 50 % posterior probability was selected, with nodes below this level of support collapsed.
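The 50 % support threshold can be illustrated with a small Python sketch. It is not the ExaBayes consensus program; it only shows the counting step, under the assumption that each sampled tree has already been reduced to a set of splits (bipartitions), with each split written canonically as a frozenset of the taxa on one side. The taxon names and trees are made up.

```python
from collections import Counter


def majority_rule_splits(trees, threshold=0.5):
    """Return {split: frequency} for splits at or above the threshold.

    `trees` is a list of sets of splits. The frequency of a split across
    the sampled trees approximates its posterior probability. Splits below
    the threshold are dropped, which corresponds to collapsing those nodes.
    Conflicts between splits at exactly the threshold are not resolved here.
    """
    counts = Counter(split for tree in trees for split in tree)
    n = len(trees)
    return {s: c / n for s, c in counts.items() if c / n >= threshold}


# Four toy trees on taxa A-E, each given as its set of non-trivial splits.
ab, abc = frozenset("AB"), frozenset("ABC")
abd, ac = frozenset("ABD"), frozenset("AC")
sampled = [{ab, abc}, {ab, abc}, {ab, abd}, {ac, abc}]
supported = majority_rule_splits(sampled)
```

Here the splits AB and ABC each occur in three of four trees and are retained, while ABD and AC occur once each and would be collapsed.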
Analysis of synteny
Synteny for genomic RLSB-like genes and their surrounding regions among major angiosperm species was assessed and visualized using the CoGe server (https://genomevolution.org/CoGe/) as previously described.
Maize lines and growth conditions
For cloning of the maize RLSB-a and RLSB-b paralogs, and for mRNA analysis, seeds of wild type B73 and rlsb-a1/rlsb-a2 Mu-insertion mutant plants (these were previously designated as rlsb-1/rlsb-2) were germinated and grown in a growth chamber as described previously.
cDNA cloning and qRT-PCR of maize RLSB paralogs in wild type and mutant maize leaves
To confirm the specificity of primers that were used to independently quantify the accumulation of transcripts encoded from the two maize RLSB-like paralogs, total RNA was harvested from the leaves of both wild type B73 and rlsb-a1/rlsb-a2 Mu-insertion mutant plants, as described previously. cDNA was prepared from these RNA samples using the iScript cDNA synthesis kit (Bio-Rad®). cDNA from each paralog (designated as RLSB-a and RLSB-b) was amplified by PCR using primers specific for each transcript. The amplified PCR fragments were then cloned into pDrive vector and transformed into bacteria using a PCR cloning kit (Qiagen®, Hilden, Germany) according to the manufacturer’s instructions, for further analysis. PCR amplifications from the cloned fragments were performed in 25 μl volumes with the AmpliTaq DNA Polymerase buffer II kit (Applied Biosystems, Foster City, CA, USA) using 2.5 μl buffer, 2.5 μl MgCl2, 1.0 μl dNTP, 0.6 μl each of M13F and M13R primers, and 0.2 μl of AmpliTaq polymerase. All PCR products were examined by gel electrophoresis on 1 % agarose gels, and the insert-containing PCR-positive plasmids were sequenced in one direction using M13 primer at the Roswell Park Cancer Institute sequencing facility (http://biopolymer.roswellpark.org). The sequencing results from several independent clones were analyzed using BLAST to confirm that they corresponded to one or both of the maize RLSB paralogs.
Analysis of RLSB-a and RLSB-b transcript accumulation by qRT-PCR
As described previously, leaf 3 (previously referred to as second emerging leaves) from wild type B73 and Mu-insertion mutant maize plants was harvested at 12–13 inches in length. Each leaf was divided into 7 equidistant sections (from the base of the leaf to the tip) for analysis of RLSB-a and RLSB-b mRNAs in the different leaf sections. Total RNA was harvested from each section, according to methods previously described. cDNA was prepared from these RNA samples using the iScript cDNA synthesis kit (Bio-Rad) with primers specific for RLSB-a (left primer, CCACTTCCATAACCCAGCAT, and right primer, ATTTCACTCCAGGGGCACTA) and RLSB-b (left primer, ATCAACAGAAGAAGCGCTCG, and right primer, TAACTAACCCCACGCTCACC). Levels of mRNA for RLSB-a and RLSB-b in the different leaf sections were determined using qRT-PCR. Transcript levels in both cases were quantified using the ΔΔCt method, standardized to actin mRNA. Data were averaged for three wild type and three mutant siblings, with three technical repeats for each of the plant samples. Differences in expression levels of RLSB-a and RLSB-b in each of the seven leaf sections of both wild type and mutant plants, across all repeats, were evaluated using Student’s t-test, with P values lower than 0.05 taken to denote statistical significance. Correlation analyses done separately for the expression level data from the sections representing the yellow bases and from the sections representing the green tip regions of the mutant maize leaves yielded similar results, with r² values of 0.002 and 0.338 respectively, suggesting the absence of significant correlation between the expression levels of RLSB-a and RLSB-b in mutant plants.
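The ΔΔCt calculation can be made concrete with a short Python sketch. It uses the common 2^(−ΔΔCt) form, which assumes roughly 100 % amplification efficiency for both the target and the actin reference. The choice of calibrator sample is not stated in the text, so using a wild type section as calibrator is an assumption here, and all Ct values below are made up for illustration.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative transcript level by the 2^(-ddCt) method.

    Each argument is a list of technical-repeat Ct values:
    target and reference (actin) in the sample of interest, then
    target and reference in the calibrator sample.
    """
    d_ct_sample = mean(ct_target) - mean(ct_reference)  # dCt, sample
    d_ct_calibrator = mean(ct_target_cal) - mean(ct_reference_cal)  # dCt, calibrator
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values, three technical repeats each.
fold = relative_expression(
    ct_target=[26.0, 26.0, 26.0],         # e.g. RLSB-a in a mutant leaf section
    ct_reference=[18.0, 18.0, 18.0],      # actin in the same section
    ct_target_cal=[24.0, 24.0, 24.0],     # RLSB-a in the assumed calibrator
    ct_reference_cal=[18.0, 18.0, 18.0],  # actin in the assumed calibrator
)
```

With these values, ΔCt is 8 in the sample and 6 in the calibrator, so ΔΔCt is 2 and the relative transcript level is 0.25, a fourfold reduction relative to the calibrator.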
RLSB family gene transcripts are present and conserved in all vascular plant groups
Arabidopsis and maize RLSB cDNA sequences were used in a comparative search for expressed orthologs in other angiosperm species using the BLAST tblastn algorithm as a search tool with several plant transcriptome databases. Transcripts were identified as being encoded by RLSB orthologs based on having significant E values (less than 10⁻⁵) and by containing alignable sequence outside the S1 binding motif. The data were compiled mostly from available complete transcriptomes. Data from some plant groups, such as algae, bryophytes, Marchantiophyta, lycophytes, pteridophytes, gymnosperms and angiosperms, were derived from partial transcriptomes or complete transcriptomes based on their availability.
In addition to the higher plants, RLSB-like sequences were identified in several lower plant species, including mosses, liverworts, bryophytes, lycophytes, and ferns (Figs. 2 and 3). RLSB-like transcript sequences were identified in eight Charophyte algae, from the Zygnematophyceae (recognized as the closest extant lineage to the land plants [24–27]), Charales, and Klebsormidiales (the most distant relative of the Embryophyta in which an ortholog was found) (Figs. 1 and 3). Sequences were not identified in any other algal species examined, including an aquatic Chlorophyte (Chlamydomonas reinhardtii) and a brown alga (Sargassum thunbergii). The finding that RLSB homologs are present in Charophyte lineages, considered to be closely related to the common ancestor of all vascular plants [24–27], as well as in all other non-vascular plant groups examined, indicates an ancient origin for this conserved protein family that appears to have preceded the invasion of terrestrial environments.
All of the complete full-length RLSB-like gene sequences were found to encode a chloroplast transit sequence, indicating that their gene products are targeted to chloroplasts (Additional file 1: Figure S1). It should be noted that orthologs that did not indicate a transit peptide were derived from species with incomplete sequence assemblies (such as C. vulgaris). The widespread presence of conserved, plastid-targeted RLSB-like sequences in all of these major plant groups, including charophyte algae, strongly supports a significant and conserved regulatory function for this gene family within the chloroplasts of all plants. These findings are consistent with previous immunolocalization and cell-separation studies showing that RLSB protein homologs are localized to Rubisco-containing chloroplasts in several C3 and C4 dicot and monocot species [8, 9]. Taken together with studies demonstrating the association of RLSB-like proteins with rbcL expression in the C3 dicot Arabidopsis and the C4 monocot maize, findings presented here provide evidence that the family has maintained an essential and conserved role in chloroplast function and the regulation of rbcL expression throughout plant evolution.
To search for the occurrence of potential ancestral S1 domain regulatory proteins in organisms that significantly predate the Charophyte-based monophyletic lineage leading to higher plants, a BLAST search of several representative prokaryotic organisms was conducted using the Arabidopsis RLSB transcript as a reference (see Additional file 2: Table S1). Stringent search parameters (E less than 10⁻⁵) identified protein sequences with very low sequence similarity to the S1-RNA binding domain itself in some purple non-sulfur bacteria (Rhodospirillum, Rhodopseudomonas, etc.), a class of phototrophic bacteria that perform photosynthetic carbon assimilation through Rubisco-like proteins consisting of only LSU subunits. Since these similarities occur only near the C-terminal portion of the proteins that contains the S1-binding domain, the relatedness to RLSB-like proteins is limited only to their basic nucleic acid binding function. Regulatory functions of prokaryotic S1-RNA binding have been shown to play a role in the expression of several essential bacterial genes by binding to the 5′ end of their mRNA to modulate translational initiation or elongation [29–31]. However, an effect on LSU expression for these proteins has not been investigated in these photosynthetic prokaryotes. While it is evident that S1-RNA binding domains do play an important role in prokaryotic as well as eukaryotic gene regulation [8, 14, 31], the current data cannot establish a direct evolutionary relationship between these prokaryotic proteins and the chloroplastic RLSB homologs present in eukaryotic plants.
RLSB gene family transcripts are present as a single copy in most eudicots, and in two copies in many monocot grasses and sedges
Using the Arabidopsis RLSB sequence as a reference, a BLAST search identified RLSB gene family transcripts in all angiosperm species for which data were available. These ranged from the early-diverging angiosperm Amborella to more recent lineages within eudicots and monocots (Figs. 1, 2 and 3). Nearly all of the eudicots included in this analysis had a single copy of the RLSB gene family. However, there were a few rare species among the eudicots that were found to possess two copies (Fig. 3). Examples were the Fabaceae (Glycine max), Phrymaceae (Mimulus), and Arecaceae (Phoenix dactylifera). These were seemingly randomly distributed among families, with no clear taxonomic correlation, although they might be related to some known ancient whole genome duplications. An independent RLSB family duplication was also found at the base of gymnosperms, including taxa such as Ginkgo biloba, Cedrus libani, Cycas micholitzii, Cunninghamia lanceolata, Pinus taeda, and Taxus baccata. Similar deep duplications were also visible in bryophytes such as Rhynchostegium serrulatum, Physcomitrella patens, Sphagnum lescurii, Bryum argenteum, and Ceratodon purpureus. In angiosperms, the only clear lineage-wide duplication appeared among monocot species within the families Poaceae (grasses) and Cyperaceae (sedges), where a conserved paralog is retained whose function has not yet been determined (Fig. 3). In rice and maize the translated protein sequence similarities between these paralogs were approximately 50 and 60 %, respectively. The change from one RLSB-like gene to two paralogs in these commelinid monocots was most likely related to a whole genome duplication (WGD) event that occurred during evolution of the lineage around 70 to 100 million years ago (MYA) [32, 33], followed by chromosomal rearrangements and fusions. Most of the basal monocot species, such as Spirodela and Musa, show only one copy of the RLSB gene family.
Regardless of their duplication status, conservation of RLSB homologs was very high across the range of angiosperm species examined. These findings demonstrate that the RLSB gene family has been strictly maintained, at the very least for a single-copy, canonical function. Indeed, this high degree of conservation not only among angiosperms but across all land plants suggests strong purifying selection acting since their early evolution as well as through their subsequent radiations.
RLSB homologs share microsynteny in several angiosperm species
The two RLSB paralogs in maize
The occurrence of two RLSB paralogs in grasses and some other monocots raises a question about the function of two paralogs in these species. To address this question, the expression of the two maize paralogs RLSB-a and RLSB-b was examined in leaves from wild type as well as the Mu insertion mutant plants described previously. The qRT-PCR mRNA analysis utilized primer sets carefully chosen to make sure that they would only amplify either RLSB-a or RLSB-b transcripts, but not both (Additional file 3: Figure S2). The primers were first used to amplify each transcript from wild type maize leaf cDNA. Each of the amplified PCR products was then cloned, and several independent clones were sequenced to confirm specificity of the primer sets for each of the paralogs (Additional file 3: Figure S2).
The nuclear-encoded RLSB gene family produces mRNA binding proteins that are targeted to chloroplasts. Their defining S1 binding site is found in many other nucleic acid binding proteins, including many non-ribosomal proteins as well as some components of the chloroplast ribosomes [8, 14, 40–42]. However, outside of the conserved S1 binding domain, comparative sequence analysis demonstrates that RLSB proteins are very distinct from other known members of the protein superfamily, including ribosomal protein S1 and the recently identified ribosomal protein SDP [40, 41] (Additional file 4: Figure S3). While little is known about the function of most non-ribosomal S1 domain proteins in plants and other organisms, previous studies have linked RLSB homologs with the expression of the plastid-encoded rbcL gene in C3 and C4 plant species [8, 9].
Findings presented here indicate that RLSB homologs are present across the entire range of vascular plants, and are highly conserved even among evolutionarily divergent species. While most plant species possess only a single copy of this gene, gymnosperms, bryophytes, some eudicots and many species of Poaceae possess two conserved paralogs. Analysis of synteny indicates that the local genomic region surrounding these genes and their paralogs is also conserved in many species. This analysis provides evidence that RLSB gene family sequence, copy number, and dosage have been strongly conserved throughout the evolution of land plants. Findings from these evolutionary analyses, together with the demonstrated role of RLSB-like proteins in the post-transcriptional regulation of plastid-encoded rbcL mRNAs, and the fact that reduced gene expression in both Arabidopsis and maize leads to severe photosynthetic impairment or lethality, provide compelling evidence that the RLSB family is an essential determinant of chloroplast function, rbcL expression, and photosynthesis in all plants.
All major groups of plants, including mosses, ferns, liverworts, gymnosperms and angiosperms, are thought to have originated as a monophyletic group from an ancient charophyte-like green alga between 450 and 500 million years ago [24–27]. RLSB family sequences were identified in several Charophytes, but not in other green algae such as Chlamydomonas reinhardtii (a single-celled green alga) or in the phaeophyte Sargassum thunbergii (a multicellular aquatic brown alga) (see Additional file 2: Table S1, for a list of bacterial and algal species examined). It is possible that the ancestral RLSB mRNA binding protein may have become established and maintained as a nuclear-encoded, plastid-targeted protein in an ancestral charophyte species. If this scenario is correct, then the proposed regulatory function of the protein on plastid-encoded rbcL mRNA could have originated either in an early charophyte, or possibly at some later point during evolution of the now-extinct stem lineage of land plants.
It is worth noting that S1 binding proteins with potential regulatory capability have been found in several prokaryotes, including Rhodopseudomonas palustris, a prokaryotic organism capable of photosynthetic carbon fixation [28–31]. Interestingly, this organism has a Rubisco enzyme that is composed of LSU-like proteins that complex as a homodimer. This would imply that RLSB proteins, which bind rbcL mRNAs, could in fact have a very ancient origin that preceded their establishment in photosynthetic eukaryotes. Although true RLSB homologs were not identified in any prokaryotic species examined, including cyanobacteria, these lineages do possess other S1-domain proteins. It is possible that lateral gene transfer from an endosymbiont-derived primordial chloroplast possessing an early S1-domain RNA binding protein could have led to its incorporation/modification as a nuclear-encoded regulatory gene during a very early stage of plant cell evolution, and that one of these S1 proteins subsequently gave rise to the RLSB gene family via duplication and neofunctionalization. This mechanism has been proposed for many nuclear-encoded plastid genes, including some involved in chloroplast regulation and translation [1, 4, 44–47].
RLSB homologs and their surrounding genomic regions occur in duplicate in maize and many other monocot grasses. This is consistent with an early whole genome duplication that occurred at the base of the family Poaceae [48, 49], followed by the subsequent relocation of the duplicated block in cases such as maize, where the two copies exist on different chromosomes in the modern genome. This duplication has been maintained in modern grass species, suggesting that adaptive advantage (through neofunctionalization of one duplicate) or functional partitioning (via subfunctionalization) has led to the two RLSB-like paralogs becoming fixed within the genomes of this clade [38, 50–52]. Each of these processes would be consistent with the finding that RLSB-a and RLSB-b are both expressed without significant variation across the entire maize leaf gradient (Fig. 6), while inactivation of RLSB-a by itself was sufficient to cause reductions in rbcL mRNA and protein accumulation in the maize Mu insertion lines described previously. If the two paralogs have diverged to recognize different binding/regulatory mRNA targets (neofunctionalization), then RLSB-a might be specifically associated with rbcL transcripts, while RLSB-b could be associated with another as-yet unidentified plastidic mRNA. In another form of neofunctionalization, one paralog might have acquired a novel pattern of cell- or tissue-specific expression. This could have led to divergent patterns of activity, with one of the gene duplicates being more active at the leaf base and the other at the leaf tip. It is also possible that the two paralogs have diverged but are both still required for binding/regulation of rbcL mRNA within the same leaf cells (subfunctionalization), perhaps associating together as an RNA binding heterodimer.
In this case, the loss of function for one of the two interacting proteins would be enough to cause loss of function for the entire heterodimer, leading to the rbcL mRNA and protein reductions observed in the mutant maize lines. If both paralogs have retained their original function (conservation of function), then each of the RLSB-like genes might serve identical complementary roles in rbcL mRNA metabolism, with both copies required for optimal (maximized) rbcL expression in these monocot leaves. This might explain the “leakiness” of the RLSB-a mutants, if the residual RLSB mRNA and protein were in fact produced only from the non-mutated RLSB-b paralog. Distinguishing between these different mechanisms will require additional functional analysis of both RLSB-a and the newly identified RLSB-b paralog in wild-type and RLSB-b mutant maize leaves.
High microsynteny of its surrounding genomic regions (within eudicots, and between the grass subgenomes derived from an ancestral WGD), strong sequence conservation, and low copy number distinguish the RLSB gene family from the abundant and diverse PPR class of chloroplast RNA binding proteins. While RLSB-like genes occur only as single or few copies, there are more than 450 members of the extensive PPR gene family in higher plant genomes [1, 53]. These show many variations in sequence and function, with different members involved in RNA editing, transcript processing, and other RNA metabolic functions. In apparent contrast to the RLSB family, many PPR genes appear to show little or no synteny in their surrounding regions, even for PPR genes relatively closely linked on the same chromosomes. Although we have not characterized these data ourselves, this finding by others could suggest that PPR protein genes have commonly been subject to diversifying selection, resulting in multiple paralogs and orthologs that vary in function. In contrast, evidence presented here suggests that negative selection has preserved RLSB gene family sequences, limited their functional divergence, and maintained their microsynteny across a very wide range of plant species.
Nuclear genes encoding the unique plastid-targeted RLSB S1 domain rbcL-mRNA binding protein are present and expressed across a wide array of plant species. The highly conserved RLSB gene family appears to have originated very early in the evolution of land plants, possibly in a common ancestor of charophytes and higher plants. RLSB homologs have been maintained as single or duplicate copies in all land plant species, with conserved duplications of RLSB and its surrounding genomic regions distributed throughout the taxa, most notably in monocot grasses and sedges. Of the two paralogs in maize, only RLSB-a has been directly implicated in rbcL regulation. RLSB-b could have an overlapping function in the co-regulation of rbcL, or may have diverged to function as a regulator of one or more other plastid-encoded mRNAs. Taken together with previous findings [8, 9], this study provides strong evidence that RLSB-like genes have been conserved and sustained at low copy number throughout the course of land plant evolution. Evidence presented here provides strong support for the conservation of RLSB as a critical regulator of photosynthetic function as the evolutionary lineages of higher plants emerged and diversified from their ancient common ancestor. This study represents the most thorough evolutionary analysis of any member of the S1 class of nucleic acid binding proteins to date.
B, bundle sheath; LSU, Rubisco large subunit protein; M, mesophyll; MYA, million years ago; PET, photosynthetic electron transport; PPR, pentatricopeptide repeat; RLSB, rbcL RNA S1-binding domain protein; Rubisco, ribulose-1,5-bisphosphate carboxylase/oxygenase; SSU, Rubisco small subunit protein; UTR, untranslated region; WGD, whole genome duplication
We are grateful to Julian Hibberd and Sara Covshoff for sharing the Flaveria transcriptomes. We thank Jim Stamos for his help in preparing several of the figures. This work was supported by USDA/NRI Grant 2008–01070 and bridge funding from the University at Buffalo College of Arts and Sciences and the Department of Biological Sciences to J.O.B. Publication costs were supported by a University at Buffalo Graduate School Dissertation Fellowship to PY. The 1000 Plants (1KP) initiative, led by GKSW, is funded by the Alberta Ministry of Innovation and Advanced Education, Alberta Innovates Technology Futures (AITF), Innovates Centres of Research Excellence (iCORE), Musea Ventures, BGI-Shenzhen and China National Genebank (CNGB).
PY and VAA conceived the study. PY, MS, VAA, and RS collected the data, performed the analysis, and prepared the figures, under the guidance of TLS, RFS, JOB, and VAA. GKSW collected and provided the Plant 1KP transcriptome database sequences. All authors contributed to writing the article, and all authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Berry JO, Yerramsetty P, Zielinski AM, Mure C. Photosynthetic gene expression in higher plants. Photosynth Res. 2013;117:91–120.
- Jarvis P, López-Juez E. Biogenesis and homeostasis of chloroplasts and other plastids. Nat Rev Mol Cell Biol. 2013;14:787–802.
- Wise RR. The diversity of plastid form and function. In: Wise RR, Hoober JK, editors. The structure and function of plastids. Advances in photosynthesis and respiration, vol. 23. Dordrecht: Springer; 2006. p. 3–26.
- Barkan A. Expression of plastid genes: organelle-specific elaborations on a prokaryotic scaffold. Plant Physiol. 2011;155:1520–32.
- Raynaud C, Loiselay C, Wostrikoff K, Kuras R, Girard-Bascou J, Wollman FA, Choquet Y. Evidence for regulatory function of nucleus-encoded factors on mRNA stabilization and translation in the chloroplast. Proc Natl Acad Sci U S A. 2007;104:9093–8.
- Stern DB, Goldschmidt-Clermont M, Hanson MR. Chloroplast RNA metabolism. Annu Rev Plant Biol. 2010;61:125–55.
- Tillich M, Beick S, Schmitz-Linneweber C. Chloroplast RNA-binding proteins: repair and regulation of chloroplast transcripts. RNA Biol. 2010;7:172–8.
- Bowman SM, Patel M, Yerramsetty P, Mure CM, Zielinski AM, Bruenn JA, Berry JO. A novel RNA binding protein affects rbcL gene expression and is specific to bundle sheath chloroplasts in C4 plants. BMC Plant Biol. 2013;13:138.
- Rosnow J, Yerramsetty P, Berry JO, Okita TW, Edwards GE. Exploring mechanisms linked to differentiation and function of dimorphic chloroplasts in the single cell C4 species Bienertia sinuspersici. BMC Plant Biol. 2014;14:34.
- Patel M, Berry JO. Rubisco gene expression in C4 plants. J Exp Bot. 2008;59:1625–34.
- Berry JO, Mure CM, Yerramsetty P. Regulation of Rubisco gene expression in C4 plants. Curr Opin Plant Biol. 2016;31:23–8.
- Sawchuk MG, Donner TJ, Head P, Scarpella E. Unique and overlapping expression patterns among members of photosynthesis-associated nuclear gene families in Arabidopsis. Plant Physiol. 2008;148:1908–24.
- Hibberd JM, Covshoff S. The regulation of gene expression required for C4 photosynthesis. Annu Rev Plant Biol. 2010;61:181–207.
- Bycroft M, Hubbard TJP, Proctor M, Freund SMV, Murzin AG. The solution structure of the S1 RNA binding domain: a member of an ancient nucleic acid–binding fold. Cell. 1997;88:235–42.
- Vanneste K, Maere S, Van de Peer Y. Tangled up in two: a burst of genome duplications at the end of the Cretaceous and the consequences for plant evolution. Phil Trans R Soc B. 2014;369:20130353.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215:403–10.
- Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–402.
- Chang J-M, Di Tommaso P, Notredame C. TCS: a new multiple sequence alignment reliability measure to estimate alignment accuracy and improve phylogenetic tree reconstruction. Mol Biol Evol. 2014;31:1625–37.
- Jones DT, Taylor WR, Thornton JM. The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci. 1992;8:275–82.
- Lyons E, Freeling M. How to usefully compare homologous plant genes and chromosomes as DNA sequences. Plant J. 2008;53:661–73.
- Amborella Genome Project. The Amborella genome and the evolution of flowering plants. Science. 2013;342:1241089. doi:10.1126/science.1241089.
- Hileman LC, Drea S, Martino G, Litt A, Irish VF. Virus-induced gene silencing is an effective tool for assaying gene function in the basal eudicot Papaver somniferum (opium poppy). Plant J. 2005;44:334–41.
- Gross SM, Martin JA, Simpson J, Abraham-Juarez MJ, Wang Z, Visel A. De novo transcriptome assembly of drought tolerant CAM plants, Agave deserti and Agave tequilana. BMC Genomics. 2013;14:563.
- Turmel M, Otis C, Lemieux C. The chloroplast genome sequence of Chara vulgaris sheds new light into the closest green algal relatives of land plants. Mol Biol Evol. 2006;23:1324–38.
- Wodniok S, Brinkmann H, Glöckner G, Heidel AJ, Philippe H, Melkonian M, Becker B. Origin of land plants: do conjugating green algae hold the key? BMC Evol Biol. 2011;11:104.
- Timme RE, Bachvaroff TR, Delwiche CF. Broad phylogenomic sampling and the sister lineage of land plants. PLoS One. 2012;7:e29696.
- Wickett NJ, Mirarab S, Nguyen N, Warnow T, Carpenter E, et al. Phylotranscriptomic analysis of the origin and early diversification of land plants. Proc Natl Acad Sci U S A. 2014;111:E4859–68.
- Larimer FW, Chain P, Hauser L, Lamerdin J, Malfatti S, Do L, et al. Complete genome sequence of the metabolically versatile photosynthetic bacterium Rhodopseudomonas palustris. Nat Biotechnol. 2004;22:55–61.
- Abe K, Obana N, Nakamura K. Effects of depletion of RNA-binding protein Tex on the expression of toxin genes in Clostridium perfringens. Biosci Biotechnol Biochem. 2010;74:1564–71.
- He X, Thornton J, Carmicle-Davis S, McDaniel LS. Tex, a putative transcriptional accessory factor, is involved in pathogen fitness in Streptococcus pneumoniae. Microb Pathog. 2006;41:199–206.
- Johnson SJ, Close D, Robinson H, Vallet-Gely I, Dove SL, Hill CP. Crystal structure and RNA binding of the Tex protein from Pseudomonas aeruginosa. J Mol Biol. 2008;377:1460–73.
- Jiao Y, Li J, Tang H, Paterson AH. Integrated syntenic and phylogenomic analyses reveal an ancient genome duplication in monocots. Plant Cell. 2014;26:2792–802.
- Hughes TE, Langdale JA, Kelly S. The impact of widespread regulatory neo-functionalization on homeolog gene evolution following whole-genome duplication in maize. Genome Res. 2014;24:1348–55.
- Swigonova Z, Lai J, Ma J, Ramakrishna W, Llaca V, Bennetzen JL, Messing J. Close split of sorghum and maize genome progenitors. Genome Res. 2004;14:1916–23.
- Adams K, Cronn R, Percifield R, Wendel J. Genes duplicated by polyploidy show unequal contributions to the transcriptome and organ-specific reciprocal silencing. Proc Natl Acad Sci U S A. 2003;100:4649–54.
- Adams KL, Wendel JF. Polyploidy and genome evolution in plants. Curr Opin Plant Biol. 2005;8:135–41.
- Boivin K, Acarkan A, Mbulu R, Clarenz O, Schmidt R. The Arabidopsis genome sequence as a tool for genome analysis in Brassicaceae. A comparison of the Arabidopsis and Capsella rubella genomes. Plant Physiol. 2004;135:735–44.
- Ohno S. Evolution by gene duplication. New York: Springer; 1970.
- McGrath C, Lynch M. Evolutionary significance of whole-genome duplication. In: Soltis PS, Soltis DE, editors. Polyploidy and genome evolution. Berlin: Springer; 2012. p. 1–20.
- Han JH, Lee K, Jung S, Jeon Y, Pai HS, Kang H. A nuclear-encoded chloroplast-targeted S1 RNA-binding domain protein affects chloroplast rRNA processing and is crucial for the normal growth of Arabidopsis thaliana. Plant J. 2015;83:277–89.
- Yu HD, Yang XF, Chen ST, Wang YT, Li JK, Shen Q, Liu XL, Guo FQ. Downregulation of chloroplast RPS1 negatively modulates nuclear heat-responsive expression of HsfA2 and its target genes in Arabidopsis. PLoS Genet. 2012;8(5):e1002669. doi:10.1371/journal.pgen.1002669.
- Fox GE. Origin and evolution of the ribosome. Cold Spring Harb Perspect Biol. 2010;2(9):a003483.
- Satagopan S, Chan S, Perry LJ, Tabita FR. Structure-function studies with the unique hexameric form II ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) from Rhodopseudomonas palustris. J Biol Chem. 2014;289:21433–50.
- Rujan T, Martin W. How many genes in Arabidopsis come from cyanobacteria? An estimate from 386 protein phylogenies. Trends Genet. 2001;17:113–20.
- Yamaguchi K, Subramanian AR. Proteomic identification of all plastid-specific ribosomal proteins in higher plant chloroplast 30S ribosomal subunit PSRP-2 (U1A-type domains), PSRP-3a/b (ycf65 homologue) and PSRP-4 (Thx homologue). Eur J Biochem. 2003;270:190–205.
- Givens RM, Lin MH, Taylor DJ, Mechold U, Berry JO, Hernandez VJ. Inducible expression, enzymatic activity, and origin of higher plant homologues of bacterial RelA/SpoT stress proteins in Nicotiana tabacum. J Biol Chem. 2004;279:7495–504.
- Zimorski V, Ku C, Martin WF, Gould SB. Endosymbiotic theory for organelle origins. Curr Opin Microbiol. 2014;22:38–48.
- Soltis DE, Albert VA, Leebens-Mack J, Bell CD, Paterson AH, Zheng C, Sankoff D, dePamphilis CW, Wall PK, Soltis PS. Polyploidy and angiosperm diversification. Am J Bot. 2009;96:336–48.
- Wang X, Wang J, Jin D, Guo H, Lee T-H, Liu T, Paterson AH. Genome alignment spanning major Poaceae lineages reveals heterogeneous evolutionary rates and alters inferred dates for key evolutionary events. Mol Plant. 2015;8:885–98.
- Hahn MW. Distinguishing among evolutionary models for the maintenance of gene duplicates. J Hered. 2009;100:605–17.
- Conant GC, Birchler JA, Pires JC. Dosage, duplication, and diploidization: clarifying the interplay of multiple models for duplicate gene evolution over time. Curr Opin Plant Biol. 2014;19:91–8.
- Rogozin IB. Complexity of gene expression evolution after duplication: protein dosage rebalancing. Gen Res Int. 2014; Article ID 516518. 8 pages. doi:10.1155/2014/516508.
- Hayes ML, Mulligan RM. Pentatricopeptide repeat proteins constrain genome evolution in chloroplasts. Mol Biol Evol. 2011;28:2029–39.
- Geddy R, Brown GG. Genes encoding pentatricopeptide repeat (PPR) proteins are not conserved in location in plant genomes and may be subject to diversifying selection. BMC Genomics. 2007;8:130.