Molecular evolution accompanying functional divergence of duplicated genes along the plant starch biosynthesis pathway
- Odrade Nougué1, 2, 3,
- Jonathan Corbi4,
- Steven G Ball5,
- Domenica Manicacci†3 and
- Maud I Tenaillon†2, 6
© Nougué et al.; licensee BioMed Central Ltd. 2014
Received: 17 October 2013
Accepted: 2 May 2014
Published: 15 May 2014
Starch is the main source of carbon storage in the Archaeplastida. The starch biosynthesis pathway (sbp) emerged from cytosolic glycogen metabolism shortly after plastid endosymbiosis and was redirected to the plastid stroma during the green lineage divergence. The SBP is a complex network of genes, most of which are members of large multigene families. While some gene duplications occurred in the Archaeplastida ancestor, most were generated during the sbp redirection process, and the remaining few paralogs were generated through compartmentalization or tissue specialization during the evolution of the land plants. In the present study, we tested models of duplicated gene evolution in order to understand the evolutionary forces that have led to the development of SBP in angiosperms. We combined phylogenetic analyses and tests on the rates of evolution along branches emerging from major duplication events in six gene families encoding sbp enzymes.
We found evidence of positive selection along branches following cytosolic or plastidial specialization in two starch phosphorylases and identified numerous residues that exhibited changes in volume, polarity or charge. Starch synthases, branching and debranching enzymes functional specializations were also accompanied by accelerated evolution. However, none of the sites targeted by selection corresponded to known functional domains, catalytic or regulatory. Interestingly, among the 13 duplications tested, 7 exhibited evidence of positive selection in both branches emerging from the duplication, 2 in only one branch, and 4 in none of the branches.
The majority of duplications were followed by accelerated evolution targeting specific residues along both branches. This pattern was consistent with the optimization of the two sub-functions originally fulfilled by the ancestral gene before duplication. Our results thereby provide strong support to the so-called “Escape from Adaptive Conflict” (EAC) model. Because none of the residues targeted by selection occurred in characterized functional domains, we propose that enzyme specialization has occurred through subtle changes in affinity, activity or interaction with other enzymes in complex formation, while the basic function defined by the catalytic domain has been maintained.
KeywordsStarch enzymes Angiosperms Positive selection Paralogous genes Escape from Adaptive Conflict model Protein sequence evolution PAML
Living organisms store carbon as soluble (glycogen) or insoluble (starch) polysaccharides. Starch is a storage polysaccharide made of α-1,4-glucans with α-1,6 branches. It is composed of two polymer fractions: the moderately branched amylopectin, forming the semi-crystalline backbone of the starch granule [3, 4]; and amylose, a fraction with very few branches, which is embedded inside the amylopectin matrix. The branching pattern of amylopectin is distinctively asymmetrical, allowing for the close packing of intertwined chains into helical structures that crystallise and collapse by dehydration, hence forming the starch granule [5, 6].
Glycogen is widespread across archaea, eubacteria and eukaryotes [6, 7]. In contrast, starch is found mostly in lineages derived from primary plastid endosymbiosis: the Archaeplastida. Starch is also occasionally found in some unicellular marine diazotrophic cyanobacteria and several secondary endosymbiotic lineages [8–10]. A majority of the enzymes in the starch biosynthesis pathway (sbp) are derived from members of the eukaryotic glycogen metabolism pathway. A few enzymes however display a prokaryotic phylogenetic affiliation. Among them, ADP-glucose pyrophosphorylase and Granule Bound Starch Synthase I were acquired through endosymbiotic gene transfer from the plastid ancestor. Additionally, isoamylases and soluble starch synthases III-IV were transmitted by lateral gene transfer from intracellular chlamydiae pathogens [9–11]. Finally archaeplastidal pullulanases are distinctively polyphyletic and were acquired from diverse unidentified proteobacterial sources.
It is generally acknowledged that the cyanobacterial and eukaryotic pathways of storage polysaccharide merged during plastid endosymbiosis to generate an ancient cytosolic starch biosynthesis pathway. After or during metabolic transformation of the protoplastid into a true organelle, the Archaeplastida diverged into three lineages: the Glaucophyta (glaucophytes), the Rhodophyceae (red algae) and the Chloroplastida (green algae and land plants) [5, 11]. While the ancient cytosolic localization of storage polysaccharides was maintained in red algae and glaucophytes, the green lineage redirected the whole sbp to the plastid stroma as it diverged from the other Archaeplastida lineages.
Most genes involved in the sbp are members of multigenic families that have emerged from duplications during the complex Archaeplastida evolutionary history. Gene duplications in the sbp have occurred at different times during Archaeplastida evolution. Some of them had already occurred in the Archaeplastida ancestor prior to divergence of Chloroplastida from Rhodophyceae and Glaucophyta, and possibly prior to plastid endosymbiosis, while others have occurred after the divergence of the three Archaeplastida lineages. Within Chloroplastida, redirection of the sbp from the cytosol to the evolving chloroplast was facilitated by gene duplications. Finally, some duplications have occurred recently, such as the duplication of Granule Bound Starch Synthase (gbss) within the Poaceae family.
Interestingly, Ball and Morell  reviewed the evolutionary history of duplications in three gene families involved in the sbp (starch synthase enzymes, branching enzymes and debranching enzymes), and have shown no functional redundancy among paralogs. Similarly, Yan et al. reported 32 sbp genes in maize (Zea mays) and 27 in rice (Oryza sativa) and found that a substantial proportion of genes diverged in structure and/or expression pattern following whole genome duplications. Altogether, these results indicate that the sbp set up in plants is linked to the persistence of duplicated genes via functional specialization. The sbp set up therefore stands as an interesting framework to explore models of long-term persistence of duplicated genes and their contribution to pathway evolution.
Three models of persistence of duplicated genes are commonly encountered in the literature. The Duplication-Degeneration-Complementation (ddc) sub-functionalization model, first proposed by Force et al., posits that the paralogs evolve complementary sub-functions, overall maintaining the ancestral function. The Escape from Adaptive Conflict (eac) sub-functionalization model proposes the specialization of duplicated genes in two distinct functions originally fulfilled by the same ancestral gene [20, 21]. Under this model, duplication resolves a conflict residing in the incapacity of improving simultaneously the two functions because of detrimental pleiotropic effects [19, 22]. Finally, the neofunctionalization (neo-f) model postulates that one paralog is recruited to fulfil a new function while the other preserves the ancestral function. Variants of this last model have been proposed to resolve the so-called Ohno’s dilemma, i.e. loss of the neutrally evolving paralog before acquisition of a rare beneficial mutation. For instance, the Innovation-Amplification-Divergence (iad) model posits the evolution of a specialized enzyme from a progenitor enzyme displaying one or a range of promiscuous activities in addition to its primary function. These activities provide the substrate upon which natural selection can act and ultimately lead to functional divergence. In both the eac and iad models, several activities are assumed to exist prior to duplication. The specific resolution of Ohno’s dilemma by the iad model resides in the fact that changes in the ecological niche make one or several pre-existing minor activities valuable. Selection will increase promiscuous activities, allowing maintenance of the paralog during its neo-functionalization.
These models differ in terms of selective pressures and molecular evolution rate following gene duplication, the latter being generally estimated using ω, the ratio of non-synonymous (dN) over synonymous (dS) substitution rates. Hence, the ddc model predicts that the two paralogs evolve under selective neutrality (ω = 1) while accumulating complementary mutations leading to a loss of function. Under the eac model, both paralogs evolve under positive selection (ω > 1) allowing optimization of different sub-functions. In the neo-f and iad models, one paralog evolves under positive selection (ω > 1) as it is recruited for a new function or a previously neutral minor function, while the other paralog evolves under selective constraint (ω < 1) to preserve the ancestral function.
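The predictions listed above can be read as a simple decision rule on the two branches emerging from a duplication. The sketch below is illustrative only: the function name and the returned labels are assumptions introduced here, not part of the original analysis, and the absence of a detected signal is compatible with ddc but also with a lack of statistical power. The tally uses the counts reported in the abstract (13 duplications tested).

```python
def compatible_model(selection_a: bool, selection_b: bool) -> str:
    """Map positive-selection calls on the two post-duplication branches
    to the model(s) whose predictions they match (hypothetical helper)."""
    if selection_a and selection_b:
        return "EAC"            # both paralogs under positive selection
    if selection_a or selection_b:
        return "neo-F/IAD"      # one paralog selected, the other constrained
    return "DDC or undetected"  # no signal of positive selection

# Counts from the abstract: 7 duplications with selection on both
# branches, 2 on one branch only, 4 on neither branch.
calls = [(True, True)] * 7 + [(True, False)] * 2 + [(False, False)] * 4

tally = {}
for a, b in calls:
    label = compatible_model(a, b)
    tally[label] = tally.get(label, 0) + 1

print(tally)
```

The rule deliberately ignores ω estimates themselves and works only from the presence or absence of a significant test on each branch, which is the level at which the duplications are summarised in the text.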
The neo-f model has been clearly illustrated through several examples of gain of function after duplication, such as the gain of a new enzyme function in glucosinolate synthesis in Boechera, the acquisition of the glutamate dehydrogenase gene in human and apes, that of the alcohol dehydrogenase gene in Drosophila, and those of gonadal paralogs of the pig cytochrome gene (for review see Conant and Wolfe). iad illustrations come from the microbial literature, which offers several examples of new function evolution from the promiscuous activities of an ancestral enzyme (for a review see Soskine and Tawfik). The distinction between the two neo-functionalization models, neo-f and iad, is challenging because it requires knowledge of the promiscuous functions fulfilled by the ancestral gene before duplication. Similarly, distinguishing between the two sub-functionalization models, ddc and eac, is difficult because both models rely on the partitioning of the ancestral function. So far, most reported sub-functionalization cases have been interpreted in the light of the popular ddc model while the eac model has received little support in the literature. Des Marais and Rausher identified clear evidence of eac from signs of adaptive evolution on two dihydroflavonol reductase paralogs in Ipomoea and further suggested that evolution under the eac model may not be uncommon.
The fate of duplicated genes has been explored for one of the sbp enzymes among angiosperms, the ADP-Glucose Pyrophosphorylase (agpase) [29, 30]. In Archaeplastida, this protein is composed of two sub-units (one small and one large) encoded by paralogous genes originating from multiple duplication events. Patterns consistent with repeated sub-functionalization under the eac model have been described in the evolution of the agpase large sub-unit, leading to enzyme specialization for sink versus source tissues, as well as a particular agpase adaptation in grass endosperm . Contrastingly, the sequence of the small sub-unit paralogs revealed evidence neither of sub-functionalization nor positive selection during angiosperms evolution, in spite of numerous duplication events. The small sub-unit has evolved under strong constraints, preventing the acquisition of new or modified functions [29, 30]. Additionally, Corbi et al. revealed signs of coevolution among amino acid residues of the small sub-unit interaction domain  that likely also resulted from the strong evolutionary constraints placed on the agpase small sub-unit.
In the present study, we propose to extend the analysis of the evolutionary pattern following duplication events that occurred along the sbp evolution in angiosperms, to six gene families encoding sbp enzymes. We rely on phylogenetic approaches coupled with tests on the rates of evolution along branches and clades to assess selective processes that are responsible for the maintenance of paralogs along this pathway. More specifically, we compare observed patterns of evolution to those predicted by the ddc, the eac, and the neo-f/iad models. We discuss our results in the frame of the sbp evolution.
We studied the evolution of six gene families encoding enzymes of the sbp within angiosperms. For each family we identified paralogous genes maintained after duplication events that we matched to known compartmental or functional specialization. We further estimated the evolutionary rates in branches and clades emerging from such duplications to test whether accelerated evolution of paralogs has contributed to the evolution of this metabolic pathway. Patterns of evolutionary rates along branches were informative and provided support for distinct models of evolution of duplicated genes in the sbp pathway. In contrast, when performing pairwise comparisons of average ω values of clades emerging from gene duplication (data not shown), we found that all comparisons were significant (P-value < 7.35 × 10⁻¹⁶). Furthermore, ω values among clades varied between 0 and 0.674, consistent with purifying selection. Overall the clade model therefore did not detect positive selection and offered no power to discriminate among models. We therefore chose to focus primarily on the branch-site model in the presentation of our results.
Paralogs with cytosolic vs. plastidic specialization
Positive selection and constraint relaxation were observed in branches a and b for pgm (Table I – pgm). In branch D1a, about 9% of the sites were detected under positive selection with an ω value of 11.32 while in branch D1b, about 13% of the sites were detected under positive selection (ω = 7.37). The beb method identified 4 and 12 sites under positive selection in branches a and b, respectively (Figure 3B).
Among the 12 sites detected under positive selection in branch D1b (the branch leading to the cytosol-expressed paralog of the pgm enzyme), 4 of them (sites E119, A161, S249, R361) have a large posterior probability (above 0.99) to have evolved under positive selection. At position E119 (Figure 3B), a glutamic acid (E) was inferred in the ancestral sequence while, among plastidic paralogs, we found predominantly polar uncharged asparagine (N) or negatively charged aspartic (D) or glutamic (E) acids. In contrast, the paralogs expressed in the cytosol exhibited at this position a serine (S) or a threonine (T), two amino acids with polar uncharged side chains. At position A161 the ancestral sequence and the plastidic paralogs contained diverse residues with a predominance of proline (P) while only glutamic acid (E) was found in cytosolic paralogs. Residue S249 (Figure 3B) was a serine in all paralogs encoded by distinct codons, TCN for ancestral and plastidic sequences but AGY for cytosolic paralogs. Finally at R361, arginine (R) was the only residue found in the ancestral sequence and plastid-expressed paralogs, an amino acid with positively charged side chain, while the cytosolic paralogs carried two types of amino acids with hydrophobic side chains: valine (V) or isoleucine (I).
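The side-chain changes described above can be tabulated with a small lookup. This is a minimal sketch: the dictionary covers only the residues named in the paragraph, the grouping follows the usual textbook classes, and for each site a single representative residue is taken (for A161 the predominant proline stands in for the "diverse residues" of the ancestral and plastidic sequences). The helper name is hypothetical.

```python
# Textbook side-chain classes for the residues discussed in the text.
SIDE_CHAIN_CLASS = {
    "R": "positively charged",
    "D": "negatively charged",
    "E": "negatively charged",
    "N": "polar uncharged",
    "S": "polar uncharged",
    "T": "polar uncharged",
    "V": "hydrophobic",
    "I": "hydrophobic",
    "P": "special (proline)",
}

def class_change(ancestral: str, derived: str) -> str:
    """Describe how the side-chain class differs between two residues."""
    before = SIDE_CHAIN_CLASS[ancestral]
    after = SIDE_CHAIN_CLASS[derived]
    return "conserved class" if before == after else f"{before} -> {after}"

# (ancestral/plastidic residue, cytosolic residue), one representative each.
observed = {
    "E119": ("E", "S"),
    "A161": ("P", "E"),  # proline is only the predominant residue here
    "S249": ("S", "S"),  # same amino acid, different codons
    "R361": ("R", "V"),
}

for site, (anc, der) in observed.items():
    print(site, class_change(anc, der))
```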
Paralogs with functional specialization
To test whether gene duplications in the evolutionary histories of the starch synthase, branching and debranching enzymes were accompanied by functional specialization, we tested for variation of evolutionary rates in branches emerging from these duplications.
Starch synthase enzymes
For more recent duplications, posterior to angiosperm radiation, positive selection was detected only in branch D4b, with about 10% sites under positive selection and 4 sites detected using the beb method (Figure 5A). Note that the clade model revealed a higher ω value (0.286) in the clade emerging from the branch exhibiting accelerated evolution (D4b) than in the clade emerging from D4a (0.226).
The branching enzyme (hereafter be) gene phylogeny presented three clades of paralogs, be1, be2 and be3 (Figure 4B). While the origin of the be3 clade remains unclear (see Discussion), be1 and be2 arose selectively in the Chloroplastida as they diverged from the other Archaeplastida and the pathway was redirected to the plastids (D1, Figure 4B). Two additional duplications, one specific to the Poaceae (D2) and one specific to the Arabidopsis genus, arose within be2 (Figure 4B). Positive selection or relaxation of constraint was detected in both branches D1a and D1b, with about 9% and 13% sites having evolved under positive selection, respectively (Table I – be). Using the beb method we were able to highlight 17 and 32 sites in branches D1a and D1b, respectively (Figure 5B). The structure of the branching enzyme family and the presence of five catalytic sites included in conserved domains have been well described in Pisum sativum, Solanum tuberosum, Oryza sativa and Sorghum bicolor. However, none of the sites that we detected under selection matched these catalytic domains.
The debranching enzyme (hereafter dbe) gene phylogeny was composed of three clades of isoamylase genes (isa1, isa2 and isa3; Figure 4C) that arose from two duplication events (D1 and D2) prior to the angiosperm radiation. The evolutionary history of debranching enzyme genes is complex and has been very recently reviewed. The duplications depicted here occurred as the pathway of the emerging green lineage was progressively reconstructed in plastids.
Acceleration in evolution rate was detected on the D1b branch, with about 9% sites under positive selection (Table I – dbe) and 5 sites were significant using the beb method (Figure 5C). Consistent with patterns observed for GBSS, the clade emerging from the branch displaying evidence of accelerated evolution also exhibited the greatest ω value (0.187 versus 0.143). About 9% and 23% of sites were detected under positive selection in branches D2a and D2b, respectively (Table I – dbe), and 8 and 28 sites were detected with the beb method (Figure 5C).
The Poaceae albumen-specific ADP-Glucose transporter
Upon gene duplication, loss is the expected fate of the majority of paralogs. Evidence comes from the study of mutational effects of individual proteins. For instance, Jacquier et al. have shown that close to 50% of independent amino acid substitutions in a collection of 990 Escherichia coli mutants of the beta-lactamase TEM-1 exhibit deleterious effects as measured by a significant reduction of enzyme activity.
In this context, the starch biosynthetic pathway stands as an interesting example for studying the alternative fate, duplicated gene retention. Reconstruction of the sbp in the ancestor of Archaeplastida suggests that polysaccharide synthesis was ancestrally cytosolic, and then redirected to the plastid at the origin of the Chloroplastida. This change in protein addressing was clearly accompanied by numerous gene duplications leading to 32 and 27 genes involved in starch synthesis in maize and rice, respectively. Interestingly, along the evolution of the Chloroplastida, complexification of the sbp was accompanied by an increased rate of paralog retention with fewer genes in Chlorophyta and Bryophyta (other chlorobiontes, Figures 2, 4, 6) as compared to angiosperms (Monocots and Dicots, Figures 2, 4, 6). This diversification is particularly prominent at the end of the synthesis pathway (for starch synthases, branching and debranching enzymes), where functions are fulfilled by a myriad of paralogs. The maintenance of so many paralogs has been possible because of the concomitant enzyme specialization and suggests that duplications were followed by sub-functionalization or neo-functionalization. In the present paper, we explore the evolutionary fate of genes from 6 families encoding sbp enzymes and reveal patterns of selection that have accompanied major gene duplication events and paralog preservation in the starch biosynthetic pathway.
In the ancestor of Archaeplastida, as well as in extant glaucophytes and rhodophytes, the nucleotide-sugar substrates were used for chain elongation exclusively in the cytosol. Multiple rounds of duplications for both pgm and pho genes have subsequently occurred during the green lineage radiation, one of which has given rise to a cytosolic/plastidic specialization (duplications D1 in Figures 2A and 2B). It is currently unknown where the ancestral genes of pgm or pho were expressed in chlorophytes. We detected positive selection accompanying compartmental specialization for these two enzymes, i.e. along the two branches emerging from each duplication event (Figure 2). This pattern is compatible with the eac sub-functionalization model that assumes that both paralogs evolve under positive selection thereby improving two distinct functions or sub-functions initially fulfilled by the ancestral gene.
We detected numerous sites under positive selection in both pgm and pho. Two functional domains have been described in the pgm sequence: a catalytic domain and a metal ion binding domain. The first site of the catalytic domain and the last site of the metal ion-binding domain differed between plastidic- and cytosolic-expressed paralog sequences but were not found to have evolved under positive selection (Figure 3B). Additionally, none of the pgm sites under positive selection occurred in these functional domains (Figure 3B). However, the sites we detected may be good candidates for future functional studies and help to reveal still unknown domains. Similarly, sites detected under positive selection in pho genes, for which no functional region is described, could help in the identification of crucial regions in this protein.
Along the branch that gave rise to the pgm cytosolic paralogs (D1b), positive selection was detected at 4 sites and was accompanied by changes in amino acid residues (Figure 3B). At position R361, amino acid residues with hydrophobic side chains replaced amino acid residues with positively charged side chains. This has likely modified the tertiary structure of the protein and its function and/or allosteric regulation since proteins are usually more stable with hydrophobic residues in the internal part of proteins (to avoid hydrophilic contact) while positively charged (polar) amino acids are mostly found on the protein surface.
At position S249, all paralogs shared the same amino acid encoded however by distinct codons, TCN for plastidic paralogs and ancestral sequence but AGY for cytosolic paralogs. Such a pattern of changes that involves multiple mutations could be explained by an initial deleterious mutation compensated by selection on subsequent mutations that restored the identity of the S residue.
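The claim that the TCN-to-AGY switch involves multiple mutations can be checked directly against the standard genetic code. The sketch below, which assumes the standard nuclear code and uses variable names introduced here for illustration, computes the minimum number of nucleotide differences between the two serine codon families and lists the amino acids encoded by the one-step intermediates.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

tcn = [codon for codon in CODE if codon.startswith("TC")]  # TCT, TCC, TCA, TCG
agy = ["AGT", "AGC"]                                       # Y = T or C

def hamming(a: str, b: str) -> int:
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))

def neighbours(codon: str):
    """All codons reachable from `codon` by a single nucleotide change."""
    for i, base in product(range(3), BASES):
        if codon[i] != base:
            yield codon[:i] + base + codon[i + 1:]

# Fewest nucleotide changes needed to go from any TCN to any AGY codon.
min_steps = min(hamming(a, b) for a in tcn for b in agy)

# Codons lying halfway along the shortest TCN -> AGY paths.
intermediates = {
    mid: CODE[mid]
    for a in tcn for b in agy if hamming(a, b) == min_steps
    for mid in neighbours(a) if hamming(mid, b) == 1
}

print(min_steps)
print(intermediates)
```

Every shortest path requires two substitutions and passes through a codon that does not encode serine, which is consistent with the scenario of an initial amino acid change followed by a second mutation restoring the S residue.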
Phylogenies of starch synthesis enzymes (ss, be and dbe) in the land plants reveal several duplications that occurred before angiosperm radiation (Figure 4) and led to a diverse panel of specialized enzymes that together ensure the starch branching and debranching processes. In the dbe family (Figure 4C), positive selection was observed in branch D1b but not D1a, suggesting that neo-functionalization accompanied the Isa1 gene divergence. This result is in agreement with the existence of a distinct ancestral function for this enzyme. Indeed it is highly suspected that this enzyme emerged through duplication of a GlgX type of glucan hydrolase of chlamydial origin. In bacteria this enzyme displays a restricted substrate specificity in line with its function in glycogen catabolism. Isa1 has evolved both a novel substrate specificity allowing it to debranch longer chains and also most probably a novel quaternary organisation into either the Isa1 dimer or the Isa1/Isa2 heteromultimer. It is very likely that the ancestor of Isa2 and Isa3 genes maintained a GlgX-like function. Upon duplication Isa2 acquired a function as a scaffolding subunit of the complex heteromultimeric Isa1/Isa2 enzyme. It lost its catalytic function in this process, which correlates with longer branches in maximum likelihood phylogenetic trees. Isa3 on the other hand maintained some of GlgX restrictions with respect to substrate outer chain lengths but acquired the ability to accommodate debranching of a wider variety of branched oligosaccharides. Thus, the change in evolutionary rate we detected in both branches that gave rise to Isa2 and Isa3 (duplication D2) strongly suggests that positive selection rather than constraint relaxation drove their divergence toward distinct specialized functions.
Detection of selection and/or relaxation of constraints by comparing MA, MA0 and M1a models (H1 hypothesis/H0 hypothesis)
The branching enzymes be1 and be2 play different roles in the structure of amylopectin in storage organs [31, 46]. The be1 knockout mutants observed display no particular phenotype, except in maize [4, 16] where be1 appears to be required for starch mobilization during seed germination, and in Chlamydomonas where be1 mutants are defective for starch catabolism [7, 47, 48]. Additionally, the be1 paralog is absent from the A. thaliana genome, suggesting that it is not required for starch synthesis or mobilisation. In contrast, the be2 paralog has been more widely maintained, and plays a key role in starch synthesis [11, 16]. Unexpectedly, we found evidence of positive selection in the branches giving rise to both be1 and be2 (Figure 4B), suggesting that the be1 paralog ancestor has initially evolved toward a specialized function that has secondarily been maintained in some taxa and lost in others. It is tempting to correlate this specialization to specific aspects of starch degradation during seed (for plants) or zygote (for green algae) germination.
Former results suggest that be3 is not directly issued from gene duplication but rather from an ancestral gene that could have pre-existed in the cytosolic glycogen metabolism network of the common ancestor of Archaeplastida before plastid endosymbiosis and which was lost from many Archaeplastida. This may explain why the grouping of be3 with be1 and be2 is not well supported in our phylogeny (Figure 4B), making the study of selective pressures on the be3 ancestor irrelevant.
We detected several instances of positive selection accompanying compartmentalization or functional specializations along the two branches emerging from duplication events at various steps during the evolution of the starch biosynthesis pathway. Several processes may generate these patterns including a combination of the above-cited models. For instance, sub-functionalization followed by independent improvements of functions may appear as evidence for eac. We are also limited by the power of our analysis. Hence, positive selection may not be detected in a branch if the improvement of a pre-existing (major or promiscuous) function has been fulfilled by a single or very few mutations. Despite all these caveats, our study highlights a number of cases sustained by biological interpretations in favour of the eac sub-functionalization model. Our results thereby support its prominence along the evolutionary history of starch biosynthesis pathway.
In all multigene families studied here, none of the sites detected under positive selection matched known functional domains of the proteins. At the angiosperm level, enzymes encoded by a given multigene family share the same basic function. For example, in the starch synthase enzyme family, all enzymes catalyse the same reaction that binds two glucoses in α-1,4. Differences between those enzymes are therefore subtle and have to do with the affinity for/production of amylopectin chains with distinct length and solubility. In the sbp, important interactions between enzymes exist. For example, Tetlow and collaborators showed that in wheat, complex structures were formed through the association of be1, be2a and pho. Hence, complex formation and phosphorylation are required to activate be2a. Therefore, while the basic function (defined by the catalytic step) of every enzyme in a multigene family is constrained, function of the enzyme complex (defined by enzyme conformation and interaction with other enzymes during catalysis) may evolve after a duplication event. Our results suggest that new functions are generally acquired by mutations outside the highly conserved catalytic domains, most likely in regulatory domains and/or residues involved in changes in enzyme conformation/activity.
Sequence retrieval and alignment
We retrieved from the literature [11, 12, 14, 17] the available coding sequences for six genes representative of six families (reference sequences) encoding starch biosynthesis enzymes: ADP-glucose transporter [agt: NM_119392.3, XM_002439325.1, XM_002438594.1], phosphoglucomutase [pgm: NM_001160993.1, AJ242601.1], debranching enzymes [dbe: NM_129551.3, NM_100213.3, NM_116971.5], branching enzymes [be: EF122471.1, NM_129196.3, AK118785.1], starch synthase [gbss&ss: AY149948.1, NM_122336.4, NM_110984.2, NM_101044.2, NM_117934.4] and starch phosphorylase [pho: NM_114564.2, AY049235.1]. In order to retrieve angiosperm sequences available for each gene family, we used all reference sequences as queries against the ncbi databases (http://www.ncbi.nlm.nih.gov/sites/gquery) using tBlastx. Sequences sharing more than 85% identity with any of the reference sequences were retained, except for agt for which we used an identity threshold of 50% following previous work. In addition to angiosperm sequences, we also retrieved outgroup sequences from other chlorobiontes (Chlamydomonas reinhardtii, Micromonas pusilla, Ostreococcus tauri, Physcomitrella patens, Selaginella moellendorffii), rhodophytes (Cyanidioschyzon merolae, Galdieria sulphuraria, Gracilaria gracilis) and glaucophytes (Cyanophora paradoxa). Finally, we employed the same protocol using the BioCyc database (http://biocyc.org/) and a 60% identity threshold to retrieve cyanobacterial outgroup sequences from the Prochlorococcus marinus and the Synechococcus genome sequences.
In total, we retrieved 19, 23, 27, 31 and 23 angiosperm sequences plus 2, 7, 10, 8 and 14 outgroup (non-angiosperm) sequences for agt, pgm, dbe, be, gbss & ss and pho, respectively. The source and accession numbers of sequences analysed are indicated in Additional file 1. Protein sequence alignments were obtained using ClustalW in the bioedit software (http://www.mbio.ncsu.edu/bioedit/bioedit.html), followed by manual inspection. Poorly aligned regions (>50% gap in local alignment) and insertion-deletions were excluded from alignments, resulting in alignment lengths of 286, 285, 724, 628, 317 and 785 residues for agt, pgm, dbe, be, gbss & ss and pho, respectively.
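The gap-based trimming step can be approximated programmatically. The sketch below is an assumption-laden illustration rather than the procedure actually used: the text describes exclusion of regions with more than 50% gaps combined with manual inspection, whereas this hypothetical helper applies the threshold column by column to a list of equal-length aligned sequences.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5, gap="-"):
    """Return the alignment without columns whose gap fraction exceeds
    `max_gap_fraction` (column-wise rule, assumed for illustration)."""
    n_seq = len(alignment)
    keep = [
        i for i in range(len(alignment[0]))
        if sum(seq[i] == gap for seq in alignment) / n_seq <= max_gap_fraction
    ]
    return ["".join(seq[i] for i in keep) for seq in alignment]

# Toy alignment: the third column is 75% gaps and is removed,
# the second column is 25% gaps and is kept.
toy = ["MK-LV", "MK-LV", "M--LV", "MKALV"]
trimmed = drop_gappy_columns(toy)
print(trimmed)
```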
We used nucleotide sequences from the protein sequence alignments to build gene family phylogenies, except for agt and dbe for which we used protein alignments, due to greater divergence between sequences. We obtained phylogenies by the Maximum Likelihood (ml) method using the phyml software (http://www.atgc-montpellier.fr/phyml/). We employed the gtr (General Time Reversible) substitution model determined by modeltest (https://code.google.com/p/jmodeltest2/). We rooted phylogenies with cyanobacteria as outgroups for all enzymes except AGT for which we used Physcomitrella patens. Bootstrap supports were calculated using 500 replicates.
Detection of branches and residues deviating from neutral evolution
We checked our gene phylogenies against the known species phylogeny within each paralog. No tree incongruence with the species evolution was observed except for a few terminal branches whose nodes were poorly supported (low bootstrap values). In such cases, we left the nodes unresolved. Topologies used to test for evidence of deviation from neutral evolution along branches (named a and b) emerging from major duplications (D) were based on these phylogenies.
We first determined for each phylogeny the most parsimonious equilibrium codon frequency model using codonfreq (CF) in the Site model M0 of the codeml package (paml v.4). We compared 4 nested models of codon frequency – equal frequencies (0 parameters – CF0), frequencies deduced from average nucleotide frequencies (3 parameters – CF1), frequencies deduced from average nucleotide frequencies at each codon position (9 parameters – CF2), frequencies different for each codon (60 parameters – CF3) – using likelihood ratio tests (LRTs). We retained the CF0 model for pgm, pho, be (D1 and D2) and gbss&ss (D1, D2, D3, D4 and D5), the CF1 model for be (D3) and gbss&ss (D6) and the CF3 model for dbe (D1 and D2) and agt.
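A likelihood ratio test between two of these nested codon frequency models compares twice the log-likelihood difference to a chi-square distribution whose degrees of freedom equal the difference in parameter counts given above. The following is a minimal sketch using only the Python standard library; the function names are introduced here, and the log-likelihood values in the example are hypothetical placeholders, not results from the study.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable (integer df >= 1),
    computed from the closed-form series for the incomplete gamma function."""
    if x <= 0:
        return 1.0
    t = x / 2.0
    if df % 2 == 0:
        return math.exp(-t) * sum(t ** i / math.factorial(i) for i in range(df // 2))
    tail = sum(t ** (i - 0.5) / math.gamma(i + 0.5)
               for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(t)) + math.exp(-t) * tail

# Free parameters of each codon frequency model, as listed in the text.
N_PARAMS = {"CF0": 0, "CF1": 3, "CF2": 9, "CF3": 60}

def lrt(lnl_simple: float, lnl_complex: float, simple: str, complex_: str):
    """Return (statistic, degrees of freedom, p-value) for two nested models."""
    stat = 2.0 * (lnl_complex - lnl_simple)
    df = N_PARAMS[complex_] - N_PARAMS[simple]
    return stat, df, chi2_sf(stat, df)

# Hypothetical log-likelihoods, for illustration only.
stat, df, p = lrt(-5000.0, -4998.0, "CF0", "CF1")
print(stat, df, p)
```

With these placeholder values the more complex model would not be retained at the usual thresholds, which is the situation in which the simpler CF0 model is kept.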
Second, we used the codon frequency model previously determined to compute dS for all branches of the phylogenies under the nearly neutral site model (M1a) of the codeml program (PAML v.4). Note that all model names follow the PAML nomenclature. The M1a model allows the ratio ω to vary among sites. Because models of sequence evolution rely on the infinite-site assumption, we discarded saturated branches, i.e. branches likely bearing sites with multiple substitutions, for which dS could not be estimated by PAML.
Third, for the non-saturated target branches, we estimated the non-synonymous substitution rate (dN), the synonymous substitution rate (dS) and their ratio ω (dN/dS). In the branch-site model A (MA), ω varies among sites and branches, which allows estimating the proportion of sites evolving at contrasting rates along target branches (foreground branches) and background branches. Models were compared using LRTs as described by Yang. A significant LRT between the branch-site model A (MA) and the null branch-site model A (MA0) reveals signs of positive selection, while a significant difference between MA and the nearly neutral site model (M1a) can be interpreted as evidence for relaxation of constraint and/or positive selection. Finally, a significant LRT comparing MA0 to M1a indicates relaxed constraints.
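The logic of these three comparisons can be summarized in a short Python sketch. It assumes 1 degree of freedom for MA versus MA0 and for MA0 versus M1a, and 2 degrees of freedom for MA versus M1a; these values are not stated in the text and are assumptions here, as are the function names and the log-likelihoods. The α value of 0.01 is the threshold used in this study.

```python
import math


def lrt_pvalue(lnL_null, lnL_alt, df):
    """P-value of a likelihood ratio test using closed-form chi-square tails."""
    stat = max(0.0, 2.0 * (lnL_alt - lnL_null))
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")


def interpret_branch(lnL_M1a, lnL_MA0, lnL_MA, alpha=0.01):
    """List the interpretations supported by the three LRTs for one branch."""
    findings = []
    if lrt_pvalue(lnL_MA0, lnL_MA, df=1) < alpha:
        findings.append("positive selection (MA vs MA0)")
    if lrt_pvalue(lnL_M1a, lnL_MA, df=2) < alpha:
        findings.append("relaxed constraint and/or positive selection (MA vs M1a)")
    if lrt_pvalue(lnL_M1a, lnL_MA0, df=1) < alpha:
        findings.append("relaxed constraint (MA0 vs M1a)")
    return findings


# Hypothetical log-likelihoods for one foreground branch.
findings = interpret_branch(lnL_M1a=-8420.0, lnL_MA0=-8415.5, lnL_MA=-8409.0)
# A branch where the three models fit equally well supports no interpretation.
no_signal = interpret_branch(lnL_M1a=-8420.0, lnL_MA0=-8420.0, lnL_MA=-8420.0)
```

With the invented values above, all three tests are significant at the 0.01 level, whereas `no_signal` is an empty list.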
When the LRT was significant (using an α threshold of 0.01), the BEB (Bayes empirical Bayes) method was used to identify residues likely to have evolved under positive selection, based on a posterior probability threshold of 0.95. Consensus sequences were inferred simultaneously by PAML at each node of the phylogenies. We used the reconstructed ancestral sequences at target nodes to position residues.
To test for the long-term effect of selection after gene duplication, we used the Clade model C, which aims at detecting a difference in the rates of evolution between the two clades emerging from a target duplication. This model allows estimating the proportion of sites evolving at different ω ratios in the two clades, and is tested against model M1a using an LRT with 3 df.
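As a worked form of this last test, the LRT statistic and its decision rule can be written as follows. The notation (ln L for the maximized log-likelihood, CmC for Clade model C) and the use of the same 0.01 threshold as for the branch-site tests are assumptions made for this illustration.

```latex
\[
  2\Delta\ell \;=\; 2\left(\ln L_{\mathrm{CmC}} - \ln L_{\mathrm{M1a}}\right)
  \;\sim\; \chi^{2}_{3},
  \qquad
  \text{reject M1a at } \alpha = 0.01 \text{ if } 2\Delta\ell > 11.34 .
\]
```

Here 11.34 is the 0.99 quantile of the chi-square distribution with 3 degrees of freedom.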
Availability of supporting data
All alignments and phylogenies used in the present paper are available on the Dryad repository at doi:10.5061/dryad.vj7nr.
Abbreviations
- sbp : starch biosynthesis pathway
UDP glucose pyrophosphorylase
- fk :
- pgi :
- agt :
- pgm :
- dbe :
- be :
- gbss & ss : granule bound starch synthase and starch synthase
- pho :
- ddc :
- eac : Escape from Adaptive Conflict
- iad :
- neo-f :
Acknowledgements
We thank Brandon Gaut and members of the Gaut Lab for their hospitality during our stay at the Department of Ecology and Evolutionary Biology at U.C. Irvine, and for many useful comments and discussions. We are grateful to Jean-Louis Prioul for insightful discussions throughout this work. We also thank Douglas M. Eudy for proofreading, and the two reviewers for the interest they have shown in our study and for their constructive remarks. Financial support to ON was provided by the UMR Génétique Végétale.
References
- Han Y, Gasic K, Sun F, Xu M, Korban SS: A gene encoding starch branching enzyme I (SBEI) in apple (Malus x domestica, Rosaceae) and its phylogenetic relationship to Sbe genes from other angiosperms. Mol Phylogenet Evol. 2007, 43: 852-863. 10.1016/j.ympev.2006.09.001.
- Valdez HA, Busi MV, Wayllace NZ, Parisi G, Ugalde RA, Gomez-casati DF: Role of the N-Terminal starch-binding domains in the kinetic properties of Starch Synthase III from Arabidopsis thaliana. Biochemistry. 2008, 47: 3026-3032. 10.1021/bi702418h.
- Hannah LC, James M: The complexities of starch biosynthesis in cereal endosperms. Curr Opin Biotechnol. 2008, 19: 160-165. 10.1016/j.copbio.2008.02.013.
- Keeling PL, Myers AM: Biochemistry and genetics of starch synthesis. Annu Rev Food Sci Technol. 2010, 1: 271-303. 10.1146/annurev.food.102308.124214.
- Colleoni C: L’amidon: sa synthèse, sa mobilisation, son histoire évolutive. Cah Agric. 2009, 18: 315-322.
- Zeeman SC, Kossmann J, Smith AM: Starch: its metabolism, evolution, and biotechnological modification in plants. Annu Rev Plant Biol. 2010, 61: 209-234. 10.1146/annurev-arplant-042809-112301.
- Hennen-Bierwagen TA, Liu F, Marsh RS, Kim S, Gan Q, Tetlow IJ, Emes MJ, James MG, Myers AM: Starch biosynthetic enzymes from developing maize endosperm associate in multisubunit complexes. Plant Physiol. 2008, 146: 1892-1908. 10.1104/pp.108.116285.
- Dauvillée D, Deschamps P, Ral J-P, Plancke C, Putaux J-L, Devassine J, Durand-Terrasson A, Devin A, Ball SG: Genetic dissection of floridean starch synthesis in the cytosol of the model dinoflagellate Crypthecodinium cohnii. Proc Natl Acad Sci U S A. 2009, 106: 21126-21130. 10.1073/pnas.0907424106.
- Ball S, Colleoni C, Cenci U, Raj JN, Tirtiaux C: The evolution of glycogen and starch metabolism in eukaryotes gives molecular clues to understand the establishment of plastid endosymbiosis. J Exp Bot. 2011, 62: 1775-1801. 10.1093/jxb/erq411.
- Deschamps P, Colleoni C, Nakamura Y, Suzuki E, Putaux J-L, Buléon A, Haebel S, Ritte G, Steup M, Falcón LI, Moreira D, Löffelhardt W, Raj JN, Plancke C, d’Hulst C, Dauvillée D, Ball S: Metabolic symbiosis and the birth of the plant kingdom. Mol Biol Evol. 2008, 25: 536-548. 10.1093/molbev/msm280.
- Deschamps P, Moreau H, Worden AZ, Dauvillée D, Ball SG: Early gene duplication within chloroplastida and its correspondence with relocation of starch metabolism to chloroplasts. Genetics. 2008, 178: 2373-2387. 10.1534/genetics.108.087205.
- Patron NJ, Keeling PJ: Common evolutionary origin of starch biosynthetic enzymes in green and red algae. J Phycol. 2005, 41: 1131-1141. 10.1111/j.1529-8817.2005.00135.x.
- Ball SG, Subtil A, Bhattacharya D, Moustafa A, Weber APM, Gehre L, Colleoni C, Arias M-C, Cenci U, Dauvillee D: Metabolic effectors secreted by bacterial pathogens: essential facilitators of plastid endosymbiosis?. Plant Cell. 2013, 25: 1-15. 10.1105/tpc.113.250110.
- Comparot-Moss S, Denyer K: The evolution of the starch biosynthetic pathway in cereals and other grasses. J Exp Bot. 2009, 60: 2481-2492. 10.1093/jxb/erp141.
- Zhang J: Evolution by gene duplication: an update. Trends Ecol Evol. 2003, 18: 292-298. 10.1016/S0169-5347(03)00033-8.
- Ball SG, Morell MK: From bacterial glycogen to starch: understanding the biogenesis of the plant starch granule. Annu Rev Plant Biol. 2003, 54: 207-233. 10.1146/annurev.arplant.54.031902.134927.
- Yan H-B, Pan X-X, Jiang H-W, Wu G-J: Comparison of the starch synthesis genes between maize and rice: copies, chromosome location and expression divergence. Theor Appl Genet. 2009, 119: 815-825. 10.1007/s00122-009-1091-5.
- Force A, Lynch M, Pickett FB, Amores A, Yan YL, Postlethwait J: Preservation of duplicate genes by complementary, degenerative mutations. Genetics. 1999, 151: 1531-1545.
- Conant GC, Wolfe KH: Turning a hobby into a job: how duplicated genes find new functions. Nat Rev Genet. 2008, 9: 938-950. 10.1038/nrg2482.
- Hughes AL: The evolution of functionally novel proteins after gene duplication. Proc R Soc London Ser B Biol Sci. 1994, 256: 119-124. 10.1098/rspb.1994.0058.
- Des Marais DL, Rausher MD: Escape from adaptive conflict after duplication in an anthocyanin pathway gene. Nature. 2008, 454: 762-765.
- Innan H, Kondrashov F: The evolution of gene duplications: classifying and distinguishing between models. Nat Rev Genet. 2010, 11: 97-108.
- Ohno S: Evolution by Gene Duplication. 1970, Germany: Springer.
- Bergthorsson U, Andersson DI, Roth JR: Ohno’s dilemma: evolution of new genes under continuous selection. Proc Natl Acad Sci USA. 2007, 104: 17004-17009. 10.1073/pnas.0707158104.
- Bershtein S, Tawfik DS: Ohno’s model revisited: measuring the frequency of potentially adaptive mutations under various mutational drifts. Mol Biol Evol. 2008, 25: 2311-2318. 10.1093/molbev/msn174.
- Muse SV: Estimating synonymous and nonsynonymous substitution rates. Mol Biol Evol. 1994, 13: 105-114.
- Prasad KVSK, Song B-H, Olson-Manning C, Anderson JT, Lee C-R, Schranz ME, Windsor AJ, Clauss MJ, Manzaneda AJ, Naqvi I, Reichelt M, Gershenzon J, Rupasinghe SG, Schuler MA, Mitchell-Olds T: A gain-of-function polymorphism controlling complex traits and fitness in nature. Science. 2012, 337: 1081-1084. 10.1126/science.1221636.
- Soskine M, Tawfik DS: Mutational effects and the evolution of new protein functions. Nat Rev Genet. 2010, 11: 572-582. 10.1038/nrg2808.
- Corbi J, Dutheil JY, Damerval C, Tenaillon MI, Manicacci D: Accelerated evolution and coevolution drove the evolutionary history of AGPase sub-units during angiosperm radiation. Ann Bot. 2012, 109: 693-708. 10.1093/aob/mcr303.
- Georgelis N, Braun EL, Hannah LC: Duplications and functional divergence of ADP-glucose pyrophosphorylase genes in plants. BMC Evol Biol. 2008, 8: 232. 10.1186/1471-2148-8-232.
- Burton RA, Bewley JD, Smith AM, Bhattacharyya MK, Tatge H, Ring S, Bull V, Hamilton WD, Martin C: Starch branching enzymes belonging to distinct enzyme families are differentially expressed during pea embryo development. Plant J. 1995, 7: 3-15. 10.1046/j.1365-313X.1995.07010003.x.
- Khoshnoodi J, Blennow A, Ek B, Rask L, Larsson H: The multiple forms of starch-branching enzyme I in Solanum tuberosum. Eur J Biochem. 1996, 242: 148-155. 10.1111/j.1432-1033.1996.0148r.x.
- Vu NT, Shimada H, Kakuta Y, Nakashima T, Ida H, Omori T, Nishi A, Satoh H, Kimura M: Biochemical and crystallographic characterization of the starch Branching Enzyme I (BEI) from Oryza sativa L. Biosci Biotechnol Biochem. 2008, 72: 2858-2866. 10.1271/bbb.80325.
- Verlag F, Mutisya J, Sathish P, Sun C, Andersson L, Ahlandsberg S, Baguma Y, Palmqvist S, Odhiambo B, Åman P, Jansson C: Starch branching enzymes in sorghum (Sorghum bicolor) and barley (Hordeum vulgare): comparative analyses of enzyme structure and gene expression. J Plant Physiol. 2003, 160: 921-930. 10.1078/0176-1617-00960.
- Cenci U, Nitschke F, Steup M, Minassian BA, Colleoni C, Ball SG: Transition from glycogen to starch metabolism in Archaeplastida. Trends Plant Sci. 2013, 19: 1-11.
- Lynch M, Force A: The probability of duplicate gene preservation by subfunctionalization. Genetics. 2000, 154: 459-473.
- Jacquier H, Birgy A, le Nagard H, Mechulam Y, Schmitt E, Glodt J, Bercot B, Petit E, Poulain J, Barnaud G, Gros P-A, Tenaillon O: Capturing the mutational landscape of the beta-lactamase TEM-1. Proc Natl Acad Sci USA. 2013, 110: 13067-13072. 10.1073/pnas.1215206110.
- Manjunath S, Kenneth Lee C-H, van Winkle P, Bailey-Serres J: Molecular and biochemical characterization of cytosolic phosphoglucomutase in maize. Expression during development and in response to oxygen deprivation. Plant Physiol. 1998, 117: 997-1006. 10.1104/pp.117.3.997.
- Gao M, Wanat J, Stinard PS, James MG, Myers AM: Characterization of dull1, a maize gene coding for a novel starch synthase. Plant Cell. 1998, 10 (March): 399-412.
- Hennen-Bierwagen TA, Lin Q, Grimaud F, Planchot V, Keeling PL, James MG, Myers AM: Proteins from multiple metabolic pathways associate with starch biosynthetic enzymes in high molecular weight complexes: a model for regulation of carbon allocation in maize amyloplasts. Plant Physiol. 2009, 149: 1541-1559. 10.1104/pp.109.135293.
- Libessart N, Maddelein M-L, van den Koornhuyse N, Decq A, Delrue B, Mouille G, D’Hulst C, Ball S: Storage, Photosynthesis, and Growth: The conditional nature of mutations affecting starch synthesis and structure in Chlamydomonas. Plant Cell. 1995, 7: 1117-1127. 10.1105/tpc.7.8.1117.
- van de Wal M, D’Hulst C, Vincken J, Buléon A, Visser R, Ball S: Amylose is synthesized in vitro by extension of and cleavage from amylopectin. J Biol Chem. 1998, 273: 22232-22240. 10.1074/jbc.273.35.22232.
- Ball SG, van de Wal MHB, Visser RG: Progress in understanding the biosynthesis of amylose. Trends Plant Sci. 1998, 3: 462-467. 10.1016/S1360-1385(98)01342-9.
- van den Koornhuyse N, Libessart N, Delrue B, Zabawinski C, Decq A, Iglesias A, Carton A, Preiss J, Ball S: Control of starch composition and structure through substrate supply in the monocellular alga Chlamydomonas reinhardtii. J Biol Chem. 1996, 271: 16281-16287. 10.1074/jbc.271.27.16281.
- Miao H, Sun P, Liu W, Xu B, Jin Z: Identification of genes encoding granule-bound starch synthase involved in amylose metabolism in banana fruit. PLoS One. 2014, 9: e88077. 10.1371/journal.pone.0088077.
- Chang J-W, Li S-C, Shih Y-C, Wang R, Chung P-S, Ko Y-T: Molecular characterization of mungbean (Vigna radiata L.) starch branching enzyme I. J Agric Food Chem. 2010, 58: 10437-10444. 10.1021/jf102129f.
- Tunçay H, Findinier J, Duchêne T, Cogez V, Cousin C, Peltier G, Ball SG, Dauvillée D: A forward genetic approach in Chlamydomonas reinhardtii as a strategy for exploring starch catabolism. PLoS One. 2013, 8: e74763. 10.1371/journal.pone.0074763.
- Xia H, Yandeau-Nelson M, Thompson DB, Guiltinan MJ: Deficiency of maize starch-branching enzyme I results in altered starch fine structure, decreased digestibility and reduced coleoptile growth during germination. BMC Plant Biol. 2011, 11: 1-13. 10.1186/1471-2229-11-1.
- Tetlow IJ, Wait R, Lu Z, Akkasaeng R, Bowsher CG, Esposito S, Kosar-hashemi B, Morell MK, Emes MJ: Protein phosphorylation in amyloplasts regulates starch branching enzyme activity and protein-protein interactions. Plant Cell. 2004, 16: 694-708. 10.1105/tpc.017400.
- Hall TA: BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser. 1999, 41: 95-98.
- Guindon S, Lethiec F, Duroux P, Gascuel O: PHYML Online—a web server for fast maximum likelihood-based phylogenetic inference. Nucleic Acids Res. 2005, 33 (Web Server issue): W557-W559.
- Posada D, Crandall KA: MODELTEST: testing the model of DNA substitution. Bioinformatics. 1998, 14: 817-818. 10.1093/bioinformatics/14.9.817.
- Chase MW, Fay MF, Reveal JL, Soltis DE, Soltis PS,FP, Anderberg AA, Moore MJ, Olmstead RG, Rudall PJ,JK: An update of the angiosperm phylogeny group classification for the orders and families of flowering plants: APG III. Bot J Linn Soc. 2009, 161: 105-121.
- Yang Z: PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007, 24: 1586-1591. 10.1093/molbev/msm088.
- Nielsen R, Yang Z: Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics. 1998, 148: 929-936.
- Yang Z, Wong WSW, Nielsen R: Bayes empirical bayes inference of amino acid sites under positive selection. Mol Biol Evol. 2005, 22: 1107-1118. 10.1093/molbev/msi097.
- Yang Z: Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Mol Biol Evol. 1997, 15: 568-573.
- Yang Z, Nielsen R: Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol Biol Evol. 2002, 19: 908-917. 10.1093/oxfordjournals.molbev.a004148.
- Yang Z: Computational Molecular Evolution. 2006.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.