Structure and evolution of the plant cation diffusion facilitator family of ion transporters
© Gustin et al; licensee BioMed Central Ltd. 2011
Received: 28 September 2010
Accepted: 24 March 2011
Published: 24 March 2011
Members of the cation diffusion facilitator (CDF) family are integral membrane divalent cation transporters that transport metal ions out of the cytoplasm either into the extracellular space or into internal compartments such as the vacuole. The spectrum of cations known to be transported by proteins of the CDF family includes Zn, Fe, Co, Cd, and Mn. Members of this family have been identified in prokaryotes, eukaryotes, and archaea. In sequenced plant genomes, CDF families range in size from nine members in Selaginella moellendorffii to 19 members in Populus trichocarpa. Phylogenetic analysis suggests that the CDF family has expanded within plants, but a definitive plant CDF family phylogeny has not been constructed.
Representative CDF members were annotated from diverse genomes across the Viridiplantae and Rhodophyta lineages and used to identify phylogenetic relationships within the CDF family. Bayesian phylogenetic analysis of CDF amino acid sequence data supports organizing land plant CDF family sequences into 7 groups. The origin of the 7 groups predates the emergence of land plants. Among these, 5 of the 7 groups are likely to have originated at the base of the tree of life, and 2 of 7 groups appear to be derived from a duplication event prior to or coincident with land plant evolution. Within land plants, local expansion continues within select groups, while several groups are strictly maintained as one gene copy per genome.
Defining the CDF gene family phylogeny contributes to our understanding of this family in several ways. First, when embarking upon functional studies of the members, defining primary groups improves the predictive power of functional assignment of orthologous/paralogous genes and aids in hypothesis generation. Second, defining groups will allow group-specific sequence motifs to be generated that will help define future CDF family sequences and aid in functional motif identification, which is currently lacking for this family in plants. Third, the plant-specific expansion resulting in Groups 8 and 9 evolved coincident with the early primary radiation of plants onto land, suggesting these groups may have been important for early land colonization.
Members of the cation diffusion facilitator (CDF) family have been shown to be important for maintenance of cation homeostasis in bacteria, yeast, plants, and mammals (for detailed reviews see references [1–5]). CDF proteins, in general, bind cations such as Zn and remove them from the cytoplasm, either by sequestration into internal compartments or by efflux from the cell. This role in modulating cellular cation concentrations has been demonstrated to impact cation accumulation, cation tolerance, signal transduction cascades, oxidative stress resistance, and protein turnover [6–8].
Several research groups have analyzed the phylogenetic relationships of CDFs and found that this is an ancient gene family that pre-dates the origin of eukaryotes, as reflected in the grouping of sequences from diverse organisms within several branches of constructed phylogenetic trees. Plant CDF members, including 12 members from the sequenced genome of Arabidopsis thaliana, have been grouped into three or four lineages [2, 9, 10]. However, these analyses were limited by sequence availability due to the lack of sequenced genomes and available cDNA libraries, which resulted in incomplete or weakly supported hypotheses about CDF family phylogeny within plants.
Montanini et al. (2007) conducted a global phylogenetic analysis of 273 CDFs from prokaryotes, eukaryotes, and archaea. Based on a maximum parsimony analysis, variation across the gene family could be partitioned into three major groups, designated Zn-CDFs, Zn/Fe-CDFs, and Mn-CDFs based on the hypothesized or confirmed transported substrate of one or more group members. For example, the Mn-CDF group contains 59 sequences and, within this group, the plant members MTP8 and MTP11 have been characterized as Mn transporters. Using vastly expanded sequence information and substrate-defined groups, an updated CDF signature sequence was derived as well as group-specific signature sequences. The conserved residues comprising these signature sequences were targeted for amino acid substitution, and many were found to be critical for a fully functional protein. Recently, Migeon et al. (2010) expanded this analysis by incorporating CDF sequences from additional plant genomes with emphasis on phylogenetic and molecular characterization of metal transporters in Populus trichocarpa. This analysis confirmed partitioning the sequences into three major functional groups. Grouping the sequences by predicted substrate specificity provides a useful hypothesis-generation tool for uncharacterized proteins within these broad groupings. However, higher resolution analysis of plant-specific CDF sequences is likely to reveal informative relationships within the lineage of land plants.
With the generation of full genome sequences for multiple eukaryotic organisms, a wealth of information is available from which to generate detailed phylogenomic relationships of gene families within and between organisms. As genome sequences become available for more species, this "genomic" method of phylogenetic analysis should enable robust estimation of orthology and paralogy among related genes. This high-level resolution of familial evolution provides a powerful analytical tool from which to synthesize hypotheses about, among other things, the function of gene family members. The precision in functionally annotating an uncharacterized sequence based on sequence similarity to a characterized protein should increase if a detailed estimation of family phylogeny is known. Once a sufficiently detailed map of the gene family structure and evolution is constructed, a more global understanding of the adaptive significance of the family dynamics through the course of evolution may become clearer and lead to testable hypotheses about the roles members play in organismal evolution.
Genome sequencing of a red alga, Cyanidioschyzon merolae; the green algae Ostreococcus tauri, Ostreococcus lucimarinus, and Chlamydomonas reinhardtii; the basal nonvascular and vascular land plants Physcomitrella patens (P. patens) and Selaginella moellendorffii; and representatives of angiosperm lineages has been completed [15–25]. C. merolae is a non-motile unicellular red alga that lives in extreme environmental conditions, such as sulfate-rich hot springs, and is estimated to have diverged from the lineage leading to true plants (Viridiplantae) approximately 1.5 billion years ago. Ostreococcus species are the smallest known eukaryotic organisms and belong to the Prasinophyceae, an early diverging class in the lineage of the green algae [27–29]. The algal model, C. reinhardtii, is estimated to have shared a common ancestor with such species as A. thaliana 1.1 billion years ago. P. patens and S. moellendorffii represent early land plant lineages of Bryopsida and Lycopsida, respectively, which are estimated to have diverged from seed-bearing plants (Spermatophytes) approximately 480 million years ago (mya) and 400 mya, respectively [31–35]. Within the more recent lineages of flowering plants (angiosperms), several genomes have been sequenced, including the monocotyledonous genomes of Oryza sativa (O. sativa) and Sorghum bicolor (S. bicolor), and the eudicotyledonous genomes of A. thaliana, P. trichocarpa, and Medicago truncatula [15–19, 22]. The monocot lineage is predicted to have diverged from other angiosperms approximately 200 mya, and within eudicots, the A. thaliana and P. trichocarpa lineages are predicted to have diverged in the Eurosid clade approximately 120 mya [35–38]. Collectively, the genomes of the six land plants contain information that allows for comparison of genome evolution throughout the approximately 450 million year history of land plants, and inclusion of the genomes of red and green algae enables extension to 1.5 billion years of plant evolution.
In this study we conduct a detailed phylogenetic analysis of plant CDF family members to lay out a framework from which more informed hypotheses can be generated regarding the function of CDF proteins in plants.
Scanning the genomes of the taxonomically diverse set of organisms outlined in the introduction for CDF sequences identified or confirmed the following numbers of sequences: O. lucimarinus (1), O. tauri (2), C. merolae (3), C. reinhardtii (5), P. patens (11), S. moellendorffii (9), O. sativa (10), S. bicolor (9), P. trichocarpa (21), and A. thaliana (12) (Additional File 1). The number of CDF sequences identified from the C. reinhardtii, C. merolae, S. moellendorffii, P. patens, S. bicolor, and A. thaliana genomes agrees with previously published studies [2, 11, 12, 39]; however, the gene models may not be the same. The number of P. trichocarpa CDFs was expanded to 21 from the previous estimate of 19 (Additional File 1). The expanded set includes a predicted pseudogene, PtMTP8.4, and the previously unidentified PtMTP10.4. The number of CDF sequences in the O. sativa genome was expanded from 8 to 10 due to the inclusion of previously unidentified members OsMTP7 and OsMTP8.
CDF family members from Viridiplantae and Rhodophyta genomes were used to estimate the CDF family phylogeny in land plants. The CDF sequences form 7 groups (1, 5, 6, 7, 8, 9, and 12). Groups were defined as lineages that originated prior to or at the time of land plant evolution (Figure 1), and group nomenclature was assigned based on annotated CDF sequences from A. thaliana. Nomenclature for genes with prior annotations was kept [11, 12]. At least one sequence from each of the six land plant genomes included in this study was maintained in each of the seven groups. CDF members from the algae C. reinhardtii, C. merolae, O. lucimarinus, and O. tauri are present within 4 of the 7 groups (Figure 1). Maintenance and in some cases expansion of these genes suggests that the CDF members from each group play important roles in plants.
CrMTP1 contains multiple introns and the P. patens sequences, PpMTP1 and PpMTP1.1, contain one and two introns, respectively (Figure 2B). The remaining sequences primarily contain only one 5' intron, with a few exceptions. Through searches of public databases, transcript support has been identified for each of the Group 1 members, except for PtMTP2 and PpMTP1.1. Missing transcript data from P. trichocarpa and P. patens may be due to incomplete transcript catalogues of these plants. The transcriptional evidence suggests that the genes of Group 1 are largely expressed in a variety of plants and algae, providing further evidence of this group's general importance in plants.
Phylogenetic analysis indicates that the MTP1/2 sequences and MTP3 sequences share a common ancestor some time after the monocot/eudicot divergence (Figure 2B). At the time of duplication, MTP1/2 and MTP3 most likely shared identical, redundant function in that ancestor. The fate of the duplicated genes could take several different paths, including elimination, neofunctionalization, subfunctionalization, or even full/partial redundancy. The AtMTP1 and AtMTP3 DNA sequences share 67.7% sequence identity, and the proteins have similar predicted secondary structure with six transmembrane domains, cytoplasmically facing N-terminal and C-terminal ends, and a histidine-rich region [2, 41, 42]. Both proteins have been localized to the tonoplast membrane in yeast and plants, and both proteins have been shown to affect Zn and possibly Co tolerance and accumulation in yeast [41, 43–46]. However, the spatial, temporal, and responsive transcriptional regulation of each gene suggests that these proteins have different roles in plant Zn homeostasis. Evidence from an A. thaliana relative, Brassica juncea, suggests that BjMTP1 is expressed in secondary xylem parenchyma cells of the root while AtMTP3 is expressed in root epidermal and cortical cells [41, 47]. Also, AtMTP1 and BjMTP1 transcription is not regulated by Zn, while AtMTP3 is activated by elevated Zn influx [41, 43, 47]. Therefore, when MTP3 is expressed in conditions of high Zn or low Fe, accumulation of MTP3 and MTP1 could provide a continuous sequestration path in epidermal/cortical cell layers and xylem parenchyma cells, limiting Zn translocation to the shoot [5, 41]. Spatial expression patterns of MTP1 and MTP3 are also different in vegetative and inflorescent shoot tissues [41, 43, 47]. So, while the protein sequence, structure, location, and substrate(s) are very similar, the expression patterns of AtMTP1 and AtMTP3 are distinct. Therefore, maintenance in the genome of the originally duplicated genes may be attributed to neofunctionalization/subfunctionalization via changes in the expression patterns of the genes.
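The 67.7% identity figure quoted above depends on how identity is counted. As a minimal sketch only, the function below computes percent identity over two pre-aligned sequences, assuming that columns containing a gap in either sequence are excluded from the denominator. The function name, the gap convention, and the toy sequences are illustrative assumptions; the source does not state which convention or tool was used.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned, equal-length sequences.

    Assumption (not from the source): columns where either sequence
    has a gap are skipped and do not count toward the denominator.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (hypothetical, not real MTP sequences).
print(percent_identity("ATGC-A", "ATGATA"))
```

Other conventions (for example, dividing by the full alignment length or by the shorter sequence length) give different values for the same alignment, so a reported identity is only comparable when the convention is known.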
Additionally, the genome of A. thaliana maintains a more recent (<120 mya) duplication event yielding sequences AtMTP1 and AtMTP2 (Figure 2B). Comparing their gene expression metaprofiles across a database of microarrays indicates that they are not coexpressed (R² = 0.001), which suggests that these paralogs are not redundant.
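To make the coexpression statistic concrete, the sketch below computes a coefficient of determination between two expression profiles as the squared Pearson correlation. This is an assumption about how such an R² is typically obtained; the source does not specify the exact calculation, and the function name and the expression values are hypothetical.

```python
def r_squared(profile_x, profile_y):
    """Squared Pearson correlation between two expression profiles.

    Each profile is a list of expression values measured over the
    same ordered set of arrays or conditions.
    """
    n = len(profile_x)
    if n != len(profile_y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(profile_x) / n
    mean_y = sum(profile_y) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(profile_x, profile_y))
    sxx = sum((x - mean_x) ** 2 for x in profile_x)
    syy = sum((y - mean_y) ** 2 for y in profile_y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical expression values for two genes across four conditions.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [1.0, -1.0, -1.0, 1.0]
print(r_squared(gene_a, gene_b))
```

A value near 0, as reported for AtMTP1 and AtMTP2, means one profile explains almost none of the variance in the other; a value near 1 would indicate tightly coupled expression.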
The intron-exon boundaries largely support the evolutionary relationships of these sequences. In Group 8, two gene models, PtMTP8.4 and PpMTP8 (Figure 4), do not conform to a seven-exon gene structure. These loci have no associated ESTs, and when compared to their respective Group 8 sequences, both loci have large truncations of 5' regions that eliminate large portions of the cation efflux domains. This suggests that these loci are pseudogenes. Group 9 angiosperm sequences have very similar gene models (Figure 4). The exon boundaries of the S. moellendorffii and P. patens sequences deviate slightly from those defined in the angiosperms, but a clear six-exon pattern is evident for most Group 9 sequences.
Functional evidence for the role of Group 5 or Group 12 genes in plants is limited. The only functional data for these groups comes from the high-throughput ionomic phenotyping database in which diverse plant accessions are screened for ionomic profiles. Among the many mutant lines screened by this group was an EMS-induced mutation of AtMTP5. The ionomic profile of this mutant shows repeatable alterations in multiple ions in the mutant leaves, including reduced levels of Mo, Mn, and Mg and increased levels of K and Zn. These data suggest that AtMTP5 has a role in regulating ion concentrations in A. thaliana under normal conditions.
The Group 6 members are the only plant CDF sequences to fall into the Zn/Fe-CDF group, although no studies have been conducted on Group 6 plant family members to confirm this substrate specificity. The only functional data for this group comes from ionomic phenotyping. Profiling of an A. thaliana line with a homozygous T-DNA insertion into the coding region of AtMTP6 shows consistent, diverse alterations in the ionome, with reduced levels of Mg, Mo, and Ca and increased levels of Na, K, Mn, and Cd. The altered ion profile of the mtp6 mutant leaves suggests that Group 6 sequences are required for the maintenance of the plant ionome under normal conditions. The Group 7 sequences were not placed into any of the three substrate-specific groups, and no functional data are available for members of this group.
Studies in mammals, nematodes, yeast, bacteria, and plants suggest CDF proteins serve important roles in essential cation transport and homeostasis. There is also evidence supporting other, more complex, roles in these organisms, such as involvement in oxidative stress resistance, interactions in signal transduction cascades, and proper functioning of the endoplasmic reticulum. Within plants only four members have been functionally characterized to any degree, and these studies show the importance of each member in essential cation accumulation, partitioning, and tolerance. Using phylogenomic analysis of complete CDF families from genomes of multiple, taxonomically diverse plants and algae, the plant CDF family is organized into seven primary groups that were present in ancestral genomes prior to or coincident with the origin of land plants. Within land plants, gene copy number expansion continues within select groups, while several groups are strictly maintained as one gene copy per genome. Defining these CDF lineages contributes to the study of this family in four ways.
1) Defining within-group orthology/paralogy in particular genomes will help highlight potentially redundant genes. For example, the P. trichocarpa genome has six Group 1 members; however, these six sequences are actually three separate recent duplications of members in three different clades within Group 1 (Figure 2). This might predict that the protein products of the recently duplicated genes (i.e., PtMTP3.1 and PtMTP3.2) may have redundant function, but the inparalogs (PtMTP1 and PtMTP3.1) might not be redundant, but rather are subfunctionalized members similar to AtMTP1 and AtMTP3 (see discussion on Group 1, above).
2) Defining the primary groups improves the predictive power of functional assignment of orthologous/paralogous genes and aids in hypothesis generation when embarking upon functional studies of the members. For example, plant sequences from Groups 1, 5, 6, 7, and 12 are likely monophyletic lineages derived within ancestral prokaryotes and largely maintained in extant organisms. This suggests that comparisons with bacterial, archaeal, fungal, and mammalian homologues may be useful. Conversely, Group 8 and 9 lineages most likely result from a duplication of an ancestral Viridiplantae sequence. Therefore, sequences within at least one of these groups might have an altered functional role in plants as compared with the function of coorthologs in other organisms.
3) Defining groups will allow group-specific sequence motifs to be generated that will help define future CDF family sequences and aid in functional motif and critical residue identification in plants. A CDF family signature sequence was defined that identifies CDF family members with only a 5% false identification rate, but this sequence is quite elaborate. The necessarily complex signature sequence may reflect the constraints inherent in encompassing all CDF family members, including all variations within a diverse set of organisms. By focusing specifically on plant CDF members, the sequence variability due to host genome diversity will be reduced, leading to more accurate identification of group-specific sequence motifs and critical residues important in plant CDF proteins.
4) The plant-specific expansion resulting in Groups 8 and 9 evolved prior to or coincident with the early primary radiation of plants onto land. The primary Siluro-Devonian radiation of terrestrial plants necessitated development of physiological mechanisms that would allow pioneering plants to take advantage of new ecological niches on land. In terms of the CDF family, the expansion from five to seven primary groups prior to or coincident with the divergence of bryophytes from the vascular plant lineage suggests the CDF family expansion provided an adaptive advantage before significant vascular development occurred in early land plants.
Protein sequences from A. thaliana were obtained from the NCBI database http://www.ncbi.nlm.nih.gov/. Gene models and protein sequences from O. sativa ssp. japonica, P. trichocarpa, S. bicolor, and C. reinhardtii were identified from the Phytozome website http://www.phytozome.net/ using the tBLASTn algorithm with the twelve A. thaliana CDF protein sequences [54, 55]. The gene model for OsMTP11.1 used in this study was obtained from The Institute for Genomic Research (TIGR) website http://plantta.jcvi.org/ because the Phytozome gene model appears to be incorrect based on multiple sequence alignment with Group 9 sequences. Gene models of CDF family members from S. moellendorffii and P. patens were identified and annotated from the S. moellendorffii genome browser http://genome.jgi-psf.org/Selmo1/Selmo1.home.html and the P. patens resources website http://www.cosmoss.org/, respectively, through homology to A. thaliana CDF members by the tBLASTn algorithm. CDF family members from O. tauri, O. lucimarinus, and C. merolae were identified through homology to S. moellendorffii CDF members by tBLASTn searches of their respective genome assemblies located at the Department of Energy Joint Genome Institute (DOE JGI) http://www.jgi.doe.gov/ and the Cyanidioschyzon merolae Genome Project http://merolae.biol.s.u-tokyo.ac.jp/. Sequences from the genomes of M. acetivorans C2A, B. cereus ATCC 14579, N. punctiforme PCC 73102, E. histolytica HM-1:IMSS, D. melanogaster, C. elegans, S. cerevisiae, and H. sapiens used for the CDF superfamily analysis (Additional File 2) were retrieved from the GenBank database using previously published accession numbers. The CDF family members from P. aerophilum str. IM2, R. metallidurans CH34, T. crunogena XCL-2, and D. discoideum AX4 were identified from their respective sequenced genomes by tBLASTn using bacterial CDF sequences.
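The homology searches described above were run through web resources, and no score thresholds are reported. As an illustrative sketch only, the code below shows one way candidate loci might be collected from tBLASTn results saved in the standard 12-column BLAST tabular format, keeping the best-scoring query for each genomic subject sequence. The function name, the E-value cutoff of 1e-5, and the sample rows are assumptions made for the example, not values from this study.

```python
# Column names of the standard 12-column BLAST tabular output.
FIELDS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


def candidate_loci(tabular_text, max_evalue=1e-5):
    """Map each subject sequence to its best (query, evalue) hit.

    max_evalue is a hypothetical cutoff chosen for illustration.
    """
    best_hits = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        row = dict(zip(FIELDS, line.split("\t")))
        evalue = float(row["evalue"])
        if evalue > max_evalue:
            continue
        current = best_hits.get(row["sseqid"])
        if current is None or evalue < current[1]:
            best_hits[row["sseqid"]] = (row["qseqid"], evalue)
    return best_hits


# Made-up rows: two queries hit scaffold_1, one weak hit on scaffold_9.
sample = "\n".join([
    "AtMTP1\tscaffold_1\t62.1\t380\t120\t4\t10\t390\t5000\t6140\t1e-80\t250",
    "AtMTP8\tscaffold_1\t31.0\t200\t130\t6\t50\t250\t5100\t5700\t1e-12\t70",
    "AtMTP5\tscaffold_9\t25.0\t60\t45\t0\t1\t60\t100\t280\t0.5\t20",
])
print(candidate_loci(sample))
```

Hits collected this way are only candidates; in the study, gene models were then identified and annotated by comparison with known CDF members.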
Protein sequences were aligned with ClustalW using the Gonnet series weight matrix and default parameters http://workbench.sdsc.edu. Phylogenetic analysis was conducted with MrBayes (Bayesian inference of phylogeny) http://mrbayes.csit.fsu.edu/index.php with the amino acid model set to pr = mixed and lset rates = gamma [57, 58]. Two independent chains of Markov Chain Monte Carlo (MCMC) analysis were allowed to run until the standard deviation of the split frequencies was stable (ngen = 100,000-200,000; SumT PSRF = ~1.0). The output file was read into the Interactive Tree of Life (iTOL) tool http://itol.embl.de/ for visualization and editing. Node probability values (posterior probability values) below 0.8 are shown in the figures. To test the accuracy of the tree topologies generated by the ClustalW alignments and Bayesian analysis, each group was also subjected to alternative alignments by Muscle and mafft and to alternative phylogenetic analysis by maximum likelihood (ML) using phyML with LG and JTT substitution models and rate heterogeneity. In cases where the alternative algorithms indicated weaker branch support than the ClustalW/MrBayes predictions, the probability values of the alternative algorithms are included in the figures as posterior probability values with an asterisk. In large part, the alternative topologies agreed with those produced using ClustalW and MrBayes. One exception was Group 1. The topology of this group was sensitive to the method of alignment. Group 1 tree topologies generated by the Muscle and mafft alignments were consistent with each other and contradicted topologies predicted by ClustalW alignments at several branches. However, the group was not sensitive to phylogenetic model selection, as both MrBayes and phyML generated consistent topologies for a given alignment irrespective of the substitution model. Because the Muscle and mafft alignments produced consistent phylogenies, the Muscle alignment was used for the phylogenetic analysis in Figure 2.
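The convergence criterion mentioned above, a potential scale reduction factor (PSRF) close to 1.0, can be illustrated with the generic Gelman-Rubin calculation. The sketch below is a textbook version applied to one sampled parameter across independent chains; it is not MrBayes's own implementation, and the function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def potential_scale_reduction(chains):
    """Gelman-Rubin PSRF for one parameter sampled by several chains.

    chains: list of equal-length lists of sampled values, one per chain.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: mean within-chain variance
    between = n * variance(chain_means)            # B: between-chain variance
    if within == 0:
        raise ValueError("within-chain variance is zero")
    pooled = (n - 1) / n * within + between / n    # pooled variance estimate
    return sqrt(pooled / within)


# Hypothetical samples: chains exploring the same region vs. different regions.
agreeing = [[0.9, 1.1, 1.0, 1.2], [1.0, 1.2, 0.9, 1.1]]
disagreeing = [[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]]
print(potential_scale_reduction(agreeing))
print(potential_scale_reduction(disagreeing))
```

Values well above 1 indicate that the chains are still sampling different regions of parameter space, while values near 1 are consistent with (but do not prove) convergence.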
If annotations were lacking for plant CDF family members, annotations were given in accordance with the A. thaliana CDF family in most cases (Additional File 1) with the nomenclature model [1st letter of genus name][1st letter of species name]["MTP"][group number], for example, AtMTP1. In cases where one group contained multiple sequences from one plant, paralogous sequences are denoted with [group name][.n], where n is a number (1, 2, 3) that reflects sister lineage in cases where such predictions can be made. Established gene names were kept for A. thaliana and P. trichocarpa CDFs to maintain continuity between published studies. Changes to established annotations were recommended for C. reinhardtii to reflect each sequence's position in the phylogenetic tree. Additional File 1 lists the given names and the accession numbers used to identify the annotations in the given genome.
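The nomenclature model above can be written out as a small sketch. The helper names and the regular expression are illustrative assumptions; note also that established names were retained in some cases, so the number in a name does not always equal the group number (for example, AtMTP2 and AtMTP3 are discussed as Group 1 members).

```python
import re


def cdf_gene_name(genus, species, number, copy=None):
    """Compose a name following [genus initial][species initial]MTP[number][.n]."""
    name = f"{genus[0].upper()}{species[0].lower()}MTP{number}"
    if copy is not None:
        name += f".{copy}"  # paralog suffix, e.g. ".4"
    return name


# Two-letter species prefix, "MTP", a number, and an optional ".n" suffix.
NAME_PATTERN = re.compile(r"^([A-Z][a-z])MTP(\d+)(?:\.(\d+))?$")


def parse_cdf_gene_name(name):
    """Split a name such as 'PtMTP8.4' into (prefix, number, copy)."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognized CDF gene name: {name!r}")
    prefix, number, copy = match.groups()
    return prefix, int(number), int(copy) if copy is not None else None


print(cdf_gene_name("Arabidopsis", "thaliana", 1))
print(parse_cdf_gene_name("PtMTP8.4"))
```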
Arabidopsis thaliana (thale cress)
Oryza sativa (rice)
Sorghum bicolor (sorghum)
Populus trichocarpa (black cottonwood)
Chlamydomonas reinhardtii (green alga)
Selaginella moellendorffii (spike moss)
Physcomitrella patens (moss)
Ostreococcus tauri (phytoplankton)
Ostreococcus lucimarinus (phytoplankton)
Cyanidioschyzon merolae (red alga). Additional abbreviations for Figure 1 are as follows:
Pyrobaculum aerophilum str. IM2
Methanosarcina acetivorans C2A
Bacillus cereus ATCC 14579
Ralstonia metallidurans CH34
Thiomicrospira crunogena XCL-2
Nostoc punctiforme PCC 73102
Dictyostelium discoideum AX4 (slime mold)
Entamoeba histolytica HM-1:IMSS (amoeba)
Drosophila melanogaster (fruit fly)
Caenorhabditis elegans (nematode)
Homo sapiens (human)
Saccharomyces cerevisiae (baker's yeast).
This work was supported by grants to D.E.S. from the US National Science Foundation (0196310-IOS, 0129747-IOS and 0419695-IOS).
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.