Post-duplication charge evolution of phosphoglucose isomerases in teleost fishes through weak selection on many amino acid sites

Background The partitioning of ancestral functions among duplicated genes by neutral evolution, or subfunctionalization, has been considered the primary process for the evolution of novel proteins (neofunctionalization). Nonetheless, how a subfunctionalized protein can evolve into a more adaptive protein is poorly understood, mainly due to the limitations of current analytical methods, which can detect only strong selection for amino acid substitutions involved in adaptive molecular evolution. In this study, we employed a comparative evolutionary approach to this question, focusing on differences in the structural properties of a protein, specifically the electric charge, encoded by fish-specific duplicated phosphoglucose isomerase (Pgi) genes. Results Full-length cDNA cloning, RT-PCR based gene expression analyses, and comparative sequence analyses showed that after subfunctionalization with respect to the expression organ of duplicate Pgi genes, the net electric charge of the PGI-1 protein expressed mainly in internal tissues became more negative, and that of PGI-2 expressed mainly in muscular tissues became more positive. The difference in net protein charge was attributable not to specific amino acid sites but to the sum of various amino acid sites located on the surface of the PGI molecule. Conclusion This finding suggests that the surface charge evolution of PGI proteins was not driven by strong selection on individual amino acid sites leading to permanent fixation of a particular residue, but rather was driven by weak selection on a large number of amino acid sites and consequently by steady directional and/or purifying selection on the overall structural properties of the protein, which is derived from many modifiable sites. The mode of molecular evolution presented here may be relevant to various cases of adaptive modification in proteins, such as hydrophobic properties, molecular size, and electric charge.


Background
Proteins that arise through gene duplication can become novel proteins through fixation of beneficial mutations [1], but because beneficial mutations are generally rare, the partitioning of ancestral functions among duplicated genes by neutral evolution, or subfunctionalization, has been considered the primary process for the evolution of novel proteins [2][3][4][5]. To date, many duplicate genes have been demonstrated to evolve following this model of subfunctionalization, and this model thus has become widely accepted in the context of duplicated gene evolution [6][7][8][9][10].
Nonetheless, how a more adaptive or specialized protein property evolves after subfunctionalization is poorly understood, mainly due to the limited resolution power of current analytical methods, which seek to detect positive selection on individual amino acid substitutions involved in adaptive molecular evolution. Such methods can recognize substitutions expected to be driven by strong selection, many of which are usually function-altering substitutions at important amino acid sites, such as enzyme active sites or viral epitopes [e.g., [11]]. However, adaptive substitutions by relatively moderate or weak selection may not be recognized by these methods. Therefore, to detect adaptive protein evolution under a much wider range of selection pressure, a novel approach is required. In this study, we utilized a comparative evolutionary approach to this problem, focusing on differences in the high-dimensional properties of a protein, specifically the electric charge, encoded by a pair of duplicated genes.
As a model protein system for this study, we chose an important enzyme involved in glycolysis and gluconeogenesis, phosphoglucose isomerase (PGI; EC 5.3.1.9). The gene encoding this enzyme (Pgi) is present as a single copy in tetrapods, whereas two copies exist in most groups of ray-finned fishes [12][13][14]. The fact that these duplicated Pgi-1 and Pgi-2 genes in fishes are expressed in different organs [12,15] implies that these fish-specific duplicate Pgi genes are subfunctionalized with respect to their expression, and are thus good candidates for the model of subfunctionalized genes. For ray-finned fishes, a reliable phylogenetic framework, which is essential for comparative evolutionary analyses, is available due to recent progress in molecular phylogenetic studies [16][17][18][19][20][21]. In addition, the basal lineages of ray-finned fishes, including Semionotiformes (gar fish) and Amiiformes (amia), have only one Pgi locus [12]. This single-copy gene may be the direct descendant of the ancestral unduplicated Pgi in rayfinned fishes, and thus can be considered an appropriate outgroup gene for comparison between the duplicated Pgi genes.
The Pgi-1 and Pgi-2 genes, which were expected to be subfunctionalized in their expression, also differ in the net electric charge of their encoded proteins [12,13,15]. The electric charge of soluble proteins such as PGIs is a structural property brought by a large number of multiple amino acid residues and is involved in the adaptive evolution of several soluble proteins [e.g., [22][23][24]], such as an acquisition of protein thermostability [25][26][27]. Therefore, the evolution of electric charge in the duplicated PGI proteins is an interesting subject to investigate regarding the evolution of novel protein properties after subfunctionalization.
In this study, we examined first whether the spatial expression patterns of duplicated Pgi genes in ray-finned fishes are compatible with predictions based on the subfunctionalization model of duplicate gene evolution. Next, by focusing on the electric charges of the PGI proteins, we analyzed the underlying evolutionary process producing novel protein properties after gene duplication through ancestral sequence inference using the maximum likelihood (ML) method based on a reliable phylogenetic framework of ray-finned fishes, and also using threedimensional (3-D) structural information on the protein.

Duplication and subfunctionalization of the Pgi genes in teleost fishes
Our molecular phylogenetic analyses of vertebrate Pgi genes showed that the Pgi-1 and Pgi-2 genes in teleost fishes resulted from a gene duplication event that occurred before the radiation of teleosts but after the separation of basal non-teleost ray-finned fishes ( Figure 1A, arrow). The Pgi duplication appears to have derived from the ancient teleost genome duplication [28][29][30] because the phylogenetic position of the Pgi duplication confirmed here is the same as that of the estimated teleostspecific genome duplication event [31,32], and gene content around the Pgi locus in the human genome is partly conserved in the corresponding regions of both zebrafish Pgi loci (on chromosomes 13 and 25) [see Additional file 1: Fig. S1], which rules out the possibility of a tandem or single-gene duplication of the Pgi in teleosts. This condition allows us to study the duplicated Pgi genes without considering interlocus concerted evolution, which complicates the analysis of functional divergence in duplicated genes.
A reverse transcriptase-polymerase chain reaction (RT-PCR)-based expression analysis showed that the Pgi gene in non-teleost ray-finned fishes was expressed in all tissues examined ( Figure 1B), confirming that this gene is the direct descendant of the ancestral unduplicated Pgi with no tissue specificity, as in tetrapods [12,13,15]. In contrast, the teleost Pgi-1 gene was expressed mainly in internal organs, including the liver, heart, gill, brain, and kidney, and weakly in the muscle, whereas Pgi-2 was expressed mainly in the heart and muscle. The differential expression patterns of Pgi-1 and Pgi-2 support the concept of subfunctionalization [2,3], which is the complementary loss of subsets in the expression organs of the ancestral gene. Thus, the Pgi gene family in ray-finned fishes is an appropriate model for studying molecular evolution after subfunctionalization.
Between PGI-1 and PGI-2, 76 amino acid sites differed by the presence or absence of hydrophilic charged residues [Lys (K), Arg (R), Asp (D), and Glu (E)], which mainly contribute to net protein charge ( Figure 2B). These sites were not fixed for a unique charge state in the examined PGI-1 or PGI-2 proteins, except at position 294 (Gln in PGI-1 and Lys in PGI-2). Furthermore, only a few unique charged sites were shared among two or more genealogically related isoforms: five in PGI-1 (positions 27, 61, 78, 199, and 454) and two in PGI-2 (positions 17 and 226). These observations imply that very few specific amino acid residues were acquired in the early ancestral proteins and involved in the differences in electric charges between current PGI-1 and PGI-2.
The underlying process of the electric charge evolution of the PGI proteins can be inferred by ML sequence reconstructions [34] based on the recent ray-finned fish phylogeny [18]. We applied this approach and show the results in Figure 2A. Our results suggest that after gene duplication, pI values in the PGI-1 clade gradually decreased, whereas those in the PGI-2 clade increased. Next, we assigned charge-changing substitutions (between Lys/Arg or Asp/Glu and other residues) to the tree branches based on pairwise sequence comparisons among the inferred ancestral sequences or between the extant and ancestral sequences along the tree topology. This result of assignment ( Figure 2C) showed that the charge-changing substitutions were inferred to have occurred in excess (5-28) of that expected (3)(4)(5) for parsimonious evolution in electric charge differences between PGI-1 and PGI-2 ( Figure 2C; see also Table 1). Figure 2C also shows that the chargechanging substitutions have occurred in both directions (either upward or downward) on most branches at various amino acid sites (76 sites shown in Figure 2B) [see Additional file 1: Table S1]. An analysis using parsimony yielded similar results [see Additional file 1: Fig. S3].

Statistical analyses of the spatial clustering of inferred amino acid substitutions
Based on 3-D structural information on the PGI protein molecule, we further examined whether the inferred charge-changing substitutions were actually involved in the evolution of electric charge. The results of this analysis on the inferred substitution sites and number of substitutions are shown in Figure 3A and 3B, respectively. Figure  3A shows that the inferred charge-changing substitution Molecular phylogeny and spatial expression patterns of Pgi In cDNA clones, only one Pgi was identified from non-teleosts, whereas two Pgi genes were identified from teleosts. The two Pgi genes differed by about 20% in amino acid sequence, and were grouped into separate clades (Pgi-1 and Pgi-2). In both clades, the gene relationships were consistent with the evolutionary relationships of teleost species [18,19,21]. (B) Partial-length gel images of the RT-PCR expression analysis of Pgi genes and positive control (β-actin) genes in ray-finned fishes. The tree in the left panel shows the relationships among the Pgi genes inferred in this study. The black circle on the tree denotes the timing of the Pgi gene duplication event. Letters indicate tissues: M, muscle; L, liver; H, heart; Gi, gill; B, brain; K, kidney. Full-length gels, including negative controls and size markers, are presented in Additional file 1: Fig. S5 sites after the Pgi gene duplication (colored in magenta) were concentrated at the surface of the PGI molecule, in contrast to the inferred charge-neutral substitution sites (colored in dark gray) that contribute little or nothing to net protein charge. The inferred number of charge-changing and charge-neutral substitutions that can potentially occur at identical sites also followed the same trend (Figure 3B).
However, because water-soluble proteins such as PGI are generally surrounded by a hydrophilic shell containing a high density of polar residues, it is natural to expect the charge-changing substitutions to occur more frequently on the surface without any selection. Considering this expected mutation bias, further analysis was performed (  [35,36]. However, what is most important in this table is that the proportion of charge-changing

Discussion
The results of phylogenetic analysis, RT-PCR-based expression analysis, and sequence comparison of Pgi genes in teleost fishes suggest that after subfunctionalization of the duplicated Pgi genes in the ancestor of teleost fishes, the electric charges of the PGI-1 and PGI-2 proteins diverged. This evolution can be interpreted according to the sub-neofunctionalization model of gene evolution [2][3][4][5], which proposes that the partitioning of function between the duplicated genes alters the selective environment at each locus, resulting in structural fine-tuning or adaptation of the encoded proteins by positive selection. That is, the divergent evolution of the electric charges in the duplicated PGI isoforms is the consequence of specialization for the specific function (glycolysis or gluconeogenesis) or distinct cellular environment of tissues where each isoform is predominantly expressed (see Figure 1B), as suggested for other water-soluble proteins [22][23][24].
The present comparative evolutionary analysis implies that since the gene duplication event, the electric charges of the two PGI isoforms changed steadily through many charge-changing substitutions in both directions of charge change; only a few charged amino acid sites were specific to PGI-1 or PGI-2 ( Figure 2). Such charge-changing substitutions concentrated at the surface of PGI molecule (Figure 3) were inferred to have occurred much more frequently than expected in the parsimonious evolution of electric charge difference between the two isoforms (see Figure 2C and Table 1). From these observations, two pos-

Figure 2 Current states and inferred evolutionary process of electric charge change in PGI isoforms. (A) Maximum likelihood tree of
Pgi genes in ray-finned fishes inferred by BASEML [34] with a known phylogeny [18]. Numbers indicate estimated pI. Arrow denotes a gene duplication event. (B) Amino acid sites that differ by the presence or absence of hydrophilic charged residues between current PGI-1 and PGI-2. Positively charged residues are blue; negatively charged residues, red; other residues, light gray. The numbers above refer to the amino acid positions of PGI [33]. The stars below indicate sites located on the molecular surface. (C) Inferred charge-changing substitution events mapped over the PGI phylogeny. Orange and brown bars denote upward and downward charge changes, respectively.
sible scenarios are proposed for the evolution of protein charge in duplicated PGI isoforms: protein charges in PGI-1 decreased gradually, while those in PGI-2 increased; alternatively, charge divergence between PGI-1 and PGI-2 was completed soon after the duplication and before the radiation of teleosts, followed by maintenance of the protein charges under purifying selection, while stochastic charge-changing substitutions by drift occurred among lineages. In either scenario, we can conclude that the surface charge evolution of PGI proteins was not driven by strong selection on individual amino acid sites leading to permanent fixation of a particular residue, but rather was driven by weak selection on a large number of amino acid sites and consequently by steady directional or purifying selection on the overall structural properties of the protein, which is derived from many modifiable sites. This mode of molecular evolution agrees with the understanding that most proteins are substantially tolerant of a broad spectrum of substitutions and thus may harbor many amino acid sites available for evolutionary modification [37]. Our study provides the first plausible evidence of adaptive protein evolution through such selection.  The mode of molecular evolution proposed in this study would be difficult to find using existing methods that detect strong selection for particular substitutions. We applied such an analysis to identify positively selected sites after Pgi gene duplication using DIVERGE version 1.04 [38]; however, the results were not clear (data not shown). Further analysis using the program CODEML [34] did not detect the acceleration of the rate of nonsynonymous substitution, not showing selection for amino acid changes (estimated ω was 0.01 to 0.23). Even if, in general, a significant excess of amino acid change is detected, such methodology itself cannot rule out possible confounding effects, or alternative interpretations, particularly the relax of purifying selection [39]. In a previous study, an analysis using the program HonNew [40] also failed to detect selection for charge-changing substitutions in teleost PGIs, leading to the conclusion that the charge change in the duplicate PGIs of teleosts may be selectively neutral [13]. The mode of molecular evolution presented here, in which diverse evolutionary resolutions exist at the level of a primary sequence that corresponds to a certain selective pressure on a protein property, may be relevant to various cases of adaptive modification in proteins, such as hydrophobic properties, molecular size, and electric charge. This may be an important pathway underlying physiological adaptation, along with protein evolution by simple amino-acid changes, gene deletion or silencing, and possibly cis-regulatory changes [39,41].

Conclusion
In this paper, we provide the evidence that relatively weak selection on a large number of amino acid sites drives the evolution of novel charge-state of duplicated phosphoglucose isomerases, which are subfunctionalized in teleost fishes. Such mode of adaptive molecular evolution, which was hardly recognizable by existing analytical methods aiming to detect strong selection on individual amino acid changes, may play a substantial role in the evolution of novel proteins.

Taxonomic sampling
Our data set contains representatives from divergent lineages of ray-finned fishes, as follows -basal non-teleost ray-finned fishes: Polypterus ornatipinnis (bichir), Acipenser ruthenus (sturgeon), Amia calva (amia), and Lepisosteus osseus (gar); teleosts: Osteoglossum bicirrhosum (arowana) and Anguilla anguilla (eel) from basal groups, Plecoglossus altivelis (smelt) and Danio rerio (zebrafish) from intermediate groups, and Mugil cephalus (mullet) and Fugu rubripes (fugu) from derived groups. Live specimens, which were obtained either from local shops or other investigators in Japan, were treated according to the ethical recommendations of the Ichthyological Society of Japan and the University of Tokyo.

Cloning and sequencing
Total RNA was extracted from fresh skeletal muscle and liver tissue using TRIzol reagent (Invitrogen) and reversetranscribed into first-strand cDNA with oligo-dT adaptor primer using an RNA PCR kit (TaKaRa). Partial Pgi cDNA was amplified using PCR with vertebrate universal degenerate primers [13]. The well amplified DNA fragments were purified using a MinElute gel extraction kit (Qiagen), ligated into the pGEM-T Easy Vector system (Promega), transmitted into competent E. coli (Competent High DH5a, Toyobo), and sequenced with an ABI PRISM 3100 (Applied Biosystems) using T7 or SP6 primers. The partial Pgi sequences were used to design gene-specific primers (GSPs) for RACE PCR [see Additional file 1: Table S2]; 3' RACE PCR was conducted with the sense GSP and M13 primer M4 (TaKaRa) and the first-strand cDNA as the template. For 5' RACE, double-stranded cDNA PCR libraries were synthesized from 1 µg of total RNA using the cDNA synthesis kit (M-MLV version; TaKaRa) combined with the cDNA PCR library kit (TaKaRa). Then, 5' RACE PCR was conducted with the antisense GSP and CA primer (TaKaRa). Subcloning and sequencing were performed as above.

Phylogenetic analysis
The Pgi genes from 20 vertebrates were phylogenetically analyzed with the Bayesian and ML methods using the programs MrBayes 3.0B4 [42] and PAUP 4.0b10 [43], , Bufo melanostictus (toad; AJ306397), and Paramyxine yangi (hagfish; AJ306391). Newly cloned sequences in this study (marked with asterisks) were named under the denomination of PGI isozymes [12]. Bayesian and ML trees were constructed under the GTR + I + Γ model [44], which was selected as the best-fitting model of nucleotide substitution by hierarchical likelihood ratio tests (hLRTs) [45,46] with 1100 base pairs (bp) of the Pgi coding region (excluding the third codon position) [see Additional file 1: Fig. S4]. The Bayesian posterior probabilities of the phylogeny and its branches were determined from 9901 trees. Support for heuristic ML analysis was assessed using 100 bootstrap replications.

Synteny analysis
The genomic regions around the Pgi locus (or loci) in the human, chicken, and zebrafish genomes were investigated and compared. Genomic data from the pufferfishes Fugu rubripes and Tetraodon nigroviridis were not useful in this analysis because the locations of their Pgi loci were not determined. Data on the neighborhood of the Pgi locus in the human and chicken genomes were obtained from the NCBI Mapviewer Web site [47]. Twenty-seven proteincoding genes were identified around the human PGI locus, within a 1.8-Mb-long region on chromosome 19. The nucleotide sequences of these human genes were subjected to BLASTN searches against the zebrafish genome sequences using the Ensembl BLASTN search service [48]. The matches detected with an E-value threshold of <10 -3 were checked visually. Then, we selected identifiable genes described as putative orthologs of the queries. Their genomic location data were used to rebuild the synteny maps around the zebrafish Pgi loci.

Gene expression analysis
RT-PCR was performed for expression analysis of the Pgi genes. The primers used are described in [see Additional file 1: Table S3]. They were designed as follows: to distinguish the duplicate Pgi loci in teleosts, the 3' region of one primer from each primer pair was made to locate the differential nucleotide site between the two loci of the species concerned, and to avoid false amplification from genomic DNA contaminants, each primer pair was designed to span a Pgi exon/intron boundary considered conservative among vertebrates. Total RNA was extracted from liver, skeletal muscle, heart, gill filament, brain, and kidney (or gonad) tissues of fresh fish samples. RNA extraction, reverse-transcription into first-strand cDNA and PCR were performed in the same manner as mentioned in the Cloning and sequencing section. The thermalcycle profile was as follows: 1 cycle at 94°C for 2 min; 30 cycles at 94°C for 30 sec, 60°C for 30 sec, and 72°C for 30 sec; followed by 1 cycle at 72°C for 7 min. As a positive control for gene expression, β-actin cDNA was amplified using the primers 5'-GACATGGAGAAGATCTGGCA-3' and 5'-TGATCCACATCTGCTGGAAGGT-3' (predicted product size = 834 bp), which were designed by Dr. Kaoru Kuriiwa of the National Museum of Nature and Science, Tokyo. These primer sequences were based on a highly conserved region of the β-actin gene in mangrove killifish, Rivulus marmoratus (GenBank accession number AF168615). The amplified DNA fragments were separated on a 2.0% L03 agarose gel (TaKaRa), stained with ethidium bromide, and visualized under UV light. GeneRuler™ 100 bp DNA Ladder Plus (MBI Fermentas) was used as a size marker for electrophoresis.

Charge evolution analysis
The ML inference of the ancestral sequences of Pgi genes was performed by BASEML [34] based on the phylogeny of ray-finned fishes using whole mitochondrial genome data [18]. Tetrapods were excluded from this analysis because of their absence in this tree. Nucleotide sequence alignments of the coding region of Pgi cDNAs (1650 bp, without ambiguous regions) from 10 ray-finned fishes plus hagfish were used. The GTR + Γ [44] model was selected as the best fitting model by the hLRTs [see Additional file 1: Table S4]. The average overall accuracy of the reconstructed sequences (#1-#15) [see Additional file 1: Appendix] was 0.948 ± 0.003 SE. The pI values were estimated from the deduced amino acid sequences using the ProtParam tool [49]. The solvent-accessible surface area (SASA) of each amino acid residue was estimated with GETAREA 1.1 [50] for the dimeric PGI protein structure using a solvent radius of 1.4 Å (approximately the size of a water molecule). Rabbit PGI [PDB: 1XTB] [33] was used as a reference structure. The structural portion of the PGI composed of amino acid residues with more than 20 Å 2 SASA was considered "molecular surface." This boundary mostly agrees with other criteria based on the ratio of sidechain surface area to random coil value per residue [50]. A three-dimensional graphical model of the PGI molecule was constructed using RasMol [51].

Calculation of the expected spatial distribution of amino acid substitutions
To determine which model of amino acid substitution provided the best fit to the data (550-amino-acid sequence of PGIs from 11 fishes and the known phylogenetic framework of ray-finned fishes [18]), likelihood ratio tests were conducted among pairs of five models mounted in PAML 3.13d [34]. Parameters F and Γ were incorporated in this analysis. As a result, the amino acid substitution matrix JTT [52] gave the highest likelihood score (lnL = -5851.41); the second-best matrix was Dayhoff [53] (lnL = -5865.41). Using the JTT matrix (m ij ), transition rates between pairs of amino acids (P ij ) were calculated by the equation where f i is the normalized frequency and µ i is the relative mutability of each amino acid. The parameter f i was estimated separately for the surface and interior portions of the inferred common ancestral protein of PGI-1 and PGI-2 (node #5 in Figure 2C) to consider differential amino acid composition in different parts of the protein [see Additional file 1: Table S5]. Based on the resultant P ij , we estimated the theoretical ratio of the charge-changing substitutions to charge-neutral substitutions (ΣP charge-changing :