Testis-specific glyceraldehyde-3-phosphate dehydrogenase: origin and evolution
© Kuravsky et al; licensee BioMed Central Ltd. 2011
Received: 24 November 2010
Accepted: 10 June 2011
Published: 10 June 2011
Skip to main content
© Kuravsky et al; licensee BioMed Central Ltd. 2011
Received: 24 November 2010
Accepted: 10 June 2011
Published: 10 June 2011
Glyceraldehyde-3-phosphate dehydrogenase (GAPD) catalyses one of the glycolytic reactions and is also involved in a number of non-glycolytic processes, such as endocytosis, DNA excision repair, and induction of apoptosis. Mammals are known to possess two homologous GAPD isoenzymes: GAPD-1, a well-studied protein found in all somatic cells, and GAPD-2, which is expressed solely in testis. GAPD-2 supplies energy required for the movement of spermatozoa and is tightly bound to the sperm tail cytoskeleton by the additional N-terminal proline-rich domain absent in GAPD-1. In this study we investigate the evolutionary history of GAPD and gain some insights into specialization of GAPD-2 as a testis-specific protein.
A dataset of GAPD sequences was assembled from public databases and used for phylogeny reconstruction by means of the Bayesian method. Since resolution in some clades of the obtained tree was too low, syntenic analysis was carried out to define the evolutionary history of GAPD more precisely. The performed selection tests showed that selective pressure varies across lineages and isoenzymes, as well as across different regions of the same sequences.
The obtained results suggest that GAPD-1 and GAPD-2 emerged after duplication during the early evolution of chordates. GAPD-2 was subsequently lost by most lineages except lizards, mammals, as well as cartilaginous and bony fishes. In reptilians and mammals, GAPD-2 specialized to a testis-specific protein and acquired the novel N-terminal proline-rich domain anchoring the protein in the sperm tail cytoskeleton. This domain is likely to have originated by exonization of a microsatellite genomic region. Recognition of the proline-rich domain by cytoskeletal proteins seems to be unspecific. Besides testis, GAPD-2 of lizards was also found in some regenerating tissues, but it lacks the proline-rich domain due to tissue-specific alternative splicing.
Glyceraldehyde-3-phosphate dehydrogenase (GAPD, EC 184.108.40.206) is a homotetrameric glycolytic enzyme providing phosphorylation of 3-phosphoglyceraldehyde to 1,3-diphosphoglycerate coupled with reduction of NAD+ to NADH. Mammals are known to possess two tissue-specific GAPD isoenzymes: somatic (GAPD-1) and testis-specific (GAPD-2, GAPDS). For Homo sapiens, their protein sequences are 68% identical. Besides the two isoenzymes, a vast amount of GAPD pseudogenes was found in the genomes of primates and rodents [1, 2].
Mammalian GAPD-1 is a well-studied protein, a high concentration of which in cells (5-15% of all cytoplasmic proteins) confirms its functional significance. Recent studies established that GAPD-1 is not simply a classical metabolic protein involved in glycolytic energy production, but rather a multifunctional protein with specific functions in numerous processes [3, 4]. GAPD-1 was shown to display both cytosolic and nuclear localization participating in endocytosis [5–7], plasma membrane fusion , microtubule assembly [9, 10], secretory vesicular transport [11, 12], protein phosphotransferase/kinase reactions [13, 14], translational and transcriptional controls of gene expression [15–17], regulation of telomere structure [18, 19], nuclear membrane fusion , nuclear RNA transport , DNA excision-repair [22, 23] and induction of apoptosis in case of oxidative stress [24–27]. Furthermore, GAPD-1 was implicated in Alzheimer's [28–30] and Huntington's [30–32] neurodegenerative diseases.
As opposed to soluble GAPD-1, mammalian GAPD-2 is tightly attached to the cytoskeleton, namely to the principal piece of the spermatic filament fibrous sheath [33–35]. The attachment is mediated by an additional N-terminal proline-rich domain of 74 amino acids [35, 36]. GAPD-2 supplies the dynein ATPases of filament with energy, therefore playing a crucial role in the maintaining of sperm motility. Disruption of its expression generally leads to infertility . Due to its strong association with cytoskeleton GAPD-2 remains within the insoluble fraction after cell breaking, significantly complicating its experimental investigation. As a result, there is only little data on GAPD-2 properties. It was recently discovered to display enhanced stability towards denaturation that may be an adaptation to the absence of protein expression in spermatozoa. Enzyme kinetics exhibited by GAPD-2 was found to differ from the one exhibited by GAPD-1 too . Based on the study of short functional motives of both mammalian isoenzymes, GAPD-2 was proposed to evade involvement in most non-glycolytic processes characteristic for GAPD-1 .
GAPD-1 and GAPD-2 are also possessed by some other vertebrates besides Mammalia [40–42], but their expression is apparently not always tissue-specific. In the bony fish Oplegnathus fasciatus both GAPD mRNAs were detected ubiquitously in all of tissues examined , and therefore the functional specificities of the isoenzymes seem to differ from the mammalian ones. Based on the phylogenetic trees, it was hypothesized that GAPD could diverge to the isoenzymes around the origin of Bilateria, but as only vertebrates have retained GAPD-2, this scenario seems unlikely. However, some vertebrates (e.g. Xenopus laevis) were discovered to lack GAPD-2 .
Single copy genes are thought to evolve conservatively because of strong negative selective pressure. Gene duplications produce a redundant gene copy and thus release one or both copies from negative selective pressure. Thus, duplications should be an important precursor of functional divergence. The increased availability of sequences in the public databases allows the investigation of the molecular evolution of the GAPD gene family and the evaluation of selection following duplication events. In the present study we focus on the evolution of the poorly uninvestigated GAPD-2 isoenzyme. Previously GAPD-2 was discovered to be specific for vertebrates . Therefore we will focus on this taxon as well as on the other groups of deuterostomes not considered in . Specifically, we (1) examine the evolutionary history of GAPD-2 and other GAPD isoenzymes of deuterostomes, (2) evaluate lineage-specific changes in selective pressure affecting GAPD isoenzymes, and (3) look into the metamorphosis of GAPD-2 to a testis-specific protein.
GAPD sequences of teleosts were obtained using both Ensembl (5 species present in this database) and BLAST searches against RefSeq transcripts and the EST division of GenBank (species not covered by Ensembl). Three different sequences were discovered for 4 species, two sequences - for 6 species and a single sequence - for 3 species. The differences between the numbers of obtained GAPD sequences are not necessarily a result of data incompleteness and may be biologically relevant. For example, only two sequences were identified within a complete genome of zebrafish.
Identification of GAPD sequences of all other species was performed by conducting BLAST searches against RefSeq transcripts and the EST division of GenBank. Two different GAPD sequences were discovered for lizards, some cartilaginous fishes, some jawless vertebrates, some tunicates and a few non-deuterostomes (3 of 10 insects, a leech and a flatworm). Single GAPD sequences were discovered for all examined birds, reptiles except lizards, amphibians, lancelet, echinoderms, acorn worm, Xenoturbella bocki, as well as for the remaining cartilaginous fishes, jawless vertebrates, tunicates and most examined non-deuterostomes. Two species (Xenopus laevis and Ciona savignyi) were revealed to possess even three, but slightly different GAPD family members.
Availability of the two EST variants must be a result of tissue-specific alternative splicing: the exon of 103 nucleotides was either preserved, as in gonads, or spliced out, as in embryo and regenerating tail. Thus, the presence of the proline-rich domain has a tissue-specific character.
A few ESTs of G. gecko were extracted from samples of injured brain and spinal cord [GenBank:EB170778, GenBank:CV053413] and had incomplete 5'-termini: only a part of the sequence encoding the proline-rich domain was present. Therefore it is impossible to ascertain whether the translation of the proline-rich domain in G. gecko is governed by alternative splicing like in A. carolinensis.
All GAPD isoenzymes of vertebrates can be subdivided into two groups based on the clades of phylogenetic tree: the group including mammalian GAPD-1 and the group including mammalian GAPD-2. GAPD of insects separates before these two groups diverge, which means that the duplication into GAPD-1 and GAPD-2 took place after the divergence of protostomes and deuterostomes. The orthologs of mammalian GAPD-1 and GAPD-2 are further referred to as GAPD-1 and GAPD-2, correspondingly.
The clade including mammalian GAPD-1 is supported by a high posterior probability (100%). Inside this clade a number of additional duplications were detected. One of them apparently happened near the origin of teleosts and produced a third GAPD isoenzyme hereinafter referred to as GAPD-3. Other independent duplications produced additional GAPD isoenzymes in lamprey, hagfish, sea squirt and Xenopus laevis.
The clade including mammalian GAPD-2 is based on a less robust branch with a posterior probability of 77%. It splits into a clade of vertebrates (100% posterior probability) and a clade including the only GAPD isoenzymes of echinoderms, lancelet, hemichordates and Xenoturbella bocki, as well as the second GAPD isoenzyme of some tunicates (77% posterior probability). On account of lower support value, merging of these two clades into one is questionable and needs confirmation.
BLAST searches of genes which are syntenic to human and fish GAPD-2 were carried out in the genomes of lancelet, sea squirt, sea urchin and acorn worm. They showed that these genes are dispersed in the genomes rather than combined together in a single locus.
The constructed synteny maps provide support for orthology between GAPD-1 of human, either GAPD-1 or GAPD-3 of stickleback, GAPD of lancelet and sea squirt, as well as between GAPD-2 of human and both fishes. These results generally agree with the phylogenetic trees, indicating orthology between appropriate isoenzymes of human and fishes. Syntenic analysis helped to identify the origin of lancelet and sea squirt GAPDs, which was not determined with confidence by phylogenetic trees construction because of low branch support values. The evidence is also given for the origination of GAPD-1 and GAPD-3 of stickleback and probably some other bony fishes as a result of teleost-specific whole genome duplication.
Pairs of sequences used for Ka/Ks calculation
Sequence identity, %
Homo sapiens - Microcebus murinus
Homo sapiens - Callitrix jacchus
Tetraodon nigroviridis - Takifugu rubripes
Tetraodon nigroviridis - Takifugu rubripes
Drosophila ananassae - Drosophila virilis
In mammalian GAPD-1 and GAPD-2, teleost GAPD-2 and insect GAPD, the purifying selection is impaired approximately between the 85th and 105th positions of protein sequences (Section 1; from here on the numbering of amino acid positions as in mammalian GAPD-1). In mammalian GAPD-2 purifying selection is also weakened between the 265th and 285th positions (Section 2). In teleost GAPD-1 purifying selection is weakened between the 55th and 75th positions (Section 3).
The regions under impaired purifying selection were mapped on the 3D-structure of human GAPD-1 (PDB ID 1u8f). Section 1 corresponds to a buried β-strand and adjacent loops near the NAD-binding site. Sections 2 and 3 are solvent-exposed regions of the polypeptide chain also composed of both β-strands and loops.
The Ka/Ks ratio estimates for GAPD isoenzymes under various branch-specific models
0.05363 (all except mammalian GAPD-2)
0.12179 (mammalian GAPD-2)
0.07405 (all except insect GAPD)
0.02642 (insect GAPD)
0.06263 (all except listed below)
0.12186 (mammalian GAPD-2)
0.02567 (insect GAPD)
0.06672 (mammalian GAPD-1)
0.05218 (fish GAPD-1)
0.07492 (fish GAPD-3)
0.12187 (mammalian GAPD-2)
0.06218 (fish GAPD-2)
0.02593 (insect GAPD)
In this study we sought to expand the previous phylogenetic investigations of GAPD [42–50] by concentrating on deuterostomes. As compared to the study in reference , which is also focused on deuterostomes, we introduced a number of new sequences especially from non-mammalian and non-teleost species and carried out the syntenic analysis. This allowed more accurate determination of phylogeny, as well as the identification of some novel GAPD isoenzymes, for example the third isoenzyme of teleosts.
GAPD-2 was lost in most lineages and retained only by mammals, lizards, teleosts and cartilaginous fishes. The presence of both isoenzymes in these organisms raises the question of a functional difference between them. It is assumed that if two isoenzymes perform the same function in the same set of tissues, one of them is free from functional constraints and its gene will eventually turn into a non-functional pseudogene or will be deleted [62–64]. In mammals and lizards GAPD-1 and GAPD-2 specialized to tissue-specific proteins and this is probably the reason why one of them avoided the lost. Generally, specialization towards tissue-specificity is a trend among glycolytic enzymes that have acquired additional copies. In vertebrates, they usually have distinctive isoenzymes in liver, muscle and brain, sometimes in erythrocytes and other tissues [42, 60]. The situation with GAPD of teleosts and cartilaginous fishes is more complex. According to EST data, GAPD-1 and GAPD-2 of fishes are expressed in the same tissues. The results of branch-specific tests indicate that the evolutionary rates of both isoenzymes are accelerated as compared to the ancestral GAPD (GAPD of insects, which separated before the emergence of GAPD-1 and GAPD-2, was considered to evolve with the similar rate as the ancestral protein). This is in line with the model of gene duplications proposed by Hughes [65, 66]. It suggests that the original gene was performing two or more functions. After duplication each copy specialized on performing a part of them. GAPD is known to be a multifunctional protein participating in many processes beyond glycolysis. As the catalytic center is conserved in both isoenzymes, GAPD-1 and GAPD-2 of teleosts and cartilaginous fishes may specialize on performing different non-glycolytic functions, as also evidenced by Ka/Ks profiling. Different regions of teleost GAPD-1 and GAPD-2 are under impaired purifying selection. These regions can correspond to the parts of proteins which are responsible for performing isoenzyme-specific non-glycolytic functions.
A number of additional duplications of GAPD genes occurred independently in certain lineages. For example, some teleosts possess the third GAPD isoenzyme (GAPD-3) in addition to GAPD-1 and GAPD-2. Taking into account both the constructed phylogenetic trees and the obtained data on syntenies, it can be concluded that GAPD-3 originated from GAPD-1 during the teleost-specific whole-genome duplication . However, GAPD-3 was not found in complete genomes of zebrafish, tetraodon and fugu, which means that it was lost.
The retention of GAPD-3 by certain species of teleosts agrees with the model of dosage balance proposed by Papp with colleagues [65, 68]. It states that genes having optimal dosages that are dependent on each other may be lost only synchronously after whole-genome duplications. Therefore they are preferentially kept. In the study by  most of the other glycolytic enzymes were shown to have extra copies in teleosts, which also originated during the whole-genome duplication. Therefore, GAPD-3 as well as the other additional glycolytic isoenzymes in teleosts may be retained to prevent dosage imbalance leading to glycolysis malfunction.
The model of dosage balance also provides an explanation for Xenopus laevis possessing three slightly different GAPD isoenzymes [Swiss-Prot:P51469, GenBank:BC043972, GenBank:BC048770]. Following the results of phylogenetic analysis (Figure 3), the duplications of GAPD-1 gene giving rise to these isoenzymes seem to have taken place after the divergence of Xenopus and Rana genera of frogs. X. laevis is known to have undergone a whole-genome duplication event about 40 million years ago [69, 70] and most of its genes have two copies . Furthermore, the GAPD genes in reference [GenBank:BC043972, GenBank:BC048770] might be the allelic variants of a single gene since they are 99% identical and their evidence is only at transcript level. If so, X. laevis would have only two GAPD genes, in line with the dosage balance model.
Sea squirt Ciona savignyi was discovered to possess three different GAPD isoenzymes as well. All of them seem to have originated from GAPD-1 after the emergence of tunicates (Figure 3). To check whether these isoenzymes are encoded by distinct genes or allelic variants of a single gene, we turned to the C. savignyi genome assembly version 2.0 (Broad Institute) with removed redundant alleles  available via Ensembl. There were only two GAPD genes [Ensembl:ENSCSAVG00000004357, Ensembl:ENSCSAVG00000007442] corresponding to the two isoenzymes. The remaining isoenzyme was 97% identical to one of the others. Perhaps, it is nothing but an allelic variant since C. savignyi displays extremely high allelic polymorphism .
It looks like the duplication giving rise to the GAPD-1 copies in C. savignyi is advantageous itself by increasing GAPD dosage. Otherwise fixation of even two consecutive duplications seems to be unlikely. The model of gene duplications assuming beneficial increase in gene dosage has been extensively studied and shown to be applicable in a number of cases [74–76]. The duplications of GAPD in the considered species may be explained by the emerged necessity of enhancing of some non-glycolytic functions of GAPD, as it is hard to imagine that such a conserved process as glycolysis needs an increase of a dose of one of its enzymes.
Mammalian GAPD-2 is known to be a highly specialized isoenzyme, which is present solely in testis (microarray data are available in the ArrayExpress database at http://www.ebi.ac.uk/arrayexpress under accession numbers E-GEOD-7307, E-GEOD-3526, E-TABM-969 and E-GEOD-2361) [77–79]. We have found that GAPD-2 is expressed in a testis-specific way also by two lizard species. Lizards are also the only lineage besides mammals in which GAPD-2 possesses the proline-rich domain. Taking into account that this domain serves as an anchor to spermatic filament cytoskeleton, correlation between its presence and testis-specific expression seems to be evident.
As GAPD-2 is a testis-specific protein only in mammals and lizards, it is likely to have specialized in this way during the early evolution of amniots. However, birds have completely lost GAPD-2. We could not detect it in any of the examined bird species including Gallus gallus and Taeniopygia guttata with complete genomes. So, the same GAPD isoenzyme should act in both somatic tissues and testis. It remains unclear what changes in bird spermatozoa rendered testis-specific GAPD-2 unnecessary.
GAPD-2 is not the only testis-specific glycolytic isoenzyme. There are also testis-specific isoenzymes of phosphoglycerate kinase (PGK-2) [80, 81] and lactate dehydrogenase (LDHC) [82–84]. It is remarkable that both are possessed only by mammals, thus resembling GAPD-2. PGK-2 originated from PGK-1 isoenzyme by retrotransposition [85, 86], while LDHC stems from the LDHA isoenzyme . These events are supposed to have taken take place during the early evolution of mammals. Perhaps, the gain of three testis-specific glycolytic isoenzymes is a consequence of an alteration of spermatozoa structure. Mammalian spermatozoa are known to have a relatively long and thin tail, complicating ATP diffusion from mitochondria along it . Therefore, energy is generated mostly by glycolytic enzymes located in the tail cytoplasm [34, 88]. Such reorganization of metabolism may require special isoenzymes with distinctive catalytic properties.
As mentioned before, a unique feature of testis-specific GAPD-2 is the additional N-terminal proline-rich domain, which is absent in all other GAPD isoenzymes. Moreover, there are no additional fragments in PGK-2 and LDHC. The spatial structure of the proline-rich domain is still unsolved. We have found that for the majority of mammals it is encoded by two exons. The first exon encodes a conservative segment of 22 amino acids. The second exon encodes a segment with a high content of proline residues, highly variable in both length (58-97 amino acids) and composition (see additional file 2: The proline-rich N-terminal domains of mammalian GAPD-2). The layout of proline residues has a strikingly repetitive character. They form Pn and (XP)n motifs, where X is any amino acid (often cysteine, glutamic acid or glutamine). Generally, polyproline repetitive motifs are known to participate in strong but unspecific protein-protein interactions . Apparently they play the same role in the proline-rich domain of GAPD-2 mediating the binding to spermatic filament cytoskeleton. The presence of two different kinds of polyproline motifs suggests GAPD-2 being bound to more than one protein of cytoskeleton.
An evidence for unspecific proline-rich domain recognition by cytoskeletal proteins is also furnished by the results of Ka/Ks calculation. Ka/Ks value estimated for the variable segment of the proline-rich domain of mammalian GAPD-2 was close to unity, which means that this domain is subjected to neither purifying nor positive selection and therefore its specific sequence is not important for functioning.
The proline-rich domain is likely to be relatively young since it is absent in all other GAPD isoenzymes and no similar sequences have been revealed in other proteins by means of BLAST searches. So-called exonization of non-coding sequences is now assumed to be the source of new protein domains [90–93]. The repetitive character of the proline-rich domain sequence implies that it could have emerged from a microsatellite region. This way of new domain origination was proposed to be a general mechanism for the repetitive protein sequences [90, 94, 95].
Tissue-specific alternative splicing was discovered to govern the presence of proline-rich domain in GAPD-2 of a lizard Anolis carolinensis: it depends on a cassette exon being either spliced or retained. Unfortunately, no conclusion can be made as to whether this mechanism preceded GAPD-2 specialization to a testis-specific protein or appeared after it. It may be that GAPD-2 first incorporated the proline-rich domain as a rare optional splice variant in some tissues and only then specialized towards testis-specificity.
The results of our study substantially expand the current knowledge on evolution of GAPD family members. We show that GAPD-1 and GAPD-2 isoenzymes of mammals are also present in other lineages. We speculate that they emerged after duplication of the ancestral GAPD gene during the early evolution of chordates. GAPD-1 then underwent a number of additional independent duplications in different species, while GAPD-2 was lost in most lineages and is now found only in mammals and lizards, as well as cartilaginous and bony fishes.
We have demonstrated that GAPD-2 of mammals and lizards is specialized to a testis-specific protein. Accordingly, in these lineages GAPD-2 has acquired the novel N-terminal proline-rich domain anchoring the protein to the sperm tail cytoskeleton. This domain is likely to have originated by exonization of a microsatellite genomic region in a common ancestor of amniots. Estimates of selective pressure suggest unspecific recognition of the proline-rich domain by cytoskeletal proteins. Besides testis, GAPD-2 of lizards was also found in some regenerating tissues, but lacking the proline-rich domain due to tissue-specific alternative splicing.
In the previous study , GAPD-2 was shown to be specific for vertebrates. Therefore we decided to limit the consideration of GAPD isoenzymes and focused only on those belonging to vertebrates and also to the other groups of deuterostomes since they were not examined in . In order to find all GAPD sequences of deuterostomes, we first turned to the Ensembl database . 69 sequences of mammals and bony fishes were obtained from it as belonging to glyceraldehyde-3-phosphate dehydrogenase protein family [Ensembl:ENSFM00250000000211]. Second, a PSI-BLAST  search using the human GAPD-1 [SwissProt:P04406] as query (which was selected to be a typical example of GAPD) was conducted against UniProt . Since GAPD is known to be a well-conserved protein, a strict e-value threshold of 10-6 was chosen. The search converged in 6 steps returning 8957 hits, all of which showed more than 30% of identity to the query sequence. All in all 60 sequences of deuterostomes were picked out (excluding fragments and those previously obtained from the Ensembl database). We also selectively picked out 13 sequences of the major protostome phyla (arthropods, mollusks, annelid worms, roundworms and flatworms). Third, additional 55 sequences were obtained by employing TBLASTN algorithm with default parameters  to search with human GAPD-1 [SwissProt:P04406] and GAPD-2 [SwissProt:O14556] as queries in the EST division of GenBank . EST hits, which usually represent fragments of complete mRNAs, were manually scanned for extensive overlapping regions and then joined into larger sequences. Further inspection revealed some cases of contamination, which were excluded from the analysis. Specifically, we identified a chicken EST [GenBank:AM067846] actually belonging to Aspergillus flavus and three lancelet ESTs [GenBank:FE567488, GenBank:FE567489, GenBank:BW781185] belonging to some diatoms. As a result of this three step procedure the total of 197 GAPD sequences were identified for 131 species (109 deuterostomes and 22 other animals, see additional file 3: Accession codes of GAPD sequences used in the analysis).
Since phylogenetic tree reconstruction is a computationally expensive process, only a part of the obtained sequences was subjected to the analysis. No more than 7 species from each class of deuterostomes were considered, as well as 6 species of insects as the representatives of protostomes (for more details see additional file 3: Accession codes of GAPD sequences used in the analysis). Two slightly different GAPD sequences from the flatworm Macrostomum lignano, both derived from several ESTs [GenBank:EG952499, GenBank:EG951174, GenBank:EG952414, GenBank:EG952720, GenBank:EG953822], were used as an outgroup. The total dataset for phylogenetic analysis comprised 92 GAPD sequences. Multiple alignment of protein sequences was performed by MUSCLE  and then manually edited. The alignment of nucleic sequences was constructed by means of RevTrans 1.4 Server  based on the protein alignment (see additional file 4: Raw alignment of GAPD nucleic sequences used in the phylogenetic analysis). Columns with gaps were eliminated before phylogenetic analysis.
The phylogenetic relationships between GAPD family members were reconstructed using both protein and nucleic sequences. The Bayesian method of tree reconstruction as implemented in MrBayes 3.1.2 [102, 103] software was applied. The JTT model of amino-acid change , as well as the GTR model of nucleotide substitutions  were used. Preliminary analyses indicated that variation at the third position was saturated and confounded resolution at deep internal nodes. Therefore, trees based on nucleotide data were reconstructed in MrBayes by partitioning the data into the first, second and third codon positions, and allowing each partition to evolve at its own rate with its own shape parameter of gamma distribution.
For the Bayesian analyses, two independent runs were performed, each with four simultaneous chains that sampled every 100 generation. Trees sampled before the cold chain reached stationarity based on plots of the maximum likelihood scores were discarded. Sampling continued until convergence was achieved based on the average standard deviation of the split frequencies as given in MrBayes. Node support was accessed as Bayesian posterior probabilities.
Syntenic analysis is a reliable approach for establishing orthology. It is based on the assumption that local surroundings of genes are rarely affected by genomic rearrangements. Therefore, if the two genes have homologous neighbors, they are likely to have originated by vertical descent from a single ancestor and, in other words, be orthologous.
A syntenic analysis of the relationship between GAPD family members was performed by identification of positions of up to 20 genes both upstream and downstream of GAPD genes in human (Homo sapiens), stickleback (Gasterosteus aculeatus), zebrafish (Danio rerio), lancelet (Branchiostoma floridae), sea squirt (Ciona savignyi), sea urchin (Strongylocentrotus purpuratus) and acorn worm (Saccoglossus kowalevskii). Syntenic maps were constructed based on the information regarding gene location either available from Ensembl (human, both fishes and sea squirt) or obtained by conducting BLASTX searches of adjacent genomic regions against non-redundant protein databases. In the latter case, the homology between genes was decided if the identities of their protein product sequences were greater than 30%. The following genomes were used: B. floridae version 2.0 (Joint Genome Institute) , S. purpuratus version 2.1 (Human Genome Sequencing Center)  and S. kowalevskii version 1.0 (Human Genome Sequencing Center). Since genomic micro-rearrangements might occur, the matches between the local surroundings of GAPD genes were not required to be co-linear for establishing orthology. Gene losses and insertions were allowed as well.
To examine whether the GAPD family members are subjected to adaptive evolution, an analysis of variation under selective pressure was performed. Usually selective pressure is estimated by comparing the rates of synonymous (Ks) and non-synonymous substitutions (Ka) for the entire sequence. If Ka/Ks value is greater than unity, the whole sequence is supposed to be under positive selection, otherwise under purifying selection [107–109]. However, since each amino acid has a different function, the type and strength of natural selection may be different for each amino acid. To detect the variation in Ka/Ks values across the sequence a sliding-window approach is often used [110, 111].
Alignments of nucleotide sequences were constructed by PAL2NAL  based on protein alignments. Ka/Ks profiles were generated using a window of 120 base pairs and a step of 20 base pairs. Such a wide window was used because of high conservation of the analyzed sequences. Calculations of Ka and Ks for each window position were carried out with the aid of DnaSP 5.10 software .
The differences in selective pressure between GAPD isoenzymes were also examined by means of more sophisticated branch-specific models as implemented in codeml program from PAML software . Such kind of models assumes separate Ka/Ks values for different branches of the phylogenetic tree. They are often used for detecting selection changes after gene duplications, where one copy might evolve at a different rate due to acquisition of a new function or the loss of an old one [115–118].
First, GAPD sequences were divided into groups according to the results of phylogenetic analysis. Then a number of branch-specific models assuming separate Ka/Ks ratios for different combinations of groups were assayed. Likelihood ratio test (LRT) was used to determine whether the likelihoods of a pair of alternative branch-specific models are significantly different.
We thank Dr. Dmitry Ivankov for providing helpful comments. The work was supported by the Russian Foundation for Basic Researches (grants 09-04-01122-a, 09-04-01150, 09-04-92740-NNIOM_a and 09-04-92741-NNIOM_a), the Ministry of Education and Science of Russian Federation (Federal Target Program "Scientific and scientific-pedagogical personnel of the innovative Russia for 2009-2013"), and the DFG International Research Training Group "Regulation and Evolution of Cellular Systems" (GRK 1563).
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.