Inventing an arsenal: adaptive evolution and neofunctionalization of snake venom phospholipase A2 genes

Background Gene duplication followed by functional divergence has long been hypothesized to be the main source of molecular novelty. Convincing examples of neofunctionalization, however, remain rare. Snake venom phospholipase A2 genes are members of large multigene families with many diverse functions, thus they are excellent models to study the emergence of novel functions after gene duplications. Results Here, I show that positive Darwinian selection and neofunctionalization is common in snake venom phospholipase A2 genes. The pattern of gene duplication and positive selection indicates that adaptive molecular evolution occurs immediately after duplication events as novel functions emerge and continues as gene families diversify and are refined. Surprisingly, adaptive evolution of group-I phospholipases in elapids is also associated with speciation events, suggesting adaptation of the phospholipase arsenal to novel prey species after niche shifts. Mapping the location of sites under positive selection onto the crystal structure of phospholipase A2 identified regions evolving under diversifying selection are located on the molecular surface and are likely protein-protein interactions sites essential for toxin functions. Conclusion These data show that increases in genomic complexity (through gene duplications) can lead to phenotypic complexity (venom composition) and that positive Darwinian selection is a common evolutionary force in snake venoms. Finally, regions identified under selection on the surface of phospholipase A2 enzymes are potential candidate sites for structure based antivenin design.


Background
Phospholipase A 2 s (PLA 2 ; EC 3.1.1.4) are esterolytic enzymes that hydrolyze glycerophospholipids at the sn-2 fatty acyl bond, releasing lysophospholipids and fatty acids. PLA 2 s play key roles in various biological processes in mammals including signal transduction, lipid digestion, host defense and production of eicosanoids and other lysophospholipid derivates with potent biological activities [1]. PLA 2 enzymes are also the major compo-nents of snake venoms where they function to immobilize and rapidly kill prey [1]. PLA 2 s from elapid venoms (group-I) are structurally similar to pancreatic secretions while PLA 2 s from viper venoms (group-II) are structurally similar to inflammatory secretions [2]. A third group of PLA 2 s (group-III) have been identified from the venom of bees, jellyfish, scorpions and lizards [2] indicating that PLA 2 s have been recruited into a toxic function multiple times in diverse lineages.
Snake venom PLA 2 s are members of large multigene families with diverse pharmacological activities including neurotoxic, myotoxic, cardiotoxic, anticoagulant and hemolytic effects [2]. These diverse activities evolved from an ancestral nontoxic PLA 2 by a process of repeated gene duplication followed by functional divergence. PLA 2 toxicity is independent of enzymatic activity [3,4] and is mediated through "pharmacological sites" on the protein surface that directly interact with ligands on the cell membrane [5]. Thus, the surface of PLA 2 s forms a scaffold for adaptive modification that has been used to generate a diverse array of pharmacological effects through a process of neofunctionalization (the generation of new protein functions that were not the primary function of the ancestral protein).
Previous studies of PLA 2 genes identified accelerated evolution of group-I genes from Naja naja [6] and group-II genes from Trimeresurus [7] and Vipera [8] consistent with positive Darwinian selection, however these studies focused on one or two species, included relatively few genes and used methods that lack power to detect episodic adaptive evolution. A larger study of group-I and -II genes found that amino acid substitutions were correlated with surface accessibility [9], suggesting that modifications of surface residues and positive selection play important roles in generating toxin diversity. To further explore this possibility I compiled an extensive dataset of full length snake venom PLA 2 genes from public databases, inferred the gene trees for these toxins and tested for episodes of positive Darwinian selection coincident with the origin of novel pharmacological effects and recurrent diversifying selection on specific sites. In addition, I used a larger amino acids dataset to test if conclusions drawn from the smaller nucleotide dataset were robust to phylogenetic inference.
The results indicate that adaptive evolution is common in snake venom PLA 2 genes and is associated with the evolution of new toxin functions and speciation events, demonstrating that molecular adaptation has played a pervasive role in the evolution of snakes and their venom arsenal. This analysis has identified the mutational pathway leading from non-toxic to highly toxic PLA 2 enzymes, reconstructing the processes of mutation and adaptation. Finally, increases in genomic complexity gained through gene duplications has promoted the evolution an increasingly complex phenotype (venom composition), providing a link between molecular, phenotypic and organismal evolution.

Gene duplication and speciation history
To study the molecular evolution of snake venom PLA 2 genes, I compiled a dataset of 83 group-I and 90 group-II genes from public databases and inferred the evolutionary history of these genes using Bayesian phylogenetics [10]. The molecular phylogeny inferred for group-I PLA 2 enzymes (Figs. 1 and 3) indicates that genes group with higher-order snake phylogeny and are divided into two sister clades containing marine and Australian species (the "Hydrophiids") and African, American and Asian species (the "Elapids"). This division is similar to the results of earlier phylogenetic studies of group-I PLA 2 genes [11] and likely reflects a deep divergence between these groups. Within these two major clades, subclades contain genes from closely related taxa that have similar pharmacological effects, suggesting functional diversification occurred after speciation. In contrast, group-II genes cluster by pharmacological effect with little species distinction (Figs. 2 and 4), indicating that functional divergence arose before the divergence of these species.
Even though branch support for the group-I and group-II nucleotide trees was high in this analysis, nucleotide data are only about a third of the gene sequences that are available, the majority are amino acid data (and thus not suitable for codon-based selection analysis discussed later). To test if the topology of nucleotide-based gene trees was sensitive to taxon sampling I inferring group-I and group-II phylogenies using larger amino acids datasets (245 group-I and 271 group-II genes, respectively). Although the trees inferred from amino acid data (Figs. 3 and 4) had lower support for recent lineages than the nucleotide data, perhaps because protein sequences have not accumulated enough phylogenetically informative substitutions to accurately reconstruct recent branching orders, the deeper nodes were well supported and the overall topology was congruent between amino acid and nucleotide data indicating that inferences based on the nucleotide datasets are reliable.

Origin and elaboration of toxic genes
Snake venom PLA 2 s have diverse pharmacological activities including neurotoxic, myotoxic, cardiotoxic, anticoagulant and hemolytic effects [2], which must have originated after they diverged from nontoxic ancestors. Although uncertainty in the tree topology and the richness of toxin functions makes assessment of specific ancestral and derived functions difficult for all lineages, it is clear from these phylogenies that many novel functions have originated in PLA 2 genes after gene duplications. For example, three nontoxic group-I PLA 2 genes isolated from Laticuadata semifasciata pancreas [12] branch near the base of the Elapinae group in the nucleotide tree, but are the most basal snake group-I genes in the amino acids tree ( Figs. 1 and 3). These pancreatic genes have been suggested to be intermediates between nontoxic and toxic enzymes [12], suggesting that duplication of an ancestral nontoxic gene originally expressed in the pancreas was Molecular phylogeny of group-I phospholipase A 2 genes Figure 1 Molecular phylogeny of group-I phospholipase A 2 genes. (A) Bayesian phylogeny, branch lengths are given as number of substitutions per codon. (B) Evolution of group-I genes. Numbers above the branches are Bayesian posterior probability values (BP) followed by the d N /d S ratio (ω) or the number of nonsynonymous substitutions (N) if no synonymous substitutions (S) occurred along that branch (BP | ω or N/S). Branches in red were inferred to be under positive selection. Genes are labeled according the species they were identified from followed by the GenBank GI number for that gene. Pharmacological effects and higher order classifications are given to the right of clades. Pan, nontoxic pancreatic isoforms. AtC, anticoagulent. Ctx, cardiotoxic. PrC, procoagulent. Molecular phylogeny of group-II (B) phospholipase A 2 genes Figure 2 Molecular phylogeny of group-II (B) phospholipase A 2 genes. (A) Bayesian phylogeny, branch lengths are given as number of substitutions per codon. (B) Evolution of group-II genes. Numbers above the branches are Bayesian posterior probability values (BP) followed by the d N /d S ratio (ω) or the number of nonsynonymous substitutions (N) if no synonymous substitutions (S) occurred along that branch (BP | ω or N/S). Branches in red were inferred to be under positive selection. Genes are labeled according the species they were identified from followed by the GenBank GI number for that gene. Pharmacological effects and higher order classifications are given to the right of clades. AtC, anticoagulent. Chp/Chaperone, neurotoxin chaperone. Ntx, neurotoxic. followed by recruitment into the venom gland and the emergence of toxic functions. Also within group-I, a clade containing neurotoxins from Laticuadata has emerged from antiplatelet enzymes; the nested position of this clade indicates that the neurotoxic effect of these enzymes is derived from more ancestral antiplatelet enzymes. The origin of group-II toxins that target muscle is also associated with a gene duplication event. Group-II myotoxins share a unique amino acid substitution at residue 49 (aspartate to lysine) that abolishes enzymatic activity [13,14]. Thus, Lys 49 -myotoxins evolved a novel nonhydrolytic mechanism to induce membrane damage [15,16] Molecular phylogeny of group-I phospholipase A 2 genes Figure 3 Molecular phylogeny of group-I phospholipase A 2 genes. Amino acid dataset with Bayesian posterior probabilities shown along branches.      5 5 6 6 9 5 3 7 M .I ka h e ka 5 5 6 6 9 5 4 4 O .h a n n a h 1 0 8 6 3 7 6 0 O .H a n n a h 3 4 8 0 9 9 6 0 O .h a n n a h 1 0 8 6 3 7 6 2 B .f a s c ia tu s 1 2 9 4 9 4 N .k a o u th ia 2 1 4 4 4 4 0 N .k a o u th ia 6 7 1 7 2 N .s p u ta tr ix 8 9 5 3 8 9 9 N .s p u ta tr ix 2 5 4 5 3 1 6 3 N .s p u ta tr ix 8 9 5 3 9 0 1 N .s p u ta tr ix  N .n a ja N .n a ja 1 2 9 5 1 4 N .o x ia n a 1 2 9 4 1 1 N .s a g it ti fe ra 3 7 7 0 0 4 8 9 N .S a g it ti fe ra 4 6 0 1 5 7 3 2 N .S a g itt ife ra 3 1 6 1 5 5 8 5 N .s a g itt ife ra 3 8 0 1 7 4 9   I used maximum likelihood models of coding-sequence evolution [17,18] to test the hypothesis that functional diversification of snake venom PLA 2 genes is driven by positive Darwinian selection. This method determines the strength and direction of selection by estimating the non-synonymous-to-synonymous substitution rate (d N /d S = ω), with ω = 1, <1, and >1 indicating neutral evolution, purifying selection and directional selection, respectively. The branch-specific one-ratio model is the simplest, estimating the same ω for all branches in the phylogeny. The estimate of ω for group-I genes under this model, 1.28, is an average over all codons and lineages, highlighting the dominant role of positive selection on elapid venom Molecular phylogeny of group-II phospholipase A 2 genes   E .c a ri n a tu ss o ch u re ki 2 5 9 9 2 6 6 1 E .c o lo ra tu s 1 3 9 3 6 5 4 5 E .c a ri n a tu s 3 0 9 8 3 9 1 8 E .C a ri n a tu s 4 0 8 8 9 2 5 9 V .a .a s p is 3 3 1 8 7 1 4   B .in su la ris 2 0 0 6 9 1 3 7 T .f la vo vi ri d is 1 5 7 9 9 2 6 5 T .f la v o v ir id is 2 2 2 9 5 9 P .e le g a n s 8 4 5 7 8 8 8   T .p u n ic e u s3 phospholipases. The estimate of ω for group-II genes under the one-ratio model, 0.686, indicates that group-II genes are generally under purifying selection, however, this estimate is higher than reported from most genes.
The one-ratio model can only detect adaptive evolution when the majority of amino acids and lineages under study have been under positive selection (such as in group-I genes). If adaptive evolution is primarily episodic, then short episodes of positive selection, which are followed by long periods of purifying selection, will not be detected. To test for episodes of positive selection in group-I and group-II gene lineages, I used a free-ratios model that estimates separate d N /d S ratios for all lineages in the tree. These models fit the data significantly better than either the one-ratio model or a constrained one-ratio model with ω forced to be 1 (group-I genes) or a free-ratio model with lineages previously identified with ω >1 constrained to be 1 (group-II genes), indicating that episodes of directional selection are common in snake venom PLA 2 evolution with nearly 32% and 21% of group-I and -II gene lineages, respectively, having been under directional selection ( Figs. 1 and 2). Moreover, there are several branches with extremely high ω values, including two group-I and three group-II branches with ω >3, one group-I branch with ω >5 and one group-I branch with ω = 9.06 ( Figs. 1 and 2).
Ohno's model [19] of post-duplication divergence predicts an increase in the nonsynonymous substitution rate following duplication as positive Darwinian selection drives the fixation of mutations that confer new or modified functions on gene duplicates. To test for accelerated evolution after duplication I used smaller datasets for which speciation and duplication events could be unambiguously assigned to each branch and a two-ratios model that estimated different ω parameters for post-duplication (PD) and post-speciation (PS) branches. Surprisingly, in group-I genes PD and PS branches have nearly identical ω values (ω PD = 1.12, ω PS = 1.22), indicating that positive selection is associated with both gene duplication and speciation. In contrast to group-I genes, group-II gene PD branches evolve much faster than PS branches (ω PD = 1.4, ω PS = 0.63), consistent with the classical model of neofunctionalization.
Here, neofunctionalization is defined as the emergence of a new toxic effect from an ancestral enzyme that did not posses that effect as its main toxin function (for example, neurotoxic Laticuadata genes and Lys 49 -myotoxins discussed above). Strikingly, positive selection occurred in the stem-lineage of 67% (4/6) of group-I functional groups and 88% (7/8) of group-II functional groups (Figs. 1 and 2) indicating that positive selection played a pervasive role in the origin of novel toxin functions during the diversification of vipers and elapids and their venoms. It also suggest lineages which can be targeted for ancestral sequence reconstructions for characterization of ancestral toxin functions to compare extant functions to.
The importance of gene duplication to the evolution of species-specific traits is relatively unknown, but duplications resulting in species-specific adaptations have been demonstrated for some genes [20,21]. The unexpectedly high group-I ω PS may be the result of enzyme adaptation to new prey preference after speciation. Indeed, the three semi-aquatic kraits (Laticaudata sp.) prey primarily on moray and conger eels and assorted fishes [22,23] while Australian copperheads (Austrelaps) prey on frogs and lizards [24]. In the Elapinae group, the king cobra (Ophiophagus hannah) and kraits (Bungarus sp.) feed almost exclusively on snakes and other reptiles [25], while the true cobras (Naja sp.) and the Eastern brown snake (Pseudonaja textilis) feed on small mammals, amphibians and birds [25]. This pattern suggest a scenario where dietary shifts after speciation runs the PLA 2 gene repertoire through a "selective sieve"; those genes which are no longer effective in subduing new prey species are lost, while genes that are still effective adapt to the new prey type and subsequently diversify.
A limitation of the lineage-specific models of protein evolution utilized above is that they can only detect directional selection when the average ω over all amino acids in the protein is >1. Thus, lineage-specific models have limited ability to detect short episodes of directional selection that affect only a few amino acids or amino acids under recurrent diversifying selection. Site-specific models [26] account for rate variation among sites and are powerful tools for detecting diversifying selection. I used three pairs of site-specific models to test for recurrent, diversifying, selection: M0 (one ratio) and M3 (Discrete), M1 (Neutral) and M2 (Selection), and M7 (Beta) and M8 (Beta & ω). Parameter estimates under models M2, M3 and M8, which allow for sites with ω >1, identified that up to 65% of sites in group-I genes and 27% of sites in group-II genes are under positive selection (Tables 1 and 2). This is strong evidence that diversification of snake venom PLA 2 genes is driven by recurrent positive selection and suggest that venomous snakes are caught in a co-evolutionary arms race with prey as prey evolve resistance to the current venom arsenal and snakes evolve ever more toxic venoms.
The three dimensional structure of PLA 2 enzymes are extremely conserved, obscuring the mechanisms that produce such a wide spectrum of pharmacological effects. To investigate how functional diversity is generated in PLA 2 enzymes, I mapped sites that were identified as being under diversifying selection on to the crystal structure of group-I and -II phospholipases (Fig. 5). The vast majority of amino acids under diversifying selection occur outside of the α-helicical central scaffold and in regions of the protein that form connecting loops, however, the scaffold of group-II proteins is more conserved than group-I proteins. Functionally important residues, including cysteins that participate in disulfide bonds, the catalytic triad, the calcium-binding site and the hydrophobic channel have d N /d S ratios near zero indicating these regions are under strong structural and functional constraint. In contrast, there are several clusters of amino acids on the molecular surface under intense diversifying selection in both group-I and -II proteins (Fig. 5). These rapidly evolving regions are similar to sites known to produce toxic effects in PLA 2  enzymes [27][28][29], highly suggesting that regions under positive selection on the proteins surface are responsible for generating toxic functions.
Although positive selection is often associated with the origin of novel toxin functions (such as antiplatelet, neurotoxic and procoagulent toxins), there are several lineages in which new functions emerge without a significant increase in the nonsynonymous substitution rate (other neurotoxins and cardiotoxins). This is not unexpected since it has long been known that relatively few amino acid changes can have drastic effects on protein functions [ref], suggesting some lineages may have substitutions that contribute to the origins of novel functions but that might have been missed in the lineage and site-specific analyses above. To further clarify the pattern of amino acid replacement that promotes functional changes, I mapped amino acid changes inferred from ancestral sequence reconstructions for select group-I (Figs. 6 and 7) and group-II genes (Fig. 8) onto the crystal structure of these proteins. Clearly, replacements are nearly evenly distributed on protein surface in both group-I and -II genes, but there are several regions that are devoid of amino acids changes including residues in and around the active site and patches of conserved residues on the "back" of the proteins. These regions also contain residues in the slowest evolving site-class from the site-specific analysis. Taken together, these patterns suggest that while only a few changes on the surface are needed to evolve a new function, there are regions under strong structural/functional constraints that limit divergence such as the hydrophobic core and patches of conservation on the surface. Given this, it is interesting to note that several basal clades in the group-II genes with uncharacterized pharmacological effects have strong evidence of selection and many amino acid replacements that map to the surface (Figs. 2 and 8) suggesting that they may have evolved novel, if as of yet unidentified, functions.

Conclusion
The molecular evolution of group-I and group-II PLA 2 genes, such as the birth and death like and "selective sieve" processes of gene duplication, divergence and loss are similar to evolution of other snake venom proteins, The structure of group-I (A-C) and group-II (D-F) phospholipase A 2 proteins Figure 5 The structure of group-I (A-C) and group-II (D-F) phospholipase A 2 proteins. The structures are represented by ribbons in A and D with disulfide bonds and catalytic residues shown as sticks and as molecular surfaces rendered in 3D in B, C, E and F. Residues are colored coded according to their approximate posterior mean ω (scale shown between rows) calculated under model M3 (discrete). B and E are in the same orientation as A and D, respectively, while C and F are rotated 180° about a horizontal axis through the molecule.   Ancestral sequence mapping for elapinae group genes. Organization follows Figure 6.
particularly the elapid three-finger toxins [30]. Indeed, there is even evidence for species-specific toxin adaptation to prey type within the three-finger toxins and maintenance of a well-ordered tertiary structure [30] similar to that seen in PLA 2 genes, suggesting that this mode of molecular evolution may be common in venom genes.
Kini and Evans [5] have proposed that 'target sites' on the surface of prey cells are recognized by 'pharmacological sites' on PLA 2 enzymes. These protein-protein interactions determine PLA 2 specificity by having complementary charges, hydrophobicities, and Van der Waals contact surfaces. This model, combined with the analyses above, suggest that entirely new functions originate after duplication through substitutions in pharmacological sites that alter binding specificities. Although most substitutions will likely disrupt binding specificity for the current target site, a few may create new interaction sites leading to the emergence of novel functions.
The extraordinary level of positive selection acting on snake venom phospholipase A 2 genes indicates that adaptive molecular evolution plays an important role in the emergence of these novel functions, continues as functions are diversified and refined, and may contribute to niche differentiation after speciation. Interestingly, mapping sites under positive selection onto the structure of PLA 2 enzymes has identified regions that are attractive candidates for structure-based drug design. These data also demonstrate that increases in genomic complexity gained through gene duplications has lead to an increase in phenotypic complexity (venom composition) and likely the ability of venomous snakes to adapt to new prey types.