Protein-based signatures of functional evolution in Plasmodium falciparum
© Gardner et al; licensee BioMed Central Ltd. 2011
Received: 24 March 2011
Accepted: 14 September 2011
Published: 14 September 2011
Skip to main content
© Gardner et al; licensee BioMed Central Ltd. 2011
Received: 24 March 2011
Accepted: 14 September 2011
Published: 14 September 2011
It has been known for over a decade that Plasmodium falciparum proteins are enriched in non-globular domains of unknown function. The potential for these regions of protein sequence to undergo high levels of genetic drift provides a fundamental challenge to attempts to identify the molecular basis of adaptive change in malaria parasites.
Evolutionary comparisons were undertaken using a set of forty P. falciparum metabolic enzyme genes, both within the hominid malaria clade (P. reichenowi) and across the genus (P. chabaudi). All genes contained coding elements highly conserved across the genus, but there were also a large number of regions of weakly or non-aligning coding sequence. These displayed remarkable levels of non-synonymous fixed differences within the hominid malaria clade indicating near complete release from purifying selection (dN/dS ratio at residues non-aligning across genus: 0.64, dN/dS ratio at residues identical across genus: 0.03). Regions of low conservation also possessed high levels of hydrophilicity, a marker of non-globularity. The propensity for such regions to act as potent sources of non-synonymous genetic drift within extant P. falciparum isolates was confirmed at chromosomal regions containing genes known to mediate drug resistance in field isolates, where 150 of 153 amino acid variants were located in poorly conserved regions. In contrast, all 22 amino acid variants associated with drug resistance were restricted to highly conserved regions. Additional mutations associated with laboratory-selected drug resistance, such as those in PfATPase4 selected by spiroindolone, were similarly restricted while mutations in another calcium ATPase (PfSERCA, a gene proposed to mediate artemisinin resistance) that reach significant frequencies in field isolates were located exclusively in poorly conserved regions consistent with genetic drift.
Coding sequences of malaria parasites contain prospectively definable domains subject to neutral or nearly neutral evolution on a scale that appears unrivalled in biology. This distinct evolutionary landscape has potential to confound analytical methods developed for other genera. Against this tide of genetic drift, polymorphisms mediating functional change stand out to such an extent that evolutionary context provides a useful signal for identifying the molecular basis of drug resistance in malaria parasites, a finding that is of relevance to both genome-wide and candidate gene studies in this genus.
Identifying the molecular basis of disease-causing traits is one of the major justifications for the recent expansion in genomic data covering a wide range of taxa. Nowhere is this goal more clearly defined than in the case of malaria, where adaptive evolution in the form of drug resistance continues to undermine efforts to control human disease caused by P. falciparum  and P. vivax . Understanding the molecular basis of resistance phenotypes is of great operational importance as markers can be used to monitor spread and alternative therapeutic strategies can be designed. Establishment of a genetic cross has proven a fruitful starting point for determination of the genotypic basis of drug resistance [3–5], but is difficult, expensive and hence rarely achieved in practice. Genomic epidemiological approaches currently represent a promising route forward allowing detection of signatures of selection associated with drug resistance , although this approach relies on identification of linkage disequilibrium which is known to be of variable strength . Candidate gene approaches based on other forms of data analysis may also generate hypotheses suitable for further testing [8, 9].
Despite the power of these approaches for the discovery of new drug-resistance genotypes, a decade has passed since the description of the last confirmed resistance gene for a widely used antimalarial . It remains exceptional for mutations hypothesized as being involved in drug-resistance to be validated by transfection and phenotypic testing in P. falciparum, steps that allow polymorphisms of true adaptive value to be discriminated from those that are present simply as a result of genetic drift.
A specific phenomenon that may complicate studies in this genus relates to the remarkable degree to which Plasmodium proteins are enriched in non-globular domains . Since their first systematic description more than a decade ago, the function of these domains has remained unresolved [11–13] with one possibility being that they represent downstream consequences of events at the DNA level (i.e. broadly neutral in functional terms). If this were to be the case, they would represent potent generators of neutral or near neutral non-synonymous polymorphism in line with central concepts of modern evolutionary biology [14–16] that will inevitably confound both genome-wide and candidate approaches to identifying the molecular basis of resistance. Notably, minimal attempts have been made to determine systematically the degree to which specific areas of coding sequence have become released from negative selective in the long-term, and by inference unlikely to mediate functional change in the short-term. Measurement of selective forces is still generally undertaken at a whole-gene level, an approach that falls down if pressures vary considerably within individual genes, preventing determination of the true baseline for variation which is neutral evolution within and between malaria species.
We reasoned that application of two basic biological concepts could improve significantly the accuracy of approaches designed to identify the genetic basis of drug-resistance. Firstly, drug resistance, by necessity, requires a significant functional change in the parasite, and hence a functional change in one or more proteins. Secondly, functionally important regions of proteins are generally conserved across wide distances of evolution, and can therefore be discriminated in orthologous sequence alignments from sequences under minimal constraint .
We undertook a systematic study of evolutionary processes in Plasmodium coding sequences, revealing widespread, dynamic variation in selection pressures within individual coding sequences consistent with neutral theories of evolution. The use of underlying conservation score as an independent means of identifying functional variation against the backdrop of genetic drift was validated and applied by examining the molecular basis of drug-resistance, a phenotype that, by necessity, requires a significant functional change in the parasite, and hence frequently a functional change in one or more proteins.
Reference set of genes
P. falciparum gene
Pf amino acids
P. chabaudi gene
Pc amino acids
Vacuolar ATPase-coupled proton transport
Phospholipid transporting ATPase
Cation transporting ATPase
Cell-division cycle ATPase
Vacuolar proton transporting ATPase
Cu2+- transporting ATPase
Non-SERCA type Ca2+-transporting ATPase
Phospholipid transporting ATPase
Zn2+ or Fe2+ transporter
Amino acid transporter
Mg2+, Co2+ & Ni2+ channel
CorA-like Mg2+ transporter
Inorganic anion antiporter
DEAD/DEAH box helicase
DNA-directed RNA polymerase II
RNA polymerase I
Leucyl tRNA synthase
Mismatch repair protein pms1
DNA topoisomerase III
DNA-directed DNA polymerase
DNA topoisomerase II
DNA-directed RNA polymerase
Study of conservation across the Plasmodium genus was undertaken using orthologous protein sequences from the rodent parasite P. chabaudi, as genome sequencing of P. chabaudi was at a more advanced stage at the time of the studies than other available rodent species. P. falciparum (3D7) protein sequences were compared with P. chabaudi; an average score for all hominid malaria variants would strictly be more accurate, although this would only influence approximately 1% of residues. The mean level of protein conservation per gene ranged widely across the reference set of 40 housekeeping genes (mean +4.88 to -0.47 based on the BLOSUM62 matrix). It was also noted that overall 20.1% of P. falciparum residues were non-aligned, compared to 13.3% for P. chabaudi, consistent with expansion of non-aligned sequence in the hominid malaria lineage since the last common ancestor of these species.
Divergence and polymorphism in reference genes assessed by McDonald-Kreitman test, stratified by level of cross-genus conservation
8.3 × 10-15
9.5 × 10-6
Given the lack of a gold-standard informatic approach to defining non-globular regions, the relationship between conservation and non-globularity was explored via one purely structural marker of non-globularity (Kyte-Doolittle hydrophilicity) as well as a purely informational one (low-complexity regions defined by the SEG algorithm). These studies were undertaken in the context of an approach to cross-genus conservation measurement designed for high throughput use, involving a sliding window of 9 amino acid residues.
Maximum field allele frequencies for non-synonymous mutations in PfSERCA (PFA0310c, PfATP6)
Papua New Guinea
We show that genes that maintain synteny and clearly defined orthologous relationships across the genus also contain poorly conserved domains that occupy approximately half of the total coding sequence when analysed at the chromosomal level. In other words, in Plasmodium species, negative (purifying) selective force is unusually chimeric in nature, being alternately strong and weak within the same gene. Plasmodium chabaudi has approximately two-thirds as much non-aligned sequence as P. falciparum, while other work on a genome-wide scale indicates that P. vivax has approximately 40% . This indicates that the phenomenon is genus-wide, being particularly marked in the hominid malaria clade. The reason why internal proteins mediating core biological functions possess such widespread areas of non-globular domain experiencing minimal purifying selection remains obscure but our data are certainly consistent with these sequences representing a downstream consequence of genetic processes [13, 26] such as replication slippage and unequal crossing over . The dN/dS ratio of 0.64 in non-aligned regions suggests that these sequences are under very weak negative selective pressure, with purifying selection perhaps acting only to maintain non-globular structure.
Improved understanding of these issues has major implications for the study of all Plasmodium biology that relates to evolution. The findings are most immediately relevant to the goal of understanding molecular mechanisms of drug resistance, since this represents a classical example of evolution in action. As predicted, we were able to show that mutations selected by drug pressure in the field or laboratory are located in conserved protein sequences that have remained largely unchanged for millions of years (being conserved across the Plasmodium genus) by virtue of their functional importance, as already noted by other authors [22, 28–30]. What has not been considered is that the gradient in purifying selective force across the genus is strong enough to allow drug-resistance mutations to be discriminated accurately from those that are likely to non-adaptive in nature. Prediction of the functional consequences of mutations based on long-term evolutionary conservation has been described in the context of human genetic studies [31–33] but there is even greater scope for this approach to assist studies in P. falciparum, where the contribution of non-adaptive change appears to be log-orders of magnitude higher.
The potential for non-adaptive evolution to confound studies of genotype-phenotype relationships is clear in several areas. Candidate gene surveys based on sequencing of field isolates, where few or no phenotypic data are available, provide an obvious example; the majority of polymorphisms found in PfSERCA are seen to be located in poorly conserved regions, including all those reaching frequencies of more than 25% in field isolates. This finding indicates that polymorphism in this gene is dominated by non-adaptive evolution, and the temptation to invoke positive selection  should be avoided. Use of dN/dS to infer positive selection in whole genome analyses of P. vivax  may also be prone to the same issue; although measures of dN/dS across a gene tend to be conservative due to purifying selection, it is still likely on statistical grounds that across a large number of genes, a number of high dN/dS ratios (greater than one) will be generated simply by the action of non-adaptive evolution.
Understanding of the degree to which non-adaptive evolution occurs is also relevant to the identification of drug-resistance genes in a chromosomal context. In our analyses of regions known to contain drug-resistance genes, application of evolutionary conservation added considerable specificity with more than 95% of mutations outside drug-resistance genes falling into poorly conserved sequence. Nevertheless the accuracy of the evolutionary approach as it stands is unlikely to be sufficient for its use in isolation, but rather as a method for pinpointing polymorphisms within regions identified by other means, particularly genome-wide association studies using population genetic tests [6, 36]. Genetic cross experiments are also a robust starting point; for example, the complex polymorphisms in the cg1 and cg2 genes seen at the chloroquine resistance locus identified by the cross of HB3 and DD2  would be defined as being of low priority for further study by the approach we describe. Further work is in progress to examine systematically the evolutionary properties of other regions already identified as being of phenotypic importance by experimental means , with the aim of prioritizing polymorphisms for more detailed studies.
Awareness of the chimeric selection landscape of Plasmodium coding sequences is also relevant to the study of the history of Plasmodium species. A higher dN/dS at the whole gene level for hominid parasites P. falciparum and P. reichenowi compared to rodent malaria parasites has been suggested to provide evidence for small population size during the P. falciparum-reichenowi divergence . However, based on the set of forty reference genes we studied, hominid malaria proteins contain significantly more non-aligned sequence than do rodents, likely leading to higher dN/dS ratios at the whole gene level. In fact in sequence regions conserved across the Plasmodium genus the average dN/dS ratio for the P. falciparum-P. reichenowi divergence was 0.034, indicating strong purifying selection and implying a historically large population size following the lateral transfer of P. falciparum to humans from bonobos  or gorillas .
The coding sequences of Plasmodium species exemplify the theories of neutral evolution [14, 42] and the ensuing nearly neutral theory , operating on a grand scale that to our knowledge is unrivalled in genome biology. Failing to take into account the effect of this in any work that relates to evolution of coding sequence represents a missed opportunity to distinguish adaptive from non-adaptive polymorphism. For example neither genome-wide or candidate gene association studies into antimalarial resistance currently take into account the fact that most polymorphism is non-adaptive in nature. The approach we describe takes advantage of protein sequence data, but does not require any knowledge of protein function, a major advantage since the elucidation of function for the majority of Plasmodium genes remains a distant goal. Application of this method on a genome-wide scale, where it can be integrated into DNA-based methods as well as candidate gene approaches, offers a powerful approach for future studies in the field of antimalarial resistance, as well as other areas of research that are linked to evolutionary biology in this important genus.
The rodent malaria species Plasmodium chabaudi was used as comparator species given its status in terms of genome completion. Protein and DNA sequences (Plasmodb version: 2009.03.24) were obtained from Plasmodb.org. Forty orthologous pairs of sequences were chosen, fulfilling the following criteria; no known or hypothesised role in drug resistance or host interaction, syntenic relationship between P. falciparum and P. chabaudi and > 75% coverage in the P. reichenowi ortholog  (used for measurement of divergence within hominid malaria parasites). Since sexual-stage genes are released from purifying selection in asexual culture (experienced by several of the isolates under study)  genes with no evidence of asexual expression in transcriptomic surveys [44, 45] were also excluded. In order to reflect the types of genes which are implicated in drug resistance, as well as to obtain a range of conservation levels across this reference set, the reference genes consisted of ATPases (8), secondarily active transporters (12), glycolytic enzymes (10) and enzymes involved in DNA and RNA processing (10)(Table 1). The Plasmodb gene model for PfL0590c (PfATPase4) was found to be inconsistent with published data based on cDNA; the latter were used as coding sequence .
CLUSTALW alignments of orthologous protein sequences from P. falciparum and P. chabaudi were performed using the default settings of BioEdit. BLOSUM62 scores (reflecting conservation) were then calculated for each P. falciparum residue. Regions that could not be aligned between P. falciparum and P. chabaudi orthologs were defined as the gaps in BLASTP alignments of P. falciparum and P. chabaudi orthologs (BLOSUM62 matrix, gap penalties: existence 9, extension 2) A manual check of the protein alignment in BioEdit was also performed and on rare occasions where short alignments had been excluded by the BLASTP search these were retained. For residues where there was no aligned P. chabaudi residue, a conservation score of -5 was applied. Relatively small regions with no P. reichenowi coverage were also removed from analysis to ensure comparable denominators for inter- and intra-species comparison.
We analysed single-nucleotide polymorphisms (SNPs) derived from Plasmodb, based on available sequence for various P. falciparum strains from around the world generated by the Broad Institute , Wellcome Trust Sanger Institute  and NIH . Fixed differences between P. falciparum 3D7 strain and P. reichenowi (Oscar strain) were also obtained from Plasmodb . Radical amino acid substitutions were defined as those with BLOSUM62 matrix score < 0.
The challenge of identifying single nucleotide changes within a sequence that is undergoing frequent insertion-deletion polymorphism has been described . In addition, we noted that although complex polymorphisms were said to be excluded in publications, the Plasmodb lists of SNPs within P. falciparum and fixed differences between P. falciparum 3D7 strain and P. reichenowi sometimes contained repetitive mutations within tandem repeats (confirmed by Pustell protein matrix, MacVector) that were clearly part of complex indel polymorphisms and hence not genuine SNPs. These regions contributed 27.5% of all SNPs among P. falciparum isolates and 9.4% of the interspecies divergence, and were excluded from both polymorphism and divergence analyses.
Synonymous sites in reference set of genes
Hydrophilicity scores were measured by the Kyte-Doolittle index (window = 14). Low-complexity regions were defined using the SEG algorithm at its default parameters .
P. chabaudi orthologs were again used to generate the conservation score. In the case of one gene (MAL8P1.111) there was no rodent malaria parasite ortholog as previously reported  and in consequence the syntenic P. vivax ortholog was used for comparison. For the single apicoplast gene rpl4 the partial P. chabaudi sequence PC103611.00.0 was available for generation of the cross-genus conservation score at sites of mutation. For all studies at drug-resistance regions a neighbourhood conservation score was used (averaging the individual conservation scores across a sliding, overlapping window of 9 residues). This allows for the possibility that drug-resistance mutations may occur at residues that have previously undergone conservative change within a wider area of conservation, thereby reducing stochastic loss of sensitivity. This also obviated the need for a specific step to identify non-aligned regions at genomic regions. Non-synonymous SNPs (nSNPs) between sensitive and resistant parasites were studied at each locus, spanning the drug-resistance gene in each case and extending outwards symmetrically until 10 nSNPs outside the drug-resistance gene itself had been documented in at least one pairwise comparison. All residues known to be intrinsic to resistance haplotypes, whether or not each individual residue has been shown to cause drug-resistance independently, were included. Chi-squared testing was undertaken testing whether the distribution of amino acid variants in terms of conservation level was the same as the distribution of total sequence, using 13 conservation levels (bins of 1), and hence 12 degrees of freedom. The test was performed first for drug-resistance genes and mutations, and then for the other genes and the mutations within them.
CW, ND and NW are clinicians based at the Mahidol-Oxford Tropical Medicine Research Unit and are engaged in tackling the problem of multi-drug resistant malaria in Southeast Asia.
CW and KG were funded by an MRC Clinician Scientist Fellowship to CW. NW is a Wellcome Trust Principal Fellow. The authors thank Arjen Dondorp, Mallika Imwong, Olivo Miotto, Chris Newbold and Sue Lee for helpful discussions. No funding body had any role in study design, collection, analysis, and interpretation of data, the writing of the manuscript or the decision to submit the manuscript for publication.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.