Selection shapes malaria genomes and drives divergence between pathogens infecting hominids versus rodents
© Prugnolle et al. 2008
Received: 28 May 2008
Accepted: 30 July 2008
Published: 30 July 2008
Skip to main content
© Prugnolle et al. 2008
Received: 28 May 2008
Accepted: 30 July 2008
Published: 30 July 2008
Malaria kills more people worldwide than all inherited human genetic disorders combined. To characterize how the parasites causing this disease adapt to different host environments, we compared the evolutionary genomics of two distinct groups of malaria pathogens in order to identify critical properties associated with infection of different hosts: those parasites infecting hominids (Plasmodium falciparum and P. reichenowi) versus parasites infecting rodent hosts (P. yoelii yoelii, P. berghei, and P. chabaudi). Adaptation by the parasite to its host is likely highly critical to the evolution of these species.
Our comparative analysis suggests that patterns of molecular evolution in the hominid parasite lineage are generally similar to those of the rodent lineage but distinct in several aspects. The most rapidly evolving genes in both lineages are those involved in host-parasite interactions as well as those that show the lowest expression levels. However, we found that, similar to their respective mammal host lineages, parasite genomes infecting hominids are generally less constrained, evolving at faster rates, and accumulating more deleterious mutations than those infecting murids, which may reflect an historical lower effective size of the hominid lineage and relaxed host-driven selective pressures.
Our study highlights for the first time the differences in trends and rates of evolution in Plasmodium lineages infecting different hosts and emphasizes the potential importance of the variation in effective size between lineages to explain variation in selective constraints among genomes.
A number of useful evolutionary parameters can be estimated from between species comparisons of genome-wide divergence patterns: the magnitude of positive and negative (purifying) selection, variation in selection across different lineages, chromosomes, gene families and individual genes as well as the number of genes involved in the speciation process and adaptation to new environments.
Comparative approaches have revealed that genes involved in immunity or in host defenses tend to exhibit the highest rate of evolution in the genome of different species. The arms race between hosts and parasites is generally invoked to explain this rapid evolution of genes involved in immune defense . Pathogens continuously evolve to escape the defense of the host and the host, in turn, responds by modifying its defense. At the molecular level, this cycle of environmental changes means that new mutations are continuously tested and fixed by selection if adaptive, which translates into higher rates of molecular evolution in genes controlling immunity in the hosts.
While accelerated evolution of genes involved in immunity is common in mammals, the relative rate of evolution of those genes may vary from one phylogenetic lineage to another. This is the case for hominids (human and chimpanzee) compared to rodents (mouse and rats) where genes involved in immune defense show an accelerated rate of evolution in murids compared to hominids, suggesting that the immune system of murids has undergone more extensive specific innovations .
Hosts belonging to different lineages can therefore represent different environments for parasites to adapt. How do parasites preferentially infecting different host lineages respond to these different environments? Do parasites infecting different host lineages show lineage specific rates of evolution?
Plasmodium is a practical case study for genome evolution of parasites specifically infecting different host lineages. The genomes of two parasite species infecting only hominids (Plasmodium falciparum , P. reichenowi ) and three species preferentially infecting rodent hosts (P. yoelii yoelii, P. chabaudi and P. berghei)  are now partially or completely sequenced. Although these two groups of species might be subject to similar selective pressures acting either on the genome as a whole or on genes with similar function across species, some aspects of their genomes, such as genes associated with evading host immunity, may evolve in a unique manner.
As for their two mammal host lineages (i.e. hominids and rodents) , our study reveals, in particular, that the evolution of the hominid lineage parasite genomes was less constrained than the evolution of those parasites infecting the murid lineage, which likely reflects a lower effective population size in hominid parasites (specifically P. falciparum).
By taking the complete set of coding genes of P. falciparum and aligning it to a partial genome shotgun of P. reichenowi (covering approximately 1/3 of the entire genome) available in PlasmoDB, we identified a set of 843 pairs of genes with unambiguous orthology for which it was possible to generate high quality sequence alignments covering virtually the entire coding region (see Methods and Additional file 1). The same procedure retrieved 3060 triplets of orthologues for rodent malaria parasites P. yoelii yoelii, P. berghei and P. chabaudi, distributed throughout these genomes (Additional file 2).
Evolutionary rates in hominid and rodent's Plasmodium lineage
Rodent (murid) lineage
0.012 ± 0.00044
0.026 ± 0.00041
0.057 ± 0.0013
0.20 ± 0.0027
0.21 ± 0.0068
0.13 ± 0.0023
The average ωMurids for the 3060 genes analyzed in P. yoelii yoelii, P. chabaudi and P. berghei, was estimated as 0.13 (Table 1), significantly lower than in parasites infecting hominids (Wilcoxon test over all genes, p-value < 10-4). Only 4 genes displayed a ratio greater than 1, but none were significant. In fact, more than 98% of the genes displayed a ratio significantly lower than 1.
Several observations suggest that the difference observed between the hominid and the rodent lineage is not due to the number of species aligned in each lineage nor their phylogenetic distances. First, the ω obtained between any pair of rodent parasite species, are similar, and lower than the estimate obtained for P. falciparum and P. reichenowi (data from : P. yoelii yoelii and P. chabaudi : ω = 0.11, P. yoelii yoelii and P. berghei : ω = 0.16 and P. chabaudi and P. berghei : ω = 0.13). Second, the marked difference between hominid and rodent malaria parasites still held when comparisons between lineages are made in a paired way using only those genes from hominid and rodent lineages of known orthology (orthology between P. falciparum and P. yoelii yoelii was retrieved from ; Wilcoxon rank sum test; n (genes) = 263; p-value = 0.002). Therefore, there is an excess of about 25% of amino-acid altering substitutions, relative to synonymous substitutions, in the hominid lineage compared to the rodent lineage.
Interestingly, the host lineages of these two groups of parasites show similar differences in their rate of molecular evolution. Hominids show an average ω of 0.20, over the whole genome, while murids show an average ratio close to 0.13 [2, 7]. As in the case of Plasmodium parasites, the difference between both estimates is highly significant. This difference in the rate of molecular evolution among the host lineages could be the result of two different processes: 1) a relaxation in the selective constraints acting on the amino-acid sequence in the host hominid lineage, relative to the murid lineage, subsequent to a reduction in their effective population size or 2) an acceleration in the global rate of adaptive evolution in hominids. Genetic evidence favors the first hypothesis [8, 2, 9]. Does the same explanation hold for parasites?
Comparing genetic variation at synonymous and non-synonymous sites within (polymorphism) and between species (divergence) can help distinguish between the two hypotheses above . When most variation is neutral, the ratio of the number of nonsynonymous to synonymous polymorphisms observed within populations should be the same as the ratio of divergence between species . Mu and collaborators  analyzed genome-wide polymorphism of 5 P. falciparum isolates distributed globally. Jeffares and collaborators also analyzed polymorphisms using three different African isolates . In total, we obtained divergence and polymorphism data for 518 genes using Mu et al 's data and 839 genes using Jeffares et al. 's data. The ratio of the number of nonsynonymous to synonymous polymorphism was 2.3 for the first dataset and 2.7 for the second one. Comparatively, the ratio of divergence we observed among P. falciparum-P. reichenowi was only 1.1 and 1.3, respectively, thus indicating a 2-fold increase in the number of nonsynonymous polymorphisms. Although some amino-acid substitutions observed among species are likely adaptive, this observation supports selection being less efficient in removing segregating deleterious amino-acids in P. falciparum. A reduced effective size in hominid parasites compared to the murid lineage might be one explanation for the observed difference in ω. This scenario is supported by different studies suggesting the existence of an historical low effective population size in P. falciparum . Unfortunately, no such population study exists for rodent malaria species. The evolution of both hominids and their malaria parasites appears to be less constrained than that of murids, which might reflect, for both the host and the parasite, a small historical effective size and an evolution mainly driven by slightly deleterious mutations .
We then asked whether specific groups of genes evolved differently from the rest of the genome. In particular, we searched for groups of genes that could have experienced an accelerated evolution in one lineage compared to the other, thus leading to a higher difference in their amino-acid substitution rate between lineages than expected given the difference observed across the genome.
Go categories and relative divergence rates (dN/dS) in hominid and murid lineages
Go categories within "biological process"
GO: Organelle organization and biogenesis
GO: Carbohydrate metabolism
GO: Energy pathways
GO: Protein transport
GO: Cell organization and biogenesis
GO: Cytoskeleton organization and biogenesis
GO: Protein biosynthesis
GO: Cell growth and/or maintenance
GO: Cytoplasm organization and biogenesis
GO: Amino acid and derivative metabolism
GO: Protein metabolism
GO: Physiological process
GO: Cell proliferation
GO: Nucleobase et al*.
GO: Cell cycle
GO: Response to stress
GO: DNA metabolism
GO: Protein modification
GO: Lipid metabolism
GO: Biological_process unknown
GO: Cell communication
We finally addressed how variation in evolutionary rates could evolve relative to the timing and the breadth of expression of the genes. Do genes expressed at a particular stage show lineage specific evolution?
The classification we used for Fig. 3A was nevertheless very broad. It included genes that were expressed at one particular stage but those same genes could also be expressed at other stages. Because this could preclude the detection of stage-specific evolution, we then re-analyzed our data keeping only the genes that were expressed at one stage. Doing so, we observed a significant difference in the rate of evolution among the different categories in the hominid lineage (p- value = 0.017) but no difference was observed in the rodent lineage (p- value = 0.59, Fig. 3B). We observed an overall difference between hominid and rodent parasites primarily due to the loci expressed at the sporozoite stage that showed an accelerated evolution in the hominid lineage compared to the rodent one (p- value = 0.018). This result suggests that genes only expressed at the sporozoite stage might be key genes in the infection process of the mammal host by malaria parasites and experienced higher adaptive evolution in the hominid lineage than in the rodent one.
Relationship between gene expression, GC content and substitutions rates in both hominid's and rodent's Plasmodium parasites
Rho = -0.26; p = 0
Rho = -0.26; p = 0
Rho = -0.13; p = 0.008
Rho = -0.15; p = 0
Rho = 0.18; p = 0
Rho = 0.113; p = 0.00001
Rho = 0.28; p = 0
Rho = 0.31; p = 0
Rho = 0.37; p = 0
Rho = 0.35; p = 0
Rho = -0.37; p = 0
Rho = -0.38; p = 0
Rho = -0.43; p = 0
Rho = -0.38; p = 0
Our results are congruent with previous studies reporting relationships between breadth of expression and the rate of gene evolution in other organisms [14, 15]. In both hominid and rodent lineages, highly expressed genes are generally more constrained than less expressed genes . This observation is often attributed to the fact that proteins that are expressed in more diverse cellular environments are subjected to stronger functional constraints [14, 15, 17]. These results are consistent with the observation in both hominid and rodent parasites of a positive relationship between GC content at position 1 and 2 of codons (GC1-2) and the level of expression as well as a negative relationship between GC1-2 and dN/dS (Table 3). Highly and universally expressed genes are more GC-rich than lowly expressed genes which might thus reflect a codon bias, in particular for GC-rich codons. In other words, amino acids encoded by GC-rich residues are preferred and conserved in protein coding genes of the genus Plasmodium.
The positive relationship observed between the rate of synonymous substitutions and expression can potentially be explained by translational selection acting on synonymous codon sites of highly expressed genes . Alternatively, an increase in transcription may simply increase the level of spontaneous mutations as demonstrated in certain bacteria .
Our knowledge about the evolution of parasites responsible for malaria is increasing rapidly thanks to the availability of several completely sequenced genomes from species belonging to different lineages. As shown in the present study, different questions can be directly answered by comparing genomes from multiple Plasmodium species. Our comparative analysis suggests that, while there are a few aspects that are distinct among lineages, patterns of molecular evolution in the hominid parasite lineage are generally consistent with those observed in the rodent parasite lineage. In the murid lineage the most rapidly evolving genes are those involved in host-parasite interactions and those that are the least expressed. However, the evolution of the hominid lineage appears to be less constrained likely reflecting their historical lower effective size and an evolution driven by slightly deleterious mutations.
While we tried to be as exhaustive as possible in our comparison of the evolution of the genome of both species, this study is still imperfect and incomplete. A definitive study will require the use of a high-quality complete sequence for P. reichenowi as well as more population data on the rodent lineages. Analyses of polymorphisms in natural populations of both hominid and rodent lineages (in the rodent lineage this information is specifically lacking) is critical to better understand the nature and intensity of selection acting on different categories of genes.
Because for parasites with complex life-cycles (like Plasmodium sp.), the vector and the vertebrate host constitute very different environment, an interesting analysis would be to study the evolution of the genes exclusively expressed in the stages infecting the mammal host versus those only expressed in the stages infecting the mosquito vector. Such an analysis would however require the collection of more detailed expression profiles in both lineages.
For rodent malaria species, protein and nucleotide sequences for annotated genes for P. berghei, P. chabaudi and P. yoelii yoelii were obtained from The Plasmodium Genome Resource Database (Plasmodb ). Orthologous genes between the three species were obtained with BlastN using the criterion of best hits with scores of E < 1*10-15 and at least 70% similarity in length. Only the groups of genes for which only one gene of each species corresponded to these criteria were conserved. All groups of coding sequences were aligned using Clustal W version 1.82  (default parameters) using amino acid sequences followed by back-translation into nucleotides sequences using the original sequence provided by Plasmodb.
For hominid malaria species, protein and nucleotides sequences for annotated genes were obtained for P. falciparum only. For P reichenowi, Plasmodb provided only nucleotide contigs (release 09 July 2004) of a partial genome shotgun of approximately onefold coverage. The assembled contiguous sequences cover slightly less than one third of the P. reichenowi genome. Orthologous genes between the two species and their alignment were obtained following several steps. First, P. falciparum and P. reichenowi orthologues were obtained using BlastN with scores of E < 1*10-15 and at least 70% similarity in length. Only groups of genes where we obtained only one gene of each species were kept. These groups were then aligned using ClustalW (V. 1.82) using default parameters. All the alignments were then very carefully checked by eye and corrected when necessary. Introns were deleted.
For both the rodent and hominid malaria gene groups, maximum likelihood estimates of rates of non-synonymous (dN) and synonymous (dS) substitutions, averaged over all branches, were obtained using PAML version 3.14 . We used a codon-based model of sequence evolution with dN and dS considered as free parameters and the average nucleotide frequencies estimated from the data at each codon position (F3 × 4 MG model). The transition/tranversion bias K was estimated for each group of orthologous sequence. Because estimates of dS > 1 are more prone to error , only genes with dS ≤ 1 were used for statistical calculations, yielding 843 and 3060 valid orthologues for hominid and rodents malaria groups respectively. For each gene, Likelihood Ratio Tests (LRT) were used to test whether the estimated dN/dS (ω) ratio differed significantly from 1 . The tests were performed as bilateral tests of the hypothesis H o: dN/dS =1 versus the alternative hypothesis H 1: dN/dS ≠ 1 for each group of sequence. Twice the difference of the log likelihood estimated for each hypothesis was then compared to a χ2 distribution with one degree of freedom (df).
To learn more about both synonymous and non-synonymous substitution patterns and their possible causes, we analyzed the effect of several genomic features such as the GC content, the level and timing of expression of genes and the function of proteins.
The biological process of the annotated proteins of P. falciparum was determined using GO (Gene ontology) annotations. A biological process is a series of events accomplished by one or more ordered assemblies of molecular functions. Examples of broad biological process terms are cellular physiological process or signal transduction. Classification of proteins was made using the software GENERIC GENE ONTOLOGY (GO) TERM MAPPER . Because no such classification was available for any of the rodent species, P. yoelii yoelii genes were classified as their orthologous P. falciparum genes defined in The TIGR Plasmodium yoelii yoelii Genome Annotation Database.
For gene expression, we retrieved the mRNA abundance for genes of P. falciparum for different stages (rings, trophozoites, shizonts, merozoites, gametocytes and sporozoites) from . For the rodent lineage, information on expression was available for P. berghei on a lower number of stages (asexual blood stages, gametocytes, ookinetes, oocysts and sporozoites) . Such data precluded any possible rigorous comparison between the two lineages of parasites because of a lack of overlap between stages analyzed in the hominid and rodent lineages. We therefore decided to determine the timing of expression of rodent genes by using their orthology with P. falciparum, simply considering orthologous genes to be expressed at similar stages. We computed the breadth of expression for each gene in each lineage as the total number of different stages in which a gene is expressed.
GC content was computed using CODONW . GC content quantifies the proportion of GC inside the gene.
The P. reichenowi sequence data were produced by the Pathogen Sequencing Group at the Welcome Trust Sanger Institute and can be obtained from ftp://ftp.sanger.ac.uk/pub/pathogens/P_reichenowi/. Authors thank the Welcome Trust for allowing them to use P. reichenowi sequence data. We also thank Nicolas Galtier for very useful discussion. F.P. is supported by CNRS, P. Awadalla is supported by the National Academies and Keck Futures Initiative, the National Institute of Health and the Human Frontiers in Science Program.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.