Phylogenomic analyses of malaria parasites and evolution of their exported proteins
© Pick et al; licensee BioMed Central Ltd. 2011
Received: 14 February 2011
Accepted: 15 June 2011
Published: 15 June 2011
Plasmodium falciparum is the most malignant agent of human malaria. It belongs to the taxon Laverania, which includes other ape-infecting Plasmodium species. The origin of the Laverania is still debated. P. falciparum exports pathogenicity-related proteins into the host cell using the Plasmodium export element (PEXEL). Predictions based on the presence of a PEXEL motif suggest that more than 300 proteins are exported by P. falciparum, while there are many fewer exported proteins in non-Laverania.
A whole-genome approach was applied to resolve the phylogeny of eight Plasmodium species and four outgroup taxa. Using 218 orthologous proteins, we obtained unanimous support for a sister-group position of Laverania and avian malaria parasites. This observation was corroborated by the analyses of 28 exported proteins with orthologs present in all Plasmodium species. Most interestingly, several deviations from the P. falciparum PEXEL motif were found in the orthologous sequences of non-Laverania.
Our phylogenomic analyses strongly support the hypothesis that the Laverania were founded by a single Plasmodium species switching from birds to African great apes or vice versa. The deviations from the canonical PEXEL motif in orthologs may explain the comparatively low number of exported proteins that have been predicted in non-Laverania.
Malaria is one of the most common infectious diseases, putting about two billion humans at risk and resulting in about one million fatalities each year . Malaria is caused by protozoan parasites of the genus Plasmodium (Haemosporidae; Apicomplexa). Species of this genus undergo a complex life cycle including an asexual proliferation phase in the erythrocytes of vertebrate hosts.
Although hundreds of Plasmodium species are currently known, only a few infect humans. In moderate climate zones, human malaria infection is largely due to P. vivax, but the life-threatening form of this disease is almost exclusively caused by P. falciparum. About 60 years ago, the high pathogenicity of P. falciparum led to the proposal that this parasite may be a rather recent acquisition from a non-human host . Since then, it has become evident that P. falciparum is indeed closely related to other Plasmodium species from African great apes [3, 4]. Together they constitute the subgenus Laverania, and several reciprocal host switches have occurred during the evolution of this group of malaria parasites [5–9].
The evolutionary ancestry of P. falciparum and the other Laverania is still a matter of debate. It remains unresolved whether this subgenus is more closely related to other mammalian malaria parasites or shares a common ancestry with bird-infecting Plasmodium species (reviewed in ). Most molecular phylogenetic studies of the genus Plasmodium are based on the analysis of single proteins such as cytochrome b oxidase, adenylosuccinate lyase, and caseinolytic protease C . While these proteins contain sufficient phylogenetic information to resolve the relationships within the Laverania, multiple substitutions per site (homoplasy) limit their utility at a deeper phylogenetic level .
Upon invasion by P. falciparum, erythrocytes are subjected to an extensive remodeling process resulting in altered mechanical and adhesive properties . Prominent examples include the formation of cytoadherence knobs at the erythrocyte membrane and the associated exposure of PfEMP1 (P. falciparum erythrocyte membrane protein) at the surface of the infected cell . Plasmodium proteins involved in this remodeling process have to pass the parasitophorous vacuole membrane (PVM) on their way from the parasite into the erythrocyte; most of these proteins are characterized by a hydrophobic signal sequence for targeting the protein to the endoplasmic reticulum (ER) and a sequence motif (RxLxE/Q/D) either referred to as Plasmodium export element (PEXEL; ) or vacuolar transport signal .
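As an illustration of the consensus RxLxE/Q/D given above, the following minimal Python sketch scans a protein sequence for matching pentamers. The function name and the example sequence are hypothetical. Published prediction methods also take the signal sequence and the position of the motif into account, which this sketch does not attempt.

```python
import re

# Canonical PEXEL consensus as stated in the text: R-x-L-x-[E/Q/D].
# The lookahead lets overlapping matches be reported as well.
PEXEL_PATTERN = re.compile(r"(?=(R.L.[EQD]))")

def find_pexel(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, pentamer) for every match to the consensus."""
    return [(m.start(), m.group(1)) for m in PEXEL_PATTERN.finditer(protein)]

# Hypothetical toy sequence, not a real Plasmodium protein.
print(find_pexel("MKKLLFVSRILAEGNN"))
```

A sequence with lysine instead of arginine at position 1 is not reported by this pattern, which is the kind of miss discussed later in the text.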
The PEXEL motif is cleaved by the aspartyl protease plasmepsin V in the ER of the parasite [15–17] and the nascent protein is released into the parasitophorous vacuole. From there it is transported through the PVM into the host cell by the Plasmodium translocon of exported proteins (PTEX; ). Predictions based on the presence of the PEXEL suggest that more than 300 P. falciparum proteins are exported into the host cell [13, 14, 19, 20]. Notably, PfEMP1s are structurally different, having an export element that precedes the signal sequence (R/KxL/V/MxE/D; cf. ). This export element appears to be necessary for export  but is not cleaved in vivo, and therefore might be functionally distinct .
The conservation of plasmepsin V and the components of PTEX throughout the genus Plasmodium indicates that the same protein export machinery is used by all Plasmodium species [16–18]. In addition, PEXEL sequences from P. falciparum proteins proved to be functional in rodent malaria parasites  and vice versa . Thus, in principle, the screens to detect exported proteins in P. falciparum should be extendable to other Plasmodium species. However, surprisingly few proteins have been detected outside of the Laverania using the P. falciparum PEXEL motif, and it has been suggested that these species export substantially fewer proteins into the host cell than P. falciparum [13, 14, 19].
Non-Laverania, however, also induce elaborate morphological changes in their host cells and the low number of predicted exported proteins may argue for a prominent role of PEXEL-negative exported proteins (PNEPs; reviewed in ). An additional, thus far unexplored, explanation could be a slightly different consensus of the PEXEL motif in Plasmodium taxa other than Laverania that could hamper the prediction of these proteins. This would inevitably lead to an underestimation of the respective exportomes.
Here, we took advantage of the available genomic sequences from eight Plasmodium species and four other apicomplexan species. Orthologous proteins were identified and used (i) to reconstruct the phylogeny of these species, (ii) to obtain a set of exported P. falciparum proteins that are conserved throughout Plasmodium evolution, and (iii) to investigate the evolutionary plasticity of the corresponding Plasmodium export elements.
The genomic sequences of P. falciparum 3D7 , P. yoelii 17XNL , P. berghei Anka , P. chabaudi AS , P. knowlesi H , P. vivax Sal-1 , as well as P. reichenowi and P. gallinaceum (both unpublished data produced by the Wellcome Trust Sanger Institute; used with permission) were obtained from PlasmoDB v. 6.1 ; sequences of Toxoplasma gondii ME49 were obtained from ToxoDB v. 5.2 , sequences of Babesia bovis T2Bo from Integr8 , sequences of Cryptosporidium parvum Iowa from CryptoDB ; sequences of Theileria annulata Ankara  were downloaded from the Sanger Institute http://www.sanger.ac.uk/resources/downloads/protozoa/.
The dataset of orthologous proteins for phylogeny reconstructions was compiled as described before . In brief, InParanoid-TC was used with P. falciparum, P. vivax, P. knowlesi, P. yoelii, P. berghei, P. chabaudi, T. gondii, and B. bovis as primer taxa. For 921 proteins, orthologs were present in all eight primer taxa. These 921 core orthologs then served as input for HaMStR to search for the corresponding proteins in P. reichenowi, P. gallinaceum, C. parvum, and T. annulata. The following search species - reference species pairs were used in the HaMStR search: P. reichenowi - P. falciparum, P. gallinaceum - P. falciparum, C. parvum - T. gondii, and T. annulata - B. bovis. HaMStR could extend 218 core orthologs with sequences from all four species such that each ortholog group consisted of twelve sequences. The amino acid sequences for each of the 218 core orthologs were aligned with MAFFT  using the options --maxiterate 1000 and --localpair. The 218 single alignments were concatenated to form a super-alignment with 192,102 aa positions. This super-alignment was processed in two ways: (i) positions at which fewer than half of the sequences were represented by an amino acid were removed, and (ii) Gblocks 0.91b  was applied with the following parameters: minimum number of sequences for a conserved position, 7; minimum number of sequences for a flanking position, 10; maximum number of contiguous nonconserved positions, 4; minimum length of a block, 10; allowed gap positions, none.
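The concatenation step and filter (i) can be sketched in a few lines of Python. This is an illustrative sketch only, not the scripts used in the study: the function names, the dictionary representation of an alignment (taxon name mapped to aligned sequence), and the treatment of any character outside the 20 standard residues as "not an amino acid" are assumptions.

```python
# The 20 standard amino acids; gaps ("-") and ambiguity codes such as "X"
# are treated as missing data (an assumption made for this sketch).
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def concatenate(alignments: list[dict[str, str]], taxa: list[str]) -> dict[str, str]:
    """Join per-protein alignments taxon by taxon into one super-alignment."""
    return {t: "".join(aln[t] for aln in alignments) for t in taxa}

def drop_sparse_columns(aln: dict[str, str], min_fraction: float = 0.5) -> dict[str, str]:
    """Remove columns in which fewer than min_fraction of the sequences
    carry an amino acid."""
    taxa = list(aln)
    length = len(aln[taxa[0]])
    keep = [
        i for i in range(length)
        if sum(aln[t][i] in AMINO_ACIDS for t in taxa) >= min_fraction * len(taxa)
    ]
    return {t: "".join(aln[t][i] for i in keep) for t in taxa}

# Hypothetical toy alignments with four taxa.
taxa = ["a", "b", "c", "d"]
aln1 = {"a": "MK-", "b": "MK-", "c": "M--", "d": "MKX"}
aln2 = {"a": "L-", "b": "L-", "c": "LA", "d": "L-"}
filtered = drop_sparse_columns(concatenate([aln1, aln2], taxa))
```

In the toy example the third and fifth columns are removed because fewer than two of the four sequences have a residue there.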
To obtain the collection of exported proteins that have functionally equivalent orthologs in the other Plasmodium species, the two most comprehensive predictions of exported P. falciparum proteins were used [19, 20]. These predictions contain 396 and 422 proteins (not including the structurally distinct PfEMP1s), respectively; the combination of both resulted in a non-redundant set of 531 putatively exported proteins. Each protein was used as a query for a tBLASTn search against the P. falciparum genome. Proteins for which the E value of the best BLAST hit (not considering the hit against itself) was larger than 10^-10 were considered to have no paralogs in P. falciparum and were used for further analysis. For each paralog-free protein, a reciprocal tBLASTn search was performed to identify candidate orthologs in the other Plasmodium species (E value cut-off: < 10^-10). Proteins with a single ortholog present in each of the eight Plasmodium species were aligned with MAFFT  using the options --maxiterate 1000 and --localpair.
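The paralog filter described above amounts to a simple threshold test on BLAST E values. The Python sketch below assumes that the BLAST hits have already been parsed and mapped to gene identifiers; the function names and data layout are hypothetical, and the reciprocal confirmation step is left out for brevity.

```python
E_CUTOFF = 1e-10  # E value threshold given in the text

def is_paralog_free(query_id: str, hits: list[tuple[str, float]],
                    cutoff: float = E_CUTOFF) -> bool:
    """hits holds (subject gene ID, E value) pairs from the search against
    the query's own genome. The self-hit is ignored; the query counts as
    paralog-free if the best remaining E value is larger than the cutoff."""
    others = [evalue for subject, evalue in hits if subject != query_id]
    return not others or min(others) > cutoff

def single_candidate_everywhere(hits_by_species: dict[str, list[float]],
                                cutoff: float = E_CUTOFF) -> bool:
    """True if every species has exactly one hit below the cutoff
    (a simplification of the single-ortholog criterion)."""
    return all(sum(e < cutoff for e in evalues) == 1
               for evalues in hits_by_species.values())
```

For example, `is_paralog_free("g1", [("g1", 0.0), ("g2", 1e-50)])` returns `False`, because the second gene is a strong non-self hit.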
Maximum likelihood (ML) trees were reconstructed with RAxML v. 7.2.2  using the WAG model  of amino acid sequence evolution with empirical amino acid frequencies (option F). Substitution rate heterogeneity was modeled using a gamma distribution, allowing for a fraction of invariant sites (option GAMMAI). Bayesian tree search was performed with PhyloBayes v. 3.2  using the WAG model. Four MCMC chains were run for 10,000 cycles. Every 10th cycle was sampled and convergence of the chains was pair-wise checked with bpcomp allowing for a burn-in of 1,000 cycles. Increasing the burn-in or usage of other models of amino acid sequence evolution such as the CAT  or LG model  did not change the results (not shown).
Molecular phylogenetic analyses attempting to resolve the relationships among malaria parasites.
Number of ingroup taxa
Laverania + avian malaria parasites
Waters et al. 1991 
Escalante and Ayala 1994 
Escalante and Ayala 1995 
Qari et al. 1996
Escalante et al. 1998 
Rathore et al. 2001 
Perkins and Schall 2002 
Kissinger et al. 2002 
Leclerc et al. 2004
Roy and Irimia 2008
Martinsen et al. 2008 
Cyt b, Cox I, ClpC, ASL (concatenated)
Ollomo et al. 2009 
Cyt b, Cox I, Cox III (concatenated)
Krief et al. 2010 
Dhfr-ts, Msp2 (concatenated)
Silva et al. 2010 
Pairwise amino acid identities and similarities were calculated with GeneDoc v. 2.6  using the Blosum 62 model. PEXEL sequences of the P. falciparum proteins were identified via a match to the published consensus sequences [13, 14, 19, 20]. The putative PEXEL sequences of proteins from other Plasmodium species were extracted by aligning these proteins to their ortholog in P. falciparum; we then used the homologous amino acid positions to the P. falciparum PEXEL as candidate export elements in these species. PEXEL sequences from the individual proteins were aligned separately for each species by hand and the corresponding PEXEL motifs were generated with WebLogo . Presence of hydrophobic signal sequences was assessed using SignalP v. 3.0 .
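The projection of the P. falciparum PEXEL onto an aligned ortholog can be illustrated as follows. This is a minimal sketch under stated assumptions, not the procedure used in the study (where motifs were also aligned by hand): the function name and the example sequences are hypothetical, and a pairwise alignment of equal-length gapped strings is assumed.

```python
def project_motif(ref_aligned: str, other_aligned: str,
                  ref_start: int, length: int = 5) -> str:
    """Return the residues of other_aligned found in the alignment columns
    occupied by the reference motif. ref_start is the 0-based position of
    the motif in the ungapped reference sequence."""
    columns = []
    pos = -1  # position in the ungapped reference
    for col, residue in enumerate(ref_aligned):
        if residue != "-":
            pos += 1
            if ref_start <= pos < ref_start + length:
                columns.append(col)
    return "".join(other_aligned[c] for c in columns)

# Hypothetical aligned pair; the reference motif RILAE starts at
# ungapped position 3.
candidate = project_motif("MK-SRILAEG", "MKTSKSLSEG", ref_start=3)
```

Here `candidate` is `"KSLSE"`, a pentamer with lysine at position 1 that a strict RxLxE/Q/D search would not report. The returned string can contain gap characters if the ortholog is gapped in those columns.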
We extracted the genomic sequences of eight Plasmodium species (P. falciparum, P. reichenowi, P. vivax, P. knowlesi, P. gallinaceum, P. chabaudi, P. yoelii, and P. berghei) and four additional apicomplexan species (T. gondii, C. parvum, T. annulata, and B. bovis) from public databases. HaMStR, a Hidden Markov Model based tool , was used to identify 218 proteins with orthologs in all twelve species (Additional file 1). This number is similar to that used in a recent phylogenomic study of eight apicomplexan species, including two species from the genus Plasmodium .
The phylogenetic analyses show that the eight Plasmodium species form a monophyletic clade (100% bootstrap support and 1.00 Bayesian posterior probabilities). The malaria parasites from rodents (P. chabaudi, P. yoelii, and P. berghei) are clearly separated from those infecting birds and primates (100% bootstrap support and 1.00 Bayesian posterior probabilities). Notably, the Laverania (P. falciparum and P. reichenowi) do not group with the other primate-infecting malaria parasites, but form a well-supported clade with P. gallinaceum (99% bootstrap support and 1.00 Bayesian posterior probabilities).
Until now, two other whole-genome approaches attempted to resolve the evolutionary relationships of the eight Plasmodium species. Dávalos and Perkins  based their analyses on a set of 104 proteins (~26,000 aa positions), recovering the same topology among Plasmodium species as displayed in Figure 1. However, no outgroup taxa were included to root the tree, and thus no information on the evolutionary ancestry of the Laverania could be provided. Silva et al. , on the other hand, based their analyses on a set of 29 proteins (~12,000 aa positions) and used two species from the genus Theileria to root the tree. While they proposed the monophyly of mammalian Plasmodium species, some of their results supported a grouping of P. gallinaceum and the Laverania.
The bootstrap support for the clade consisting of P. gallinaceum and Laverania was maximal for the dataset comprising proteins evolving at an intermediate speed (98%) and minimal for the dataset comprising the fast-evolving proteins (76%). The branch leading to the clade consisting of P. gallinaceum and Laverania was short (~0.02 expected substitutions per site; cf. Figure 4). When using fast-evolving proteins, multiple substitutions in the dataset might confound the phylogenetic signal leading to artifacts due to long branch attraction . On the other hand, using only slow-evolving proteins is likely to result in a dataset with a phylogenetic signal that is too weak to resolve this branch (see also Additional file 6). This may explain why proteins evolving with an intermediate rate provide the most robust tree.
The finding of a relationship between the Laverania and avian malaria parasites agrees with earlier studies by Waters et al. , Escalante and Ayala , and Kissinger et al. . However, it contradicts more recent results by Perkins and Schall , Leclerc et al. , Roy and Irimia  and Martinsen et al. . This discrepancy may be attributed to the limited phylogenetic information in the few proteins that were used in those studies . While the selection of proteins may have some effect (see above), the number and choice of the outgroup taxa deserve particular attention (e.g., ). Alternative root placements lead to different conclusions about the order in which the individual Plasmodium species emerged (cf. Figure 2). In many previous studies, only a single outgroup taxon was used (Table 1). Moreover, in some cases this outgroup was evolutionarily so distantly related that a meaningful placement of the root is unlikely (e.g., ). Most recent studies of Plasmodium phylogeny used selected species from the closely related genus Leucocytozoon as an outgroup (cf. Table 1). However, the limited amount of sequence data available for this taxon - mainly a few mitochondrial genes - currently prevents its use in phylogenomic studies. Other haemosporidians (i.e., species from the genera Haemoproteus, Parahaemoproteus, and Hepatocystis) should not be considered as an outgroup since the genus Plasmodium has been shown to be paraphyletic with respect to these taxa (e.g., ). Alternative strategies for a reliable root placement employ the inclusion of multiple outgroup taxa to break the branch separating the ingroup from the outgroup, and the use of a comprehensive set of proteins . Our trees include four apicomplexan species as an outgroup and are based on 218 orthologous proteins. We have obtained identical tree topologies by employing different tree reconstruction methods (ML and Bayesian inference) and different models of sequence evolution.
Moreover, our findings remain unchanged when we use proteins with different evolutionary rates. Ultimately, likelihood ratio tests rejected all alternative tree topologies. Thus, we are confident that our root placement is robust and that P. gallinaceum and the Laverania indeed share a common ancestry.
An avian parasite as sister to the Laverania has significant implications: it suggests that a host switch from birds to African great apes or vice versa has occurred. Host switches have repeatedly taken place during the evolution of avian Plasmodium species . Moreover, avian Plasmodium species are able to infect mammals under experimental conditions . Both observations are congruent with an evolutionary scenario in which the laveranian lineage was established by a single Plasmodium species switching from birds to African great apes. Subsequent diversification of Laverania associated with multiple host switches within the apes eventually led to the emergence of P. falciparum in humans [5–9]. Note that this scenario also implies that the great diversity of malaria parasites infecting birds  may in fact derive from an early host switch by another mammalian Plasmodium species. At present, however, we cannot exclude the alternative scenario in which the avian Plasmodium lineage was established by a Plasmodium species from the laveranian lineage. Therefore, phylogenomic analyses considering additional Plasmodium species (and in particular those infecting birds and squamate reptiles) will be necessary to provide a more detailed picture of how the Laverania emerged.
The availability of Plasmodium genome sequences together with the reliable reconstruction of their phylogenetic relationships provides a robust framework to investigate the evolutionary history of exported P. falciparum proteins. Here, we used 531 P. falciparum proteins that had been predicted to be exported into the host cell [19, 20] to identify functionally equivalent orthologs in the other Plasmodium species. BLAST searches in the P. falciparum genome identified 102 proteins without any recognizable paralog (Additional file 7), whereas the other 429 proteins mainly belong to large gene families such as RIFINs (repetitive interspersed family) and STEVORs (subtelomeric variable open reading frames). These gene families have a complex evolutionary history and have undergone independent lineage-specific diversifications . This indicates that even if homologs of these proteins exist in the other Plasmodium species, they do not necessarily share the same function. These proteins were therefore excluded from further analyses.
Exported P. falciparum proteins with one-to-one orthologs present in all Plasmodium species.
Both reciprocal best BLAST hit searches and phylogenetic tree reconstructions indicate that the proteins in the 33 groups are encoded by genes that remained single copy throughout evolution (one-to-one orthologs). Ample evidence exists that such one-to-one orthologs are functionally equivalent . Therefore, we conclude that if the P. falciparum protein is exported, its orthologs in other Plasmodium species are exported as well and hence, that these proteins are suitable to assess the evolutionary plasticity of the PEXEL motif. Note that five of these 33 proteins have already been confirmed to be exported in P. falciparum using GFP-constructs (; Table 2). However, five proteins appear not to be exported ([19, 20]; Table 2); thus they were omitted from further analyses.
The most prominent difference between the Plasmodium species was found for the positively charged amino acid at position 1 of the PEXEL motif. All 28 P. falciparum proteins harbor an arginine (R), whereas about 20% of the proteins from non-Laverania have a lysine (K) at this position. Three lines of evidence indicate that this alternate PEXEL is nevertheless functional: (i) lysine at position 1 of the PEXEL was found in orthologs of those P. falciparum proteins whose export into the host cell has been confirmed (Figure 5), and thus our observation is not restricted to proteins that might have been erroneously predicted as being exported; (ii) recent experimental evidence suggests that the typical cleavage at the leucine (L) at position 3 can occur in proteins containing lysine at position 1 (PFI1780w and MAL3P8.15; ); and (iii) a small number of proteins with a lysine at position 1 of the PEXEL have already been predicted to be exported using a Hidden Markov Model based prediction method (21 in P. falciparum, three or fewer in each of the other Plasmodium species; cf. ). Other, less prominent deviations at position 1 include the presence of histidine (H) in the P. knowlesi and P. vivax orthologs of PFC0435w and of glutamine (Q) in the P. gallinaceum protein that is orthologous to PFA0210c (Figure 5). Both PFC0435w and PFA0210c belong to the confirmed set of exported proteins in P. falciparum  and therefore these PEXEL sequences are likely to be functional as well. Position 3, which almost invariably harbors a hydrophobic leucine (L), was also found to be almost invariant in the orthologs of the confirmed exported P. falciparum proteins. However, several orthologs of P. falciparum proteins that have not yet been confirmed to be exported have an isoleucine (I) at this position (Figure 6). Position 5, which is considered to be the least conserved position [13, 14], was found to be even more variable in the group of confirmed exported P. falciparum proteins.
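Position-wise comparisons of this kind reduce to counting residues per motif position, which is also the information a sequence logo displays. The Python sketch below tallies residues for a set of equal-length motifs. The function name and the four pentamers are hypothetical examples and are not taken from Figure 5 or Figure 6.

```python
from collections import Counter

def position_counts(motifs: list[str]) -> list[Counter]:
    """Count the residues observed at each position of equal-length motifs."""
    length = len(motifs[0])
    if any(len(m) != length for m in motifs):
        raise ValueError("all motifs must have the same length")
    return [Counter(m[i] for m in motifs) for i in range(length)]

# Hypothetical candidate PEXEL pentamers from one species.
counts = position_counts(["RILAE", "KSLSE", "RLLYD", "RNLCQ"])
lysine_fraction = counts[0]["K"] / sum(counts[0].values())
```

In this toy set, one of four motifs has lysine at position 1, so `lysine_fraction` is 0.25, while position 3 is leucine throughout.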
Even though it remains to be demonstrated that these orthologous proteins are cleaved and exported with the same efficiency, these observations suggest that the PEXEL motif is more variable than previously acknowledged. This provides a possible explanation for the small number of exported proteins predicted for some Plasmodium species. Taking this plasticity into account will be essential to arrive at a more comprehensive set of exported proteins for all Plasmodium species.
Our phylogenetic analyses of orthologs deduced from the Plasmodium genomes strongly suggest that the subgenus Laverania was established by a single host switch from birds to African great apes (or vice versa). However, sequences from additional bird-infecting Plasmodium species and the closely related Haemosporida are required to better understand the early evolution of the Laverania. Exported proteins, as identified by the PEXEL motif, play a major role in Plasmodium virulence and facilitate the parasite's survival in the host cell. Our results suggest that the number of exported proteins is higher in the non-laveranian Plasmodium species than previously assumed. Comprehensive knowledge of their diversity and evolution will help to unravel the emergence of the high pathogenicity of P. falciparum, and may allow the identification of novel targets for malaria therapy.
The P. gallinaceum and P. reichenowi sequence data were produced by the parasite genomics group at the Wellcome Trust Sanger Institute and are available from http://www.sanger.ac.uk/resources/downloads/protozoa/. We thank Janus Borner for help with the software applications and Tina Koestler for helpful discussion. The studies were supported in part by the DFG priority program SPP 1174 Deep Metazoan Phylogeny [BU 956/8]. IE acknowledges support by a grant of the Wiener Wissenschafts-, Forschungs- und Technologie Fonds (WWTF) to Arndt von Haeseler, and from the DFG priority program SPP 1174 Deep Metazoan Phylogeny [HA 1628/9]. We thank two anonymous reviewers and the editor for their helpful comments, and Kathleen Rankin for correction of the language.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.