Consistent and contrasting properties of lineage-specific genes in the apicomplexan parasites Plasmodium and Theileria
© Kuo and Kissinger. 2008
Received: 02 November 2007
Accepted: 11 April 2008
Published: 11 April 2008
Lineage-specific genes, the genes that are restricted to a limited subset of related organisms, may be important in adaptation. In parasitic organisms, lineage-specific gene products are possible targets for vaccine development or therapeutics when these genes are absent from the host genome.
In this study, we utilized comparative approaches based on a phylogenetic framework to characterize lineage-specific genes in the parasitic protozoan phylum Apicomplexa. Genes from species in two major apicomplexan genera, Plasmodium and Theileria, were categorized into six levels of lineage specificity based on a nine-species phylogeny. In both genera, lineage-specific genes tend to have a higher level of sequence divergence among sister species. In addition, species-specific genes possess a strong codon usage bias compared to other genes in the genome. We found that a large number of genus- or species-specific genes are putative surface antigens that may be involved in host-parasite interactions. Interestingly, the two parasite lineages exhibit several notable differences. In Plasmodium, the (G + C) content at the third codon position increases with lineage specificity while Theileria shows the opposite trend. Surface antigens in Plasmodium are species-specific and mainly located in sub-telomeric regions. In contrast, surface antigens in Theileria are conserved at the genus level and distributed across the entire lengths of chromosomes.
Our results provide further support for the model that gene duplication followed by rapid divergence is a major mechanism for generating lineage-specific genes. The result that many lineage-specific genes are putative surface antigens supports the hypothesis that lineage-specific genes could be important in parasite adaptation. The contrasting properties between the lineage-specific genes in two major apicomplexan genera indicate that the mechanisms of generating lineage-specific genes and the subsequent evolutionary fates can differ between related parasite lineages. Future studies that focus on improving functional annotation of parasite genomes and collection of genetic variation data at within- and between-species levels will be important in facilitating our understanding of parasite adaptation and natural selection.
Comparative genomics has revealed pronounced differences in gene content across species. In an early analysis of eight microbial genomes, 20–56% of the genes in a genome were shown to lack high similarity to any sequence in public databases. Initially these genes were referred to as orphan genes, or ORFans, because they correspond to stretches of open reading frame in bacterial genomes that have no known relationship to other sequences. As more eukaryotic genome sequences become available, the term 'lineage-specific gene' is gaining in popularity because one can specify the 'lineage specificity' of a gene to describe its phylogenetic distribution.
Newly evolved genes may be important for adaptation and generation of diversity. For example, the protozoan parasite Cryptosporidium parvum possesses a set of nucleotide salvage genes that are unique among all apicomplexans surveyed to date. Acquisition of the nucleotide salvage pathway from a proteobacterial source as well as other sources apparently facilitated loss of genes involved in de novo pyrimidine biosynthesis, rendering this parasite entirely dependent on the host for both its purines and pyrimidines. Characterization of these lineage-specific genes not only leads to a better understanding of the parasite's biology but also provides a promising therapeutic target against an important parasite, since blocking the nucleotide salvage pathway can inhibit parasite growth without harming its human host.
Currently, there are several hypotheses regarding the origin of lineage-specific genes. The first model invokes the process of horizontal gene transfer, in which organisms acquire genes from other distantly related species. This mechanism can create lineage-specific genes that are not shared by closely related organisms, as in the example of nucleotide salvage enzymes in C. parvum. Previous studies have shown that horizontal gene transfer is an important force for genome evolution in bacteria [6–8], unicellular eukaryotes, and multicellular eukaryotes.
The second model is based on gene duplication followed by rapid sequence divergence [11, 12]. Based on the observation that the sequence divergence rate is positively correlated with lineage specificity in a diverse set of organisms [3, 11–14], Alba and Castresana proposed that newly duplicated genes may be released from selective constraint and accumulate mutations at a faster rate. While most of the mutations may be deleterious and lead to loss of function in one copy, it is also possible that one of the copies can acquire new functions and become a novel gene in the genome. However, whether gene duplication followed by rapid divergence is truly an important mechanism of generating lineage-specific genes is still under debate. Elhaik et al. suggested that the correlation between divergence rate and lineage specificity may simply be an artifact, stemming from our inability to identify homologs of fast-evolving genes across distantly related taxa based on sequence similarity searches. However, a recent simulation study by Alba and Castresana demonstrated that sequence similarity searches performed at the amino acid level can reliably detect fast-evolving genes due to the rate heterogeneity among sites.
In addition to the two main models discussed above, other explanations for the origin of lineage-specific genes, such as de novo creation from non-coding sequences [18, 19], exon-shuffling [20, 21], intracellular gene transfer between organellar and nuclear genomes, and differential gene loss, have also been proposed. However, the relative importance of the various forces that generate lineage-specific genes remains largely unknown.
While erroneous annotation has also been proposed as one explanation for the abundance of lineage-specific genes [23, 24], expression data [25, 26] and nucleotide substitution patterns [24, 27] suggest that many lineage-specific genes are indeed functional and not annotation artifacts. Unfortunately, understanding the biological function of these genes is difficult due to the lack of homologs in model organisms to use for functional characterization. As a result, a large percentage of the lineage-specific genes that have been identified to date are annotated as hypothetical proteins of unknown function.
In this study, we aim to characterize the lineage-specific genes in a group of unicellular eukaryotes from the phylum Apicomplexa, which includes several important pathogens of humans and animals. The most infamous member of this phylum is the causative agent of malaria, Plasmodium, which causes more than one million human deaths per year globally. Other important lineages include Cryptosporidium, which causes cryptosporidiosis in humans and animals [29, 30]; Theileria, which causes tropical theileriosis and East Coast fever in cattle [31, 32]; and Toxoplasma, which causes toxoplasmosis in immunocompromised patients and congenitally infected fetuses. The availability of genome sequences from these apicomplexan species has provided us with new and exciting opportunities to study their genome evolution. Improved knowledge of the lineage-specific genes in these important parasites can lead to a better understanding of their adaptation history and possibly to the identification of novel therapeutic targets.
Table 1. List of species name abbreviations and data sources (fields: species, number of sequences, data source). Listed species include Cryptosporidium hominis, Cryptosporidium parvum, Plasmodium falciparum, Theileria annulata, Theileria parva, Paramecium tetraurelia, and Tetrahymena thermophila; listed data sources include the J. Craig Venter Institute.
We selected Plasmodium falciparum and Theileria annulata for further investigations of lineage-specific genes. The asymmetrical topology of the species tree allows categorization of the genes in these two species into six levels of lineage specificity (Figure 2), yielding the highest resolution in determining the lineage specificity of a gene. The least specific genes at level 1, denoted as Pf1 for those in the P. falciparum genome and Ta1 for those in the T. annulata genome, are shared by all nine species analyzed, including two free-living ciliates; the most specific genes at level 6, denoted as Pf6 for those in the P. falciparum genome and Ta6 for those in the T. annulata genome, are species-specific. Together these six sets of genes account for 77% of annotated P. falciparum proteins (4,141/5,411) and 84% of annotated T. annulata proteins (3,191/3,795). Genes that are shared by a non-monophyletic group (e.g., shared by P. falciparum and T. annulata but are not found in any other species) are omitted from the following analyses. Additionally, the two species pairs, P. falciparum-P. vivax and T. annulata-T. parva, may have comparable divergence times in the range of approximately 80–100 million years [41, 42] such that we can directly compare the properties of their species-specific genes. Finally, within the two focal genera, P. falciparum and T. annulata have a higher level of completeness of genome assembly than their sister species and thus are better choices for determining the chromosomal location of the lineage-specific genes.
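The categorization described above can be sketched as a lookup against a series of nested clades. This is a minimal illustration, not the procedure used in the study: the two-letter species codes are hypothetical, the membership of levels 2 through 4 is an assumption (the text fixes only levels 1, 5, and 6), and the sketch assumes that a gene is assigned to a level only when the set of species carrying it matches a clade exactly.

```python
from typing import FrozenSet, Iterable, Optional, Sequence

# Nested clades for P. falciparum, ordered from level 6 (species-specific)
# to level 1 (all nine species). Species codes are hypothetical labels.
# Levels 2-4 are ASSUMED for illustration; the text defines only 1, 5 and 6.
PF_CLADES: Sequence[FrozenSet[str]] = [
    frozenset({"Pf"}),                                      # level 6
    frozenset({"Pf", "Pv"}),                                # level 5
    frozenset({"Pf", "Pv", "Ta", "Tp"}),                    # level 4 (assumed)
    frozenset({"Pf", "Pv", "Ta", "Tp", "Tg"}),              # level 3 (assumed)
    frozenset({"Pf", "Pv", "Ta", "Tp", "Tg", "Ch", "Cp"}),  # level 2 (assumed)
    frozenset({"Pf", "Pv", "Ta", "Tp", "Tg", "Ch", "Cp", "Pt", "Tt"}),  # level 1
]


def lineage_level(
    species_with_gene: Iterable[str],
    clades: Sequence[FrozenSet[str]] = PF_CLADES,
) -> Optional[int]:
    """Return the lineage-specificity level (6 = most specific, 1 = least).

    Returns None when the species set does not match any nested clade,
    e.g. a gene shared only by P. falciparum and T. annulata.
    """
    present = frozenset(species_with_gene)
    for index, clade in enumerate(clades):
        if present == clade:
            return len(clades) - index
    return None
```

Under these assumptions, a gene found only in the two Plasmodium species is assigned level 5, whereas a gene shared only by P. falciparum and T. annulata returns None and would be omitted.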
Table 2. Nucleotide substitution rates in Theileria (fields: number of sequences, dN/dS ratio).
Table 3. Relative codon bias.
Table 4. Characteristics of lineage-specific genes in Plasmodium falciparum (fields: number of gene clusters; number of P. falciparum genes; average protein length in amino acids; frequency of genes with "hypothetical protein" in the product description; frequency of genes with predicted signal peptide or transmembrane domains).
The two focal lineages in our analysis, Plasmodium and Theileria, exhibit one interesting difference in terms of the phylogenetic distribution of surface antigens. We found that surface antigens are species-specific in Plasmodium and genus-specific in Theileria. All members of the three large surface antigen protein families in the P. falciparum genome, including 161 rifin, 74 PfEMP1, and 35 stevor, are found in the Pf6 list and have no ortholog in P. vivax. Of the 163 T. annulata proteins that contain FAINT, a protein domain that is associated with proteins exported to the host cell, 116 are in the Ta5 list (i.e., shared by T. annulata and T. parva) and only 28 are in the Ta6 list (i.e., specific to T. annulata).
In P. falciparum 41% of the genus-specific proteins and 62% of the species-specific proteins contain a putative signal peptide or at least one predicted transmembrane domain (Table 4), which suggests that these proteins may be exported to the host cell or present on the surface of the parasite or its vacuole. This result is consistent with the hypothesis that lineage-specific genes in apicomplexan parasites are likely to be involved in host-parasite interactions and thus, potentially adaptation.
In T. annulata, genes with different levels of lineage specificity have similar average distances to chromosome ends (Figure 7C). This result corroborates the visual pattern in Figure 6A that species-specific genes are distributed across the entire length of a chromosome, in contrast to the clustering near chromosome ends observed in P. falciparum (Figure 5A). For all four chromosomes in T. annulata, the regions that are adjacent to chromosome ends and devoid of phylogenetically conserved genes (i.e., Ta1 through Ta4) are approximately 20–40 kb (Figure 7D), a distance smaller than in P. falciparum. Unlike the pattern found in P. falciparum in which species-specific genes are closer to chromosome ends than genus-specific genes, genus- and species-specific genes in T. annulata (i.e., Ta5 and Ta6) have similar minimal distances in all four chromosomes (Figure 7D).
We identified a pattern in which lineage-specific genes have a higher level of sequence divergence among sister species in a group of important protozoan parasites. This result is consistent with previous studies in bacteria, fungi, and animals [11, 12, 14]. We now further confirm that this pattern also holds true in a protistan phylum, suggesting that it may be universal across much of the tree of life. Results from functional analyses agree with our intuitive expectation that conserved genes are involved in basic cellular functions and are well annotated. A large number of the lineage-specific genes (at the species level in Plasmodium and the genus level in Theileria) are found to be putative surface antigens that the parasites use to interact with their hosts. This result supports the hypothesis that lineage-specific genes may be important in adaptation. In addition, the physical distance of a gene to the nearest chromosome end is correlated with the level of sequence divergence.
We found three contrasting properties of lineage-specific genes between two major apicomplexan lineages. First, families of surface antigens are species-specific in Plasmodium but genus-specific in Theileria. Second, most of the species-specific genes are located in sub-telomeric regions in P. falciparum but no such pattern exists in T. annulata. Third, the (G + C) content at the third codon position increases with lineage specificity in P. falciparum but decreases in T. annulata. Taken together, these results suggest that the mechanisms of generating lineage-specific genes and their subsequent evolutionary fates differ between apicomplexan parasite lineages.
All apicomplexan species analyzed have small genomes compared to the free-living out-group. This result is consistent with comparative genomic analyses conducted in other pathogenic bacteria and eukaryotes; extreme genome reduction is a common theme in the genome evolution of these organisms.
A large proportion of the genes in apicomplexans are genus-specific (Figure 2). One parsimonious explanation for this observation is that each lineage acquired a new set of genes during its evolutionary history. An alternative explanation invokes differential loss among lineages when evolving from a free-living ancestor with a relatively large genome. We found that 23% of the protein coding genes in P. falciparum and 16% in T. annulata have a complex phylogenetic distribution pattern and do not fit into a simple single gain/loss model. These results suggest that some ancestral genes in the apicomplexans may have experienced multiple independent losses during their evolutionary history. Further investigation is necessary to distinguish true gene gains from differential retention of ancestral genes.
Consistent with previous studies in bacteria, fungi, and animals [11, 12, 14], we observed a pattern in which sequence divergence is higher in genes with a higher level of lineage specificity. One explanation is that phylogenetically conserved genes are often involved in fundamental cellular processes (see Results). These genes are likely to be under purifying selection that constrains the rate of sequence divergence. In support of this hypothesis, we observe that the mean dN/dS ratio among the level 1 genes in Theileria is only 0.07 (Table 2), indicating an extremely low rate of nonsynonymous substitution relative to synonymous substitution.
Based on the hypothesis that lineage-specific genes are often involved in adaptation, such as invasion of hosts or evasion of the immune responses, lineage-specific genes may be under positive selection and have a faster rate of sequence divergence. Our data are suggestive in this regard, as genus-specific genes exhibit higher sequence divergence than genes with lower levels of lineage specificity. Unfortunately, we cannot directly test the hypothesis that lineage-specific genes are more likely to be under positive selection using the dN/dS ratio data. The level of sequence divergence is too high in both species pairs for such analysis. Practically all of the genes from the Plasmodium pair and approximately 1,000 genes from the Theileria pair (i.e., more than a quarter of the gene repertoire) have a dS estimate that is larger than one. Under this high level of sequence divergence, we cannot confidently estimate the substitution rate due to saturation. Better detection of positive selection in these genes requires data on genetic variation at within- and between-species levels [46, 47].
Codon bias analyses indicate that species-specific genes have a different codon preference compared to other genes in the same genome, whereas the genes with lower levels of lineage specificity are relatively similar to each other (Table 3). It is possible that species-specific genes are relatively young and have yet to adapt to the codon usage pattern of the genome. Support for this hypothesis is provided by the observation that the (G + C) content at the third codon position is much lower in the phylogenetically conserved genes in P. falciparum (Figure 4), suggesting that these 'older' genes are more biased toward GC-poor codons in this AT-rich genome. Alternatively, some species-specific genes may be subject to a different pattern of selection and thus possess a different codon preference.
For the lineage-specific genes at the genus and species level that have functional annotations, many are known surface antigens. Because surface antigens are used by the parasites to interact with their hosts, for example in adhesion to the cell surface or evasion of the host immune response, this result supports the hypothesis that (at least some) lineage-specific genes are involved in host-parasite interactions and have facilitated lineage-specific adaptation. Interestingly, surface antigens are species-specific in Plasmodium, but are genus-specific in Theileria. In addition, 62% of P. falciparum-specific genes contain a putative signal peptide or at least one predicted transmembrane domain. This result is consistent with one previous study that compared P. falciparum with three other Plasmodium species that cause rodent malaria. Of the 168 P. falciparum-specific genes identified in this previous study that are not located in sub-telomeric regions, 68% are predicted to be exported to the surface of the parasites or the infected host cells.
Previous studies suggest that the two focal species pairs have similar divergence times. The two Plasmodium species diverged about 80–100 million years ago and the two Theileria species diverged about 82 million years ago. Our results indicate that sequence divergence is much higher between the two Plasmodium species (Figures 1 and 3). This may be caused by the difference in nucleotide composition, since P. falciparum has a GC content of 24% while P. vivax has a GC content of 46% in the coding region. Bias in nucleotide composition has been shown to change codon usage and amino acid composition. Alternatively, it is also possible that the divergence time between T. annulata and T. parva was overestimated because it was based on a simplified assumption that the synonymous substitution rate in Theileria is similar to that in Plasmodium.
In both P. falciparum and T. annulata, the sub-telomeric regions contain exclusively genus- or species-specific genes. Interestingly, the physical size of these regions is not correlated with chromosome size. This observation indicates that these regions are proportionally larger in smaller chromosomes and helps explain the pattern that the three small chromosomes in P. falciparum have many more species-specific genes than predicted by random expectations (see Results). In addition, genes that are located near a chromosome end have a higher level of sequence divergence in both species, regardless of their lineage specificity (Figure 8). The high evolutionary rates in sub-telomeric regions are shared by many eukaryotic lineages; high rates of inter-chromosomal recombination, local duplication, and segmental rearrangement have been reported in organisms including humans, yeasts, and plants.
Given the high rates of evolution in sub-telomeric regions, it may be advantageous for pathogens to have their surface antigen genes located in these evolutionary hotspots to facilitate the generation of antigenic diversity. Consistent with this hypothesis, many micro-parasites have large gene families that encode surface antigens in sub-telomeric regions (reviewed in ). The best-studied example is the causative agent of African trypanosomiasis, Trypanosoma brucei. The vsg gene family in T. brucei encodes variant surface glycoproteins (VSG) that form a dense coat on the outside of the parasite. In the bloodstream stage, T. brucei sequentially expresses different members of the vsg gene family, one at a time, to generate antigenic variation . The positioning of vsg genes in the genome is tightly linked to regulation of expression; the actively expressed vsg is duplicated into one of the bloodstream expression sites located in the sub-telomeric regions (reviewed in [56, 57]). This homologous recombination process which involves loci that are not positional alleles is hypothesized to be important in generating genetic diversity within the gene family . Although the genes encoding surface antigens in P. falciparum are not known to be duplicated into specific expression sites as observed in T. brucei, the clustering of these genes in sub-telomeric regions can facilitate inter-chromosomal recombination that increases antigenic variation .
We found that most of the surface antigen genes in P. falciparum are located in sub-telomeric regions, as previously noted . Several studies have established the importance of genome location in the generation and maintenance of antigenic variation in P. falciparum [58, 59]. The surface antigen PfEMP1 possessed by P. falciparum is exported to the cell surface of infected erythrocytes. PfEMP1 can remove infected erythrocytes from blood circulation by cellular adherence to microvascular endothelial cells and avoid spleen-dependent killing . The study on genetic structuring suggested that the approximately 60 copies of var genes (which encode PfEMP1) in the P. falciparum genome can be divided into three functionally diverged groups with two in sub-telomeric regions and one close to the centers of chromosomes . Furthermore, the recombination rate is found to be high among members in the same functional group but low for members belonging to different groups. This recombinational hierarchy may facilitate the generation of genetic diversity within a group and promote specialization between different groups. Experimental evidence suggests that the clustering of var genes in the sub-telomeric regions is important in the epigenetic regulation of gene expression in P. falciparum [61, 62].
Given the generality of association between surface antigen genes and sub-telomeric regions in micro-parasites, it is interesting to see that T. annulata appears to be an exception to this rule. This finding may provide an explanation for the difference in host range between the two apicomplexan lineages. Because a large percentage of surface antigen genes in Plasmodium are located in sub-telomeric regions, the generation of antigenic variation may be faster in Plasmodium than in Theileria. Our results indicate that gene families encoding surface antigens in Plasmodium are highly diverged between species within the genus, whereas the two Theileria species still share most of their surface antigens and the genes encoding them are distributed across the entire lengths of chromosomes. For this reason, Plasmodium may be able to adapt to new host species at a faster rate, resulting in its much wider host range compared to Theileria; Plasmodium spp. can infect mammals, birds, and reptiles, whereas Theileria spp. are limited to ruminants .
Our results agree with previous observations in other organisms that lineage-specific genes have a higher level of sequence divergence compared to phylogenetically conserved genes. In addition, two major apicomplexan lineages may have different mechanisms for generating or retaining species-specific genes. Because many lineage-specific genes in these parasites are surface antigens that interact with the host, future investigations on genome evolution in these parasites may facilitate the identification of new therapeutic or vaccine targets. Future studies that focus on improving functional annotation of parasite genomes and the collection of genetic variation data at different phylogenetic levels will be important in our understanding of parasite adaptation and natural selection.
The data sources of the annotated proteins are listed in Table 1. Protein domain identification was performed with HMMPFAM (version 20.0). Transmembrane domain prediction and gene expression data of annotated Plasmodium falciparum genes were downloaded from PlasmoDB (Release 5.3).
Orthologous gene clusters were identified using OrthoMCL (version 1.3, April 10, 2006) with default parameter settings. The ortholog identification process in OrthoMCL is largely based on the popular criterion of reciprocal best-hits but also involves an additional step of Markov Clustering to improve sensitivity and specificity. We used WU-BLAST (version 2.0) for the all-against-all BLASTP similarity search step with the e-value cutoff set to 1e-15.
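The reciprocal best-hit criterion mentioned above can be illustrated with a minimal sketch. It is a simplification and not OrthoMCL itself: the Markov Clustering step and the handling of recent paralogs are omitted, the input is assumed to be a list of (query, subject, e-value) tuples parsed from a BLASTP report, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Hit = Tuple[str, str, float]  # (query id, subject id, e-value)


def best_hits(hits: Iterable[Hit], cutoff: float = 1e-15) -> Dict[str, str]:
    """Map each query to its lowest e-value subject, ignoring weak hits."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue in hits:
        if evalue > cutoff:
            continue  # hit does not pass the e-value cutoff
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(
    a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit], cutoff: float = 1e-15
) -> Set[Tuple[str, str]]:
    """Return (a, b) pairs in which each protein is the other's best hit."""
    a_to_b = best_hits(a_vs_b, cutoff)
    b_to_a = best_hits(b_vs_a, cutoff)
    return {(a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a}
```

Ties in e-value are resolved here by keeping the first hit encountered, which is an arbitrary choice made for brevity.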
Based on the orthologous gene clustering result, we identified genes that are shared by all nine species to infer the species tree. Orthologous gene clusters that contain more than one gene from any given species were removed to avoid the complications introduced by paralogous genes in phylogenetic inference. Of the 768 orthologous gene clusters that are shared by all nine species (Figure 2), 154 clusters were single-copy in all species. For each gene, CLUSTALW (version 1.83) was used for multiple sequence alignment. We enabled the 'tossgaps' option to ignore gaps when constructing the guide tree and used the default settings for all other parameters. The alignments produced by CLUSTALW were filtered by GBLOCKS (version 0.91b) to remove regions that contain gaps or are highly divergent. Individual genes that had fewer than 100 aligned amino acid sites (33/154) or contained identical sequences from different taxa (38/154) after GBLOCKS filtering were eliminated from further analysis. We concatenated the alignments from the remaining 83 genes (with a total of 24,494 aligned amino acid sites) and used PHYML to infer the species tree based on the maximum likelihood method. We used PHYML to estimate the proportion of invariable sites and the gamma distribution parameter (with eight substitution categories). The substitution model was set to JTT and we enabled the optimization options for tree topology, branch lengths, and rate parameters. To estimate the level of support on each internal branch, we performed 100 non-parametric bootstrap samplings.
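The filtering and concatenation steps that precede tree inference can be sketched as follows. This is an illustrative outline only: each filtered alignment is assumed to be a dictionary that maps a species label to its aligned sequence, all alignments are assumed to contain the same species, and the function names are hypothetical.

```python
from typing import Dict, List

Alignment = Dict[str, str]  # species label -> aligned amino acid sequence


def passes_filters(alignment: Alignment, min_sites: int = 100) -> bool:
    """Keep alignments that are long enough and have no identical sequences."""
    sequences = list(alignment.values())
    if not sequences or len(sequences[0]) < min_sites:
        return False  # too few aligned sites after filtering
    # Identical sequences from different taxa carry no phylogenetic signal.
    return len(set(sequences)) == len(sequences)


def concatenate(alignments: List[Alignment], min_sites: int = 100) -> Alignment:
    """Concatenate all alignments that pass the filters, species by species."""
    kept = [aln for aln in alignments if passes_filters(aln, min_sites)]
    if not kept:
        return {}
    species = sorted(kept[0])
    return {sp: "".join(aln[sp] for aln in kept) for sp in species}
```

The resulting dictionary could then be written out in a format accepted by the tree inference program of choice.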
The nonsynonymous and synonymous substitution rates at the nucleotide level (i.e., dN and dS) were estimated using CODEML in the PAML package. We performed pairwise sequence alignment at the amino acid level using CLUSTALW with default parameters for all orthologous genes that are single copy in both Plasmodium species or both Theileria species. The protein alignments were converted into the corresponding nucleotide alignments using PAL2NAL (version 12). All gap positions were removed from the alignments before the substitution rate estimation by CODEML. To avoid problems of inaccurate rate estimation caused by saturation, we excluded sequences with a synonymous substitution rate (dS) that is greater than one.
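The conversion of a protein alignment into a codon alignment, followed by removal of gap positions, can be sketched as below. This is a simplified stand-in for the dedicated conversion tool rather than a reimplementation of it: the coding sequence is assumed to contain exactly one codon per residue with no stop codon, gaps are assumed to be written as '-', and the function names are hypothetical.

```python
from typing import Tuple


def protein_to_codon_alignment(aligned_protein: str, cds: str) -> str:
    """Thread the codons of a coding sequence onto an aligned protein."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    pieces = []
    next_codon = 0
    for residue in aligned_protein:
        if residue == "-":
            pieces.append("---")  # a protein gap becomes a three-base gap
        else:
            if next_codon >= len(codons):
                raise ValueError("coding sequence is shorter than the protein")
            pieces.append(codons[next_codon])
            next_codon += 1
    if next_codon != len(codons):
        raise ValueError("coding sequence is longer than the protein")
    return "".join(pieces)


def strip_gap_columns(codon_a: str, codon_b: str) -> Tuple[str, str]:
    """Drop every codon column in which either sequence has a gap."""
    kept_a, kept_b = [], []
    for i in range(0, len(codon_a), 3):
        col_a, col_b = codon_a[i:i + 3], codon_b[i:i + 3]
        if "-" in col_a or "-" in col_b:
            continue
        kept_a.append(col_a)
        kept_b.append(col_b)
    return "".join(kept_a), "".join(kept_b)
```

The gap-free codon alignment is the kind of input on which pairwise substitution rates would then be estimated.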
To quantify the level of sequence divergence at the amino acid level, we used TREE-PUZZLE to calculate the protein distance between orthologs in sister species. The parameters were set to the JTT substitution model, a mixed model of rate heterogeneity with one invariable and eight Gamma rate categories, and the exact and slow parameter estimation. Orthologous sequences were first aligned using CLUSTALW, followed by a filtering step using GBLOCKS to remove gaps and highly divergent regions before the calculation of protein distance. Five sequences (PFA0650w, PFD0105c, PFL0060w, and PFD1140w from P. falciparum and TA18345 from T. annulata) that were not reliably aligned to their ortholog in the sister species were excluded from this analysis.
The relative codon bias between sets of genes in the two focal species, P. falciparum and T. annulata, was calculated based on the method developed by Karlin et al. Briefly, the method considers two sets of genes, one focal set and one reference set, and calculates the difference between the two sets in the relative frequencies of codons within each codon family (i.e., the codons that encode the same amino acid). The theoretical maximum of the difference between two sets of genes is 2.000, but the empirical values based on biological data generally range from 0.050 to 0.300 [44, 72, 73]. This measurement is different from the conventional codon adaptation index (CAI) developed by Sharp and Li, in which a set of highly expressed genes is always used as the reference set. We chose the relative codon bias to measure codon preference because it can provide a better resolution under certain conditions. For example, two sets of weakly expressed genes may have similar values of codon adaptation index but still possess vastly different codon preferences.
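A minimal sketch of such a measure is given below. It assumes that the bias of a focal set F relative to a reference set C takes the form B(F|C) = Σ_a p_a(F) Σ_{codons of a} |f(codon) − c(codon)|, where f and c are codon frequencies normalized within each amino acid family and p_a(F) is the frequency of amino acid a in the focal set. This form is an assumption chosen because it is consistent with the theoretical maximum of 2.000 stated above; the function names are hypothetical and the standard genetic code is assumed.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_counts(genes: Iterable[str]) -> Counter:
    """Count sense codons over in-frame coding sequences."""
    counts: Counter = Counter()
    for seq in genes:
        seq = seq.upper()
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            # Skip stop codons and codons with ambiguous bases.
            if CODON_TABLE.get(codon, "*") != "*":
                counts[codon] += 1
    return counts


def family_frequencies(counts: Counter) -> Dict[str, float]:
    """Normalize codon counts so that each amino acid family sums to one."""
    totals: Counter = Counter()
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n
    return {codon: n / totals[CODON_TABLE[codon]] for codon, n in counts.items()}


def relative_codon_bias(focal: Iterable[str], reference: Iterable[str]) -> float:
    """Amino-acid-weighted sum of absolute codon frequency differences."""
    focal_counts = codon_counts(focal)
    f = family_frequencies(focal_counts)
    c = family_frequencies(codon_counts(reference))
    total = sum(focal_counts.values())
    aa_freq: Counter = Counter()
    for codon, n in focal_counts.items():
        aa_freq[CODON_TABLE[codon]] += n / total
    return sum(
        aa_freq[aa] * abs(f.get(codon, 0.0) - c.get(codon, 0.0))
        for codon, aa in CODON_TABLE.items()
        if aa != "*"
    )
```

In this sketch the measure is asymmetric, because the amino acid weights are taken from the focal set only; an amino acid that is absent from the reference set contributes its full focal frequency to the total.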
GBROWSE was used for visualization of gene distribution on chromosomes. To quantify the pattern of chromosomal location, we calculated the distance of each gene to the nearest chromosome end. For example, the P. falciparum gene PF10_0023 on chromosome MAL10 (physical size is 1,694,445 bp) starts at position 99,380 and ends at 100,362. Its distance to the nearest chromosome end was calculated as 99,380 - 1 = 99,379 bp. For gene PF10_0369 on the same chromosome that starts at 1,493,991 and ends at 1,496,955, its distance to the nearest chromosome end was calculated as 1,694,445 - 1,496,955 = 197,490 bp. The orientation of a gene (i.e., whether it is on the '+' strand or the '-' strand) is ignored for distance calculation.
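The distance calculation in the worked example can be written as a short function. This sketch assumes 1-based inclusive coordinates with start less than or equal to end, as in the example; the function name is hypothetical.

```python
def distance_to_nearest_end(start: int, end: int, chromosome_length: int) -> int:
    """Distance in bp from a gene to the nearest chromosome end.

    Coordinates are 1-based and inclusive; strand is ignored.
    """
    distance_to_left_end = start - 1
    distance_to_right_end = chromosome_length - end
    return min(distance_to_left_end, distance_to_right_end)


MAL10_LENGTH = 1_694_445  # physical size of chromosome MAL10 in bp

# The two genes from the worked example in the text.
pf10_0023 = distance_to_nearest_end(99_380, 100_362, MAL10_LENGTH)
pf10_0369 = distance_to_nearest_end(1_493_991, 1_496_955, MAL10_LENGTH)
```

With these inputs, the two values reproduce the distances given in the text (99,379 bp and 197,490 bp).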
CHK was supported by a NIH Training Grant (GM07103), the Kirby and Jan Alton Graduate Fellowship, and a Dissertation Completion Assistantship at the University of Georgia. Funding for this work was provided by NIH R01 AI068908 to JCK. The Institute of Bioinformatics and the Research Computing Center at the University of Georgia provided computation resources. P. Brunk, F. Chen, J. Felsenstein, M. Heiges, J. Mrazek, A. Oliveira, E. Robinson, and H. Wang provided valuable assistance on the use of computer hardware and software. D. Promislow, J. Bennetzen, D. Hall, J. Linder, J. Moorad, B. Striepen, and four anonymous reviewers provided helpful comments that improved the manuscript. We thank the J. Craig Venter Institute for providing pre-publication access to the genome sequence data of Plasmodium vivax and Toxoplasma gondii. The US Department of Defense, the National Institute of Allergy and Infectious Disease, and the Burroughs Wellcome Fund provided funding for the genome sequencing project of Plasmodium vivax and Toxoplasma gondii.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.