The evolution of cardiolipin biosynthesis and maturation pathways and its implications for the evolution of eukaryotes

Background: Cardiolipin (CL) is an important component of the mitochondrial inner membrane and of bacterial membranes. Its presence in both membranes has been considered evidence for the endosymbiotic origin of mitochondria. However, CL is reported to be synthesized by two distinct enzymes, CLS_cap in eukaryotes and CLS_pld in bacteria. How the CL biosynthesis pathway evolved is therefore an interesting question.
Results: A survey of the phylogenetic distribution of CL synthase (CLS) showed that most bacteria have the CLS_pld pathway, but that the CLS_cap pathway has already appeared in some bacteria, including proteobacteria and actinobacteria. In eukaryotes, Supergroups Opisthokonta and Archaeplastida and Subgroup Stramenopiles, which all contain multicellular organisms, possess the CLS_cap pathway, while Supergroups Amoebozoa and Excavata and Subgroup Alveolata, which all consist exclusively of unicellular eukaryotes, bear the CLS_pld pathway; amitochondriate protists in any supergroup have neither. Phylogenetic analysis indicated that the CLS_cap of eukaryotes is most closely related to that of alpha proteobacteria, while the CLS_pld of eukaryotes share a common ancestor but are not closely related to those of any particular bacteria.
Conclusions: The first eukaryote common ancestor (FECA) inherited the CLS_pld from its bacterial ancestor (e.g. the bacterial partner according to any of the hypotheses about eukaryote evolution). Later, when the FECA evolved into the last eukaryote common ancestor (LECA), the endosymbiotic mitochondria (alpha proteobacteria) brought in CLS_cap. In some LECA individuals the CLS_cap then replaced the CLS_pld, and these LECAs would evolve into the protist lineages from which multicellular eukaryotes could arise, while in the other LECAs the CLS_pld was retained and the CLS_cap was lost, and these LECAs would evolve into the protist lineages possessing CLS_pld. In addition, our work indicated that the CL maturation pathway arose after the emergence of eukaryotes, probably through mechanisms such as duplication of other genes, and that gene duplication and loss occurred frequently at different lineage levels, increasing the diversity of the pathway, probably to fit the complicated cellular processes in various cells. Our work also implies that the classification putting Stramenopiles and Alveolata together to form Chromalveolata may be unreasonable, and that the absence of CL synthesis and maturation pathways in amitochondriate protists is most probably due to secondary loss.


Accession number
The nucleotide sequences of the Phaeodactylum tricornutum CLS_cap identified by us have been submitted to GenBank and their accession numbers are JN088191 and JN088192.

Background
Cardiolipin (CL) is an important phospholipid component of the mitochondrial inner membrane and of bacterial membranes. In mitochondria, CL stabilizes the respiratory complexes and the supercomplexes mainly made up of complex III/IV [1,2], and maintains the generation of ATP [3,4]; it is also involved in mitochondrial protein import, cell wall biogenesis, translational regulation, aging and apoptosis [2]. In bacteria, CL interacts with energy metabolism proteins such as succinate dehydrogenase [5], formate dehydrogenase-N [6], and respiratory complex [7], is assembled into reaction centers [8,9], and is also involved in the proper localization of proteins on the membrane [10,11]. In contrast, CL has not been found in archaea so far [12].
CL is biosynthesized from two molecules of phosphatidylglycerol (PG) in bacteria, but from one PG and one cytidine diphosphate diacylglycerol (CDP-DAG) in eukaryotes (Figure 1) [13]. In bacteria, the biosynthesis reaction is a reversible transesterification catalyzed by a cardiolipin synthase (CLS) containing two phospholipase D (PLDc_2) domains, termed CLS_pld. In eukaryotes, the reaction is irreversible and is catalyzed by another kind of CLS containing one CDP-alcohol phosphatidyltransferase (CAP) domain, termed CLS_cap. In addition, only in eukaryotes is the nascent CL further remodeled into mature CL, which generally carries the same fatty acids at the sn-1 and sn-2 positions within a molecule in a given organism [14][15][16]. The indispensable eukaryotic CL maturation process and its enzymes are as follows: nascent CL is deacylated to form monolysocardiolipin (MLCL) by either of two kinds of enzymes, the CL-specific phospholipase (CLD1, YGR110W) identified in yeast [17] or the calcium-independent phospholipase A2 (iPLA2) beta or gamma reported in Drosophila and rat [18,19]; MLCL is then reacylated by the CoA-independent tafazzin (TAZ) [20] or by acyl-CoA:lysocardiolipin acyltransferase 1 (ALCAT1) [21] to become mature CL. Through this process, a high degree of acyl chain symmetry in CL is established. Bacteria have no such maturation process.
As seen above, the CL biosynthesis and maturation pathways in eukaryotes are distinct from those in bacteria. However, the simultaneous appearance of CL in both bacteria and eukaryotic mitochondria has been considered to be a line of evidence for the endosymbiotic origin of mitochondrion from bacteria [22,23]. According to the endosymbiosis theory, many mitochondrial properties such as energy metabolism including respiratory chain are inherited from the bacterial endosymbiont. But the above differences between mitochondria and bacteria make it uncertain whether this is true to CL biosynthesis pathway. Therefore, in fact how the eukaryotic CL biosynthesis and maturation pathways arise during the origin of eukaryotes from prokaryotes is still a mystery.
Moreover, CL was reported to be absent in some anaerobic protists such as Giardia lamblia [24] and Trichomonas vaginalis [25]. These organisms possess no canonical mitochondria but instead have mitosomes or hydrogenosomes, which lack an electron transport chain (ETC), a membrane potential, and proton-driven ATP generation [26]. The lack of mitochondria in G. lamblia was once taken by many authors as the main evidence that this organism is the most primitive eukaryote, having diverged from the eukaryotic trunk before the emergence of mitochondria [27][28][29]. Therefore, whether the lack of CL in these 'amitochondriate' protists is due to their primitiveness or to secondary degeneration is a question bearing on the early evolution of eukaryotes.
To study the origin and evolution of CL biosynthesis and maturation pathways, herein, phylogenetic distribution and phylogeny of the CL biosynthesis and maturation enzymes were investigated in diverse eukaryotes of the five supergroups: Opisthokonta, Amoebozoa, Archaeplastida, Chromalveolata, and Excavata, and diverse bacteria, and some interesting observations were obtained.

Results
Phylogenetic distribution of CL biosynthesis enzymes in eukaryotes and their similar sequences in bacteria

CL synthase (CLS)
Homologs of CLS_cap were identified in Opisthokonta (except the amitochondriate Microsporidia), Archaeplastida, and Stramenopiles of Chromalveolata (except B. hominis, for which no genome database is available) (Table 1). These two supergroups and one subgroup contain all the multicellular eukaryotes (Animalia, Fungi, Planta, Chlorophyta, Rhodophyta, and Phaeophyceae) and some unicellular eukaryotes (protists). Thus CLS_cap is found in all the multicellular eukaryotes and only in those unicellular eukaryotes that belong to the same supergroups (Opisthokonta and Archaeplastida) or subgroup (Stramenopiles of Chromalveolata) as these multicellular eukaryotes. Generally, each species has only one homolog, but a few, such as H. sapiens, M. musculus, C. elegans, D. melanogaster, S. purpuratus, and H. magnipapillata, have more than one copy (Additional file 1: Table S1). Multiple sequence alignments revealed that most of the identified homologs possess the conserved amino acid residues and membrane-binding regions of CLS_cap [30] (Additional file 2: Figure S1). When the RefSeq_protein database was searched with an E-value cutoff of 0.001, many (> 3,000) similar sequences from diverse bacteria followed the eukaryotic homologs in the hit list, though most of them are annotated as CDP-diacylglycerol-glycerol-3-phosphate 3-phosphatidyltransferase (PGPS). To reduce the computational burden, only the top hits (1,500 sequences, E < 1e-18) were included in the analyses below. Among them are two previously reported CLS_cap from two actinobacteria [31], and according to our phylogenetic analysis, many more sequences from actinobacteria (88 of the 148 sequenced actinobacterial species, about 59.5%) and from some other bacteria, including diverse proteobacteria, are CLS_cap (data not shown).
When these bacterial homologs were aligned to build an HMM profile and the profile was used as a query to search prokaryotic genomes of all kinds, we likewise found that only a small fraction of the surveyed bacteria (172 of the 1,375 bacteria, about 12.5%), mainly proteobacteria, actinobacteria, and a few others, possess CLS_cap (data not shown).
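The E-value filtering described above can be sketched as follows. This is only a minimal illustration, assuming the hits are available in HMMER's tabular (`--tblout`) output, in which the fifth whitespace-separated field is the full-sequence E-value. The sample hit lines, sequence names, and the function name `filter_hits` are hypothetical and are not taken from the study.

```python
def filter_hits(tblout_text, max_evalue=1e-3):
    """Return (target, evalue) pairs whose full-sequence E-value is below the cutoff."""
    hits = []
    for line in tblout_text.splitlines():
        # Skip blank lines and the '#' comment/header lines of the table.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if evalue < max_evalue:
            hits.append((target, evalue))
    # Best (smallest E-value) hits first, as in a hit list.
    hits.sort(key=lambda hit: hit[1])
    return hits


# Hypothetical hit table (truncated columns), for illustration only.
SAMPLE = """\
# target name  accession  query name  accession  E-value  score  bias
seqA  -  CLS_cap  -  2.1e-40  140.2  0.1
seqB  -  CLS_cap  -  3.0e-19  70.5  0.0
seqC  -  CLS_cap  -  4.5e-04  20.1  0.3
seqD  -  CLS_cap  -  0.5  8.0  0.0
"""

all_hits = filter_hits(SAMPLE, max_evalue=1e-3)    # general cutoff used in the text
top_hits = filter_hits(SAMPLE, max_evalue=1e-18)   # stricter cutoff for the top hits
print([name for name, _ in all_hits])
print([name for name, _ in top_hits])
```

The two calls mirror the two thresholds mentioned in the text: a permissive cutoff for listing similar sequences and a much stricter one for selecting the subset passed on to phylogenetic analysis.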
Interestingly, in the remaining two eukaryotic supergroups and one subgroup investigated, which consist exclusively of unicellular eukaryotes (protists), namely Amoebozoa (except the amitochondriate Entamoebida), Excavata (except the amitochondriate Parabasalia and Diplomonadida), and Alveolata in Chromalveolata, no CLS_cap homologs but only CLS_pld homologs were identified (Table 1). These homologs all contain the two conserved motifs proposed to be involved in phosphatidyl group transfer [32] (Additional file 3: Figure S2). Many (> 5,000, when E-value < 0.001) sequences annotated as CLS from diverse bacteria were also found among the top hits of CLS_pld. To investigate the distribution of CLS_pld in prokaryotes, an HMM profile built from seven genes whose CLS function had been confirmed experimentally [33] was used as a query to search bacterial genomes; CLS_pld homologs were found in most of the investigated bacteria (927 of the 1,375 bacteria, about 67.4%). Neither type of CLS was found in archaea.
None of the eukaryotes investigated contains both types of CLS. In all the amitochondriate protists mentioned above in brackets (Microsporidia, Entamoebida, Parabasalia, and Diplomonadida), neither of the two types of CLS was found. No CLS was found in B. hominis either, but this is probably due to its incomplete genome database.
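The complementary distribution described above can be written down as a small consistency check. The sketch below is illustrative only: the lineage-to-enzyme mapping restates the summary given in the text for mitochondriate members of each lineage, and the names `CLS_TYPE`, `CONTAINS_MULTICELLULAR`, and the helper functions are assumptions introduced for this example.

```python
# CLS type reported in the text for each lineage (mitochondriate members only).
CLS_TYPE = {
    "Opisthokonta": "CLS_cap",
    "Archaeplastida": "CLS_cap",
    "Stramenopiles": "CLS_cap",
    "Amoebozoa": "CLS_pld",
    "Excavata": "CLS_pld",
    "Alveolata": "CLS_pld",
}

# Lineages that, according to the text, contain multicellular organisms.
CONTAINS_MULTICELLULAR = {"Opisthokonta", "Archaeplastida", "Stramenopiles"}


def lineages_with(cls_type, kind):
    """Sorted list of lineages carrying the given CLS type."""
    return sorted(lineage for lineage, k in cls_type.items() if k == kind)


def cap_matches_multicellularity(cls_type, multicellular):
    """True if CLS_cap occurs exactly in the lineages containing multicellular organisms."""
    return all(
        (kind == "CLS_cap") == (lineage in multicellular)
        for lineage, kind in cls_type.items()
    )


print(lineages_with(CLS_TYPE, "CLS_cap"))
print(lineages_with(CLS_TYPE, "CLS_pld"))
print(cap_matches_multicellularity(CLS_TYPE, CONTAINS_MULTICELLULAR))
```

Because each lineage maps to a single value, the table also encodes the observation that no investigated eukaryote carries both CLS types; the amitochondriate groups, which have neither, are deliberately left out of the mapping.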

CL-specific phospholipase (CLD)
Homologs were found in most genomes of four of the five eukaryotic supergroups (all except Amoebozoa), but within these four supergroups some subgroups or species, such as Microsporidia, Ostreococcus, G. sulphuraria, Perkinsida, Apicomplexa, B. hominis, Heterolobosea, Parabasalia, and Diplomonadida, lack a homolog (Table 1). Two typical motifs of CLD [17] ("GXSXG" and "HX4D"), which are considered to function in lipase and acyltransferase activity, respectively, were found in almost all of the identified homologs (Additional file 4: Figure S3). Many (> 5,000, when E-value < 0.001) similar bacterial sequences were also found following the eukaryotic homologs in the hit list, but most of them were annotated as "alpha/beta hydrolase" or "hypothetical protein". Only those closest to the eukaryotic sequences in the hit list were chosen for the phylogenetic analyses below.
Calcium-independent phospholipase A2 (iPLA2)
Because the hits of iPLA2 beta and gamma were mixed together in the hit list owing to the high sequence similarity between the two enzymes, they were discriminated by the phylogenetic analyses below. Homologs of iPLA2 gamma exist in most genomes of four of the five supergroups (all except Amoebozoa), and homologs of iPLA2 beta were found in all animals and one fungus in Opisthokonta and in two species in Amoebozoa. No homologs of either iPLA2 were found in many subgroups and species, such as Choanoflagellate, most fungi (except A. fumigatus), Entamoebida, G. sulphuraria, Ciliata, Cryptosporidium, Oomycetes, B. hominis, Parabasalia, and Diplomonadida. However, many other fungi not listed in Table 1 were found to have iPLA2 homologs when the RefSeq_protein database was searched. Some organisms possess multiple homologs (Additional file 1: Table S1). Most of the identified homologs possess the two conserved segments characteristic of iPLA2 [34] (Additional file 5: Figure S4). Many similar bacterial sequences annotated as "patatin" were found following these eukaryotic homologs in the hit list, and only the top hits (> 500 sequences when E-value < 0.001 for each query) were selected for the phylogenetic analyses below.

acyl-CoA:lysocardiolipin acyltransferase 1 (ALCAT)
Besides annotated ALCAT, other eukaryotic enzyme homologs such as "1-acylglycerol-3-phosphate O-acyltransferase (AGPAT) 3, 4, 5" and "lysophosphatidylglycerol acyltransferase (LPGAT)" were also found in the genomes of all five supergroups when the RefSeq_protein database was searched. Because of the high sequence similarities among them, their identities were further determined by the phylogenetic analyses below. No homolog was found in several subgroups and species, including D. melanogaster, Microsporidia, Entamoebida, most Chlorophyta (except M. sp.), Rhodophyta, Alveolata, T. pseudonana, Heterolobosea, Diplomonadida, and Parabasalia (Table 1). Many (> 1,000, when E-value < 0.001) similar bacterial sequences were also found following the above eukaryotic homologs in the hit list. Their relationship with the eukaryotic ALCAT homologs was determined by the phylogenetic analyses below.

Table 1 The phylogenetic distribution of CL biosynthesis and maturation enzymes of five eukaryotic supergroups. + = presence; - = absence; a, homologs obtained by tblastn search rather than by blastp search as for the others; *, nr database rather than genome database was searched for homologs; #, EST database rather than genome database was searched for homologs.

Tafazzin (TAZ)
Homologs were found in all five supergroups, but not in several subgroups and species such as Microsporidia, Entamoebida, Alveolata, Kinetoplastids, Parabasalia, Diplomonadida, S. pombe, P. sojae, and P. tricornutum (Table 1). Bacterial sequences were also found after the eukaryotic TAZ homologs in the hit list, and were mostly annotated as "acyltransferase". However, they have very low sequence similarity to the eukaryotic TAZ homologs, and our preliminary phylogenetic analysis does not support a close relationship with eukaryotic TAZ; they were therefore not included in the further analyses.
Briefly, the distribution of the maturation pathway enzymes falls into three conditions: 1) no enzymes at all exist in Microsporidia, Entamoebida, Cryptosporidium, Parabasalia, and Diplomonadida; 2) only one or two enzymes exist in some protists, including G. sulphuraria, Alveolata (except Cryptosporidium), and B. hominis, so the complete two-step maturation pathway cannot be formed in these protists; 3) all the other eukaryotes possess most of the enzymes, which can form the complete two-step maturation pathway.
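The three conditions above can be expressed as a simple classification rule over presence/absence data. The following is a minimal sketch, assuming that a pathway counts as complete when at least one enzyme is present for each of the two steps (deacylation, then reacylation); the enzyme labels and the function name `maturation_status` are illustrative, and the example enzyme sets are hypothetical rather than rows of Table 1.

```python
# Step 1: deacylation of nascent CL to MLCL; step 2: reacylation of MLCL.
STEP1_DEACYLATION = {"CLD1", "iPLA2_beta", "iPLA2_gamma"}
STEP2_REACYLATION = {"TAZ", "ALCAT1"}


def maturation_status(enzymes):
    """Classify a set of detected enzymes as 'absent', 'partial', or 'complete'."""
    enzymes = set(enzymes)
    has_step1 = bool(enzymes & STEP1_DEACYLATION)
    has_step2 = bool(enzymes & STEP2_REACYLATION)
    if has_step1 and has_step2:
        return "complete"   # at least one enzyme for each step
    if has_step1 or has_step2:
        return "partial"    # only one of the two steps is covered
    return "absent"         # no maturation enzyme detected


# Hypothetical enzyme sets, for illustration only.
print(maturation_status([]))                       # condition 1
print(maturation_status(["iPLA2_gamma"]))          # condition 2
print(maturation_status(["CLD1", "TAZ"]))          # condition 3
```

Encoding the two steps as separate sets makes explicit why an organism with two detected enzymes can still lack a functional pathway: both may belong to the same step.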

Phylogeny of CL biosynthesis enzymes
As the Maximum Likelihood (ML) and Bayesian trees showed similar topologies, here we chose the Bayesian tree as a representative with the bootstrap values of ML tree also on the tree (As for the following other enzymes, the similar results were obtained, and so Bayesian trees were also chosen as representatives).
On the CLS_cap phylogenetic tree (Figure 2, for the ML tree please see Additional file 6: Figure S5), all the identified homologs from eukaryotes are recovered into a highly supported big monophyletic clade (Clade E). Within this clade, homologs from Opisthokonta, Archaeplastida, and Stramenopiles of Chromalveolata form three subclades with high support values, and within these subclades many groups corresponding to their source lineages were also recovered. Furthermore, multiple homologs from a species always cluster together firstly, suggesting they are the products of species-specific gene duplication. A clade consisting of all homologs from alpha proteobacteria was recovered to be the closest sistergroup of the Clade E with a moderate support value (0.73/54) with all the homologs of other diverse bacteria being its outgroups. Among these outgroups, the actinobacterial clade, which contains the two previously reported CLS_cap identified from two actinobacteria [31], is the outmost group, suggesting all the homologs of these outgroups are CLS_cap. Finally, PGPS from diverse bacteria form an outgroup of all the above clades. Therefore, our results suggest besides in actinobacteria as reported previously, CLS_cap might have already emerged in some other bacteria including diverse proteobacteria and others, and eukaryotes might acquire their CLS_cap from alpha proteobacteria.
On the CLS_pld phylogenetic tree (Figure 3), all the identified homologs from eukaryotes are also recovered into a highly supported big monophyletic clade (Clade E). Within this clade, homologs form three subclades almost corresponding to their three source supergroups-Alveolata of Chromalveolata, Amoebozoa, and Excavata, and within these subclades homologs also form groups corresponding to their source lineages (e.g. Apicomplexa, Perkinsida, and Ciliata). However, Clade E does not show any particular close correlations with those similar sequences from any current bacterial lineages. These results suggest that all the CLS_pld from the eukaryotes (which are exclusively unicellular organisms, protists) of the three eukaryotic supergroups have a common ancestor, which does not fall into any of the present bacterial lineages.
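The notion of a "monophyletic clade" used for Clade E can be made concrete with a small check on a rooted tree. This is only a sketch under simplifying assumptions: the tree is represented as nested tuples with leaf names as strings, the toy topology and the bacterial leaf names are hypothetical, and support values are ignored. It is not the procedure used to build or evaluate the trees in this study.

```python
def leaves(tree):
    """Return the set of leaf names under a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def is_monophyletic(tree, group):
    """True if some subtree of the rooted tree contains exactly the taxa in `group`."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Hypothetical toy tree: eukaryotic CLS_pld sequences grouped apart from bacterial ones.
TOY_TREE = (
    (("Alveolata", "Amoebozoa"), "Excavata"),
    ("bacteria_A", "bacteria_B"),
)

print(is_monophyletic(TOY_TREE, {"Alveolata", "Amoebozoa", "Excavata"}))
print(is_monophyletic(TOY_TREE, {"Excavata", "bacteria_A"}))
```

In this toy example the three eukaryotic lineages form one clade, which is the pattern the text interprets as descent from a common ancestor rather than independent acquisition from different bacteria.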

Phylogeny of CL maturation enzymes
Due to their very low sequence similarities with TAZ and ALCAT, the similar bacterial sequences of these two enzymes were not included in the final phylogenetic analysis. The four phylogenetic trees obtained (Additional file 7: Figure S6, Additional file 8: Figure S7, Additional file 9: Figure S8, Additional file 10: Figure S9 and Additional file 11: Table S2) showed the following. 1) All the eukaryotic homologs of each enzyme cluster together first with high support values; none of these enzymes shows a close relationship with any particular similar bacterial sequences, and the low support values likewise do not support direct phylogenetic relationships with any bacterial sequences. This suggests that they were not inherited directly from bacterial ancestors but arose after the emergence of eukaryotes, and that each of these enzymes has a common ancestor across all eukaryotes, which might have already emerged in the last eukaryotic common ancestor (LECA) of the five supergroups. 2) Homologs of each enzyme from a common supergroup or lineage (e.g. Animalia, Fungi, Oomycetes and Planta) do not form a single clade corresponding to their source supergroup or lineage but usually form two or more separate clades, and alternative trees constraining them to be monophyletic were rejected significantly (Additional file 11: Table S2), suggesting that gene duplication and loss occurred frequently in these enzymes at different lineage levels. Moreover, for ALCAT, all the homologs form a sister group to the AGPAT 3/4 clade, suggesting that ALCAT arose through gene duplication and divergence from AGPAT 3/4. What is more, multiple copies of homologs of each of these enzymes from one species generally cluster together, suggesting that gene duplication of these enzymes has continued to occur relatively recently in some species.

Figure 3 Phylogeny of all CLS_pld from the eukaryotes (which all are protists) and diverse bacteria. The tree was inferred using MrBayes 3.12 on 356 aligned amino acids. The tree is illustrated using the same conventions as in Figure 2.

Discussion
The origin and evolution of CL biosynthesis pathways in eukaryotes
As mentioned above, CL is biosynthesized by two distinct synthases, CLS_cap and CLS_pld. The two types of enzymes belong to two distinct protein families without any primary sequence similarity between them [16]. Generally, eukaryotes are considered to have CLS_cap and bacteria CLS_pld. However, our investigation revealed that although most bacteria possess CLS_pld, some bacteria, including actinobacteria, proteobacteria, and some others, bear CLS_cap, suggesting that CLS_cap had already arisen in some bacteria; in eukaryotes, CLS_cap is possessed by all the multicellular organisms and only by those unicellular organisms (protists) that belong to the same supergroups or subgroup as these multicellular organisms. Our phylogenetic analysis further showed that all the CLS_cap in these eukaryotes are most closely related to those of alpha proteobacteria. Since alpha proteobacteria are generally considered to be the endosymbiotic ancestor of the mitochondrion [35][36][37], the CLS_cap pathway in these eukaryotes most probably originated from alpha proteobacteria through the mitochondrial endosymbiotic event. This is inconsistent with the previous postulation that eukaryotic CLS originated from the prokaryotic type PGPS which existed in ancestral eukaryotes [38].
On the other hand, our investigation revealed that all the other eukaryotes, whose supergroups or subgroup consist exclusively of unicellular eukaryotes (protists), possess CLS_pld. Among these eukaryotes, a few lineages such as Trypanosoma, Leishmania, Theileria, Plasmodium, Cryptosporidium and Dictyostelium had previously been reported by other authors to have CLS_pld, and this was explained as an evolutionary survival of the prokaryotic reaction for CL formation into the eukaryotic kingdom [38]. Indeed, CL has been reported to exist in such eukaryotes, including D. discoideum, T. thermophila, P. tetraurelia, P. marinus and T. cruzi [39][40][41][42][43]. According to our present work, however, 1) CLS_pld is widely distributed in many kinds of protists (with the exception only of those in Supergroups Opisthokonta and Archaeplastida and in Stramenopiles of Supergroup Chromalveolata), and its distribution is complementary to that of CLS_cap across the entire eukaryote domain (mainly within protists); and 2) on the phylogenetic tree, all the CLS_pld from different eukaryotes (protists) cluster together as a common clade without showing a close relationship with the CLS_pld from any particular extant bacterial lineage, suggesting that they have a common ancestor which is probably very ancient and has not been retained without obvious changes in any extant bacterial lineage. These CLS_pld in eukaryotes therefore cannot be secondary acquisitions by independent horizontal gene transfer (HGT) from different bacteria in different protist lineages, but must have been inherited from a common ancestor of these eukaryotes.
We reason as follows: 1) such a common ancestor can only be the last eukaryotic common ancestor (LECA) or the first eukaryotic common ancestor (FECA); 2) most bacteria (except most proteobacteria and actinobacteria, which bear the CLS_cap pathway) possess the CLS_pld pathway, and the emergence of CLS_cap in some bacteria might have occurred much later than that of CLS_pld; 3) unlike that of eukaryotic CLS_cap, the common ancestor of these eukaryotic CLS_pld cannot be found in extant bacteria, so the acquisition of these eukaryotic CLS_pld might have occurred very anciently (probably earlier than the endosymbiotic origin of mitochondria from alpha proteobacteria). Therefore, it is most probable that the FECA inherited the CLS_pld pathway from an ancient bacterium, such as the bacterial partner according to the 'fusion hypothesis' [44], the proto-eukaryote derived from bacteria according to the 'phagotrophy hypothesis' [45], or the bacterium related to the origin of the nucleus according to the 'endosymbiosis hypothesis' [46][47][48].
Neither CLS_cap nor CLS_pld was found in any of the investigated amitochondriate protists, regardless of which eukaryotic supergroup (Opisthokonta, Amoebozoa, or Excavata) they belong to. This is consistent with the lack of CL in organisms such as G. lamblia, T. vaginalis, and E. cuniculi [24,25,49]. Since both bacteria and all the other eukaryotes have CL and the corresponding CL biosynthesis pathways, the absence of both CL biosynthesis pathways in these amitochondriate protists must be the result of secondary loss due to the degeneration of their mitochondria. Consistently, it has been shown that anaerobic prokaryotes lack CL, and that anaerobic conditions decrease CL in yeast relative to aerobic conditions [50,51]. The existence of CL in Tritrichomonas foetus, a relative of T. vaginalis [23], further supports that such a secondary loss occurred at least in T. vaginalis. The lack of either type of CLS in B. hominis might also be due to its lack of mitochondria or to its incomplete genome database.
Considering the distinctive difference in phospholipids between archaea on the one hand and bacteria and eukaryotes on the other [52], and the absence of either type of CLS in archaea, it is reasonable to postulate that archaea may not have contributed to the origin of eukaryotic CL biosynthesis. Therefore, based on the above analyses, we propose the following evolutionary scenario for the CL biosynthesis pathway in eukaryotes (Figure 4). In the process of the origin and evolution of eukaryotes, the FECA inherited the CLS_pld pathway from its bacterial ancestor, which is probably the bacterial partner according to any of the hypotheses about eukaryote evolution, such as the 'fusion hypothesis', the 'phagotrophy hypothesis' and the 'endosymbiosis hypothesis'. Later, when the FECA evolved into the LECA, the endosymbiotic origin of the mitochondrion brought in another CL synthase, CLS_cap, which had arisen in the endosymbiotic bacteria, alpha proteobacteria. Then, in those LECA individuals that would evolve into the unicellular eukaryote lineages (e.g. Choanoflagellates, Chlorophyta) from which multicellular eukaryotes (e.g. Animalia and Fungi in Opisthokonta, Archaeplastida, and Phaeophyceae in Chromalveolata) could arise, the CLS_cap gene of endosymbiotic origin was transferred into the nuclear genome of the host cell and replaced the previous CLS_pld pathway, while in the other LECA individuals, which would evolve into the other unicellular protist lineages (e.g. Amoebozoa, Alveolata of Chromalveolata, and Excavata) from which no multicellular eukaryotes would arise, the previous CLS_pld was retained and the CLS_cap of endosymbiotic origin was lost. In the amitochondriate protists (including Microsporidia), the CL biosynthesis pathway (either CLS_pld or CLS_cap) was secondarily and completely lost due to the secondary degeneration of their mitochondria.

The origin and evolution of CL maturation pathway in eukaryotes
The eukaryotic CL maturation pathway consists of two steps, and altogether five enzymes have been previously identified to participate in this process in different eukaryotes.
CL maturation is indispensable in higher eukaryotes, though the purpose of this process is not very clear. Our phylogenetic analyses indicated that all the maturation enzymes arose after the emergence of eukaryotes and might have already emerged prior to the divergence of all the eukaryote supergroups. Except for ALCAT, which seems to have arisen through gene duplication and divergence from another existing enzyme (AGPAT 3/4), the origins of these enzymes are not yet clear.
Our phylogenetic analyses also indicated that gene duplication and gene loss occurred frequently at different lineage levels in the evolution of the maturation pathway. These gene duplications and losses result in a patchy distribution of the maturation pathway enzymes in diverse eukaryotes, increasing the diversity of the pathway. Different enzymes or multiple homologs in the same step of the pathway can widen the recognition of substrates carrying different fatty acid substituents; iPLA2 beta and gamma are a reported example of this [53]. Such a condition might be the result of adaptive evolution to cope with the complicated cellular processes in various eukaryotic cells.
In the present work, we found that, except for some unicellular eukaryotes, including all the amitochondriate protists (Microsporidia, Entamoebida, Parabasalia and Diplomonadida), all Alveolata in Chromalveolata, and a few other species (e.g. G. sulphuraria, B. hominis), all the other eukaryotes, which are distributed across all five eukaryotic supergroups and are either unicellular or multicellular, either parasitic or free-living, possess a complete CL maturation pathway, having at least one enzyme for each step of the pathway. The absence of the complete pathway in B. hominis and G. sulphuraria might be caused by their incomplete databases, and its absence in the other protists is probably due to various secondary losses, because 1) each enzyme of the pathway from various lineages forms a monophyletic group on the phylogenetic trees, 2) their close relatives have this pathway, and 3) some, though not a complete set, of the enzymes of the pathway appear in some of these protists. The total absence of this pathway in amitochondriate protists (which have none of its enzymes) is consistent with the lack of typical mitochondria and CL in these protists, and must be due to the degeneration of their mitochondria. In contrast, the presence of some of the enzymes of this pathway in certain protists (e.g. Ciliata, Perkinsida, most Apicomplexa) might suggest that their maturation pathways are in the process of being lost or that the remaining enzymes have other functions.
Implications to the evolution of eukaryotes and the classification of the five eukaryotic supergroups
According to our above analysis of the phylogenetic distribution and the phylogeny of the two types of CLS in eukaryotes, the acquisition of the CLS_cap pathway through mitochondrial endosymbiosis might have offered some potential for the evolution of multicellularity, because the CLS_pld pathway exists exclusively in unicellular eukaryotes (protists), while the CLS_cap pathway is found in all the multicellular organisms and only in those unicellular eukaryotes (protists) that belong to the same supergroups or subgroup as these multicellular organisms. Therefore, our work implies, for the first time, that the endosymbiotic event of alpha proteobacteria not only led to the origin of mitochondria, but also might have affected the subsequent evolution of eukaryotes, such as the evolution of multicellularity, which may depend on what kinds of genes of the endosymbiont were transferred into the host nucleus and thus what kinds of endosymbiotic relationships were established.
The classification and relationships of the five eukaryotic supergroups are still controversial [54][55][56][57]. In the present work, it was shown that the CL biosynthesis and maturation pathways are very different between the two subgroups in Supergroup Chromalveolata: Stramenopiles possess the CLS_cap pathway and a complete maturation pathway, while Alveolata bear the CLS_pld pathway and an incomplete maturation pathway (completely lacking the second step). Therefore, the classification putting these two subgroups into a common supergroup may be unreasonable.
Amitochondriate protists were once thought to be the most primitive extant eukaryotes because of their lack of mitochondria and other primitive characteristics [28,29,58]. However, accumulating molecular evidence and the identification of atypical mitochondria (mitosomes or hydrogenosomes) in these organisms have recently argued that they might once have possessed mitochondria [59][60][61][62]. Our investigation indicates that the absence of CL biosynthesis and maturation pathways in these amitochondriate protists might be due to secondary losses. Thus, the atypical mitochondria in these protists might also result from degeneration of their once-existent typical mitochondria.

Conclusions
We propose that the FECA inherited the CLS_pld pathway from its bacterial ancestor (which could be the bacterial partner according to the 'fusion hypothesis', the 'phagotrophy hypothesis' or the 'endosymbiosis hypothesis' about the origin of eukaryotes from prokaryotes). Later, when the FECA evolved into the last eukaryote common ancestor (LECA), the endosymbiotic mitochondria (alpha proteobacteria) brought in another pathway, the CLS_cap pathway. In some LECA individuals the CLS_cap pathway then replaced the previous CLS_pld pathway, and these LECA individuals would evolve into the protist lineages from which multicellular eukaryotes could arise. In the other LECA individuals the previous CLS_pld pathway was kept and the CLS_cap pathway was lost, and these LECA individuals would evolve into the current protist lineages that possess the CLS_pld pathway. In addition, our work indicates that the CL maturation pathway arose after the emergence of eukaryotes, probably through mechanisms such as the duplication of pre-existing genes, and that gene duplication and loss occurred frequently at different lineage levels, increasing the diversity of the pathway, probably to suit the complicated cellular processes of various cells. Our work further implies three points: the kind of endosymbiotic relationship established during the evolutionary origin of mitochondria in early eukaryotes might have affected the subsequent evolution of multicellularity; the classification that places Stramenopiles and Alveolata together in Chromalveolata may be unreasonable; and the absence of CL synthesis and maturation pathways in amitochondriate protists is most probably due to secondary degeneration.

Organisms
The following organisms with genome or expressed sequence tag (EST) databases were taken as representatives of the five eukaryotic supergroups in this study:

CL biosynthesis and maturation pathway gene collection and identification
All the reviewed eukaryotic CLS sequences (Q07560, O01916, Q8MZC4, Q9UJA2, Q80ZM8, Q5U2V5, and B6TPV7), bacterial CLS sequences (127 sequences; their accession IDs and sequences can be obtained from the authors upon request), and reviewed TAZ sequences (Q9V6G5, Q16635, Q6IV77, Q06510, Q6IV84, Q6IV76, Q6IV83, Q6IV82, Q6IV78, and Q54DX7) were downloaded from UniProt. As only a few reviewed CLD1, PLA2 and ALCAT sequences are available in UniProt, the curated orthologs of CLD1 (K13535) and ALCAT (K13513) were downloaded from the KEGG database. For iPLA2 beta (CG6718) and gamma (Q9NP80), putative orthologs (beta: 15 sequences; gamma: 14 sequences; their accession IDs and sequences can be obtained from the authors upon request) were retrieved from the KEGG SSDB database (hits with a best-best relationship and identity > 0.5). The sequences obtained were aligned with MUSCLE v3.8.31 [63]. An HMM profile of each enzyme was then built and calibrated from its multiple sequence alignment with the HMMER package (v3.0) using default parameters. Finally, the profiles were used as queries to search the genome databases of the organisms mentioned above and the RefSeq_protein database with hmmsearch. Hits with significant E-values were further analyzed with PFAM to confirm that they are true homologs. To exclude the repeated "ANK" domain of iPLA2 beta (CG6718 and its orthologs), the corresponding N-terminal region was removed according to the PFAM annotation before the HMM profiles were built. If no similar sequence was detected for a species, its non-redundant (nr) protein and nucleotide databases and its online genome database were searched independently with BLASTP or tBLASTn. The EST database of G. sulphuraria was searched with tBLASTn.
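The "best-best relationship and identity > 0.5" criterion used for the KEGG SSDB retrieval can be illustrated with a minimal sketch of a bidirectional best-hit filter. This is not the authors' script or the KEGG implementation: the hit tuple layout, the function names and the sequence identifiers are hypothetical, and the best hit is assumed to be the one with the highest alignment score.

```python
from typing import Dict, List, Tuple

# Assumed hit layout: (query, target, fractional identity, alignment score)
Hit = Tuple[str, str, float, float]


def best_hits(hits: List[Hit]) -> Dict[str, Tuple[str, float]]:
    """Map each query to its highest-scoring target and that hit's identity."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for query, target, identity, score in hits:
        if query not in best or score > best[query][2]:
            best[query] = (target, identity, score)
    return {q: (t, ident) for q, (t, ident, _) in best.items()}


def reciprocal_best_hits(forward: List[Hit], reverse: List[Hit],
                         min_identity: float = 0.5) -> List[Tuple[str, str]]:
    """Keep (query, target) pairs that are each other's best hit
    and whose forward identity exceeds min_identity."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    pairs = []
    for query, (target, identity) in fwd.items():
        back = rev.get(target)
        if back is not None and back[0] == query and identity > min_identity:
            pairs.append((query, target))
    return sorted(pairs)


# Made-up identifiers, for illustration only
forward = [("qA", "tX", 0.62, 300.0), ("qA", "tY", 0.40, 120.0),
           ("qB", "tY", 0.45, 200.0)]
reverse = [("tX", "qA", 0.62, 300.0), ("tY", "qB", 0.45, 200.0)]
print(reciprocal_best_hits(forward, reverse))
```

Here only the pair (qA, tX) passes: qB and tY are reciprocal best hits, but their identity of 0.45 falls below the 0.5 threshold.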
Bacterial sequences similar to each of these enzymes were also collected during the searches against the RefSeq_protein database. Because many bacterial sequences were found below the E-value cutoff of 0.001, as many as possible were collected at first, and only a subset of them, selected through preliminary phylogenetic analyses, was kept for further analysis.
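The E-value cutoff of 0.001 could be applied to hmmsearch results along the following lines. This is a sketch under assumptions rather than the procedure actually used: it assumes hmmsearch was run with the --tblout option, whose whitespace-delimited table gives the target name in the first column and the full-sequence E-value in the fifth; the function name and the example rows are hypothetical.

```python
from typing import Iterable, List, Tuple


def filter_tblout(lines: Iterable[str],
                  max_evalue: float = 1e-3) -> List[Tuple[str, float]]:
    """Return (target, full-sequence E-value) pairs below the cutoff,
    sorted from most to least significant."""
    hits = []
    for line in lines:
        # Skip blank lines and the '#' comment/header lines of the table
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if evalue < max_evalue:
            hits.append((target, evalue))
    return sorted(hits, key=lambda hit: hit[1])


# Made-up table rows, for illustration only
example = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "seq2 - CLS_profile - 3.0e-05 25.2 0.1 4.1e-05 24.8 0.1 1.0 1 0 0 1 1 1 1 -",
    "seq1 - CLS_profile - 1.2e-40 140.1 0.3 1.5e-40 139.8 0.3 1.0 1 0 0 1 1 1 1 -",
    "seq3 - CLS_profile - 0.5 8.0 0.0 0.7 7.5 0.0 1.0 1 0 0 1 1 0 0 -",
]
for target, evalue in filter_tblout(example):
    print(target, evalue)
```

Hits retained by such a filter would still need the domain-level confirmation described above before being treated as homologs.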

Phylogenetic analyses
In order to infer the origin of eukaryotic CL biosynthesis and maturation enzymes, all the sequences obtained above were used for the following phylogenetic analyses.
Multiple alignment of each dataset was initially carried out using MUSCLE, version 3.8.31 [63]. Nonhomologous insertions and sequence characters that could not be aligned with confidence were removed manually. Only unambiguously aligned sites were used for phylogenetic analyses.
Phylogenetic trees were inferred using maximum likelihood (ML) and Bayesian methods. ML trees were inferred with FastTree 2.1 [64] using the default CAT model and other default settings. MrBayes 3.1.2 [65] was used to perform parallel Bayesian analyses with four incrementally heated Markov chains, sampled every 1,000 generations, with the temperature set to 0.5. Among-site substitution rate heterogeneity was corrected with an invariable category and eight Γ-distributed substitution rate categories under the WAG model for amino acid substitutions [66], abbreviated herein as WAG+I+8G. Two separate runs were performed to confirm the convergence of the chains. The average standard deviation of split frequencies and the potential scale reduction factor convergence diagnostic were used to assess the convergence of the two runs. Trees sampled before stationarity was reached were discarded, resulting in a 'burn-in' that comprised 25% of the posterior distribution of trees. The 50% majority-rule consensus tree was computed from the remaining trees to obtain the posterior probability of each node.
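The post-processing steps described here (discarding a 25% burn-in, summarizing clade frequencies as posterior probabilities, keeping clades for a 50% majority-rule consensus, and comparing split frequencies between two runs) can be sketched as follows. This is an illustration, not MrBayes code. Each sampled tree is represented simply as a set of clades, all names are hypothetical, and the details of the split-frequency diagnostic (sample standard deviation over runs, only splits reaching a frequency of 0.10 in at least one run) are assumptions about how such a statistic is commonly defined.

```python
from collections import Counter
from statistics import stdev
from typing import Dict, FrozenSet, List

Clade = FrozenSet[str]   # a clade is the set of taxa it contains
Tree = FrozenSet[Clade]  # a sampled tree, summarised as its set of clades


def discard_burnin(samples: List[Tree], fraction: float = 0.25) -> List[Tree]:
    """Drop the first `fraction` of the sampled trees."""
    return samples[int(len(samples) * fraction):]


def clade_frequencies(samples: List[Tree]) -> Dict[Clade, float]:
    """Fraction of sampled trees containing each clade (posterior probability)."""
    counts = Counter(clade for tree in samples for clade in tree)
    return {clade: n / len(samples) for clade, n in counts.items()}


def majority_rule(freqs: Dict[Clade, float],
                  threshold: float = 0.5) -> Dict[Clade, float]:
    """Clades present in more than `threshold` of the samples."""
    return {clade: p for clade, p in freqs.items() if p > threshold}


def asdsf(freqs_a: Dict[Clade, float], freqs_b: Dict[Clade, float],
          min_freq: float = 0.10) -> float:
    """Average standard deviation of split frequencies between two runs
    (assumed definition; see the lead-in)."""
    splits = [c for c in set(freqs_a) | set(freqs_b)
              if max(freqs_a.get(c, 0.0), freqs_b.get(c, 0.0)) >= min_freq]
    if not splits:
        return 0.0
    sds = [stdev([freqs_a.get(c, 0.0), freqs_b.get(c, 0.0)]) for c in splits]
    return sum(sds) / len(sds)


# Toy samples over made-up taxa A to D, for illustration only
ab, ac, cd = (frozenset({frozenset(s)}) for s in ("AB", "AC", "CD"))
run_a = discard_burnin([cd, ab, ab, ab])
run_b = discard_burnin([cd, ab, ab, ac])
freqs_a, freqs_b = clade_frequencies(run_a), clade_frequencies(run_b)
print(majority_rule(freqs_b))
print(asdsf(freqs_a, freqs_b))
```

A value of the diagnostic close to zero indicates that the two runs sample splits at similar frequencies, which is the sense in which it supports convergence.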
Prior to the above phylogenetic analyses, large data sets including many more bacterial similar sequences were usually subjected to a preliminary analysis using FastTree 2.1 with default parameters. Only the sub-datasets comprising the eukaryotic sequences and the bacterial sequences most closely related to them on the preliminary trees were then selected for further analysis.

Tree topology tests
To assess the significance of gene duplication in each of the maturation pathway enzymes, alternative trees constraining two or more separate subclades of a given lineage to be monophyletic were obtained by 20 searches using RAxML [67] with the models mentioned above. The best-scoring ML tree from each constrained tree search was then compared with the Bayesian tree. Site likelihoods were calculated in RAxML (-f g option) under the GTRGAMMA model of sequence evolution. The Approximately Unbiased (AU) test was performed using CONSEL 0.1k [68].