Taxonomic distribution and origins of the extended LHC (light-harvesting complex) antenna protein superfamily
© Engelken et al; licensee BioMed Central Ltd. 2010
Received: 23 April 2010
Accepted: 30 July 2010
Published: 30 July 2010
The extended light-harvesting complex (LHC) protein superfamily is a centerpiece of eukaryotic photosynthesis, comprising the LHC family and several families involved in photoprotection, like the LHC-like and the photosystem II subunit S (PSBS). The evolution of this complex superfamily has long remained elusive, partially due to previously missing families.
In this study we present a meticulous search for LHC-like sequences in public genome and expressed sequence tag databases covering twelve representative photosynthetic eukaryotes from the three primary lineages of plants (Plantae): glaucophytes, red algae and green plants (Viridiplantae). Using a coherent classification of the different protein families based on both hidden Markov model analyses and structural predictions, we identified numerous new LHC-like sequences and described several new families, including the red lineage chlorophyll a/b-binding-like protein (RedCAP) family from red algae and diatoms. A test of alternative topologies for sequences of the highly conserved chlorophyll-binding core structure of LHC and PSBS proteins significantly supports independent origins of the LHC and PSBS families via two unrelated internal gene duplication events. This result was confirmed by cluster likelihood mapping.
The independent evolution of LHC and PSBS families is supported by strong phylogenetic evidence. In addition, a possible origin of LHC and PSBS families from different homologous members of the stress-enhanced protein subfamily, a diverse and anciently paralogous group of two-helix proteins, seems likely. The new hypothesis for the evolution of the extended LHC protein superfamily proposed here is in agreement with the character evolution analysis that incorporates the distribution of families and subfamilies across taxonomic lineages. Intriguingly, stress-enhanced proteins, which are universally found in the genomes of green plants, red algae, glaucophytes and in diatoms with complex plastids, could represent an important and previously missing link in the evolution of the extended LHC protein superfamily.
The evolution of algae and land plants and their photosynthetic machineries is intimately linked to the extended light-harvesting complex (LHC) protein superfamily. A chlorophyll-binding (CB) motif that is part of a transmembrane (TM) alpha-helix located in the thylakoid membrane is the homologous core structure of this protein superfamily. Several families belong to the extended LHC protein superfamily [1–5], including the LHC proteins, the LHC-like proteins, the subunit S of photosystem II (PSBS), the ferrochelatase II and a new family described in this work, the red lineage chlorophyll a/b-binding (CAB)-like proteins (RedCAP). Ferrochelatases are enzymes that catalyze the terminal step in heme biosynthesis. Two different ferrochelatases exist in plants, but since only one of them possesses a CB motif and is imported into chloroplasts, we included only ferrochelatase II in our study. Non-homologous pigment-binding proteins, such as the prochlorophyte CB protein family, are not considered here. While the PSBS family consists of four-helix proteins, the LHC-like protein family is divided into three-helix early light-induced proteins (ELIPs) [2, 3], two-helix stress-enhanced proteins (SEPs) and one-helix proteins (OHPs) [11, 12], which in cyanobacteria are also called high light-induced proteins (HLIPs) or small CB-like proteins [13, 14]. In contrast to LHC proteins, whose primary function is the absorption of light through chlorophyll excitation and transfer of absorbed energy to photochemical reaction centers, members of LHC-like and PSBS families are likely involved in stress protection [2, 3, 14, 15].
Many different models have been proposed for the evolution of the extended LHC protein superfamily [1, 2, 5, 16–19]. Most of them postulate a four-helix intermediate, similar to PSBS, as the ancestor of the LHC, LHC-like and PSBS families, or alternatively, a direct origin from HLIPs. Currently, the interpretation of the function and taxonomic distribution of these proteins is hampered by the absence of clearly defined families and a consistent framework of their evolution.
In the attempt to solve this problem, we systematically searched representative genomic and expressed sequence tag (EST) databases for members of the extended LHC protein superfamily with a focus on LHC-like sequences. Systematic analysis of their taxonomic distribution together with their primary and predicted secondary structures allowed us to provide a coherent classification and to propose an improved hypothesis for the evolution of this superfamily.
Results and Discussion
Classification of the extended LHC protein superfamily
For each protein family, the HMM profiles captured unique similarity patterns. Interestingly, the sequence logo-plots showed specific and highly conserved amino acid positions for given LHC, PSBS, RedCAP protein families and LHC-like protein subfamilies (marked by orange arrows in Figure 1), in addition to the ubiquitously conserved positions, like glutamate E-0 and arginine R-5. Due to their conservation pattern, these family-specific amino acid residues are expected to be functionally relevant and are likely correlated to specific molecular and physiological functions of the respective protein families. For example, several proline (P) residues are conserved at different positions in OHP1 and the RedCAP family (see orange arrows). Likewise, OHP2, ELIP, LHC and PSBS proteins all possess several specific and highly conserved residues of currently unknown function. In contrast, the sequence logos of the LHC-like protein subfamilies HLIP and SEP do not reveal uniquely conserved amino acid positions; such positions can, however, be found in subsets of HLIPs and SEPs (data not shown). The most likely explanation is that these two subfamilies are anciently paralogous and are the oldest subfamilies. Another set of amino acid positions of potential functional interest are residues that are conserved across a distinct subset of protein families, like several amino acids within the first TM helix and one (glutamine Q-28) at the C-terminal end of the first TM helix in SEPs, OHP2, ELIPs and PSBS (blue arrows in Figure 1).
A diverse set of sequences with the characteristic CB sequence motif was found in systematic database searches. The 15 organisms under study represent the three major lineages of Plantae (Archaeplastida), including two glaucophytes, two red algae, two green algae and four divergent land plants (a moss, a conifer, a monocot and a dicot), as well as two diatoms (stramenopiles) and three divergent cyanobacteria. Individual sequences are listed in Additional file 1, Table S1. The sensitivity of our search approach was demonstrated by the identification of numerous previously unreported sequences that belong to the extended LHC protein superfamily, including three sequences from the well-annotated genome of Arabidopsis thaliana. Homologous sequences were exclusively found in photosynthetic organisms and were present neither in the ciliates Tetrahymena thermophila and Paramecium tetraurelia nor in the oomycete stramenopile Phytophthora ramorum (related to diatoms), which were recently suggested to have had a photosynthetic ancestry. The only exceptions to this rule are transducing cyanophages (bacteriophages that infect cyanobacteria) that contained several HLIP sequences in their genomes.
The identified sequences of the extended LHC protein superfamily show a unique distribution across the taxonomic lineages. The presence/absence of LHC, RedCAP and PSBS families and several LHC-like subfamilies is presented in Figure 2B. An ancestral character analysis within the framework of an established consensus plastid phylogeny suggests the likely order of emergence of the different protein families (Figure 2C, see also Methods). Since the HLIP/OHP1 are ubiquitously distributed among eukaryotes and represent the only group (except for the fusion protein ferrochelatase II) that is also present in cyanobacteria, the cyanobacterial HLIPs are generally assumed to be the origin of the eukaryotic LHC protein superfamily. They are still plastid-encoded in glaucophytes and red algae. The plastid-encoded eukaryotic HLIPs are orthologs of the nuclear-encoded OHP1 sequences in the green lineage (Viridiplantae), which were transferred to the nucleus via endosymbiotic gene transfer, and apparently plastid-encoded HLIPs were lost in the green lineage. Some nuclear-encoded one-helix sequences from the red lineage and glaucophytes were named OHP1-like, but they showed no specific sequence similarity to OHP1. OHP2 are distributed ubiquitously across photosynthetic eukaryotes and differ from HLIP/OHP1 by possessing a short C-terminal hydrophobic element, which is possibly embedded in the thylakoid membrane. In addition, their significantly different primary sequence structure (Figure 1) makes them a unique group within the LHC-like family.
At least some LHC subfamilies, like CAB, fucoxanthin chlorophyll a/c-binding proteins (FCPs) and LI818, are present in all major red and green lineages, but apparently not in glaucophytes (Figure 2B). It seems highly unlikely that these abundant proteins would remain undetected in all EST approaches and thus would escape detection in the current study. This absence of molecular data is supported by immunological methods. A 28 kDa protein cross-reacting with an antibody raised against FCP of a marine raphidophyte was reported in glaucophytes. Unfortunately, the question whether this 28 kDa protein belongs to the LHC family remained unresolved, since it cannot be excluded that it only shares epitopes with LHC proteins but is structurally different. Based on these studies, Koziol and colleagues proposed an origin of LHC proteins at the basis of the green and red algal lineage that is in agreement with our conclusions.
RedCAPs and other new sequences
While most families of the extended LHC protein superfamily had been described earlier, the nuclear-encoded RedCAPs from the red lineage (Sturm S, Engelken J, Gruber A, Vugrines S, Adamska I, Kroth P, Lavaud J, unpublished) have not been defined yet. The RedCAP family and the OHP2 subfamily can be reliably assigned based on HMM and BLASTP analyses (Additional file 1, Table S1). RedCAP sequences form a well-conserved family, and in contrast to ELIP or LHC proteins also their second helix is conserved. In public databases, RedCAP sequences sometimes are erroneously described as HV60 (based on the name of an ELIP sequence from Hordeum vulgare). However, based on primary sequence similarity, sequence length, conservation patterns, HMM analyses and phylogenetic analyses there is no indication that RedCAPs were specifically related to any other group of the extended LHC protein superfamily (Sturm S, Engelken J, Gruber A, Vugrines S, Adamska I, Kroth P, Lavaud J, unpublished). In contrast to the almost ubiquitous OHP2 and SEPs, the RedCAPs are clearly restricted to the red lineage, whereas PSBS and ELIPs are limited to the green lineage without any overlap (Figure 2B and 2C).
In addition to two copies of the already described PSBS in the green alga Chlamydomonas reinhardtii [JGI_Chlre4: 196341 and 171516], we identified a third, rather divergent PSBS sequence, which we named PSBS-like [JGI_Chlre4: 175221]. Based on HMM analysis, BLASTP, phylogenetic analysis (data not shown) and the number of TM helices it can be clearly classified as a PSBS (or a PSBS-like) sequence (Additional file 1, Table S1). This new sequence encodes a 311 amino acid long protein that has a highly similar counterpart in Volvox carteri with a length of 316 amino acids [JGI_Volca1: 94261]. Both PSBS-like sequences are likely functional, based on the highly conserved exonic sequences in the C. reinhardtii - V. carteri comparison, although no EST is available. As a side-note, it has recently been shown that in C. reinhardtii, the common PSBS protein may not be translated under many growth conditions. Notably, a high number of LHC-like sequences were identified in the genome of Physcomitrella patens that had partially been described in the general genome analysis and in a second analysis with a special focus on the antenna gene supplement.
Broad taxonomic distribution of the two-helix SEP subfamily
SEPs were defined as a LHC-like subfamily with one characteristic CB motif and a conserved secondary structure with two TM helices. SEPs are absent in cyanobacteria, but they seem ubiquitously distributed in photosynthetic eukaryotes. A total of 40 SEP sequences were identified in 15 organisms. In the glaucophytes we found six sequences in Glaucocystis nostochinearum and two in C. paradoxa; in streptophytes we found six sequences in A. thaliana and nine, the largest number, in the moss P. patens (Additional file 1, Table S1). Among the Cyanidiales, the red alga Galdieria sulphuraria has one SEP, whereas the completely sequenced thermo- and extremophile red alga Cyanidioschyzon merolae has none. In algae with complex plastids, we have detected a single rather divergent SEP in each of the two diatoms Phaeodactylum tricornutum and Thalassiosira pseudonana and in the pelagophyte Aureococcus anophagefferens. The presence of only one SEP in these taxa could be due to a streamlining process of the genome size of these particular taxa. Therefore, SEPs do not appear to be essential in the red lineage and seem to have been secondarily lost from several taxa.
The similarity in predicted secondary structure of SEPs is shown for selected SEP sequences from the three major lineages of Plantae, i.e. glaucophytes (C. paradoxa), red lineage (G. sulphuraria) and green lineage (A. thaliana) in Figure 3B, using the Dense Alignment Surface (DAS) algorithm. The high similarity of their primary structure is displayed in Figure 3C. In cyanobacteria and cyanophages, no SEP-like sequences have been found, which is consistent with the idea of a eukaryotic origin of SEPs. The recent finding of a fusion protein with two predicted TM helices in a Synechococcus strain, termed hli5OS-B' (similar to YP_478210), is not relevant for our study, since the order of the two TM helices ("coh1" precedes the CB motif) is inverted compared to two-helix SEPs.
The identification of SEPs, as well as OHP2, in the red lineage and in glaucophytes (Figure 2B and 2C and Additional file 1, Table S1) is a notable extension of their previously known distribution within chlorophytes and streptophytes (Viridiplantae). This distribution argues strongly in favor of an early origin of SEPs and OHP2 in the common eukaryotic ancestor of Plantae that predates the origin of all three- and four-helix proteins (Figure 2C).
Independent origins of LHC and PSBS families in discrete duplication events
The PSBS proteins are predicted to form four-helix structures that were proposed to have originated by an internal gene duplication [31, 32]. It was also noted that the two halves of PSBS are more closely related to each other than the comparable parts of the LHC protein. In contrast to PSBS, the origin of LHC proteins is less clear. Among other scenarios, it was suggested that LHC and PSBS proteins evolved from a common four-helix ancestor (Additional file 1, Figure S2).
In the tri-partite diagram, 94.1% of the quartets support a sister-group relationship of helices I and III within each of the LHC and PSBS families, whereas 2.5% support the topology expected under the scenario of a common origin. In the more stringent diagram (Figure 4A, right bottom triangle divided into seven areas), the result was 84.2% versus 0.2% and 0.5%, with 10.9% that are not in favor of any of the three alternative topologies (unresolved quartets). This result strongly contradicts the often favored, long-standing evolutionary scenario of a common origin shown in Additional file 1, Figure S2A. Under that scenario the first helices of LHC and PSBS proteins, and likewise their third helices, would most resemble each other and therefore cluster together in a phylogenetic tree. This tree topology is displayed at the bottom right corner of the PUZZLE triangle diagram of Figure 4A and should be the only one supported if the common-origin scenario were correct. However, this alternative is supported by only 2.5% and 0.2% of the quartets, respectively. The support for the common origin is therefore even lower than the support for the biologically unrealistic solution of a hybrid LHC/PSBS protein, which is nevertheless supported by 3.4% and 0.5%, respectively (Figure 4A, left bottom triangle).
We accounted for the high degree of sequence diversity in the complex LHC protein family by incorporating all five LHC subfamilies (CAB, Li818 and Li818-like, the red algae/cryptomonad LHC and FCP), as well as a newly described clade LHCz. PSBS sequences were likewise chosen from taxonomically distant green algae and land plants. When the dataset was reduced by removing the fast-evolving (long-branch) Ostreococcus tauri PSBS and the PSBS-like sequences from V. carteri and C. reinhardtii, as well as the divergent LHCz and FCP sequences, the percentage of unresolved quartets strongly decreased and the support for the clustering of helices I and III was further improved (99.8% versus 0.2%, Additional file 1, Figure S4A). However, to include the maximal sequence diversity we show the more conservative result of the larger dataset with 120 sequences (Figure 4A). The presence of a great number of phylogenetically diverse sequences resulted in a highly informative alignment, despite its short length of 32 amino acid positions. Sequences could be readily aligned due to the virtual absence of both gaps and insertions within and surrounding the CB-TM helices. We chose to limit the analysis to a short but accurate alignment and avoided the potentially dangerous inclusion of many unreliably aligned positions. Nevertheless, the result is quite robust to the inclusion of more noisy positions (data not shown), but note the major difference observed after the removal of fast-evolving and divergent sequences (84.2% versus 99.8%). The surprisingly strong phylogenetic signal is also reflected in a low percentage of partially (4.2%) and completely (10.9%) unresolved quartets in this analysis (Figure 4A), which are essentially due to the inclusion of divergent primary sequences.
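How quartets end up in the areas of a likelihood-mapping triangle can be illustrated with a minimal Python sketch. It assumes that each quartet comes with three posterior weights (one per topology, summing to 1), that the three-area diagram assigns a quartet to the topology with the largest weight, and that the seven-area diagram assigns it to the nearest of seven attractor points. The labels T1 to T3 and the function names are hypothetical, and this is not the Tree-Puzzle implementation.

```python
from math import dist

# Attractor points of the seven-area triangle (assumed partition rule):
# three corners (one topology), three edge midpoints (two topologies
# indistinguishable) and the centre (star-like, fully unresolved).
ATTRACTORS = {
    "T1": (1.0, 0.0, 0.0),
    "T2": (0.0, 1.0, 0.0),
    "T3": (0.0, 0.0, 1.0),
    "T1|T2": (0.5, 0.5, 0.0),
    "T1|T3": (0.5, 0.0, 0.5),
    "T2|T3": (0.0, 0.5, 0.5),
    "star": (1 / 3, 1 / 3, 1 / 3),
}


def three_areas(weights):
    """Assign a quartet to the topology with the largest weight."""
    labels = ("T1", "T2", "T3")
    return labels[max(range(3), key=lambda i: weights[i])]


def seven_areas(weights):
    """Assign a quartet to the closest of the seven attractor points."""
    return min(ATTRACTORS, key=lambda name: dist(weights, ATTRACTORS[name]))


def percentages(quartets, assign):
    """Percentage of quartets falling into each area under `assign`."""
    counts = {}
    for weights in quartets:
        area = assign(weights)
        counts[area] = counts.get(area, 0) + 1
    return {area: 100.0 * n / len(quartets) for area, n in counts.items()}


# Invented weights for four quartets, for illustration only.
toy_quartets = [
    (0.90, 0.05, 0.05),
    (0.48, 0.47, 0.05),
    (0.34, 0.33, 0.33),
    (0.97, 0.02, 0.01),
]
print(percentages(toy_quartets, three_areas))
print(percentages(toy_quartets, seven_areas))
```

The toy data show why the seven-area diagram is the more stringent one: quartets whose largest weight is only marginally ahead still count for a topology in the three-area diagram, but move to a partially or fully unresolved area in the seven-area diagram.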
In the attempt to further validate our finding we inferred a maximum likelihood tree using a representative set of CB-TM sequences from LHC and PSBS families and the SEP subfamily (shown schematically in Figure 4B and entirely in Additional file 1, Figure S3). The results revealed that helices I and III of PSBS protein formed a monophyletic group (bootstrap value 85/83, with and without gamma correction) and the same situation was encountered for the LHC helices, albeit with weaker support (bootstrap value 51/41). To test if this topology (Figure 4B) was significantly better than the one expected under the old scenario (Additional file 1, Figure S2B), the expected likelihood weight and the Shimodaira-Hasegawa topology tests were performed in Tree-Puzzle. The scenario of Figure 4B is supported at a very high significance level (p = 0.0001) by both tests. In agreement with this result, we note that the duplicated area within both LHC and PSBS sequences extends substantially beyond the shared CB-TM helices and is not homologous between the two groups.
Functional constraints acting on CB motifs could hypothetically interfere with the genuine phylogenetic signal analyzed. However, this should affect functional sites, and these sites (like glutamate E+0, histidine/asparagine H/N+3 or arginine R+5 in LHCII from spinach, Spinacia oleracea) are conserved to such a high degree that they essentially do not contribute to the phylogenetic signal. This was confirmed in an additional likelihood mapping analysis in which these three functional sites were omitted (89.3% versus 5.0%, Additional file 1, Figure S4B). Furthermore, the CB motifs are generally under purifying selection maintaining structure and function. This selective force results in divergent rather than convergent evolution. Therefore, we conclude that potential functional constraints do not substantially interfere with the phylogenetic signal.
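Omitting functional sites from an alignment before re-running an analysis amounts to deleting alignment columns. The following minimal Python sketch assumes an alignment held as a dictionary of equal-length strings and a known column index for the conserved glutamate; the toy sequences, names and offsets are invented for illustration and are not taken from the dataset.

```python
def drop_columns(alignment, omit):
    """Return a copy of the alignment without the given 0-based columns."""
    omit = set(omit)
    return {
        name: "".join(c for i, c in enumerate(seq) if i not in omit)
        for name, seq in alignment.items()
    }


# Hypothetical toy alignment; the glutamate sits in column 1.
toy_alignment = {"seq_a": "GEAAHGRL", "seq_b": "SEIKNGRW"}
glutamate_column = 1

# Sites at +0, +3 and +5 relative to the glutamate, as named in the text.
functional_columns = [glutamate_column + offset for offset in (0, 3, 5)]
masked = drop_columns(toy_alignment, functional_columns)
print(masked)
```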
Character evolution and the possible origins of LHC and PSBS families from distinct SEPs
Since all LHC subfamilies share a common origin and are absent in glaucophytes, the first LHC proteins most likely evolved in a common ancestor of the red and green lineages (Figure 2C). The presence of two-helix SEPs in all three lineages of plants, and their existence in the common ancestor as deduced from this distribution, potentially already in the form of paralogous copies (Figure 2C), makes them prime candidates for the origin of LHC proteins. The internal gene duplication of a two-helix sequence would provide a simple and parsimonious explanation for the origin of the second, less-conserved CB-TM helix in LHC proteins. This makes SEPs a better candidate for the origin of LHC proteins than the previously proposed HLIPs [4, 19]. Furthermore, in eukaryotes HLIPs tend to occur as single copy genes and are plastid-encoded, whereas the internal gene duplication/unequal crossing-over event of tandem genes from which the first LHC protein evolved is very likely to have taken place in the nuclear genome. Several different processes may result in an internal gene duplication: (i) a slippage of the replication apparatus may lead either to a duplicated or to a deleted region; this process happens rather frequently and leads to duplicated areas (genes) arranged in tandem; and/or (ii) if there are already at least two closely related copies of a gene arranged in tandem, an unequal crossing-over between different copies on the two sister-chromosomes may lead to a fusion of parts of two genes (resulting in an internal duplication) on one chromosome and to a truncated copy on the other.
In light of these mechanistic considerations it seems that genes which (i) occur in tandem repeat units and (ii) are nuclear-encoded most likely provided the genomic context for the proposed internal gene duplication. In addition, even a nuclear copy of a HLIP/OHP arranged in tandem would not be sufficient to generate a LHC protein, since it lacks a second transmembrane helix, which would need to be newly created. Again, all these characteristics favor SEPs over HLIPs as candidates for the origin of LHC proteins.
Most members of the LHC family contain the well-conserved carotenoid-binding motif consisting of the amino acid residues FDPLGL (or similar) found approximately 15 amino acid positions in front of the CB motif in both the first and third TM helices. However, neither RedCAP and ELIP nor PSBS family members harbor this specific carotenoid-binding motif in either of the two possible locations. This would make a two-helix protein that already contained the carotenoid-binding motif the most likely source for the origin of LHC proteins. Intriguingly, we found a SEP sequence (named here SEPx.4) in the glaucophyte G. nostochinearum that contains the three core amino acid residues FDP of the carotenoid-binding motif at the expected distance from the CB motif (Additional file 1, Figure S5).
Evaluation of alternative scenarios
Many attempts have been made to solve the question of the order in which the different families and subfamilies of the extended LHC protein superfamily have originated [5, 16, 19, 35, 36]. The existence of a LHC-like protein with only two TM helices as the ancestor of three- and four-helix proteins was already predicted more than one decade ago [16, 37], but the first experimental proof was only presented many years later in A. thaliana. While the number and diversity of identified sequences and families progressively increased, the order of their emergence remained enigmatic. The reason for this major limitation lies in the small size of their defining element, the CB helix, and the considerable age of the families under study, which renders it impossible to simply deduce the order of their emergence from a phylogenetic analysis of the primary sequences. In order to overcome this limitation, we took advantage of (i) the new wealth of sequence data with special emphasis on completely sequenced genomes, (ii) recent multi-gene phylogenies that established a solidly supported phylogenomic tree of plastids, as well as the basal position of glaucophytes, and (iii) an independent phylogenetic approach in which we test the hypothesis of independent origins of the LHC and PSBS families.
An initial hypothesis for the order of emergence of the different family members was deduced from their taxonomic distribution using the ancestral character evolution analysis (Figure 2C). Although the "tree of eukaryotes" is still far from being resolved, the topology of the underlying (plastid) tree of photosynthetic eukaryotes used in this study is supported by several publications [40–43]. By relying on established group/species relationships, character evolution is independent of the potential pitfalls of phylogenies based on a single or a few genes.
The establishment of a rigorous classification scheme for the various families of the extended LHC protein superfamily was based on primary and secondary sequence information from a comprehensive database search. The fact that independent approaches (BLAST, HMM, different phylogenetic methods, TM helix analysis) led to mutually compatible results makes us confident that the proposed relationships reflect biological reality in some detail.
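The reasoning from taxonomic distribution to order of emergence can be illustrated with a minimal Python sketch. It is not the ancestral character analysis behind Figure 2C. It assumes a simplified four-lineage tree with cyanobacteria as outgroup and glaucophytes basal, a single gain per family with no horizontal transfer, and a coarse presence table compiled from the distributions described in the text; all names are hypothetical.

```python
# Simplified plastid tree as nested tuples (an assumption of this sketch).
TREE = ("cyanobacteria", ("glaucophytes", ("red lineage", "green lineage")))

EUKARYOTES = {"glaucophytes", "red lineage", "green lineage"}

# Coarse presence table per lineage, compiled from the text.
PRESENCE = {
    "HLIP/OHP1": EUKARYOTES | {"cyanobacteria"},
    "OHP2": EUKARYOTES,
    "SEP": EUKARYOTES,
    "LHC": {"red lineage", "green lineage"},
    "RedCAP": {"red lineage"},
    "ELIP": {"green lineage"},
    "PSBS": {"green lineage"},
}


def leaves(node):
    """Set of lineage names below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def smallest_clade(node, taxa):
    """Leaf set of the smallest clade that contains every lineage in taxa."""
    if not isinstance(node, str):
        for child in node:
            if taxa <= leaves(child):
                return smallest_clade(child, taxa)
    return leaves(node)


# Under a single-gain assumption, a family arose no later than the common
# ancestor of the smallest clade holding it; larger clades imply older families.
for family in sorted(PRESENCE, key=lambda f: -len(smallest_clade(TREE, PRESENCE[f]))):
    print(family, sorted(smallest_clade(TREE, PRESENCE[family])))
```

In this toy setting HLIP/OHP1 maps to the root, OHP2 and SEP to the common ancestor of the three eukaryotic lineages, LHC to the ancestor of the red and green lineages, and RedCAP, ELIP and PSBS to single lineages.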
A possible alternative for the origin of certain families could be to assume their origin at an earlier stage, e.g. the PSBS in the common ancestor of red/green lineages. However, apart from requiring a complete secondary loss in several lineages, this scenario would not provide more plausible explanations for the origin of the remaining families. Individual families, nevertheless, may have experienced isolated losses in certain taxonomic groups, like the SEPs that were lost in the extremophile red alga C. merolae and in certain algae with complex plastids, e.g. Emiliania huxleyi, but not in G. sulphuraria, in diatoms or in the pelagophyte A. anophagefferens. A broad taxonomic distribution, the presence of CB motifs and a conserved secondary structure would support a role of SEPs as recurrent building blocks of three-helix proteins, like LHC and four-helix proteins, like PSBS.
Based only on EST databases we cannot rule out the presence of additional relevant protein families of the extended LHC protein superfamily, for example in the two glaucophytes. However, we note that the chosen databases present very substantial numbers of unique ESTs. For C. paradoxa, 9,867 unique EST clusters are available at TBestDB, which were derived from two different EST libraries, one based on mRNAs from "high light" and the other from "low light regular" conditions. TBestDB also contains 4,673 EST clusters derived from C. paradoxa grown under different CO2 environments, as well as 8,745 unique EST clusters from G. nostochinearum. Additional glaucophyte ESTs are available from NCBI. Hence, the risk of overlooking important relevant protein families is substantially reduced due to the availability of different environmental conditions, with the ultimate proof being the genome sequences of these two distantly related glaucophytes. Based on available resources we currently assume that ELIP, RedCAP, LHC and PSBS proteins do not exist in glaucophytes. In addition, even if new LHC-like families were found in glaucophytes, this would not affect the independent origin of LHC and PSBS families.
Hypothesis for the evolution of the extended LHC protein superfamily
The proposed model (Figure 6) can explain the diversity of the extended LHC protein superfamily in cyanobacteria and photosynthetic eukaryotes. Notably, paralogs of the identified two-helix SEPs likely represent an important missing link in the evolution from the ancestral HLIPs to their three- and four-helix descendants in eukaryotes. Furthermore, this model invokes neither events of horizontal gene transfer nor massive secondary losses, although it requires an additional internal gene duplication event. The discovery of many new and sometimes distantly related sequences (e.g. three in the well-annotated A. thaliana genome) suggests that our search has identified all available canonical LHC-like sequences in the surveyed genomes. Additional database searches (see Methods) were in agreement with these results and conclusions.
Using sequence data from a wide diversity of photosynthetic eukaryotes, cyanobacteria and non-photosynthetic organisms we identified many new members of the extended LHC protein superfamily. We propose a simple and powerful classification scheme based on predicted primary and secondary structures. A new and coherent hypothesis of the evolution of the extended LHC protein superfamily was inferred (Figure 6), supported by comparative genomics and molecular phylogenetic approaches. Importantly, the present study sheds light on the significance of two-helix SEPs and other LHC-like proteins with the discovery of their unexpected diversity and characteristic distribution across photosynthetic eukaryotes. From these evolutionary patterns we expect that proteins of the LHC-like family perform important, yet largely unknown, functions in photoprotection and regulation of photosynthesis.
Sequence search and annotation
Initially, fully sequenced genomes and large EST databases (Additional file 1, Table S1) representing twelve photosynthetic eukaryotes (Plantae) and three cyanobacteria were searched for sequences belonging to the extended LHC protein superfamily. Subsequently, sequence data from additional genomes were collected from public databases including TBestDB, NCBI http://www.ncbi.nlm.nih.gov, TIGR http://www.jcvi.org, Kazusa http://bacteria.kazusa.or.jp/cyanobase and UniProt http://www.uniprot.org. We excluded some available genomes from the ancestral character evolution analysis either because of their unclear taxonomic position (E. huxleyi), their preliminary nature (Fragilariopsis cylindrus, A. anophagefferens, V. carteri, Chlorella sp., Micromonas sp. and Selaginella moellendorffii), or their highly similar content of LHC-like sequences to A. thaliana (Populus trichocarpa, Arabidopsis lyrata, Vitis vinifera), Oryza sativa (Sorghum bicolor) or Ostreococcus lucimarinus (other Ostreococcus spp.) genomes. Database searches were done with the TBLASTN and BLASTP algorithms using consensus sequences for individual subgroups and non-stringent e-values (e = 0.1). When public annotations were unclear or missing, the genes were annotated manually with the help of the GeneWise algorithm and the tools at the genome browser of the Joint Genome Institute http://genome.jgi-psf.org. EST sequences were translated and manually controlled for frame-shifts that might have created artifacts and gene models were submitted to TPA_inf at NCBI http://www.ncbi.nlm.nih.gov/Genbank/TPA-Inf. For transit peptide prediction the predictor ChloroP was used http://www.expasy.org/tools.
In diatoms, the presence/absence of a characteristic N-terminal signal sequence and in red algae the twin-arginine motif were used for signal peptide prediction. All identified LHC-like sequences from 15 organisms are given in Additional file 1, Table S1. Genes with identical deduced amino acid sequence, but different genomic location, as well as closely related ELIPs arranged in tandem (especially in P. patens), were listed under a single accession number. In general, annotation was straightforward due to the shortness and intron-scarcity of most LHC-like sequences, as well as due to the presence of transit peptides and signal sequences.
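Applying the non-stringent e-value cutoff to search results can be sketched as follows in Python. The sketch assumes that hits are available in the standard 12-column tab-separated BLAST tabular layout, which is an assumption about the output format and not something stated above; the sample rows and identifiers are invented for illustration.

```python
import csv
import io

# Column names of the standard 12-column BLAST tabular output.
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(tabular_text, max_evalue=0.1):
    """Return (query, subject, e-value) for hits at or below the cutoff."""
    reader = csv.DictReader(
        io.StringIO(tabular_text), fieldnames=FIELDS, delimiter="\t"
    )
    hits = []
    for row in reader:
        evalue = float(row["evalue"])
        if evalue <= max_evalue:
            hits.append((row["qseqid"], row["sseqid"], evalue))
    return hits


# Invented example rows: two pass the cutoff of 0.1, one does not.
SAMPLE = (
    "SEP_consensus\tseqA\t45.0\t60\t30\t1\t1\t60\t10\t69\t2e-15\t70.1\n"
    "SEP_consensus\tseqB\t28.0\t40\t27\t2\t5\t44\t3\t42\t0.05\t30.0\n"
    "SEP_consensus\tseqC\t25.0\t30\t22\t1\t2\t31\t8\t37\t3.2\t22.0\n"
)
print(filter_hits(SAMPLE))
```

A permissive cutoff like this is expected to admit false positives, which is why candidate hits were subsequently checked by manual annotation and classification.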
Classification of sequences
For the HMM analysis we prepared seed alignments containing the first CB motif for each known or newly found protein family. For HLIPs and SEPs, several starting sequences were chosen in order to adequately cover their entire sequence diversity. The length of all alignments was limited to 54 amino acid positions to allow comparison across all families. The seed alignments were augmented in a step-wise manner with the best hits from a sequence search in our local dataset, consisting of all identified sequences from 15 organisms, and used to create sequence logos (http://weblogo.berkeley.edu, Figure 1). From the same alignments we built a conservative HMM database containing 18 profiles (Additional file 1, Table S1). After calibration we searched the entire local collection of LHC-like sequences from 15 organisms against this HMM database. The three best hits to the local HMM profile database are given in Additional file 1, Table S1. Starting from full-length sequences, we used BLASTP version 2.2.10 to search all sequences against the same local collection of sequences. P. patens ELIPs were not numbered because of the number of ELIPs arranged in tandem, and they were therefore not part of the Local Reference Set used for BLASTP. The four best local BLASTP hits against the local collection of LHC-like sequences are given in Additional file 1, Table S1. HMM profiles were the most sensitive tool for classification of LHC-like sequences, and this classification was complemented by local BLASTP analysis, which has the advantage of using the entire sequence. Prediction of prokaryotic TM alpha-helices was done with the DAS program [29, 50] (all proteins were treated as prokaryotic, since they are most likely of cyanobacterial origin and are active in the chloroplast). CB motifs were automatically designated as TM helices due to their experimentally derived helix structure in an LHC protein from photosystem II [34, 51].
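Reporting the three best profile hits per sequence, as described above, amounts to a simple ranking step once the search scores are available. The following sketch assumes the scores have already been parsed into (sequence, profile, score) triples; the function name `best_hits`, the sequence identifier and the scores are hypothetical, and no particular HMM software output format is implied.

```python
from collections import defaultdict


def best_hits(scores, n=3):
    """Return the n highest-scoring profiles for every sequence.

    scores: iterable of (sequence_id, profile_name, score) triples.
    """
    by_sequence = defaultdict(list)
    for sequence_id, profile, score in scores:
        by_sequence[sequence_id].append((profile, score))
    return {
        sequence_id: sorted(hits, key=lambda hit: hit[1], reverse=True)[:n]
        for sequence_id, hits in by_sequence.items()
    }


# Hypothetical scores for one sequence against four family profiles.
EXAMPLE_SCORES = [
    ("seq_X", "ELIP", 30.5),
    ("seq_X", "SEP", 42.0),
    ("seq_X", "PSBS", 3.1),
    ("seq_X", "OHP2", 12.0),
]

for sequence_id, hits in best_hits(EXAMPLE_SCORES).items():
    print(sequence_id, hits)
```

The same function with `n=4` would give the four best hits, matching the number reported for the local BLASTP comparison.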
The most efficient criteria for classification differed slightly among the protein families and depended on their degree of conservation and on the length of conserved sequence domains. The HLIPs are clearly defined by their one-helix structure, together with being plastid-encoded in eukaryotes. HMM analysis, together with the predicted one-helix structure, is sufficient to define the OHP1 subfamily, which is nuclear-encoded after endosymbiotic gene transfer of the HLIPs in green plants. OHP2 are best classified by HMM and local BLASTP analysis due to their well-conserved primary protein structure. SEPs were classified based on the order of their two TM helices (the CB motif-containing helix precedes a second TM helix) and sequence similarity to other SEPs, as evident from HMM and BLASTP analyses. Subdivision into SEP1-5 was based on HMM, BLASTP and phylogenetic analysis. ELIP sequences were classified based on HMM, BLASTP and their three-helix structure. Likewise, RedCAP and LHC sequences, including LHC proteins associated with photosystem I (LHCa) and photosystem II (LHCb), FCP, Li818 and LHCz, and the four-helix PSBS were unequivocally classified based on HMM, BLASTP and their predicted number of helices. The two fusion proteins, ferrochelatase II and Rieske-like CAB protein, possess a less-conserved CB motif (which can be missing in some cases); therefore, the most efficient classification criterion for these two groups was full-length sequence similarity based on BLASTP.
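The structural part of these criteria can be expressed as a small consistency check between a sequence-based family assignment and the predicted helix topology. This is only a sketch under stated assumptions: the helix counts are limited to the families for which the paragraph above gives a number, families without a stated count are left undecided, and the function names and the boolean encoding of helices are hypothetical.

```python
# Helix counts stated in the text; other families are deliberately omitted.
EXPECTED_HELICES = {"HLIP": 1, "OHP1": 1, "SEP": 2, "ELIP": 3, "PSBS": 4}


def helix_count_agrees(family, predicted_helices):
    """Compare a predicted TM helix count with the count stated for the family.

    Returns None when the text gives no helix count for the family.
    """
    expected = EXPECTED_HELICES.get(family)
    if expected is None:
        return None
    return predicted_helices == expected


def has_sep_helix_order(helices):
    """Check the SEP arrangement: a CB motif helix followed by a second helix.

    helices: list of booleans in N- to C-terminal order, True where the
    helix carries the CB motif (an encoding assumed for this sketch).
    """
    return helices == [True, False]


print(helix_count_agrees("PSBS", 4))
print(helix_count_agrees("ELIP", 2))
print(has_sep_helix_order([True, False]))
```

A disagreement flagged by such a check would not by itself reassign a sequence; it only marks cases where the HMM or BLASTP result and the helix prediction need to be inspected together.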
Character evolution and phylogenetic analysis
Character evolution based on parsimony (unordered model), as implemented in Mesquite, was used for the reconstruction of ancestral states. The analyzed taxa were chosen to obtain a good representation of all photosynthetic organisms that possess members of the extended LHC protein superfamily. Amino acid sequence alignments were done with M-Coffee and manually refined in BioEdit. Informative sites for phylogenetic analyses were chosen using Gblocks with manual refinement. Amino acid substitution matrices for the SEP analysis (Figure 3A) were chosen with ProtTest. Neighbor-joining bootstrap values (10,000 replicates) were obtained in MEGA4. Maximum likelihood bootstrap analyses with 100 replicates were performed using PhyML, and posterior probabilities were calculated using MrBayes (3 million generations, the first 1 million trees were discarded as "burn-in"), both using a WAG+Γ4 model (Figure 3C and Additional file 1, Figure S2). Consensus trees were created with the Consense option of the PHYLIP package. The significance of alternative topologies (Figure 3) was tested in Tree-Puzzle using the Shimodaira-Hasegawa and the expected likelihood weight tests.
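Parsimony reconstruction under an unordered model treats every state change as equally costly. A minimal sketch of the bottom-up pass of Fitch's algorithm, which is the textbook procedure for this model, is given below; it is not the Mesquite implementation, it returns only the state set at the root and the minimum number of changes, and the tree, taxon labels and presence/absence states are hypothetical.

```python
def fitch(tree, tip_states):
    """Bottom-up Fitch pass on a rooted binary tree.

    tree: a tip name (str) or a 2-tuple of subtrees.
    tip_states: dict mapping tip name -> observed character state.
    Returns (set of most parsimonious root states, minimum number of changes).
    """
    if isinstance(tree, str):
        return {tip_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, tip_states)
    right_set, right_cost = fitch(right, tip_states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # The children disagree: keep both possibilities and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical example: presence (1) or absence (0) of a protein family.
TREE = (("taxon_A", "taxon_B"), ("taxon_C", "taxon_D"))
STATES = {"taxon_A": 1, "taxon_B": 1, "taxon_C": 0, "taxon_D": 1}

root_states, changes = fitch(TREE, STATES)
print(root_states, changes)
```

In this example the family is inferred as present at the root with a single change, corresponding to one loss on the branch leading to `taxon_C`. Resolving the states at every internal node would require an additional top-down pass, which is omitted here.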
The four-cluster likelihood mapping analysis was performed with Tree-Puzzle using the Dayhoff substitution matrix with four discrete gamma-distributed rate categories. Approximate parameter estimation, with quartet sampling for the substitution process and rate variation based on a neighbor-joining tree, was used together with 10,000 randomly chosen quartets. The dataset included a total of 120 sequences, with 41 pairs of LHC and 19 pairs of PSBS (helices I and III) sequences.
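In likelihood mapping, each quartet is summarized by the normalized likelihoods of its three possible topologies, and the resulting point is placed in a triangle. The sketch below computes these weights from log-likelihoods and assigns the point to the nearest of seven attractor points (three corners, three edge midpoints and the center). The nearest-attractor rule and the region labels are assumptions made for this illustration; this is not the Tree-Puzzle code, and the log-likelihood values are hypothetical.

```python
import math

# Attractor points of the triangle in barycentric coordinates (assumed
# partition: three resolved corners, three partly resolved edges, one center).
ATTRACTORS = {
    "T1": (1.0, 0.0, 0.0),
    "T2": (0.0, 1.0, 0.0),
    "T3": (0.0, 0.0, 1.0),
    "T1/T2": (0.5, 0.5, 0.0),
    "T1/T3": (0.5, 0.0, 0.5),
    "T2/T3": (0.0, 0.5, 0.5),
    "star": (1 / 3, 1 / 3, 1 / 3),
}


def posterior_weights(log_likelihoods):
    """Normalize three topology likelihoods so that they sum to one."""
    top = max(log_likelihoods)
    # Subtract the maximum before exponentiating to avoid underflow.
    relative = [math.exp(value - top) for value in log_likelihoods]
    total = sum(relative)
    return tuple(value / total for value in relative)


def region(weights):
    """Return the label of the attractor closest to the weight vector."""
    return min(ATTRACTORS, key=lambda name: math.dist(weights, ATTRACTORS[name]))


# Hypothetical log-likelihoods for the three topologies of one quartet.
weights = posterior_weights([-1000.0, -1010.0, -1012.0])
print(weights, region(weights))
```

Counting how many of the sampled quartets fall into each region then gives the percentages that are usually reported for such an analysis.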
Sequence data of newly identified sequences from C. paradoxa and G. nostochinearum are available in the Third Party Annotation Section of the DDBJ/EMBL/GenBank databases under the accession numbers TPA: BK006744-BK006754. Gene models of all identified sequences from 15 organisms are listed in Additional file 1, Table S1.
Helpful suggestions from Cédric Notredame and an initial discussion with Tal Dagan are kindly acknowledged. We thank Verena Vogler for help with graphics and two anonymous reviewers for helpful comments. We acknowledge all public databases used in this study, especially the TBestDB http://tbestdb.bcm.umontreal.ca/searches/login.php in Montreal, Canada, as well as the U.S. Department of Energy Joint Genome Institute (JGI, http://www.jgi.doe.gov) at Walnut Creek, California, for pre-publication access to the Physcomitrella genome. The work conducted by the JGI is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. This work was supported by grants from the Deutsche Forschungsgemeinschaft (AD-92/7-2) and the Konstanz University to IA. JE was supported by a grant (I/82 750) from the Volkswagenstiftung, "Förderungsinitiative Evolutionsbiologie".
- Durnford DG, Deane JA, Tan S, McFadden GI, Gantt E, Green BR: A phylogenetic assessment of the eukaryotic light-harvesting antenna proteins, with implications for plastid evolution. J Mol Evol. 1999, 48: 59-68. 10.1007/PL00006445.
- Montané MH, Kloppstech K: The family of light-harvesting-related proteins (LHCs, ELIPs, HLIPs): was the harvesting of light their primary function?. Gene. 2000, 258: 1-8. 10.1016/S0378-1119(00)00413-3.
- Adamska I: The Elip family of stress proteins in the thylakoid membranes of pro- and eukaryota. Regulation of Photosynthesis. Edited by: Aro EM, Andersson B. 2001, Kluwer Academic Publishers, 487-505.
- Green BR: The evolution of light-harvesting antennas. Light-harvesting Antennas in Photosynthesis. Edited by: Green BR, Parson WW. 2003, Kluwer Academic Publishers, 129-168.
- Jansson S: A protein family saga: From photoprotection to light-harvesting (and back?). Photoprotection, Photoinhibition, Gene Regulation, and Environment. Edited by: Demmig-Adams B, Adams WW, Mattoo AK. 2006, Springer, 145-153.
- Chow KS, Singh DP, Walker AR, Smith AG: Two different genes encode ferrochelatase in Arabidopsis: mapping, expression and subcellular targeting of the precursor proteins. Plant J. 1998, 15: 531-541. 10.1046/j.1365-313X.1998.00235.x.
- Suzuki T, Masuda T, Singh DP, Tan FC, Tsuchiya T, Shimada H, Ohta H, Smith AG, Takamiya K: Two types of ferrochelatase in photosynthetic and nonphotosynthetic tissues of cucumber: their difference in phylogeny, gene expression, and localisation. J Biol Chem. 2002, 277: 4731-4737. 10.1074/jbc.M105613200.
- La Roche J, van der Staay GW, Partensky F, Ducret A, Aebersold R, Li R, Golden SS, Hiller RG, Wrench PM, Larkum AW, et al: Independent evolution of the prochlorophyte and green plant chlorophyll a/b light-harvesting proteins. Proc Natl Acad Sci USA. 1996, 93: 15244-15248. 10.1073/pnas.93.26.15244.
- Funk C: The PsbS protein: A Cab-protein with a function of its own. Regulation of Photosynthesis. Edited by: Aro EM, Andersson B. 2001, Kluwer Academic Publishers, 453-467.
- Heddad M, Adamska I: Light stress-regulated two-helix proteins in Arabidopsis thaliana related to the chlorophyll a/b-binding gene family. Proc Natl Acad Sci USA. 2000, 97: 3741-3746. 10.1073/pnas.050391397.
- Jansson S, Andersson J, Kim SJ, Jackowski G: An Arabidopsis thaliana protein homologous to cyanobacterial high-light-inducible proteins. Plant Mol Biol. 2000, 42: 345-351. 10.1023/A:1006365213954.
- Andersson U, Heddad M, Adamska I: Light stress-induced one-helix protein of the chlorophyll a/b-binding family associated with photosystem I. Plant Physiol. 2003, 132: 811-820. 10.1104/pp.102.019281.
- Funk C, Vermaas W: A cyanobacterial gene family coding for single-helix proteins resembling part of the light-harvesting proteins from higher plants. Biochemistry. 1999, 38: 9397-9404. 10.1021/bi990545+.
- Dolganov NA, Bhaya D, Grossman AR: Cyanobacterial protein with similarity to the chlorophyll a/b binding proteins of higher plants: evolution and regulation. Proc Natl Acad Sci USA. 1995, 92: 636-640. 10.1073/pnas.92.2.636.
- Li XP, Björkman O, Shih C, Grossman AR, Rosenquist M, Jansson S, Niyogi KK: A pigment-binding protein essential for regulation of photosynthetic light harvesting. Nature. 2000, 403: 391-395. 10.1038/35000131.
- Green BR, Pichersky E: Hypothesis for the evolution of three-helix chl a/b and chl a/c light-harvesting antenna proteins from two-helix and four-helix ancestors. Photosynth Res. 1994, 39: 149-162. 10.1007/BF00029382.
- Green BR: Was "molecular opportunism" a factor in the evolution of different photosynthetic light-harvesting pigment systems?. Proc Natl Acad Sci USA. 2001, 98: 2119-2121. 10.1073/pnas.061023198.
- Garczarek L, Poupon A, Partensky F: Origin and evolution of transmembrane chl-binding proteins: hydrophobic cluster analysis suggests a common one-helix ancestor for prokaryotic (Pcb) and eukaryotic (LHC) antenna protein superfamilies. FEMS Microbiol Lett. 2003, 222: 59-68. 10.1016/S0378-1097(03)00241-6.
- Koziol AG, Borza T, Ishida K, Keeling P, Lee RW, Durnford DG: Tracing the evolution of the light-harvesting antennae in chlorophyll a/b-containing organisms. Plant Physiol. 2007, 143: 1802-1816. 10.1104/pp.106.092536.
- Bassi R, Croce R, Cugini D, Sandona D: Mutational analysis of a higher plant antenna protein provides identification of chromophores bound into multiple sites. Proc Natl Acad Sci USA. 1999, 96: 10056-10061. 10.1073/pnas.96.18.10056.
- Tyler BM, Tripathy S, Zhang X, Dehal P, Jiang RH, Aerts A, Arredondo FD, Baxter L, Bensasson D, Beynon JL, et al: Phytophthora genome sequences uncover evolutionary origins and mechanisms of pathogenesis. Science. 2006, 313: 1261-1266. 10.1126/science.1128796.
- Reyes-Prieto A, Moustafa A, Bhattacharya D: Multiple genes of apparent algal origin suggest ciliates may once have been photosynthetic. Curr Biol. 2008, 18: 956-962. 10.1016/j.cub.2008.05.042.
- Lindell D, Sullivan MB, Johnson ZI, Tolonen AC, Rohwer F, Chisholm SW: Transfer of photosynthesis genes to and from Prochlorococcus viruses. Proc Natl Acad Sci USA. 2004, 101: 11013-11018. 10.1073/pnas.0401526101.
- Koike H, Shibata M, Yasutomi K, Kashino Y, Satoh K: Identification of photosystem I components from a glaucocystophyte, Cyanophora paradoxa: The PsaD protein has an N-terminal stretch homologous to higher plants. Photosynth Res. 2000, 65: 207-217. 10.1023/A:1010734912776.
- Rissler HM, Durnford DG: Isolation of a novel carotenoid-rich protein in Cyanophora paradoxa that is immunologically related to the light-harvesting complexes of photosynthetic eukaryotes. Plant Cell Physiol. 2005, 46: 416-424. 10.1093/pcp/pci054.
- Bonente G, Passarini F, Cazzaniga S, Mancone C, Buia MC, Tripodi M, Bassi R, Caffarri S: The occurrence of the psbS gene product in Chlamydomonas reinhardtii and in other photosynthetic organisms and its correlation with energy quenching. Photochem Photobiol. 2008, 84: 1359-1370. 10.1111/j.1751-1097.2008.00456.x.
- Rensing SA, Lang D, Zimmer AD, Terry A, Salamov A, Shapiro H, Nishiyama T, Perroud PF, Lindquist EA, Kamisugi Y, et al: The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science. 2008, 319: 64-69. 10.1126/science.1150646.
- Alboresi AS, Caffarri S, Nogue F, Bassi R, Morosinotto T: In silico and biochemical analysis of Physcomitrella patens photosynthetic antenna: identification of subunits which evolved upon land adaptation. PLoS ONE. 2008, 3: e2033. 10.1371/journal.pone.0002033.
- Cserzo M, Wallin E, Simon I, von Heijne G, Elofsson A: Prediction of transmembrane alpha-helices in prokaryotic membrane proteins: the dense alignment surface method. Protein Eng. 1997, 10: 673-676. 10.1093/protein/10.6.673.
- Kilian O, Soisig Steunou A, Grossman AR, Bhaya D: A novel two-domain fusion protein in cyanobacteria with similarity to the CAB/ELIP/HLIP superfamily: Evolutionary implications and regulation. Mol Plant. 2008, 1: 155-166. 10.1093/mp/ssm019.
- Kim S, Sandusky P, Bowlby NR, Aebersold R, Green BR, Vlahakis S, Yocum CF, Pichersky E: Characterization of a spinach psbS cDNA encoding the 22 kDa protein of photosystem II. FEBS Lett. 1992, 314: 67-71. 10.1016/0014-5793(92)81463-V.
- Wedel N, Klein R, Ljungberg U, Andersson B, Herrmann RG: The single-copy gene psbS codes for a phylogenetically intriguing 22 kDa polypeptide of photosystem II. FEBS Lett. 1992, 314: 61-66. 10.1016/0014-5793(92)81462-U.
- Strimmer K, von Haeseler A: Likelihood-mapping: a simple method to visualize phylogenetic content of a sequence alignment. Proc Natl Acad Sci USA. 1997, 94: 6815-6819. 10.1073/pnas.94.13.6815.
- Kühlbrandt W, Wang DN, Fujiyoshi Y: Atomic model of plant light-harvesting complex by electron crystallography. Nature. 1994, 367: 614-621. 10.1038/367614a0.
- Wolfe GR, Cunningham FX, Durnford D, Green BR, Gantt E: Evidence for a common origin of chloroplasts with light-harvesting complexes of different pigmentation. Nature. 1994, 367: 566-568. 10.1038/367566a0.
- Heddad M, Adamska I: The evolution of light stress proteins in photosynthetic organisms. Comp Funct Genom. 2002, 3: 504-510. 10.1002/cfg.221.
- Green BR, Kühlbrandt W: Sequence conservation of light-harvesting and stress-response proteins in relation to the three-dimensional molecular structure of LHCII. Photosynth Res. 1995, 44: 139-148. 10.1007/BF00018304.
- Reyes-Prieto A, Bhattacharya D: Phylogeny of nuclear-encoded plastid-targeted proteins supports an early divergence of glaucophytes within Plantae. Mol Biol Evol. 2007, 24: 2358-2361. 10.1093/molbev/msm186.
- Keeling PJ, Burger G, Durnford DG, Lang BF, Lee RW, Pearlman RE, Roger AJ, Gray MW: The tree of eukaryotes. Trends Ecol Evol. 2005, 20: 670-676. 10.1016/j.tree.2005.09.005.
- Martin W, Stoebe B, Goremykin V, Hapsmann S, Hasegawa M, Kowallik KV: Gene transfer to the nucleus and the evolution of chloroplasts. Nature. 1998, 393: 162-165. 10.1038/30234.
- Moreira D, Le Guyader H, Philippe H: The origin of red algae and the evolution of chloroplasts. Nature. 2000, 405: 69-72. 10.1038/35011054.
- Yoon HS, Hackett JD, Ciniglia C, Pinto G, Bhattacharya D: A molecular timeline for the origin of photosynthetic eukaryotes. Mol Biol Evol. 2004, 21: 809-818. 10.1093/molbev/msh075.
- Rodriguez-Ezpeleta N, Brinkmann H, Burey SC, Roure B, Burger G, Löffelhardt W, Bohnert HJ, Philippe H, Lang BF: Monophyly of primary photosynthetic eukaryotes: green plants, red algae, and glaucophytes. Curr Biol. 2005, 15: 1325-1330. 10.1016/j.cub.2005.06.040.
- O'Brien EA, Koski LB, Zhang Y, Yang L, Wang E, Gray MW, Burger G, Lang BF: TBestDB: a taxonomically broad database of expressed sequence tags (ESTs). Nucleic Acids Res. 2007, 35: D445-451. 10.1093/nar/gkl770.
- Birney E, Clamp M, Durbin R: GeneWise and Genomewise. Genome Res. 2004, 14: 988-995. 10.1101/gr.1865504.
- Kilian O, Kroth PG: Identification and characterization of a new conserved motif within the presequence of proteins targeted into complex diatom plastids. Plant J. 2005, 41: 175-183. 10.1111/j.1365-313X.2004.02294.x.
- Bendtsen JD, Nielsen H, Widdick D, Palmer T, Brunak S: Prediction of twin-arginine signal peptides. BMC Bioinformatics. 2005, 6: 167. 10.1186/1471-2105-6-167.
- Eddy SR: Profile hidden Markov models. Bioinformatics. 1998, 14: 755-763. 10.1093/bioinformatics/14.9.755.
- Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
- Cserzo M, Eisenhaber F, Eisenhaber B, Simon I: On filtering false positive transmembrane protein predictions. Protein Eng. 2002, 15: 745-752. 10.1093/protein/15.9.745.
- Liu Z, Yan H, Wang K, Kuang T, Zhang J, Gui L, An X, Chang W: Crystal structure of spinach major light-harvesting complex at 2.72 Å resolution. Nature. 2004, 428: 287-292. 10.1038/nature02373.
- Maddison WP, Maddison DR: Mesquite: A modular system for evolutionary analysis. Version 2.5. 2008, [http://mesquiteproject.org]
- Wallace IM, O'Sullivan O, Higgins DG, Notredame C: M-Coffee: combining multiple sequence alignment methods with T-Coffee. Nucleic Acids Res. 2006, 34: 1692-1699. 10.1093/nar/gkl091.
- Hall TA: BioEdit: A user-friendly biological sequence alignment editor and analysis program for Windows 95/98. Nucleic Acids Symp. 1999, Ser 41: 95-98.
- Castresana J: Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol. 2000, 17: 540-552.
- Abascal F, Zardoya R, Posada D: ProtTest: selection of best-fit models of protein evolution. Bioinformatics. 2005, 21: 2104-2105. 10.1093/bioinformatics/bti263.
- Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol. 2007, 24: 1596-1599. 10.1093/molbev/msm092.
- Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 2003, 52: 696-704. 10.1080/10635150390235520.
- Huelsenbeck JP, Ronquist F: MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics. 2001, 17: 754-755. 10.1093/bioinformatics/17.8.754.
- Felsenstein J: PHYLIP-Phylogeny Inference Package. 1993, Department of Genetics, University of Washington, Seattle, Distributed by the author.
- Schmidt HA, Strimmer K, Vingron M, von Haeseler A: TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics. 2002, 18: 502-504. 10.1093/bioinformatics/18.3.502.
- Shimodaira H, Hasegawa M: Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol Biol Evol. 1999, 16: 1114-1116.
- Strimmer K, Rambaut A: Inferring confidence sets of possibly misspecified gene trees. Proc Biol Sci. 2002, 269: 137-142. 10.1098/rspb.2001.1862.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.