ESTimating plant phylogeny: lessons from partitioning
- Jose EB de la Torre†1,
- Mary G Egan†2,
- Manpreet S Katari1,
- Eric D Brenner1, 3,
- Dennis W Stevenson3,
- Gloria M Coruzzi1 and
- Rob DeSalle2
© de la Torre et al; licensee BioMed Central Ltd. 2006
Received: 03 February 2006
Accepted: 15 June 2006
Published: 15 June 2006
While Expressed Sequence Tags (ESTs) have proven a viable and efficient way to sample genomes, particularly those for which whole-genome sequencing is impractical, phylogenetic analysis using ESTs remains difficult. Sequencing errors and orthology determination are the major problems when using ESTs as a source of characters for systematics. Here we develop methods to incorporate EST sequence information in a simultaneous analysis framework to address controversial phylogenetic questions regarding the relationships among the major groups of seed plants. We use an automated, phylogenetically derived approach to orthology determination called OrthologID to generate a phylogeny based on 43 process partitions, many of which are derived from ESTs, and examine several measures of support to assess the utility of EST data for phylogenies.
A maximum parsimony (MP) analysis resulted in a single tree with relatively high support at all nodes in the tree despite rampant conflict among trees generated from the separate analysis of individual partitions. In a comparison of broader-scale groupings based on cellular compartment (i.e., chloroplast, mitochondrial or nuclear) or function, only the nuclear partition tree (based largely on EST data) was found to be topologically identical to the tree based on the simultaneous analysis of all data. Despite topological conflict among the broader-scale groupings examined, only the tree based on morphological data showed statistically significant differences.
Based on the amount of character support contributed by EST data, which make up a majority of the nuclear data set, and the lack of conflict of the nuclear data set with the simultaneous analysis tree, we conclude that the inclusion of EST data does provide a viable and efficient approach to address phylogenetic questions within a parsimony framework on a genomic scale, if problems of orthology determination and potential sequencing errors can be overcome. In addition, approaches that examine conflict and support in a simultaneous analysis framework allow for a more precise understanding of the evolutionary history of individual process partitions and may be a novel way to understand functional aspects of different cellular classes of gene products.
Higher order Spermatophyte phylogeny: an unresolved systematics problem
In contrast, the majority of molecular phylogenies have postulated the gymnosperms to be a monophyletic group sister to all angiosperms. Most molecular studies place the Gnetales as the sister group to the conifers (Fig. 1B; [7–9, 15, 16]). However, some molecular evidence can also be interpreted as supporting the anthophyte theory . Attempts to associate molecular expression data with morphological structures (e.g. ) also place the Gnetales and conifers together, with shared expression of orthologous genes indicating that the Gnetum strobilar collar and ovule are homologous to the conifer bract-and-ovule/ovuliferous scale complex. Adding to the controversy, a recent study involving phytochrome genes (Fig. 1D; ) has placed the Gnetales as basal gymnosperms, with Ginkgoales and Cycadales as sister groups branching after the Coniferales. No recent combined analyses of molecular and morphological data have been produced, and a very early one was equivocal . In the past, this question has been addressed with a single partition and, more recently, with eight  and thirteen partitions . To our knowledge, no general consensus has yet been reached on the phylogenetic arrangement of these six major seed plant lineages. In fact, the Tree of Life website for Spermatophytes  resolves only two nodes involving the five relevant taxa listed above. In the Tree of Life study, Gnetales are shown as the sister group to angiosperms, yet this hypothesis and that of monophyletic gymnosperms with cycads sister to the other gymnosperms require very different sets of morphological concepts and transformations. For example, are carpels leaves with marginal ovules, or are they subtending leaves with axillary ovules? These are very different and mutually exclusive scenarios. Consequently, attempts to understand the genes involved in the innovations achieved by the angiosperms are severely hampered.
The literature on seed plant phylogenetics suggests that adding information relevant to the seed plants may be a viable way to solve this difficult problem. In addition, many studies on other taxa have demonstrated that the simultaneous analysis of multiple data partitions can increase overall branch support, despite conflict among the characters, because of emergent properties not evident in the separate analyses of individual data partitions [23–27]. A further benefit of adding process partitions to an analysis is that, once a large number of partitions from various cellular functional classes are available, partitioned analysis will also allow detailed examination of the evolutionary dynamics of these classes of genes. This may shed light on the role of certain genes in organismal evolution.
The (phylogenetic) trouble with ESTs
Sequencing errors and orthology determination pose challenges to the use of ESTs as a source of characters for systematics. Raw EST data can have a high sequencing error rate because they are derived from single-pass reads. One strategy to minimize this problem is contig assembly and EST clustering using several reads at every region (e.g. [39, 40]). In our approach, a minimum of 10 reads was used to determine each EST sequence. Although orthology assessment is also difficult in sub-genomic studies, such as those that use PCR or gene cloning to obtain sequences, it can be enhanced in such studies by careful primer design and by using the whole-genome sequences of closely related model taxa as guides. Assessing the orthology of ESTs is more difficult, because of the inaccuracy that accompanies EST analysis and because some desired orthologs may not be expressed, or may be expressed only at low levels. We determined the orthology of EST sequences using a tree-building approach. Initially this was accomplished by including ESTs in the gene tree analysis for each gene family. Without automation, this approach would be prohibitively time consuming and labor intensive and would greatly restrict the use of genomic-scale EST data in phylogenetic analyses. Therefore, during the course of this study, we developed automated methods for orthology determination within a parsimony framework, described elsewhere  (see also the Methods section below for an overview).
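Why several reads help can be illustrated with a simple column-wise majority-rule consensus: an error carried by one read is outvoted by the others. This is a minimal sketch, not the assembly software used in the study. It assumes the reads for one EST are already aligned to equal length with '-' marking gaps; applying the read threshold per column, reporting ties as 'N', and the helper name `consensus` are assumptions made for illustration (the study states only that at least 10 reads were used per EST sequence).

```python
from collections import Counter


def consensus(aligned_reads, min_reads=10):
    """Majority-rule consensus of aligned reads (hypothetical helper).

    Columns covered by fewer than `min_reads` non-gap bases, or whose
    most common base is tied, are reported as 'N' rather than guessed.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")
    columns = []
    for i in range(length):
        # Only real bases vote; gaps are ignored.
        counts = Counter(read[i] for read in aligned_reads if read[i] != "-")
        depth = sum(counts.values())
        if depth == 0 or depth < min_reads:
            columns.append("N")
            continue
        ranked = counts.most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            columns.append("N")  # tie: no unambiguous majority
        else:
            columns.append(ranked[0][0])
    return "".join(columns)


# Ten toy reads over one short region; one read carries an error at position 3.
reads = ["ACGTAC"] * 9 + ["ACGAAC"]
print(consensus(reads))
```

Reporting low-coverage or tied columns as ambiguous, rather than picking a base, keeps uncertain positions from being scored as characters later.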
Why partition? Hidden support and phylogenetic inference
Studies with large numbers of process partitions in a dataset exist [23, 26, 42–48], and some of these have attempted to address higher-level phylogenetic questions in mammals, yeast and bacteria by taking advantage of genomic-level approaches. These large data set approaches can be divided into whole-genome approaches (mostly microbial) and "subgenomic"  approaches. The whole-genome approach has the obvious advantage that orthology can be assessed with more certainty and ease when whole genome sequences are used in a phylogenetic analysis. Such is the case in a study of the relationships of seven ingroup yeast species with sequenced whole genomes , and in much of the whole-genome bacterial phylogenetic work that is beginning to appear in the literature [44, 45]. The yeast study is particularly interesting in that the authors addressed the very question of where in sequence analysis space we need to be to resolve phylogenies robustly. Using 106 carefully chosen orthologous genes, they showed that > 20 genes or > 15 kb of sequence produced a plateau of robustness, with values of 100% for a conventionally used measure of node robustness in phylogenetics (bootstrapping; ). In addition, they showed that despite rampant incongruence (many of the 106 single-gene trees disagree in topology), combining gene partitions into a concatenated or simultaneous analysis [51, 52] was always the best way to analyze the sequence information in a phylogenetic context.
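The bootstrap mentioned above draws pseudoreplicate matrices by resampling characters (alignment columns) with replacement; a tree is inferred from each pseudoreplicate, and a node's bootstrap value is the fraction of replicate trees that contain it. The sketch below shows only the resampling step, as a minimal illustration; the function name, the dictionary representation of the matrix and the toy sequences are assumptions, and tree inference on each replicate is left to a phylogenetics package.

```python
import random


def bootstrap_replicate(matrix, rng):
    """Return one bootstrap pseudoreplicate of a character matrix.

    `matrix` maps taxon name -> aligned sequence (all the same length).
    Column indices are drawn with replacement, and each drawn column is
    copied intact across all taxa so that character homology is kept.
    """
    taxa = list(matrix)
    n_chars = len(matrix[taxa[0]])
    picks = [rng.randrange(n_chars) for _ in range(n_chars)]
    return {taxon: "".join(matrix[taxon][i] for i in picks) for taxon in taxa}


matrix = {  # toy aligned matrix, invented for illustration
    "taxon_1": "ACGTTGCA",
    "taxon_2": "ACGTTGCT",
    "taxon_3": "ACCTAGCT",
    "outgroup": "TCCAAGCT",
}
rng = random.Random(42)
replicates = [bootstrap_replicate(matrix, rng) for _ in range(100)]
print(replicates[0])
```

A seeded `random.Random` instance is passed in so that a set of replicates can be regenerated exactly.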
Using a different node support measure  from those used in the yeast study, DeSalle  demonstrated that this phenomenon is the result of hidden support in the various gene partitions included in the analysis. Hidden support  is simply the amount of support at a node that is NOT found in the separate gene partitions analyzed individually. All character partitions have positive, neutral or negative latent support for any given phylogenetic hypothesis, which becomes evident only after combining or concatenating data partitions and performing a simultaneous analysis of all available data. An assessment of hidden support using the yeast dataset of Rokas et al.  reveals that one in every five characters that support the simultaneous analysis (SA) tree is hidden . This large amount of hidden support for the nodes in the SA tree suggests that interaction of character information is an important concept in reconstructing phylogenetic relationships. More importantly, quantifying hidden support can show researchers the degree of positive or negative interaction among characters in a concatenated analysis, which can help determine "next steps" in phylogenetic studies. Hidden support and partitioned analyses can also aid in determining the effects of missing data and the congruence of particular partitions with an overall phylogenetic hypothesis. Issues arising from missing data are especially problematic in EST phylogenetic studies and are caused by two factors. First, in EST studies partial gene sequences are more frequently reported than full-length cDNA sequences; and second, because clones are sampled at random from EST libraries, orthologs are often not found for all taxa in the study. An exploration of the amount of support contributed by each partition to the simultaneous analysis tree can aid in determining the effect of these two kinds of missing data on overall phylogenetic hypotheses.
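Following the definition above, hidden support at a node can be computed as the branch (Bremer) support for that node in the simultaneous analysis minus the sum of the branch supports for the same node in the separate analyses of each partition. A minimal sketch, assuming the separate-analysis supports have already been computed and are zero or negative where a partition's own most parsimonious trees do not contain the node; the function name and the numbers are hypothetical.

```python
def hidden_branch_support(sa_support, separate_supports):
    """Hidden branch support for one node (hypothetical helper).

    sa_support: branch support for the node in the simultaneous analysis.
    separate_supports: branch support for the same node in each partition
        analysed on its own (zero or negative if that partition's own
        most parsimonious trees lack the node).
    """
    return sa_support - sum(separate_supports)


# Toy example: a node with 12 steps of support in the combined matrix.
sa = 12
separate = [3, 0, -2, 4]  # four partitions analysed separately
hbs = hidden_branch_support(sa, separate)
print(hbs, f"{hbs / sa:.1%} of the support is hidden")
```

A positive result means the combined data support the node more strongly than the partitions do individually, which is the "interaction of character information" described above.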
Phylogenetic analyses and support
In our analysis, cycads appear basal to a grouping of Ginkgo, Gnetum and conifers, with Gnetales sister to the Coniferales. This arrangement is in accordance with other recent molecular studies, which conflict with the Anthophyte hypothesis. These other studies did not include a morphological component. Our separate analysis of morphological characters supported the grouping of Gnetales with angiosperms. However, when combined with the other partitions, the morphological data set contributed nine steps of hidden support to the grouping of Gnetales and Coniferales.
Total branch support on the simultaneous analysis tree in Figure 3 is 275 steps. Of those 275 steps, only 81 are apparent in the separately analyzed partitions, while 194 are hidden. In other words, 70.5% of the total branch support is hidden support that would not have been apparent had each gene region been analyzed separately. Figure 3 shows the distribution of this hidden support (contributed by the 43 partitions) at each node of the tree. Strikingly, 100% of the support for node 3 is hidden (69.3% hidden for node 1, 65.6% hidden for node 2 and 78.1% hidden for node 4).
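The headline percentage is simple arithmetic on the totals reported above:

```latex
\[
  \text{hidden support} = 275 - 81 = 194 \ \text{steps},
  \qquad
  \frac{194}{275} \approx 0.705 = 70.5\%
\]
```

The node-level percentages are read the same way, as hidden steps divided by the total branch support at that node.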
Effect of different partitioning strategies on hidden support
In general, the broad nuclear gene data partition proved to be the most consistent with the simultaneous analysis tree. Not only is the nuclear gene tree topologically identical to the simultaneous analysis tree, but all simultaneous analysis tree nodes also receive positive branch support, both hidden and apparent, from the nuclear data set. While this might be seen as an argument for the preferential use of nuclear genes (or ESTs) in phylogenetic analyses over molecular characters from other subcellular compartments (such as chloroplast and mitochondria), these other compartments did contribute character support to the simultaneous analysis tree despite topological conflict. In addition, the topological differences among subcellular gene partitions, examined using the ILD test (implemented in PAUP*), were not significant (p-value > 0.05, see below).
Exploration of incongruence among data partitions
In order to explore the interaction among data partitions analyzed separately, we calculated ILDs  and tested the significance of the resulting length differences  for all possible pairwise comparisons of the individual data partitions, as well as among the broader-scale groupings of the data that we examined for hidden support above. Of the 937 pairwise comparisons, 83 showed significant length differences [see Additional file 4]. Of those 83 conflicting pairwise comparisons, 13 were between the morphological data set and an individual gene partition; 18 were between mitochondrial CO1 and another individual partition; 11 were between the nuclear heat shock protein 82 and other individual partitions; and 8 were between the chloroplast RNA polymerase beta subunit 1 and other individual partitions. We highlight these examples to show that no single partition dominated the conflict, although only a handful of partitions were involved in significant length differences. As with our examination of hidden support, we found less conflict (as measured by ILD) when we examined broader-scale groupings of the data. In addition, none of the broader-scale groupings examined for hidden support showed significant conflict (as measured by ILD) except in comparisons with the morphological data set.
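The ILD statistic compares the length of the most parsimonious tree for the combined data with the summed lengths of the most parsimonious trees for each partition analysed separately. Its significance is assessed by repeatedly reassigning the pooled characters at random to partitions of the original sizes. The sketch below is a toy, self-contained illustration of that logic and not the PAUP* implementation. It assumes exactly four taxa (so an exhaustive search covers the three possible unrooted trees), unordered characters with no missing data, Fitch parsimony, and a (k + 1)/(n + 1) permutation p-value; all function names and data are hypothetical.

```python
import random

# The three unrooted trees for four taxa (indices 0-3), each written as two
# cherries joined at a root; parsimony length does not depend on rooting.
QUARTETS = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def fitch_length(column, tree):
    """Fitch parsimony length of one character on a quartet tree."""
    steps = 0

    def join(x, y):
        nonlocal steps
        common = x & y
        if common:
            return common
        steps += 1  # empty intersection: one state change is required
        return x | y

    (a, b), (c, d) = tree
    join(join({column[a]}, {column[b]}), join({column[c]}, {column[d]}))
    return steps


def mp_length(columns):
    """Length of the most parsimonious quartet, by exhaustive search."""
    return min(sum(fitch_length(col, t) for col in columns) for t in QUARTETS)


def ild(partitions):
    """Incongruence length difference: L(combined) - sum of L(partition)."""
    combined = [col for part in partitions for col in part]
    return mp_length(combined) - sum(mp_length(p) for p in partitions)


def ild_test(partitions, replicates=999, seed=1):
    """Observed ILD and a permutation p-value (hypothetical helper)."""
    rng = random.Random(seed)
    observed = ild(partitions)
    pool = [col for part in partitions for col in part]
    sizes = [len(p) for p in partitions]
    at_least_as_large = 0
    for _ in range(replicates):
        rng.shuffle(pool)
        shuffled, start = [], 0
        for n in sizes:
            shuffled.append(pool[start:start + n])
            start += n
        if ild(shuffled) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (replicates + 1)


# Toy data: partition 1 groups taxa 0+1, partition 2 groups taxa 0+2.
part1 = [("A", "A", "G", "G")] * 3
part2 = [("A", "G", "A", "G")] * 3
print(ild_test([part1, part2]))
```

Each toy partition fits its own tree in 3 steps, while the combined data need 9 steps on the best tree, so the observed ILD is 3; a large ILD relative to random repartitions indicates conflict beyond what chance reassignment of characters produces.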
Effect of missing taxa
Several of the partitions we used had missing taxa, for example because sequence data were not available for a given taxon for a particular gene region [see Additional Files 1 and 3]. We explored the effect of these missing taxa on the overall phylogenetic hypothesis by comparing the amount of branch support and hidden branch support for each node contributed by partitions for which information was available for 7, 6, 5 and 4 taxa. This type of analysis is particularly relevant to EST studies, as the probability of obtaining a full complement of taxa for a particular ortholog decreases as the number of taxa in the analysis increases. Recent studies using large data sets that also contained ESTs [58, 59] examined the effect of missing data by removing taxa with large amounts of missing data and comparing those results with an analysis in which these taxa were retained. Since the results of these analyses were similar, it was concluded that including taxa with large amounts of missing data did not bias the results. A simulation study  concluded that it is not the amount of missing data that is problematic for resolving trees but the presence of too few characters to allow taxon placement.
In our analyses, we established a matrix with six ingroup taxa and one outgroup, and no taxa were removed from the matrix at any time, regardless of the amount of missing data in the various partitions. This approach allowed us to explore the effect of including taxa with missing data by examining the branch support contributed to the simultaneous analysis tree by partitions with varying amounts of missing data. Specifically, we compared the support contributed by partitions that contained at least some data (but not necessarily the complete dataset) for all seven taxa with that contributed by partitions lacking data for one taxon across an entire partition (an individual gene region), by partitions lacking data for two taxa, and so forth. In this way we were able to examine the effect of incompletely taxonomically sampled partitions.
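The comparison described above amounts to binning partitions by the number of taxa for which they contain any data, then summing within each bin the support each partition contributes to the simultaneous analysis tree. A minimal sketch, assuming '?' and '-' mark missing data and that each partition's support value has already been computed; the partition names, sequences and numbers are invented for illustration.

```python
from collections import defaultdict

MISSING = set("?-")  # assumed missing-data symbols


def taxa_with_data(partition):
    """Number of taxa with at least one non-missing character in a partition."""
    return sum(1 for seq in partition.values() if set(seq) - MISSING)


def support_by_completeness(partitions, support):
    """Sum partition support within bins of equal taxon coverage.

    partitions: name -> {taxon: aligned sequence}
    support: name -> support that partition contributes to the SA tree
    Returns {number of taxa with data: summed support}, most complete first.
    """
    totals = defaultdict(int)
    for name, matrix in partitions.items():
        totals[taxa_with_data(matrix)] += support[name]
    return dict(sorted(totals.items(), reverse=True))


partitions = {
    "gene_A": {"t1": "ACGT", "t2": "ACGA", "t3": "TCGA", "t4": "TCGA"},
    "gene_B": {"t1": "AC??", "t2": "????", "t3": "TCGA", "t4": "TC-A"},
    "gene_C": {"t1": "GGTA", "t2": "GGTT", "t3": "????", "t4": "----"},
}
support = {"gene_A": 5, "gene_B": 2, "gene_C": -1}
print(support_by_completeness(partitions, support))
```

A partition counts a taxon as present if it has any data for it, matching the "at least some data" criterion in the text.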
Effect of different functional classes of genes
We also partitioned the data set into classes of genes based on their cellular function. Our functional partitions were MnoPSR (Non-Photosynthetic or Respiratory Metabolism: 7 gene partitions), photosynthetic (11 gene partitions), respiration (7 gene partitions), signalling (3 gene partitions), structural (8 gene partitions), transcription factors (2 gene partitions) and genes of unknown function (4 gene partitions).
Our results suggest that conserved structural and signalling proteins might be better at defining both deeper nodes and nodes near the tips of the tree, and that proteins in the transcription factor and respiration classes might be best for nodes nearer the tips; however, the sample size of genes in the functional partitions in this study is small. Nevertheless, these results do suggest a potential method for categorizing functional gene classes with respect to their congruence with a simultaneous or organismal phylogeny. As more ESTs are added to the sequence databases, the sample sizes of these functional classes will grow, and a more rigorous test of the role of functional class in phylogenetic analysis may become possible.
Where to from here?
Rokas et al.  addressed the question of where in sequence analysis space we need to be to resolve phylogenies robustly, showing that > 20 genes or > 15 kb of sequence produced a plateau of robustness at 100% for conventionally used measures of node robustness. The > 15,000 base pairs and > 40 genes in the present study are enough to garner strong support for only three of the four nodes in the concatenated analysis tree, with node 3 receiving the weakest support. More sequence information is thus needed to resolve this problem, and the hidden support analysis suggests that nuclear gene partitions will most efficiently provide information for all nodes in the concatenated analysis tree. In addition, both the yeast analysis by Rokas et al.  and the seed plant analyses presented here strongly suggest that even though a single gene partition might support a topology different from the concatenated analysis tree, hidden support in most gene partitions will contribute positively to the overall robustness of a phylogenetic hypothesis. Finally, the yeast and seed plant examples, despite having similar numbers of ingroup taxa, suggest that different numbers of characters and genes will be needed to make robust inferences at nodes in different studies. We suggest that this discrepancy may be a function of the different phylogenetic ages of the groups: the ingroup species in the yeast phylogeny diverged between 50 and 100 MYA [63, 64] and essentially fall within a single genus, while the ingroup taxa in the plant study diverged no earlier than 400 MYA [65–67]. In addition, several studies with much larger numbers of ingroup taxa exist [23, 42, 44], and these studies suggest that more characters than were used in the yeast study are required for robust resolution of this simple phylogenetic hypothesis. How many more characters? A strong indication may be given by the high support and robustness of node 1 (Arabidopsis + Oryza). A plateau of topological robustness for other nodes may be reached when a similar number of phylogenetically informative characters is reached.
The approach we describe here, in which support for the SA tree is estimated for each process partition, will also pinpoint those partitions that conflict with the overall pattern of divergence of the taxa in the analysis. If one assumes that the SA tree best represents the evolutionary history of the taxa involved, then such partitions conflict with the overall organismal history of those taxa. This approach would thus provide a method for detecting process partitions that may have experienced selection or drift, and such partitions might be important in some of the more interesting organismal differences among the taxa in the analysis.
One final and important aspect of the present analysis highlights a problem that will be prevalent in future genomic-level phylogenetic studies: the almost continual revision of the overall phylogenetic hypothesis for a set of taxa. For instance, as more EST data are added to the databases, more process partitions can be added to an analysis. This will effectively create a growing matrix that might even expand daily. With the addition of each new process partition to an analysis, all support values and other tree metrics, such as bootstrap values , jackknife values [68, 69], Bayesian posterior probabilities [70–72], and node support values [21–23, 42, 61, 62, 73], need to be recalculated. In addition, manually adding new process partitions to a growing matrix is time consuming and sometimes prone to error. We therefore suggest that important systematic questions for which large amounts of genomic-level data are available need an automated and rapid means of adding new process partitions to the growing matrix. Such an automated approach is under development for the seed plant question and will be discussed in a separate publication .
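Part of the bookkeeping described above, appending a new process partition to an existing matrix and padding taxa that lack data, can be automated along the following lines. This is a minimal sketch and not the system under development by the authors. It assumes the new partition is already aligned, stores the matrix as one dictionary per partition, fills absent taxa with '?', and uses hypothetical function and partition names.

```python
def add_partition(matrix, name, alignment):
    """Add an aligned partition (taxon -> sequence) to a partitioned matrix.

    `matrix` maps partition name -> {taxon: sequence}. Taxa missing from
    either side are padded with '?' so every partition covers every taxon.
    """
    if name in matrix:
        raise ValueError(f"partition {name!r} already present")
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("a partition needs sequences of one equal length")
    width = lengths.pop()
    taxa = set(alignment).union(*(set(part) for part in matrix.values()))
    # Pad existing partitions for any taxon appearing for the first time.
    for part in matrix.values():
        part_width = len(next(iter(part.values())))
        for taxon in taxa - set(part):
            part[taxon] = "?" * part_width
    matrix[name] = {t: alignment.get(t, "?" * width) for t in taxa}
    return matrix


def concatenate(matrix):
    """Concatenate partitions (in insertion order) into one supermatrix."""
    taxa = sorted(next(iter(matrix.values())))
    return {t: "".join(part[t] for part in matrix.values()) for t in taxa}


matrix = {}
add_partition(matrix, "cp_gene", {"t1": "ACGT", "t2": "ACGA"})
add_partition(matrix, "est_gene", {"t1": "MK", "t3": "ML"})
print(concatenate(matrix))
```

Keeping partitions separate until concatenation makes it straightforward to rerun per-partition support calculations whenever a partition is added.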
• Simultaneous analysis using 42 gene partitions and a morphological partition yields a phylogenetic hypothesis with monophyletic gymnosperms, which is at odds with the Anthophyte hypothesis.
• Addition of short EST sequences to a data set can enhance a phylogenetic analysis, if the problems of sequence quality and orthology are overcome.
• The majority of support in this study is hidden support, meaning that the support is not immediately apparent in single gene partitions.
• Completeness of data partitions with respect to a full complement of taxa had a large effect on levels of support in the phylogenetic analysis. In our seven-taxon example, support from partitions that had sequences for five or fewer taxa was nonexistent. However, variation in the amount and distribution of data within partitions may also play a role.
• When phylogenetic incongruence between a partitioned functional class of genes (such as transcription factors) and the organismal phylogeny is detected, this result suggests that the partition has experienced a unique evolutionary history relative to the organisms. This different evolutionary history can be used as a signpost of altered evolutionary pressure in a particular class of genes. In this way, incongruence of a particular class of genes (such as transcription factors) in a partitioned analysis allows us to establish hypotheses about the evolution and potential function of these gene classes.
Orthology determination and phylogenetic analyses
Many studies use pairwise sequence comparison schemes, such as BLAST , COG (Clusters of Orthologous Groups; ), INPARANOID [77, 78], RBH (Reciprocal Blast Hits; [79, 80]), and RSD (Reciprocal Smallest Distance Algorithm; ) to determine gene orthology on a genomic scale.
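Of the pairwise schemes listed above, reciprocal best hits is the simplest to state: gene a in genome A and gene b in genome B are called putative orthologs when each is the other's best-scoring hit. A minimal sketch, assuming the similarity scores (for example BLAST bit scores) have already been parsed into dictionaries; the gene identifiers and scores are invented.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: {(query, subject): score}, higher is better (e.g. bit score).
    Ties keep the first subject encountered.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward, reverse = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


a_vs_b = {("At1", "Os1"): 250, ("At1", "Os2"): 180, ("At2", "Os2"): 90}
b_vs_a = {("Os1", "At1"): 245, ("Os2", "At1"): 175, ("Os2", "At2"): 88}
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

Here At2 and Os2 are not paired because Os2's best hit is At1, which shows how such distance-based schemes decide orthology from similarity rankings alone, without examining shared characters.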
Since we are ultimately interested in exploring the characters associated with particular evolutionary novelties, we use a character-based alternative to distance-based methods for identifying orthologous gene regions. The tree-building approach to orthology determination involves generating gene family trees in order to identify the orthologous gene family member for each EST sequence. Within a character-based parsimony framework, nodes are defined by shared derived characters.
Without automation, this approach would be prohibitively time consuming and labor intensive and would greatly restrict the use of genomic-scale EST data in parsimony-based analyses, since placing ESTs into orthology groups with this character-based approach would require manually rebuilding gene family trees for each new EST to be classified. Therefore, during the course of this study, we developed automated methods for orthology determination within a parsimony framework: OrthologID  (Firefox/Netscape is the preferred browser; Internet Explorer is not supported by the current OrthologID viewer). This approach builds gene family trees using sequences from available completely sequenced genomes (currently Arabidopsis, Oryza and Populus; Chlamydomonas reinhardtii is used as an outgroup in the gene tree analyses for orthology determination), such that all members of a given gene family are included. In addition, rather than including EST sequences in the gene tree construction analysis, the whole-genome gene family trees are first used to construct "guide" trees. These gene family guide trees are used to identify diagnostic characters for each gene family member, and EST query sequences are then screened for the presence of shared diagnostics using the CAOS algorithm (see  for details of the guide tree/CAOS approach). This approach eliminates the need to manually rebuild a gene family tree each time a new EST sequence requires orthology determination.
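The guide-tree idea can be illustrated with a much-simplified form of character-based diagnosis: for each clade on the guide tree, find alignment positions where every member shares one state that no non-member has (a single-position "pure" diagnostic), then assign a query to the clade whose diagnostics it matches most often. This is only a sketch of the general principle under those assumptions. The real CAOS algorithm handles more kinds of diagnostic attributes than the case shown here, and the function names, clades and sequences are hypothetical; the query is assumed to be already aligned to the guide alignment.

```python
def pure_diagnostics(alignment, clade):
    """Positions where all clade members share a state absent outside the clade.

    alignment: {name: aligned sequence}; clade: set of names on the guide tree.
    Returns {position: diagnostic state}.
    """
    length = len(next(iter(alignment.values())))
    inside = [alignment[name] for name in clade]
    outside = [seq for name, seq in alignment.items() if name not in clade]
    diagnostics = {}
    for i in range(length):
        states = {seq[i] for seq in inside}
        if len(states) == 1:
            state = states.pop()
            if state != "-" and all(seq[i] != state for seq in outside):
                diagnostics[i] = state
    return diagnostics


def assign(query, alignment, clades):
    """Return the clade whose diagnostics the aligned query matches most often."""
    def matches(clade):
        diag = pure_diagnostics(alignment, clade)
        return sum(query[i] == state for i, state in diag.items())
    return max(clades, key=lambda name: matches(clades[name]))


guide = {  # toy guide alignment, invented for illustration
    "geneX_1": "MKLVAAS",
    "geneX_2": "MKLIAAS",
    "geneY_1": "MRFVGAT",
    "geneY_2": "MRFIGAT",
}
clades = {"X": {"geneX_1", "geneX_2"}, "Y": {"geneY_1", "geneY_2"}}
est = "MKLVAGS"  # hypothetical EST fragment, already aligned to the guide
print(assign(est, guide, clades))
```

Because diagnostics are computed once from the guide alignment, a new query is classified without rebuilding any tree, which is the saving the guide-tree design aims for.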
We use only completely sequenced genomes for constructing gene family guide trees with OrthologID in order to minimize the possibility of the erroneous placement of query sequences due to missing data. If gene family guide trees had been constructed using partially sequenced genomes, it is possible that some gene family members would be missing, in which case it could be possible that queries orthologous to these missing gene family members would be incorrectly placed. The current database of plant genomes will soon be expanded to include complete genomes from other phylogenetic lineages, including prokaryotes and non-plant eukaryotes.
OrthologID automatically searches the local database of completely sequenced plant genomes and performs an initial clustering of gene sequences into putative gene families, using NCBI BLAST  with an expectation value cutoff of 1e-20. Next, it builds gene family trees. It aligns sequences with the program MAFFT , using different sets of alignment parameters to create three different alignments for each gene family, and culls  alignment-ambiguous regions. The three pairs of gap opening penalty and offset values are (1.53, 0.123), (2.4, 0.1) and (1.0, 0.2). It performs tree searches under parsimony, using exhaustive searches where possible; where a large number of putative gene family members makes exhaustive searches impossible, it performs heuristic searches implementing the parsimony ratchet  with 200 reweighting iterations for each of 20 ratchets, in order to explore tree space rigorously. It saves the resulting trees, computes the strict consensus when multiple equally parsimonious trees are obtained, and then passes these guide trees to the CAOS algorithm to identify node diagnostics. To identify the ortholog of an EST sequence, OrthologID uses the CAOS algorithm to screen the EST for characters that are diagnostic of nodes on the guide tree.
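The initial clustering step can be sketched as single-linkage clustering: any two sequences joined by a BLAST hit at or below the expectation-value cutoff end up in the same putative gene family. Whether OrthologID links families in exactly this way is an assumption here; the sketch simply runs a union-find over (query, subject, e-value) triples with the 1e-20 cutoff quoted above, using invented identifiers.

```python
def cluster_families(hits, evalue_cutoff=1e-20):
    """Group sequences into putative families via single linkage.

    hits: iterable of (query_id, subject_id, evalue) triples.
    Returns a list of families (sets of ids), largest first.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for query, subject, evalue in hits:
        find(query)
        find(subject)  # register both ids even if the hit is too weak
        if evalue <= evalue_cutoff:
            parent[find(query)] = find(subject)

    families = {}
    for seq_id in parent:
        families.setdefault(find(seq_id), set()).add(seq_id)
    return sorted(families.values(), key=len, reverse=True)


hits = [
    ("At_a", "Os_a", 1e-80),
    ("Os_a", "Pt_a", 1e-45),
    ("At_b", "Os_b", 1e-30),
    ("At_a", "At_b", 1e-5),  # weak hit: above the cutoff, so not linked
]
print(cluster_families(hits))
```

Single linkage is permissive (one strong hit is enough to merge two groups), which suits a first pass whose families are then refined by tree building.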
Once EST orthologs had been identified, we manually assembled a process partition matrix for each orthologous gene region for all seven chosen plant taxa; sequences for the moss Physcomitrella patens [87, 88] were used as the outgroup in all phylogenetic analyses. We aligned the sequences for each process partition using the default parameters in Clustal . We assembled a simultaneous analysis matrix composed of 42 gene regions, consisting of mitochondrial (6), chloroplast (16) and nuclear (20; including 19 composed of EST protein sequences and one of DNA sequences [18S rDNA]) regions, along with a single morphological partition. A list of all the gene regions, and the accession numbers of all sequences used in the analysis, can be found in Additional file 1. In several instances in which a mitochondrial or chloroplast sequence was not available for a given taxon, we substituted the corresponding sequence from a related species. These are noted in Additional file 1. The matrix used in the analyses can be found as Additional file 2. Phylogenetic analyses were accomplished in PAUP* version 4.0b10  using exhaustive searches. Measures of branch support [21, 23, 61, 62] were calculated using batch command files in PAUP*, and the resulting log files were imported into an Excel spreadsheet for final calculations.
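One of the spreadsheet calculations mentioned above, partitioned branch support (PBS), can be illustrated as follows. For a given node, a partition's PBS is the length of that partition's characters on the shortest tree lacking the node minus their length on the simultaneous analysis tree; when one tree of each kind is used, the PBS values over all partitions sum to the node's total branch support. A minimal sketch, assuming the per-partition lengths have already been extracted from the PAUP* log files; the partition names and lengths are invented.

```python
def partitioned_branch_support(len_sa, len_constraint):
    """PBS per partition for one node.

    len_sa: partition -> length of its characters on the SA tree.
    len_constraint: partition -> length on the shortest tree lacking the node.
    """
    return {p: len_constraint[p] - len_sa[p] for p in len_sa}


len_sa = {"nuclear_1": 120, "chloroplast_1": 85, "morphology": 40}
len_constraint = {"nuclear_1": 126, "chloroplast_1": 84, "morphology": 41}
pbs = partitioned_branch_support(len_sa, len_constraint)
total_support = sum(len_constraint.values()) - sum(len_sa.values())
print(pbs, total_support)
```

A negative PBS (here the toy chloroplast partition) marks a partition whose characters fit the alternative tree better, even though the node is supported overall.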
The authors acknowledge Rob Martienssen and W. Richard McCombie (CSHL), Indra Neil Sarkar (AMNH) and other members of the New York Plant Genomics Consortium (NYPGC) for stimulating discussion and comments on the manuscript. The work in this manuscript was supported by NSF Plant Genome grant #DBI-0421604 to GMC, DWS, and RD. MGE, RD and DWS thank the Lewis B and Dorothy Cullman Program in Molecular Systematics at the AMNH and the NYBG for continued support.
- Crane P: Phylogenetic analysis of seed plants and the origin of angiosperms. Annals of the Missouri Botanical Garden. 1985, 72: 716-793. 10.2307/2399221.
- Doyle J, Donoghue M: Seed plant phylogeny and the origin of angiosperms: An experimental cladistic approach. Bot Rev. 1986, 52: 331-429.
- Loconte H, Stevenson D: Cladistics of the Spermatophyta. Brittonia. 1990, 42: 197-211. 10.2307/2807216.
- Rothwell G, Serbet R: Lignophyte phylogeny and the evolution of spermatophytes: A numerical cladistic analysis. Systematic Botany. 1994, 19: 443-482. 10.2307/2419767.
- Nixon K, Crepet W, Stevenson D, Friis EM: A reevaluation of seed plant phylogeny. Annals of the Missouri Botanical Garden. 1994, 81: 484-533. 10.2307/2399901.
- Doyle JA: Molecules, morphology, fossils, and the relationship of angiosperms and Gnetales. Molecular Phylogenetics and Evolution. 1998, 9: 448-462. 10.1006/mpev.1998.0506.
- Bowe LM, Coat G, dePamphilis CW: Phylogeny of seed plants based on all three genomic compartments: extant gymnosperms are monophyletic and Gnetales' closest relatives are conifers. Proceedings of the National Academy of Sciences USA. 2000, 97: 4092-4097. 10.1073/pnas.97.8.4092.
- Chaw SM, Parkinson CL, Cheng Y, Vincent TM, Palmer JD: Seed plant phylogeny inferred from all three plant genomes: monophyly of extant gymnosperms and origin of Gnetales from conifers. Proceedings of the National Academy of Sciences USA. 2000, 97: 4086-4091. 10.1073/pnas.97.8.4086.
- Donoghue MJ, Doyle JA: Seed plant phylogeny: Demise of the anthophyte hypothesis?. Curr Biol. 2000, 10: R106-109. 10.1016/S0960-9822(00)00304-3.
- Soltis PS, Soltis DE, Chase MW: Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology. Nature. 1999, 402: 402-404. 10.1038/46528.
- Soltis PS, Soltis DE, Wolf PG, Nickrent DL, Chaw SM, Chapman RL: The phylogeny of land plants inferred from 18S rDNA sequences: pushing the limits of rDNA signal?. Mol Biol Evol. 1999, 16: 1774-1784.
- Winter KU, Becker A, Munster T, Kim JT, Saedler H, Theissen G: MADS-box genes reveal that gnetophytes are more closely related to conifers than to flowering plants. Proceedings of the National Academy of Sciences USA. 1999, 96: 7342-7347. 10.1073/pnas.96.13.7342.
- Schmidt M, Schneider-Poetsch HA: The evolution of gymnosperms redrawn by phytochrome genes: the Gnetatae appear at the base of the gymnosperms. Journal of Molecular Evolution. 2002, 54: 715-724. 10.1007/s00239-001-0042-9.
- Gifford EM, Foster AS: Morphology and Evolution of Vascular Plants. 3rd edition. 1989, New York, Freeman and Co.
- Goremykin V, Bobrova V, Pahnke J, Troitsky A, Antonov A, Martin W: Noncoding sequences from the slowly evolving chloroplast inverted repeat in addition to rbcL data do not support gnetalean affinities of angiosperms. Mol Biol Evol. 1996, 13: 383-396.
- Soltis DE, Soltis PS, Zanis MJ: Phylogeny of seed plants based on evidence from eight genes. Am J Bot. 2002, 89: 1670-1681.
- Rydin C, Källersjö M, Friis EM: Seed plant relationships and the systematic position of Gnetales based on nuclear and chloroplast DNA: Conflicting data, rooting problems, and the monophyly of conifers. International Journal of Plant Sciences. 2002, 163: 197-214. 10.1086/338321.
- Shindo S, Sakakibara K, Sano R, Ueda K, Hasebe M: Characterization of a FLORICAULA/LEAFY homologue of Gnetum parvifolium and its implications for the evolution of reproductive organs in seed plants. International Journal of Plant Sciences. 2001, 162: 1199-1209. 10.1086/323417.
- Doyle JA, Donoghue MJ, Zimmer EA: Integration of morphological and ribosomal RNA data on the origin of angiosperms. Annals of the Missouri Botanical Garden. 1994, 81: 419-450. 10.2307/2399899.
- Burleigh JG, Mathews S: Phylogenetic signal in nucleotide data from seed plants: implications for resolving the seed plant tree of life. Am J Bot. 2004, 91: 1599-1613.
- Baker RH, DeSalle R: Multiple sources of character information and the phylogeny of Hawaiian drosophilids. Systematic Biology. 1997, 46: 654-673. 10.2307/2413499.
- Tree of life. [http://tolweb.org/tree]
- Gatesy J, O'Grady P, Baker RH: Corroboration among data sets in simultaneous analysis: hidden support for phylogenetic relationships among higher level artiodactyl taxa. Cladistics. 1999, 15: 271-313. 10.1111/j.1096-0031.1999.tb00268.x.
- Gatesy J, Matthee C, DeSalle R, Hayashi C: Resolution of a supertree/supermatrix paradox. Systematic Biology. 2002, 51: 652-664. 10.1080/10635150290102311.
- Gatesy J, Amato G, Norell M, DeSalle R, Hayashi C: Combined support for wholesale taxic atavism in gavialine crocodylians. Systematic Biology. 2003, 52: 403-422.
- Rokas A, Williams BL, King N, Carroll SB: Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature. 2003, 425: 798-804. 10.1038/nature02053.
- Mayer K, Mewes HW: How can we deliver the large plant genomes? Strategies and perspectives. Curr Opin Plant Biol. 2002, 5: 173-177. 10.1016/S1369-5266(02)00235-2.
- Goff SA, Ricke D, Lan TH, Presting G, Wang R, Dunn M, Glazebrook J, Sessions A, Oeller P, Varma H, Hadley D, Hutchison D, Martin C, Katagiri F, Lange BM, Moughamer T, Xia Y, Budworth P, Zhong J, Miguel T, Paszkowski U, Zhang S, Colbert M, Sun W, Chen L, Cooper B, Park S, Wood TC, Mao L, Quail P, Wing R, Dean R, Yu Y, Zharkikh A, Shen R, Sahasrabudhe S, Thomas A, Cannings R, Gutin A, Pruss D, Reid J, Tavtigian S, Mitchell J, Eldredge G, Scholl T, Miller RM, Bhatnagar S, Adey N, Rubano T, Tusneem N, Robinson R, Feldhaus J, Macalma T, Oliphant A, Briggs S: A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science. 2002, 296: 92-100. 10.1126/science.1068275.
- Yu J, Hu S, Wang J, Wong GKS, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X, Cao M, Liu J, Sun J, Tang J, Chen Y, Huang X, Lin W, Ye C, Tong W, Cong L, Geng J, Han Y, Li L, Li W, Hu G, Huang X, Li W, Li J, Liu Z, Li L, Liu J, Qi Q, Liu J, Li L, Li T, Wang X, Lu H, Wu T, Zhu M, Ni P, Han H, Dong W, Ren X, Feng X, Cui P, Li X, Wang H, Xu X, Zhai W, Xu Z, Zhang J, He S, Zhang J, Xu J, Zhang K, Zheng X, Dong J, Zeng W, Tao L, Ye J, Tan J, Ren X, Chen X, He J, Liu D, Tian W, Tian C, Xia H, Bao Q, Li G, Gao H, Cao T, Wang J, Zhao W, Li P, Chen W, Wang X, Zhang Y, Hu J, Wang J, Liu S, Yang J, Zhang G, Xiong Y, Li Z, Mao L, Zhou C, Zhu Z, Chen R, Hao B, Zheng W, Chen S, Guo W, Li G, Liu S, Tao M, Wang J, Zhu L, Yuan L, Yang H: A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science. 2002, 296: 79-92. 10.1126/science.1068037.
- Albert VA, Soltis DE, Carlson JE, Farmerie WG, Wall PK, Ilut DC, Solow TM, Mueller LA, Landherr LL, Hu Y, Buzgo M, Kim S, Yoo MJ, Frohlich MW, Perl-Treves R, Schlarbaum SE, Bliss BJ, Zhang X, Tanksley SD, Oppenheimer DG, Soltis PS, Ma H, dePamphilis CW, Leebens-Mack JH: Floral gene resources from basal angiosperms for comparative genomics research. BMC Plant Biology. 2005, 5: 5. 10.1186/1471-2229-5-5.
- Rudd S: Expressed sequence tags: alternative or complement to whole genome sequences?. Trends in Plant Science. 2003, 8: 321-329. 10.1016/S1360-1385(03)00131-6.
- Allona I, Quinn M, Shoop E, Swope K, St. Cyr S, Carlis J, Riedl J, Retzel E, Campbell MM, Sederoff R, Whetten RW: Analysis of xylem formation in pine by cDNA sequencing. Proceedings of the National Academy of Sciences USA. 1998, 95: 9693-9698. 10.1073/pnas.95.16.9693.
- Brenner ED, Stevenson DW, McCombie RW, Katari MS, Rudd SA, Mayer KFX, Palenchar PM, Runko SJ, Twigg RW, Dai G, Martienssen RA, Benfey PN, Coruzzi GM: Expressed sequence tag analysis in Cycas, the most primitive living seed plant. Genome Biology. 2003, 4: R78. 10.1186/gb-2003-4-12-r78.
- Brenner ED, Katari MS, Stevenson DW, Rudd SA, Douglas AW, Moss WN, Twigg RW, Runko SJ, Stellari GM, McCombie WR, Coruzzi GM: EST analysis in Ginkgo biloba: an assessment of conserved developmental regulators and gymnosperm specific genes. BMC Genomics. 2005, 6: 143. 10.1186/1471-2164-6-143.
- Egertsdotter U, van Zyl LM, MacKay J, Peter G, Kirst M, Clark C, Whetten R, Sederoff R: Gene expression during formation of earlywood and latewood in loblolly pine: expression profiles of 350 genes. Plant Biol (Stuttg). 2004, 6: 654-663. 10.1055/s-2004-830383.
- Kirst M, Johnson AF, Baucom C, Ulrich E, Hubbard K, Staggs R, Paule C, Retzel E, Whetten R, Sederoff R: Apparent homology of expressed genes from wood-forming tissues of loblolly pine (Pinus taeda L.) with Arabidopsis thaliana. Proceedings of the National Academy of Sciences USA. 2003, 100: 7383-7388. 10.1073/pnas.1132171100.
- Ohlrogge J, Benning C: Unravelling plant metabolism by EST analysis. Curr Opin Plant Biol. 2000, 3: 224-228.
- Whetten R, Sun YH, Zhang Y, Sederoff R: Functional genomics and cell wall biosynthesis in loblolly pine. Plant Mol Biol. 2001, 47: 275-291. 10.1023/A:1010652003395.
- Parkinson J, Guiliano DB, Blaxter M: Making sense of EST sequences by CLOBBing them. BMC Bioinformatics. 2002, 3: 31. 10.1186/1471-2105-3-31.
- Slater GSC: Algorithms for the Analysis of ESTs. 2000, University of Cambridge
- Chiu JC, Lee EK, Egan MG, Sarkar IN, Coruzzi GM, DeSalle R: OrthologID: automation of genome-scale ortholog identification within a parsimony framework. Bioinformatics. 2006, 22: 699-707. 10.1093/bioinformatics/btk040.
- Gatesy J, Baker RH: Hidden likelihood support in genomic data: can forty-five wrongs make a right?. Systematic Biology. 2005, 54: 483-492. 10.1080/10635150590945368.
- Rokas A, Krüger D, Carroll SB: Animal evolution and the molecular signature of radiations compressed in time. Science. 2005, 310: 1933-1938. 10.1126/science.1116759.
- Planet PJ, Kachlany SC, Fine DH, DeSalle R, Figurski DH: The widespread colonization island of Actinobacillus actinomycetemcomitans. Nat Genet. 2003, 34: 193-198. 10.1038/ng1154.
- Wolf YI, Rogozin IB, Grishin NV, Koonin EV: Genome trees and the tree of life. Trends in Genetics. 2002, 18: 472-479. 10.1016/S0168-9525(02)02744-0.
- Cognato AI, Vogler AP: Exploring data interaction and nucleotide alignment in a multiple gene analysis of Ips (Coleoptera: Scolytinae). Systematic Biology. 2001, 50: 758-780. 10.1080/106351501753462803.
- Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O'Brien SJ: Molecular phylogenetics and the origins of placental mammals. Nature. 2001, 409: 614-618. 10.1038/35054550.
- Murphy WJ, Eizirik E, O'Brien SJ, Madsen O, Scally M, Douady CJ, Teeling E, Ryder OA, Stanhope MJ, de Jong WW, Springer MS: Resolution of the early placental mammal radiation using Bayesian phylogenetics. Science. 2001, 294: 2348-2351. 10.1126/science.1067179.
- Phillips AJ: Comparative phylogenomics: a strategy for high-throughput large-scale sub-genomic sequencing projects for phylogenetic analysis. Techniques in Molecular Systematics and Evolution. Methods and Tools in Biosciences and Medicine. Edited by: DeSalle R, Giribet G and Wheeler WC. 2002, Basel, Birkhäuser Verlag, 132-145.
- Felsenstein J: Confidence limits on phylogenies: an approach using the bootstrap. Evolution. 1985, 39: 783-791. 10.2307/2408678.
- Kluge AG: A concern for evidence and a phylogenetic hypothesis of relationships among Epicrates (Boidae, Serpentes). Systematic Zoology. 1989, 38: 7-25. 10.2307/2992432.
- Nixon KC, Carpenter JM: On simultaneous analysis. Cladistics. 1996, 12: 221-241. 10.1111/j.1096-0031.1996.tb00010.x.
- DeSalle R: Animal phylogenomics: multiple interspecific genome comparisons. Methods in Enzymology. 2005, 395: 104-133. 10.1016/S0076-6879(05)95008-8.
- Stevenson D, Loconte H: Ordinal and familial relationships of Pteridophyte genera. Pteridology in Perspective. Edited by: Camus JM, Gibby M and Johns RJ. 1997, Royal Botanic Gardens, Kew
- Swofford DL: PAUP* Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4.0b10. 2002, Sunderland, Massachusetts, Sinauer Associates
- Farris JS, Källersjö M, Kluge AG, Bult C: Testing significance of incongruence. Cladistics. 1994, 10: 315-319. 10.1111/j.1096-0031.1994.tb00181.x.
- Farris JS, Källersjö M, Kluge AG, Bult C: Constructing a significance test for incongruence. Syst Biol. 1995, 44: 570-572. 10.2307/2413663.
- Philippe H, Lartillot N, Brinkmann H: Multigene analyses of bilaterian animals corroborate the monophyly of Ecdysozoa, Lophotrochozoa, and Protostomia. Mol Biol Evol. 2005, 22: 1246-1253. 10.1093/molbev/msi111.
- Philippe H, Snell EA, Bapteste E, Lopez P, Holland PWH, Casane D: Phylogenomics of eukaryotes: impact of missing data on large alignments. Mol Biol Evol. 2004, 21: 1740-1752. 10.1093/molbev/msh182.
- Wiens JJ: Missing data, incomplete taxa, and phylogenetic accuracy. Systematic Biology. 2003, 52: 528-538.
- Bremer K: The limits of amino acid sequence data in angiosperm phylogenetic reconstruction. Evolution. 1988, 42: 795-803. 10.2307/2408870.
- Bremer K: Branch support and tree stability. Cladistics. 1994, 10: 295-304. 10.1111/j.1096-0031.1994.tb00179.x.
- Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES: Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature. 2003, 423: 241-254. 10.1038/nature01644.
- Beltrao P, Serrano L: Comparative genomics and disorder prediction identify biologically relevant SH3 protein interactions. PLoS Computational Biology. 2005, 1: e26.
- Crane PR: Time for the angiosperms. Nature. 1993, 366: 631-632. 10.1038/366631a0.
- Sanderson MJ, Thorne JL, Wikström N, Bremer K: Molecular evidence on plant divergence times. American Journal of Botany. 2004, 91: 1656-1665.
- Magallón SA, Sanderson MJ: Angiosperm divergence times: the effect of genes, codon positions, and time constraints. Evolution. 2005, 58: 1653-1670. 10.1554/04-565.1.
- Lanyon SM: Detecting internal inconsistencies in distance data. Systematic Zoology. 1985, 34: 397-403. 10.2307/2413204.
- Farris JS, Albert VA, Källersjö M, Lipscomb D, Kluge AG: Parsimony jackknifing outperforms neighbor-joining. Cladistics. 1996, 12: 99-124. 10.1111/j.1096-0031.1996.tb00196.x.
- Yang Z, Rannala B: Bayesian phylogenetic inference using DNA sequences: a Markov chain Monte Carlo method. Molecular Biology and Evolution. 1997, 14: 717-724.
- Larget B, Simon D: Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees. Molecular Biology and Evolution. 1999, 16: 750-759.
- Huelsenbeck JP, Ronquist F, Nielsen R, Bollback JP: Bayesian inference of phylogeny and its impact on evolutionary biology. Science. 2001, 294: 2310-2314. 10.1126/science.1065889.
- Gatesy J, Arctander P: Hidden morphological support for the phylogenetic placement of Pseudoryx nghetinhensis with bovine bovids: A combined analysis of gross anatomical evidence and DNA sequences from five genes. Systematic Biology. 2000, 49: 515-538. 10.1080/10635159950127376.
- Sarkar IN, Egan MG, DeSalle R, Coruzzi G: ASAP: automated simultaneous analyses phylogenies. manuscript in preparation
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic Local Alignment Search Tool. Journal of Molecular Biology. 1990, 215: 403-410. 10.1006/jmbi.1990.9999.
- Tatusov RL, Galperin MY, Natale DA, Koonin EV: The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Research. 2000, 28: 33-36. 10.1093/nar/28.1.33.
- Remm M, Storm CEV, Sonnhammer ELL: Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. Journal of Molecular Biology. 2001, 314: 1041-1052. 10.1006/jmbi.2000.5197.
- O'Brien KP, Remm M, Sonnhammer EL: Inparanoid: a comprehensive database of eukaryotic orthologs. Nucleic Acids Research. 2005, 33: D476-D480. 10.1093/nar/gki107.
- Hirsh AE, Fraser HB: Protein dispensability and rate of evolution. Nature. 2001, 411: 1046-1049. 10.1038/35082561.
- Jordan IK, Rogozin IB, Wolf YI, Koonin EV: Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Research. 2002, 12: 962-968. 10.1101/gr.87702. Article published online before print in May 2002.
- Wall DP, Fraser HB, Hirsh AE: Detecting putative orthologs. Bioinformatics. 2003, 19: 1710-1711. 10.1093/bioinformatics/btg213.
- Sarkar IN, Thornton JW, Planet PJ, Figurski DH, Schierwater B, DeSalle R: An automated phylogenetic key for classifying homeoboxes. Molecular Phylogenetics and Evolution. 2002, 24: 388-399. 10.1016/S1055-7903(02)00259-2.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
- Katoh K, Kuma K, Toh H, Miyata T: MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Research. 2005, 33: 511-518. 10.1093/nar/gki198.
- Gatesy J, DeSalle R, Wheeler W: Alignment-ambiguous nucleotide sites and the exclusion of systematic data. Molecular Phylogenetics and Evolution. 1993, 2: 152-157. 10.1006/mpev.1993.1015.
- Nixon KC: The parsimony ratchet, a new method for rapid parsimony analysis. Cladistics. 1999, 15: 407-414. 10.1111/j.1096-0031.1999.tb00277.x.
- Nishiyama T, Fujita T, Shin-I T, Seki M, Nishide H, Uchiyama I, Kamiya A, Carninci P, Hayashizaki Y, Shinozaki K, Kohara Y, Hasebe M: Comparative genomics of Physcomitrella patens gametophytic transcriptome and Arabidopsis thaliana: implication for land plant evolution. Proceedings of the National Academy of Sciences USA. 2003, 100: 8007-8012. 10.1073/pnas.0932694100.
- Rensing SA, Rombauts S, Van de Peer Y, Reski R: Moss transcriptome and beyond. Trends Plant Sci. 2002, 7: 535-538. 10.1016/S1360-1385(02)02363-4.
- Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22: 4673-4680.
- Kluge AG, Farris JS: Quantitative phyletics and the evolution of Anurans. Systematic Zoology. 1969, 18: 1-32. 10.2307/2412407.
- Farris JS: The retention index and the rescaled consistency index. Cladistics. 1989, 5: 417-419.
- OrthologID. [http://nypg.bio.nyu.edu/orthologid]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.