A practical approach to phylogenomics: the phylogeny of ray-finned fish (Actinopterygii) as a case study
© Li et al; licensee BioMed Central Ltd. 2007
Received: 25 September 2006
Accepted: 20 March 2007
Published: 20 March 2007
Molecular systematics occupies one of the central stages in biology in the genomic era, ushered in by unprecedented progress in DNA technology. The inference of organismal phylogeny is now based on many independent genetic loci, a widely accepted approach to assembling the tree of life. Surprisingly, this approach is hindered by a lack of appropriate nuclear gene markers for many taxonomic groups, especially at high taxonomic levels, partly because tools for efficiently developing new phylogenetic markers are lacking. We report here a genome-comparison strategy for identifying nuclear gene markers for phylogenetic inference and apply it to the ray-finned fishes, the largest vertebrate clade in need of phylogenetic resolution.
A total of 154 candidate molecular markers – relatively well conserved, putatively single-copy gene fragments with long, uninterrupted exons – were obtained by comparing whole genome sequences of two model organisms, Danio rerio and Takifugu rubripes. Experimental tests of 15 of these (randomly picked) markers on 36 taxa (representing two-thirds of the ray-finned fish orders) demonstrate the feasibility of amplifying by PCR and directly sequencing most of these candidates from whole genomic DNA in a vast diversity of fish species. Preliminary phylogenetic analyses of sequence data obtained for 14 taxa and 10 markers (total of 7,872 bp for each species) are encouraging, suggesting that the markers obtained will make significant contributions to future fish phylogenetic studies.
We present a practical approach that systematically compares whole genome sequences to identify single-copy nuclear gene markers for inferring phylogeny. Our method is an improvement over traditional approaches (e.g., manually picking genes for testing) because it uses genomic information and automates the process of identifying large numbers of candidate markers. This approach is shown here to be successful for fishes, but it could also be applied to other groups of organisms for which two or more complete genome sequences exist, which has important implications for assembling the tree of life.
The ultimate goal of obtaining a well-supported and accurate representation of the tree of life relies on the assembly of phylogenomic data sets for large numbers of taxa. Molecular phylogenies based on DNA sequences of a single locus or a few loci often suffer from low resolution and marginal statistical support due to limited character sampling. Individual gene genealogies may also differ from each other and from the organismal phylogeny (the "gene-tree vs. species-tree" issue) [2, 3], in many cases due to systematic biases (i.e., compositional bias, long-branch attraction, heterotachy), leading to statistical inconsistency in phylogenetic reconstruction [4–7]. Phylogenomic data sets, which use genome sequences to study evolutionary relationships, provide the best solution to these problems [1, 8]. This approach requires compilation of large data sets that include many independent nuclear loci for many species [9–14]. Such data sets are less likely to succumb to sampling and systematic errors because they offer the possibility of analyzing large numbers of phylogenetically informative characters from different genomic locations, and also of corroborating phylogenetic results by varying the species sampled. Even if a systematic bias is present in a fraction of the individual loci sampled, it is unlikely that all affected loci will be biased in the same direction. Powerful analytical approaches that accommodate model heterogeneity among data partitions are becoming available to efficiently analyze such complex phylogenomic data sets [15, 16].
Constructing phylogenomic data sets for large numbers of taxa is still, however, quite challenging. Most attempts to use this approach have been based either on the few available complete genome sequences [13, 17, 18], or on cDNA and EST sequences [9, 12, 18, 19] for relatively few taxa. Availability of complete genomes limits the number of taxa that can be analyzed [13, 17], imposing known problems for phylogenetic inference associated with poor taxon sampling [20, 21]. On the other hand, methods based on EST or cDNA sequence data are not practical for many taxa because they require construction of cDNA libraries and fresh tissue samples. In addition, some genes may not be expressed in certain tissues or developmental stages, leading to cases with undesirable amounts of missing data. The most efficient way to collect nuclear gene sequences for many taxa is to directly amplify target sequences using "universal" PCR primers, an approach so far used for just a few widely-used nuclear genes [22–25], or for selected taxonomic groups (e.g., placental mammals and land plants). Widespread use of this strategy in most taxonomic groups has been hindered by the paucity of available PCR-targeted gene markers.
Sequence conservatism and long exonic regions have been used as preferred criteria to select phylogenetic markers in the past. However, finding many preferred, easy-to-apply gene markers is unlikely when candidate genes are manually screened from databases or taken from isolated studies of a few individual genes. This difficulty partially explains the scarcity of currently available nuclear gene markers in many taxonomic groups. To address the problem, we present a simple bioinformatic approach to obtain nuclear gene markers from complete genomic data, based on three criteria: the markers should be single-copy, contain long exons, and be reasonably conserved. Our method incorporates two improvements over the traditional way of manually picking genes and testing their phylogenetic utility: it uses full genomic information, and it automates the search for candidate markers. We apply the method to Actinopterygii (ray-finned fish), the largest vertebrate clade (about half of all known vertebrate species), whose phylogeny is poorly known [38–42]. We also present experimental tests to show that PCR primers designed for a subset of the candidate markers can efficiently amplify these markers for a highly diverse sample of ray-finned fishes. Comparative analyses of the sequences obtained show encouraging phylogenetic properties for future studies.
To investigate the properties of candidate markers, we analyzed those found in the zebrafish and torafugu comparison, since their genome sequences are well annotated. Among them, 154 putative homologs were identified between zebrafish and torafugu by cross-genome comparison. Further comparison with EST sequences from other fish species reduced this number to 138 candidate markers (Supplementary Table 1). The 154 candidate markers shared between these two genomes according to our search criteria are distributed among 24 of the 25 chromosomes of zebrafish, and a Chi-square test did not reject a Poisson distribution of markers among chromosomes (χ2 = 16.99, df = 10, p = 0.0746). The size of candidate markers ranged from 802 to 5811 bp (in D. rerio). Their GC content ranged from 41.6% to 63.9% (in D. rerio), and the average similarity of the DNA sequence of these markers between D. rerio and T. rubripes varied from 77.3% to 93.2% (constrained by the search criteria).
Table 1. PCR primers and annealing temperatures used to amplify 10 new markers
5' GGACGCAGGACCGCARTAYC 3'
5' CTGTGTGTGTCCTTTTGTGRATYTT 3'
5' GGACCGCAGTATCCCACYMT 3'
5' GTGTGTCCTTTTGTGAATTTTYAGRT 3'
5' CATMTTYTCCATCTCAGATAATGC 3'
5' ATTCTCACCACCATCCAGTTGAA 3'
5' GGAGAATCARTCKGTGCTCATCA 3'
5' CTCACCACCATCCAGTTGAACAT 3'
5' GGAACTATYGGTAAGCARATGG 3'
5' TGGAAGAAKCCAAAKATGATGC 3'
5' TCGGTAAGCARATGGTGGACA 3'
5' AGAATCCRGTGAAGAGCATCCA 3'
5' AGAATGGATWACCAACACYTACG 3'
5' TAAGGCACAGGATTGAGATGCT 3'
5' GGATAACCAACACYTACGTCAA 3'
5' ACAGGATTGAGATGCTGTCCA 3'
5' TGTCTACACAGGCTGCGACAT 3'
5' GATGTCCTTRGWGCAGTTTTT 3'
5' GCCATGMCTGGYTCTTTCCT 3'
5' GGAGCAGTTTTTCTCRCATTC 3'
5' GACATGCTGGAGTTTCAGGA 3'
5' ACTTGTTRGCMACTGGGTCAAA 3'
5' ATGCTGGAGTTTCAGGACAT 3'
5' AGCMACTGGGTCAAACTGCTC 3'
5' GGACTGTCMAAGATGACCACMT 3'
5' CCCAAGAGGTTCTTGTTRAAGAT 3'
5' ACATGGTACCAGTATGGCTTTGT 3'
5' GTAAGGCATATASGTGTTCTCTCC 3'
5' GTATGGTSGGCAGGAACYTGAA 3'
5' CAAACAKCTCYCCGATGTTCTC 3'
5' GACGTTCCCATGATGGCWAAAAT 3'
5' CATCTCYCCGATGTTCTCGTA 3'
5' CCACACACTCYCCACAGAA 3'
5' TTCTCAAGCAGGTATGAGGTAGA 3'
5' AAAAGATGTTTCACCGMAAAGA 3'
5' GGTATGAGGTAGATCCSAGCTG 3'
5' ATGGCGAACTAYAGCCATGC 3'
5' CTGGATTTTCTGCAGTASAGGAG 3'
5' TGCAGGGGACCACAMCAT 3'
5' CAGTASAGGAGCGTGGTGCT 3'
Table 2. Summary information of the 10 gene markers amplified in 14 taxa (columns: No. of bp; No. of var.; No. of PI; Genetic distance (%))
The bioinformatic approach implemented in this study resulted in a large set (154 loci for the zebrafish and torafugu comparison) of candidate genes to infer high-level phylogeny of ray-finned fishes. The actual number of candidate loci depended on the genomes being compared and the fixed search parameters. Experimental tests of a smaller subset (15 loci) demonstrate that a large fraction (2/3) of these candidates are easily amplified by PCR from whole genomic DNA extractions in a vast diversity of fish taxa. The assumption that these loci are represented by a single copy in the fish genomes could not be rejected by the PCR assays in the species tested (all amplifications resulted in a single product), increasing the likelihood that the genetic markers are orthologous and suitable to infer organismal phylogeny. Our method is based on searching, under specific criteria, the available complete genomic databases of organisms closely related to the taxa of interest. Therefore, the same approach that is shown to be successful for fishes could be applied to other groups of organisms for which two or more complete genome sequences exist. Parameter values (L, S, and C) used for the search (Figure 2) may be altered to obtain fragments of different size or with different levels of conservation (i.e., less conserved for phylogenies of more closely related organisms).
An alternative way to develop nuclear gene markers for phylogenetic studies is to construct a cDNA library or sequence several ESTs for a small pilot group of taxa, and then to design specific PCR primers to amplify the orthologous gene copy in all the other taxa of interest [19, 46]. The major potential problem with this approach is that it starts with a cDNA library or a set of EST sequences, with no prior knowledge of how many copies a gene has in each genome. As discussed above, this condition may lead to mistaken paralogy. In our approach, we search the genomic database to find single-copy candidates, so that duplicate gene copies, if present, are not missed (see below).
Recent studies have proposed whole genome duplication events during vertebrate evolution and also genome duplications restricted to ray-finned fishes [31, 32, 47, 48]. Our results indicate that many single-copy genes still exist in a wide diversity of fish taxa (representing 28 orders of actinopterygian fishes), in agreement with previous estimates that a vast majority of duplicated genes are secondarily lost [34, 35]. All 154 candidates were identified as single-copy genes in D. rerio and T. rubripes, according to our search criteria. Our results also suggest that the 154 candidate genes are randomly distributed in the fish genome (at least among chromosomes of D. rerio). In the experimental tests, 10 out of 15 markers were found in single-copy condition in all successful amplifications, including in the tetraploid species O. mykiss. However, when the search criteria were relaxed to also consider targets less than 50% similar in a subsequent BLAST search against the zebrafish genome, 7 of the 10 genes were found to have "alignable paralogs" (the 3 exceptions were myh6, tbr1, and Glyt). Genomes of medaka, stickleback, and fugu were also checked for these 3 genes, and no "paralogs" were detected, suggesting that the ray-finned fish sequences collected for these 3 genes are unambiguously orthologous to each other. Phylogenetic analyses for each of the 7 genes that include the putative paralogs found by this procedure produced tree topologies that strongly suggest an ancient duplication event in the vertebrate lineage, before the divergence of tetrapods from ray-finned fishes. Paralogous sequences are placed at the base of the tetrapod-actinopterygian divergence, or as part of a basal polytomy with the other tetrapod and ray-finned fish sequences. In the terminology proposed by Remm et al., these would be considered out-paralogs. In no case are these sequences nested among ingroup actinopterygian sequences (see Additional file 4), as would be expected for in-paralogs.
Stringent search criteria implemented in our approach, followed by phylogenetic analysis, can distinguish between orthologs and putative out-paralogs. Although the method will not guarantee that single-copy genes amplified by PCR in several taxa are orthologs as opposed to in-paralogs, the existence and identification of genome-scale single-copy nuclear markers should facilitate the construction of the tree of life, even if the evolutionary mechanism responsible for maintaining single-copy genes is poorly known.
The molecular evolutionary profiles of the 10 newly developed markers are in the same range as RAG-1, a widely-used gene marker in vertebrates. The genes with high treeness values have intermediate substitution rates, suggesting that optimal rate and base composition stationarity are important factors that determine the suitability of a phylogenetic marker. The phylogenies based on individual markers revealed incongruent phylogenetic signal among 6 of the 10 individual genes. This incongruence suggests that significant biases in the data might obscure the true phylogenetic signal in some individual genes, but the direction of the bias is rarely shared among genes (Additional file 3), justifying the use of genome-scale gene markers to infer organismal phylogeny.
Finally, with respect to the phylogenetic results per se, there are two significant areas of discrepancy between the phylogeny obtained in this study (Figure 3a) and a consensus view of fish phylogeny (Figure 3b). Although these differences could be due to poor taxonomic sampling, we discuss them briefly. First, the traditional tree groups cichlids with other perciforms, whereas our results show that the cichlid O. niloticus is more closely related to atherinomorphs (Cyprinodontiformes + Beloniformes) than to other perciforms. This result was also supported by two recent studies analysing multiple nuclear genes [17, 51]. Second, the traditional tree groups Lycodes with other perciforms, whereas in our results Lycodes is closely related to Gasterosteus (Gasterosteiformes). Interestingly, the sister-taxon relationship between Lycodes and Gasterosteus is also supported by recent studies using mitochondrial genome data [38, 52]. The difference between our "total evidence" tree and the classical hypothesis is significant based on the new data, as indicated by a one-tailed Shimodaira-Hasegawa (SH) test (p = 0.000).
We developed a genome-based approach to identify nuclear gene markers for phylogeny inference that are single-copy, contain large exons, and are conserved across extensive taxonomic distances. We show that our approach has practical value through direct experimentation on a representative sample of ray-finned fish, the largest vertebrate clade in need of phylogenetic resolution. The same approach, however, could be applied to other groups of organisms as long as two or more complete genome sequences are available. This research may have important implications for assembling the tree of life.
Genome-scale mining for phylogenetic markers
Whole genomic sequences of Danio rerio and Takifugu rubripes were retrieved from the ENSEMBL database. Exon sequences with length > 800 bp were then extracted from the genome databases. The exons extracted were compared in two steps: (1) within-genome sequence comparisons and (2) between-genome comparisons. The first step is designed to generate a set of single-copy nuclear gene exons (length > 800 bp) within each genome, whereas the second step should identify single-copy, putatively orthologous exons between D. rerio and T. rubripes (Figure 2). The BLAST algorithm was used for sequence similarity comparison. In addition to the parameters available in the BLAST program, we applied another parameter, coverage (C), to identify global sequence similarity between exons. The coverage was defined as the ratio of the total length of locally aligned sequences to the length of the query sequence. The similarity (S) was set to S < 50% for within-genome comparison, which means that only genes that have no counterpart more than 50% similar to themselves were kept. The similarity was set to S > 70% and the coverage to C > 30% in cross-genome comparison, which selected genes that are more than 70% similar and more than 30% aligned between D. rerio and T. rubripes. Subsequent comparisons were performed on the newly available genomes of stickleback (Gasterosteus aculeatus) and Japanese rice fish (Oryzias latipes), as described above. We programmed this procedure in Perl to automate the process and made the source code publicly available on our website. We are working to make it available for other genomic sequences and parameter values.
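The two-step filter described above can be illustrated with a minimal Python sketch. This is not the published Perl program: the `Hit` record, the function names, and the choice to merge overlapping local alignments before computing coverage are assumptions made here for illustration. The sketch takes already-parsed BLAST hits as input rather than calling BLAST itself.

```python
from collections import defaultdict
from dataclasses import dataclass

MIN_LEN = 800          # L: minimum exon length in bp (from the text)
MAX_SELF_SIM = 50.0    # within-genome similarity threshold S, in percent
MIN_CROSS_SIM = 70.0   # cross-genome similarity threshold S, in percent
MIN_COVERAGE = 0.30    # cross-genome coverage threshold C


@dataclass(frozen=True)
class Hit:
    """One local alignment (hypothetical record, e.g. parsed from BLAST output)."""
    query: str       # ID of the query exon
    subject: str     # ID of the matched exon
    identity: float  # percent identity of the local alignment
    q_start: int     # 1-based start on the query
    q_end: int       # 1-based end on the query, inclusive


def coverage(hits, query_len):
    """Aligned query length divided by query length.

    Assumption: overlapping alignments are merged so that no query
    position is counted twice.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted((h.q_start, h.q_end) for h in hits):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / query_len


def single_copy(exon_lengths, self_hits):
    """Step 1: long exons with no other exon >= MAX_SELF_SIM similar in the same genome."""
    duplicated = {h.query for h in self_hits
                  if h.query != h.subject and h.identity >= MAX_SELF_SIM}
    return {exon for exon, length in exon_lengths.items()
            if length > MIN_LEN and exon not in duplicated}


def candidate_pairs(query_exons, subject_exons, query_lengths, cross_hits):
    """Step 2: single-copy exon pairs passing the similarity and coverage thresholds."""
    grouped = defaultdict(list)
    for h in cross_hits:
        if (h.query in query_exons and h.subject in subject_exons
                and h.identity > MIN_CROSS_SIM):
            grouped[(h.query, h.subject)].append(h)
    return sorted(pair for pair, hits in grouped.items()
                  if coverage(hits, query_lengths[pair[0]]) > MIN_COVERAGE)
```

Changing the three thresholds at the top corresponds to altering the parameters L, S, and C, for example lowering the cross-genome similarity threshold when targeting more closely related organisms.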
Experimental testing for candidate markers
PCR and sequencing primers were designed on aligned sequences of D. rerio and T. rubripes for 15 randomly selected genes. Primer3 was used to design the primers. Degenerate primers and a nested-PCR design were used to ensure amplification of each gene in most of the taxa. Ten of the 15 genes tested were amplified as a single fragment in most of the 36 taxa examined. PCR primers for the 10 gene markers are listed in Table 1. The amplified fragments were directly sequenced, without cloning, using the BigDye system (Applied Biosystems). Sequences of the frequently used RAG1 gene were retrieved for the same taxa from GenBank for comparison to the newly developed markers [GenBank: AY430199, NM_131389, U15663, AB120889, DQ492511, AY308767, AF108420, EF033039 – EF033043]. When RAG1 sequences for the same taxa were not available, a taxon of the same family was used, i.e. Nimbochromis was used instead of Oreochromis and Neobythites was used instead of Brotula.
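The degenerate primers in Table 1 use standard IUPAC ambiguity codes (for example R = A or G, Y = C or T). As a small illustration, the following Python sketch counts how many distinct oligonucleotides a degenerate primer represents and lists them. The function names are hypothetical and are not part of Primer3 or of the original study.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """All non-degenerate sequences encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(seq) for seq in product(*choices)]
```

For the first primer in Table 1, GGACGCAGGACCGCARTAYC, the two ambiguity codes R and Y give a degeneracy of 2 × 2 = 4.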
Sequences of the 10 new markers in the 14 taxa were used in phylogenetic analysis to assess their performance. Sequences were aligned using ClustalX on the translated protein sequences. Uncorrected genetic distances were calculated using PAUP. The relative substitution rate for each marker was estimated using a Bayesian approach. Relative composition variability (RCV) and treeness were calculated following Phillips and Penny. ProtTest was used to choose the best model for the protein sequence data, and the AIC criterion was used to determine the data-partitioning scheme. Bayesian analysis implemented in MrBayes v3.1.1 and maximum likelihood analysis implemented in TreeFinder were performed on the protein sequences. One million generations with 4 chains were run for the Bayesian analysis, and the trees sampled prior to reaching convergence were discarded (as burn-in) before computing the consensus tree and posterior probabilities. Two independent runs were used to provide additional confirmation of convergence of the posterior probability distribution. Given the biased base composition in the nucleotide data indicated by the RCV value (Table 2), we analyzed the nucleotide data under the RY-coding scheme (C and T = Y, A and G = R), partitioned by gene in TreeFinder, since RY-coded data are less sensitive to base compositional bias. Alternative hypotheses were tested by a one-tailed Shimodaira and Hasegawa (SH) test with 1000 RELL bootstrap replicates implemented in TreeFinder.
This work was supported by grants from the University of Nebraska-Lincoln (to C. L.), National Science Foundation DEB-9985045 (to G. O.) and the University of Nebraska-Omaha (to G. L.). We thank Fred J. Potmesil and Thaine W. Rowley for help in computer programming.
- Delsuc F, Brinkmann H, Philippe H: Phylogenomics and the reconstruction of the tree of life. Nat Rev Genet. 2005, 6 (5): 361-375. 10.1038/nrg1603.
- Pamilo P, Nei M: Relationships Between Gene Trees and Species Trees. Mol Biol Evol. 1988, 5 (5): 568-583.
- Fitch WM: Distinguishing homologous from analogous proteins. Syst Zool. 1970, 19 (2): 99-113. 10.2307/2412448.
- Lopez P, Casane D, Philippe H: Heterotachy, an important process of protein evolution. Mol Biol Evol. 2002, 19 (1): 1-7.
- Felsenstein J: Case in which parsimony or compatibility methods will be positively misleading. Syst Biol. 1978, 27: 401-410.
- Weisburg WG, Giovannoni SJ, Woese CR: The Deinococcus-Thermus phylum and the effect of rRNA composition on phylogenetic tree construction. Syst Appl Microbiol. 1989, 11: 128-134.
- Foster PG, Hickey DA: Compositional bias may affect both DNA-based and protein-based phylogenetic reconstructions. J Mol Evol. 1999, 48 (3): 284-290. 10.1007/PL00006471.
- Eisen JA, Fraser CM: Phylogenomics: intersection of evolution and genomics. Science. 2003, 300 (5626): 1706-1707. 10.1126/science.1086292.
- Philippe H, Snell EA, Bapteste E, Lopez P, Holland PW, Casane D: Phylogenomics of eukaryotes: impact of missing data on large alignments. Mol Biol Evol. 2004, 21 (9): 1740-1752. 10.1093/molbev/msh182.
- Driskell AC, Ane C, Burleigh JG, McMahon MM, O'Meara B C, Sanderson MJ: Prospects for building the tree of life from large sequence databases. Science. 2004, 306 (5699): 1172-1174. 10.1126/science.1102036.
- Takezaki N, Figueroa F, Zaleska-Rutczynska Z, Klein J: Molecular phylogeny of early vertebrates: monophyly of the agnathans as revealed by sequences of 35 genes. Mol Biol Evol. 2003, 20 (2): 287-292. 10.1093/molbev/msg040.
- Bapteste E, Brinkmann H, Lee JA, Moore DV, Sensen CW, Gordon P, Durufle L, Gaasterland T, Lopez P, Muller M, Philippe H: The analysis of 100 genes supports the grouping of three highly divergent amoebae: Dictyostelium, Entamoeba, and Mastigamoeba. Proc Natl Acad Sci U S A. 2002, 99 (3): 1414-1419. 10.1073/pnas.032662799.
- Rokas A, Williams BL, King N, Carroll SB: Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature. 2003, 425 (6960): 798-804. 10.1038/nature02053.
- Murphy WJ, Eizirik E, Johnson WE, Zhang YP, Ryder OA, O'Brien SJ: Molecular phylogenetics and the origins of placental mammals. Nature. 2001, 409 (6820): 614-618. 10.1038/35054550.
- Brandley MC, Schmitz A, Reeder TW: Partitioned Bayesian analyses, partition choice, and the phylogenetic relationships of scincid lizards. Syst Biol. 2005, 54 (3): 373-390. 10.1080/10635150590946808.
- Castoe TA, Doan TM, Parkinson CL: Data partitions and complex models in Bayesian analysis: the phylogeny of Gymnophthalmid lizards. Syst Biol. 2004, 53 (3): 448-469. 10.1080/10635150490445797.
- Chen WJ, Ortí G, Meyer A: Novel evolutionary relationship among four fish model systems. Trends Genet. 2004, 20 (9): 424-431. 10.1016/j.tig.2004.07.005.
- Rokas A, Kruger D, Carroll SB: Animal evolution and the molecular signature of radiations compressed in time. Science. 2005, 310 (5756): 1933-1938. 10.1126/science.1116759.
- Whittall JB, Medina-Marino A, Zimmer EA, Hodges SA: Generating single-copy nuclear gene data for a recent adaptive radiation. Mol Phylogenet Evol. 2006, 39 (1): 124-134. 10.1016/j.ympev.2005.10.010.
- Hillis DM, Pollock DD, McGuire JA, Zwickl DJ: Is sparse taxon sampling a problem for phylogenetic inference?. Syst Biol. 2003, 52 (1): 124-126. 10.1080/10635150390132911.
- Soltis DE, Albert VA, Savolainen V, Hilu K, Qiu YL, Chase MW, Farris JS, Stefanovic S, Rice DW, Palmer JD, Soltis PS: Genome-scale data, angiosperm relationships, and "ending incongruence": a cautionary tale in phylogenetics. Trends Plant Sci. 2004, 9 (10): 477-483. 10.1016/j.tplants.2004.08.008.
- Lovejoy NR, Collette BB: Phylogenetic relationships of new world needlefishes (Teleostei: Belonidae) and the biogeography of transitions between marine and freshwater habitats. Copeia. 2001, 2001 (2): 324-338. 10.1643/0045-8511(2001)001[0324:PRONWN]2.0.CO;2.
- Saint KM, Austin CC, Donnellan SC, Hutchinson MN: C-mos, a nuclear marker useful for squamate phylogenetic analysis. Mol Phylogenet Evol. 1998, 10 (2): 259-263. 10.1006/mpev.1998.0515.
- Mohammad-Ali K, Eladari ME, Galibert F: Gorilla and orangutan c-myc nucleotide sequences: inference on hominoid phylogeny. J Mol Evol. 1995, 41 (3): 262-276. 10.1007/BF01215173.
- Groth JG, Barrowclough GF: Basal divergences in birds and the phylogenetic utility of the nuclear RAG-1 gene. Mol Phylogenet Evol. 1999, 12 (2): 115-123. 10.1006/mpev.1998.0603.
- Lyons-Weiler J, Hoelzer GA, Tausch RJ: Relative apparent synapomorphy analysis (RASA). I: The statistical measurement of phylogenetic signal. Mol Biol Evol. 1996, 13 (6): 749-757.
- Philippe H, Zhou Y, Brinkmann H, Rodrigue N, Delsuc F: Heterotachy and long-branch attraction in phylogenetics. BMC Evol Biol. 2005, 5: 50. 10.1186/1471-2148-5-50.
- Collins TM, Fedrigo O, Naylor GJ: Choosing the best genes for the job: the case for stationary genes in genome-scale phylogenetics. Syst Biol. 2005, 54 (3): 493-500. 10.1080/10635150590947339.
- Steel MA, Lockhart PJ, Penny D: Confidence in evolutionary trees from biological sequence data. Nature. 1993, 364 (6436): 440-442. 10.1038/364440a0.
- Phillips MJ, Delsuc F, Penny D: Genome-scale phylogeny and the detection of systematic biases. Mol Biol Evol. 2004, 21 (7): 1455-1458. 10.1093/molbev/msh137.
- Amores A, Force A, Yan YL, Joly L, Amemiya C, Fritz A, Ho RK, Langeland J, Prince V, Wang YL, Westerfield M, Ekker M, Postlethwait JH: Zebrafish hox clusters and vertebrate genome evolution. Science. 1998, 282 (5394): 1711-1714. 10.1126/science.282.5394.1711.
- Meyer A, Van de Peer Y: From 2R to 3R: evidence for a fish-specific genome duplication (FSGD). Bioessays. 2005, 27 (9): 937-945. 10.1002/bies.20293.
- Ciccarelli FD, von Mering C, Suyama M, Harrington ED, Izaurralde E, Bork P: Complex genomic rearrangements lead to novel primate gene function. Genome Res. 2005, 15 (3): 343-351. 10.1101/gr.3266405.
- Woods IG, Wilson C, Friedlander B, Chang P, Reyes DK, Nix R, Kelly PD, Chu F, Postlethwait JH, Talbot WS: The zebrafish gene map defines ancestral vertebrate chromosomes. Genome Res. 2005, 15 (9): 1307-1314. 10.1101/gr.4134305.
- Jaillon O, Aury JM, Brunet F, Petit JL, Stange-Thomann N, Mauceli E, Bouneau L, Fischer C, Ozouf-Costaz C, Bernot A, Nicaud S, Jaffe D, Fisher S, Lutfalla G, Dossat C, Segurens B, Dasilva C, Salanoubat M, Levy M, Boudet N, Castellano S, Anthouard V, Jubin C, Castelli V, Katinka M, Vacherie B, Biemont C, Skalli Z, Cattolico L, Poulain J, De Berardinis V, Cruaud C, Duprat S, Brottier P, Coutanceau JP, Gouzy J, Parra G, Lardier G, Chapple C, McKernan KJ, McEwan P, Bosak S, Kellis M, Volff JN, Guigo R, Zody MC, Mesirov J, Lindblad-Toh K, Birren B, Nusbaum C, Kahn D, Robinson-Rechavi M, Laudet V, Schachter V, Quetier F, Saurin W, Scarpelli C, Wincker P, Lander ES, Weissenbach J, Roest Crollius H: Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature. 2004, 431 (7011): 946-957. 10.1038/nature03025.
- Page RD, Cotton JA: Vertebrate phylogenomics: reconciled trees and gene duplications. Pac Symp Biocomput. 2002, 536-547.
- Friedlander TP, Regier JC, Mitter C: Nuclear gene sequences for higher level phylogenetic analysis: 14 promising candidates. Syst Biol. 1992, 41 (4): 483-490. 10.2307/2992589.
- Miya M, Takeshima H, Endo H, Ishiguro NB, Inoue JG, Mukai T, Satoh TP, Yamaguchi M, Kawaguchi A, Mabuchi K, Shirai SM, Nishida M: Major patterns of higher teleostean phylogenies: a new perspective based on 100 complete mitochondrial DNA sequences. Mol Phylogenet Evol. 2003, 26 (1): 121-138. 10.1016/S1055-7903(02)00332-9.
- Stiassny MLJ, Wiley EO, Johnson GD, de Carvalho MR: Gnathostome fishes. Assembling The Tree of Life. Edited by: Cracraft J, Donoghue MJ. 2004, New York, Oxford University Press, 410-429.
- Stiassny MLJ, Parenti LR, Johnson GD: Interrelationships of fishes. 1996, San Diego, Academic Press, xiii, 496 p.
- Arratia G: Phylogenetic relationships of teleostei: past and present. Estud Oceanol. 2000, 19: 19-51.
- Greenwood PH, Miles RS, Patterson C: Interrelationships of fishes. 1973, London, Academic Press, 536 p.
- Phylomarker - mining phylogenetic markers for assembling the Tree of Life [http://bioinfo-srv1.awh.unomaha.edu/phylomarker].
- Phillips MJ, Penny D: The root of the mammalian tree inferred from whole mitochondrial genomes. Mol Phylogenet Evol. 2003, 28 (2): 171-185. 10.1016/S1055-7903(03)00057-5.
- Abascal F, Zardoya R, Posada D: ProtTest: selection of best-fit models of protein evolution. Bioinformatics. 2005, 21 (9): 2104-2105. 10.1093/bioinformatics/bti263.
- Small RL, Cronn RC, Wendel JF: L. A. S. Johnson Review No. 2. Use of nuclear genes for phylogeny reconstruction in plants. Australian Systematic Botany. 2004, 17: 145-170. 10.1071/SB03015.
- Taylor JS, Braasch I, Frickey T, Meyer A, Van de Peer Y: Genome duplication, a trait shared by 22000 species of ray-finned fish. Genome Res. 2003, 13 (3): 382-390. 10.1101/gr.640303.
- Van de Peer Y, Taylor JS, Meyer A: Are all fishes ancient polyploids?. J Struct Funct Genomics. 2003, 3 (1-4): 65-73. 10.1023/A:1022652814749.
- Remm M, Storm CE, Sonnhammer EL: Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J Mol Biol. 2001, 314 (5): 1041-1052. 10.1006/jmbi.2000.5197.
- Nelson JS: Fishes of the world. 4th edition. 2006, New York, John Wiley and Sons, Inc., 601 pp.
- Steinke D, Salzburger W, Meyer A: Novel Relationships Among Ten Fish Model Species Revealed Based on a Phylogenomic Analysis Using ESTs. J Mol Evol. 2006, 62 (6): 772-784. 10.1007/s00239-005-0170-8.
- Miya M, Satoh TP, Nishida M: The phylogenetic position of toadfishes (order Batrachoidiformes) in the higher ray-finned fish as inferred from partitioned Bayesian analysis of 102 whole mitochondrial genome sequences. Biol J Linn Soc Lond. 2005, 85: 289-306. 10.1111/j.1095-8312.2005.00483.x.
- Shimodaira H, Hasegawa M: Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol Biol Evol. 1999, 16: 1114-1116.
- Ensembl [www.ensembl.org/index.html].
- Rozen S, Skaletsky H: Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol. 2000, 132: 365-386.
- Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG: The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 1997, 25 (24): 4876-4882. 10.1093/nar/25.24.4876.
- Swofford DL: PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. 2003, Sinauer Associates, Sunderland, Massachusetts.
- Ronquist F, Huelsenbeck JP: MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003, 19 (12): 1572-1574. 10.1093/bioinformatics/btg180.
- Jobb G, von Haeseler A, Strimmer K: TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics. BMC Evol Biol. 2004, 4: 18. 10.1186/1471-2148-4-18.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.