
Table 1 Sequence and cluster data for each taxon

From: Inferring angiosperm phylogeny from EST data with widespread gene duplication

Taxon (a)             Release (b)  Original TCs (c)  MaxORFs (d)  Clusters (e)  Final TCs (f)
Arabidopsis thaliana  12.1         28900             23737        343           729
Glycine max           12.0         31928             13930        538           1065
Lotus japonicus       3.0          12485             3116         365           452
Medicago truncatula   8.0          18612             12254        528           852
Oryza sativa          16.0         36381             25842        199           418
Pinus (g)             6.0          23531             13949        159           315
Solanum tuberosum     10.0         21063             12625        378           705
Total                              172900            105453       577           4536
  1. a Taxon as given by TIGR for the EST collection assembled in the Gene Index Database.
  2. b Versions used in this paper, current as of 18 February 2006.
  3. c The 363,971 sequences in the database for these taxa were screened to include only those sequences assembled by TIGR into Tentative Consensus (TC) sequences.
  4. d TCs were trimmed to the largest sense-direction ORF that was at least 500 nt in length; shorter sequences were discarded.
  5. e Number of clusters in which the taxon is represented, after screening for phylogenetic informativeness (at least three taxa and at least four sequences).
  6. f Total number of sequences from each taxon in the final set of clusters.
  7. g TIGR assembled this library from several species of Pinus.
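
The filtering steps described in notes d, e, and f can be sketched in Python as follows. This is a minimal illustration and not the authors' pipeline. All function names are hypothetical, and the ORF definition is an assumption: an ORF is taken here to be the longest run of complete codons without a stop codon in any of the three forward frames, with no start codon required. The source does not say how ORFs were delimited. The thresholds (500 nt, three taxa, four sequences) come from the notes.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_ORF_NT = 500  # threshold from note d


def longest_sense_orf(seq):
    """Longest stop-free run of codons over the three forward frames.

    Assumption: no start codon is required (not specified in the source).
    """
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = frame
        # End of the last complete codon in this frame.
        last = frame + ((len(seq) - frame) // 3) * 3
        for pos in range(frame, last, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if pos - start > len(best):
                    best = seq[start:pos]
                start = pos + 3
        if last - start > len(best):
            best = seq[start:last]
    return best


def trim_tcs(tcs, min_len=MIN_ORF_NT):
    """Map TC id -> trimmed ORF, discarding ORFs shorter than min_len."""
    trimmed = {}
    for tc_id, seq in tcs.items():
        orf = longest_sense_orf(seq)
        if len(orf) >= min_len:
            trimmed[tc_id] = orf
    return trimmed


def screen_clusters(clusters, min_taxa=3, min_seqs=4):
    """Keep clusters with at least min_taxa taxa and min_seqs sequences.

    Each cluster is a list of (taxon, tc_id) pairs.
    """
    return [c for c in clusters
            if len(c) >= min_seqs and len({t for t, _ in c}) >= min_taxa]


def tally(clusters):
    """Per-taxon counts: clusters containing the taxon, and total sequences."""
    in_clusters = Counter()
    n_seqs = Counter()
    for c in clusters:
        n_seqs.update(t for t, _ in c)
        in_clusters.update({t for t, _ in c})
    return in_clusters, n_seqs


# Toy data with made-up identifiers, for illustration only.
toy_clusters = [
    [("A", "a1"), ("A", "a2"), ("B", "b1"), ("C", "c1")],  # kept
    [("A", "a3"), ("B", "b2"), ("C", "c2")],               # too few sequences
    [("A", "a4"), ("A", "a5"), ("B", "b3"), ("B", "b4")],  # too few taxa
]
kept = screen_clusters(toy_clusters)
in_clusters, n_seqs = tally(kept)
```

In this sketch, `in_clusters` corresponds to the Clusters column and `n_seqs` to the Final TCs column, each computed per taxon from the clusters that pass the screen.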