Table 2 Effects of hit fraction threshold on cluster assembly. Bold indicates the threshold chosen for the current study.

From: Inferring angiosperm phylogeny from EST data with widespread gene duplication

| Hit fraction (a) | Clusters (b) | Singletons (c) | Phylogenetically informative clusters (d) | Max size (e) | TCs in phylogenetically informative clusters (f) |
|---|---|---|---|---|---|
| 0.0 | 39924 | 26782 | 4423 | 6565 | 54051 |
| 0.1 | 47798 | 32824 | 4079 | 1947 | 42406 |
| 0.2 | 57229 | 41327 | 3324 | 1362 | 29403 |
| 0.3 | 64691 | 48864 | 2561 | 330 | 21504 |
| 0.4 | 71333 | 56383 | 1876 | 117 | 15457 |
| 0.5 | 77564 | 63890 | 1340 | 98 | 10721 |
| 0.6 | 83435 | 71539 | 897 | 95 | 7105 |
| 0.7 | 88864 | 79122 | 577 | 94 | 4536 |
| 0.8 | 94296 | 87186 | 324 | 92 | 2529 |
| 0.9 | 99843 | 95975 | 103 | 89 | 872 |
| 1.0 | 105144 | 104860 | 1 | 6 | 6 |

  1. a Minimum proportion of sequence similarity based on BLAST's pairwise comparisons. The hit fraction determines whether one sequence is linked to another (linked sequences are placed in the same cluster) and thus affects both the level of heterogeneity within clusters and the number of assembled clusters. The original data set contains 105,453 tentative consensus sequences (TCs).
  2. b Total number of assembled clusters.
  3. c Number of single-sequence clusters.
  4. d Phylogenetically informative clusters for this study are those that include at least three species and at least four sequences.
  5. e Number of tentative consensus sequences (TCs) in the largest phylogenetically informative cluster.
  6. f Total TCs in all phylogenetically informative clusters.
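
The footnotes above describe how the columns relate to the hit fraction threshold. The following is a minimal sketch of that logic, not the authors' pipeline. It assumes that linked pairs are merged transitively (single-linkage), that a pair is linked when its hit fraction is greater than or equal to the threshold, and that pairs without a BLAST hit are simply absent from the input. The function names, the `species_of` mapping, and the toy data are all hypothetical.

```python
from collections import defaultdict


def cluster_by_hit_fraction(seq_ids, pair_hit_fractions, threshold):
    """Group sequences by linking every pair whose hit fraction >= threshold.

    Assumption: linkage is transitive (single-linkage), implemented here
    with a small union-find structure.
    """
    parent = {s: s for s in seq_ids}

    def find(x):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), frac in pair_hit_fractions.items():
        if frac >= threshold:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for s in seq_ids:
        clusters[find(s)].add(s)
    return list(clusters.values())


def is_informative(cluster, species_of, min_species=3, min_sequences=4):
    """Footnote d: at least three species and at least four sequences."""
    n_species = len({species_of[s] for s in cluster})
    return len(cluster) >= min_sequences and n_species >= min_species


def summarize(clusters, species_of):
    """Compute the quantities reported in columns b through f for one threshold."""
    informative = [c for c in clusters if is_informative(c, species_of)]
    return {
        "clusters": len(clusters),
        "singletons": sum(1 for c in clusters if len(c) == 1),
        "informative": len(informative),
        "max_size": max((len(c) for c in informative), default=0),
        "tcs_in_informative": sum(len(c) for c in informative),
    }


# Hypothetical toy data: six sequences and four pairwise BLAST comparisons.
toy_species = {"t1": "A", "t2": "B", "t3": "C", "t4": "A", "t5": "D", "t6": "E"}
toy_pairs = {("t1", "t2"): 0.9, ("t2", "t3"): 0.6, ("t3", "t4"): 0.4, ("t5", "t6"): 0.2}

for threshold in (0.0, 0.3, 0.5):
    toy_clusters = cluster_by_hit_fraction(list(toy_species), toy_pairs, threshold)
    print(threshold, summarize(toy_clusters, toy_species))
```

On the toy data, raising the threshold from 0.3 to 0.5 splits the only four-sequence cluster, so the number of clusters and singletons rises while the number of phylogenetically informative clusters falls to zero. This is the same direction of change seen down the rows of the table.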