Figure 2 | BMC Evolutionary Biology

Figure 2

From: An empirical evaluation of two-stage species tree inference strategies using a multilocus dataset from North American pines

Schematic for creating the four subsets $D_s$, $D_{s,0}$, $D_p$, and $D_{p,0}$ from the full dataset. In the matrices representing the full dataset and $D_s$, $D_{s,0}$, $D_p$, and $D_{p,0}$ (see Table 2), each row is an individual and each column is a locus. Thick black lines separate individuals belonging to different species, and gray boxes indicate missing sequences.

(A) At each locus, a single sequence from each species (indicated in red) is selected from the full dataset. These selected sequences form $D_s$, so that each locus has exactly one sequence per species. A subset of the loci in $D_s$ (indicated in yellow) forms $D_{s,0}$: a locus is retained only if, for each distinct pair of species other than pairs from distinct outgroups, the two selected sequences differ at one or more nucleotides.

(B) Dataset $D_p$ is the full starting dataset. At each locus, a matrix of p-distances between species is computed according to eq. 2. A subset of the loci in $D_p$ (indicated in red) forms $D_{p,0}$: a locus is retained only if the p-distance is nonzero for each distinct pair of species other than pairs from distinct outgroups. Note that the $D_{p,0}$ matrix includes loci 3 and 7, which are absent from the $D_{s,0}$ matrix. At these two loci, every pair of species contains at least one pair of individuals with different sequences, so the loci pass the $D_{p,0}$ filter; however, at least one pair of the 11 individuals selected for $D_s$ has identical sequences, so the loci fail the $D_{s,0}$ filter. Conversely, because every sequence in $D_s$ also appears in $D_p$, any locus that passes the $D_{s,0}$ filter also passes the $D_{p,0}$ filter. Therefore, the set of loci in $D_{p,0}$ is a superset of the set of loci in $D_{s,0}$, and $D_{p,0}$ always contains at least as many loci as $D_{s,0}$.
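The two filtering steps can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the data layout (`data[locus][species]` holding one aligned sequence per individual, with `None` for a missing sequence), the random choice of one sequence per species for $D_s$, and the p-distance formula are all hypothetical. In particular, eq. 2 is not reproduced in this caption, so `p_distance` below assumes the mean proportion of differing sites over all between-species pairs of non-missing sequences. The helper names (`prop_diff`, `p_distance`, `species_pairs`, `build_ds`, `filter_loci`) are illustrative only.

```python
import random
from itertools import combinations, product
from typing import Dict, List, Optional

# Hypothetical layout: data[locus][species] -> aligned sequences, one per
# individual; None marks a missing sequence (a gray box in the figure).
Data = Dict[str, Dict[str, List[Optional[str]]]]


def prop_diff(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def p_distance(seqs1: List[Optional[str]], seqs2: List[Optional[str]]) -> float:
    """ASSUMED stand-in for eq. 2: mean proportion of differing sites over all
    pairs of non-missing sequences, one taken from each species."""
    pairs = [(a, b) for a, b in product(seqs1, seqs2)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("a species has no sequence at this locus")
    return sum(prop_diff(a, b) for a, b in pairs) / len(pairs)


def species_pairs(species, outgroups):
    """Distinct species pairs to compare, skipping pairs of two outgroups."""
    return [(a, b) for a, b in combinations(sorted(species), 2)
            if not (a in outgroups and b in outgroups)]


def build_ds(data: Data, rng: random.Random) -> Data:
    """D_s: one non-missing sequence per species at each locus.
    ASSUMPTION: the sequence is chosen uniformly at random."""
    return {locus: {sp: [rng.choice([s for s in seqs if s is not None])]
                    for sp, seqs in by_sp.items()}
            for locus, by_sp in data.items()}


def filter_loci(data: Data, outgroups) -> List[str]:
    """Loci with a nonzero p-distance for every compared species pair.
    With one sequence per species (as in D_s), this is equivalent to requiring
    at least one nucleotide difference between the selected sequences."""
    return [locus for locus, by_sp in data.items()
            if all(p_distance(by_sp[a], by_sp[b]) > 0
                   for a, b in species_pairs(by_sp, outgroups))]


# Toy example: two ingroup species (A, B) and two outgroups (O1, O2).
outgroups = {"O1", "O2"}
data = {
    "x": {"A": ["ACGT"], "B": ["ACGA"], "O1": ["TTTT"], "O2": ["GGGG"]},
    "y": {"A": ["ACGT"], "B": ["ACGT"], "O1": ["TTTT"], "O2": ["GGGG"]},
    "z": {"A": ["ACGT", None], "B": ["ACGT", "ACGA"],
          "O1": ["TTTT"], "O2": ["GGGG"]},
    "w": {"A": ["ACGT"], "B": ["ACGA"], "O1": ["TTTT"], "O2": ["TTTT"]},
}

d_p0 = filter_loci(data, outgroups)  # D_p is the full dataset itself
d_s0 = filter_loci(build_ds(data, random.Random(1)), outgroups)
print(d_p0)  # ['x', 'z', 'w']
assert set(d_s0) <= set(d_p0)
```

Locus `y` is dropped from both subsets because A and B are identical, and locus `w` is kept because the identical outgroups O1 and O2 are not compared. Locus `z` plays the role of loci 3 and 7 in the figure: it always passes the $D_{p,0}$ filter, but it passes the $D_{s,0}$ filter only when the randomly selected B sequence differs from A's. Under the assumed averaging in `p_distance`, any nonzero difference between two selected sequences also makes the corresponding species-level mean nonzero, which is why the final assertion holds for any random seed.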