Correspondence (Open Access)
On Hill et al's conjecture for calculating the subtree prune and regraft distance between phylogenies
BMC Evolutionary Biology, volume 10, Article number: 334 (2010)
Abstract
Background
Recently, Hill et al. [1] implemented a new software package, called SPRIT, which aims at calculating the minimum number of horizontal gene transfer events needed to simultaneously explain the evolution of two rooted binary phylogenetic trees on the same set of taxa. To this end, SPRIT computes the closely related, so-called rooted subtree prune and regraft distance between two phylogenies. However, calculating this distance is an NP-hard problem, and exact algorithms are often only applicable to small or medium-sized problem instances. To overcome this problem, Hill et al. propose a divide-and-conquer approach to speed up their algorithm and conjecture that this approach can be used to compute the rooted subtree prune and regraft distance exactly.
Results
In this note, we present a counterexample to Hill et al's conjecture and subsequently show that a modified version of their conjecture holds.
Conclusion
While Hill et al's conjecture may result in an overestimate of the rooted subtree prune and regraft distance, a slightly more restricted version of their approach gives the desired outcome and can be applied to speed up the exact calculation of this distance between two phylogenies.
Background
In recent years, one of the main research foci in the development of theoretical frameworks for questions in evolutionary biology has shifted from the reconstruction of phylogenetic trees towards the reconstruction of phylogenetic networks. This has partly been triggered by the exponentially growing amount of sequence data arising from whole-genome sequencing projects and the subsequent detection of genes whose sequences are chimeras of distinct ancestral gene sequences and hence are likely to be the result of reticulation (e.g. horizontal gene transfer or hybridization). Although evolutionary biologists now mostly acknowledge the existence of species arising from reticulation within certain groups of organisms, the extent to which such events have influenced the evolutionary history of a set of present-day species remains controversial. To shed light on this question, Hill et al. [1] recently published a study centered on the identification and quantification of horizontal gene transfer. The authors implemented a new software package, called SPRIT, consisting of a heuristic as well as an exact algorithm, applied it to several data sets of variable size, and compared their results and running times with those obtained from other algorithms previously developed to analyze reticulate evolution.
Algorithmically, SPRIT draws on ideas borrowed from work on the graph-theoretic operation of rooted subtree prune and regraft (rSPR), which is a popular tool to quantify the dissimilarity between two trees. Loosely speaking, an rSPR operation cuts (prunes) a subtree and reattaches (regrafts) it to another part of the tree. The minimum number of rSPR operations that transform one phylogeny into the other is a lower bound on the number of reticulation events needed to simultaneously explain two phylogenies [2, 3]. This minimum number, which is computed by SPRIT, is referred to as the rSPR distance. However, since calculating this distance is an NP-hard optimization problem, the application of exact algorithms is often restricted to medium-sized data sets.
To overcome this obstacle, and thus to speed up SPRIT, Hill et al. propose a divide-and-conquer-type reduction that breaks the problem into several smaller and more tractable subproblems before calculating the rSPR distance for each subproblem separately. Briefly, the authors conjecture that the sum of rSPR distances over all smaller subproblems is equal to the rSPR distance of the original unreduced trees. In this note, we give a counterexample to their conjecture. Nevertheless, we subsequently show that a slightly more restricted version of their conjecture holds and can be used to exactly calculate the rSPR distance between two phylogenies by breaking the problem into smaller subproblems.
The remainder of this paper is organized as follows. The next section contains some mathematical preliminaries that are needed to formally state Hill et al's conjecture. The conjecture is then given in the subsequent section, which also contains the aforementioned counterexample. In the section after that, we show that a modified version of the conjecture holds. We end this note with a brief conclusion.
Preliminaries
In this section, we give some preliminary definitions that are used throughout this paper. Unless otherwise stated, the notation and terminology follow [4].
Phylogenetic Trees
A rooted binary phylogenetic X-tree $\mathcal{T}$ is a rooted tree whose root has degree two while all other interior vertices have degree three and whose leaf set is X. The set X is the label set of $\mathcal{T}$ and is frequently denoted by $\mathcal{L}(\mathcal{T})$. Furthermore, let X′ be a subset of X. The minimal rooted subtree of $\mathcal{T}$ that connects all the leaves in X′ is denoted by $\mathcal{T}(X^{\prime})$, while the restriction of $\mathcal{T}$ to X′, denoted by $\mathcal{T}|X^{\prime}$, is the rooted binary phylogenetic X′-tree obtained from $\mathcal{T}(X^{\prime})$ by contracting all degree-two vertices apart from the root.
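The restriction operation can be sketched in a few lines of Python. This is an illustration only: the nested-tuple encoding (a leaf is a label, an interior vertex is a pair of subtrees) and the function names `restrict` and `canon` are assumptions made here, not notation from [4].

```python
def restrict(tree, keep):
    """Return the restriction of tree to the labels in keep.

    Assumed encoding: a leaf is a label, an interior vertex is a pair
    (left, right). Returns None if no leaf of tree lies in keep.
    """
    if not isinstance(tree, tuple):
        return tree if tree in keep else None
    left, right = restrict(tree[0], keep), restrict(tree[1], keep)
    if left is None:
        return right  # this vertex would have degree two, so it is contracted
    if right is None:
        return left
    return (left, right)


def canon(tree):
    """Order-independent form, so that isomorphic trees compare equal."""
    if not isinstance(tree, tuple):
        return tree
    return frozenset((canon(tree[0]), canon(tree[1])))


T = (((1, 2), 3), (4, 5))      # hypothetical example tree on X = {1, ..., 5}
print(restrict(T, {1, 3, 5}))  # ((1, 3), 5)
```

Comparing `canon(restrict(T, A))` with `canon(restrict(T2, A))` is one simple way to test whether two trees agree on a label set A.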
Rooted Subtree Prune and Regraft
Let $\mathcal{T}$ be a rooted binary phylogenetic X-tree. For the purposes of the upcoming definition, we view the root of $\mathcal{T}$ as a vertex ρ adjoined to the original root by a pendant edge. Now, let e = {u, v} be any edge of $\mathcal{T}$ that is not incident with ρ such that u is the vertex on the path from ρ to $v$. Let ${\mathcal{T}}^{\prime}$ be the rooted binary phylogenetic X-tree obtained from $\mathcal{T}$ by deleting e and reattaching the resulting subtree with root $v$ via a new edge, say f, as follows. Subdivide an edge of the component that contains ρ with a new vertex u′, join u′ and $v$ with f, and contract u. Then ${\mathcal{T}}^{\prime}$ has been obtained from $\mathcal{T}$ by a rooted subtree prune and regraft (rSPR) operation. The rSPR distance between two rooted binary phylogenetic X-trees $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ is the minimum number of rSPR operations that transform $\mathcal{T}$ into ${\mathcal{T}}^{\prime}$. We denote this distance by ${d}_{\text{rSPR}}(\mathcal{T},{\mathcal{T}}^{\prime})$.
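To make the definition concrete, the following Python sketch enumerates all trees reachable by one rSPR operation and computes the rSPR distance by breadth-first search. It is a brute-force illustration that is only feasible for a handful of leaves, and it is not the algorithm used by SPRIT; the nested-tuple tree encoding and all function names are assumptions made for this example.

```python
from collections import deque


def leaves(t):
    return frozenset([t]) if not isinstance(t, tuple) else leaves(t[0]) | leaves(t[1])


def canon(t):
    """Order-independent form, so that isomorphic trees compare equal."""
    return t if not isinstance(t, tuple) else frozenset((canon(t[0]), canon(t[1])))


def subtrees(t):
    yield t
    if isinstance(t, tuple):
        yield from subtrees(t[0])
        yield from subtrees(t[1])


def prune(t, s):
    """Delete the pendant subtree with leaf set s and contract its parent."""
    a, b = t
    if leaves(a) == s:
        return b
    if leaves(b) == s:
        return a
    return (prune(a, s), b) if s <= leaves(a) else (a, prune(b, s))


def regraft(t, sub, target):
    """Subdivide the edge above the subtree with leaf set target and attach sub."""
    if leaves(t) == target:
        return (t, sub)
    a, b = t
    return (regraft(a, sub, target), b) if target <= leaves(a) else (a, regraft(b, sub, target))


def rspr_neighbours(t):
    # The first subtree is t itself; it hangs off the edge incident
    # with rho, which may not be cut, so it is skipped.
    for sub in list(subtrees(t))[1:]:
        rest = prune(t, leaves(sub))
        for target in subtrees(rest):
            yield regraft(rest, sub, leaves(target))


def rspr_distance(t1, t2):
    """Breadth-first search over tree space; exponential, tiny inputs only."""
    goal, seen, queue = canon(t2), {canon(t1)}, deque([(t1, 0)])
    while queue:
        t, d = queue.popleft()
        if canon(t) == goal:
            return d
        for n in rspr_neighbours(t):
            if canon(n) not in seen:
                seen.add(canon(n))
                queue.append((n, d + 1))
    return None  # only reached if the two trees have different leaf sets


T = ((1, 2), (3, 4))                             # hypothetical example trees
print(rspr_distance(T, (((1, 2), 3), 4)))        # 1
print(rspr_distance(T, ((1, 3), (2, 4))))        # 2
```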
Agreement Forests
Let $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ be two rooted binary phylogenetic X-trees. Again, to make the following work, regard the root of each of $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ as a vertex ρ adjoined to the original root by a pendant edge. An agreement forest $\mathcal{F}=\{{\mathcal{L}}_{\rho},{\mathcal{L}}_{1},{\mathcal{L}}_{2},\dots ,{\mathcal{L}}_{k}\}$ for $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ is a partition of $X\cup \{\rho \}$ such that $\rho \in {\mathcal{L}}_{\rho}$ and the following properties are satisfied:

(i) for all i ∈ {ρ, 1, ..., k}, we have $\mathcal{T}|{\mathcal{L}}_{i}\cong {\mathcal{T}}^{\prime}|{\mathcal{L}}_{i}$, and

(ii) the trees in $\{\mathcal{T}({\mathcal{L}}_{i}):i\in \{\rho ,1,\dots ,k\}\}$ and $\{{\mathcal{T}}^{\prime}({\mathcal{L}}_{i}):i\in \{\rho ,1,\dots ,k\}\}$ are vertex-disjoint subtrees of $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$, respectively.
Throughout the remainder of this note, we will interchangeably refer to $\{\mathcal{T}|{\mathcal{L}}_{\rho},\mathcal{T}|{\mathcal{L}}_{1},\mathcal{T}|{\mathcal{L}}_{2},\dots ,\mathcal{T}|{\mathcal{L}}_{k}\}$ and $\{{\mathcal{L}}_{\rho},{\mathcal{L}}_{1},\dots ,{\mathcal{L}}_{k}\}$ as an agreement forest for $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$. A maximum-agreement forest for $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ is an agreement forest for $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ with the smallest number of elements over all agreement forests for $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$. Note that a maximum-agreement forest for $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ is not necessarily unique.
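Properties (i) and (ii) can be checked mechanically. The Python sketch below is one possible way to do so under assumptions made here: trees are nested tuples, the extra root vertex carries the label `"rho"`, and each vertex is identified by the set of leaves below it. The function names are hypothetical.

```python
RHO = "rho"  # assumed label for the vertex adjoined above the root


def leaves(t):
    return frozenset([t]) if not isinstance(t, tuple) else leaves(t[0]) | leaves(t[1])


def restrict(t, keep):
    if not isinstance(t, tuple):
        return t if t in keep else None
    a, b = restrict(t[0], keep), restrict(t[1], keep)
    return b if a is None else a if b is None else (a, b)


def canon(t):
    return t if not isinstance(t, tuple) else frozenset((canon(t[0]), canon(t[1])))


def spanned_vertices(t, part):
    """Vertices of the minimal subtree connecting the labels in part."""
    everything = leaves(t) | {RHO}
    used = {frozenset([x]) for x in part}  # the leaves themselves (rho included)
    stack = [t]
    while stack:
        v = stack.pop()
        if isinstance(v, tuple):
            directions = [leaves(v[0]), leaves(v[1]), everything - leaves(v)]
            # An interior vertex lies on a path between two labels of part
            # exactly if part reaches into at least two of its three directions.
            if sum(bool(part & d) for d in directions) >= 2:
                used.add(leaves(v))
            stack.extend(v)
    return used


def is_agreement_forest(t1, t2, forest):
    parts = [frozenset(p) for p in forest]
    universe = leaves(t1) | {RHO}
    # The parts must partition X together with rho.
    if frozenset().union(*parts) != universe or sum(map(len, parts)) != len(universe):
        return False
    # Property (i): the restrictions agree. Since rho always sits above the
    # root, it suffices to compare the rooted restrictions to part minus rho.
    if any(canon(restrict(t1, p)) != canon(restrict(t2, p)) for p in parts):
        return False
    # Property (ii): the spanned subtrees are vertex-disjoint in each tree.
    for t in (t1, t2):
        seen = set()
        for p in parts:
            verts = spanned_vertices(t, p)
            if seen & verts:
                return False
            seen |= verts
    return True


T1, T2 = ((1, 2), (3, 4)), ((1, 3), (2, 4))                 # hypothetical trees
print(is_agreement_forest(T1, T2, [{RHO, 1, 2}, {3}, {4}]))  # True
print(is_agreement_forest(T1, T2, [{RHO, 1, 2}, {3, 4}]))    # False
```

The second forest fails property (ii): in the second tree, the path between leaves 3 and 4 passes through vertices already used to connect ρ, 1 and 2.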
Bordewich and Semple [5] established the following characterization, which directly relates the rSPR distance to the number of elements in a maximum-agreement forest and is crucial to many algorithms that exactly compute the rSPR distance between two rooted binary phylogenetic trees.
Theorem 1. Let $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ be two rooted binary phylogenetic X-trees, and let $\{{\mathcal{T}}_{\rho},{\mathcal{T}}_{1},{\mathcal{T}}_{2},\dots ,{\mathcal{T}}_{k}\}$ be a maximum-agreement forest for $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$. Then

$${d}_{\text{rSPR}}(\mathcal{T},{\mathcal{T}}^{\prime})=k. \qquad (1)$$
Clusters
Let $\mathcal{T}$ be a rooted binary phylogenetic X-tree, and let A be a subset of X with |A| ≥ 2. We say that A is a cluster of $\mathcal{T}$ if there is a vertex $v$ in $\mathcal{T}$ whose set of descendants is precisely A. We denote this cluster by ${\mathcal{C}}_{\mathcal{T}}(v)$.
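As a small illustration, the clusters of a tree can be collected with a single traversal. The sketch below assumes a nested-tuple encoding of trees (a leaf is a label, an interior vertex is a pair of subtrees); the function name `clusters` is chosen here for convenience.

```python
def clusters(tree):
    """Map every cluster A with |A| >= 2 to the subtree rooted at its vertex."""
    found = {}

    def visit(v):
        if not isinstance(v, tuple):
            return frozenset([v])
        below = visit(v[0]) | visit(v[1])  # leaves descending from v
        found[below] = v
        return below

    visit(tree)
    return found


T = (((1, 2), 3), (4, 5))  # hypothetical example tree
print(sorted(sorted(c) for c in clusters(T)))
# [[1, 2], [1, 2, 3], [1, 2, 3, 4, 5], [4, 5]]
```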
We next consider several different types of clusters that will play an important role in the remainder of this paper. Let $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ be two rooted binary phylogenetic X-trees, and let A be a cluster that is common to $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$; that is, there exist a vertex $v$ in $\mathcal{T}$ and a vertex ${v}^{\prime}$ in ${\mathcal{T}}^{\prime}$ such that $A={\mathcal{C}}_{\mathcal{T}}(v)={\mathcal{C}}_{{\mathcal{T}}^{\prime}}({v}^{\prime})$. Furthermore, let u (resp. u′) be the parent vertex of $v$ (resp. ${v}^{\prime}$) in $\mathcal{T}$ (resp. ${\mathcal{T}}^{\prime}$), and let w (resp. w′) be the child vertex of u (resp. u′) with $w\ne v$ (resp. ${w}^{\prime}\ne {v}^{\prime}$). If no proper subset of A is a common cluster of $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$, we refer to A as a minimal cluster. Moreover, A is a solvable cluster if A is minimal and ${\mathcal{C}}_{\mathcal{T}}(u)={\mathcal{C}}_{{\mathcal{T}}^{\prime}}({u}^{\prime})$. Lastly, we say that A is a subtree-like cluster if A is a solvable cluster and $\mathcal{T}|{\mathcal{C}}_{\mathcal{T}}(w)\cong {\mathcal{T}}^{\prime}|{\mathcal{C}}_{{\mathcal{T}}^{\prime}}({w}^{\prime})$. Roughly speaking, this last condition is satisfied if the subtree with root w in $\mathcal{T}$ is identical to the subtree with root w′ in ${\mathcal{T}}^{\prime}$. We refer to $\mathcal{T}|{\mathcal{C}}_{\mathcal{T}}(w)$ as the common subtree associated with A and note that it may consist of a single isolated vertex. For example, A = {1, 2, ..., 6} is a solvable cluster of the two rooted binary phylogenetic X-trees $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ that are shown in Figure 1 since ${\mathcal{C}}_{\mathcal{T}}(u)={\mathcal{C}}_{{\mathcal{T}}^{\prime}}({u}^{\prime})$ = {1, 2, ..., 12}.
However, as $\mathcal{T}|\{7,8,\dots ,12\}\ncong {\mathcal{T}}^{\prime}|\{7,8,\dots ,12\}$, it follows that A is not a subtree-like cluster of $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$.
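The three cluster types can be told apart programmatically. The following Python sketch classifies each minimal common cluster of two trees as minimal, solvable, or subtree-like. The nested-tuple encoding, the function names, and the example trees are assumptions made for illustration; the trees are not those of Figure 1. The full label set X is skipped, since the decomposition described in this section never uses it.

```python
def clusters(tree):
    """Map every cluster A with |A| >= 2 to the subtree rooted at its vertex."""
    found = {}

    def visit(v):
        if not isinstance(v, tuple):
            return frozenset([v])
        below = visit(v[0]) | visit(v[1])
        found[below] = v
        return below

    visit(tree)
    return found


def canon(t):
    return t if not isinstance(t, tuple) else frozenset((canon(t[0]), canon(t[1])))


def parent_cluster(cl, a):
    """Cluster of the parent u of the vertex whose cluster is a."""
    return min((c for c in cl if a < c), key=len)


def sibling_subtree(cl, a):
    """Subtree rooted at w, the other child of u (possibly a single leaf)."""
    sib = parent_cluster(cl, a) - a
    return cl[sib] if len(sib) > 1 else next(iter(sib))


def classify(t1, t2):
    c1, c2 = clusters(t1), clusters(t2)
    whole = max(c1, key=len)                      # the full label set X
    common = [a for a in c1 if a in c2 and a != whole]
    kinds = {}
    for a in common:
        if any(b < a for b in common):
            continue                              # a proper subset is common: not minimal
        kind = "minimal"
        if parent_cluster(c1, a) == parent_cluster(c2, a):
            kind = "solvable"
            if canon(sibling_subtree(c1, a)) == canon(sibling_subtree(c2, a)):
                kind = "subtree-like"
        kinds[a] = kind
    return kinds


def show(kinds):
    return {tuple(sorted(a)): k for a, k in kinds.items()}


# Hypothetical example pairs.
S1, S2 = (((1, 2), 3), ((4, 5), 6)), (((1, 3), 2), ((4, 6), 5))
U1, U2 = ((((1, 2), 3), 4), 5), ((((1, 3), 2), 4), 5)
print(show(classify(S1, S2)))  # {(1, 2, 3): 'solvable', (4, 5, 6): 'solvable'}
print(show(classify(U1, U2)))  # {(1, 2, 3): 'subtree-like'}
```

In the first pair, each of the two clusters is solvable but not subtree-like because the sibling subtrees differ between the trees. In the second pair, the sibling of {1, 2, 3} is the single leaf 4 in both trees.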
Now, let Θ ∈ {minimal, solvable, subtree-like}. We next describe algorithmically how to obtain a sequence of tree pairs, which is needed to state Hill et al's conjecture mathematically, by decomposing two rooted binary phylogenetic X-trees $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ into smaller subtrees. As previously, view the roots of $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ as a vertex ρ adjoined to the original root by a pendant edge, and regard ρ as part of the label set; that is, $\mathcal{L}(\mathcal{T})=X\cup \{\rho \}$. Setting i to be 1, let ${A}_{i}$ be a common Θ cluster of $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ with $|\mathcal{L}(\mathcal{T})\setminus {A}_{i}|>1$. Let ${\mathcal{T}}_{i}$ denote the rooted binary phylogenetic tree $\mathcal{T}|{A}_{i}$ (viewing the root of ${\mathcal{T}}_{i}$ as a vertex ${\rho}_{i}$ adjoined to the original root by a pendant edge) and reset $\mathcal{T}$ to be the tree obtained from $\mathcal{T}$ by replacing $\mathcal{T}({A}_{i})$ with a new vertex ${a}_{i}$. Analogously, let ${{\mathcal{T}}^{\prime}}_{i}$ denote the rooted binary phylogenetic tree ${\mathcal{T}}^{\prime}|{A}_{i}$ (viewing the root of ${{\mathcal{T}}^{\prime}}_{i}$ as a vertex ${\rho}_{i}$ adjoined to the original root by a pendant edge) and reset ${\mathcal{T}}^{\prime}$ to be the tree obtained from ${\mathcal{T}}^{\prime}$ by replacing ${\mathcal{T}}^{\prime}({A}_{i})$ with a new vertex ${a}_{i}$. If $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ contain a Θ cluster ${A}_{i+1}$ with $|\mathcal{L}(\mathcal{T})\setminus {A}_{i+1}|>1$, either stop or increment i by 1 and repeat this process; otherwise, stop. Eventually, we obtain a sequence

$$({\mathcal{T}}_{1},{{\mathcal{T}}^{\prime}}_{1}),({\mathcal{T}}_{2},{{\mathcal{T}}^{\prime}}_{2}),\dots ,({\mathcal{T}}_{t},{{\mathcal{T}}^{\prime}}_{t}),({\mathcal{T}}_{\rho},{{\mathcal{T}}^{\prime}}_{\rho})$$

of pairs of rooted binary phylogenetic trees, where ${\mathcal{T}}_{\rho}$ and ${{\mathcal{T}}^{\prime}}_{\rho}$ denote the two trees after the replacement of $\mathcal{T}({A}_{t})$ and ${\mathcal{T}}^{\prime}({A}_{t})$ with a vertex ${a}_{t}$. We call this sequence a cluster sequence of $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ with respect to a specific cluster type Θ. An example of a cluster sequence with respect to Θ = solvable for the two rooted binary phylogenetic trees depicted in Figure 1 is shown in Figure 2.
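The decomposition step itself is mechanical once the clusters have been chosen. The Python sketch below builds a cluster sequence from a user-supplied list of common clusters. It assumes nested-tuple trees and names the new leaves `"a1"`, `"a2"`, and so on. It does not verify that the supplied sets are Θ clusters, and the example trees are hypothetical rather than those of Figure 1.

```python
def leaves(t):
    return frozenset([t]) if not isinstance(t, tuple) else leaves(t[0]) | leaves(t[1])


def split_off(tree, a, new_leaf):
    """Return (subtree with leaf set a, tree with that subtree replaced by new_leaf)."""
    if leaves(tree) == a:
        return tree, new_leaf
    if not isinstance(tree, tuple):
        raise ValueError("the given label set is not a cluster of the tree")
    left, right = tree
    if a <= leaves(left):
        sub, left = split_off(left, a, new_leaf)
    else:
        sub, right = split_off(right, a, new_leaf)
    return sub, (left, right)


def cluster_sequence(t1, t2, chosen):
    """Decompose (t1, t2) along the clusters A_1, ..., A_t listed in chosen."""
    pairs = []
    for i, a in enumerate(chosen, start=1):
        a = frozenset(a)
        # rho counts as a label, so |L(T) - A_i| > 1 means A_i is not all of X.
        if a == leaves(t1):
            raise ValueError("A_i must leave more than one label outside")
        sub1, t1 = split_off(t1, a, f"a{i}")
        sub2, t2 = split_off(t2, a, f"a{i}")
        pairs.append((sub1, sub2))
    pairs.append((t1, t2))  # the final pair (T_rho, T'_rho)
    return pairs


T1 = (((1, 2), 3), ((4, 5), 6))  # hypothetical example trees
T2 = (((1, 3), 2), ((4, 6), 5))
for pair in cluster_sequence(T1, T2, [{1, 2, 3}, {4, 5, 6}]):
    print(pair)
```

For these inputs the sequence has three tree pairs: one for each chosen cluster and a final pair on the two new leaves.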
Hill et al's Conjecture and a Counterexample
We begin this section by formally stating Hill et al's conjecture which was introduced in [1].
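Conjecture 2. Let $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ be two rooted binary phylogenetic X-trees. Let $({\mathcal{T}}_{1},{{\mathcal{T}}^{\prime}}_{1}),\dots ,({\mathcal{T}}_{t},{{\mathcal{T}}^{\prime}}_{t}),({\mathcal{T}}_{\rho},{{\mathcal{T}}^{\prime}}_{\rho})$ be a cluster sequence for $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ with respect to Θ = solvable. Then

$${d}_{\text{rSPR}}(\mathcal{T},{\mathcal{T}}^{\prime})={d}_{\text{rSPR}}({\mathcal{T}}_{1},{{\mathcal{T}}^{\prime}}_{1})+\dots +{d}_{\text{rSPR}}({\mathcal{T}}_{t},{{\mathcal{T}}^{\prime}}_{t})+{d}_{\text{rSPR}}({\mathcal{T}}_{\rho},{{\mathcal{T}}^{\prime}}_{\rho}).$$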
Next, we detail a counterexample to the above conjecture, based on the two rooted binary phylogenetic X-trees $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ that are shown in Figure 1. A maximum-agreement forest $\mathcal{F}$ for $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ contains 5 elements and is shown at the top of Figure 3. By Theorem 1, this implies that ${d}_{\text{rSPR}}(\mathcal{T},{\mathcal{T}}^{\prime})=4$. Now, consider the cluster sequence with respect to Θ = solvable for $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ that contains three tree pairs and is depicted in Figure 2. The first tree pair (${\mathcal{T}}_{1},{{\mathcal{T}}^{\prime}}_{1}$) consists of the restricted subtrees of $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ whose leaf set is the solvable cluster ${A}_{1}$ = {1, 2, ..., 6} of $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$; thus ${\mathcal{T}}_{1}=\mathcal{T}|({A}_{1}\cup \{{\rho}_{1}\})$ and ${{\mathcal{T}}^{\prime}}_{1}={\mathcal{T}}^{\prime}|({A}_{1}\cup \{{\rho}_{1}\})$. Similarly, the second tree pair (${\mathcal{T}}_{2},{{\mathcal{T}}^{\prime}}_{2}$) consists of the restricted subtrees, with leaf set the solvable cluster ${A}_{2}$ = {7, 8, ..., 12}, of the two trees that have been obtained from $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ by replacing $\mathcal{T}({A}_{1})$ and ${\mathcal{T}}^{\prime}({A}_{1})$, respectively, with a single leaf ${a}_{1}$. Lastly, the third tree pair (${\mathcal{T}}_{\rho},{{\mathcal{T}}^{\prime}}_{\rho}$) can be regarded as being obtained from $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ by replacing $\mathcal{T}({A}_{1})$ and ${\mathcal{T}}^{\prime}({A}_{1})$ with a leaf ${a}_{1}$ and replacing $\mathcal{T}({A}_{2})$ and ${\mathcal{T}}^{\prime}({A}_{2})$ with a leaf ${a}_{2}$. For each tree pair (${\mathcal{T}}_{i},{{\mathcal{T}}^{\prime}}_{i}$) of the cluster sequence shown in Figure 2, a maximum-agreement forest ${\mathcal{F}}_{i}$ with i ∈ {1, 2, ρ} is depicted in the bottom part of Figure 3.
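Note that each forest ${\mathcal{F}}_{i}$ is the unique maximum-agreement forest for ${\mathcal{T}}_{i}$ and ${{\mathcal{T}}^{\prime}}_{i}$. Now, by Equation 1, we have

$${d}_{\text{rSPR}}({\mathcal{T}}_{1},{{\mathcal{T}}^{\prime}}_{1})+{d}_{\text{rSPR}}({\mathcal{T}}_{2},{{\mathcal{T}}^{\prime}}_{2})+{d}_{\text{rSPR}}({\mathcal{T}}_{\rho},{{\mathcal{T}}^{\prime}}_{\rho})=(|{\mathcal{F}}_{1}|-1)+(|{\mathcal{F}}_{2}|-1)+(|{\mathcal{F}}_{\rho}|-1)=5,$$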
which is strictly greater than ${d}_{\text{rSPR}}(\mathcal{T},{\mathcal{T}}^{\prime})$; thus showing that Conjecture 2 does not hold.
Using Subtree-Like Clusters to Prove Hill et al's Conjecture
In this section, we show that Conjecture 2 holds if we consider a subtree-like cluster instead of a solvable cluster in each iteration of computing a cluster sequence for two rooted binary phylogenetic trees. We first prove the result for a cluster sequence of size two and then see that this result generalizes to cluster sequences of greater size.
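Lemma 3. Let $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ be two rooted binary phylogenetic X-trees. Let (${\mathcal{T}}_{1},{{\mathcal{T}}^{\prime}}_{1}$), (${\mathcal{T}}_{\rho},{{\mathcal{T}}^{\prime}}_{\rho}$) be a cluster sequence for $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ with respect to Θ = subtree-like. Then

$${d}_{\text{rSPR}}(\mathcal{T},{\mathcal{T}}^{\prime})={d}_{\text{rSPR}}({\mathcal{T}}_{1},{{\mathcal{T}}^{\prime}}_{1})+{d}_{\text{rSPR}}({\mathcal{T}}_{\rho},{{\mathcal{T}}^{\prime}}_{\rho}).$$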
Proof. Let ${A}_{1}$ be the subtree-like cluster $\mathcal{L}({\mathcal{T}}_{1})\setminus \{{\rho}_{1}\}$ of $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$. We start by making an observation that is crucial for what follows. By the definition of a subtree-like cluster, there exists a common subtree, say $\mathcal{S}$, that is associated with ${A}_{1}$ in $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$. Clearly, $\mathcal{S}$ is also a common subtree of ${\mathcal{T}}_{\rho}$ and ${{\mathcal{T}}^{\prime}}_{\rho}$. Furthermore, as ${\mathcal{T}}_{\rho}$ has been obtained from $\mathcal{T}$ by replacing $\mathcal{T}({A}_{1})$ with a single vertex ${a}_{1}$ and as ${{\mathcal{T}}^{\prime}}_{\rho}$ has been obtained from ${\mathcal{T}}^{\prime}$ by replacing ${\mathcal{T}}^{\prime}({A}_{1})$ with a single vertex ${a}_{1}$, it is easily checked that ${\mathcal{T}}_{\rho}|(\mathcal{L}(\mathcal{S})\cup \{{a}_{1}\})$ is a common subtree of ${\mathcal{T}}_{\rho}$ and ${{\mathcal{T}}^{\prime}}_{\rho}$.
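We now show that

$${d}_{\text{rSPR}}(\mathcal{T},{\mathcal{T}}^{\prime})\le {d}_{\text{rSPR}}({\mathcal{T}}_{1},{{\mathcal{T}}^{\prime}}_{1})+{d}_{\text{rSPR}}({\mathcal{T}}_{\rho},{{\mathcal{T}}^{\prime}}_{\rho}). \qquad (2)$$

Let ${\mathcal{F}}_{1}$ be a maximum-agreement forest for ${\mathcal{T}}_{1}$ and ${{\mathcal{T}}^{\prime}}_{1}$, and let ${\mathcal{F}}_{\rho}$ be a maximum-agreement forest for ${\mathcal{T}}_{\rho}$ and ${{\mathcal{T}}^{\prime}}_{\rho}$. By the observation prior to this paragraph, it follows from Proposition 3.2 of [5] that $\mathcal{L}(\mathcal{S})\cup \{{a}_{1}\}$ is a subset of an element, say ${\mathcal{L}}_{{a}_{1}}$, in ${\mathcal{F}}_{\rho}$. Furthermore, let ${\mathcal{L}}_{{\rho}_{1}}$ be the element of ${\mathcal{F}}_{1}$ with ${\rho}_{1}\in {\mathcal{L}}_{{\rho}_{1}}$. As ${\mathcal{F}}_{1}$ is an agreement forest for ${\mathcal{T}}_{1}$ and ${{\mathcal{T}}^{\prime}}_{1}$ and as ${\mathcal{F}}_{\rho}$ is such a forest for ${\mathcal{T}}_{\rho}$ and ${{\mathcal{T}}^{\prime}}_{\rho}$, it follows that

$$\mathcal{F}=({\mathcal{F}}_{1}\setminus \{{\mathcal{L}}_{{\rho}_{1}}\})\cup ({\mathcal{F}}_{\rho}\setminus \{{\mathcal{L}}_{{a}_{1}}\})\cup \{({\mathcal{L}}_{{\rho}_{1}}\setminus \{{\rho}_{1}\})\cup ({\mathcal{L}}_{{a}_{1}}\setminus \{{a}_{1}\})\}$$

is an agreement forest for $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$. As ${\mathcal{L}}_{{a}_{1}}\setminus \{{a}_{1}\}$ always contains an element, note that $({\mathcal{L}}_{{\rho}_{1}}\setminus \{{\rho}_{1}\})\cup ({\mathcal{L}}_{{a}_{1}}\setminus \{{a}_{1}\})$ is never the empty set. Thus $|\mathcal{F}|=|{\mathcal{F}}_{1}|+|{\mathcal{F}}_{\rho}|-1$ and, by Theorem 1, we have

$${d}_{\text{rSPR}}(\mathcal{T},{\mathcal{T}}^{\prime})\le |\mathcal{F}|-1=(|{\mathcal{F}}_{1}|-1)+(|{\mathcal{F}}_{\rho}|-1)={d}_{\text{rSPR}}({\mathcal{T}}_{1},{{\mathcal{T}}^{\prime}}_{1})+{d}_{\text{rSPR}}({\mathcal{T}}_{\rho},{{\mathcal{T}}^{\prime}}_{\rho}).$$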
This establishes Equation 2.
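We now turn to the second part of this proof and show that

$${d}_{\text{rSPR}}(\mathcal{T},{\mathcal{T}}^{\prime})\ge {d}_{\text{rSPR}}({\mathcal{T}}_{1},{{\mathcal{T}}^{\prime}}_{1})+{d}_{\text{rSPR}}({\mathcal{T}}_{\rho},{{\mathcal{T}}^{\prime}}_{\rho}). \qquad (3)$$

Let $\mathcal{F}$ be a maximum-agreement forest for $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$. The remainder of this part splits into two cases. First, assume that there exists an element in $\mathcal{F}$, say ${\mathcal{L}}_{m}$, such that ${\mathcal{L}}_{m}\cap {A}_{1}\ne \varnothing $ and ${\mathcal{L}}_{m}\cap ((X\setminus {A}_{1})\cup \{\rho \})\ne \varnothing $. Note that ${\mathcal{L}}_{m}$ is the only element with the described properties, as otherwise $\mathcal{F}$ is not an agreement forest for $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$. Let ${\mathcal{L}}_{{m}^{\prime}}=({\mathcal{L}}_{m}\cap {A}_{1})\cup \{{\rho}_{1}\}$, and let ${\mathcal{L}}_{{m}^{\prime \prime}}=({\mathcal{L}}_{m}\cap ((X\setminus {A}_{1})\cup \{\rho \}))\cup \{{a}_{1}\}$. Since $\mathcal{F}$ is an agreement forest for $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$,

$${\mathcal{F}}_{1}=\{\mathcal{L}\in \mathcal{F}:\mathcal{L}\subseteq {A}_{1}\}\cup \{{\mathcal{L}}_{{m}^{\prime}}\}$$

is such a forest for ${\mathcal{T}}_{1}$ and ${{\mathcal{T}}^{\prime}}_{1}$ and

$${\mathcal{F}}_{\rho}=\{\mathcal{L}\in \mathcal{F}:\mathcal{L}\subseteq (X\setminus {A}_{1})\cup \{\rho \}\}\cup \{{\mathcal{L}}_{{m}^{\prime \prime}}\}$$

is an agreement forest for ${\mathcal{T}}_{\rho}$ and ${{\mathcal{T}}^{\prime}}_{\rho}$. Second, assume that no such element ${\mathcal{L}}_{m}$ exists. Hence, every element $\mathcal{L}$ in $\mathcal{F}$ is either a subset of ${A}_{1}$ or a subset of $(X\setminus {A}_{1})\cup \{\rho \}$. Furthermore, as ${A}_{1}$ is a subtree-like cluster of $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ whose associated common subtree is $\mathcal{S}$, it again follows from Proposition 3.2 of [5] that $\mathcal{L}(\mathcal{S})$ is a subset of an element, say ${\mathcal{L}}_{\mathcal{S}}$, in $\mathcal{F}$. Now, as $\mathcal{F}$ is an agreement forest for $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$, it follows that

$${\mathcal{F}}_{1}=\{\mathcal{L}\in \mathcal{F}:\mathcal{L}\subseteq {A}_{1}\}\cup \{\{{\rho}_{1}\}\}$$

is an agreement forest for ${\mathcal{T}}_{1}$ and ${{\mathcal{T}}^{\prime}}_{1}$ and

$${\mathcal{F}}_{\rho}=(\{\mathcal{L}\in \mathcal{F}:\mathcal{L}\subseteq (X\setminus {A}_{1})\cup \{\rho \}\}\setminus \{{\mathcal{L}}_{\mathcal{S}}\})\cup \{{\mathcal{L}}_{\mathcal{S}}\cup \{{a}_{1}\}\}$$

is such a forest for ${\mathcal{T}}_{\rho}$ and ${{\mathcal{T}}^{\prime}}_{\rho}$. Regardless of whether or not ${\mathcal{L}}_{m}$ exists, we have $|\mathcal{F}|=|{\mathcal{F}}_{1}|+|{\mathcal{F}}_{\rho}|-1$, and therefore,

$${d}_{\text{rSPR}}(\mathcal{T},{\mathcal{T}}^{\prime})=|\mathcal{F}|-1=(|{\mathcal{F}}_{1}|-1)+(|{\mathcal{F}}_{\rho}|-1)\ge {d}_{\text{rSPR}}({\mathcal{T}}_{1},{{\mathcal{T}}^{\prime}}_{1})+{d}_{\text{rSPR}}({\mathcal{T}}_{\rho},{{\mathcal{T}}^{\prime}}_{\rho}).$$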
This establishes Equation 3, and combining Equations 2 and 3 completes the proof of this lemma.
The next theorem directly follows from repeated applications of Lemma 3.
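Theorem 4. Let $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ be two rooted binary phylogenetic X-trees. Let $({\mathcal{T}}_{1},{{\mathcal{T}}^{\prime}}_{1}),\dots ,({\mathcal{T}}_{t},{{\mathcal{T}}^{\prime}}_{t}),({\mathcal{T}}_{\rho},{{\mathcal{T}}^{\prime}}_{\rho})$ be a cluster sequence for $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ with respect to Θ = subtree-like. Then

$${d}_{\text{rSPR}}(\mathcal{T},{\mathcal{T}}^{\prime})={d}_{\text{rSPR}}({\mathcal{T}}_{1},{{\mathcal{T}}^{\prime}}_{1})+\dots +{d}_{\text{rSPR}}({\mathcal{T}}_{t},{{\mathcal{T}}^{\prime}}_{t})+{d}_{\text{rSPR}}({\mathcal{T}}_{\rho},{{\mathcal{T}}^{\prime}}_{\rho}).$$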
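The additivity over a subtree-like cluster sequence can be checked by brute force on a toy instance. The Python sketch below is a sanity check under assumptions made here, not part of the proof: trees are nested tuples, the rSPR distance is found by exhaustive breadth-first search (feasible only for very small trees), and the example pair and its decomposition along the subtree-like cluster {1, 2, 3} are hypothetical and written out by hand.

```python
from collections import deque


def leaves(t):
    return frozenset([t]) if not isinstance(t, tuple) else leaves(t[0]) | leaves(t[1])


def canon(t):
    return t if not isinstance(t, tuple) else frozenset((canon(t[0]), canon(t[1])))


def subtrees(t):
    yield t
    if isinstance(t, tuple):
        yield from subtrees(t[0])
        yield from subtrees(t[1])


def prune(t, s):
    a, b = t
    if leaves(a) == s:
        return b
    if leaves(b) == s:
        return a
    return (prune(a, s), b) if s <= leaves(a) else (a, prune(b, s))


def regraft(t, sub, target):
    if leaves(t) == target:
        return (t, sub)
    a, b = t
    return (regraft(a, sub, target), b) if target <= leaves(a) else (a, regraft(b, sub, target))


def rspr_distance(t1, t2):
    """Exhaustive search over single rSPR moves; tiny inputs only."""
    goal, seen, queue = canon(t2), {canon(t1)}, deque([(t1, 0)])
    while queue:
        t, d = queue.popleft()
        if canon(t) == goal:
            return d
        for sub in list(subtrees(t))[1:]:      # skip t itself (edge at rho)
            rest = prune(t, leaves(sub))
            for target in subtrees(rest):
                n = regraft(rest, sub, leaves(target))
                if canon(n) not in seen:
                    seen.add(canon(n))
                    queue.append((n, d + 1))
    return None


# Hypothetical pair: {1, 2, 3} is a common cluster with the same parent
# cluster {1, 2, 3, 4} in both trees and the common sibling leaf 4.
T, T_prime = ((((1, 2), 3), 4), 5), ((((1, 3), 2), 4), 5)
T_1, T_1_prime = ((1, 2), 3), ((1, 3), 2)              # restrictions to {1, 2, 3}
T_rho, T_rho_prime = (("a1", 4), 5), (("a1", 4), 5)    # cluster replaced by leaf a1

whole = rspr_distance(T, T_prime)
parts = rspr_distance(T_1, T_1_prime) + rspr_distance(T_rho, T_rho_prime)
print(whole, parts)  # 1 1
```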
Conclusion
In this paper, we have shown that Hill et al's conjecture [1] and the underlying divide-and-conquer approach cannot be used to calculate the rSPR distance between two phylogenies exactly. To provide some intuition why this conjecture fails, consider the following. Let $({\mathcal{T}}_{1},{{\mathcal{T}}^{\prime}}_{1}),\dots ,({\mathcal{T}}_{t},{{\mathcal{T}}^{\prime}}_{t}),({\mathcal{T}}_{\rho},{{\mathcal{T}}^{\prime}}_{\rho})$ be a cluster sequence with respect to Θ = solvable for two rooted binary phylogenetic trees $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$. Calculating a maximum-agreement forest for each tree pair (${\mathcal{T}}_{i},{{\mathcal{T}}^{\prime}}_{i}$), taking their union, and, for each i ∈ {1, 2, ..., t}, joining the element containing ${a}_{i}$ with the element containing ${\rho}_{i}$ can potentially result in a set, say $\mathcal{G}$, which contains an element that is a subset of $\{{a}_{1},{a}_{2},\dots ,{a}_{t},{\rho}_{1},{\rho}_{2},\dots ,{\rho}_{t}\}$. In the case of our counterexample, $\mathcal{G}$ contains one such element, namely $\{{a}_{1},{a}_{2},{\rho}_{1},{\rho}_{2}\}$. Trivially, this element is not part of any agreement forest for $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$, while $\mathcal{G}\setminus \{\{{a}_{1},{a}_{2},{\rho}_{1},{\rho}_{2}\}\}$ is precisely a maximum-agreement forest for $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$. Consequently, a divide-and-conquer approach that exactly calculates ${d}_{\text{rSPR}}(\mathcal{T},{\mathcal{T}}^{\prime})$ needs to take into account the number of elements in $\mathcal{G}$ that are subsets of $\{{a}_{1},{a}_{2},\dots ,{a}_{t},{\rho}_{1},{\rho}_{2},\dots ,{\rho}_{t}\}$; otherwise, the result may be an overestimate of the exact solution. Alternatively, one can approach the problem by finding a strategy which guarantees that no element in $\mathcal{G}$ is a subset of $\{{a}_{1},{a}_{2},\dots ,{a}_{t},{\rho}_{1},{\rho}_{2},\dots ,{\rho}_{t}\}$. This is the underlying idea of Theorem 4, which uses a slightly more restricted version of Hill et al's conjecture and gives the desired outcome. Hence, decomposing $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ into a cluster sequence with respect to Θ = subtree-like can be used to speed up the exact calculation of ${d}_{\text{rSPR}}(\mathcal{T},{\mathcal{T}}^{\prime})$.
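The bookkeeping described in this paragraph can be illustrated with a short Python sketch. The forests below are invented label sets chosen only to show the mechanics; they are not the forests of Figure 3 and are not claimed to be maximum-agreement forests of any particular tree pair. The labels `"a1"`, `"rho1"` and the function names are likewise assumptions of this example.

```python
def glue(forests, t):
    """Union the forests, then join the a_i element with the rho_i element.

    forests: F_1, ..., F_t, F_rho, each given as a list of label sets.
    """
    g = [set(block) for forest in forests for block in forest]
    for i in range(1, t + 1):
        with_a = next(b for b in g if f"a{i}" in b)
        with_rho = next(b for b in g if f"rho{i}" in b)
        if with_a is not with_rho:
            g.remove(with_rho)
            with_a |= with_rho  # merge in place
    return g


def dummy_only(g, t):
    """Elements of g that consist solely of labels a_i and rho_i."""
    dummies = {f"a{i}" for i in range(1, t + 1)} | {f"rho{i}" for i in range(1, t + 1)}
    return [b for b in g if b <= dummies]


# Invented forests for a cluster sequence with t = 2.
F1 = [{"rho1"}, {1, 2, 3, 4}, {5, 6}]
F2 = [{"rho2"}, {7, 8, 9, 10}, {11, 12}]
F_rho = [{"rho"}, {"a1", "a2"}]

G = glue([F1, F2, F_rho], t=2)
naive_sum = sum(len(f) - 1 for f in (F1, F2, F_rho))  # sum of per-pair distances
corrected = len(G) - len(dummy_only(G, 2)) - 1        # discard dummy-only elements
print(naive_sum, corrected)  # 5 4
```

In this invented instance the element {a1, a2, rho1, rho2} contains no label from X, so counting it inflates the naive sum by one.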
However, for practical problem instances, it may be unlikely to find many subtree-like clusters. For example, the two phylogenies shown in Figure 1 do not have any common subtree-like cluster. This is due to the restrictive definition of such a cluster, which requires that a vertex whose set of descendants is a common cluster of two rooted binary phylogenetic X-trees $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ has the same parent vertex as a common subtree of $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$. To lessen this problem, an alternative approach, recently published by Linz and Semple [6], can be applied. That paper describes a more general divide-and-conquer approach that exactly computes the rSPR distance between $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ when a cluster sequence $({\mathcal{T}}_{1},{{\mathcal{T}}^{\prime}}_{1}),\dots ,({\mathcal{T}}_{t},{{\mathcal{T}}^{\prime}}_{t}),({\mathcal{T}}_{\rho},{{\mathcal{T}}^{\prime}}_{\rho})$ with respect to Θ = minimal for $\mathcal{T}$ and ${\mathcal{T}}^{\prime}$ is given. Loosely speaking, the authors calculate a so-called minimum-weight partition $\mathcal{G}$ of $X\cup \{\rho \}\cup \{{a}_{1},{a}_{2},\dots ,{a}_{t},{\rho}_{1},{\rho}_{2},\dots ,{\rho}_{t}\}$ such that $\mathcal{G}$ contains an agreement forest (not necessarily a maximum-agreement forest) for each tree pair (${\mathcal{T}}_{i},{{\mathcal{T}}^{\prime}}_{i}$). To compute $\mathcal{G}$, it has been shown that applying a 'bottom-up' approach, which works locally on subtrees of each tree pair (${\mathcal{T}}_{i},{{\mathcal{T}}^{\prime}}_{i}$), guarantees that the number of elements in $\mathcal{G}$ that are subsets of $\{{a}_{1},{a}_{2},\dots ,{a}_{t},{\rho}_{1},{\rho}_{2},\dots ,{\rho}_{t}\}$ is maximized while $|\mathcal{G}|$ is minimized.
References
 1. Hill T, Nordström KJV, Thollesson M, Säfström TM, Vernersson AKE, Fredriksson R, Schiöth HB: SPRIT: Identifying horizontal gene transfer in rooted phylogenetic trees. BMC Evol Biol. 2010, 10: 42. 10.1186/1471-2148-10-42.
 2. Hein J, Jing T, Wang L, Zhang K: On the complexity of comparing evolutionary trees. Discrete Appl Math. 1996, 71: 153-169. 10.1016/S0166-218X(96)00062-5.
 3. Baroni M, Grünewald S, Moulton V, Semple C: Bounding the number of hybridization events for a consistent evolutionary history. J Math Biol. 2005, 51: 171-182. 10.1007/s00285-005-0315-9.
 4. Semple C, Steel M: Phylogenetics. 2003, Oxford University Press.
 5. Bordewich M, Semple C: On the computational complexity of the rooted subtree prune and regraft distance. Ann Comb. 2004, 8: 409-423. 10.1007/s00026-004-0229-z.
 6. Linz S, Semple C: A cluster reduction for computing the subtree distance between phylogenies. Ann Comb
Acknowledgements
I thank Maria Luisa Bonet, Mareike Fischer, and Charles Semple for useful discussions and comments on an earlier version of this paper. Financial support from MEC (TIN200768005C0403) is gratefully acknowledged.
Response
By Helgi B Schiöth
E-mail: helgis@bmc.uu.se
Address: Department of Neuroscience, Functional Pharmacology, Uppsala University, BMC, Box 593, 751 24, Uppsala, Sweden
"We have found that the manuscript by Linz is correct and to the point. We have therefore updated the SPRIT software and published the new version online.
The new version supports both the old incorrect conjecture as well as the new correct one to allow for comparisons to be made."
Keywords
 Horizontal Gene Transfer
 Cluster Sequence
 Tree Pair
 Pendant Edge
 Solvable Cluster