Open Access

A tensorial approach to the inversion of group-based phylogenetic models

BMC Evolutionary Biology 2014, 14:236

https://doi.org/10.1186/s12862-014-0236-6

Received: 15 April 2014

Accepted: 6 November 2014

Published: 4 December 2014

Abstract

Background

Hadamard conjugation is part of the standard mathematical armoury in the analysis of molecular phylogenetic methods. For group-based models, the approach provides a one-to-one correspondence between the so-called “edge length” and “sequence” spectrum on a phylogenetic tree. The Hadamard conjugation has been used in diverse phylogenetic applications not only for inference but also as an important conceptual tool for thinking about molecular data leading to generalizations beyond strictly tree-like evolutionary modelling.

Results

For general group-based models of phylogenetic branching processes, we reformulate the problem of constructing a one-one correspondence between pattern probabilities and edge parameters. This takes a classic result previously shown through use of Fourier analysis and presents it in the language of tensors and group representation theory. This derivation makes it clear why the inversion is possible, because, under their usual definition, group-based models are defined for abelian groups only.

Conclusion

We provide an inversion of group-based phylogenetic models that can be implemented using matrix multiplication between rectangular matrices indexed by ordered partitions of varying sizes. Our approach provides additional context for the construction of phylogenetic probability distributions on network structures, and highlights the potential limitations of restricting to group-based models in this setting.

Keywords

Groups; Representation theory; Symmetry; Markov chains

Background

Fundamental to evolutionary biology is the development and implementation of molecular phylogenetic methods [1]. These methods provide the means to reconstruct the past evolutionary history of biological entities given present-day molecular data, such as DNA. Considering Kimura’s neutral theory of molecular evolution, it is logical to apply a stochastic model at the level of DNA substitutions to construct a probabilistic description of what molecular alignments are expected to be observed, given a proposed evolutionary history (tree topology and edge lengths). This is commonly implemented assuming an IID (across sites in the alignment) Markov process for DNA substitution, leading to a model that has a continuous-time Markov chain at its core (see Semple and Steel [2] for an introduction to the mathematics underlying modern phylogenetic methodology).

In a series of papers, Hendy and colleagues introduced the Hadamard conjugation as a novel tool for phylogenetic analyses [3]–[5]. They found an invertible relationship between a phylogenetic tree, as characterized by its edge length spectrum, and the probability distribution of site patterns (referred to as the sequence spectrum). Originally introduced only for the 2-state symmetric model, the Hadamard conjugation was later extended to the K3ST model [6]–[8] and further to any of the so-called “group-based” models [9]. Hadamard conjugation has been used as both a tool for simulation [10] and to look at statistical properties of methods, exploring the inconsistency of parsimony under a molecular clock [5],[11]. For these sorts of applications, following the notation in Felsenstein [1], we can use the Hadamard transform $H$ to start with an edge length spectrum $\gamma$ and calculate the sequence spectrum $s = H^{-1}\exp(H\gamma)$. The beauty of Hadamard conjugations is that one can also begin with an observed sequence spectrum $\hat{s}$ and perform the inverse of the conjugation to empirically obtain an edge length spectrum $\hat{\gamma} = H^{-1}\ln(H\hat{s})$. Although it is not expected that the $\hat{\gamma}$ spectrum will precisely match a tree, Hendy [12] proposed using an optimisation criterion to map from $\hat{\gamma}$ to the “closest tree”.
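As a concrete illustration of this round trip, the sketch below applies the conjugation and its inverse for the binary-symmetric model on n = 3 taxa; the edge length spectrum g is an arbitrary illustrative choice (its first entry is minus the sum of the rest, so that the resulting sequence spectrum sums to one), not taken from data:

```python
import numpy as np

# Hadamard conjugation round trip for the binary-symmetric model on n = 3
# taxa. The edge length spectrum g below is an arbitrary illustrative choice.
n = 3
h = np.array([[1.0, 1.0], [1.0, -1.0]])
H = h
for _ in range(n - 1):
    H = np.kron(H, h)          # H is the 2^n x 2^n Hadamard matrix
Hinv = H / 2**n                # since H^2 = 2^n * I

g = np.array([-1.5, 0.3, 0.4, 0.1, 0.2, 0.25, 0.15, 0.1])
s = Hinv @ np.exp(H @ g)       # sequence spectrum from edge length spectrum
g_hat = Hinv @ np.log(H @ s)   # inverse conjugation recovers the spectrum
assert np.allclose(g, g_hat)
```

The round trip is exact because $Hs = \exp(H\gamma)$ holds componentwise, so the logarithm undoes the exponential before the final multiplication by $H^{-1}$.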

Several authors have commented that it is potentially a useful feature of Hadamard conjugation that the data are not forced onto a fixed tree. The conflicting information can be retained and interpreted in the form of a “lentoplot” [13] or a splits-graph [14], with both of these methods implemented in Spectronet [15]. Schliep [16] gives some more statistical justification for such an approach by making a link to modern statistical techniques such as the Lasso and Ridge regression.

von Haeseler and Churchill [17] seems to be the first paper that explicitly suggests using Hadamard conjugation to provide a likelihood framework for networks. The principal idea in this work was to start with an edge length spectrum that encodes a set of incompatible splits, use the Hadamard transformation to get site probabilities, and use these to determine a likelihood. This idea was further explored by Bryant [18], and Bryant [19] followed this through by defining the “n-taxon process” for group-based models. It should be noted that likelihoods calculated via Hadamard are not equivalent to likelihoods calculated by taking a mixture of trees. Indeed, Matsen and Steel [20], Matsen et al. [21] used Hadamard methods in combination with phylogenetic invariants to show that mixtures of trees with the same topology can exactly mimic another tree under the 2-state model. Considering biological applications, thinking in terms of mixtures of trees or partitions where the data can be thought of as arising on a set of trees [22]–[24] seems more reasonable than the Hadamard conjugation. Strimmer and Moulton [25] suggested using split networks as a spring board to likelihood-based analyses on DAGs, but later identified several problems with the approach [26]; most notably, in split-networks internal nodes do not have a biological interpretation as an ancestor.

In Sumner et al. [27], we gave some additional insight into the interpretation of applying the Hadamard conjugation in a network setting. We showed that the permutation group structure inherent in the Hadamard transformation – as for any group-based model – restricts the resulting process from being capable of reproducing truly convergent processes. This is a serious limitation, as one of the biological motivations for explicit network models is the ability to model convergent processes. We also presented an alternative algebraic formalism for the general Markov model, analogous to the n-taxon process, but capable of reproducing convergent processes.

From the point of view of group representation theory, the inversion of group-based models relies on the fact that the irreducible representations of an abelian group are one-dimensional, and the model structure essentially reduces to analysing group characters – hence the standard presentation of a Fourier inversion. In this article, we make this connection concrete. For the general Markov model, it is then immediately apparent that an analogous inversion is not possible because the algebraic structure underlying the model is not abelian and hence the irreducible representations are not one-dimensional. In fact, to obtain one-dimensional representations for the general Markov model, it is necessary to apply higher-degree polynomial maps (beyond the degree 1, linear case), and define “Markov invariants” [28]. These invariants present one-dimensional representations but at the cost of the higher degree – degree 5 in the case of the general Markov model with four states on quartet trees [29],[30]. This connection between Hadamard transformation and Markov invariants is an interesting one, but we do not discuss it further here.

In this paper we approach the inversion of group-based phylogenetic models by taking a representation-theoretic perspective and working explicitly with tensor indices. Our approach rests heavily on the formalism of “phylogenetic tensors”, as presented in Bashford et al. [31], for the binary-symmetric and K3ST model, and Sumner et al. [27],[28], for the general Markov model.

Although the main inversion results presented here are not more general than those in Székely et al. [7], we think it is important to reformulate them using the language of tensors and representation theory. This viewpoint has already led to new approaches for modeling convergent evolution [27] and for studying non-group-based models [28]. However, in none of our previous work was the link to Hadamard conjugation explicitly discussed. By presenting an old technique (Hadamard conjugation) in a new light we hope to introduce other researchers to the viewpoint of tensor analysis and representation theory.

Methods

Group-based models

We consider the continuous-time formulation of Markov processes, and show how to implement the inversion of a group-based phylogenetic model based on any abelian group G. We note that such an inversion requires a map from tensor product space (where elements are indexed by ordered-n-partitions) to phylogenetic splits (where elements are indexed by bipartitions). We achieve this by finding canonical maps from bipartitions to ordered-n-partitions.

For a group G (not necessarily abelian) with order |G|=d, we write $G = \{\sigma_1,\sigma_2,\ldots,\sigma_d\}$, and, when necessary, write $\varepsilon\in G$ to specify the identity element of G. We will discuss the “regular representation” of G shortly, but skipping ahead we find that any rate matrix Q occurring in a group-based Markov model can be written in the form

$$Q = -\lambda\mathbf{1} + \sum_{\varepsilon\neq\sigma\in G}\alpha_\sigma K_\sigma,$$

(1)

where each $\alpha_\sigma\geq 0$, $\lambda = \sum_{\varepsilon\neq\sigma\in G}\alpha_\sigma$, and the $K_\sigma$ are the permutation matrices corresponding to the (non-identity) group elements $\sigma\in G$.

For the reader interested in deriving this result, consider the d-dimensional vector space $\mathbb{C}G \equiv \langle\sigma_1,\sigma_2,\ldots,\sigma_d\rangle = \{v = v_1\sigma_1 + v_2\sigma_2 + \cdots + v_d\sigma_d : v_i\in\mathbb{C}\}$, with scalar multiplication and vector addition defined via

$$v + \lambda v' = (v_1\sigma_1 + v_2\sigma_2 + \cdots + v_d\sigma_d) + \lambda(v'_1\sigma_1 + v'_2\sigma_2 + \cdots + v'_d\sigma_d) = (v_1+\lambda v'_1)\sigma_1 + (v_2+\lambda v'_2)\sigma_2 + \cdots + (v_d+\lambda v'_d)\sigma_d,$$

for all $v,v'\in\mathbb{C}G$ and $\lambda\in\mathbb{C}$. The regular representation, $\rho_{\text{reg}}:G\to GL(d,\mathbb{C})$, is then defined by setting the group action

$$\sigma : v \mapsto \sigma v = v_1(\sigma\sigma_1) + v_2(\sigma\sigma_2) + \cdots + v_d(\sigma\sigma_d),$$

for all $v\in\mathbb{C}G$ and $\sigma\in G$. If we fix $\{\sigma_1,\sigma_2,\ldots,\sigma_d\}$ as an ordered basis for $\mathbb{C}G$, it is then clear – via Cayley’s theorem – that each group element σ gets mapped to a permutation matrix $K_\sigma := \rho_{\text{reg}}(\sigma)$, with $K_\sigma\sigma_i = \sum_j [K_\sigma]_{ij}\sigma_j := \sigma\sigma_i$. Thus $K_\sigma$ has matrix elements

$$[K_\sigma]_{ij} = \begin{cases}1, & \text{if } \sigma_j = \sigma\sigma_i,\\ 0, & \text{otherwise.}\end{cases}$$
(2)
Consider the unit column vectors

$$\xi_1 = (1,0,0,\ldots,0)^T,\quad \xi_2 = (0,1,0,0,\ldots,0)^T,\quad \ldots,\quad \xi_d = (0,0,\ldots,0,1)^T;$$

and identify each $\sigma_i\in G$ with $\xi_i\in\mathbb{C}^d$, so that the group action becomes $\sigma:\xi_i\mapsto K_\sigma\xi_i = \xi_j$ where $\sigma_j = \sigma\sigma_i$. Thus the matrix elements $[K_\sigma]_{ij}$ have i as the column label and j as the row label.

A group-based Markov model is then obtained by taking a continuous-time Markov chain with state space $G = \{\sigma_1,\sigma_2,\ldots,\sigma_d\}$ and using the group multiplication in G to assign a rate $\alpha_\sigma$ to all substitutions $\sigma_1\to\sigma_2$ where $\sigma\sigma_1 = \sigma_2$. Following this through (as is done in detail in [32]) we are led to the formula (1) for rate matrices in any group-based model.
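The construction above can be checked in a few lines; the following sketch (with arbitrarily chosen rates, used purely for illustration) builds the regular representation of $\mathbb{Z}_3$ from the rule $K_\sigma\sigma_i = \sigma\sigma_i$ and confirms that formula (1) produces a matrix with zero column sums:

```python
import numpy as np

# Regular representation of Z_3: column i of K(s) carries sigma_i to
# sigma * sigma_i, which for Z_r is addition mod r. The rates alpha and
# beta below are arbitrary illustrative values.
r = 3

def K(s):
    M = np.zeros((r, r))
    for i in range(r):
        M[(s + i) % r, i] = 1.0
    return M

alpha, beta = 0.3, 0.2
Q = -(alpha + beta) * np.eye(r) + alpha * K(1) + beta * K(2)
assert np.allclose(Q.sum(axis=0), 0.0)      # zero column sums: a rate matrix
assert np.allclose(K(1) @ K(2), np.eye(r))  # sigma * sigma^2 = identity
```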

The regular representation is one example of the general concept of a representation of G on a vector space V, defined as a homomorphism $\rho:G\to GL(V)$ satisfying $\rho(g_1g_2) = \rho(g_1)\rho(g_2)$ for all $g_1,g_2\in G$. A representation is said to be reducible if there exists a proper, nonzero subspace $U\subset V$ satisfying $\rho(g)U\subseteq U$, i.e. the set of matrices ρ(G) send vectors in U back to U. In this case, U is called an invariant subspace. The representation ρ is then called irreducible if V does not contain any such invariant subspace.

The reader should note that the usual construction of a “group-based” model [2] stipulates that G be abelian. Although the construction just given using the regular representation allows for non-abelian G, we will nonetheless only consider the abelian case in this paper, because, as discussed in the introduction, it is only in the abelian case that a (linear) inversion of phylogenetic models is possible. In this case the irreducible representations of G are all one-dimensional [33], and hence the analysis reduces to computations with group characters, as is exploited in the previous approaches using Fourier analysis [9],[34].

Phylogenetic tensors

We denote $[d] := \{1,2,\ldots,d\}$ as the state space for a continuous-time Markov chain. Consider an n-taxa phylogenetic tree and a d-state phylogenetic pattern distribution $\{p_{i_1i_2\cdots i_n}\}_{i_1,i_2,\ldots,i_n\in[d]}$ with the interpretation that $p_{i_1i_2\cdots i_n}$ is the probability that the observed state at the $k^{th}$ leaf on the tree is $i_k$. As is shown in Sumner and Jarvis [35] and in more detail in Sumner et al. [27], such phylogenetic pattern distributions can be represented abstractly as tensors in the n-fold tensor product space $\otimes^n\mathbb{C}^d := \mathbb{C}^d\otimes\mathbb{C}^d\otimes\cdots\otimes\mathbb{C}^d$, as follows. If we choose $\{\xi_1,\xi_2,\ldots,\xi_d\}$ as an ordered basis for $\mathbb{C}^d$, and ordered basis $\{\xi_{i_1}\otimes\xi_{i_2}\otimes\cdots\otimes\xi_{i_n}\}_{i_1,i_2,\ldots,i_n\in[d]}$ for the tensor product space, a “phylogenetic tensor” $P\in\otimes^n\mathbb{C}^d$ is then defined as

$$P = \sum_{i_1,i_2,\ldots,i_n\in[d]} p_{i_1i_2\cdots i_n}\,\xi_{i_1}\otimes\xi_{i_2}\otimes\cdots\otimes\xi_{i_n}.$$

For readers who are unfamiliar with tensor products, it is possible to understand the general concept via the definition of the “Kronecker” product of an $n\times m$ matrix A and an $n'\times m'$ matrix B as the $nn'\times mm'$ matrix given by

$$A\otimes B = \begin{pmatrix} A_{11}B & A_{12}B & \cdots & A_{1m}B\\ A_{21}B & A_{22}B & \cdots & A_{2m}B\\ \vdots & \vdots & & \vdots\\ A_{n1}B & A_{n2}B & \cdots & A_{nm}B \end{pmatrix}.$$

We can index the matrix $A\otimes B$ with row indices $i_1j_1 = 11,12,\ldots,nn'$ and column indices $i_2j_2 = 11,12,\ldots,mm'$, i.e. generically $(A\otimes B)_{i_1j_1,i_2j_2} = A_{i_1i_2}B_{j_1j_2}$ and specifically $(A\otimes B)_{12,32} = A_{13}B_{22}$. This point of view is useful if one wants to write out specific matrix representations of tensors; however, the development that follows will focus heavily on the indexing of tensor components in the various cases discussed.
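This indexing convention can be confirmed against numpy's kron; note the sketch below uses 0-based indices, whereas the text counts from 1:

```python
import numpy as np

# Check (A x B)_{i1 j1, i2 j2} = A_{i1 i2} B_{j1 j2} against numpy's kron.
# Indices are 0-based here, whereas the text counts from 1.
rng = np.random.default_rng(0)
A = rng.random((2, 3))    # n x m
B = rng.random((4, 5))    # n' x m'
AB = np.kron(A, B)        # (n n') x (m m')
i1, j1, i2, j2 = 1, 3, 2, 1
assert np.isclose(AB[i1 * 4 + j1, i2 * 5 + j2], A[i1, i2] * B[j1, j2])
```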

Suppose $\pi = \sum_{i\in[d]}\pi_i\xi_i\in\mathbb{C}^d$ represents the state distribution of a single taxon, i.e. $\pi_i$ is the probability that a randomly chosen site in the sequence will be in state i. Now suppose a phylogenetic branching event occurs and the sequence is copied. The corresponding phylogenetic tensor $P = \sum_{i_1,i_2\in[d]}p_{i_1i_2}\,\xi_{i_1}\otimes\xi_{i_2}$ representing the joint distribution of the two taxa just after the branching event then has the property that $p_{i_1i_2} = \pi_{i_1}$ if $i_2 = i_1$ and is zero otherwise. Thinking in terms of tensor operations, we find that phylogenetic branching events can be generated by a linear operator $\delta:\mathbb{C}^d\to\mathbb{C}^d\otimes\mathbb{C}^d$ determined by δ(π)=P and defined in general using our chosen basis as

$$\delta(\xi_i) := \xi_i\otimes\xi_i,\qquad \delta(\pi) = \delta\Big(\sum_i\pi_i\xi_i\Big) = \sum_i\pi_i\,\delta(\xi_i) = \sum_i\pi_i\,\xi_i\otimes\xi_i.$$
The remarkable fact for group-based models, central to the present article, is that the permutation matrices “intertwine” particularly simply with the branching operator:
$$\delta(K_\sigma\xi_i) = \delta(\xi_{\sigma(i)}) = \xi_{\sigma(i)}\otimes\xi_{\sigma(i)} = (K_\sigma\otimes K_\sigma)\cdot\delta(\xi_i).$$
Thus, for any rate matrix Q arising from a group-based model, we have (via the linearity of δ):
$$\delta\cdot Q = \Big(-\lambda\,\mathbf{1}\otimes\mathbf{1} + \sum_{\varepsilon\neq\sigma\in G}\alpha_\sigma\,K_\sigma\otimes K_\sigma\Big)\cdot\delta.$$
(3)
We also note that, since Q can be expressed as a linear combination of permutation matrices representing elements in a group G, the matrix powers $Q^2,Q^3,Q^4,\ldots$ will also be expressible as linear combinations of the same permutation matrices (although precise expressions for the relevant coefficients may or may not be easily computable). Together with (3), this implies that, for any substitution matrix $e^{Qt}$ arising from matrix exponentiation,

$$\delta\cdot e^{Qt} = e^{-\lambda t}\exp\Big(t\sum_{\varepsilon\neq\sigma\in G}\alpha_\sigma\,K_\sigma\otimes K_\sigma\Big)\cdot\delta.$$
(4)
This relation shows that mathematically, and hence conceptually, “Markov evolution on a single taxon followed by a branching event” can be replaced with “Branching event on a single taxon followed by (correlated) Markov evolution of two taxa.” This equivalence is illustrated in Figure 1, and should be compared to the equivalent discussion of the “n-taxa process” given in [18] and [19].
Figure 1

Markov evolution on a single taxon followed by a branching event (illustrated on the left), is equivalent to a branching event on a single taxon followed by correlated Markov evolution of two taxa (illustrated on the right). Mathematically, this equivalence can be implemented by exploiting the equality given in (4).
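The equality (4) can also be verified numerically; the sketch below does so for the binary-symmetric model, with an assumed rate a and time t chosen arbitrarily:

```python
import numpy as np

# Numerical check of the intertwining relation (4) for the binary-symmetric
# model, Q = a(-1 + K); the rate a and time t are arbitrary assumed values.
def expm_sym(M):
    # matrix exponential of a symmetric matrix via eigendecomposition
    w, V = np.linalg.eigh(M)
    return (V * np.exp(w)) @ V.T

K = np.array([[0.0, 1.0], [1.0, 0.0]])
a, t = 0.7, 1.3
Q = a * (-np.eye(2) + K)
delta = np.zeros((4, 2))
delta[0, 0] = delta[3, 1] = 1.0     # delta(xi_i) = xi_i (x) xi_i
lhs = delta @ expm_sym(Q * t)       # evolve one taxon, then branch
rhs = np.exp(-a * t) * expm_sym(t * a * np.kron(K, K)) @ delta
assert np.allclose(lhs, rhs)        # branch first, then evolve jointly
```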

In Sumner et al. [27] we showed how to generalise this intertwining action to the case of the general Markov model. Interestingly, for the general Markov model the appropriate intertwining has quite a different structure from what occurs in group-based models, and hence the simplicity of (4) is somewhat misleading in general. We refer the reader to Sumner et al. [27] for more discussion on this point.

Returning to the case of group-based models, for each subset $A\subseteq[n]$, we define a linear map on $\otimes^n\mathbb{C}^d$ as the tensor product $K_\sigma(A) := K_\sigma^{a_1}\otimes K_\sigma^{a_2}\otimes\cdots\otimes K_\sigma^{a_n}$, where $a_i = 1$ if $i\in A$ and $a_i = 0$ otherwise. For example, if n=5, we have

$$K_\sigma(\{1,2,4\}) = K_\sigma\otimes K_\sigma\otimes\mathbf{1}\otimes K_\sigma\otimes\mathbf{1}.$$
To develop a phylogenetic tensor on a tree, we root the phylogenetic tree at taxon n, and label edges by subsets $e\subseteq[n-1]$, where $i\in e$ if the path from taxon n to taxon i crosses the edge labelled by e. A six taxon tree with this labelling is presented in Figure 2. To each edge labelled by $e\subseteq[n-1]$, we assign the rate matrix

$$Q_e := -\lambda_e\mathbf{1} + \sum_{\varepsilon\neq\sigma\in G}\alpha_{e,\sigma}K_\sigma,$$
Figure 2

A six taxa tree rooted at taxon 6 with edges labelled by subsets of {1,2,3,4,5}.

where each $\alpha_{e,\sigma}\geq 0$ is the rate of substitution for all states $\sigma_1$ to $\sigma_2$ satisfying $\sigma = \sigma_2\sigma_1^{-1}$, and $\lambda_e = \sum_{\varepsilon\neq\sigma\in G}\alpha_{e,\sigma}$. Each edge is then assigned the substitution matrix $M_e = e^{Q_e}$, so that the time parameter for each edge is absorbed into the definition of $Q_e$.

Now iterating (4) multiple times, Bashford et al. [31] and Sumner et al. [27] show that any phylogenetic tensor can be written as

$$P = e^{-\lambda}\exp\Big(\sum_{e\subseteq[n-1],\;\varepsilon\neq\sigma\in G}\alpha_{e,\sigma}K_\sigma(e)\Big)\cdot\delta^{n-1}\pi,$$
(5)

where $\lambda = \sum_{e\subseteq[n-1]}\lambda_e = \sum_{e\subseteq[n-1],\;\varepsilon\neq\sigma\in G}\alpha_{e,\sigma}$, and $\delta^{n-1}\pi$ is the $d\times d\times\cdots\times d$ tensor that represents the “zero edge-length star tree” distribution on n taxa. It is this form of phylogenetic tensors that will do a lot of the heavy lifting in the discussion that follows. The reader should note that under this representation, there is no need for the edge parameters $\{\alpha_{e,\sigma} : e\subseteq[n-1],\,\sigma\in G\}$ to be chosen to be compatible with a particular tree; hence the possibilities for generalising to non-tree-like or network models, as discussed in the introduction.

The stationary distribution for group-based models is uniform (because the rate matrices are doubly stochastic). In this paper we always assume a stationary distribution, so that:

$$\pi = \tfrac{1}{d}(1,1,\ldots,1)^T,$$

and $\delta^{n-1}\pi$ has tensor components

$$(\delta^{n-1}\pi)_{i_1i_2\cdots i_n} = \begin{cases}\tfrac{1}{d}, & \text{if } i_1 = i_2 = \cdots = i_n,\\ 0, & \text{otherwise.}\end{cases}$$

This concludes our discussion of the tensor presentation of phylogenetic probability distributions under group-based models. It is important to note that everything discussed so far works for any group-based model, with no requirement that the underlying group G be abelian.

In what follows, we discuss the inversion of abelian group-based models. We present the simplest case with $G = \mathbb{Z}_2$; the $G = \mathbb{Z}_3$ case; the $G = \mathbb{Z}_2\times\mathbb{Z}_2$ case; the general $G = \mathbb{Z}_r$ case; and finally we discuss the case of any abelian group.

Results

The binary-symmetric case

We begin with the inversion of the so-called “binary-symmetric” model. Consider $\mathbb{C}^2$ with standard basis

$$\xi_0 = \begin{pmatrix}1\\0\end{pmatrix},\qquad \xi_1 = \begin{pmatrix}0\\1\end{pmatrix}.$$
As a group-based model, the binary-symmetric model arises by taking the group

$$G := \mathbb{Z}_2 = \{0,1\}\ \text{under}\ +\ (\mathrm{mod}\ 2)\;\cong\;\langle\sigma\,|\,\sigma^2 = \varepsilon\rangle,$$

with a generic rate matrix given by

$$Q = \begin{pmatrix}-1 & 1\\ 1 & -1\end{pmatrix} = -\mathbf{1} + K,$$

where $K = \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}$ is the permutation matrix representing σ in the standard basis.

Now $\rho_{\text{reg}}:\mathbb{Z}_2\to M_2(\mathbb{C})$, with $\sigma\mapsto K$, is the regular representation of $\mathbb{Z}_2$, and the character table of $\mathbb{Z}_2$ given in Table 1 is easily recognised to be the Hadamard matrix

$$h = \begin{pmatrix}1 & 1\\ 1 & -1\end{pmatrix}.$$
Table 1

The character table of $\mathbb{Z}_2$

         id    sgn
[ε]       1      1
[σ]       1     -1

As $\mathbb{Z}_2$ is an abelian group, the irreducible representations are one-dimensional.

The corresponding projection operators can be read off from the columns of the character table. That is, the operators
$$\Theta_{id} := \tfrac{1}{2}(\varepsilon + \sigma),\qquad \Theta_{sgn} := \tfrac{1}{2}(\varepsilon - \sigma);$$

project $\rho_{\text{reg}} = id\oplus sgn$ onto the id and sgn representations of $\mathbb{Z}_2$, respectively.

This observation prompts us to work in the alternative basis:
$$f_0 := 2\,\Theta_{id}\cdot\xi_0 = 2\,\Theta_{id}\cdot\xi_1 = h\,\xi_0 = \xi_0 + \xi_1,\qquad f_1 := 2\,\Theta_{sgn}\cdot\xi_0 = -2\,\Theta_{sgn}\cdot\xi_1 = h\,\xi_1 = \xi_0 - \xi_1.$$

In this basis the permutation matrix is diagonal:

$$\hat K := hKh^{-1} = \begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix},\qquad \hat Q := -\mathbf{1} + \hat K = \begin{pmatrix}0 & 0\\ 0 & -2\end{pmatrix}.$$

The representation-theoretic perspective on $\hat K$ is to observe that $id(\sigma) = 1$ and $sgn(\sigma) = -1$.

Referring to (5), we know that we can write a generic phylogenetic tensor as
$$P = e^{-\lambda}\exp\Big(\sum_{e\subseteq[n-1]}\alpha_e K(e)\Big)\cdot\delta^{n-1}\pi,$$

where $\lambda = \sum_{e\subseteq[n-1]}\alpha_e$.

We index matrix and tensor components using $i,j,k,\ldots\in\{0,1\}\equiv\mathbb{Z}_2$ and allow multiplication × in the ring of integers $\mathbb{Z}$. The Hadamard matrix then has matrix elements $[h]_{ij} = (-1)^{i\times j}$, where j is the row index and i is the column index. Observe that in the diagonal basis, the permutation matrix has elements

$$[\hat K]_{ij} = \delta_{ij}(-1)^i.$$
Thus we have expressions such as
$$[\hat K(\{2,3\})]^{i_1i_2i_3}_{j_1j_2j_3} = \delta_{i_1j_1}\delta_{i_2j_2}\delta_{i_3j_3}(-1)^{i_2+i_3},$$

where $\hat K(\{2,3\}) = \mathbf{1}\otimes\hat K\otimes\hat K$.

As we are dealing with tensors of arbitrary size, it is convenient to represent a string such as $i_1i_2\cdots i_n$ as an ordered bipartition $\mu = \mu_0{:}\mu_1$ of the set [n], where $\mu_0,\mu_1\subseteq[n]$ with $j\in\mu_k$ if and only if $i_j = k$. For example, we have the following equivalences:

$$00110 \leftrightarrow \{1,2,5\}{:}\{3,4\},\qquad 01111 \leftrightarrow \{1\}{:}\{2,3,4,5\},\qquad 10001 \leftrightarrow \{2,3,4\}{:}\{1,5\},$$

and inequivalence:

$$01010 \leftrightarrow \{1,3,5\}{:}\{2,4\} \neq \{2,4\}{:}\{1,3,5\} \leftrightarrow 10101.$$
We then have
$$[\hat K(e)]^{i_1i_2\cdots i_n}_{j_1j_2\cdots j_n} = [\hat K(e)]^{\mu}_{\nu} = [\hat K(e)]^{\mu_0:\mu_1}_{\nu_0:\nu_1} = \delta^{\mu_0}_{\nu_0}\delta^{\mu_1}_{\nu_1}(-1)^{|e\cap\mu_1|}.$$
Defining $h^{(n)} := h^{(n-1)}\otimes h$ where $h^{(1)} := h$, we work in the diagonal basis with $\hat P := h^{(n)}\cdot P$; in our notation $h^{(n)}$ has tensor components

$$[h^{(n)}]_{\mu\nu} = [h^{(n)}]_{\mu_0:\mu_1,\,\nu_0:\nu_1} = [h^{(n)}]_{i_1i_2\cdots i_n,\,j_1j_2\cdots j_n} = (-1)^{i_1\times j_1 + i_2\times j_2 + \cdots + i_n\times j_n} = (-1)^{|\mu_1\cap\nu_1|}.$$
The zero edge-length star-tree initial distribution has tensor components
$$(\delta^{n-1}\pi)_{i_1i_2\cdots i_n} = \tfrac{1}{2}\,\delta_{i_1i_2}\delta_{i_1i_3}\cdots\delta_{i_1i_n},$$
(where, although it seems we have given preference to taxon 1 in this expression, there are many ways that this distribution can be expressed using the $\delta_{ij}$). In the diagonal basis with $\delta^{n-1}\hat\pi := h^{(n)}\cdot\delta^{n-1}\pi$, we have components

$$(\delta^{n-1}\hat\pi)_{i_1i_2\cdots i_n} = \tfrac{1}{2}\sum_{j_1,j_2,\ldots,j_n}(-1)^{i_1\times j_1 + i_2\times j_2 + \cdots + i_n\times j_n}\,\delta_{j_1j_2}\delta_{j_1j_3}\cdots\delta_{j_1j_n} = \tfrac{1}{2}\sum_{j_1}(-1)^{(i_1+i_2+\cdots+i_n)\times j_1} = \tfrac{1}{2}\big(1 + (-1)^{i_1+i_2+\cdots+i_n}\big),$$

which is exactly the statement

$$(\delta^{n-1}\hat\pi)_{\mu} = (\delta^{n-1}\hat\pi)_{\mu_0:\mu_1} = \tfrac{1}{2}\big(1 + (-1)^{|\mu_1|}\big).$$
Since K ̂ is diagonal in the transformed basis, we can conclude that
$$\hat P_{\mu} = \hat P_{\mu_0:\mu_1} = e^{-\lambda}\exp\Big(\sum_{e\subseteq[n-1]}\alpha_e[\hat K(e)]^{\mu_0:\mu_1}_{\mu_0:\mu_1}\Big)\cdot\tfrac{1}{2}\big(1 + (-1)^{|\mu_1|}\big).$$

Of course many of these tensor components will be zero and we would like to ignore these.

Take $u = u_0{:}u_1$ as an ordered bipartition of the reduced set [n−1], so that $u\leftrightarrow i_1i_2\cdots i_{n-1}$ where $j\in u_k$ if and only if $i_j = k$, and define

$$\gamma(u) = \begin{cases}0, & \text{if } |u_1| \text{ is even},\\ 1, & \text{if } |u_1| \text{ is odd};\end{cases} \;=\; 2 - \big(0\cdot|u_0| + 1\cdot|u_1|\big)\ (\mathrm{mod}\ 2),$$

and interpret $u\cdot\gamma(u)$ as a string: $u\cdot\gamma(u) = i_1i_2\cdots i_{n-1}\gamma(u)$.

If we make the definitions
$$P_u := \hat P_{u\cdot\gamma(u)},\qquad \eta_u := \sum_{e\subseteq[n-1]}\alpha_e[\hat K(e)]^{u\cdot\gamma(u)}_{u\cdot\gamma(u)},$$
then we can write the non-zero components as
$$P_u = e^{-\lambda}\exp(\eta_u),$$

with inverse

$$\eta_u = \ln P_u + \lambda.$$
(6)

This is the first part of the inversion.

We would like to go further and actually recover the individual edge weights $\alpha_e$. To do this we define the (square) $2^{n-1}\times 2^{n-1}$ matrix F with components

$$[F]_{u,e} := [\hat K(e)]^{u\cdot\gamma(u)}_{u\cdot\gamma(u)} = (-1)^{|e\cap u_1|} = [h^{(n-1)}]_{u,e},$$

with e a subset and u an ordered bipartition of [n−1]. As $(h^{(n-1)})^2 = 2^{n-1}\mathbf{1}$, we see that F provides its own inverse $F^{-1}$, up to scaling, with components

$$[F^{-1}]_{e,u} := \tfrac{1}{2^{n-1}}[F]_{u,e}.$$
Defining the column vectors $\alpha = \{\alpha_e\}$ and $\eta = \{\eta_u\}$, we can write the matrix equations

$$\eta = F\alpha,\qquad \alpha = F^{-1}\eta.$$

Together with the first part of the inversion (6), these equations give a one-one map between pattern probabilities and edge weights for the binary-symmetric model.
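The two parts of the inversion can be exercised end-to-end; the following sketch builds P via (5) for n = 3 taxa with arbitrarily assumed edge weights $\alpha_e$ (the weight for e = ∅ is a dummy set to zero) and then recovers them:

```python
import numpy as np
from itertools import product

# End-to-end inversion for the binary-symmetric model on n = 3 taxa:
# build P from assumed edge weights alpha_e via (5), then recover them.
def expm_sym(M):
    w, V = np.linalg.eigh(M)
    return (V * np.exp(w)) @ V.T

n = 3
I2 = np.eye(2)
K = np.array([[0.0, 1.0], [1.0, 0.0]])
subsets = [(), (1,), (2,), (1, 2)]              # e subseteq [n-1]
alpha = {(): 0.0, (1,): 0.2, (2,): 0.35, (1, 2): 0.15}
lam = sum(alpha.values())

def K_of(e):
    # K(e): K on the legs in e, identity elsewhere (leg n = 3 is never in e)
    M = np.ones((1, 1))
    for i in (1, 2, 3):
        M = np.kron(M, K if i in e else I2)
    return M

delta_pi = np.zeros(8)
delta_pi[0] = delta_pi[7] = 0.5                 # patterns 000 and 111
P = np.exp(-lam) * expm_sym(sum(alpha[e] * K_of(e) for e in subsets)) @ delta_pi

# inversion: transform by h^(3), keep non-vanishing components, solve eta = F alpha
h = np.array([[1.0, 1.0], [1.0, -1.0]])
H3 = np.kron(np.kron(h, h), h)
P_hat = (H3 @ P).reshape(2, 2, 2)
us = list(product((0, 1), repeat=n - 1))        # ordered bipartitions u of [n-1]
eta = np.array([np.log(P_hat[u + ((u[0] + u[1]) % 2,)]) + lam for u in us])
F = np.array([[(-1.0) ** sum(u[i - 1] for i in e) for e in subsets] for u in us])
alpha_rec = np.linalg.solve(F, eta)
assert np.allclose(alpha_rec, [alpha[e] for e in subsets])
```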

Inversion of the $\mathbb{Z}_3$ model

Taking confidence from the previous case, we now discuss the inversion of the group-based phylogenetic model with $G = \mathbb{Z}_3$. We take

$$\mathbb{Z}_3 = \{0,1,2\}\ \text{under}\ +\ (\mathrm{mod}\ 3)\;\cong\;\langle\sigma\,|\,\sigma^3 = \varepsilon\rangle,$$

and, by analogy to the $\mathbb{Z}_2$ case, index tensors with indices i,j=0,1,2 and allow multiplication × by extending $\mathbb{Z}_3$ to the ring $\mathbb{F}_3 = \{0,1,2\}$ with operations +, × (mod 3).

In this case a generic rate matrix is given by
$$Q = \begin{pmatrix}-(\alpha+\beta) & \beta & \alpha\\ \alpha & -(\alpha+\beta) & \beta\\ \beta & \alpha & -(\alpha+\beta)\end{pmatrix} = -(\alpha+\beta)\mathbf{1} + \alpha K_1 + \beta K_2,$$

where

$$K_1 = \begin{pmatrix}0 & 0 & 1\\ 1 & 0 & 0\\ 0 & 1 & 0\end{pmatrix},\qquad K_2 = \begin{pmatrix}0 & 1 & 0\\ 0 & 0 & 1\\ 1 & 0 & 0\end{pmatrix},$$

are the matrices representing the permutations $\sigma\leftrightarrow(123)$ and $\sigma^2\leftrightarrow(132)$ under the regular representation, respectively.

We define $\omega = e^{2\pi i/3}$, and present the character table of $\mathbb{Z}_3$ in Table 2. The decomposition of the regular representation is $\rho_{\text{reg}} = id\oplus\omega\oplus\omega^2$, and the columns of the character table give the projection operators onto the (one-dimensional) irreducible subspaces:

$$\Theta_{id} := \tfrac{1}{3}\big(\varepsilon + \sigma + \sigma^2\big),\qquad \Theta_{\omega} := \tfrac{1}{3}\big(\varepsilon + \omega\sigma + \omega^2\sigma^2\big),\qquad \Theta_{\omega^2} := \tfrac{1}{3}\big(\varepsilon + \omega^2\sigma + \omega\sigma^2\big).$$
Table 2

The character table of $\mathbb{Z}_3$

          id     ω      ω²
[ε]        1     1       1
[σ]        1     ω      ω²
[σ²]       1     ω²      ω

Therefore, the matrix

$$f = \begin{pmatrix}1 & 1 & 1\\ 1 & \omega & \omega^2\\ 1 & \omega^2 & \omega\end{pmatrix},$$

diagonalizes the generic rate matrix for this model:

$$\hat Q = fQf^{-1} = \begin{pmatrix}0 & 0 & 0\\ 0 & -(\alpha+\beta)+\alpha\omega+\beta\omega^2 & 0\\ 0 & 0 & -(\alpha+\beta)+\alpha\omega^2+\beta\omega\end{pmatrix},$$

or, equivalently,

$$\hat K_1 = fK_1f^{-1} = \begin{pmatrix}1 & 0 & 0\\ 0 & \omega & 0\\ 0 & 0 & \omega^2\end{pmatrix},\qquad \hat K_2 = fK_2f^{-1} = \begin{pmatrix}1 & 0 & 0\\ 0 & \omega^2 & 0\\ 0 & 0 & \omega\end{pmatrix}.$$
We recall our basic result (5) that for group-based models, a generic phylogenetic tensor can be expressed as
$$P = e^{-\lambda}\exp\Big(\sum_{e\subseteq[n-1]}\big(\alpha_e K_1(e) + \beta_e K_2(e)\big)\Big)\cdot\delta^{n-1}\pi,$$

where $\lambda = \sum_{e\subseteq[n-1]}(\alpha_e + \beta_e)$. We take the stationary distribution as initial distribution, so $\pi = \big(\tfrac{1}{3},\tfrac{1}{3},\tfrac{1}{3}\big)^T$.

The matrix elements of f can be expressed as $[f]_{ij} = \omega^{i\times j}$, where we extend $i,j\in\mathbb{Z}_3$ to include multiplication × from the ring of integers $\mathbb{Z}$. Similarly,

$$[\hat K_1]_{ij} = \delta_{ij}\,\omega^i,\qquad [\hat K_2]_{ij} = \delta_{ij}\,(\omega^2)^i.$$
More generally, tensorial components can be expressed as
$$[\mathbf{1}\otimes\hat K_1\otimes\hat K_1]^{i_1i_2i_3}_{j_1j_2j_3} = \delta_{i_1j_1}\delta_{i_2j_2}\delta_{i_3j_3}\,\omega^{i_2+i_3}.$$
We represent a string $i_1i_2\cdots i_n$ as an ordered tripartition, $i_1i_2\cdots i_n\leftrightarrow\mu = \mu_0{:}\mu_1{:}\mu_2$, of the set [n], where $j\in\mu_k$ if and only if $i_j = k$. For example, if we take n=5, we have:

$$00000 \leftrightarrow \{1,2,3,4,5\}{:}\emptyset{:}\emptyset,\qquad 20120 \leftrightarrow \{2,5\}{:}\{3\}{:}\{1,4\},\qquad 01122 \leftrightarrow \{1\}{:}\{2,3\}{:}\{4,5\}.$$
Taking n = 3, we have
$$[\hat K_1(\{2,3\})]^{\mu}_{\nu} = [\mathbf{1}\otimes\hat K_1\otimes\hat K_1]^{\mu}_{\nu} = [\mathbf{1}\otimes\hat K_1\otimes\hat K_1]^{\mu_0:\mu_1:\mu_2}_{\nu_0:\nu_1:\nu_2} = \delta_{\mu\nu}\,\omega^{|\mu_1\cap\{2,3\}| + 2|\mu_2\cap\{2,3\}|},$$

and in general:

$$[\hat K_1(e)]^{\mu}_{\nu} = \delta_{\mu\nu}\,\omega^{|e\cap\mu_1| + 2|e\cap\mu_2|},\qquad [\hat K_2(e)]^{\mu}_{\nu} = \delta_{\mu\nu}\,\omega^{|e\cap\mu_2| + 2|e\cap\mu_1|}.$$
Taking the uniform distribution as initial distribution, the initial star-tree distribution can be written as
$$(\delta^{n-1}\pi)_{i_1i_2\cdots i_n} = \tfrac{1}{3}\,\delta_{i_1i_2}\delta_{i_1i_3}\cdots\delta_{i_1i_n}.$$
Defining $f^{(n)} = f^{(n-1)}\otimes f$ where $f^{(1)} = f$, we have

$$[f^{(n)}]_{\mu\nu} = [f^{(n)}]_{i_1i_2\cdots i_n,\,j_1j_2\cdots j_n} = [f]_{i_1j_1}[f]_{i_2j_2}\cdots[f]_{i_nj_n} = \omega^{i_1\times j_1 + i_2\times j_2 + \cdots + i_n\times j_n},$$
and in the transformed basis, where $\delta^{n-1}\hat\pi := f^{(n)}\cdot\delta^{n-1}\pi$, we have

$$(\delta^{n-1}\hat\pi)_{i_1i_2\cdots i_n} = \tfrac{1}{3}\sum_{j_1,j_2,\ldots,j_n}\omega^{i_1\times j_1 + i_2\times j_2 + \cdots + i_n\times j_n}\,\delta_{j_1j_2}\delta_{j_1j_3}\cdots\delta_{j_1j_n} = \tfrac{1}{3}\sum_{j_1}\omega^{j_1\times(i_1+i_2+\cdots+i_n)} = \tfrac{1}{3}\Big(1 + \omega^{i_1+i_2+\cdots+i_n} + (\omega^2)^{i_1+i_2+\cdots+i_n}\Big).$$
Indexing by ordered tripartitions, we conclude that

$$(\delta^{n-1}\hat\pi)_{\mu} = \tfrac{1}{3}\Big(1 + \omega^{i_1+i_2+\cdots+i_n} + (\omega^2)^{i_1+i_2+\cdots+i_n}\Big) = \tfrac{1}{3}\Big(1 + \omega^{|\mu_1|+2|\mu_2|} + (\omega^2)^{|\mu_1|+2|\mu_2|}\Big).$$
Now suppose $|\mu_1|+2|\mu_2| = 0\ (\mathrm{mod}\ 3)$; then

$$(\delta^{n-1}\hat\pi)_{\mu} = \tfrac{1}{3}(1+1+1) = 1.$$

If $|\mu_1|+2|\mu_2| = 1\ (\mathrm{mod}\ 3)$, then

$$(\delta^{n-1}\hat\pi)_{\mu} = \tfrac{1}{3}(1+\omega+\omega^2) = 0,$$

and if $|\mu_1|+2|\mu_2| = 2\ (\mathrm{mod}\ 3)$, then

$$(\delta^{n-1}\hat\pi)_{\mu} = \tfrac{1}{3}(1+\omega^2+\omega) = 0.$$
Thus we have found a basis where all the elements of the initial star-tree tensor are zero unless the tripartition μ satisfies $|\mu_1|+2|\mu_2| = 0\ (\mathrm{mod}\ 3)$. Crucially, this statement also holds for the phylogenetic tensor $\hat P$ because in this basis the rate matrices of this model are diagonal:

$$\hat P_{\mu} = \hat P_{\mu_0:\mu_1:\mu_2} = e^{-\lambda}\exp\Big(\sum_{e\subseteq[n-1]}\big(\alpha_e[\hat K_1(e)] + \beta_e[\hat K_2(e)]\big)^{\mu_0:\mu_1:\mu_2}_{\mu_0:\mu_1:\mu_2}\Big)\times\tfrac{1}{3}\Big(1 + \omega^{|\mu_1|+2|\mu_2|} + \omega^{2(|\mu_1|+2|\mu_2|)}\Big).$$
We deal with this condition on μ by taking $u = u_0{:}u_1{:}u_2$ as an ordered tripartition of the reduced set [n−1] and setting $\mu = u\cdot\gamma(u)$ (considered as the concatenation of strings) where

$$\gamma(u) = \begin{cases}0, & \text{if } |u_1|+2|u_2| \equiv 0,\\ 1, & \text{if } |u_1|+2|u_2| \equiv 2,\\ 2, & \text{if } |u_1|+2|u_2| \equiv 1;\end{cases} \;=\; 3 - \big(0\cdot|u_0| + 1\cdot|u_1| + 2\cdot|u_2|\big)\ (\mathrm{mod}\ 3).$$
If we make the definitions
$$P_u := \hat P_{u\cdot\gamma(u)},\qquad \eta_u := \sum_{e\subseteq[n-1]}\big(\alpha_e[\hat K_1(e)] + \beta_e[\hat K_2(e)]\big)^{u\cdot\gamma(u)}_{u\cdot\gamma(u)},$$
we then have the first part of the inversion
$$P_u = e^{-\lambda}\exp(\eta_u),\qquad \eta_u = \ln P_u + \lambda.$$
(7)

As in the $\mathbb{Z}_2$ case, we would like to use $\eta_u$ to recover the rate parameters $\alpha_e,\beta_e$ for all $e\subseteq[n-1]$ and thus complete the full inversion for this model. Of course, it is a little bit more difficult this time.

Recall that $\mu = \mu_0{:}\mu_1{:}\mu_2$ with $\mu_i\subseteq[n]$, whereas $u = u_0{:}u_1{:}u_2$ with $u_i\subseteq[n-1]$, and $e\subseteq[n-1]$. Considering

$$[\hat K_1(e)]^{\mu}_{\mu} = \omega^{|e\cap\mu_1| + 2|e\cap\mu_2|},$$

it follows that

$$[\hat K_1(e)]^{u\cdot\gamma(u)}_{u\cdot\gamma(u)} = \omega^{|e\cap u_1| + 2|e\cap u_2|},$$

and similarly

$$[\hat K_2(e)]^{u\cdot\gamma(u)}_{u\cdot\gamma(u)} = \omega^{|e\cap u_2| + 2|e\cap u_1|}.$$
We make the observation that

$$[F_1]_{u,e} := [f^{(n-1)}]_{u_0:u_1:u_2,\;e^c:e:\emptyset} = \omega^{|u_1\cap e| + 2|u_2\cap e|} = [\hat K_1(e)]^{u\cdot\gamma(u)}_{u\cdot\gamma(u)},$$

and

$$[F_2]_{u,e} := [f^{(n-1)}]_{u_0:u_1:u_2,\;e^c:\emptyset:e} = \omega^{|u_2\cap e| + 2|u_1\cap e|} = [\hat K_2(e)]^{u\cdot\gamma(u)}_{u\cdot\gamma(u)},$$

where $F_1$ and $F_2$ are $3^{n-1}\times 2^{n-1}$ matrices.

Thus we may write
$$\eta_u = \sum_{e\subseteq[n-1]}\big(\alpha_e[F_1]_{u,e} + \beta_e[F_2]_{u,e}\big).$$
Defining the column vectors $\alpha = \{\alpha_e\}$, $\beta = \{\beta_e\}$ and $\eta = \{\eta_u\}$, we can write

$$\eta = F_1\alpha + F_2\beta,$$

and define two $2^{n-1}\times 3^{n-1}$ matrices $G_1$ and $G_2$ as

$$[G_1]_{e,u} := [f^{-1(n-1)}]_{e^c:e:\emptyset,\;u},\qquad [G_2]_{e,u} := [f^{-1(n-1)}]_{e^c:\emptyset:e,\;u},$$

where

$$f^{-1} = \frac{1}{3}\begin{pmatrix}1 & 1 & 1\\ 1 & \omega^2 & \omega\\ 1 & \omega & \omega^2\end{pmatrix},$$

with $ff^{-1} = \mathbf{1}$.

Considering that

$$\sum_{v}[f^{-1(n-1)}]_{u,v}\,[f^{(n-1)}]_{v,w} = \delta_{uw},$$

for all ordered tripartitions u,w of [n−1], we have the matrix products
$$G_1F_1 = \mathbf{1},\qquad G_1F_2 = 0,\qquad G_2F_2 = \mathbf{1},\qquad G_2F_1 = 0.$$
Thus the second part of the inversion for this model is
$$\alpha = G_1\eta,\qquad \beta = G_2\eta.$$

Together with (7), these equations give a one-one map between pattern probabilities and edge weights for the group-based model with $G = \mathbb{Z}_3$.
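The same round trip can be sketched numerically for this model; in the sketch below the rates are arbitrary assumed values, the edge sets are taken nonempty, and the matrix exponential is evaluated in the basis where the rate tensor is diagonal:

```python
import numpy as np
from itertools import product

# End-to-end check of the Z_3 inversion on n = 3 taxa: build P from assumed
# edge rates (alpha_e, beta_e), then recover them via G_1 and G_2.
n = 3
w = np.exp(2j * np.pi / 3)
K1 = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]], dtype=complex)
K2 = K1 @ K1
f = np.array([[1, 1, 1], [1, w, w**2], [1, w**2, w]])
finv = f.conj() / 3                         # f^{-1}

edges = [(1,), (2,), (1, 2)]                # nonempty e subseteq [n-1]
alpha = np.array([0.20, 0.30, 0.10])
beta = np.array([0.15, 0.25, 0.05])
lam = alpha.sum() + beta.sum()

def tensor(M, e):
    # M on the legs in e, identity elsewhere
    out = np.ones((1, 1), dtype=complex)
    for i in range(1, n + 1):
        out = np.kron(out, M if i in e else np.eye(3, dtype=complex))
    return out

R = sum(a * tensor(K1, e) + b * tensor(K2, e)
        for a, b, e in zip(alpha, beta, edges))
F3 = np.kron(np.kron(f, f), f)
F3inv = np.kron(np.kron(finv, finv), finv)
delta_pi = np.zeros(27)
delta_pi[[0, 13, 26]] = 1 / 3               # patterns 000, 111, 222
# exponentiate the rate tensor in the basis where it is diagonal
P = np.exp(-lam) * (F3inv @ (np.exp(np.diag(F3 @ R @ F3inv)) * (F3 @ delta_pi)))

P_hat = (F3 @ P).reshape(3, 3, 3)
us = list(product(range(3), repeat=n - 1))  # ordered tripartitions u of [n-1]
eta = np.array([np.log(P_hat[u + ((-sum(u)) % 3,)]) + lam for u in us])
idx = lambda d: 3 * d[0] + d[1]             # tripartition of [2] -> flat index
finv2 = np.kron(finv, finv)
G1 = np.array([[finv2[idx([1 if i in e else 0 for i in (1, 2)]), idx(u)]
                for u in us] for e in edges])
G2 = np.array([[finv2[idx([2 if i in e else 0 for i in (1, 2)]), idx(u)]
                for u in us] for e in edges])
assert np.allclose(G1 @ eta, alpha)
assert np.allclose(G2 @ eta, beta)
```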

Inversion of the K3ST model

We now consider the K3ST model [36], which occurs as the group-based model with

$$G = \mathbb{Z}_2\times\mathbb{Z}_2 = \{(0,0),(0,1),(1,0),(1,1)\}\ \text{under}\ +\ (\mathrm{mod}\ 2)\;\cong\;\langle(12)(34),\,(13)(24)\rangle.$$
In this model a generic rate matrix is given by
$$Q = -(\alpha+\beta+\gamma)\mathbf{1} + \alpha K_{01} + \beta K_{10} + \gamma K_{11},$$

where

$$K_{01} = \mathbf{1}\otimes K = \begin{pmatrix}0&1&0&0\\1&0&0&0\\0&0&0&1\\0&0&1&0\end{pmatrix},\quad K_{10} = K\otimes\mathbf{1} = \begin{pmatrix}0&0&1&0\\0&0&0&1\\1&0&0&0\\0&1&0&0\end{pmatrix},\quad K_{11} = K\otimes K = \begin{pmatrix}0&0&0&1\\0&0&1&0\\0&1&0&0\\1&0&0&0\end{pmatrix}.$$
(8)
We already know that the 2×2 Hadamard matrix h diagonalizes K, so we see immediately that $H = h\otimes h$ diagonalizes this model:

$$\hat K_{01} := HK_{01}H^{-1} = \mathbf{1}\otimes hKh^{-1} = \begin{pmatrix}1&0&0&0\\0&-1&0&0\\0&0&1&0\\0&0&0&-1\end{pmatrix},\quad \hat K_{10} := HK_{10}H^{-1} = \begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&-1&0\\0&0&0&-1\end{pmatrix},\quad \hat K_{11} := HK_{11}H^{-1} = \begin{pmatrix}1&0&0&0\\0&-1&0&0\\0&0&-1&0\\0&0&0&1\end{pmatrix}.$$

Of course H is the character table of $\mathbb{Z}_2\times\mathbb{Z}_2$ and the permutation matrices (8), together with $K_{00} := \mathbf{1}$, give the regular representation $\rho_{\text{reg}}\cong id\otimes id\oplus id\otimes sgn\oplus sgn\otimes id\oplus sgn\otimes sgn$, where we recall the basic result that the tensor product of two irreducible representations of a group G gives an irreducible representation of G×G.
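This simultaneous diagonalization is easy to confirm numerically:

```python
import numpy as np

# H = h (x) h simultaneously diagonalizes the K3ST permutation matrices (8).
h = np.array([[1.0, 1.0], [1.0, -1.0]])
K = np.array([[0.0, 1.0], [1.0, 0.0]])
I2 = np.eye(2)
H = np.kron(h, h)
Hinv = H / 4                                     # since H^2 = 4 * I
for A, d in [(np.kron(I2, K), [1, -1, 1, -1]),   # K_01
             (np.kron(K, I2), [1, 1, -1, -1]),   # K_10
             (np.kron(K, K), [1, -1, -1, 1])]:   # K_11
    assert np.allclose(H @ A @ Hinv, np.diag(d))
```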

Simplifying notation, for this model we index tensors with indices given as pairs: $i,j = 00,01,10,11\in\mathbb{Z}_2\times\mathbb{Z}_2$; and we express the individual parts using lower case Roman characters. For example, we write $i := ab = 01$, with a=0 and b=1. This gives matrix elements:

$$[\hat K_{01}]^{ab}_{cd} = \delta_{ac}\delta_{bd}(-1)^b,\qquad [\hat K_{10}]^{ab}_{cd} = \delta_{ac}\delta_{bd}(-1)^a,\qquad [\hat K_{11}]^{ab}_{cd} = \delta_{ac}\delta_{bd}(-1)^{a+b};$$
and more complicated tensor products such as
$$[\hat K_{01}\otimes\hat K_{01}\otimes\mathbf{1}]^{a_1b_1a_2b_2a_3b_3}_{c_1d_1c_2d_2c_3d_3} = \delta_{a_1c_1}\delta_{b_1d_1}\delta_{a_2c_2}\delta_{b_2d_2}\delta_{a_3c_3}\delta_{b_3d_3}(-1)^{b_1+b_2}.$$
Again we interpret strings such as $\mu\leftrightarrow a_1a_2\cdots a_n$ and $\nu\leftrightarrow b_1b_2\cdots b_n$ as ordered bipartitions $\mu = \mu_0{:}\mu_1$ and $\nu = \nu_0{:}\nu_1$ of the set [n]. We can then write matrix elements of tensor products as

$$[\hat K_{01}(e)]^{\mu,\nu}_{\mu',\nu'} = \delta^{\mu}_{\mu'}\delta^{\nu}_{\nu'}(-1)^{|e\cap\nu_1|},\quad [\hat K_{10}(e)]^{\mu,\nu}_{\mu',\nu'} = \delta^{\mu}_{\mu'}\delta^{\nu}_{\nu'}(-1)^{|e\cap\mu_1|},\quad [\hat K_{11}(e)]^{\mu,\nu}_{\mu',\nu'} = \delta^{\mu}_{\mu'}\delta^{\nu}_{\nu'}(-1)^{|e\cap\mu_1|+|e\cap\nu_1|}.$$
Taking the stationary distribution π = ¼(1,1,1,1)ᵀ as initial distribution, the zero edge-length star-tree distribution is given by

$$\left(\delta^{n-1}\cdot\pi\right)^{i_1i_2\ldots i_n}=\tfrac14\,\delta_{i_1i_2}\delta_{i_1i_3}\cdots\delta_{i_1i_n},$$

which in the finer index representation is

$$\left(\delta^{n-1}\cdot\pi\right)^{a_1b_1a_2b_2\ldots a_nb_n}=\tfrac14\,\delta_{a_1a_2}\delta_{a_1a_3}\cdots\delta_{a_1a_n}\,\delta_{b_1b_2}\delta_{b_1b_3}\cdots\delta_{b_1b_n}.$$
Recall that elements of the Hadamard matrix can be written as h^a_b = (−1)^{a×b}, where a, b ∈ ℤ₂ and we allow the multiplication × by extending to the ring of integers ℤ. In the transformed basis, we have

$$\begin{aligned}\left(\delta^{n-1}\cdot\hat{\pi}\right)^{a_1b_1a_2b_2\ldots a_nb_n}=\left(\delta^{n-1}\cdot\hat{\pi}\right)^{\mu,\nu}&=\tfrac14\sum_{\substack{c_1,c_2,\ldots,c_n\\ d_1,d_2,\ldots,d_n}}h^{a_1}_{\ c_1}h^{a_2}_{\ c_2}\cdots h^{a_n}_{\ c_n}\,h^{b_1}_{\ d_1}h^{b_2}_{\ d_2}\cdots h^{b_n}_{\ d_n}\,\delta_{c_1c_2}\delta_{c_1c_3}\cdots\delta_{c_1c_n}\,\delta_{d_1d_2}\delta_{d_1d_3}\cdots\delta_{d_1d_n}\\&=\tfrac14\sum_{c_1,d_1}(-1)^{(a_1+a_2+\cdots+a_n)\times c_1+(b_1+b_2+\cdots+b_n)\times d_1}\\&=\tfrac14\Big(1+(-1)^{a_1+\cdots+a_n}+(-1)^{b_1+\cdots+b_n}+(-1)^{a_1+\cdots+a_n+b_1+\cdots+b_n}\Big)\\&=\begin{cases}0,&\text{if either }|\mu_1|\text{ or }|\nu_1|\text{ is odd};\\1,&\text{if }|\mu_1|\text{ and }|\nu_1|\text{ are both even}.\end{cases}\end{aligned}$$
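The parity condition on the transformed star-tree distribution can be confirmed directly. The sketch below (our own illustration, numpy, for n = 3) builds the zero edge-length star-tree distribution, applies H to every tensor leg, and checks that a component equals 1 exactly when both |μ₁| and |ν₁| are even:

```python
import numpy as np
from itertools import product

n = 3
h = np.array([[1, 1], [1, -1]])
H = np.kron(h, h)   # 4x4; the composite index i = 2a + b encodes the pair (a, b)

# Zero edge-length star-tree distribution: (1/4) delta_{i1 i2} ... delta_{i1 in}
pi_star = np.zeros((4,) * n)
for i in range(4):
    pi_star[(i,) * n] = 1 / 4

# Apply H to every tensor leg
Pt = pi_star
for leg in range(n):
    Pt = np.moveaxis(np.tensordot(H, Pt, axes=([1], [leg])), 0, leg)

# Component at (a1 b1, ..., an bn) is 1 iff sum(a) and sum(b) are both even
for idx in product(range(4), repeat=n):
    a, b = [i // 2 for i in idx], [i % 2 for i in idx]
    expected = 1.0 if sum(a) % 2 == 0 and sum(b) % 2 == 0 else 0.0
    assert np.isclose(Pt[idx], expected)
```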
We recall (5), so under this model we can express a generic phylogenetic tensor as

$$P=e^{-\lambda}\exp\!\Bigg(\sum_{e\subseteq[n-1]}\alpha_e K_{01}(e)+\beta_e K_{10}(e)+\gamma_e K_{11}(e)\Bigg)\cdot\delta^{n-1}\cdot\pi.$$
To exclude the vanishing components we define, for all ordered-bipartitions u = u₀:u₁ of the reduced set [n−1],

$$\gamma(u)=\begin{cases}0,&\text{if }|u_1|\text{ is even},\\1,&\text{if }|u_1|\text{ is odd};\end{cases}\qquad \gamma(u)=2-\left(0|u_0|+1|u_1|\right)\ (\text{mod }2),$$

and interpret u·γ(u) as the string u·γ(u) = a₁a₂…a_{n−1}γ(u). Then, for each pair u, v of ordered-bipartitions of [n−1], we define

$$\eta_{u,v}:=\Bigg[\sum_{e\subseteq[n-1]}\alpha_e\hat{K}_{01}(e)+\beta_e\hat{K}_{10}(e)+\gamma_e\hat{K}_{11}(e)\Bigg]^{u\cdot\gamma(u),\,v\cdot\gamma(v)}_{\ u\cdot\gamma(u),\,v\cdot\gamma(v)},$$

and

$$P_{u,v}:=\hat{P}^{u\cdot\gamma(u),\,v\cdot\gamma(v)}.$$
This gives the inversion

$$P_{u,v}=e^{-\lambda}\exp\left(\eta_{u,v}\right),\qquad \eta_{u,v}=\lambda+\ln P_{u,v}.$$
Consider the 4^{n−1}×2^{n−1} rectangular matrices F₀₁, F₁₀ and F₁₁ with components

$$\left(F_{01}\right)^{e}_{\ u,v}=\hat{K}_{01}(e)^{u,v}_{\ u,v}=(-1)^{|e\cap v_1|},\quad \left(F_{10}\right)^{e}_{\ u,v}=\hat{K}_{10}(e)^{u,v}_{\ u,v}=(-1)^{|e\cap u_1|},\quad \left(F_{11}\right)^{e}_{\ u,v}=\hat{K}_{11}(e)^{u,v}_{\ u,v}=(-1)^{|e\cap u_1|+|e\cap v_1|};$$

where e ⊆ [n−1] and u = u₀:u₁, v = v₀:v₁ are ordered-bipartitions of [n−1]. If we define the column vector η := {η_{u,v}} indexed by pairs of ordered-bipartitions, and the column vectors α := {α_e}, β := {β_e} and γ := {γ_e} indexed by subsets of [n−1], we then have the matrix equation

$$\eta=F_{01}\alpha+F_{10}\beta+F_{11}\gamma.$$
Writing H^{(n)} = H^{(n−1)}⊗H with H^{(1)} = H, we note that

$$\left(F_{01}\right)^{e}_{\ u,v}=H^{(n-1)}_{(u,v),(\emptyset,e)},\quad \left(F_{10}\right)^{e}_{\ u,v}=H^{(n-1)}_{(u,v),(e,\emptyset)},\quad \left(F_{11}\right)^{e}_{\ u,v}=H^{(n-1)}_{(u,v),(e,e)};$$

and define the 2^{n−1}×4^{n−1} rectangular matrices G₀₁, G₁₀ and G₁₁ as

$$\left(G_{01}\right)^{u,v}_{\ e}=\left(H^{-1}\right)^{(n-1)}_{(\emptyset,e),(u,v)},\quad \left(G_{10}\right)^{u,v}_{\ e}=\left(H^{-1}\right)^{(n-1)}_{(e,\emptyset),(u,v)},\quad \left(G_{11}\right)^{u,v}_{\ e}=\left(H^{-1}\right)^{(n-1)}_{(e,e),(u,v)}.$$
Noting that

$$\sum_{w,x}\left(H^{-1}\right)^{(n-1)}_{(u,v),(w,x)}H^{(n-1)}_{(w,x),(y,z)}=\delta_{u,y}\delta_{v,z},$$

for all u, v, y, z ordered-bipartitions of [n−1], we then have the matrix identities

$$G_{01}F_{01}=\mathbf{1},\quad G_{10}F_{10}=\mathbf{1},\quad G_{11}F_{11}=\mathbf{1},$$
and

$$G_{01}F_{10}=0=G_{01}F_{11}=G_{10}F_{11}=G_{10}F_{01}=G_{11}F_{01}=G_{11}F_{10}.$$
Writing

$$\alpha=G_{01}\eta,\quad \beta=G_{10}\eta,\quad \gamma=G_{11}\eta$$

completes the inversion for the K3ST model.
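As a sanity check, the F and G matrices and their orthogonality relations can be tested numerically. The sketch below (our own illustration, numpy, with n−1 = 2) restricts the column index e to nonempty subsets, as appropriate for edges, and uses the fact that H⁻¹ is proportional to H, so each G is a rescaled transpose of the corresponding F:

```python
import numpy as np
from itertools import combinations

m = 2  # m = n - 1; rows are pairs (u, v) of subsets of [m], columns are nonempty subsets e
subsets = [frozenset(s) for k in range(m + 1) for s in combinations(range(m), k)]
edges = [e for e in subsets if e]   # restrict to nonempty e

def build(exponent):
    # Build a 4^m x |edges| sign matrix from a component formula
    return np.array([[(-1) ** exponent(e, u, v) for e in edges]
                     for u in subsets for v in subsets])

F01 = build(lambda e, u, v: len(e & v))
F10 = build(lambda e, u, v: len(e & u))
F11 = build(lambda e, u, v: len(e & u) + len(e & v))

# Since H^{-1} is proportional to H, each G is the transpose of its F over 4^m
G01, G10, G11 = F01.T / 4 ** m, F10.T / 4 ** m, F11.T / 4 ** m

Id = np.eye(len(edges))
for G, F in [(G01, F01), (G10, F10), (G11, F11)]:
    assert np.allclose(G @ F, Id)
for G, F in [(G01, F10), (G01, F11), (G10, F01), (G10, F11), (G11, F01), (G11, F10)]:
    assert np.allclose(G @ F, 0)
```

The restriction to nonempty e is our assumption here; for e = ∅ the mixed products pick up a single nonzero entry, corresponding to the identity element of the group.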

Inversion of the ℤ_r model

We now consider the group-based model for G = ℤ_r = {0, 1, 2, …, (r−1)}, + (mod r) ≅ ⟨σ : σ^r = e⟩. For this model the generic rate matrix has the form

$$Q=-\lambda\mathbf{1}+\sum_{i=1}^{r-1}\alpha_i K_{\sigma^i},$$

where $\lambda=\sum_{i=1}^{r-1}\alpha_i$ and

$$K_{\sigma}=\begin{pmatrix}0&0&\cdots&0&1\\1&0&\cdots&0&0\\0&1&\ddots&&\vdots\\\vdots&&\ddots&0&0\\0&0&\cdots&1&0\end{pmatrix},$$

so that $K_{\sigma^i}=\left(K_{\sigma}\right)^{i}$.

Defining ω = e^{2πi/r}, we have ω^r = 1 and 1+ω+ω²+…+ω^{r−1} = 0, and we set f^i_j = ω^{ij}, where i, j = 0, 1, 2, …, r−1. Of course, f is the character table of ℤ_r and (f^{−1})^i_j = (1/r) ω^{−ij}.
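These identities are easy to check numerically; the following sketch (our own illustration, numpy, with r = 5) builds f and f⁻¹ from their entrywise formulas:

```python
import numpy as np

r = 5
omega = np.exp(2j * np.pi / r)
assert np.isclose(omega ** r, 1)                          # omega^r = 1
assert np.isclose(sum(omega ** k for k in range(r)), 0)   # 1 + omega + ... + omega^{r-1} = 0

# Character table f of Z_r and its entrywise inverse
f = np.array([[omega ** (i * j) for j in range(r)] for i in range(r)])
f_inv = np.array([[omega ** (-i * j) / r for j in range(r)] for i in range(r)])
assert np.allclose(f @ f_inv, np.eye(r))
```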

Lemma 1.

$$\sum_{\nu}\left(f\otimes f\otimes\cdots\otimes f\right)^{\mu}_{\ \nu}\left(f^{-1}\otimes f^{-1}\otimes\cdots\otimes f^{-1}\right)^{\nu}_{\ \mu'}=\delta^{\mu}_{\ \mu'},$$

where μ, ν, μ′ are ordered-r-partitions of the set [n] defined by the strings i₁i₂…iₙ, j₁j₂…jₙ and k₁k₂…kₙ, respectively.

Proof.

The result is obvious from the definition of the tensor product. However, explicitly we have

$$\sum_{\nu}\left(f\otimes\cdots\otimes f\right)^{\mu}_{\ \nu}\left(f^{-1}\otimes\cdots\otimes f^{-1}\right)^{\nu}_{\ \mu'}=\frac{1}{r^{n}}\sum_{0\le j_1,j_2,\ldots,j_n\le r-1}\omega^{i_1j_1+i_2j_2+\cdots+i_nj_n}\,\omega^{-(j_1k_1+j_2k_2+\cdots+j_nk_n)}=\frac{1}{r^{n}}\sum_{0\le j_1,j_2,\ldots,j_n\le r-1}\omega^{j_1(i_1-k_1)+j_2(i_2-k_2)+\cdots+j_n(i_n-k_n)},$$

which clearly equals 1 if i_ℓ − k_ℓ = 0 for all ℓ, and, by repeatedly applying 1+ω+ω²+…+ω^{r−1} = 0, equals 0 otherwise.
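A quick numerical check of Lemma 1 (our own illustration, numpy, with r = 3 and n = 2) confirms that the Kronecker product of the inverses inverts the Kronecker product:

```python
import numpy as np

r, n = 3, 2
omega = np.exp(2j * np.pi / r)
f = np.array([[omega ** (i * j) for j in range(r)] for i in range(r)])
f_inv = np.linalg.inv(f)   # equals (1/r) omega^{-ij} entrywise

# n-fold Kronecker products of f and of f^{-1}
F = np.kron(f, f)
F_inv = np.kron(f_inv, f_inv)

# Lemma 1: summing over the shared index nu gives the identity on r^n indices
assert np.allclose(F @ F_inv, np.eye(r ** n))
```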

The regular representation contains exactly one copy of every irreducible representation, and the irreducible representations of ℤ_r are given by the powers of ω:

$$\rho_i:\sigma\mapsto\omega^{i}.$$

Thus the change of basis $K_{\sigma^i}\mapsto \hat{K}_{\sigma^i}:=fK_{\sigma^i}f^{-1}$ will give diagonal matrices $\hat{K}_{\sigma^i}$. Additionally, we have:

Lemma 2.

In the diagonal basis, the matrices $\hat{K}_{\sigma^s}:=fK_{\sigma^s}f^{-1}$ have matrix elements given by $\left(\hat{K}_{\sigma^s}\right)^{i}_{\ j}=\omega^{is}\delta_{ij}$.

Proof.

Consider the matrix elements $\left(K_{\sigma^s}\right)^{i}_{\ j}=\delta_{i,\sigma^s(j)}$. Thus

$$\left(fK_{\sigma^s}f^{-1}\right)^{i}_{\ j}=\sum_{k,l}\omega^{ik}\,\delta_{k,\sigma^s(l)}\,\tfrac{1}{r}\,\omega^{-lj}=\tfrac{1}{r}\sum_{l}\omega^{i\sigma^s(l)-lj}=\tfrac{1}{r}\sum_{l}\omega^{i(l+s)-lj}=\omega^{is}\,\tfrac{1}{r}\sum_{l}\omega^{l(i-j)}=\omega^{is}\delta_{ij},$$

where we have used $\omega^{\sigma^s(m)}=\omega^{m+s}$.
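Lemma 2 can be verified numerically as well; the sketch below (our own illustration, numpy, with r = 6) builds the cyclic permutation matrix K_σ and conjugates each of its powers by f:

```python
import numpy as np

r = 6
omega = np.exp(2j * np.pi / r)
f = np.array([[omega ** (i * j) for j in range(r)] for i in range(r)])
f_inv = np.linalg.inv(f)

# Cyclic permutation matrix: (K_sigma)^i_j = delta_{i, sigma(j)} with sigma(j) = j + 1 (mod r)
K_sigma = np.zeros((r, r))
for j in range(r):
    K_sigma[(j + 1) % r, j] = 1

# Lemma 2: f K_{sigma^s} f^{-1} = diag(omega^{i s})
for s in range(r):
    D = f @ np.linalg.matrix_power(K_sigma, s) @ f_inv
    assert np.allclose(D, np.diag([omega ** (i * s) for i in range(r)]))
```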

Now

$$\left(\delta^{n-1}\cdot\pi\right)^{i_1i_2\ldots i_n}=\tfrac{1}{r}\,\delta_{i_1i_2}\delta_{i_1i_3}\cdots\delta_{i_1i_n},$$

and

$$\left(\delta^{n-1}\cdot\hat{\pi}\right)^{i_1i_2\ldots i_n}=\tfrac{1}{r}\sum_{j_1,j_2,\ldots,j_n}\omega^{i_1j_1+i_2j_2+\cdots+i_nj_n}\,\delta_{j_1j_2}\delta_{j_1j_3}\cdots\delta_{j_1j_n}=\tfrac{1}{r}\sum_{j_1}\omega^{j_1(i_1+i_2+\cdots+i_n)}=\begin{cases}1,&\text{if }i_1+i_2+\cdots+i_n=0\ (\text{mod }r);\\0,&\text{otherwise}.\end{cases}$$

Translating this result using the ordered-r-partitions for indices, we have

Lemma 3.

In the diagonal basis, the uniform initial distribution on the star tree has components

$$\left(\delta^{n-1}\cdot\hat{\pi}\right)^{\mu}=\begin{cases}1,&\text{if }0|\mu_0|+1|\mu_1|+2|\mu_2|+\cdots+(r-1)|\mu_{r-1}|=0\ (\text{mod }r);\\0,&\text{otherwise},\end{cases}$$

where μ = μ₀:μ₁:μ₂:…:μ_{r−1} is an ordered-r-partition of the set [n].
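Lemma 3 can likewise be checked numerically; the sketch below (our own illustration, numpy, with r = 3 and n = 3) applies f to every leg of the uniform star-tree distribution and confirms the mod-r condition on the surviving components:

```python
import numpy as np
from itertools import product

r, n = 3, 3
omega = np.exp(2j * np.pi / r)
f = np.array([[omega ** (i * j) for j in range(r)] for i in range(r)])

# Uniform star-tree distribution: (1/r) delta_{i1 i2} ... delta_{i1 in}
pi_star = np.zeros((r,) * n, dtype=complex)
for i in range(r):
    pi_star[(i,) * n] = 1 / r

# Apply f to every tensor leg
Pt = pi_star
for leg in range(n):
    Pt = np.moveaxis(np.tensordot(f, Pt, axes=([1], [leg])), 0, leg)

# Lemma 3: a component is 1 iff its index digits sum to 0 (mod r)
for idx in product(range(r), repeat=n):
    expected = 1.0 if sum(idx) % r == 0 else 0.0
    assert np.isclose(Pt[idx], expected)
```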

Again recall that for this model a generic phylogenetic tensor can be written as

$$P=e^{-\lambda}\exp\!\Bigg(\sum_{e\subseteq[n-1]}\sum_{s=1}^{r-1}\alpha^{s}_{e}K_{\sigma^s}(e)\Bigg)\cdot\delta^{n-1}\cdot\pi,$$

where π = (1/r)(1,1,…,1)ᵀ. In the diagonal basis P̂ := f^{(n)}·P, and, as a consequence of Lemma 3, P̂ will have many vanishing components. To avoid these we take u = u₀:u₁:u₂:…:u_{r−1} as an ordered-r-partition of [n−1] and set

$$\gamma(u)=r-\left(0|u_0|+1|u_1|+2|u_2|+\cdots+(r-1)|u_{r-1}|\right)\ (\text{mod }r).$$

If we define $P_u:=\hat{P}^{u\cdot\gamma(u)}$