The phylogeny of the mammalian heme peroxidases and the evolution of their diverse functions
© Loughran et al. 2008
Received: 29 August 2007
Accepted: 27 March 2008
Published: 27 March 2008
The mammalian heme peroxidases (MHPs) are a medically important group of enzymes. Included in this group are myeloperoxidase, eosinophil peroxidase, lactoperoxidase, and thyroid peroxidase. These enzymes are associated with such diverse diseases as asthma, Alzheimer's disease and inflammatory vascular disease. Despite much effort to elucidate the function of the 4 major groups of this multigene family, we still do not have a clear understanding of their relationships to each other.
Sufficient signal exists for the resolution of the evolutionary relationships of this family of enzymes. We demonstrate, using a root mean squared deviation statistic, how the removal of the fastest evolving sites aids in the minimisation of the effect of long branch attraction and the generation of a highly supported phylogeny. Based on this phylogeny we have pinpointed the amino acid positions that have most likely contributed to the diverse functions of these enzymes. Many of these residues are in close proximity to sites implicated in protein misfolding, loss of function or disease.
Our analysis of all available genomic sequence data for the MHPs from all available completed mammalian genomes involved sophisticated methods of phylogeny reconstruction and data treatment. Our study has (i) fully resolved the phylogeny of the MHPs and the subsequent pattern of gene duplication, and (ii) detected amino acids under positive selection that have most likely contributed to the observed functional shifts in each type of MHP.
Heme peroxidases are abundant enzymes that can be classified into two major families, namely the animal and non-animal peroxidases, that have arisen from two independent evolutionary events. The non-animal peroxidases include plant, bacterial, fungal and protist peroxidases. The classical peroxidase cycle involves the reaction sequence from native enzyme through compound I, then compound II and finally back to native enzyme. An alternative and highly important pathway that mammalian heme peroxidases (MHPs) pass through, depending on substrate availability, is the halogenation cycle. In the presence of H2O2 and a halide (especially iodide), myeloperoxidase (MPO) can catalyse a halogenation reaction that plays an important role in the antibacterial activity of leukocytes. Animal peroxidases are a medically important group of enzymes implicated in many different diseases including asthma, Alzheimer's disease (AD) and inflammatory vascular disease. From biochemical studies it is believed that the heme peroxidases of mammals arose following a number of gene duplication events [3, 8, 9].
Gene duplication provides the raw material for evolution of diversity and is believed to be the principal source of new genes. The process of gene duplication has a number of alternative outcomes, and remains a controversial issue. Gene duplicates may become functionally redundant, or functionally divergent. There are a number of ways in which functionally redundant duplicates can be preserved [12, 13]. It has been proposed that the preservation of duplicates can be brought about by degenerative mutations in the regulatory elements of the duplicates; this is referred to as the Duplication-Degeneration-Complementation (DDC) model. The DDC model does not allow a role for positive selection in the evolution of duplicates and is based solely on a neutral model with degenerative mutations and subsequent negative selection. Under this model duplicates are preserved as each accumulates degenerative mutations, resulting in specific subfunctions that in toto ensure optimal fitness.
An alternative mode of duplicate retention is positive selection. For example, in direct contrast to the predictions of the DDC model it has been shown for human and mouse that the number of retentions and losses of duplicates fits more consistently with a model incorporating positive selection. Rapid divergence in gene expression profiles of duplicates following the duplication event results in expression profiles as diverse as those of singletons. An example of this is the functional redundancy of the transcription factor inhibitors IκBα and IκBβ, which have acquired different functions through divergence of gene expression rather than biochemical function. Recent studies have indicated that for mammalian genomes neofunctionalisation, be it independent of, or coupled with, subfunctionalisation, is the most common mode of evolution of gene duplicates. These selective pressures following the process of gene duplication are key to the evolution of specificity of divergent multigene families, such as the MHPs.
In those cases where having all duplicates is deleterious, dosage requirements may cause the partitioning of subfunctions to be favored by positive selection resulting from selective pressure for the fixation of nonfunctional or subfunctional alleles. The divergence of function may occur through neofunctionalisation , or, subfunctionalisation where the ancestral function is partitioned between the duplicates  (for detail on current gene duplication models see ).
We hypothesise that the selective pressures on MHPs following gene duplication events will (i) still be traceable in the extant sequences of these enzymes, and (ii) have contributed to the functional diversity observed in these enzymes. A fully resolved phylogeny can provide a basis for such comparative genomic analysis of these heme peroxidases.
Mammalian heme peroxidase features and functions (adapted from Clark 2000 and O'Brien 2000).
Superfamily (EC no.)
Chromosomal Location (Human)
Neutrophils, mono-nuclear phagocytes
Milk, saliva, tears and other secretions
Bacteriostatic and bactericidal activity
Thyroid cell surface and cytoplasm
Thyroid hormone biosynthesis
To infer the phylogeny of the MHPs from sequence data, it is fundamental to consider the challenges associated with resolving mammalian gene phylogenies. The main pitfalls include poor phylogenetic signal resulting from mutationally saturated positions, inadequate modelling of the evolutionary process and systematic bias due to variable rates of evolution among species or within sequences.
A systematic bias or systematic error is one that results in greater support for an incorrect conclusion with the accumulation of more data. Long branch attraction (LBA) is one of the most commonly occurring systematic biases and is a consequence of unequal evolutionary rates across lineages. This can occur because the number of cell divisions per unit time differs between species, or because of rapid fixation of mutations following a reduction in population size, e.g., a bottleneck. Rodent species accumulate many more mutations within a defined time frame than larger mammals [27, 28]. Therefore, rodentia are often placed close to the outgroup species on a phylogeny due to their increased number of mutations. There are a number of ways in which the noise (LBA) can be minimised. Firstly, the addition of more taxa to the dataset: denser sampling of species of intermediate generation time can reduce the effect of LBA by reducing the overall distances between taxa. Secondly, the use of improved models of sequence evolution, i.e., models sensitive to multiple substitutions at the same site and rate heterogeneity across the phylogeny. And finally, stripping the alignment of its most rapidly evolving sites and using only the remaining more slowly evolving sites to reconstruct phylogenies reduces the amount of LBA noise in the dataset. These approaches can be used in combination. While databases such as Peroxibase house all the up-to-date peroxidase sequences, we have included only those MHPs from completed mammalian genomes (this allows us to identify species-specific gene birth and death). We have used Maximum Likelihood (ML) and Bayesian methods of phylogeny reconstruction together with the stripping of the most rapidly evolving sites in the dataset.
The major questions addressed in this study pertain firstly to the resolution of the evolutionary relationships of these MHPs using molecular sequence data, and secondly, to the analysis of functional diversities among these superfamilies using the resolved phylogeny and ML methods for testing selective pressures.
Selection can be classified as being neutral, purifying or positive. Positive selection (adaptive evolution) is strongly indicative of functional shifts within proteins. To determine what selective pressures may have influenced the functional diversification of the MHP families, we tested the data using a variety of ML models of evolution with different properties. These included models that allow for only purifying selection and/or neutral evolution, and those that allow for positive selection. Likelihood scores for all alternative models and their null hypotheses were calculated. The likelihood scores for the null hypothesis versus the alternative hypothesis for those models that are extensions of each other were then compared using a likelihood ratio test (LRT) for goodness-of-fit. For those models that allow for the estimation of site-specific evolution, we can identify those amino acids that have undergone positive selection. These amino acid positions were identified using Bayesian statistics, and their locations and possible functional significance were determined. In our analysis we have shown that positive selection has contributed to the evolution of these enzymes following gene duplication events.
Although the 4 major clades in the phylogeny correspond to the 4 major groups of MHPs, the relationships of the species within these clades conflict with the previously published mammalian species phylogeny. The rat and mouse are members of the glires group, and as such are a sister group to the primates, which together form the Euarchontoglires mammalian superorder. The topology seen here for the LPOs (see Figure 2a) suggests that dog and cow are the outgroup to the primate clade. This is a common error in mammalian phylogeny reconstruction, and has been shown to be an effect of LBA. Also, for the TPO group opossum is placed next to rat and mouse and not as the outgroup as expected, suggesting that the opossum and the rodents have similar rapid rates of evolution (see Figure 2a).
All gene duplication events were verified using gene tree – species tree reconciliation. We analysed the resolved MHP phylogeny (Figure 3a), and identified in total 4 duplication events and 4 losses. This method overestimates gene losses, as in the case of EPO, where sequence data that were not available are assumed to represent a loss. An LPO-specific duplication event is predicted (see Figure 3b). Our results show differential retention and loss in the LPO lineage following this gene duplication event, resulting in cow retaining an alternative duplicate copy to the other mammals in the dataset, as shown in Figure 3b. This method must be used with caution as it does not take into account rate heterogeneity amongst species or sites in the data, and relies solely on the topology. However, reciprocal BLAST analysis of the cow sequence against the other mammal genomes identifies this sequence as an ortholog.
We wished to test the hypothesis that following the gene duplication events in the MHPs (as resolved in this study), selective forces – specifically positive selection – have contributed to the observed changes in function in each of the 4 major groups of MHPs. Tests for heterogeneous selective pressures were carried out on the resolved phylogeny using the evolutionary models implemented in PAML 3.15 and the complete MSA. The Dn/Ds ratios were estimated in a likelihood framework at both site-specific and lineage-specific levels. A total of seven comparisons were assessed using χ2 tests of significance: five site-specific comparisons and two branch-site comparisons.
No positively selected sites were estimated for the one ratio model (see Additional file 3). Strong purifying selection across sites was indicated with an ω of 0.1516. However, this model is a poor fit for the data (ln L = -34417.1085). Positive selection was tested in a site-specific manner across the dataset using the site models M1 (neutral), M2 (selection), M3 discrete (k = 2), M3 discrete (k = 3), M7 (beta), M8 (beta & omega > 1) and M8a (beta & omega = 1). The results of the site-specific analysis are shown in Additional file 3.
Poor likelihood values were achieved using the site-specific models of evolution; however, the most complex site-specific model used, M8, yielded significant results when it was tested against its null model M8a. A small proportion of sites are under relaxed positive selection (Additional file 3). Through the use of Bayesian estimation, four positively selected sites have been identified across the alignment, with posterior probability (PP) > 0.50.
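The thresholding of posterior probabilities described above can be illustrated with a minimal sketch. The function name `count_sites_above` and the example values are hypothetical and are not taken from the PAML output of this study; the sketch only shows how sites would be tallied against the PP cut-offs used in the text (0.50, 0.95 and 0.99).

```python
from typing import Dict, Iterable


def count_sites_above(posteriors: Dict[int, float],
                      thresholds: Iterable[float] = (0.50, 0.95, 0.99)) -> Dict[float, int]:
    """Count alignment sites whose posterior probability exceeds each threshold."""
    return {t: sum(1 for pp in posteriors.values() if pp > t) for t in thresholds}


# Hypothetical posterior probabilities keyed by alignment position
# (illustrative values only, not results from the paper).
example_pp = {10: 0.41, 20: 0.62, 30: 0.58, 40: 0.97, 50: 0.995}

print(count_sites_above(example_pp))
```

Because the thresholds are nested, a site counted at PP > 0.99 is also counted at PP > 0.95 and PP > 0.50.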
Parameter estimates and likelihood scores for branch-site model, model B.
Columns: foreground lineage; estimates of parameters; positively selected sites (number of sites exceeding each PP threshold).
MPO
Proportions: p0 = 0.4975, p1 = 0.4553, (p2 = 0.0246, p3 = 0.0225)
Background: ω0 = 0.0458, ω1 = 0.3307, ω2 = 0.0458, ω3 = 0.3307
Foreground: ω0 = 0.0458, ω1 = 0.3307, ω2 = 251.6783, ω3 = 251.6783
Positively selected sites: 19 > 0.50, 2 > 0.95, 1 > 0.99
EPO
Proportions: p0 = 0.4967, p1 = 0.4469, (p2 = 0.0297, p3 = 0.0267)
Background: ω0 = 0.0464, ω1 = 0.3322, ω2 = 0.0464, ω3 = 0.3322
Foreground: ω0 = 0.0464, ω1 = 0.3322, ω2 = 774.6323, ω3 = 774.6323
Positively selected sites: 28 > 0.50, 6 > 0.95, 4 > 0.99
LPO
Proportions: p0 = 0.4431, p1 = 0.3884, (p2 = 0.0898, p3 = 0.0787)
Background: ω0 = 0.0470, ω1 = 0.3414, ω2 = 0.0470, ω3 = 0.3414
Foreground: ω0 = 0.0470, ω1 = 0.3414, ω2 = 82.8559, ω3 = 82.8559
Positively selected sites: 96 > 0.50, 18 > 0.95, 11 > 0.99
TPO
Proportions: p0 = 0.4358, p1 = 0.3690, (p2 = 0.1057, p3 = 0.0895)
Background: ω0 = 0.0479, ω1 = 0.3468, ω2 = 0.0479, ω3 = 0.3468
Foreground: ω0 = 0.0479, ω1 = 0.3468, ω2 = 999.0000, ω3 = 999.0000
Positively selected sites: 82 > 0.50, 8 > 0.95
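The bracketed proportions p2 and p3 in the table above appear to be derived from p0 and p1 rather than estimated freely. The sketch below assumes the usual branch-site parameterisation, in which the remaining proportion 1 − p0 − p1 is split between the two foreground classes in the ratio p0 : p1. That relationship is an assumption on our part and is not stated in the text; the function name is hypothetical. Under this assumption, the tabulated values are reproduced to within rounding.

```python
from typing import Tuple


def split_foreground_classes(p0: float, p1: float) -> Tuple[float, float]:
    """Derive (p2, p3) from p0 and p1, assuming the remainder 1 - p0 - p1
    is divided between the foreground classes in the ratio p0 : p1."""
    remainder = 1.0 - p0 - p1
    p2 = remainder * p0 / (p0 + p1)
    p3 = remainder * p1 / (p0 + p1)
    return p2, p3


# (p0, p1) pairs copied from the four blocks of the table above, in order.
table_rows = [(0.4975, 0.4553), (0.4967, 0.4469), (0.4431, 0.3884), (0.4358, 0.3690)]

for p0, p1 in table_rows:
    p2, p3 = split_foreground_classes(p0, p1)
    # p2 + p3 is the proportion of sites allowed a different omega on the foreground lineage.
    print(f"p0={p0:.4f} p1={p1:.4f} -> p2={p2:.4f} p3={p3:.4f} (p2+p3={p2 + p3:.4f})")
```

Read this way, p2 + p3 gives the share of sites that may evolve under a different ω on the foreground lineage, which is markedly larger in the last two blocks than in the first two.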
Our results show that following gene duplication, each individual type of MHP has undergone positive selection in amino acid residues that are unique to that type of MHP, see Table 2. As positive selection is closely associated with functional shift, we postulate that these positively selected sites have significantly contributed to the evolution of the functional diversity of these MHPs.
For the MPO superfamily, a total of 19 positively selected sites were identified (PP > 0.50). We have found functional information from the literature on 11 of these sites, which are discussed here. Position 80 (Arg) is located within the propeptide sequence and is under positive selection. Previous studies indicate that the propeptide in MPO plays a key role in the processing and sorting of human MPO. Position 568 is under positive selection and is next to the polymorphic site R569W; mutations in position 569 have been shown to suppress posttranslational processing in MPO. The 2 positions with strongest support, PP > 0.95, are Asn496 and Leu504, which are separated by 8 amino acid residues on the MPO heavy chain. These 2 positions along with Tyr500 are in close proximity to the proximal heme ligand in MPO, His502. Position 259 (Leu) is located between two important distal residues, Gln257 and His261, involved in the formation of hydrogen bonds. His261 has an important role in the formation of compound I, a redox intermediate of the peroxidase cycle. A further four sites (Leu630, Gln633, Glu652 (primates Lys652) and Asn654 (primates Lys654)) were identified as positively selected, PP > 0.70; these are located within a disulfide bond linking helices 19 and 22 on the MPO heavy chain. Disulfide bonds are associated with the folding and stability of proteins and as such are significant to the overall function of that protein.
For the EPO clade, 28 sites are positively selected, PP > 0.50. We have found functional information for 15 of these sites. One of these, Asp71, is located in the EPO propeptide. The inferred phylogeny, shown in Figure 3a, suggests that MPO and EPO are closely related enzymes, so the EPO propeptide may also be crucial for the function of EPO. The region separating the catalytic residues Arg377 and His474 contains 8 positively selected sites (PP > 0.50). Arg377 is the conserved prominent distal amino acid associated with hydrogen bond formation. The proximal heme ligands His474 (EPO), His502 (MPO) and His468 (LPO) are conserved in all the MHPs [3, 25]. Six of the 28 positively selected sites, Arg584, Gln588, Arg591, Ala618, Gly626 and Ala627, are located on the EPO heavy chain within a single disulfide bond region, which suggests that they are structurally and functionally important to EPO. Position 441 has been identified as under positive selection; this residue has also been noted as being polymorphic in the human population (Lys/Thr).
There are 18 positively selected sites for the LPO group (PP > 0.95). We have found functional information on 13 of these sites. Residues Glu72, Asn87 and Trp91 are found in the LPO propeptide sequence and have a probability of greater than 0.95 of being positively selected. Residues Asn255, Phe282, Ser312, Ser352 and Glu355 are all located in the disulfide bond region (PP > 0.95). From biochemical analysis both Arg372 (Arg377 in EPO) and His468 are believed to have catalytic properties, and are conserved in the MHPs [3, 25]. We find positive selection in His376 (PP > 0.99) just four amino acids downstream of the first of these catalytic residues (Arg372); interestingly, this site is specific to the primate lineage. We have also detected positive selection in Glu470 (PP > 0.98), close to the second catalytic site (His468), in Asp700, which is a known genetic variant, and in Glu240 and Gln245, which lie on either side of a known human polymorphism, A244T.
With the TPO clade treated as foreground, 8 sites are positively selected, PP > 0.95. Of these 8 sites, 6 are missing in the alternatively spliced TPO isoform 5, which exhibits incorrect protein folding. Asp228 (PP > 0.95), Ala232 and Ala242 (both PP > 0.50) are in the region of the TPO active site His239. Glu378 has also been identified as a novel mutational site (E378K) associated with the common inherited deficiency total iodide organification defect (TIOD) and is under positive selection in our analysis.
Summary of results of analysis using DIVERGE software.
Summary of results from SwissModel analysis of positively selected sites.
Effect on Hydrogen Bond
The MHPs are a functionally diverse family of enzymes which are implicated in a variety of inflammatory and neurodegenerative diseases such as asthma and AD respectively. In this study the evolutionary history of the four major groups of MHPs; MPO, EPO, LPO and TPO, was investigated allowing for the analysis of their functional diversity.
Initial ML and Bayesian phylogenies estimated here for the MHPs support previous biochemical studies [3, 8, 9]. From Figure 3 the order of gene duplication events can be traced, with an MPO-EPO-LPO MRCA arising from a gene duplication with extant TPO; then a further duplication event that gave rise to (i) the MPO-EPO MRCA, and (ii) the lineage leading to extant LPO; and the final and most recent duplication of the MPO-EPO MRCA into extant MPO and EPO clades. PXDN is the outgroup to the MHP sequences and was included in the analysis to illustrate that TPO is the most ancestral MHP (Figure 2a). However, the species relationships estimated within these clearly defined clades were in disagreement with the previously resolved mammalian phylogeny.
Including all sites of the alignment in the analysis, we have shown that the major types of MHP form monophyletic clades and are therefore the result of gene duplication events prior to speciation of modern day mammals, see Figure 2(a). However, also evident from Figure 2(a), species with more similar generation times are clustered together, with species of shorter generation times and therefore more rapid rates of mutation assuming a basal position in the phylogeny. This observed branching pattern could be a result of LBA, incorrect ortholog prediction or hidden paralogy.
If a phylogeny is seen to approach the ideal by removing the most rapidly evolving sites, then we propose that LBA is most likely to have contributed to the misleading phylogeny. To test for the presence of LBA we calculated 8 categories of rates of evolution for all sites, from the most rapidly evolving to the most slowly evolving. We observed that the sequential removal of rapidly evolving categories of sites from the alignment decreased the difference, in terms of nodal distance RMSD, between the phylogeny produced and the ideal phylogeny. This occurred only for removal of the 4 fastest evolving categories of site from the alignment. Further removal after this point resulted in increased RMSD values between the phylogeny produced and the ideal. The MHP phylogeny shown in Figure 3(a) is therefore the one that retains the maximum number of sites with the minimum amount of noise. We propose that a possible reason for the presence of LBA in this dataset is the presence of taxa with vastly different generation times. The rodentia have previously been shown to be "fast evolving" due to their short germ-line generation time, whereas species such as dogs and humans have longer germ-line generation times [27, 28, 50]. In any given dataset there are sites that are variable and sites that are invariable; this pattern is conserved across homologous sequences. In a dataset with a mixture of germ-line generation times, the mutation rate in the species with shorter germ-line generation times will be higher, because the number of cell divisions per unit time is greater. Therefore the number of mutations in the variable regions will increase for these species. The result is an LBA effect derived from having a mixture of long and short germ-line generation times in the dataset, where the species with a short germ-line generation time assumes a basal position in the phylogeny [26–28].
A number of approaches have been explored to deal systematically with fast evolving taxa. The most popular include (1) reconstructing the phylogeny based on slowly evolving sites (applied here), (2) increasing the sample size, on the assumption that this increases the number of slowly evolving positions, (3) decreasing the distance to the outgroup, and (4) using more accurate models of sequence change such as covarion derivatives.
Our gene tree – species tree reconciliation analysis has verified the duplication pattern amongst the MHPs. However, we believe that current methods of reconciliation such as the one used here may be biased towards inferring excess gene duplication and differential loss events, as is the case here. The method only considers the topology and not the corresponding alignment or any rate heterogeneity that may exist. We would also like to highlight that the variation of the "Slow-Fast" method employed here is an approximate method for a complex evolutionary dynamic and is not without its limitations.
Using this fully resolved phylogeny, positively selected sites unique to each of the four MHPs (MPO, EPO, LPO and TPO) have been identified through the use of Bayesian estimation. The majority of these sites are in close proximity to catalytically important residues, suggesting that they may be linked to functional shifts across the MHPs. The conserved proximal histidines in close proximity to sites under positive selection in MPO, EPO and LPO are crucial in preserving the redox properties of the heme iron for catalysis. The conserved distal histidines, also shown here to be in the vicinity of positively selected sites, act as both proton acceptors and donors to oxygen during the formation of compound I, which is an integral step in the peroxidase pathway. A number of sites identified as under positive selection are located in disulphide bond regions, which are believed to be crucial to the structure and function of a protein. Disruption of such regions can be detrimental to enzymatic stability and activity [43, 52]. In particular, six sites pertaining to the LPO family are linked to the same disulphide bond. This strongly suggests that these sites are associated with the unique function of LPO as they are not present in the two closely related families MPO and EPO. In the TPO analysis the majority of the sites with highest probability of being positively selected are located in exon 8 of the protein. Deletion of exon 8 results in misfolding of the TPO protein. Exon 8 is also believed to be part of TPO's catalytic centre (exons 8, 9 and 10). TPO functional defects are strongly associated with TIOD and several deleterious mutations within this catalytic region have been reported [44, 53–55]. We also find that one of our positively selected sites in TPO is associated directly with an inherited deficiency disorder.
Our detailed in silico site directed mutagenesis of the positively selected sites in MPO has shown that mutating these positions from their positively selected amino acid state to an alternative ancestral state results in loss/gain of hydrogen bonds between alternative amino acid positions for other sites in particular in the heme binding region of the MPO structure. The sites we have identified as positively selected in the MHPs have played a major role in the functioning of these enzymes as evidenced by mutational studies, proximity to active sites and catalytic residues, and inherited disorders.
The results of this study (i) show for the first time from molecular sequence data how this medically important group of enzymes are related to each other, and (ii) suggest that following gene duplication, positive selection has led to the functional diversity observed for the MHPs.
Representative mammalian heme peroxidase sequences used in this study.
Entry ID (Name)*/Gene ID
Each protein coding sequence in the MHP dataset was translated to amino acid using in-house translation software. This protein sequence dataset and the two PXDN sequences were combined to give a dataset of 33 sequences (complete dataset). Both MHP and "complete" datasets were aligned in ClustalW 1.8 independently using default parameter settings. The corresponding nucleotide sequences for the MHP dataset were aligned with respect to the amino acid MSA with the use of in-house software to insert gaps in the protein coding sequence according to their positions in the amino acid alignment. The nucleotide and subsequent protein MSAs were manually edited by removing ambiguous regions from the alignment using the sequence alignment editor, Se-Al 2.0a11. The PXDN sequences served as an outgroup for the MHPs and therefore aided in determining the earliest diverging MHP.
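The in-house software used to thread the coding sequences onto the amino acid alignment is not described further. As an illustration of the general idea only, the following minimal sketch inserts a gap triplet into an unaligned coding sequence wherever the corresponding protein alignment row contains a gap. The function name `thread_codons` is hypothetical, and the sketch assumes that the coding sequence is in frame, contains no terminal stop codon and encodes exactly the residues in the protein row.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Insert gap triplets into an unaligned coding sequence so that it
    follows the gap pattern of its aligned protein row."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of three")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(codons) != n_residues:
        raise ValueError("coding sequence does not match the number of residues")
    remaining = iter(codons)
    # Each residue consumes the next codon; each protein gap becomes three nucleotide gaps.
    return "".join(gap * 3 if aa == gap else next(remaining) for aa in aligned_protein)


# Toy example: the protein row M-KL is threaded onto the codons ATG AAA CTG.
print(thread_codons("M-KL", "ATGAAACTG"))
```

Keeping the nucleotide alignment in register with the protein alignment in this way preserves the reading frame, which the codon models used later require.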
The phylogenetic tree for the dataset was estimated using Bayesian statistics implemented in MrBayes 3.1.2. The model of amino acid substitution used was JTT because, following model testing using MultiPhyl, this was the model that best fit the data. Using 4 Markov chains for 400,000 generations, trees were sampled every 10 generations with the first 20,000 sampled trees discarded as burn-in. The remaining tree samples were summarized on a majority rule consensus tree with clade supports given as Posterior Probabilities (PPs). ML trees were also inferred using the high-throughput phylogenomics webserver, MultiPhyl. The ML tree was generated using the nearest neighbour interchange (NNI) tree search algorithm and 100 bootstrap replicates implemented in MultiPhyl. Under the Akaike Information Criterion (AIC) statistic, the selected substitution model was JTT with invariable sites and a discrete gamma model of rate heterogeneity. This was repeated a total of 10 times to generate 1000 bootstrap replicates. (The Bayesian tree reconstruction methods were applied to the MHP dataset only.)
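The sampling settings above imply the following tree counts. This is simple arithmetic on the reported settings, ignoring any additional sample that the software may record at generation zero; the variable names are ours.

```python
generations = 400_000    # length of the MCMC run reported above
sample_every = 10        # sampling interval in generations
burnin_trees = 20_000    # sampled trees discarded as burn-in

sampled_trees = generations // sample_every      # trees written during the run
retained_trees = sampled_trees - burnin_trees    # trees summarised in the consensus
burnin_fraction = burnin_trees / sampled_trees   # share of the sample discarded

bootstrap_runs = 10      # repeats of the ML bootstrap analysis
replicates_per_run = 100
bootstrap_total = bootstrap_runs * replicates_per_run

print(sampled_trees, retained_trees, burnin_fraction, bootstrap_total)
```

On these figures, half of the sampled trees are discarded and the consensus is built from the remaining half.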
The resulting phylogenies from both analyses (MrBayes and MultiPhyl) were then analysed for signatures of LBA. The rate of evolution at each site in the alignment was placed into one of 8 categories, 8 being the most rapidly evolving and 1 being the most conserved, using the maximum likelihood approach implemented in TreePuzzle 5.1. Sites were progressively removed from the protein MSA according to their evolutionary rate and the resultant trees were analysed for changes in topology.
Nine separate site-stripped alignments were constructed by successive removal of the most rapidly evolving sites. The aforementioned Bayesian method was used to infer phylogenetic relationships for each of the nine alignments generated. The ML phylogeny was also estimated for each of the site-stripped alignments from the model of best-fit following hierarchical likelihood ratio tests (hLRTs) of alternative models implemented in MultiPhyl.
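The site-stripping step can be sketched as follows. This is a minimal illustration and not the code used in this study: the function name, the dictionary representation of the alignment and the toy data are all assumptions. Each alignment column carries a rate category from 1 (most conserved) to 8 (most rapidly evolving), and columns above a chosen category are removed.

```python
from typing import Dict, List


def strip_fast_sites(alignment: Dict[str, str],
                     categories: List[int],
                     max_category: int) -> Dict[str, str]:
    """Keep only alignment columns whose rate category is <= max_category."""
    if any(len(seq) != len(categories) for seq in alignment.values()):
        raise ValueError("every sequence needs one rate category per column")
    keep = [i for i, cat in enumerate(categories) if cat <= max_category]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy alignment with hypothetical rate categories (one per column).
toy_alignment = {"human": "MKLVA", "mouse": "MRLIA"}
toy_categories = [1, 8, 2, 7, 1]

# Remove category 8 only, then categories 8 and 7.
print(strip_fast_sites(toy_alignment, toy_categories, max_category=7))
print(strip_fast_sites(toy_alignment, toy_categories, max_category=6))
```

Lowering `max_category` one step at a time yields the series of progressively stripped alignments from which trees are then inferred.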
The pruned nodal distance method implemented in TOPD/FMTS v3.3 was used to calculate the distance between each of the site-stripped trees and the ideal tree. The ideal tree was generated by pruning the resolved mammalian phylogeny to represent those taxa present. A distance matrix is calculated for both the site-stripped phylogeny and the ideal phylogeny by counting the number of nodes that separate every taxon from every other taxon on the tree. Using the root mean squared deviation (RMSD) implemented in the TOPD/FMTS v3.3 software package, the RMSD between the site-stripped phylogeny matrix and the ideal phylogeny matrix is calculated. An RMSD value of zero indicates that the two trees being compared are identical.
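The nodal distance comparison can be illustrated with a short sketch. It is not the TOPD/FMTS implementation, and its conventions are assumptions: trees are rooted nested tuples, the distance between two taxa is the number of internal nodes on the path joining them (the root included), and the RMSD is taken over all taxon pairs. The toy trees are hypothetical.

```python
from itertools import combinations
from math import sqrt


def leaf_paths(tree):
    """Map each leaf to the tuple of internal-node ids from the root to its parent."""
    paths, next_id = {}, [0]

    def walk(node, prefix):
        if isinstance(node, str):
            paths[node] = prefix
            return
        node_id = next_id[0]
        next_id[0] += 1
        for child in node:
            walk(child, prefix + (node_id,))

    walk(tree, ())
    return paths


def nodal_distances(tree):
    """Number of internal nodes separating every pair of taxa."""
    paths, dist = leaf_paths(tree), {}
    for a, b in combinations(sorted(paths), 2):
        pa, pb = paths[a], paths[b]
        shared = 0
        for x, y in zip(pa, pb):
            if x != y:
                break
            shared += 1
        # Nodes below the common ancestor on each side, plus the ancestor itself.
        dist[(a, b)] = (len(pa) - shared) + (len(pb) - shared) + 1
    return dist


def nodal_rmsd(tree1, tree2):
    """Root mean squared deviation between two nodal distance matrices."""
    d1, d2 = nodal_distances(tree1), nodal_distances(tree2)
    if d1.keys() != d2.keys():
        raise ValueError("trees must contain the same taxa")
    return sqrt(sum((d1[k] - d2[k]) ** 2 for k in d1) / len(d1))


# Hypothetical four-taxon trees: an "ideal" tree and one with mouse displaced.
ideal = ((("human", "mouse"), "dog"), "opossum")
displaced = ((("human", "dog"), "mouse"), "opossum")
print(nodal_rmsd(ideal, ideal), nodal_rmsd(ideal, displaced))
```

Under these conventions identical trees give an RMSD of zero, and the value grows as taxa move further from their positions in the ideal tree.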
Following nodal distance analysis, the gene phylogeny with the lowest RMSD value (for the MHP sequences alone), and the species tree were examined for gene duplication and loss events using the default settings for gene tree – species tree reconciliation implemented in GeneTree 1.3.0.
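To make the reconciliation step concrete, the sketch below counts duplications using the standard last-common-ancestor mapping: each gene tree node is mapped to the smallest species tree clade that contains all of its species, and a node is scored as a duplication when it maps to the same clade as one of its children. This is a generic illustration under our own assumptions (nested-tuple trees, a hypothetical `species_of` lookup, toy data); it does not reproduce GeneTree's implementation and it does not count losses.

```python
def species_clades(tree):
    """Return the leaf set of every node in a nested-tuple species tree."""
    clades = []

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        clades.append(leaves)
        return leaves

    walk(tree)
    return clades


def lca(clades, species):
    """Smallest species tree clade containing all the given species."""
    return min((c for c in clades if species <= c), key=len)


def count_duplications(gene_tree, species_tree, species_of):
    """Count gene tree nodes that map to the same species node as a child."""
    clades = species_clades(species_tree)
    duplications = 0

    def walk(node):
        nonlocal duplications
        if isinstance(node, str):
            return lca(clades, frozenset([species_of[node]]))
        child_maps = [walk(child) for child in node]
        here = lca(clades, frozenset().union(*child_maps))
        if any(m == here for m in child_maps):
            duplications += 1
        return here

    walk(gene_tree)
    return duplications


# Toy data: two paralogous clades, each present in three species.
species_tree = (("human", "mouse"), "cow")
gene_tree = ((("LPO_human", "LPO_mouse"), "LPO_cow"),
             (("MPO_human", "MPO_mouse"), "MPO_cow"))
species_of = {gene: gene.split("_")[1] for gene in
              ["LPO_human", "LPO_mouse", "LPO_cow", "MPO_human", "MPO_mouse", "MPO_cow"]}
print(count_duplications(gene_tree, species_tree, species_of))
```

Because the mapping depends only on topology, a misplaced taxon in the gene tree is read as an extra duplication followed by losses, which is the bias noted in the text.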
Analysis of variation in selective pressure following gene duplication in the MHPs was carried out using codon substitution models implemented in PAML 3.15. Both site-specific and branch-site specific models were applied. The models used for this analysis allow for heterogeneous nonsynonymous-to-synonymous rate ratios (ω = Dn/Ds) across sites and amongst branches/lineages.
An ω-value > 1 indicates positive selection, ω < 1 indicates purifying selection, and ω = 1 indicates neutral evolution. The statistically significant model for the data was selected using a series of LRTs to compare models and their more parameter rich extensions. Significance was assessed using χ2 tests. The comparisons performed were: M0 (one ratio) with M3(k = 2)(discrete), M1(neutral) with M2(selection), M3(k = 2) with M3(k = 3) discrete models, M7 (beta) with M8 (beta & omega > 1), M8 (beta & omega > 1) with the null hypothesis M8a (beta & omega = 1), M1 with model A (branch-site) and finally M3(k = 2) with model B (branch-site). The models and approach taken here have been described previously [39, 63].
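A likelihood ratio test of this kind can be sketched as follows. The test statistic is twice the difference in log-likelihood between the alternative and null models, compared against a χ2 distribution whose degrees of freedom equal the difference in the number of free parameters. The function names and the log-likelihood values in the example are hypothetical; the degrees of freedom for each comparison in this study are not restated here and would need to be taken from the model definitions.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    # Closed forms for df = 1 and df = 2, then step up two degrees at a time.
    if df % 2 == 1:
        q, k = erfc(sqrt(x / 2.0)), 1
    else:
        q, k = exp(-x / 2.0), 2
    while k < df:
        q += (x / 2.0) ** (k / 2.0) * exp(-x / 2.0) / gamma(k / 2.0 + 1.0)
        k += 2
    return q


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return the LRT statistic 2 * (lnL_alt - lnL_null) and its chi-squared p-value."""
    statistic = 2.0 * (lnl_alt - lnl_null)
    return statistic, chi2_sf(statistic, df)


# Hypothetical log-likelihoods for a null model and its extension (df = 2).
statistic, p_value = likelihood_ratio_test(lnl_null=-1000.0, lnl_alt=-990.0, df=2)
print(statistic, p_value)
```

The alternative model is preferred only when the p-value falls below the chosen significance level; otherwise the simpler null model is retained.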
Using the MHP gene phylogeny with the lowest RMSD value, each of the four MHPs was selected as an independent cluster. Using the MHP protein MSA and this MHP gene phylogeny, the statistical analysis implemented in the software DIVERGE v 1.04 [66, 46] was used to estimate the coefficient of functional divergence (theta ML or θ) for all pairs of clusters. The clusters used in the analysis were taken from the resolved phylogeny (Figure 3a): (1) MPO Cluster, (2) EPO Cluster, (3) LPO Cluster, and (4) TPO Cluster.
Homology modeling was performed using the human representative sequence for the MPO family and the first approach mode implemented by the homology-modeling server, SWISS-MODEL. The structure was modeled using the crystal structure of bromide-bound human MPO isoform C (PDB accession code 1d2vC). The positively selected sites identified from the PAML 3.15 (Yang 1997) analysis were highlighted (in gold) on the 3D structure generated using DeepView v3.7. The conserved proximal heme ligand (His 502) was also highlighted (in blue) on the 3D model. In silico mutational analysis of these positively selected sites was carried out and their subsequent effect on hydrogen bonding was assessed using DeepView v3.7.
AIC: Akaike Information Criterion
BEB: Bayes Empirical Bayes
Dn: nonsynonymous substitutions per nonsynonymous site
Ds: synonymous substitutions per synonymous site
hLRT: hierarchical Likelihood Ratio Test
JTT: Jones, Taylor and Thornton
LBA: Long Branch Attraction
LRT: Likelihood Ratio Test
MHP: Mammalian Heme Peroxidase
MRCA: Most Recent Common Ancestor
MSA: Multiple Sequence Alignment
NEB: Naïve Empirical Bayes
NNI: Nearest Neighbour Interchange
PDB: Protein Data Bank
RMSD: Root Mean Squared Deviation
TIOD: Total Iodide Organification Defect
We would like to thank the Irish Research Council for Science, Engineering and Technology (Embark Initiative Postgraduate Scholarship to NBL) for financial support. We would like to thank the SFI/HEA Irish Centre for High-End Computing (ICHEC) for processor time and technical support for both phylogeny reconstruction and selection analysis. We would like to acknowledge Dr James McInerney's research group at the Bioinformatics Laboratory, NUI Maynooth for the use of their computational facilities. We would like to thank Dr Christopher Creevey, European Molecular Biology Laboratory, Heidelberg, Germany for generously supplying us with the necessary computer code for conducting our site-stripping analysis.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.