Scaling properties of protein family phylogenies
© Herrada et al; licensee BioMed Central Ltd. 2011
Received: 14 March 2011
Accepted: 6 June 2011
Published: 6 June 2011
One of the classical questions in evolutionary biology is how evolutionary processes are coupled at the gene and species level. With this motivation, we compare the topological properties (mainly the depth scaling, as a characterization of balance) of a large set of protein phylogenies with those of a set of species phylogenies.
The comparative analysis between protein and species phylogenies shows that both sets of phylogenies share a remarkably similar scaling behavior, suggesting the universality of branching rules and of the evolutionary processes that drive biological diversification from gene to species level. In order to explain such generality, we propose a simple model which allows us to estimate the proportion of evolvability/robustness needed to approximate the scaling behavior observed in the phylogenies, highlighting the relevance of the robustness of a biological system (species or protein) in the scaling properties of the phylogenetic trees.
The invariance of the scaling properties at levels spanning from genes to species suggests that rules that govern the incapability of a biological system to diversify are equally relevant both at the gene and at the species level.
During the last century, an important effort has been devoted to the understanding of diversification patterns and processes in terms of branching evolutionary trees [1–7]. The tempo and mode of genetic change, and their connections with the tempo and mode of speciation, are important issues in this context. In that sense, we address the question of whether similar forces act across gene-level and species-level evolution [8–10], through a comparative analysis of the topological behavior of protein and species phylogenies.
Previous analyses of the topological properties of phylogenies have revealed universal patterns of phylogenetic differentiation [3, 6, 7, 11, 12]. This means that the impact of the evolutionary forces shaping the diversity of life on Earth on the shape of phylogenetic trees is, at least to the level of detail captured by the descriptors used, similar across a broad range of scales, from macro-evolution to speciation and population differentiation, and across diverse organisms such as eukaryotes, eubacteria, archaea or viruses. This, together with the fact that evolutionary forces work at the molecular level, motivates the study of the topology of evolutionary relationships among molecular entities, looking for patterns of differentiation at the molecular level and thereby extending the examination of the universality of the scaling of branching laws in phylogenies all the way from molecular- to macro-evolution.
A protein family phylogeny is represented as a tree, i.e., as an acyclic graph of nodes connected by branches (links), where each node represents a diversification event. For each node in a phylogeny, a subtree (or subfamily) S is made of the root at the selected node and all of its descendant nodes. The subtree size A is the total number of subfamily members that diversify from the root (including itself). The characterization of how protein diversity is arranged through the phylogenies can be achieved in a variety of ways [19–26]. We focus here on the mean depth, d, of the subtree S (see Methods) [6, 27, 28], defined as $d = \frac{1}{A}\sum_{j \in S} d_{\mathrm{root},j}$, where, for a given node j, $d_{\mathrm{root},j}$ is its topological distance to the root of the subtree S, that is, the number of nodes one has to go through so as to go from that node to the root (including the root in the counting), and the sum is over all nodes in the subtree S. Note that we use here the mean depth over all subtree nodes and not just the leaves, which gives a different but related measure [4, 29, 30]. In the remainder, when no subindex is indicated, we understand that mean depth and other quantities refer to a whole tree or a subtree depending on the context.
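The definitions above can be made concrete with a short sketch. It assumes a tree stored as a dictionary of child lists (the names `children` and `subtree_stats` are illustrative, not taken from the paper) and returns the subtree size A, the mean depth d, and a cumulative size C, taken here to be the sum of subtree sizes over all nodes, an assumption that is consistent with the relation d = C/A - 1 quoted from Methods.

```python
from collections import deque


def subtree_stats(children, root):
    """Return (A, C, d) for the subtree rooted at `root`.

    children: dict mapping a node to the list of its child nodes
              (leaves may be missing from the dict or map to []).
    A: number of nodes in the subtree, root included.
    C: cumulative size, assumed here to be the sum of subtree sizes
       over all nodes of the subtree.
    d: mean topological distance from the subtree nodes to `root`.
    """
    size = 0
    depth_sum = 0
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        size += 1
        depth_sum += depth
        for child in children.get(node, []):
            queue.append((child, depth + 1))
    # Every node belongs to its own subtree and to that of each ancestor,
    # so the sum of subtree sizes equals (sum of depths) + (number of nodes).
    cumulative = depth_sum + size
    return size, cumulative, depth_sum / size


if __name__ == "__main__":
    # Hypothetical five-node tree: r -> (a, b), a -> (c, d)
    toy = {"r": ["a", "b"], "a": ["c", "d"]}
    print(subtree_stats(toy, "r"))  # (5, 11, 1.2)
    print(subtree_stats(toy, "a"))  # subtree rooted at an internal node
```

Calling the function on every node of a phylogeny gives the pairs (A, d) whose dependence d(A) is discussed in the text.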
How the shape of a phylogenetic tree, i.e., the distribution of protein diversification, changes with tree size, i.e., with the number of proteins it contains, can be analyzed by examining the dependence of the mean depth on subfamily size, d = d(A). This gives information on the balance characteristics of the tree. To illustrate this, additional file 1 shows the analysis of A and d for a fully balanced and a fully imbalanced 15-tip tree, as well as for a 15-tip subtree of a real phylogenetic tree. For a given tree size, the smallest value of the mean depth corresponds to the fully polytomic tree, in which every node other than the root hangs directly from the root. The mean depth d as a function of tree size A is given in this case by $d = (A-1)/A = 1 - 1/A$, which remains below 1 at all sizes. Among binary trees, the smallest mean depth is that of the fully balanced tree, for which, at sizes $A = 2^{n+1} - 1$, the definition above gives $d = (1 + 1/A)\log_2(A+1) - 2$.
For the fully balanced binary tree, the leading contribution at large sizes is logarithmic: d ~ ln A. This logarithmic scaling is not exclusive to fully balanced trees; it is also the behavior of the Equal-Rates Markov (ERM) model [28, 31, 32], the natural null model for stochastic tree construction, in which, at each time step, one of the existing leaves of the tree is chosen at random and bifurcated into two new leaves.
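The ERM growth rule described above can be sketched in a few lines. The function name `erm_mean_depth`, the replica count and the tree sizes are illustrative choices, not values from the paper; the sketch only tracks node depths, which is all the whole-tree mean depth requires.

```python
import math
import random


def erm_mean_depth(n_leaves, rng):
    """Grow one ERM tree up to n_leaves leaves; return (A, d) for the whole tree."""
    leaf_depths = [0]        # depths of the current leaves; the root starts alone
    internal_depth_sum = 0   # summed depths of nodes that have already bifurcated
    while len(leaf_depths) < n_leaves:
        i = rng.randrange(len(leaf_depths))  # choose an existing leaf uniformly
        k = leaf_depths[i]
        internal_depth_sum += k              # the chosen leaf becomes an internal node
        leaf_depths[i] = k + 1               # first new leaf
        leaf_depths.append(k + 1)            # second new leaf
    size = 2 * n_leaves - 1                  # nodes of a binary tree with n_leaves tips
    return size, (internal_depth_sum + sum(leaf_depths)) / size


if __name__ == "__main__":
    rng = random.Random(2011)
    replicas = 50
    for n_leaves in (16, 64, 256, 1024):
        mean_d = sum(erm_mean_depth(n_leaves, rng)[1] for _ in range(replicas)) / replicas
        size = 2 * n_leaves - 1
        # For an ERM tree the ratio d / ln A is expected to vary slowly with size.
        print(size, round(mean_d, 3), round(mean_d / math.log(size), 3))
```

Averaging over replicas and plotting d against ln A is one way to visualize the logarithmic scaling; the analysis in the paper is carried out over subtrees rather than whole trees only.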
We report here the patterns of mean depth for protein families, and compare the branching patterns derived for protein families from the PANDIT database with those of species phylogenies reported previously from the TreeBASE database. This comparison shows that branching patterns are mostly preserved across evolutionary scales spanning from genes to species.
Protein phylogenies depth scaling
The depth scaling behavior shared by protein and species phylogenies can be explained by different branching mechanisms. During the last decade, several models have been published proposing different mechanisms to capture the topology of phylogenetic trees [6, 27, 28, 33, 35, 36]. Most of the proposed models yield a logarithmic scaling of the mean depth, i.e., ERM-type behavior for large sizes [31, 32, 37], which is not a good description of our data (see Figure 2 and additional file 2), at least at the tree sizes available. The AB model proposed in Ref.  is one of the few models that deviate from the ERM-like scaling, leading to a squared-logarithmic behavior d ~ (ln A)^2 (see also ). Models with power-law scaling of the mean depth, d ~ A^η, have also been defined in terms of statistical rules assigning probabilities to different splittings or types of trees , or in terms of (simplified) evolutionary events (in the sense specified in Ref. ) occurring in time [27, 28].
An alternative explanation of the scaling properties of the phylogenetic trees suggests that the non-ERM behavior is a small-size transient, which would cross over to the ERM scaling d ~ ln A as larger tree sizes become available.
The process conducive to trees that deviate from ERM behavior is the presence of temporal correlations, which lead to asymptotic or just finite-size deviations with respect to the ERM behavior depending on whether these correlations are permanent or restricted to finite but large times. We thus explored the role of such correlations through a simple model based on the heritability of evolvability, i.e., the ability to evolve [38, 39], as a biological characteristic which is itself inherited by sister species in speciation events. The process starts with the root, which we consider capable of speciating. At each time step, all present species capable of speciating branch simultaneously. Each branching event yields two new daughter species, for which we allow two possible outcomes:
- with probability p, both new species inherit the evolvability of the mother species, i.e., they have the same capacity as the mother species to speciate again;
- with probability 1 - p, one of the daughter species is unable to speciate again, that is, only one of the two daughter species preserves the ability to evolve.
Stemming from the definition of robustness as the property of a system to remain invariant in the presence of genetic or environmental perturbations, we consider a species' inability to speciate to be its robustness.
The trees generated with this algorithm yield a scaling very close to that observed for phylogenetic trees in both PANDIT and TreeBASE for p = 0.24 (see Figure 6 and additional file 3). This result identifies the prevalence of imbalanced branching events (occurring with frequency 1 - p = 0.76) relative to balanced ones (p = 0.24), which is consistent with earlier reports [5, 6, 33].
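The branching rules of the model can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `simulate_evolvability_tree` and the number of steps are hypothetical, and only whole-tree values of A and d are returned, whereas the paper analyzes the scaling over subtrees. Because all evolvable species branch simultaneously, every evolvable species at step t sits at depth t, so counting them is enough.

```python
import random


def simulate_evolvability_tree(p, n_steps, rng):
    """Run n_steps synchronous branching rounds; return (A, d) for the whole tree.

    p: probability that a branching event is symmetric, i.e., that both
       daughters keep the ability to speciate; otherwise only one does.
    """
    evolvable = 1   # species currently able to speciate (the root at the start)
    size = 1        # total number of nodes in the tree
    depth_sum = 0   # sum of node depths, with the root at depth 0
    for step in range(1, n_steps + 1):
        # Every evolvable species produces two daughters at depth `step`.
        size += 2 * evolvable
        depth_sum += 2 * evolvable * step
        # One daughter always stays evolvable; the second one does so with probability p.
        symmetric_events = sum(rng.random() < p for _ in range(evolvable))
        evolvable += symmetric_events
    return size, depth_sum / size


if __name__ == "__main__":
    rng = random.Random(42)
    for n_steps in (5, 10, 20, 30):
        size, mean_depth = simulate_evolvability_tree(0.24, n_steps, rng)
        print(n_steps, size, round(mean_depth, 3))
```

The two limits give a quick sanity check: p = 1 produces the fully balanced binary tree and p = 0 the fully imbalanced (caterpillar) tree.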
At large n, the leading contributions are A ~ z^n and C ~ n z^n (we do not write explicitly the prefactors, which may depend on z but not on n). Taking into account Eq. (7) in Methods (i.e., d = (C/A) - 1) and inverting the relationship between A and n (n ~ ln A), we obtain that for large sizes the leading order of the mean depth is d ~ ln A, which indicates that what we observe in the simulations is a long transient behavior. As a consequence, our model fits the behavior of the data at the sizes present in the databases, but the asymptotic scaling at larger sizes will eventually be d ~ ln A, as in the ERM.
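The asymptotic step can be written out explicitly. The lines below only combine the leading-order expressions quoted above; reading n as the number of synchronous branching steps and z as the per-step growth factor is an assumption, since their definitions are not reproduced in this section.

```latex
\begin{align}
  d &= \frac{C}{A} - 1 \;\sim\; \frac{n\,z^{n}}{z^{n}} \;\sim\; n, \\
  A &\sim z^{n} \;\Longrightarrow\; n \sim \frac{\ln A}{\ln z}, \\
  d &\sim \frac{\ln A}{\ln z} \;\sim\; \ln A .
\end{align}
```

The factor 1/ln z is a size-independent prefactor, so it does not change the logarithmic form of the asymptotic scaling.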
The development of high-throughput "-omics" has provided the data required to address the traditional debate on how gene-level evolution shapes species-level evolution [8–10]. This debate connects with that on the (dis)continuity between micro- and macro-evolution, and on gradualism versus saltationism [41–43]. In the context of these debates, the universal scaling of phylogenetic trees at intra- and inter-specific levels shown earlier suggested the conservation of the evolutionary processes that drive biological diversification across the entire history of life. Here we extend this observation to show that the universality of the scaling properties also holds at the gene level. The results presented here show that the branching and scaling patterns in protein families do not differ significantly from the patterns observed in species phylogenies, at least for the topological properties we have calculated. We do not observe any discrepancy between the shape of protein phylogenies and species phylogenies. Moreover, the results show no evidence of differences in phylogenetic trees among protein families with different biological functions, providing further evidence of universal, conserved evolutionary processes from genes to species.
In 2006, Cotton and Page published a comparative analysis between human gene phylogenies and species phylogenies. They found quantitative differences between human paralogous gene and orthologous gene phylogenies. Their research focused on the comparison between (small) paralogous and orthologous gene families, while here we have analyzed complete protein families, which included both paralogous and orthologous protein members, focusing on the comparison between protein and species phylogenies. Our approach is based on a scaling analysis, examining how variables change with tree size, whereas Cotton and Page's approach is based on a quantitative analysis of small sizes. This suggests that, despite their finding of quantitative differences between paralogous and orthologous gene phylogenies, both kinds of phylogenies would display a scaling behavior similar to the one described here for complete protein phylogenies and organism phylogenies.
Different evolutionary models and mechanisms have been proposed to explain the branching patterns arising in evolution [6, 27, 28, 33, 36, 37]. Here we have introduced a simple model accounting for differences in the degree of evolvability, which is emerging as a key trait, as important as robustness in evolution [44–47]. The model we propose can be interpreted in the framework of the balance between evolvability, as the potential of a biological system for future adaptive mutation and evolution, and robustness, as the property of a system to produce relatively invariant output in the presence of a perturbation. Indeed, the symmetric diversification event should correspond to the biological context in which the biological system is evolvable, while the asymmetric diversification process should correspond to a biological context where the new biological system, which has just appeared from the diversification process, is robust and incapable of unlimited diversification.
The asymptotic behavior of our model at large tree sizes recovers the logarithmic behavior of the ERM scaling, so that, as in the models by , the non-ERM behavior occurs as a transient for the relatively small tree sizes present in the databases. Despite this, the local (i.e., present at finite sizes) imbalance in real trees can be interpreted in terms of the evolvability concept. The prevalence of unbalanced branching found here is consistent with previous works [6, 33, 48–51], and has traditionally been explained by the presence of variations in the speciation and/or extinction rates throughout the Tree of Life [4, 5].
Different biological explanations for these variations in the speciation and/or extinction rates have been proposed, such as refractory periods, mass extinctions, specialization or environmental effects. The consideration of an evolutionary scenario based on the evolvability/robustness interplay has led us to postulate that the depth scaling reflects asymmetric diversification events, in which the evolutionary process gives rise to a new biological system that is unable to undergo a new diversification event. Such an incapability to diversify may occur at different levels of evolution. It can be found at the macroevolutionary level, with taxa that require very long refractory periods or with random massive extinctions of taxa, as well as at the microevolutionary or gene level, where the elements unable to diversify are individuals from a population or genetic variants from a cell, embryo or individual.
In summary, the finding of universal scaling properties at the gene and species levels, characterized by similar scaling laws, strongly suggests the universality of branching rules, and hence of the evolutionary processes that drive biological diversification across the entire history of life, from genes to species. The topological characterization of phylogenetic trees has proven helpful to analyze the relevance of the robustness of a biological system (species or protein) in the scaling properties of the phylogenetic trees. Thus, the invariance of the scaling properties at levels spanning from genes to species suggests that the mechanisms leading to the incapability of a biological system to diversify for a very long period of time act at both the gene and the species level.
Protein phylogenies database
We analyzed the 7,738 protein families available in the PANDIT database (http://www.ebi.ac.uk/goldman-srv/pandit/, accessed May 27, 2008). PANDIT is based upon Pfam (http://pfam.sanger.ac.uk/), and constitutes a large collection of protein family phylogenies from different signalling pathways, cellular organelles and biological functions, reconstructed with five different methods: NJ, BioNJ, Weighbor, FastME and Phyml. The size of each of the protein phylogenies, T, ranges from 2 to more than 2000 tips (i.e., proteins within families) and, in agreement with previous reports [20, 61–64], shows a power-law distribution P(T) ~ T^(-γ) (see Figure 1). Most of the branching events in these phylogenies are binary; only 22% are polytomies.
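As an illustration of how an exponent γ of a power-law size distribution might be estimated, the sketch below uses the standard continuous maximum-likelihood estimator. This is a swapped-in technique, not necessarily the procedure used for Figure 1; the function name `powerlaw_exponent_mle`, the cutoff `t_min` and the synthetic sample are all hypothetical, and treating integer tree sizes as continuous is an approximation.

```python
import math
import random


def powerlaw_exponent_mle(sizes, t_min=2.0):
    """Continuous maximum-likelihood estimate of gamma in P(T) ~ T^(-gamma), T >= t_min."""
    tail = [t for t in sizes if t >= t_min]
    log_sum = sum(math.log(t / t_min) for t in tail)
    if log_sum <= 0.0:
        raise ValueError("need at least one size strictly above t_min")
    return 1.0 + len(tail) / log_sum


if __name__ == "__main__":
    # Synthetic sizes drawn from a continuous power law by inverse-transform sampling.
    rng = random.Random(7)
    gamma_true = 2.5
    sample = [2.0 * (1.0 - rng.random()) ** (-1.0 / (gamma_true - 1.0))
              for _ in range(5000)]
    print(round(powerlaw_exponent_mle(sample, t_min=2.0), 2))
```

With real data, `sizes` would be the list of tip counts T of the phylogenies, and the choice of `t_min` would need to be justified separately.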
Mean depth
Accounting for the initial condition, that is, the root, with C = 1 and S = 0, yields C = 2S + 1 for binary trees. Thus, at large sizes, both quantities, C and S, become proportional and scale in the same way with size.
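The relation C = 2S + 1 can be checked by a short induction on bifurcations. The derivation below assumes that S denotes Sackin's index (the sum of the depths of the leaves) and that C is the cumulative size (the sum of subtree sizes over all nodes); neither definition is reproduced in this section, so both readings are assumptions.

```latex
% Bifurcating a leaf located at depth k replaces it by two leaves at depth k+1.
\begin{align}
  \Delta S &= 2(k+1) - k = k + 2, \\
  \Delta C &= \underbrace{2}_{\text{two new leaves of size 1}}
            + \underbrace{2(k+1)}_{\text{split node and its } k \text{ ancestors each grow by } 2}
            = 2(k+2) = 2\,\Delta S .
\end{align}
% Starting from the single-node tree with C = 1 and S = 0, induction gives
\begin{equation}
  C = 2S + 1 .
\end{equation}
```

For example, a root with two leaves has S = 2 and C = 3 + 1 + 1 = 5, in agreement with the relation.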
We acknowledge financial support from the European Commission through the NEST-Complexity project EDEN (043251) and from MICINN (Spain) and FEDER through project FISICOS (FIS2007-60327).
- Willis JC: Age and area: a study in geographical distribution and origin of species. 1922, Cambridge: Cambridge University Press.
- Savage HM: The shape of evolution: systematic tree topology. Biol J Linnean Soc. 1983, 20: 225-244. 10.1111/j.1095-8312.1983.tb01874.x.
- Burlando B: The fractal geometry of evolution. J Theor Biol. 1993, 163 (2): 161-172. 10.1006/jtbi.1993.1114.
- Kirkpatrick M, Slatkin M: Searching for Evolutionary Patterns in the Shape of a Phylogenetic Tree. Evolution. 1993, 47: 1171-1181. 10.2307/2409983.
- Mooers AO, Heard SB: Inferring evolutionary process from the phylogenetic tree shape. Q Rev Biol. 1997, 72: 31-54. 10.1086/419657.
- Blum MGB, François O: Which random processes describe the tree of life? A large-scale study of phylogenetic tree imbalance. Syst Biol. 2006, 55 (4): 685-691. 10.1080/10635150600889625.
- Herrada EA, Tessone CJ, Klemm K, Eguíluz VM, Hernández-García E, Duarte CM: Universal Scaling in the Branching of the Tree of Life. PLoS ONE. 2008, 3: e2757. 10.1371/journal.pone.0002757.
- Morris SC: Evolution: bringing molecules into the fold. Cell. 2000, 100: 1-11.
- Carroll SB: Evolution at two levels: on genes and form. PLoS Biol. 2005, 3 (7): e245. 10.1371/journal.pbio.0030245.
- Roth C, Rastogi S, Arvestad L, Dittmar K, Light S, Ekman D, Liberles DA: Evolution after gene duplication: models, mechanisms, sequences, systems, and organisms. J Exp Zool B Mol Dev Evol. 2007, 308: 58-73.
- Dial KP, Marzluff JM: Nonrandom diversification within taxonomic assemblages. Syst Zool. 1989, 38: 26-37. 10.2307/2992433.
- Burlando B: The fractal dimension of taxonomic systems. J Theor Biol. 1990, 146: 99-114. 10.1016/S0022-5193(05)80046-3.
- Dayhoff MO: Atlas of Protein Sequence and Structure. 1965, Washington: National Biomedical Research Foundation.
- Whelan S, de Bakker PIW, Quevillon E, Rodriguez N, Goldman N: PANDIT: an evolution-centric database of protein and associated nucleotide domains with inferred trees. Nucleic Acids Res. 2006, 34 (Database issue): D327-D331.
- Banavar JR, Maritan A, Rinaldo A: Size and form in efficient transportation networks. Nature. 1999, 399: 130-132.
- Garlaschelli D, Caldarelli G, Pietronero L: Universal scaling relations in food webs. Nature. 2003, 423 (6936): 165-168. 10.1038/nature01604.
- Camacho J, Arenas A: Food-web topology: universal scaling in food-web structure?. Nature. 2005, 435 (7044): E3-E4. 10.1038/nature03839.
- Klemm K, Eguíluz VM, San Miguel M: Scaling in the structure of directory trees in a computer cluster. Phys Rev Lett. 2005, 95 (12): 128701.
- Apic G, Huber W, Teichmann SA: Multi-domain protein families and domain pairs: comparison with known structures and a random model of domain recombination. J Struct Funct Genomics. 2003, 4 (2-3): 67-78.
- Unger R, Uliel S, Havlin S: Scaling law in sizes of protein sequence families: from super-families to orphan genes. Proteins. 2003, 51 (4): 569-576. 10.1002/prot.10347.
- Cotton JA, Page RDM: Rates and patterns of gene duplication and loss in the human genome. Proc R Soc B. 2005, 272 (1560): 277-283. 10.1098/rspb.2004.2969.
- Kunin V, Teichmann SA, Huynen MA, Ouzounis CA: The properties of protein family space depend on experimental design. Bioinformatics. 2005, 21 (11): 2618-2622. 10.1093/bioinformatics/bti386.
- Lee D, Grant A, Marsden RL, Orengo C: Identification and distribution of protein families in 120 completed genomes using Gene3D. Proteins. 2005, 59 (3): 603-615. 10.1002/prot.20409.
- Cotton JA, Page RDM: The shape of human gene family phylogenies. BMC Evol Biol. 2006, 6: 66. 10.1186/1471-2148-6-66.
- Sales-Pardo M, Chan AOB, Amaral LAN, Guimerà R: Evolution of protein families: is it possible to distinguish between domains of life?. Gene. 2007, 402 (1-2): 81-93. 10.1016/j.gene.2007.07.029.
- Hughes T, Liberles DA: The power-law distribution of gene family size is driven by the pseudogenisation rate's heterogeneity between gene families. Gene. 2008, 414 (1-2): 85-94. 10.1016/j.gene.2008.02.014.
- Ford DJ: Probabilities on cladograms: introduction to the alpha model. PhD thesis. 2006, Stanford University, Stanford.
- Hernández-García E, Tuğrul M, Herrada EA, Eguíluz VM, Klemm K: Simple models for scaling in phylogenetic trees. Int J Bifurcat Chaos. 2010, 20: 805-811. 10.1142/S0218127410026095.
- Sackin M: Good and bad phenograms. Syst Zool. 1972, 21: 225-226. 10.2307/2412292.
- Blum MGB, François O: On statistical tests of phylogenetic tree imbalance: the Sackin and other indices revisited. Math Biosci. 2005, 195 (2): 141-153. 10.1016/j.mbs.2005.03.003.
- Cavalli-Sforza LL, Edwards AWF: Phylogenetic analysis: models and estimation procedures. Am J Hum Genet. 1967, 19: 233-257.
- Harding EF: The probabilities of rooted tree-shapes generated by random bifurcation. Adv Appl Prob. 1971, 3: 44-77. 10.2307/1426329.
- Aldous DJ: Stochastic models and descriptive statistics for phylogenetic trees from Yule to today. Stat Sci. 2001, 16: 23-34. 10.1214/ss/998929474.
- Keller-Schmidt S, Tuğrul M, Eguíluz VM, Hernández-García E, Klemm K: An Age Dependent Branching Model for Macroevolution. 2010, [http://arxiv.org/abs/1012.3298]
- Pinelis I: Evolutionary models of phylogenetic trees. Proc R Soc B. 2003, 270 (1522): 1425-1431. 10.1098/rspb.2003.2374.
- Stich M, Manrubia SC: Topological properties of phylogenetic trees in evolutionary models. Eur Phys J B. 2009, 71: 583-592.
- Yule GU: A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis. Philos Trans R Soc Lond A. 1924, 213: 21-87.
- Dawkins R: The evolution of evolvability. Artificial Life. The Proceedings of an Interdisciplinary Workshop on the Synthesis and Simulation of Living Systems. Edited by: Langton C. 1989, Los Alamos: Addison-Wesley Pub. Corp, VI: 201-220.
- Brookfield JFY: Evolution and evolvability: celebrating Darwin 200. Biol Lett. 2009, 5: 44-46. 10.1098/rsbl.2008.0639.
- Masel J, Siegal ML: Robustness: mechanisms and consequences. Trends Genet. 2009, 25 (9): 395-403. 10.1016/j.tig.2009.07.005.
- Erwin DH: Macroevolution is more than repeated rounds of microevolution. Evol Dev. 2000, 2: 78-84. 10.1046/j.1525-142x.2000.00045.x.
- Simons AM: The continuity of microevolution and macroevolution. J Evol Biol. 2002, 15: 688-701. 10.1046/j.1420-9101.2002.00437.x.
- Grantham T: Is macroevolution more than successive rounds of microevolution?. Paleontology. 2007, 50: 75-85. 10.1111/j.1475-4983.2006.00603.x.
- Wagner A: Robustness and evolvability in living systems. 2005, Princeton: Princeton University Press.
- Lenski RE, Barrick JE, Ofria C: Balancing robustness and evolvability. PLoS Biol. 2006, 4 (12): e428. 10.1371/journal.pbio.0040428.
- Daniels BC, Chen YJ, Sethna JP, Gutenkunst RN, Myers CR: Sloppiness, robustness, and evolvability in systems biology. Curr Opin Biotechnol. 2008, 19 (4): 389-395. 10.1016/j.copbio.2008.06.008.
- Wagner A: Robustness and evolvability: a paradox resolved. Proc R Soc B. 2008, 275 (1630): 91-100. 10.1098/rspb.2007.1137.
- Guyer C, Slowinski JB: Comparisons between observed phylogenetic topologies with null expectation among three monophyletic lineages. Evolution. 1991, 45: 340-350. 10.2307/2409668.
- Heard SB: Patterns in tree balance among cladistic, phenetic, and randomly generated phylogenetic trees. Evolution. 1992, 46: 1818-1826. 10.2307/2410033.
- Guyer C, Slowinski JB: Adaptive radiation and the topology of large phylogenies. Evolution. 1993, 47: 253-263. 10.2307/2410133.
- Mooers AØ, Page RDM, Purvis A, Harvey PH: Phylogenetic noise leads to unbalanced cladistic trees reconstructions. Syst Biol. 1995, 44: 332-342.
- Chan KMA, Moore BR: Accounting for mode of speciation increases power and realism of tests of phylogenetic asymmetry. Am Nat. 1999, 153: 332-346. 10.1086/303173.
- Heard SB, Mooers AØ: Signatures of random and selective mass extinctions in phylogenetic tree balance. Syst Biol. 2002, 51 (6): 889-897. 10.1080/10635150290102591.
- Davies TJ, Savolainen V, Chase MW, Goldblatt P, Barraclough TG: Environment, area, and diversification in the species-rich flowering plant family Iridaceae. Am Nat. 2005, 166 (3): 418-425. 10.1086/432022.
- Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer ELL, Studholme DJ, Yeats C, Eddy SR: The Pfam protein families database. Nucleic Acids Res. 2004, 32 (Database issue): D138-D141.
- Saitou N, Nei M: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987, 4 (4): 406-425. [http://www.ncbi.nlm.nih.gov/pubmed/3447015]
- Gascuel O: BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Mol Biol Evol. 1997, 14 (7): 685-695. [http://www.ncbi.nlm.nih.gov/pubmed/9254330]
- Bruno WJ, Socci ND, Halpern AL: Weighted neighbor joining: a likelihood-based approach to distance-based phylogeny reconstruction. Mol Biol Evol. 2000, 17: 189-197. [http://www.ncbi.nlm.nih.gov/pubmed/10666718]
- Desper R, Gascuel O: Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle. J Comput Biol. 2002, 9 (5): 687-705. 10.1089/106652702761034136.
- Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 2003, 52 (5): 696-704. 10.1080/10635150390235520. [http://www.ncbi.nlm.nih.gov/pubmed/14530136]
- Huynen MA, van Nimwegen E: The frequency distribution of gene family sizes in complete genomes. Mol Biol Evol. 1998, 15 (5): 583-589.
- Harrison PM, Gerstein M: Studying genomes through the aeons: protein families, pseudogenes and proteome evolution. J Mol Biol. 2002, 318 (5): 1155-1174. 10.1016/S0022-2836(02)00109-2.
- Koonin EV, Wolf YI, Karev GP: The structure of the protein universe and genome evolution. Nature. 2002, 420 (6912): 218-223. 10.1038/nature01256.
- Luscombe NM, Qian J, Zhang Z, Johnson T, Gerstein M: The dominance of the population by a selected few: power-law behaviour applies to a wide variety of genomic properties. Genome Biol. 2002, 3 (8): RESEARCH0040.
- Campos PRA, de Oliveira VM: Emergence of allometric scaling in genealogical trees. Advances in Complex Systems. 2004, 7: 39-46. 10.1142/S0219525904000044.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.