- Research article
- Open Access
Expression level, cellular compartment and metabolic network position all influence the average selective constraint on mammalian enzymes
© Hudson and Conant; licensee BioMed Central Ltd. 2011
- Received: 12 November 2010
- Accepted: 6 April 2011
- Published: 6 April 2011
A gene's position in regulatory, protein interaction or metabolic networks can be predictive of the strength of purifying selection acting on it, but these relationships are neither universal nor invariably strong. Following work in bacteria, fungi and invertebrate animals, we explore the relationship between selective constraint and metabolic function in mammals.
We measure the association between selective constraint, estimated by the ratio of nonsynonymous (Ka) to synonymous (Ks) substitutions, and several, primarily metabolic, measures of gene function. We find significant differences between the selective constraints acting on enzyme-coding genes from different cellular compartments, with the nucleus showing higher constraint than genes from either the cytoplasm or the mitochondria. Among metabolic genes, the centrality of an enzyme in the metabolic network is significantly correlated with Ka/Ks. In contrast to yeasts, gene expression magnitude does not appear to be the primary predictor of selective constraint in these organisms.
Our results imply that the relationship between selective constraint and enzyme centrality is complex: the strength of selective constraint acting on mammalian genes is quite variable and does not appear to exclusively follow patterns seen in other organisms.
- Metabolic Network
- Node Degree
- Metabolic Gene
- Betweenness Centrality
- Protein Interaction Network
The rate and manner of evolutionary change has long been a matter of keen interest to biologists . Kimura provided theoretical underpinnings to molecular evolution by relating rates of sequence substitution, population parameters and mutation rates [2, 3]. Thus, Kimura's neutral theory  predicts that mutations having no fitness effect will become fixed in a population at a rate equal to the mutation rate. Such neutral mutations therefore provide a standard for measuring the action of natural selection: regions changing more slowly than neutral ones are inferred to be experiencing purifying selection (e.g., selective constraint), those changing more rapidly, adaptive evolution. While the relative contributions of genetic drift, adaptive evolution and purifying selection to population differentiation are still debated, , there is general agreement that the patterns of selection vary both across species as well as among genes in the same species .
Regarding interspecific variation, Lynch and Conery  argue that much of the variation in genome structure and content between species can be attributed to differences in their effective population sizes (Ne). Small effective population sizes limit the efficiency of purifying selection and allow the occasional fixation of mildly deleterious mutations. While some cross-taxa surveys have reported patterns consistent with this hypothesis [8–10], others have found that if one allows for reasonably frequent directional selection there is only a weak relationship between Ne and selective constraint [11–13].
The second type of variation in selective constraint, that between genetic loci in the same population, has also been studied [14–16]. In particular, considerable effort has gone into identifying factors that predict the selection acting on a particular gene. One critical variable is expression level: mammalian genes expressed in many tissues show stronger selective constraints than do those expressed in only a few tissues . Likewise, in yeast, a high expression level is the primary predictor of strong purifying selection acting on a gene , likely because the selective cost of protein misfolding is especially large for highly translated proteins .
This association is also in keeping with Wagner's theoretical analyses showing that gene expression is selectively costly in yeast . However, as he notes, the fitness cost of mis-expression is likely to be very different in multicellular organisms .
The influence of other factors on selective constraint is also debated, with the evidence primarily coming from studies in yeast [18, 20–26]. The topic is confounded by the intercorrelation of many of these predictors . Thus, some researchers report a significant correlation between the fitness cost of gene knockouts and those genes' selective constraint [21, 23], while others have questioned this association [20, 24]. There is similar debate regarding whether the position of a gene or protein in an interaction network influences selective constraint.
Recall that in these networks genes or proteins are nodes; relationships, such as protein interactions or shared metabolites, are represented as edges between nodes. Researchers have studied the association between selective constraint and measures such as node degree (the number of edges for a given node) and betweenness centrality [a more global statistic measuring the number of shortest paths passing through a node; [27–29]]. Significant associations between node importance and selective constraint have been found in regulatory , protein interaction , coexpression , and metabolic networks [32–34]. However, at least for protein interaction networks, this association seems to be at best quite weak [22, 25, 26].
Metabolic and nonmetabolic genes differ in selective constraint
The ratio of nonsynonymous to synonymous substitutions (Ka/Ks: hereafter ω) for each set of orthologous genes was estimated by maximum likelihood using PAML 4.2 [Figure 2A; ]. This ratio can be interpreted as a measure of selective constraint: values near 0 indicate strong purifying selection, while values greater than 1.0 suggest directional selection.
We hypothesized that metabolic genes would also be under stronger selective constraint than the average non-metabolic gene, so we performed three statistical tests of this hypothesis. First, using a Mann-Whitney U-test (Wilcoxon two-sample test), we rejected the null hypothesis that the median ω among metabolic genes is no smaller than that of non-metabolic genes (i.e., a one-tailed test, P = 0.035; Figure 2). Next, we performed a similar test for unequal mean ω values between the two groups. Given that neither distribution in Figure 2A appears normal, we adopted a bootstrapping approach, drawing 1,000,000 samples of size n = 1,190 from the set of non-metabolic genes (n = 12,738) and calculating these samples' means. In no case was the mean value of ω from the bootstrapped samples as small as the observed mean value for metabolic genes (ωmetabolic = 0.1292, P < 10-6). We also drew 100,000 samples of sizes n = 1,190 and of n = 12,738 and calculated the difference in their means. The absolute differences in the mean values was never as large as that observed between the metabolic and non-metabolic genes (P < 10-5; Figure 2C).
Log-likelihoods of a linear fit between all ω values and each of 8 common distributions, with likelihood ratio tests for the differences in distributions calculated for the 3 best distributional fits
Cellular compartments differ in the selective constraint acting on their enzymes
The mean value of ω varies from 0.0935 in the nucleus to 0.1735 in the peroxisome. To determine if the differences in ω values are significant across compartments, we first clustered the compartments by mean [UPGMA; ]. The resulting three groups, in order of increasing ω, are: the Golgi apparatus and the nucleus, all other compartments except the peroxisome, and finally the peroxisome (Figure 3). We tested for significant pairwise differences between compartments in ω using a Mann-Whitney U-test (Figure 3) at a significance level of α = 0.01 (to account for the inherent multiple testing issues). The tests were conducted in a nested fashion, such that groups for which we could not reject the hypothesis of equal values of ω were compared to their nearest neighbors (c.f., the tree in Figure 3). This procedure allowed us to make seven comparisons, rather than the 56 possible pairwise comparisons. We find that the distributions clustered with low ω values (the nucleus and Golgi apparatus) are statistically indistinguishable (P = 0.047). Those in the large intermediate cluster can be split among groups that are statistically indistinguishable including lysozyme and transport compartments (P = 0.087), endoplasmic reticulum and external reactions (P = 0.205), and the cytosol and mitochondrial compartments, which are both statistically distinct from each other (P < 0.01). The peroxisome is also statistically distinct from the remaining compartments (P < 0.01).
We next explored the role of metabolic network structure in influencing selective constraint, using the metabolic network of Duarte et al. . This network includes information on reaction compartment and directionality that were used to create a semi-directed metabolic network where reactions are nodes. Two nodes are connected by an edge if they share a metabolite. Note that because metabolites are compartment-specific, edges do not connect reactions in differing compartments. Edges are also disallowed if the two reactions in question are irreversible and the interconnecting metabolite serves as a substrate in both reactions or a product in both. The resulting network has 298,004 edges and 3,741 nodes, of which 2,264 have at least one associated gene (Figure 1).
Removal of currency metabolites
One of the implicit steps in preprocessing metabolic networks is removing currency metabolites, such as water and ATP that participate in numerous reactions. Failing to remove such metabolites prior to analysis can lead to an overestimation of connectedness between reactions.
Correlations between graph properties and ω
Correlations between ω and the graph properties (degree and betweenness) for each compartment, including the number of reactions and edges in each compartment
# of reactions
# of edges
Positive selection among the metabolic genes cannot explain the associations seen
We found 52 sets of orthologous metabolic genes that showed evidence for positive selection, spread across all cellular compartments (ranging from 0.7% of mitochondrial genes to 6.4% of cytoplasmic ones; see Methods). Excluding these genes did not alter our compartment specific estimates of ω, the correlations between network statistics and ω or the significance of the differences in ω between compartments (data not shown).
There is a weak relationship between gene expression and selective constraint
Using 4,105 genes in both our sample and the HUGE Index  we found a weak statistical relationship between ω and maximum expression level (r = -0.081; P < 10-6). For metabolic genes this correlation is somewhat stronger (r = -0.089; P = 0.029). This relationship, however, is weaker than the relationship we find between network position and ω in metabolic genes, implying that expression may not the dominant predictor of selective constraint in mammals in the same way it is in yeast .
Our conclusions that gene function, expression, cellular localization and network position influence selective constraint will individually come as little surprise to researchers. This is especially true of our conclusion that purifying selection acts more strongly on metabolic genes than on genes from the genome at large: function is a known correlate of rate of evolution [42–44].
While we find a significant correlation between reaction centrality (betweenness) and selective constraint in the metabolic network, this result comes with several important caveats. First, although it is reasonable to interpret Ka/Ks as the level of selective constraint a gene experiences, in fact, this statistic represents an average evolutionary rate: in particular, two genes with the same fraction of amino acid substitutions forbidden by natural selection might have differing values of Ka/Ks if one gene had undergone more adaptive amino acid substitutions. We have partly controlled for this effect by omitting orthologs with evidence of positive selection, but it is not currently possible to completely remove this effect. Another caveat is that the association of betweenness-centrality and (apparent) constraint disappears when the currency metabolites are included. It is also worth noting that node degree on its own is not predictive of constraint in mammals, similar to the lack of association between these variables seen in E. coli . We suggest one useful message to take from this result is that the relationship that exists between selective constraint and betweenness centrality is dependent on the manner in which the network is constructed. Special care has been taken in justifying the removal of currency metabolites across networks, however different removal strategies produce different associations of centrality and constraint (Methods).
From a more general perspective, it is also important to recall that networks are only computational abstractions of a biological reality. To speak of an association of betweenness and selection is therefore actually to suggest that betweenness, a measurable quality, also represents an underlying biological feature. In this work, we have not directly demonstrated such a biological association. Likewise, there is a difference between the metabolic network associations seen here and those in protein interaction networks. In protein interaction networks, the pairwise binding of proteins is directly mediated by sequences, and natural selection can act to maintain complementary sequences in two interacting proteins . In metabolic networks, the relationship is more tenuous; one assumes that central reactions are required for proper function of the metabolic network and hence enzymes catalyzing such reactions will be under greater constraint. Even if this argument holds, the constraint is on function and not specifically on sequence. If an enzyme can maintain this function using differing sequences, there might be no necessary association of sequence constraint and centrality.
When we break the metabolic network down by compartment, we do find associations between network centrality (degree or betweenness) and constraint in some, but not all, compartments (Table 2). One lesson from these complex results is that although it is intuitive to consider the relationship between metabolic network structure and selective constraint at a global level, differences in constraint among compartments may confound global analyses. Likewise, the variation in constraint among these compartment raises interesting questions: it is unclear why enzymes from the Golgi apparatus and nucleus should be more highly conserved than those from the central group of compartments (Figure 3). Strikingly, enzymes implicated in external reactions fall within this central group, and are not distinguished by having a uniquely fast or slow rate of substitution. This result contrasts the findings by Liao et al.  and Julenius and Pedersend  that the intra-/extra-cellular localization of a protein is highly predictive of its ω. However, note that these authors considered all genes in a given compartment, as opposed to the strictly metabolic ones analyzed here.
One potential explanation for these differences in constraint between compartments is that those compartments have different tolerances for misfolded proteins. Protein misfolding appears to have a significant fitness cost in yeast , and it is not unreasonable to hypothesize that the spatial organization of the nucleus  might induce a particularly high cost for misfolded proteins. However, one observation that speaks against this hypothesis is the weak association of constraint and expression. Our results thus suggest that although gene expression in some manner constrains mammalian protein evolution, it is less effective at doing so in mammals than in yeast.
In general, we find that although the position of a mammalian gene's product in the metabolic network and its expression level are both associated with that gene's evolutionary constraint, neither factor is determinative. Thus, unlike yeast, the forces that determine the selective constraint on mammalian protein-coding genes are likely both to be complex and to vary between genes.
Our method for orthology identification first detects homologous genes using sequence similarity and then uses gene order to resolve orthology [orthology and paralogy are reviewed in ]. Specifically, we first conduct a pairwise homology search among all genes in G1 and G2 using GenomeHistory . GenomeHistory hits were filtered to exclude those with E-values greater than 10-10 (comparisons to chimpanzee and macaque) or 10-9 (all other comparisons) and amino acid sequence identity less than 50% (chimpanzee and macaque) or 45% (all others). Cases where two homologous genes are immediate neighbors on a chromosome (e.g., tandem duplicates) are treated as a single locus. An initial ortholog pair A and B is inferred if three criteria are met:
A is from G1 and B is from G2.
The only homology of A in G2 is B and the only homology of B in G1 is A.
The synonymous divergence between A and B is less than a threshold (Ks < 0.5 for the human-chimpanzee and human-macaque comparisons and <0.75 in all other cases).
Many genes in G1 and G2 will have multiple homologs and hence not fall into a one-to-one relationship. Instead, we use this smaller set of one-to-one relationships to detect further orthologs. First, define C as the immediate (left or right) neighboring gene of A and D as the neighbor of B. If C and D are homologs, even if they also show homology to other genes, they are defined as orthologs. Importantly, now that C and D are identified as orthologs, their other homology relationships are deleted. We repeat the procedure for identifying one-to-one pairs, no longer using criterion 3. The entire process is repeated until no further orthologs are identified .
Sequence alignment and quality control
We created a data pipeline using Bioperl 1.6 . The initial inputs were the set of gene orthologs determined above: we found 19,416 ortholog sets for the 8 eutherian mammal species. This set is made up of protein-coding genes with no clear evidence of tandem duplication, since duplication and subsequent functional specialization can alter measured selective constraints . Human genes with orthologs in fewer than 5 other mammals were excluded from analysis.
Given a set of orthologous genes from humans and at least five other mammals, the corresponding amino acid sequences were aligned using MUSCLE v3.6 with default parameters . We next performed several filtering steps to assure alignment quality. First, we required that all possible pairs of sequences in each alignment have pairwise percent identity (PID) of ≥40%. If any pair of sequences had a PID < 40%, then the sequence with the lowest PID to a consensus sequence was removed. The remaining sequences were then realigned and their PID rechecked. Next, we removed gap columns from the finished alignments. In cases where this resulted in fewer than 50 aligned amino acid columns, the sequences with the lowest PID to the consensus sequence was removed and the original sequences were realigned. This was done iteratively as long as there were still more than 5 sequences to align. The result of these filtering steps was the 13,928 multiple sequence alignments used in the remainder of our analyses.
Estimation of selective constraint
Using the above amino acid alignments, we inferred codon-preserving nucleotide alignments and estimated the ratio of nonsynonymous to synonymous rates (Ka/Ks or ω) with the codeml package in PAML 4.2 . We assumed the sequences had evolved under previously published mammalian phylogenetic relationships [54, 55], namely ((((human, chimpanzee), macaque), (mouse, rat)),((horse, dog), cow)).
PAML model M0 was used to estimate the maximum likelihood value of ω . Recall that the synonymous substitution rate is used as a proxy for the mutation rate: the deficit or surplus of nonsynonymous substitutions relative to this value is then indicative of purifying or diversifying selection.
Identification of metabolic genes under positive selection
With these same data, we used a site-specific model to look for genes under positive selection. We compared PAML models M1a and M2a, nearly-neutral and positive selection models, respectively . M1a has two ω parameters: ω0 < 1 and ω1 = 1. M2a has three ω parameters: ω0 < 1, ω1 = 1, and ω2 > 1. A likelihood ratio test was used to determine if M2a was a significantly better fit to the data than M1a, given that model M2a has two more free parameters. Genes that had twice the difference in log-likelihood greater than the critical χ2 value (5.99; P < 0.05) were assumed to be under positive selection.
Gene expression analysis
We collected expression levels for 4,105 genes by querying Affimetrix microarray data in the HUman Gene Expression Index [HUGE Index; ].We then determined the maximum level at which a gene is expressed in the 19 tissues, comprising 59 experiments in the HUGE Index database.
We tested whether the metabolic and non-metabolic ortholog sets had differing values of ω using a nonparametric Mann-Whitney U-test  as implemented in R. Differences in ω between cellular compartments were also analyzed with this test.
As discussed in the Results, because the data in question were visually skewed, we sought to confirm the results of the Mann-Whitney test of differences in ω between metabolic and nonmetabolic genes in two ways. First, we bootstrapped samples of size n = 1,190 (the number of metabolic orthologs in our dataset) from the set of non-metabolic ω values and compared the sample means to the actual mean ω of the metabolic genes. Second, to compare not only the means of the two sets of genes but also their variability, we fit several probability density functions to these data using the MASS library in R . The distributional parameters were estimated by maximum likelihood: numerical optimization was carried out using Nelder-Mead or Broyden-Fletcher-Goldfarb-Shanno methods for single and multi-parameter distributions, respectively . We used these estimations to calculate the Pearson's correlation coefficient for Q-Q plots of quantile values versus observed frequency (Additional file 1, Figure S1). The result of this analysis was a list of distributional families ranked in terms of the best fit of each distribution to the data (Table 1).
We used a likelihood ratio test [LRT; ] to compare the distributions of metabolic and non-metabolic ω values. The null model requires the set of ω values from both metabolic and non-metabolic genes to be drawn from a single probability distribution. The alternative model allows the metabolic and non-metabolic genes to have differing values of the distribution parameters (while still following the same distribution function). As a result, the alternative model has twice as many free parameters (2p) as does the null model, allowing us to compare the difference in log likelihood between the two models to a χ2 distribution with p degrees of freedom.
Modularity, degree, and betweenness estimates for the metabolic networks were calculated using the igraph library  in R. Modularity was estimated using the Clauset et al. algorithm for detecting community structure . Betweenness was calculated using the Brandes algorithm [27, 28, 62].
Removal of currency metabolites
As discussed in the Results, and following Huss and Holme , we defined currency metabolites as metabolites whose removal increased the effective modularity (ΔQ) of the network in question. Modularity is a measure of the degree to which nodes fit into distinct and connected subunits . Effective modularity is the difference between the maximum modularity of the graph and the average maximum modularity of a number of random graphs . Removing currency metabolites increases modularity, since it removes interconnections between distinct subgraphs, while removing non-currency metabolites decreases modularity by isolating reactions from their subgraph modules.
To calculate ΔQ we compared modularity (Q)  of the graph to the average Q of 1000 randomly rewired graphs. The Q for the graph minus the average Q of these 1000 random graphs is ΔQ . We thus ordered the metabolites by the frequency with which they formed edges, removed the most frequent metabolite and calculated ΔQ. We then reordered the metabolites and repeated this procedure until any further removal of metabolites only decreased ΔQ.
We optimized ΔQ for three different types of networks. First, we used a noncompartmentalized network (i.e., we removed compartmental metabolite designations so that ATP in the cytoplasm is treated equivalently to ATP in the mitochondria, Figure 4A). We next considered a network where we retained compartmental metabolite designations (where a specific metabolite may be removed from one cellular component, but not another) and optimized the global compartmentalized network (Figure 4B; Additional file 2, Table S1). Finally, we optimized the modularity in each cellular compartment individually (Additional file 1, Figure S3). Because the compartmentalized network offered both improved biological intuition and better performance in our modularity analysis, it was used for all of the analyses presented above. We note, however, this noncompartmentalized network shows a weaker association of betweenness and ω than does the compartmentalized one (r betweenness = -0.07, P < 10-4).
We would like to thank Michaël Bekaert, Patrick Edger, and Chris Pires for helpful discussions. This work was supported by the National Library of Medicine Biomedical and Health Informatics Training Fellowship (LM007089-19; CMH) and the Reproductive Biology Group of the Food for the 21st Century program at the University of Missouri (GCC).
- Simpson GG: Tempo and mode in evolution. 1944, New York: Columbia University PressGoogle Scholar
- Kimura M: Evolutionary rate at the molecular level. Nature. 1968, 217: 624-626. 10.1038/217624a0.View ArticlePubMedGoogle Scholar
- Kimura M: The rate of molecular evolution considered from the standpoint of population genetics. Proc Natl Acad Sci USA. 1969, 63: 1181-1188. 10.1073/pnas.63.4.1181.View ArticlePubMedPubMed CentralGoogle Scholar
- Kimura M: The neutral theory of molecular evolution. 1983, Cambridge: Cambridge University PressView ArticleGoogle Scholar
- Nei M: Selectionism and neutralism in molecular evolution. Mol Biol Evol. 2005, 22: 2318-2342. 10.1093/molbev/msi242.View ArticlePubMedPubMed CentralGoogle Scholar
- Cooper GM, Brudno M, Stone EA, Dubchak I, Batzoglou S, Sidow A: Characterization of evolutionary rates and constraints in three mammalian genomes. Genome Res. 2004, 14: 539-548. 10.1101/gr.2034704.View ArticlePubMedPubMed CentralGoogle Scholar
- Lynch M, Conery JS: The origins of genome complexity. Science. 2003, 302: 1401-1404. 10.1126/science.1089370.View ArticlePubMedGoogle Scholar
- Ellegren H: A selection model of molecular evolution incorporating the effective population size. Evolution. 2009, 63: 301-305. 10.1111/j.1558-5646.2008.00560.x.View ArticlePubMedGoogle Scholar
- Conant G: Neutral evolution on mammalian protein surfaces. Trends Gen. 2009, 25: 377-381. 10.1016/j.tig.2009.07.004.View ArticleGoogle Scholar
- Slotte T, Foxe JP, Hazzouri KM, Wright SI: Genome-wide evidence for efficient positive and purifying selection in Capsella grandiflora, a plant species with a large effective population size. Mol Biol Evol. 2010, 27: 1813-1821. 10.1093/molbev/msq062.View ArticlePubMedGoogle Scholar
- Fay JC, Wyckoff GJ, Wu CI: Testing the neutral theory of molecular evolution with genomic data from Drosophila. Nature. 2002, 415: 1024-1026. 10.1038/4151024a.View ArticlePubMedGoogle Scholar
- Charlesworth J, Eyre-Walker A: The other side of the nearly neutral theory, evidence of slightly advantageous back-mutations. Proc Natl Acad Sci USA. 2007, 104: 16992-16997. 10.1073/pnas.0705456104.View ArticlePubMedPubMed CentralGoogle Scholar
- Bachtrog D: Similar rates of protein adaptation in Drosophila miranda and D. melanogaster, two species with different current effective population sizes. BMC Evol Biol. 2008, 8: 334-10.1186/1471-2148-8-334.View ArticlePubMedPubMed CentralGoogle Scholar
- Fitch WM, Margoliash E: Construction of phylogenetic trees. Science. 1967, 155: 279-284. 10.1126/science.155.3760.279.View ArticlePubMedGoogle Scholar
- Yang Z: Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Mol Biol Evol. 1993, 10: 1396-1401.PubMedGoogle Scholar
- Drummond DA, Bloom JD, Adami C, Wilke CO, Arnold FH: Why highly expressed proteins evolve slowly. Proc Natl Acad Sci USA. 2005, 102: 14338-14343. 10.1073/pnas.0504070102.View ArticlePubMedPubMed CentralGoogle Scholar
- Duret L, Mouchiroud D: Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol Biol Evol. 2000, 17: 68-85.View ArticlePubMedGoogle Scholar
- Drummond DA, Raval A, Wilke CO: A single determinant dominates the rate of yeast protein evolution. Mol Biol Evol. 2006, 23: 327-337. 10.1093/molbev/msj038.View ArticlePubMedGoogle Scholar
- Wagner A: Energy constraints on the evolution of gene expression. Mol Biol Evol. 2005, 22: 1365-1374. 10.1093/molbev/msi126.View ArticlePubMedGoogle Scholar
- Hurst LD, Smith NG: Do essential genes evolve slowly?. Curr Biol. 1999, 9: 474-450. 10.1016/S0960-9822(99)80334-0.View ArticleGoogle Scholar
- Hirsh AE, Fraser HB: Protein dispensability and rate of evolution. Nature. 2001, 411: 1046-1049. 10.1038/35082561.View ArticlePubMedGoogle Scholar
- Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW: Evolutionary rate in the protein interaction network. Science. 2002, 296: 750-752. 10.1126/science.1068696.View ArticlePubMedGoogle Scholar
- Jordan IK, Rogozin IB, Wolf YI, Koonin EV: Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res. 2002, 12: 962-968.View ArticlePubMedPubMed CentralGoogle Scholar
- Pál C, Papp B, Hurst LD: Rate of evolution and gene dispensability. Nature. 2003, 421: 496-497.View ArticlePubMedGoogle Scholar
- Jordan IK, Wolf YI, Koonin EV: No simple dependence between protein evolution rate and the number of protein-protein interactions: only the most prolific interactors tend to evolve slowly. BMC Evol Biol. 2003, 3: 1-10.1186/1471-2148-3-1.View ArticlePubMedPubMed CentralGoogle Scholar
- Hahn MW, Conant GC, Wagner A: Molecular evolution in large genetic networks: Connectivity does not equal constraint. J Mol Evol. 2004, 58: 203-211. 10.1007/s00239-003-2544-0.View ArticlePubMedGoogle Scholar
- Freeman LC: A set of measures of centrality based on betweenness. Sociometry. 1977, 40: 35-41. 10.2307/3033543.View ArticleGoogle Scholar
- Brandes U: A faster algorithm for betweenness centrality. J of Math Sociol. 2001, 25: 163-177. 10.1080/0022250X.2001.9990249.View ArticleGoogle Scholar
- Liu WC, Lin WH, Davis AJ, Jordan F, Yang HT, Hwang MJ: A network perspective on the topological importance of enzymes and their phylogenetic conservation. BMC Bioinformatics. 2007, 8: 121-10.1186/1471-2105-8-121.View ArticlePubMedPubMed CentralGoogle Scholar
- Jovelin R, Phillips CP: Evolutionary rates and centrality in the yeast gene regulatory network. Genome Biol. 2009, 10: R35-10.1186/gb-2009-10-4-r35.View ArticlePubMedPubMed CentralGoogle Scholar
- Jordan IK, Marino-Ramirez L, Wolf YI, Koonin EV: Conservation and coevolution in the scale-free human gene coexpression network. Mol Biol Evol. 2004, 21: 2058-2070. 10.1093/molbev/msh222.View ArticlePubMedGoogle Scholar
- Vitkup D, Kharchenko P, Wagner A: Influence of metabolic network structure and function on enzyme evolution. Genome Biol. 2006, 7: R39-10.1186/gb-2006-7-5-r39.View ArticlePubMedPubMed CentralGoogle Scholar
- Wagner A: Evolutionary constraints permeate large metabolic networks. BMC Evol Biol. 2009, 9: 231-10.1186/1471-2148-9-231.View ArticlePubMedPubMed CentralGoogle Scholar
- Greenberg AJ, Stockwell SR, Clark AG: Evolutionary constraint and adaptation in the metabolic network of Drosophila. Mol Biol Evol. 2008, 25: 2537-2546. 10.1093/molbev/msn205.View ArticlePubMedPubMed CentralGoogle Scholar
- Duarte NC, Becker SA, Jamshidi N, Thiele I, Mo ML, Vo TD, Srivas R, Palsson BØ: Global reconstruction of the human metabolic network based on genomic and bibliomic data. Proc Natl Acad Sci USA. 2007, 104: 1777-1782. 10.1073/pnas.0610772104.View ArticlePubMedPubMed CentralGoogle Scholar
- Yang Z: PAML 4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007, 24: 1586-1591. 10.1093/molbev/msm088.View ArticlePubMedGoogle Scholar
- Sneath PHA, Sokal RR: Numerical Taxonomy. 1973, San Francisco: W. H. Freeman & CoGoogle Scholar
- Huss M, Holme P: Currency and commodity metabolites: their identification and relation to the modularity of metabolic networks. IET Systems Biology. 2007, 1: 280-285. 10.1049/iet-syb:20060077.View ArticlePubMedGoogle Scholar
- Newman MEJ: Modularity and community structure in networks. Proc Natl Acad Sci USA. 2006, 103: 8577-8582. 10.1073/pnas.0601602103.View ArticlePubMedPubMed CentralGoogle Scholar
- Newman MEJ, Girvan M: Finding and evaluating community structure in networks. Physical Review E. 2004, 69: 026113-10.1103/PhysRevE.69.026113.View ArticleGoogle Scholar
- Haverty P, Weng Z, Best N, Auerbach K, Hsiao LL, Jensen R, Gullans S: HugeIndex: a database with visualization tools for high-density oligonucleotide array data from normal human tissues. Nucleic Acids Res. 2002, 30: 214-217. 10.1093/nar/30.1.214.View ArticlePubMedPubMed CentralGoogle Scholar
- De S, Lopez-Bigas N, Teichmann SA: Patterns of evolutionary constraints on genes in humans. BMC Evol Biol. 2008, 8: 275-10.1186/1471-2148-8-275.View ArticlePubMedPubMed CentralGoogle Scholar
- Lopez-Bigas N, De S, Teichmann SA: Functional protein divergence in the evolution of Homo sapiens. Genome Biol. 2008, 9: R33-10.1186/gb-2008-9-2-r33.View ArticlePubMedPubMed CentralGoogle Scholar
- Tuller T, Kupiec M, Ruppin E: Co-evolutionary networks of genes and cellular processes across fungal species. Genome Biol. 2009, 10: R48-10.1186/gb-2009-10-5-r48.View ArticlePubMedPubMed CentralGoogle Scholar
- Bloom JD, C A: Apparent dependence of protein evolutionary rate on number of interactions is linked to biases in protein-protein interactions data sets. BMC Evol Biol. 2003, 3: 21-10.1186/1471-2148-3-21.View ArticlePubMedPubMed CentralGoogle Scholar
- Liao B, Weng M, Zhang J: Impact of extracellularity on the evolutionary rate of mammalian proteins. Genome Biol Evol. 2010, 2: 39-43. 10.1093/gbe/evp058.View ArticlePubMedPubMed CentralGoogle Scholar
- Julenius K, Pedersen AG: Protein evolution is faster outside the cell. Mol Biol Evol. 2006, 22: 2039-2048. 10.1093/molbev/msl081.View ArticleGoogle Scholar
- Fraser P, Bickmore W: Nuclear organization of the genome and the potential for gene regulation. Nature. 2007, 447: 413-417. 10.1038/nature05916.View ArticlePubMedGoogle Scholar
- Koonin E: Orthologs, paralogs, and evolutionary genomics. Annual Review of Genetics. 2005, 39: 309-338. 10.1146/annurev.genet.39.073003.114725.View ArticlePubMedGoogle Scholar
- Conant GC, Wagner A: GenomeHistory: A software tool and its application to fully sequenced genomes. Nucleic Acids Res. 2002, 30: 3378-3386. 10.1093/nar/gkf449.View ArticlePubMedPubMed CentralGoogle Scholar
- Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JGR, Korf I, Lapp H, et al: The Bioperl Toolkit: Perl modules for the life sciences. Genome Res. 2002, 12: 1611-1618. 10.1101/gr.361602.View ArticlePubMedPubMed CentralGoogle Scholar
- Kondrashov FA, Rogozin IB, Wolf YI, Koonin EV: Selection in the evolution of gene duplications. Genome Biol. 2002, 3: research0008-10.1186/gb-2002-3-2-research0008.View ArticlePubMedPubMed CentralGoogle Scholar
- Edgar R: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32: 1792-1797. 10.1093/nar/gkh340.View ArticlePubMedPubMed CentralGoogle Scholar
- Murphy WJ, Pevzner PA, O'Brien SJ: Mammalian phylogenomics comes of age. Trends Gen. 2004, 20: 631-639. 10.1016/j.tig.2004.09.005.View ArticleGoogle Scholar
- Nishihara H, Hasegawa M, Okada N: Pegasoferae, an unexpected mammalian clade revealed by tracking ancient retroposon insertions. Proc Natl Acad Sci USA. 2006, 103: 9929-9934. 10.1073/pnas.0603797103.View ArticlePubMedPubMed CentralGoogle Scholar
- Yang Z, Nielsen R, Goldman N, Pedersen A-MK: Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics. 2000, 155: 431-449.PubMedPubMed CentralGoogle Scholar
- Wong W, Yang Z, Goldman N, Nielsen R: Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. Genetics. 2005, 168: 1041-1051. 10.1534/genetics.104.031153.View ArticleGoogle Scholar
- Sokal R, Rohlf FJ: Biometry. 2000, New York: W. H. Freeman and Company, 3Google Scholar
- Venables WN, Ripley BD: Modern Applied Statistics with S. 2002, New York: Springer, FourthView ArticleGoogle Scholar
- Csárdi G, Nepusz T: The igraph software package for complex network research. InterJournal, Complex Systems. 2006, 1695-Google Scholar
- Clauset A, Newman MEJ, Moore C: Finding community structure in very large networks. Phys Rev E. 2004, 70: 066111-10.1103/PhysRevE.70.066111.View ArticleGoogle Scholar
- Freeman LC: Centrality in social networks I: conceptual clarificaiton. Social Networks. 1979, 1: 215-239. 10.1016/0378-8733(78)90021-7.View ArticleGoogle Scholar
- Bastian M, Heymann S, Jacomy M: Gephi: an open source software for exploring and manipulating networks. International AAAI Conference on Weblogs and Social Media. 2009Google Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.