Structural correlations in bacterial metabolic networks
© Bernhardsson et al; licensee BioMed Central Ltd. 2011
Received: 28 October 2010
Accepted: 20 January 2011
Published: 20 January 2011
Skip to main content
© Bernhardsson et al; licensee BioMed Central Ltd. 2011
Received: 28 October 2010
Accepted: 20 January 2011
Published: 20 January 2011
Evolution of metabolism occurs through the acquisition and loss of genes whose products acts as enzymes in metabolic reactions, and from a presumably simple primordial metabolism the organisms living today have evolved complex and highly variable metabolisms. We have studied this phenomenon by comparing the metabolic networks of 134 bacterial species with known phylogenetic relationships, and by studying a neutral model of metabolic network evolution.
We consider the 'union-network' of 134 bacterial metabolisms, and also the union of two smaller subsets of closely related species. Each reaction-node is tagged with the number of organisms it belongs to, which we denote organism degree (OD), a key concept in our study. Network analysis shows that common reactions are found at the centre of the network and that the average OD decreases as we move to the periphery. Nodes of the same OD are also more likely to be connected to each other compared to a random OD relabelling based on their occurrence in the real data. This trend persists up to a distance of around five reactions. A simple growth model of metabolic networks is used to investigate the biochemical constraints put on metabolic-network evolution. Despite this seemingly drastic simplification, a 'union-network' of a collection of unrelated model networks, free of any selective pressure, still exhibit similar structural features as their bacterial counterpart.
The OD distribution quantifies topological properties of the evolutionary history of bacterial metabolic networks, and lends additional support to the importance of horizontal gene transfer during bacterial metabolic evolution where new reactions are attached at the periphery of the network. The neutral model of metabolic network growth can reproduce the main features of real networks, but we observe that the real networks contain a smaller common core, while they are more similar at the periphery of the network. This suggests that natural selection and biochemical correlations can act both to diversify and to narrow down metabolic evolution.
Evolution of metabolism occurs through the acquisition and loss of genes whose products acts as enzymes in metabolic reactions. From a presumably simple primordial metabolism the organisms living today have evolved complex and highly variable metabolisms, ranging from extremophiles thriving in extreme conditions and feeding on inorganic compounds such as ironsulfide, to endosymbiontic parasites who are dependent on their hosts and cannot themselves synthesise essential compounds like aminoacids. In spite of this diversity there are still reactions and pathways which are found in a large number of organisms, such as the glycolytic pathway, which probably emerged early in the history of life on earth . Changes to the metabolism are constrained by the available genetic variation, the laws of biochemistry and by the selective pressure the organisms are subject to. Among eukaryotes the main source of evolutionary novelty is gene duplication [2, 3], whereby unequal cross-over of a section of a chromosome leads to the duplication of the corresponding genes. The loss of selective pressure on one copy can then lead to the divergence and sometimes a new functionality of the duplicated gene. Prokaryotes, on the other hand, make use of horizontal gene transfer (HGT) , where genes can be transferred among organisms even from different species. This is a powerful mechanisms which allows for rapid adaptation to new conditions and possibly to swift changes in metabolic capacities. Several models have been proposed for the evolution of metabolism, the most prominent being the retrograde  and the patchwork model . The retrograde model states that metabolic pathways evolve from the core and outwards from a key-metabolite assumed to have been abundant in the environment of the ancestral organism. Depletion of the key-metabolite then led to a selective advantage for organisms that could synthesise the missing metabolite from some other metabolite in the environment. This process is thought to have been repeated giving rise to outward-growing pathways. The patchwork model on the other hand assumes a low substrate-specificity to enzymes in the early stages of evolution, so that each enzyme could catalyse several reactions. Gene duplication events followed by divergence in function then lead to the refinement and specialisation among the enzymes which presumably gave rise to the structure found in metabolic networks today. These two models are however not mutually exclusive, and evidence has been found in support of both. For example homologous enzymes containing a specific domain (TIM-barrels) have been found in several distinct pathways , supporting the patchwork model, but there is also evidence that homologous enzymes are commonly found within the same pathway . Horizontal gene transfer (HGT), which is prominent among prokaryotes, allows for a different mode of metabolic evolution, where enzymes tend to be transferred in groups which are functionally coupled. This would lead to a growth pattern similar to the retrograde model, but not necessarily with the genes being homologous. Further, it has been shown that the genes encoding for enzymes that are involved in transport reactions and reactions far away from the core of the network are more variable in evolution . That study was however only carried out on E. coli and its closest neighbours in the γ-proteobacteria class, and more thorough studies are required to draw more accurate conclusions. In this paper we take a broader view by incorporating 134 bacteria from 16 phyla, and utilising a network perspective to gain quantitative insight into structural correlations between metabolic networks of different species. Comparative studies of microbial metabolic networks have shown that the overlap in metabolic capability between species correlates well with evolutionary kinship, and that phylogenetic trees based on this metric are similar to those which are based on more traditional measures of similarity in 16 S rRNA [10, 11]. It has also been shown that typical network measures such as modularity and average path length have a good predictive power when it comes to clustering metabolic networks into phylogenetic groups . The properties of the reaction nodes themselves was shown to correlate with the number of organisms the reactions are found in. In particular the betweenness centrality, which measures the number of shortest paths in the network passing through the node, was a good predictor, while the node degree itself was a poor predictor, i.e. nodes with high connectivity are not typically present in more organisms . Looking at correlations between pairs of reactions Wagner  showed that some reaction pairs are likely to occur together in one organism, while others are rarely found together. This suggests that evolution of metabolic networks is governed by reaction combinations that are favoured by natural selection. The study by Wagner, however, does not take into account the biochemical network structure these reactions form, and there are many other questions yet unanswered. How are these correlated reactions interconnected in the network? How are reactions added and removed through the evolution of the metabolism, and to what extent does a common metabolic connected core exist? In the present study we aim at an understanding of metabolic network evolution in the context of the structural constraints imposed by existing biochemistry. By comparing which parts, their topological location, and to what degree the metabolic networks of different bacterial species overlap we hope to gain insight into the evolutionary history which has shaped these networks. A few studies have dealt with related topics such as the extent of which network modularity [15–17] and synthesising capabilities  correlates among species in the ancestral tree, but have not done so from a topological perspective. As a further comparison we will also make use of a simple model of network growth, which will serve as a model of a scenario where evolution is completely neutral and only controlled by biochemical constraints. Our analysis presented here indicates that reactions with similar organism degree (OD), denoting the number of organisms in which the reaction is present, tend to be found structurally close to each other in the metabolic network and those with high OD are in general located in proximity of the centre.
We have focused on the reaction-representation of the metabolic networks of 134 bacterial species. The metabolism of an organism can conveniently be represented by a network, which consists of two types of nodes: reactions and metabolites. In the network a directed link connects a metabolite and a reaction if that metabolite is a substrate of the reaction, and conversely a link connects a reaction node and a metabolite if the metabolite is produced in that reaction. However, for simplicity one often considers a projected network where only reaction nodes are present. This means that the nodes in the network represent metabolic reactions, and there is a directed link between two nodes if any of the products of one reaction is the substrate of the other. We have studied both the metabolic networks of a union of 134 bacterial species, as well as those of a subset of two phyla, Chlamydiae and Proteobacteria. One of the key concepts in this work is the organism degree (OD) of a metabolic reaction which denotes the number of different species it is present in. This is of course a crude approximation and ignores species specific information, which could be used for inferring the evolution of metabolic networks. For example, one could from known phylogenies and metabolic networks infer the gain and loss of reaction along the phylogenetic tree, as was done by Pal and colleagues . Furthermore, Borenstein et al.  studied the concept of "seed set" of metabolic networks, defined as the minimal subset of metabolites for a given organism that cannot be synthesised by the other metabolites in the network, from which phylogenetic trees could be reconstructed. Another approach could be to analyse the precise overlap between pairs of species, which coupled with phylogenetic distance, could give insight into evolutionary patterns. In the light of this, discarding the phylogenetic profile of every reaction and replacing it by single number might seem a drastic move, but it allows us to view the data from a topological perspective, something that would have been next to impossible if species identity and phylogenetic information was taken into consideration. Further, it allows for the use of methods from graph theory and complex network analysis, which strengthen our approach. The analysis presented in this paper should thus be viewed as a complement to more traditional bioinformatics approaches to inferring the growth of metabolic networks. In order to make sure that the projection from the list of actual species (i.e. phylogenetic profile) onto a single number, the organism degree, for each reaction in the network, was reasonable and information preserving, we measured how often two neighbouring nodes in the reaction network of the same OD in fact are present in precisely the same species. This turned out to be true in 93% of the cases confirming that the OD is a reasonable level of description, and a meaningful concept to study. The remaining 7% correspond to 85 reaction pairs of which a majority are pairs with OD = 1, which, as our analysis will show, lie at the periphery of the network, and are unlikely to have an impact on the main conclusions of our study.
Table of core sizes, where s core is the size of the largest super node of maximum OD, N tot is the total number of reactions in the respective reaction network, and is the total number of reactions with maximum OD.
Random subset of species
Union of bact.(ΔOD = 10)
Union of bact.(ΔOD = 50)
Model (ΔOD = 10)
Model (ΔOD = 50)
Random subset of model networks
The structural correlations between metabolic networks from different species bear a clear mark of their evolutionary history. Species which are closely related usually have more similar metabolisms compared to distant relatives. This feature can clearly be seen in figure 1, which shows the condensed metabolic network of both the union of a wide range of species, and that of phylogenetically closer species. The overlap of the networks restricted to a single phyla are considerably larger, implying a higher similarity in metabolic capability. This picture also suggests that what is common to the species within a phyla forms a connected component, and that the species specific reactions are attached onto this core metabolism. Although this cannot visually be seen for the union of the 134 bacterial species, the quantification of average OD as function of distance from the centre of the network shows that reactions become more specific as we move out to the periphery. The average OD for the two smaller networks decay more rapidly, but this does not imply that the distribution of OD across the network is less structured. On the other hand, as can be seen in figure 4 correlations are prominent. The drop in average OD should instead be interpreted as a consequence of the similarity of the underlying species. A similarity which persists even at the periphery of the network. The two-node correlation function shows that correlations in OD are present up to distance of approximately 5 nodes. This suggests that there is a characteristic length scale within the union-network which tends to be conserved among closely related species. In other words, viewing metabolic evolution as a transfer of subsets of the metabolic network from one species to another (as in HGT), our results suggest that the typical size of these transferred sets of reactions is approximately five. Interestingly, this is close to the average pathway length of 4.4, as defined in the MetaCyc database, suggesting that the annotated pathway structure reflects the structural correlations we have reported on. There is also evidence that pathways are enriched within super nodes (see additional file 1, figure S3), which further supports this view. The importance of HGT in shaping metabolic evolution has previously been highlighted, for example in a study by Pal et. al . They showed that HT-genes are integrated at the periphery of the metabolic network, and that the proportion of HT-genes increases as one moves from biomass producing reactions to nodes at the periphery where transport reactions occur. Further, they showed that pairs of enzymes which were flux coupled were more often gained and lost together in the phylogenetic tree. This concords well with the results presented in this study, that the variability of network structure is higher closer to the periphery, and by taking into account the precise placement of enzymes/reactions on the metabolic union-network we have put forward further quantitative evidence for this growth mechanism. It has been claimed that many of the properties exhibited by metabolic networks such as modularity and scale-free degree distributions are products of adaptive evolution. One argument that has been put forward is for example the robustness and fault-tolerance conferred by a scale-free architecture . However, little emphasis has been put on the underlying biochemical constraints these networks are subject to, and the existing biochemistry has an obvious influence on possible network structures. It has for example been shown that atmospheric reaction networks exhibit scale-free degree distributions , and these networks have clearly not been subject to any selective process. On the theoretical side it has also been shown that the scale-free degree distributions of metabolic networks can be obtained as a maximum-entropy solution, giving a random null-model neutral to any selective pressure [26, 27]. Naturally the external environment of an organism will have an influence on metabolic evolution, such as in the case of certain endosymbionts which have lost the capability to synthesise amino acids provided by the host, but it is still unclear how much the laws of chemistry act to constrain metabolic evolution. The present study will not resolve these questions, but still gives an idea of how the metabolism of different bacterial species are related with respect to each other. Approximately 1.5% of the nodes in the union-network of all bacteria are present in all species, but these reactions do not form a coherent core, but are distributed across the network. The presence of these universal reactions does suggest some level of constraint, but the fact that they are bridged by different reaction paths in different species on the other hand hints at a degree of plasticity. A large metabolic core was indeed found in the union-networks of the subset of the Chlamydiae and Proteobacteria phyla, but this is an effect of the close phylogenetic relationship of these species, and not of a constraining chemistry. The results of the network growth model also support this dual view. Although the largest super node with maximal OD in the super-network generated from 134 model networks contains almost all the nodes of that OD, it only spans a small fraction of the network. This means that the completely neutral evolutionary process occurring in the model results in metabolisms which exhibit too much overlap compared to the real data, but at the same time are more diverse than the real metabolic networks. Notably the model overestimates the average OD close to the core, while underestimating it further away, an effect which is also obvious from the OD distribution plot (figure 5). This is most likely due to two separate deficiencies of the model. Firstly, nodes with a high degree in the BioCyc-network are more likely to be included in the model networks. The average node degree (in- plus out-degree) in the union-network generated from the model was 4.4, compared to 3.3 for the real data, suggesting that this is the case. These hubs are typically close to the centre of the network, and their affinity to be included in the networks therefore leads to an increased average OD at small distances from the centre. On the other hand, further away from the centre the model gives rise to lower average OD than the real data. This means that the model networks exhibit a larger diversity on the periphery, compared to the bacterial networks which are less diverse at this distance. This discrepancy from the neutral model suggests that natural selection and biochemical correlations can act both to diversify and to narrow down metabolic evolution. Bacterial species sharing the same environment would of course be likely to share pathways which allow for the degradation of compounds specific to that environment , and this effect would lead to a lower diversity at the periphery of the metabolic networks. The complimentary roles of some reactions, which were reported as anti-correlated reaction pairs by Wagner , could on the other hand lead to a smaller amount of overlap, and thus a smaller number of nodes with high OD. One might argue that the model made use of here is too simple to accurately describe the evolutionary paths of metabolic networks. For example, the model does not take into account the possibilities that the network can shrink (p < 1), add/remove chunks of reactions, or that all substrates must be available before a new reaction can be incorporated. This is all very true, however the model is not meant to give a detailed description of the evolutionary path of metabolic networks. Its purpose is to capture how the diversity of possible networks is influenced by the underlying biochemical constraints. In this setting it effectively means that it is the fluctuations of the "surface" of the networks (reactions not included in the common core at the periphery of the network) that is the interesting property, which is independent of the details of the growth model. The model used here is then just the simplest, most straight forward, way of obtaining an ensemble of networks of a given size, grown from a given seed reaction. The growth model has an obvious flavour of the retrograde model of metabolic evolution, where new reactions are added at the periphery of the network. It is therefore not surprising to find that nodes with high OD are found close to the centre of the network and that the average OD decreases as we move out from the centre, but the similar results for the real metabolic networks suggest that an analogous growth process has shaped their evolution. Our results thus point to the retrograde model as being more likely, however this should be looked upon in the light of HGT, as has been suggested in .
We have presented a study of the structural correlations in the metabolic networks of 134 bacterial species. By considering the union-network formed by these networks we were able to show that the universal reactions lie at the centre of the network and more specific reactions are placed closer to the periphery. The analysis of two smaller groups of species taken from the Chlamydiae and Proteobacteria phyla showed that these union-networks exhibited metabolic cores onto which species-specific subnetworks were attached. These features could be reproduced by a simple network growth model confirming that horizontal gene transfer is a prominent mechanism in the evolution of metabolic networks. The model does however overestimate the number of species specific reactions, and also overestimates the size of the common metabolic core, highlighting the effect of environmental selection and biochemical correlations.
The coarse-graining of the union of the metabolic networks was done by clustering all nodes that are connected to each other, and have the same OD, into one super node, using a depth-first algorithm. The size, s i , of the super node, i, equals the number of reactions that were condensed together. For the small subsets of the phyla Chlamydiae and Proteobacteria two neighbouring reaction nodes are placed in the same super node if their OD is exactly equal. For the much larger set of 134 organisms the sparseness in OD naturally increases, which we compensate by letting each super node represent a range of 10 OD values (e.g. OD1 = 1,..., 10, OD2 = 11,..., 20, etc.). Two super nodes are then joined if there exists at least a reaction in each super node which are linked together. In figure 1 we have omitted the smallest super nodes for the sake of clarity, which explains why the union of all bacteria seems highly disconnected.
In our analysis we have done a small-scale analysis on a subset of two bacterial phyla (Proteobacteria and Chlamydiae) as well as a large-scale analysis based on a union of bacterial species taken from the MetaCyc database . In order to have well-defined phylogenetic relationships between the species under consideration we picked 134 species for which a phylogenetic tree has been constructed . Please note that this presents a possible bias where certain phyla are over-represented in terms of DNA-sequencing. The subset of the species from the Proteobacteria were chosen as the 10 closest relatives to Bordetella parapertusis (including itself), and the subset of the Chlamydiae was chosen as all the species of that phyla which were included in the phylogenetic study, which amounted to 7 species. A full list and the two subsets of species are given in additional file 1, Table S1.
The data describing the metabolism of the 134 species was downloaded from MetaCyc (22/10/2009) . We transformed the data into a bipartite network, which contains two types of nodes: metabolites and reactions with directed links between them. Each reaction node was also tagged with a list of species in which it occurs, the organism degree. In order to simplify the analysis, the network was projected to a reaction-reaction network, where two reactions are connected if the product of one is the substrate of the other. However, in order to avoid connections contributed by currency metabolites such as water and ATP, which tend to dominate the network, we applied a pruning algorithm , which removed all metabolites with a connectivity higher than 10 prior to the projection. The resulting metabolic network contained 3650 reaction nodes and 4411 links. In order to make sure that the pruning level chosen is appropriate we also generated networks with pruning level 20 and 5. The higher pruning level leads to a dense network with short path lengths, while the lower one gives similar results to pruning level 10. The dependence of average OD as a function of distance on pruning level is shown in additional file 1, figure S1.
The network growth model used in this study was inspired by a previously published model by Maslov et. al , which was used for modelling the evolution of prokaryote metabolism. In our model, the growth of the metabolic network occurs on a background of possible reactions. These were defined from the BioCyc database , which describes the union of all known biochemical reactions. This data was downloaded from the BioCyc website (22/10/2009) and subject to the same processing as the MetaCyc-data. In addition we only kept the reactions which were in the giant connected component of the network. This resulted in a network consisting of 5191 reaction nodes and 10130 links. The growth model was implemented as follows: first we define an initial, or seed set of reactions R 0. In each time step t of the model we do one of the following: (i) with probability p we add a reaction node to R t -1by at random picking a node which is not in R t -1, but is neighbour of at least one of the current nodes (as defined in the underlying BioCyc-network). With complementary probability 1- p we pick a node in R t -1, which is a leaf, i.e. has only one link, and remove it. The following is repeated either until the network defined by R t reaches a given size or after a given number of time steps. In our version of the model the seed set is only one reaction; either the Methenyltetrahydrofolate cyclohydrolase - reaction (found in all 134 species) or a randomly chosen reaction, and the parameter p was set to unity, meaning that we only add reactions to the network. If we would have used p < 1 the end result would have been roughly the same (differing mainly in the collection of peripheral reactions) but would have required more iteration steps. Since we wished to produce a large collection of metabolisms, and not study their exact evolution, we used p = 1. The model was run 134 times to produce an ensemble of organisms having the same size as the union of bacteria. Also, the size of each individual network was set to match exactly that of each considered species. In order to make sure that the choice of seed set, in the fixed reaction scheme, does not bias the dynamics of the model we also ran the model with a starting node with OD = 1. The result of this simulation can be found in additional file 1, figure S2, and shows that the union-network in this case exhibits a similar structure.
The work of PG was funded by the Danish National Research Foundation through the Center for Models of Life (CMOL), as well as by the Willum Kann Rasmussen foundation through BIONET. LL acknowledges financial support form the Knut and Alice Wallenberg foundation.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.