Environmental variability and modularity of bacterial metabolic networks

Background
Biological systems are often modular: they can be decomposed into nearly independent structural units that perform specific functions. The evolutionary origin of modularity is a subject of much current interest. Recent theory suggests that modularity can be enhanced when the environment changes over time. However, this theory has not yet been tested using biological data.
Results
To address this, we studied the relation between environmental variability and modularity in a natural and well-studied system, the metabolic networks of bacteria. We classified 117 bacterial species according to the degree of variability in their natural habitat. We find that metabolic networks of organisms in variable environments are significantly more modular than networks of organisms that evolved under more constant conditions.
Conclusion
This study supports the view that variability in the natural habitat of an organism promotes modularity in its metabolic network and perhaps in other biological systems.


Reaction-substrate bipartite network analysis
The reaction-substrate bipartite graph describes the metabolic reactions: each metabolite node is connected to the reaction nodes that consume or produce it. Q rand was computed by averaging over random bipartite networks that preserve the degree distributions of both the metabolites and the reactions.
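A random ensemble that preserves degrees on both sides can be generated with the standard edge-swap (switching) randomization. The following is a minimal sketch that assumes the bipartite graph is stored as a list of (metabolite, reaction) pairs. The function names, the number of swaps per edge and the `score` callback (a stand-in for whatever modularity routine is used) are hypothetical choices and are not the authors' code.

```python
import random


def randomize_bipartite(edges, n_swaps, seed=None):
    """Degree-preserving randomization of a metabolite-reaction bipartite graph.

    `edges` is a list of (metabolite, reaction) pairs. Each accepted swap turns
    (m1, r1), (m2, r2) into (m1, r2), (m2, r1), so every metabolite and every
    reaction keeps its degree.
    """
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (m1, r1), (m2, r2) = edges[i], edges[j]
        # Skip swaps that are trivial or would create a duplicate edge.
        if m1 == m2 or r1 == r2 or (m1, r2) in present or (m2, r1) in present:
            continue
        present -= {(m1, r1), (m2, r2)}
        present |= {(m1, r2), (m2, r1)}
        edges[i], edges[j] = (m1, r2), (m2, r1)
    return edges


def mean_over_random(edges, score, n_networks=100, swaps_per_edge=10, seed=0):
    """Average `score` (hypothetically a modularity function) over randomized networks."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_networks):
        shuffled = randomize_bipartite(edges, swaps_per_edge * len(edges),
                                       seed=rng.randrange(2**32))
        values.append(score(shuffled))
    return sum(values) / len(values)
```

Because a swap only exchanges the reaction endpoints of two edges, the degree sequences of both node classes are invariant by construction. Rejecting swaps that would duplicate an edge keeps the randomized graphs simple.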

Construction of equally sized networks
Metabolic networks of bacteria with different lifestyles differ in size. To control for the effect of network size, we repeated the analysis on reduced networks with the same number of nodes. We constructed a set of equal-size networks (60 nodes each) by serially removing nodes with degree ≤ 2 (that is, breaking cycles and shortening linear pathways). Pearson's partial correlation between X and Y conditioned on Z measures the correlation between X and Y after discounting the correlations between X and Z and between Y and Z [1]. We computed the correlation between modularity and variability conditioned on network size (the number of metabolites).
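The serial pruning step can be sketched as follows, assuming an undirected adjacency dictionary. The text does not say in which order low-degree nodes are removed. As a hypothetical choice, this sketch removes the node with the smallest degree first, breaks ties by label, and stops early if no node of degree ≤ 2 remains.

```python
def prune_to_size(adj, target, max_degree=2):
    """Serially remove nodes of degree <= max_degree until `target` nodes remain.

    `adj` maps each node to the set of its neighbours (undirected graph).
    Returns a pruned copy; the input is left unchanged.
    """
    g = {u: set(vs) for u, vs in adj.items()}
    while len(g) > target:
        candidates = [u for u in g if len(g[u]) <= max_degree]
        if not candidates:
            break  # every remaining node is a hub; stop rather than remove it
        # Hypothetical ordering: lowest degree first, ties broken by label.
        u = min(candidates, key=lambda x: (len(g[x]), str(x)))
        for v in g.pop(u):
            g[v].discard(u)
    return g
```

Degrees are recomputed after every removal, so a node can become eligible once its neighbours have been pruned. This is what shortens linear pathways step by step.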

Figure S3
For this analysis, we grouped the species into two classes: bacteria with a low-variability lifestyle (Obligate, Specialized and Aquatic) and bacteria with a high-variability lifestyle (Facultative, Multiple and Terrestrial). The partial Pearson correlation between modularity and variability conditioned on network size is c = 0.24 (p = 0.02), so the correlation remains significant after controlling for network size. It is lower than the full correlation coefficient because variability itself appears to correlate with network size (Fig. 1b, main text).
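The first-order partial correlation can be computed from the three pairwise Pearson coefficients as r_XY·Z = (r_XY − r_XZ·r_YZ) / √((1 − r_XZ²)(1 − r_YZ²)). The sketch below assumes that modularity (X), variability (Y, hypothetically coded 0/1 for the two lifestyle classes) and network size (Z) are given as equal-length lists. It returns only the coefficient and does not compute the p-value.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
```

The formula gives the same value as correlating the residuals of X and Y after each has been linearly regressed on Z. This is one way to check an implementation.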

Relation between variability in the environment and other structural indices of the metabolic networks

Clustering coefficient
The clustering coefficient reflects local community structure [2]. The clustering coefficient of a node is the number of edges between its neighbors divided by the total number of possible such edges (treating the network as undirected). The clustering coefficient of the network is the mean over all nodes with degree > 1. We computed the clustering coefficient for each natural network and for its corresponding set of random networks, and obtained a normalized measure by dividing the value for the real network by the mean over the random networks. The normalized clustering coefficient increases significantly with the variability of the environment (Fig. S4a).
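The definition and the normalization above translate directly into code. This is a minimal sketch that assumes a symmetric adjacency dictionary (node to set of neighbours). The randomized networks passed to the hypothetical `normalized` helper are assumed to come from a degree-preserving randomization, which is not shown here.

```python
def clustering_coefficient(adj):
    """Mean local clustering coefficient over nodes with degree > 1."""
    values = []
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with degree <= 1 are excluded from the mean
        nb = list(nbrs)
        # Count edges that actually exist between pairs of neighbours of u.
        links = sum(1 for i in range(k) for j in range(i + 1, k) if nb[j] in adj[nb[i]])
        values.append(links / (k * (k - 1) / 2))
    return sum(values) / len(values) if values else 0.0


def normalized(measure, real_adj, random_adjs):
    """Value of `measure` on the real network divided by its mean over random networks."""
    rand_mean = sum(measure(g) for g in random_adjs) / len(random_adjs)
    return measure(real_adj) / rand_mean
```

For a triangle with one extra pendant node, the two triangle nodes of degree 2 each score 1 and the node of degree 3 scores 1/3. The pendant node is skipped, so the network value is 7/9.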

Betweenness centrality
The centrality of a node X is defined as the number of shortest paths between pairs of other nodes in the network that pass through X. The betweenness centrality of a network is the average centrality of all its nodes. We scaled this parameter by dividing it by the maximal value that can be obtained for a network of the same size [3]. Analysis of this measure shows that networks with a tree-like topology have higher betweenness than networks with cycles. Intuitively, in an acyclic graph every non-leaf node on the path between two sides of the tree must be visited when traveling from one side to the other. Adding shortcuts to the tree creates alternative pathways, and the centrality of each node decreases. We find that the normalized betweenness centrality is anticorrelated with variability (Fig. S4b).
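Node betweenness can be computed with Brandes' algorithm, shown here for unweighted, undirected graphs. When several shortest paths connect a pair, each path is counted fractionally, which is the usual convention. The text does not spell out the scaling of [3]. As an assumption, this sketch divides the mean node betweenness by (n − 1)(n − 2)/2, the value reached by the centre of a star on n nodes.

```python
from collections import deque


def node_betweenness(adj):
    """Brandes' algorithm: shortest paths between other node pairs passing through each node."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}  # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


def scaled_mean_betweenness(adj):
    """Mean node betweenness divided by the star-centre maximum (assumed scaling)."""
    n = len(adj)
    if n < 3:
        return 0.0
    max_bc = (n - 1) * (n - 2) / 2
    return sum(node_betweenness(adj).values()) / n / max_bc
```

On four nodes, a path gives 1/3 and a 4-cycle gives 1/6. In the cycle each pair of opposite nodes has two alternative shortest routes, so the load is split between them, which matches the intuition about shortcuts given in the text.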

Cyclic coefficient
The cyclic coefficient of a node is defined as the mean, over all pairs of its neighbors, of the inverse of the length of the shortest loop that passes through the node and both neighbors [4]. The cyclic coefficient of a network is the mean over all nodes with degree > 1. Networks without cycles, such as perfect trees, have a cyclic coefficient of zero.
Generally, tree-like networks are characterized by a low cyclic coefficient. We normalized this parameter by its mean value over randomized networks.
We find that the cyclic coefficient correlates with the variability of the environment (Fig. S4c).
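The shortest loop through node i and its neighbours j and k has length d(j, k) + 2, where d is the shortest-path distance between j and k in the network with i removed. Neighbour pairs with no such path contribute 0, which is why trees score zero. The sketch below follows this mean-of-inverse-loop-lengths form and assumes a symmetric adjacency dictionary. The helper names are hypothetical.

```python
from collections import deque
from itertools import combinations


def _dist_avoiding(adj, src, dst, banned):
    """BFS distance from src to dst that never enters `banned`; None if unreachable."""
    seen = {src, banned}
    queue = deque([(src, 0)])
    while queue:
        v, d = queue.popleft()
        for w in adj[v]:
            if w == dst:
                return d + 1
            if w not in seen:
                seen.add(w)
                queue.append((w, d + 1))
    return None


def cyclic_coefficient(adj):
    """Mean over nodes with degree > 1 of the mean inverse shortest-loop length."""
    values = []
    for i, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        total = 0.0
        for j, l in combinations(list(nbrs), 2):
            d = _dist_avoiding(adj, j, l, i)
            if d is not None:
                total += 1.0 / (d + 2)  # loop i -> j -> ... -> l -> i
        values.append(total / (k * (k - 1) / 2))
    return sum(values) / len(values) if values else 0.0
```

A ring of length n gives 1/n, so a triangle scores 1/3 and a square scores 1/4, while a star scores 0.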

Single-attribute analysis
We considered a set of possible factors that could explain the modularity level of an organism's metabolic network: …
(b) Each dot corresponds to a pair of bacteria. The y axis shows the difference in Qm and the x axis shows the evolutionary distance between the two bacteria.

Multi-way analysis of variance (n-way ANOVA)
Multi-way analysis of variance was performed to evaluate each putative explanatory factor while taking into account the effects of all other variables on the response (i.e. ∆Qm). Because phylogenetic distance is an attribute of a pair of species, we performed a pairwise analysis. For each pair we consider the following measures: ∆#(Partial)TF, ∆#Genes, phylogenetic distance and ∆Qm. The MATLAB function anovan gives a p-value for each explanatory variable; the lower the p-value, the stronger its association with the response variable. To obtain a distribution of p-values we performed a bootstrap procedure in which, in each of 1000 iterations, we sampled 100 pairs with replacement. We find that the number of transcription factors as a fraction of the total number of genes is the best predictor of modularity. Genome size is a less powerful predictor, and phylogenetic distance is a weak predictor.
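The bootstrap loop can be sketched in Python with NumPy and SciPy. This is a hypothetical stand-in for MATLAB's anovan, not a port of it. It assumes all explanatory variables are treated as continuous main effects and uses type III sums of squares (anovan's default), which for such a model reduce to a drop-one F test per predictor. The text does not confirm that anovan was called this way. The columns of X would hold the pairwise predictors (for example ∆#TF, ∆#Genes and phylogenetic distance) and y would hold ∆Qm; this data layout is also an assumption.

```python
import numpy as np
from scipy import stats


def drop_one_pvalues(X, y):
    """p-value of each column of X in a main-effects linear model with intercept.

    Each predictor is tested by an F test of the full model against the model
    without that predictor (a type III test for continuous main effects).
    """
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    df_resid = n - design.shape[1]

    def sse(M):
        beta, *_ = np.linalg.lstsq(M, y, rcond=None)
        resid = y - M @ beta
        return float(resid @ resid)

    sse_full = sse(design)
    pvals = np.empty(p)
    for j in range(p):
        reduced = np.delete(design, j + 1, axis=1)  # column 0 is the intercept
        f_stat = (sse(reduced) - sse_full) / (sse_full / df_resid)
        pvals[j] = stats.f.sf(f_stat, 1, df_resid)
    return pvals


def bootstrap_pvalues(X, y, n_samples=100, n_iter=1000, seed=0):
    """Distribution of p-values over bootstrap resamples of the species pairs."""
    rng = np.random.default_rng(seed)
    out = np.empty((n_iter, X.shape[1]))
    for it in range(n_iter):
        idx = rng.integers(0, len(y), size=n_samples)  # sampling with replacement
        out[it] = drop_one_pvalues(X[idx], y[idx])
    return out
```

Each row of the result is one bootstrap iteration. Comparing the column-wise distributions (for example their medians) ranks the predictors in the way the text describes.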
We also evaluated the influence of each attribute on environmental variability (Fig. S7).

Quantifying structure-function association in the metabolic networks
We tested the structural modules obtained with the Newman-Girvan algorithm for enrichment in metabolic functions. We scored the strength of the <Structure, Function> association with two measures:
• Functionality: the fraction of structural modules that are significantly enriched in at least one metabolic function.
• Coverage: the fraction of metabolic functions that are found to be enriched in at least one structural module in the network.
We found that both quantities correlate with the variability of the environment.
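The two measures can be computed once an enrichment test has been chosen. The text does not name the test. This sketch assumes a one-sided hypergeometric test with a fixed threshold (alpha = 0.05, no multiple-testing correction), with modules and metabolic functions both given as sets of reactions drawn from a universe of known size. All names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n items are drawn from N, of which K carry the annotation."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def functionality_and_coverage(modules, functions, universe_size, alpha=0.05):
    """modules, functions: dicts mapping a name to a set of items (e.g. reactions)."""
    enriched = set()
    for m, m_items in modules.items():
        for f, f_items in functions.items():
            k = len(m_items & f_items)
            if k and hypergeom_sf(k, universe_size, len(f_items), len(m_items)) < alpha:
                enriched.add((m, f))
    functionality = len({m for m, _ in enriched}) / len(modules)  # modules with >= 1 function
    coverage = len({f for _, f in enriched}) / len(functions)  # functions found in >= 1 module
    return functionality, coverage
```

Both measures are read off the same set of significant (module, function) pairs. Functionality projects that set onto modules, and coverage projects it onto functions.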

Network visualization for E. coli and Buchnera
To help understand the structure of different networks, it is useful to visualize networks of the same size. We therefore reduced a "varying environment" network in a manner that preserved its original topological properties. This step enables comparison of two equally sized networks from two different environmental groups (Fig. S9a,b). Please note that the procedure described below is meant only for ease of visualization, and is distinct from the ... We wish to compare these networks by eye with respect to their structural properties.
As described above, topological comparison between two networks of different sizes requires normalizing the structural indices over an ensemble of random networks. Although this is computationally simple, the difference is hard to capture by eye. As a preliminary step, we therefore reduce the larger network (with V1 nodes) by removing V1 − V2 nodes, where V2 is the size of the smaller network, while preserving the structural and functional information embedded in the original network. Breadth-first search (BFS), a common procedure for sampling connected subnetworks, is suitable for applications that involve local structural properties such as the clustering coefficient and network motifs [5], but it is not suitable for global properties such as modular organization. For example, consider the network of a varying-environment bacterium, composed of ~20 modules, from which we would like to remove 80% of the nodes. If we perform BFS from different starting points, the resulting network usually corresponds to only 1-3 modules of the original network. It therefore reflects neither the metabolic capabilities of the original network nor its pronounced modular design. Below we describe a method that exploits a special attribute of metabolic networks: their hierarchical modular organization, which reflects both the structural and the functional aspects we would like to maintain.
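The limitation of BFS sampling can be illustrated on a toy modular graph. The graph here is a hypothetical ring of 20 cliques of 10 nodes each, with consecutive cliques joined by a single edge, not a real metabolic network. Keeping 20% of the nodes by BFS from one start node retains only a small fraction of the modules.

```python
from collections import deque


def ring_of_cliques(n_modules, size):
    """Toy modular graph: cliques joined in a ring by one edge between neighbours."""
    adj, module_of = {}, {}
    for m in range(n_modules):
        nodes = [m * size + i for i in range(size)]
        for u in nodes:
            module_of[u] = m
            adj[u] = {v for v in nodes if v != u}
    for m in range(n_modules):
        u = m * size  # first node of module m
        v = ((m + 1) % n_modules) * size + 1  # second node of the next module
        adj[u].add(v)
        adj[v].add(u)
    return adj, module_of


def bfs_sample(adj, start, n_keep):
    """Keep the first n_keep nodes reached by breadth-first search from `start`."""
    seen, order, queue = {start}, [], deque([start])
    while queue and len(order) < n_keep:
        u = queue.popleft()
        order.append(u)
        for w in sorted(adj[u]):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return set(order)


adj, module_of = ring_of_cliques(20, 10)
sample = bfs_sample(adj, 0, 40)
modules_hit = {module_of[u] for u in sample}
```

In this toy case the 40 sampled nodes cover only a handful of the 20 modules. Keeping two nodes from each module would also give 40 nodes but would retain every module, which is the idea behind the module-wise reduction described next.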
Reduction procedure: In the present context we want to compare two networks of related bacteria with different lifestyles. The varying-environment network (of E. coli) is composed of …
Problem generalization: We can generalize the question as follows: … That is, we wish to construct a subnetwork that preserves the modular organization (Q*) of the complete network (constraint 3 will be fulfilled by our construction).
According to the Newman and Girvan approach, the modularity score of a network (Q) is the sum of the strengths of its modules. We can therefore reduce the problem of network reduction to the problem of reducing each module while preserving its strength (where the strength of a module is defined as its contribution to Q). Formally, given a module Mi(Vi, Ei) with strength Qi, we need to build a module Mi'(Vi', Ei'), with Vi' ⊂ Vi and Ei' ⊂ Ei, such that Qi' = Qi. Because Q = ∑Qi, this procedure preserves Q and thus the modular organization of the complete network. A module's strength can be viewed intuitively as the ratio between the number of edges within the module and the number of edges connecting it to other modules. This intuitive definition is the key to the present construction: by maintaining (at least approximately) the ratio of edges inside to edges outside for each module, we obtain a smaller network with the same community structure.
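Formally, the Newman-Girvan score of module i is Qi = e_ii − a_i², where e_ii is the fraction of all edges that lie inside module i and a_i is the fraction of all edge ends attached to it. The sketch below computes these per-module strengths, assuming an undirected adjacency dictionary and a node-to-module map (hypothetical names). Because Qi depends only on fractions, scaling every module's internal and external edge counts by a common factor leaves each Qi unchanged. This is the formal counterpart of the edge-ratio intuition above.

```python
def modularity_by_module(adj, module_of):
    """Newman-Girvan modularity split into per-module strengths.

    Q = sum_i Q_i with Q_i = e_ii - a_i**2, where e_ii is the fraction of edges
    inside module i and a_i is the fraction of edge ends attached to module i.
    `adj` is an undirected adjacency dict; `module_of` maps node -> module.
    """
    n_edges = sum(len(nbrs) for nbrs in adj.values()) / 2
    inside, ends = {}, {}
    for u, nbrs in adj.items():
        m = module_of[u]
        ends[m] = ends.get(m, 0) + len(nbrs)
        # Each internal edge is seen from both endpoints, hence the factor 0.5.
        inside[m] = inside.get(m, 0) + 0.5 * sum(1 for w in nbrs if module_of[w] == m)
    return {m: inside[m] / n_edges - (ends[m] / (2 * n_edges)) ** 2 for m in ends}


# Two triangles joined by one edge: Q_A = Q_B = 3/7 - 1/4, so Q = 5/14.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
mods = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q = modularity_by_module(g, mods)
```

Adding a second, disjoint copy of this graph with matching module labels doubles every edge count, and each per-module strength stays the same.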

Return Gi'
}
In practice, it may be hard to satisfy all the constraints imposed by the algorithm. Yet our experience suggests that even when some of the local constraints are relaxed, the resulting network exhibits a hierarchical organization very similar to that of the original. Future work may employ Monte-Carlo optimization approaches to solve this problem.
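The Monte-Carlo direction mentioned above could start from a simple random search. The following is offered only as a hypothetical sketch and is not the authors' algorithm. It samples the same fraction of nodes from every module, scores each candidate subnetwork by how close its Q (under the original partition) is to Q*, and keeps the best candidate. It does not enforce connectivity or any of the other constraints of the original procedure.

```python
import random


def modularity(adj, module_of):
    """Newman-Girvan Q = sum_i (e_ii - a_i**2) for an undirected adjacency dict."""
    n_edges = sum(len(nbrs) for nbrs in adj.values()) / 2
    if n_edges == 0:
        return 0.0
    inside, ends = {}, {}
    for u, nbrs in adj.items():
        m = module_of[u]
        ends[m] = ends.get(m, 0) + len(nbrs)
        inside[m] = inside.get(m, 0) + 0.5 * sum(1 for w in nbrs if module_of[w] == m)
    return sum(inside[m] / n_edges - (ends[m] / (2 * n_edges)) ** 2 for m in ends)


def induced(adj, keep):
    """Subgraph induced by the node set `keep`."""
    return {u: adj[u] & keep for u in keep}


def reduce_network(adj, module_of, fraction, n_trials=200, seed=0):
    """Random search: keep ~`fraction` of each module, return the subnetwork with Q closest to Q*."""
    rng = random.Random(seed)
    q_star = modularity(adj, module_of)
    members = {}
    for u, m in module_of.items():
        members.setdefault(m, []).append(u)
    best, best_gap = None, float("inf")
    for _ in range(n_trials):
        keep = set()
        for nodes in members.values():
            k = max(1, round(fraction * len(nodes)))  # same fraction from every module
            keep.update(rng.sample(sorted(nodes), k))
        sub = induced(adj, keep)
        gap = abs(modularity(sub, module_of) - q_star)
        if gap < best_gap:
            best, best_gap = sub, gap
    return best, q_star, best_gap
```

Sampling per module, rather than by BFS, guarantees that every module of the original network is represented. A more complete Monte-Carlo version might also accept or reject single-node moves based on the change in |Q − Q*|.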