- Research article
- Open Access

# Convergent evolution of modularity in metabolic networks through different community structures

- Wanding Zhou
^{1}Email author and - Luay Nakhleh
^{2}Email author

**12**:181

https://doi.org/10.1186/1471-2148-12-181

© Zhou and Nakhleh; licensee BioMed Central Ltd. 2012

**Received:**15 May 2012**Accepted:**9 August 2012**Published:**14 September 2012

## Abstract

### Background

It has been reported that the modularity of metabolic networks of bacteria is closely related to the variability of their living habitats. However, given the dependency of the modularity score on the community structure, it remains unknown whether organisms achieve certain modularity via similar or different community structures.

### Results

In this work, we studied the relationship between similarities in modularity scores and similarities in community structures of the metabolic networks of 1021 species. Both similarities are then compared against the genetic distances. We revisited the association between modularity and variability of the microbial living environments and extended the analysis to other aspects of their life style such as temperature and oxygen requirements. We also tested both topological and biological intuition of the community structures identified and investigated the extent of their conservation with respect to the taxomony.

### Conclusions

We find that similar modularities are realized by different community structures. We find that such convergent evolution of modularity is closely associated with the number of (distinct) enzymes in the organism’s metabolome, a consequence of different life styles of the species. We find that the order of modularity is the same as the order of the number of the enzymes under the classification based on the temperature preference but not on the oxygen requirement. Besides, inspection of modularity-based communities reveals that these communities are graph-theoretically meaningful yet not reflective of specific biological functions. From an evolutionary perspective, we find that the community structures are conserved only at the level of kingdoms. Our results call for more investigation into the interplay between evolution and modularity: how evolution shapes modularity, and how modularity affects evolution (mainly in terms of fitness and evolvability). Further, our results call for exploring new measures of modularity and network communities that better correspond to functional categorizations.

## Keywords

- Gene Ontology
- Community Structure
- Metabolic Network
- Community Detection
- Oxygen Requirement

## Background

*in vivo*and in simulation, to have risen from fluctuating environments [8, 9]. An alternative explanation can be formulated from the other direction: because species with a higher modularity in their metabolic networks are more capable of adapting to changes in environment, they colonize a wider range of habitats, giving rise to the observation that bacteria living in varying habitats have more modular metabolic networks (edge 2 in Figure 1). In another recent study of an Archaea data set [10], such relationship between modularity and habitat variability was not found, which calls for more investigation of alternative explanations.

Modularity as a graph-theoretic concept, when studied on biological networks, can be quantified in different ways [6, 11–15]. In the works of Parter et al. [6] and Kreimer et al. [15], modularity is based on the definition of Newman and Girvan [16]. This definition quantifies the extent to which the graph connectivity of a network exhibits a modular structure, that is, communities with a majority of the connections falling within, rather than across, communities. Roughly speaking, the modularity score *Q* [16] (see Methods), which is a quantity associated with a partition of the network, indicates how much more likely it is for an edge to be placed inside a community from that partition than would be expected from a random selection of neighbors for a node of a certain degree. The partition of nodes that gives rise to the maximum *Q* value is regarded as the community structure of the graph, and the score itself is taken to be the graph’s modularity.

Although the modularity score depends on the community structure, similar modularity scores may arise from different community structures. It is natural to ask (and is currently unknown) whether a specific modularity (high or low) of metabolic networks is the result of acquiring a similar community structure or of achieving different community structures. More specifically, assuming that network modularity plays an adaptive role [17], as is the case for the first explanation (Figure 1), is it the modularity score that confers higher fitness regardless of the community structure giving rise to it, or is it the community structure that is the unit of selection and modularity is conserved only as a consequence? If modularity is achieved via similar community structures, it might be the community structure that is the unit of selection under different environments. That said, any observed association of modularities with the environmental features [6, 15] or growth conditions [10] would naturally give rise to a question as to whether such a correlation arises due to similar community structures (which, by definition, would have similar modularity scores) or different community structures with similar modularity scores.

In this work, we analyzed metabolic networks of species spanning three kingdoms of life by computing their community structures and modularity scores (see Methods for details on metabolic network reconstruction). We compared the difference in community structures against the difference in modularities and the genetic distance, to investigate the correlation, or lack thereof, among the three. The results suggest that the difference in community structures does not parallel the difference in modularity scores we compute, except when community structures are extremely similar. That is, we find that larger community structure differences do not necessarily mean larger differences in modularity scores and vice versa, which is an indication of convergent evolution of modularities via different underlying community structures. To further understand the evolutionary driving force behind such convergent evolution, we revisited the analysis of Parter et al. [6], which first associated modularity with habitat variability, but under different aspects of the microbial life styles, including temperature preference and oxygen requirement. We also confirmed the finding of Kreimer et al. [15] that the size of the metabolome (the number of enzymes) is a major determinant of the modularity score, even after the score is normalized and believed to be size-independent on general (non-metabolic) networks.

From a computational perspective, a contribution of this paper is an improved heuristic based on spectral decomposition for modularity optimization [18] using a self-organizational merge and resplit refinement. The goal of this improvement is to deterministically identify more optimal modularity scores and community structures efficiently. We show, on well-studied benchmark data sets, that compared to the original algorithm of Newman [18] and some other existing algorithms [16, 19–21], our algorithm achieves higher *Q* scores at the cost of only a moderate increase in time.

## Results

### Community structure differences do not parallel the modularity differences

Previous studies have shown the association of modularity of metabolic networks with variability of the living environment of species [6] and the bacterial life style [15]. However, it remains unclear whether or not this association is a consequence of any further association with the underlying community structure. In other words, the relation between the living environment and modularity might be a consequence of the habitats’ association with the community structure. To answer this question, we investigate whether for a similar modularity score there exist multiple distinct community structures in metabolic networks of different species.

When the community structures are similar (roughly < 0.2), their modularity scores must be similar. Such dependency is expected from the definition of modularity. Beyond 0.2 in the difference of community structures, modularities vary significantly, from very similar to very different, despite different community structures. In other words, the same modularity score may be achieved via different community structures. Such convergence at the modularity level takes place mostly between bacteria and eukaryota, though also happening between species within the same kingdom, as indicated by the green and blue dots on the bottom right corner of the left panel of Figure 2. To further explore this relationship between modularity scores and community structures on metabolic networks, we plotted the distribution of modularity scores for each community structure cluster (Figure 2) obtained through hierarchical clustering (see Methods). In the right panel of Figure 2, we see that most community structure clusters span many bins of modularities and for each bin of modularity scores, community structures from different clusters can be discerned. This indicates that similar modularity scores found on metabolic networks can stem from different community structures.

### Convergent evolution of modularity scores

### Convergent evolution of modularity is driven by life style

Knowing that similar modularity may be achieved independently via different community structures, we revisit the question of what drives the convergent evolution of modularity. We studied several factors ranging from the size of the metabolome (the number of enzymes and the size of the network under the current choice of network semantics) to environmental factors that include temperature preferences and oxygen requirements.

#### Network size remains a determinant of normalized modularity on enzyme networks

*r*= 0.85,

*p*= 2.0 × 10

^{−282}in the normalized case and

*r*= 0.80,

*p*= 2.6 × 10

^{−229}in the unnormalized case). We also see that species with a reduced metabolome (such as those under the clade of Mollicutes and Rickettsiales) possess smaller modularities in their metabolic networks (see Discussion), which is consistent with our observation here. The dependence of modularity on the number of enzymes is sensitive to rewiring (see Additional file 1: Figure S3). It is worth mentioning that similar correlation is seen on: 1) synthetic linear graphs (graphs composed of nodes linearly concatenating each other); see Additional file 1: Figures S4; and S2) the line graph transformations [23] of rewired compound networks with currency metabolites deleted; see Additional file 1: Figure S5, implying that their resemblance to the organization of metabolic networks may explain the dependence of Newman’s modularity on the sizes of the network.

#### The association of the environmental variability with the modularity is a consequence of its association with the number of enzymes

*p*= 4.6 × 10

^{−10}). Moreover, this refinement is not perfect (for example, the smallest facultative species

*B. burgdorferi*is often described as obligate [24, 25] and the second largest obligate species

*R. Baltica*in the data set is in fact free-living marine bacteria [26]). Therefore, the difference in the number of enzymes between facultative species and obligate species could in fact be more striking.

It is conceivable that microbes capable of coping with a varying and open habitats have a larger metabolome and microbes that lead specialized lifestyles have a smaller metabolome. An extreme case is that bacteria leading an obligate lifestyle has a reduced metabolome. One explanation of this phenomenon is that unnecessary genes for living in a specialized niche that only increase the overhead of maintenance were lost during evolutionary history [27–30]. For example, the *γ*-proteobacteria *B. aphidicola* lack the genes for the synthesis of tryptophan, riboflavin, fatty acids and phospholipids due to its endosymbiosis with aphids [31, 32]. Here we see that the numbers of enzymes of 8 insect endosymbionts in *γ*-proteobacteria are significantly smaller than the other species in our dataset (one-tailed Wilcoxon rank-sum test *p* = 1.1 × 10^{−6}). Even the largest of these endosymbionts (*B. pennsylvanicus*, 366 enzymes) has a smaller metabolome than the smallest non-endosymbiont (*D. nodosus*, 459 enzymes). Modularity scores of endosymbionts are also significantly smaller than non-endosymbionts (one-tailed Wilcoxon rank-sum test, *p* = 2.5 × 10^{−6}).

To study whether habitat variability truly affects the modularity of the metabolic networks besides the effect of the number of enzymes, we binned the species into groups with the number of enzymes in bins ranging within at most 50 enzymes. Out of 24 bins from 100 to 820 with the number of enzymes incrementing by 30, 16 bins contain at least two categories of species each of which has more than 10 members. Only in 4 of these 16 bins (310∼340, 430∼460, 490∼520, 520∼550) habitat variability significantly (Kruskal-Wallis H-test *p* < 0.05) affects the network modularity. This fact shows that most of the seeming dependence of modularity on the habitat variability may disappear if the number of enzymes is controlled.

#### The order of modularity is the same as the order of the number of enzymes under the classification based on temperature preference but not oxygen requirement

By comparing the modularities against the oxygen requirements of the species (bottom row of Figure 6), we find that facultative bacteria have the highest modularity. Microaerophilic bacteria have the least modularity. Facultative bacteria are ones that normally utilize oxygen as their electron receptor but can also ferment other endogenous electron receptors such as ethanol and lactate. On the contrary, microaerophiles have the most strict requirement for oxygen. For them, oxygen is not only a requirement for survival, but the concentration of oxygen must also be lower than what is present in the atmosphere. If environmental variability should explain the difference in modularity, the flexibility in oxygen usage, as one way of reflecting environmental variability, supports such explanation: facultative bacteria have higher modularity than strictly aerobic and strictly anaerobic bacteria. And strictly aerobic bacteria have higher modularity than microaerophiles. There is no significant difference in modularity between anaerobic bacteria and microaerophiles (two tailed Wilcoxon rank-sum test *p* = 0.40, same result for the number of enzymes, *p* = 0.57). However, bacteria that are capable of freely metabolizing oxygen (facultative joined with aerobic) have significantly (one tailed Wilcoxon rank-sum test *p* = 5.5 × 10^{−26}) higher modularities than those who have limited capability of handling oxygen or have to rely on fermentation (microaerophiles joined with anaerobic). The same result is obtained when the number of enzymes are compared (*p* = 1.1 × 10^{−35}). Comparison between only strictly aerobic microbes against strictly anaerobic microbes also indicates statistical significance (one tailed Wilcoxon rank-sum test *p* = 6.9 × 10^{−16} in modularities and *p* = 1.2 × 10^{−30} in the numbers of enzymes). Facultative bacteria have significantly higher modularities than strictly aerobic bacteria (one tailed Wilcoxon rank-sum test *p* = 0.0025). However, a null hypothesis is accepted when it comes to the number of enzymes (one tailed Wilcoxon rank-sum test *p* = 0.18), meaning that the significant difference in modularity between facultative bacteria and strictly aerobic bacteria is not a consequence of the difference in the numbers of the enzymes.

## Discussion

### Modularity-based communities are topologically meaningful yet do not reflect biological functional classifications

Despite the existing studies on the modularity of metabolic networks and reported limitation in modularity-based community detection such as the resolution limit [33] (optimizing the modularity score might fail to detect small communities), the non-locality [34] (the local delineation of a community depends on the global network connectivity) and the extreme degeneracy [35] (there might exist multiple optimal/suboptimal community structures), it remains unclear whether, in this specific case of metabolic networks, modularity-based communties reflect the graph-theoretic intuition of a community structure.

*k*

^{ in }) be greater than the number of neighbors of the node from different communities (

*k*

^{ out }). The definition in a weaker sense only require the sum of

*k*

^{ in }be greater than the sum of the

*k*

^{ out }over all nodes in a community. We computed the

*k*

^{ in },

*k*

^{ out }for all the nodes in the metabolic network of

*E.coli*. We find that the partitions obtained via modularity optimization satisfy the weaker definition (see Figure 7). Most communities also satisfy the strong definition (Additional file 1: Figure S6). In all the 10 nodes in

*E.coli*that break the definition in the strong sense, the connections to nodes from the same community outnumber the connections to any one of the other communities to which the node does not belong (even though the sum of outward connections is greater). This explains why these nodes are not classified into any of the other communties. These 10 nodes consist of 2 oxidoreductase, 6 transferase and 2 lyases. No particular preferences of pathway participation from these exceptions was observed.

In order to test the extent of the resolution limit of the modularity based community detection on metabolic networks, we computed densely connected subgraphs using the SIDES program [37]. As shown in the right panel of Figure 7, most of the densely connected subgraphs are contained in the same communities, which is a rough indication of the exemption from the resolution limit.

Despite these findings, the definition of modularity we use might still be problematic when applied to linear/sparse graphs. As we show in Additional file 1: Figure S4, the longer the linear graph, the higher its modularity, which is problematic given that two line graphs should be intuitively considered equally modular regardless of their lengths.

Another crucial question in studying the modularity of metabolic networks is whether the communities detected carry any functional meaning (in the biological sense). Intuitively, modularity or density-based methods would not identify linear, or more generally sparse, pathways. To answer this question, we investigate the functional meaning of communities computed on the metabolic network of *E.coli*. We find that these communities have limited specificity to partitions based on biological functions.

We studied the Gene Ontology (GO) annotation of the genes that transcribe the enzymes in the *E.coli* network using GS^{2} [40], a measure that quantifies the similarity of GO terms among a group of genes. In order to tell whether enzymes inside the same community have a similar ontology, we ran GS^{2} on genes that are annotated to transcribe enzymes belonging to the same community. We find that genes inside the same community have a higher similarity of GO annotations than the same number of genes but randomly selected from the gene pool of the organism (right panel of Figure 8). Following Bauer et al. [41], we test whether a community is functionally significant by whether there is a significant enrichment of any GO term. The GO specificity is calculated by dividing the extent of overlap between the GO term and the community by the total number of genes that have that GO term in *E.coli*. The community specificity is calculated by dividing the extent of overlap between the GO term with the community by the number of genes that transcribe the enzymes in the community. GO-community pairs where the GO term significantly annotates the community are isolated (tested against the hypergeometric distribution with Bonferroni correction for multiple comparisons, *α* = 4.7 × 10^{−5} [42]). In spite of many GO-community annotations with significant p-values, no clear 1-1 correspondence between GO terms and community structures is seen (Additional file 1: Figure S8). This suggests that the GO similarity among genes inside the same community might result from their closer distance on the network, assuming genes inside a community are closer on the network and nodes closer on the network are more likely to share GO annotations.

### Community structures are only kingdom-specific

*enzyme profiles*, or the spectra of all enzymatic activities as are characterized by the sets of Enzyme Commission (EC) numbers, among species from the same kingdom. As is shown in Figure 9 where we label on each branch the number of enzymes appearing exclusively in the descendants of the branch (an indication of metabolic innovation specific to the lineage), both bacteria and eukaryotes have their characteristic metabolic capabilities (465 and 625 respectively) while archea tend to share their metabolic capabilities with species from other kingdoms (14 unique enzymes).

Due to the independence of enzyme-reaction relationship from the choice of the species, enzyme profiles directly determine the connectivity, and hence the community structure of the metabolic networks. Any difference in the community structure is a result of some difference in the enzyme profile. To see whether different enzyme profiles would generate similar community structures, we cluster the species by their enzyme profiles using Unweighted Pair Group Method with Arithmetic Mean (UPGMA) [43]. We find that the clusters based on enzyme profiles agree to a substantial degree to the clusters based on community structures (third and fourth tracks from the outer rim in Figure 9).

## Conclusions and prospects

In this paper, we conducted an evolutionary analysis of metabolic network modularity in order to explore whether it is the network modularity or the community structure on which the modularity score is based, is the unit of selection. We showed that modularities undergo convergent evolution via different community structures. Further we revisited the association of the modularity score to environmental variability and extended the analysis to other aspects of microbial life styles. We found that on enzyme networks, the number of enzymes, which is also the size of the network and could also indicate the size of the metabolome, might be a determinant of the observed association between modularity and environmental variability. Further, we identified a strong association between network modularity and the microbe’s temperature and oxygen requirements. We also found that modularity-based community structure does not correspond to biological functional classifications and is conserved only at the kingdom level.

An important confounding factor with metabolic network analysis is the network semantics, or what the nodes of the network represent and how the network is reconstructed. Previous studies have been based on different reconstructions and network semantics; for example, Parter et al. [6] considered networks with nodes representing metabolites while Kreimer et al. [15] considered networks with nodes representing enzymes. In order for the results to be comparable, we considered in this work four different alternatives (see Data). We found that the same analysis on different network reconstructions can lead to qualitatively different conclusions. For example, the correlation of modularity to the number of enzymes is only true for enzyme networks (Figure 4 and Additional file 1: Figure S9) but not for compound networks (Additional file 1: Figure S10 and Figure S11). For compound networks, we find a significant difference in normalized modularity among different groups but no clear association between modularity and habitat variability (Additional file 1: Figures S12 and S13) in contrast with enzyme networks (Additional file 1: Figure S14). We cannot repeat the association of network modularity to the environmental variability on compound network with currency metabolites deleted, as reported in Parter et. al. [6]. Our result is consistent with a more recent analysis on an Archaean data set where no association was found either [10]. Discrepancy might result from the differences in the network reconstruction, algorithm used to optimize modularity or data used (due to different database releases). Despite different network semantics, it remains consistent that normalized modularity is significantly different among the groups classified by temperature requirements while not as significantly different among the groups classified by the oxygen requirements (Additional file 1: Figures S15, S16 and S17) and that modularity scores are achieved via distinct community structures (Additional file 1: Figure S18, S19 and S20).

Our work calls for more biologically meaningful definitions of the modularity for metabolic networks. Modules under such definition might not be graph-theoretically intuitive. Density-based definitions do not describe well pathways and sparse graphs which seem to be ubiquitous in biological systems (e.g., a biochemical pathway may be very sparse and does not fit the definition of a graph-theoretic module). Another drawback from defining modularity as a graph-theoretic concept in metabolic networks is that metabolic systems are inherently hypergraphs instead of standard graphs [44]. Adopting the graph-theoretic definition of modularity imposes a graph representation onto the metabolic system. Thus our work also calls for more careful scrutiny on the recent results related to the adaptive roles on modularity scores and their association with biological phenotypes. Adaptive roles should be explained under specific network reconstruction and care should be taken when one makes generalized conclusions.

## Methods

### Community detection and modularity

*V*and set of edges

*E*, the

*Q*score is defined as a function of a partition $\mathcal{P}$ of

*V*,

where *e*_{
ii
}is the fraction of edges in community *i* (over all edges in the network) and *a*_{
i
} is the fraction of edges that are incident on a node in community *i*. The highest *Q* score attained over all possible partitions, $arg\underset{\mathcal{P}}{max}Q\left(\mathcal{P}\right)$, is defined as the network’s modularity. Two communities are neighbors if there is an edge connecting any pair of their members, i.e., *C*_{
i
} is a neighbor of *C*_{
j
} if there is some *p*∈*C*_{
i
} and *q*∈*C*_{
j
} such that (*p* *q*)∈*E*. Several algorithms have been devised to estimate the modularity together with its corresponding community structure; see [45] for a review. In this work, we improve the algorithm of Newman [18] to optimize the modularity score. The improvement is achieved by global merge and resplit and is given in Algorithm 1.

#### Algorithm 1

Merge-Resplit

**Input** : Graph *g*=(*V*,*E*).

**Output**: A partition $\mathcal{P}$to maximize

*Q*.

- 1.
$\mathcal{P}$= RecursiveBipart(

*V*,*E*); - 2.
**do** - 3.
**for**${C}_{i},{C}_{j}=\text{neighbors in}\phantom{\rule{.3em}{0ex}}\mathcal{P}$**do** - 4.
${C}_{\mathit{\text{merge}}}={C}_{i}\cup {C}_{j};$

- 5.
${\mathcal{P}}^{\prime}$ = RecursiveBipart(

*C*_{ merge },*E*); - 6.
**foreach***v*∈*C*_{ merge }**do** - 7.
$S\left(v\right)=\left\{\begin{array}{rl}1& \text{if}v\in {C}_{i}\\ -1& \text{if}v\in {C}_{j}\end{array}\right.;$

**end**

- 8.
${\mathcal{P}}^{\mathrm{\prime \prime}}$ = KirnighanLin(

*C*_{ merge },*E*,*S*); - 9.
$\mathcal{P}={\text{argmax}}_{\mathcal{P}\in \{{\mathcal{P}}^{\prime},{\mathcal{P}}^{\mathrm{\prime \prime}}\}}Q\left(\mathcal{P}\right);$

**end**

**while**$\mathcal{P}$is varying;

- 10.
**return**$\mathcal{P}$

*C*

_{ i },

*C*

_{ j }), if we define Q as a quadratic product of graph Laplacian

*L*and the membership vector

*S*(as defined in line 6).

Optimal Q is achieved by finding *S* with the leading eigenvalue of *L*. Eigen problems are solved using shifted power method. Each step in KirnighanLin procedure both on line 8 and inside RecursiveBipart (following Newman [18]) optimizes the boundary of two communities by greedily swapping a pair of nodes whose exchange results in the largest increase in *Q*. The intermediate state with the highest *Q* is returned.

After the initial decomposition from RecursiveBipart, each pair of communities thus obtained are merged and fed again into RecursiveBipart, whose spectral property guarantees that the computed partition, which might contain one, two or more subsets, yields no lower *Q*. The new partition obtained is compared with a partition obtained by directly applying the KirnighanLin procedure to the boundary between the two original communities. The partition that gives rise to the larger *Q* is kept. This is to ensure the new partition will lead to better optimization than the current one. Such merge-resplit process continues until the partition no longer varies after completely traversing the boundaries between all pairs of the neighboring communities, thereby reaching a self-organized state (a state in which boundaries between any two neighboring communities can not be further improved). The modified algorithm outperforms the existing deterministic algorithms and some computationally heavy stochastic methods, in maximizing *Q*, as is shown in Additional file 1: Table S2 (see Additional file 1: Table S3 for the computation time at each benchmark data set). A C implementation of the improved algorithm is available at http://www.bioinfo.cs.rice.edu/.

#### Normalized modularity

where *M* is the number of communities in the real network and *Q*_{rand} is the mean Q value of randomized networks. To determine the number of rewiring operations in computing *Q*_{rand}, we use the leveling of global clustering coefficient [48] of the network as the signal for convergence. For each edge semantics, the number of rewiring operations required to make level the global clustering coefficient of the largest network is used for all species when we rewire its metabolic network of the particular edge semantics (see Additional file 1: Figure S21). Each rewiring operation involves swapping the ends of two randomly chosen edges. This process keeps the networks’ degree distribution. Alternative null models can involve the constraint of the number of short cycles. We do not consider the constraint due to difficulty in identifying all the cycles and ambiguity in determining the length of the cycles constrained.

### Mutual information

*N*

_{ i }is the number of nodes that belong to set $i\in \mathcal{A}$ and

*N*is the total number of nodes common to both networks. The joint entropy is defined as,

and *N*_{
ij
} is the number of nodes that belong to both set $i\in \mathcal{A}$ and set $j\in \mathcal{B}$.

### Clustering of community structures

We cluster the community structures by using hierarchical clustering (nearest point algorithm) implemented in the open source SciPy [50] package. The distance between any two networks is 1−*MI* where *MI* is the mutual information between their community structures. Clusters are flattened by looking for largest sets of individuals such that the pairwise distance among its members are within a chosen threshold based on inspection. The threshold used is 0.7. The clusters of species by community structure similarity are listed in Additional file 3.

### Data

We obtained manually annotated metabolic networks of 1021 species from the KEGG database [39] (see Additional file 1: Figure S22 for a summary of enzymatic annotations and Additional file 4 for a summary of organisms). The networks were assembled following Kreimer et al. [15]. Reaction direction information was extracted from the pathway KGML file provided by KEGG. Altogether there are 3548 KEGG reactions with direction identified, leaving 4635 reactions denoted as reversible. From these data, we assembled four types of networks using four different semantics, namely, compound networks where nodes are metabolites, enzyme networks where nodes are enzymes, compound networks with currency deletion where nodes are metabolites and connections are pruned as in [51, 52], and enzyme networks with currency link deletion where nodes are enzymes and connections are pruned as in [15]. Analyses shown in this work are of enzyme networks with currency link deletion unless stated otherwise. The species’ habitat variability, temperature preferences and oxygen requirements are obtained from NCBI Genome Project Organisms Info Tab (http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi).

To conduct an evolutionary analysis of the data, we make use of the phylogeny, both branching pattern and branch lengths with branch lengths measuring sequence divergence in the unit of the number of subtitutions per site, inferred by [53]. Out of the 1021 species, only 56 appear in this phylogeny. Therefore, when we compare the community structures and modularity scores against genetic distances, only the 56 species shared by the phylogeny are used. The genetic distance between any pair of species is defined as the sum of the lengths of the branches on the path between the two species on the species phylogeny.

## Author’s contributions

All authors contribute equally. Both authors read and approved the final manuscript.

## Declarations

### Acknowledgements

This work was supported in part by NSF grant CCF-0622037 and an Alfred P. Sloan Research Fellowship. The contents are solely the responsibility of the authors and do not necessarily represent the official views of the NSF or the Alfred P. Sloan Foundation. We thank two anonymous reviewers for extensive comments that helped improve the manuscript significantly.

## Authors’ Affiliations

## References

- Yang AS: Modularity, evolvability, and adaptive radiations: a comparison of the hemi- and holometabolous insects. Evol Dev. 2001, 3 (2): 59-72. 10.1046/j.1525-142x.2001.003002059.x. [http://www.ncbi.nlm.nih.gov/pubmed/11341675], [http://doi.wiley.com/10.1046/j.1525-142x.2001.003002059.x],PubMedView ArticleGoogle Scholar
- Hansen TF: Is modularity necessary for evolvability?. Biosystems. 2003, 69 (2-3): 83-94. 10.1016/S0303-2647(02)00132-6. [http://dx.doi.org/10.1016/S0303-2647(02)00132-6],PubMedView ArticleGoogle Scholar
- Griswold CK: Pleiotropic mutation, modularity and evolvability. Evol Dev. 2006, 8: 81-93. 10.1111/j.1525-142X.2006.05077.x. [http://www.ncbi.nlm.nih.gov/pubmed/16409385],PubMedView ArticleGoogle Scholar
- Brookfield JFY: Evolution and evolvability: celebrating Darwin 200. Biol Lett. 2009, 5: 44-46. 10.1098/rsbl.2008.0639. [http://rsbl.royalsocietypublishing.org/cgi/content/abstract/5/1/44],PubMedPubMed CentralView ArticleGoogle Scholar
- Hintze A, Adami C: Evolution of complex modular biological networks. PLoS Comput Biol. 2008, 4 (2): e23-10.1371/journal.pcbi.0040023. [http://dx.plos.org/10.1371/journal.pcbi.0040023],PubMedPubMed CentralView ArticleGoogle Scholar
- Parter M, Kashtan N, Alon U: Environmental variability and modularity of bacterial metabolic networks. BMC Evolutionary Biol. 2007, 7: 169-10.1186/1471-2148-7-169. [http://www.biomedcentral.com/1471-2148/7/169],View ArticleGoogle Scholar
- Holme P: Metabolic robustness and network modularity: a model study. PloS one. 2011, 6 (2): e16605-10.1371/journal.pone.0016605. [http://dx.plos.org/10.1371/journal.pone.0016605],PubMedPubMed CentralView ArticleGoogle Scholar
- Harrison R, Papp B, Pál C, Oliver SG, Delneri D: Plasticity of genetic interactions in metabolic networks of yeast. Proc Nat Acad Sci USA. 2007, 104 (7): 2307-2312. 10.1073/pnas.0607153104. [http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1892960&tool=pmcentrez&rendertype=abstract],PubMedPubMed CentralView ArticleGoogle Scholar
- Soyer OS, Pfeiffer T: Evolution under fluctuating environments explains observed robustness in metabolic networks. PLoS Comput Biol. 2010, 6 (8): [http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2928748&tool=pmcentrez&rendertype=abstract],Google Scholar
- Takemoto K, Borjigin S: Metabolic Network Modularity in Archaea Depends on Growth Conditions. PLoS ONE. 2011, 6 (10): e25874-10.1371/journal.pone.0025874. [http://dx.plos.org/10.1371/journal.pone.0025874],PubMedPubMed CentralView ArticleGoogle Scholar
- Peregrín-Alvarez JM, Sanford C, Parkinson J: The conservation and evolutionary modularity of metabolism. Genome biol. 2009, 10 (6): R63-10.1186/gb-2009-10-6-r63. [http://dx.doi.org/10.1186/gb-2009-10-6-r63], [http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2718497&tool=pmcentrez&rendertype=abstract],PubMedPubMed CentralView ArticleGoogle Scholar
- Peregrín-Alvarez JM, Xiong X, Su C, Parkinson J: The Modular Organization of Protein Interactions in Escherichia coli. PLoS Comput Biol. 2009, 5 (10): e1000523-10.1371/journal.pcbi.1000523. [http://dx.doi.org/10.1371%2Fjournal.pcbi.1000523] [http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2739439&tool=pmcentrez&rendertype=abstract],PubMedPubMed CentralView ArticleGoogle Scholar
- Ten Tusscher KH, Hogeweg P: Evolution of Networks for Body Plan Patterning; Interplay of Modularity, Robustness and Evolvability. PLoS Comput Biol. 2011, 7 (10): e1002208-10.1371/journal.pcbi.1002208. [http://dx.plos.org/10.1371/journal.pcbi.1002208],PubMedPubMed CentralView ArticleGoogle Scholar
- Dagan T, Artzy-Randrup Y, Martin W: Modular networks and cumulative impact of lateral transfer in prokaryote genome evolution. Proceedings of the National Academy of Sciences of the United States of America. 2008, 105 (29): 10039-10044. 10.1073/pnas.0800679105. [http://dx.doi.org/10.1073/pnas.0800679105] [http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2474566&tool=pmcentrez&rendertype=abstract],PubMedPubMed CentralView ArticleGoogle Scholar
- Kreimer A, Borenstein E, Gophna U, Ruppin E: The evolution of modularity in bacterial metabolic networks. Proceedings of the National Academy of Sciences. 2008, 105 (19): 6976-6981. 10.1073/pnas.0712149105. [http://www.pnas.org/content/105/19/6976.abstract],View ArticleGoogle Scholar
- Newman M, Girvan M: Finding and evaluating community structure in networks. Physical Rev E. 2004, 69 (2): 26113-[http://link.aps.org/abstract/PRE/v69/e026113] [http://link.aps.org/doi/10.1103/PhysRevE.69.026113],View ArticleGoogle Scholar
- Wagner G, Altenberg L: Complex Adaptations and the Evolution of Evolvability. Evolution. 1996, 50 (3): 967-976. 10.2307/2410639.View ArticleGoogle Scholar
- Newman MEJ: Modularity and community structure in networks. Proc Nat Acad Sci. 2006, 103 (23): 8577-8582. 10.1073/pnas.0601602103. [http://www.pnas.org/content/103/23/8577.abstract],PubMedPubMed CentralView ArticleGoogle Scholar
- Clauset A, Newman MEJ, Moore C: Finding community structure in very large networks. Physical Rev E. 2004, 70 (6): 66111-[http://link.aps.org/abstract/PRE/v70/e066111],View ArticleGoogle Scholar
- Duch J, Arenas A: Community detection in complex networks using extremal optimization. Physical Rev E. 2005, 72 (2): 27104-[http://prola.aps.org/abstract/PRE/v72/i2/e027104] [http://link.aps.org/doi/10.1103/PhysRevE.72.027104],View ArticleGoogle Scholar
- Agarwal G, Kempe D: Modularity-maximizing graph communities via mathematical programming. Eur Phys J B. 2008, 66 (3): 409-418. 10.1140/epjb/e2008-00425-1. [http://dx.doi.org/10.1140/epjb/e2008-00425-1],View ArticleGoogle Scholar
- Kashtan N, Alon U: Spontaneous evolution of modularity and network motifs. Proc Nat Acad Sci USA. 2005, 102 (39): 13773-13778. 10.1073/pnas.0503610102. [http://www.pnas.org/content/102/39/13773.abstract] [http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1236541&tool=pmcentrez&rendertype=abstract],PubMedPubMed CentralView ArticleGoogle Scholar
- Nacher JC, Ueda N, Yamada T, Kanehisa M, Akutsu T: Clustering under the line graph transformation: application to reaction network. BMC Bioinformatics. 2004, 5: 207-10.1186/1471-2105-5-207. [http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=545960&tool=pmcentrez&rendertype=abstract],PubMedPubMed CentralView ArticleGoogle Scholar
- Lawrence KA, Jewett MW, Rosa PA, Gherardini FC: Borrelia burgdorferi bb0426 encodes a 2’-deoxyribosyltransferase that plays a central role in purine salvage. Mol microbiol. 2009, 72 (6): 1517-1529. 10.1111/j.1365-2958.2009.06740.x. [http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2764106&tool=pmcentrez&rendertype=abstract],PubMedPubMed CentralView ArticleGoogle Scholar
- Tilly K, Rosa PA, Stewart PE: Biology of infection with Borrelia burgdorferi. Infectious dis clinics of North Am. 2008, 22 (2): 217-234, v.. 10.1016/j.idc.2007.12.013. [http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2440571&tool=pmcentrez&rendertype=abstract],View ArticleGoogle Scholar
- Glöckner FO, Kube M, Bauer M, Teeling H, Lombardot T, Ludwig W, Gade D, Beck A, Borzym K, Heitmann K, Rabus R, Schlesner H, Amann R, Reinhardt R: Complete genome sequence of the marine planctomycete Pirellula sp. strain 1. Proc Nat Acad Sci USA. 2003, 100 (14): 8298-8303. 10.1073/pnas.1431443100. [http://www.pnas.org/cgi/content/abstract/100/14/8298],PubMedPubMed CentralView ArticleGoogle Scholar
- Andersson SG, Kurland CG: Reductive evolution of resident genomes. Trends in Microbiol. 1998, 6 (7): 263-268. 10.1016/S0966-842X(98)01312-2. [http://www.ncbi.nlm.nih.gov/pubmed/9717214],View ArticleGoogle Scholar
- Andersson JO, Andersson SG: Genome degradation is an ongoing process in Rickettsia. Mol Biol Evol. 1999, 16 (9): 1178-1191. 10.1093/oxfordjournals.molbev.a026208. [http://www.ncbi.nlm.nih.gov/pubmed/10486973],PubMedView ArticleGoogle Scholar
- Moran N, Wernegreen J: Lifestyle evolution in symbiotic bacteria: insights from genomics. Trends Ecol Evol. 2000, 15 (8): 321-326. 10.1016/S0169-5347(00)01902-9. [http://www.ncbi.nlm.nih.gov/pubmed/10884696],PubMedView ArticleGoogle Scholar
- Gil R, Sabater-Muñoz B, Latorre A, Silva FJ, Moya A: Extreme genome reduction in Buchnera spp.: toward the minimal genome needed for symbiotic life. Proc Nat Acad Sci USA. 2002, 99 (7): 4454-4458. 10.1073/pnas.062067299. [http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=123669&tool=pmcentrez&rendertype=abstract],PubMedPubMed CentralView ArticleGoogle Scholar
- Pérez-Brocal V, Gil R, Ramos S, Lamelas A, Postigo M, Michelena JM, Silva FJ, Moya A, Latorre A: A small microbial genome: the end of a long symbiotic relationship?. Sci (New York, N.Y.). 2006, 314 (5797): 312-313. 10.1126/science.1130441. [http://www.ncbi.nlm.nih.gov/pubmed/17038625],View ArticleGoogle Scholar
- Douglas AE: Nutritional interactions in insect-microbial symbioses: aphids and their symbiotic bacteria Buchnera. Annu Rev Entomology. 1998, 43: 17-37. 10.1146/annurev.ento.43.1.17. [http://www.ncbi.nlm.nih.gov/pubmed/15012383],View ArticleGoogle Scholar
- Fortunato S, Barthélemy M: Resolution limit in community detection. Proc Nat Acad Sci USA. 2007, 104: 36-41. 10.1073/pnas.0605965104. [http://www.pnas.org/content/104/1/36.abstract] [http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1765466&tool=pmcentrez&rendertype=abstract],PubMedPubMed CentralView ArticleGoogle Scholar
- Brandes U, Delling D, Gaertler M, Görke R, Hoefer M, Nikoloski Z, Wagner D: On Modularity Clustering. IEEE Trans Knowledge and Data Eng. 2008, 20 (2): 172-188.View ArticleGoogle Scholar
- Good BH, de Montjoye YA, Clauset A: Performance of modularity maximization in practical contexts. Phys Rev E. 2010, 81 (4): 46106-[http://link.aps.org/doi/10.1103/PhysRevE.81.046106],View ArticleGoogle Scholar
- Radicchi F, Castellano C, Cecconi F, Loreto V, Parisi D: Defining and identifying communities in networks. Proc Nat Acad Sci USA. 2004, 101 (9): 2658-2663. 10.1073/pnas.0400054101. [http://www.pnas.org/content/101/9/2658.abstract],PubMedPubMed CentralView ArticleGoogle Scholar
- Koyutürk M, Szpankowski W, Grama A: Assessing significance of connectivity and conservation in protein interaction networks. J Comput Biol : J Comput Mol Cell Biol. 2007, 14 (6): 747-764. 10.1089/cmb.2007.R014. [http://www.liebertonline.com/doi/abs/10.1089/cmb.2007.R014] [http://www.ncbi.nlm.nih.gov/pubmed/17691892],View ArticleGoogle Scholar
- Consortium GO: The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 2004, 32 (suppl 1): D258-D261. [http://nar.oxfordjournals.org/content/32/suppl1/D258.abstract],View ArticleGoogle Scholar
- Kanehisa M, Goto S: {KEGG:} Kyoto Encyclopedia of Genes and Genomes. Nucl Acids Res. 2000, 28: 27-30. 10.1093/nar/28.1.27. [http://nar.oxfordjournals.org/cgi/content/abstract/28/1/27],PubMedPubMed CentralView ArticleGoogle Scholar
- Ruths T, Ruths D, Nakhleh L: GS2: an efficiently computable measure of GO-based similarity of gene sets. Bioinformatics. 2009, 25 (9): 1178-1184. 10.1093/bioinformatics/btp128. [http://bioinformatics.oxfordjournals.org/content/25/9/1178.abstract],PubMedPubMed CentralView ArticleGoogle Scholar
- Bauer S, Grossmann S, Vingron M, Robinson PN: Ontologizer 2.0–a multifunctional tool for GO term enrichment analysis and data exploration. Bioinf (Oxford, England). 2008, 24 (14): 1650-1651. 10.1093/bioinformatics/btn250. [http://bioinformatics.oxfordjournals.org/cgi/content/abstract/24/14/1650],View ArticleGoogle Scholar
- Boyle EI, Weng S, Gollub J, Jin H, Botstein D, Cherry JM, Sherlock G: GO::TermFinder–open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes. Bioinf (Oxford, England). 2004, 20 (18): 3710-3715. 10.1093/bioinformatics/bth456. [http://dx.doi.org/10.1093/bioinformatics/bth456] [http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3037731&tool=pmcentrez&rendertype=abstract],View ArticleGoogle Scholar
- Sokal R, Michener C: A statistical method for evaluating systematic relationships. University of Kansas Sci Bull. 1958, 28: 1409-1438. [http://www.citeulike.org/user/druvus/article/1327877],Google Scholar
- Zhou W, Nakhleh L: Properties of metabolic graphs: biological organization or representation artifacts?. BMC bioinf. 2011, 12: 132-10.1186/1471-2105-12-132. [http://www.biomedcentral.com/1471-2105/12/132] [http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3098788&tool=pmcentrez&rendertype=abstract],View ArticleGoogle Scholar
- Fortunato S: Community detection in graphs. Physics Rep. 2010, 486 (3-5): 75-174. 10.1016/j.physrep.2009.11.002. [http://www.sciencedirect.com/science/article/B6TVP-4XPYXF1-1/2/99061fac6435db4343b2374d26e64ac1] [http://linkinghub.elsevier.com/retrieve/pii/S0370157309002841],View ArticleGoogle Scholar
- Fiedler M: Algebraic connectivity of graphs. Czechoslovak Math J. 1973, 23: 298-305.Google Scholar
- Pothen A, Simon HD, Liou KP: Partitioning Sparse Matrices with Eigenvectors of Graphs. SIAM J Matrix Anal App. 1990, 11: 430-452. 10.1137/0611030.View ArticleGoogle Scholar
- Barrat A, Weigt M: On the properties of small-world network models. Eur Phys J B - Condens Matter And Complex Syst. 1999, 19-[http://arxiv.org/abs/cond-mat/9903411],Google Scholar
- Kraskov A, Stögbauer H, Andrzejak RG, Grassberger P: Hierarchical clustering using mutual information. Europhys Lett (EPL). 2005, 70 (2): 278-284. 10.1209/epl/i2004-10483-y. [http://stacks.iop.org/0295-5075/70/i=2/a=278?key=crossref.4b80db2e1ce6b59ac95a3a163d16c2ed],View ArticleGoogle Scholar
- Jones E, Oliphant T, Peterson P: SciPy: Open source scientific tools for Python. 2001, [http://www.scipy.org/],Google Scholar
- Zhao J, Ding GH, Tao L, Yu H, Yu ZH, Luo JH, Cao ZW, Li YX: Modular co-evolution of metabolic networks. {BMC} Bioinformatics. 2007, 8: 311-10.1186/1471-2105-8-311. [http://www.biomedcentral.com/1471-2105/8/311],PubMedPubMed CentralView ArticleGoogle Scholar
- Croes D, Couche F, Wodak SJ, van Helden J: Inferring Meaningful Pathways in Weighted Metabolic Networks. J Mol Biol. 2006, 356: 222-236. 10.1016/j.jmb.2005.09.079. [http://www.sciencedirect.com/science/article/B6WK7-4HBSJYD-1/2/9dac1fe1e59e756b1ac8b56f994ccdc1],PubMedView ArticleGoogle Scholar
- Ciccarelli FD, Doerks T, von Mering C, Creevey CJ, Snel B, Bork P: Toward automatic reconstruction of a highly resolved tree of life. Science. 2006, 311 (5765): 1283-1287. 10.1126/science.1123061. [http://dx.doi.org/10.1126/science.1123061],PubMedView ArticleGoogle Scholar

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.