Construction of Arabidopsis metabolic network
Three files were downloaded to reconstruct the Arabidopsis metabolic network: (a) an expert-curated list of Arabidopsis encoded enzymes and the corresponding genes from Aracyc (ftp://ftp.plantcyc.org/Pathways, june,2008 updated) , (b) a "reaction" file ftp://ftp.genome.jp/pub/kegg/ligand/reaction/reaction/ to scan all catalyzed reactions of Arabidopsis enzymes , (c) a "reaction_mapformula.lst" file ftp://ftp.genome.jp/pub/kegg/ligand/reaction/reaction_mapformula.lst to obtain the information of metabolites in reactions. The reactions between metabolites were used to determine the interactions among enzymes. In the Figure 1A, enzyme EC188.8.131.52 uses alpha-D-glucose-1-phosphate as substrate to produce UDP-glucose, which is then used by enzyme EC184.108.40.206, the interaction was defined as EC220.127.116.11 → EC 18.104.22.168. Because small molecules, H+, NADH, NADP, NADPH, NH3, ATP, ADP, AMP, NAD, CoA, O2, CO2, Glu and pyrophosphate, are involved in many reactions or are used as carriers for transferring electrons, they were excluded from the analysis [35, 36].
Calculation of node in-degree, out-degree and between-ness
The in-degree was calculated by the number of enzymes providing substrates for the considered enzyme, whereas the out-degree was calculated by the number of enzymes using the products of the considered enzyme as substrates. The between-ness was calculated by the "breath-first tree" based algorithm as following steps [18, 37]. (a) The calculation was initialized by defining the between-ness of every vertex j in the network as B(j) = 0. (b) Starting from vertex i, a breadth-first tree http://en.wikipedia.org/wiki/Breadth-first_search was built with i on the top, those that were nearest to i directly below and those that were farthest from i at the bottom. Each node was placed at a certain level of the tree based on its shortest metabolic reaction step. (c) P(n) = 1 was assigned to every vertex j in the tree. For every vertex j
, where k is the set of nodes that directly connect ("provides substrates") to j. (d) B(j) = 1 was assigned to every vertex j in the tree. (e) Starting from bottom vertex j of the tree, B(j) was added to the corresponding variable of the predecessor of j. If j had more than one predecessor ("enzyme k provides more than one substrate to the enzyme j"), each predecessor k was assigned the value of
. (f) Step (e) was performed for every vertex in the tree. (g) Steps (b)-(f) were repeated for every vertex in the network. Finally, every vertex in the network was assigned with a between-ness value which is the sum of its between-ness of every sub-tree involved.
Modular analysis of the metabolic network
The node distributions of P <In> and P <Out> were used to investigate the frequency of the in-degree and out-degree. The least-squares method was used to estimate power-law exponent of p(k)∞k
for log-transformed data (t, power exponent; k, in/out-degree). Since the estimated power-law exponent was 1.67, methods for study of scale-free structure was applied in analysis the Arabidopsis metabolic network. The algorithm of Guimera and Amaral , with parameter settings as iteration factor = 1.0, cooling factor = 0.95 and number of randomization = 100, was used to measure the extent of modularity of network and separate the network into topological modules. The Kobas toolkit  was used to infer the frequent pathways in every topologically separated module.
Analysis of transcription datasets of Arabidopsis
The gene transcription datasets of different developmental stages were obtained from the Affymetrix ATH1 data (TAIR accession number, ME00319) . The raw data were normalized by the Affymetrix detection algorithms in the MAS5 library, the background levels and PM/MM ratios were corrected according to the Affymetrix Statistical Algorithms. Based the estimated expression values of probes, the expression values of corresponding 22,380 Arabidopsis genes. After filtering the mixture of RNA pools or measurement of the same developmental stages, 59 datasets of ME00319, which measured the developmental stages of Arabidopsis, were selected for downstream analysis (Additional file 3). Expression intensities were averaged among three replicates for every developmental stage.
The Expression Variation index V
was used to measure the variations of gene i in expressional level across developmental stages [40
where n is the number of stages, S
is the expression signal of gene i in tissue j, S
(i, max) is the highest expression signal of gene i across the stages, if the S
is lower than 50 we arbitrary let it be 50 to minimize the influence of noise from low intensities. The V
value ranges form 0 to 1, higher value indicating higher variations in expressional level across stages or tend to be stage specific genes.
The expression maximum density of enzymes were calculated by
(where n was the number of genes annotated with this enzyme, S
(i, max) was the maximum expression density of gene i among the developmental stages). The expression variation of enzymes were calculated by
(where n was the number of genes annotated with this enzyme, ν
(i) was Expression Variation index gene i).
The Pearson correlation coefficient (P
) was used to measure the co-expression between gene i
and gene j
as following formula:
where n = the number of developmental stages, S
were the expression value of gene i and j under condition k;
were the mean expression value of gene i and j across the stages, the P
was between -1 and 1, with 1 standing for highly co-expressed whereas -1 standing for highly divergent expressed.
The datasets, which time-seriously measured the shoot tissues responding to various abiotic stress (exogenous factors: cold, genotixic, osmotic, salt, UV-B, wound, drought, and heat, see Table S3), were also analyzed with the same procedures.
Identification of ortholog groups between Arabidopsis and Populus
The phylogenetic tree approach was used to infer orthologs between Arabidopsis and Populus. The proteomes of Arabidopsis thaliana, Populus trichocarpa and Oryza sativa japonica were downloaded from http://www.tigr.org/tdb/e2k1/ath1/, http://genome.jgi-psf.org/Poptr1_1/ and http://rice.plantbiology.msu.edu/index.shtml. The sequences of three species were used an all-against-all BLAST with an cutoff of 1e-10, by transforming the E-values with the absolute values of their logarithm, the score matrix were constructed and used for similarity clustering with Markov Clustering . The protein clusters containing the Arabidopsis metabolic genes were used for phylogenetic analysis.
The protein sequences of members in each cluster were aligned with ClustalW  and the alignments were used to generate neighbor-joining trees with the two-parameter substitution correction. The phylogenetic trees were rooted at midpoints. By reconciling between phylogentic tree and the species tree ((Arabidopsis thaliana, Populus trichocarpa), Oryza sativa japonica)) with Notung , the ortholog groups were identified between Arabidopsis and Populus.
Measurement of the evolutionary rates in coding region and 5' upstream, 3' downstream regions of genes encoding enzymes in Arabidopsis
The Clustalw was used to globally align two amino acid sequences of orthologs between Arabidopsis and Populus, and the corresponding coding sequences were realigned with the gaps in the alignment trimmed. The Ka was estimated from the codon-based nucleotide sequence alignment by using the Yang-Nielsen maximum-likelihood method implemented in the yn00 program of the PAML package. To calculate the substitution rates in Arabidopsis genes 5' 1000bp upstream and 3' 1000 bp downstream region were calculated against that of Populus orthologs. The Clustalw software  were used to globally align the non-coding regions of orthologs, the substitution rate per sites K
with the Kimura two-parameter model were calculated by dismat program of the EMBOSS package . For a Arabidopsis gene i with more than one orthologs in Populus, the smallest of calculated Ka, K
were selected as Ka
5u(i) and K
The average divergence in coding regions, 5'upstream and 3'downstream region of enzyme could be calculated by the formula
, the n was the number of genes annotated by this enzyme, Ka
5u(i) and K
3u(i) was the substitution rate in coding, upstream and downstream non-coding regions of Arabidopsis gene.
Statistical analysis and computational methods
Spearman's rank correlation coefficients were estimated to evaluate the correlations between three topological centralities and expression intensities, expression variation, substitution rates of coding regions of gene encoding enzymes. The p-values were FDR-corrected by using the Q-value program in R package . The comparison of topological centralities between WGD-enzymes and the other enzymes were done by using Manny-Whitney U with two-tail test. Computations were performed on a Linux cluster with 16 nodes (Intel 5130, 2.0 GHz CPU, 4G memory, Research Center for Systematic and Evolutionary Botany, Institute of Botany, CAS). Perl http://perl.org and R http://www.r-project.org/ scripts were used for analysis, and can be obtained on request.