Histone modification pattern evolution after yeast gene duplication

Background Gene duplication and subsequent functional divergence especially expression divergence have been widely considered as main sources for evolutionary innovations. Many studies evidenced that genetic regulatory network evolved rapidly shortly after gene duplication, thus leading to accelerated expression divergence and diversification. However, little is known whether epigenetic factors have mediated the evolution of expression regulation since gene duplication. In this study, we conducted detailed analyses on yeast histone modification (HM), the major epigenetics type in this organism, as well as other available functional genomics data to address this issue. Results Duplicate genes, on average, share more common HM-code patterns than random singleton pairs in their promoters and open reading frames (ORF). Though HM-code divergence between duplicates in both promoter and ORF regions increase with their sequence divergence, the HM-code in ORF region evolves slower than that in promoter region, probably owing to the functional constraints imposed on protein sequences. After excluding the confounding effect of sequence divergence (or evolutionary time), we found the evidence supporting the notion that in yeast, the HM-code may co-evolve with cis- and trans-regulatory factors. Moreover, we observed that deletion of some yeast HM-related enzymes increases the expression divergence between duplicate genes, yet the effect is lower than the case of transcription factor (TF) deletion or environmental stresses. Conclusions Our analyses demonstrate that after gene duplication, yeast histone modification profile between duplicates diverged with evolutionary time, similar to genetic regulatory elements. Moreover, we found the evidence of the co-evolution between genetic and epigenetic elements since gene duplication, together contributing to the expression divergence between duplicate genes.


Background
Although gene duplication has been widely considered as the main source of evolutionary novelties [1][2][3][4], the issue of duplicate gene preservation remains a subject of hot debate, i.e., how duplicate copies can escape from being pseudogenized and then evolve from an initial state of complete redundancy to a steady-stable state that both functionally divergent copies are maintained by purifying selection. As one of plausible hypotheses, rapid expression divergence between duplicate genes may be the first step that is fundamental for the preservation of redundant duplicates [3]. There are three recognized types involved in expression regulatory mechanism: (i) cis-regulation of transcription mediated by promoters, enhancers, silencers, etc.; (ii) trans-regulation mediated by regulatory proteins binding to cis elements, such as transcription factors (TF); and (iii) epigenetic regulation, such as DNA methylation, specific histone modification pattern of genes (histone code hypothesis). Many studies have addressed the effect of cis-and trans-regulatory mechanisms on the expression divergence after gene duplication, e.g., cis-regulatory motif, TF-binding interaction, transacting expression quantitative trait loci (eQTLs) [5][6][7][8][9][10]. Though there is increasing evidence that epigenetic changes may play important roles in the initial expression divergence between duplicate genes [11][12][13][14][15], little study has been done about how regulatory network between duplicate genes evolve at epigenetic level.
Epigenetic regulation on gene expression is a highly complex process. In a broad sense, it includes DNA methylation, histone modification, nucleosome occupancy, as well as microRNA [11]. Moreover, these epigenetic elements can interact with each other, for instance, the reciprocity between DNA methylation and histone modification [16,17]. The complexity of epigenetic regulation has made it difficult to explore its role in regulatory divergence between duplicates. Nevertheless, we have recognized budding yeast (Saccharomyces cerevisiae) as an ideal organism for our purpose, because its epigenetic regulation is relatively simple: DNA cytosine methylation and microRNA were not detected [18,19]. In other words, histone modification is the main representation for epigenetic modification in the budding yeast, less affected by other epigenetic modification types. Therefore, in the present study, we focus on the evolution of histone modification (HM) between yeast duplicate genes.
Eukaryotic DNA with a unit of 146 bp wound around a histone octamer (two copies of each core histone H2A, H2B, H3, H4) is assembled into chromatin. Histone Nterminal tails are subject to multiple covalent posttranslational modifications, including lysine (K) acetylation, lysine or arginine (R) methylation, serine (S) phosphorylation, and so on [20,21]. Enormous possible combinations and interactions of histone modification types constitute histone code. The 'histone code' hypothesis claims that a specific pattern of hisone modification code can produce a specific effect on local chromatin structure, modulating DNA accessibility, and consequently regulating transcription and other DNA-based biological processes [20][21][22][23][24].
Our goal is to investigate the pattern of yeast histone modification (HM) code divergence between duplicates. Our hypothesis claims (i) that, when a gene is duplicated, the gene-specific histone modification profile is also duplicated, so on average duplicate pairs tend to have a higher degree of HM code similarity than randomly selected singleton gene pairs; and (ii) that since duplication, HM-code profile between duplicate copies become divergent with evolutionary time. We test these two predictions by conducting genome-wide analyses, as well as genes involved in different biological functions. Moreover, we are particularly interested in whether genetic regulatory elements, including cis-motifs (such as TATA box) and transcription factors, and epigenetic HM-code profile co-diverge during the evolution since gene duplication. To this end, time-dependent confounding factors in both genetic and epigenetic factors need to be ruled out. Finally, the significance of our study for having a better understanding of regulatory evolution after gene duplication is discussed.

Results
Combinational interactions of histone-modifying enzymes (HATs, HDACs, HMTs, HDMs, etc.) to histone N-tail produce numerous types of post-translational modification of histones (H2A, H2B, H3, H4), such as methylation, acetylation [25]. Moreover, the same modification site can be affected by different modifying enzymes, and vice versa, generating different histone modification combinations, like H3K4me2 and H3K4ac (dimethylation and acetylation in Lys4 of histone H3, respectively), or H4K8ac (acetylation in Lys8 of histone H4). In this study, histone modification (HM) code of a gene represents the combined profile of different HM sites, HM types and HM states in gene promoter and open reading frame (ORF) regions, respectively (see Methods). We believe that the HM code of a gene reveals the pattern of HM mediated regulatory network of that gene.

Functional redundancy in histone modification (HM) between yeast duplicate genes
Because of evolutionary relatedness, duplicate pairs may have a higher similarity of histone modification (HM) code than two randomly selected single-copy genes (singletons). To test this hypothesis, we compared the distance of HM code between duplicate pair and randomized singleton pair. To be simple we choose one minus Pearson's product-moment correlation coefficient, i.e., D HM = 1-r, to define the distance of HM code. The larger the value of D HM , the higher divergence of histone modification code between duplicate genes. Specifically, denote the distance of HM code associated in promoter and ORF regions by D HM-P and D HM-O , respectively. Randomized pairs were selected from single copy genes with 10000 repeats. As expected, we observed that both D HM-P and D HM-O measures show a lower divergence degree of histone modification code between duplicate genes than that of randomized singleton pairs (Wilcoxon rank sum test: P < 10 -15 for both cases; Figure 1A).
Generally speaking, local chromatin environment around genes is one of important components leading to different histone modification code of each gene. While chromatin environment differs in different chromosomes, duplicate genes locating in different chromosomes may be under different chromatin environment, possessing the chromosomespecific HM profile. Following this argument, we classified all yeast gene pairs (duplicate and randomized singleton pairs) under study into two groups: they are located in the same chromosome, or different chromosomes, and tested the relationship between histone modification pattern and chromosome condition. We observed that though the HM profile divergence is not significantly correlated with location distance of gene pairs in the same chromosome (Pearson's product-moment correlation: r = 0.06, P = 0.09 for promoter region and r = 0.02, P = 0.45 for ORF region), gene pairs locating in the same chromosome share more common HM code than that in different chromosomes (Wilcoxon rank sum test: P = 0.07 and P = 0.01 for promoter and ORF regions, respectively; Figure 1B).
The chromosome effect on the HM-code divergence between duplicate genes may suggest an alternative interpretation about a higher similarity of HM distance between duplicate genes than random pairs. That is, duplicate pairs tend to be located in the same chromosome because of tandem gene duplications, compared to randomized singleton pairs (Chi-squared test: χ 2 =17.2, d.f. = 1, P < 10 -4 ). To rule out this possibility, we chose duplicate pairs and randomized singleton pairs where both copies are belonging to different chromosomes, and observed the similar result to Figure 1A (see Additional file 1: Figure S1). Hence, we conclude that duplicate genes represent their functional redundancy at the level of histone modification mediated regulatory network.
The evolution of HM is coupled with coding sequence divergence after gene duplication We further expect that functional redundancy in histone modification code of duplicate genes as shown in Figure 1A would maintain a high degree in recently duplicated genes, and low in ancient duplicates. To verify this claim, we investigated the relationship between the distance of HM code (D HM-P or D HM-O ) and coding sequence divergence (the synonymous distance K S or the nonsynonymous distance K A between duplicate genes). Considering the statistically unreliable estimation of synonymous or nonsynonymous substitution distance when K S or K A becomes larger because of repeated substitutions at the same site, we selected duplicate pairs with K S < 2.0 and K A < 0.5 for this analysis. We observed that both D HM-P and D HM-O are positively correlated with K S or K A (Pearson's product-moment correlation: all r > 0.45, P < 10 -15 for all data points; Figure 2), suggesting that the divergence of HM code is coupled with the coding sequence divergence between duplicate genes. To avoid correlated data points bringing the bias, we selected independent pairs of duplicate genes using the method from Zou et al. [9] and reanalyzed. The similar result remains hold (all r > 0.40, P < 10 -13 for all data points; in Additional file 1: Figure S2).
Considering K S or K A as a proxy to evolutionary time since gene duplication, we suggest that the correlation between the HM-code divergence and the coding sequence divergence has been mainly driven by mutations accumulated with evolutionary time. Our interpretation is based on two reasons: First, we observed a weak negative correlation between the HM divergence and the K A /K S ratio of duplicate genes (Pearson's productmoment correlation: r = -0.12, P < 10 -7 for D HM-P and r = -0.05, P = 0.04 for D HM-O ). As the K A /K S ratio is an indicator of sequence conservation in coding region, our finding implies that duplicate genes with stringent functional constraints on coding sequence may have greater divergence in the HM code, but the effect is marginal. Second, promoter HM code of duplicate genes diverges much quicker than that of ORF HM code (Wilcoxon rank sum test: P < 10 -11 ; Figure 2), while significant but weak difference of D HM-P and D HM-O in randomized singleton pairs (Wilcoxon rank sum test: P = 0.02; Figure 1). Some factors may be involved to accelerate the divergence of promoter HM code between duplicate genes, such as the evolution of transcription factors (TF) shared by duplicate genes.
Co-evolution of the HM-code divergence between duplicates with several genetic regulatory elements The interaction between epigenetic and genetic elements in gene regulation has been increasingly acknowledged [21,26], raising an interesting question whether the divergence of histone modification code between duplicate genes co-evolves with some trans-acting factors binding to those duplicate genes. We first studied the relationship between the distance of promoter HM code (D HM-P ) and the distance of transcription factors (D TF ) and trans-acting expression quantitative trait loci (eQTLs) (D t-eQTL ) between duplicate genes. Trans-acting eQTLs of one gene represent all trans-regulatory proteins for its transcription, not limited to transcription factors. Two distance measures D TF and D t-eQTL were determined by Czekanowski-Dice formula (Methods). We found that they are significantly correlated (Peasron's product-moment correlation: r > 0.25, P < 10 -13 ; Figure 3A, 3C). Similar results were obtained in the case of ORF HM code ( Figure 3B, 3D).
As both D HM-P and D HM-O , as well as D t-eQTL and D TF , increase with evolutionary time (K S or K A as the proxy) [ Figure 2; 9], it is reasonable to suspect that K S or K A may underlie these statistically significant correlations between the HM code and genetic regulatory elements. We conducted the partial correlation in D HM-P -D TF and D HM-P -D t-eQTL of duplicate genes under the controlling of K S and K A variables (with the restriction of K S < 2.0 and K A < 0.5), and still observed the significant relationship (r = 0.18, P < 0.001 for D HM-P -D TF and r = 0.15, P < 0.05 for D HM-P -D t-eQTL ), though they are relatively weak. In short, our analysis provides the evidence that the histone modification code and trans-regulators shared by duplicate genes may have co-evolved since gene duplication.
Moreover, we design the following analysis to further explore the co-evolution between the HM code and transregulators (TF or trans-acting eQTL), by dividing yeast genes into two categories, trans-targeted genes and controlling genes. Trans-targeted genes are genes that are targeted by transcription factors (TF-targeted genes) or have at least one trans-acting eQTL (trans-eQTL acting genes) and the rest of genes are controlling genes (Methods). We totally obtained 4495 trans-targeted genes and 2226 controlling genes ( Figure 4A). Interestingly, both promoter and ORF HM code distances of duplicates in the group of trans-targeted genes are, on average, significantly higher than those in the group of controlling genes ( Figure 4B) (Wilcoxon rank sum test: promoter, P = 0.003 and ORF, P = 0.0001). It should be noticed that the pattern we observed in Figure 4B would not be affected by the strong correlation between the HM divergence and evolutionary time (K S as the proxy) of duplicate genes, because the distribution of K S has been found no significant difference between trans-targeted genes and controlling genes (Wilcoxon rank sum test: P = 0.08).
The HM-code divergence and TATA-box regulation TATA box is the core promoter element for gene regulation responding to environmental stresses [27,28]. To examine the role of TATA box in the HM-code divergence between duplicate genes, we divided all yeast duplicate pairs into three groups, TATA-containing (both have TATA-box), TATA-less (both do not have TATA-box), and TATA_diverge (only one copy has TATA-box). We compared the HM-code distance in promoter (D HM-P ) and ORF (D HM-O ) region between duplicate genes in these groups. Interestingly, both D HM-P and D HM-O show the highest degree in the TATA-diverge group, and the lowest in the TATA-containing group ( Figure 5) (Wilcoxon rank sum test: promoter, P < 10 -10 ; ORF, P < 10 -15 ).
Our explanation is as follows. Note that TATAcontaining genes may interact with some specific chromatin modification factors to regulate gene expression [29]. In the TATA_diverge group, only one duplicate with TATA-box has such interaction, resulting in a higher HM-code divergence between them. By contrast, in the case of TATA-containing group, both duplicates with TATA-box have similar interactions, resulting in a higher HM-code similarity between them.
Biological functions and the HM-code divergence between duplicate genes Do different biological functions affect the level of the HM-code divergence between duplicate genes? We used GO (Gene Ontology) Slim (biological process with 37 categories; see Methods) to address the issue. We analyzed D HM-P and D HM-O of duplicate genes in each GO category and compared D HM-P and D HM-O among these GO groups by one-way analysis of variance (ANOVA). Our finding is that duplicate pairs with different biological processes differ significantly in D HM-P and D HM-O (P < 10 -13 for D HM-P , P < 10 -15 for D HM-O ; Table 1). Use D HM-P for an example, duplicate genes involved in cofactor metabolic process, cellular amino acid and derivative metabolic process, cell cycle and cytoskeleton organization may have greater HM-code divergence, i.e., a higher D HM-P , while duplicate genes involved in translation, DNA metabolic process, pseudohyphal growth and transcription may have lower divergence of the promoter HM code. Moreover, we conducted a similar analysis on 24 cellular components (GO Slim classification, see Methods), and observed that duplicate genes in different subcellular localization also undergo the different evolutionary rate of histone modification code after gene duplication ( Table 2). It is possible that duplicate genes in some biological processes or subcellular localization are young (measured by small K S ) while others are old (with high K S ). Since D HM-P and D HM-O are positively correlated with K S (Figure 2), we have to remove the confounding effect caused by K S . We used the analysis of covariance (ANCOVA) (D HM-P or D HM-OF igure 3 Relationship between the distance of histone modification pattern and trans-regulators shared by duplicate genes. (A) and (B) show the correlation between the distance of transcription factors shared by duplicate genes and the divergence of promoter and ORF histone modification pattern between duplicate genes, respectively, while (C) and (D) represent the interrelationship between the distance of trans-acting eQTLs targeted to the duplicate genes and the distance of promoter and ORF histone modification pattern between duplicate genes, respectively. K S + T (biological process or subcellular localization) + K S : T) and the result remains statistically highly significant (Table 1, 2), suggesting K S is not a confounding factor that may affect our analyses.

The expression divergence under genetic, epigenetic and stressful perturbations
It is well-documented that gene expression divergence of duplicate genes increases with evolutionary time, but the underlying mechanism remains a subject of debate [30]. The analysis we describe below is to know whether the divergence of HM-mediated regulatory network affects the expression divergence between duplicate genes. We compared the interrelation between the histone modification pattern distance (D HM-P , D HM-O ) and the expression distance (E) between duplicate genes, and found that they are significantly correlated (Pearson's product-moment correlation; D HM-P -E: r = 0.24, P < 10 -15 ; D HM-O -E: r = 0.30, P < 10 -15 ; Figure 6).
We further divided yeast expression profiles into four types from different perturbation conditions: 1) normal developmental or physiological conditions, defined as 'Normal' treatment; 2) a set of conditions where expression changes attribute to environmental stresses, denoted as 'Stress' treatment; 3) conditions where single gene coding chromatin modifiers (CM) like SWI/SNF, HDACs, HATs, etc. was deleted, denoted as 'CM_del' treatment; 4) conditions where single gene coding transcription factors (TF) was deleted, denoted as 'TF_del' treatment. The latter two types are able to test the effect of chromatin modification related proteins and transcription factors on other genes, respectively. We then calculated the expression divergence between duplicate genes under these four types of conditions, denoted by E Normal , E Stress , E CM_del , E TF_del , respectively. We observed that the expression distance between duplicate genes in 'CM_del' condition (E CM_del ) is significantly greater than that in 'Normal' condition (E Normal ) (Wilcoxon rank sum test: P < 10 -10 ; Figure 7A), but much lower than that in 'Stress' and 'TF_del' conditions (E Stress and E TF_del ) (Wilcoxon sum rank test, P < 10 -

15
; Figure 7A). Results imply that histone modification related enzymes and gene associated histone modification profile may indeed influence the expression evolution of duplicate genes, though the relative contribution to the expression divergence between duplicate genes is highly lower than genetic related factors like transcription factors, even externally environmental stresses.
TATA-containing genes are usually enriched in stressrelated genes [29], and represent high expression variability [31,32]. In our study, we detected that in four disturbed   conditions (Normal, Stress, CM-del, TF_del), the expression divergence in TATA-diverge and TATA-containing groups are significantly larger than that in TATA-less duplicate genes ( Figure 7B). The discrepancies between expression change and the histone modification divergence in these duplicate gene types (TATA-containing, TATA-less, TATAdiverge) are observed.

Discussion
The divergence of HM profile between duplicate genes Our detailed analyses on yeast whole-genome histone modification (HM) code profile have shown that duplicate genes share more common HM-code patterns than randomized singleton pairs in their promoter and ORF regions, and the HM-code divergence between duplicates in both regions increase with the sequence divergence. This finding supports the notation that epigenetic divergence between duplicate genes may have been driven by the accumulation of mutations with the duplication time in both their promoter and ORF regions, because it has been shown that the divergence of coding sequence such as K S between yeast duplicates is a proxy to evolutionary time. In other words, no external driving force is needed to explain the HM profile divergence between duplicates, though it remains possible of neofunctionalization based upon divergent HM profile through some positive selection mechanisms.

Hypothesis-based genomic correlation analysis
Genome-wide functional analysis of duplicate genes is in attempt to reveal functional correlation between genetic and epigenetic factors during the process of functional innovation through gene duplication. However, the universal confounding effect of the sequence divergence (K S ) has complicated the practical analyses. Consequently, the controversy about the cause-effect interpretation has been inevitable, because genome-wide analysis of duplicate genes has been viewed as an exploration rather than a hypothesis-testing approach.
Since the correlation between the HM profile divergence and the sequence divergence actually reflects the fundamental evolutionary process driven by the accumulation of mutations, we view this as a null hypothesis in the genome-wide functional analysis of duplicate genes. That is, any meaningful inference about the functional correlation within or between genetic and epigenetic elements needs to reject this null hypothesis, as we have shown in this study.
Interaction between genetic and epigenetic elements: who is the driver?
The interaction between epigenetic and genetic elements in gene regulation has been increasingly acknowledged. For instance, the establishment of the histone modification code may partially involve the recruitment of specific histone modifying enzymes such as HATs, HDACs, HMTs by transcription factors (TF) [26]. Meanwhile, the distinctly combinatory histone modification code associated with gene may also provide specific binding code that is read by other transcription factors [21]. These observations raise an interesting question whether the divergence of histone modification code between duplicate genes may co-evolve with trans-acting factors binding to those  duplicate genes, e.g., transcription factors, histone modifying enzymes.
We have observed a higher divergence for HM code of duplicate genes in the category of trans-targeted genes than that of controlling genes, suggesting that the change of trans-regulators binding to duplicate genes may affect the pattern of HM code in both promoter and ORF regions, and thus accelerating the divergence of histone modification code between duplicate genes. Yet, it remains unclear about the cause-effect relationship. For instance, does the divergence of trans-regulators to duplicate genes facilitate the divergence of histone modification between duplicate genes, or vice versa? Our further study will address this issue.

Functional preference in the HM code divergence between duplicate genes
We observed the functional bias on the HM-code divergence after gene duplication. One possibility is that histone proteins associated with genes in different biological functions may be subject to differentially post-translational modification, leading to different divergence rate of histone modification code between duplicate genes. Since histone modification process of one gene largely depends on its chromatin environment and DNA sequence interacted by histone modifying enzymes [33], functionally selective constraints may also be imposed on histone modification evolution associated with that gene, a situation similar to DNA sequence evolution.

Conclusions
In this study, we unveiled the evolution of yeast histone modification code since gene duplication. Though duplicate genes represent functional redundancy at histone modification level compared with single-copy genes, the histone modification divergence occurred along with evolutionary time (K S as the proxy), which possibly due to the coding sequence evolution after gene duplication. Moreover, the histone modification code in ORF region evolves slower than that in promoter region, indicative of functionally selective constraints on protein sequences. Going further, after controlling the confounding effect of the coding sequence divergence (K S ), the histone modification code co-evolves with cis-(TATA box) and trans-(TF and trans-acting eQTL) regulatory factors, confirmed by the fact that the histone modification code is shaped by the combined interaction among histone-modifying enzymes, trans-acting elements and cis-regulatory motif. In addition, histone modification makes contribution to the expression divergence between duplicate genes, despite the minor effect compared to transcription factors and environmental stresses. Taking together, we provided the evidence of the co-evolution between genetic and epigenetic elements since gene duplication, together contributing to the expression divergence between duplicate genes.

Data of yeast histone modification pattern
Genome-wide histone modification pattern data of Saccharomyces cerevisiae were downloaded from Chroma-tinDB [34] (http://www.bioinformatics2.wsu.edu/cgi-bin/ ChromatinDB/cgi/visualize_select.pl). ChromatinDB provides the user with easy access to ChIP-microarray data for a large set of histones or histone modifications in S. cerevisiae, which includes 17 distinct histone modification combinations like dimethylation in Lys4 of histone H3 (H3K4me2), acetylation in Lys12 of histone H4 (H4K12ac), etc. and 5 histone protein occupancy levels (H2A, H2B, H3, H4, H2A.Z). We applied log base-2 of average enrichment ratio with nucleosome-normalizing for each of 22 histone modification data in promoter and open reading frame (ORF) regions of genes.

Yeast microarray expression data
A total of 84 microarray expression data points of S. cerevisiae whose expression changes are attributing to internal disturbing like developmental or physiological conditions were respectively collected [35][36][37]. This type was defined as "Normal" conditions, which are not genetically perturbed by regulatory network related elements like transcription factors and chromatin modifiers (CM) or other environmental stresses. We collected 170 gene expression profiles of yeast strains mutated for various chromatin modifiers from 26 publications [38]. We called this type of expression profile data as 'CM_del' conditions. Expression profile data of 263 transcription factor-deletion experiments were obtained from the Gene Expression Omnibus (GEO) database under the series accession number GSE4654 [39]. Similarly, this type was denoted as 'TF_del' experiments. A total of 504 cDNA microarray data points of yeast whose expression changes are attributing to environmental stresses were collected [9]. We call this type as 'Stress' conditions. Normalization was done as each original paper recommended.

Data of transand cisregulatory elements
Transcription factor (TF)-DNA binding profiles of yeast were downloaded from Lee et al. [40] and Harbison et al. [41]. In the study of Harbison et al. (2004), we just used 203 DNA-binding transcription factors in rich media conditions, regardless of 84 regulators in environmental stressed conditions. Most transcription factors in two studies are overlapped. Finally, we obtained 207 TFall S. cerevisiae genes binding interaction profiles. For each gene, a p-value was assigned to measure the probability of true TF-target interaction; a smaller p-value means the interaction is more likely. Here, we used relatively stringent significance level of 0.001 as cutoff to define the status of TF-target gene interaction. Two studies together determine all TF-target gene interactions, and we observed that 3183 genes are binding by at least one transcription factor (TF-targeted genes). Yeast genomic expression quantitative trait loci (eQTLs) data were downloaded from Brem and Kruglyak [42]. Wilcoxon-Mann-Whitney (WMW) test with the criterion of 50 kb interval was conducted to detect and define eQTL regions [9]. Finally, we obtained 2775 genes which at least have one trans-acting eQTL (trans-eQTL acting genes). Thus, we can divide all yeast genes into two categories, trans-targeted genes and controlling genes, where trans-targeted genes are the union of TF-targeted genes and trans-eQTL acting genes, while controlling genes are the reminders. Since trans-eQTL acting genes were not only regulated by transcription factors, but most by chromatin related enzymes and other factors [43], trans-targeted genes may have the possibility to be regulated by all trans-regulators, not restricted to transcription factors.
There are two types of genes, TATA-containing genes and TATA-less genes [29]. We classified all duplicate pairs into three categories, TATA-containing, TATA-less and TATA-diverge pairs. TATA-containing and TATA-less types are those duplicate pairs where both copies have or don't have TATA box, respectively, while TATA-diverge type refers to duplicate pairs where one copy belongs to TATA-containing genes, and the other TATA-less genes.

Protein subcellular localization and biological process
The information of protein localization and biological process for S. cerevisiae was defined by the Gene Ontology (GO) classification, and downloaded from Saccharomyces Genome Database. GO Slim was used to classify 24 cellular component sorts and 37 biological process categories. A duplicate gene pair was assigned to a GO Slim term if both duplicate copies are belonging to this GO term, or one copy is belonging to while the other is not annotated.

Defining functional divergence between yeast duplicate genes
There are two types of histone modification profiles, histone modification pattern associated with gene promoter region and open reading frame (ORF) region. One minus Pearson's product-moment correlation coefficient (1-r) was used to determine the distance of these two types of histone modification pattern between duplicate genes, shortly denoted as D HM-P for promoter histone modification distance and D HM-O for ORF histone modification distance.
We modified the Czekanowski-Dice formula [44] to calculate the distance of transcription factors or trans-acting eQTLs shared by duplicate gene 1 and 2 in a duplicate pair, shortly denoted as D TF and D t-eQTL , respectively. Suppose Δ 12 be the number of TFs or trans-acting eQTLs that differ between one duplicate pair; y 1 [y 2 be the number of TFs or trans-acting eQTLs that regulate at least one of duplicate genes, and y 1 \y 2 be the number of shared TFs or trans-acting eQTLs between a duplicate pair. Then, the TF distance or trans-acting eQTL distance between duplicate genes 1 and 2 is defined as follows: Apparently, the greater the value, the higher degree of TF or trans-acting eQTL divergence between duplicate genes.
We used evolutionary distance (E) defined by Gu et al. [5] as the measure of expression divergence between duplicate genes of four types, shortly denoted as E Normal , E CM_del , E TF_del and E Stress for 'Normal' , 'CM_del' , 'TF_del' and 'Stress' experiments, respectively. Specifically, for any duplicate gene 1 and 2, let x 1k and x 2k be its expression level, respectively, in the kth microarray experiment; x 1 and x 2 be the mean of expression level in kth microarray experiments, respectively, where k = 1,. . .m. The formula of the expression distance (E) between gene 1 and 2 is as follows: Determination of yeast duplicate pairs The method of Gu et al. [45] was applied to identify duplicate genes. As the criterion of 80% alignable regions between protein sequences is too stringent, and may miss some duplicate genes, we reduced this criterion to 50%. All pairs of duplicate genes in each gene family were used for the analysis. The reminders of S. cerevisiae genes were considered as singleton genes. The rate of synonymous substitutions (K S ) and nonsynonymous substitutions (K A ) between duplicate genes were estimated using PAML [46] with default parameters.

Additional file
Additional file 1: Figures S1-S2. are available at online web site of BMC Evolutionary Biology journal.