Gene expression is regulated by many factors acting in concert with the status of the chromatin environment, particularly histone methylation. The SET-domain-containing protein family is a major player in histone methylation. These proteins are responsible for the methylation of lysine (K) residues in various histones, specifically K4, K9, K27, and K36 in histone H3 and K20 in histone H4; H3K79 is an exception. All members of this family share a highly conserved SET domain, named for three Drosophila melanogaster proteins: Suppressor of variegation 3-9 (Su(var)3-9; Suv), Enhancer of Zeste (E(z)) and Trithorax (Trx). This domain has approximately 130 amino acids and has been found in all eukaryotic organisms studied so far. Proteins containing the SET domain can also be found in viruses as well as in both domains of prokaryotes [4, 5].
Recent studies have revealed that SET domain genes (hereafter referred to as SET genes) are important for regulating growth and reproduction processes, such as control of flowering time and embryogenesis in plants. Genome sequencing has uncovered many genes encoding SET-domain proteins; in particular, Arabidopsis SET genes are the best annotated and characterized. For example, the Arabidopsis CURLY LEAF (CLF) gene is required for stable repression of the floral homeotic gene AGAMOUS in leaves and stems. ARABIDOPSIS TRITHORAX 1 (ATX1) functions as an activator of homeotic genes, like Trithorax in animal systems. ARABIDOPSIS TRITHORAX-RELATED PROTEIN 7 (ATXR7) is an H3K4 methylase required for proper expression of the FLOWERING LOCUS C (FLC) gene. The Arabidopsis ASH1 HOMOLOG 2 (ASHH2) protein has been suggested to methylate H3K4 and/or H3K36, similar to Drosophila ASH1 and yeast SET2, an H3K36 histone methyltransferase (HMT). Other SET genes are associated with embryogenesis, including MEDEA (MEA); a maternally inherited loss-of-function mea allele results in embryo abortion and prolonged endosperm production. More recently, ATXR3 was shown to be crucial for both sporophyte and gametophyte development and to encode the major enzyme responsible for trimethylation of H3K4 [12, 13].
In plants, at least 47, 33, 31, and 43 SET genes have been identified in Arabidopsis, grape, maize, and rice, respectively [14–16]. In both Arabidopsis and rice, many SET genes are located in large blocks of related regions derived from whole-genome duplication events, indicating that whole-genome duplication could be an important contributor to the duplication of SET genes. In addition, different classification schemes have been applied to SET genes in different plants. Initially, 37 putative Arabidopsis SET domain proteins were classified into four distinct classes: (I) enhancer of zeste [E(z)] homologs; (II) trithorax (Trx) homologs and related proteins; (III) Ash1 homologs and related proteins; and (IV) suppressor of variegation [Su(var)] homologs and related proteins. In another study, 32 Arabidopsis and 22 maize SET genes were classified into five classes according to phylogenetic relationships and domain organization. More recently, two additional classes (VI and VII) were recognized for SET genes in Arabidopsis, grape, maize, and rice [14–16]. Interestingly, several Arabidopsis genes in class III, such as ATXR3, were shown to be crucial for both sporophyte and gametophyte development [12, 13]. Moreover, Arabidopsis has ten Su(var) homologue (SUVH) genes belonging to class V, including several that control heterochromatic domains. Loss of function of these genes suppresses gene silencing, whereas overexpression enhances silencing, causing ectopic heterochromatization and significant growth defects in Arabidopsis. Therefore, SET genes in different subfamilies could have diverse functions.
Previous studies of SET genes have focused on annotation and on functional characterization in Arabidopsis [7–13, 19]; in addition, evolutionary analyses have been limited to herbaceous plants [14–18]. Trees are distinct from herbaceous species in many ways: they have a self-supporting structure, secondary growth (wood), and a much longer lifespan [20, 21]. The regulatory networks and molecular mechanisms that underlie these unique properties cannot be investigated by examining non-tree species. Therefore, it is worthwhile to study SET genes in trees, thereby improving our understanding of their functions and evolution. The recently completed genome sequence of the model tree Populus trichocarpa (hereafter called Populus) provides a great opportunity to investigate these issues.
Molecular evidence suggests that Arabidopsis and Populus shared their last common ancestor as long as 100 to 120 million years ago. Since then, Arabidopsis and Populus have evolved different life histories, including herbaceous versus arboreal development, annual versus perennial habit, and self-pollination versus cross-pollination strategies [20, 21]. In addition, since the two lineages diverged, Populus has experienced one whole-genome duplication, whereas Arabidopsis has experienced two [22, 23]. In plants, evolutionary diversity has been hypothesized to be modulated directly or indirectly by epigenetic regulation. Therefore, the SET gene family, which is among the most important epigenetic regulators, could be postulated to contribute substantially to evolutionary innovations in plant diversity.
We conducted a comparative analysis of SET genes from Arabidopsis and Populus to address a key question: how have SET genes evolved in Populus since the two lineages diverged? In particular, how did the SET gene family expand and diversify in Populus? In this study, we performed comprehensive analyses of SET genes from Populus, including phylogeny, gene structure, domain architecture, gene duplication and diversification, and expression profiling. Our results offer insight into the function of Populus SET genes and provide a basis for understanding how gene functions, particularly those involved in the development of trees, have evolved.