- Research article
- Open Access
Functional bias in molecular evolution rate of Arabidopsis thaliana
© Warren et al; licensee BioMed Central Ltd. 2010
- Received: 29 September 2009
- Accepted: 1 May 2010
- Published: 1 May 2010
Characteristics derived from mutation and other mechanisms that are advantageous for survival are often preserved during evolution by natural selection. Some genes are conserved in many organisms because they are responsible for fundamental biological function, others are conserved for their unique functional characteristics. Therefore one would expect the rate of molecular evolution for individual genes to be dependent on their biological function. Whether this expectation holds for genes duplicated by whole genome duplication is not known.
We empirically demonstrate here, using duplicated genes generated from the Arabidopsis thaliana α-duplication event, that the rate of molecular evolution of genes duplicated in this event depend on biological function. Using functional clustering based on gene ontology annotation of gene pairs, we show that some duplicated genes, such as defense response genes, are under weaker purifying selection or under stronger diversifying selection than other duplicated genes, such as protein translation genes, as measured by the ratio of nonsynonymous to synonymous divergence (dN/dS).
These results provide empirical evidence indicating that molecular evolution rate for genes duplicated in whole genome duplication, as measured by dN/dS, may depend on biological function, which we characterize using gene ontology annotation. Furthermore, the general approach used here provides a framework for comparative analysis of molecular evolution rate for genes based on their biological function.
- Gene Ontology
- Gene Pair
- Duplicate Gene
- Nonsynonymous Substitution
- Endomembrane System
Gene duplication has been considered to be an important process for creating novel gene function [1, 2]. Duplicated genes give organisms the opportunity to "experiment" with mutations without losing the biological function of the original gene because when one copy experiences a deleterious mutation that destroys its function, the other copy can still be functional. Beneficial mutations can be retained due to the advantages they confer. Neutral mutations, such as most synonymous mutations, or some nonsynonymous mutations that do not alter gene function, can also be retained. With accumulated mutations, the pair of duplicate genes will diverge in sequence, and possibly function, over time. The amount of sequence divergence depends on (a) the natural mutation rate, determined by environment condition, error correction mechanism, GC content etc., and (b) the selective constraint due to a gene's biological function, i.e. biological functions that are more likely to benefit from mutations could be under diversifying selection whereas those that are less likely to benefit from mutations could be under purifying selection [3–10].
Gene sequence divergence can be measured by the number of nonsynonymous mutations per site, dN, i.e nucleotide substitutions that change the amino acid encoded by the codon. However, in addition to selection due to evolutionary pressure, other factors such as expression level , guanine-cytosine (GC) content , and location on the chromosome , have been found to affect dN. To isolate the effect of selective constraint, dN can be compared to the number of synonymous mutations per synonymous site, dS, i.e. nucleotide substitutions that do not change the amino acid encoded by a given codon. dS is also affected by expression level, GC content [14, 15], and chromosomal location . Because synonymous changes do not change the amino acid, they are often considered to be selectively neutral (although evidence shows that synonymous changes can be weakly selected ). Thus the ratio of nonsynonymous to synonymous mutations, dN/dS, should measure the net effect of selection on molecular evolution rate, assuming the other factors equally affect synonymous and nonsynonymous mutations. A study by Williams and Hurst  indicates that in some cases chromosome location may not equally affect dN and dS.
Previous work in plants reveal that rates of nonsynonymous substitutions were negatively correlated with GC content at synonymous third codon positions, and synonymous substitution rates were negatively correlated with codon bias, similar to what has been found in the animal system . Moreover, a study of 83 genes in Arabidopsis thaliana and Arabidopsis lyrata has shown a significant negative correlation between the rates of nonsynonymous substitutions and gene expression level . Ganko et al.  found that average expression level and breadth of expression tend to decline with dN in duplicated genes of Arabidopsis but do not have a significant correlation with dS. Furthermore, it has been well documented that some functional classes of genes have distinct expression levels . However, there has not been a large-scale study on how variations in selective constraint are reflected in gene functions. In this work we use the genes from the Arabidopsis thaliana gene duplication events of 20-60 MYA [21, 22], i.e. the α-duplication event, as a benchmark to investigate the effect of biological function on molecular evolution rate of the duplicated genes. Using a single genome duplication event eliminates variations in divergence rates due to factors such as population size that may be different for different organisms.
A whole genome duplication event in an ancestral species creates many paralogs. Some of these genes may be discarded over time. There is evidence for a functional bias for those paralogs that are retained . Those that are retained tend to be functionally redundant and take advantage of the dosage effect or have functional divergence through neo- and subfunctionalization. However, "the incidence of functional divergence among duplicated genes is difficult to quantify" . The objective of this study is to empirically examine the relationship between biological function and the rate of sequence divergence after a duplication event. To this end we have employed Gene Ontology (GO) annotations in our analysis to describe the common biological role of a paralogous pair. Because many of the duplicated genes are also experimentally characterized in the TAIR gene ontology (GO) dataset , it may be possible to discern properties of evolutionary rates based on GO categories.
We used the Bowers et al.  dataset for the A. thaliana whole genome α-duplication event, which based on estimation, occurred before the divergence of Arabidopsis thaliana from Brassica but after its divergence from the Malvaceae.
To assess the overall effect of gene function on sequence divergence, we placed genes into functional groups. The objective in creating these groups can be defined as follows. For any two gene pairs the more specific their shared role in the cell the tighter their subsequent grouping. To do this we used the GOSim package  to cluster genes based on their functional profile. The GOSim package "provides the researcher with various information theoretic similarity concepts for GO terms." Within the GOSim package we selected the Resnik method  to create term, term similarities for all pairs of terms in the Gene Ontology. This is defined to be: , where Pa(t, t') denotes the term set of all common ancestors of GO terms t and t', and IC(t) is the information content of term t as defined by Lord et al. . These similarities were combined using the "optimal assignment" method by Frohlich et al.  to give coefficients, for all pairwise combinations of gene pairs, that indicate the functional similarity of selected pairs. The optimal assignment method assigns each term of the gene with fewer annotations to exactly one term of the other gene, such that the sum of term-term similarities is maximized. Based on these coefficients, Ward's hierarchical clustering algorithm  was used to group genes together with similar functional profiles. The resulting hierarchical tree was cut using a bottom up approach such that each group meets a minimum size constraint of ≥ 20. In order to maintain good functional specificity for shared functions, without reducing the population of the resulting groups to a trivial number, groups were defined by the lowest internal node that achieved the minimum size threshold. For Ward's clustering algorithm the height corresponds to the analysis of variance (ANOVA) sum of squares difference between two clusters added up over all the variables within those clusters.
The protein and DNA sequences were obtained from The Arabidopsis Information Resource (TAIR) database on February 20, 2008 . We aligned the protein sequences for duplicate gene pairs using the needle program, with default parameters. The program implements the Needleman-Wunsch global alignment algorithm . We then aligned the DNA sequences for duplicated gene pairs according to the aligned protein sequences using the PAL2NAL program . Last, to calculate dN/dS for duplicate gene pairs, we use the yn00 with default parameters within the Phylogenetic Analysis of Maximum Likelihood (PAML) program . yn00 implements the method of Yang and Nielsen  which calculates dN/dS taking into account transition/transversion rate biases and base/codon frequency biases.
To evaluate statistical correlation between dN/dS and functional groups we used the analysis of variance (ANOVA) method . ANOVA uses Fisher's F-test to determine statistical significance of variance in group means compared to the mean for a group. The resulting p-value determines if the null hypotheses, that mean dN/dS values are equivalent for all functional groups, should be rejected. To identify the specific groups that are significantly different, we used Tukey's honestly significant difference (HSD) criteria which is based on Studentized range distribution for determining critical values .
To characterize specific functional groups we examine the group's relative enrichment of GO annotation for genes in the group compared to all the genes in the α-duplication event. For this purpose we use the Ontologizer software's parent-child union method with Bonferroni correction .
There are a total of 3822 duplicated gene pairs from the α duplication event. After removing the gene pairs that only have electronic annotation of GO terms, we had 2728 duplicated gene pairs. Then after removing the gene pairs that have shared gene function that is at depth 1, we were finally left with 2683 duplicated gene pairs.
Prob > F
We performed functional enrichment analysis to identify the primary biological function (GO term) representing each of the groups. The top three GO terms from the enrichment analysis for all the groups and visualizations of enrichments for the examples mentioned here are provided in Additional file 1. Figure 3 shows that molecular function group 27 and group 28 genes have a sharp contrast in dN/dS values. Group 27 is enriched with genes that are structural constituent of ribosome (GO:0003735, p-value = 2.11E-240). In contrast, group 28 is enriched with genes that have the molecular function of, enzyme inhibitor activity (GO:0004857, p-value = 1.44E-079) and pectinesterase inhibitor activity (GO:0046910, p-value = 3.42E-074). We noticed that group 29 also has a relatively high mean dN/dS, and is enriched with genes with signal transducer activity (GO:0060089, p-value = 1.72E-039) and activation of innate immune response (GO:0002218, p-value = 1.26E-005), which seems to be consistent with the fact that immune related genes tend to evolve fast. Group 25 also has elevated dN/dS values and is enriched for lipid binding (GO:0008289, p-value = 1.35E-078). An ontology enrichment figure for these groups can be found in Additional file 2.
Figure 4 shows that groups 1, 15 and 24 genes have a sharp contrast in dN/dS values. Group 1's slowly-evolving genes are enriched with functions involved in the cellular macromolecule biosynthetic process (GO:0034645, p-value = 1.70E-078). In contrast, group 15 has elevated dN/dS values and is enriched with genes involved in lipid transport (GO:0006869, p-value = 1.62E-093). All but one pair of genes from this group are members of Group 25 (enriched for lipid binding) from the molecular function analysis. The other group with high dN/dS values, Group 24, has genes involved in defense response (GO:0006952, p-value = 6.48E-047), response to stimulus (GO:0050896, p-value = 3.11E-022), and activation of immune response (GO:0002253, p-value = 7.31E-004). An ontology enrichment figure for these groups can be found in Additional file 3.
Figure 5 shows that group 2 and group 20 genes have a sharp contrast in dN/dS values. Analysis of functional enrichment in these groups shows that group 2 is enriched with genes that make up components of cytosolic ribosomes (GO:0022626, p-value = 1.61E-115). Group 20 is enriched with genes whose protein products are anchored to membrane (GO:0031225, p-value = 7.67E-94). Group 14 has a high dN/dS value and is significantly enriched for genes whose products are contained in the nucleus (GO:0005634, p-value = 1.32E-107). Further analysis of this group with respect to molecular function shows enrichment for genes involved in transcription regulator activity (GO:0030528, p-value = 2.53E-265). Two other groups with genes having high dN/dS, groups 23 and 25, are enriched with genes that are components of endomembrane system. Out of the 872 genes found to be annoated with endomembrane system in the α-duplicated genes, 623 were found to be in groups 23 and 25. An ontology enrichment figure for these groups can be found in Additional file 4.
When making predictions about evolution, a frequent caveat is that the results are given under the assumption that the genes/proteins involved evolve at some constant rate defined as a parameter in the model. Bos and Pasada  have suggested fitting the model to the genes being used to obtain more accurate results. Why not then also adjust the rate at which the genes are assumed to evolve based on selective pressure? We are not aware of any data that exists which defines, in broad fashion, the tendency of genes to diverge at a certain rate based on its functional characteristics. Perhaps the most similar study to that done here, Blanc and Wolfe  investigate a similar but distinct issue by comparing the divergence of paralogous pairs to an outgroup protein. They found a statistically significant divergence rate for 173 out of the 833 paralogous pairs analyzed. However, their analysis was conditioned on the choice of an appropriate outgroup protein. They investigated functional enrichment for duplicated pairs where one gene diverges faster than the other in comparison to an outgroup protein, i.e., "asymmetric divergence". In their work the aim is to evaluate the probability that "the two duplicated protein sequences evolve at the same rate." In this work we are looking for significant differences in the average dN/dS ratio for groups that have been clustered according to their functional profiles and use only the differences between two α-duplicated genes.
The analysis we employ presents compelling evidence for functional bias in molecular evolution rate of duplicated genes arising from the Arabidopsis α duplication event. However, it is possible that there is another cause for the differences in dN/dS, or that some of the conditions imposed in our analysis has inadvertently biased the results. To try and account for some of these possibilities we examine the effect of chromosome location, combining multiple and single duplicates, and the clustering parameters for defining functional groups.
Effect of chromosomal location
A recent study has shown that genes from the same functional classes tend to cluster together on the chromosomes in Arabidopsis thaliana . Also, gene expression is highly correlated between neighboring genes on a chromosome in Arabidopsis thaliana . In addition, the distribution of tandemly arrayed genes appears to be positively correlated with recombination rates in both Arabidopsis thaliana  and rice . Here we examined whether the rate of evolution in duplicated genes generated from the α duplication event in Arabidopsis is a function of chromosome location. The chromosome location can be calculated as the distance, in base pairs (bp) from the centromere to the midpoint of the gene. The correlation coefficient between dN/dS and chromosome location is 0.006 (p-value = 0.7) indicating that for the dataset used here, distance from the centromere is not the cause of variations in dN/dS values. Chromosomal location of duplicated gene pairs are shown in Additional file 5.
Effect of combining multiple and single duplicates
The Arabidopsis thaliana α-duplication dataset contains a number of genes that have more than one duplicate due to subsequent gene duplications. We refer to these genes as multiple duplicates and refer to the genes that do not have additional duplications following α-duplication as single duplicates. Analysis of variance between dN/dS and multiple/single duplicates has a p-value of < 10-16 indicating that copy number does influence the rate of divergence for multiple/single duplicates. A two-way analysis of variance between dN/dS and both functional groups and multiple/single duplicates also has a p-value of < 10-16 indicating that there is an interdependence between functional groups and multiple/single duplicates. Furthermore, an analysis of variance excluding genes with multiple duplicates does identify protein translation genes as having a significantly lower dN/dS value, but does not contain a sufficient number defense response genes to establish a functional group. Most (22 of the 27) defense response pairs have genes with multiple duplicates while few (7 of the 53) protein translation pairs have genes with multiple duplicates. This analysis shows that genes with multiple duplicates tend to diverge at a faster rate (i.e. high dN/dS) than other genes. A figure illustrating this difference can be found in Additional file 6. As there is no reason to believe that some genes duplicate more frequently than others, the duplicates for defense response genes must be retained more frequently than others. This is what one would expect if there were weaker negative selection against mutations in the defense response genes, stronger selective pressure and constraints on mutations in protein translation genes. This is further evidence that natural selection favors more variation in defense response genes, while variation in translation genes are selected against, probably to maintain this fundamental biological function. Our observation that multiple copy genes tend to evolve faster find both support and contradict previous studies. For example, Scannell and Wolfe  studied duplicated genes resulting from the whole genome in yeast and found that duplicated genes tend to evolve much faster than single copy genes (i.e. singletons). However, counter examples also exist. Jordan et al.  found that duplicated genes in several pairs of species such as human-mouse, fly-mosquito, and yeast-C. albicans, on average, evolved slower than singletons. It is not clear what causes the contradictory observations.
Effect of clustering parameters
Changing the clustering method or parameters will change the composition of functional groups. We tested a number of alternative clustering parameters to determine the effect on our results. For the alternative parameters tested, the ANOVA results showed p-values of < 10-6 indicating a strong correlation between dN/dS and functional groups. For example, using a fixed cluster cutoff height of 0.1 for biological process results in 36 groups, whereas using a cutoff height of 0.001 results in 222 groups. The corresponding p-values are < 10-16 and < 10-6 for the analysis of variance for dN/dS by group. Interestingly, the number of pairs in the "defense response" group does not change and the number of pairs in "protein translation" group changes a little from 53 to 45. This indicates that the results are robust with respect to different clustering parameters.
This analysis provided empirical evidence for functional bias in molecular evolution rate of genes duplicated by whole genome duplication in Arabidopsis. Furthermore, it identified specific functional groups that are likely to have significantly higher or lower molecular evolution rates. For example we found that defense response genes are highly variable while protein translation genes are highly conserved.
Intuitively, these findings are not surprising. Defense response genes should be highly variable to be able to respond to changing environmental conditions, and protein translation genes should be highly conserved since any change would effect all biological functions. Previous studies have shown that the molecular evolution in individual disease defense genes were subject to diversifying selection . It has also been shown that the molecular evolution rate for some ribosomal proteins, which are responsible for protein translation, is slower than that of other known cellular proteins . The fact that these well known patterns emerge from our functional clustering strongly indicate that the method proposed here is a viable one for investigating the relative divergence rates for genes based on their role in the cell, as described by the Gene Ontology. Besides providing independent support to previous findings, our study revealed additional groups of genes such as the ones involved in the endomembrane system in the α-duplication event that can be potentially interesting for future empirical studies.
Although our results are statistically significant, some additional analysis would define the extent to which function impacts divergence rate in duplicated genes. Further examination of the effect of GC content, protein domains, expression level and gene relocation, would clarify the relation between functional bias and other factors that correlate with divergence rate. Comparison to an outgroup and extending the analysis to another organism such as yeast, will provide supporting evidence and generalize the results to other organisms. It is well known that different functional classes of genes have different levels of expression and that expression has been shown to correlate negatively with nonsynonymous substitutions. A study of how expression level and breadth of expression manifests itself in various functional groups, relative to divergence rate, holds potential for future work.
There are also some refinements to the methods used here that should be considered for future studies. First is to devise a method to combine the three Ontology similarities computed for pairs in GOSim clustering. This should increase the resolution of the groups with respect to their role in the cell and would require only one functional enrichment analysis across all three ontologies, biological process, molecular function and cellular function. Second is to repeat functional clustering with only evidence code filtering, not shared function filtering. This will increase the number of pairs that remain and may find additional emergent functional properties. Finally, consider clustering by GO annotation for individual genes instead of the common annotation for gene pairs. This would produce a more enriched grouping which may provide better insight into the reasons for differences in dN/dS values.
This work was partially supported by NSF grant IIS-0710945 to LZ.
- Zhang J: Evolution by gene duplication: an update. Trends in Ecology & Evolution. 2003, 18 (6): 292-298.View ArticleGoogle Scholar
- Force A, Lynch M, Pickett BF, Amores A, Yan YL, Postlethwait J: Preservation of Duplicate Genes by Complementary, Degenerative Mutations. Genetics. 1999, 151 (4): 1531-1545.PubMed CentralPubMedGoogle Scholar
- Baer CF, Miyamoto MM, Denver DR: Mutation rate variation in multicellular eukaryotes: causes and consequences. Nature Reviews Genetics. 2007, 8 (8): 619-631. 10.1038/nrg2158.View ArticlePubMedGoogle Scholar
- Wagner A: Selection and gene duplication: a view from the genome. Genome Biology. 2002, 3 (5): reviews1012.1-reviews1012.3. 10.1186/gb-2002-3-5-reviews1012.View ArticleGoogle Scholar
- Tourasse NJ, Li WH: Selective Constraints, Amino Acid Composition, and the Rate of Protein Evolution. Mol Biol Evol. 2000, 17 (4): 656-664.View ArticlePubMedGoogle Scholar
- Seoighe C, Gehring C: Genome duplication led to highly selective expansion of the Arabidopsis thaliana proteome. Trends Genet. 2004, 20 (10): 461-464. 10.1016/j.tig.2004.07.008.View ArticlePubMedGoogle Scholar
- Stevens TJ, Arkin IT: Substitution rates in alpha-helical transmembrane proteins. Protein Sci. 2001, 10 (12): 2507-2517. 10.1110/ps.ps.10501.PubMed CentralView ArticlePubMedGoogle Scholar
- Stern DL, Orgogozo V: Is Genetic Evolution Predictable?. Science. 2009, 323 (5915): 746-751. 10.1126/science.1158997.PubMed CentralView ArticlePubMedGoogle Scholar
- Wolf YI, Novichkov PS, Karev GP, Koonin EV, Lipman DJ: The universal distribution of evolutionary rates of genes and distinct characteristics of eukaryotic genes of different apparent ages. Proceedings of the National Academy of Sciences. 2009Google Scholar
- Xia Y, Franzosa EA, Gerstein MB: Integrated Assessment of Genomic Correlates of Protein Evolutionary Rate. PLoS Comput Biol. 2009, 5 (6): e1000413-10.1371/journal.pcbi.1000413.PubMed CentralView ArticlePubMedGoogle Scholar
- Drummond D, Allan A, Bloom D, Jesse D, Adami C, Wilke O, Claus O, Arnold H, Frances H: Why highly expressed proteins evolve slowly. Proc Natl Acad Sci USA. 2005Google Scholar
- Rodríguez-Trelles F, Tarrío R, Ayala FJ: Fluctuating mutation bias and the evolution of base composition in Drosophila. Journal of Molecular Evolution. 2000, 50: 1-10.View ArticlePubMedGoogle Scholar
- Williams EJB, Hurst LD: The proteins of linked genes evolve at similar rates. Nature. 2000, 407: 900-903. 10.1038/35038066.View ArticlePubMedGoogle Scholar
- DeRose-Wilson L, Gaut B: Transcription-related mutations and GC content drive variation in nucleotide substitution rates across the genomes of Arabidopsis thaliana and Arabidopsis lyrata. BMC Evolutionary Biology. 2007, 7: 66-10.1186/1471-2148-7-66.PubMed CentralView ArticlePubMedGoogle Scholar
- Wright SI, Yau CBK, Looseley M, Meyers BC: Effects of Gene Expression on Molecular Evolution in Arabidopsis thaliana and Arabidopsis lyrata. Mol Biol Evol. 2004, 21 (9): 1719-1726. 10.1093/molbev/msh191.View ArticlePubMedGoogle Scholar
- Matassi G, Sharp PM, Gautier C: Chromosomal location effects on gene sequence evolution in mammals. Current Biology. 1999, 9: 786-791. 10.1016/S0960-9822(99)80361-3.View ArticlePubMedGoogle Scholar
- Duret L, Mouchiroud D: Determinants of Substitution Rates in Mammalian Genes: Expression Pattern Affects Selection Intensity but Not Mutation Rate. Mol Biol Evol. 2000, 17: 68-70.View ArticlePubMedGoogle Scholar
- Zhang L, Vision TJ, Gaut BS: Patterns of nucleotide substitution among simultaneously duplicated gene pairs in Arabidopsis thaliana. Mol Biol Evol. 2002, 19 (9): 1464-1473.View ArticlePubMedGoogle Scholar
- Ganko EW, Meyers BC, Vision TJ: Divergence in Expression between Duplicated Genes in Arabidopsis. Mol Biol Evol. 2007, 24 (10): 2298-2309. 10.1093/molbev/msm158.View ArticlePubMedGoogle Scholar
- Schmid M, Davison TS, Henz SR, Pape UJ, Demar M, Vingron M, Scholkopf B, Weigel D, Lohmann JU: A gene expression map of Arabidopsis thaliana development. Nat Genet. 2005, 37 (5): 501-506. 10.1038/ng1543.View ArticlePubMedGoogle Scholar
- Blanc G, Hokamp K, Wolfe KH: A recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome. Genome Res. 2003, 13 (2): 137-144. 10.1101/gr.751803.PubMed CentralView ArticlePubMedGoogle Scholar
- Bowers JE, Chapman BA, Rong J, Paterson AH: Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature. 2003, 422 (6930): 433-438. 10.1038/nature01521.View ArticlePubMedGoogle Scholar
- Blanc G, Wolfe KH: Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. Plant Cell. 2004, 16 (7): 1679-1691. 10.1105/tpc.021410.PubMed CentralView ArticlePubMedGoogle Scholar
- Rhee SY, Beavis W, Berardini TZ, Chen G, Dixon D, Doyle A, Garcia-Hernandez M, Huala E, Lander G, Montoya M, Miller N, Mueller LA, Mundodi S, Reiser L, Tacklind J, Weems DC, Wu Y, Xu I, Yoo D, Yoon J, Zhang P: The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucl Acids Res. 2003, 31: 224-228. 10.1093/nar/gkg076.View ArticlePubMedGoogle Scholar
- Frohlich H, Speer N, Poustka A, BeiSZbarth T: GOSim - an R-package for computation of information theoretic GO similarities between terms and gene products. BMC Bioinformatics. 2007, 8: 166-10.1186/1471-2105-8-166.PubMed CentralView ArticlePubMedGoogle Scholar
- Resnik P: Using Information Content to Evaluate Semantic Similarity in a Taxonomy. IJCAI. 1995, 448-453.Google Scholar
- Lord PW, Stevens RD, Brass A, Goble CA: Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation. Bioinformatics. 2003, 19 (10): 1275-1283. 10.1093/bioinformatics/btg153.View ArticlePubMedGoogle Scholar
- Frohlich H, Speer N, Spieth C, Zell A: Kernel Based Functional Gene Grouping. Neural Networks, 2006. IJCNN '06. International Joint Conference on. 2006, 3580-3585. full_text.Google Scholar
- Ward J, Joe H: Hierarchical Grouping to Optimize an Objective Function. Journal of the American Statistical Association. 1963, 58 (301): 236-244. 10.2307/2282967.View ArticleGoogle Scholar
- Huala E, Dickerman AW, Garcia-Hernandez M, Weems D, Reiser L, LaFond F, Hanley D, Kiphart D, Zhuang M, Huang W, Mueller LA, Bhattacharyya D, Bhaya D, Sobral BW, Beavis W, Meinke DW, Town CD, Somerville C, Rhee SY: The Arabidopsis Information Resource (TAIR): a comprehensive database and web-based information retrieval, analysis, and visualization system for a model plant. Nucleic Acids Res. 2001, 29: 102-105. 10.1093/nar/29.1.102.PubMed CentralView ArticlePubMedGoogle Scholar
- Needleman SB, Wunsch CD: A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology. 1970, 48 (3): 443-453. 10.1016/0022-2836(70)90057-4.View ArticlePubMedGoogle Scholar
- Suyama M, Torrents D, Bork P: PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Research. 2006, W609-W612. 10.1093/nar/gkl315. 34 web serverGoogle Scholar
- Yang Z: PAML 4: Phylogenetic Analysis by Maximum Likelihood. Mol Biol Evol. 2007, msm088+Google Scholar
- Yang Z, Nielsen R: Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol. 2000, 17: 32-43.View ArticlePubMedGoogle Scholar
- Fisher RA: The correlation between relatives on the supposition of mendelian inheritence. Transactions of the Royal Society of Edinburgh. 1918, 52: 399-433.View ArticleGoogle Scholar
- Stoline MR: The Status of Multiple Comparisons: Simultaneous Estimation of All Pairwise Comparisons in One-Way ANOVA Designs. The American Statistician. 1981, 35 (3): 134-141. 10.2307/2683979.Google Scholar
- Grossmann S, Bauer S, Robinson PNN, Vingron M: Improved Detection of Overrepresentation of Gene-Ontology Annotations with Parent-Child Analysis. Bioinformatics. 2007, 23 (22): 3024-31. 10.1093/bioinformatics/btm440.View ArticlePubMedGoogle Scholar
- Bos DH, Posada D: Using Models of Nucleotide Evolution to Build Phylogenetic Trees. Developmental and Comparative Immunology. 2005, 29 (3): 211-227. 10.1016/j.dci.2004.07.007.View ArticlePubMedGoogle Scholar
- Riley MC, Clare A, King RD: Locational distribution of gene functional classes in Arabidopsis thaliana. BMC Bioinformatics. 2007, 8: 112-10.1186/1471-2105-8-112.PubMed CentralView ArticlePubMedGoogle Scholar
- Williams EJ, Bowles DJ: Coexpression of neighboring genes in the genome of Arabidopsis thaliana. Genome Res. 2004, 14 (6): 1060-7. 10.1101/gr.2131104.PubMed CentralView ArticlePubMedGoogle Scholar
- Zhang L, Gaut BS: Does recombination shape the distribution and evolution of tandemly arrayed genes (TAGs) in the Arabidopsis thaliana genome?. Genome Res. 2003, 13 (12): 2533-40. 10.1101/gr.1318503.PubMed CentralView ArticlePubMedGoogle Scholar
- Rizzon C, Ponger L, Gaut BS: Striking similarities in the genomic distribution of tandemly arrayed genes in Arabidopsis and rice. PLoS Comput Biol. 2006, 2 (9): e115-10.1371/journal.pcbi.0020115.PubMed CentralView ArticlePubMedGoogle Scholar
- Scannell DR, Wolfe KH: A burst of protein sequence evolution and a prolonged period of asymmetric evolution follow gene duplication in yeast. Genome Res. 2008, 18: 137-47. 10.1101/gr.6341207.PubMed CentralView ArticlePubMedGoogle Scholar
- Jordan IK, Wolf YI, Koonin EV: Duplicated genes evolve slower than singletons despite the initial rate increase. BMC Evol Biol. 2004, 4: 22-10.1186/1471-2148-4-22.PubMed CentralView ArticlePubMedGoogle Scholar
- Zhang L, Peek AS, Dunams D, Gaut BS: Population genetics of duplicated disease-defense genes, hm1 and hm2, in maize (Zea mays ssp. mays L.) and its wild ancestor (Zea mays ssp. parviglumis). Genetics. 2002, 162 (2): 851-860.PubMed CentralPubMedGoogle Scholar
- Hori H, Higo KI, Osawa S: The rates of evolution in some ribosomal components. Journal of Molecular Evolution. 1977, 9 (3): 191-201. 10.1007/BF01796108.View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.