No simple dependence between protein evolution rate and the number of protein-protein interactions: only the most prolific interactors tend to evolve slowly
© Jordan et al; licensee BioMed Central Ltd. 2003
Received: 10 November 2002
Accepted: 6 January 2003
Published: 6 January 2003
The Erratum to this article has been published in BMC Evolutionary Biology 2003 3:5
It has been suggested that rates of protein evolution are influenced, to a great extent, by the proportion of amino acid residues that are directly involved in protein function. In agreement with this hypothesis, recent work has shown a negative correlation between evolutionary rates and the number of protein-protein interactions. However, the extent to which the number of protein-protein interactions influences evolutionary rates remains unclear. Here, we address this question at several different levels of evolutionary relatedness.
Manually curated data on the number of protein-protein interactions among Saccharomyces cerevisiae proteins was examined for possible correlation with evolutionary rates between S. cerevisiae and Schizosaccharomyces pombe orthologs. Only a very weak negative correlation between the number of interactions and evolutionary rate of a protein was observed. Furthermore, no relationship was found between a more general measure of the evolutionary conservation of S. cerevisiae proteins, based on the taxonomic distribution of their homologs, and the number of protein-protein interactions. However, when the proteins from yeast were assorted into discrete bins according to the number of interactions, it turned out that 6.5% of the proteins with the greatest number of interactions evolved, on average, significantly slower than the rest of the proteins. Comparisons were also performed using protein-protein interaction data obtained with high-throughput analysis of Helicobacter pylori proteins. No convincing relationship between the number of protein-protein interactions and evolutionary rates was detected, either for comparisons of orthologs from two completely sequenced H. pylori strains or for comparisons of H. pylori and Campylobacter jejuni orthologs, even when the proteins were classified into bins by the number of interactions.
The currently available comparative-genomic data do not support the hypothesis that the evolutionary rates of the majority of proteins substantially depend on the number of protein-protein interactions they are involved in. However, a small fraction of yeast proteins with the largest number of interactions (the hubs of the interaction network) tend to evolve slower than the bulk of the proteins.
Rates of protein evolution vary greatly and may be influenced by a variety of factors. Recently, it has been demonstrated that the magnitude of the fitness effects associated with deleterious mutations in protein-coding genes (i.e., proteins' dispensability) correlates with rates of protein evolution [1, 2]. Essential proteins or those that are less dispensable to an organism tend to evolve slower than those that are more dispensable. It has also been suggested that proteins' evolutionary rates are determined by the proportion of amino acids that are critical to their function. According to this intuitively plausible notion, proteins with a greater fraction of amino acid residues that play an essential role in the protein's function are predicted to evolve slower than those with a smaller fraction of such crucial residues. Consistent with this prediction, a negative correlation has been reported between protein evolutionary rates, which were determined from evolutionary distances between orthologous proteins from the yeast Saccharomyces cerevisiae and the nematode Caenorhabditis elegans, and the number of protein-protein interactions (i.e., physical interactions determined, primarily, using the yeast two-hybrid system) that the proteins are involved in. Yeast proteins that have a large number of interacting partners were found to have evolved slower, on average, than those with fewer interacting partners, and this was presumed to be due to the fact that proteins with more interacting partners have a greater fraction of residues directly involved in function. However, these same data indicate that less than 6% of the variance in evolutionary rates is explained by the variance in the number of protein-protein interactions, suggesting that the influence of the number of interacting partners on protein evolutionary rates might not be substantial.
We sought to further investigate this phenomenon by examining the relationship between the number of protein-protein interacting partners and protein evolutionary rates for the yeasts S. cerevisiae and Schizosaccharomyces pombe as well as for the proteobacteria Helicobacter pylori and Campylobacter jejuni.
Results and Discussion
Evolutionary rates and protein-protein interactions: yeast
Correlation between the number of protein-protein interactions and the evolutionary rate
Linear correlation coefficient (r)/P-value
Rank correlation coefficient (R)/P-value
S. cerevisiae – S. pombe (all orthologs, N = 1044)
S. cerevisiae – S. pombe (only orthologs with >40% identity, N = 465)
H. pylori J99 – H. pylori 26695 (N = 672)
H. pylori – C. jejuni (N = 458)
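The two statistics named in the column headings above, the linear correlation coefficient (r) and the rank correlation coefficient (R), can be illustrated with a minimal sketch. The sketch assumes that r is the Pearson coefficient and R the Spearman coefficient computed on average ranks; the function names and the toy values are hypothetical and are not data from this study. The square of r gives the fraction of variance explained, which is the quantity behind the "less than 6%" figure quoted in the Background.

```python
import math


def pearson_r(xs, ys):
    """Linear (Pearson) correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_r(xs, ys):
    """Rank correlation: Pearson correlation of the average ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical toy values (number of interactions, evolutionary rate).
interactions = [1, 1, 2, 3, 5, 8, 20, 45]
rates = [0.9, 0.4, 1.1, 0.7, 0.8, 0.5, 0.6, 0.2]

r = pearson_r(interactions, rates)
R = spearman_r(interactions, rates)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}, R = {R:.3f}")
```

Because the number of interactions per protein is strongly skewed, the rank coefficient is less sensitive than r to a handful of highly connected proteins, which is one reason to report both.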
It is tempting to speculate that the difference between the results obtained here and those reported previously can be attributed to the difference in the evolutionary relationships between the pairs of species compared in the two studies. The species compared here, S. cerevisiae and S. pombe, are much more closely related than S. cerevisiae and C. elegans, and orthologous proteins are likely to be more reliably inferred between the closely related genomes. However, we also performed comparisons for pairs of orthologous proteins identified between the more distantly related S. cerevisiae and C. elegans, and no significant relationship between evolutionary rates and protein-protein interactions was observed (data not shown).
Long-term evolutionary conservation and protein-protein interactions: yeast
To examine the relationship between protein-protein interactions and evolutionary conservation of proteins over longer periods of time, the numbers of interactions for S. cerevisiae proteins were assessed against the taxonomic distribution of their homologs, which were detected using BLAST searches of the Genbank non-redundant protein database with expect value ≤ 10⁻³. Five distinct levels of taxonomic distribution categories, each including taxa that are successively more distant from S. cerevisiae, were considered: 1 – hits only to ascomycetes, 2 – hits to non-ascomycete fungi, 3 – hits to metazoa and plants, 4 – hits to non-crown-group eukaryotes, 5 – hits to archaea and/or bacteria. The broader the taxonomic distribution of homologs of an S. cerevisiae protein, the more evolutionarily conserved it is considered to be. Each S. cerevisiae protein was assigned a taxonomic distribution category, and this value was compared to the number of protein-protein interactions reported for the given protein. Correlation between these two features of S. cerevisiae proteins was not statistically significant (r² = 0.007, p = 0.39). Thus, as with the comparison between evolutionary rates and the number of interactions, no substantial relationship between long-term evolutionary conservation of S. cerevisiae proteins and the number of interactions was found.
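The category assignment described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used: it assumes that each BLAST hit has already been labelled with one of five taxonomic groups, and that a protein receives the most distant category in which it has at least one hit at or below the expect value cutoff. The group labels and the function name are hypothetical.

```python
E_VALUE_CUTOFF = 1e-3  # expect value threshold stated in the text

# Hypothetical group labels mapped to the five categories listed above.
CATEGORY_BY_GROUP = {
    "ascomycete": 1,
    "other_fungi": 2,
    "metazoa_or_plant": 3,
    "non_crown_eukaryote": 4,
    "archaea_or_bacteria": 5,
}


def taxonomic_category(hits):
    """Return the broadest category reached by any significant hit.

    hits: iterable of (group_label, e_value) pairs for one query protein.
    Returns None when no hit passes the cutoff.
    """
    categories = [
        CATEGORY_BY_GROUP[group]
        for group, e_value in hits
        if e_value <= E_VALUE_CUTOFF
    ]
    return max(categories) if categories else None


# A protein with a significant bacterial homolog falls into category 5,
# whereas a weak bacterial hit (E = 0.5) is ignored.
print(taxonomic_category([("ascomycete", 1e-80), ("archaea_or_bacteria", 1e-6)]))
print(taxonomic_category([("ascomycete", 1e-80), ("archaea_or_bacteria", 0.5)]))
```

The resulting integer per protein can then be paired with the protein's number of interactions and passed to any correlation routine.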
Evolutionary rates and protein-protein interactions: bacteria
Yeast proteins with the greatest number of interactions appear to evolve slowly
Statistical significance of the differences in evolutionary rates between groups of proteins with different numbers of interactions.
Bin (# interactions) comparisons a
S. cerevisiae – S. pombe
41 – 60 vs. 1 – 40: 8.3 × 10⁻⁴
31 – 60 vs. 1 – 30: 2.4 × 10⁻²
21 – 60 vs. 1 – 20: 1.7 × 10⁻⁴
H. pylori 26695 – H. pylori J99
21 – 55 vs. 1 – 20: 1.5 × 10⁻¹
15 – 55 vs. 1 – 14: 1.8 × 10⁻¹
11 – 55 vs. 1 – 10: 3.2 × 10⁻¹
H. pylori 26695 – C. jejuni
21 – 47 vs. 1 – 20: 9.8 × 10⁻¹
11 – 47 vs. 1 – 10: 5.1 × 10⁻¹
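A comparison of the kind tabulated above, between proteins above and below an interaction threshold, can be sketched as follows. The statistical test behind the tabulated values is not specified in the surviving text, so this sketch substitutes a one-sided permutation test on the difference of mean rates; it should be read as an illustration of the binning logic only. All function names and the toy data are hypothetical.

```python
import random
from statistics import mean


def split_by_interactions(proteins, threshold):
    """Split rates into a high bin (> threshold) and a low bin (1..threshold).

    proteins: list of (n_interactions, evolutionary_rate) pairs.
    """
    high = [rate for n, rate in proteins if n > threshold]
    low = [rate for n, rate in proteins if 1 <= n <= threshold]
    return high, low


def permutation_p_value(high, low, n_permutations=10000, seed=0):
    """One-sided permutation test: is the mean rate of `high` lower than `low`?

    Returns the fraction of random relabellings whose mean difference is at
    least as negative as the observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = mean(high) - mean(low)
    pooled = list(high) + list(low)
    k = len(high)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if mean(pooled[:k]) - mean(pooled[k:]) <= observed:
            count += 1
    return (count + 1) / (n_permutations + 1)


# Hypothetical toy data: (number of interactions, evolutionary rate).
toy = [(45, 0.10), (52, 0.12), (41, 0.11), (60, 0.09), (48, 0.10),
       (1, 0.90), (3, 1.00), (7, 1.10), (12, 0.95), (25, 1.05), (40, 0.98)]
high, low = split_by_interactions(toy, threshold=40)
print(len(high), len(low), permutation_p_value(high, low, n_permutations=2000))
```

Moving the threshold (40, 30, 20 in the yeast rows above) shows how sensitive the result is to the definition of the top bin.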
Discussion and conclusions
The hypothesis that a protein's rate of evolution is determined by the fraction of residues that are critical to its function, and that this fraction, in turn, is likely to be proportional to the number of interactions a protein is involved in, seems to make perfectly good sense. Indeed, a recent report is consistent with this idea in suggesting that the number of protein-protein interactions significantly affects rates of evolution. However, upon investigation of this relationship at multiple levels of evolutionary relatedness, we found that there was only a slight correlation, at best, between evolutionary rates and the number of protein-protein interactions. In fact, examination of the actual data presented in support of the previous claim of a connection between the number of interactions and evolutionary rates also shows a weak correlation, albeit greater than the one observed in this study. Thus, differences in the number of interaction partners seem to explain, at best, only a small part of the great variation of the evolutionary rates of proteins encoded in each genome.
Why does the number of interaction partners apparently have only a slight effect on the evolutionary rate? The first and most obvious possibility to consider would be that the low quality of protein-protein interaction data might obscure the signal. Indeed, a recent comparison of protein-protein interaction data sets from high-throughput studies suggested that more than half of all interactions determined by large-scale experiments are likely to be false positives. However, at least for the yeast data, we relied on manually curated protein-protein interaction data from the MIPS database, which are expected to have a substantially lower error rate. Second, one could speculate that, even if the majority of the analyzed interactions actually do occur, they are selectively (nearly) neutral; the number of such real but functionally irrelevant interactions would not affect the rate of evolution. Third, the possibility exists that, even if many of the observed interactions are functionally important and, by inference, the respective binding sites are subject to purifying selection, the binding sites for different partners tend to overlap such that the number of amino acid residues in these sites increases only slowly with the increase in the number of interactions.
The latter two possibilities are compatible with each other and with the other aspect of the observations reported here. We found that the small fraction of yeast proteins that have the greatest number of interaction partners do, on average, evolve slower than the bulk of the proteins, which are involved in a moderate or small number of interactions. This effect was less pronounced, if observed at all, for H. pylori, but it should be noted that the top bins of the H. pylori interaction data included proteins with fewer interactions than the respective bins in the yeast data (compare Fig. 3b, 3c and 3a). Protein-protein interactions form scale-free networks, which show the characteristic power-law distribution of the node degrees; simply put, there is a small number of highly connected proteins (hubs), whereas the majority have a small number of partners (the most abundant class are proteins that are involved in just one interaction) [13, 14]. Scale-free networks are highly tolerant to error (elimination of nodes at random) but are vulnerable to attack, i.e., elimination of the hubs, and, indeed, it has been found that the most highly connected proteins in yeast interaction networks tend to be essential. This might explain the present findings, namely that a small number of yeast protein-protein interaction hubs evolve slowly due to strong purifying selection, whereas, for the great majority of the proteins, there is no discernible connection between the number of interactions and evolutionary rates.
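The notions of node degree and hub used in this discussion can be made concrete with a short sketch. It assumes that the interaction data are available as a list of binary pairs, that self-interactions and duplicate pairs are ignored, and that hubs are simply the top fraction of proteins ranked by degree; the default fraction echoes the 6.5% figure from the abstract but is otherwise an arbitrary choice. Function names and the toy edge list are hypothetical.

```python
from collections import Counter


def node_degrees(edges):
    """Number of distinct interaction partners per protein.

    edges: iterable of (protein_a, protein_b) pairs. Self-pairs are skipped
    and repeated pairs are counted once (an assumption of this sketch).
    """
    partners = {}
    for a, b in edges:
        if a == b:
            continue
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    return {protein: len(found) for protein, found in partners.items()}


def degree_distribution(degrees):
    """Map each degree to the number of proteins that have it."""
    return dict(sorted(Counter(degrees.values()).items()))


def hubs(degrees, top_fraction=0.065):
    """Most highly connected proteins, ties broken alphabetically."""
    n_hubs = max(1, round(len(degrees) * top_fraction))
    ranked = sorted(degrees, key=lambda protein: (-degrees[protein], protein))
    return ranked[:n_hubs]


# Hypothetical toy network: one hub (A) and several single-partner proteins.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "A"), ("E", "E")]
degrees = node_degrees(toy_edges)
print(degrees)
print(degree_distribution(degrees))
print(hubs(degrees, top_fraction=0.25))
```

In a scale-free network the degree distribution is dominated by proteins of degree one, so the hub set defined this way remains small even for large interaction maps.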
Comparison of evolutionary rates and protein-protein interactions
Sets of protein sequences encoded by the complete genome sequences of the yeasts S. cerevisiae and S. pombe, the nematode C. elegans and the proteobacteria H. pylori strain 26695, H. pylori strain J99 and C. jejuni were downloaded from the National Center for Biotechnology Information's Genbank ftp site ftp://ftp.ncbi.nlm.nih.gov/genomes/. Protein sets (proteomes) from the following pairs of complete genome sequences were compared in order to identify orthologous sequences: S. cerevisiae – S. pombe, S. cerevisiae – C. elegans, H. pylori strain 26695 – H. pylori strain J99, H. pylori strain 26695 – C. jejuni. Pairs of proteomes were compared using the BLASTP program, with post-processing of results done using the SEALS package. For each proteome, individual proteins were used as queries in BLASTP searches against the entire proteome of the other analyzed species (or strain). Symmetrical best hits in these BLAST searches (expectation value ≤ 10⁻³) were taken to be orthologs. Pairs of orthologous proteins were aligned using the ClustalW program and their substitution (evolutionary) rates were calculated using the gamma distance correction. The data on protein-protein interactions for the S. cerevisiae proteome were obtained from the Munich Information Center for Protein Sequences (MIPS) Comprehensive Yeast Genome Database http://mips.gsf.de/proj/yeast/CYGD/db/index.html. This database includes a manually curated catalogue of binary protein-protein interactions that is considered to be a reliable reference set. Protein-protein interactions for the H. pylori proteome were taken from the PIMRider functional proteomics software platform http://pim.hybrigenics.fr/pimrider/pimriderlobby/PimRiderLobby.jsp.
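The two computational steps described above, identification of symmetrical best hits and the gamma distance correction, can be sketched as follows. This is a minimal illustration, not the SEALS-based pipeline used here. It assumes that BLASTP results have already been parsed into (query, subject, E-value) tuples, that the best hit is the one with the lowest E-value, and that the distance is computed with the standard gamma formula d = a[(1 - p)^(-1/a) - 1], where p is the fraction of differing aligned positions. The shape parameter a is not given in the text, so the value used in the example is purely hypothetical, as are all function names and identifiers.

```python
E_VALUE_CUTOFF = 1e-3  # expectation value threshold stated in the text


def best_hits(hits):
    """Map each query to its lowest E-value subject, ignoring weak hits.

    hits: iterable of (query, subject, e_value) tuples.
    """
    best = {}
    for query, subject, e_value in hits:
        if e_value > E_VALUE_CUTOFF:
            continue
        if query not in best or e_value < best[query][1]:
            best[query] = (subject, e_value)
    return {query: subject for query, (subject, _) in best.items()}


def symmetrical_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


def gamma_distance(p, alpha):
    """Gamma-corrected distance from the proportion p of differing sites.

    alpha is the gamma shape parameter (a hypothetical input here).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return alpha * ((1.0 - p) ** (-1.0 / alpha) - 1.0)


# Hypothetical parsed BLASTP results for two small proteomes.
yeast_vs_pombe = [("y1", "p1", 1e-50), ("y1", "p2", 1e-10),
                  ("y2", "p2", 1e-20), ("y3", "p3", 0.5)]
pombe_vs_yeast = [("p1", "y1", 1e-48), ("p2", "y1", 1e-9), ("p3", "y3", 1e-30)]

print(symmetrical_best_hits(yeast_vs_pombe, pombe_vs_yeast))
print(gamma_distance(0.5, alpha=1.0))
```

In the toy data, y2 and p2 are not paired because the best hit of p2 is y1, and y3 is excluded because its only hit exceeds the expectation value threshold; this is the behaviour that makes the symmetrical criterion conservative.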
- Hirsh AE, Fraser HB: Protein dispensability and rate of evolution. Nature. 2001, 411: 1046-1049. 10.1038/35082561.
- Jordan IK, Rogozin IB, Wolf YI, Koonin EV: Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res. 2002, 12: 962-968. 10.1101/gr.87702. Article published online before print in May 2002.
- Brookfield JF: What determines the rate of sequence evolution? Curr Biol. 2000, 10: R410-R411. 10.1016/S0960-9822(00)00506-6.
- Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW: Evolutionary rate in the protein interaction network. Science. 2002, 296: 750-752. 10.1126/science.1068696.
- Mewes HW, Frishman D, Guldener U, Mannhaupt G, Mayer K, Mokrejs M, Morgenstern B, Munsterkotter M, Rudd S, Weil B: MIPS: a database for genomes and protein sequences. Nucleic Acids Res. 2002, 30: 31-34. 10.1093/nar/30.1.31.
- The C. elegans Sequencing Consortium: Genome sequence of the nematode C. elegans: a platform for investigating biology. Science. 1998, 282: 2012-2018. 10.1126/science.282.5396.2012.
- Rain JC, Selig L, De Reuse H, Battaglia V, Reverdy C, Simon S, Lenzen G, Petel F, Wojcik J, Schachter V: The protein-protein interaction map of Helicobacter pylori. Nature. 2001, 409: 211-215. 10.1038/35051615.
- Alm RA, Ling LS, Moir DT, King BL, Brown ED, Doig PC, Smith DR, Noonan B, Guild BC, deJonge BL: Genomic-sequence comparison of two unrelated isolates of the human gastric pathogen Helicobacter pylori. Nature. 1999, 397: 176-180. 10.1038/16495.
- Tomb JF, White O, Kerlavage AR, Clayton RA, Sutton GG, Fleischmann RD, Ketchum KA, Klenk HP, Gill S, Dougherty BA: The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature. 1997, 388: 539-547. 10.1038/41483.
- Parkhill J, Wren BW, Mungall K, Ketley JM, Churcher C, Basham D, Chillingworth T, Davies RM, Feltwell T, Holroyd S: The genome sequence of the food-borne pathogen Campylobacter jejuni reveals hypervariable sequences. Nature. 2000, 403: 665-668. 10.1038/35001088.
- Grishin NV, Wolf YI, Koonin EV: From complete genomes to measures of substitution rate variability within and between proteins. Genome Res. 2000, 10: 991-1000. 10.1101/gr.10.7.991.
- von Mering C, Krause R, Snel B, Cornell M, Oliver SG, Fields S, Bork P: Comparative assessment of large-scale data sets of protein-protein interactions. Nature. 2002, 417: 399-403. 10.1038/nature750.
- Jeong H, Mason SP, Barabasi AL, Oltvai ZN: Lethality and centrality in protein networks. Nature. 2001, 411: 41-42. 10.1038/35075138.
- Lappe M, Park J, Niggemann O, Holm L: Generating protein interaction maps from incomplete data: application to fold assignment. Bioinformatics. 2001, 17 (Suppl 1): S149-156.
- Albert R, Jeong H, Barabasi AL: Error and attack tolerance of complex networks. Nature. 2000, 406: 378-382. 10.1038/35019019.
- Goffeau A, Barrell BG, Bussey H, Davis RW, Dujon B, Feldmann H, Galibert F, Hoheisel JD, Jacq C, Johnston M: Life with 6000 genes. Science. 1996, 274: 563-547. 10.1126/science.274.5287.546.
- Wood V, Gwilliam R, Rajandream MA, Lyne M, Lyne R, Stewart A, Sgouros J, Peat N, Hayles J, Baker S: The genome sequence of Schizosaccharomyces pombe. Nature. 2002, 415: 871-880. 10.1038/nature724.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
- Tatusov RL, Koonin EV, Lipman DJ: A genomic perspective on protein families. Science. 1997, 278: 631-637. 10.1126/science.278.5338.631.
- Walker DR, Koonin EV: SEALS: a system for easy analysis of lots of sequences. Proc Int Conf Intell Syst Mol Biol. 1997, 5: 333-339.
- Higgins DG, Thompson JD, Gibson TJ: Using CLUSTAL for multiple sequence alignments. Methods Enzymol. 1996, 266: 383-402.
- Ota T, Nei M: Estimation of the number of amino acid substitutions per site when the substitution rate varies among sites. J Mol Evol. 1994, 38: 642-643.
This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.