Transcriptional abundance is not the single force driving the evolution of bacterial proteins
© Wei et al.; licensee BioMed Central Ltd. 2013
Received: 19 March 2013
Accepted: 1 August 2013
Published: 2 August 2013
Despite rapid progress in understanding the mechanisms that shape the evolution of proteins, the relative importance of the various factors remains to be elucidated. In this study, we assessed the effects of 16 different biological features on the evolutionary rates (ERs) of protein-coding sequences in bacterial genomes.
Our analysis of 18 bacterial species revealed new correlations between ERs and constraining factors. Previous studies have suggested that transcriptional abundance overwhelmingly constrains the evolution of yeast protein sequences, because high abundance leads to selection against misfolding or misinteraction. In this study, we found that no single factor determines the evolution of bacterial proteins. Not only transcriptional abundance (codon adaptation index and expression level), but also protein-protein associations (PPAs), essentiality (ESS), subcellular localization in the cytoplasmic membrane (SLM), transmembrane helices (TMH) and hydropathicity score (HS) independently and significantly affected the ERs of bacterial proteins. In some species, PPA and ESS showed higher correlations with ER than transcriptional abundance did.
Different forces drive the evolution of protein sequences in yeast and bacteria. In bacteria, the constraints involve avoiding a build-up of toxic molecules caused by misfolding or misinteraction (transcriptional abundance), retaining important functions (ESS, PPA) and maintaining the cell membrane (SLM, TMH and HS). Each of these independently contributes to the variation in protein evolution.
Keywords: Evolutionary rates; Bacteria; Multiple features; Transcriptional abundance
Amino acid substitution rates vary considerably among different proteins. Although rapid progress has been made in determining the most important factors that shape protein evolution, the challenge remains to assess the relative importance of the various variables, such as gene expression level, essentiality (ESS) and protein interactions [1–10]. One early study proposed a negative correlation between the severity of gene knockout effects and the rate of coding sequence evolution, based on the notion that purifying selection should be more efficient for essential genes than for non-essential genes. A link has also been discovered between protein expression levels and evolutionary rates (ERs) in both unicellular and multicellular organisms [7, 12–19].
In general, genes that are highly expressed preferentially use optimal codons to improve translational efficiency. The codon adaptation index (CAI), a measure of synonymous codon usage bias, has been widely used as a proxy for gene expression levels. When CAI values were used as a substitute for actual expression levels in yeast and bacteria, only a small proportion of the rate variation in protein evolution could be explained by ESS. After replacing CAI values with experimental data and controlling for gene expression levels, ESS still had significant effects on protein ERs, but did not appear to be a major determinant of protein evolution [21, 22]. CAI, expression level, and protein abundance can account for most of the variation in yeast protein ERs. The need to keep proteins from misfolding or misinteracting results in the slow evolution of highly expressed genes, and imposes a general constraint on coding sequence evolution [7, 23]. However, when noiseless variables were used, protein interactions explained more ER variation than transcriptional abundance. Results from another study suggest that the molecular evolution of protein-coding genes is affected both by extrinsic translational expression rates and by intrinsic structural-functional constraints.
Bacterial features examined in this study
Evolutionary rate (Ka)
Codon adaptation index
mRNA folding strength
Number of protein–protein associations
Subcellular localization: cytoplasm
Subcellular localization: cytoplasmic membrane
Subcellular localization: periplasm
Subcellular localization: outer membrane
Subcellular localization: extracellular
Subcellular localization: cell wall
Number of transmembrane helices
Length of protein in amino acids
Replication strand bias
Results and discussion
Genomic feature correlates of protein ER
Bacterial species investigated in this study
Codon usage separation (CUS)
Generation times 
Bacillus subtilis 168
Bacteroides thetaiotaomicron VPI-5482
Caulobacter crescentus NA1000
Escherichia coli K-12
Francisella novicida U112
Haemophilus influenzae Rd KW20
Helicobacter pylori 26695
Mycoplasma genitalium G37
Mycoplasma pulmonis UAB CTIP
Mycobacterium tuberculosis H37Rv
Porphyromonas gingivalis ATCC 33277
Pseudomonas aeruginosa UCBPP-PA14
Staphylococcus aureus NCTC 8325
Streptococcus pneumoniae TIGR4
Streptococcus sanguinis SK36
Salmonella typhimurium LT2
Vibrio cholerae O1 biovar El Tor N16961
A single variable linked to transcriptional abundance (CAI, EL and protein abundance) was found to explain most of the observed variation in yeast ERs. CAI and EL are related to transcriptional abundance, while protein abundance reflects the combined consequences of transcription and translation. Recent studies observed that MFS is stronger for more abundant proteins, resulting in stronger evolutionary constraints on more highly expressed proteins [26, 27]. We used three variables (CAI, EL and MFS) to capture the impact of transcriptional abundance on ER.
It has been previously demonstrated that translational selection across species is also strongly affected by genomic GC content. We found that CAI-ER coefficients correlated significantly with GC content (Pearson’s r = -0.473; p = 0.045). CAI-ER correlations are significantly stronger in GC-rich bacteria than in AT-rich bacteria, as translational selection is often absent in AT-rich organisms. It is also known that mRNAs have a stronger secondary structure if they contain more GC-rich codons [32, 33]. Moreover, in GC-rich hosts there is stronger selection for weak folding at the translation-initiation sites of a gene, which improves translation efficiency. These GC-rich organisms preferentially use GC-rich optimal codons. GC-rich genomes therefore show stronger translational selection than AT-rich genomes. Accordingly, we found that transcriptional abundance does not influence ERs to the same degree in all species, because GC content varies across species.
A recent study found that the choice among CAI, microarray-based EL and sequencing-based EL as the measure of transcriptional abundance affected the assessment of the importance of transcriptional abundance to ER. We found that bacteria whose EL-ER correlation was weaker than their CAI-ER correlation showed greater CUS (0.816 vs. 0.220). For species strongly shaped by translational selection, CAI rather than EL likely better explains the variation in ER. Although RNA sequencing (RNA-seq) data could be more accurate than microarray data, little RNA-seq data is currently available for most bacterial species. In this study, we derived EL from RNA-seq data for E. coli, and used microarray data for the other bacterial species. Although the sequencing-based EL-ER correlation is weaker than the CAI-ER correlation in E. coli, it is stronger than the other correlations. As more RNA-seq experiments become available, we believe that the assessment of EL-ER correlations could become more accurate, and the impact of EL on ER could prove stronger in certain bacterial species. To compensate for the inadequacy of any single variable as a representation of expression levels, we used CAI, EL and MFS together to describe the impact of transcriptional abundance on ERs.
Earlier studies proposed that ESS and protein interactions are negatively correlated with coding sequence ERs because of the constraints of important physical functions [3, 6, 11]. We used many types of protein associations (PPA), not only physical protein-protein interactions (PPI), which were extracted directly from the STRING database. As expected, significant correlations between PPA/ESS and ERs were found for almost all the bacteria we investigated. The PPA-ER correlation was even stronger than the CAI-ER/EL-ER correlations in six organisms: Acinetobacter ADP1; Francisella novicida; H. pylori; Mycoplasma genitalium; Mycoplasma pulmonis; and Streptococcus sanguinis. In F. novicida, the ESS-ER correlation was also stronger than the CAI-ER/EL-ER correlations. The function of a gene is indeed an important driving force in bacterial protein evolution.
Variation in subcellular localization
Most cellular activities, including many metabolic pathways and processes, occur within the cytoplasm (SLC). In this study, we observed significant negative correlations between SLC and ER. For example, the correlation coefficient for Caulobacter crescentus was -0.424 (p = 5.18 × 10^-118). The cytoplasmic membrane (SLM) surrounds the cytoplasm of living cells, and positive correlations between SLM and ER were also observed in our study. The cell membrane functions as a selective filter, allowing molecules either to be pumped across the membrane by transmembrane transporters, or to diffuse through protein channels. These transmembrane proteins are usually specific; as a consequence, SLM proteins are fast-evolving and well adapted. We also found, as expected, that TMH correlated positively with bacterial protein ERs. The positive correlations between the other subcellular localizations (SLP, SLO, SLE, and SLW) and protein ERs are relatively weak. Secreted proteins located in SLO/SLE for Proteobacteria and SLW/SLE for Firmicutes were found to evolve rapidly. This could explain why proteins in SLW, SLO and SLE evolve rapidly.
Limitations of aromatic amino acids
To manufacture proteins, microorganisms must synthesize their aromatic amino acids via the shikimate pathway. The limited supply of these amino acids affects the rate at which translation errors can be corrected, as well as the maintenance of translation efficiency and accuracy. Therefore, the use of aromatic amino acids in functional or abundant proteins is disfavored. In this study, we found that slowly evolving proteins tend to avoid aromatic amino acids. In most of the investigated bacteria, the aromaticity score (AS) correlated positively and significantly with ER (Figure 1).
In many bacteria, genes tend to be encoded on the leading strand. The likelihood of a gene being found on the leading strand was weakly, but significantly, associated with ER in most of the studied bacteria. As an example, the RSB of Bacillus subtilis, in whose genome over 70% of genes lie on the leading strand, was significantly and negatively correlated with ER (Pearson’s r = -0.139; p = 7.55 × 10^-11). Transcription and replication occur simultaneously in bacterial cells [36–38]. Replication progresses much faster than transcription, and conflicts inevitably occur between DNA and RNA polymerases when they bind to the same template. Co-directional collisions occur when the leading strand is the template for transcription, whereas head-on collisions take place when the lagging strand is the template. Head-on collisions have particularly deleterious effects, as replication forks may be arrested and transcription slowed. Over the course of evolution, genes are more likely to be retained if they are on the leading strand, which may explain why bacterial genes on the leading strand evolve more slowly than those on the lagging strand.
Multiple factors cooperatively dominate ER
In all the bacterial species we investigated, SLM, TMH and HS were found to cooperatively affect ER (Additional file 1: Figure S1). These three factors group together in principal component plots (Figure 4). Transport across the membrane takes place via helix-dependent protein channels, which are embedded in cell membranes because of their hydrophobic structure. The need to maintain transmembrane protein function may help explain the relationship among SLM, TMH, and HS.
Different forces drive ER in different species
According to principal component regression (PCR) analysis, factors associated with transcriptional abundance (CAI, EL), important functionality (ESS, PPA) and transmembrane protein function (SLM, TMH, and HS) were the main contributors (8%) to protein ER variation in over 50% of the bacterial species we studied. Transcriptional abundance is the dominant factor in yeast, but not in mice or bacteria (this study). The extent to which transcriptional abundance affects ERs correlates with the strength of codon bias. Our PCR analysis indicated that multiple factors contribute to the rate of protein evolution in bacteria. We also found that PPA was a common and important contributor to bacterial protein evolution, with greater effects than CAI/EL. Our results are broadly consistent with those presented by Plotkin and Fraser, in which PPI appears to be responsible for most of the ER variation in yeast. The deleterious effects of protein misinteractions can affect optimal protein concentrations and shape functional interaction networks. Therefore, the need to maintain proper interactions among highly connected proteins constrains their evolution. Although ESS does not contribute strongly to yeast ERs, it is still an important factor in determining bacterial protein evolution. Our findings suggest that different forces drive protein sequence evolution in different species.
We have uncovered new relationships between ERs in bacterial genomes and protein subcellular localization, transmembrane helices, hydropathicity, aromaticity, and replication strand localization. ER had a significant negative correlation with SLC, but a significant positive correlation with SLM. Because of the effects of TMH and HS on SLM, these two variables were also found to correlate positively, although relatively weakly, with bacterial protein ERs. The impact of bacterial SLM/TMH/HS and SLC on ER is independent of functional importance and transcriptional abundance. This is consistent with results from a recent study of mammalian proteins. We also found that proteins that evolved slowly in bacterial genomes tended to avoid aromatic amino acids. Additionally, bacterial genes on the leading strand evolved more slowly than those on the lagging strand. We investigated the independent contributions of biological features to ER, and found that the dominant effect of transcriptional abundance on ER is absent in bacteria. Factors that retain important functionality (ESS, PPA), maintain cell membrane function (SLM, TMH, and HS) and avoid a build-up of toxic molecules caused by misfolding or misinteraction (CAI, EL) all influence the ERs of bacterial proteins. If more RNA-seq data become available in the future, the EL-ER correlation could prove stronger in certain bacterial species than reported here. However, the influences of PPA, ESS, SLM, TMH, and HS on ER are comparable with the impact of transcriptional abundance on ER in most bacteria.
We investigated 18 bacterial species (Table 2) in the current version (7.0) of the Database of Essential Genes (DEG; http://tubic.tju.edu.cn/deg/), which hosts records of available essential genes identified by well-known genome-wide experimental techniques from a range of organisms. In each of these experiments, almost all genes were investigated for their ESS; therefore, the datasets were not biased or partial. Complete coding sequences of these bacteria and their gene ESS annotations were obtained from the GenBank and DEG databases, respectively.
Orthologous gene pairs between each genome pair were identified based on reciprocal best hits using the Blastp program with criteria of E < 10^-5, a minimum of 80% of residues that could be aligned, and 30% identity. Protein sequences encoded by the identified orthologous gene pairs were aligned with ClustalW, and then back-translated into nucleotide sequences based on their original sequences. Numbers of substitutions per non-synonymous site (Ka) were calculated following Yang’s definition using the PAML package with default parameters. We retained all ortholog assignments that coded for more than 30 amino acids and were not acquired by horizontal transfer, as determined by the Horizontal Gene Transfer (HGT-DB; http://genomes.urv.cat/HGT-DB/) and DarkHorse (http://darkhorse.ucsd.edu/) databases. Values for ERs were log-transformed after addition of a small constant (0.001).
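The ortholog filtering and the ER transformation described above can be sketched as follows. This is only an illustration under stated assumptions: the tuple layout of the hits, the function names, the tie-breaking rule, and the use of the natural logarithm are all hypothetical, since the text specifies neither the Blastp output format nor the base of the logarithm.

```python
import math

def best_hits(hits):
    """Return the best subject for each query among hits that pass the filters.

    Each hit is a tuple (query, subject, evalue, identity_pct, aligned_pct).
    This layout is an assumption for illustration, not a Blastp format.
    """
    best = {}
    for query, subject, evalue, identity, aligned in hits:
        # Criteria from the text: E < 1e-5, at least 80% aligned residues,
        # at least 30% identity (treated here as minimum thresholds).
        if evalue >= 1e-5 or aligned < 80.0 or identity < 30.0:
            continue
        # Rank by lowest E-value, then highest identity (assumed tie-break).
        key = (evalue, -identity)
        if query not in best or key < best[query][0]:
            best[query] = (key, subject)
    return {query: subject for query, (_, subject) in best.items()}

def reciprocal_best_hits(hits_ab, hits_ba):
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    a_to_b = best_hits(hits_ab)
    b_to_a = best_hits(hits_ba)
    return sorted((a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a)

def log_er(ka, constant=0.001):
    """Log-transform Ka after adding a small constant (natural log assumed)."""
    return math.log(ka + constant)
```

Adding the constant before taking the logarithm keeps genes with Ka = 0 in the analysis, which a plain logarithm would drop.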
CAI, expression level and mRNA folding strength
Transcriptional abundance was estimated from CAI, expression levels and mRNA folding strength. CAI is a species-dependent measurement of codon bias that has been widely used as an empirical measure of gene expressivity, especially in microbial genomes. With this methodology, dozens of ribosomal protein genes were chosen as a reference set of highly expressed genes for each genome. Our mRNA levels under favorable environmental conditions, derived from RNA-seq data for E. coli and microarray data for other species, were extracted from the Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/) database. Data were obtained for the following bacteria: B. subtilis (GEO Sample Accession Numbers GSM177105–GSM177118); Bacteroides thetaiotaomicron (GSM40897–GSM40906); E. coli (GSM99211–GSM99216); Haemophilus influenzae (GSM114031–GSM114033); H. pylori (GSM623401–GSM623404); Mycobacterium tuberculosis (GSM71958, GSM71988–GSM71990); Porphyromonas gingivalis (GSM590017); Pseudomonas aeruginosa (GSM462061–GSM462064, GSM462352–GSM462355); S. aureus (GSM724739–GSM724741); Streptococcus pneumoniae (GSM673840); Streptococcus sanguinis (GSM908371–GSM908373); and Salmonella typhimurium (GSM874413–GSM874415). Expression level values were scaled using a logarithmic function.
The secondary structures of mRNAs, for a folding temperature under 30°C, were predicted by RNAfold within the ViennaRNA package. Windows comprising 150 nucleotides were slid in 10 nucleotide steps during analysis. For each nucleotide, the probability that it is paired was estimated as the number of sliding windows in which it paired, divided by the number of sliding windows that include the nucleotide. We then used the average pairing probability over an mRNA to estimate its folding strength.
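As an illustration of how a CAI value can be derived from a ribosomal protein reference set, the sketch below follows the general definition of Sharp and Li: each codon receives a weight equal to its count in the reference set divided by the count of the most used synonymous codon, and the CAI of a gene is the geometric mean of the weights of its codons. The function names are hypothetical, the standard genetic code is assumed (Mycoplasma species read TGA as tryptophan, which this sketch ignores), and the value of 0.5 substituted for unobserved codons is a common convention rather than something stated in the text.

```python
import math
from collections import Counter

# Standard genetic code in TCAG order (assumption: translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def codons(seq):
    """Split an in-frame coding sequence into complete codons."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def relative_adaptiveness(reference_seqs):
    """Weight of each codon relative to the most used synonym in the reference set."""
    counts = Counter(c for s in reference_seqs for c in codons(s) if c in CODON_TABLE)
    weights = {}
    # Met, Trp and stop codons carry no synonymous choice and are excluded.
    for aa in set(AMINO) - set("*MW"):
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        top = max(counts[c] for c in family)
        if top == 0:
            continue  # amino acid absent from the reference set
        for c in family:
            # Unobserved codons get a count of 0.5 (assumed convention).
            weights[c] = max(counts[c], 0.5) / top
    return weights

def cai(seq, weights):
    """Geometric mean of codon weights; codons without a weight are skipped."""
    logs = [math.log(weights[c]) for c in codons(seq) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")
```

In practice the reference set would be the ribosomal protein genes of the genome in question, so the weights are species-specific.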
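The sliding-window estimate of folding strength can be sketched as below. The structure predictor is passed in as a function so that the example stays self-contained; the name `fold` and its contract (it returns a dot-bracket string of the same length as its input) are assumptions made for illustration. With the ViennaRNA Python bindings installed, something like `lambda s: RNA.fold(s)[0]` could plausibly be supplied, but that is not shown or tested here, and the folding temperature would need to be set separately.

```python
def window_starts(length, window=150, step=10):
    """Start positions of sliding windows; a short sequence yields one window."""
    if length <= window:
        return [0]
    return list(range(0, length - window + 1, step))

def folding_strength(seq, fold, window=150, step=10):
    """Average per-nucleotide pairing probability across sliding windows.

    fold: callable taking a sequence and returning a dot-bracket string
    of equal length (hypothetical interface for this sketch).
    """
    paired = [0] * len(seq)
    covered = [0] * len(seq)
    for start in window_starts(len(seq), window, step):
        structure = fold(seq[start:start + window])
        for offset, symbol in enumerate(structure):
            covered[start + offset] += 1
            if symbol in "()":
                paired[start + offset] += 1
    # Nucleotides not covered by any window are left out of the average.
    probs = [p / c for p, c in zip(paired, covered) if c]
    return sum(probs) / len(probs) if probs else 0.0
```

A nucleotide near the middle of a long mRNA is covered by many windows, so its pairing probability is averaged over many local folding contexts, while nucleotides near the ends are covered by fewer.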
Number of protein–protein associations
Protein-protein association data were obtained from the STRING database (http://string-db.org/). These association data included physical PPIs and other links such as co-expression data. From the original data, we computed the number of associations for each gene using the default confidence score cutoff of 0.4.
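Counting associations per gene at a confidence cutoff might look like the following sketch. The edge representation (pairs of gene identifiers with a score on a 0 to 1 scale), the function name, and the decision to keep edges whose score equals the cutoff are assumptions for illustration; they do not describe the STRING file format.

```python
from collections import defaultdict

def association_counts(edges, cutoff=0.4):
    """Number of distinct association partners per gene at or above the cutoff.

    edges: iterable of (gene_a, gene_b, score) with score in [0, 1]
    (hypothetical layout for this sketch).
    """
    partners = defaultdict(set)
    for a, b, score in edges:
        if a == b or score < cutoff:
            continue  # skip self-links and low-confidence links
        # Sets make the count independent of duplicated or reversed edges.
        partners[a].add(b)
        partners[b].add(a)
    return {gene: len(found) for gene, found in partners.items()}
```

Genes with no association above the cutoff do not appear in the result and would be assigned a count of 0 when merged with the full gene list.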
Subcellular localization and number of transmembrane helices
We used PSORTb v3.0 (http://www.psort.org/psortb/) to predict the subcellular localization of proteins. Four subcellular localization types can be predicted for Gram-positive bacteria and five types for Gram-negative bacteria. For a given localization type, genes were assigned their PSORTb prediction scores if they belonged to this type, and 0 if they did not. The number of transmembrane helices was predicted from bacterial proteomes using the TMHMM Server v2.0 (http://www.cbs.dtu.dk/services/TMHMM/).
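The scoring rule for localization features can be written compactly, as in this minimal sketch. The keys reuse the abbreviations from the text (SLC, SLM, SLP, SLO, SLE, SLW); the function name and the input form (one predicted type plus its score) are hypothetical and do not mirror the PSORTb output format.

```python
# Localization types per cell-wall class, using the abbreviations from the text.
GRAM_NEGATIVE_TYPES = ("SLC", "SLM", "SLP", "SLO", "SLE")
GRAM_POSITIVE_TYPES = ("SLC", "SLM", "SLW", "SLE")

def localization_features(predicted_type, score, gram_negative=True):
    """One value per localization type: the prediction score for the
    predicted type and 0 for every other type."""
    types = GRAM_NEGATIVE_TYPES if gram_negative else GRAM_POSITIVE_TYPES
    return {t: (score if t == predicted_type else 0.0) for t in types}
```

Each localization type then becomes its own variable that can be correlated with ER separately.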
Protein hydropathicity, aromaticity, and length
We used CodonW (http://codonw.sourceforge.net/) to determine hydropathicity, aromaticity and protein length. The general average hydropathicity score for each gene product was obtained as the arithmetic mean of the hydropathic indices of its amino acids. Aromaticity scores indicate the frequency of aromatic amino acids.
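For readers who want to reproduce these two scores without CodonW, the sketch below computes them directly from a protein sequence. It assumes the Kyte-Doolittle hydropathy scale for the hydropathic indices and counts phenylalanine, tyrosine and tryptophan as the aromatic residues; the function names are hypothetical and the code is an independent illustration, not CodonW itself.

```python
# Kyte-Doolittle hydropathy indices (assumed scale for the hydropathicity score).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AROMATIC = set("FWY")

def _residues(protein):
    """Standard residues only; stops and ambiguity codes are ignored."""
    return [r for r in protein.upper() if r in KYTE_DOOLITTLE]

def hydropathicity(protein):
    """Arithmetic mean of the hydropathic indices of the residues."""
    residues = _residues(protein)
    return sum(KYTE_DOOLITTLE[r] for r in residues) / len(residues)

def aromaticity(protein):
    """Fraction of residues that are aromatic (Phe, Tyr, Trp)."""
    residues = _residues(protein)
    return sum(r in AROMATIC for r in residues) / len(residues)
```

Both functions raise `ZeroDivisionError` on a sequence with no standard residues, which seems preferable to silently returning a score for an empty protein.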
Replication strand bias
Replication origin and terminus positions for each bacterial species were annotated using the DoriC database (http://tubic.tju.edu.cn/doric/index.html). Genes were assigned a value of 1 if they were located on the leading strand, and 0 otherwise.
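One way to turn origin and terminus coordinates into the 0/1 strand variable is sketched below for a single circular chromosome. The function name, the use of the gene midpoint, and the strand labels "+" and "-" are assumptions for illustration. The logic rests on the usual convention that a gene is on the leading strand when it is transcribed in the same direction as the replication fork that passes through it.

```python
def on_leading_strand(start, end, strand, ori, ter, genome_length):
    """Return 1 if the gene is co-oriented with replication, else 0.

    Coordinates lie on a circular chromosome of genome_length bases;
    strand is "+" (increasing coordinates) or "-" (assumed labels).
    """
    midpoint = ((start + end) // 2) % genome_length
    # Distances measured in the direction of increasing coordinates from ori.
    to_gene = (midpoint - ori) % genome_length
    to_ter = (ter - ori) % genome_length
    # Between ori and ter the fork moves toward increasing coordinates,
    # so "+" genes are leading; on the other replichore "-" genes are.
    fork_moves_forward = to_gene < to_ter
    return 1 if (strand == "+") == fork_moves_forward else 0
```

The modulo arithmetic handles origins that are not at coordinate 0, so the same function applies when the origin sits anywhere on the annotated sequence.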
Spearman rank correlation and PCR
Spearman’s rank correlation test was used to investigate the expected direct correlations between each variable and ER. To further determine the independent contribution (R2) of each biological feature to ER, we used principal component regression (PCR).
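The rank correlation itself is straightforward to compute; the sketch below is a dependency-free illustration that assigns average ranks to ties and then takes the Pearson correlation of the ranks. It returns only the coefficient, not a p-value, the function names are hypothetical, and the principal component regression step is not covered. The analyses reported here were run in R, so this code is not the authors' implementation.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, the coefficient is unchanged by monotonic transformations such as the log transformation applied to ER and expression level.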
where , , and S denote the average PC1 and PC2 values of non-ribosomal proteins, and the standard deviation of the principal component value, respectively. A greater CUS indicates a greater difference in codon usage between ribosomal proteins and non-ribosomal proteins. All statistical analyses were conducted and plots generated using the R package (http://www.r-project.org/).
PCR: Principal component regression
RSCU: Relative synonymous codon usage
CUS: Codon usage separation
GEO: Gene Expression Omnibus
DEG: Database of Essential Genes
This work was supported by the Program for New Century Excellent Talents in University (NCET-11-0059), the National Natural Science Foundation of China (Grant 31071109), and the special fund of the China Postdoctoral Science Foundation (Grant 201104687).
1. Hirsh AE, Fraser HB: Protein dispensability and rate of evolution. Nature. 2001, 411 (6841): 1046-1049. 10.1038/35082561.
2. Pal C, Papp B, Hurst LD: Genomic function: rate of evolution and gene dispensability. Nature. 2003, 421 (6922): 496-497. discussion 497–498.
3. Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW: Evolutionary rate in the protein interaction network. Science. 2002, 296 (5568): 750-752. 10.1126/science.1068696.
4. Jordan IK, Rogozin IB, Wolf YI, Koonin EV: Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res. 2002, 12 (6): 962-968.
5. Yang J, Gu Z, Li WH: Rate of protein evolution versus fitness effect of gene deletion. Mol Biol Evol. 2003, 20 (5): 772-774. 10.1093/molbev/msg078.
6. Hahn MW, Kern AD: Comparative genomics of centrality and essentiality in three eukaryotic protein-interaction networks. Mol Biol Evol. 2005, 22 (4): 803-806. 10.1093/molbev/msi072.
7. Drummond DA, Wilke CO: Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell. 2008, 134 (2): 341-352. 10.1016/j.cell.2008.05.042.
8. Liao BY, Weng MP, Zhang J: Impact of extracellularity on the evolutionary rate of mammalian proteins. Genome Biol Evol. 2010, 2: 39-43. 10.1093/gbe/evp058.
9. Chang TY, Liao BY: Flagellated algae protein evolution suggests the prevalence of lineage-specific rules governing evolutionary rates of eukaryotic proteins. Genome Biol Evol. 2013, 5 (5): 913-922. 10.1093/gbe/evt055.
10. Nogueira T, Touchon M, Rocha EP: Rapid evolution of the sequences and gene repertoires of secreted proteins in bacteria. PLoS One. 2012, 7 (11): e49403. 10.1371/journal.pone.0049403.
11. Wilson AC, Carlson SS, White TJ: Biochemical evolution. Annu Rev Biochem. 1977, 46: 573-639. 10.1146/annurev.bi.46.070177.003041.
12. Rocha EP, Danchin A: An analysis of determinants of amino acids substitution rates in bacterial proteins. Mol Biol Evol. 2004, 21 (1): 108-116.
13. Drummond DA, Raval A, Wilke CO: A single determinant dominates the rate of yeast protein evolution. Mol Biol Evol. 2006, 23 (2): 327-337.
14. Pal C, Papp B, Hurst LD: Highly expressed genes in yeast evolve slowly. Genetics. 2001, 158 (2): 927-931.
15. Krylov DM, Wolf YI, Rogozin IB, Koonin EV: Gene loss, protein sequence divergence, gene dispensability, expression level, and interactivity are correlated in eukaryotic evolution. Genome Res. 2003, 13 (10): 2229-2235. 10.1101/gr.1589103.
16. Subramanian S, Kumar S: Gene expression intensity shapes evolutionary rates of the proteins encoded by the vertebrate genome. Genetics. 2004, 168 (1): 373-381. 10.1534/genetics.104.028944.
17. Lemos B, Bettencourt BR, Meiklejohn CD, Hartl DL: Evolution of proteins and gene expression levels are coupled in Drosophila and are independently associated with mRNA abundance, protein length, and number of protein-protein interactions. Mol Biol Evol. 2005, 22 (5): 1345-1354. 10.1093/molbev/msi122.
18. Popescu CE, Borza T, Bielawski JP, Lee RW: Evolutionary rates and expression level in Chlamydomonas. Genetics. 2006, 172 (3): 1567-1576.
19. Ingvarsson PK: Gene expression and protein length influence codon usage and rates of sequence evolution in Populus tremula. Mol Biol Evol. 2007, 24 (3): 836-844.
20. Sharp PM, Li WH: The codon Adaptation Index–a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 1987, 15 (3): 1281-1295. 10.1093/nar/15.3.1281.
21. Wall DP, Hirsh AE, Fraser HB, Kumm J, Giaever G, Eisen MB, Feldman MW: Functional genomic analysis of the rates of protein evolution. Proc Natl Acad Sci U S A. 2005, 102 (15): 5483-5488. 10.1073/pnas.0501761102.
22. Zhang J, He X: Significant impact of protein dispensability on the instantaneous rate of protein evolution. Mol Biol Evol. 2005, 22 (4): 1147-1155. 10.1093/molbev/msi101.
23. Yang JR, Liao BY, Zhuang SM, Zhang J: Protein misinteraction avoidance causes highly expressed proteins to evolve slowly. Proc Natl Acad Sci U S A. 2012, 109 (14): E831-E840. 10.1073/pnas.1117408109.
24. Plotkin JB, Fraser HB: Assessing the determinants of evolutionary rates in the presence of noise. Mol Biol Evol. 2007, 24 (5): 1113-1121. 10.1093/molbev/msm044.
25. Wolf MY, Wolf YI, Koonin EV: Comparable contributions of structural-functional constraints and expression level to the rate of protein sequence evolution. Biol Direct. 2008, 3: 40. 10.1186/1745-6150-3-40.
26. Park C, Chen X, Yang JR, Zhang J: Differential requirements for mRNA folding partially explain why highly expressed proteins evolve slowly. Proc Natl Acad Sci U S A. 2013, 110 (8): E678-E686. 10.1073/pnas.1218066110.
27. Zur H, Tuller T: Strong association between mRNA folding strength and protein abundance in S. cerevisiae. EMBO Rep. 2012, 13 (3): 272-277. 10.1038/embor.2011.262.
28. Lafay B, Atherton JC, Sharp PM: Absence of translationally selected synonymous codon usage bias in Helicobacter pylori. Microbiology. 2000, 146 (Pt 4): 851-860.
29. Vieira-Silva S, Rocha EP: The systemic imprint of growth and its uses in ecological (meta) genomics. PLoS Genet. 2010, 6 (1): e1000808. 10.1371/journal.pgen.1000808.
30. Plotkin JB, Kudla G: Synonymous but not the same: the causes and consequences of codon bias. Nat Rev Genet. 2011, 12 (1): 32-42. 10.1038/nrg2899.
31. Naya H, Romero H, Carels N, Zavala A, Musto H: Translational selection shapes codon usage in the GC-rich genome of Chlamydomonas reinhardtii. FEBS Lett. 2001, 501 (2–3): 127-130.
32. Voges D, Watzele M, Nemetz C, Wizemann S, Buchberger B: Analyzing and enhancing mRNA translational efficiency in an Escherichia coli in vitro expression system. Biochem Biophys Res Commun. 2004, 318 (2): 601-614. 10.1016/j.bbrc.2004.04.064.
33. Kudla G, Murray AW, Tollervey D, Plotkin JB: Coding-sequence determinants of gene expression in Escherichia coli. Science. 2009, 324 (5924): 255-258. 10.1126/science.1170160.
34. Gu W, Zhou T, Wilke CO: A universal trend of reduced mRNA stability near the translation-initiation site in prokaryotes and eukaryotes. PLoS Comput Biol. 2010, 6 (2): e1000664. 10.1371/journal.pcbi.1000664.
35. Hershberg R, Petrov DA: General rules for optimal codon choice. PLoS Genet. 2009, 5 (7): e1000556. 10.1371/journal.pgen.1000556.
36. Mirkin EV, Mirkin SM: Mechanisms of transcription-replication collisions in bacteria. Mol Cell Biol. 2005, 25 (3): 888-895. 10.1128/MCB.25.3.888-895.2005.
37. Pomerantz RT, O’Donnell M: What happens when replication and transcription complexes collide?. Cell Cycle. 2010, 9 (13): 2537-2543.
38. Kim N, Jinks-Robertson S: Transcription as a source of genome instability. Nat Rev Genet. 2012, 13 (3): 204-214.
39. Theis FJ, Latif N, Wong P, Frishman D: Complex principal component and correlation structure of 16 yeast genomic variables. Mol Biol Evol. 2011, 28 (9): 2501-2512. 10.1093/molbev/msr077.
40. Liao BY, Scott NM, Zhang J: Impacts of gene essentiality, expression pattern, and gene compactness on the evolutionary rate of mammalian proteins. Mol Biol Evol. 2006, 23 (11): 2072-2080. 10.1093/molbev/msl076.
41. Zhang J, Maslov S, Shakhnovich EI: Constraints imposed by non-functional protein-protein interactions on gene expression and proteome size. Mol Syst Biol. 2008, 4: 210.
42. Zhang R, Lin Y: DEG 5.0, a database of essential genes in both prokaryotes and eukaryotes. Nucleic Acids Res. 2009, 37 (Database issue): D455-D458.
43. Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22 (22): 4673-4680. 10.1093/nar/22.22.4673.
44. Yang Z: PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007, 24 (8): 1586-1591. 10.1093/molbev/msm088.
45. Garcia-Vallve S, Guzman E, Montero MA, Romeu A: HGT-DB: a database of putative horizontally transferred genes in prokaryotic complete genomes. Nucleic Acids Res. 2003, 31 (1): 187-189. 10.1093/nar/gkg004.
46. Podell S, Gaasterland T: DarkHorse: a method for genome-wide prediction of horizontal gene transfer. Genome Biol. 2007, 8 (2): R16. 10.1186/gb-2007-8-2-r16.
47. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Holko M, et al: NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res. 2013, 41 (D1): D991-D995. 10.1093/nar/gks1193.
48. Lorenz R, Bernhart SH, Honer Zu Siederdissen C, Tafer H, Flamm C, Stadler PF, Hofacker IL: ViennaRNA Package 2.0. Algorithms Mol Biol. 2011, 6: 26. 10.1186/1748-7188-6-26.
49. Szklarczyk D, Franceschini A, Kuhn M, Simonovic M, Roth A, Minguez P, Doerks T, Stark M, Muller J, Bork P, et al: The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res. 2011, 39 (Database issue): D561-D568.
50. Yu NY, Wagner JR, Laird MR, Melli G, Rey S, Lo R, Dao P, Sahinalp SC, Ester M, Foster LJ, et al: PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes. Bioinformatics. 2010, 26 (13): 1608-1615. 10.1093/bioinformatics/btq249.
51. Gao F, Luo H, Zhang CT: DoriC 5.0: an updated database of oriC regions in both bacterial and archaeal genomes. Nucleic Acids Res. 2013, 41 (Database issue): D90-D93.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.