Synonymous site conservation in the HIV-1 genome
© Mayrose et al.; licensee BioMed Central Ltd. 2013
Received: 28 May 2013
Accepted: 25 July 2013
Published: 4 August 2013
Synonymous or silent mutations are usually thought to evolve neutrally. However, accumulating recent evidence has demonstrated that silent mutations may destabilize RNA structures or disrupt cis regulatory motifs superimposed on coding sequences. Such observations suggest the existence of stretches of codon sites that are evolutionary conserved at both DNA-RNA and protein levels. Such stretches may point to functionally important regions within protein coding sequences not necessarily reflecting functional constraints on the amino-acid sequence. The HIV-1 genome is highly compact, and often harbors overlapping functional elements at the protein, RNA, and DNA levels. This superimposition of functions leads to complex selective forces acting on all levels of the genome and proteome. Considering the constraints on HIV-1 to maintain such a highly compact genome, we hypothesized that stretches of synonymous conservation would be common within its genome.
We used a combined computational-experimental approach to detect and characterize regions exhibiting strong purifying selection against synonymous substitutions along the HIV-1 genome. Our methodology is based on advanced probabilistic evolutionary models that explicitly account for synonymous rate variation among sites and rate dependencies among adjacent sites. These models are combined with a randomization procedure to automatically identify the most statistically significant regions of conserved synonymous sites along the genome. Using this procedure we identified 21 conserved regions. Twelve of these are mapped to regions within overlapping genes, seven correlate with known functional elements, while the functions of the remaining four are yet unknown. Among these four regions, we chose the one that deviates most from synonymous rate homogeneity for in-depth computational and experimental characterization. In our assays aiming to quantify viral fitness in both early and late stages of the replication cycle, no differences were observed between the mutated and the wild type virus following the introduction of synonymous mutations.
The contradiction between the inferred purifying selective forces and the lack of effect of these mutations on viral replication may be explained by the fact that the phenotype was measured in single-cycle infection assays in cell culture. Such a system does not account for the complexity of HIV-1 infections in vivo, which involves multiple infection cycles and interaction with the host immune system.
Molecular evolutionary models are often applied to detect and quantify the type and the intensity of selective pressures acting on genes and on specific sites within them. Sites playing an essential role, such as those composing the active site of an enzyme, are unlikely to undergo amino-acid replacements over evolutionary time. This constraint reflects a purifying selection force, which prevents the fixation of mutations that otherwise would disturb protein structure or function [1, 2]. The higher the intensity of purifying selection acting on a specific nucleotide or amino-acid site, the more conserved the site is expected to be. Hence, under the neutral theory of molecular evolution, conserved sites are assumed to indicate functional or structural importance, while variable sites may be regarded as evolving free from, or under weak, selective constraints. Some sites, however, are very fast evolving, yet are of great functional significance. Such fast evolution may reflect positive Darwinian selection. For example, the Influenza virus Hemagglutinin (HA) gene encodes a major surface antigen, targeted by neutralizing antibodies during infection. Sequence analysis of this gene suggested a few sites to be positively selected , and these were shown to be responsible for evading the host immune system. Taken together, sites that are under strong purifying or positive selection are likely to be functionally or structurally important and their identification is often a critical first step towards understanding and manipulating a protein’s function [Reviewed in [4, 5].
In protein-coding genes, the selection type and intensity are usually inferred by contrasting the rate of nonsynonymous (amino-acid altering; Ka) to the rate of synonymous (silent; Ks) nucleotide substitutions . The most currently-used models assume that the selective forces acting on protein-coding genes operate at the protein level only, while synonymous substitutions are free from selection and reflect the neutral rate of evolution . In such a case, Ks is constant across codon positions, Ka is heterogeneous, and the inference of site-specific Ka/Ks values is based solely on Ka variation. Codon sites showing a Ka/Ks ratio significantly lower than 1 are regarded as undergoing purifying selection, suggesting that they are functionally or structurally important. Codon sites with Ka/Ks ratios significantly higher than 1 are indicative of positive Darwinian selection, suggesting adaptive evolution. Finally, sites inferred to evolve under a Ka/Ks ratio not significantly different from 1 are regarded as free from selection at the protein level . A large number of studies have used this methodology to infer adaptive evolution in Human Immunodeficiency Virus type 1 (HIV-1), resulting from immune pressure or drug treatment [e.g., [7–9].
The assumption that Ks reflects the neutral rate of evolution suggests that synonymous substitutions should be fixed at a constant rate along the sequence. However, it has been shown that selection regularly operates on synonymous sites in different organisms [e.g., [10, 11]. For example, purifying selection on synonymous sites was found in 9.4% of all yeast genes and 5.1% in all worm genes using a likelihood based model that distinguishes between synonymous substitutions between preferred and unpreferred codons . Purifying selection on synonymous sites was suggested to result from considerations such as mRNA stability [13–17], splicing regulatory elements [e.g., , cis regulatory elements, and overlapping genes [11, 19, 20]. Furthermore, in bacteria and yeast, codon bias in highly expressed genes is the most documented type of selection on synonymous sites [e.g., . Additionally, the choice among synonymous codons was shown to affect protein structure . In addition, conservation of “optimal” and “non-optimal” codons was shown to reflect selection for cotranslational folding in yeast. Thus, it is becoming recognized that the evolutionary selective forces acting on synonymous sites reflect various biological phenomena [recently reviewed in .
The “Ks equal neutrality” assumption is particularly problematic in viruses since, due to their compact genome, regulatory elements are abundant within their protein coding regions. For example, there are well known RNA secondary structures along the HIV-1 genome, such as the Rev-Response Element (RRE), which plays important functional roles during the virus life cycle . Another obvious example is the existence of overlapping genes, in which strong selection against both synonymous and nonsynonymous substitutions is expected. However, since the commonly used codon models fail to account for Ks variation, such regions are usually excluded from the analysis, despite their high biological interest. For example, to avoid overlapping regions, Yang et al.  excluded the entire tat, rev, and vpu genes and analyzed the evolutionary selective forces in the rest of the HIV-1 genome. Thus, ignoring Ks variability across sites is biologically unrealistic, problematic and may bias inference of selective forces across protein coding genes . Furthermore, the ability to correctly model Ks variability across sites may reveal novel insights into the functional constraints acting on the DNA and RNA levels, which cannot be inferred using standard codon models.
Ngandu et al.  have previously applied a sliding-window approach to detect consecutive codons with low synonymous substitution rates within the HIV-1 genome. One of the regions discovered in that study was a 12-base-pair long region that resides within env for which no functional role had been previously described. Mutagenesis experiments in which synonymous mutations were introduced in this region, led to no observed difference between wild type and mutated clones in cell lines. Kosokovski-Pond et al.  have also applied models allowing for Ks variation to HIV-1 and other RNA viruses. However, their goal was to find shared evolutionary rate patterns across alignments rather than to detect conserved Ks regions.
In this study we constructed a genome-wide map of the selective forces operating against the fixation of synonymous substitutions across the HIV-1 genome. Our method relies on a model that explicitly accounts for dependency of synonymous rates among adjacent sites, thus removing stochastic noise from site specific Ks estimates . Specifically, we developed the StrechFinder methodology that detects statistically significant consecutive synonymous-conserved sites. This methodology overcomes the inherent statistical problematic aspects of sliding window approaches for detecting evolutionary rate variation among sites . We detected twenty-one short linear stretches within the genome, which display significantly lower levels of Ks than expected under neutrality. The forces leading to lowered Ks in these regions were explored. The majority of the identified regions were mapped to previously known regulatory functional regions, or to overlapping open reading frames (ORFs), validating our methodology as a general tool for predicting functional regions at the nucleotide level nested within protein-coding regions. Furthermore, we detected a number of Ks conserved regions that could not be assigned to a known function, and are thus predicted to be under selection due to a yet unknown reason. One such region is located within the pol gene. Mutagenesis experiments of this region were conducted to reveal its function but showed no clear effect on viral replication in cell culture. The implications of these results are discussed.
Results and discussion
Selection intensity on synonymous sites varies within and across HIV ORFs
Maximum log-likelihood (LL) values for the nine HIV-1 ORFs under the KaD-KsD, KaV-KsV, and KaV-KsC models
Identification of Ks-conserved stretches across the HIV-1 genome
Summary of stretches with significantly low Ks, ranked according to StretchFinder
Pos. in ORF
Contains the cPPT, the CTS and the DNA flap regions. The first two regions are involved in regulation of reverse transcription of the genome, and the third in translocation of the genome to the nucleus.
Overlaps Vpr 1–14. In this region of Vpr there is a conserved homo-ologimerization domain.
Overlaps Rev 18–52 (as well as Env 719–744), which includes the conserved homo-multimerization region and RNA-binding domain.
Overlaps Vif 1-18
Contains the PPT (a primer for reverse transcription)
Sites 47–52 overlap Rev 1–6.
Unexplained low Ks in sites 41-46
Equivalent to region (3) (overlap of functional elements in Rev)
Overlaps Gag 442–448 and Gag 454-463
Overlaps Protease 1-9
Equivalent to region (4) (overlaps Pol 986–1003)
Overlaps Tat 49–56: conserved nuclear localization signal and RNA binding domain
Overlaps Vpu 56-60
Unexplained low Ks
Four RNA loops (SL1-4) of the HIV-1 packaging signal. Gag 2–5 overlaps the fourth loop SL4.
Overlaps Env 240-257
Pol 939–947 forms the CTS end.
Unexplained low Ks at sites 948-954
Unexplained low Ks
Seven of the conserved stretches were mapped to regions previously reported to have a functional role. The most statistically significant Ks conserved stretch along the HIV-1 genome region is located inside the pol ORF and spans positions 898 to 937. This region spans a few functional nucleic acid elements: the central polypurine tract (cPPT) , the central termination sequence (CTS), and the DNA flap region . This stretch resides near a second significant stretch within pol, which spans positions 939 to 954 (Stretch 19 in Table 2). The conservation in this stretch may be partially explained by the presence of the end of the CTS within it. Indeed, this region within pol was previously identified using a manual scan for Ks conserved regions . The second polypurine tract (PPT) in HIV-1 is located inside the nef gene and is also part of a stretch with significantly low Ks values (Stretch 5 in Table 2). A third known functional element that was detected as a significantly conserved stretch is SL4 [32, 33], which serves as part of the RNA packaging signal and is located in gag (Stretch 15 in Table 2). Three out of five conserved stretches found in env (Stretches 12, 16, and 20) are mapped to the RRE region [34, 35]. Additional twelve significantly conserved stretches are mapped to regions of overlapping genes. Indeed, in almost all of these cases the overlapped region was frame-shifted from the first stretch, and the low Ks values could be explained by selective constraints on the overlapped gene, i.e. by low Ka values in the other gene.
We next tested whether secondary structure motifs within this stretch may explain the apparent Ks conservation. To determine the possible functional role of this region, an RNA secondary structure prediction was performed, and was compared to the experimental values obtained for the whole HIV-1 genome . The predicted RNA secondary structure (Figure 2C) suggested that the pol 82–90 region is mostly unstructured, apart from a short stem and loop at the beginning of the stretch. This prediction matches the secondary structure model based on a high-throughput RNA analysis performed on the entire RNA genome of HIV-1 .
Taken together, these analyses reinforce the hypothesis that the low Ks of the pol 82–90 stretch is not related to its participation in RNA secondary structures and is most likely also unrelated to translational efficiency; thus, it may have a regulatory role, such as binding an unknown factor.
Disruption of early stages should result in decreased levels of integrated HIV-1 proviruses. To detect such a reduction, we measured the overall expression of genes (Gag and GFP) transcribed from the HIV-1 provirus. Searching for a possible effect of the mutations on late events, we quantified virion release from the cells by measuring capsid levels in the context of assembled virions. In addition, the infectivity of produced virions was monitored using single cycle infection assays.
The above results cannot rule out possible effects of the mutations on viral infectivity. To further investigate this option, naïve HEK293T cells were infected with equal amounts of wild type and mutant virions, normalized by reverse transcriptase activity [as in . To measure infection levels, infected cells were tested for either intracellular Gag or GFP levels, quantified by Western blot or flow cytometry analyses, respectively. Both Gag (Figure 4C) and GFP levels (not shown) were not significantly different among the wild type and mutated clones, suggesting that the mutations did not hamper infectivity. Finally, we tested whether HIV-1 particle production originating from infected cells (rather than transfected cells) varied between the tested clones, and here too, no significant difference was observed (Figure 4D). Overall, these results indicate that in the current experimental settings, the functional role of this conserved stretch has no (or only minimal) effect on the HIV-1 replication cycle. Possible reasons for the disagreement between the strong evolutionary conservation signal and the experimental results are discussed below.
This study provides and explores a comprehensive map of the evolutionary selective forces operating on HIV-1. Previous studies inferring natural selection in HIV did not account for selection on synonymous substitutions, which may have possibly inflated estimates of positive selection . Specifically, overlapping regions of genes are often considered problematic and are removed from analyses . Here we show that complex inter-related selective forces may operate on all protein coding genes within HIV-1.
This analysis underscores the association between synonymously conserved regions and functional regions at the DNA or RNA levels. HIV-1 represents one of the most studied biological systems and thus provides us with the opportunity to correlate between the inferred Ks conserved stretches and known functional regions residing within coding regions. It is noteworthy that nearly all well studied regulatory regions residing within the ORFs of HIV-1 were also detected using our methodology. Nevertheless, four synonymously conserved regions were detected with no previously reported function. An important aspect of this work was the effort to experimentally elucidate the significance of the computational predictions derived from the evolutionary models. However, the mutagenesis assay performed here on one of the stretches (pol 82–90) resulted in insignificant differences between the wild type and mutated constructs. Insignificant differences between wild type and mutated clones were also obtained by Ngandu et al. , who experimentally tested the role for a synonymous conserved region within env. Notably, these experiments were performed in cell cultures, which do not necessarily mimic natural infections of CD4+ cells. For instance, interactions with salient host proteins (such as the interaction between Tetherin and Vpu) are often dependent on the specific cell-line used . In addition, it is possible that these synonymous mutations reduce viral fitness, but this reduction is not significant enough to be revealed after only a single infection cycle. Finally, since this stretch may have an important role only in the context of the virus interaction with the host adaptive immune response, it was not discovered in the artificial cell culture experiments conducted here. Hence, future work is needed to determine the role of the novel conserved stretches detected here. Nevertheless, the methodology presented here is generic, and can be used to investigate other less-studied viral datasets to predict novel functional regions, as well as to explore the breadth of synonymous conservation in both cells and viruses. Moreover, the identification of conserved sites in viral populations should enhance PCR diagnostics of a variety of viral strains that otherwise significantly differ from each other, like in the case of HIV-1.
We have previously developed an evolutionary model that accounts for site variability and dependency among sites of both synonymous and nonsynonymous rates . Using this model, Bayesian posterior estimates of both the Ka and Ks rates at each site can be computed. Briefly, this model specifies two rate distributions: the first accounts for the variability among synonymous sites and the second for the variability among nonsynonymous sites. In addition, the model explicitly accounts for dependencies between adjacent codon sites by incorporating two hidden Markov models (HMMs): the first representing dependencies between adjacent Ka rates and the second between adjacent Ks rates. Hereafter, this model is termed KaD-KsD. It was compared to two alternative models; the KaV-KsV model that allows for variation at both the Ka and Ks levels, yet assumes that adjacent sites evolve independently of each other, and the KaV-KsC model that assumes that Ks is constant across sites, Ka is variable across sites, and Ka values at adjacent sites are independent of each other. Since these are nested models, a likelihood ratio test (LRT) can be used to assess which model best fits the data. Specifically, differences in maximal log-likelihood (LL) values are compared to their expected values based on a chi-square distribution with 3 or 2 degrees of freedom when the KaD-KsD model is compared against the KaV-KsC or KaV-KsV models, respectively. If the best-fit model is either KaD-KsD or KaV-KsV, separate Ka and Ks values for each codon site can be estimated by calculating the expectation of the posterior rate distribution at each site. If the best-fit model is KaV-KsC, only Ka values for each site can be estimated.
Identification of Ks-conserved stretches
Given the Ks estimates for each site, we aimed to obtain the set of linear stretches with significantly low Ks values. We thus developed the StretchFinder algorithm that tests for significant clustering of consecutive conserved codon sites. StretchFinder allows distinguishing between a region with a few randomly dispersed Ks-conserved sites and an extended region with low Ks values, which may be indicative of a functionally important region. Specifically, for a given sub-region of length l, the algorithm computes the average Ks value. Next, a null distribution of average Ks values is computed assuming no clustering of Ks conserved sites. Specifically, the null distribution is computed by shuffling the site specific Ks score across the entire alignment. The shuffling procedure is repeated m times and for each such shuffle, the average Ks score of a stretch of length l is computed. The set of average (shuffled) Ks values provides the null distribution. These null distributions were used to fit the parameters of an extreme value distribution (EVD). Finally a p-value for the score of a certain size was obtained from the corresponding EVD. In order to correct for multiple testing, the False Discovery Rate (FDR) method (Benjamini and Hochberg, 1995) was applied to correct each p-value. In this study we considered l =2 to 40 codons, and m =106.
HIV-1 accession numbers were retrieved from the whole genome DNA multiple sequence alignment (MSA) available at Los Alamos HIV sequence database (http://www.hiv.lanl.gov/). Sequences were additionally filtered if: (1) not all nine ORFs were present in the corresponding GenBank file, (2) one of the coding sequences (CDS) was not divisible by three, (3) a stop codon was found in the middle of the CDS. This procedure resulted in 301 fully-sequenced genomes. The HXB2 sequence (accession number AF033819) was added to serve as a reference sequence in further analyses. Since the original Los Alamos MSA does not always maintain the reading frame, codon-based MSAs were created for each gene based on the translated protein MSAs created using the MUSCLE program . In order to eliminate the number of gaps in the alignment, sequences that opened insertion positions shared by less than 2% of the sequences were removed, resulting in 122 HIV-1 aligned sequences. A whole proteome MSA was then generated by concatenating the nine MSAs generated for each protein separately. Maximum likelihood (ML) phylogeny reconstruction was performed based on the whole proteome alignment with the PhyML program , using among-site rate variation of four discrete rate categories, and the rtREV amino acid substitution matrix. Given the phylogenetic tree, the parameters of the three evolutionary codon models (KaV-KsC, KaV-KsV, and KaD-KsD) were optimized by ML for the whole-genome MSA as well as for each of the nine ORFs separately. Ka and Ks rates for each site were inferred over the whole proteome MSA, allowing comparison of the different values.
The codon adaptation index (CAI) of a codon position in a sequence is defined as the ratio of the usage of the codon, to the usage of the most abundant codon for the same amino acid . The usage of a codon is defined as the relative frequency of the codon in a dataset of highly expressed genes. The CAI of a sequence is defined as the geometrical mean of the values across the sequence. Here, since a multiple sequence alignment rather than a single sequence exists, the arithmetic mean (ω) of each codon position was calculated. The usage values of codons were taken from Haas et al.  based on a dataset of highly expressed human genes. To determine whether a stretch had deviant codon bias, CAI values were calculated for sliding windows of size 10 across the sequence, and plotted along the sequence. The stretch was manually inspected to see whether it was part of a peak or a depression as compared to the rest of the sequence.
RNA secondary structure prediction
RNA secondary structure predictions were performed with the RNAalifold software . The underlying method in this program takes into account both thermodynamic stability and sequence covariation, i.e., a stem structure is inferred only if supported by compensatory mutations along the alignment. The predictions were performed on regions spanning 30 nucleotides upstream and downstream of each stretch.
We cloned an internal GFP expression cassette into the Nef ORF of the NLR+HSA construct, which is a derivative of the HIV-AP clone , to create the NLR+GFP plasmid (Figure 3). The NLR+GFP vector contains the entire HIV-1 genome sequence and expresses all viral proteins except Env and Nef. This construct allows us to compare infectivity (provided that envelope proteins were supplemented) of the NLR+GFP clone versus a clone (NLR+GFPpolmut), in which twelve synonymous mutations were introduced into this stretch. Notably, these 12 silent point mutations in the 33 base long sequence were chosen to maximize the number of synonymous changes without altering the protein’s amino-acid sequence. When applicable, double synonymous mutations per codon were introduced (e.g., TTA to CTC); otherwise, transversions were preferred over transitions, and the resulting codon was chosen as the one with the lowest codon usage in HIV, according to . In addition, the introduction of this vector into the host cells resulted in the expression of a cloned GFP, thus allowing easy monitoring of the viral RNA expression by fluorescent microscopy or flow cytometry analysis. HEK293T cells (approximately 80% confluency in 60 mm plates) were transfected with a mixture of plasmids consisting of 10 μg of NLR+GFP (or NLR+GFPpolmut) clone and 2.5 μg of pMD.G (expressing the vesicular stomatitis virus envelope glycoprotein) by the calcium phosphate method. Two days post transfection, virions were purified from supernatants (3 ml) of transfected cultures by ultracentrifugation through a 25% sucrose cushion and the transfected cells were harvested and extracted. Gag and Capsid proteins were detected in extracts of transfected or infected (two days post infection) cells by immunoblotting using the monoclonal antibody against HIV-1 capsid protein (from hybridoma clone 183-H12-5C; NIH AIDS Research and Reference Program). Exogenous reverse transcriptase (RT) assay was performed to quantify HIV particle levels in supernatants of transfected cells. These supernatants, normalized to have equal amounts of virions, were diluted, supplemented with 8 μg/ml polybrene and used to infect naive HEK293T cells. Two days post infection, a portion of infected cells was extracted for immunoblotting and a portion was analyzed for GFP fluorescence, using flow cytometry. All the above methods were conducted as described previously [38, 47, 48].
This study is supported by a Saia Foundation Grant (I.M and T.P), by a Marie Curie Reintegration Grant (I.M) and by Israel Science Foundation grants (878/09 to T.P and 1265/12 to I.M).
- Kimura M: The neutral theory of molecular evolution. 1983, Cambridge, U.K: Cambridge University PressView ArticleGoogle Scholar
- Kimura M: The neutral theory of molecular evolution. Evolution of Genes and Proteins. Edited by: Nei M, Koehn R. 1983, Sunderland, Mass: Sinauer Associates, 208-233.Google Scholar
- Yang Z, Nielsen R, Goldman N, Pedersen AM: Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics. 2000, 155 (1): 431-449.PubMed CentralPubMedGoogle Scholar
- Lichtarge O, Sowa ME, Philippi A: Evolutionary traces of functional surfaces along G protein signaling pathway. Methods Enzymol. 2002, 344: 536-556.PubMedView ArticleGoogle Scholar
- Yang Z: The power of phylogenetic comparison in revealing protein function. Proc Natl Acad Sci USA. 2005, 102 (9): 3179-3180. 10.1073/pnas.0500371102.PubMed CentralPubMedView ArticleGoogle Scholar
- Hurst LD: The Ka/Ks ratio: diagnosing the form of sequence evolution. Trends Genet. 2002, 18 (9): 486-10.1016/S0168-9525(02)02722-1.PubMedView ArticleGoogle Scholar
- Chen L, Perlina A, Lee CJ: Positive selection detection in 40,000 human immunodeficiency virus (HIV) type 1 sequences automatically identifies drug resistance and positive fitness mutations in HIV protease and reverse transcriptase. J Virol. 2004, 78 (7): 3722-3732. 10.1128/JVI.78.7.3722-3732.2004.PubMed CentralPubMedView ArticleGoogle Scholar
- Doron-Faigenboim A, Stern A, Mayrose I, Bacharach E, Pupko T: Selecton: a server for detecting evolutionary forces at a single amino-acid site. Bioinformatics. 2005, 21 (9): 2101-2103. 10.1093/bioinformatics/bti259.PubMedView ArticleGoogle Scholar
- Yang W, Bielawski JP, Yang Z: Widespread adaptive evolution in the human immunodeficiency virus type 1 genome. J Mol Evol. 2003, 57 (2): 212-221. 10.1007/s00239-003-2467-9.PubMedView ArticleGoogle Scholar
- Hellmann I, Zollner S, Enard W, Ebersberger I, Nickel B, Paabo S: Selection on human genes as revealed by comparisons to chimpanzee cDNA. Genome Res. 2003, 13 (5): 831-837. 10.1101/gr.944903.PubMed CentralPubMedView ArticleGoogle Scholar
- Chamary JV, Parmley JL, Hurst LD: Hearing silence: non-neutral evolution at synonymous sites in mammals. Nat Rev Genet. 2006, 7 (2): 98-108. 10.1038/nrg1770.PubMedView ArticleGoogle Scholar
- Zhou T, Gu W, Wilke CO: Detecting positive and purifying selection at synonymous sites in yeast and worm. Mol Biol Evol. 2010, 27 (8): 1912-1922. 10.1093/molbev/msq077.PubMed CentralPubMedView ArticleGoogle Scholar
- Gu W, Zhou T, Wilke CO: A universal trend of reduced mRNA stability near the translation-initiation site in prokaryotes and eukaryotes. PLoS Comput Biol. 2010, 6 (2): e1000664-10.1371/journal.pcbi.1000664.PubMed CentralPubMedView ArticleGoogle Scholar
- Nackley AG, Shabalina SA, Tchivileva IE, Satterfield K, Korchynskyi O, Makarov SS, Maixner W, Diatchenko L: Human catechol-O-methyltransferase haplotypes modulate protein expression by altering mRNA secondary structure. Science. 2006, 314 (5807): 1930-1933. 10.1126/science.1131262.PubMedView ArticleGoogle Scholar
- Zhou T, Wilke CO: Reduced stability of mRNA secondary structure near the translation-initiation site in dsDNA viruses. BMC Evol Biol. 2011, 11: 59-10.1186/1471-2148-11-59.PubMed CentralPubMedView ArticleGoogle Scholar
- Kudla G, Murray AW, Tollervey D, Plotkin JB: Coding-sequence determinants of gene expression in Escherichia coli. Science. 2009, 324 (5924): 255-258. 10.1126/science.1170160.PubMed CentralPubMedView ArticleGoogle Scholar
- Tuller T, Waldman YY, Kupiec M, Ruppin E: Translation efficiency is determined by both codon bias and folding energy. Proc Natl Acad Sci USA. 2010, 107 (8): 3645-3650. 10.1073/pnas.0909910107.PubMed CentralPubMedView ArticleGoogle Scholar
- Goren A, Ram O, Amit M, Keren H, Lev-Maor G, Vig I, Pupko T, Ast G: Comparative analysis identifies exonic splicing regulatory sequences–The complex definition of enhancers and silencers. Mol Cell. 2006, 22 (6): 769-781. 10.1016/j.molcel.2006.05.008.PubMedView ArticleGoogle Scholar
- Miyata T, Yasunaga T: Evolution of overlapping genes. Nature. 1978, 272 (5653): 532-535. 10.1038/272532a0.PubMedView ArticleGoogle Scholar
- Rogozin IB, Spiridonov AN, Sorokin AV, Wolf YI, Jordan IK, Tatusov RL, Koonin EV: Purifying and directional selection in overlapping prokaryotic genes. Trends Genet. 2002, 18 (5): 228-232. 10.1016/S0168-9525(02)02649-5.PubMedView ArticleGoogle Scholar
- Ikemura T: Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol. 1985, 2 (1): 13-34.PubMedGoogle Scholar
- Kimchi-Sarfaty C, Oh JM, Kim IW, Sauna ZE, Calcagno AM, Ambudkar SV, Gottesman MM: A "silent" polymorphism in the MDR1 gene changes substrate specificity. Science. 2007, 315 (5811): 525-528. 10.1126/science.1135308.PubMedView ArticleGoogle Scholar
- Rubinstein ND, Pupko T: Detection and analysis of conservation at synonymous sites. Codon Evolution: Mechanisms and Models. Edited by: Cannarozzi GM, Schneider A. 2012, New York: Oxford University Press, 218-228.View ArticleGoogle Scholar
- Malim MH, Tiley LS, McCarn DF, Rusche JR, Hauber J, Cullen BR: HIV-1 structural gene expression requires binding of the Rev trans-activator to its RNA target sequence. Cell. 1990, 60 (4): 675-683. 10.1016/0092-8674(90)90670-A.PubMedView ArticleGoogle Scholar
- Rubinstein ND, Doron-Faigenboim A, Mayrose I, Pupko T: Evolutionary models accounting for layers of selection in protein-coding genes and their impact on the inference of positive selection. Mol Biol Evol. 2011, 28 (12): 3297-3308. 10.1093/molbev/msr162.PubMedView ArticleGoogle Scholar
- Ngandu NK, Scheffler K, Moore P, Woodman Z, Martin D, Seoighe C: Extensive purifying selection acting on synonymous sites in HIV-1 Group M sequences. Virol J. 2008, 5: 160-10.1186/1743-422X-5-160.PubMed CentralPubMedView ArticleGoogle Scholar
- Pond SL, Scheffler K, Gravenor MB, Poon AF, Frost SD: Evolutionary fingerprinting of genes. Mol Biol Evol. 2010, 27 (3): 520-536. 10.1093/molbev/msp260.PubMedView ArticleGoogle Scholar
- Mayrose I, Doron-Faigenboim A, Bacharach E, Pupko T: Towards realistic codon models: among site variability and dependency of synonymous and non-synonymous rates. Bioinformatics. 2007, 23 (13): i319-i327. 10.1093/bioinformatics/btm176.PubMedView ArticleGoogle Scholar
- Schmid K, Yang Z: The trouble with sliding windows and the selective pressure in BRCA1. PLoS One. 2008, 3 (11): e3746-10.1371/journal.pone.0003746.PubMed CentralPubMedView ArticleGoogle Scholar
- Charneau P, Clavel F: A single-stranded gap in human immunodeficiency virus unintegrated linear DNA defined by a central copy of the polypurine tract. J Virol. 1991, 65 (5): 2415-2421.PubMed CentralPubMedGoogle Scholar
- Charneau P, Mirambeau G, Roux P, Paulous S, Buc H, Clavel F: HIV-1 reverse transcription. A termination step at the center of the genome. J Mol Biol. 1994, 241 (5): 651-662. 10.1006/jmbi.1994.1542.PubMedView ArticleGoogle Scholar
- Clever J, Sassetti C, Parslow TG: RNA secondary structure and binding sites for gag gene products in the 5' packaging signal of human immunodeficiency virus type 1. J Virol. 1995, 69 (4): 2101-2109.PubMed CentralPubMedGoogle Scholar
- McBride MS, Panganiban AT: The human immunodeficiency virus type 1 encapsidation site is a multipartite RNA element composed of functional hairpin structures. J Virol. 1996, 70 (5): 2963-2973.PubMed CentralPubMedGoogle Scholar
- Cmarko D, Boe SO, Scassellati C, Szilvay AM, Davanger S, Fu XD, Haukenes G, Kalland KH, Fakan S: Rev inhibition strongly affects intracellular distribution of human immunodeficiency virus type 1 RNAs. J Virol. 2002, 76 (20): 10473-10484. 10.1128/JVI.76.20.10473-10484.2002.PubMed CentralPubMedView ArticleGoogle Scholar
- Pollard VW, Malim MH: The HIV-1 Rev protein. Annu Rev Microbiol. 1998, 52: 491-532. 10.1146/annurev.micro.52.1.491.PubMedView ArticleGoogle Scholar
- Ngumbela KC, Ryan KP, Sivamurthy R, Brockman MA, Gandhi RT, Bhardwaj N, Kavanagh DG: Quantitative effect of suboptimal codon usage on translational efficiency of mRNA encoding HIV-1 gag in intact T cells. PLoS One. 2008, 3 (6): e2356-10.1371/journal.pone.0002356.PubMed CentralPubMedView ArticleGoogle Scholar
- Watts JM, Dang KK, Gorelick RJ, Leonard CW, Bess JW, Swanstrom R, Burch CL, Weeks KM: Architecture and secondary structure of an entire HIV-1 RNA genome. Nature. 2009, 460 (7256): 711-716. 10.1038/nature08237.PubMed CentralPubMedView ArticleGoogle Scholar
- Telesnitsky A, Blain S, Goff SP: Assays for retroviral reverse transcriptase. Methods Enzymol. 1995, 262: 347-362.PubMedView ArticleGoogle Scholar
- Neil SJ, Zang T, Bieniasz PD: Tetherin inhibits retrovirus release and is antagonized by HIV-1 Vpu. Nature. 2008, 451 (7177): 425-430. 10.1038/nature06553.PubMedView ArticleGoogle Scholar
- Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32 (5): 1792-1797. 10.1093/nar/gkh340.PubMed CentralPubMedView ArticleGoogle Scholar
- Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 2003, 52 (5): 696-704. 10.1080/10635150390235520.PubMedView ArticleGoogle Scholar
- Sharp PM, Li WH: The codon Adaptation Index–a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 1987, 15 (3): 1281-1295. 10.1093/nar/15.3.1281.PubMed CentralPubMedView ArticleGoogle Scholar
- Haas J, Park EC, Seed B: Codon usage limitation in the expression of HIV-1 envelope glycoprotein. Current biology : CB. 1996, 6 (3): 315-324. 10.1016/S0960-9822(02)00482-7.PubMedView ArticleGoogle Scholar
- Hofacker IL, Fekete M, Stadler PF: Secondary structure prediction for aligned RNA sequences. J Mol Biol. 2002, 319 (5): 1059-1066. 10.1016/S0022-2836(02)00308-X.PubMedView ArticleGoogle Scholar
- He J, Landau NR: Use of a novel human immunodeficiency virus type 1 reporter virus expressing human placental alkaline phosphatase to detect an alternative viral receptor. J Virol. 1995, 69 (7): 4587-4592.PubMed CentralPubMedGoogle Scholar
- Stephens CR, Waelbroeck H: Codon bias and mutability in HIV sequences. J Mol Evol. 1999, 48 (4): 390-397. 10.1007/PL00006483.PubMedView ArticleGoogle Scholar
- Melamed D, Mark-Danieli M, Kenan-Eichler M, Kraus O, Castiel A, Laham N, Pupko T, Glaser F, Ben-Tal N, Bacharach E: The conserved carboxy terminus of the capsid domain of human immunodeficiency virus type 1 gag protein is important for virion assembly and release. J Virol. 2004, 78 (18): 9675-9688. 10.1128/JVI.78.18.9675-9688.2004.PubMed CentralPubMedView ArticleGoogle Scholar
- Laham-Karam N, Bacharach E: Transduction of human immunodeficiency virus type 1 vectors lacking encapsidation and dimerization signals. J Virol. 2007, 81 (19): 10687-10698. 10.1128/JVI.00653-07.PubMed CentralPubMedView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.