- Research article
- Open Access
Evolution and phylogeny of insect endogenous retroviruses
BMC Evolutionary Biology volume 1, Article number: 3 (2001)
The genomes of invertebrates are rich in retroelements that are structurally reminiscent of the retroviruses of vertebrates. Those containing three open reading frames (ORFs), including an env-like gene, may well be considered endogenous retroviruses. Further support for this similarity has been provided by the ability of the env-like gene of DmeGypV (the Gypsy endogenous retrovirus of Drosophila melanogaster) to promote infection of Drosophila cells by a pseudotyped vertebrate retrovirus vector.
To gain insight into their evolutionary history, a sample of thirteen insect endogenous retroviruses, the largest sample analysed so far, was studied by computer-assisted comparison of the translated products of their gag, pol and env genes, as well as of their LTR structural features. We found that the three phylogenetic trees based respectively on Gag, Pol and Env common motifs are congruent, which suggests a monophyletic origin for these elements.
We showed that most of the insect endogenous retroviruses belong to a major clade which can be further divided into two main subgroups which also differ by the sequence of their primer binding sites (PBS). We propose to name these two major subgroups of Insect Endogenous RetroViruses (or Insect ERrantiVirus, according to the ICTV nomenclature) IERV-K and IERV-S; they use Lys and Ser tRNAs, respectively, to prime reverse transcription.
Retrovirus-like elements have been found in the genomes of most eukaryotes. Their integrated/proviral forms consist of two long terminal repeats (LTRs) flanking an internal region which contains one to three major open reading frames (ORFs) coding for the structural and enzymatic functions necessary for their replication cycle. Based on a phylogeny of their reverse transcriptase (RT) domains, the retrovirus-like elements were divided into two major groups: the Ty1/Copia and the Ty3/Gypsy families. They differ by the order of the enzymatic domains encoded in the pol gene: Integrase – Reverse Transcriptase in the case of the Ty1/Copia family, and Reverse Transcriptase – Integrase in the case of the Ty3/Gypsy family, which is also the order found in vertebrate retroviruses. Moreover, the Ty3/Gypsy family is more closely related to retroviruses than to the Ty1/Copia family. Recently, the International Committee on Taxonomy of Viruses (ICTV) has proposed to call the Ty3/Gypsy and Ty1/Copia groups Metaviridae and Pseudoviridae, respectively. Among the Metaviridae present in the genomes of invertebrates, a noticeable proportion contain an env-like third ORF which makes them strikingly similar to vertebrate endogenous retroviruses. DmeGypV (the Drosophila Gypsy endogenous retrovirus) is the best studied of these elements, and its infectious properties due to its env-like gene have been demonstrated by feeding experiments [3, 4] and by the observation that a Moloney murine leukemia virus pseudotyped with the DmeGypV env gene was infectious for Drosophila cells.
A relationship between the envelope proteins of some of these insect endogenous retroviruses and the Ld130 group of baculovirus envelope proteins was recently described. Moreover, it was shown that members of this family contain a furin cleavage motif, a conserved motif downstream of this site, predicted coiled-coil domains, and a pattern of conserved cysteine residues. Experimental data support these comparative analyses: it was recently reported that Ld130 has the properties of a low-pH-dependent envelope fusion protein and may play a role in the infection cycle. Moreover, IJkel et al. (2000) have shown that Se8, the homologue of Ld130 in Spodoptera exigua, is an envelope fusion protein, with the R-X-K-R sequence corresponding to the furin-like proprotein convertase cleavage site. Thirteen insect endogenous retroviral sequences are now available, providing an opportunity to analyse their evolutionary relationships in detail. The results presented here show that most of these sequences (including that of DmeGypV) cluster in a very homogeneous and specific group.
The sequences of thirteen insect endogenous retroviruses were included in this analysis, which represents the largest sample analysed until now. Their virus names, genome sequence accession numbers and assigned abbreviations are indicated in Table 1.
The primer binding sites (PBS) of insect endogenous retroviruses have unusual properties
The insect endogenous retroviruses can be divided into two main groups: (DmeGypV DsuGypV DviGypV DmeNomV CcaYoyV), which have a tRNALys binding site, and (Dme176V Dme297V DanTomV DmeZamV DmeTirV DmeIdeV DviTv1V TniTedV), which have a tRNASer binding site. A striking feature of these elements is that the last bp of the 5'LTR is always the first bp of the PBS. Hence, some mechanism must explain why the last nucleotide of the 5'LTR is not lost after a round of replication. One such mechanism would involve the deletion of the 3' terminal nucleotide of the tRNA. It is also possible that the cleavage mediated by the RNase H encoded by these elements, which removes the primer tRNA attached to the 5' end of the minus-strand template, occurs in the CCA triplet between the terminal A and the adjacent C. The different products of the reverse transcription of these elements should be analysed experimentally in order to understand this unusual characteristic of errantiviruses.
Common motifs can be found in insect retroviral Gag and Env sequences
None of the canonical domains described for retroviral Gag proteins is present in insect retroviral Gag sequences. Lack of a canonical cysteine-rich nucleic acid-binding domain has also been reported in spumaviruses. In the case of DmeGypV, DviGypV and DsuGypV, the C-terminal part of the Gag sequence contains an arginine-rich region which might act as an RNA binding motif but might also play a role as a nuclear localization signal. Three motifs can be identified in the N-terminal part of the sequences (Fig. 1). Moreover, these motifs are also present in the burdock element of the Metaviridae family, which does not have an env-like gene (see below).
The insect retroviral Env sequences share structural properties with vertebrate retroviral Env proteins: a signal peptide, N-glycosylation sites and a transmembrane domain. It has been shown experimentally that the env genes of DmeGypV, DmeZamV, DanTomV and DmeNomV are expressed from a spliced mRNA [13–16]. Using the MEME program, we have identified six collinear motifs in 12 out of the 13 insect retroviral Env sequences (Fig. 2). Considering the high variability generally described for viral envelope proteins, this strongly suggests that these Env sequences form a monophyletic group. Moreover, motif II, previously described [6, 7], is common to the thirteen Env sequences. The R-X-K-R sequence present at the beginning of this motif was previously described as a common motif present in some insect and vertebrate retroviral Env sequences [14, 17]. It is the consensus cleavage site recognized by a cellular endopeptidase that cleaves the precursor envelope protein into the surface (SU) and transmembrane (TM) polypeptides.
We first addressed the specificity of motif II. For this purpose, we scanned TrEMBL for the R-x(2)-R-x(5,6)-[GE]-x(5)-[LV]-x-G-x(2)-D-x(2)-D pattern using the ScanProsite program and obtained eight hits. Six of them are indeed insect retroviral Env sequences. The other two correspond to Ld130 and Se8, previously described by Malik et al. and by Rohrmann and Karplus. Because not all sequences are present in TrEMBL, we also performed a PHI-BLAST search on Dme176V Env (default parameters) using this same pattern as the seed. We detected all the expected Env sequences at a significant level. Hence, the R-x(2)-R-x(5,6)-[GE]-x(5)-[LV]-x-G-x(2)-D-x(2)-D pattern can be used as a highly specific probe for the in silico detection of insect endogenous retroviral envelope protein sequences in databanks.
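As an illustration of how such a pattern can be applied to a local set of sequences, the following minimal Python sketch converts the PROSITE-style pattern given above into a regular expression and scans a protein sequence with it. It is not a reimplementation of ScanProsite or PHI-BLAST; the function names and the toy sequence are hypothetical, and only the small subset of PROSITE syntax used by this pattern is handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression.

    Only the syntax used by the motif II pattern is supported: literal
    residues, 'x' wildcards, [..] residue classes and (n) / (n,m) repeats.
    """
    parts = []
    for element in pattern.split("-"):
        # Split an element such as "x(5,6)" into its core and repeat counts.
        match = re.fullmatch(r"([^()]+)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = match.groups()
        core = "." if core.lower() == "x" else core
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{%s}" % low
        else:
            repeat = "{%s,%s}" % (low, high)
        parts.append(core + repeat)
    return "".join(parts)


# Pattern taken from the text.
MOTIF_II = "R-x(2)-R-x(5,6)-[GE]-x(5)-[LV]-x-G-x(2)-D-x(2)-D"
MOTIF_II_REGEX = re.compile(prosite_to_regex(MOTIF_II))


def find_motif(sequence: str):
    """Return (start, matched_text) for every non-overlapping match."""
    return [(m.start(), m.group())
            for m in MOTIF_II_REGEX.finditer(sequence.upper())]


# Hypothetical toy sequence, built only so that it satisfies the pattern;
# it is not a real Env sequence.
toy_sequence = "MKT" + "RAAR" + "AAAAA" + "G" + "AAAAA" + "LAG" + "AAD" + "AAD" + "KK"

print(prosite_to_regex(MOTIF_II))
print(find_motif(toy_sequence))
```

Note that a plain regular expression reports non-overlapping matches only and carries no statistical significance, so a database-scale search would still rely on dedicated tools such as those named in the text.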
Phylogeny based on the Reverse Transcriptase (RT) and RNaseH domains
A phylogeny based on an alignment of the RT and RNaseH domains, using the characteristic motifs previously defined by Malik & Eickbush, is shown in Fig. 3B. In order to illustrate the relationships among the insect retroelements from the Ty3/Gypsy family, we added five insect Metaviridae sequences which do not have an env-like gene. This phylogenetic analysis indicates that the insect endogenous retroviruses form a homogeneous group, whereas the other elements are attached to the tree by longer branches. One discrepancy emerged: the burdock element clusters with the insect endogenous retroviruses.
Gag and Env based phylogenies
Two unrooted phylogenetic trees were generated based on the Gag motifs I to III (Fig. 3A) and the Env motif II (Fig. 3C) identified previously. Both trees are in good agreement with the RT/RNaseH-based phylogeny (Fig. 3B). The insect retroviruses are thus mainly divided into two major groups: (DmeGypV DsuGypV DviGypV DmeNomV CcaYoyV) and (Dme176V Dme297V DanTomV DmeZamV DmeTirV DmeIdeV DviTv1V DmeB104V TniTedV). Moreover, this division is perfectly correlated with the type of tRNA used to prime reverse transcription.
DmeB104V, a member of the BEL clade, encodes an envelope protein containing motif II
The PHI-BLAST search revealed a match with the truncated Env-like amino-acid sequence (CAA88414.1) of DmeB104V (Z48503, also known as roo), a member of the BEL clade which is widespread in metazoans. Using TBLASTN, we found multiple copies of DmeB104V env-like genes in the Drosophila genome database. Most of these sequences display premature stop codons due to a couple of insertions/deletions (indels), but a few complete env genes are present (e.g. AC008209, 112295–114043). However, it should be noted that the full-length DmeB104V Env sequence does not share the five other common motifs found previously in the translated errantiviral env sequences. Moreover, the DmeB104V Env sequence is closer to the Ld130 protein family than to the errantiviral Env sequences, as revealed by the score obtained from a PSI-BLAST analysis (data not shown). This result does not agree with the claim of Frame et al. that DmeB104V captured the envelope coding region from a Zam-like element but, in agreement with Malik et al., suggests that a B104-like ancestor acquired an envelope gene from a baculovirus.
The Env insect retroviral sequences share common features with the Ld130 protein family
The similarities between the insect retroviral envelope proteins and those of Ld130 and Se8, together with the evidence that a Moloney murine leukemia virus pseudotyped with the env gene of DmeGypV is infectious for Drosophila cells, suggest that the Env proteins are fusion proteins. Using the LearnCoil-VMF program, we confirm and extend the results of Rohrmann and Karplus: a coiled-coil-like region is present in 9 out of the 13 Env sequences studied, as well as in Ld130 and Se8.
The IERV monophyletic lineage includes most of the insect endogenous retroviruses
We took advantage of the fact that the sequences of up to 13 insect endogenous retroviruses have been determined to investigate their phylogenetic relationships. The three phylogenetic trees, based on the well conserved Pol domains or on the more divergent Gag and Env domains, show the same topology, supported by high bootstrap values. The congruence of these trees strengthens previous conclusions concerning the monophyletic origin of the insect endogenous retroviruses. In all trees, this major clade can be further divided into two main subgroups which also differ by the sequence of their PBS. Following the taxonomic nomenclature of Larsson et al., we propose to name these two major subgroups of Insect Endogenous RetroViruses (or Insect ERrantiVirus, according to the ICTV nomenclature) IERV-K and IERV-S; they use Lys and Ser tRNAs, respectively, to prime reverse transcription.
In both the Pol and the Gag trees, the burdock element is associated with the IERV-K clade. This clustering is strengthened by the fact that it also displays the LTR-PBS overlap typical of IERVs and contains a tRNASer PBS. This retrotransposon which, according to the ICTV, is classified as a metavirus may be a former errantivirus which has lost its env gene. The fact that env is such a modular gene explains why the classification of Metaviridae based only on the presence/absence of an env gene  does not always reflect the actual phylogeny.
Do IERV Env proteins lack a typical retroviral SU subunit?
Little is known about the structure and function of the IERV Env proteins. The discovery of the relationship between IERV Env and baculoviral envelope fusion proteins described previously [6, 7] leads us to reconsider previous analyses based on a possible analogy with the glycoproteins of vertebrate retroviruses. The latter are known to be cleaved into two subunits by a furin-like cellular endopeptidase. The fusion peptide is located at the extracellular terminus of the transmembrane subunit (TM). The surface subunit (SU) is linked to TM by disulfide bonds and is responsible for the attachment of the virus to the cellular receptor. Only the envelope proteins of DmeGypV, DanTomV and TniTedV have been studied by Western blotting [3, 15, 24]. Bands tentatively corresponding to the processing of the DmeGypV and DanTomV envelopes at a more or less degenerate cleavage site could be visualized, but in lower amounts than major bands of higher molecular weight. In the three cases, the latter were inferred to correspond to the unprocessed precursor. However, since the putative cleavage site of motif II is located very close to the NH2 terminus of all three proteins, the sizes of the major bands are also compatible with a preferential processing of the precursor at this site. Moreover, the exact size is difficult to predict because the Env proteins are likely glycosylated, as has been shown for DmeGypV and TniTedV. One can also note that the only furin-like cleavage sites that can be predicted from the sequences of the other IERV envelopes are those located at the NH2 terminus of motif II. Whether they are processed or not, these species of high molecular weight would look like large TM-like fusion proteins. A general feature of vertebrate retroviral Env proteins is that SU is always larger than TM. Since the SU-like peptide upstream of the fusion peptide of IERV Env is much smaller than TM, we wonder whether the attachment function is encoded by another (cellular?) protein or is definitely dispensable. Further studies of the replication of DmeGypV, the IERV paradigm, will hopefully provide deeper insights into these questions.
Previous studies have shown a relationship between the envelope proteins of some errantiviruses and baculovirus envelope fusion proteins [6, 7]. We extended these results to a larger sample of errantiviruses, and we showed that these elements form a monophyletic group. Our analysis provides a basis for understanding the evolutionary origin of IERVs in insect genomes.
Materials and Methods
Sequence identification and retrieval
Insect retroviral sequences previously identified [2, 25] were obtained from the EMBL/GenBank database using their accession numbers (Table 1). We observed that the DviTv1V, DviGypV, DsuGypV and DmeNomV Env sequences display premature stop codons due to a couple of indels. Hence we introduced 2 or 3 frameshifts to restore full-length Env coding capacity. The full-length "restored" sequences were then used in our analyses.
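To make the notion of premature stop codons and frameshifts concrete, here is a minimal Python sketch that translates the three forward reading frames of a DNA sequence and reports, for each frame, the number of stop codons and the longest stop-free stretch. A switch of the longest open stretch from one frame to another along a gene is one simple hint of where an indel may lie. This is only an assumed, simplified procedure, not the one used in this study (the restoration described above was done by introducing frameshifts in the sequences); the function names and the toy sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate one forward frame (0, 1 or 2); unknown codons become 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def longest_open_stretch(protein: str):
    """Return (start, length), in residues, of the longest stop-free stretch."""
    best_start, best_len, start = 0, 0, 0
    # A sentinel stop closes the final stretch.
    for i, aa in enumerate(protein + "*"):
        if aa == "*":
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = i + 1
    return best_start, best_len


def frame_report(dna: str):
    """Summarise stop codons and the longest open stretch for each frame."""
    report = {}
    for frame in range(3):
        protein = translate(dna, frame)
        report[frame] = {
            "stops": protein.count("*"),
            "longest_open": longest_open_stretch(protein),
        }
    return report


# Hypothetical toy sequence with an in-frame stop codon (TAA) in frame 0.
toy_dna = "ATGGCCTAAGGG"
for frame, summary in frame_report(toy_dna).items():
    print(frame, translate(toy_dna, frame), summary)
```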
Alignment and phylogenetic analyses
The MEME program http://meme.sdsc.edu/meme/website/ was used to identify the most conserved amino acid motifs of insect Metaviridae Gag and Env proteins. MEME is a motif-based method which detects conserved patterns or motifs among a set of sequences. This method provides a robust way to construct multiple alignments based on segment-to-segment (or local) comparisons rather than residue-to-residue (or global) comparisons. This approach is especially appropriate for sequences with low levels of global similarity, which is the case for Gag and Env proteins [28, 29]. The selected domains were then manually aligned. Phylogenetic analyses were performed on the multiple sequence alignments using the Neighbor-Joining distance method implemented in CLUSTAL X. The Bootstrap N-J command (N = 1000) of CLUSTAL X was used to generate bootstrapped trees. PHI-BLAST (Pattern-Hit Initiated BLAST; http://www.ncbi.nlm.nih.gov/blast/) is a search program that combines matching of regular expressions with local alignments surrounding the match. All trees generated were visualised with Tree View 1.5.2 http://taxonomy.zoology.gla.ac.uk/rod/treeview.html. The putative coiled-coil-like regions were detected using the LearnCoil-VMF program http://web.wi.mit.edu/kim, which is designed for viral membrane-fusion proteins.
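The bootstrap step mentioned above can be sketched in a few lines of Python: alignment columns are resampled with replacement and a distance matrix is recomputed for each replicate, after which a tree would be built from each matrix. The sketch below covers only the resampling and distance steps, and it uses an uncorrected p-distance as a simple stand-in; the exact distance measure and tree-building settings applied by CLUSTAL X are not reproduced here. The function names and the toy alignment are hypothetical.

```python
import random


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing residues, skipping columns with a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def bootstrap_alignment(alignment: dict, rng: random.Random) -> dict:
    """Resample alignment columns with replacement, keeping rows in register."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


def distance_matrix(alignment: dict) -> dict:
    """Pairwise p-distances keyed by (name_a, name_b) with name_a < name_b."""
    names = sorted(alignment)
    return {
        (a, b): p_distance(alignment[a], alignment[b])
        for a in names for b in names if a < b
    }


# Hypothetical toy alignment; real input would be the aligned motif blocks.
toy_alignment = {
    "seqA": "MKTAYIAK",
    "seqB": "MKTAYLAK",
    "seqC": "MRSAYLGK",
}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [
    distance_matrix(bootstrap_alignment(toy_alignment, rng))
    for _ in range(1000)  # N = 1000, as in the text
]
print(distance_matrix(toy_alignment))
print(len(replicates))
```

Each replicate matrix would then be passed to a Neighbor-Joining routine, and the bootstrap value of a branch is the fraction of replicate trees in which that branch appears.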
Xiong Y, Eickbush TH: Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J. 1990, 9: 3353-62.
Boeke JD, Eickbush TH, Sandmeyer SB, Voytas DF: Metaviridae. In: Virus Taxonomy: ICTV VIIth Report. Edited by Murphy FA. New York: Springer-Verlag. 2000
Song SU, Gerasimova T, Kurkulos M, Boeke JD, Corces VG: An env-like protein encoded by a Drosophila retroelement: evidence that gypsy is an infectious retrovirus. Genes Dev. 1994, 8: 2046-57.
Kim A, Terzian C, Santamaria P, Pelisson A, Prud'homme N, Bucheton A: Retroviruses in invertebrates: the gypsy retrotransposon is apparently an infectious retrovirus of Drosophila melanogaster. Proc Natl Acad Sci USA. 1994, 91: 1285-9.
Teysset L, Burns JC, Shike H, Sullivan BL, Bucheton A, Terzian C: A Moloney murine leukemia virus-based retroviral vector pseudotyped by the insect retroviral gypsy envelope can infect Drosophila cells. J Virol. 1998, 72: 853-6.
Malik HS, Henikoff S, Eickbush TH: Poised for Contagion: Evolutionary Origins of the Infectious Abilities of Invertebrate Retroviruses. Genome Res. 2000, 10: 1307-1318. 10.1101/gr.145000.
Rohrmann GF, Karplus PA: Relatedness of baculovirus and gypsy retrotransposon envelope proteins. BMC Evol Biol. 2001, 1: 1. 10.1186/1471-2148-1-1.
Pearson MN, Groten C, Rohrmann GF: Identification of the Lymantria dispar nucleopolyhedrovirus envelope fusion protein provides evidence for a phylogenetic division of the Baculoviridae. J Virol. 2000, 74: 6126-31. 10.1128/JVI.74.13.6126-6131.2000.
IJkel WF, Westenberg M, Goldbach RW, Blissard GW, Vlak JM, Zuidema D: A novel baculovirus envelope fusion protein with a proprotein convertase cleavage site. Virology. 2000, 275: 30-41. 10.1006/viro.2000.0483.
Vogt VM: Retroviral virions and genomes. In: Retroviruses. Edited by Coffin JM, Hughes SH, Varmus HE. NY: Cold Spring Harbor Laboratory Press. 1997, 27-69.
Alberola TM, de Frutos R: Molecular structure of a gypsy element of Drosophila subobscura (gypsyDs) constituting a degenerate form of insect retroviruses. Nucleic Acids Res. 1996, 24: 914-23. 10.1093/nar/24.5.914.
Robbins J, Dilworth SM, Laskey RA, Dingwall C: Two interdependent basic domains in nucleoplasmin nuclear targeting sequence: identification of a class of bipartite nuclear targeting sequence. Cell. 1991, 64: 615-23.
Pelisson A, Song SU, Prud'homme N, Smith PA, Bucheton A, Corces VG: Gypsy transposition correlates with the production of a retroviral envelope-like protein under the tissue-specific control of the Drosophila flamenco gene. EMBO J. 1994, 13: 4401-11.
Leblanc P, Desset S, Dastugue B, Vaury C: Invertebrate retroviruses: ZAM a new candidate in D.melanogaster. EMBO J. 1997, 16: 7521-31. 10.1093/emboj/16.24.7521.
Tanda S, Mullor JL, Corces VG: The Drosophila tom retrotransposon encodes an envelope protein. Mol Cell Biol. 1994, 14: 5392-401.
Whalen JH, Grigliatti TA: Molecular characterization of a retrotransposon in Drosophila melanogaster, nomad, and its relationship to other retrovirus-like mobile elements. Mol Gen Genet. 1998, 260: 401-9. 10.1007/s004380050910.
Lerat E, Capy P: Retrotransposons and retroviruses: analysis of the envelope gene. Mol Biol Evol. 1999, 16: 1198-207.
Steiner DF: The proprotein convertases. Curr Opin Chem Biol. 1998, 2: 31-9. 10.1016/S1367-5931(98)80033-1.
Malik HS, Eickbush TH: Modular evolution of the integrase domain in the Ty3/Gypsy class of LTR retrotransposons. J Virol. 1999, 73: 5186-90.
Frame IG, Cutfield JF, Poulter RT: BEL-like LTR-retrotransposons in Fugu rubripes, Caenorhabditis elegans, and Drosophila melanogaster. Gene. 2001, 263: 219-30. 10.1016/S0378-1119(00)00567-9.
Singh M, Berger B, Kim PS: LearnCoil-VMF: computational evidence for coiled-coil-like motifs in many viral membrane-fusion proteins. J Mol Biol. 1999, 290: 1031-41. 10.1006/jmbi.1999.2796.
Larsson E, Kato N, Cohen M: Human endogenous proviruses. Curr Top Microbiol Immunol. 1989, 148: 115-32.
Swanstrom VM, Wills JM: Synthesis, assembly and processing of viral proteins. In: Retroviruses. Edited by Coffin JM, Hughes SH, Varmus HE. NY: Cold Spring Harbor Laboratory Press. 1997, 263-334.
Ozers MS, Friesen PD: The Env-like open reading frame of the baculovirus-integrated retrotransposon TED encodes a retrovirus-like envelope protein. Virology. 1996, 226: 252-9. 10.1006/viro.1996.0653.
Bucheton A, Terzian C, Pelisson A: Retroviruses of Drosophila: the gypsy paradigm. In: Encyclopedia of Virology. London: Academic Press. 1999, 1526-1530.
Bailey TL, Elkan C: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In: Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology; Menlo Park, California. 1994, 28-36.
Thompson JD, Plewniak F, Poch O: A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Res. 1999, 27: 2682-90. 10.1093/nar/27.13.2682.
McClure MA, Johnson MS, Feng DF, Doolittle RF: Sequence comparisons of retroviral proteins: relative rates of change and general phylogeny. Proc Natl Acad Sci USA. 1988, 85: 2469-73.
Lewe G, Flugel RM: Comparative analysis of the retroviral pol and env protein sequences reveal different evolutionary trees. Virus Genes. 1990, 3: 195-204.
Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG: The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 1997, 25: 4876-82. 10.1093/nar/25.24.4876.
Zhang Z, Schaffer AA, Miller W, Madden TL, Lipman DJ, Koonin EV, Altschul SF: Protein sequence similarity searches using patterns as seeds. Nucleic Acids Res. 1998, 26: 3986-90. 10.1093/nar/26.17.3986.
Page RD: Tree View: an application to display phylogenetic trees on personal computers. Comput Appl Biosci. 1996, 12: 357-8.
We thank two anonymous reviewers for valuable comments. This research was supported by the "Programme Informatique et Genome", ARC and the CNRS.