Relationships of gag-pol diversity between Ty3/Gypsy and Retroviridae LTR retroelements and the three kings hypothesis
© Llorens et al; licensee BioMed Central Ltd. 2008
Received: 23 March 2008
Accepted: 08 October 2008
Published: 08 October 2008
The origin of vertebrate retroviruses (Retroviridae) is yet to be thoroughly investigated, but due to their similarity and identical gag-pol (and env) genome structure, it is accepted that they evolved from Ty3/Gypsy LTR retroelements, the retrotransposons and retroviruses of plants, fungi and animals. These 2 groups of LTR retroelements code for 3 proteins rarely studied due to their high variability – gag polyprotein, protease and GPY/F module. In relation to 3 previously proposed Retroviridae classes I, II and III, investigation of the above proteins conclusively uncovers important insights regarding the ancient history of Ty3/Gypsy and Retroviridae LTR retroelements.
We performed a comprehensive study of 120 non-redundant Ty3/Gypsy and Retroviridae LTR retroelements. Phylogenetic reconstruction based on the concatenated analysis of the gag and pol polyproteins shows a robust phylogenetic signal regarding the clustering of OTUs. Evaluation of gag and pol polyproteins separately yields discordant information. While the pol signal supports the traditional perspective (2 monophyletic groups), the gag polyprotein describes an alternative scenario where each Retroviridae class can be distantly related with one or more Ty3/Gypsy lineages. We investigated this evidence in more depth through comparative analyses based on the gag polyprotein, the protease and the GPY/F module. Our results indicate that, contrary to the traditional monophyletic view of the origin of vertebrate retroviruses, the Retroviridae class I is a molecular fossil, preserving features that were probably predominant among Ty3/Gypsy ancestors predating the split of plants, fungi and animals. In contrast, classes II and III maintain other phenotypes that emerged more recently during Ty3/Gypsy evolution.
The 3 Retroviridae classes I, II and III exhibit phenotypic differences that delineate a network never before reported between Ty3/Gypsy and Retroviridae LTR retroelements. This new scenario reveals how the diversity of vertebrate retroviruses is polyphyletically recurrent into the Ty3/Gypsy evolution, i.e. older than previously thought. The simplest hypothesis to explain this finding is that classes I, II and III trace back to at least 3 Ty3/Gypsy ancestors that emerged at different evolutionary times prior to protostomes-deuterostomes divergence. We have called this "the three kings hypothesis" concerning the origin of vertebrate retroviruses.
Attention was first drawn to the Retroviridae when HTLV-1 was characterized as pathogenic in humans [1, 2]. They further increased in significance with the discovery of HIV-1, the retrovirus responsible for AIDS in humans [3, 4]. These 2 retroviruses represent only a small part of Retroviridae diversity, which can be divided into seven genera: Alpha-, Beta-, Gamma-, Delta-, Epsilon-, Spumaretroviridae and Lentiviridae (according to ICTV classification). Based on their strategy of transmission, the Retroviridae can also be classified as endogenous retroviruses, when they enter the germ lines of hosts and are vertically transmitted, or as exogenous retroviruses, when they can be transmitted horizontally from one host into another via infection. The most recent trends in Retroviridae taxonomy [6–10] group endogenous and exogenous retroviruses into 3 major classes designated as I, II and III. Both classifications are complementary, as class I comprises gamma- and epsilonretroviruses; class II includes lentiviruses, delta-, alpha- and betaretroviruses; and class III groups spumaretroviruses with ERV-L retroelements. The ancient history of the Retroviridae is yet to be thoroughly investigated, but due to their similarity and identical gag-pol (and env) genome structure, it is usually assumed that they evolved from the Ty3/Gypsy LTR retroelements of plants, fungi and animals. The traditional view, suggested by pol polyprotein domains such as the RT [12–14], RNAse H [14, 15], and INT [14, 16] used to resolve the phylogeny, delineates a common Ty3/Gypsy origin for all vertebrate retroviruses. Nevertheless, little is known about this scenario because RT, RNAse H and INT analyses appear unable to agree on a precise, well-supported Ty3/Gypsy root for the Retroviridae. In an attempt to shed light on this topic, we investigated 120 non-redundant Ty3/Gypsy and Retroviridae taxa based on the phylogenetic analysis of both gag and pol polyproteins.
Our results revealed conflicting phylogenetic signals between these 2 polyproteins. From that point, we aimed to investigate this evidence in more depth through comparative analyses based on 3 independent proteins rarely considered by prior studies due to their variability – the gag polyprotein, the PR and the GPY/F module. Our study reveals taxonomic differences among the 3 Retroviridae classes, and an evolutionary network that distantly relates each class with one or more Ty3/Gypsy lineages. This observation appears to be at odds with the traditional monophyletic view suggested by prior approaches to determining the origin of vertebrate retroviruses, but requires further study. In light of this new perspective, we introduce here a new hypothesis for debate and further evaluation. Our hypothesis argues that classes I, II and III probably trace back to at least 3 independent Ty3/Gypsy ancestors. We call this the three kings hypothesis.
Consistency of lineages but conflicting phylogenetic signals between gag and pol polyproteins in the Ty3/Gypsy and Retroviridae evolutionary history
Retroviridae differentiation into classes outlines phenotypic differences in the gag polyprotein that distantly relate each class with one or more Ty3/Gypsy lineages
[Table: Hits of BLASTp similarity between Micropia/Mdg3 and other Ty3/Gypsy and Retroviridae gags. Queries: Micropia gag; Mdg3 gag.]
[Table: Hits of BLASTp similarity between Tat and other Ty3/Gypsy and Retroviridae gags. Queries: Retrosor1 gag; Tat4-1 gag.]
Comparative analyses confirm phenotypic features in the gag polyprotein that distantly relate each Retroviridae class with one or more of the Ty3/Gypsy lineages evaluated. The similarity spans the CA-NC core and the most prominent feature in common is the variability in the number of CCHC arrays per NC. With very few exceptions, the Athila/Tat elements of plants usually code for NCs exhibiting one CCHC array, Micropia/Mdg3 and Mag elements code for NCs usually exhibiting 2 arrays (except Mag elements of C. elegans), and errantiviral gags have no CCHC arrays at their C-terminus. This indicates that the number of CCHC arrays per NC is evolutionarily preserved depending on the Ty3/Gypsy lineage and the Retroviridae class, and that this phenotype is an excellent indicator of taxonomy and evolution. For simplicity's sake, we do not discuss all Ty3/Gypsy cases. We discuss but one example, the most interesting instance of using this indicator – the chromodomain-containing Ty3/Gypsy LTR retrotransposons called chromoviruses. Chromoviruses are the most ancient branch of Ty3/Gypsy LTR retroelements, as they have been described in the genomes of plants, fungi and vertebrates (for more extensive information about chromoviruses, see [23, 32, 33]). It is noteworthy that all Ty3/Gypsy LTR retroelements of plants can be divided into 2 major branches – chromoviruses and Athila/Tat – and that chromoviruses appear to be the only branch of Ty3/Gypsy LTR retroelements capable of colonizing the genomes of fungi. A prior study reported that this branch of Ty3/Gypsy LTR retroelements displays similarity (which we confirm) to gammaretroviruses based on CA-NC. However, we have also found that chromoviruses show similarities to class II in addition to a number of Ty3/Gypsy lineages (for this reason chromoviruses fall at an intermediate position in the gag phylogeny). With rare exceptions, NCs coded by chromoviruses usually bear one CCHC array (data not shown).
In contrast, the different Ty3/Gypsy lineages described in bilaterian organisms show greater variability in the number of CCHC arrays at NC than their Ty3/Gypsy counterparts of plants and fungi (i.e. chromoviruses and the Athila/Tat branch). Gag evidence thus relates class I to the most likely CA-NC phenotype of Ty3/Gypsy ancestors predating the split between plants and the opisthokonts (fungi and animals), and classes II and III with other CA-NC phenotypes more frequently observed among the Ty3/Gypsy LTR retroelements of protostomes and deuterostomes.
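The use of the number of CCHC arrays per NC as a marker can be illustrated with a minimal sketch. It assumes the canonical zinc-knuckle spacing C-X2-C-X4-H-X4-C, which is not stated in the text above, and real elements may carry variant spacings that this pattern would miss. The function name and the toy sequences are hypothetical and are not taken from the elements analyzed here.

```python
import re

# Assumption: canonical zinc-knuckle spacing C-X2-C-X4-H-X4-C.
# The text does not give a pattern; variant spacings would need other regexes.
CCHC_PATTERN = re.compile(r"C.{2}C.{4}H.{4}C")


def count_cchc_arrays(nc_sequence: str) -> int:
    """Count non-overlapping CCHC arrays in an NC protein sequence."""
    return len(CCHC_PATTERN.findall(nc_sequence.upper()))


# Hypothetical toy sequences (not real NC domains).
one_array = "MKRG" + "CFNCGKPGHIARDC" + "PEKR"
two_arrays = "MKRG" + "CFNCGKPGHIARDC" + "RAPRKKG" + "CWKCGKEGHQMKDC" + "TER"
no_array = "MKRGSSPEKRTTAAG"

for name, seq in [("one", one_array), ("two", two_arrays), ("none", no_array)]:
    print(name, count_cchc_arrays(seq))
```

Counting per lineage or class with such a function would give only a first-pass tally; manual inspection of the alignment remains necessary for degenerate arrays.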
Retroviridae differentiation into classes reveals three protease isoforms based on flap motif polymorphisms, which are common to Ty3/Gypsy and Retroviridae LTR retroelements
Through phylogenetic analyses, we have shown that the pol signal is primarily responsible for the branching of Ty3/Gypsy and Retroviridae LTR retroelements into 2 monophyletic groups. That is the usual evolutionary perspective based on the RT and other pol polyprotein domains. We have also shown that the gag signal discloses an alternative scenario wherein each Retroviridae class can be related to one or more Ty3/Gypsy lineages. An in-depth examination of gag diversity through comparative analyses has revealed the phenotypic variations involved in this differential similarity. Gag evidence is thus well supported. An interesting question is whether this evidence should be considered a convergence due to the fast rate of evolution of the gag polyprotein, or whether it is due to an ancient divergence. Certainly, the most robust components of the pol polyprotein – the RT, RNAse H and INT – usually support the traditional perspective originally delineated by RT analyses. However, the strong signal from these 3 proteins disguises the particular perspective provided by another pol protein domain – the PR. Non-redundant studies focusing on Ty3/Gypsy and Retroviridae PRs are rarely reported, as this enzyme presents the same analytical difficulty as gag due to its fast rate of evolution. Despite this, it is well known that LTR retroelement PRs in general are aspartic peptidases belonging to clan AA (following MEROPS Database classification). Within clan AA, Retroviridae PRs are divided into 2 protein families, retropepsins (family A2) and spumaretropepsins (family A9). Family A2 groups all PRs coded by classes I and II, and family A9 collects the PRs coded by spumaretroviruses (class III). This classification persists because retropepsins and spumaretropepsins are strongly dissimilar to each other and do not group on a single branch in any analysis (data not shown). On the other hand, Ty3/Gypsy PRs are extremely variable and little is known about them.
The MEROPS Database at least classifies many Ty3/Gypsy examples within family A2 because these PRs display great similarity to retropepsins. However, not all Ty3/Gypsy PRs are similar to retropepsins, just as not all Retroviridae PRs are retropepsins. Because no study evaluates the relationships between Ty3/Gypsy and Retroviridae PRs, we investigated this topic, taking into consideration the differentiation of the 2 groups of LTR retroelements into lineages. It is worth remembering that while gag and pol signals are in disagreement over the taxonomical groups, they do support the differentiation into clades, genera and classes of Ty3/Gypsy and Retroviridae LTR retroelements.
Retroviridae class I is a molecular fossil preserving GPY/F module phenotypes that probably were predominant among Ty3/Gypsy ancestors predating the split between plants, fungi and animals
Retroviridae differentiation into the 3 classes I, II and III unravels phenotypic aspects of vertebrate retroviruses, which are probably related with their ancient Ty3/Gypsy origins
Phylogenetic analysis based on all concatenated gag and pol products coded by Ty3/Gypsy and Retroviridae LTR retroelements shows the robustness of their phylogenetic signal regarding the clustering of OTUs [5–14, 19–25]. We used the parsimony method to infer this phylogeny, but the clustering of OTUs is independent of the method of phylogenetic reconstruction used (see Methods). The gag-pol analysis also divides Ty3/Gypsy and Retroviridae LTR retroelements into 2 separate branches, as suggested by original approaches to this topic [12, 41]. We do not disagree with this classification for 2 reasons: first, the strong phylogenetic signal of RT, RNAseH, and INT cannot be dismissed; and second, the Retroviridae (except gammaretroviruses) can be distinguished from Ty3/Gypsy LTR retroelements by features such as the presence of accessory genes. Nevertheless, the current Ty3/Gypsy and Retroviridae classification only exposes the modern evolutionary history of these 2 groups of retroelements (we have shown how their ancient history is not straightforward). Due to the wide distribution of Ty3/Gypsy elements in eukaryotes, the usual means of transference of a canonical Ty3/Gypsy LTR retrotransposon is probably vertical. However, the viral nature of a true Ty3/Gypsy or Retroviridae exogenous retrovirus resides in its capability of horizontal transference from one host to another via infection. Moreover, the incidence of mechanisms such as gene recruitment, genome rearrangement, recombination and chimerism in LTR retroelement evolution presents difficulties in identifying the true natural history of Ty3/Gypsy and Retroviridae LTR retroelements. This suggests that the most realistic (not yet proposed) model for describing Ty3/Gypsy and Retroviridae evolution alternates gradual and modular evolution, and combines vertical and horizontal means of transference.
The traditional argument supporting the Ty3/Gypsy origins of vertebrate retroviruses is shown by their similarity in sequence and genome structure. The question is, however, what genetic material is more informative for exploring the relationships between these two (and other) groups of LTR retroelements: highly variable traits such as gag and PR, or strongly preserved substrates such as the RT, RNAseH and INT? Certainly, RT, RNAseH and INT are an excellent means of classifying Ty3/Gypsy and Retroviridae LTR retroelements into lineages. However, phylogenetic analyses based on RT, RNAseH and INT are not exact enough to resolve the ancient evolutionary history of these 2 groups. This is because the phylogeny inferred from these proteins does not necessarily coincide with the true natural history of the full-length retroelement genome. Here, the advantage of using the gag-pol alignment to infer the phylogeny is the increase in statistical power of the analysis, allowing the opportunity to correct the single gene tree discrepancies. This analytical strategy is useful but has limitations for which solutions remain elusive; the inferred tree can accumulate systematic errors due to the use of concatenated information. We have shown how the gag-pol tree suggests a Ty3/Gypsy root in the origins of vertebrate retroviruses that is close to the Micropia/Mdg3 clade. However, evaluation of gag and pol polyproteins separately yields discordant information. Here, while the pol phylogeny supports the traditional perspective (2 retroelement groups), the gag phylogeny describes a new scenario that appears to be informative with respect to the ancient patterns of diversity of Ty3/Gypsy and Retroviridae LTR retroelements. Certainly, the phylogenetic signal of the gag polyprotein has several limitations due to its fast evolution. To overcome these limitations we investigated other protein domains and used different methodologies to evaluate the significance of the new scenario.
The most important feature here is that, for the first time in the scientific literature, we have carried out a non-redundant study of three independent proteins that have rarely been studied before because of their difficulty.
Our investigation conclusively reveals that the taxonomical differentiation into the 3 Retroviridae classes I, II and III discloses 3 different gag and PR products, and that each product has one or more distant Ty3/Gypsy counterparts. The analysis of the GPY/F module reveals partial consistency and shows that the similarity of class I to Ty3/Gypsy LTR retroelements of plants and fungi is significant. Our results thus support an ancient scenario of polyphyly involving the 3 Retroviridae classes and different Ty3/Gypsy lineages. Here, we stress that the identification of the Retroviridae classes is not a conclusion but an assumption based on previous studies [6–10]. Notwithstanding, we cannot argue for the existence of a direct ancestor between each class and any particular Ty3/Gypsy lineage. Classes I and II are sufficiently similar to corroborate their accepted evolutionary relationship, and it can also be assumed that Ty3/Gypsy and Retroviridae phylogeny is incomplete (sequencing projects are continuously disclosing new lineages). Despite this, the similarity of each class by simple convergence to different Ty3/Gypsy lineages based on 3 independent protein products is an implausible parsimonious explanation. Moreover, while class III spumaretroviruses are dissimilar to classes I and II, our results reveal that they in turn display an intriguing domain similarity to errantiviruses that ought to be followed up. Hence we think that the class differentiation probably unravels certain aspects of vertebrate retroviruses related to their ancient Ty3/Gypsy origins. Instead of a single root to this new scenario, we show how an ancient evolutionary network between the 2 groups can exist, with its most interesting aspect being its polyphyly. (The Ty3/Gypsy lineages related to each class do not constitute a monophyletic branch in any phylogeny.)
Therefore, our approach strongly suggests that class I is a molecular fossil that emerged quite early in Ty3/Gypsy evolution, while classes II and III emerged later, together with the ancestors of Ty3/Gypsy LTR retroelements described in protostomes.
Introducing the Three Kings Hypothesis: A new principle for debate and further evaluation about the subject of the Ty3/Gypsy origins of vertebrate retroviruses
The evolutionary network identified by classes I, II, III is inconsistent with the idea of a unique Retroviridae ancestor. It follows that various scenarios may either support or disprove such a network. Assuming this network exists, the most likely scenario relates Ty3/Gypsy elements of plants and fungi with the Retroviridae class I. This scenario assumes the existence of a distant evolutionary relationship between the lineages or an ancient horizontal transfer of chromoviruses from fungi (or plants) to vertebrates. Indeed, chromoviruses are the most ancient lineage of Ty3/Gypsy LTR retrotransposons. They are rich in genetic variability, and are also present in the genome of many vertebrates [23, 32, 33]. In both cases, the most likely explanation for the relationship between class I and Athila/Tat retroviruses and retrotransposons of plants is that chromoviruses and class I are related, an argument suggested by a previous study. Nevertheless, chromoviruses of vertebrate organisms are usually more similar to their chromoviral counterparts of fungi than to those of plants. Therefore the chromoviral scenario does not explain why class I and Athila/Tat elements of plants are similar to each other based on gag. On the other hand, chromoviruses have not yet been described in protostomes, echinoderms and urochordates; furthermore, it remains unclear whether chromoviruses were inexorably driven to extinction in these organisms or were horizontally transmitted from plants/fungi to vertebrates. Consequently, the chromoviral scenario does not clarify why classes II and III and the Ty3/Gypsy lineages of protostomes share sequence similarities and phenotypic features rarely found among the Ty3/Gypsy lineages of plants and fungi. With this in mind, a new theoretical principle is posited here for debate and further research.
The simplest hypothesis is that classes I, II and III probably evolved from at least 3 Ty3/Gypsy ancestors and emerged at different evolutionary times prior to the split between protostomes and deuterostomes (the three kings hypothesis). Several points involved in the background of this hypothesis should be emphasized. First, we include the words "at least" to acknowledge the three classes but do not dismiss the possibility of more Ty3/Gypsy ancestors in the evolutionary history of the Retroviridae. Second, "different times of emergence" suggests, but does not necessarily mean, independent origins. Class II may in fact be directly related to class I, but the emergence of class II seems more recent and in parallel with the emergence of the ancestors of several Ty3/Gypsy lineages, such as the Micropia/Mdg3 clade (or others). Class III spumaretroviruses delineate an identical perspective with Ty3/Gypsy errantiviruses. Third, we use the term "polyphyletic" because the Ty3/Gypsy lineages related to each class do not constitute a monophyletic branch in any phylogeny. Moreover, viral evolution is always a polyphyletic challenge involving ecological parameters such as host populations, environment, vectors, mechanisms of transmission, etc.
The polyphyletic recurrence of vertebrate retroviruses into the evolutionary performance of Ty3/Gypsy LTR retroelements
We have described how the different gags, PRs and GPY/F modules evaluated show a variability that is preserved, depending on the Ty3/Gypsy lineage and Retroviridae class (or genus). While class I can be related to Ty3/Gypsy elements of plants and fungi, classes II and III preserve phenotypic features typically observed among Ty3/Gypsy elements of protostomes. That is the evolutionary perspective provided by the protein product of 3 independent coding regions. We have discussed this evidence but have not yet interpreted why the diversity and phylogeny of Ty3/Gypsy and Retroviridae LTR retroelements are so different regarding the different gag or pol substrates. In general, the action of viruses and mobile genetic elements is important in host evolution [16, 42–47] because they are vectors of evolution and potential inducers of diseases and genetic disorders, such as chromosome rearrangements and inversions. However, if the action of viruses and mobile genetic elements can influence host evolution, it is reasonable that host evolution could also constrain the evolution of these genetic agents. We thus speculate on the possibility of selective influences imposed on Retroviridae genes such as the rt, rnase h and int (and other regions) to optimize essential functions, such as retrotranscription and integration (according to the complexity of the new genome environment provided by vertebrate organisms). This probably involves gradual evolution but also a number of molecular mechanisms, such as gene recruitment and recombination, to generate variability and new effective genetic combinations. Here, it is important to keep in mind that, apart from gammaretroviruses and other exceptions, the Retroviridae usually incorporate accessory genes, usually needed to adjust diverse aspects of their replication and infectivity (these features appear to be specific to retroviruses infecting vertebrate organisms).
On the other hand, a prior study supports a putative chimeric origin of the Retroviridae RNAse H domain and the modular acquisition of the GPY/F module by Ty3/Gypsy and Retroviridae INTs. Moreover, D-type betaretroviruses are probably viral hybrids between a B-type betaretrovirus and a C-type gammaretrovirus [5, 17, 49]. Finally, a number of studies reveal how recombination is a mechanism frequently embraced by HIV evolution to generate variability. Two studies reveal, for instance, how recombination of M subtypes has resulted in the generation of multiple circulating recombinant forms consisting of mosaic HIV-1 lineages [50, 51].
Regarding coding regions such as gag, pr and the gpy/f module, we think that these traits reveal features and aspects involving different evolutionary strategies, but which are intrinsic and taxonomically related with ancient events of retroelement speciation and divergence. This argument finds an important evolutionary marker in the variability in the number of CCHC arrays at NC and the different PR and GPY/F module isoforms. Indeed, the CCHC array at NC is involved in virion assembly, RNA packaging, reverse transcription and integration processes. On the other hand, the flap lies over the PR active site and conveys specificity to the enzyme by carrying important substrate-binding functions (for more information on this topic, see [35, 53, 54]). Finally, while the GPY/F module is now under investigation, the C-terminal end of the INT appears to be important in the integration of the retroelement into the host genome [55, 56]. The variability of these three regions probably reveals different evolutionary strategies of speciation and divergence, which can be assumed to be older than previously supposed, since it occurs not only in the Retroviridae group but also in all Ty3/Gypsy LTR retroelements of plants, fungi and animals. Here, the three kings hypothesis and its testing (in one sense or another) do not affect the evidence we have presented. That is, classes I, II and III taxonomically code for 3 gag, PR and GPY/F products that have one or more distant counterparts among Ty3/Gypsy LTR retroelements. However, the most interesting aspect of the gag-PR-GPY/F variability is that it appears to be constrained by the bio-distribution of Ty3/Gypsy LTR retroelements. In turn, the diversity patterns of the Retroviridae based on these regions appear to be recurrent into the evolutionary performance of Ty3/Gypsy LTR retroelements, the most interesting aspect of which is that they seem polyphyletic.
Therefore the evolutionary network between Ty3/Gypsy and Retroviridae LTR retroelements is informative regarding an ancestral history, which is in some respects similar to those models of evolution indistinctly described by population genetics and quasi-species theory. This means that further analysis of the evolutionary network we disclose in this study calls for the involvement of different parameters such as bio-distribution, host populations, environment, vectors and mechanisms of transmission, etc. With this aim, our hypothesis makes possible a first evaluation of this new scenario, which we present in a forthcoming manuscript (submitted for publication). In this approach, we use the number of CCHC arrays at NC and the different PR and GPY/F module isoforms as evolutionary markers to trace the network. This is done by superimposing not only Ty3/Gypsy and Retroviridae LTR retroelements, but also other LTR retroelement groups over their host bio-distribution.
Retroviridae classes I, II and III exhibit phenotypic differences that delineate a network never before reported between Ty3/Gypsy and Retroviridae LTR retroelements. This new scenario reveals how the diversity of vertebrate retroviruses is polyphyletically recurrent into the Ty3/Gypsy evolution, i.e. older than previously thought. The simplest hypothesis to explain this finding is that classes I, II and III trace back to at least 3 Ty3/Gypsy ancestors that emerged at different evolutionary times prior to protostomes-deuterostomes divergence. We have called this "the three kings hypothesis" concerning the origin of vertebrate retroviruses.
Sequences and databases
This work is part of the GyDB Project, an ongoing database launched with the aim of phylogenetically analyzing and classifying mobile genetic elements based on their diversity and evolutionary profile. In the first iteration, we consider the Ty3/Gypsy and Retroviridae LTR retroelements of eukaryotes. We have investigated 120 non-redundant full-length Ty3/Gypsy and Retroviridae genomes collected from NCBI. An extended version of the gag-pol tree, summarizing names, taxonomy, hosts, and Genbank accessions of all retroelement taxa used to perform this analysis, is available online as Additional file 1 accompanying this paper. By clicking the name of each OTU in this tree, the user can browse the GyDB and locate a file providing information on the OTU selected, including a link to the Genbank accession of the requested element at NCBI. The gag-pol tree can also be found online in the Section Phylogenies at GyDB.
Multiple alignments and comparative analyses
In general, all Ty3/Gypsy and Retroviridae LTR retroelements have 2 polyproteins in common – gag and pol. Gag is composed of 3 domains (MA, CA and NC), and pol usually carries 4 domains (PR, RT, RNAse H and INT). Note, however, that PR can be coded separately or in frame with gag and other protein domains. We have used and analyzed a gag-pol multiple alignment ~1700 residues in size, constructed by concatenating the CA, NC, PR, RT, RNAseH and INT cores. The gag-pol alignment is freely accessible within the GyDB collection deposited at Biotechvana Bioinformatics. The alignment is available online in 6 formats. We have also analyzed the gag and pol polyproteins separately, dividing the gag-pol alignment into 2 independent alignments, CA-NC and PR-RT-RNAseH-INT, to perform phylogenetic or comparative analyses.
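The concatenation and splitting described above can be sketched as follows. This is only an illustration under stated assumptions: each domain core is held as a dictionary mapping OTU name to an aligned sequence, and the function name, constants and toy data are hypothetical rather than part of the GyDB files.

```python
# Domain order follows the text: CA-NC for gag, PR-RT-RNAseH-INT for pol.
GAG_DOMAINS = ["CA", "NC"]
POL_DOMAINS = ["PR", "RT", "RNAseH", "INT"]


def concatenate_domains(domain_alignments, domains):
    """Join the aligned cores of the given domains, per OTU, in the given order."""
    otus = sorted(domain_alignments[domains[0]])
    for domain in domains:
        block = domain_alignments[domain]
        if sorted(block) != otus:
            raise ValueError(f"{domain}: OTU set differs from {domains[0]}")
        if len({len(seq) for seq in block.values()}) != 1:
            raise ValueError(f"{domain}: sequences are not aligned to equal length")
    return {otu: "".join(domain_alignments[d][otu] for d in domains) for otu in otus}


# Hypothetical toy cores for two OTUs (not real sequences).
toy = {
    "CA": {"otu1": "AK-L", "otu2": "AKML"},
    "NC": {"otu1": "CC", "otu2": "CH"},
    "PR": {"otu1": "DTG", "otu2": "DSG"},
    "RT": {"otu1": "YMDD", "otu2": "YVDD"},
    "RNAseH": {"otu1": "DE", "otu2": "DE"},
    "INT": {"otu1": "HH", "otu2": "HH"},
}

gag = concatenate_domains(toy, GAG_DOMAINS)
pol = concatenate_domains(toy, POL_DOMAINS)
gag_pol = concatenate_domains(toy, GAG_DOMAINS + POL_DOMAINS)
print(gag_pol)
```

Keeping the per-domain blocks separate and concatenating on demand makes it straightforward to rebuild the gag, pol and gag-pol alignments from the same source.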
Alignments were compared using the GENEDOC editor in shaded mode and the following groups of amino acid similarity: [T,S small nucleophile amino acids], [K,R,H basic amino acids], [D,E,N,Q acidic amino acids and relative amides], and [L,I,V,M,A,G,P,F,Y,W hydrophobic amino acids]. Similarities between gag sequences were correlated using different gag queries to the CORES database available via the NCBI BLAST search at GyDB, using BLASTp search mode. BLAST databases available at GyDB are non-redundant, small and include only Ty3/Gypsy and Retroviridae or related sequences, allowing flexible comparisons between both distantly and closely related sequences with homologous known functions.
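To make the similarity groups concrete, the following sketch scores two aligned sequences by counting positions that are identical or fall in the same group. It does not reproduce GENEDOC's shading algorithm; the function name and the scoring rule (skipping gapped positions, treating ungrouped residues as similar only when identical) are assumptions made for illustration.

```python
# The four similarity groups listed in the text.
SIMILARITY_GROUPS = {
    "small_nucleophile": "TS",
    "basic": "KRH",
    "acidic_and_amides": "DENQ",
    "hydrophobic": "LIVMAGPFYW",
}
RESIDUE_TO_GROUP = {
    residue: name
    for name, members in SIMILARITY_GROUPS.items()
    for residue in members
}


def pair_similarity(seq_a: str, seq_b: str) -> float:
    """Fraction of ungapped aligned positions that are identical or share a group."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = similar = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped positions are not scored
        compared += 1
        group_a = RESIDUE_TO_GROUP.get(a)
        if a == b or (group_a is not None and group_a == RESIDUE_TO_GROUP.get(b)):
            similar += 1
    return similar / compared if compared else 0.0


print(pair_similarity("TKDLC", "SRELC"))
print(pair_similarity("TK-L", "DKAL"))
```

Note that cysteine belongs to none of the four groups, so under this rule a C is counted as similar only to another C.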
Comparative analyses based on sequence logos involved CheckAlign 1.0 in Shannon's algorithm mode and correction factor. Sequence logo methodology was originally introduced by Schneider et al. [65, 66] to display consensus sequences for DNA and protein alignments. Later, Schneider dismissed the term "consensus", arguing that a logo provides more information than the consensus sequence of a protein or DNA alignment. While this can be controversial because there are many ways to obtain or describe a consensus sequence, logo methodology being one of them, we agree with the original author's proposal to use the term "sequence logo", as suggested on his website. We employ the term "sequence logo" to describe the output reported by this analysis, and refer to the protein information underlying the content and shape of the logos constructed from our alignments as "amino acidic architecture". This term may be useful to describe consensus, core and amino acid patterns with a single expression. CheckAlign directly builds the logo from an ungapped alignment using the conventional methodology [65, 66]. Here, the maximum uncertainty by position in a protein alignment is log2 20 = 4.3. In the case of gapped alignments, CheckAlign automatically builds the logo, taking the gap as another amino acid species. Here, the tool considers the maximum uncertainty by position to be log2 21 = 4.4 for protein alignments.
Phylogenetic reconstructions of Ty3/Gypsy and Retroviridae LTR retroelements inferred from gag-pol, pol and gag alignments employed the PHYLIP 3.6 package. We first generated 100 bootstrap replicates of each alignment using SEQBOOT. Second, we used the protein sequence parsimony method of Felsenstein, based on the approaches of Eck and Dayhoff and of Fitch, to perform the analyses. Here, the bootstrap file was used as an input to PROTPARS and the input randomized using the following parameters: random number seed = 5 and number of times to jumble = 5. Third, CONSENSE was used to obtain an MRC tree using the tree file generated by PROTPARS as an input. As the MRC tree usually consists of all clusters that occur >50% of the time, we took consensus values >55 as a bootstrap reference. Bootstrap values were used to scale the trees.
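The per-position quantities quoted above follow from Shannon's formula: the information at a column is the maximum uncertainty (log2 of the alphabet size) minus the observed entropy. The sketch below is a minimal illustration, not CheckAlign's implementation; the function name is hypothetical and the small-sample correction factor mentioned in the text is deliberately omitted.

```python
import math
from collections import Counter


def column_information(column: str, count_gaps: bool = False) -> float:
    """Information (bits) of one alignment column: log2(N) - H.

    N is 20 for ungapped protein columns, or 21 when the gap is treated
    as an additional symbol. No small-sample correction is applied.
    """
    symbols = [c for c in column.upper() if count_gaps or c != "-"]
    if not symbols:
        return 0.0
    alphabet_size = 21 if count_gaps else 20
    total = len(symbols)
    entropy = -sum(
        (k / total) * math.log2(k / total) for k in Counter(symbols).values()
    )
    return math.log2(alphabet_size) - entropy


# A fully conserved column reaches the maximum stated in the text.
print(round(column_information("CCCC"), 1))
print(round(column_information("CCCC", count_gaps=True), 1))
# Two equally frequent residues cost exactly one bit.
print(round(column_information("AACC"), 2))
```

A fully conserved column therefore reaches about 4.3 bits without gaps and about 4.4 bits when the gap is counted as a 21st symbol, matching the maxima given above.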
We also tested the NJ method [76] using the different distance models implemented in PROTDIST. Here, it is important to keep in mind that the overall efficiency of the different methods of phylogenetic reconstruction in recovering the true tree varies with substitution rate, transition-transversion ratio, and sequence divergence [77, 78]. With the particular material we studied, parsimony and NJ trees support the clustering of OTUs into clades and genera in gag-pol and pol analyses, and they are consistent in not supporting the monophyly of each group in gag analyses. However, parsimony phylogenies proved more consistent with comparative analyses than NJ trees when the phylogenies included or evaluated the gag and/or PR proteins. Parsimony analyses also gave better bootstrap support and were more consistent with the three Retroviridae classes than NJ analyses (NJ trees only support classes I and II).
Acquired Immune Deficiency Syndrome
Bovine Leukemia Virus
Caprine Arthritis Encephalitis Virus
Equine Foamy Virus
Feline Leukemia Virus
Feline Foamy Virus
Feline Syncytial Virus
International Committee on Taxonomy of Viruses
Human Immunodeficiency Virus
Human Foamy Virus
Human T-cell Leukemia Virus
Hidden Markov Model (HMM)
Long terminal repeat
Lymphoproliferative Disease Virus
Major homology region
Maedi Visna Virus
Mason-Pfizer Monkey Virus
Mouse Mammary Tumor Virus
National Center of Biotechnology Information
Operative taxonomical unit
Ovine Maedi Visna Virus
Research Collaboratory for Structural Bioinformatics
Ribonuclease H (RNAse H)
Rous Sarcoma Virus
Servei Central de Suport a la Investigació Experimental
Simian Endogenous Retrovirus of Mandrill
Simian Immunodeficiency Retrovirus of Macaques
Simian Foamy Virus (SFV)
Three-dimensional structure (3D structure)
We thank Javier Ortiz and Isaac Fernandez of the SCSIE at University of Valencia for technical support, and the two anonymous referees for their useful comments, which improved the original manuscript. The GyDB project was awarded the NOVA 2006 by IMPIVA and Conselleria d'Empresa, Universitat i Ciència of Valencia. The research has been partly supported by grants IMCBTA/2005/45, IMIDTD/2006/158 and IMIDTD/2007/33 from IMPIVA, by grant BFU2005-00503 from MEC to AM, and by financial grant 17092008 from ENISA (Empresa Nacional de Innovación SA) to Biotechvana. Funding to pay the Open Access publication charges for this article was provided by University of Valencia.
- Poiesz BJ, Ruscetti FW, Gazdar AF, Bunn PA, Minna JD, Gallo RC: Detection and isolation of type C retrovirus particles from fresh and cultured lymphocytes of a patient with cutaneous T-cell lymphoma. Proc Natl Acad Sci USA. 1980, 77: 7415-7419. 10.1073/pnas.77.12.7415.
- Yoshida M, Miyoshi I, Hinuma Y: Isolation and characterization of retrovirus from cell lines of human adult T-cell leukemia and its implication in the disease. Proc Natl Acad Sci USA. 1982, 79: 2031-2035. 10.1073/pnas.79.6.2031.
- Barre-Sinoussi F, Chermann JC, Rey F, Nugeyre MT, Chamaret S, Gruest J, Dauguet C, Axler-Blin C, Vezinet-Brun F, Rouzioux C, et al: Isolation of a T-lymphotropic retrovirus from a patient at risk for acquired immune deficiency syndrome (AIDS). Science. 1983, 220: 868-871. 10.1126/science.6189183.
- Gallo RC, Salahuddin SZ, Popovic M, Shearer GM, Kaplan M, Haynes BF, Palker TJ, Redfield R, Oleske J, Safai B: Frequent detection and isolation of cytopathic retroviruses (HTLV-III) from patients with AIDS and at risk for AIDS. Science. 1984, 224: 500-503. 10.1126/science.6200936.
- Van Regenmortel MHV, Fauquet CM, Bishop DHL, Carstens EB, Estes MK, Lemon SM, Maniloff J, Mayo MA, McGeoch DJ, Pringle CR, Wickner RB: Virus Taxonomy: the classification and nomenclature of viruses. 2000, San Diego, California
- International Human Genome Consortium: Initial sequencing and analysis of the human genome. Nature. 2001, 409: 860-921. 10.1038/35057062.
- International Human Genome Consortium: Initial sequencing and analysis of the human genome. Nature. 2002, 420: 520-562. 10.1038/nature01262.
- Wilkinson DA, Mager DL, Leong JA: Endogenous Human Retroviruses. The Retroviridae. Edited by: Levy JA. 1994, New York, N.Y.: Plenum Press, Inc, II: 465-535.
- Gifford R, Tristem M: The evolution, distribution and diversity of endogenous retroviruses. Virus Genes. 2003, 26: 291-315. 10.1023/A:1024455415443.
- Gifford R, Kabat P, Martin J, Lynch C, Tristem M: Evolution and distribution of class II-related endogenous retroviruses. J Virol. 2005, 79: 6478-6486. 10.1128/JVI.79.10.6478-6486.2005.
- Eickbush TH, Malik HS: Origin and Evolution of retrotransposons. Mobile DNA II. Edited by: Craig NL, Craigie R, Gellert M, Lambowitz AM. 2002, Washington DC.: ASM Press, 1111-1144.
- Xiong Y, Eickbush TH: Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J. 1990, 9: 3353-3362.
- Marin I, Llorens C: Ty3/Gypsy retrotransposons: description of new Arabidopsis thaliana elements and evolutionary perspectives derived from comparative genomic data. Mol Biol Evol. 2000, 17: 1040-1049.
- Malik HS, Eickbush TH: Modular evolution of the integrase domain in the Ty3/Gypsy class of LTR retrotransposons. J Virol. 1999, 73: 5186-5190.
- Malik HS, Eickbush TH: Phylogenetic analysis of ribonuclease H domains suggests a late, chimeric origin of LTR retrotransposable elements and retroviruses. Genome Res. 2001, 11: 1187-1197. 10.1101/gr.185101.
- Llorens C, Marin I: A mammalian gene evolved from the integrase domain of an LTR retrotransposon. Mol Biol Evol. 2001, 18: 1597-1600.
- Llorens C, Futami R, Bezemer D, Moya A: The Gypsy Database (GyDB) of Mobile Genetic Elements. Nucleic Acids Research (NAR). 2008, 36: 38-46. 10.1093/nar/gkm697.
- Eddy SR: Profile hidden Markov models. Bioinformatics. 1998, 14: 755-763. 10.1093/bioinformatics/14.9.755.
- Wright DA, Voytas DF: Potential retroviruses in plants: Tat1 is related to a group of Arabidopsis thaliana Ty3/gypsy retrotransposons that encode envelope-like proteins. Genetics. 1998, 149: 703-715.
- Wright DA, Voytas DF: Athila4 of Arabidopsis and Calypso of soybean define a lineage of endogenous plant retroviruses. Genome Res. 2002, 12: 122-131. 10.1101/gr.196001.
- Bae YA, Moon SY, Kong Y, Cho SY, Rhyu MG: CsRn1, a novel active retrotransposon in a parasitic trematode, Clonorchis sinensis, discloses a new phylogenetic clade of Ty3/gypsy-like LTR retrotransposons. Mol Biol Evol. 2001, 18: 1474-1483.
- Bowen NJ, McDonald JF: Genomic analysis of Caenorhabditis elegans reveals ancient families of retroviral-like elements. Genome Research. 1999, 9: 924-935. 10.1101/gr.9.10.924.
- Gorinsek B, Gubensek F, Kordis D: Evolutionary genomics of chromoviruses in eukaryotes. Mol Biol Evol. 2004, 21: 781-798. 10.1093/molbev/msh057.
- Britten RJ: Active gypsy/Ty3 retrotransposons or retroviruses in Caenorhabditis elegans. Proc Natl Acad Sci USA. 1995, 92: 599-601. 10.1073/pnas.92.2.599.
- Ganko EW, Fielman KT, McDonald JF: Evolutionary History of Cer Elements and Their Impact on the C. elegans genome. Genome Res. 2001, 11: 2066-2074. 10.1101/gr.196201.
- Pringle CR: Virus taxonomy, The Universal System of Virus Taxonomy, updated to include the new proposals ratified by the International Committee on Taxonomy of Viruses during 1998. Archives of Virology. 1999, 144: 421-429. 10.1007/s007050050515.
- Boeke JD, Eickbush TH, Sandmeyer SB, Voytas DF: Metaviridae. Virus Taxonomy: ICTV VIIth report. 1999, Springer-Verlag, New York
- Hull R: Classification of reverse transcribing elements: a discussion document. Archives of Virology. 1999, 144: 209-214. 10.1007/s007050050498.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
- Nakayashiki H, Matsuo H, Chuma I, Ikeda K, Betsuyaku S, Kusaba M, Tosa Y, Mayama S: Pyret, a Ty3/Gypsy retrotransposon in Magnaporthe grisea contains an extra domain between the nucleocapsid and protease domains. Nucleic Acids Res. 2001, 29: 4106-4113. 10.1093/nar/29.20.4106.
- Green LM, Berg JM: A retroviral Cys-Xaa2-Cys-Xaa4-His-Xaa4-Cys peptide binds metal ions: spectroscopic studies and a proposed three-dimensional structure. Proc Natl Acad Sci USA. 1989, 86: 4047-4051. 10.1073/pnas.86.11.4047.
- Gorinsek B, Gubensek F, Kordis D: Phylogenomic analysis of chromoviruses. Cytogenet Genome Res. 2005, 110: 543-552. 10.1159/000084987.
- Kordis D: A genomic perspective on the chromodomain-containing retrotransposons: Chromoviruses. Gene. 2005, 347: 161-173. 10.1016/j.gene.2004.12.017.
- Rawlings ND, Tolle DP, Barrett AJ: MEROPS: the peptidase database. Nucleic Acids Research. 2004, 32: D160-D164. 10.1093/nar/gkh071.
- Wlodawer A, Gustchina A: Structural and biochemical studies of retroviral proteases. Biochim Biophys Acta. 2000, 1477: 16-34.
- Butler M, Goodwin T, Poulter R: An unusual vertebrate LTR retrotransposon from the cod Gadus morhua. Mol Biol Evol. 2001, 18: 443-447.
- Goodwin TJ, Poulter RT: A group of deuterostome Ty3/gypsy-like retrotransposons with Ty1/copia-like pol-domain orders. Mol Genet Genomics. 2002, 267: 481-491. 10.1007/s00438-002-0679-0.
- Lodi PJ, Ernst JA, Kuszewski J, Hickman AB, Engelman A, Craigie R, Clore GM, Gronenborn AM: Solution structure of the DNA binding domain of HIV-1 integrase. Biochemistry. 1995, 34: 9826-9833. 10.1021/bi00031a002.
- Polard P, Chandler M: Bacterial transposases and retroviral integrases. Mol Microbiol. 1995, 15: 13-23. 10.1111/j.1365-2958.1995.tb02217.x.
- Khan E, Mack JP, Katz RA, Kulkosky J, Skalka AM: Retroviral integrase domains: DNA binding and the recognition of LTR sequences. Nucleic Acids Res. 1991, 19: 851-860. 10.1093/nar/19.4.851.
- Eickbush TH: Origin and evolutionary relationships of LTR retroelements. The evolutionary Biology of viruses. Edited by: Morse SS. 1994, New York: Raven, 121-157.
- Lynch M, Conery JS: The origins of genome complexity. Science. 2003, 302: 1401-1404. 10.1126/science.1089370.
- Ganko EW, Bhattacharjee V, Schliekelman P, McDonald JF: Evidence for the contribution of LTR retrotransposons to C. elegans gene evolution. Mol Biol Evol. 2003, 20: 1925-1931. 10.1093/molbev/msg200.
- Brandt J, Schrauth S, Veith AM, Froschauer A, Haneke T, Schultheis C, Gessler M, Leimeister C, Volff JN: Transposable elements as a source of genetic innovation: expression and evolution of a family of retrotransposon-derived neogenes in mammals. Gene. 2005, 345: 101-111. 10.1016/j.gene.2004.11.022.
- Jurka J, Kapitonov VV, Kohany O, Jurka MV: Repetitive sequences in complex genomes: structure and evolution. Annu Rev Genomics Hum Genet. 2007, 8: 241-259. 10.1146/annurev.genom.8.080706.092416.
- Volff JN: Turning junk into gold: domestication of transposable elements and the creation of new genes in eukaryotes. Bioessays. 2006, 28: 913-922. 10.1002/bies.20452.
- Kazazian HH: Mobile elements: drivers of genome evolution. Science. 2004, 303: 1626-1632. 10.1126/science.1089670.
- Hurst GDD, Schilthuizen M: Selfish genetic elements and speciation. Heredity. 1998, 80: 2-8. 10.1046/j.1365-2540.1998.00337.x.
- Sonigo P, Barker C, Hunter E, Wain-Hobson S: Nucleotide sequence of Mason-Pfizer monkey virus: an immunosuppressive D-type retrovirus. Cell. 1986, 45: 375-385. 10.1016/0092-8674(86)90323-5.
- Perrin L, Kaiser L, Yerly S: Travel and the spread of HIV-1 genetic variants. Lancet Infect Dis. 2003, 3: 22-27. 10.1016/S1473-3099(03)00484-5.
- Rambaut A, Posada D, Crandall KA, Holmes EC: The causes and consequences of HIV evolution. Nat Rev Genet. 2004, 5: 52-61. 10.1038/nrg1246.
- Berkhout B, Gorelick R, Summers MF, Mely Y, Darlix J: 6th International Symposium on Retroviral Nucleocapsid. Retrovirology. 2008, 5: 21. 10.1186/1742-4690-5-21.
- Cascella M, Micheletti C, Rothlisberger U, Carloni P: Evolutionarily conserved functional mechanics across pepsin-like and retroviral aspartic proteases. J Am Chem Soc. 2005, 127: 3734-3742. 10.1021/ja044608+.
- Hornak V, Okur A, Rizzo RC, Simmerling C: HIV-1 protease flaps spontaneously close to the correct structure in simulations following manual placement of an inhibitor into the open state. J Am Chem Soc. 2006, 128: 2812-2813. 10.1021/ja058211x.
- Wright DA, Townsend JA, Winfrey RJ, Irwin PA, Rajagopal J, Lonosky M, Hall BD, Jondle MD, Voytas DF: High-frequency homologous recombination in plants mediated by zinc-finger nucleases. Plant J. 2005, 44: 693-705. 10.1111/j.1365-313X.2005.02551.x.
- Singleton TL, Levin HL: A long terminal repeat retrotransposon of fission yeast has strong preferences for specific sites of insertion. Eukaryot Cell. 2002, 1: 44-55. 10.1128/EC.01.1.44-55.2002.
- Wilke CO: Quasispecies theory in the context of population genetics. BMC Evolutionary Biology. 2005, 5: 44. 10.1186/1471-2148-5-44.
- National Center of Biotechnology Information. [http://www.ncbi.nlm.nih.gov]
- Gag-pol tree. [http://gydb.uv.es/gydb/phylogeny.php?tree=gagpol]
- Llorens C, Futami R, Moya A: The GyDB collection: Ty3/Gypsy and Retroviridae LTR retroelements and related nonviral proteins. Biotechvana Bioinformatics. 2008, CR: GyDB Collection
- Gag-pol multiple alignment URL. [http://gydb.uv.es/biotechvana/collection/alignment.php?alignment=GAGPOL_retroelement&format=htm]
- Genedoc. [http://www.nrbsc.org/gfx/genedoc/index.html]
- Llorens C, Futami R, Vicente-Ripolles M, Moya A: The CheckAlign logo-maker application in analyses of both gapped and ungapped DNA and protein alignments. Biotechvana Bioinformatics. 2008, SOFT: CheckAlign
- Shannon CE: The mathematical theory of communication. 1963. MD Comput. 1997, 14: 306-317.
- Schneider TD, Stephens RM: Sequence Logos – A New Way to Display Consensus Sequences. Nucleic Acids Research. 1990, 18: 6097-6100. 10.1093/nar/18.20.6097.
- Schneider TD, Stormo GD, Gold L, Ehrenfeucht A: Information content of binding sites on nucleotide sequences. J Mol Biol. 1986, 188: 415-431. 10.1016/0022-2836(86)90165-8.
- Schneider TD: Consensus sequence Zen. Appl Bioinformatics. 2002, 1: 111-119.
- Tom Schneider Web site. [http://www-lecb.ncifcrf.gov/~toms/]
- Louis JM, Dyda F, Nashed NT, Kimmel AR, Davies DR: Hydrophilic peptides derived from the transframe region of Gag-Pol inhibit the HIV-1 protease. Biochemistry. 1998, 37: 2105-2110. 10.1021/bi972059x.
- Schwede T, Kopp J, Guex N, Peitsch MC: SWISS-MODEL: An automated protein homology-modeling server. Nucleic Acids Res. 2003, 31: 3381-3385. 10.1093/nar/gkg520.
- RCSB Protein Data Bank. [http://www.rcsb.org/pdb/home/home.do]
- PHYLIP package of programs for inferring phylogenies. Version 3.6a3. [http://evolution.genetics.washington.edu/phylip.html]
- Eck RV, Dayhoff MO: Atlas of Protein Sequence and Structure. 1966, National Biomedical Research Foundation, Silver Spring, Maryland
- Fitch WM: Toward Defining the Course of Evolution: Minimum Change for a Specific Tree Topology. Systematic Zoology. 1971, 20: 406-416. 10.2307/2412116.
- Margush T, McMorris FR: Consensus n-trees. Bull Math Biol. 1981, 43: 239-244.
- Saitou N, Nei M: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987, 4: 406-425.
- Miyamoto MM, Cracraft JL: Phylogenetic inference, DNA sequence analysis, and the future of molecular systematics. 1991, Oxford University Press, Oxford, England
- Nei M, Kumar S: Molecular evolution and phylogenetics. 2000, Oxford University Press, Oxford, England
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.