Relationships of gag-pol diversity between Ty3/Gypsy and Retroviridae LTR retroelements and the three kings hypothesis
© Llorens et al. 2008
Received: 23 March 2008
Accepted: 08 October 2008
Published: 08 October 2008
The origin of vertebrate retroviruses (Retroviridae) is yet to be thoroughly investigated, but due to their similarity and identical gag-pol (and env) genome structure, it is accepted that they evolved from Ty3/Gypsy LTR retroelements, the retrotransposons and retroviruses of plants, fungi and animals. These 2 groups of LTR retroelements code for 3 proteins rarely studied due to their high variability – the gag polyprotein, the protease and the GPY/F module. In relation to the 3 previously proposed Retroviridae classes I, II and III, investigation of the above proteins conclusively uncovers important insights regarding the ancient history of Ty3/Gypsy and Retroviridae LTR retroelements.
We performed a comprehensive study of 120 non-redundant Ty3/Gypsy and Retroviridae LTR retroelements. Phylogenetic reconstruction based on the concatenated analysis of the gag and pol polyproteins shows a robust phylogenetic signal regarding the clustering of OTUs. Evaluation of the gag and pol polyproteins separately yields discordant information. While the pol signal supports the traditional perspective (2 monophyletic groups), the gag polyprotein describes an alternative scenario where each Retroviridae class can be distantly related to one or more Ty3/Gypsy lineages. We investigated this evidence in more depth through comparative analyses based on the gag polyprotein, the protease and the GPY/F module. Our results indicate that, contrary to the traditional monophyletic view of the origin of vertebrate retroviruses, the Retroviridae class I is a molecular fossil, preserving features that were probably predominant among Ty3/Gypsy ancestors predating the split of plants, fungi and animals. In contrast, classes II and III maintain other phenotypes that emerged more recently during Ty3/Gypsy evolution.
The 3 Retroviridae classes I, II and III exhibit phenotypic differences that delineate a network never before reported between Ty3/Gypsy and Retroviridae LTR retroelements. This new scenario reveals how the diversity of vertebrate retroviruses is polyphyletically recurrent in Ty3/Gypsy evolution, i.e. older than previously thought. The simplest hypothesis to explain this finding is that classes I, II and III trace back to at least 3 Ty3/Gypsy ancestors that emerged at different evolutionary times prior to the protostome-deuterostome divergence. We have called this "the three kings hypothesis" concerning the origin of vertebrate retroviruses.
Attention was first drawn to the Retroviridae when HTLV-1 was characterized as pathogenic in humans [1, 2]. The family further increased in significance with the discovery of HIV-1, the retrovirus responsible for AIDS in humans [3, 4]. These 2 retroviruses represent only a small part of Retroviridae diversity, which can be divided into seven genera: Alpha-, Beta-, Gamma-, Delta-, Epsilon-, Spumaretroviridae and Lentiviridae (according to the ICTV classification). Based on their strategy of transmission, the Retroviridae can also be classified as endogenous retroviruses, when they enter the germ lines of hosts and are vertically transmitted, or as exogenous retroviruses, when they can be transmitted horizontally from one host to another via infection. The most recent trends in Retroviridae taxonomy [6–10] group endogenous and exogenous retroviruses into 3 major classes designated I, II and III. Both classifications are complementary, as class I comprises gamma- and epsilonretroviruses; class II includes lentiviruses, delta-, alpha- and betaretroviruses; and class III groups spumaretroviruses with ERV-L retroelements. The ancient history of the Retroviridae is yet to be thoroughly investigated, but due to their similarity and identical gag-pol (and env) genome structure, it is usually assumed that they evolved from the Ty3/Gypsy LTR retroelements of plants, fungi and animals. The traditional view, suggested by the pol polyprotein domains used to resolve the phylogeny, such as the RT [12–14], RNAse H [14, 15] and INT [14, 16], delineates a common Ty3/Gypsy origin for all vertebrate retroviruses. Nevertheless, little is known about this scenario because RT, RNAse H and INT analyses appear unable to agree on a precise, well-supported Ty3/Gypsy root for the Retroviridae. In an attempt to shed light on this topic, we investigated 120 non-redundant Ty3/Gypsy and Retroviridae taxa based on the phylogenetic analysis of both the gag and pol polyproteins.
Our results revealed conflicting phylogenetic signals between these 2 polyproteins. From that point, we aimed to investigate this evidence in more depth through comparative analyses based on 3 independent proteins rarely considered by prior studies due to their variability – the gag polyprotein, the PR and the GPY/F module. Our study reveals taxonomic differences among the 3 Retroviridae classes, and an evolutionary network that distantly relates each class to one or more Ty3/Gypsy lineages. This observation appears to be at odds with the traditional monophyletic view suggested by prior approaches to determining the origin of vertebrate retroviruses, but requires further study. In light of this new perspective, we introduce here a new hypothesis for debate and further evaluation. Our hypothesis argues that classes I, II and III probably trace back to at least 3 independent Ty3/Gypsy ancestors. We call this the three kings hypothesis.
Table: Hits of BLASTp similarity between Micropia/Mdg3 and other Ty3/Gypsy and Retroviridae gags (queries: Micropia gag; Mdg3 gag).
Table: Hits of BLASTp similarity between Tat and other Ty3/Gypsy and Retroviridae gags (queries: Retrosor1 gag; Tat4-1 gag).
Comparative analyses confirm phenotypic features in the gag polyprotein that distantly relate each Retroviridae class to one or more of the Ty3/Gypsy lineages evaluated. The similarity spans the CA-NC core, and the most prominent feature in common is the variability in the number of CCHC arrays per NC. With very few exceptions, the Athila/Tat elements of plants usually code for NCs exhibiting one CCHC array, Micropia/Mdg3 and Mag elements code for NCs usually exhibiting 2 arrays (except Mag elements of C. elegans), and errantiviral gags have no CCHC arrays at their C-terminus. This indicates that the number of CCHC arrays per NC is evolutionarily preserved depending on the Ty3/Gypsy lineage and the Retroviridae class, and that this phenotype is an excellent indicator of taxonomy and evolution. For simplicity's sake, we do not discuss all Ty3/Gypsy cases. We discuss only one example, the most interesting instance of using this indicator: the chromodomain-containing Ty3/Gypsy LTR retrotransposons called chromoviruses. Chromoviruses are the most ancient branch of Ty3/Gypsy LTR retroelements, as they have been described in the genomes of plants, fungi and vertebrates (for more extensive information about chromoviruses, see [23, 32, 33]). It is noteworthy that all Ty3/Gypsy LTR retroelements of plants can be divided into 2 major branches – chromoviruses and Athila/Tat – and that chromoviruses appear to be the only branch of Ty3/Gypsy LTR retroelements capable of colonizing the genomes of fungi. A prior study reported that this branch of Ty3/Gypsy LTR retroelements displays similarity (which we confirm) to gammaretroviruses based on CA-NC. However, we have also found that chromoviruses show similarities to class II in addition to a number of Ty3/Gypsy lineages (for this reason chromoviruses fall at an intermediate position in the gag phylogeny). With rare exceptions, NCs coded by chromoviruses usually bear one CCHC array (data not shown).
In contrast, the different Ty3/Gypsy lineages described in bilaterian organisms show greater variability in the number of CCHC arrays per NC than their Ty3/Gypsy counterparts of plants and fungi (i.e. chromoviruses and the Athila/Tat branch). Gag evidence thus relates class I to the most likely CA-NC phenotype of Ty3/Gypsy ancestors predating the split between plants and the opisthokonts (fungi and animals), and classes II and III to other CA-NC phenotypes, more frequently observed among the Ty3/Gypsy LTR retroelements of protostomes and deuterostomes.
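The count of CCHC arrays per NC discussed above can be illustrated with a minimal sketch. It assumes the classical retroviral zinc-knuckle spacing C-X2-C-X4-H-X4-C; Ty3/Gypsy lineages may use other spacings, so the pattern, the function name and the toy sequences below are hypothetical illustrations and not the procedure used in this study.

```python
import re

# Assumption: classical retroviral zinc-knuckle spacing C-X2-C-X4-H-X4-C.
# Lineage-specific spacer variants exist, so this pattern is illustrative only.
CCHC_PATTERN = re.compile(r"C.{2}C.{4}H.{4}C")


def count_cchc_arrays(nc_sequence: str) -> int:
    """Count non-overlapping CCHC arrays in a nucleocapsid (NC) sequence."""
    return len(CCHC_PATTERN.findall(nc_sequence.upper()))


# Hypothetical toy sequences (not real NC sequences).
one_array = "MKRG" + "CFNCGKPGHIARDC" + "PKK"
two_arrays = "MK" + "CYNCGKEGHMAKNC" + "RAPRKKG" + "CWKCGKEGHQMKDC" + "TE"
no_array = "MKRGSSPLAAGTRK"

for name, seq in [("one", one_array), ("two", two_arrays), ("none", no_array)]:
    print(name, count_cchc_arrays(seq))
```

In practice the pattern would need to be relaxed or replaced by a profile-based search to capture divergent arrays.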
Through phylogenetic analyses, we have shown that the pol signal is primarily responsible for the branching of Ty3/Gypsy and Retroviridae LTR retroelements into 2 monophyletic groups. That is the usual evolutionary perspective based on the RT and other pol polyprotein domains. We have also shown that the gag signal discloses an alternative scenario wherein each Retroviridae class can be related to one or more Ty3/Gypsy lineages. An in-depth examination of gag diversity through comparative analyses has revealed the phenotypic variations involved in this differential similarity. Gag evidence is thus well supported. An interesting question is whether this evidence should be considered a convergence due to the fast rate of evolution of the gag polyprotein, or whether it is due to an ancient divergence. Certainly, the most robust components of the pol polyprotein – the RT, RNAse H and INT – usually support the traditional perspective originally delineated by RT analyses. However, the strong signal from these 3 proteins disguises the particular perspective provided by another pol protein domain – the PR. Non-redundant studies focusing on Ty3/Gypsy and Retroviridae PRs are rarely reported, as this enzyme presents the same analytical difficulty as gag due to its fast rate of evolution. Despite this, it is well known that LTR retroelement PRs in general are aspartic peptidases belonging to clan AA (following the MEROPS Database classification). Within clan AA, Retroviridae PRs are divided into 2 protein families, retropepsins (family A2) and spumaretropepsins (family A9). Family A2 groups all PRs coded by classes I and II, and family A9 collects the PRs coded by spumaretroviruses (class III). This classification persists because retropepsins and spumaretropepsins are strongly dissimilar to each other and do not group on a single branch in any analysis (data not shown). On the other hand, Ty3/Gypsy PRs are extremely variable and little is known about them.
The MEROPS Database does classify many Ty3/Gypsy examples within family A2 because these PRs display great similarity to retropepsins. However, not all Ty3/Gypsy PRs are similar to retropepsins, just as not all Retroviridae PRs are retropepsins. Because no study evaluates the relationships between Ty3/Gypsy and Retroviridae PRs, we investigated this topic, taking into consideration the differentiation of the 2 groups of LTR retroelements into lineages. It is worth remembering that while the gag and pol signals disagree over the taxonomical groups, they do support the differentiation into clades, genera and classes of Ty3/Gypsy and Retroviridae LTR retroelements.
Phylogenetic analysis based on all concatenated gag and pol products coded by Ty3/Gypsy and Retroviridae LTR retroelements shows the robustness of their phylogenetic signal regarding the clustering of OTUs [5–14, 19–25]. We used the parsimony method to infer this phylogeny, but the clustering of OTUs is independent of the method of phylogenetic reconstruction used (see Methods). The gag-pol analysis also divides Ty3/Gypsy and Retroviridae LTR retroelements into 2 separate branches, as suggested by original approaches to this topic [12, 41]. We do not disagree with this classification for 2 reasons: first, the strong phylogenetic signal of RT, RNAseH and INT cannot be dismissed; and second, the Retroviridae (except gammaretroviruses) can be distinguished from Ty3/Gypsy LTR retroelements by features such as the presence of accessory genes. Nevertheless, the current Ty3/Gypsy and Retroviridae classification only exposes the modern evolutionary history of these 2 groups of retroelements (we have shown that their ancient history is not straightforward). Due to the wide distribution of Ty3/Gypsy elements in eukaryotes, the usual means of transfer of a canonical Ty3/Gypsy LTR retrotransposon is probably vertical. However, the viral nature of a true Ty3/Gypsy or Retroviridae exogenous retrovirus resides in its capability for horizontal transfer from one host to another via infection. Moreover, the incidence of mechanisms such as gene recruitment, genome rearrangement, recombination and chimerism in LTR retroelement evolution makes it difficult to identify the true natural history of Ty3/Gypsy and Retroviridae LTR retroelements. This suggests that the most realistic (not yet proposed) model for describing Ty3/Gypsy and Retroviridae evolution alternates gradual and modular evolution, and combines vertical and horizontal means of transfer.
The traditional argument supporting the Ty3/Gypsy origins of vertebrate retroviruses rests on their similarity in sequence and genome structure. The question is, however, which genetic material is more informative for exploring the relationships between these two (and other) groups of LTR retroelements: highly variable traits such as gag and PR, or strongly preserved substrates such as the RT, RNAseH and INT? Certainly, RT, RNAseH and INT are an excellent means of classifying Ty3/Gypsy and Retroviridae LTR retroelements into lineages. However, phylogenetic analyses based on RT, RNAseH and INT are not exact enough to resolve the ancient evolutionary history of these 2 groups. This is because the phylogeny inferred from these proteins does not necessarily coincide with the true natural history of the full-length retroelement genome. Here, the advantage of using the gag-pol alignment to infer the phylogeny is the increase in the statistical power of the analysis, which offers the opportunity to correct single-gene tree discrepancies. This analytical strategy is useful but has limitations for which solutions remain elusive; the inferred tree can accumulate systematic errors due to the use of concatenated information. We have shown that the gag-pol tree suggests a Ty3/Gypsy root in the origins of vertebrate retroviruses that is close to the Micropia/Mdg3 clade. However, evaluation of the gag and pol polyproteins separately yields discordant information. Here, while the pol phylogeny supports the traditional perspective (2 retroelement groups), the gag phylogeny describes a new scenario that appears to be informative with respect to the ancient patterns of diversity of Ty3/Gypsy and Retroviridae LTR retroelements. Certainly, the phylogenetic signal of the gag polyprotein has several limitations due to its fast evolution. To overcome these limitations, we investigated other protein domains and used different methodologies to evaluate the significance of the new scenario.
The most important feature here is that, for the first time in the scientific literature, we have carried out a non-redundant study of three independent proteins that have rarely been analyzed before because of their difficulty.
Our investigation conclusively reveals that the taxonomical differentiation into the 3 Retroviridae classes I, II and III discloses 3 different gag and PR products, and that each product has one or more distant Ty3/Gypsy counterparts. The analysis of the GPY/F module reveals partial consistency and shows that the similarity of class I to the Ty3/Gypsy LTR retroelements of plants and fungi is significant. Our results thus support an ancient scenario of polyphyly involving the 3 Retroviridae classes and different Ty3/Gypsy lineages. Here, we stress that the identification of the Retroviridae classes is not a conclusion but an assumption based on previous studies [6–10]. Notwithstanding, we cannot argue for the existence of a direct ancestor between each class and any particular Ty3/Gypsy lineage. Classes I and II are sufficiently similar to corroborate their accepted evolutionary relationship, and it can also be assumed that the Ty3/Gypsy and Retroviridae phylogeny is incomplete (sequencing projects are continuously disclosing new lineages). Despite this, explaining the similarity of each class to different Ty3/Gypsy lineages, across 3 independent protein products, by simple convergence is not a parsimonious explanation. Moreover, while class III spumaretroviruses are dissimilar to classes I and II, our results reveal that they in turn display an intriguing domain similarity to errantiviruses that ought to be followed up. Hence we think that the class differentiation probably unravels certain aspects of vertebrate retroviruses related to their ancient Ty3/Gypsy origins. Instead of a single root for this new scenario, we show how an ancient evolutionary network between the 2 groups can exist, with its most interesting aspect being its polyphyly (the Ty3/Gypsy lineages related to each class do not constitute a monophyletic branch in any phylogeny).
Therefore, our approach strongly suggests that class I is a molecular fossil that emerged quite early in Ty3/Gypsy evolution, while classes II and III emerged later, together with the ancestors of the Ty3/Gypsy LTR retroelements described in protostomes.
The evolutionary network identified by classes I, II and III is inconsistent with the idea of a unique Retroviridae ancestor. It follows that various scenarios may either support or disprove such a network. Assuming this network exists, the most likely scenario relates Ty3/Gypsy elements of plants and fungi to the Retroviridae class I. This scenario assumes the existence of a distant evolutionary relationship between the lineages or an ancient horizontal transfer of chromoviruses from fungi (or plants) to vertebrates. Indeed, chromoviruses are the most ancient lineage of Ty3/Gypsy LTR retrotransposons. They are rich in genetic variability, and are also present in the genomes of many vertebrates [23, 32, 33]. In both cases, the most likely explanation for the relationship between class I and the Athila/Tat retroviruses and retrotransposons of plants is that chromoviruses and class I are related, an argument suggested by a previous study. Nevertheless, chromoviruses of vertebrate organisms are usually more similar to their chromoviral counterparts of fungi than to those of plants. Therefore, the chromoviral scenario does not explain why class I and Athila/Tat elements of plants are similar to each other based on gag. On the other hand, chromoviruses have not yet been described in protostomes, echinoderms and urochordates; furthermore, it remains unclear whether chromoviruses were inexorably driven to extinction in these organisms or were horizontally transmitted from plants/fungi to vertebrates. Consequently, the chromoviral scenario does not clarify why classes II and III and the Ty3/Gypsy lineages of protostomes share sequence similarities and phenotypic features rarely found among the Ty3/Gypsy lineages of plants and fungi. With this in mind, a new theoretical principle is posited here for debate and further research.
The simplest hypothesis is that classes I, II and III probably evolved from at least 3 Ty3/Gypsy ancestors and emerged at different evolutionary times prior to the split between protostomes and deuterostomes (the three kings hypothesis). Several points in the background of this hypothesis should be emphasized. First, we include the words "at least" to acknowledge the three classes without dismissing the possibility of more Ty3/Gypsy ancestors in the evolutionary history of the Retroviridae. Second, "different times of emergence" suggests, but does not necessarily mean, independent origins. Class II may in fact be directly related to class I, but the emergence of class II seems more recent and in parallel with the emergence of the ancestors of several Ty3/Gypsy lineages, such as the Micropia/Mdg3 clade (or others). Class III spumaretroviruses delineate an identical perspective with Ty3/Gypsy errantiviruses. Third, we use the term "polyphyletic" because the Ty3/Gypsy lineages related to each class do not constitute a monophyletic branch in any phylogeny. Moreover, viral evolution is always a polyphyletic challenge involving ecological parameters such as host populations, environment, vectors, mechanisms of transmission, etc.
We have described how the different gags, PRs and GPY/F modules evaluated show a variability that is preserved, depending on the Ty3/Gypsy lineage and Retroviridae class (or genus). While class I can be related to Ty3/Gypsy elements of plants and fungi, classes II and III preserve phenotypic features typically observed among Ty3/Gypsy elements of protostomes. That is the evolutionary perspective provided by the protein products of 3 independent coding regions. We have discussed this evidence but have not yet interpreted why the diversity and phylogeny of Ty3/Gypsy and Retroviridae LTR retroelements differ so much between the gag and pol substrates. In general, the action of viruses and mobile genetic elements is important in host evolution [16, 42, 43, 44, 45, 46, 47] because they are vectors of evolution and potential inducers of diseases and genetic disorders, such as chromosome rearrangements and inversions. However, if the action of viruses and mobile genetic elements can influence host evolution, it is reasonable to suppose that host evolution could also constrain the evolution of these genetic agents. We thus speculate on the possibility of selective influences imposed on Retroviridae genes such as rt, rnase h and int (and other regions) to optimize essential functions, such as retrotranscription and integration (according to the complexity of the new genome environment provided by vertebrate organisms). This probably involves gradual evolution but also a number of molecular mechanisms, such as gene recruitment and recombination, to generate variability and new effective genetic combinations. Here, it is important to keep in mind that, except for gammaretroviruses and a few other cases, the Retroviridae usually incorporate accessory genes, usually needed to adjust diverse aspects of their replication and infectivity (these features appear to be specific to retroviruses infecting vertebrate organisms).
On the other hand, a prior study supports a putative chimeric origin of the Retroviridae RNAse H domain and the modular acquisition of the GPY/F module by Ty3/Gypsy and Retroviridae INTs. Moreover, D-type betaretroviruses are probably viral hybrids between a B-type betaretrovirus and a C-type gammaretrovirus [5, 17, 49]. Finally, a number of studies reveal that recombination is a mechanism frequently used in HIV evolution to generate variability. Two studies reveal, for instance, how recombination of M subtypes has resulted in the generation of multiple circulating recombinant forms consisting of mosaic HIV-1 lineages [50, 51].
Regarding coding regions such as gag, pr and the gpy/f module, we think that these traits reveal features and aspects involving different evolutionary strategies, but which are intrinsic and taxonomically related to ancient events of retroelement speciation and divergence. This argument finds an important evolutionary marker in the variability in the number of CCHC arrays per NC and in the different PR and GPY/F module isoforms. Indeed, the CCHC array of the NC is involved in virion assembly, RNA packaging, reverse transcription and integration processes. On the other hand, the flap lies over the PR active site and conveys specificity to the enzyme by carrying important substrate-binding functions (for more information on this topic, see [35, 53, 54]). Finally, while the GPY/F module is now under investigation, the C-terminal end of the INT appears to be important in the integration of the retroelement into the host genome [55, 56]. The variability of these three regions probably reveals different evolutionary strategies of speciation and divergence, which can be assumed to be older than previously supposed, since it occurs not only in the Retroviridae group but also in all Ty3/Gypsy LTR retroelements of plants, fungi and animals. Here, the three kings hypothesis and its testing (in one sense or another) do not affect the evidence we have presented. That is, classes I, II and III taxonomically code for 3 gag, PR and GPY/F products that have one or more distant counterparts among Ty3/Gypsy LTR retroelements. However, the most interesting aspect of the gag-PR-GPY/F variability is that it appears to be constrained by the bio-distribution of Ty3/Gypsy LTR retroelements. In turn, the diversity patterns of the Retroviridae based on these regions appear to be recurrent in the evolution of Ty3/Gypsy LTR retroelements, the most interesting aspect of which is that they seem polyphyletic.
Therefore, the evolutionary network between Ty3/Gypsy and Retroviridae LTR retroelements is informative regarding an ancestral history, which is in some respects similar to the models of evolution described by both population genetics and quasi-species theory (for more details see ). This means that further analysis of the evolutionary network we disclose in this study must address the involvement of different parameters such as bio-distribution, host populations, environment, vectors, mechanisms of transmission, etc. To this end, our hypothesis makes possible a first evaluation of this new scenario, which we present in a forthcoming manuscript (submitted for publication). In that approach, we use the number of CCHC arrays per NC and the different PR and GPY/F module isoforms as evolutionary markers to trace the network. We do so by superimposing not only Ty3/Gypsy and Retroviridae LTR retroelements, but also other LTR retroelement groups, over their host bio-distribution.
Retroviridae classes I, II and III exhibit phenotypic differences that delineate a network never before reported between Ty3/Gypsy and Retroviridae LTR retroelements. This new scenario reveals how the diversity of vertebrate retroviruses is polyphyletically recurrent in Ty3/Gypsy evolution, i.e. older than previously thought. The simplest hypothesis to explain this finding is that classes I, II and III trace back to at least 3 Ty3/Gypsy ancestors that emerged at different evolutionary times prior to the protostome-deuterostome divergence. We have called this "the three kings hypothesis" concerning the origin of vertebrate retroviruses.
This work is part of the GyDB Project, an ongoing database launched with the aim of phylogenetically analyzing and classifying mobile genetic elements based on their diversity and evolutionary profile. In the first iteration, we consider the Ty3/Gypsy and Retroviridae LTR retroelements of eukaryotes. We have investigated 120 non-redundant full-length Ty3/Gypsy and Retroviridae genomes collected from NCBI. An extended version of the gag-pol tree, summarizing the names, taxonomy, hosts and Genbank accessions of all retroelement taxa used to perform this analysis, is available online as Additional file 1 accompanying this paper. By clicking the name of each OTU in this tree, the user can browse the GyDB and locate a file providing information on the selected OTU, including a link to the Genbank accession of the requested element at NCBI. The gag-pol tree can also be found online in the Phylogenies section at GyDB.
In general, all Ty3/Gypsy and Retroviridae LTR retroelements have 2 polyproteins in common – gag and pol. Gag is composed of 3 domains (MA, CA and NC), while pol usually carries 4 domains (PR, RT, RNAse H and INT). Note, however, that PR can be coded separately or in frame with gag and other protein domains. We have used and analyzed a gag-pol multiple alignment of ~1700 residues, constructed by concatenating the CA, NC, PR, RT, RNAseH and INT cores. The gag-pol alignment is freely accessible within the GyDB collection deposited at Biotechvana Bioinformatics. The alignment is available in 6 formats at the following URL . We have also analyzed the gag and pol polyproteins separately by dividing the gag-pol alignment into 2 independent alignments, CA-NC and PR-RT-RNAseH-INT, to perform phylogenetic or comparative analyses.
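The concatenation and splitting of domain cores described above can be sketched as follows. This is a minimal illustration that assumes each domain core is already aligned and stored as a mapping from OTU name to aligned sequence; the function name, the data layout and the toy sequences are hypothetical and do not reproduce the GyDB files.

```python
from typing import Dict, List

# Domain order taken from the text; the data layout below is an assumption.
GAG_DOMAINS = ["CA", "NC"]
POL_DOMAINS = ["PR", "RT", "RNAseH", "INT"]


def concatenate(domain_alignments: Dict[str, Dict[str, str]],
                domains: List[str]) -> Dict[str, str]:
    """Join per-domain aligned sequences, OTU by OTU, in the given order."""
    otus = set(domain_alignments[domains[0]])
    for domain in domains:
        block = domain_alignments[domain]
        if set(block) != otus:
            raise ValueError(f"OTU sets differ in domain {domain}")
        if len({len(seq) for seq in block.values()}) != 1:
            raise ValueError(f"unequal aligned lengths in domain {domain}")
    return {otu: "".join(domain_alignments[d][otu] for d in domains)
            for otu in sorted(otus)}


# Hypothetical toy cores for two OTUs ("-" marks an alignment gap).
toy = {
    "CA": {"elemA": "MK-L", "elemB": "MKQL"},
    "NC": {"elemA": "CWNC", "elemB": "CF-C"},
    "PR": {"elemA": "DTG", "elemB": "DSG"},
    "RT": {"elemA": "YVDD", "elemB": "YMDD"},
    "RNAseH": {"elemA": "TD", "elemB": "SD"},
    "INT": {"elemA": "HH-CC", "elemB": "HHQCC"},
}

gag = concatenate(toy, GAG_DOMAINS)                    # CA-NC alignment
pol = concatenate(toy, POL_DOMAINS)                    # PR-RT-RNAseH-INT alignment
gag_pol = concatenate(toy, GAG_DOMAINS + POL_DOMAINS)  # full concatenation
```

Keeping the per-domain blocks separate until the last step makes it straightforward to rebuild the gag-only and pol-only alignments from the same source.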
Alignments were compared using the GENEDOC editor in shaded mode with the following groups of amino acid similarity: [T, S: small nucleophilic amino acids], [K, R, H: basic amino acids], [D, E, N, Q: acidic amino acids and their amides], and [L, I, V, M, A, G, P, F, Y, W: hydrophobic amino acids]. Similarities between gag sequences were evaluated by running different gag queries against the CORES database available via the NCBI BLAST search at GyDB, using BLASTp search mode. The BLAST databases available at GyDB are non-redundant and small, and include only Ty3/Gypsy and Retroviridae or related sequences, allowing flexible comparisons between both distantly and closely related sequences with known homologous functions.
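As an illustration of how the similarity groups listed above can be applied, the following sketch labels each alignment column as identical, similar, variable or gapped. It is a rough approximation for explanatory purposes and does not reproduce the shading algorithm of GENEDOC; the function names and toy sequences are hypothetical.

```python
from typing import List

# Similarity groups as listed in the text (cysteine belongs to no group).
SIMILARITY_GROUPS = [set("TS"), set("KRH"), set("DENQ"), set("LIVMAGPFYW")]


def classify_column(column: str) -> str:
    """Label one alignment column using the similarity groups above."""
    residues = set(column.upper())
    if "-" in residues:
        return "gapped"
    if len(residues) == 1:
        return "identical"
    if any(residues <= group for group in SIMILARITY_GROUPS):
        return "similar"
    return "variable"


def classify_alignment(sequences: List[str]) -> List[str]:
    """Classify every column of an alignment given as equal-length strings."""
    if len({len(seq) for seq in sequences}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return ["".join(column) for column in zip(*sequences)] and [
        classify_column("".join(column)) for column in zip(*sequences)
    ]


# Hypothetical toy alignment of three sequences.
toy_alignment = ["KTLCD", "RSICE", "HTVCA"]
labels = classify_alignment(toy_alignment)
print(labels)
```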
Comparative analyses based on sequence logos were performed with CheckAlign 1.0 in Shannon's algorithm mode with the correction factor. Sequence logo methodology was originally introduced by Schneider et al. [65, 66] to display consensus sequences for DNA and protein alignments. Later, Schneider dismissed the term "consensus", arguing that a logo provides more information than the consensus sequence of a protein or DNA alignment. While this can be controversial, because there are many ways to obtain or describe a consensus sequence, logo methodology being one of them, we agree with the original author on the use of the term "sequence logo" suggested on his website. We employ the term "sequence logo" to describe the output reported by this analysis, and we refer to the protein information underlying the shape of the logos constructed from our alignments as "amino acidic architecture". This term may be useful to describe consensus, core and amino acid patterns with a single expression. CheckAlign directly builds the logo from an ungapped alignment using the conventional methodology [65, 66]. Here, the maximum uncertainty per position in a protein alignment is log2 20 = 4.3 bits. In the case of gapped alignments, CheckAlign automatically builds the logo, taking the gap as another amino acid species. Here, the tool considers the maximum uncertainty per position to be log2 21 = 4.4 bits for protein alignments (for more details about CheckAlign see ).
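The per-position quantities mentioned above can be illustrated with a short sketch of the Shannon information content of one alignment column. It assumes the conventional formulation (maximum uncertainty minus observed entropy) and, optionally, the approximate small-sample correction (s - 1) / (2 ln 2 · n) commonly used for sequence logos; whether CheckAlign applies exactly this correction is an assumption, and the function name is hypothetical.

```python
import math
from collections import Counter


def column_information(column: str,
                       gaps_as_symbol: bool = False,
                       small_sample_correction: bool = False) -> float:
    """Information content (bits) of one protein alignment column.

    Maximum uncertainty is log2(20), or log2(21) when the gap is
    treated as an additional symbol, as described in the text.
    """
    alphabet_size = 21 if gaps_as_symbol else 20
    symbols = [c for c in column.upper() if gaps_as_symbol or c != "-"]
    n = len(symbols)
    if n == 0:
        return 0.0
    counts = Counter(symbols)
    entropy = -sum((k / n) * math.log2(k / n) for k in counts.values())
    information = math.log2(alphabet_size) - entropy
    if small_sample_correction:
        # Assumed approximate correction; not confirmed for CheckAlign.
        information -= (alphabet_size - 1) / (2 * math.log(2) * n)
    return max(information, 0.0)


conserved = column_information("CCCC")                      # fully conserved
half_gapped = column_information("CC--", gaps_as_symbol=True)
print(round(conserved, 2), round(half_gapped, 2))
```

A fully conserved ungapped column reaches the maximum of log2 20 bits, whereas a column split evenly between one residue and gaps loses exactly one bit relative to log2 21.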
Phylogenetic reconstructions of Ty3/Gypsy and Retroviridae LTR retroelements inferred from the gag-pol, pol and gag alignments employed the PHYLIP 3.6 package. First, we generated 100 bootstrap replicates of each alignment using SEQBOOT. Second, we used the protein sequence parsimony method of Felsenstein, based on the approaches of Eck and Dayhoff and of Fitch, to perform the analyses. Here, the bootstrap file was used as input to PROTPARS and the input order was randomized using the following parameters: random number seed = 5 and number of times to jumble = 5. Third, CONSENSE was used to obtain an MRC tree using the tree file generated by PROTPARS as input. As the MRC tree usually consists of all clusters that occur >50% of the time, we took consensus values >55 as a bootstrap reference. Bootstrap values were used to scale the trees.
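The logic of the first and third steps above (bootstrap resampling of alignment columns and majority-rule selection of clusters) can be sketched conceptually as follows. This is not the PHYLIP implementation: the function names, the representation of a tree as a set of clusters and the toy data are hypothetical, Python's random number generator differs from PHYLIP's, and the tree-building step (PROTPARS) is omitted.

```python
import random
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List

Cluster = FrozenSet[str]


def bootstrap_alignment(alignment: Dict[str, str],
                        rng: random.Random) -> Dict[str, str]:
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must have equal length")
    length = lengths.pop()
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in picks)
            for name, seq in alignment.items()}


def majority_rule_clusters(trees: List[Iterable[Cluster]],
                           threshold: float = 0.5) -> Dict[Cluster, float]:
    """Keep clusters found in more than `threshold` of the trees."""
    counts: Counter = Counter()
    for tree in trees:
        counts.update(set(tree))
    total = len(trees)
    return {cluster: n / total for cluster, n in counts.items()
            if n / total > threshold}


# Hypothetical toy data.
toy_alignment = {"otu1": "MKLV", "otu2": "MRLI", "otu3": "QKAV"}
replicate = bootstrap_alignment(toy_alignment, random.Random(5))

ab = frozenset({"otu1", "otu2"})
cd = frozenset({"otu3", "otu4"})
toy_trees = [{ab, cd}, {ab}, {ab, cd}, {frozenset({"otu1", "otu3"})}]
support = majority_rule_clusters(toy_trees)
```

With 100 replicates, calling `majority_rule_clusters(trees, threshold=0.55)` would mirror the stricter reference value of 55 used in this study.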
We also tested the NJ method using different distance models implemented in PROTDIST. Here, it is important to keep in mind that the overall efficiency of the different methods of phylogenetic reconstruction in recovering the true tree varies with substitution rate, transition-transversion ratio and sequence divergence [77, 78]. With the particular material we studied, parsimony and NJ trees support the clustering of OTUs into clades and genera in the gag-pol and pol analyses, and they are consistent in not supporting the monophyly of each group in the gag analyses. However, parsimony phylogenies proved more consistent with comparative analyses than NJ trees when inferring phylogenies including or evaluating the gag and/or PR proteins. Parsimony analyses also reported better bootstrap support and were more consistent with the three Retroviridae classes than NJ analyses (NJ trees only support classes I and II).
Acquired Immune Deficiency Syndrome
Bovine Leukemia Virus
Caprine Arthritis Encephalitis Virus
Equine Foamy Virus
Feline Leukemia Virus
Feline Foamy Virus
Feline Syncytial Virus
International Committee on Taxonomy of Viruses
Human Immunodeficiency Virus
Human Foamy Virus
Human T-cell Leukemia Virus
Hidden Markov Model
Long terminal repeat
Lymphoproliferative Disease Virus
Major homology region
Maedi Visna Virus
Mason-Pfizer Monkey Virus
Mouse Mammary Tumor Virus
National Center of Biotechnology Information
Operative taxonomical unit
Ovine Maedi Visna Virus
Research Collaboratory for Structural Bioinformatics
Rous Sarcoma Virus
Servei Central de Suport a la Investigació Experimental
Simian Endogenous Retrovirus of Mandrill
Simian Immunodeficiency Retrovirus of Macaques; Simian Foamy Virus (SFV)
We thank Javier Ortiz and Isaac Fernandez of the SCSIE at the University of Valencia for technical support, and the 2 anonymous referees for their useful comments, which improved the original manuscript. The GyDB project was awarded the NOVA 2006 by IMPIVA and the Conselleria d'Empresa, Universitat i Ciència of Valencia. The research has been partly supported by grants IMCBTA/2005/45, IMIDTD/2006/158 and IMIDTD/2007/33 from IMPIVA, by grant BFU2005-00503 from MEC to AM, and by financial grant 17092008 from ENISA (Empresa Nacional de Innovacion SA) to Biotechvana. Funding to pay the Open Access publication charges for this article was provided by the University of Valencia.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.