Multiple invasions of Gypsy and Micropia retroelements in genus Zaprionus and melanogaster subgroup of the genus Drosophila

Background The Zaprionus genus shares evolutionary features with the melanogaster subgroup, such as space and time of origin. Although little information about the transposable element content in the Zaprionus genus had been accumulated, some of their elements appear to be more closely related with those of the melanogaster subgroup, indicating that these two groups of species were involved in horizontal transfer events during their evolution. Among these elements, the Gypsy and the Micropia retroelements were chosen for screening in seven species of the two Zaprionus subgenera, Anaprionus and Zaprionus. Results Screening allowed the identification of diverse Gypsy and Micropia retroelements only in species of the Zaprionus subgenus, showing that they are transcriptionally active in the sampled species. The sequences of each retroelement were closely related to those of the melanogaster species subgroup, and the most parsimonious hypothesis would be that 15 horizontal transfer events shaped their evolution. The Gypsy retroelement of the melanogaster subgroup probably invaded the Zaprionus genomes about 11 MYA. In contrast, the Micropia retroelement may have been introduced into the Zaprionus subgenus and the melanogaster subgroup from an unknown donor more recently (~3 MYA). Conclusion Gypsy and Micropia of Zaprionus and melanogaster species share similar evolutionary patterns. The sharing of evolutionary, ecological and ethological features probably allowed these species to pass through a permissive period of transposable element invasion, explaining the proposed waves of horizontal transfers.


Background
The Zaprionus genus (Diptera, Drosophilidae) is composed by two subgenera (Anaprionus s.s. and Zaprionus s.s.) which seem to be originated in Asia about 14 MYA [1]. The Anaprionus subgenus has first diversified in the Oriental biogeographic region, being relatively less studied. However, the Zaprionus subgenus ancestor(s) originated during the Middle to Early Miocene in Oriental regions before diversifying in Tropical Africa, sharing space and time with the best studied drosophilid species, the melanogaster subgroup [1,2]. Due to its evolutionary history, ecological and morphological diversity, the Zaprionus genus seems to be a good model for comparative studies with the melanogaster subgroup. Although the phylogenetic relationships within the Zaprionus genus had been recently proposed [1], its taxonomic positioning in the Drosophilidae family remains a matter for discussion. Currently, most reports agree that Zaprionus belongs to the Drosophila genus, and that it would be more closely related to Drosophila than to Sophophora subgenera [3][4][5][6][7][8][9][10], to which the species of the melanogaster subgroup belong ( Figure 1). Hence, Zaprionus and melanogaster species would share a last common ancestor at least as old as the divergence of the Sophophora and Drosophila subgenera, i.e., between 35 and 60 MYA [5,11].
While the melanogaster species subgroup -mainly the two sibling species D. melanogaster and D. simulans -presents the most well known dipteran mobilomes, knowledge about the transposable elements (TEs) content of the Zaprionus species is scant. Only seven TE families have been identified in all Zaprionus genus. The elements 412 of Z. tuberculatus, 731 of Z. ornatus, Bari-1 of Z. tuberculatus and, Mariner of 12 Zaprionus species could only be detected by Southern blot hybridization using probes derived from D. melanogaster [12][13][14][15][16][17]. Furthermore, partial sequences were produced for Gypsy of Z. indianus, Copia of Z. tuberculatus, Mariner of Z. tuberculatus and Z. verruca, and Rover of Z.
indianus [17][18][19][20]. Interestingly, evolutionary analyses demonstrated that four of these seven TEs of Zaprionus are highly related to those the melanogaster subgroup, sharing their common ancestor more recently than the species ones. Gypsy, Copia, Mariner and Rover were therefore proposed to have been horizontally transferred between species of the Zaprionus subgenus and the melanogaster subgroup of the genus Drosophila [15][16][17][18]20,21]. Horizontal transfer (HT) of TEs has been inferred when three criteria are met: (a) high sequence conservation between TE sequences of distantly related species; (b) incongruence between TE and host phylogenies and, (c) discontinuous distribution of a TE across a species group. Additionally, requirements like geographic, temporal and ecological overlapping and, exclusion of alternative hypothesis as ancestral polymorphism promoting differential subfamily fixation and high selective constraints over the sequences need to be considered [22].
In a previous study, we have performed a search of D. melanogaster TEs in the American continent invasive species Z. indianus by Dot blot methodology, aiming to broaden our understanding of TE occurrence in species of the Zaprionus genus [23]. All the 10 TEs analyzed were present in the Z. indianus genome, but only five had strong hybridization signals with the probes that were used: Roo/B104, Doc, MDG-1, Micropia and Gypsy. Among them, we selected Gypsy and Micropia retroelements to expand our evolutionary analyses in the Zaprionus genus. This choice was due to the evidence of the close relationship of these Phylogenetic tree of the Drosophilidae family Figure 1 Phylogenetic tree of the Drosophilidae family. Phylogenetic relationships between the main Drosophilidae species groups, evidencing the taxonomic positioning of the Zaprionus genus inside the Drosophila subgenus and the divergence time between Drosophila and Sophophora subgenus [redrawn from reference [7]].
Z. indianus elements with others already described in the literature, inferred by the strong hybridization signals using heterologous probes, the easy amplification with the primers available in the literature and the proposition of a Gypsy HT of between Z. indianus and D. simulans [18].
Gypsy and Micropia are classified as Ty3/Gypsy elements because of their gene arrangement in the pol coding region -Reverse transcriptase/RNAseH/Integrase [24]. Ty3/Gypsy retroelements, which are similar to vertebrate retroviruses in both sequence and genomic structure, are broadly distributed among eukaryotes [25]. They differ from vertebrate retroviruses mainly because they do not have an env gene necessary to complete the infectious cycle [24]. Gypsy is an exception to the typical Ty3/Gypsy structure because an active env is present, potentially capable of maintaining infectivity [26]. Hence, Gypsy was classified as an Errantivirus by the International Committee on Taxonomy of Viruses (ICTV) [27].
The Micropia retroelement had already been studied in D. melanogaster, and several species of the repleta and the cardini groups of the subgenus Drosophila [28][29][30]. Evolutionary analyses showed the existence of at least two Micropia families that have ~30% divergence in DNA sequence [29]. The first family, present in the repleta group, comprises two subfamilies than were differentiated by nucleotide divergence and antisense RNA expression. The repleta species group seems to have transmitted one of its elements to the D. cardini group through HT [30]. The second family is represented by the D. melanogaster element that also transcribes antisense RNAs responsible for control of its expression [31]. The data suggest that the Micropia retroelement is an ancient component of Drosophila genomes [29].
In contrast to Micropia, the Gypsy retroelement has been analyzed in the 12 Drosophila genomes [32], and individual sequences from several Drosophilidae species are available, such as the virilis and repleta groups of subgenus Drosophila; the willistoni and melanogaster groups of the subgenus Sophophora; and the Scaptodrosophila latifasciaeformis and the Z. indianus species [18,[32][33][34]. Two Gypsy families have been identified, one exclusive to D. willistoni and the other widely distributed among Drosophilidae species [18]. The latter is divided into 10 subfamilies, seven of which have been implicated in HT events, demonstrating that inheritance of the Gypsy retroelement is complex [18,32]. HT events involving Gypsy could have been facilitated by the presence of env, assumed to confer autonomous infectivity on the retroelement, thereby allowing it to cross species barriers [18,32,35]. Specifically in the melanogaster species subgroup, two Gypsy subfamilies have been identified, and seven HTs have been proposed among melanogaster species subgroups, or between one species of that subgroup and S. latifasciaeformis and Z. indianus [18,32]. These inter-genera HTs are assumed to have occurred because synonymous sites of Gypsy env gene are three to five times more highly conserved than those of host genes in species that diverged longer than 40 MYA [18].
We have now surveyed the Gypsy and Micropia retroelements in seven species of the Zaprionus genus, comprising one species of the Anaprionus subgenus and six of the Zaprionus subgenus. Our objective was to investigate the distribution and evolutionary history of these two elements in this scarcely studied genus by comparing their sequences with those of the 12 Drosophila genomes and other sequences available in the nucleotide database. The results show that the Zaprionus Gypsy and Micropia retroelements are present and transcriptionally active only in the subgenus Zaprionus, to which they were introduced by more than one horizontal transfer event from different donor species.

Distribution of Gypsy and Micropia in the genus Zaprionus
In an attempt to search for the presence of Gypsy and Micropia retroelements in seven species of the Zaprionus genus (Table 1), PCR reactions using primers that amplify Gypsy env and Micropia RNAseH sequences were carried out (see Materials and Methods). This revealed the presence of these elements in the six species of the Zaprionus subgenus, but not in Z. multistriatus (Anaprionus subgenus). Three clones of each Gypsy and Micropia PCR fragments were sequenced and used in the evolutionary analyses. Additionally, RT-PCR reactions and Southern blot hybridization showed that all Zaprionus subgenus species analyzed harbor a few transcriptionally active insertions of Gypsy and Micropia elements, varying from one (Z. gabonicus) to seven (Z. indianus and Z. davidi) insertions for Gypsy, and from two (Z. tuberculatus and Z. africanus) to seven (Z. indianus) for Micropia [Additional files 1 and 2].

Evolutionary analyses of the Gypsy retroelement
To infer the evolutionary history of Gypsy retroelement in the Zaprionus subgenus, we carried out genomic searches for Zaprionus homologous sequences in the 12 Drosophila genomes and the GenBank database. The in silico search retrieved genomic sequences only from the melanogaster subgroup, indicating that the other seven

Inferring phylogenetic relationships and divergence
The phylogenetic relationships inferred using the Maximum Parsimony (MP), Maximum Likelihood (ML) and Neighbor-joining (NJ) methods of phylogenetic reconstruction produced similar patterns with identical main branches. The analyses were performed with three sequences of each Zaprionus species and 52 sequences of species from the melanogaster subgroup. The S. latifasciaeformis and D. willistoni Gypsy sequences available in the GenBank database were included in these reconstructions because of their respectively closer and more distant evolutionary relationships to those of the melanogaster subgroup [18]. Figure 2 shows the tree reconstructed by ML analysis. The sequences of the genus Zaprionus, the melanogaster species subgroup and S. latifasciaeformis were grouped together in three clades that were not congruent with the species phylogeny. Clade 1 (bootstrap 97%) contains sequences of D. melanogaster, D. erecta, D. simulans and only one sequence of the genus Zaprionus, Z. tuberculatus1. Clade 2 (bootstrap 84%) includes Z. tuberculatus, Z. camerounensis, S. latifasciaeformis, D. melanogaster, D. erecta and all sequences of D. yakuba, except D. yakubaA6, which branches outside the 3 clades. The reported clustering of D. erecta and S. latifasciaeformis has been explained as an HT event [31]. Finally, clade 3 comprises the Gypsy sequences of Z. davidi, Z. indianus and the two cryptic species, Z. gabonicus and Z. africanus, as well as sequences of D. melanogaster, D. simulans and all sequences of D. sechellia, with no bootstrap support. Additionally, a tree was reconstructed using the Zaprionus sequences, the genomic sequences of melanogaster subgroup, and Gypsy sequences of 17 other Drosophila species (from the guarani, cardini, pallidipennis, tripunctata, repleta, anulimana and flavopilosa groups of the subgenus Drosophila and D. busckii [18]). A similar topology of the Figure 2 was obtained, with all Zaprionus sequences clustering together with melanogaster elements in the same clade distribution, which reinforces the close relationship between Zaprionus and melanogaster Gypsy sequences (data not shown).
Distance analysis shows that the sequences within the three clades have relatively low divergence levels, with mean values of 0.063, 0.070 and 0.042 within clades 1, 2 and 3, respectively ( Table 2 and Additional file 5). The distances of the Zaprionus vs melanogaster species within each group are similar to the mean values just cited. The mean distances between the clades show that clade 2 is closer to clade 3 than to clade 1, with mean values of 0.119 (clade 2 vs clade 3), 0.239 (clade 1 vs clade 2) and 0.241 (clade 1 vs clade 3). The Drosophila Gypsy sequences, which vary up to 20% in nucleotide composition, were previously classified into 10 subfamilies [18,32]. Among them, the sequences of Z. indianus and the melanogaster species subgroup were included in two different subfamilies. The addition of 18 sequences belonging to six Zaprionus species allowed us to confirm this proposition. The species distribution and divergence of the clade 1 Phylogenetic relationships between Gypsy retroelements Figure 2 Phylogenetic relationships between Gypsy retroelements. Phylogeny of Gypsy sequences of genus Zaprionus (green squares), the melanogaster subgroup (red circles), D. willistoni (blue triangles) and S. latifasciaeformis (pink circles). Numbers in squares indicate the clade number mentioned in the text. The tree was reconstructed using the maximum-likelihood method (HKY85 distance), as implemented in the PhyML program. The branch support was calculated using bootstrap test (1000 replications). Numbers next to species names indicate the clone or genomic sequence identification. from 2 and 3 (0.24) indicate that it corresponds to one of those subfamilies previously proposed, hereafter called subfamily 1 (S1). On the other hand, a divergence of 0.119 between clades 2 and 3 led us to classify them as variants 1 and 2 (V1 and V2) of another subfamily (S2) that may correspond with the second subfamily already described due to its species composition.
Testing the HT hypothesis Three hypotheses could explain the high levels of conservation among Gypsy sequences within each clade (0.04-0.07) and the phylogenetic inconsistencies compared to the species tree: (a) high selective constraints conserving the sequences; (b) ancestral polymorphism promoting differential subfamily fixation in the species; and (c) the occurrence of HT. To test whether (a) can best explain the estimated conservation, the ratio of non-synonymous divergence per synonymous divergence (dN:dS) within the clades were compared to those of the Gpdh nuclear gene, offering a comparative measurement of the selective constraints. Although active elements might suffer selective pressure to maintain their coding sequences, Gpdh would be expected to be even more restricted because it plays an essential role in glycerophospholipid metabolism in Drosophila. If it is assumed that synonymous substitutions are under almost strictly neutral evolution, dN:dS = 1, dN:dS <1 and dN:dS >1 will represent neutral evolution, purifying selection and positive Darwinian selection, respectively. The analyses indicate that the mean Gypsy dN:dS ratios ranged from 0.153 to 1.002, while the mean Gpdh values did not exceed 0.041 (Table   3, and Additional file 6). These ratios suggest that, although dN:dS <1, purifying selection over the Gypsy env sequences is relaxed (dN:dS mean = 0.395) compared to the host gene. Also, a Z-test for the Gypsy env region indicates neutrality for most of the pairwise comparisons (62%) within the clades [Additional file 7]. Hence, high selective constraints do not explain the incongruities observed intra-clades. On the other hand, the neutrality hypothesis was refuted for all pairwise comparisons between the clades, indicating that purifying selection could plays a role in the conservation of the env coding sequence among the Gypsy subfamilies. In that case, natural selection could act at the genomic level, conserving the TE genes, since elements that transpose most efficiently, or at the highest rate are most likely to survive and propagate [36].
In the absence of intra-clade selective constraints, the comparisons between the dS values of the TE and a host gene permit evaluation of the hypothesis of ancestral polymorphism within subfamilies. If the Gypsy retroelement is evolving vertically and the incongruence in the phylogeny are due to differential subfamily distribution in the analyzed species, the dS pairwise comparisons between the TE and the host gene would be expected to be equivalent. But, if the dS of TE is lower than the dS of host gene under similar or higher levels of selective constraints and in the absence of strong codon usage bias, HT events can be inferred. Using this approach, the dS of Gypsy and Gpdh gene, as well as the Codon Bias Index (CBI) were compared for one clone representing each species of Zap- rionus and the best Blastn subject of the genomic sequences in each clade. Codon usage bias is a factor that may be responsible for low dS values for TE sequences, since there is a negative correlation between codon usage bias and dS values [20]. The Codon Bias Index (CBI) indicates that the Gypsy and Gpdh sequences did not suffer high codon bias, although the mean value of 0.602 for Gpdh suggests that the small difference in magnitude of dS between Gpdh and Gypsy ought to be taken cautiously in inferring HT, since the variation in the Gpdh dS distance could reflect codon usage. Hence, comparisons between Gypsy and Gpdh dS were used to infer HT only if the dS values of Gypsy were at least twice as low as those of Gpdh. The Gypsy dS values between Zaprionus and melanogaster species ranged from 3 (Z. tuberculatus2 vs D. melanogaster5 and D. erecta3) to 20 times (D. melanogaster7 vs Z. davidi2) lower than those of Gpdh ( Figure 3). On the other hand, comparisons of dS between the Zaprionus species within each clade gave Gypsy dS values approximately equal or higher than Gpdh dS, excepting for Z. davidi2 vs Z. indianus1 and Z. africanus1 which presented dS of Gypsy slightly lower than the host gene ones. These results may indicate that the Gypsy retroelement evolved mainly by vertical transmission within the Zaprionus subgenus, but there may have been HT events between the Zaprionus and the melanogaster species what could reject the hypothesis of ancestral polymorphism within the clades.

Phylogenetic relationships of the Micropia retroelement
In order to evaluate the evolutionary history of Micropia, the same approaches of phylogeny and divergence estimations used for Gypsy were applied. Sequences homologous to the Zaprionus species were also identified only in species of the melanogaster subgroup, except for D. erecta, the unique species of the melanogaster subgroup for which no significant matches were obtained. D. sechellia had the most hits (22)  simulans genomes harbor putatively full-length insertions [Additional file 8].

Inferring phylogenetic relationships and divergence
The MP, NJ and ML reconstructions of the Micropia sequences from the Zaprionus subgenus, the genomic sequences from the melanogaster subgroup and the Gen-Bank sequences of the repleta group had similar topologies. As for Gypsy, only ML reconstruction is given ( Figure  4). The tree exhibits several incongruences according to species phylogeny, such as grouping species in the Zaprionus genus together with those of the melanogaster subgroup. To evaluate the ancestral polymorphism hypothesis for Micropia, the dS values of Micropia RNAseH and Gpdh sequences were compared between the sequences of Zaprionus and melanogaster species in each clade ( Figure 5). All the dS distances for the Micropia were lower than those for Phylogenetic relationships between Micropia retroelements Figure 4 Phylogenetic relationships between Micropia retroelements. Phylogeny of Micropia sequences of genus Zaprionus (green squares) and the melanogaster (red circles) and repleta (blue triangles) groups. The trees were reconstructed using the maximum-likelihood method (HKY85 distance), as implemented in the PhyML program. Numbers in squares indicate the clade number mentioned in the text. The branch support was calculated using bootstrap test (1000 replications). Numbers next to species names indicate the clone or genomic sequence identification.
the Gpdh. For instance, the dS between Z. tuberculatus and the melanogaster subgroup sequences in clade 1 were from 61 (vs D. simulans) to 76 (vs D. sechellia) times lower than that those for the Gpdh. Those for the D. yakubaA1 sequence were from 18 (vs Z. indianus1) to 31 (vs Z. camerounensis1) times lower. In addition, dS was zero in some comparisons, e.g. those among the Z. camerounensis, Z. davidi, Z. africanus and Z. gabonicus sequences of clade 1. The high conservation in the synonymous sites of the RNAseH sequences favors the hypothesis that their low divergence is due to recent acquisitions of Micropia in the Zaprionus and melanogaster genomes, and that HT has shaped its evolution, as already proposed for the Gypsy retroelement.

Network trees corroborate the inference of HT
Network reconstructions were used as an additional approach to test the hypothesis that HT has shaped the evolution of Gypsy and Micropia in Zaprionus and melanogaster species. Since TE sequences can be considered as populations of sequences that share a common ancestor, this type of phylogeny lets one propose possible donors and receptors species, and to reconstruct the TE dispersion routes by the proposition of the ancestral nodes.
The Gypsy network indicated three clusters of sequences (Figure 6a), which correspond to the S1 subfamily and the two variants of the S2 subfamily (S2V1 and S2V2). Moreover, the S1 subfamily was closer to S2V2 than to S2V1, which indicates that S2V1 diversified from S2V2. The presence of median vectors -that correspond to a hypothetical (in theory ancestral) or unsampled sequencesconnecting the Zaprionus and melanogaster sequences suggests that the HT events occurred in the ancestors of the sequences sampled, indicating that the lateral transmissions were not recent. The network tree allowed us to see a possible HT not clearly showed by the traditional phylogeny, but evident from the dS comparisons. The Z. davidi2 sequence is the ancestral node of the Z. africanus1 sequence (Figure 6b), suggesting a single HT event from Z. davidi to Z. africanus. The network reconstruction also let the D. yakubaA6 sequence be included as a S2V2 member.
Network reconstruction for Micropia clustered the sequences of D. yakuba, D. melanogaster, Z. camerounensis, Z. davidi, Z. indianus and Z. africanus with central median vectors, suggesting that high sequence conservation was not due to HTs involving the species analyzed, but probably one or more donor species not sampled (Figure 6c). The D. melanogasterA3 sequence connects the median vectors with a large cluster that harbors D. sechellia, D. simulans and D. melanogaster sequences, which in turn links a median branch to the Z. tuberculatus sequences. This arrangement indicates at least two more recent transfers, one from D. melanogaster to D. simulans and/or D. sechellia or even to the ancestor of those sister species (corroborated by the low dS values), and a second from any of these species to Z. tuberculatus. In addition, the branch of the D. sechelliaA20 sequence with a median vector that links the Z. camerounensis, Z. davidi, Z. indianus and Z. africanus sequences suggest an additional HT involving the ancestor of D. sechellia. However, the absence of geographic sharing between D. sechellia and Zaprionus species means that this proposal should be viewed cautiously. An alternative explanation could be a high rate of evolution of the D. sechelliaA20 sequence, followed by random convergent evolution.

HT inference and estimation of the divergence time between TE sequences
To reinforce the HT hypothesis, the divergence times of both retroelements were calculated using the molecular clock (see Material and Methods). The divergence estimates for Gypsy sequences showed that the HTs may have started around 11.2 MYA (Table 4). Regarding the S1 subfamily, two HTs can be proposed.  [18], exemplifying the importance of analyzing as many species as possible before inferring horizontal transfers [22]. Finally, the time of divergence also corroborates the HT from Z. davidi to Z. africanus S2V2 sequences, as seen in the network tree, which probably occurred ~1.9 MYA, subsequent to the divergence of these species (2.6 to 4.1 MYA [1]).

Discussion
This survey of the Gypsy and Micropia in seven species of Zaprionus indicates that they are widely distributed and transcriptionally active only in species of subgenus Zaprionus. Additionally, the evolutionary analyses demonstrate that these two retroelements are closely related to those of the melanogaster species subgroup, and that their histories might have been repeatedly marked by events of interand intra-subgenus HTs. Several pieces of evidence have shown that the phenomenon of HT is frequent in eukaryotes [22,[37][38][39]. It is important to note that the HTs proposed here were assumed only when the species involved in the event shared geographic, temporal and ecological environments, and when three separate pieces of evidence suggested their occurrence: (i) lower dS values between the TE sequences than the Gpdh sequences, (ii) incongruence between host and TE phylogenies, and (iii) last common ancestor of TE more recent than that of the species.
The studies of Micropia and Gypsy in subgenus Zaprionus species highlight the HT events as an alternative evolutionary mechanism of the retroelements evolution. Among the HTs inferred for Gypsy, four events seemed to involve the Zaprionus and melanogaster species, three involved only the melanogaster group species, and only one the Zaprionus species. For the Micropia retrotransposon, seven HTs were suggested: five between unknown donor(s) and the Zaprionus or the melanogaster species, one among the melanogaster species, and one between the Zaprionus and the melanogaster species. The 15 HTs we have proposed probably occurred during two different waves of invasions. First, the Gypsy retroelement of the melanogaster subgroup had been introduced in the Zaprionus subgenus ~11 MYA. After that (~3 MYA), a second HT wave involving the Micropia retroelement may have introduced it into both the Zaprionus subgenus and melanogaster subgroup from an unknown donor. After the introduction, the TEs could have been prone to transposition and re-introduction in other related species. For example, Z. davidi has donated their Gypsy sequence to Z. africanus after had received it from an ancestor of D. melanogaster, D. simulans and D. sechellia.
According to the TE evolutionary cycle, the TE history could start from HT events, followed by the initial transpositional burst in the new host, and then the accumulation of defective copies along with host-directed epigenetic silencing [40]. The last step could lead to the loss of mobility and, finally, to molecular erosion by random mutations. The high frequency of HTs, combined with the transcriptional activity and low insertion numbers of Gypsy and Micropia retroelements in the Zaprionus species subgenus indicates that these elements remain at the first or second steps of the evolutionary life history of the TEs, i.e. the invasion or the genomic spread period. A question that arises is whether these elements have been transferred in similar ways. HTs of Gypsy are easier to explain, since the active Gypsy env can enable its dispersion and expansion of the genomic territory, but this mechanism does not to apply to Micropia, which lacks env. Nevertheless, retroelements could have been transferred concomitantly with retrovirus infections, even Gypsy; or parasitic infestations, as for example by intracellular Wolbachia bacteria or mites [22]. Moreover, our data allowed us to propose there were HT of Gypsy and Micropia among sister species such as D. melanogaster, D. simulans and D. sechellia, although introgression could not be ruled out.
Another concern is the evolutionary process of fixation of the TEs after the horizontal introduction. These elements could share similar mechanisms of regulation, which would permit them escaping from selective forces and allow fixation after invasion. Control of Drosophila retroelement mobilization depends on the transcription rate, which is directly related to the presence of specific regulatory proteins [41][42][43][44][45] and the rate of RNA degradation, mediated by RNAi systems [46]. However, genetic information about the species of the genus Zaprionus is scarce. Further, studies of the Zaprionus genes involved in controlling mobilization could lead to hypotheses about the mechanisms that allow highly related sequences to be shared in the Zaprionus and melanogaster species genomes.

Conclusion
Our survey suggests that Zaprionus species have experienced waves of retroelement invasions, particularly during the last 7 million years. Species of the melanogaster subgroup might have donated their Gypsy sequences to species of the Zaprionus subgenus, and these two species groups could have received the Micropia retroelement from one or more unknown donor species. After the initial introduction, these species could share their elements. Since the Zaprionus genus and the melanogaster subgroup seem to share the same age of origin and diversification in tropical Africa, as well as ecological features, our data suggest that they passed through a permissive period of transposable element invasion during the diversification period.

Fly stocks and DNA extraction
All the strains of the Zaprionus species that were used are listed in Table 1, being derived from a single female selected randomly from a mass culture kindly provided by Drs Jean David and Amir Yassin from LEGS, CNRS, France. The D. melanogaster Canton-S strain was used as positive control for the molecular analyses. Genomic DNA was extracted from 10 individuals of each strain using the phenol-chloroform method [47].

PCR reactions, cloning and sequencing
Primers 2813 (5' TTA ACT CCT AGA GTT CAT CGC TGG 3') and 2814 (5' CAT GTA CCT GGT TAA CTA CTG ACC 3') were used to amplify the equivalent 386 bp fragment of the D. hydei Micropia retroelement located in a highly conserved region of the RNAseH domain, between nucleotides 2813 and 3198 [31]. The Gypsy retroelement fragment was obtained using primers GYP3S2F (5' AAA GGC GAY TTG GTT GAC ACT CC 3') and GYP3S2R (5' CAR GTG GCT RGG TTG RGT GTG 3') and corresponds to a 485 bp sequence located in the 6491-6511 region of env (ORF 3) in the D. melanogaster Gypsy element [18]. Both PCR reactions were performed in a final volume of 25 μl, using 200 ng of genomic DNA, 0.4 μM of each primer, 160 μM of each dNTP, 2 mM MgCl 2 and 1 U Taq Platinum polymerase (Invitrogen) in 1× PCR buffer. The Gypsy cycling parameters used for amplification have been described previously [18]. The Micropia PCR cycling parameters were: 94°C for 3 min for initial denaturation, 40 cycles of 94°C for 30 s, 58°C for 1 min, and 72°C for 1 min, followed by a final extension step at 72°C for 10 min. The fragments obtained were purified directly from the PCR product, using the GFX PCR DNA and Gel Band Purification Kit (GE Healthcare), and cloned with the TOPO TA Cloning Kit (Invitrogen). Three randomly chosen clones were automatically sequenced in an ABI PRISM 3100 Genetic Analyzer (Applied Biosystems/Hitashi) using the primer pair T7 and M13R.

Phylogenetic analyses
Zaprionus subgenus sequences were manipulated into BioEdit [48] and aligned with Clustal W 1.81 [49]. The most divergent clones (>25%) were selected as queries for a search in the 12 Drosophila genomes using the flybase BLASTn tool http://flybase.bio.indiana.edu/blast/ and the genome database released on October 17 th , 2008. The queries selected were Z. tuberculatus1, Z. tuberculatus2 and Z. gabonicus1 for Gypsy, and Z. tuberculatus2 and Z. africanus1 for Micropia. The structure of the genomic sequences were predicted using the BLAST 2 sequences and ORF Finder programs. In order to obtain only highly related and non-disrupted sequences the searches were performed with stringent parameters (e-values > e- 50 and 80% coverage). Redundant genomic sequences (100% identical) were not included in the phylogenetic analyses in order to minimize polytomies.
Multiple alignments of the Zaprionus and the genomic sequences were used to infer the phylogenetic relationships, using the maximum likelihood (ML), neighborjoining (NJ) and maximum parsimony (MP) methods, as implemented in PhyML 3.0, MEGA 4.1 and PAUP v.4.0b10 [50][51][52], respectively. Branch support was calculated by bootstrap analysis consisting of 1000 replicates [53]. In the NJ and ML analyses, maximum composite likelihood (MCL) and HKY85 distances were used to estimate the divergence matrices and reconstructed trees, respectively [54,55]. A heuristic search algorithm was used for MP reconstruction. The sequences obtained were registered in the GenBank database (Table 1)

Estimations of divergence time for Gypsy and Micropia
The time of divergence of Gypsy and Micropia retrotransposons was estimated according to the molecular clock equation r = k/2T, where r is the evolutionary rate (0.011 per site per MY, according to synonymous sites rate for genes with low codon bias in Drosophila [11]), k is the dS pairwise divergence, and T is the divergence time between species.

Median-joining networks
The median-joining network trees were obtained using all sequences studied in the conventional phylogenies. The reconstructions were performed using DNA Aligment 1.3.0.1 and NETWORK 4.5.1.0 software [58], available at the Fluxus Technology Ltd. website. 'MP calculation' was applied for post-processing the networks, the characters were equally weighted, and the other median-joining parameters followed the software default.