Estimation of divergence time between two sibling species of the Anopheles (Kerteszia) cruzii complex using a multilocus approach

Background: Anopheles cruzii is the primary human Plasmodium vector in southern and southeastern Brazil. The distribution of this mosquito follows the coast of the Brazilian Atlantic Forest. Previous studies indicated that An. cruzii is a complex of cryptic species.
Results: A multilocus approach using six loci, three circadian clock genes and three genes encoding ribosomal proteins, was implemented to investigate in more detail the genetic differentiation between the An. cruzii populations from Santa Catarina State (southern Brazil) and Bahia State (northeastern Brazil), which represent two sibling species. The analysis revealed very high FST values and fixed differences between the two An. cruzii sibling species in all loci, irrespective of their function. An Isolation with Migration model was fit to the data using the IM program. The results revealed no migration in either direction and allowed a rough estimate of the divergence time between the two sibling species.
Conclusions: Population genetics analysis of An. cruzii samples from two Brazilian localities using a multilocus approach confirmed that they represent two different sibling species in this complex. The results suggest that the two species have not exchanged migrants since their separation and that they possibly diverged between 1.1 and 3.6 million years ago, a period of intense climatic changes.


Background
Anopheles cruzii (Diptera: Culicidae) is the primary vector of human and simian malaria parasites in southern and southeastern Brazil [1,2]. Earlier studies that evaluated X chromosome inversion frequencies [3,4] and isoenzyme profiles [5] suggest that Anopheles cruzii is a species complex. A recent analysis of genetic differentiation using the timeless gene among An. cruzii populations from southern, southeastern and northeastern Brazil indicated that the population from Itaparica, Bahia State (northeastern Brazil) is a different species [6].
In the current study, a multilocus analysis using six different nuclear gene fragments was performed comparing two populations of An. cruzii (Florianópolis and Itaparica), representing the southern and northeastern sibling species, respectively. Three of the fragments used are orthologues of Drosophila melanogaster genes involved in the control of circadian rhythms: timeless (tim), Clock (Clk) and cycle (cyc); the other three code for ribosomal proteins: Rp49 (Ribosomal protein 49, also known as RpL32, Ribosomal protein L32), RpS2 (Ribosomal protein S2) and RpS29 (Ribosomal protein S29).
The aim of the study was to determine whether there is still gene flow between the two sibling species and to estimate their divergence time. Furthermore, circadian genes [7] putatively involved in the control of mating rhythms [8], such as timeless, Clock and cycle, are potentially important in maintaining temporal reproductive isolation between closely related species. This study therefore also aimed to verify whether differentiation is higher in the circadian genes than in constitutive loci, such as the ribosomal protein genes Rp49, RpS29 and RpS2.

Polymorphism and divergence between Florianópolis and Itaparica
One of the assumptions of the Isolation with Migration model used in this study is the absence of recombination within the studied loci. To fulfill this requirement, the optimal recombination-filtered block was extracted from each gene alignment (see Methods). Table 1 shows the position of the non-recombining (NR) blocks used in this study as well as the putative recombinant sequences that were removed. Another assumption of the IM program is that the variation observed in the studied loci is neutral. Therefore, the Tajima [9] and Fu & Li [10] tests of neutrality were applied; the results are presented in Table 2. No significant deviations from neutrality were observed after Bonferroni correction. Table 2 also shows the minimum number of recombination events for each gene (RM) and the length of the whole fragment and of the NR block of each gene (values in parentheses). The largest differences in length between the whole fragment and the NR block were observed for timeless and cycle, owing to the higher number of recombination events identified in these two genes (RM = 14 and 5, respectively). The alignments of the whole sequences of each gene are presented in Additional files 1, 2, 3, 4, 5 and 6. All loci include at least one intron of variable size, except the cycle gene fragment, which was composed entirely of an exon. Except for the timeless gene, all base substitutions were synonymous or occurred within introns. The few non-synonymous changes found in the timeless gene are described in Rona et al. [6]. Table 2 also shows the number of polymorphic sites (S) for each An. cruzii sibling species and two measures of nucleotide diversity: π, based on the average number of pairwise differences, and θ, based on the total number of mutations (values for the NR blocks in parentheses). In general, Itaparica was less polymorphic than Florianópolis, showing the lowest θ and π values as well as fewer polymorphic sites (S).

Table 1. Editing of sequences prior to IM analysis using the IMgc program, based on the alignments presented in Additional files 1, 2, 3, 4, 5 and 6. NR blocks: fragment positions of the non-recombining blocks used in the analyses; Removed sequences: the putative recombinant sequences removed before the IM analysis.

Table 3 shows the pairwise estimates of population differentiation between the two An. cruzii sibling species. Very high FST values (ranging from ~0.6 to 0.9) were found between Florianópolis and Itaparica for all loci, using both the whole fragments and the NR blocks. Table 3 also shows the average number of nucleotide substitutions per site between species (Dxy), the number of net nucleotide substitutions per site between species (Da) and the distribution of the four mutually exclusive categories of segregating sites observed in each comparison: the number of exclusive polymorphisms for each species (S1 and S2), the number of shared polymorphisms (Ss) and the number of fixed differences (Sf). The timeless and cycle loci were the only ones that shared polymorphisms between Florianópolis and Itaparica, although these were few (7 and 1 for the whole fragment, respectively). All loci presented a large number of fixed differences between the two species (Table 3).
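The between-species quantities summarized in Table 3 can be illustrated with a small sketch. The code below is a minimal, hypothetical example (the function names and toy sequences are ours and do not reproduce the DnaSP or ProSeq implementations). It computes per-site nucleotide diversity π within each species, Dxy between species, Da = Dxy − (πX + πY)/2 (Nei's net divergence), and the four site categories S1, S2, Ss and Sf. It assumes aligned, gap-free sequences of equal length and biallelic sites, as expected under the infinite-sites model.

```python
from itertools import combinations, product


def diffs_per_site(a, b):
    """Proportion of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def pi_within(seqs):
    """Nucleotide diversity: mean pairwise differences per site within one sample."""
    pairs = list(combinations(seqs, 2))
    return sum(diffs_per_site(a, b) for a, b in pairs) / len(pairs)


def dxy(pop1, pop2):
    """Average number of substitutions per site between two samples."""
    pairs = list(product(pop1, pop2))
    return sum(diffs_per_site(a, b) for a, b in pairs) / len(pairs)


def da(pop1, pop2):
    """Net divergence: Dxy minus the mean within-sample diversity."""
    return dxy(pop1, pop2) - (pi_within(pop1) + pi_within(pop2)) / 2


def site_categories(pop1, pop2):
    """Count exclusive (S1, S2), shared (Ss) and fixed (Sf) segregating sites.

    Assumes biallelic sites, so a site polymorphic in both samples is shared.
    """
    counts = {"S1": 0, "S2": 0, "Ss": 0, "Sf": 0}
    for col1, col2 in zip(zip(*pop1), zip(*pop2)):
        poly1, poly2 = len(set(col1)) > 1, len(set(col2)) > 1
        if poly1 and poly2:
            counts["Ss"] += 1
        elif poly1:
            counts["S1"] += 1
        elif poly2:
            counts["S2"] += 1
        elif col1[0] != col2[0]:
            counts["Sf"] += 1  # both monomorphic but for different bases
    return counts


# Toy alignments (hypothetical, for illustration only)
pop_a = ["AAGT", "ACGT"]
pop_b = ["TAGA", "TAGT"]
print(site_categories(pop_a, pop_b))  # → {'S1': 1, 'S2': 1, 'Ss': 0, 'Sf': 1}
print(dxy(pop_a, pop_b), da(pop_a, pop_b))  # → 0.5 0.25
```

In this toy case the first column is a fixed difference, the second is polymorphic only in the first sample and the fourth only in the second, so Da is smaller than Dxy by the average within-sample diversity.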

Estimation of Demographic Population Parameters
The IM program was used to simultaneously estimate six demographic parameters (θ1, θ2, θA, t, m1, m2) for the two An. cruzii sibling species under an "Isolation with Migration" model using multiple loci [11]. As mentioned above, only the NR blocks were used and some recombining sequences were removed before the IM analysis (Table 1). Figure 1 shows the posterior probability distributions for each of the six demographic parameters estimated using IM, and Additional file 7 summarizes the features of the marginal histograms for each of the parameters in all MCMC runs. Across four independent runs, the simulations for the two sibling species showed good convergence and consistency, yielding complete posterior distributions.
The estimates of θ suggest that the effective population size of the ancestral population was smaller than those of the current Florianópolis and Itaparica populations, indicating that both may have had a history of growth since their separation (Figure 1). The migration rates in both directions (m1 and m2) for all loci combined were also estimated by the IM software. No indication of migration was found in either direction in the multiple simulations.
The divergence time parameter was estimated for all loci combined in four different IM runs. This parameter cannot be directly converted to years because the mutation rates in the Anopheles cruzii species are unknown. Therefore, the divergence time between the Anopheles cruzii species was estimated using the average Drosophila synonymous and nonsynonymous substitution rates for several nuclear genes (0.0156 and 0.00191 substitutions per site per million years, respectively) [12]. Using this approach and based on the average of the HiSmth values (the peak of the smoothed marginal posterior distribution), the divergence time between Florianópolis and Itaparica would be approximately 2.4 Mya (range 1.1 to 3.6 Mya, based on the averages of the HPD90Lo and HPD90Hi values, i.e. the bounds of the 90% highest posterior density interval).
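The conversion from IM's scaled divergence time to years can be sketched as follows. IM reports t = T·u, where u is a per-locus mutation rate; when several loci are analysed, our understanding of the IM documentation is that the scaling uses the geometric mean of the per-locus rates. The sketch below assumes that each per-locus rate is a per-site rate (for example a Drosophila-based rate, in substitutions per site per million years) multiplied by the length of that locus's NR block. The function names, block lengths and t value are hypothetical, and the sketch does not reproduce the exact averaging of synonymous and nonsynonymous rates used in the text.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def im_t_to_myr(t_scaled, block_lengths_bp, rate_per_site_per_myr):
    """Convert IM's scaled divergence time (t = T * u) to millions of years.

    Assumption: u is the geometric mean of per-locus rates, each computed as
    per-site rate x NR block length, with one shared per-site rate.
    """
    per_locus_rates = [rate_per_site_per_myr * n for n in block_lengths_bp]
    u = geometric_mean(per_locus_rates)  # substitutions per locus per My
    return t_scaled / u


# Hypothetical NR block lengths (bp) and scaled t, for illustration only
lengths = [420, 380, 510, 300, 450, 360]
print(round(im_t_to_myr(t_scaled=8.0, block_lengths_bp=lengths,
                        rate_per_site_per_myr=0.0087), 2))
```

Because u is a geometric rather than an arithmetic mean, a single unusually long block has less influence on the scaling.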
Another way to estimate the divergence time between these two Anopheles species is to use the same Drosophila synonymous substitution rate mentioned above and the average Da value across the six loci (Table 3). Based on these values, the divergence time between the populations from Florianópolis and Itaparica was estimated at 1.91 ± 0.76 Mya for the whole sequences and 1.93 ± 0.65 Mya for the NR blocks.
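The Da-based estimate rests on the usual assumption that net divergence accumulates along both lineages after the split, so that Da ≈ 2rT for a per-site substitution rate r. A minimal sketch follows, using hypothetical per-locus Da values (the real ones are in Table 3). Reporting the mean and sample standard deviation across loci is our assumption about how the ± values were obtained.

```python
import statistics


def divergence_time_myr(da, rate_per_site_per_myr):
    """Time since the split in My, assuming Da = 2 * r * T."""
    return da / (2.0 * rate_per_site_per_myr)


def multilocus_time(da_values, rate_per_site_per_myr):
    """Mean and sample standard deviation of per-locus divergence times."""
    times = [divergence_time_myr(d, rate_per_site_per_myr) for d in da_values]
    return statistics.mean(times), statistics.stdev(times)


# Hypothetical Da values for six loci, for illustration only
da_values = [0.048, 0.071, 0.055, 0.032, 0.090, 0.061]
mean_t, sd_t = multilocus_time(da_values, rate_per_site_per_myr=0.0156)
print(f"{mean_t:.2f} +/- {sd_t:.2f} My")
```

Because the conversion is linear, this is equivalent to dividing the mean and standard deviation of Da by 2r.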

Genealogy analysis
Gene trees of the sequences from all loci, for both whole sequences and NR blocks, were estimated using the Neighbor-Joining (NJ) method (Figures 2 and 3, and Additional files 8 and 9, respectively). The most suitable model selected using Modeltest 3.7 [13] was the Kimura 2-parameter model [14] for all loci except Clock, for which the Jukes and Cantor [15] model was chosen. Node support in all trees was assessed with 1,000 bootstrap replicates. The resulting NJ trees clearly grouped the sequences from the two sibling species into different clusters, with high bootstrap values in most cases.
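The two substitution models chosen by Modeltest differ in how they correct observed differences for multiple hits. As a sketch (not the MEGA implementation), the pairwise distances that would feed a Neighbor-Joining tree can be computed as follows. With P the proportion of transitions and Q the proportion of transversions, the Kimura 2-parameter distance is d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), and the Jukes and Cantor distance is d = −¾ ln(1 − 4p/3) with p = P + Q. Skipping sites with gaps or ambiguous bases (pairwise deletion) is our assumption.

```python
import math

PURINES = set("AG")
NUCLEOTIDES = set("ACGT")


def transition_transversion_props(seq1, seq2):
    """Proportions of transitions (P) and transversions (Q), skipping gaps/ambiguities."""
    sites = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in NUCLEOTIDES and b in NUCLEOTIDES]
    n = len(sites)
    # Transition: both bases purines or both pyrimidines; transversion otherwise
    transitions = sum(1 for a, b in sites
                      if a != b and (a in PURINES) == (b in PURINES))
    transversions = sum(1 for a, b in sites
                        if a != b and (a in PURINES) != (b in PURINES))
    return transitions / n, transversions / n


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance (substitutions per site)."""
    p, q = transition_transversion_props(seq1, seq2)
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def jc_distance(seq1, seq2):
    """Jukes and Cantor distance (substitutions per site)."""
    p, q = transition_transversion_props(seq1, seq2)
    return -0.75 * math.log(1 - 4.0 * (p + q) / 3.0)


# One A->G transition in ten sites (hypothetical sequences)
print(k2p_distance("ACGTACGTAC", "GCGTACGTAC"), jc_distance("ACGTACGTAC", "GCGTACGTAC"))
```

A full NJ tree would then be built from the matrix of such distances; bootstrapping repeats this on alignments resampled by column.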

Discussion
The results presented here confirm the high level of differentiation between the Itaparica and Florianópolis sibling species of the An. cruzii complex [5,6]. Less differentiation might have been expected in the three genes that code for the highly conserved ribosomal proteins (Rp49, RpS29 and RpS2) than in loci possibly involved in the control of mating rhythms (timeless, Clock and cycle) [7,8]. The latter three genes are potentially important in maintaining temporal reproductive isolation between closely related species, and might be involved in the speciation process in some insects. In fact, Rona et al. [6] showed very high differentiation between Itaparica and the more southern Brazilian populations, including Florianópolis, using the timeless gene as a molecular marker. However, very high FST values were detected in all loci between these two sibling species, and they were even higher for Rp49, RpS29 and RpS2 (0.8854, 0.8865 and 0.8502, respectively, for the whole fragment) than for timeless, Clock and cycle (0.8150, 0.7088 and 0.5806, respectively, for the whole fragment). Mazzoni et al. [16] found similar results in a multilocus analysis of two sand fly vectors of leishmaniasis.
No indication of migration was found in either direction in the multiple IM simulations, which is consistent with the very high differentiation values for all loci. Itaparica also presented lower levels of variability than Florianópolis, possibly indicating a smaller population size. This is supported by the IM results, which also indicated a smaller effective population size for Itaparica. The estimated difference in population sizes seems coherent, since the southern An. cruzii sibling species found in Florianópolis is distributed throughout most of the southern and southeastern Brazilian Atlantic Forest (from Santa Catarina to Espírito Santo State), while the northeastern sibling species found in Itaparica seems to occur only in a more restricted region [6].
The multilocus results corroborate previous data [5,6] indicating that these populations represent two different species in the An. cruzii complex. This was also confirmed by the NJ trees, which show Florianópolis and Itaparica clearly separated into two isolated groups, except perhaps for cycle, whose tree suggests the persistence of ancestral polymorphisms in Florianópolis. However, this gene fragment presents a very small number of variable sites in the Itaparica sample.
The estimated divergence time of 1.1 to 3.6 Mya, based on the IM results, corresponds to the end of the Pliocene and the beginning of the Pleistocene [17]. Significant climate changes occurred at the end of the Pliocene, including the onset of heavy Northern Hemisphere glaciation around 2.75 Mya [18]. A very important consequence of this cooling was extensive aridification, which led to the fragmentation of forests, including the Brazilian Atlantic Forest [18,19]. Interestingly, Carnaval et al. [20] discussed the hypothesis of refugia for neotropical species occurring in the Atlantic Forest. Itaparica is located in an area proposed to be a large central refugium in the Brazilian Atlantic Forest, and another refugium is proposed in southern and southeastern Brazil. Climate changes have been proposed to explain the differentiation among many groups, such as fruit flies [21], insect vectors [22] and many forest-obligate species [20,23-25]. Since An. cruzii is endemic to the Atlantic Forest, it seems likely that differentiation between its populations occurred due to forest fragmentation, which might have split a single ancestral species into two or more isolated groups.

Conclusions
The results of the multilocus analysis corroborate previous data indicating that Florianópolis and Itaparica represent two different species of the An. cruzii complex and suggest that they have not exchanged migrants since their separation between 1.1 and 3.6 Mya.

Methods
The sequences of the timeless gene from Florianópolis and Itaparica were those previously published by our group [6] (Accession numbers: FJ408732-FJ408865). The sequences of the other genes were obtained by PCR, cloning and sequencing as described below.
The primers listed in Table 4 were used with An. cruzii genomic DNA, extracted according to Jowett [27], in PCR reactions carried out in an Eppendorf Mastercycler® thermocycler using the proofreading Pfu DNA polymerase (Biotools). PCR products were purified and cloned using either the Zero Blunt TOPO PCR cloning kit (Invitrogen) or the pMOS Blue blunt-ended vector cloning kit (GE Healthcare). Sequencing of positive clones was carried out on an ABI Prism 3730 DNA sequencer at the Oswaldo Cruz Institute using the ABI Prism Big Dye Terminator Cycle Sequencing Ready Reaction kit (Applied Biosystems). The identity of the cloned fragments was determined by BlastX analysis against GenBank [28].
At least eight clones were sequenced for each mosquito. Sequences were edited and, in most cases, consensus sequences representing the two alleles were generated. In some individuals only one haplotype was observed among the eight sequences; these mosquitoes were classified as homozygotes. The probability of incorrectly classifying a heterozygote as a homozygote with this procedure is less than 1%. The sequences from homozygous mosquitoes were duplicated prior to analysis; however, analyses carried out without duplicating them rendered similar results. Sequences were submitted to GenBank (Accession numbers: GU016330-GU016569).
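The figure of less than 1% follows from simple arithmetic if each sequenced clone is assumed to derive independently, and with equal probability, from either allele of a heterozygote (i.e. no PCR or cloning bias, which is our assumption). A heterozygote is then misread as a homozygote only if all n clones carry the same allele, which happens with probability 2 × (1/2)^n = 1/128 ≈ 0.78% for n = 8. A small sketch that also allows unequal allele sampling (the function name is hypothetical):

```python
def prob_missed_heterozygote(n_clones, p_allele=0.5):
    """Probability that all clones from a heterozygote carry the same allele.

    Assumes clones are independent draws, each from allele 1 with
    probability p_allele and from allele 2 otherwise.
    """
    return p_allele ** n_clones + (1.0 - p_allele) ** n_clones


print(prob_missed_heterozygote(8))       # → 0.0078125
print(prob_missed_heterozygote(8, 0.7))  # biased sampling raises the risk
```

If one allele amplifies or clones preferentially, the misclassification risk grows quickly, so the 1% bound holds only under roughly balanced sampling.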

DNA sequence analysis
The sequences were aligned using ClustalX [29], and population genetics analyses were carried out using DnaSP 4.0 [30] and ProSeq v2.91 [31].
Modeltest 3.7 [13] was used with the model block implemented in PAUP* 4.0d105 [32] to find the most suitable model of evolution for each gene. Models selected by the Bayesian Information Criterion (BIC) were favored and used in the phylogenetic analyses, which were carried out using MEGA 4.0 [33].
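Model choice by BIC can be sketched as follows: BIC = k ln(n) − 2 ln L, where k is the number of free parameters, n the sample size (commonly taken as the number of alignment sites) and L the maximized likelihood; the model with the lowest BIC is preferred. The sketch below applies this rule to hypothetical log-likelihoods. It does not call PAUP* or Modeltest, and the parameter counts shown (substitution parameters only, ignoring branch lengths) are an assumption.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: k ln(n) - 2 ln L (lower is better)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def select_model(fits, n_sites):
    """Return the model name with the lowest BIC, plus all scores.

    fits maps model name -> (maximized log-likelihood, number of free parameters).
    """
    scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical log-likelihoods for one locus, for illustration only
fits = {"JC": (-1520.4, 0), "K80": (-1501.2, 1), "HKY": (-1499.8, 4)}
best, scores = select_model(fits, n_sites=600)
print(best, {m: round(s, 1) for m, s in scores.items()})
```

Here HKY has the highest likelihood, but its three extra parameters cost more under BIC than the likelihood gain, so the simpler K80 (Kimura 2-parameter) model is preferred.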
The IM program is an implementation of the Isolation with Migration model and is based on Markov chain Monte Carlo (MCMC) simulation of genealogies [11,34]. It simultaneously estimates six demographic parameters from multilocus data: the effective population sizes of an ancestral and two descendant populations (θA, θ1 and θ2, respectively), the divergence time (t) and the migration parameters in both directions (m1 and m2). Initial IM runs were performed to establish appropriate upper limits for the priors of each demographic parameter. These preliminary simulations generated marginal distributions that guided the choice of parameter values used in the final IM analyses. Convergence was assessed through multiple long runs (four independent MCMC runs with different random seeds, each with at least 30,000,000 recorded steps after a burn-in of 100,000 steps) and by monitoring the effective sample size (ESS) values, the update acceptance rates and the trend lines.
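One of the convergence diagnostics mentioned above, the effective sample size (ESS), measures how many independent draws a correlated MCMC trace is worth. A common estimator divides the number of samples N by the integrated autocorrelation time τ = 1 + 2 Σ ρk, truncating the sum at the first non-positive autocorrelation ρk. The sketch below implements that generic estimator on simulated traces; it is not IM's own code, and the truncation rule is our choice.

```python
import random


def effective_sample_size(trace):
    """ESS = N / (1 + 2 * sum of autocorrelations), truncated at the first rho <= 0."""
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    variance = sum(c * c for c in centered) / n
    tau = 1.0
    for lag in range(1, n):
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = cov / variance
        if rho <= 0:
            break  # stop once the autocorrelation is no longer informative
        tau += 2.0 * rho
    return n / tau


# Hypothetical traces: independent draws vs a strongly autocorrelated AR(1) chain
random.seed(1)
iid = [random.gauss(0.0, 1.0) for _ in range(5000)]
ar = [0.0]
for _ in range(4999):
    ar.append(0.9 * ar[-1] + random.gauss(0.0, 1.0))
print(round(effective_sample_size(iid)), round(effective_sample_size(ar)))
```

The autocorrelated chain yields an ESS of only a few hundred despite having as many samples as the independent one, which is why long runs are needed for slowly mixing parameters.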
The Infinite Sites model [35] was chosen as the mutation model in the IM simulations because the two species are closely related and all genes are nuclear.
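Under the infinite-sites model each site mutates at most once, so if two biallelic sites show all four possible two-site haplotypes, at least one recombination event (or a recurrent mutation) must have occurred between them. This four-gamete criterion underlies the minimum number of recombination events (RM, following Hudson and Kaplan) reported in Table 2 and, as we understand it, the recombination filtering applied before the IM analysis. The sketch below only flags incompatible site pairs; it does not search for the optimal block or choose which haplotypes to remove, and skipping gap-containing columns is our assumption.

```python
from itertools import combinations

NUCLEOTIDES = set("ACGT")


def biallelic_sites(seqs):
    """Indices of gap-free columns carrying exactly two nucleotides."""
    sites = []
    for i, column in enumerate(zip(*seqs)):
        alleles = set(column)
        if alleles <= NUCLEOTIDES and len(alleles) == 2:
            sites.append(i)
    return sites


def four_gamete_violations(seqs):
    """Pairs of biallelic sites at which all four two-site haplotypes occur.

    Under the infinite-sites model each such pair implies at least one
    recombination event (or a recurrent mutation) between the two sites.
    """
    sites = biallelic_sites(seqs)
    violations = []
    for i, j in combinations(sites, 2):
        haplotypes = {(s[i], s[j]) for s in seqs}
        if len(haplotypes) == 4:
            violations.append((i, j))
    return violations


print(four_gamete_violations(["AAC", "ATC", "TAG", "TTG"]))  # → [(0, 1), (1, 2)]
```

In this toy alignment, sites 0 and 2 are perfectly associated, but site 1 shows all four combinations with each of them, so a recombination-free block could contain either site 0 or site 2 together with site 1 only after removing some haplotypes.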
The optimal recombination-filtered block was extracted from each gene alignment using the IMgc program, which also removes haplotypes that represent likely recombinant sequences [36,37].

Table 4. Degenerate and specific primers used to amplify the different gene fragments in the two An. cruzii sibling species. The sequences of primer pairs 5'CYCdeg1 + 3'CYCdeg1 and 5'aquaRP1 + 3'aeaquaRP1b are from [39] and [40], respectively. Degenerate primers were used in preliminary amplifications to isolate initial fragments of the cycle and Clock genes. Sequencing of these fragments allowed the design of the specific primers used in the population genetics analysis.