Non-equilibrium estimates of gene flow inferred from nuclear genealogies suggest that Iberian and North African wall lizards (Podarcis spp.) are an assemblage of incipient species

Background The study of recently-diverged species offers significant challenges both in the definition of evolutionary entities and in the estimation of gene flow among them. Iberian and North African wall lizards (Podarcis) constitute a cryptic species complex for which previous assessments of mitochondrial DNA (mtDNA) and allozyme variation are concordant in describing the existence of several highly differentiated evolutionary units. However, these studies report important differences suggesting the occurrence of gene flow among forms. Here we study sequence variation in two nuclear introns, β-fibint7 and 6-Pgdint7, to further investigate overall evolutionary dynamics and test hypotheses related to species delimitation within this complex. Results Both nuclear gene genealogies fail to define species as monophyletic. To discriminate between the effects of incomplete lineage sorting and gene flow in setting this pattern, we estimated migration rates among species using both FST-based estimators of gene flow, which assume migration-drift equilibrium, and a coalescent approach based on a model of divergence with gene flow. Equilibrium estimates of gene flow suggest widespread introgression between species, but coalescent estimates describe virtually zero admixture between most (but not all) species pairs. This suggests that although gene flow among forms may have occurred the main cause for species polyphyly is incomplete lineage sorting, implying that most forms have been isolated since their divergence. This observation is therefore in accordance with previous reports of strong differentiation based on mtDNA and allozyme data. Conclusion These results corroborate most forms of Iberian and North African Podarcis as differentiated, although incipient, species, supporting a gradual view of speciation, according to which species may persist as distinct despite some permeability to genetic exchange and without having clearly definable genetic boundaries. Additionally, this study constitutes a warning against the misuse of equilibrium estimates of migration among recently-diverged groups.


Background
The study of emerging species poses several challenges for evolutionary biologists. From a phylogenetic perspective, for example, attempts to reconstruct relationships among such taxa are often hampered by a poor resolution of relationships, by lack of monophyly inferred from individual gene genealogies due to incomplete lineage sorting or, when multiple loci are analysed, by discordant scenarios portrayed by distinct genealogies. When a species splits into two, the diverging forms will share much of their genetic variation for a long period of time and the preexisting polymorphism may be sorted in a stochastic fashion or reflect differential selective pressures, not necessarily tracking species-splitting events. Moreover, closelyrelated species are likely to retain some permeability to the exchange of genes, which may affect some loci more than others [1,2]. These features become even more striking when more than two species are involved; in rapidly radiating taxa the chance of incomplete lineage sorting increases and complex patterns of admixture often arise, complicating the recognition of species boundaries in the context of both biological and phylogenetic species definition criteria [3][4][5][6]. As a consequence, an additional complication arises when attempting to estimate levels of gene flow among such nascent species because of the need to discriminate between the amount of polymorphism that is shared due to incomplete lineage sorting and to actual genetic exchange. In fact, several widely-used estimates of gene flow such as Wright's [7] equation relating F ST and the product Nm assume equilibrium migration and therefore may not perform well in species that are still undergoing lineage sorting and therefore violate this assumption [8][9][10]. Recent analytical methods [10,11], in contrast, do not assume equilibrium and allow the analysis of gene flow and divergence in the same framework, thus being appropriate tools to evaluate migration rates among recently diverged taxa.
Podarcis wall lizards in the Iberian Peninsula and North Africa constitute a cryptic species complex that has been studied using both mtDNA [12][13][14][15] and allozyme data [16][17][18]. In concordance with emerging morphological evidence [19][20][21] both types of markers have documented the existence of several highly differentiated population groups, some of which correspond to the currently accepted species (P. bocagei, P. carbonelli and P. vaucheri), whereas others constitute different forms within the polytypic and, from a mitochondrial perspective, paraphyletic P. hispanica. Reported genetic distances at the mitochondrial level are higher than traditional species-delimitation thresholds proposed for squamates [22]. However, not all the mitochondrial DNA lineages seem to belong to clearly distinct morphological entities [23]. Moreover, although both nuclear and mitochondrial markers provide roughly concordant species delimitation, a detailed comparison of both analyses reports two significant differences [18]. First, not all of the forms detected using mtDNA are readily distinguishable using allozymes, which could result either from massive introgression of the nuclear genome or from the low number of loci studied coupled with low overall diversity levels observed in allozymes, limiting their ability to detect subtle population structure. Secondly, evolutionary relationships between species and forms within P. hispanica as depicted by mtDNA are very well supported, suggesting a "step-by-step" speciation scenario, whereas the analyses of allozymes produces only a few well-supported multi-species clusters, implying a radiation scenario. These discordant results were interpreted as reflecting the shorter time required by mtDNA to achieve monophyly [18]. Despite their utility in this case of corroborating the major partitions inferred from mtDNA, allozyme analyses have several drawbacks, of which the most important for comparing the patterns of mitochondrial and nuclear levels of divergence is the lack of a genealogical framework to understand the relationships between alleles and the evolutionary processes underlying the distribution of genetic variation.
In this work we aim to analyse the evolutionary history of the Iberian and North African Podarcis species complex from a yet unexplored perspective: nuclear genealogies. We studied nucleotide variation at two nuclear introns in individuals representing all known morphotypes and mitochondrial lineages. Our main goals were: a) to investigate whether species and forms of Podarcis are monophyletic with respect to nuclear genealogies; b) if not, to assess the relative roles of incomplete lineage sorting and gene flow in shaping the observed patterns; and ultimately c) to understand if the genetic boundaries delimiting forms within the Podarcis species complex are welldefined. Our results evidence a striking contrast between monophyly of mitochondrial DNA and extreme polyphyly of nuclear genealogies. Moreover, because equilibrium-based, classic estimates of gene flow among forms are remarkably different from those estimated in a divergence, non-equilibrium framework, this study highlights the confounding effect that incomplete lineage sorting may have in migration rate estimates and consequent inference of species boundaries when recent divergence is not taken into account.

Mitochondrial DNA assignment
In order to assign individuals to a mitochondrial DNA lineage, we sequenced a portion of the ND4 gene in all analysed individuals. The mitochondrial data set included 78 sequences and was trimmed to a common fragment of 534 bp. Previously unpublished sequences have been submitted to GenBank, accession numbers EU269551-EU269594. This data set contained 187 polymorphic positions, of which 179 were parsimony informative. None of these polymorphisms correspond to insertions or deletions, which conforms to the coding nature of the region analysed. An ML tree depicting the relationships between the observed haplotypes is shown in Figure 1. Because it includes less informative characters, the depicted tree does not reflect the same well-supported relationships that were inferred in a previous phylogenetic study including 2425 bp of mitochondrial DNA sequence data [15] although exactly the same major phylogroups were recovered. This tree was used to assign individuals to a mitochondrial group. All the individuals that had a prior morphological assignment clustered within the expected mitochondrial lineage, with the exception of the individual BEV7337, which morphologically resembles a "Galera type" individual (P.A. Crochet, pers. comm.) but that nevertheless clustered within the sympatric P. hispanica sensu stricto.

Nuclear gene variability and recombination
We studied DNA sequence variation in two nuclear introns, 6-Pgdint7 and β-fibint7, in individuals representing all known mtDNA lineages. Alignment of the 6-Pgdint7 sequences for 136 chromosomes was trimmed to a common fragment of 503 bp. This 503 bp alignment required 10 insertions or deletions, from 1 bp to 23 bp long, and included a small poly-G repetitive region with size variability. Indels were inserted to maximise base pair identity in conserved sequenced blocks flanking the indel. A total of 65 different alleles were detected. Alignment of β-fibint7 was pruned to a common fragment of total length 951 bp for 140 chromosomes; this alignment required several insertions and deletions, including a very large (~350 bp) insertion in all samples bearing the "Galera" mtDNA lineage plus sample BEV7337 which was probably responsible for the failure to amplify the complete intron in these samples using primers BF8 and BFXF. At least 80 alleles were detected. The "Galera" insertion showed a poly-C/poly-T motif within it, with size variability, but each allele's number of repeats could not be resolved. Complete allele sequences have been submitted to GenBank, accession numbers EU269595-EU269659 (6-Pgdint7) and EU269471-EU269550 (β-fibint7). Correspondence between alleles, the samples in which they were detected and accession numbers is given in Additional File 1.
Polymorphism levels calculated from the data sets without alignment gaps are presented in Table 1. The nuclear loci analysed in this study, despite being introns, show polymorphism levels considerably lower than those observed in the mitochondrial DNA fragment analysed (e.g. π 6-Pgdint7 = 0.01721; π β-fibint7 = 0.01269; π ND4 = 0.10738). Interestingly, both nuclear genes show significantly negative values of Tajima's D, indicating a skew towards rare alleles. Both genes failed to pass the fourgamete test of Hudson and Kaplan [24] and were inferred to have suffered several recombination events (at least 7 in the case of 6-Pgdint7 and at least 8 in β-fibint7). However, when applying the more sensitive test of Bruen et al. [25] we were not able to reject the null hypothesis of no recombination (P-values for the permutation test shown in Table 1).

Nuclear gene genealogies
The haplotype networks inferred for both nuclear genes are shown in Figure 2. These networks were built after removing all but the first base of each indel (see Methods), which resulted in 100 variable characters and 61 alleles in 6-Pgdint7 and 113 characters and 72 alleles in β-fibint7. The correspondence between alleles represented in the haplotype networks and the complete alleles that have been deposited in GenBank is given in Additional File 1.
The most evident result is a complete lack of monophyly of the mitochondrial-defined groups. Many alleles are even shared by distinct species. Moreover, although the outgroup P. muralis represents a relatively distinct lineage according to both genes, the group formed by all Iberian and North African Podarcis is not monophyletic with respect to this species according to the β-fibint7 genealogy, since one individual carrying P. hispanica type 3 mtDNA exhibits an allele that falls within the clade formed by P. muralis.
Because this lack of monophyly could result from an insufficiency of data to recover correct relationships, we tested alternative hypotheses based on mitochondrial DNA phylogenetic relationships using a Shimodaira-Hasegawa test [26]. Using datasets without removing indels (but excluding the poly-C/poly-T region in "Galera" samples), we enforced two distinct topological constraints for each gene: i) monophyly of all species; ii) monophyly of the three major groups recovered for Iberian and North African Podarcis by Pinho et al. [15] (group 1: P. bocagei, P. carbonelli, P. hispanica type 1A and 1B, P. hispanica type 2; group 2: P. hispanica sensu stricto, P. vaucheri, P. hispanica Tunisia and P. hispanica Jebel Sirwah; group 3: P. hispanica type 3 and P. hispanica Galera). In both of these analyses P. muralis was also included and constrained to monophyly. Because a few cases of allele proximity could at a first glance be interpreted as suggestions of recent gene flow, we also conducted the same analyses excluding these alleles (BEV7337A and BEV7337B in 6-Pgdint7; BEV7337A, BEV7337B, Get1A and Get1B in β-fibint7). All these tests suggest that enforced topologies are significantly less likely than the ones observed (p < 0.01), with the exception of the division into three groups in β-fibint7 excluding the putative introgressed alleles.
Geographical origin and mitochondrial DNA assignment of samples used in this study

Genetic differentiation and equilibrium gene flow estimates between groups
We evaluated overall genetic differentiation among mtDNA-defined species by computing F ST for the two data sets. These results are shown in Table 1. Values obtained for the mitochondrial DNA are also shown for comparison (noting, however, that "species" are, by definition, monophyletic with respect to the mtDNA, leading to necessarily high levels of differentiation). Despite the lack of monophyly, differentiation between the 12 species accounts for around 61% of genetic variation in 6-Pgdint7 and 51% in β-fibint7; these values fall to around 46% and 40%, respectively, when the outgroup P. muralis is excluded from the analyses.
Examination of pairwise Nm values in more detail ( Table  2) reveals distinct trends across species pairs; in general Nm values based on nuclear genes are different from 0, ranging from 0.01 (between P. muralis and both P. bocagei and P. hispanica Tunisian type) to 1.76 (between P. hispanica types 1A and 1B). Besides this last case, other species pairs exhibit Nm values higher than 1, suggesting large levels of gene flow under this model: P. vaucheri vs. P. hispanica types 1A and 1B, P. hispanica type 1B vs. P. hispanica type 2 and P. hispanica sensu stricto vs. P. hispanica type 3. Gene genealogies for two nuclear introns in Iberian and North African Podarcis

Application of the isolation with migration model to Iberian and North African Podarcis
Independent runs using IM converged on approximate marginal posterior probability distributions. We were able to obtain reliable estimates for θ1, θ2 and migration rates across most of the tested pairs; however, some 90% highest posterior density (HPD) intervals could not be reliably assessed either for θ or migration. The posterior probability distributions for the ancestral population sizes were generally flat or had a maximum at the lowest bin; our data do not therefore seem to incorporate enough information for these estimates to be reliable. Time since divergence scaled by the mutation rate (t) was also not recovered with confidence for a few species pairs because the distributions were flat (albeit non-zero), particularly in the comparisons involving species that are reciprocally monophyletic for both genes. Most of the curves, however, were well resolved or presented a clear peak although their tails did not reach 0; in these cases we report a value for the HPD, without obtaining a reliable 90% HPD interval.
In order to convert IM estimates into biologically meaningful values (effective population sizes, divergence times in years or migration rates) one needs an estimate of mutation rate. For the two genes included in the analyses, we have no information regarding the mutation rate and no straightforward way of calibrating a molecular clock. We therefore focused on estimates that do not require the assumption of a particular evolutionary rate: the population migration rate (2Nm) and estimates of relative divergence times (t = tµ) between species pairs. These results are given in Tables 3 and 4, respectively. In these tables, results involving P. hispanica sensu stricto and P. hispanica type 3 were estimated excluding alleles presumably introgressed from P. hispanica Galera type and P. muralis, respectively (obviously, the comparisons with the species from which the alleles were introgressed were performed on the complete data sets). Confirmation that these alleles and not others were introgressed from the other species was obtained when runs excluding them yielded migration rates of 0, in contrast to the levels estimated when they were present. In other cases of introgression, alleles that were obviously introgressed could not be a priori pinpointed with confidence.
Population migration rates obtained using this method are strikingly different from those calculated using classic estimators of gene flow. In fact, 2Nm values between species pairs are in general very low, close to zero. Most of these values are at the lower limit of resolution because the obtained migration parameter estimates (m1 and m2) correspond to the first bin of the surveyed parameter space. However, non-zero estimates of gene flow were obtained between eight species pairs. The highest population migration rates were documented from P. bocagei into P. hispanica type 1A. A relatively high (2Nm > 1) migration rate was also detected from P. hispanica "Galera" type to P. hispanica sensu stricto and from the latter to P. vaucheri; lower but still non-zero rates were also observed from P. hispanica type 1B into P. carbonelli, from P. hispanica sensu stricto to P. hispanica type 1B, from P. hispanica type 3 into P. vaucheri, from P. hispanica Tunisian type to Jebel Sirwah type, and from P. muralis to P. hispanica type 3. Curiously, all these cases involve instances of asymmetrical gene flow.
In preliminary analyses, very high levels were also suspected to occur from P. hispanica sensu stricto into P. hispanica type 3, although estimates were not completely reliable because migration curves from both runs showed two peaks: one, corresponding to the maximum likelihood estimate, reflecting high levels of migration, and a smaller peak close to zero. That is, the program could not confidently choose between the hypotheses of high or zero gene flow. To solve this situation, we performed a second set of runs with a narrower prior distribution for divergence time (based on the clear peak obtained in the 0.000 -0.228 *: migration rate significantly different from 0 (P < 0.05). ‡, + : the actual interval could not be reliably estimated because the likelihood surfaces for θ ( ‡) or m ( + ) were relatively flat. first runs). These analyses yielded unimodal curves for all parameters, suggesting virtually zero gene flow between the two forms.
Of these eight cases of non-zero migration rates, only three were found to be significantly different from zero (P < 0.05) according to a likelihood ratio test: from P. bocagei into P. hispanica type 1A, from P. hispanica "Galera" type into P. hispanica sensu stricto and from the latter to P. vaucheri.
Estimates of relative divergence times range from 1.64 between P. hispanica sensu stricto and P. hispanica type 3, to 3.83 between P. carbonelli and P. hispanica type 2.

Lack of population structure in nuclear genealogies
The most striking result emerging from this study is a complete lack of monophyly at nuclear genes in Iberian and North African species of Podarcis. Both nuclear genealogies fail to detect obvious population structure that relates to that observed in mitochondrial DNA. In fact, there is a single common pattern emerging from the study of mtDNA, allozymes and both the nuclear genealogies, which is a close relationshp between P. hispanica from Tunisia and Jebel Sirwah.
This situation of polyphyly in nuclear genealogies relative to mitochondrial data is not uncommon (see for example [27][28][29], amongst many others, for similar results). At first glance, this discordance could suggest that mitochondrial DNA-defined species of Iberian and North African Podarcis do not correspond to true evolutionary entities and that this differentiation is the result of stochastic or deterministic effects acting only on the mitochondrial genome. In fact, it has been demonstrated that quite deep phylogeographic breaks in single gene genealogies may appear in the absence of historical barriers to gene flow simply because of stochastic lineage sorting [30,31]. It has also been shown that mitochondrial DNA divergence may not be necessarily corroborated by nuclear divergence because of selection acting on mitochondrial DNA [32]. However, there is a remarkable degree of concordance between the units defined based on mitochondrial DNA [12][13][14][15] and those observed by morphological analyses [19][20][21] and multilocus nuclear data [16][17][18]. It is therefore unlikely that subdivision within Iberian and North African Podarcis is nonexistent. Why, then, do nuclear genealogies apparently not support the partitions observed in mtDNA and, in particular, in allozyme data?
A simple explanation could be a low level of diversity and consequently of resolution in estimates of relationships. However, monophyly of Podarcis mtDNA lineages has been observed even in mitochondrial genes evolving at a very slow rate (such as the control region data set analysed by Pinho et al. [15], for which ~14.5% of the positions are variable (compared to ~22% in 6-Pgdint7 and ~19.7% in β-fibint7); also, if a lack of resolution was responsible for the polyphyly of mtDNA groups, then we would expect that no significant differences would be observed in tree likelihoods with and without enforced species' monophyly. Tests involving topological constraints, however, demonstrated that this is not the case.

Distinguishing between incomplete lineage sorting of ancestral polymorphism and gene flow
Given that lack of informative variation is not a satisfactory explanation for the discrepant results observed between mtDNA and nuclear genealogies, we are left with two possible scenarios, which are not mutually exclusive: a) incomplete lineage sorting of ancestral polymorphism and b) gene flow, particularly male-biased.
In order to investigate whether, despite their polyphyly, species of wall lizards are differentiated with respect to nuclear genealogies, it becomes critical to discriminate between the influences of these two scenarios in shaping the patterns of allele sharing. In order to do so, we estimated levels of gene flow based on nuclear genealogies. Our data strongly suggest that, although gene flow may be important between some species pairs, the polyphyletic pattern mostly derives from the incomplete lineage sorting of ancestral polymorphism. This is a likely scenario because nuclear genes take on average four-times as much time to reach monophyly than mitochondrial DNA does [33]. Therefore, when differentiation is recent, there is a distinct possibility that mitochondrial lineages may not be monophyletic with respect to nuclear genealogies. Three major lines of evidence support our conclusion of a greater effect of incomplete lineage sorting:

Comparison with allozyme variation
If gene flow was the main cause for the non-monophyly of species when considering nuclear gene variation, then allozyme data would also fail to observe any differentiation. This is certainly not the case in Iberian and North African Podarcis. A recent study documented not only that species are in general highly differentiated and monophyletic when considering a population tree, but also that most species can be individualized as discrete clusters by taking into account individual multilocus genotypes [18].
There are nevertheless exceptions with this regard, since some species pairs could not be distinguished, but even if these exceptions resulted solely from introgression, the correspondent pattern in nuclear genealogies would be the sharing of alleles between these particular hybridizing forms and not others, which is not verified.

Relationships between allele age and transpecificity
A second line of evidence suggesting a major role for incomplete lineage sorting relies on the simple observation of patterns of allele relationships, particularly on the β-fibint7 data set. If gene flow was the main cause for the observed species polyphyly, we would expect that both ancestral (alleles that have a central position in the haplotype network) and derived alleles were transpecific or closely related to alleles found in other species. However, we do not detect this pattern; instead, we find that putatively older alleles are more widespread among species than alleles placed as tips, which are likely to have arisen after the species were separated. In order to visualize this trend, we plotted for the β-fibint7 data set the number of mutations separating a given haplotype from haplotype BR25, placed in the centre of the network, against the number of mutations separating that particular haplotype from the nearest haplotype found in another species (Fig.  3). The ancestral nature of haplotype BR25 was confirmed using an alternative method of haplotype network construction [34], which pinpoints the haplotype that is most probably the ancestral within a network. Although Figure  3 is a very rough representation, it clearly illustrates the relationship between allele "ancestrality" and trans-species proximity. Exceptions to this "rule" therefore constitute the most obvious cases of gene flow, as documented by the outlier position of the haplotypes BR62 and BR66 (putatively introgressed from P. hispanica Galera type into P. hispanica sensu stricto). The results are not as clear for 6-Pgdint7, both because a central haplotype cannot be pinpointed with confidence and because this gene shows more segregation between species groups. Nevertheless, there is still a positive correlation between both measures (results not shown).

Non-equilibrium estimates of gene flow between species
Inferences based on coalescent simulations in a divergence with gene flow framework [11] suggest low levels of gene flow among forms, since we estimated virtually zero levels of historical gene flow between all but eight species pairs, of which only three cases were actually found to be significantly different from zero, Moreover, only one of these corresponds to important levels of gene flow since divergence.
Taken together, these results illustrate that the polyphyletic pattern probably arises as a consequence of the incomplete lineage sorting of ancestral polymorphism, and that, even though they are non-monophyletic, Iberian and North African wall lizard species are indeed overall differentiated with respect to nuclear genealogies.

Differences between equilibrium and non-equilibrium estimates of migration rates
This study highlights how different estimates of gene flow provide different pictures of the evolutionary dynamics of a species group. On one hand, according to F ST -based estimates of Nm, most species pairs have exchanged genes since they began to diverge; indeed, only a few comparisons suggest the absence of gene flow ( Table 2). Based on these estimates, we would be led to think that gene flow is a strong evolutionary force shaping genetic variability between species of Podarcis. On the other hand, gene flow estimates based on an isolation with migration model revealed only a few cases of clear genetic exchange between species. These results need to be interpreted with caution because the isolation with migration model that was fitted to our data does not directly apply to a multispecies scenario. However, as discussed above, these results are inline with inferences based on allozyme data and with the patterns of allele sharing or proximity, sug-gesting that gene flow, although existent, has probably had a minor role in shaping these lizards' evolution.
These differences between both classes of estimators are not completely unexpected. Wright's [7] island model of migration, on which the relationship between F ST and Nm is based, is indeed subject to numerous assumptions, of which several (if not all) are hardly met in most biological systems. One assumption of particular relevance to this study is that of equilibrium between migration and drift, which is violated in populations that have recently become completely isolated. In such cases low F ST does not necessarily translate into high levels of gene flow. This is a particular concern when estimating gene flow between species rather than populations, since in these cases effective population size are typically high and migration rates low, which implies that equilibrium conditions will take a very long time to be reached [9].
The pitfalls of gene flow estimates based on F ST have been widely debated in the literature [8,9,35]. However, only a few studies have actually demonstrated the influence of these limitations in the inference of migration rates in non-equilibrium conditions [36,37]. The present study Relationship between allele ancestrality and trans-specific proximity for locus β-fibint7 Relationship between allele ancestrality and trans-specific proximity for locus β-fibint7. Ancestrality was measured as the distance, in number of mutations, to haplotype BR25, while trans-species allele proximity was measured as the distance to the closest-related allele found in a different species. The area of each circle is proportional to the allele frequency. Alleles belonging to P. muralis (including the supposedly introgressed allele found in P. hispanica type 3) were excluded. Other putatively introgressed alleles are highlighted. Noteworthy, this warning is valid not only for F ST -based estimates but for any method that assumes equilibrium migration, even if other problems inherent to F ST are minimized [38].

Evidence for historical gene flow among Podarcis species
Although our results argue in favour of a major role for incomplete lineage sorting in shaping the observed polyphyletic patterns in nuclear gene genealogies, we identified a few species pairs that have probably been exchanging genes.
Very high levels of admixture were detected from P. bocagei to P. hispanica type 1A. Albeit lower, still significant levels were detected from P. hispanica "Galera" type to P. hispanica sensu stricto and from P. hispanica sensu stricto into P. vaucheri. Excluding the latter, these cases do not constitute a surprise since suggestions of hybridization and introgression between these species pairs have been described before. Concerning P. bocagei and P. hispanica type 1A, these were based on individual multilocus genotype clustering [18] and the observation of interspecific matings in nature (R. Ribeiro and M. A. Carretero, pers. comm.). Similarly, gene flow between P. hispanica "Galera" type and P. hispanica sensu stricto had also been reported based on discordance between mitochondrial and morphological variation, with some additional evidence from nuclear data [23].
Our results suggest that gene flow seems to have been more frequent between forms that are sympatric at least in part of their distribution (P. bocagei/P. hispanica type 1A; P. hispanica "Galera"/P. hispanica sensu stricto), which is not totally unexpected. However, we also find strong support for gene flow events between P. vaucheri and P. hispanica sensu stricto. These two species still have poorly studied distribution limits, so it is yet unknown if they establish contact zones or if they are completely allopatric. Nevertheless, in theory present contact should not be a prerequisite for gene exchange; given that species have most likely suffered repeated pulses of range expansions and contractions due to geological or climatic events, presently allopatric pairs of species may have met at some point in their past and then hybridized.
Besides spatial considerations, other results regarding introgression between species are noteworthy. First, in the case of P. bocagei and P. hispanica type 1A, the inferred levels of admixture were surprisingly high, above the values considered as thresholds for the maintenance of genetic differentiation [7]. However, the two species are sympatric and nevertheless maintain their morphological identifiability. It has also been shown that males discriminate conspecific from heterospecific females based on chemical stimuli, which suggests at least some degree of assortative mating [39]. Furthermore, the two species correspond to clearly distinct genetic clusters as shown by almost perfect assignment using individual allozyme multilocus genotype clustering approaches [18]. It therefore seems that the present estimates of gene flow between these two species are slightly inflated; this unexpected outcome could result from our limited sampling and therefore needs further clarification.
Second, we should discuss in more detail the situation of P. hispanica sensu stricto. It has been suggested that the divergent mitochondrial clade that we refer to as P. hispanica sensu stricto is a "ghost" lineage; in other words, this divergent mtDNA clade would be an extant signature of a nowadays extinct species whose mtDNA survived in the nuclear background of other species inhabiting its former area of distribution (namely P. hispanica Galera type and P. hispanica type 3) [23]. Similar scenarios of mtDNA persistence through interspecific capture and nuclear dilution have in fact been shown to occur in several other cases [40,41], so this hypothesis would not be completely farfetched. However, our results do not agree with this scenario. On one hand, we find that significant amounts of gene flow have indeed occurred between the mtDNA-defined P. hispanica sensu stricto and P. hispanica Galera type. However, these levels do not seem to have been large enough for the two forms to stand out as undifferentiated. The same situation is suggested by analyses of allozyme variation, which report high levels of differentiation and a complete lack of gene flow between nearby interspecific populations of these two taxa [18]. On the other hand, with respect to P. hispanica type 3, although this was unclear in preliminary runs, IM returned very low population migration rates in both directions. This therefore contrasts with analyses of allozyme data, in which these two forms were found to be non-diagnosable [18]. Because data with respect to the nuclear differentiation of mitochondrial DNA lineages from eastern Iberian Peninsula thus remains controversial, this subject clearly needs further assessment.
Third, although migration from P. muralis to P. hispanica type 3 was found to be non significant according to the test proposed by Nielsen and Wakeley [10], there is strong evidence suggesting that gene flow may have occurred between these two species. In fact, an allele separated by a single mutation from those typical of P. muralis alleles, which in turn are distinct by several mutations from the ingroup, was detected in an individual with P. hispanica type 3 mtDNA. For this to correspond to an allele shared by incomplete lineage sorting, it would imply that a single mutation had occurred in either of the lineages since the divergence of these two species (over 10 million years ago [42]), which is unlikely. We therefore believe this to be a false negative for the occurrence of migration, which the conservative significance test is prone to produce [10].
Although the confirmation of this result requires further investigation, this may imply that species of Podarcis take a long time of divergence to acquire complete reproductive isolation.
Nevertheless, this study should not be taken as a definitive assessment of gene flow among Podarcis species. First, because other false negatives may exist among the species pairs in which IM identified non-trivial migration rates. Second, and most importantly, because our limited sample size does not allow pinpointing with complete certainty which species have exchanged genes and which have not. We may infer from this work that historical gene flow among Iberian and North African Podarcis has been low, yet existent; however, a detailed analysis of the patterns of genetic exchange across species will require the detection and thorough characterization of contact zones.

Implications regarding the evolutionary patterns of Iberian and North African Podarcis
A primary observation that can be drawn from this work is that single nuclear genealogies do not reflect a branching pattern related to putative isolation in allopatry that led to the formation of the observed species, mainly because of shared ancestral polymorphism. Hudson and Coyne [43] estimated that, assuming that mutation and drift are the only evolutionary forces operating, reciprocal monophyly should be attained at ~95% of nuclear loci 9N to 12N generations after the population split. It is noteworthy that the minimum separation time between the various lineages has been estimated to be around 5 million years (i.e ~2 million generations), based on mtDNA phylogenetic analyses. Assuming that these estimates are correct, the fact that mtDNA lineages are not reciprocally monophyletic at the nuclear level suggest that the majority, if not all, of extant Iberian and North African Podarcis lineages were able to maintain a relatively high historical effective population size throughout their presumably eventful history, which allowed sorting of mtDNA lineages to monophyly without the corresponding pattern in nuclear genes. We hypothesise that larger effective population sizes in Podarcis might explain the contrasting patterns of subdivision observed at nuclear loci between wall lizards and, for example, the Iberian newt Lissotriton boscai. This species complex is characterized by similarly old or even younger mtDNA splits [44], but in this case the various groups have reached almost complete reciprocal monophyly at the β-fibint7 locus [45].
Pinho et al. [18] suggested that contrasting evolutionary scenarios inferred for the evolution of the genus on the basis of mitochondrial vs. nuclear markers (i.e. step-bystep speciation vs. radiation) could be accommodated taking into account the different effective population sizes that characterize both classes of markers. Indeed, speciation may have occurred at a rate fast enough so that nuclear gene genealogies are uninformative regarding species relationships but mitochondrial genealogies are not.
In fact, although multi-species clades in Pinho et al.'s [15] mitochondrial phylogenetic assessment of relationships are supported by high bootstrap values, internal branches are short, which is consistent with a fast speciation rate. Accordingly, our estimates of divergence time between mtDNA-defined lineages on the basis of nuclear markers do not correlate to mitochondrial DNA branching patterns; for example, the largest time since divergence was inferred for the mitochondrial sister pair P. carbonelli/P. hispanica type 2.
An intriguing characteristic of both nuclear gene genealogies isthe detection of a skew towards rare alleles, as suggested by largely negative and significant Tajima's D.
Although such a multilocus pattern can usually be interpreted as a signature of demographic growth, it has been shown that the same patterns may also arise as a consequence of fine-scale population structure, which is translated as a bias into sampling schemes such as ours, in which few individuals from multiple populations are analysed [46].

A gradual view of speciation
With the analyses of nuclear genealogies in Podarcis wall lizards from the Iberian Peninsula and North Africa, we have provided compelling evidence for general historical reproductive isolation among distinct lineages despite their non reciprocal monophyly. These results are therefore in accordance with emergent morphological data and previous reports of mitochondrial and nuclear gene variation suggesting that these forms may correspond to different, almost completely reproductively isolated species. We have also provided evidence for the existence of limited to important levels of gene flow among a few species pairs, also in accordance with previously published data. Although the effect that such admixture processes may have in species differentiation remains to be completely acknowledged, preliminary work on the contact zone between P. bocagei and P. carbonelli suggests spatiallyrestricted genetic exchange and a bimodal hybrid zone concordant with the existence of strong barriers to gene flow [47].
The lack of reciprocal monophyly and the permeability to gene flow implies that wall lizard lineages will fail to be considered good species in the light of most species con-cepts. Nevertheless, it appears that most forms remain differentiated despite the merging influence of gene flow. Our results in Podarcis wall lizards therefore add to recent evidence stemming from the study of other incipient species groups (such as heliconiine butterflies [48] for example) illustrating that the acquisition of reproductive isolation is a gradual process and contributing to the establishment of a "species with fuzzy borders" paradigm.
Podarcis wall lizards in the Iberian Peninsula and North Africa present a wide variety of distribution patterns, including sympatric, parapatric and allopatric forms and will therefore constitute an excellent model for testing hypotheses related to the origin and maintenance of reproductive isolation in closely related species.

Conclusion
Along with a growing body of work focusing on recent species complexes, this study supports a progressive view of speciation, according to which species may persist as differentiated despite remaining porous to genetic exchange and having poorly-defined genetic boundaries for a very long period of time after divergence. Such recent species complexes often have not yet achieved equilibrium between migration and drift, a condition which will strongly influence the outcome of studies aiming at quantifying gene flow if recent divergence is not taken into account.

Sampling and DNA extraction
Currently, 11 evolutionary units have been identified using mitochondrial DNA within the Iberian Peninsula and North Africa clade of Podarcis. The species used here as an outgroup, P. muralis, is phylogenetically distinct from the other forms [42,49] and has a wide distribution range throughout Europe, including Northern Iberian Peninsula. Our sampling scheme was designed to incorporate a minimum of five individuals per species, morphotype or mtDNA lineage ( Table 5). The majority of individuals were preliminarily identified and assigned to an evolutionary group using morphological criteria. Because some mtDNA lineages have not been studied in detail morphologically and others that have are indistinguishable, our final assignment was based on mitochondrial DNA sequencing (see below). We will, from here on, refer to the defined groups as "species" for a matter of simplicity. It should be noted, however, that the specific status of some of these groups still remains to be evaluated. The sampling scheme was as representative of the species distributions as possible. In total, 78 individuals were analysed. The geographical locations of the samples, as well as a preliminary distribution map of mitochondrial lineages are shown in Figure 1. Samples consisted of a small tail clip, obtained taking advantage of the lizard's natural tail autotomy capacity. All lizards were released after sample collection. Samples were stored in ethanol or frozen at -80°C prior to DNA extraction, which was accomplished following standard protocols [50]. All mitochondrial DNA sequences from P. bocagei, P. carbonelli and P. vaucheri individuals, as well as several individuals from other species (identified in Table 5) were generated for previous studies on mitochondrial variation [15,51].

DNA amplification, sequencing and haplotype determination
We generated sequences from two nuclear genes: β-fibrinogen intron 7 (β-fibint7) and 6-phosphogluconic acid dehydrogenase intron 7 (6-Pgdint7). To establish a correct assignment of all individuals to a mitochondrial DNA lineage, we further sequenced a mitochondrial fragment comprising the 3' end of the NADH dehydrogenase subunit 4 gene and adjacent tRNAs (from here on referred to as ND4). A list of the primers that were developed for amplification of the three studied genes is given in Table 6.
Amplification and sequencing of the ND4 gene was accomplished using primers ND4 and Leu [52] and followed the conditions described by Pinho et al. [15]. In this work the authors reported the existence of a nuclear pseudogene of ND4 in the P. hispanica morphotype from the population of Galera that was amplified instead of the mitochondrial fragment (see [15] for details). To avoid the amplification of this nuclear copy we developed primers targeting only the mitochondrial sequence observed in this morphotype (primers GalND4F and GalND4R, described in Table 6). Sequences obtained through the use of these primers were apparently single-copy and similar to the mitochondrial sequence inferred by Pinho et al. [15].
At a preliminary stage, amplification of the β-fibint7 gene was accomplished using primers FIB-B17U and FIB-B17L [53] under the conditions described by Godinho et al. [54], but lowering the annealing temperature to 50°C. However, because amplification and sequencing success was low with these primers, we tried several other combinations of primers, both already published and newly designed for this study. The definitive amplification and sequencing of this gene was accomplished using primer BFXF described in Sequeira et al. [55], as well as a new primer (BF8) designed considering the alignment of sequences from Podarcis with those from several Lacerta species [54]. Amplification was carried out in 20 µL volumes, containing 2 µL 10× reaction buffer (Ecogen), 2 mM MgCl 2 , 0.4 mM each dNTP, 0.2 µM each primer, 1 unit of Ecotaq DNA polymerase (Ecogen) and approximately 100 ng of genomic DNA. Amplification conditions consisted of a pre-denaturing step of 3 min at 92°C fol- This protocol failed to amplify some of the samples, including all P. hispanica with the Galera mtDNA type. We therefore designed an internal primer, BfibR, which was used together with primer BF8 under the same conditions described above.
The locus 6-Pgdint7 was chosen for this work based on very high polymorphism levels observed for the studied lizards in the allozyme encoded by the 6-Pgd gene [16][17][18]. We began by aligning mRNA sequences for several organisms (Homo sapiens, accession number BC000368, Mus musculus, accession number BC011329, Danio rerio, accession number AY391449 and Lasaea australis, accession number AF345495). Highly conserved regions situated on human exons 7 and 8 were targeted to design multiple degenerate primers for exon-primed, introncrossing (EPIC) PCR [56]. Several combinations of these were tried on a limited number of Podarcis samples under gradients of annealing temperatures and MgCl 2 concen-trations. The beginning, supposedly exonic portions of the amplified fragments revealed a high degree of homology with the mRNA sequences used to construct the initial alignment, suggesting that the sequenced genomic portion was indeed intron 7 of the 6-Pgd gene. New primers, named PgdP7F and PgdP8R, specific for Podarcis, were then designed in the exons and allowed successful amplification of this gene region. Optimal amplification was achieved using the same conditions described above for β-fibint7, with the exception that some samples required annealing temperatures of 60°C. An internal reverse primer, PGD500, was also designed for sequencing purposes.
The PCR products were enzymatically purified and sequenced using the ABI Prism BigDye Terminator Cycle sequencing protocol in an ABI Prism 310 automated sequencer (Applied Biosystems) with the same primers used for amplification.
We used two approaches to resolve the haplotype phase of nuclear DNA sequences. First, for sequences that were het-    erozygous for insertions or deletions, we used the method described by Flot et al. [57]. Second, we used the Bayesian algorithm implemented in the PHASE software [58], using known phases of haplotypes determined by the previous method. Several individuals were sequenced for which the haplotype phase could not be completely determined; these individuals were not included in subsequent analyses and for simplicity are not represented in Table 5 and Figure 1. Sequences were aligned manually using BIOEDIT v. 7.0.5.2 [59]. For nuclear genes, each allele was coded with the name of the individual carrying it following by the letters A or B.

Mitochondrial DNA phylogeny
We used an alignment of 536 bp of the ND4 gene to carry out phylogenetic analyses under a maximum likelihood approach using GARLI version 0.95 [60,61]. This software allows simultaneous estimates of tree topology, branch lengths and model parameters under the general-timereversible model of sequence evolution.

Nuclear gene genealogies
To illustrate relationships between haplotypes, a medianjoining network [62] was built using NETWORK [63]. We preferred this method of representing genealogies rather than phylogenetic trees because the levels of polymorphism observed at both nuclear genes were much lower than those observed in mitochondrial DNA. For these situations, haplotype networks constitute a better way of representing relationships [64]. Because multiple-base insertions or deletions are likely to have resulted from a single evolutionary step, we pruned the data in order to leave only the first base of the indel. We chose this approach rather than completely removing indels because this would significantly reduce the number of polymorphic sites used to build the network and disregard much of the information contained in the data sets. Nevertheless, some segregating positions located within areas of indels were removed by this pruning method.
In order to test alternative hypotheses (see Results) relative to the topology of the nuclear gene genealogies, we also searched for ML trees that better represented the phylogeny of the observed haplotypes using GARLI. Under the model of sequence evolution derived from these analyses, we performed new ML searches enforcing a particular topology using PAUP* 4.0b10 [65] and compared the resulting trees' likelihoods with those obtained from unconstrained analyses using Shimodaira-Hasegawa tests [26] under the RELL approximation with 1000 bootstrap replicates. This step was conducted using the full data sets (i.e. without removing indels from the alignment).

DNA sequence polymorphism, population subdivision and recombination
All the following analyses were performed using reduced data sets from which all indels were eliminated. We calculated summary diversity statistics (number of haplotypes, number of segregating sites, nucleotide diversity, π, and the population mutation parameter θ [66] for the three data sets and separately for each partition. We also computed Tajima's D [67] to test for non-neutral evolution of the analysed data-sets. We also computed values of Hudson et al.'s [68]F ST between all species pairs. These analyses were conducted in DNASP [69]. To evaluate the possibility of recombination in the nuclear genes, we computed Hudson and Kaplan's [24] Rm statistic (representing the minimum number of recombination events) using DNASP. Because this statistic is likely to be highly affected by homoplasy, we also used the software PHIPACK [70] to test for recombination using the pairwise homoplasy index (Φ w statistic) of Bruen et al. [25]. This statistic is a powerful detector of recombination and has been shown by simulation studies to be less sensitive to the effects of mutation rate correlation than other available statistics, which are prone to falsely infer recombination when levels of recurrent mutation are high [25]. PHIPACK provides P-values for the acceptance of the null hypothesis of no recombination. We used the two available options to estimate such P-values: an analytical approach and a permutation test (1000 permutations).

Estimation of migration rates between species
We used two different approaches to assess levels of gene flow between forms. First, Nm values were inferred from F ST according to Wright's [7] island model of population structure using the expression F ST ≈ (1+2Nm) for mitochondrial DNA and F ST ≈ 1/(1+4Nm) for nuclear autosomal loci. Traditional, F ST -based methods of estimating Nm make many unrealistic assumptions, among which is the equilibrium between drift and migration. Species that have recently diverged often share much of their genetic variation not only due to gene flow but also because of retention of ancestral polymorphism. In these cases, obtaining realistic estimates of migration between populations is challenging. Nielsen and Wakeley [10] developed a model which takes into account population divergence and gene flow in the same framework, thus being appropriate to disentangle between the relative effects of isolation and migration in shaping the patterns of variation among diverging species. This model was extended to incorporate multilocus data by Hey and Nielsen [11], as implemented in the IM computer program, which works under a two-population model and uses a Markov Chain Monte Carlo approach to estimate six parameters scaled by the neutral mutation rate: effective population sizes for both extant populations and their ancestor, time since divergence and migration rates on both directions. This model was developed for a simple two-population scenario and therefore does not directly apply to the complexity inherent to divergence within Iberian and North African Podarcis, in which eleven mitochondrial DNA lineages have been described. Nevertheless, because no applications of this model have been developed to deal with more than two populations, we performed pairwise analyses to assess historical migration rates between all species pairs. We included the outgroup, P. muralis, in these analyses, because preliminary inspection of sequence data for one of the studied gene regions suggested that introgression with the ingroup could have occurred (see Results). Several analyses involving more than two species have been successfully performed using IM or similar methods [29,36,[71][72][73]. It should be kept in mind, however, that using a two population model influence from other populations might have an unpredictable effect on our estimates. In order to minimize such effect, we performed a second set of runs for comparisons involving P. hispanica sensu stricto and P. hispanica type 3, excluding alleles that were apparently introgressed from other species (see Results).
Since IM cannot accommodate gaps in DNA sequence data, we pruned all gaps and missing data from the data sets. An assumption made by IM is no recombination; because we had no evidence for recombination in our data sets using Bruen et al. 's [25] method (see Results), we used the complete data sets in the analyses. We used the HKY [74] model in all analyses. After several experimental runs to assess appropriate parameter settings and ensure proper mixing, IM was run for each two-species data set for 20 million steps along the Markov Chain after 1 million steps of burn-in (using the ramped heating scheme, option -bh), with 5 Metropolis-coupled chains with linear heating. For each data set, a second, smaller run with 10 million steps using the same options and a different random seed was used to confirm convergence of parameter estimates.
To investigate whether estimates of migration rates were significantly different from zero, we compared the value of the likelihood for zero migration with the maximum likelihood for this parameter, by calculating the log-likelihood ratio between the two. The distribution of the loglikelihood test statistic (minus two times the log-likelihood ratio) can be approximated taking into account the distribution of a random variable which takes the value 0 with probability 0.5 and takes on a value from a χ 2 distribution with 1 degree of freedom with 0.5 probability [10].