Highly polymorphic mitochondrial DNA and deceiving haplotypic differentiation: implications for assessing population genetic differentiation and connectivity

Background Hyperdiverse mtDNA with more than 5% of variable synonymous nucleotide sites can lead to erroneous interpretations of population genetic differentiation patterns and parameters (φST, DEST). We illustrate this by using hyperdiverse mtDNA markers to infer population genetic differentiation and connectivity in Melarhaphe neritoides, a NE Atlantic (NEA) gastropod with a high dispersal potential. We also provide a recent literature example of how mtDNA hyperdiversity may have misguided the interpretation of genetic connectivity in the crab Opecarcinus hypostegus. Results mtDNA variation surveyed throughout the NEA showed that nearly all M. neritoides specimens had haplotypes private to populations, suggesting at first glance a lack of gene flow and thus a strong population genetic differentiation. Yet, the bush-like haplotype network, though visually misleading, showed no signs of phylogeographic or other haplotype structuring. Coalescent-based gene flow estimates were high throughout the NEA, irrespective of whether or not mtDNA hyperdiversity was reduced by removing hypervariable sites. Conclusions Melarhaphe neritoides seems to be panmictic over the entire NEA, which is consistent with its long-lived pelagic larval stage. With hyperdiverse mtDNA, the apparent lack of shared haplotypes among populations does not necessarily reflect a lack of gene flow and/or population genetic differentiation by fixation of alternative haplotypes (DEST ≈ 1 does not a fortiori imply φST ≈ 1), but may be due to (1) a too low sampling effort to detect shared haplotypes and/or (2) a very high mutation rate that may conceal the signal of gene flow. Hyperdiverse mtDNA can be used to assess connectivity by coalescent-based methods. Yet, the combined use of φST and DEST can provide a reasonable inference of connectivity patterns from hyperdiverse mtDNA, too.


Background
Assessing the amounts and patterning of spatio-temporal genetic diversity within and among populations provides essential information on the evolutionary dynamics of population structuring and speciation. Yet, very large amounts of genetic variation, i.e. genetic hyperdiversity where variable synonymous nucleotide sites can exceed 5%, may bias population genetic interpretations, due to the complex relationship between the statistics used to estimate genetic differentiation and processes that produce genetic differentiation [1,2]. Genetic hyperdiversity is more common than currently appreciated and occurs in at least 43% of animal species [3]. Genetic markers differ in their levels of variability, for example nuclear DNA microsatellites are often genetically very variable [4,5], while mitochondrial DNA (mtDNA) usually reveals more moderate amounts of genetic variation. Nevertheless, the marine, rock-dwelling, planktonic-dispersing gastropod Melarhaphe neritoides (Linnaeus, 1758) (Gastropoda: Littorinidae) shows hyperdiverse mtDNA [3].
Planktonic dispersers spread as planktonic larvae during early life stages and subsequently become sedentary after settlement. Planktonic dispersers with a long pelagic larval duration (PLD), such as M. neritoides (PLD = 4 to 8 weeks) [6], are expected to display long-distance dispersal and high rates of gene flow, and to show little, if any, population genetic differentiation even over thousands of kilometres [3,[6][7][8]. However, at least three methodological issues may blur the exploration of these paradigmatic expectations in species with hyperdiverse mtDNA: (1) Highly variable genetic markers whose mutation rate (μ) is similar or higher than the levels of gene flow, i.e. N e μ ≥ N e m, violate the assumption of a negligible μ [9][10][11][12]. This results in a low F ST (and relatives) values that reflect the influence of mutation [F ST = 1/(1 + 4N e *μ)] instead of the influence of migration [F ST = 1/(1 + 4N e *m)]. Indeed, although mutations lead to genetic differentiation among populations and hence increase F ST , high μ generates numerous private alleles with low frequency as well as high within-population genetic diversity, which are two quantities that may bias genetic differentiation estimates in terms of F ST . On the one hand, F ST is restricted to values much less than 1 (mean maximum F ST ≈ 0.3585) if the frequency of the most frequent allele is low (near zero) or high (near 1) [13][14][15][16]. On the other hand, F ST drops to zero when within-population diversity is high [1,2,[17][18][19][20][21][22][23][24][25][26][27][28][29][30][31][32][33]. (2) Conversely, highly variable genetic markers may deceivingly suggest population genetic differentiation by concealing gene flow, because high μ may provoke a shortfall of shared haplotypes among populations and/or require unrealistic sample sizes to detect shared haplotypes [3].
(3) F ST -based methods may produce biased estimates of population genetic connectivity with highly variable markers [11,18,21,26,[32][33][34]. In contrast, gene genealogy-based methods, such as implemented in the MIGRATE-N software, are suited to accommodate highly polymorphic data and produce reliable population genetic connectivity estimates over the whole spectrum of mutation rates, because the coalescent process is not dependent on mutation rates [34][35][36][37]. Gene genealogybased methods use raw sequence data, to estimate genealogies and convert coalescent times between pairs of alleles into amounts of gene flow that would result in a similar distribution of alleles in gene genealogies. Therefore, gene genealogy-based methods can be used to assess the influence of high μ on estimates of population genetic differentiation inferred from frequency-based parameters.
Melarhaphe neritoides has hyperdiverse mtDNA with an extremely high haplotype diversity (Hd = 0.999 ± 0.001) and a high neutral nucleotide diversity (π syn = 6.8%) for 16S, COI and Cytb in the Azores [3], and for COI (Hd = 0.998; π syn = 7.6%) at the Galician coast [38]. This is mainly explained by an extremely high mtDNA mutation rate (μ = 1.99 × 10 − 4 mutations per nucleotide site per generation or 5.82 × 10 − 5 mutations per nucleotide site per year, at the COI locus) [3]. The species is distributed throughout the North East Atlantic (NEA), from Southern Norway to the Canary Islands i.e. over approx. 4000 km, and even up to 5500 km if the Cape Verde Islands are included [39][40][41], and over a West-East beeline distance of 6000 km from the Azores in the Atlantic to Lebanon in the eastern Mediterranean and into the Black Sea [42][43][44]. Within the Azores, M. neritoides revealed no shared mtDNA haplotypes among populations [3], thus deceivingly suggesting a lack of gene flow. Yet, in view of the confounding issues listed above, it is necessary to assess the influence of mtDNA hyperdiversity in M. neritoides on this observation and to check how patterns of population genetic structuring in M. neritoides are manifested over larger geographic scales. Hence, the present study aims at exploring to what extent the hyperdiverse mtDNA of M. neritoides influences the assessment of population genetic differentiation and connectivity in this species. It does so by: (1) assessing mtDNA differentiation among populations at several spatial scales within the range 1-6000 km to test for panmixis throughout the NEA, (2) comparing scenarios of gene flow among three oceanographic areas in the distribution range of M. neritoides, viz. the Azores, the NEA coast and the Mediterranean Sea, and quantifying coalescent-based gene flow among these oceanographic areas, and (3) illustrating the influence of mtDNA hyperdiversity in the estimation of population genetic differentiation and connectivity, by comparing estimates of population genetic differentiation and gene flow using mtDNA data with different amounts of polymorphism.

mtDNA population differentiation
In the total population (original hyperdiverse dataset A), G ST and φ ST reveal very low, but significant differentiation (G ST = 0.001, p = 0.02; φ ST = 0.005, p = 0.04), whereas N ST suggests no significant differentiation (N ST = 0.004, p = 1.00). So, haplotype frequencies are usually similar among sampling sites (Table 2).
In contrast, Morisita's unbiased dissimilarity index is significant (D EST = 0.679, CI = 0.664-0.688) and shows strong haplotypic differentiation in the total population. This means that haplotypes are usually distinct among the 11 sampling sites, with complete haplotypic differentiation (D EST = 1) in 47 of the 55 pairs of sampling sites, including the two geographically closest sites SM2 and SM3 that are 1.2 km apart. Five out of 390 haplotypes occur in more than one individual (H s = 4 and H w = 1) ( Table 1). Since one of these haplotypes is shared by two individuals of the same sampling site (H w = 1), only four haplotypes are shared among sampling sites (H s = 4), viz. within AZOR (between FLO and SM2), within ATCO (among the three sampling sites), between AZOR and ATCO (among PIC, POR, SCO, SPA), and between AZOR and MEDI (between FLO and RHO). The most common haplotype (hap 108) is shared between AZOR and ATCO, with a low frequency of 0.0125 (Table 3). No haplotypes are shared between ATCO and MEDI.    Therefore, of the 390 haplotypes, the vast majority is private to sampling sites (H p = 386 out of 390 haplotypes) and involves 96.7% of the 399 individuals sequenced ( Table 1). Four of the 11 sampling sites (FAI, SM1, SM3 and SMA), located in the Azores, share no haplotypes with other sampling sites (H s = 0). The non-hyperdiverse mtDNA dataset B provides the same picture of low but significant (only φ ST ) differentiation among sampling sites in the NEA (Table 2). However, haplotypic differentiation in the total population disappears (D EST = 0.026, CI = 0.000-0.100, i.e. not significantly different from zero), due to the reduced mtDNA variability in dataset B and the larger proportion of shared haplotypes (17% against 1% in dataset A).
We assessed population genetic differentiation at several spatial scales within the range 1-6000 km over the NEA basin among the three oceanographic areas ATCO, AZOR and MEDI (Table 4). At large scale, over the entire NEA, the AMOVA (dataset A) shows no significant differentiation among sampling sites (φ SC = 0.003, p > 0.05) or among the three areas AZOR, ATCO and MEDI (φ CT = 0.004, p > 0.05) ( Table 4).
Very low but significant differentiation is detected at the within-sampling site level using the hyperdiverse dataset A (φ IS = 0.007, p = 0.03), which is not an artefact of mtDNA hyperdiversity since identical results are obtained using the non-hyperdiverse dataset B (φ IS = 0.007, p = 0.03). The φ IS index reflects high variation among individuals of the same sampling site (σ = 99.31%) and not differentiation among sampling sites of the three areas or among areas. Indeed, at all spatial scales, the AMOVAs show that > 99% of the variation is due to within-sampling site variation and not to among-sampling site differentiation (< 1%). Moreover, none of the pairwise large-scale area comparisons of genetic differentiation (φ ST ) show significant values (Table 3). At smaller spatial scales, no population genetic structure is detected, neither among Azorean islands (100-550 km), nor among sampling sites on the same shore (1.2 km). Hence, these data suggest that there is no genetic differentiation among sampling sites at any scale over the entire species' distribution range.

Population genetic connectivity
Gene flow estimates with MIGRATE-N applied to datasets A and B suggest that M. neritoides complies with a Table 4 φ-based hierarchical AMOVA showing mtDNA differentiation among and within sampling sites of Melarhaphe neritoides in the NEA For each AMOVA are given the spatial scale (in parenthesis), the percentage of among-group variance or within-group variance (σ), the φ-statistic (φ, significant values marked with * for p < 0.05) and the associated probability of significance (p). For the abbreviation of geographical groupings and sampling sites names, see Fig. 3.
panmictic model, and hence that the species behaves as a single panmictic population over its entire distribution range. Indeed, M5 has the lowest log marginal likelihood of the six gene flow models tested, and the highest probability (p = 0.755) ( Table 5) Assessing genetic connectivity using the MIGRATE-N analysis of the non-hyperdiverse dataset B yields the same ranking of gene flow models (Table 5), in less computing time (5 to 24 days) than dataset A (18 to 34 days), but increases the model probability of the first-ranked model M5 to the maximal value (p = 1) and drops that of the second-ranked model M6 to near-zero (p = 1.66 × 10 − 47 ).
The hyperdiverse mtDNA data, i.e. the combined 16S-COI-Cytb dataset A and the single COI and Cytb genes, all show bush-like haplotype networks (Fig. 2  a, b, c) of private haplotypes represented by single individuals (i.e. singletons) and very few shared haplotypes among sampling sites (sectored circles), a pattern characteristic of DNA hyperdiversity. Intuitively, such pattern would not be associated with a strong signal of gene flow and population connectivity. Yet, it is exactly a pattern one would expect for high gene flow and strong connectivity [45]. Moreover, the lack of an association between haplotype relationship and geography in the networks suggests the absence of phylogeographic structure in M. neritoides in the NEA, which is also supported by the non-significant difference between N ST and G ST (N ST -G ST = 0.003), indicative of no phylogeographic signal [46]. The impact of mtDNA hyperdiversity becomes clear in the haplotype networks of the non-hyperdiverse combined 16S-COI-Cytb dataset B and the single 16S data, showing a classic star-like pattern typical of population expansion and high gene flow [47], where most new haplotypes arise by recent mutation events from a central widespread haplotype (Fig. 2 d, e).

Genetic hyperdiversity and assessing population genetic differentiation
The present study confirms that mtDNA in M. neritoides is hyperdiverse (π syn ≥ 5%), not only in the Azores  and Galicia [3], but all over the NEA. This mtDNA hyperdiversity results in an overwhelming number of private haplotypes and a paucity or lack of shared haplotypes among sampling sites as close as 1.2 km. Despite this nearly complete haplotypic differentiation (D EST ) among sampling sites, there is no significant pairwise population genetic differentiation (φ ST ). Yet, in the absence of hyperdiversity (dataset B), the haplotypic differentiation drops to zero, and thus showing the effect of mtDNA hyperdiversity on D EST . Therefore, when using hyperdiverse mtDNA markers, population genetic differentiation in terms of lack of haplotype sharing may be substantial, but is not indicative of population genetic differentiation in terms of fixation of alternative haplotypes, i.e. D EST ≈ 1 does not a fortiori imply φ ST ≈ 1. From a practical point of view, hyperdiverse mtDNA may require unrealistically high sampling efforts to detect haplotypes more than once and to reliably assess haplotype sharing among populations [3]. This phenomenon is best explained by the high mutation rates in hyperdiverse mtDNA, which generate numerous private haplotypes with low frequency that provoke a high within-population genetic diversity [3] influencing D EST , but not φ ST [48]. mtDNA hyperdiversity represents the upper boundary of intra-specific genetic variation, and allowed us to use φ ST and D EST at a limit of their applicability, i.e. for extreme intra-population variation. mtDNA hyperdiversity reveals that φ ST (and related indices such as F ST and G ST ) reliably measures population genetic differentiation in terms of dissimilarities in the frequencies of shared haplotypes and degrees of fixation of alternative haplotypes among populations, whereas D EST reliably measures differentiation in terms of lack of haplotype sharing among populations. This is in accordance with the use of F ST recommended by Wright [18, page 82], and the use of D EST intended by Jost [26]. The two indices thus measure two different, but complementary characteristics of population genetic differentiation.

Is Melarhaphe neritoides panmictic?
Our assessment of population genetic differentiation in M. neritoides in the NEA, based on mtDNA markers that are far more variable than Johannesson's [6] allozyme data, confirms that the pattern of broad-scale allozyme homogeneity between Cretan and Swedish populations of this species [6] is not the result of the lower variability of the allozyme data.
At large scales (2000-6000 km), no significant genetic differentiation is detected among sampling sites within (pairwise φ ST , φ ST , G ST and φ SC ) and between (φ CT ) oceanographic areas, indicating that there is no mtDNA differentiation in M. neritoides throughout the NEA. Yet, a small amount of differentiation is detected at the intra-population level (φ IS ), i.e. among individuals within sampling site. Intra-population variation without inter-population differentiation reflects the very high diversity of haplotypes within sampling sites, and besides, may be a sampling artefact since the Scottish population in ATCO has a smaller sample size (N = 18) than any other population (N = 32 to 43). As such, its haplotype composition may be more biased than elsewhere due to the extremely high haplotype richness of M. neritoides [3]. At smaller scales (1.2 km, 100 km, 550 km), our results also show no mtDNA differentiation at all among sampling sites. This was also reported at a very small scale (30 m) between upper and lower shores in Silleiro, Spain [38]. Thus, M. neritoides shows no sign of population genetic structure and, although we note that selection is potentially acting on M. neritoides mtDNA and may bias gene flow estimates by violating the assumption of neutrality which underlies the coalescent model of MIGRATE-N [11,34], our results suggest that M. neritoides is panmictic over the entire NEA basin.
The Atlantico-Mediterranean transition (defined here as the area encompassing the Gibraltar Strait, the Almeria-Oran Front and the Siculo-Tunisian Strait) and the English Channel potentially form barriers to dispersal, and hence possible phylogeographic breaks for planktonic-dispersing species [49][50][51]. Yet, our study did not find any evidence of barriers to gene flow or phylogeographic breaks over the entire NEA basin.

Hyperdiverse mtDNA and assessing gene flow
To the best of our knowledge, the present work is the first gene flow and genetic connectivity estimation in a marine gastropod over its entire geographic range in the NEA using a coalescent approach. The quantitative assessment of gene flow in M. neritoides, based on gene genealogies using MIGRATE-N, shows substantial gene flow within the whole NEA basin (N e m = 558 to 4419). This high rate of gene flow counteracts genetic drift and provokes spatio-temporal homogeneity of the species gene pool. Although global within the NEA, gene flow appears strongly directed eastward from the Atlantic towards the Mediterranean, than westward from the Mediterranean to the Atlantic. In this eastward gene flow pattern from the Atlantic to the Mediterranean, the Atlantic European coasts seem to act as a stepping-stone for gene flow from more western Atlantic areas such as the Azores. This pattern was apparent with data showing high mtDNA polymorphism (dataset A) but is no more supported with data showing reduced mtDNA polymorphism (dataset B). Therefore, model ranking in MIGRATE-N gene flow analyses is seemingly not influenced by the amount of mtDNA diversity, whereas model probability is influenced by the amount of mtDNA diversity and subsequent selection of one single model is better defined without mtDNA hyperdiversity.
A recent illustration of how hyperdiverse mtDNA data may affect the interpretation of genetic connectivity is provided by the Atlantic coral-dwelling crab Opecarcinus hypostegus. This species has a planktonic larval development (PLD unknown), with supposedly high potential for long-distance dispersal. Yet, gene flow in this species seems to be limited and follows an isolation-by-distance pattern [52]. Like M. neritoides, O. hypostegus shows an extreme degree of mtDNA COI variation (Hd = 0.999; π = 0.026; 22% polymorphic sites; H p = 187 out of 195 specimens) [52]. This high mtDNA diversity was interpreted as an early sign of speciation resulting from adaptive genetic divergence over the coral host species. Yet, Fu and Li's F and Tajima's D were non-significant [52] and hence do not provide signal of selection and/or demographic expansion. Moreover, the nucleotide diversity at synonymous sites in O. hypostegus (calculated from [52]), is well-above the threshold of 5% (π syn = 10.2%), indicating mtDNA hyperdiversity. This is in line with the bush-like pattern of the mtDNA haplotype network [ Fig. 2 [55]. Substantial long-distance gene flow is also reported outward the NEA, for the sea cucumber Cucumaria frondosa over 5000 km from Norway to the East coasts of North America (N e m = 80) [56]. The PLD of 4-8 weeks in M. neritoides is comparable to that of Paracentrotus lividus (PLD = 3 weeks) [57], Scrobicularia plana (PLD = 2-4 weeks) [58] and Cucumaria frondosa (PLD = 6 weeks) [59] (the PLD of Tectarius striatus is unknown). This suggests that, as expected, planktonic-dispersing species with a long-lived larval dispersal stage may achieve high levels of gene flow in the NEA basin.
The directional pattern of gene flow in M. neritoides as described by model M6 inferred from hyperdiverse mtDNA (dataset A) is congruent with the history of the sea currents in the NEA (Fig. 3). Short-lived Pleistocene sea surface currents allowed the colonization of Macaronesia from Eastern Atlantic areas [60]. However, nowadays the Azores Current flows eastward to Gibraltar, where its surface water enters the Mediterranean through the Atlantic Water Current [61,62] [68], suggesting that larval transport predominantly occurs from the Atlantic European coasts to the Mediterranean. In the opposite directions, gene flow appears weaker from the Mediterranean westward to the Atlantic European coasts and Macaronesia, as it goes against mainstream currents and rather follows the Levantine Intermediate Water and the Mediterranean Outflow Water that flow below 500 m depth westward to Macaronesia and northward to Ireland [62,69], as well as the seasonal northward flow of the Portugal Current in winter. Therefore, the Atlantic European coasts and Macaronesia are most probably a source of new, dispersing, haplotypes supplying the Mediterranean, rather than sinks receiving new haplotypes from the Mediterranean.

Conclusions
The mtDNA data presented here strongly suggest that Melarhaphe neritoides shows no genetic structure and is panmictic over its entire distribution range in the NEA, though with a predominantly eastward gene flow. The Mediterranean acts as a sink receiving large numbers of immigrants per generation from primarily the NEA coasts (N e m > 800). Direction in gene flow is, however, no more evident after removing hyperdiversity from mtDNA, suggesting a potential influence of mtDNA polymorphism on coalescent-based inference of gene flow model probability. The gene flow pattern revealed here is consistent with prior expectations and allozyme data. The mtDNA hyperdiversity (π syn ≥ 5%) of M. neritoides results in a lack of shared haplotypes among localities sampled throughout the NEA, up to a complete haplotypic differentiation between localities as close as 1.2 km. Yet, the deceiving haplotypic mtDNA differentiation among localities does not reflect a lack of gene flow, but results from the concealed signal of gene flow by the high mutation rate, so that sampling efforts are too low to detect shared haplotypes with realistic probabilities. When using such hyperdiverse genetic markers, population genetic differentiation in terms of a lack of shared haplotypes may be substantial, but is not indicative of population genetic differentiation in terms of fixation of alternative haplotypes (D EST ≈ 1 does not a fortiori imply φ ST ≈ 1). Because F ST (and its related parameters G ST , φ ST ) and D EST measure two different characteristics of population genetic differentiation, the inappropriate use of these indices can lead to erroneous interpretations of population genetic differentiation. However, when using F ST (or its relatives) as a measure of population differentiation through fixation of alternative alleles according to the original recommendation of Wright [18, page 82], and D EST as a measure of population differentiation by a lack of haplotype sharing as intended by Jost [26], the presence of mtDNA hyperdiversity will not affect population genetic interpretations, in which case F ST (and relatives) and D EST are complementary. When coalescent-based gene flow inference is not possible, combining φ ST with D EST gives reasonable clues about migration using hyperdiverse mtDNA.

Sample collection and DNA sequencing
We used 399 specimens of M. neritoides from 11 sampling sites throughout the species' distribution range in the NEA (Fig. 3, Table 6). Figures 1 and 3 were created using the open source geographic information system QGIS 2.8.8 [70] and shoreline data from the "Global Self-consistent Hierarchical High-resolution Geography" database [71]. All specimens were stored at − 20°C until DNA analysis. Remaining body parts were preserved in ethanol and deposited in the collections of the Royal Belgian Institute of Natural Sciences, Brussels (RBINS) under the general inventory number IG 32962. Genomic DNA extraction, amplification and sequencing of the 16S (482 bp), COI (614 bp) and Cytb (675 bp) mtDNA gene fragments, sequence assembly and alignment, were performed as described in Fourdrilis et al. [3]. In total, 1197 sequences of 16S, COI and Cytb gene fragments were used, 555 of which were previously published in Fourdrilis et al. [3] (GenBank: KT996152-KT997344), and 642 were obtained from 214 newly sequenced specimens (GenBank: KX537775-KX538416). The three gene fragments were used as single gene datasets, and were also concatenated using GENEIOUS 5.3.4 (http://www.geneious.com, [72]), producing 399 combined 16S-COI-Cytb haplotypes (1771 bp) referred to as "dataset A".
mtDNA hyperdiversity and assessing population genetic differentiation and connectivity The impact of mtDNA hyperdiversity on assessing population genetic differentiation and connectivity was investigated using two datasets: one with and one without hyperdiversity. The hyperdiverse dataset (A) contained the original, unmodified, 16S-COI-Cytb data. The dataset without hyperdiversity (B) was derived from the hypervariable dataset A by removing the hypervariable nucleotide sites. To this end, the original hypervariable  mtDNA data (A) were imported into NETWORK 5.0.0.1 [73] and hypervariable nucleotide sites were identified in the .sta outfile as the characters showing a weight > 1, which correspond to fast-mutating nucleotide sites and/ or sites segregating for two or more nucleotides (i.e. showing three or more variants). In this way, 346 hypervariable nucleotide sites were deleted from the 540 variable sites in the sequence alignment, representing respectively 10, 24 and 23% from the length of the 16S, COI and Cytb gene fragments. The total length of the new multiple sequence alignment is 1429 bp. This procedure allows to preserve the high genetic diversity (Hd moved from 0.999 ± 0.001 to 0.824 ± 0.001) while lowering the amount of polymorphism (S decreased from 30 to 13% and π from 0.013 ± 0.001 to 0.001 ± 0.001) ( Table 1).
Population genetic diversity and differentiation analyses mtDNA diversity metrics were computed for dataset A and dataset B in the 11 sampling sites separately and after pooling the 11 sites (referred to as "total population"), and for the three single gene datasets in the total population, using DNASP 5.10.1 [74]. DNASP considers sites with alignment gaps in the 16S sequences as fifth nucleotide states for the calculations of H and Hd, but excludes them from the calculations of S, π, π syn and π nonsyn . Population genetic differentiation was assessed in the total population (datasets A and B), by calculating G ST [75] and N ST based on a distance matrix of pairwise differences [46] using SPAGEDI 1.4 [76], φ ST based on a distance matrix of pairwise differences between haplotypes [77] using ARLEQUIN 3.5.1.3 [78], and the unbiased Morisita dissimilarity index D EST using SPADE [79]. Population genetic differentiation was also assessed among pairs of sampling sites by calculating pairwise φ ST using ARLEQUIN. The significance of pairwise φ ST was corrected for multiple test biases using the Sequential Bonferroni procedure [80] and only p-values that remained significant after these corrections were considered to be meaningful. Hierarchical analyses of molecular variance [AMOVA, 77] of Tamura-Nei distances among haplotypes were performed using ARLEQUIN, in order to quantify population genetic differentiation among groups (φ CT ), among populations within groups (φ SC ) and within populations (φ IS ) at several geographic scales, and to test for panmixis. The significance of φ-statistics was assessed using 90,000 permutations of individuals among populations, and of populations among geographic groupings. A population is a sampling site. Three groupings were defined to represent the three oceanographic areas of interest (Fig. 3), i.e. the North East Atlantic coast (ATCO, N = 95), the remote Azores archipelago at the southwesternmost border of the distribution area (AZOR, N = 265), and the Mediterranean (MEDI, N = 39). The AMOVA with three groupings contains nine populations following a sampling scheme k = 5,3,1 (i.e. first grouping including five populations, second grouping including three populations and third grouping including one population) and hence provides adequate statistical power (i.e. p-value ≤ 0.05 and at least 20 unique permutations) at this level [81].

Population genetic connectivity analyses
Population genetic connectivity in M. neritoides was qualitatively investigated by reconstructing a median-joining haplotype network [73] using POPART 1.7 [82] on the three single gene datasets, dataset A and dataset B. Such a network provides information about phylogeographic structure and gene flow among populations. Population genetic connectivity was then assessed and compared between datasets A and B by quantifying long-term gene flow, or immigration rate (i.e. N e m the effective number of immigrants per generation), among the three oceanographic areas AZOR, ATCO and MEDI in the NEA basin, using the Bayesian MCMC method implemented in MIGRATE-N 3.6.11 [83] and hosted on the CIPRES Science Gateway [84]. MIGRATE-N estimates the mutation-scaled population size (θ = 2N e μ for haploid mtDNA) for each area and the mutation-scaled immigration rate (M = m/μ). Subsampling the three oceanographic areas to get equal sample sizes was not necessary as the difference between the largest (AZOR, N = 265) and the smallest (MEDI, N = 39) sample sizes was less than ten-fold (personal communication: Peter Beerli, School of Computational Science and Information Technology at Florida State University). Five models of dispersal were first evaluated (Fig. 4): (M1) a full migration model with three population sizes and six immigration rates, (M2) an island model where all areas share a single mean estimate of θ and exchange genes with all other areas at the same mean rate, (M3) a source-sink model with three population sizes and three directional West-to-East immigration rates, where the main sink is MEDI receiving immigrants from the sources AZOR and ATCO, and the second sink is ATCO receiving immigrants from AZOR, (M4) a source-sink model with three population sizes and three directional East-to-West immigration rates, where the main sink is AZOR receiving immigrants from the sources MEDI and ATCO, and the second sink is ATCO receiving immigrants from MEDI, and (M5) a panmictic model with one population size parameter. Preliminary results showed that M3 was the second best model after M5, and included the possibility of null gene flow from AZOR to MEDI. Following this observation, an additional model was tested based on the hypothesis that setting the gene flow to zero from AZOR to MEDI would best fit the data: (M6) a source-sink model like M3 with three population sizes, but only two directional West-to-East immigration rates. In M6, the two sinks MEDI and ATCO are the same as in M3, but MEDI receives immigrants from only one source (ATCO) and not from AZOR. We ran MIGRATE-N analyses under an F84 mutational model, with a windowed uniform prior for θ and M, the bounds of which are (0; 2) and (0; 9500) respectively. For each model, we ran three replicates using four MCMC chains with relative temperatures of 1.0, 1.5, 3.0 and 100,000, and of 500 million generations, which sampled one of every 100 iterations. The first 30% of generations were discarded from each run as burn-in. The MIGRA-TE-N analyses were computationally intensive. When replicates were run consecutively, five to 12 weeks were required depending on the model. The use of the message passing interface version of MIGRATE-N, enabling simultaneous analysis of replicates using three nodes, decreased computing time to 18-34 days. Convergence of MCMC chains was assessed by visual examination of the log trace of each posterior distribution showing caterpillar shape, and making sure that the effective sampling size value of each statistic was > 200 [85], using the 'coda' package [86] in R 3.0.2 [87]. The R script is accessible on Figshare (see Data Accessibility section). The models were ranked using log Bayes factors (LBF) and probabilities (p), that compare the marginal likelihood of each model calculated using the thermodynamic integration method implemented in MIGRATE-N [88]. The ranking tells how useful a model is to infer a relationship between the pattern of connectivity hypothesised and the biology of M. neritoides. The most useful information is found in the model ranked first. The effective number of immigrants per generation was calculated for haploid data with female-transmission following the equation N e m = θ recipient *M [89]. The effective population size was calculated with the equation N e = θ/μ using μ = 1.99 × 10 − 4 mutations per nucleotide site per generation from Fourdrilis et al. [3]. In order to verify the assumption of neutral evolution that underlies the coalescent model of gene flow inference in MIGRATE-N, selection was assessed by applying Fay & Wu's H statistic [90] to dataset A using DNASP 6.12.01 [91]. Tectarius striatus was the most closely related species to M. neritoides [92] for which the three same gene fragments of 16S, COI and Cytb were available on Genbank (U46825, AJ488644, U46826), and was therefore used as outgroup for the Fay & Wu test. The 95% confidence interval was calculated based on 10,000 coalescent-based simulations.