Evolution of microgastropods (Ellobioidea, Carychiidae): integrating taxonomic, phylogenetic and evolutionary hypotheses

Background Current biodiversity patterns are considered largely the result of past climatic and tectonic changes. In an integrative approach, we combine taxonomic and phylogenetic hypotheses to analyze temporal and geographic diversification of epigean (Carychium) and subterranean (Zospeum) evolutionary lineages in Carychiidae (Eupulmonata, Ellobioidea). We explicitly test three hypotheses: 1) morphospecies encompass unrecognized evolutionary lineages, 2) limited dispersal results in a close genetic relationship of geographical proximally distributed taxa and 3) major climatic and tectonic events had an impact on lineage diversification within Carychiidae. Results Initial morphospecies assignments were investigated by different molecular delimitation approaches (threshold, ABGD, GMYC and SP). Despite a conservative delimitation strategy, carychiid morphospecies comprise a great number of unrecognized evolutionary lineages. We attribute this phenomenon to historic underestimation of morphological stasis and phenotypic variability amongst lineages. The first molecular phylogenetic hypothesis for the Carychiidae (based on COI, 16S and H3) reveals Carychium and Zospeum to be reciprocally monophyletic. Geographical proximally distributed lineages are often closely related. The temporal diversification of Carychiidae is best described by a constant rate model of diversification. The evolution of Carychiidae is characterized by relatively few (long distance) colonization events. We find support for an Asian origin of Carychium. Zospeum may have arrived in Europe before extant members of Carychium. Distantly related Carychium clades inhabit a wide spectrum of the available bioclimatic niche and demonstrate considerable niche overlap. Conclusions Carychiid taxonomy is in dire need of revision. An inferred wide distribution and variable phenotype suggest underestimated diversity in Zospeum. Several Carychium morphospecies are results of past taxonomic lumping. By collecting populations at their type locality, molecular investigations are able to link historic morphospecies assignments to their respective evolutionary lineage. We propose that rare founder populations initially colonized a continent or cave system. Subsequent passive dispersal into adjacent areas led to in situ pan-continental or mountain range diversifications. Major environmental changes did not influence carychiid diversification. However, certain molecular delimitation methods indicated a recent decrease in diversification rate. We attribute this decrease to protracted speciation.


Background
Climatic and geological changes are considered to be major drivers of biological diversification. Many wellcharacterized radiations were initiated in aftermath of major geologic events [1][2][3]. Current biodiversity patterns reflect these consequential processes. While taxa with low dispersal ability may be particularly sensitive to changes in their environment, historically-formed patterns within these taxa are known to remain well preserved [4,5]. The taxon Ellobioidea (Gastropoda, Eupulmonata) comprises a group of morphologically and ecologically highly diverse snails, known to have successfully invaded the marine, brackish water and terrestrial habitats [6,7]. Species are traditionally classified into five taxonomic groups, the Pythiidae, the Laemodontidae, the Melampodidae, the Ellobiidae and the Carychiidae. These taxa have been recognized either as families within Ellobioidea or as subfamilies within the family Ellobiidae [6][7][8]. To avoid confusion, we will here refer to the taxon Ellobioidea and to families. Taxonomic descriptions and systematic classifications were exclusively based on morphological (anatomical and conchological) characters while extant species show a mosaic pattern of plesiomorphic and apomorphic features resulting from convergent evolution [6,7,9,10]. Due to the tenuous nature of morphological characters, phylogenetic reconstructions are extremely difficult [6,11]. Moreover, the high degree of homoplasy in morphological characters and frequent low variability has led to the description of approximately 800 species names available in the literature, whereby 250 are likely to be valid [12]. The most comprehensive molecular study for the Ellobioidea suggests a monophyletic origin of the entire group. However, the relationships among the five traditional ellobioid taxa are still unclear [13].
One lineage of the Ellobioidea, the Carychiidae Jeffreys, 1830 has successfully accomplished a complete transition onto land. Extant carychiid snails inhabit aphotic and permanently wet epigean (Carychium) or subterranean (Zospeum) environments throughout their Holarctic distribution. This dramatic shift from a marine to a terrestrial habitat has occurred independently of the stylommatophoran land-snails of the Eupulmonata [13,14]. As for all Ellobioidea, taxonomic and systematic descriptions of Carychiidae are based upon characters of the mature shell, which in the case of carychiid gastropods, are suspected to vary according to environmental conditions [15][16][17]. The first attempt to characterize carychiid taxa using DNA barcodes supported 90% of traditional morphospecies assignments [18]. Nevertheless, the same study only addressed a single population or a few populations per morphospecies. Phenotypic variability (in Carychium) and morphological stasis (in Zospeum) were identified as potential explanations for discrepancies in morphological and molecular taxonomy. In particular, Zospeum displayed high intraspecific genetic diversity within single morphospecies, as reflected by several cave-endemic evolutionary lineages (ELs). In the case of European Carychium, DNA barcoding revealed a previously overlooked taxonomic entity and helped to reevaluate the taxonomic status of questionable morphospecies [18,19]. Additionally, the phenotypically variable Carychium shell was shown to span a wide range of shell dimension and proportions, encompassing three morphospecies. Hence, it is very likely that the Carychiidae harbor a considerable number of morphologically unrecognized ELs.
A gastropod's migratory ability is correlated with its shell size [20,21]. Due to their small size, the active dispersal abilities of carychiid snails are highly limited. Slapnik [22] conducted investigations on Zospeum isselianum activity within a cave showing moving distances measuring 1 to 15 cm per week (on average 0.7 cm per day). In a comparative phylogeography of two European Carychium species, Weigand et al. [23] revealed that the local population structure is often formed by only a few mtDNA haplotypes suggesting that adjacent populations may only exhibit infrequent gene flow. Nevertheless, minute gastropods are well adapted to passive dispersal (e.g. [24,25]), a mechanism that best explains the foundation of transatlantic European Carychium populations in North America [26,27] and the postglacial recolonization of Northern Europe [23]. Although passive dispersal events are relatively rare, these, along with limited mobility, could well contribute to a higher incidence of isolated populations and narrow endemism.
Here, we use an integrative approach combining taxonomic, phylogenetic and evolutionary hypotheses to assess the diversification of ELs in Carychiidae. This process first assigned all specimens to phenotype hypotheses. A conservative genetic delimitation method is used to evaluate the initial morphospecies assignments and to identify distinct ELs [28,29]. This approach allows us to include otherwise morphologically unrecognized ELs, whose absence during phylogenetic tree reconstructions could yield misleading results [30]. Based on the identified lineages, a molecular phylogenetic hypothesis for Carychiidae is then reconstructed to test three evolutionary hypotheses: 1) morphospecies encompass unrecognized ELs, 2) limited dispersal results in a close genetic relationship of geographical proximally distributed taxa and 3) major climatic and tectonic events had an impact on lineage diversification within Carychiidae.

Molecular identification of evolutionary lineages and species delimitation
To address morphologically unrecognized ELs, the initial morphospecies identifications (Figures 1 and 2, Table 1) were examined applying a combination of five molecular delimitation approaches on mitochondrial sequence data. The 3.2% K2P threshold value identified 45 partitions (Figures 3 and 4). The Automatic Barcoding Gap Detection (ABGD) method consistently revealed 43 ELs for all tested combinations. Two versions of the General Mixed Yule-Coalescent (GMYC) model were applied. Both variants performed significantly better than the null model of a single coalescent population (L 0 = 1275.858, both p-values < 0.0001), but led to to a high partitioning of the dataset resulting in 64 (L GMYCs = 1315.46) and 78 clusters (L GMYCm = 1321.515), respectively. No significant improvement was found when applying GMYCm instead of GMYCs (Chi-square = 12.1081, df = 6 and p = 0.0596). Finally, the Statistical Parsimony (SP) approach delimited 51 ELs mostly congruent to the 3.2% threshold and ABGD results. A conservative delimitation strategy was used to combine the partitions of all five approaches. Since all molecular methods tended to split morphospecies, we decided to risk taxonomic lumping, and classified ELs as the most comprehensive grouping of specimens predicted by any of the five delimitation methods. Thus, the 28 carychiid morphospecies (Figures 1 and 2 Table 2). In total, 17 morphospecies were each recovered as a single EL (Table 2; 'matches'). The morphologically distinct, but so far, undescribed Carychium sp. 2, Carychium sp. 3 (both from China), Zospeum sp. 1 and Zospeum sp. 2 (both from Cantabrian Mts.) were also genetically revealed as separate clusters. Divergent ELs within a single morphospecies hypothesis were found for eight morphospecies (32%): Carychium exile mexicanum (2 lineages), C. mexicanum (2), C. mexicanum costaricanum (3), C. noduliferum (3), C. cf. pessimum (2), Zospeum   Data on the morphological identification, taxonomic first description, specimen number (#), sampling locality and evolutionary lineage (EL) are provided. Type locality populations or localities referred to in the phenotype descriptions of Zospeum morphospecies are marked by an asterisk.
isselianum (3), Z. spelaeum schmidti (2) and Z. suarezi (4). Hence, we uncovered at least 20 ELs that could not clearly be distinguished with the initial morphospecies hypotheses (i.e. 47% of all ELs). For Zospeum isselianum, Z. spelaeum schmidti, Z. suarezi, Z. subobesum and Carychium mexicanum costaricanum, our analysis of topotypic populations enabled us to link a single EL to the initial morphospecies hypothesis (Tables 1 and 2). The three morphospecies, Zospeum isselianum, Z. spelaeum schmidti and Z. suarezi were found over a large geographical area and possessed moderately variable shell phenotypes, two characteristics that make them prime candidates for taxonomic lumping. Specimens of C. exile mexicanum, C. clappi and C. stygium shared closely related barcodes and consequently, were treated as a single taxon (C1). The European Carychium sp. 1 (C23)  needs further consideration since results of the barcoding approach and phylogenetic reconstruction were not congruent. Thus, two ELs (C1 and C23) included more than one morphospecies each.

Phylogenetic tree hypothesis
Phylogenetic relationships of 38 ELs based on 1210 bp of nuclear (H3) and mitochondrial sequence data (COI and 16S) were estimated with three different phylogenetic inference methods. Maximum Likelihood (ML), Maximum Composite Likelihood (MCL) and Bayesian inference (BI) yielded mainly congruent results ( Figure 5). Lineage assignments correspond to the identification scheme for ELs described in the previous chapter however, with the following exceptions: morphologically unrecognized ELs, for which topotypic populations of the same morphospecies were analysed, were marked with 'sp. cf.' and the morphospecies name (e.g. Z16, Z. sp. cf. suarezi). The EL of the same morphospecies, which included specimens from the type locality or from localities used for the phenotype description, was named after the morphospecies hypothesis (e.g. Z14, Z. suarezi from Cueva Inguanzo referred to in the first description [31]). Due to the molecular distinctiveness of C. costaricanum (C11) from its type locality [32], we address this taxon at the species level. The morphologically and ecologically defined designation into Carychium and Zospeum was supported by molecular data (MCL: 100; PP: 1.00; ML ≥ 97). All analysed mitochondrial ELs (C1-25 and Z1-18, Figure 5) were monophyletic after incorporation of ncDNA with the exception of Carychium sp. 1 (C23). However, sister relationship between lineages were not resolved entirely. Only the relationship of one individual of Carychium sp. 1 to C. ibazoricum (C7) was highly supported (97; 1.00; 100). Another cluster formed by two lineages of Carychium sp. 1 and C. tridentatum was only weakly supported (82; 0.95; -).

Biogeographical reconstruction
Geographical range evolution of Carychiidae was reconstructed to evaluate geographical transitions and the presence or absence of geographically monophyletic clades. Results are shown only for the model where taxa inhabit a maximum of two regions (continents or mountain ranges). Differences to alternative models, where the maximum range was allowed to encompass all ranges (3 or 4), were largely restricted to basal nodes. However, in some cases these scenarios resulted in extremely wide ancestral ranges which seems biologically unreasonable. Generally, geographical proximally distributed taxa were closely related. Only few colonization events were discovered. The range reconstruction of the most likely (conservative) scenario for the root node of Carychium revealed an ' Asian' (A) or ' Asian + North American' (A+N) origin ( Figure 5).
The oldest diversification events within Carychium are characterized by relatively short branches with low statistical support, giving rise mostly to deep Asian lineages. Central America could have been colonized by more than one lineage. The European branch originated out of a long separately evolving lineage. We further tested two alternative tree hypotheses for Carychium in a model selection framework: i) constrained monophyletic Asian Carychium and ii) constrained monophyletic American Carychium (see Additional file 1). Both constraints received higher support than the unconstrained phylogenetic hypothesis. Monophyletic Asian and/or American lineages should not be ruled out ( Table 3).
The geographic origin of Zospeum could not be unambiguously resolved ( Figure 5). Our reconstructions equally supported an ancestral distribution in the 'Cantabrian Mountains + Alps' or 'Cantabrian Mountains + Dinaric Alps'. In comparison to Carychium, Zospeum arrived much earlier in Europe. Only two colonization events were discovered: Both lineages of Z. spelaeum schmidti were geographically restricted to the Alps (Z12) and the Dinaric Alps (Z11), respectively. Moreover, the sister clade (comprised of all other Dinaric and Alpine taxa) demonstrated a similar pattern with separate colonization of the Alps or Dinaric Alps; depending upon the ancestral state. We tested our phylogenetic hypothesis against two constrained hypotheses for Zospeum: i) monophyletic Dinaric Zospeum and ii) monophyletic Alpine Zospeum. Both scenarios received considerably lower support than the unconstrained hypothesis (Table 3).

Temporal dynamics of lineage diversification
By fitting different models of diversification using a maximum likelihood approach, temporal dynamics of the diversification of lineages, such as rate shifts due to historic climatic or tectonic changes, were investigated. To account for uncertainties in species delimitation, the analyses were performed on each of the five trees produced by the genetic delimitation approaches (threshold, ABGD, SP, GMYCs and GMYCm). A constant rate model of diversification was preferred for the GMYCm tree ( Figure 6, Table 4), whereas a rate-variable model (yule2rate) provided a better fit for all other trees (but not significantly in case of the GMYCs tree). A two-rate pure-birth model best explained the data for the threshold, ABGD and SP trees (Table 4) with a recent decrease (relative shift time approx. -0.01) of the speciation rate.

Bioclimatic niche differentiation
Bioclimatic niche modeling was performed to study niche differences between two well supported Carychium clades and to identify potential climatic factors responsible for lineage persistence and diversification. Maxent models based on occurrence data of European and North + Central American Carychium behaved realistically and were statistically well supported (AUC Europe : 0.769 ± 0.033; AUC NorthCentralAmerica : 0.881± 0.016) (see Additional file 2). Bioclimatic niche models for native Carychium taxa in America and in Europe, respectively, generally predicted wide areas of suitable habitat (see Additional file 2A, B). The American model was characterized by a sharp longitudinal transition from suitable habitats in the East to unsuitable conditions in the West. Visual inspection of the geographical distribution of all 19 bioclimatic variables throughout this region identified diurnal range (bio2) to best explain the observed pattern (see Additional file 3). Modeling results for Europe highlighted a continuous region of high habitat suitability, stretching from the Mediterranean Sea to Northern Scandinavia and from the Atlantic Coast to the Black Sea.
Compared to the niche model of the native clade in Europe, the projected model for America suspected European Carychium to inhabit regions at lower and higher latitude (see Additional file 2C). Variable degrees of predicted habitat suitability were modeled for populations of non-native European Carychium minimum (CM) and C. tridentatum (CT) populations in North America (see Additional files 2C and Additional file 4). The modeled latitudinal distribution in Eastern North America was interrupted by a large territory of unsuitable bioclimatic conditions. Similar to the native American clade, this phenomenon could be attributed to changes in the variable states of diurnal range (see Additional file 3). Projection of the North and Central American bioclimatic envelope onto Europe largely corresponded with the distribution range of native Carychium taxa (see Additional file 2D). However, a slight shift towards more Mediterranean (to the South) and continental conditions (to the East) was visible.

Unrecognized evolutionary lineages
While most species concepts view a species as 'a separately evolving metapopulation lineage' [33], they disagree on which characters should be applied to organize this speciation continuum [33]. In the case of the microgastropod taxon Carychiidae, variable environmental conditions can lead to phenotypic variability (e.g. lineages inhabit large geographical regions), whereas relatively stable environments can lead to morphological stasis (e.g. cave endemic lineages) [34][35][36][37][38]. Given that traditional taxonomic classifications completely relied upon characters of the mature shell, unrecognized ELs had to be expected.
Integration of molecular data in a conservative delimitation approach revealed that the 28 analyzed morphospecies actually comprise 43 distinct ELs. From our results, three different taxonomic scenarios can be distinguished: i) ELs directly matching the morphospecies, including morphologically delimited, but so far, undescribed taxa (21 lineages), ii) distinct ELs being lumped into a single morphospecies (8 morphospecies encompassing 20 lineages), and iii) single ELs including more than one morphospecies hypothesis (4 morphospecies, 1 undescribed taxon, 2 lineages). This suggests that basing carychiid taxonomic delineations only on conchological characters may be inadequate. We emphasize that vague taxonomic (under-) descriptions and semantic tradition, with morphospecies concepts often only referring to a few specimens, have widely neglected aspects such as intraspecific shell variability. In congruence with overlapping, interspecific shell dimensions, this has led to a vague situation in carychiid taxonomy and systematics.
Carychiidae appear to still harbor a considerable amount of undiscovered diversity, especially in biogeographic regions that are underexplored due to political strife, nonaccess and challenging geography, such as Asia or Central America. For example, East Asian C. noduliferum, C. cf. pessimum and the Central American C. mexicanum and C. mexicanum costaricanum morphospecies for which, at least one unrecognized EL was discovered. In Zospeum, similar as in other cave animals [39][40][41], ELs are likely to be morphologically cryptic, possibly due to morphological stasis.
On the other hand Carychium exile mexicanum most likely served as a 'taxonomic lumping bin' built up by the frequent nomenclatural intermixture of ELs of the morphospecies C. stygium, C. clappi, C. floridanum, C. mexicanum and C. mexicanum costaricanum (e.g. [42][43][44][45][46]). The relevant taxonomic literature is puzzling and contains contradictory statements [42]. In general, individual morphospecies within the C. exile mexicanum + C. stygium + C. clappi EL showed highly similar barcode sequences and are thus, difficult to distinguish by molecular delimitation methods, suggesting that these are probably in the process of speciation [18]. A similar complex situation is evident for C. tridentatum and Carychium sp. 1, which still need further taxonomic investigation. An alternative explanation for the clustering of morphologically distinguishable taxa may be an artifact created by our conservative genetic delimitation strategy. However, if we expect these 'true' species to be artificially lumped into a single EL, we may equally expect other ELs to include even more 'true' species, e.g. as indicated by the two GMYC models.
Based on our taxonomic investigations, we suggest that apparently widespread and assumedly variable Zospeum morphospecies must be revised. Moreover, since several Zospeum spp. are already listed as vulnerable or endangered [47], we anticipate high conservation value amidst cave-endemics. Furthermore, we recommend that future molecular analyses should focus on populations collected at the type localities, to link an EL with the historic morphospecies hypothesis. A more comprehensive geographic sampling would most likely uncover yet even more unrecognized ELs.

Diversification of evolutionary lineages
Appropriate taxon sampling is crucial for the reconstruction of phylogenetic relationships [30]. Our taxon sampling not only covered large parts of the known Holarctic distribution of the Carychiidae but we integrated molecular data to uncover morphologically unrecognized ELs. As expected, the geographic evolution of Carychiidae suggests that the majority of ELs in a geographic area are phylogenetically more closely related than taxa between distant regions (i.e. continents / mountain ranges). This general pattern is most obvious for the European Carychium and Cantabrian Zospeum. After the initial colonization of new areas due to rare (long-distance) passive dispersal, the ancestral lineages diversified in situ.
Based on five different genetic delimitation approaches, our analyses of diversification modes provided mixed results, and in some cases, indicate that diversification rates may have changed over time. While the two GMYC models favored a constant-rate diversification model, the three more conservative genetic delimitation strategies (threshold, ABGD and SP) point to a relatively recent rate slowdown. Such a rate shift can be interpreted in different ways. First, a decrease in the rate of speciation may be due to diversity dependence, e.g. a niche-filling process, where new species reduce the probability of future speciation events [48]. Second, incomplete taxon sampling can result in a spurious rate slowdown due to an overrepresentation of deeper nodes [49]. Third, since speciation is a continuous process, relatively young divergence events are likely to remain unobserved. This effect has recently been described as 'protracted speciation' [50]. Given that the more conservative delimitation approaches are likely to underestimate the number of ELs, we suggest that the observed pattern is more likely an artifact than an actual slowdown. The other two causes may be difficult to distinguish as they can lead to similar patterns, but we suggest protracted speciation as the most likely cause here. First, while taxon sampling in Carychium and Zospeum is indeed incomplete, several of the missing morphospecies in fact seem to belong to ELs included in this study (for example, C. mariae and C. riparium; [19] and Jochum & Weigand, unpublished data). Second, the very recent shift in diversification rates is expected under protracted speciation, but could result from incomplete taxon sampling only if the missing species were those that have originated most recently. After omitting the youngest 5th percentile of total branch lengths (i.e. excluding the most recent 5% of the evolutionary history), a constant-rate model is also favored for the threshold, ABGD and SP partitions (data not shown). This not only highlights the very recent timing of the rate Habitat suitability maps were calculated based on species occurrences in each region (Europe and North America) independently and projected into the other region (North America and Europe, respectively). shift (consistent with protracted speciation), but it also shows that no additional rate shifts throughout the evolutionary history of Carychiidae have occurred, suggesting that major environmental changes did not affect the rate of diversification of this clade. The geographic evolution of Carychium implies an Asian or Asian + North American origin. Asian Carychium are the genetically most distinct. Since high regional genetic diversity of a taxon can be taken as evidence for lineage persistence in ancestral areas [51], this provides support for an out-of-Asia hypothesis. Niche models suggest significant overlap and occupancy of a wide range of environmental conditions among two geographically and phylogenetically distant Carychium clades (Europe and America). There is, however, no evidence for an intermixture of the East and West North American Carychium lineages. Thus, larger-scale bioclimatic factors are likely to have been of lesser importance than microhabitat conditions in the diversification of Carychium. Nonetheless, differences in the larger-scale bioclimatic niche can result from independent evolutionary histories in isolated biomes and, affect the marginal distribution of lineages. For example diurnal range patterns, an important factor for North American Carychium, might affect the colonization of habitats featuring a high variation between day and night. Temperature extremes, moisture and relative humidity levels in typical microhabitats occupied by Carychium (e.g. moist leaf-litter, crevices or superficial subterranean habitats) are much less pronounced than those on the surface [52][53][54]. In cases where the temperature diurnal range is high, microhabitats will not be able to stabilize a given condition.
Since the available bioclimatic layers do not cover subterranean habitats, the influence of bioclimatic parameters on the distribution and diversification of Zospeum could not be addressed. The ancestral area reconstruction indicated the 'Cantabrian Mountains + Alps' or 'Cantabrian Mountains + Dinaric Alps' as the ancestral area. We identified two independent colonizations of the Alps and/or the Dinaric Alps but the geographical directionality of these colonizations could not yet be deduced. The incorporation of a recently (re)discovered Asian cf. Zospeum from Chinese and South Korean caves (RS and [55]) may provide further insight into the evolutionary history of Zospeum.
The diversification of Zospeum is characterized by rare, long-distance colonization events with in situ (mountain range) radiations into several (sometimes morphologically cryptic) lineages, occupying isolated cave systems. Trogloxene cave animals like bats and cave crickets, returning to the surface periodically, represent potential vectors for the passive dispersal of cave-dwelling microgastropods [56,57]. Such an allopatric diversification without phenotypic (and perhaps ecological) divergence is referred to as morphostatic radiation [58] or non-adaptive radiation [59]. Once established at a new locality, ancestral populations may reach adjacent habitats by floating via underground drainage systems or by active subsurface migration [40]. As an example, seasonal flooding events are known triggers for the wash-out of the cave salamander Proteus anguinus [60]. As has been shown for humandispersed transatlantic populations of Carychium [26,27], the transportation of only a few hermaphroditic individuals is needed for a successful population foundation. Finally, the remarkably wide distribution of the Cantabrian Zospeum sp. 1 with the presence of identical DNA barcodes in four distant caves (up to 30 km apart) suggests recent long-distance dispersal and merits further investigation.
In contrast to the aforementioned, dispersal colonization, cave lineages can arise from multiple diversification events of surface populations via vicariance colonization [56,61]. The geographic distribution of extant Zospeum allows room for speculation about the maximum age and evolutionary history for this group. All caves inhabited by European Zospeum are embedded in sediments of the Alpine belt initially formed during the Early Cenozoic Alpine orogeny [62,63]. Our most parsimonious assumption suggests that European Zospeum originated no earlier than the beginning of the Early Cenozoic (approximately 65 mya), coinciding with the beginning of the Alpine orogeny. Their last common ancestor (LCA) could have descended from the noncave-dwelling Carychiopsis, for which fossils are known since the European Paleocene (66-56 mya) [64-66].

Conclusions
Carychiidae harbor a substantial number of morphologically unrecognized ELs. In particular, several of the assumedly widespread cave-dwelling Zospeum as well as Asian and Central American Carychium species resulted from past taxonomic lumping. Future studies should focus on specimens from type localities to link the ELs with the initial phenotype descriptions. Rare longdistance colonization and in situ radiations within the newly inhabited areas, i.e. continents (Carychium spp.) and mountain ranges (Zospeum spp.) represent likely diversification processes in the Carychiidae. However, East Asia provides a notable exception, exhibiting high regional genetic diversity formed by several distinct Carychium lineages: indicating a potential origin for Carychium and a potentiality for all the Carychiidae.
Although global climatic conditions could influence distribution, microhabitat structure most likely determined local presence and promoted allopatric diversification. Land invasion and desiccation avoidance by the LCA of Carychiidae could have been achieved by colonizing aphotic, permanently humid microenvironments. The occurrence of a true, cave-dwelling lineage (Zospeum) and sporadically-observed cave populations in Carychium only demonstrate ecological extremes for survival during the adaptive shift onto land. Shallow subterranean habitats, providing a connection between surface and subsurface realms, could well have promoted ecological transitions within the Carychiidae. Future species-specific ecological studies will allow the identification of micro-environmental parameters shaping the distribution and promoting lineage diversification.

DNA extraction, PCR and sequencing
DNA extraction was carried out on ethanol-preserved individuals using the DNeasy Blood and Tissue Kit (Qiagen, Hilden, Germany) protocol. Shell and visceral material were removed to lower cross-contamination risk. Polymerase chain reactions (PCR) were performed to amplify nuclear Histone 3 (H3), a partial fragment of the mitochondrial 16S rRNA (16S) and the DNA barcoding fragment of the mitochondrial-encoded Cytochrome C Oxidase Subunit I (COI). For COI, we used the standard invertebrate primer pair LCO1490 -5'-GGT CAA CAA ATC ATA AAG ATA TTG G-3' and HCO2198 -5'-TAA ACT TCA GGG TGA CCA AAA AAT CA-3' [71]. Each 25 μL PCR mixture included 1 μL (10 pmol) of each primer, 2.5 μL 10x PCR buffer, 2 μL  [72]. For H3, we used the degenerated primer pair H3-F -5'-ATG GCT CGT ACC AAG CAG AC(ACG) GC-3' and H3-R -5'-ATA TCC TT(AG) GGC AT(AG) AT (AG) GTG-3' [73]. In principle, the same PCR conditions have been used for the amplification of the H3 marker. Modifications contain: the use of 0.1 μL (20 mM) dNTPs, 0.14 μL Taq-polymerase and no tetramethylammonium chloride and bovine serum albumin. PCR cycles were run at the following conditions: 5 min at 95°C, followed by 34 cycles of 30 s at 95°C, 25 s at 52°C and 45 s at 72°C, and finally, 5 min at 72°C. Visualization of single PCR products was performed on a 1.4% agarose gel. They were cleaned using the GeneJET PCR Purification Kit (Fermentas, St. Leon-Rot, Germany). In cases multiple PCR products were detected, the QIAquick Gel Extraction protocol (Qiagen) was used. PCR products were bidirectionally sequenced using the PCR primer pair (5 pmol) and the BigDye W Terminator v.3.1 Cycle Sequencing Kit (Applied Biosystems, Inc.) on an ABI 3730 xl capillary sequencer following the manufacturer's instructions.

Molecular delimitation strategies
COI sequences of all 166 individuals were aligned using MAFFT 6.814 [74] implemented in the Geneious software under the G-INS -i algorithm proposed for less than 200 sequences with global homology. Ambiguous characters (Ns) were treated as missing data. The alignment was further modified by manual primer deletion and a 3' and 5' trimming conducted with GBLOCKS [75]. The length of the final alignment was 607 bp. DNA barcodes are deposited in the BOLD project 'Barcoding Carychiidae microsnails [BARCA]'.
Initial morphospecies assignments were tested using five genetic delimitation approaches: DNA barcoding via a threshold value [76], the Automatic Barcoding Gap Detection (ABGD) method [77], the General Mixed Yule-Coalescent single (GMYCs) and multiple (GMYCm) models [78,79] and Statistical parsimony network analysis (SP) [80]. A threshold value of 3.2% K2P genetic distance was used to separate intra-and interspecific variability in Carychiidae, as recently established by Weigand et al. [18]. The ABGD method separates DNA sequences based on an automatic procedure of barcode gap discovery. Three user defined input variables are requested: The minimum (Pmin) and maximum intraspecific variability (Pmax), which refer to the area were the barcode gap should be detected; and the minimum gap width (X) which relates to the sensitivity of the method to gap width. We tested model combinations of X ranging from 0.01 to 0.9 with Pmax of 0.001 and 0.9, respectively. All runs were performed using Kimura (K89) genetic distances and 50 screening steps. The GMYC delimitation method combines phylogenetic and phylogeographical approaches to estimate the number of well-separated entities in a sample. It uses an ultrametric input tree to define partitions according to transitional points between speciation and coalescence within species rates. A model based on a single (GMYCs) or multiple (GMYCm) threshold values can be tested. Analyses and model comparisons were performed in the R package 'Splits' using the 'gmyc' and 'compare' functions. The ultrametric input tree was obtained with BEAST v1.7.4. [81]. Statistical parsimony network analysis (SP) is commonly used to cluster haplotypes in a phylogeographical framework [82]. However, an inverse consideration of this method is proposed to allow the delimitation of coalescent populations [80]. The program TCS 1.21 [83] was used to delimitate entities on the basis of 95% statistical confidence (i.e. connection probability).

Alignment optimization and phylogenetic tree reconstruction
Phylogenetic hypotheses were reconstructed using a concatenated dataset of three phylogenetic markers (mitochondrial 16S, COI and nuclear H3) resulting in 1210 bp. In total, 86 individuals comprising 26 carychiid morphospecies (17 Carychium, 9 Zospeum) and 38 ELs as well as three outgroup taxa (Ellobioidea: Pythiidae: Laemodonta cubensis; Ellobioidea: Melampodidae: Microtralia occidentalis and Veronicelloidea: Veronicellidae: Veronicella cubensis) were analyzed for the concatenated dataset (see Additional file 5). Alignment optimization was performed separately for each phylogenetic marker: 16S sequences were aligned with MAFFT 6.814 under the FFT-NS -i x 1000 algorithm implemented in the Geneious software. The initial 16S alignment had a length of 548bp. Primer sequences were deleted and the initial alignment was further modified with GBLOCKS [75] to remove ambiguously aligned internal positions and to trim the alignment at the 5' and 3' ends. The final 16S alignment had a length of 368 bp (67% of the initial alignment). For the nuclear H3 marker, the G-INS -i algorithm was used. Primer deletion and GBLOCKS 5' and 3' trimming resulted in 235 bp of the initial 330 bp (71%). The already trimmed barcoding alignment was used as the COI alignment (607 bp, 93% of initial alignment).
Topologies were estimated under three different phylogenetic reconstruction methods: Maximum Composite Likelihood (MCL), Maximum Likelihood (ML) and Bayesian inference (BI). Runs for MCL were performed in MEGA5 under the pairwise deletion option, a gamma distribution (G) with rate parameter 1 and 1,000 bootstrap replicates. RaxML 7.0.3 [84] was used to estimate the ML topology. To account for varying substitution rates between different loci and nucleotide positions, three marker-specific partitions under the GTR+G+I substitution model were set. MrModeltest 2.3 was used to distinguish between competing substitution models [85]. A thorough ML bootstrapping with 1,000 replicates was conducted. A Bayesian phylogenetic analysis was performed with MrBayes 3.2.1 [86,87] using Veronicella cubensis as outgroup. Three gene partitions were defined keeping the estimation of all parameters of the GTR+G+I model of evolution unlinked during the analysis. Two runs of 2,000,000 generations of the MCMC (Markov Chain Monte Carlo) were executed, sampling every 500 generations. The first 25% of the samples were discarded as burn-in to ensure sampling from the stationary phase of the model runs. The chain temperature parameter was set at 0.1. At the 2,000,000 th generation, the average standard deviation of split frequencies had already fell below 0.01, thus the analysis was stopped.

Alternative hypotheses testing
A model selection approach using BEAST v1.7.4 [81,88] was followed to test the monophyly of geographical closely-distributed taxa. Four taxon sets of species inhabiting a certain geographic area were created and constrained to be monophyletic, i.e. i) American Carychium, ii) Asian Carychium, iii) Dinaric Zospeum and iv) Alpine Zospeum. The monophyletic constraints were analyzed independently and compared with the results of the unconstrained topology. The Markov Chain Monte Carlo was run for 30 million generations, sampling trees and parameters every 1,000 generations. After verifying that appropriate effective sample sizes were achieved, three model selection methods were applied: a posterior simulation-based analogue of the Akaike Information Criterion (AICM) [89,90], and Bayes Factors (BF) between marginal likelihoods estimated through Path Sampling (PS) [91] and Stepping Stone Sampling (SS) [92]. We used the settings suggested on the BEAST website (http://beast.bio.ed.ac.uk/Model_selection). Competing topological hypotheses were ranked according to the results of the AICM and the marginal likelihood values obtained with PS and SS. Differences between AIC were calculated (Δ AICM) as were Bayes Factors between competing hypotheses. A Δ>7 between AICM values of the best ranked hypothesis and the other hypotheses suggests that the latter are very unlikely [93]. A Bayes Factor ln > 2.3 was considered as strong support for the hypothesis (modified guidelines of [94]).

Estimation of relative times of divergence
The scarcity of reliable fossils of Carychium hinders the estimation of absolute times of divergence. Carychium brotianum De Loriol, 1865 from the Upper Jurassic in France is assumed to be by far the oldest carychiid fossil [95]. Our own investigations of the type material of Carychium brotianum deposited in the Musée Cantonal de Géologie Lausanne reject a close affiliation to Carychiidae. We rather regard the extinct lineage Carychiopsis Sandberger, 1872 known from the Paleocene until the Neogene or Carychium munieri Briart & Cornet, 1889 reported from the Early Paleocene as the oldest representatives of Carychiidae [64- 66,96]. In respect to this problem, we chose to estimate only relative ages. The analysis was performed with the program BEAST v1.6.1 [81] using an uncorrelated, relaxed lognormal molecular clock model. The three genetic markers were concatenated but the parameters of the substitution and molecular clock models were independently estimated for each gene partition. The taxa Veronicella cubensis, Microtralia occidentalis and Laemodonta cubensis formed the outgroup taxa set. Trees were sampled from a Birth-Death tree-prior and following the GTR+G+I substitution model. The MCMC was run for 30,000,000 generations sampling trees and parameters every 1,000 generations. Effective sample size and convergence were evaluated in Tracer 1.5 [97]. The first 10% of samples were discarded as burn-in before building the maximum clade credibility (MCC) tree.

Temporal patterns of diversification
To assess the temporal dynamics of lineage diversification, we tested several constant-and variable-rate models of diversification using a maximum likelihood approach as implemented in the R package laser [98,99]. Model-fit was evaluated using the Akaike Information Criterion (AIC) [90]. Temporal changes in the net diversification rate were evaluated by calculating the difference in the AIC score between the best fit constant-rate (pure-birth, birth-death) and variable-rate models (DDL, DDX, yule2rate). Since these comparisons are susceptible to a high Type I error rate [98], a null distribution of the test statistic was generated by calculating the ΔAIC score for 500 phylogenetic trees simulated under a pure-birth process.

Geographic range evolution
The dispersal-extinction-cladogenesis (DEC) model implemented in Lagrange [100] was used to infer geographic range evolution. The model assumes that geographic splits for a given lineage occur along the branches of a topology rather than at bifurcation points. Changes in the geographic range of a given taxon can be accounted for by dispersal events (range expansion) or local extinction (range contraction). Alternative Lagrange analyses were run with the maximum range size set to either two or four (the maximum number of areas in the model). The patterns were largely congruent and only the results of the conservative approach are presented in which taxa are allowed a maximum range size of two, i.e. simultaneously inhabiting up to two geographic areas (here continents or mountain ranges). Given their low dispersal potential and the high proportion of continent-or cave-endemic taxa, this scenario seems more likely. The character state with the highest relative probability (fraction of the global likelihood) was plotted on the topology. In cases of similar relative probability values, all alternative scenarios were plotted to account for model uncertainty.

Bioclimatic niche modeling
Bioclimatic niche models were constructed with the program Maxent 3.3.3 [101]. This approach is based on the principle of least assumptions. In the absence of any further information, it prefers the model with the maximum entropy. Niche parameters are extracted from occurrence points and global environmental layers and combined into taxon-specific bioclimatic envelopes. We used georeferenced data for monophyletic lineages instead of species-specific data because ambiguous species-level identifications of Carychium taxa are likely to produce questionable entries in public databases or museum collections. The investigation of the bioclimatic niche and macro-evolutionary changes above the species level has proven suitable in earlier studies [102]. This is understandable as traits that allow taxa to persist tend to be conserved over time [102]. The bioclimatic niches for i) the monophyletic European Carychium clade and ii) the monophyletic North + Central American (NC) Carychium clade were estimated. In total, 136 sampling points for the European clade were collected either from our own collections or the GBIF database (www.gbif. org). Data from American museum collections (Carnegie Museum of Natural History, Pittsburgh; Field Museum of Natural History, Chicago; Florida Museum of Natural History, Gainesville) and entries within the GBIF database are used as distribution data for the NC-clade resulting in 241 sampling points for this model. All 19 bioclimatic variables of the WorldClim project [103] were used in highest resolution (~30 arc-seconds) in order to discover potential, fine-scale patterns affecting the distribution of carychiid microgastropods. Almost all included georeference points (>98%) were accurate to about less than 1 km, which justifies the implementation of bioclimatic variables in their highest resolution. We performed five cross-validated Maxent runs and considered grid cells with a cumulative probability of more than 10 (from a range of 0-100) as suitable [104,105]. The area under the ROC curve (AUC) gave an evaluation of the projections' overall quality. An AUC score above 0.7 is considered good model performance [106]. Outputs are generalized clade-based models. Such a model design can slightly overestimate taxon-specific bioclimatic envelopes and can lead to false-positive model results (i.e. predicted habitat suitability but biological absence of taxa).

Niche similarity
ENMtools [107] was used for the comparative analysis of environmental niche models of the European and NCclade. Niche overlap between the two clades was assessed using three different similarity statistics: Schoener's D [108], the I statistic [109], and the relative rank test [110]. Schoener's D and I calculate the difference in standardized suitability scores for each grid cell. The relative rank test provides an estimate of the congruence of relative ranks of suitability for each grid cell. All three measurements can have a range between 0 (non-overlapping niches) and 1 (identical niches). To calculate niche similarity, climatic niche models were cross-projected onto the geographic region of the other clade.