Biology of Maculinea arion
Maculinea arion is a habitat specialist, like the rest of the genus Maculinea (the large blues) and many other members of the tribe Polyommatini (the blues). M. arion caterpillars exploit two resources during their development; a specific food-plant on which they feed during the first three weeks after hatching, and subsequently a specific host-ant in the nests of which they live as obligate predators of ant brood, and where they overwinter and pupate in the late spring after 11-23 months [35–37]. This extreme specialization leads to exceedingly high juvenile mortality rates, with 20-40% of a typical population dying in the egg or early larval stages, and mortality among caterpillars inside the host-ant nest representing 80-90% of the total breeding population mortality . Whereas M. arion has relatively little impact on the fitness of the food-plant, its main host-ant, Myrmica sabuleti, experiences dramatic reductions in colony fitness upon infection . The intimate butterfly-ant relationship leads to large oscillations in census population sizes of M. arion, suggesting that genetic diversity may primarily be maintained by gene flow between local low-density populations, rather than substantial effective population sizes at each local site.
The Møn site and demographic surveys
The number and connectedness of M. arion populations in Europe has decreased over the last century. In Denmark M. arion was previously known from approximately 40 localities, but only half of these persisted until the second half of the 20th century [Figure 1b; ]. At present only a single population remains, at Høvblege on the island of Møn. M. arion has a long and probably continuous history of occurrence at this site, with specimens in natural history collections dating back to 1926 (among the earliest in the collections). At that time three to four local populations were found on the island within the typical dispersal distance of the species [max dispersal distance is estimated to be at least 10 km based on genetic markers; ].
In 1991 a critically low number of imagos was observed flying at the single remaining Høvblege site, which at this point was ca. 8 ha (breeding area). The site, a grassland habitat, had been left almost untouched in the period 1915-1991, allowing for trees, scrub and larger grasses to invade the area with negative consequences for the food-plant distribution. Since 1991 the population has been managed and monitored every year, and now has a breeding area of ca. 10 ha. The number of imagos was determined using transect walks, a standard method for assessing year to year changes in butterfly abundance . Population census sizes were estimated according to Thomas , assuming that approximately 1/3 of the total population is flying on the best day of the flight season and that 85% of these can be observed during a thorough survey, i.e. N
≈ 3.5 × the number of imagos counted on the best day. Recent evaluations of butterfly monitoring methods conclude that caution is needed when estimating population census sizes from transect counts [40–43]. The reason being that transect counts are influenced by adult longevity, which is affected by weather patterns and thus vary between years. However, methods such as mark-release-recapture are unfavourable in endangered populations as they may negatively impact the butterflies. Transect counts were therefore consistently used to estimate population sizes in all years, despite yielding cruder estimates.
In 1991-1996 the population size was consistently around 50-85 individuals, but increased to much larger numbers in 1997-2005 (175-440 individuals), to subsequently drop to 70-105 individuals in 2006-2008 (Figure 1a). The lack of empirical data on population size prior to 1991 makes it impossible to estimate the duration and precise magnitude of the population bottleneck around 1991, but according to anecdotal observations the Møn population was very numerous in 1973.
For comparison we used six M. arion populations in south and central Sweden (Figure 1b, see Ugelvig et al.  for details on these sampling localities). Population demography surveys are not available for these populations, but amateur lepidopterists maintain that the populations in Skåne, Gotland and Öland have never been close to extinction. Furthermore, although the number of local populations have also declined in these areas , a recent study suggests that well functioning meta-populations still exists at these three sites . Conversely, the populations in Uppland, Västergötland and Södermanland are more isolated  and unlikely to still be part of a functioning metapopulation .
The contemporary samples were collected in the summers of 2005 (n = 46) and 2007 (n = 17) using a non-invasive sampling technique, collecting 2 × 2 mm2 wingtip fragments from adult M. arion butterflies. This sampling technique does not affect survival or flight ability of the butterflies [45; D.R. Nash unpublished data].
The historical samples were kindly provided by the Danish Natural History Museums in Copenhagen and Aarhus, and included specimens collected at Høvblege and the two nearby, now extinct, sites Jydelejet and Møns Klint. Unless there is a need to distinguish them, we will collectively refer to all these sites as Møn. One middle leg per museum specimen was sampled from years in which a reasonable number of specimens were collected, which gave a time series with 4-12 years between samples, covering 77 years (Table 1). After 1975, few M. arion specimens exist in the collections, reflecting the rarity of the butterfly, which was finally declared protected in Denmark in 1992. Forceps used for the collection of legs were cleaned in bleach (1% sodium hypochlorite) between each sampling to prevent cross contamination.
DNA from the contemporary samples was extracted by homogenizing the wing fragment in a solution of 100 μl 5% chelex-TRIS (10 mM) and 5 μl proteinase K (0.75 units). The samples were then incubated at 56°C for 90 min, boiled at 99°C for 15 min, and centrifuged at 13000 rpm for 3 min. The supernatant was stored at -20°C.
DNA from the historical samples was extracted using a buffer slightly modified from Gilbert et al. , which consisted of 10 mM Tris (pH 8), 10 mM NaCl, 2.5 mM EDTA, 5 mM CaCl2, 2% sodium dodecyl sulphate (SDS), 40 mM dithiothreitol (DTT) and 10% proteinase K (final concentrations). The samples were incubated for 24 h at 56°C with gentle agitation. The extracted DNA was purified using a Qiagen PCR purification kit (QIAquick), re-suspended in 30 μl elution buffer and stored at -20°C. Extraction and PCR-setup was performed in dedicated ancient DNA clean-laboratories at the Centre for GeoGenetics at the Natural History Museum in Copenhagen, where only pre-PCR work occurs. According to standard protocols for work with low quality/quantity DNA , contamination was monitored at both the extraction and PCR steps by blank controls and all post-PCR procedures were conducted in physically distant laboratories.
Microsatellite amplification and genotype error rates
Two sets of nuclear microsatellite markers were employed corresponding to the two research questions (see Additional file 1 Table S1 for details on all microsatellite loci used in the study). In the comparison between contemporary samples from Møn and Sweden nine microsatellite loci were used; Macu8, Macu9, Macu11, Macu15, Macu17, Macu20, Macu26, Macu44 and Macu45 [with genotype data already existing for the Swedish populations, see ]. Of these loci, four were suitable for the temporal study, as only loci with allele sizes ≤160 bp allowed amplification in the historical samples (Macu15, Macu20, Macu26, Macu45). Primers for an additional ten loci were developed by ECOGENICS GmbH (Zürich, Switzerland), specifically targeting loci with short allele sizes (< 200 bp); Macu30, Macu31, Macari02, Macari05, Macari08, Macari16, Macari18, Macari19, Macari22, Macari23. Amplification from 1 μl of DNA extracts was carried out in 12 μl mastermix volumes using AmpliTaq Gold (contemporary samples; Applied Biosystems) or Platinium Taq High Fidelity (historical samples; Invitrogen). The following cycling conditions were used: initial denaturation 5 min at 95°C; 35-40 cycles of 30 s at 95°C, 30 s at the locus specific annealing temperature of 56/57°C, 30 s at 72°C; final elongation of 30 min at 72°C. PCR products were run on an ABI 3031 × l automated sequencer with the GeneScan-500 LIZ size standard and analysed using GENEMAPPER 4.0 (Applied Biosystems).
We applied a multiple tube approach when genotyping the historical samples, as they were expected to be prone to genotype errors such as allelic dropout and false allele amplification . The amount of DNA was limited to that extracted from a single butterfly leg, thus it was not possible to replicate as extensively as originally proposed by Taberlet et al. (1996; 7-10 replicates per genotype). Instead, two independent amplifications were performed for each sample at each locus. If the same genotype was obtained, this was recorded as the consensus genotype. Conversely, if two different genotypes were found (e.g. one homozygote and one heterozygote) a third PCR was conducted. Genotypes were only scored when every allele was observed at least twice, and in cases where a consensus genotype was not found after three PCRs, it was recorded as missing. Genotype error rates were calculated as recommended by Pompanon et al. , i.e. the error rate per locus. In the contemporary samples, error rates were estimated by re-genotyping a subset (22%) of the samples.
PCR amplification success was analysed by fitting a general linear model (GLM) with binomial errors and logit link, correcting for over-dispersion, and using the number of samples successfully amplifying at each microsatellite locus as the response variable and the total number of samples as the binominal denominator. The maximum allele length amplified at each locus (in base pairs), the age of the samples (in years), and their interaction were used as explanatory variables. The analysis was carried out in JMP 7.02 (SAS Institute Inc.).
Linkage disequilibrium among pairs of microsatellite loci was tested using FSTAT 2.9.3. The program GENALEX 6.3 was used to calculate expected and observed heterozygosities for each microsatellite locus, and for testing genotype frequencies against Hardy-Weinberg (HW) equilibrium expectations. When excess homozygosity was found, the program MICRO-CHECKER 2.2.3  was used to check for evidence of null alleles, and their frequencies at different loci were estimated with the program FREENA. High null allele frequencies were found in some of the historical samples, which may affect F-statistics [50, 51]. Pairwise F
values among the historical samples were re-calculated after applying the ENA correction for null alleles as implemented in FREENA and then correlated with the temporal difference between samples using Mantel tests (Mantel 1967) in FSTAT, using 2000 permutations. The contemporary populations did not show signs of null alleles, and pairwise F
values among the seven Scandinavian populations were calculated in FSTAT, using 1000 permutations. A second measure of genetic diversity, allelic richness, was computed in FSTAT for both contemporary and historical samples and differences in allelic richness and expected heterozygosity among samples were tested using repeated-measures ANOVA in JMP.
Evidence of recent genetic bottlenecks in the temporal samples was tested using the software developed by Garza and Williamson . The program assumes that a reduction in population size will have a stronger affect on the number of alleles (k) than the range of allele sizes (r), leading to a smaller M-ratio (= k/r) in size-reduced populations compared to equilibrium populations. In order to evaluate the empirical M-ratio, an equilibrium population was simulated based on parameters describing the evolution of the analysed microsatellite loci (Δ
: the mean size of larger mutations, p
: fraction of mutations larger than a single step, and μ: the mutation rate/locus/generation) and the effective population size of pre-bottlenecked populations (N
). These parameters are difficult to estimate in empirical samples and each sample estimate of M-ratio was thus tested under different evolutionary scenarios as suggested by Guinand and Scribner . The scenarios include: i) a two-phase mutation model with proportions of non one-step mutations in the range 0.00 (SMM model), 0.05, 0.10 and 0.20, ii) Δ
varying between 2 and 4 (Garza and Williamson  suggest 3.5 as default setting), and iii) a constant μ of 10-4 /locus/generation, but with N
ranging from 50, 100, 250 and 500 corresponding to θ (= 4 × N
× μ) equal to 0.02, 0.04, 0.1 and 0.2. For each sample, an equilibrium population was simulated 10000 times using these parameter settings. The empirical M-ratio averaged across loci was compared to the distribution of simulated M-ratios, in order to evaluate the likelihood of a bottleneck event having taken place (95% criterion).