Sample collection and range estimation
Amphipsalta and Notopsalta specimens (typically male) were collected during surveys from 1993–2010. The species Kobonga apicata (Ashton) (see ), from central and western Australia, was selected as an outgroup based on phylogenetic analyses at the tribe level (unpublished data). Tissue collections were sought from sites distributed as uniformly as possible within each species' range. When a species was detected but not collected, the presence of its diagnostic song was noted and digitally recorded if possible. Localities used for ecological niche modeling were based on collection and song data from this study and historical records  (Figure 2; see Additional file 2: Appendix for details).
Genomic DNA was extracted from 1–2 legs using the Qiagen DNeasy Tissue Kit and protocol (Qiagen, Valencia, California, USA), except that the digestion was conducted over 3–18 hours at 54 °C. Approximately 1350 base pairs of the mitochondrial cytochrome oxidase I (COI) gene were amplified in one of two ways: (1) in two sections; the 5’ (barcoding) section with primers C1-J-1490 + C1-N-2198 , annealing temperature 50°C, and the 3’ section with primers C1-J-2195 + TL2-N-3014 , annealing temperature 56°C; (2) in one piece using C1-J-1490 + TL2-N-3014, annealing temperature 45°C. Amplified products were cleaned using the Clontech Extract II Kit (Clontech, Mountain View, CA, USA) or ExoSAP-IT (USB Corp., Cleveland, OH). For examination of the species-level Amphipsalta
Notopsalta relationships, a subset (see Additional file 2: Appendix) of the individuals sampled were sequenced for two nuclear genes, Calmodulin (763 bp) and elongation factor-1 alpha (EF-1α; ca. 1400 bp). Calmodulin was amplified using the primers Cal60For  and Cal2Rev (UBC insect primer kit). A touchdown cycle was used with the following parameters: (1) a 94°C hold for 2 min.; (2) a touchdown PCR sequence of 94°C for 45 sec., 60-50°C for 45 sec., and 72°C for 1 min. – decreasing by 1°C every cycle and repeated for 10 cycles; (3) a steady PCR sequence of 94°C for 45 sec., 50°C for 45 sec., and 72°C for 1 min., repeated for 25 cycles; and (4) a 72°C hold for 10 min. For A. cingulata, internal calmodulin primers were created from the sequence of the other cicadas. EF-1α was amplified in two overlapping sections using the primers EF1-F001-cicada + EF1-R752 , annealing temperature 53°C, and EF1-PA-f650ambig  + EF1-N-1419 , annealing temperature 58°C. Cycle sequencing was conducted using the Applied Biosystems Big Dye Terminator v1.1 cycle sequencing kit at 1/8- to 1/4-scale reaction volume, and the product was cleaned by Sephadex filtration (GE Healthcare, Piscataway, NJ, USA) and visualized on an Applied Biosystems ABI 3100 capillary sequencer. Sequences were analyzed using ABI Prism Sequencing Analysis software v3.7 (Applied Biosystems), edited in Sequencher v3.1 (Gene Codes Corp., Ann Arbor, MI), aligned in MacClade  and checked by eye.
Individual gene-trees were estimated using partitioned maximum-likelihood (ML) in Garli version 2.0 , using datasets that contained all unique haplotypes. Each nuclear gene was modeled using a single partition because of the low number of parsimony-informative sites available for estimating parameters (see Results); substitution models for these genes were selected using Modeltest v3.7 . For the mtDNA data, alternative combinations of partition schemes and partition-specific substitution models, based on the three codon-position categories, were tested using the “greedy” algorithm of the program PartitionFinder v1.0  and the BIC criterion, with the branch lengths of alternative partitions linked and with the software set to evaluate the full substitution model set. Default settings were used for Garli analyses, except that the number of generations required for termination (genthreshfortopoterm) was increased to 40,000 to improve the search. ML analyses in Garli offered estimates of branch lengths that were not influenced by Bayesian priors [78, 79]. The number of discrete gamma categories was set to 8 to improve resolution of site rates over the default setting of 4 categories. For each analysis, 100 bootstrap pseudoreplicates were conducted in Garli, and these were summarized on the ML tree with the program SumTrees v3.3.1, part of the DendroPy package .
Pairwise genetic divergence plots
Because “saturation” of genetic data by multiple hits has been observed to influence divergence time estimates [81, 82], we constructed plots of uncorrected genetic distances calculated in Mesquite v. 2.75  vs. ML model-corrected patristic (tree-based) distances for the Amphipsalta
Notopsalta COI dataset and a published, combined COI + COII ingroup dataset of about the same size (bp) from Kikihia containing 30 species. Maximum uncorrected genetic distances from the Kikihia dataset are similar to those in Amphipsalta
To place the Amphipsalta-Notopsalta tree in an approximate time frame, Bayesian relaxed-clock phylogenetic analyses of the full COI dataset, with identical haplotypes and the outgroup included, were conducted using BEAST v1.6.1 . Because no fossil or geological calibrations are available for dating these cicadas, the analysis was calibrated using COI molecular clock estimates from the literature. While many insect studies have used the 0.0115 substitutions/site/my/lineage clock rate estimated by Brower , more recent studies have suggested both lower (e.g., [19, 85]) and higher rates  (Table 5), in part because of the use of highly partitioned sequence-evolution models  that tend to reconstruct more change along branches. In each analysis, the mean and standard deviation of the relaxed clock were fixed to reflect the literature estimates from Table 5 by the following settings: (1) we set ucld.mean (the mean of the lognormal prior distribution on branch rates) to 0.0115 substitutions/site/my, the Brower  rate, (2) we set ucld.stdev, the standard deviation of the lognormal prior on branch rates, to 0.3, so that the 95% confidence interval of the lognormal distribution included rates from 0.006 to 0.0188 substitutions/site/my, and (3) we deactivated the operators for ucld.mean and ucld.stdev, so that the software did not attempt to estimate these parameters. We suggest that the approach of combining all empirical knowledge into a single prior is superior to the use of separate analyses under alternative rates an approach that was used in earlier work on Kikihia.
BEAST analyses of the COI dataset used the same three-partition scheme as the ML analysis. The Amphipsalta sequences were constrained to form a monophyletic clade, as supported by the ML analysis (see Results), because initial runs of some models suggested a tendency for BEAST to place the ingroup root between clades containing A. cingulata + A. zelandica and N. sericea + A. strepitans. Model parameters and partition rate multipliers were estimated separately for each partition. Four tree priors were tested in separate analyses and compared using the Bayes factor scores calculated in Tracer v1.5 : Yule, birth-death, a constant-size coalescent, and an expansion-growth coalescent. The criterion 2 ln Bayes factor > 10  was used to decide if a given model was superior. Models were run to 40 million generations, and stationarity and adequate sample sizes (i.e., > 200) were confirmed using Tracer  and by ensuring that multiple runs found the same solution. Other priors were left at the default values specified by BEAUti v 1.6.1 (provided with the BEAST package). Statistically improper priors, such as intervals extending to infinity, were converted to proper priors with a large, finite upper bound. Prior distributions were estimated by running the analysis with no data.
Test for intraspecific diversification
Because phylogeographic patterns in the mtDNA dataset suggested ongoing intraspecific allopatric diversification, we tested for the threshold between interspecific and intraspecific lineage dynamics using the generalized mixed Yule-coalescent (GMYC) model of Pons et al.  as implemented in the R package SPLITS [88, 89]. If the samples of each species are drawn from a single panmictic population, the method should identify a branching-rate shift at the base of each species caused by the change from coalescent (intraspecific) to birth-death (interspecific) dynamics. A tree containing only unique haplotypes was analyzed in BEAST to obtain a chronogram under the same model used in the divergence-time analysis above. The chronogram was tested in the GMYC analysis using an interval of 0,10 and a single-threshold model.
Geographic structure in the mtDNA dataset was also tested using an analysis of molecular variance (AMOVA) as implemented in Arlequin v 3.1 . Populations were combined into groups according to a six-region scheme defined by Cook Strait and latitude lines -37° S, -39° S, -43° S, and -45° S.
Ecological niche modeling
Combining ecological niche models with paleoclimatic data to hindcast species ranges offers a means of testing refugia hypotheses developed from genetic information. For comparison with intraspecific phylogeographic patterns, the current and LGM distributions of each species were estimated using ecological niche models. Climate data used to construct the ENMs included mean annual rainfall (mm), mean February rainfall (mm), mean annual solar radiation (kJ/day/m2), mean annual temperature (°C), mean February temperature (°C), minimum temperature of the coldest month (°C, July), and October vapor pressure deficit (kPa), all at 100 m resolution . These layers, derived from meteorological data (1950–1980, ), were fitted to the LGM based on estimates of temperature depression from marine isotope stages and estimates of LGM topography based on depression of the sea level by 120 m (J.R. Leathwick, unpublished data, see ).
Ecological niche models were generated with Maxent 3.3.1 [92, 93]. Models were built using 341 A. zelandica, 62 A. strepitans, 86 A. cingulata and 87 Notopsalta sericea localities. Maximum iterations were increased as necessary to allow the algorithm to converge, with default settings used otherwise. To ensure consistency of model predictions among repeated runs, we performed a 10-fold cross-validation, in which a different 90% of localities were used to train the model and 10% were used to test it for each of 10 runs, such that each locality was used to test the model once. Estimated models for each species were projected onto LGM climate surfaces for all 10 runs, using the "fade by clamping" setting for output grids, to visually inspect for differences in regions projected as suitable habitat between replicate runs. "Clamping" refers to the handling of LGM grid sections with climatic values outside the range observed during training – Maxent sets these to the corresponding maximum or minimum observed training value. The "fade by clamping" method attempts to correct for the effect of this uncertainty on the projected distribution by reducing the estimated climatic suitability proportional to the level of clamping. Clamping grids were also inspected for differences in model output under different data partitions. The final geographical projections represent the mean point-wise prediction over ten model runs. Model performance was evaluated using threshold-dependent binomial omission tests and the Area Under the (Receiver Operating Characteristic) Curve (AUC) calculated by Maxent.
Regions which contain suitable climate conditions for a species may be unoccupied due to other factors, such as dispersal barriers . Inclusion of these regions during model calibration can result in overfitting to conditions found near the collection localities . When hindcasting to LGM climate conditions indicated no refugia for the two taxa limited to NI (A. cingulata and N. sericea; see below), we re-ran these analyses using NI climate data only for training and testing to examine whether the model for these species was being overfitted by the inclusion of SI data. The model was then projected onto LGM surfaces for all of NZ. While Cook Strait is not likely to be an absolute dispersal barrier to cicadas, it could delay colonization of SI by species restricted to northern NI refugia.