Sample collection and Molecular techniques
Sampling was designed to cover most of the range of P. nigromaculata across the temperate and subtropical zones in China. We obtained 262 individuals from 28 localities during the year 2002 and 2003. Two hundred and six samples were used in cytochrome b (Cyt b) sequence analyses, the other 56, collected from 4 sites (25-a, 25-b, 25-c and 25-d) in Dongliao area (locality 25, DL, Fig. 1) were used in PCR-RFLP analyses to identify secondary contact between haplotype group A and B. The distance between sites 25-a and 25-b is 11.9 km, and that of 25-b and 25-c, 25-c and 25-d is 2.2 and 28.1 km, respectively. For inter simple sequence repeat (ISSR) analyses, six or nine (just in DL) individuals were sampled from the 28 localities mentioned above. Rana catesbeiana [GenBank: AF205089] and Rana rugosa [GenBank: AF205093] were used as outgroup taxa. Four conspecific haplotypes from Korea (KR) and Japan (JP-1, 2, 3) [GenBank: AF205087, AY315755~AY315757] were also included in the phylogenetic analyses to help infer relationships with Chinese haplotypes. The sampling code and population localities are given in Table 1, and the map of collecting localities is shown in Fig. 1.
Tissue was taken from leg muscle and stored in 95% ethanol. Extraction of DNA followed standard phenol/chloroform extraction methods. The complete 1143 bp of mitochondrial Cyt b gene was amplified with primers Glu-f: GAC TCT AAC CTG GAC CAA TAG-3' and contr5'r: TAA ATT TAT GCT CTA TAC A-3', which were designed based on the published mtDNA complete sequence of Rana nigromaculata [GenBank: AB043889] . The polymerase chain reaction (PCR) was carried out in 30 μL volumes containing 3.0 μL 10 × Buffer, 2.0 mM MgCl2, 0.2 mM each dNTP, 0.25 μM each of primers, 5–10 ng of template DNA and 1.0 U of Taq DNA polymerase (Promega). The PCR cycling parameters were 95°C for 4 min, 35 cycles of 95°C for 40 s, 50°C for 45 s, 72°C for 1 min, and 1 cycle of 72°C for 7 min. Templates for sequencing were purified and sequenced in both directions by United Gene Holdings (Shanghai). Nucleotide sequences were initially aligned using CLUSTAL X Version 1.81 and corrected manually. Complete sequences were assembled using Seqman II (DNASTAR). The program DNAClub which could predict cutting sites in a DNA sequence was used to search for restriction endonucleases that would yield unique, specific restriction digestion profiles for each haplotype group. For the 56 unsequenced samples, we used two restriction enzymes of Taq I and Xba I to digest the Cyt b products simultaneously, by which we could distinguish the two sharply diverged haplotype groups rapidly and unambiguously.
We also investigated diversification and potential gene flow using inter simple sequence repeat PCR (ISSR). Ten 3'-anchored primers (UBC807, 808, 809, 811, 817, 818, 825, 826, 827 and 835) from University of British Columbia (ISSR set #9) were used for the study. The reaction mixture (25 μL) contained 2.5 μL of 10 × reaction buffer, 25 ng of DNA template, 0.5 μM of a single primer, 2.0 mM MgCl2, 0.2 mM of each dNTP and 1.0 U of Taq DNA Polymerase (Promega). The PCR products were separated on 2% agarose gels buffered with 0.5 × TBE, detected by staining with ethidum bromide, and photographed under ultraviolet light. Molecular weights were estimated using a 100-bp DNA ladder.
We used PAUP*4.0b10  for phylogenetic analyses using maximum parsimony (MP), maximum likelihood (ML) and neighbor-joining (NJ) methods. Most local populations showed low level of genetic differentiation and were composed of haplotypes with only 1 to 3 nucleotide changes. For ML phylogenetic reconstruction, these similar haplotypes would considerably increase the computation time. Therefore, the dataset comprised of all haplotypes was only used for MP reconstruction. To save the computing time, we cut the sequences with less than four-step substitutions for ML analysis to a 49 ingroup exemplar dataset including 45 from China, 1 from Korea and 3 from Japan. The best-fit likelihood model of sequence evolution was HKY + I + G, which was chosen on the basis of hierarchical likelihood-ratio tests (hLRTs) as implemented in MODELTEST 3.06 . The parameters of the model were calculated using PAUP*. The base frequencies were 0.2582, 0.3340, 0.1334 and 0.2743 for A, C, G and T, respectively, transition/transversion ratio ti/tv = 4.2823, proportion of invariable sites Pinvar = 0.5841, gamma distribution shape parameter α = 1.9961. We also conducted MP and NJ analyses for our reduced exemplar taxa. ML heuristic search was conducted using the following options: addition sequence = 'as-is', with 1 tree held at each step, TBR swapping algorithm, Collapse and Multrees options in effect, steepest descent option not in effect. Nodal support of the ML tree was estimated using bootstrap method  with 100 pseudoreplicates; Maximum parsimony (MP) analyses were conducted using heuristic search with 100 random sequence additions and tree-bisection-reconnection (TBR) branch swapping. Robustness of the MP trees was assessed by 1000 bootstrap replicates; NJ reconstruction was conducted using distances calculated under the HKY model, support for the topology was evaluated by bootstrapping 5000 replicates. We also used the chosen model to calculate the diversity indices and the sequence distances.
We calculated the statistical tests D * and F * proposed by Fu and Li , and the D test statistic proposed by Tajima  implemented in DnaSP 3.0  for testing the hypothesis that all mutations are selectively neutral .
Nested-clade phylogeographic analysis
We applied nested clade analysis (NCA [18, 52]) to gain further insight into the demographic history of P. nigromaculata. The statistical parsimony  cladogram networks were constructed using TCS version 1.13 . The connection for haplotypes at 95% confidence level were nested into higher level clades followed the algorithm given by Templeton et al. . We used GEODIS version 2.0  to test the null hypothesis of no association between haplotypes and their geographical locations by randomly permuting the observations within a nesting clade across geographical locations. After 1000 random permutations, the clade and nested clade distances (D
), the interior-tip distances (I-T
) were calculated. The observed D
were then contrasted to the null distribution to infer the statistically significantly large (L) and small (S) distances at 5% deviation level. For the clades with significant geographical associations, the recent published inference key  was used to infer possible causation.
Mismatch distributions were created using ARLEQUIN version 2.000  to detect historical demographic expansions for the major lineages (with or without range expansion in NCA analysis). We also calculated Tajima's D [49, 57] and performed Fu's Fs test  to confirm evidence of demographic expansion. We estimated the time since expansion with the equation τ = 2 ut , where τ is the mode of the mismatch distribution, an index of time since expansion expressed in units of mutational time, u is the mutation rate for the whole sequence, and t is the time since expansion. Finally, the comparisons of haplotype diversity (h) and nucleotide diversity (π) within different lineages were used to infer historical demography and examine the biological inference from our nested clade analysis.
Divergence time between lineages
We analyzed the divergence time between the two lineages with a coalescent-based approach using the program MDIV . We ran the program under the 'finite model', which does not assume that only one mutation occurs at each site but takes into account the possibility of multiple hits, differences in the nucleotide frequencies and the presence of transition/transversion bias. MDIV estimated values for theta (θ = 4 N
μ), migration rate (M = 2 N
m), time of divergence (T = t/2 N
) . We then ran each simulation 5 × 106 times with a 10% burn- in period as suggested. Likelihood values for θ, M and T were calculated and the value with the highest posterior probability accepted as the best estimate. Values for T were calculated using an evolution rate estimated at 4.4% per Myr .
Unequivocally scorable and consistently reproducible amplified ISSR bands were scored as present (1) and absent (0), each of which was treated as an independent character regardless of its intensity. Fragments of the same molecular weight were considered as the same locus.
The binary matrix was used under the Hardy-Weinberg equilibrium to calculate the percentage of polymorphism (PPB), Nei's gene diversity (H
), the observed number of alleles per locus (A
), the effective number of alleles per locus (A
), Shannon's information index (I), total gene diversity, the level of gene flow using POPGENE version 1.31 .
Furthermore, to illustrate the relationship among populations, genetic divergence (F
) were calculated using software ARLEQUIN version 2.000. We generated UPGMA (unweighted pair group method with arithmetical averages) trees using the SAHN- clustering and TREE programs from the NTSYS-pc, version 2.1 package . In this tree, individuals were enforced to their populations except for those from the DL population (site 25), which were subdivided into DLA and DLB according to their ISSR patterns.