Taxonomic names and frog specimens used
Since Frost et al. , taxonomic ranks and names of many anuran taxa including microhylids have changed frequently. To avoid needless confusion, in this paper, taxonomic ranks and names followed Frost et al.  and Frost , respectively. The 35 specimens analyzed here are shown in Table 1.
PCR and sequencing
Total genomic DNA was extracted from muscle tissue of each specimen using a DNeasy Tissue Kit (QIAGEN) according to the manufacturer's protocol. From the total DNA, partial portions of two mt genes, 16S ribosomal RNA (16S, approx. 0.9 kbp) and Cytochrome c oxidase subunit I (cox1, approx. 0.8 kbp), and six nuclear encoding genes, brain-derived neurotrophic factor (bdnf, approx. 0.7 kbp), chemokine receptor 4 (cxcr4, approx. 0.7 kbp), Na+/Ca2+ exchanger (ncx1, approx. 1.3 kbp), recombination-activating proteins 1 and 2 (rag1 and rag2, approx. 1.6 and 1.2 kbp, respectively) and tyrosinase (tyr, approx. 0.7 kbp), were amplified by PCR. These gene portions cover almost all sequence regions of the alignment data used in two previous molecular phylogenetic studies of microhylids (16S, ncx1, cxcr4, and rag1 ; cox1, bdnf, tyr, rag1 and rag2 ), and the amplification primers used here basically followed these studies. The detailed sequences of PCR and sequencing primers are available upon request (to AK). PCR mixtures were prepared with an Ex-Taq Kit (TaKaRa Bio) or a KOD-FX Kit (TOYOBO) according to the manufacturer's protocols. The resultant PCR fragments were purified by ExoSAP-It for PCR Clean-Up kit (US Biochemical) and ethanol precipitation. The gene sequences were directly determined from the purified PCR fragments with a BigDye® Terminator cycle sequencing kit and an automated DNA sequencer (ABI3130xl, Applied Biosystems). The resultant sequence data were deposited in the DNA databases (Table 1).
Molecular phylogenetic analyses
To perform phylogenetic analyses, we produced a long alignment dataset (Aln-1) and two data subsets (Aln-2 and Aln-3) by combining our data with that from Van Bocxlaer et al. and/or van der Meijden et al. [2, 3] (Additional file 6). The long dataset (Aln-1) made from the data of all three studies includes a long sequence (7164 nucleotide sites in total) of eight gene partitions of 42 taxa (29 microhylids from nine subfamilies, six afrobatrachians, five natatanurans, and two hyloids). Among these two data subsets, the Aln-2 has a middle length sequence (4122 nucleotide sites) of five gene portions from 82 OTUs consisting of 53 microhylids from ten subfamilies, eight afrobatrachians, seven natatanurans and three hyloids, six archaeobatrachians, a caudate, and four other vertebrates. In Aln-2, cox1 sequences of non-neobatrachian taxa were not used because of the fast nucleotide substitution rate of this gene . The Aln-3 data subset includes four gene portions (2813 nucleotide sites) from 63 taxa consisting of 44 microhylids from ten microhylid subfamilies, eight afrobatrachians, nine natatanurans, and two hyloids. More detailed information of used taxa and sequences in each alignment data are summarized in the Additional file 7. To make these alignments, we initially aligned each gene portion by using MUSCLE  implemented in SeaView ver. 3.2 . The resultant alignments were revised by eye using amino acid alignments as the guide. For 16S data, ambiguous alignment sites were removed by using Gblocks ver. 0.91 b  with a default parameter. Then, each gene portions were concatenated to make the above alignment datasets. The datasets used in this study are available from Additional file 8.
Based on the long dataset (Aln-1) and two data subsets (Aln-2 and 3), phylogenetic trees were constructed by the ML and BI methods. Heterozygous nucleotides occasionally found in nuclear gene sequences were deleted (and assign as missing data). Gaps in the alignments were treated as missing data. For ML and BI analyses, partitioned models were applied. The most appropriate substitution model for each gene portion was estimated based on the Akaike and Bayesian Information Criteria (AICc1 and BIC1 [20, 21]) implemented in Kakusan3  for ML and BI analyses, respectively. In ML analyses, the parameters for nucleotide frequencies, gamma distribution (G; with eight categories), and proportion of invariable sites were estimated by Treefinder program ver. Oct. 2008 . The estimated best-fit models for each partition are shown in Additional file 6.
The ML analyses were performed using the Treefinder, and BP values were calculated with 1000 pseudo-replications. The BI analyses were performed using MrBayes ver. 3.1.2 . Two independent runs of four Markov chains were conducted for 11 million generations for all datasets (sampling frequency was one tree per 100 generations for every datasets). Parameter estimates and convergence were checked with Tracer ver. 1.4 , and the first 1 million trees and first 3 million trees were discarded for Aln-1 and 2 data and Aln-3 data, respectively. Node credibility of the BI tree was evaluated by Bayesian posterior probabilities (BPP). Two hyloids (Agalychnis callidryas and Bufo japonicus) and zebrafish (Danio rerio) were employed as outgroups in the analyses based on Aln-1 and-3 data and Aln-2 data, respectively.
Our ML tree topology and alternative microhylid phylogenies suggested by previous studies [1–3] were compared in an ML framework using approximately unbiased (AU) and Kishino-Hasegawa (KH) tests [26, 27] implemented in Treefinder. For the phylogenetic position of Gastrophrynoides, alternative topologies having high lnL scores were searched under the "non-monophyly of Gastrophrynoides and asterophryines" constraint. In this analysis, we used "are NOT" option of tree constraint command implemented in PAUP4.0 b . Among the trees obtained under this constraint, one topology with the highest lnL score was also tested (topology 17 in Table 3).
Divergence time estimations using a Bayesian molecular clock method were conducted as in previous molecular dating analyses on microhylids [2, 3]. We used two combinations of an alignment dataset and a topology, Aln-2d + Tree1a and Aln-3d + Tree1b (see below). Three sets of calibration points were applied to each combination (see below) for a total six dating analyses (calibration A-F). The topology and the applied calibration points in each calibration are shown in Additional file 2 and the alignment datasets used are available in Additional file 8.
The first alignment dataset used (Aln-2d) is basically the same with the Aln-2 but cox1 and tyr sequences were removed from the original Aln-2. This is because these genes are unsuitable for molecular dating of microhylid subfamilies (due to high nucleotide substitution rates ), and the tyr sequences did not allow us to calculate variance-covariance matrices of branch lengths for the designated topology (Tree1a), possibly due to many nodes lacking supporting nucleotide changes. Thus, Aln-2d only contains rag-1, rag-2 and bdnf sequences (2970 nucleotide sites in total). The other alignment data (Aln-3d) is similar to Aln-3 but this data contains a larger number of OTUs (total 101) to allow us to employ broad calibration points that were used in Van Bocxlaer et al. . The Aln-3d (2739 sites in total) is slightly shorter than the original Aln-3 because of increment of ambiguous alignment sites due to adding OTUs. Zebrafish (Danio rerio) was employed as outgroup for both Aln-2d and-3d analyses. The two tree topologies (Tree1a and Tree1b) used in the age calibrations were modified from the ML trees of Aln-2 and Aln-3 datasets, respectively; they have the ML topology of Aln-1 (= Figure 1) for microhylids, natatanurans, and afrobatrachians (see Additional File 2), while all other relationships were as inferred for the respective datasets.
A total 14 calibration points, based on nine fossil records (F1-9) and five paleogeographic events (G1-5), were applied in this study as indicated below. F1: > 330 Ma, split of Lissamphibia and Amniota (fossil of the earliest aïstopod). F2: 338 - 312 Ma, split of Diapsida and Synapsida (fossils of early diapsids and synapsids). F3: > 230 Ma, split of Anura and Caudata (fossil of Triadobatrachus). F4: > 164 Ma, split of Costata (Alytidae and Bombinatoridae) from other anurans (fossil of Eodiscoglossus). F5: > 151 Ma, split of Rhinophrynidae and Pipidae (fossil of Rhadinosteus). F6: > 146 Ma, split of Cryptobranchidae and Hynobiidae (fossil of Chunepeton). F7: > 55 Ma, split of Bufonidae and other hyloid families (fossil of the oldest Bufonidae). F8: > 29 Ma, split of Rana (sensu lato) and other ranid genera (fossil of the oldest Rana). F9: > 404 Ma, split of lungfishes and tetrapods (fossil of the oldest tetrapodomorph ). G1: > 110 Ma, split of Pipa and Xenopus (fragmentation of the African and South American landmasses). G2: > 65 Ma, split of Dyscophinae and Microhylinae (fragmentation of India and Madagascar). G3: > 42 Ma, split of Agalychnis and Litoria (fragmentation of Australia and South America). G4: < 15 Ma, Blommersia wittei and B. sp. "Comoro" (formation of Comoro Islands). G5: > 5 Ma, Alytes muletensis and A. dickhilleni (the Mediterranean salinity crisis). With the exception of F9, these calibration points were applied in Van Bocxlaer et al.  and/or van der Meijden et al. . The minimum ages of F1 and F2 were adjusted from those used by van der Meijden et al. (from 338 and 288 to 330 and 312, respectively) based on more recent information . Also to accommodate more recent information, the minimum divergence times of F2 and F3 were changed from the original values used by Van Bocxlaer et al. (from 306.1 and 245.0 to 312 and 230, respectively).
Seven calibration points, (F1-F3, F8, and G1-G3) were applied in calibrations A and D. The seven points (F1-F3, G1, and G3-G5) previously used in van der Meijden et al. , plus F9, were applied in calibration B. Six points (F1-F3 and F7-F9) consisting of only fossil evidences were applied in calibration C. Nine points (F1-8 and G2) previously used in Van Bocxlaer et al.  were applied in calibration E. Eight fossil points (F1-8) were applied in calibration F.
The age calibrations were performed using software packages PAML ver. 4  and Multidivtime . In all calibrations, optimized branch lengths with their variance-covariance matrices of each alignment data were estimated for each gene partition (i.e., multiple loci analysis) with an F84 + G model (similar to the best model among available models in Multidivtime) using estbranches program. Parameters used in the model were estimated by PAML. To estimate divergence times, Markov chains were conducted for 10 million cycles with one per 100 sampling frequency and 10% burn-in for all six calibrations.
Four additional dating analyses (calibrations G-J) were performed using two distinct tree topologies (= the topologies of maximum likelihood trees from Aln-2 and Aln-3 datasets; see Additional file 4). The procedures of these calculations were much the same as the above but 3 million Markov chain cycles were conducted for these additional calculations.