Singleton molecular species delimitation based on COI-5P barcode sequences revealed high cryptic/undescribed diversity for Chinese katydids (Orthoptera: Tettigoniidae)

Background DNA barcoding has been developed as a useful tool for species discrimination. Several sequence-based species delimitation methods, such as Barcode Index Number (BIN), REfined Single Linkage (RESL), Automatic Barcode Gap Discovery (ABGD), a Java program uses an explicit, determinate algorithm to define Molecular Operational Taxonomic Unit (jMOTU), Generalized Mixed Yule Coalescent (GMYC), and Bayesian implementation of the Poisson Tree Processes model (bPTP), were used. Our aim was to estimate Chinese katydid biodiversity using standard DNA barcode cytochrome c oxidase subunit I (COI-5P) sequences. Results Detection of a barcoding gap by similarity-based analyses and clustering-base analyses indicated that 131 identified morphological species (morphospecies) were assigned to 196 BINs and were divided into four categories: (i) MATCH (83/131 = 64.89%), morphospecies were a perfect match between morphospecies and BINs (including 61 concordant BINs and 22 singleton BINs); (ii) MERGE (14/131 = 10.69%), morphospecies shared its unique BIN with other species; (iii) SPLIT (33/131 = 25.19%, when 22 singleton species were excluded, it rose to 33/109 = 30.28%), morphospecies were placed in more than one BIN; (iv) MIXTURE (4/131 = 5.34%), morphospecies showed a more complex partition involving both a merge and a split. Neighbor-joining (NJ) analyses showed that nearly all BINs and most morphospecies formed monophyletic cluster with little variation. The molecular operational taxonomic units (MOTUs) were defined considering only the more inclusive clades found by at least four of seven species delimitation methods. Our results robustly supported 61 of 109 (55.96%) morphospecies represented by more than one specimen, 159 of 213 (74.65%) concordant BINs, and 3 of 8 (37.5%) discordant BINs. Conclusions Molecular species delimitation analyses generated a larger number of MOTUs compared with morphospecies. If these MOTU splits are proven to be true, Chinese katydids probably contain a seemingly large proportion of cryptic/undescribed taxa. Future amplification of additional molecular markers, particularly from the nuclear DNA, may be especially useful for specimens that were identified here as problematic taxa. Electronic supplementary material The online version of this article (10.1186/s12862-019-1404-5) contains supplementary material, which is available to authorized users.


Introduction
Taxonomic ambiguities and uncertainties are frequently generated due to cryptic or hidden species [1]. Species identification based on morphological characters requires experienced taxonomists [2]. Recently, DNA barcode have been recommended for the insect biodiversity evaluation [3]. DNA barcoding employs a single or a few standardized, highly variable and easily amplified DNA fragments for species identification [4,5]. The 5′ portion of mitochondrial cytochrome c oxidase subunit I (COI-5P) has become the standard insect barcoding marker. Numerous studies rely on COI-5P as the only molecular information for insect species delimitation and identification [6][7][8][9]. DNA barcodes not only substantially improve the accuracy of species identifications but also accelerate the study of taxonomically difficult and hyperdiverse taxon. The introduction of DNA barcoding as an auxiliary method in taxonomy has many benefits. Firstly, DNA barcodes can lead to an easy assignment of specimens of certain life stages (e.g. eggs, larvae, nymphs or pupae) to known species [10]. Second, DNA barcoding requires a solid taxonomic background to use as a reference [11]. DNA barcodes can also add scientific value to standard museum specimens, as the information they contain is revealed through molecular analyses [12]. Third, DNA barcodes can accelerate the discovery of cryptic/undescribed species and have been incorporated into the new species description for several zoological groups [13][14][15][16].
Recently, several similarity-based species delimitation approaches, e.g. Barcode Index Number (BIN), REfined Single Linkage algorithm (RESL), Automatic Barcode Gap Discovery (ABGD) and a Java program uses an explicit, determinate algorithm to define Molecular Operational Taxonomic Unit (jMOTU); and clustering-based approaches, e.g. Generalized Mixed Yule Coalescent (GMYC), Bayesian implementation of the Poisson Tree Processes model (bPTP) have highlighted extensive inconsistency in morphological taxonomy [17,18]. ABGD analysis generate MOTUs based on features in sequence distance distributions [19]. RESL employs single linkage clustering as a tool for the preliminary assignment of records into one MOTU [20]. BIN system developed within the Barcode of Life Data (BOLD, www.barcodinglife.org) system to register the OTUs delineated by RESL [20]. jMOTU, a Java program uses an explicit, determinate algorithm to define MOTU [21]. GMYC and bPTP based on quite different models. GMYC applies single or multiple time thresholds to delimit species in a Maximum likelihood context, using ultrametric trees [22,23]. bPTP is similar to GMYC, but used substitution calibrated trees [24]. Either single approach may lead to over-and/or underestimating species diversity. Here, we carried out different approaches for species delineation using DNA sequence data in order to have more robust results. The distinction between intraspecific and interspecific genetic divergence is critical for DNA barcoding; greater intraspecific divergence produces a greater likelihood of overlap with interspecific divergence [11]. For similarity-based approaches, the distance cut-off used for the determination of MOTUs is important, but arbitrary. Even no one threshold captures all species concepts or operational criteria [12]. Relaxed clustering-based methods that permit larger divergences within cohesive clusters may give even greater utility to similarity-based approaches [25]. The choice of a species delimitation method from molecular data has a considerable effect on estimated species entities and, thus, also on species richness estimates [26]. BOLD system [27] provides a unique environment for sharing data across projects; it not only supports all phases of the analytical pathway, from specimen collection to a tightly validated barcode library, but has already integrated many analysis tools. The BOLD system assigns a BIN for all barcode records. The BIN system will also help us to focus on those taxa that share the same BIN or split amongst multiple BINs [28].
Single-locus species delimitation methods have become popular due to the adoption of the DNA barcoding paradigm [17]. When delimiting putative species based on single-locus data, researchers should consider using both clustering-and similarity-based methods to account for the shortcomings of different methods [17,29,30]. Previously reported cases of high-failure rates in using traditional morphospecies definitions were largely resolved upon using MOTUs instead of traditionally described morphospecies, which suggested that some morphospecies may require taxonomic revision [25].
DNA barcoding has been used to document both grasshoppers [10,[31][32][33][34] and katydids [35,36]. Unfortunately, these studies included only a few species and a limited number of specimens from each species, which prevents the rigorous assessment of species boundaries among closely related lineages and for calculation of intraspecific distances. Recently, Hawlitschek et al. (2017) presented a large-scale DNA barcode data set that includes 748 COI sequences from 127 species of Central European crickets, katydids and grasshoppers [37].
The Katydid diversity is rich but woefully underexplored in China. Both Mecopoda elongata and Gampsocleis gratiosa have a long history as singing pets in China. Researchers have reported that numerous species belong to the family Tettigoniidae Krauss, 1902 [38], but only a very limited COI-5P barcode records were available in the GenBank and BOLD systems. Barcode-based species identification relies on a comparison of its DNA barcode with those of determined individuals. To be effective, species-level assignments require a reference sequences database which represents all known species [39].
As different methods may yield inconformity conclusions [40], the accurate species identification and/or delimitation requires further integrative analysis. This study represents the first large-scale barcoding study of the family Tettigoniidae. Different molecular species delimitation methods such as BIN, RESL, jMOTU, ABGD, GMYC, and bPTP were applied to an unexplored Chinese katydid fauna. The main aims were (i) to present the largest species-level barcoding study for the Chinese katydids to date and then characterize the range of genetic divergence; (ii) to evaluate the correspondence between the identified morphospecies and the defined barcode groupings using molecular species delimitation methods; (iii) to infer species diversity and compare it to traditionally identified morphospecies sorting; and (iv) to test for the existence of a hidden diversity among otherwise well-defined taxa. Here, we define independently barcode groupings that remain morphologically indistinguishable from each other as cryptic or hidden species.

Specimen collection and morphological identification
We collected 2576 katydid specimens throughout China. Specimens were fixed in absolute ethanol and were transferred to − 20°C storage prior to genomic DNA extraction. Whenever possible, more than one location was sampled for each species to survey for intraspecific variation. Due to the polymorphism and remarkably wide distribution, a broader sampling (n ≥ 10 specimens) was particularly intended for 39 morphospecies, namely, Specimens were sorted morphologically and were taxonomically identified at least to the subfamily-level. For ease of information management, all unidentified specimens without a scientific name were assigned to an interim species (hereafter noted as BIN-species) by BINs provided by BOLD systems, and were noted by the generic/subfamily name plus sp.1, sp.2 and so on. For example, Atlanticus spp. were assigned to 19 BINs, so we noted them as Atlanticus sp1, Atlanticus sp2, Atlanticus sp3, etc. The specimen described as 'undescribed genus' were identified up to the subfamily level. All voucher specimens were preserved at the College of Life Sciences, Hebei University. More precise taxonomic determinations have been added for some specimens since their initial identification, and further taxonomic detail will be added to the BOLD systems as work progresses after publication.

DNA extraction, amplification and sequencing
DNA extraction from the leg muscle tissue of each specimen was carried out using the TIANamp Genomic DNA Kit (Tiangen Biotech, Beijing, China), following the manufacturer's instructions. The universal primer pairs LCO1490/HCO2198 [41] were used to amplify and sequence the animal barcode region. PCRs were performed with 96-well plates. The reaction master mix consisted of 2400 μL 2 × Premix Taq™, 480 μL each primer (10 μmol/L) and 1152 μL water. This mixture was prepared for each plate, and each well contained 1 × Premix Taq™, 1 μmol each primer, 3~5 ng genomic DNA. The PCR profile was comprised of an initial denaturation step of 2 min at 95°C, and 35 cycles of 30 s at 94°C , 40 s at 50°C and 1 min at 72°C, with a final extension of 7 min at 72°C.
Amplicons were checked through a 1% agarose gel and bi-directional sequencing was performed at GENE-WIZ (Tianjin, China). Sequences were manually edited and assembled into a consensus sequence using SeqMan Pro [42]. Consensus sequences, specimen collection data, specimen images and sequence trace files were uploaded to the Barcode of Life Data System (BOLD) and are available to the public domain as part of the project DNA Barcoding to Katydids from China (DBKC).

Data analysis
We constructed our primary dataset from BOLD systems, including all public records that were geographically limited in China, and with a length ≥ 600 bp. To reduce computational requirements, we divided our entire dataset into four subsets by a subfamily or consonant subfamilies based on the four monophyletic lineages that were recognized by our previous mitogenomic Bayesian inference analysis with the site-heterogeneous CAT-GTR model [43]. The DS-DBCHL dataset is composed of 596 barcode sequences from Conocephalinae, Hexacentrinae, and Lipotactinae. The DS-DBMEC dataset is composed of 376 barcode sequences from Meconematinae. The DS-DBPPM dataset is composed of 993 barcode sequences from Phaneropterinae, Pseudophyllinae, and Mecopodinae. The DS-DBTB dataset is composed of 200 barcode sequences from Tettigoniinae and Bradyporinae.

Sequence analysis module of BOLD systems
BIN was used as a registry for the records on the BOLD systems [27], which provided a means of confirming the concordance between barcode sequence clusters and species designations [20,28]. Cluster sequence analysis using RESL was independent of the BIN registry of BOLD systems [20]. Genetic distances were calculated and summarized using the "Distance Summary" and "Barcode Gap Analysis" tools on BOLD systems [27]. Barcode gap analysis provides the distribution of distances within each species and the distance to the nearest neighbor (NN) of each species. Species are tested for the presence of the barcode gap. The NN distance is the genetic distance between a species and its closest congeneric relative. All sequences (> 600 bp) were aligned using MUSCLE [44], phylogeny model used was the Kimura 2-parameter (K2P) [45], and pairwise deletion of missing data was done. The correlation between the maximum intraspecific variation (K2P) against record count and maximum geographic extent (km) of sampled individuals was determined for each species sampled from more than one sites.

Similarity-based methods: jMOTUs and ABGD
In performing the additional "Sequence analysis module" of BOLD, we also applied two similarity-based methods to generate MOTUs. Each of the four datasets was analyzed with jMOTU_define 2.04 [21,46] using different cut-offs (from 0 to 25 bp). ABGD analysis was performed to sort the sequences into hypothetical species that are based on the barcode gap, which can be observed whenever the divergence among organisms that belonging to the same species is smaller than the divergence among organisms from different species [19].

Clustering-based methods: GMYC and bPTP
Previous studies suggest that an additional bias may be introduced for clustering-based methods when duplicate haplotypes are not removed [17]. Prior to species delimitation analyses with two clustering-based methods, we applied DAMBE [47] to remove duplicate haplotypes. A total of 530 unique haplotypes from the DS-DBPPM dataset, 147 haplotypes from DS-DBTB, 158 haplotypes from DS-DBMEC, and 390 haplotypes from DS-DBCHL were included in the further analyses. Ultrametric trees were estimated with BEAST v1.8.3 [48] using a Yule speciation prior and an uncorrelated lognormal relaxed clock. The best-fitting substitution models were selected under the Bayesian Information Criteria (BIC), as was implemented in jModelTest 2.1.7 [49]. Each of the four datasets was analyzed for 200 million iterations with the first 10% discarded as burn-in. Posterior probabilities (PP) were estimated under a sampling frequency of every 10,000 steps. Tracer v.1.6 (http://tree.bio.ed.ac.uk/soft ware/tracer/) was used to determine when the analyses became stable and to check whether the effective sample size (ESS) values were greater than 200, as recommended by Drummond et al. (2007). The consensus trees obtained before the Markov chain reached stable and convergent likelihood values were discarded as burn-in with TreeAnnotator v.1.7. The resulting ultrametric trees were used for both single-threshold GMYC (sGMYC) [22] and multiple-threshold GMYC (mGMYC) [23] analyses using the Splits [50] and Ape [51] libraries.
The coalescent clustering-based method (bPTP) was performed using the online server (http://species.h-it s.org/) and the Bayesian Inference trees from MrBayes 3.2 [52]. We ran bPTP analyses for 500,000 MCMC generations with a thinning of 500 and a burn-in of 0.1. Convergence of the MCMC chain was assessed as recommended by Zhang et al. (2013). Outgroups were pruned before conducting bPTP analyses to avoid bias that may arise if some of the outgroup taxa were too distantly related to the ingroup taxa [24].

Comparison morphospecies and MOTU of species delimitation method outputs
All NJ-K2P trees of unique COI-5P haplotypes were performed using MEGA v7.0 [53]. The results of the species delimitation methods were summarized on the NJ-K2P trees with a midpoint root. Four different taxonomic scenarios between morphospecies (equated with BIN-species for unidentified specimens) and MOTU clustering methods outputs can be distinguished: (i) 'MATCH' , whereby the members of a species were placed in one MOTU that had no other members; (ii) 'MERGE' , whereby the members of a species were placed in one MOTU together with members from another species; (iii) 'SPLIT' , whereby the members of a species were assigned to more than one MOTU that had no other specimens from another species; and (iv) 'MIX-TURE' , whereby each species show a more complex partition involving both 'MERGE' and 'SPLIT'. To further compare the results of different species delimitation methods, we also employed the adjusted Wallace coefficients analysis [54] to quantify MOTUs agreement with Linnaean species labels or among MOTUs from different species delimitation methods through the website Comparing Partitions (http://www.comparingpartitions. info/) [55]. Here, we excluded singletons and only discussed species or MOTUs that are represented by more than one specimen. Finally, MOTUs were defined considering only the clades represent groups of barcodes recovered in at least four of the seven species delimitation methods [20].

Results
DNA was extracted from 2576 Chinese katydid specimens, of which 2131 specimens (82.73%) were successfully sequenced for COI-5P barcode. All records were removed that less than 600 bp, contained contaminants, had stop codons, flagged as misidentifications or errors. In summary, we generated 2131 original COI-5P sequences from 131 morphospecies, including 528 specimens that were identified to genus level and one specimen that was placed at the subfamily level. The remaining unidentified lineages were represented using BINs, because they were either unable to be reliably identified based on the available reference materials or they were still undescribed. Additional 34 published COI-5P barcode sequences (Additional file 1: Table  S1) were retrieved from GenBank in the BOLD system and included for further analyses. The entire dataset containing 2165 Chinese katydids COI-5P barcode sequences comprised 1225 distinct haplotypes, and represented 60 genera, 9 subfamilies of the family Tettigoniidae Krauss, 1902, including Bradyporinae (n = 1), Conocephalinae (n = 490), Hexacentrinae (n = 102), Lipotactinae (n = 4), Meconematinae (n = 376), Mecopodinae (n = 53), Phaneropterinae (n = 861), Pseudophyllinae (n = 79), and Tettigoniinae (n = 199). Nearly a third of the morphospecies (39/131 = 29.77%) included 10 or more specimens. The number of barcode sequences per species varied from 1 up to 142 in the commonly occurring Ducetia japonica, dispersed throughout China.

Intra-and interspecific genetic divergences
The intra-and interspecific genetic divergences within different taxonomic ranks are detailed in Table 1. The intraspecific divergence for 109 morphospecies represented by more than one specimen averaged 1.54% (ranging from 0 to 27.45%). However, the intraspecific divergence for 77 unidentified BIN-species represented by more than one specimen averaged 0.39% (ranging from 0 to 2.81%). The identified morphospecies, which were assigned to more than one BIN, were the major cause of higher intraspecificity. The mean interspecific divergence for identified morphospecies and unidentified BIN-species at genus level were 15.35% (ranging from 0 to 28.74%) and 12.86% (ranging from 1.07 to 28.07%), respectively. The mean interspecific divergence for identified morphospecies and unidentified BIN-species at family level were 22.29% (ranging from 2.16 to 32.88%) and 21.67% (ranging from 3.27 to 32.78%), respectively. The normalized mean intraspecific and minimum interspecific distance were 1.40 ± 0.02 and 0% for 109 identified morphospecies, in contrast to 0.45 ± 0.01 and 1.07% for 77 unidentified BIN-species, respectively ( Table 2).

Monophyletic morphospecies or BIN-species recovered by NJ-K2P trees
The NJ-K2P trees based on COI-5P haplotypes are shown in Additional files 3, 4, 5 and 6. The BIN 2.2% seed threshold was calibrated against morphological species using a selected groups of taxa: bees, butterflies and moths, fish, and birds [20]. The DBCHL dataset included 596 COI-5P barcode sequences, and was assigned to 77 BINs, including 13 singleton BINs, 61 concordant BINs, and 3 discordant BINs. NJ analysis with 390 COI-5P haplotypes sequences showed that all BIN-species represented by more than one specimen formed a monophyletic clade, except for BOLD:ACH8981, ACN8107 and ADE4977 (Additional file 3). The 579 sequences represented 40 identified morphospecies, in which 39 species included more than one specimen. 30 identified morphospecies formed monophyletic clusters. In addition, three specimens of Conanalus brevicaudus shared a unique COI-5P haplotype. The members of Ruspolia dubia, R. jezoensis, and R. liangshanensis were grouped jointly and formed a larger monophyletic clade with a low divergence (Additional file 3). The BOLD:ADE4977 includes a triad of species, including all members of Ruspolia jezoensis (n = 10), R. liangshanensis (n = 5) and most The within-species distribution is normalized to reduce bias in sampling at the species level   BOLD:ADB9302, and the members of clade D2 with high divergences were assigned to two BINs (BOLD:ADB9301 and ADB9303). Two specimens identified as Conocephalus japonicus is nested within the C. longlpennis clade (Additional file 3). The widely distributed morphospecies Ruspolia lineosa was monophyletic but contained two deeply subclusters. Almost all species delimitation analyses suggested that R. lineosa split into two putative species, R. lineosa BOLD:ACD5256 and ACD5257. Only ABGD suggested that R. lineosa to be a distinct species. Both sGMYC and mGMYC subsplit R. lineosa BOL-D:ACD5257 into two parts. The DBMEC dataset included 376 COI-5P barcode sequences and was assigned to 56 BINs, which included 24 singleton BINs, 28 concordant BINs, and 4 discordant BINs. NJ analysis with 158 COI-5P haplotypes sequences showed that all BIN-species with more than one sequence formed a monophyletic clade (Additional file 4). The 341 sequences represented 40 identified morphospecies, in which 29 species included more than one specimen. 22 identified morphospecies revealed nonoverlapping monophyletic clusters. The remaining 7 morphospecies were not monophyletic. Euxiphidiopsis capricercus was split into two reciprocally monophyletic clusters (Additional file 4) and corresponded to two BINs (BOLD:ADB5001 and ADE2467). Both Xiphidiopsis gurneyi and X. autumnalis were split into two reciprocally monophyletic clusters, and four clusters were grouped jointly as a separate clade (Additional file 4). The singleton species identified as Xiphidiopsis maculatus was embedded into the Xizicus spathulatus clade (Additional file 4). The members of Xizicus howardi were recovered in three reciprocally monophyletic clusters (Additional file 4) and corresponded to three BINs (BOLD:ACD5539, ADB5688 and ADE3141). Xizicus concavilaminus and X. kulingensis were grouped jointly (Additional file 4), and shared a single BIN (BOLD:ADB3332). Xizicus tinkhami and X. laminatus were grouped jointly (Additional file 4) and shared a single BIN (BOLD:ADB5868). Xiphidiopsis bituberculata and X. minorincisus were grouped jointly (Additional file 4) and shared a single BIN (BOL-D:ADB3697). Due to their small size, the active dispersal abilities of Meconematinae katydids were highly limited.
The DBPPM dataset included 993 COI-5P barcode sequences and was assigned to 181 BINs, which included 76 singleton BINs, and 105 concordant BINs. Only one barcode (Ducetia japonica RBTC523-16) without BIN records corresponded to sequences that did not fulfill the barcode compliance standards. NJ analysis with 530 COI-5P haplotypes sequences showed that all BIN-species with more than one sequence formed a monophyletic clade (Additional file 5). The 551 sequences represented 43 identified morphospecies, in which 33 species included more than one specimen. Twenty-seven identified morphospecies were revealed in nonoverlapping monophyletic clusters. Ruidocollaris truncatolobata was split into two relatively distant clusters (Additional file 5) and corresponded to two BINs (BOLD:ACD6433 and ACD7529). Record count, the number of COI-5P sequences after quality control, * Alignment: MUSCLE (Edgar, 2004). Filters applied: ≥ 600 bp only, exclude records contained contaminants, had stop codons, flagged as misidentifications or errors; Deletion method: Pairwise deletion; One COI-5P sequence without BIN records in DBPPM dataset corresponded that sequences did not fulfill with barcode compliance standards. All bPTP results were from Bayesian MCMC analyses. Results of GMYC and bPTP analyses were from genealogies based on COI-5P haplotype sequences  Parapsyra midcarina BOLD:ADB5037 (9)

Number of barcodes included in each BIN was given in brackets
One specimen that was identified as Mecopoda sp. is nested within the clade of M. niponensis (Additional file 5). Sinochlora szechwanensis was split into two relatively distant clades (Additional file 5). Clade C1 was formed exclusively by specimens from Yuexi, Anhui and was closely related to the species Sinochlora longifissa. Clade C2 was formed by the remaining specimens, and was closely related to Sinochlora sp2 (BOLD:ADB3463). Two specimens that were identified as Phyllomimus sp14 (BOL-D:ADB3808) and Phyllomimus sp15 (BOLD:ADB6425) are nested within the P. sinicus clade (Additional file 5). The DBTB dataset included 200 COI-5P barcode sequences, and was assigned to 32 BINs, which included 12 singleton BINs, 19 concordant BINs, and 1 discordant BINs. NJ analysis with 147 COI-5P haplotypes sequences showed that all BIN-species with more than one sequence formed a monophyletic clade, except for BOLD: ADA6837 and ADB3445 (Additional file 6). A total of 8 identified morphospecies were represented by 164 sequences, including 7 species that were represented by more than one specimen. The NJ analysis based on K2P distances revealed nonoverlapping clusters for 4 identified morphospecies, Chizuella bonneti, Gampsocleis carinata, G. gratiosa, and Tettigonia chinensis. In contrast, Gampsocleis sedakovii, G. sinensis, and G. ussuriensis were grouped jointly. The remaining 36 sequences were provisionally assigned into 19 putative species based on BINs.

Concordance among MOTUs from clustering-based species delimitation methods
Both sGMYC and mGMYC coalescence-based clustering of the specimens were partitioned in the data far more than in all of the other methods ( Table 4). The mGMYC analysis was by far the most sensitive of the methods compared, inferring a total of 397 entities, which was slightly higher than sGMYC (n = 382). For the DBCHL dataset, the sGMYC analysis identified 87 ML entities (95% confidence interval = 77-95): 71 ML clusters (95% confidence interval = 63-76) and 16 singletons. The mGMYC analysis identified 5 independent switches between speciation and coalescent processes, resulting in 94 ML entities (95% confidence interval = 85-105): 73  Previous studies have found that in some cases, GMYC could lead to an overestimation of the number of species [57,58]. GMYC requires prior construction of a ultrametric tree, which does not necessarily reflect the real divergence between species [30]. Alternatively, PTP estimates branching processes using the expected number of substitutions (vs. time in GMYC) and thus utilizes a nonultrametric phylogenetic tree as input [17]. Moreover, in contrast to GMYC, bPTP appeared less sensitive to the sampling regime [17].

Bidirectional concordance among species delimitation methods with adjusted Wallace coefficients
The adjusted Wallace coefficients were used to compare the bidirectional concordance among species delimitation methods for the identified katydid specimens dataset (Table 8). There were markedly directional results in discriminatory power between the molecular species delimitation methods and morphospecies or across molecular species delimitation methods. For example, the adjusted Wallace coefficient value (0.954) from BIN to morphology meant two specimens within a BIN had a 95.4% chance of belonging to the same morphospecies. In contrast, two specimens within a morphospecies had only a 70.9% chance of belonging to the same BIN. Overall, molecular species delimitation methods had a strong explanatory ability (0.872-0.969) for morphospecies; in contrast, the morphospecies had a generally low explanatory ability (0.391-0.995) for molecular species delimitation results. Both sGMYC and mGMYC were less concordant with morphospecies than other molecular species delimitation results. GMYC inferred a substantially unrealistically high number of katydid MOTUs ( Table 4). The morphospecies was best able to explain the results of both ABGD (0.955) and bPTP (0.940). They generally exhibited a more modest ability to explain the molecular results in comparison with all other methods (0.996-1 for ABGD and 0.990-0.998 for bPTP to be explained by all other molecular species delimitation results). The differences among the remaining methods (e.g., BIN, jMOTU, and RESL) in their concordance to the current taxonomy were modest.

Discussion
Species was defined as lineages that evolve separately from each other [59]. Determining the species boundaries is one of the central debates in biology [18]. DNA barcoding was widely used for species identification and/ or delimitation. Recent research on Central European Orthoptera found that ninety-three of these 122 species (76.2%, including all Ensifera) could be reliably identified using DNA barcodes [37]. In European diving beetles, 36% of multiply sampled species were nonmonophyletic [60]. Our study provides barcode data for 131 identified morphospecies and 148 unidentified BIN-species of Chinese katydids. There was a perfect correspondence between BIN membership and morphospecies in 83 cases, while another 34 species split into more than one BIN. more than one species merges as one BIN or in a combination of merges and splits. The maximum intraspecific distance is less than 3% in 74 of the 109 identified morphospecies (67.89%) represented by multiple individuals. Maximum intraspecific distance is less than 3% in all unidentified BIN-species. Our results revealed a much higher diversity in Chinese katydids than the current taxonomy suggests. There are more katydid species to be described and cryptic lineages within currently recognized species.

COI-5P barcode and BIN sharing
The causes for barcode and BIN sharing in closely related species include imperfect taxonomy [32], nonfunctional nuclear-encoded mitochondrial pseudogenes (Numts), hybridization, and incomplete lineage sorting [37,61,62]. Previous studies have found evidence for frequent hybridization across orthopteran closely related species, such as the genus Chorthippus [63], Aglaothorax [64], Tetrix [65]. COI-5P barcodes sharing was found in two cases, Gampsocleis sedakovii (GHF077_16) vs. G. sinensis (GHF074_16), as well as G. sinensis (RBTC480_16, RBTC1222_16, RBTC1209_16, RBTC1193_16) vs. G. ussuriensis (GHF028_16). The barcodes of G. sedakovii, G. sinensis and G. ussuriensis pooled into one discordance BIN (BOLD:AAY1322), but was not supported by our other analyses. The morphological high similarity between G. sinensis and G. ussuriensis. Another possible reason is that G. ussuriensis might in fact synonymised to G. sinensis. Meanwhile, G. sedakovii and G. ussuriensis occur in sympatry over large parts of their distribution ranges. Hybridization in sympatry has resulted the transfer barcodes from G. sedakovii to G. sinensis and/or G. ussuriensis, causing COI-5P barcode and BIN sharing. Hausmann et al. (2013) suggested that cases of BIN sharing among allopatric, slightly divergent genetic clusters represent recently separated lineages that have recently speciated or are still undergoing genetic differentiation [28]. The bPTP analysis indicated four Ruspolia species, R. dubia, R. jezoensis, R. liangshanensis, and R. yunnana, pooled into one MOTU. Meanwhile, both RESL and jMOTU analysis indicated R. dubia, R. jezoensis, and R. liangshanensis pooled into one MOTU. Consistent results has also been previously observed in which with regard to R. jezoensis synonymised to R. dubia, and R. liangshanensis may be recently separated from R. dubia [36]. Conocephalus japonicus is nested within the C. longipennis cluster. These species formed very recently indeed, and young species (incomplete lineage sorting) remain within its sister species' coalescent lead to BIN sharing.
In the last few years, some locally distributed Xizicus species have been described. Our analyses found X. concavilaminus and X. kulingensis pooled into one MOTU (BOLD:ADB3332). X. laminatus and X. tinkhami pooled into one MOTU (BOLD:ADB5868) except for GMYC analyses. Xizicus rehni and X. howardi share BOL-D:ACD5539, but still exhibit subclusters that separate species at a very low distance. Three discordance BINs (BOLD: ACD5539, ADB3332, and ADB5868) between Xizicus species were supported by most analyses. This phenomenon might reflect their relatively recent split or the current taxonomy of Xizicus too detailed.

Morphospecies split into more than one BIN
Our results demonstrate the existence of more than one separate lineage in several katydids with wide geographic distribution range. Some species split into more than one BIN may referred to sister clusters on the barcode trees may representing true potential cryptic diversity [31]. Many species with wide geographic distribution range were placed in either a single BIN or a few, but Conocephalus maculatus was outliers, being assigned to 11 BINs. 10 of 11 BINs in C. maculatus represented by two or more specimens. Four BINs of C. maculatus co-occur at different sites across China, such as BOL-D:ACN8107 from Hainan, Yunnan, Guangxi; BOL-D:ADB6356 from Xizang, Yunnan; BOLD:ACD2116 from Zhejiang, Jiangxi; BOLD:ADB6002 from Xizang, Yunnan. C. maculatus is one of the most widespread species of genus Conocephalus, and exhibit one monophyletic cluster. C. maculatus is very likely to represent a species complex. Previous research found specimens of one Canadian spiders Tetragnatha versicolor were assigned to 20 BINs [9]. The maximum intraspecific distances possessed up to 18.85% in Conocephalus bidentatus. All barcodes of C. bidentatus exhibit one monophyletic cluster, and clearly distinct subclusters reflected by three BINs. Three BINs within C. bidentatus reflect geographic clustering with BOLD:ADB6577 from Fujiang and Zhejiang, BOL-D:ADB9596 from Sichuang and BOLD:ADC0531 from Guizhou. Our analyses support C. bidentatus split into three MOTUs except for ABGD and mGMYC treating ADB6577 and ADB9596 as one MOTU.
Xiphidiopsis autumnalis and X. gurneyi form one monophyletic cluster. The two BINs within X. autumnalis reflect geographic clustering with ADE1666 from Hainan, ADE1667 from Guangxi. Meanwhile, the four BINs (BOLD:ADB7052, ADE1668, ADE1669, and ADE1670) within X. gurneyi reflect different sampled localities. Xizicus howardi split into three BINs: ACD5539 from Guangxi, Henan, Hubei, and Zhejiang, ACD5688 from Zhejiang, and ADE3141 from Zhejiang, presumably with a species status. Two or more species are cryptic if they are morphologically similar, biologically distinct, and misclassified as a single species [66]. Cryptic species complexes, in which the component taxa have not diverged morphologically too much, are very difficult to identify, and their discovery is frequently a matter of chance [67]. This phenomenon may suggest possible cryptic species or there are more species to be described.
Conflicts among different species delimitation approaches are very common. Both Gampsocleis carinata and G. gratiosa were split into two MOTUs except for ABGD analysis. The two BINs (BOLD:ADA6038 and ADA5568) of Gampsocleis carinata were predominantly matched to most other clustering methods, which potentially implies detectable intraspecific diversity within Gampsocleis carinata and G. gratiosa, or the probable existence of more than one species. The species Tettigonia chinensis was supported by three clustering methods (ABGD, mGMYC and bPTP). Meanwhile, it was split into two BINs (BOLD:ACD6622 and ACD6623) and was supported by both RESL and jMOTU results.
DNA-based species delimitation may be compromised by limited sampling effort and species rarity, including "singleton" representatives of species, which hampers estimates of intra-versus interspecies evolutionary processes [68]. A broader intraspecific sampling is a critical step for increasing the success of species identification, and a special effort was made to achieve this aim [69]. Previous studies demonstrated broader geographical sampling decreases the barcoding gap between species and hence reduces the accuracy of DNA barcoding [60]. Ducetia japonica have been found are distributed over a huge area extending from Pakistan in the West to the Solomon Islands in the East and from Northern China in the North to northern Australia in the South [70]. We have a broader sampling of Ducetia japonica distribution in China. Our results demonstrate the existence of three separate lineages (BOLD:ACD7324, ADB6191, and ACE7214) in D. japonica. The different song types indicated clearly that D. japonica as presently understood is not a homogeneous, extremely widespread species, but a complex of several distinct species [70].

Conclusions
No universal barcode gap was observed in our four data sets. There are moderately variable results from different delimitation methods. Our research supported the contention of Ortiz and Francke (2016) contention that combining evidence from multiple delimitation methods obtains better-supported results [18]. To diminish the probability of species under-or overestimation solutions, we determined separate MOTUs that were recovered in at least four of the seven species delimitation analyses. Excluding singletons (22 identified species, 125 BINs), we recognized 62 robustly supported identified morphospecies, and 166 BIN-species. The GMYC model exhibited a characteristic "overestimating" solutions.
If most MOTU splits detected in this study reflect cryptic/undescribed taxa, the true species count for Chinese katydids could be a large proportion higher than currently recognized. Moreover, only less than 20% species (50 of 279) were represented by not less than 10 specimens, and expanded sample sizes might reveal more barcode splits. Here, we refrain from taxonomic descriptions, as this requires a thorough morphological and taxonomic study for each putative taxon. It is also important to note that there could be noise in our results, potentially due to considerable unidentified specimens. Nevertheless, our results support COI-5P efficacy for rapid delimitation of katydid species and for indicating likely cryptic/undescribed species for further exploration.

Additional files
Additional file 1: Table S1. Additional 34 COI-5P sequences that were mined from GenBank. (XLSX 23 kb) Additional file 2: Table S2. The distance within-species and to its nearest neighbor (NN). (XLSX 21 kb) Additional file 3: Comparison of the species delimitation results of Chinese katydids based on an analysis of 390 unique COI-5P haplotypes of the DBCHL dataset. A midpoint-rooted NJ-K2P tree was implemented in MEGA 7.0. Terminals were labeled with Sequence/Process ID, Species identifications, plus BIN. * indicated a haplotype representing more than one specimen. ** indicated a haplotype shared by more than one species. On the right: summary of putative species delimitation drawn by BINs, RESL, jMOTU, ABGD, sGMYC, mGMYC and bPTP (one column per method). Black codes represented putative MOTUs defined by at least four of the seven species delimitation methods. Grey codes represented MOTUs defined by less than four of the seven species delimitation methods. Other color codes for each column represented clustering together as a single MOTU. (TIF 8640 kb) Additional file 4: Comparison of the species delimitation results of Chinese katydids based on an analysis of 158 unique COI-5P haplotypes of the DBMEC dataset. A midpoint-rooted NJ-K2P tree was implemented in MEGA 7.0. Terminals were labeled with Sequence/Process ID, Species identifications, plus BIN. * indicated a haplotype representing more than one specimen. ** indicated a haplotype shared by more than one species. On the right: summary of putative species delimitation drawn by BINs, RESL, jMOTU, ABGD, sGMYC, mGMYC and bPTP (one column per method). Black codes represented putative MOTUs defined by at least four of the seven species delimitation methods. Grey codes represented MOTUs defined by less than four of the seven species delimitation methods. Other color codes for each column represented clustering together as a single MOTU. (TIF 3270 kb) Additional file 5: Comparison of the species delimitation results of Chinese katydids based on an analysis of 530 unique COI-5P haplotypes of the DBPPM dataset. A midpoint-rooted NJ-K2P tree was implemented in MEGA 7.0. Terminals were labeled with Sequence/Process ID, Species identifications, plus BIN. * indicated a haplotype representing more than one specimen. ** indicated a haplotype shared by more than one species. On the right: summary of putative species delimitation drawn by BINs, RESL, jMOTU, ABGD, sGMYC, mGMYC and bPTP (one column per method). Black codes represented putative MOTUs defined by at least four of the seven species delimitation methods. Grey codes represented MOTUs defined by less than four of the seven species delimitation methods. Other color codes for each column represented clustering together as a single MOTU (TIF 1150 kb) Additional file 6: Comparison of the species delimitation results of Chinese katydids based on an analysis of 147 unique COI-5P haplotypes of the DBTB dataset. A midpoint-rooted NJ-K2P tree was implemented in MEGA 7.0. Terminals were labeled with Sequence/Process ID, Species identifications, plus BIN. * indicated a haplotype representing more than one specimen. ** indicated a haplotype shared by more than one species. On the right: summary of putative species delimitation drawn by BINs, RESL, jMOTU, ABGD, sGMYC, mGMYC and bPTP (one column per method). Black codes represented putative MOTUs defined by at least four of the seven species delimitation methods. Grey codes represented MOTUs defined by less than four of the seven species delimitation methods. Other color codes for each column represented clustering together as a single MOTU. (TIF 3020 kb)