Sequence analysis of mitochondrial ND1 gene can reveal the genetic structure and origin of Bactrocera dorsalis s.s.

Background The oriental fruit fly, Bactrocera dorsalis s.s., is one of the most important quarantine pests in many countries, including China. Although the oriental fruit fly has been investigated extensively, its origins and genetic structure remain disputed. In this study, the NADH dehydrogenase subunit 1 (ND1) gene was used as a genetic marker to examine the genetic diversity, population structure, and gene flow of B. dorsalis s.s. throughout its range in China and southeast Asia. Results Haplotype networks and phylogenetic analysis indicated two distinguishable lineages of the fly population but provided no strong support for geographical subdivision in B. philippinensis. Demographic analysis revealed rapid expansion of B. dorsalis s.s. populations in China and Southeast Asia in the recent years. The greatest amount of genetic diversity was observed in Manila, Pattaya, and Bangkok, and asymmetric migration patterns were observed in different parts of China. The data collected here further show that B. dorsalis s.s. in Yunnan, Guangdong, and Fujian Provinces, and in Taiwan might have different origins within southeast Asia. Conclusions Using the mitochondrial ND1 gene, the results of the present study showed B. dorsalis s.s. from different parts of China to have different genetic structures and origins. B. dorsalis s.s. in China and southeast Asia was found to have experienced rapid expansion in recent years. Data further support the existence of two distinguishable lineages of B. dorsalis s.s. in China and indicate genetic diversity and gene flow from multiple origins. The sequences in this paper have been deposited in GenBank/NCBI under accession numbers KC413034–KC413367.


Background
The Bactrocera dorsalis (Diptera: Tephritidae) is a major horticultural pest. More than 70 species have been identified within this group [1]. Some of the B. dorsalis species are highly destructive because of their wide host-ranges, considerable ecological adaptability, pronounced reproductive potential, and dispersal capacity. The B. dorsalis s.s., B. papayae, and B. philippinensis species have become invasive in many parts of China. Despite previous studies that have described the subtle morphological characteristics distinguishing these three species [1][2][3], taxonomists still doubt whether B. dorsalis s.s., B. papayae, and B. philippinensis are truly separate species [4,5].
B. dorsalis s.s. was first recorded in Taiwan in 1912 and is now widely distributed in most countries in the Asia-Pacific region [6]. Specifically, it has dispersed from China to the northern parts of the Indian subcontinent over the past 90 years [1]. B. dorsalis s.s. has been observed multiple times in China during the past hundred years [7][8][9]. Currently, B. dorsalis s.s. is distributed mostly in China's southern provinces [10].
Mitochondrial DNA analyses have been conducted to study the genetic structure and origin of B. dorsalis s.s. populations in China and Southeast Asia [11][12][13][14][15]. However, the results of these studies have been inconsistent, although it is not known whether these inconsistencies should be attributed to differences in sampling strategies and approaches to analysis. According to Schutze et al., B. dorsalis s.s., B. papayae, and B. philippinensis should be considered a single species [5]. If this is correct, then analysis of populations sampled from both the ranges of B. philippinensis and the B. dorsalis s.s. /B. papayae transition zone (e.g., Pattaya and Bangkok, in southern Thailand) should provide clues regarding the invasion events in the region [4]. Although mitochondrial DNA markers have been used to examine the changes in the structures of Chinese populations of B. dorsalis s.s. over time, the NADH dehydrogenase subunit 1 (ND1) markers in B. dorsalis s.s. have not yet been used for an evaluation of the distribution area of B. philippinensis or the B. dorsalis s.s./B. papayae transition zone [15]. In the present investigation, ND1 was used as a genetic marker to study the genetic structure and origin of the B. dorsalis s.s. population within China and the surrounding areas. Mitochondrial DNA is suitable for analyses of population genetics because of its relatively high rate of genetic evolution [16][17][18][19][20] Figure 1) were investigated. Analysis of334 ND1 sequences from these 19 samples revealed a total of 203 haplotypes, of which 45 were shared and the remaining 158 were single haplotypes. By performing median-joining network analysis, two distinct networks were identified among these 203 haplotypes ( Figure 2). The results of the phylogenetic analysis (see Additional file 1: Figure S1) and network analysis clearly showed two distinct lineages, or cryptic species of B. dorsalis s.s., in these samples.
The lineage I network exhibited seven haplotypes (H55, H87, H90, H96, H97, H184, and H186) and the lineage II network showed a star-like pattern with six predominant haplotypes (H1, H3, H14, H15, H28, and H33) ( Figure 2). The remaining haplotypes identified from the samples constituted single population groups. The lineage II network mainly included the Pattaya and Panzhihua populations ( Figure 2; see Additional file 2: Table S2). The smaller haplotypes of lineage II network, although connected to the central haplotypes, were found frequently ( Figure 2; see Additional file 3: Table S2). These results suggest that most haplotypes have a wide geographic distribution throughout China and the surrounding areas. Rare haplotypes were mainly distributed in Pattaya and Panzhihua, showing strict geographic distribution characteristics. These findings suggest that the Pattaya and Panzhihua are mix zones of two distinct lineages, which are here treated as putative species.

Limited genetic differentiation in geographical populations
To understand the population structure of B. dorsalis s.s., a spatial analysis of molecular variance (SAMOVA) of the ND1 sequences was performed. The F CT values in the SAMOVA analysis of the lineage II network suggested the existence of three population groups (F CT highest for K = 3). The 19 populations clustered as follows: Guangzhou, Zhuhai, Nanning, Pingxiang, Yuxi, Hekou, Yuzhong, Guiyang, Haikou, Wenchang, Taibei, Bangkok, and Phou Phanang as Subgroup 1; Fuzhou, Zhangzhou, and Xiamen as Subgroup 2; and Manila alone as Subgroup 3. The group comprising most of mainland China, Hainan island, Taiwan island, and southeastern Asia (including mix zones) was found to be the largest subgroup among all the samples (see Additional file 4: Figure S2).
The AMOVA analysis showed inter-subgroup genetic differentiation to account for 12.31% of all differentiation. The genetic differentiation within populations accounted for 86.38%. The analysis also showed genetic differentiation among the three subgroups to be very limited (1.31%) ( Table 1). Differentiation among groups (Fct) and within populations (Fst) was not found to be significant. However, differentiation between groups within the same population (Fsc) was (P < 0.01) ( Table 1). Mantel testing indicated significant association between the pairwise coefficient of genetic differentiation (Fst) and the pairwise geographical distance (r = 0.400; P = 0.003) within lineage II (see Additional file 2: Table S1).  Table S1).
Results obtained from genetic diversity analysis among the 19 geographic populations are summarized in Table 2. The number of haplotypes per population (Hp) ranged from 6 to 25. Haplotype diversity (Hd) ranged from 0.833 to 1.000, and nucleotide diversity (π) ranged from 0.00781 to 0.02586. These differences suggested that all populations retained fairly high levels of genetic diversity. The Manila population showed the most nucleotide diversity (π), followed by the Pattaya population.

High migration rates in specific southeast Asian populations
The migration rate of lineage II was estimated in both directions using MIGRATE. Results are shown in Additional file 5: Table S3. The migration rate ranged from 11.9 (from Pingxiang to Guiyang) to 917.1 (from Hekou to Phou). Specific southeast Asian populations (Bangkok, Pattaya, Phou, and Manila) showed very high migration rates. Asymmetric migration was observed between Manila and Pattaya. A similar situation was observed in populations located from Bangkok to China (Guangzhou, Hekou, Fuzhou, Zhangzhou, and Taibei), from Pattaya to China (Guangzhou, Nanning, Hekou, Yuzhong, Xiamen), from Phou to China (Guangzhou, ZhuHai, Hekou, Yuzhong, Fuzhou, Taibei), and from Manila to China (Nanning, Fuzhou, Zhangzhou). The direction of migration between two sites can be inferred using these asymmetric migration patterns ( Figure 3).

Marked historical population expansion
The sequence variation of ND1 genes was used to perform Tajima's D test to determine significance of deviations from neutrality. A negative D value is produced when more than an expected number of polymorphic sites have low frequencies within a sample. This pattern can be explained by either a recent increase in the size of the population or recent selection [21]. Results indicate negative values (D = −1.08524, P = 0.17095; Table 3) for lineage II, but these were not found to be statistically significant (P > 0.1). This is consistent with a neutral    Table 3). The sudden expansion model was rejected for lineage I (P < 0.05). The ratio between estimated effective population size after expansion (θ 0 ) and before expansion (θ 1 ), which can serve as an estimate of the extent of population growth, was 6589× for lineage I and 195,319× for lineage II.

Molecular markers and sampling strategy
Molecular markers are useful tools for genetic analysis. They can be used to elucidate the mechanisms underlying biological invasion. They are applicable to the identification of cryptic invasions, reconstruction of invasion history, and assessment of the genetic structure of a population [22]. Mitochondrial genes such as the cytochrome oxidase genes (e.g., COI and COII) have served as molecular makers in studies of the population genetics of invasive insects [23][24][25][26]. Although the use of the NADH dehydrogenase gene in such studies is relatively rare, the results of the present study suggest that the NADH dehydrogenase gene is very useful because it shows greater nucleotide substitution rates and more variation than the cytochrome oxidase gene [17][18][19][20]27]. Because of this higher rate of variation, the NADH dehydrogenase gene provided a higher resolution in deciphering the geographical population genetics of the recent invasion. NADH dehydrogenase genes have been successfully applied to phylogenetic and population genetic studies of insects, both as an alternative and as a complement to cytochrome oxidase genes [28][29][30]. Recent investigations   [2]. However, integrative taxonomy has shown that they are actually a single species [4,5]. For this reason, a larger number of samples were used than in previous studies, and these samples were taken from sites across a broader geographical range (e.g. B. philippinensis distribution area and B. dorsalis s.s./B. papayae transition zone).

Genetic structure, demographic history, and identification of cryptic invasions
Evidence from the ND1 gene indicated the existence of two distinct lineages that are sympatric within some parts of Pattaya and Panzhihua (Figure 2; see Additional file 1: Figure S1). Of the 334 individuals, 21 were found to belong to the lineage I clade and 313 to lineage II. This disequilibrium can be explained by the existence of two putative species. The first dispersed across a wide geographic area, and the second underwent genetic hitchhiking using external factors (such as infection of maternally inherited microorganisms, such as Wolbachia). The role of Wolbachia in population differentiation and gene flow is known [35]. Previous studies have also found similar phenomena in other species, including Dipteran Drosophila melanogaster [36], Drosophila simulans [37], Rhagoletis cerasi [38], mosquitoes [39], Hemipteran Bemisia tabaci [40,41], and Lepidopteran Danaus chrysippus (L.) [42,43]. These phenomena suggest that infection of maternally inherited microorganisms may influence random mating among specific taxa, leading to divergence [39,42,43]. If Wolbachia had a role in population structure observed in the current study has yet to be examined. In lineage II, a weak genetic structure was observed due to low differentiation among three groups (Fct value was not significant; see Table 1). This is consistent with the results by Wan et al. [15]. Median-joining networks showed that each clade contains both widespread and localized haplotypes (Figure 2). These patterns suggest recent increases in population size and geographical range. Mismatch distribution analysis was conducted to evaluate the history of species expansion [44]. In order to confirm whether or not the B. dorsalis s.s population had experienced a large-scale population expansion, the ARLEQUIN 3.5 software package was used to analyze mismatch distribution between the haplotypes' base variance. The results showed significant differences between observed and expected values of the haplotypes' base variance along with distinct unimodality distribution patterns of variation (Figure 4). The estimated effective population size after expansion (θ 1 ) was 190,000 times higher than before expansion (θ 0 ) ( Table 3). It was here inferred that the B. dorsalis s.s populations of lineage II has experienced rapid expansion in recent years. The average number of nucleotide differences is significantly affected by a population bottleneck, but the number of segregation sites is not [21]. As indicated by the average number of differences in nucleotide sequence (k) ( Table 2), Chinese B. dorsalis s.s populations showed stronger indications of bottleneck and founder events than the southeast Asian populations (Manila, Pattaya, and Bangkok).
Previous taxonomic analyses of the B. dorsalis complex were based on geographical subdivisions due to the difficulties of discrimination using morphological traits [2,45]. According to Drew and Hancock, B. philippinensis and B. occipitalis but not B. dorsalis s.s are distributed in the Philippines [2]. However, in the present study, haplotype networks and phylogenetic evidence indicated no strong support for geographical subdivision within the B. philippinensis population. In this population, three B. philippinensis individuals and the entire B. dorsalis s.s. population (Guiyang and Xiamen) were found to belong to the same clade. However, five B. philippinensis individuals deviated from the B. dorsalis s.s. populations (see Additional file 1: Figure S1). Mantel testing indicated that genetic differentiation was significantly influenced by geographical factors (r = 0.400; P = 0.003). Geographic distance is probably responsible for this partitioning of genetic variation in the Philippines.

Invasion history
Nucleotide diversity is the ideal index used to evaluate the degree of variation of nucleotide sites between populations [46]. Dependence can be measured using sample size and sequence length, but genetic diversity can be measured using nucleotide diversity. Nucleotide diversity is the average number of differences in the nucleotide sequence among individuals of a given population, compared pairwise. Generally, ancestors show significantly more genetic diversity than derivative populations because of the founder effect [47,48]. According to this principle, it was here inferred that the populations of B. dorsalis s.s. from China might have originated in southeast Asia (Manila, Pattaya, and Bangkok) ( Table 2). The MIGRATE analysis showed that gene flow occurred from Manila to Bangkok to Phou to China. However, it occurred in an asymmetric manner. The asymmetric migration of gene flow probably indicates colonization events in China that may have involved multiple sources and sites of invasion. Phylogenetic analysis provided another line of evidence for this: numerous southeast Asian individuals belonged to the same clade as Chinese populations (see Additional file 1: Figure S1). This conclusion is supported by studies showing that southeast Asia might be the region of origin of B. dorsalis s.s [11,49]. As in previous studies, the maximum genetic diversity was observed in southeast Asia and Yunnan Province [11,33,49]. However, symmetric migration patterns were observed at a number of sites, distributed mainly in Yunnan, Guangdong, Fujian, and Taiwan. It is likely that these places may have been the first to be invaded. In conclusion, two distinct lineages of B. dorsalis s.s. were identified in the present investigation based on sequence data of ND1 gene. The asymmetric migration of gene flow indicated multiple invasions and multiple origins.

Conclusions
Using mitochondrial DNA ND1 markers, the genetic structure, origin, and invasion history of B. dorsalis s.s. were investigated. It was observed that distinct lineages (both minor and major) originated from specific southeast Asian populations. Interestingly, minor lineages have not spread in China. Evidence was found indicating symmetrical migration from southeast Asia to China. Understanding origin and genetic structure of B. dorsalis s.s. will possibly assist in the development of effective management strategies to prevent biological invasion. Source-tracking and minor distinct lineage "encounter" approaches may also provide better clues to the design of appropriate control methods, such as introducing natural enemies, to minimize biological invasion of B. dorsalis s.s. in China.

Sample collection, DNA extraction, and sequencing
Adult male flies were collected from 19 natural populations from different locations within China and surrounding areas. All the collections were done during 2007 and 2008. Flies were captured using traps baited with parakairomone methyl eugenol (Table 4; Figure 1). No specific permits were required for these field studies. No specific permission was required to perform these activities in these locations. The locations were not privately owned or protected in any way. The field studies did not involve endangered or protected species. Specimens were preserved in 95% ethanol at -20°C before use. Flies were identified using morphological identification.
Total DNA was extracted from 334 individual specimens using the Universal Genomic DNA Extraction Kit (TaKaRa). The ND1 gene fragment of the mitochondrial genome was amplified from all the individuals using the primer ND1-F (5′-TTAGTTGCTTGGTTGTGTATTCC-3′) and ND1-R (5′-GAAAAAGGTAAAAAACTCTTTC AAGC-3′) [50]. PCR was performed with an initial denaturation step of 95°C for 5 min. The cycle conditions were 95°C for 1 min, 56°C for 1 min, and 72°C for 1.5 min for a total of 35 cycles. PCR ended with a final extension of 72°C for 8 min. PCR products were purified and sequenced using PCR primers from Invitrogen Biotechnology Co. (Shanghai, China). Sequences were deposited in GenBank under accession numbers KC413034-KC413367 (Table 4).

Data analysis
DNA sequences were assembled using SeqVerter™ and aligned using CLUSTALX 2.0 [51]. They were adjusted manually. Haplotype diversity (Hd), nucleotide diversity (π), average number of differences in nucleotide sequence between haplotypes (k), and standard deviation were determined using DNASP 5.0 [52]. To depict the phylogenetic and geographical relationships of the haplotypic sequences, a haplotype network was created with the median-joining method in NETWORK 4.6 using "connection cost" as the parameter criterion (Frequency > 1 criterion, external rooting, and square option was inactive) [53,54]. A phylogenetic tree was constructed using the neighbor joining method implemented in MEGA 5.0 [55].
The coalescence-based program MIGRATE-N 3.5.1 was used to test for and estimate gene flow between populations [56]. MIGRATE-N estimated past migration rates among populations using a matrix model of asymmetric migration rates. MIGRATE-N was used to estimate the immigration rate M (M = m/μ, where m is the immigration rate per generation and μ is the mutation rate per generation per locus) among populations. One long chain of 100,000,000 generations was set, with the initial 10,000 excluded as burn-in.
Spatial analysis of molecular variance was performed using SAMOVA 1.0 to identify population groups [57]. Sequence datasets, longitude information, and latitude information served as input data. The greatest number of supported groups (K) was determined by repeating the analysis with K ranging from 2 to 6 and selecting the subdivision scheme associated with highest F CT . An AMOVA hierarchical analysis of variance was performed using ARLEQUIN 3.5 to partition total variance into its components among groups, among populations, and within populations [58]. This was based on the identities of the groups inferred by the SAMOVA analysis. Each of the groupings of populations examined using the AMOVA models explained a significant portion of the molecular variance. Genetic differentiation was estimated by calculating Fst between pairs of populations using ARLEQUIN 3.5.
The correlation of genetic differentiation (Fst) over geographic distances for all pairs of populations was tested using the Mantel permutation procedure as implemented in ARLEQUIN 3.5 (1000 permutations, significance level P < 0.01) [59].
Demographic history was examined using neutrality statistics of Tajima's D test, Fu's Fs test, and mismatch distribution analysis implemented in ARLEQUIN 3.5