Unexpected relationships of substructured populations in Chinese Locusta migratoria

Background: Highly migratory species are usually expected to have minimal population substructure because strong gene flow homogenizes genetic variation over geographical populations, counteracting random drift, selection and mutation. The migratory locust Locusta migratoria belongs to a monotypic genus and is an infamous pest insect with exceptional migratory ability, with dispersal documented over a thousand kilometers. Its distributional area is greater than that of any other locust or grasshopper, covering practically all the temperate and tropical regions of the eastern hemisphere. Consequently, minimal population substructuring is expected. However, in marked contrast to its high dispersal ability, three geographical subspecies have been distinguished in China, with more than nine being biologically and morphologically identified in the world. Such subspecies status has been under considerable debate.

Results: By multilocus microsatellite genotyping analysis, we provide ample genetic evidence for strong population substructure in this highly migratory insect that conforms to geography. More importantly, our genetic data identified an unexpected cryptic subdivision and demonstrated a strong affiliation of the East China locusts to those in Northwest/Northern China. The migratory locusts in China formed three distinct groups, viz. (1) the Tibetan group, comprising locusts from Tibet and nearby West China high mountain regions; this is congruent with the previously recognized Tibetan subspecies, L. m. tibetensis; (2) the South China group, containing locusts from the Hainan islands; this corresponds to the Southeast Asia oriental tropical subspecies L. m. manilensis; (3) the North China group, including locusts from Northwest and Northern China (the Asiatic subspecies L. m. migratoria), Central China and Eastern China regions. Therefore, the traditional concept of Locusta subspecies status established by Uvarov in the 1930s needs to be revised. The three groups of locusts probably have separate evolutionary histories that were most likely linked to Quaternary glaciation events, and derived from different ancestral refugial populations following postglacial expansions.

Conclusion: The migratory locust populations in China have differentiated into three genetically distinct groups despite high dispersal capability. While this clarifies long-standing suspicions about the subspecific diversification of this species in China, it also reveals that the locusts in the vast area of East China are not the oriental subspecies but the Asiatic subspecies, an unexpected substructuring pattern. The distribution pattern of the three locust groups in China may be primarily defined by adaptive differentiation coupled to Quaternary glaciation events. Our results are of general significance both for locust research and for phylogeographical study of flora and fauna in China, illustrating the potential importance of phylogeographical history in shaping the divergence and distribution patterns of widespread species with strong dispersal ability.

Background
Patterns of population genetic differentiation of an organism are shaped by various factors, such as geographical barriers, ecological difference, and historical processes, as well as the dispersal ability of the species. Highly migratory species are usually expected to have minimal population substructure over their distributional ranges because strong gene flow can counteract the isolating effects of geographical distance and physical barriers, and even remove genetic differentiation due to local adaptation [1]. However, historical processes such as climatic fluctuations and geological events can modify their range, leading to population subdivision even in species with high dispersal capabilities, as seen in some large mammals [2] and insects [3].
The migratory locust Locusta migratoria, belonging to a monospecific genus, is one of the most important agricultural pests in the world, and outbreaks were recorded as early as the 13th century BC [4]. Its distributional area is greater than that of any other locust or grasshopper [5], covering practically all the temperate and tropical regions of the eastern hemisphere (Asia, Europe, Africa and Australasia), from 154 m below sea level in Xinjiang (Sinkiang) to about 4,600 m above sea level on the Tibetan plateau [6]. Such a vast distribution and monospecific status suggest exceptional migratory ability, and indeed dispersal over a thousand kilometers has been documented [7,8]. Consequently, minimal population substructure is expected within its distributional range. Nevertheless, some nine geographical subspecies have been distinguished biologically and morphologically [9-12], three of which are present in China, viz. Locusta migratoria migratoria, L. m. tibetensis, and L. m. manilensis. There is considerable debate on the reliability of the subspecific status so identified [11,13,14], since morphological characters are readily influenced by regional climatic and habitat variation, and identification of subspecies has often been based on locality rather than critical examination of specimens [5]. This is particularly so for locusts in China, whose subspecific affinities have been perennially doubted [11,14].
Additionally, phylogeographical study in China as a whole is still underrepresented [15], and genetic data are particularly scanty for deducing what happened to the biomes during glaciations and deglaciations in past Quaternary cycles in China. This further bears on the origins of species in neighboring areas in the Palaearctic and also North America [16,17], and is an important issue for understanding global effects of Pleistocene climate variations [18]. Thus, phylogeographical study on a large geographical scale in East Asia is pressingly needed and likely to reveal hidden patterns of biogeographic evolution. Here we present an extensive population genetic survey of the migratory locust in China using highly polymorphic microsatellite DNA markers. We aim to explore the following questions: (1) Are the subspecific patterns morphologically identified for the migratory locust in China genetically supported? (2) How can the patterns be explained in the context of biogeographic evolution, given the strong dispersal capabilities of the insect? We provide robust genetic evidence for strong population structure and an unexpected cryptic subdivision in this insect in China, clarifying some long-standing issues in the subspecific divergence of this insect. We suggest that historical phylogeographical factors and the associated ecological adaptation have played important roles in shaping the observed genetic and geographic patterns in this highly migratory insect.

Results
In total, 1381 individual locusts from 26 localities (Figure 1) were genotyped at eight nuclear microsatellite loci [19]. After corrections for multiple comparisons, no linkage disequilibrium was detected for the eight loci employed, but most population samples (23 out of 26) deviated from Hardy-Weinberg equilibrium (HWE) at one to five loci. Micro-Checker indicated that the presence of null allele(s) potentially contributed to this deviation. Estimated frequencies of null alleles per locus per sample ranged from 0 to 0.438 (the value was >0.4 in only five cases), and in 112 of the 208 locus-sample combinations the values were greater than 0.05. The most frequent null allele frequencies were in the range of 0-0.1. The average null allele frequencies over loci (F_null) vary from 0.048 (BaM) to 0.217 (F) among population samples (Table 1). At locus LmIOZc36, null alleles were detected in all 26 samples (with frequencies ranging between 0.208 and 0.429). If this locus was excluded, nearly half of the samples were in HWE at the remaining seven loci.

Three complementary approaches were used to examine the relationships of the locust populations studied, i.e. a genetic distance-based neighbor-joining approach, Bayesian inference and principal component analysis. There was no significant change in results if the locus LmIOZc36 was excluded from the data. Each analysis was conducted separately on two data sets: the original data set without correction for null alleles and the data set corrected for null alleles. Since highly congruent results were obtained in these analyses, we only show the results from the original data set. Figure 2 shows the neighbor-joining population tree based on Cavalli-Sforza's chord distance (Dc) from the microsatellite loci [the tree based on Nei's standard genetic distance (Ds) has a highly concordant topology]. It indicates that the Chinese locust populations studied are clustered genetically into three major groups, each with strong bootstrap support: (1) the Tibetan group (orange clade in Figure 2), comprising locusts from Tibet and nearby West China high mountain regions (R, L and SCJSJ); this is congruent with the previously recognized Tibetan subspecies, L. m. tibetensis [12]; (2) the South China group (red clade), containing locusts from the Hainan islands (hNLD, hNSYms, hNSY1, and hNSY2); this corresponds to the Southeast Asia tropical subspecies L. m. manilensis [11]; (3) the North China group, including locusts from Northwest and Northern China (the Asiatic subspecies L. m. migratoria), Central China and East China [...] (Additional file 1 (Table S1)). Within each group, populations are genetically similar; F_ST values among populations in general are small (0.000-0.010) and not significantly different from zero for most pair-wise comparisons (excepting the Tibetan group and some populations of the North China group, see below) (Additional file 1 (Table S1)).
The results of principal component analysis (PCA) of the microsatellite genotype data are shown in Figure 3. The Chinese locust populations form three clusters, "Tibetan", "South China" and "North China", identical to the pattern seen in the neighbor-joining population tree based on Dc (Figure 2). Figure 4 shows the results of Bayesian STRUCTURE analysis. It also inferred three clusters (K = 3) for the Chinese locust populations (the mean Dirichlet parameter Alpha (α) for degree of admixture is 0.041 at K = 3), corresponding to the three major groups identified in the aforementioned phylogenetic approach (Figure 2) and PCA (Figure 3). At the various defined K values (simulated from 2 to 8, Figure 4), the Tibetan group and the South China group each remain a fixed cluster (except at K = 2, where these two groups merge into one cluster, with the rest of the populations forming the other cluster). At higher K values (4-8), the locusts in the North China group keep splitting further, albeit apparently irregularly (Figure 4).

[...] China group (3.93%) is about 36-fold that of their within-group among-population variance (0.11%).

Patterns of genetic differentiation and unexpected cryptic subdivision
The present multilocus microsatellite genotyping analysis studied 25 population samples from all three subspecies of the migratory locust in China, namely the Asiatic migratory locust L. m. migratoria, the oriental migratory locust L. m. manilensis, and the Tibetan migratory locust L. m. tibetensis. Overall, our data revealed that the migratory locust populations in China have differentiated into three distinct groups: the Tibetan clade (orange circles in Figure 1), the South China clade (red coded) and the North China clade (blue and green coded). This genetic pattern is concordant with geographic distribution, and was strongly supported by several complementary approaches (Figures 2, 3, 4 and Table 2: genetic distance-based phylogenetic approach, multivariate method, Bayesian clustering inference and variance analysis, respectively). We emphasize that the principal component analysis does not make strong assumptions of Hardy-Weinberg equilibrium, and Bayesian inference does not take into account the sample locations of individuals. The concordance between these approaches indicates the robustness of the patterns revealed. The above genetic pattern largely confirms the subspecific diversification in this species recognized from biological and morphological data [5,6,10,12]. A major, unexpected disagreement exists, however, between our genetic data and the traditional treatment of the subspecific status of the locusts in East China (green circles in Figure 1). Traditionally, locusts in the immense area of East China and South China (e.g. Hainan islands, red circles in Figure 1) have been classified as the subspecies L. m. manilensis [5,6,10], this being an accepted concept since Uvarov's work in the 1930s [10]. Our genetic data have identified a cryptic subdivision between these locusts, and demonstrate a strong affiliation of the locusts in East China to those in Northwest/Northern China (blue circles in Figure 1) instead of those in South China.
An issue related to the sampling scheme deserves some consideration before we can draw any firm conclusion from the above observations. In our study, all population samples from South China are from the Hainan islands, with no sample from the adjacent continent. (Although migratory locusts have been recorded in continental South China, most of the time they form only solitary populations of low density, and we failed to obtain any sample from there after several attempts.) Thus one possibility is that the observed population structure reflects a barrier to gene flow between the island and the continental populations, and is simply an artifact of insufficient sampling in South China. Several lines of evidence argue against this suggestion. First, the minimum distance between the Hainan islands and the continent is only 20 km (the width of the Qiongzhou Strait that separates them varies between 20 and 30 km [20]), which does not form an effective barrier to gene flow. Long-distance migration of locusts in Hainan has been well documented [14]. Second, a careful revisit of the earlier literature revealed that in the 1990s ecologists and taxonomists had already noticed some subtle morphological differences between locusts in East China and South China (Hainan region) [14], and a somewhat closer affiliation in certain morphometric measures of the East China locusts to those in the Mengxin region [21]. For example, Ding questioned in 1995 whether the migratory locusts in East China are really the oriental subspecies as seen in Hainan, since the black stripe marking on both sides of the pronotum found in the solitary locusts from Hainan was not present in the majority of locusts from East China [14] (however, his view has received little attention). This lends independent support to our genetic findings. [...]

Chapuis et al. [23] have reported the existence of intraspecific subdivision in this highly migratory insect by microsatellite DNA analysis of rangewide samples, which appeared not to correspond well to traditional subspecies taxonomy. Our results readily clarify some of the oddities observed in their study, viz. why do the oriental migratory locusts in East China (their collecting site no. 15) not cluster with their consubspecifics in Southeast Asia? This is because they belong to different subspecies.

Figure 2. Neighbor-joining tree illustrating the relationships of the migratory locust populations in China based on allelic frequencies at eight microsatellite loci.
Figure 3. Principal component analysis (PCA) plot. Colors are coded as in Figure 2. Note that the North China group symbols cannot be fully seen due to overlapping.
Figure 4. Bayesian estimation of population structure.

Phylogeographical implications of the observed differentiation patterns
Highly migratory species are usually expected to have minimal population substructure over their distributional ranges [24] because strong gene flow homogenizes genetic variation over geographical populations, counteracting random drift, selection and mutation [1,24-26]. In contrast, both the traditional morphometric and our complementary genetic analyses demonstrated a largely concordant differentiation pattern of locust populations in China. This suggests that either the dispersal ability of the migratory locust is not as strong as thought (such that gene flow cannot effectively prevent geographical populations from drifting apart genetically), or some other processes are involved that caused population divergence. However, the strong migratory ability of the migratory locust (especially long-distance migration) has been well documented [7,8]. Our results also revealed that populations separated by over 1000 km in East China do not show genetic differentiation (Additional file 1) and there is no isolation by distance (IBD) within this region (data not shown). Under classical population genetic theory, this indicates strong gene flow homogenizing populations across large geographical areas, confirming the migratory locust as a strong disperser. It further indicates that geographical distance does not constitute a barrier to gene flow in this insect in China. Similarly, no physical barriers preventing locust migration seem to exist in East and South China; for example, locusts in East China and the Mengxin region are well connected despite the Taihangshan mountain chains (at 3,058 m) separating them.
Among the other processes likely involved in the divergence of the locust populations (e.g. habitat patchiness, local extinction/recolonization events, phenological isolation, behavioral difference), we believe that historical processes, such as historical climatic fluctuations, played a primary role. The impact of Pleistocene glaciation cycles on floral and faunal distributions is now well recognized, being a major force shaping population divergence patterns in many organisms [16,17]. In a common scenario, populations were isolated in different refugial areas during glacial periods and diverged genetically from each other, subsequently extending their ranges by (re)colonization as favourable climatic and ecological conditions resumed. This is also plausible for the migratory locust [27]. For example, in China in the mid-latitudes (30-40°N), at the last glacial maximum (LGM, ~20 kya), significant southward and eastward extension of steppe and desert biomes occurred. Cool mixed forests shifted c. 1,000 km eastward into the lowlands, and the northern boundary of broadleaved evergreen/warm mixed forests was displaced southward by c. 1,000 km [28]. Over the whole of north and east China, climatic conditions were much drier and colder in the LGM than today. A reduction in temperature of between 7 and 12°C has been estimated [29,30], with a fall of sea level along the East China Sea coast of up to 140 m [28]. Consequently, during the LGM, in the areas where the North China group of locusts is found today, the land north of latitude 38-40°N (eastern part) and 37-39°N (western part) was permafrost [31], and across the vast East China region steppe and desert were the dominant vegetation types, with herbaceous plants composed mainly of Artemisia and Chenopodiaceae [29,31-33]. These plants, which are indicators of cold conditions and also cause a high death rate of hoppers (95%) or abortion of the moulting process [6,11,34], are not suitable food for the migratory locusts. Therefore, we can deduce that at the LGM the migratory locusts were very unlikely to survive in these areas. This means that locusts found today in these areas originated by recolonization from elsewhere after the LGM.
The most likely source of origin of locusts in North China is glacial refugia in the Black, Caspian and possibly Aral Sea basin regions, given that (1) [...]

There is not enough evidence to deduce how the South China group of locusts (red circles in Figure 1) was affected by past glaciations. These locusts are most likely of Southeast Asian origin (locusts in Southeast Asia, such as the nearby Indo-China Peninsula and the Philippines, are all known as the tropical subspecies L. m. manilensis), considering recorded invasions of locusts from the Philippines to the Taiwan islands [10] and comparable tropical savannah breeding habitat in these regions [14]. By contrast, the Tibetan group probably developed from local refugial sources. This group of locusts has a closer affinity to the outgroup, the African migratory locust, than the other two groups, and shows a strong within-group divergence pattern (Figures 2 and 3). Wright's F-statistics also indicate significant population differentiation within this group (Additional file 1; F_ST >0.11 between SCJSJ and L/R), suggesting local isolation of geographical populations over a sufficiently long period of time. Pollen evidence suggests that the southern and eastern edges of the Tibetan plateau had favorable climatic conditions during the last glacial [38], being important refugial places for plants and animals. Thus, the present populations have probably been derived from glacial refugia in these areas, and local geography (high mountainous landscape) should have further enhanced genetic differentiation among populations. Interestingly, the present distribution pattern of the Tibetan group locusts largely parallels the distribution pattern of the broadleaved forests at the LGM in Tibet, albeit shifted somewhat internally, and this is indicative of the refugial areas of the locusts and the directions of postglacial expansion.
Therefore, circumstantial evidence suggests that the three genetically distinct locust groups in China were isolated from each other during evolution, most likely in connection with Quaternary glaciation events, and were derived from different glacial refugial populations following postglacial expansions. Although we have focused our discussion above on the LGM, as this is the glaciation best understood, the differentiation patterns observed in the locust could well be a combined consequence of several glaciation cycles. Glacier studies on the Tibetan Plateau have identified three major glaciation events in China in the Quaternary that were of great amplitude and left recognizable footprints (glacier relics) [39], including the LGM. Further study with DNA sequence data is clearly needed to estimate more precisely the time scale of differentiation of locust populations.

Factors maintaining the current isolation of locust populations
How is the substructuring pattern of locust populations in China maintained, given the strong dispersal ability of this insect? Distributional patterns of species are molded by a number of factors, including barriers to dispersal and physical and biological factors that make particular regions of habitat unsuitable for viability and/or reproduction [40]. The actual geographic distribution is defined by the complex interaction of the environment, the species' fundamental ecological niche, and particular biological realities and historical events [41-46]. It is known that once populations have become genetically differentiated, their divergence can be maintained if they have differentially adapted to regional ecological conditions, since geographic variation in selection can act as a strong barrier to gene flow [26,47]. This is likely the case for the migratory locust even though it is a strong disperser. The significant physiological difference in cold hardiness between North China (Mengxin region + East China) populations and the Hainan populations [22] reflects differential selection in this species in different regions, potentially linked to historical isolation (see above). That is, the migratory locust populations in different refugial areas during glaciation periods could have undergone allopatric (or parapatric) divergence with adaptive evolution, and shifted to different adaptive landscapes. Thus, populations in Tibet have adapted to the ecological and climatic conditions at high altitude, the South China populations to subtropical and tropical conditions, and the North China populations to temperate conditions. This should have ecologically restricted their distributional ranges in postglacial expansions, and then prevented effective migration among ecologically different regions. Therefore, the current pattern of distribution of the three locust groups in China appears to be primarily defined by adaptive differences, which have acted as barriers to gene flow. As a consequence, the current effective gene flow is weak and has little genetic consequence; that is, it is not strong enough to wipe out the patterns of differentiation created during historical isolation.

Conclusion
In summary, the migratory locust populations in China have differentiated into three distinct groups despite high dispersal capability, and the locusts in the vast area of East China are not the oriental subspecies but the Asiatic subspecies. This suggests that these groups of locusts have separate evolutionary histories, most likely molded by Quaternary glaciation events, and derived from different ancestral refugial populations following postglacial expansions. The population substructuring patterns observed in the migratory locusts, as reported here and in Chapuis et al. [23], are of general significance both for locust research and for phylogeographical study of flora and fauna in China and beyond, and are illustrative for widespread species with strong dispersal ability. Given our sampling density and the results obtained, far more population samples are needed in order to study the worldwide population genetic structure and biogeographic evolution of highly mobile species such as the migratory locust.

Sample collection and microsatellite genotyping
[...] (Figure 1). In total, 1381 individual locusts from 26 localities were used in this study. Genomic DNA was extracted using a modified phenol-chloroform procedure as described by Zhang and Hewitt [48]. Each individual was genotyped at eight microsatellite loci [19] on an ABI PRISM™ 3100 Genetic Analyzer using Pop4 gel matrix with GENESCAN® 400HD (ROX) as the internal size standard. Sizes of the amplified microsatellites were scored by GeneScan 3.7 and manually checked for every allele. A blank control was run alongside each set of DNA extractions and PCR amplifications to monitor any possible cross contamination. Samples that did not amplify at more than two loci were excluded from further analysis.
Note that mitochondrial DNA of locusts is of little use for population genetic studies due to the presence of numerous pseudogenes in the nuclear genome [49], and nuclear ribosomal ITS regions do not contain enough sequence variation (DXZ's unpublished data).

Data analysis
Heterogeneity testing was carried out for the two sexes and multiple samples collected from the same areas before pooling them in analysis, and no genetic difference was observed [50]. Basic population genetic parameters (the number of alleles, the observed and expected heterozygosity per locus) were estimated with MSTools 3.0 [51]. Hardy-Weinberg equilibrium (HWE) and linkage disequilibrium were tested using GENEPOP ver. 3.4 [52], GDA [53] and ARLEQUIN ver. 3.0 [54], with sequential Bonferroni correction for critical significance levels. Null alleles were examined with Micro-Checker [55].
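The sequential Bonferroni correction mentioned above can be illustrated with a minimal sketch. It assumes the Holm-type step-down form of the procedure; the function name `sequential_bonferroni` and the example P values are hypothetical and are not taken from GENEPOP, GDA or ARLEQUIN.

```python
def sequential_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where a test remains significant
    after a Holm-type sequential Bonferroni correction."""
    m = len(p_values)
    # Indices of the tests, ordered from the smallest to the largest P value.
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, idx in enumerate(order):
        # The (rank + 1)-th smallest P value is compared with alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            significant[idx] = True
        else:
            # Once one test fails, all tests with larger P values fail too.
            break
    return significant


# Hypothetical P values for three tests (illustration only).
print(sequential_bonferroni([0.01, 0.02, 0.03]))
```

Compared with a plain Bonferroni threshold of alpha / m for every test, the step-down thresholds become progressively less strict, so the procedure never rejects fewer hypotheses than the plain correction.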
Wright's F-statistics (F_ST or θ), measures of population subdivision, were calculated using FSTAT 2.9.3 [56] and ARLEQUIN. Statistical significance of the estimates was evaluated by permutation or bootstrap procedures. Exact tests of population differentiation were carried out using GENEPOP, and the significance levels were assessed by a Markov chain procedure.
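To make the meaning of F_ST concrete, the following is a simplified sketch that computes it for one locus as (H_T - H_S) / H_T from population allele frequencies, weighting populations equally. This is an assumption made for illustration: it is not the sample-size-corrected θ estimator implemented in FSTAT or ARLEQUIN, and the function name `fst_from_frequencies` is hypothetical.

```python
def fst_from_frequencies(pop_freqs):
    """Simplified single-locus F_ST = (H_T - H_S) / H_T.

    pop_freqs: list of populations, each a list of allele frequencies
    at one locus (same allele order in every population).
    Populations are weighted equally; no sample-size correction is applied.
    """
    n_pops = len(pop_freqs)
    n_alleles = len(pop_freqs[0])
    # H_S: mean expected heterozygosity within populations.
    h_s = sum(1.0 - sum(p * p for p in freqs) for freqs in pop_freqs) / n_pops
    # H_T: expected heterozygosity of the pooled (mean) allele frequencies.
    mean_freqs = [sum(freqs[a] for freqs in pop_freqs) / n_pops
                  for a in range(n_alleles)]
    h_t = 1.0 - sum(p * p for p in mean_freqs)
    if h_t == 0.0:
        return 0.0  # locus is monomorphic overall; no differentiation to measure
    return (h_t - h_s) / h_t


# Hypothetical frequencies for two populations at a two-allele locus.
print(fst_from_frequencies([[0.8, 0.2], [0.2, 0.8]]))
```

Identical allele frequencies give a value of zero and populations fixed for different alleles give a value of one, which is the scale on which the within-group and between-group values in the Results are read.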
AMOVA (Analysis of Molecular Variance) was performed using the program implemented in ARLEQUIN. AMOVA was used to examine which grouping of the Chinese locusts has the maximum among-group variance, and whether the traditional taxonomic classification of the Chinese locusts has a high among-group variance. The traditional taxonomic classification is as follows (Figure 1; [...]

[...] [57] was used for calculating the genetic distances and constructing population phylogenetic trees. Nei's standard genetic distance (Ds) and Cavalli-Sforza's chord distance (Dc) were estimated using the program GENDIST. Tree topology based on the Dc distance is generally more robust for gene frequency data [58] and insensitive to null alleles [59]. One thousand bootstrap replicates were performed to obtain statistical support for inferred trees.
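The two genetic distances named above can be sketched from per-locus allele frequencies as follows. The sketch assumes the commonly cited textbook forms, Dc = (2 / (π r)) Σ sqrt(2 (1 - Σ sqrt(x y))) over r loci and Ds = -ln(Jxy / sqrt(Jx Jy)); GENDIST may use a different scaling for the chord distance, and the function names are hypothetical.

```python
import math


def chord_distance(x, y):
    """Cavalli-Sforza chord distance (Dc) between two populations.

    x, y: lists of loci; each locus is a list of allele frequencies
    (same loci and allele order in both populations).
    """
    r = len(x)
    total = 0.0
    for fx, fy in zip(x, y):
        cos_theta = sum(math.sqrt(a * b) for a, b in zip(fx, fy))
        cos_theta = min(cos_theta, 1.0)  # guard against rounding above 1
        total += math.sqrt(2.0 * (1.0 - cos_theta))
    return 2.0 / (math.pi * r) * total


def nei_standard_distance(x, y):
    """Nei's standard genetic distance (Ds), without sample-size correction.

    Undefined (raises ValueError) if the populations share no alleles at all.
    """
    r = len(x)
    jx = sum(sum(a * a for a in fx) for fx in x) / r
    jy = sum(sum(b * b for b in fy) for fy in y) / r
    jxy = sum(sum(a * b for a, b in zip(fx, fy)) for fx, fy in zip(x, y)) / r
    return -math.log(jxy / math.sqrt(jx * jy))


# Hypothetical allele frequencies at two loci for two populations.
pop_a = [[0.5, 0.5], [0.9, 0.1]]
pop_b = [[0.2, 0.8], [0.6, 0.4]]
print(chord_distance(pop_a, pop_b), nei_standard_distance(pop_a, pop_b))
```

A matrix of such pairwise distances is what a neighbor-joining algorithm takes as input; bootstrap support is then obtained by resampling loci and recomputing the distances.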
A Bayesian clustering analysis implemented in the program STRUCTURE [60] was also used to infer population structure in the locust. This method allows the assignment of individual insects to distinct clusters based on their genotypes, without using sampling locations, hypothesized genetic origins of individuals or phenotypic information. Trial runs were first tested with varying lengths of iterations (10^4-10^6) after a burn-in period of various lengths (10^4-10^6). We found that stationarity was reached with a burn-in period of 1 × 10^4 iterations, and data collection for ≥1 × 10^5 iterations produced highly consistent results. Independent runs with different K values, each with several replicates, were then performed using a burn-in period of 1 × 10^5 iterations and data collection for 1 × 10^6 iterations, with a model of correlated allele frequencies. A criterion recommended for selecting the appropriate K value is the estimated posterior probability of the data, P(K|X) (see the program manual). For complex datasets with many groups, this criterion is difficult to apply. We have observed that the Dirichlet parameter Alpha (α) for degree of admixture appears to be a more reliable indicator of the 'correct' K value. For the clustering pattern with the most appropriate population structure (at the simulated 'correct K'), admixture among populations (the inferred clusters) should be minimal, and therefore α should be minimal; for values smaller or larger than the 'correct' K, α should always be larger. Thus, the smallest K with the smallest α is most likely the real structure contained in the data. It is expected that departures of data from HWE may lead to overestimating K. While this could particularly be a problem for closely related populations, it should have little influence on divergent populations. Graphical display of the results of STRUCTURE was done with the program DISTRUCT by N. A. Rosenberg [61].
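The selection rule described above (the smallest K with the smallest α) can be written as a short sketch. It assumes the mean α for each K has already been read from the STRUCTURE output; the function name `pick_k_by_alpha`, the tolerance argument and all α values in the example other than 0.041 at K = 3 are hypothetical.

```python
def pick_k_by_alpha(alpha_by_k, tol=0.0):
    """Return the smallest K whose mean alpha equals the minimum alpha
    (within an optional tolerance) across all K values tried.

    alpha_by_k: dict mapping K (int) to the mean Dirichlet parameter alpha.
    """
    smallest_alpha = min(alpha_by_k.values())
    # All K values whose alpha is within `tol` of the minimum.
    candidates = [k for k, a in alpha_by_k.items() if a - smallest_alpha <= tol]
    return min(candidates)


# Only the value at K = 3 comes from the text; the others are made up
# purely to illustrate the rule.
example = {2: 0.090, 3: 0.041, 4: 0.055, 5: 0.072}
print(pick_k_by_alpha(example))
```

The tolerance is a design choice for replicate runs whose α values differ only by sampling noise; with the default of zero the rule is applied literally.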
Principal component analysis (PCA) was performed with PCA-GEN [62], incorporating 1,000 randomizations, and verified independently using the statistical software package SPSS ver. 10.0 (SPSS Inc., Chicago, IL, USA). As a complementary approach to the model-based genetic analyses described above, this multivariate method does not make strong assumptions of Hardy-Weinberg equilibrium or linkage equilibrium in the data.
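As a rough illustration of why PCA needs no Hardy-Weinberg or linkage assumptions, the sketch below extracts the first principal component from a table of population allele frequencies using only centering, a covariance matrix and power iteration. It is a generic PCA, assumed for illustration; it is not the scaling or randomization procedure of PCA-GEN or SPSS, and the function name and data are hypothetical.

```python
import math


def first_principal_component(rows, n_iter=500):
    """Return (axis, scores, explained_fraction) for the first principal component.

    rows: list of samples (e.g. populations), each a list of variables
    (e.g. allele frequencies). Requires at least two samples.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    total_var = sum(cov[i][i] for i in range(m))
    if total_var == 0.0:
        return [0.0] * m, [0.0] * n, 0.0  # no variation in the data
    # Start from the covariance column of the most variable variable; a uniform
    # start vector would be annihilated because frequencies at a locus sum to 1.
    start = max(range(m), key=lambda j: cov[j][j])
    v = [cov[i][start] for i in range(m)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(n_iter):
        # Power iteration: repeatedly multiply by the covariance matrix.
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(m))
                     for i in range(m))
    scores = [sum(c[j] * v[j] for j in range(m)) for c in centered]
    return v, scores, eigenvalue / total_var


# Hypothetical frequencies of two alleles at one locus in four populations.
data = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
axis, scores, explained = first_principal_component(data)
print(scores, explained)
```

Populations with similar allele frequencies receive scores of the same sign on this axis, which is the kind of separation displayed in a PCA plot; further components would require deflation or a full eigendecomposition.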

Authors' contributions
LNY carried out the molecular genetic studies and participated in data analysis. YJJ participated in some field work, genotyping data collection, analysis and coordination, and helped to revise the manuscript. ZSH participated in some field work and genotyping studies. GMH coordinated the study and the Royal Society UK-China joint project, and helped to revise the manuscript. DXZ conceived of and designed the study, carried out field work, participated in data analysis and coordination, and wrote the manuscript. All authors read and approved the final manuscript.