SNP-revealed genetic diversity in wild emmer wheat correlates with ecological factors

Background Patterns of genetic diversity between and within natural plant populations and their driving forces are of great interest in evolutionary biology. However, few studies have examined the genetic structure and population divergence of wild emmer wheat using a large number of EST-related single nucleotide polymorphism (SNP) markers. Results In the present study, twenty-five natural wild emmer wheat populations representing a wide range of ecological conditions in Israel and Turkey were used. Genetic diversity and genetic structure were investigated using over 1,000 SNP markers. A moderate level of genetic diversity was detected, due to the biallelic nature of SNP markers. Clustering based on a Bayesian model showed that the grouping pattern is related to the geographical distribution of wild emmer wheat. However, genetic differentiation between populations was not necessarily dependent on geographical distance. A total of 33 outlier loci under positive selection were identified using an FST-outlier method. Significant correlations between loci and ecogeographical factors were observed. Conclusions Natural selection appears to play a major role in generating adaptive structures in wild emmer wheat. SNP markers are appropriate for detecting selectively channeled adaptive genetic diversity in natural populations of wild emmer wheat. This adaptive genetic diversity is significantly associated with ecological factors.


Background
Patterns of genetic diversity between and within natural plant populations and their driving forces are of great interest in evolutionary biology, as well as in ecological and population genetics (Nevo list of wild cereals at http://evolution.haifa.ac.il) [1,2]. Analyses of genetic diversity and structure are helpful for the management, research and utilization of plant germplasm. Identifying and correctly interpreting the associations between functional variation and molecular genetic diversity is also critical for studies of crop evolution and genetic improvement [2,3]. Wild emmer wheat, Triticum dicoccoides, occurs in a wide range of environments and shows high genetic and phenotypic diversity [3]. Analyzing the genetic structure and population divergence underlying this diversity is important for breeding, especially for identifying genes or genomic regions involved in environmental adaptation. Furthermore, wheat is a good model of polyploidy, one of the most common modes of plant evolution [4,5]. Hence, it is essential to study adaptive genetic diversity in wild emmer wheat, the progenitor of modern cultivated tetraploid and hexaploid wheats [1,2,6,7].
In previous studies, the genetic diversity of wild emmer wheat populations has been evaluated using various methods, such as morphological traits [1,17], allozyme analysis [1,3,13], and molecular markers (SSRs, RAPDs, and SRAPs) [14,15,22]. Associations between markers and ecogeographical factors have also been discussed [13-15,22]. However, the genetic structure and population divergence of wild emmer wheat populations have not been examined with EST-related SNP markers. EST-related markers, discovered directly from EST sequences or from genomic sequences amplified with PCR primers designed from ESTs, are useful resources for assaying functional genetic variation [23]. Variation in functional regions, whether expressed or regulatory sequence, may reflect past influences of natural selection. Moreover, because such SNPs can be linked to functional genes, it is important to determine which markers are likely associated with selection, especially to identify genes or genomic regions involved in environmental adaptation. Hence, SNP markers appear well suited to the needs of marker-assisted management of genetic resources, diversity studies, and marker-assisted selection in breeding programs. At present, most studies using EST-related SNP markers have focused on model organisms [24,25], with fewer applications to non-model taxa [26]. Only a limited number of SNPs have been reported in wheat [27-30]. Large-scale SNP discovery in wheat is limited both by the polyploid nature of the organism and by the high sequence similarity among the three homoeologous wheat genomes [29,31].
In the present study, a large number of EST-related SNP markers were used to investigate the genetic diversity and genetic structure of a natural collection of 200 accessions belonging to 25 wild emmer wheat populations. This germplasm was collected by E. Nevo from various locations in Israel and Turkey, covering a wide range of ecological conditions such as soil, temperature, and water availability. Notably, an FST-outlier method was used to identify loci that may be under positive selection and may therefore be linked to genome regions conferring the phenotypic variation present in the analyzed germplasm, which is of interest for breeding programs.

Plant materials and ecological background of wild emmer wheat
The center of distribution and diversity of emmer wheat lies in the catchment area of the upper Jordan Valley in Israel and its vicinity [13]. A total of 200 wild emmer wheat accessions representing 25 populations collected in Israel and Turkey (five to ten accessions per population) were used in this study. The plant materials originated from a wide range of ecological conditions of soil, temperature and water availability, representing the natural distribution of wild emmer wheat. The geographical locations of all investigated populations are shown in Figure 1, and the populations, with their geographic origin and climatic conditions, are listed in Table 1. The Israeli climatic data were obtained from publications of the Meteorological Service of Israel [13]. Detailed information about each population and its collection site has been described in the literature (Nevo list of wild cereals at http://evolution.haifa.ac.il) [13-15].
Table 1 (partial legend). Water availability: Rn, mean annual rainfall; Rd, mean number of rainy days; Hu-an, mean annual humidity; Hu-14, mean humidity at 14:00 h; Dw, mean number of dew nights in summer; Ev, mean annual evaporation; Rv, mean inter-annual variability of rainfall; Rr, mean relative variability of rainfall. Edaphic: So, soil type (1 = terra rossa (t.r.); 2 = rendzina; 5 = basalt).
The 200 wild emmer wheat accessions were genotyped with 1,536 SNP markers. These SNPs, discovered in a panel of 32 tetraploid and hexaploid wheat lines, were downloaded from the Wheat SNP Database (http://wheat.pw.usda.gov/SNP/new/index.shtml). Detailed procedures for SNP selection and assay design have been described by Akhunov et al. [27,28] and Chao et al. [33]. Briefly, 150 ng of genomic DNA per genotype was used for Illumina SNP genotyping at the Genome Center of the University of California, Davis (www.genomecenter.ucdavis.edu/dna_technologies) with the Illumina Bead Array Platform and Golden Gate Assay, following the manufacturer's protocol [34]. Fluorescence images of an array matrix carrying Cy3- and Cy5-labeled beads were generated with a two-channel scanner. The ratio of Cy3 to Cy5 fluorescence intensity is used to determine the allelic state at an SNP site. A Golden Gate genotyping reaction performed on polyploid wheat genomic DNA is expected to produce Cy3/Cy5 fluorescence ratios that differ from those expected for a diploid. Because of the bottleneck at the formation of tetraploid wheat, virtually no polymorphism was introduced from the A- or B-genome ancestor, so essentially all mutations arose after the founding tetraploid population formed. The rate of spontaneous mutation in eukaryotic genomes is extremely low (10⁻⁸ to 10⁻⁹ mutations/site/year); therefore, the probability that mutations occurred at the same nucleotide site in both the A and B genomes is negligible. Because emmer wheat is self-pollinating, only two genotypes are expected among the accessions. For example, an A→T mutation in the A genome yields a derived T base and an A/T SNP, while the ancestral A base remains unchanged in the B genome. The SNP therefore produces two homozygous genotypes, AAAA and TTAA, with A:T ratios of 1:0 and 1:1, respectively, so a derived homozygote gives the same 1:1 signal ratio that a heterozygote would give in a diploid. Genotypes were subsequently called with Illumina's BeadStudio software v.3. Genotype calls were manually checked for misclassification of homozygous and heterozygous clusters produced by the software's clustering algorithm. This step proved critical for reducing the genotyping error rate associated with the peculiar clustering patterns of polyploid wheat [27,33].
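The ratio argument above can be illustrated with a small Python sketch. It assumes, purely for illustration, that the allele fraction is the derived-allele channel intensity divided by total intensity, and it calls genotypes by the nearest expected fraction; BeadStudio's actual clustering works on normalized intensities and is not reproduced here. The model dictionaries and function names are hypothetical. The point is only that a tetraploid derived homozygote (TTAA) falls where a diploid heterozygote would.

```python
# Hypothetical sketch of nearest-expected-fraction genotype calling.
# "fraction" = derived-allele signal / total signal (an assumption; the real
# channel-to-allele assignment depends on the assay design).

DIPLOID_MODEL = {"AA": 0.0, "AT": 0.5, "TT": 1.0}
# Self-pollinating tetraploid with the SNP in the A genome only:
# AAAA carries 0 of 4 derived copies, TTAA carries 2 of 4.
TETRAPLOID_MODEL = {"AAAA": 0.0, "TTAA": 0.5}


def derived_fraction(ancestral_signal: float, derived_signal: float) -> float:
    """Share of the total fluorescence coming from the derived-allele channel."""
    total = ancestral_signal + derived_signal
    if total <= 0:
        raise ValueError("no fluorescence signal")
    return derived_signal / total


def call_genotype(fraction: float, model: dict) -> str:
    """Return the genotype whose expected fraction is closest to the observed one."""
    return min(model, key=lambda genotype: abs(model[genotype] - fraction))


if __name__ == "__main__":
    f = derived_fraction(ancestral_signal=510.0, derived_signal=490.0)
    print(call_genotype(f, DIPLOID_MODEL))     # AT (a diploid model sees a heterozygote)
    print(call_genotype(f, TETRAPLOID_MODEL))  # TTAA (homozygous in the tetraploid)
```

A diploid-style caller would therefore mislabel every derived homozygote as heterozygous, which is why the manual cluster check described above matters.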
Genetic diversity and genetic structure
POWERMARKER Ver. 3.25 was used to evaluate genetic diversity [35]. The genetic parameters included Nei's gene diversity and polymorphism information content (PIC). Nei's gene diversity is defined as the probability that two alleles chosen at random from the population are different [36]. PIC values provide an estimate of the probability of finding a polymorphism between two random samples from the germplasm.
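For readers who want the definitions in executable form, the following Python sketch computes both statistics for one locus from allele frequencies. It uses the textbook formulas h = 1 − Σpᵢ² and the Botstein et al. (1980) PIC, and assumes one allele call per accession (reasonable for inbred, selfing accessions). PowerMarker's exact estimators, for example any small-sample correction, may differ; the function names are illustrative only.

```python
from itertools import combinations


def allele_frequencies(calls):
    """Allele frequencies at one locus from per-accession calls (None = missing).

    Assumes one allele call per accession, an assumption of this sketch.
    """
    called = [c for c in calls if c is not None]
    if not called:
        raise ValueError("no non-missing calls")
    return [called.count(a) / len(called) for a in sorted(set(called))]


def nei_gene_diversity(freqs):
    """Nei's gene diversity h = 1 - sum(p_i^2), without small-sample correction."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC (Botstein et al. 1980): h minus the sum over allele pairs of 2 p_i^2 p_j^2."""
    h = nei_gene_diversity(freqs)
    return h - sum(2 * (pi ** 2) * (pj ** 2) for pi, pj in combinations(freqs, 2))


if __name__ == "__main__":
    print(allele_frequencies(["A", "A", "T", None]))
    print(nei_gene_diversity([0.5, 0.5]), pic([0.5, 0.5]))  # 0.5 0.375
```

Per-population or per-chromosome values would then be averages of these per-locus quantities.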
To gain better insight into the genetic structure of wild emmer wheat, we applied the Bayesian model-based clustering algorithm implemented in STRUCTURE 2.2 [37]. The admixture and correlated allele frequency models were used, with the number of clusters (K) ranging from 1 to 12 and five runs for each K. For each run, both the burn-in length and the number of MCMC replications were set to 100,000. The optimal value of K was determined using the ΔK method [38] and by inspecting the relationship between the log probability of the data and K.
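The ΔK criterion can be sketched in a few lines of Python. Given the LnP(D) values of replicate STRUCTURE runs for each K, ΔK(K) is the absolute second-order difference of LnP(D) around K divided by the standard deviation of LnP(D) at K. This sketch uses the mean LnP(D) across runs for the second-order difference; implementations differ slightly in this detail, and the function name and toy values are hypothetical, not the study's data.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Evanno-style Delta K from replicate STRUCTURE runs.

    lnp_by_k maps each K to a list of LnP(D) values (one per run).
    Delta K(K) = |mean L(K+1) - 2 mean L(K) + mean L(K-1)| / sd(L(K)),
    defined only for K values whose two neighbours are present.
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 in lnp_by_k and k + 1 in lnp_by_k:
            sd = stdev(lnp_by_k[k])  # needs at least two runs per K
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta


if __name__ == "__main__":
    # Made-up LnP(D) values for illustration only.
    toy = {
        1: [-1000.0, -1001.0, -999.0],
        2: [-800.0, -802.0, -798.0],
        3: [-780.0, -781.0, -779.0],
        4: [-770.0, -772.0, -768.0],
    }
    dk = evanno_delta_k(toy)
    print(dk, max(dk, key=dk.get))  # {2: 90.0, 3: 10.0} 2
```

Because ΔK rewards a sharp change in the LnP(D) curve rather than its maximum, it can pick a small K even when LnP(D) keeps rising with K.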
The correlation between shared-allele distance and geographic distance (in kilometers) among populations was tested with the Mantel test implemented in the GENEALEX6.0 software [39].
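A generic permutation Mantel test is sketched below in Python with NumPy. It correlates the upper triangles of two distance matrices (Pearson r) and estimates a one-tailed P value by jointly permuting rows and columns of one matrix. GenAlEx's own settings (number of permutations, tail, statistic) are not reproduced and should be treated as assumptions; `mantel_test` is a hypothetical name.

```python
import numpy as np


def mantel_test(dist_a, dist_b, permutations=999, seed=0):
    """Pearson r between the upper triangles of two square distance matrices,
    with a one-tailed permutation P value (rows and columns of dist_b are
    permuted together)."""
    a = np.asarray(dist_a, dtype=float)
    b = np.asarray(dist_b, dtype=float)
    if a.shape != b.shape or a.shape[0] != a.shape[1]:
        raise ValueError("matrices must be square and of equal size")
    n = a.shape[0]
    upper = np.triu_indices(n, k=1)
    x = a[upper]
    r_obs = np.corrcoef(x, b[upper])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        order = rng.permutation(n)
        r_perm = np.corrcoef(x, b[np.ix_(order, order)][upper])[0, 1]
        if r_perm >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


if __name__ == "__main__":
    # Toy data: five sites on a line; "genetic" distance = 2 x geographic distance.
    pos = np.array([0.0, 1.0, 3.0, 6.0, 10.0])
    geo = np.abs(pos[:, None] - pos[None, :])
    r, p = mantel_test(geo, 2 * geo)
    print(round(r, 3), p)
```

Permuting whole populations (rows and columns together), rather than individual matrix cells, is what keeps the non-independence of pairwise distances from inflating significance.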

Population differentiation and detection of outliers
Population differentiation and its significance were assessed by calculating pairwise FST values for all population pairs in Arlequin 3.5 [40]. Analysis of molecular variance (AMOVA), also implemented in Arlequin 3.5, was used to estimate the variance between populations and among accessions within populations. Significance levels for variance components and FST statistics were estimated using 16,000 permutations.
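The variance partition behind a one-level AMOVA can be sketched as follows, assuming (for illustration) that each accession is a single homozygous haplotype coded 0/1 at each SNP, with no missing data, and that δ² between two accessions is the number of loci at which they differ. ΦST = σ²among / (σ²among + σ²within) is then the among-population share of variance, which is why an among-population component of 9.82% and a fixation index of about 0.098 go together. Arlequin's handling of distances, missing data and permutation tests is not reproduced; the function name is hypothetical.

```python
import numpy as np


def amova_phi_st(haplotypes, populations):
    """One-level AMOVA: populations / accessions within populations.

    haplotypes: (N accessions x L loci) array of 0/1 allele codes.
    Returns (sigma2_among, sigma2_within, phi_st).
    """
    X = np.asarray(haplotypes, dtype=float)
    pops = np.asarray(populations)
    labels = np.unique(pops)
    n_total, n_pops = X.shape[0], len(labels)
    if n_pops < 2 or n_total <= n_pops:
        raise ValueError("need at least two populations and replicate accessions")

    def ssd(block):
        # Sum of squared distances to the group centroid; equals the sum of
        # pairwise delta^2 over unordered pairs divided by the group size.
        return float(((block - block.mean(axis=0)) ** 2).sum())

    sizes = np.array([np.sum(pops == g) for g in labels])
    ssd_within = sum(ssd(X[pops == g]) for g in labels)
    ssd_among = ssd(X) - ssd_within
    ms_among = ssd_among / (n_pops - 1)
    ms_within = ssd_within / (n_total - n_pops)
    # n0: effective population size for unbalanced designs.
    n0 = float((n_total - (sizes ** 2).sum() / n_total) / (n_pops - 1))
    sigma2_within = float(ms_within)
    sigma2_among = float((ms_among - ms_within) / n0)
    return sigma2_among, sigma2_within, sigma2_among / (sigma2_among + sigma2_within)


if __name__ == "__main__":
    # Toy example: two internally uniform, fixed-different populations.
    X = [[0, 0], [0, 0], [1, 1], [1, 1]]
    print(amova_phi_st(X, ["a", "a", "b", "b"]))  # (1.0, 0.0, 1.0)
```

Because σ²among is estimated as a difference of mean squares, it can be negative when populations are no more different than random subsets, which is how slightly negative pairwise FST values arise.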
We also used Arlequin 3.5 to detect outlier loci, taking into account the hierarchical structure of the populations, which were divided into groups according to the genetic structure revealed by the STRUCTURE analysis. The analysis was performed with 20,000 simulations under a hierarchical island model with 10 groups of 100 demes. The joint null distribution of FST and heterozygosity (heterozygosity within populations divided by (1 − FST), i.e., total heterozygosity) was obtained according to Excoffier and Lischer [40]. Loci with FST values above the upper limit of the 99% confidence interval were considered candidates for positive selection and were used for further analysis.
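The classification step of an FST-outlier scan can be sketched once a null distribution is available. In this Python sketch, simulated (heterozygosity, FST) pairs, which in practice would come from coalescent simulations such as Arlequin's hierarchical island model, are binned by heterozygosity; loci above the upper 99% quantile of their bin are labeled candidates for positive selection and loci below the lower quantile candidates for balancing selection. The binning and quantile approach is a simplification assumed here, not Arlequin's exact algorithm, and all names are hypothetical.

```python
import numpy as np


def total_heterozygosity(h_within, fst):
    """Heterozygosity axis of FST-outlier plots: H_within / (1 - FST)."""
    return h_within / (1.0 - fst)


def classify_outliers(sim_het, sim_fst, obs_het, obs_fst, bin_edges, level=0.99):
    """Label loci against an empirical null of (heterozygosity, FST) pairs."""
    sim_het = np.asarray(sim_het, dtype=float)
    sim_fst = np.asarray(sim_fst, dtype=float)
    edges = np.asarray(bin_edges, dtype=float)
    tail = (1.0 - level) / 2.0
    labels = []
    for h, f in zip(obs_het, obs_fst):
        # Heterozygosity bin of this locus (clipped to the outermost bins).
        b = int(np.clip(np.digitize(h, edges) - 1, 0, len(edges) - 2))
        in_bin = (sim_het >= edges[b]) & (sim_het < edges[b + 1])
        if not in_bin.any():
            raise ValueError(f"no simulated loci in heterozygosity bin {b}")
        lo, hi = np.quantile(sim_fst[in_bin], [tail, 1.0 - tail])
        labels.append("positive" if f > hi else "balancing" if f < lo else "neutral")
    return labels


if __name__ == "__main__":
    # Toy null, not from a real simulation.
    sim_het = np.full(1001, 0.3)
    sim_fst = np.linspace(0.0, 0.2, 1001)
    print(classify_outliers(sim_het, sim_fst, [0.3, 0.3, 0.3], [0.5, 0.1, 0.0], [0.0, 0.5]))
    # ['positive', 'neutral', 'balancing']
```

Conditioning on heterozygosity matters because the spread of neutral FST values depends strongly on locus diversity.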
Statistical tests
The SPSS V.13.0 program (http://www.spss.com) was used to perform statistical analyses. The significance of differences in Nei's gene diversity and PIC among chromosomes was tested by estimating a 95% confidence interval (CI) of the genome mean, calculated by bootstrap analysis with 1,000 replications. Chromosome means outside the 95% CI were declared significantly different from the genome mean [28].
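The bootstrap comparison can be sketched in Python with NumPy. This sketch assumes that the resampled units are per-locus diversity values across the genome and uses a percentile CI; the text does not specify either detail, and the function names are hypothetical.

```python
import numpy as np


def bootstrap_mean_ci(values, reps=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean of per-locus values."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row is one bootstrap resample of loci, drawn with replacement.
    idx = rng.integers(0, len(values), size=(reps, len(values)))
    boot_means = values[idx].mean(axis=1)
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(boot_means, [tail, 1.0 - tail])
    return float(lo), float(hi)


def compare_to_genome(chromosome_mean, ci):
    """'higher' or 'lower' if the chromosome mean falls outside the genome CI."""
    lo, hi = ci
    if chromosome_mean > hi:
        return "higher"
    if chromosome_mean < lo:
        return "lower"
    return "not different"


if __name__ == "__main__":
    # Toy per-locus values, not the study's data.
    genome = [0.1] * 50 + [0.3] * 50
    ci = bootstrap_mean_ci(genome)
    print(ci, compare_to_genome(0.3, ci))
```

Note that this compares a chromosome mean against the sampling uncertainty of the genome mean, not against the spread of individual loci, so even modest chromosome deviations can be flagged when many loci are available.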
Multiple regression analysis was performed to investigate the relationship between environmental variables and SNP allele frequencies, and to detect the best predictors of gene diversity and PIC [14,15]. Nei's gene diversity, PIC, and SNP allele frequencies were each used as the dependent variable, with geographic, climatic and edaphic factors as independent variables. The following ecogeographical factors were included: geographical [longitude (Ln), latitude (Lt), and altitude (Al)]; climatic [temperature: annual (Tm), January (Tj), August (Ta), seasonal temperature difference (Td), daily temperature difference (Tdd), number of tropical days (Trd), and evaporation (Ev); moisture: annual rainfall (Rn), number of rainy days (Rd), number of dew nights in summer (Dw), annual humidity (Hu-an), humidity at 14:00 (Hu-14), inter-annual rainfall variation (Rv), and coefficient of variation in rainfall (Rr)]; and edaphic dummy variables [one for each soil type: basalt (Ba), rendzina (Ren) and terra rossa (Tr)]. The analysis was conducted on 21 of the 25 examined wild wheat populations. The Turkish populations W. Siverek, E. Siverek, and N. Diyarbakir, which had many missing data, and Mt. Hermon, a cold desert with the highest rainfall, were excluded from this analysis to minimize errors and bias caused by missing data and extreme climatic conditions.
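The "best k-variable predictor" idea can be illustrated with an exhaustive best-subset search by R² using ordinary least squares in NumPy. The text does not state which selection procedure was used in SPSS (stepwise, best subsets, or other), so this is one plausible technique assumed for illustration; significance testing and adjusted R² are omitted, and the names and toy data are hypothetical.

```python
import itertools

import numpy as np


def r_squared(X, y):
    """Coefficient of determination of an OLS fit of y on X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residuals = y - X1 @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def best_subset(predictors, y, k):
    """Search all k-variable predictor sets; return (names, R^2) of the best one."""
    y = np.asarray(y, dtype=float)
    best = None
    for combo in itertools.combinations(sorted(predictors), k):
        X = np.column_stack([np.asarray(predictors[name], dtype=float) for name in combo])
        r2 = r_squared(X, y)
        if best is None or r2 > best[1]:
            best = (combo, r2)
    return best


if __name__ == "__main__":
    # Toy data: y depends exactly on x1 (made up, not the study's measurements).
    preds = {"x1": [1, 2, 3, 4, 5], "x2": [2, 1, 4, 3, 5]}
    y = [3, 5, 7, 9, 11]
    print(best_subset(preds, y, 1))
```

With only 21 populations and about 20 candidate factors, exhaustive search over larger k quickly overfits, which is one reason to keep the reported models to two or three predictors.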

SNP marker quality and genomic distribution
Genotyping 200 wild emmer wheat accessions with the multiplexed 1,536-SNP Illumina Golden Gate assay generated 307,200 genotypic data points. Of the 1,536 SNPs in our oligonucleotide pool assay (OPA), 1,371 (89.3%) yielded high-quality genotype calls; the remaining 165 (10.7%), which failed to generate clear genotype clusters, were removed. Of the 1,371 scoreable SNP markers, 266 were monomorphic across all 200 accessions, giving an overall polymorphism rate of 80.6%. Marker distribution, Nei's gene diversity, and PIC values for each chromosome and genome are presented in Table 2.
Polymorphic SNP loci were not evenly distributed across the seven homoeologous groups: coverage (the number of marker loci per group) ranged from 123 loci in group 5 to 186 loci in group 1. Differences between homoeologous groups were significant (P < 0.05) for both gene diversity and PIC (Table 2). Nei's gene diversity varied from 0.1531 in group 5 to 0.2079 in group 6, with an average of 0.1841, and PIC ranged from 0.1292 in group 5 to 0.1731 in group 6, with an average of 0.1530.
Of the polymorphic loci, 613 and 492 were located in the A and B genomes of wild emmer wheat, respectively. As shown in Table 2, genome B had higher genetic diversity, with Nei's gene diversity and PIC values of 0.1975 and 0.1649, respectively, compared with 0.1733 and 0.1443 for genome A. However, this difference between genomes A and B was not statistically significant for either gene diversity (t = 1.762, P = 0.129, paired t test) or PIC (t = 2.126, P = 0.078, paired t test). In genome A, chromosomes 3A and 6A had higher, and chromosomes 1A and 5A lower, genetic diversity than the genome-wide average in the analyzed germplasm (Table 2). In genome B, genetic diversity was lower than the genome-wide average on chromosomes 4B and 5B and higher on chromosome 6B (Table 2).
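A paired t test between genomes can be sketched in Python. The text does not say what the pairs were; this sketch assumes, for illustration, that the paired units are homoeologous chromosomes (1A with 1B through 7A with 7B), which would give 6 degrees of freedom. `scipy.stats.ttest_rel` returns the same statistic together with a two-sided P value; here only the t statistic and degrees of freedom are computed, with the standard library. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched samples a and b
    (for example, per-chromosome diversity in genome A vs. its homoeolog in B)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = mean(diffs) / (sd / sqrt(len(diffs)))
    return t, len(diffs) - 1


if __name__ == "__main__":
    # Toy values only; the per-chromosome estimates are in Table 2, not reproduced here.
    print(paired_t([1, 2, 3], [0, 0, 0]))
```

Pairing removes the shared homoeologous-group effect, which matters here because diversity differed significantly among groups.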

Genetic diversity
The proportion of polymorphic loci, gene diversity, and PIC of the 25 wild emmer wheat populations are summarized in Table 3. Genetic diversity estimates varied markedly among the 25 populations, with Nei's gene diversity ranging from 0.1101 (Qazrin) to 0.2583 (Daliyya) and PIC from 0.0899 (Qazrin) to 0.2221 (Daliyya). The percentage of polymorphic loci within each population showed a similar pattern: Daliyya had the highest percentage (81.45%), followed by N. Diyarbakir (55.75%) and Yehudiyya (51.49%), whereas Rosh-Pinna and Qazrin had the lowest (31.49% to 32.49%).

Genetic relationships
Genetic distances (D) based on the shared-allele distance were calculated for all population pairs (Additional file 1: Table S1). The largest genetic distance (0.1953) was between Hermon and Yehudiyya, whereas the most closely related populations were Qazrin and Yehudiyya, with a genetic distance of 0.0401. Low D values (D < 0.050) were also observed between some populations from different areas, and, for the most part, the estimates of D were independent of geography, as revealed by the Mantel test (r = 0.014, P = 0.543; Figure 2A). These results suggest that geographic distance alone may not explain interpopulation genetic divergence.
Genetic structure
SNP genotyping data were used for genetic structure analysis with the Bayesian clustering model implemented in the STRUCTURE software. The estimated log probability of the data (LnP(D)) increased continuously with K, and no K value clearly defined the number of populations (Figure 3A). We therefore applied ΔK, the rate of change in the log probability of the data between successive K values relative to its standard deviation, which suggested that the optimal value of K was 2 (Figure 3B). At K = 2, the largest number of accessions (188/200 = 94%) was assigned to a specific cluster with a probability higher than 80%, and only 6% were classified as admixed. The percentage of unassigned (admixed) genotypes increased with K, reaching 8.5%, 14%, and 42% at K = 3, 4 and 5, respectively. Clustering diagrams for K = 2 to 4 are therefore presented in Figure 3C. At K = 2, the analyzed wild emmer wheat populations can be divided into two genetically distinct groups, Group I and Group II (Figure 3C; … Mt. Gilboa, Kokhav-Hashahar, Taiyiba, Bet-Meir, Sanbedriyya, and Jaba; and the north marginal population Hermon) and the Turkish populations (W. Siverek, E. Siverek, N. Diyarbakir) (Figure 3C). At K = 3, Group I was unchanged, but Group II was subdivided into two subgroups, Group A and Group B (Figure 3C); that is, accessions from Hermon and N. Diyarbakir separated from the rest of Group II. At K = 4, only Group B was further subdivided, into Group B1 and Group B2, with accessions from the south marginal populations Taiyiba, Bet-Meir, Sanbedriyya and Jaba clustering together (Figure 3C).
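The assignment rule used above (an accession counts as assigned when its membership in one cluster exceeds 80%, otherwise admixed) can be sketched in Python. The sketch assumes the STRUCTURE membership coefficients (Q matrix, one row per accession) have already been parsed into an array; parsing STRUCTURE output is not shown, and the function name and toy values are hypothetical.

```python
import numpy as np


def assignment_summary(q_matrix, threshold=0.8):
    """Count accessions assigned to a single cluster (max membership > threshold)
    and return (n_assigned, fraction_admixed)."""
    q = np.asarray(q_matrix, dtype=float)
    assigned = q.max(axis=1) > threshold
    return int(assigned.sum()), float(1.0 - assigned.mean())


if __name__ == "__main__":
    # Toy membership coefficients for four accessions at K = 2 (made up).
    q = [[0.90, 0.10], [0.50, 0.50], [0.85, 0.15], [0.20, 0.80]]
    print(assignment_summary(q))  # (2, 0.5)
```

As K grows, membership is spread over more clusters, so fewer rows clear a fixed threshold; this is consistent with the rising admixed fraction reported for K = 3 to 5.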

Genetic differentiation of populations
Population differentiation was assessed with an analysis of molecular variance (AMOVA). Most of the genetic variation was found within rather than among populations: about 90% of the variation resided among accessions within populations, while a small (9.82%) but significant (P < 10⁻⁵) portion resided between populations (Table 4). The fixation index (FST = 0.098) was also highly significant (P < 10⁻⁵) by permutation test.
These results indicate that genuine differentiation between populations has occurred. Pairwise coefficients of population differentiation (FST) were also calculated for all pairs of the 25 populations (Additional file 2: Figure S1). FST values for the 300 pairs ranged from −0.0356 to 0.3502, with 126 pairs showing significant genetic differentiation (P < 0.05), 44 of which showed strong differentiation (FST > 0.2). However, genetic differentiation between populations was independent of the geographical distance between collection sites, as revealed by the Mantel test (r = 0.051, P = 0.468; Figure 2B). Thus, there is no evidence for an isolation-by-distance pattern of population differentiation in wild emmer wheat.
Adaptive differentiation has conventionally been identified from differences in allele frequencies among populations, as reflected by FST, an appropriate parameter for measuring population differentiation and hence for identifying outlier loci. In this study, outlier loci were identified using an FST-based method that accounts for hierarchical structure in order to minimize the number of false-positive loci. We focused on the results for K = 2, the optimum indicated by the model-based STRUCTURE analysis. A total of 102 outlier loci were identified at K = 2; of these, 69 were candidates for balancing selection and 33 were candidates for positive selection (Figure 4). The chromosomal distribution of these loci is shown on wheat chromosome bin maps (Figure 5). A large proportion of the 33 candidates for positive selection (54.5%) were located on chromosomes 1B, 2A, 3B, 4A, and 7A (Table 5; Figure 5).
The SNP markers used in the present study were derived from genomic sequences amplified with conserved primers, which were located in exons and designed on sequences conserved between wheat ESTs and rice genomic sequences [28,41]. A putative function for each of these 33 loci may therefore be deduced by comparing the underlying genes with a protein sequence database. Among the 33 loci under positive selection were those in the genes encoding P-EA (phosphoethanolamine methyltransferase), GBP-1 (GTP-binding protein), and SPDS (spermidine synthase) (Table 5; Figure 5).

Association between markers and ecogeographical factors
As shown in Additional file 3: Table S2, water-availability factors alone explained a significant proportion of the diversity revealed by SNP markers. The best two-variable predictors of gene diversity and PIC were Rv and Ev (inter-annual rainfall variation and evaporation), which significantly explained 0.29 to 0.30 of their variance (P < 0.01). A three-variable combination of Rv, Ev and Hu-14 (inter-annual rainfall variation, evaporation, and humidity at 14:00) significantly (P < 0.01) accounted for 0.48 to 0.49 of the variance in gene diversity and PIC. Of the 1,105 polymorphic SNP markers, 755, including the 33 outlier loci under positive selection, had allele frequencies significantly correlated with ecogeographical factors, singly or in combination (Additional file 3: Table S2). Environmental factors, including geographical, temperature, and water-availability factors, singly or in combination, explained a significant proportion (0.2 to 0.9) of the variation in SNP allele frequency. Based on the ecogeographical predictors selected for their allele frequencies, the 755 SNP markers can be classified into several categories (Additional file 3: Table S2).

Genetic diversity revealed by EST-related SNP markers
The average Nei's gene diversity and PIC of the 25 wild emmer wheat populations in this study were 0.1841 and 0.1530, respectively. Compared with values obtained previously with EST-SSRs [42], SSRs [15], RAPDs [14], and allozymes [13], this level of genetic diversity is moderate. As shown in Figure 6, EST-related SNP markers were more polymorphic than allozyme loci but less polymorphic than RAPD and SSR loci in wild emmer wheat populations. Furthermore, a medium proportion of SNPs (31.49% to 81.45%) was polymorphic within populations, indicating a moderate level of within-population diversity (Table 3). This result is expected because EST-related SNP markers sample more conserved coding sequences, whereas microsatellites and RAPDs sample non-coding sequences. A second explanation lies in the properties of SNPs and the definition of gene diversity: SNP markers are mainly biallelic, so gene diversity cannot exceed 0.50 (and PIC cannot exceed 0.375), whereas both can approach 1 for multi-allelic markers such as SSRs. Despite these limits, our results show that EST-related SNP markers capture a sufficient level of variation for genetic structure analysis and future association mapping. This study therefore provides evidence that EST-related SNP markers offer an opportunity to examine the functional diversity of germplasm collections, as reported by Chao et al. [29].
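The ceiling on gene diversity for biallelic markers, and the lower ceiling on PIC, follow directly from the standard formulas (assuming the usual Botstein et al. definition of PIC). With allele frequencies p and q = 1 − p, and writing x = pq, which lies in [0, 1/4]:

```latex
\begin{aligned}
H &= 1 - p^2 - q^2 = 2pq \le \tfrac{1}{2}
   \quad \text{(equality at } p = q = \tfrac{1}{2}\text{)},\\
\mathrm{PIC} &= 1 - p^2 - q^2 - 2p^2q^2 = 2x - 2x^2,
   \qquad \tfrac{d}{dx}\,(2x - 2x^2) = 2 - 4x > 0 \ \text{on } [0, \tfrac{1}{4}],\\
\mathrm{PIC}_{\max} &= 2\cdot\tfrac{1}{4} - 2\cdot\tfrac{1}{16} = \tfrac{3}{8} = 0.375,\\
H_{k\ \text{equifrequent alleles}} &= 1 - k\cdot\tfrac{1}{k^2} = 1 - \tfrac{1}{k} \;\to\; 1 \quad (k \to \infty).
\end{aligned}
```

So the population averages reported here (0.1841 and 0.1530) should be read against maxima of 0.5 and 0.375, not against the near-1 maxima attainable by highly multi-allelic SSRs.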

Genetic structure of wild emmer wheat populations
This study presents the first genome-wide analysis of the population structure of SNP variation among natural populations of wild emmer wheat. Clustering based on a Bayesian model showed that the grouping pattern is related to the ecogeographic distribution of wild emmer wheat populations. All central populations, collected from warm and humid environments on the Golan Plateau (Qazrin, Yehudiyya and Gamala) and near the Sea of Galilee (Tabigha, Ammiad and Rosh-Pinna), were separated from the marginal populations at K = 2, 3 and 4 (Figure 3C). The marginal populations, collected across wide geographic areas on the northern, eastern, and southern borders of the wild emmer distribution and involving hot, cold and xeric peripheries, clustered together at K = 2, whereas at K = 3 Mt. Hermon in Israel and N. Diyarbakir in Turkey separated clearly from the other marginal populations. This clustering may be explained by similar ecological conditions: both sites are in mountains at relatively high altitude (1300 m and 720 m for Mt. Hermon and N. Diyarbakir, respectively) and have similarly low winter temperatures (a mean January temperature of 3°C at both) (Table 1). Furthermore, Mt. Hermon is closer to N. Diyarbakir than the other Israeli populations are (Figure 1). At K = 4, the southern xeric populations Taiyiba, Bet-Meir, Sanbedriyya and Jaba clustered together and separated clearly from the western mesic (Mediterranean) populations. These results suggest that ecological variables play an important role in shaping the genetic structure of wild emmer wheat.
Figure 5 legend (partial): Candidate loci from known wheat genes are indicated by *, and the known genes subjected to positive selection are listed after each locus. The number in parentheses at the bottom of each chromosome is the number of EST loci mapped to that chromosome without a known bin. Only bins with mapped loci are shown.
Indeed, SNP-based genetic distances were independent of geographical distances, as revealed by the Mantel test (r = 0.014, P = 0.543; Figure 2A). For example, the two most geographically distant populations, J'aba and N. Diyarbakir (850 km apart), showed a low genetic distance (0.067), while two adjacent populations, Gamla and Yehudiyya (7 km apart), showed a relatively high genetic distance (0.137). This suggests that geographic distance alone may not explain inter-population genetic divergence, which is inconsistent with an isolation-by-distance model. Hence, genetic distances among some populations may be more closely associated with ecological variables than with geographical distribution.

Genetic differentiation of populations
Natural habitats of wild emmer wheat differ from one another in many variables, such as macro- and micro-climate, topography, and soil type. Such local ecogeographic differentiation may drive plant populations to evolve local ecological adaptations that provide an advantage under the prevailing conditions [2,16]. Adaptive differentiation has conventionally been identified from differences in allele frequencies among populations, summarized by an estimate of FST [43,44]. This FST approach has been applied to many crops, such as common bean [45] and tomato [46], and markers identified by FST-outlier methods in these species tended to map to genome regions containing known genes and quantitative trait loci related to domestication.
In the present study, we identified 33 candidate loci under positive selection whose FST values exceeded the upper limit of the 99% confidence interval (Figure 4). These loci may be directly under selection, but more likely mark genomic regions that have been selected during evolution, because some candidate loci clustered in the same chromosomal regions, such as outliers 17 and 18, and outliers 30 and 31 (Table 5; Figure 5). The identified loci are disproportionately concentrated, with 54.5% mapping to chromosomes 1B, 2A, 3B, 4A and 7A (Table 5; Figure 5). This observation suggests that there are 'hot spots' for directional selection in the genome of wild emmer wheat. An analysis of wheat chromosome maps with Map Viewer (http://www.ncbi.nlm.nih.gov/projects/mapview/) indicated that many fungal disease-resistance genes are located on chromosomes 1B, 2A, 3B, 4A and 7A, such as Lr17, Lr20, Lr27, Lr28, Lr30, Lr38, Sr2, Sr7, Sr15, Sr21, Sr22, Sr38, Yr17, Pm1, Pm4, and Hd. In addition, three genes, P-EA [47], GBP-1 [48] and SPDS [49,50], which play important roles in wheat responses to biotic and abiotic stresses or in plant growth and development [47-50], appear to be subject to positive selection. These results suggest that the markers and genome locations we identified as outliers under positive selection are consistent with known patterns of selection that differentiate central from marginal populations. Many accessions from central populations near the Sea of Galilee and the Golan Heights were resistant to stripe rust and powdery mildew, whereas marginal populations were collected across wide geographic areas on the northern, eastern and southern borders of the wild emmer distribution, involving hot, cold and xeric stress [1].
Such an objective approach may provide a scalable means of comprehensively assessing genetic variation within wild emmer wheat as emerging sequence data and improved genotyping platforms yield larger datasets [46].

Ecogeographical factors vs. population divergence and genetic structure
The organization and evolution of genetic diversity in nature at global, regional, and local scales are nonrandom and heavily structured, and are positively correlated with, and partly predictable by, abiotic and biotic environmental heterogeneity and stress [51], as shown earlier with allozyme and DNA markers. However, Prunier et al. recently found that the origin and evolution of adaptive polymorphisms in black spruce can be modified by historical events, which affect the outcome of recent selection and lead to different adaptive routes among intraspecific lineages [43]. In this study, we found that ecogeographical factors play an important role in shaping genetic structure and enhancing population divergence in wild emmer wheat from Israel and Turkey. Significant correlations between marker loci and ecogeographical factors were observed in the analyzed germplasm. Latitude, temperature, and water-availability factors, singly or in combination, explained a significant proportion of the variation in SNP allele frequency (Additional file 3: Table S2). These findings suggest that natural selection could create regional divergence in wild emmer wheat. In particular, water-availability factors alone explained a significant proportion of the genetic diversity revealed by SNP markers (Additional file 3: Table S2). The association of these factors with SNP-based genetic diversity was similar to that between allozyme variation and ecogeographical factors [13] and to that of latitude/altitude with RAPD and microsatellite diversity [14,15]. These results suggest that the operation of natural selection and the adaptive nature of genetic variation could be explained by variation in ecological factors. The sharp regional climatic gradient from north to south in Israel, with temperatures increasing and water availability decreasing towards the semiarid zones of southern Israel, plays a major role, as do microecological climatic and edaphic stresses [16,52].
This may also explain why latitude was associated with allele-frequency variation at most SNPs (Additional file 3: Table S2). Therefore, natural selection appears to play a major role in generating adaptive structures coupled with environmental stresses in wild emmer wheat, as in other organisms [53].