Looking for adaptive footprints in the HSP90AA1 ovine gene

Background Climatic factors play an important role in determining species distributions and phenotypic variation of populations over geographic space. Since domestic sheep is managed under low intensive systems animals could have retained some genome adaptive footprints. The gene encoding the Hsp90α has been extensively studied in sheep and some polymorphisms located at its promoter have been associates with differences in the transcription rate of the gene depending on climatic conditions. In this work the relationships among the distribution and frequencies of 11 polymorphisms of the ovine HSP90AA1 gene promoter in 31 sheep breeds and the climatic and geographic variables prevailing in their regions of origin have been studied. Also the promoter sequence has been characterized in 9 species of the Caprinae subfamily. Results Correlations among several climatic variables and allele frequencies of the polymorphisms of the HSP90AA1 gene promoter linked with differences in the transcription activity of the gene under heat stress conditions have been assessed. A group of breeds reared in semi dry climates have high frequencies of the insertion allele of the g.667-668insC associated with the heat stress response. Other group of breeds native to semi arid conditions showed very low frequencies of this same allele. However, in some cases, this previous correlation has not been achieved, revealing the high levels of gene flow among populations occurred following domestication. The Bayesian Test of Beaumont and Balding identified two outlier loci, the g.522A > G and g.703_704del(2)A candidates to balancing and directional selection, respectively. Polymorphisms detected in O. aries are also present in several species of the Caprinae subfamily being C. hircus, O. musimon and O. moschatus those sharing the highest number of them with O. aries. Conclusions Despite domestication, sheep breeds showed some genetic footprints related to climatic variables. Adaptation of breeds to heat climates can suppose a selective advantage to cope with global warming caused by climatic change. Polymorphisms of the HSP90AA1 gene detected in the Ovis aries species are also present in wild species from the Caprinae subfamily, indicating a great antiquity of these mutations and its importance in the adaptation of species to past climatic conditions existing in its native environments. Electronic supplementary material The online version of this article (doi:10.1186/s12862-015-0280-x) contains supplementary material, which is available to authorized users.


Background
The Subfamily Caprinae includes a widespread and diverse group of ungulates (hoofed mammals) that are most extending from the arctic to the equator. Wild Caprinae were the ancestors of two of the most important species of domestic livestock -domestic sheep (Ovis aries) and goats (Capra hircus). Present day populations of wild Caprinae represent a potential source of knowledge of adaptation genetics which can be used to improve or adapt current domestic breeds to less productive conditions [1].
Sheep was one of the first species to be domesticated, approximately 11,000 years before the present in the Fertile Crescent [2], due to its small size, docile behavior and high adaptability to very different environments. This domestication process must have involved a genetically broad sampling of wild stock and also the persistence of cross-breeding with wild populations [3]. Domestication pressure over animal's life had as consequence that natural selection loosed impact over their biological fitness giving up the turn to artificial selection imposed by humans over productive traits (wool, meat, milk). However, sheep is one of the livestock species managed under low intensive systems and therefore could have retained from its wild ancestors some genome footprints in genes related to environmental adaptation.
Climatic factors like temperature and humidity play an important role in determining species distributions and they likely influence phenotypic variation of populations over geographic space [4]. Correlations between phenotype and environment may be revealed by genetic polymorphisms which allele frequencies strongly differentiate populations that live in different environments [5] and such differences can be maintained in the face of gene flow [6].
Several studies have examined the distributions of genetic variants in candidate genes for traits that vary with climate. For example, in humans, candidate gene studies have yielded evidence that variants involved in sodium homeostasis and energy metabolism [4] and those related with type 2 diabetes and obesity [7] are strongly correlated with climate variables. Also a decrease in the frequency of variants implicated in salt sensitive hypertension had been correlated with increasing distance from the equator [8]. In Drosophila melanogaster, variants involved in circadian rhythms, aging and energy metabolism were correlated with climate [9], in Arabidopsis thaliana, variants associated with flowering time were correlated with latitude [10], and in pines several genes contain variation have been correlated with temperature [11].
The heat shock response is among the most important and ubiquitous fact in nature. Heat, both quantitatively and qualitatively is one of the best inducers of Heat Stress Proteins (Hsp). They act as molecular chaperones, helping to maintain the metabolic and structural integrity of the cell, as a protective response to external stresses. The chaperone Hsp90 is one of the most abundant, highly conserved and usually heat-induced proteins found in all eukaryotes studied so far. HSP90 gene presents two isoforms, HSP90-α (inducible form) and HSP90-β (constitutive form). There are only few publications on the role of Hsp90 function in species adaptation and survival under extreme conditions [12][13][14]. The gene encoding the Hsp90α heat-shock protein (HSP90AA1) has been extensively studied in sheep [15][16][17][18]. Differences in the HSP90AA1 transcription rate [18][19][20] depending on genotype combination of some polymorphisms located at its promoter and the environmental conditions existing when sample collections have been shown. Also an effect of these polymorphisms over ram's sperm DNA fragmentation depending on environmental temperatures has been assessed [20,21].
This work has the aim to study the relationships between the frequencies of 11 polymorphisms located in the HSP90AA1 gene promoter in 31 sheep breeds from different locations of the European, Asian and Africa continents and the climatic and geographic variables prevailing in the regions where these breeds are reared; and to characterize the HSP90AA1 promoter sequence in 9 species of the Caprinae and in 2 species of the Bovinae subfamilies to determine polymorphisms history and contribute to elucidate the phylogeny of one of the most controversial subfamily of the sub order Ruminantia.

Polymorphism variability and test for linkage disequilibrium in sheep breeds
Genotype and allele frequencies of the 11 polymorphisms studied in each of the 31 sheep breeds are showed in Tables 1 and 2. Levels of polymorphism were generally high in all breeds. There were no private alleles in any of the breeds studied. The less polymorphic marker was the SNP g.522 > G for which the G allele was fixed in 18 breeds. For the INDELs g.666_667insC and g.516_517insG, the D allele was fixed in nine and six breeds, respectively.
The Hardy Weinberg equilibrium test for all breeds joined (Additional file 1, AF1) shows all polymorphisms in HW equilibrium except for the INDELs g.666_667insC and g.703_704del (2)A. The average expected (Ehet) and observed (Ohet) heterozygosis were 0.273 and 0.258, respectively, for all breeds joined.
Linkage disequilibrium (LD) was estimated to obtain polymorphism linked blocks across and within breeds. Additional file 2 (AF2) shows the LD matrix for all populations and for each breed separately and also a figure of LD blocks and haplotypes. In most breeds, similar LD than those previously observed in Manchega Spanish breed (MNCH) [19] were obtained. Thus, three LD blocks of polymorphisms can be established: g.666_667insC_g. 444A > G; g.703_704del(2)A_g.660G > C_g.528A > G and g.601A > C_g.524G > T_g.468G > T.

Phylogenetic relationships between sheep breeds
Additional file 3 (AF3) shows population pairwise F STs , p values and significances and the Reynolds's distance matrix among the 31 sheep breeds studied. Average, median, maximum and minimum distances across populations were 0.0952, 0.0628, 0.6159 and 0.0000, respectively. Among AW, SZ, AS, Cl, LX, KAR, DGL, KARM and EDIL breeds distances higher than 0.25 were observed. Breeds with distance values lower than 0.01 (even 0.00)  Breed  ID breed  DD  AD  AA  II  ID  DD  II  ID  DD  GG  CG  CC  AA  AC  CC  GG  AG Breed  ID breed  GG  GT  TT  AA  AG  GG  II  ID  DD  GG  GT  TT  AA  AG   among them were found for ARME, Ch, KRY, MNCH, AKA, CAUC, BNI, BOUJ and BOZ. Figure 1 shows NeighborNet graph based on Reynold's distance constructed with the ClusterNetwork splits transformation method for the 31 sheep breeds studied. The LSFit (which is the least squares fit between the pairwise distances in the graph and the pairwise distances in the matrix) of the NeighborNet was 92.26.
The group constituted by EDIL, KARM, KAR, DGL, KRC, KRB, BAJ and BOZ breeds is outside the reticulations of the NeigborNet graph, indicating a certain degree of separation of this set from the remaining breeds. All these breeds have in common that belong to regions of West Asia and East Europe with high thermal width (arid and semiarid climates). Average, minimum and maximum distances among these breeds were 0.040, 0.000 and 0.167, respectively. The remaining breeds are included in a complex system of reticulations which indicates the existence of a genetic admixture among them [3]. AS and AW breeds are joined in the same branch, as should be expected due to high genetic linkage (Assaf is a synthetic breed from a cross between Awasi and Milkchaff milk breeds). KVR, SZ, ME and Cl breeds come from the same node. All these breeds belong to Mediterranean regions with low thermal width and semi-damp climates. Average, minimum and maximum distances among these breeds were 0.040, 0.000 and 0.080, respectively. Tests to detect association of loci frequencies with environmental parameters PLSR PLSR analysis was conducted including the MAF of six polymorphisms as response variables and 14 environmental variables as predictors (CTY was not included in analyses due to its discrete nature). Polymorphisms considered were g.667_668insC, g.522A > G, g.516_517insG and one polymorphism of each LD block common to most breeds: g.666_667insC, g.660G > C and g.601A > C. For all polymorphisms analyzed, the allele at lower frequency (MAF) was the same in all breeds (I -668 , I -667 , A -601 , A -522 and I -516 ). However, the G -660 allele of the g.660G > C SNP was the MAF in 25 from the 31 breeds studied.
Basic statistics, and Pearson and Spearman correlations among MAF and environmental variables are shown in Additional file 4 (AF4) and Additional file 5 (AF5), respectively. High negative Pearson (-0.68) and Spearman (-0.70) correlation coefficients were found between MAF of g.667_668insC and g.660G > C (p < 0.0001). A positive Figure 1 NeighborNet graph based on Reynold's distance constructed with the ClusterNetwork splits transformation method for the 31 sheep breeds studied. and moderate (0.43) Spearman correlation was found for g.667_668insC and g.666_667insC (p < 0.05). Regarding correlations among environmental predictors high (≥0.70) negative correlations (Pearson and Spearman) were found between LAT-MINaT, LAT-ANT, LON-MINaT, MINaT-TW and ANT-TW; and positive between LON-TW, MINaT-ANT and TAR-MxR. Only significant correlations among MAF and environmental predictors were found for g.667_668insC, g.666_667insC and g.660G > C. Similar magnitude but with opposite sign had the correlations found between g.667_668insC and g.660G > C with MINaT, ANT, TW, TAR and MxR. Table 3 shows Variable Importance in Projection (VIP) and percentage of variance explained by the top two (VT2) PLSR components for each environmental variable. Those variables showing VIP values greater than 0.83 and which VT2 was at least 40%, were retained for posterior analyses. With these criteria, MAXaT, HrMx, HrMi and THI variables were discarded.
A second PLSR analysis including six polymorphisms and ten environmental variables were developed. Predictive Residual Sum of Squares (PRESS) of the complete (14 predictor variables) and reduced (10 predictor variables) models were 0.9726 and 0.9589, respectively, which indicates that the elimination of 4 useless environmental variables improve the prediction model. Reducing the number  LAT = latitude; LON = longitude; MAXaT = maximum average temperature; MThm = maximum temperature of the hottest month; MINaT = minimum average temperature; ANT = average annual temperature; TW (MAXaT-MINaT) = thermal width; TAR = total annual rainfall; MxR = maximum rainfall; MiR = minimum rainfall; HrA = relative average annual humidity (%); HrMx = maximum relative humidity (%); HrMi = minimum relative humidity (%); THI = Temperature Humidity Index [22] Variables discarded for posterior analysis are indicated in bold.
of predictors, R 2 (value of explained variation) was also improved from 0.82 (14 predictors) to 0.85 (10 predictors). Three components were retained using the optimal model determination by the leave-one-out cross validation procedure and the minimum PRESS criteria (van der Voet's test). The 72.47% of the predictor variation is already explained by just two, but only 24.50% of the response variation is achieved. Figure 3 shows VIP and VT2 values for each of the ten environmental predictors included in the model. Taking into account for both statistics, MINaT, ANT, TW, TAR and MiR were the predictors with the best combination of VIP and VT2. However, environmental variables, as LON and MxR despite having high VIP values showed percentages of the variance explained below 50%.
Predicted variation (Q 2 ) values obtained as in equation (1)  Regression coefficients for responses with Q 2 values higher than 0.4, are shown in Figure 4 (Additional file 6 (AF6) showed regression coefficients of scaled and centered variables for all predictors and responses). Absolute values of regression coefficients ranged from 0.02 to 0.28. Interestingly, regression coefficients of I -668 and G -660 have opposite sign for all environmental predictors, except for MiR, indicating that the MAF at these polymorphisms depends on opposite environmental and geographical circumstances. Thus, the frequency of the I -668 and G -660 alleles increases and decreases respectively, for higher values of MINaT, ANT, TAR and HrA. Otherwise, high values of LAT, LON and TW are linked to low and high frequencies of the I -668 and G -660 alleles, respectively.

SAM
MATSAM was run for six polymorphisms, g.667_668insC, g.522A > G, g.516_517insG, g.666_667insC, g.660G > C and g.601A > C, and 14 geographic and climatic variables (LAT, LON, MINaT, MAXaT, MThm, ANT, TW, TAR, MxR, MiR, HrA, HrMx, HrMi and THI). Seven alleles at 5 loci were detected as significantly associated with at least one environmental variable with a confidence level of 99.99% (significant threshold ST = 5.952-E05) based on cumulated results from W and G tests (Table 4). These alleles are involved in 168 significant models according to the W and G test.
No association with environmental variables was found for the SNP g.522A > G alleles. MAF alleles I -668 , I -667 and G -660 were associated with the highest number of environmental variables, 5, 8 and 8 respectively. The environmental variables related with more number of loci were MxR, ANT, LON, MINaT, TAR and TW (4, 3, 3, 3, 3 and 3, respectively). Figure 5 shows correlograms of significant associations between markers and environmental variables, which differences in probability of presence of the allele between the extremes of the distribution were higher than 37%. MINaT is the environmental variable for which greatest changes was shown in the probability to find the G -660 allele. A decrease in this probability from 0.9 to near 0.3 (60%) was found for G -660 when MINaT increases from -22°C to 17°C. An opposite trend was observed for I -668 . In this case, the likeliness to find de I allele increases from 0.15 to near 0.60 (42.8%) for the same rank of MINaT change. For ANT and MxR the same pattern above described was found. For TW an opposite pattern was observed. So the probability of the G -660 allele increases from 0.3 to 0.9 (58%) for 28 units of increment in TW (11 to 38.8), and the probability of the I -668 allele decreases from 0.6 to 0.1 (44.1%) for the same rank of TW variation.
Test to detect loci under selection Bayesian Test of Beaumont and Balding Table 5 shows expected heterozygosities (H e ) and F ST values obtained after 100,000 simulation runs of the LOSITAN and FDIST software using 31 sheep breeds and 6 unlinked polymorphisms at the HSP90AA1 promoter. Estimated neutral F ST was 0.072512. Two outlier loci were identified: g.522A > G with a significant (p = 0.02) low F ST value (0.02) candidate for balancing selection processes, and the g.703_704del(2)A with a significant (p = 0.99) high F ST value (0.14) candidate for directional selection. With the frequentist approach of FDIST, only sign of balancing selection was found for the g.522A > G SNP. Figure 5 Correlograms showing polymorphisms alleles significantly associated with environmental variables, which differences in probability of presence of the allele between the extremes of the distribution were higher than 37%. The Tamura 3-parameter model (T92) with evolutionary rates among sites modeled by using a discrete Gamma distribution (+G) with 5 rate categories had the highest fit (lowest BIC value) among the 24 different nucleotide substitution models tested by maximum likelihood. This model was fitted to estimate Evolutionary Divergence between species sequences, to conduct the Tajima's Neutrality Test and to construct the ML tree. Table 6 shows estimates of evolutionary divergence over sequence pairs between species. Within the Caprinae subfamily, the R. pyrenaica showed the highest percentage of sequence divergence with the remaining species (4.4% with species from the Ovis and Ovibos genus, 7% with species from the Capra genus and 7.6% with A. lervia. Interestingly O. moschatus was much closer to species of the Ovis genus (0.8 to 1%) than to those of Capra, Ammotragus and Rupicapra. As expected, very low evolutionary divergences among species of the Ovis genus and among the species of the Capra genus were observed. Table 7 shows polymorphisms detected and its frequencies in the species studied. Within the Caprinae subfamily, the species with more number of polymorphisms, SNPs or INDELs, were C. hircus and O. aries, with 13 and 11 polymorphic sites, respectively, from which only 6 were shared between them. Also O. moschatus and O. musimon showed high number of polymorphism, 8 and 7, respectively. The species which shared more number of polymorphisms with O. aries (reference species in our work) were O. musimon (7), O moschatus (7) and C hircus (5). In general, within this subfamily, polymorphisms shared among the different species had the same pattern of allele frequency, except for g.528A > G in O. moschatus where the A allele showed the highest frequency (0.93) and for g.703_704del(2)A in R. pyrenaica where the double A deletion allele was the most frequent (0.75). Exclusive polymorphisms were found in C. hircus (8), A. lervia (3), O aries (2) and O. moschatus (1). The two out group species from the Bovis genus (B. mutus and B. taurus) showed very few polymorphisms and did not share any mutations with the reaming species.
The three INDELs g.703_704del(2)A, g.667_668insC and g.666_667insC existed simultaneously only in O. moschatus, O. musimon and O. aries. C hircus had the two contiguous g.667_668insC and g.666_667insC, and O. ammon and R pyrenaica the g.703_704del(2)A. The highest frequency of the I -668 allele, related with heat stress tolerance, was found in O. musimon (0.47) and O. moschatus (0.43) followed by O. aries (0.28) and C. hircus (0.18). The SNP g.660G > C seems to be exclusive of the Caprinae subfamily. Unfortunately, this region is a sequence of several consecutive cytosines and therefore is difficult to know if there is not mutation at -660 position or if the C -660 allele is fixed in Bos. Anyway, C appears to be the wild allele of the g.660G > C SNP.
Tajima's Neutrality Test [23] conducted for the sequences of the HSP90AA1 promoter was -2.56 (p_value < 0.05 [24]) which reveals an excess of low frequency polymorphisms relative to expectation (D L = -1.78). This fact could indicate a purifying selection removing alleles that diminish animal's biological fitness but also the presence of "young" beneficial mutations going to higher frequencies. Figure 6 shows Maximum Likelihood bootstrap original and condensed trees based on the Tamura 3parameter model and inferred from 5000 replicates. The tree was constructed considering only haplotypes with frequencies higher than 0.05. Branches corresponding to partitions reproduced in less than 50% bootstrap replicates are collapsed. The analysis involved 33 nucleotide sequences. There were a total of 410 positions in the final dataset. Out-group species (B. taurus and B.mutus) were located in a separate branch with a high bootstrap percentage (99). One branch were constituted by species of the Capra and Ammotragus genus (97) and other branch by those of the Ovis, Rupicapra and Ovibos ones (86). In the ML consensus tree, O. moschatus was located as a sister species of R. pyrenaica. Species from the Ovis genus (O. aries; O. musimon, O. vignei, O. ammon and O. canadiensis) appear mixed since many haplotypes are shared among them and promoter sequences showed a high degree of similarity.

Discussion
Previous studies from our group pointed out the existence of different expression profiles in sheep carrying alternative genotypes of some polymorphisms located at the HSP90AA1 gene promoter depending on environmental temperatures [18][19][20]. The join genotype of the g.667_668insC and the g.660G > C polymorphisms had Table 6 Estimates of evolutionary divergence between species (below diagonal) and its standard errors (above diagonal) Analyses were conducted using the Tamura 3-parameter model (T92 + G). Table 7 Polymorphisms and their frequencies in the wild species analyzed Species g.703_704 del(2)A g.667_668 insC g.666_667 insC g.660G>C g.657A>G g.653C>A g.601A>C g.571G>C g.551A>G g.529G>C g.528A/T>G g.525A>G  Table 7 Polymorphisms and their frequencies in the wild species analyzed (Continued) Species g.524G>T g.522A>G g.516_517 insG g.498G>C g.482T>C g.468G>T g.463G>A g.456A>G g.444A>G g.406A>G g.395A>G g.384T>G g.320c>G the highest effect on the expression rate of the gene and on the sperm DNA fragmentation levels [20,21]. Animals carrying the II -668 -CC -660 genotype showed higher expression rate (Fold Change = 3.1 to 3.5) of the HSP90AA1 gene than those with DD -668 -CC -660 , DD -668 -CG -660 and DD -668 -GG -660 under heat stress environmental conditions (27°C average daily temperature and 34°C maximum daily temperature) [19,20]. At the phenotypic level the II -668 -CC -660 combined genotype had the lowest values of sperm DNA fragmentation (up to 3.5 times less) compared with the remaining genotypes, when heat stress events occurs along the spermatogenesis process [20,21]. The results above described may contribute to clarify the phylogeographic relationship for some sheep breeds and the opposite correlations observed between the frequency of the I -668 and G -660 alleles and some climatic and geographic variables from the locations where they are reared. Since the I -668 allele is responsible of the upregulation of the gene under heat stress conditions, high frequency of this allele is expected to be found in climates with high minimum (MINaT) and average annual (ANT) temperatures (positive regression coefficients, 0.11 and 0.13) and therefore with low thermal width (TW) (negative regression coefficient, -0.14). Despite no significant correlation was found among Total Annual Rainfall (TAR) and Maximum Rainfall (MxR) with MINaT and ANT, the frequency of the I -668 allele seems to be also associated with changes in these last variables. Thus the I -668 allele frequency is high in climates with high MINaT, ANT, TAR and MxR values and low TW values.

Ovis aries
Looking at climatic variables of countries where those breeds are reared (Table 8) we can observe that Semi Arid (SA) regions showed greater average TW (24.87) and average ANT (13.70) and lower average MINaT (-1.15) than Semi Damp (SD) locations (average TW = 18.69, ANT = 11.04 and MINaT = 4.47). Also in SD locations TAR and MxR values (626 and 103, respectively) are much higher than those of SA regions (372 and 58, respectively). Therefore, as SD are heater than SA regions, it is possible to hypothesize that heat events accompanied with high rainfall in SD regions could be more stressful since thermal stress increases when high    temperatures and high relative air humidity go together [25]. In this sense, Paim and colleagues [26] described that THI (an index that combines air temperature and relative air humidity) had a great influence on animal superficial temperatures, demonstrating that this is able to characterize the animal response to environment. However in this work any association was found between this variable and polymorphisms frequencies.
Regarding the G -660 allele in the gene promoter, opposite results than those of the I -668 were observed, which agree with the transcription results above mentioned. The G -660 allele is linked to the lowest expression rates of the HSP90AA1 gene under both heat stress and mild temperature conditions. Therefore high frequencies of such allele are only expected in breeds reared in regions with low MINaT and ANT temperatures and high TW, in which heat is not a critical source of stress. The negative association of the G -660 frequency with MINaT (-0.16) and ANT (-0.13) and positive with TW (0.18) agree with such expectations. However, high frequencies of the C -660 allele were found in all kind of locations but predominating in breeds reared in hot climates. This could be due to the genetic exchange that occurred during the development of modern breeds more than to adaptation processes. Linkage disequilibrium (LD) between g.667_668insC and g.660G > C is little than 0.20 in the whole breeds and range between 0.001 and 0.540 across breeds, however D -668 and G -660 alleles are completely linked in the 836 animals genotyped, constituting the most thermo sensible haplotype [20].
On the basis of Reynold's distances, two groups of breeds showing the minimum distances between breeds within group and maximum distances with breeds of the other group can be established. The first group was constituted by KRC, KRB, KAR, DGL, KARM and EDIL breeds and the second group by ME, PRAM, SZ, AS, KVR and Cl breeds. Among breeds of these two groups, average, minimum and maximum distances were 0.267, 0.064 and 0.604, respectively. In these two groups of breeds, opposite frequencies of the two polymorphism most related with gene expression differences (g.667_668insC and g.660G > C) were observed. Thus, in the first group of breeds the average frequencies of the I -668 and G -660 alleles were 0.08 and 0.61, respectively. In all these breeds the G -660 allele was that with the maximum frequency and the I -668 allele had a frequency <13%. In the second group of breeds average frequencies of I -668 and G -660 alleles were 0.41 and 0.23, respectively. In all these breeds the I -668 allele frequency was >30% and the G -660 allele had a frequency <33%. Interestingly, all breeds from group 1, except KARM, are reared in SA or A climates, mainly from Asian regions. On the contrary, all breeds from group 2, except ME, are reared in SD Mediterranean climates. Average MINaT, ANT, TW, TAR and MxR were -6.57, 8.02, 28.52, 367.63 and 52.30, respectively, in group 1 and 7.87, 16.38, 17.47, 617 and 114.67, respectively, in group 2. Q 2 values higher than 0.4 were only found for I -668 , I -667 and G -660 suggesting the action of natural selection in driving the differential allele frequency distribution of these polymorphisms among sheep populations. Therefore, a correlation between genetic (allele frequencies) and environmental (climatic parameters) variables among some sheep breeds have been established which demonstrates that despite of the great admixture existing among them and its domestication status, some footprints of the natural selection action can be glimpsed. This fact may be due to the general low artificial selection exerted over breeds of this species and their semi-extensive or extensive management conditions which may have retained some genes related with adaptation to environmental conditions existing in nature. Thus, breeds reared in SD climates, in which high temperatures and humidity are sources of physiological stress, have high frequency of alleles (I -668 and C -660 ) related to higher expression rates of the HSP90AA1 gene as response to heat stress. However, low frequencies of these alleles were only found in those breeds reared in climates in which heat and humidity levels are not enough to induce a heat stress response. The frequencies of A -601 , A -522 and I -516 alleles (Q 2 values < 0.4) are not influenced by climatic conditions and therefore its presence in the HS90AA1 gene promoter seem to have no impact in the adaptation to environment of the ovine species. This finding was already suggested by [19] by notice that these polymorphisms did not produce expression differences among genotypes when comparing RNA samples obtained under heat stress and thermo-neutral conditions.
In a large study where 49,034 SNPs were genotyped in 74 sheep breeds, [3] some signs of directional selection in two candidate genes located at chromosome 18 (F ST = 0.428), in which also the HSP90AA1 gene is located, were found. One of them was ABHD2 (abhydrolase domain containing 2) which has, among other functions, a role in the response to wounding. This protein that interacts with UBC (polyubiquitin C) has a high expression rate in testicle (BioGPS. biogps.org) and correlates with HSPA1L (Heat shock protein 70 kD like). Hsp70 is a well known protein involved in the heat shock response which is part of the Hsp90 complex. Therefore, although authors [3] recognize that the identification of adaptive alleles has not been achieved, some footprints of directional selection over genes more or less directly related with adaptive traits can be found.
When assessing evidence for an ecocline, it is crucial to control for population history and structure, for accurately assessing whether a correlation between a genetic variant and geographic or climate variables is due to natural selection [27]. For example, if migration patterns correspond closely with variation in a particular climate variable, the correlations between neutral alleles and that climate variable may be high even if selection has not acted on the locus. Conversely, if selection effects are lower to that of population structure on allele frequencies, correlations may be underestimated if population history is not taken into account [4].This is the reason why PLSR and SAM approaches cannot be used independently, without comparing results with specialized statistic methods based on population genetics theories, and focus on the analysis of genetic data as the Bayesian Test of Beaumont and Balding (LOSITAN). Thus, among all loci-environment associations detected by PLSR and SAM methods, only the frequency of two polymorphisms, the g.703_704del(2)A and the g.522A > G, seems to be under the action of some selective process. The g.703_704del(2)A showed a high F ST outlier which makes it a candidate to directional selective processes. The low F ST outlier of the g.522A > G SNP reveals the possibility of balancing selection acting over its frequency. The g.703_704del(2)A is highly linked with the g.660G > C SNP (r 2 = 0.86 in the whole data) ranging r 2 values in most breeds from 0.84 to 1. Thus, directional selection predicted for the g.703_704del(2)A could be extended to the SNP g.660G > C for which differential expression of the HSP90AA1 gene has been assessed depending on genotype [19,20], but not with the g.667_668insC. The high degree of conservation in LD phase found in this sort sequence in almost breeds, independently of their geographic origin, could indicate that high levels of gene flow have occurred between populations following domestication, as is suggested by Kijas and coworkers [3], but also, a selection pressure exerted over this DNA region [28].
The Bovidae family includes more species than any other extant family of large mammals, but their phylogenetic relationships remain largely unresolved in part because it appears to represent a rapid, early radiation into many forms without clear connections among them [29]. Furthermore, certain morphological traits have evolved several times within the family to create evolutive convergences that obscures true relationships [30]. The subfamily Caprinae includes bovids adapted to extreme climates and difficult terrains. Fossil records are poorly documented but the group first appeared during the upper Miocene [31]. In a recent work, a complete estimate of the phylogenetic relationships in Ruminantia has been proposed combining morphological, ethological and molecular information [29]. The resolution of the supertree varies among groups and some component clades, particularly Caprinae (67.7%), are much less well resolved than others (e.g. Bovinae, 95.7%). In particular, the position of the genera Budorcas and Ovibos has been controversial, having at times constituted the tribe Ovibovini, and at others been separated and located in different tribes. In general, the genus Ovis is split into a "New World" clade represented by O. dalli and O. canadensis and an "Old World" clade including the two sister species O. vignei and O. aries, on the one hand, and O. ammon, on the other hand [29,32]. In our work, haplotypes from O. vignei, O canadiensis and O. musimon appeared mixed with those from O. aries. O. aries and O. musimon share many polymorphic sites (7) as expected from the past hybridization between both species.
Ropiquet and Hassanin [33] using mitochondrial and nuclear DNA sequences located A. lervia closer to goats (Capra) and O. moschatus closer to R. pyrenaica. However, in recent works [34,35] A. lervia was closer to Rupicapra genus within the Caprina tribe and O. moschatus was distant from them within the Ovibovina tribe. Our tree located A. lervia as a sister species of C. hircus and C. pyrenaica (boosttrap proportion = 97) and R. pyrenaica closer to O moschatus (bootstrap proportion = 60). In the work of Matthee and Davis [36] using data from nuclear DNA a politomy for C. hircus, O. moschatus and O. aries was found. However, when analyzing nuclear DNA joined to mtDNA data, C. hircus and O. aries appear as sister species separated from O. moschatus. In our work we have observed a relative high similarity between O. moschatus, O. aries and O. musimon species regarding polymorphism shared among them.
Although O. moschatus is currently restricted to Greenland and the Arctic Archipelago [37], a higher frequency of alleles related with the heat stress response (I -668 = 0.43; C -660 = 0.90) were found in this species. Fossils of this species have occasionally found in southwest Europe, so that's why it seems that Ovibos did not inhabit exclusively cold tundra during the Pleistocene [37]. Praeovibos, an older morphotype of O moschatus, does not appear to have been restricted to inhabiting cold climates as its remains have also been identified in temperate and Mediterranean forest [38,39]. In contrast to modern Ovibos, Praeovibos was distributed over much more southern latitudes, samples have been found as far south as France and Spain [38][39][40], which indicates that Praeovibos is an early more cosmopolitan form of muskox [37]. Could these high frequencies of alleles related with the heat stress response found in O moschatus came from its Praeovibos ancestor? Lent [41] indicates that muskox is sensitive to both climate warming and fluctuations, that is why Campos and colleagues [37] hold these factors responsible of the actual confinement of the muskox to Greenland and the Arctic Archipelago but not a human impact. Our results regarding the polymorphisms of the HSP90AA1 gene in this species seems to indicate that the actual muskox is genetically well prepared to tolerate warm climates. Therefore, which could be the reasons to its actual geographic limitations? Climate change is known to affect not only animal's thermo sensitivity but also by triggering vegetation change [39,42].
Increasing temperature pushed the adaptive vegetation balance firmly towards bogs, shrub tundra, forest and lownutrient acidic soils, which resulted in communities of conservative plants highly defended against herbivore and supporting a small biomass of large mammals [43]. Palmqvist and coworkers [44] in an ecomorphological analysis of the early Pleistocene fauna of Venta Micena (Orce, Guadix-Baza basin, SE Spain), provide interesting clues on the physiology, dietary regimes, habitat preferences and ecological interactions of large mammals.
Unexpectedly, A. lervia which colonizes arid and hot areas of the rocky mountains of north Africa (Sahara and Magreb) is not polymorphic for the mutation most associated to the upregulation of the HSP90AA1 gene induced by heat stress events. It is probably that in this species, as in the Bos genus, other genetic mechanisms exist to cope with stress imposed by climatic conditions.
Regarding those polymorphisms for which our group has detected some relation with hot climates adaptation by its association with the expression rate of the HSP90AA1 gene under heat stress conditions (g-.667_ 668insC and g.660G > C), it's noteworthy that they were only segregating in C. hircus, O. moschatus, O.musimon and O. aries. Because there were only one sample of O. vignei and two of O ammon, any conclusion from these two species can be extracted. It seems reasonable to hypothesize that these polymorphisms could come from an ancestral species common to the Ovis, Ovibos and Capra genera but not to Ammotragus. However also it is possible that the evolvability of this gene may be due to its physical susceptibility to mutagenesis and therefore that the similitudes/differences found in the species analyzed does not be related with their phylogenetic relations.

Conclusion
We have assessed that despite the domestication process occurred 11,000 years BP, sheep breeds showed some genetic footprints related to climatic variables existing in the regions where they are reared. Thus artificial selection carried out by humans to improve productive traits in this species seems to be occurred concurrently with natural selective forces for traits related with the adaptation to environmental conditions. Adaptation of breeds to heat climates can suppose a selective advantage to cope with global warming caused by climatic change. Polymorphisms of the HSP90AA1 gene detected in the Ovis aries species can be used in selection programs to improve animals resistance to heat environments. Mutations of the ovine HSP90AA1 gene promoter are also been found in wild species from the Caprinae subfamily, indicating a great antiquity of these mutations which can help us to elucidate how climatic conditions have evolved in the past.

Ethics statement
The current study was carried out under a Project License from the INIA Scientific Ethic Committee. Animal manipulations were performed according to the Spanish Policy for Animal Protection RD 53/2013, which meets the European Union Directive 86/609 about the protection of animals used in experimentation. We hereby confirm that the INIA Scientific Ethic Committee (IACUC) has approved this study.
Animal material, nucleic acid isolation, DNA amplification and SNPs genotyping Animals from 31 sheep breeds from Europe, Asia and Africa and from 11 species of the Caprinae (9) and the Bovinae (2) subfamilies constitute the biological material of this work. Tables 8 and 9 shows breeds, species, number of animals from each breed and species, location, country, continent and climatic and geographic variables. Additional file 9 (AF9) contains the genotypes of all animals analyzed for each polimorphisms existent in each species.

Polymorphisms characterization and linkage disequilibrium estimation
PLINK software [45] was used to estimate linkage disequilibrium among all pairs of the 11 SNPs measured as r 2 in the whole sheep data and in each breed separately. Hardy-Weinberg equilibrium exact test, observed and expected heterozygosis for each breed were also calculated using PLINK.

Phylogenetic relationship between sheep breeds
The relationship between breeds was examined using the Reynold's distance metric [46]. Reynold's distance (D = -ln(1-F ST ) matrix was estimated performing 90,000 permutations and a significance level of 0.05 was established. An Exact Test of population differentiation with a significance level of 0.05 was performed to test the hypothesis of a random distribution of individuals between pairs of populations [47,48], running 100,000 Markov chain and 10,000 dememorization steps. The histogram of the number of populations which are significantly different (p < 0.05) from a given population was generated.
All analyses were made by using the ARLEQUIN 3.1 software [49].
Tests to detect association of loci frequencies with environmental parameters Partial least square regression (PLSR) Partial Least Squares multiple regression (PLSR) was applied to model the relationships between polymorphisms allele frequencies found in the 31 sheep breeds genotyped and a matrix describing environmental factors (14 geographical and climatic variables) as in Fumagalli et al. [51]. The specific algorithm used to compute extracted PLSR factors was SVD (Singular Value Decomposition). SVD is a factorization of a matrix which bases the extraction on the singular value decomposition of X'Y.
For each polymorphism the relationship between population allele frequency matrix (F) of dimension 31x1 and environmental predictors matrix (M) of 31x14 dimensions was assessed. F describes minor allele frequency (MAF) at each breed for the examined polymorphism, whereas M describes all the 14 environmental variables for each population.
In order to evaluate the fit of a model, values of explained variation, R 2 , and predicted variation, Q 2 , were computed as in [51]. Q 2 provides a measure of how well a model predicts the observed data using a cross-validation procedure, which is in this case how well a model of environmental variables predicts the observed distribution of allele frequencies among breeds. If allele frequencies covary with environmental variables Q 2 will be large. Acceptable values of R 2 and Q 2 are totally dependent on the nature of the data. Lundstedt et al. [52] propose Q 2 > 0.4 and R 2 > 0.7 as acceptable thresholds for biological data.
The number of factors chosen is usually the one that minimizes the Predictive Residual Sum of Squares (PRESS). However, often models with fewer factors have PRESS statistics that are only marginally larger than the absolute minimum. To address this, van der Voet [53] proposed a statistical test for comparing the predicted residuals from different models. By applying the van der Voet's test, the number of factors chosen is the fewest with residuals that are insignificantly larger than the residuals of the model with minimum PRESS.
Uninformative variable elimination to remove those variables that are useless was made in the basis of two filter criteria: the Variable Importance in Projection values VIP [54] and the cumulative variance explained by the top two PLSR components. The idea behind VIP measure is to accumulate the importance of each variable being reflected by the loading weights from each component. It is generally accepted that a variable should be selected if VIP > 1, but a proper threshold between 0.83 and 1.21 can yield more relevant variables [54,55]. A meteorological variable was declared important when 1) its variable importance in projection (VIP) was greater than 0.83 and 2) the cumulative variance explained by that meteorological observation by the top two PLSR components was at least 40%. The criterion to asses that the elimination of uninformative variables improves the model is to compare PRESS values obtained for the complete and the reduced model (new). If PRESS new < PRESS we can conclude that the elimination of uninformative variables improve modeling.
All computation were performed using the PLS procedure of the SAS 9.3 Statistical Package (Base SAS® 9.3).