The Mediterranean Sea as a barrier to gene flow: evidence from variation in and around the F7 and F12 genomic regions
© Athanasiadis et al. 2010
Received: 16 November 2009
Accepted: 27 March 2010
Published: 27 March 2010
The Mediterranean has a long history of interactions among different peoples. In this study, we investigate the genetic relationships among thirteen population samples from the broader Mediterranean region together with three other groups from the Ivory Coast and Bolivia with a particular focus on the genetic structure between North Africa and South Europe. Analyses were carried out on a diverse set of neutral and functional polymorphisms located in and around the coagulation factor VII and XII genomic regions (F7 and F12).
Principal component analysis revealed a significant clustering of the Mediterranean samples into North African and South European groups consistent with the results from the hierarchical AMOVA, which showed a low but significant differentiation between groups from the two shores. For the same range of geographic distances, populations from each side of the Mediterranean were found to differ genetically more than populations within the same side. To further investigate this differentiation, we carried out haplotype analyses, which provided partial evidence that sub-Saharan gene flow was higher towards North Africa than South Europe.
As the two genomic regions do not agree regarding gene flow through the Sahara, more data are needed to reach a solid conclusion about its role in the differentiation between the two Mediterranean shores. However, our data suggest that the Mediterranean Sea was at least partially a barrier to gene flow between the two shores.
The history of the Mediterranean involves successive population movements across the lands that surround it, both in prehistoric and historical times. In historical times, these population movements have included peoples like Greeks, Romans, Celts, Goths, Slavs, Arabs and Turks. It is thus a great challenge - as the great number of relevant human population genetic studies also reveals - to investigate the extent to which this intense migratory activity has influenced the genetic composition of the present Mediterranean populations.
Regarding the Mediterranean genetic profile, a recent X chromosome SNP study showed that the region exhibits a high overall genetic homogeneity, which seems to agree with an apparently weak genetic structure between South Europeans and North Africans, as revealed by an analysis of Y chromosome microsatellites. This pattern may be a consequence of the Neolithic demic diffusion in this region (around 10,000 years before present) and/or a high level of gene flow in the area.
In any case, the genetically homogeneous Mediterranean landscape is sprinkled with differentiated isolates such as the Corsicans, the Sardinians and populations from the Balearic Islands. Moreover, a Moroccan sample was found to present significant genetic differences from other Mediterranean populations in their X chromosomes. This last observation has been attributed by some scholars to the potential role of the Gibraltar Strait as a genetic barrier between Northwest Africa and the Iberian Peninsula, although there is no general consensus on this issue,[8, 9] possibly reflecting the fact that different markers and genomic components reveal different patterns.
In this study we investigate the genetic structure of human populations in the Mediterranean, with a particular emphasis on the genetic relationships between groups from North Africa and South Europe. We paid special attention to the role of gene flow through the Sahara in the genetic differentiation between Northern Africans and Southern Europeans. To accomplish our goals, we used polymorphisms in and around the genomic regions of the F7 and F12 genes. These genes code for the coagulation factors VII and XII respectively and are involved in blood clotting. The chosen polymorphisms from the functional regions of the two genes were previously reported to be associated with susceptibility to cardiovascular disease in groups from the Mediterranean[10, 11].
Some of the data used here (i.e. variation in and around the F7 gene) were published previously, while new data include neutral variation around the F12 gene and the F12 46C>T functional polymorphism. This extensively studied marker is related to Factor XII plasma levels and the development of thrombosis, although the causal relationship between these two features is questionable.
According to our data, the Mediterranean populations are significantly clustered into South Europeans and North Africans, despite the low genetic differentiation between the two groups. Our analyses also suggest that this differentiation can be explained by the Mediterranean Sea acting as a genetic barrier, which may also have affected the sub-Saharan gene flow into the Mediterranean region.
All SNPs and the insertion/deletion polymorphism were typed with the iPLEX™ Gold assay on the Sequenom MassARRAY® Platform. For the microsatellites, PCR amplification was carried out, followed by 1:5 standard dilution and fragment analysis with the Applied Biosystems 3130 Genetic Analyzer.
Allele frequencies of all polymorphisms were calculated with GENETIX v4.05.2. Genotype frequencies were tested for goodness-of-fit to Hardy-Weinberg proportions with ARLEQUIN v3.1. Additional microsatellite statistics (number of alleles; mean and variance of repeat number; and heterozygosity) were calculated using Microsatellite Analyzer (MSA) v4.05.
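As an illustration of what a goodness-of-fit test to Hardy-Weinberg proportions involves, the following minimal sketch applies the textbook chi-square test to a single biallelic locus. It is not ARLEQUIN's procedure, which may rely on a different (exact) test; the function name and the genotype counts are hypothetical.

```python
from math import erfc, sqrt

def hardy_weinberg_chi2(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit to Hardy-Weinberg proportions (1 df).

    Arguments are observed genotype counts for a biallelic locus.
    Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele A
    q = 1.0 - p                          # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts that match Hardy-Weinberg proportions exactly
print(hardy_weinberg_chi2(25, 50, 25))   # (0.0, 1.0)
```

When many loci and populations are tested, each p-value would then be compared against a corrected threshold (for a Bonferroni correction, the nominal level divided by the number of tests).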
In most of the analyses that follow, our intention was to use as many of the chosen polymorphisms as possible from both the F7 and F12 genomic regions. To achieve this, we first tested all 'risk' markers for selective neutrality. Those 'risk' markers for which neutrality could not be rejected were lumped together with the neutral ones. Neutrality was tested through FST comparisons: FST values were calculated for all loci by a locus-by-locus analysis of molecular variance (AMOVA) using ARLEQUIN. The molecular distance used was the number of pairwise differences. In the absence of selection, the FST values from the 'risk' polymorphisms are expected to have the same distribution as the FST values from the neutral loci. For the F7 genomic region, neutral and 'risk' FST values were compared by a Mann-Whitney test in R (http://www.r-project.org). For the F12 genomic region, the single 'risk' 46C>T FST value was compared with the 95% confidence interval from the corresponding neutral FST distribution. As a final step, an additional Mann-Whitney test was carried out to check for significant differences in the patterns of variation between the markers from the two genomic regions.
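The comparison of neutral and 'risk' FST values can be sketched as a Mann-Whitney test. The version below is a simplified illustration using the normal approximation without tie or continuity corrections; R may use an exact distribution for small samples, so p-values can differ. The function name and the FST values are hypothetical.

```python
from math import erfc, sqrt

def mann_whitney_u(x, y):
    """Mann-Whitney U for sample x against sample y.

    Returns (U_x, two-sided p-value from the normal approximation).
    Ties between an x value and a y value contribute 0.5 to U_x.
    """
    n1, n2 = len(x), len(y)
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p_two_sided = erfc(abs(z) / sqrt(2))
    return u_x, p_two_sided

# Hypothetical per-locus FST values, for illustration only
neutral_fst = [0.010, 0.022, 0.031, 0.018, 0.027]
risk_fst = [0.015, 0.024, 0.020]
u, p = mann_whitney_u(risk_fst, neutral_fst)
print(u, p)
```

A large p-value here would mean that neutrality cannot be rejected for the 'risk' markers, which is the condition under which they were pooled with the neutral ones.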
In order to gain a first insight into the genetic relationships among our samples, we carried out a principal component analysis (PCA) with the ade4 statistical package in R, using the allele frequencies of all the markers from both the F7 and F12 regions. PCA was performed for 3 different datasets: (i) all 16 populations, (ii) Old World populations only (i.e. without the Bolivian samples) and (iii) populations from the broader Mediterranean region only. PC significance was evaluated through linear correlation of PC axes with group membership of each population by an analysis of variance (ANOVA) in R. PC eigenvalues were treated as dependent variables and group membership of each population as factors. The factors used were 'South Europe', 'North Africa', 'Ivory Coast' and 'South America'.
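The significance test for a principal component can be illustrated with a one-way ANOVA F statistic. This is a minimal sketch, assuming the dependent variable is each population's coordinate along a given PC and the factor is its group label; the coordinates below are hypothetical and the function is not part of ade4 or R.

```python
def one_way_anova_f(scores, groups):
    """F statistic of a one-way ANOVA of `scores` against the factor `groups`."""
    by_group = {}
    for score, group in zip(scores, groups):
        by_group.setdefault(group, []).append(score)
    n, k = len(scores), len(by_group)
    grand_mean = sum(scores) / n
    ss_between = 0.0
    ss_within = 0.0
    for values in by_group.values():
        mean = sum(values) / len(values)
        ss_between += len(values) * (mean - grand_mean) ** 2
        ss_within += sum((v - mean) ** 2 for v in values)
    # Mean square between groups over mean square within groups
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical PC1 coordinates for six populations in two groups
pc1 = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
labels = ["South Europe"] * 3 + ["North Africa"] * 3
print(one_way_anova_f(pc1, labels))   # 54.0
```

A large F relative to the F distribution with (k - 1, n - k) degrees of freedom indicates that the populations separate by group along that component.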
To explore the possibility of a genetic boundary, we plotted together the geographic vs. genetic distances of both same-coast and opposite-coast pairs of Mediterranean samples. Genetic distances among same-coast samples similar to those among opposite-coast samples indicate that isolation by distance (IBD) is the most plausible model of differentiation. Conversely, greater genetic distances among opposite-coast samples as compared to same-coast samples would suggest that the Mediterranean is a barrier to gene flow.
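The pairwise comparison described above can be sketched as follows: every pair of samples is assigned a geographic and a genetic distance and is then sorted into a same-coast or an opposite-coast set before plotting. The sketch assumes great-circle distances (the source does not state how geographic distance was measured), and all sample names, coordinates and genetic distances are hypothetical.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in decimal degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))

def split_pairs_by_coast(samples, genetic_distance):
    """Return (same_coast, opposite_coast) lists of (km, genetic distance).

    samples: name -> (coast, latitude, longitude)
    genetic_distance: frozenset({name_a, name_b}) -> distance
    """
    same, opposite = [], []
    for a, b in combinations(sorted(samples), 2):
        coast_a, lat_a, lon_a = samples[a]
        coast_b, lat_b, lon_b = samples[b]
        point = (haversine_km(lat_a, lon_a, lat_b, lon_b),
                 genetic_distance[frozenset((a, b))])
        (same if coast_a == coast_b else opposite).append(point)
    return same, opposite

# Hypothetical samples and distances, for illustration only
samples = {"north_1": ("north", 40.0, 0.0),
           "north_2": ("north", 41.0, 2.0),
           "south_1": ("south", 35.0, 0.0)}
gen = {frozenset(("north_1", "north_2")): 0.004,
       frozenset(("north_1", "south_1")): 0.015,
       frozenset(("north_2", "south_1")): 0.017}
same, opposite = split_pairs_by_coast(samples, gen)
print(len(same), len(opposite))   # 1 2
```

Plotting the two sets on common axes shows whether, at comparable geographic distances, opposite-coast pairs sit consistently above same-coast pairs.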
Using two different methods, we also searched for differences in gene flow from sub-Saharan Africa towards North Africa and South Europe, as a potential consequence of the Mediterranean Sea acting as a genetic barrier:
First, the degree of haplotype sharing among different groups was determined at a regional level (i.e. sub-Saharan Africa, North Africa, South Europe), as well as at a sample level (each sample treated individually); haplotypes based on both SNPs and microsatellites were inferred for each of the two genomic regions with PHASE v2.1[23, 24] and were further analysed with ARLEQUIN to determine which haplotypes were shared by each pair of populations. This analysis was repeated for the SNPs alone. If the Mediterranean Sea was a genetic barrier between North Africa and South Europe, then we would expect a greater number of haplotypes to be shared between North Africans and sub-Saharans than between South Europeans and sub-Saharans, contributing also to a greater genetic differentiation between the two shores.
Second, pairwise Nm values, which reflect the rate of migrant exchange between two populations, were estimated using the markers from both genomic regions with ARLEQUIN, applying the same molecular distance as in the AMOVA (see above). As ARLEQUIN returns the matrix of the M values (M = 2Nm for diploid populations), we divided this matrix by a factor of 2. Again, if the 'Mediterranean as a genetic barrier' hypothesis was true, we would expect higher Nm values between the Ivory Coast and each of the North African samples than between the Ivory Coast and the South Europeans. Conversely, similar numbers of shared haplotypes or Nm values would point towards the lack of a genetic barrier imposed by the Mediterranean Sea.
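The conversion from M to Nm can be illustrated with a short sketch. It assumes the standard relation between pairwise differentiation and migrant exchange, FST = 1 / (2M + 1), so that M = (1 - FST) / (2 FST); this relation is an assumption about how the M values are obtained, not something stated in the text, and the function name is hypothetical.

```python
from math import inf

def migrants_from_fst(fst, diploid=True):
    """Estimate Nm from a pairwise FST, assuming FST = 1 / (2M + 1).

    M equals 2Nm for diploid populations and Nm for haploid ones,
    so the diploid estimate is M divided by 2.
    """
    if fst <= 0:
        return inf                     # no detectable differentiation
    m = (1.0 - fst) / (2.0 * fst)
    return m / 2.0 if diploid else m

# Hypothetical pairwise FST values, for illustration only
for fst in (0.20, 0.05, 0.0):
    print(fst, migrants_from_fst(fst))
```

Under this relation, an FST at or below zero yields an unbounded estimate, which is why Nm can tend to infinity for pairs of samples that show no differentiation.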
After Bonferroni correction, none of the markers showed a significant departure from Hardy-Weinberg equilibrium in any population (data not shown). [Additional file 1] shows allele frequencies of the 'risk' and neutral SNPs from the F12 genomic region. The frequency of the 'risk' variant T in the polymorphism 46C>T (rs1801020) ranges from 0.081 to 0.357 in the samples from the Mediterranean countries, but was higher in the Ivory Coast (0.464) and the Native American samples (0.539 in Aymara and 0.577 in Quechua). The 46C>T frequency pattern in Native Americans was closer to that reported for Asians (C/T frequency = 0.27/0.73) than for Europeans. Allele frequencies and variation statistics of the three novel microsatellites from the F12 genomic region are shown in [Additional file 2] and [Additional file 3]. Microsatellites (TTAT)n and (AAAT)n show moderate-high variability (9 alleles, total H ≈ 0.7), while tetranucleotide (TTTA)n is considerably less variable (6 alleles, total H = 0.181). The results from the analysis of the three microsatellites were submitted to GenBank and will become available in dbSNP Build 131. For the F7 genomic region, allele frequencies of the 'risk' and neutral biallelic, as well as microsatellite, polymorphisms were reported elsewhere.
Global FST values for the neutral and risk variants (the latter in bold) of the F12 gene under 3 different datasets: All Populations; Old World samples (i.e., excluding Bolivians); and Mediterranean samples only
Analysis of variance for the PC significance [table body not recovered; surviving labels: Old World populations; group sets (S.E. - N.A. - I.C. - S.AM.) and (S.E. - N.A. - I.C.)]
In the PCA plot of the Old World dataset (Mediterranean countries and Ivory Coast), the 13 populations from around the Mediterranean are clustered together while the Ivory Coast is positioned further away (Figure 3b). This plot revealed a clear separation between the North Africans and South Europeans along both PC1 and PC2. The North African samples are slightly closer to the sub-Saharan sample than are the European samples in PC1, but not PC2. This may reflect a higher genetic affinity of sub-Saharan Africa with North Africa than with South Europe, due to a potentially higher gene flow from the south of the Sahara towards North Africa. Clustering by population groups along the first two PCs (37.88% of the original variation) was again significant (Table 2).
In the PCA plot of just the Mediterranean dataset, the South Europe and North Africa clusters were maintained along the first PC (Figure 3c). Tunisia is the closest of the North African groups to the South Europe groups. The first two PCs account for 29.99% of the variance, although the population clustering into these two groups was significant only along the first PC (Table 2). It is also worth noting that the High Atlas Moroccans (Khenifra and Asni) seem to be separated from the other North Africans along the second PC.
Hierarchical analysis of molecular variance based on the variation of both the F7 and the F12 genomic regions for three population datasets: All populations; Old World samples (i.e. excluding Bolivians); and Mediterranean samples only [table body not recovered; surviving row label: Among populations within groups]
Mantel test for the significance of the correlation between genetic and geographic distance matrices for different sample subsets: Old World, western Mediterranean (i.e. without Crete and Turkey), North Africa, South Europe and Iberian Peninsula
Haplotype frequencies of the two genomic regions in the 3 major geographic areas (Ivory Coast, North Africa and South Europe) are shown in [Additional file 4]. For the F7 genomic region, the samples from North Africa were found to share 24 of their 225 inferred haplotypes (10.67%) with the Ivory Coast. However, an almost identical percentage (10.41%) of haplotype sharing with the Ivory Coast was found for South Europe (28 out of 269 haplotypes). When each sample was examined separately, the South European samples were found to share 6.35% - 16.33% of their inferred haplotypes (SNPs and microsatellites included) with the Ivory Coast, while the respective values for North Africans ranged between 8.33% and 18% [Additional file 5]. A Mann-Whitney test showed no significant differences between the two groups of values (data not shown). The same results were obtained when haplotypes based only on SNPs were used (data not shown). Since both shores of the Mediterranean share the same percentage of F7 haplotypes with the Ivory Coast, this genomic region does not provide any support to the hypothesis that the Mediterranean Sea obstructs sub-Saharan gene flow.
As for the F12 genomic region, 21.85% of the North African inferred haplotypes (33 out of 151) are shared with the Ivory Coast, while in South Europe this percentage falls to 13.30% (25 out of 188), suggesting higher gene flow from sub-Saharan Africa to North Africa. When each population was examined separately, the North African samples were found to share 20.34% - 28.07% of their inferred haplotypes (SNPs and microsatellites included) with the Ivory Coast, while the South European samples were found to share 12.20% - 22.22% of their inferred haplotypes with the Ivory Coast [Additional file 5]. In this case, the Mann-Whitney test showed that the above percentages in North African samples are significantly greater than those in South Europe (data not shown). However, no significant differences were found when haplotypes based only on SNPs were used (data not shown).
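The regional sharing percentages reported above for both genomic regions follow directly from the haplotype counts. The sketch below recomputes them and also shows how a sharing percentage could be obtained from two haplotype lists; it assumes that sharing is counted over distinct inferred haplotypes, and the function names and toy haplotype labels are hypothetical.

```python
def percent_of(shared, total):
    """Percentage of `total` haplotypes that are shared, to two decimals."""
    return round(100.0 * shared / total, 2)

def percent_shared(haplotypes, reference):
    """Percentage of distinct haplotypes in one group also found in another."""
    own = set(haplotypes)
    return 100.0 * len(own & set(reference)) / len(own)

# Counts taken from the text: (shared with the Ivory Coast, total inferred)
reported = {
    "F7, North Africa": (24, 225),    # reported as 10.67%
    "F7, South Europe": (28, 269),    # reported as 10.41%
    "F12, North Africa": (33, 151),   # reported as 21.85%
    "F12, South Europe": (25, 188),   # reported as 13.30%
}
for label, (shared, total) in reported.items():
    print(label, percent_of(shared, total))

# Toy haplotype labels, for illustration only
print(percent_shared(["h1", "h2", "h3", "h4"], ["h2", "h4", "h9"]))   # 50.0
```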
Although some of the above observations may indicate a higher gene flow from sub-Saharan Africa to North Africa than to South Europe, there is clearly no agreement between the two genomic regions or even between marker sets using both SNPs and microsatellites or just SNPs. Interestingly, a recent study based on autosomal Alu polymorphisms and compound Alu/microsatellite systems showed that gene flow through the Sahara was different between the two Mediterranean shores for the same sample set.
Migrant exchange rates for all pairs of populations as reflected by the Nm values are shown in [Additional file 6]. As seen in the first column, the range of the Nm values for each North African sample with the Ivory Coast is 2.316 - 4.887, while the same range for the South Europeans is 1.685 - 2.627. A Mann-Whitney test showed that the Nm values of the North African samples with the Ivory Coast are significantly higher than those for South Europe with the Ivory Coast (data not shown). This finding suggests that sub-Saharan gene flow, albeit of the same order of magnitude towards both sides of the Mediterranean, is higher towards North Africa, in agreement with the data based on the F12 haplotypes presented above. Since our analyses also suggest that the Mediterranean Sea acted as a barrier to gene flow between South Europe and North Africa (see Figure 4), the differences in sub-Saharan gene flow could be interpreted as a further consequence of this barrier. However, as the significant geographic-genetic correlation for the overall Old World sample set revealed, we cannot overlook that such differences in gene flow might as well be explained on the basis of geographic distance alone.
Regarding population structure in North Africa, the Nm values tended to infinity for most pairs of North African samples, indicating the lack of any barrier to migration in the region. The only exception is the High Atlas Moroccan sample (Asni), which shows a low rate of gene flow compared with the remaining North African samples, in agreement with the extreme position of this population in the PC plots (Figure 3).
With the exception of Tunisia in Figure 3a, South Europe and North Africa were always in separate clusters (Figure 3). Population structure between the two Mediterranean shores is also supported by the results of the hierarchical AMOVA, which point towards a low but significant genetic differentiation, confirming the results of previous independent studies[7, 8, 27]. Moreover, the F7 and F12 variation in the Western Mediterranean presents a distribution potentially compatible with the existence of a genetic barrier. However, the extent to which this is true for the whole Mediterranean could not be shown here, as this would require data from other important geographic regions such as the Italian peninsula, the Adriatic Sea (e.g. Croatia and Albania) and Northeast Africa. Finally, the data showed no consensus regarding sub-Saharan gene flow into the two sides of the Mediterranean, which hinders any evaluation of its role in the North Africa vs. South Europe differentiation. The role of the Mediterranean Sea as a barrier to gene flow remains an open question.
This research has been financially supported by the CGL2005-03391 and CGL2008-03955 projects of the Spanish Ministerio de Educación y Ciencia, as well as the 2005SGR00252 project of the Generalitat de Catalunya. The work of GA has been financed by an FPU grant from the Ministerio de Educación y Ciencia (grant reference: AP2005-4425). We are grateful to all of the donors for providing blood samples and to our sampling collaborators: F Luna, C Rodríguez and M De Grado (samples from Spain), N Moschonas (Crete), H Chaabani (Monastir), M Kandil and N Harich (Khenifra), M Cherkaoui (Asni), M Melhaoui (Bouhria), N Bissar-Tadmouri (Istanbul), A Cambon-Thomsen and MS Issad (M'zab), A Chaventré (Ivory Coast) and finally M Villena (Bolivian Aymaras and Quechuas). We also acknowledge the technical and material support received from Dr. M Stoneking's laboratory of Molecular Anthropology at the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany, where most of the microsatellite genotyping was carried out.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.