On the edge of Bantu expansions: mtDNA, Y chromosome and lactase persistence genetic variation in southwestern Angola
© Coelho et al. 2009
Received: 28 July 2008
Accepted: 21 April 2009
Published: 21 April 2009
Current information about the expansion of Bantu-speaking peoples is hampered by the scarcity of genetic data from well identified populations from southern Africa. Here, we fill an important gap in the analysis of the western edge of the Bantu migrations by studying for the first time the patterns of Y-chromosome, mtDNA and lactase persistence genetic variation in four representative groups living around the Namib Desert in southwestern Angola (Ovimbundu, Ganguela, Nyaneka-Nkumbi and Kuvale). We assessed the differentiation between these populations and their levels of admixture with Khoe-San groups, and examined their relationship with other sub-Saharan populations. We further combined our dataset with previously published data on Y-chromosome and mtDNA variation to explore a general isolation with migration model and infer the demographic parameters underlying current genetic diversity in Bantu populations.
Correspondence analysis, lineage sharing patterns and admixture estimates indicate that the gene pool from southwestern Angola is predominantly derived from West-Central Africa. The pastoralist Herero-speaking Kuvale people were additionally characterized by relatively high frequencies of Y-chromosome (12%) and mtDNA (22%) Khoe-San lineages, as well as by the presence of the -14010C lactase persistence mutation (6%), which likely originated in non-Bantu pastoralists from East Africa. Inferred demographic parameters show that both male and female populations underwent significant size growth after the split between the western and eastern branches of Bantu expansions occurring 4000 years ago. However, males had lower population sizes and migration rates than females throughout the Bantu dispersals.
Genetic variation in southwestern Angola essentially results from the encounter of an offshoot of West-Central Africa with autochthonous Khoisan-speaking peoples from the south. Interactions between the Bantus and the Khoe-San likely involved cattle herders from the two groups sharing common aspects of their social organization. The presence of the -14010C mutation in southwestern Angola provides a link between the East and Southwest African pastoral scenes that might have been established indirectly, through migrations of Khoe herders across southern Africa. Differences in patterns of mtDNA and Y-chromosome intrapopulation diversity and interpopulation differentiation may be explained by contrasting demographic histories underlying the current female and male genetic variation.
Among the complex series of demographic events that shaped the patterns of human genetic variation in Africa, the massive dispersal of Bantu-speakers stands as one of the most impressive examples of human migration. Both linguistic and archeological evidence suggest that the spread of Bantu languages started about 4000 years ago in the adjacent grasslands of Cameroon-Nigeria and involved large movements of farmers carrying an agricultural tradition especially well-suited to the climate conditions prevailing in subequatorial Africa [1, 2]. According to a widely accepted dispersion model, one major population movement involved the expansion of ancestors of East Bantu speakers along the northern fringe of the African rain forest into the interlacustrine areas surrounding Uganda [1–3]. Another important movement is thought to be linked to the early penetration of ancestors of West Bantu speakers into the wet coastal areas of the central African forest, beyond the Cameroon plateau. More recent major expansions would include the migrations of West and East Bantu speakers into the dry territories located beyond the southern borders of the rain forests, which eventually culminated with the diffusion of Bantu languages across southern Africa [1, 2]. However, this basic representation of the major trends of Bantu dispersals has not remained unchallenged [3–5], and many specific details of the migration dynamics leading to the emergence of widespread Bantu-speaking communities are still poorly understood.
Genetic data have great potential for unraveling the complex population history underlying Bantu expansions, but there are a number of difficulties related to sampling coverage and parameter estimation that need to be overcome. So far, most of the available genetic information has been gathered in phylogeographic studies of Y-chromosome and mitochondrial DNA (mtDNA) variation. These studies identified several mtDNA haplogroups likely to be associated with the Bantu migrations that trace their ancestries to different geographic regions of Africa. In contrast, the great majority of Bantu Y-chromosome lineages were found to belong to a single widespread haplogroup (E3a), which seems to have overrun most pre-existing diversity [7–9]. Recently, a few studies have begun to address more detailed aspects of regional mtDNA variation by increasing both the resolution of sequence data and the density of population sampling [10, 11]. However, in spite of this progress, current understanding of Bantu expansions is still hampered by lack of sampling of crucial regions in subequatorial Africa. The area of Angola, in particular, has remained persistently underrepresented in most studies of African genetic variation. Although new genetic data have become available from Kimbundu and Bakongo speakers from northern Angola and Cabinda [13, 14], no information exists on the broad area encompassing the dry woodlands to the south of the Cuanza river, which is critical for understanding the push of West Savanna Bantu-speaking peoples out of the rain forest into the arid steppes of southwestern Africa.
Being exposed to the effects of the Benguela current, southern Angola provided a new environment characterized by increasing levels of aridity that challenged the progression of the agricultural lifestyle that had predominated in the well irrigated lands of the Congo basin [1, 2]. Faced with this environmental shift, some groups, like the Ovimbundu, settled the high grounds of the Bié plateau where they could find areas of relatively fertile soil and higher rainfall. In the coastal areas, and further to the south, settlements had to be limited to major river valleys and subsistence economies became increasingly dependent on cattle raising. The Herero, the Ovambo and the Nyaneka-Nkhumbi are examples of such Bantu groups in southwestern Angola and form a broad cultural and economic cluster relying on cattle raising to various degrees [15, 16]. Among them, the Herero are the most exclusively pastoral of all Bantu peoples from southwestern Africa and penetrated well into the arid regions of the Namib Desert where they shared their mode of life with neighboring non-Bantu Khoe cattle herders. This cultural and geographical proximity between Bantu and Khoisan-speaking groups poses intriguing questions about the development of the Southwest African pastoral scene and the nature of the interactions between the vanguard of West Bantu speakers and the non-Bantu peoples from the desert. For example, the role played by Khoe herders in the adoption of the present pastoral specialization of Bantu speakers is still not clear. Moreover, the relative isolation of Southwest Africa from the major East African pastoral centers represents an important challenge for the identification of the migration routes that led to the emergence of a cattle-herding zone in the southwestern periphery.
Here we present an analysis of the western edge of the Bantu expansions based on the study of Y-chromosome, mtDNA and lactase persistence genetic variation in West Savanna Bantu-speaking groups from southwestern Angola. We analyzed the data in the context of regional and continental genetic diversity by assessing the differentiation between these populations and their levels of admixture with Khoisan-speaking groups, and by examining the relationship between southwestern Angola and other areas of Africa. Furthermore, we combined our dataset with published data on Y-chromosome and mtDNA variation from Southeast Africa to explore a general isolation with migration (IM) model [19, 20] and infer key demographic parameters underlying the history of Bantu expansions.
The groups included in our sample represent West-Savanna Bantu-speaking populations from the southwestern edge of the Bantu expansions (Figure 1A) and rely on different combinations of agricultural and pastoral lifeways. The Ovimbundu form the largest ethnolinguistic cluster in Angola, making up 35% of the total population. Their original core area was located in the Bié plateau, but they underwent a series of southward expansions that considerably enlarged their territory and are presently one of the major groups inhabiting Namibe [15, 16] (Figure 1A). Traditionally, most Ovimbundu groups practiced mixed farming and kept livestock. However, cattle raising was not crucial for subsistence and few families owned large herds. The Nyaneka-Nkhumbi-speaking groups, who have also spread to present-day Namibe, originally settled the area located west of the middle Cunene River, including the Huíla plateau in the eastern limit of the Namibe province (Figure 1A). These peoples are agro-pastoralists that depend in part on cultivation, but keep large cattle herds and use dairy products as an important source of subsistence [16, 23]. The Kuvale people dwell in the arid lowland areas of the Namibe province and are one of the most representative Herero-speaking groups from Angola (Figure 1A). Like other groups included in the Herero cultural division, they are semi-nomadic cattle herders and rank among the most exclusively pastoral peoples of southwestern Africa. The Ganguela-speaking peoples originally settled southeastern Angola, which is well removed from Namibe (Figure 1A). However, during the Angolan civil war many Ganguela families fled to neighboring countries and to other regions of Angola, including Namibe. The Ganguela originally included a number of scattered farming communities that were split by the southern expansion of the Chokwe peoples in the 19th century. The populations that remained in the western side of the Chokwe penetration progressively adopted cattle and became mixed farmers.
We have sequenced both hypervariable segments I (HVS-I; positions 16024–16400) and II (HVS-II; positions 73–340) of the mtDNA control region. MtDNA sequencing was performed as described previously. All HVS-I and HVS-II sequences are shown in Additional file 1. To assign mtDNA sequences to previously defined haplogroups, we initially followed established criteria based on HVS-I sequence variation [6, 24] updated as recently discussed. Occasional ambiguities in these assignments were resolved by additional typing of a selected set of four diagnostic restriction fragment length polymorphisms (RFLPs): 3592 Hpa I (absent in L3), 2349 Mbo I (present in L3e), 10084 Taq I (present in L3b) and 8616 Mbo I (absent in L3d). After this initial assignment step, we used the available information on HVS-I and HVS-II variation provided by published complete mtDNA sequences to refine and/or rename the classifications according to the most recently updated mtDNA phylogeny (see Additional file 1). For the sake of comparison we refer to the HVS-I-based nomenclature throughout the article.
To characterize the nonrecombining portion of the Y-chromosome (NRY) we genotyped 9 unique event polymorphisms (UEPs; M2, M35, M60, M91, M112, M150, M213, YAP, SRY4064) and 11 short tandem repeats (STRs; DYS19, DYS389I, DYS389II, DYS385, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, and DYS439). The DYS385 locus consists of a duplicated tetranucleotide STR region and was omitted from some analyses. Except for YAP, UEPs were typed by direct sequencing of PCR products. Primer sequences and protocols are available upon request. Short tandem repeats were typed with the Promega Powerplex Y System. All Y-chromosome combined haplotypes, defined by UEPs and STRs, are shown in Additional file 2. For the sake of comparison, NRY haplogroups based on the UEP variation were named according to the Y-chromosome Consortium guidelines [27, 28], but we also provide haplogroup names according to a more recent update, which in our dataset essentially involves the renaming of haplogroup E3a as E1b1a (see Additional file 2).
Lactase persistence was screened by direct sequencing of a 359 bp PCR fragment located within intron 13 of the MCM6 gene, which contains all single nucleotide polymorphisms (SNPs) that have been so far associated with lactase persistence in human populations: G/C -14010; T/G -13915; C/T -13910; and C/G -13907 [30–32] (see Additional file 3 for typing details). In addition to the southwestern Angolan sample, we further typed the lactase persistence-associated SNPs in a total of 111 Bantu speaking individuals belonging to 11 different population groups from Mozambique: 3 Chopi, 4 Chwabo, 19 Makhwa, 15 Makonde, 15 Ndau, 11 Nyanja, 15 Ronga, 2 Sena, 15 Shangaan, 1 Shona and 11 Tswa.
Summary statistics for mtDNA and Y-chromosome haplotype variation, and Tajima's D and Fu's Fs tests were calculated and performed with the ARLEQUIN 3.11 software package. Analyses of molecular variance (AMOVA) to evaluate the apportionment of genetic variation were also performed using ARLEQUIN 3.11. Correspondence analysis based on mtDNA and Y-chromosome haplogroup frequencies was performed using the POPSTR program.
Population cross-comparisons for the mtDNA data were restricted to the 16090–16365 HVS-I sequence range and were based on an assembled dataset comprising approximately 5400 mtDNA profiles from 73 populations (Figure 1B and Additional file 4). For the NRY data, cross-population comparisons were based either on UEP-defined haplogroups or on higher resolution haplotypes defined by a subset of 7 STR loci (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393) common to all samples assembled in a database of about 5072 haplotypes from 72 populations (see Additional file 5). Networks of NRY haplotypes and mtDNA sequences were constructed using the NETWORK 4.5 software. For NRY haplotypes, the reduced-median and median-joining algorithms were applied sequentially and differential microsatellite weighting was used to resolve extensive reticulation at microsatellite loci. Weights for each microsatellite were inversely proportional to the ratio of the variance displayed by each marker within the respective haplogroups and the average variance value across loci in those haplogroups. For mtDNA sequences, the median-joining algorithm was used without further weighting. Ages of mtDNA and NRY lineages were estimated with the ρ (rho) statistic using NETWORK 4.5, assuming 25 years per generation, a mtDNA control region mutation rate of μ = 7.55 × 10⁻⁶ per nucleotide per generation, based on a recent Bayesian estimate, and the following NRY-STR per generation mutation rates: μDYS19 = 0.0017; μDYS389I = 0.0019; μDYS389II = 0.0023; μDYS390 = 0.0023; μDYS391 = 0.0035; μDYS392 = 0.0006; μDYS393 = 0.0007.
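The age calculation described above can be sketched in a few lines. This is a minimal illustration, not the NETWORK 4.5 implementation: the function name `rho_age` is hypothetical, the haplotype-wide STR rate is assumed here to be the sum of the seven per-locus rates, and the standard error uses the star-like approximation sqrt(ρ/n). Under these assumptions the sketch is not expected to reproduce the ages reported in this article.

```python
import math

GENERATION_YEARS = 25  # generation time assumed in the text

# Per-generation NRY-STR mutation rates quoted in the text
STR_RATES = {
    "DYS19": 0.0017, "DYS389I": 0.0019, "DYS389II": 0.0023,
    "DYS390": 0.0023, "DYS391": 0.0035, "DYS392": 0.0006,
    "DYS393": 0.0007,
}


def rho_age(mutation_counts, mu_per_generation,
            generation_years=GENERATION_YEARS):
    """Return (age, standard error) in years from the rho statistic.

    mutation_counts: mutational steps separating each sampled lineage
        from the chosen ancestral haplotype.
    mu_per_generation: mutation rate of the whole haplotype or sequence.
    """
    n = len(mutation_counts)
    rho = sum(mutation_counts) / n  # average distance to the ancestor
    # Assumption: star-like genealogy, so the sampling error is sqrt(rho / n)
    sigma = math.sqrt(rho / n)
    years_per_mutation = generation_years / mu_per_generation
    return rho * years_per_mutation, sigma * years_per_mutation


# Assumption: haplotype-wide rate taken as the sum of the per-locus rates
mu_haplotype = sum(STR_RATES.values())
age, se = rho_age([1, 1, 1, 1], mu_haplotype)
print(f"age = {age:.0f} years, SE = {se:.0f} years")
```

For a single lineage one step away from its ancestor (ρ = 1, n = 1) the standard error equals the age itself, which is why ages based on very few lineages carry such wide uncertainty.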
Admixture proportions were estimated with the ADMIX2.0 program . MtDNA-based estimates were calculated from haplogroup frequencies without taking into account molecular distances between haplogroups. NRY-based estimates were calculated from the frequency of haplotypes defined by STR loci DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, and DYS393, not taking into account molecular distances between haplotypes.
We have also attempted to infer the key demographic parameters of Bantu expansions by analyzing our NRY and mtDNA data from southern Angola together with additional data from Southeast Africa (see Additional files 4 and 5) within the framework of a general isolation with migration (IM) model [19, 20], using the IMa program. The IM model describes the historical demographic properties of two related populations that may have varied in size after diverging from a single ancestral population, with bidirectional migration occurring at constant rate following the initial split [19, 20]. Thus, we reasoned that this framework could be applied to populations located in the two opposite edges of the Bantu expansions in order to analyze the split between the eastern and western streams of Bantu dispersals after a common origin in an area likely to be located in West-Central Africa (see Additional file 6). The model has six parameters whose posterior probability distributions can be estimated by using the Markov chain Monte Carlo (MCMC) approach implemented in the IMa computer program: effective population sizes for both extant (N1 and N2) and ancestral (NA) populations, time since divergence (t) and migration rates in both directions (m1 and m2). These demographic terms are obtained by conversion from estimated basic parameters that are scaled by the mutation rate μ (per locus per generation): θA = 4NAμ; θ1 = 4N1μ; θ2 = 4N2μ; the scaled divergence time tμ; and the scaled migration rates m1/μ and m2/μ. The mtDNA dataset consisted of 724 HVS-I 276 bp-long sequences ranging from positions 16090 to 16365, including 358 sequences from southwestern Angola, collected in the present study, and 366 assorted sequences from different ethnic groups from Mozambique and Zimbabwe (see Additional file 4). The NRY data consisted of 348 haplotypes defined by 7 STR loci, comprising 236 Y chromosomes from the present Angolan sample and 112 chromosomes from Mozambique (see Additional file 5).
Parameter conversions were done by using the aforementioned mtDNA control region and NRY STR mutation rates. After preliminary runs to determine plausible uniform prior ranges, the IMa program was run for at least 10 million steps after 100000 steps of burn-in with 8 Metropolis-coupled chains, with geometric heating. For each dataset at least two independent replicates were performed using the same running options and a different random seed to assess convergence of the parameter estimates. MtDNA sequences were assumed to mutate under the Hasegawa-Kishino-Yano (HKY) finite sites mutation model. Mutation in NRY STRs was modeled by the stepwise mutation model (SMM). The mode of each marginal posterior distribution generated by the program was considered a point estimate of the corresponding parameter value. Reported parameter estimates are means from replicate runs.
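The conversion from mutation-scaled IM parameters to demographic units follows directly from the definitions given above (θ = 4Nμ, scaled time tμ, scaled migration m/μ). The sketch below is only an illustration of that arithmetic: the function name `im_to_demographic` and the input values are hypothetical, they are not estimates from this study, and the sketch ignores any marker-specific inheritance scalars that the IMa program may apply.

```python
def im_to_demographic(theta, t_scaled, m_scaled, mu, generation_years=25):
    """Convert mutation-scaled IM parameters to demographic units.

    theta    = 4 * N * mu   (scaled population size)
    t_scaled = t * mu       (scaled divergence time, t in generations)
    m_scaled = m / mu       (scaled migration rate)
    mu       = mutation rate per locus per generation
    """
    n_eff = theta / (4 * mu)            # effective population size N
    t_generations = t_scaled / mu       # divergence time in generations
    m_per_generation = m_scaled * mu    # migration rate per generation
    return {
        "N": n_eff,
        "t_years": t_generations * generation_years,
        "m": m_per_generation,
        # Population migration rate: 2Nm = theta * m_scaled / 2
        "2Nm": theta * m_scaled / 2,
    }


# Per-locus mtDNA rate from the text: per-nucleotide rate times 276 bp
mu_mtdna = 7.55e-6 * 276

# Hypothetical scaled values, chosen only to show the conversion
result = im_to_demographic(theta=100.0, t_scaled=0.3, m_scaled=1.0,
                           mu=mu_mtdna)
for name, value in result.items():
    print(f"{name}: {value:.4g}")
```

Note that the population migration rate 2Nm depends only on the product of two scaled parameters, so it can be reported without assuming any mutation rate.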
MtDNA HVS-I sequence diversity in populations from southwestern Angola
To investigate the relationship between Namibe and other African populations, a correspondence analysis based on mtDNA haplogroup frequencies was performed using contextual samples from different Sub-Saharan regions (Figure 1B; Additional file 4). The southern African Khoe-San samples distort the correspondence analysis (data not shown) due to their genetic uniqueness (high frequencies of haplogroups L0d/k) and were excluded from further analysis in order to achieve better resolution of the genetic relationships among populations.
Most mtDNA haplotypes that are commonly found in Sub-Saharan populations were also observed in Namibe (see Additional file 1; Figure 2C). The most frequent (>5%) haplogroups were L0d (6%), L0a1 (9%), L0a2 (8%), L1c1 (8%), L1c2 (7%), L2a1 (10%), L3e1 (9%), L3e2 (7%) and L3f (6%). Haplogroup L1c1a, which is typical of Pygmy populations from Central Africa [10, 11], was virtually absent from our sample (see Additional file 1; Figure 2C). The relatively high frequency of the typical Khoe-San L0d haplogroup contrasts with previous findings from northern Angola [13, 14] but compares with observations in Bantu groups from Southeast Africa [6, 44]. However, this haplogroup is not evenly distributed in the Namibe samples and reaches much higher frequencies in the Kuvale (22%; Figure 2D) than in other groups. When correspondence analysis is focused on the four populations from Namibe (Figure 2B) the genetic peculiarity of the Kuvale caused by the high frequency of L0d becomes obvious.
In order to analyze the likely origin of mtDNA sequences from southwestern Angola, we used the comparative mtDNA African dataset (see Figure 1B and Additional file 4) to study the patterns of HVS-I lineage sharing between Namibe and other Sub-Saharan populations. Although restriction of population cross-comparisons to the HVS-I control region increases the number of available samples, it is important to note that some matches may involve sequences that are phylogenetically unrelated. However, these cases are expected to seriously bias the conclusions only if convergence episodes are non-randomly distributed across lineages.
Lineage sharing with Pygmies and southern African Khoisan-speaking peoples is low (Figure 3). Even the sequences that belong to the typical Khoe-San L0d haplogroup did not match any Khoisan-speaking population from the database. However, network analysis clearly shows that the Angolan L0d lineages are phylogenetically related to other typical Khoe-San sequences from southern Africa (see Additional file 8). We have attempted to calculate the age of the unmatched L0d lineages by estimating the average number of mutational changes to their closest southern African ancestor, using the ρ statistic (data not shown). Estimated ages were found to vary between 4816 (± 4816) and 17308 (± 9667) years.
Estimated admixture proportions of mtDNA lineages from southwestern Angola
Despite being associated with high standard deviations, estimates of the admixture proportions are consistent with a major (0.74 ± 0.15) contribution of West-Central Africa to the southwestern Angola mtDNA pool (Table 2). Contributions from West Africa (0.04 ± 0.07), East Africa (0.04 ± 0.06) and the Pygmies (0.05 ± 0.05) seem to have been residual and significantly lower than that from southern African Khoisan-speaking peoples (0.13 ± 0.02). Moreover, it is interesting to note that the calculated contribution from the Khoe-San in each population group shows a stepwise increase that appears to be correlated with the degree of dependence on animal husbandry of the different groups: Ganguela (0.03 ± 0.02) < Ovimbundu (0.07 ± 0.03) < Nyaneka-Nkhumbi (0.12 ± 0.03) < Kuvale (0.33 ± 0.08). The relatively low standard deviations associated with these estimates reflect the high levels of differentiation of the Khoe-San, showing that the use of an admixture model is more appropriate when parental populations have remained isolated for a long time.
In accordance with the trend observed for mtDNA (Table 1), NRY diversity in STR-defined haplotypes was found to be lower among the Kuvale (θk = 24.6) than in the Nyaneka-Nkhumbi (θk = 60.3) and the Ovimbundu (θk = 61.1), revealing a consistent pattern of population size reduction and genetic drift in the Kuvale group.
Although the majority (~80%) of Y-chromosome lineages in southwestern Angola belong to haplogroup E3a-M2 (Figure 4C; see Additional file 2), the distribution of the remaining lineages is not uniform across the Namibe samples (Figure 4D). The small sample from the Ganguela has E(xE3a, E3b) lineages that are less common in the other groups, while the Kuvale and the Nyaneka-Nkhumbi carry the B2b-M112 haplogroup, which is known to be frequent both in Pygmies and Khoisan-speakers [8, 9, 46]. Figure 4B emphasizes the influence of the E(xE3a, E3b) and B2b minor haplogroups in separating the Ganguela and Kuvale from the Nyaneka-Nkhumbi and the Ovimbundu. This local pattern is remarkably congruent with that obtained with mtDNA (Figure 2B).
To study the patterns of lineage sharing in the Y-chromosome, we used the populations from the comparative NRY African dataset with reported haplotype data for a common set of seven STR loci (see Additional file 5). Due to the high level of convergent evolution among NRY haplotypes based on this limited subset of STRs, the possibility of phylogenetically unrelated matches cannot be completely ruled out, as for mtDNA.
Haplotypes within haplogroup B2b remained unmatched, but phylogenetic relationships inferred by network analysis suggest that these haplotypes are more likely to have been derived from Khoe-San populations than from Pygmies (see Additional file 9). Estimated ages of the unmatched B2b lineages based on the average number of mutational changes to the closest southern African ancestor were found to vary between 19445 (± 13749) and 29168 (± 20624) years.
As for mtDNA, we used an explicit admixture model to infer the relative contributions of different African regions to the sampled southwestern Angola Y-chromosome pool. However, admixture calculations based on the frequencies of SNP-defined haplogroups led to estimates that were associated with high standard deviations and often exceeded the 0–100% range under different combinations of parental populations. As these implausible results were likely to be due to the high similarity of haplogroup frequency profiles of West and West-Central Africa, both dominated by the E3a-M2 haplogroup, we performed a higher resolution analysis using the frequencies of haplotypes defined by a common set of 7 STR loci (see Additional file 5). At this level of resolution, the E3a-M2 haplotype subset defined by alleles 15-21-10-11-13 at loci DYS19, DYS390, DYS391, DYS392 and DYS393, which has been considered a founder lineage of Bantu expansions, has very different frequencies in West-Central (~0.24) and West Africa (~0.08).
Estimated admixture proportions of Y-chromosome lineages from southwestern Angola
Given the well known association between lactose tolerance and pastoralism, we reasoned that the study of lactase persistence mutations in Namibe might be informative for exploring historical links between southwestern Bantu cattle herders and other pastoral communities elsewhere in Africa. To this end, we screened our sample for all SNPs that are currently known to be associated with lactase persistence in human populations. We found that the -14010C allele, which is most frequent in Nilo-Saharan and Afro-Asiatic populations from Kenya and Tanzania (32–42%), was present at lower frequencies in the Ovimbundu (1%), the Nyaneka-Nkhumbi (3%) and the Kuvale (6%). By contrast, we could not find any lactase persistence-associated allele in an additional sample of 111 individuals from several ethnolinguistic groups in Mozambique.
Estimates of demographic parameters in the Southwest and Southeast edges of the Bantu expansions
Estimated current population sizes (N1 and N2) are 4 to 8 fold higher than ancestral population sizes (NA), showing that the dispersal of Bantu speaking groups involved both significant size growth and geographic expansions. Population growth could have been even more marked if bottlenecks occurring at the formation of daughter populations caused initial size reductions. There are important differences in the estimates based on the mtDNA and NRY datasets. The Y-chromosome-based estimate for the current Southeast Africa population size is about one half that from the Southwest (N1~10000, N2~5000; Table 4), while current population sizes estimated from the mtDNA data are similar in both edge populations (N1 and N2 ~30000; Table 4). Current population sizes based on Y-chromosome estimates lie in the lower range of reported African-specific population sizes, while current population size estimates from the mtDNA are in the upper range of reported values from African populations. Ancestral population sizes inferred from the Y-chromosome (NA~1200) and mtDNA (NA~7100) are also different.
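The 4 to 8 fold range can be checked directly from the approximate point estimates quoted above. The short sketch below is only a worked arithmetic example using those rounded values; the helper name `fold_growth` is hypothetical.

```python
# Approximate point estimates quoted in the text
# (N1 = Southwest, N2 = Southeast, NA = ancestral)
estimates = {
    "NRY": {"N1": 10000, "N2": 5000, "NA": 1200},
    "mtDNA": {"N1": 30000, "N2": 30000, "NA": 7100},
}


def fold_growth(current, ancestral):
    """Ratio of current to ancestral effective population size."""
    return current / ancestral


for marker, sizes in estimates.items():
    for label in ("N1", "N2"):
        ratio = fold_growth(sizes[label], sizes["NA"])
        print(f"{marker} {label}/NA = {ratio:.1f}")
# NRY N1/NA = 8.3
# NRY N2/NA = 4.2
# mtDNA N1/NA = 4.2
# mtDNA N2/NA = 4.2
```

Only the Y-chromosome estimate for the Southwest approaches the upper end of the range; the other three ratios are all close to 4.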
With regard to migration, all Y-chromosome runs consistently yielded values close to zero (see Additional file 10), corresponding to the first bin of the surveyed parameter space (Table 4). In contrast, population migration rates inferred from mtDNA are high (2N1m1 and 2N2m2 > 15; Table 4), pointing to extensive female-mediated gene flow between the west and east branches of Bantu expansions. In all cases, 2N2m2 estimates (migration from Southwest into Southeast) were found to be consistently higher than 2N1m1 (migration from Southeast into Southwest), but this observation must be regarded with caution, since 2N2m2 estimates from independent runs failed to converge to a single maximum (see Additional file 11). Although the credibility intervals obtained in different runs were quite similar, migration rate distributions were typically two-peaked. While a fraction of the runs yielded a major peak for lower 2N2m2 values (~35), in other cases the pattern reversed and the major peak corresponded to unusually high 2N2m2 values (~180) (Table 4; Additional file 11).
In order to overcome the limitations of single locus estimates, we have also generated inferences based on the combined mtDNA and NRY datasets (Table 4; Additional file 12). Joint estimates of population sizes support a 5-fold growth after population splitting (N1 and N2 ~7000; NA~1300; Table 4). Divergence time estimates were remarkably consistent with the archeological data (t = 4000 years; Table 4), while migration rates from the western to the eastern branch remained difficult to resolve, reflecting the uncertainty associated with the mtDNA dataset (Table 4; Additional file 10).
It is generally accepted that the mtDNA pool of Bantu speaking populations comprises a diverse set of lineages that trace their phylogeographical ancestry into three major sub-continental regions: West Africa, East Africa and West-Central Africa [6, 11, 24]. In spite of the growing knowledge about the ultimate regional sources of Bantu mtDNA lineages, the understanding of the major demographic processes that led to the assemblage and distribution of these diverse regional contributions among the different areas of the Bantu-speaking universe is still far from being complete. Our analysis shows that haplogroups currently associated with the Bantu mtDNA pool from southwest Angola reflect the combination of different regional contributions generally observed in most Bantu-speaking populations [6, 11, 24]. However, both the patterns of lineage sharing and admixture estimates from different potential source populations strongly suggest that the bulk (~75%) of mtDNA variation in southwestern Angola can be traced back just to West-Central Africa, in areas that are adjacent to the original heartland of Bantu expansions. The only additional region with a significant (~13%) genetic contribution to Southwest Angola was southern Africa, indicating that most extant mtDNA variation from southwestern Angola may have simply resulted from the encounter of an offshoot of West-Central Africa with autochthonous Khoisan-speaking peoples from the south.
It is, therefore, likely that the occupation of Southwest Africa was preceded by a period of assemblage of diverse mtDNA contributions up north, in West-Central Africa, followed by subsequent migrations from specific dispersal centers into the southwest. According to linguistic and archeological evidence, a likely dispersal center to Angola would have been located in savanna areas just south of the equatorial forest, around the Tshikapa site, where premetallurgical Bantu speakers originating in Cameroon/Gabon might have acquired iron technology and livestock from eastern Bantu peoples, before proceeding to the southwest [21, 50]. The location of this center on the southern savanna edge of West-Central Africa would explain the lack of Pygmy L1c1a lineages in Angola, in contrast with the areas closer to Cameroon and Gabon where gene flow from Pygmies was more important.
In contrast with the collection of diverse haplogroups that is generally found in the maternal pool, the NRY haplogroup composition is highly homogeneous in most potential source areas of Bantu dispersions, due to the predominance of haplogroup E3a-M2 in West and West-Central Africa [8, 9, 51]. However, STR-defined haplotypes yielded sufficient resolution to allow discrimination between Y-chromosome contributions from West and West-Central Africa and reveal a link between southwestern Angola and West-Central Africa that is remarkably congruent with the results from the mtDNA dataset.
Despite the substantial differences between their levels of haplogroup variation, both NRY and mtDNA data concurred in showing that the populations sampled in Namibe are clustered together with other Bantu groups from elsewhere. Within the local context of southwestern Angola, the divergence of the Herero-speaking Kuvale from other population groups was found to be associated with the lack of signals of demographic expansions (Table 1), suggesting that this differentiation was shaped by increased genetic drift. Reduced levels of mtDNA diversity, likely caused by recent bottlenecks, were previously described in Herero populations from Namibia and Botswana, and seem to be a pervasive feature of these groups [52, 53]. However, it is difficult to know to what extent the present diversity patterns reflect the traditional semi-nomadic pastoral way of life of the Herero or were caused by population size reductions resulting from recent conflicts with colonial rulers [54–56]. Moreover, it is not clear whether genetic drift was sufficient to generate the divergence among Herero-speaking groups that is evidenced by comparisons between the Kuvale from Angola and the Herero from Namibia, who are believed to be the most representative population of the group [17, 56]. In fact, while our Kuvale sample is essentially composed of mtDNA haplogroups that are commonly found in other Bantu populations from Angola (Figure 2 and Additional file 1), earlier data indicate that the Herero from Namibia display an unusually high (~50%) frequency of haplogroup L3d [57, 58], which was not found in the Kuvale and is known to be much less common in most Bantu populations. This pattern may imply that the broad Herero cultural division encompasses a very heterogeneous set of population groups with no obvious common origin.
A further feature of the genetic composition of the Kuvale that is not paralleled by the Herero from Namibia is their substantial level of assimilation of Khoe-San lineages (Table 2). In fact, while Khoe-San lineages were absent in sampled Y chromosomes from the Namibian Herero and may represent at most 8% of their mtDNA pool [57, 58], typical Khoe-San mtDNA L0d and NRY B2b haplogroups reached 22% and 12% frequencies, respectively, in the Kuvale (Figures 2 and 4). Other sampled Bantu groups from southern Angola that are not as cattle dependent as the Kuvale exhibit much lower levels of Khoe-San lineage assimilation (Figure 2, Table 2 and Additional file 1), suggesting that most gene flow occurred between the herding Khoe peoples and the herding Bantu, probably due to the similarity of their social organization. Given the lack of shared haplotypes between Namibe L0d and B2b haplotypes and the sequences available in databases of Khoisan-speaking populations, we have estimated the ages of the introgressed lineages in order to assess the time depth underlying their present differentiation. In spite of their large uncertainty, coalescent estimates pointed to ages ranging from 4816 (± 4816) to 29168 (± 20624) years, which consistently pre-date the expected arrival time of Bantu-speaking populations in southwestern Africa [1, 2]. Thus, it is likely that the divergence of these lineages occurred prior to the recent Bantu expansion. In this context, it is tempting to speculate that the unmatched Angolan L0d and B2b lineages may represent a legacy of the original speakers of Kwadi, an extinct click language remotely related to Central Khoisan that is known to have been spoken in the geographical area presently occupied by the Kuvale [60–62].
We have attempted to infer basic demographic properties of Bantu expansions using the framework of the IM model by assuming that populations located in the southwestern and southeastern edges of sub-equatorial Africa encompass the deepest branches of Bantu divergence after a common origin in West-Central Africa.
A major advantage of the IM class of models is the ability to disentangle the effects of evolutionary factors that are typically confounded in summary statistics based on equilibrium models [19, 20, 48]. However, like other model-based approaches, the IM framework relies on a number of simplifying assumptions that may be violated by empirical datasets. There are at least two assumptions that may influence the validity of parameter estimates in the context of the Bantu expansions. First, the model does not take into account the effects of gene flow from third-party populations, whereas Bantu speakers did undergo regional interactions with local non-Bantu groups that may distort the interpretation of inferred parameter values. Moreover, gene exchange involving unsampled demes lying between the two edge populations may affect inferences on the true patterns of migration, including the degree of asymmetrical gene flow. A second limiting assumption is that the ancestral population is assumed to be unstructured and to have persisted in isolation for a long time before population splitting. However, the patterns of mtDNA variation suggest that the Bantu expansions might have been preceded by complex female-mediated population dynamics involving lineage assemblage in West-Central Africa and the formation of an admixed ancestral population that had no time to achieve panmixy before the expansion. It is possible that the implausibly high divergence time inferred from our mtDNA dataset (t~25000 years; Table 4) was influenced by this kind of older population structure, reflecting lack of panmixy in the ancestral population. In contrast, the more consistent divergence time estimate inferred from the Y-chromosome data (t~2000 years; Table 4) may be related to the erasure of previous ancestral variation that seems to have caused the current predominance of haplogroup E3a-M2, leading to a better fit of the Y-chromosome data to the model.
A further limitation lies in the lack of a geographically explicit framework accounting for the spatial expansion of Bantu-speaking peoples.
In spite of these caveats, several consistent results could be found, showing that the analyses presented here do provide informative parameter estimates that may be contrasted in the future with other inferential frameworks and empirical datasets. Our joint estimation of the time of split between the two edges of Bantu migrations (t~4000 years; Table 4), inferred from clearly resolved posterior density peaks, is remarkably consistent with archeology-based estimates for the onset of the dispersion of Bantu-speaking peoples across Africa [1, 2]. On the other hand, comparisons between estimates based on the NRY and mtDNA data reveal clearly contrasting patterns between the historical demographic parameters of males and females that may account for key present-day properties of Bantu genetic variation.
Previous comparative studies on Y-chromosome and mtDNA variation in Africa provided evidence for sex-biased demographic patterns, including the observation of different levels of correlation between genetic, linguistic and geographic variation, as well as the finding that interpopulation differentiation measured by Fst estimators is higher for the Y-chromosome than for the mtDNA in food-producing societies. The latter pattern was interpreted as the result of higher migration rates and/or effective sizes in females than in males. More recently, a resequencing study of mtDNA and Y-chromosome stretches, performed in the same set of sub-Saharan African populations, has found that signals of population expansion in food-producing populations, including one combined Bantu sample, were limited to the mtDNA, while Y-chromosome data better fit models of population stationarity. We found evidence for a demographic expansion both using the Y-chromosome and the mtDNA datasets (Table 4). However, since we used a different inferential framework, a different set of populations and distinct types of genetic information, it is difficult to evaluate the causes of this discrepancy. In any case, our inferences based on the IM model seem to confirm and extend the previous trends by showing that expanding Bantu females most likely had both greater population sizes (N) and higher migration rates (m) (Table 4).
As previously proposed [67, 68], it is likely that cultural practices like polygyny, leading to a lower male effective size, and patrilocality, leading to a higher female migration rate, were the major driving forces underlying the observed patterns of genetic variation in current Bantu-speaking populations. However, it is important to stress that differences in migration rates among Bantu populations do not necessarily imply differences in the ability to advance and settle new territory. Thus, the higher mobility of females does not mean that males advanced more slowly than females during the range expansion of Bantu populations, but simply that females were more likely to migrate across the different settlements that were progressively established as the Bantu dispersions unfolded.
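The link discussed above between interpopulation differentiation, effective size (N) and migration rate (m) can be illustrated with a minimal sketch. It is not the IM analysis used in this study: it assumes the classical equilibrium infinite-island approximation for a haploid, uniparentally inherited marker, Fst ≈ 1/(1 + 2Nm), which is exactly the kind of equilibrium summary that the IM framework is meant to improve upon. The function name and all numerical values are hypothetical and chosen only for illustration; they are not estimates from Table 4.

```python
def haploid_island_fst(effective_size: float, migration_rate: float) -> float:
    """Approximate equilibrium Fst for a haploid, uniparentally inherited locus
    under the infinite-island model: Fst ~ 1 / (1 + 2 * N * m).

    effective_size: effective number of transmitting individuals per deme
                    (females for mtDNA, males for the Y chromosome).
    migration_rate: per-generation fraction of that sex replaced by migrants.
    """
    if effective_size <= 0:
        raise ValueError("effective_size must be positive")
    if not 0.0 <= migration_rate <= 1.0:
        raise ValueError("migration_rate must lie between 0 and 1")
    return 1.0 / (1.0 + 2.0 * effective_size * migration_rate)


# Hypothetical values (assumptions, not results of this study):
# females have both a larger effective size and a higher migration rate.
fst_mtdna = haploid_island_fst(effective_size=5000, migration_rate=0.002)
fst_nry = haploid_island_fst(effective_size=2000, migration_rate=0.0005)

print(f"mtDNA Fst (females): {fst_mtdna:.3f}")
print(f"NRY Fst (males):     {fst_nry:.3f}")
```

Under these assumed values, the lower male N and m yield a higher expected Fst for the Y chromosome than for the mtDNA, which is the direction of the pattern reported for food-producing societies. Because Fst depends only on the product Nm in this approximation, it cannot separate the contributions of effective size and migration, which is one motivation for using a model that estimates them jointly.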
Based on patterns of lineage sharing and admixture estimates, our analysis provides evidence that most genetic variation from southwestern Angola is likely to have derived from West-Central Africa. The differences in the amount of haplogroup variation between the mtDNA and Y-chromosome data suggest that the push of Bantu peoples out of the rain forests was preceded by the assemblage of diverse mtDNA contributions in West-Central Africa, a process that was not paralleled by the Y-chromosome, in which lineage extinction must have prevailed. Estimates of demographic parameters have shown that contrasting patterns of female and male genetic variation were a pervasive feature of Bantu expansions, characterized by lower effective sizes and migration rates in males than in females. Local interactions between the western vanguard of the Bantu migrations and Khoisan-speaking peoples from the arid regions of the South were essentially mediated by Bantu pastoral peoples like the Herero-speaking Kuvale, who share aspects of their social organization with Khoe cattle herders from adjacent areas. We hypothesize that the East African lactase persistence -14010C mutation was carried to southern Africa by Khoe herders who contacted East African pastoralists and subsequently transferred the mutation to Bantu cattle herders in the course of genetic interactions in the Southwest.
We are grateful to all sample donors, to the Governor of the Namibe Province, Dr. Álvaro de Boavida Neto, and to Dr. Pedro Viyayauca, Chairman of Namibe's Provincial Health Department, for permission to collect samples, and to José Pimentel for logistical support during field work in Angola. This work was partially financed by the following research grants from Fundação para a Ciência e a Tecnologia (FCT): PPCDT/BIA-BDE/56654/2004 and PTDC/BIA-BDE/68999/2006. MC, FS and SB are supported by FCT grants SFRH/BD/22651/2005, SFRH/BPD/27134/2006 and SFRH/BPD/21887/2005, respectively. We would also like to thank António Prista and all colleagues of the Human Biological Variability in Mozambique project for the Mozambican samples, and Nuno Ferrand for comments on the manuscript.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.