A common genetic heritage of the Austro-Asiatic groups
Mundari populations show O-M95 as the most common haplogroup; only three of the 22 populations (Lodha, Savara and Mahali) depart from this general trend (Table 2), which appears to be because of their disputed origin. This haplogroup is also found at relatively high frequency in the Khasi and the Nicobarese. It is therefore not surprising that in a recent study all 12 Shompen samples from the Nicobar Islands, like those of their linguistic neighbors in the region, the Nicobarese, carried O-M95. This suggests that the Mundari, Khasi-Khmuic and Mon-Khmer groups of India are not only linguistically related but also genetically linked, probably through a single but relatively broad paternal genetic source. This haplogroup has been reported to be absent or present at low frequency in other linguistic groups of India [20, 25–29], suggesting a distinct genetic identity of the Indian Austro-Asiatic populations. In Southeast Asia, on the other hand, while the Austro-Asiatic populations show a high frequency of O-M95 (average 38%), their neighboring populations also show a considerable frequency (14.7%). However, this haplogroup has negligible presence in North and Central Asia (Fig. 5 and Table 5). Thus the predominance of this haplogroup in the Austro-Asiatic populations of both India and Southeast Asia, together with its absence or negligible presence in other Asian populations, suggests a common genetic heritage of the people of this linguistic family.
The virtual absence of O-M95 in the Tibeto-Burman populations of India [20, 28, 29] suggests that the migrations of these populations into India were not accompanied by this haplogroup. Therefore, its presence in the Garo tribe of Meghalaya is likely due to a high degree of gene flow from the neighboring Khasi, facilitated by the matrilocal system of marriage in these two tribes [11, 12]. Similarly, the relatively high frequency (29%) of haplogroup O-M122 in the Austro-Asiatic Khasi may be due to gene flow from the neighboring Garo; this is supported by the similar frequency and subclade composition of O-M122 in the two tribes (χ² = 1.597; p = 0.45). Concurrently, no separate Y-STR lineages could be identified within the subclades in the M-J network (Fig. S2 [see Additional file 1]). The comparative data suggest that the Southeast Asian Austro-Asiatics near the northeastern border of India have either O-M133* or O-M134* subclades (63%), whereas the majority of the Austro-Asiatic populations from geographically distant Southeast China and Cambodia [24, 33] have the O-M159 subclade (65%). This suggests that Austro-Asiatic populations of different regions carry different subclades of O-M122, each characteristic of the neighboring non-Austro-Asiatic groups, possibly due to extensive admixture. Therefore, the presence of O-M133* and/or O-M134* subclades in the Austro-Asiatic Khasi as well as in the Tibeto-Burman populations of Northeast India, including the Garo, may imply that the O-M122 in the Khasi had its source in the neighboring Tibeto-Burman groups, particularly the Garo. Although the foregoing analysis suggests that the Austro-Asiatic populations of India share common genetic ties, a comparative analysis among the sub-families suggests that these populations separated quite early and are now well differentiated, as indicated by the results of AMOVA (Table 3), the M-J network (Fig. 4) and the TMRCA (Table 4).
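The Khasi versus Garo comparison above rests on a Pearson χ² test of O-M122 subclade counts. The sketch below shows how such a statistic and its p-value can be computed. Because the degrees of freedom are not stated here, it assumes a 2 × 3 table (two tribes × three subclade classes), giving df = 2; for df = 2 the upper-tail χ² probability has the closed form exp(-x/2). The function names are illustrative and are not taken from the original analysis.

```python
import math


def chi2_statistic(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows (tribes) and columns (subclades).
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_df2(stat):
    """Upper-tail p-value for a chi-square statistic with exactly 2 degrees of freedom.

    Only valid for df = 2, where the survival function is exp(-x / 2).
    """
    return math.exp(-stat / 2.0)


# Reported Khasi vs Garo statistic; df = 2 is an assumption (2 x 3 table).
print(round(p_value_df2(1.597), 2))  # 0.45
```

Under the df = 2 assumption the reported statistic reproduces the reported p-value. For arbitrary table sizes, `scipy.stats.chi2_contingency` computes the statistic, p-value, degrees of freedom and expected counts directly (note that it applies Yates' continuity correction by default when df = 1).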
Origin of haplogroup O-M95 and expansion of Austro-Asiatic populations
Given the overwhelmingly high frequency of O-M95 in the Austro-Asiatic populations, it is most likely that this haplogroup originated among them. The question, however, is whether it originated in India or in Southeast Asia. The most likely region of origin of a haplogroup can be identified on the basis of two criteria: the highest frequency and the highest diversity. The maximum frequency of O-M95 among the 8 Southeast Asian Austro-Asiatic populations is only 35% after excluding 3 populations with small sample sizes. In contrast, the sample sizes for the Mundari populations are generally large (~35–109, except for two), and the frequency of the haplogroup ranges from 39% to 98% with an average of 63% (excluding the three populations of disputed origin), which is significantly higher than that of the Southeast Asian Austro-Asiatics (χ² = 108.60; p < 0.001). Furthermore, the haplotype diversity among the Mundari populations is as high as 99%. Given this, and the fact that this haplogroup is nearly absent in other parts of India as well as in Western and Central Asia, we infer that O-M95 originated in the Mundari populations roughly 65,000 YBP (95% CI 25,442–132,230), as suggested by the TMRCA. Therefore, the ancestors of the present-day Mundari populations must have come to India prior to the origin of haplogroup O-M95, probably in the Pleistocene. This is consistent with the archeological evidence, which suggests human habitation in mainland India during early Paleolithic times [35–37].
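Haplotype diversity, one of the two criteria used above, is commonly computed with Nei's unbiased estimator, h = n/(n - 1) × (1 - Σ p_i²), where p_i is the frequency of the i-th distinct Y-STR haplotype among n sampled chromosomes. The minimal sketch below assumes haplotypes are given as tuples of repeat counts; the toy data are hypothetical and are not the Mundari samples.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype (gene) diversity for a list of hashable haplotypes."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies (probability two draws are identical).
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical 3-locus haplotypes (repeat counts); not real sample data.
sample = [(13, 14, 16), (13, 15, 16), (14, 14, 16), (13, 14, 16)]
print(round(haplotype_diversity(sample), 3))  # 0.833
```

A value close to 0.99 means that almost every sampled chromosome carries a distinct haplotype, which is what makes high diversity a useful indicator of a lineage's age and place of origin.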
Kayser et al., however, suggested a Southeast Asian origin of haplogroup O-M95, implying migration of Austro-Asiatic populations from Southeast Asia to India. Based on the presence of East Asian mtDNA haplogroups, Kumar et al. and Thangaraj et al. suggested that the non-Mundari Austro-Asiatic groups of India (Nicobarese and Khasi) migrated from Southeast Asia. However, no such maternal genetic link has been found between the Mundari tribes [8–10] and those of Southeast Asia. Further, our analysis of 1147 samples representing most of the Mundari tribes, including the transitional groups, shows a total absence of East Asian mtDNA haplogroups (Kumar V, Reddy BM and Langstieh BT, unpublished results), suggesting that the maternal genetic histories of the three sub-families of Indian Austro-Asiatics differ, in contrast to the common paternal genetic history outlined earlier. How do we account for this discrepancy? Since all the sub-families of Austro-Asiatics share a paternal genetic link in haplogroup O-M95, with no counterpart in the mtDNA, a predominantly male-driven migration of Austro-Asiatic populations appears to be a strong possibility. Most importantly, however, haplogroup O-M122, which is considered the signature haplogroup of Southeast Asian populations, is absent among the Mundari populations, whereas inferences of migration from Southeast Asia rest principally on the presence of haplogroup O-M122 [18, 20, 33]. The age estimated for haplogroup O-M122 is between 15,000 and 60,000 YBP, whereas it is ~8,000 YBP for O-M95 in Southeast Asia. Therefore, had the Indian Austro-Asiatic populations migrated from Southeast Asia, they would be expected to carry haplogroup O-M122, and/or the TMRCA estimated for O-M95 among the Mundaris of India should have been much lower (Table 4) than that obtained (~65,000 YBP). Given the large confidence interval (25,442–132,230), this estimate needs to be viewed with caution. Nevertheless, the lower bound of our estimate for the Mundari is still higher than, and does not overlap with, the upper limit obtained by Kayser et al. Since Kayser et al. used a relatively higher mutation rate and only 7 of the 16 loci, we reanalyzed our data using those 7 loci and their mutation rate and observed a similar TMRCA (~65,000 YBP), suggesting that the TMRCA of the present study may not be an artifact of the larger number of loci and the lower mutation rate. Furthermore, the Mundari populations are considered to be traditionally hunters and food-gatherers, and at present they inhabit areas unfit for cultivation, which may reflect their traditional mode of subsistence. Therefore, a migration of the Mundari populations during the demic expansion of agriculturalists in the Neolithic, as has been suggested for the Nicobarese, appears improbable. Based on this evidence, we suggest that the ancestors of the present-day Mundari populations migrated to Southeast Asia rather than coming from it. This scenario is also consistent with the inference that Mundari is grammatically and phonologically the most conservative branch of the Austro-Asiatic family [2, 38] and is more similar to Proto-Austro-Asiatic than the other branches of the family, suggesting that the linguistic ancestors of the Mundari populations originated in India. The foregoing analysis therefore suggests an in situ origin of haplogroup O-M95, most probably in the ancestors of the present-day Mundari populations, who might have carried it further to Southeast Asia.
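The sensitivity of the TMRCA to the number of loci and the mutation rate, discussed above, can be illustrated with the average squared distance (ASD) estimator under the stepwise mutation model: the expected squared repeat-count difference from the ancestral haplotype grows by μ per generation, so T ≈ ASD / μ generations. This is a generic sketch, not necessarily the method used for Table 4 or by Kayser et al. It assumes that the modal allele at each locus stands in for the ancestral state, and the mutation rates and 25-year generation time below are hypothetical placeholders.

```python
from collections import Counter
from statistics import mean


def tmrca_years(haplotypes, mu_per_locus_per_gen, generation_years=25.0):
    """ASD-based TMRCA estimate in years under the stepwise mutation model.

    Assumption: the modal allele at each locus is a proxy for the ancestral allele.
    """
    n_loci = len(haplotypes[0])
    asd_per_locus = []
    for locus in range(n_loci):
        alleles = [h[locus] for h in haplotypes]
        modal = Counter(alleles).most_common(1)[0][0]
        # Mean squared repeat-count distance from the assumed ancestral allele.
        asd_per_locus.append(mean((a - modal) ** 2 for a in alleles))
    generations = mean(asd_per_locus) / mu_per_locus_per_gen
    return generations * generation_years


# Hypothetical two-locus haplotypes and mutation rates; not the study's data.
hap = [(13, 14), (13, 15), (14, 14), (13, 14)]
slow = tmrca_years(hap, mu_per_locus_per_gen=0.0025)
fast = tmrca_years(hap, mu_per_locus_per_gen=0.005)
print(round(slow), round(fast))  # 2500 1250
```

Doubling the assumed mutation rate halves the estimate, which is why TMRCAs obtained with different rates are only comparable after reanalysis under a common rate and locus set, as was done above.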
The results of AMOVA (Table 3), the M-J network (Fig. 4) and the TMRCA of haplogroup O-M95 (Table 4) suggest an early separation of the Mundari and the other Austro-Asiatic populations. Given this early separation, we expected that a sublineage of O-M95 might have originated in at least one of these groups. However, none of the groups showed the sublineage O-M88 (Fig. 2). To date, this lineage has been reported in only 1 sample, from the region of Cambodia and Laos, suggesting that it is present at very low frequency and probably originated in, and is confined to, that region. Therefore, if such sublineages exist, they are probably defined by other binary markers that are yet to be identified. Since the Khasi show a relatively high frequency of O-M122 (29%), and populations of the Khasi-Khmuic sub-family are concentrated in the regions north of Burma and Thailand (Fig. 1), one may suspect that Khasi-Khmuic populations migrated from Southeast Asia to India. However, the presence of O-M122 in the Khasi appears to be due to gene flow from the neighboring Garo, suggesting that this population was initially devoid of this haplogroup. Moreover, Indian mtDNA haplogroups constitute 30% of the mtDNA motifs of the Khasi subtribes (Reddy BM et al., unpublished manuscript), and these are practically absent in their Tibeto-Burman neighbors [8, 41]. Therefore, the presence of East Asian mtDNA among the Khasi could be due to gene flow from the neighboring Garo and other Tibeto-Burman populations, which carry virtually only East Asian mtDNA haplogroups. This may reinforce the suggestion that the Mundari and Khasi-Khmuic populations separated long ago and that the latter probably moved to Southeast Asia via the northeast Indian corridor, as reflected in their geographic distribution (Fig. 1).
The Nicobarese are also quite distinct from both the Mundari and the Khasi-Khmuic tribes, as revealed by the AMOVA (Table 3) and the M-J network (Fig. 4) based on Y-STRs. This tribe has only East Asian female lineages [6, 7] and only O-M95 as the male lineage (Fig. 2 and Table 2). We also performed AMOVA based on the same set of 16 Y-STRs for the Shompen tribe, which is also a Mon-Khmer group. The results suggest that the Shompen, like the Nicobarese, are quite distinct from the Mundari (FST = 0.402) and the Khasi (FST = 0.476). The TMRCA of the Nicobarese (~17,000 YBP) and the Shompen (~19,000 YBP), together with the distribution of Mon-Khmer populations (Fig. 1), which is confined to the lower parts of Burma and Thailand, Vietnam and Cambodia, suggests that they migrated from Southeast Asia to India during the demic expansion of agriculturalists in the Neolithic. The complete absence of O-M122 among them appears to be due to the profound impact of founder effect and subsequent genetic drift, although sampling bias due to the small sample sizes cannot be ruled out.
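Whether the absence of O-M122 in the Nicobarese and Shompen could simply reflect small sample sizes can be gauged with a binomial argument: if a haplogroup has population frequency p, the probability that none of n randomly sampled, unrelated men carries it is (1 - p)^n. The 10% frequency below is hypothetical and chosen only for illustration; the sample size of 12 mirrors the number of Shompen samples mentioned earlier, and random sampling is assumed.

```python
import math


def prob_absent(freq, n):
    """Probability that a lineage at population frequency freq is missed in n random samples."""
    return (1 - freq) ** n


def min_samples_to_detect(freq, confidence=0.95):
    """Smallest n for which the lineage is observed at least once with the given confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - freq))


# Hypothetical 10% frequency; n = 12 mirrors the Shompen sample size.
print(round(prob_absent(0.10, 12), 3))  # 0.282
print(min_samples_to_detect(0.10))      # 29
```

Under this hypothetical frequency, a sample of 12 would miss the lineage more than a quarter of the time, and roughly 29 samples would be needed to observe it with 95% confidence, which illustrates why the sample-size caveat cannot be dismissed.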
Two possible routes of entry of Austro-Asiatics into the Indian Subcontinent
Kumar and Reddy suggested that the ancestors of the Indian Austro-Asiatic tribes may have migrated from Africa to India either via Northeast Asia through the northeastern Indian corridor or via Central Asia through the western Indian corridor. The sister clade of haplogroup O-M175, i.e. haplogroup N-LLY22g, is confined to Northeast Asia, including Russia and Siberia (Table 5), and is absent or found at negligibly low frequency in Central, South and Southeast Asia. Similarly, haplogroup O-M175 and its subclades are either absent or found at low frequency in South Asia (except among Austro-Asiatics) and Central Asia, while they are present throughout East Asia. Two alternative scenarios can be envisaged. 1) Modern humans probably migrated from Africa to Northeast Asia via Central Asia, where haplogroups N-LLY22g and O-M175 might have originated [43, 44]. Subsequently, populations carrying haplogroup O-M175 might have migrated to India, where haplogroup O-M95 could have originated and later spread to Southeast Asia. However, since 100% of the Mundari populations and 30% of the Khasi samples show Indian-specific mtDNA, whereas all the Southeast Asian Austro-Asiatic populations have only East Asian-specific mtDNA, this migration could have been primarily male-driven. 2) In view of this, the possibility that the ancestors of Austro-Asiatics migrated from Central Asia to India through the western Indian corridor cannot be discounted, as it can account for the contrasting patterns of mtDNA in the Indian and Southeast Asian Austro-Asiatics. Although neither haplogroup O-M175 nor N-LLY22g has been reported from Central Asia, many studies have observed a reasonably high frequency of haplogroup K-M9* [15, 24, 26], and it is possible that these samples fall in the haplogroup defined by the binary marker M214, which connects haplogroups O and N. However, this marker has not been typed in the Central Asian populations.
A section of the population might have migrated towards Northeast Asia, where haplogroup N originated, while another wave moved towards South Asia and entered India through its western corridor, where haplogroup O-M175 originated. Haplogroup O-M95 might have evolved subsequently as a predominant male lineage alongside the Indian-specific female lineages. Later, a primarily male-driven and rapid migration of these people to Southeast Asia via Northeast India might have resulted in the total absence of Indian-specific mtDNA haplogroups, and the presence of 100% East Asian motifs, in the Southeast Asian Austro-Asiatics. Fossils of anatomically modern humans excavated in East Asia are dated to no older than 40,000 YBP [18, 47, 48], which may imply that the earliest possible migration of Austro-Asiatic populations to Southeast Asia occurred about 40,000 YBP or later. Therefore, the Mundari populations appear to be one of the earliest source populations, from which the Khasi-Khmuic and Mon-Khmer populations separated quite early and migrated to and settled in Southeast Asia, while another, much later wave of migration by Mon-Khmer people from Southeast Asia through Thailand and coastal southern Burma to the Andaman and Nicobar Islands can be inferred from the current spread of Mon-Khmer populations (Fig. 1).