With an area of more than three million square kilometers, roughly one third that of Europe, the Sakha Republic (Yakutia) dominates Eastern Siberia. The southern part of Sakha extends to southern Siberia, which has served as an entry region into northern Asia [1, 2]. Southern Siberia connects Sakha with the Inner Eurasian steppe belt, which stretches from the Black Sea to the Yellow Sea and has enabled human movements across large distances from east to west and vice versa. The northeastern part of Sakha overlaps with former Beringia, which connected Asia and America during the Last Glacial Maximum (LGM), permitting human migration to the Americas [3, 4]. Because Sakha, particularly the Lena valley, served as the main pathway to the Arctic coast and, beyond it, to America during Paleolithic times, an understanding of its settlement history is important for elucidating the colonization of Northeast Eurasia as well as the peopling of the Americas.
Anatomically modern humans colonized Sakha, including the high Arctic, about 30,000 years ago. Southern Siberia, as well as the southern part of Sakha, was continuously populated through the LGM [2, 7]. The population began to increase rapidly ~19,000 years ago. Consecutive archaeological cultures in the territory of Sakha hint at multiple waves of migration from southern areas surrounding the upper reaches of the Yenisey, Lake Baikal and the Amur River [9, 10]. Ancient tribes that inhabited this area from the Neolithic onward are regarded as the presumed ancestors of different contemporary circumpolar ethnic groups speaking Paleoasiatic and Uralic languages, while Tungusic-speaking tribes spread all over Siberia at a later time [5, 10]. The ancestors of the Turkic-speaking Yakuts, under Mongol pressure from the south, moved from the Baikal region up the Lena valley, arriving at the middle reaches of the Lena and Vilyuy Rivers presumably during the 11th-13th centuries [5, 10, 11]. In the 17th century Yakutia was incorporated into the Russian Empire.
The first genetic studies of the native populations of Sakha, based on haploid loci (mitochondrial DNA (mtDNA) and the non-recombining part of the Y chromosome (NRY)), primarily focused on the peopling of the Americas [12, 13] and, as a “by-product”, detected a very strong bottleneck in Yakut male lineages. Native Siberians, including populations from Sakha, continue to receive attention in relation to the colonization of the Americas [15–18], while phylogenetic analyses of their uniparental data have added valuable information about the colonization and re-colonization of northeastern Eurasia. Analysis of the Siberian mtDNA pool has provided evidence that rules out a northern Asian route for the initial human colonization of Asia, and has revealed that the present-day northern Asian maternal gene pool consists predominantly of post-LGM components of eastern Asian ancestry [20, 21]. The most frequent Y-chromosome haplogroup in northern Eurasia – N1c – most probably arose in present-day China and spread to Siberia after the founder event associated with the human entry into the Americas. Two other Y-chromosome haplogroups dominant in Siberia – C3 and Q1 – are more ancient in northern Asia [17, 23].
Genetic studies focused on the Yakuts have shown their strong genetic similarity to South Siberian/Central Asian populations [19, 24–28]. MtDNA and Y-chromosome variation in Tungusic sub-groups from different parts of northern Asia (Sakha, Middle Siberia and the Russian Far East) has revealed the shared origin of Evenks and Evens. In addition, mtDNA data from Arctic Siberian populations have shown a genetic discontinuity between Yukaghirs, the oldest population in Sakha, and the adjoining Chukchi, descendants of the latest inhabitants of Beringia. Most of the existing mtDNA data from Sakha populations were obtained by examining hypervariable segment I (HVSI) sequences and a limited number of coding region markers, which allows only the main haplogroups to be determined. However, analyses of large data sets of eastern [31, 32] and northern Asian complete mtDNA sequences [19–21, 30, 33, 34] have significantly refined the topology of the mtDNA phylogeny, providing new informative markers for large-scale population studies. This was an essential prerequisite for clarifying the events that led to the re-colonization of Siberia, as most of the newly defined sub-haplogroups common in Siberia have been dated as post-LGM [20, 21, 35].
A global study of genetic variance encompassing 51 populations, carried out at the level of 650,000 genome-wide single nucleotide polymorphisms (SNPs), revealed that the Yakuts were closest to the Han Chinese, Japanese and other, less numerous East Asian populations. Even so, Yakuts stand out among East Asian populations because of two distinct signals: the first, a minor one, groups Yakuts with Amerinds, probably reflecting the deep shared ancestry of Siberians and Native Americans; the second is explained by an overlap with the major genetic component in European populations. Analysis of a dataset complemented by eleven more Siberian populations differentiated Siberians from East Asian populations. Furthermore, this analysis separated Koryaks and Chukchi from the rest of the Siberians and demonstrated a close genetic proximity between Yakuts and Evenks. However, the population coverage of Siberia in genetic studies is still limited. For instance, the second Tungusic-speaking population of Sakha – Evens – was not represented in previous analyses. Moreover, Siberian populations have so far not been the focus of analyses of autosomal SNP variance patterns.
In the present study, we combined detailed phylogenetic analyses of the maternal and paternal lineages of the native populations of Sakha with the analysis of a genome-wide sample of more than 600,000 SNPs to better define the genetic relationships between Sakha, South Siberia, East Asia, Northeast Siberia, and Europe, with an emphasis on clarifying the genetic history of the native populations of Sakha. We applied phylogenetic analysis to 829 mtDNAs and 375 Y chromosomes from five populations of Sakha (Yakuts, Evenks, Evens, Dolgans and Yukaghirs) and Dolgans of the adjacent Taymyr Peninsula, and applied FST, principal component analysis (PCA) and ADMIXTURE analyses to autosomal SNPs in a sample set combining 40 newly genotyped Siberian individuals, published data on Siberia [36, 37] and relevant global reference populations [36–38].
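To make the autosomal summary statistics named above more concrete, the following is a minimal sketch of how pairwise FST and a PCA could be computed from a matrix of 0/1/2 genotype counts. It is an illustration under stated assumptions, not the pipeline used in this study: the choice of Hudson's FST estimator (ratio of averages), the per-SNP standardization before PCA, the function names `hudson_fst` and `genotype_pca`, and the simulated genotypes are all assumptions introduced here. ADMIXTURE is a separate program and is not reproduced.

```python
import numpy as np


def hudson_fst(g1, g2):
    """Hudson's FST (ratio of averages) between two populations.

    g1, g2: arrays of shape (individuals, SNPs) with 0/1/2 allele counts.
    The estimator choice is an assumption made for this sketch.
    """
    n1, n2 = 2 * g1.shape[0], 2 * g2.shape[0]  # sampled chromosomes
    p1, p2 = g1.sum(axis=0) / n1, g2.sum(axis=0) / n2
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num.sum() / den.sum()


def genotype_pca(g, n_components=2):
    """PCA of a genotype matrix via SVD after per-SNP standardization."""
    g = np.asarray(g, dtype=float)
    p = g.mean(axis=0) / 2.0                   # allele frequency per SNP
    keep = (p > 0) & (p < 1)                   # drop monomorphic SNPs
    z = (g[:, keep] - 2 * p[keep]) / np.sqrt(2 * p[keep] * (1 - p[keep]))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]  # individual coordinates


# Simulated (hypothetical) data: two populations with correlated frequencies.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.1, 0.9, 500)
freq_b = np.clip(freq_a + rng.normal(0, 0.15, 500), 0.05, 0.95)
pop_a = rng.binomial(2, freq_a, size=(20, 500))
pop_b = rng.binomial(2, freq_b, size=(20, 500))

fst = hudson_fst(pop_a, pop_b)
pcs = genotype_pca(np.vstack([pop_a, pop_b]))
print(f"FST = {fst:.3f}, PC matrix shape = {pcs.shape}")
```

In practice, dedicated tools would be applied to quality-filtered, linkage-pruned genotype data; the sketch only shows the shape of the computation.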