Sample collection procedure
Buccal swabs were collected from males over eighteen years old, unrelated at the paternal grandfather level, from locations in South East Nigeria as shown in Table 1. All buccal swabs were collected anonymously with informed consent. Ethical approval was obtained from the University College Hospitals and University College London Joint Committee on the Ethics of Human Research (reference number 99/0196). Sociological data were also collected from each individual, including age, current residence, birthplace, self-declared cultural identity, first language, second language and (when available) clan affiliation, together with similar information on the individual's father, mother, paternal grandfather and maternal grandmother. Clan identities were verified against the Cross River and Akwa Ibom State Population Bulletin 1982-90. The samples were classified into groups primarily by first language spoken, then by place of collection and thirdly, when available, by clan or some other subsidiary criterion. Where collections from a particular group were made in more than one location (for example, the Ediene Abak were collected from two neighbouring villages, Afaha Esang and Ikot Ubom) and co-ordinate data were available for both sites, the location is represented by the average of the site co-ordinates.
Buccal swabs and similar sociological data as described above were also collected from males eighteen years or older unrelated at the paternal grandfather level from the following groups:
CA-BT: Tikar speakers from Bankim Cameroon (n = 34), CA-FB: Bamoun speakers from Foumban Cameroon (n = 117), CA-WA: Aghem speakers from Wum Cameroon (n = 118), GH-AEW: Twi speakers from Enchi Ghana (n = 21), GH-AKE: Twi speakers from Kibi Ghana (n = 51), GH-ASWW: Twi speakers from Sefwi Wiawso Ghana (n = 22), GH-EHVR: Ewe speakers from Ho Ghana (n = 88), GH-FEWR: Fante speakers from Enchi (n = 61).
Standard phenol-chloroform DNA extractions were performed on all samples.
Assembly of comparison NRY and mtDNA datasets
NRY data for five microsatellites (DYS19, DYS390, DYS391, DYS392, DYS393) were assembled from previous studies conducted on sub-Saharan African populations for comparison with the data generated in this study. The populations considered were Namibe from Angola; Bangui from the Central African Republic; Ngumbacam, Bamileke and Ewondo from Cameroon; Fali, Fulani, Mandara and Tupuri from Northern Cameroon; Bakaka and Bassa from Southern Cameroon; individuals from Equatorial Guinea; Fang from Gabon; individuals from Guinea-Bissau; individuals from Mozambique; Yoruba from Nigeria; Hutu from Rwanda; Bantu speakers from South Africa; and Sukuma from Tanzania.
HVS-1 VSO haplotype data from positions 16030 to 16360 were also assembled from previous studies for the following populations: Namibe from Angola; Bamileke and Ewondo from Cameroon; individuals from Mozambique; Hutu from Rwanda; Wolof from Senegal; Temne from Sierra Leone; and Shona from Zimbabwe.
The NRY of all South East Nigerian samples, as well as all Cameroonian and Ghanaian samples, was typed in the following manner. Standard TCGA kits were used to characterise six microsatellites (DYS19, DYS388, DYS390, DYS391, DYS392, DYS393) and eleven biallelic Unique Event Polymorphism (UEP) markers (92R7, M9, M13, M17, M20, SRY+465, SRY4064, SRY10831, sY81, Tat, YAP), as described by Thomas et al. Microsatellite repeat sizes were assigned according to the nomenclature of Kayser et al. Where necessary, the additional markers M191 and U175 were typed using a tetra-primer ARMS PCR method. Each PCR involved four oligonucleotide primers and resulted in the amplification of a full fragment (control band) and one allele-specific fragment (see supplementary materials for further details [Additional file 2: Supplemental Table S12]). P12f2 was typed as described by Rosser et al. NRY haplogroups were defined by the 14 UEP markers according to the nomenclature proposed by Karafet et al. [Additional file 1: Supplemental Figure S5]. The markers were chosen on the expectation that, in addition to NRY types of recent African origin, we would be likely to characterise a minority of NRY types of recent European origin due to possible introgression from North Atlantic slave traders.
The mtDNA Hypervariable Segment 1 (HVS-1) region of all South East Nigerian samples, as well as all Cameroonian and Ghanaian samples, was sequenced as described by Veeramah et al. HVS-1 Variable Site Only (VSO) haplotypes were determined for all samples from South East Nigeria by comparing sequence data covering nucleotides 16020-16400 with the Cambridge Reference Sequence [52, 53]. Haplotypes were defined by base changes and by the nucleotide positions at which substitutions, insertions or deletions occurred. Tentative mtDNA Africa-specific haplogroup classification was based on the scheme of Salas et al. HVS-1 VSO haplotypes were also determined for all samples from Cameroon and Ghana with sequence data covering nucleotides 16023-16380. South East Nigerian HVS-1 coverage was reduced to this range for comparisons including these groups.
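As an illustration of how a VSO haplotype can be read off a sequence, the following is a minimal sketch and not the procedure used in the study. It assumes the sample and reference have already been aligned to equal length with "-" marking gaps. The function name `vso_haplotype`, the variant notation and the toy sequences are hypothetical.

```python
def vso_haplotype(sample, reference, start):
    """List the sites at which an aligned sample differs from the reference.

    sample, reference: pre-aligned strings of equal length, '-' marks a gap.
    start: reference position of the first reference base (e.g. 16020).
    The notation ('16022T', '16022del', '16022+A') is illustrative only.
    """
    if len(sample) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    position = start - 1  # position of the last reference base consumed
    for ref_base, obs_base in zip(reference.upper(), sample.upper()):
        if ref_base != "-":
            position += 1
        if obs_base == ref_base:
            continue
        if ref_base == "-":
            # Extra base in the sample, recorded after the last reference site.
            variants.append(f"{position}+{obs_base}")
        elif obs_base == "-":
            variants.append(f"{position}del")
        else:
            variants.append(f"{position}{obs_base}")
    return tuple(variants)


# Toy example: one substitution relative to a six-base reference fragment.
print(vso_haplotype("ACTTAC", "ACGTAC", start=16020))
```

Restricting two datasets to a common range, as was done for comparisons involving the Cameroonian and Ghanaian samples, would in this representation amount to discarding variants whose positions fall outside the shared interval.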
Statistical and population genetic analysis
Genetic differences between pairs of populations when individuals in populations were characterised by a) NRY UEP haplogroups, b) combined NRY UEP haplogroup and six microsatellite haplotypes (UEP+MS) or c) mtDNA HVS-1 VSO haplotypes were assessed using an Exact Test of Pairwise Population Differentiation (ETPD) with 10,000 Markov steps [54, 55].
Population genetic structure was estimated using hierarchical Analysis of Molecular Variance (AMOVA) based on a particular mutation model. When a simple structure of populations within a single group was defined, this generated a single Fixation Index statistic, FST. When a more complex structure of populations within multiple groups was defined, it generated three Fixation Indices: FST (the within-population Fixation Index), FSC (the among-populations within-group Fixation Index) and FCT (the among-group Fixation Index). Significance of the Fixation Indices was assessed by randomly permuting individuals (given that only haploid systems were considered) among populations or groups of populations, depending on the Fixation Index being tested. Fixation Indices were recalculated after each of the 10,000 rounds of permutation to create a null distribution.
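The single-group case can be sketched as follows. This is an illustrative re-implementation of the standard one-level AMOVA variance decomposition with a permutation test, not the Arlequin code. The function names, the 0/1 identity distance and the one-sided p-value convention (proportion of permuted values at least as large as the observed value) are assumptions made for the example.

```python
import random
from itertools import combinations


def identity_distances(haplotypes):
    """Squared distance of 0 for identical haplotypes and 1 otherwise."""
    return [[0.0 if a == b else 1.0 for b in haplotypes] for a in haplotypes]


def phi_st(dist, labels):
    """One-level AMOVA fixation index from a squared-distance matrix."""
    n_total = len(labels)
    pops = {}
    for index, label in enumerate(labels):
        pops.setdefault(label, []).append(index)
    n_pops = len(pops)
    # Sums of squared deviations: total and within populations.
    ssd_total = sum(dist[i][j] for i, j in combinations(range(n_total), 2)) / n_total
    ssd_within = sum(
        sum(dist[i][j] for i, j in combinations(members, 2)) / len(members)
        for members in pops.values()
    )
    ms_among = (ssd_total - ssd_within) / (n_pops - 1)
    ms_within = ssd_within / (n_total - n_pops)
    # Average sample size corrected for unequal population sizes.
    n0 = (n_total - sum(len(m) ** 2 for m in pops.values()) / n_total) / (n_pops - 1)
    var_among = (ms_among - ms_within) / n0
    total = var_among + ms_within
    return 0.0 if total == 0 else var_among / total


def permutation_test(dist, labels, n_perm=10000, seed=0):
    """Permute individuals among populations to build a null distribution."""
    rng = random.Random(seed)
    observed = phi_st(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if phi_st(dist, shuffled) >= observed:
            hits += 1
    return observed, hits / n_perm


# Toy data: two populations that share no haplotypes.
haps = ["A", "A", "A", "B", "B", "B"]
pops = [1, 1, 1, 2, 2, 2]
print(permutation_test(identity_distances(haps), pops, n_perm=2000))
```

The hierarchical case adds a second grouping level and permutes either individuals or whole populations depending on the index being tested, which this sketch does not attempt.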
Population pairwise genetic distances were estimated from Analysis of Molecular Variance φST values. The genetic distances used were a) FST (when individuals in populations were described by UEP haplogroups, UEP+MS haplotypes and mtDNA HVS-1 VSO haplotypes), b) RST (when NRY were characterised by the six microsatellites) and c) the Kimura 2-parameter model (which allows different transition and transversion rates) with a gamma value of 0.47 (K2) (when mtDNA was characterised by HVS-1 sequences with gaps removed). Significance of genetic distances was assessed by permutation of individuals, as described above for testing the significance of Fixation Indices. All of the above analyses were performed using Arlequin software.
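For reference, the Kimura 2-parameter distance between two aligned sequences can be sketched as below, with an optional gamma correction for rate variation among sites. This is a textbook formulation written for illustration and is not taken from Arlequin. The function name `k2p_distance`, the handling of gaps (sites are skipped) and the toy sequences are assumptions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2, gamma_shape=None):
    """Kimura 2-parameter distance; sites with gaps or ambiguity are skipped.

    gamma_shape: if given, apply the gamma correction with this shape value.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for this model")
    if gamma_shape is None:
        return -0.5 * math.log(w1) - 0.25 * math.log(w2)
    a = gamma_shape
    return (a / 2) * (w1 ** (-1 / a) + 0.5 * w2 ** (-1 / a) - 1.5)


# Toy example: one transition (A to G) in ten sites.
print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA", gamma_shape=0.47))
```

A small shape value such as 0.47 implies strong rate heterogeneity among sites, so the corrected distance is larger than the uncorrected one for the same pair of sequences.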
Principal Coordinates Analysis (PCO) was performed using the 'R' statistical package (http://www.R-project.org) by implementing the 'cmdscale' function found in the 'mva' package on pairwise FST (or equivalent) matrices.
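The computation underlying PCO (classical multidimensional scaling) can be sketched in a few steps: square the distances, double-centre the matrix, and scale the leading eigenvectors by the square roots of their eigenvalues. The version below is a dependency-free illustration and not the 'cmdscale' implementation. It uses shifted power iteration with deflation to find the leading eigenpairs, which is an assumption chosen for brevity. All function names and the toy matrix are hypothetical.

```python
import math
import random


def double_centre(dist):
    """Return B = -1/2 * J * D^2 * J for a symmetric distance matrix D."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]


def top_eigenpair(mat, rng, iterations=10000, tol=1e-12):
    """Largest (algebraic) eigenvalue and its vector by shifted power iteration."""
    n = len(mat)
    shift = max(sum(abs(x) for x in row) for row in mat)  # Gershgorin bound
    vec = [rng.uniform(-1, 1) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = [sum(mat[i][j] * vec[j] for j in range(n)) + shift * vec[i]
               for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break
        nxt = [x / norm for x in nxt]
        change = max(abs(nxt[i] - vec[i]) for i in range(n))
        vec = nxt
        if change < tol:
            break
    value = sum(vec[i] * sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n))
    return value, vec


def pco(dist, n_axes=2, seed=0):
    """Coordinates of each item on the first n_axes principal coordinates."""
    rng = random.Random(seed)
    mat = double_centre(dist)
    n = len(mat)
    axes = []
    for _ in range(n_axes):
        value, vec = top_eigenpair(mat, rng)
        scale = math.sqrt(value) if value > 1e-12 else 0.0
        axes.append([scale * x for x in vec])
        # Deflate so the next pass finds the next eigenpair.
        mat = [[mat[i][j] - value * vec[i] * vec[j] for j in range(n)]
               for i in range(n)]
    return [[axis[i] for axis in axes] for i in range(n)]


# Toy matrix: distances among the corners of a 3-4-5 right triangle.
print(pco([[0, 3, 4], [3, 0, 5], [4, 5, 0]]))
```

Matrices of FST values are not guaranteed to be Euclidean, so some eigenvalues can be negative. In this sketch an axis with a non-positive eigenvalue is returned as zeros.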
Time to most recent common ancestor (TMRCA) estimates, based on the level of haplogroup-specific microsatellite diversity, and their associated confidence intervals (CIs) were obtained using YTIME software (http://www.ucl.ac.uk/tcga/software/index.html). An inter-generation time of 25 years was applied to convert from generations to years. Under a single-stepwise mutation model a mutation rate of 0.002 was used; under a length-dependent mutation model the constants a and b in the equation μ = a + bL were set to -0.004758677 and 4.46E-04 respectively (YTIME user guide, http://www.ucl.ac.uk/tcga/software/index.html). The most frequent haplotype in the corresponding haplogroup was used as the ancestral haplotype; this method therefore does not take into account error in the choice of ancestral haplotype in the genealogy.
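To make the quantities above concrete, the sketch below evaluates the length-dependent rate μ = a + bL and computes a simple point estimate of TMRCA from the average squared distance (ASD) between each chromosome and the modal haplotype, divided by the mutation rate. This is an assumed, simplified estimator shown for illustration and should not be read as a description of YTIME's internals; it produces no confidence intervals. The function names, the interpretation of L as allele length in repeat units, and the toy haplotypes are hypothetical.

```python
from collections import Counter

A_CONST = -0.004758677   # constant a in mu = a + b*L (from the text)
B_CONST = 4.46e-4        # constant b in mu = a + b*L (from the text)
GENERATION_YEARS = 25    # inter-generation time (from the text)


def length_dependent_rate(repeat_count):
    """Mutation rate for an allele of the given length (assumed repeat units)."""
    return A_CONST + B_CONST * repeat_count


def modal_haplotype(haplotypes):
    """Most frequent haplotype, used here as the assumed ancestral type."""
    return Counter(tuple(h) for h in haplotypes).most_common(1)[0][0]


def asd_tmrca_generations(haplotypes, mutation_rate=0.002):
    """ASD to the modal haplotype, averaged over loci, divided by the rate."""
    ancestral = modal_haplotype(haplotypes)
    squared = sum(
        (allele - anc) ** 2
        for hap in haplotypes
        for allele, anc in zip(hap, ancestral)
    )
    asd = squared / (len(haplotypes) * len(ancestral))
    return asd / mutation_rate


# Toy data: four chromosomes typed at two microsatellites (repeat counts).
toy = [(14, 21), (14, 21), (15, 21), (14, 22)]
generations = asd_tmrca_generations(toy)
print(generations, generations * GENERATION_YEARS)
print(length_dependent_rate(14))
```

Because the modal haplotype is taken as ancestral without error, any uncertainty in that choice is ignored here just as it is in the method described above.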
Mantel and Partial Mantel tests  were performed between genetic distance and both geographic and linguistic distance using the 'R' package 'Vegan', which uses the Pearson product-moment method. Significance was assessed by permuting the rows and columns of the matrices 1,000 times.
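The simple (non-partial) Mantel test can be sketched as follows: correlate the off-diagonal entries of two distance matrices, then permute the rows and columns of one matrix together to obtain a null distribution. This is an illustrative stand-alone version and not the 'Vegan' code. The function names, the one-sided p-value with the +1 correction and the toy matrix are assumptions.

```python
import math
import random


def upper_triangle(mat):
    """Off-diagonal entries above the diagonal, row by row."""
    n = len(mat)
    return [mat[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant matrix")
    return cov / math.sqrt(var_x * var_y)


def mantel(mat_a, mat_b, n_perm=1000, seed=0):
    """Mantel correlation and a one-sided permutation p-value."""
    n = len(mat_a)
    target = upper_triangle(mat_b)
    observed = pearson(upper_triangle(mat_a), target)
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        # Apply the same permutation to rows and columns of mat_a.
        permuted = [[mat_a[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        if pearson(upper_triangle(permuted), target) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy example: a matrix compared with itself gives a correlation of 1.
positions = [0, 1, 3, 6, 10]
d = [[abs(a - b) for b in positions] for a in positions]
print(mantel(d, d))
```

A partial Mantel test additionally removes the effect of a third matrix (for example, geography when testing language) before correlating, which is not shown here.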
Geographic distances were Great Circle distances estimated from latitude and longitude data. Linguistic distances were constructed as described in the supplementary materials [Additional file 1: Supplemental Section 3], drawing from lexicostatistics reported in the literature and incomplete data matrix prediction algorithms.
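A Great Circle distance between two sampling locations can be computed from latitude and longitude with the haversine formula, as in the sketch below. The Earth radius of 6371 km, the use of a simple arithmetic mean to combine co-ordinates from multiple collection sites, the function names and the example co-ordinates are all assumptions made for illustration; they are not values from the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumed constant


def mean_location(sites):
    """Arithmetic mean of (latitude, longitude) pairs in decimal degrees."""
    lat = sum(site[0] for site in sites) / len(sites)
    lon = sum(site[1] for site in sites) / len(sites)
    return lat, lon


def great_circle_km(a, b):
    """Haversine distance in km between two (latitude, longitude) points."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


# Hypothetical co-ordinates for two neighbouring collection sites and a
# third location; none of these are taken from the study.
group_a = mean_location([(5.0, 7.0), (5.2, 7.4)])
group_b = (6.0, 8.5)
print(great_circle_km(group_a, group_b))
```

Averaging latitude and longitude directly is adequate for sites a few kilometres apart but would need more care for widely separated locations.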
Median Joining Networks were constructed for NRY data as described by Helgason et al. and for mtDNA data as described by Vilar et al.
NRY and mtDNA simulations were performed as described in the supplementary materials [Additional file 1: Supplemental Section 2]. Their results can be compared with the empirical data to guide our understanding of the effect of migration rate and sample size on genetic structure in the Cross River region. These simulations are at best crude approximations of the true Cross River region system and do not explore the full likely parameter space; they are therefore not formally statistically assessed against our observed data.