Characterization and 454 pyrosequencing of Major Histocompatibility Complex class I genes in the great tit reveal complexity in a passerine system

Sepil, Irem; Moghadam, Hooman K; Huchard, Elise; Sheldon, Ben C

doi:10.1186/1471-2148-12-68

Table 2 Rationale for each step of the variant validation procedure

Variant validation procedure	Rationale
1 Remove variants that do not match the expected allele sizes (212, 215 or 221 bp)	Variants that have deletions/substitutions shifting the reading frame probably result from sequencing errors (Assumption 1)
2 Remove variants that have fewer than four copies in the whole dataset	Variants represented once in an individual probably result from sequencing errors (Assumption 4), and variants represented in only one individual probably result from PCR errors (Assumption 5)
3 Remove individuals with fewer than 200 reads	A low number of reads per individual might lead to incomplete genotyping, making the results unreliable (Assumption 6). The minimum number of reads required per individual is estimated using the probability distribution plotted by Galan et al. [28]
4 Remove variants that have MPAF lower than 0.01	Variants represented rarely in the whole dataset probably result from sequencing errors (Assumption 2)
Remove variants that have MPAF between 0.01 and 0.025 if they can be explained as a chimera or a single base-pair mutation	Variants that are rare in the whole dataset but more frequent on a per-individual basis probably result from PCR errors if the parental sequences are also present (Assumption 3)
5 Remove variants that have a single copy per individual	Variants represented once in an individual probably result from sequencing errors (Assumption 4)
Remove variants that have fewer than five copies per individual if they can be explained as a chimera or a single base-pair mutation	Variants represented two, three or four times within an individual probably result from PCR errors if the parental sequences are present (Assumption 3). The threshold for PCR errors is estimated from the distribution of artefacts in the previous step
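
The five steps in the table can be read as a sequential filter over per-individual read counts. The Python sketch below illustrates that reading only; it is not the authors' pipeline. Several things are assumptions made for illustration: the input layout (a dict mapping each individual to a dict of variant sequence and read count), the names `validate_variants` and `is_artefact`, the interpretation of MPAF as a variant's maximum frequency across individuals, the choice to count an individual's reads after steps 1 and 2, the treatment of "parental sequences" as any more abundant variants, and the deliberately simple chimera test (a prefix of one parent joined to a suffix of another).

```python
from collections import Counter

ALLELE_SIZES = {212, 215, 221}  # expected allele lengths in bp (step 1)


def is_artefact(seq, parents):
    """Return True if seq is one substitution away from a parent, or is a
    two-parent chimera (prefix of one parent joined to suffix of another).
    Hypothetical helper; a simplified stand-in for real artefact detection."""
    parents = [p for p in parents if p != seq]
    for p in parents:
        if len(p) == len(seq) and sum(a != b for a, b in zip(seq, p)) == 1:
            return True
    for i in range(1, len(seq)):
        heads = [p for p in parents if p.startswith(seq[:i])]
        tails = [p for p in parents if p.endswith(seq[i:])]
        if any(h != t for h in heads for t in tails):
            return True
    return False


def validate_variants(reads, min_copies=4, min_reads=200,
                      mpaf_low=0.01, mpaf_high=0.025, pcr_max=5):
    """reads: {individual: {variant_sequence: read_count}} (assumed layout)."""
    # Step 1: keep only variants of an expected allele size.
    data = {ind: {v: n for v, n in vs.items() if len(v) in ALLELE_SIZES}
            for ind, vs in reads.items()}

    # Step 2: drop variants with fewer than min_copies reads in the dataset.
    totals = Counter()
    for vs in data.values():
        totals.update(vs)
    data = {ind: {v: n for v, n in vs.items() if totals[v] >= min_copies}
            for ind, vs in data.items()}

    # Step 3: drop individuals with fewer than min_reads remaining reads.
    data = {ind: vs for ind, vs in data.items()
            if sum(vs.values()) >= min_reads}

    # Step 4: compute MPAF (assumed: maximum per-individual frequency).
    mpaf = {}
    for vs in data.values():
        depth = sum(vs.values())
        for v, n in vs.items():
            mpaf[v] = max(mpaf.get(v, 0.0), n / depth)
    kept = set()
    for v, f in mpaf.items():
        if f < mpaf_low:
            continue  # rare everywhere: treated as a sequencing error
        more_common = [p for p, g in mpaf.items() if g > f]
        if f <= mpaf_high and is_artefact(v, more_common):
            continue  # borderline and explainable: treated as a PCR error
        kept.add(v)

    # Step 5: per-individual clean-up of singletons and low-copy artefacts.
    result = {}
    for ind, vs in data.items():
        vs = {v: n for v, n in vs.items() if v in kept and n > 1}
        result[ind] = {
            v: n for v, n in vs.items()
            if n >= pcr_max
            or not is_artefact(v, [p for p, m in vs.items() if m > n])
        }
    return result
```

Calling `validate_variants(reads)` returns the same nested structure with rejected variants and under-sequenced individuals removed. The thresholds are exposed as keyword arguments because the table notes that some of them (the 200-read minimum and the PCR-error cut-off) were estimated from the data rather than fixed in advance.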

ISSN: 2730-7182

BMC Ecology and Evolution