Tissue samples from 40 nesting loggerhead sea turtles were collected between July and September 2010 on the island of Sal, Cape Verde. A 3 mm sample was taken from the superficial part of the non-keratinized skin of the flippers using a single-use disposable scalpel immediately after egg deposition. Samples were individually preserved in ethanol until DNA extraction.
All tissues were washed in Milli-Q water for 1 minute and were air dried for 15 minutes. DNA extraction was performed using the DNeasy® 96 Blood & Tissue Kit (QIAGEN, Hilden, Germany). All steps followed the manufacturer’s protocol with the exception of the elution, which was conducted in two steps of 100 μl, re-using the first elution.
In order to compare the MHC based phylogeny with a phylogeny obtained from a neutral maker, we amplified 723 bp of the mtDNA control region for all individuals using LCN15382 and H950 primers . After amplification and cleaning of PCR product using EXoSap, sequences were loaded into an ABI 3730 Genetic Analyzer (Applied Biosystems, Darmstadt, Germany). For more details, see Stiebens et al. . Four different haplotypes were found: CcA1.3, CcA17.1, CcA17.2 and CcA2.1 following the Archie Carr Center for Sea Turtle Research nomenclature (http://accstr.ufl.edu/resources/mtdna-sequences/).
MHC primer design
In order to design primers to characterize the highly polymorphic MHC class I exon 2, GeneBank was searched for MHC sequences of related species to the loggerhead turtle. Reptile and avian MHC class I sequences were aligned using BioEdit version 184.108.40.206  and consisted of sequences from reptiles Malaclemy terrapin (Genebank accession numbers: GQ495891.1), Pelodiscus sinensis (AB185243.1 and AB022885.1), Sphenodon punctatus (FJ457094.1, FJ457093.1), and a bird species Gallus gallus (AY123227.1). Within this alignment, conserved regions in the exon 2 were selected to design several primer pairs. The exon 2 was chosen because it encodes for a part of the peptide-binding groove involved in parasite recognition. After various PCR tests for the best primer combination, Cc-MHC-I-F (5’-GATGTATGGGTGTGATCTCCGGG-‘3) and Cc-MHC-I-R (5’-TTCACTCGATGCAGGTCDNCTCCAGGT-‘3) showed consistent amplification of multiple MHC class I sequences across several cloning procedures. Although, the Cc-MHC-I-R primer shows polymorphism from the 16th to 18th base pair, no better primers could be designed.
MHC amplification, cloning, and sequencing
To reduce the risk of PCR artifacts, two independent 20 μl PCR reactions were prepared. Each “replicate” consisted of 2 μl 10× Dreamtaq® Buffer, 1 μl dNTP’s (10 mM), 2 μl of each primer (5pmol/μl), 0.2 μl Taq Polymerase (Dreamtaq®) and 2 μl template DNA [~20 μg/μl]. Thermal profile started with an initial denaturing step at 95°C for 3 minutes, followed by 30 cycles of 30 seconds at 94°C, 30 seconds at 66°C and 1 minute at 72°C. The final elongation was set for 5 min at 72°C. The volumes of both reactions were then pooled, of which 30 μl was loaded in an agarose gel (1.5%, 5 h at 45 V). This procedure was recommended by  and  in order to reduce PCR artifacts. Bands of the expected size (~220 bp) were excised.
Gel purification followed manufacturer’s protocol for the NucleoSpin Extract II Kit (Macherey-Nagel, Düren, Germany). PCR amplicons were cloned with the Qiagen® PCR cloning Kit (Qiagen, Hilden, Germany). The manufacturer’s ligation protocol was followed, except that the ligation-reaction-mixture consisted of 1 μl pDrive Cloning Vector, of 5 μl Ligation Master Mix and of 4 μl PCR products. The transformation protocol was modified as follows: 5 μl of the ligation-reaction mixture were mixed with 25 μl competent cells. Reactions were then heated for 40 seconds at 42°C. Later, 150 μl SOC medium were added and to allow recombinant growth for Kanamycin selection, the reaction mixture was first incubated for 30 minutes at 37°C (slightly shaken) and then plated on a Kan® IptgX-Gal plate. Plasmids were extracted with the Invisorb® Spin Plasmid Mini Two Extraction Kit (Invitek, Berlin, Germany) as described in Kit’s provided protocol, with a final elution step of 50 μl. Cycle sequencing took place in 10 μl PCR reactions consisting of 1 μl Big Dye® Buffer, 1 μl Big Dye® Terminator, 1 μl of the universal M13 Forward primer, 3 μl of HPLC water and 4 μl of extracted plasmid template. The thermal cycling protocol had a first step for 1 minute at 96°C, then 26 cycles at 96°C for 10 seconds and 50°C for 5 seconds. The elongation final step was set at 60°C for 4 minutes. DNA was precipitated and re-diluted in HiDi before being loaded on an ABI 3130 Genetic Analyzer (Applied Biosystems, Darmstadt, Germany). After comparisons of the different sequences obtained with the different primer pairs, the best combination (i.e. the one providing most sequences) was used for high throughput sequencing on a next generation sequencing platform.
Barcoded 454 sequencing of MHC genes
The 454 next generation sequencing platform using a barcoded deep amplicon approach [29, 30] was chosen because of the long sequence reads and large coverage to help determine high intra and inter individual variability. To this end, DNA concentrations were standardized to 10 ng/μl in order to maximize the likelihood of equal coverage of all samples. As previously described, two independent PCR reactions were performed. For each replicate, the protocol was split into two steps. In the first step, PCR conditions were kept as described above, but the number of PCR cycles was reduced to 25. The first PCR products were used as a template for another 10 PCR cycles. The reconditioning procedure coupled with independent PCR reactions reduces the final proportion of artifacts , a major problem with new sequencing technologies. The reconditioning step used 454 sequencing adaptors (Forward side TitaA CCATCTCATCCCTGCGTGTCTCCGACTCAG; Reverse side TitaB CCTATCCCCTGTGTGCCTTGGCAGTCTCAG, GATC, Constance, Germany), followed by a 10 nucleotide individual tag (MID, Roche) and the newly developed MHC class I primer pair. The MID tags were designed such that the random accumulation of up to two polymerase errors in the MID would still lead to the correct individual identification. For a given individual, replicated PCRs had the same forward MID tags but different reverse MID tags which allowed us to track the product of each PCR reaction all along the amplification and sequencing.
After amplification, amplicons were cleaned using the Qiagen PCR Purification Kit (Qiagen, Hilden, Germany). The cleaned products were run on gels, to verify the presence of the expected bands. From all cleaned samples, DNA concentration was re-measured and all samples were pooled so that each PCR reaction contributed to an equal amount of 100 ng/sample. To remove potential unspecific amplicons, the final pool was loaded on a 1.5% agarose gel (14 h at 30 V). Bands of ~340 bp were cut out and products were extracted as described above.
Individual MHC genotyping
MHC alleles were called and assigned to each individual using Perl scripts. Reads were screened for the forward and reverse sequencing primers, allowing one nucleotide mismatch or indel (insertion/deletion) in case of sequencing errors and otherwise discarded. Remaining reads were then assigned to individuals based on MID tags, again allowing for one nucleotide mismatch or indel. Reads were then trimmed (removing the primer and MID sequence) and aligned using BioEdit, resulting in a set of putative allele variants for each individual. To cull out less reliable sequence variants, alleles were retained only if they met the following criteria per individual: (1) if they appeared in both independent PCR preparations (both MID tags) and (2) if their frequency (in terms of proportion of reads) was above 10% of the most frequently occurring allele within that individual. The remaining variants, although they might stem from different loci, are referred to as “alleles” and make up our final allele dataset.
Errors occurring during the 454 sequencing include substitutions and small indels [29, 30], and these were expected to occur randomly across the sequence. From our MID tags, the frequency of errors resulting in base substitutions was low. Therefore, the probability of multiple, identical substitution errors is estimated to be low . Single-base indels occurring in homopolymer tracts were relatively common and were non-randomly distributed along the sequence. However, such variants were removed with our method because of their low frequency of occurrence within an individual and across independent replicate PCR reactions.
Under positive selection, a relative excess of non-synonymous over synonymous substitutions is expected . We calculated the relative rates of synonymous (d
) and non-synonymous (d
) substitutions following the method of Nei and Gojobory  with the Jukes-Cantor  correction for multiple substitutions implemented in MEGA 4 . The rate ratio d
was tested for significant deviation from one using a Z-test.
MEGA 4 was also used to build a neighbor-joining tree with 1000 bootstraps for all MHC alleles found in the sampled turtles. Two additional neighbor-joining trees were simulated: one based on the control region of the mitochondrial genome (mtDNA) of 6 reptile species and one based on the MHC class I of 5 reptile species.
Maximum likelihood site models implemented in the CODEML program from PAML version 4.4  were used to test for evidence of positive selection and to identify branch-specific positively selected codon sites [ω > 1, where ω = (d
)]. The maximum likelihood procedures evaluate heterogeneous rate ratios (ω) among sites by applying different models of codon evolution. Three likelihood-ratio tests of positive selection were performed comparing the models M1a (nearly neutral) vs M2a (positive selection), M7 (ß) vs M8 (ß + ω), and M8a (ß + ω = 1) vs M8 . In these likelihood-ratio tests, two nested models are compared: a model based on the null hypothesis of no positive selection, and a model that allows some sites to evolve under positive selection. The null model M1a assumes two site classes in the molecule with 0 < ω 0 < 1 and ω1 = 1 in proportions p
0 and p
1 = 1-p
0. The alternative model M2a incorporates another class of sites with ω2 > 1 and the proportion p
2 estimated from the data. The null model M7 assumes a beta distribution for ω, not allowing positive selection (0 < ω < 1). The alternative model M8 has additional classes of sites that allow some codons to evolve under positive selection (ω > 1, ). A third null model M8a differs from model M8 in that its additional class of sites are evolving neutrally (ω = 1). In the models M2a and M8, positively selected sites are inferred from posterior probabilities calculated by the Bayes empirical Bayes method . Because MHC alleles are so variable and often represent ancient lineages (TSP), we thought the evaluation of dN and dS appropriate despite the comparison within a species.
We used the ScoreCons online server  to determine variation for amino acid residues in the exon 2 of the loggerhead turtles. The software MultiLocus 1.22  was used to estimate linkage disequilibrium between detected alleles using 10000 randomizations.
The minimum number of recombinant events (RM) was calculated after Hudson and Kaplan [four-gamete method, McVean et al. using the software DnaSP.
The program GENECONV version 1.81 was used to detect sequence fragments that were likely to have been subjected to gene conversions. GENECONV detects pairs of sequences that share unusually long stretches of similarity given their overall polymorphism . We used global and pairwise permutation tests (10,000 replicates) to assess significance.
Although fitness is difficult to estimate in loggerhead turtles, studies have shown that larger females have a higher clutch size, linking turtle morphometrics to high fecundity [38, 39]. As a fitness proxy we used the curved carapace length corrected (residuals of correlation) for curved carapace width, as equivalent to body condition. Residuals for this correlation were then tested against individual number of MHC alleles (linear and quadratic terms) following . Curved carapace length and curved carapace width were measured for all turtles immediately after egg deposition.