Rapid evolution of BRCA1 and BRCA2 in humans and other primates

Background The maintenance of chromosomal integrity is an essential task of every living organism and cellular repair mechanisms exist to guard against insults to DNA. Given the importance of this process, it is expected that DNA repair proteins would be evolutionarily conserved, exhibiting very minimal sequence change over time. However, BRCA1, an essential gene involved in DNA repair, has been reported to be evolving rapidly despite the fact that many protein-altering mutations within this gene convey a significantly elevated risk for breast and ovarian cancers. Results To obtain a deeper understanding of the evolutionary trajectory of BRCA1, we analyzed complete BRCA1 gene sequences from 23 primate species. We show that specific amino acid sites have experienced repeated selection for amino acid replacement over primate evolution. This selection has been focused specifically on humans and our closest living relatives, chimpanzees (Pan troglodytes) and bonobos (Pan paniscus). After examining BRCA1 polymorphisms in 7 bonobo, 44 chimpanzee, and 44 rhesus macaque (Macaca mulatta) individuals, we find considerable variation within each of these species and evidence for recent selection in chimpanzee populations. Finally, we also sequenced and analyzed BRCA2 from 24 primate species and find that this gene has also evolved under positive selection. Conclusions While mutations leading to truncated forms of BRCA1 are clearly linked to cancer phenotypes in humans, there is also an underlying selective pressure in favor of amino acid-altering substitutions in this gene. A hypothesis where viruses are the drivers of this natural selection is discussed.


Background
Defects in the BRCA1 or BRCA2 genes are responsible for most hereditary forms of breast cancer and account for as many as 10% of all breast cancer cases [1]. Women with a strong family history of cancer who possess a harmful BRCA1 or BRCA2 allele are at high risk for developing breast cancer within their lifetime (80% and 60%, respectively) [2,3]. In addition, BRCA1 mutation carriers have a 30-40% chance of developing ovarian cancer, while BRCA2 mutations also increase the risk of ovarian, pancreatic, prostate, and male breast cancer [2]. Cancers occur when heterozygous individuals experience a somatic loss of heterozygosity event at the BRCA1 or BRCA2 locus, leaving only the abnormal allele intact. Because both gene products play a critical role in key cellular processes such as DNA repair, cell cycle control, and transcriptional regulation, it is clear why inactivating mutations are so detrimental. The importance of these proteins is further evidenced by the fact that both BRCA1 and BRCA2 null mice are embryonic lethal [4].
Given their indispensible functions in maintaining the integrity of the genome, one might expect strict evolutionary conservation of BRCA1 and BRCA2 over time. Indeed, some regions of BRCA1 have experienced purifying selection strong enough to operate even on synonymous mutations [5]. However, contrary to this line of reasoning, a number of groups have documented the rapid evolution of BRCA1 [6][7][8][9][10][11] and BRCA2 [10] in mammals. Rapid evolution occurs when a gene experiences positive natural selection for new, advantageous mutations that arise in a population. Because advantageous mutations commonly involve a change in protein sequence (non-synonymous mutations), recurrent rounds of positive selection in a gene lead to rapid evolution of the encoded protein sequence over time. For BRCA1, the evolutionary rate was particularly elevated on the branches leading to humans and chimpanzees (Pan troglodytes) [6]. The identification of this signature in BRCA1 suggests that some alleles and polymorphisms currently circulating within the human population may offer a selectable advantage. However, both the cause and consequence of this unexpected mode of evolution seen in BRCA1 remain unknown.
Here, we report an extensive evolutionary analysis of the primate BRCA1 gene. In previous studies of BRCA1 evolution, only exon 11 was examined with a limited number of primate species included in the analyses [6][7][8][9][10][11]. To extend previous studies, we have generated full-length BRCA1 sequences for 17 additional primate species. Using this more extensive dataset, we validate the finding of positive selection in humans and their closest ape relatives (in our study, chimpanzees and also bonobos (Pan paniscus)). We also show that specific codons in BRCA1 have experienced recurrent positive selection over evolutionary time, both within and outside of exon 11, resulting in a small number of highly variable residue positions in an otherwise highly conserved protein. In addition, we sequenced exon 11 of BRCA1 from populations of chimpanzee, bonobo, and rhesus macaque (Macaca mulatta) individuals and found that several unique polymorphisms exist within these populations. Two polymorphisms in the chimpanzee population were found to be in Hardy-Weinberg disequilibrium suggesting that selection may still be operating on this gene in modern times. Lastly, exon 11 of BRCA2, another important genetic determinant for hereditary breast and ovarian cancers, was also sequenced from diverse primate species. This gene also bears the surprising signature of positive selection. It is unclear why these critical genes bear this unusual evolutionary signature, but we present one possible hypothesis involving interactions between DNA repair proteins and viruses.

BRCA1 is evolving under positive selection in primates
To expand our understanding of the positive selection shaping BRCA1 in primates, we obtained cell lines from 17 simian primate species, harvested total RNA, and created cDNA libraries. From these, the 5.6 kilobase fulllength coding region of BRCA1 was sequenced. These sequences were combined with full-length BRCA1 sequences from six primate species with available genome projects, creating an alignment of 23 full-length BRCA1 sequences. 17 out of the 23 full-length sequences have never before been analyzed (asterisks in Figure 1A).
The type of selection that a gene has experienced can be inferred from its rate of accumulation of non-synonymous (changing the encoded amino acid; denoted dN) and synonymous (silent; dS) substitutions over time. Proteinaltering mutations are far less likely to be tolerated than synonymous mutations, and so dN/dS < < 1 for the vast majority of genes encoded by human and other mammalian genomes [12]. Some genes, such as pseudogenes, evolve neutrally with dN/dS~1 because there is not strong selection for or against new mutations in these genes. Finally, selection in favor of non-synonymous mutations results in a dN/dS > 1. These genes are classified as being under positive selection, and are experiencing continued selection for "innovation" at the protein sequence level. In these genes, not only has the penalty against protein-altering mutations been relaxed, but this very type of mutation is being selectively retained. Using PAML [13], we fit the full-length BRCA1 alignment (Additional file 1) to models of positive selection where a subset of codons is allowed to evolve with dN/dS > 1 (M2a, M8) and to null models not allowing positive selection (M1a, M7, M8a). Likelihood ratio tests revealed that the dataset fit the positive selection models significantly better than the null models (p < 0.05, Table 1). Thus, BRCA1 has experienced selection in favor of non-synonymous mutations over the speciation of simian primates.
We next estimated dN/dS values on each branch on the primate evolutionary tree using the free-ratio model in PAML. As expected, most branches exhibited a dN/ dS < 1 ( Figure 1A). The branch leading to humans had the most elevated signal with a dN/dS of 2.79. The second highest value of dN/dS on the BRCA1 tree is found on the branch leading to the last common ancestor of bonobos and chimpanzees, with a dN/dS of 2.66. Because the free-ratio model is highly parameterized, we next compared one-ratio and two-ratio models to determine whether selection has differentially affected the human, chimpanzee, and bonobo clade. As shown in Figure 1B, our simian primate dataset fit the two-ratio model significantly better than the one-ratio model, with the human, chimpanzee, and bonobo clade exhibiting a dN/dS of 1.78, while all other branches had a dN/dS of 0.59. In summary, our extended primate dataset shows that BRCA1 is experiencing positive selection, and that the most intense selection has operated on the human/ chimpanzee/bonobo clade.
Based on a comparison of extant and predicted ancestral sequences, humans are estimated to have accumulated 25 substitutions in the BRCA1 gene since their divergence from chimpanzees and bonobos six million years ago, 22 of which are non-synonymous ( Figure 2A). In order to understand how unusual this is, we looked at the evolution of other genes, specifically ones encoding BRCA1-interacting proteins, along the branch leading to humans. Because we do not have extended sequence sets for all of these genes, we took a simpler approach. For each gene, we aligned the human, chimpanzee, and gorilla sequences and manually counted the number of human-specific substitutions (any position where the human gene sequence differs from both the chimpanzee and gorilla gene sequence). These were categorized as non-synonymous (N) or synonymous (S) based on how they affected the codon in which they were found. When these values are normalized to gene size, BRCA1 has the highest enrichment of non-synonymous substitutions [(N/kb)/(S/kb)]. Care must be taken in comparing this metric between genes, because different genes have different equilibrium codon frequencies, and therefore have different mutational opportunities for synonymous and non-synonymous mutations. However, the BRCA1 gene has an enrichment ratio that is more than 4-fold higher than any of the other genes shown ( Figure 2B).
BRCA1 encodes a 220 kDa protein with two conserved domains: an N-terminal RING domain and two tandem C-terminal BRCT domains ( Figure 2C). The RING domain has E3 ubiquitin ligase activity that is essential in the DNA damage response. The BRCT motifs function as a protein-protein interaction module that binds phosphorylated proteins involved in DNA repair, cell cycle control, chromatin remodeling, and transcription. There is also a coiled-coil region between these two domains. Interestingly, all but one of the non-synonymous substitutions predicted to have occurred in the human/ bonobo/chimpanzee clade fall outside of these known structural motifs ( Figure 2C).

Human variation at selected sites in BRCA1
The M8 model allows a class of codons to evolve under positive selection (dN/dS > 1). 10 codons were identified as belonging to this class with a high posterior probability (P = 0.85 or above). These codons do not lie in the region of BRCA1 where it was previously reported that selection might be acting against synonymous mutations [5], potentially given rise to a false signature of dN/dS > 1. Instead, all 10 sites show high variability between primate species at the protein level, often encoding very Figure 1 Evolution of BRCA1 over the course of primate speciation. A. dN/dS values for each branch of the primate phylogeny were calculated using the free-ratio model in PAML [13]. Branches exhibiting dN/dS values > 1 are shown in bold italics. Dashes (−) represent branches where zero synonymous substitutions are predicted to have occurred. On these branches, dS = 0 and dN/dS can therefore not be calculated. In these instances, the numbers of non-synonymous (N) and synonymous (S) substitutions predicted to have occurred along each branch are indicated in parentheses (N:S). Of these, branches that experienced 4 or more non-synonymous substitutions are in bold italics. Asterisks indicate new sequences generated in this study. B. The human, bonobo, and chimpanzee clade was isolated and dN/dS values were calculated using the one-ratio and two-ratio models in PAML. The two-ratio model was a better fit as determined by the likelihood ratio test shown in the box. ω0 is the calculated dN/dS for all branches under the one-ratio model, or for background branches under the two-ratio model, and ω1 is the dN/dS for the isolated branches in the two-ratio model.   dissimilar amino acids (first four rows in Figure 3A). Next, these positively selected codon positions were examined for variability within the human population. The Breast Cancer Information Core (BIC, http://research.nhgri.nih.gov/bic/) is a repository of human BRCA1 polymorphisms. Using this database, we identified single nucleotide polymorphisms (SNPs) at amino acid sites 170, 888, 890, 1203, and 1443 ( Figure 3A). At four out of these five sites (position 888, 890, 1203, and 1443), we find that some human BRCA1 alleles encode a unique amino acid not observed in any of our primate sequences. In addition, SNPs known to cause human disease occur in six out of 10 sites. In all cases, these disease-linked SNPs are not amino acid-altering mutations, but rather more radical frame-shifting or nonsense mutations ( Figure 3A). In particular, nonsense mutations occurring in codon 1443 are among the most common mutations documented in the BIC. In Figure 3B, all 10 sites of positive selection were mapped onto a domain diagram of BRCA1 (bottom) along with the most common human non-synonymous SNPs found in the BIC (top). As described previously for mutations accumulated in the human/chimpanzee/bonobo clade, all but one of the positively selected residues (1370S in the coiled-coil domain) lie outside of any known structural motifs. In summary, the 10 codon positions identified in this analysis are highly variable between primate species and within the human population, and are involved in the etiology of cancers associated with this gene. Disease-associated SNPs at these sites tend to be radical, protein-truncating mutations. However, a presumably distinct phenomenon appears to be driving selection in favor of non-synonymous point mutations at these positions.

BRCA1 variation in other primate populations
So far, we have documented sequence differences between the BRCA1 proteins of different primate species.
We have shown that non-synonymous substitutions are accumulating in BRCA1 faster than expected under constrained, or even neutral, evolution. We next wished to explore whether positive selection is still acting on BRCA1 in modern populations. There is already evidence that this is true in the human population, because several BRCA1 SNPs have been found to depart from Hardy-Weinberg equilibrium in European populations [14,15] and in Australia [6]. We wished to determine if the same might be true in bonobo and chimpanzee populations. We amplified and sequenced the largest BRCA1 exon, exon 11 which is~3.4 kilobases and com-prises~61% of the BRCA1 coding region, from the genomic DNA of seven bonobo and 44 chimpanzee individuals ( Table 2). In bonobos, we found nine polymorphic sites, eight of which were single nucleotide polymorphisms (SNPs), with three of these being non-synonymous. Eight of the SNPs were in Hardy Weinberg equilibrium.
Interestingly, one bonobo individual was also homozygous for a seven amino acid deletion (Δ1058-1064) ( Table 2). Hardy-Weinberg equilibrium was rejected for this polymorphism, although the support was weak and did not survive correction for multiple testing ( Table 2). The chimpanzee sequence set revealed nine SNPs, seven of which were non-synonymous. Interestingly, in this larger sample set (n = 44), three of the non-synonymous SNPs  were found to be in Hardy Weinberg disequilibrium, suggesting that selection is acting either for (E309K and G590S) or against (G1077R) these mutations. The support for one of these (E309K) was weak and did not survive correction for multiple testing ( Table 2). It is particularly intriguing to see that humans also share with chimpanzees this same S/G SNP at position 590. In both the bonobo and chimpanzee populations, all synonymous SNPs were in Hardy-Weinberg equilibrium. We also sequenced exon 11 from 44 rhesus macaque individuals. Rhesus macaques are not part of the human/ chimpanzee/bonobo clade and are instead distantly-related members of the Old World monkey clade ( Figure 1A). In these macaques, we found 12 SNPs in BRCA1, with seven being non-synonymous ( Table 2). This includes a SNP found at position 1203, a site of positive selection in the inter-species dataset. This codon is also the site of a known disease-linked mutation in humans; however, the cancerlinked SNP at this position introduces a stop codon. Nonetheless, all of these are in Hardy-Weinberg equilibrium.
Caution must be used when interpreting signatures of selection acting on polymorphisms in primate populations. When sampling primates, it is not possible to get completely random and non-related population sets. Deviations from Hardy-Weinberg equilibrium may occur due to factors other than selection. Reasons for falsely rejecting Hardy Weinberg equilibrium include 1) nonrandom mating, 2) small population sizes which magnify the effects of genetic drift, 3) introduction of new alleles, 4) population subdivision or admixture, 5) biases in sequencing errors, and 6) linkage disequilibrium with another locus under selection. Because the chimpanzee population consists of individuals from two different subspecies, admixture could plausibly lead to rejection of Hardy Weinberg equilibrium.
We also performed the McDonald-Kreitman and Tajima's D tests on our datasets (data not shown). The tests were not significant and therefore do not support selection acting on any of these polymorphisms. False conclusions in this test can again result from a population with hidden structure. In summary, while the analyses using the simian primate dataset consisting of 23 species suggest that recurrent positive selection has been acting on BRCA1 over the course of several million years, the Hardy-Weinberg equilibrium tests performed here and by others indicate that selection is acting on modern day humans, and possibly also chimpanzees.

BRCA2 is also evolving under positive selection in primates
Because of the rapidly evolving nature of BRCA1, we also completed an evolutionary analysis of BRCA2, another strong determinant for hereditary breast and ovarian cancer. Although BRCA2 has been shown to be under positive selection, only a small number of primate species was included in this study [10]. We sequenced the~5 kilobase exon 11 from 18 primate species. Exon 11 is the largest of 27 exons and encodes about 50% of the entire BRCA2 protein. The sequences, along with six additional sequences from available genome projects, were assembled into a multiple alignment (Additional file 2). We fit the alignment to positive selection and null models as described above. The positive selection models were again a significantly better fit to the sequence set than the null models, with a p value ≤ 0.0003 (Table 1). In summary, BRCA2 is under positive selection in primates as well, although this signature appears not to be concentrated on the human/chimpanzee/bonobo clade (Additional file 3).
In contrast to BRCA1, BRCA2 is a 390 kDa nuclear protein that is exclusively involved in the homologous recombination pathway for repairing double-strand breaks. The eight BRC motifs and the extreme C terminus mediate interactions with and recruitment of Rad51, a protein that catalyzes strand invasion during homologous recombination [16][17][18]. All eight BRC repeats are encoded within exon 11. The M8 model estimates that five codons are evolving under positive selection with posterior probability > 0.85 ( Figure 4A). Two of these positively selected sites were found to have a human polymorphism documented in the BIC ( Figure 4A). When all five sites of positive selection are mapped onto a domain diagram of BRCA2 ( Figure 4B), they cluster within the first three BRC domains (1008, 1225, and 1426) and the intervening regions (1159 and 1272). To examine this further, we aligned the amino acid sequence of all eight BRC repeats of human BRCA2 and highlighted sites 1008, 1225, and 1426 ( Figure 5A). Surprisingly, all three sites of positive selection lie adjacent to a hydrophobic motif (FxxA) known to mediate interactions with Rad51 ( Figure 5A red box). Since the co-crystal structure of the BRCA2 BRC4 in complex with Rad51 is available, we mapped these three sites to their analogous positions in BRC4 and found that they are in close proximity to the Rad51 binding interface ( Figure 5B, PDB: 1N0W) [19]. The clustering of these residues near this interface might provide a clue to the driver of natural selection at these sites.

Discussion
Nearly all known cases of recurrent positive selection in primate genomes involve genes in one of three categories: 1) immunity, 2) environmental perception (such as odorant and taste receptors), or 3) sexual selection and matechoice [21,22]. This is due to the fact that ever-changing external stimuli (i.e. pathogens, environmental odors/ tastes, etc.) drive the selection of new allelic variants. For example, immunity factors that are constantly challenged by pathogens exhibit some of the most striking signatures of positive selection seen in primate genomes [23][24][25][26][27][28].
Here, immunity genes will experience positive selection for protein-altering mutations that improve recognition of a relevant pathogen. Conversely, the pathogen will counterevolve to escape detection, again placing selective pressure on the host population for new mutations that improve the immunity protein. This cycle can repeat itself indefinitely, resulting in an ever-escalating host-virus arms race. Therefore, it is surprising to see that BRCA1 and BRCA2, genes that do not classically fit into any of the three categories listed above, are evolving in a similar manner to these highly adaptive immunity genes. In addition to the two described here, other DNA repair genes have also been shown to evolve under positive selection [29,30], but the driver behind this unusual finding remains to be identified.
An intense battle exists between host DNA repair machinery and viruses, and we propose that this could contribute to the evolutionary signatures documented here. Many viruses are known to interact with the DNA repair machinery and cell cycle regulators [31,32]. One fundamental issue is that the free ends of viral genomes are exposed, in contrast to the host's DNA, which is capped by telomeres. Despite this, many viruses need to access the nucleus where the host's DNA repair machinery recognizes these un-capped viral genome ends as "damaged" cellular DNA, activating the DNA damage response. In order for productive infection to proceed, viruses must actively thwart these host repair pathways. For example, DNA repair proteins interfere with the adenovirus lifecycle by concatenating the ends of newly synthesized viral DNA, inhibiting efficient packaging into viral progeny [33]. In turn, adenovirus has evolved a way around this blockade by encoding proteins that mislocalize or degrade the specific host factors involved. Depending on the virus involved, host DNA repair factors can also be hijacked to facilitate viral replication. For instance, herpes simplex virus-1 simultaneously activates DNA repair constituents that aid in viral genome replication [34,35] and counteracts those that do not [36,37]. Human immunodeficiency virus 1 is also known to activate the DNA damage response and manipulate cell cycle checkpoints through the actions of its accessory protein Vpr [38,39]. Additionally, several studies have shown that specific DNA repair proteins play critical roles in retroviral genome integration [40][41][42][43] while others seem to decrease the efficiency of infection [44][45][46].
One can imagine that these and other viruses that access the nucleus during replication could feasibly interact with BRCA1 or BRCA2, driving the selection of variants that ultimately lead to decreased susceptibility to infection. However, it is possible that variant alleles selected for this purpose would have detrimental consequences to protein function in the context of host DNA repair. Most of the deleterious BRCA1 and BRCA2 variants characterized thus far introduce stop codons or frame-shifts that result in premature truncation of the protein, the consequences of which manifest as cancer at relatively early ages. The effects of non-synonymous point mutations, such as those documented here, might be expected to be much more subtle. The effects of subtle mutations are more difficult to assess because the resulting genomic instability may only be realized later in life and can be confounded by other genetic or environmental influences. We therefore propose a hypothesis where viruses are driving the intriguingly rapid rate of evolution seen in BRCA1 and BRCA2, potentially giving rise to antagonistic pleiotropy. This would be analogous to the malaria and sickle cell anemia trade-off that is well documented [47].

Conclusions
The BRCA1 and BRCA2 proteins play key roles in the repair of damage to chromosomal DNA. We have Figure 5 The sites of positive selection lying within the BRC repeats of BRCA2 are located adjacent to the Rad51 binding region. A. The 8 BRC repeats of the human BRCA2 protein were aligned using ClustalX. The red and peach colored boxes are the motifs within the BRC repeats thought to facilitate binding with Rad51 [20]. Residues 1008, 1225, and 1426 are colored in green, orange, and yellow, respectively. All three sites lie just adjacent to the FxxA motif which interacts with two hydrophobic pockets in the Rad51 oligomer. B. The co-crystal structure of BRC4 (blue) in complex with Rad51 (grey) is shown (PDB ID 1N0W [19]). The FxxA motif is depicted in red. Residues 1008, 1225, and 1426 are shown in green, orange, and yellow, respectively. expanded the analysis of the evolution of these genes, showing that both have been subject to recurrent positive selection during simian primate speciation. Although the force or forces driving the diversifying selection of these genes is unknown, the result is that the sequence of these proteins has been altered in humans and our closest living relatives. It remains to be seen whether this is an instance of antagonistic pleiotropy, where positive selection driven by one force causes functional consequences in another context, potentially the formation of cancers [48].

Non-human primate samples
Of the 44 chimpanzee samples evaluated in this study, 34 were obtained from the Chimpanzee Biomedical Research Resource (NIH8U42OD011197-13), which is supported through a cooperative agreement with the National Institutes of Health (NIH). This NIH-supported colony is housed at the MD Anderson Cancer Center's Michale E. Keeling Center for Comparative Medicine and Research (KCCMR) in Bastrop, TX. The origins of the chimpanzees comprising the KCCMR colony are highly diverse with only a few closely related (siblings/offspring) animals in the colony (Additional file 4). Blood from 34 chimpanzees was collected directly into PAXgene Blood RNA Tubes (PreAnalytix) at the same time other blood samples were obtained as part of the prescheduled annual veterinary exam for each animal. Another 10 chimpanzee genomic DNA samples were purchased from Coriell (Additional file 5).
All 44 rhesus macaque samples evaluated in this study were obtained from animals housed at the KCCMR in collaboration with researchers at this institution. The colony at the KCCMR is a closed breeding colony comprised of approximately 980 rhesus macaques of Indianorigin that originated from a colony of 286 founder animals in 1988 (degree of relatedness can be found in Additional file 6). Blood from these animals was collected directly into PAXgene Blood RNA Tubes (PreAnalytix) at the same time other blood samples were obtained as part of the prescheduled annual veterinary exam for each animal.
Bonobo genomic DNA samples were obtained from the integrated primate biomaterials and information resource (IPBIR) of the Coriell Institute or extracted from blood samples obtained from the Columbus zoo and the Language Research Center, Georgia State University. All seven individuals are unrelated (Additional file 7).
The remaining non-human primate samples were acquired as cell lines purchased from the Coriell Institute under a U.S. Fish and Wildlife Service permit (sources and unique identifiers are listed in Additional file 8). This study was approved by the University of Texas at Austin Institutional Review Board.
Primate BRCA1 and BRCA2 sequencing Human BRCA1 and BRCA2 coding sequences were obtained from GenBank (accession number NM 007294 and NM 000059, respectively). BRCA1 and BRCA2 sequences from chimpanzee, gorilla, orangutan, rhesus macaque, and marmoset were obtained using the BLAT alignment tool on the UCSC genome database (http://genome.ucsc.edu/). For the remaining 18 primate sequences, primary or immortalized cell lines were grown in standard media supplemented with 15% fetal bovine serum at 37°C and 5% CO 2 . Cells were collected and RNA was extracted using the AllPrep DNA/RNA kit (QIAGEN). cDNA libraries were generated using SuperScript III First-Strand Synthesis Kit (Invitrogen) using oligo dT or random hexamer primers. PCR products were generated using PCR Super-Mix High Fidelity (Invitrogen) and directly sequenced or cloned into pCR4 for sequencing. Primers used for PCR and sequencing can be found in Additional files 9, 10, 11 and 12. These sequences have been deposited in GenBank (accession numbers KM017616-KM017652).
Blood from rhesus macaque and chimpanzee individuals was collected in PAXgene Blood RNA Tubes (Pre-AnalytiX). RNA was extracted using the PAXgene Blood miRNA Kit (QIAGEN) and genomic DNA was obtained using the AllPrep DNA/RNA kit (QIAGEN). BRCA1 Exon 11 was amplified from extracted genomic DNA (chimpanzee, bonobo, and rhesus macaque) using PCR SuperMix High Fidelity (Invitrogen) and sequenced. Details on PCR and sequencing primers can be found in Additional file 9 and 10.

PAML analysis
A multiple sequence alignment was generated for BRCA1 and BRCA2 using ClustalX2.1 [49]. The alignments are straight-forward with only a few small indels (Additional files 1 and 2). Gene sequences at each ancestral node were reconstructed using the codeml program in PAML 4.3 [50]. dN/dS values along each branch of the phylogenetic tree were calculated using the free-ratio model. Substitution counts given along specified branches are the estimates made in the free ratio model, but were also calculated by directly comparing the predicted ancestral and the known extant sequences and counting differences manually. Both methods yielded the same values. The one-ratio and tworatio models were performed as described previously [51]. To detect selection, multiple alignments were fit to the NSsites models M1a (null model, codon values of dN/dS are fit into two site classes, one with value between 0 and 1, and one fixed at dN/dS = 1), M2a (positive selection model, similar to M1a but with an extra codon class of dN/dS > 1), M7 (null model, codon values of dN/dS fit to a beta distribution bounded between 0 and 1), M8a (null model, similar to M7 except with an extra fixed codon class at dN/dS = 1), and M8 (positive selection model, similar to M7 but with