- Research article
- Open Access
Gene synteny comparisons between different vertebrates provide new insights into breakage and fusion events during mammalian karyotype evolution
BMC Evolutionary Biology volume 9, Article number: 84 (2009)
Genome comparisons have made possible the reconstruction of the eutherian ancestral karyotype but also have the potential to provide new insights into the evolutionary inter-relationship of the different eutherian orders within the mammalian phylogenetic tree. Such comparisons can additionally reveal (i) the nature of the DNA sequences present within the evolutionary breakpoint regions and (ii) whether or not the evolutionary breakpoints occur randomly across the genome. Gene synteny analysis (E-painting) not only greatly reduces the complexity of comparative genome sequence analysis but also extends its evolutionary reach.
E-painting was used to compare the genome sequences of six different mammalian species and chicken. A total of 526 evolutionary breakpoint intervals were identified and these were mapped to a median resolution of 120 kb, the highest level of resolution so far obtained. A marked correlation was noted between evolutionary breakpoint frequency and gene density. This correlation was significant not only at the chromosomal level but also sub-chromosomally when comparing genome intervals of lengths as short as 40 kb. Contrary to previous findings, a comparison of evolutionary breakpoint locations with the chromosomal positions of well mapped common fragile sites and cancer-associated breakpoints failed to reveal any evidence for significant co-location. Primate-specific chromosomal rearrangements were however found to occur preferentially in regions containing segmental duplications and copy number variants.
Specific chromosomal regions appear to be prone to recurring rearrangement in different mammalian lineages ('breakpoint reuse') even if the breakpoints themselves are likely to be non-identical. The putative ancestral eutherian genome, reconstructed on the basis of the synteny analysis of 7 vertebrate genome sequences, not only confirmed the results of previous molecular cytogenetic studies but also increased the definition of the inferred structure of ancestral eutherian chromosomes. For the first time in such an analysis, the opossum was included as an outgroup species. This served to confirm our previous model of the ancestral eutherian genome since all ancestral syntenic segment associations were also noted in this marsupial.
By comparison with other vertebrates, mammals display a high degree of karyotype variability. Chromosome numbers vary considerably, ranging from 2n = 6 in the Indian muntjak  to 2n = 102 in the red viscacha rat . Despite this numerical variability, conserved (syntenic) chromosome segments have been successfully identified by means of comparative cytogenetics . A conserved genome framework, initially concealed by inter-species karyotypic divergence, was first revealed by comparative gene mapping, but became readily apparent with the advent of comparative chromosome painting. The application of these methodologies has served to confirm the presence of a limited number of chromosomal segments which have been evolutionarily conserved across a variety of mammalian species [4–7]. Taken together, these approaches have allowed the reconstruction of synteny maps of a number of ancestral mammalian genomes [8–12].
Recently performed comparisons of entire genome sequences have extended our understanding of the evolutionary history of mammalian genomes by revealing the presence of a limited number of syntenic segments with highly conserved gene orders, termed 'conserved linkage groups' [9, 13–16]. These segments can be used, almost as if they were pieces of a giant jig-saw puzzle, to compare extant genomes as well as to reconstruct ancestral genomes. Both comparative chromosome painting and genome sequence comparisons have indicated that the human genome possesses an organization which is highly conserved evolutionarily and which displays considerable similarity to the postulated ancestral eutherian karyotype [10, 12, 17] dating from ~105 million years ago (MYA) .
Ancestral genome models deduced from comparative cytogenetic analysis exhibit marked differences when compared to reconstructions of ancestral eutherian genomes based on whole genome sequence alignments [19, 20]. Recently, we devised a simplified method of comparative genome analysis based on the comparison of gene order in different species. By focussing exclusively on the relative positions of genes instead of aligning large contigs of genomic DNA, this method reduces the complexity of whole genome alignments thereby facilitating the identification of conserved syntenic segments. This technique was used successfully to identify the evolutionary origin of the mammalian X chromosome from three distinct ancestral chromosome building segments  and has also made possible the reconstruction of a vertebrate protokaryotype from 450 MYA . Since this methodology relies upon in silico gene order comparisons using genome sequence data from different species, an approach reminiscent of comparative chromosome painting, the in silico approach has been termed 'E-painting' (electronic chromosome painting) .
Estimates of the number, location and extent of evolutionary breakpoint intervals vary owing to methodological differences, and this variation has helped to fuel considerable controversy. Recent comparative genome sequence studies have been interpreted as indicating that evolutionary chromosomal rearrangements are non-randomly distributed across mammalian genomes and that the associated breakpoints have often been 'reused' [9, 23, 24]. The resulting 'fragile breakage model' of genome evolution has therefore presented a direct challenge to the now classical 'random breakage' model of Nadeau and Taylor.
In this study, we have performed an in silico genome-wide analysis of synteny (E-painting) in order to improve our understanding of the organization of the ancestral eutherian genome. Our analysis employed genome sequence data from human, mouse, rat, dog, cow (http://aug2007.archive.ensembl.org/Bos_taurus/index.html; Btau_3.1) and opossum, genomes which have all been sequenced with at least 7-fold coverage. The chicken genome sequence was also included in our comparison since previous studies have shown that chicken genome organization displays a remarkable resemblance to that of eutherian mammals [9, 21] despite its evolutionary divergence about 310 MYA.
Establishment of syntenic relationships and reconstruction of ancestral karyotypes
A previous comparative synteny analysis of about 3000 human genes and their orthologues in 5 other vertebrate species permitted the first reconstruction of an ancestral vertebrate karyotype . In this study, we have extended this comparative approach to identifying syntenic segments of orthologous genes and included all those human genes for which orthologues have been annotated in the genomes of mouse, rat, dog, cow, opossum (a marsupial) and chicken. Beginning with 28197 human genes (Human Genome Assembly 18, NCBI build 36), the number of orthologous genes in the studied species identified by the program BioMart ranges from 12591 in chicken to 17796 in mouse (Table 1). The maximum number of orthologous genes identifiable in a given species (by comparison with human) was recruited on the basis that the higher the number of genes employed in the analysis, the more precise would be the identification of the breakpoint intervals. Had we considered only those genes for which a one-to-one orthology relationship was identifiable in all species under investigation, this would have resulted in a considerable decrease in the number of genes to be analysed and hence a substantial decrease in the degree of resolution possible. Instead, the genome-wide coverage attained by using the maximum number of orthologous genes identifiable between human and the other studied vertebrate species served to optimize the resolution of the mapping of the evolutionary chromosomal breakpoints.
The dataset from human, representing the best characterized vertebrate genome to date (as well as one of the evolutionarily most conserved karyotypes among eutherian mammals), provided the reference against which segments of conserved syntenic genes could be identified in the genomes of the other species under investigation. In principle, blocks or segments containing syntenic human genes were sought which are also present as blocks of syntenic genes in the other species under study. Conversion of the syntenic segment associations into colour-coded ideograms rendered the conserved syntenic segments (and at the same time, the breakpoint intervals) readily identifiable (Figure 1; Additional file 1). The colour code employed in Figure 2 was used to indicate the orthologous relationships of syntenic segments in a comparison of the different species with human, as depicted in Figure 1, Additional file 1 and Figure 3. For example, the region of human chromosome 1 between positions 1.27 Mb and 67.23 Mb is identifiable as a continuous (syntenic) segment on rat chromosome 5 and mouse chromosome 4 (Figure 1). During our analysis, we considered as evolutionary breakpoints those disruptions in gene order (synteny) that resulted from (i) interchromosomal rearrangements in an ancestral species as deduced by comparing human with one of the other six species under investigation and (ii) intrachromosomal inversions that occurred in the human lineage where both breakpoint regions could be identified. If the breakpoint region of an interchromosomal rearrangement, identified by comparing the human genome with that of another species, was found to coincide with the breakpoint of an intrachromosomal rearrangement in any one of the other species, this intrachromosomal breakpoint was also considered as a break in synteny.
Employing these criteria to define evolutionary breakpoint intervals, a total of 526 such intervals, with an average size of 290 kb and a median size of 120 kb, were identified (Table 2; Additional file 2). To visualize all syntenic breakpoint intervals, chromosome ideograms were drawn up such that all breakpoints were arranged equidistantly, with the precise positions of the breakpoint intervals being demarcated by the genomic coordinates of the flanking genes (an exemplar is shown in Figure 1 for HSA1, whilst all ideograms from chromosomes 1 to 22 are depicted in Additional file 1). The orthologous relationships between the analyzed genomes served to identify a total of 38 different ancestral syntenic segments which are indicated by a colour code in Figure 2. The ideograms in Figure 1 and Additional file 1 are equivalent to a reverse chromosome painting dataset of the six analyzed species onto human chromosomes at high resolution. The precise positions of the genes flanking all identified breakpoint intervals are listed in Additional file 2.
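The way a breakpoint interval is delimited by its flanking genes can be illustrated with a minimal sketch. It is not the pipeline used in this study: the `Gene` record, the function name and the toy gene list are hypothetical, and only interchromosomal breaks against a single compared species are handled (inversions would additionally require gene order in the other species). The only values taken from the text are the HSA1 segment boundaries of 1.27 Mb and 67.23 Mb on mouse chromosome 4.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start_mb: float   # start coordinate on the human chromosome (Mb)
    end_mb: float     # end coordinate on the human chromosome (Mb)
    other_chrom: str  # chromosome carrying the orthologue in the compared species


def breakpoint_intervals(genes):
    """Return (left_gene, right_gene, start_mb, end_mb) for each break in synteny.

    `genes` holds the annotated genes of one human chromosome. A break is
    called wherever two neighbouring human genes have orthologues on different
    chromosomes of the compared species. The interval runs from the end of the
    left flanking gene to the start of the right flanking gene.
    """
    ordered = sorted(genes, key=lambda g: g.start_mb)
    intervals = []
    for left, right in zip(ordered, ordered[1:]):
        if left.other_chrom != right.other_chrom:
            intervals.append((left.name, right.name, left.end_mb, right.start_mb))
    return intervals


# Hypothetical toy data: gene names and most coordinates are invented.
toy = [
    Gene("geneA", 1.27, 1.30, "4"),
    Gene("geneB", 30.00, 30.05, "4"),
    Gene("geneC", 67.20, 67.23, "4"),
    Gene("geneD", 67.35, 67.40, "6"),
    Gene("geneE", 70.00, 70.10, "6"),
]
print(breakpoint_intervals(toy))
```

In this toy case a single interval is reported between geneC and geneD. Its width depends only on how closely annotated orthologues flank the break, which is why a larger orthologue set sharpens the mapping.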
The graphical compilation of syntenic disruptions shown in Additional file 1 indicates that 7.6% of the evolutionary breakpoints (N = 40 of 526, highlighted by stippled green lines) have been 'reused' i.e. breakpoints were found in the same genomic intervals in at least three species from two different clades (reused breakpoints are marked in red in Additional file 1). The assignment of the species under investigation to different clades within the mammalian phylogenetic tree is indicated in Additional file 3 (during this analysis, chicken and opossum were considered as two different clades). Taking all autosomes into consideration, 218 breakpoint regions were identified in a comparison of the chicken and human genomes whereas 153 breaks in synteny serve to differentiate the human and opossum chromosomes. A total of 27 breakpoints were found to be shared between chicken and opossum but were not observed in any other species, suggesting that these constitute evolutionary breakpoints that occurred in the eutherian common ancestor (Additional file 2). A comparison of the gene orders exhibited by both murid species with those of humans, revealed 106 breaks in synteny (Additional file 2). However, only 4 breaks in synteny were specific to the rat whereas 17 were specific to the mouse. The many murid-shared breaks in synteny (N = 85) as compared with humans is clearly a reflection of the extended common phylogenetic history of mouse and rat, which only became separated into distinct species 16–23 MYA [32, 33]. The two ferungulate species, dog and cow, only share 14 breaks, with 65 breaks being restricted to the canine lineage and 114 breaks confined to the bovine lineage . The much higher number of lineage-specific breaks in these two species, both of which belong to the Laurasiatheria, is indicative of the longer period of time that has elapsed since the evolutionary divergence of the carnivores and artiodactyls ~88 MYA .
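The reuse criterion stated above (a break in the same interval in at least three species from two different clades) can be written down as a short sketch. The clade labels below are an assumption made for illustration, since Additional file 3 is not reproduced here. The text states only that chicken and opossum were treated as two separate clades, that mouse and rat are murids, and that dog and cow are ferungulates. Intervals broken in all six species are labelled primate-specific, following the parsimony argument made in the text.

```python
# Hypothetical clade labels, chosen for illustration only.
CLADE = {
    "mouse": "rodents",
    "rat": "rodents",
    "dog": "ferungulates",
    "cow": "ferungulates",
    "opossum": "marsupial",
    "chicken": "bird",
}


def classify(species_with_break):
    """Classify one breakpoint interval from the species showing a break there."""
    species = set(species_with_break)
    if species == set(CLADE):
        # A break seen in every compared species is most parsimoniously
        # a single event on the primate (human) lineage.
        return "primate-specific"
    clades = {CLADE[s] for s in species}
    if len(species) >= 3 and len(clades) >= 2:
        return "reused"
    return "not reused"


print(classify({"mouse", "rat", "dog"}))   # three species, two clades
print(classify({"mouse", "rat"}))          # two species, one clade
```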
The version of the cow genome used for our analysis (Btau_3.1) may contain some local errors caused by intrachromosomal misplacement of scaffolds. These intrachromosomal inconsistencies are not however relevant to the tests we have performed since we were primarily interested in analysing interchromosomal rearrangements between the human and bovine genomes.
Several breaks in synteny were identified in mouse, rat, dog, cow, opossum and chicken that are common to all six species (Additional file 2). The most parsimonious explanation for this observation is not breakpoint 'reuse' but rather that these were primate- (or even human-) specific breaks. Some 63 such primate lineage-specific breakpoints were identified and these are indicated by stippled red lines in the ideograms (Fig. 1A, Additional file 1). Most of these breaks appear to have been caused by primate-specific inversions (N = 22, Table 3). Proportional to its length, HSA17 is especially rich in such primate-specific inversions. A disproportionate number of these inversions were also noted in the orthologous segment of HSA19p in the lineage leading to rodents, in the orthologous segment of HSA20p in the lineage leading to chicken and in the orthologous segment of HSA1 in the canine lineage (Additional file 1). The remaining primate-specific breakpoints may be attributable to chromosome fusions and insertions of small segments.
Employing the previously described method of concatenating overlapping conserved syntenic segments , the eutherian mammal genome data permitted the seamless assembly of conserved segments into ancestral chromosomes. Ancestral associations between conserved syntenic segments are identifiable by virtue of the presence of shared orthologies between mammalian chromosomes from at least three different species. The resulting model of the ancestral boreoeutherian genome (Figure 3), with a chromosome number of 2n = 46, describes the karyotype of the last common ancestor of primates and rodents (superorder Euarchontoglires, Additional file 3) as well as of carnivores and cetartiodactyls (superorder Laurasiatheria).
Chromosomal sites of syntenic breakage
High precision syntenic breakpoint mapping permits the evaluation, at least in principle, of whether or not these evolutionary breaks coincide with potential hotspots of chromosomal rearrangement such as fragile sites or cancer-associated breakpoints. Fragile sites are classified as either rare (spontaneously occurring) or common (inducible). Altogether, some 89 common fragile sites have been mapped at the cytogenetic level, although only the 11 most common autosomal fragile sites have been precisely characterized at the molecular level [35, 37–49]. A comparison of these 11 precisely characterized fragile sites with the positions of the evolutionary breakpoints identified in this study indicated that only FRA4F and FRA7E, which span distances of 5.9 Mb and 4.4 Mb respectively, partially overlap with evolutionary breakpoint regions (Table 4). For none of the other 524 evolutionary breakpoints was any overlap with a fragile site observed. Under a random model, we estimate that ~1.23% (37.9/3093) of the 526 observed breakpoint intervals would have been expected to overlap with one of the 11 fragile sites. Since only 2/526 breakpoints (0.38%) were found to display an overlap with a fragile site (p = 0.11), we concluded that there was no evidence for extensive co-location.
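The expectation under the random model is simple arithmetic, sketched below. The figures are read from the text, on the assumption that 37.9 and 3093 are lengths in Mb (the combined fragile-site length and the genome length respectively). The statistical test behind the reported p-value is not specified in the text and is therefore not reproduced here.

```python
GENOME_MB = 3093.0        # genome length as used in the text (assumed to be Mb)
FRAGILE_SITE_MB = 37.9    # combined length of the 11 characterized fragile sites
N_INTERVALS = 526         # evolutionary breakpoint intervals
OBSERVED_OVERLAPS = 2     # FRA4F and FRA7E

# Fraction of intervals expected to fall within a fragile site by chance.
expected_fraction = FRAGILE_SITE_MB / GENOME_MB
expected_count = expected_fraction * N_INTERVALS
observed_fraction = OBSERVED_OVERLAPS / N_INTERVALS

print(f"expected fraction: {expected_fraction:.2%}")
print(f"expected count:    {expected_count:.1f}")
print(f"observed fraction: {observed_fraction:.2%}")
```

This simple ratio treats each breakpoint interval as a point. Allowing for the width of the intervals would raise the expected overlap slightly.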
A second class of chromosomal breakage hotspot is represented by recurring cancer-associated breakpoints. Although the majority of such breakpoints have been assigned to cytogenetic bands, they have not yet been mapped with any degree of precision. A variety of genes, with actual or potential roles in tumorigenesis, nevertheless reside at or near these breakpoints. We therefore identified the exact genomic positions of 387 annotated cancer-associated autosomal genes using the Atlas of Genetics and Cytogenetics in Oncology and Haematology http://atlasgeneticsoncology.org. For the purposes of this analysis, only well-established cancer-associated genes were included (for convenience, these are listed separately in this database). Other genes in this database that have not yet been convincingly implicated in cancer were not included in this analysis. Of the 387 cancer genes, only 13 mapped to evolutionary breakpoint intervals identified in this study (Table 5, Additional file 2). Since the 526 evolutionary breakpoint intervals together comprise 151.7 Mb of genomic sequence, we estimate that some 20 cancer-associated genes might have been expected to occur within the breakpoint intervals by chance alone. We therefore conclude that genes occurring at cancer-associated breakpoints are not disproportionately represented within regions of evolutionary breakpoints.
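The chance expectation quoted above follows from the share of the genome covered by breakpoint intervals. The sketch below reproduces that arithmetic and, purely as an illustration, adds a binomial lower-tail probability. The binomial model assumes that genes are placed independently and uniformly along the genome, which ignores gene clustering; it is not a test reported in this study. The genome length of 3093 Mb is taken from the fragile-site calculation.

```python
from math import comb

GENOME_MB = 3093.0       # genome length as used elsewhere in the text
INTERVALS_MB = 151.7     # combined length of the 526 breakpoint intervals
N_CANCER_GENES = 387
OBSERVED_IN_INTERVALS = 13


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


p_hit = INTERVALS_MB / GENOME_MB
expected_genes = N_CANCER_GENES * p_hit
# Illustrative only: probability of seeing 13 or fewer genes under the
# independence assumption stated above.
lower_tail = binomial_cdf(OBSERVED_IN_INTERVALS, N_CANCER_GENES, p_hit)

print(f"expected genes in intervals: {expected_genes:.1f}")
print(f"P(X <= {OBSERVED_IN_INTERVALS}) under the binomial sketch: {lower_tail:.3f}")
```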
The question then arises as to the location of these evolutionary breakpoints in relation to genes and other DNA sequence features. As mentioned above, a total of 63 primate-specific breaks in synteny were identified in this analysis. Remarkably, 78% of these breakpoint intervals coincide with segmental duplications (SDs) in the human genome (Additional file 2) despite the fact that SDs comprise only 4–5% of the human genome sequence [50–52]. Colocalization with copy number variants (CNVs) was also observed in the case of 76% of these breakpoints (Additional file 2). Thus, primate-specific breakpoint regions would appear to be highly enriched for both SDs and CNVs.
Those human chromosomes that are known to be gene-dense also appear to contain significantly more breakpoints than gene-poor chromosomes (Table 6). Indeed, a strong correlation was noted between protein-coding gene density and the number of evolutionary breakpoints per chromosome (r = 0.60; p = 0.0031). When the gene-dense chromosomes HSA17, HSA19 and HSA22 were directly compared with the gene-poor chromosomes HSA13, HSA18 and HSA21, the gene-dense chromosomes exhibited nearly three times as many breaks per Mb as gene-poor chromosomes.
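A per-chromosome correlation of this kind can be computed as in the following sketch. The functions are generic and the toy input is invented, since Table 6 is not reproduced here. The sample size of n = 22 is an assumption based on the 22 autosomes analysed.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


# Invented toy values (genes per Mb, breakpoints per chromosome).
toy_density = [1.0, 2.0, 3.0, 4.0]
toy_breaks = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(toy_density, toy_breaks))

# Reported coefficient, assuming n = 22 autosomes.
print(round(t_statistic(0.60, 22), 2))
```

For r = 0.60 and n = 22 the t value is about 3.35 on 20 degrees of freedom, which is in line with the reported p-value.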
We further observed a correlation between transcript density and breakpoint occurrence (r = 0.62, p = 0.0029). To calculate this correlation coefficient, we used the Human Transcriptome Map, based on the draft human genome sequence as provided by the UCSC Genome Bioinformatics Project http://genome.ucsc.edu/, which includes all transcribed sequences except processed pseudogenes (according to Versteeg et al. ). The correlation noted between transcript density and breakpoint occurrence became even stronger when chromosomal regions were considered rather than entire chromosomes. The evolutionary breakpoint regions identified here exhibited a 1.54-fold increase in transcript density for the central 1 Mb of syntenic breakpoint regions as compared to the genome average (Additional file 4). When this analysis was further restricted to the 144 most precisely mapped breakpoint intervals of <40 kb, the transcript density attained a value some 2.9 times that of the genome-wide average (Additional file 5). Finally, analyses of breakpoint intervals assigned to individual evolutionary lineages indicated that the breakpoint regions identified in both chicken and opossum lineages displayed very high transcript densities corresponding to 3.7 times the genomic average (Table 7).
Random breakage or non-random location of evolutionary breakpoints
In order to ascertain whether the evolutionary breakpoints identified in this study occurred randomly or were instead preferentially located in certain genomic regions, we performed simulation experiments. To avoid consideration of breakpoints that did not result from independent breakage (and which could have been identical-by-descent), we selected only breakpoints that were present in mouse, cow, opossum and chicken, respectively. Breakpoints in rat and dog were excluded from this analysis in order to avoid consideration of breakpoints that could have been identical-by-descent and shared either by mouse and rat or by dog and cow. For example, breakpoints present in mouse and rat (as compared to human) could have been identical-by-descent yet would have been counted twice in our analysis. Thus, only breakpoints in mouse and cow were considered (and not those in rat and dog) in order to avoid the potential double-counting of some evolutionary breakpoints. Those 63 breakpoint regions observed in all 4 species (mouse, cow, opossum, chicken) compared to human, and which were thus specific to the primate lineage, were also excluded (indicated in yellow in Additional file 2). Finally, a total of 519 breakpoints were considered that were evident in four species (N = 132 in mouse, N = 143 in cow, N = 89 in opossum and N = 155 in chicken; Additional file 2). These 519 breakpoints occurred in 410 genomic regions, 324 of which contained a breakpoint observed in only one species (as compared to human), whereas 63 genomic regions contained breakpoints in two species, and 23 genomic regions contained breakpoints in three species.
By means of a simulation with 100,000 iterations, we then estimated the proportion of the genome in which these 519 breakpoints would have been expected to occur, by chance alone, given a certain specified number of genomic regions available to harbour evolutionary breakpoints (Additional file 6). For these simulations, the human genome was partitioned into 10,000 regions, each 0.3 Mb in length (the average length of the observed breakpoint regions). Assuming a random breakage model for the entire genome, partitioned into 10,000 equal-sized genomic segments available to harbour breakpoint regions, the 519 evolutionary breakpoints would have been expected to occur in between 500 and 516 regions with 99% probability (Additional file 6). In other words, given random breakage, a maximum of 19/519 (3.7%) breakpoints might reasonably have been expected to co-locate by chance to the same regions at the 1% level of probability. In practice, however, we have noted that the 519 observed evolutionary breakpoints were confined to only 410 breakpoint regions. According to our simulations (presented in Additional file 6), this number of breakpoint regions would be expected if only 7–10% of the genome (i.e. 700–1000 of the 0.3 Mb regions) were available to harbour evolutionary breakpoints. Thus, according to our model-based simulations, the observation of 519 breakpoints being located within 410 out of 10,000 genomic regions is most plausible when the occurrence of breakpoints is confined to only 7–10% of the genome. Even if we were to assume that some 20% of the genome could harbour evolutionary breakpoints, the observed distribution has a <1% probability of occurring under the model of random breakage. We therefore feel confident in rejecting the null hypothesis that these breakage events occurred randomly. We instead conclude that they occurred preferentially within certain genomic regions.
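A random-breakage simulation of this kind can be sketched in a few lines. This is not the code behind Additional file 6. It assumes that breakpoints of one species fall into distinct bins and that the four species are placed independently, and it uses far fewer iterations than the 100,000 reported. Restricting the available fraction of the genome is modelled simply by lowering `n_bins`.

```python
import random

# Breakpoint counts per species, as given in the text.
SPECIES_COUNTS = {"mouse": 132, "cow": 143, "opossum": 89, "chicken": 155}


def simulate_occupied(counts, n_bins, iterations, seed=0):
    """Return, for each iteration, the number of bins hit by at least one species.

    Assumption: within a species every breakpoint occupies a different bin
    (sampling without replacement); species are placed independently.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(iterations):
        occupied = set()
        for n in counts.values():
            occupied.update(rng.sample(range(n_bins), n))
        results.append(len(occupied))
    return results


# Whole genome available: 10,000 bins of 0.3 Mb.
runs = simulate_occupied(SPECIES_COUNTS, n_bins=10_000, iterations=1_000)
print(min(runs), sum(runs) / len(runs), max(runs))
```

Re-running with `n_bins=1_000` or fewer shows how the number of occupied regions falls towards the observed 410 as the available fraction of the genome shrinks.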
Among the 519 breakpoints considered in the above mentioned simulation analysis were 27 breaks in synteny that occurred in the same genomic interval in both chicken and opossum, but not in mouse or cow. These breakpoints shared by chicken and opossum could however have been identical-by-descent and would thus have occurred only once in the eutherian common ancestor, not twice as we implicitly assumed in the previously described simulations. In order to avoid double counting of some breakpoints, we repeated the simulations, this time considering only the breakpoint regions in mouse (N = 132), cow (N = 143) and opossum (N = 89). A total of 41 breakpoint intervals were found to be shared by these species, so that the 364 breakpoints occupied 323 distinct breakpoint regions. During these simulations, the genome was subdivided into 10,000 bins, each of length 0.3 Mb (potential regions for a breakpoint), and the 364 mammalian breakpoints (132 + 143 + 89) were distributed between these bins. The simulation experiments served to demonstrate that the breakpoint positions are incompatible with a random model of breakage. The expected number of breakpoint regions under this model was calculated to be 359.7; in none of the 100,000 simulation runs was such a low number of breakpoint intervals noted as that actually observed (N = 323; two-sided p-value approximates zero). When the model was relaxed to 2000 selected bins (special candidate regions for breakpoints), 342.6 unique breakpoints would have been expected (two-sided p = 0.00002). On the other hand, a model with 1000 bins, i.e. one using ~10% of the genome, appears to be compatible with the observed values: expected number of unique breakpoints = 322.3 (p = 0.92).
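The expected numbers of breakpoint regions quoted above can be approximated in closed form, without simulation. The sketch assumes that each species places its breakpoints in distinct bins and that species are independent, so a bin stays empty with probability equal to the product of (1 - n_s / N) over species. That this was the model actually used is an assumption, although it reproduces the quoted expectations closely.

```python
from math import prod

# Breakpoint counts for mouse, cow and opossum, as given in the text.
COUNTS = (132, 143, 89)


def expected_regions(counts, n_bins):
    """Expected number of bins containing a breakpoint from at least one species.

    Assumption: each species hits `n` distinct bins chosen uniformly at random,
    independently of the other species.
    """
    p_empty = prod(1 - n / n_bins for n in counts)
    return n_bins * (1 - p_empty)


for n_bins in (10_000, 2_000, 1_000):
    print(n_bins, round(expected_regions(COUNTS, n_bins), 1))
```

Under these assumptions the expectations come out at roughly 359.7, 342.7 and 322.3 regions for 10,000, 2000 and 1000 bins. Only the last of these lies close to the 323 regions actually observed.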
Refining the structure of boreoeutherian ancestral chromosomes
Comparative genome maps, based on more than eighty species of eutherian mammal, have previously been generated by chromosome painting. Such analyses have revealed the pathways of mammalian genome evolution at the chromosomal level [6–8, 10–12, 54–57]. However, comparative chromosome painting is inadequate to the task of comparing the genomes of species which have been separated for more than 100 million years. This is due to the lower hybridization efficiency of probes consequent to increased sequence divergence. Thus, reports of successful hybridizations of eutherian probes onto marsupial chromosomes are confined to a single chromosome . To overcome this limitation, comparative genome sequence analyses based on direct genome alignments have been performed with the aim of reconstructing precise ancestral gene orders [9, 14–16]. However, models of ancestral eutherian genome organization constructed from such genome sequence alignments display considerable differences with respect to the assignment of ancestral syntenic groups, when compared to models derived from comparative chromosome painting data [12, 19, 20, 59].
E-painting (electronic chromosome painting)  was introduced in order both to overcome the inherent limitations of comparative cytogenetic approaches and to reduce the complexity of direct whole genome sequence alignments. This in silico technique is based on the comparative mapping of orthologous genes and the identification of conserved syntenic segments of genes instead of comparative alignments of large sequence contigs containing intergenic sequences as well as genes. The advantage of E-painting over comparative genome sequence analysis is that the former reduces the complexity of genome alignments to easily manageable conserved syntenic segments comprising orthologous genes. Its limitation, however, is that it cannot be applied to the investigation of telomeric, centromeric or non-genic regions that could have nevertheless played an important role during karyotype evolution.
In the present study, E-painting was used to reinvestigate the previously proposed boreoeutherian protokaryotype [8, 10, 12, 54]. The resulting model of the boreoeutherian genome (Figure 3) closely resembles those models previously derived by means of comparative chromosome painting. Indeed, our data derived from E-painting analysis not only confirmed all major syntenic segment associations proposed in previous studies [8–12] but also served to refine the model by accommodating short syntenic segments orthologous to portions of chromosomes HSA7, HSA10, HSA12 and HSA22 (Figure 3).
The improved definition of ancestral eutherian chromosomes by E-painting achieved in this study is particularly evident in the context of the evolution of chromosomes HSA12 and HSA22. A common feature of previously proposed protokaryotypes has been the presence of two different protochromosomes displaying associations of HSA12 and HSA22. As is evident from the colour-coded ideograms in Fig. 3, the larger protochromosome, 12p-q/22q, comprises an extended 12p-q segment stretching from HSA12pter to a point 106.67 Mb from 12q and includes the terminal segment of HSA22q (31.10 Mb toward 22qter). Further, we have identified a third proximal 2.7 Mb segment from HSA22q (14.4 Mb to 17.03 Mb) that bears the same colour code in all analyzed species (Figure 4) and which must therefore also form part of this large protochromosome. Additionally, the E-painting indicated that the ancestral chromosome orthologous to HSA10q should be extended by a 1.5 Mb-sized proximal portion of its p-arm (Figure 4). The existence of this extension was supported by both eutherian and chicken genome sequence data and indicates that the breakpoint is located in a region orthologous to 10p rather than within the centromere (Figure 4).
Importantly, E-painting using the opossum and chicken genomes indicated an HSA10p/12/22 association (Figure 4). These findings, taken together with recent comparative chromosome painting data supporting the 10p/12/22 association in the Afrotheria and in some Xenarthra [10, 11, 56, 60] and carnivores , strongly corroborate an ancestral 10p/12/22 chromosome as part of the ancestral eutherian karyotype. Furthermore, this 10p/12/22 association is compatible with an ancestral eutherian chromosome number of 2n = 46 (Figure 3).
The extensive agreement between ancestral genome reconstructions based respectively upon comparative chromosome painting and E-painting is strongly supportive of the validity of the E-painting approach. Further, the E-painting analysis performed here has confirmed the previously proposed ancestral eutherian chromosome associations, 3/21, 4/8, 7/16, 10/12/22, 12/22, 16/19 and 14/15 [8–12], since all these associations are readily identifiable in the opossum genome. However, the 3/21 association in the opossum involves a different set of genes as compared to the 3/21 association in the eutherian species, thereby indicating the presence of additional rearrangements involving the corresponding chromosomal regions in marsupials.
Recent comparative chromosome painting studies performed with several afrotherian [10, 55, 60, 62] and xenarthran species [11, 56, 63] have indicated that their karyotypes display a remarkable degree of similarity to the previously proposed ancestral boreoeutherian karyotype. The chromosomal associations 1/19 and 5/21 seem, however, to be specific to afrotherians [55, 56, 62, 64] with no xenarthran-specific chromosomal rearrangements having been identified as yet [11, 56].
Our findings indicate that none of the afrotherian-specific rearrangements are evident in the opossum genome. This finding, together with the observation that the above mentioned ancestral eutherian chromosome associations are also present in the opossum, suggests that the ancestral boreoeutherian karyotype is very similar to the ancestral eutherian karyotype (see Additional file 3 for an overview of the phylogenetic relationships among the major placental groups, according to Wildman et al.).
Chromosomal distribution of evolutionary breakpoints
The comparative synteny analysis presented here has succeeded in defining evolutionary chromosomal breakpoints with a considerably higher degree of resolution than has previously been achieved. For example, the length of the median breakpoint interval in this study is only 120 kb (Table 2). Furthermore, the average length (290 kb) of the breakpoint intervals assigned here is about a quarter of that reported by Murphy et al. In a second related study, Ruiz-Herrera et al. included data from Murphy et al. but added further species with even less precisely defined breakpoint data. The present study has avoided the uncertainty inherent in matching up cytogenetic band information with genome sequence data. The assessment of the spatial correlation between evolutionary chromosomal breakpoints and DNA sequence features such as gene density, GC-content, segmental duplications and copy number variations (as well as cytogenetic features such as fragile sites and cancer-associated breakpoints) promises to yield new insights into mechanisms of chromosomal rearrangement whose relevance may well extend beyond the confines of evolution and into the sphere of genetic disease (and particularly tumorigenesis).
In this study, a total of 526 evolutionary breakpoint intervals were identified. Knowledge of their respective genomic positions then allowed us to address the question of whether evolutionary breakpoints co-locate with cancer-associated breakpoints and/or common fragile sites, an issue which has been quite contentious over the last few years [23, 67]. The original 'random breakage model' of Nadeau and Taylor has been challenged by Pevzner and Tesler, who favour an alternative model in which at least some evolutionary breakpoint regions are prone to repeated breakage in the context of disease-related rearrangements. Inherent to the latter model is the prediction that evolutionary breaks will frequently overlap with fragile sites and cancer-associated breakpoints [9, 66, 69, 70]. The precise mapping data presented here are not, however, compatible with such a physical overlap of breakpoints.
When considering fragile sites, rare and common sites must be clearly distinguished. Rare fragile sites are less frequent and, at the DNA sequence level, are associated with expanded repeats. In some cases, such sites are associated with a specific clinical phenotype. By contrast, common fragile sites (numbering 89 according to Debacker and Kooy) are observed in different mammalian species [71, 72] and may be spatially associated with large active gene clusters. In our analysis, we focussed exclusively on the 11 common fragile sites that have been well characterized at the DNA sequence level [35, 38–49], but only two of these sites were found to exhibit partial overlap with any of the 526 evolutionary breakpoint intervals identified here (Table 4). We cannot, however, make any statement with respect to a potential overlap between the evolutionary breakpoints and those common fragile sites that are hitherto poorly mapped and remain uncharacterized at the DNA sequence level.
A second class of common chromosomal breakpoint is represented by those breakpoints associated with tumorigenesis. These cancer-related breakage events frequently generate fusion genes that are commonly characterized by gains of function. To refine the DNA sequence positions of known cancer-associated breakpoints, we utilized the known sequence coordinates of 387 cancer-associated genes. These were then cross-compared with the 526 evolutionary breakpoint intervals identified in our analysis. However, no evidence was found for the known cancer-associated genes (and hence their associated breakpoint regions) being over-represented within regions of evolutionary chromosomal breakpoints.
A word of caution is appropriate here. Although it may eventually prove possible to identify unequivocally the positions of many evolutionary and cancer-associated breakpoints, there is no a priori reason to suppose that these breakpoints should occur in precisely the same locations. Indeed, there is every reason to believe that, even if we were to focus our attention upon those breakpoints which colocalize to the extended regions characterized by segmental duplication, these breakpoints would probably occur in heterogeneous locations with respect to the various genes present within the unstable regions. This is because, in order to come to clinical attention, somatic cancer-associated gene rearrangements must confer a growth advantage upon the affected cells or tissue, usually via gene deregulation or through the creation of a fusion gene. Evolutionary rearrangements (which must, by definition, be heritable and hence occur in germ cells) represent the other side of the coin: they could not have become fixed had they been disadvantageous to individuals of the species concerned. It follows that the rearrangements derived in these two quite different contexts (i.e. somatic/cancer-associated versus germ cell/evolutionary) are likely (i) to have affected the structure, function and expression of different genes in different ways and (ii) to have been subject to quite different 'selective pressures'; hence, (iii) they would have been most unlikely to have occurred in precisely the same genomic locations. In agreement with these predictions, a different regional distribution of cancer-associated and evolutionary breakpoints has been noted by Sankoff et al., whilst Helmrich et al. failed to detect any overlap between fragile sites and evolutionary breakpoints.
Our E-painting data do, however, provide some support for the postulate that evolutionary breakpoints have been 'reused', sensu lato. Indeed, 7.6% of the evolutionary breakpoint intervals identified here contain two or more breakpoints. By computer simulation, we confirmed that the distribution of the 519 observed breakpoints into only 410 different genomic segments is best explained by non-random breakage, with only ~7–10% of the genome harbouring evolutionary breakpoints. This proportion is somewhat lower than that previously reported (20%) for the 'reuse' of breakpoint regions, but this could be due to the higher resolution breakpoint mapping achieved here. Recently, breakpoint 'reuse' has also been noted in the case of a recurrent inversion on the eutherian X chromosome and in a comparison of chicken chromosome GGA28 with orthologous syntenic segments in human, fish (Fugu), amphibian (Xenopus), opossum, dog and mouse. Taken together, these findings are quite compatible with the fragile breakage model of chromosome evolution first proposed by Pevzner and Tesler and sustained by the more recent analysis of Alekseyev and Pevzner.
Our data confirm and extend previous reports of associations between segmental duplications (SDs) and evolutionary rearrangements [77, 78]. SDs comprise 4–5% of human autosomal euchromatin [50–52] whereas the primate lineage-specific breakpoint intervals comprise 0.86% of the euchromatin. This notwithstanding, some 78% of the evolutionary breakpoint intervals colocalize with known SDs whilst 76% coincide with regions of known copy number variation (Additional file 2). These proportions are significantly higher than those reported from comparative analyses of evolutionary breakpoints between the human and murine lineages [51, 78]. This difference is probably due to the focus in the present analysis having been placed upon primate lineage-specific breakage.
Turning to the sites at which evolutionarily fixed chromosomal breaks have occurred, we have previously mapped at the DNA sequence level the breakpoints of eight inversions that serve to distinguish the human and chimpanzee karyotypes [79–81]. None of these rearrangements is as yet known to be associated with either the activation or inactivation of genes at or near the breakpoint sites. The present study indicates that, at least in the primate lineage, the evolutionary breakpoints are enriched for SDs whilst overlapping to a similar extent with sites of known copy number variants. This concurs with recent findings from comparative studies of syntenic disruptions between gibbon and human chromosomes [82, 83]. Indeed, nearly half of all gibbon-human breaks in synteny occur within regions of segmental duplication in the human genome, thereby providing further evidence for the evolutionary plasticity of these regions, which has clearly been responsible for promoting a significant proportion of the chromosomal breaks in primates.
Our analysis has revealed an even stronger correlation between high gene density and evolutionary fragility than that previously reported. Although the evolutionary breakpoint regions identified here display about 3 to 4 times the transcript density of the euchromatic genome average (Table 7), it would seem rather unlikely that evolutionary breakpoints have frequently disrupted gene coding regions. Intriguingly, a study of chicken chromosome GGA28 has revealed that evolutionary breakpoint regions, identified through the analysis of human-chicken synteny, are disproportionately located in regions with a high GC-content and high CpG island density rather than in gene-dense regions per se. Thus, it is tempting to speculate that at least some of these evolutionary breakpoints, particularly those occurring in gene-associated CpG-islands, could have contributed to functional changes in mammalian gene structure or expression.
In summary, we have presented an approach that greatly reduces the complexity of comparative genome sequence analysis and which is capable of providing valuable insights into the dynamics of eutherian karyotype evolution. The gene synteny analysis data yielded high definition evolutionary breakpoint maps which have significantly improved the resolution of existing maps derived by chromosome painting. Correlation analyses with similarly well mapped cancer-associated breakpoints and fragile sites, however, failed to provide any evidence for an association with evolutionary breakpoints. We nevertheless noted a higher than previously observed positive correlation of evolutionary breakpoints with gene density and also corroborated the reported association of segmental duplications with evolutionary breakpoints in the primate lineage. The ancestral eutherian genome, reconstructed through E-painting, displays a high degree of agreement with that derived from the much larger comparative cytogenetic dataset. The inclusion of a marsupial genome in this comparison, which has not hitherto been attempted, suggested that the ancestral boreoeutherian karyotype was probably very similar to the ancestral eutherian karyotype.
Gene synteny analysis
The synteny comparisons across different vertebrate species were carried out in silico by means of reciprocal BLAST 'best-hit' searches using the ENSEMBL database (http://www.ensembl.org). Only genomes with at least a 7-fold sequence coverage were included in the analysis (human, mouse, rat, cow, dog, chicken, opossum). Data mining for established protein coding genes was performed using the program BioMart (http://www.ensembl.org; ENSEMBL release 46). Orthologous gene location data were retrieved from the genomes of rat, mouse, dog, cow, opossum and chicken, and were arranged by reference to the human gene order (NCBI Build 36). For the purposes of this analysis, a syntenic segment was defined as consisting of a group of genes that are contiguous in humans as well as in the other species under investigation (mouse, rat, dog, etc.). We have included in these gene order comparisons all those human genes for which orthologues have been annotated in the genomes of mouse, rat, dog, cow, opossum and chicken. Only segments with three or more consecutive syntenic genes were considered in order to avoid annotation errors or the inclusion of pseudogenes and retrotransposed genes. To aid visualization, the syntenic segments were individually identified by differential colour coding according to the colour code given in Figure 2. Breakpoint intervals were defined by the last gene from the proximal syntenic segment and the first gene from the following more distal syntenic segment of the respective species (summarized in Additional file 2). Gene positions are given in Mb according to the human genome sequence (http://www.ensembl.org). The data analysis was otherwise performed as previously described [22, 34].
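The segment and breakpoint-interval definitions above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the `Gene` record, the toy gene names and coordinates, and the simplification of treating a run of human-ordered genes as syntenic whenever their orthologues lie on the same chromosome of the other species (ignoring gene order within that chromosome) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import groupby

@dataclass(frozen=True)
class Gene:
    name: str              # hypothetical gene identifier
    human_start_mb: float  # start position on the human chromosome (Mb)
    human_end_mb: float    # end position on the human chromosome (Mb)
    other_chrom: str       # chromosome of the orthologue in the compared species

MIN_GENES = 3  # segments need three or more consecutive syntenic genes

def runs(genes):
    """Group consecutive genes (in human order) by orthologue chromosome."""
    return [list(group) for _, group in groupby(genes, key=lambda g: g.other_chrom)]

def syntenic_segments(genes, min_genes=MIN_GENES):
    """Discard runs shorter than min_genes, then regroup the remaining genes so
    that segments interrupted only by a discarded run are joined again."""
    kept = [g for run in runs(genes) if len(run) >= min_genes for g in run]
    return runs(kept)

def breakpoint_intervals(segments):
    """Interval between the last gene of one segment and the first gene of the next."""
    return [
        (left[-1].name, right[0].name, left[-1].human_end_mb, right[0].human_start_mb)
        for left, right in zip(segments, segments[1:])
    ]

# Toy data for one human chromosome, sorted by human position (invented values).
toy = [
    Gene("g1", 1.0, 1.1, "10"),
    Gene("g2", 1.3, 1.4, "10"),
    Gene("g3", 1.6, 1.7, "10"),
    Gene("g4", 2.0, 2.1, "4"),   # isolated hit: run of one gene, ignored
    Gene("g5", 2.4, 2.5, "10"),
    Gene("g6", 2.8, 2.9, "10"),
    Gene("g7", 3.0, 3.1, "10"),
    Gene("g8", 3.5, 3.6, "17"),
    Gene("g9", 3.8, 3.9, "17"),
    Gene("g10", 4.0, 4.2, "17"),
]

segments = syntenic_segments(toy)
for last_gene, first_gene, start, end in breakpoint_intervals(segments):
    print(f"breakpoint interval between {last_gene} and {first_gene}: {start}-{end} Mb")
```

In this toy example the single gene mapping to chromosome 4 is discarded, so the genes flanking it form one segment, and the only breakpoint interval lies between g7 and g8.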
Gene density calculations were carried out using Stata software (StataCorp, College Station, TX) based on the transcriptome data presented by Versteeg et al., with updates available through the Human Transcriptome Map (http://bioinfo.amc.uva.nl/HTMseq).
The diploid chromosome numbers of the species investigated are: 2n = 40 in mouse; 2n = 42 in rat; 2n = 60 in cow; 2n = 78 in dog; 2n = 18 in opossum; 2n = 78 in chicken. The assembly of conserved syntenic segments into ancestral chromosomes was used to model the ancestral boreoeutherian karyotype with a chromosome number of 2n = 46.
Bovine genome versions
At the time of writing, the bovine genome sequence remains unpublished, although a near complete version (B_tau3.1; http://www.ensembl.org/Bos_taurus/index.html) was made available to us for the purposes of this study. B_tau3.1 has recently been replaced by the latest version, B_tau4.0. The only major differences between the two versions of the bovine genome sequence resulted from scaffolds being misplaced within chromosomes BTA6, 19 and 29, respectively. These errors could however only account for the misclassification of intrachromosomal rearrangement breakpoints. Our synteny comparisons were, by contrast, largely based on the identification of interchromosomal rearrangements (syntenic genes in humans being located on two different chromosomes in the species under investigation). Nevertheless, re-examination of our data allowed us to conclude that our original results were not affected in any way by the occasional intrachromosomal misplacement of scaffolds on the BTA chromosomes in version B_tau3.1. All six intrachromosomal breakpoints (involving BTA chromosomes 6, 19 and 29) were found to coincide with breakpoints identified in other species (Additional file 1). Indeed, four of these six intrachromosomal breakpoints coincided with breakpoints in two or more additional species. It therefore follows that the removal of these B_tau3.1-derived 'breakpoints' from our analysis would not have resulted in any reduction in the overall breakpoint number.
Assessment of overlap between evolutionary breakpoints and common fragile sites
The χ² goodness-of-fit test (exact version implemented in SAS) was applied to test whether the overlap between autosomal fragile sites and evolutionary breakpoint intervals is non-random. The genomic region covered by the 11 selected fragile sites is 34.6 Mb, as summarized in Table 4, amounting to 1.12% of the autosomal genome (assuming it to be 3093 Mb). Since the average length of a breakpoint interval is 0.3 Mb, an overlap occurs, on average, whenever the midpoint of a breakpoint interval lies within the borders of a fragile site ± 0.15 Mb, an area which amounts to 34.6 + 11 × 0.3 = 37.9 Mb. Thus, under a random model, ~1.23% (37.9/3093) of the 526 observed breakpoint intervals would be predicted to overlap with a fragile site. Since only 2/526 breakpoint intervals (0.38%) were found to display an overlap with a fragile site (p = 0.11), there was no evidence for significant co-location.
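The arithmetic behind this expectation can be retraced with a short Python sketch. The constants are taken from the text; the tail probability is an assumption added for illustration, namely a one-sided exact binomial calculation. It is a different statistic from the exact χ² goodness-of-fit test run in SAS, so it is not expected to reproduce the reported p-value.

```python
from math import comb

GENOME_MB = 3093.0        # assumed size of the autosomal genome (from the text)
FRAGILE_MB = 34.6         # region covered by the 11 selected fragile sites
N_SITES = 11
MEAN_INTERVAL_MB = 0.3    # average length of a breakpoint interval
N_INTERVALS = 526         # evolutionary breakpoint intervals
OBSERVED_OVERLAPS = 2

# Each fragile site is widened by half an interval length on both sides.
target_mb = FRAGILE_MB + N_SITES * MEAN_INTERVAL_MB
p_overlap = target_mb / GENOME_MB
expected_overlaps = N_INTERVALS * p_overlap

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Illustrative one-sided tail: probability of observing this few overlaps or fewer.
p_lower_tail = sum(
    binom_pmf(k, N_INTERVALS, p_overlap) for k in range(OBSERVED_OVERLAPS + 1)
)

print(f"target region: {target_mb:.1f} Mb ({100 * p_overlap:.2f}% of the genome)")
print(f"expected overlaps under a random model: {expected_overlaps:.2f}")
print(f"one-sided binomial tail P(X <= {OBSERVED_OVERLAPS}): {p_lower_tail:.3f}")
```

Under the random model, about six to seven of the 526 intervals would be expected to overlap a fragile site, against the two observed.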
To assess whether the positions of the breakpoints identified in this study would fit best with a model of random or non-random chromosomal breakage during vertebrate karyotype evolution, 100,000 simulation experiments were performed. Depending upon the number of genomic regions of length 0.3 Mb available for evolutionary breakpoints, the expected number of different breakpoint regions assumed to harbour a total of 519 observed breakpoints (N = 132 in mouse, 143 in cow, 89 in opossum and 155 in chicken) was estimated under a model of random breakpoint selection in each species. The deduced relationship between the number of genomic segments available for chromosomal breakage and the expected and observed numbers of genomic segments used by 519 breakpoints has been graphically depicted (Additional file 6). Additionally, the '99%-probability intervals' were determined to provide an indication of the ranges over which the different breakpoint regions are situated with a probability of 99%. The expected numbers of genomic segments were then directly compared with the observed number of 410 regions actually used. Thus, for example, if 1000 segments (corresponding to ~10% of the genome) were available to harbour evolutionary breakpoints, some 427 would have been expected to be used by 519 breakpoints. The probability that <409 or >445 segments would contain a breakpoint was calculated to be only ~1%.
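A simulation of this kind can be sketched in Python as follows. The sketch assumes that, within each species, breakpoints fall into distinct segments drawn uniformly at random from the available pool; this is an interpretation of 'random breakpoint selection in each species', not a documented detail of the original implementation. Under that assumption, the closed-form expectation for 1000 available segments is about 427, in line with the figure quoted above. The function names, the seed and the reduced number of replicates are illustrative choices.

```python
import random

# Breakpoints per species, as given in the text (they sum to 519).
BREAKS_PER_SPECIES = {"mouse": 132, "cow": 143, "opossum": 89, "chicken": 155}

def expected_segments_used(n_segments, counts):
    """Closed-form expectation: a segment stays unused only if every species
    misses it, which happens with probability prod(1 - n_i / n_segments)."""
    p_unused = 1.0
    for n in counts:
        p_unused *= 1 - n / n_segments
    return n_segments * (1 - p_unused)

def simulate_segments_used(n_segments, counts, n_replicates, rng):
    """Number of distinct segments hit in each replicate, drawing distinct
    segments within a species (sampling without replacement)."""
    results = []
    for _ in range(n_replicates):
        used = set()
        for n in counts:
            used.update(rng.sample(range(n_segments), n))
        results.append(len(used))
    return results

counts = list(BREAKS_PER_SPECIES.values())
rng = random.Random(0)  # fixed seed so that the sketch is repeatable
sims = simulate_segments_used(1000, counts, 2000, rng)  # fewer runs than the 100,000 in the text
mean_used = sum(sims) / len(sims)

print(f"analytic expectation: {expected_segments_used(1000, counts):.1f}")
print(f"simulated mean:       {mean_used:.1f}")
print(f"simulated range:      {min(sims)}-{max(sims)}")
```

Repeating the calculation for different pool sizes and comparing the expected values with the 410 segments actually observed gives the kind of relationship described in the text.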
million years ago
Wurster DH, Benirschke K: Indian muntjac, Muntiacus muntjak: A deer with a low diploid chromosome number. Science. 1970, 168: 1364-1366. 10.1126/science.168.3937.1364.
Contreras LC, Torres-Mura JC, Spotorno AE: The largest known chromosome number for a mammal, in a South American desert rodent. Experientia. 1990, 15: 506-508. 10.1007/BF01954248.
Dutrillaux B, Couturier J: The ancestral karyotype of Carnivora: comparison with that of platyrrhine monkeys. Cytogenet Cell Genet. 1983, 35: 200-208. 10.1159/000131867.
Jauch A, Wienberg J, Stanyon R, Arnold N, Tofanelli S, Ishida T, Cremer T: Reconstruction of genomic rearrangements in great apes and gibbons by chromosome painting. Proc Natl Acad Sci USA. 1992, 89: 8611-8615. 10.1073/pnas.89.18.8611.
Scherthan H, Cremer T, Arnason U, Weier HU, Lima-de-Faria A, Frönicke L: Comparative chromosome painting discloses homologous segments in distantly related mammals. Nat Genet. 1994, 6: 342-347. 10.1038/ng0494-342.
Rettenberger G, Klett C, Zechner U, Bruch J, Just W, Vogel W, Hameister H: Zoo-FISH analysis: cat and human karyotypes closely resemble the putative ancestral mammalian karyotype. Chrom Res. 1995, 3: 479-486. 10.1007/BF00713962.
Rettenberger G, Klett C, Zechner U, Kunz J, Vogel W, Hameister H: Visualization of the conservation of synteny between humans and pigs by heterologous chromosomal painting. Genomics. 1995, 26: 372-378. 10.1016/0888-7543(95)80222-8.
Chowdhary BP, Raudsepp T, Froenicke L, Scherthan H: Emerging patterns of comparative genome organization in some mammalian species as revealed by Zoo-FISH. Genome Res. 1998, 8: 577-589.
Murphy WJ, Larkin DM, van der Wind AE, Bourque G, Tesler G, Auvil L, Beever JE, Chowdhary BP, Galibert F, Gatzke L, Hitte C, Meyers SN, Milan D, Ostrander EA, Pape G, Parker HG, Raudsepp T, Rogatcheva MB, Schook LB, Skow LC, Welge M, Womack JE, O'Brien SJ, Pevzner PA, Lewin HA: Dynamics of mammalian chromosome evolution inferred from multispecies comparative maps. Science. 2005, 309: 613-617. 10.1126/science.1111387.
Yang F, Alkalaeva EZ, Perelman PL, Pardini AT, Harrison WR, O'Brien PC, Fu B, Graphodatsky AS, Ferguson-Smith MA, Robinson TJ: Reciprocal chromosome painting among human, aardvark, and elephant (superorder Afrotheria) reveals the likely eutherian ancestral karyotype. Proc Natl Acad Sci USA. 2003, 100: 1062-1066. 10.1073/pnas.0335540100.
Yang F, Graphodatsky AS, Li T, Fu B, Dobigny G, Wang J, Perelman PL, Serdukova NA, Su W, O'Brien PC, Wang Y, Ferguson-Smith MA, Volobouev V, Nie W: Comparative genome maps of the pangolin, hedgehog, sloth, anteater and human revealed by cross-species chromosome painting: further insight into the ancestral karyotype and genome evolution of eutherian mammals. Chrom Res. 2006, 14: 283-296. 10.1007/s10577-006-1045-6.
Froenicke L: Origins of primate chromosomes – as delineated by Zoo-FISH and alignments of human and mouse draft genome sequences. Cytogenet Genome Res. 2005, 108: 122-138. 10.1159/000080810.
Eichler EE, Sankoff D: Structural dynamics of eukaryotic chromosome evolution. Science. 2003, 301: 793-797. 10.1126/science.1086132.
Bourque G, Pevzner PA, Tesler G: Reconstructing the genomic architecture of ancestral mammals: lessons from human, mouse, and rat genomes. Genome Res. 2004, 14: 507-516. 10.1101/gr.1975204.
Bourque G, Zdobnov EM, Bork P, Pevzner PA, Tesler G: Comparative architectures of mammalian and chicken genomes reveal highly variable rates of genomic rearrangements across different lineages. Genome Res. 2005, 15: 98-110. 10.1101/gr.3002305.
Ma J, Zhang L, Suh BB, Raney BJ, Burhans RC, Kent WJ, Blanchette M, Haussler D, Miller W: Reconstructing contiguous regions of an ancestral genome. Genome Res. 2006, 16: 1557-1565. 10.1101/gr.5383506.
Wienberg J: The evolution of eutherian chromosomes. Curr Op Genet Devel. 2004, 14: 657-666. 10.1016/j.gde.2004.10.001.
Bininda-Emonds OR, Cardillo M, Jones KE, MacPhee RD, Beck RM, Grenyer R, Price SA, Vos RA, Gittleman JL, Purvis A: The delayed rise of present-day mammals. Nature. 2007, 446: 507-512. 10.1038/nature05634.
Bourque G, Tesler G, Pevzner PA: The convergence of cytogenetics and rearrangement-based models for ancestral genome reconstruction. Genome Res. 2006, 16: 311-313. 10.1101/gr.4631806.
Froenicke L, Caldés MG, Graphodatsky A, Müller S, Lyons LA, Robinson TJ, Volleth M, Yang F, Wienberg J: Are molecular cytogenetics and bioinformatics suggesting diverging models of ancestral mammalian genomes?. Genome Res. 2006, 16: 306-310. 10.1101/gr.3955206.
Kohn M, Kehrer-Sawatzki H, Vogel W, Graves JAM, Hameister H: Wide genome comparisons reveal the origins of the human X chromosome. Trends Genet. 2004, 20: 598-603. 10.1016/j.tig.2004.09.008.
Kohn M, Högel J, Vogel W, Minich P, Kehrer-Sawatzki H, Graves JA, Hameister H: Reconstruction of a 450-My-old ancestral vertebrate protokaryotype. Trends Genet. 2006, 22: 203-210. 10.1016/j.tig.2006.02.008.
Peng Q, Pevzner PA, Tesler G: The fragile breakage versus random breakage models of chromosome evolution. PLoS Computat Biol. 2006, 2: 0100-0111. 10.1371/journal.pcbi.0020100.
Gordon L, Yang S, Tran-Gyamfi M, Baggott D, Christensen M, Hamilton A, Crooijmans R, Groenen M, Lucas S, Ovcharenko I, Stubbs L: Comparative analysis of chicken chromosome 28 provides new clues to the evolutionary fragility of gene-rich vertebrate regions. Genome Res. 2007, 17: 1603-1613. 10.1101/gr.6775107.
Nadeau JH, Taylor BA: Lengths of chromosomal segments conserved since divergence of man and mouse. Proc Natl Acad Sci USA. 1984, 81: 814-818. 10.1073/pnas.81.3.814.
International Human Genome Sequencing Consortium: Finishing the euchromatic sequence of the human genome. Nature. 2004, 431: 931-945. 10.1038/nature03001.
Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, Barlow K, Beck S, Berry E, Birren B, Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown SD, Bult C, Burton J, Butler J, Campbell RD, Carninci P, et al: Initial sequencing and comparative analysis of the mouse genome. Nature. 2002, 420: 520-562. 10.1038/nature01262.
Gibbs RA, Weinstock GM, Metzker ML, Muzny DM, Sodergren EJ, Scherer S, Scott G, Steffen D, Worley KC, Burch PE, Okwuonu G, Hines S, Lewis L, DeRamo C, Delgado O, Dugan-Rocha S, Miner G, Morgan M, Hawes A, Gill R, Celera, Holt RA, Adams MD, Amanatides PG, Baden-Tillson H, Barnstead M, Chin S, Evans CA, Ferriera S, Fosler C, et al, Rat Genome Sequencing Project Consortium: Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature. 2004, 428: 493-521. 10.1038/nature02426.
Lindblad-Toh K, Wade CM, Mikkelsen TS, Karlsson EK, Jaffe DB, Kamal M, Clamp M, Chang JL, Kulbokas EJ, Zody MC, Mauceli E, Xie X, Breen M, Wayne RK, Ostrander EA, Ponting CP, Galibert F, Smith DR, DeJong PJ, Kirkness E, Alvarez P, Biagi T, Brockman W, Butler J, Chin CW, Cook A, Cuff J, Daly MJ, DeCaprio D, Gnerre S, et al: Genome sequence, comparative analysis and haplotype structure of the domestic dog. Nature. 2005, 438: 803-819. 10.1038/nature04338.
Mikkelsen TS, Wakefield MJ, Aken B, Amemiya CT, Chang JL, Duke S, Garber M, Gentles AJ, Goodstadt L, Heger A, Jurka J, Kamal M, Mauceli E, Searle SM, Sharpe T, Baker ML, Batzer MA, Benos PV, Belov K, Clamp M, Cook A, Cuff J, Das R, Davidow L, Deakin JE, Fazzari MJ, Glass JL, Grabherr M, Greally JM, Gu W, Hore TA, Huttley GA, Kleber M, Jirtle RL, Koina E, Lee JT, Mahony S, Marra MA, Miller RD, Nicholls RD, Oda M, Papenfuss AT, Parra ZE, Pollock DD, Ray DA, Schein JE, Speed TP, Thompson K, VandeBerg JL, Wade CM, Walker JA, Waters PD, Webber C, Weidman JR, Xie X, Zody MC, Broad Institute Genome Sequencing Platform; Broad Institute Whole Genome Assembly Team, Graves JA, Ponting CP, Breen M, Samollow PB, Lander ES, Lindblad-Toh K: Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences. Nature. 2007, 447: 167-177. 10.1038/nature05805.
International Human Genome Sequencing Consortium: Finishing the euchromatic sequence of the human genome. Nature. 2004, 431: 931-945. 10.1038/nature03001.
Springer MS, Murphy WJ, Eizirik E, O'Brien SJ: Placental mammal diversification and the Cretaceous-Tertiary boundary. Proc Natl Acad Sci USA. 2003, 100: 1056-1061. 10.1073/pnas.0334222100.
Pontius JU, Mullikin JC, Smith DR, Agencourt Sequencing Team, Lindblad-Toh K, Gnerre S, Clamp M, Chang J, Stephens R, Neelam B, Volfovsky N, Schäffer AA, Agarwala R, Narfström K, Murphy WJ, Giger U, Roca AL, Antunes A, Menotti-Raymond M, Yuhki N, Pecon-Slattery J, Johnson WE, Bourque G, Tesler G, NISC Comparative Sequencing Program, O'Brien SJ: Initial sequence and comparative analysis of the cat genome. Genome Res. 2007, 17: 1675-1689. 10.1101/gr.6380007.
Kemkemer C, Kohn M, Kehrer-Sawatzki H, Minich P, Hoegel J, Froenicke L, Hameister H: Reconstruction of the ancestral ferungulate karyotype by electronic chromosome painting (E-painting). Chrom Res. 2006, 14: 899-907. 10.1007/s10577-006-1097-7.
Glover TW, Arlt MF, Casper AM, Durkin SG: Mechanism of common fragile site instability. Hum Mol Genet. 2005, 14: R197-R205. 10.1093/hmg/ddi265.
Debacker K, Kooy RF: Fragile sites and human disease. Hum Mol Genet. 2007, 16: R150-158. 10.1093/hmg/ddm136.
Schwartz M, Zlotorynski E, Kerem B: The molecular basis of common and rare fragile sites. Cancer Letts. 2006, 232: 13-26. 10.1016/j.canlet.2005.07.039.
Limongi MZ, Pelliccia F, Rocchi A: Characterization of the human common fragile site FRA2G. Genomics. 2003, 81: 93-97. 10.1016/S0888-7543(03)00007-7.
Wilke CM, Hall BK, Hoge A, Paradee W, Smith DI, Glover TW: FRA3B extends over a broad region and contains a spontaneous HPV16 integration site: direct evidence for the coincidence of viral integration sites and fragile sites. Hum Mol Genet. 1996, 5: 187-195. 10.1093/hmg/5.2.187.
Rozier L, El-Achkar E, Apiou F, Debatisse M: Characterization of a conserved aphidicolin-sensitive common fragile site at human 4q22 and mouse 6C1: possible association with an inherited disease and cancer. Oncogene. 2004, 23: 6872-6880. 10.1038/sj.onc.1207809.
Denison SR, Callahan G, Becker NA, Phillips LA, Smith DI: Characterization of FRA6E and its potential role in autosomal recessive juvenile parkinsonism and ovarian cancer. Genes Chrom Cancer. 2003, 38: 40-52. 10.1002/gcc.10236.
Morelli C, Karayianni E, Magnanini C, Mungall AJ, Thorland E, Negrini M, Smith DI, Barbanti-Brodano G: Cloning and characterization of the common fragile site FRA6F harboring a replicative senescence gene and frequently deleted in human tumors. Oncogene. 2002, 21: 7266-7276. 10.1038/sj.onc.1205573.
Zlotorynski E, Rahat A, Skaug J, Ben-Porat N, Ozeri E, Hershberg R, Levi A, Scherer SW, Margalit H, Kerem B: Molecular basis for expression of common and rare fragile sites. Mol Cell Biol. 2003, 23: 7143-7151. 10.1128/MCB.23.20.7143-7151.2003.
Hellman A, Zlotorynski E, Scherer SW, Cheung J, Vincent JB, Smith DI, Trakhtenbrot L, Kerem B: A role for common fragile site induction in amplification of human oncogenes. Cancer Cell. 2002, 1: 89-97. 10.1016/S1535-6108(02)00017-X.
Mishmar D, Rahat A, Scherer SW, Nyakatura G, Hinzmann B, Kohwi Y, Mandel-Gutfroind Y, Lee JR, Drescher B, Sas DE, Margalit H, Platzer M, Weiss A, Tsui LC, Rosenthal A, Kerem B: Molecular characterization of a common fragile site (FRA7H) on human chromosome 7 by the cloning of a simian virus 40 integration site. Proc Natl Acad Sci USA. 1998, 95: 8141-8146. 10.1073/pnas.95.14.8141.
Ciullo M, Debily MA, Rozier L, Autiero M, Billault A, Mayau V, El Marhomy S, Guardiola J, Bernheim A, Coullin P, Piatier-Tonneau D, Debatisse M: Initiation of the breakage-fusion-bridge mechanism through common fragile site activation in human breast cancer cells: the model of PIP gene duplication from a break at FRA7I. Hum Mol Genet. 2002, 11: 2887-2894. 10.1093/hmg/11.23.2887.
Helmrich A, Stout-Weider K, Matthaei A, Hermann K, Heiden T, Schrock E: Identification of the human/mouse syntenic common fragile site FRA7K/Fra12C1 – relation of FRA7K and other human common fragile sites on chromosome 7 to evolutionary breakpoints. Int J Cancer. 2007, 120: 48-54. 10.1002/ijc.22049.
Callahan G, Denison SR, Phillips LA, Shridhar V, Smith DI: Characterization of the common fragile site FRA9E and its potential role in ovarian cancer. Oncogene. 2003, 22: 590-601. 10.1038/sj.onc.1206171.
Mangelsdorf M, Ried K, Woollatt E, Dayan S, Eyre H, Finnis M, Hobson L, Nancarrow J, Venter D, Baker E, Richards RI: Chromosomal fragile site FRA16D and DNA instability in cancer. Cancer Res. 2000, 60: 1683-1689.
Bailey JA, Gu Z, Clark RA, Reinert K, Samonte RV, Schwartz S, Adams MD, Myers EW, Li PW, Eichler EE: Recent segmental duplications in the human genome. Science. 2002, 297: 1003-1007. 10.1126/science.1072047.
Bailey JA, Eichler EE: Genome-wide detection and analysis of recent segmental duplications within mammalian organisms. Cold Spring Harbor Symp Quant Biol. 2003, 68: 115-124. 10.1101/sqb.2003.68.115.
Zhang L, Lu HH, Chung WY, Yang J, Li WH: Patterns of segmental duplication in the human genome. Mol Biol Evol. 2005, 22: 135-141. 10.1093/molbev/msh262.
Versteeg R, van Schaik BD, van Batenburg MF, Roos M, Monajemi R, Caron H, Bussemaker HJ, van Kampen AH: The human transcriptome map reveals extremes in gene density, intron length, GC content, and repeat pattern for domains of highly and weakly expressed genes. Genome Res. 2003, 13: 1998-2004. 10.1101/gr.1649303.
Murphy WJ, Stanyon R, O'Brien SJ: Evolution of mammalian genome organization inferred from comparative gene mapping. Genome Biol. 2001, 2: Reviews 0005.1-0005.8. 10.1186/gb-2001-2-6-reviews0005.
Svartman M, Stone G, Page JE, Stanyon R: A chromosome painting test of the basal Eutherian karyotype. Chrom Res. 2004, 12: 45-53. 10.1023/B:CHRO.0000009294.18760.e4.
Svartman M, Stone G, Stanyon R: The ancestral eutherian karyotype is present in Xenarthra. PLoS Genet. 2006, 2: e109. 10.1371/journal.pgen.0020109.
Volleth M, Müller S: Zoo-FISH in the European mole (Talpa europaea) detects all ancestral Boreo-Eutherian human homologous chromosome associations. Cytogenet Genome Res. 2006, 115: 154-157. 10.1159/000095236.
Glas R, Marshall Graves JA, Toder R, Ferguson-Smith M, O'Brien PC: Cross-species chromosome painting between human and marsupial directly demonstrates the ancient region of the mammalian X. Mamm Genome. 1999, 10: 1115-1116. 10.1007/s003359901174.
Robinson TJ, Ruiz-Herrera A, Froenicke L: Dissecting the mammalian genome – new insights into chromosomal evolution. Trends Genet. 2006, 22: 297-301. 10.1016/j.tig.2006.04.002.
Kellogg ME, Burkett S, Dennis TR, Stone G, Gray BA, McGuire PM, Zori RT, Stanyon R: Chromosome painting in the manatee supports Afrotheria and Paenungulata. BMC Evol Biol. 2007, 7: 6. 10.1186/1471-2148-7-6.
Graphodatsky AS, Yang F, Perelman PL, O'Brien PC, Serdukova NA, et al: Comparative molecular cytogenetic studies in the order Carnivora: mapping chromosomal rearrangements onto the phylogenetic tree. Cytogenet Genome Res. 2002, 96: 137-145. 10.1159/000063032.
Froenicke L, Wienberg J, Stone G, Adams L, Stanyon R: Towards the delineation of the ancestral Eutherian genome organization: Comparative genome maps of human and the African elephant (Loxodonta africana) generated by chromosome painting. Proc R Soc Lond B Biol Sci. 2003, 270: 1331-1340. 10.1098/rspb.2003.2383.
Dobigny G, Yang F, O'Brien PC, Volobouev V, Kovács A, Pieczarka JC, Ferguson-Smith MA, Robinson TJ: Low rate of genomic repatterning in Xenarthra inferred from chromosome painting data. Chrom Res. 2005, 13: 651-663. 10.1007/s10577-005-1002-9.
Robinson TJ, Fu B, Ferguson-Smith MA, Yang F: Cross-species chromosome painting in the golden mole and elephant-shrew: support for the mammalian clades Afrotheria and Afroinsectiphillia but not Afroinsectivora. Proc Biol Sci. 2004, 271: 1477-1484. 10.1098/rspb.2004.2754.
Wildman DE, Uddin M, Opazo JC, Liu G, Lefort V, Guindon S, Gascuel O, Grossman LI, Romero R, Goodman M: Genomics, biogeography, and the diversification of placental mammals. Proc Natl Acad Sci USA. 2007, 104: 14395-14400. 10.1073/pnas.0704342104.
Ruiz-Herrera A, Castresana J, Robinson TJ: Is mammalian chromosomal evolution driven by regions of genome fragility?. Genome Biol. 2006, 7: R115. 10.1186/gb-2006-7-12-r115.
Sankoff D: The signal in the genomes. PLoS Comput Biol. 2006, 2: 0320-0321.
Pevzner P, Tesler G: Genome rearrangements in mammalian evolution: lessons from human and mouse genomes. Genome Res. 2003, 13: 37-45. 10.1101/gr.757503.
Becker TS, Lenhard B: The random versus fragile breakage models of chromosome evolution: a matter of resolution. Mol Genet Genomics. 2007, 278: 487-491. 10.1007/s00438-007-0287-0.
Ruiz-Herrera A, Robinson TJ: Chromosomal instability in Afrotheria: fragile sites, evolutionary breakpoints and phylogenetic inference from genome sequence assemblies. BMC Evol Biol. 2007, 7: 199. 10.1186/1471-2148-7-199.
Djalali M, Adolph S, Steinbach P, Winking H, Hameister H: A comparative mapping study of fragile sites in the human and murine genomes. Hum Genet. 1987, 77: 157-162. 10.1007/BF00272384.
Helmrich A, Stout-Weider K, Hermann K, Schroeck E, Heiden T: Common fragile sites are conserved features of human and mouse chromosomes and relate to large active genes. Genome Res. 2006, 16: 1222-1230. 10.1101/gr.5335506.
Mitelman F, Johansson B, Mertens F: The impact of translocations and gene fusions on cancer causation. Nat Rev Cancer. 2007, 7: 233-245. 10.1038/nrc2091.
Sankoff D, Deneault M, Turbis P, Allen C: Chromosomal distributions of breakpoints in cancer, infertility, and evolution. Theor Popul Biol. 2002, 61: 497-501. 10.1006/tpbi.2002.1599.
Cáceres M, National Institutes of Health Intramural Sequencing Center Comparative Sequencing Program, Sullivan RT, Thomas JW: A recurrent inversion on the eutherian X chromosome. Proc Natl Acad Sci USA. 2007, 104: 18571-18576. 10.1073/pnas.0706604104.
Alekseyev MA, Pevzner PA: Are there rearrangement hotspots in the human genome?. PLoS Comput Biol. 2007, 3: e209. 10.1371/journal.pcbi.0030209.
Armengol L, Pujana MA, Cheung J, Scherer SW, Estivill X: Enrichment of segmental duplications in regions of breaks of synteny between the human and mouse genomes suggest their involvement in evolutionary rearrangements. Hum Mol Genet. 2003, 12: 2201-2208. 10.1093/hmg/ddg223.
Armengol L, Marquès-Bonet T, Cheung J, Khaja R, González JR, Scherer SW, Navarro A, Estivill X: Murine segmental duplications are hot spots for chromosome and gene evolution. Genomics. 2005, 86: 692-700. 10.1016/j.ygeno.2005.08.008.
Goidts V, Szamalek JM, Hameister H, Kehrer-Sawatzki H: Segmental duplication associated with the human-specific inversion of chromosome 18: a further example of the impact of segmental duplications on karyotype and genome evolution in primates. Hum Genet. 2004, 115: 116-122. 10.1007/s00439-004-1120-z.
Kehrer-Sawatzki H, Cooper DN: Understanding the recent evolution of the human genome: insights from human-chimpanzee genome comparisons. Hum Mutat. 2007, 28: 99-130. 10.1002/humu.20420.
Szamalek JM, Cooper DN, Hoegel J, Hameister H, Kehrer-Sawatzki H: Chromosomal speciation of humans and chimpanzees revisited: studies of DNA divergence within inverted regions. Cytogenet Genome Res. 2007, 116: 53-60. 10.1159/000097417.
Carbone L, Vessere GM, ten Hallers BF, Zhu B, Osoegawa K, Mootnick A, Kofler A, Wienberg J, Rogers J, Humphray S, Scott C, Harris RA, Milosavljevic A, de Jong PJ: A high-resolution map of synteny disruptions in gibbon and human genomes. PLoS Genet. 2006, 2: e223. 10.1371/journal.pgen.0020223.
Roberto R, Capozzi O, Wilson RK, Mardis ER, Lomiento M, Tuzun E, Cheng Z, Mootnick AR, Archidiacono N, Rocchi M, Eichler EE: Molecular refinement of gibbon genome rearrangements. Genome Res. 2007, 17: 249-257. 10.1101/gr.6052507.
Ferguson-Smith MA, Trifonov V: Mammalian karyotype evolution. Nat Rev Genet. 2007, 8: 950-962. 10.1038/nrg2199.
We are most grateful to Kim Worley and Richard Gibbs at the Human Genome Sequencing Center in Houston, Texas for making the B-tau3.1 version of the bovine genome available to us. This work was supported by grant HA 1082/16-3 from the Deutsche Forschungsgemeinschaft.
The authors declare that they have no competing interests.
HH and HKS conceived and designed the experiments. CK and MK performed the experiments. HKS, LF, HH and DNC analyzed the data and wrote the paper. JH performed the statistical analysis. All authors read and approved the final manuscript.