Waves of genomic hitchhikers shed light on the evolution of gamebirds (Aves: Galliformes)
© Kriegs et al. 2007
Received: 18 September 2007
Accepted: 09 October 2007
Published: 09 October 2007
The phylogenetic tree of Galliformes (gamebirds, including megapodes, curassows, guinea fowl, New and Old World quails, chicken, pheasants, grouse, and turkeys) has been considerably remodeled over the last decades as new data and analytical methods became available. Analyzing presence/absence patterns of retroposed elements avoids the problems of homoplastic characters inherent in other methodologies. In gamebirds, chicken repeat 1 (CR1) elements are the most prevalent retroposed elements, but little is known about the activity of their various subtypes over time. Ascertaining the fixation patterns of CR1 elements would help unravel the phylogeny of gamebirds and other poorly resolved avian clades.
We analyzed 1,978 nested CR1 elements and developed a multidimensional approach that takes advantage of their transposition in transposition (TinT) character to characterize the fixation patterns of all 22 known chicken CR1 subtypes. The presence/absence patterns of elements that were active at different periods of gamebird evolution provided evidence for a clade (Cracidae + (Numididae + (Odontophoridae + Phasianidae))) not including Megapodiidae, and for Rollulus as the sister taxon of the other analyzed Phasianidae. Genomic trace sequences of the turkey genome further demonstrated that the endangered African Congo Peafowl (Afropavo congensis) is the sister taxon of the Asian Peafowl (Pavo), rejecting other, predominantly morphology-based groupings, and that phasianids, including the sister taxa Tetraoninae and Meleagridinae, are monophyletic.
The TinT information concerning relative fixation times of CR1 subtypes enabled us to efficiently investigate gamebird phylogeny and to reconstruct an unambiguous tree topology. This method should provide a useful tool for investigations in other taxonomic groups as well.
In parallel with the application of new analytical methods, the avian phylogenetic tree has undergone substantial changes in the past decades. Even today many branchings remain highly controversial, although it is widely accepted that modern birds fall into two major clades: (1) Palaeognathae, comprising rheas, kiwis, ostrich, emus, cassowaries, and tinamous, and (2) Neognathae, comprising Galloanseres (fowl and waterfowl) and Neoaves (all other taxa) [1, 2]. Within Galloanseres, Galliformes (gamebirds) are traditionally classified into five families: Megapodiidae (megapodes, brush turkeys, and allies), Cracidae (curassows, guans, and chachalacas), Odontophoridae (New World quails), Numididae (guinea fowl), and Phasianidae (pheasants, peafowl, partridges, and allies) [3–6].
While virtually all studies identify megapodes and cracids as successive sister taxa of the remaining Galliformes, the branching orders of Odontophoridae, Numididae, and Phasianidae, with its presumed subfamilies Tetraoninae (grouse) and Meleagridinae (turkeys), are less clear. In particular, the interrelationships among Numididae, Odontophoridae, and Phasianidae are considered a "major puzzle" of galliform phylogeny. The exact affinities of Tetraoninae and Meleagridinae are also debated. In traditional classifications, these two taxa are separated from the other Phasianidae, whereas molecular sequence analyses support a position of Tetraoninae and Meleagridinae deep within a clade including the other Phasianidae [8–11]. Recently, Kaiser et al. investigated phylogenetically informative retropositions in Galliformes and found significant support for two clades: (I) a monophyletic Phasianidae including Meleagridinae and Tetraoninae, and (II) a clade comprising Meleagridinae, Tetraoninae, Phasianus, and Tragopan. The relative positions of Numididae and Odontophoridae and the topology of the phasianid tree differ among sequence-based studies, depending on the genes investigated and the analytical methods used [9–8].
Discrepancies in phylogenetic reconstructions based on various paleontological, morphological, behavioral, and molecular methodologies are often due to the presence of homoplastic characters. Markers that are less likely to be confounded by problems of homoplasy include rare genomic changes (RGCs) such as random insertions and deletions (indels) and retroposed elements. Indels are frequently used for phylogenetic reconstructions [19, 21–24], and the presence/absence patterns of retroposed elements have proven invaluable for reconstructing virtually ambiguity-free phylogenetic trees [25–29]. Presence/absence data resemble virtually homoplasy-free multistate characters with an extremely large number of possible unique character states. Steel and Penny suggest that, for this kind of data, maximum parsimony converges to a maximum likelihood estimator. The clear "presence" of a retroposed element at orthologous positions in related taxa indicates a derived condition acquired via a common ancestor, while its "absence" in more distant taxa represents the plesiomorphic condition prior to integration. Retroposed elements have several features that, on their own, are very unlikely to arise twice independently at orthologous genomic positions, including defined element subtypes, diagnostic mutations, and characteristic truncations relative to the consensus sequence. Although presence/absence patterns are virtually homoplasy-free, there is, as for any other marker system, a low probability of incomplete lineage sorting and a slight chance of exact excision of retroposed elements flanked by perfect direct repeats.
These caveats aside, a statistical framework was developed to evaluate presence/absence data, and presence/absence patterns of retroposed elements have now been used successfully to reconstruct, for example, the placental mammalian tree at the superordinal level, the monophyly of Cetartiodactyla and Pegasoferae, the position of Primates within Supraprimates, and internal primate relationships [35–37].
In the chicken genome, retroposed elements of the chicken repeat 1 (CR1) family of Long INterspersed Elements (LINEs), with more than 200,000 copies, constitute 80% of all interspersed repeats and 3.1% of the entire genome [38, 39], whereas the second largest fraction of retroposed elements, the Long Terminal Repeat (LTR) elements of endogenous retroviruses, with 12,000 copies, constitute only 4.7% of all interspersed repeats [38, 39]. Because CR1 elements do not generate target site duplications (direct repeats) [40, 41], excisions such as that described by van de Lagemaat cannot occur, making CR1 elements the most suitable retroposed elements in bird genomes for phylogenetic purposes.
To further investigate and reconstruct the phylogenetic tree of Galliformes, we developed a multidimensional, computer-applicable model for computing genome-wide TinT frequencies. Using this model, we describe the waves of activity, or fixation patterns, of the various CR1 subtypes. We then used this information to directly select specific subtypes of retroposed elements that were active on the galliform lineage leading to the chicken (Gallus gallus). These element subtypes were used to experimentally extract phylogenetically informative orthologous sequences from representative loci of all galliform families. Because LTR elements also insert into each other but were not frequent enough for the TinT method, we investigated their random insertions in the chicken genome for potential phylogenetic use. Furthermore, other phylogenetic signals (indels) observed in the genomic alignments of our presence/absence loci provided support for additional clades. With this multifaceted approach we reconstructed the major aspects of galliform evolution.
To obtain a relative temporal order of CR1 element activity for phylogenetic use, we explored the patterns of nested retroposed CR1 subtypes (Figure 1) in the chicken genome. From a genome-wide collection of annotated retroposed elements, we extracted all 1,978 cases of nested CR1 elements. The resulting matrix (additional data file 1) was used to calculate a multidimensional model (additional data file 1) giving the maximum probability of activity for each of the 22 CR1 subtypes on a relative timescale (TinT, see additional data file 1). The model makes the following assumptions: (i) Each CR1 subtype had one limited period of activity. (ii) The CR1 subtypes had no known target site preference, so each individual copy could have inserted at any random position in the genome, either into anonymous sequence or into another CR1 copy. (iii) The number of copies of any given CR1 subtype in the genome reflects the duration of its activity. (iv) The temporal fixation rate of each CR1 subtype can be described by a normal distribution, as shown by the divergences of its individual copies from their consensus sequences (additional data file 1). (v) The probabilities of fixation among the individual CR1 copies during their specific activity periods were approximately equal (equal promoter activity, equal affinity of reverse transcriptase to the mRNA, and equal availability of reverse transcriptase). Based on these assumptions, we developed a function describing the fixation of each CR1 subtype on a relative timescale. Using a maximum likelihood approach, we calculated the point of maximum fixation probability for each CR1 subtype.
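The full TinT model is given in additional data file 1 and is not reproduced here. As a much-simplified illustration of assumptions (i) and (iv) only, the Python sketch below assumes that the fixation times of each subtype are normally distributed with a shared, hypothetical spread sigma, so that a copy of subtype a is expected to sit inside a copy of subtype b (rather than the reverse) with probability Phi((mu_a - mu_b) / (sigma * sqrt(2))). It then fits one relative activity centre mu per subtype by maximum likelihood from counts of "a nested in b". The function names, the shared sigma, and the omission of copy-number weighting (assumption iii) are our simplifications, not the published model.

```python
import math
from typing import Dict, Iterable, Tuple


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z: float) -> float:
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def fit_activity_centres(
    counts: Dict[Tuple[str, str], int],
    subtypes: Iterable[str],
    sigma: float = 1.0,
    steps: int = 5000,
) -> Dict[str, float]:
    """Hypothetical, simplified TinT-style fit (not the published 22-D model).

    counts[(a, b)] is the number of subtype-a copies found nested inside
    subtype-b copies. Returns a relative activity centre mu per subtype,
    centred on 0; a larger mu means later activity. Maximizes
    sum n_ab * log Phi((mu_a - mu_b) / (sigma * sqrt(2))) by gradient ascent.
    """
    mu = {s: 0.0 for s in subtypes}
    total = sum(counts.values())
    if total == 0:
        return mu
    scale = sigma * math.sqrt(2.0)
    # The log-likelihood is concave with curvature at most total / sigma**2,
    # so this step size keeps plain gradient ascent stable.
    lr = sigma ** 2 / total
    for _ in range(steps):
        grad = {s: 0.0 for s in mu}
        for (a, b), n in counts.items():
            z = (mu[a] - mu[b]) / scale
            p = max(normal_cdf(z), 1e-300)  # guard against underflow
            g = n * normal_pdf(z) / (p * scale)
            grad[a] += g  # nested subtype pushed later in time
            grad[b] -= g  # host subtype pushed earlier in time
        for s in mu:
            mu[s] += lr * grad[s]
    # Only differences matter: fix the origin of the relative timescale at 0.
    centre = sum(mu.values()) / len(mu)
    return {s: m - centre for s, m in mu.items()}


if __name__ == "__main__":
    # Illustrative counts only (not data from this study).
    example = {("C", "B"): 30, ("B", "C"): 5, ("B", "A"): 25,
               ("A", "B"): 4, ("C", "A"): 40, ("A", "C"): 2}
    centres = fit_activity_centres(example, ["A", "B", "C"])
    for name, m in sorted(centres.items(), key=lambda kv: kv[1]):
        print(f"{name}: {m:+.2f}")
```

With the illustrative counts, subtype C, which is most often found inside the others, receives the latest activity centre and A the earliest. Because this simplified log-likelihood is concave in mu, plain gradient ascent suffices for a sketch; a real analysis would use a proper optimizer and report uncertainty.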
To verify the relative times of TinT activities, we calculated the average divergence of each CR1 subtype from its consensus sequence. Assuming random accumulation of mutations, the degree of divergence should be age-related. There was a significant correlation between the relative timescale of the cumulative TinT and the CR1 divergences (R = -0.6489, P << 0.01) (additional data file 1).
Armed with the relative ages of chicken CR1 subtypes deduced from the cumulative TinT, we selected a representative set of diverse CR1 elements, whose activity periods spanned the entire time frame of galliform evolution, to use as experimental probes of the phylogenetic branch points of the other galliform species. Intronic sequences of the chicken genome (17,300 introns; maximal length of 1 kb) were screened with RepeatMasker for embedded CR1 and LTR elements (300 CR1 and 45 LTR elements). These were inspected by eye in the UCSC genome browser, and the most conserved loci (120 cases), representing the broad activity range of CR1 subtypes as compiled by the TinT method, were chosen to generate conserved PCR primers for amplification of orthologous loci in representative galliforms (see Methods for species sampling). Of these, 19 loci were successfully amplified in all important taxa, revealing a total of 25 phylogenetically informative retroposed elements. The CR1 elements present in these amplified loci, along with those presented by Kaiser et al., show that at least the CR1 subtypes E, Y2, X2, Y4, F2, D2, H2, C, C2, B2, H, and G were active during galliform evolution. Moreover, the results of the cumulative TinT analysis are clearly in line with the activities of individual CR1 elements inferred from their presence/absence patterns in various species (Figure 2B). Elements represented in the older cumulative TinT peak were active during the first divergences in galliform evolution, whereas elements of the second peak were active during more recent divergences. Thus, the phylogenetic markers provide an empirical calibration of the TinT relative timescale. In further investigations, we focused on screening for intronic CR1 elements with TinT-selected activity patterns in a galliform-wide amplification.
Because our analysis of chicken sequences did not yield elements that retroposed after the divergence of the turkey and chicken lineages, we scanned the available turkey genomic trace sequences (about 6 million) for insertions of relatively young repeats (CR1-C2 and CR1-B2) to obtain phylogenetic information on potential sister group relationships along the turkey lineage. Cases in which the element was absent from the orthologous chicken locus were selected in the UCSC genome browser, and we generated conserved PCR primers for eight loci.
Within the loci containing the retroposed elements, we also found support for additional clades in the form of 95 random intronic indels (see Figure 3). One indel was found only in the Meleagridinae and Tetraoninae species, a grouping that was also recently indicated by one CR1 insertion. Other indels support shallower groupings. For example, twelve indels were exclusive to Tetrao and Tympanuchus, grouping these two species of the subfamily Tetraoninae together; three independent indels were specific to intronic sequences of Perdix and Chrysolophus; seven were unique to the two Chrysolophus species; seven were unique to a clade comprising Pavo and Afropavo, in agreement with Kimball et al.; and four were unique to Pavo muticus and Pavo cristatus. Twenty-one independent indels group together the two odontophorid genera Callipepla and Colinus, and eighteen unite Crax alector and Crax fasciolata. Together with the retropositions at points 4, 5, 6, 7, and 8 of Figure 3, these data clearly support a sister group relationship between Afropavo and Pavo, rejecting an earlier morphology-based hypothesis of a clade comprising Afropavo and Numididae. Although low-complexity RGCs have a higher probability of being homoplastic than do the insertion patterns of retroposons, we did not find any indels contradicting the tree topology supported by retroposed elements.
We provide the first retroposon evidence that Odontophoridae are the sister taxon of Phasianidae, a relationship also supported by at least one morphological feature, the presence of a well-developed intermetacarpal process on the carpometacarpus (but see Stegmann (1978) concerning the possibility of a secondary loss of this process in Numididae). Our study further provides clear evidence against a recently hypothesized sister group relationship between Perdix and Meleagridinae (Crowe et al. 2006; note that the morphological data set used in that analysis contains several incorrect character scorings).
To the best of our knowledge, it has not yet been pointed out that the sister group relationship between New World turkeys and Palaearctic grouse, which is also supported by analyses of sequence data [11, 14], indicates a New World origin of grouse. Because the clade (Meleagridinae + Tetraoninae) is nested within taxa with predominantly Asian distributions, the stem species of this clade probably reached the New World from Asia.
A striking observation from the TinT data is that the maxima of the individual CR1 subtype fixation rates fall in close temporal proximity to one another and tend to be concentrated in distinct temporal waves, as is visible from the cumulative curve (Figure 2). Interestingly, two of these peaks of CR1 fixation rates coincide with the two most highly supported branches, indicating long internal branches and/or high retroposition fixation rates. At least five parameters might affect the fixation rate of retroposons in a population on a given branch of the phylogenetic tree: (I) the promoter activity of the master gene, (II) the overall availability of the enzymatic retroposition machinery and (III) its affinity to the master RNA, (IV) the population size, and (V) the branch length. In small populations, the fixation of a single retroposition event is much more likely than in large populations. Because the cumulative curve (Figure 2) reflects the timeframes of maximum fixation rates of several independent CR1 subtypes, it is unlikely that individual peaks result from the promoter activity of any single subtype. The peaks might instead reflect times of overexpression of the total retroposition machinery or might be due to severe population bottlenecks in the ancestral chicken lineage.
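The population-size effect can be made concrete with the standard neutral-theory result, added here only as an illustration under its usual assumptions (a selectively neutral insertion in a diploid population of constant effective size N_e). A new insertion starts as one copy among 2N_e, so its probability of eventual fixation is

$$P_{\mathrm{fix}} = \frac{1}{2N_e}.$$

For hypothetical sizes, a bottleneck from N_e = 10^5 to N_e = 10^3 raises this probability from 5 × 10^-6 to 5 × 10^-4, a hundredfold increase for each new insertion.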
The GenBank accession numbers for the sequences discussed in this paper are [EU054465–EU054818].
In summary, we present a method to calculate the relative times of maximum fixation frequencies for retroposons. We applied the TinT method to obtain a temporal order of the activity of different CR1 subtypes. The results of the TinT method enabled us to preselect potentially phylogenetically informative presence/absence loci to test hypotheses for specific internal branches of a phylogenetic tree. With this preselection strategy, we found the first retroposition evidence supporting successive sister taxon relationships between Megapodiidae, Cracidae, Numididae, and the remainder of galliform birds. We present, for the first time, highly significant support for the monophyly of the phasianoid clade comprising Numididae, Odontophoridae, and Phasianidae (Figure 3, point 3). One marker suggests that Numididae are the sister taxon of a clade comprising Odontophoridae and Phasianidae. Five independent retroposon insertions presented in this study, along with four previously reported ones, offer overwhelming support for the monophyly of all investigated phasianid species (Figure 3, point 5). We also present the first significant support for a sister group relationship between Rollulus and all other investigated phasianids (Figure 3, point 6), and additional retroposon evidence for the monophyly of a clade containing Pavo, Afropavo, Coturnix, Chrysolophus, Perdix, Tragopan, Tympanuchus, Tetrao, and Meleagris (Figure 3, point 7) and of a clade comprising Chrysolophus, Perdix, Tragopan, Tympanuchus, Tetrao, and Meleagris (Figure 3, point 8). Complementary information from random indels indicates the existence of a clade including Tympanuchus, Tetrao, and Meleagris; a clade comprising Tympanuchus and Tetrao; a clade comprising Chrysolophus and Perdix to the exclusion of Tragopan, Tympanuchus, Tetrao, Meleagris, and the other investigated galliform taxa; and finally a clade comprising Pavo and Afropavo to the exclusion of all other investigated taxa.
The mathematical TinT model, applied here to birds, is currently being tested in several mammalian groups. We believe that it will prove to be a significant tool for all genome projects characterizing the activity periods of retroposed elements.
We downloaded all 207,284 annotated genomic chicken CR1 sequences, along with their 250 nt flanking regions, from the University of California Santa Cruz (UCSC) Server [54, 55]. We searched this dataset for internally retroposed element insertions using a local installation of RepeatMasker. From the RepeatMasker results, we used a custom C program to extract all nested CR1 elements with their flanking CR1 host sequences. We considered a CR1 element to be nested if the 25 nt immediately upstream and downstream of it were clearly assignable to CR1 host elements. We eliminated uninformative cases in which the host and nested CR1 elements belonged to the same subtype.
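The nesting criterion can be illustrated with a minimal Python sketch (the original extraction used a custom C program, which is not reproduced here). It assumes hypothetical RepeatMasker-style hits with 1-based inclusive coordinates on a single extracted sequence, non-overlapping annotations, and subtype labels that begin with "CR1"; the `Hit` type and the function names are illustrative only. A CR1 hit is counted as nested when the 25 nt on each side lie within CR1 host hits of the same subtype, and same-subtype pairs are discarded as uninformative.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

FLANK_NT = 25  # host sequence required on each side of a nested element


@dataclass(frozen=True)
class Hit:
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    subtype: str  # e.g. "CR1-B2" (hypothetical label format)


def is_cr1(hit: Hit) -> bool:
    return hit.subtype.startswith("CR1")


def covering_cr1(hits: List[Hit], lo: int, hi: int) -> Optional[Hit]:
    """Return a CR1 hit that covers every position lo..hi, or None."""
    for h in hits:
        if is_cr1(h) and h.start <= lo and h.end >= hi:
            return h
    return None


def count_nested(hits: List[Hit]) -> Counter:
    """Count (nested subtype, host subtype) pairs in one annotated sequence."""
    pairs: Counter = Counter()
    for x in hits:
        if not is_cr1(x):
            continue
        upstream = covering_cr1(hits, x.start - FLANK_NT, x.start - 1)
        downstream = covering_cr1(hits, x.end + 1, x.end + FLANK_NT)
        if upstream is None or downstream is None:
            continue
        # Both flanks must come from the same host subtype...
        if upstream.subtype != downstream.subtype:
            continue
        # ...and a host of the nested element's own subtype gives no ordering.
        if upstream.subtype == x.subtype:
            continue
        pairs[(x.subtype, upstream.subtype)] += 1
    return pairs
```

Summing such counters over all extracted sequences gives a host-by-nested subtype table of the kind compiled for the TinT analysis.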
The number and subtype identity of all extracted host and nested CR1 elements were compiled in a 22-dimensional matrix (additional data file 1), which was used to calculate the relative integration period and transposition activity maxima for each nested CR1 subtype. This mathematical model (additional data file 1) considers the simplest scenario, that each CR1 subtype had only one period of activity and no specific target site preferences. For visualization we calculated cumulative TinT values (Figure 2).
An independent measure of the relative temporal order of transposon activity was obtained by comparing the average levels of nucleotide divergence from their consensus sequences among the various CR1 subtypes, assuming that the highest divergence appears in the oldest subtypes, which became inactive first. The lowest divergence is expected for subtypes that are still active. However, this method depends on accurate consensus sequences. As RepeatMasker calculates the nucleotide divergence of each retroposed element from its consensus sequence, we used these values to calculate the average divergence level of each subtype (additional data file 1). We then used linear regression analysis to correlate the divergence values with the relative temporal positions of the CR1 subtypes obtained by the TinT method.
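The regression step can be sketched in a few lines of Python. Here x is each subtype's relative position on the TinT timescale (we assume larger values mean more recent activity) and y is its average RepeatMasker divergence; the function returns the least-squares intercept and slope together with Pearson's r, so that older, more divergent subtypes produce a negative r. The function name and the example numbers are illustrative assumptions, not data from this study.

```python
import math
from typing import Sequence, Tuple


def regress(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit y = intercept + slope * x; returns (intercept, slope, r)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return intercept, slope, r
```

For example, `regress([1, 2, 3, 4, 5], [20, 18, 15, 13, 10])` gives a slope of -2.5, an intercept of about 22.7, and r of about -0.998. On Python 3.10 or later, `statistics.linear_regression` and `statistics.correlation` compute the same quantities.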
We used four computational strategies to find phylogenetically informative loci featuring presence/absence patterns of retroposed elements and random indels.
An ideal starting point for investigating galliform phylogenetic relationships is the chicken, the domesticated form of the Red Junglefowl (Gallus gallus), because the genome of this model organism has been fully sequenced. Thus, retroposon insertions can be located bioinformatically, and the orthologous loci can be investigated experimentally in other galliform species. Chicken intronic sequences (23,236 introns) were downloaded from the Santa Cruz Server [55, 57]. After excluding duplicated sequences and, to facilitate PCR amplification, introns larger than 1 kb, we screened the remaining 17,300 introns for the presence of retroposed elements (RepeatMasker). The resulting loci (containing about 300 CR1 elements and 45 LTRs) were analyzed for the presence of conserved flanks (Santa Cruz Server) [58, 59], and 120 loci containing elements that retroposed during different parts of the relative timescale (TinT) were chosen to generate PCR primers.
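The filtering order described above (remove duplicates, drop introns over 1 kb, keep those carrying a retroposed element) can be expressed as a short Python sketch. The `Intron` record and its `repeats` field, which we assume already holds RepeatMasker hits mapped to class labels beginning with "CR1" or "LTR", are hypothetical; the subsequent check for conserved flanks in the genome browser is not modeled.

```python
from dataclasses import dataclass, field
from typing import Iterable, List

MAX_INTRON_NT = 1000  # 1 kb ceiling to keep loci PCR-amplifiable


@dataclass
class Intron:
    name: str
    sequence: str
    # Hypothetical: RepeatMasker hits already mapped to class labels
    # such as "CR1-C" or "LTR".
    repeats: List[str] = field(default_factory=list)


def select_candidate_introns(introns: Iterable[Intron]) -> List[Intron]:
    """Drop duplicated sequences and introns > 1 kb; keep CR1/LTR carriers."""
    seen = set()
    kept = []
    for intron in introns:
        seq = intron.sequence.upper()
        if seq in seen:
            continue  # duplicated sequence: keep only the first copy
        seen.add(seq)
        if len(seq) > MAX_INTRON_NT:
            continue
        if any(r.startswith(("CR1", "LTR")) for r in intron.repeats):
            kept.append(intron)
    return kept
```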
Because the turkey genome is currently being sequenced as well, preliminary genomic data can be used to locate retroposed elements and to experimentally investigate potential sister groups of the turkey. We downloaded all available trace sequences (6 million) from the turkey (Meleagris gallopavo) genome and searched them for retroposed element insertions (RepeatMasker). As in strategy I, the resulting 784 loci were then analyzed for the presence of conserved flanks (Santa Cruz Server), and 8 loci containing copies of the relatively young CR1-B2 and CR1-C2 subtypes were chosen to generate PCR primers to investigate potential turkey sister groups.
The introns selected under strategy I that contained LTR insertions (45 loci) were analyzed for the presence of conserved flanks (Santa Cruz Server), and 7 loci containing LTR elements were chosen to generate PCR primers.
Although no concerted effort was made to search systematically for phylogenetically informative random indels, all alignments of our retroposon presence/absence markers were checked for this potential additional source of information. This was possible because most galliform introns are highly conserved and thus more easily alignable than, for example, many mammalian introns [27, 28]. We restricted ourselves to shared indels longer than three nucleotides to ensure a certain level of sequence complexity, so that they would not be confused with coincidental random mutations.
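A minimal Python sketch of how shared indels longer than three nucleotides could be pulled out of an alignment: it records each run of gap characters per taxon and groups taxa whose gaps occupy exactly the same alignment columns. Requiring identical columns, the "-" gap symbol, and the function names are our assumptions. A gap shared by some taxa may reflect a deletion in them or an insertion in the others, so its polarity still has to be judged against the outgroups.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

MIN_INDEL_NT = 4  # "longer than three nucleotides"


def gap_runs(aligned: str) -> List[Tuple[int, int]]:
    """Return (start, end) column spans of '-' runs; 0-based, end exclusive."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, ch in enumerate(aligned):
        if ch == "-" and start is None:
            start = i
        elif ch != "-" and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:  # gap running to the end of the alignment
        runs.append((start, len(aligned)))
    return runs


def shared_indels(alignment: Dict[str, str]) -> Dict[Tuple[int, int], FrozenSet[str]]:
    """Map each long gap span to the taxa that share it exactly.

    Spans found in only one taxon or in every taxon are dropped, because
    they do not group a subset of the sampled taxa.
    """
    taxa_by_span: Dict[Tuple[int, int], Set[str]] = defaultdict(set)
    for taxon, seq in alignment.items():
        for start, end in gap_runs(seq):
            if end - start >= MIN_INDEL_NT:
                taxa_by_span[(start, end)].add(taxon)
    n_taxa = len(alignment)
    return {span: frozenset(taxa) for span, taxa in taxa_by_span.items()
            if 2 <= len(taxa) < n_taxa}
```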
We analyzed DNA samples and/or sequences from the following bird species. Anseriformes: Cairina moschata, Anas crecca; Galliformes: Alectura lathami, Crax fasciolata, Crax alector, Numida meleagris, Colinus virginianus, Callipepla squamata, Rollulus rouloul, Gallus lafayetii, Gallus gallus, Afropavo congensis, Pavo cristatus, Pavo muticus, Coturnix japonica, Perdix perdix, Chrysolophus pictus, Chrysolophus amherstiae, Tragopan caboti, Tetrao tetrix, Tympanuchus cupido, Meleagris gallopavo, and the outgroup, Taeniopygia guttata.
We designed PCR primers for sequences located in DNA regions highly conserved between chicken and human, to ensure even higher conservation among the various galliform species (additional data file 2). PCR reactions were performed using Phusion DNA Polymerase (New England BioLabs, Beverly, MA). The first high-throughput PCR was carried out in a 96-well plate format, amplifying the selected genomic loci from representatives of the investigated families and subfamilies: Cairina moschata, Alectura lathami, Crax fasciolata, Numida meleagris, Callipepla squamata, Rollulus rouloul, Gallus gallus, Pavo cristatus, Tetrao tetrix, and Meleagris gallopavo. Cycling conditions were an initial denaturation for 30 s at 98°C, followed by 35 cycles of 10 s at 98°C, 30 s at the respective primer-specific annealing temperature, and 30 s at 72°C. Following gel electrophoresis, loci in which fragment size shifts indicated the presence and/or absence of the embedded transposed elements were amplified in the expanded species sampling. All investigated PCR fragments were sequenced directly or purified on agarose gels, ligated into the pDrive Cloning Vector (Qiagen, Hilden), and electroporated into TOP10 cells (Invitrogen, Groningen). Sequencing was performed using the Ampli Taq FS Big Dye Terminator Kit (PE Biosystems, Foster City) and standard M13 forward and reverse primers.
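The gel-based prescreen amounts to comparing each band with two expected sizes: the amplicon without the element and the amplicon with it. The Python sketch below makes that comparison; the 30 bp tolerance, the function name, and deriving the empty-site size from the chicken amplicon minus the annotated element are assumptions for illustration. Because all investigated fragments were sequenced, such a call serves only as a screen, and ambiguous bands (returned as None) are left to sequencing.

```python
from typing import Optional


def classify_band(observed_bp: int, empty_bp: int, element_bp: int,
                  tolerance_bp: int = 30) -> Optional[str]:
    """Hypothetical presence/absence call from a single gel band size.

    empty_bp: expected amplicon length without the retroposed element.
    element_bp: length of the element. Returns "present", "absent", or None
    when the band matches neither expectation or both (element too short
    to resolve on the gel).
    """
    filled_bp = empty_bp + element_bp
    near_empty = abs(observed_bp - empty_bp) <= tolerance_bp
    near_filled = abs(observed_bp - filled_bp) <= tolerance_bp
    if near_empty == near_filled:
        return None  # ambiguous: resolve by sequencing
    return "present" if near_filled else "absent"
```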
In our analyses of retroposition presence/absence data, we applied the statistical test developed by Waddell et al. to determine the level of statistical support for particular branching points of the galliform phylogenetic tree. This methodology assumes the existence of a prior hypothesis based on other data and calculates the relative probability that one of the three possible branching patterns is correct, based on the number of independent retropositional markers supporting each hypothesis. A value of p < 0.05 was considered significant (usually achieved with a minimum of three independent retropositional insertions).
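The rule of thumb that three concordant insertions reach significance can be checked with a simplified calculation. The Python sketch below is not a reimplementation of Waddell et al.'s test: it assumes, as a stand-in null, that each independent marker supports any of the three possible resolutions with probability 1/3, and returns the exact binomial probability of observing at least the given number of supporting markers. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def equiprobable_tail_p(support: int, conflict_a: int = 0, conflict_b: int = 0) -> Fraction:
    """Simplified stand-in, not Waddell et al.'s exact test.

    Null assumption: each independent marker supports any of the three
    resolutions of a three-taxon question with probability 1/3. Returns the
    exact P(X >= support) for X ~ Binomial(n, 1/3), n = all markers counted.
    """
    n = support + conflict_a + conflict_b
    p = Fraction(1, 3)
    return sum(
        (comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(support, n + 1)),
        Fraction(0),
    )
```

Under this simplified null, three unanimous markers give (1/3)^3 = 1/27 ≈ 0.037 < 0.05, whereas two give 1/9 ≈ 0.11, consistent with the statement that at least three independent insertions are usually needed.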
The results of the TinT method were compared with the relative timescale obtained from the average level of CR1 nucleotide divergence using a standard linear regression model.
We are indebted to Nils Anthes, Sharon Birks, Roland Van Bocxstaele, Peter Galbusera, Herbert Grimm, Lorenz Husterer, Franz Müller, Julian Schnare, and Alexandra Wilms, who provided us with tissue samples. Many thanks go to Denise Kelsey and Loida Erhard for helping with PCR amplification and sequencing. We thank Winfried Scharlau and Volker Seibt for giving valuable comments on the multidimensional model. For editing the manuscript we thank Marsha Bundman. This work was supported by the Deutsche Forschungsgemeinschaft (SCHM 1469 to J.S. and J.B.).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.