Polytene chromosomes as indicators of phylogeny in several species groups of Drosophila

Background: Polytene chromosome banding patterns have long been used by Drosophila evolutionists to infer the degree of relatedness among taxa. Recently, nucleotide sequences have largely displaced this traditional method. We place the classical Drosophila evolutionary biology tool of polytene chromosome inversion analysis in a phylogenetic context and assess its utility in comparison to nucleotide sequences.
Results: A simultaneous analysis framework was used to examine the congruence of the chromosomal inversion data with more recent DNA sequence data in four Drosophila species groups: the melanogaster, virilis, repleta, and picture wing groups. Inversions and nucleotides were highly congruent with one another based on incongruence length difference and partitioned Bremer support values. Inversion phylogenies were less resolved because they were based on fewer characters. Partitioned Bremer supports, corrected for the number of characters in each matrix, were higher for inversion matrices.
Conclusions: Polytene chromosome data are highly congruent with DNA sequence data and, when placed in a simultaneous analysis framework, are shown to be more information rich than nucleotide data.


Background
Species in the family Drosophilidae have been premier research subjects in evolutionary biology since T. H. Morgan first used Drosophila melanogaster as a genetic tool in the early part of the 20th Century. Polytene chromosome phylogenies have become commonplace in the examination of this family of flies. The chromosomal analyses have been used in two ways in evolutionary studies. The first use is as genetic markers [1,2] in which the chromosomal inversions are considered alleles and are utilized to examine gene flow and other population genetic parameters. The second use of polytene chromosomes in evolutionary studies is as tracers of phylogeny [3][4][5][6][7]. This approach has resulted in important and detailed chromosomal phylogenies for several groups of flies in the genus Drosophila. Lists of species for which polytene chromosome maps have been produced are available [8,9], and over 300 species of Drosophila have been examined at this level. In particular, extensive chromosomal phylogenies for flies in the picture wing [4], melanogaster [5], virilis [6], and repleta [7] species groups exist. Cladistic analyses of the chromosomal data for these groups exist for the picture wing [10] and melanogaster species groups [11]. More recently, large amounts of DNA sequence information have been collected for these and many other species groups, yet no detailed analysis of the overall utility of chromosomal inversion data or of their congruence with DNA sequence data has been carried out.
The main objective of the present study is to place the chromosomal inversion data into a phylogenetic framework. We examine the congruence of DNA sequence characters and chromosomal inversion data in four different species groups in the genus Drosophila (picture wing, repleta, melanogaster and virilis) to assess the relative contribution that the two different sources of character information make to phylogenetic hypotheses.
Table 1 lists the sources of the data used in this study. Table 2 shows the results of phylogenetic analysis of the inversion and DNA sequence character partitions separately and in combination. Tree topologies of the chromosomal and DNA sequence character partitions were very similar (Figure 1). The major difference upon direct examination of the molecular and inversion cladograms in Figure 1 was the lack of resolution for inversions compared to DNA sequences. The similarity in topology of these trees was borne out by the ILD analyses. In general, the agreement of chromosomal inversion topology with DNA sequence topology was extremely good. The number of nodes in the trees that disagreed (as assessed by a negative partitioned Bremer support value) for both data partitions was extremely low (Table 3). In all species groups examined here there was at least one node that was negative for partitioned Bremer support of the molecular partition, indicating that the molecular data are in conflict with the simultaneous analysis hypothesis for these nodes. Both the melanogaster and the virilis group chromosomal data showed complete agreement with the simultaneous analysis tree, while three nodes in the repleta group tree and one node in the picture wing tree had negative partitioned Bremer supports for the inversion partition.

Figure 1
Cladograms as described in the text for the four species groups used in this study: a) the melanogaster species group; b) the virilis species group; c) the picture wing species group and d) the repleta species group. Numbers above branches indicate the bootstrap values for the nodes to the right of the number. The trees on the left are for DNA sequences only and the trees on the right are for chromosomal inversion data. < indicates a bootstrap value less than 50%.

While there were many nodes that had zero partitioned Bremer support for the inversion partition (Table 3), the total support rendered by the inversion data to the simultaneous analysis tree was relatively large ("total BS" column in Table 3). To standardize the total partitioned Bremer support contribution of each partition to the simultaneous analysis tree we divided the total Bremer support by the number of phylogenetically informative characters in that character partition and by the minimum steps of the simultaneous analysis tree ("corrected total BS" column in Table 3). Using both kinds of correction (Table 3) resulted in inversions showing higher corrected values relative to the DNA sequence characters in seven out of eight comparisons. This result is most likely due to the considerably higher consistency of the chromosomal inversion data. Only when the total Bremer support values of both the inversion data and the molecular data were standardized by the minimum steps of the simultaneous analysis for the repleta group did we find that both data partitions contributed equally to the simultaneous analysis tree.
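The two standardizations of total partitioned Bremer support described above can be illustrated with a minimal Python sketch. The function name and all numbers are hypothetical and are not taken from Table 3; the sketch assumes that each correction is a simple division of the total support by the stated quantity.

```python
def corrected_total_bs(total_bs, n_informative, min_steps):
    """Standardize a partition's total partitioned Bremer support in two ways.

    total_bs      : sum of the partition's PBS values over all nodes
    n_informative : number of parsimony-informative characters in the partition
    min_steps     : minimum number of steps used for the second correction
    """
    if n_informative <= 0 or min_steps <= 0:
        raise ValueError("n_informative and min_steps must be positive")
    return {
        "per_informative_character": total_bs / n_informative,
        "per_minimum_step": total_bs / min_steps,
    }


# Hypothetical values for illustration only (not the values in Table 3).
inversions = corrected_total_bs(total_bs=30, n_informative=20, min_steps=25)
dna = corrected_total_bs(total_bs=60, n_informative=200, min_steps=400)

for key in inversions:
    print(key, inversions[key], dna[key])
```

In this made-up example the DNA partition has the larger raw total, but the inversion partition has the larger value under both corrections, which is the pattern reported for seven of the eight comparisons.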
We also computed the consistency indices of the inversion and DNA partitions for the four data matrices using the simultaneous analysis trees for each as a constraint. The consistency indices for the inversion partitions were considerably higher than those for the DNA partitions. Previous surveys of the patterns and distributions of consistency indices with taxon number indicate that in general the CI decreases with the number of taxa in a study [12]. Figure 3 shows a plot of the CI versus number of taxa for both the inversion and the DNA partitions for all four data matrices. This figure demonstrates that inversions were highly consistent in all four cladograms and that they did not show the characteristic lowering of consistency index with increasing number of taxa that most character data show. However, the DNA partitions did show the depression of consistency index value with number of taxa. Together these analyses suggest that there is a high degree of agreement among classical chromosomal data and more recent DNA sequence data and that inversion data are extremely consistent with simultaneous analysis hypotheses of relationship in these groups of Drosophila.

Conclusion
Classical Drosophila studies have used chromosomal inversions to understand phylogeny and speciation. More recent DNA sequence information can be combined with these classical data to make inferences about phylogeny and species divergence. The approach we have taken here is to combine the chromosomal inversion data with DNA sequence data to examine some of the classical notions of Drosophila evolution. Our results suggest that there is a great deal of congruence between DNA sequence data and chromosomal inversion data. Although this result is reassuring, there are still some areas of the phylogenies of the four species groups examined here that are not congruent. These areas are indicative of poor phylogenetic signal from one or both of the kinds of data (DNA and chromosomal inversion data). Chromosomal inversion data are much more information rich, as assessed in the present study, and this is probably due to the higher consistency of these characters. This higher consistency stems from the fact that, while nucleotide characters have only four possible states to change to and from, chromosome breaks can occur at many different locations. Therefore, it is far more likely for an A to change to a T independently in two unrelated lineages than for the exact same chromosome break to occur in those same lineages.

Figure 2
Cladograms showing the total evidence hypotheses for combined analyses of DNA sequences and chromosomes. a) the melanogaster species group; b) the virilis species group; c) the picture wing species group and d) the repleta species group. The number above each node indicates the bootstrap value and the number below indicates the Bremer index. < indicates a bootstrap value less than 50%.
Gene Abbreviations: Yp-1 = Yolk protein, Adh = Alcohol dehydrogenase, Sry = Serendipity, nullo = nullo, 28S = 28S rDNA, 5S = 5S rDNA, Amy = α-amylase, G6pd = glycerol-6-phosphate dehydrogenase, COII = cytochrome oxidase subunit II, 16S = 16S rDNA, hb = hunchback, ND1 = NADH dehydrogenase subunit 1, bp = base pairs, Chromo = chromosome.
Abbreviations: Total = combined data set, Molecular = only DNA sequence data, Inversions = only inversion data, CI = consistency index, RI = retention index, PI = number of parsimony informative characters in the data partition, ST = length of shortest tree, #T = number of most parsimonious trees obtained in the analysis, nch = number of characters in the data set, ntx = number of taxa.

Data matrices
Four data matrices were constructed using DNA sequences and chromosomal inversion data from the literature (Table 1). The four species groups for which we have obtained inversion data represent the four major species groups for which such data exist. Chromosome inversion information for other smaller groups such as the antopocerus species group (Hawaiian Drosophila) has been published [13]; however, DNA sequence data are not yet available for these groups.

Phylogenetic Trees
Phylogenetic trees were generated using PAUP 4.0b7 [14]. To assess the relative contribution of chromosomal inversion and DNA sequence data, we placed our analysis in a simultaneous analysis framework [15][16][17]. Bootstrap values were computed using PAUP 4.0b7. Decay indices were computed using AUTODECAY [18] and the methods described in Baker et al. [19]. The significance of Incongruence Length Differences (ILDs [20,21]) was calculated using the Partition Homogeneity Option in PAUP 4.0b7 [14]. Partitioned Bremer supports for the inversion partition and the DNA partition were calculated using the approaches outlined in Baker et al. [19].
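For orientation, the incongruence length difference for two partitions is commonly written as follows. The formula is not stated in this paper; the symbols are introduced here for illustration only, with each L denoting the length of the most parsimonious tree(s) for the indicated data set.

```latex
D_{\mathrm{ILD}} = L_{\mathrm{combined}} - \left( L_{\mathrm{inversions}} + L_{\mathrm{DNA}} \right)
```

Under this definition, significance is assessed by comparing the observed value with the values obtained from repeated random partitions of the combined characters into sets of the same sizes as the original partitions.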

Phylogenetic Measures
Here we include definitions of several phylogenetic measures used in this paper. The consistency index (CI; [39]) is used to determine how much homoplasy (i.e., how many times a character evolves on a tree) a given character displays. The CI of a character is the minimum number of steps for that character on a given tree divided by the total number of steps reconstructed for that character on the same tree. Characters that are fully consistent, that is, without homoplasy, have a consistency index of 1.
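The definition of the consistency index can be made concrete with a small Python sketch. It counts steps for one unordered character with Fitch parsimony on a rooted binary tree and takes the minimum possible number of steps to be the number of observed states minus one. The tree, taxon names and character codings are hypothetical, and the sketch illustrates the definition only; it is not the procedure implemented in PAUP.

```python
def fitch_steps(tree, states):
    """Count the state changes for one unordered character (Fitch parsimony).

    tree   : nested 2-tuples with taxon names (str) as leaves
    states : dict mapping taxon name -> character state
    """
    steps = 0

    def visit(node):
        nonlocal steps
        if isinstance(node, str):          # leaf: its observed state
            return {states[node]}
        left, right = node
        a, b = visit(left), visit(right)
        common = a & b
        if common:                         # children agree: no change needed
            return common
        steps += 1                         # children disagree: one change
        return a | b

    visit(tree)
    return steps


def consistency_index(tree, states):
    """CI = minimum possible steps / steps reconstructed on the given tree."""
    min_steps = len(set(states.values())) - 1
    observed = fitch_steps(tree, states)
    if observed == 0:
        raise ValueError("character is invariant; CI is undefined")
    return min_steps / observed


# Hypothetical five-taxon tree and two binary characters (1 = inverted).
tree = ((("A", "B"), "C"), ("D", "E"))
unique_inversion = {"A": "1", "B": "1", "C": "0", "D": "0", "E": "0"}
homoplastic_site = {"A": "1", "B": "0", "C": "0", "D": "1", "E": "0"}

print(consistency_index(tree, unique_inversion))   # 1.0, no homoplasy
print(consistency_index(tree, homoplastic_site))   # 0.5, two steps where one is the minimum
```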

Figure 3
Plot of the consistency index of chromosomal inversion partition (dotted line) and the DNA sequence partition (solid line) when forced to fit the parsimony tree versus the number of taxa in the data set. The melanogaster species group analysis has eight taxa, the virilis species group analysis has 12 taxa, the picture wing species group has 35 taxa and the repleta species group has 54 taxa.
The decay index (DI), or Bremer support value, is used to measure support at a node of interest in a phylogenetic tree. To obtain the decay index, the length of the unconstrained (most parsimonious) tree is subtracted from the length of the shortest tree constrained not to contain the node of interest. Higher numbers generally indicate greater support at a node. Partitioned Bremer support (PBS; [19]) is an extension of the decay index that shows the contribution of each individual partition to the decay index of every node on the combined analysis tree. To obtain the PBS value for a given node on a combined tree, the length of the partition on the unconstrained combined tree is subtracted from the length of that partition on a tree constrained not to contain the node of interest. If the partition supports a relationship represented by a node in the combined tree, then the PBS value will be positive. If, on the other hand, a partition supports an alternative relationship, the PBS value will be negative. The magnitude of a PBS value indicates the level of support for, or disagreement with, a node. The sum of the PBS values of all partitions for any given node will always equal the decay index for that node on the total evidence tree.
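The relationship between partitioned Bremer support and the decay index can be sketched in a few lines of Python. The partition names and tree lengths below are hypothetical; in practice the lengths would come from constrained and unconstrained parsimony searches.

```python
def partitioned_bremer_support(unconstrained, constrained):
    """Return the PBS of each partition for one node.

    unconstrained : dict of partition name -> length of that partition
                    on the most parsimonious combined tree
    constrained   : dict of partition name -> length of that partition
                    on the shortest tree lacking the node of interest
    """
    if unconstrained.keys() != constrained.keys():
        raise ValueError("both trees must report the same partitions")
    return {name: constrained[name] - unconstrained[name]
            for name in unconstrained}


def decay_index(unconstrained, constrained):
    """Decay index: constrained total length minus unconstrained total length."""
    return sum(constrained.values()) - sum(unconstrained.values())


# Hypothetical partition lengths for a single node.
unconstrained = {"inversions": 40, "dna": 510}
constrained = {"inversions": 43, "dna": 509}

pbs = partitioned_bremer_support(unconstrained, constrained)
di = decay_index(unconstrained, constrained)

print(pbs)  # {'inversions': 3, 'dna': -1}
print(di)   # 2
assert sum(pbs.values()) == di  # PBS values sum to the decay index
```

Here the inversion partition supports the node (positive PBS) while the DNA partition weakly favors an alternative (negative PBS), and the two values sum to the decay index of the node.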