Phylogenetic analysis and temporal diversification of mosquitoes (Diptera: Culicidae) based on nuclear genes and morphology

Background Phylogenetic analyses provide a framework for examining the evolution of morphological and molecular diversity, interpreting patterns in biogeography, and achieving a stable classification. The generic and suprageneric relationships within mosquitoes (Diptera: Culicidae) are poorly resolved, making these subjects difficult to address. Results We carried out maximum parsimony and maximum likelihood, including Bayesian, analyses on a data set consisting of six nuclear genes and 80 morphological characters to assess their ability to resolve relationships among 25 genera. We also estimated divergence times based on sequence data and fossil calibration points, using Bayesian relaxed clock methods. Strong support was recovered for the basal position and monophyly of the subfamily Anophelinae and the tribes Aedini and Sabethini of subfamily Culicinae. Divergence times for major culicid lineages date to the early Cretaceous. Conclusions Deeper relationships within the family remain poorly resolved, suggesting the need for additional taxonomic sampling. Our results support the notion of rapid radiations early in the diversification of mosquitoes.


Background
Mosquitoes (Diptera: Culicidae) are a monophyletic group of true flies [1][2][3][4], recognizable by their elongate adult mouthparts through which the females of most species feed on vertebrate blood. Mosquitoes occur throughout temperate and tropical regions, and well beyond the Arctic Circle, but are most diverse in tropical forest environments [5]. A bewildering amount of morphological diversity parallels their spectacular radiation into virtually every conceivable collection of water, ranging from a few droplets trapped by plant parts to large bodies of fresh and brackish surface water, making mosquitoes "as ubiquitous as water" [6]. The relationship between human health and those species that are medically important (<200 of 3,524 currently recognized; http://mosquito-tax onomic-inventory.info/) has driven most mosquito research. Within this small subset of disease vector species, morphological similarities between close relatives (e.g., cryptic or sibling species complexes) continue to pose practical and academic challenges to disease control, conventional taxonomy, and phylogenetic inference. Ironically, it is not the morphological similarity but rather the morphological diversity of mosquitoes that has confounded efforts to delimit many supraspecific groups and reconstruct their evolutionary history.
The Culicidae had an ancient origin, probably in the Jurassic [7,8], consistent with the fossil record of their sister group Chaoboridae [9]. Unfortunately, the sparse mosquito fossil record sheds no light on evolutionary relationships in the family. Traditional classification of Culicidae based on the phenetic framework of Edwards [7] resulted in arbitrary groupings reflecting intuitive interpretation of morphological similarities. More modern classifications have incorporated important revisions of select genera and tribes based on explicit methodology, but in the absence of a comprehensive application of quantitative methods across the family, the result still does not entirely reflect evolutionary history [reviewed in [2]]. Current classification divides the family into two subfamilies (Anophelinae and Culicinae), 11 tribes, and a minimum of 44 genera [2,[10][11][12][13] (Table 1).
Generic-level relationships across all Culicidae have rarely been studied. The first attempt [14] was based on comparative bionomics and morphology using intuitive methods typical of that time. Surprisingly, further attempts using modern cladistic methods were not made for nearly 50 years. The most comprehensive of these phylogenetic reanalyses employed 73 morphological characters to examine the relationships of the 38 genera then recognized [1]. In general, almost none of the hypotheses raised by the 1951 phylogeny were supported in the 1998 reconstruction, with few exceptions including the monophyletic and basal position of the subfamily Anophelinae, and the monophyly of the tribes Sabethini and Culicini. However, most characters were homoplastic -some extensively -and many relationships were inadequately resolved. Although the Harbach and Kitching [1] study challenged traditional generic groupings and reinforced the need for reappraisal, it also suggested that robust recovery of generic-level relationships of Culicidae may be difficult with morphological characters alone.
Only four higher-level phylogenies of Culicidae based on gene sequences have been published, each of which were very taxon-limited in scope. All were able to show that, in agreement with the morphological phylogenies, Anopheles was sister to other sampled genera [4,[15][16][17]. The study by Miller et al. [4], based on four mosquito species, was inconclusive and the more comprehensive study of Shepard et al. [15], based on 18S rDNA sequences of 39 species representing nine genera, was unable to resolve deeper relationships. Of the four studies, only Besansky and Fahey [16] employed a single-copy nuclear protein coding gene to assess relationships. Their study, based on the white gene, included 13 species representing nine genera. When third codon positions were excluded, Anophelinae was recovered as a basal lineage and Sabethini, Culicini and Aedini were recovered as monophyletic, suggesting the potential of protein-coding sequences for reconstructing generic-level relationships within Culicidae.
The importance of sampling multiple genes when attempting to reconstruct species phylogenies is well recognized [18]. Mitochondrial DNA and nuclear ribosomal DNA are convenient targets, due to conserved primer binding sequences and ease of amplification based on their typically high copy number. However, both can be problematic for resolving deep phylogenetic relationships. Mitochondrial DNA may exhibit a high mutation rate, skewed base composition, and even symbiontinduced biases [19,20]. Beyond base compositional bias, ribosomal DNA also can be exceedingly difficult to align [21]. Given the increasing availability of completely sequenced mosquito genomes, protein-coding nuclear genes represent a viable alternative as well as a rich and largely untapped resource.
In the study reported here, we explored the phylogenetic utility of six nuclear protein-coding genes: arginine kinase, CAD, catalase, enolase, hunchback, and white. As noted above, only white was used previously in mosquitoes [16]. All except catalase have been used in other insect groups: CAD in bees, empidoid flies, and lacewings, among others [22][23][24]; arginine kinase in hymenopterans [25,26]; enolase in beetles [27]; and hunchback in Hawaiian drosophilids [28]. We sequenced these six genes from 26 mosquito species representing 25 genera, and two chaoborid outgroup species. In addition, 80 morphological characters were scored from these mosquito and outgroup species. Our goals were twofold: (1) to estimate a generic-level phylogeny of Culicidae based on molecular and morphological evidence, and (2) to use fossils and sequence data to infer divergence times for major culicid lineages.

Taxon sampling
Twenty-six species of mosquitoes representing 25 genera were used as ingroup taxa (Table 1). Two chaoborid midges, Eucorethra underwoodi and Chaoborus astictopus, were used as outgroup taxa based on the sister-group relationship between Chaoboridae and Culicidae [3,29,30]. Specimens were preserved in 70-100% ethanol at -20°C. Sampling of additional ingroup genera for this study was precluded by unavailability of specimens adequately preserved for molecular analysis; DNA from pinned museum specimens or specimens stored at room temperature for prolonged periods was found to be excessively degraded.

DNA extraction, PCR, and sequencing
Sequences were obtained from six nuclear protein-coding genes (arginine kinase, CAD, catalase, enolase, hunchback, and white) from VectorBase http://www.vectorbase.org for the three mosquito species with completely sequenced genomes (Anopheles gambiae, Culex quinquefasciatus, and Aedes aegypti), from GenBank for available white sequences (GenBank accession numbers U73829, AF318199, AF318200, U73834, U73827, AF318206, U73835, U73837, AY055811, AF318193, AF318209), or by PCR amplification and direct sequencing. Genes were amplified and sequenced using degenerate primers previously reported in the literature, or designed based on amino acid alignments of respective genes from the three sequenced mosquito species. Primer sequences and their sources are provided in Table 2. Primers were located in regions that would allow amplification within a single large exon, or across exons separated by small introns, to facilitate amplification from genomic DNA templates. Primers were hemi-nested, whereby the first round of amplification based on the outermost pair was followed by alternative second round PCR from which two internal, overlapping fragments were amplified ( Figure 1).

Phylogenetic analysis
Nucleotide sequence alignments were guided by the corresponding amino acid alignments, using utilities within the program suite EMBOSS [33]; http://emboss gui.sourceforge.net/demo/. Inferred amino acid sequences were aligned using the program "emma," which provides an interface to ClustalW [34]. The Gonnet matrix [35] was chosen and the resulting alignment was followed by limited manual adjustment. "Tranalign" was then used to align the nucleotide sequence based on the previously aligned amino acid sequence. Introns were removed from aligned gene regions before analysis.
Basic sequence information (pairwise sequence divergence, base composition, statistical tests of homogeneity of base composition, number of variable and parsimony informative characters) was obtained using PAUP* v4.0b10 [36]. In addition, plots of transitions and transversions versus divergence at each codon position were based on observed (uncorrected) p-distances from PAUP*.
Maximum parsimony (MP) analyses were implemented in PAUP* on a phylogenetic data set that included concatenated genes with/without morphological characters. Third codon positions for each gene were removed and gaps were treated as missing data. Heuristic searches consisted of 1000 random sequence additions with tree bisection-reconnection (TBR) branch swapping. Bootstrap support values were based on 500 replicates, each with 10 random additions and TBR branch swapping.
Maximum likelihood (ML) analyses were performed on molecular data sets only, which included both individual/ concatenated genes, with/without third codon positions. The ML heuristic searches were performed in PAUP*, using the model of nucleotide substitution and parameter values selected via Modeltest [37]. Values for the substitution matrix, base composition, gamma distribution of among-site rate variation (G) and the proportion of invariant sites (I) are available from the authors on request. Bootstrap resampling was conducted using 1000 replicate neighbour-joining (NJ) trees based on the ML substitution matrix. The Shimodaira-Hasegawa test [ [38]; data not shown] was used to test for incongruence between phylogenies suggested by individual genes, or successive combinations of congruent genes. Bayesian (BI) phylogenetic tree searches were performed in MrBayes 3.1.2 [39] on concatenated gene and gene + morphology data sets using aligned nucleotides, both including and excluding third codon positions, and using concatenated aligned amino acids. For concatenated genes (nucleotides) and genes (nucleotides) + morphology, a mixed model approach was used with model parameters specified per gene partition according to Modeltest and a Markov K + G model for morphology, with branch lengths unlinked and estimated for each partition [40,41]. Bayesian tree searches using aligned amino acids were carried out in MrBayes 3.1.2 using the WAG model of amino acid evolution (WAG+I+G) [42]. Each Bayesian search was carried out for 10,000,000 generations (sampling every 1000) using four chains (default heating parameters) and a 30% burn-in value. The included Bayesian sets of trees were sampled after likelihood scores reached convergence and the mean split difference values were below 0.02.

Divergence time estimation
Estimates of divergence times for mosquito lineages were calculated using the parametric Bayesian-relaxed clock approach implemented in the programs ESTBRANCHES and MULTIDIVTIME [43] and using the combined gene data set (nucleotides) including third codon positions. Branch lengths and evolutionary rate priors were estimated from the data using the BASEML program in the PAML software package [44] and ESTBRANCHES. Tree topology, minimum and maximum root node age, and fossil-based minimum age constraints are set as userdefined analysis priors. For the tree topology we used the tree recovered from the BI search of combined amino acids (see Phylogenetic Analyses, below). The root max-min age prior between Culicidae and outgroups was set as 230-187 Ma corresponding to the hypothesized age of the Diptera [6,8] and a fossil assignable to the Chaoboridae [187 Ma; ref [9]], and three lineages were constrained according to fossil-based minimum ages (Toxorhynchites mexicanus, 16 Ma; Culex winchesteri, 34 Ma; Anopheles dominicanus, 34 Ma; http://mosquito-taxonomic-inven tory.info/category/fossil-culicidae/fossil-culicidae) [45]. We followed the analytical procedure described in Rutschmann et al. [46] and in the MULTIDIVTIME readme files, and ran the Markov chain for 1.1 × 10 6 cycles with samples collected every 100 cycles and discarded the first 100,000 cycles as burn-in. We performed the MULTIDIV-TIME analysis multiple times from different initial conditions to confirm convergence of the Markov chain on highly similar resulting time estimates and posterior intervals.

Sequence variation
Across six genes, the molecular data matrix consisted of 5352 aligned characters, of which 2839 were variable and 2259 were parsimony informative (Table 3). Not surprisingly, most of the variation was found in the third codon (nt3) position. Analysis of base composition of the combined genes for each major taxonomic grouping revealed significant departures from homogeneity at the nt3 position, owing to three groups: Anophelinae, Culicini (Cx. quinquefasciatus) and Ficalbiini (Mimomyia luzonensis) ( Table 4). Moreover, plots of transitions and transversions against uncorrected pairwise nucleotide divergences at each codon position suggested saturation of transitions at the nt3 position ( Figure 2). These results prompted us to perform ML and BI phylogenetic analyses both with and without the nt3 partition, and to exclude this partition in MP analyses (see Phylogenetic analyses).
Uncorrected pairwise sequence divergence within the mosquitoes sampled ranged widely, from 10-27% (summarized by major taxonomic groupings in Table 5). However, the average distance between tribes and subfamilies was at the upper end of the range (20% and 23%, respectively), approaching that between mosquitoes and their sister group, the chaoborid midges (26%).

Phylogenetic analyses
Evidence was found for incongruence among some genes or gene combinations via the Shimodaira-Hasegawa test [38]; data not shown. However, examination of phylogenies resulting from individual genes revealed that topological incongruence was generally limited to certain poorly supported nodes. It has been suggested that different data sets may have a common phylogenetic signal recoverable only upon combined analysis [47,48], and, under the hypothesis that combining data from multiple genes may potentially overcome misleading signal in individual genes [49], we conducted further analyses to compare results from a concatenated data set with results obtained from ML and BI.
Relationships inferred by MP and ML, including BI, are summarized in Figure 3 and Table 6. All three algorithms, as applied to various data partitions (± nt3; ± morphological characters; nucleotides or amino acids), gave overwhelming support for the monophyly of Culicidae (node O), the monophyly and basal position of the subfamily Anophelinae (node A; gray box), and the monophyly of the tribe Sabethini (node I; gray box). Less conclusive support by ML, but reasonable support by MP and BI, was observed for the monophyly of the tribe Aedini (node C). These results confirm the conclusions of Harbach [2] regarding what was already known about the phylogeny of mosquitoes. Support varied for relationships within these well-supported clades. The subfamily Anophelinae was represented in this study by three species from two genera: Bironella (Bironella gracilis) and Anopheles (Anopheles atroparvus, subgenus Anopheles; An. gambiae, subgenus Cellia). Not all analyses supported the monophyly of the genus Anopheles; an alternative relationship of Bi. gracilis + An. atroparvus was also recovered. There is precedence for the paraphyly of Anopheles relative to Bironella in previous morphological [50,51] and molecular [52] studies. Reliable inference of relationships between these groups may be problematic due to conflicting signals or contemporaneous radiations, but the suggestion of Sallum et al. [50] to redefine Bironella as an informal group within Anopheles seems premature [2,53].
Within tribe Sabethini, Malaya occupied the most basal position among the taxa sampled, although this placement was not recovered in a subset of BI analyses. The genus Maorigoeldia, containing only a single species exclusive to New Zealand, was sister to Tripteroides (Oriental, Australasian and Palaearctic species), in all cases with  100% bootstrap support or posterior probability of 1.0. These relationships were not recovered in cladistic analyses of morphological data that included representatives of a larger number of genera. Belkin [54] regarded Maorigoeldia to be sister to all other sabethine species. This was supported in the studies of Harbach and Kitching [1], Harbach and Peyton [55], and Harbach et al. [56], but not in the study of Judd [57], which placed Maorigoeldia as the sister group to the New World genera of Sabethini. Whereas Malaya was recovered as the sister of genus Topomyia in the first two of these four studies, it was paired with Limatus in the most derived clade of Sabethini when Harbach et al. [56] included the new genus Kimia in the data set of Harbach and Peyton [55]. Also, Tripteroides (Old World) was recovered as sister to Trichoprosopon (New World), which is supported by shared morphological characters that are unique to these two genera.
Decisive support for monophyly of the New World genera (Limatus, Sabethes, Shannoniana, Trichoprosopon, and Wyeomyia) was found, in agreement with previous studies [2,57]. Among these genera, a close relationship between Numbers of transitions or transversions at each codon position (nt1, nt2 and nt3) plotted against uncorrected nucleotide divergence for pairwise species comparisons across the combined six-gene data set Limatus, Sabethes and Wyeomyia was strongly supported by all analyses, but other nodes were unstable. Although not apparent in Figure 3, Shannoniana and Trichoprosopon showed a sister relationship in all but the BI amino acid analysis. However, as indicated in the previous paragraph, the results of cladistic analyses based on morphological data and a larger sample of sabethine taxa casts doubt on these relationships.
The remarkably large tribe Aedini (1255 species, http:// mosquito-taxonomic-inventory.info/taxonomy/term/ 6065) has been the subject of recent efforts to infer higherlevel relationships based on morphological characters of all life stages [10][11][12][13]. Although this has resulted in major changes to classification, phylogenetic resolution has been limited. In the present study, as in the cladistic analyses of extensive morphological data by Reinert et al. [11][12][13], Psorophora was recovered as sister to all other Aedini.
Sister-group relationships strongly supported in most cases were Aedes (Stegomyia) + Eretmapodites and Haemagogus + Ochlerotatus. Other relationships within Aedini were less clear. Moreover, no consensus could be reached regarding affinities of any other genera within Culicidae as a whole, outside of Aedini and Sabethini.

Divergence time estimates
A chronogram for Culicidae is given in Figure 4, and corresponding divergence time estimates are provided in Table 7. Based on the taxa sampled and three fossil constraints, earliest divergence within mosquitoes -between the lineages leading to Culicinae and Anophelinae -dates to ~226 Ma. This estimate is in reasonable agreement with Krzywinski et al. [58], who determined the split between Anopheles and Aedes (Stegomyia) to have occurred ~145-200 Ma based on mitochondrial DNA sequences. Although 226 Ma is substantially older than the 118 Ma divergence between Chaoboridae and Culicidae estimated by Bertone et al. [8], the 95% credibility ranges overlap between studies. Because the latter study was aimed at deeper divergences within lower Diptera, Bertone et al. [8] only included two mosquitoes and one chaoborid, possibly accounting for the discrepancy. Moreover, Bertone et al. [8] estimated divergence times from a single gene (28S rDNA) and used only a few fossil calibration points (none close to Culicidae), which may also have contributed to differences in age estimates. We favor the older divergence estimates, as they are consistent with other evidence suggesting that mosquitoes likely originated in the Jurassic [7,59]. As early as 1923, Edwards [60] surmized that the "origin and phylogenetic history of the Culicidae must go back to well into the Mesozoic Era."   Phylogram of relationships among mosquito species, inferred by Bayesian likelihood analysis of combined amino acids Figure 3 Phylogram of relationships among mosquito species, inferred by Bayesian likelihood analysis of combined amino acids. Amount of inferred character change is indicated by the scale bar below. Numbers associated with nodes are Bayesian posterior probabilities above 0.5. Letters associated with nodes refer to bootstrap support values and posterior probabilities estimated from alternative analyses, provided in Table 6 Table 4), is presently unknown. However, given the universal agreement that Anopheles occupies an early-branching position among the Culicidae, it seems likely that our crown-group estimates do not accurately reflect the age of this group.
The only well-supported clades of the subfamily Culicinae in the phylogenetic analyses are the tribes Aedini and Sabethini, which apparently arose at similar times (roughly 112 and 115 Ma, respectively) and diversified more recently. Eight genera of the subfamily Culicinae (Aedeomyia, Coquillettidia, Culiseta, Culex, Mimomyia, Orthopodomyia, Toxorhynchites, Uranotaenia), whose relationships were not strongly or consistently recovered, represent the deeper internal branches of the tree. The nodes connecting these branches are not only ancient (exceeding 127 Ma), but also relatively close together in time, occurring within a ~30 million year interval between 127-158 Ma. If these estimates are corroborated in the future by denser taxon sampling and more fossil-based age constraints, they will support the notion of rapid radiations early in the diversification of mosquitoes, potentially explaining the difficulty in attaining a stable phylogeny for these lineages. An early Cretaceous timing for these rapid radiations is consistent with the appearance of angiosperms, a group of plants whose nectar is exploited as an energy source by mosquitoes [61], and whose waterfilled parts are the sole habitats occupied by the immature stages of many groups of mosquitoes, notably members of the tribe Sabethini [54].

Conclusion
This study represents one of the few attempts to reconstruct generic-level relationships within Culicidae as a whole, and the only attempt to combine morphological data and molecular characters from multiple genes. Among molecular phylogenetic studies of the family, it more than doubled the number of taxa sampled to date. Yet results were mixed. The ability to recover previously known clades (Anophelinae, Sabethini, and Aedini) was encouraging. However, the deeper relationships among genera could not be resolved unambiguously, potentially due to ancient and rapid radiation, as hypothesized for other insect groups [6]. There has been much debate regarding whether better resolution and support of relationships is achieved through broader taxonomic sampling [62,63] or sequencing of more loci [64]. The current explosion of sequencing whole genomes from organisms, including mosquitoes, promises many potentially informative genes beyond those included here. On the other hand, the recent study by Wiegmann et al. [49] successfully resolved even deeper divergences and a longstanding controversy in the phylogeny of holometabolous insects, using only six single-copy nuclear genes comprising a similar number of base pairs to that compiled for the present study. Although more molecular, as well as morphological, characters may well prove useful, there is little doubt that broader taxonomic sampling is now the key roadblock. Considering that the mosquito diversity housed in museums is almost invariably preserved in a fashion that has impeded conventional molecular data collection, this roadblock may be substantial. There is an urgent need for fresh museum collections, particularly from under-sampled yet high-biodiversity regions worldwide, and their cryo-or ethanol preservation with vouch- Chronogram of mosquito age divergences with 95% confidence intervals (red bars) Figure 4 Chronogram of mosquito age divergences with 95% confidence intervals (red bars) Numerical node ages and their 95% confidence intervals are presented in Table 7. Calibration points: Chaoboridae+Culicidae, 230-187 Ma [8,9]  ers. The present study was limited by what was available in existing collections. Broader taxon sampling is crucial not merely because it may help break up long branches. To the extent that the current generic system of classification includes paraphyletic and polyphyletic groups containing numerous species, it is clear that inclusion of only one or few generic exemplars can be misleading, and that more representative sampling is needed.