The study system
The species of the ant genus Lasius are currently placed in six subgenera, Acanthomyops, Austrolasius, Cautolasius, Chthonolasius, Dendrolasius, and Lasius sensu stricto . The most recent taxonomic revision at the genus level dates back to 1955 . It recognised four subgenera: Cautolasius, Chthonolasius, Dendrolasius, and Lasius sensu stricto. The fifth subgenus, Austrolasius, was established in 1967  and includes one species which had up to then been placed in Chthonolasius. The sixth subgenus, Acanthomyops, has an unstable history in that its status changed several times between that of a separate genus and being a subgenus, mostly of Lasius . Evidence for its inclusion into Lasius has accumulated [38, 76] and it was therefore formally returned to Lasius . Monophyly of the subgenera is supported by various pieces of evidence , except for Lasius sensu stricto. Lasius sensu stricto harbours one taxon, L. pallitarsis, which on morphological and molecular grounds has been hypothesised to best constitute a separate subgenus . We here validate these earlier findings .
Social parasitism is confined to four subgenera, Acanthomyops, Austrolasius, Chthonolasius, and Dendrolasius, with all species of these subgenera obligatorily displaying this lifestyle [1, 66]. One subgenus, Dendrolasius, is hyperparasitic in that it parasitises parasitic Chthonolasius species [1, 36]. Fungiculture is known from two subgenera, Chthonolasius, and Dendrolasius  and, as far as known, all members of the two subgenera use fungi (B.C. Schlick-Steiner, unpublished data; M. Maruyama, unpublished data; ), although New World species still await study.
The material of this study is listed in Table 1. It comprises 27 species of Lasius (including one undescribed) representing all 6 of the subgenera currently recognised as well as Lasius pallitarsis. We thus present data on more than a quarter of the currently 100 valid extant Lasius species , covering both the Palaearctic and the Nearctic regions. The number of species per subgenus ranged from one to 11. Only two and three species each of Chthonolasius and Acanthomyops, respectively, were analysed, with each sample carefully chosen to be unambiguously identified to species (because the species of these subgenera are known for their habitual multidirectional hybridisation [78, 79] which is likely to compromise the resolving power of gene sequences  and distorts morphological characters ). To account for intraspecific diversity, up to four colonies per species were included wherever possible. Two species of Myrmecocystus, which genus is sister to Lasius [[42, 49]; confirmed by a personal communication by P.S. Ward, of February 2008], as well as Formica japonica were used as outgroup. Species were identified according to [1, 36, 77, 81–83]. Voucher specimens are deposited at the National Science Museum, Tokyo under the voucher numbers listed in Table 1.
Some samples were irreplaceable dried museum specimens. Therefore, genomic DNA was extracted from whole body using a DNAeasy tissue kit (Qiagen, Hilden, Germany) using established protocols  without any damage to the voucher specimens. We used 1 μl of DNA (25 – 50 ng/μl) as template for PCR amplification. A 490 – 550 bp region of 16S rRNA was amplified and sequenced using primers "16Sar-L" 5'-CGCCTGTTTATCAAAAACAT-3' and "16Sar-L2" 5'-CCGGTCTGAACTCAGATCATG-3' originally taken from  but the latter one slightly altered and thus renamed. A ca. 900 bp region of cox1 was amplified and sequenced using the new, degenerated primers "Lasius-L" 5'-TAYCCGCCATTAGCTTCAAA-3' and "Lasius-R" 5'-TGAAATTAAGGATCCAATWGA-3'. Reactions were carried out at 10 μl volumes in a PCR Thermal Cycler MP (TaKaRa Bio Inc.) under the following conditions: a first cycle of 94°C for 3 min, followed by 35 cycles of 94°C for 30 s, annealing at 50°C for 50 s, and finally 72°C for 1 min for the 16S rRNA; for cox1 all settings were identical except for annealing which was set to 42°C for 1 min 15 s. PCR products were purified with 0.5 μl of ExoSap-IT (GE Healthcare Life Sciences). All products were sequenced in both directions using BigDye Terminator v3.1 (Applied Biosystems) on an ABI 3100 Avant DNA Sequencer (Applied Biosystems) at the National Science Museum, Tokyo. The sequence data were deposited at DNA Data Base of Japan, DDBJ (see Table 1 for accession numbers).
Exploration of molecular data concerning potential causes for phylogenetic distortion
The 16S rRNA and cox1 sequences were aligned with default settings of the program Clustal X v1.83 ; ambiguously aligned sites in the 16S rRNA alignment were excluded. We partitioned the cox1 sequences into the first, second and third codon positions using the program DAMBE v4.2.13 . This program was also used to perform tests for the saturation of substitutions  on the cox1 and 16S rRNA data. For cox1 all codon positions were tested simultaneously, as well as separately. We found no indication of substitution saturation (see Results for detail), but we additionally performed the cox1 MCMC analysis without the third codon position.
To detect potential positive selection we used the program HYPHY  accessed through the Datamonkey interface http://www.datamonkey.org. Mean numbers of nonsynonymous substitutions (dN) and synonymous substitutions (dS) per site (ratio dN/dS) were estimated in the cox1 data using the fixed effect (two-rate FEL) method and basing estimates on a Neighbour-Joining tree under the HKY substitution model. We used a nominal alpha level of 0.1.
To avoid any effect from compositional heterogeneity of sequences  on the phylogenetic reconstruction we separately tested each codon position of cox1 as well as 16S rRNA using the program TREE-PUZZLE 5.2 . When we found indications for compositional heterogeneity we recoded the sequences into purines and pyrimidines and repeated the test.
The morphological characters considered are presented in Additional File 1. Our rationale in composing the character set was aiming at, first, a comprehensive capturing of variation at the species, subgenus and genus level, and, second, exclusion of any characters potentially distorting the phylogenetic reconstructions. We started from the complete set of character definitions of Janda et al. . Pursuing our first aim, we applied 67 characters and their states exactly as described by , adapted – because analysis of our material revealed the necessity to do so – the definitions of character states, partly also concerning the number of character states, for another 16 characters, out of which 7 characters also were adapted in the character definitions themselves, and added 20 entirely new characters. Pursuing our second aim, we excluded from the set presented by  the three behavioural/ecological characters (among others the occurrence of social parasitism), and on the basis of information from [1, 2, 13, 54, 55, 63, 64] we further excluded 37 morphological characters, including four of the new ones. The excluded characters were characters which we suspected to be functionally coupled with temporary social parasitism, either (i) because of direct functional reasoning, or (ii) because they are known to be correlated with social parasitism in other ant genera which contain both parasitic and non-parasitic species. The characters excluded under (i) concerned the mandible, which is needed by parasitic Lasius queens to dismember host workers and strangle the host queen; the maxillary palp and the scape, which may be under selective pressure to be short and robust so as to escape damage by aggressive hosts in the initial stage of colony take-over; and the mesosoma size, because parasitic, i.e., dependently founding, queens need less tissue storage than independently founding queens. The characters excluded under (ii) concerned the length of hairs, which can be either extremely long or extremely short in parasites; the pubescence, which is absent in some parasites; the size of the head and the overall body, which both are frequently small in parasitic queens; and the shape of the petiole, which is frequently aberrant in parasites. We finally also excluded three characters for which we observed no variation in our material (their general lack of variation for the species analysed confirmed by a personal communication by B. Seifert, of April 2008). For details of how we composed our morphological data set including information on the provenance of characters and whether we excluded them, see Additional File 1. Overall, we included 64 morphological characters in our reconstructions; 35 of these concern adult workers, 18 adult queens, 23 adult males, and 2 worker larvae; 47 are binary and 17 are multi-state. To explore the effect of excluding characters, as we had done, on phylogenetic reconstructions, we also subjected the complete data set to reconstructions (but excluding invariant characters); this data set included 99 characters. We assessed any morphological data by analysis of voucher material housed in the National Science Museum, Tokyo, except the three characters concerning larval morphology. A total of 155 specimens were analysed. All multistate characters were treated as unordered.
To select the best-fitting nucleotide substitution models for cox1 and 16S rRNA we used the hierarchical likelihood ratio test (hLRT) and the Akaike Information Criterion (AIC) implemented in the program MrModeltest 2.2 . When sequence partitions had been recoded into purines and pyrimidines, models were adjusted to account for the two state character of the data. AIC and hLRT in one instance differed in the selection of models for the single DNA sequence data partitions (Table 2). AIC selected the more parameter rich model and we opted for this solution as erring on the side of overparameterisation is preferable over the opposite [93, 94].
For the morphological data the Markov k (Mk) model  was applied both with (+Γ) and without gamma-distributed rates of character change in separate MCMC runs. We used Bayes factors for model selection as they are established to provide good orientation tools in this  and calculated them as follows, using the outcomes of the single MCMC runs: 2LnB10 = 2 × (Harmonic Mean Ln likelihood for Mk – Harmonic Mean Ln likelihood for Mk+Γ). In the interpretation of the yielded absolute value of 2.5 in favour of the Mk+Γ model we followed published recommendations , on page 777, and consequently used the Mk+Γ model for all reconstructions.
Bayesian analysis using MCMC was performed with MrBayes 3.1.2  on the individual data sets (cox1, 16S rRNA, morphology) and the combined, concatenated data set (cox1 plus 16S rRNA plus morphology). We also analysed the combined, concatenated data set including those morphological characters suspected to be coupled with social parasitism. In addition, we analysed the concatenated molecular data (cox1 plus 16S rRNA) without any morphological data. Data partitions were established to allow model parameters to be separately estimated for all partitions and additionally for the single codon positions of cox1. 10,000,000 generations with a sample frequency set to 100 were run. As after 9,000,000 generations stationarity was achieved with average standard deviation of split frequencies in all cases constantly below 0.002 except the reconstruction with the W55 constraint for which it was 0.063 (Table 3), we always used the last 10,000 trees of each run to compute a majority rule consensus tree assigning posterior probabilities of tree topology. We also confirmed that true convergence had been reached and that the MCMC was sampling from the posterior distribution by repeating all runs three times and checking for congruence across the runs. All runs were performed using parallel versions of MrBayes, implemented on a SGI Origin 3800 under IRIX version 6.21m, of HPC, James Cook University. All MCMC runs achieved stationarity and detailed statistics on the runs are presented in Table 3. In the interpretation of the MCMC trees we followed previous authors [94, 98, 99] to regard only nodes with node support of p > 0.95 as significantly supported in Bayesian analysis. We also applied this cutoff when comparing the MCMC trees based on the individual data sets and that based on the concatenated data.
We also performed MP analysis of the combined, concatenated data, as well as of the combined concatenated data adding those morphological characters suspected to be coupled with social parasitism. All MP analyses were unweighted and performed with PAUP* 4.0b10  using the heuristic search algorithm with tree bisection reconnection branch swapping and 10 random stepwise additions. All characters were treated as unordered and polymorphic states were taken into account. Node support was calculated by 1,000 bootstrap replicates. In the interpretation of the MP trees, we applied the widely accepted node support threshold of > 70 .
We deposited the aligned, concatenated data matrix with TreeBase (Study accession number S2136).
Comparison with previous Lasius phylogenies
To compare the subgenus relationships of W55, H98 and J04 directly with the new, Bayesian framework, we enforced the various topologies as constraints (Fig. 1) on our concatenated data. We then performed additional MCMC runs of our concatenated data under these constraints and compared the outcomes of the single MCMC runs using Bayes factors as given under morphological analyses. In extracting the topologies from the literature we proceeded as follows. For W55 we used the tree of "Fig. 2 " of  slightly modified. We adopted the position of Acanthomyops, explicitly treated as ingroup of Lasius in the tree, although Wilson did not formally treat Acanthomyop s as a subgenus of Lasius in the taxonomic revision itself. We allocated Lasius sitkaensis, now treated as a junior synonym of L. pallitarsis, to Lasius sensu stricto. A similar situation pertains to Austrolasius : the subgenus had not yet been established in 1955, but the then only known species which today is treated under Austrolasius, L. carniolicus, was allocated to Chthonolasius and we accounted for this in our treatment of W55. For H98 we applied the node support threshold of > 70  to the Maximum Parsimony reconstruction presented in "Fig. 1 " of , and we applied the same threshold for J04, to the Maximum Parsimony reconstruction presented in "Fig. 6" of .
Posterior mapping analysis
To estimate the probabilities of the possible ancestral states at each well supported node of the concatenated Bayesian topology we chose the Bayesian approach of posterior mapping [61, 62], using the program SIMMAP 1.0  freely available online http://www.SIMMAP.com. In contrast to parsimony approaches to character mapping this is a probabilistic approach, which (i) does not assume that only a single change has occurred along any branch, and (ii) is not prone to underestimation of the variance in ancestral state assignments . Further, the SIMMAP approach allows uncertainty in phylogenetic reconstruction. A single stochastic mapping was done per tree using the last 2,000 post-burnin trees of the 20,000 trees used to derive the consensus tree of Fig. 2 and the ancestral states were inferred for the consensus tree from the MrBayes analysis.