To test whether there is a phylogenetic association between anther smut fungi and their Caryophyllaceae hosts, we used data from 21 host species from Europe and North America in which we have observed Microbotryum infections in natural populations. All host species analyzed in this study belong to the subfamily Caryophylloideae, except one Stellaria species, which belongs to the subfamily Alsinoideae. To root host phylogenetic trees, the Stellaria sp. was used as an outgroup based on a previous molecular phylogenetic study. To root anther smut phylogenetic trees, the strains from the North American hosts S. caroliniana and S. virginica were used because they were previously shown to branch at the base of all the species analyzed in the present study. Because the plant species are reasonably well established, a single plant sample was collected per host species. Some North American Silene species are not monophyletic, but none of these were included in our dataset. Several fungal samples per host species were collected whenever possible because the parasite taxonomy is currently being resolved (see above). The origins of anther smut and host samples are given in Additional files 4 and 5, respectively.
Infected plants were detected by the violet, sporulating anthers of their open flowers. Flower buds from these same plants were collected and stored in individual paper or glassine envelopes on silica gel.
Host plant DNA extraction and PCR amplification of plant nuclear ITS and cpDNA (the intron within the trn L gene, hereafter trn L, and the intergenic region between the genes trn L and trn F, hereafter trn LF) were performed as described previously. DNA from the fungal strains was extracted from the cultures using a Chelex protocol. The fungal β-tubulin (β-tub), γ-tubulin (γ-tub) and Elongation factor 1 α (Ef1α) genes were amplified by PCR according to . PCR fragments were purified and sequenced as described previously [29, 52].
The programs Navigator PPC (Applied Biosystems) and BioEdit 6.0.7 were used to check sequence electropherograms. Multiple alignments based on consensus sequences were carried out using BioEdit. Alignments were then checked, and apparent alignment errors were corrected by hand. Regions of ambiguous alignment and gaps were excluded from all analyses.
The sequences generated are available in GenBank (accession numbers: plant ITS, EF407925–EF407945; plant trn L, EF407883–EF407903; plant trn LF, EF407904–EF407924; fungal β-tub, DQ992076–DQ992113 and EF419304; fungal γ-tub, DQ992114–DQ992147; fungal Ef1α, DQ992148–DQ992177 and EF419301–EF419303). The smut sequences generated previously were also used and are indicated in Additional file 4.
Phylogenetic trees were reconstructed by Bayesian inference, maximum parsimony (MP) and Neighbor-Joining (NJ). MP and NJ analyses were performed using PAUP version 4.0b10. The following options were employed for MP analyses in PAUP: heuristic search, characters unordered with equal weight, starting tree obtained via the stepwise addition option and constructed with random sequence addition (10 replicates), and branch swapping by TBR (tree bisection reconnection). A single MP tree was recovered for all datasets. NJ analyses were performed using the molecular evolution models selected by AIC in ModelTest 3.7. The models retained were: TIM+G, gamma shape = 0.41, for the concatenated plant dataset; HKY+G, Ti/Tv = 2.2, gamma shape = 1.12, for the fungal β-tub gene; HKY+I, invariable sites = 0.52, for the fungal γ-tub gene; GTR+G, gamma shape = 0.24, for the fungal Ef1α gene; and GTR+G, gamma shape = 0.26, for the concatenated fungal dataset. Bootstrap confidence values were calculated from 1,000 pseudoreplicates. Bayesian analyses were run using MrBayes version 3.0b5. Each run consisted of 4 incrementally heated Markov chains run simultaneously, with the heating value set to the default (0.2). Priors were constrained according to the results obtained by running MrModeltest 2.2. Markov chains were initiated from a random tree and run for increasing numbers of generations until the average standard deviation remained below 0.01, i.e. 1,000,000 generations for Ef1α, 1,250,000 generations for γ-tub, 1,000,000 generations for β-tub, and 500,000 generations for the fungal and plant concatenated datasets. Trees were sampled every 50 generations and the first 25% of sampled trees were discarded. We used a 50% majority rule consensus tree to obtain the Bayesian posterior probabilities (Bpp). Details on the phylogenetic parameters used and output statistics are available upon request.
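As an illustration of how posterior probabilities follow from the sampled trees, the sketch below discards the first 25% of a tree sample and reports the clades present in more than 50% of the remaining trees. It is a minimal sketch, not the MrBayes procedure itself: the representation of a tree as a set of clades, the function name `clade_posteriors` and the toy taxa are assumptions made for the example.

```python
from collections import Counter

def clade_posteriors(sampled_trees, discard_fraction=0.25, threshold=0.5):
    """Return {clade: frequency} for clades found in more than `threshold`
    of the trees kept after discarding the first `discard_fraction`.

    Each tree is represented (hypothetically) as a set of clades, and each
    clade as a frozenset of taxon names.
    """
    n_discard = int(len(sampled_trees) * discard_fraction)
    kept = sampled_trees[n_discard:]
    counts = Counter(clade for tree in kept for clade in tree)
    return {
        clade: n / len(kept)
        for clade, n in counts.items()
        if n / len(kept) > threshold
    }

# Toy sample of 8 trees on taxa A-D; the first 2 (25%) are discarded.
AB, AC, CD = frozenset("AB"), frozenset("AC"), frozenset("CD")
toy_sample = [{CD}] * 2 + [{AB}] * 4 + [{AC}] * 2
posteriors = clade_posteriors(toy_sample)
for clade, freq in posteriors.items():
    print(sorted(clade), round(freq, 3))
```

With sampling every 50 generations, a run of 1,000,000 generations yields about 20,000 sampled trees, of which about 15,000 remain after the first 25% are discarded.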
Data matrices and resulting trees are available in TreeBase (submission ID number SN3239; Journal Peer Reviewer's PIN number: 30601). We considered nodes as strongly supported by a given method when they had values of Bayesian Posterior Probabilities/Maximum Parsimony Bootstraps/Neighbor-Joining Bootstraps at least equal to 0.9/70/70, respectively. Monophyly supported by at least two methods was considered as significant.
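The support rule above can be written as a small decision function. This is a minimal sketch under the thresholds stated in the text (0.9/70/70, at least two methods); the names `THRESHOLDS`, `methods_supporting` and `is_significant`, and the dictionary layout, are assumptions made for the example.

```python
# Thresholds from the text: Bayesian posterior probability, MP bootstrap, NJ bootstrap.
THRESHOLDS = {"bpp": 0.9, "mp": 70, "nj": 70}

def methods_supporting(support):
    """List the methods whose support value reaches its threshold.

    `support` maps a method key ("bpp", "mp", "nj") to the value observed
    for one node; a missing key means the node was not recovered.
    """
    return [
        method
        for method, cutoff in THRESHOLDS.items()
        if support.get(method) is not None and support[method] >= cutoff
    ]

def is_significant(support, min_methods=2):
    """A node counts as significant when at least two methods support it strongly."""
    return len(methods_supporting(support)) >= min_methods

print(is_significant({"bpp": 0.95, "mp": 72, "nj": 55}))  # two methods reach their cutoff
print(is_significant({"bpp": 0.99, "mp": 60, "nj": 65}))  # only one method does
```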
Congruence between individual phylogenies within fungal or plant systems
Congruence between individual phylogenies was estimated by Approximately Unbiased (AU) tests as implemented in CONSEL, by comparing, for each gene, the likelihood of the MP topology obtained for this gene to the likelihoods of the enforced topologies (obtained with each of the other genes). Likelihoods were obtained in PAUP using the sequence evolution model selected by AIC in ModelTest v. 3.7. The incongruence length difference (ILD) test was not used because several studies have shown that it is a poor indicator of data set combinability.
In the absence of a significant difference, we further checked the congruence of each node by visual inspection. Nodes were considered congruent between two gene phylogenies when they were supported by significant statistical values from at least two of the three phylogenetic reconstruction methods in each of the two phylogenies. Nodes were considered incongruent between two gene phylogenies when significant statistical values from at least two of the three phylogenetic reconstruction methods supported conflicting nodes in the two gene phylogenies.
When we found no evidence for incongruence, the genes were concatenated to perform combined analyses as described above. Consistency between the resulting tree and individual gene trees was again checked by visual inspection.
Identification of fungal phylogenetic species
To detect phylogenetic species within M. violaceum, we used the criterion of phylogenetic congruence between different gene phylogenies. We thus considered a group of strains as an independent evolutionary lineage when 1) it was strongly supported as monophyletic by two of the three reconstruction methods in at least one gene phylogeny or in the concatenated phylogeny, and 2) this was not contradicted by the other gene phylogenies. Using three different reconstruction methods allows us to be conservative in our species delimitation rule and to avoid splitting the fungus into too many species based on an artefact of one particular method. Here again, we considered nodes as strongly supported by a given method when they had values of Bayesian Posterior Probabilities/Maximum Parsimony Bootstraps/Neighbor-Joining Bootstraps at least equal to 0.9/70/70, respectively.
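The two-part delimitation criterion can be sketched as follows. This is an illustrative reading of the rule, not the authors' script: the data layout, the function names, and the encoding of "contradicted" as a set of gene trees containing a strongly supported conflicting node are all assumptions.

```python
THRESHOLDS = {"bpp": 0.9, "mp": 70, "nj": 70}  # cutoffs stated in the text

def n_methods_supporting(support):
    """Count the methods whose support for the clade reaches its cutoff."""
    return sum(
        support.get(method) is not None and support[method] >= cutoff
        for method, cutoff in THRESHOLDS.items()
    )

def is_independent_lineage(support_by_tree, contradicted_in):
    """Apply the two conditions of the delimitation rule to one group of strains.

    support_by_tree: tree name (a gene or "concatenated") -> support values
                     for the monophyly of the group in that tree.
    contradicted_in: names of gene trees holding a strongly supported node
                     that conflicts with this monophyly (assumed encoding).
    """
    strongly_monophyletic = any(
        n_methods_supporting(values) >= 2 for values in support_by_tree.values()
    )
    return strongly_monophyletic and not contradicted_in

# Hypothetical support values for one group of strains.
group = {
    "beta-tub": {"bpp": 0.97, "mp": 81, "nj": 64},
    "gamma-tub": {"bpp": 0.62, "mp": 40, "nj": 51},
}
print(is_independent_lineage(group, contradicted_in=set()))
print(is_independent_lineage(group, contradicted_in={"Ef1a"}))
```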
Comparison of host and fungal trees
To compare the plant and fungal phylogenies, we used the data derived from the concatenated sequences. As we wanted to assess the impact of species delimitation, we retained successively: 1) one fungal strain per host species, and 2) one fungal strain per fungal species but linking it to all the hosts that this parasite species was found to infect.
For reconciliation analyses (TreeMap and TreeFitter) and Maximum Agreement Subtrees (Icong index), which do not accept polytomies, poorly supported phylogenetic relationships in our plant tree were resolved when possible according to previous studies (see symbol * on Figs. 3 and 4). For the remaining unresolved nodes, the alternative placements were considered equally possible. To reduce the number of plant and fungal tree comparisons, we used only two of all the possible topologies for each of the plants and fungi: one chosen a priori to maximize congruence with the other partner, and the other to minimize it (see symbol ¤ on Figs. 3 and 4). For the methods using genetic distances (Mantel tests) or patristic distances (ParaFit), the original datasets were retained, but we excluded S. acaulis because of the incongruence between individual gene phylogenies, which may have resulted from hybridization between two distant lineages.
The history of association between hosts and fungi was first investigated using reconciliation analysis as implemented in the programs TreeMap and TreeFitter. TreeMap 1 uses a model to find optimal reconstructions of the history of the association by maximizing cospeciation events. In situations where host shifts are likely to be common, as in our case, this methodology is less likely to find optimal solutions than when host shifts are rare. A later release of this program, TreeMap 2.0b, considers all potentially optimal solutions and offers a more appropriate means of dealing with host shifts. However, this program is currently limited in the size and complexity of the datasets it can handle, and we could not run the complete analysis with our dataset. Thus, in this study we used TreeMap 1 with the heuristic search option to reconcile plant and fungal trees. The program TreeFitter 1.0 uses a different algorithm and optimality criterion for reconciling the host and fungal phylogenies, allowing reconstructions involving many host shifts to be recovered. In both TreeMap and TreeFitter, one can test the null hypothesis that the two phylogenies are randomly related by comparing the scores of optimal reconstructions (i.e. the number of cospeciation events for TreeMap and the global cost for TreeFitter) with those obtained for random phylogenies through permutational procedures. We chose to randomize the parasite trees because in cophylogeny analyses the host tree is considered as given, and one wants to test whether the observed parasite tree is more congruent with the host tree than are random parasite trees. Tests were based on 3,000 permutations for TreeMap and 10,000 permutations for TreeFitter. TreeFitter allows different costs to be assigned to four types of events (i.e. cospeciation, duplication, extinction, and host shift). When using the option that searches for the costs with the highest likelihood, we found a very large range of possible costs.
We varied these costs to assess their effects on the test results and retained two sets of costs: one maximizing the likelihood of the cospeciation events inferred, and the other minimizing the total number of events.
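The logic of the randomization tests can be summarized with a generic permutation p-value. The sketch below assumes that the score of the observed reconstruction and the scores obtained with randomized parasite trees are already available; it does not reproduce TreeMap or TreeFitter, and the `+ 1` correction is one common convention, not necessarily what either program reports. The function name and toy scores are hypothetical.

```python
def permutation_p_value(observed, null_scores, larger_is_better=True):
    """Proportion of randomized scores at least as favourable as the observed one.

    larger_is_better=True  suits a score such as the number of cospeciation events;
    larger_is_better=False suits a score such as a global cost to be minimized.
    """
    if larger_is_better:
        as_extreme = sum(score >= observed for score in null_scores)
    else:
        as_extreme = sum(score <= observed for score in null_scores)
    # Count the observed arrangement as one of the possible permutations.
    return (as_extreme + 1) / (len(null_scores) + 1)

# Toy values only: cospeciation counts and global costs from randomized parasite trees.
p_cospeciation = permutation_p_value(10, [3, 4, 5, 10, 11])
p_cost = permutation_p_value(5, [6, 7, 8, 4], larger_is_better=False)
print(p_cospeciation, p_cost)
```

The direction of the comparison matters: more cospeciation events indicate more congruence, whereas a lower global cost does.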
As a third approach, we used ParaFit, which takes as input matrices of principal coordinates, derived either from patristic distances (summed branch lengths along a phylogenetic tree; in that case, S. acaulis was excluded) or from genetic distances, together with the matrix of presence/absence of host-parasite associations. Using patristic distances takes into account the most likely phylogenetic relationships in addition to genetic distances. We performed both analyses. The trace statistic is calculated by taking plant-fungus associations into account. The null hypothesis that the plant and fungal samples are randomly associated is tested by a permutational procedure. Patristic distances were calculated from the unresolved trees derived from the concatenated datasets using PATRISTIC [65, 66]. Phylogenetic distances were calculated using the software MEGA. Principal coordinates were then computed using the software DistPCoA.
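To make the notion of patristic distance concrete, the sketch below sums branch lengths along the path joining two leaves of a small rooted tree. It is only an illustration of the definition, not the PATRISTIC program; the tree encoding (each node mapped to its parent and branch length), the function names and the toy tree are assumptions.

```python
from itertools import combinations

def distances_to_ancestors(node, parent):
    """Map `node` and each of its ancestors to the summed branch length from `node`."""
    total, reached = 0.0, {node: 0.0}
    while node in parent:
        ancestor, branch_length = parent[node]
        total += branch_length
        reached[ancestor] = total
        node = ancestor
    return reached

def patristic_distance(leaf_a, leaf_b, parent):
    """Summed branch lengths along the path between two leaves."""
    from_a = distances_to_ancestors(leaf_a, parent)
    from_b = distances_to_ancestors(leaf_b, parent)
    shared = [node for node in from_a if node in from_b]
    # With non-negative branch lengths, the most recent common ancestor
    # gives the smallest sum among shared ancestors.
    return min(from_a[node] + from_b[node] for node in shared)

# Toy tree ((A:0.2, B:0.3)n1:0.1, C:0.5)root, encoded as node -> (parent, branch length).
toy_tree = {"A": ("n1", 0.2), "B": ("n1", 0.3), "n1": ("root", 0.1), "C": ("root", 0.5)}
for a, b in combinations("ABC", 2):
    print(a, b, round(patristic_distance(a, b, toy_tree), 3))
```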
The fourth method assessed whether the plant and fungal topologies were more similar than expected by chance using the Icong index. With this method, the topological congruence of two trees is assessed through their maximum agreement subtree (MAST). A MAST is the largest possible tree compatible with two given trees and is obtained by removing the minimum number of leaves (i.e. terminal branches) from both trees in order to obtain perfect congruence. Significant congruence is inferred when the congruence between the two trees is higher than that of random trees with the same number of leaves. As for the previous methods, we reduced the number of topologies tested by choosing two topologies for the plant tree and two for the fungal tree. As this method requires that the trees have the same number of leaves, hosts harbouring several fungal species were duplicated. When using a single strain per fungal species, we also had to select a single one of the multiple hosts. The choice was made so that the congruence with the fungal tree was either maximized or minimized. We assessed the level of congruence for the four resulting combinations. Finally, we correlated the genetic distances between host plants with the distances between the associated Microbotryum species. Genetic distances were computed as for ParaFit. As the data were not independent, we tested the matrix correlation using permutations (Mantel test in the software Genepop [70, 71]).
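The principle of the Mantel test can be sketched in a few lines: correlate the off-diagonal entries of the two distance matrices, then permute the rows and columns of one matrix jointly to build the null distribution. This is a minimal sketch, not the Genepop implementation; the use of a Pearson correlation, the one-sided test, the function names and the toy matrices are assumptions.

```python
import random
from math import sqrt

def upper_triangle(matrix, order):
    """Off-diagonal entries above the diagonal, after relabelling taxa by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mantel_test(host_dist, fungus_dist, n_perm=999, seed=0):
    """Return (observed correlation, one-sided permutation p-value)."""
    identity = list(range(len(host_dist)))
    host_values = upper_triangle(host_dist, identity)
    r_obs = pearson(host_values, upper_triangle(fungus_dist, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)  # permute rows and columns of the fungal matrix together
        if pearson(host_values, upper_triangle(fungus_dist, order)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)

# Toy symmetric distance matrices; row i of each matrix refers to the same association.
host = [[0, 1, 4, 5], [1, 0, 3, 6], [4, 3, 0, 2], [5, 6, 2, 0]]
fungus = [[2 * d for d in row] for row in host]
r, p = mantel_test(host, fungus)
print(round(r, 3), p)
```

Permuting whole rows and columns, rather than individual distances, preserves the dependence among the pairwise distances involving the same taxon, which is why an ordinary correlation test is not appropriate here.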
Pollinator and ecological data
Data on the common pollinators of the Caryophyllaceae species included in this study were collected from the literature [27, 72–77] and from personal observations and communications (J.A. Shykoff, A. Erhardt). Some individual species, e.g. Hadena bicruris, were reported as pollinators, but most descriptions were made at higher taxonomic levels (e.g. genus or family) or by common name (e.g. moths or butterflies). This raises the concern that plants pollinated by the same genus or family may not be visited by exactly the same species or the same individuals. For well-studied plant species, however, only a few different pollinators are reported, and these pollinator species are rather unspecialized, being found pollinating several plant genera or families [27, 72, 74]. For instance, Macroglossum stellatarum (Sphingidae), Hadena bicruris, and Autographa gamma (Noctuidae) were all found on at least three different host species. Moreover, other observations showed that individual pollinators land successively on three sympatric species: Dianthus carthusianorum, Silene vulgaris and Silene nutans. We therefore considered that sharing a clade of pollinators implied sharing at least one pollinator species, thus possibly allowing M. violaceum spore transmission.
To partition the plant species into ecological groups based on their most common habitats, we used the French botanical web site SOPHY, literature data, and personal observations and communications (J.A. Shykoff, A. Erhardt, C. Bock). Ecological similarity may lead to cross-species disease transmission even in the absence of shared insect pollinators because spores can also be disseminated by wind, rain and phytophagous insects.