Multigenic phylogeny and analysis of tree incongruences in Triticeae (Poaceae)

Background Introgressive events (e.g., hybridization, gene flow, horizontal gene transfer) and incomplete lineage sorting of ancestral polymorphisms are a challenge for phylogenetic analyses since different genes may exhibit conflicting genealogical histories. Grasses of the Triticeae tribe provide a particularly striking example of incongruence among gene trees. Previous phylogenies, mostly inferred with one gene, are in conflict for several taxon positions. Therefore, obtaining a resolved picture of relationships among genera and species of this tribe has been a challenging task. Here, we obtain the most comprehensive molecular dataset to date in Triticeae, including one chloroplastic and 26 nuclear genes. We aim to test whether it is possible to infer phylogenetic relationships in the face of (potentially) large-scale introgressive events and/or incomplete lineage sorting; to identify parts of the evolutionary history that have not evolved in a tree-like manner; and to decipher the biological causes of gene-tree conflicts in this tribe. Results We obtain resolved phylogenetic hypotheses using the supermatrix and Bayesian Concordance Factors (BCF) approaches despite numerous incongruences among gene trees. These phylogenies suggest the existence of 4-5 major clades within Triticeae, with Psathyrostachys and Hordeum being the deepest genera. In addition, we construct a multigenic network that highlights parts of the Triticeae history that have not evolved in a tree-like manner. Dasypyrum, Heteranthelium and genera of clade V, grouping Secale, Taeniatherum, Triticum and Aegilops, have evolved in a reticulated manner. Their relationships are thus better represented by the multigenic network than by the supermatrix or BCF trees. Noteworthy, we demonstrate that gene-tree incongruences increase with genetic distance and are greater in telomeric than centromeric genes. Together, our results suggest that recombination is the main factor decoupling gene trees from multigenic trees. Conclusions Our study is the first to propose a comprehensive, multigenic phylogeny of Triticeae. It clarifies several aspects of the relationships among genera and species of this tribe, and pinpoints biological groups with likely reticulate evolution. Importantly, this study extends previous results obtained in Drosophila by demonstrating that recombination can exacerbate gene-tree conflicts in phylogenetic reconstructions.


Background
When reconstructing the phylogeny of a biological group it is implicitly assumed that species split in a tree-like manner and that all characters (e.g., all genes in the genome) reveal the same genealogical history that has occurred in each lineage after the split from a common ancestor. When these two assumptions are met phylogenetic trees inferred from one or a few genes can be used as proxies of the species tree. However, recent studies have shown that trees inferred from different genes may conflict with each other and that violation of these assumptions is more common than previously thought [1][2][3][4][5][6][7][8][9][10].
Incongruence may appear among gene trees for various reasons. If the genes used to infer the phylogenetic relationships among genera and species are sampled from introgressed portions of the genome produced by hybridization, gene flow or horizontal gene transfer, the trees obtained likely reflect the history of the introgression * Correspondence: jsescobar2002@yahoo.fr 1 Institut National de la Recherche Agronomique, Centre de Montpellier, UMR Diversité et Adaptation des Plantes Cultivées, Domaine de Melgueil, 34130 Mauguio, France Full list of author information is available at the end of the article rather than the history of lineage splitting [11,12]. The genealogical histories of individual genes may also be misleading due to retention and stochastic sorting of ancestral polymorphisms caused by incomplete lineage sorting. This is especially likely when the effective population size of a given lineage is large with respect to the time elapsed since divergence [13][14][15]. In this case, genetic drift is unlikely to have brought alleles to fixation before subsequent divergence [1,6]. Finally, gene duplication followed by gene loss may lead to incongruence because paralogous gene copies are incorrectly inferred to be orthologous [16].
Whatever their origin, incongruences among gene trees require careful attention for several reasons. First, they affect the interpretation of morphological and molecular patterns of evolution. Second, they maintain extensive instability in taxonomy. Third, they complicate the choice of wild taxa as sources of novel genes in breeding programs (e.g., genes conferring resistance to pathogens, tolerance to salt, low temperatures and drought). Finally, uncertainty in phylogenetic relationships may lead to inadequate conservation decisions (e. g., the protection of particular species or habitats).
In prokaryotes, some authors argue that numerous hybridizations and gene transfers preclude the possibility and the meaning of a tree-like representation of a species history [17,18]. In plants too it has been argued that, in some cases, reticulate evolution is more appropriate than a tree-like description [19]. On the contrary, other authors argue that despite incongruences it is possible to reconstruct phylogenies and tree-like histories [20,21]. Among angiosperms, Triticeae grasses provide a particularly striking example of incongruence among gene trees, suggesting reticulate evolution [22]. This tribe comprises species of major economic importance, including wheat, barley and rye. In recent years, attempts to try and sort out the phylogenetic details of the group, based on analyses of single-copy nuclear genes [23][24][25][26], highly repetitive nuclear DNA [27], internal transcribed spacers [28], and chloroplastic genes [29][30][31], failed to lead to any consensual definition of clades. Current evidence suggests that different portions of the nuclear and chloroplastic genomes have different genealogical histories. Because published trees are in conflict for almost all taxon positions we do not know whether the historical relationships among the genera and species of this tribe can be resolved, or not, in a tree-like manner and, if so, what are the real phylogenetic relationships. In this paper, we use the most comprehensive molecular dataset to date in Triticeae, including 27 gene fragments, with the aim to (i) reconstruct a multigenic phylogeny of this tribe, (ii) quantify tree incongruences, and (iii) explore possible factors affecting incongruence, including the frequency of recombination, chromosomal location and evolutionary rate.

Species Studied and Loci Sampled
Nineteen diploid species, spanning 13 genera of Triticeae, were analyzed. These species were selected because they belong to most phylogenetic clades recognized so far [22,26,29] and represent most of the diversity of diploid genera (68% according to [22] and [32]), life styles (annual and perennial), mating systems (self-compatible and self-incompatible), and geographic origin (Europe, Middle East, Asia, North America and Australia). One or two accessions per species were obtained from the United States Department of Agriculture (USDA), National Plant Germplasm System (available at http://www.ars-grin.gov/npgs/index.html), making a total of 32 accessions (Table 1). Although Bromus is supposed to be the closest outgroup of Triticeae [26,33,34], to simplify primer design we preferred to use Brachypodium distachyon, a more distant species for which the complete genome is available [35]. As the ingroup topology may depend upon the choice of a single outgroup, Zea mays and Oryza sativa were also incorporated as additional, more distant outgroups. The choice of distant outgroups may increase the number of homoplasies. However, owing to the selective constraints likely acting on the coding sequences we used (see below), it is likely they have been affected by low substitutional saturation and hence by a low homoplasy level.
Orthologous coding sequences (cDNA) of one gene fragment from the chloroplast (MATK) and 26 nuclear gene fragments located on three different chromosomes (out of the seven chromosomes representative of Triticeae) were sequenced for each accession (Table 2; GenBank: HM539308-HM540073). Sequences of B. distachyon were obtained from the US Department of Energy Joint Genome Institute http://www.jgi.doe.gov/. Sequences of Z. mays and O. sativa were obtained from the GenBank.

RNA Extraction, cDNA Synthesis, PCR Amplification and Sequencing
Total RNA was extracted from 100 mg of young leaves using the RNeasy Plant Mini Kit (Qiagen). cDNA was synthesized from this RNA using oligo-dT primers with the Reverse Transcription System kit (Promega), following the manufacturer's protocol. For each gene fragment a couple of primers were designed on conserved regions identified based on alignments of barley and wheat EST (Additional file 1, Table S1). PCR amplification was performed on cDNA and amplification products were purified with the AMPure kit (Agencourt). Sanger sequencing was performed on amplicons with the same primers used in the PCR amplification. The BigDye Terminator v3.1 Cycle Sequencing Kit (Applied BioSystems) was used with 1.0 μl of BidDye v3.1 enzymatic reaction mix. The reactions were purified with the CleanSEQ kit (Agencourt) and separated on a 3130×l Genetic Analyzer (Applied BioSystems).

Orthology Determination and Location of Loci on the Triticeae Genome
Because no genome of Triticeae species has been sequenced yet the orthology of the 27 sequenced gene fragments was established indirectly. Two observations strongly suggest the uniqueness of our sequences in the genome. First, all the sequenced loci are present in single copy in rice and B. distachyon, two species whose complete genomes are available. Second, according to EST data, they are expressed in one copy in diploid barley and in three copies in hexaploid wheat, which suggests the existence of a single copy per genome. The probability that our sequences coexist with paralogs is therefore limited.
From the 26 sequenced nuclear loci, 21 were derived from the rice chromosome 1, known to be collinear with the wheat chromosome 3B [36][37][38]. We assumed that chromosomal locations are mostly conserved across Triticeae [36][37][38][39][40] and used the location of rice orthologs as a proxy of their chromosomal position. Moreover, we checked that using B. distachyon as reference did not alter the estimated physical position. The relative distance to the centromere of each gene fragment was then computed assuming that the chromosome fraction separating them from the centromere was proportional in rice chromosome 1 and Triticeae chromosome 3 [36,38]. The centromere is located around 17 Mb from the telomere of the short arm in rice and 388 Mb in wheat [36,41].
Wheat has a strong recombination gradient [41,42], like other Triticeae species [39,43,44], which fits a positive exponential function from centromeres to telomeres [45]. The 21 loci located on chromosome 3 were thus suitable candidates to study the impact of recombination intensity in gene evolution. These loci were classified as centromeric (physical distance < 70% of chromosome arm), for which recombination is low, and telomeric (physical distance > 70% of chromosome arm), which concentrate most recombination events [45]. Due to the strong non-linear relationship between the physical and genetic map, the genetic distance along the chromosome was approximated according to reference [45]. Akhunov et al. [45] estimated the centimorgan per mega base (cM/Mb) ratio as a function of the percent of chromosome arm. To obtain the genetic distance in cM, we integrated the equation given in figure 1 in [45] and multiplied the result by the arm length: where L is the length of the chromosome arm (388 Mb for the short arm and 437 Mb for the long arm; [41]) and x is the relative distance to the centromere. To follow the evolution of recombination along the chromosome, positive (respectively negative) distances were assigned to the long (respectively short) chromosome arm.
In addition to the 21 nuclear loci located on chromosome 3, two loci corresponding to the hardness gene (PinA and PinB; [46]), one gene fragment corresponding to a eukaryotic initiation factor involved in translational regulation (eIFiso4E), and two gene fragments involved in the carotenoid biosynthetic pathway (CRTISO and PSY2; [36,47]), were sequenced. Positions of loci PinA and PinB were obtained from published data [46]. Positions of eIFiso4E and CRTISO were inferred from synteny with rice. The position of PSY2 is undetermined.

Individual Gene Trees
Raw sequence data were aligned with the Staden Package [48] and the resulting alignments were manually corrected. Alignments for individual loci were analyzed using maximum likelihood (ML) and Bayesian approaches. ML analyses were conducted using the best-fitting model of sequence evolution, based on Akaike Information Criterion (AIC) using ModelTest 3.7 [49] ( Table 3). PAUP* 4.0b10 [50] was used to obtain the highest-likelihood phylogenetic trees (heuristic search with neighbor-joining starting tree, tree bisection-reconnection branch swapping and 100 bootstrap replicates). Bayesian analyses were performed with MrBayes 3.1.2 [51,52] with the following priors: Dirichlet priors (1,1,1,1) for base frequencies and (1,1,1,1,1) for General Time Reversible (GTR) parameters scaled to the G-T rate, a uniform (0.05,50) and (0,1) priors for the gamma (Γ) shape and the proportion of invariable sites (I), and an exponential (10.0) prior for branch lengths. Metropolis-coupled Markov Chain Monte Carlo analyses (MCMCMC) were run with random starting trees and five simultaneous, sequentially heated independent chains. Analyses were run for 10,000,000 generations. We used the BPCOMP program implemented in PhyloBayes 2.3c [53] to determine appropriate convergence of the chains (i.e., the maximum difference [maxdiff] between posterior probabilities attached to the same clade as evidenced by independent chains is < 0.10). A burn-in was discarded after identifying the stationary phase.

Multigenic Trees and Network
We obtained a multigenic, supermatrix tree by concatenating alignments of all 27 loci (24,652 bp). ML analysis of this supermatrix was performed in the same way as individual locus alignments using a GTR+Γ+I model. Bayesian inference was performed by partitioning the concatenate alignment on the basis of individual loci using a GTR+Γ+I model. Chains were run for 10,000,000 generations. Multigenic, Bayesian Concordance Factors (BCF) were estimated using BUCKy 1.3.1 [54]. BCF estimates the degree of conflict of individual gene trees and accounts for all biological processes resulting in different phylogenies (e.g., introgression, incomplete lineage sorting). We summarized information from five MCMCMC chains obtained in individual locus analyses with MrBayes and removed 50% of the samples from each chain as burnin. Then, BCF were estimated using six a priori levels of discordance among loci (α = 0.1, 0.5, 1, 5, 10 and 100) and 1,000,000 generations in each run.
Supermatrix and BCF analyses provide powerful means of using the evidence from all characters in the final estimation of the phylogenetic tree [55]. However, they implicitly assume that species split in a tree-like manner, which would not be the case when hybridization and/or lineage sorting have played an important role in the history of a group, as seems to be the case in Triticeae. To identify regions of the phylogeny of Triticeae that have not evolved in a tree-like manner, we constructed a multigenic network summarizing information conveyed by individual gene trees. The 27 gene trees were modified using the PhySIC_IST preprocess of source trees [56]. This preprocess aims at reducing source tree conflicts by eliminating a topological resolution when it is significantly less frequent in source trees than an alternative conflicting resolution. We applied a correction threshold of 0.9 to only keep strongly supported incongruences. Then, a network displaying all clades present in at least one among the modified gene trees was computed using the Cass algorithm [57] implemented in Dendroscope 2 [58], inputted with the Z-closure of the modified trees [59].

Incongruence Quantification
The level of incongruence among individual gene trees and the two multigenic trees was first assessed by Shimodaira and Hasegawa tests [60]. The Shimodaira and Hasegawa test, based on sequence alignments, was used to compare majority-rule consensus tree topologies obtained with PAUP* for individual genes and the topologies of the supermatrix and BUCKy trees. Polytomies were randomly resolved by bipartitions using the mul-ti2di function implemented in the APE package [61] of R 2.9.1 [62]. This was done because polytomies were strongly penalized in the log-likelihood score. Indeed, if polytomies are left unresolved it is not possible to determine whether significance of Shimodaira and Hasegawa tests was due to the fact that an alternative topology was more (or less) likely than a given tested topology or simply because it was more (respectively less) resolved. In all cases, when polytomies were kept the supermatrix and BUCKy trees resulted in higher log-likelihoods, consistent with the fact they are fully resolved. Shimodaira and Hasegawa tests were run using a GTR+Γ+I model in the BASEML program implemented in PAML 4.1 [63].
In addition, we used the χ 2 test of the PhySIC_IST preprocess [56] to identify triplets of leaves observed in the multigenic trees that were strongly rejected by the 27 bootstrap gene-tree collections. A strong rejection was defined as follows: denoting R s the set of triplets of a multigenic tree (supermatrix or BUCKy), and R b the set of triplets of the 2,700 bootstrap gene trees (100 per locus), a triplet of R s was said to be strongly rejected if it contradicted at least one triplet of R b and failed the χ 2 test described in [56] with a threshold of 0.9. Using this procedure we counted the number of strongly rejected triplets a taxon belongs to.
To quantify the degree of incongruence between individual gene trees and the two multigenic trees, we defined a triplet-based distance between a given multigenic tree (T s ) and the forest (F j ) of 100 bootstrap trees obtained for locus j. To put it simply, the triplet distance represented the percentage of triplets that were resolved differently by a multigenic tree (supermatrix or BUCKy) and a given gene tree. In order to separate the signal of this locus from potential stochastic errors, we focused on triplets that appeared more than 50% of times in F j . This threshold has the advantage of keeping one and only one resolution per group of 3 species. Defining a threshold at 60% does not qualitatively alter our results (results not shown).
We denoted N eq(T s ,F j ) the number of retained triplets of F j that had the same resolution as T s , and N diff (T s ,F j ) the number of retained triplets of F j with a different resolution. We defined the distance between the tree T s and the forest F j , denoted d(T s , F j ), as the triplet fit dissimilarity (1 minus the triplet fit similarity [64]) between the triplet set of T s and the retained triplets of F j : Using similar procedures, we computed the triplet distance between all pairs of the 21 individual loci located on chromosome 3. We defined a triplet-based distance between each pair of forests F i and F j , where F i and F j were, respectively, the forests of 100 bootstrap trees obtained for loci i and j. As above, we focused on triplets that appeared more than 50% of times in each forest in order to eliminate potential stochastic errors. The distance d(F i , F j ) between F i and F j is defined as: In this way, we obtained a symmetric distance matrix (M) with 21 rows and 21 columns, where each entry M ij contained the triplet distance between loci i and j. This matrix was used in the analysis of gene-tree incongruence and recombination (see below).

Analyses of Patterns of Incongruence
In order to understand the origin of incongruences, we correlated triplet distances between individual loci and the multigenic trees (d(T s , F j ) in equation 2) to relevant phylogenetic parameters, including alignment length, average evolutionary rate (estimated with the super-distance matrix method [65]), and shape parameter α of the gamma distribution (obtained in ML analyses of individual loci). We also tested if incongruences were positively correlated with recombination by using the 21 loci located on chromosome 3. This correlation is expected whatever the origin of tree incongruence. Indeed, following interspecific hybridization, recombination is necessary for genes of one species to introgress into the genome of the other species. Alternatively, because the effective population size is expected to be smaller in low than in high recombining regions [66,67], coalescence is expected to be quicker and lineage sorting more complete when recombination is low. We thus tested if the triplet distance was lower in centromeric than in telomeric regions by fitting a quadratic regression of d(T s , F j ) on the genetic distance. We performed the same analyses on the aforementioned phylogenetic parameters because recombination could affect incongruences indirectly through these parameters (e.g., higher evolutionary rates in high recombining regions).
In addition, we tested whether the distribution of incongruences differed significantly between centromeric and telomeric loci located on chromosome 3. To this end, we estimated the triplet distance per pair of loci by distinguishing chromosome arms (short, long) and regions (centromere, telomere). Note that we did not mix loci located on different arms. Then, we obtained the difference in medians of the two distributions. To test whether this difference was statistically significant, we performed 10,000 replicates by permuting loci on each arm and recalculated the difference in medians at each permutation. The median difference observed with the actual dataset was compared with those observed in the permutated datasets.
Finally, closely linked loci more likely share a common genealogical history than distant loci [13]. To test this hypothesis we constructed a matrix of genetic distance between pairs of loci for the 21 genes located on chromosome 3. We correlated this matrix with the matrix of incongruences by pairs (M ij ) and tested the significance of the correlation by performing 10,000 permutations of locus locations on each chromosome arm (avoiding permutation from one arm to another).
Statistical analyses were performed with R 2.9.1 [62] and analysis of the distribution of incongruences in centromeric and telomeric loci was performed with Mathematica [68].

Multigenic Analyses: a more Resolved Picture
The supermatrix tree obtained with the concatenation of all loci (~25 Kb) provides a much more resolved picture than individual gene trees. ML and Bayesian analyses are consistent and produce very similar trees. According to these trees, we distinguished 5 to 7 clades depending on posterior probability or bootstrap supporting values (Figure 1). The first divergent group within Triticeae is Psathyrostachys (clade I), followed by Hordeum (clade IIA) and Pseudoroegneria (clade IIB). The internal branches are quite short compared with the terminal branches, suggesting that cladogenesis occurred in rapid succession. Two well-supported clades diverge at this point. The first is formed by Australopyrum (clade IIIA), Henrardia and Eremopyrum bonaepartis (clade IIIB), and Agropyrum and E. triticeum (clade IIIC). The second consists of Dasypyrum and Heteranthelium (clade IV), on the one hand, and Secale, Taeniatherum, Triticum and Aegilops (clade V), on the other hand.
BUCKy retrieves a unique topology irrespective of the different a priori levels of incongruence (α varying from 0.1 to 100), although mean sample-wide concordance factors (i.e., the proportion of the dataset that supports a bipartition) diminish with increasing α (Figure 2, Table 4). There is agreement between the estimated sample-wide and the extrapolated genome-wide concordance factors (i.e., the proportion of the whole genome that agrees with a given bipartition) and both concordance factors are higher in terminal than in deeper branches. The BCF tree is congruent with the supermatrix tree in several respects. First, Psathyrostachys (clade I) and then Hordeum (clade IIA) are the first divergent genera within Triticeae. Second, clade V is retrieved although branching within this clade changes relative to the supermatrix tree: Secale and Taeniatherum branch together, T. monococcum branches sister to Ae. tauschii, and Ae. speltoides and Ae. longissima group together. Third, monophyly of clade III is confirmed in this analysis although with alternative branching: Henrardia and E. bonaepartis (clade IIIB) are the first divergent taxa, and Australopyrum (clade IIIA) branches sister to Agropyron and E. triticeum (clade IIIC). However, a major discrepancy between this tree and the supermatrix tree is worth noting: Pseudoroegneria does not group with Hordeum but sister to Dasypyrum. Consequently, Heteranthelium branches at the base of clade V and these two new inferred clades (Pseudoroegneria-Dasypyrum and Heteranthelium-clade V) are closely related to each other. Despite differences among the supermatrix and BUCKy trees, the resolution and support gained with multigenic approaches compared with single-locus analyses are remarkable. Differences among trees are mainly due to uncertainty in the position of Pseudoroegneria.
The multigenic network displays most of the relationships present in the supermatrix and BUCKy trees (Figure 3). In addition, it points to the less resolved parts of the phylogeny that mainly correspond to nodes with low support. Psathyrostachys and Hordeum are the deepest genera of Triticeae. Their divergence occurs in a treelike manner as do relationships within clade III. Note that the topology of clade III is the same between the multigenic network and the BUCKy tree. Uncertainties for inter-clade relationships mainly involve Dasypyrum and Heteranthelium but only few alternatives are proposed by the network analysis. Finally, branching of most species in clade V is quite variable and this instability is better taken into account by the network than by any of the multigenic trees (supermatrix or BUCKy). Overall, the network analysis reveals a general tree-like divergence history of the Triticeae with local episodes of reticulate evolution.

Patterns of Incongruence among Trees
One of the most puzzling results obtained in this study is the numerous incongruences among individual gene trees. In most cases, Shimodaira and Hasegawa tests confirm that, regarding a given locus alignment, the corresponding gene tree has a significantly higher log-likelihood than that of other individual gene trees. However, in most cases, differences between individual gene trees and the two multigenic trees (supermatrix and BUCKy) are not statistically significant (Table 5). This suggests that the splitting histories of species lineages depicted by the supermatrix or BUCKy trees are reasonable compromises of individual gene tree scenarios.
In order to quantify topological incongruences, we estimated triplet distances among individual gene trees, as well as between each individual gene tree and the two multigenic trees, as described in Incongruence Quantification in the Methods section. The average triplet distance between individual gene trees (in absolute  value) is 0.53 ± 0.14 (mean ± SD; range: 0.10-0.88). The average triplet distance between individual gene trees and the supermatrix tree is 0.21 ± 0.10 (0.08-0.50) and the average triplet distance between individual gene trees and the BUCKy tree is 0.25 ± 0.09 (0.07-0.43). In addition, we counted, for each accession, the number of triplets that are observed in individual gene trees and strongly rejected by the two multigenic trees. Excepting the two Psathyrostachys accessions, all other taxa are involved in several strongly rejected triplets (Table 6). Pseudoroegneria is often the most incongruent genus (13% or 24% of all incongruences according to the BUCKy or supermatrix tree, respectively). Whereas Aegilops/Triticum species are not especially incongruent with the supermatrix tree (16%), they are highly incongruent with the BUCKy tree (35%). Conversely, Hordeum is involved in many incongruent triplets when the supermatrix tree is used (20%) but is only involved in as few as 1% of incongruent triplets when the BUCKy tree is considered. Results are similar when we remove the outgroups (Zea, Oryza and Brachypodium) and re-root the trees with Psathyrostachys (results not shown). This demonstrates that incongruences are not due to the difficult positioning of the root of the trees.

The Effect of Recombination on Incongruences
We performed several tests to understand the origin of incongruences among gene trees. First, we tested if variation in incongruence could be explained by the nature of the phylogenetic signal. After correlating the triplet   Table 4. Clades named as in Figure 1.
distance between individual genes and the two multigenic trees (supermatrix and BUCKy) with relevant phylogenetic parameters per locus, we only detect a significant positive correlation between the average evolutionary rate and triplet distance separating individual gene trees from the supermatrix tree (Spearman's rho = 0.42, P = 0.03), although not with the BUCKy tree (rho = 0.21, P = 0.28). Next, we investigated the effect of recombination. Recombination does not significantly affect any phylogenetic parameter (P > 0.5 in all correlations; results not shown). In contrast, it affects incongruences in three ways. First, incongruences are significantly greater in telomeric than in centromeric loci (respective medians = 0.511 and 0.429, P = 0.028; Figure 4A). Second, loci located close together on the chromosome tend to have more similar genealogical histories than distant loci ( Figure 4B; rho = 0.22, P = 0.026). Third, although the statistical significance is low (possibly because of the small number of genes we sampled), incongruences between individual gene trees and the two multigenic trees tend to increase with the genetic distance between pairs of loci, that is, with the likelihood of recombination (P = 0.09 for the supermatrix tree; P = 0.10 for the BUCKy tree; in the two cases we removed one potential outlier; Figure 5). Note that similar qualitative patterns are observed when using a more restrictive cut-off to keep incongruences (triplets that appeared more than 60% of times in the 100 bootstrap trees rather than 50% of times), although the statistical significance disappeared (results not shown). Again, this is possibly due to the limited number of genes we sampled.  Up to date, morphological and molecular analyses have failed to infer a reliable phylogeny of the Triticeae. Many previous phylogenetic reconstructions were based on a limited number of genes, in most cases only one (see references above). The many conflicts among published trees, combined with poor resolution of branching among genera and species, prevented a clear picture of the relationships among members of this tribe from emerging. Moreover, it was not possible to conclude whether reticulate evolution is the dominant rule in this tribe, so that reconstructing a resolved phylogeny is hopeless, or whether multigenic approaches could solve the problem. Thanks to the largest dataset used so far in this tribe (27 genes), we show that combining information from several loci located on different chromosomes and cellular compartments (nucleus and chloroplast) enable the identification of major clades. Although the branching position of some groups remains uncertain (e.g., Pseudoroegneria), the two multigenic trees and the multigenic network enabled us to resolve most parts of the Triticeae phylogeny. Some incongruence persists in our analyses. They are represented in the phylogeny by low support values  (bootstrap, posterior support and concordance factors) and are summarized within the multigenic network. Figures 1, 2 and 3 show that Psathyrostachys branches sister to the remaining Triticeae, followed by the sequential branching of Hordeum. Some previously published phylogenies recognized the early divergence of Psathyrostachys and Hordeum, including nuclear [23,27,33] and chloroplastic DNA analyses [29,30], although many other studies disagreed [24,26,28,32,33]. Here, Psathyrostachys is involved in no incongruence (Table 6) and the branch leading to the rest of the tribe is among the longest internal branches. This clearly indicates that this genus is the sister group of all other Triticeae.
Our dataset does not allow us to resolve the position of Pseudoroegneria. No study has raised the possibility that it branches out with Hordeum, as proposed by the supermatrix tree, and only one study suggested that Pseudoroegneria could group with Dasypyrum, as proposed by the BUCKy tree, although support for this relationship was low [69]. Other studies proposed that Pseudoroegneria could branch sister to Taeniatherum and/or Australopyrum [23,24,33], Heteranthelium [26] or Aegilops [32]. In other cases its branching pattern was unstable [22,29] and it was even considered paraphyletic [30,70]. Consistent with the difficult positioning of this genus, the supermatrix tree groups it with Hordeum with a rather weak bootstrap support (0.69), although maximal posterior probability (1.00), conflicting with the BUCKy tree. More strikingly, the three Pseudoroegneria accessions are, in general, involved in more incongruent triplets than other accessions (Table 6). This could be due to a strong capacity of introgression during divergence of this group and/or a large ancestral population size. In agreement with the first hypothesis, Pseudoroegneria and Hordeum hybridized and were at the origin of the species-rich allotetraploid genus Elymus [23,30]. If recent interspecific introgression were responsible for the incongruence pattern we observed it would likely proceed via polyploids because all genera investigated here are currently intersterile [71][72][73][74]. Polyploids could serve as bridges of genes between diploid species [22]. Alternatively, if rapid radiation followed by incomplete lineage sorting were the main factors contributing to incongruence, one would expect deep branches to be short and less supported than external branches. This is basically what we observe in individual gene trees and the supermatrix tree. The relatively strong support obtained in the supermatrix analysis is due to accumulation of phylogenetic signals when concatenating all genes. Hence, it could be that both factors, hybridization and incomplete lineage sorting of ancestral polymorphism, contributed to the pattern of incongruence and the unstable positioning of Pseudoroegneria.
Phylogenetic positions of all other genera and species varied in previous studies and no consensus emerged. In the present study we find strong support for clades III and V, although branching order varied depending on the multigenic tree. Overall, our results suggest rapid radiation following -or concomitant with-divergence of clade III (grouping Agropyron, Australopyrum, Eremopyrum and Henrardia).

Incongruence and Recombination
Our results provide strong evidence of incongruence among individual gene trees, unraveling a complex biological reality where different portions of the genome  exhibit different histories (their own evolutionary histories). We analyze the pattern of incongruence and pinpoint the role of recombination and gene location on it. We demonstrate that physically close loci are more likely to share a common history than distant loci. More interestingly, loci located in centromeric regions tend to be more congruent with one another than loci located in telomeric regions. A similar correlation was found in Drosophila at the kilobase scale, the scale of linkage disequilibrium in this group [13]. In contrast, the mosaics of conflicting genealogies observed in Oryza (rice) were randomly distributed across the genome [75]. It could be surprising that the correlation we observe holds at the scale of the whole chromosome 3 (~1 Gb; [38]). Several non-exclusive reasons could explain this pattern. First, the recombination gradient along chromosomes is very steep in all Triticeae, including wheat [42,43,45], rye [76] and Aegilops speltoides [44]. For instance, along chromosome 3B in bread wheat (Triticum aestivum), the cM/Mb ratio spans about two orders of magnitude (from 0.01 to 0.85; [41]). Accordingly, in bread and durum wheat (T. aestivum and T. durum, respectively), linkage disequilibrium decays slowly over several cM [77]. Despite the impressive chromosome size, linkage disequilibrium could be high in centromeric regions because recombination is strongly reduced (see [78] for a study on chromosome 3B). However, the level of linkage disequilibrium is low in barley [79]. This discrepancy highlights the need for studying linkage disequilibrium patterns in Triticeae in more detail. Second, centromeric genes may have a lower local effective size than telomeric genes because of hitchhiking effects due to the lack of recombination [66,67]. In agreement with this prediction, the levels of diversity positively correlate with the proxy of recombination in Aegilops: the RFLP polymorphism is 1.5 to 25 times higher in telomeric than in centromeric regions [80]. Consequently, ancient polymorphisms would be less completely sorted in genes located in high recombining than in low recombining regions. Finally, recombination could play an important role in introgressive events between species. For instance, genes located in high recombining regions would introgress easier than genes located in regions of low recombination.
There is no straightforward way to distinguish whether the overall pattern of incongruence in Triticeae is produced by incomplete lineage sorting or a form of introgression (e.g., gene flow proceeding via polyploids). Methods that enable introgression and incomplete lineage sorting to be distinguished [81] require the estimation of population sizes and divergence times for all branches of the species phylogeny. This information cannot be obtained in Triticeae without making strong assumptions. More knowledge about population  Figure 5 Effect of recombination on incongruences. Relationship between the triplet distance between individual gene trees and the two multigenic trees (supermatrix tree in A; BUCKy tree in B) as a function of the genetic distance between genes located on chromosome 3. The triplet distance between individual gene trees and the multigenic trees is the percentage of triplets of accessions that were resolved differently by a multigenic tree and a given gene tree. Solid line: best fit using all points; dashed line: best fit without a potential outlier (filled point). The genetic distance is connected to the chromosomal position according to the schematic diagram presented in C (red point: centromere; dark blue: centromeric regions; light blue: telomeric regions). D. Degree of incongruence among pairs of loci relative to the genetic distance on chromosome 3. Colors represent the degree of incongruence (white: no incongruence; red: strongest incongruence).
parameters and divergence times is necessary to distinguish between these two sources of tree incongruence in this tribe.

Conclusions
Our study contributes two important aspects for research in Triticeae in particular, and for the broad phylogenetic community in general. First, we show that in spite of strong tree conflicts not all clades of Triticeae are affected by introgression and/or incomplete lineage sorting. Notably, Psathyrostachys, Hordeum and genera in the clade III (including Agropyron, Australopyrum, Eremopyrum and Henrardia) diverge in a tree-like manner, a result that was not supported by previous studies. Because the evolution of Pseudoroegneria and genera in clades IV (Dasypyrum and Heteranthelium) and V (Secale, Taeniatherum, Triticum and Aegilops) is more reticulated than in other clades, the multigenic network better reflects their phylogenetic history than do the supermatrix or BUCKy trees. Second, we demonstrate that recombination could be an important evolutionary force in exacerbating the level of incongruence among gene trees. It would be worthwhile estimating the frequency of recombination of genes used in future phylogenetic studies in order to assess the generality of the pattern previously observed in Drosophila and now evidenced in Triticeae.

Additional material
Additional file 1: Primers used in the PCR amplification of each locus. Table S1 with the sequence of each primer used during PCR amplification.
Additional file 2: Phylogenetic tree inferred with LOC_Os01g01790 sequences. Figure S1 showing the phylogenetic tree inferred with locus LOC_Os01g01790.
Additional file 3: Phylogenetic tree inferred with LOC_Os01g09300sequences. Figure S2 showing the phylogenetic tree inferred with locus LOC_Os01g09300.
Additional file 4: Phylogenetic tree inferred with LOC_Os01g11070 sequences. Figure S3 showing the phylogenetic tree inferred with locus LOC_Os01g11070.
Additional file 5: Phylogenetic tree inferred with LOC_Os01g13200 sequences. Figure S4 showing the phylogenetic tree inferred with locus LOC_Os01g13200.
Additional file 6: Phylogenetic tree inferred with LOC_Os01g19470 sequences. Figure S5 showing the phylogenetic tree inferred with locus LOC_Os01g19470.
Additional file 7: Phylogenetic tree inferred with LOC_Os01g21160 sequences. Figure S6 showing the phylogenetic tree inferred with locus LOC_Os01g21160.
Additional file 8: Phylogenetic tree inferred with LOC_Os01g24680 sequences. Figure S7 showing the phylogenetic tree inferred with locus LOC_Os01g24680.
Additional file 9: Phylogenetic tree inferred with LOC_Os01g37560 sequences. Figure S8 showing the phylogenetic tree inferred with locus LOC_Os01g37560.
Additional file 10: Phylogenetic tree inferred with LOC_Os01g39310 sequences. Figure S9 showing the phylogenetic tree inferred with locus LOC_Os01g39310.
Additional file 11: Phylogenetic tree inferred with LOC_Os01g48720 sequences. Figure S10 showing the phylogenetic tree inferred with locus LOC_Os01g48720.
Additional file 12: Phylogenetic tree inferred with LOC_Os01g53720 sequences. Figure S11 showing the phylogenetic tree inferred with locus LOC_Os01g53720.
Additional file 13: Phylogenetic tree inferred with LOC_Os01g55530 sequences. Figure S12 showing the phylogenetic tree inferred with locus LOC_Os01g55530.
Additional file 14: Phylogenetic tree inferred with LOC_Os01g56630 sequences. Figure S13 showing the phylogenetic tree inferred with locus LOC_Os01g56630.
Additional file 15: Phylogenetic tree inferred with LOC_Os01g60230 sequences. Figure S14 showing the phylogenetic tree inferred with locus LOC_Os01g60230.
Additional file 16: Phylogenetic tree inferred with LOC_Os01g61720 sequences. Figure S15 showing the phylogenetic tree inferred with locus LOC_Os01g61720.
Additional file 17: Phylogenetic tree inferred with LOC_Os01g62900 sequences. Figure S16 showing the phylogenetic tree inferred with locus LOC_Os01g62900.
Additional file 18: Phylogenetic tree inferred with LOC_Os01g67220 sequences. Figure S17 showing the phylogenetic tree inferred with locus LOC_Os01g67220.
Additional file 19: Phylogenetic tree inferred with LOC_Os01g68770 sequences. Figure S18 showing the phylogenetic tree inferred with locus LOC_Os01g68770.
Additional file 20: Phylogenetic tree inferred with LOC_Os01g70670 sequences. Figure S19 showing the phylogenetic tree inferred with locus LOC_Os01g70670.
Additional file 21: Phylogenetic tree inferred with LOC_Os01g72220 sequences. Figure S20 showing the phylogenetic tree inferred with locus LOC_Os01g72220.
Additional file 22: Phylogenetic tree inferred with LOC_Os01g73790 sequences. Figure S21 showing the phylogenetic tree inferred with locus LOC_Os01g73790.
Additional file 23: Phylogenetic tree inferred with eIFiso4E sequences. Figure S22 showing the phylogenetic tree inferred with locus eIFiso4E.
Additional file 24: Phylogenetic tree inferred with CRTISO sequences. Figure S23 showing the phylogenetic tree inferred with locus CRTISO.
Additional file 25: Phylogenetic tree inferred with PinA sequences. Figure S24 showing the phylogenetic tree inferred with locus PinA.
Additional file 26: Phylogenetic tree inferred with PinB sequences. Figure S25 showing the phylogenetic tree inferred with locus PinB.
Additional file 27: Phylogenetic tree inferred with PSY2 sequences. Figure S26 showing the phylogenetic tree inferred with locus PSY2.
Additional file 28: Phylogenetic tree inferred with MATK sequences. Figure S27 showing the phylogenetic tree inferred with locus MATK.