Nuclear genome composition
Our GBSSI data indicate contributions from distinct lineages corresponding to the following present-day genera: Pseudoroegneria, Dasypyrum, Taeniatherum, Aegilops and Thinopyrum. The contribution of Aegilops and Thinopyrum is still uncertain because of only moderate support in the phylogenetic analyses. GISH clearly identified the donors of two subgenomes: Pseudoroegneria and Dasypyrum. However, GISH did not provide a clear picture of the contribution from Aegilops, Thinopyrum and Taeniatherum. Since the presence of five lineages (or even more if we consider multiple contributions from Dasypyrum) is not consistent with hexaploidy in Th. intermedium, its origin appears to be more complex than would be expected if it had originated through allohexaploidy alone. Therefore, to explain the diversity of gene copies amplified in the Th. intermedium samples studied here (i.e., the number of potential progenitors as well as the sequence diversity within clades in which Th. intermedium sequences appear), mechanisms other than allopolyploidy through recent hybridization and/or introgression must also be considered.
For example, polymorphism arising through ancient hybridization (many hybridizations must have occurred in the early Triticeae), followed by incomplete sorting of ancestral polymorphism, could lead to intra-specific variation in a diploid and, consequently, in a polyploid. The origin of North American tetraploid Elymus species, for instance, is blurred by an unexpected diversity of Pseudoroegneria-like GBSSI copies, likely caused by either ancient introgression or incomplete sorting of ancestral polymorphism. The general question is how much potential intra-individual polymorphism in nuclear genes (in diploids in particular) may have been overlooked. Only extensive sampling of Triticeae diploids would reveal how common this phenomenon is.
Gene duplication is another mechanism potentially responsible for excessive gene diversity. Thinopyrum intermedium possesses a large amount of cytogenetic polymorphism and structural modifications of chromosomes, and not all previously studied accessions have an identical genomic structure [20, 22–24]. Therefore, duplication of some loci after allohexaploid formation, followed by paralog diversification, cannot be ruled out. The corresponding orthologs and paralogs would form two clades that would be more or less similar to one another in a phylogenetic analysis. Since gene loss must also be taken into account, it cannot be ruled out that only paralogous sequences of an individual homoeolog (i.e., progenitor) exist within the Th. intermedium genome.
Furthermore, intra-individual variation in a marker may be the result of heterozygosity. Allelic variation is usually irrelevant for disentangling the origins of allopolyploid species. However, if allelic variation spans species boundaries, i.e., if some alleles of a species are more closely related to alleles of another species than they are to those of the same species, such variation might confuse the identification of the allopolyploid's progenitors.
Thinopyrum intermedium and Pseudoroegneria
The contribution from Pseudoroegneria to the accessions studied here is evidenced by chloroplast and GBSSI markers as well as in situ hybridization. Pseudoroegneria-like GBSSI variants were amplified in three out of four accessions (though the placement of sequence Thinopyrum intermedium-1d in the Pseudoroegneria clade is questionable due to only moderate support in the MP analysis); between one and three Pseudoroegneria-like sequences were retrieved from the three individuals (Table 2). Such a low proportion of amplified Pseudoroegneria-like copies is not consistent with the contribution of a whole Pseudoroegneria-derived genome. However, GISH clearly identified the presence of a whole chromosome set corresponding to Pseudoroegneria in all accessions studied. Because the Pseudoroegneria-like sequence variant was very rare in these three accessions, it may also be present in accession 2 but simply not have been recovered among the clones sequenced. To achieve a good representation of individual gene variants, we performed PCR in triplicate and mixed equimolar amounts of PCR products prior to cloning. Moreover, biased amplification due to fragment length differences can be excluded, as all fragments amplified with the F/K primers are of similar length. Thus, the reason for the underrepresentation of Pseudoroegneria-like gene variants remains unclear.
The presence of the Pseudoroegneria subgenome in Th. intermedium is concordant with the literature [6, 17–19]. Liu and Wang and Tang et al. identified in Th. intermedium two pairs of long chromosomes and one pair of short chromosomes, ascribing the long sets of chromosomes to Thinopyrum and the short set to Pseudoroegneria (St). Assadi and Runemark also suggested the presence of one genome of Th. intermedium homologous to Pseudoroegneria (St) based on chromosome pairing in interspecific hybrids.
Thinopyrum intermedium and Dasypyrum
Phylogenetic analyses clearly placed Th. intermedium sequences in a clade containing Dasypyrum (Figure 2), identifying Dasypyrum as one of the progenitors. Dasypyrum-like sequences were the most frequently retrieved sequence types overall and were amplified in all four individuals (Table 2). Consistently, GISH identified the presence of a Dasypyrum-like genome in all accessions studied (Figure 3). Remarkably, if we omit the unique sequences 2d and 3c, accessions Thinopyrum intermedium-1 and -4 harbour Dasypyrum-derived sequences different from those of accessions -2 and -3. The presence of three different Dasypyrum-like sequence types in the four accessions, coupled with their relatively high divergence, is intriguing. For example, sequence Thinopyrum intermedium-1b differs from sequence 2c by 16 substitutions and two indels of 8 and 4 bp (K2P distance 0.029) and from sequence 3b by 24 substitutions and two indels (K2P distance 0.044). For comparison, the difference between Thinopyrum intermedium-1b and the Pseudoroegneria-like sequence 4c is 32 substitutions and three indels (K2P distance 0.059). Such diversity of Dasypyrum-like sequences could have several explanations: 1) contribution from different sources close to Dasypyrum and maintenance of the divergent copies, 2) duplication and diversification of Dasypyrum-like sequences following the origin of the allopolyploid, giving rise to divergent paralogs, 3) allelic variation, and 4) a combination of 1–3.
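The K2P (Kimura two-parameter) distances quoted above separate transitions from transversions before correcting for multiple hits. The following is a minimal sketch of that calculation for one pair of aligned sequences. The function name `k2p_distance` and the choice to skip any site containing a gap or ambiguous base are assumptions made for illustration; the exact settings used for the values in the text are not stated here, so the sketch is not expected to reproduce them.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Assumption: sites where either sequence has a gap or a non-ACGT
    symbol are ignored (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitions
    q = transversions / compared  # proportion of transversions
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1 * math.sqrt(term2))
```

Because indel positions are skipped in this sketch, indels such as the 8-bp and 4-bp gaps mentioned above would not contribute to the distance; they are reported separately in the text for that reason.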
The first scenario is hard to explain, as three different lineages are one more than the number of currently recognized Dasypyrum haplomes. However, apart from the acknowledged existence of two allogamous Dasypyrum species, Dasypyrum villosum (diploid, haplome V) and D. breviaristatum (Lindb. f.) Frederiksen (diploid and autotetraploid, haplome Vb), the situation within the genus is yet to be untangled. Investigations of the genome relationships within Dasypyrum revealed substantial dissimilarity between the V and Vb genomes [65–68]. The V and Vb genomes are so dissimilar that Uslu et al. suggested a weaker relationship between the two Dasypyrum species than between D. villosum and either Thinopyrum bessarabicum or Secale cereale. Similarly, Yang et al. showed that the RAPD pattern of D. breviaristatum was closer to Thinopyrum intermedium than to D. villosum. Since no sequence from the Th. intermedium accessions studied here is closely related to present-day D. villosum in the phylogenetic tree (Figure 2), the possibility that D. breviaristatum, or an extinct or otherwise unsampled Dasypyrum (or a hybrid of these), is the ancestral species cannot be ruled out. Discovering potential intra-specific diversity within Dasypyrum could therefore help clarify whether there were multiple contributions from Dasypyrum.
Alternatively, some of the Dasypyrum-like sequences may represent divergent paralogs. Positive selection was detected along the branches leading to two Dasypyrum-like sequences (4b and 3c) (Additional file 3), and several non-synonymous substitutions were found within these sequences. It is not clear, however, whether the non-synonymous substitutions are related to any functional role. Therefore, if these sequences really represent divergent paralogs, it is not clear whether they underwent non-functionalization (silencing by degenerative mutations), neofunctionalization (non-synonymous substitutions providing a beneficial function) or subfunctionalization (partitioning of ancestral functions between duplicates).
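To make the distinction between synonymous and non-synonymous substitutions concrete, the sketch below classifies codon differences between two aligned, in-frame coding sequences using the standard genetic code. It is only a naive count for illustration and is not the branch-based selection test reported in Additional file 3. The function name `classify_codon_differences` is hypothetical, and the sketch assumes that introns have been removed and that the reading frame is known.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_differences(cds1, cds2):
    """Return (synonymous, non_synonymous) counts of differing codons.

    Assumptions: both sequences are aligned, in frame, and intron-free;
    codons containing gaps or ambiguous bases are skipped.
    """
    if len(cds1) != len(cds2) or len(cds1) % 3 != 0:
        raise ValueError("need two aligned in-frame sequences of equal length")

    synonymous = non_synonymous = 0
    for i in range(0, len(cds1), 3):
        codon1 = cds1[i:i + 3].upper()
        codon2 = cds2[i:i + 3].upper()
        if codon1 == codon2:
            continue
        if codon1 not in CODON_TABLE or codon2 not in CODON_TABLE:
            continue  # gap or ambiguity code inside the codon
        if CODON_TABLE[codon1] == CODON_TABLE[codon2]:
            synonymous += 1      # same amino acid encoded
        else:
            non_synonymous += 1  # amino acid replaced
    return synonymous, non_synonymous
```

A raw count like this says nothing about function by itself; proper dN/dS estimation also normalizes by the numbers of synonymous and non-synonymous sites and is usually done with dedicated phylogenetic software.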
A contribution from Dasypyrum to Th. intermedium was recently proposed by Kishii et al., who used multicolour GISH to show the presence of a whole subgenome derived from Dasypyrum. Similarly to our results, Kishii et al. observed an St centromeric signal on nine Dasypyrum-like chromosomes (see Figure 3a,b). Similar "translocations" were observed in another allohexaploid, Elymus repens, in which one pair of chromosomes of the Hordeum subgenome (H) carried a centromeric H/St translocation. Intriguingly, both centromeres belonged to Pseudoroegneria. Apparently, chromosomal rearrangements have occurred in both species.
Thinopyrum intermedium and Taeniatherum
The contribution from Taeniatherum to intermediate wheatgrass is a new finding; it has not been reported before. Interestingly, an obscure contribution from Taeniatherum has been detected using GBSSI sequences in introduced North American as well as native Central European accessions of the closely related allohexaploid Elymus repens [25, 26, 35]. It is noteworthy that, as reported previously, all the sequences of the Taeniatherum/E. repens clade (including Taeniatherum caput-medusae itself) were most probably non-functional pseudogenes, suggesting that the loss of function predated the origin of E. repens. Originally, Mason-Gamer interpreted the presence of the Taeniatherum-like GBSSI gene as a result of introgression, but the same author later put forward another explanation for its acquisition: she doubted the contribution of Taeniatherum per se and suggested that Taeniatherum itself might have acquired its GBSSI from another species. Our data on E. repens are consistent with either of these hypotheses, as we did not find any direct evidence for a recent contribution from Taeniatherum using GISH. If the GBSSI copy amplified in T. caput-medusae is also a pseudogene, the question arises as to what the functional GBSSI variant of Taeniatherum is. It is possible that the pseudogenic GBSSI variant amplifies preferentially not only in the hexaploids Th. intermedium and E. repens but also in diploid Taeniatherum. Hence, the functional variant may not yet have been retrieved. We tried to recover GBSSI from Taeniatherum using F/M primers, but the amplifications failed repeatedly, probably because of altered primer sites.
The situation in Th. intermedium seems to parallel that in E. repens. The fact that the Taeniatherum-like GBSSI copies amplified in Th. intermedium are identical to the pseudogenes amplified in E. repens casts doubt on the possible contribution of a whole subgenome from Taeniatherum. Instead, it is more likely that Th. intermedium acquired its Taeniatherum-like copies from another diploid progenitor, which therefore must have contained additional GBSSI copies. Since E. repens and Th. intermedium share a Pseudoroegneria-like progenitor, Pseudoroegneria is a good candidate in this case. The ease with which Th. intermedium crosses with E. repens under field conditions in Central Europe (hence the connection of E. repens with the present study; [44, 45, 71]) leads to another hypothesis, not incompatible with the former scenarios, according to which either species might have obtained the Taeniatherum-like GBSSI pseudogene from the other through introgression.
Thinopyrum intermedium and diploid Thinopyrum
Thinopyrum-like sequences were the most frequently retrieved sequence types in accessions Thinopyrum intermedium-1 and -3, and their absence in the other two accessions is surprising and hard to explain. Since Th. intermedium is a polymorphic species displaying structural chromosomal rearrangements and modifications, locus loss in accessions -2 and -4 is one possible explanation of this phenomenon. Genomes Ee and Eb of Thinopyrum elongatum and Th. bessarabicum, respectively, are further genomes whose involvement in hexaploid Th. intermedium has most often been discussed in the literature [6, 12, 14, 15, 17, 19]. The degree of homology between the Th. bessarabicum and Th. elongatum genomes has been debated [14, 72–74]. No consensus has yet been reached, and the treatment of the two genomes continues to vary among authors.
Thinopyrum intermedium and Aegilops
As the clade consisting of Th. intermedium sequences plus five Aegilops species is supported only moderately (Figure 2), the statement that Th. intermedium contains genetic material derived from Aegilops must be considered provisional. Remarkably, the Triticum/Aegilops clades in GBSSI-based phylogenies presented elsewhere [e.g., 25, 26, 75] do not form tight, strongly supported groups either, which is likely because neither Triticum, Aegilops nor Triticum + Aegilops is monophyletic. Early investigations [9–11] advanced the hypothesis that Th. intermedium has at least one genome homologous with one of the Triticum genomes. Since Triticum aestivum L. is an allohexaploid consisting of one Triticum genome and two different genomes derived from Aegilops, it is possible that the homologous genome was in fact one of the Aegilops genomes. As noted before, however, early works in which chromosome pairing data (at high ploidy levels in particular) were used as the exclusive evidence for, or a measure of, genomic relationships must be interpreted with a great deal of caution. To date, the presence of neither Triticum nor Aegilops within the genome of Th. intermedium has been reported on the basis of any more sophisticated approach.
While the identity of the Pseudoroegneria- and Dasypyrum-derived subgenomes seems relatively straightforward based on the combined GBSSI and GISH data, the identity of the third subgenome has not been satisfactorily resolved. GBSSI sequences suggest a contribution from Thinopyrum and Aegilops to the accessions studied, placing these two among the possible donors. Similarly, GISH with Th. elongatum, T. caput-medusae and A. tauschii probes produced overlapping signals on one chromosome set (Figure 3a-d). Possibly, the level of divergence among Th. elongatum, T. caput-medusae and A. tauschii is below the detection threshold of in situ hybridization in this case, making unambiguous identification of the subgenome impossible. If we set aside the contribution from Taeniatherum (discussed above), the most parsimonious explanation for the third subgenome is a hybridogenous origin. Possibly, the progenitor was an ancient hybrid or introgressant between species close to Aegilops and Thinopyrum. Such an ancient origin of Th. intermedium (or at least of some of its subgenomes) could also explain why some of the GBSSI copies did not group tightly with their presumed progenitors in the phylogenetic analyses. This may indicate that some of the ancestors no longer exist or that the allopolyploidization happened so long ago that the genes within Th. intermedium have already diverged.