Expansion of the gamma-gliadin gene family in Aegilops and Triticum

Background The gamma-gliadins are considered to be the oldest of the gliadin family of storage proteins in Aegilops/Triticum. However, the expansion of this multigene family has not been studied in an evolutionary perspective. Results We have cloned 59 gamma-gliadin genes from Aegilops and Triticum species (Aegilops caudata L., Aegilops comosa Sm. in Sibth. & Sm., Aegilops mutica Boiss., Aegilops speltoides Tausch, Aegilops tauschii Coss., Aegilops umbellulata Zhuk., Aegilops uniaristata Vis., and Triticum monococcum L.) representing eight different genomes: Am, B/S, C, D, M, N, T and U. Overall, 15% of the sequences contained internal stop codons resulting in pseudogenes, but this percentage was variable among genomes, up to over 50% in Ae. umbellulata. The most common length of the deduced protein, including the signal peptide, was 302 amino acids, but the length varied from 215 to 362 amino acids, both obtained from Ae. speltoides. Most genes encoded proteins with eight cysteines. However, all Aegilops species had genes that encoded a gamma-gliadin protein of 302 amino acids with an additional cysteine. These conserved nine-cysteine gamma-gliadins may perform a specific function, possibly as chain terminators in gluten network formation in protein bodies during endosperm development. A phylogenetic analysis of gamma-gliadins derived from Aegilops and Triticum species and the related genera Lophopyrum, Crithopsis, and Dasypyrum showed six groups of genes. Most Aegilops species contained gamma-gliadin genes from several of these groups, which also included sequences from the genera Lophopyrum, Crithopsis, and Dasypyrum. Hordein and secalin sequences formed separate groups. 
Conclusions We present a model for the evolution of the gamma-gliadins from which we deduce that the most recent common ancestor (MRCA) of Aegilops/Triticum-Dasypyrum-Lophopyrum-Crithopsis already had four groups of gamma-gliadin sequences, presumably the result of two rounds of duplication of the locus.


Background
Prolamin storage proteins are produced in large amounts in the developing endosperm of Triticeae species. These storage proteins are a complex mixture of alpha/beta-, gamma- and omega-gliadins and high- and low-molecular-weight glutenins, collectively called 'gluten' in wheat. They are encoded by medium to large multigene families. For example, the alpha-gliadins are encoded by a complex gene family with copy number estimates that range from 25-35 [1] to 100 [2] or even 150 copies [3] per haploid genome, most of which (72-95%) are pseudogenes [3,4]. Sequence similarity of alpha-gliadins from bread wheat to alpha-gliadins from diploid Aegilops/Triticum species, which are close relatives of the diploid ancestors of bread wheat, demonstrated that there are three distinct groups of alpha-gliadins, one for each of the three homoeologous loci in hexaploid bread wheat [4]. This is consistent with the notion that the expansion of this gene family took place after the ancestors of the different genomes of Aegilops/Triticum became separated.
The gamma-gliadins are considered to be the most ancient of the gliadins and LMW-glutenins [5]. In bread wheat they are encoded by the homoeologous Gli-1 loci (Gli-A1, Gli-B1 and Gli-D1), located on the short arms of the homoeologous chromosomes 1 [6,7]. In the variety Chinese Spring the number of gamma-gliadins was preliminarily estimated at 15-40 [8,9] and, in contrast to the situation in alpha-gliadins, only a small fraction (~14%) of the gamma-gliadin genes in hexaploid bread wheat consisted of pseudogenes [10]. Nevertheless, sequence analysis showed that the gamma-gliadins form a highly diverse gene family [9,10].
The large majority of the gamma-gliadin sequences available in Genbank are from tetraploid Triticum durum (A and B genomes) and hexaploid Triticum aestivum (A, B and D genomes), diploid Triticum monococcum (A genome) and diploid Aegilops species with S and D genomes (the B genome is closely related to the S genome of Aegilops speltoides [11,12]). Using such a collection of gamma-gliadin sequences Qi et al. [10] classified gamma-gliadins into 17 subgroups, most of which had 8 cysteine residues per protein, but 7, 8, and 10 residues also occurred. The cysteine residues form sulphur bridges, and proteins with an uneven number of cysteines can covalently bind to a network of HMW glutenins and other gluten proteins [13]. Of these 17 subgroups, those with A genome gamma-gliadins appeared to be distinct from the subgroups that contain B (S) and/or D genome genes. As only these three diploid progenitor genomes were included, the study did not provide insight into the evolutionary history of the gamma-gliadins. Wang et al. [14] recognised four groups of gamma-gliadins.
Although wheat storage proteins form multigene families, their phylogeny can be established effectively using knowledge on the phylogenetic and evolutionary relationships among Triticum and Aegilops genomes. Zhang et al. [15] and Li et al. [16] studied the HMW glutenin subunits, whereas Zhang et al. [17] and Wang et al. [18] focused on LMW glutenin subunits. From this it appears that, in case of multigene families, it may be necessary to infer relationships at the level of groups of closely related genes rather than for individual genes.
Here we have studied the evolution of gamma-gliadins. For this we have complemented the available gamma-gliadin sequences from diploid Aegilops/Triticum species with novel sequences from diploid species representing the other main genome types in Aegilops/Triticum: the C, M, N, U, and T genomes. Our analysis of these genes shows that there are six groups of gamma-gliadins that occur in different combinations across all the genomes. We present a model for gene duplications and losses that is consistent with our data. Our model indicates that at least some gene duplications predate the most recent common ancestor (MRCA) of all Aegilops/Triticum genomes.

Sequence analysis
The reads were merged per clone and the sequence data were manually checked using SeqMan (DNASTAR) to exclude sequencing mistakes. Sequences that were suspected to be chimeric, that lacked 5' or 3' ends, or that had a very long deletion (sequence length in the alignment less than 600 bp) were excluded from the phylogenetic analysis. Each PCR product was a mixture of sequences from different genes, so many of the 11-81 clones obtained from one PCR reaction were independent. However, some duplicate clones may be derived from the same gene, possibly even from the same amplification product with a particular PCR error. Therefore all remaining 335 sequences were conservatively organized into 59 contigs (sets of overlapping DNA sequences) with 99% similarity. The consensus sequences of the contigs thus obtained were used for further statistical and phylogenetic analysis. One to three sequences representing each consensus sequence were submitted to Genbank. In total 69 novel gamma-gliadin sequences were submitted, representing 59 contig consensus sequences. The length of the partial gamma-gliadin sequences obtained varied from 545 to 986 base pairs and corresponded to part of the full-length open reading frame of gamma-gliadins, which is 648-1089 bp in length. The full-length genes encode gamma-gliadins of 215-362 amino acids. These sequences are probably not the complete set of gamma-gliadin genes from each of the accessions, but the aim was to clone a sufficient number of genes from each accession to obtain representatives of all distinct groups of gamma-gliadins for a phylogenetic analysis, rather than a complete set of gamma-gliadin genes and pseudogenes from all accessions.
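The grouping of clones at a 99% similarity threshold can be illustrated with a minimal sketch. It is not the SeqMan assembly used here: it assumes the sequences are already aligned to equal length (gaps written as '-') and uses a simple greedy clustering in which each sequence joins the first cluster whose founding member it matches. The function names `identity` and `greedy_cluster` are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions between two aligned sequences,
    skipping columns where either sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def greedy_cluster(seqs, threshold=0.99):
    """Group sequence names so that every member is at least `threshold`
    identical to the first (founding) member of its cluster.

    `seqs` maps a name to an aligned sequence; insertion order decides
    which sequence founds a cluster (an assumption of this sketch).
    """
    clusters = []
    for name, seq in seqs.items():
        for members in clusters:
            if identity(seq, seqs[members[0]]) >= threshold:
                members.append(name)
                break
        else:
            # No existing cluster was close enough: start a new one.
            clusters.append([name])
    return clusters
```

With such a threshold, clones that differ only by an occasional PCR or sequencing error end up in the same group, while genuinely different genes are kept apart.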
For the phylogenetic analysis the genes cloned and sequenced here were supplemented with sequences of diploid Triticum and Aegilops species and of the related genera Lophopyrum, Crithopsis, and Dasypyrum from EMBL/Genbank (as present in August 2011). These were organized in the same way into contigs of 99% sequence similarity, giving a total of 145 sequences in 68 contigs (Table 1). All 127 contigs (59 composed of novel sequences and 68 of EMBL/Genbank-derived sequences) were trimmed to represent the same part of the gene. One gamma-hordein sequence (AY338365 from Hordeum chilense) and three secalins (EU368041 from Secale cereale, EF432546 from Secale sylvestre, and HQ266670 from Secale strictum) were included as outgroups, as the sequence alignment already indicated that they are more distant.
Both the nucleotide and the deduced amino acid sequences of the gamma-gliadin dataset were aligned using MEGA4 [22], and Maximum-Likelihood (ML) analysis was performed with PhyML 3.0 (http://www.phylogeny.fr [23,24]) using the GTR substitution model for nucleotide data and the WAG model for amino acid data. The SH-like approximate likelihood-ratio test was used to estimate branch support [25]. MEGA4 used the complete alignment, while the ML program at PhyML excluded all sites with deletions. When we used the pairwise deletion option for neighbour joining (NJ) in MEGA4 we obtained the same tree topology. The number of base differences per site, the number of synonymous differences per synonymous site and the number of non-synonymous differences per non-synonymous site, averaged over all sequence pairs within each group and over all sequences, were calculated using the method of Nei and Gojobori [26] with incorporation of the Jukes-Cantor correction in MEGA4. Standard error estimates were obtained by a bootstrap procedure (1000 replicates). Positions containing alignment gaps and missing data were eliminated only in pairwise sequence comparisons (pairwise deletion option). The ratio between synonymous substitutions per site (dS) and non-synonymous substitutions per site (dN), the dS/dN ratio, was calculated.
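As an illustration of the Jukes-Cantor correction applied to Nei-Gojobori proportions, the following sketch converts a proportion of differences p into a distance d = -(3/4) ln(1 - 4p/3). It assumes that the numbers of synonymous and non-synonymous sites and differences have already been counted (MEGA4 performed that step here); the function names and any input values are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor distance for a proportion of differing sites p.

    The correction is undefined for p >= 0.75 (saturation).
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def corrected_distances(syn_diffs, syn_sites, nonsyn_diffs, nonsyn_sites):
    """Return (dS, dN) from counts of differences and sites.

    pS = syn_diffs / syn_sites and pN = nonsyn_diffs / nonsyn_sites are
    the uncorrected proportions; each is corrected separately.
    """
    d_s = jukes_cantor(syn_diffs / syn_sites)
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    return d_s, d_n
```

The correction matters most for the larger distances: for small p the corrected distance is almost equal to p, whereas it grows faster than p as multiple substitutions at the same site become more likely.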
To study the selection pressure on gamma-gliadin sequences, the codon-based test for selection (Z-test) was performed for the sequences of each group and for the overall dataset. The variance was computed using bootstrapping (1000 replicates). To analyse differences in selection pressure between full open reading frame (ORF) and pseudogene gamma-gliadin sequences, the numbers of synonymous (Ks) and non-synonymous (Ka) substitutions per site were calculated from pairwise comparisons for ORF and pseudogene sequence pairs using the method of Nei and Gojobori [26]. The values obtained were used for a scatter plot in Excel.
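The logic of a codon-based Z-test for purifying selection can be sketched as follows. This is a simplified illustration, not the MEGA4 implementation: it assumes that bootstrap replicate estimates of dS and dN are available, treats their variances as independent, and uses a one-sided normal approximation for the alternative dS > dN. The function name `z_test_purifying` and the replicate values in any usage are hypothetical.

```python
import math
from statistics import variance


def z_test_purifying(d_s, d_n, boot_d_s, boot_d_n):
    """Return (Z, one-sided p) for the alternative hypothesis dS > dN.

    d_s, d_n   : point estimates of synonymous and non-synonymous distance
    boot_d_s/n : bootstrap replicate estimates used for the variances
    """
    # Standard error of the difference, assuming independent variances.
    se = math.sqrt(variance(boot_d_s) + variance(boot_d_n))
    z = (d_s - d_n) / se
    # Upper-tail probability of the standard normal distribution.
    p = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p
```

A large positive Z with a small p indicates an excess of synonymous over non-synonymous substitutions, which is the pattern expected under purifying selection.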

Gamma-gliadin sequences
In order to analyse genetic diversity and the evolution of the gamma-gliadin multigene family, 335 gamma-gliadin sequences were cloned and sequenced from species representing all main genome types in Aegilops/Triticum (A, B/S, C, D, M, N, U, and T genomes). The aim was to clone and sequence a sufficient number of genes from each accession to obtain representatives of all distinct groups. The sequences were assembled into contigs at 99% similarity at the nucleotide level (Additional file 1). The contigs with intact open reading frames represented 46 different predicted gamma-gliadin proteins (Table 1). Thirteen contigs (49 sequences) contained internal stop codons or frameshift mutations and were therefore considered to represent pseudogenes. The fraction of pseudogene sequences differed among the eight Aegilops/Triticum species analysed. For example, more than half of all sequences of Ae. umbellulata were pseudogenes (20 of 35 sequences in 5 of 10 contigs), while no pseudogene contigs were present among 32 sequences from Ae. tauschii (Table 1). Figure 1 presents a schematic overview of the structure of gamma-gliadins, after [9] and [27]. The sequences of the predicted intact proteins varied considerably in length due to variation in the length of the repetitive domain (II) and the length of the glutamine-rich domain (IV). Most of the sequence length variation was observed among Ae. speltoides sequences, and both the shortest and the longest sequences were isolated from Ae. speltoides.
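The criterion of an internal stop codon can be made concrete with a short sketch. It assumes a coding sequence that starts in frame at its first base and covers only in-frame stop codons; detecting frameshift mutations, which were also used here to identify pseudogenes, would require comparison with an alignment and is not attempted. The function name `has_internal_stop` and the toy sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def has_internal_stop(cds):
    """True if an in-frame stop codon occurs before the final codon.

    `cds` is assumed to start at codon position 1; trailing bases that
    do not fill a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    triplets = [cds[i:i + 3] for i in range(0, usable, 3)]
    # The last codon may be the legitimate terminal stop, so skip it.
    return any(codon in STOP_CODONS for codon in triplets[:-1])
```

Applied to the counts reported above, the 49 pseudogene sequences among 335 cloned sequences correspond to about 15% of the dataset.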

Clustering and phylogenetic analysis
An analysis of the sequences with a gamma-hordein as outgroup resulted in a multiple sequence alignment (Additional file 2 contains the nucleotide alignment, Additional file 3 contains the amino acid alignment, both in Nexus format). The maximum-likelihood (ML) tree produced on the basis of the alignment contained a separate cluster of secalins and two well-supported groups of gliadins of unequal size: 53 consensus sequences belonged to the first group and 74 belonged to the second group (Additional file 4 contains the tree based on nucleotide sequences, Figure 2 shows the tree based on deduced amino acid sequences). In total six significant (bootstrap support value 84% or higher) groups were observed, two within the first branch (designated groups 1 and 2) and four within the second branch (designated groups 3-6). The groups contain sequences cloned here as well as sequences obtained from Genbank, and Genbank sequences do not form additional groups, indicating that we have cloned and sequenced sufficiently deeply.
Sequences of Ae. umbellulata (U), Ae. comosa (M), Ae. mutica (T), Ae. tauschii (D), and all species with an S genome (Ae. speltoides (S), Ae. searsii (Ss), Ae. bicornis (Sb), Ae. sharonensis (Ssh) and Ae. longissima (Sl)) occurred in both branches and in at least two unrelated groups (Figure 3). Sequences originating from Triticum species with an A genome (T. monococcum (Am) and T. urartu (Au)) and from the Aegilops species Ae. caudata (C) and Ae. uniaristata (N) were restricted to the second branch. Within this second branch, all gamma-gliadin sequences from T. monococcum (Am) and T. urartu (Au) clustered in group 4. Group 3 consisted only of Ae. caudata (C) sequences, and it included all of them except one that was present in group 6. All groups except the Ae. caudata-specific group 3 included a mixture of sequences of three to seven species of Aegilops/Triticum. Each of the groups included terminal branches that are mainly species/genome-specific.
Figure 1 Schematic overview of the structure of gamma-gliadins. The proteins consist of a short N-terminal signal peptide (S) followed by a unique N-terminal domain (I) and a repetitive domain (II). Domain III contains most (often 6) of the cysteines. IV is rich in glutamine. Two conserved cysteines are in V. Eight cysteine residues (indicated with vertical lines) can form four intrachain disulfide bonds (indicated as connections between lines). Figure after [9].
The gliadin sequences of Dasypyrum, Lophopyrum and Crithopsis included in the analysis were also positioned within the two branches despite the fact that Triticum and Aegilops are much more closely related and treated as one large genus by some authors [28,29]. The sequences of Lophopyrum clustered in groups 2 and 6, sequences of Dasypyrum clustered in groups 1 and 4 (in group 4 only pseudogenes, visible in the nucleotide maximum likelihood (ML) tree in Additional file 4), and those from Crithopsis clustered in group 1. Only groups 3 and 5 contained exclusively sequences of Aegilops/Triticum species.

Genetic variation within and among the groups
The most polymorphic sequences were found in group 1. This group of sequences varied in length from 762 to 1089 bp, which means that it includes many of the shortest and all of the longest variants of the whole study. They were highly polymorphic with a codon-based evolutionary divergence (d) of 0.089 ± 0.005 (dS = 0.191, dN = 0.065) (Table 2, Additional file 5). Genes of this group are only maintained in the D and various S genomes and in the genera Lophopyrum, Crithopsis, and Dasypyrum. They occur as pseudogenes in the U and M genomes (Figure 3). It thus appears that group 1 has undergone intensive diversification and death processes in most of the species analysed.
The least polymorphic are the group 6 gamma-gliadins. They are present in seven Aegilops genome types (T, D, U, C, N, M and S (only Ae. speltoides)) and in Lophopyrum. The Aegilops sequences of this group all have the same deduced ORF length of 909 bp, coding for a protein of 302 amino acids. Interestingly, all Aegilops sequences of group 6 have an additional cysteine residue, whereas in Lophopyrum sequences of group 6 the additional cysteine is not present, and here the predicted length of the protein is not 302 amino acids either. The cysteine can easily be formed by a single nucleotide change (TCC to TGC).
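The relation between ORF length and protein length, and the single-nucleotide origin of the extra cysteine, can be checked with a small sketch. It assumes that the reported ORF lengths include the terminal stop codon, which is consistent with the figures given in this paper (909 bp and 302 amino acids; 648-1089 bp and 215-362 amino acids). The helper names are hypothetical, and only the two codons mentioned in the text are listed.

```python
# Two entries of the standard genetic code, enough for this example.
CODON_TO_AA = {"TCC": "Ser", "TGC": "Cys"}


def protein_length(orf_bp, includes_stop=True):
    """Number of amino acids encoded by an ORF of `orf_bp` base pairs."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - (1 if includes_stop else 0)


def nucleotide_differences(codon_a, codon_b):
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(codon_a, codon_b))
```

For example, `protein_length(909)` gives 302, and `nucleotide_differences("TCC", "TGC")` gives 1, so a serine codon can become a cysteine codon through one substitution at the second codon position.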
The Aegilops species that do not have group 6 gliadins are the S genome species except Ae. speltoides (Ss, Sb, Ssh, Sl genomes), all of which have group 5 gliadins (Figure 3). These gliadins, although distinct in sequence composition, have the same length of 302 amino acids as the group 6 gliadins and also have an additional cysteine in the same position (except FJ006687, which has a large deletion). As a consequence, each Aegilops species contains a group of 9-cysteine gliadins, either from group 6 or from group 5. The U and N genomes contain group 6 sequences and group 5 sequences but, in contrast to group 5 sequences from S-genome Aegilops species, the U and N sequences from group 5 all contain only eight cysteines and are variable in length.

Selection
The codon-based test for selection (Z-test) showed evidence for purifying selection in each of the six groups of sequences and also overall (Table 2). The ratio between synonymous and non-synonymous substitutions per site (dS/dN) for pairwise comparisons of sequences showed a relative excess of synonymous substitutions over non-synonymous substitutions in full open reading frame genes compared to genes with stop codons (pseudogenes) (see the trend line in Additional file 5). The difference in the ratios is comparable to that obtained for intact and pseudogene alpha-gliadins [4], but some of the values for dS as well as dN are higher, indicating that gamma-gliadins are an evolutionarily older family.

Discussion
The main genomes within the Aegilops/Triticum group (A, S/B, C, D, M, N, T, U) split within an evolutionarily short period, 2.5 to 4.5 MYA [30]. Multigene families have expanded in the same period as these genomes split. Here we obtained 59 new gamma-gliadin genes from eight genomes, and have analysed these data together with gene sequences in Genbank in the framework of gains and losses of groups of gamma-gliadin genes during the evolution of these species. This has produced new insight into how this multigene family has developed. Among the diversity of genes some groups show a remarkable stability of protein length and number of cysteines, suggesting functional relevance.

A model for the evolution of gamma-gliadins
Evolution of multigene families occurs by duplication of gene clusters [31,32]. Gao et al. [33] showed evidence for multiple rounds of segmental duplication of omega-gliadin genes in wheat. The evolution of the gamma-gliadins appears to fit the birth-and-death evolutionary model [34]. The sequence data obtained here allowed us to distinguish six groups of closely related gamma-gliadins (Figures 2 and 3, Additional file 4), which appear to be organised in two branches. These two ancestral branches predate the MRCA of the Aegilops/Triticum clade, as they also include sequences from the genera Lophopyrum, Crithopsis, and Dasypyrum. A hordein sequence from Hordeum and the secalins from Secale clustered outside the two main branches. A recent phylogenetic study of the Triticeae based on one chloroplast and 26 nuclear gene sequences [35] placed Secale closer to Aegilops and Triticum than Dasypyrum, but also noted that the clade grouping these genera had evolved in a reticulated manner, and that their relationships are better represented by a multigenic network. Based on a careful examination of the presence and absence of the six groups of gamma-gliadins we present a model for the evolution of this multigene family during the evolution of the Aegilops/Triticum group (Figure 4). Note that in this model the order of the groups along the chromosome is arbitrary, and that repetitive DNA and non-gamma-gliadin genes that are present between gamma-gliadins [33] have been omitted. While developing this model we have assumed that our set of sequences (both cloned here and obtained from Genbank) is sufficiently deep to not have missed particular groups. Evidence supporting this notion is that (i) our sequences, obtained using PCR primers designed by us, fall into the same six groups as those of other diploid taxa from Genbank; (ii) all groups except the Ae. caudata-specific group 3 included a mixture of sequences of three to seven species of Aegilops/Triticum; (iii) the number of genes from one genome was not correlated with the number of groups into which they clustered. All Ae. caudata genes but one ended up in group 3, although we had cloned 12 different genes. T. monococcum genes ended up only in the second branch, although we had as many as 19 different genes (Table 1). Finally, (iv) four of these groups were also recognised by other studies. One of the two groups missed by Wang et al. [14] was the Ae. caudata-specific group 3.

Gamma-gliadin duplication, pseudogenisation, and loss during Aegilops/Triticum genome evolution
The six groups of gamma-gliadins fall into two branches: one including group 1 and group 2 genes, and one including groups 3 to 6. In our evolutionary model the MRCA of the Aegilops/Triticum spp. already has four distinct groups of differentiated gamma-gliadin sequences, i.e., two from each branch (groups 1, 2, 4 and 6, Figure 4). Almost all extant Aegilops/Triticum genomes include several distinct groups of gamma-gliadins. The only exception is the A genome of Triticum, which contains only group 4 gliadins. Consequently, its position in the model is the least supported, as loss of the other groups may have occurred at several points in time. The T genome lost group 4 and group 1 gliadins. A major split is between the D and S genomes, which have lost the group 4 gliadins but maintained group 1 and group 2 gliadins, and the genomes that lost group 1 and group 2 gliadins (M, N, U, C genomes). It is likely that these lineages have split from the MRCA of the other Aegilops genomes very early. This is consistent with taxonomic studies. T. monococcum and T. urartu, carrying two different modifications of the A genome, are usually treated together with polyploids carrying the A genome as a separate genus, Triticum [19,36-40]. Ae. mutica (T) appears to represent a separate evolutionary line within Aegilops/Triticum as this species shows many primitive characters. In some classifications it is treated as a separate genus, Amblyopyrum [19,41], or placed within a separate monotypic subgenus, Amblyopyrum, within Aegilops [39]. Cytogenetic studies [42] confirmed this isolated position. The D genome of Ae. tauschii was already regarded by early cytogenetic studies as a rather well-separated lineage [43]. Some DNA marker-based studies placed it at a basal position in the Aegilops/Triticum group [44-46]. According to our model, the most recent common ancestor (MRCA) of the S genomes probably gained the group 5 gliadins. Ae. searsii (Ss), Ae. bicornis (Sb), Ae. sharonensis (Ssh) and Ae. longissima (Sl) all have sequences of group 5 but none of group 6. Ae. speltoides (S) has group 6 sequences but none of group 5, in correspondence with it being the most divergent of the species of section Sitopsis [46-53]. Note that Eig [37] put Ae. speltoides in a separate subsection, Truncata, on the basis of morphological evidence. As the S genome species together are well separated from all other Aegilops species, some authors considered them to be more closely related to Triticum than to other Aegilops species [54,55].
The species Ae. caudata (C), Ae. umbellulata (U), Ae. comosa (M) and Ae. uniaristata (N) share a common node in our model, representing a hypothetical common ancestor that was differentiated from all other genomes by the combination of pseudogenes in group 1 gamma-gliadins and the absence of group 2 gamma-gliadins. From this ancestor the N and M genomes maintained group 4 gliadins, while the C and U genomes lost them. The similarity of Ae. caudata to Ae. umbellulata and of Ae. comosa to Ae. uniaristata was already proposed by Kihara [43] and Lucas and Jahier [56] based on cytogenetic analysis, and by Dvorak and Zhang [48] based on RFLP data. A recent phylogenetic analysis of chloroplast haplotypes also showed similarity between the genomes of Ae. comosa, Ae. uniaristata and Ae. caudata [57].

Evolution and selection of gamma-gliadins
A high level of genetic diversity was observed among gamma-gliadins, similar to the results of [3,10] and [14]. The number of groups in each genome reflects a more complicated evolution, over a longer period of time, than that of e.g. the alpha-gliadins of locus Gli-2 on chromosome 6, which have been suggested to originate from a gliadin locus on chromosome 1 through a translocation event [5]. At the same time the gamma-gliadins contain fewer pseudogenes than the 90% found among alpha-gliadins [4]. The codon-based test for selection (Z-test) showed evidence for purifying selection in all groups of gamma-gliadin sequences (Table 2, Additional file 5) and at higher levels in intact genes than in pseudogenes. What mechanism made the gamma-gliadins split into separate groups, why is purifying selection stronger, and why do they have relatively few pseudogenes? One clue may come from the fact that the strength of selection, the variation in sequence length and in the number of cysteines, and the percentage of pseudogenes are clearly different between the six groups (Figure 3). This is most readily understood by comparing the most conserved and most polymorphic groups.
The most polymorphic is group 1, in which the genes encode proteins with 8 cysteines, which would allow them to be present as monomers. Deduced full sequences of this group varied in length from 762 (an Ae. searsii sequence from Genbank) to 1089 bp, which means that this group contains some of the shortest and all of the longest variants of the whole study. They were also the most polymorphic in terms of sequence divergence, and the group is lost in many lineages (only maintained in Lophopyrum, Crithopsis, Dasypyrum, and the D and various S genomes) or consists of pseudogenes only (U and M genomes). This suggests that, insofar as group 1 proteins perform any biological function, they are interchangeable with gliadins from other groups.
The most conserved are the group 6 gamma-gliadins, present in almost all Aegilops genome types (T, D, U, C, N, M and S (only Ae. speltoides)) and in Lophopyrum. They all have an uneven number of nine cysteines. The uneven number of cysteines would allow these proteins to become linked to a gluten network and function as a chain terminator. This particular group of gliadins is very conserved in length (all are 302 amino acids), except in Lophopyrum, where the additional cysteine is not present. The Aegilops species that do not have group 6 gamma-gliadins are the S genome species (except Ae. speltoides), all of which have group 5 gamma-gliadins, which are distinct in sequence composition but have the same length as the group 6 gliadins and have an additional cysteine in the same position. As a result, each Aegilops species has a group of 9-cysteine gamma-gliadins of a specific and conserved length. This strongly suggests that these 302 amino acid, 9-cysteine gamma-gliadins perform a specific function, possibly in relation to the gluten network formation during protein body formation in developing wheat grains. The traditional idea that gamma-gliadins have no free cysteines, and that all four S-S linkages (corresponding to 8 cysteines) are intramolecular, thus preventing gliadins from participating in the polymeric structure of glutenin, is clearly too simple. Altenbach et al. [58] already found several of these odd-numbered gamma-gliadins, but not yet in all genomes. The cysteines may be functional in combination with a fixed length if that provides a particular secondary structure (beta-reverse turns [59], possibly also related to their capability to function as chain terminators in the polymer network).
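The reasoning about even and uneven cysteine numbers can be expressed as a simple check. The sketch below assumes a deduced protein sequence in one-letter code, with cysteine written as 'C'; it only counts residues and does not predict which cysteines actually pair. The function names and any sequences passed to them are hypothetical.

```python
def cysteine_count(protein):
    """Number of cysteine residues in a one-letter-code protein sequence."""
    return protein.upper().count("C")


def has_unpaired_cysteine(protein):
    """True if the cysteine count is odd.

    With an odd count, at least one cysteine cannot take part in an
    intramolecular disulfide bond and is in principle available for an
    intermolecular link.
    """
    return cysteine_count(protein) % 2 == 1
```

A protein with eight cysteines can form four intramolecular bonds and returns False, whereas a nine-cysteine protein returns True.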
Upelniek et al. [60] showed that differences in gliadin allele composition of Gli-1 loci among bread wheat varieties were correlated with differences in proteolysis rates during germination. Nevertheless, and apparently in contrast to the notion of specific functionality of at least some gamma-gliadins, hexaploid wheat appears to tolerate the loss of most or all gamma-gliadin proteins, as grains of the spring wheat cultivar Bobwhite remained viable when gamma-gliadin gene expression was mostly eliminated with RNAi [61] or when the bulk of all gliadins was silenced using an RNAi construct based on a conserved region from alpha-, gamma- and omega-gliadins [62]. However, Gil-Humanes et al. [63] did observe irregularities in the development of protein bodies in the endosperm when all gliadins were down-regulated, not only the gamma-gliadins. The effect of a reduction of gamma-gliadins by RNAi in commercial cultivars [64,65] or as a result of deletions in 'Chinese Spring' [66] is an increase in dough strength, which is consistent with a chain termination activity of part of the gamma-gliadins.

Conclusion
We have studied the evolution of gamma-gliadins in diploid species of Aegilops/Triticum representing all main genome types in the group. Wide sampling enabled us to show that gamma-gliadins are represented by six diverged groups of genes that occur in different combinations across the genomes. The current gamma-gliadin composition in each of the genomes is the result of multiple gene duplication and divergence events followed by pseudogenisation within groups as well as loss of groups of genes during genome evolution. We have presented a possible model for duplications and deletions of groups of genes that proposes that at least some duplications predate the most recent common ancestor of all Aegilops/Triticum genomes that currently exist. Although the length and repeat composition are variable among genes, one specific type, a nine cysteine-containing gamma-gliadin of 302 amino acids, occurs in all Aegilops genomes, and these proteins may have a function in protein network formation.

Additional files
Additional file 1: List of all contigs, number of sequences, and Genbank accession numbers.
Additional file 3: Alignment of gamma-gliadin amino acid sequences (Nexus format).
Additional file 4: Maximum-likelihood tree of the gamma-gliadins (based on nucleotide sequences) from diploid species of tribe Triticeae. A maximum-likelihood (ML) analysis was performed with PhyML 3.0 using the GTR substitution model. The SH-like approximate likelihood-ratio test was used to estimate branch support. Sequences shorter than 600 bp in the alignment were excluded from the analysis. The gamma-gliadins fall into six groups (1-6 on the right) in two branches (1-2 and 3-4-5-6). The key for the sequence codes is in Additional file 1.