Tracking the evolution of a cold stress associated gene family in cold tolerant grasses
© Sandve et al; licensee BioMed Central Ltd. 2008
Received: 30 April 2008
Accepted: 05 September 2008
Published: 05 September 2008
Grasses are adapted to a wide range of climatic conditions. Species of the subfamily Pooideae, which includes wheat, barley and important forage grasses, have evolved extreme frost tolerance. A class of ice binding proteins that inhibit ice re-crystallisation, specific to the Pooideae subfamily lineage, has been identified in perennial ryegrass and wheat, and these proteins are thought to have evolved from a leucine-rich repeat phytosulfokine receptor kinase (LRR-PSR)-like ancestor gene. Even though the ice re-crystallisation inhibition function of these proteins has been studied extensively in vitro, little is known about the evolution of these genes at the molecular level.
We identified 15 putative novel ice re-crystallisation inhibition (IRI)-like protein coding genes in perennial ryegrass, barley, and wheat. Using synonymous divergence estimates we reconstructed the evolution of the IRI-like gene family. We also explored the hypothesis that the IRI-domain has evolved through repeated motif expansion and investigated the evolutionary relationship between a LRR-domain containing IRI coding gene in carrot and the Pooideae IRI-like genes. Our analysis showed that the main expansion of the IRI-gene family happened ~36 million years ago (Mya). In addition to IRI-like paralogs, wheat contained several sequences that likely were products of polyploidisation events (homoeologs). Through sequence analysis we identified two short motifs in the rice LRR-PSR gene highly similar to the repeat motifs of the IRI-domain in cold tolerant grasses. Finally we show that the LRR-domain of carrot and grass IRI proteins both share homology to an Arabidopsis thaliana LRR-trans membrane protein kinase (LRR-TPK).
The diverse IRI-like genes identified in this study tell a tale of a complex evolutionary history including birth of an ice binding domain, a burst of gene duplication events after cold tolerant grasses radiated from rice, protein domain structure differentiation between paralogs, and sub- and/or neofunctionalisation of IRI-like proteins. From our sequence analysis we provide evidence for IRI-domain evolution probably occurring through increased copy number of a repeated motif. Finally, we discuss the possibility of parallel evolution of LRR domain containing IRI proteins in carrot and grasses through two completely different molecular adaptations.
The Poaceae family (grasses) contains some of the most economically important and well studied plant species, e.g. maize, wheat, barley, and rice. Generally speaking, the Pooideae subfamily, which includes wheat, barley and forage grasses, is adapted to cold seasons. Many species in this subfamily can withstand temperatures far below freezing and intercellular ice formation [1, 2]. Rice and maize, on the other hand, belong to the subfamilies Ehrhartoideae and Panicoideae, respectively, and are adapted to warm and tropical climates. The adaptation of the Pooideae lineage (from now on referred to as cold tolerant grasses) to cold climates makes grasses an interesting model system for studying climatic adaptation at the physiological and molecular level.
Frost tolerance adaptations are in many organisms associated with the evolution of antifreeze proteins (AFPs). AFPs can affect freezing- and ice crystallisation related stress via different mechanisms. Thermal hysteresis (TH) depresses the freezing point at which ice crystallisation initiates, which renders it possible for organisms to survive under freezing temperatures. Ice re-crystallisation inhibition (IRI), on the other hand, does not hinder ice crystallisation but manipulates the growth of the ice crystals such that small ice crystals grow at the expense of larger ice crystals, and this has been suggested to prevent or minimize the cellular damage in plants. A third mode of AFP action is membrane stabilisation, which has been reported for a fish AFP. Animal AFPs generally possess high TH characteristics and lower the ice crystallisation initiation temperature by 1–5°C [6, 7]. Plant AFPs, on the other hand, have low TH activity but exhibit strong IRI activity.
Genes encoding peptides with IRI capacity have evolved independently several times in different lineages of higher plants. These IRI peptides are homologous to diverse protein classes, e.g. thaumatin like proteins, endochitinases, endo-β-1,3-glucanase, and leucine rich repeat (LRR) containing proteins [6, 8, 9]. Three LRR-domain containing IRI proteins (LRR-IRI) have been identified in plants, one in carrot (DcAFP; accession number AAC6293) and two in wheat (TaIRI1 and TaIRI2 with accession numbers AAX81542 and AAX81543) [10, 11]. DcAFP has been classified as a polygalacturonase-inhibiting protein (PGIP) but does not exhibit PGIP activity. LRR motifs span the entire processed DcAFP protein and form a 10-loop beta-helix secondary structure with solvent exposed asparagine residues at putative ice binding sites. TaIRI1 and TaIRI2 genes (accession numbers AY968588 and AY968589) have been identified as homologous to the LRR-domain coding region of a rice phytosulfokine LRR receptor kinase (OsLRR-PSR: NP_001058711) and an Arabidopsis trans-membrane protein kinase (AtLRR-TPK: NP_200200). The wheat IRI peptides differ structurally from DcAFP in that the LRR-domain only comprises about half of the processed peptide.
In addition to the N-terminal LRR domain, wheat IRI proteins have a C-terminal repeat domain consisting of two similar A and B motifs, NxVxG and NxVxxG, respectively. This repeat domain has been reported to exhibit strong in vitro IRI capacity. Interestingly, a blast search yields no sequences with homology to the IRI-domain outside the subfamily of cold tolerant grasses. Protein modelling has shown that the A and B repeated motifs of the IRI-domain fold into a β-roll with ice binding sites matching the prism face of ice. Expression studies have shown that increased expression levels in wheat and perennial ryegrass [Rudi et al, unpublished] are correlated with cold acclimation, but no in vivo studies to determine the localisation of these grass IRI peptides have been reported in the literature. However, TaIRI1 and TaIRI2 have been predicted to encode an N-terminal 20 amino acid signal peptide domain targeting the proteins to the secretory pathway, suggesting that the peptides could be located in the extracellular space.
While DcAFP is evolutionarily closely related to PGIPs, TaIRI1 and TaIRI2 genes are thought to have evolved from a LRR-PSR like ancestor gene. Furthermore, the evolutionary origin of the IRI-domain in grass IRI genes is much less obvious, because the IRI-domain is not homologous to any other sequences outside the cold tolerant grass lineage. Tremblay et al. proposed a "TE-hypothesis" to explain this apparent lack of homologous coding regions: that the IRI-domain had arisen by a transposable element (TE) insertion. However, no TE signature sequence could be identified surrounding the IRI-domain, thus no empirical data support the TE-hypothesis so far.
Here we report the identification and characterisation of novel LRR-IRI homologous genes in cold tolerant grass species. We perform a detailed study of the evolutionary relationships between OsLRR-PSR and IRI-like genes by analysing sequence divergence at synonymous sites. We also use synonymous site divergence to trace the evolutionary history of the IRI-like gene family with respect to gene duplication events. The evolution of gene families is in itself a much debated topic, and gene family expansion and subsequent functional diversification are thought to have been a significant factor contributing to adaptations to new environments [16, 17]. The evolution of the IRI-like gene family of cold tolerant grasses is discussed in the context of the Duplication-Degeneration-Complementation (DDC) model. Finally, we address the unresolved matter of the evolutionary mechanism underlying the birth of the cold tolerant grass IRI-domain, and propose a novel hypothesis on the evolution of this IRI-domain.
Screening of perennial ryegrass BAC libraries
Initial screening of the perennial ryegrass (Lolium perenne) LTS18 BAC library with the LpAFP primer pair produced two hits, from which LpIRI1 (EU680848) and LpIRI2 (EU680849) were isolated. The NV#20F1-30 BAC library produced four hits with the LpIRIx primer pair, and three hits with the LpAFP primer pair. LpIRI4 (EU680851) was subsequently isolated from one of the four positive LpIRIx hits and LpIRI3 (EU680850) was isolated from one of the three positive LpAFP hits. All genes isolated from perennial ryegrass were intronless and encoded putative peptides with high identity to the wheat TaIRI1 and TaIRI2 genes (blastp E-value < 4e-10). LpIRI1, LpIRI4, and LpIRI3 were similar in size and encoded peptides of 285, 242, and 254 amino acids, respectively. LpIRI2 encoded a shorter ORF of 150 amino acids that was 94% identical to the LpIRI4 IRI-domain. The IRI-domain of LpIRI3 is identical to an earlier identified partial IRI peptide encoded by LpAFP (AJ277399).
Nucleotide alignments of the LpIRI-like genes showed that LpIRI2 has undergone a deletion of almost the entire LRR-domain coding region, the only remains of it being a 102 base pair (bp) fragment upstream of the LpIRI2 putative start codon. This could indicate that LpIRI2 is a pseudogene or a non-functional allele. Non-functional sequences are expected to evolve neutrally, which means that the ratio of non-synonymous to synonymous substitution rates (w) is expected to be 1. The average w between LpIRI2 and the other perennial ryegrass sequences was estimated to be 0.56, which suggests that LpIRI2 is under selective constraints despite the major deletion in the LRR-domain.
In silico identification of IRI-like sequences
Table: All IRI-like sequences identified through EST in silico mining. Column: No. EST sequences.
Table: Full length IRI-like sequences identified through EST in silico mining and the number of ESTs per contig. Column: ESTs in contig.
Predicted protein structure characterisation
The IRI-domain also varies in size by number of repeated motifs (Fig. 1). About 60% of all sequences with an IRI-domain have 15 repeat motifs or more. Six sequences were found to have a reduced number of repeat motifs, or to have completely lost the IRI-domain. Analysis of codon based nucleotide alignments revealed that frameshift (FS) mutations could be identified in four of the IRI-like sequences (TaC3, TaC10, TaC11, and AK249041) that showed reduced IRI-domain size (data not shown). HvC3 is the only IRI-like sequence with a completely reduced IRI-domain. For all sequences in the HvC3 contig, additional information on the abiotic conditions under which the plants had been grown was included in the EST files. Without exception all ESTs originated from tissue sampled from etiolated barley seedlings, and not from cold acclimated tissue. This is congruent with HvC3 lacking the entire ice binding IRI-domain, suggesting that IRI-like paralogs are involved in several different stress responses.
Prediction of subcellular location (see methods) identified a signal peptide targeting the secretory pathway in all IRI-like peptides except LpIRI2. The lack of an LpIRI2 signal peptide, and the fact that LpIRI2 has undergone a deletion of almost the entire LRR-domain, could suggest that LpIRI2 is in fact a non-functional allele or pseudogene. However, the results from the w estimates contradict the non-functionality hypothesis. Alternatively, the lack of a signal peptide could mean that LpIRI2 has simply evolved a different function than the IRI-like peptides with a conserved signal peptide.
Phylogenetic analysis of IRI-like paralogs
Estimation of synonymous divergence of IRI-like sequences
If we assume a molecular clock, the number of synonymous substitutions per synonymous site (dS) between two DNA sequences can be interpreted as a relative measure of the time since their most recent common ancestor (MRCA); thus for two paralogous genes dS can be interpreted as the time since gene duplication. Without being able to account for all the IRI-like paralogs existing in a genome we cannot infer whether two paralogs descend from a single duplication event (i.e. being true paralogs) or are products of two separate duplication events. We therefore restricted our initial analysis of IRI-like gene duplication events to the dSmax and dSmin for all pairwise comparisons. The dSmax-dSmin range can be interpreted as the evolutionary time span in which all duplications of IRI-like genes in our dataset have occurred.
Very low dS between paralog pairs might reflect the inclusion of highly diverged alleles in our paralog dataset. Sequencing errors in ESTs and inclusion of highly diverged genotypes in our dataset could potentially give an inflated polymorphism level, producing artificial contigs that are alleles rather than paralogs. To identify putative false paralogs we set an allelic threshold of dS ≤ 0.03 (see methods section). Based on this definition we identified two putative allelic sequence pairs, TaC2-TaC11 (dS = 0.01) and LpIRI2-LpIRI4 (dS = 0.03).
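The allelic filter and the dSmax-dSmin range described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the two flagged pairs use the dS values quoted in the text, the remaining pair names and values are hypothetical placeholders, and the 0.03 cut-off is assumed to be inclusive because LpIRI2-LpIRI4 (dS = 0.03) is reported as a putative allelic pair.

```python
# Pairwise synonymous distances (dS). Only the first two values come from the
# text; the other pairs are hypothetical placeholders for illustration.
pairwise_ds = {
    ("TaC2", "TaC11"): 0.01,
    ("LpIRI2", "LpIRI4"): 0.03,
    ("paralogA", "paralogB"): 0.12,  # hypothetical
    ("paralogA", "paralogC"): 0.41,  # hypothetical
}

ALLELIC_DS_THRESHOLD = 0.03  # assumed to be an inclusive cut-off


def split_allelic_pairs(ds_by_pair, threshold=ALLELIC_DS_THRESHOLD):
    """Separate putative allelic pairs (dS <= threshold) from paralog pairs."""
    allelic = {p: d for p, d in ds_by_pair.items() if d <= threshold}
    paralogous = {p: d for p, d in ds_by_pair.items() if d > threshold}
    return allelic, paralogous


def ds_range(ds_by_pair):
    """Return (dSmin, dSmax) over all pairwise comparisons."""
    values = list(ds_by_pair.values())
    return min(values), max(values)


allelic, paralogous = split_allelic_pairs(pairwise_ds)
ds_min, ds_max = ds_range(paralogous)
print("putative alleles:", sorted(allelic))
print("dSmin, dSmax among remaining pairs:", ds_min, ds_max)
```

Under a molecular clock, the resulting dSmin and dSmax bound the period in which the duplications in the dataset occurred.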
Estimation of synonymous divergence between control genes
Table: Evolutionary rate control genes and their pairwise synonymous distances (dS). Columns: control gene; Ta vs Lp; TaLp vs Os. Control genes listed: cytosolic glyceraldehyde-3-phosphate dehydrogenase; casein protein kinase 2 alpha subunit; Na+/H+ antiporter precursor; putative plasma membrane Na+/H+ antiporter; myo-inositol phosphate synthase; cinnamoyl CoA reductase; fructan beta-(2,1) fructosidase.
Molecular analysis of LRR and IRI-domains
Until now only three cold tolerant grass IRI protein coding genes have been reported: a partial coding sequence of an IRI-domain from perennial ryegrass, and two highly identical full length mRNA paralogs from wheat. Through in silico mining and BAC sequencing we have identified 15 full length and 8 partial novel IRI-like genes in cold tolerant grasses. In addition, we have obtained the complete sequence of LpAFP. The data accumulated leaves no doubt: cold tolerant grasses of the Pooideae subfamily have evolved a lineage specific family of IRI-like genes.
IRI-gene family radiation happened after the cold tolerant grass divergence
The prevailing hypothesis on the evolution of LRR-IRI-like genes belonging to cold tolerant grasses is that they are lineage specific and that an OsLRR-PSR-like gene is the MRCA. This hypothesis was proposed based on sequence homology data only, and we therefore re-examined this idea using more rigorous statistical methods by estimating synonymous divergence. When employing a commonly used mutation rate for grasses of 6.5 × 10^-9 substitutions per synonymous site per year [20–22], estimated by Gaut et al., the synonymous divergence level between OsLRR-PSR and IRI-like sequences suggested a MRCA about 75 Mya. This is slightly higher than the upper thresholds of some published rice-Pooideae divergence estimates. However, our estimate of divergence time between rice and cold tolerant grasses based on the control genes suggests a rice-Pooideae divergence only 42 Mya. This is similar to divergence estimates published by Patterson et al. and Salse et al., dating back 41–47 and ~46 Mya, respectively. Our wheat and perennial ryegrass divergence estimate is dated ~10 million years prior to a previously published estimate of ~35 Mya.
The observed discrepancy between the two divergence estimates of rice and cold tolerant grasses in our study (Fig. 3A and 3B) can be interpreted in two different ways. Either the OsLRR-PSR gene is the true ortholog of IRI-like genes and the incongruent divergence time estimates are caused by differences in molecular clock rates, or, if the molecular clock rates are similar, OsLRR-PSR diverged from IRI-like genes long before rice and cold tolerant grasses diverged (Fig. 3). In the latter case the burst of IRI-like sequence duplications must have occurred in the ancestor genome of rice, which implies that the rice genome subsequently lost all genes belonging to the IRI-like gene family. Even though loss of genes and whole gene families is not an uncommon feature of plant genome evolution [16, 24], clock rate differences are a more parsimonious explanation for the divergence estimate differences seen in Figure 3A and 3B. Evolutionary rate differences are highly common among closely related species, different lineages of a species, and also within a genome [26, 27]. When using a single gene family to estimate divergence between species, as with the IRI-like genes, deviation from the average genome clock rate would be expected. As an example, the clock rate of the ten control genes varied from 4.1 × 10^-9 to 7.1 × 10^-9, with an average rate of 5.4 × 10^-9, when using a divergence time between rice and cold tolerant grasses of 50 My.
Assuming a true orthologous relationship between OsLRR-PSR and IRI-like genes, we can calibrate an average molecular clock rate for IRI-like genes using dS = 0.97 and assuming an absolute divergence time of 50 Mya (see methods). This gives us an estimate of an IRI-like gene family specific clock rate of 9.7 × 10^-9. Employing this adjusted clock rate pushes the estimates for the initiation of IRI-like gene duplications forward to 36, 27, and 39 Mya for wheat, perennial ryegrass and barley, respectively. This is approximately 3–14 My after our estimate for divergence between rice and cold tolerant grasses based on the control genes.
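The calibration follows directly from the relationship k = dS/2T given in the methods. As a hedged worked example (the function and constant names are illustrative, not taken from any software used in the study), the snippet below recovers the family specific rate from dS = 0.97 and T = 50 My, and shows the age that the same dS implies under the general grass rate of 6.5 × 10^-9.

```python
def clock_rate(ds, divergence_years):
    """Synonymous substitutions per site per year: k = dS / (2T)."""
    return ds / (2.0 * divergence_years)


def divergence_time(ds, rate):
    """Years since divergence: T = dS / (2k)."""
    return ds / (2.0 * rate)


DS_OSLRR_VS_IRI = 0.97   # dS between OsLRR-PSR and IRI-like genes (from the text)
GRASS_RATE = 6.5e-9      # general grass rate used in the text
CALIBRATION_TIME = 50e6  # assumed rice-Pooideae divergence time, in years

k_iri = clock_rate(DS_OSLRR_VS_IRI, CALIBRATION_TIME)
t_default = divergence_time(DS_OSLRR_VS_IRI, GRASS_RATE)

print(f"IRI-like clock rate: {k_iri:.2e} per site per year")
print(f"Age under the general grass rate: {t_default / 1e6:.1f} My")
```

Because T is inversely proportional to k, replacing the general rate with the calibrated one scales every age estimate by 6.5/9.7, i.e. to roughly two thirds of its original value.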
Species specific differences in IRI-like sequence numbers
Twice as many IRI-like sequences, partial and full length, were identified in wheat compared to barley (Table 1). In silico mining is vulnerable to methodologically introduced uncertainties. For example, the fact that the wheat EST database at NCBI is more than twice as large (1.2 M ESTs) as that for barley (500 K ESTs) could be a contributing factor to the differences in numbers of IRI-like mined sequences, because we expect EST database size to be positively correlated with transcriptome coverage of an organism. A separate effect of a larger EST database is the inclusion of ESTs from an increased number of genotypes, which could introduce allelic polymorphisms.
Even though methodological properties might elevate the number of wheat sequences identified to some extent, we believe that much of the difference in wheat and barley IRI-like sequence numbers is related to genomic ploidy level differences. Wheat (Triticum aestivum) is an allo-hexaploid originating about 8,000 years ago. It has three homoeologous genomes A, B, and D, which are estimated to have diverged 4.5–2.5 Mya. Our results from the phylogenetic analyses, supported by the dS estimates, suggest that IRI-like sequences within monophyletic clades I and II (Fig. 2) could be homoeologous rather than paralogous. But low pairwise dS can alternatively reflect recent gene duplications. Consequently, our inferences on the evolutionary relationship between putative wheat homoeologous sequences must be viewed critically.
A model of wheat IRI-like sequence evolution
The evolutionary model with adjusted clock rates strengthens our hypothesis on the homoeologous relationship of the wheat clade I and II sequences (Fig. 2). The divergence between TaC6 and TaC2/TaC11 is predicted to be slightly earlier than the polyploidisation event, thus this internal node could not be classified unambiguously. Sub-trees of clades I and II, in which we have included the barley sequence with the lowest synonymous distance to each clade, are presented in Figure 6B and 6C, respectively. The sub-trees further support the hypothesis that clades I and II represent genes that have mainly arisen through polyploidisation events. In both sub-trees the closest related barley sequence is estimated to have diverged from the wheat clades I and II about 15 Mya, coinciding with wheat-barley divergence. However, without complete knowledge of the orthologous relationships of the IRI-like sequences the inferences on evolutionary relationships are somewhat speculative.
Structural and functional diversification of the IRI-like gene family
A striking feature of the IRI-like gene family is the structural differentiation between paralogs (Fig. 1). Structural diversification of IRI-like genes, as seen in our sequence collection, would be expected to affect the spectrum of IRI-like peptide function, because both LRR and IRI-domains are known to be involved in substrate binding [15, 28]. One interpretation of this pattern is that IRI-like sequences with complementary combinations of LRR motifs and IRI-domain sizes are selected for and retained in the genome, which is what we expect from the duplication-degeneration-complementation (DDC) model of paralog evolution. DDC predicts that mutations in regulatory elements increase the probability of paralog retention because they lead to partitioning of ancestral functions (subfunctionalisation), and the model has proven to be an important contribution to understanding evolution of paralogous genes [29, 30]. The DDC model has later been expanded to coding sequences [31, 32], and recently a combination of regulatory and structural DDC has been demonstrated [33, 34].
Regulatory subfunctionalisation in gene expression and tissue localisation has been demonstrated between TaIRI1 and TaIRI2 (TaC3), two genes coding for highly divergent LRR-domain structure and length. In our study we have also found evidence that peptide structure divergence has led to sub- or neofunctionalisation. A barley IRI-like sequence contig (HvC3) with no IRI-domain still seems to play a functional role under etiolation. This suggests that the LRR-domain of IRI-like genes may play a functional role in multiple stress responses. Other LRR-domain containing genes in plants have also been shown to be involved in stress responses under drought stress and to act as a key membrane-bound regulator of abscisic acid signalling [35, 36].
One interesting aspect of the structural divergence pattern is that all genes except LpIRI2 are predicted to encode a conserved N-terminal signalling domain targeting the proteins to the secretory pathway. Secretion to the apoplast is expected for proteins with ice interacting functions. In the light of these data, an interpretation of the structural variability of LRR-domains, combined with the apparent conservation of the N-terminal signalling domain, is that IRI-like genes might be under selective pressure for a continuous ORF from the signalling domain across the LRR-domain and into the IRI-domain, conserving the crucial function of apoplast export of IRI peptides. The LRR-domain itself might not be under functional conservation. As an example, the full length sequenced mRNA AK249041 from barley has an N-terminal conserved predicted signal peptide motif, a completely reduced LRR-domain with no predicted LRR motifs, and an IRI-domain.
Less dramatic polymorphisms between paralogs, such as single amino acid substitutions or small motif number differences, could potentially have a large effect on functionality. Single amino acid substitutions have been shown to radically change AFP functionality in both plant and animal AFPs [13, 37]. Chakrabartty and co-workers showed that only small deletions in an AFP with repetitive structure from flounder altered the ice interacting properties dramatically. Thus, all the observed polymorphisms between IRI paralogs, even down to single amino acid substitutions, could potentially be of functional significance.
Birth of an IRI repeat domain
The molecular mechanisms underlying the metamorphosis from an OsLRR-PSR-like ancestor gene into the first bipartite IRI-like gene have been addressed by Tremblay et al. They proposed the "transposable element (TE) hypothesis", suggesting that the IRI-domain is a TE insertion that has resulted in a FS mutation and caused the loss of the PK-domain. However, no TE signatures were found flanking the IRI-domain. Based on results from our sequence analysis (Fig. 4) we propose a competing hypothesis on the evolution of the IRI-domain, namely the repeated motif expansion (RME) hypothesis. It has been shown that expansions of domains by duplication of repeated motifs are common in genes of repetitive structure. We suggest that IRI motifs have increased in copy number by an as yet unknown mechanism, possibly illegitimate recombination, slippage, or unequal crossing over. Contrary to the TE-hypothesis, the RME hypothesis can explain the evolution of the IRI-domain and at the same time account for the existence of two IRI motif-like blocks in OsLRR-PSR (Fig. 4). Lastly, if the entire IRI-domain is a TE-insertion we would expect this TE sequence to be found at other loci in grasses. However, to our knowledge no TE-like sequences homologous to the IRI-domain have been reported.
Convergent evolution of LRR containing AFPs
LRR-domain containing proteins are extremely abundant in plants. The largest LRR containing plant peptide group is the LRR receptor kinases (LRR-RK), having more than 200 members in the A. thaliana genome. Plant disease resistance associated genes (NBS-LRR) comprise another large LRR containing functional group. Common to the function of LRR domains in any peptide is that they are associated with peptide-peptide recognition and binding interactions [42–44].
Through comparative protein domain analysis we have shown that LRR-domains of IRI-like genes are much less conserved than the predicted signal peptide motif flanking the N-terminus of the LRR-domains. We believe that this could be due to a lack of selective constraints on the LRR-domain function itself, or perhaps selection for divergent LRR-domain functions as predicted by DDC. Whatever functional role today's IRI-like sequence LRR-domains might play, there is little doubt that the LRR-domain of IRI-like genes in cold tolerant grasses shares an ancient common ancestor with the LRR-domain of DcAFP (Fig. 5). However, while cold tolerant grass IRI-like proteins have evolved ice binding capacity through the evolution of an IRI-domain [14, 15], DcAFP has evolved ice binding capacity through changes in the LRR-domain itself. DcAFP and grass LRR-IRI genes are therefore intriguing examples of parallel evolution of function by two completely different molecular mechanisms: evolutionary alterations of a pre-existing LRR-domain and evolution of a novel repeat domain with ice binding properties.
The IRI-like genes identified by Sidebottom et al., by Tremblay et al., and in this study tell a tale of a complex evolutionary history that includes birth of an ice binding domain, a burst of gene duplication events after cold tolerant grasses radiated from rice, domain structure differentiation between paralogs, and sub- and/or neofunctionalisation of IRI-like proteins. Given more detailed functional studies, the IRI-like gene family can provide a valuable example of how duplicated genes evolve novel functional spectra. The hypothesis that evolution of IRI-like genes has been important for Pooideae grass adaptation to cold climate is strengthened by this study, as we show that the evolution of the IRI-like gene family probably happened after the divergence from rice, and furthermore that the number of IRI-like genes is higher than previously known.
In silico IRI-like sequence mining
A blastn search in the NCBI database was performed using TaIRI1. All sequences with blast E-value < 1 × 10^-20 were downloaded from the EST and core nucleotide databases. Contigs were aligned with alignment parameters set to > 97% identity and > 40 nucleotides overlap using Sequencher (Gene Codes Corp., Ann Arbor, MI, USA). The 97% identity threshold was set to allow contig alignments to include different allelic forms and polymorphisms caused by EST sequencing errors. Non-coding nucleotides (i.e. promoter and 3'UTR) were removed after an initial prediction of the open reading frame (ORF), and subsequently the sequences were realigned with identical parameters. All contigs were translated into their predicted amino acid sequence. Sequence contigs lacking a start or stop codon due to incomplete sequence coverage, or with putative sequence errors causing frameshift mutations, were not included in the analysis. We validated the in silico mining method by aligning EST mined unigenes with four full length cDNA clones of grass IRI-like genes from the NCBI core nucleotide collection (barley: AK252915, AK249041; wheat: AY968588, AY968589).
BAC identification and sequencing
Two perennial ryegrass BAC libraries were used to identify novel IRI-like genes. Primers for the initial identification of novel IRI-like genes were designed from coding sequences of LpAFP (AJ277399) and a partial sequence of a Festuca pratensis IRI-like homolog (EU684537). The LpAFP primer pair had forward primer 5'GATGAACAGCCGAATACGATTTCT3' and reverse primer 5'GCTTCCAGATACAACGTGGTTGCT3'; PCR conditions were denaturation at 94°C for 4 min, then 35 cycles of 94°C for 30 s, 60°C for 45 s, and 72°C for 45 s, followed by 72°C for 10 min. The primer pair designed from the F. pratensis sequence had forward primer 5'TGTCATATCGGGGAACAACA3' and reverse primer 5'ACATGGTTTCGTCCGGATAC3'; PCR conditions were denaturation at 94°C for 4 min, then 40 cycles of 94°C for 10 s, 60°C for 45 s, and 72°C for 1.30 min, followed by 70°C for 10 min. We also designed a third primer pair, referred to as the LpIRIx primer pair, with forward primer 5'GAATGCCGTATCTGGGGACC3' and reverse primer 5'GTGGTTCCCGGATACGGTATT3', based on multiple sequence data acquired from sequencing of the above mentioned genes. This primer pair was used under the same conditions as the LpAFP primers. DNA maxi-preps of the BAC-clones were performed using the NucleoBond BAC 100 Kit (MACHEREY-NAGEL, Düren, Germany). For BAC-sequencing 500 ng BAC-DNA was combined with 20 μM of primer, 8 μl BigDye 3.1 ready mix and dH2O, to 20 μl total volume. Following 5 min of denaturation at 95°C, 50 cycles were performed with 30 s at 95°C, 10 s at 50°C, and 4 min at 60°C. Subsequently, the sequencing reactions were precipitated and sequenced on an ABI PRISM 3100 (Applied Biosystems, Foster City, CA, USA).
Protein domain characterisation
Predicted peptide LRR domains were characterised using Pfam. We verified the Pfam results by visual inspection of the sequences, defining a LRR motif as LxxLxLxx, or variations of it where L is substituted with I, V, or A. To track the molecular evolution of the LRR-domain, OsLRR-PSR was used as a template for comparison to the predicted domains of IRI-like sequences. LRR motifs predicted by Pfam were considered significant if the Pfam E-value was lower than 0.05. IRI-like amino acid sequences were aligned with the LRR-domain of OsLRR-PSR, and the LRR motifs in IRI-like sequences were named according to the number of the OsLRR-PSR LRR motif to which they aligned. IRI-domain characterisation was performed by visual scoring of the total number of repeat motifs (NxVxG/NxVxxG). IRI-domain repeat motifs were considered as "present", and counted, when they contained no more than one amino acid substitution compared to the consensus motifs. Signal peptide domains were predicted by TargetP.
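The motif definitions above were applied by visual inspection in the study. As a rough automated approximation, and only as a sketch, the Python below encodes the LxxLxLxx pattern (with L replaceable by I, V, or A) as a regular expression and counts NxVxG/NxVxxG repeats allowing at most one substitution at the fixed positions. The greedy left-to-right scan, the tie-breaking rule, and the example sequences are assumptions made for illustration; they are not part of the original method.

```python
import re

# LxxLxLxx, with L optionally replaced by I, V, or A (as defined in the text).
LRR_PATTERN = re.compile(r"[LIVA]..[LIVA].[LIVA]..")

# Consensus IRI repeat motifs; "x" marks a position that may hold any residue.
IRI_MOTIFS = ("NxVxG", "NxVxxG")


def count_lrr_motifs(seq):
    """Count non-overlapping LxxLxLxx-like motifs, scanning left to right."""
    return len(LRR_PATTERN.findall(seq))


def mismatches(window, motif):
    """Number of fixed (non-x) motif positions that differ from the window."""
    return sum(1 for aa, m in zip(window, motif) if m != "x" and aa != m)


def count_iri_repeats(seq, max_subs=1):
    """Greedy count of NxVxG / NxVxxG repeats with at most max_subs substitutions."""
    count, i = 0, 0
    while i < len(seq):
        best = None  # (mismatch count, motif length) of the best motif at i
        for motif in IRI_MOTIFS:
            window = seq[i:i + len(motif)]
            if len(window) == len(motif):
                m = mismatches(window, motif)
                if m <= max_subs and (best is None or m < best[0]):
                    best = (m, len(motif))
        if best is None:
            i += 1          # no motif starts here; slide one residue
        else:
            count += 1
            i += best[1]    # jump past the counted motif
    return count


# Made-up test sequences, not taken from any IRI-like gene.
print(count_lrr_motifs("LSGLSLSN"))
print(count_iri_repeats("NAVSGNTVAAG"))
```

Tolerating one substitution among only three fixed positions is permissive, so counts from such a scan would still need checking against an alignment, as was done by eye in the study.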
Estimation of substitution rates
To estimate divergence times between putatively paralogous and orthologous sequences we used the average dS obtained from three different methods, Nei & Gojobori, Kumar, and the Li-Wu-Luo method, in MEGA (4.0). As a control for evolutionary rates of IRI-like genes we calculated the average dS values of ten randomly selected orthologs from wheat, perennial ryegrass, and rice. Maximum likelihood estimation of non-synonymous to synonymous substitution ratios (w) was performed using Codeml in the PAML software package (v 3.15). The 3 × 4 codon substitution model was chosen for Codeml w estimations. PAL2NAL was used to make codon based nucleotide alignments for use in MEGA and PAML. The absolute time of divergence between orthologs and paralogs was estimated using a rate of 6.5 × 10^-9 substitutions/synonymous site/year for grasses. For estimation of mutation rates and absolute divergence times we used the relationship k = dS/2T, where k is the absolute rate of synonymous substitutions per synonymous site per year, dS is the number of synonymous substitutions per synonymous site, and T is the absolute time since divergence. To identify putative alleles not grouped in the same contig due to methodological errors, a cut-off threshold of dS = 0.03 was used. This threshold was set on the basis of the average inter-allelic dS for LRR-domains of 27 disease resistance like genes in A. thaliana, and the inter-allelic dSmax of LpIRI1 calculated from twelve European perennial ryegrass genotypes (dSmax = 0.015, data not published).
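For reference, the relationship used above can be written out as follows. The factor of 2 reflects that both lineages accumulate substitutions independently after they split; the subscripts NG, K, and LWL are shorthand introduced here for the three estimation methods and do not appear in the original.

```latex
% Average synonymous distance over the three estimation methods
\overline{d_S} = \frac{d_S^{\mathrm{NG}} + d_S^{\mathrm{K}} + d_S^{\mathrm{LWL}}}{3}

% Each lineage accumulates kT substitutions per synonymous site since divergence
\overline{d_S} = 2kT
\quad\Longrightarrow\quad
k = \frac{\overline{d_S}}{2T},
\qquad
T = \frac{\overline{d_S}}{2k}
```

The first form of the last line calibrates a rate from a known divergence time; the second dates a divergence from a known rate.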
Molecular and phylogenetic analysis
All amino acid and nucleotide alignments were made with MAFFT and manually edited in BioEdit, and the phylogenetic trees were constructed in Treefinder. The Akaike information criterion (AIC), as implemented in the Modeltest option in Treefinder, was used to choose the substitution model for the phylogenetic analysis. ML trees were bootstrapped with 1000 replicates. Synonymous distance-based trees were inferred by UPGMA from a pairwise dS distance matrix in MEGA. Alignment figures were prepared with BoxShade (http://www.ch.embnet.org/software/BOX_faq.html).
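To make the distance-based step concrete, here is a minimal pure-Python sketch of UPGMA clustering on a pairwise distance matrix. It is not MEGA's implementation; the function name, the nested-tuple tree representation, and the three-taxon dS matrix are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def upgma(names, matrix):
    """Cluster taxa by UPGMA.

    names:  list of taxon labels.
    matrix: symmetric list of lists of pairwise distances (e.g. dS),
            in the same order as names.
    Returns a nested tuple (left, right, node_height); leaves are labels.
    """
    # Active clusters: id -> (subtree, number of leaves).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by (smaller id, larger id).
    dist = {(i, j): matrix[i][j] for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)

    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        height = dist[(i, j)] / 2       # ultrametric node height
        tree_i, n_i = clusters.pop(i)
        tree_j, n_j = clusters.pop(j)

        # Distance from the merged cluster to each remaining cluster is
        # the leaf-count-weighted average of the two old distances.
        new_dist = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            new_dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)

        # Drop distances involving the merged clusters, then add the new ones.
        dist = {pair: d for pair, d in dist.items() if i not in pair and j not in pair}
        dist.update(new_dist)
        clusters[next_id] = ((tree_i, tree_j, height), n_i + n_j)
        next_id += 1

    return next(iter(clusters.values()))[0]


# Hypothetical pairwise dS values for three sequences A, B, and C.
tree = upgma(["A", "B", "C"], [[0.00, 0.02, 0.10],
                               [0.02, 0.00, 0.12],
                               [0.10, 0.12, 0.00]])
print(tree)
```

UPGMA assumes a constant substitution rate across lineages, which is the same assumption that underlies converting dS to absolute divergence times with a single rate.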
We thank Kjetil Fosnes for assistance with BAC sequencing and Dr. Magnus Dehli Vigeland, Dr. Gordon Allison, and anonymous reviewers for valuable comments on the manuscript. This study was funded through the KMB project Festulolium with Improved Forage Quality and Winter Survival for Norwegian Farming, project number 173319/I10, funded by the Research Council of Norway and Graminor AS.
- Pearce RS, Fuller MP: Freezing of Barley Studied by Infrared Video Thermography. Plant Physiol. 2001, 125 (1): 227-240. 10.1104/pp.125.1.227.
- Márquez E, Rada F, Fariñas M: Freezing tolerance in grasses along an altitudinal gradient in the Venezuelan Andes. Oecologia. 2006, 150 (3): 393-397. 10.1007/s00442-006-0556-3.
- Zachariassen KE, Kristiansen E: Ice Nucleation and Antinucleation in Nature. Cryobiology. 2000, 41 (4): 257-279. 10.1006/cryo.2000.2289.
- Smallwood M, Bowles JD: Plants in a cold climate. Philos Trans R Soc Lond B Biol Sci. 2002, 357 (1423): 831-847. 10.1098/rstb.2002.1073.
- Tomczak MM, Hincha DK, Estrada SD, Wolkers WF, Crowe LM, Feeney RE, Tablin F, Crowe JH: A Mechanism for Stabilization of Membranes at Low Temperatures by an Antifreeze Protein. Biophys J. 2002, 82 (2): 874-881.
- Griffith M, Yaish MWF: Antifreeze proteins in overwintering plants: a tale of two activities. Trends in Plant Science. 2004, 9 (8): 399-405. 10.1016/j.tplants.2004.06.007.
- Barrett J: Thermal hysteresis proteins. The International Journal of Biochemistry & Cell Biology. 2001, 33 (2): 105-117. 10.1016/S1357-2725(00)00083-2.
- Hon WC, Griffith M, Mlynarz A, Kwok YC, Yang DSC: Antifreeze Proteins in Winter Rye Are Similar to Pathogenesis-Related Proteins. Plant Physiol. 1995, 109 (3): 879-889. 10.1104/pp.109.3.879.
- Smallwood M, Worrall D, Byass L, Elias L, Ashford D, Doucet CJ, Holt C, Telford J, Lillford P, Bowles DJ: Isolation and characterization of a novel antifreeze protein from carrot (Daucus carota). Biochem J. 1999, 340 (2): 385-391. 10.1042/0264-6021:3400385.
- Tremblay K, Ouellet F, Fournier J, Danyluk J, Sarhan F: Molecular Characterization and Origin of Novel Bipartite Cold-regulated Ice Recrystallization Inhibition Proteins from Cereals. Plant Cell Physiol. 2005, 46 (6): 884-891. 10.1093/pcp/pci093.
- Worrall D, Elias L, Ashford D, Smallwood M, Sidebottom C, Lillford P, Telford J, Holt C, Bowles D: A Carrot Leucine-Rich-Repeat Protein That Inhibits Ice Recrystallization. Science. 1998, 282 (5386): 115-117. 10.1126/science.282.5386.115.
- Zhang D-Q, Wang H-B, Liu B, Feng D-R, He Y-M, Wang J-F: Carrot Antifreeze Protein Does Not Exhibit the Polygalacturonase-inhibiting Activity of PGIP Family. Acta Genetica Sinica. 2006, 33 (11): 1027-1036. 10.1016/S0379-4172(06)60139-X.
- Zhang DQ, Liu B, Feng DR, He YM, Wang SQ, Wang HB, Wang JF: Significance of conservative asparagine residues in the thermal hysteresis activity of carrot antifreeze protein. Biochemical Journal. 2004, 377: 589-595. 10.1042/BJ20031249.
- Sidebottom C, Buckley S, Pudney P, Twigg S, Jarman C, Holt C, Telford J, McArthur A, Worrall D, Hubbard R, et al: Phytochemistry: Heat-stable antifreeze protein from grass. Nature. 2000, 406 (6793): 256-256. 10.1038/35018639.
- Kuiper MJ, Davies PL, Walker VK: A Theoretical Model of a Plant Antifreeze Protein from Lolium perenne. Biophys J. 2001, 81 (6): 3560-3565.
- Rensing SA, Lang D, Zimmer AD, Terry A, Salamov A, Shapiro H, Nishiyama T, Perroud P-F, Lindquist EA, Kamisugi Y, et al: The Physcomitrella Genome Reveals Evolutionary Insights into the Conquest of Land by Plants. Science. 2008, 319 (5859): 64-69. 10.1126/science.1150646.
- Kondrashov FA, Kondrashov AS: Role of selection in fixation of gene duplications. Journal of Theoretical Biology. 2006, 239 (2): 141-151. 10.1016/j.jtbi.2005.08.033.
- Force A, Lynch M, Pickett FB, Amores A, Yan Y-l, Postlethwait J: Preservation of Duplicate Genes by Complementary, Degenerative Mutations. Genetics. 1999, 151 (4): 1531-1545.
- Blanc G, Hokamp K, Wolfe KH: A Recent Polyploidy Superimposed on Older Large-Scale Duplications in the Arabidopsis Genome. Genome Res. 2003, 13 (2): 137-144. 10.1101/gr.751803.
- Salse J, Bolot S, Throude M, Jouffe V, Piegu B, Masood Quraishi U, Calcagno T, Cooke R, Delseny M, Feuillet C: Identification and Characterization of Shared Duplications between Rice and Wheat Provide New Insight into Grass Genome Evolution. Plant Cell. 2008, tpc.107.056309
- Ma J, Bennetzen JL: Recombination, rearrangement, reshuffling, and divergence in a centromeric region of rice. Proceedings of the National Academy of Sciences of the United States of America. 2006, 103 (2): 383-388. 10.1073/pnas.0509810102.
- Blanc G, Wolfe KH: Widespread Paleopolyploidy in Model Plant Species Inferred from Age Distributions of Duplicate Genes. Plant Cell. 2004, 16 (7): 1667-1678. 10.1105/tpc.021345.
- Gaut BS, Morton BR, McCaig BC, Clegg MT: Substitution rate comparisons between grasses and palms: Synonymous rate differences at the nuclear gene Adh parallel rate differences at the plastid gene rbcL. Proceedings of the National Academy of Sciences. 1996, 93 (19): 10274-10279. 10.1073/pnas.93.19.10274.
- Paterson AH, Bowers JE, Chapman BA: Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proceedings of the National Academy of Sciences. 2004, 101 (26): 9903-9908. 10.1073/pnas.0307901101.
- Huang S, Sirikhachornkit A, Su X, Faris J, Gill B, Haselkorn R, Gornicki P: Genes encoding plastid acetyl-CoA carboxylase and 3-phosphoglycerate kinase of the Triticum/Aegilops complex and the evolutionary history of polyploid wheat. Proceedings of the National Academy of Sciences. 2002, 99 (12): 8133-8138. 10.1073/pnas.072223799.
- Sloan DB, Barr CM, Olson MS, Keller SR, Taylor DR: Evolutionary Rate Variation at Multiple Levels of Biological Organization in Plant Mitochondrial DNA. Mol Biol Evol. 2008, 25 (2): 243-246. 10.1093/molbev/msm266.
- DeRose-Wilson L, Gaut B: Transcription-related mutations and GC content drive variation in nucleotide substitution rates across the genomes of Arabidopsis thaliana and Arabidopsis lyrata. BMC Evolutionary Biology. 2007, 7 (1): 66. 10.1186/1471-2148-7-66.
- Ellis J, Dodds P, Pryor T: The generation of plant disease resistance gene specificities. Trends in Plant Science. 2000, 5 (9): 373-379. 10.1016/S1360-1385(00)01694-0.
- Gu Z, Rifkin SA, White KP, Li W-H: Duplicate genes increase gene expression diversity within and between species. Nat Genet. 2004, 36 (6): 577-579. 10.1038/ng1355.
- Hughes AL, Friedman R: Expression Patterns of Duplicate Genes in the Developing Root in Arabidopsis thaliana. Journal of Molecular Evolution. 2005, 60 (2): 247-256. 10.1007/s00239-004-0171-z.
- Tocchini-Valentini GD, Fruscoloni P, Tocchini-Valentini GP: From the Cover: Structure, function, and evolution of the tRNA endonucleases of Archaea: An example of subfunctionalization. Proceedings of the National Academy of Sciences. 2005, 102 (25): 8933-8938. 10.1073/pnas.0502350102.
- Cusack BP, Wolfe KH: When gene marriages don't work out: divorce by subfunctionalization. Trends in Genetics. 2007, 23 (6): 270-272. 10.1016/j.tig.2007.03.010.
- Babbitt CC, Haygood R, Wray GA: When Two Is Better Than One. Cell. 2007, 131 (2): 225-227. 10.1016/j.cell.2007.10.001.
- Hittinger CT, Carroll SB: Gene duplication and the adaptive evolution of a classic genetic switch. Nature. 2007, 449 (7163): 677-681. 10.1038/nature06151.
- Osakabe Y, Maruyama K, Seki M, Satou M, Shinozaki K, Yamaguchi-Shinozaki K: Leucine-Rich Repeat Receptor-Like Kinase1 Is a Key Membrane-Bound Regulator of Abscisic Acid Early Signaling in Arabidopsis. Plant Cell. 2005, 17 (4): 1105-1119. 10.1105/tpc.104.027474.
- Chini A, Grant JJ, Seki M, Shinozaki K, Loake GJ: Drought tolerance established by enhanced expression of the CC-NBS-LRR gene, ADR1, requires salicylic acid, EDS1 and ABI1. The Plant Journal. 2004, 38 (5): 810-822. 10.1111/j.1365-313X.2004.02086.x.
- Haymet ADJ, Ward LG, Harding MM, Knight CA: Valine substituted winter flounder "antifreeze": preservation of ice growth hysteresis. FEBS letters. 1998, 430: 301-306. 10.1016/S0014-5793(98)00652-8.
- Chakrabartty A, Yang DS, Hew CL: Structure-function relationship in a winter flounder antifreeze polypeptide. II. Alteration of the component growth rates of ice by synthetic antifreeze polypeptides. J Biol Chem. 1989, 264 (19): 11313-11316.
- Wicker T, Yahiaoui N, Keller B: Illegitimate recombination is a major evolutionary mechanism for initiating size variation in plant resistance genes. The Plant Journal. 2007, 51 (4): 631-641. 10.1111/j.1365-313X.2007.03164.x.
- Shiu S-H, Bleecker AB: Receptor-like kinases from Arabidopsis form a monophyletic gene family related to animal receptor kinases. Proceedings of the National Academy of Sciences. 2001, 98 (19): 10763-10768. 10.1073/pnas.181141598.
- Belkhadir Y, Subramaniam R, Dangl JL: Plant disease resistance protein signaling: NBS-LRR proteins and their partners. Current Opinion in Plant Biology. 2004, 7 (4): 391-399. 10.1016/j.pbi.2004.05.009.
- Ogawa M, Shinohara H, Sakagami Y, Matsubayashi Y: Arabidopsis CLV3 Peptide Directly Binds CLV1 Ectodomain. Science. 2008, 319 (5861): 294. 10.1126/science.1150083.
- Thomas CM, Jones DA, Parniske M, Harrison K, Balint-Kurti PJ, Hatzixanthis K, Jones JDG: Characterization of the Tomato Cf-4 Gene for Resistance to Cladosporium fulvum Identifies Sequences That Determine Recognitional Specificity in Cf-4 and Cf-9. Plant Cell. 1997, 9 (12): 2209-2224. 10.1105/tpc.9.12.2209.
- Meyers BC, Shen KA, Rohani P, Gaut BS, Michelmore RW: Receptor-like Genes in the Major Resistance Locus of Lettuce Are Subject to Divergent Selection. Plant Cell. 1998, 10 (11): 1833-1846. 10.1105/tpc.10.11.1833.
- Farrar K, Asp T, Lubberstedt T, Xu ML, Thomas AM, Christiansen C, Humphreys MO, Donnison IS: Construction of two Lolium perenne BAC libraries and identification of BACs containing candidate genes for disease resistance and forage quality. Molecular Breeding. 2007, 19 (1): 15-23. 10.1007/s11032-006-9036-z.
- Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, Lassmann T, Moxon S, Marshall M, Khanna A, Durbin R, et al: Pfam: clans, web tools and services. Nucl Acids Res. 2006, 34 (suppl_1): D247-251. 10.1093/nar/gkj149.
- Nielsen H, Engelbrecht J, Brunak S, von Heijne G: Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng. 1997, 10 (1): 1-6. 10.1093/protein/10.1.1.
- Nei M, Gojobori T: Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol. 1986, 3 (5): 418-426.
- Nei M, Kumar S: Molecular Evolution and Phylogenetics. 2000, Oxford University Press
- Li WH, Wu CI, Luo CC: A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol Biol Evol. 1985, 2 (2): 150-174.
- Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) Software Version 4.0. Mol Biol Evol. 2007, 24 (8): 1596-1599. 10.1093/molbev/msm092.
- Yang Z: PAML: a program package for phylogenetic analysis by maximum likelihood. Computational Applied Bioscience. 1997, 13 (5): 555-556.
- Suyama M, Torrents D, Bork P: PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucl Acids Res. 2006, 34 (suppl_2): W609-612. 10.1093/nar/gkl315.
- Bakker EG, Toomajian C, Kreitman M, Bergelson J: A Genome-Wide Survey of R Gene Polymorphisms in Arabidopsis. Plant Cell. 2006, 18 (8): 1803-1818. 10.1105/tpc.106.042614.
- Katoh K, Kuma K-i, Toh H, Miyata T: MAFFT version 5: improvement in accuracy of multiple sequence alignment. Nucl Acids Res. 2005, 33 (2): 511-518. 10.1093/nar/gki198.
- Hall TA: BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symposium Series. 1999, 41: 95-98.
- Jobb G, von Haeseler A, Strimmer K: TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics. BMC Evolutionary Biology. 2004, 4 (1): 18. 10.1186/1471-2148-4-18.
- Akaike H: A new look at the statistical model identification. IEEE Transactions on Automatic Control. 1974, 19 (3): 716-723. 10.1109/TAC.1974.1100705.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.