- Research article
- Open Access
The distribution and evolutionary history of the PRP8 intein
BMC Evolutionary Biology volume 6, Article number: 42 (2006)
We recently described a mini-intein in the PRP8 gene of a strain of the basidiomycete Cryptococcus neoformans, an important fungal pathogen of humans. This was the second described intein in the nuclear genome of any eukaryote; the first nuclear encoded intein was found in the VMA gene of several saccharomycete yeasts. The evolution of eukaryote inteins is not well understood. In this report we describe additional PRP8 inteins (bringing the total of these to over 20). We compare and contrast the phylogenetic distribution and evolutionary history of the PRP8 intein and the saccharomycete VMA intein, in order to derive a broader understanding of eukaryote intein evolution. It has been suggested that eukaryote inteins undergo horizontal transfer and the present analysis explores this proposal.
In total, 22 PRP8 inteins have been detected in species from three different orders of euascomycetes, including Aspergillus nidulans and Aspergillus fumigatus (Eurotiales), Paracoccidioides brasiliensis, Uncinocarpus reesii and Histoplasma capsulatum (Onygenales) and Botrytis cinerea (Helotiales). These inteins are all at the same site in the PRP8 sequence as the original Cryptococcus neoformans intein. Some of the PRP8 inteins contain apparently intact homing endonuclease domains and are thus potentially mobile, while others lack the region corresponding to the homing endonuclease and are thus mini-inteins. In contrast, no mini-inteins have been reported in the VMA gene of yeast. There are several examples of pairs of closely related species where one species carries the PRP8 intein while the other lacks it. Bioinformatic and phylogenetic analyses suggest that many of the ascomycete PRP8 homing endonucleases are active. This contrasts with the VMA homing endonucleases, most of which are inactive.
PRP8 inteins are widespread in the euascomycetes (Pezizomycotina), and their homing endonucleases are apparently active. There is no evidence for horizontal transfer within the euascomycetes. This suggests that the intein is of ancient origin and has been vertically transmitted amongst the euascomycetes. It is possible that horizontal transfer has occurred between the euascomycetes and members of the basidiomycete genus Cryptococcus.
An intein is a specific insertion within a host protein that is excised during protein maturation, that is, post-translationally. This maturation process is termed "protein splicing" and involves precise excision of the internal protein (intein) sequence and ligation of the flanking external protein (extein) sequences to form a peptide bond. The excision of the intein and the subsequent ligation of the flanking host exteins are catalysed by the intein itself [2, 3]. The sequence encoding the intein appears as an in-frame insertion within the gene for the host protein. For simplicity, this intein-encoding sequence is also often referred to as an intein.
As well as catalysing their own protein splicing, many inteins also include site-specific 'homing' DNA endonucleases. These homing endonucleases belong to several distinct families (for example, the LAGLIDADG and His-Cys box groups); the LAGLIDADG type is by far the most common in inteins. Related homing endonucleases are encoded in Group I self-splicing introns [4, 5]. Homing endonucleases can cause a gene conversion event that converts a cell heterozygous for the intein into a homozygote. To initiate this gene conversion, the homing endonuclease, encoded by an intein sequence located at a specific site, recognises a long DNA target sequence corresponding to the unoccupied allelic site. The homing endonuclease generates a double-strand chromosomal break within the unoccupied target sequence. The host DNA repair machinery repairs this break using the intein-containing allele as a template. This gene conversion copies the intein gene into the same specific site in the previously unoccupied allele. The occupied allele is no longer a target for the homing endonuclease because the insertion splits the target site. Thus, inteins are 'selfish' mobile elements that occupy unique, specific sites in the genome; when they are copied into an empty allele, they are also retained by the donor allele. These characteristics make inteins especially effective tools for the phylogenetic analysis of phenomena such as horizontal transfer.
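The homing cycle just described can be illustrated with a toy string model. This is a minimal sketch under loud assumptions: alleles are plain strings, the recognition sequence is short and hypothetical, the intein is assumed to sit exactly at the midpoint of the target, and template-directed repair is modelled simply as copying the donor's insert into the cut. The helper name `home_allele` is hypothetical. It shows the two properties the text relies on: an empty allele is converted, and an occupied allele is immune because its target is split.

```python
def home_allele(donor: str, recipient: str, target: str) -> str:
    """Toy model of intein homing in a heterozygote (hypothetical helper).

    donor: allele carrying the intein, which splits the target site.
    recipient: the other allele; cleaved only if it carries the intact target.
    target: recognition sequence of the homing endonuclease.
    Returns the recipient allele after cleavage and template-directed repair.
    """
    half = len(target) // 2
    left, right = target[:half], target[half:]
    if target in donor:
        raise ValueError("donor allele still contains the intact (unoccupied) target")
    start = donor.find(left)
    end = donor.find(right, start + half) if start != -1 else -1
    if start == -1 or end == -1:
        raise ValueError("donor does not carry the split target site")
    intein = donor[start + half:end]  # sequence inserted between the two target halves

    site = recipient.find(target)
    if site == -1:
        # Occupied (or missing) site: no intact target, so no cleavage.
        return recipient
    cut = site + half  # assumed position of the double-strand break
    # Repair using the donor as template copies the intein into the cut site.
    return recipient[:cut] + intein + recipient[cut:]


target = "GGATCCAT"                      # hypothetical recognition sequence
empty = "TT" + target + "TT"             # unoccupied allele
occupied = "TTGGAT" + "xxxx" + "CCATTT"  # 'xxxx' stands in for the intein
converted = home_allele(occupied, empty, target)
print(converted == occupied)                                  # True: now homozygous
print(home_allele(occupied, converted, target) == converted)  # True: occupied allele is immune
```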
It has been hypothesised, using information derived mainly from studies of the VMA intein in Saccharomyces, that at the population level the 'super-Mendelian' replication caused by the homing endonuclease will increase the frequency of the intein within the gene pool of a sexual species until the intein may eventually come to fixation [7, 8]. The rate of spread through a population will depend upon the host mating system. At fixation there will be no remaining unoccupied alleles. Intraspecific movement of the intein will no longer occur, and selection will no longer operate on the homing endonuclease function. In the absence of selection, the homing endonuclease will become inactive through random mutation. In agreement with this prediction, many yeast VMA inteins have been shown to have non-functional homing endonucleases. It has been suggested that inteins depend on horizontal transmission to species that do not already contain an intein at the specific site to ensure their long-term survival [10, 11] and the retention of an active homing endonuclease. Such horizontal transfer between species could be initiated by the encoded homing endonuclease. It has been hypothesised that inteins undergo recurrent cycles of (i) horizontal transmission to new genomes, (ii) fixation in the recipient population by homing, (iii) degeneration of the homing endonuclease gene (HEG) due to the absence of target sequences and (iv) eventual loss by deletion of the whole intein.
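The 'super-Mendelian' spread towards fixation can be made concrete with a standard deterministic recursion. This is a minimal sketch, not the model used in the cited studies: it assumes random mating, no fitness cost, an infinite population and a homing efficiency e (the probability that a heterozygote converts its empty allele), all of which are assumptions. Under these conditions a heterozygote transmits the intein with probability (1 + e)/2, so the intein frequency p becomes p' = p² + pq(1 + e) = p + e·p·q. The function names are hypothetical.

```python
from typing import Optional


def next_frequency(p: float, e: float) -> float:
    """Intein allele frequency after one generation of homing.

    Assumes random mating, no fitness cost, an infinite population and
    homing efficiency e, giving p' = p + e*p*q.
    """
    q = 1.0 - p
    return p + e * p * q


def generations_to_near_fixation(p0: float, e: float,
                                 threshold: float = 0.99,
                                 max_generations: int = 10_000) -> Optional[int]:
    """Generations until the intein frequency reaches `threshold` (None if never)."""
    p = p0
    for generation in range(max_generations + 1):
        if p >= threshold:
            return generation
        p = next_frequency(p, e)
    return None


print(next_frequency(0.5, 1.0))  # 0.75
print(next_frequency(0.3, 0.0))  # 0.3: without homing, transmission is Mendelian
print(generations_to_near_fixation(0.01, 0.9))
```

The random-mating assumption is exactly what the mating-system caveat above refers to: inbreeding or selfing reduces the frequency of heterozygotes, the only genotype in which homing can act, and so slows the spread.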
About 15% of the known inteins lack an endonuclease domain. These minimal protein splicing elements (mini-inteins) are probably derived from full-length inteins by deletion. Mini-inteins are 130–200 amino acids in length, with conserved sequence blocks at each end, while 'full-length' inteins (with a central homing endonuclease domain) are about 360–550 amino acids in length. Both mutagenesis studies and 3-D crystal structures have indicated that the two functions inherent in a mobile intein are largely separate. Thus the endonuclease domain is a discrete region of the intein that has little or no functional or structural overlap with the protein splicing domain. Mini-inteins can be considered an extreme example of HEG corruption and loss. This loss of homing endonuclease function will prevent interspecific or intraspecific spread by homing, restricting the evolution of mini-inteins and their phylogenetic distribution.
Inteins are found in diverse prokaryotes and a few unicellular eukaryotes. Until recently, the only eukaryote nuclear gene intein described was the VMA intein found in the vacuolar ATPase gene of a number of hemiascomycete yeasts such as Saccharomyces cerevisiae. These inteins all occur at exactly the same site (VMA-a) within the host VMA1 gene. Such inteins are referred to as 'allelic inteins', even though they are from different species (there is a non-allelic intein at the VMA-b insertion site of the vacuolar ATPase of several Archaea). It was analysis of the yeast VMA inteins, all of which contain a LAGLIDADG HEG, that suggested the possibility of frequent horizontal transfer between species of ascomycete yeasts, although the mechanism for such a transfer of nuclear genes is unknown [17, 18]. The suggestion was based on a comparison of the intein phylogeny with that of the host gene (VMA1).
Recently we reported the presence and sequences of a second set of allelic nuclear-encoded inteins (PRP8 inteins) in a large number of strains of Cryptococcus neoformans and in the related species Cryptococcus gattii [19, 20]. Cryptococcus neoformans is a basidiomycete fungus capable of causing serious infections in both immunocompromised and immunocompetent people. C. neoformans is divided into two varieties, C. neoformans var. neoformans and C. neoformans var. grubii. Molecular phylogenetic work indicates that the grubii and neoformans varieties are separated by ~18.5 million years of evolution, and that these varieties diverged from C. gattii ~37 million years ago. Intein-encoding sequences are present in the PRP8 genes of both C. neoformans varieties and also in C. gattii. These inteins all lack homing endonuclease domains and are thus mini-inteins. The sequence differences among the mini-inteins reflect the relationships among the host species; for example, the var. neoformans and var. grubii mini-inteins are more similar to each other than either is to the inteins of C. gattii.
The Cryptococcus mini-inteins are encoded within the nuclear gene for PRP8, a highly conserved protein, central to the formation of the spliceosome, that coordinates multiple processes in spliceosome activation [23, 24]. The protein-splicing component of the PRP8 inteins must remain functional, even after HEG corruption and loss, so that the intein can remove itself from the PRP8 precursor. If the intein were not removed, the spliceosome would probably be non-functional and the fungus would be unable to remove introns from messenger RNA precursors. The presence of the mini-intein in both varieties of C. neoformans and in C. gattii is consistent with vertical inheritance from their common ancestor, but the original source of the intein in Cryptococcus remains unclear. The closest relative of C. neoformans and C. gattii, Cryptococcus amylolentus, does not contain a PRP8 intein, nor does a slightly more distant relative, C. heveanensis. The only other member of the Tremellales clade known to contain a PRP8 intein is Cryptococcus laurentii, a species relatively distantly related to C. neoformans. The C. laurentii intein is a full-length, HEG-containing intein.
We have investigated the distribution of the PRP8 intein in order to clarify its evolutionary history and, by inference, to better understand the behaviour of inteins in eukaryotes. Previously, PRP8 inteins had been described from four species of Cryptococcus and from three filamentous ascomycetes, Aspergillus nidulans, Aspergillus fumigatus and Histoplasma capsulatum. To understand intein distribution, we screened the genomes of various fungi for PRP8 inteins. In this work we describe the PRP8 inteins of Paracoccidioides brasiliensis and Uncinocarpus reesii of the Onygenales, Botrytis cinerea of the Helotiales and inteins from diverse members of the sections Fumigati and Clavati of the Eurotiales. Many of these are mini-inteins. This is in contrast to the VMA inteins of yeast, all of which are full-length, although many of their encoded endonucleases are no longer active. In three orders of euascomycete fungi the PRP8 genes have been found to encode inteins with homing endonucleases. Within two of these orders we found closely related species that contain mini-inteins. All three orders contain species that lack PRP8 inteins, often species closely related to intein-carrying species. Thus the phylogenetic distribution of these inteins poses some interesting questions, and comparing the fungal PRP8 and VMA intein datasets helps to clarify some of them.
PRP8 inteins in ascomycete sequence databases
To examine the distribution of PRP8 inteins in fungi other than species of Cryptococcus, we searched the databases of many genome-sequencing projects with the predicted protein sequence of the C. neoformans mini-intein CnePRP8. Other searches used the PRP8 protein sequences from several fungi as queries. Our searches of the publicly available eukaryote sequence databases revealed putative full-length PRP8 inteins in the following ascomycete species: Aspergillus nidulans, Aspergillus fumigatus, Paracoccidioides brasiliensis, three strains of Histoplasma capsulatum and Botrytis cinerea. By this means we also detected a mini-intein in Uncinocarpus reesii, a close relative of Coccidioides immitis and Coccidioides posadasii. All of these inteins occur at the same site within the PRP8 gene as the inteins in Cryptococcus. The insertions all form part of the open reading frame of the 'host' PRP8 gene and encode residues at each end that show similarity to the splicing domains of other inteins, especially to the Cryptococcus inteins (Figure 1).
The intein-encoding sequence in Paracoccidioides brasiliensis was represented only by a single 594bp EST [GenBank: CN242988]. The first 243bp are predicted to encode the most C-terminal region (81 residues) of an intein similar to the other, complete, ascomycete PRP8 inteins.
Histoplasma intein-encoding sequences can be found in GenBank accession AAJI01001309, describing data from a NAmI strain, WU24 (base pairs 3014–4618), and on contigs describing two strains that are available at the Histoplasma capsulatum genome project website. The sequence encoding HcaPRP8-186AR is found on contig0.41 at positions 26003–27604; HcaPRP8-217B is encoded on Histo_FE.contig19 at positions 885924–887525. There are 52 variable positions and a single codon indel among the three Histoplasma inteins. The Aspergillus nidulans intein (AniPRP8) is described in a third party annotation [GenBank: BK001316] referring to the primary data in another accession [GenBank: AACD0100078]. The PRP8 intein of A. fumigatus Af293 is encoded by the complementary strand of GenBank: AAHF01000008 (position 782611 to 780155). The B. cinerea intein (BciPRP8) is encoded on the complementary strand of supercontig_1.1533, base pairs 22933–20420 [GenBank: AAID01001533]. The mini-intein in Uncinocarpus reesii is encoded between positions 33382 and 33921 of GenBank: AAIW01000130; see also Table 1.
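The coordinates above can be checked with simple arithmetic. This sketch assumes that each quoted span is inclusive and covers exactly the intein-coding sequence; a reversed pair marks the complementary strand. The dictionary keys are labels chosen for illustration.

```python
# Inclusive coordinates quoted in the text (start, end); a reversed pair
# indicates that the sequence lies on the complementary strand.
INTEIN_CODING_COORDS = {
    "HcaPRP8 WU24 (AAJI01001309)": (3014, 4618),
    "HcaPRP8-186AR (contig0.41)": (26003, 27604),
    "HcaPRP8-217B (Histo_FE.contig19)": (885924, 887525),
    "AfuPRP8 Af293 (AAHF01000008)": (782611, 780155),
    "BciPRP8 (AAID01001533)": (22933, 20420),
    "UrePRP8 (AAIW01000130)": (33382, 33921),
}


def span_length(start: int, end: int) -> int:
    """Length in bp of an inclusive interval, on either strand."""
    return abs(end - start) + 1


for name, (start, end) in INTEIN_CODING_COORDS.items():
    bp = span_length(start, end)
    codons, remainder = divmod(bp, 3)
    strand = "-" if end < start else "+"
    print(f"{name}: {bp} bp ({strand} strand), {codons} codons, remainder {remainder}")
```

Every span is a whole number of codons. The A. fumigatus span is 2457 bp (819 codons), matching the 2457bp and 819 residues quoted for AfuPRP8 in the next section; the three Histoplasma spans (1605, 1602 and 1602 bp) differ by exactly one codon, consistent with the single codon indel noted above; and the 540-bp (180-codon) U. reesii span falls within the 130–200-residue range typical of mini-inteins.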
Detection and sequencing of further PRP8 inteins in ascomycetes
We used data from the Aspergillus sequence databases to design primers to amplify and sequence PRP8 inteins from other strains of A. fumigatus (including FRR0163) and from A. fumigatus var. ellipticus NRRL5109 [GenBank: DQ285414]. A. fumigatus var. ellipticus NRRL5109 is a member of a small group of strains that form a cryptic species clade (A. fumigatus "occultens"). In addition, we amplified and sequenced a full-length intein from Neosartorya fischeri (FRR0181), a species closely related to A. fumigatus. We also obtained the intein sequence of A. nidulans R20 [GenBank: AY946006]. It is identical to that of the whole-genome sequence strain, FGSC_A4; both strains originate from Glasgow University. It should be noted that the genus Aspergillus is large and diverse; for example, A. fumigatus and N. fischeri are much more closely related to each other than either is to Aspergillus (Emericella) nidulans [28, 29]. N. fischeri (FRR0181) is also known as NRRL181 and is now the subject of a genome sequencing project at The Institute for Genomic Research (TIGR). The intein-encoding sequence from A. fumigatus FRR0163 [GenBank: AY832923] is identical to that of the fully sequenced strain of A. fumigatus (Af293), except for one silent third-codon-position change in 2457bp (not shown). The intein in A. fumigatus var. ellipticus NRRL5109 differs from that in A. fumigatus (FRR0163) at 12 positions, resulting in four amino acid substitutions in 819 residues. The PRP8 intein of N. fischeri [GenBank: AY832922] is 94% identical at the DNA level to the A. fumigatus intein. There are three indels of various sizes (609bp, 42bp and 285bp in AfuPRP8) that do not include any of the conserved sequence motifs of the splicing or homing endonuclease domains.
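Percent-identity figures such as the 94% quoted for the N. fischeri and A. fumigatus inteins depend on how alignment gaps are handled, which matters here because of the large indels. The text does not state which convention was used. The sketch below, with a hypothetical helper name, computes identity over pre-aligned sequences either skipping gapped columns or counting them as mismatches; gap-only columns are always skipped.

```python
def percent_identity(a: str, b: str, count_gaps: bool = False) -> float:
    """Percent identity of two pre-aligned sequences (hypothetical helper).

    Columns where either sequence has a gap ('-') are skipped unless
    count_gaps is True, in which case they count as mismatches.
    Columns that are gaps in both sequences are always skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        if "-" in (x, y) and not count_gaps:
            continue
        compared += 1
        identical += x == y
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * identical / compared


print(round(percent_identity("ATGCA--T", "ATGGATTT"), 1))   # 83.3: gaps skipped
print(percent_identity("ATGCA--T", "ATGGATTT", count_gaps=True))  # 62.5: gaps penalised
```

The same arithmetic underlies the Paracoccidioides comparison below: 65 identical residues out of 72 is 90.3%.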
We attempted to amplify PRP8 intein sequences from other species of Aspergillus and Neosartorya in the section Fumigati (that is, species closely related to A. fumigatus) and from more distantly related species, using primers complementary to internal intein sequences. These attempts were unsuccessful. In contrast, when we used primers complementary to the PRP8 regions flanking the intein insertion site, we detected PCR products, but these were much shorter than would be expected if a full intein were present. These PCR products were sequenced directly, yielding sequences representing mini-inteins in six species of Neosartorya (N. spinosa, N. glabra, N. fenelliae, N. quadricincta, N. aurata, N. pseudofischeri FRR0186) and in three species of Aspergillus (A. brevipes, A. giganteus NRRL 6136 and A. viridinutans) (Figure 2) (Table 1). Both N. spinosa and N. glabra are sometimes described as varieties of N. fischeri, but are more distinct from A. fumigatus than is N. fischeri (Girardin, Monod and Latge 1995). The taxonomy of both N. glabra and N. spinosa isolates is undergoing revision. Of these nine species with PRP8 mini-inteins, all except A. giganteus NRRL 6136 are members of the section Fumigati; A. giganteus is a member of the section Clavati. N. pseudofischeri FRR0186 is the designation we have given to a strain that was originally described as N. fischeri. Our analysis of other sequences derived from this strain (ITS and cytB) suggests that it is a member of the species N. pseudofischeri. The ITS sequence of FRR0186 is very similar (99% identity) to that of N. pseudofischeri [GenBank: AF459729]; the next best match (97% identity) is to N. fischeri [GenBank: AF176661]. N. pseudofischeri has occasionally been isolated from opportunistic infections; some isolates were initially identified as A. fumigatus.
When using primers complementary to the PRP8 regions flanking the intein insertion site, we detected short PCR products from Aspergillus unilateralis (FRR0577), A. clavatus (NRRL5811) and four strains of A. lentulus. These PCR products yielded sequence showing 'empty' alleles; that is, there were no PRP8 inteins present in these strains. A. lentulus is a newly recognised species, originally isolated from patients at the Fred Hutchinson Cancer Research Center.
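Because flanking primers amplify across the insertion site whatever it contains, the product length minus the length of an empty-allele product estimates the size of any insert. A minimal first-pass sketch follows; the helper name, the 400-bp empty-allele amplicon and the sizing tolerance are hypothetical, and the size windows are simply the typical intein lengths from the Background multiplied by three. In the study itself the products were sequenced, which is the definitive test.

```python
# Insert-size windows derived from the typical lengths given in the Background:
# mini-inteins 130-200 aa, full-length inteins from about 360 aa upwards.
MINI_INTEIN_BP = (130 * 3, 200 * 3)
FULL_INTEIN_MIN_BP = 360 * 3


def classify_flanking_amplicon(product_bp: int, empty_bp: int, tolerance: int = 15) -> str:
    """First-pass guess at allele type from a flanking-primer PCR product (hypothetical helper).

    product_bp: observed product size.
    empty_bp: expected product size for an intein-less ('empty') allele.
    tolerance: allowance for sizing error (an assumption, not from the text).
    """
    insert_bp = product_bp - empty_bp
    if abs(insert_bp) <= tolerance:
        return "empty allele"
    if MINI_INTEIN_BP[0] - tolerance <= insert_bp <= MINI_INTEIN_BP[1] + tolerance:
        return "mini-intein"
    if insert_bp >= FULL_INTEIN_MIN_BP - tolerance:
        return "full-length intein"
    return "unclassified"


EMPTY_BP = 400  # hypothetical size of the empty-allele amplicon
print(classify_flanking_amplicon(400, EMPTY_BP))         # empty allele
print(classify_flanking_amplicon(400 + 540, EMPTY_BP))   # mini-intein (U. reesii-sized insert)
print(classify_flanking_amplicon(400 + 2457, EMPTY_BP))  # full-length intein (AfuPRP8-sized insert)
```

No upper bound is placed on full-length inserts because AfuPRP8 (819 residues) already exceeds the typical 550-residue maximum.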
Using DNA from strain Pb18 of Paracoccidioides brasiliensis (a kind gift from Professor Gustavo Goldman) and a combination of primers, some designed to complement regions of the EST [GenBank: CN242988] that carries part of a PRP8 intein and other, less specific primers designed to complement regions of the PRP8 gene, we amplified the whole of the intein-containing PRP8 region from P. brasiliensis. The predicted protein sequence reveals an intein containing a homing endonuclease domain; the intein sequence and flanking regions are described in GenBank accession DQ285419. The protein sequence of PbrPRP8 from strain Pb18 has 53% identity with that of the Histoplasma capsulatum intein HcaPRP8_217B and 45% identity with the Aspergillus nidulans intein AniPRP8. In the 72-residue PRP8 intein region shared by strains Pb18 and Pb01 (GenBank: CN242988), 65 residues (90%) are identical.
We attempted to amplify a PRP8 intein-encoding sequence from a strain of Botrytis cinerea isolated in New Zealand. The strain was confirmed as belonging to B. cinerea by amplifying and sequencing the ITS1 and ITS2 regions of the ribosomal gene array (data not shown). Amplification of the region surrounding the PRP8 intein insertion site, however, showed that this strain does not contain an intein, in contrast to strain B05.10 from the genome sequencing project.
Protein domains encoded by PRP8 intein sequences
The splicing domains and endonuclease domain of PRP8 inteins were predicted through comparison with those of the Saccharomyces cerevisiae VMA intein (SceVMA). We identified the conserved sequence blocks using sequences of known inteins held at InBase. There are homing endonuclease domains in the PRP8 inteins of A. fumigatus, A. nidulans, N. fischeri, H. capsulatum, P. brasiliensis and the genome sequence strain of B. cinerea, but none in PRP8 inteins of other Aspergillus or Neosartorya species (including N. pseudofischeri FRR0186). The AfuPRP8 inteins (from FRR0163 and from the genome sequencing strain, Af293, as well as from A. fumigatus var. ellipticus, NRRL5109) include a large insertion of 203 amino acid residues not present in the closely related Neosartorya fischeri FRR0181. This sequence is located immediately after the conserved sequence block B of the splicing domain (Figure 3). It does not show significant sequence or structural similarity to other protein sequences in the databases. The intein in B. cinerea (Order: Helotiales) also contains a substantial insertion in this region (Figure 3). The sequence of the insertion in Botrytis shows 58% similarity to the insertion in the A. fumigatus inteins, although it is 33 residues longer. Other, different, indels occur in this region of the other ascomycete inteins and mini-inteins (Figure 3). It may be that this region is especially tolerant of substantial indels. After this variable region, but before the first motif of the homing endonuclease (block C), there is a region (~40 residues) that is relatively well conserved across all of the PRP8 inteins (including the mini-inteins) and which must presumably be part of the splicing domain. This region does not, however, show similarity to the N4 splicing domain motif recognised by Pietrokovski.
The splicing domains of the PRP8 inteins (Figure 1, InBase motifs A, B, F, G), whether from full-length or mini-inteins, are very similar to each other. The splicing domains from the ascomycete PRP8 inteins share more residues with those from the basidiomycete (Cryptococcus) PRP8 inteins than they do with the splicing domains of the saccharomycete (ascomycete) VMA inteins (Figure 1).
The internal regions of some of the PRP8 inteins evidently form part of an endonuclease due to their similarity to other homing endonuclease domains such as those in the VMA inteins. The homing endonuclease domains of the PRP8 inteins (Figure 1C, InBase motifs C, D, E, H) are closely similar to each other, although the spacing between the first two highly conserved motifs (C and D) is somewhat variable; for example, the distance between motifs C and D in the PRP8 full-length inteins ranges from 212 to 300 residues. Length variation in this region is also seen in the yeast VMA inteins; while in VMA inteins with active homing endonucleases this distance is ~80 residues, in inactive VMA homing endonucleases there may be as many as ~160 residues.
Many of the ascomycete PRP8 inteins found so far are mini-inteins, apparently without the homing endonuclease domain. The whole of the homing endonuclease domain is absent from all the ascomycete mini-inteins (as in the Cryptococcus mini-inteins). All of the mini-inteins are of approximately the same length and have no regions with vestiges of homing endonuclease motifs. The alignment of the mini-inteins with the homing endonuclease-containing inteins (Figure 3) indicates that the homing endonuclease deletions occurred at almost exactly the same site in all the ascomycete mini-inteins. The Cryptococcus mini-inteins have an internal deletion of similar length that also covers all the conserved homing endonuclease domains (Figure 3). The presence of numerous mini-inteins in the PRP8 allelic series contrasts with the VMA allelic series of inteins in saccharomycetes, all of which are full-length.
Nucleotide changes and dS/dN values of PRP8 intein encoding sequences
The endonucleases of the yeast VMA intein series are frequently non-functional. It is of interest to determine whether the newly described PRP8 HEG sequences encode active or inactive homing endonucleases. One way of approaching this question is to compare pairs of sequences and determine the frequencies of synonymous and non-synonymous changes in the HEGs of the inteins. A large value for dS/dN implies that the encoded peptide is selectively constrained (functional). It should be borne in mind that, since PRP8 is an essential gene, even if the homing endonuclease becomes non-functional the intein must remain an ORF and must encode functional splicing domains. The HEG domain is therefore still subject to weak selection pressure even if it is inactive (for example, stop codons and frame-shifts will be selected against).
We analysed the distribution of point mutations in the codons over a concatenated 213bp region of the HEG domains; this region encodes the nine residues of motif C and the whole region covering motifs D, E and H of the homing endonuclease (that is, the residues illustrated in Figure 1C). Results from analysis of the nine full-length PRP8 inteins are summarised in Table 2. The ratio of the rates of synonymous to non-synonymous substitution (dS/dN) was calculated via the syn-SCAN website [38, 39]. This ratio provides a sensitive measure of selective pressure on the protein; a dS/dN value greater than 1.0 indicates that selection is operating to minimise the number of amino acid changes and thus retain the activity of the protein.
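To make the counting behind pS and pN concrete, here is a minimal sketch of Nei-Gojobori-style counting (the method SNAP is described as using later in this section; it is not necessarily syn-SCAN's exact implementation). Assumptions: the standard genetic code, codon-aligned input with no internal stop codons, changes to stop codons counted as nonsynonymous sites, and mutational pathways through intermediate stop codons discarded. The function names are hypothetical.

```python
from itertools import permutations
from typing import Dict, Tuple

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE: Dict[str, str] = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_sites(codon: str) -> Tuple[float, float]:
    """Synonymous and nonsynonymous sites of one codon.

    Each of the 3 x 3 possible point changes contributes 1/3 of a site;
    changes to a stop codon count as nonsynonymous (an assumption).
    """
    syn = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    syn += 1.0 / 3.0
    return syn, 3.0 - syn


def codon_differences(c1: str, c2: str) -> Tuple[float, float]:
    """Synonymous and nonsynonymous differences, averaged over mutational pathways."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    syn_total = non_total = 0.0
    n_paths = 0
    for order in permutations(positions):
        current, syn, non, valid = c1, 0, 0, True
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[step] == "*" and step != c2:
                valid = False  # drop pathways through an intermediate stop codon
                break
            if CODE[step] == CODE[current]:
                syn += 1
            else:
                non += 1
            current = step
        if valid:
            syn_total += syn
            non_total += non
            n_paths += 1
    if n_paths == 0:
        return 0.0, float(len(positions))  # fallback: count every change as nonsynonymous
    return syn_total / n_paths, non_total / n_paths


def p_distances(seq1: str, seq2: str) -> Tuple[float, float]:
    """(pS, pN) for two codon-aligned coding sequences; gapped or ambiguous codons are skipped."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be codon-aligned and of equal length")
    s_sites = n_sites = s_diff = n_diff = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE:
            continue
        s1, n1 = codon_sites(c1)
        s2, n2 = codon_sites(c2)
        s_sites += (s1 + s2) / 2
        n_sites += (n1 + n2) / 2
        sd, nd = codon_differences(c1, c2)
        s_diff += sd
        n_diff += nd
    if s_sites == 0 or n_sites == 0:
        raise ValueError("no synonymous or no nonsynonymous sites to compare")
    return s_diff / s_sites, n_diff / n_sites


print(p_distances("CTT", "CTC"))  # (1.0, 0.0): a single synonymous (Leu to Leu) change
```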
Comparison of the HEG regions of closely related fungi gave high dS/dN ratios. For example, comparison of the HEG region of the A. fumigatus intein with that of its closest relative, N. fischeri, yields a dS/dN value of 20.84, indicating that the rate of synonymous (neutral) change far outweighs the rate of non-synonymous change. It is therefore likely that selection is acting to reduce change in these sequences by eliminating mutant alleles carrying non-synonymous substitutions. This indicates that the HEG region of these inteins encodes active endonucleases (or did so until recently). Comparisons between Histoplasma and Paracoccidioides gave a dS/dN of 14.27. Even some less closely related pairs gave high dS/dN values; for example, the Aspergillus nidulans/Aspergillus fumigatus dS/dN is 21.44 and the Botrytis cinerea/Neosartorya fischeri dS/dN is 24.21. The dS/dN value could not be calculated for many pairs because the frequency of synonymous substitutions was greater than 0.74, which makes the Jukes-Cantor correction unreliable (that is, the system is saturated by synonymous substitutions in highly diverged sequences). Several comparisons, especially those between the more distantly related species (for example, all comparisons involving Cryptococcus laurentii), encountered this limitation. These dS/dN values are entered as NA in Table 2. However, the underlying substitution frequencies show clearly that even these comparisons reflect strong selective pressure. For example, in Table 2, the comparison of Histoplasma (Hca217B) with Nfi0181 allows a dS/dN value to be calculated: it gives a pS of 0.73, a pN of 0.2 and a dS/dN of 12.41.
In contrast, comparison of the other Histoplasma intein (Hca186AR) with Neosartorya fischeri (Nfi0181) gives a pS of 0.75 (which only just exceeds the Jukes-Cantor limit, 0.74) and a pN of 0.2, so the dS/dN value cannot be calculated. Many of the comparisons listed as NA in Table 2 are only just beyond this limit and are very similar to comparisons that give clear evidence of selective constraint (high dS/dN values).
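The saturation limit follows directly from the Jukes-Cantor correction, d = -3/4 ln(1 - 4p/3), whose logarithm is undefined once the uncorrected proportion p reaches 3/4. A minimal sketch, with hypothetical function names, that returns None where Table 2 reports NA:

```python
import math
from typing import Optional


def jukes_cantor(p: float) -> Optional[float]:
    """Jukes-Cantor corrected distance d = -3/4 ln(1 - 4p/3); None if saturated (p >= 0.75)."""
    x = 1.0 - 4.0 * p / 3.0
    if x <= 0.0:
        return None
    return -0.75 * math.log(x)


def ds_dn(p_s: float, p_n: float) -> Optional[float]:
    """dS/dN from uncorrected proportions; None plays the role of 'NA' in Table 2."""
    d_s, d_n = jukes_cantor(p_s), jukes_cantor(p_n)
    if d_s is None or d_n is None or d_n == 0.0:
        return None
    return d_s / d_n


print(round(ds_dn(0.73, 0.20), 2))  # about 11.69
print(ds_dn(0.75, 0.20))            # None: pS at the Jukes-Cantor ceiling
```

With the rounded values quoted for the Hca217B/Nfi0181 comparison (pS 0.73, pN 0.2) the sketch gives about 11.7 rather than 12.41; a pN of about 0.19 reproduces 12.41, so the pN given in the text is presumably rounded. Because d rises steeply as p approaches 0.75, small differences in pS near the ceiling have a large effect, which is why comparisons just beyond the limit are not informative about their true ratio.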
In conclusion, dS/dN analysis of the PRP8 HEGs suggests that they are constrained by selection. In addition, the high frequency of synonymous substitutions indicates that these inteins have been diverging for a considerable period of time.
Nucleotide changes and dS/dN values of VMA intein encoding sequences
It is of interest to compare the nucleotide substitution pattern in the PRP8 HEGs (described above) with a similar analysis of the yeast VMA HEGs. It has been shown that some of these VMA homing endonucleases are inactive. The longer a homing endonuclease has been inactive, the more random the pattern of substitutions will be (dS/dN will approach 1.0). We analysed the substitution patterns between VMA HEGs from 17 strains of 16 yeast species, using the same HEG region that was used to analyse the PRP8 homing endonucleases (Table 3). Of these 17, four have been shown to have active homing endonucleases (S. cerevisiae, S. cerevisiae DH1-1A, S. cariocanus, Zygosaccharomyces bailii), two have not had their homing endonuclease activity determined experimentally (Candida glabrata and Debaryomyces hansenii) and the remaining 11 homing endonucleases are inactive.
Comparison of substitution patterns of active VMA homing endonucleases gave dS/dN values ranging from 4.09 (Z. bailii/S. cerevisiae DH1-1A) to 13.99 (S. cerevisiae DH1-1A/S. cariocanus). Comparison of substitution patterns of inactive VMA homing endonucleases gave dS/dN values from 0.67 (S. exiguus/Candida glabrata) to 6.87 (Torulaspora pretoriensis/T. globosa). The VMA homing endonuclease from these two species of Torulaspora is closely similar to that of Z. bailii, a homing endonuclease known to be functional. It is likely that the Torulaspora homing endonucleases have only recently become inactive (hence their residually high dS/dN value). Of the 74 comparisons between inactive homing endonucleases, 70 give dS/dN values below 4. Of the 52 comparisons of mixed pairs (active homing endonucleases compared with inactive ones), there were only 11 instances where the dS/dN value was >4.0. Nine of these involved endonucleases from species of Torulaspora that are probably only recently inactive.
In summary, the correlation between dS/dN values and homing endonuclease activity in the VMA cohort supports the use of dS/dN analyses to predict the activity of the PRP8 intein homing endonucleases. In other words, comparisons of active homing endonucleases gave high dS/dN values, whereas comparisons of inactive homing endonucleases gave low dS/dN values. The high dS/dN values of the PRP8 homing endonucleases are strikingly greater than the values for the active VMA homing endonucleases. Also, the frequency of synonymous substitutions was less than 0.74 in all but one VMA comparison; that is, the VMA system is not saturated by synonymous substitutions. Taken at face value, this implies that the saccharomycete VMA intein allelic group is of more recent origin than the PRP8 allelic group.
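The tallies above (for example, 70 of 74 inactive/inactive comparisons below 4) amount to grouping each pairwise dS/dN value by the activity status of the two endonucleases and counting how many exceed the threshold of 4. A minimal sketch with a hypothetical helper name; the example input uses only the range endpoints quoted above, since the full Table 3 values are not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def tally_above(comparisons: Iterable[Tuple[str, str, float]],
                threshold: float = 4.0) -> Dict[str, Tuple[int, int]]:
    """Count pairwise dS/dN values above `threshold`, grouped by activity class.

    Each comparison is (status_a, status_b, dS/dN) with status 'active' or
    'inactive'. Returns {pair_class: (n_above_threshold, n_total)}.
    """
    tally: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for status_a, status_b, value in comparisons:
        pair_class = "/".join(sorted((status_a, status_b)))
        tally[pair_class][1] += 1
        if value > threshold:
            tally[pair_class][0] += 1
    return {key: (counts[0], counts[1]) for key, counts in tally.items()}


endpoints = [
    ("active", "active", 4.09), ("active", "active", 13.99),
    ("inactive", "inactive", 0.67), ("inactive", "inactive", 6.87),
]
print(tally_above(endpoints))
# {'active/active': (2, 2), 'inactive/inactive': (1, 2)}
```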
Comparison of the nucleotide changes of the VMA and PRP8 intein encoding sequences
The above dS/dN analyses showed numerous comparisons where the Jukes-Cantor correction could not be calculated because of the high frequency of synonymous substitutions. These comparisons are described as NA in Table 2. To examine homing endonuclease evolution in a way that includes all the comparisons, we submitted our data to the SNAP (Synonymous/Non-synonymous Analysis Program) site. SNAP calculates rates of nucleotide substitution from a set of codon-aligned nucleotide sequences, based on the method of Nei and Gojobori. The XY-Plot function at this site illustrates the cumulative behaviour of synonymous and non-synonymous substitutions across the coding region, one codon at a time; that is, the average behaviour at each codon is estimated from all the pairwise comparisons. In the PRP8 HEG region, synonymous and non-synonymous substitutions accumulate at a similar rate (despite there being many more non-synonymous options per codon). In the VMA HEG, non-synonymous substitutions accumulate at almost twice the rate of those in the PRP8 HEG region (see Additional file 1: SNAP XY plots of the homing endonuclease encoding regions of the PRP8 and VMA inteins). This analysis, which is based on the complete data set, supports the individual dS/dN comparisons.
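The cumulative XY plot can be sketched as follows, assuming per-codon synonymous and nonsynonymous difference counts have already been obtained for every pairwise comparison (for example, from a Nei-Gojobori codon counter). This is an illustration of the idea only, not a reproduction of SNAP's exact computation; the function name and toy input are hypothetical. A nonsynonymous curve that climbs as fast as, or faster than, the synonymous curve signals weak constraint.

```python
from typing import List, Sequence, Tuple


def cumulative_xy(per_pair: Sequence[Sequence[Tuple[float, float]]]) -> List[Tuple[float, float]]:
    """Cumulative mean synonymous and nonsynonymous differences along the codons.

    per_pair[k][i] is the (synonymous, nonsynonymous) difference count at codon i
    for pairwise comparison k. Returns one (cum_syn, cum_nonsyn) point per codon,
    where each codon's contribution is averaged over all comparisons.
    """
    if not per_pair:
        return []
    n_codons = len(per_pair[0])
    if any(len(pair) != n_codons for pair in per_pair):
        raise ValueError("all comparisons must cover the same codons")
    points: List[Tuple[float, float]] = []
    cum_s = cum_n = 0.0
    for i in range(n_codons):
        cum_s += sum(pair[i][0] for pair in per_pair) / len(per_pair)
        cum_n += sum(pair[i][1] for pair in per_pair) / len(per_pair)
        points.append((cum_s, cum_n))
    return points


toy = [
    [(1, 0), (0, 1), (0, 0)],  # comparison 1, three codons
    [(1, 0), (1, 0), (0, 1)],  # comparison 2
]
print(cumulative_xy(toy))  # [(1.0, 0.0), (1.5, 0.5), (1.5, 1.0)]
```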
PRP8 and VMA intein substitutions at the Asp active sites
To explore the activity of the PRP8 homing endonucleases further, we aligned the active-site residues [9, 36] of the homing endonucleases from PRP8 and yeast VMA inteins. As an example, the two aspartic acid residues (Asp-218 and Asp-326 in the S. cerevisiae VMA intein; marked by asterisks in Figure 1C) are involved in the active-site co-ordination of a divalent metal ion and are critical for activity of the homing endonuclease. These two aspartic acid residues are conserved in the active VMA homing endonucleases (S. cerevisiae, S. cariocanus and Z. bailii). The only inactive homing endonucleases to retain both these sites are from the two Torulaspora species; these homing endonucleases are closely similar to the active Z. bailii homing endonuclease. We believe, therefore, that the Torulaspora homing endonucleases have only recently become inactive. Of the two VMA homing endonucleases whose functionality is unknown, the one from C. glabrata lacks both these critical aspartic acid residues and the D. hansenii homing endonuclease lacks the proximal aspartic acid. The homing endonuclease domains of all of the ascomycete PRP8 inteins, except PbrPRP8 and ClaPRP8, have both these critical aspartates, supporting the belief that these homing endonucleases are active.
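Checking whether each homing endonuclease retains the catalytic aspartates amounts to mapping reference residue numbers onto alignment columns and reading the residue in every sequence. A minimal sketch with hypothetical helper names; in the real analysis the reference would be the SceVMA intein, the residue numbers 218 and 326, and the alignment that of Figure 1C, whereas the toy alignment below is invented for illustration.

```python
from typing import Dict, Optional, Sequence


def column_for_residue(aligned_ref: str, residue_number: int) -> Optional[int]:
    """0-based alignment column of residue `residue_number` (1-based) in the gapped reference."""
    count = 0
    for column, char in enumerate(aligned_ref):
        if char != "-":
            count += 1
            if count == residue_number:
                return column
    return None


def retains_residues(alignment: Dict[str, str], reference: str,
                     residue_numbers: Sequence[int], expected: str = "D") -> Dict[str, bool]:
    """Report whether each aligned sequence carries `expected` at every listed reference position."""
    columns = [column_for_residue(alignment[reference], n) for n in residue_numbers]
    if any(c is None for c in columns):
        raise ValueError("a residue number lies beyond the end of the reference")
    return {name: all(seq[c] == expected for c in columns)
            for name, seq in alignment.items()}


# Toy alignment: reference residues 3 and 7 stand in for Asp-218 and Asp-326.
toy = {
    "ref":  "MKD-LLGD",
    "seqA": "MKDALLGD",
    "seqB": "MKNALLGD",
}
print(retains_residues(toy, "ref", (3, 7)))
# {'ref': True, 'seqA': True, 'seqB': False}
```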
Euascomycete fungi lacking a PRP8 intein
During our systematic search for further PRP8 inteins in the public sequence databases (last search 26 September 2005), we encountered several instances where there was no PRP8 intein in close relatives of species known to contain an intein (see Table 4 for the complete list). For example, Coccidioides posadasii and Coccidioides immitis (Order: Onygenales) are very closely related to Uncinocarpus reesii (Bowman et al., 1996), but neither species of Coccidioides contains an intein, whereas U. reesii does (Table 1, Table 4). In the family Sclerotiniaceae, B. cinerea and Sclerotinia sclerotiorum are very closely related, but there is no PRP8 intein in S. sclerotiorum (Table 4). Several species of Aspergillus are also known to lack a PRP8 intein (Table 4). Within the section Clavati of the genus Aspergillus there are two major clusters of species [29, 31]. One cluster includes A. clavatus (which has no PRP8 intein) and the other includes A. giganteus, which has a PRP8 mini-intein quite distinct from those present in species from the section Fumigati. Within the Fumigati, only one of the species analysed (A. unilateralis) was without an intein.
PRP8 inteins may eventually be found in other isolates of some of these apparently 'intein-less' species. The data in Table 4 are derived from single isolates of each species, and some species may prove to be polymorphic for the PRP8 intein.
Distribution of PRP8 inteins
In addition to the sporadic distribution of PRP8 inteins within the euascomycetes noted above, the wider distribution of the PRP8 intein is also decidedly sporadic (Figure 4; Table 4). The intein is present in species from three orders of euascomycetes (Pezizomycotina) and in a very narrow range of basidiomycete species. This would suggest a widespread occurrence and an ancient origin for the intein within the euascomycetes; however, most euascomycetes for which PRP8 sequence data are available do not have an intein in PRP8. No intein is encoded in the PRP8 gene of any species from the other major ascomycete class, the hemiascomycetes (Saccharomycotina), for example Candida albicans, Candida guilliermondii, Eremothecium gossypii, Kluyveromyces waltii and Saccharomyces cerevisiae. Neither of the two archiascomycetes (Schizosaccharomyces pombe, Pneumocystis carinii) for which PRP8 sequence data are available encodes a PRP8 intein. In summary, if the PRP8 intein is descended from a precursor present in the common ancestor of the Ascomycota, loss of the intein must have occurred frequently, either by deletion or by fixation of an empty allele.
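The argument about frequent loss can be made concrete with parsimony counting on a presence/absence character. The sketch below contrasts Fitch parsimony, which treats gains and losses alike, with a loss-only count that assumes the intein was present at the root and never regained (a Dollo-like assumption). The eight taxa, their states and the topology are invented for illustration and are not the tree in Figure 4.

```python
from typing import Dict, List, Set, Tuple, Union

# A rooted binary tree: either a leaf name or a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch parsimony: (possible states at this node, minimum changes below it)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def leaves(tree: Tree) -> List[str]:
    """All leaf names in a subtree."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def losses_if_ancestral(tree: Tree, states: Dict[str, str]) -> int:
    """Minimum losses if the element was present at the root and never regained:
    one loss per maximal clade in which every leaf lacks it."""
    if all(states[leaf] == "-" for leaf in leaves(tree)):
        return 1
    if isinstance(tree, str):
        return 0
    return losses_if_ancestral(tree[0], states) + losses_if_ancestral(tree[1], states)


# Hypothetical taxa and topology ("+" = intein present, "-" = empty site).
ingroup = ((("a", "b"), ("c", "d")), ("e", "f"))
toy_tree = (ingroup, ("g", "h"))
toy_states = {"a": "+", "b": "-", "c": "+", "d": "-",
              "e": "+", "f": "-", "g": "-", "h": "-"}

print("Fitch changes:", fitch(toy_tree, toy_states)[1])
print("Losses if ancestral:", losses_if_ancestral(toy_tree, toy_states))
```

In this toy tree, unconstrained parsimony needs three changes, whereas requiring an ancestral intein forces four independent losses; each additional intein-less lineage nested among carriers adds a further loss.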
The only other PRP8 inteins are present in four species of Cryptococcus (a basidiomycete genus, and hence from another phylum). A survey of other Cryptococcus species within the Tremellales has not yet yielded any further PRP8 intein sequences but has shown 'empty sites' in eight species, including C. amylolentus, the species most closely related to C. neoformans and C. gattii.
To better understand the sporadic distribution of the PRP8 inteins, we analysed the sequences of the PRP8 host proteins and PRP8 intein splicing domains. This analysis was designed to determine whether the inteins have a phylogeny similar to that of the PRP8 proteins (suggesting vertical descent of the intein) or, alternatively, discordant phylogenies suggesting horizontal transfer. We chose to concentrate the phylogenetic analysis on the splicing domains for two main reasons. Firstly, C. neoformans and C. gattii have PRP8 mini-inteins, as do several of the euascomycetes; these inteins therefore have splicing domains but no homing endonucleases. Secondly, many of the yeast VMA homing endonucleases have been shown to be non-functional, which would influence their rate of substitution and therefore potentially disturb phylogenetic analyses.
The phylogeny of the PRP8 proteins follows the expected organism phylogeny (Figure 4). The basidiomycete PRP8 sequences (Cryptococcus, Coprinopsis, Ustilago and Phanerochaete) fall within the fungal group but outside the ascomycete species. The euascomycete PRP8 proteins group into their respective orders, all well separated from the hemiascomycete PRP8 proteins. No genome examined contains a paralog of the PRP8 gene that might perturb this phylogeny.
Further phylogenetic analyses were based on an alignment of the four sequence blocks/motifs of the splicing domains of the fungal inteins. These analyses indicate that the splicing domains of the euascomycete PRP8 inteins are closely similar to those of the Cryptococcus PRP8 inteins (Figure 5). There is much less variation between the splicing domains of the euascomycete and Cryptococcus PRP8 inteins than there is among the splicing domains of the VMA inteins found within the single family Saccharomycetaceae. The basidiomycete and ascomycete fungi last shared a common ancestor more than 500 million years ago, while two of the most diverged VMA intein-carrying species (Kluyveromyces lactis and Saccharomyces cerevisiae) diverged approximately 70 million years ago. The close similarity of the basidiomycete and euascomycete PRP8 inteins does not reflect the length of time since the divergence of their host organisms, and suggests horizontal transfer from a euascomycete to a basidiomycete.
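One simple way to put a number on "much less variation" is the mean uncorrected pairwise distance (p-distance) within a set of aligned sequences. The sketch below, using hypothetical sequences, skips columns where either sequence has a gap; it is only a descriptive summary and not the PAUP analysis behind Figure 5.

```python
from itertools import combinations
from typing import List


def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues, skipping columns where either sequence is gapped."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = differences = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def mean_pairwise_distance(seqs: List[str]) -> float:
    """Average p-distance over all pairs in a group (needs at least two sequences)."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical aligned splicing-domain fragments for a tightly knit group.
group = ["ACDEFGHIKL", "ACDEFGHIKM", "ACDEFGHIRL"]
print(round(mean_pairwise_distance(group), 4))
```

Computing the same summary for each group of inteins allows within-group variation to be compared directly; where substitutions approach saturation, a corrected distance would be preferable to raw p-distances.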
The splicing domains used in this alignment span only 126 residues and thus provide insufficient discrimination within the Fumigati group of Aspergillus species to resolve within-group relationships. Inteins carried by species in more taxonomically distant groups (for example, those in the order Onygenales) are clearly separated, however, indicating that the alignment carries significant phylogenetic signal.
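The limited resolution of a 126-residue alignment can be illustrated with a binomial calculation, assuming (for illustration only) that sites change independently at a uniform rate. If two closely related inteins truly differed at 1% of splicing-domain sites, the chance of observing no difference at all in 126 residues is about 0.28, and the chance of observing at most one difference is about 0.64; such pairs give a tree-building method little or nothing to work with. The 1% figure is hypothetical.

```python
import math


def prob_at_most(n_sites: int, p: float, k: int = 0) -> float:
    """P(X <= k) for X ~ Binomial(n_sites, p): the chance that two sequences whose true
    per-site difference is p show at most k observed differences over n_sites."""
    return sum(math.comb(n_sites, i) * p**i * (1 - p) ** (n_sites - i) for i in range(k + 1))


# Hypothetical 1% divergence over the 126-residue splicing-domain alignment.
for k in (0, 1, 2):
    print(f"P(at most {k} differences) = {prob_at_most(126, 0.01, k):.3f}")
```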
Phylogenetic analysis of the PRP8 homing endonuclease domains was limited to the nine full-length inteins in Table 1. The phylogram (Figure 6) shows an unresolved polytomy, reflecting the highly diverged nature of the homing endonucleases, with the exception of the closely related Aspergillus fumigatus/Neosartorya fischeri group and the moderately related Histoplasma capsulatum/Paracoccidioides brasiliensis group. This extensive divergence means that nothing can be concluded from these data about the phylogenetic relationship of the Cryptococcus laurentii PRP8 endonuclease to those of the euascomycetes. These results fit with the conclusions derived from the synonymous substitution analysis: the PRP8 intein shows great diversity. The comparable VMA phylogram also shows a great deal of diversity in the homing endonuclease domain (Figure 6); however, much of this diversity is in inactive endonucleases, which would be expected to evolve more quickly.
The PRP8 and VMA HEG domains show comparable protein diversity. In contrast, the synonymous substitution frequency has reached saturation in the PRP8 HEGs, while there is much less synonymous substitution present in the VMA HEGs. These observations taken together suggest that the PRP8 intein is the more ancient element.
In the introduction we described a model in which HEGs (and, more specifically, homing endonuclease-containing inteins) undergo recurrent cycles of (i) horizontal transmission to new genomes, (ii) spread and fixation in the recipient population by homing, (iii) degeneration due to the absence of target sequences and (iv) eventual loss. This general model could be applied to any HEG, but the HEGs of the saccharomycete VMA inteins provided much of its experimental basis. Our results allow evaluation of this model in the context of a second group of nuclear inteins. Relevant gene distribution models, used to investigate patchy, non-phylogenetic gene distributions, have been the subject of several studies [44, 45].
Full length inteins: active and inactive homing endonucleases
The majority of the homing endonucleases encoded by the VMA intein sequences are no longer functional. In vitro assays of the activity of the VMA intein homing endonucleases have shown that only three are functional. The dS/dN values derived from comparisons of these active inteins (calculated across a concatenation of the conserved motifs C, D, E and H) are higher on average than the dS/dN values obtained when inactive homing endonucleases are compared. As noted by Koufopanou and Burt, the dS/dN value of the splicing domain is high (well above 1) for the VMA inteins, including those with an inactive homing endonuclease. To take one of the extreme examples, our comparison of two species with inactive VMA homing endonucleases (S. exiguus and Candida glabrata) gave a dS/dN of 4.41 for the splicing domain and a dS/dN of 0.67 for the homing endonuclease domain. In other words, when inteins with inactive homing endonucleases are compared, the dS/dN value of the (active) splicing domain is much higher than that of the inactive homing endonuclease. Comparison of the dS/dN values of the splicing domain and homing endonuclease domain of the PRP8 inteins suggests that both of these domains are constrained (both show high dS/dN values). For example, comparison of the Histoplasma and Paracoccidioides inteins gives a dS/dN value of 23.57 for the splicing domain and 14.27 for the homing endonuclease domain. The euascomycete PRP8 homing endonucleases almost all contain the two critical aspartate residues of the active site. It is therefore likely that they are active, although this clearly requires experimental confirmation.
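A minimal sketch of the final step of such a calculation is shown below. It assumes that the numbers of synonymous and non-synonymous sites (S, N) and differences (Sd, Nd) have already been counted over the concatenated motifs in the Nei-Gojobori manner; proportions are then converted with the Jukes-Cantor correction, and the ratio is reported as undefined (like the NA entries in Table 2) when either proportion reaches 0.75. The function names and example counts are hypothetical; Syn-SCAN was used for the actual values. Note that the ratio here is dS/dN, so values well above 1 indicate that non-synonymous change is being removed, i.e. the domain is constrained.

```python
import math
from typing import Optional


def jukes_cantor(p: float) -> Optional[float]:
    """Jukes-Cantor corrected distance d = -3/4 ln(1 - 4p/3).

    Returns None for p outside [0, 0.75): at p >= 0.75 the logarithm is undefined,
    which is why saturated comparisons cannot be corrected.
    """
    if not 0.0 <= p < 0.75:
        return None
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ds_over_dn(syn_diffs: float, syn_sites: float,
               nonsyn_diffs: float, nonsyn_sites: float) -> Optional[float]:
    """dS/dN from counts (Sd, S, Nd, N); None when either distance is undefined or dN is 0."""
    ds = jukes_cantor(syn_diffs / syn_sites)
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    if ds is None or dn is None or dn == 0.0:
        return None
    return ds / dn


# Hypothetical counts: 10 synonymous differences over 40 synonymous sites,
# 5 non-synonymous differences over 100 non-synonymous sites.
print(ds_over_dn(10, 40, 5, 100))
# Synonymous proportion at saturation (30/40 = 0.75): ratio undefined.
print(ds_over_dn(30, 40, 5, 100))
```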
Even though the PRP8 homing endonucleases appear to be constrained by selection, they are nevertheless quite diverse. This may be attributable in part to a long evolutionary separation, but it also seems to reflect relatively relaxed selection compared with the splicing domain. The diversity of the homing endonucleases is seen both in amino acid substitutions, as illustrated in the phylogenetic tree (Figure 6), and in frequent indels. This diversity makes the homing endonucleases useful sequences for studying the phylogenetic relationships of intein-carrying fungi. Many major human pathogens (A. fumigatus, Histoplasma, Paracoccidioides) and plant pathogens (Botrytis) carry inteins with homing endonucleases that, because of their diversity, can facilitate phylogenetic analysis.
Mini-inteins: artificial and natural
The VMA inteins of the saccharomycetes are all full-length, homing endonuclease-containing inteins. No allelic VMA mini-inteins have been reported. Chong and Xu created artificial, splicing-competent VMA inteins by deleting the internal homing endonuclease region (motifs C, D, E and F) from the S. cerevisiae VMA intein. To restore splicing activity in these artificial constructs, the 183-residue deletion had to be replaced by a 14- or 19-residue 'linker' region. Chong and Xu suggest that the linker provides sufficient flexibility to allow the correct conformation for efficient splicing. This result demonstrates that a single deletion event removing all of the HEG is compatible with retaining splicing function (that is, the HEG and splicing functions are separable).
In contrast to the VMA inteins of saccharomycetes and the GLT1 inteins present in other ascomycetes, which are all naturally full-length, the PRP8 inteins of the basidiomycetes and euascomycetes are found both as mini-inteins and as homing endonuclease-containing inteins. We previously reported that the PRP8 genes of Cryptococcus neoformans and C. gattii encode a mini-intein [19, 20]. The full-length euascomycete PRP8 inteins vary in length from 517 to 838 residues; the mini-inteins, in contrast, are all of very similar length (153 to 180 residues), with only the A. giganteus and Cryptococcus mini-inteins being somewhat different in length. We have not detected inteins of intermediate length or inteins with only remnants of any of the homing endonuclease domains. This raises the interesting questions of why the PRP8 system should have frequent mini-inteins and how a full-length intein might be precisely reduced to a mini-intein.
Empty alleles and the intein cycle
Inteins are often present at highly conserved sites in their host proteins, presumably because such inteins are less likely to be deleted. The model of Burt and Koufopanou proposes that inteins can be deleted to regenerate target sites and that target sites can be invaded through horizontal transmission. The allelic PRP8 inteins are present at a highly conserved site in a highly conserved protein that forms an essential part of the spliceosome. Elimination of the intein or mini-intein encoding sequence would have to be exact to allow the continued function of the protein encoded by the PRP8 gene. Another aspect of the model is the expectation that the frequency of the HEG allele will increase within the gene pool and eventually come to fixation. The model therefore predicts that, if a HEG is present in a species, all members of that species should eventually carry the intein. This is not the case for S. cerevisiae, where some members do not have the VMA intein [18; and authors' unpublished data], or for Candida (Pichia) guilliermondii, where some members of the species do not contain the GLT1 intein [46, 47]. The genome sequencing strain of Botrytis cinerea has a full-length PRP8 intein, while a strain isolated in New Zealand has no allelic intein. We are investigating the intein status of other B. cinerea strains. At present, therefore, it seems that inteins are not at fixation in many species, perhaps most species. Aspergillus fumigatus may have the intein at fixation, but this species seems to be an asexual clone with little polymorphism. The model of Burt and Koufopanou may require modification for asexual or predominantly asexual species such as A. fumigatus. As these authors point out, the amount of gene flow within (or between) populations and species will determine how rapidly an element such as an intein comes to fixation [8, 11]. The VMA intein is only capable of homing during meiosis.
If this restriction applies to other eukaryote inteins, asexual species would be expected to lose their inteins more rapidly than sexual species.
Liu and Yang indicated that there were no allelic PRP8 inteins in some species of Aspergillus (A. flavus, A. niger, A. oryzae, A. parasiticus, A. terreus and A. ustus). In this study, we have shown that there is no PRP8 intein in A. unilateralis (a member of the section Fumigati). There is no PRP8 intein in A. clavatus (a member of the section Clavati, which is the sister clade to section Fumigati), but there is a PRP8 intein in A. giganteus (another member of the Clavati). There are numerous other examples where an intein is found in one species but not in a closely related species (for example, Coccidioides/Uncinocarpus and Botrytis/Sclerotinia). Determination of the PRP8 intein status of further euascomycetes related to Aspergillus is desirable. Vertical inheritance accompanied by multiple events of intein loss may explain the present intein/mini-intein/empty allele distribution.
To determine whether an intein is present in a species it is necessary to analyse more than one member of the species (this is a limitation of the data in Table 4). Similarly, before concluding that all members of a sexually reproducing fungal species carry an intein (as the model expects at fixation), it is necessary to test a significant number of isolates from diverse origins, as suggested by Burt and Koufopanou. This prediction was investigated by Okuda et al., who sequenced the VMA1 gene (and any accompanying VMA intein sequence) from 10 strains of Saccharomyces cerevisiae. They detected two different groups of VMA intein sequence: one group of three was identical to the intein sequence in strain DH1-1A and the other group of six was identical to that in strain X2180-1A. The two groups were 96% identical to each other. The remaining strain, NKY278 (1), did not contain an intein in VMA1. A further strain of S. cerevisiae, the natural isolate RM11-1a, does not contain an allelic VMA intein [authors' unpublished data; 48]. It follows that not all members of the species carry an intein, even though the S. cerevisiae VMA homing endonuclease is active.
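How many isolates count as "a significant number" depends on how rare the empty (or intein-bearing) allele is. Under the simplifying assumptions that isolates are haploid and sampled independently and at random from one well-mixed population, the number needed to see at least one carrier of an allele at frequency f with a chosen confidence follows from 1 - (1 - f)^n. The function below is a hypothetical helper illustrating that calculation.

```python
import math


def isolates_needed(freq: float, confidence: float = 0.95) -> int:
    """Smallest n with 1 - (1 - freq)**n >= confidence: the number of independently
    sampled haploid isolates needed to see at least one carrier of an allele
    present at frequency `freq` (assumes random sampling from one population)."""
    if not 0.0 < freq < 1.0:
        raise ValueError("freq must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - freq))


for f in (0.5, 0.1, 0.05):
    print(f"allele frequency {f}: {isolates_needed(f)} isolates for 95% detection")
```

For example, an empty allele carried by one isolate in ten would require 29 randomly chosen isolates to be detected with 95% confidence; population structure or clonal reproduction would typically increase that number.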
Intein, mini-intein, empty allele polymorphism
It is possible that inteins do not typically come to fixation but rather reach an equilibrium with their empty targets within the gene pool. For example, the presence of an intein may not be perfectly neutral but may confer a very slight selective disadvantage due to the added process of protein splicing or the occurrence of ectopic homing endonuclease cleavage. If inteins exist in equilibrium with empty target sites within a gene pool, homing endonuclease function would be continuously selected, which would explain how an active homing endonuclease could be retained during an extended period of vertical transmission. Segregation of the empty allele (without any requirement for deletion) would also explain the frequent occurrence of species lacking the intein. This modified model may be applicable to filamentous fungi such as Aspergillus fumigatus that, because of their abundant aerial mitotic conidia, have global distributions and therefore very large effective gene pools.
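The equilibrium idea can be explored with a standard single-locus homing-drive recursion; this is a textbook population-genetic approximation, not a model fitted in this study. It assumes random mating, an effectively infinite population, homing only at meiosis with efficiency e in heterozygotes, and a fitness cost s to intein homozygotes, of which heterozygotes pay a fraction h. With no cost the intein spreads towards fixation; with a cost confined to homozygotes (h = 0) and e < s, both fixation and loss are unstable and the recursion is drawn instead to the intermediate frequency p* = e/s, leaving empty alleles in the gene pool. All parameter values below are hypothetical.

```python
def next_freq(p: float, e: float, s: float, h: float = 0.0) -> float:
    """One generation of a single-locus homing recursion (random mating, no drift).

    p: intein allele frequency; e: homing efficiency in heterozygotes;
    s: fitness cost to intein homozygotes; h: fraction of s paid by heterozygotes.
    """
    q = 1.0 - p
    w_ii, w_ie, w_ee = 1.0 - s, 1.0 - h * s, 1.0
    mean_w = p * p * w_ii + 2.0 * p * q * w_ie + q * q * w_ee
    # Homozygotes transmit only the intein allele; heterozygotes transmit it with
    # probability (1 + e) / 2 because homing converts a fraction e of empty alleles.
    return (p * p * w_ii + p * q * w_ie * (1.0 + e)) / mean_w


def iterate(p0: float, e: float, s: float, h: float = 0.0, generations: int = 2000) -> float:
    """Apply next_freq repeatedly and return the final intein frequency."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, e, s, h)
    return p


# Hypothetical parameters: 20% homing efficiency, starting from a rare intein.
print(iterate(0.01, e=0.2, s=0.0))   # no cost: spreads towards fixation
print(iterate(0.01, e=0.2, s=0.5))   # recessive cost with e < s: approaches e/s = 0.4
```

Reducing e, as might happen if homing is confined to meiosis in a rarely sexual species, lowers the equilibrium frequency e/s in this simple setting.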
Intein phylogeny and horizontal transmission
The only occurrences of an intein in PRP8 outside the euascomycetes are those in the closely related, pathogenic varieties of Cryptococcus neoformans and C. gattii (genus Filobasidiella) and in the occasional pathogen Cryptococcus laurentii. The absence of an intein in the PRP8 gene of species closely related to C. neoformans and C. gattii poses an interesting puzzle as to the evolutionary origin of the immobile mini-inteins at this site in the Cryptococcus PRP8 protein. If the intein had been inherited vertically, it might be expected to be present in all, or at least some, of the other members of the Tremellales, whereas the only other Cryptococcus species in which a PRP8 intein has been detected is Cryptococcus laurentii. This species contains a full-length, homing endonuclease-encoding intein in PRP8. C. laurentii belongs to a different cluster of the Tremellales from C. neoformans, and it might be suggested as a source of the C. neoformans mini-inteins. The intein is absent, however, from all the other Tremellales species that have been examined, including another member of the genus Filobasidiella (C. amylolentus). The common ancestor of the pathogenic cryptococci may have gained the intein by horizontal transmission of a HEG-containing intein sequence from another PRP8 gene. The presence of an allelic, HEG-containing intein sequence in the PRP8 gene of a group of euascomycetes suggests that this euascomycete group is a possible source of the Cryptococcus mini-intein.
Our analyses of the PRP8 intein group have shown many points of difference between PRP8 inteins and the VMA inteins of saccharomycetes. The PRP8 intein has been found in members of two phyla (ascomycetes and basidiomycetes), while the VMA intein of yeast is restricted to several genera in one family, the Saccharomycetaceae. Many PRP8 inteins are mini-inteins, whereas no mini-inteins have been found at the VMA-a site in members of the Saccharomycetaceae. Most of the VMA inteins contain an inactive homing endonuclease; in contrast, the homing endonucleases of the PRP8 inteins are apparently active (they show high dS/dN values). The phylogeny of the euascomycete PRP8 inteins provides no evidence for horizontal transfer. Comparison of the homing endonuclease domains indicates that the level of synonymous change has reached saturation (pS > 0.74), suggesting that the homing endonucleases have been diverging over a substantial period of evolutionary time. Despite this extended period of vertical transmission, the homing endonucleases have apparently remained active. The VMA homing endonucleases are not saturated by synonymous substitutions (pS < 0.74). This implies that the VMA intein allelic group is of more recent origin than the PRP8 allelic group, even though the VMA homing endonuclease amino acid sequences are more divergent than those of the PRP8 inteins. The extensive divergence in the VMA homing endonucleases is attributable to their loss of function.
We submit a modified model for intein evolution that may be more appropriate for inteins present in euascomycete species. We suggest that inteins may not become fixed in a population, but that their presence/absence may be polymorphic in a particular species. This would allow continuous selection for homing endonuclease function during extended periods of vertical transmission. Segregation of the empty allele (without any requirement for deletion) would also explain the frequent occurrence of closely related species that lack the intein.
Isolates of species of Aspergillus or Neosartorya were obtained from Food Science Australia, except for Aspergillus nidulans strain R20, which originates from Glasgow University, and A. fumigatus var. ellipticus (NRRL5109) and strains of A. lentulus (FH4, FH5, FH7 and FH220), which were provided by Arun Balajee of the Fred Hutchinson Cancer Research Center, Seattle. Strains were grown on Aspergillus nutrient agar at 27°C, or at 37°C for A. fumigatus. A strain of Botrytis cinerea, isolated from a vineyard in New Zealand, was provided by colleagues at Lincoln University. Genomic DNA was isolated from 50 ml overnight cultures essentially by the method of Philippsen et al. Genomic DNA samples from species within the section Fumigati were supplied by Carla Rydholm of Duke University. DNA from Paracoccidioides brasiliensis (strain Pb18) was a gift of Professor Gustavo Goldman from the Universidade de São Paulo, Brazil.
Amplification of the intein sequence and flanking regions was accomplished with the Expand High Fidelity PCR system (Roche, Mannheim, Germany) as outlined in Butler, Goodwin and Poulter. Primers were synthesised by Proligo, Singapore; the primer sequences used are shown in Table 5. The internal transcribed spacer (ITS) regions, including the 5.8S rRNA gene, were amplified using primers ITS1 (5' TCCGTAGGTGAACCTGCGG) and ITS4 (5' TCCTCCGCTTATTGATATGC); the mitochondrial cytochrome b genes were amplified using the primers Wang_E1m (5' TGAGGTGCTACAGTTATTAC) and Wang_E2rev (5' GGTATAG[AC]TCTTAA[AT]ATAGC). The resulting PCR products were purified with Qiagen columns (Hilden, Germany) prior to automated sequencing at the Allan Wilson Centre Genome Service at Massey University using an ABI 3730 DNA Sequencer.
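Reading the square brackets in the Wang_E2rev sequence as degenerate positions (either listed base at that position, equivalent to the IUPAC codes M and W; this is our interpretation of the notation), the primer stands for a small pool of distinct oligonucleotides. The hypothetical helper below expands that notation, which can be handy when checking primer matches against genome sequences in silico.

```python
import re
from itertools import product
from typing import List


def expand_degenerate(primer: str) -> List[str]:
    """Expand bracketed degenerate positions, e.g. 'GG[AC]T', into every concrete oligo."""
    cleaned = re.sub(r"\s+", "", primer.upper())
    if not re.fullmatch(r"(?:\[[ACGT]+\]|[ACGT])+", cleaned):
        raise ValueError(f"unsupported primer notation: {primer!r}")
    # Each match is either a bracketed set of alternatives or a run of fixed bases.
    pieces = re.findall(r"\[([ACGT]+)\]|([ACGT]+)", cleaned)
    options = [list(alts) if alts else [fixed] for alts, fixed in pieces]
    return ["".join(combo) for combo in product(*options)]


for oligo in expand_degenerate("GGTATAG[AC]TCTTAA[AT]ATAGC"):
    print(oligo)
```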
General sequence analyses were done using 4Peaks V1.6 and the Wisconsin GCG package (Genetics Computer Group, 575 Science Drive, Madison, Wis.). Sequence similarity searches were performed using the National Center for Biotechnology Information BLAST server and at the various fungal genome sequencing project web sites (see below). Multiple sequence alignments were constructed using CLUSTAL_X at the European Bioinformatics Institute server, edited with Seaview and shaded with MacBoxshade. Phylogenetic trees were constructed using PAUP* 4.0b10. The rates of synonymous and non-synonymous substitution within the intein HEG domains were calculated using Syn-SCAN, a program found within the Resources section of the Stanford HIV RT and Protease Sequence database [38, 39].
Representatives of the newly described intein sequences in the PRP8 genes and flanking sequences have been assigned GenBank accession numbers AY832918-AY832926 (Table 1). Descriptions of the ascomycete PRP8 inteins have been added to those of the Cryptococcus inteins in the InBase records.
Fungal genome sequencing projects
Genomic sequence data were obtained from the Genome Sequencing Center at Washington University in St. Louis, where two distinct strains of H. capsulatum (G217B and G186AR) are being sequenced, and from the Broad Institute.
The Paracoccidioides brasiliensis data consist of a large set of ESTs generated by a group of laboratories in Brazil to gather information about the differences between the yeast and mycelial transcriptomes of this pathogenic fungus. Data are available via GenBank accessions CA580326-CA584263.
Preliminary sequence data for Aspergillus fumigatus (strain Af293) were obtained from The Institute for Genomic Research website. Aspergillus nidulans (strain FGSC_A4) data were available from the Aspergillus nidulans Sequencing Project of the Broad Institute of MIT and Harvard. We examined several other fungal whole-genome sequences (including those of Botrytis cinerea, Uncinocarpus reesii, Coccidioides posadasii, Coprinopsis cinerea, Fusarium graminearum, Magnaporthe grisea, Histoplasma capsulatum, Neurospora crassa and Ustilago maydis) provided by the Broad Institute of MIT and Harvard. The Génolevures website provides data and search tools relating to the complete genomes of four hemiascomycetes and the initial data from random sequencing of nine other hemiascomycete species.
We also searched the database held by the Consortium for the Functional Genomics of Microbial Eukaryotes (Cogeme). In this project, expressed sequence tags (ESTs) have been obtained from thirteen plant pathogenic fungi and two plant pathogenic oomycetes. ESTs representing the same gene have been assembled into a single contig or consensus sequence, and a BLAST facility is available at the website. We also used data provided by the US Department of Energy's Joint Genome Institute, derived from the genome of the basidiomycete Phanerochaete chrysosporium (the white rot fungus).
Perler F, Davis E, Dean G, Gimble F, Jack W, Neff N, Noren C, Thorner J, Belfort M: Protein splicing elements: inteins and exteins–a definition of terms and recommended nomenclature. Nucleic Acids Res. 1994, 22: 1125-1127.
Xu M, Southworth M, Mersha F, Hornstra L, Perler F: In vitro protein splicing of purified precursor and the identification of a branched intermediate. Cell. 1993, 75: 1371-77. 10.1016/0092-8674(93)90623-X.
Chong S, Shao Y, Paulus H, Benner J, Perler F, Xu M: Protein splicing involving the Saccharomyces cerevisiae VMA intein. The steps in the splicing pathway, side reactions leading to protein cleavage, and establishment of an in vitro splicing system. J Biol Chem. 1996, 271: 22159-68. 10.1074/jbc.271.36.22159.
Liu X: Protein-splicing intein: genetic mobility, origin, and evolution. Annu Rev Genet. 2000, 34: 61-76. 10.1146/annurev.genet.34.1.61.
Chevalier B, Stoddard B: Homing endonucleases: structural and functional insight into the catalysts of intron/intein mobility. Nucleic Acids Res. 2001, 29: 3757-3774. 10.1093/nar/29.18.3757.
Gimble F, Thorner J: Homing of a DNA endonuclease gene by meiotic gene conversion in Saccharomyces cerevisiae. Nature. 1992, 357: 301-306. 10.1038/357301a0.
Goddard M, Burt A: Recurrent invasion and extinction of a selfish gene. Proc Natl Acad Sci U S A. 1999, 96: 13880-13885. 10.1073/pnas.96.24.13880.
Goddard M, Greig D, Burt A: Outcrossed sex allows a selfish gene to invade yeast populations. Proc Biol Sci. 2001, 268: 2537-42. 10.1098/rspb.2001.1830.
Posey K, Koufopanou V, Burt A, Gimble F: Evolution of divergent DNA recognition specificities in VDE homing endonucleases from two yeast species. Nucleic Acids Res. 2004, 32: 3947-3956. 10.1093/nar/gkh734.
Gogarten J, Senejani A, Zhaxybayeva O, Olendzenski L, Hilario E: Inteins: structure, function, and evolution. Annu Rev Microbiol. 2002, 56: 263-87. 10.1146/annurev.micro.56.012302.160741.
Burt A, Koufopanou V: Homing endonuclease genes: the rise and fall and rise again of a selfish element. Curr Opin Genet Dev. 2004, 14: 609-615. 10.1016/j.gde.2004.09.010.
Perler FB: InBase, the Intein Database. Nucleic Acids Res. 2000, 28: 344-5. 10.1093/nar/28.1.344.
Chong S, Xu M: Protein splicing of the Saccharomyces cerevisiae VMA intein without the endonuclease motifs. J Biol Chem. 1997, 272: 15587-90. 10.1074/jbc.272.25.15587.
Duan X, Gimble F, Quiocho F: Crystal structure of PI-SceI, a homing endonuclease with protein splicing activity. Cell. 1997, 89: 555-564. 10.1016/S0092-8674(00)80237-8.
Paulus H: Protein splicing and related forms of protein autoprocessing. Annu Rev Biochem. 2000, 69: 447-496. 10.1146/annurev.biochem.69.1.447.
Kane P, Yamashiro C, Wolczyk D, Neff N, Goebl M, Stevens T: Protein splicing converts the yeast TFP1 gene product to the 69-kD subunit of the vacuolar H(+)-adenosine triphosphatase. Science. 1990, 250: 651-657.
Koufopanou V, Goddard M, Burt A: Adaptation for horizontal transfer in a homing endonuclease. Mol Biol Evol. 2002, 19: 239-46.
Okuda Y, Sasaki D, Nogami S, Kaneko Y, Ohya Y, Anraku Y: Occurrence, horizontal transfer and degeneration of VDE intein family in Saccharomycete yeasts. Yeast. 2003, 20: 563-73. 10.1002/yea.984.
Butler M, Goodwin T, Poulter RTM: A nuclear encoded intein in the fungal pathogen Cryptococcus neoformans. Yeast. 2001, 18: 365-70. 10.1002/yea.781.
Butler M, Poulter RTM: The PRP8 inteins in Cryptococcus are a source of phylogenetic and epidemiological information. Fungal Genet Biol. 2005, 42: 452-63. 10.1016/j.fgb.2005.01.011.
Hull C, Heitman J: Genetics of Cryptococcus neoformans. Annu Rev Genet. 2002, 36: 557-615. 10.1146/annurev.genet.36.052402.152652.
Xu J, Vilgalys R, Mitchell T: Multiple gene genealogies reveal recent dispersion and hybridization in the human pathogenic fungus Cryptococcus neoformans. Mol Ecol. 2000, 9: 1471-81. 10.1046/j.1365-294x.2000.01021.x.
Collins C, Guthrie C: The question remains: is the spliceosome a ribozyme?. Nat Struct Biol. 2000, 7: 850-4. 10.1038/79598.
Kuhn A, Reichl E, Brow D: Distinct domains of splicing factor Prp8 mediate different aspects of spliceosome activation. Proc Natl Acad Sci U S A. 2002, 99: 9145-9. 10.1073/pnas.102304299.
Liu X, Yang J: Prp8 intein in fungal pathogens, target for potential antifungal drugs. FEBS Lett. 2004, 572: 46-50. 10.1016/j.febslet.2004.07.016.
Histoplasma Sequencing at the Genome Sequencing Center. [http://genomeold.wustl.edu/projects/hcapsulatum/]
Pringle A, Baker D, Platt J, Wares J, Latgé J, Taylor J: Cryptic speciation in the cosmopolitan and clonal human pathogenic fungus Aspergillus fumigatus. Evolution. 2005, 59: 1880-1890.
Girardin H, Monod M, Latge J-P: Molecular characterization of the food-borne fungus Neosartorya fischeri (Malloch and Cain). Appl Environ Microbiol. 1995, 61: 1378-83.
Varga J, Vida Z, Tóth B, Debets F, Horie Y: Phylogenetic analysis of newly described Neosartorya species. Antonie van Leeuwenhoek. 2000, 77: 235-239. 10.1023/A:1002476205873.
Hong S, Cho H-S, Frisvad J, Samson R: Polyphasic taxonomy of Neosartorya spinosa, N. glabra and related species, and the description of N. laciniosa sp. nov. and N. coreanensis sp. nov. Poster presentation at the International Congress of Mycology. San Francisco. American Society for Microbiology, 23–28 July 2005
Varga J, Rigó K, Molnár J, Tóth B, Szencz S, Téren J, Kozakiewicz Z: Mycotoxin production and evolutionary relationships among species of the Aspergillus section Clavati. Antonie van Leeuwenhoek. 2003, 83: 191-200. 10.1023/A:1023355707646.
Jarv H, Lehtmaa J, Summerbell R, Hoekstra E, Samson R, Naaber P: Neosartorya pseudofischeri. 2004, 42: 925-928.
Balajee SA, Gribskov JL, Brandt M, Ito J, Fothergill A, Marr K: Mistaken identity: Neosartorya pseudofischeri and its anamorph masquerading as Aspergillus fumigatus. J Clin Microbiol. 2005, 43: 5996-5999. 10.1128/JCM.43.12.5996-5999.2005.
Balajee SA, Gribskov JL, Hanley E, Nickle D, Marr K: Aspergillus lentulus sp. nov., a new sibling species of A. fumigatus. Eukaryotic Cell. 2005, 4: 625-632. 10.1128/EC.4.3.625-632.2005.
Pietrokovski S: Modular organization of inteins and C-terminal autocatalytic domains. Protein Sci. 1998, 7: 64-71.
Koufopanou V, Burt A: Degeneration and domestication of a selfish gene in yeast: molecular evolution versus site-directed mutagenesis. Mol Biol Evol. 2005, 22: 1535-1538. 10.1093/molbev/msi149.
Nei M, Gojobori T: Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol. 1986, 3: 418-26.
Stanford HIV Drug Resistance Database: SynScan. [http://hivdb6.stanford.edu/synscan/synscan.cgi]
Gonzales M, Dugan J, Shafer R: Synonymous-non-synonymous mutation rates between sequences containing ambiguous nucleotides (Syn-SCAN). Bioinformatics. 2002, 18: 886-887. 10.1093/bioinformatics/18.6.886.
The SNAP server. [http://www.hiv.lanl.gov/content/hiv-db/SNAP/WEBSNAP/SNAP.html]
Bowman B, White T, Taylor J: Human pathogenic fungi and their close non-pathogenic relatives. Mol Phylogenet Evol. 1996, 6: 89-96. 10.1006/mpev.1996.0061.
Holst-Jensen A, Kohn LM, Schumacher T: Nuclear rDNA phylogeny of the Sclerotiniaceae. Mycologia. 1997, 89: 885-889.
Berbee M, Taylor J: Fungal molecular evolution: gene trees and geologic time. The Mycota, vol. VIIB. Systematics and Evolution. Edited by: McLaughlin D, McLaughlin E, Lemke E. 2001, Springer-Verlag, Berlin, 229-246.
Snel B, Bork P, Huynen MA: Genomes in flux: the evolution of archaeal and proteobacterial gene content. Genome Res. 2002, 12: 17-25. 10.1101/gr.176501.
Pietrokovski S: Intein spread and extinction in evolution. Trends Genet. 2001, 17: 465-72. 10.1016/S0168-9525(01)02365-4.
Butler M, Goodwin T, Poulter R: Two new fungal inteins. Yeast. 2005, 22: 493-501. 10.1002/yea.1229.
George K: The inteins of the fungi Pichia guilliermondii and Paracoccidioides brasiliensis. Post-graduate diploma in science thesis. 2005, University of Otago, Dunedin, NZ
Fungal Genome Initiative: Broad Institute. [http://www.broad.mit.edu]
Fell J, Boekhout T, Fonseca A, Scorzetti G, Statzell-Tallman A: Biodiversity and systematics of basidiomycetous yeasts as determined by large-subunit rDNA D1/D2 domain sequence analysis. Int J Syst Evol Microbiol. 2000, 50: 1351-71.
FRR Culture Collection Catalogue: Food Science Australia. [http://www.foodscience.afisc.csiro.au/fcc]
Clowes R, Hayes W, editors: Experiments in Microbial Genetics. 1968, Blackwell Scientific, Oxford, UK
Philippsen P, Stotz A, Scherf C: DNA of Saccharomyces cerevisiae. Methods Enzymol. 1991, 194: 169-182.
Scorzetti G, Fell J, Fonseca A, Statzell-Tallman A: Systematics of basidiomycetous yeasts, a comparison of large subunit D1/D2 and internal transcribed spacer rDNA regions. FEMS Yeast Res. 2002, 2: 495-517.
Wang L, Yokoyama K, Miyaji M, Nishimura K: Mitochondrial cytochrome b gene analysis of Aspergillus fumigatus and related species. J Clin Microbiol. 2000, 38: 1352-1358.
Allan Wilson Centre Genome Service. [http://awcmee.massey.ac.nz/genome-service.htm]
4Peaks website. [http://www.mekentosj.com/4peaks/]
ClustalW submission form: EMBL-EBI. [http://www.ebi.ac.uk/clustalw/]
Galtier N, Gouy M, Gautier C: SEAVIEW and PHYLO_WIN: two graphic tools for sequence alignment and molecular phylogeny. Comput Appl Biosci. 1996, 12: 543-548.
MacBoxshade submission form. [http://www.isrec.isb-sib.ch/ftp-server/boxshade/MacBoxshade/]
Swofford D: PAUP*. Phylogenetic analysis using parsimony (* and other methods). 1998, Sinauer, Sunderland, Mass, Version 4
InBase, the Intein Database and Registry. [http://www.neb.com/neb/inteins.html]
Felipe MS, Andrade RV, Petrofeza SS, Maranhao AQ, Torres FA, Albuquerque P, Arraes FB, Arruda M, Azevedo MO, Baptista AJ, Bataus LA, Borges CL, Campos EG, Cruz MR, Daher BS, Dantas A, Ferreira MA, Ghil GV, Jesuino RS, Kyaw CM, Leitao L, Martins CR, Moraes LM, Neves EO, Nicola AM, Alves ES, Parente JA, Pereira M, Pocas-Fonseca MJ, Resende R, Ribeiro BM, Saldanha RR, Santos SC, Silva-Pereira I, Silva MA, Silveira E, Simoes IC, Soares RB, Souza DP, De-Souza MT, Andrade EV, Xavier MA, Veiga HP, Venancio EJ, Carvalho MJ, Oliveira AG, Inoue MK, Almeida NF, Walter ME, Soares CM, Brigido MM: Transcriptome characterization of the dimorphic and pathogenic fungus Paracoccidioides brasiliensis by EST analysis. Yeast. 2003, 20: 263-71. 10.1002/yea.964.
The Institute for Genomic Research. [http://www.tigr.org]
Galagan J, Henn M, Ma L-J, Cuomo C, Birren B: Genomics of the fungal kingdom: Insights into eukaryote biology. Genome Research. 2005, 15: 1620-1631. 10.1101/gr.3767105.
Soanes D, Skinner W, Keon J, Hargreaves J, Talbot N: Genomics of phytopathogenic fungi and the development of bioinformatic resources. Mol Plant-Microbe Interact. 2002, 15: 421-427.
Cogeme: Phytopathogenic Fungi and Oomycete EST Database. [http://cogeme.ex.ac.uk/]
DOE Joint Genome Institute: Genome Portal. [http://genome.jgi-psf.org/]
Kurtzman C: Phylogenetic circumscription of Saccharomyces, Kluyveromyces and other members of the Saccharomycetaceae, and the proposal of the new genera Lachancea, Nakaseomyces, Naumovia, Vanderwaltozyma and Zygotorulaspora. FEMS Yeast Res. 2003, 4: 233-45. 10.1016/S1567-1356(03)00175-2.
This study was aided substantially by gifts of strains and DNA from our colleagues: Ms Carla Rydholm, Dr Arun Balajee and Professor Gustavo Goldman. We remain very grateful for this assistance.
Preliminary sequence data for Aspergillus fumigatus were obtained from The Institute for Genomic Research. Sequencing of Aspergillus fumigatus was funded by the National Institute of Allergy and Infectious Disease U01 AI 48830 to David Denning and William Nierman, the Wellcome Trust, and Fondo de Investigaciones Sanitarias. Initial data for Paracoccidioides brasiliensis were from the EST database constructed by Felipe et al. Histoplasma capsulatum sequence data were produced by the Genome Sequencing Center at Washington University School of Medicine in St. Louis and the Broad Institute. Aspergillus nidulans data were provided by the Aspergillus nidulans Sequencing Project of the Broad Institute of MIT and Harvard. We are grateful to the Broad Institute of MIT and Harvard for access to the sequence data of the many fungal genomes sequenced there. We also used sequence data provided by the US Department of Energy's Joint Genome Institute, from the database held by the Consortium for the Functional Genomics of Microbial Eukaryotes (Cogeme), which is funded by the Biotechnology and Biological Sciences Research Council (UK), and from the Génolevures website.
We are indebted to Dr Francine Perler and others who maintain the intein database at New England Biolabs.
The manuscript was improved by suggestions from four anonymous reviewers.
TJDG was supported by the New Zealand Foundation for Research, Science and Technology (contract no. UOOX0222); MIB was funded by the New Zealand Lottery Grants Board; JG was funded in part by the Otago Research Committee.
RP and MB conceived of the study, participated in the study design, data analysis and manuscript revision. TG participated in intein discovery, data analysis and manuscript revision. JG and MB performed the amplification and sequence analyses. MB drafted the manuscript. All of the authors read and approved the manuscript.