The distribution and evolutionary history of the PRP8 intein

Background: We recently described a mini-intein in the PRP8 gene of a strain of the basidiomycete Cryptococcus neoformans, an important fungal pathogen of humans. This was the second described intein in the nuclear genome of any eukaryote; the first nuclear encoded intein was found in the VMA gene of several saccharomycete yeasts. The evolution of eukaryote inteins is not well understood. In this report we describe additional PRP8 inteins (bringing the total of these to over 20). We compare and contrast the phylogenetic distribution and evolutionary history of the PRP8 intein and the saccharomycete VMA intein, in order to derive a broader understanding of eukaryote intein evolution. It has been suggested that eukaryote inteins undergo horizontal transfer and the present analysis explores this proposal.
Results: In total, 22 PRP8 inteins have been detected in species from three different orders of euascomycetes, including Aspergillus nidulans and Aspergillus fumigatus (Eurotiales), Paracoccidioides brasiliensis, Uncinocarpus reesii and Histoplasma capsulatum (Onygales) and Botrytis cinerea (Helotiales). These inteins are all at the same site in the PRP8 sequence as the original Cryptococcus neoformans intein. Some of the PRP8 inteins contain apparently intact homing endonuclease domains and are thus potentially mobile, while some lack the region corresponding to the homing endonuclease and are thus mini-inteins. In contrast, no mini-inteins have been reported in the VMA gene of yeast. There are several examples of pairs of closely related species where one species carries the PRP8 intein while the intein is absent from the other species. Bioinformatic and phylogenetic analyses suggest that many of the ascomycete PRP8 homing endonucleases are active. This contrasts with the VMA homing endonucleases, most of which are inactive.
Conclusion: PRP8 inteins are widespread in the euascomycetes (Pezizomycotina) and apparently their homing endonucleases are active. There is no evidence for horizontal transfer within the euascomycetes. This suggests that the intein is of ancient origin and has been vertically transmitted amongst the euascomycetes. It is possible that horizontal transfer has occurred between the euascomycetes and members of the basidiomycete genus Cryptococcus.


Background
An intein is a specific insertion within a host protein that is excised during protein maturation, that is, post-translationally [1]. This maturation process is termed "protein splicing" and involves precise excision of the internal protein (intein) sequence and ligation of the flanking external protein (extein) sequences to form a peptide bond. The excision of the intein and the subsequent ligation of the flanking host exteins are catalysed by the intein itself [2,3]. The sequence encoding the intein appears as an in-frame insertion within the gene for the host protein. For the sake of simplicity, this intein encoding sequence is also often referred to as an intein.
As well as controlling their own protein splicing, many inteins also include site-specific 'homing' DNA endonucleases. These homing endonucleases belong to several distinct families (for example the LAGLIDADG and His-Cys box groups); the LAGLIDADG type is by far the most common in inteins. Related homing endonucleases are found encoded in Group I self-splicing introns [4,5]. Homing endonucleases can cause a gene conversion event that converts a cell heterozygous for the intein into a homozygote. To initiate this gene conversion, the homing endonuclease, encoded by an intein sequence located at a specific site, recognises a long DNA target sequence corresponding to an unoccupied allelic site. The homing endonuclease generates a double strand chromosomal break within the unoccupied target sequence. This break is repaired by the host DNA repair machinery using as a template the allele containing the intein gene. This gene conversion results in the replication of the intein gene into a specific site in a previously unoccupied allele. The occupied allele is no longer a target for the homing endonuclease because the target site is split by the insertion [6]. Thus, inteins are 'selfish' mobile elements that occupy unique, specific sites in the genome. When they are copied into an empty allele, they are still retained by the donor allele. These characteristics make inteins especially effective tools in phylogenetic analysis of such phenomena as horizontal transfer.
It has been hypothesised, using information derived mainly from studies of the VMA intein in Saccharomyces, that at the population level the 'super-Mendelian' replication process caused by the homing endonuclease will increase the frequency of the intein within the gene pool of a sexual species, until eventually the intein may come to fixation [7,8]. The rate of spread through a population will depend upon the host mating system [8]. At fixation there will be no remaining unoccupied alleles. Intra-specific movement of the intein will no longer occur and selection will no longer operate on the homing endonuclease function. In the absence of selection, the homing endonuclease function of the intein will become inactive through random mutation [7]. In agreement with this prediction many yeast VMA inteins have been shown to have non-functional homing endonucleases [9]. It has been suggested that inteins depend on horizontal transmission to species that do not already contain an intein at a specific site to ensure their long-term survival [10,11] and retention of active homing endonuclease. This horizontal transfer between species could be initiated by the encoded homing endonuclease. It has been hypothesised [11] that inteins undergo recurrent cycles of (i) horizontal transmission to new genomes, (ii) fixation in the recipient population by homing, (iii) degeneration of the homing endonuclease gene (HEG) due to the absence of target sequences and (iv) eventual loss by deletion of the whole intein.
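The expected rise in intein frequency under this 'super-Mendelian' inheritance can be illustrated with a minimal sketch. It is not part of the analyses reported here and rests on simplifying assumptions of our own: an obligately outcrossing, randomly mating diploid phase, no fitness cost of carrying the intein, and a single hypothetical parameter, `homing_rate`, giving the fraction of empty alleles converted in heterozygotes. Under these assumptions the intein frequency p increases by homing_rate × p × (1 − p) each generation.

```python
def next_frequency(p, homing_rate):
    """Intein allele frequency after one round of random mating and homing.

    Assumptions (illustrative only): random mating, no fitness cost, and a
    fraction `homing_rate` of empty alleles in heterozygotes is converted.
    """
    if not 0.0 <= p <= 1.0 or not 0.0 <= homing_rate <= 1.0:
        raise ValueError("p and homing_rate must lie between 0 and 1")
    homozygotes = p * p
    heterozygotes = 2.0 * p * (1.0 - p)
    # Heterozygotes transmit the intein with probability (1 + homing_rate) / 2
    # instead of the Mendelian 1/2.
    return homozygotes + heterozygotes * (1.0 + homing_rate) / 2.0


def generations_to_threshold(p0, homing_rate, threshold=0.99,
                             max_generations=10_000):
    """Number of generations until the frequency reaches `threshold`.

    Returns None if the threshold is not reached within `max_generations`
    (for example when homing_rate is 0 and the frequency never changes).
    """
    p = p0
    for generation in range(max_generations + 1):
        if p >= threshold:
            return generation
        p = next_frequency(p, homing_rate)
    return None
```

With `homing_rate = 0` the frequency does not change, which corresponds to ordinary Mendelian inheritance; any positive value drives the intein towards fixation, after which no empty alleles remain for the endonuclease to act on. Inbreeding would lower the frequency of heterozygotes and so slow the spread, which is one way the host mating system enters the picture.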
About 15% of the known inteins lack an endonuclease domain. These minimal protein splicing elements (mini-inteins) are probably derived from full-length inteins by deletion. Mini-inteins are 130-200 amino acids in length with conserved sequence blocks at each end, while 'full-length' inteins (with a central homing endonuclease domain) are about 360-550 amino acids in length [12]. Both mutagenesis studies [13] and 3-D crystal structures [14] have indicated that the two functions inherent in a mobile intein are largely separate. Thus the endonuclease domain is a discrete region of the intein that has little or no functional or structural overlap with the protein splicing domain [15]. Mini-inteins can be considered as an extreme example of HEG corruption and loss. This loss of homing endonuclease function will prevent interspecific or intraspecific replication by homing, restricting the evolution of mini-inteins and their phylogenetic distribution.
Inteins are found in diverse prokaryotes and a few unicellular eukaryotes. Until recently, the only eukaryote nuclear gene intein described was the VMA intein found in the vacuolar ATPase gene of a number of hemiascomycete yeasts such as Saccharomyces cerevisiae [16]. These inteins all appear in exactly the same site (VMA-a) within the host VMA1 gene. Such inteins are referred to as 'allelic inteins', even though they are from different species (there is a non-allelic intein in the VMA-b insertion site of the vacuolar ATPase of several Archaea). It was analyses of the yeast VMA inteins, all of which contain a LAGLIDADG HEG, which suggested the possibility of frequent horizontal transfer between species of ascomycete yeasts although the mechanism for such a transfer of nuclear genes is unknown [17,18]. The suggestion was based on the comparison of the intein phylogeny and the host gene (VMA1).
Recently we reported the presence and sequences of a second set of allelic nuclear encoded inteins (PRP8 inteins) in a large number of strains of Cryptococcus neoformans and in the related species Cryptococcus gattii [19,20]. Cryptococcus neoformans is a basidiomycete fungus capable of causing serious infections in both immunocompromised and immunocompetent people [21]. C. neoformans is divided into two varieties, C. neoformans var. neoformans and C. neoformans var. grubii. Molecular phylogenetic work indicates that the grubii and neoformans varieties are separated by ~18.5 million years of evolution, and these varieties diverged from C. gattii ~37 million years ago [22]. Intein encoding sequences are present in the PRP8 genes of both C. neoformans varieties and also in C. gattii. These inteins all lack homing endonuclease domains and are thus mini-inteins. The sequence differences of the mini-inteins reflect the relationships among the host species; for example, the variety neoformans and variety grubii mini-inteins are more similar to each other than either is to the inteins of C. gattii [20].
The Cryptococcus mini-inteins are encoded within the nuclear gene for PRP8, a highly conserved protein central in the formation of the spliceosome, that coordinates multiple processes in spliceosome activation [23,24]. The protein-splicing component of the PRP8 inteins must remain functional, even after HEG corruption and loss, so that the intein can remove itself from the PRP8 precursor. If the intein is not removed, it is probable that the spliceosome will be non-functional and the fungus will not be able to process introns from messenger RNA. The presence of the mini-intein in both varieties of C. neoformans and in C. gattii is consistent with their vertical inheritance from their common ancestor but the original source of the intein in Cryptococcus remains unclear. The closest relative of C. neoformans and C. gattii, Cryptococcus amylolentus, does not contain a PRP8 intein, nor does a slightly more distant relative, C. heveanensis [20]. The only other member of the Tremellales clade known to contain a PRP8 intein is Cryptococcus laurentii, a species relatively distantly related to C. neoformans. The C. laurentii intein is a full-length, HEG-containing intein [20].
We have investigated the distribution of the PRP8 intein in order to clarify its evolutionary history and, by inference, to better understand the behaviour of inteins in eukaryotes. Previously, PRP8 inteins have been described from four species of Cryptococcus [20] and from three filamentous ascomycetes, Aspergillus nidulans, Aspergillus fumigatus and Histoplasma capsulatum [25]. In order to understand intein distribution we have screened the genomes of various fungi for PRP8 inteins. In this work we describe the PRP8 inteins of Paracoccidioides brasiliensis and Uncinocarpus reesii of the Onygales, Botrytis cinerea of the Helotiales and inteins from diverse members of the Sections Fumigati and Clavati of the Eurotiales. Many of these are mini-inteins. This is in contrast to the VMA inteins of yeast, all of which are full-length, although many of their encoded endonucleases are no longer active. In three orders of euascomycete fungi the PRP8 genes have been found to encode inteins with homing endonucleases. Within two of these orders we found closely related species that contain mini-inteins. All three orders contain species that do not contain any PRP8 inteins, often species closely related to intein-carrying species. Thus the phylogenetic distribution of these inteins poses some interesting questions. Comparing and contrasting the datasets of the fungal PRP8 and VMA inteins clarifies some of these questions.
Figure 1. Insertions in fungal PRP8 genes encode full-length inteins. The splicing and homing endonuclease domain motifs of ascomycete VMA inteins aligned with PRP8 inteins present in public databases. A. Alignment of the N-terminal splicing domains (blocks A and B; see InBase [62]). B. Alignment of the C-terminal splicing domains (F and G). C. An alignment of the four homing endonuclease domains (blocks C, D, E, H; see InBase) of the full-length PRP8 inteins and the active ascomycete VMA inteins. A region of variable length is indicated between blocks C and D together with the number of residues removed from the alignment. Accession numbers in the NCBI/protein database for the VMA inteins are SceVMA PXBYVA; ScarVMA, CAC86344; ZbaVMA, CAC86348.1; CtrVMA, A46080. Sequences of PRP8 inteins can be found at InBase and/or from accession data in Table 1. The taxonomic relationships of the species are also summarised in Table 1.

PRP8 inteins in ascomycete sequence databases
To examine the distribution of PRP8 inteins in fungi other than species of Cryptococcus we searched the databases of many genome-sequencing projects with the predicted protein sequence of the

Botrytis
We attempted to amplify a PRP8 intein encoding sequence from a strain of Botrytis cinerea isolated in New Zealand. The strain was confirmed as belonging to B. cinerea by amplifying and sequencing the ITS1 and ITS2 regions of the ribosomal gene array (data not shown). Amplification of the region surrounding the PRP8 intein insertion site, however, showed that this strain does not contain an intein, in contrast to strain B05.10 from the sequencing project.

Protein domains encoded by PRP8 intein sequences
The splicing domains and endonuclease domain of PRP8 inteins were predicted through comparison with those of the Saccharomyces cerevisiae VMA intein (SceVMA). We identified the conserved sequence blocks using sequences of known inteins held at InBase [12].  Figure 3). It does not show significant sequence or structural similarity to other protein sequences in the databases. The intein in B. cinerea (Order: Helotiales) also contains a substantial insertion in this region (Figure 3). The sequence of the insertion in Botrytis shows 58% similarity to the insertion in the A. fumigatus inteins, although it is 33 residues longer. Other, different, indels occur in this region of the other ascomycete inteins and mini-inteins (Figure 3). It may be that this region is especially tolerant of substantial indels. After this variable region, but prior to the first motif of the homing endonuclease (block C), there is a region (~40 residues) that is relatively well conserved across all of the PRP8 inteins (including the mini-inteins) and which must presumably be part of the splicing domain. This region does not, however, show similarity to the N4 splicing domain motif recognised by Pietrokovski [35].
The splicing domains of the PRP8 inteins (Figure 1, InBase motifs A, B, F, G), whether from full-length or mini-inteins, are very closely similar to each other. The splicing domains from the ascomycete PRP8 inteins share more residues with those from the basidiomycete (Cryptococcus) PRP8 inteins than they do with the splicing domains of the saccharomycete (ascomycete) VMA inteins (Figure 1).
The internal regions of some of the PRP8 inteins evidently form part of an endonuclease due to their similarity to other homing endonuclease domains such as those in the VMA inteins. The homing endonuclease domains of the PRP8 inteins (Figure 1C, InBase motifs C, D, E, H) are closely similar to each other, although the spacing between the first two highly conserved motifs (C and D) is somewhat variable; for example, the distance between motifs C and D in the PRP8 full-length inteins ranges from 212 to 300 residues. Length variation in this region is also seen in the yeast VMA inteins; while in VMA inteins with active homing endonucleases this distance is ~80 residues, in inactive VMA homing endonucleases there may be as many as ~160 residues.  (Figure 3) indicates that the homing endonuclease deletions have occurred at almost the identical site within the ascomycete mini-inteins. The Cryptococcus mini-inteins have an internal deletion of a similar length that also covers all the conserved homing endonuclease domains (Figure 3). The presence of numerous mini-inteins in the PRP8 allelic series contrasts with the VMA allelic series of inteins in saccharomycetes, all of which are full-length.

Nucleotide changes and dS/dN values of PRP8 intein encoding sequences
The endonucleases of the yeast VMA intein series are frequently non-functional. It is of interest to determine if the newly described PRP8 HEG sequences encode active or inactive homing endonucleases. One way of approaching this question is to compare pairs of sequences and determine the frequencies of synonymous and non-synonymous changes in the HEGs of the inteins [36]. A large value for dS/dN implies that the encoded peptide is selectively constrained (functional). It should be borne in mind that since PRP8 is an essential gene, even if the homing endonuclease becomes non-functional, the intein must remain as an ORF and must encode functional splicing domains. The HEG domain is therefore still subject to weak selection pressure, even if it is inactive (for example stop codons and frame-shifts will be selected against).
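To make the counting behind such comparisons concrete, the following Python sketch tallies synonymous and non-synonymous differences between two codon-aligned sequences, in the spirit of the Nei-Gojobori counting approach. It is an illustration only and not the procedure used to produce the values reported here. The function names are hypothetical, the sequences are assumed to be gap-free and of equal length, the standard genetic code is assumed, and, as a simplification, changes to stop codons are treated as non-synonymous and mutational pathways through stop codons are not excluded.

```python
from itertools import permutations

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_sites(codon):
    """Number of synonymous sites (0 to 3) in one codon."""
    amino_acid = CODON_TABLE[codon]
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == amino_acid:
                sites += 1.0 / 3.0  # one of three possible changes at this position
    return sites


def codon_differences(codon1, codon2):
    """(synonymous, non-synonymous) differences, averaged over all pathways."""
    positions = [i for i in range(3) if codon1[i] != codon2[i]]
    if not positions:
        return 0.0, 0.0
    paths = list(permutations(positions))
    syn = nonsyn = 0.0
    for order in paths:
        current = codon1
        for pos in order:
            step = current[:pos] + codon2[pos] + current[pos + 1:]
            if CODON_TABLE[current] == CODON_TABLE[step]:
                syn += 1.0
            else:
                nonsyn += 1.0
            current = step
    return syn / len(paths), nonsyn / len(paths)


def proportions(seq1, seq2):
    """Observed proportions (pS, pN) of synonymous and non-synonymous differences."""
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be codon-aligned and of equal length")
    s_sites = s_diffs = n_diffs = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        s_sites += (synonymous_sites(c1) + synonymous_sites(c2)) / 2.0
        d_syn, d_non = codon_differences(c1, c2)
        s_diffs += d_syn
        n_diffs += d_non
    n_sites = len(seq1) - s_sites
    if s_sites == 0.0 or n_sites == 0.0:
        raise ValueError("no synonymous or no non-synonymous sites to compare")
    return s_diffs / s_sites, n_diffs / n_sites
```

Calling `proportions` on two short, made-up sequences such as `"GGGGCTCCAATG"` and `"GGAGCTCCAATA"` returns the proportion of synonymous sites that differ (pS) and the proportion of non-synonymous sites that differ (pN). These proportions are normally corrected for multiple substitutions before the dS/dN ratio is formed.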
We analysed the distribution of point mutations in the codons over a concatenated 213 bp region of the HEG domains; this region encodes the nine residues of motif C and the whole region covering motifs D, E and H of the homing endonuclease (that is, the residues illustrated in Figure 1C). Results from analysis of the nine full-length PRP8 inteins are summarised in  Table 2 are only just beyond calculation (due to the Jukes-Cantor limit) and are very similar to comparisons that give clear evidence of selective constraint (high dS/dN values).
In conclusion, dS/dN analysis of the PRP8 HEGs suggests that they are constrained by selection. In addition, the high frequency of synonymous substitutions indicates that these inteins have been diverging for a considerable period of time.

Nucleotide changes and dS/dN values of VMA intein encoding sequences
It is of interest to compare the nucleotide substitution pattern in the PRP8 HEGs (described above) with a similar analysis of the yeast VMA HEGs. It has been shown that some of these VMA homing endonucleases are inactive [9]. The longer a homing endonuclease has been inactive, the more random the pattern of substitutions will be (dS/dN will approach 1.0). We have analysed the substitution patterns between VMA HEGs from 17 strains from 16 yeast species, using the same HEG region used to analyse the PRP8 homing endonucleases (Table 3). Of these 17,
In summary, the correlation of the dS/dN values and homing endonuclease activity in the VMA cohort supports the concept of using dS/dN analyses to predict the activity of the PRP8 intein homing endonucleases. In other words, the comparison of active homing endonucleases gave high dS/dN values; in contrast, comparison of inactive homing endonucleases gave low dS/dN values. The high dS/dN values of the PRP8 homing endonucleases are strikingly greater than the VMA active homing endonuclease values. Also, the frequency of synonymous substitutions was less than 0.74 in all but one VMA comparison, that is, the VMA system is not saturated by synonymous substitutions. Taken at face value this implies that the saccharomycete VMA intein allelic group is of more recent origin than the PRP8 allelic group.

Comparison of the nucleotide changes of the VMA and PRP8 intein encoding sequences
The above dS/dN analyses showed numerous comparisons where the Jukes-Cantor correction could not be calculated because of the high frequency of synonymous substitutions. These comparisons are described as NA in Table 2. In order to examine the homing endonuclease evolution in a way that includes all the comparisons, we submitted our data to analysis at the SNAP (Synonymous/Non-synonymous Analysis Program) site [40]. SNAP calculates rates of nucleotide substitution from a set of codon-aligned nucleotide sequences, based on the method of Nei and Gojobori [37]. The XY-Plot function at this site provides an illustration of the cumulative behaviour of the synonymous and non-synonymous substitutions across the coding region, one codon at a time; that is, the average behaviour at each codon is estimated from all the pair-wise comparisons. In the PRP8 HEG region, the synonymous and non-synonymous substitutions accumulate at a similar rate (despite there being many more non-synonymous options per codon). In the VMA HEG, non-synonymous substitutions accumulate at almost twice the rate of those in the PRP8 HEG region (see Additional file 1: SNAP XY plots of the homing endonuclease encoding regions of the PRP8 and VMA inteins). This analysis, which is based on the complete data set, supports the individual dS/dN comparisons.
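The NA entries follow from the form of the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), which is undefined once the observed proportion of differences p reaches 0.75. The sketch below, with hypothetical function names and not the code used by SNAP, applies the correction to observed proportions of synonymous (pS) and non-synonymous (pN) differences and returns a ratio only where both corrected distances can be calculated.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion p.

    Returns None when p >= 0.75, where the logarithm is undefined
    (the comparison is saturated and would be reported as NA).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    if p >= 0.75:
        return None
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ds_dn(p_s, p_n):
    """dS/dN from observed proportions, or None if it cannot be calculated."""
    d_s = jukes_cantor(p_s)
    d_n = jukes_cantor(p_n)
    if d_s is None or d_n is None or d_n == 0.0:
        return None
    return d_s / d_n
```

Because the corrected distance grows without bound as p approaches 0.75, comparisons with synonymous proportions just below that limit give very large dS values, while those just above it cannot be given a value at all.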

PRP8 and VMA intein substitutions at the Asp active sites
To explore the activity of the PRP8 homing endonucleases further we aligned the active site residues [9,36] of the homing endonucleases from PRP8 and yeast VMA inteins.
As an example, the two aspartic acid residues (Asp-218 and Asp-326 in the S. cerevisiae VMA intein; marked by asterisks in Figure 1C) are involved in the active site coordination of a divalent metal ion, and are critical for activity of the homing endonuclease. These two aspartic acid residues are conserved in the active VMA homing endonucleases (S. cerevisiae, S. cariocanus and Z. baillii).
The only inactive homing endonucleases to retain both these sites are from the two Torulaspora species; these homing endonucleases are closely similar to the active Z. baillii homing endonuclease. We believe, therefore, that the Torulaspora homing endonucleases have only recently become inactive. Of the two VMA homing endonucleases whose functionality is unknown, the one from C. glabrata lacks both these critical aspartic acid residues and the D. hansenii homing endonuclease lacks the proximal aspartic acid. The homing endonuclease domains of all of the ascomycete PRP8 inteins, except PbrPRP8 and ClaPRP8, have both these critical aspartates, supporting the belief that these homing endonucleases are active.

Euascomycete fungi lacking a PRP8 intein
During our systematic search for further PRP8 inteins using the public sequence databases (last search September 26th 2005), we encountered several instances where there was no PRP8 intein in close relatives of species known to contain an intein (see Table 4 for the complete list). For example, Coccidioides posadasii and Coccidioides immitis (Order: Onygales) are very closely related to Uncinocarpus reesii (Bowman et al., 1996), but neither species of Coccidioides contains an intein, whereas U. reesii does (Table 1, Table 4). In the family Sclerotiniaceae, B. cinerea and Sclerotinia sclerotiorum are very closely related [42] but there is no PRP8 intein in S. sclerotiorum (Table 4). Several species of Aspergillus are also known not to have a PRP8 intein [25] (Table 4).
Within the section Clavati of the genus Aspergillus there are two major clusters of species [29,31]. One cluster includes A. clavatus (which has no PRP8 intein) and the other includes A. giganteus, which has a PRP8 mini-intein quite distinct from those present in species from the section Fumigati. Within the Fumigati, only one of the species analysed (A. unilateralis) was without an intein.
PRP8 inteins may eventually be found in other isolates of some of these apparently 'intein-less' species. The data in Table 4 are derived from single isolates of each species and it is possible that species will be found that are polymorphic for the PRP8 intein.

Distribution of PRP8 inteins
In addition to the sporadic nature of PRP8 intein distribution within the euascomycetes mentioned above, the wider distribution of the PRP8 intein is also decidedly sporadic (Figure 4, Table 4). The intein is present in species from three orders of euascomycetes (Pezizomycotina) and a very narrow range of basidiomycete species. This would suggest a widespread occurrence and an ancient origin for the intein within the euascomycetes; however, most euascomycetes for which there is PRP8 sequence data available do not have an intein in PRP8. There is no intein encoded in the PRP8 gene of any species from the other major ascomycete class, hemiascomycetes (Saccharomycotina); for example, Candida albicans, Candida guilliermondii, Eremothecium gossypii, Kluyveromyces waltii, Saccharomyces cerevisiae. Neither of the two archaeascomycetes (Schizosaccharomyces pombe, Pneumocystis carinii) for which PRP8 sequence data are available encodes a PRP8 intein. In summary, if the PRP8 intein is descended from a precursor present in the common ancestor of the Ascomycota, the loss of the intein must have been a frequent occurrence, either by deletion or by fixation of an empty allele.
The only other PRP8 inteins are present in four species of Cryptococcus (a basidiomycete genus from another phylum). A survey of other Cryptococcus species within the Tremellales has not yet yielded any further PRP8 intein sequences but has shown 'empty sites' in eight species, including in C. amylolentus, the most closely related species to C. neoformans and C. gattii [20].

Phylogenetic analyses
To better understand the sporadic distribution of the PRP8 inteins we analysed the sequences of the PRP8 host proteins and PRP8 intein splicing domains. This analysis should determine if the inteins have a similar phylogeny to that of the PRP8 proteins (suggesting vertical descent of the intein) or, alternatively, have discordant phylogenies suggesting horizontal transfer. We chose to concentrate the phylogenetic analysis on the splicing domains for two main reasons. Firstly, C. neoformans and C. gattii have PRP8 mini-inteins, as do several of the euascomycetes. These inteins therefore have splicing domains but no homing endonucleases. Secondly, it has been shown that many of the yeast VMA homing endonucleases are nonfunctional and this will influence the rate of substitution and therefore potentially disturb phylogenetic analyses.
The phylogeny of the PRP8 proteins follows the expected organism phylogeny (Figure 4). The basidiomycete PRP8 sequences (Cryptococcus, Coprinopsis, Ustilago and Phanerochaete) fall within the fungal group, but are outside those of the ascomycete species. The euascomycete PRP8 proteins group into their respective orders, all well separated from hemiascomycete PRP8 proteins. There are no paralogs of the PRP8 gene present in any genome that might perturb this phylogeny.
Further phylogenetic analyses were based on an alignment of the four sequence blocks/motifs of the splicing domains of the fungal inteins. These analyses indicate that the splicing domains of the euascomycete PRP8 inteins are closely similar to those of the Cryptococcus PRP8 inteins (Figure 5). There is much less variation between the splicing domain sequences of the euascomycete PRP8 inteins and the splicing domains of the Cryptococcus PRP8 inteins than there is within the splicing domains of the VMA inteins found in the single family Saccharomycetaceae. The basidiomycete and ascomycete fungi last shared a common ancestor more than 500 million years ago, while two of the most diverged VMA intein carrying species (Kluyveromyces lactis and Saccharomyces cerevisiae) diverged approximately 70 million years ago [43]. The close similarity of the basidiomycete and euascomycete PRP8 inteins does not reflect the length of time since the divergence of their host organisms, and suggests horizontal transfer from a euascomycete to a basidiomycete.
Figure 4. Phylogenetic tree based on an alignment of PRP8 proteins. The tree was constructed by the neighbour-joining method using PAUP* [61] and is a consensus derived from 1000 bootstrap replicates.
Figure 5. Phylogenetic tree of inteins based on an alignment of the four protein splicing domain motifs. Phylogenetic tree based on an alignment of the four protein splicing motifs from the PRP8 inteins and the VMA inteins, together with splicing domains from inteins found in eukaryote viruses and plastids. The tree was constructed by the neighbour-joining method using PAUP* [61] and is a consensus derived from 1000 bootstrap replicates. The numbers indicate the percentage bootstrap support (only nodes with >50% support are shown). Accession numbers of the intein sequences are as in Figures 1 and 3 or can be obtained from InBase [62]. S. dairenensis and S. castellii are now included in the newly described genus Naumovia; S. exiguus and S. unisporus are now included in the genus Kazachstania [70].
Figure 6. Phylogenetic trees based on alignments of the homing endonuclease domains. Phylogenetic trees based on alignments of the whole homing endonuclease from: A. PRP8 full-length inteins. B. VMA inteins. Asterisks denote the VMA intein homing endonucleases known to be active [9]. The trees were constructed by the neighbour-joining method using PAUP* [61] and each represents a consensus derived from 100 bootstrap replicates. The numbers indicate the percentage bootstrap support (only nodes with >50% support are shown). Accession numbers of the intein sequences are as in Figures 1 and 3 or can be obtained from InBase [62]. S. dairenensis and S. castellii are now included in the newly described genus Naumovia; S. exiguus and S. unisporus are now included in the genus Kazachstania [70].
The splicing domains used in this alignment span only 126 residues and thus provide insufficient discrimination within the Fumigati group of the Aspergillus species to determine within-group relationships. Inteins carried by species in more taxonomically distant groups are clearly separated however (for example those in the Onygales order), indicating significant phylogenetic signal is present.
Phylogenetic analysis of the PRP8 homing endonuclease domains was limited to the nine full-length inteins in Table 1. The phylogram (Figure 6) indicates an unresolved polytomy, reflecting the highly diverged nature of the homing endonucleases, with the exception of the closely related Aspergillus fumigatus/Neosartorya fischeri group and the moderately related Histoplasma capsulatum/Paracoccidioides brasiliensis group. This extensive divergence means that nothing can be concluded from these data about the phylogenetic relationship of the Cryptococcus laurentii PRP8 endonuclease with the euascomycetes. These results fit with the conclusions derived from the synonymous substitution analysis; the PRP8 intein shows great diversity. The comparable VMA phylogram also shows a great deal of diversity in the homing endonuclease domain (Figure 6); however, much of this diversity is in inactive endonucleases that would be expected to evolve more quickly.
The PRP8 and VMA HEG domains show comparable protein diversity. In contrast, the synonymous substitution frequency has reached saturation in the PRP8 HEGs, while there is much less synonymous substitution present in the VMA HEGs. These observations taken together suggest that the PRP8 intein is the more ancient element.

Discussion
In the introduction we described a model in which HEGs (and more specifically, homing endonuclease-containing inteins) undergo recurrent cycles of (i) horizontal transmission to new genomes, (ii) spread and fixation in the recipient population by homing, (iii) degeneration due to the absence of target sequences and (iv) eventual loss [11]. This general model could be applied to any HEG but the HEGs of the VMA inteins of the saccharomycetes provided much of the experimental basis for the model. Our results allow evaluation of this model in the context of a second group of nuclear inteins. Relevant gene distribution models, used to investigate patchy, non-phylogenetic gene distributions, have been the subjects of several studies [44,45].

Full length inteins: active and inactive homing endonucleases
The majority of the homing endonucleases encoded by the VMA intein sequences are no longer functional [9]. In vitro assays of the activity of the VMA intein homing endonucleases have shown that only three are functional [9]. The dS/dN values derived from comparisons of these active inteins (calculated across a concatenation of the conserved motifs C, D, E and H) are higher on average than the dS/dN values produced when inactive homing endonucleases are compared. As noted by Koufopanou and Burt [36]  Even though the PRP8 homing endonucleases appear to be constrained by selection, they are nevertheless quite diverse. This may be attributable, in part, to a long evolutionary separation but it also seems to reflect a relatively relaxed selection as compared to the splicing domain. The diversity of the homing endonucleases is shown by amino acid substitution as illustrated in the phylogenetic tree (Figure 6), but there is also frequent indel occurrence. This diversity makes the homing endonucleases useful sequences for the study of the phylogenetic relationships of intein-carrying fungi. Many major human pathogens (A. fumigatus, Histoplasma, Paracoccidioides) and plant pathogens (Botrytis) carry inteins with homing endonucleases that, because of their diversity, can facilitate phylogenetic analysis.

Mini-inteins: artificial and natural
The VMA inteins of the saccharomycetes are all full-length, homing endonuclease-containing inteins. No allelic VMA mini-inteins have been reported. Chong and Xu [13] created artificial, splicing-competent, VMA inteins by the deletion of the internal homing endonuclease region (motifs C, D, E and F) from the S. cerevisiae VMA intein. In order for the splicing activity to be restored in these artificial constructs, the 183 amino acid residue deletion had to be replaced by a 14 or 19 amino acid residue 'linker' region. Chong and Xu [13] suggest that the linker provides sufficient flexibility to allow correct conformation for efficient splicing. This result demonstrates that a single deletion event removing all of the HEG is compatible with retaining splicing function (that is, the HEG and splicing functions are separate).
In contrast to the VMA inteins of saccharomycetes and the GLT1 inteins present in other ascomycetes [46], which are all naturally full-length, the PRP8 inteins of the basidiomycetes and euascomycetes are found both as mini-inteins and homing endonuclease-containing inteins. We previously reported that the PRP8 genes of Cryptococcus neoformans and C. gattii encode a mini-intein [19,20]. The full-length euascomycete PRP8 inteins vary in length from 517 to 838 residues; the mini-inteins, in contrast, are all of very similar length (153 to 180 residues), with only the A. giganteus and Cryptococcus mini-inteins being somewhat different in length. We have not detected inteins of an intermediate length or inteins with only remnants of any of the homing endonuclease domains. This raises the interesting questions of why the PRP8 system should have frequent mini-inteins and also how a full-length intein might be precisely reduced to a mini-intein.

Empty alleles and the intein cycle
Inteins are often present at highly conserved sites in their host proteins, presumably because such inteins are less likely to be deleted. The model of Burt and Koufopanou [11] proposes that inteins can be deleted to regenerate target sites and that target sites can be invaded through horizontal transmission. The allelic PRP8 inteins are present at a highly conserved site in a highly conserved protein that forms an essential part of the spliceosome [23]. The elimination of the intein or mini-intein encoding sequence would have to be exact to allow the continued function of the protein encoded by the PRP8 gene. Another aspect of the model is the expectation that the frequency of the HEG allele will increase within the gene pool and eventually come to fixation. The model therefore proposes that, if a HEG is present in a species, then all members of a species should eventually carry the intein. This is not the case for S. cerevisiae where some members do not have the VMA intein [18; and authors' unpublished data], or for Candida (Pichia) guilliermondii where some members of the species do not contain the GLT1 intein [46,47]. The genome sequencing strain of Botrytis cinerea has a full-length PRP8 intein, while a strain isolated in New Zealand has no allelic intein. We are investigating the intein status of other B. cinerea strains. At present it therefore seems true that inteins are not found at fixation in many species, perhaps most species. Aspergillus fumigatus may have the intein at fixation but this species seems to be an asexual clone with little polymorphism. The model of Burt and Koufopanou [11] may require modification for asexual or predominantly asexual species such as A. fumigatus. As pointed out by these authors, the amount of gene flow within (or between) populations and species will determine how rapidly an element such as an intein will come to fixation [8,11]. The VMA intein is only capable of homing during meiosis. 
If this restriction applies to other eukaryote inteins one would expect asexual species to lose their inteins more rapidly than sexual species.
Liu and Yang [25]  To determine if an intein is present in a species it is necessary to analyse more than one member of a species (this is a limitation of the data in Table 4). Similarly, before concluding that all members of a sexually reproducing fungal species carry an intein (as the model expects will be the case at fixation) it is necessary to test a significant number of isolates from diverse origins, as suggested by Burt and Koufopanou [11]. This prediction was investigated by Okuda et al. [18], who sequenced the VMA1 gene (and any accompanying VMA intein sequence) from 10 strains of Saccharomyces cerevisiae. They detected two different groups of VMA intein sequence: one group of three was identical to the intein sequence in strain DH1-1A and the other group of six was identical to that in strain X2180-1A. The two groups were 96% identical to each other. A final strain, NKY278 (1) did not contain an intein in VMA. A further strain of S. cerevisiae, the natural isolate RM11-1a, does not contain an allelic VMA intein [authors' unpublished data; 48]. It follows that not all members of the species contain an intein, even though the S. cerevisiae VMA homing endonuclease is active.
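The need to sample many isolates before concluding that an intein is at fixation can be illustrated with simple sampling arithmetic. The sketch below assumes that isolates are drawn independently from a population in which a given fraction of individuals carry the empty allele; the function names and the example frequency of 0.1 are illustrative assumptions, not values estimated in this study.

```python
import math


def prob_all_carry_intein(empty_freq, n_isolates):
    """Probability that every one of n independently sampled isolates
    carries the intein when the empty allele has frequency empty_freq."""
    return (1.0 - empty_freq) ** n_isolates


def isolates_needed(empty_freq, confidence=0.95):
    """Smallest number of isolates giving at least the requested chance
    of sampling one or more empty alleles (independent sampling assumed)."""
    if not 0.0 < empty_freq < 1.0:
        raise ValueError("empty_freq must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    miss_probability = 1.0 - confidence
    return math.ceil(math.log(miss_probability) / math.log(1.0 - empty_freq))


if __name__ == "__main__":
    # Hypothetical empty-allele frequency, chosen only for illustration.
    print(prob_all_carry_intein(0.1, 10))
    print(isolates_needed(0.1))
```

Under these assumptions, with an empty-allele frequency of 0.1 a sample of ten isolates would consist entirely of intein carriers about 35% of the time, and about 29 isolates would be needed for a 95% chance of observing at least one empty allele.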

Intein, mini-intein, empty allele polymorphism
It is possible that inteins do not typically come to fixation but rather achieve an equilibrium with their empty targets within the gene pool. For example, it may be that the presence of an intein is not perfectly neutral but confers a very slight selective disadvantage due to the added process of protein splicing or the occurrence of ectopic homing endonuclease cleavage. If inteins exist in equilibrium with empty target sites within a gene pool this would allow continuous selection for homing endonuclease function during extended periods of vertical transmission. If true, this would explain how an active homing endonuclease could be retained during an extended period of vertical transmission. Segregation of the empty allele (without any requirement for deletion) would also explain the frequent occurrence of species lacking the intein. This modified model may be applicable to the filamentous fungi such as Aspergillus fumigatus that, because of their abundant aerial mitotic conidia, have global distributions and therefore very large effective gene pools.

Intein phylogeny and horizontal transmission
The only occurrences of an intein in PRP8 outside the euascomycetes are those found in closely related, pathogenic, varieties of Cryptococcus neoformans and C. gattii (members of the genus Filobasidiella) and in the occasional pathogen Cryptococcus laurentii. The absence of an intein encoded in the PRP8 gene of species closely related to Cryptococcus neoformans and C. gattii poses an interesting puzzle as to the evolutionary origin of the immobile mini-inteins at this site in the Cryptococcus PRP8 protein.
If the intein were inherited vertically, it might be expected to be present in all, or at least some, of the other members of the Tremellales group, whereas the only other Cryptococcus species in which a PRP8 intein has been detected is Cryptococcus laurentii [20]. This species contains a full-length homing endonuclease-encoding intein in PRP8. C. laurentii is a species from a different cluster of the Tremellales group than C. neoformans [49]. It might be suggested as a source for the C. neoformans mini-inteins. The intein is absent, however, from all the other Tremellales species that have been examined, including another member of the Filobasidiella genus (C. amylolentus). The common ancestor of pathogenic cryptococci may have gained the intein by horizontal transmission of an HEG-containing intein sequence from another PRP8 gene. The presence of an allelic, HEG-containing intein sequence in the PRP8 gene of a group of euascomycetes suggests that this euascomycete group is a possible source of the Cryptococcus mini-intein.

Conclusion
Our analyses of the PRP8 intein group have shown many points of difference between PRP8 inteins and the VMA inteins of saccharomycetes. The PRP8 intein has been found in members of two phyla (ascomycetes and basidiomycetes), while the VMA intein of yeast is restricted to several genera in one family, the Saccharomycetaceae. Many PRP8 inteins are mini-inteins, whereas no mini-inteins have been found in the VMA-a site of members of the Saccharomycetaceae. Most of the VMA inteins contain an inactive homing endonuclease. In contrast, the homing endonucleases of the PRP8 inteins are apparently active (they show high dS/dN values). The phylogeny of the euascomycete PRP8 inteins provides no evidence for horizontal transfer. Comparison of the homing endonuclease domains indicates that the level of synonymous change has reached saturation (pS > 0.74), suggesting that the homing endonucleases have been diverging over a substantial period of evolutionary time. Despite this extended period of vertical transmission, the homing endonucleases have apparently remained active. The VMA homing endonuclease is not saturated by synonymous substitutions (pS < 0.74). This implies that the VMA intein allelic group is of more recent origin than the PRP8 allelic group, even though the homing endonuclease amino acid sequences are more divergent than those of the PRP8 inteins. The extensive divergence in the VMA homing endonucleases is attributable to their loss of function.
We submit a modified model for intein evolution that may be more appropriate for inteins present in euascomycete species. We suggest that inteins may not become fixed in a population, but that their presence/absence may be polymorphic in a particular species. This would allow continuous selection for homing endonuclease function during extended periods of vertical transmission. Segregation of the empty allele (without any requirement for deletion) would also explain the frequent occurrence of closely related species that lack the intein.

Intein sequencing
Isolates of species of Aspergillus or Neosartorya were obtained from Food Science Australia [50] except for Aspergillus nidulans strain R20 which originates from Glasgow University and A. fumigatus var. ellipticus (NRRL5109) and strains of A. lentulus (FH4, FH5, FH7 and FH220) which were provided by Arun Balajee of the Fred Hutchinson Cancer Research Center, Seattle. Strains were grown on Aspergillus nutrient agar [51] at 27°C, or 37°C for A. fumigatus. A strain of Botrytis cinerea, isolated from a vineyard in New Zealand, was provided by colleagues at Lincoln University. Genomic DNA was isolated from 50 ml overnight cultures essentially using the method of Philippsen et al. [52]. Genomic DNA samples from species within the section Fumigati were supplied by Carla Rydholm of Duke University. DNA from Paracoccidioides brasiliensis (strain Pb18) was a gift of Professor Gustavo Goldman from the Universidade de São Paulo, Brazil.
Amplification of the intein sequence and flanking regions was accomplished with the Expand High Fidelity PCR system (Roche, Mannheim, Germany) as outlined in Butler, Goodwin and Poulter [19]. Primers were synthesised by Proligo, Singapore; the primer sequences used are shown in Table 5. Amplifications of the internal transcribed spacer (ITS) regions, including the 5.8S rRNA gene, were performed using primers ITS1 (5' TCCGTAGGTGAACCTGCGG) and ITS4 (5' TCCTCCGCTTATTGATATGC) [53]; the mitochondrial cytochrome b genes were amplified using the primers Wang_E1m (5' TGAGGTGCTACAGTTATTAC) and Wang_E2rev (5' GGTATAG[AC]TCTTAA[AT]ATAGC) [54]. The resulting PCR products were purified with Qiagen columns (Hilden, Germany) prior to automatic sequencing at the Allan Wilson Centre Genome Service at Massey University [55] using an ABI 3730 DNA Sequencer.
General sequence analyses were done using 4Peaks V1.6 [56] and the Wisconsin GCG package (Genetics Computer Group, 575 Science Drive, Madison, Wis.). Sequence similarity searches were performed using the National Center for Biotechnology Information BLAST server [57] and at the various fungal genome sequencing project web sites (see below). Multiple sequence alignments were constructed using CLUSTAL_X at the European Bioinformatics Institute server [58], edited with Seaview [59] and shaded with MacBoxshade [60]. Phylogenetic trees were constructed using PAUP4b10 [61]. The rates of synonymous and non-synonymous substitutions within the intein HEG domains were calculated using Syn-SCAN, a program found within the Resources section of the Stanford HIV RT and Protease Sequence database [38,39].
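The bracketed positions in the Wang_E2rev primer can be read as degenerate sites, so that the primer stands for a small pool of oligonucleotides. The sketch below assumes this reading of the bracket notation (each bracket lists the alternative bases at one position); the function name is hypothetical and the code is only an illustration of how such a primer could be expanded, for example before searching genome data for binding sites.

```python
import itertools
import re

# A primer is a run of plain bases and bracketed alternatives, e.g. G[AC]T.
_PRIMER_PATTERN = re.compile(r"(?:\[[ACGT]+\]|[ACGT])+")
_TOKEN_PATTERN = re.compile(r"\[([ACGT]+)\]|([ACGT])")


def expand_primer(primer):
    """Return every concrete sequence encoded by a bracket-notation primer."""
    compact = primer.replace(" ", "").upper()
    if not _PRIMER_PATTERN.fullmatch(compact):
        raise ValueError("unexpected characters in primer: %r" % primer)
    # Each token contributes either one fixed base or a set of alternatives.
    choices = [group or base for group, base in _TOKEN_PATTERN.findall(compact)]
    return ["".join(combo) for combo in itertools.product(*choices)]


if __name__ == "__main__":
    for variant in expand_primer("GGTATAG[AC]TCTTAA[AT]ATAGC"):
        print(variant)
```

With two positions of two alternatives each, the primer expands to four 20-base sequences; a primer without brackets, such as ITS4, expands to itself.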
Representatives of the newly described intein sequences in the PRP8 genes and flanking sequences have been assigned GenBank accession numbers AY832918-AY832926 (Table 1). Descriptions of the ascomycete PRP8 inteins have been added to those of the Cryptococcus inteins at InBase records [62].

Fungal genome sequencing projects
Histoplasma capsulatum
Genomic sequence data were from the Genome Sequencing Center at Washington University in St. Louis [26] where two distinct strains of H. capsulatum (G217B and G186AR) are being sequenced, and from the Broad Institute [48].

Paracoccidioides brasiliensis
Data consist of a large set of P. brasiliensis ESTs generated by a group of laboratories in Brazil in order to gather information about the differences in yeast and mycelial transcriptomes of this pathogenic fungus [63]. Data are available via GenBank accessions CA580326-CA584263.  [65]. The Génolevures website [66] provides data and search tools relating to the complete genomes of four hemiascomycetes and the initial data from random sequencing of nine other hemiascomycete species.
We also searched the database held by the Consortium for the Functional Genomics of Microbial Eukaryotes (Cogeme). In this project, expressed sequence tags (ESTs) have been obtained from thirteen plant pathogenic fungi and two plant pathogenic oomycetes [67]. ESTs representing the same gene have been used to produce a single contig or consensus sequence and a BLAST facility is available at the website [68]. We also used data provided by the US Department of Energy's Joint Genome Institute [69] derived from the genome of the basidiomycete Phanerochaete chrysosporium (the white rot fungus).