Group I introns and associated homing endonuclease genes reveals a clinal structure for Porphyra spiralis var. amplifolia (Bangiales, Rhodophyta) along the Eastern coast of South America

Background Group I introns are found in the nuclear small subunit ribosomal RNA gene (SSU rDNA) of some species of the genus Porphyra (Bangiales, Rhodophyta). Size polymorphisms in group I introns has been interpreted as the result of the degeneration of homing endonuclease genes (HEG) inserted in peripheral loops of intron paired elements. In this study, intron size polymorphisms were characterized for different Porphyra spiralis var. amplifolia (PSA) populations on the Southern Brazilian coast, and were used to infer genetic relationships and genetic structure of these PSA populations, in addition to cox2-3 and rbcL-S regions. Introns of different sizes were tested qualitatively for in vitro self-splicing. Results Five intron size polymorphisms within 17 haplotypes were obtained from 80 individuals representing eight localities along the distribution of PSA in the Eastern coast of South America. In order to infer genetic structure and genetic relationships of PSA, these polymorphisms and haplotypes were used as markers for pairwise Fst analyses, Mantel's test and median joining network. The five cox2-3 haplotypes and the unique rbcL-S haplotype were used as markers for summary statistics, neutrality tests Tajima's D and Fu's Fs and for median joining network analyses. An event of demographic expansion from a population with low effective number, followed by a pattern of isolation by distance was obtained for PSA populations with the three analyses. In vitro experiments have shown that introns of different lengths were able to self-splice from pre-RNA transcripts. Conclusion The findings indicated that degenerated HEGs are reminiscent of the presence of a full-length and functional HEG, once fixed for PSA populations. The cline of HEG degeneration determined the pattern of isolation by distance. Analyses with the other markers indicated an event of demographic expansion from a population with low effective number. The different degrees of degeneration of the HEG do not refrain intron self-splicing. To our knowledge, this was the first study to address intraspecific evolutionary history of a nuclear group I intron; to use nuclear, mitochondrial and chloroplast DNA for population level analyses of Porphyra; and intron size polymorphism as a marker for population genetics.


Background
Group I introns belong to a family of RNAs with catalytic activities. These ribozymes are mobile elements inserted within coding sequences of nuclear rDNA, chloroplast and mitochondrial genomes of some eukaryotes; and less frequently within coding sequences of eubacteria, phages and viruses. Group I introns catalyze their own excision (self-splicing) from pre-mRNA when mature RNA is being processed. The exact site of intron excision and the perfect reestablishment of the interrupted message are defined by specific interactions between intron and exons, determined by a conserved secondary structure. Group I introns fold on a structure, forming 10 conserved paired elements (P1-P10) with a conserved catalytic core [reviewed in [1]].
Size polymorphisms in group I introns have been described [2][3][4][5] and are occasionally generated by the insertion of a mobile element such as homing endonuclease genes (HEG) in peripheral loops of intron paired elements P1, P2, P6, P8 and P9 [4,6]. Homing endonuclease genes encode for site specific homing endonucleases (HEs), which in genetic crosses between an HEG containing intron allele and an intronless allele, recognize the intron insertion site and catalyze a double strand break. The intronless allele is then repaired using the HEG containing intron allele as a template. This mechanism of intron mobility is known as homing [6]. Homing endonucleases are classified in five different families according to conserved protein motifs and functional and structural properties: LAGLIDADG; GIY-YIG; H-N-H; His-Cys box [6][7][8]; and the recently described PD-(D/E)-XK motif [9]. His-Cys box motifs are identified in HE exclusively associated to nuclear group I introns [10]. Homing endonucleases were described for fungi, protists, bacteria and viruses, but with unknown function for the hosts [6].
In a previous work, Oliveira and Ragan [2] characterized introns of different sizes inserted in the nuclear small subunit rRNA gene (SSU rDNA) close to the 3' end (intron S1506) of three Porphyra spiralis var. amplifolia (PSA) individuals collected at different sites on the Southern Brazilian coast. Open reading frames (ORFs) with His-Cys Box motifs were described inserted in the P1 paired element, confined within the conserved pair U*G, located in the SSU rDNA exon and in the intron respectively in the complementary strand. This region is known as P1-extension [4]. These findings prompted us to: 1) characterize introns size polymorphisms at different PSA populations on the Eastern coast of South America; 2) Infer genetic relationships and population structure of PSA populations using introns in addition to rbcL-S and cox2-3 regions as genetic markers; and 3) Verify if the different polymorphisms in peripheral loop of intron P1 paired element affected qualitatively introns excision, through an in vitro self-splicing assay.

DNA extraction, PCR amplification and sequencing
Population samples of Porphyra spiralis var. amplifolia were collected at eight different sites in the Southern Brazilian shore (Table 1, Figure 1). A minimum of 10 individuals were obtained from each site. Gametophyte blades were identified based on morphological description [24], and did not present any meaningful morphological variation among and within populations. Samples were screened for epiphytes using a stereomicroscope, and stored individually in silica gel. Each individual was ground in liquid nitrogen and total genomic DNA was extracted using the "DNeasy Plant Mini Kit" (Qiagen, Santa Clarita, CA), according to manufacturer's specifications. Voucher specimens are deposited at University of São Paulo herbarium (SPF, Table 1).
Total DNA was extracted from 10 individuals for each of the eight geographic locations. Primers 1400F and 18S3' were used to amplify part of the 3' end of the first SSU rDNA exon + intron + the 5' end of the second SSU rDNA exon; as a positive control for intron presence in the multiple SSU rDNA copies [25] primers 1400F and iR2 were used to amplify HEG-containing ORF, including the flanking 213 bp of the SSU rDNA 5'exon and 175 bp of the intron. Primers COX 2F and COX 3R were used to amplify the 3' end of cox2 gene + spacer + 5' end of cox3 gene; and primers F993 and RBCS3'R were used to amplify the 3' end of rbcL gene + spacer + 5' end of rbcS gene.
PCR amplification conditions for a total volume of 50 μL were: 1× PCR buffer, 1 Table 2. Negative controls for PCR reactions, that included all reagents except DNA template, were performed. At least three independent PCR reactions were pooled together before sequencing [26].
PCR products were purified using the MicroSpin™ S-300 HR Columns (Amersham Pharmacia Biotech, Piscataway, NJ), and were directly sequenced on an ABI PRISM™ 310 Genetic Analyser or 3100 DNA Sequencer (Applied Biosystems, Foster City, CA) using the sequencing kit " BigDye™ Terminator Cycle Sequencing Ready Reaction" (Applied Biosystems) according to manufacturer's specifications. Sequences were manually assembled and aligned with BioEdit version 5.0.6 [27]. Ambiguous nucleotides within the same individual sequence position were checked against the sequencing chromatograms, to confirm validity of the nucleotide.
All parameters implemented in NETWORK were set to default: Characters weights (10 for all characters), transversions/transitions ratios (1:1) and the distance calculation method (connection cost). Parameter epsilon, a weighted genetic distance measure, was set to 0. Population genetics analyses for intron + HEG were carried out using Arlequin [32]. The dataset was input as sequence length polymorphism (based on PCR results) with ten individuals per population. Introns were grouped in four categories according to their size: 1-616 bp; 2-from 791 to 792 bp; 3-909 bp; 4-from 1054 to 1058 bp. This dataset was analyzed for F-statistics implementation. This software was also used for Mantel's test of isolation by distance. PSA-B, PSA-D and PSA-R individuals were excluded from these analyses (data available for only one individual per population).
Intron nomenclature adopted in this work (i.e. S1506) was modified from Johansen and Haugen [33] and the insertion location of the introns is given according to the reference position in Escherichia coli SSU rDNA.

Cloning and in vitro transcription
Primers 1400F and 18S3' were used for the amplification of the nuclear SSU rDNA intron, including the flanking 213 bp of the SSU rDNA 5'exon and 27 bp of the SSU rDNA 3'exon. Amplicons of one individual of PSA-G, PSA-L, PSA-T and PSA-V populations were cloned according to manufacturer's specifications in pGEM ® -T vectors (Promega Corporation, Madison, WI), and were replicated in E. coli DH10B. Plasmids were recovered and purified with Wizard ® plus SV Minipreps DNA Purification System (Promega) according to manufacturer's protocol. Inserts were PCR amplified with primers T7 and 18S3' for an in vitro transcription assay (Table 2), and purified with the kit Wizard ® SV Gel and PCR Clean up System Map of South America highlighting Porphyra spiralis var. amplifolia (PSA) collection sites  Table 1. * Data obtained from Oliveira and Ragan [2]. Freshwater et al. [46] (Promega) according to manufacturer's protocol. Negative controls for PCR reactions were performed.

Intron in vitro self-splicing assays
The extracted RNAs were tested for intron qualitative in vitro self-splicing by the following assay: an aliquot of each transcription reaction was incubated at 45°C for 45 min, in the presence of the self-splicing buffer as described in Sogin  PCR products were purified from agarose gels using Wizard ® SV Gel and PCR clean up system kit (Promega) and were re-amplified with primers 1400F and 18S3' as described above. The PCR cycle used was as described for RT-PCR. PCR products were purified with Wizard ® SV Gel and PCR clean up system kit (Promega) and were directly sequenced as described above.  Complementary strands of P1-extensions of the 17 haplotypes were translated in silico to amino acid sequences according to Haugen et al. [4]. Open reading frames from eight to 150 amino acids, were generated (Table 3, Figure  2) and were blasted against other available proteins in GenBank with BLASTP [36]. Start codons for HEG were found in all ORFs, however when compared to others HEs, premature stop codons or stop codon deletions were observed. His-Cys box motifs, zinc binding sites and active sites were characterized on intron-coding complementary strands for six PSA individuals from three different populations (Haplotypes H2, H4 and H5, Table 3, Figure 2). For the remaining introns, His-Cys box motifs, zinc binding sites and active sites were only verified when frame-shifts corrections were manually inserted in silico, or were absent. These results indicate that P1-extension of the 17 haplotypes are degenerated HEG.

Genetic relationships and population structure of PSA populations
To determine the genetic relationships among the studied PSA populations, three median joining networks were constructed. The first one included intron + HEG sequences ( Figure 3A), the second was performed with intron without HEG sequence data ( Figure 3B), and the third was constructed with cox2-3 region sequences (Figure 3C).
The network generated for intron + HEG sequences shows three main clusters connected by median vectors, which represent missing intermediates, that is extant haplotype that was not sampled or an extinct ancestral haplotype [37].  The results for pairwise Fst analyses of frequency of allele size polymorphisms are presented in Table 5. It was possible to note three distinct patterns of significant Fst values, according to population geographic distribution. The minimum spanning tree generated for intron without HEG ( Figure 3B) and cox2-3 region sequences ( Figure 3C) exhibit a star-like topology. For intron without HEG, the most frequent central haplotype occurred in 9 out of 11 collection sites (39 of the 44 individuals; 89%). For cox2-3 region sequences, the majority of the individuals (7 of the 8 populations accounting for 30 out of 40 individuals analyzed; 75%) possess the most frequent central haplotype. According to both networks topologies, a recent population expansion was detected for these markers. To confirm whether the exons were ligated, a RT-PCR reaction was performed with primers 1400F and 18S3' using the RNA previously incubated in the conditions described above. The bands were excised from the gel and re-amplified with primers 1400F and 18S3' which anchors in the exons. The results of the re-amplification are shown in additional file 3. Amplicons were sequenced and the smaller bands were the joined 5' and 3' exons, presenting the reconstructed insertion site.

Characterization of introns size polymorphisms at different PSA populations
Group I introns are well documented in the literature occurring in the red algal genera Bangia and Porphyra (Order Bangiales) [5,[13][14][15]. Some of these group I introns present ORFs of different sizes inserted in their P1 and P2 paired elements [3][4][5]. Size variation in these ORFs represents different stages of the HEG cycle (full length, degenerated or absent). Goddard and Burt [38] postulated that the HE coded by an intron recognizes the intron insertion site in an intronless population, invade it by lateral transfer and then it is vertically transmitted to the offspring. After being fixed with high frequencies in a population, the HEG degenerates to a non-functional state, and then the intron and the HEG tend to be lost. In this way, the intron recognition site is reestablished becoming available to be invaded again by an active HEGcontaining intron from the same species or from a closely related species, thus restarting the homing cycle.
In a previous work, Oliveira and Ragan [2] characterized three size polymorphisms for group I introns from three PSA individuals. In this work, two more sizes were characterized, in a total of five different sizes distributed in 11 PSA populations along the Brazilian coast. According to the cycle proposed by Goddard and Burt [38], HEGs can be found in three different character states: functional (full length), nonfunctional (degenerated) and absent (both HEG and intron). In PSA populations analyzed in this work, we could only detect the nonfunctional state represented by HEG degeneration, indicating that full length HEG containing intron was once fixed for these populations. The different states of the intron + HEG are not always found within natural populations, probably as a result of insufficient sampling [1].
Müller et al. [5] evaluated if the cycle proposed by Goddard and Burt [38], applied to group I introns present in the order Bangiales. Presence of introns containing degenerated HEGs, presence of introns without HEG and absence of introns, all these states scattered along individ-

PSA-
uals from different species, indicated that Goddard & Burt [38] model is supported by intron + HEG distribution in the order Bangiales.
Of the 44 introns sequences, only six presented the intact His-Cys box motif. Although these ORFs did not present frameshifts mutation, they terminated prematurely relatively to the amino acid sequence for the homing endonuclease I-PpoI from the slime mould Physarum polycephalum Median-joining networks: A -Introns + homing endonuclease gene (HEG); B -and introns without HEG; C -Cox2-3 region [39], likewise Porphyra fucicola and P. umbilicalis HEG sequences [5]. As these sequences were not tested for endonuclease activity, they will be considered as HEG pseudogenes, as suggested by Müller et al. [5].

Cox2-3 and rbcL-S regions analyses
Cox2-3 and rbcL-S regions were sequenced in addition to the SSU rDNA introns, to infer genealogical relationships of PSA populations. Divergence among cox2-3 haplotypes of the 40 PSA individuals sequenced ranged from 0.2% to 0.7%. These values are in accordance to the divergence found among cox2-3 haplotypes from Grateloupia doryphora (0.3% to 0.6%) from North Atlantic and North Pacific [19]. However, they differ significantly when compared to the divergence among cox2-3 region from Batrachospermum helminthosum individuals from North America, 0.3% to 6.5% [22] and from Acanthophora spicifera individuals from the Hawaiian Islands, where a single haplotype was observed [40].
RbcL-S region has been employed in Rhodophyta as a marker at the inter-specific, intra-specific and intra-population levels [21,23,[41][42][43], with levels of divergence of: 13% to18% for Gracilaria species [41] and 12.5% to 13.4% for individuals of the Gymnogongrus complex [42]. However, a unique haplotype from all the sampled range (ca. 800 km) was determined for the rbcL-S sequences from 25 PSA individuals.

Population structure and genetic relationships of PSA populations
Genetic relationships of the 17 haplotypes (introns + HEG) obtained from PSA populations were accessed through network analyses. The network exhibit three main clusters suggesting a pattern of isolation by distance for the populations analyzed. The same grouping pattern was obtained for Fst analysis. Mantel's test corroborated the hypothesis indicated by the two previous analyses. Therefore, isolation by distance appears to be the basic process accounting for structure in PSA populations, manifested in a cline of HEG degeneration. Populations sam-pled at the southernmost end of the distribution present the entire His-Cys box motif, while the population sampled at the northernmost end of the distribution, considering the start codon proposed by Haugen et al. [4], has only eight amino acids of the HE. Distributed between these two extremes, are the intermediate-sized alleles.
The neutrality tests results for the intron without HEG indicated a fast population growth from an ancestor population with small effective number. At the same time, cox2-3 region results were marginally significant for the neutrality tests and rbcL-S region results showed no nucleotide variability. The low variability in these markers, also observed in the networks, is consistent with a demographic event of expansion from a population with low effective number affecting all loci. These results are not compatible to the HEG length polymorphism. The intron + HEG marker showed remarkable variation in length in the same individuals that presented few variations for the other markers. Therefore, in the same window of time, much more variation was accumulated in HEG than in sequence variation in the other three markers, which are probably under different selection constraints. Considering the recent population expansion for PSA along the Brazilian coast, degeneration of HEG was a very fast process. First, if we assume that a functional HEG have a cost to the host cell, then natural selection will increase the frequency of nonfunctional elements; and second, if we assume that the HEG was already fixed for PSA populations (there is no availability of insertion site), then the frequency of nonfunctional elements will increase due to low selection to keep a functional HEG [38].
Based on these assumptions, two different scenarios can be proposed to explain intron +HEG evolution in PSA populations: In the first scenario, the horizontal transfer of intron + HEG occurred in an ancestral individual, prior to the colonization of PSA populations in the Brazilian coast. Therefore, it is reasonable to believe that the same bottleneck event that was detected by the three markers (intron without HEG, cox2-3 and rbcL-S regions) probably had as a consequence the fixation in the population of the full length HEG (functional). If the largest introns are considered as the ancestral state, then the oldest populations are located in the southernmost end of the distribution, and as long as individuals migrate to the north, their HEGs tend to degenerate. This scenario is consistent with the proposed hypothesis for intron insertion and evolution in the order Bangiales based on phylogenetics analysis of SSU rDNA and respective group I introns -Intron horizontal transfer to an ancestor of the order Bangiales, followed by vertical inheritance and evolution within the order as proposed by Muller et al [5].
In the second scenario, the ancestral PSA individual lack intron + HEG. It is possible to suggest that the horizontal transfer of intron + HEG in PSA SSU rDNA occurred after the event of demographic expansion. In this case, the intron with the most degenerated HEG -with more deletions accumulated -is present in the oldest population. This state was found in the northernmost population, suggesting a horizontal transfer event to have occurred in an individual from the PSA-V population. After the horizontal transfer event, the functional HEG invaded the other PSA populations by gene flow, being the largest HEG present in the more recently invaded populations, at the southernmost end of distribution. Within the context of intron evolution in the order Bangiales, this scenario is possible assuming the hypothesis that more than one intron insertion events has occurred during the divergence of the order [5].
This is the first report addressing intron evolution focusing on only one species. Understanding the mechanisms beyond intron + HEG evolution has been a challenge, despite all the knowledge obtained for these elements.

Self splicing assays
As self-splicing catalytic properties of group I introns are highly dependent on intron three-dimensional structure [44], we verified whether the occurrence of insertions in P1 paired element alters intron catalytic activities, checking if introns sizes variants self-splice in vitro. One way to check for intron excision is the confirmation of exons ligation [45].
The four intron size variants analyzed in this work selfspliced in vitro. Therefore, the occurrence of different size polymorphisms in intron P1 paired element do not refrain intron self-splicing mechanism, although there is a hypothesis that the presence of HEG may diminish selfsplicing efficiency [1]. The loops are strategic localities for HEG insertion. HEG have been considered invasive mobile elements that remain neutral to the host when inserted into introns, becoming invisible to negative selection [6]. If group I introns lose their self-splicing capabil-ity due to the presence of a HEG, they both would probably be eliminated from the gene they were inserted, since there is a strong selection against non-functional rDNA genes.

Conclusion
Commercial exploitation, mariculture and introduction of invasive species have been a major problem in the assignment of Porphyra geographic origins. Phenotypic plasticity along with a simple morphology is also an obstacle in Porphyra identification. Furthermore, scientific researches are more focused on taxonomy and phylogeny of the group than in population surveys. Population structure of Porphyra spiralis var. amplifolia could be assigned by HEG degeneration, although not by cox2-3 and rbcL-S regions. Therefore, intron size polymorphism is a suitable population marker for this species, and it can be rapidly detected using PCR assay.
The intron size polymorphism found in the PSA populations, corroborate the HEG cycle proposed by Goddard & Burt [38], indicating that the degenerated HEGs are reminiscent of the presence of a full-length and functional HEG, once fixed for PSA populations. The cline of HEG degeneration detected for PSA populations along the Southern Brazilian coast, determined the pattern of isolation by distance. Analyses with the other markers indicated a demographic event of expansion for PSA, from a population with low effective number. The maintenance of the HEG apparently does not refrain the ability of the intron to self-splice even when the different degrees of degeneration of these elements are present.

Authors' contributions
DM carried out the research, carried out and interpreted the analyses and wrote the manuscript. MCO conceived the project, interpreted the analysis, supervised the project and revised the manuscript. FMM carried out and interpreted the population genetics analyses and wrote the manuscript. SRM co-supervised of the project and revised the manuscript. All authors participated in the discussions and approved the final manuscript.