ITS2 data corroborate a monophyletic chlorophycean DO-group (Sphaeropleales)

Background Within Chlorophyceae the ITS2 secondary structure shows an unbranched helix I, except for the 'Hydrodictyon' and the 'Scenedesmus' clade, which have a ramified first helix. The latter two are classified within the Sphaeropleales, characterised by directly opposed basal bodies in their flagellar apparatuses (DO-group). Previous studies could not resolve the taxonomic position of the 'Sphaeroplea' clade within the Chlorophyceae without ambiguity, and two pivotal questions remain open: (1) Is the DO-group monophyletic and (2) is a branched helix I an apomorphic feature of the DO-group? In the present study we analysed the secondary structure of three newly obtained ITS2 sequences classified within the 'Sphaeroplea' clade and resolved sphaeroplealean relationships by applying different phylogenetic approaches based on a combined sequence-structure alignment.
Results The newly obtained ITS2 sequences of Ankyra judayi, Atractomorpha porcata and Sphaeroplea annulina of the 'Sphaeroplea' clade do not show any branching in the secondary structure of their helix I. All applied phylogenetic methods highly support the 'Sphaeroplea' clade as a sister group to the 'core Sphaeropleales'. Thus, the DO-group is monophyletic. Furthermore, based on characteristics in the sequence-structure alignment one is able to distinguish distinct lineages within the green algae.
Conclusion In green algae, a branched helix I in the secondary structure of the ITS2 evolved after the 'Sphaeroplea' clade had branched off. A branched helix I is an apomorphic character within the monophyletic DO-group. Our results corroborate the fundamental relevance of including the secondary structure in sequence analysis and phylogenetics.


Background
Taxonomists face inconsistent or even contradictory clues when they examine the affiliation of organisms to higher taxonomic groupings. Several characters may yield alternative hypotheses explaining their evolutionary background. This also applies to the taxonomic position of the Sphaeropleaceae. Different authors affiliated this green algal family, on the basis of morphological characters, with either ulvophytes or chlorophytes, until Deason et al. [10] proposed an emendation: the Neochloridaceae, the Hydrodictyaceae and the Sphaeropleaceae should be grouped as Sphaeropleales within the chlorophytes, since all of them have motile biflagellate zoospores with a directly opposed (DO) configuration of basal bodies.
Although nowadays most authors agree that the DO group is monophyletic, until now no study has persuasively pinpointed, with genetic evidence, the taxonomic linkage of the name-giving 'Sphaeroplea' clade to the remaining 'core Sphaeropleales' [6,23], i.e. the sister clade remains unclear [15,24]. As with morphology, studies of 18S and 26S rRNA gene sequences neither resolve the basal branching patterns within the Chlorophyceae with high statistical power nor corroborate a monophyletic biflagellate DO group without ambiguity [6,23].
Müller et al. [25] obtained moderate statistical support for the close relationship of the 'Sphaeroplea' clade and the 'core Sphaeropleales' with profile distances of 18S and 26S rDNA. In this study we followed and expanded their methodology with a very different phylogenetic marker. The internal transcribed spacer 2 (ITS2), the region of ribosomal RNA between the 5.8S rRNA gene and the large subunit (26S rDNA), has proven to be an appropriate marker for the study of small-scale phylogenies of close relatives [26][27][28][29]. In contrast to the bordering regions of the ribosomal subunits, the sequence is not evolutionarily conserved, so genetic differentiation is detectable even in closely related groups of organisms. The secondary structure, by contrast, seems to be well conserved and thus provides clues for higher taxonomic studies [27][30][31][32][33]. Secondary structure information is furthermore especially interesting within the Chlorophyceae, because van Hannen et al. [34] described an uncommon branching of ITS2 helix I within the genera Desmodesmus, Hydrodictyon [35] and Scenedesmus. It is not known when this feature evolved and whether it is, as we expect, an apomorphic feature for the DO-group. Including structural information in common sequence analysis should improve phylogenetic statements. For example, Grajales et al. [36] calculated morphometric matrices from ITS2 secondary structures for phylogenetic analyses, but treated information of sequence and structure as different markers. Here we combine sequence with structural information in just one analysis. Aside from the biological problem, we address the pivotal question of a methodological pipeline for sequence-structure phylogenetics using rDNA data.
Cycling conditions for amplification consisted of 94°C for 10 min, 30 cycles of 94°C for 30 s, 50°C for 30 s and 72°C for 45 s, followed by a final extension step of 10 min at 72°C. PCR products were analysed by 3% agarose gel electrophoresis and ethidium bromide staining.
PCR probes were purified with the PCR Purification Kit (Qiagen) and quantified by spectrometry. Each sequencing probe was prepared in an 8 μl volume containing 20 ng DNA and 1.25 μM primer. Sequencing was carried out using an annealing temperature of 50°C with the sequencer Applied Biosystems QST 3130 Genetic Analyzer by the Institute of Hygiene and Microbiology (Würzburg, Germany).

ITS2 secondary structure prediction
ITS2 secondary structures of the three newly obtained sequences were folded with the help of RNAstructure [38] and afterwards manually corrected. All available 788 chlorophycean ITS2 sequences were obtained from the NCBI nucleotide database. The ITS2 secondary structure of Atractomorpha porcata was used as template for homology modelling. Homology modelling was performed by using the custom modelling option as provided with the ITS2-Database [30][31][32][33] (identity matrix and 50% threshold for the helix transfer). Forty-nine species representing the chlorophycean diversity were retained and used as comparative taxa in inferring phylogenies (Table 1). For this taxon sampling, accurate secondary structures of sequences were now folded by RNAstructure and additionally corrected using Pseudoviewer 3 [39]. We standardized start and end of all helices according to the optimal folding of the newly obtained sequences.

Alignment and phylogenetic analyses
Using 4SALE [40,41] with its ITS2 specific scoring matrix, we automatically aligned sequences and structures simultaneously. Sequence-structure alignment is available at the ITS2 database supplements page. For the complete . A maximum likelihood (ML) analysis was performed with a heuristic search (ten random taxon addition replicates) and nearest neighbour interchange (NNI) [44].
Maximum parsimony (MP) [45] was accomplished with gaps treated as missing data and all characters coded as "unordered" and equally weighted. Additionally, we clustered taxonomic units with neighbour-joining (NJ) [46] using maximum likelihood distances. Furthermore, with MrBayes [47] a Bayesian analysis (B) was carried out for tree reconstruction using a general time reversible substitution model (GTR) [48][49][50] with substitution rates estimated by MrBayes (nst = 6). Moreover, using ProfDist, a profile neighbour-joining (PNJ) tree [51,25] was calculated using the ITS2 specific substitution model available from the ITS2 Database. PNJ was also performed with predefined profiles (prePNJ) of all the clades given in Table 1.
For clade 'Scenedesmus' two profiles were used for groups 'true Scenedesmus' (Scenedesmus except S. longus) and 'Desmodesmus' (Desmodesmus and S. longus). We performed a sequence-structure profile neighbour-joining (strPNJ) analysis with a developmental beta version of ProfDist (available upon request). The tree reconstructing algorithm works on a 12-letter alphabet comprised of the 4 nucleotides in three structural states (unpaired, paired left, paired right). Based on a suitable substitution model [40], evolutionary distances between sequence-structure pairs were estimated by maximum likelihood. All other applied analyses were computed only on the sequence part of the sequence-structure alignment. For MP, NJ, PNJ, prePNJ and strPNJ analyses 1,000 bootstrap pseudoreplicates [52] were generated. One hundred bootstrap replicates were generated for the ML analysis. Additionally we used RAxML at the CIPRES portal to achieve 1,000 bootstraps with a substitution model estimated by RAxML [53]. All methods were additionally applied to a 50% structural consensus alignment cropped with 4SALE (data not shown). The individual steps of the analysis are displayed in a flow chart (Fig. 1).
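The 12-letter alphabet used for strPNJ can be illustrated with a minimal Python sketch. It is not the ProfDist implementation: the symbol assignment (letters A to L), the dot-bracket notation for the structure and the function name `encode_sequence_structure` are assumptions made here for illustration only.

```python
# Illustrative only: the letters A-L and the dot-bracket input are assumptions,
# not the encoding actually used inside ProfDist.
NUCLEOTIDES = "ACGU"
STATES = ".()"  # unpaired, paired left (opening), paired right (closing)
SYMBOLS = "ABCDEFGHIJKL"  # hypothetical one-letter codes for the 12 states

# Map every (nucleotide, structural state) combination to one symbol.
ENCODE = {
    (nuc, state): SYMBOLS[i * len(STATES) + j]
    for i, nuc in enumerate(NUCLEOTIDES)
    for j, state in enumerate(STATES)
}


def encode_sequence_structure(sequence, structure):
    """Translate an RNA sequence plus dot-bracket structure into the 12-letter code.

    Alignment gaps ('-') in the sequence are passed through unchanged.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    encoded = []
    for nuc, state in zip(sequence.upper().replace("T", "U"), structure):
        encoded.append("-" if nuc == "-" else ENCODE[(nuc, state)])
    return "".join(encoded)


# A small hairpin: three base pairs enclosing a three-nucleotide loop.
example = encode_sequence_structure("GGCAAAGCC", "(((...)))")
```

With such a one-letter encoding, each column of a sequence-structure alignment can be handled like a column of an ordinary sequence alignment, only with 12 instead of 4 character states.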

New ITS2 sequences
GenBank accession numbers for newly obtained nucleotide sequences are given in Table 1 (entries 1-4). The two ITS2 sequences of Sphaeroplea annulina (Roth, Agardh) strain SAG 377-1a and strain SAG 377-1e were identical and thus only the first one was used for further analysis. According to folding with RNAstructure, ITS2 secondary structures of the three newly obtained sequences did not exhibit any branching in their helix I (Fig. 2) as described for the 'core Sphaeropleales', i.e. helix I was more similar to those of the CW-group and the 'Oedogonium' clade. Helix I of Sphaeroplea annulina was distinctly longer (9 nucleotides) than those of the other newly obtained algae. Due to this insertion, a branching pattern could be enforced for Sphaeroplea, but it would be energetically less favourable. However, the additional nucleotides are not homologous to the insertion capable of making an additional stem (Y-structure) found in the 'Scenedesmus' and the 'Hydrodictyon' clade (approximately 25 bases).

ITS2 sequence and secondary structure information
ITS2 sequence lengths of all studied species ranged from 202 to 262 nucleotides (nt), with an average of 235 nt. The GC contents of ITS2 sequences ranged from 36.84% to 59.92%, with a mean value of 52.42%. The number of base pairs (bp) varied between 64 and 89 bp and averaged 77 bp. The cropped alignment (50% structural consensus) showed that 23% of the nucleotides had at least a 50% consistency in their pairings. Compensatory base changes (CBCs) as well as hemi-CBCs (all against all) ranged from 0 to 16 with a mean of 6.6 CBCs (Fig. 2). Sequence pairs lacking CBCs were exclusively found within the same major clade.
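How such descriptive values can be derived from a sequence-structure alignment is sketched below in Python. This is an illustration under stated assumptions, not the procedure implemented in 4SALE: structures are assumed to be aligned dot-bracket strings, a CBC is counted where two aligned positions are paired in both structures and both nucleotides differ, and a hemi-CBC where only one of them differs. The function names and toy sequences are hypothetical.

```python
def parse_pairs(structure):
    """Return the set of (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for idx, char in enumerate(structure):
        if char == "(":
            stack.append(idx)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), idx))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def gc_content(sequence):
    """Percentage of G and C among the non-gap characters of a sequence."""
    residues = [c for c in sequence.upper() if c != "-"]
    return 100.0 * sum(c in "GC" for c in residues) / len(residues)


def count_cbcs(seq_a, struct_a, seq_b, struct_b):
    """Count (CBCs, hemi-CBCs) between two aligned sequence-structure pairs.

    Only column pairs that are base-paired in both structures are compared.
    """
    shared = parse_pairs(struct_a) & parse_pairs(struct_b)
    cbc = hemi = 0
    for i, j in shared:
        changed = (seq_a[i] != seq_b[i]) + (seq_a[j] != seq_b[j])
        if changed == 2:
            cbc += 1
        elif changed == 1:
            hemi += 1
    return cbc, hemi


# Toy data (hypothetical): G-C -> A-U is a CBC, G-C -> G-U is a hemi-CBC.
n_pairs = len(parse_pairs("(((...)))"))
cbc, hemi = count_cbcs("GGCAAAGCC", "(((...)))", "AGCAAAGUU", "(((...)))")
```

Applying `count_cbcs` to every pair of taxa gives an all-against-all matrix from which a range and a mean can be read.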

Characteristics in a conserved part of alignment
In agreement with Coleman [28], the 5' side part near the tip of helix III was highly conserved, including the UGGU motif [54,55,30] or, in the case of Chlorophyceae, the UGGGU motif. We selected a part of the alignment at this position with adjacent columns (Fig. 2) to verify the suggested conservation. On closer inspection, this part of helix III showed sequence and structural characteristics typical of distinct groups. Studied species of the 'Oedogonium' clade possess an adenine at position 3 in the selected part of the alignment and, in addition, paired bases at positions 3-5. In contrast, the CW-group possessed only three consecutively paired bases in this block, but not the adenine. A typical pattern for clades of the DO-group was a twofold motif of 3 bases: uracil, adenine and guanine at positions 7-9, which is repeated at positions 11-13. This could be a duplication, which results in a modified secondary structure. In addition, the 'core Sphaeropleales' ('Hydrodictyon' clade and 'Scenedesmus' clade) showed an adenine base change at position 6, compared to all other clades.
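A check for the twofold motif described above could look like the following Python sketch. It assumes that the three bases occur in the order U, A, G and that positions are counted from 1 within the selected alignment block; the two example windows are invented for illustration and are not taken from the alignment.

```python
def motif_at(window, motif, start):
    """Return True if `motif` occurs in `window` beginning at 1-based column `start`."""
    begin = start - 1
    return window[begin:begin + len(motif)].upper() == motif


def has_twofold_motif(window, motif="UAG", first=7, second=11):
    """Test for the same three-base motif at two positions of an alignment block."""
    return motif_at(window, motif, first) and motif_at(window, motif, second)


# Hypothetical 13-column windows, not real ITS2 data.
with_repeat = "GCAGGAUAGCUAG"     # UAG at columns 7-9 and again at 11-13
without_repeat = "GCAGGCUGGGUAC"  # no UAG at columns 7-9

repeat_found = has_twofold_motif(with_repeat)
repeat_missing = has_twofold_motif(without_repeat)
```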

Phylogenetic tree information
The PAUP* calculation applying maximum parsimony included a total of 479 characters, of which 181 were constant, 214 were variable and parsimony-informative, and 84 were variable but parsimony-uninformative.
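The three character classes can be made concrete with a short Python sketch. It uses the common textbook criterion that a column is parsimony-informative when at least two character states each occur in at least two taxa, with gaps treated as missing data as in the MP analysis. It is not PAUP* code, and the toy alignment is hypothetical.

```python
from collections import Counter


def classify_column(column, missing="-?N"):
    """Classify one alignment column as constant, informative or uninformative."""
    states = Counter(c for c in column.upper() if c not in missing)
    if len(states) <= 1:
        return "constant"
    # Informative: at least two states are each shared by two or more taxa.
    if sum(count >= 2 for count in states.values()) >= 2:
        return "informative"
    return "uninformative"


def summarise(alignment):
    """Count the column classes of an alignment given as equal-length strings."""
    return Counter(classify_column("".join(col)) for col in zip(*alignment))


# Hypothetical toy alignment of four taxa and five columns.
toy = ["AAGAA", "AAGCA", "ACAGA", "ACAU-"]
summary = summarise(toy)

# Consistency check of the character counts reported for the ITS2 alignment.
reported_total = 181 + 214 + 84
```

The last line only confirms that the three reported classes add up to the 479 characters of the alignment.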
Except for the Bayesian analysis (least support for node "c"), all applied methods yielded node "e" as the weakest point within the basal (labelled) branches (Table 2). This node represents the relationship between the 'Hydrodictyon' and the 'Scenedesmus' clade on the one hand and the 'Dunaliella', the 'Oedogonium', the 'Reinhardtii' and the 'Sphaeroplea' clade on the other hand. The phylogenetic tree resulting from neighbour-joining analysis by PAUP* (Fig. 3) did not support node "e" at all, but strongly supported the remaining labelled branches. The maximum likelihood analysis by PAUP* (Fig. 4) did not support node "e" either. Neither of the two maximum likelihood methods supported nodes "a" ('true Scenedesmus' compared to remaining clades) and "c" ('Scenedesmus' opposite to remaining clades). All other basal branches were supported by this method.
Varying neighbour-joining analyses by ProfDist (NJ, PNJ, prePNJ, strPNJ) supported all basal branches, except for the weakest node "e" (average support), with very high bootstrap support values of 84-100%. The maximum parsimony method gave average support (63 and 62%) for
Figure 1. Flowchart of the methods applied in this study. Sequences were obtained from the laboratory and from NCBI and afterwards folded with RNAstructure [38] or custom modelling of the ITS2 Database [30][31][32][33]. An alternative is to directly access sequences and structures deposited at the ITS2 Database. The sequence-structure alignment was derived by 4SALE [40]. Afterwards several phylogenetic approaches were used to calculate trees: NJ = neighbour-joining, PNJ = profile neighbour-joining, strPNJ = sequence-structure neighbour-joining, prePNJ = predefined profiles profile neighbour-joining, MP = maximum parsimony, ML = maximum likelihood and B = Bayesian analysis.
In comparison, the topology of the phylogenetic tree based on the 50% cropped alignment did not change, but the bootstrap support values were lower in all cases (data not shown).
Using three newly obtained ITS2 sequences from Ankyra judayi, Atractomorpha porcata and Sphaeroplea annulina (Sphaeropleaceae), in this study we aimed to pursue two consecutive questions concerning the phylogenetic relationships within Chlorophyceae. (1) What is the phylogenetic position of the newly sequenced algae relative to the 'core Sphaeropleales' and could the biflagellate DO-group be regarded as monophyletic? (2) What does the secondary structure of the new ITS2 sequences look like and is an autapomorphic feature of the secondary structure associated with the monophyletic DO-group?
Considering question (1), Buchheim et al. [6] and Wolf et al. [23] approached the problem with 18S + 26S rDNA and 18S rDNA data, but the relationship between the 'core Sphaeropleales' and the Sphaeropleaceae remained unclear. However, in their studies, Ankyra, Atractomorpha and Sphaeroplea clustered in a monophyletic clade named Sphaeropleaceae. We confirm this 'Sphaeroplea' clade with all three genera being strongly separated from other clades. As a result of a Bayesian analysis on a combined 18S and 26S rDNA dataset, Shoup and Lewis [61] also found the Sphaeropleaceae as the most basal clade within the Sphaeropleales, but again the analysis lacked strong support. Despite these difficulties, the 'core Sphaeropleales' were already shown to be monophyletic with high certainty [6,25,62,61,23].
The DO-group (Sphaeropleales including the 'Sphaeroplea' clade) as emended by Deason et al. [10], for which the directly opposed basal body orientation and basal body connection features are verified [63][64][65], is now strongly supported by molecular phylogenetic analyses. There was already evidence of an extended DO-group [6,66,67], however, for some groups ultrastructural results are still lacking, and even though the collective basal body orientation and connection imply a monophyletic DO-group, until now no molecular phylogenetic analysis could show this with solid support [6,62,24,23]. We demonstrate for the first time with robust support values for the equivocal nodes that the 'core Sphaeropleales', the 'Sphaeroplea' clade, and the Sphaeropleales are monophyletic.
Regarding question (2), for all structures of the 'Hydrodictyon' and the 'Scenedesmus' clade, helix I shows the typical branching (Y-structure). Initially, An et al. [68] proposed a secondary structure model with an unbranched helix I for ITS2 sequences of 'Scenedesmus' clade members. Thereafter, van Hannen et al. [34] updated the model by folding the nucleotide sequences based upon minimum free energy and found a branched helix I as the most energetically stable option. The branching is the result of an insertion of approximately 25 nucleotides capable of folding as an individual stem within the 5' end of the first helix. However, ITS2 sequence and secondary structure information of further 'core Sphaeropleales' members, e.g. the 'Ankistrodesmus' clade and the 'Bracteacoccus' clade, is still lacking. In contrast, the Y-structure is absent within the 'Sphaeroplea' clade and any other group investigated so far. Thus this feature is, contrary to our expectation, not an autapomorphic character for the biflagellate DO-group as a whole but for the 'core Sphaeropleales'.
Figure 3. Neighbour-joining phylogeny of the Chlorophyceae based on comparison of ITS2 rRNA sequences and structures. The tree is unrooted, but the 'Oedogonium' clade is most likely appropriate as outgroup [56]. Sequences of the 'Sphaeroplea' clade were obtained for this study and are shown in bold letters. The phylogenetic tree is calculated by neighbour-joining with PAUP* [46,43] for an alignment with 52 taxa and 479 characters. The substitution model was set to TVM+I+G with parameters estimated by Modeltest [42]. Bootstrap values of basal branches are given for profile neighbour-joining with predefined profiles (ProfDist with ITS2 substitution model) [51,31]. Branch thickness depends on bootstrap values calculated with four distance methods: neighbour-joining (PAUP*), neighbour-joining, complete profile neighbour-joining and sequence-structure profile neighbour-joining (all three ProfDist with ITS2 substitution model).
Regarding future work, the resolution among the main clades of Chlorophyceae was statistically poorly supported in previous studies [68,15,6,23]. Pröschold and Leliaert [24] reviewed the systematics of green algae by applying a polyphasic approach, but did not obtain a clear resolution regarding a sister taxon to the Sphaeropleales. Since they are not yet available, ITS2 sequences of chaetopeltidalean and chaetophoralean taxa could not be included in the present study and therefore the phylogenetic relationships between the main Chlorophyceae clades remain open. We recommend including sequence and secondary structure information of chaetopeltidalean and chaetophoralean ITS2 sequences in future studies to find out whether the monophyletic biflagellate DO-group could be further extended to a general monophyletic DO-group containing quadri- and biflagellate taxa. A genome-wide approach indicates that Sphaeropleales and Chlamydomonadales are sister taxa; however, only a few organisms were included in that study [56]. A further question that arises is when the Y-structure evolved within the 'core Sphaeropleales'. This could be resolved by inclusion of other members (e.g. Bracteacoccus) in further studies.
The two major reasons contributing to the robust results presented here are the change of the phylogenetic marker and the inclusion of secondary structure information. In contrast to previous phylogenetic work concerning Chlorophyceae, this study is based on the ITS2, which offers resolution power for relationships from the level of subspecies up to the order level, because of its variable sequence but conserved secondary structure [26][30][31][32][33]. Markers commonly used so far are, in contrast, far more restricted. Using 4SALE [40], which takes structure into account, we achieved for the first time a global sequence-structure alignment generated simultaneously (cf. Fig. 1), yielding specific sequence and structural features that distinguish different algal lineages (cf. Fig. 2).

Conclusion
In summary, the powerful combination of the ITS2 rRNA gene marker plus a multiple global alignment based synchronously on sequence and secondary structure yielded high bootstrap support values for almost all nodes of the computed phylogenetic trees. Thus, the relationship of Sphaeropleaceae is here resolved, being a part of the Sphaeropleales representing the monophyletic biflagellate DO-group. Furthermore, we could identify a branched helix I of ITS2 as an autapomorphic feature within the DO-group. This feature could be found only in the 'Hydrodictyon' and the 'Scenedesmus' clade. Our results corroborate the presented methodological pipeline, the fundamental relevance of secondary structure consideration, as well as the elevated power and suitability of ITS2 in phylogenetics. As a methodological improvement, the alignment algorithm could be refined to take further account of horizontal dependencies between paired nucleotides. Moreover, future ITS2 studies should include sequence and secondary structure information of taxa not yet considered in order to resolve the chlorophycean phylogeny.