In the present contribution, we developed a suite of methods to gain 'close-up' insights into ITS2 evolution that may guide future studies of ITS diversification in general. On this basis, we propose a general strategy for studies of ITS evolution and phylogeny, starting with the minimal requirements of the data set. ITS sequences differ from most other molecular markers in their low conservation of primary sequence and length, and only the common intramolecular folding pattern of their RNA transcripts, i.e. their secondary structure, allows comparative investigations. The correctly folded secondary structure is fundamental not only for improving the alignment [43–48], but also for building the alignment itself (especially for variable markers such as ITS2) and for identifying synapomorphies. In fact, the secondary structure is a prerequisite for all conclusions derived from the phylogenetic analyses. Even with many sequences available, deciphering the 'genuine' secondary structure is demanding: the initial folding of a single ITS2 sequence (e.g. via MFold) often yields several alternative folds, so folding must be repeated with ITS2 sequences from as many closely and distantly related taxa as possible in order to select the common folding pattern, substantiated by the occurrence of CBCs and hCBCs [4, 49]. To simplify this analysis, an alternative, standardized procedure has been developed in which a novel ITS2 sequence is automatically compared with more than 110,000 sequences of known secondary structure in the ITS2 Database III, which serve as references [46, 50]. However, for selected ITS2 sequences of the Ulvales, the ITS2 Database III yielded clearly false folding patterns. This is especially surprising since the authors described criteria for evaluating the quality of secondary structure models, e.g. the presence of four helices with a conserved helix length distribution, and a UGGU motif on the 5' side near the apex of Helix III . However, some of the artificial 'reference' ITS2 structures of the Ulvales conflicted with these criteria. Moreover, even structures that comply with these standards may often represent artifacts, as shown here for the Ulvales. In conclusion, the time-consuming manual approach used here to identify the common ITS2 secondary structure of a selected group of organisms cannot be replaced by a semi-automated procedure without a significant loss of accuracy.
Fortunately, the ITS2 sequences of the order Ulvales proved to be an almost ideal model for comparative structural and phylogenetic studies. These sequences were unusually well conserved in length and contained many almost invariable sequence motifs, which allowed high-quality alignments. This conservation allowed more than 80 ITS2 sequences of the Ulvales, together representing five families, to be integrated into a single alignment, so far a unique case among algae, where ITS2 data sets are usually confined to a single family or genus. Furthermore, most ITS2 folds (using MFold or RNAstructure) spontaneously favored the same overall secondary structure, which corresponded well with known ITS2 features of other green algae [4, 6]. Hallmarks of this common secondary structure, such as the start and end of the four helices and the spacers between helices, could easily be related to highly conserved sequence motifs in the ITS2 alignment. Even the most divergent ITS2 regions, which could not be aligned by manual sequence comparison, showed excellent conservation of secondary structure, allowing an unambiguous alignment across all Ulvales except for the apical parts of the four helices. Consequently, each column in the alignable ITS2 regions represents a single homologous character; this applies not only to the paired positions but also to single-stranded spacer and internal loop regions.
To establish an Ulvales-wide system for identifying and numbering ITS2 nucleotides as a statement of positional homology, all unambiguously aligned positions were classified either as 'universal', i.e. present across all Ulvales, or as 'non-universal', i.e. present in only some Ulvales and thus subject to insertion/deletion events. Only the first group of nucleotides was given 'universal' position numbers (1-129), allowing a clear nomenclature of, e.g., ITS2 base pairs. These universal positions covered the whole range of invariable, moderately variable, and highly variable characters. To specify the conservation status of individual positions, a majority-rule consensus is usually generated across the taxa investigated, e.g. a character that is G in 80 out of 100 taxa is termed '80% conserved' [4, 16, 52]. Here, we instead used the absolute number of changes in the evolution of a given character as a more appropriate measure of its degree of conservation. For example, both positions of base pair 29/32 changed only once during the evolution of the Ulvales, namely in the common ancestor of the taxon-rich family Ulvaceae. Thus, by a simple majority-rule consensus these characters would be regarded as 'less than 55% conserved', whereas our evolutionary measure (one change) clearly reveals their high conservation.
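To make the contrast concrete, the following minimal sketch counts the minimum number of changes of a single character on a rooted binary tree with Fitch small parsimony and compares it with majority-rule conservation. This is not the synapomorphy search procedure used in this study; Fitch parsimony is swapped in as a well-known stand-in, and the tree, taxon names, and states are hypothetical, chosen to mimic the situation of base pair 29/32 (a single change in the ancestor of a taxon-rich clade).

```python
from collections import Counter


def fitch(tree, states):
    """Fitch small-parsimony pass for one character.

    `tree` is a leaf name (str) or a 2-tuple of subtrees; `states` maps each
    leaf name to its character state. Returns (state set, minimum changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:  # children agree: no extra change needed at this node
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def majority_conservation(states):
    """Percentage of taxa carrying the most common state (majority rule)."""
    counts = Counter(states.values())
    return 100.0 * counts.most_common(1)[0][1] / len(states)


# Hypothetical tree: 5 outgroup taxa keep G; a 6-taxon clade ("u*", standing
# in for a taxon-rich family) carries the derived state A.
outgroup = (("g1", "g2"), ("g3", ("g4", "g5")))
clade_u = ((("u1", "u2"), "u3"), (("u4", "u5"), "u6"))
tree = (outgroup, clade_u)
states = {f"g{i}": "G" for i in range(1, 6)}
states.update({f"u{i}": "A" for i in range(1, 7)})

print(fitch(tree, states)[1])                   # minimum changes: 1
print(round(majority_conservation(states), 1))  # 54.5 (% of taxa with A)
```

Majority rule reports only about 55% conservation for this hypothetical character, whereas the change count shows that it changed a single time in the whole tree.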
Following clarification of homology, universality, nomenclature, and the degree of variation of ITS2 characters, summarized in consensus secondary structure diagrams, all character state changes (substitutions) at each position could be investigated in detail to deduce the rules under which ITS2 evolved towards its current diversity. To this end, the previously developed synapomorphy search procedure automatically generated a complete inventory of all substitutions at ITS2 positions within the Ulvales and, in addition, precisely identified the branches of the phylogenetic tree on which these substitutions occurred. Since the most interesting questions regarding ITS2 evolution concern the paired positions in the double-stranded helices, the resulting list of single-character changes was analyzed manually to trace the evolution of all known base pairs for (1) co-evolution maintaining base pairing via CBCs, and (2) single-sided changes retaining pairing via hCBCs. The result of this screen is an overview of all recent CBC- or hCBC-type changes on terminal branches, as well as of changes that characterize basal divergences in the phylogeny of the Ulvales. The latter in particular distinguishes our approach from other studies, in which ITS sequences of extant taxa are compared without considering the evolutionary changes that led to these sequences [53–55].
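The per-branch classification described above can be sketched as follows. The sketch assumes that each base pair is written as a two-letter string (5' nucleotide, 3' nucleotide), that the six combinations A-U, U-A, G-C, C-G, G-U, and U-G count as paired, and that the ancestral and descendant states at both ends of a branch are already known from the mapped substitutions. The function name and category labels are hypothetical, not part of any published tool.

```python
# Canonical Watson-Crick pairs plus the two G-U "wobble" pairs (5' nt + 3' nt).
PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def classify_change(ancestor, descendant):
    """Classify the change of one base pair along a single branch."""
    a_paired = ancestor in PAIRS
    d_paired = descendant in PAIRS
    n_changed = sum(x != y for x, y in zip(ancestor, descendant))
    if n_changed == 0:
        return "no change"
    if a_paired and d_paired:
        # Pairing retained: both sides changed (CBC) or one side (hCBC).
        return "CBC" if n_changed == 2 else "hCBC"
    if a_paired:
        return "disruption"       # N-N -> NxN
    if d_paired:
        return "restoration"      # NxN -> N-N
    return "mismatch change"      # NxN -> NxN


print(classify_change("AU", "GC"))  # CBC
print(classify_change("AU", "GU"))  # hCBC
print(classify_change("GC", "GA"))  # disruption
```

Applied to every branch and every base pair, such a function would yield the kind of inventory of CBCs, hCBCs, and non-compensating changes discussed below.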
Are CBC frequencies proportional to overall sequence divergence? To analyze this question, previous investigators [56, 57] plotted ITS distances between pairs of extant taxa against the number of CBCs and found similar relations: CBC frequencies (at most 8-9 CBCs) increase from low to medium distance values, while for highly divergent pairs of sequences the number of CBCs is relatively small, indicating saturation. Surprisingly, this distribution was analyzed by linear regression and then characterized as a 'linear proportional relation' . In the present study, synapomorphy searches revealed all CBCs and precisely identified the branches on which they occurred. These data allowed a phylogenetic rather than a statistical approach, namely plotting CBC frequencies against the length (determined for paired sites only) of the respective internal or terminal branch. For the Ulvales, we also found a saturation-type relation between CBC frequencies and branch lengths, with the ratio of CBCs to branch length (CBC_R) being negatively correlated with branch length. In their study of Myrtaceae , the authors assumed 'unobserved' substitutions in distant sequence comparisons, i.e. reversals, as one reason for the low number of observed CBCs, and also noticed that CBCs actually occur at relatively few sites in ITS molecules. We fully confirmed the latter phenomenon: out of 45 'universal' base pairs in ITS2, only 19 underwent CBC-type changes throughout the entire order Ulvales. In other words, the limited number of sites that can evolve via CBCs at all may be the major reason for the unexpectedly low number of CBCs in divergent branches or taxa. For example, the long branch of Kornmannia (21 substitutions), which could theoretically involve up to 10 CBCs, actually shows CBCs at only four sites. As an alternative explanation for the observed saturation in divergent branches or taxa, a high rate of 'unobserved' CBCs may be assumed, i.e. CBCs that were immediately reverted to the ancestral state. However, the synapomorphy analysis and mapping approach performed here allowed precise quantification of CBC-type reversals throughout the Ulvales: among 38 CBCs, we found only two reversals. Therefore, it appears very unlikely that high rates of 'unobserved' CBCs contributed to CBC saturation in the Ulvales. All these data suggest that CBCs represent a complex evolutionary process, which at higher divergence levels is constrained by the sites available in ITS2 rather than depending simply on overall sequence divergence.
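The arithmetic behind the Kornmannia example can be written down explicitly. This sketch assumes that each CBC consumes two substitutions at paired sites and that, on a single branch, a given base pair contributes at most one CBC; the function names are hypothetical, and treating the 19 base pairs with observed CBCs as the only 'CBC-capable' sites is an illustrative assumption.

```python
def max_possible_cbcs(paired_substitutions, cbc_capable_sites):
    """Upper bound for the number of CBCs on one branch.

    Each CBC uses two substitutions, and (assumption) each base pair can
    contribute at most one CBC per branch.
    """
    return min(paired_substitutions // 2, cbc_capable_sites)


def cbc_ratio(n_cbcs, paired_branch_length):
    """CBC_R: CBCs per substitution at paired sites on a branch."""
    return n_cbcs / paired_branch_length


# Kornmannia branch from the text: 21 paired-site substitutions, CBCs at 4 sites.
print(max_possible_cbcs(21, 19))   # 10, limited by the number of substitutions
print(round(cbc_ratio(4, 21), 3))  # 0.19
```

For a hypothetical longer branch with 60 paired substitutions, `max_possible_cbcs(60, 19)` returns 19: the bound is then set by the available sites rather than by branch length, which is one way to picture the saturation described above.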
It is usually assumed that a CBC cannot evolve by two simultaneous substitutions, given the low evolutionary rates of most paired positions in ITS2 [57, 58]. Instead, a CBC may have evolved by two single-sided changes within a short time, and the 'wobble' pair (G-U) is usually assumed to be the intermediate, suggesting the series A-U ⇔ G-U ⇔ G-C, which represents two consecutive hCBCs [58–64]. In an alternative scenario, the intermediate stage may comprise mismatching nucleotides (e.g. A-U ⇔ A×C ⇔ G-C). Although the '2x hCBC → CBC' scenario seems attractive, it applies only to one type of CBC (A-U ⇔ G-C, and its mirror image U-A ⇔ C-G) and not to any of the remaining observed CBC categories (e.g. A-U ⇔ U-A/U-G/C-G). A common approach to this question is to determine the frequencies of the respective changes. In the Ulvales, hCBCs of both the A-U ⇔ G-U type and the G-U ⇔ G-C type were observed in high numbers, suggesting that CBCs may in fact have evolved via two subsequent hCBC steps. However, such a summary view of overall substitution rates, which is often applied as the only source of evidence [e.g. ], can be misleading for two reasons. First, these hCBCs may have occurred at different positions (see below), and second, even if they affected the same ITS base pair, they may have evolved independently in organisms that do not form a phyletic series. In fact, our synapomorphy analysis readily revealed that almost all pairs of hCBCs that could theoretically form a 2-step CBC occurred at different ITS2 positions, and this spatial separation within the ITS2 molecule alone makes any causal relation between CBCs and hCBCs highly unlikely. In only a single case did both hCBCs required for a full 2-step CBC map onto the same ITS2 position, in Helix 1 (Figure 7). However, the respective taxa were unrelated to each other, showing that both hCBCs arose as independent evolutionary events that did not converge towards a CBC.
The simple formula 2x hCBC → CBC can at best be regarded as an exceptional scenario, which, moreover, could not be demonstrated in the Ulvales. In contrast to the misleading conclusions derived from statistical methods, the explicit reconstruction of the phylogenetic history of ITS2 base pairs via synapomorphy analysis resolved this question.
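Which CBCs can, even in principle, be split into two hCBCs? The following sketch enumerates all directed CBCs between the six valid pairs (written, as an assumption of this illustration, as two-letter strings of 5' and 3' nucleotide) and keeps those whose single-substitution intermediate is itself a pair. The helper names are hypothetical.

```python
from itertools import permutations

PAIRS = ["AU", "UA", "GC", "CG", "GU", "UG"]  # 5' nt + 3' nt


def n_diff(p, q):
    """Number of sides (0, 1 or 2) at which two pair states differ."""
    return sum(x != y for x, y in zip(p, q))


def two_hcbc_routes():
    """Directed CBCs whose one-substitution intermediate is itself a pair,
    i.e. CBCs that could arise as two consecutive hCBCs."""
    routes = []
    for start, end in permutations(PAIRS, 2):
        if n_diff(start, end) != 2:
            continue  # not a CBC
        # Intermediates: change the 5' side first, or the 3' side first.
        for mid in (end[0] + start[1], start[0] + end[1]):
            if mid in PAIRS:
                routes.append((start, mid, end))
    return routes


for start, mid, end in two_hcbc_routes():
    print(f"{start[0]}-{start[1]} -> {mid[0]}-{mid[1]} -> {end[0]}-{end[1]}")
```

Only four directed routes survive (A-U ⇔ G-C via G-U and U-A ⇔ C-G via U-G), all passing through a 'wobble' intermediate; every other CBC category has no paired intermediate at all.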
Are CBCs and hCBCs equally distributed over ITS2 positions, or are there distinct positional preferences? In fact, only seven pairs in the entire ITS2 molecule displayed both CBCs and hCBCs, whereas all other variable pairs appeared 'specialized' for one category of change. This simple observation alone is difficult to reconcile with the notion that the majority of CBCs followed a '2x hCBC → CBC' pathway.
Taken together, an hCBC appears to be a stable substitution, suggesting that the 'wobble' pair (G-U) is not at a disadvantage compared with 'canonical' base pairs [63, 65, 66]. In other words, when a canonical pair underwent an hCBC that led to G-U, there was no selection pressure in favor of an immediate second hCBC restoring a canonical pair. In the Ulvales, we found similar numbers for both directions of hCBCs: 23 hCBCs of the canonical → 'wobble' pair type, and a comparable number (28) of the 'wobble' → canonical pair type. Comparisons of models of RNA sequence evolution using ITS data from angiosperms also suggested the absence of strong selection against non-canonical base pairs [57, 64]. Interestingly, the evolutionary behavior of the 'wobble' pair is strongly biased in the Ulvales: we observed only a single hCBC of the G-U/U-G → A-U/U-A type, versus 27 hCBCs of the G-U/U-G → G-C/C-G categories. A similar bias has been reported for some angiosperm families [57, 64]. It is tempting to explain such a bias in substitution rates by the unequal frequencies of G-C/C-G (31/32%) and A-U/U-A pairs (8/7% in the Ulvales), as done e.g. by . However, this conclusion is not justified (see below), and we favor another explanation based on the functional constraints underlying a 'wobble' pair (for specific features of G-U, see e.g. [66–69]). The thermodynamic stability of A-U/U-A is roughly comparable to that of G-U/U-G, whereas G-C/C-G pairs contribute much more to the stability of a helix [58, 66, 70, 71]. Thus, G-U/U-G → A-U/U-A changes may be nearly neutral, whereas G-U/U-G → G-C/C-G changes may be under positive selection in the Ulvales.
We suggest that changes towards G-C/C-G pairs could improve ITS2 folding stability when an organism specializes in habitats with higher temperatures, and that the fast-evolving hCBC pathways (G-U/U-G → G-C/C-G) might allow rapid ecological adaptation, in contrast to two-step CBC-type changes.
How did double-sided CBCs in ITS2 actually evolve? We favor a 2-step scenario that involves a non-pair as a short-lived intermediate, i.e. N-N → N×N → N-N. In contrast to the '2x hCBC → CBC' scenario, this pathway applies to all 22 CBC categories (blue arrows in Figure 7). At least for base pairs under functional constraints, it should be assumed that any spontaneous single-sided substitution leading to a non-pair is disadvantageous, impairing ITS2 folding and excision . Such an event will usually lead to strongly reduced fitness or even extinction of the mutant genotype [65, 72]. Alternatively, mutants may escape extinction through intragenomic rRNA homogenization, which reverts the mutation and thus restores ITS2 function and fitness . In extant organisms, neither the extinction of mutants nor rRNA homogenization can be readily investigated. However, selection against non-pairs in the double-stranded backbone of ITS2 helices may become apparent when the frequencies of non-compensating changes (N-N ⇔ N×N) are compared with those of CBCs and hCBCs . In fact, disruption of pairs (N-N → N×N) and restoration of pairing (N×N → N-N) both occurred at much lower frequencies (ca. 19 and 10 cases, respectively, within the Ulvales; uncertain cases in highly variable pairs were ignored) than CBCs and hCBCs (38 and 51 cases, respectively). Several of the conserved pairs even evolved exclusively by compensating changes, without any non-pairs. In the apical part of Helix 3, however, we found a few 'exceptional' positions that were almost universally paired but evolved towards non-pairs within suprageneric clades (e.g. pair 79/101) or even entire families (pairs 68/109 - Ulvaceae, 75/105 - Kornmanniaceae and Bolbocoleonaceae, 84/97 - Ulvaceae). How could the mismatch status remain stable over long periods of time?
All these 'exceptional' non-pairs are surrounded by several conserved pairs, which, we suspect, together confer strong thermodynamic stability on this helix . Therefore, a few isolated non-pairs in Helix 3 apparently do not reduce the fitness and viability of the respective organisms, as indicated by the fact that the three families listed above are among the ecologically most successful green algae in marine and coastal environments [42, 76].
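That the non-pair pathway N-N → N×N → N-N is available for every CBC follows from simple combinatorics. A short sketch, assuming six valid pairs written as two-letter strings (5' and 3' nucleotide) and using hypothetical helper names, enumerates all directed CBCs (both nucleotides change, both states paired) and checks that each has at least one mismatch intermediate.

```python
from itertools import permutations

PAIRS = ["AU", "UA", "GC", "CG", "GU", "UG"]  # 5' nt + 3' nt


def cbc_categories():
    """All directed CBCs: both nucleotides differ and both states are pairs."""
    return [(s, e) for s, e in permutations(PAIRS, 2)
            if s[0] != e[0] and s[1] != e[1]]


def mismatch_intermediates(start, end):
    """One-substitution intermediates of a CBC that are non-pairs (NxN)."""
    candidates = (end[0] + start[1], start[0] + end[1])
    return [m for m in candidates if m not in PAIRS]


categories = cbc_categories()
print(len(categories))                                            # 22
print(all(mismatch_intermediates(s, e) for s, e in categories))  # True
print(mismatch_intermediates("CG", "GC"))                         # ['GG', 'CC']
```

The count reproduces the 22 CBC categories, and the last line shows the G×G and C×C intermediates available to a C-G → G-C change.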
Our data regarding Helix 2 provide the strongest evidence of selection against mismatch pairs: among 10 universal base pairs, nine were invariably double-stranded in all Ulvales and evolved exclusively by CBCs and hCBCs. Only the most variable pair, 30/31, located just before the expansion region, showed a few cases of mismatch. It should be noted that the two-dimensional shape of Helix 2 is regarded as a highly conserved 'hallmark' of the ITS2 core structure, i.e. a basal stem comprising about five base pairs, followed by a short internal loop (bulge) consisting of 1-2 pyrimidine-pyrimidine mismatches, and an apical stem-loop region [4, 43]. Experimental changes of this secondary structure by mutagenesis lead to failure of ITS2 excision at the transcript level; in particular, the introduction of even one additional non-pair in the stem region is sufficient to prevent efficient pre-rRNA processing . This agrees well with our observations in the Ulvales, where such a change is perhaps not viable. However, only the basal pair of Helix 2 is invariant in the order, whereas all remaining pairs evolved at moderate rates and, except for pair 30/31, lacked changes that interrupt base pairing. Although it might initially seem paradoxical, we assume that especially in these cases CBCs may have originated via non-paired intermediate steps, most of which were rapidly eliminated by natural selection (extinction). In rare cases, a potentially lethal mismatch regained the essential base pairing through a second substitution, which must have occurred within a short time frame. For example, the C-G → G-C CBC in pair 23/38 in Helix 2 may have evolved via a short-lived C×C or G×G mismatch state.
To substantiate our hypothesis that CBCs and hCBCs in ITS2 follow different evolutionary rules, we further investigated their homoplasious changes, i.e. parallelisms, convergences, and reversals. Distinguishing these three types of homoplasy was straightforward with our approach of directly mapping all substitutions in ITS2 base pairs, in contrast to indirect statistical methods such as calculating a homoplasy index [15, 18]. Parallelisms appear to be the most frequent type of homoplasy in ITS2, followed by reversals and convergences. Interestingly, parallelisms and especially reversals occurred much more frequently in the hCBC category. Although hCBCs (51) were only slightly more numerous than CBCs (38), we observed more than twice as many hCBC-type parallelisms (38 versus 16) and even three times as many hCBC-type reversals (6 versus 2; Figure 6). The remaining homoplasy category, convergence, shows the opposite tendency: we found five cases of CBC-type convergence, but no such event among hCBCs (Figure 6). This appears surprising: although there are only two possible pathways for hCBC-type convergence (A-U → G-U ← G-C, and U-A → U-G ← C-G), most of the individual substitutions involved happened rather frequently (Figure 7). However, all these individual substitutions referred to different base pairs in ITS2 and therefore did not contribute to any hCBC-type convergence. What is the reason for the higher rate of CBC-type convergences? The explanation may be the higher number of possible pathways, since every canonical base pair can originate directly via CBCs from four other pairs, and each 'wobble' pair from three (Figure 7). For example, A-U can theoretically evolve from G-C, U-A, U-G, or C-G. Notably, all these changes were found in the Ulvales (Figure 7), and in some cases they affected the same ITS2 position, thus leading to the observed CBC-type convergences (Additional file 4).
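The difference in the number of possible convergence pathways follows directly from pair combinatorics. Assuming six valid pairs written as two-letter strings (5' and 3' nucleotide), the sketch below lists for each pair its possible one-step predecessors by CBC and by hCBC; the helper name is hypothetical.

```python
PAIRS = ["AU", "UA", "GC", "CG", "GU", "UG"]  # 5' nt + 3' nt


def predecessors(target, kind):
    """Pairs from which `target` can arise in one step of the given kind.

    kind="CBC": both nucleotides differ; kind="hCBC": exactly one differs.
    """
    wanted = 2 if kind == "CBC" else 1
    return [p for p in PAIRS
            if sum(x != y for x, y in zip(p, target)) == wanted]


for pair in PAIRS:
    cbc, hcbc = predecessors(pair, "CBC"), predecessors(pair, "hCBC")
    print(f"{pair[0]}-{pair[1]}: {len(cbc)} CBC routes, {len(hcbc)} hCBC routes")
```

Each canonical pair has four CBC predecessors and each 'wobble' pair three, whereas only G-U and U-G have more than one hCBC predecessor; they are therefore the only possible targets of hCBC-type convergence, matching the two pathways named in the text.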
Since CBCs and hCBCs showed clear positional preferences (see above), it is not surprising that their homoplasies are also spatially separated in the ITS2 molecule. Among 17 homoplasious positions, only two showed both CBC- and hCBC-type homoplasies (Figure 6). Interestingly, the most conserved regions of ITS2, i.e. the conserved parts of helices 2 and 3, were both characterized by very low frequencies of CBC-type homoplasies, accompanied by unusually high rates of hCBC homoplasies (Figure 6). This phenomenon might explain why several authors have restricted their conclusions (1) to these conserved parts of ITS2 and (2) to CBCs. Evidently, most CBCs in the conserved regions are non-homoplasious changes and thus offer informative molecular signatures that unambiguously characterize taxa and clades (including CBC clades). In contrast, hCBCs are usually considered taxonomically meaningless (genotypes differing by one hCBC may even be able to mate), which is mirrored by their elevated homoplasy levels, even in the conserved regions, and by their very high substitution rates.
Can the observed substitution rates of CBCs and hCBCs in ITS2 be explained by the empirical frequencies of the respective base pairs? It might seem logical to assume that a high frequency of a given base pair should correlate with a high rate of substitutions leading to that base pair. Within the Ulvales, G-C and C-G are the most frequent base pairs in ITS2 (31 and 32%, respectively), whereas the four remaining pairs are comparatively rare, each accounting for only 7-8% (Figure 7). Assuming a correlation between frequency and substitution rate, we should observe the highest substitution rates for 'frequent ⇔ frequent' CBCs (G-C ⇔ C-G), lower rates for 'frequent ⇔ rare' interchanges (e.g. C-G ⇔ U-A), and the lowest rates for the 'rare ⇔ rare' category (e.g. U-A ⇔ A-U). Our data clearly reject such a correlation and instead show almost complete independence between frequencies and substitution rates. For example, a direct 'rare → rare' CBC (U-A → A-U) shows the same rate as C-G → G-C from the 'frequent → frequent' category. The highest observed substitution rates were found among the 'frequent ⇔ rare' interchanges, and this holds for both the highest CBC rates (C-G ⇔ U-A) and the highest hCBC rates (C-G → U-G, G-U → G-C).
How can we explain that substitution rates are apparently independent of frequencies? First, several base pairs in ITS2 are essential for proper secondary structure folding and are thus under strict functional constraints. Not surprisingly, several strong G-C and C-G pairs contribute to ITS2 stability and are therefore conserved or even invariant, as shown in the ITS2 secondary structure diagram (Figure 1), which explains their unexpectedly low number of observed changes. Second, there is a general reason why frequencies cannot be correlated with substitution rates: observed frequencies apply to sequences of extant taxa only, whereas substitution rates refer to ancient as well as recent evolutionary changes. Consequently, a single early change, mapped onto a deep branch of the phylogenetic tree, affects many descendant taxa and thus considerably influences the base pair frequency distribution among extant taxa. In contrast, a recent substitution, mapped onto a shallow or terminal branch, changes the base pair of only a few taxa or even a single taxon, with almost no effect on the observed overall frequencies.
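The decoupling of frequencies and substitution counts can be illustrated with a toy tree. The sketch assumes a hypothetical 10-taxon tree and a derived state that arises once on the stem of each listed subtree and is never lost (no reversals or nested changes); it compares one change on a deep branch with three changes on terminal branches. Taxon and function names are hypothetical.

```python
def leaves(tree):
    """Leaf names of a tree given as a leaf name (str) or a tuple of subtrees."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def derived_frequency(tree, changed_subtrees):
    """Fraction of extant taxa with the derived state, assuming it arose once on
    the stem of each listed subtree and was never lost (no reversals)."""
    carriers = set()
    for subtree in changed_subtrees:
        carriers.update(leaves(subtree))
    return len(carriers) / len(leaves(tree))


deep_clade = ((("t1", "t2"), "t3"), (("t4", "t5"), "t6"))
rest = (("t7", "t8"), ("t9", "t10"))
tree = (deep_clade, rest)

# One change on a deep branch vs. three changes on terminal branches.
print(derived_frequency(tree, [deep_clade]))         # 0.6 with 1 change
print(derived_frequency(tree, ["t7", "t9", "t10"]))  # 0.3 with 3 changes
```

A single ancient change yields twice the extant frequency of three recent changes, so tip frequencies alone say little about how often a substitution occurred.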
For example, in the Ulvales and also in angiosperms , the 'wobble' pairs G-U/U-G display much higher substitution rates with G-C/C-G than with A-U/U-A (see above). It has been argued that this bias in substitution rates is simply the result of the severalfold higher frequencies of G-C/C-G versus A-U/U-A. For the reasons given above, this argument is inconclusive, and we instead propose functional constraints combined with adaptive processes as a possible explanation for the observed bias (see above).
What is the significance of ITS2 for taxonomy and species definition in the Ulvales? So far, ITS2 has only rarely been used as a marker for phylogenetic analyses in the Ulvales, except in studies of single genera (Acrochaete - ; Acrosiphonia - ; Blidingia - [e.g. 79, 80]; Collinsiella/Monostroma - ; Gloeotilopsis - ; Ulva - [e.g. 23, 83–88]; Ulvaria - ; Urospora - ). Surprisingly, ITS2 proved to be well alignable across the entire order owing to its high structural conservation and low length variation, and thus allowed reconstruction of the phylogenetic branching pattern even above the level of the sampled families. To test the accuracy of the ITS2 tree, it was compared with a phylogeny derived from 18S rDNA data covering a similar, albeit not identical, set of taxa; this comparison revealed only a few conflicting branching patterns (see Results). Thus, ITS2 is an exceptionally informative phylogenetic marker in the Ulvales (see also ), especially given the relatively low number of alignable positions, and should in future be analyzed in combination with congruent data sets from other genes.
However, the most spectacular evolutionary aspect of ITS2 is its potential to predict sexual compatibility (intercrossing) among closely related organisms, thereby defining the level of 'biological' species. One of the most recent proposals is that any CBC in ITS2 is informative, and that two ITS2 sequences differing by at least one CBC likely represent two species . Although the predicted ITS2 secondary structure in the Ulvales is highly conserved, we found it very difficult, sometimes impossible, or at least subjective to align the highly variable regions (red circles surrounded by a green line in Figure 1). If the proposal by Müller et al. were applied, variation in ITS2 length (as observed in many taxa) would automatically result in the recognition of more species, because CBCs inferred from these ambiguously aligned regions would be counted as well; this is an untenable situation. We therefore favor the more conservative proposal by Coleman [25, 26], according to which at least one CBC between two organisms in the conserved regions of ITS2 predicts failure to cross sexually, i.e. these organisms represent two different species. Ideally, CBCs should have evolved (1) at approximately the same rate in sister lineages, and (2) at approximately the same or slightly slower rates than the genes that control gamete compatibility. As a consequence, the 'first' CBCs should appear at about the same time, associated with shallow divergences in the phylogenetic tree, and should define several parallel clades (CBC clades sensu Coleman) that might correspond to 'biological' species. In this scenario, the branches on which the 'first' CBCs occurred could be connected by a single vertical line, as shown e.g. in a cartoon phylogenetic tree . In the Ulvales, we found that none of these 'ideal' assumptions is fulfilled.
Clearly, many 'first' CBCs in the Ulvales are not associated with shallow branches at the level of 'biological' species, but instead map onto deep divergences representing the levels of genera, families, or even higher taxa. Only a few taxonomic species were equivalent to single CBC clades, e.g. Collinsiella tuberculata. Most CBC clades (sensu Coleman) within the Ulvales are therefore based on deep-branching CBCs, and each contains up to about 30 taxonomic species in several genera. An analysis concentrating on the ITS2 region of the Volvocaceae revealed a remarkable correspondence between CBC clade, Z clade, and species (e.g. Gonium pectorale) . Is it, therefore, possible that each of these comprehensive CBC clades in fact represents only a single species, containing diverging populations of several morphotypes that are still able to cross? Unfortunately, the crossing ability of most species of the Ulvales analyzed here has not been investigated, but the limited evidence available may already address this question. Species of Ulva are well separated from each other by gametic mating barriers, as studied in detail e.g. for the same strains of U. ohnoi, U. reticulata, and U. fasciata that were investigated here . These three species form one of many subclades within the large CBC clade sensu Coleman that includes the entire genus Ulva as well as most other members of the family Ulvaceae. Further observations in other Ulvales regarding morphological organization [e.g. 76, 93–100], ultrastructural characters such as the presence or absence of scales on zoospores, aplanospores, or gametes [82, 101–113], and habitat type [e.g. 42, 76] lead to the same conclusion. For example, the macroalgae Protomonostroma (foliose, marine) and Capsosiphon (tubular thallus, marine), as well as the branched filamentous Chamaetrichon (square-shaped scales on zoospores, freshwater) and several unbranched filamentous microalgae (e.g. Urospora, no scales, marine), are not differentiated by a CBC in the highly conserved regions of helices 2 and 3.
In summary, genes controlling gamete compatibility as well as genes involved in structural differentiation apparently evolved much faster than most CBCs in the ITS2 of the Ulvales.
The scattered, non-synchronous distribution of CBCs has another, unexpected consequence. Several major CBC clades, which are based on ancient CBC events, contain nested CBC clades that originated through more recent CBCs. Thus, only the latter category is monophyletic, whereas the major CBC clades, deeply rooted in the phylogenetic tree, usually form paraphyletic groupings, here termed CBC grades. In the Ulvales, only a few taxa fall into one of the four 'genuine' CBC clades, whereas most taxa are distributed among five comprehensive CBC grades. In other words, the absence of a CBC in the highly conserved regions of helices 2 and 3 neither implies a monophyletic group nor indicates a close relationship (i.e. at the species level) among the taxa sharing this trait. It remains to be determined whether the asynchrony of 'first' CBCs, and thus the predominance of CBC grades, is a special feature of the Ulvales or is widespread among eukaryotes.
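Once CBCs are mapped on a rooted tree, deciding whether the taxa sharing a CBC-defined state form a clade or a grade reduces to a monophyly test. The sketch below uses a hypothetical tree in which a deep CBC defines the ingroup {a, b, c, d, e} and a more recent CBC defines the nested clade {c, d}; taxa and function names are illustrative only.

```python
def leaves(tree):
    """Leaf names of a tree given as a leaf name (str) or a tuple of subtrees."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def clades(tree):
    """Leaf sets of all subtrees (clades) of a rooted tree."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    result = [frozenset(leaves(tree))]
    for child in tree:
        result.extend(clades(child))
    return result


def is_monophyletic(tree, taxa):
    """True if `taxa` is exactly the leaf set of one clade of the rooted tree."""
    return frozenset(taxa) in clades(tree)


# Hypothetical tree: "o" is the outgroup; a deep CBC on the ingroup stem and a
# recent CBC on the stem of (c, d).
tree = ("o", ((("a", "b"), ("c", "d")), "e"))

print(is_monophyletic(tree, {"a", "b", "c", "d", "e"}))  # True: deep CBC clade
print(is_monophyletic(tree, {"c", "d"}))                 # True: nested CBC clade
print(is_monophyletic(tree, {"a", "b", "e"}))            # False: CBC grade
```

The taxa that retain the older state without the recent CBC ({a, b, e}) share the absence of that CBC but do not form a clade, which is exactly the grade situation described above.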
Mapping all CBCs onto the phylogenetic tree is the only method to distinguish between 'genuine' CBC clades and CBC grades. Coleman already mapped CBCs in helices 2 and 3 of ITS2 onto the phylogeny of Pandorina isolates, similar to our approach, and to our knowledge this is still the only published example. Although most of the Pandorina members analyzed formed CBC (monophyletic) clades, the tree also revealed CBC grades containing isolates that are less closely related to each other than to isolates excluded from the grade by a specific CBC (e.g. PmU879 + PmNoz3923/PmKiev). Unfortunately, ITS2 comparisons involving CBC concepts are commonly performed in a simpler way, i.e. by pairwise comparison of two taxa [e.g. 22, 34, 53, 54, 114–118]. This 'phenetic' approach usually does not consider the phylogenetic history of CBC-type substitutions (plesiomorphic vs. apomorphic), and for several reasons it can lead to wrong conclusions (see Results). For distantly related taxa, pairwise comparison is always compromised by the possibility of homoplasious changes: all homoplasy types (parallelisms, convergences, reversals) can lead to similar or even identical sequences in unrelated organisms. Even for sister taxa, pairwise comparison of ITS2 CBCs is invalid unless the character state in their last common ancestor is taken into account. The discrepancy between a phenetic and a phylogenetic approach was highlighted here for two sister species of Acrochaete (Figure 4). At one base pair in the conserved part of Helix 2, pairwise comparison suggests that A. viridis and A. heteroclada differ by only a single hCBC (A-U vs. G-U). However, the state of this pair in their last common ancestor was G-C; thus, A. viridis evolved via a CBC (G-C → A-U), whereas its sister species differs from the ancestor by one hCBC (G-C → G-U).
Phenetic pairwise comparison would therefore predict possible mating ability, whereas the phylogenetic analysis resolves A. viridis as a separate species, likely unable to mate with its sister species.
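The Acrochaete example can be replayed in a few lines. The sketch writes each base pair as a two-letter string (5' and 3' nucleotide) and treats A-U, U-A, G-C, C-G, G-U, and U-G as paired; the pair states (A-U, G-U, and the ancestral G-C) are those given in the text, while the function name is hypothetical.

```python
PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}  # 5' nt + 3' nt


def classify(ancestor, descendant):
    """Classify a change between two states of one base pair."""
    n_changed = sum(x != y for x, y in zip(ancestor, descendant))
    if n_changed == 0:
        return "no change"
    if ancestor in PAIRS and descendant in PAIRS:
        return "CBC" if n_changed == 2 else "hCBC"
    return "non-compensating"


viridis, heteroclada, ancestor = "AU", "GU", "GC"  # states from the text

# Phenetic: compare the two extant species directly.
print(classify(viridis, heteroclada))   # hCBC
# Phylogenetic: compare each species with their last common ancestor.
print(classify(ancestor, viridis))      # CBC
print(classify(ancestor, heteroclada))  # hCBC
```

The same pair of sequences yields an hCBC under pairwise comparison but a CBC on the A. viridis lineage once the ancestral state is considered.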
Our case study of the Ulvales revealed several discrepancies with generally accepted assumptions about ITS2 evolution and about taxonomic concepts based on ITS2 characters. We hope that this study will stimulate others to investigate ITS2 data in greater detail by directly tracing the evolutionary history of individual characters rather than relying solely on indirect statistical methods. As soon as such 'close-up' views of ITS2 evolution are available for other groups of eukaryotes, it may be possible to re-evaluate the significance of ITS2 sequence variation for evolution, taxonomy, and speciation in eukaryotes in general.