The origin of multicellularity in cyanobacteria
© Schirrmeister et al; licensee BioMed Central Ltd. 2011
Received: 15 October 2010
Accepted: 14 February 2011
Published: 14 February 2011
Cyanobacteria are one of the oldest and morphologically most diverse prokaryotic phyla on our planet. The early development of an oxygen-containing atmosphere approximately 2.45 - 2.22 billion years ago is attributed to the photosynthetic activity of cyanobacteria. Furthermore, they are one of the few prokaryotic phyla where multicellularity has evolved. Understanding when and how multicellularity evolved in these ancient organisms would provide fundamental information on the early history of life and further our knowledge of complex life forms.
We conducted and compared phylogenetic analyses of 16S rDNA sequences from a large sample of taxa representing the morphological and genetic diversity of cyanobacteria. We reconstructed ancestral character states on 10,000 phylogenetic trees. The results suggest that the majority of extant cyanobacteria descend from multicellular ancestors. Reversals to unicellularity occurred at least 5 times. Multicellularity was established again at least once within a single-celled clade. Comparison to the fossil record supports an early origin of multicellularity, possibly as early as the "Great Oxygenation Event" that occurred 2.45 - 2.22 billion years ago.
The results indicate that a multicellular morphotype evolved early in the cyanobacterial lineage and was regained at least once after a previous loss. Most of the morphological diversity exhibited in cyanobacteria today —including the majority of single-celled species— arose from ancient multicellular lineages. Multicellularity could have conferred a considerable advantage for exploring new niches and hence facilitated the diversification of new lineages.
Subset of cyanobacterial taxa used for the analyses with GenBank accession numbers for 16S rDNA sequences
Chamaesiphon subglobosus PCC 7430¹
Arthronema gygaxiana UTCC 393
Cyanobium sp. JJ23-1
Arthrospira platensis PCC 8005
Cyanothece sp. PCC 8801¹
Crinalium magnum SAG 34.87
Chroococcus sp. JJCM
Filamentous thermophilic cyanobacterium
Geitlerinema sp. BBD HS217¹
Gloeobacter violaceus PCC 7421¹
Gloeothece sp. PCC 6909/1¹
Leptolyngbya sp. ANT.LH52.1
Microcystis aeruginosa strain 0381
Lyngbya aestuarii PCC 7419¹
Prochlorococcus sp. MIT9313¹
Microcoleus chthonoplastes PCC 7420¹
Radiocystis sp. JJ30-3
Oscillatoria sancta PCC 7515
Synechococcus elongatus PCC 6301¹
Phormidium mucicola IAM M-221
Synechococcus sp. CC9605
Plectonema sp. F3¹
Synechococcus sp. WH8101
Planktothrix sp. FP1
Synechocystis sp. PCC 6803
Prochlorothrix hollandica¹
Synechocystis sp. PCC 6308¹
Pseudanabaena sp. PCC 6802
Synechocystis sp. CR_L29¹
Pseudanabaena sp. PCC 7304¹
Synechococcus sp. P1
Spirulina sp. PCC 6313
Synechococcus sp. C9¹
Starria zimbabweensis SAG 74.90¹
Synechococcus lividus C1
Symploca sp. PCC 8002
Acaryochloris sp. JJ8A6¹
Trichodesmium erythraeum IMS101¹
Thermosynechococcus elongatus BP-1¹
Anabaena sp. PCC 7108
Chroococcidiopsis sp. CC2
Calothrix sp. PCC 7103¹
Dermocarpa sp. MBIC10768
Nodularia sp. PCC 7804¹
Nostoc sp. PCC 7120
Myxosarcina sp. PCC 7312¹
Scytonema sp. U-3-3¹
Myxosarcina sp. PCC 7325
Pleurocapsa sp. CALU 1126
Chlorogloeopsis sp. PCC 7518¹
Pleurocapsa sp. PCC 7516
Fischerella sp. PCC 7414
Symphyonema sp. strain 1517
Beggiatoa sp. 'Chiprana'
Different interpretations of multicellularity are currently in use [10–12]. For cyanobacteria, multicellularity has been characterized in previous studies [13–16]. Cell-to-cell adhesion, intercellular communication and, in more complex species, terminal cell differentiation appear to be the three essential processes that define multicellular prokaryotic organisms. Some forms of complexity found in several multicellular eukaryotes are absent in prokaryotes, but simple forms of multicellularity can be identified in three sections of the phylum cyanobacteria. Multicellular patterns comprise basic filamentous forms, as found in section III, as well as more complex forms involving terminal differentiation, present in sections IV and V. In eukaryotes, multicellular complexity ranges from forms comparable to cyanobacteria to organisms with up to 55 cell types, as estimated for higher invertebrates such as arthropods or molluscs. Considering that cyanobacterial sections III, IV and V resemble some of the first forms of multicellular filaments on Earth, knowing when and how these shapes evolved would further our understanding of complex life forms.
Some of the oldest body fossils unambiguously identified as cyanobacteria have been found in the Kasegalik and McLeary Formations of the Belcher Subgroup, Canada, and are estimated to be between 1.8 and 2.5 billion years old [6, 18]. Fossil assemblages from ~2.0-billion-year-old formations [18, 19] contain both unicellular and multicellular morphotypes of cyanobacteria. If one accepts the assumption that cyanobacteria were responsible for the rapid rise in atmospheric oxygen known as the "Great Oxygenation Event" [1–3, 5, 7], they must have existed by 2.32 billion years ago. Multicellular cyanobacterial fossils are well known from the late Precambrian [12, 20, 21], and multicellular forms possibly already existed 2.32 billion years ago. Other microbe-like multicellular filaments even older than 3.0 billion years have been found several times [22–26]. Some of the latter fossils are morphologically similar to species from the cyanobacterial order Oscillatoriales [27, 28], but no clear evidence of a cyanobacterial affinity has yet been adduced. Although the biogenicity of some of the oldest fossils has been questioned [29, 30], a large variety of bacteria, including anoxygenic phototrophs, already existed by the time cyanobacteria evolved oxygenic photosynthesis. Though impressive for prokaryotes, the fragmentary fossil record alone is not sufficient to disentangle the origin of cyanobacteria and their morphological phenotypes. Additional methods such as phylogenetic analysis therefore offer a promising way to gather further clues on the evolution of such a complex phylum.
Phylogenetic analyses of cyanobacteria have increased in number over the past 20 years [4, 31–39]. These studies have shown that morphological characterization does not necessarily reflect true relationships between taxa, and possibly none of the five traditional morphological sections is monophyletic. Similar morphologies must therefore have evolved several times independently, but details on this morphological evolution are scarce. Analyses assessing the characteristics of cyanobacterial ancestors [37, 39] provide fundamental information not only on the history of cyanobacteria but also on the evolution of life forms in the Archean Eon.
When phylogenetic relationships in bacteria are inferred from protein-coding genes, the results may reflect horizontal gene transfer (HGT). This issue is less problematic for ribosomal DNA. For protein-coding genes, the problem can potentially be reduced by analyzing datasets of concatenated conserved genes. Identifying such genes for phylogenetic analyses is not without difficulty and ideally requires the comparison of complete genomes. In cyanobacteria, many phylogenetic studies have concentrated on specific clades or smaller subsets of known species in this diverse phylum [39, 43–48]. As a result, the genomic data presently available are strongly biased towards certain groups. In particular, genomic studies in cyanobacteria have emphasized marine species from section I. Marine pico-phytoplankton (Synechococcus and Prochlorococcus) are a particularly well studied group [43, 45, 47, 48], as reflected by 19 of the 41 cyanobacterial genomes sequenced to date (http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi, accessed in January 2011). Only two genomes are known from species belonging to section III (Trichodesmium erythraeum and Arthrospira platensis). For sections IV (four genomes known) and V (no genomes known), molecular data are rare or missing. As genomic data accumulate, promising phylogenomic approaches to cyanobacteria are being established [37–39, 47]. Despite these advances, it is at present difficult to obtain sequences other than 16S rDNA that cover a representative sample of species from all five sections.
The aim of this paper is to use molecular phylogenetic methods to address the evolutionary history of cyanobacteria and the evolution of multicellularity. For this purpose, we established a phylogeny based on 16S rDNA sequences belonging to 1,254 cyanobacterial taxa. From that phylogeny we sampled 58 cyanobacterial taxa that represent all main clades obtained and all five sections described by Castenholz et al. [8, 9], and feature a 1:1 ratio of unicellular to multicellular species. We used several methods to reconstruct the morphological evolution of ancestral lineages, and compared our results to known fossil data. Since the fossil record is inconclusive on the timing and taxonomic position of multicellular cyanobacteria, our study provides independent evidence on the first appearance and evolution of multicellularity among the ancestors of living cyanobacteria.
We tested each of 22 eubacterial species from a diverse set of non-cyanobacterial phyla separately as an outgroup to a subset of the cyanobacteria (58 taxa). These 58 taxa were chosen from the large dataset of 1,254 taxa and cover all sub-groups of the tree (Table 1). This subset was used for all subsequent phylogenetic analyses. Although multicellular species seem to dominate the known cyanobacteria, we chose to sample a taxon set containing unicellular and multicellular morphotypes in a 1:1 ratio, so as to avoid biasing the reconstruction towards either character state. Furthermore, the taxa used in the analyses were chosen to represent species from all five sections described by Castenholz et al. Given our interest in the base of the phylogeny, a greater number of taxa were sampled from basal sub-groups. Because the necessary data are not yet available on GenBank, attempts to build a phylogenetic reconstruction of this size (58 species) using additional ribosomal protein sequences failed. However, genomic data are accumulating (57 genomes in progress according to GenBank) and will soon offer possibilities for further extensive analyses.
Fourteen of the resulting trees showed congruent topologies. Of the 14 eubacteria used as outgroups in these trees, we chose Beggiatoa sp. as the outgroup for further analyses because its 16S rRNA gene sequence shows the shortest distance to the cyanobacteria.
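The outgroup criterion above (shortest 16S rRNA distance to the cyanobacteria) can be sketched as follows. The text does not state which distance measure was used; this sketch assumes an uncorrected p-distance with pairwise deletion of gaps, averaged over the ingroup sequences. The helper names and the toy sequences are hypothetical.

```python
from statistics import mean


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Columns with a gap ('-') or an ambiguous base ('N') in either sequence are
    skipped (pairwise deletion); this treatment is an assumption.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def closest_outgroup(candidates: dict, ingroup: dict) -> str:
    """Hypothetical helper: candidate with the smallest mean p-distance to the ingroup."""
    return min(
        candidates,
        key=lambda name: mean(p_distance(candidates[name], seq) for seq in ingroup.values()),
    )


# Toy aligned fragments, made up for illustration (not real 16S rDNA data).
ingroup = {"cyano_1": "ACGTACGTAC", "cyano_2": "ACGTACGTTC"}
candidates = {"outgroup_A": "ACGTACGAAC", "outgroup_B": "TCGAACCTAG"}
print(closest_outgroup(candidates, ingroup))  # outgroup_A
```

With a model-corrected distance (e.g. under GTR) the ranking could differ for highly divergent candidates, which is why the measure is flagged as an assumption.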
A general time-reversible substitution model (GTR+G+I) was applied in both the maximum likelihood and the Bayesian analyses. The results of the two methods are highly congruent. The Bayesian tree, with posterior probabilities (black) and bootstrap values (red) displayed at the nodes, is shown in Figure 4. Posterior probabilities above 0.95 and bootstrap values over 70% are considered to represent high phylogenetic support. Bootstrap values between 50% and 70% are considered weak support. Posterior probabilities below 0.90 and bootstrap values below 50% are not displayed. At deep nodes, the tree topology is fully resolved with high posterior probabilities. Apart from section V, none of the morphological sections described by Castenholz et al. is monophyletic. Within the cyanobacteria, branch lengths are relatively short compared with the branch leading to the outgroup Beggiatoa sp., which seems surprising given the age of the phylum. However, rates of evolution in cyanobacteria are extremely slow, and this so-called "hypobradytelic" tempo would explain their short evolutionary distances [20, 57, 58].
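A minimal sketch of the support thresholds stated above, applied to node labels in the PP/BV notation used in the text. The function names are hypothetical, and treating posterior probabilities between 0.90 and 0.95 as "shown but not high" is a reading of the thresholds rather than a rule stated in the text.

```python
def support_level(value: float, kind: str) -> str:
    """Classify a support value with the thresholds stated in the text.

    kind: "pp" for a posterior probability (0-1) or "bv" for a bootstrap value (%).
    """
    if kind == "pp":
        if value > 0.95:
            return "high"
        return "shown" if value >= 0.90 else "hidden"  # PP < 0.90 is not displayed
    if kind == "bv":
        if value > 70:
            return "high"
        return "weak" if value >= 50 else "hidden"  # BV < 50% is not displayed
    raise ValueError(f"unknown support kind: {kind!r}")


def node_label(pp: float, bv: float) -> str:
    """Format a node label as 'PP/BV', writing '-' for values that are not displayed."""
    pp_text = "-" if support_level(pp, "pp") == "hidden" else f"{pp:.2f}"
    bv_text = "-" if support_level(bv, "bv") == "hidden" else f"{bv:.0f}%"
    return f"{pp_text}/{bv_text}"


print(node_label(0.99, 42))  # 0.99/-
print(node_label(1.0, 97))  # 1.00/97%
print(support_level(66, "bv"))  # weak
```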
The cyanobacteria form the three distinct clades mentioned earlier (Figure 3). Clades E, AC and C have posterior probabilities (PP)/bootstrap values (BV) of 1.0/51%, 0.99/- and 1.0/97%, respectively (no support: "-"). Clade E comprises all taxa analyzed from section II, some from section I (Synechocystis, Microcystis, Gloeothece and others), some from section III (Oscillatoria, Trichodesmium, Arthrospira, Lyngbya, Microcoleus, Spirulina and others) and all from sections IV and V. Clade E contains two subclades: E1 (species from section II; PP/BV = 1.0/81%) and B (species from sections IV and V, among others; PP/BV = 1.0/100%). Clade AC contains species from sections I and III (among others, species from the genera Synechococcus, Prochlorococcus, Oscillatoria and Plectonema). Clade C consists of Pseudanabaena species, Arthronema gygaxiana and Phormidium mucicola, all belonging to section III. Gloeobacter violaceus is placed closest to the outgroup. Several earlier phylogenetic studies show approximate agreement with the tree topology generated here [4, 31–39, 54]. To check the consistency of our maximum likelihood and Bayesian results with previous studies, we compare them with the trees of Honda et al. and Turner et al., who used 16S rDNA sequences, and of Swingley et al., who used a genomic approach.
The tree in Figure 2 of Honda et al. is largely congruent with ours. The only exception is that Honda et al. place "Synechococcus elongatus Toray" separately, between Gloeobacter and the rest of the cyanobacteria. In our study, "Synechococcus elongatus Toray" (identical to Thermosynechococcus elongatus BP-1) is located within clade AC and not next to Gloeobacter violaceus.
In Turner et al., the major clades are congruent with those inferred in our study, but there are a few differences in the relationships among these clades. In that study, the counterpart of clade E1 is sister to clade AC, which is not the case in our consensus tree. Furthermore, Turner et al. group Synechococcus C9 with Synechococcus P1, which might be due to long-branch attraction. In our phylogenetic tree, Synechococcus C9 is placed within clade AC, a relationship supported by high posterior probability and bootstrap values (1.0/99%). Clade C occupies the same position in our tree as in that of Turner et al.
Swingley et al. used a phylogenomic approach to investigate cyanobacterial relationships. Because the genome data currently available are limited and biased, some clades present in our tree are missing from that study. Even so, the main clades recovered in that study are mostly congruent with clades in our tree.
The monophyly of section V (the branching, differentiated cyanobacteria) shown in our tree agrees with Turner et al. and other studies [36, 54]. Nonetheless, the monophyly of section V may be an artifact of limited taxon sampling, since section V has been found to be polyphyletic in another study. Gloeobacter violaceus is placed as the first diverging lineage in the phylogeny after the outgroup, as suggested by previous studies [4, 32–35, 37, 39, 54]. Our phylogenetic reconstruction also confirms the scattered placement of taxa belonging to sections I and III throughout the tree [4, 31–37, 39, 54]. The finding that none of the traditional morphological sections, with the possible exception of section V, is monophyletic indicates that similar morphologies have been gained and lost several times during the evolutionary history of living cyanobacteria. Overall, the strong phylogenetic agreement between this and earlier studies confirms the suitability of the tree presented here for further analyses of morphological evolution.
Our analysis indicates that multicellularity is a phylogenetically conservative character (p-value < 0.01). When the character states of the terminal taxa of the Bayesian consensus tree are randomly reshuffled, the 1,000 reshuffled trees require on average 20 transitions. In contrast, an average of only nine parsimony steps was observed across the 10,000 randomly sampled trees used in our ancestral character state reconstruction.
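The reshuffling comparison can be illustrated with Fitch parsimony on a single toy tree. This is a sketch under stated assumptions: the character is coded 0 (unicellular) or 1 (multicellular), the tree is a made-up balanced tree written as nested tuples, and the p-value is a simple one-sided permutation estimate. The text does not specify the software or the exact p-value calculation used in the study, and all names here are hypothetical.

```python
import random


def fitch_steps(tree, states):
    """Minimum number of state changes on a rooted binary tree (Fitch parsimony).

    tree: nested 2-tuples of tip names, e.g. (("a", "b"), ("c", "d")).
    states: dict mapping each tip name to its character state.
    """
    def visit(node):
        if isinstance(node, str):  # a tip
            return {states[node]}, 0
        (left, left_cost), (right, right_cost) = visit(node[0]), visit(node[1])
        cost = left_cost + right_cost
        common = left & right
        if common:
            return common, cost
        return left | right, cost + 1  # no shared state: one extra change

    return visit(tree)[1]


def balanced_tree(tips):
    """Build a toy balanced tree by splitting the tip list in half recursively."""
    if len(tips) == 1:
        return tips[0]
    mid = len(tips) // 2
    return (balanced_tree(tips[:mid]), balanced_tree(tips[mid:]))


def reshuffle_test(tree, states, n_perm=1000, seed=1):
    """Observed steps vs. steps after randomly reshuffling states among the tips."""
    rng = random.Random(seed)
    tips = list(states)
    observed = fitch_steps(tree, states)
    null = []
    for _ in range(n_perm):
        values = [states[t] for t in tips]
        rng.shuffle(values)
        null.append(fitch_steps(tree, dict(zip(tips, values))))
    # One-sided permutation p-value: how often reshuffled data need <= observed steps.
    p_value = (1 + sum(s <= observed for s in null)) / (n_perm + 1)
    return observed, sum(null) / n_perm, p_value


# Toy data: 8 multicellular (1) and 8 unicellular (0) tips, each group forming a clade.
tips = [f"m{i}" for i in range(8)] + [f"u{i}" for i in range(8)]
tree = balanced_tree(tips)
states = {t: 1 if t.startswith("m") else 0 for t in tips}
observed, mean_null, p_value = reshuffle_test(tree, states)
print(observed)  # 1
print(mean_null, p_value)
```

A character that is clustered on the tree needs far fewer steps than the reshuffled average, which is the pattern behind the comparison of nine versus 20 transitions described above.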
Different transition rates with which ancestral character states were estimated.
Maximum likelihood analysis
Ancestral character states of nodes 3, 4 and 5 using different transition rates and methods.
The maximum likelihood results are not contradicted by maximum parsimony optimization (Table 3 and Additional File 5). Using maximum parsimony as the reconstruction method, the uniquely best states were counted across 10,000 trees randomly sampled from the two MC³ runs of the Bayesian tree reconstruction. The relative probabilities of a multicellular ancestor at nodes 3, 4 and 5 are 0.68, 0.68 and 0.69, respectively. In contrast, the relative probabilities of a unicellular ancestor at nodes 3, 4 and 5 under parsimony reconstruction are 0.0013, 0.0014 and 0.0014, respectively.
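Counting uniquely best states across sampled trees amounts to tallying, for a given node, how often parsimony leaves a single optimal state. The sketch below assumes the per-tree sets of equally parsimonious states at the node have already been reconstructed; counting equivocal nodes and trees that lack the node (None) only in the denominator is an assumption, as are the helper name and the toy inputs.

```python
def uniquely_best_frequencies(per_tree_states):
    """Fraction of sampled trees in which each state is the single best state at a node.

    per_tree_states: one entry per sampled tree, either the set of equally
    parsimonious states at the node of interest or None if the node is absent.
    Equivocal nodes (more than one best state) and absent nodes count only
    toward the denominator (an assumption).
    """
    n_trees = len(per_tree_states)
    counts = {}
    for best in per_tree_states:
        if best is not None and len(best) == 1:
            (state,) = best
            counts[state] = counts.get(state, 0) + 1
    return {state: c / n_trees for state, c in counts.items()}


# Toy input for 10 hypothetical trees (not data from this study).
sampled = [{"multi"}] * 7 + [{"uni"}, {"multi", "uni"}, None]
print(uniquely_best_frequencies(sampled))  # {'multi': 0.7, 'uni': 0.1}
```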
At least five reversals to unicellularity occurred in the tree, three of them within clade AC. The first of these transitions occurred on a branch leading to a group of thermophilic cyanobacteria: Acaryochloris sp., Synechococcus lividus C1 and Thermosynechococcus elongatus. Posterior probabilities (PP) and bootstrap values (BV) for this group are 0.99/73%, whereas the sister group within AC is supported by 0.96/66% (PP/BV). The second transition within clade AC also led to a thermophilic cyanobacterium, Synechococcus C9. The sister relationship of this species to a filamentous thermophilic cyanobacterium is supported by 1.0/99% (PP/BV). The last transition in clade AC occurred within the group including the marine pico-phytoplankton genera Synechococcus and Prochlorococcus. The filamentous Prochlorothrix hollandica appears to be the closest relative of the group that includes the marine pico-phytoplankton, with a support of 1.0/61% (PP/BV). Clade AC has a PP of 0.99, while its BV is below 50%. Although bootstrap support is below 70% for clade AC and some groups within it, posterior probabilities show very high support (> 0.95). Simulation studies have shown that posterior probabilities approximate the actual probability of a clade [61–63], whereas bootstrapping tends to underestimate the actual probability of a true clade. Although posterior probabilities can be misleading if the model of evolution is underparameterized, overparameterization has only a minor effect on them. Therefore, using a complex model of evolution, such as the "general time reversible with gamma distributed rate variation" (GTR+G) model, is recommended [62, 63]. We used the GTR+G+I model for our analysis and assume that nodes with a PP higher than 0.95 are reliable.
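The posterior probability of a clade is estimated as a simple frequency: the fraction of post-burn-in MCMC trees that contain it. A minimal sketch, assuming rooted trees written as nested tuples (the sampled MRBAYES trees are unrooted, so a full analysis would count bipartitions instead); all names and trees below are hypothetical.

```python
from collections import Counter


def clades(tree):
    """All clades (as frozensets of tip names) of a rooted tree written as nested tuples."""
    found = set()

    def visit(node):
        if isinstance(node, str):  # a tip
            return frozenset([node])
        tips = frozenset().union(*(visit(child) for child in node))
        found.add(tips)
        return tips

    visit(tree)
    return found


def clade_posteriors(sampled_trees):
    """Posterior probability of each clade = fraction of sampled trees containing it."""
    counts = Counter()
    for tree in sampled_trees:
        counts.update(clades(tree))
    return {clade: n / len(sampled_trees) for clade, n in counts.items()}


# Four hypothetical post-burn-in trees rooted on an outgroup "O".
sample = [((("A", "B"), "C"), "O")] * 3 + [((("A", "C"), "B"), "O")]
pp = clade_posteriors(sample)
print(pp[frozenset({"A", "B"})])  # 0.75
print(pp[frozenset({"A", "C"})])  # 0.25
```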
It is very likely that at least one additional reversal to unicellularity occurred in clade E1, but phylogenetic support is not high enough to locate the exact position of this transition. Similarly, support for the nodes where the other transition to multicellularity within clade E occurred is missing. The exact locations of reversals within clade E therefore are not certain and a scenario where multiple reversals occurred cannot be excluded. In clade E, there is also a reversal to multicellularity observed in Spirulina sp. PCC 6313. The location of this transition is supported by posterior probabilities of 0.99 at two ancestral nodes.
Stucken et al. compared gene sets of multicellular cyanobacteria and found that at least 10 genes are essential for the formation of filaments. Besides genes previously thought to be correlated with heterocyst formation (hetR, patU3 and hetZ), they found seven genes coding for hypothetical proteins. The species they compared all fall within clade E in our tree, and most of them are differentiated. Unfortunately, no genomes from multicellular species in more basal clades are available at present. However, genome projects for Phormidium sp. ISC 31 and Plectonema sp. ISC 33 are under way (http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi). If these species turn out to group with Phormidium mucicola IAM M-221 and Plectonema sp. F3 from the basal clades C and AC in our study, they could provide important information on the original metabolic pathways of ancient multicellular cyanobacteria and on possible advantages of multicellularity.
The majority of cyanobacteria living today are described as successful ecological generalists growing under diverse conditions. Our analysis indicates that this diverse range of cyanobacterial morphotypes found in various habitats today, whether multicellular or unicellular, has evolved from multicellular ancestors.
In eukaryotes, simple multicellular forms laid the foundation for the evolution of complex multicellular organisms. Although complex multicellularity exhibiting more than three cell types is presumably absent in prokaryotes, bacteria evolved simple multicellular forms possibly more than 1.5 billion years earlier than eukaryotes [24–26, 65]. Multicellularity has been described as one of several major transitions that occurred in the history of life. These transitions between different units of selection resulted in changes in the organizational confines of the individual. Maynard Smith and Szathmáry (1995, p. 6) summarize eight major transitions in the evolution of life after which "entities that were capable of independent replication before the transition can replicate only as part of a larger whole after it". These transitions can create new units of selection at a higher level of complexity. The origin of chromosomes, of the eukaryotic cell, of multicellular organisms and of eusocial communities are among the major transitions that redefine the degree of individuality [66, 67, 69, 70]. Some transitions are thought to be unique, such as the evolution of meiosis or of the genetic code. Other major transitions occurred several times independently, such as the evolution of eusociality [71, 72] and of multicellularity [10, 66, 73–75]. There is a tendency to assume that these transitions occur in a progression that leads to an increase in complexity. In cyanobacteria, however, this does not seem to be the case: anatomical complexity has been lost several times during their evolution (Figure 5). Similarly, a complex character such as eusociality has been lost several times in halictid bees [72, 76]. Conversely, the phylogeny indicates that multicellularity re-evolved in Spirulina. The regaining of complex characters has been observed in other studies as well [77–79].
Nonetheless, some studies state that the re-evolution of a complex character after a previous loss is not possible [80, 81]. Such studies argue that, according to 'Dollo's law', a loss of complexity is irreversible, a statement that is not supported in the cyanobacterial case. Repeated transitions in either direction are possible.
Some late Archean fossils show an oscillatorian or chroococcacean morphotype (Figure 7: 8, 9). Oscillatorian-like fossils 2.52 and 2.56 billion years old [24, 88, 89] could possibly represent close relatives of cyanobacterial ancestors, and 2.72-billion-year-old filamentous bacteria could potentially represent one of the first multicellular cyanobacteria detected. Among single-celled forms, 2.56-billion-year-old unicellular fossils [89–92] may well represent chroococcacean fossils, relatives of the ancestors of Gloeobacter violaceus or Synechococcus sp. P1 (Figure 7).
The first conclusive cyanobacterial fossils from all five sections have been reported from rocks around 2.15 billion years old. In 1976, Hofmann described microfossils from stromatolitic dolomites in the Kasegalik and McLeary Formations of the Belcher Supergroup in Hudson Bay, northern Canada. Among these fossils are Halythrix, which seems to belong to the order Oscillatoriales (section III); Eosynechococcus and Entophysalis, both presumably of the order Chroococcales (section I); and Myxococcoides (section II). In 1997, similar fossils were described by Amard and Bertrand-Sarfati from Paleoproterozoic cherty stromatolites of "Formation C (FC)" of the Franceville Group in Gabon, dating back 2.00 billion years. They also characterized chroococcalean fossils, particularly Eosynechococcus and Tetraphycus, filamentous bacteria (Gunflinta) that probably resemble cyanobacteria, and Myxococcoides fossils. Furthermore, large microfossils (so-called Archaeoellipsoides elongatus) with akinetes similar to those of Anabaena-like species were found [4, 19]. Akinetes are resting cells that are present only in differentiated cyanobacteria from sections IV and V. As several studies have confirmed, sections IV and V share a most recent common ancestor [4, 33, 36]. These fossil akinetes therefore document the existence of differentiated cyanobacteria 2.00 billion years ago. Given that differentiation in cyanobacteria is evolutionarily stable only in a multicellular setting, this again supports the notion that multicellular cyanobacteria must have existed earlier than 2.0 billion years ago.
Several studies have assessed prokaryotic history using phylogenetic dating methods [50, 52]. In these studies, the origin of cyanobacteria has been estimated at around the time of the "Great Oxygenation Event" 2.20-2.45 billion years ago [2, 7]. Other studies have reported elevated oxygen levels before the great rise of atmospheric oxygen [7, 94]. Using small and large ribosomal subunit sequences, Blank and Sanchez-Baracaldo estimated the origin of cyanobacteria at between 2.7 and 3.1 billion years ago. They also addressed the evolution of cyanobacterial traits and concluded that multicellular cyanobacteria did not originate before 2.29-2.49 billion years ago. The study of Blank and Sanchez-Baracaldo used a smaller set of cyanobacterial taxa, which lacked some of the basal multicellular species present in clade C of our analysis. These taxa could substantially affect the estimated timing of the first multicellular cyanobacteria, and further dating analyses would be needed to resolve this issue. Clearly, as Blank and Sanchez-Baracaldo point out, such analyses can only ultimately resolve cyanobacterial history if a larger number of cyanobacterial genomes, representing all the morphological and genetic diversity within this phylum, become available.
Cyanobacteria, a phylum of photosynthetic prokaryotes, are among the oldest lineages still alive on this planet. Approximately 2.20-2.45 billion years ago, cyanobacteria raised the atmospheric oxygen level and laid the basis for the evolution of aerobic respiration [1–6]. They introduced a dramatic change in the Earth's atmosphere, which might have created opportunities for more complex life forms to evolve. Considering the importance of cyanobacteria for the evolution of life, it is unfortunate that data sets for a representative phylogenomic analysis are not yet available. A coordinated effort among research groups and a diversified taxon-sampling strategy for genome projects would make more comprehensive studies of cyanobacterial evolution possible. By presenting results obtained from 16S rDNA data here, we hope to stimulate interest in more extensive genomic studies of this phylum. Phylogenomic approaches would help to further investigate some of the results of the present work.
Multicellular prokaryotic fossils from the Archean Eon are documented [25, 26], and fossil data are consistent with the possibility of multicellular cyanobacteria in the Archean Eon [24, 88–90]. Furthermore, studies describe minor rises in oxygen levels around 2.8 to 2.6 billion years ago and around 2.5 billion years ago. Multicellular cyanobacteria could therefore have evolved before the rise of oxygen in the atmosphere. The "Great Oxygenation Event", also referred to as the "oxygen crisis", may have marked one of the first mass extinction events in Earth's history. New habitats that developed around 2.32 billion years ago as a result of this dramatic change in Earth's atmosphere could have triggered the evolution of the variety of cyanobacterial morphotypes preserved until today.
In terms of cell types, cyanobacteria reached their maximum morphological complexity around 2.00 billion years ago. By the time eukaryotes evolved, cyanobacteria already exhibited the full range of their morphological diversity. Owing to the slow evolutionary rates of cyanobacteria, which have been described as "hypobradytelic" [20, 57, 58], extant cyanobacteria that appear to exhibit the same morphotype as in the Precambrian Eon are reminiscent of the idea of "living fossils". However, one should consider the possibility that what appears to be morphological stasis may be due to developmental constraints at the phylum level. Cyanobacteria apparently reached their maximum complexity early in Earth history, but instead of morphological stasis at the species level, our results suggest that they subsequently changed morphotypes several times during their evolution. This allowed them to explore diverse morphotypes within their developmental constraints, including the loss and regaining of multicellular growth forms.
A total of 2,065 16S rRNA gene sequences from the phylum cyanobacteria were downloaded from GenBank. Unidentified and uncultured species were excluded. With this large dataset phylogenetic reconstructions were conducted as described in the next section. Aside from cyanobacteria, the dataset included six chloroplast sequences and six eubacterial sequences: Beggiatoa sp., Thiobacillus prosperus, Agrobacterium tumefaciens, Chlorobium sp., Candidatus Chlorothrix halophila and Escherichia coli HS.
From this large tree, a subset of 58 cyanobacterial sequences was selected for further analyses. Accession numbers are provided in Table 1. Species from all five sections described by Castenholz et al. were included. Taxa were chosen to represent a 1:1 ratio of unicellular to multicellular species. The final data set contained 22 single-celled taxa from section I, 7 single-celled taxa from section II, 21 multicellular taxa from section III, 5 multicellular, differentiated taxa from section IV and 3 differentiated, branching taxa from section V, as described by Castenholz et al.
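The section counts above can be checked against the stated total of 58 taxa and the 1:1 ratio; this is plain arithmetic on the numbers given in the text, written as a short Python snippet:

```python
# Taxon counts per section and growth form, as stated in the text.
sections = {
    "I": (22, "unicellular"),
    "II": (7, "unicellular"),
    "III": (21, "multicellular"),
    "IV": (5, "multicellular"),
    "V": (3, "multicellular"),
}
total = sum(n for n, _ in sections.values())
unicellular = sum(n for n, form in sections.values() if form == "unicellular")
multicellular = total - unicellular
print(total, unicellular, multicellular)  # 58 29 29
```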
Non-cyanobacterial species used in this study with GenBank accession numbers for 16S rDNA sequences
Acidobacterium capsulatum ATCC 51196
Actinosynnema mirum DSM 43827
Aquifex aeolicus VF5
Bacteroidetes bacterium X3-d
Verrucomicrobia bacterium YC6886
Chlorobium sp. sy9
Chloroflexus sp. Y-400-fl
Deferribacter desulfuricans SSM1
Deinococcus sp. AA63
Streptococcus mutans NN2025
Planctomyces brasiliensis DSM 5305
Beggiatoa sp. 'Chiprana'
Spirochaeta thermophila DSM 6192
Thermotoga lettingae TMO
Nanoarchaeum equitans Kin4-M
The 2,065 16S rRNA gene sequences were aligned using MAFFT via the CIPRES Portal. The alignment was corrected manually using BioEdit v7.0.5. Poorly aligned and duplicated sequences were excluded from the alignment. From the remaining 1,254 sequences (1,235 characters), a phylogenetic tree was reconstructed from 10 maximum likelihood searches in RAxML v7.0.4. GTR + G + I (general time-reversible model; G: gamma-distributed rate variation among sites; I: proportion of invariable sites) [105, 106] was used as the substitution model. Bootstrap values were calculated from 100 resamplings of the dataset and plotted on the best maximum likelihood tree using RAxML v7.0.4. The resultant tree (Figure 1; Additional File 6: newick format; Additional File 7: taxon names) was visualised in FigTree v1.3.1 http://tree.bio.ed.ac.uk/software/figtree/ and graphically edited with Adobe Illustrator CS2 http://www.adobe.com/products/illustrator/.
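Bootstrap values of the kind computed here come from nonparametric resampling: alignment columns are drawn with replacement to build pseudo-replicate alignments, each replicate is analysed, and a clade's bootstrap value is the percentage of replicate trees that contain it. A minimal sketch of the resampling step only, with a hypothetical function name and toy sequences; it does not reproduce RAxML's own implementation.

```python
import random


def bootstrap_alignment(alignment, rng):
    """One nonparametric bootstrap replicate: resample alignment columns with replacement.

    alignment: dict mapping taxon name -> aligned sequence (all of equal length).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]  # same columns for every taxon
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


# Toy alignment (made up); 100 replicates, matching the number of resamplings above.
aln = {"taxon_1": "ACGTAC", "taxon_2": "ACGTTC", "taxon_3": "AGGTAC"}
rng = random.Random(42)
replicates = [bootstrap_alignment(aln, rng) for _ in range(100)]
print(replicates[0])
```

Drawing one set of column indices per replicate and applying it to every taxon keeps each resampled column intact, which is what preserves the site patterns the tree search relies on.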
To test different outgroups, phylogenetic trees were reconstructed using all sampled non-cyanobacterial species (Table 4) plus five representative species from the cyanobacterial phylum (Table 1). Sequences were aligned using Clustal-X with default settings and corrected manually. The trees were built using maximum likelihood and Bayesian inference, with and without an outgroup from the domain Archaea. Fifty separate maximum likelihood searches were conducted using RAxML v7.0.4, and the tree with the best log-likelihood was chosen. Bootstrap support for each tree was obtained from 100 resamplings. Bayesian analyses were conducted with MRBAYES 3.1 using a GTR + G + I model, with substitution rates, base frequencies, the proportion of invariable sites and the shape parameter of the gamma distribution estimated by the program. Two Metropolis-coupled Markov chain Monte Carlo (MC³) searches, each with four chains (three heated and one cold), were run. Each analysis started from a random tree and was run for 5,000,000 generations. Trees and parameters were sampled every 100th generation. Convergence was checked by confirming that the standard deviation of split frequencies was below 0.05. The first 3,000,000 generations were excluded as burn-in.
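These sampling settings determine how many trees each run contributes after burn-in, both for the 5,000,000-generation runs above and for the 10,000,000-generation runs in the next paragraph. A small sketch of that arithmetic, assuming that any additional sample written at generation 0 is ignored; the function name is hypothetical.

```python
def retained_samples(n_generations, sample_freq, burnin_generations, n_runs=2):
    """Hypothetical helper: samples kept per run and in total after discarding burn-in.

    Ignores any extra sample written at generation 0.
    """
    per_run = (n_generations - burnin_generations) // sample_freq
    return per_run, per_run * n_runs


print(retained_samples(5_000_000, 100, 3_000_000))  # (20000, 40000)
print(retained_samples(10_000_000, 100, 3_000_000))  # (70000, 140000)
```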
Additionally, phylogenetic analyses were conducted with Bayesian inference, combining each of the 22 eubacterial species separately with the sampled cyanobacterial subset (58 taxa). Alignments were built using Clustal-X with default settings and corrected manually. For each phylogenetic analysis, two MC³ searches were run for 10,000,000 generations using MRBAYES 3.1, with trees and parameters sampled every 100th generation. The first 3,000,000 generations were excluded as burn-in, after confirming that the standard deviation of split frequencies was below 0.05 and that the log-likelihoods of the trees had reached stationarity. The results were compared, and Beggiatoa sp. was chosen as the outgroup for further analyses.
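The convergence criterion used above, the average standard deviation of split frequencies between independent runs, can be sketched in a few lines. In this hypothetical Python version each sampled tree is represented as a set of splits (bipartitions), each split given as a frozenset of the taxa on one side and assumed to be already canonicalised. The minimum-frequency threshold `min_freq` and the use of the n-1 divisor are assumptions; MrBayes's exact conventions may differ.

```python
import statistics
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Sequence

Split = FrozenSet[str]   # taxa on one side of a bipartition (assumed canonicalised)
Tree = Iterable[Split]   # a sampled tree represented by its set of splits

def split_frequencies(trees: Sequence[Tree]) -> Dict[Split, float]:
    """Fraction of sampled trees that contain each split."""
    counts: Counter = Counter()
    for tree in trees:
        counts.update(set(tree))
    return {split: c / len(trees) for split, c in counts.items()}

def asdsf(runs: List[Sequence[Tree]], min_freq: float = 0.10) -> float:
    """Average standard deviation of split frequencies across independent runs."""
    if len(runs) < 2:
        raise ValueError("need at least two independent runs")
    freqs = [split_frequencies(run) for run in runs]
    all_splits = set().union(*freqs)
    sds = []
    for split in all_splits:
        per_run = [f.get(split, 0.0) for f in freqs]
        # Skip splits that are rare in every run (hypothetical threshold).
        if max(per_run) < min_freq:
            continue
        sds.append(statistics.stdev(per_run))
    return sum(sds) / len(sds) if sds else 0.0
```

Values close to zero indicate that the runs are sampling the same tree distribution; the thresholds of 0.05 and 0.01 used in this study are cut-offs applied to this quantity.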
The 16S rRNA gene sequences of the cyanobacterial subset and Beggiatoa sp. (59 taxa, 1,166 characters) were aligned using Clustal-X with default settings and corrected manually. Substitutional saturation of the cyanobacterial alignment (excluding the outgroup) was tested with the program DAMBE [109, 110], using the information-entropy-based index of substitutional saturation. Because the test can be applied to at most 32 sequences, we sampled 32 representative sequences spanning the whole phylogeny and performed the test introduced by Xia et al. (Table 1 and Additional File 4).
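As we understand Xia et al.'s index, its numerator is the mean information entropy of the alignment columns. The Python sketch below computes only that quantity. The expected entropy under full saturation and the critical values (Iss.c) that DAMBE derives from simulations are not reproduced, and treating `-`, `?` and `N` as missing data is an assumption of this sketch.

```python
import math
from collections import Counter
from typing import Sequence

def site_entropy(column: str, missing: str = "-?N") -> float:
    """Shannon entropy (bits) of the nucleotides in one alignment column, ignoring missing data."""
    residues = [c for c in column.upper() if c not in missing]
    if not residues:
        return 0.0
    n = len(residues)
    return -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())

def mean_site_entropy(sequences: Sequence[str]) -> float:
    """Average per-site entropy over all columns of an alignment."""
    n_sites = len(sequences[0])
    columns = ("".join(seq[i] for seq in sequences) for i in range(n_sites))
    return sum(site_entropy(col) for col in columns) / n_sites
```

A column with one base has entropy 0 and a column with all four bases equally frequent has the maximum of 2 bits; saturation pushes the mean towards the value expected for random sequences.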
Phylogenetic reconstruction was carried out using Bayesian analysis and maximum likelihood. Maximum likelihood analysis was performed using GARLI 0.96, and Bayesian analysis was conducted with MRBAYES 3.1. The nucleotide substitution model that best fitted the data was selected with the Akaike Information Criterion as implemented in Modeltest 3.5. The selected model was GTR + G + I. Substitution rates, base frequencies, the proportion of invariable sites and the shape parameter of the gamma distribution were estimated during the analyses. Fifty maximum likelihood searches were performed. Bootstrap values were calculated from 500 resamplings of the data set and plotted on the best ML tree using the program SumTrees (Additional File 3).
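Model selection by the Akaike Information Criterion reduces to computing AIC = 2k − 2 ln L for each candidate model and keeping the smallest value. The Python sketch below shows only that final comparison. The log-likelihoods are invented for illustration and are not values from this study, and the parameter counts include only free substitution-model parameters, not branch lengths (an assumption; Modeltest can be configured differently).

```python
from typing import Dict, Tuple

def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def select_model(candidates: Dict[str, Tuple[float, int]]) -> str:
    """Return the name of the candidate model with the lowest AIC."""
    return min(candidates, key=lambda name: aic(*candidates[name]))

# Hypothetical (ln L, k) pairs for three nested models.
candidates = {
    "JC": (-10500.0, 0),
    "HKY+G": (-10100.0, 5),     # kappa, 3 base frequencies, gamma shape
    "GTR+G+I": (-10050.0, 10),  # 5 exchangeability rates, 3 frequencies, gamma shape, p-inv
}

if __name__ == "__main__":
    for name, (lnl, k) in candidates.items():
        print(name, aic(lnl, k))
    print("selected:", select_model(candidates))
```

With these made-up numbers the extra parameters of GTR + G + I are justified because its likelihood gain (50 log units over HKY + G) exceeds the penalty for five additional parameters.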
Bayesian analysis was conducted by running two MC³ searches, each with four chains (one cold and three heated). Starting from a random tree, each search was run for 16,616,000 generations, with trees sampled every 100th generation. Convergence was assessed (standard deviation of split frequencies below 0.01, effective sample sizes above 200, potential scale reduction factor equal to 1.0) using Tracer v1.4.1 and the program AWTY. Burn-in was set to 3,323,200 generations per run, corresponding to the first 20% of each analysis. The average standard deviation of split frequencies remained below 0.01 for the remaining 132,929 trees of each run, indicating that the log-likelihoods had reached a steady state.
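The tree counts reported above can be checked with simple arithmetic. The Python sketch below assumes that the sampler also records the starting state at generation 0, as MrBayes does, so a run of G generations sampled every f generations yields G/f + 1 trees. Integer arithmetic is used for the 20% burn-in to avoid floating-point rounding.

```python
from typing import Tuple

def post_burnin_trees(n_generations: int, sample_freq: int, burnin_percent: int) -> Tuple[int, int, int]:
    """Return (total samples, burn-in generations, samples kept after burn-in) for one run."""
    total = n_generations // sample_freq + 1             # +1 for the sample at generation 0
    burnin_generations = n_generations * burnin_percent // 100
    burnin_samples = burnin_generations // sample_freq   # samples at generations 0 .. burn-in - 100
    return total, burnin_generations, total - burnin_samples

if __name__ == "__main__":
    total, burnin_gen, kept = post_burnin_trees(16_616_000, 100, 20)
    print(total, burnin_gen, kept)  # 166161 3323200 132929
    print("both runs:", 2 * kept)   # 265858
```

The result reproduces the 3,323,200 burn-in generations and the 132,929 post-burn-in trees per run stated in the text.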
Character state reconstructions were performed using maximum parsimony (MP; Additional File 5) and maximum likelihood criteria as implemented in Mesquite 2.71. Five thousand trees were drawn at random from the post-burn-in Bayesian sample of each MC³ run and combined. The character was coded as discrete, with the states multicellular or unicellular. The results over these 10,000 Bayesian trees were summarised and displayed on the consensus tree of the Bayesian analysis. For maximum likelihood estimates, both the "Markov k-state 1 parameter model" (MK1 model) and the "Asymmetrical Markov k-state 2 parameter model" (AsymmMK model) were applied. The MK1 model has a single parameter, the rate of change, which is shared by both directions of change. The AsymmMK model has two parameters, describing the forward and backward transition rates between states. Phylogenetic conservativeness of multicellularity was tested by comparing the distribution of parsimony steps observed across 10,000 randomly chosen trees from the Bayesian analysis against the distribution from 1,000 trees derived from the Bayesian consensus by randomly shuffling the terminal taxa, keeping the relative proportion of states unchanged. The root was assumed to be at equilibrium. Transition rates for the MK1 and AsymmMK models were estimated by the program; the rates for these models presented in Table 2 were estimated on the consensus tree. To explore properties of the data set, character states were additionally reconstructed with manually fixed transition rates (F1-F6; Table 2). The state of the outgroup was excluded from the analyses to avoid biasing inferences within the ingroup.
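The conservativeness test compares the number of parsimony steps a character needs on real trees with the number it needs after the tip states are shuffled. The Python sketch below illustrates the two ingredients on a single rooted binary tree: Fitch's algorithm for counting steps, and a null distribution from random permutations of the tip states (a permutation keeps the proportion of states unchanged). The nested-tuple tree format, the function names and the toy data are assumptions; Mesquite's implementation is not reproduced, and the study compared 10,000 Bayesian trees rather than a single tree.

```python
import random
from typing import Dict, List, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a tip name, or a pair of subtrees

def fitch(tree: Tree, states: Dict[str, int]) -> Tuple[Set[int], int]:
    """Fitch's algorithm: return (candidate state set at this node, steps in this subtree)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch(tree[0], states)
    right_set, right_steps = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_steps + right_steps
    # Disjoint sets: take the union and count one extra change.
    return left_set | right_set, left_steps + right_steps + 1

def parsimony_length(tree: Tree, states: Dict[str, int]) -> int:
    return fitch(tree, states)[1]

def tips(tree: Tree) -> List[str]:
    return [tree] if isinstance(tree, str) else tips(tree[0]) + tips(tree[1])

def shuffled_null(tree: Tree, states: Dict[str, int], n_reps: int, seed: int = 0) -> List[int]:
    """Parsimony lengths after randomly permuting tip states (state proportions preserved)."""
    rng = random.Random(seed)
    names = tips(tree)
    values = [states[n] for n in names]
    null = []
    for _ in range(n_reps):
        rng.shuffle(values)
        null.append(parsimony_length(tree, dict(zip(names, values))))
    return null

if __name__ == "__main__":
    # Hypothetical toy data: 1 = multicellular, 0 = unicellular.
    toy_tree = (("A", "B"), ("C", "D"))
    toy_states = {"A": 1, "B": 1, "C": 0, "D": 0}
    observed = parsimony_length(toy_tree, toy_states)
    null = shuffled_null(toy_tree, toy_states, 1000)
    print(observed, sum(x <= observed for x in null) / len(null))
```

If the character is phylogenetically conserved, the observed trees require fewer steps than most shuffled trees, so the fraction printed at the end is small.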
The character states of nodes 3, 4 and 5 of the Bayesian consensus tree were additionally estimated using a reversible-jump MCMC search as implemented in BayesTraits. The MCMC was run for 30 million iterations with a burn-in of 50,000 iterations. The analysis was repeated several times with the parameters of the evolutionary model drawn from different prior distributions, and the resulting models were compared using Bayes factors to determine which priors fitted the data best. A hyperprior approach, in which the means of the exponential priors were drawn from a uniform distribution between 0 and 10, fitted the data best. The results of the analysis were visualised in Tracer v1.5.
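Comparing prior settings with Bayes factors requires an estimate of each model's marginal likelihood. The text does not state which estimator was used. The harmonic mean of the sampled likelihoods was commonly reported for BayesTraits analyses at the time, so the Python sketch below assumes it, computed stably in log space. The estimator is known to be unstable, and the function names are hypothetical.

```python
import math
from typing import Sequence

def log_sum_exp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x))) for a sequence of log values."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_harmonic_mean_marginal(log_likelihoods: Sequence[float]) -> float:
    """Harmonic-mean estimate of the log marginal likelihood from post-burn-in log-likelihood samples."""
    n = len(log_likelihoods)
    # 1 / ML is approximated by mean(1 / L_i), so log ML = log n - logsumexp(-lnL_i).
    return math.log(n) - log_sum_exp([-x for x in log_likelihoods])

def two_log_bayes_factor(lnl_model1: Sequence[float], lnl_model0: Sequence[float]) -> float:
    """2 ln BF comparing model 1 with model 0 (positive values favour model 1)."""
    return 2.0 * (log_harmonic_mean_marginal(lnl_model1) - log_harmonic_mean_marginal(lnl_model0))
```

On the common interpretation scale, values of 2 ln BF above about 2 are read as positive evidence and values above about 10 as very strong evidence for the first model.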
We would like to thank Elena Conti, Brian R. Moore and Jurriaan M. de Vos for helpful comments on an earlier version of our manuscript. Furthermore, we would like to thank Marco Bernasconi, whose comments on the final version were of great help, and Jurriaan M. de Vos for help with the software BayesTraits.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.