- Research article
- Open Access
Signature proteins for the major clades of Cyanobacteria
© Gupta and Mathews; licensee BioMed Central Ltd. 2010
- Received: 27 April 2009
- Accepted: 25 January 2010
- Published: 25 January 2010
The phylogeny and taxonomy of cyanobacteria are currently poorly understood owing to a paucity of reliable markers for identifying and circumscribing their major clades.
A combination of phylogenomic and protein signature-based approaches was used to characterize the major clades of cyanobacteria. Phylogenetic trees were constructed for 44 cyanobacteria based on 44 conserved proteins. In parallel, Blastp searches were carried out on each ORF in the genomes of Synechococcus WH8102, Synechocystis PCC6803, Nostoc PCC7120, Synechococcus JA-3-3Ab, Prochlorococcus MIT9215 and Prochlor. marinus subsp. marinus CCMP1375 to identify proteins that are specific for the various main clades of cyanobacteria. These studies have identified 39 proteins that are specific for all (or most) cyanobacteria and large numbers of proteins for other cyanobacterial clades. The identified signature proteins include: (i) 14 proteins for a deep-branching clade (Clade A) consisting of Gloeobacter violaceus and two diazotrophic Synechococcus strains (JA-3-3Ab and JA2-3-B'a); (ii) 5 proteins that are present in all other cyanobacteria except those from Clade A; (iii) 60 proteins that are specific for a clade (Clade C) consisting of various marine unicellular cyanobacteria (viz. Synechococcus and Prochlorococcus); (iv) 14 and 19 signature proteins that are specific for the Clade C Synechococcus and Prochlorococcus strains, respectively; (v) 67 proteins that are specific for the Low B/A ecotype Prochlorococcus strains, which contain a lower ratio of chl b/a2 and are adapted to growth at high light intensities; (vi) 65 and 8 proteins that are specific for the Nostocales and Chroococcales orders, respectively; and (vii) 22 and 9 proteins that are uniquely shared by the Nostocales and Oscillatoriales orders, or by these two orders and the Chroococcales, respectively. We also describe 3 conserved indels in flavoprotein, heme oxygenase and protochlorophyllide oxidoreductase proteins that are specific for either Clade C cyanobacteria or various subclades of Prochlorococcus. Many other conserved indels for cyanobacterial clades have been described recently.
These signature proteins and indels provide novel means for circumscribing the various cyanobacterial clades in clear molecular terms. Functional studies of these markers should lead to the discovery of novel properties that are unique to these groups of cyanobacteria.
- Lateral Gene Transfer
- Main Clade
- Cyanobacterial Genome
- Protochlorophyllide Oxidoreductase
- Nostoc PCC7120
Cyanobacteria are the sole prokaryotic group that carries out oxygenic photosynthesis. The species from this phylum exhibit enormous diversity in their morphology, physiology and other characteristics (e.g. motility, thermophily, cell division characteristics, nitrogen fixation ability, etc.) [1–5]. The taxonomy and evolutionary relationships among cyanobacteria are presently poorly understood. In 16S rRNA trees, which provide the current basis for understanding microbial phylogeny, cyanobacterial species/strains form 14 unresolved clusters. Although Cyanobacteria is a large phylum with >4000 isolates, only a small number of species and higher taxonomic groups within this phylum have been validly described [8–10]. Until recently, apart from 16S rRNA, little sequence information was available for other cyanobacterial genes/proteins. Hence, the availability of genome sequences has provided new opportunities for understanding cyanobacterial phylogeny and taxonomy. Using these sequences, several investigators have constructed phylogenetic trees for cyanobacteria from the concatenated sequences of different large sets of proteins. These studies have included analyses of 14 cyanobacteria based on 34 proteins by Sanchez-Baracaldo et al., trees for 24 cyanobacteria based on 583 orthologous proteins by Swingley et al., and branching patterns of 13 cyanobacteria based on 682 proteins by Shi and Falkowski. Additionally, Zhaxybayeva et al. examined individual phylogenies of 1128 protein-coding genes from 11 cyanobacterial genomes to identify the phylogenetic signal exhibited by the plurality of these proteins and to recognize the incidence of lateral gene transfer. These studies have proven very useful in establishing the existence of certain important clades within the sequenced cyanobacteria and in clarifying their relative branching positions [4, 11, 12].
Studies of this kind, although very useful, are limited to species whose genomes have been sequenced. Further, as indicated by earlier work [4, 11, 12], integrating sequence information from any new genome by this approach requires reassembly of the entire phylogenomic tree(s). With the phylogenomic approach it is also difficult to circumscribe the various cyanobacterial clades in definitive biochemical or molecular terms, which is important for developing a stable taxonomy [14–16]. Hence, it is important to identify other reliable molecular markers that are consistent with the results of phylogenomic studies, but which can also be used to circumscribe different phylogenetic clades in more definitive (molecular) terms. One approach that has proven very useful in this regard consists of identifying molecular markers or synapomorphies that are specific for different phylogenetically defined clades. Two different kinds of molecular markers are proving very useful for these studies. The first consists of conserved inserts and deletions (indels) in widely distributed proteins that are distinctive characteristics of either a given phylum or its different main subgroups [17–21]. Our recent work has identified >40 conserved indels in important proteins that are exclusively present in either all cyanobacteria or many of the major clades observed in phylogenomic trees [22, 23]. The presence of several of these indels in the plant/plastid homologs has also provided evidence for the derivation of plastids from cyanobacterial ancestors [22–24]. The second kind of molecular marker consists of whole proteins that are uniquely found in the species from a given phylogenetic clade [25–28]. Martin et al. earlier reported a Blast analysis of 8 cyanobacterial genomes (6 finished and 2 unfinished) that identified 181 proteins uniquely found in at least 7 of these 8 cyanobacteria. A later study by Mulkidjanian et al. on 15 cyanobacterial genomes identified 50 proteins that were uniquely present in at least 14 of the 15 cyanobacteria and 84 others that were exclusively present in plants/plastids and cyanobacteria.
These earlier studies primarily looked for proteins that were uniquely found in most cyanobacteria; no work was carried out on identifying proteins that are specific for the various main clades of cyanobacteria observed in phylogenetic trees. In the past 2-3 years, the number of sequenced cyanobacterial genomes has also more than doubled, to a total of 36 genomes. Hence, it was of much interest to carry out both phylogenomic and gene content analyses on these genomes to identify signature proteins that are distinctive characteristics of either all cyanobacteria or their various main clades in the phylogenomic trees.
Phylogenomic/phylogenetic analyses on Cyanobacteria
Table 1. List of cyanobacterial genomes studied in this work
- Acaryochloris marina MBIC11017
- Anabaena variabilis ATCC 29413
- Gloeobacter violaceus PCC 7421
- Cyanothece sp. ATCC 51142
- Cyanothece sp. PCC 8801
- Nostoc sp. PCC 7120
- Microcystis aeruginosa NIES-843
- Nostoc punctiforme PCC 73102
- Prochlorococcus marinus str. AS9601 (J. Craig Venter Institute)
- Prochlorococcus marinus str. MIT 9211
- Prochlorococcus marinus str. MIT 9215
- Prochlorococcus marinus str. MIT 9301
- Prochlorococcus marinus str. MIT 9303 (J. Craig Venter Institute)
- Prochlorococcus marinus str. MIT 9312
- Prochlorococcus marinus str. MIT 9313
- Prochlorococcus marinus str. MIT 9515 (J. Craig Venter Institute)
- Prochlorococcus marinus str. NATL1A (J. Craig Venter Institute)
- Prochlorococcus marinus str. NATL2A (DOE Joint Genome Institute)
- Prochlorococcus marinus subsp. marinus str. CCMP1375
- Prochlorococcus marinus subsp. pastoris str. CCMP1986
- Synechococcus elongatus PCC 6301
- Synechococcus elongatus PCC 7942
- Synechococcus sp. CC9311
- Synechococcus sp. CC9605
- Synechococcus sp. CC9902
- Synechococcus sp. JA-2-3B'a(2-13)
- Synechococcus sp. JA-3-3Ab
- Synechococcus sp. RCC307
- Synechococcus sp. WH7803
- Synechococcus sp. PCC 7002 (Penn. State University)
- Synechococcus sp. WH8102
- Synechocystis sp. PCC 6803
- Thermosynechococcus elongatus BP-1
- Trichodesmium erythraeum IMS101 (DOE Joint Genome Institute)
Most other cyanobacteria could be grouped into two main clades in these trees. One of these clades (designated here as Clade B) comprises diverse cyanobacteria, including Thermosynechococcus and Acaryochloris, as well as other cyanobacterial groups such as the Chroococcales (Synechocystis/Crocosphaera/Microcystis/Cyanothece), Nostocales (Nostoc/Nodularia/Anabaena) and Oscillatoriales (Trichodesmium/Lyngbya). Within Clade B, a subclade comprising the Chroococcales, Nostocales and Oscillatoriales is also observed in both ML and NJ trees (Fig. 1 and additional file 2). The other main clade (Clade C) is composed entirely of different strains/isolates of marine unicellular Prochlorococcus and Synechococcus cyanobacteria. This latter clade has been referred to as the Syn/Pro clade, and it corresponds to the subclass Synechococcophycidae in the proposal by Hoffman et al. Within Clade C, the Prochlorococcus and Synechococcus strains/isolates were not completely separated from each other. In particular, two of the Prochlorococcus strains, MIT 9303 and MIT 9313, branched within the Synechococcus strains/isolates in both ML and NJ trees (Fig. 1 and additional file 2). Similar polyphyletic branching of these strains has been observed in earlier studies [12, 23]. However, in both trees, one subclade of Prochlorococcus strains, referred to as the low B/A ecotype subgroup [40, 41], was separated from all other Prochlorococcus strains by a long branch. The branching position of the freshwater unicellular cyanobacterium Synechococcus elongatus (strains PCC 6301 and PCC 7942) was uncertain in these trees, although it appeared as a deep-branching lineage of Clade C (discussed later).
Signature proteins for Cyanobacteria and its major subgroups
These phylogenetic trees provide a framework for identifying proteins that are specific for either all cyanobacteria or their different well-resolved clades. Earlier studies have shown that, within any given group of bacteria or other organisms, signature proteins are present at various phylogenetic depths [25, 27, 28, 42–44]. Hence, to identify proteins that are specific for the different main clades of cyanobacteria, Blastp searches were carried out on each ORF in the genomes of the following 6 cyanobacteria: Synechococcus sp. WH8102, Synechocystis sp. PCC6803, Nostoc sp. PCC7120, Synechococcus sp. JA-3-3Ab, Prochlorococcus sp. MIT9215 and Pro. marinus subsp. marinus str. CCMP1375. These cyanobacteria are located at the tips of various clades in the phylogenetic trees (Fig. 1 and additional file 2). Hence, Blast searches with their proteins should identify proteins that are specific for the various main clades of cyanobacteria at different phylogenetic depths. The results of these studies are summarized below.
Signature proteins that are specific for Cyanobacteria
Cyanobacterial Signature Proteins
(a) Proteins that are Uniquely found in All (or most) Cyanobacteria
(b) Proteins Specific for Various Cyanobacteria Except those from Clade A
(c) Proteins Specific for Various Cyanobacteria Except those from Clade C
Signature proteins for the Clade B cyanobacteria
Clade B comprises the majority of known cyanobacteria, excluding the unicellular marine cyanobacteria (Clade C) and some deep-branching cyanobacteria (see Fig. 1). This clade, as defined in our work, includes all of the species/strains from the orders Chroococcales, Nostocales and Oscillatoriales, as well as the deeper branching cyanobacteria A. marina and Thermosyn. elongatus. Of these latter cyanobacteria, Acaryochloris is unique in containing chlorophyll d as its primary photosynthetic pigment, whereas Thermosynechococcus is a unicellular thermophilic cyanobacterium. Our analyses have identified 38 proteins that are uniquely shared by all or most of the species/strains from this clade. Two Synechococcus strains, viz. PCC7002 and PCC7335, also consistently appeared in this group; of these, Synechococcus PCC7002, for which a genome sequence was available, branched with the Chroococcales in phylogenetic trees (Fig. 1 and additional file 2).
Proteins Specific for Clade B Cyanobacteria
(a) Proteins that are Uniquely found in All (or most) Clade B Cyanobacteria
photosystem II reaction center
general secretion pathway protein (207)
(b) Proteins Specific for Clade B Cyanobacteria and also Synechococcus elongatus
Proteins Specific for Different Groups within Clade B Cyanobacteria
(a) Proteins Specific for Nostocales, Oscillatoriales and Chroococcales (NOC) Orders
(b) Proteins Specific for Nostocales and Oscillatoriales Orders
filament integrity protein (179)
(c) Proteins specific for Chroococcales
cytochrome b6-f complex subunit (36)
(d) Proteins specific for Nostocales +
Within Clade B, the heterocyst-forming cyanobacteria form a monophyletic group (subclass Nostocophycidae) [6, 10, 47, 50]. We recently described two conserved indels (a 4 aa insert in the PetA protein, a precursor of apocytochrome f, and a 5 aa insert in ribosomal protein S3) that are specific for these bacteria. In the present work, Blast searches on the genome of Nostoc sp. PCC7120 have identified 65 proteins that are uniquely shared by all of the sequenced Nostocales species/strains (Nostoc, Anabaena and Nodularia) (Table 4d and additional file 5). Fifty-eight additional proteins listed in additional file 5 are also specific for this order, but they are missing in 1-2 species/strains. These proteins provide potential molecular signatures for the order Nostocales (subclass Nostocophycidae).
Cyanobacteria such as Synechocystis, Microcystis, Crocosphaera and Cyanothece, belonging to the order Chroococcales, form another well-defined clade in phylogenetic trees (see Fig. 1 and additional file 2) [4, 11, 12, 37, 47]. A 1 aa insert in a highly conserved region of the RecA protein is also specific for these cyanobacteria. This insert is also present in Synechococcus sp. PCC7002, which branches with this clade in phylogenetic trees (see Fig. 1 and additional file 2) [4, 47]. In this work, we have identified 8 proteins that are uniquely present in the various sequenced Chroococcales species/strains (Table 4c). The evolutionary stages at which the genes for these proteins likely evolved are indicated in the interpretive diagram (Fig. 2).
Signature proteins for the Clade C Cyanobacteria
Proteins Specific for the Clade C Cyanobacteria (Synechococcus/Prochlorococcus)
predicted membrane protein (87)
predicted membrane protein (203)
predicted membrane protein (128)
dihydroneopterin aldolase (127)
predicted protein family PM-1 (67)
Predicted protein with signal (144)
type II secretion system (149)
TIR domain-containing protein (82)
possible Pollen allergen (139)
(b) Proteins Specific for Clade C which are missing in Low B/A ecotype Prochlorococcus strains #
Predicted protein family PM-3 (178)
Predicted protein family PM-3 (195)
predicted protein family PM-3 (167)
Predicted dehydrogenase (273)
As noted earlier, the branching position of Syn. elongatus is not resolved in phylogenetic trees. In our analyses, we found only 3 proteins (marked with + in Table 5a) that are uniquely found in Clade C species/strains as well as Syn. elongatus. By contrast, 22 proteins are uniquely shared by Clade B cyanobacteria and Syn. elongatus (Table 3b). These observations, together with the unique presence of split DnaE genes in Clade B cyanobacteria and Syn. elongatus, make a strong case that Syn. elongatus is more closely related to the Clade B cyanobacteria than to the Clade C species/strains.
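The comparison above amounts to tallying, for a taxon of uncertain position, how many signature proteins it shares exclusively with each candidate clade. The following is a minimal Python sketch under stated assumptions: the input format (a mapping from a hypothetical protein ID to the set of groups in which hits were found) and the function name `exclusive_sharing` are illustrative only, and the toy profiles merely reproduce the 22 and 3 counts reported in the text.

```python
from collections import Counter


def exclusive_sharing(profiles: dict[str, frozenset[str]], query: str) -> Counter:
    """For each group, count proteins found only in that group and the query taxon.

    `profiles` maps a (hypothetical) protein ID to the set of groups in which
    Blast hits were found; the query taxon is treated as one extra "group".
    """
    counts: Counter = Counter()
    for groups in profiles.values():
        if query not in groups:
            continue
        partners = groups - {query}
        if len(partners) == 1:  # shared exclusively with exactly one group
            counts[next(iter(partners))] += 1
    return counts


# Illustrative profiles reproducing the counts reported for Syn. elongatus.
profiles = {f"B{i}": frozenset({"Clade B", "Syn. elongatus"}) for i in range(22)}
profiles.update({f"C{i}": frozenset({"Clade C", "Syn. elongatus"}) for i in range(3)})

counts = exclusive_sharing(profiles, "Syn. elongatus")
print(counts.most_common())  # [('Clade B', 22), ('Clade C', 3)]
```

On its own, such a tally is only suggestive; in the text it is combined with an independent character (the split DnaE genes) before drawing a conclusion.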
Proteins specific for the Main Groups of Clade C Cyanobacteria
(a) Proteins Specific for the Clade C cyanobacteria+ except Prochlorococcus
(b) Proteins Specific for Prochlorococcus
If Prochlorococcus strains/isolates form a monophyletic lineage, then one would expect that the other cyanobacteria that are part of Clade C might also share many unique proteins. Indeed, our Blast searches have identified 14 proteins that are uniquely present in various other cyanobacteria (mostly Synechococcus strains) that are part of Clade C (Table 6a). It should be mentioned that for several of these proteins, Blast hits indicating significant similarity are also found for Cyanobium sp. PCC7001 and Paulinella chromatophora, indicating that these taxa are also part of Clade C. The grouping of Cyanobium sp. PCC7001 with Clade C is also supported by the conserved indel in the flavoprotein (see Fig. 3).
Earlier studies have divided Prochlorococcus strains/isolates into two physiologically distinct groups (high B/A and low B/A ecotypes), based on the ratio of chlorophyll b to a2 in their light-harvesting systems and their ability to grow at different light intensities [40, 41, 56]. Strains from the high B/A ecotype, which have a higher chlorophyll b/a2 ratio, are able to grow at extremely low irradiance, whereas those from the low B/A ecotype, which have a lower chlorophyll b/a2 ratio, are unable to grow under these conditions. The low B/A ecotype strains are instead adapted to growth at high light intensities, where the growth of high B/A ecotype strains is inhibited. The strains from these two ecotypes also differ in their sensitivity to copper and their ability to use nitrite or nitrate as nitrogen sources [41, 57]. In phylogenetic trees, the low B/A ecotype Prochlorococcus isolates (viz. MIT9515, CCMP1986, MIT9312, MIT9215, MIT9301 and AS9601) formed a distinct subclade that was well separated from all other Clade C species/strains by a long branch with 100% bootstrap support (Fig. 1 and additional file 2) [23, 41]. We have also described two conserved indels (viz. a 5 aa deletion in leucyl-tRNA synthetase and a 1 aa insert in the Ffh protein) that are uniquely shared by all of the low B/A ecotype Prochlorococcus strains. In the present work, we have identified 67 proteins that are exclusively found in all of the sequenced strains from the low B/A ecotype clade (additional file 7a). Seventy-two proteins listed in additional file 7b are also specific for this clade, but they are missing in 1-2 of the strains/isolates. These signature proteins and indels, together with the distinct branching of the low B/A strains in phylogenetic trees, provide strong evidence that this group of Prochlorococcus strains is phylogenetically, physiologically and molecularly distinct from all other Prochlorococcus strains.
Based on the species distribution patterns of the various cyanobacteria-specific proteins, the evolutionary stages at which the genes for these proteins likely evolved are indicated in the interpretive diagram in Fig. 2.
In this work, we have used a combination of phylogenomic and signature protein-based approaches to elucidate the evolutionary relationships among cyanobacteria. Phylogenetic trees were initially constructed for 44 cyanobacteria based on the concatenated sequences of 44 widely distributed proteins. The branching pattern of cyanobacteria in these trees was very similar to that observed in other recent studies based on different large sets of proteins for smaller numbers of cyanobacteria [4, 11, 12]. In all of these trees, a number of distinct clades of cyanobacteria are consistently observed. However, the main focus of the present work was on comparative analyses of cyanobacterial genomes to identify unique sets of genes/proteins that are limited to particular groups of cyanobacteria, corresponding to the various phylogenetically identified clades. This work complements our recent studies, in which a comparative genomic approach was employed to identify >40 conserved indels in widely distributed proteins that are also specific for the same groups/clades of cyanobacteria.
Recent analyses of genomic sequences have revealed that whole proteins limited to different monophyletic clades are present at different phylogenetic depths [26–28, 43, 44, 58, 59]. Unlike ORFan proteins, which are unique to a given species or strain and are subject to rapid gene loss [44, 60, 61], these lineage-specific proteins are retained in a conserved state by all or most species/strains from a given clade, suggesting that they confer a selective advantage on species from these clades [28, 58, 62]. Although the mechanism responsible for the evolution or acquisition of the genes for these proteins is unclear [28, 61], their specific presence in different clades indicates that these genes first evolved (or were introduced) in a common ancestor of each clade and were then retained by its various descendants. Because of their clade specificity, these lineage-specific proteins, or conserved signature proteins (CSPs), provide valuable molecular markers for these clades [26–28, 43, 59]. Our recent analyses of CSPs from several major groups of bacteria (viz. alpha proteobacteria, epsilon proteobacteria, gamma proteobacteria, chlamydiae, Bacteroidetes-Chlorobi and Actinobacteria) provide evidence that the species distribution of most of these CSPs shows a high degree of concordance with different clades in phylogenetic trees [25–27, 42, 63, 64]. This inference is strongly reinforced by the results of the present study, where most of the identified CSPs correspond to well-defined clades in the phylogenetic trees.
It should be mentioned that in our analyses we have not come across significant numbers of CSPs that support alternative groupings, i.e. proteins commonly shared by species/strains from clades that are phylogenetically unrelated (e.g. Nostocales and Clade C, or Oscillatoriales and Clade C). However, one commonly observed pattern is that if two clades are close to each other in phylogenetic trees, but their branching is not clearly resolved (i.e. weakly supported by bootstrap scores), then in addition to many proteins that are unique to each of these two clades, several proteins that are shared by both are also observed. This could be either because the genes for many of these proteins evolved in a common ancestor of these clades before they became phylogenetically distinct, or because of lateral gene transfers among closely related taxa [13, 65]. Nevertheless, our finding that most of these proteins are distinctive characteristics of phylogenetically well-defined monophyletic clades strongly suggests that their species distribution has not been significantly affected by lateral gene transfer, which has been suggested to be very common in cyanobacteria [13, 66].
When a protein is confined to a certain group of species/strains, it is difficult to determine from this information alone whether the group of species containing this protein forms a clade in the phylogenetic sense. To properly evaluate the results of such studies, it is necessary to carry them out in conjunction with phylogenetic and other forms of analyses (e.g. studies based on conserved indels), which can establish a rooted relationship among the different groups or taxa under consideration [23, 26, 59]. Based on these studies, if a given protein is uniquely found in all or most of the species from a well-defined monophyletic clade, and generally nowhere else, then the simplest and most parsimonious explanation is that the gene for this protein first appeared in a common ancestor of this group and was then passed on vertically to its various descendants [17, 20, 67]. We have interpreted the species distribution of the various unique proteins based on this minimal assumption. Under this interpretation, the identified signature proteins or CSPs can be regarded as molecular synapomorphies that are specific for different clades of cyanobacteria.
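The parsimony criterion above can be made concrete: on a rooted tree, a protein's distribution points to a single origin when the smallest clade containing every taxon that has the protein is fully (or almost fully) occupied. A minimal Python sketch, assuming a hypothetical and heavily simplified topology written as nested tuples (not the published tree); `max_missing` is an illustrative tolerance for the "all or most" wording, and all function names are ours.

```python
from typing import Union

# A rooted tree written as nested tuples; leaves are taxon names.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> frozenset[str]:
    """All taxa below a node."""
    if isinstance(tree, str):
        return frozenset({tree})
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree: Tree) -> list[frozenset[str]]:
    """Leaf sets of every internal node, i.e. every clade in the rooted tree."""
    if isinstance(tree, str):
        return []
    found = [leaves(tree)]
    for child in tree:
        found.extend(clades(child))
    return found


def signature_clade(tree: Tree, present: frozenset[str], max_missing: int = 0):
    """Return the smallest clade containing every taxon that has the protein,
    provided at most `max_missing` members of that clade lack it; else None."""
    containing = [c for c in clades(tree) if present <= c]
    if not present or not containing:
        return None
    smallest = min(containing, key=len)  # clades containing `present` are nested
    return smallest if len(smallest - present) <= max_missing else None


# Hypothetical toy topology: Clade A deepest, then Clades B and C.
tree = (
    ("Gloeobacter", "JA-3-3Ab"),                      # Clade A
    (
        (("Nostoc", "Anabaena"), "Synechocystis"),    # Clade B (simplified)
        ("WH8102", ("MIT9215", "MIT9515")),           # Clade C (simplified)
    ),
)

print(sorted(signature_clade(tree, frozenset({"MIT9215", "MIT9515"}))))  # ['MIT9215', 'MIT9515']
print(signature_clade(tree, frozenset({"Nostoc", "WH8102"})))  # None
print(sorted(signature_clade(tree, frozenset({"WH8102", "MIT9215"}), max_missing=1)))  # ['MIT9215', 'MIT9515', 'WH8102']
```

The second call shows the kind of "alternative grouping" discussed earlier: a protein shared by a Nostocales member and a Clade C member maps to no compact clade and would not be treated as a synapomorphy.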
The branching order and interrelationships among cyanobacteria that emerge from all of these different approaches are shown in Fig. 2. All of these approaches indicate that a clade consisting of Gloeobacter and the Synechococcus strains JA-3-3Ab and JA2-3-B'a (Clade A) forms the deepest branching lineage within cyanobacteria. A large number of sequenced cyanobacteria correspond to marine unicellular Synechococcus and Prochlorococcus strains (Clade C). We have identified numerous proteins and conserved indels that are specific for this clade. Although Synechococcus and Prochlorococcus strains do not form monophyletic clusters in phylogenetic trees, the shared presence of many novel proteins as well as some conserved indels in the various Prochlorococcus strains provides evidence that this group is monophyletic. The unique pigments found in the light-harvesting system of Prochlorococcus also support its distinctness from other cyanobacteria. The monophyletic grouping of marine unicellular Synechococcus strains/isolates based on these molecular and biochemical characteristics is at variance with their polyphyletic branching in different phylogenetic trees (see Fig. 1, additional file 2) [4, 11, 23]. This discordance could be explained either by lateral transfer of the genes responsible for these characteristics [11, 13, 33, 68] or by the inability of phylogenetic trees to resolve the branching order among closely related species/strains. Among the Prochlorococcus strains, our analyses confirm that the strains corresponding to the low B/A ecotype are distinct not only in physiological and phylogenetic terms [40, 41, 56], but also in sharing large numbers of proteins that are unique to them. Several conserved indels that are specific for the low B/A ecotype clade have also been identified. A recent study by Zhaxybayeva et al. also provides evidence that the high-light adapted low B/A ecotype Prochlorococcus strains form a monophyletic clade, in contrast to the paraphyletic grouping of the low-light adapted (i.e. high B/A ecotype) Prochlorococcus spp. All of these observations make a strong case for recognizing the low B/A ecotype Prochlorococcus strains as a distinct taxonomic entity.
Within Clade B, many CSPs were identified that are specific for the Nostocales and Chroococcales orders. In addition, several other CSPs are uniquely present in the Nostocales and Oscillatoriales orders, or in the Nostocales, Oscillatoriales and Chroococcales. In recent work, a number of conserved indels that are unique to these orders of cyanobacteria have also been identified. Although the clade comprising these cyanobacterial orders is not clearly resolved in phylogenetic trees [4, 11], the shared presence of large numbers of novel CSPs as well as some conserved indels in these cyanobacteria strongly suggests that the species/strains from these groups shared a common ancestor exclusive of other cyanobacteria, and that this clade represents a deeper branching grouping within cyanobacteria. The results presented here also suggest that Syn. elongatus is more closely related to Clade B than to either Clade A or Clade C.
The signature proteins and conserved indels for different cyanobacterial clades described in this work and in our recent studies provide novel and powerful means for understanding cyanobacterial phylogeny and taxonomy. Based on these molecular markers, all of the main clades of cyanobacteria can now be identified and circumscribed in molecular terms. These signature proteins and indels should also prove useful for identifying cyanobacterial species/strains and assigning them to specific clades based on the presence or absence of the various signature indels or CSPs. Because many of these CSPs, or the proteins containing the conserved indels, are highly conserved, degenerate PCR primers could readily be designed to sequence the corresponding genes from any given cyanobacterium. The assignment of a species/strain to a given clade by this approach rests on several independent signatures that provide complementary information. Some of these signatures serve to exclude a given species/strain from particular groups or clades, whereas others point to its inclusion in progressively more specific clades. Blast searches with these cyanobacteria-specific CSPs should also prove useful in determining the presence or absence of different groups of cyanobacteria in metagenomic sequences.
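The stepwise assignment described above, where each level of signatures either admits a strain into a more specific clade or stops the descent, can be sketched as a walk down a clade hierarchy. This is a minimal Python sketch, not the authors' procedure: the hierarchy is simplified from the clades named in this work, the marker IDs (`cyano1`, `nonA1`, ...) are hypothetical placeholders for CSPs or indels, and the 0.5 acceptance `threshold` is an arbitrary assumption.

```python
def fraction_present(markers: set[str], observed: set[str]) -> float:
    """Share of a clade's signature markers detected in the strain."""
    return len(markers & observed) / len(markers) if markers else 0.0


def assign_clade(hierarchy: dict[str, list[str]],
                 signatures: dict[str, set[str]],
                 observed: set[str],
                 root: str = "Cyanobacteria",
                 threshold: float = 0.5) -> list[str]:
    """Descend the clade hierarchy while exactly one child clade is supported.

    Returns the path of accepted clades; stops when no child (or more than
    one child) has at least `threshold` of its markers present.
    """
    if fraction_present(signatures[root], observed) < threshold:
        return []
    path = [root]
    while True:
        supported = [child for child in hierarchy.get(path[-1], [])
                     if fraction_present(signatures[child], observed) >= threshold]
        if len(supported) != 1:
            return path
        path.append(supported[0])


# Hypothetical hierarchy; "non-A" stands for the markers shared by all
# cyanobacteria except Clade A.
hierarchy = {
    "Cyanobacteria": ["Clade A", "non-A"],
    "non-A": ["Clade B", "Clade C"],
    "Clade C": ["Prochlorococcus", "Clade C Synechococcus"],
    "Prochlorococcus": ["Low B/A ecotype"],
}
signatures = {
    "Cyanobacteria": {"cyano1", "cyano2"},
    "Clade A": {"A1", "A2"},
    "non-A": {"nonA1", "nonA2"},
    "Clade B": {"B1", "B2"},
    "Clade C": {"C1", "C2"},
    "Prochlorococcus": {"Pro1", "Pro2"},
    "Clade C Synechococcus": {"SynC1", "SynC2"},
    "Low B/A ecotype": {"LBA1", "LBA2"},
}

strain = {"cyano1", "cyano2", "nonA1", "C1", "C2", "Pro1", "LBA1"}
print(assign_clade(hierarchy, signatures, strain))
# ['Cyanobacteria', 'non-A', 'Clade C', 'Prochlorococcus', 'Low B/A ecotype']
```

Stopping when two sibling clades are both supported is one conservative choice; it leaves a strain at the last unambiguous level rather than forcing a call.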
Most of the cyanobacterial signature proteins identified in this work are of unknown function. However, the retention of these genes by all cyanobacteria from the indicated clades strongly suggests that these proteins perform important functions in these groups of cyanobacteria [70–72]. Likewise, our recent work shows that the conserved indels in protein sequences are also essential for the groups or clades of species in which they are found. Hence, further work on understanding the cellular functions of these cyanobacterial signature proteins and signature indels should be of great interest. Such studies should provide valuable insights into the biochemical and physiological characteristics that are unique to different clades of cyanobacteria [64, 74–76].
Phylogenetic analyses were carried out on a set of 44 proteins involved in important housekeeping functions that are present in most organisms (see Additional file 1). Blast searches with these proteins revealed that their homologs were present in all 34 sequenced cyanobacterial genomes (listed in Table 1), the two outgroup species (Bacillus subtilis and Staphylococcus aureus), as well as 10 other cyanobacteria (viz. Crocosphaera watsonii WH8501, Cyanothece sp. CCY0110, Lyngbya sp. PCC8106, Microcystis aeruginosa PCC7806, Nodularia spumigena CCY9414, Synechococcus sp. WH5701, Synechococcus sp. BL107, Synechococcus sp. RS9917, Synechococcus sp. RS9916 and Synechococcus sp. WH7805). Hence, sequence information for all of these cyanobacteria was included in our analyses. Multiple sequence alignments for these proteins were created using the ClustalX 1.83 program and concatenated into a single large file. This unedited sequence alignment was imported into the Gblocks 0.91b program to remove poorly aligned regions. This program was used with default settings, except that the "allowed gap positions" parameter was set to "with half". The resulting final alignment of 16834 amino acid sites was used for phylogenetic analyses. A neighbour-joining (NJ) tree based on 1000 bootstrap replicates was constructed with the Kimura model using the TREECON 1.3b program. The maximum-likelihood (ML) analysis was carried out with the TREE-PUZZLE program using the WAG+F model, a gamma distribution of evolutionary rates with four categories, and 10000 puzzling steps.
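Two steps of this pipeline, concatenating per-protein alignments into one supermatrix and computing the pairwise distances that feed an NJ tree, can be sketched briefly. This is a minimal Python sketch under stated assumptions: ClustalX alignment, Gblocks trimming and TREECON's tree building are not reproduced; the toy alignments are hypothetical; and we assume that the "Kimura model" for proteins refers to Kimura's (1983) empirical correction d = -ln(1 - p - 0.2p²), where p is the proportion of differing sites (gap sites skipped here).

```python
import math


def concatenate(alignments: list[dict[str, str]]) -> dict[str, str]:
    """Join per-protein alignments (same taxa in each) into one supermatrix."""
    taxa = set(alignments[0])
    for aln in alignments:
        if set(aln) != taxa:
            raise ValueError("every alignment must contain the same taxa")
        if len({len(seq) for seq in aln.values()}) != 1:
            raise ValueError("sequences within an alignment must be equal length")
    return {t: "".join(aln[t] for aln in alignments) for t in sorted(taxa)}


def kimura_protein_distance(a: str, b: str) -> float:
    """Kimura (1983) protein distance d = -ln(1 - p - 0.2 p^2), gap sites skipped."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no ungapped sites to compare")
    p = sum(x != y for x, y in sites) / len(sites)
    arg = 1.0 - p - 0.2 * p * p
    return -math.log(arg) if arg > 0 else math.inf  # undefined for very divergent pairs


alignments = [
    {"X": "MKV-L", "Y": "MKIAL"},   # toy protein 1 (hypothetical)
    {"X": "GG", "Y": "GA"},         # toy protein 2 (hypothetical)
]
supermatrix = concatenate(alignments)
print(supermatrix)  # {'X': 'MKV-LGG', 'Y': 'MKIALGA'}
print(round(kimura_protein_distance(supermatrix["X"], supermatrix["Y"]), 4))  # 0.4394
```

In the toy example, 2 of the 6 ungapped sites differ (p = 1/3), so d = -ln(29/45). The correction term grows quickly with p, which is why saturated pairs return infinity instead of a misleading finite value.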
Identification of proteins and conserved indels that are specific for Cyanobacteria
Blastp searches were carried out on each ORF in the genomes of Synechococcus sp. WH8102, Synechocystis sp. PCC6803, Nostoc sp. PCC7120, Synechococcus sp. JA-3-3Ab, Prochlorococcus sp. MIT9215 and Prochlorococcus marinus subsp. marinus str. CCMP1375 to identify proteins that are uniquely present in the various clades of cyanobacteria seen in the phylogenetic trees (Fig. 1). The searches were performed against all organisms (i.e. the non-redundant (nr) database) using default parameters, without the low-complexity filter. Proteins of interest were those for which either all significant hits were from the indicated group of cyanobacteria, or there was a large increase in E value from the last hit belonging to the clade to the first hit from any other bacteria/cyanobacteria, with the latter hits having E values >1e-04, indicating weak similarity that could occur by chance. Higher E values are often significant for smaller proteins, as the magnitude of the E value depends on the length of the query sequence. Hence, the lengths of the query proteins and of the various hits were also taken into consideration when analyzing the results. In most cases, the lengths of the significant hits were very similar to those of the query proteins. Some proteins that, in addition to cyanobacteria, were also found in plants/plastids or in an isolated species from some other group (noted appropriately) were also retained. Proteins that were uniquely found in a single species or strain were not examined in this work. For all cyanobacterial proteins that are specific for the various clades or subgroups, their accession numbers, any information regarding cellular functions, and protein lengths were tabulated and are presented. Identification of new conserved indels that are specific for cyanobacterial clades was carried out as described in our earlier work [22, 23].
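Why length matters: under Karlin-Altschul statistics, a hit with raw score S has expectation E = K m n e^(-λS), where m is the query length and n the database size. Because S grows with the length of the aligned region, a short protein that is similar along its whole length can still receive a comparatively high E value. The selection rule itself can be sketched as a filter over one search result. This is a minimal Python sketch, assuming hits have already been parsed into (E-value, in-clade) pairs: the 1e-04 cutoff comes from the text, whereas `min_log10_gap` is a hypothetical stand-in for the unquantified "large increase in E values", and the length comparison is left out.

```python
import math
from typing import Optional


def is_clade_specific(hits: list[tuple[float, bool]],
                      outgroup_cutoff: float = 1e-4,
                      min_log10_gap: Optional[float] = None) -> bool:
    """Screen one Blastp result for clade specificity.

    `hits` holds (E-value, hit_is_in_target_clade) pairs. The query passes if it
    has in-clade hits and every out-of-clade hit is weak (E > outgroup_cutoff);
    if `min_log10_gap` is set, the first out-of-clade hit must also be at least
    that many orders of magnitude worse than the last in-clade hit.
    """
    in_clade = [e for e, inside in hits if inside]
    outside = [e for e, inside in hits if not inside]
    if not in_clade:
        return False
    if not outside:
        return True  # all hits come from the target clade
    first_out = min(outside)
    if first_out <= outgroup_cutoff:
        return False  # a significant hit outside the clade
    if min_log10_gap is None:
        return True
    last_in = max(max(in_clade), 1e-300)  # Blast may report E = 0.0; avoid log10(0)
    return math.log10(first_out) - math.log10(last_in) >= min_log10_gap


hits = [(0.0, True), (1e-95, True), (2e-40, True), (0.3, False), (1.2, False)]
print(is_clade_specific(hits))                    # True
print(is_clade_specific(hits, min_log10_gap=10))  # True
print(is_clade_specific([(1e-50, True), (1e-20, False)]))  # False
```

A filter like this only shortlists candidates; the text makes clear that hit lengths, plant/plastid homologs and isolated outside hits were still judged case by case.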
This work was supported by a research grant from the Natural Sciences and Engineering Research Council of Canada. We thank Kenneth Ng and Amy Mok for assistance in carrying out some earlier BLAST searches on the cyanobacterial genomes.
- Rippka R, Deruelles J, Waterbury JB, Herdman M, Stanier RY: Generic assignments, strain histories and properties of pure cultures of cyanobacteria. J Gen Microbiol. 1979, 111: 1-61.
- Kondratieva EN, Pfennig N, Truper HG: The Phototrophic Prokaryotes. The Prokaryotes. Edited by: Balows A, Truper HG, Dworkin M, Harder W, Schleifer KH. 1992, New York: Springer-Verlag, 312-330.
- Castenholz RW: Phylum BX. Cyanobacteria: Oxygenic Photosynthetic Bacteria. Bergey's Manual of Systematic Bacteriology. Edited by: Boone DR, Castenholz RW. 2001, New York: Springer, 474-487.
- Sanchez-Baracaldo P, Hayes PK, Blank CE: Morphological and habitat evolution in the Cyanobacteria using a compartmentalization approach. Geobiology. 2005, 3: 145-165. 10.1111/j.1472-4669.2005.00050.x.
- Wilmotte A, Golubic S: Morphological and genetic criteria in the taxonomy of Cyanophyta/Cyanobacteria. Archiv für Hydrobiologie. 1991, 64: 1-24.
- Wilmotte A, Herdman M: Phylogenetic Relationships among the Cyanobacteria Based on 16S rRNA Sequences. Bergey's Manual of Systematic Bacteriology. Edited by: Boone DR, Castenholz RW. 2001, New York: Springer, 487-493.
- Maidak BL, Cole JR, Lilburn TG, Parker CT, Saxman PR, Farris RJ, Garrity GM, Olsen GJ, Schmidt TM, Tiedje JM: The RDP-II (Ribosomal Database Project). Nucleic Acids Res. 2001, 29: 173-174. 10.1093/nar/29.1.173.
- Garrity GM, Bell JA, Lilburn TG: The Revised Road Map to the Manual. Bergey's Manual of Systematic Bacteriology, Part A, Introductory Essays. Edited by: Brenner DJ, Krieg NR, Staley JT. 2005, New York: Springer, 2: 159-220.
- Oren A: A proposal for further integration of the cyanobacteria under the Bacteriological Code. Int J Syst Evol Microbiol. 2004, 54: 1895-1902. 10.1099/ijs.0.03008-0.
- Hoffmann L: Nomenclature of Cyanophyta/Cyanobacteria: roundtable on the unification of the nomenclature under the Botanical and Bacteriological Codes. Algological Studies. 2005, 117: 13-29. 10.1127/1864-1318/2005/0117-0013.
- Swingley WD, Blankenship RE, Raymond J: Integrating Markov clustering and molecular phylogenetics to reconstruct the cyanobacterial species tree from conserved protein families. Mol Biol Evol. 2008, 25: 643-654. 10.1093/molbev/msn034.
- Shi T, Falkowski PG: Genome evolution in cyanobacteria: the stable core and the variable shell. Proc Natl Acad Sci USA. 2008, 105: 2510-2515. 10.1073/pnas.0711165105.
- Zhaxybayeva O, Gogarten JP, Charlebois RL, Doolittle WF, Papke RT: Phylogenetic analyses of cyanobacterial genomes: quantification of horizontal gene transfer events. Genome Res. 2006, 16: 1099-1108. 10.1101/gr.5322306.
- Oren A, Stackebrandt E: Prokaryote taxonomy online: challenges ahead. Nature. 2002, 419: 15. 10.1038/419015c.
- Hoffmann L, Komarek J, Kastovsky J: System of Cyanoprokaryotes (Cyanobacteria) - State in 2004. Algological Studies. 2005, 117: 95-115. 10.1127/1864-1318/2005/0117-0095.
- Gupta RS, Griffiths E: Critical Issues in Bacterial Phylogenies. Theor Popul Biol. 2002, 61: 423-434. 10.1006/tpbi.2002.1589.
- Gupta RS: Protein Phylogenies and Signature Sequences: A Reappraisal of Evolutionary Relationships Among Archaebacteria, Eubacteria, and Eukaryotes. Microbiol Mol Biol Rev. 1998, 62: 1435-1491.
- Gupta RS: The phylogeny of Proteobacteria: relationships to other eubacterial phyla and eukaryotes. FEMS Microbiol Rev. 2000, 24: 367-402. 10.1111/j.1574-6976.2000.tb00547.x.
- Delwiche CF, Kuhsel M, Palmer JD: Phylogenetic analysis of tufA sequences indicates a cyanobacterial origin of all plastids. Mol Phylogenet Evol. 1995, 4: 110-128. 10.1006/mpev.1995.1012.
- Rivera MC, Lake JA: Evidence that eukaryotes and eocyte prokaryotes are immediate relatives. Science. 1992, 257: 74-76. 10.1126/science.1621096.
- Griffiths E, Gupta RS: Phylogeny and shared conserved inserts in proteins provide evidence that Verrucomicrobia are the closest known free-living relatives of chlamydiae. Microbiology. 2007, 153: 2648-2654. 10.1099/mic.0.2007/009118-0.
- Gupta RS, Pereira M, Chandrasekera C, Johari V: Molecular signatures in protein sequences that are characteristic of Cyanobacteria and plastid homologues. Int J Syst Evol Microbiol. 2003, 53: 1833-1842. 10.1099/ijs.0.02720-0.
- Gupta RS: Protein signatures (molecular synapomorphies) that are distinctive characteristics of the major cyanobacterial clades. Int J Syst Evol Microbiol. 2009, 59: 2510-2526. 10.1099/ijs.0.005678-0.
- Palmer JD, Delwiche CF: The origin and evolution of plastids and their genomes. Molecular Systematics of Plants II: DNA Sequencing. Edited by: Soltis DE, Soltis PS, Doyle JJ. 1998, Norwell, MA, USA: Kluwer Academic Publishers, 375-409.
- Gao B, Parmanathan R, Gupta RS: Signature proteins that are distinctive characteristics of Actinobacteria and their subgroups. Antonie van Leeuwenhoek. 2006, 90: 69-91. 10.1007/s10482-006-9061-2.
- Gupta RS, Lorenzini E: Phylogeny and molecular signatures (conserved proteins and indels) that are specific for the Bacteroidetes and Chlorobi species. BMC Evol Biol. 2007, 7: 71. 10.1186/1471-2148-7-71.
- Gupta RS, Mok A: Phylogenomics and signature proteins for the alpha Proteobacteria and its main groups. BMC Microbiol. 2007, 7: 106. 10.1186/1471-2180-7-106.
- Dutilh BE, Snel B, Ettema TJ, Huynen MA: Signature genes as a phylogenomic tool. Mol Biol Evol. 2008, 25: 1659-1667. 10.1093/molbev/msn115.
- Martin KA, Siefert JL, Yerrapragada S, Lu Y, McNeill TZ, Moreno PA, Weinstock GM, Widger WR, Fox GE: Cyanobacterial signature genes. Photosynth Res. 2003, 75: 211-221. 10.1023/A:1023990402346.
- Mulkidjanian AY, Koonin EV, Makarova KS, Mekhedov SL, Sorokin A, Wolf YI, Dufresne A, Partensky F, Burd H, Kaznadzey D, Haselkorn R, Galperin MY: The cyanobacterial genome core and the origin of photosynthesis. Proc Natl Acad Sci USA. 2006, 103: 13126-13131. 10.1073/pnas.0605709103.
- Ciccarelli FD, Doerks T, von Mering C, Creevey CJ, Snel B, Bork P: Toward automatic reconstruction of a highly resolved tree of life. Science. 2006, 311: 1283-1287. 10.1126/science.1123061.
- Gogarten JP, Townsend JP: Horizontal gene transfer, genome innovation and evolution. Nat Rev Microbiol. 2005, 3: 679-687. 10.1038/nrmicro1204.
- Zhaxybayeva O, Doolittle WF, Papke RT, Gogarten JP: Intertwined evolutionary histories of marine Synechococcus and Prochlorococcus marinus. Genome Biol Evol. 2009, 1: 325-339. 10.1093/gbe/evp032.
- Herbeck JT, Degnan PH, Wernegreen JJ: Nonhomogeneous model of sequence evolution indicates independent origins of primary endosymbionts within the enterobacteriales (gamma-Proteobacteria). Mol Biol Evol. 2005, 22: 520-532. 10.1093/molbev/msi036.
- Harris JK, Kelley ST, Spiegelman GB, Pace NR: The genetic core of the universal ancestor. Genome Res. 2003, 13: 407-412. 10.1101/gr.652803.
- Nakamura Y, Kaneko T, Sato S, Mimuro M, Miyashita H, Tsuchiya T, Sasamoto S, Watanabe A, Kawashima K, Kishida Y, Kiyokawa C, Kohara M, Matsumoto M, Matsuno A, Nakazaki N, Shimpo S, Takeuchi C, Yamada M, Tabata S: Complete genome structure of Gloeobacter violaceus PCC 7421, a cyanobacterium that lacks thylakoids. DNA Res. 2003, 10: 137-145. 10.1093/dnares/10.4.137.
- Honda D, Yokota A, Sugiyama J: Detection of seven major evolutionary lineages in cyanobacteria based on the 16S rRNA gene sequence analysis with new sequences of five marine Synechococcus strains. J Mol Evol. 1999, 48: 723-739. 10.1007/PL00006517.
- Giovannoni SJ, Turner S, Olsen GJ, Barns S, Lane DJ, Pace NR: Evolutionary relationships among cyanobacteria and green chloroplasts. J Bacteriol. 1988, 170: 3584-3592.
- Seo PS, Yokota A: The phylogenetic relationships of cyanobacteria inferred from 16S rRNA, gyrB, rpoC1 and rpoD1 gene sequences. J Gen Appl Microbiol. 2003, 49: 191-203. 10.2323/jgam.49.191.
- Rocap G, Larimer FW, Lamerdin J, Malfatti S, Chain P, Ahlgren NA, Arellano A, Coleman M, Hauser L, Hess WR, Johnson ZI, Land M, Lindell D, Post AF, Regala W, Shah M, Shaw SL, Steglich C, Sullivan MB, Ting CS, Tolonen A, Webb EA, Zinser ER, Chisholm SW: Genome divergence in two Prochlorococcus ecotypes reflects oceanic niche differentiation. Nature. 2003, 424: 1042-1047. 10.1038/nature01947.
- Rocap G, Distel DL, Waterbury JB, Chisholm SW: Resolution of Prochlorococcus and Synechococcus ecotypes by using 16S-23S ribosomal DNA internal transcribed spacer sequences. Appl Environ Microbiol. 2002, 68: 1180-1191. 10.1128/AEM.68.3.1180-1191.2002.
- Gao B, Mohan R, Gupta RS: Phylogenomics and protein signatures elucidating the evolutionary relationships among the Gammaproteobacteria. Int J Syst Evol Microbiol. 2009, 59: 234-247. 10.1099/ijs.0.002741-0.
- Gao B, Gupta RS: Phylogenomic analysis of proteins that are distinctive of Archaea and its main subgroups and the origin of methanogenesis. BMC Genomics. 2007, 8: 86. 10.1186/1471-2164-8-86.
- Lerat E, Daubin V, Ochman H, Moran NA: Evolutionary Origins of Genomic Repertoires in Bacteria. PLoS Biol. 2005, 3: e130. 10.1371/journal.pbio.0030130.
- Swingley WD, Chen M, Cheung PC, Conrad AL, Dejesa LC, Hao J, Honchak BM, Karbach LE, Kurdoglu A, Lahiri S, Mastrian SD, Miyashita H, Page L, Ramakrishna P, Satoh S, Sattley WM, Shimada Y, Taylor HL, Tomo T, Tsuchiya T, Wang ZT, Raymond J, Mimuro M, Blankenship RE, Touchman JW: Niche adaptation and genome expansion in the chlorophyll d-producing cyanobacterium Acaryochloris marina. Proc Natl Acad Sci USA. 2008, 105: 2005-2010. 10.1073/pnas.0709772105.
- Nakamura Y, Kaneko T, Sato S, Ikeuchi M, Katoh H, Sasamoto S, Watanabe A, Iriguchi M, Kawashima K, Kimura T, Kishida Y, Kiyokawa C, Kohara M, Matsumoto M, Matsuno A, Nakazaki N, Shimpo S, Sugimoto M, Takeuchi C, Yamada M, Tabata S: Complete genome structure of the thermophilic cyanobacterium Thermosynechococcus elongatus BP-1. DNA Res. 2002, 9: 123-130. 10.1093/dnares/9.4.123.
- Turner S, Pryer KM, Miao VP, Palmer JD: Investigating deep phylogenetic relationships among cyanobacteria and plastids by small subunit rRNA sequence analysis. J Eukaryot Microbiol. 1999, 46: 327-338. 10.1111/j.1550-7408.1999.tb04612.x.
- Kaneko T, Nakamura Y, Wolk CP, Kuritz T, Sasamoto S, Watanabe A, Iriguchi M, Ishikawa A, Kawashima K, Kimura T, Kishida Y, Kohara M, Matsumoto M, Matsuno A, Muraki A, Nakazaki N, Shimpo S, Sugimoto M, Takazawa M, Yamada M, Yasuda M, Tabata S: Complete genomic sequence of the filamentous nitrogen-fixing cyanobacterium Anabaena sp. strain PCC 7120. DNA Res. 2001, 8: 205-213. 10.1093/dnares/8.5.205.
- Caspi J, Amitai G, Belenkiy O, Pietrokovski S: Distribution of split DnaE inteins in cyanobacteria. Mol Microbiol. 2003, 50: 1569-1577. 10.1046/j.1365-2958.2003.03825.x.
- Adams DG: Heterocyst formation in cyanobacteria. Curr Opin Microbiol. 2000, 3: 618-624. 10.1016/S1369-5274(00)00150-8.
- Dufresne A, Salanoubat M, Partensky F, Artiguenave F, Axmann IM, Barbe V, Duprat S, Galperin MY, Koonin EV, Le Gall F, Makarova KS, Ostrowski M, Oztas S, Robert C, Rogozin IB, Scanlan DJ, De Marsac NT, Weissenbach J, Wincker P, Wolf YI, Hess WR: Genome sequence of the cyanobacterium Prochlorococcus marinus SS120, a nearly minimal oxyphototrophic genome. Proc Natl Acad Sci USA. 2003, 100: 10020-10025. 10.1073/pnas.1733211100.
- Palenik B, Brahamsha B, Larimer FW, Land M, Hauser L, Chain P, Lamerdin J, Regala W, Allen EE, McCarren J, Paulsen I, Dufresne A, Partensky F, Webb EA, Waterbury J: The genome of a motile marine Synechococcus. Nature. 2003, 424: 1037-1042. 10.1038/nature01943.
- Palenik B, Ren Q, Dupont CL, Myers GS, Heidelberg JF, Badger JH, Madupu R, Nelson WC, Brinkac LM, Dodson RJ, Durkin AS, Daugherty SC, Sullivan SA, Khouri H, Mohamoud Y, Halpin R, Paulsen IT: Genome sequence of Synechococcus CC9311: Insights into adaptation to a coastal environment. Proc Natl Acad Sci USA. 2006, 103: 13555-13559. 10.1073/pnas.0602963103.
- Sugishima M, Migita CT, Zhang X, Yoshida T, Fukuyama K: Crystal structure of heme oxygenase-1 from cyanobacterium Synechocystis sp. PCC 6803 in complex with heme. Eur J Biochem. 2004, 271: 4517-4525. 10.1111/j.1432-1033.2004.04411.x.
- Heyes DJ, Scrutton NS: Conformational changes in the catalytic cycle of protochlorophyllide oxidoreductase: what lessons can be learnt from dihydrofolate reductase? Biochem Soc Trans. 2009, 37: 354-357. 10.1042/BST0370354.
- Moore LR, Rocap G, Chisholm SW: Physiology and molecular phylogeny of coexisting Prochlorococcus ecotypes. Nature. 1998, 393: 464-467. 10.1038/30965.
- Ferris MJ, Palenik B: Niche adaptation in ocean cyanobacteria. Nature. 1998, 396: 226-228. 10.1038/24297.
- Narra HP, Cordes MH, Ochman H: Structural features and the persistence of acquired proteins. Proteomics. 2008, 8: 4772-4781. 10.1002/pmic.200800061.
- Gao B, Mohan R, Gupta RS: Phylogenomics and protein signatures elucidating the evolutionary relationships among the Gammaproteobacteria. Int J Syst Evol Microbiol. 2009, 59: 234-247. 10.1099/ijs.0.002741-0.
- Siew N, Fischer D: Analysis of singleton ORFans in fully sequenced microbial genomes. Proteins. 2003, 53: 241-251. 10.1002/prot.10423.
- Kuo CH, Ochman H: The fate of new bacterial genes. FEMS Microbiol Rev. 2009, 33: 38-43. 10.1111/j.1574-6976.2008.00140.x.
- Fang G, Rocha EP, Danchin A: Persistence drives gene clustering in bacterial genomes. BMC Genomics. 2008, 9: 4. 10.1186/1471-2164-9-4.
- Gupta RS: Molecular signatures (unique proteins and conserved Indels) that are specific for the epsilon proteobacteria (Campylobacterales). BMC Genomics. 2006, 7: 167. 10.1186/1471-2164-7-167.
- Gupta RS, Griffiths E: Chlamydiae-specific proteins and indels: novel tools for studies. Trends Microbiol. 2006, 14: 527-535. 10.1016/j.tim.2006.10.002.
- Gogarten JP, Doolittle WF, Lawrence JG: Prokaryotic evolution in light of gene transfer. Mol Biol Evol. 2002, 19: 2226-2238.
- Raymond J, Zhaxybayeva O, Gogarten JP, Gerdes SY, Blankenship RE: Whole-genome analysis of photosynthetic prokaryotes. Science. 2002, 298: 1616-1620. 10.1126/science.1075558.
- Rokas A, Holland PW: Rare genomic changes as a tool for phylogenetics. Trends Ecol Evol. 2000, 15: 454-459. 10.1016/S0169-5347(00)01967-4.
- Huang J, Gogarten JP: Ancient gene transfer as a tool in phylogenetic reconstruction. Methods Mol Biol. 2009, 532: 127-139.
- von Mering C, Hugenholtz P, Raes J, Tringe SG, Doerks T, Jensen LJ, Ward N, Bork P: Quantitative phylogenetic assessment of microbial communities in diverse environments. Science. 2007, 315: 1126-1130. 10.1126/science.1133420.
- Doerks T, von Mering C, Bork P: Functional clues for hypothetical proteins based on genomic context analysis in prokaryotes. Nucleic Acids Res. 2004, 32: 6321-6326. 10.1093/nar/gkh973.
- Fang G, Rocha E, Danchin A: How essential are nonessential genes? Mol Biol Evol. 2005, 22: 2147-2156. 10.1093/molbev/msi211.
- Yang Z: The power of phylogenetic comparison in revealing protein function. Proc Natl Acad Sci USA. 2005, 102: 3179-3180. 10.1073/pnas.0500371102.
- Singh B, Gupta RS: Conserved inserts in the Hsp60 (GroEL) and Hsp70 (DnaK) proteins are essential for cellular growth. Mol Genet Genomics. 2009, 281: 361-373. 10.1007/s00438-008-0417-3.
- Roberts RJ: Identifying protein function--a call for community action. PLoS Biol. 2004, 2: E42. 10.1371/journal.pbio.0020042.
- Galperin MY, Koonin EV: 'Conserved hypothetical' proteins: prioritization of targets for experimental study. Nucleic Acids Res. 2004, 32: 5452-5463. 10.1093/nar/gkh885.
- Danchin A: From protein sequence to function. Curr Opin Struct Biol. 1999, 9: 363-367. 10.1016/S0959-440X(99)80049-9.
- Jeanmougin F, Thompson JD, Gouy M, Higgins DG, Gibson TJ: Multiple sequence alignment with Clustal X. Trends Biochem Sci. 1998, 23: 403-405. 10.1016/S0968-0004(98)01285-7.
- Castresana J: Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol. 2000, 17: 540-552.
- Kimura M: The Neutral Theory of Molecular Evolution. 1983, Cambridge: Cambridge University Press.
- Van de Peer Y, De Wachter R: TREECON for Windows: a software package for the construction and drawing of evolutionary trees for the Microsoft Windows environment. Comput Appl Biosci. 1994, 10: 569-570.
- Schmidt HA, Strimmer K, Vingron M, von Haeseler A: TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics. 2002, 18: 502-504. 10.1093/bioinformatics/18.3.502.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
- Sugita C, Ogata K, Shikata M, Jikuya H, Takano J, Furumichi M, Kanehisa M, Omata T, Sugiura M, Sugita M: Complete nucleotide sequence of the freshwater unicellular cyanobacterium Synechococcus elongatus PCC 6301 chromosome: gene content and organization. Photosynth Res. 2007, 93: 55-67. 10.1007/s11120-006-9122-4.
- Dufresne A, Ostrowski M, Scanlan DJ, Garczarek L, Mazard S, Palenik BP, Paulsen IT, De Marsac NT, Wincker P, Dossat C, Ferriera S, Johnson J, Post AF, Hess WR, Partensky F: Unraveling the genomic mosaic of a ubiquitous genus of marine cyanobacteria. Genome Biol. 2008, 9: R90. 10.1186/gb-2008-9-5-r90.
- Kaneko T, Sato S, Kotani H, Tanaka A, Asamizu E, Nakamura Y, Miyajima N, Hirosawa M, Sugiura M, Sasamoto S, Kimura T, Hosouchi T, Matsuno A, Muraki A, Nakazaki N, Naruo K, Okumura S, Shimpo S, Takeuchi C, Wada T, Watanabe A, Yamada M, Yasuda M, Tabata S: Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions. DNA Res. 1996, 3: 109-136. 10.1093/dnares/3.3.109.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.