Contrasting evolutionary patterns of spore coat proteins in two Bacillus species groups are linked to a difference in cellular structure

Background The Bacillus subtilis-group and the Bacillus cereus-group are two well-studied groups of species in the genus Bacillus. Bacteria in this genus can produce a highly resistant cell type, the spore, which is encased in a complex protective protein shell called the coat. Spores in the B. cereus-group contain an additional outer layer, the exosporium, which encircles the coat. The coat in B. subtilis spores possesses inner and outer layers. The aim of this study is to investigate whether differences in the spore structures influenced the divergence of the coat protein genes during the evolution of these two Bacillus species groups.

Results We designed and implemented a computational framework to compare the evolutionary histories of coat proteins. We curated a list of B. subtilis coat proteins and identified their orthologs in 11 Bacillus species based on phylogenetic congruence. Phylogenetic profiles of these coat proteins show that they can be divided into conserved and labile ones. Coat proteins comprising the B. subtilis inner coat are significantly more conserved than those comprising the outer coat. We then performed genome-wide comparisons of the nonsynonymous/synonymous substitution rate ratio, dN/dS, and found contrasting patterns: coat proteins have significantly higher dN/dS in the B. subtilis-group genomes, but not in the B. cereus-group genomes. We further corroborated this contrast by examining changes of dN/dS within gene trees, and found that some coat protein gene trees have significantly different dN/dS between the B. subtilis-clade and the B. cereus-clade.

Conclusions Coat proteins in the B. subtilis- and B. cereus-group species are under contrasting selective pressures. We speculate that the absence of the exosporium in the B. subtilis spore coat effectively lifted a structural constraint that has led to relaxed negative selection pressure on the outer coat.

Supplementary Information for "Contrasting evolutionary patterns of spore coat proteins in two groups of Bacillus species are linked to a cellular structural difference". Hong Qin and Adam Driks.

Table S1.    (Bmo,Bsu)))), (Bwe,(Bce,(Ban,Bth)))); Ranked highest in 20 genes.

Accepted in 0 genes
The first tree is the neighbor-joining tree based on the concatenated sequences. It is the most representative topology and was chosen as the species reference tree. Key differences between the first tree and the other topologies are underlined. Alternative branching patterns within the B. subtilis-clade or the B. cereus-clade were often accepted, and these are treated as acceptable alternative topologies during the identification of orthologs. Alternative branching patterns that violate the two major clades, such as in the 10th tree, were consistently rejected. The data and code used to perform the tests are at https://github.com/hongqin/BacillusSporeCoat/tree/master/tree.test/topology.tests.
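The distinction drawn above, between rearrangements inside a major clade and topologies that break a major clade, can be illustrated with a small monophyly check. This is only a sketch and not the topology tests used in the study: the trees are toy six-taxon examples written as nested tuples, treated as rooted, and the assignment of Bmo and Bsu to one clade and Bwe, Bce, Ban and Bth to the other is an assumption read off the tree fragment shown for Table S1. The function names are hypothetical.

```python
def leaves(tree):
    """Return the set of leaf labels under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def is_monophyletic(tree, clade):
    """True if some subtree contains exactly the taxa in `clade`."""
    clade = set(clade)
    if leaves(tree) == clade:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, clade) for child in tree)


def respects_major_clades(tree, clades):
    """True if every listed clade is monophyletic in the rooted tree."""
    return all(is_monophyletic(tree, clade) for clade in clades)


# Assumed clade memberships (illustrative subset of the 11 species).
SUBTILIS = {"Bmo", "Bsu"}
CEREUS = {"Bwe", "Bce", "Ban", "Bth"}

# Toy topologies: reference-like, rearranged within the cereus clade,
# and one that mixes taxa across the two clades.
reference_like = (("Bmo", "Bsu"), ("Bwe", ("Bce", ("Ban", "Bth"))))
alt_within = (("Bmo", "Bsu"), (("Bwe", "Bce"), ("Ban", "Bth")))
violating = (("Bmo", "Bwe"), ("Bsu", ("Bce", ("Ban", "Bth"))))

for name, tree in [("reference_like", reference_like),
                   ("alt_within", alt_within),
                   ("violating", violating)]:
    print(name, respects_major_clades(tree, [SUBTILIS, CEREUS]))
```

For unrooted gene trees, the same idea would be expressed as a test on bipartitions rather than on rooted subtrees.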

Figure S1. Phylogenetic profile of essential genes.
Blue indicates the presence of orthologous hits and red indicates the absence of detectable orthologous hits. Hierarchical clustering is applied by row. Essential genes are mostly conserved in the B. subtilis clade, and these hits are not informative for clustering analysis. Therefore, we exclude the B. subtilis hits from the hierarchical clustering analysis and heat map presentation.
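As an illustration of row-wise clustering of a presence/absence profile, the sketch below drops the B. subtilis column and then merges genes by average-linkage clustering on Hamming distance. The gene names, species labels and profile values are invented for the example, and the choice of Hamming distance with average linkage is an assumption; the caption does not state which distance or linkage was used.

```python
from itertools import combinations


def hamming(a, b):
    """Number of species in which two presence/absence rows differ."""
    return sum(x != y for x, y in zip(a, b))


def average_distance(c1, c2, profiles):
    """Mean pairwise Hamming distance between two clusters of genes."""
    total = sum(hamming(profiles[g], profiles[h]) for g in c1 for h in c2)
    return total / (len(c1) * len(c2))


def cluster_rows(profiles):
    """Average-linkage agglomerative clustering; returns the merge history."""
    clusters = [[gene] for gene in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of clusters (first pair wins on ties).
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_distance(clusters[p[0]], clusters[p[1]], profiles),
        )
        dist = average_distance(clusters[i], clusters[j], profiles)
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical profile: 1 = orthologous hit present, 0 = no detectable hit.
species = ["Bsu", "sp1", "sp2", "sp3", "sp4"]
raw = {
    "geneA": [1, 1, 1, 1, 1],
    "geneB": [1, 1, 1, 1, 0],
    "geneC": [1, 0, 0, 0, 0],
    "geneD": [1, 1, 0, 0, 0],
}

# Exclude the uninformative B. subtilis column before clustering.
keep = [k for k, name in enumerate(species) if name != "Bsu"]
profiles = {gene: [row[k] for k in keep] for gene, row in raw.items()}

merges = cluster_rows(profiles)
for left, right, dist in merges:
    print(left, right, dist)
```

In this toy input the conserved genes (geneA, geneB) and the labile genes (geneC, geneD) form separate clusters before the final merge.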
[Figure panels: (A) ω, coat genes; (B) dN, coat genes; (C) dS, coat genes; (D) ω, essential genes]

Figure S3. One of the maximum likelihood trees of 16S rRNA.
Red indicates the B. subtilis-clade, and green indicates the B. cereus-clade. Not all nodes are labeled owing to limited space. Notice that 16S genes from many species are intermingled in the B. cereus-clade. Three independent runs of the maximum likelihood inference were performed. Six trees were generated by the maximum likelihood method in PAUP, and none of them is a resolved species reference tree. The first tree output by PAUP is presented here.

Improved annotations of Bacillus spore coat proteins.
Our analyses of coat protein phylogeny and orthologous groups allowed us to place coat proteins into phylogenetically meaningful families, and to revisit and improve previous annotations, such as those in the genome databases and in previous studies [2]. About 50% of our orthologous profiles differ from those previously published, and in the case of six genes (cotI, cotS, cotSA, cotV, and ymaG) those profiles are inconsistent in various ways in at least four genomes. For example, we discovered orthologs of YmaG (BG13415) that were not reported previously. We note that low complexity regions (LCRs), which can be responsible for spurious homology hits, are present in YmaG (43% LCR, 53 out of 123 amino acids). We included LCRs in BLASTP searches and detected reciprocal best hits of YmaG with E-values less than 1 × 10⁻⁶ in all of the studied genomes. We argue that at least some of these hits are plausible YmaG orthologs. We believe that the inclusion of LCR sequences in the phylogenetic analyses (albeit with caveats) allows us to generate a more complete understanding of coat protein relatedness than previous studies.
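The reciprocal-best-hit criterion with an E-value cutoff can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the hit tuples are invented, the gene names on the subject side are placeholders, and ranking hits by lowest E-value alone is an assumption (bit score is another common choice). The last lines simply reproduce the LCR fraction quoted for YmaG.

```python
E_VALUE_CUTOFF = 1e-6  # threshold quoted in the text


def best_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Map each query to its lowest-E-value subject below the cutoff."""
    best = {}
    for query, subject, evalue in hits:
        if evalue >= cutoff:
            continue  # keep only hits with E-value strictly below the cutoff
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward, reverse, cutoff=E_VALUE_CUTOFF):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    fwd = best_hits(forward, cutoff)
    rev = best_hits(reverse, cutoff)
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)


# Hypothetical BLASTP results as (query, subject, E-value) tuples.
forward = [
    ("ymaG", "geneX", 1e-12),
    ("ymaG", "geneY", 1e-8),
    ("cotV", "geneZ", 1e-3),   # fails the E-value cutoff
]
reverse = [
    ("geneX", "ymaG", 1e-11),
    ("geneY", "otherGene", 1e-20),
    ("geneZ", "cotV", 1e-3),   # fails the E-value cutoff
]

pairs = reciprocal_best_hits(forward, reverse)
print(pairs)

# LCR fraction of YmaG from the counts given in the text.
lcr_fraction = 53 / 123
print(round(lcr_fraction, 2))
```

In the study, candidate orthologs were additionally evaluated by phylogenetic congruence, which this sketch does not attempt.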
We also generated improved coat protein annotations by more careful analysis of coat protein families. For example, CotI (BG13821, previously YtaA), CotS (BG11380) and YutH (BG14044) constitute a family, based on their similar sequences. Although our analysis indicates that CotI and CotS are not encoded in the B. anthracis and B. cereus genomes, both genes are annotated in those species. Notably, TIGR assigned some coat proteins, including the potential YutH orthologs of B. thuringiensis, to both the YutH and CotS families using sequence similarity approaches. To better understand this discrepancy, we generated phylogenetic trees using the CotS, CotI, and YutH sequences (Figure S4). The TIGR seed CotS and CotI proteins are from Clostridia, and they are grouped with BG11380 and BG13821, and we did find detectable orthologs of CotS and CotI in the other species of the B. subtilis clade. In contrast, the yutH clade resembles the species reference tree (see the pink and green labeled taxa in Figure S4). Because B. subtilis is a model organism, genes in other Bacillus species are often named after their counterparts in the B. subtilis genome. It appears that the annotated CotI and CotS orthologs in the B. cereus group genomes were named after the wrong B. subtilis family member and should be re-annotated as yutH.
Another interesting finding concerns YobN (BG13506). The two best matches in B. licheniformis to B. subtilis YobN are the contiguous genes NT02BL2036 and NT02BL2037, which are highly similar to the 5' and 3' portions of yobN, respectively. This unusual case did not pass the topology comparison and prompted us to examine it carefully. It turns out that a stop codon at the end of NT02BL2036 introduces an eight-residue gap in the sequence, "splitting" YobN into two ORFs. It is unclear at present whether this is due to a sequencing error or indicates a nonsense mutation with biological implications.
Analysis of sequence characteristics specific to a group of genes can reveal distinctive and shared functional features, as well as evolutionary commonalities. Compared to B. subtilis non-coat proteins, coat proteins are shorter than average in length (p = 0.013, one-sided t-test). They contain an average of 6.8% LCRs, significantly more than do the non-coat proteins (with an average of 3.1%) at a p-value of 0.0088 (Wilcoxon rank sum test). Those coat proteins with a relatively large fraction of LCRs tend to be relatively short, but this is the general trend of all of the proteins in B. subtilis (data not shown). Among the 73 coat proteins, 24 show disordered regions based on the consensus of three independent methods implemented in DisEMBL [3]. There is a moderate correlation between LCRs and disordered regions (R² = 0.11, p-value = 0.007), suggesting that only a small number of disordered regions in coat proteins are due to repeats.
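To make the reported R² concrete, the sketch below computes the squared Pearson correlation for paired measurements and converts the quoted value into a correlation coefficient. The function name and the toy data are hypothetical; only the value 0.11 comes from the text, and the sketch assumes that the reported R² is a squared Pearson correlation.

```python
import math


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Toy, perfectly linear data: the function returns 1.0 here.
toy_lcr = [0, 1, 2, 3]
toy_disorder = [0, 2, 4, 6]
print(r_squared(toy_lcr, toy_disorder))

# The reported R^2 of 0.11 corresponds to |r| of about 0.33, so roughly
# 11% of the variance in one measure is shared with the other.
reported_r2 = 0.11
abs_r = math.sqrt(reported_r2)
print(round(abs_r, 2))
```

Under that assumption, about 89% of the variance in disorder content is not accounted for by LCR content, which is in line with the interpretation that only a small share of the disordered regions is due to repeats.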
The prevalence of disordered regions in coat proteins raises the possibility that these regions provide a function needed by most or all coat proteins. We do not know whether this is so, but we note that in some cases (such as the muscle protein titin [4]) disordered regions can act effectively as springs that unfold when stretched by an external force. If this were true for the disordered regions within coat proteins, then such regions could at least partially explain the ability of the coat to fold and unfold like the pleats of an accordion, since this dynamic action likely involves stretching and contracting the coat at specific points within each fold [5][6][7][8][9][10][11]. Interestingly, intrinsically disordered regions of proteins can evolve more rapidly than other regions, because disordered regions have fewer constraints than ordered regions [12]. However, because alignments of intrinsically disordered regions may be poorer than those of other regions, evolutionary rates can appear faster than they actually are. Further analysis will be needed to determine whether these regions within coat proteins are, indeed, evolving faster than other regions.