Massive horizontal gene transfer, strictly vertical inheritance and ancient duplications differentially shape the evolution of Bacillus cereus enterotoxin operons hbl, cytK and nhe

Background Bacillus cereus sensu lato comprises eight closely related species, including the human pathogens Bacillus anthracis and Bacillus cereus. Within B. cereus sensu lato, both chromosomally and plasmid-encoded toxins exist. While plasmid-mediated horizontal gene transfer of the emetic toxin, anthrax and insecticidal toxins is known, evolution of enterotoxin genes within the group has not been studied. Results We report draft genome assemblies of 25 strains, a phylogenetic network of 142 strains based on average nucleotide identity (ANI) derived from genome sequences, and a phylogeny based on whole-genome SNP analysis. The data clearly support subdivision of B. cereus sensu lato into seven phylogenetic groups. While groups I, V and VII represent B. pseudomycoides, B. toyonensis and B. cytotoxicus, which are distinguishable at species level (ANI border ≥ 96 %), strains ascribed to the other five species do not match phylogenetic groups. The chromosomal enterotoxin operons nheABC and hblCDAB are abundant within B. cereus strains isolated both from infections and from the environment. While the duplicated hbl variant hbl a is present in 22 % of all strains investigated, duplication of nheABC is extremely rare (0.02 %) and appears to be phylogenetically unstable. Distribution of toxin genes was matched to a master tree based on seven concatenated housekeeping genes, which depicts species relationships in B. cereus sensu lato as accurately as whole-genome comparisons. Comparison to the phylogeny of enterotoxin genes uncovered ample evidence for horizontal transfer of hbl, cytK and plcR, as well as frequent deletion of both toxins and duplication of hbl. No evidence for nhe deletion was found, and stable horizontal transfer of nhe is rare. Therefore, the evolution of B. cereus enterotoxin operons is shaped in unexpectedly different ways, for as yet unknown reasons. Conclusions Frequent exchange of the pathogenicity factors hbl, cytK and plcR in B. cereus sensu lato appears to be an important mechanism of B. cereus virulence evolution, including in so-called probiotic or non-pathogenic species, which might have consequences for risk assessment procedures. In contrast, exclusively vertical inheritance of nhe was observed, and since nhe-negative strains appear to be extremely rare, we suggest that fitness loss may be associated with deletion or horizontal transfer of the nhe operon.
Electronic supplementary material The online version of this article (doi:10.1186/s12862-015-0529-4) contains supplementary material, which is available to authorized users.


Presence of a second nhe a operon in newly sequenced strains
To confirm the presence of the second nhe a operon in some of the newly sequenced strains, trimmed and quality-filtered reads were aligned separately against each of the candidate nhe operons and against the contigs on which these operons are located. Read alignment was performed using BWA v.0.7.12 [50]. The resulting SAM files were then converted into BAM format, discarding reads that did not map to the reference or were not part of a primary alignment.
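The filtering criterion above (discarding unmapped reads and reads that are not part of a primary alignment) can be expressed in terms of SAM FLAG bits. The Python sketch below is illustrative only. It assumes that "not part of a primary alignment" covers both secondary (0x100) and supplementary (0x800) records in addition to unmapped reads (0x4). The helper names `keep_alignment` and `filter_sam_lines` are hypothetical, and the exact 'samtools view' options used in the study are not stated in the text.

```python
# SAM FLAG bits as defined in the SAM format specification.
FLAG_UNMAPPED = 0x4         # read is unmapped
FLAG_SECONDARY = 0x100      # secondary (non-primary) alignment
FLAG_SUPPLEMENTARY = 0x800  # supplementary alignment (part of a chimeric read)

# Assumption: all three record types are dropped; combined mask is 0x904.
EXCLUDE_MASK = FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY


def keep_alignment(flag: int) -> bool:
    """Return True if a SAM record is a mapped, primary alignment."""
    return (flag & EXCLUDE_MASK) == 0


def filter_sam_lines(lines):
    """Yield SAM header lines and mapped primary alignment records only."""
    for line in lines:
        if line.startswith("@"):
            yield line  # header lines (@HD, @SQ, ...) are always kept
            continue
        fields = line.rstrip("\n").split("\t")
        if keep_alignment(int(fields[1])):  # column 2 holds the FLAG
            yield line
```

In recent SAMtools releases the same mask would correspond to passing 0x904 to the `-F` option of `samtools view`. Whether this matches the original v.0.1.18 invocation is an assumption.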
SAM file conversion and filtering were carried out using the 'view' utility of the SAMtools package v.0.1.18 [51]. Filtered BAM files served as input for the 'genomeCoverageBed' utility of the BEDTools package […] with only a small number of reads mapping to both copies (# combined reads). Three of the de novo assembled strains were found to contain nhe a, which is distinguished from nhe by the reads that map uniquely to it.
* The nhe operon may be located within the wrong contig, as indicated by an unexpectedly high ratio (cov_operon/cov_contig) of 1.44. However, examination of the read sets mapping uniquely to either nhe or nhe a unambiguously shows that both versions are present within the genome of strain #87.
** The unusually high read number is caused by ~40-fold higher coverage of the intergenic region (2474–2634 bp) between nheB and nheC, which may be due to duplication of this region into a high-copy-number plasmid.
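Distinguishing nhe a from nhe by its uniquely mapping reads, and counting the reads that map to both copies (# combined reads), comes down to set arithmetic on read identifiers. The following is a minimal Python sketch. It assumes that the identifiers of reads with a primary alignment to each copy have already been extracted from the filtered BAM files and that the two mates of a pair carry distinct identifiers. `partition_reads` and `ReadPartition` are hypothetical names, and the read IDs are invented for illustration.

```python
from typing import Iterable, NamedTuple


class ReadPartition(NamedTuple):
    unique_first: int   # reads aligned only to the first copy (e.g. nhe)
    unique_second: int  # reads aligned only to the second copy (e.g. nhe a)
    combined: int       # reads aligned to both copies


def partition_reads(first: Iterable[str], second: Iterable[str]) -> ReadPartition:
    """Split read IDs into those unique to each copy and those shared by both."""
    first_ids, second_ids = set(first), set(second)
    return ReadPartition(
        unique_first=len(first_ids - second_ids),
        unique_second=len(second_ids - first_ids),
        combined=len(first_ids & second_ids),
    )


# Hypothetical read IDs, for illustration only.
nhe_reads = ["r1", "r2", "r3", "r5"]
nhe_a_reads = ["r3", "r4", "r6"]
print(partition_reads(nhe_reads, nhe_a_reads))
# ReadPartition(unique_first=3, unique_second=2, combined=1)
```

A second copy supported by many uniquely mapping reads and few combined reads is consistent with a genuine duplication. The text does not give a numeric cut-off for this decision.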

Presence of a second hbl a operon in newly sequenced strains
To confirm the presence of the second hbl a operon in some of the newly sequenced strains, the same read remapping and filtering approach as for the nhe operons was applied. In addition to mapping reads to hbl, hbl a and the contigs on which the operons are located, reads were also mapped to an artificial sequence construct in which each strain's hbl and hbl a sequences are separated by a spacer of 5,000 'N' characters. Table S6 summarizes, for each operon, the median coverage of the operon (cov_operon) and of the contig it belongs to (cov_contig), as well as the median coverages of hbl and hbl a within the artificial sequence constructs (cov_construct).
The ratio cov_construct/cov_contig is close to 1 for all operons, showing that each operon fits very well into its genomic background (contig). The ratio cov_operon/cov_construct, in contrast, is greater than 1 for each individual operon: median coverages obtained after remapping against the operon sequences alone are higher than those obtained after remapping against the corresponding artificial constructs. This is because, within each artificial construct, reads preferentially form their primary alignments (best hits) with the operon (hbl or hbl a) from which they originate. When reads are instead mapped against a single operon sequence alone (without the construct), a substantial fraction of reads originating from hbl a also align to hbl. This happens only because no better alignment with hbl a is possible when hbl a is absent from the reference. The same holds in the reverse direction.
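The artificial construct described above can be produced by concatenating the two operon sequences with an 'N' spacer. The Python sketch below assumes that hbl is placed before hbl a and that the construct is written as a single FASTA record. The function names and the 60-column line width are illustrative choices, not taken from the study. Because the 5,000 nt spacer is far longer than a sequencing read, no read is expected to span both copies, so the aligner has to choose one copy as each read's best hit.

```python
SPACER_LENGTH = 5000  # number of 'N' characters separating the two copies


def build_construct(hbl_seq: str, hbl_a_seq: str,
                    spacer_length: int = SPACER_LENGTH) -> str:
    """Concatenate hbl and hbl a with a run of 'N' characters in between."""
    return hbl_seq + "N" * spacer_length + hbl_a_seq


def construct_intervals(hbl_len: int, hbl_a_len: int,
                        spacer_length: int = SPACER_LENGTH):
    """Return 0-based, half-open (start, end) intervals of hbl and hbl a."""
    hbl_a_start = hbl_len + spacer_length
    return (0, hbl_len), (hbl_a_start, hbl_a_start + hbl_a_len)


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format a single sequence as a FASTA record with a fixed line width."""
    lines = [f">{name}"]
    lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

The intervals from `construct_intervals` mark where per-base coverage of each copy would be summarized to obtain cov_construct. To use them with 1-based inclusive positions, add 1 to each start and keep each end unchanged.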
The third copy of hbl in strain #245 is due to an assembly error. Its cov_construct/cov_contig ratio is extremely low (0.04), and almost no reads map uniquely to this third hbl copy, revealing it as a mis-assembled second copy of hbl a.
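The coverage ratios used in this section (cov_operon/cov_contig, cov_construct/cov_contig and cov_operon/cov_construct) are quotients of median per-base coverages. The sketch below assumes per-base depth in the three-column format that 'genomeCoverageBed' writes with its '-d' option: sequence name, 1-based position, depth. The function names, sequence names and toy depth values are hypothetical.

```python
import statistics
from collections import defaultdict


def read_per_base_depth(lines):
    """Parse 'name<TAB>position<TAB>depth' lines into {name: {position: depth}}."""
    depth = defaultdict(dict)
    for line in lines:
        name, pos, value = line.rstrip("\n").split("\t")[:3]
        depth[name][int(pos)] = int(value)
    return depth


def median_coverage(depth, name, start=None, end=None):
    """Median depth over 1-based inclusive positions start..end of `name`.

    Positions missing from the input are counted as depth 0. The sequence
    `name` must be present in `depth`.
    """
    per_pos = depth[name]
    first = start if start is not None else min(per_pos)
    last = end if end is not None else max(per_pos)
    return statistics.median([per_pos.get(p, 0) for p in range(first, last + 1)])


# Toy example: a contig covered at 50x and an operon copy covered at only 2x.
toy = ([f"contig\t{p}\t50" for p in range(1, 101)]
       + [f"construct\t{p}\t2" for p in range(1, 101)])
depth = read_per_base_depth(toy)
ratio = median_coverage(depth, "construct", 1, 100) / median_coverage(depth, "contig")
print(round(ratio, 2))  # 0.04
```

The median is less sensitive than the mean to short local coverage spikes, so a brief high-coverage stretch within an operon shifts a median-based ratio far less than it would a mean-based one.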