- Research article
- Open Access
Micro-evolution of three Streptococcus species: selection, antigenic variation, and horizontal gene inflow
- Pavel V. Shelyakin†1, 2, 3,
- Olga O. Bochkareva†2, 3,
- Anna A. Karan4 and
- Mikhail S. Gelfand2, 3, 5
© The Author(s) 2019
- Received: 22 December 2017
- Accepted: 25 February 2019
- Published: 27 March 2019
The genus Streptococcus comprises pathogens that strongly influence the health of humans and animals. Genome sequencing of multiple Streptococcus strains demonstrated high variability in gene content and order even in closely related strains of the same species, and gave rise to a new object of genomic analysis, the pan-genome. Here we analysed the genome evolution of 25 strains of Streptococcus suis, 50 strains of Streptococcus pyogenes and 28 strains of Streptococcus pneumoniae.
Fractions of the pan-genome, unique, periphery, and universal genes differ in size, functional composition, the level of nucleotide substitutions, and predisposition to horizontal gene transfer and genomic rearrangements. The density of substitutions in intergenic regions appears to be correlated with selection acting on adjacent genes, implying that more conserved genes tend to have more conserved regulatory regions.
The total pan-genome of the genus is open, but only due to strain-specific genes, whereas other pan-genome fractions reach saturation. We have identified the set of genes with phylogenies inconsistent with species and non-conserved location in the chromosome; these genes are rare in at least one species and have likely experienced recent horizontal transfer between species. The strain-specific fraction is enriched with mobile elements and hypothetical proteins, but also contains a number of candidate virulence-related genes, so it may have a strong impact on adaptability and pathogenicity.
Mapping the rearrangements to the phylogenetic tree revealed large parallel inversions in all species. A parallel inversion of length 15 kb with breakpoints formed by genes encoding surface antigen proteins PhtD and PhtB in S. pneumoniae leads to replacement of gene fragments, which likely indicates the action of an antigen variation mechanism.
Members of the genus Streptococcus have a highly dynamic, open pan-genome that potentially confers on them the ability to adapt to changing environmental conditions, e.g. through antibiotic resistance or transmission between different hosts. Hence, integrated analysis of all aspects of genome evolution is important for the identification of potential pathogens and design of drugs and vaccines.
- Genomic rearrangements
- Antigen variation
- Gene inflow
- Selection in upstream regions
Streptococci are Gram-positive bacteria that strongly influence the health of humans and animals. In particular, Streptococcus pneumoniae, normally a commensal of the nasopharyngeal microflora, is responsible for most pneumonia cases and is second only to Mycobacterium tuberculosis as a cause of mortality from bacterial infection worldwide. Streptococcus pyogenes is among the top ten bacterial causes of human mortality worldwide [2, 3] and, owing to molecular mimicry with heart and brain cells, causes severe autoimmune sequelae such as rheumatic fever and, possibly, autoimmune neuropsychiatric disorders. Streptococcus suis rarely causes disease in humans, but is one of the most important swine pathogens.
Sequencing of multiple strains of one species has demonstrated that the genome of any single strain does not reflect the genetic variability of the species, as two strains may differ by 20–35% of the gene content . The concept of pan-genome was introduced to represent the total set of genes observed in genomes of strains assigned to a given species [7–9]. The pan-genome consists of core genes, present in all sequenced strains, dispensable, or periphery, genes, present in a subset of strains, and unique, strain-specific genes. The pan-genome is said to be open if upon addition of new strains its size continues to grow, or closed, if at some point it saturates .
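As a minimal illustration of these definitions, the following sketch labels gene groups as core, periphery, or unique from a presence/absence table. The function name and the toy data are hypothetical and are not taken from the study.

```python
from typing import Dict, Set


def classify_groups(presence: Dict[str, Set[str]], n_strains: int) -> Dict[str, str]:
    """Label each gene group by how many strains carry it.

    `presence` maps a group identifier to the set of strains containing it.
    """
    labels = {}
    for group, strains in presence.items():
        if len(strains) == n_strains:
            labels[group] = "core"        # present in every sequenced strain
        elif len(strains) == 1:
            labels[group] = "unique"      # strain-specific
        else:
            labels[group] = "periphery"   # present in a subset of strains
    return labels


# Toy example with three strains A, B, C (hypothetical data).
toy = {
    "og1": {"A", "B", "C"},
    "og2": {"A", "C"},
    "og3": {"B"},
}
labels = classify_groups(toy, n_strains=3)
```

Here og1 is labelled core, og2 periphery, and og3 unique.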
Fractions of the pan-genome may differ not only in size, but also in the functional composition . In general, core genes encode housekeeping functions, while dispensable and unique genes confer selective advantages such as adaptation to particular niches, e.g. colonization of different hosts for pathogens, or antibiotic resistance . So one may expect that genes from different fractions of the pan-genome evolve in different modes, including gene gain/loss rate, frequency of horizontal gene transfer, and selective pressure [12, 13].
A consequence of the highly dynamic nature of bacterial genomes is frequent genomic rearrangements. Large inversions across the replication axis, deletions and insertions have been observed in S. pneumoniae [14, 15], S. suis [16, 17] and S. pyogenes . The inversions have been suggested to rebalance the chromosomal architecture affected by insertions of large DNA segments . The majority of these rearrangements occur at genome areas encoding transposases. Other genomic rearrangements occur at rRNA operons or sites encoding phage integrases and/or phage-related proteins.
Genome arrangement may have profound effects on a bacterial phenotype. Rearrangements can disrupt genes, create new genes by fusion of gene parts, or change gene expression. One example of such inversions is truncation of the so-called srtF pilus island in S. pneumoniae NSUI060 . In S. pyogenes M23ND, genomic rearrangements resulted in re-clustering of a broad set of CovRS-regulated, actively transcribed genes, including virulence factors and metabolic genes, to the same leading strand. This may provide a potential advantage by creating spatial proximity to the transcription complexes, which may contain the global transcriptional regulator, CovRS, and RNA polymerases, in turn allowing for efficient transcription of the genes required for growth, virulence, and persistence .
Here we describe a comprehensive pan-genomic analysis of S. pneumoniae, S. pyogenes, and S. suis strains with integrated analysis of their genome evolution. The paper is organized as follows. First, we describe and functionally characterize the pan-genome and then use the results of this analysis to detect variations in selection regimes for genes and intergenic regions from different pan-genome fractions. Next, we focus on genome rearrangements revealing large parallel inversions in all studied species and make a prediction of the antigenic variation of histidine triad protein PhtD in S. pneumoniae. Finally, we use the gene order data to identify and functionally characterize the fraction of genes horizontally transferred after the divergence of the studied species and further spreading between the strains.
The species were selected based on the number of available strain genomes. We analyzed 25 strains of Streptococcus suis, 50 strains of Streptococcus pyogenes, and 28 strains of Streptococcus pneumoniae, all available complete genomes as of June 2016 (Additional files 1: Table S1 and 2: Figure S1). The complete genomes were downloaded from GenBank. For all but two genomes (Streptococcus pyogenes STAB901 and Streptococcus pyogenes MTB313) the GenBank annotation coincides with that of the NCBI RefSeq database.
Construction of orthologous groups (OGs)
We constructed orthologous groups using Proteinortho v5.13 with the default parameters . Each gene was thus assigned to an orthologous group or labeled as a singleton. The size of a pan-genome was estimated with the Chao algorithm from the Micropan R-package .
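To make the size estimate concrete, the sketch below computes the classical Chao lower bound, in which the number of unseen gene groups is estimated from groups found in exactly one and exactly two genomes. That this matches the Micropan implementation in every detail is an assumption; the function name and the toy counts are hypothetical.

```python
from collections import Counter
from typing import Iterable


def chao_lower_bound(genomes_per_group: Iterable[int]) -> float:
    """Chao lower-bound estimate of the total number of gene groups.

    `genomes_per_group` holds, for every observed group, the number of
    genomes in which it occurs (always at least 1).
    """
    counts = Counter(genomes_per_group)
    observed = sum(counts.values())   # number of groups seen at least once
    y1 = counts[1]                    # groups seen in exactly one genome
    y2 = counts[2]                    # groups seen in exactly two genomes
    if y2 == 0:
        raise ValueError("estimate is undefined when no group occurs in exactly two genomes")
    return observed + y1 ** 2 / (2 * y2)


# Nine observed groups: four singletons, two doubletons, three in three genomes.
estimate = chao_lower_bound([1, 1, 1, 1, 2, 2, 3, 3, 3])
```

With these toy counts the estimate is 9 + 16/4 = 13 groups.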
Assignment of Gene Ontology (GO) terms to orthologous groups
To assign GO terms to genes, we used Interproscan . A GO term was assigned to an orthologous group, if it was assigned to at least 90% of genes in this group. To determine overrepresented functional categories, we used GOstat . The fit by theoretical models was estimated using the Akaike information criterion (AIC) .
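The 90% rule for transferring gene-level annotations to an orthologous group can be sketched as follows. The data structures and names are illustrative assumptions, not the scripts used in the study.

```python
from collections import Counter
from typing import Dict, List, Set


def group_terms(gene_terms: Dict[str, Set[str]],
                members: List[str],
                min_fraction: float = 0.9) -> Set[str]:
    """Return terms carried by at least `min_fraction` of the group members."""
    counts = Counter(
        term for gene in members for term in gene_terms.get(gene, set())
    )
    return {
        term for term, n in counts.items()
        if n / len(members) >= min_fraction
    }


# Hypothetical group of ten genes: nine share one term, five share another.
members = [f"gene{i}" for i in range(10)]
gene_terms = {g: set() for g in members}
for g in members[:9]:
    gene_terms[g].add("GO:0006412")
for g in members[:5]:
    gene_terms[g].add("GO:0003677")

assigned = group_terms(gene_terms, members)
```

Only the term shared by nine of the ten genes passes the threshold. Lowering `min_fraction` gives a more permissive assignment.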
Assignment of KEGG Orthology (KO) categories to orthologous groups
Initially, we assigned KO categories to genes with GhostKOALA . Then a KO category was assigned to an orthologous group, if it was assigned to at least 90% of genes in this group. KO terms were divided into supercategories “Genetic Information Processing”, “Metabolism”, “Cellular Processes”, “Environmental Information Processing”, and “other” based on the KEGG hierarchy classification.
Prediction of virulence-related orthologous groups
We identified virulence-related genes with MP3 (threshold 0.2), which combines a support vector machine classifier trained on virulence factors from MvirDB and a hidden Markov model classifier based on Pfam domains present in virulence factors. An orthologous group was considered virulence-related if at least 10% of its members were predicted to be virulence-related. To predict potential prophages, we used the PHAST web server.
p N/p S calculation
To estimate the number of synonymous (pS) and nonsynonymous (pN) polymorphisms, we aligned amino acid sequences of proteins using MUSCLE and then reconstituted the corresponding nucleotide alignment. Then we calculated pN and pS using the KaKs-Calculator Toolbox v2.0 with the Modified version of the Yang-Nielsen (MYN) method. Multiple substitutions were accounted for using the Jukes-Cantor correction. For these calculations, we considered different Streptococcus species separately. Although homologous recombination is clearly important for Streptococcus evolution, in pairs of very close genomes it would affect synonymous and non-synonymous substitutions to the same degree. For each species and each orthologous group not containing paralogs, we performed pairwise comparisons of all strains and assigned the median pN/pS ratio to this group.
Selection in intergenic regions
We extracted intergenic regions from .gbk files downloaded from the NCBI Genome database and removed regions shorter than 50 bp. From the remaining intergenic regions we constructed the sample of upstream fragments as follows. For all intergenic regions longer than 100 bp, we extracted the 100 bp upstream fragment [34, 35]. For intergenic regions shorter than 100 bp, the complete sequence was taken as the upstream fragment.
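The length filter and the fragment extraction can be sketched as below. The sketch assumes that each intergenic sequence is already oriented so that the downstream gene starts immediately after its last base; this orientation convention and all names are assumptions made for illustration.

```python
from typing import Optional

MIN_INTERGENIC = 50   # regions shorter than this are discarded
UPSTREAM_LEN = 100    # maximum length of an upstream fragment


def upstream_fragment(intergenic: str) -> Optional[str]:
    """Return the upstream fragment of an intergenic region, or None if too short.

    Assumes the gene start directly follows the last base of `intergenic`.
    """
    if len(intergenic) < MIN_INTERGENIC:
        return None
    # Slicing keeps the last 100 bases, or the whole region if it is shorter.
    return intergenic[-UPSTREAM_LEN:]


too_short = upstream_fragment("A" * 49)
whole = upstream_fragment("A" * 50)
trimmed = upstream_fragment("C" * 30 + "G" * 100)
```

A 49 bp region is dropped, a 50 bp region is kept whole, and a 130 bp region is trimmed to the 100 bp adjacent to the gene.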
We estimated the fraction of positions under negative selection in two ways. To assess the correlation between the level of conservation in intergenic regions and the universality of the respective genes, we simply counted substitutions in upstream fragments. Specifically, we considered all pairs of strains from one species, extracted aligned upstream fragments of orthologous genes from the multiple genome alignment, and counted nucleotide substitutions with the Jukes-Cantor correction. The same approach was used to compare the conservation level in universal regions and regions deleted in some strains.
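For reference, the Jukes-Cantor correction converts the observed fraction of differing sites p into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The sketch below applies it to one pair of aligned fragments; skipping gapped columns is an assumption of this sketch rather than a documented detail of the study.

```python
import math


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor corrected distance between two aligned sequences.

    Columns containing a gap in either sequence are ignored (assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    p = sum(a != b for a, b in columns) / len(columns)
    if p >= 0.75:
        return math.inf   # correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


identical = jukes_cantor_distance("ACGTACGT", "ACGTACGT")
diverged = jukes_cantor_distance("ACGTACGT", "ACGAACGA")   # 2 of 8 sites differ
```

For p = 0.25 the corrected distance is about 0.304, slightly above the raw fraction, reflecting multiple substitutions at the same site.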
To estimate the overall selection pressure in intergenic regions, we applied the method from  to calculate the fraction of positions under negative selection by comparing conservation statistics of multiple sequence alignments of orthologous upstream fragments from strains of two closely related species.
Detection and analysis of large insertions/deletions (indels)
In orthologous upstream fragments, we considered all indels of length at least six nucleotides that were observed in at least two strains and were not located at the alignment termini (to reduce the bias from misalignment of fragment termini and varying length of upstream regions).
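These three filters can be sketched on a toy alignment as follows. The sketch makes a simplifying assumption: an indel is represented by a gap run with identical coordinates in the rows of at least two strains, without polarising the event into an insertion or a deletion. All names and data are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def shared_internal_gaps(alignment: Dict[str, str],
                         min_len: int = 6,
                         min_strains: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) spans of long internal gap runs shared by several strains."""
    length = len(next(iter(alignment.values())))
    carriers = defaultdict(set)
    for strain, row in alignment.items():
        for match in re.finditer(r"-+", row):
            start, end = match.span()
            if start == 0 or end == length:
                continue                      # skip gaps touching the alignment termini
            if end - start >= min_len:
                carriers[(start, end)].add(strain)
    return sorted(
        span for span, strains in carriers.items()
        if len(strains) >= min_strains
    )


toy_alignment = {
    "s1": "ACGTACGTACGTACGTACGT",
    "s2": "ACGT------GTACGTACGT",   # internal 6-nt gap
    "s3": "ACGT------GTACGTACGT",   # same gap in a second strain
    "s4": "------GTACGTACGTACGT",   # terminal gap, ignored
    "s5": "ACGTACGTAC---TACGTAC",   # too short, ignored
}
indels = shared_internal_gaps(toy_alignment)
```

Only the six-nucleotide gap shared by s2 and s3 is retained.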
Identification of candidate transcription-factor binding sites
We scanned for candidate binding sites in upstream fragments with FIMO , using positional weight matrices downloaded from PRODORIC . Candidate binding sites were filtered using the FDR correction for multiple testing (q<0.05).
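As a reminder of what a q-value threshold means, the sketch below computes Benjamini-Hochberg adjusted p-values for a list of raw p-values. This is a generic illustration of FDR control; it is an assumption, not a statement from the study, that the q-values used for filtering were obtained in exactly this way.

```python
from typing import List


def bh_qvalues(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Hypothetical p-values for four candidate sites.
qvals = bh_qvalues([0.01, 0.04, 0.03, 0.5])
kept = [q < 0.05 for q in qvals]
```

In this toy example only the first site passes q<0.05, although three raw p-values are below 0.05.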
Gene composition of the leading and lagging strands
We identified origin (OriC) and terminus (Ter) of replication analyzing GC-skew plots. Based on the OriC and Ter locations, we determined the strands for genes from different fractions of the pan-genome. To test the statistical significance of differences between the pan-genome fractions, we performed a permutation test by shuffling genes between pan-genome fractions (retaining the fractions sizes) 250 times, thus obtaining the distribution of differences between the fractions under the random null model, and compared the observed differences with this distribution. Calculated differences with p-value satisfying the threshold with the Bonferroni correction for multiple testing were considered as statistically significant.
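The permutation test can be sketched for one pair of fractions as follows. Genes are encoded as 1 (leading strand) or 0 (lagging strand). The two-sided statistic and the add-one correction of the p-value are choices of this sketch, and all names are hypothetical.

```python
import random
from typing import List


def permutation_pvalue(strand_a: List[int], strand_b: List[int],
                       n_perm: int = 250, seed: int = 0) -> float:
    """Permutation p-value for the difference in leading-strand fraction."""
    rng = random.Random(seed)

    def leading_fraction(genes: List[int]) -> float:
        return sum(genes) / len(genes)

    observed = abs(leading_fraction(strand_a) - leading_fraction(strand_b))
    pooled = strand_a + strand_b
    size_a = len(strand_a)            # fraction sizes are retained
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(leading_fraction(pooled[:size_a]) - leading_fraction(pooled[size_a:]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


p_same = permutation_pvalue([1, 0] * 20, [1, 0] * 20)
p_different = permutation_pvalue([1] * 50, [0] * 50)
```

With 250 shuffles the smallest attainable p-value is 1/251, which limits how strict a Bonferroni-corrected threshold can be applied.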
Statistical significance of over-representation of inter-replichore inversions was calculated as the probability of a given number of inter-replichore inversions in the set of inversions with given lengths. The probability of occurrence of the origin or the terminator of replication within the inversion was calculated as the ratio of the inversion length to the replichore length.
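One way to turn this description into a calculation is sketched below. Each inversion is treated as an independent trial that spans the origin or terminus with probability equal to its length divided by the replichore length, and the probability of observing at least the given number of inter-replichore inversions is obtained by dynamic programming. Independence of inversions and the use of an "at least" tail are assumptions of this sketch.

```python
from typing import List


def inter_replichore_tail(lengths: List[float],
                          replichore_length: float,
                          observed: int) -> float:
    """Probability of at least `observed` inter-replichore inversions by chance."""
    probs = [min(1.0, length / replichore_length) for length in lengths]
    dist = [1.0]                      # dist[k] = P(exactly k events so far)
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)    # this inversion stays within a replichore
            nxt[k + 1] += mass * p        # this inversion spans the origin or terminus
        dist = nxt
    return sum(dist[observed:])


# Two hypothetical inversions, each half a replichore long.
p_both = inter_replichore_tail([500.0, 500.0], 1000.0, observed=2)
p_any = inter_replichore_tail([500.0, 500.0], 1000.0, observed=0)
```

For two inversions that each have a 0.5 chance of spanning the origin or terminus, the probability that both do so is 0.25.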
Construction of phylogenetic trees
For construction of phylogenetic trees we used concatenated aligned amino acid sequences of all core genes reverse translated to nucleotide alignment. Then maximum likelihood trees were constructed by RAxML  with default parameters.
Synteny blocks and rearrangements history
Synteny blocks were constructed using the Sibelia algorithm  with default parameters for whole-genome nucleotide alignments. Blocks observed in a genome more than once were filtered out. The history of inversions was reconstructed using the MGRA algorithm .
Detection of gene inflow
To detect genes horizontally transferred into species, we used the following model. If a gene with a mosaic phyletic pattern has been inherited vertically from the common ancestor and lost by several genomes, we expect to find it at the same syntenic region in the remaining strains. Genes not satisfying this condition are candidates for having been obtained horizontally. For this analysis, we excluded genes whose universal neighbours were affected by the reconstructed rearrangements, that is, genes located at or near boundaries of synteny blocks.
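The location test can be sketched by describing the position of a gene through its nearest universal (core) neighbours on both sides and checking whether this context is the same in all strains that carry the gene. The sketch ignores chromosome circularity and the exclusion of genes near synteny-block boundaries; gene orders and names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def core_context(order: List[str], gene: str, core: Set[str]) -> FrozenSet[str]:
    """Nearest core genes flanking `gene`, independent of orientation."""
    idx = order.index(gene)
    left = next((g for g in reversed(order[:idx]) if g in core), None)
    right = next((g for g in order[idx + 1:] if g in core), None)
    return frozenset(g for g in (left, right) if g is not None)


def has_conserved_location(gene: str,
                           genomes: Dict[str, List[str]],
                           core: Set[str]) -> bool:
    """True if the gene has the same core context in every strain carrying it."""
    contexts = {
        core_context(order, gene, core)
        for order in genomes.values() if gene in order
    }
    return len(contexts) <= 1


core = {"c1", "c2", "c3"}
genomes = {
    "s1": ["c1", "x", "c2", "y", "c3"],
    "s2": ["c1", "x", "c2", "c3"],
    "s3": ["c1", "c2", "x", "y", "c3"],
}
x_conserved = has_conserved_location("x", genomes, core)
y_conserved = has_conserved_location("y", genomes, core)
```

Gene y sits between c2 and c3 wherever it occurs, consistent with vertical inheritance and loss, whereas gene x occupies two different contexts and would be a candidate for horizontal acquisition.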
Pan-genome and its fractions
We constructed 5742 OGs comprising 192782 genes. The number of genes in a genome assigned to OGs was 1872±178 with the median 1857 (Additional files 1: Tables S1 and 3: Table S2); the number of singletons was 48±53, median=22.
Size of pan-genome fractions
As in , we split the pan-genome into percentile fractions by considering OGs present in at least a given fraction of strains. All such pan-genome fractions reach saturation after addition of the first few strains, an exception being the core genome, that continues shrinking, although at a decreasing rate, and the total pan-genome that grows, mostly due to strain-specific, unique genes. If unique genes are excluded, the total pan-genome becomes closed and converges to about 5750 genes (Fig. 2b and Table 1).
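The percentile fractions can be computed directly from the number of strains carrying each orthologous group, as in the short sketch below. The thresholds and counts are hypothetical.

```python
from typing import Dict, Iterable, List


def fraction_sizes(strains_per_group: List[int],
                   n_strains: int,
                   thresholds: Iterable[float]) -> Dict[float, int]:
    """Number of groups present in at least each given fraction of strains."""
    return {
        t: sum(1 for n in strains_per_group if n / n_strains >= t)
        for t in thresholds
    }


# Six hypothetical groups in a set of ten strains.
sizes = fraction_sizes([10, 10, 9, 5, 2, 1], n_strains=10,
                       thresholds=(0.5, 0.9, 1.0))
```

The threshold 1.0 corresponds to the core genome. Repeating the calculation while strains are added one at a time gives the saturation curves discussed here.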
Distribution of GO terms across pan-genome fractions
Interproscan  provided at least one GO term to 127672 genes. These assignments are largely consistent, as members of an orthologous group tend to be assigned the same GO term (Additional file 6: Figure S4). Requiring that at least 90% of proteins from an OG share the GO term, we assigned GO terms to 2969 orthologous groups.
Overrepresented functional categories in different fractions of the pan-genome with respect to the described cube representation are shown in Additional file 3: Table S2. The common core genome and weakly species-specific cores, that is, genes observed in all strains of one species and some strains of the remaining species, are enriched with GOs involved in information processing, such as translation, ribosome, gene expression, RNA, and all kinds of metabolic processes. The periphery is enriched in a small set of functions, including response to other organisms and pathogenesis (this fraction features the highest percent of predicted virulence-related genes, Additional file 9: Figure S7), in particular, sialidase activity (S. pyogenes, S. pneumoniae), DNA binding and some carbohydrate-related functions (S. pneumoniae, S. suis), as well as transcription factors (S. pyogenes). Strain-specific genes are mainly enriched in transposase activity, DNA recombination, and DNA integration, consistent with the origin of strain-specific genes from mobile elements; in addition, these categories are enriched in orthologous groups from the common periphery, that is, among genes present in a fraction of strains from all three species. Species-specific cores are enriched in vitamin biosynthesis (S. pneumoniae), transport, histidine and lactose metabolism, and response to oxidative stress (S. pyogenes), and iron transport, amino acid metabolic processes, and regulation of transcription (S. suis).
The distribution of KEGG KO categories across the pan-genome is shown in Fig. 4b. The fraction of orthologous groups assigned with a KO category decreases when moving from the core genome to the periphery and then to strain-specific genes. Most orthologous groups related to “Genetic Information Processing”, that can be considered as most essential groups, correspond to the common core, followed by the periphery and then strain-specific genes; no such orthologous groups were found among species-specific cores.
Hence, the functional distribution agrees with the pan-genome model in which the core is responsible for information and most metabolic processes, the periphery performs fine-tuning of bacteria to specific ecological niches, and strain-specific genome fraction is comprised mainly of mobile elements-related genes .
Strand preference of genes from pan-genome fractions
Pan-genome fraction (number of OGs)
Percent of genes on leading strand
All genes (5742 OGs + 4584 singletons)
Non strain-specific OGs (5742)
Common core (458)
S. pneumoniae specific core (114)
S. pyogenes specific core (87)
S. suis specific core (126)
S. pneumoniae periphery (1136)
S. pyogenes periphery (922)
S. suis periphery (891)
Common periphery (270)
Strain-specific OGs (4584)
Selection regime in the pan-genome fractions
In addition to protein-coding genes, purifying selection acts on regulatory elements in intergenic regions. In this and the next sections, we attempt to quantify this selection by determining the fraction of intergenic nucleotide positions evolving under negative selection and by comparing regions that are deleted in some strains with universal intergenic regions.
The median fraction of nucleotide substitutions (with the Jukes-Cantor correction) in intra-species alignments of orthologous upstream regions was 5.6%. The distribution of the number of nucleotide substitutions with the Jukes-Cantor correction, dD, is shown in Fig. 5b and Additional file 10: Figure S8b. The fraction of the pan-genome with the lowest number of substitutions is the core genome. Hence, not only the core genes, but their expression level and regulation are likely to be conserved.
In inter-species alignments, conserved columns may indicate functional conservation or simply insufficient time after speciation to accumulate mutations in all non-essential positions. To estimate the number of hidden non-conserved positions we used the method from . We have calculated that only 10-20% of positions in the upstream regions evolve under purifying selection (Additional file 11: Figure S9). However, this may be an underestimate due to the large distance between the analysed species and the low number of conserved positions.
Inserted and deleted fragments in intergenic regions are not neutral
Genomic rearrangements and antigenic variation of histidine triad protein PhtD
An important mode of genome evolution is rearrangement of chromosome fragments. In prokaryotes with single chromosomes the prevalent type of rearrangement is symmetrical inversion around the origin of replication [49–52]. While several inversions in some Streptococcus strains had been described, the increased phylogenetic coverage allowed us to map the events to the phylogenetic tree.
Synteny blocks were obtained using whole-genome alignments for each species. Only blocks present in all strains were used for the reconstruction of inversions. As a result, 13 inversions for S. pneumoniae, 21 inversions for S. suis, and 26 inversions for S. pyogenes were identified. Mapping these events to phylogenetic trees (Additional file 13: Figure S11) revealed cases of parallel inversions in all three species.
The observed parallel inversions could be explained by homologous recombination (horizontal transfer between strains) involving a segment containing the inverted fragments. If this were the case, sequence trees constructed using the genes from the inverted fragments would cluster together strains with the parallel inversions. However, such trees for all inversions are consistent (Additional file 14: Figure S12) with the benchmark tree constructed using the alignments of all core genes (Additional file 2: Figure S1), confirming the independent origin of these inversions.
Previously, inversions in Streptococcus spp. were explained by selection to rebalance the replichore architecture affected by insertion of prophages . To check this hypothesis, we compared lengths of prophage regions in strains that contained the same inversion, and vice versa the number of inversions in strains with the same rate of prophage insertions (Additional file 1: Table S1). No correlation between the rates of prophage insertions and inversions was observed.
Detection of gene inflow
Statistics of periphery genes
Overrepresented GO terms in genes with non-conserved location, compared with all non-core genes
Genes with non-conserved location
Total number of genes with this GO term
ATP hydrolysis coupled proton transport
Energy coupled proton transport, against electrochemical gradient
Nucleic acid binding
Proton-transporting two-sector ATPase complex
Sequence-specific DNA binding
Proton-transporting V-type ATPase complex
Proton-transporting V-type ATPase, V0 domain
The pan-genome of many bacterial species including Streptococcus was shown to be open [56, 57]. In agreement with previous observations, the pan-genome of studied Streptococcus species is also open but it is mainly due to strain-specific genes. The pan-genome size exceeds 10300 genes, but if unique genes are excluded, the total pan-genome becomes closed and converges to about 5750 genes. Splitting the pan-genome into percentile fractions by considering OGs present in at least a given fraction of strains revealed the saturation of all such fractions after addition of the first few strains except the core genome and unique genes.
In a typical genome of the studied Streptococcus species, one quarter of genes belong to the genus core genome; one quarter, to the species-specific core; most other genes are periphery ones, and a minority are strain-specific. The core genome of the studied Streptococcus species is enriched with information-processing and main metabolic functions and depleted of mobile elements and phage-related genes; the periphery fraction is enriched with niche-specific metabolic functions, including pathogenesis-related ones; and strain-specific genes are enriched with hypothetical genes and mobile elements, but also contain many virulence-related genes. Moreover, Streptococcus has a broad periphery and a huge repertoire of strain-specific genes. A large periphery fraction of the pan-genome is thought to be characteristic of organisms with large long-term effective population sizes and an ability to fill a variety of new niches.
Variation of selection regimes for genes and their upstream regions is consistent with the suggested evolutionary role of pan-genome fractions [12, 59]. Specifically, the core genes demonstrate a lower level of substitutions than periphery and unique genes and this tendency holds both for protein-coding sequences and for upstream regions [60–62]. More generally, while it is known that intergenic regions in bacteria experience purifying selection [63, 64], its strength appears to be different between pan-genome fractions. The fact that upstream regions of core genes have fewer substitutions might reflect stronger conservation of their regulation or more complex regulation, yielding a larger density of transcription-factor binding sites and other regulatory structures. On the other hand, fragments of intergenic regions that are deleted (or inserted) in some strains, are not less conserved than the surrounding regions, which might be a sign of newly evolving regulatory interactions or of ‘horizontal regulatory transfer’ [46, 65]. Evolution of intergenic regions in prokaryotes is a sparsely studied area, and new tools such as PIGGY  should accelerate the progress in this direction, specifically, by allowing for rapid analysis of additional, diverse species and genera.
In many bacteria, including Streptococcus, within-replichore inversions, that is, inversions with endpoints in the same replichore, have been shown to be relatively rare and significantly shorter than inter-replichore inversions [67–71]. Both non-random mutational processes and selection have been suggested as potential drivers of biased inversion landscapes [67, 69, 72, 73]. In more recent papers it was shown that symmetric inversion bias is not a universal feature of prokaryotic genome evolution but varies considerably across clades and the magnitude of the symmetric inversion bias is associated with various features of adaptive genome architecture, including the distance of essential genes to the origin of replication and the preferential localization of genes on the leading strand .
The pattern of inversions reconstructed in the studied Streptococcus species revealed strong selection against intra-replichore inversions that, in agreement with previous observations, might be caused by strong preferential localization of genes on the leading strand (more than 80% of core genes). Despite the low frequency of inversions, parallel inversions were observed in all three studied species. Most inversions were bounded by mobile elements or clusters of rRNA, so most parallel events were likely caused by intragenome recombination linked to a limited number of repeated elements. The exception was the inversion in the S. pneumoniae subtree with breakpoints formed by genes encoding surface antigen proteins PhtB and PhtD. As the inversion was shown to exchange gene fragments, it is likely to indicate the action of antigen variation.
Phase variation is known to be an important mechanism that leads to phenotype diversification via intra-genomic recombination. Antigenic variation via inversions of short genomic fragments was shown to play a significant role in S. pneumoniae infection, influencing its pathogenicity. While this paper was under review, antigen variation by the observed large parallel inversion between the phtD and phtB genes in S. pneumoniae was confirmed. The practical relevance of this observation comes from the fact that this protein is a candidate for a next-generation pneumococcal vaccine. This shows that evolutionary and functional analysis of predicted parallel rearrangements with direct confirmation of this mechanism may identify possible cases of phase variation by inversions in human pathogens.
In the studied Streptococcus species, about 7% of single-copy periphery genes occur in multiple syntenic regions. The genes with inconsistent trees and non-conserved genome position are rare in at least one species and have likely experienced horizontal transfer between species. Hence, a large periphery in the Streptococcus pan-genome is likely to be explained by horizontal gene transfer, which is known to be one of the major drivers of genome evolution [78, 79]. Horizontal gene transfer in Streptococcus is facilitated by the competence system and is associated with the immune system. Moreover, the early proof that DNA carries genetic information was provided by experiments with pneumococcus [81, 82]. This emphasizes the importance of pan-genome studies of medically relevant bacteria, as their pathogenicity may be affected by rare periphery or even strain-specific genes.
Members of the genus Streptococcus have a highly dynamic, open pan-genome that potentially confers on them the ability to adapt to changing environmental conditions, e.g. through antibiotic resistance or transmission between different hosts. Streptococcus genome plasticity is shaped by a dynamic interaction of major evolutionary forces such as horizontal gene transfer, genome rearrangements, and propagation of mobile elements, reflecting the ecological niche and the lifestyle. Hence, integrated analysis of all aspects of genome evolution is important for the identification of potential pathogens and design of drugs and vaccines.
We are grateful to Marat Kazanov for sharing preliminary data.
The study was supported by the Russian Foundation of Basic Research under grant 16-54-21004 and program “Molecular and Cellular Biology” of the Russian Academy of Sciences.
Availability of data and materials
All sequences analyzed in this study were taken from GenBank. Accession numbers and details are available in Additional file 1: Table S1. The composition of orthologous groups is described in Additional file 15: Table S3, and GO term annotations are available in Additional file 3: Table S2.
PVS, OOB and MSG conceived and designed the study; PVS, OOB and AAK analyzed the data; PVS, OOB and MSG wrote the paper. All authors read and approved the final version of the manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Krzyściak W, Pluskwa K, Jurczak A, Kościelniak D. The pathogenicity of the Streptococcus genus. Eur J Clin Microbiol Infect Dis. 2013; 32(11):1361–76.
- Brown JS, Gilliland SM, Holden DW. A Streptococcus pneumoniae pathogenicity island encoding an ABC transporter involved in iron uptake and virulence. Mol Microbiol. 2001; 40(3):572–85.
- Richards VP, Palmer SR, Pavinski Bitar PD, Qin X, Weinstock GM, Highlander SK, Town CD, Burne RA, Stanhope MJ. Phylogenomics and the dynamic genome evolution of the genus Streptococcus. Genome Biol Evol. 2014; 6(4):741–53.
- Cunningham MW. Post-Streptococcal Autoimmune Sequelae: Rheumatic Fever and Beyond. In: Ferretti JJ, Stevens DL, Fischetti VA, editors. Streptococcus pyogenes: Basic Biology to Clinical Manifestations. Oklahoma: University of Oklahoma Health Sciences Center: 2016.
- Mullen S. Review of pediatric autoimmune neuropsychiatric disorder associated with streptococcal infections. Ment Health Clin. 2015; 5(4):184–8.
- Gottschalk M, Segura M. The pathogenesis of the meningitis caused by Streptococcus suis: the unresolved questions. Vet Microbiol. 2000; 76(3):259–72.
- Medini D, Donati C, Tettelin H, Masignani V, Rappuoli R. The microbial pan-genome. Curr Opin Genet Dev. 2005; 15(6):589–94.
- Tettelin H, Masignani V, Cieslewicz MJ, Donati C, Medini D, Ward NL, Angiuoli SV, Crabtree J, Jones AL, Durkin AS, et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial “pan-genome”. Proc Natl Acad Sci U S A. 2005; 102(39):13950–5.
- Hogg JS, Hu FZ, Janto B, Boissy R, Hayes J, Keefe R, Post JC, Ehrlich GD. Characterization and modeling of the Haemophilus influenzae core and supragenomes based on the complete genomic sequences of rd and 12 clinical nontypeable strains. Genome Biol. 2007; 8(6):103.
- Gordienko EN, Kazanov MD, Gelfand MS. Evolution of pan-genomes of Escherichia coli, Shigella spp., and Salmonella enterica. J Bacteriol. 2013; 195(12):2786–92.
- Muzzi A, Masignani V, Rappuoli R. The pan-genome: towards a knowledge-based discovery of novel targets for vaccines and antibacterials. Drug Discov Today. 2007; 12(11):429–39.
- Sarkar SF, Guttman DS. Evolution of the core genome of Pseudomonas syringae, a highly clonal, endemic plant pathogen. Appl Environ Microbiol. 2004; 70(4):1999–2012.
- Wolf YI, Makarova KS, Lobkovsky AE, Koonin EV. Two fundamentally different classes of microbial genes. Nat Microbiol. 2016; 2:16208.
- Camilli R, Bonnal R, Del Grosso M, Iacono M, Corti G, Rizzi E, Marchetti M, Mulas L, Iannelli F, Superti F, Oggioni M, De Bellis G, Pantosti A. Complete genome sequence of a serotype 11A, ST62 Streptococcus pneumoniae invasive isolate. BMC Microbiol. 2011; 11(25).
- Williams T, Loman N, Ebruke C, Musher D, Adegbola R, Pallen M, Weinstock G, Antonio M. Genome analysis of a highly virulent serotype 1 strain of Streptococcus pneumoniae from West Africa. PLoS ONE. 2012; 7(10):26742.
- Yao X, Li M, Wang J, Wang C, Hu D, Zheng F, Pan X, Tan Y, Zhao Y, Hu L, Tang J, Hu F. Isolation and characterization of a native avirulent strain of Streptococcus suis serotype 2: a perspective for vaccine development. Sci Rep. 2015; 5:9835.
- Athey T, Auger J, Teatero S, Dumesnil A, Takamatsu D, Wasserscheid J, Dewar K, Gottschalk M, Fittipaldi N. Complex population structure and virulence differences among serotype 2 Streptococcus suis strains belonging to sequence type 28. PLoS ONE. 2015; 10(9):0137760.View ArticleGoogle Scholar
- Hamada S, Kawabata S, Nakagawa I. Molecular and genomic characterization of pathogenic traits of group a Streptococcus pyogenes. Proc Jpn Acad Ser B Phys Biol Sci. 2015; 91(10):539–59.PubMedPubMed CentralView ArticleGoogle Scholar
- Athey T, Teatero S, Takamatsu D, Wasserscheid J, Dewar K, Gottschalk M, Fittipaldi N. Population structure and antimicrobial resistance profiles of Streptococcus suis serotype 2 sequence type 25 strains. PLoS ONE. 2016; 11(3):0150908.View ArticleGoogle Scholar
- Bao Y, Liang Z, Mayfield J, McShan W, Lee S, Ploplis V, Castellino F. Novel genomic rearrangements mediated by multiple genetic elements in Streptococcus pyogenes M23ND confer potential for evolutionary persistence. Microbiology. 2016; 162(8):1346–59.PubMedPubMed CentralView ArticleGoogle Scholar
- NCBI RC. Database resources of the national center for biotechnology information. Nucleic Acids Res. 2017; 45(D1):12.View ArticleGoogle Scholar
- Lechner M, Findeiß S, Steiner L, Marz M, Stadler PF, Prohaska SJ. Proteinortho: detection of (co-) orthologs in large-scale analysis. BMC Bioinformatics. 2011; 12(1):124.PubMedPubMed CentralView ArticleGoogle Scholar
- Snipen L, Liland KH. Micropan: An R-package for microbial pan-genomics. BMC Bioinformatics. 2015; 16(1):79.PubMedPubMed CentralView ArticleGoogle Scholar
- Jones P, Binns D, Chang H-Y, Fraser M, Li W, McAnulla C, McWilliam H, Maslen J, Mitchell A, Nuka G, et al. InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014; 30(9):1236–40.PubMedPubMed CentralView ArticleGoogle Scholar
- Beißbarth T, Speed TP. GOstat: find statistically overrepresented gene ontologies within a group of genes. Bioinformatics. 2004; 20(9):1464–5.PubMedView ArticleGoogle Scholar
- Hurvich CM, Tsai C-L. Regression and time series model selection in small samples. Biometrika. 1989; 76:297–307.View ArticleGoogle Scholar
- Kanehisa M, Sato Y, Morishima K. BlastKOALA and GhostKOALA: KEGG tools for functional characterization of genome and metagenome sequences. J Mol Biol. 2016; 428(4):726–31.PubMedView ArticleGoogle Scholar
- Gupta A, Kapil R, Dhakan DB, Sharma VK. MP3: a software tool for the prediction of pathogenic proteins in genomic and metagenomic data. PloS ONE. 2014; 9(4):93907.View ArticleGoogle Scholar
- Zhou C, Smith J, Lam M, Zemla A, Dyer MD, Slezak T. MvirDB—a microbial database of protein toxins, virulence factors and antibiotic resistance genes for bio-defence applications. Nucleic Acids Res. 2006; 35(suppl 1):391–4.Google Scholar
- Zhou Y, Liang Y, Lynch KH, Dennis JJ, Wishart DS. PHAST: a fast phage search tool. Nucleic Acids Res. 2011; 39(suppl 2):347–52.View ArticleGoogle Scholar
- Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004; 32(5):1792–7.PubMedPubMed CentralView ArticleGoogle Scholar
- Zhang Z, Li J, Zhao X-Q, Wang J, Wong GK-S, Yu J. KaKs_Calculator: calculating Ka and Ks through model selection and model averaging. Genom Proteomics Bioinforma. 2006; 4(4):259–63.View ArticleGoogle Scholar
- Jukes TH, Cantor CR. Evolution of protein molecules. Mammal Protein Metab. 1969; 3(21):132.Google Scholar
- Gordon JJ, Towsey MW, Hogan JM, Mathews SA, Timms P. Improved prediction of bacterial transcription start sites. Bioinformatics. 2005; 22(2):142–8.PubMedView ArticleGoogle Scholar
- Burden S, Lin Y-X, Zhang R. Improving promoter prediction improving promoter prediction for the nnpp2. 2 algorithm: a case study using Escherichia coli DNA sequences. Bioinformatics. 2004; 21(5):601–7.PubMedView ArticleGoogle Scholar
- Tsoy OV, Pyatnitskiy MA, Kazanov MD, Gelfand MS. Evolution of transcriptional regulation in closely related bacteria. BMC Evol Biol. 2012; 12(1):200.PubMedPubMed CentralView ArticleGoogle Scholar
- Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011; 27(7):1017–8.PubMedPubMed CentralView ArticleGoogle Scholar
- Münch R, Hiller K, Barg H, Heldt D, Linz S, Wingender E, Jahn D. PRODORIC: prokaryotic database of gene regulation. Nucleic Acids Res. 2003; 31(1):266–9.PubMedPubMed CentralView ArticleGoogle Scholar
- Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006; 22(21):2688–90.PubMedPubMed CentralView ArticleGoogle Scholar
- Minkin I, Patel A, Kolmogorov M, Vyahhi N, Pham S. Sibelia: A Scalable and Comprehensive Synteny Block Generation Tool for Closely Related Microbial Genomes In: Darling A, Stoye J, editors. Algorithms in Bioinformatics. WABI 2013. Lecture Notes in Computer Science, vol 8126. Berlin: Springer: 2013.Google Scholar
- Avdeyev P, Jiang S, Aganezov S, Hu F, Alekseyev MA. Reconstruction of ancestral genomes in presence of gene gain and loss. J Comput Biol. 2016; 23(3):150–64.PubMedView ArticleGoogle Scholar
- Koonin EV, Wolf YI. Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world. Nucleic Acids Res. 2008; 36(21):6688–719.PubMedPubMed CentralView ArticleGoogle Scholar
- Carlos Guimaraes L, Benevides de Jesus L, Vinicius Canario Viana M, Silva A, Thiago Juca Ramos R, de Castro Soares S, Azevedo V. Inside the pan-genome-methods and software overview. Curr Genom. 2015; 16(4):245–52.View ArticleGoogle Scholar
- Zheng W-X, Luo C-S, Deng Y-Y, Guo F-B. Essentiality drives the orientation bias of bacterial genes in a continuous manner. Sci Rep. 2015; 5:16431.PubMedPubMed CentralView ArticleGoogle Scholar
- Schloissnig S, Arumugam M, Sunagawa S, Mitreva M, Tap J, Zhu A, Waller A, Mende DR, Kultima JR, Martin J, Kota K, Sunyaev S, Weinstock G, Bork P. Genomic variation landscape of the human gut microbiome. Nature. 2013; 493(7430):45.PubMedView ArticleGoogle Scholar
- Oren Y, Smith MB, Johns NI, Zeevi MK, Biran D, Ron EZ, Corander J, Wang HH, Alm EJ, Pupko T. Transfer of noncoding DNA drives regulatory rewiring in bacteria. Proc Natl Acad Sci. 2014; 111(45):16112–7.PubMedView ArticleGoogle Scholar
- Čuklina J, Hahn J, Imakaev M, Omasits U, Förstner KU, Ljubimov N, Goebel M, Pessi G, Fischer H-M, Ahrens CH, Gelfand M, E E-H. Genome-wide transcription start site mapping of Bradyrhizobium japonicum grown free-living or in symbiosis – a rich resource to identify new transcripts, proteins and to study gene regulation. BMC Genomics. 2016; 17(1):302.PubMedPubMed CentralView ArticleGoogle Scholar
- Smirnov A, Schneider C, Hör J, Vogel J. Discovery of new RNA classes and global RNA-binding proteins. Curr Opin Microbiol. 2017; 39:152–60.PubMedView ArticleGoogle Scholar
- Bochkareva OO, Dranenko NO, Ocheredko ES, Kanevsky GM, Lozinsky YN, Khalaycheva VA, Artamonova II, Gelfand MS. Genome rearrangements and phylogeny reconstruction in Yersinia pestis. PeerJ. 2018; 6:4545.View ArticleGoogle Scholar
- Cossu M, Badel C, Catchpole R, Gadelle D, Marguet E, Barbe V, Forterre P, Oberto J. Flipping chromosomes in deep-sea archaea. PLoS Genet. 2017; 13(6):1006847.View ArticleGoogle Scholar
- Repar J, Supek F, Klanjscek T, Warnecke T, Zahradka K, Zahradka D. Elevated rate of genome rearrangements in radiation-resistant bacteria. Genetics. 2017; 205(4):1677–89.PubMedPubMed CentralView ArticleGoogle Scholar
- Wang D, Li S, Guo F, Ning K, Wang L.Core-genome scaffold comparison reveals the prevalence that inversion events are associated with pairs of inverted repeats. BMC Genomics. 2017; 18(1):268.PubMedPubMed CentralView ArticleGoogle Scholar
- Zhang A, Yang M, Hu P, Wu J, Chen B, Hua Y, Yu J, Chen H, Xiao J, Jin M. Comparative genomic analysis of Streptococcus suis reveals significant genomic diversity among different serotypes. BMC Genomics. 2011; 12:523.PubMedPubMed CentralView ArticleGoogle Scholar
- Plumptre C, Ogunniyi A, Paton J. Polyhistidine triad proteins of pathogenic streptococci. Trends Microbiol. 2012; 20(10):485–93.PubMedView ArticleGoogle Scholar
- Mulkidjanian AY, Makarova KS, Galperin MY, Koonin EV. Inventing the dynamo machine: the evolution of the F-type and V-type ATPases. Nat Rev Microbiol. 2007; 5(11):892.PubMedView ArticleGoogle Scholar
- Donati C, Hiller NL, Tettelin H, Muzzi A, Croucher NJ, Angiuoli SV, Oggioni M, Hotopp JCD, Hu FZ, Riley DR, et al. Structure and dynamics of the pan-genome of streptococcus pneumoniae and closely related species. Genome Biol. 2010; 11(10):107.View ArticleGoogle Scholar
- Vernikos G, Medini D, Riley DR, Tettelin H.Ten years of pan-genome analyses. Curr Opin Microbiol. 2015; 23:148–54.PubMedView ArticleGoogle Scholar
- McInerney JO, McNally A, O’Connell MJ. Why prokaryotes have pangenomes. Nat Microbiol. 2017; 2:17404.Google Scholar
- Losada L, Ronning CM, DeShazer D, Woods D, Fedorova N, Stanley Kim H, Shabalina SA, Pearson TR, Brinkac L, Tan P, et al. Continuing evolution of Burkholderia mallei through genome reduction and large-scale rearrangements. Genome Biol Evol. 2010; 2:102–16.PubMedPubMed CentralView ArticleGoogle Scholar
- Zhang J, Yang J-R. Determinants of the rate of protein sequence evolution. Nat Rev Genet. 2015; 16(7):409.PubMedPubMed CentralView ArticleGoogle Scholar
- Marek A., Tomala K.The contribution of purifying selection, linkage, and mutation bias to the negative correlation between gene expression and polymorphism density in yeast populations. Genome Biol Evol. 2018; 10(11):2986–96.PubMedPubMed CentralGoogle Scholar
- Koonin EV. Are there laws of genome evolution?. PLoS Comput Biol. 2011; 7(8):1002173.View ArticleGoogle Scholar
- Thorpe HA, Bayliss SC, Hurst LD, Feil EJ. Comparative analyses of selection operating on non-translated intergenic regions of diverse bacterial species. Genetics. 2017; 206(1):363–376.PubMedPubMed CentralView ArticleGoogle Scholar
- Molina N, Van Nimwegen E. Universal patterns of purifying selection at noncoding positions in bacteria. Genome Res. 2008; 18(1):148–60.PubMedPubMed CentralView ArticleGoogle Scholar
- Koonin EV. Horizontal transfer beyond genes. Proc Natl Acad Sci. 2014; 111(45):15865–6.PubMedView ArticleGoogle Scholar
- Thorpe HA, Bayliss SC, Sheppard SK, Feil EJ. Piggy: a rapid, large-scale pan-genome analysis tool for intergenic regions in bacteria. GigaScience. 2018; 7(4):015.View ArticleGoogle Scholar
- Eisen JA, Heidelberg J, White O, Salzberg S. Evidence for symmetric chromosomal inversions around the replication origin in bacteria. Genome Biol. 2000; 1:0011.View ArticleGoogle Scholar
- Suyama M, Bork P.Evolution of prokaryotic gene order: genome rearrangements in closely related species. Trend Genet. 2001; 17:10–3.View ArticleGoogle Scholar
- Tillier E, Collins R. Genome rearrangement by replication-directed translocation. Nat Genet. 2000; 26:195–7.PubMedView ArticleGoogle Scholar
- Darling AE, Miklós I, Ragan MA. Dynamics of genome rearrangement in bacterial populations. PLoS Genet. 2008; 4(7):1000128.View ArticleGoogle Scholar
- Nakagawa I, Kurokawa K, Yamashita A, Nakata M, Tomiyasu Y, Okahashi N, Kawabata S, Yamazaki K, Shiba T, Yasunaga T, Hayashi H, Hattori M, Hamada S. Genome sequence of an m3 strain of Streptococcus pyogenes reveals a large-scale genomic rearrangement in invasive strains and new insights into phage evolution. Genome Res. 2003; 13:1042–55.PubMedPubMed CentralView ArticleGoogle Scholar
- Makino S, Suzuki M. Bacterial genomic reorganization upon dna replication. Science. 2001; 292(5518):803.PubMedView ArticleGoogle Scholar
- Mackiewicz P, Mackiewicz D, Kowalczuk M, Cebrat S. Flip-flop around the origin and terminus of replication in prokaryotic genomes. Genome Biol. 2001; 2(12):1004.View ArticleGoogle Scholar
- Repar J, Warnecke T.Non-random inversion landscapes in prokaryotic genomes are shaped by heterogeneous selection pressures. Mol Biol Evol. 2017; 34(8):1902–11.PubMedPubMed CentralView ArticleGoogle Scholar
- Li J, Li J, Feng Z, Wang J, An H, Liu Y, Wang Y, Wang K, Zhang X, Miao Z, Liang W, Sebra R, Wang G, Wang W, Zhang J. Epigenetic switch driven by DNA inversions dictates phase variation in Streptococcus pneumoniae,. PLoS Pathog. 2016; 12(7):1005762.View ArticleGoogle Scholar
- Slager J, Aprianto R, Veening J-W.Deep genome annotation of the opportunistic human pathogen Streptococcus pneumoniae d39. Nucleic Acids Res. 2018; 46(19):9971–89.PubMedPubMed CentralGoogle Scholar
- Yun K, Lee H, Choi E, Lee H.Diversity of pneumolysin and pneumococcal histidine triad protein d of Streptococcus pneumoniae isolated from invasive diseases in korean children. PLoS ONE. 2015; 10(8):0134055.View ArticleGoogle Scholar
- Kunin V, Ouzounis C. The balance of driving forces during genome evolution in prokaryotes. Genome Res. 2003; 13(7):1589–94.PubMedPubMed CentralView ArticleGoogle Scholar
- Ochman H. Distinguishing the ORFs from the ELFs: short bacterial genes and the annotation of genomes. Trends Genet. 2002; 18(7):335–7.PubMedView ArticleGoogle Scholar
- Andam CP, Hanage WP. Mechanisms of genome evolution of Streptococcus. Infect Genet Evol. 2015; 33:334–42.PubMedView ArticleGoogle Scholar
- Griffith F.The significance of Pneumococcal types. J Hyg. 1928; 27(2):113–59.PubMedView ArticleGoogle Scholar
- Avery OT, MacLeod CM, McCarty M. Studies on the chemical nature of the substance inducing transformation of Pneumococcal types: Induction of transformation by a desoxyribonucleic acid fraction isolated from Pneumococcus type III. J Exp Med. 1944; 79(2):137–58.PubMedPubMed CentralView ArticleGoogle Scholar