Genomic taxonomy of vibrios
© Thompson et al; licensee BioMed Central Ltd. 2009
Received: 26 January 2009
Accepted: 27 October 2009
Published: 27 October 2009
Vibrio taxonomy has been based on a polyphasic approach. In this study, we retrieve useful taxonomic information (i.e. data that can be used to distinguish different taxonomic levels, such as species and genera) from 32 genome sequences of different vibrio species. We use a variety of tools to explore the taxonomic relationship between the sequenced genomes, including Multilocus Sequence Analysis (MLSA), supertrees, Average Amino Acid Identity (AAI), genomic signatures, and Genome BLAST atlases. Our aim is to analyse the usefulness of these tools for species identification in vibrios.
We have generated four new genome sequences of three Vibrio species, i.e., V. alginolyticus 40B, V. harveyi-like 1DA3, and V. mimicus strains VM573 and VM603, and present a broad analysis of these genomes along with other sequenced Vibrio species. The genome atlas and pangenome plots provide a tantalizing image of the genomic differences that occur between closely related sister species, e.g. V. cholerae and V. mimicus. The vibrio pangenome contains around 26504 genes. The V. cholerae core genome and pangenome consist of 1520 and 6923 genes, respectively. Pangenomes might allow different strains of V. cholerae to occupy different niches. MLSA and supertree analyses resulted in a similar phylogenetic picture, with a clear distinction of four groups (Vibrio core group, V. cholerae-V. mimicus, Aliivibrio spp., and Photobacterium spp.). A Vibrio species is defined as a group of strains that share > 95% DNA identity in MLSA and supertree analysis, > 96% AAI, ≤ 10 genome signature dissimilarity, and > 61% proteome identity. Strains of the same species and species of the same genus will form monophyletic groups on the basis of MLSA and supertree.
The combination of different analytical and bioinformatics tools will enable the most accurate species identification through genomic computational analysis. This endeavour will culminate in the birth of the online genomic taxonomy whereby researchers and end-users of taxonomy will be able to identify their isolates through a web-based server. This novel approach to microbial systematics will result in a tremendous advance concerning biodiversity discovery, description, and understanding.
Taxonomy is at the basis of the biological sciences, being one of its oldest branches. It deals with identification, classification (i.e. creation of new taxa) and nomenclature. In the early 1970s, seminal work by Colwell put forward the concept of polyphasic taxonomy, which is still in use. According to this concept, the most natural classification system is achieved by integrating information from the molecular to the ecological levels. DNA-DNA hybridization (DDH) data had a pivotal role in inferring species boundaries in polyphasic taxonomy. Indeed, DDH proved to be the most powerful tool to unambiguously identify prokaryotes, providing, for the first time, a reliable means of categorizing microbes. However, this technique suffers from various limitations, including the need to include reference strains in each new experiment. In addition, it is not portable and requires special facilities available in a limited number of international laboratories. The introduction of 16S rRNA analysis as an alternative method for use in taxonomy allowed the development of an evolutionary framework [2, 3]. Boundaries for species identification and a species definition were set on the basis of extensive empirical data. A bacterial species is defined as a group of strains (including the type strain) having > 70% DDH similarity, < 5°C ΔTm, < 5% mol G+C difference of total genomic DNA, and > 97% 16S rRNA identity [4, 5].
The bacterial species definition is pragmatic and operational, aiming at the establishment of a rapid, reliable, reproducible, and useful taxonomic framework, based on microbial evolution, for a variety of applications (e.g. medicine and agriculture). This polyphasic definition is a consensus in microbiology, although it is not based on a concept (i.e. the biological processes behind speciation and species). It is crucial to highlight that the current polyphasic framework does not question whether this definition corresponds to a biological reality. Within the framework of polyphasic taxonomy, strains of the same species have similar phenotypes (e.g. expression of different types of enzymes, ability to use different types of compounds as energy sources, and growth at different temperatures and concentrations of acid and salt), genotypes (e.g. rep-PCR and AFLP), and chemotaxonomic features (e.g. FAME and polyamines), forming distinguishable tight groups. Ideally, these groups should be readily identifiable and differentiated from closely related species. However, there is currently no consensus on the definition of a bacterial genus.
The most recent ideas on species concepts corroborate polyphasic taxonomy. According to these authors, a bacterial species concept is a framework that explains how bacterial strains share common features and how they maintain genomic cohesion. The cohesion is observed as recurrent patterns or groups of strains that are recognized in nature and are termed species for practical reasons. However, the biological process giving rise to and maintaining the cohesion of these groups is what matters in a species concept. Homologous recombination appears to be a major force leading to genomic cohesion of strains of the same species. It is more frequent between strains of the same species than between strains of different species simply because this genetic process depends on sequence similarity. Horizontal gene transfer (HGT) between unrelated strains would cause an increase in phenotypic variation, but would not be frequent enough to hamper the formation and recognition of species. An alternative species concept is the stable ecotype, in which cohesion of bacterial strains of a given species is maintained by accumulation of advantageous mutations and periodic purging of allelic variability. In contrast with the polyphasic species definition, which is widely accepted, there is no consensus on a bacterial species concept. It is of course possible that one single concept is not sufficient to explain the complexity of bacterial diversity. In addition, mutation, homologous recombination and HGT may be detected at varying levels in a single strain, making the scenario even more complex.
Whole microbial genome sequencing studies launched microbial taxonomy into a new era, with the possibility of establishing systematics on the basis of complete genomes. How does one go about using whole genome sequences (WGS) for establishing a genomic taxonomy? And more specifically, how can one taxonomically define and identify species by means of WGS? WGS may contain taxonomic information in the form of gene content, genome wide signatures, phylogenetic markers, amino acid identity and overall genetic composition that might be useful for building novel taxonomic schemes. Pioneer computational and mathematical studies performed in the 1990s suggested that genomes contain species-specific signatures. Genome signature is a compositional parameter reflecting the dinucleotide relative abundance, which is similar between closely related species, and dissimilar between non-related species. Genome signatures appear to allow the assignment of isolates and metagenomes to known species [13, 14]. Whole genome sequences also permit the reconstruction of more robust taxonomic trees (i.e. supertrees) based on all genes of the core genome [15–17]. A good congruence was obtained between the traditional 16S rRNA based trees and the novel supertree methods. It was also proposed that the average amino acid identity (AAI) could be used to distinguish closely related sister species. Subsequently, a close relationship between DDH and AAI was shown. Some studies have suggested that the effective number of codons (Nc) could also be a species-specific marker.
In vibrios, the birth of genomic taxonomy occurred with a series of papers that attempted to use multilocus sequence analysis (MLSA) [21–26]. These studies allowed the establishment of rapid and powerful identification systems through the internet. Currently there are MLSA schemes for most of the human pathogens available for free access on the internet. Establishing a universal MLSA will not be possible, though. Studies accomplished so far have shown that the resolution of different markers varies according to the taxonomic group. For instance, the recA gene is very useful to differentiate closely related species of Burkholderia, but it is not appropriate for vibrios. Clearly, genes have different molecular clocks in different microbes, indicating the need for a multigene approach. With the advent of ultra-rapid genome sequencing, it is now possible to sequence one almost complete microbial genome in less than a day [28, 29]. The new generation of DNA sequencers will enable sequencing of more than a dozen prokaryotic genomes in less than an hour, possibly making it cheaper and faster to sequence a whole genome than several genes for MLSA. In future, MLSA might be used simply as a rapid screening methodology.
Vibrios are an excellent test model for genomic taxonomy because they are ubiquitous in the marine environment, are associated with a wide range of marine life (some species such as V. cholerae, V. parahaemolyticus and V. vulnificus cause serious disease in man) and experience a variety of environmental conditions and selection forces, leading to high genomic plasticity [31, 32]. Consequently, differentiation of sister species becomes very difficult. For instance, V. cholerae and V. mimicus have nearly indistinguishable phenotypes. Among the phenotypic tests used in the Bergey's manual, only sucrose fermentation and lipase activity may discriminate the two species. According to the most recent version of the Bergey's manual, a Vibrio species is defined as a group of strains forming small (0.5-0.8 × 1.4-2.6 μm) comma-shaped rods with polar flagella enclosed in a sheath, with facultative anaerobic metabolism, capable of fermenting D-glucose and growing at 20°C. Primarily aquatic, most species are oxidase positive, reduce nitrate to nitrite, require Na+ for growth, and ferment D-fructose, maltose, and glycerol. Each vibrio species is further identified by an array of over 100 phenotypic tests. There is no operational definition for genera within the vibrios. In our hands, vibrio species may be better defined on the basis of amplified fragment length polymorphism (AFLP) and MLSA [21, 22, 25, 34]. Strains of the same species (including the type strain) share more than 60% mutual AFLP band pattern similarity and more than 95% similarity in MLSA (using the loci rpoA, recA, pyrH, ftsZ, topA, mreB, gyrB and gapA). More importantly, strains of the same species and species of the same genus will form monophyletic groups on the basis of MLSA. This was the main argument used to propose the newly described genus Aliivibrio.
In order to test the feasibility of genomic taxonomy in vibrios, several markers were analysed in a collection of 32 genomes, including four newly pyrosequenced genomes. The complete genome sequences of several Vibrio strains are available on the web. Eleven V. cholerae and two V. mimicus genomes formed an ideal test case for taxonomy because of their close relatedness as sister species. These sister species have nearly identical 16S rRNA sequences and around 70% DDH. Disclosing species-specific patterns for the different genome-wide markers would reinforce their usefulness in prokaryotic taxonomy. The aim of this study was to extract taxonomic information from vibrio genome sequences by means of a detailed analysis of MLSA, supertree, Nc, AAI, genomic signatures, Genome BLAST atlas and pangenome plot that would allow species identification.
Genome sequence data
Table 1. Genomic features of the vibrio genomes.
Strain | Genome size (nt) | No. of CDS
Aliivibrio salmonicida FLI1238
Photobacterium profundum SS9
Vibrio alginolyticus 40B
Vibrio alginolyticus 12G01
Vibrio angustum S14+
Vibrio campbellii AND4
Vibrio cholerae N16961
Vibrio cholerae 0395
Vibrio cholerae 1587
Vibrio cholerae 2740-80
Vibrio cholerae 623-39
Vibrio cholerae B33
Vibrio cholerae MAK757
Vibrio cholerae MZO-2
Vibrio cholerae MZO-3
Vibrio cholerae NCTC8457
Vibrio cholerae V52
Vibrio fischeri ES114+
Vibrio fischeri MJ11+
Vibrio harveyi-like 1DA3
Vibrio harveyi ATCC BAA-1116
Vibrio harveyi HY01
Vibrio mimicus VM573
Vibrio mimicus VM603
Vibrio parahaemolyticus RIMD2210633
Vibrio parahaemolyticus AQ3810
Vibrio shilonii AK1
Vibrio sp Ex25
Vibrio sp MED222
Vibrio splendidus 12B01
Vibrio vulnificus CMCP6
Vibrio vulnificus YJ016
Genome BLAST Atlas, proteome matrix, and pangenome plot
The BlastAtlas plots were constructed as described previously [40, 41]. The pangenome plot and proteome matrix were constructed as described previously. For building the atlas, the genomes were automatically annotated and compared to the reference chromosome (V. cholerae strain N16961 in this case). The BLAST matrix perl script performs an all-against-all BLAST comparison of genomes from multiple organisms. For every combination, a protein BLAST is carried out, finding all homologous proteins. For our purposes, we use the "50-50 rule", which requires both of the following: 1) at least 50% of the query protein must overlap in the alignment, and 2) at least 50% of the residues within the alignment must be identical. After the homologous proteins are identified, the proteins are clustered into protein families and the number of families containing proteins from both strains is counted. The fraction of these shared families out of the total number of families is the number reported in the BLAST matrix. Since the direction of comparison of the two organisms gives identical results under these conditions, one redundant half of the square matrix plot is left out. Thus, we use a triangular diagram where the hypotenuse corresponds to the paralogs (red), which are the internal homologous proteins (e.g., repeated genes). Since every proteome is compared against the pan-genome, related organisms can be recognized by their similar composition of gene families.
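The 50-50 rule and the shared-family fraction described above can be illustrated with a minimal Python sketch. This is not the BLAST matrix perl script itself: the `Hit` record, the function names and the input layout (a mapping from family identifier to the set of genomes contributing members) are hypothetical, and the denominator is assumed to be the families present in at least one of the two genomes being compared.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical summary of one protein BLAST alignment."""
    query_len: int   # length of the query protein (residues)
    aln_len: int     # query residues covered by the alignment
    identical: int   # identical residues within the alignment


def passes_50_50(hit):
    """True if the hit meets both conditions of the 50-50 rule."""
    if hit.query_len <= 0 or hit.aln_len <= 0:
        return False
    coverage = hit.aln_len / hit.query_len   # condition 1: >= 50% of the query overlaps
    identity = hit.identical / hit.aln_len   # condition 2: >= 50% of aligned residues identical
    return coverage >= 0.5 and identity >= 0.5


def shared_family_fraction(families, genome_a, genome_b):
    """Fraction of protein families found in both genomes.

    `families` maps a family id to the set of genomes that have a member.
    Assumption: the total is the number of families present in at least
    one of the two genomes.
    """
    relevant = [g for g in families.values() if genome_a in g or genome_b in g]
    shared = [g for g in relevant if genome_a in g and genome_b in g]
    return len(shared) / len(relevant) if relevant else 0.0
```

Because the fraction is symmetric in the two genomes, only one triangle of the matrix needs to be computed, which matches the triangular layout described in the text.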
16S rRNA tree, Multilocus Sequence Analysis (MLSA) and Supertree approach
The MLSA and supertree approaches were based on the concatenated sequences of house-keeping genes [15, 21]. The 16S rRNA gene sequences, the gene sequences used for MLSA (i.e. ftsZ, gyrB, mreB, pyrH, recA, rpoA and topA) and the gene sequences used for the supertree (i.e. aminopeptidase P, alaS, aspS, ftsZ, gltX, gyrB, hisS, ileS, infB, metG, mreB, pntA, pheT, pyrH, recA, rpoA, rpoB, rpsH, signal recognition particle protein, threonyl-tRNA synthetase, topA, valS and 30S ribosomal protein S11) were obtained from the NCBI. The concatenated sequences were aligned by CLUSTALX. Phylogenetic analyses were conducted using MEGA version 4.0 and PAUP version 4.0b10. The phylogenetic inference was based on the maximum-parsimony character method (MP), the neighbour-joining genetic distance method (NJ), and the maximum likelihood method (ML). For NJ, distance estimations were obtained according to the Kimura-2-parameter model for the 16S rRNA gene and the Jukes-Cantor model for MLSA and supertree. The program Modeltest was used to select GTR+I+G as the model for MLSA and supertree and Tamura-Nei+I+G as the model for 16S rRNA in the ML analysis. The reliability of each tree topology was checked by 2000 bootstrap replications.
Average amino acid identity (AAI)
The AAI was calculated as described previously. Genes conserved between a pair of genomes were determined by whole-genome pairwise sequence comparisons using the BLAST algorithm release 2.2.5. For these comparisons, all protein-coding sequences (CDSs) from one genome were searched against the genomic sequence of the other genome. CDSs that had a BLAST match of at least 40% identity at the amino acid level and an alignable region covering more than 70% of the length of the query CDS were considered conserved genes. This cutoff is above the twilight zone of similarity searches, where inference of homology is error prone due to low similarity between aligned sequences. Thus, query CDSs were presumably homologous to their matches. The genetic relatedness between a pair of genomes was measured by the average amino acid identity of all conserved genes between the two genomes as computed by the BLAST algorithm.
Codon usage bias was calculated for each genome. The effective number of codons used in a sequence (Nc) was calculated using CHIPS (EMBOSS). Nc values range from 20 (in an extremely biased genome where only one codon is used per amino acid) to 61 (all synonymous codons are used with equal probability). The rose plot of codon usage was constructed as described previously.
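The AAI calculation can be summarized in a few lines of Python. The sketch below assumes that the BLAST output has already been reduced to one best hit per query CDS, given as a tuple of percent identity, alignment length and query length; this input layout and the function name are hypothetical, while the two cutoffs are those stated in the text.

```python
def average_amino_acid_identity(best_hits, min_identity=40.0, min_coverage=0.70):
    """Mean percent identity over conserved genes.

    `best_hits` is an iterable of (percent_identity, alignment_length,
    query_length) tuples, one per query CDS (assumed input layout).
    A CDS counts as conserved if identity >= 40% and the alignment covers
    more than 70% of the query length. Returns None if no CDS qualifies.
    """
    conserved = [
        pid
        for pid, aln_len, query_len in best_hits
        if pid >= min_identity and aln_len / query_len > min_coverage
    ]
    if not conserved:
        return None
    return sum(conserved) / len(conserved)
```

With the hypothetical hits `[(95.0, 300, 310), (88.0, 200, 210), (35.0, 250, 260), (99.0, 100, 400)]`, only the first two pass both cutoffs, so the function returns 91.5.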
Determination of dinucleotide relative abundance values
We determined the dinucleotide relative abundance value for each genome. Sequences were concatenated with their inverted complementary sequence using REVSEQ, YANK and UNION (EMBOSS). Mononucleotide and dinucleotide frequencies were calculated using COMPSEQ (EMBOSS). Dinucleotide relative abundances (ρ*XY) were calculated using the equation ρ*XY = fXY / (fX fY), where fXY denotes the frequency of dinucleotide XY, and fX and fY denote the frequencies of X and Y, respectively. Statistical theory and data from previous studies [12, 49] indicate that the normal range of ρ*XY is between 0.78 and 1.23. The difference in genome signature between two sequences is expressed by the genomic dissimilarity (δ*), which is the average absolute dinucleotide relative abundance difference between two sequences. The dissimilarities in relative abundance of dinucleotides between two sequences f and g were calculated using the previously described equation δ*(f,g) = (1/16) Σ |ρ*XY(f) - ρ*XY(g)| (multiplied by 1000 for convenience), where the sum extends over all 16 dinucleotides.
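The two equations above translate directly into code. The following Python sketch is an illustration rather than the EMBOSS workflow used in the study: the function names are hypothetical, and instead of physically concatenating the sequence with its inverted complement it counts both strands separately, an assumed simplification that gives the same strand-symmetric frequencies without introducing an artificial junction dinucleotide.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def relative_abundances(seq):
    """rho*_XY = f_XY / (f_X * f_Y), with frequencies pooled over both strands."""
    seq = seq.upper()
    mono = dict.fromkeys("ACGT", 0)
    di = dict.fromkeys(DINUCLEOTIDES, 0)
    for strand in (seq, reverse_complement(seq)):
        for base in strand:
            if base in mono:
                mono[base] += 1
        for i in range(len(strand) - 1):
            pair = strand[i:i + 2]
            if pair in di:  # ignores pairs containing N or other symbols
                di[pair] += 1
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_mono == 0 or n_di == 0:
        raise ValueError("sequence too short or contains no A/C/G/T")
    rho = {}
    for pair, count in di.items():
        fx, fy = mono[pair[0]] / n_mono, mono[pair[1]] / n_mono
        rho[pair] = (count / n_di) / (fx * fy) if fx and fy else 0.0
    return rho


def delta_star(seq_f, seq_g):
    """delta*(f,g) = (1/16) * sum |rho*_XY(f) - rho*_XY(g)|, multiplied by 1000."""
    rho_f, rho_g = relative_abundances(seq_f), relative_abundances(seq_g)
    return 1000 * sum(abs(rho_f[d] - rho_g[d]) for d in DINUCLEOTIDES) / 16
```

Because both strands are pooled, a sequence and its reverse complement have identical signatures, so `delta_star` between them is zero.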
General features of the sequenced genomes
Phylogenetic reconstructions by 16S rRNA, MLSA and supertree
Average amino acid identity (AAI)
The percentage of identity of putative orthologous protein-encoding genes detected in the pairwise comparison is shown in Figure 3 (and see Additional file 4; Table S2). The identity of protein-encoding genes between different genera of vibrios varied considerably. The mutual AAI for the pairs Vibrio and Photobacterium, Vibrio and Aliivibrio, and Photobacterium and Aliivibrio were at most 67%, 70% and 68%, respectively. The AAI within the genus Photobacterium (represented by P. angustum S14 and P. profundum SS9) was only 73%, whereas the AAI within the genus Aliivibrio was 85%. The AAI within the genus Vibrio varied between 70 and 91%. The Vibrio core group (i.e. V. alginolyticus, V. campbellii, V. harveyi and V. parahaemolyticus) shared at most 75% of their protein-encoding genes. The wider range of variation is explained by the higher number of representatives in the latter genus. The sister species V. cholerae-V. mimicus, V. harveyi-V. campbellii, and V. parahaemolyticus-V. alginolyticus shared 90-91% AAI, whereas the intra-species AAI in V. cholerae varied between 98 and 99.5%. Vibrio sp. EX25 and V. alginolyticus 40B had 95% identity, suggesting that EX25 belongs to the species V. alginolyticus. The rather low AAI within the species V. harveyi (i.e. 90%) may be due to the incomplete genome sequences and to unresolved taxonomic issues. V. harveyi-like 1DA3 had 70% DDH in previous experiments and formed a separate genomic group on the basis of molecular fingerprinting.
Dinucleotide relative abundance values (ρ*) and species-specific genome signatures [δ*(f,g)]
ρ* values were in the normal range for all dinucleotides in all taxa investigated except for CG (over-represented in almost all genomes except in V. campbellii, V. harveyi, V. shilonii, V. splendidus and Vibrio sp. MED222) and TA (under-represented in almost all genomes except in V. shilonii, P. angustum, P. profundum, A. fischeri and A. salmonicida) (data not shown). The genomic dissimilarity values [δ*(f,g)] of the genus Vibrio towards the genera Photobacterium and Aliivibrio were 38-66 and 35-59, respectively. The δ*(f,g) values between the genera Photobacterium and Aliivibrio were in the range of 20 to 34. δ*(f,g) values within the genera Aliivibrio and Photobacterium were 26.5 and 13, respectively.
δ*(f,g) values within each vibrio species were between 1 and 4, whereas the interspecies δ*(f,g) values were between 10 and 61 (Figure 3 and see Additional file 5; Table S3). Thus, the interspecies value was higher than the intergenus value. The δ*(f,g) values among the Vibrio core group members and V. cholerae/V. mimicus were at least 38. The δ*(f,g) values between the sister species V. cholerae-V. mimicus, V. harveyi-V. campbellii and V. parahaemolyticus-V. alginolyticus were 14, 13 and 17, respectively. As vibrio species contain two chromosomes (one larger, ca. 2.9 Mb, and one smaller, ca. 1.0 Mb), we also calculated the δ*(f,g) between the two chromosomes of the same strain. The δ*(f,g) values of the intragenomic comparison of the two chromosomes of complete vibrio genomes were between 10 and 18. The two chromosomes are essential for cell survival and persistence, yet they showed distinct patterns, suggesting a high genomic plasticity.
Codon usage bias
Overall codon usage bias was very similar among the vibrio species investigated (Table 1). There was little variation in Nc among the different genomes, with Nc ranging from 45 to 54. The Nc values within the genera Vibrio, Photobacterium and Aliivibrio were 51-54, 48-51, and 45-48, respectively. Sister vibrio species had similar Nc values. Thus, there was no clear differentiation of closely related taxa using the Nc.
This study aimed at providing the underpinnings for the establishment of an online genomic taxonomy of vibrios. The methods used to extract taxonomic information from vibrio genomes are freely available on the web, have complementary taxonomic resolutions and are all amenable to automation for species and genera identification. Species identification is the major goal of microbial taxonomy. The identification of the closely related sister species V. cholerae - V. mimicus, V. alginolyticus - V. parahaemolyticus, and V. harveyi - V. campbellii was evident in our study. The methods with the highest resolution for species and genera identification were, in order, MLSA, supertrees, and AAI. Karlin's genomic signature (δ*(f,g)) performed well for species identification, whereas Nc appeared to be useful for differentiating genera.
According to our analyses, the AAI is one of the most useful genomic features for resolving vibrio taxonomy. With the ever growing number of whole-genome sequences, this new method could be incorporated in a future re-evaluation of the bacterial species definition. It is important to bear in mind that the number of orthologous genes shared between species depends on genome size and phylogenetic relationship [53, 54]. Vibrio genomes vary widely in size, from 4 to 6 Mb (Table 1). As the genome size may influence the AAI, the cut-off for species delineation will possibly vary slightly when additional vibrio species are analysed.
Genome signature dissimilarity (δ*(f,g))
Karlin's genomic signature dissimilarity (δ*(f,g)) can be used for species identification in vibrios. Overall, the genomic signature of vibrios was more similar between closely related species than between distantly related species. However, species from different genera may have similar signatures. For instance, V. mimicus and Photobacterium spp. shared values of genomic signature dissimilarity in the range of 38-40, whereas V. mimicus and V. splendidus had a genomic signature dissimilarity of 57. V. campbellii and Aliivibrio fischeri had a genomic signature dissimilarity of 35, indicating that the taxonomic resolution of Karlin's genomic signature (δ*(f,g)) is lower for discriminating genera. Similar results were found in other studies concerning the resolution of this type of signature. Genome signatures alone have significant limitations when used as phylogenetic markers for higher taxonomic levels, e.g. genera to phyla. Apparently, the primary limitation is the lack of divergence in some phylogenetically distantly related species, which could result from the absence of a molecular clock. The balance between mutational biases and selective constraints results in an equilibrium in the oligonucleotide composition of a genome. Similar genome signatures between phylogenetically distantly related species could arise from coincidental convergence due to crowding of the genome signature space derived from dinucleotide frequencies, which may not capture sufficient information to differentiate between distant taxa, e.g. genera to phyla.
Concordance between the methods
Pearson correlation coefficient (expressed as percentage) between different methods
1. 16S rRNA gene identity
2. Identity in MLSA
3. Identity in supertree analysis
4. Average amino acid identity (AAI)
5. Karlin genome signature dissimilarity
6. BLAST proteome identity
The AAI and the proteome matrix correlated well, although they measure different things: the latter measures the fraction of proteins that are shared by both genomes, whereas the former measures the average amino acid identity of the proteins in those matches. In addition, AAI uses 40% amino acid identity and > 70% of the aligned length of a protein, which is stricter than the settings used (50% identity-50% length) to construct the proteome matrix. Because Karlin's genomic signature dissimilarity indexes genome-wide variation, its phylogenetic resolution is distinct from that of individual marker genes. In addition, this signature considers variation in both coding and non-coding genomic regions. This may explain why the correlation between the signature and the gene sequence based methods obtained in this study is slightly lower.
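As a reminder of how the concordance values are obtained, the Pearson correlation coefficient between two methods can be computed from their pairwise genome-to-genome values as sketched below in Python. The function name is hypothetical and the input is assumed to be two equally long lists holding the values of two methods for the same genome pairs; the result is expressed as a percentage, as in the table.

```python
import math


def pearson_percent(x, y):
    """Pearson correlation coefficient between two value lists, as a percentage."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 100 * cov / math.sqrt(var_x * var_y)
```

Note that a dissimilarity measure such as δ*(f,g) is expected to correlate negatively with identity-based measures, so the sign has to be taken into account when comparing it with the other methods.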
Towards a new species definition in vibrios
A new species definition is mandatory if one aims to establish automatic identification of vibrios through a web-based server. So far, 16S rRNA gene analysis has been applied for species definition and identification. Its value for these purposes in vibrios is limited because of its low taxonomic resolution. This study provided enough WGS-based evidence to propose a new species definition in vibrios. In our hands, a vibrio species is defined as a group of strains that share > 95% DNA identity in MLSA and supertree gene sequences, > 96% AAI, ≤ 10 genome signature dissimilarity, and > 61% proteome identity. Strains of the same species and species of the same genus will form monophyletic groups on the basis of MLSA and supertree.
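The numerical part of this definition lends itself to a simple automated check. The Python sketch below applies the four thresholds to a pairwise comparison between a query strain and a reference strain; the class and field names are hypothetical, and the monophyly criterion is deliberately left out because it requires a tree rather than pairwise values.

```python
from dataclasses import dataclass


@dataclass
class PairwiseComparison:
    """Hypothetical container for the pairwise values of two genomes."""
    mlsa_identity: float            # % DNA identity in MLSA
    supertree_identity: float       # % DNA identity in supertree genes
    aai: float                      # average amino acid identity (%)
    signature_dissimilarity: float  # Karlin's delta*(f,g), x1000
    proteome_identity: float        # % shared protein families (BLAST matrix)


def same_vibrio_species(c):
    """Apply the proposed thresholds; monophyly must be checked separately."""
    return (
        c.mlsa_identity > 95
        and c.supertree_identity > 95
        and c.aai > 96
        and c.signature_dissimilarity <= 10
        and c.proteome_identity > 61
    )
```

For example, a hypothetical comparison with 98% MLSA identity, 98% supertree identity, 98.5% AAI, a signature dissimilarity of 2 and 80% proteome identity would satisfy all criteria, whereas lowering the AAI to 91% would not.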
Ecology and genomic features
Each Vibrio species appears to have a specific ecological niche. Genomes exhibit diverse patterns of species-specific compositional bias, i.e. GC content, GC and AT skews, codon bias, and mutation bias. The exact mechanisms that generate and maintain the genome signatures are complex, but possibly involve differences in species-specific properties of the DNA replication and repair machineries [49, 56]. In Borrelia burgdorferi, there is a bias related to the speed of replication, whereas in Proteobacteria, DNA repair enzymes co-evolve with the genome signature. The evolutionary distances between orthologous DNA repair and recombination enzymes (mainly those involved in the nucleotide excision repair system) were highly correlated with genome signature distances. On the other hand, there was a significantly lower correlation between the evolutionary distances of the structural and metabolic enzymes and the genome signature.
Environmental temperature and oxygen appear to influence the GC content of bacteria. The frequencies of AA, TA, and TT dinucleotides were higher than the frequencies of AT, GC, and CG dinucleotides in the vaccine strain of Pasteurella multocida compared to the virulent strain. Although the vaccine strain is cultured at a higher temperature, its GC content is lower than that of the virulent strain. The AA + TT dinucleotide frequency increased significantly in the vaccine strain, which may represent an adaptation to increased culturing temperature because AA/TT dinucleotides are conformationally very stable. Higher culturing temperature increases spontaneous hydrolytic deamination of cytosine and 5-methylcytosine, which, in turn, tends to decrease GC content [60, 61]. Deamination and methylation favour nucleotide changes from G and C to A and T in a variety of microbial genomes.
Horizontal gene transfer may influence the genomic features of vibrios. The three genetic processes that mediate HGT often occur in vibrios and may cause phenotypic variation [63, 64]. Such variation may confound phenotype-based identification. Another interesting feature of vibrio genomes is the presence of two chromosomes. The intragenomic dissimilarity between the two chromosomes of each vibrio strain is higher than the genomic dissimilarity between chromosome I of two strains of the same species. For instance, the genomic dissimilarity (δ*(f,g)) between chromosomes I and II of V. cholerae N16961 was 12, whereas that between chromosome I of V. cholerae N16961 and chromosome I of V. cholerae O395 was 1. The fact that the two chromosomes of vibrios are dissimilar, with chromosome II less conserved than chromosome I, might support the hypothesis that chromosome II was acquired by horizontal gene transfer. Chromosome II has only a few essential housekeeping protein-coding genes. This chromosome might have been a megaplasmid acquired by an ancestor prior to the diversification of the vibrios. Nearly 1/6 of chromosome II (ca. 150 Kb) corresponds to a superintegron, a rapidly evolving region specialized in the capture, loss and expression of genes. It is important to highlight that the fact that the two chromosomes of the same strain are more dissimilar than the chromosomes of different strains does not hamper the use of genome signatures for identification. Chromosomes I of V. cholerae and V. mimicus have a signature dissimilarity higher than the signature dissimilarity between chromosome I of two V. cholerae strains. The same holds true for chromosome II.
The availability of new technologies for ultra-rapid whole genome sequencing and the development of concepts in comparative genomics will allow for rapid, reliable and automatable identification of microbial isolates through a web-based server. The concept of an online electronic taxonomy based on whole genome features, as illustrated in this study, will improve microbial taxonomy. Environmental biodiversity surveys and ecological studies on vibrios will also benefit from this new approach to identification. In this new context, traditional molecular approaches (i.e. DDH, MLSA, AFLP, rep-PCR) may still be useful for screening large collections of strains that will subsequently be used in whole genome based identification schemes.
AFLP: Amplified Fragment Length Polymorphism
HGT: Horizontal Gene Transfer
WGS: Whole Genome Sequencing
Tm: the melting temperature of a double strand DNA molecule
ΔTm: the difference between the Tm of a given double strand DNA molecule and the Tm of a hybrid of this molecule formed under controlled experimental conditions
TCP: Toxin Co-regulated Pilus.
CCT acknowledges a PhD scholarship of CNPq. FLT acknowledges grants of CNPq (Pro-Abrolhos), FAPERJ, IFS, and FUJB. ACPV and ATRV acknowledge grants of CNPq. NAJr acknowledges the scholarship of CNPq. The authors thank Bruno Sergio de O. Silva, Rodrigo L. de Moura, Ronaldo Francini Filho (UEPA) and Clovis B. Castro (MN-UFRJ) for help with sampling of the two Brazilian coral vibrio strains, and Lena Sá for help with sampling of VM603. The authors acknowledge partial support of Genoprot/Finep. DWU acknowledges support from the Danish Center for Scientific Computing. The authors thank the referees for their comments. The authors dedicate this paper to Professor Jean Swings.
- Colwell RR: Polyphasic taxonomy of the genus Vibrio: numerical taxonomy of Vibrio cholerae, Vibrio parahaemolyticus, and related Vibrio species. J Bacteriol. 1970, 104 (1): 410-433.
- Wang Q, Garrity GM, Tiedje JM, Cole JR: Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol. 2007, 73 (16): 5261-5267. 10.1128/AEM.00062-07.
- Woese CR, Fox GE: Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proc Natl Acad Sci USA. 1977, 74 (11): 5088-5090. 10.1073/pnas.74.11.5088.
- Stackebrandt E, Goebel BM: A place for DNA-DNA reassociation and 16S ribosomal-RNA sequence analysis in the present species definition in bacteriology. Int J Syst Bacteriol. 1994, 49: 846-849.
- Wayne LG, Brenner DJ, Colwell RR, Grimont PAD, Kandler O, Krichevsky MI, Moore LH, Moore WEC, Murray RGE, Stackebrandt E, et al: Report of the ad hoc committee on reconciliation of approaches to bacterial systematics. Int J Syst Bacteriol. 1987, 37: 463-464.
- Gevers D, Cohan FM, Lawrence JG, Spratt BG, Coenye T, Feil EJ, Stackebrandt E, Van de Peer Y, Vandamme P, Thompson FL, et al: Opinion: Re-evaluating prokaryotic species. Nat Rev Microbiol. 2005, 3 (9): 733-739. 10.1038/nrmicro1236.
- Gillis M, Vandamme P, De Vos P, Swings J, Kersters K: Polyphasic taxonomy. Bergey's Manual of Systematic Bacteriology: The Proteobacteria: Introductory Essays. Edited by: Brenner DJ, Krieg NR, Staley JT, Garrity GM. 2005, New York: Springer, 2: 43-48.
- Lawrence JG, Retchless AC: The interplay of homologous recombination and horizontal gene transfer in bacterial speciation. Methods Mol Biol. 2009, 532: 29-53.
- Cohan FM, Koeppel AF: The origins of ecological diversity in prokaryotes. Curr Biol. 2008, 18: 1024-1034. 10.1016/j.cub.2008.09.014.
- Coenye T, Gevers D, Van de Peer Y, Vandamme P, Swings J: Towards a prokaryotic genomic taxonomy. FEMS Microbiol Rev. 2005, 29 (2): 147-167. 10.1016/j.femsre.2004.11.004.
- Coenye T, Vandamme P: Extracting phylogenetic information from whole-genome sequencing projects: the lactic acid bacteria as a test case. Microbiology. 2003, 149 (Pt 12): 3507-3517. 10.1099/mic.0.26515-0.
- Karlin S, Mrazek J, Campbell AM: Compositional biases of bacterial genomes and evolutionary implications. J Bacteriol. 1997, 179 (12): 3899-3913.
- Phillippy AM, Mason JA, Ayanbule K, Sommer DD, Taviani E, Huq A, Colwell RR, Knight IT, Salzberg SL: Comprehensive DNA signature discovery and validation. PLoS Comput Biol. 2007, 3 (5): e98. 10.1371/journal.pcbi.0030098.
- van Passel MWJ, Kuramae EE, Luyf ACM, Bart A, Boekhout T: The reach of the genome signature in prokaryotes. BMC Evol Biol. 2006, 6: 84.
- Brown JR, Douady CJ, Italia MJ, Marshall WE, Stanhope MJ: Universal trees based on large combined protein sequence data sets. Nat Genet. 2001, 28 (3): 281-285. 10.1038/90129.
- Daubin V, Gouy M, Perriere G: Bacterial molecular phylogeny using supertree approach. Genome Inform. 2001, 12: 155-164.
- Wolf YI, Rogozin IB, Grishin NV, Tatusov RL, Koonin EV: Genome trees constructed using five different approaches suggest new major bacterial clades. BMC Evol Biol. 2001, 1: 8. 10.1186/1471-2148-1-8.
- Konstantinidis KT, Tiedje JM: Towards a genome-based taxonomy for prokaryotes. J Bacteriol. 2005, 187 (18): 6258-6264. 10.1128/JB.187.18.6258-6264.2005.
- Goris J, Konstantinidis KT, Klappenbach JA, Coenye T, Vandamme P, Tiedje JM: DNA-DNA hybridization values and their relationship to whole-genome sequence similarities. Int J Syst Evol Microbiol. 2007, 57 (Pt 1): 81-91. 10.1099/ijs.0.64483-0.
- Wright F: The 'effective number of codons' used in a gene. Gene. 1990, 87 (1): 23-29. 10.1016/0378-1119(90)90491-9.
- Sawabe T, Kita-Tsukamoto K, Thompson FL: Inferring the evolutionary history of vibrios by means of multilocus sequence analysis. J Bacteriol. 2007, 189 (21): 7932-7936. 10.1128/JB.00693-07.
- Thompson FL, Gevers D, Thompson CC, Dawyndt P, Naser S, Hoste B, Munn CB, Swings J: Phylogeny and molecular identification of vibrios on the basis of multilocus sequence analysis. Appl Environ Microbiol. 2005, 71 (9): 5107-5115. 10.1128/AEM.71.9.5107-5115.2005.
- Thompson CC, Thompson FL, Vicente AC, Swings J: Phylogenetic analysis of vibrios and related species by means of atpA gene sequences. Int J Syst Evol Microbiol. 2007, 57 (Pt 11): 2480-2484. 10.1099/ijs.0.65223-0.
- Thompson FL, Gomez-Gil B, Vasconcelos AT, Sawabe T: Multilocus sequence analysis reveals that Vibrio harveyi and V. campbellii are distinct species. Appl Environ Microbiol. 2007, 73 (13): 4279-4285. 10.1128/AEM.00020-07.
- Thompson CC, Thompson FL, Vicente AC: Identification of Vibrio cholerae and Vibrio mimicus by multilocus sequence analysis (MLSA). Int J Syst Evol Microbiol. 2008, 58 (Pt 3): 617-621. 10.1099/ijs.0.65461-0.
- Thompson FL, Austin B, Swings J: The biology of vibrios. 2006, Washington: ASM.
- Payne GW, Vandamme P, Morgan SH, Lipuma JJ, Coenye T, Weightman AJ, Jones TH, Mahenthiralingam E: Development of a recA gene-based identification approach for the entire Burkholderia genus. Appl Environ Microbiol. 2005, 71 (7): 3917-3927. 10.1128/AEM.71.7.3917-3927.2005.
- Andersson AF, Lindberg M, Jakobsson H, Backhed F, Nyren P, Engstrand L: Comparative analysis of human gut microbiota by barcoded pyrosequencing. PLoS ONE. 2008, 3 (7): e2836. 10.1371/journal.pone.0002836.
- Eid J, Fehr A, Gray J, Luong K, Lyle J, Otto G, Peluso P, Rank D, Baybayan P, Bettman B, et al: Real-time DNA sequencing from single polymerase molecules. Science. 2009, 323 (5910): 133-138. 10.1126/science.1162986.
- Mahenthiralingam E, Baldwin A, Drevinek P, Vanlaere E, Vandamme P, LiPuma JJ, Dowson CG: Multilocus sequence typing breathes life into a microbial metagenome. PLoS ONE. 2006, 1: e17. 10.1371/journal.pone.0000017.
- Hunt DE, David LA, Gevers D, Preheim SP, Alm EJ, Polz MF: Resource partitioning and sympatric differentiation among closely related bacterioplankton. Science. 2008, 320 (5879): 1081-1085. 10.1126/science.1157890.
- Thompson FL, Klose KE: Vibrio2005: the First International Conference on the Biology of Vibrios. J Bacteriol. 2005, 188 (13): 4592-4596. 10.1128/JB.00141-06.
- Farmer JJ, Janda M, Brenner FW, Cameron DN, Birkhead KM: Genus I. Vibrio Pacini 1854. Bergey's Manual of Systematic Bacteriology: The Proteobacteria: The Gamma-Proteobacteria. Edited by: Brenner DJ, Krieg NR, Staley JT, Garrity GM. 2005, New York: Springer, 2: 494-545.
- Thompson FL, Iida T, Swings J: Biodiversity of vibrios. Microbiol Mol Biol Rev. 2004, 68 (3): 403-431. 10.1128/MMBR.68.3.403-431.2004.
- Urbanczyk H, Ast JC, Higgins MJ, Carson J, Dunlap PV: Reclassification of Vibrio fischeri, Vibrio logei, Vibrio salmonicida and Vibrio wodanis as Aliivibrio fischeri gen. nov., comb. nov., Aliivibrio logei comb. nov., Aliivibrio salmonicida comb. nov. and Aliivibrio wodanis comb. nov. Int J Syst Evol Microbiol. 2007, 57 (Pt 12): 2823-2829. 10.1099/ijs.0.65081-0.
- Pitcher DG, Saunders NA, Owen RJ: Rapid extraction of bacterial genomic DNA with guanidium thiocyanate. Lett Appl Microbiol. 1989, 8: 151-156. 10.1111/j.1472-765X.1989.tb00262.x.
- Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, et al: Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005, 437 (7057): 376-380.
- Almeida LG, Paixao R, Souza RC, Costa GC, Barrientos FJ, Santos MT, Almeida DF, Vasconcelos AT: A System for Automated Bacterial (genome) Integrated Annotation--SABIA. Bioinformatics. 2004, 20 (16): 2832-2833. 10.1093/bioinformatics/bth273.
- Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol. 2007, 24 (8): 1596-1599. 10.1093/molbev/msm092.
- Hallin PF, Binnewies TT, Ussery DW: The genome BLASTatlas-a GeneWiz extension for visualization of whole-genome homology. Mol Biosyst. 2008, 4 (5): 363-371. 10.1039/b717118h.
- Pedersen AG, Jensen LJ, Brunak S, Staerfeldt HH, Ussery DW: A DNA structural atlas for Escherichia coli. J Mol Biol. 2000, 299 (4): 907-930. 10.1006/jmbi.2000.3787.
- Ussery DW, Borini S, Wassenaar T: Computing for comparative genomics: bioinformatics for microbiologists. 2009, London: Springer-Verlag.
- Swofford DL, Olsen GJ, Waddell PJ, Hillis DM: Phylogenetic inference. Molecular Systematics. Edited by: Hillis DM, Moritz C, Mable BK. 1996, Sunderland: Sinauer, 407-514.
- Saitou N, Nei M: The Neighbor-Joining Method - a New Method for Reconstructing Phylogenetic Trees. Japanese Journal of Genetics. 1986, 61 (6): 611-611.
- Jukes TH, Cantor CR: Evolution of protein molecules. Mammalian protein metabolism. Edited by: Munro HN. 1996, New York: Academic Press, 21-132.
- Felsenstein J: Confidence limits on phylogenies: An approach using the bootstrap. Evolution. 1985, 39: 783-791. 10.2307/2408678.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389.
- Rost B: Twilight zone of protein sequence alignments. Protein Eng. 1999, 12 (2): 85-94. 10.1093/protein/12.2.85.
- Karlin S: Global dinucleotide signatures and analysis of genomic heterogeneity. Curr Opin Microbiol. 1998, 1 (5): 598-610. 10.1016/S1369-5274(98)80095-7.
- Heidelberg JF, Eisen JA, Nelson WC, Clayton RA, Gwinn ML, Dodson RJ, Haft DH, Hickey EK, Peterson JD, Umayam L, et al: DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae. Nature. 2000, 406 (6795): 477-483. 10.1038/35020000.
- Gonzalez-Escalona N, Martinez-Urtaza J, Romero J, Espejo RT, Jaykus LA, DePaola A: Determination of molecular phylogenetics of Vibrio parahaemolyticus strains by multilocus sequence typing. J Bacteriol. 2008, 190 (8): 2831-2840. 10.1128/JB.01808-07.
- Thompson FL, Hoste B, Vandemeulebroecke K, Swings J: Genomic diversity amongst Vibrio isolates from different sources determined by fluorescent amplified fragment length polymorphism. Syst Appl Microbiol. 2001, 24 (4): 520-538. 10.1078/0723-2020-00067.
- Bansal AK, Meyer TE: Evolutionary analysis by whole-genome comparisons. J Bacteriol. 2002, 184 (8): 2260-2272. 10.1128/JB.184.8.2260-2272.2002.
- Snel B, Bork P, Huynen MA: Genome phylogeny based on gene content. Nat Genet. 1999, 21 (1): 108-110. 10.1038/5052.
- Mrazek J: Phylogenetic Signals in DNA Composition: Limitations and Prospects. Mol Biol Evol. 2009, 26 (5): 1163-1169. 10.1093/molbev/msp032.
- Foerstner KU, von Mering C, Hooper SD, Bork P: Environments shape the nucleotide composition of genomes. EMBO Rep. 2005, 6 (12): 1208-1213. 10.1038/sj.embor.7400538.
- Vetsigian K, Goldenfeld N: Genome rhetoric and the emergence of compositional bias. Proc Natl Acad Sci USA. 2009, 106 (1): 215-220. 10.1073/pnas.0810122106.
- Paz A, Kirzhner V, Nevo E, Korol A: Coevolution of DNA-interacting proteins and genome "dialect". Mol Biol Evol. 2006, 23 (1): 56-64. 10.1093/molbev/msj007.
- Xia X, Wei T, Xie Z, Danchin A: Genomic changes in nucleotide and dinucleotide frequencies in Pasteurella multocida cultured under high temperature. Genetics. 2002, 161 (4): 1385-1394.
- Horst JP, Fritz HJ: Counteracting the mutagenic effect of hydrolytic deamination of DNA 5-methylcytosine residues at high temperature: DNA mismatch N-glycosylase Mig.Mth of the thermophilic archaeon Methanobacterium thermoautotrophicum THF. EMBO J. 1996, 15 (19): 5459-5469.
- Lindahl T: Instability and decay of the primary structure of DNA. Nature. 1993, 362 (6422): 709-715. 10.1038/362709a0.
- Wang RF, Campbell W, Cao WW, Summage C, Steele RS, Cerniglia CE: Detection of Pasteurella pneumotropica in laboratory mice and rats by polymerase chain reaction. Lab Anim Sci. 1996, 46 (1): 81-85.
- Meibom KL, Blokesch M, Dolganov NA, Wu CY, Schoolnik GK: Chitin induces natural competence in Vibrio cholerae. Science. 2005, 310 (5755): 1824-1827. 10.1126/science.1120096.
- Vidgen M, Carson J, Higgins M, Owens L: Changes to the phenotypic profile of Vibrio harveyi when infected with the Vibrio harveyi myovirus-like (VHML) bacteriophage. J Appl Microbiol. 2006, 100 (3): 481-487. 10.1111/j.1365-2672.2005.02829.x.
- Mazel D: Integrons: agents of bacterial evolution. Nat Rev Microbiol. 2006, 4 (8): 608-620. 10.1038/nrmicro1462.
- Mandel MJ, Wollenberg MS, Stabb EV, Visick KL, Ruby EG: A single regulatory gene is sufficient to alter bacterial host range. Nature. 2009, 458 (7235): 215-218. 10.1038/nature07660.
- Reichelt JL, Baumann P, Baumann L: Study of genetic relationships among marine species of the genera Beneckea and Photobacterium by means of in vitro DNA/DNA hybridization. Arch Microbiol. 1976, 110 (1): 101-120. 10.1007/BF00416975.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.