Taxonomy is one of the oldest branches of the biological sciences and lies at their foundation. It deals with identification, classification (i.e. creation of new taxa) and nomenclature. In the early 1970s, seminal work by Colwell put forward the concept of polyphasic taxonomy, which is still in use. According to this concept, the most natural classification system is achieved by integrating information from the molecular to the ecological level. DNA-DNA hybridization (DDH) data played a pivotal role in inferring species boundaries in polyphasic taxonomy. Indeed, DDH proved to be the most powerful tool for unambiguously identifying prokaryotes, providing, for the first time, a reliable means of categorizing microbes. However, this technique suffers from various limitations, including the need to include reference strains in each new experiment. In addition, it is not portable and requires special facilities available in only a limited number of laboratories worldwide. The introduction of 16S rRNA analysis as an alternative taxonomic method allowed the development of an evolutionary framework [2, 3]. Boundaries for species identification and a species definition were set on the basis of extensive empirical data. A bacterial species is defined as a group of strains (including the type strain) having > 70% DDH similarity, < 5°C ΔTm, < 5% mol G+C difference of total genomic DNA, and > 97% 16S rRNA identity [4, 5].
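The four cut-offs of this operational definition can be written as a simple check. The following Python fragment is only a minimal sketch: the class and function names are hypothetical, and it assumes that all four values have already been measured for a strain against the type strain of the species.

```python
from dataclasses import dataclass


@dataclass
class StrainComparison:
    """Pairwise measurements between a strain and a type strain (hypothetical container)."""
    ddh_percent: float                 # DNA-DNA hybridization similarity (%)
    delta_tm_celsius: float            # difference in melting temperature (degrees C)
    gc_difference_mol_percent: float   # difference in mol% G+C of total genomic DNA
    rrna_16s_identity_percent: float   # 16S rRNA sequence identity (%)


def meets_species_thresholds(c: StrainComparison) -> bool:
    """Return True only if all four polyphasic cut-offs quoted in the text are met."""
    return (
        c.ddh_percent > 70.0
        and c.delta_tm_celsius < 5.0
        and c.gc_difference_mol_percent < 5.0
        and c.rrna_16s_identity_percent > 97.0
    )
```

A comparison that fails any single criterion, for example 60% DDH despite 99% 16S rRNA identity, is rejected by this check, reflecting that the criteria apply jointly.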
The bacterial species definition is pragmatic and operational, aiming at the establishment of a rapid, reliable, reproducible, and useful taxonomic framework, based on microbial evolution, for a variety of applications (e.g. medicine and agriculture). This polyphasic definition is a consensus in microbiology, although it is not based on a concept (i.e. the biological processes behind speciation and species). It is crucial to highlight that the current polyphasic framework does not ask whether this definition corresponds to a biological reality. Within the framework of polyphasic taxonomy, strains of the same species have similar phenotypes (e.g. expression of different types of enzymes, ability to use different types of compounds as energy sources, and growth at different temperatures and at different concentrations of acid and salt), genotypes (e.g. rep-PCR and AFLP), and chemotaxonomic features (e.g. FAME and polyamines), forming tight, distinguishable groups. Ideally, these groups should be readily identifiable and differentiated from closely related species. However, there is currently no consensus on the definition of a bacterial genus.
The most recent ideas on species concepts corroborate polyphasic taxonomy. According to these authors, a bacterial species concept is a framework that explains how bacterial strains share common features and how they maintain genomic cohesion. The cohesion is observed as recurrent patterns or groups of strains that are recognized in nature and are termed species for practical reasons. However, what matters in a species concept is the biological process that gives rise to these groups and maintains their cohesion. Homologous recombination appears to be a major force leading to genomic cohesion of strains of the same species. It is more frequent between strains of the same species than between strains of different species simply because this genetic process depends on sequence similarity. Horizontal gene transfer (HGT) between unrelated strains would cause an increase in phenotypic variation, but would not be frequent enough to hamper the formation and recognition of species. An alternative species concept is the stable ecotype, in which cohesion of the bacterial strains of a given species is maintained by the accumulation of advantageous mutations and periodic purging of allelic variability. In contrast with the polyphasic species definition, which is widely accepted, there is no consensus on a bacterial species concept. It is of course possible that a single concept is not sufficient to explain the complexity of bacterial diversity. In addition, mutation, homologous recombination and HGT may be detected at varying levels in a single strain, making the scenario even more complex.
Whole microbial genome sequencing studies launched microbial taxonomy into a new era, with the possibility of establishing systematics on the basis of complete genomes. How does one go about using whole genome sequences (WGS) to establish a genomic taxonomy? And more specifically, how can one taxonomically define and identify species by means of WGS? WGS may contain taxonomic information in the form of gene content, genome-wide signatures, phylogenetic markers, amino acid identity and overall genetic composition that might be useful for building novel taxonomic schemes. Pioneering computational and mathematical studies performed in the 1990s suggested that genomes contain species-specific signatures. The genome signature is a compositional parameter reflecting the dinucleotide relative abundance, which is similar between closely related species and dissimilar between unrelated species. Genome signatures appear to allow the assignment of isolates and metagenomes to known species [13, 14]. Whole genome sequences also permit the reconstruction of more robust taxonomic trees (i.e. supertrees) based on all genes of the core genome [15–17]. Good congruence was obtained between the traditional 16S rRNA-based trees and the novel supertree methods. It has also been proposed that the average amino acid identity (AAI) could be used to distinguish closely related sister species. Subsequently, a close relationship between DDH and AAI was shown. Some studies have suggested that the effective number of codons (Nc) could also be a species-specific marker.
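To make the notion of a dinucleotide-based genome signature concrete, the sketch below computes the commonly used relative abundance ρ(XY) = f(XY) / (f(X) f(Y)) for all 16 dinucleotides and the mean absolute difference between two such profiles. It is an illustration under stated assumptions, not the procedure used in the studies cited above: the function names are hypothetical, only one strand is counted (published implementations usually also count the reverse complement), and ambiguous bases are simply skipped.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def dinucleotide_signature(seq: str) -> dict:
    """Relative abundance rho(XY) = f(XY) / (f(X) * f(Y)) for the 16 dinucleotides."""
    seq = seq.upper()
    mono = Counter(b for b in seq if b in BASES)
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    if n_di == 0:
        raise ValueError("sequence contains no unambiguous dinucleotide")
    signature = {}
    for x, y in product(BASES, repeat=2):
        fx = mono[x] / n_mono
        fy = mono[y] / n_mono
        fxy = di[x + y] / n_di
        # A base that never occurs gives an undefined ratio; report 0.0 in that case.
        signature[x + y] = fxy / (fx * fy) if fx > 0 and fy > 0 else 0.0
    return signature


def signature_distance(seq_a: str, seq_b: str) -> float:
    """Mean absolute difference between the two 16-value signatures."""
    sig_a = dinucleotide_signature(seq_a)
    sig_b = dinucleotide_signature(seq_b)
    return sum(abs(sig_a[k] - sig_b[k]) for k in sig_a) / 16
```

Under this measure, two genomes with similar dinucleotide usage yield a distance close to zero, whereas compositionally divergent genomes yield larger values; any cut-off for species assignment would have to be calibrated empirically.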
In vibrios, genomic taxonomy began with a series of papers that attempted to use multilocus sequence analysis (MLSA) [21–26]. These studies allowed the establishment of rapid and powerful identification systems over the internet. MLSA schemes for most human pathogens are currently freely accessible on the internet. Establishing a universal MLSA scheme will not be possible, though. Studies conducted so far have shown that the resolution of different markers varies according to the taxonomic group. For instance, the recA gene is very useful for differentiating closely related species of Burkholderia, but it is not appropriate for vibrios. Clearly, genes have different molecular clocks in different microbes, indicating the need for a multigene approach. With the advent of ultra-rapid genome sequencing, it is now possible to sequence one almost complete microbial genome in less than a day [28, 29]. The new generation of DNA sequencers will enable sequencing of more than a dozen prokaryotic genomes in less than an hour, possibly making it cheaper and faster to sequence a whole genome than several genes for MLSA. In the future, MLSA might be used simply as a rapid screening method.
Vibrios are an excellent test model for genomic taxonomy because they are ubiquitous in the marine environment, are associated with a wide range of marine life (some species, such as V. cholerae, V. parahaemolyticus and V. vulnificus, cause serious disease in humans) and experience a variety of environmental conditions and selection forces, leading to high genomic plasticity [31, 32]. Consequently, differentiation of sister species becomes very difficult. For instance, V. cholerae and V. mimicus have nearly indistinguishable phenotypes. Among the phenotypic tests used in Bergey's Manual, only sucrose fermentation and lipase activity may discriminate the two species. According to the most recent version of Bergey's Manual, a Vibrio species is defined as a group of strains forming small (0.5-0.8 × 1.4-2.6 μm) comma-shaped rods with polar flagella enclosed in a sheath, having a facultatively anaerobic metabolism, and capable of fermenting D-glucose and growing at 20°C. Primarily aquatic, most species are oxidase positive, reduce nitrate to nitrite, require Na+ for growth, and ferment D-fructose, maltose, and glycerol. Each vibrio species is further identified by an array of over 100 phenotypic tests. There is no operational definition for genera within the vibrios. In our hands, vibrio species may be better defined on the basis of amplified fragment length polymorphism (AFLP) and MLSA [21, 22, 25, 34]. Strains of the same species (including the type strain) share more than 60% mutual AFLP band pattern similarity and more than 95% similarity in MLSA (using the loci rpoA, recA, pyrH, ftsZ, topA, mreB, gyrB and gapA). More importantly, strains of the same species and species of the same genus will form monophyletic groups on the basis of MLSA. This was the main argument used to propose the newly described genus Aliivibrio.
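The 95% MLSA criterion can be illustrated with a short calculation. The sketch below is hedged in several respects: it assumes that "similarity" means percent nucleotide identity over the concatenated alignment of the eight loci, that each locus has already been aligned between the two strains (equal length, gaps written as "-"), and that gapped columns are ignored. The function names and the input layout (a dictionary mapping locus name to aligned sequence) are hypothetical.

```python
# The eight loci named in the text.
MLSA_LOCI = ("rpoA", "recA", "pyrH", "ftsZ", "topA", "mreB", "gyrB", "gapA")


def concatenated_identity(strain_a: dict, strain_b: dict, loci=MLSA_LOCI) -> float:
    """Percent identity over all aligned, ungapped columns of the given loci."""
    matches = 0
    compared = 0
    for locus in loci:
        a = strain_a[locus].upper()
        b = strain_b[locus].upper()
        if len(a) != len(b):
            raise ValueError(f"locus {locus} is not aligned (lengths differ)")
        for x, y in zip(a, b):
            if x == "-" or y == "-":
                continue  # skip alignment gaps (an assumption of this sketch)
            compared += 1
            matches += x == y
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared


def same_species_by_mlsa(strain_a: dict, strain_b: dict,
                         loci=MLSA_LOCI, threshold: float = 95.0) -> bool:
    """Apply the 'more than 95% similarity' cut-off quoted in the text."""
    return concatenated_identity(strain_a, strain_b, loci) > threshold
```

Note that the cut-off is strict: a pair of strains at exactly 95% identity does not satisfy "more than 95%". The monophyly requirement mentioned in the text is a separate, tree-based criterion that this calculation does not address.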
To test the feasibility of genomic taxonomy in vibrios, several markers were analysed in a collection of 32 genomes, including four newly pyrosequenced genomes. Several Vibrio strains have completely sequenced genomes that are available on the web. Eleven V. cholerae and two V. mimicus genomes formed an ideal test case for taxonomy because of their close relatedness as sister species. These sister species have nearly identical 16S rRNA sequences and around 70% DDH. Disclosing species-specific patterns for the different genome-wide markers would reinforce their usefulness in prokaryotic taxonomy. The aim of this study was to extract taxonomic information from vibrio genome sequences that would allow species identification, by means of a detailed analysis of MLSA, supertree, Nc, AAI, genomic signatures, Genome BLAST atlas and pangenome plot.