- Research article
- Open Access
Characterization of fossilized relatives of the White Spot Syndrome Virus in genomes of decapod crustaceans
© Rozenberg et al. 2015
- Received: 5 May 2015
- Accepted: 13 May 2015
- Published: 19 July 2015
The White Spot Syndrome Virus (WSSV) is an important pathogen that infects a variety of decapod species and causes a highly contagious disease in penaeid shrimps. Mass mortalities caused by WSSV have a pronounced commercial impact on shrimp aquaculture. To date, WSSV is the only known member of the virus family Nimaviridae, a group with obscure phylogenetic affinities. Its isolated position makes WSSV studies challenging due to the large number of genes without homology in other viruses or cellular organisms.
Here we report the discovery of an unusually large number of sequences with high similarity to WSSV in a genomic library from the Jamaican bromeliad crab Metopaulias depressus. De novo assembly of these sequences allowed for the partial reconstruction of the genome of this endogenized virus with a total length of 200 kbp encompassed in three scaffolds. The genome includes at least 68 putative open reading frames with homology in WSSV, most of which are intact. Among these, twelve orthologs of WSSV genes coding for non-structural proteins and nine genes known to code for the major components of the WSSV virion were discovered. Together with reanalysis of two similar cases of WSSV-like sequences in penaeid shrimp genomic libraries, our data allowed comparison of gene composition and gene order between different lineages related to WSSV. Furthermore, screening of published sequence databases revealed sequences with highest similarity to WSSV and the newly described virus in genomic libraries of at least three further decapod species. Analysis of the viral sequences detected in decapods suggests that they originate from ancestral infection events rather than from contemporary WSSV infection. Phylogenetic analyses suggest that genes were acquired repeatedly by divergent viruses or viral strains of the Nimaviridae.
Our results shed new light on the evolution of the Nimaviridae and point to a long association of this viral group with decapod crustaceans.
- White Spot Syndrome Virus
- Endogenized viruses
The White Spot Syndrome Virus (WSSV) is a widespread pathogen of several commercially important penaeid shrimp species which causes mass mortalities in aquaculture with high economic impact. The virus is highly virulent, but can also induce latent and asymptomatic infections, which makes creation of virus-free penaeid stocks challenging. Since its discovery in shrimp farms, the virus was detected in a number of other marine and even freshwater decapods: caridean shrimps, brachyuran crabs, anomurans, lobsters and freshwater crayfish [3–7].
WSSV has one of the largest genomes among the viruses that infect animals, with a single circular DNA molecule of about 300 kbp in size [8, 9]. Several lineages of WSSV were described from different geographic locations, with a considerable variation in certain genomic regions, including repeat number variation and deletions leading to differences in genome sizes. Nevertheless, this variation affects only a small part of the genome, while 99.3 % remain identical. Thus, all WSSV strains known to date belong to the same viral species, formally classified in the genus Whispovirus and a monotypic family – Nimaviridae [1, 10, 11]. This fact is especially noteworthy in the light of the isolated position of this group among DNA viruses as revealed by phylogenetic reconstructions based on DNA polymerase and protein kinase amino acid sequences [12, 13]. Moreover, among the approximately 180 genes identified in the large WSSV genome only a minor fraction shows similarities to already described proteins in other viruses or cellular organisms [1, 8, 9]. About one third of the gene products have been functionally characterized in direct experiments or homology-based bioinformatic approaches: proteins involved in cellular functions (e.g. DNA polymerase, helicase, protein kinases etc.), virion proteins, gene products with proposed roles in latency and early phase of infection and others [1, 8, 9, 14].
The description of new representatives of this phylogenetically distinct viral group is important to allow for evolutionary studies of the Nimaviridae and could advance the understanding of host-pathogen interactions in this system.
Here we report the discovery of a WSSV-like viral genome in a genomic library from the brachyuran crab Metopaulias depressus Rathbun, 1896 that is endemic to Jamaica and lives and breeds in water axils of bromeliads [15, 16]. Our results suggest that these viral sequences, together with fragments detected in systematic screenings of previously published decapod genomic libraries [17, 18], stem from several related viral species divergent from WSSV and provide valuable information on the evolutionary history of Nimaviridae.
The detailed description of the procedures and results of 454-Roche shallow sequencing of a genomic library of Metopaulias depressus has been published elsewhere. Briefly, DNA was extracted from muscle tissue using the Qiagen DNA Blood and Tissue Kit and 5 μg of it was sent to Macrogen Inc. (Seoul, South Korea) for library preparation and sequencing. The library was pooled together with other libraries and sequenced on a 454 GS-FLX sequencer (Roche). The crab library yielded 186,890 reads with an average length of 265.5 base pairs (bp). Our initial estimate of the amount of viral DNA in the data was roughly 10 % and this was the only library among the analyzed crustacean species in which WSSV-like sequences were detected. To identify as many reads belonging to the viral genome as possible, the following approach was used. The whole dataset was assembled in MIRA v. 18.104.22.168 with relaxed parameters (--job=denovo,est,draft,454, -CO:asir=yes -SK:mnr=no 454_SETTINGS -ED:ace=yes -AL:mrs=75:mo=15 -SK:pr=75) and the resulting contigs were used in BLASTx v. 2.2.29+ (E-value threshold: 0.01) and BLAT v. 35x1 (-t=dnax -q=dnax) searches against the genome of the WSSV isolate from Thailand (AF440570). All contigs matching the WSSV genome were isolated and the associated reads were reassembled with more stringent parameters (MIRA: --job=denovo,est,accurate,454), yielding 918 contigs with 15 of them exceeding 5 kbp. To obtain more accurate information on the consensus sequences of the scaffolded contigs, all reads were mapped back to the chosen MIRA contigs using a local alignment algorithm with Bowtie2 v. 2.0.4 (settings: “--local --sensitive”). The resulting output files were further processed by custom scripts: all soft-clipping regions left by Bowtie2 were trimmed and all alignments with >80 % of the length covered by short tandem repeats (STRs) were discarded; STRs were detected using Phobos v. 3.3.12 with the default parameters adjusted to minimal STR perfection of 80 % and a maximum unit length of six.
Twelve contigs at least 8 kbp long were chosen for scaffolding, while another contig (10 kbp) showed an anomalous coverage profile (presumably mis-assembled) and was not used during scaffold construction. Scaffolding was performed manually based on sequence similarity and gene annotations with subsequent verification of the existence of the overlaps with PCR and Sanger sequencing in questionable cases. The same verification approach was utilized for low coverage regions (<10x per base coverage). Respective primers (Additional file 1: Table S1) were designed with Primer3 v. 2.2.3. One of the smaller contigs (c152) was later included to extend scaffold I based on the presence of an overlap spanning fragments of two ORFs (homologs of wsv119 and wsv035).
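The STR-based filtering of mapped reads described above can be illustrated with a minimal Python sketch. It is not the authors' custom script: the function names, the half-open coordinate convention and the interval representation are assumptions made for illustration, and the STR intervals are taken as given (for example, parsed from a repeat finder's output).

```python
def covered_fraction(aln_start, aln_end, str_intervals):
    """Fraction of the half-open alignment [aln_start, aln_end) covered by STRs.

    str_intervals is an iterable of half-open (start, end) pairs on the same
    reference coordinates (a hypothetical representation).
    """
    length = aln_end - aln_start
    if length <= 0:
        return 0.0
    # Clip every STR interval to the alignment and sort by start position.
    clipped = sorted(
        (max(s, aln_start), min(e, aln_end))
        for s, e in str_intervals
        if s < aln_end and e > aln_start
    )
    covered = 0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Close the previous merged block and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or touching interval: extend the current block.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / length


def keep_alignment(aln_start, aln_end, str_intervals, max_fraction=0.8):
    """Keep an alignment unless more than max_fraction of it is STR sequence."""
    return covered_fraction(aln_start, aln_end, str_intervals) <= max_fraction
```

Overlapping STR annotations are merged before counting, so a base annotated twice is counted only once towards the 80 % limit.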
The three available WSSV isolates and their genomes are referred to as WSSV-CH (an isolate from Penaeus japonicus, China, accession number AF332093; 305,108 bp), WSSV-TH (originating from Penaeus monodon, Thailand, AF369029; 292,967 bp) and WSSV-TW (from Penaeus monodon, Taiwan; AF440570, 307,287 bp). As the Chinese isolate was previously chosen as the type strain, in the current paper we utilize the gene notation system accepted for WSSV-CH with consecutive numbers prefixed by “wsv”.
Potential protein-coding ORFs and their fragmented copies were searched for with BLAT and BLAST. BLAT searches for protein sequences coded by all 184 ORFs recognized in the WSSV-TH genome against the assembled contigs were performed with default settings. BLASTx queries against the GenBank nr1 database (accessed: June 2014) were performed with default parameters and a relaxed E-value threshold of 0.1. In addition, regions between the obtained annotations and unscaffolded contigs were inspected with local BLASTx searches against protein sequences of all potential 532 ORFs from the genome of WSSV-CH with an E-value threshold of 0.01. Finer alignments and detection of frameshifts were carried out by inspection of predictable ORFs with eukaryotic start and stop codons and by performing pairwise translation alignments with ORFs from the WSSV genomes in MACSE v. 0.9b1.
Some occurrences of supposed frameshifts and ambiguous gene boundaries were further verified with Sanger sequencing as described above (see Additional file 1: Table S1).
For subsequent analyses, protein sequences for the annotated loci were predicted based on the standard eukaryotic genetic code. For this purpose, frameshifts were corrected manually. Resulting amino acid sequences were aligned to respective sequences from the WSSV-CH or WSSV-TH genomes in MAFFT v. 7.017 (G-INS-i mode).
Protein domain prediction
Conserved protein domains were identified based on the amino acid alignments and InterProScan searches.
RECON v. 1.05 was used to predict interspersed repeats. Results from a BLASTn search using the discontiguous-megablast algorithm and an E-value threshold of 10⁻⁵ were used as the input for RECON. To facilitate comparisons, the same analysis was performed for the genome of WSSV-TH.
Reanalysis of WSSV-like sequences from shrimp genomic libraries
Repetitive sequences containing WSSV-like fragments from a fosmid and a BAC library prepared from two penaeid shrimp species (Penaeus monodon and P. japonicus) were downloaded from the Penaeus Genome Database (http://sysbio.iis.sinica.edu.tw/page/) and GenBank (accession number AP010878), respectively. The sequences from the fosmid library were represented by 103 repetitive element families with a maximum length of 24 kbp, while the BAC library comprised three fragments of 77, 32 and 9 kbp in length. To locate the WSSV-like genes, BLASTx searches were performed against all ORFs from the WSSV-CH genome with an E-value threshold of 0.01. dUTPase and IAP genes annotated in the two publications were included in the downstream analyses irrespective of their direct association with WSSV-like sequences. Gene boundaries were determined based on ORF locations and homology to WSSV and Metopaulias viral genes.
WSSV-like sequences from EST libraries
To search for WSSV-like sequences in the NCBI non-human and non-mouse EST database (accessed: February 2015), a tBLASTn homology search was used. Protein sequences of all 184 accepted genes from WSSV-TH plus one ORF from the WSSV-CH assembly with homologs in Metopaulias (wsv419, see Results) were used as queries. The BLAST results were filtered with an E-value threshold of 10⁻⁵. The EST sequences detected in this way were subsequently used as BLASTn and BLASTx queries against the NCBI nt and nr databases (accessed: February 2015), respectively. Sequences that had WSSV genes as their best BLASTx hits with an E-value cutoff of 10⁻⁵ and at the same time no BLASTn hits, i.e. showing similarity to WSSV on the protein level only, were regarded as WSSV-like.
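The final classification rule for EST sequences can be written down as a small predicate. The sketch below is only an illustration under stated assumptions: the record type `EstHits`, its field names and the label "WSSV" for the best-hit source are hypothetical, the BLAST results are assumed to have been summarized per EST beforehand, and the cutoff is treated as inclusive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EstHits:
    """Per-EST summary of the two follow-up searches (hypothetical layout)."""
    est_id: str
    best_blastx_source: Optional[str]    # source of the best BLASTx hit, None if no hit
    best_blastx_evalue: Optional[float]  # E-value of that hit, None if no hit
    n_blastn_hits: int                   # number of BLASTn hits against nt


def is_wssv_like(hits, evalue_cutoff=1e-5, virus_label="WSSV"):
    """True if the best protein-level hit is WSSV and no nucleotide-level hit exists."""
    if hits.best_blastx_source is None or hits.best_blastx_evalue is None:
        return False
    return (
        hits.best_blastx_source == virus_label
        and hits.best_blastx_evalue <= evalue_cutoff
        and hits.n_blastn_hits == 0
    )
```

Requiring zero BLASTn hits is what restricts the set to sequences that resemble WSSV on the protein level only, which excludes ESTs derived from WSSV itself.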
Protein diversity and similarity
All homologous genes from the decapod genomic libraries were traced and aligned with their orthologs in the WSSV genome using MAFFT (G-INS-i mode). To calculate comparable estimates of protein identity all positions containing at least one gap, unknown amino acid, stop codon or mis-translated regions caused by frameshifts were excluded. In cases of incompatibly truncated protein sequences several alignments were produced for the same gene. Similarity of viral proteins from the decapod libraries and the WSSV-genome was visualized as a hive-plot in HivePanelExplorer (https://github.com/sperez8/HivePanelExplorer) with the axes corresponding to the different genomes, orthologs represented as nodes and edges connecting them reflecting homology and the degree of similarity. The set of genes chosen for WSSV included 106 genes with reported function and/or homology in the decapod libraries. All protein kinases from the decapod libraries were treated as homologs of wsv423 (PK1) (see Results). The ORFs wsv064 and wsv065 were treated as one gene following the evidence from WSSV-TH, WSSV-TW and the Metopaulias scaffolds.
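The column-masking step used to obtain comparable identity estimates can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the set of excluded symbols (`-` for gaps, `X` and `?` for unknown residues, `*` for stop codons) are assumptions, and regions mis-translated because of frameshifts would have to be masked beforehand since they cannot be recognized from the symbols alone.

```python
EXCLUDED = set("-X?*")  # assumed symbols for gap, unknown residue and stop codon


def usable_columns(alignment):
    """Indices of alignment columns with no excluded symbol in any sequence."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [
        i
        for i in range(lengths.pop())
        if all(seq[i].upper() not in EXCLUDED for seq in alignment)
    ]


def p_distance(seq_a, seq_b, columns):
    """Proportion of differing residues over the given columns."""
    if not columns:
        raise ValueError("no comparable columns")
    diffs = sum(1 for i in columns if seq_a[i].upper() != seq_b[i].upper())
    return diffs / len(columns)


# Toy alignment (hypothetical sequences, for illustration only).
aln = ["MKT-LV", "MRTALV", "MKTAXV"]
cols = usable_columns(aln)
```

Masking columns across the whole alignment, rather than pair by pair, keeps every pairwise distance based on the same set of positions; pairwise identity is then one minus the p-distance.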
Synteny analysis was based on the comparison of the positions of the annotated genes in the Metopaulias scaffolds of viral origin and their orthologs in the WSSV genome, and visualized in the program MizBee. To avoid interference of potential mis-assembly, scaffold III was conservatively broken at the low-coverage region (see “Genome assembly” in Results) and at two locations with insertions of large interspersed repeat units (see “Interspersed repeats” in Results).
To facilitate cross-comparison to the shrimp genomic libraries, which are represented by short fragments encompassing only a few genes per fragment, we utilized a breakpoint-oriented approach as defined by Sankoff and Blanchette to summarize topological differences. In short, for all genomes involved in the comparison only shared genes were retained and the remaining pairs of neighboring genes were traced. If presence or absence of a certain gene pair could not be confirmed in at least one fragmented genome, the pair was discarded. A script implementing this breakpoint-oriented approach (breakpoints.pl) is available under https://github.com/har-wradim/miscngs. In the P. monodon fosmid library several homologs of the same WSSV-genes were found (see Results) of which only the homologs on the longest respective genome fragments were considered for the breakpoint analysis.
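The core of a breakpoint comparison can be illustrated with a simplified Python sketch. It is not a port of breakpoints.pl: the function names and the input layout (each genome as a list of linear fragments, each fragment a list of gene names with a leading "-" for the opposite strand) are assumptions, and the step that discards gene pairs whose presence or absence cannot be confirmed in a fragmented genome is deliberately left out.

```python
def parse_gene(token):
    """'-wsv327' -> ('wsv327', -1); 'wsv332' -> ('wsv332', 1)."""
    return (token[1:], -1) if token.startswith("-") else (token, 1)


def gene_names(genome):
    return {parse_gene(tok)[0] for fragment in genome for tok in fragment}


def adjacencies(genome, shared):
    """Set of neighboring gene pairs, restricted to shared genes.

    A pair (x, y) read on one strand equals (-y, -x) read on the other,
    so each pair is stored in a canonical orientation.
    """
    pairs = set()
    for fragment in genome:
        genes = [g for g in map(parse_gene, fragment) if g[0] in shared]
        for (n1, s1), (n2, s2) in zip(genes, genes[1:]):
            forward = ((n1, s1), (n2, s2))
            reverse = ((n2, -s2), (n1, -s1))
            pairs.add(min(forward, reverse))
    return pairs


def breakpoints(genome_a, genome_b):
    """Number of adjacencies of genome_a that are not found in genome_b."""
    shared = gene_names(genome_a) & gene_names(genome_b)
    return len(adjacencies(genome_a, shared) - adjacencies(genome_b, shared))


# Toy example with hypothetical gene names: inverting g2-g3 creates two breakpoints.
reference = [["g1", "g2", "g3", "g4"]]
inverted = [["g1", "-g3", "-g2", "g4"]]
```

Because adjacencies are stored in a strand-independent form, a fragment and its reverse complement yield the same set, which matters when short fragments have arbitrary orientation.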
For phylogenetic reconstructions we chose all non-structural genes found among the sequences of viral origin in Metopaulias except for those present in WSSV only. Thus, phylogenetic affinities of WSSV and the WSSV-like sequences from the decapod libraries were analyzed based on alignments of the genes coding for DNA polymerase, dUTPase, non-specific endonuclease, helicase, protein kinases (PKs), ribonucleotide reductase subunits (RR1 and RR2), TATA-box binding protein (TBP) and inhibitor of apoptosis (IAP). To identify potential closely related proteins from GenBank (accessed: July 2013), BLASTp searches were performed using WSSV and WSSV-like protein sequences from M. depressus as queries with an E-value threshold of 0.01. The best hits (up to 500) were further filtered based on the taxonomic proximity and pairwise identity values using MAFFT alignments: single sequences were retained for groups of proteins of high similarity from closely related organisms. After filtering, new MAFFT alignments were performed using either the G-INS-i mode for proteins with global homology (dUTPase, BIR and RING domains of IAP, RR1, RR2) or the E-INS-i mode for multi-domain proteins (all other cases) with all other parameters being set to default values. Blocks of ambiguous homology (mainly gaps between conserved domains) were excised manually. Maximum-likelihood phylogenies were reconstructed in RAxML v. 7.2.7 with “PROTGAMMALG” as the selected substitution model. Bootstrap support values were estimated based on 500 replicates. Since for different genes incompatible lists of organisms were collected with BLAST and since the resulting topologies sharply diverged, a concatenated dataset was not analyzed.
To assess infection status in M. depressus specimens two fragments were chosen for a PCR assay: a fragment from the homolog of the WSSV main nucleocapsid protein VP664 (wsv360) (primer pair VP664:2 in Additional file 1: Table S1) and a fragment of the DNA polymerase gene (wsv514) (primer pair DNApol:4). DNA was extracted from muscle tissue for 22 crabs sampled from different locations across Jamaica using the Puregene kit (Gentra Systems) (coordinates of the sampling sites are available on request). DNA integrity was confirmed by PCR of host multicopy genes (data not shown). Negative controls of the PCR mix without template were run in every PCR reaction to control for contamination of the PCR reagents.
The sequences of the assembled viral scaffolds from M. depressus are available in NCBI GenBank (accession numbers KR820240-KR820242). The raw reads are accessible from the NCBI Sequence Read Archive (BioProject PRJNA283742).
Viral sequences in the crab genomic library
De novo assembly of the M. depressus genomic library resulted in 563 contigs (max. contig size 32,245 bp; N50 = 1,931 bp) including 20,563 reads. Of these, 13 contigs were chosen for semi-manual scaffolding. These contigs were composed of 14,972 reads (8.01 % of the whole genomic library) and the mean coverage reached 34.1x per site after read mapping. Three scaffolds with lengths of 100,139, 62,154 and 46,520 bp (208,813 bp in total, GC content 44.1 %) successfully incorporated all 13 contigs of putative viral origin. Nevertheless, considerable variation was observed between different reads in terms of single nucleotide polymorphisms and length variation of STRs. Most of the low-coverage and contig junction regions were confirmed with PCR and Sanger sequencing with the exception of one low-coverage region on scaffold III (primer pairs C9:1 and C9:2: see Additional file 1: Table S1). It was also not possible to fill gaps between the scaffolds.
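For readers unfamiliar with the assembly statistics quoted above, the following Python sketch shows one common definition of N50 and re-derives two of the reported figures from numbers given in the text. The function name and the toy contig lengths are hypothetical; the scaffold lengths and read counts are the ones reported here.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Toy contig lengths (hypothetical), used only to show the definition.
toy_n50 = n50([10, 8, 6, 4, 2])

# Figures taken from the text: three scaffold lengths and read counts.
scaffolds = [100_139, 62_154, 46_520]
total_scaffold_length = sum(scaffolds)
viral_read_share = round(100 * 14_972 / 186_890, 2)  # percent of all reads
```

With these inputs the scaffold lengths sum to the reported 208,813 bp and the share of reads in the 13 chosen contigs comes out at the reported 8.01 %.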
Orthologs of WSSV genes identified on the three viral scaffolds of Metopaulias depressus
Columns: predicted protein length (aa); ORF number in WSSV; protein in WSSV; pairwise identity (%); alignment length (aa).
Named WSSV proteins among the listed orthologs include: TATA box binding protein (TBP); VP674, VP76, class I cytokine receptor; ribonucleotide reductase large subunit (RR1); ribonucleotide reductase small subunit (RR2); protein kinase 1 (PK1); VP544, VP60; inhibitor of apoptosis protein.
Thirty-six (53.7 %) of the reconstructed genes are homologous to previously functionally characterized genes of WSSV (and other organisms in the case of IAP). Out of the 21 non-structural genes reported for WSSV [1, 2, 8, 34, 35] twelve were found on the reconstructed Metopaulias scaffolds. These include most of the proteins involved in replication and nucleotide metabolism: the DNA polymerase, helicase, non-specific nuclease, dUTPase and both subunits of the ribonucleotide reductase, while no homologs of the chimeric thymidine/thymidylate kinase and the thymidylate synthase were found. Additionally, homologs to both homologous-region-binding proteins were identified: one of them (wsv021) is represented by a considerably truncated ORF in M. depressus, while the second one (wsv078) is presumably of full length (see Table 1).
Out of the 17 major virion proteins currently recognized in WSSV we found homologs of genes coding for nine of them: VP664, VP160B, VP51C (nucleocapsid); VP95, VP39A, VP26 (tegument); and VP53A, VP28, VP19 (envelope). An intact ORF with a predicted amino acid sequence similar to the collagen protein from WSSV (wsv001, VP1684) was also identified, but contained far fewer tandem collagen tripeptide (Gly-Xaa-Yaa) repeats (388 vs. 26 units).
Fragments homologous to WSSV ORFs identified only on non-scaffolded viral contigs from Metopaulias
Columns: ORF number in WSSV; number of contigs; longest match; match length (bp).
Homologs of WSSV genes represented by more than 20 fragments on scaffolds and/or unscaffolded contigs
Columns: ORF number in WSSV; number of fragments.
The longest identified interspersed repeat (see Fig. 1) corresponds to a region on scaffold III intersecting intact homologs of wsv327 and wsv147 interrupted by a fragment homologous to wsv191, and a similar region with broken copies of all three ORFs on scaffold II. Among WSSV homologs represented multiple times the homologs to wsv327 and wsv191 were detected at least thrice: wsv327 with five fragments of varying lengths, one of them being the intact ORF neighboring a homolog of wsv332 (the adjacent ORF in the WSSV genome); and wsv191 (the endonuclease) represented by three different copies, one of them being the intact ORF neighboring a homolog of wsv192.
Prevalence of WSSV-like sequences in Metopaulias individuals
Our PCR assay was designed on the basis of the homologs of wsv360 (VP664) and wsv514 (the DNA polymerase) represented on the scaffolds. This choice was motivated by the functional significance of these genes in WSSV and apparent lack of variation in the respective fragments in the reference specimen (but see below). The PCR test was positive for all 22 DNA samples from Metopaulias individuals from across Jamaica (Additional file 2: Figure S1). For twelve individuals the DNA polymerase fragment was Sanger-sequenced in both directions to produce 715-bp sequences. No nucleotide position was variable among individuals, but at least one position appeared to be polymorphic in each one of the sequences: a synonymous A/G variation at position 111, which was also confirmed for the reference specimen (Additional file 2: Figure S2).
WSSV-like sequences in the shrimp genomic libraries
Although the inventories of the WSSV-like genes were provided in the original publications [17, 18], we decided to perform a new analysis similar to the one performed for the M. depressus library, primarily to clarify gene boundaries and extract protein sequences. For the P. monodon repetitive families our BLASTx search yielded 76 hits (of which 48 were reported in the original study) to 48 different WSSV genes (see Additional file 1: Table S3). Seven hits to seven WSSV genes were obtained for the P. japonicus BAC library, all of which were originally reported and numbered (genes 11, 13–18) (see Additional file 1: Table S4).
Additionally, in both libraries single dUTPase ORFs and several IAP genes were confirmed in locations indicated in the original publications. All genes were represented by single continuous ORFs with the exception of one IAP gene from the P. japonicus BAC library, which we interpret as consisting of at least four exons. In contrast to Koyama and co-authors, we consider it separately from the IAP gene next to it and designate them as genes 06A and 06B respectively (see below).
WSSV-like sequences in EST libraries
Cross-comparison of WSSV and WSSV-like genomes
Diversity and similarity of proteins with homology in WSSV
Comparisons of p-distances between 38 proteins shared by the WSSV genome (W), M. depressus scaffolds (M) and penaeid libraries (P): summary statistics, correlation coefficients and results of Wilcoxon signed rank tests. Asterisks indicate significance levels after Holm’s correction: * – p < 0.05, **** – p < 0.0001
Pearson’s r \ Median difference
Many of the WSSV-homologs are found several times on different fragments in the P. monodon library. Due to the fact that many of them are represented by incomplete gene sequences, the exact number of genes with several WSSV paralogs could not be estimated. However, 22 pairs of overlapping fragments representing 19 unique genes could be determined (see Additional file 3) with the mean (±SD) p-distance of 0.687 ± 0.0728. No unambiguous cases of more than two overlapping paralogs were found. Twelve fragments from P. monodon could be compared to the seven WSSV homologs from the P. japonicus library with the resulting mean p-distance of 0.593 ± 0.2340.
Synteny between WSSV and M. depressus scaffolds
Analysis of breakpoints
In order to enable comparison of gene orders with the data from the shrimp libraries, we utilized a breakpoint-oriented approach. For the set involving WSSV, M. depressus scaffolds and P. monodon repeat families, 50 pairs of neighboring genes could be inspected for their presence/absence. Resulting breakpoint distances were nearly identical for all three pairs of viral genomes: WSSV genome vs. crab scaffolds – 28 breakpoints, WSSV genome vs. shrimp library – 28 breakpoints, crab scaffolds vs. shrimp library – 30 breakpoints (no statistical test performed). In total only seven neighboring gene pairs were shared by all three gene sets (the “-” sign denotes genes on the opposite strand): wsv023/wsv021, -wsv327/wsv332, wsv306/wsv308, wsv293a/wsv289, wsv139/wsv137, -wsv427/wsv433 and wsv433/wsv440. Due to the low number of genes in the fragment containing WSSV-like sequences from P. japonicus, its addition decreased the number of gene pairs amenable to the analysis to ten with only a single pair of neighbors shared by all four gene sets: -wsv327/wsv332, which was also the sole pair shared by P. japonicus and P. monodon.
The WSSV genome and the M. depressus scaffolds each contain a single-copy helicase gene, while two distinct helicase sequences are identifiable in the P. monodon library. The helicases from WSSV and the decapod libraries cluster in a well-supported clade presumably at the base of eukaryotic helicases while bacterial helicases form a distinct clade (predominantly Bacteroidetes) (see Additional file 2: Figure S4). Among the four helicases of viral origin the sequences from P. monodon occupy basal positions and do not form a clade with each other.
Ribonucleotide reductase subunits
TATA-box binding protein
Among the two WSSV PK enzymes the sole protein kinase identified in M. depressus is presumably an ortholog of WSSV PK1, given the higher similarity and shared location of the ORF next to the homolog of wsv421. The two kinases from the P. monodon library are roughly equally distant from WSSV PK1 and PK2 and from each other (see Additional file 3) and no positional information is available for them. The alignment yielded a tree with poor resolution around the five protein kinases (see Additional file 2: Figure S9). Nevertheless, they form a distinct clade in the PK tree with high bootstrap support. The two P. monodon proteins group with WSSV PK2, but do not form a cluster with each other, while WSSV PK1 clusters with the sole PK sequence from M. depressus with weak support. Among the PK-sequences sampled with BLASTp, serine-threonine kinases were the only class of PKs with explicitly specified activity.
Inhibitor of apoptosis proteins
The tree for the BIR-domains of the IAPs associated with WSSV-like sequences in the decapod libraries and IAPs from different animal taxa is poorly resolved due to the very low number of positions (72 amino acids) (Additional file 2: Figure S10). Still, three or four large clades involving IAPs from the three decapod libraries and four previously identified IAPs from penaeids and a caridean shrimp are discernible and roughly correspond respectively to the first, second and third BIR domains of the decapod three-BIR-domain IAPs. Three second BIR domains from the shrimp repetitive sequences form a separate clade close to the clade of the first BIR domains. The IAP from M. depressus has only two BIR domains: one of them clusters together with the group of second domains of other IAPs, and the other one does not belong to any of the three decapod clades. Another outlying BIR domain is the third BIR domain of the IAP coded by gene 06A from the P. japonicus library. Not all of the IAPs contain RING domains, and the phylogeny reconstructed based on the respective alignment (35 positions only) shows a grouping of the RING domains from two IAPs from the two shrimp libraries together with IAPs independently identified in penaeid and caridean shrimps (see Additional file 2: Figure S11). The RING-domain from the M. depressus IAP does not belong to this clade.
The phylogenetic tree based on the alignment for the dUTPase protein sequences also lacks resolution (see Additional file 2: Figure S12). The four dUTPases from WSSV and the three decapod libraries do not form a single monophylum and cluster together with other sequences in different parts of the tree, albeit with low support values. At the same time, the dUTPases from the two shrimp libraries are closely related and form a well-supported grouping.
The White Spot Syndrome Virus (WSSV) is currently considered the sole representative of an isolated viral group with unclear relationships to other viruses [1, 8]. In this study we describe a viral genome with high similarity to WSSV discovered in a genomic library of the Jamaican bromeliad crab Metopaulias depressus. The genome is seemingly nearly complete given the fact that undamaged homologs of most of the WSSV non-structural proteins (including the DNA polymerase) and half of the major virion protein genes were identified. Nevertheless, we postulate that this virus is not independent, but endogenized in the genome of its host, because: 1) despite the high sequencing coverage we were not able to reconstruct a single circular molecule as known for WSSV, and no connection between the three reconstructed scaffolds could be detected with PCR, 2) some fragments of WSSV-like genes are clearly not functional, 3) several transposon-derived sequences are found in association with the WSSV-like sequences, 4) there is considerable intra-individual variation in the specimen chosen for sequencing, and 5) analogous variation is obtained in a test fragment sequenced for several other crab specimens from Jamaica. The unexpectedly high ratio of the viral to host DNA (1:10) and intact state of the genes with WSSV homology located on the scaffolds are mirrored by the two similar cases of WSSV-like sequences discovered in genomic libraries of penaeid shrimps [17, 18] that were reanalyzed for this study.
Relationships between decapod WSSV-like sequences and WSSV
In all three cases of the WSSV-like viruses in decapod genomes, the similarity to WSSV is restricted to the protein level only. Together with the difference in gene order this clearly indicates that these sequences are not derived from contemporary infections by WSSV. Despite the fact that the genomes of the crab and the two penaeids contain very high amounts of viral sequences of similar origin, respective endogenization events must have taken place independently, since no WSSV-like sequences were found by us in two other decapod libraries: from a brachyuran crab and a caridean shrimp (see Fig. 3). Moreover, while M. depressus and P. japonicus contain single sets of WSSV-like genes in their genomes, P. monodon unambiguously shows signs of multiple genome integration events as already pointed out by Huang and co-authors. In addition, our analysis of the EST database shows that the phenomenon of the presence of WSSV-like sequences in genomes (or infection with WSSV-like viruses) might be relatively common among decapods and urges a systematic investigation.
Phylogenetic trees reconstructed for slowly evolving genes and the fact that the viral sequences integrated in the crab genome are much more similar to WSSV genes than their counterparts from the shrimp genomes, indicate that the divergence of the viruses integrated in penaeid genomes predates the split between WSSV and the virus integrated in M. depressus.
The gene orders in WSSV and in the sequences of viral origin in decapods are very different, although the shuffling is incomplete and syntenic blocks can still be recognized in the scaffolds reconstructed for M. depressus. Inspection of shared neighboring genes for the largest gene sets, WSSV, M. depressus scaffolds and P. monodon repetitive elements, shows that all three viral genomes are equidistant from each other in terms of gene order. Most of these structural differences were likely present in the WSSV-like viruses before their integration in the decapod genomes, because the rearrangements do not seem to involve any host genes in the case of M. depressus and only in isolated cases were they involved in the fragments of viral origin in shrimps [17, 18].
Ancestral gene composition and new gene acquisitions
Despite the structural dissimilarity and prominent differences in homologous protein sequences, the viral genome detected in M. depressus on the one hand and the viral sequences found in the shrimp databases on the other show striking similarity in gene content. While the WSSV genome is predicted to encode about 180 genes, we were able to identify homologs of only 67 of them in M. depressus and 48 in P. monodon. Of these, homologs for 38 genes were detected in all three gene sets. This indicates that most of the WSSV genes either 1) are lineage-specific, 2) have high evolutionary rates and cannot be detected with the homology-based methods we utilized, or 3) were not recovered due to the selective loss of genes in the fossilized viral genomes. The fact that the majority of ORFs in the viral genome recovered from the M. depressus library remain uncharacterized indicates that the comparatively low number of WSSV homologs detected is not due to the incompleteness of the draft genome. In a single case we were able to recover a WSSV homolog thanks to positional information only, indicating low sequence conservation for this gene. The existence of lineage-specific genes is evidenced, for example, by the dUTPase genes. While a single dUTPase gene is present in WSSV and its homologues were also present in the decapod viruses, at least three independent gene acquisition events can be inferred. A similar situation with purported multiple acquisitions of dUTPases is known for Siphoviridae.
A complex history also underlies the evolution of the inhibitor of apoptosis proteins (IAPs). While two proteins involved in apoptosis suppression were previously identified in WSSV [39, 40], neither of them showed similarity to the IAP family. At the same time, a single IAP ORF was found in the M. depressus virus and several IAP-coding genes were discovered in repetitive sequences in shrimps, even if not always in clear direct association with WSSV-like sequences [17, 18]. Most of these IAPs group together with IAPs independently identified in penaeid and caridean shrimps. The sole IAP from the M. depressus scaffolds, however, may be of mosaic origin: only one of its three domains undoubtedly comes from the same source as other decapod IAPs.
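The gene-content figures quoted above can be combined by simple inclusion-exclusion. The sketch below is only a back-of-the-envelope illustration under stated assumptions: the reference count of 180 WSSV genes is approximate, each homolog is assumed to correspond to a distinct WSSV gene, and the 38 genes found in all three gene sets are taken as the complete overlap between the M. depressus and P. monodon sets. The function and key names are hypothetical.

```python
def content_summary(n_reference, n_a, n_b, n_shared):
    """Partition reference genes by detection in two derived gene sets."""
    only_a = n_a - n_shared
    only_b = n_b - n_shared
    either = only_a + only_b + n_shared
    return {
        "only_a": only_a,
        "only_b": only_b,
        "shared": n_shared,
        "either": either,
        "neither": n_reference - either,
        "fraction_detected": either / n_reference,
    }


# Counts taken from the text; 180 is an approximate gene count (assumption).
summary = content_summary(n_reference=180, n_a=67, n_b=48, n_shared=38)

for key, value in summary.items():
    print(key, value)
```

Under these assumptions, fewer than half of the WSSV genes have a detectable homolog in either of the two fossilized gene sets, which is the observation the three explanations above are meant to account for.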
Fossilized WSSV-like viruses and the decapod hosts
The protein similarities between WSSV orthologs are highly correlated, indicating that different genes have their own substitution rates, which have remained relatively stable since the divergence of the viruses. This, together with the absence or relative rarity of frameshifts and premature stop codons in the WSSV-like genes from the decapods, can be explained either by their contemporary functionality in the host genomes or by very recent genome integration events. While the second scenario cannot be completely excluded, direct evidence does exist in favor of at least some transcriptional activity of the WSSV-like genes in shrimps [17, 18] (see also Fig. 3). It has been hypothesized that these sequences may play a role in defending the host from WSSV infection. It is noteworthy in this respect that M. depressus is a fully terrestrial crab, which evolved from marine ancestors about 4.5 mya and has since had no persistent contact with other decapods. As WSSV is transmitted horizontally between hosts through water, it is likely that the WSSV-like virus was integrated into the crab genome before the switch of the host to terrestrial habitats. Otherwise, the same or a similar virus may still be present in the crab population, but in that case it must utilize a different way of transmission.
The amount of WSSV-like sequences in all three decapod libraries is surprisingly high: about 10 % in M. depressus and about 22 % in P. monodon, which clearly has important consequences for the genome organization of the hosts [17, 18]. Although some differences between different copies of the same regions of viral origin are identifiable, the relative scarcity of this variation must be explained by the action of mechanisms similar to concerted evolution of rRNA gene clusters (see e.g. ).
The very presence of genomic fragments of WSSV-like viruses in host genomes is rather surprising, since WSSV itself does not integrate in the host genome. Nevertheless, similar cases of fossilization of viruses which normally have no integrated stages were discovered in other animals [44–46].
The presence of large genomic fragments from WSSV-like viruses in decapod genomes is indicative of long-term co-evolution and points to the existence of multiple lineages within this viral group. Other non-fossilized members of the Nimaviridae are thus expected to be discovered. One of the practical consequences of this hypothesis is that the methods currently being developed to detect and resist the spread of the white spot syndrome in penaeid shrimp aquaculture [47–50] would require adjustments to target multiple species from the same group.
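The correlation of ortholog similarities noted above could be checked along the following lines. This is a minimal sketch, not the analysis performed in this study: it computes a Spearman rank correlation in plain Python, and the per-gene identity values are hypothetical placeholders chosen only to show the input format (one value per ortholog, in the same gene order for both comparisons).

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical protein identities to WSSV for five orthologs (placeholders).
identity_crab_virus = [0.62, 0.48, 0.35, 0.55, 0.41]
identity_shrimp_virus = [0.50, 0.37, 0.22, 0.44, 0.30]

rho = spearman(identity_crab_virus, identity_shrimp_virus)
print(f"Spearman rho = {rho:.3f}")
```

A rank-based coefficient is used here because it only assumes that genes conserved in one lineage tend to be conserved in the other, without requiring a linear relationship between the two sets of identities.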
Phylogenetic position of the Nimaviridae
A wider sample of (endogenized) viruses related to WSSV could potentially bring new information about the affinities of this isolated viral group (for earlier attempts see [12, 13, 51]). Although the results are not yet conclusive, it is clear that no other extant virus family is closely related to this group. The DNA polymerase and TBP phylogenies point to a very basal position of the clade within Eukaryota or close to the Eukaryota-Archaea split. Other genes (both subunits of the ribonucleotide reductase and the endonuclease) are seemingly of metazoan or at least opisthokont origin. As already indicated by Wynant and co-authors, the WSSV endonuclease belongs to a specific clade of enzymes found exclusively in Pancrustacea (crustaceans and hexapods); its ortholog discovered in this study shares its structure and basal position within this clade.
Our study gives the first detailed analysis of fossilized relatives of the White Spot Syndrome Virus, an important pest in penaeid shrimp aquaculture with obscure phylogenetic affinities. Genomic fragments of these viruses are found in several decapod species, but they reflect independent endogenization events and are not directly related to contemporary WSSV infection. In this respect our study provides an important starting point for comparative studies of WSSV.
This work would have been impossible without continuous support from the Department of Animal Ecology, Evolution and Biodiversity headed by Prof. Dr. Ralph Tollrian. P.B. was supported by the Germany Scholarship. 454 sequencing was made possible through a student fellowship by the Crustacean Society to N.R. and departmental financial support by Prof. Jürgen Heinze (University of Regensburg). Collection of the material for the current paper by C.D.S. was supported by the German Research Foundation DFG (project Schu 1460/3). This project was supported in part by a grant of the Dinter-Foundation to F.L. and DFG project LE 2323/2.
- Leu J, Yang F, Zhang X, Xu X, Kou G, Lo C. Whispovirus. Curr Top Microbiol Immunol. 2009;328:197–227.
- Khadijah S, Neo S, Hossain M, Miller L, Mathavan S, Kwang J. Identification of white spot syndrome virus latency-related genes in specific-pathogen-free shrimps by use of a microarray. J Virol. 2003;77:10162–7.
- Lo C, Ho C, Peng S, Chen C, Hsu H, Chiu Y, et al. White spot syndrome baculovirus (WSBV) detected in cultured and captured shrimp, crabs and other arthropods. Dis Aquat Organ. 1996;27:215–25.
- Flegel T. Major viral diseases of the black tiger prawn (Penaeus monodon) in Thailand. World J Microbiol Biotechnol. 1997;13:433–42.
- Rajendran K, Vijayan K, Santiago T, Krol R. Experimental host range and histopathology of white spot syndrome virus (WSSV) infection in shrimp, prawns, crabs and lobsters from India. J Fish Dis. 1999;22:183–91.
- Chen L, Lo C, Chiu Y, Chang C, Kou G. Natural and experimental infection of white spot syndrome virus (WSSV) in benthic larvae of mud crab Scylla serrata. Dis Aquat Organ. 2000;40:157–61.
- Pradeep B, Rai P, Mohan S, Shekhar M, Karunasagar I. Biology, host range, pathogenesis and diagnosis of White spot syndrome virus. Indian J Virol. 2012;23:161–74.
- Van Hulten M, Witteveldt J, Peters S, Kloosterboer N, Tarchini R, Fiers M, et al. The white spot syndrome virus DNA genome sequence. Virology. 2001;286:7–22.
- Yang F, He J, Lin X, Li Q, Pan D, Zhang X, et al. Complete genome sequence of the shrimp white spot bacilliform virus. J Virol. 2001;75:11811–20.
- Marks H, Goldbach R, Vlak J, Van Hulten M. Genetic variation among isolates of white spot syndrome virus. Arch Virol. 2004;149:673–97.
- King A, Adams M, Carstens E, Lefkowitz E. Virus Taxonomy: Classification and Nomenclature of Viruses, 9th Report of the International Committee on Taxonomy of Viruses. USA: Elsevier; 2012. p. 1327.
- Van Hulten M, Vlak J. Identification and phylogeny of a protein kinase gene of white spot syndrome virus. Virus Genes. 2001;22:201–7.
- Chen L-L, Wang H-C, Huang C-J, Peng S-E, Chen Y-G, Lin S-J, et al. Transcriptional analysis of the DNA polymerase gene of shrimp white spot syndrome virus. Virology. 2002;301:136–47.
- Marks H, Vorst O, van Houwelingen A, van Hulten M, Vlak J. Gene-expression profiling of White spot syndrome virus in vivo. J Gen Virol. 2005;86:2081–100.
- Schubart C, Diesel R, Hedges S. Rapid evolution to terrestrial life in Jamaican crabs. Nature. 1998;393:363–5.
- Diesel R, Schubart C. The social breeding system of the Jamaican bromeliad crab Metopaulias depressus. In: Duffy JE, Thiel M, editors. Evolutionary ecology of social and sexual systems: Crustaceans as model organisms. New York: Oxford University Press; 2007. p. 365–86.
- Koyama T, Asakawa S, Katagiri T, Shimizu A, Fagutao F, Mavichak R, et al. Hyper-expansion of large DNA segments in the genome of kuruma shrimp, Marsupenaeus japonicus. BMC Genomics. 2010;11:141.
- Huang S, Lin Y, You E, Liu T, Shu H, Wu K, et al. Fosmid library end sequencing reveals a rarely known genome structure of marine shrimp Penaeus monodon. BMC Genomics. 2011;12:242.
- Leese F, Brand P, Rozenberg A, Mayer C, Agrawal S, Dambach J, et al. Exploring Pandora’s box: potential and pitfalls of low coverage genome surveys for evolutionary biology. PLoS One. 2012;7:e49202.
- Chevreux B. MIRA: an automated genome and EST assembler. PhD thesis. Heidelberg, Germany: Ruprecht-Karls University; 2005. p. 161.
- Altschul S, Madden T, Schaffer A, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl Acids Res. 1997;25:3389–402.
- Kent W. BLAT—the BLAST-like alignment tool. Genome Res. 2002;12:656–64.
- Langmead B, Salzberg S. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–9.
- Mayer C. Phobos Version 3.3.12. A tandem repeat search program. 2010. p. 20.
- Rozen S, Skaletsky H. Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol. 2000;132:365–86.
- Ranwez V, Harispe S, Delsuc F, Douzery E. MACSE: Multiple Alignment of Coding SEquences accounting for frameshifts and stop codons. PLoS One. 2011;6:e22594.
- Katoh K, Misawa K, Kuma K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucl Acids Res. 2002;30:3059–66.
- Quevillon E, Silventoinen V, Pillai S, Harte N, Mulder N, Apweiler R, et al. InterProScan: protein domains identifier. Nucl Acids Res. 2005;33:W116–20.
- Bao Z, Eddy S. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 2002;12:1269–76.
- Meyer M, Munzner T, Pfister H. MizBee: a multiscale synteny browser. IEEE Trans Vis Comput Graph. 2009;15:897–904.
- Sankoff D, Blanchette M. Multiple genome rearrangement and breakpoint phylogeny. J Comput Biol. 1998;5:555–70.
- Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006;22:2688–90.
- Gilles A, Meglécz E, Pech N, Ferreira S, Malausa T, Martin J. Accuracy and quality assessment of 454 GS-FLX Titanium pyrosequencing. BMC Genomics. 2011;12:245.
- Li Q, Yang F, Zhang J, Chen Y. Proteomic analysis of proteins that binds specifically to the homologous repeat regions of white spot syndrome virus. Biol Pharm Bull. 2003;26:1517–22.
- Marks H, Ren X, Witteveldt J, Sandbrink H, Vlak J, van Hulten M. Transcription regulation and genomics of White Spot Syndrome Virus. Dis Asian Aquac. 2005;5:363–77.
- Zhu Y, Ding Q, Yang F. Characterization of a homologous-region-binding protein from white spot syndrome virus by phage display. Virus Res. 2007;125:145–52.
- Liu X, Yang F. Identification and function of a shrimp white spot syndrome virus (WSSV) gene that encodes a dUTPase. Virus Res. 2005;110:21–30.
- Baldo A, McClure M. Evolution and horizontal transfer of dUTPase-encoding genes in viruses and their hosts. J Virol. 1999;73:7710–21.
- Wang Z, Hu L, Yi G, Xu H, Qi Y, Yao L. ORF390 of white spot syndrome virus genome is identified as a novel anti-apoptosis gene. Biochem Biophys Res Commun. 2004;325:899–907.
- Leu J, Chen L, Lin Y, Kou G, Lo C. Molecular mechanism of the interactions between white spot syndrome virus anti-apoptosis protein AAP-1 (WSSV449) and shrimp effector caspase. Dev Comp Immunol. 2010;34:1068–74.
- Leu J, Kuo Y, Kou G, Lo C. Molecular cloning and characterization of an inhibitor of apoptosis protein (IAP) from the tiger shrimp, Penaeus monodon. Dev Comp Immunol. 2008;32:121–33.
- Eickbush T, Eickbush D. Finely orchestrated movements: evolution of the ribosomal RNA genes. Genetics. 2007;175:477–85.
- Escobedo-Bonilla C, Alday-Sanz V, Wille M, Sorgeloos P, Pensaert M, Nauwynck H. A review on the morphology, molecular characterization, morphogenesis and pathogenesis of white spot syndrome virus. J Fish Dis. 2008;31:1.
- Katzourakis A, Gifford R. Endogenous viral elements in animal genomes. PLoS Genet. 2010;6:e1001191.
- Patel M, Emerman M, Malik H. Paleovirology—ghosts and gifts of viruses past. Current Opinion in Virology. 2011;1:304–9.
- Thézé J, Leclercq S, Moumen B, Cordaux R, Gilbert C. Remarkable diversity of endogenous viruses in a crustacean genome. Genome Biol Evol. 2014;6:2129–40.
- Witteveldt J, Vlak J. Virus-host interactions of white spot syndrome virus. In: Leung KY, editor. Current Trends in the Study of Bacterial and Viral Fish and Shrimp Diseases. Singapore: World Scientific Publishing; 2004. p. 237–55.
- Ha Y, Soo-Jung G, Thi-Hoai N, Ra C, Kim K, Nam Y, et al. Vaccination of shrimp (Penaeus chinensis) against white spot syndrome virus (WSSV). J Microbiol Biotechnol. 2008;18:964–7.
- Chaivisuthangkura P, Longyant S, Rukpratanporn S, Srisuk C, Sridulyakul P, Sithigorngul P. Enhanced white spot syndrome virus (WSSV) detection sensitivity using monoclonal antibody specific to heterologously expressed VP19 envelope protein. Aquaculture. 2010;299:15–20.
- Pathan M, Gireesh-Babu P, Pavan-Kumar A, Jeena K, Sharma R, Makesh M, et al. In vivo therapeutic efficacy of recombinant Penaeus monodon antiviral protein (rPmAV) administered in three different forms to WSSV infected Penaeus monodon. Aquaculture. 2013;376–379:64–7.
- Liu W, Yu H, Peng S, Chang Y, Pien H, Lin C, et al. Cloning, characterization, and phylogenetic analysis of a shrimp white spot syndrome virus gene that encodes a protein kinase. Virology. 2001;289:362–77.
- Wynant N, Santos D, Verdonck R, Spit J, Van Wielendaele P, Vanden Broeck J. Identification, functional characterization and phylogenetic analysis of double stranded RNA degrading enzymes present in the gut of the desert locust, Schistocerca gregaria. Insect Biochem Mol Biol. 2014;46:1–8.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.