Molecular phylodynamics and protein modeling of infectious salmon anemia virus (ISAV)
© Castro-Nallar et al; licensee BioMed Central Ltd. 2011
Received: 6 July 2011
Accepted: 2 December 2011
Published: 2 December 2011
Skip to main content
© Castro-Nallar et al; licensee BioMed Central Ltd. 2011
Received: 6 July 2011
Accepted: 2 December 2011
Published: 2 December 2011
ISAV is a member of the Orthomyxoviridae family that affects salmonids with disastrous results. It was first detected in 1984 in Norway and from then on it has been reported in Canada, United States, Scotland and the Faroe Islands. Recently, an outbreak was recorded in Chile with negative consequences for the local fishing industry. However, few studies have examined available data to test hypotheses associated with the phylogeographic partitioning of the infecting viral population, the population dynamics, or the evolutionary rates and demographic history of ISAV. To explore these issues, we collected relevant sequences of genes coding for both surface proteins from Chile, Canada, and Norway. We addressed questions regarding their phylogenetic relationships, evolutionary rates, and demographic history using modern phylogenetic methods.
A recombination breakpoint was consistently detected in the Hemagglutinin-Esterase (he) gene at either side of the Highly Polymorphic Region (HPR), whereas no recombination breakpoints were detected in Fusion protein (f) gene. Evolutionary relationships of ISAV revealed the 2007 Chilean outbreak group as a monophyletic clade for f that has a sister relationship to the Norwegian isolates. Their tMRCA is consistent with epidemiological data and demographic history was successfully recovered showing a profound bottleneck with further population expansion. Finally, selection analyses detected ongoing diversifying selection in f and he codons associated with protease processing and the HPR region, respectively.
Our results are consistent with the Norwegian origin hypothesis for the Chilean outbreak clade. In particular, ISAV HPR0 genotype is not the ancestor of all ISAV strains, although SK779/06 (HPR0) shares a common ancestor with the Chilean outbreak clade. Our analyses suggest that ISAV shows hallmarks typical of RNA viruses that can be exploited in epidemiological and surveillance settings. In addition, we hypothesized that genetic diversity of the HPR region is governed by recombination, probably due to template switching and that novel fusion gene proteolytic sites confer a selective advantage for the isolates that carry them. Additionally, protein modeling allowed us to relate the results of phylogenetic studies with the predicted structures. This study demonstrates that phylogenetic methods are important tools to predict future outbreaks of ISAV and other salmon pathogens.
Infectious salmon anemia virus (ISAV) is a pathogen that has been associated with high fish mortality in the aquaculture industry . The cumulative mortality associated with each outbreak of ISAV in Norway and other countries is very high, reaching up to 100% of fish stock in some cases [2–6]. ISAV has been classified as the only member of Isavirus genus belonging to the Orthomyxoviridae family, which includes Influenza viruses . Virions consist mainly of a membranous envelope with two surface glycoproteins and a matrix protein surrounding a ribonucleoprotein complex. Association of nucleoprotein, three polymerase subunits and the genomic RNA, in turn, forms the ribonucleoprotein complex. Virion morphology varies from spherical to pleiomorphic. The enveloped virus particles are of 45-140 nm in diameter, however, highly pleiomorphic particles up to 700 nm in the longest dimension are occasionally observed. The genome organization is consistent with other members of the Orthomyxoviridae; comprising of 8 segments of single stranded negative sense RNA ((-) ssRNA), in which each segment encodes one protein except segments 7 and 8, which encode two (European strains)/three (Canadian strains) and two proteins, respectively . Four major structural proteins have been identified, including a 68 kDa nucleoprotein, a 22 kDa matrix protein, a 42 kDa surface glycoprotein named haemagglutinin-esterase protein with receptor-binding and receptor-destroying activity, and a 50 kDa surface glycoprotein with fusion activity, coded by genome segments 3, 8, 6, and 5, respectively. Segments 1, 2, and 4 encode the putative viral polymerase subunits PB1, PB2, and PA. The ORF1 of segment 7 encodes a nonstructural protein with interferon antagonistic properties, while ORF2 encodes a nuclear export protein. The smaller ORF1 of segment 8 encodes the matrix protein, while the larger ORF2 encodes an RNA-binding structural protein also with interferon antagonistic properties [4, 9–11].
Regarding the genes coding for surface proteins, he genes exhibit a length polymorphism close to the 3' end called highly polymorphic region (HPR) that has been related to virulence, although this is not entirely clear [12–14]. In turn, f genes seem to be more conserved in length, with the exception of the region next to the putative protease-processing site (PPPS) [15, 16]. The fusion protein in ISAV is synthesized as a precursor protein designated as F0. In order to exert its biological function, F0 must be cleaved by cellular proteases to generate F1 and F2 . Several insertions of 10-12 amino acids (IN1-4), identical to sequences from other genomic segments, have been suggested to confer novel protease cleavage sites , and thus they could be subjected to positive selection.
Unlike other members of the Orthomyxoviridae family that commonly infect mammals or birds, ISAV naturally infects fishes of the Salmonidae family. Although natural outbreaks of ISA have only been recorded in farmed Atlantic salmon (Salmo salar), the virus has been isolated also from rainbow trout (Oncorhynchus mykiss)  and Coho salmon (Oncorhynchus kisutch) in Chile . The disease was first reported in Norway and the virus was associated with the disease in 1984. At present, ISAV has been reported in Canada (New Brunswick in 1996 and Nova Scotia in 2000), USA (Maine in 2001), Scotland (1998), The Faroe Islands (2000; Denmark), and recently in Chile [6, 19, 20]. Because of the worldwide distribution and the potential for serious economical losses, ISAV has been listed as a non-exotic disease for the European Union (EU) and is therefore monitored closely by the European Community Reference Laboratory for Fish Diseases http://www.crl-fish.eu/.
In general, phylogenetic studies of infectious diseases have been centered on human and zoonotic diseases with little attention to enzootic pathogens, although with few exceptions, for example, in plant viruses' crop-related diseases [21–24]. Given the human health impact, phylogenetic studies of Orthomyxoviruses have been largely devoted to influenza viruses in particular to influenza A [25, 26]. To date, questions related to global distributions, molecular evolution and adaptations, sources of genetic diversity and emergence of new isolates have been extensively addressed in the influenza literature. However, little has been done with its relatives, e.g., ISAV, even when the same genetic structure, and presumably, similar evolutionary forces might be acting on ISAV populations.
In 1999 an ISAV belonging to the North American (NA) genotype was isolated from Coho salmon in Chile and the first outbreak of ISAV in marine-farmed Atlantic salmon in the Southern hemisphere occurred in the same country starting in June 2007. Given that there are no known natural hosts for ISAV in Chile, only a few studies have investigated whether the outbreak source came from North America or Europe by using basic phylogenetic methods [6, 15, 19, 20].
In this study, we apply more sophisticated phylogenetic and population genetic methods to currently available sequence data to address questions regarding the 2007 Chilean ISAV outbreak. In particular, we asked (i) whether recombination is a relevant force acting on f and he genes, (ii) whether the age estimates of the Chilean clade agree with epidemiological data, and (iii) whether population processes leave marks in ISAV genomes that allow us to reconstruct its demographic history. Finally, we developed new improved structural homology models with which positively selected sites were mapped to relate them with functional traits.
We included 70 isolates from Canada, Chile and Norway for which both genes, f and he, were available along with their sampling dates. The length of the final alignments was 1365 and 1269 bp, respectively (additional file 1). We selected TVM as the best-fit nucleotide substitution model for both genes based on Akaike Information Criterion (AIC) . Rate heterogeneity was estimated as the gamma distributed shape parameter α = 0.5740 and 0.6570 for f and he, respectively. We also tested for whether the data were best explained by a strict or relaxed molecular clock. Bayes Factor hypothesis testing favored the relaxed model in both cases (10.44 and 21.82 log BF for f and he, respectively). This was further supported by the Coefficient of Variation and ucld.stdv parameters.
In order to explore the data and to see whether recombination plays a recognizable role in ISAV evolution, we estimated the population genetic parameter Theta (θ) and recombination rate (r) for each population. When calculated under the infinite-sites assumption, the Canadian and Chilean populations were more diverse than the older Norwegian population (0.08579, 0.06247 and 0.01809, respectively). In turn, when using a coalescent sampler estimator, the Norwegian population turned out to be, as expected, more than three times more diverse 0.04774 (95% CI 0.0198-0.08448) than the young Chilean population 0.0142 (95% CI 0.008368-0.020745), and than the Canadian population 0.03523 (95% CI 0.021083-0.051501) as well. Although at first this seems contradictory, coalescent-based estimators of genetic diversity are far more reliable than those based on segregating sites . By taking into account evolutionary history, coalescent-based estimates are more robust to model assumptions and allow us to determine factors influencing the observed patterns (see below). This overall result has practical implications as less diverse populations, i.e., genetically homogeneous, would be more susceptible to vaccines and antiviral treatments whereas more structured populations, i.e., individuals genetically similar within populations but different among populations, make effective vaccine and antiviral deliveries more difficult. So, population genetic inferences can be instrumental in fish health surveillance programs as demonstrated in human health settings [29–32].
Regarding the f gene, we do not detect any recombination breakpoints. At first this was surprising since small non-homologous fragments have been observed at around 266-276, which might confer novel protease cleavage sites (IN1-4) and therefore have a fitness impact. We think there are at least two potential reasons for our inability to detect recombination in this gene region. First, small fragments are hard to detect with current methods because of the low information they contain. Second, methods are not designed to explicitly look for non-homologous recombination. Also, it would be hard to detect if the INs are evolutionarily linked to the f gene and/or have similar substitution rates.
In order to estimate phylogenetic relationships, we used Bayesian phylogenetic methods to test the Chilean outbreak isolates for monophyly, to estimate rates of molecular evolution across the genomes, and estimate divergence times for clades of particular interest (e.g., geographically isolated outbreaks). Recombination has detrimental effects on phylogeny estimation and phylogeny-based analyses [34, 42, 43] and thus the recombinant portion of the he gene was not included in these analyses. Segmented viruses have the potential to re-assort their segments in novel combinations, which in phylogenetic terms has the same effect as recombination. Therefore, we estimated phylogenies and molecular evolution parameters separately for each gene since concatenating both datasets would render inaccurate inferences as ISAV might exhibit reassortment between segments.
As other (-) ssRNA viruses, ISAV exhibited extremely high substitution rates. Similar to influenza viruses, ISAV replication machinery lacks proofreading activity, which has been proposed as the increased source of variability in RNA viruses [33, 44]. Accordingly, substitution rates were estimated at 7.83 × 10-4 substitutions per site per year (s/s/y) (4.31 × 10-4 - 1.18 × 10-3 95% HPD) for the f gene and 1.039 × 10-3 (s/s/y) (3.87 × 10-4 - 1.86 × 10-3 95% HPD) for the he gene (Figure 2).
Although controversial, Vike et al.  suggested that the source of ISAV in the Chilean outbreak were infected eggs brought from Norway, which could be explained by the fact that until 2008 this country was the main provider of embryonated eggs http://www.sernapesca.cl. It is clear that the Chilean outbreak clade has a sister relationship with Norwegian isolates (closely related to ISAV8 (97/09/615); see Figure 2), that the basal clades and ancestral nodes in the phylogeny are from Norway, and that the tMRCA is around 1989-1996, which matches the date when egg importation grew from less than 20 million to 50 million imported units http://www.sernapesca.cl. In addition, the BSP analysis shows that the ISAV population increased exponentially since 2006, suggesting that the virus population remained constant since 2001 (tMRCA of Chilean outbreak clade) and that the outbreak was publicly reported one year after this predicted population expansion. These results constitute valuable information for fish health officials to make informed future decisions based on what contingency measures were implemented at the time of transmission from Norway to Chile. For instance, it is very interesting to note that in 2011 the main ISAV genotype detected in Chile corresponded to HPR0 http://www.sernapesca.cl. Conversely, HPR7b was nearly 80% prevalent between epidemic years 2007 and 2008 . This could be attributed to the change of source material from Norway to Iceland in 2009. For a future study it would be interesting to address the origin of newly detected avirulent HPR0 strains that would come from a country other than Norway.
In order to find potentially relevant sites in the primary sequence of both surface genes and to determine possible structure-function relations, we used a two-rate Fixed-Effects likelihood (two-rate FEL) approach to find positively selected sites (dN/dS rate ratio > 1). In a gene-based approach, we found that both genes are evolving under purifying selection (dN/dS rate ratio < 1) as expected for fast-evolving entities with relatively small genomes (0.2038 and 0.1382 for he and f genes, respectively).
Using a site-by-site approach, we found that the he gene has two sites under positive selection within codons in positions 134 and 340 (Figure 1B; numbering according to ISAV752 isolate). In the he gene, amino acid Val/Leu134 is located specifically on the globular distal domain (Figure 1A), which is related to HE protein main function, i.e., cell receptor recognition . Amino acid Ser/Arg/Gly/Ala340 lies upstream the HPR region, located in the stalk of the protein (Figure 1A), which prompted us to hypothesize that it might be involved in recombination. Interestingly, the expression of recombinant ISAV HEΔ339-351 results in decreased expression levels, impaired salmon serum recognition, and hema-adsorption activity. In contrast, HEΔ349-351 shows normal parameters , suggesting that positively selected amino acid in position 340 is due to its great importance for HE activity.
We also found three sites under positive selection in the f gene within codons that code for amino acid residues 130, 170, and 279 (Figure 1F). It was not possible to reliably model the f gene protein product structure. However, we were able to map amino acid Ala/Gly/Thr279, which lies immediately downstream of insertions (INs) . This constitutes an interesting finding given that insertions in this region have been reported to confer novel PPPS , as outlined in Figure 1F. In addition, Ala/Gly/Thr279 lies immediately downstream to Arg278 (Arg267 in ISAV fusion without insertion), a putative cleavage site  that has been suggested as a virulence factor . Conversely, a recent study suggested that substitution of Gln/His266 to Leu266 (two positions upstream to our predicted site) in ISAV fusion protein without insertion is in association with an increase of virulence behavior, as seen in Chilean ISAV 901 strain [13, 15]. This has epidemiological implications as isolates carrying these elements would have an advantage compared to those that do not. This finding predicts that the frequency of these alleles will increase in the populations where they are present, e.g., IN4, which is only found in Chile with almost 80% prevalence. It has been suggested that INs close to the cleavage site exposes this region to the solvent probably helping in recognition and processing. Our improved model shows that insertions lie inside the alpha helix structure that is enlarged with subsequent addition of residues, which leads to the consequent exposure of PPPSs (Figure 1D and 1E), in comparison with ISAV901 fusion protein without insertion (Figure 1C). These findings also support the notion that IN's serve as virulence determinants. For instance, the unique insertion, IN4, provides a cleavage site for Asp-N endopeptidase metalloprotease, which has been detected in the skin mucosa of salmonids . Since the HPR in he as well as the INs in f are associated with virulence and are also under natural selection pressure, we think that experimental confirmation of these results will lead to a better understanding of molecular determinants of pathogenicity for both surface proteins.
In summary, this study investigated the evolutionary history of ISAV, an important economic infection for salmon culture worldwide. Although less exploited in non-human diseases, phylogenetic and phylodynamic analyses characterize pathogens in space and time, which is perfectly suited for surveillance programs. For instance, identification of the origin of an outbreak could aid officials in formulating management policies. Though not pursued in this study, phylogeographic analyses could help identify virus sources and venues from which pathogens are spreading into different populations or fisheries. Reconstructing past population dynamics could help monitor how an infectious agent is changing or reacting to sanitary interventions in terms of its effective population size. Also, examining molecular sequences for positive selection, all in a structural context, can identify determinants of virulence or other molecular adaptations.
In this study we found that ISAV, as its relatives, (i) exhibit a low recombination rate, though recombination is present in the he surface gene. We hypothesized that HPR region diversity is governed by recombination probably due to template switching. (ii) We also provided more evidence in favor of the Norwegian origin hypothesis in a coherent phylogenetic and statistical framework. Contrary to the widely accepted view , (ii) ISAV HPR0 genotype is not the precursor of all ISAV he genotypes, but appears to share a common ancestor with the Chilean outbreak clade. Finally, (iv) we identified potential determinants of fitness as identified by being under positive selection in a structural context; the protein models allowed us to relate the results of phylogenetic studies with the predicted structures, which suggest an important structural function for the predicted positively selected sites. Our analysis shows that the ISAV outbreak remained undetected in Chile since 2001 and that the population started to increase prior to the identification of the outbreak. Thus, we demonstrate the utility of evolutionary analyses in characterizing temporal, spatial, and mechanistic aspects of ISAV outbreaks, as well as their use in predicting future outbreaks of ISAV and other pathogens. Undoubtedly, collaboration between basic experimental research and phylogenetic fields will help understand the dynamics of ISAV spread in space and time as it has proven fruitful in human health and emerging infectious diseases studies.
For this study, we focused on the two most abundant genes available, which also have been linked to virulence and used extensively for genotyping, namely the Fusion (f) and Hemagglutinin-esterase (he) genes. Sequences ranging from 1989-2009 were obtained from GenBank according to several criteria. First, in order to implement a molecular clock, we selected only full-length sequences with reported isolation dates. Second, we only used isolates for which both target genes were sequenced. After these criteria were applied, seventy full-length sequences from each gene were retrieved from GenBank (accession numbers in additional file 3) from strains isolated in Canada, Norway, and Chile (alignment in additional file 1). To avoid alignment artifacts and to come up with an optimal statement of homology, nucleotide sequences were visualized in Seaview 4.2.7  and then converted to amino acid sequences for alignment with MAFFT  under L-INS-i algorithm, thereby preserving the open reading frame for these protein coding sequences. Amino acid sequences were then back translated into their original nucleotide sequences for further analyses.
Genetic diversity (θ) within populations was estimated both by the traditional segregating sites approach of Watterson  (implemented in DnaSP5 ) and by a coalescent Bayesian Markov Chain Monte Carlo method (BMCMC ; implemented in Lamarc 2.1.3 ). Recombination breakpoint analysis was performed under a suit of methods looking for different types of recombination evidence (Rdp, Genconv, MaxChi, Chimaera as primary methods and Bootscanning and Siscan as secondary (summarized in ) as implemented in Rdp3 . Windows and step sizes were varied to refine primary evidence of recombination as suggested in the user manual. In addition, recombination detection was performed on both genes by using a genetic algorithm approach looking at topological incongruence (GARD)  as implemented in the http://datamonkey.org server.
The best-fit model of evolution was selected under Akaike information Criterion  (AIC) and parameter values using model averaging as implemented in jModelTest 1.0.1 . The best-fit model was used for subsequent analyses; nevertheless, base frequencies and rates were co-estimated with the phylogeny. A phylogenetic tree was inferred using Bayesian Inference (BI) as optimality criterion as implemented in the Beast 1.6.1 package . Posterior probabilities (PP) were approximated by running four independent analyses of 1 × 108 steps sampled every 1 × 105 each. A burn-in period was defined using Tracer  by testing for convergence on likelihood scores. Burn-in was defined as the period before convergence and those runs were discarded accordingly. Also, we assessed convergence and mixing by checking the effective sample size statistic (ESS > 500) and the trace itself. Strict molecular clock was rejected on the basis of Bayes Factor hypothesis testing between uncorrelated lognormal clock (H1) and strict clock (H0) as implemented in Tracer . A lognormal prior with a mean centered on the reported Influenza A HA gene substitution rate  and tip dates were used to calibrate the clock. The same analysis was run 'on empty' to assess the influence of the selected priors. Estimating Association Index (AI), Parsimony Score (PS), and Maximum Clade number (MC) statistics tested the extent of geographic structure in BaTS . Past population dynamics of both genes were inferred using the Bayesian Skyline Plot (BSP) model as a tree prior . Ancestral State Reconstruction (ASR) was estimated used the method of Lemey et al.  implemented in BEAST as described online http://beast.bio.ed.ac.uk/Tutorials#Phylogeography_tutorials
Both genes, he and f, were investigated for evidence of positive selection. First, a global analysis was performed using a counting method (SLAC). Then, a site-by-site analysis was performed using Fixed-Effects likelihood method as implemented in HyPhy 2.0 [65, 66]. Differences in selective pressures between the Chilean clade and the rest of the tree were estimated with SelectionLRT.bf script also within the HyPhy 2.0 package.
To assess whether selected sites lay on structural and functionally relevant regions of proteins, f and he 3D structure models were build using the following protocol. The sequences of the he and the f genes were obtained from GenBank (ADF36510.1; ABG65799.1; AAX46277.1; AAX46259.1; ADF36509.1 and ADF36499.1). Alignments for HE protein were obtained by modeler 9v9 , and for F the server CLUSTALW was used . To verify that the alignments were suitable, we used the secondary structure prediction server PSIPRED http://bioinf.cs.ucl.ac.uk/psipred/. In order to generate the F protein model, we used two templates: hemagglutinin-esterase-fusion glycoprotein structure of influenza C virus (PDB ID: 1FLC) (3.20Å) and hemagglutinin-fusion glycoprotein structure of influenza A virus (PDB ID: 3EYJ) (2.60 Å), with identities of 21% and 20%, respectively. For the HE protein model, we used hemagglutinin-esterase fusion glycoprotein structure of influenza C virus (PDB ID: 1FLC) with 19% identity as template. Finally, both models were minimized (50,000 steps) and a trajectory of 1 ns was obtained by molecular dynamics using the CHARMM force field  in NAMD program . The quality of the models was tested by the Anolea server .
ECN wants to thank Romina Surot-Madrid for useful comments, Comisión Nacional de Investigación Científica y Tecnológica, Gobierno de Chile - Becas Chile, and to BYU Graduate Mentoring Award 2011-12 for funding. MCS wants to thank to DICYT grant 020943CSM from the Universidad de Santiago de Chile, PBCT-CONICYT grant PDA-20 (Gobierno de Chile) and Grant Innova-Corfo 09MCSS-6698. KAC would like to thank Brigham Young University for continuous support and NIH grant R01GM66276 for funding. All the authors want to thank three anonymous reviewers that helped in improving the final version of the manuscript.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.