Evolutionary analysis of foot-and-mouth disease virus serotype SAT 1 isolates from east africa suggests two independent introductions from southern africa

Background In East Africa, foot-and-mouth disease virus serotype SAT 1 is responsible for occasional severe outbreaks in livestock and is known to be maintained within the buffalo populations. Little is known about the evolutionary forces underlying its epidemiology in the region. To enhance our appreciation of the epidemiological status of serotype SAT 1 virus in the region, we inferred its evolutionary and phylogeographic history by means of genealogy-based coalescent methods using 53 VP1 coding sequences covering a sampling period from 1948-2007. Results The VP1 coding sequence of 11 serotype SAT 1 FMD viruses from East Africa has been determined and compared with known sequences derived from other SAT 1 viruses from sub-Saharan Africa. Purifying (negative) selection and low substitution rates characterized the SAT 1 virus isolates in East Africa. Two virus groups with probable independent introductions from southern Africa were identified from a maximum clade credibility tree. One group was exclusive to Uganda while the other was present within Kenya and Tanzania. Conclusions Our results provide a baseline characterization of the inter-regional spread of SAT 1 in sub-Saharan Africa and highlight the importance of a regional approach to trans-boundary animal disease control in order to monitor circulating strains and apply appropriate vaccines.


Background
Foot-and-mouth disease (FMD) is an acute, highly communicable and economically important disease of livestock and it also affects wild ruminants [1]. The causative agent, foot-and-mouth disease virus (FMDV) belongs to the Aphthovirus genus in the family Picornaviridae. Its positive-sense, single-stranded RNA genome of 8.5 kb is translated into a polyprotein which is post-translationally cleaved to 4 structural (VP1, VP2, VP3, VP4) and 8 nonstructural proteins [2]. The structural proteins form the capsid of the virion and, with the exception of VP4, are surface exposed. The VP1 is involved in the interaction with the host cells via the RGD-dependent integrins [3]. The coding sequence for VP1 has been widely used in studies of evolutionary dynamics of FMDV needed for the understanding of the epidemiological patterns of these viruses and for determining possible sources of outbreaks [4][5][6]. The genetic diversity of FMDV is a consequence of the high mutation rate due to the error-prone RNA polymerase lacking proofreading activity [7].
There are seven immunologically distinct serotypes (O, A, C, SAT 1, SAT 2, SAT 3 and Asia 1) of FMDV, each with a wide spectrum of antigenic and epidemiological subtypes distributed around the world [5]. The Southern Africa Territories (SAT) serotypes are restricted in their distribution mainly to sub-Saharan Africa and they co-exist with the Euro-Asiatic (O, A, C) serotypes in the East African region although serotype C has not been reported since 2004. In southern Africa, the epidemiology of the SAT serotypes is mainly associated with African buffalos (Syncerus caffer) which act as reservoirs and sources of outbreaks [8,9]. In eastern Africa, FMD is prevalent in wildlife and within the African buffalo in particular although their role in the epidemiology of the disease has not been as widely studied as in southern Africa. Most outbreaks of FMD in the region are reported among livestock populations. The African buffalo has been reported to be a carrier of the SAT serotypes but not the Euro-Asiatic serotypes in East Africa [10][11][12]. This is similar to the situation in southern Africa. Widespread animal movements in the eastern Africa region are possibly responsible for longterm circulation and reintroductions of FMDV strains, including SAT 1 [13]. However, little quantitative information exists about the extent of such livestock and wildlife mediated dispersal of FMDV as well as the origin and evolutionary history of the SAT 1 viruses circulating in eastern Africa [13,14]. Furthermore, the connectivity between the individual countries and the main routes of dispersal remain unknown, although such information would be of great value in containing the spread of the disease and avoiding introduction of novel strains against which existing vaccine programs may offer little protection.
We have investigated the emergence of FMDV SAT 1 diversity in the region by inferring the phylogeographic history by means of genealogy-based coalescent methods. Furthermore, we have tested for evidence of recombination in the data set which is known to bias phylogenetic inferences as described previously [15][16][17].

Results
Phylogenetic relationships, substitution rates and divergence times The VP1 coding sequences of 11 additional serotype SAT 1 FMD viruses from East Africa have been determined. Using this information, the complete VP1 coding sequences of 8 southern Africa, 14 western Africa, 3 Sudanese, 1 Ethiopian and 27 East Africa FMD serotype SAT 1 viruses from the period 1948 to 2007 were analysed to determine phylogenetic relationships, phylogeography, divergence times and substitution rates. Dating of the common root of the samples showed considerable uncertainty in determination with a mean estimate for the most recent common ancestor (TMRCA) at 538 years before present (ybp) (95% highest posterior density (HPD): 228-897 ybp). The inferred maximum clade credibility (MCC) tree is shown in Figure 1 with the posterior probabilities for the branches shown. The East African SAT 1 viruses formed two main clades (lineages) labelled A and B supported by high posterior probabilities. The Ugandan viruses differed from those of Tanzania and Kenya and are of mainly one lineage (A) while one isolate (UGA/13/74) grouped with viruses of the Sudan and western Africa. Kenyan and Tanzanian viruses grouped together in lineage B and were related to a Zimbabwean isolate. Only little geographic structure was observed within lineage B isolates from Kenya and Tanzania, suggesting high migration rates between these countries. The mean nt substitution rate was 1.30 × 10 -3 substitutions/site/year (s/s/yr) (95% HPD: 5.43 × 10 -4 -2.18 × 10 -3 ) with distinct variation in rates among the clades. We analysed the East African viruses (comprising 27 samples) separately in BEAST and found relatively lower rates at 2.75 × 10 -4 s/s/yr (4.69 × 10 -5 -7.39 × 10 -4 ), while the western African viruses (comprising 14 samples) had higher rates at 6.91 × 10 -3 s/s/yr (3.32 × 10 -3 -1.04 × 10 -2 ).
While the location of the root of the SAT1 tree could not be identified with particular confidence (Bayes factor, BF = 1.5 comparing the posterior probability of the root being in southern Africa against Sudan), there was relatively strong support for the location of several of the remaining nodes in the MCC tree ( Figure 1). From the location-annotated MCC tree, two separate introductions from southern Africa to East Africa were supported by the data, namely one leading to lineage A and one leading to lineage B. In addition, there was strong support for two separate introductions of SAT 1 from Sudan to western Africa. Bayes factor tests revealed that the most significant routes of inter-regional dispersal were the Sudan-western Africa (BF = 21.4) and southern Africa-Kenya/Tanzania (BF = 4.5). No link between the Ugandan and the Kenyan/Tanzanian samples (between lineage A and B) could be identified, and this was in fact found to be the link with the second-lowest posterior support. The western Africa-Kenya/Tanzania link had the lowest support (results not shown).
Predominant purifying selection in the VP1 coding region of FMDV SAT 1 The majority of codons in the VP1 coding region of FMDV SAT 1 appeared to be under purifying (negative) selection. Of the 221 codons analysed, 153 were found to be under negative selection using three methods (single-likelihood ancestor counting, SLAC, at P = 0.1, fixed effects likelihood, FEL, at P = 0.1 and random effects likelihood, REL, with BF > 50) as summarized in Table 1. Five sites (codons 47, 61, 99, 143 and 147) were identified to be under positive selection by at least one method but no site was identified by all the three methods together at values of P = 0.1 (SLAC and FEL) or BF > 50 (REL). However, at P = 0.2, codon 147 (H/ N/E/T, 2 codon positions before the receptor binding motif RGD) was identified by all the methods and was mostly likely to be under true positive selection.
The genetic algorithm (GA) branch analysis showed that 5 rate classes were supported with a large number of models (over 1500 in the 95% confidence set). No branches had significant support for dN >dS although differences existed in the branch selection pattern indicating that some branches may have been under weak positive selection.

Recombination
The genetic algorithm for recombination detection (GARD) detected a putative recombination breakpoint at nucleotide position 168 with a change in Akaike's Information Criterion (AIC c ) of more than100 which suggested support for the recombination model while the Kishino-Hasegewa test showed support for significant topological incongruence at P = 0.01. Indeed, the exploratory analysis using the recombination detection programme (RDP2) had at least one method detect some recombinant sequences (TAN/60/99 and K66/80). However, further analysis did not support the view that these sequences were recombinant and the exclusion of these sequences from the analysis did not affect the phylogenetic results, indicating that they are not likely to be true recombinants (results not shown).

Discussion
FMDV serotype SAT 1 virus strains from East Africa analysed in this study grouped into 2 distinct clades (lineages with > 20% nucleotide divergence) designated   here as lineages A and B. While one of these lineages (A) was found exclusively in Uganda, the other had virus strains from Tanzania and Kenya. Over the whole sampling period, Kenyan and Tanzanian isolates were interspersed in one clade of the phylogenetic tree, suggesting that these countries form a single ecosystem for SAT 1. The separate introduction of lineages A and B to eastern Africa from southern Africa was supported by the high posterior probabilities of the location states in the phylogeographic analysis. A close association of Kenyan/Tanzanian and southern African lineages has been observed earlier [13], but the link between Ugandan and southern Africa lineages reported here reveals a previously undiscovered aspect of the ancestry of the East African SAT 1 lineages. This new finding was not due to a different data set, but rather to our Bayesian phylogeographic analysis framework. When we constructed a neighbour-joining tree using similar methods to [13] using our data set, we were not able to infer a southern African origin for the Ugandan lineage.
Several interesting aspects about the history of sub-Saharan SAT 1 viruses emerge from our continental phylogeographic approach. First, we found that the most likely root location of SAT 1 is in southern Africa. Because of the relatively deep root of the tree (~538 ybp), we could not achieve unequivocal posterior support for this root location (also see [18]). We found a strong link between western African and Sudanese SAT 1 sequences (in agreement with [13]) and our results suggest that the route of entry of SAT 1 into western Africa has been along the Sahel rather than through the rain forest belt surrounding equatorial Africa. A Ugandan isolate from 1974 was found to belong to a lineage otherwise consisting of Sudanese, Ethiopian and western African strains, and the phylogeographic analysis suggested this was an incursion from Sudan. Hence, Ugandan SAT 1 strains appear to be derived from two different sources, southern Africa and Sudan, respectively.
The sampling scheme used in this study may to some extent have affected the outcome of the phylogeographic analysis. For example, we cannot exclude that the inclusion of more samples from Uganda would alter the posterior state probability of some nodes in the tree to reflect an earlier introduction of SAT 1 into Uganda. Given that Uganda is represented in both of the two major clades, it may have played a more prominent role connecting southern African SAT 1 viruses with those of Sudan, Ethiopia and western Africa. Such a scenario seems plausible given the central location of Uganda according to our definition of location states. Furthermore, we cannot exclude that additional samples from Uganda will show phylogenetic affinity with the surrounding countries. This could be tested by acquiring more Ugandan samples. In fact, more recent SAT 1 virus isolates from Uganda have grouped within the Ugandan lineage A [19], in agreement with the phylogeographic conclusions reported here. In general, however, we stress that our findings should be viewed as a null hypothesis about continental SAT 1 dispersal against which studies based on more comprehensive sampling can be tested. Denser sampling (both temporally and spatially) can be expected to reveal novel dispersal patterns not observed here and further address the fine-scale historical movement of the serotype.
The substitution rate inferred in our study differs considerably from [20]. This leads to a significantly deeper tree, and hence it is difficult for us to put our results into a historical context that includes all FMDV serotypes. Our mean estimate of 538 ybp for the TMRCA of SAT 1 actually predates that of the whole FMDV found in [20], although it is within the 95% HPDs reported in that study (218-1250 ybp). We caution that the time line of our phylogeographic tree should not be regarded as conclusive and that further studies are needed to establish the rate of evolution in FMDV. Our inferred rate is, however, closer to the reported mean rate of evolution across all serotypes (2.48 × 10 -3 ) for the VP1 coding sequence [20]. In [20], the SAT 1 virus sequence was found to have a roughly 3-fold faster rate than the species average. We speculate that this exceptionally fast rate could be derived from the sampling scheme in [20], where many of the included SAT 1 isolates are from the same epidemic outbreaks. This tends to yield faster rates of evolution, since what is recovered is actually the mutation rate rather than the long-term substitution rate subject to selection and other forces [21], leading to a bias towards higher rates and a shallower tree. In accordance with this, we did find much faster rates of evolution in the western African samples; all collected during two epidemic outbreaks each spanning just two years. However, regionally variable evolutionary rates may in fact reflect real differences in the epidemiological dynamics and host-interaction of FMDV. For example, buffalos and other wildlife may play a more prominent role in the epidemiology of SAT 1 in eastern than in western Africa, and this may give rise to changed patterns of evolution of virus lineages in the two regions. Considerable localized differentiation in evolutionary rates has not previously been observed in FMDV, and although potentially informative concerning epidemiology and evolution, it also complicates evolutionary estimates based on global or widespread sample collections. Given these two (not necessarily mutually exclusive) causes of the observed rate heterogeneity, it is vital that future studies address the caveats in using the VP1 coding sequence to infer evolutionary rates and history.
Purifying (negative) selection was the most predominant evolutionary force at play among the SAT 1 viruses. At least 153 codon positions including the RGD motif (amino acid residue positions 149-151) of VP1, required for receptor interaction, were estimated to be under purifying selection signifying amino acid conservation as reflected in the low evolutionary rates. There was less evidence for positive selection although a few sites may have been under adaptive selection. Amino acid sites that are distinct between the regional virus groups as well as conservation of the RGD motif were observed when inferred using MEGA version 4 [22] and is in agreement with previous reports [13]. These evolutionary patterns may reflect the observed apparent long term circulation of some virus strains in the region previously reported in [13]. It has also been observed that genetic heterogeneity may be limited by evolutionary constraints [23]. There was no evidence for the presence of recombination within the VP1 coding sequences (in agreement with observations that recombination is largely restricted to non-structural coding regions with very few phylogenetic incongruities in the capsid proteins [24][25][26]) adding confidence to our results.

Conclusions
We have inferred the most likely phylogeographic history of SAT 1 in sub-Saharan Africa. We found evidence that the SAT 1 viruses circulating in Uganda and Kenya/Tanzania represent independent phylogeographic lineages. Kenya and Tanzania appear to experience a much greater exchange of viruses at their respective southern and northern borders through the transboundary livestock and wildlife movements (a common feature in this area) than with Uganda. This highlights the importance of a regional approach to trans-boundary animal disease control. It is apparent from the SAT 1 analysis presented here that monitoring of the emerging strains in the region is required for the success of vaccination strategies.

Virus Isolates
Eleven (10 Kenyan and 1 Tanzanian) SAT 1 virus isolates for this study (collected between 1977 and 2006) were obtained from the Embakasi FMD laboratory in Nairobi which is a repository of all FMD sample materials collected in Kenya. Virus was isolated from clinical material according to standard procedures on baby hamster kidney (BHK) cells. The details of the isolates are shown in Table 2.

Viral RNA extraction, cDNA synthesis and amplification
Total RNA was extracted and cDNA synthesized as previously described [27]. The complete VP1 coding region was amplified using the primer pair, FMD AKS (5'-ATGGGACACAGGTCTGAACTCGA-3') and FMD- 2B58 [28] applying PCR reagent volumes and conditions as previously described [27]. PCR products were visualized, purified and cycle-sequenced using the same primers as for PCR above.

Phylogeographic analysis
In addition to the eleven sequences generated in the study, 42 (17 from East Africa and 25 the rest of Africa) other complete VP1 coding sequences available in the GenBank covering a sampling period from 1948-2007 were included to put the results from East Africa into a continental SAT 1 context. The sequences were assembled and aligned using the software program Geneious version 4.6 [29]. The best fitting nucleotide substitution model was tested by means of a hierarchical likelihood ratio test (LRT) and the Akaike information criteria (AIC) as implemented in MrModeltest version 2.2 software [30] and executed in PAUP* version 4b10 software [31]. The selected model was general time-reversible (GTR) [32] with gammadistributed rates among sites and a proportion of invariable sites.
Phylogenetic relationships, evolutionary rates and population size changes were co-estimated for the whole data set and geographic subsets using a Bayesian Markov Chain Monte Carlo (MCMC) method implemented in the BEAST (Bayesian evolutionary analysis sampling trees) software version 1.6.0 package [33]http://beast.bio. ed.ac.uk using the selected model of nucleotide substitution. The method utilizes the sampling time of the sequences to infer rates of evolution along lineages, time of TMRCA and demographic history. A recent extension of the software allows tracking of the geographic location state along the phylogenetic tree, yielding posterior estimates of the location of each branch/node in the tree given the phylogenetic uncertainties [18]. Given our limited data set with low representation of many countries, we defined the geographical states as six coherent regions roughly corresponding to areas separated by known topotype boundaries. The regions include: western Africa, Ethiopia, Sudan, Uganda, Kenya/Tanzania and southern Africa (Table 2). We used a Bayesian stochastic search variable selection (BSSVS) without distance informed priors on diffusion rates, as this has been shown not to improve confidence in the phylogeographical state assignment when dispersal patterns are complex such as with many viruses (e.g. [18]). Rate indicator log files were inspected in Tracer software version 1.4 http://tree.bio.ed.ac.uk/software/tracer/, and Bayes factor tests were carried out to test the most significant routes of dispersal using the Rate Indicator BF tool of the BEAST package.
In a preliminary analysis, we tested four different demographic models/coalescent priors as suggested in [33]. In addition, we tested the appropriateness of a strict clock versus various versions of relaxed clocks available in BEAST [34]. This process of model selection suggested the constant population size model with an uncorrelated exponential clock to be the best fit to the data. We used a HKY+G+I substitution model [35] with four rate categories, as in a recent influenza study ( [18], Additional file 1). The MCMC chains were run long enough (100 million steps) to allow high effective sample sizes (ESSs) (above 250 for most parameters, minimum 100 for all parameters) with a 10% burn-in as viewed in Tracer. Statistical uncertainties of the substitution rates and the MRCA were summarized as the lower 95%, mean, and upper 95% values of the HPD interval. Mean evolutionary rates (averaged over branches weighted by their lengths) were measured as the number of nucleotide substitution per site per year (s/s/y). Maximum clade credibility trees were obtained using Tree Annotator program in BEAST and visualized with FigTree version 1.1.2 software http://tree.bio.ed.ac. uk/software/figtree/.

Selection and recombination detection
Tests for selection were performed using four methods which estimate selection in a phylogenetic context available in the Datamonkey web interface [36]. The best-fitting nucleotide substitution model was selected using the automated link. To identify codon sites under positive (adaptive) or negative (purifying) selection, we used the single-likelihood ancestor counting, the fixed effects likelihood and the random effects likelihood methods. The SLAC and FEL methods estimate selection on a site-by-site basis with the former method comparing observed to expected synonymous and non-synonymous rates while the latter uses two models which assume independent and equal rates and a likelihood ratio test to determine significance. The REL method determines independent general discrete distributions for the global synonymous and non-synonymous rates using a codon based model which are then used as priors for Empirical Bayes analysis of site selection [36]. The integrative selection analysis option in Datamonkey was then used to increase confidence on the estimation of selection at a site if all three methods support it. To test the hypothesis that different selective environments were acting on the branches of the phylogeny, we used the GA branch method to estimate dN/dS. To add confidence to our coalescent inferences, the presence of recombination in the data was tested using the GARD method [37] on the Datamonkey server with topological incongruence significance estimated by the Kishino-Hasegawa test [38] and also by the exploratory methods implemented in RDP version 2 beta 0.8 software [39] which included; RDP, [40] GENECONV, [41] Bootscan, [42] MaxChi, [43] and Chimaera [44].

Additional material
Additional file 1: BEAST XML file for the FMDV serotype SAT 1 analysis. Input XML file used for BEAST relaxed molecular clock and constant size prior analysis of VP1 coding region.