Patterns of genetic variation in populations of infectious agents

Gordo, Isabel; Campos, Paulo RA

doi:10.1186/1471-2148-7-116

Research article
Open access
Published: 13 July 2007

BMC Evolutionary Biology volume 7, Article number: 116 (2007)

Abstract

Background

The analysis of genetic variation in populations of infectious agents may help us understand their epidemiology and evolution. Here we study a model for assessing the levels and patterns of genetic diversity in populations of infectious agents. The population is structured into many small subpopulations, which correspond to their hosts, that are connected according to a specific type of contact network. We considered different types of networks, including fully connected networks and scale free networks, which have been considered as a model that captures some properties of real contact networks. Infectious agents transmit between hosts, through migration, where they grow and mutate until elimination by the host immune system.

Results

We show how our model is closely related to the classical SIS model in epidemiology and find that: depending on the relation between the rate at which infectious agents are eliminated by the immune system and the within host effective population size, genetic diversity increases with R₀ or peaks at intermediate R₀ levels; patterns of genetic diversity in this model are in general similar to those expected under the standard neutral model, but in a scale free network and for low values of R₀ a distortion in the neutral mutation frequency spectrum can be observed; highly connected hosts (hubs in the network) show patterns of diversity different from poorly connected individuals, namely higher levels of genetic variation, lower levels of genetic differentiation and larger values of Tajima's D.

Conclusion

We have found that levels of genetic variability in the population of infectious agents can be predicted by simple analytical approximations, and exhibit two distinct scenarios which are met according to the relation between the rate of drift and the rate at which infectious agents are eliminated. In one scenario the diversity is an increasing function of the level of transmission and in a second scenario it is peaked around intermediate levels of transmission. This is independent of the type of host contact structure. Furthermore for low values of R₀, very heterogeneous host contact structures lead to lower levels of diversity.

Background

Patterns of genetic diversity in populations of infectious agents contain important information about their epidemiology and evolution. They depend on the population dynamics of the infectious agents, which involve replication within hosts and transmission between hosts, and on their mutation and recombination rates. Infectious agents vary enormously in their ability to mutate and to transmit, which will lead to large differences in levels of variability. Furthermore, there can be variation within an infectious species in the ability to evade the host immune system. In fact, infectious agent genetic diversity can help in targeting genes under selection pressure created by the immune system [1]. In addition, patterns of infectious agent variation can, under certain circumstances, be used to infer host population history [2], and the level of infectious agent genetic structure may reflect its evolutionary potential [3]. Importantly, the need for a continuous integration between population genetics and epidemiology has been increasingly recognized [4–7].

In population genetics the standard neutral model has a long history in DNA sequence data analysis [8], and has been extensively used as a null model for understanding genetic variation in natural populations, including that in our own species [8, 9]. The standard neutral model makes several simplifying assumptions: in particular it makes the simple assumption that individuals form one single constant size population. When considering populations of infectious agents it is much more reasonable to assume, as the null model, a population composed of a collection of much smaller populations.

Here we develop population genetics models of structured populations, that incorporate epidemiological parameters explicitly, in order to study genetic variability under one of the simplest possible epidemiological models. We ask mainly two questions: 1) what do levels and patterns of sequence variation in these infectious agents look like under this model? And 2) how does host contact structure influence their diversity?

The models we will study here are very similar to the metapopulation models where each subpopulation can go extinct and be recolonized [10–12]. Generally, studies of genetic diversity in such subdivided populations [13, 14] assume a simple symmetric topology for the metapopulation; the most well studied is the island model of Wright. Simple as it is, this model has provided a wealth of results that have led to enormous contributions to our understanding of evolution in structured populations [15, 16]. Nevertheless, there are several reasons to think that this model is too simple to be readily applicable to natural populations [14, 17], especially if the goal is to understand molecular diversity of infectious agents. The underlying topology on which certain disease epidemics spread is that of social networks [18]. Several recent investigations have demonstrated that real networks of interaction have a much more complex structure than that predicted by totally regular networks or totally random networks [19]. Most real networks of social interactions present two different topological properties: a low average pairwise distance between nodes and a high clustering degree (which measures local structuring).

The former occurs in random networks and the latter in regular networks. To capture both properties, several models of network topology have recently been proposed in the literature (for a review see Ref. [20]). One of the most successful models for network structure is the scale-free network [21]. In addition to the common properties of real interaction networks, in scale-free networks the distribution of connectivities obeys a power law, $P(k_{i}) \propto k_{i}^{-\gamma}$, which is observed in some actual systems ranging from the World Wide Web to the network of human sexual contacts [22, 23]. As initially proposed, scale-free networks are dynamical networks where growth and preferential attachment are some of the key mechanisms.

Accordingly, each newly introduced node in the network preferentially joins with an already well-connected node. This process produces a highly heterogeneous network where most nodes have a low connectivity while a few nodes display a very large connectivity. These latter ones are referred to as hubs. The understanding of the interplay between the underlying topology and the forces driving systems is of crucial relevance [24, 25]. One example, which has received a great deal of attention, is that of network epidemiology: the study of epidemic and disease spreading [26–29], which is strictly tied to the topology of social contact networks. In this context, a striking result has arisen from the study of the classical susceptible-infected-susceptible (SIS) epidemiological model on scale free networks: scale-free networks are more prone to spreading of diseases than random graphs and regular lattices [26, 27]. In this kind of model the role of microbe evolution is disregarded. Recently, we have focused on this latter feature and we have shown that although scale-free networks are more prone to infectious agent spread, the accumulation of deleterious mutations in asexual infectious agents with high mutation rates can also be accelerated in this kind of network in comparison to random graphs [30]. This shows that not only disease dynamics but also its evolution should be considered as an important key in the investigation of epidemiological models [7]. Another very important feature that has to be considered is co-evolution between infectious agents and their hosts [31]. Modeling of these complex systems has provided us with insights into how host-parasite interactions can modulate the mode of reproduction [32], ploidy levels [33], the patterns of gene expression in hosts and parasites [34] and how different types of interspecies interactions affect genetic and phenotypic variation [35].

Results and Discussion

Levels of metapopulation infection

The susceptible-infected-susceptible model (SIS model) is one of the simplest classical models in epidemiology. In this model, hosts born susceptible (S) can become infected (I) at a rate β per unit time, given contact with at least one infected host. Infected hosts become susceptible at a rate λ, such that 1/λ is the average duration of an infection. One of the most fundamental quantities to assess the equilibrium frequency of infections in the population is the R₀ of the infectious agent. The R₀ is defined as the number of secondary cases produced by an infectious individual in a totally susceptible population. At epidemiological equilibrium, the frequency of infected individuals is i = 1 - 1/R₀, with R₀ = β/λ. If R₀ < 1 then the infection does not spread.
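The equilibrium frequency quoted above follows from the standard deterministic SIS equation under mass action. The short derivation below is a sketch of that textbook argument, with i the infected fraction and β, λ the rates defined above; it is not a step taken from the simulations.

$$\frac{di}{dt} = \beta\, i\,(1 - i) - \lambda\, i = 0 \quad\Longrightarrow\quad i^{*} = 1 - \frac{\lambda}{\beta} = 1 - \frac{1}{R_{0}}, \qquad R_{0} = \frac{\beta}{\lambda} > 1$$

The other root, i = 0, is the disease-free state, which is the only non-negative equilibrium when R₀ ≤ 1.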

To assess the patterns of variation under the SIS model, we have studied a population genetic model of a structured population that is composed of many small subpopulations, called demes. There is a total of D demes, which are connected according to a given network topology, corresponding to either the island model or the scale free network. These demes can go extinct and be recolonized through migrants that they receive from the other demes. Each deme can contain at most N_d individuals, which reproduce and mutate within each deme (see Methods). Table 1 summarizes the model's key parameters.

Table 1 Model parameterization

We now relate our metapopulation model to the SIS model, and ask what equilibrium patterns of infectious agent genetic variation look like under this model. In our model a deme corresponds to a host. An empty deme means that the host is susceptible, whereas a deme which is full corresponds to an infected host. A deme that is currently full can become empty with probability e, which means that e corresponds to λ. A deme that is currently empty can become full through the migrants it receives from nearby demes. This implies that β is proportional to m. Given that the average connectivity of a deme is K and that the number of migrants per link is N_d m, β corresponds to N_d m K.

In order to assess the correspondence between our model and the SIS model, we have compared the average frequency of infected individuals in our metapopulation with the expectation for the deterministic SIS model, which implies that:

$$i = 1 - \frac{1}{R_{0}} = 1 - \frac{e}{N_{d}\, m\, K} \qquad (1)$$

Equation 1 is the expected frequency when there is no variance in k_i, which is not the case in scale free networks.
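The correspondence between the metapopulation parameters and the SIS quantities can be written out in a few lines. This is a minimal sketch of Equation 1 only; the function names and the numerical values in the example are illustrative assumptions, not taken from the paper.

```python
def basic_reproductive_number(N_d, m, K, e):
    """R0 = N_d * m * K / e, with beta <-> N_d*m*K and lambda <-> e."""
    return N_d * m * K / e


def infected_fraction(R0):
    """Deterministic SIS equilibrium i = 1 - 1/R0 (zero when R0 <= 1)."""
    return max(0.0, 1.0 - 1.0 / R0)


def migration_rate_for(R0, N_d, K, e):
    """Migration rate m that yields a target R0, all else held constant."""
    return R0 * e / (N_d * K)


# Hypothetical values, for illustration only.
N_d, K, e = 20, 6, 0.01
for target in (1.5, 3.0, 10.0):
    m = migration_rate_for(target, N_d, K, e)
    print(target, m, infected_fraction(basic_reproductive_number(N_d, m, K, e)))
```

Changing R₀ through m alone, as in the last helper, mirrors how the simulations vary transmission while keeping the other parameters fixed.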

In Figure 1 we show the results from our simulations, where the proportion of infected individuals in the metapopulation is measured as we increase the transmission coefficient of the infectious agent, β, through increments in m. The results for the different types of networks considered are shown, and the deterministic expectation is also plotted. In all cases R₀ = N_d m K/e, where K = D - 1 for the island model and K = 6 for the scale free topology. The results of the simulations show that the observed and predicted proportions of infected individuals agree closely. In particular, if we assume the topology corresponding to the island model, then the level of infection is exactly that predicted by Equation 1. Note that the prediction holds for an effectively infinite population under the mass action assumption.

One may expect deviations to be observed when these assumptions are violated [36]. Nevertheless the deviations we observe are small, unless R₀ is very low. In fact in the case of very low R₀ there is a high probability that the infection does not spread. For example in the scale free network, if the infection starts in a poorly connected host it may have very little chance of spreading. We performed simulations with the scale free topology in conditions where the infection starts in a single randomly chosen host. With the same parameters as in Figure 1 and for R₀ = 1.5, we observed 66% of cases where the disease could not spread. With R₀ = 3, the fraction of cases where the infectious agent could not invade dropped to 40%.

Levels of metapopulation diversity

We now study the level of genetic diversity in infectious agents sampled randomly from the whole population of infected hosts. We first consider a metapopulation where every host contacts every other host. This corresponds to the island model in the population genetics literature and to the mass action assumption in epidemiological models. We then assess how the level of diversity is affected by differences in the level of contact between hosts, in particular when a small number of hosts can have a very large number of contacts, such as in the scale free network. In Figure 2 we show the level of diversity in samples taken from the whole population, π_t, as we increase R₀, through increments in m.

We observe that, for both topologies and for the sets of parameters considered, the level of π_t is maximal for intermediate values of R₀. For instance, when e = 0.01 this maximum value is achieved at R₀ around 3 for the island model and around 10 for the scale free topology. Beyond these points the level of diversity starts to decrease with increasing R₀. From Figure 2 we observe the occurrence of two quite distinct regimes, according to the level of transmission. In the region of low transmission, R₀ is small, extinction is much stronger than migration (e >> mK), the fraction of infected hosts is small and levels of diversity are low. In fact, starting from R₀ = 1, where the fraction of infected individuals, i, is 0, as we increase R₀ (by increasing m), the level of infection rises and the level of diversity accompanies that increase. In this region the level of infection bounds the level of diversity in the population, since it is expected that diversity will be higher when the total number of infectious agents in the metapopulation is larger. When the level of infection achieves a value close to 0.9, increments in m lead to small increments in i and the level of diversity stops increasing. The second regime comes about at high transmission, where R₀ is very large. In this region migration is much stronger than extinction, mK >> e, the level of infection is close to 1 and so it is not the limiting factor for diversity to grow. From this point, increments in migration cause a drastic reduction in the isolation between demes and lead to a reduction in diversity. In fact, in the limit of extremely high levels of migration the diversity in the structured population tends to that expected in a panmictic population of size N_t = DN_d. So, for very high values of R₀, diversity tends toward the value π_t = π_d = 2N_dDμ, which in the case of Figure 2 is 8, for the value of the mutation rate, μ, assumed.
Figure 2 also shows that in the region of low R₀, diversity in the island model is higher than in the scale free network, whereas for large values of R₀, there is little difference between the topologies. The latter is expected since the larger the value of the migration rate the less important the precise contact structure will be. The former can be understood as follows: a low value of R₀ corresponds to a small fraction of infected hosts both in the island model and in the scale free network. But whereas in the island model new infections of a susceptible host occur from contact with any of the infected hosts in the metapopulation, in the scale free network infections are more likely to come from well connected hosts, which are a small subset of the metapopulation. This then will lead to lower diversity levels in the scale free network, as compared to the island model, for the same low R₀ value.

We have compared our simulation results with some of the analytical approximations for the levels of diversity in metapopulations [13]. In the vast majority of metapopulation models with extinction and recolonization, the island model of population subdivision is assumed. Furthermore, the processes of migration and recolonization are assumed to be distinct. Two different schemes of colonization are normally considered, according to where colonists come from: the migrant pool model and the propagule pool model [14, 37, 38]. In both models there are k colonists (where k is a fixed number independent of migration), which may constitute a random sample from the whole metapopulation (migrant pool model) or from a single deme (propagule pool model). Pannell and Charlesworth [13] have studied levels of within and between population diversity under these models and have provided a set of analytical approximations. We have adapted the approximations in their Table 2, which correspond to the infinite sites mutation model as we assume here, to the metapopulation model that we are studying, which is slightly different from the one they have used. In particular, besides the different types of contact structure studied here, there are two key differences in the models: 1) in our model recolonization and migration are similar processes; and 2) while in the classical model it is assumed that when one deme goes extinct it gets immediately recolonized, in our model when a deme goes extinct it will only be recolonized when it receives migrants. In this way, the equilibrium number of empty demes (susceptible individuals) decreases as m, or R₀, increases. Whereas for infectious agent populations assumption 2) is more appropriate, for some infectious agents assumption 1) may be too simple.
One can imagine that when a host is infected, its ability to transmit the infectious agents to another infected host is reduced compared to its ability to transmit the infectious agent to a susceptible individual. This implies that the migration rate between subpopulations may, in some infectious agents, depend on the host history. We have taken the simplest scenario here.

Table 2 Mean values of Tajima's D in the scale free network with parameters

We have thus compared the expected level of metapopulation genetic diversity in our simulations for the symmetric island model (where every host contacts every other host) with the following approximation:

$$\pi_{t_{isl}} = 2 N_{d} D \left(1 - \frac{1}{R_{0}}\right) \mu \, \frac{R_{0} + \frac{1}{2e}}{N_{d} + R_{0}} \qquad (2)$$

which is adapted from the approximation for the classical case of the migrant pool model of recolonization. We can expect that in the case of the scale free topology, where a large number of hosts have few connections and a few hosts are very well connected, levels of diversity will be closer to those expected for the propagule pool recolonization model. This is because hubs in the network will contribute much more than the other nodes in the process of recolonization. We have thus compared the expected level of genetic diversity for the scale free network with the following approximation for the propagule pool model:

$$\pi_{t_{sf}} = D \left(1 - \frac{1}{R_{0}}\right) \mu \, \frac{1 - e}{e (2 - e)} \qquad (3)$$

which is valid only when mK < e [13]. We therefore expect this expression to provide a good approximation for cases in which R₀ < N_d.

As seen in Figure 3, these formulas provide very good approximations to the simulation results, for low values of R₀. For very large values of R₀, the level of diversity is similar in the two topologies and is very well approximated by Equation 2.
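Equations 2 and 3 are simple enough to evaluate directly. The sketch below does so, reading the term "1 / 2 e" in Equation 2 as 1/(2e); that reading is an assumption, chosen because it recovers the panmictic limit 2N_dDμ at large R₀ and allows the intermediate maximum described in the text. Function names and parameter values are hypothetical.

```python
def pi_island(R0, N_d, D, mu, e):
    """Equation 2: approximation used for the island model."""
    return 2 * N_d * D * (1 - 1 / R0) * mu * (R0 + 1 / (2 * e)) / (N_d + R0)


def pi_scale_free(R0, D, mu, e):
    """Equation 3: approximation used for the scale free network (R0 < N_d)."""
    return D * (1 - 1 / R0) * mu * (1 - e) / (e * (2 - e))


# Hypothetical parameter values, chosen only for illustration.
N_d, D, mu, e = 20, 100, 0.001, 0.01
for R0 in (1.5, 3.0, 10.0, 100.0, 1e6):
    print(R0, pi_island(R0, N_d, D, mu, e), pi_scale_free(R0, D, mu, e))
```

Equation 3 is evaluated here for every R₀ only to show its shape; as stated above it is expected to apply when R₀ < N_d.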

Equation 2 suggests a strong dependence of the level of metapopulation diversity on N_d, the effective population size within a host. This effective population size is likely to vary considerably among different infectious agent species. We have therefore used simulations to explore how the value of N_d affects the levels of diversity.

In Figure 4 we show the results of varying N_d, for both types of network (island model in the left panel and scale free network in the right panel). Figure 4 clearly shows that when e < 1/N_d (filled symbols in both panels), levels of diversity are maximal for intermediate R₀. But for e > 1/N_d diversity always increases with R₀. This occurs both in the island model and in scale free networks. This shows that, independently of the type of host contact structure, for infectious agents with large intrahost effective population size, levels of diversity increase with increasing R₀.
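A rough analytical counterpart of these two regimes can be read off Equation 2. The derivation below is a sketch that assumes the term "1 / 2 e" in Equation 2 means 1/(2e); the shorthand c and f is introduced here only for convenience. Write c = 1/(2e) and factor Equation 2 as 2N_dDμ f(R₀), with

$$f(R_{0}) = \left(1 - \frac{1}{R_{0}}\right)\frac{R_{0} + c}{N_{d} + R_{0}}, \qquad \frac{df}{dR_{0}} = \frac{(N_{d} + 1 - c)\,R_{0}^{2} + 2c\,R_{0} + c\,N_{d}}{R_{0}^{2}\,(N_{d} + R_{0})^{2}}$$

If c ≤ N_d + 1 the numerator is positive for every R₀ > 0, so the approximation increases monotonically with R₀. If c > N_d + 1, that is e < 1/(2(N_d + 1)), the derivative changes sign once and the approximation peaks at

$$R_{0}^{*} = \frac{c + \sqrt{c^{2} + c\,N_{d}\,(c - N_{d} - 1)}}{c - N_{d} - 1}$$

Under this reading, the switch between regimes predicted by Equation 2 is of the same order as the e ≈ 1/N_d criterion seen in the simulations, although the two thresholds are not identical.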

Furthermore, as suggested by Equation 2, for small values of R₀, increasing N_d has a very small effect on the level of diversity, but for intermediate to high R₀ values the effect is more pronounced.

When R₀ > 10, the level of infection is not a limiting factor in the level of diversity, because the number of infected hosts is very high. Thus for large values of R₀ infectious agent diversity will increase with N_d.

Comparing the panels in Figure 4, we can observe that when R₀ << 10, diversity is always smaller in the scale free network, whereas when R₀ >> 10 and e > 1/N_d the levels of diversity are similar in both contact networks. In fact, for large values of R₀, the largest difference between the topologies can be observed when e = 1/N_d.

In this metapopulation model there are two forces which generate diversity within each host: mutation and transmission; there are also two forces that undermine diversity: extinction and genetic drift. So in general, we can expect that, when the forces that reduce diversity are stronger than those that generate it (that is, low R₀, low N_d or high e), diversity levels will be low. Conversely, with high R₀, high N_d or low e, we can expect levels of diversity to be much larger.

Metapopulation mutation frequency spectrum

The spectrum of frequencies of mutations that are segregating in the population is important to understand deviations from the standard neutral model, which assumes an undivided, constant size population at equilibrium between mutation and drift [39]. In fact, the mutational spectrum of infectious agent gene sequences has been used to reject the standard neutral model, suggesting that natural selection is determining the evolution of certain genes [40, 41]. Tajima's D is a widely used statistic to assess distortions in the frequency spectrum [42]. If the number of mutations that appear at frequency 1/n in a sample of size n (singletons) is higher than that expected under the standard neutral model, then Tajima's D becomes negative. On the other hand, if the number of mutations at intermediate frequency is large then Tajima's D becomes positive. When a departure from the standard neutral model is observed in a given gene of a given species, several alternative hypotheses can be made. These typically involve natural selection and/or demographic factors, such as population growth or population structure. In infectious agent populations the relevant null model against which we would like to test for the molecular signature of selection is closer to a metapopulation neutral model than to the standard neutral model. From all the simulations in all the metapopulation structures we have studied, we have observed that D_t was always very close to 0. This is in agreement with the results of coalescent theory and simulations in metapopulations under the island model [43, 44]. However, we have observed that in some simulations of scale free networks a slight distortion of the frequency spectrum was apparent. In cases of low R₀ mean values of the Tajima's D statistic become negative. In Table 2 we show one example where this occurs. Although the values of D_t are not very negative when the sample size is small, they become more negative with increasing sample size.

In Figure 5 we show an example of the mutation frequency spectrum in the scale free network for two values of transmission with a large sample size. Clearly we see that when R₀ is small the proportion of singletons in the samples is much higher than when R₀ is large. For R₀ = 15 the spectrum is similar to the one expected under the standard neutral model. These results imply that it is very difficult to reject an equilibrium neutral model with constant population size when using Tajima's D.
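For readers who want to reproduce the sign behaviour of Tajima's D described in this section, the statistic can be computed from the number of copies of each mutation in a sample. The sketch below uses the standard published constants for Tajima's D; the function name and the input format (a list of per-site mutation counts) are assumptions made for illustration and are not the code used for the simulations.

```python
import math


def tajimas_d(derived_counts, n):
    """Tajima's D from per-site mutation counts in a sample of n sequences.

    Returns None when there are no segregating sites or n < 4, where the
    variance term is zero or undefined.
    """
    counts = [k for k in derived_counts if 0 < k < n]  # segregating sites only
    S = len(counts)
    if S == 0 or n < 4:
        return None
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Mean number of pairwise differences, summed over sites.
    pi = sum(2 * k * (n - k) / (n * (n - 1)) for k in counts)
    theta_w = S / a1  # Watterson's estimator from the number of sites
    return (pi - theta_w) / math.sqrt(e1 * S + e2 * S * (S - 1))


# An excess of singletons gives D < 0; intermediate frequencies give D > 0.
print(tajimas_d([1] * 10, 10), tajimas_d([5] * 10, 10))
```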

Infectious agent diversity within hosts and differentiation amongst hosts

In infectious agents with very high mutation rates, as is the case for RNA viruses [45], one may expect some level of within host diversity to be observed. We have therefore studied the level of diversity in samples taken from each infected host. We also studied the level of genetic differentiation between hosts measured by F_ST [46]. The statistic of genetic differentiation we use measures the difference between the level of infectious agent diversity within an infected host and that of the entire infectious agent metapopulation. It is known that all of these statistics are important for the understanding of the relative importance of the processes governing metapopulation dynamics [13] and therefore they can be important in understanding their epidemiology. Figure 6 shows the results for the levels of genetic differentiation, as measured by F_ST, for different values of R₀. As can be seen from this figure, for infectious agents with low transmission, the levels of host differentiation are very high. In this case and with the value of the mutation rate considered, the levels of within host genetic variability can be very low. For example in the case of the island model with R₀ = 2.5, the observed mean level of intrahost infectious agent diversity was 0.46, which is only 0.04 of the level observed in the whole metapopulation. In certain RNA viruses, such as the Dengue virus, the levels of intrahost genetic diversity that have been observed are about 0.03 of that between hosts [47]. In infectious agents with high transmission rates the levels of differentiation are much smaller and are accompanied by higher levels of within host diversity. In the case of the island model every host has similar levels of diversity. And so, as the infectious agent transmission rate increases, so does the level of within host diversity. But in scale free networks, host connectivity affects infectious agent diversity within that host and the levels of differentiation between hosts.
We have considered the case of two infectious agents with different transmission coefficients and have looked at the relation between host connectivity and within host diversity. Figure 7 shows that well connected hosts have much higher levels of π_d and much lower levels of F_ST. In this instance, F_ST reflects the average divergence between demes with connectivity k_i and all other demes. From the figure we see that well connected hosts also show significantly larger mean values of the Tajima's D statistic, for intermediate values of R₀.
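As a worked illustration of the island-model example given above for R₀ = 2.5, the reported numbers can be turned into an approximate value of F_ST. This assumes the common form of the statistic shown below, which this section does not spell out, and it uses the rounded ratio 0.04, so the result is indicative only.

$$F_{ST} = \frac{\pi_{t} - \pi_{d}}{\pi_{t}} = 1 - \frac{\pi_{d}}{\pi_{t}} \approx 1 - 0.04 = 0.96, \qquad \pi_{t} \approx \frac{0.46}{0.04} = 11.5$$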

Conclusion

One of the main goals in infectious disease research is to understand how infectious agent variation, host immunity, transmission dynamics and epidemic dynamics determine patterns of infectious agent evolution. Information about evolutionary and epidemiological processes can be extracted from studying infectious agent genetic diversity. In particular it can help us to understand the origin of disease and the selective pressures that act on certain infectious agent genes. The link between infectious agent dynamics and genetic diversity at within and between host level is a very important problem. The means towards its solution requires the integration of population genetics and epidemiology. This has recently been recognized as a major step for understanding infectious agent evolution [5].

Here we have studied levels and patterns of infectious agent diversity under one of the simplest classical epidemiological models: the SIS model. In this model, hosts that are susceptible can become infected at a given rate, and hosts that are infected can become susceptible by clearance of the infectious agent. We have found that, under this model and in the conditions studied, for low clearance rates and low intrahost effective population size, levels of genetic variability in samples from the whole infectious agent population are maximal for intermediate levels of transmission. This pattern of DNA sequence diversity was found to be independent of the type of host contact structure.

Although we have not performed simulations with values of N_d close to those that have been estimated for some infectious agents (N_d ≃ 1000 estimated for HIV-1 [48]) due to the high computational cost, the simulations we have done show that when the rate at which the immune system clears the infectious agent (e) is higher than the rate of drift (1/N_d) within the host, levels of infectious agent diversity in the whole metapopulation increase monotonically with R₀.

In highly transmitted infectious agents, levels of diversity are weakly dependent on the type of host contact structure. However, for infectious agents with low values of R₀, levels of diversity do depend on the host contact structure: when interactions between hosts are such that every host is in contact with every other, levels of diversity are higher than when the host contact structure is such that a few hosts have a disproportionate number of contacts, whereas the majority has a small number of contacts. In this latter case levels of infectious agent diversity are expected to be low. Furthermore, in this latter case the frequency spectrum of neutral mutations can be distorted, in relation to that expected for the standard neutral model [39]. This feature is captured by negative values of the Tajima's D statistic. The observation of positive values of D_t in infectious agent genes suggests that strong diversifying selection could be occurring, since even when we account for the complex contact structure in which infectious agents evolve, under a neutral model one would expect to observe values of D_t close to 0 or negative.

The results presented here can also be used to make some predictions about future adaptation in infectious agents. If we assume that new adaptive mutations in infectious agents arise from standing neutral variation [49, 50], Figures 2, 3 and 4 imply that for infectious agents with low intrahost effective population size, those with intermediate R₀ will be likely to adapt more rapidly than those with larger R₀. For infectious agents in which these conditions are met, an important implication regarding public health measures can be drawn: if control programs with the aim of lowering transmission do not reduce R₀ to very low values, but instead only lead to small reductions in R₀, then this may imply an increased chance of the infectious agent escaping the immune system.

One feature of several natural populations, including infectious agent populations is the occurrence of correlations between genetic and geographical distance [14, 51]. In the island model of population structure that pattern does not arise, whereas in the stepping stone model it is evident. We have explored the relation between genetic and geographical distance in the scale free contact network, which is likely to be closer to the relevant contact structure for infectious agent evolution. Although in our models we have not considered geography explicitly, we have assumed that it can be related to the shortest path length between nodes in the network. Figure 8 shows a clear correlation between these distances. One can intuitively suspect that natural selection can cause infectious agents to adapt to local conditions and that local adaptation can lead to spatial genetic structuring. But before one jumps to the conclusion that natural selection is playing a role in spatially structuring diversity one has to rule out the simpler explanation of neutral evolution in a complex host contact network. Hopefully, the careful consideration of all diversity measures and the use of several test statistics will help us to find the molecular signature of adaptation in infectious agent gene sequences.

Methods

General model description

We consider the evolution of a haploid, non-recombining population subdivided into small subpopulations (demes). There are D demes, each corresponding to a node in the network that comprises the whole population. Each deme has a maximum size of N_d individuals, so the maximum total number of individuals in the metapopulation is N_t = D N_d. Each deme can go extinct with probability e and be recolonized through migration of individuals from the demes to which it is connected (see below). Note that in our model recolonization occurs through migration, which differs from other metapopulation models [14]. Migration is modelled as follows. Each deme i of a given network is connected to k_i other demes according to the specific type of contact network considered. Each edge of the network connects two demes that exchange migrants at a mean rate m. We produce a new generation of individuals by taking the following steps. We draw the number of migrants leaving each non-empty deme from a Poisson distribution with mean N_d m k_i. The individuals that migrate are sampled at random, without replacement, from the original deme and added to the recipient demes. The assumption of sampling without replacement is not restrictive, since we obtain the same results in simulations where sampling with replacement is considered. The relevant parameter of the SIS model is the basic reproductive number R₀, which corresponds to R₀ = N_d m k/e in our model. In our simulations we therefore changed the value of R₀ by changing the migration rate m while keeping all other quantities constant. After migration, reproduction and mutation occur. N_d individuals are chosen at random to form the new population of each deme. Each individual acquires a number of new mutations drawn from a Poisson distribution with mean μ. We assume the infinite sites mutational model, in which every new mutation occurs at a new site.
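
The generation cycle described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the simulations reported here. Several details are assumptions made for the example and are not specified in the text: extinction is applied before migration, each migrant picks its recipient deme uniformly among the neighbours, migrants are copied rather than removed from their deme of origin, and reproduction samples parents with replacement. The names `poisson`, `next_generation` and `migration_rate_for_R0` are hypothetical.

```python
import math
import random
from itertools import count


def poisson(lam, rng):
    """Draw a Poisson variate with Knuth's multiplication method (adequate for small means)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def migration_rate_for_R0(R0, N_d, k, e):
    """Invert R0 = N_d * m * k / e to get the migration rate m for a target R0."""
    return R0 * e / (N_d * k)


def next_generation(demes, neighbours, N_d, m, e, mu, new_site, rng):
    """Advance the metapopulation by one generation.

    demes      : list of demes; each deme is a list of frozensets of mutated site ids
    neighbours : neighbours[i] lists the demes connected to deme i
    new_site   : iterator giving a fresh site id for every mutation (infinite sites)
    """
    # Extinction (assumed to act first): each deme is emptied with probability e.
    demes = [[] if rng.random() < e else deme for deme in demes]

    # Migration: Poisson(N_d * m * k_i) migrants leave each non-empty deme.
    arrivals = [[] for _ in demes]
    for i, deme in enumerate(demes):
        if not deme or not neighbours[i]:
            continue
        n_out = min(poisson(N_d * m * len(neighbours[i]), rng), len(deme))
        for migrant in rng.sample(deme, n_out):  # sampled without replacement
            # Assumption: the recipient is a uniformly chosen neighbour.
            arrivals[rng.choice(neighbours[i])].append(migrant)

    # Reproduction and mutation: N_d offspring in every occupied deme.
    new_demes = []
    for deme, incoming in zip(demes, arrivals):
        pool = deme + incoming
        offspring = []
        if pool:
            for parent in rng.choices(pool, k=N_d):  # assumed sampling with replacement
                n_mut = poisson(mu, rng)
                offspring.append(parent | frozenset(next(new_site) for _ in range(n_mut)))
        new_demes.append(offspring)
    return new_demes
```

In this representation an empty deme corresponds to an uninfected host, and a deme that receives at least one migrant is refilled to N_d individuals in the same generation. A run would start from demes filled with empty frozensets and pass `count()` as the source of new site ids.
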
At the start of each simulation run, all demes have N_d individuals, which are mutation-free and are represented by an infinitely long sequence. We then let the simulation run for an initial period, T_eq, to allow the metapopulation to reach an epidemiological and genetic equilibrium. The time to reach equilibrium depends on the set of parameters of the simulation. Since all measurements are obtained after equilibrium, the results do not depend on the initial condition. Every T = 5000 generations, after the initial T_eq generations, we take a sample of size n_t = 50 from the entire population and samples of size n_d = 5 from within each deme, unless stated otherwise. We then calculate the average number of pairwise differences for the entire population:

π_{t} = \frac{\sum_{i < j} π_{i j}}{n_{t} (n_{t} - 1) / 2}

(4)

where π_ij is the number of differences between sampled sequences i and j. The same quantity is also calculated for each deme (π_d).
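
As an illustration of equation (4), the following minimal sketch computes the average number of pairwise differences for a sample. It assumes that each sequence is stored as the set of site ids at which it carries a mutation, which is a natural encoding under the infinite sites model but is an assumption of this example; the function name `pairwise_diversity` is hypothetical.

```python
from itertools import combinations


def pairwise_diversity(sample):
    """Average number of pairwise differences among sequences in a sample.

    Each sequence is a set of mutated site ids, so two sequences differ at
    exactly the sites in the symmetric difference of their sets.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two sequences")
    total = sum(len(a ^ b) for a, b in combinations(sample, 2))
    return total / (n * (n - 1) / 2)
```

The same function applies to the whole-population sample (giving π_t) and to a sample from a single deme (giving π_d).
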

We also calculate the number of segregating sites in each sample (S_t and S_d) and the test statistic Tajima's D [42], which for samples of the entire population is given by:

D_{t} = \frac{π_{t} - S_{t} / a_{n_{t}}}{b_{n_{t}}}

(5)

where $a_{n} = \sum_{i = 1}^{n - 1} \frac{1}{i}$, $b_{n} = \sqrt{e_{1} S + e_{2} S (S - 1)}$ with S the number of segregating sites in the sample, and e₁ and e₂ are as defined by Tajima [42].
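
A minimal sketch of the calculation of Tajima's D follows. It uses the constants from Tajima's original definition [42], that is, a harmonic sum running to n - 1 and the square root of e₁S + e₂S(S - 1) in the denominator. As in the rest of these examples, sequences are assumed to be stored as sets of mutated site ids, and the function name `tajimas_d` is hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(sample):
    """Tajima's D for a sample of sequences, each a set of mutated site ids."""
    n = len(sample)
    if n < 4:
        raise ValueError("the variance term vanishes for fewer than four sequences")

    # Average number of pairwise differences.
    pi = sum(len(a ^ b) for a, b in combinations(sample, 2)) / (n * (n - 1) / 2)

    # Segregating sites: mutated in some, but not all, sampled sequences.
    seen = set().union(*sample)
    fixed = set.intersection(*map(set, sample))
    S = len(seen - fixed)
    if S == 0:
        return float("nan")  # D is undefined without polymorphism

    # Constants as defined by Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))
```

A sample in which every mutation is a singleton gives a negative value, whereas mutations at intermediate frequency push the statistic towards positive values.
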

One other quantity of interest that we have studied is F_ST, a measure of genetic differentiation amongst demes. This measure is defined as [46]:

F_{S T} = \frac{π_{t} - π_{d}}{π_{t}}

(6)
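
Equation (6) can be illustrated with the short sketch below. It assumes that π_d is the unweighted mean of the within-deme diversities over the sampled demes and that sequences are stored as sets of mutated site ids; both are assumptions of this example, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def pairwise_diversity(sample):
    """Average number of pairwise differences among sequences (sets of site ids)."""
    n = len(sample)
    total = sum(len(a ^ b) for a, b in combinations(sample, 2))
    return total / (n * (n - 1) / 2)


def f_st(total_sample, deme_samples):
    """F_ST = (pi_t - pi_d) / pi_t, with pi_d averaged over the sampled demes."""
    pi_t = pairwise_diversity(total_sample)
    if pi_t == 0:
        return float("nan")  # no variation, so differentiation is undefined
    pi_d = mean(pairwise_diversity(s) for s in deme_samples)
    return (pi_t - pi_d) / pi_t
```

For example, two demes that are each monomorphic but fixed for different mutations give F_ST = 1, the case of complete differentiation.
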

A well-studied topology in the population genetics literature is the island model, introduced by Wright, which corresponds to a fully connected network where every deme is connected to all the others, so k_i = D - 1. A commonly studied topology in epidemiology is the scale-free network, where the distribution of connectivities obeys a power law: $P (k_{i}) \propto k_{i}^{- γ}$. In real systems the exponent γ typically lies between 2 and 3. Nodes of low connectivity are predominant in the network, whereas well-connected nodes are rare. One of the mechanisms that can produce a network with a power-law degree distribution is growth with preferential attachment, where nodes newly introduced to the network attach preferentially to nodes that are already well connected. We use the standard algorithm of Albert and Barabási to build the scale-free networks [21], which generates networks with exponent γ = 3. Scale-free networks, which are extremely heterogeneous, may be appropriate descriptions for studying sexually transmitted diseases [18]. Our results for scale-free networks were compared with those for the island model. For every network and every parameter set we ran 30 independent simulations.
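
The two topologies can be sketched as adjacency lists as follows. The preferential attachment routine is a simplified illustration of growth in the style of Albert and Barabási, not the exact implementation used here. The small fully connected seed and the parameter `links_per_node` (the number of edges each new node brings) are assumptions of this example, and the function names are hypothetical.

```python
import random


def island_model(D):
    """Fully connected network: every deme is linked to the D - 1 others."""
    return [[j for j in range(D) if j != i] for i in range(D)]


def preferential_attachment(D, links_per_node, rng):
    """Grow a D-node network in which new nodes prefer well-connected targets."""
    if not 1 <= links_per_node < D:
        raise ValueError("need 1 <= links_per_node < D")

    # Seed: a small fully connected core (an assumption of this sketch).
    core = links_per_node + 1
    neighbours = island_model(core)

    # Every node appears once per edge end, so a uniform draw from this
    # list selects a node with probability proportional to its degree.
    stubs = [i for i in range(core) for _ in range(core - 1)]

    for new in range(core, D):
        targets = set()
        while len(targets) < links_per_node:
            targets.add(rng.choice(stubs))
        chosen = sorted(targets)
        neighbours.append(chosen)
        for t in chosen:
            neighbours[t].append(new)
            stubs.extend((t, new))
    return neighbours
```

The connectivity k_i of deme i is then `len(neighbours[i])`. In the island model it equals D - 1 for every deme, whereas under preferential attachment most demes keep a connectivity close to `links_per_node` and a few hubs accumulate many links.
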

References

Conway D, Cavanagh D, Tanabe K, Roper C, Mikes Z, Sakihama N, Bojang K, Oduola A, Kremsner P, Arnot D, Greenwood B, McBride J: A principal target of human immunity to malaria identified by molecular population genetics and immunological analyses. Nature Medicine. 2000, 6: 689-692. 10.1038/76272.
Falush D, Wirth T, Linz B, Pritchard J, Stephens M, Kidd M, Blaser M, Graham D, Vacher S, Perez-Perez G, Yamaoka Y, Megraud F, Otto K, Reichard U, Katzowitsch E, Wang X, Achtman M, Suerbaum S: Traces of human migrations in Helicobacter pylori populations. Science. 2003, 299: 1582-1585. 10.1126/science.1080857.
McDonald B, Linde C: Pathogen population genetics, evolutionary potential, and durable resistance. Annu Rev Phytopathol. 2002, 40: 349-379. 10.1146/annurev.phyto.40.120501.101443.
Paterson S, Viney M: The interface between epidemiology and population genetics. Parasitology Today. 2000, 16: 528-532. 10.1016/S0169-4758(00)01776-2.
Grenfell B, Pybus O, Gog J, Wood J, Daly J, Mumford J, Holmes E: Unifying the epidemiological and evolutionary dynamics of pathogens. Science. 2004, 303: 327-331. 10.1126/science.1090727.
Wilson D, Falush D, McVean G: Germs, genomes and genealogies. Trends in Ecology and Evolution. 2005, 20 (1): 39-45. 10.1016/j.tree.2004.10.009.
Galvani A: Epidemiology meets evolutionary ecology. Trends in Ecology and Evolution. 2003, 18: 132-139. 10.1016/S0169-5347(02)00050-2.
Kreitman M: Methods to detect selection in populations with applications to the human. Annu Rev Genomics Hum Genet. 2000, 1: 539-559. 10.1146/annurev.genom.1.1.539.
Przeworski M, Hudson R, Di Rienzo A: Adjusting the focus on human variation. Trends in Genetics. 2000, 16: 296-302. 10.1016/S0168-9525(00)02030-8.
Levins R: Evolution in changing environments. 1968, Princeton, NJ: Princeton University Press
Levins R: Some demographic and genetic consequences of environmental heterogeneity for biological control. Bull Entomol Soc Am. 1969, 15: 237-240.
Hanski I: Metapopulation dynamics. Nature. 1998, 396: 41-49. 10.1038/23876.
Pannell J, Charlesworth B: Neutral genetic diversity in a metapopulation with recurrent local extinction and recolonization. Evolution. 1999, 53: 664-676. 10.2307/2640708.
Pannell J, Charlesworth B: Effects of metapopulation processes on measures of genetic diversity. Phil Trans R Soc Lond B. 2000, 355: 1851-1864. 10.1098/rstb.2000.0740.
Rousset F: Genetic Structure and Selection in Subdivided Populations. 2004, Princeton, NJ: Princeton University Press
Charlesworth B, Charlesworth D, Barton N: The effects of genetic and geographic structure on neutral variation. Annu Rev Ecol Evol Syst. 2003, 34: 99-125. 10.1146/annurev.ecolsys.34.011802.132359.
Whitlock M, McCauley D: Indirect measures of gene flow and migration: F_ST not equal 1/(4Nm + 1). Heredity. 1999, 82: 117-125. 10.1038/sj.hdy.6884960.
Lloyd A, May R: How viruses spread among computers and people. Science. 2001, 292: 1316-1317. 10.1126/science.1061076.
Erdös P, Rényi A: On the evolution of random graphs. Inst Hung Acad Sci. 1960, 5: 17-61.
Newman M: The structure and function of complex networks. SIAM Review. 2003, 45: 167-256. 10.1137/S003614450342480.
Albert R, Barabási AL: Statistical mechanics of complex networks. Rev Mod Phys. 2002, 74: 47-97. 10.1103/RevModPhys.74.47.
Albert R, Jeong H, Barabási AL: Diameter of the world-wide web. Nature. 1999, 401: 130-131. 10.1038/43601.
Liljeros F, Edling C, Amaral L, Stanley H, Aberg Y: The web of human sexual contacts. Nature. 2001, 411: 907-908. 10.1038/35082140.
Camazine S, Deneubourg JL, Franks N, Sneyd J, Theraulaz G, Bonabeau E: Self-Organization in Biological Systems. 2001, Princeton, NJ: Princeton University Press
Fewell J: Social insect networks. Science. 2003, 301: 1867-1870. 10.1126/science.1088945.
Pastor-Satorras R, Vespignani A: Epidemic spreading in scale-free networks. Phys Rev Lett. 2001, 86: 3200-3204. 10.1103/PhysRevLett.86.3200.
Barthélemy M, Barrat A, Pastor-Satorras R, Vespignani A: Velocity and Hierarchical Spread of Epidemic Outbreaks in Scale-Free Networks. Phys Rev Lett. 2004, 92 (17): 178701. 10.1103/PhysRevLett.92.178701.
Keeling M, Eames K: Networks and epidemic models. J R Soc Interface. 2005, 22 (2): 295-307. 10.1098/rsif.2005.0051.
May R: Network structure and the biology of populations. Trends in Ecology and Evolution. 2006, 21: 394-399. 10.1016/j.tree.2006.03.013.
Campos P, Combadao J, Dionisio F, Gordo I: Muller's ratchet in random graphs and scale-free networks. Phys Rev E. 2006, 74 (4 Pt 1): 042901. 10.1103/PhysRevE.74.042901.
Woolhouse M, Webster J, Domingo E, Charlesworth B, Levin B: Biological and biomedical implications of the co-evolution of pathogens and their hosts. Nature Genetics. 2002, 32: 569-77. 10.1038/ng1202-569.
Otto S, Nuismer S: Species interactions and the evolution of sex. Science. 2004, 304: 1018-20. 10.1126/science.1094072.
Nuismer S, Otto S: Host-parasite interactions and the evolution of ploidy. Proc Natl Acad Sci USA. 2004, 101: 11036-9. 10.1073/pnas.0403151101.
Nuismer S, Otto S: Host-parasite interactions and the evolution of gene expression. PLoS Biol. 2005, 3: 1283-8. 10.1371/journal.pbio.0030203.
Kopp M, Gavrilets S: Multilocus genetics and the coevolution of quantitative traits. Evolution. 2006, 60: 1321-36.
May R, Anderson R: The transmission dynamics of human immunodeficiency virus (HIV). Philos Trans R Soc London B. 1988, 321: 565-607. 10.1098/rstb.1988.0108.
Slatkin M: Gene flow and genetic drift in a species subject to frequent local extinction. Theor Popul Biol. 1977, 12: 253-262. 10.1016/0040-5809(77)90045-4.
Whitlock M, Barton N: The effective size of a subdivided population. Genetics. 1997, 146: 427-441.
Kimura M: The neutral theory of molecular evolution. 1983, Princeton, NJ: Princeton University Press
Shriner D: Influence of random genetic drift on human immunodeficiency virus type 1 env evolution during chronic infection. Genetics. 2004, 166: 1155-1164. 10.1534/genetics.166.3.1155.
Polley S, Chokejindachai W, Conway D: Allele frequency based analyses robustly identify sites under balancing selection in a malaria vaccine candidate antigen. Genetics. 2003, 165: 555-561.
Tajima F: Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics. 1989, 123: 585-595.
Wakeley J, Aliacar N: Gene genealogies in a metapopulation. Genetics. 2001, 159: 893-905.
Pannell J: Coalescence in a metapopulation with recurrent local extinction and recolonization. Evolution. 2003, 57: 949-961.
Drake J: The distribution of rates of spontaneous mutation over viruses, prokaryotes, and eukaryotes. Ann N Y Acad Sci. 1999, 870: 100-107. 10.1111/j.1749-6632.1999.tb08870.x.
Charlesworth B: Measures of divergence between populations and the effect of forces that reduce variability. Mol Biol Evol. 1998, 15: 538-43.
Holmes E: Patterns of intra and interhost nonsynonymous variation reveal strong purifying selection in dengue virus. J Virol. 2003, 77: 11296-8. 10.1128/JVI.77.20.11296-11298.2003.
Shriner D, Liu Y, Nickle D, Mullins J: Evolution of intrahost HIV-1 genetic diversity during chronic infection. Evolution. 2006, 60 (6): 1165-1176.
Przeworski M, Coop G, Wall JD: The signature of positive selection on standing genetic variation. Evolution. 2005, 59: 2312-2323.
Hermisson J, Pennings P: Soft sweeps: molecular population genetics of adaptation from standing genetic variation. Genetics. 2005, 169: 2335-2352. 10.1534/genetics.104.036947.
Real L, Henderson J, Biek R, Snaman J, Jack T, Childs J, Stahl E, Waller L, Tinline R, Nadin-Davis S: Unifying the spatial population dynamics and molecular evolution of epidemic rabies virus. Proc Natl Acad Sci USA. 2005, 102: 12107-12111. 10.1073/pnas.0500057102.

Acknowledgements

We thank Gabriela Gomes, David Conway and Gareth Weedall for helpful suggestions. This work was supported by project POCTI/BSE/46856/2002 through Fund. para a Ciência e Tecnologia (FCT). I.G. is supported by FCT/FEDER fellowship. PRAC is partially supported by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq).

Author information

Authors and Affiliations

Instituto Gulbenkian de Ciência, P-2781-901, Oeiras, Portugal
Isabel Gordo
Departamento de Física, Universidade Federal Rural de Pernambuco, 52171-900, Dois Irmãos, Recife, PE, Brazil
Paulo RA Campos

Corresponding author

Correspondence to Isabel Gordo.

Additional information

Authors' contributions

The authors contributed equally to this work.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Gordo, I., Campos, P.R. Patterns of genetic variation in populations of infectious agents. BMC Evol Biol 7, 116 (2007). https://doi.org/10.1186/1471-2148-7-116

Received: 25 September 2006
Accepted: 13 July 2007
Published: 13 July 2007
DOI: https://doi.org/10.1186/1471-2148-7-116