Evolution of variation in presence and absence of genes in bacterial pathways
© Francis and Tanaka; licensee BioMed Central Ltd. 2012
Received: 16 December 2011
Accepted: 17 February 2012
Published: 20 April 2012
Skip to main content
© Francis and Tanaka; licensee BioMed Central Ltd. 2012
Received: 16 December 2011
Accepted: 17 February 2012
Published: 20 April 2012
Bacterial genomes exhibit a remarkable degree of variation in the presence and absence of genes, which probably extends to the level of individual pathways. This variation may be a consequence of the significant evolutionary role played by horizontal gene transfer, but might also be explained by the loss of genes through mutation. A challenge is to understand why there would be variation in gene presence within pathways if they confer a benefit only when complete.
Here, we develop a mathematical model to study how variation in pathway content is produced by horizontal transfer, gene loss and partial exposure of a population to a novel environment.
We discuss the possibility that variation in gene presence acts as cryptic genetic variation on which selection acts when the appropriate environment occurs. We find that a high level of variation in gene presence can be readily explained by decay of the pathway through mutation when there is no longer exposure to the selective environment, or when selection becomes too weak to maintain the genes. In the context of pathway variation the role of horizontal gene transfer is probably the initial introduction of a complete novel pathway rather than in building up the variation in a genome without the pathway.
The analysis of bacterial genomes has revealed a remarkable degree of variation in the presence and absence of genes among bacteria within the same species. For instance, in a set of 3 genomes of Escherichia coli studied by Welch et al. 2002  only 40% of all genes were common to all 3 genomes, and 21% of genes in the uropathogenic isolate did not exist in the other two isolates (a non-pathogenic and a enterohaemorrhagic strain). Horizontal gene transfer plays a major role in the evolution of bacterial genomes [2–4], and is one factor explaining the large extent of variation in gene presence and absence among sets of related genomes . Indeed, bacterial genomes can acquire entire genetic pathways encoding discrete functions through horizontal gene transfer . A major impact of this dynamic is that bacteria can acquire new functions without having to evolve all of the elements of the pathway de novo. When a bacterial population is faced with a new niche, this ability to acquire new pathways confers considerable advantage . For example, genes for enzymes acting on polysaccharides in edible algae have been transferred from marine bacteria to human gut bacteria . Many examples of the horizontal acquisition of pathogenicity islands have also been described [7–9].
The large extent of variation in gene content between related bacterial genomes suggests a corresponding variation in gene presence and absence within pathways. For example, isolates of Neisseria meningitidis lacking components of virulence systems have been observed ; variation in the presence and absence of genes involved in oxidative sulfur metabolism occurs across species of green sulfur bacteria (Chlorobium and related genera) ; and metagenomic analysis of human gut microflora suggests a patchy distribution of porphyranase genes of marine origin in gut bacteria of genus Bacteroides . Given such observations, extensive variation in gene content within pathways is likely to be uncovered as more bacterial genomes are analysed.
A challenge is to explain variation in gene content, particularly at the pathway level. Why would there be any variation if a pathway is beneficial? One possibility is that a change to environmental conditions creates a situation in which a pathway is no longer needed, and so existing pathways undergo decay by loss of individual genes over time. This reductive evolution occurs at the genome level in some taxa [12, 13].
Another possibility arises in the situation in which the pathway is not essential for survival, and in which there is little benefit in carrying it. Under this scenario, provided there is little or no cost to carrying the genes, horizontal transfer contributes to an accumulation of variation in the presence of components of the pathway. If the environment changes to one in which the pathway confers an advantage, selection would then favour acquisition of the complete set of genes. Indeed, Hall et al. 1983  proposed that phenotypically silent genes or pathways may act as reservoirs for adaptation when environments change. In this context, cryptic genes may be maintained by fluctuation in environments . We later discuss how this relates to the idea of cryptic genetic variation. A similar but distinct alternative hypothesis is that selection for the pathway acts continuously through partial exposure to the selective environment. Variation in gene presence again would be found en route to the acquisition of the complete pathway.
In this paper, we investigate these alternative explanations for the occurrence of variation in pathway content. We do this by developing an evolutionary model of pathway acquisition and gene loss for a bacterial population undergoing horizontal gene transfer and facing partial exposure to a new environment in which the pathway is favoured. We find that while pathway variation is readily generated by decay of pathways no longer under selection, horizontal gene transfer is generally unable to produce much variation unless the rate of transfer is extremely high. We interpret specific examples of variation in gene presence in bacteria in the context of these findings.
We model a bacterial population that, through horizontal gene transfer, is able to acquire genes of a pathway that confers a selective advantage in some special “selective” environment or niche. The source of these genes is the collection of species carrying the complete pathway living in the selective environment. We consider the recipient bacterial population but not these source species. We assume this recipient population is large enough to be described by a deterministic system, that is, ignoring the effects of genetic drift. The model tracks the proportion x i of the population containing i genes of the n-gene pathway in the genome. We assume that the external populations have a substantially greater effect than the recipient population on the distribution of gene content in the pool of DNA fragments available for uptake by transformation or other processes. Pathways are diverse in structure and could be network-like or contain redundancy . To focus on the dynamics of acquisition and variation we assume pathway linearity and that all genes of a pathway are equally necessary.
Genes of the pathway can be inactivated through mutation. We assume these mutations are irreversible and have in mind processes such as deletion, either of the whole gene or enough of the gene to render it unable to function in the pathway. This deletion occurs at rate δ per individual per generation. We assume that δis significantly smaller than 1/n, so that we approximate the probability of one of i genes being deleted by iδ. At most one gene is deleted at a time.
We model the horizontal transfer of sets of genes belonging to the pathway as follows. Let the parameter γ 1 be the probability of acquisition of a single gene in a generation. Genes that are grouped closely — that is, clustered — on the chromosome are more likely to be transferred than unclustered genes. This assumption follows the observation that the probability of transfer decreases with the length of the segment [5, 17, 18]. Letting each additional gene on a fragment reduce the probability of transfer by a factor of κ, the transfer probability for i genes is γ i = γ 1 κ i−1 per recipient per generation. The parameter κ represents the degree of clustering of the genes: when κ is high the genes are more clustered and γ i is flatter with respect to i. Specifically, a value of κ = 1 corresponds to the genes being colocated, while κapproaches 0 for evenly spaced genes as the genome length increases (under a model of clustering such as that of Ballouz et al.).
to the population carrying i genes from the pathway.
We note that the assumption of random arrangement of genes is not entirely realistic because local synteny would persist to some extent in the source population; however, some rearrangement weakens this synteny. Furthermore, most transfer events involve a single gene since shorter fragments are more likely to be transferred, and in these cases, synteny does not affect the dynamics. We therefore adopt random arrangement as a simplifying assumption; there may be benefit in re-examining this in future work.
Parameters of the model
Range or value
the number of genes in the pathway
the fitness cost per gene to a single genome
the exposure probability (of encountering the selective environment)
the fitness benefit conferred by the whole pathway
the probability of deletion of a single gene per generation
the transfer probability for 1 gene per generation
the transfer decay parameter, modelling clustering
We consider properties of the model in the Results, and also analyse the dynamics numerically by iterating Equation (2) with initial conditions of either x 0=1 (all individuals lack all genes of the pathway) or x n = 1(all individuals have the complete pathway). We will refer to the state of lacking all genes of the pathway as the “empty pathway”. This computational work was done in the programming language Sage .
The parameters are set according to the default values shown in Table 1 and varied within the ranges given to explore their impact on the dynamics. We fix the length of the pathway at n = 5genes. Deleterious mutation occurs in E. coli at an estimated minimum rate of 2×10−4per genome per generation . If there are 4000 genes, this corresponds to 5×10−8 per gene per generation. We set the per gene loss rate to 10−9, given also that not all deleterious mutations render genes non-functional. Rates of horizontal gene transfer would vary widely depending on what mechanisms of genetic exchange are operating, the availability of foreign DNA, and presumably many other factors. We therefore consider two rates: a low rate of the same order of magnitude as the rate of gene loss, namely 10−9, and a high rate of 10−3 corresponding to the presence of highly mobile plasmids.
The metabolic cost of carrying genes is likely to be low , so we consider values in the range 0 to 0.01. The fitness w n of carrying the full pathway ranges from (1 − c) i when ε = 0 to 1 + b when ε = 1, and so the parameter b sets an upper limit for the overall fitness conferred by possession of the full pathway. We fix this at b = 0.1, and allow ε to vary in the range [0,1].
This system has a unique equilibrium, as the following shows. The state transitions of this model can be described by the equation where x is a state vector x = (x 0,x 1,…,x n ) T , and A is the matrix whose (i,j) entry is given by the coefficient of x j in the expansion of in Equation (2). This is a non-linear system, since , which is also the mean fitness of the population, depends on the state x.
Despite the system’s non-linearity, we can obtain the existence and uniqueness of an equilibrium state through considering the matrix A in the light of the Perron-Frobenius Theorem for irreducible matrices (see, for example, ), Section 8.3. This Theorem states that a non-negative irreducible matrix has a unique eigenvector that is a scalar multiple of a positive vector. The associated (positive) eigenvalue is the spectral radius of the matrix and is called the Perron root, and the positive eigenvector is called the Perron vector.
The (n + 1)×(n + 1)matrix A that arises in our model is irreducible because it is regular, meaning that a positive power of A has strictly positive entries. This in turn follows because the entries of A are all strictly positive on and below the main diagonal, as well as in the first diagonal above the main diagonal (the remaining entries are zero). Hence A n will have only positive entries, implying that A is regular. The Perron-Frobenius Theorem therefore applies, and so A has a unique eigenvector that is a scalar multiple of a positive vector. In other words, there is an eigenvector of A that consists of positive entries, and with the additional constraint that the entries sum to 1 this vector (that is, state) is unique. In concrete terms, there is a unique positive vector v summing to 1 such that A v = λ v, with λ > 0. This provides a unique solution to our system’s equilibrium equation , namely x is the Perron vector v and is the Perron root λ.
An additional consequence of the regularity of A is that the dynamics are pushed away from the boundaries of the positive cone of the state space. In other words, there is protected polymorphism. Specifically, once the state vector is strictly positive, which occurs after at most n steps, no variable x i can ever again be zero because the transition matrix has non-negative entries.
following from the definition of w i (Equation (1)). That is, under this heuristic, when ε > ε ∗ the dynamics should move to high values of x n at equilibrium. Note that for a given fitness cost c, as the pathway length n increases the exposure ε required in order for the pathway to become fixed in the population also increases.
We study the transient dynamics of pathway acquisition numerically by setting the parameters according to Table 1 and iterating the process (Equation (2)) until the variables are within a small tolerance of equilibrium. For each set of parameters, the dynamics were examined from two boundary state initial conditions: 1. no individuals in the population carried any genes in the pathway (fixation on the empty pathway, x 0=1), and 2. All individuals in the population carried all of the pathway (fixation on the complete pathway, x n =1). We use these initial conditions to understand how genes of a pathway are acquired and lost; other initial conditions would also lead to the unique internal equilibrium point described above.
To study how the dynamics of this system are affected by horizontal transfer and natural selection, we varied parameters controlling these features of the model (see Table 1). Transfer is controlled by the base rate γ 1 of transfer of a single gene and the extent of clustering κ, while selection is controlled by c, b and ε.
The first question to ask of this model is how it behaves in the absence of any cost to carrying a partial pathway and no benefit from the acquisition of the full pathway. In this way the impact of the transfer dynamics is isolated. Since we have fixed the benefit b=0.1, the lack of benefit is modelled by setting exposure to zero, and the cost c of carrying a gene is also set to zero. This situation corresponds to the bottom left hand point of each panel in Figure 4. The behaviour of the model depends on the value of γ 1. If the transfer rate is high (panels A and B), the full pathway is acquired by over 95% of the population (indicated by a black shape). If the transfer rate is low (panels C and D), we do not have 95% of the equilibrium population at either the full or empty pathway (indicated by a greyed shape).
These results reflect the balance between the only forces in play without fitness differences: transfer (pushing towards the pathway) and deletion (pushing away). The deletion rate is fixed at 10−9, and so when the transfer rate is of a similar order of magnitude γ 1 = 10−9(regardless of κ), as in the lower panels, equilibrium is near neither “corner” (x 0 = 1or x n = 1). In the case of a higher transfer rate (γ 1 = 10−3, upper panels, Figure 4) deletion is overpowered and the pathway is acquired.
The predicted threshold given by Equation (3), shown as a dashed curve on Figure 4, proves to be accurate: instances of fixation or near fixation on complete pathways (represented by black shapes) lie on the right of the dashed curve. Here, the exposure to the selective environment is high enough compared to the cost to allow pathway acquisition. To the left of the dashed curve, where exposure is low compared to cost, are instances of empty pathways (white shapes) or polymorphic equilibria (grey shapes). The latter outcomes, in which the equilibrium is neither close to the complete nor to the empty pathway, indicate a stable state for which there is substantial variation in gene presence. These only occur when exposure to the selective environment is very low, and the fitness cost of carrying a gene is also low or zero, with one exception: when the horizontal transfer parameters favour frequent transfer and high gene clustering (Figure 4B, high γ 1 and high κ). In this case, variation at equilibrium appears in a wider range of parameter values.
As mentioned above, variation in the presence of genes from a pathway means that a significant proportion of the population contains neither the complete pathway nor the empty pathway. In our context, we define this be when at least 5% of the population are carrying neither of these extremes. The maximum value of 1 − x 0 − x n is an indicator of this variation. On the de Finetti diagram of a given trajectory (e.g. see Figures 2 and 3), this corresponds to the maximum height attained by the trajectory. Such variation can arise in several ways.
Firstly, at equilibrium as noted above, variation in pathway content can occur when the exposure ε is sufficiently low compared to cost c and the transfer rate γ 1 is high (Figure 3A–C and grey triangles in Figure 4A and B). This high level of variation occurs because the force of horizontal transfer is strong enough to counter the loss of genes.
Variation can also arise in transition towards equilibrium, either from the empty or the complete pathway. In the case of the trajectory en route to acquisition, observe that genes are transferred in the model in groups of any number but with decreasing rates for larger numbers. Thus, depending on the balance of parameters, the pathway may be acquired directly by transfers of large numbers of genes, or more gradually with small numbers of genes at a time. Variation en route to acquisition is represented by black triangles in Figure 4, and occurs when the transfer rate is very high, and is more pronounced when clustering (represented by κ) is also high (Figure 4B and Figure 3D). In particular, from Figures 4A and 4B we see that variation en route to acquisition occurs when both cost c and exposure εare low.
Variation can also arise in transition in the other direction, through the loss of genes starting from a complete pathway. Indeed, when transfer rates are low, the only significant variation in the presence of genes from the pathway arises when gene loss occurs starting from the complete pathway (Figures 4C and D and Figures 3E–G). This occurs when exposure to the selective environment is very low. The same phenomenon arises for high transfer: absence of exposure to the selective environment results in slow decay that passes through significant variation.
Roughly speaking, for exposure levels lower than this, large amounts of variation in gene presence appear as pathways decay. When exposure levels are higher than ε ∗∗ but still lower than ε ∗, selection leads to pathway decay but through a more direct replacement with the empty pathway. The threshold given in Equation (4) corresponds to a very low level of exposure, closer to the far left column of the panels in Figure 4 than to the second column (representing ε=0.125). Figure 3 illustrates how de Finetti trajectories change around this threshold, with a value of ε ∗∗ = 0.039 for these parameters lying between the second and third columns.
When there is no exposure to the selective environment at all (ε = 0) — the left columns of each panel in Figure 4 — we see the greatest extent of variation (represented by triangles). Most such variation arises from pathway decay, but when the transfer rate is high enough, as in panels A and B of Figure 4, polymorphic equilibria (grey triangles) indicate that variation has accumulated from the empty pathway in the absence of selection. Even in this situation, when the cost is too high little variation accumulates (white triangles at the top of the left axes in panels A and B) reflecting the dynamic balance between transfer and fitness cost.
Flat trajectories that do not reach 0.05 in height – so that 95% of the population mass remains with x 0 and x n at all times – tend to appear when exposure is high (circles in Figure 4). These points are where the pathway is acquired through transfers of large numbers of genes. For low transfer rates (Figure 4C and D), this is the only way that pathways are acquired. An example of such a trajectory is shown in Figure 3H. When there is a higher rate of transfer (Figure 4A and B), this outcome occurs less readily, especially when clustering is high (Figure 4B). The inverse outcome, in which the pathway is acquired in pieces (black triangles in Figure 4), is only seen when the transfer rate is high, and in particular when the degree of gene clustering κ is also high (Figure 4B). Figure 3D shows this scenario on a de Finetti diagram. Even here, however, the trajectory is relatively flat, indicating only a low level of transient variation. Note that gene clustering κincreases the transient peak variation, but only when the transfer rate γ 1 is high.
In the light of the discussion of the equilibrium above, exactly one of these gives a positive solution for x 0 and x 1 under realistic parameter choices. In the case that there is no fitness cost and no exposure to the beneficial environment (c = ε = 0), we have both states equally fit (w 0 = w 1 = 1), and the system becomes linear. The dynamics are then a balance between transfer γ 1 and deletion δ, and the equilibrium is simply x 0 = δ/(δ + γ 1) and x 1 = γ 1/(δ + γ 1). If deletion and transfer occur at the same rate, the equilibrium is (x 0,x 1) = (1/2,1/2).
Analysis of the model shows that with continuous partial exposure to the selective environment a pathway can be acquired when the level of exposure is high enough compared to the cost of carrying fragments of the pathway. Continuous partial exposure may explain, for instance, the acquisition of porphyranase genes from marine bacteria by human gut bacteria. From the perspective of the gut microbes, the diet of humans ingesting seaweed provides constant exposure to the selective environment containing polysaccharides that marine bacteria are able to metabolise. Because genes of a pathway are often clustered on chromosomes [5, 24], horizontal transfer can move multiple genes, which accelerates the process of acquisition. Indeed, a bioinformatic analysis of pathways shows that acquisition is mostly rapid rather than gradual . The speed of acquisition clearly depends on the extent of exposure to the selective environment.
We have shown that some variation in the presence of genes of the pathway can accumulate when the selective environment is absent, or nearly absent, in certain conditions. Such variation is seen in Figure 4 where variation is evident along the left axis of each panel (as well as in Figure 6 when the pathway is long and exposure very low). This scenario is very similar to the notion of cryptic genetic variation, or evolutionary capacitance, which is usually discussed in the context of eukaryotes [26–28]. In principle cryptic genetic variation can also apply to prokaryotes. We contrast this concept with cryptic genes in the sense of Hall et al. 1983  who have in mind complete rather than partial pathways, which are maintained by alternating environments. Here, we are interested in variation: pathways can be built from scratch through horizontal transfer, and swiftly be driven to fixation when the appropriate environment arises. Hence, this process is much like cryptic genetic variation, except that the mechanism for variation accumulation is horizontal transfer rather than mutation. Based on our results, however, this kind of variation only occurs when the rate of horizontal transfer is very high and pathway genes are clustered (Figure 4).
Various genetic structures and mechanisms promote the transfer of genes between bacterial genomes. Integrons, for example, capture genes in cassette structures and are notable for their role, in conjunction with plasmids, in the accumulation of multiple drug resistance genes in pathogens [29, 30]. Here, horizontal transfer plays a far greater role in generating variation for selection to act on than does mutation . When mobile genetic elements harbour genes that arose by transfer rather than being directly selected, this is very much like cryptic genetic variation. Although multiple drug resistance is not an instance of a metabolic pathway it falls into the scope of this study: multiple genes are required to generate a fitness advantage in special environments in which many drugs are found. Such an environment may be an infected host who is treated either sequentially or simultaneously with multiple antibiotics. If continuous partial exposure to this environment contributes to the maintenance of the multiple resistance phenotype, then the variation is not so cryptic and these genes are more like cryptic genes in Hall’s sense .
Our model shows how variation in gene presence can also occur through decay of an existing pathway. This type of variation occurs when exposure is very low or zero; that is, when the selective environment no longer acts on the pathway. Here, the pathway is mostly lost at equilibrium but there is a large degree of variation produced transiently as the pathway decays. This variation in gene content is generally far greater than the variation due to the accumulation of genes by horizontal transfer (Figure 3). The model structure (see Figure 1) shows why this is so. The process of acquisition by horizontal transfer can be a rapid process as described above because multiple genes can be gained at once. In contrast, deletion of genes occurs gradually – one gene at a time. While acquisition occurs without much intermediate variation, decay from a complete pathway takes a different route, visiting these intermediate states. Thus, where large amounts of variation occur in pathways, such observations would be better explained by past changes in the environment so that the genes in question were no longer being needed or were only weakly selected because of genetic drift . The extensive genome variation due to gene decay observed in Shigella[13, 33] and Mycobacterium leprae is a clear illustration of this point.
Horizontal gene transfer has frequently been argued to be an important force in bacterial genome evolution . This stems from the observation of frequent transfer events that cross prokaryotic species boundaries and the inevitable distortion of phylogenetic signal that results . This idea is strengthened by the observations of the transfer of whole pathways across species and the mixed origins of elements of pathways [35, 36]. Our study examines the impact of horizontal gene transfer as an evolutionary force acting within a bacterial species, especially on the extent of variation at the pathway level. Although horizontal gene transfer is likely to play a large role in the initial acquisition of pathways, it is far less able to explain pathway variation than can mutational loss of genes.
As more genomic data become available, a careful examination of variation within pathways within species will shed light on these ideas. If the distribution of the number of genes in a pathway is either weighted towards the complete pathway or weighted in the middle (e.g., Figure 2B or 2C), then it is likely to be a decaying pathway in transit. If on the other hand, the distribution is weighted towards zero elements of the pathway, then this distribution could be consistent with either a small accumulation of variation due to horizontal transfer, or the decay of an existing pathway in its final stages (e.g. Figure 2A). In this case an examination of the phylogenies of the remaining genes would allow the issue to be resolved. If those trees are congruent (and follow the tree of vertical evolution), then the observed pattern arose by decay provided there have not been subsequent transfer events to swamp the signal; otherwise it was from horizontal transfer. Understanding variation in presence and absence of genes in pathways can provide insight into the current ecology of a species as well as its evolutionary history.
Analysis of the mathematical model developed in this study has shown that a high level of variation in gene presence in bacterial genomes can be readily explained through decay of an existing pathway. In the context of pathway variation the role of horizontal gene transfer is probably the initial introduction of a complete novel pathway rather than in building up the variation in a genome without the pathway.
The authors would like to thank Attila Egri-Nagy, Steven Hamblin, Ruiting Lan, Laurence Park and Glenn Stone for their comments on the manuscript. This work is supported by the Australian Research Council through Future Fellowship FT100100898 (to ARF) and Discovery Grant DP0987302 including a QEII Fellowship (to MMT).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.