Research article | Open

# Evidence for positive selection on Mycobacterium tuberculosis within patients

*BMC Evolutionary Biology* **volume 4**, Article number: 31 (2004)

## Abstract

### Background

While the pathogenesis and epidemiology of tuberculosis are well studied, relatively little is known about the evolution of the infectious agent *Mycobacterium tuberculosis*, especially at the within-host level. The insertion sequence IS*6110* is a genetic marker that is widely used to track the transmission of tuberculosis between individuals. This and other markers may also facilitate our understanding of the disease within patients.

### Results

This article presents three lines of evidence supporting the action of positive selection on *M. tuberculosis* within patients. The arguments are based on a comparison between empirical findings from molecular epidemiology and population genetic models of evolution. Under the hypothesis of neutrality of genotypes, 1) the mutation rate of the marker IS*6110* is unusually high, 2) the time it takes for substitutions to occur within patients is too short, and 3) the amount of polymorphism within patients is too low.

### Conclusions

Empirical observations are explained by the action of positive selection during infection, or alternatively by very low effective population sizes. I discuss the possible roles of antibiotic treatment, the host immune system and extrapulmonary dissemination in creating opportunities for positive selection.

## Background

How actively do populations of *Mycobacterium tuberculosis* cells undergo adaptive evolution on the spatial and temporal scales of individual infections? On the one hand, the long generation time and limited sequence diversity of this organism might suggest a slow pace of adaptive evolution. On the other hand, the rapidity and ease with which antibiotic resistance is generated during infection suggests otherwise. The physiology and immunology of tuberculosis pathogenesis have been well studied. The infectious agent *M. tuberculosis* is known to invade and replicate within alveolar macrophages. There is a spectrum of responses by the immune system, corresponding to the relative involvement of Th1 and Th2 immune cells, which respectively stimulate the cytotoxic response (more effective against infected cells) and the humoral/antibody response (more effective against extracellular pathogens) [1, 2]. Some progress has been made in describing the population dynamics of mycobacterial infection quantitatively [3–5]. At the wider spatial and temporal scales of populations, the molecular epidemiology and the evolution of *M. tuberculosis* have been carefully studied. Genotypic data are rapidly accumulating in the molecular epidemiology of infectious diseases. These are usually compiled and summarised to make inferences about the state of an epidemic in a given geographic location, or at the global level. For example, epidemiologists seek to identify risk factors for infection, and to locate particular strains that are especially transmissible or pathogenic [6–8]. The evolutionary history of *M. tuberculosis* has also been characterised. For example, it has been argued that the limited variation at the nucleotide level is due to a recent population bottleneck [9], and that the common ancestor of *Mycobacterium bovis* and *M. tuberculosis* may well have been a human rather than a bovine pathogen [10].

Less understood is the evolution of *M. tuberculosis* at the cellular level *inside* bodies. There has been little integration of the genetic information from markers with the population genetics of the bacterial population within hosts. In this article, I examine data collected for the purposes of molecular epidemiology to present three lines of evidence supporting the action of positive selection on *M. tuberculosis*. The data come from the marker IS*6110*, which is currently the standard method of typing tuberculosis isolates. These genotypic data will be considered under assumptions of neutrality, and then under the assumption that positive selection is acting. The case for the action of selection is based on the following three arguments.

1. Under the assumption of neutrality, the observed mutation (or transposition) rate of the genetic marker IS*6110* is unusually high; the estimated mutation rate is lower if selection is acting.
2. The observed times associated with change are too short to be explained by neutrality; positive selection lowers the expected substitution time.
3. The observed level of polymorphism is too low to be explained by neutrality.

## Results: Models and observations

In each of the following sections a comparison is made between the strictly neutral model and a generalised model including selection through a single parameter *s* (described in the Appendix). Although the analyses start with strict neutrality (*s* = 0) in each argument, alleles for which *s* < 1/*N*, where *N* is the effective population size, can be considered *nearly neutral*, in that the effects of drift outweigh the force of selection [11]. In each case, explaining observations in this range of selective coefficients requires very low effective population sizes.

### Transposition rates of IS6110

When genetic mutations are selectively neutral, the substitution rate is equal to the mutation rate [11]. In the present case, the *within-host* substitution process is of interest. Rosenberg *et al.* [12] determined the within-host substitution rate of the IS*6110* marker to be around 0.00184 to 0.0390 events per copy per year, with the maximum likelihood estimate at 0.0287. Under neutrality, therefore, this rate corresponds to a *per insertion* mutation rate of $\mu_i \approx 7.9 \times 10^{-5}$ events per site per generation (0.0287/365), assuming a generation time of 1 day in active infections. This figure comes from a measured doubling time of close to 24 hours, based on clinical isolates grown in human monocyte cultures and in culture media [13–15]. Rates of point mutation (events *per nucleotide* per generation) are usually in the vicinity of 10^{-9}. In mutator strains, that is, genomes in which damaged DNA repair machinery leads to elevated mutation rates, the mutation rate rises by orders of magnitude, up to ~10^{-7} – 10^{-6} [16]. The mutation rate of IS*6110* under neutrality therefore seems suspiciously high, although this is only "circumstantial evidence", since such a high rate is not inherently implausible. Indeed, mutation rates as high as 10^{-4} per element per generation have been measured for IS*10* *in vitro* [17]. Nevertheless, if positive selection is allowed, the estimated mutation rate decreases. Leaving aside the complicating influence of clonal interference [18], the rate of substitution is

$$K = u N \mu_i \qquad (1)$$

where *u* is the probability of fixation of a mutant, $\mu_i$ is the mutation rate and *N* is the population size [11]. An estimate of the mutation rate when mutants have advantage *s* is therefore $\hat{\mu}_i = K/(uN)$. The diffusion model of drift provides an expression for *u* as a function of the population size *N* and selective coefficient *s* (see the Appendix). Figure 1 plots $\hat{\mu}_i$ against *s* for a few different values of *N*. In each curve the estimated mutation rate decreases as the selective coefficient rises. According to this analysis, lower mutation rates are possible when there is some selection and a large population size, or when selection is strong and the population size is small. Note that the estimated mutation rate remains high if mutations are nearly neutral.
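The arithmetic behind this argument can be sketched in a few lines. The sketch below assumes one generation per day (as in the text) to convert the per-copy, per-year rate into $K$ per generation, and takes the fixation probability of a single mutant cell to be the diffusion approximation $u = (1 - e^{-4s})/(1 - e^{-4Ns})$ for the binary-fission model of the Appendix, which reduces to $u = 1/N$ when $s = 0$. The function names are illustrative and are not taken from the original analysis.

```python
import math

GENERATIONS_PER_YEAR = 365.0  # assumption from the text: one generation per day


def per_generation_rate(rate_per_year: float) -> float:
    """Convert a per-copy, per-year substitution rate into a per-generation rate."""
    return rate_per_year / GENERATIONS_PER_YEAR


def fixation_probability(s: float, n: int) -> float:
    """Diffusion approximation u = (1 - e^{-4s}) / (1 - e^{-4Ns}) for one mutant cell.

    Reduces to the neutral value u = 1/N when s = 0.
    """
    if s == 0.0:
        return 1.0 / n
    # expm1 keeps precision for small arguments; numerator and denominator are both negative
    return math.expm1(-4.0 * s) / math.expm1(-4.0 * n * s)


def estimated_mutation_rate(k: float, s: float, n: int) -> float:
    """Estimate mu_i = K / (u N) from a per-generation substitution rate K."""
    return k / (fixation_probability(s, n) * n)


if __name__ == "__main__":
    k = per_generation_rate(0.0287)  # ML estimate reported by Rosenberg et al. [12]
    print(f"K per generation: {k:.2e}")
    for n in (100, 1000, 10000):
        row = [estimated_mutation_rate(k, s, n) for s in (0.0, 0.001, 0.01)]
        print(n, ", ".join(f"{mu:.1e}" for mu in row))
```

Under neutrality $uN = 1$, so $\hat{\mu}_i = K$. Any $s > 0$ makes $uN > 1$ and lowers the estimate, and this reduction is larger when $N$ is large.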

### Fixation times

Various studies have measured the stability of IS*6110* as a genetic marker by examining genotypes of serial isolates from patients with persistent infection. A small number of changes in the genotypes between serial isolates indicates a stable marker. Differences in genotypes due to exogenous reinfection by unrelated strains are excluded from consideration. In the data of Niemann *et al.* [19] and Rosenberg *et al.* [12], the median time interval associated with changes in IS*6110* genotypes from serial samples of *M. tuberculosis* is 212 days, and the maximum is 683 days. Because the second sample is taken some time after fixation of the mutant, the actual substitution times are unknown, but they were clearly all under 683 days. I will now show that the expected substitution times under strict neutrality are well in excess of this value.

Let us start with the assumption that the expected time for substitution to occur is the average time taken for the successful mutant to appear plus the time taken for that mutant to reach fixation, conditional on its eventual fixation. (I will later drop the assumption about waiting for the mutant to appear.) The average appearance time is 1/(*μNu*) = 1/*μ* generations, since *u* = 1/*N* under strict neutrality. The average time for a successful neutral mutant to reach fixation is 4*N* generations. The mutation rate of interest in this context is the rate *per genome* per generation, since what matters is whether any of the elements in a given genome produces a change. For simplicity, assume that the genomic mutation rate scales linearly with copy number. (At the resolution of this analysis, this is a reasonable approximation.) Since a typical strain has 10 copies of the IS element, the relevant mutation rate here is $\mu = 10\,\mu_i = 7.9 \times 10^{-4}$. Therefore, for *N* = 10, 10^{3} and 10^{5}, the expected substitution times 1/*μ* + 4*N* are roughly 1300, 5300 and 4 × 10^{5} generations, respectively. With the generation time set to one day, the upper bound of observed substitution times was 683 generations, which is well below theoretical expectations.
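The neutral expectation $1/\mu + 4N$ can be checked directly. This sketch uses only the quantities stated in the text: $\mu_i = 7.9 \times 10^{-5}$, 10 copies per genome, and linear scaling of the genomic rate with copy number. The function name is illustrative.

```python
MU_PER_COPY = 7.9e-5  # per insertion per generation (the text's neutral estimate)
COPY_NUMBER = 10      # typical IS6110 copy number assumed in the text
MU_GENOME = MU_PER_COPY * COPY_NUMBER  # genomic rate, assuming linear scaling with copies


def neutral_substitution_time(n: int, mu_genome: float = MU_GENOME) -> float:
    """Expected generations until a neutral substitution: the waiting time 1/mu for the
    eventually successful mutant plus its conditional fixation time 4N."""
    return 1.0 / mu_genome + 4.0 * n


for n in (10, 10**3, 10**5):
    print(n, round(neutral_substitution_time(n)))
# 10 1306
# 1000 5266
# 100000 401266
```

Even for $N = 10$, the waiting-time term alone (about 1266 generations) exceeds the 683-day maximum.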

Now consider the possibility of positive selection under two alternative conservative assumptions. The earlier assumption that there are no successful mutants at the time of the first sample favours the parental strain. A more conservative approach (favouring mutants) would be to say that the mutant destined to reach fixation appears exactly at the time of the first sample. We can then ask how long it takes on average for this mutant to reach fixation if it is positively selected. An even more conservative model would be that the successor strain is not only present at the time of the first sample, but already present at a frequency of 30%. Furthermore, let us say the subdominant strain only needs to reach 70% by the time of the second sample to be considered to have replaced the parental strain.

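A model of the sojourn times of alleles in populations conditional on fixation must now be specified. Again using the diffusion model of drift (see Appendix), the mean time spent by a mutant in the range of frequencies (*a, b*) (provided *a* is greater than the initial frequency), conditional on fixation, was found by Ewens [20] and by Maruyama [21] to be

$$\bar{t}^{*}(a, b) = \int_a^b \frac{2\,u(x)}{v(x)\,\psi(x)} \int_x^1 \psi(y)\,dy\,dx, \qquad \psi(x) = \exp\left(-\int_0^x \frac{2m(y)}{v(y)}\,dy\right),$$

where *u*(*x*) is the probability of fixation starting from frequency *x*, and *m*(*x*) and *v*(*x*) are the mean and variance of the change in allele frequency per generation (see Appendix).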

Figure 2 shows the two conservative models, corresponding to two different boundary values for (*a, b*). Even in the extremely conservative model shown in the right-hand plot, the effective population size must be below 400 in order to explain the observed substitution times under strict neutrality. The data are difficult to account for even in terms of nearly neutral mutations (*s* < 1/*N*) and an effective population size of *N* = 1000. The alternative explanation is that the effective population size is larger, but positive selection is acting to make changes sweep through the population faster.
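The sojourn-time comparison can be reproduced numerically under a few stated assumptions. This sketch uses the Appendix's $m(x) = sx(1-x)$ and $v(x) = x(1-x)/(2N)$, so that $\psi(x) = e^{-4Nsx}$. It also uses the standard diffusion result for the mean time spent near $x$ conditional on fixation, $t^*(x) = 2u(x)\int_x^1\psi(y)\,dy / [v(x)\psi(x)]$, for $x$ above the starting frequency. With these choices this simplifies to $t^*(x) = 4N\,u(x)\,(1 - e^{-4Ns(1-x)}) / (4Ns\,x(1-x))$, with neutral limit $4N$. The integration uses a simple midpoint rule; the function names and step count are illustrative choices, not the original computation.

```python
import math


def _fixation_from(x: float, c: float) -> float:
    """u(x) = (1 - e^{-cx}) / (1 - e^{-c}) with c = 4Ns; the neutral limit is u(x) = x."""
    if c == 0.0:
        return x
    return math.expm1(-c * x) / math.expm1(-c)


def conditional_sojourn_density(x: float, n: int, s: float) -> float:
    """Mean generations per unit frequency spent near x, conditional on fixation,
    for x above the starting frequency."""
    c = 4.0 * n * s
    if c == 0.0:
        return 4.0 * n  # neutral case: uniform density 4N
    tail = -math.expm1(-c * (1.0 - x))  # 1 - e^{-c(1-x)}
    return 4.0 * n * _fixation_from(x, c) * tail / (c * x * (1.0 - x))


def conditional_sojourn_time(a: float, b: float, n: int, s: float, steps: int = 20000) -> float:
    """Integrate the density over (a, b) with the midpoint rule (endpoints never evaluated)."""
    h = (b - a) / steps
    return h * sum(conditional_sojourn_density(a + (k + 0.5) * h, n, s) for k in range(steps))


if __name__ == "__main__":
    # Most conservative model: from 30% to 70%. Neutrally this takes 1.6N generations.
    print(conditional_sojourn_time(0.3, 0.7, 400, 0.0))     # about 640 generations
    print(conditional_sojourn_time(0.3, 0.7, 10**4, 0.0))   # about 16000 generations
    print(conditional_sojourn_time(0.3, 0.7, 10**4, 0.05))  # about 34 generations
```

Under neutrality the 30%–70% window takes $1.6N$ generations, so staying under 683 days requires $N$ of roughly 400 or less. A moderate $s$ shortens the sojourn to about $\ln\big((0.7/0.3)^2\big)/s$ generations regardless of $N$.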

### Polymorphism

Many analyses of pathogen genotypes assume isolated strains to be clonal, that is, to be monomorphic. This assumption has been scrutinised by de Boer *et al.* [22], who showed that a large proportion (93%) of *M. tuberculosis* isolates are indeed monomorphic using IS*6110* as the marker. They also showed that the limits of detection of a second strain are around frequencies of 0.1 to 0.3. More sensitive instruments and refined genotyping procedures are likely to reveal greater polymorphism. The current information can nevertheless be used to study the population of the organism in hosts by using ranges of *detectable polymorphism*. In this section, two ranges will be considered in examining predictions from models: first, 0.1 to 0.9, and second, 0.3 to 0.7.

The polymorphism argument rests on the assumption that the isolates reported in [22] can be viewed as a random sample from a set of populations in mutation-drift equilibrium. It should be noted that because the isolate represents a sample of cells from the patient, it presumably does not always reflect the diversity of cells in the greater within-host population. Thus the polymorphism or heterogeneity observed from isolates is an underestimate of the actual levels.

Wright [23] found the stationary probability distribution of allele frequencies under the diffusion model with mutation and two alleles. Let *f*(*x*) be the probability density function of this distribution and *F*(*x*) the corresponding cumulative distribution function (see Appendix). The probability that the allele frequency in a given population (patient) lies between *a* and *b* (where *a* < *b*) is

$$P(a, b) = \int_a^b f(x)\,dx = F(b) - F(a).$$

This quantity can be alternatively interpreted as the proportion of populations observed to be polymorphic according to the detection limits set by (*a, b*).

First consider the neutral case. When there is no selection (*s* = 0), the distribution described by *f(x)* is a Beta distribution. Figure 3 shows the probability of an isolate being scored as a polymorphic population, using two alternative detectable polymorphism ranges (*a, b*) = (0.1, 0.9) and (0.3, 0.7), and a mutation rate of *μ* = 7.9 × 10^{-4} per cell per generation.
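The neutral prediction can be sketched as follows. This assumes the stationary distribution is Beta$(4N\mu, 4N\mu)$, a parameterisation consistent with the Appendix's variance term $v(x) = x(1-x)/(2N)$ and symmetric two-allele mutation at rate $\mu$. It uses only the Python standard library, with `math.lgamma` for the Beta normalising constant and a midpoint rule on the interior interval $(a, b)$. The function name is illustrative.

```python
import math


def neutral_polymorphism_probability(a: float, b: float, n: int, mu: float,
                                     steps: int = 20000) -> float:
    """P(a < x < b) under the neutral stationary Beta(4N mu, 4N mu) distribution."""
    alpha = 4.0 * n * mu
    log_beta = 2.0 * math.lgamma(alpha) - math.lgamma(2.0 * alpha)  # log B(alpha, alpha)
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h  # midpoints avoid the (possibly singular) endpoints 0 and 1
        total += math.exp((alpha - 1.0) * (math.log(x) + math.log1p(-x)) - log_beta)
    return total * h


if __name__ == "__main__":
    for n in (10, 100, 1000):
        p_wide = neutral_polymorphism_probability(0.1, 0.9, n, 7.9e-4)
        p_narrow = neutral_polymorphism_probability(0.3, 0.7, n, 7.9e-4)
        print(n, round(p_wide, 3), round(p_narrow, 3))
```

When $4N\mu < 1$ the distribution is U-shaped and most populations sit near fixation. Once $4N\mu$ exceeds 1 the mass moves into the interior, so the predicted fraction of polymorphic isolates rises quickly with $N$.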

Next, consider the model that includes selection. For the two detectable polymorphism ranges, Figure 4 shows how the selective coefficient *s* and effective population size *N* are related to the probability of observing polymorphism. As *s* increases, the predicted polymorphism decreases dramatically, particularly for large *N*. Again, the observed level of polymorphism can only be explained by setting *N* to be extremely low.
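A sketch of the selection case follows, under stated assumptions. It takes Wright's stationary density in the form $f(x) \propto e^{4Nsx}\,x^{4N\mu-1}(1-x)^{4N\mu-1}$, which follows from $m(x) = sx(1-x) + \mu(1-2x)$ and $v(x) = x(1-x)/(2N)$ (our derivation; the text does not spell it out). The normalising constant is computed from the series $\sum_k c^k/k!\,B(\alpha+k, \alpha)$ with $c = 4Ns$ and $\alpha = 4N\mu$, summed in log space to avoid overflow. Only $s \ge 0$ is handled, and the names are illustrative.

```python
import math


def log_normaliser(alpha: float, c: float) -> float:
    """log Z, where Z = integral_0^1 e^{cx} x^(alpha-1) (1-x)^(alpha-1) dx and c >= 0."""
    lg_alpha = math.lgamma(alpha)
    if c == 0.0:
        return 2.0 * lg_alpha - math.lgamma(2.0 * alpha)  # log B(alpha, alpha)
    log_terms = []
    best = -math.inf
    k = 0
    while True:
        # log of c^k / k! * B(alpha + k, alpha)
        term = (k * math.log(c) - math.lgamma(k + 1.0) + math.lgamma(alpha + k)
                + lg_alpha - math.lgamma(2.0 * alpha + k))
        log_terms.append(term)
        best = max(best, term)
        if k > c and term < best - 50.0:  # past the peak (k ~ c) and negligible
            break
        k += 1
    return best + math.log(sum(math.exp(t - best) for t in log_terms))


def polymorphism_probability(a: float, b: float, n: int, mu: float, s: float,
                             steps: int = 20000) -> float:
    """P(a < x < b) under f(x) proportional to e^{4Nsx} x^(4N mu - 1) (1 - x)^(4N mu - 1)."""
    if s < 0:
        raise ValueError("this sketch assumes s >= 0")
    alpha, c = 4.0 * n * mu, 4.0 * n * s
    log_z = log_normaliser(alpha, c)
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h
        total += math.exp(c * x + (alpha - 1.0) * (math.log(x) + math.log1p(-x)) - log_z)
    return total * h


if __name__ == "__main__":
    for s in (0.0, 0.001, 0.01):
        print(s, round(polymorphism_probability(0.1, 0.9, 1000, 7.9e-4, s), 3))
```

Setting $s = 0$ recovers the neutral Beta case. Positive $s$ shifts the stationary mass towards the favoured allele and shrinks the interior probability.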

## Discussion

The three lines of evidence presented in this article suggest positive selection on *M. tuberculosis* within hosts. There are, however, limitations to these analyses. In the first argument, a high transposition rate is not inherently implausible. In the second argument, it is possible to lower the effective population size far enough to explain the speed of substitution. In the third argument, 1) the bacterial populations sampled in [22] might not be close to mutation-drift equilibrium, 2) the sampled cells might not reflect the true diversity of the bacterial population in a patient, and 3) the levels of polymorphism again may be explained by very low effective population sizes. Consistency with observations nevertheless requires *N* values of around 100 or lower, which seems grossly at odds with the usually large census population sizes of bacteria. In mouse models of TB infection, for instance, bacterial loads reach around 10^{5} – 10^{7} colony forming units per lung [2, 24]. It has been noted, however, that effective population sizes of bacteria can be much lower than actual sizes [11, 25].

I will also comment on why I have not attempted to statistically fit the model to data to estimate *N* and *s*. First, from the plots shown here, it is clear that different combinations of the two parameters can explain the observations. This would make it difficult to locate the best fit. Second, although the model can be used to assess the possibility of neutrality in the current context, it cannot adequately serve as a framework for estimation given the intricacies of host-pathogen interactions. Further, adding more parameters to the model would increase the complexity of the analysis beyond what can be sustained by the resolution of the currently available data.

Taken together, the results suggest positive selection, although the evidence is not conclusive. A possible alternative is that the effective population sizes of *M. tuberculosis* within patients are very low due to population structure, background selection, or other factors. If there is indeed detectable adaptive evolution of tuberculosis within patients, what are the sources of selection? Two important candidates are antibiotic treatment and the host immune system. Studies using serial isolates have found no correlation between IS*6110* genotype instability and (a) drug resistance/susceptibility of the isolate [26, 27], (b) *change* in drug resistance status [19] or (c) drug adherence by the patient [26]. It is still possible, however, that the observed changes involve a variety of different genetic loci, with at least some conferring drug resistance, although such events may not be statistically detectable. Further, mutations in drug resistance loci will not necessarily be revealed by a marker. Genetic analysis of isolates of *M. tuberculosis* from the lung lesions of six patients has shown heterogeneity in resistance-associated alleles, but not with respect to IS*6110* [28].

Alternatively, fingerprint changes may reflect (evolutionary) escape from the immune system. The analysis here hints at low effective population sizes: perhaps the immune system induces severe declines in the population size of *M. tuberculosis* within patients (bottlenecks), after which the population is re-established by survivors with new genotypes. If the observed patterns are to be explained by severe bottlenecks, the surviving cells are not necessarily better adapted to residing in the host than the parental cells that were eliminated by the immune response. It is noteworthy that Yeh *et al.* [26] found no relationship between the HIV status of the patient and genotype instability. This suggests that genetic changes in *M. tuberculosis* are not primarily driven by the immune system. However, the extraordinary ability of *M. tuberculosis* to manipulate the T cell response [2] suggests that adaptation to the immune system has played a role in the deeper evolutionary history of the organism.

Interestingly, de Boer *et al.* [27] found an association between IS*6110* change and both extrapulmonary (or combined pulmonary and extrapulmonary) disease and an extrapulmonary origin of isolates. Dissemination is a major factor in the pathogenesis of tuberculosis. Since the lungs are the preferred environment of the organism, new environments outside the lungs may create opportunities for adaptive evolution. Adaptive evolution leading to specialisation to tissue types is to be expected. A recent article [29], for example, has found tissue-specific adaptations in *Streptococcus pyogenes* by examining ratios of non-synonymous to synonymous substitution rates ($d_n/d_s$).

Is IS*6110* directly responsible for adaptive mutations? On the one hand, the apparently strict asexuality of *M. tuberculosis* implies that all genes, whether beneficial, deleterious or neutral, are tightly linked to each other. It is likely, then, that IS-induced changes hitchhike to fixation with other mutations that confer an advantage on the genome. On the other hand, it has recently been demonstrated that IS*6110* carries a promoter that can modify the expression of neighbouring genes, raising the possibility of a direct role for the element in adaptive evolution [30]. Note that changes caused by IS*6110* can be not only beneficial, but also neutral or deleterious [31].

At the within-patient level, the best studied pathogen is perhaps HIV. While *M. tuberculosis* shares with viruses the characteristic of replicating within cells, a major difference is that mutation rates in viruses are much higher, particularly in retroviruses, which depend on reverse transcriptase (a low-fidelity enzyme) to copy their genomes. Hence, the extent of nucleotide variation in *M. tuberculosis* is not expected to match that commonly observed in, for example, HIV [32]. There is ongoing controversy among HIV researchers about the role of stochasticity due to low effective population sizes in the evolution of the virus [33–35]. In any case, investigating the ratio of non-synonymous to synonymous substitutions ($d_n/d_s$) has established the action of positive selection on HIV within patients [32, 36].

In *M. tuberculosis*, the level of polymorphism at synonymous sites has been noted to be extremely low [9]. It would be of interest to measure the ratio of non-synonymous to synonymous polymorphisms in key genes, such as loci conferring resistance to drugs, or those implicated in interactions with the immune system. These $d_n/d_s$ ratios may provide further insight into the nature of positive selection in *M. tuberculosis*.

## Appendix: Bacteria and the Wright-Fisher process

The analyses here rely on the commonly used diffusion model of drift and selection in a population, based on the Wright-Fisher process [37, 38]. It is also possible to use the Moran model, in which at each time step an individual is chosen randomly to reproduce, and then another individual is chosen to die. The individual to die may be the same as the individual that reproduced, but not the offspring. Selection can be incorporated by including differential probability of birth or death for different genotypes. As noted by Ewens [38], the Moran model closely resembles the Wright-Fisher model; the critical difference between the two models arises from differences in the distribution of offspring number. The theory is usually discussed in relation to a diploid population of size $N_e$ in which there are $2N_e$ copies of the (autosomal) gene in question. The diffusion model is used here with minor adjustments to describe bacterial populations. Let the number of bacterial cells in a population be *N*. Each mutant appears in the population at frequency 1/*N*. Realistically, population sizes fluctuate and only a subset of cells actively divide. The number *N* should therefore be considered to be the effective rather than the actual population size, which may be much larger than *N*.

### Mean and variance of change

It can be shown that the deterministic dynamics of selection in a haploid model are well approximated by a logistic model. The mean change in frequency *x* of an allele per generation is *m*(*x*) = *sx*(1-*x*), which is identical to the diploid model with additive fitnesses (no dominance) if each copy of the advantageous allele adds *s* to the fitness (see [37], p. 192). That is, heterozygotes enjoy a fitness advantage *s* and homozygotes have advantage 2*s*.
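The quality of the logistic approximation can be checked numerically. This sketch assumes discrete, non-overlapping generations in which the mutant has fitness $1 + s$ relative to 1, so that $x' = x(1+s)/(1+sx)$ and the per-generation change is $sx(1-x)/(1+sx) \approx sx(1-x)$. The text does not spell out this discrete model. The sketch compares that recursion with the solution of $dx/dt = sx(1-x)$. Names are illustrative.

```python
import math


def haploid_recursion(x0: float, s: float, generations: int) -> list[float]:
    """Discrete-generation haploid selection: x' = x(1 + s) / (1 + s x)."""
    xs = [x0]
    for _ in range(generations):
        x = xs[-1]
        xs.append(x * (1.0 + s) / (1.0 + s * x))
    return xs


def logistic(x0: float, s: float, t: float) -> float:
    """Solution of dx/dt = s x (1 - x) with x(0) = x0."""
    growth = x0 * math.exp(s * t)
    return growth / (1.0 - x0 + growth)


if __name__ == "__main__":
    xs = haploid_recursion(0.01, 0.01, 1000)
    worst = max(abs(x - logistic(0.01, 0.01, t)) for t, x in enumerate(xs))
    print(f"largest gap over 1000 generations: {worst:.4f}")
```

The two trajectories differ only because the odds ratio grows by $1+s$ per generation in the recursion and by $e^{s}$ in the logistic, which is a negligible difference for small $s$.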

The variance component *v(x)*, the variance in change of allele frequency per generation, can also be taken from diploid theory. Replacing the diploid model of the random union of gametes with choosing cells randomly from each generation to the next, the effective population size is adjusted according to the distribution of offspring number under a given model of cellular division.

### Binary fission

There are numerous ways to model drift in populations of organisms that reproduce by dividing to produce two daughters [39]. Here, cells are assumed to undergo fission synchronously and daughter cells are chosen randomly at each generation. In the absence of selective effects, the offspring distribution is $p_0 = 1/4$, $p_1 = 1/2$, $p_2 = 1/4$, where $p_i$ is the probability of producing *i* offspring. The variance in offspring number here is $\sigma^2 = 1/2$ and the variance-effective population size equals $N/\sigma^2 = 2N$. Thus in this case, the diploid theory can be directly used as far as *v(x)* is concerned (replacing $2N_e$ with $2N$). Johnson and Gerrish [39] consider alternative models. These alternatives are associated with different rates at which drift proceeds in a population, and would not affect the qualitative conclusions drawn here.
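For a finite population, the offspring distribution of this fission-and-sampling scheme can be computed exactly. This sketch assumes that each generation all $N$ cells divide and exactly $N$ of the $2N$ daughters are drawn without replacement. The probabilities then approach $1/4, 1/2, 1/4$, and the variance $(N-1)/(2N-1)$ approaches $1/2$ as $N$ grows. It relies on the standard-library `math.comb`; the function names are illustrative.

```python
from math import comb


def offspring_distribution(n: int) -> tuple[float, float, float]:
    """Return (p0, p1, p2): the chance that a parent cell contributes 0, 1 or 2 of its
    daughters when N of the 2N daughters are sampled without replacement."""
    if n < 2:
        raise ValueError("need at least two cells")
    total = comb(2 * n, n)
    p0 = comb(2 * n - 2, n) / total      # neither daughter sampled
    p2 = comb(2 * n - 2, n - 2) / total  # both daughters sampled
    return p0, 1.0 - p0 - p2, p2


def offspring_variance(n: int) -> float:
    """Variance in the number of surviving offspring per parent (the mean is 1)."""
    p0, p1, p2 = offspring_distribution(n)
    mean = p1 + 2.0 * p2
    return p1 + 4.0 * p2 - mean ** 2


if __name__ == "__main__":
    for n in (10, 100, 1000):
        var = offspring_variance(n)
        print(n, round(var, 4), round(n / var, 1))  # variance -> 1/2, N/var -> 2N
```

The variance-effective size $N/\sigma^2$ is therefore slightly above $2N$ for small populations and converges to $2N$, which supports replacing $2N_e$ with $2N$ in the diffusion variance.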

### Fixation probability

Diffusion models of genetic drift have shown [11] that the probability of fixation of an allele at frequency *p* in a randomly mating diploid population of size $N_e$ (with $2N_e$ copies of the gene in question) is

$$u(p) = \frac{1 - e^{-4N_e s p}}{1 - e^{-4N_e s}}.$$

Therefore, using $p = 1/N$ rather than the usual $p = 1/(2N_e)$, and replacing $N_e$ with $N$ as above, the probability of fixation of a mutant bacterial cell with selective advantage *s* is

$$u = \frac{1 - e^{-4s}}{1 - e^{-4Ns}}.$$

Note that when $4Ns \gg 1$, $u \approx 4s$. This agrees with a result of Gerrish and Lenski [18], using a branching process model (rather than the diffusion model) to find the fixation probability under this same model of binary fission. See [39] for discussion of *u* for alternative models.

### Steady state distribution

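Let the mutation rates from any genotype to any other be equal (*μ*). As stated above, selection is additive. As shown by Wright [23], the steady state distribution of allele frequency is then given by the density function

$$f(x) = C\, e^{4Nsx}\, x^{4N\mu - 1} (1 - x)^{4N\mu - 1},$$

where *C* is a normalising constant chosen so that $\int_0^1 f(x)\,dx = 1$. When *s* = 0 this is a Beta distribution with both parameters equal to $4N\mu$.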

## References

1. Bloom BR: Tuberculosis: pathogenesis, protection and control. 1994, Washington: ASM Press
2. Flynn JL, Chan J: Tuberculosis: latency and reactivation. Infect Immun. 2001, 69 (7): 4195-4201. 10.1128/IAI.69.7.4195-4201.2001.
3. Antia R, Koella JC, Perrot V: Models of the within-host dynamics of persistent mycobacterial infections. Proc R Soc Lond B Biol Sci. 1996, 263 (1368): 257-263.
4. Wigginton JE, Kirschner D: A model to predict cell-mediated immune regulatory mechanisms during human infection with *Mycobacterium tuberculosis*. J Immunol. 2001, 166 (3): 1951-1967.
5. Gammack D, Doering CR, Kirschner DE: Macrophage response to *Mycobacterium tuberculosis* infection. J Math Biol. 2004, 48 (2): 218-242. 10.1007/s00285-003-0232-8.
6. Small PM, Hopewell PC, Singh SP, Paz A, Parsonnet J, Ruston DC, Schecter GF, Daley CL, Schoolnik GK: The epidemiology of tuberculosis in San Francisco: a population-based study using conventional and molecular methods. N Engl J Med. 1994, 330: 1703-1709. 10.1056/NEJM199406163302402.
7. Foxman B, Riley L: Molecular epidemiology: focus on infection. Am J Epidemiol. 2001, 153 (12): 1135-1141. 10.1093/aje/153.12.1135.
8. Seidler A, Nienhaus A, Diel R: The transmission of tuberculosis in the light of new molecular biological approaches. Occup Environ Med. 2004, 61 (2): 96-102. 10.1136/oem.2003.008573.
9. Sreevatsan S, Pan X, Stockbauer KE, Connell ND, Kreiswirth BN, Whittam TS, Musser JM: Restricted structural gene polymorphism in the *Mycobacterium tuberculosis* complex indicates evolutionarily recent global dissemination. Proc Natl Acad Sci U S A. 1997, 94: 9869-9874. 10.1073/pnas.94.18.9869.
10. Brosch R, Gordon SV, Marmiesse M, Brodin P, Buchrieser C, Eiglmeier K, Garnier T, Gutierrez C, Hewinson G, Kremer K, Parsons LM, Pym AS, Samper S, van Soolingen D, Cole ST: A new evolutionary scenario for the *Mycobacterium tuberculosis* complex. Proc Natl Acad Sci U S A. 2002, 99 (6): 3684-3689. 10.1073/pnas.052548299.
11. Kimura M: The Neutral Theory of molecular evolution. 1983, Cambridge: Cambridge University Press
12. Rosenberg NA, Tsolaki AG, Tanaka MM: Estimating change rates of genetic markers using serial samples: applications to the transposon IS*6110* in *Mycobacterium tuberculosis*. Theor Popul Biol. 2003, 63 (4): 347-363. 10.1016/S0040-5809(03)00010-8.
13. James BW, Williams A, Marsh PD: The physiology and pathogenicity of *Mycobacterium tuberculosis* grown under controlled conditions in a defined medium. J Appl Microbiol. 2000, 88 (4): 669-677. 10.1046/j.1365-2672.2000.01020.x.
14. Laochumroonvorapong P, Paul S, Manca C, Freedman VH, Kaplan G: Mycobacterial growth and sensitivity to H2O2 killing in human monocytes in vitro. Infect Immun. 1997, 65 (11): 4850-4857.
15. Zhang M, Gong J, Lin Y, Barnes PF: Growth of virulent and avirulent *Mycobacterium tuberculosis* strains in human macrophages. Infect Immun. 1998, 66 (2): 794-799.
16. Sniegowski PD, Gerrish PJ, Johnson T, Shaver A: The evolution of mutation rates: separating causes from consequences. Bioessays. 2000, 22 (12): 1057-1066. 10.1002/1521-1878(200012)22:12<1057::AID-BIES3>3.3.CO;2-N.
17. Shen MM, Raleigh EA, Kleckner N: Physical analysis of Tn10 and IS10 promoted transpositions and rearrangements. Genetics. 1987, 116: 359-369.
18. Gerrish PJ, Lenski RE: The fate of competing beneficial mutations in an asexual population. Genetica. 1998, 102-103: 127-144. 10.1023/A:1017067816551.
19. Niemann S, Richter E, Rusch-Gerdes S: Stability of *Mycobacterium tuberculosis* IS*6110* restriction fragment length polymorphism patterns and spoligotypes determined by analyzing serial isolates from patients with drug-resistant tuberculosis. J Clin Microbiol. 1999, 37: 409-412.
20. Ewens WJ: Conditional diffusion processes in population genetics. Theor Popul Biol. 1973, 4: 21-30.
21. Maruyama T: The average number and the variance of generations at particular gene frequency in the course of fixation of a mutant gene in a finite population. Genet Res. 1972, 19: 109-113.
22. de Boer AS, Kremer K, Borgdorff MW, de Haas PE, Heersma HF, van Soolingen D: Genetic heterogeneity in *Mycobacterium tuberculosis* isolates reflected in IS*6110* restriction fragment length polymorphism patterns as low-intensity bands. J Clin Microbiol. 2000, 38 (12): 4478-4484.
23. Wright S: Evolution in Mendelian populations. Genetics. 1931, 16: 97-159.
24. Moreira AL, Tsenova L, Aman MH, Bekker LG, Freeman S, Mangaliso B, Schroder U, Jagirdar J, Rom WN, Tovey MG, Freedman VH, Kaplan G: Mycobacterial antigens exacerbate disease manifestations in *Mycobacterium tuberculosis*-infected mice. Infect Immun. 2002, 70 (4): 2100-2107. 10.1128/IAI.70.4.2100-2107.2002.
25. Selander R, Levin B: Genetic diversity and structure in *Escherichia coli* populations. Science. 1980, 210 (4469): 545-547.
26. Yeh RW, Ponce De Leon A, Agasino CB, Hahn JA, Daley CL, Hopewell PC, Small PM: Stability of *Mycobacterium tuberculosis* DNA genotypes. J Infect Dis. 1998, 177 (4): 1107-1111.
27. de Boer AS, Borgdorff MW, de Haas PEW, Nagelkerke NJD, van Embden JDA, van Soolingen D: Analysis of rate of change of IS*6110* RFLP patterns of *Mycobacterium tuberculosis* based on serial patient isolates. J Infect Dis. 1999, 180: 1238-1244. 10.1086/314979.
28. Kaplan G, Post FA, Moreira AL, Wainwright H, Kreiswirth BN, Tanverdi M, Mathema B, Ramaswamy SV, Walther G, Steyn LM, Barry CE, Bekker LG: *Mycobacterium tuberculosis* growth at the cavity surface: a microenvironment with failed immunity. Infect Immun. 2003, 71 (12): 7099-7108. 10.1128/IAI.71.12.7099-7108.2003.
29. Kalia A, Bessen DE: Natural selection and evolution of streptococcal virulence genes involved in tissue-specific adaptations. J Bacteriol. 2004, 186: 110-121. 10.1128/JB.186.1.110-121.2004.
30. Safi H, Barnes PF, Lakey DL, Shams H, Samten B, Vankayalapati R, Howard ST: IS*6110* functions as a mobile, monocyte-activated promoter in *Mycobacterium tuberculosis*. Mol Microbiol. 2004, 52 (4): 999-1012. 10.1111/j.1365-2958.2004.04037.x.
31. Tanaka MM, Rosenberg NA, Small PM: The control of copy number of IS*6110* in *Mycobacterium tuberculosis*. Mol Biol Evol. 2004
32. Bonhoeffer S, Holmes EC, Nowak MA: Causes of HIV diversity. Nature. 1995, 376 (6536): 125. 10.1038/376125a0.
33. Brown AJ: Analysis of HIV-1 env gene sequences reveals evidence for a low effective number in the viral population. Proc Natl Acad Sci U S A. 1997, 94 (5): 1862-1865. 10.1073/pnas.94.5.1862.
34. Rouzine IM, Coffin JM: Linkage disequilibrium test implies a large effective population number for HIV *in vivo*. Proc Natl Acad Sci U S A. 1999, 96 (19): 10758-10763. 10.1073/pnas.96.19.10758.
35. Rodrigo AG: HIV evolutionary genetics. Proc Natl Acad Sci U S A. 1999, 96 (19): 10559-10561. 10.1073/pnas.96.19.10559.
36. Nielsen R: Changes in ds/dn in the HIV-1 env gene. Mol Biol Evol. 1999, 16 (5): 711-714.
37. Crow JF, Kimura M: An introduction to population genetics theory. 1970, New York: Harper and Row
38. Ewens WJ: Mathematical population genetics. 1979, Berlin: Springer-Verlag
39. Johnson T, Gerrish P: The fixation probability of a beneficial allele in a population dividing by binary fission. Genetica. 2002, 115 (3): 283-287. 10.1023/A:1020687416478.

## Acknowledgments

I thank Noah Rosenberg, Joanna Masel and Roland Regoes for helpful discussions. This work was supported by a Faculty Research Grant from the University of New South Wales.

## About this article

### Keywords

- Tuberculosis
- Mutation Rate
- Effective Population Size
- Adaptive Evolution
- Selective Coefficient