Insights into the evolutionary history of tubercle bacilli as disclosed by genetic rearrangements within a PE_PGRS duplicated gene pair
- Anis Karboul†1,
- Nicolaas C Gey van Pittius†2,
- Amine Namouchi1,
- Véronique Vincent3,
- Christophe Sola4, 11,
- Nalin Rastogi4,
- Philip Suffys5,
- Michel Fabre6,
- Angel Cataldi7,
- Richard C Huard8,
- Natalia Kurepina9,
- Barry Kreiswirth9,
- John L Ho10,
- M Cristina Gutierrez3 and
- Helmi Mardassi1
© Karboul et al; licensee BioMed Central Ltd. 2006
Received: 24 August 2006
Accepted: 12 December 2006
Published: 12 December 2006
The highly homologous PE_PGRS (Proline-glutamic acid_polymorphic GC-rich repetitive sequence) genes are members of the PE multigene family which is found only in mycobacteria. PE genes are particularly abundant within the genomes of pathogenic mycobacteria where they seem to have expanded as a result of gene duplication events. PE_PGRS genes are characterized by their high GC content and extensive repetitive sequences, making them prone to recombination events and genetic variability.
Comparative sequence analysis of Mycobacterium tuberculosis genes PE_PGRS17 (Rv0978c) and PE_PGRS18 (Rv0980c) revealed a striking genetic variation associated with this typical tandem duplicate. In comparison to the M. tuberculosis reference strain H37Rv, the variation (named the 12/40 polymorphism) consists of an in-frame 12-bp insertion invariably accompanied by a set of 40 single nucleotide polymorphisms (SNPs), and it occurs either in PE_PGRS17 alone or in both genes. Sequence analysis of the paralogous genes in a representative set of tubercle bacilli isolates of worldwide origin yielded data that support previously proposed evolutionary scenarios for the M. tuberculosis complex (MTBC) and confirm the very ancient origin of "M. canettii" and other smooth tubercle bacilli. Strikingly, the identified polymorphism appears to coincide with the emergence of the successful post-bottleneck clone from which the MTBC expanded. Furthermore, the findings provide direct and clear evidence for the natural occurrence of gene conversion in mycobacteria, which appears to be restricted to modern M. tuberculosis strains.
This study provides a new perspective to explore the molecular events that accompanied the evolution, clonal expansion, and recent diversification of tubercle bacilli.
Strains of the Mycobacterium tuberculosis complex (MTBC) are the causative agents of tuberculosis (TB), a disease with a considerable detrimental impact on human and animal health worldwide. This group of slow-growing pathogens includes the classical M. tuberculosis, M. bovis, M. africanum and M. microti, as well as the more recently recognized members M. pinnipedii and M. caprae. M. tuberculosis remains one of the most successful and adaptable pathogens known to mankind despite the availability of a vaccine and effective antimicrobial agents. This adaptability certainly reflects a very ancient and prolific evolutionary history.
With the availability of complete mycobacterial genome sequences, whole-genome comparative sequence analyses became possible and led to the identification of sequence polymorphisms that have greatly informed our understanding of the evolution of the MTBC [2–14]. It is now assumed that M. tuberculosis (the major etiological agent of human TB) and M. bovis (which has a wide host range) both arose from a common ancestor [15, 16]. It has also become apparent that the M. africanum-M. microti lineage represents a phylogenetic bridge between M. tuberculosis and M. bovis, whereas "M. canettii", a rare, phenotypically unusual tubercle bacillus, appears to be closest to the common progenitor of the MTBC [17, 18]. Recent studies confirmed that "M. canettii" and other smooth tubercle bacilli are representatives of pre-bottleneck lineages, and that the progenitor species from which the MTBC emerged (the so-called M. prototuberculosis) might have coexisted with early hominids [19, 20].
Completion of the genome sequence of M. tuberculosis strain H37Rv revealed that a major source of genetic variation in this species could be associated with two large gene families, PE (n = 99) and PPE (n = 68), which encode acidic, asparagine- or glycine-rich proteins. These multigene families account for approximately 10% of the coding capacity of the genome and are characterized by their high GC content and extensive repetitive structure. Both families have been divided into subgroups, of which the PE_PGRS subfamily (n = 61) of the PE family is particularly polymorphic and has been found to be enriched in essential genes. Although the function of the members of this subfamily is currently unknown, the PE_PGRS genes are strongly suspected to be associated with antigenic and genetic variability as well as virulence [22–31]. Because these genes contain a large number of repeat sequences, members of the PE/PPE multigene families are thought to undergo frequent genetic remodelling by gene duplication, recombination and/or strand slippage. In the current study, we focused on a prominent polymorphism motif that occurs within two adjacent PE_PGRS genes and provide evidence for its association with both early and recent evolutionary events, offering a new PE_PGRS-based perspective from which to dissect the evolution of tubercle bacilli.
Comparative sequence analysis of contiguous PE genes
Identification of a prominent genetic variation in PE_PGRS17 and PE_PGRS18 coding sequences
Distribution of the 12/40 polymorphism throughout a worldwide collection of tubercle bacilli
Development of a reverse hybridization-based assay for the detection of the 12/40 polymorphism and confirmation of its non-random distribution throughout the evolutionary history of the MTBC
Frequency of the 3 new PE_PGRS-based genotypic groups (PGRST) in three geographically distinct populations
As our collection contained a substantial number of strains originating from South Africa (Cape Town; n = 61), Tunisia (Tunis, Bizerte, and Zaghouan; n = 144), and the USA (New York and New Jersey; n = 82), we analysed the frequency of the PGRST1, 2 and 3 types in these three geographically and socio-economically distinct countries. As shown in Figures 6B, 6C and 6D, PGRST1 was predominant in all three settings (P < 0.001, P < 0.001, and P = 0.0192, respectively).
Genetic variation of the PE_PGRS17 and PE_PGRS18 genes
The sequences from the pre-bottleneck species, "M. canettii" and other smooth tubercle bacilli, were the most divergent. Besides sharing 10 SNPs with the other MTBC strains, they showed 25 additional specific polymorphic sites (15 sSNPs, 7 nsSNPs, 2 insertions, and 1 deletion), reflecting their evolutionary distance from the rest of the MTBC. In both "M. canettii" and the two other smooth tubercle bacilli strains, PE_PGRS18 is frame-shifted (a 1-nt insertion immediately after position 270), and this gene appears to be much more variable than its paralog.
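Whether an insertion disrupts a coding sequence depends on its length modulo 3: the 12-bp insertion of the 12/40 polymorphism adds four codons and keeps the downstream reading frame, whereas a 1-nt insertion shifts every downstream codon. The Python sketch below illustrates this on a hypothetical 15-nt toy ORF, not a real PE_PGRS sequence; the helper names (`codons`, `insert_after`, `first_stop`) are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq: str) -> list[str]:
    """Split a coding sequence into consecutive triplets in reading frame 1."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def insert_after(seq: str, pos: int, insertion: str) -> str:
    """Insert `insertion` immediately after 1-based position `pos`."""
    return seq[:pos] + insertion + seq[pos:]


def first_stop(seq: str):
    """0-based index of the first in-frame stop codon, or None if there is none."""
    for index, codon in enumerate(codons(seq)):
        if codon in STOP_CODONS:
            return index
    return None


# Hypothetical toy ORF: ATG GCT GAC CGG TAA (Met-Ala-Asp-Arg-stop).
orf = "ATGGCTGACCGGTAA"
in_frame = insert_after(orf, 3, "GGCGGCGGCGGC")  # 12 nt: four extra codons
frame_shift = insert_after(orf, 3, "C")          # 1 nt: every downstream codon shifts

for label, seq in [("wild type", orf), ("12-nt insertion", in_frame), ("1-nt insertion", frame_shift)]:
    print(label, codons(seq), "first stop at codon index", first_stop(seq))
```

With the 12-nt insertion all downstream codons are preserved and the stop codon simply moves four codons later; with the 1-nt insertion the downstream sequence is read out of frame and, in this toy case, a premature TGA appears two codons in. If positions are counted from the first base of the start codon, position 270 is a multiple of 3, so the insertion reported for PE_PGRS18 falls between codons 90 and 91.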
Further inspection of the nucleotide substitutions revealed that, irrespective of the evolutionary status of the species, changes in both paralogs tend to occur at the same nucleotide positions. Positions 54, 119, 129, 153, 213, 217, 247, 450, 462, 507, 508, and 510 were variable in both PE_PGRS genes. Thus, certain positions appear to be prone to genetic variation, although they evolve differently in different species. Strikingly, at every such position but one (position 119), the alleles in one paralog are permuted relative to those in the other. Consequently, where the mutation is non-synonymous (positions 129, 217, 247, and 508), the encoded residue in both paralogs is limited to the same two amino acids across all species (Figure 7).
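One way to screen an alignment for the pattern described above (positions variable in both paralogs, with the same two alleles exchanged between them) is sketched below. It assumes gap-free, equal-length alignments of each paralog across the same ordered set of strains. Treating "permuted" as "both paralogs draw on the same two-base allele set" is our own operationalization, not necessarily the authors' exact criterion, and the six-nucleotide sequences are hypothetical toys.

```python
def column_alleles(alignment: list[str]) -> list[set[str]]:
    """Set of bases observed at each alignment column across strains."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    return [{seq[i] for seq in alignment} for i in range(length)]


def shared_two_state_sites(pgrs17: list[str], pgrs18: list[str]) -> dict[int, set[str]]:
    """1-based columns that are variable in both paralogs and share the same two alleles."""
    hits = {}
    columns = zip(column_alleles(pgrs17), column_alleles(pgrs18))
    for pos, (alleles17, alleles18) in enumerate(columns, start=1):
        if len(alleles17) == 2 and alleles17 == alleles18:
            hits[pos] = alleles17
    return hits


# Hypothetical toy alignments: one row per strain, same strain order in both lists.
pgrs17 = ["ACGTAC", "ATGTAC", "ACGTTC"]
pgrs18 = ["ATGTAC", "ACGTAC", "ACGTAA"]
print(shared_two_state_sites(pgrs17, pgrs18))
```

Here only column 2 qualifies: strain 1 carries C in PE_PGRS17 and T in PE_PGRS18, and strain 2 carries the reverse. Column 5 is variable only in PE_PGRS17 and column 6 only in PE_PGRS18, so neither is reported.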
Because PE_PGRS genes are GC-rich, we examined the occurrence of mutations with respect to codon position. ANOVA and Tukey's tests (see Additional file 6) showed that, for both genes, the third codon position had the highest GC content (P < 0.001 for both genes). However, although mutations occurred more frequently at the third codon position in PE_PGRS18 (P < 0.001 for both ANOVA and Tukey's test), no such association was observed for PE_PGRS17. Thus, a mutational bias might have operated during the diversification of PE_PGRS18.
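The codon-position bookkeeping behind this analysis is straightforward to reproduce. The sketch below, which assumes that nucleotide coordinates are counted from the first base of the start codon, computes GC content at each codon position of an in-frame sequence and tallies the codon positions of the twelve sites listed above as variable in both paralogs. It only tabulates; it does not reproduce the ANOVA and Tukey's tests, and the nine-nucleotide sequence is a hypothetical toy.

```python
from collections import Counter


def codon_position(nt_pos: int) -> int:
    """Codon position (1, 2 or 3) of a 1-based coordinate counted from the start codon."""
    return (nt_pos - 1) % 3 + 1


def gc_by_codon_position(seq: str) -> dict[int, float]:
    """Fraction of G+C at codon positions 1, 2 and 3 of an in-frame coding sequence."""
    counts = {1: [0, 0], 2: [0, 0], 3: [0, 0]}  # codon position -> [GC count, total bases]
    for i, base in enumerate(seq.upper(), start=1):
        pos = codon_position(i)
        counts[pos][1] += 1
        if base in "GC":
            counts[pos][0] += 1
    return {pos: gc / total for pos, (gc, total) in counts.items() if total}


# Sites reported in the text as variable in both PE_PGRS17 and PE_PGRS18.
shared_sites = [54, 119, 129, 153, 213, 217, 247, 450, 462, 507, 508, 510]
print(Counter(codon_position(p) for p in shared_sites))
print(gc_by_codon_position("GCCATGGCG"))  # hypothetical toy sequence
```

Under the stated coordinate assumption, eight of the twelve shared variable sites fall at third codon positions, three at first positions (217, 247 and 508) and one (119) at a second position.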
Previous studies involving comparative genomics and explorative genome-wide multilocus analysis conclusively showed that the present MTBC strains form a genetically homogeneous clonal pool, since they display highly significant linkage disequilibrium and an exceptionally low rate of silent nucleotide substitutions [4, 7, 33, 35–37]. This picture contrasts with the situation that seems to have prevailed in the very early history of the tubercle bacillus, when DNA exchange occurred at a significant rate, most likely through intragenomic recombination and horizontal gene transfer. Compelling evidence suggests that members of the MTBC arose from a single successful ancestor that emerged from a recent evolutionary bottleneck [4, 15, 19]. The identity of this parental strain has not been defined, although some genetic markers (polymorphisms at codon 463 of katG and codon 95 of gyrA, an SNP in the promoter region of the narGHJI gene complex, and the TbD1 deletion) help to distinguish between ancestral and modern MTBC strains [4, 15, 16, 38].
The findings from this study provide additional evidence for the concept that the present clonal MTBC strains are the progeny of a single successful ancestor. Indeed, the identified PE_PGRS-associated 12/40 polymorphism could represent a genetic marker for the most successful post-bottleneck clone from which the MTBC strains expanded. Based on this polymorphism, we showed that all MTBC strains could be assigned to three new PE_PGRS-based genotypic groups (PGRST1 to 3). Strikingly, PGRST1 was predominant in all three katG-gyrA-defined PGGs, irrespective of geographic origin and evolutionary status (ancestral or modern). Because all ancestral (TbD1+) strains, including the M. africanum-M. microti-M. bovis lineage, belong to PGRST1, one can argue that acquisition of the 12/40 polymorphism coincided with the emergence of the most successful MTBC parental strain. Consistent with this, the 12/40 polymorphism was absent from both PE_PGRS genes in the 6 "M. canettii" strains and the one other smooth tubercle bacillus analysed, which are believed to represent the very early pre-bottleneck MTBC progenitors.
The emergence of these two newly defined modern subpopulations could be explained by homologous recombination between the two PE_PGRS gene sequences of strains from the PGRST1 population. The sequence context (close proximity and high sequence identity) appears optimal for such a mechanism, although it only seems to operate in modern M. tuberculosis strains. In vitro experiments have already shown that homologous recombination can be initiated in mycobacteria provided that sequence heterology does not exceed 10–12%. PE_PGRS17 and PE_PGRS18 largely fulfil this requirement, but it may restrict homologous recombination between other PE_PGRS paralogs whose sequence divergence approaches this limit (up to 12%).
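As a rough illustration of the heterology criterion, the sketch below computes the uncorrected proportion of differing sites (p-distance) between two aligned sequences, skipping columns that contain a gap, and compares it with a threshold. Both the p-distance definition and the 10% default (the lower end of the 10–12% range quoted above) are our assumptions; the cited in vitro work may have measured heterology differently. The 20-nt sequences are hypothetical toys.

```python
def heterology(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gapped columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in compared) / len(compared)


def recombination_permissive(a: str, b: str, max_heterology: float = 0.10) -> bool:
    """True if heterology is within the (assumed) threshold for initiating recombination."""
    return heterology(a, b) <= max_heterology


# Hypothetical toy sequences.
reference = "ACGTACGTACGTACGTACGT"
close = "ACGTACGTACGTACGTACGA"    # 1 difference in 20 columns
distant = "TCGTACGAACGTACGTACGA"  # 3 differences in 20 columns

for name, seq in [("close", close), ("distant", distant)]:
    print(name, heterology(reference, seq), recombination_permissive(reference, seq))
```

The first pair differs at 5% of sites and passes the 10% cut-off; the second differs at 15% and does not.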
Transfer of the 12/40 polymorphism from PE_PGRS17 to its neighbouring paralog, or its reversion (loss from the PE_PGRS17 sequence) in modern M. tuberculosis, is typical of a homologous recombination process called "gene conversion". Such gene replacement events are frequently observed among members of multigene families in bacterial genomes and contribute both to the maintenance of genetic information and to the creation of genetic diversity. It is very unlikely that horizontal gene transfer (HGT) generated the two modern PGRST populations (PGRST2 and PGRST3). Indeed, random acquisition of the 12/40 polymorphism through HGT would have generated an additional population harboring the polymorphism only in its PE_PGRS18 sequence (a putative "-/+" population).
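The argument against HGT can be phrased as a small state-space exercise. Abstracting each strain to a pair of states (12/40 polymorphism present or absent in PE_PGRS17, and in PE_PGRS18) and modelling gene conversion as a non-reciprocal copy of one paralog's state onto the other, in either direction, the sketch below enumerates every genotype reachable from PGRST1 by breadth-first search. This is a deliberate simplification of our own: tract boundaries, rates and the 40 individual SNPs are ignored.

```python
from collections import deque

# Genotype = (polymorphism present in PE_PGRS17, polymorphism present in PE_PGRS18).
PGRST = {(True, False): "PGRST1", (True, True): "PGRST2", (False, False): "PGRST3"}


def convert(genotype: tuple[bool, bool], donor: str) -> tuple[bool, bool]:
    """Gene conversion: the recipient copy takes the donor's state; the donor is unchanged."""
    in17, in18 = genotype
    if donor == "PE_PGRS17":
        return (in17, in17)
    if donor == "PE_PGRS18":
        return (in18, in18)
    raise ValueError(f"unknown donor {donor!r}")


def reachable_by_conversion(start: tuple[bool, bool]) -> set[tuple[bool, bool]]:
    """All genotypes reachable from `start` through any series of conversion events."""
    seen, queue = {start}, deque([start])
    while queue:
        current = queue.popleft()
        for donor in ("PE_PGRS17", "PE_PGRS18"):
            nxt = convert(current, donor)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


states = reachable_by_conversion((True, False))  # start from the ancestral PGRST1 state
print(sorted(PGRST.get(s, "-/+") for s in states))
```

Starting from PGRST1 (+/-), conversion yields only PGRST2 (+/+) and PGRST3 (-/-); the "-/+" state is never reached by conversion alone, which matches the populations actually observed.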
The recent generation of the PGRST2 and PGRST3 subpopulations from the predominant PGRST1 is compatible with either the double-strand break repair (DSBR) or the synthesis-dependent strand annealing (SDSA) model [41, 42] as the molecular basis for this gene conversion event. It seems that the pre-synaptic double-strand breaks that initiate homologous recombination in the PGRST1 population occurred more frequently in PE_PGRS18 than in PE_PGRS17, so that PGRST2 emerged more often than PGRST3. In that case, DNA polymerase uses the 12/40 polymorphism-containing PE_PGRS17 sequence as a homologous template to fill in the broken PE_PGRS18 sequence, resulting in the PGRST2 genotype.
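At the sequence level, the outcome of the repair step described above is a non-reciprocal transfer: the broken recipient is rebuilt from the donor, while the donor is left unchanged. The sketch below models only that end product on a gapped pairwise alignment; it does not simulate strand invasion or the DSBR and SDSA intermediates. The sequences, which carry a toy 12-nt insertion and a single SNP standing in for the 40 SNPs, and the function name `convert_tract` are hypothetical.

```python
def convert_tract(recipient: str, donor: str, start: int, end: int) -> str:
    """Copy alignment columns [start, end) from donor into recipient; return the ungapped result.

    Represents a break in the recipient that was repaired using the donor as template.
    The donor string itself is not modified.
    """
    if len(recipient) != len(donor):
        raise ValueError("recipient and donor must be aligned to equal length")
    if not 0 <= start < end <= len(recipient):
        raise ValueError("invalid tract boundaries")
    converted = recipient[:start] + donor[start:end] + recipient[end:]
    return converted.replace("-", "")


# Hypothetical toy alignment: '-' marks the 12 columns absent from the recipient.
pgrs18 = "ATGGCT" + "-" * 12 + "GACTAA"
pgrs17 = "ATGGCC" + "GGCGGCGGCGGC" + "GACTAA"  # toy 12-nt insertion plus one SNP (T -> C)

print(convert_tract(pgrs18, pgrs17, 3, 18))
```

Because this tract (0-based alignment columns 3 to 17) spans both the insertion and the SNP, the two are transferred together and the converted PE_PGRS18 copy becomes identical to the donor. A shorter tract would transfer only part of the motif.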
RecA-mediated gene conversion has been shown to occur in vitro between two rRNA operon copies in M. smegmatis, revealing the molecular mechanism underlying resistance to aminoglycosides. As far as could be ascertained, this study provides the first concrete example, and the most direct evidence, of naturally occurring gene conversion in mycobacteria. However, Gutacker et al. previously suspected recombination when analysing the distribution pattern of 5 polymorphic nucleotides within the Rv0980c-Rv0981 intergenic region (Rv0980c-Rv0981 iSNPs). This raises the question of whether the complicated pattern of the Rv0980c-Rv0981 iSNPs is linked to the 12/40 polymorphism-associated gene conversion event. If gene conversion extended to the homologous intergenic sequences of PE_PGRS17 and PE_PGRS18, the iSNP distribution profiles of the two genes would have to be identical as a result of gene replacement. The data show that for both H37Rv and CDC1551, whose PE_PGRS17 and PE_PGRS18 genes have undergone gene conversion, the iSNP distribution patterns of the two genes are quite different. Thus, unless the 5 polymorphic positions are exceptionally unstable, the complicated pattern of this intergenic polymorphism does not appear to be associated with the gene conversion event described here.
It is widely accepted that gene duplication and subsequent functional divergence are crucial for bacterial evolution, as they play a major role in gene innovation and adaptation to changing environments. In this context, it is worth noting that although PE_PGRS genes are restricted to mycobacterial species, they have preferentially expanded within the genomes of pathogenic mycobacteria, most likely through extensive gene duplication coupled with genetic divergence during adaptation to the very hostile intramacrophage environment. We hypothesize that gene conversion may have contributed to the evolution of the PE_PGRS subfamily and to the generation of antigenic variation among its members [22, 2, 28]. It is striking that this type of recombination does not seem to occur in MTBC members other than modern M. tuberculosis, which raises the question of whether it is specific to, or occurs at a greater frequency in, modern M. tuberculosis. Recently, Liu et al., extending Gutacker's analysis, identified a mosaic polymorphic pattern (the IRMT0105 locus) associated with a PPE gene (MT0105). The authors hypothesize that small-scale gene conversion or recombination at hotspots near PE or PPE gene families has been an important mechanism allowing M. tuberculosis to escape immune surveillance.
As far as could be ascertained, the function(s) of PE_PGRS17 and PE_PGRS18 are unknown, and there is as yet no indication of whether they are essential. The present study suggests that both PE_PGRS genes may be dispensable for normal in vivo growth under certain conditions, as they are absent from the genome of M. leprae, and PE_PGRS18 was frame-shifted in the two "M. canettii" and two other smooth tubercle bacilli strains analysed. By contrast, no such frameshift mutations were observed in the worldwide collection of sequenced MTBC strains, suggesting that these genes may have evolved to assume an essential role in these particularly widespread species. Consistent with this, PE_PGRS17 and PE_PGRS18 belong to the so-called iVEGI (in vivo-expressed genomic island), a cluster of 49 in vivo-expressed genes thought to encode cell wall components and to participate in the lipid metabolism required for mycobacterial survival in vivo. Within this island, PE_PGRS17 and PE_PGRS18 are among the 21 genes that display higher expression levels in samples from mice than in in vitro cultures. The iVEGI locus harbors at least three genes (Rv0981, Rv0986, and Rv0987) whose products were shown to be required in early interactions with the host cell as well as in persistence [46–48]. Furthermore, genes playing critical roles in bacterial survival and fitness generally display higher rates of synonymous (Ks) than non-synonymous (Ka) substitution. We found that both PE_PGRS17 and PE_PGRS18 are under purifying selection, implying that most deleterious amino acid changes have been eliminated during evolution. Consistently, irrespective of the species, only particular non-synonymous changes are tolerated at certain nucleotide positions of both genes. These findings, and the fact that these genes appear to be preferentially expressed in vivo, argue for a potential role in host-pathogen interactions.
Finally, the question arises whether the occurrence of the 12/40 polymorphism could have enabled PE_PGRS17 to acquire a new or altered function that positively influenced the evolution of the MTBC. If so, it is also interesting to speculate whether the recent change from the PGRST1 to the PGRST2 genotype in M. tuberculosis further increased its fitness and/or adaptability. Indeed, as with PGG2 isolates, PGRST2 appears to be frequently, although not exclusively, associated with outbreak strains and clustered cases. By contrast, as mentioned earlier, the PGRST3 genotype is very rare among TB cases, and two of the four strains found to carry it are laboratory strains (H37Rv and H37Ra), suggesting that this genotype is less prone to expansion and/or arose through a more recent reversion event. Although it is hard to believe that a single polymorphism in one or two genes dramatically affected the evolution of the tubercle bacilli, experiments based on gene replacement and/or inactivation of the different forms of both PE_PGRS genes are needed to clarify this issue. Notably, the 12/40 polymorphism lies within a region of the protein that, according to the domain organization proposed by Brennan et al., may represent a transmembrane helix. This location may be critical for protein function, inasmuch as PE/PPE protein complexes are strongly suspected to be involved in signal transduction.
Deciphering the evolution of bacterial populations is crucial to better understand the genetic traits behind the emergence of biomedically relevant strains. In the present study, we identified a novel, PE_PGRS-based genetic polymorphism that expands our knowledge of the history of the tubercle bacilli. This polymorphism provides a valuable marker of the ill-defined successful ancestor that emerged from the evolutionary bottleneck and from which the MTBC expanded. The findings also demonstrate the involvement of natural gene conversion events specifically in the diversification of the modern M. tuberculosis population. To our knowledge, this paper provides the first concrete example of the natural occurrence of such a molecular event in mycobacteria.
The complete gene and protein sequences of all members of the PE gene family in the genome of M. tuberculosis H37Rv were obtained from the GenoList (Pasteur Institute) website. The sequences of all PE members that are contiguous in the genome were extracted from these datasets for further characterization. The contiguous PE sequences were aligned using CLUSTALW. Amino acid similarity and identity rates were calculated using BioEdit with its integrated Blosum62 matrix. Members that showed high percentages of identity were further aligned to their corresponding orthologs from the genome sequences of M. tuberculosis CDC1551 and M. bovis AF2122/97. The complete genome sequences of M. marinum strain ATCC BAA-535 and M. microti strain OV254 were obtained from the Sanger Institute website.
A total of 521 mycobacterial isolates recovered from diverse geographic origins (Africa, Asia, Australia, Europe, and North and South America) were used in this study. The collection was chosen to be representative of the known diversity of the MTBC and of the pre-bottleneck lineages. It included 415 M. tuberculosis strains (representing all three Principal Genetic Groups (PGG) defined by Sreevatsan et al., i.e. 108 PGG1, 259 PGG2 and 48 PGG3 strains), 42 M. bovis strains (including 5 different BCG strains), 30 M. africanum strains representing all three subtypes defined by Viana-Niero et al. (i.e. 14 subtype A1, 6 subtype A2 and 8 subtype A3 strains), 17 M. microti strains (9 from voles, 3 from llamas, 2 from cats, 1 from a pig and 2 from humans), 3 dassie bacillus strains, 4 M. pinnipedii strains, 2 M. caprae strains, 6 "M. canettii" strains and 2 smooth tubercle bacilli isolates (representing five of the eight Smooth Tubercle Bacilli groups (ST groups) identified by Gutierrez et al., i.e. 1 ST group A, 3 ST group C, 2 ST group D, and 1 each of ST groups B and I). The M. tuberculosis isolates, which were recovered from at least 32 different countries, comprised 57 ancestral (TbD1+), 31 Beijing, 91 Haarlem, 73 LAM, 85 T, 16 CAS, 17 X, 5 S, 10 U and 1 MANU family strains. Details of the geographic origin, host, spoligotype pattern, and PGG of each isolate in the collection are available in Additional file 4.
PCR and DNA sequencing
PCR amplification of PE_PGRS17 (Rv0978c) and PE_PGRS18 (Rv0980c) gene sequences was accomplished using a common sense primer, 7880S (5'-ATGTCGTTTGTCAACGTGGC-3'; positions 1–20), and the specific reverse oligonucleotides 0978R1 (5'-TCAGCTGATTACCGACACCGT-3', 976–996) and 0980R1 (5'-TCATATGGCCGCCGAACACAC-3', 1354–1374), respectively. The amplification reaction mixture contained 2 μl of template genomic DNA (about 20 ng), 10 μl of 10× buffer (Qiagen), 10 μl DMSO, 2 μl of 10 mM nucleotide mix (Amersham Biosciences), 2 μl of each primer (20 μM stock), 0.25 μl (1.25 U) of HotStart Taq DNA polymerase (Qiagen) and sterile nuclease-free water (Amersham Biosciences) to a total reaction volume of 100 μl. Cycling was carried out in a PTC 9700 thermocycler (Applied Biosystems) with an initial denaturation step of 10 min at 96°C followed by 35 cycles of 1 min at 95°C, 1 min at 60°C and 2 min at 72°C. The amplification ended with a final elongation step of 7 min at 72°C. PCR products were purified using the GFX PCR DNA and Gel Band purification kit (Amersham Biosciences) according to the manufacturer's protocol. Partial DNA sequencing (nucleotides 31 to 712) was performed using the sense primers 7880S (see above) and PEGA.S (5'-CAAGCGATCAGCGCGCAGG-3', 184–202) for both genes. Sequencing of the reverse strand used the internal primers 0978R2 (5'-CGCTTGGACCGTTGCCGATGG-3', 770–790) and 0980R2 (5'-GAGGCTGACCGCGCCGCCGGT-3', 730–750) for PE_PGRS17 and PE_PGRS18, respectively. The nucleotide sequence was determined with the Prism Ready Reaction Dye Deoxy Terminator Cycle sequencing Kit on an ABI PRISM 377 DNA sequencer (Applied Biosystems). Each sample was sequenced from two independent PCR amplification reactions.
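A few sequence-arithmetic checks on the primers listed above can be scripted in plain Python: the reverse complement of each reverse primer, primer GC content, and the product length implied by the stated coordinates (taken as 1-based and inclusive, as written). The helper names are illustrative, not part of any published pipeline.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def amplicon_length(fwd_start: int, rev_end: int) -> int:
    """Length of a product spanning fwd_start..rev_end (1-based, inclusive)."""
    return rev_end - fwd_start + 1


PRIMERS = {
    "7880S": "ATGTCGTTTGTCAACGTGGC",
    "0978R1": "TCAGCTGATTACCGACACCGT",
    "0980R1": "TCATATGGCCGCCGAACACAC",
}

for name, seq in PRIMERS.items():
    print(name, f"GC = {gc_fraction(seq):.0%}", "reverse complement =", reverse_complement(seq))
print("PE_PGRS17 product:", amplicon_length(1, 996), "bp")
print("PE_PGRS18 product:", amplicon_length(1, 1374), "bp")
```

7880S begins with ATG, the reverse complements of both reverse primers end in TGA, and both product lengths (996 and 1374 bp) are multiples of 3, which is consistent with each amplicon spanning the complete coding sequence from start to stop codon.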
The sequence data were edited and aligned using the software programs BioEdit and ClustalW. The PE_PGRS17 and PE_PGRS18 genes were analysed either individually or after concatenation. The software programs Arlequin v.2.0 and DNASP were used to obtain summary statistics of genetic diversity. To test for adaptive selection, we determined the nucleotide substitution changes, the numbers of synonymous (Ks) and nonsynonymous (Ka) substitutions per site, and their ratio, using the Nei-Gojobori method as implemented in the DNASP package. ANOVA followed by Tukey's test was used to test for significant differences in GC content and substitution rates between PE_PGRS17 and PE_PGRS18.
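To show what the Nei-Gojobori method counts, the sketch below is a simplified, independent re-implementation of Nei-Gojobori site and difference counting for two aligned, gap-free coding sequences under the standard genetic code, followed by a Jukes-Cantor correction. It is not DNASP's code: skipping codons that are stop codons in either sequence, and discarding mutational pathways that pass through an intermediate stop codon, are our assumptions and may differ in detail from DNASP. The sequences in the usage line are hypothetical toys.

```python
import math
from itertools import permutations

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES) for j, b in enumerate(BASES) for k, c in enumerate(BASES)}


def synonymous_sites(codon: str) -> float:
    """Sum over the 3 positions of the fraction of single-base changes that are silent."""
    silent = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos] and CODE[codon[:pos] + base + codon[pos + 1:]] == CODE[codon]:
                silent += 1
    return silent / 3


def codon_differences(c1: str, c2: str) -> tuple[float, float]:
    """Synonymous and non-synonymous differences, averaged over stop-free pathways."""
    diff_positions = [p for p in range(3) if c1[p] != c2[p]]
    if not diff_positions:
        return 0.0, 0.0
    total_s = total_n = 0.0
    valid_paths = 0
    for order in permutations(diff_positions):
        current, s, n, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if nxt != c2 and CODE[nxt] == "*":
                valid = False  # pathway passes through an intermediate stop codon
                break
            if CODE[nxt] == CODE[current]:
                s += 1
            else:
                n += 1
            current = nxt
        if valid:
            total_s, total_n, valid_paths = total_s + s, total_n + n, valid_paths + 1
    if valid_paths == 0:
        raise ValueError(f"no stop-free pathway between {c1} and {c2}")
    return total_s / valid_paths, total_n / valid_paths


def nei_gojobori(seq1: str, seq2: str) -> tuple[float, float]:
    """Return (pS, pN): synonymous and non-synonymous differences per site."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned, gap-free and a multiple of 3 long")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if CODE[c1] == "*" or CODE[c2] == "*":
            continue  # assumption: stop codons are excluded
        sites = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S += sites
        N += 3 - sites
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    if S == 0 or N == 0:
        raise ValueError("no synonymous or no non-synonymous sites to compare")
    return Sd / S, Nd / N


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction of a proportion of differences."""
    if p >= 0.75:
        raise ValueError("proportion too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


pS, pN = nei_gojobori("GCTGCTGCTGCT", "GCCGCTGCTGCT")  # toy pair: one silent change
print(f"pS = {pS}, pN = {pN}, Ks = {jukes_cantor(pS):.4f}")
```

Ka and Ks are then `jukes_cantor(pN)` and `jukes_cantor(pS)`; a Ka/Ks ratio well below 1, as reported for PE_PGRS17 and PE_PGRS18, is the usual signature of purifying selection.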
Set up of a reverse hybridization dot blot assay for rapid PE_PGRS-based grouping (PEGAssay) of MTBC strains
Biotinylated PCR products encompassing the 12/40 polymorphism were obtained using the common sense primer PEGA.S and the biotinylated gene-specific reverse primers PEGA78.R (5'-bGACACCGTGCCGCTGCCGAAA-3', 705–725) and PEGA80.R (5'-bCCGTTGCCGAACAGCCATCC-3', 568–587) for PE_PGRS17 and PE_PGRS18, respectively. The amplification conditions were the same as described above. Ten microliters of the biotinylated, heat-denatured PCR product (diluted to a total volume of 150 μl with 2 × SSPE-0.1% SDS) were then hybridized with a 24-mer 5' amino-linked oligonucleotide probe (5'-GATCGAGCAGGCCCTGTTGGGGGT-3'). The probe covers the 12-nucleotide insertion of the 12/40 polymorphism and the 12 nucleotides immediately downstream, and was synthesized according to the PE_PGRS17 sequence of strain CDC1551. The probe was diluted in 150 μl of 0.5 M NaHCO3 (final concentration of 3 ng/μl) and covalently bound to a Biodyne C membrane (Pall Biosupport, Portsmouth, United Kingdom) using standard protocols. Briefly, after activation of the membrane, 150 μl of the diluted probe was applied to the wells of a 96-well dot blot apparatus (BioRad) and incubated for 5 min at room temperature. Following inactivation and washing steps, biotinylated PCR products were added to the wells and the entire apparatus was incubated under stringent hybridization conditions (65°C for 1 h). The samples were removed by vacuum aspiration and the membrane was washed three times with 50 ml of 2 × SSPE-0.5% SDS for 10 min at 57°C. After incubation for 45 min at 42°C with 10 U of streptavidin-POD (Amersham Biosciences) diluted in 20 ml of 2 × SSPE-0.5% SDS, the membrane was washed twice with 50 ml of 2 × SSPE-0.5% SDS for 10 min at 42°C and then once with 25 ml of 2 × SSPE for 10 min at room temperature. ECL chemiluminescence detection reagents (Amersham Biosciences) were added according to the manufacturer's instructions, and the membrane was exposed to Hyperfilm ECL (Amersham Biosciences) for 5 min.
To allow for repeated use (up to 10 times), the membrane was stripped during a 1 h incubation in 1% SDS at 90°C, after which it was incubated in 20 mM EDTA for 20 min at room temperature and stored at 4°C.
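Reading the dot blot reduces to two yes/no signals per strain, one for the PE_PGRS17 product and one for the PE_PGRS18 product. The sketch below is an in-silico stand-in for that readout: exact (or near-exact) string matching of the probe against an amplicon sequence replaces hybridization, which ignores real stringency effects, and the signal patterns are mapped to PE_PGRS types following the PGRST definitions (polymorphism in PE_PGRS17 only: PGRST1; in both genes: PGRST2; in neither: PGRST3). The padded toy amplicons and all function names are hypothetical.

```python
PROBE = "GATCGAGCAGGCCCTGTTGGGGGT"  # 24-mer probe sequence given in the text


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def probe_signal(amplicon: str, probe: str = PROBE, max_mismatches: int = 0) -> bool:
    """True if the probe, or its reverse complement, matches some window of the amplicon."""
    amplicon, k = amplicon.upper(), len(probe)
    for candidate in (probe, reverse_complement(probe)):
        for i in range(len(amplicon) - k + 1):
            if mismatches(amplicon[i:i + k], candidate) <= max_mismatches:
                return True
    return False


# (signal with PE_PGRS17 product, signal with PE_PGRS18 product) -> PE_PGRS type
PGRST = {(True, False): "PGRST1", (True, True): "PGRST2", (False, False): "PGRST3"}


def classify(amplicon17: str, amplicon18: str) -> str:
    pattern = (probe_signal(amplicon17), probe_signal(amplicon18))
    return PGRST.get(pattern, "unexpected -/+ pattern")


# Hypothetical toy amplicons; the second one lacks the 12-nt insertion.
with_insertion = "ACGT" * 3 + PROBE + "ACGT" * 3
without_insertion = "ACGT" * 3 + PROBE[12:] + "ACGT" * 3
print(classify(with_insertion, without_insertion))
```

Searching both orientations mirrors the fact that heat-denatured products expose both strands; raising `max_mismatches` loosely mimics lower stringency.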
Associations were evaluated for statistical significance using the χ2 test or Fisher's exact test as implemented in GraphPad Prism v.4 (GraphPad Software, Inc., USA). A P value < 0.05 was considered significant.
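Both tests are available in standard statistics software (GraphPad Prism, as used here, or `scipy.stats.fisher_exact` and `scipy.stats.chi2_contingency` in Python). For a 2 × 2 table they are short enough to write out directly, which makes explicit what each computes; the plain-Python sketch below does so. The function names are illustrative, and the table in the usage lines is a hypothetical toy, not data from this study.

```python
import math


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = math.comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return math.comb(row1, x) * math.comb(row2, col1 - x) / total

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are not lost to rounding.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-7))


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square without continuity correction, and its P value (1 degree of freedom)."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # With 1 degree of freedom, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


print(fisher_exact_2x2(3, 0, 0, 3))  # hypothetical toy table
print(chi_square_2x2(3, 0, 0, 3))
```

On this toy table, with an expected count of 1.5 in every cell, the chi-square P value (about 0.014) is far smaller than the exact P value (0.1). Sparse tables like this are where the exact test is preferred; a common rule of thumb is to switch to Fisher's test when any expected count falls below 5. Note also that scipy's `chi2_contingency` applies Yates' continuity correction to 2 × 2 tables by default, whereas this sketch does not.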
PE_PGRS: Proline-glutamic acid_polymorphic GC-rich repetitive sequence
nsSNP: non-synonymous single nucleotide polymorphism
sSNP: synonymous single nucleotide polymorphism
PGG: Principal Genetic Group
We thank Valerie Mizrahi for her valuable comments on the manuscript, and Maherzia Ben Fadhel for technical assistance with sequencing. We are grateful to Thierry Zozio (Unité de la Tuberculose et des Mycobactéries, Institut Pasteur de Guadeloupe, Guadeloupe) for helping with DNA preparation of LAM and CAS strain families of M. tuberculosis.
This work was supported by funds from the United Nations Development Program/World Bank/World Health Organization Special Program for Research and Training in Tropical Diseases (TDR).
- World Health Organization: Global Tuberculosis Control: Surveillance, Planning, Financing. 2005, World Health Organization, Geneva.
- Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D, Gordon SV, Eiglmeier K, Gas S, Barry CE: Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature. 1998, 393: 537-544. 10.1038/31159.
- Garnier T, Eiglmeier K, Camus JC, Medina N, Mansoor H, Pryor M, Duthoy S, Grondin S, Lacroix C, Monsempe C: The complete genome sequence of Mycobacterium bovis. Proc Natl Acad Sci USA. 2003, 100: 7877-7882. 10.1073/pnas.1130426100.
- Sreevatsan S, Pan X, Stockbauer KE, Connell ND, Kreiswirth BN, Whittam TS, Musser JM: Restricted structural gene polymorphism in the Mycobacterium tuberculosis complex indicates evolutionarily recent global dissemination. Proc Natl Acad Sci USA. 1997, 94: 9869-9874. 10.1073/pnas.94.18.9869.
- Musser JM, Amin A, Ramaswamy S: Negligible genetic diversity of Mycobacterium tuberculosis host immune system protein targets: evidence of limited selective pressure. Genetics. 2000, 155: 7-16.
- Gutacker MM, Smoot JC, Migliaccio CA, Ricklefs SM, Hua S, Cousins DV, Graviss EA, Shashkina E, Kreiswirth BN, Musser JM: Genome-wide analysis of synonymous single nucleotide polymorphisms in Mycobacterium tuberculosis complex organisms: resolution of genetic relationships among closely related microbial strains. Genetics. 2002, 162: 1533-1543.
- Fleischmann RD, Alland D, Eisen JA, Carpenter L, White O, Peterson J, DeBoy R, Dodson R, Gwinn M, Haft D: Whole-genome comparison of Mycobacterium tuberculosis clinical and laboratory strains. J Bacteriol. 2002, 184: 5479-5490. 10.1128/JB.184.19.5479-5490.2002.
- Alland D, Whittam TS, Murray MB, Cave MD, Hazbon MH, Dix K, Kokoris M, Duesterhoeft A, Eisen JA, Fraser CM, Fleischmann RD: Modeling bacterial evolution with comparative-genome-based marker systems: application to Mycobacterium tuberculosis evolution and pathogenesis. J Bacteriol. 2003, 185: 3392-3399. 10.1128/JB.185.11.3392-3399.2003.
- Baker L, Brown T, Maiden MC, Drobniewski F: Silent nucleotide polymorphisms and a phylogeny for Mycobacterium tuberculosis. Emerg Infect Dis. 2004, 10: 1568-1577.
- Garcia-Pelayo MC, Caimi KC, Inwald JK, Hinds J, Bigi F, Romano MI, van Soolingen D, Hewinson RG, Cataldi A, Gordon SV: Microarray analysis of Mycobacterium microti reveals deletion of genes encoding PE-PPE proteins and ESAT-6 family antigens. Tuberculosis (Edinb). 2004, 84: 159-166. 10.1016/j.tube.2003.12.002.
- Behr MA, Wilson MA, Gill WP, Salamon H, Schoolnik GK, Rane S, Small PM: Comparative genomics of BCG vaccines by whole-genome DNA microarray. Science. 1999, 284: 1520-1523. 10.1126/science.284.5419.1520.
- Kato-Maeda M, Rhee JT, Gingeras TR, Salamon H, Drenkow J, Smittipat N, Small PM: Comparing genomes within the species Mycobacterium tuberculosis. Genome Res. 2001, 11: 547-554. 10.1101/gr.166401.
- Gagneux S, DeRiemer K, Van T, Kato-Maeda M, de Jong BC, Narayanan S, Nicol M, Niemann S, Kremer K, Gutierrez C: Variable host-pathogen compatibility in Mycobacterium tuberculosis. Proc Natl Acad Sci USA. 2006, 103: 2869-2873. 10.1073/pnas.0511240103.
- Huard RC, Fabre M, de Haas P, Lazzarini LCO, van Soolingen D, Cousins D, Ho JL: Novel genetic polymorphisms that further delineate the phylogeny of the Mycobacterium tuberculosis complex. J Bacteriol. 2006, 188: 4271-4287. 10.1128/JB.01783-05.
- Brosch R, Gordon SV, Marmiesse M, Brodin P, Buchrieser C, Eiglmeier K, Garnier T, Gutierrez C, Hewinson G, Kremer K, Parsons LM, Pym AS, Samper S, van Soolingen D, Cole ST: A new evolutionary scenario for the Mycobacterium tuberculosis complex. Proc Natl Acad Sci USA. 2002, 99: 3684-3689. 10.1073/pnas.052548299.
- Marmiesse M, Brodin P, Buchrieser C, Gutierrez C, Simoes N, Vincent V, Glaser P, Cole ST, Brosch R: Macro-array and bioinformatic analyses reveal mycobacterial 'core' genes, variation in the ESAT-6 gene family and new phylogenetic markers for the Mycobacterium tuberculosis complex. Microbiology. 2004, 150: 483-496. 10.1099/mic.0.26662-0.
- Van Soolingen D, Hoogenboezem T, De Haas PEW, Hermans PWM, Koedam MA, Teppema KS, Brennan PJ, Besra GS, Portaels F, Top J, Schouls LM, Van Embden JDA: A novel pathogenic taxon of the Mycobacterium tuberculosis complex, canettii: characterization of an exceptional isolate from Africa. Int J Syst Bacteriol. 1997, 47: 1236-1245.
- Fabre M, Koeck JL, Le Flèche P, Simon F, Hervé V, Vergnaud G, Pourcel C: High Genetic Diversity Revealed by Variable-Number Tandem Repeat Genotyping and Analysis of hsp65 Gene Polymorphism in a Large Collection of " Mycobacterium canettii " Strains Indicates that the M. tuberculosis Complex Is a Recently Emerged Clone of " M. canettii ". J Clin Microbiol. 2004, 42: 3248-3255. 10.1128/JCM.42.7.3248-3255.2004.PubMed CentralView ArticlePubMedGoogle Scholar
- Gutierrez MC, Brisse S, Brosch R, Fabre M, Omais B, Marmiesse M, Supply P, Vincent V: Ancient origin and gene mosaicism of the progenitor of Mycobacterium tuberculosis. PLoS Pathogens. 2005, 1: 1-7. 10.1371/journal.ppat.0010005.View ArticleGoogle Scholar
- Goh KS, Fabre M, Huard RC, Schmid S, Sola C, Rastogi N: Study of the gyrB gene polymorphism as a tool to differentiate among Mycobacterium tuberculosis complex subspecies further underlines the older evolutionary age of "Mycobacterium canettii ". Mol Cell Probes. 2006, 20: 182-190. 10.1016/j.mcp.2005.11.008.View ArticlePubMedGoogle Scholar
- Lamichhane G, Zignol M, Blades NJ, Geiman DE, Dougherty A, Grosset J, Broman KW, Bishai WR: A postgenomic method for predicting essential genes at subsaturation levels of mutagenesis: application to Mycobacterium tuberculosis. Proc Natl Acad Sci USA. 2003, 100: 7213-7218. 10.1073/pnas.1231432100.
- Banu S, Honore N, Saint-Joanis B, Philpott D, Prevost MC, Cole ST: Are the PE-PGRS proteins of Mycobacterium tuberculosis variable surface antigens? Mol Microbiol. 2002, 44: 9-19. 10.1046/j.1365-2958.2002.02813.x.
- Delogu G, Pusceddu C, Bua A, Fadda G, Brennan MJ, Zanetti S: Rv1818c-encoded PE_PGRS protein of Mycobacterium tuberculosis is surface exposed and influences bacterial cell structure. Mol Microbiol. 2004, 52: 725-733. 10.1111/j.1365-2958.2004.04007.x.
- Brennan MJ, Delogu G: The PE multigene family: a 'molecular mantra' for mycobacteria. Trends Microbiol. 2002, 10: 246-249. 10.1016/S0966-842X(02)02335-1.
- Brennan MJ, Delogu G, Chen Y, Bardarov S, Kriakov J, Alavi M, Jacobs WR: Evidence that mycobacterial PE_PGRS proteins are cell surface constituents that influence interactions with other cells. Infect Immun. 2001, 69: 7326-7333. 10.1128/IAI.69.12.7326-7333.2001.
- Delogu G, Brennan MJ: Comparative immune response to PE and PE_PGRS antigens of Mycobacterium tuberculosis. Infect Immun. 2001, 69: 5606-5611. 10.1128/IAI.69.9.5606-5611.2001.
- Singh KK, Zhang X, Patibandla AS, Chien P, Laal S: Antigens of Mycobacterium tuberculosis expressed during preclinical tuberculosis: serological immunodominance of proteins with repetitive amino acid sequences. Infect Immun. 2001, 69: 4185-4191. 10.1128/IAI.69.6.4185-4191.2001.
- Talarico S, Cave MD, Marrs CF, Foxman B, Zhang L, Yang Z: Variation of the Mycobacterium tuberculosis PE_PGRS 33 gene among clinical isolates. J Clin Microbiol. 2005, 43: 4954-4960. 10.1128/JCM.43.10.4954-4960.2005.
- Chaitra MG, Hariharaputran S, Chandra NR, Shaila MS, Nayak R: Defining putative T cell epitopes from PE and PPE families of proteins of Mycobacterium tuberculosis with vaccine potential. Vaccine. 2005, 23: 1265-1272. 10.1016/j.vaccine.2004.08.046.
- Ramakrishnan L, Federspiel NA, Falkow S: Granuloma-specific expression of Mycobacterium virulence proteins from the glycine-rich PE-PGRS family. Science. 2000, 288: 1436-1439. 10.1126/science.288.5470.1436.
- Dheenadhayalan V, Delogu G, Brennan MJ: Expression of the PE_PGRS 33 protein in Mycobacterium smegmatis triggers necrosis in macrophages and enhanced mycobacterial survival. Microbes Infect. 2006, 8: 262-272. 10.1016/j.micinf.2005.06.021.
- Gevers D, Vandepoele K, Simillion C, Van de Peer Y: Gene duplication and biased functional retention of paralogs in bacterial genomes. Trends Microbiol. 2004, 12: 148-154. 10.1016/j.tim.2004.02.007.
- Gutacker MM, Mathema B, Soini H, Shashkina E, Kreiswirth BN, Graviss EA, Musser JM: Single-nucleotide polymorphism-based population genetic analysis of Mycobacterium tuberculosis strains from 4 geographic sites. J Infect Dis. 2006, 193: 121-128. 10.1086/498574.
- Mardassi H, Namouchi A, Haltiti R, Zarrouk M, Mhenni B, Karboul A, Khabouchi N, Gey van Pittius NC, Streicher EM, Rauzier J, Gicquel B, Dellagi K: Tuberculosis due to resistant Haarlem strain, Tunisia. Emerg Infect Dis. 2005, 11: 957-961.
- Smith NH, Dale J, Inwald J, Palmer S, Gordon SV, Hewinson RG, Smith JM: The population structure of Mycobacterium bovis in Great Britain: clonal expansion. Proc Natl Acad Sci USA. 2003, 100: 15271-15275. 10.1073/pnas.2036554100.
- Supply P, Warren RM, Banuls AL, Lesjean S, Van Der Spuy GD, Lewis LA, Tibayrenc M, Van Helden PD, Locht C: Linkage disequilibrium between minisatellite loci supports clonal evolution of Mycobacterium tuberculosis in a high tuberculosis incidence area. Mol Microbiol. 2003, 47: 529-538. 10.1046/j.1365-2958.2003.03315.x.
- Filliol I, Motiwala AS, Cavatore M, Qi W, Hazbon MH, Bobadilla del Valle M, Fyfe J, Garcia-Garcia L, Rastogi N, Sola C, Zozio T: Global phylogeny of Mycobacterium tuberculosis based on single nucleotide polymorphism (SNP) analysis: insights into tuberculosis evolution, phylogenetic accuracy of other DNA fingerprinting systems, and recommendations for a minimal standard SNP set. J Bacteriol. 2006, 188: 759-772. 10.1128/JB.188.2.759-772.2006.
- Goh KS, Rastogi N, Berchel M, Huard RC, Sola C: Molecular evolutionary history of tubercle bacilli assessed by study of the polymorphic nucleotide within the nitrate reductase (narGHJI) operon promoter. J Clin Microbiol. 2005, 43: 4010-4014. 10.1128/JCM.43.8.4010-4014.2005.
- Springer B, Sander P, Sedlacek L, Hardt WD, Mizrahi V, Schar P, Bottger EC: Lack of mismatch correction facilitates genome evolution in mycobacteria. Mol Microbiol. 2004, 53: 1601-1609. 10.1111/j.1365-2958.2004.04231.x.
- Santoyo G, Romero D: Gene conversion and concerted evolution in bacterial genomes. FEMS Microbiol Rev. 2005, 29: 169-183. 10.1016/j.femsre.2004.10.004.
- Szostak JW, Orr-Weaver TL, Rothstein RJ, Stahl FW: The double-strand-break repair model for recombination. Cell. 1983, 33: 25-35. 10.1016/0092-8674(83)90331-8.
- Allers T, Lichten M: Differential timing and control of noncrossover and crossover recombination during meiosis. Cell. 2001, 106: 47-57. 10.1016/S0092-8674(01)00416-0.
- Prammananan T, Sander P, Springer B, Bottger EC: RecA-mediated gene conversion and aminoglycoside resistance in strains heterozygous for rRNA. Antimicrob Agents Chemother. 1999, 43: 447-453.
- Liu X, Gutacker MM, Musser JM, Fu YX: Evidence for Recombination in Mycobacterium tuberculosis. J Bacteriol.
- Talaat AM, Lyons R, Howard ST, Johnston SA: The temporal expression profile of Mycobacterium tuberculosis infection in mice. Proc Natl Acad Sci USA. 2004, 101: 4602-4607. 10.1073/pnas.0306023101.
- Zahrt TC, Deretic V: Mycobacterium tuberculosis signal transduction system required for persistent infections. Proc Natl Acad Sci USA. 2001, 98: 12706-12711. 10.1073/pnas.221272198.
- Rosas-Magallanes V, Deschavanne P, Quintana-Murci L, Brosch R, Gicquel B, Neyrolles O: Horizontal transfer of a virulence operon to the ancestor of Mycobacterium tuberculosis. Mol Biol Evol. 2006, 23: 1129-1135. 10.1093/molbev/msj120.
- Pethe K, Swenson DL, Alonso S, Anderson J, Wang C, Russell DG: Isolation of Mycobacterium tuberculosis mutants defective in the arrest of phagosome maturation. Proc Natl Acad Sci USA. 2004, 101: 13642-13647. 10.1073/pnas.0401657101.
- Strong M, Sawaya MR, Wang S, Phillips M, Cascio D, Eisenberg D: Toward the structural genomics of complexes: crystal structure of a PE/PPE protein complex from Mycobacterium tuberculosis. Proc Natl Acad Sci USA. 2006, 103: 8060-8065. 10.1073/pnas.0602606103.
- Genolist (Pasteur Institute). [http://genolist.pasteur.fr]
- Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22: 4673-4680.
- Hall TA: BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser. 1999, 41: 95-98.
- The Institute for Genomics Research (TIGR). [http://www.tigr.org]
- Sanger Centre. [http://www.sanger.ac.uk]
- Viana-Niero C, Gutierrez C, Sola C, Filliol I, Boulahbal F, Vincent V, Rastogi N: Genetic diversity of Mycobacterium africanum clinical isolates based on IS6110-restriction fragment length polymorphism analysis, spoligotyping, and variable tandem DNA repeats. J Clin Microbiol. 2001, 39: 57-65. 10.1128/JCM.39.1.57-65.2001.
- Brudey K, Driscoll JR, Rigouts L, Prodinger WM, Gori A, Al-Hajoj SA, Allix C, Aristimuno L, Arora J, Baumanis V: Mycobacterium tuberculosis complex genetic diversity: mining the fourth international spoligotyping database (SpolDB4) for classification, population genetics and epidemiology. BMC Microbiol. 2006, 6: 23. 10.1186/1471-2180-6-23.
- Schneider S, Roessli D, Excoffier L: Arlequin ver. 2000: a software for population genetics data analysis. 2000, Geneva: Genetics and Biometry Laboratory, University of Geneva.
- Rozas J, Sanchez-DelBarrio JC, Messeguer X, Rozas R: DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics. 2003, 19: 2496-2497. 10.1093/bioinformatics/btg359.
- Nei M, Gojobori T: Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol. 1986, 3: 418-426.
- Kamerbeek J, Schouls L, Kolk A, Van Agterveld M, Van Soolingen D, Kuijper S, Bunschoten A, Molhuizen H, Shaw R, Goyal M, van Embden J: Simultaneous detection and strain differentiation of Mycobacterium tuberculosis for diagnosis and epidemiology. J Clin Microbiol. 1997, 35: 907-914.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.