Simple sequence repeats (SSRs, also known as microsatellites) are tandemly repeated sequence motifs of one to six base pairs (bp) scattered throughout all known genomes. The extensive length differences that microsatellites can attain, together with their high rate of polymorphism, have facilitated their use as molecular markers in epidemiological investigations. However, little is known about the mutational mechanism(s) that lead to variation at microsatellite loci.
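To make the definition concrete, the following minimal Python sketch scans a DNA string for perfect tandem repeats of a 1 to 6 bp motif. It is an illustration only, not a method used in this study: the function name `find_ssrs`, the `min_repeats` threshold of three copies, and the toy input sequence are all assumptions introduced here.

```python
import re

def find_ssrs(seq, min_repeats=3, max_motif=6):
    """Return (start, motif, repeat_count) for each perfect tandem repeat.

    Hypothetical helper: min_repeats and max_motif are illustrative
    defaults, not thresholds taken from the study.
    """
    seq = seq.upper()
    # Group 1 lazily captures a motif of 1..max_motif bases, so the shortest
    # motif that satisfies the pattern is tried first; the backreference then
    # requires at least (min_repeats - 1) further copies directly after it.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        repeat_count = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeat_count))
    return hits

# Toy sequence (invented for illustration): three CCG units flanked by other bases.
print(find_ssrs("TTCCGCCGCCGAT"))
```

Only perfect repeats are reported by this sketch; an allele containing an interrupting motif would be split into separate hits or missed, depending on the lengths of the flanking pure runs.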
Gaining a detailed understanding of the features underlying microsatellite genomic structure will aid subsequent interpretation of data from these clinically useful genomic regions. Strand slippage during DNA replication can cause insertion or deletion of repeat units in newly synthesized nucleotide chains. These events are the most common cause of expansion or contraction of microsatellites [2, 3]. Recent studies have reported differences in the rates and patterns of mutation among distinct loci and species; thus, allele size, motif size, genetic lineage, G/C content, functional potential of the transcribed product, and effectiveness of mismatch repair enzymes might all act as mediators of the mutation patterns of such loci [4–7].
Although several models have been proposed to explain the mutation processes that affect microsatellite evolution, they have yet to be confirmed. Studies of genomic evolution exploit the hyper-variable nature of microsatellite sequences to observe mutation events directly. Specifically, pedigree analysis has provided substantial amounts of mutation data for broad ranges of chromosomal loci and organisms [8–10]. However, most of these studies have focused on size variation among alleles and have not addressed potential sequence variation within otherwise similarly sized alleles.
Sequence analysis is an alternative empirical approach for studying microsatellite evolution. Elucidating the sequence structure of alleles allows for direct comparison with other alleles within a single species or with orthologous loci from different species, effectively allowing for the study of accumulated mutational effects over evolutionary timescales. Intraspecific comparisons that reveal the sequence structure of individual alleles may provide significant insights into the otherwise complex process of maintaining genomic integrity under selective pressure.
Mycobacterial genomes harbor a number of polymorphic microsatellites. Microsatellites in these genomes impart a certain degree of genome plasticity and probably contribute to many biological functions in the context of pathogen adaptability, virulence, and survival. Usually, errors resulting from strand slippage are promptly repaired by a three-enzyme mismatch repair system composed of MutL, MutS, and MutH; however, mycobacterial genomes lack these enzymes. Thus, such genomes serve as interesting systems in which to investigate the rates of mutation in microsatellites and the existence of regulatory mechanisms that govern microsatellite mutations.
The repetitive CCG sequence located in the Rv0050 gene in Mycobacterium tuberculosis (MTB) is a trinucleotide microsatellite locus (MML0050) that exhibits a high polymorphism rate. The Rv0050 locus encodes a bifunctional penicillin-binding protein (ponA1) [14, 15]. This hyper-variable locus has gained popularity as a Variable Number Tandem Repeat (VNTR) biomarker in epidemiological investigations of MTB strains. However, the mutational mechanisms responsible for generating the high levels of variation at the MML0050 locus remain unclear.
In this study, we sought to explain the mutation tempo and mode of the MML0050 locus and its polymorphic nature using clinical isolates from two MTB genotypes: W-Beijing and non-W-Beijing. W-Beijing strains are genetically closely related, present with a characteristic spoligotype pattern, and are widely dispersed globally [17–19]. They account for 80–90% of the MTB strains isolated from the Beijing area since the 1950s and remain prevalent in other parts of China, including the Ningxia, Shanghai, and Guangdong provinces. Such strains have attracted much research interest due to their reported association with multiple drug resistance, relapse, treatment failure, a hypervirulent phenotype in mice, and faster growth rates in human monocytes [21–25]. We sequenced a set of MML0050 locus alleles of different size classes from different MTB families to analyze the stabilizing effect of interrupting motifs in microsatellite regions, the effects of allele length and genetic lineage on the introduction of interrupting motifs, and the relationship between the number of repeat units and mutation frequency.
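As an illustration of the quantities named above (number of repeat units, interrupting motifs, and the length of uninterrupted runs), the following Python sketch summarizes a single sequenced allele. It is a simplified sketch under stated assumptions and does not reproduce the analysis pipeline of this study: the function `summarize_tract` is hypothetical, the tract is assumed to be already trimmed to the repeat region and in frame with the CCG motif, and the example allele is invented rather than drawn from any isolate.

```python
def summarize_tract(tract, motif="CCG"):
    """Summarize a repeat tract as units, interruptions, and longest pure run.

    Assumes (for illustration) that `tract` starts in frame with `motif`;
    trailing bases that do not fill a whole unit are ignored.
    """
    tract = tract.upper()
    k = len(motif)
    # Cut the tract into consecutive, non-overlapping units of motif length.
    units = [tract[i:i + k] for i in range(0, len(tract) - k + 1, k)]
    # An interruption is any unit that differs from the canonical motif.
    interruptions = [(i, u) for i, u in enumerate(units) if u != motif]
    # Track the longest run of consecutive canonical units.
    longest = run = 0
    for unit in units:
        run = run + 1 if unit == motif else 0
        longest = max(longest, run)
    return {
        "units": len(units),
        "interruptions": interruptions,
        "longest_pure_run": longest,
    }

# Invented six-unit allele with one interrupting CTG unit at index 2.
print(summarize_tract("CCGCCGCTGCCGCCGCCG"))
```

Reporting the longest pure run alongside the total unit count reflects the idea that two alleles of equal size can differ in how long an uninterrupted stretch is available for slippage.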