Comparisons of mutation rate variation at genome-wide microsatellites: evolutionary insights from two cultivated rice and their wild relatives

Background Mutation rate (μ) per generation per locus is an important parameter in the models of population genetics. Studies on mutation rate and its variation are of significance to elucidate the extent and distribution of genetic variation, further infer evolutionary relationships among closely related species, and deeply understand genetic variation of genomes. However, patterns of rate variation of microsatellite loci are still poorly understood in plant species. Furthermore, how their mutation rates vary in di-, tri-, and tetra-nucleotide repeats within the species is largely uninvestigated across related plant genomes. Results Genome-wide variation of mutation rates was first investigated by means of the composite population parameter θ (θ = 4Nμ, where N is the effective population size and μ is the mutation rate per locus per generation) in four subspecies of Asian cultivated rice O. sativa and its three related species, O. rufipogon, O. glaberrima, and O. officinalis. On the basis of three data sets of microsatellite allele frequencies throughout the genome, population mutation rate (θ) was estimated for each locus. Our results reveal that the variation of population mutation rates at microsatellites within each studied species or subspecies of cultivated rice can be approximated with a gamma distribution. The mean population mutation rates of microsatellites do not significantly differ in motifs of di-, tri-, and tetra-nucleotide repeats for the studied rice species. The shape parameter was also estimated for each subspecies of rice as well as other related rice species. Of them, different subspecies of O. sativa possesses similar shape parameters (α) of the gamma distribution, while other species extensively vary in their population mutation rates. Conclusion Through the analysis of genome-wide microsatellite data, the population mutation rate can be approximately fitted with a gamma distribution in most of the studied species. In general, different population histories occurred along different lineages may result in the observed variation of population mutation rates at microsatellites among the studied Oryza species.


Background
Microsatellites are composed of tandemly repeated, sim-ple DNA sequence motifs of 1-6 nucleotide bases in length. These loci are ubiquitously found throughout both prokaryotic and eukaryotic genomes and typically are highly polymorphic within species and populations. As such, microsatellites have become one of the most popular types of molecular markers and widely employed to study population structure of a diverse range of organisms [1,2], reconstruction of evolutionary history [3], parentage and relatedness analysis [4], and natural selection [5]. Despite the widespread application of microsatellites, their evolutionary dynamics across loci within the species and across related species are yet to be well understood. A powerful approach to study the mutational dynamics of microsatellites during the evolutionary process is to investigate their variation of mutation rates. The development of genomic technology provides an unprecedented opportunity to generate increasing microsatellite data from model organisms and their related species on a genomewide scale. There are growing interests in using such microsatellite data, as they make it possible to take mutation rate variation into consideration by analyzing multiple loci across related species rather than a single locus from a single species.
Mutation rate (µ) per generation per locus is an important parameter in models of population genetics as it permits one to estimate the timing of evolutionary divergence between species [1,6], and the effective population size of the species [7,8]. Mutation rates of microsatellites have long been estimated in numerous animal studies. One of the most important observations was that the mutation rate largely varies in several orders of magnitude among different species, ranging from 5 × 10 -6 in Drosophila [9][10][11] to 10 -3 in humans [12,13]. Clearly, the repeat motif length may influence microsatellite mutation rates, but the controversial has lasted many years. For example, Weber and Wong initially surveyed di-and tetranucleotide repeat microsatellites and estimated higher mutation rates for tetranucleotide repeats [14]. However, Chakraborty and his colleagues employed a method based on population variation to estimate relative differences in mutation rate and claimed that di-nucleotide repeats mutate at a rate 1.2-2.4 times higher than that of tri-or tetranucleotide repeats [15]. The finding has been confirmed by the latter studies [9,10,16,17]. These preliminary results showed that mutational dynamics at microsatellite loci could be more complicated than previously observed and assumed, at least at the interspecific level. Therefore, a picture of evolutionary dynamics of mutation rates among microsatellite loci of different repeat types at the genome-wide level is required for further understanding evolutionary factors which govern microsatellite variability and thus contributing to overall levels of genomic diversity across species. In this regard, knowledge of mutation dynamics is particularly necessary in plants, which have been insufficiently explored. Mutation rates at microsatellites in plants seem to be higher than in animals based on different levels of genetic diversity [18] as well as obviously limited experimental estimates [8,19,20].
The proper estimation of mutation rate at each locus is required to examine the variation of mutation rate among loci. Mutation rate of a microsatellite locus can be estimated from the direct genotype data [8,14,21]. Direct observation of such mutational events is preferable if possible, but involves much more time-consuming genotyping. It is quite costly in practice because mutation rates at most microsatellite loci are not large enough to be accurately estimated with a reasonable sample size. These assayed loci may not have sufficient representation of the genome-wide coverage of all microsatellites. Because of the above-mentioned limitations, it is impractical to study the mutation rate variation at the genome level using such a direct approach. Alternatively, the scaled mutation rate can be estimated using population genetic methods that utilize allelic frequency at large microsatellite loci in population samples. In such an analysis, it is commonly assumed that the population has reached a mutation-drift equilibrium so that the allele frequency distribution can be expressed in terms of the composite population parameter θ = 4Nµ, where N is the effective population size and µ is the mutation rate per locus per generation. Using this approach, for example, Xu and Fu recently developed an estimator of θ based on the genetic variation data at the microsatellite loci [22]. Their estimator is unbiased under the single-step stepwise mutation model and is robust against other forms of stepwise mutation models. It also has the advantage of being simple to compute and performs better than several existing estimators, including the maximum-likelihood-based estimator [23]. Taking advantage of this novel development, they further carried out an analysis of population mutation rate variation at di-nucleotide microsatellite loci in the human genome and demonstrated that their estimator is a reasonable assumption for the analysis of large genomic data [24].
As an important model crop, Asian cultivated rice (Oryza sativa L.) has two fully sequenced genomes [25,26]. With increasing population genetic data derived from genomewide microsatellite loci [27][28][29][30], rice affords unique opportunities to use the above-mentioned population genetic method [22] to study their patterns of mutation rate. The increasing evidence [31][32][33] suggested that the majority of rice cultivars can be classified as two primary subspecies, indica and japonica, which were separated from their common wild progenitor O. rufipogon Griff. about 0.4 MYA (Million Years Ago) [34] (Figure 1). Genetic structure of this domesticated crop is so heterogenous that isozyme loci even distinguished additional population structure consisting of the six variety groups of indica, japonica, aus, aromatic, rayada, and ashina [35]. More recently, using 169 nuclear microsatellites and two chloroplast loci, five distinct groups were further confirmed within O. sativa, corresponding to indica, aus, aromatic, temperate japonica, and tropical japonica rices [29]. With regard to the subject of this work, there has been little comparative study on genome-wide patterns of mutation rate variation at microsatellites among closely related plant species and/or subspecies. Therefore, a variety of rice subspecies and their differently diverged relatives could be used as a valuable study model to gain genome-wide insights into patterns of mutation rates at microsatellite loci.
Variation in substitution rates has long been observed for the nucleotide sequence data. Some substitution models which have been proposed to fit the distribution of substitution rates [36,37]. Of them, gammar-distribution rates [38][39][40][41] are the most adopted ones. Recently, genomewide analyses suggested that the gammar-distribution fits the population mutation rate distribution for the microsatellite loci in human [24]. It is of great interest to examine whether the mutation models will also fit the population mutation rate distribution for the microsatellite loci in rice and other related plant genomes.
In this study, we carried out an analysis of the population mutation rate in rice subspecies and related species at di-, tri-, and tetra-nucleotide microsatellite loci using three data sources. We collected and reported the first dataset of 60 microsatellite loci by genotyping Chinese representative samples which also included two related wild species, O. rufipogon and O. officinalis Wall. ex Watt., in addition to two Chinese subspecies (indica and japonica) of O. sativa. The second dataset we investigated is from a much more comprehensive dataset which were generated using a total of 169 microsatellites with world-wide sampling of differ-ent rice subspecies. Finally, we used the third dataset that was generated from a world-wide collection of the African cultivated rice Oryza glaberrima Steud., which diverged from the Asian cultivated rice about 0.7 MYA [34] (Figure  1), by assaying 93 microsatellite loci. Here, we reported the observed patterns of population mutation rate variation in microsatellite loci across rice subspecies and related species, and then attempted to provide evolutionary insights where possible.

The gamma distribution of population mutation rates at di-, tri-, and tetra-nucleotide microsatellite loci
The estimates of population mutation rate were computed for each locus in all rice species where genotypes are available. To examine variation in population mutation rate of microsatellites with respect to their motif sizes, we classified the analyzed loci into four types of repeats, namely di-, tri-, tetra-, and motif-complicated according to their motif sizes. Then the distribution of the population mutation rate in each category was plotted using histograms of population mutation rates at di-nucleotide repeats in all the species and/or subspecies studied. We found that the histograms can be fitted for most species and/or subspecies with a gamma distribution except O. officinalis ( Figure  2). In the case of O. officinalis, we could not fit a gamma distribution probably due to missing data. However, the distributions of population mutation rates at microsatellites with tri-nucleotide repeats can also be approximated well with gamma distributions for all the species and/or subspecies without any exception ( Figure 3). Because only Data set II has sufficient microsatellites with tetra-nucleotide repeats for O. sativa (Table 1), the histograms of population mutation rate can be fitted to the gamma distributions for the cultivated rice as well as all the included subspecies, respectively ( Figure 4). A resampling based procedure was then performed to compare the estimates of population mutation rate at loci with different motifs from the same species and/or subspecies. Results show that the means of the population mutation rates at the loci with different motifs do not significantly differ at the 95% level in the studied rice species.

Dynamics of population mutation rates across related rice species
Our results show that population mutation rates largely vary among the four studied species (Table 2). Of them, O. rufipogon, the wild progenitor of O. sativa, possesses the largest mean population mutation rate, while O. officinalis exhibits the lowest value. In comparison, two cultivated rice apparently differ in population mutation rates. African cultivated rice O. glaberrima has at least fourfold lower value than Asian cultivated rice O. sativa. The interspecific comparisons at all the microsatellites further suggest that the two estimated parameters (both α and β ) of a gamma Species tree of the studied rice species [32] Figure 1 Species tree of the studied rice species [32].  (Table 2). Interestingly, African cultivated rice O. glaberrima exhibited a slightly larger shape parameter α but a rather smaller scale parameter β than Asian cultivated rice O. sativa from both Data set I and II.
Patterns of population mutation rate variation were further investigated for the di-, tri-, and tetra-nucleotide motifs of the analyzed microsatellite loci, respectively (Table 3 and 4; Figure 4). A similar pattern was observed in the three species, O. sativa, O. rufipogon and O. glaberrima, at microsatellites with di-nucleotide repeats (Table   3). Although the estimator β exhibited a similar pattern in the above-mentioned species, at microsatellites with trinucleotide repeats as shown in Table 4

Difference of population mutation rates among subspecies of Asian cultivated rice
In this study, population mutation rate variation of O. sativa was estimated by using two data sources. The Data set I was collected by using microsatellites and samples both fewer than Data set II [29]. Comparisons of parameters (α and β ) of the fitted gamma distributions show consistent estimates of population mutation rate variation from these two data sets (Table 2). Similar results could also be observed at microsatellites with di-and trinucleotide repeats from Data set I and II (Table 3, 4).
To explore whether there was any intra-specific difference in the population mutation rates of microsatellites at the genomic level, estimates of α and β of the gamma distribution were compared among the studied subspecies of O. sativa (  Figure 4). It is clear that the scale parameter β has a very large variance among the estimates relative to the mean value α for the studied subspecies. Interestingly, tropical japonica had a larger β value than temperate japonica despite their very close genetic proximity.
To further confirm the potential difference of mutation rates among subspecies described above, we estimated and compared mutation rates for the analyzed microsatellites in two major subspecies, indica and japonica, through coalescent simulations for Data set I (methods see [42]). As shown in Figure 5, we again found no obvious differ- Histogram of the population mutation rate and the fitted gamma distribution at di-nucleotide repeats in each of the studies species/subspecies  1 ence of mutation rates between them, suggesting that indeed these two subspecies may not have obtained obvious intra-specific difference of mutation rates at microsatellite loci since their domestication.

Discussion
Using the genetic variation data at microsatellties throughout the genome, we estimated the population mutation rate θ at these loci in a total of four Oryza species. The estimator of θ was developed with the assumption of mutation-drift equilibrium and no assortative mating in the studied population [22]. While the samples of two cultivated rice in this study may reach mutationdrift equilibrium, it was estimated that, for example, the outcrossing rate is only 0-5 % in Asian cultivated rice O. sativa [43]. However, we would argue that the estimator of θ is still applicable to our data sets due to the following reasons. First, it may be true that the selfing rate is high for a particular population (e.g., a variety) of cultivated rice. In our study, only one individual is taken from each accession and the estimation can be performed after adjustment for population size; and second, the estimator is based on sample homozygosity, which is computed from allele frequency as given in Equation 1. The main effect of selfing is on the frequencies of genotypes rather than the allele frequencies. Therefore, the low outcrossing rate in selfing species such as O. sativa and O. glaberrima could have minimal effects on the estimator.  It is well recognized that the cultivated plants always had a complicated history of evolution and domestication. As a result, in this case of cultivated rice, there was an obvious population size expansion. However, since it is generally believed that on average the effective population size is constant for all the loci in a specific sample, our comparisons of population mutation rate within samples/species should be valid. But the comparison across species would be problematic. Therefore, we can only compare the α estimates based on the property of gamma distribution when making comparisons among different species.
It should be noted that the approach we take in our study does have limitations. Because we are estimating the population mutation rate θ, the variation in this parameter reflects both the variation of the underlying mutation rate and the variation of coalescent time at each locus, which could be very large. Only on average does the estimates of θ reflect the average mutation rate. Therefore, even though our approach does not allow the direct comparison of the exact distribution of the mutation rate, it allows us to make comparisons of the average mutation rate between the microsatellites with different repeat motifs, which is the main purpose of our study.
It is noticeable that Data set II was collected by using the world-wide samples of O. sativa which included almost all genetic or ecological subspecies [29]. A total of 146 microsatellites were sampled from the whole rice genome. In comparison, Data set I only used the representative samples from China by assaying relatively fewer microsatellites of the genome. However, comparisons of parameters (α and β) of a gamma distribution show relatively consistent estimates of population mutation rate variation in O. sativa using all the loci and the microsatellites with dinucleotide repeats from these two data sets. This suggests that the sample size and number of microsatellties from Data set I may be sufficient to capture the mutation rate variation of the major class of microsatellites at the genomic level. However, since most of the microsatellites used in our study are di-nucleotide repeats and we have much fewer tri-nucleotide repeat loci (10 for Data set I but 37 for Data set II), it is not surprising to observe that the parameter estimates at the tri-nucleotide repeats in O. sativa from the two data sets are inconsistent. Noticeably, the ascertainment bias should be taken into consideration for each dataset under investigation when making comparisons among species and/or datasets. As the microsat-Histogram of the population mutation rate and the fitted gamma distribution at tri-nucleotide repeats in each of the studies species/subspecies   ellite loci were mainly chosen for a purpose of the detection of polymorphisms, the microsatellites used for genotyping were most likely not random, leading to an important bias about the distribution of theta values. For example, loci with low underlying theta values are very likely to be discarded earlier, because they were considered monomorphic. Therefore, it may be necessary to analyze microsatellite data based on the randomly selected loci and retest the patterns observed in this study.
As a main purpose of this work, it is of great interest to examine whether there is any variation of mutation rates across microsatellites of different repeat types at a genome-wide level among different subspecies of the same species, O. sativa, and among these closely related Oryza species. In contrast to the findings from humans [15][16][17]44], our observations indicate that the mean population mutation rate of microsatellites with different motifs does not make significant differences among subspecies of O. sativa or within this species. The same is true for African cultivated rice O. glaberrima, and the other two wild rice species, O. rufipogon and O. officinalis. The effective population size for the sample from the same species or subspecies is generally assumed to be same across the microsatellite loci. Therefore, our results suggest that the mean population mutation rates of microsatellites with different motifs do not make significant differences within the studied Oryza species or each subspecies of cultivated rice.
Our results suggest that the variation of population mutation rate at microsatellites within species can be approximated with a gamma distribution for the most studied species. In such a study to first investigate evolutionary dynamics of population mutation rates at microsatellites in Asian cultivated rice and its relatives, the gamma distribution was applied. In addition, the results show that the subspecies of O. sativa have similar α values. Since α is generally interpreted as the shape parameter of a gamma distribution, our results reveal that the shapes of the distributions of population mutation rate at microsatellites in the subspecies of O. sativa are similar. This conclusion has further been demonstrated through another separate coalescent simulation for Data set I. An easy interpretation is that these subspecies of O. sativa have not been separated long enough to cause any differences in the mutation rate at microsatellites. However, a very large variance of the scale parameter β was found among the studied subspecies. Such a difference in the distributions mainly comes from the different effective population sizes of these subspecies. From the viewpoint of the domestication and history of cultivation of this cultivated rice [29,32,33,35,43,45,46], it is possible that these genetic or ecological subspecies have different effective population size. Such a difference has contributed to the variability in the scale parameter β observed.
In contrast to consistent variation of population mutation rates at microsatellites among different subspecies within O. sativa, we observed apparent variation among the studied rice species. Both estimated parameters α and β of a gamma distribution are larger for the wild progenitor O. rufipogon than those for the derived cultivated rice O. sativa. Theoretically, it is a well-known fact that the wild progenitor always has a large effective population size. Therefore, this observation strongly suggests that the population mutation rates of microsatellites in cultivated rice have become small after its origin and domestication. The other African cultivated rice species, O. glaberrima, which was originated and is merely grown in Africa [47,48], exhibited a larger shape parameter α but a smaller scale parameter β than Asian cultivated rice O. sativa. This indicates that African cultivated rice may have a relatively high mutation rate of microsatellites. The result may also be indicative of small effective population size in O. glaberrima, which seems consistent to the origin and very narrow geographical range of this cultivar in Africa [47,48]. The most distantly related species O. officinalis fits a gamma distribution only at microsatellites with tri-nucleotide repeats, which showed a smaller α value even than O. sativa. In comparison, it had a smaller estimated α than any other species.

Conclusion
In this study, genome-wide microsatellite data were analyzed for four Oryza species. It suggests that the population mutation rates at the assayed loci could be approximately fitted with a gamma distribution in most of the studied species. In addition, we obtained the shape parameters of their distributions. The results demonstrate that population mutation rate (θ) is able to efficiently estimate the genome-wide variation of mutation rates at microsatellite loci in plants. The results may help to guide the modeling of genome-wide genetic variability at microsatellite loci with different repeat motifs.
Our comparative analysis shows that the mean population mutation rate of microsatellites with different motifs does not make significant differences within the studied Oryza species or each subspecies of cultivated rice O. sativa. This finding is not consistent to previous results reported in other organisms (e.g., humans), although the inconsistence may come from the dramatically different sample sizes in these studies. More data from other plant taxa may help to further clarify our obtained result. Meanwhile, precise estimates of mutation rates of microsatellites with different motifs may be needed through collecting direct genotype data in closely related rice species.
Our data showed that population mutation rates largely vary across the rice species in this study. Such observations are indicative of the dynamics of mutation rates among them. It is also likely that they may possess different effective population size after the divergence from the common ancestor. Different evolutionary histories along these lineages may be used to explain the variation of mutation rates. To better elucidate patterns of mutation rate and affected factors, it is of great interest to further study and compare extensive taxonomic groups of the genus with different population genetic histories.
We found that different subspecies of O. sativa have similar shape parameters of the gamma distribution. The coalescent simulation also indicates that the variation of mutation rates among these subspecies of O. sativa is not apparent due to their short duration of divergence. However, both estimated parameters α and β of a gamma distribution are larger for the wild progenitor O. rufipogon than those for the derived subspecies of cultivated rice O. sativa. In addition to a decreased effective population size during the process of domestication of cultivated rice, it is likely that mutation rates of microsatellites in rice may have become small in comparison to wild progenitor. Accurate estimate of mutation rates in these subspecies of Asian cultivated rice as well as the wild progenitor in experimental populations may help to better investigate the dynamics of their effective population size. With such an effort, we will be able to deeply obtain evolutionary insights into mutation and evolution of microsatellite loci in cultivated rice species and wild relatives.

Data
Data set I The first dataset consists of represents by genotyping a total of 113 accessions of cultivated rice and closely related species at 60 microsatellite loci (  [49]. When this study was conducted in 1997, there were a total of 323 microsatellite loci publicly available [50][51][52][53]. We selected five loci for each chromosome totaling 60 primer pairs that are randomly distributed throughout the rice genome [30]. These microsatellites included 39 di-nucleotide, 10 trinucleotide, and 11 motif-complicated nucleotide. Microsatellite polymorphisms were assayed by specific PCR conditions following Panaud et al. (1996) [51]. PCR products were run on 6 % polyacrylamide denaturing gels and marker bands were revealed using the sil-ver staining as described by Panaud et al. (1996) [51]. The null alleles were confirmed after several repetitions with different amplification conditions to ensure that no reaction failure existed. To determine the allele size, the samples were directly compared with band sizes from an allelic ladder prepared by amplification of an artificial mixture of DNA from all the assayed samples.  [48].

Statistical Methods
The estimates of population mutation rate and patterns of mutation rate variation in each subspecies/species were obtained by generally using the approach by Xu, Chakraborty, and Fu (2005) [24]. In brief, the analysis includes the following steps.

Estimation of population mutation rate
The allele frequency of each locus in each subspecies/species was estimated using the genotype data by the genecounting method. We further estimated the composite parameter -population mutation rate θ = 4Nµ, where N is the effective population size and µ is the mutation rate per locus per generation. The parameter is critical in making inferences using genetic variations because many statistical properties of measures of genetic variation are dependent on the parameter. We estimated θ using the sample homozygoisty-based estimator developed by Xu and Fu (2004) [22], which is approximately unbiased under the single-step stepwise mutation model and has a much smaller variance than the size-variance based estimator. The estimator is also simple to compute, which makes it suitable for large-scale analysis of microsatellites at the genomic level.
The sample homozygosity is computed as where n is the sample size, k is the number of alleles in the sample, and p i is the allele frequency estimate for the ith allele in the sample. A biased estimator is given by The population mutation rate θ was estimated for each locus in every possible subspecies/species where allele frequency can be estimated from the genotype data.

Fitting with a gamma distribution
Previous studies have shown that the variation of mutation rate at the protein-enzyme loci and single-nucleotide sites can be approximated as a gamma distribution. A recent study also suggests that the gamma distribution can be used to model the mutation rate variation at microsatellite loci with di-nucleotide repeats in humans [24]. The exact pattern of mutation rate variation in rice is unknown. From the estimates of the population mutation rate for the microsatellites across the rice genome, we plotted the histogram, which shows that the distribution of the mutation rates in rice can be approximated with a gamma distribution. A maximum-likelihood approach was used to fit the histogram with a gamma distribution and to estimate the two parameters of the gamma distribution -shape parameter α and scale parameter β . The gamma distribution was parameterized such that the expectation is αβ and variance is αβ 2 . The estimates of the parameters from each subspecies were compared.

Comparing the θ Estimates
To compare the θ estimates from microsatellites with different motifs, we used a resampling procedure. Suppose we want to compare the θ estimates from di-nucleotide markers to those from tri-nucleotide markers. Suppose we had n tri-nucleotide markers, we re-sample 1,000 times with replacement n markers from the group of di-nucleotide markers and calculated the average of θ estimates from each resampling. The number of times N the average value was less than the average value from the tri-nucleotide markers was used to calculate a P value as N/1000.

Demographic Model
The population model of domestication used here approximates the demography of a domestication event of a single cultivated species from its wild progenitor (see [42] for details). It is assumed that the wild progenitor species has a constant-size population with size N2. The domestication begins at time Td in a small founder population of the cultivated species, which is merely a subset of the members in the wild progenitor. The domestication is assumed to have occurred in this constant-size founder population with size N1. Usually, N1 could be much smaller than N2. When the domestication is complete at time Te, it is assumed that the population size changes to N0, which is generally much larger than N1, representing a rapid population expansion of the domesticated species. Let T0 and T1 be the lengths of time when the population sizes are N0 and N1, respectively. Thus, this simple domestication model has five above-described demographic parameters.

Simulating microsatellite polymorphisms
Under such a population model, patterns of microsatellite variation in the two species (one subspecies and its progenitor) were simulated. On a simulated genealogy from two subspecies with the shared ancestral population [42], random mutations were placed and the changes of repeat numbers at microsatellites were simulated. To approximate the mutation process in microsatellites, we used a generalized stepwise model, under which the change of the number of microsatellite repeats follows a symmetric geometric distribution with a parameter p. In this study, we considered five models with p = 1, 0.9, 0.8, 0.5, and 0, and the results for p = 0.8 are shown in this article because the results under the five models are very similar.

Rejection-sampling algorithm for estimating demographic parameters
A rejection-sampling algorithm was employed to obtain estimates of the demographic parameters and the mutation rate in the above-described model [42]. Because evaluating the full likelihood of multiple microsatellites may be computationally infeasible at present to the best of our  knowledge, we summarized the data by two statistics, the average heterozygosity in wild (HR) and the average heterozygosity in cultivated rice (HC), which could be either HI or HJ for the analysis of subsp. indica and subsp. japonica, respectively. The procedure of our rejection-sampling algorithm to obtain a sample from the joint posterior distribution of the mutation rate is as follows: 1) simulate the mutation rate of microsatellites from the prior distributions. For the mutation rate (u), we used a uniform distribution from 0 to 2 _10 -4 when p = 0.8; 2) place random mutations and simulate patterns of microsatellite polymorphisms. To incorporate the variation in mutation rate across loci, the mutation rate at each locus is assumed to follow a Gamma distribution with mean u (determined at Step 1) and SD 0.7 × u. For each case, at least 5,000 accepted sets of parameters were collected.