What's in a name; Genetic structure in Solanum section Petota studied using population-genetic tools

Background: The taxonomy and systematic relationships among species of Solanum section Petota are complicated, and the section seems overclassified. Many of the presumed (sub)species from South America are very similar and are able to exchange genetic material. We applied a population genetic approach to evaluate support for subgroups within this material, using AFLP data. Our approach is based on the following assumptions: (i) accessions that may exchange genetic material can be analyzed as if they are part of one gene pool, and (ii) genetic differentiation among species is expected to be higher than within species.
Results: A dataset of 566 South American accessions (encompassing 89 species and subspecies) was analyzed in two steps. First, individual accessions were assigned to inferred clusters based on genetic similarity, using the program STRUCTURE 2.2 in an 'unsupervised' procedure. The results showed that the South American members of section Petota could be arranged in 16 clusters of various size and composition. Next, the accessions within the clusters were grouped by maximizing the partitioning of genetic diversity among subgroups (i.e., maximizing Fst values) for all available individuals of the accessions (2767 genotypes). This two-step approach produced an optimal partitioning into 44 groups. Some of the species clustered as genetically distinct groups, either on their own or combined with one or more other species. However, accessions of other species were distributed over more than one cluster and did not form genetically distinct units.
Conclusions: We could not find any support for 43 species (almost half of our dataset). For 28 species, some level of support, varying from good to weak, could be found. For 18 species no conclusions could be drawn, as the number of accessions included in our dataset was too low.
These molecular data should be combined with data from morphological surveys, with geographical distribution data, and with information from crossing experiments to identify natural units at the species level. However, the data do indicate which taxa or combinations of taxa are clearly supported by a distinct set of molecular marker data, leaving other taxa unsupported. Therefore, the approach taken provides a general method to evaluate the taxonomic system in any species complex for which molecular data are available.


Background
The taxonomy of wild potato species, belonging to section Petota of the genus Solanum, is known to be problematic [1][2][3]. Identification of many species is difficult, and the systematic relationships among the wild potatoes are not clear. One of the causes of these difficulties is the ability of many species to hybridize easily [2]. Hawkes [1] hypothesized that approximately 12% of the 224 tuber-bearing Solanum species he recognized had arisen from hybrid speciation. A quote from Correll [4] (page 404) may serve to illustrate the magnitude of the problem: "In fact, the difficulty one encounters in dealing with plants from northwest Argentina and southern Bolivia is such that one is tempted to consider, with very few exceptions the entire Tuberarium population to be one vast assemblage of hybrids" (section Tuberarium being roughly equivalent to the current section Petota).
In addition to hybridization, there is a large amount of phenotypic plasticity, i.e., plants look different in different environments [4][5][6]. Partly because of this, taxonomists have granted (sub)species status to minor variants. As a consequence, species boundaries are based on morphological characters that are not expressed under all conditions. Hence, numerous species have been described, many of which are extremely similar to each other; this is why Spooner and Salas [2] and van den Berg and Jacobs [3] concluded that the group of wild species belonging to Solanum section Petota is overclassified. An extreme example of overclassification within Solanum section Petota is the so-called brevicaule complex. Morphological analyses failed to distinguish the 30 species in the brevicaule complex [7]. Molecular data showed that the brevicaule complex is paraphyletic and that many taxa should probably be relegated to synonymy [8].
The systematic relationships among these species are also hard to determine. They have been expressed in an arrangement of 19 series, as designated by Hawkes [1] and others. Some of the series are difficult to keep apart, while other series contain subgroups that could be considered separate series [3]. To date, the series classification of Hawkes [1] and other authors has received no cladistic support [6]. Jacobs et al. [9] described the taxonomic structure present in Solanum section Petota, focusing on testing the validity of the series classification and on studying the taxonomic structure of the section based on AFLP data. They produced the largest dataset ever constructed for Solanum section Petota and analyzed it both phenetically and phylogenetically. Although some of the branches in the resulting trees were supported by jackknife values above 69, both the phenetic and the phylogenetic trees also displayed a large polytomy containing many taxa.
In the present study, we focus on the status of the recognized species in section Petota, in order to evaluate possible overclassification, misclassification, and hybridization. The number of species in Solanum section Petota has already been reduced somewhat through the application of molecular techniques. While Hawkes [1] still recognized 227 tuber-bearing species (of which 7 were cultivated) and 9 non-tuber-bearing species within section Petota, Spooner and Hijmans [5] recognized only 203 tuber-bearing species, including 7 cultivated species. Spooner and Salas [2] reduced the number further to 189 species (including only 1 cultivated species). Phylogenetic and phenetic analyses in previous studies, reviewed in van den Berg and Jacobs [3] and Jacobs et al. [9], revealed that accessions from many wild Solanum species, especially the species of the South American series Tuberosa, Megistacroloba, and Yungasensia, are closely related. This is consistent with the observation that they freely exchange genes and produce hybrids under artificial conditions. We therefore took the AFLP data used by Jacobs et al. [9] as the starting point of our analysis, treated the individual plants as belonging to one gene pool rather than to separate taxa, and employed a population genetics approach to detect the genetic structure in these AFLP data for the South American representatives of Solanum section Petota.
To test which accessions may belong to one or more species groups, we used a Bayesian population clustering approach implemented in the program STRUCTURE 2.2 [10,11]. STRUCTURE clusters individuals without using a priori information about their identity. The primary assumptions of the model used in STRUCTURE are Hardy-Weinberg equilibrium (HWE) within populations and linkage equilibrium among loci, and the program attempts to find population groupings that are not in disequilibrium [11]. Neither assumption may hold strictly when a more or less random set of accessions collected over a large area is taken to represent a species, but disequilibrium is expected to be smaller within a species than between species. The program has been used successfully in a wide variety of population genetic studies, for example in research on genetic structure in the human population [12], in the phylogeography of the sand-dune shrub Armeria pungens [13], for distinguishing chicken breeds [14], and to detect hybrids between cultivated and wild apple [15,16]. Recently, STRUCTURE was also used in studies on phylogenetic relationships among birch species [17], on species delimitation in a recent species radiation in turtles [18] and in the Mexican jay [19], and it produced part of the evidence for a separate species status of the Galapagos sea lion [20].
Accessions within one species are expected to share more alleles with each other than with accessions from other species. As a result, genetic differentiation among species is expected to be higher than within species. Consequently, if we subdivide an unstructured set of accessions according to their species labels, the fraction of the genetic variation present among, rather than within, groups will be higher if those species labels correctly identify the accessions. If the species labels are incorrect, then combining accessions with incorrect species labels into new groups will increase the fraction of genetic variation among the groups.
Thus, the genetic differentiation among alternative groupings, as expressed in Fst values, will allow us to further subdivide the groups resulting from the STRUCTURE analysis, and to distinguish genetically separate species from species that should be grouped together.
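The reasoning in the preceding two paragraphs can be illustrated with a toy calculation. The sketch below is a minimal illustration, not the AFLP-SURV computation used in this study: it assumes Hardy-Weinberg equilibrium, estimates the band-absent allele frequency with the simple square-root estimator, and uses Nei's Gst = (Ht - Hs)/Ht, without sample-size corrections, as a stand-in for Fst. The band profiles and group labels are hypothetical.

```python
import math
from collections import defaultdict


def null_allele_freq(bands):
    """Square-root estimate of the band-absent allele frequency q at one locus,
    assuming Hardy-Weinberg equilibrium so that P(band absent) = q**2."""
    absent = sum(1 for b in bands if b == 0)
    return math.sqrt(absent / len(bands))


def gst(profiles, labels):
    """Nei-style Gst = (Ht - Hs) / Ht for dominant (0/1) band profiles.

    Ht and Hs are summed over loci before taking the ratio; groups are
    weighted equally and no sample-size correction is applied."""
    groups = defaultdict(list)
    for profile, label in zip(profiles, labels):
        groups[label].append(profile)
    ht = hs = 0.0
    for locus in range(len(profiles[0])):
        qs = [null_allele_freq([p[locus] for p in members]) for members in groups.values()]
        q_bar = sum(qs) / len(qs)
        # Expected heterozygosity from the mean allele frequency across groups.
        ht += 2 * q_bar * (1 - q_bar)
        # Mean expected heterozygosity within groups.
        hs += sum(2 * q * (1 - q) for q in qs) / len(qs)
    return (ht - hs) / ht if ht > 0 else 0.0


# Hypothetical toy data: 8 individuals scored at 4 AFLP loci.
profiles = [
    [1, 1, 0, 0], [1, 1, 0, 1], [1, 0, 0, 0], [1, 1, 1, 0],  # "species A"
    [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 1, 1], [1, 0, 1, 1],  # "species B"
]
correct = ["A"] * 4 + ["B"] * 4
scrambled = ["X", "Y"] * 4  # same individuals, labels mixed across species

print(round(gst(profiles, correct), 3), round(gst(profiles, scrambled), 3))
```

With these toy profiles, grouping by the "correct" labels puts a much larger fraction of the variation among groups (about 0.63) than the scrambled grouping of the same individuals (about 0.07).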
This approach to species delimitation somewhat resembles the view of Shaffer and Thompson [18], who follow Mayden [21] and de Queiroz [22,23] in considering species as segments of evolutionary lineages. In this view, species delimitation comes down to the identification of metapopulation-like lineages. The metapopulation lineage species definition leads to operational species delimitation approaches that recognize sets of populations that freely exchange genes in nature but have no or very restricted gene exchange with other sets of populations [18]. In this paper we describe how this approach works out for Solanum section Petota.

Plant Material
We used the plant material from the genus Solanum section Petota as described in Jacobs et al. [9], which consists of 4929 genotypes representing 916 accessions. From each accession a representative genotype was chosen [9]. A subset (out of the 916) consisting of 566 plants (one plant per accession) was made, representing the 89 species/subspecies from South America that appeared in the large polytomy of the trees presented by Jacobs et al. [9], plus the accessions that do not belong to the species groups with high jackknife or bootstrap support (viz. excluding the following supported groups: Acaulia group, Mexican diploid group, diploid Piurana group, tetraploid Piurana group, polyploid Conicibaccata group, diploid Conicibaccata group, Circaeifolia group, Longipedicellata group, and Iopetala group). Information on the accession numbers and geographic origin of these 566 samples can be found in Additional file 1. The nomenclature of the plant material follows that of Jacobs et al. [9]. This means that in some cases we have retained the original labels, even when taxonomic references suggested a change of the species name. However, a number of obvious mistakes (due to mislabeling) that became clear after preliminary AFLP analyses were corrected after morphological examination.

AFLP
The protocol of Vos et al. [24] was used to generate AFLP fragments. The plant material was fingerprinted with two EcoRI/MseI AFLP primer combinations: E32/M49 and E35/M48. These primer combinations gave 91 and 131 polymorphic bands, respectively. The AFLP analysis was performed by Keygene N.V. on a MegaBACE 2.1. Bands were scored as dominant markers, using the Keygene proprietary software.

Data analysis
Bayesian clustering
The 566 South American accessions were analyzed with STRUCTURE 2.2 [10,11] in an 'unsupervised' procedure according to Rosenberg [25], based on genetic similarities only. We coded the dominant markers as described by Falush et al. [26]: the dominant AFLP data were entered by coding both alleles as '1' when the AFLP band was present and both as '0' when the band was absent, and '0' was specified as the recessive allele for all AFLP loci. This enables the simultaneous analysis of accessions with different levels of ploidy, as described by Schenk et al. [17]. Evanno et al. [27] showed that results obtained from AFLPs with STRUCTURE can be as accurate as those obtained from microsatellites. Estimates of the log likelihood were obtained using the admixture model and the assumption that allele frequencies are correlated. The log likelihood was estimated in 10 replicate runs at each K from K = 1 to K = 30. For each run, we used a burn-in of 25,000 cycles and a data run of 100,000 cycles.
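A minimal sketch of the dominant-marker coding described above, assuming band scores are available as 0/1 lists per accession. The file layout (a row giving the recessive allele for each locus, followed by two identical rows per individual, one per allele copy) is our assumption about how a STRUCTURE input file could be assembled, and the accession names are hypothetical; the exact layout and the corresponding recessive-allele option should be checked against the STRUCTURE documentation. The run settings (K = 1 to 30, 10 replicates, burn-in and run lengths) belong in STRUCTURE's own parameter files and are not handled here.

```python
def to_structure_lines(profiles, names):
    """Encode dominant AFLP band profiles as pseudo-diploid genotypes.

    Band present -> allele '1' on both copies; band absent -> '0' on both.
    The first line gives the recessive allele ('0') for every locus, and
    each individual gets two identical rows, one per allele copy
    (an assumed layout; check it against the STRUCTURE manual)."""
    n_loci = len(profiles[0])
    lines = [" ".join(["0"] * n_loci)]  # recessive allele at every locus
    for name, profile in zip(names, profiles):
        if len(profile) != n_loci:
            raise ValueError(f"{name}: expected {n_loci} loci, got {len(profile)}")
        row = " ".join("1" if band else "0" for band in profile)
        lines.extend([f"{name} {row}", f"{name} {row}"])
    return lines


# Hypothetical accessions scored at three loci.
lines = to_structure_lines([[1, 0, 1], [0, 0, 1]], ["acc001", "acc002"])
print("\n".join(lines))
```

Because both allele copies carry the same value, a band-present individual is recorded as '1 1'; declaring '0' recessive is what lets the model treat such a phenotype as possibly heterozygous rather than as a confirmed homozygote.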
To test whether STRUCTURE was suitable for analyzing the Solanum AFLP data, a pilot analysis was carried out on the condensed dataset of 916 individuals. Almost all species groups as defined by Jacobs et al. [9], as well as smaller supported branches in the NJ (neighbor-joining) tree, had their own cluster at K = 18 or higher (results not shown), which indicates that STRUCTURE can be used for this AFLP dataset.

Partitioning of genetic variation within and among groups
It is unrealistic to assume that a single STRUCTURE analysis could separate all species. Some of the 566 accessions may be from a genetically homogeneous species that occupies a small area, while others may be from a genetically highly variable species that occupies a large area. Some species were represented by many accessions, others by only a few. Therefore, as the number of clusters (K) in the STRUCTURE analyses increases, accessions of certain species may already be assigned to different clusters before accessions of other species are separated from each other. Moreover, when large datasets are analyzed, convergence problems may occur for the Gibbs sampler algorithm used in the STRUCTURE software [12,28]. We therefore decided to perform a nested analysis.
The second-level (nested) analyses could again be done with STRUCTURE for each group separately, as Jing et al. [29] did in Pisum, for example. The advantage is that no a priori grouping is made, so accessions formerly classified under the same name may end up in different groups. An alternative option was to optimize the grouping of accessions by maximizing the Fst among species or among combinations of species. This has two important advantages: (1) all plants within an accession can be included in this computationally simple analysis, and (2) even if several rounds of grouping are performed, it is still much faster than optimizing and performing a STRUCTURE analysis on each of the 16 clusters. A disadvantage is that accessions with the same name remain together, which means that, in theory, the resulting solution may be less optimal than that obtained with the nested STRUCTURE approach.
As a pilot experiment, we performed a nested STRUCTURE analysis on a few clusters and compared the results with an Fst analysis of the same clusters by calculating the Fst among groups for both the nested STRUCTURE analysis and the optimized Fst approach. The optimized Fst approach always resulted in a higher Fst among the groups within the cluster (not shown). We therefore decided to continue with the Fst analysis. To our knowledge, this combination of STRUCTURE clustering and Fst optimization is a novel approach.
The partitioning of genetic variation (Fst) among STRUCTURE clusters, or among new groups within a cluster, was computed using AFLP-SURV 1.0 [30]. Allele frequencies at AFLP loci were calculated from the observed frequencies of fragments using the Bayesian approach of [31] (assuming diploidy and Hardy-Weinberg equilibrium), based on all 2767 available genotypes for the 566 accessions (five plants per accession, where available). We assumed a uniform prior distribution of allele frequencies. The significance of the Fst values was tested with 1000 permutations, and the confidence limits obtained were used to determine the significance of differences between the separate estimates.
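To illustrate what a Bayesian allele-frequency estimate for dominant markers involves, the sketch below computes the posterior mean of the band-absent allele frequency q at one locus. It assumes Hardy-Weinberg equilibrium (an individual lacks the band with probability q^2) and a uniform prior on q, and integrates the posterior numerically. This is a sketch under those stated assumptions only; AFLP-SURV implements the closed-form estimator of the method cited as [31], whose prior parametrization may differ, so its values need not match exactly.

```python
import math


def posterior_mean_q(n_absent, n_total, steps=100_000):
    """Posterior mean of the band-absent (null) allele frequency q at one locus.

    Assumes Hardy-Weinberg equilibrium, so an individual lacks the band with
    probability q**2, and a uniform prior on q over (0, 1). The posterior is
    integrated with the midpoint rule, in log space to avoid underflow when
    many individuals are scored."""
    if not 0 <= n_absent <= n_total or n_total == 0:
        raise ValueError("need 0 <= n_absent <= n_total and n_total > 0")
    n_present = n_total - n_absent
    qs = [(i + 0.5) / steps for i in range(steps)]
    # Log-likelihood: q^(2 * absent) * (1 - q^2)^present
    log_like = [2 * n_absent * math.log(q) + n_present * math.log1p(-q * q) for q in qs]
    top = max(log_like)
    weights = [math.exp(v - top) for v in log_like]
    return sum(q * w for q, w in zip(qs, weights)) / sum(weights)


# An accession in which 3 of 5 plants lack the band.
print(round(posterior_mean_q(3, 5), 4), round(math.sqrt(3 / 5), 4))
```

For 3 band-absent plants out of 5, the posterior mean under these assumptions is 693/960 (about 0.722), compared with about 0.775 for the naive square-root estimate sqrt(3/5).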

Grouping within clusters by maximizing Fst
Within each of the 16 STRUCTURE clusters, we used AFLP-SURV to calculate Fst among the species present. Subsequently, accessions with different species labels were combined, and the overall Fst and the pairwise Fst values between the groups within a cluster were computed. We performed several rounds of grouping. In each round, the accessions of those species or groups that showed a pairwise Fst lower than the observed overall Fst of the groups within the cluster were combined. This process was repeated, merging species and species groups, until further merging no longer increased the overall Fst value significantly.
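The iterative merging procedure can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' implementation: Nei's Gst computed from square-root allele-frequency estimates stands in for the AFLP-SURV Fst, the significance test on the increase in overall Fst is omitted (a merge is kept whenever the overall value increases), and when several pairs fall below the overall value, groups sharing a member are joined transitively. Group names and data are hypothetical.

```python
import math
from itertools import combinations


def null_allele_freq(profiles, locus):
    """Square-root estimate of the band-absent allele frequency (assumes HWE)."""
    absent = sum(1 for p in profiles if p[locus] == 0)
    return math.sqrt(absent / len(profiles))


def gst(groups):
    """Nei-style Gst among a list of groups, each a list of 0/1 band profiles."""
    ht = hs = 0.0
    for locus in range(len(groups[0][0])):
        qs = [null_allele_freq(g, locus) for g in groups]
        q_bar = sum(qs) / len(qs)
        ht += 2 * q_bar * (1 - q_bar)
        hs += sum(2 * q * (1 - q) for q in qs) / len(qs)
    return (ht - hs) / ht if ht > 0 else 0.0


def merge_round(groups):
    """Join every pair whose pairwise Gst is below the overall Gst (transitively)."""
    labels = list(groups)
    overall = gst([groups[lab] for lab in labels])
    parent = {lab: lab for lab in labels}

    def find(lab):
        while parent[lab] != lab:
            lab = parent[lab]
        return lab

    for a, b in combinations(labels, 2):
        if gst([groups[a], groups[b]]) < overall:
            parent[find(a)] = find(b)

    components = {}
    for lab in labels:
        components.setdefault(find(lab), []).append(lab)
    return {
        "+".join(sorted(members)): [p for m in members for p in groups[m]]
        for members in components.values()
    }


def optimize_grouping(groups):
    """Repeat merge rounds while the overall Gst keeps increasing."""
    current, current_fst = dict(groups), gst(list(groups.values()))
    while len(current) > 1:
        candidate = merge_round(current)
        if len(candidate) == len(current):
            break  # no pair fell below the overall value
        candidate_fst = gst(list(candidate.values()))
        if candidate_fst <= current_fst:
            break  # merging no longer increases the overall value
        current, current_fst = candidate, candidate_fst
    return current, current_fst


# Hypothetical cluster with three species labels scored at 4 loci.
species = {
    "sp1": [[1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0]],
    "sp2": [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0]],
    "sp3": [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 1, 1]],
}
grouping, value = optimize_grouping(species)
print(sorted(grouping), round(value, 3))
```

In this toy cluster, sp1 and sp2 are genetically indistinguishable, so they are merged in the first round, while sp3 stays separate; the overall value rises from about 0.77 to about 0.81, and a second round finds nothing further to merge.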

Clustering of the 566 South-American accessions into 16 clusters
The 566 South-American accessions were analyzed using STRUCTURE, testing various numbers of groups, from K = 1 to K = 30. Figure 1 shows the average posterior probability Ln(P(D)) for 10 runs as a function of K. The posterior probability increases until around K = 16, after which it reaches a plateau. From K = 18 onwards the posterior probability became increasingly variable among runs, and the clustering of accessions became unstable between replicate runs. In contrast, at K = 16 the clustering results were stable and most clusters had the same composition in all 10 replicate runs. We therefore took K = 16 (Ln P(D) = -41181.7) as the optimal K.
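In this study, K = 16 was chosen from the plateau in mean Ln P(D) and from the stability of cluster composition across replicate runs. A small helper for tabulating replicate runs is sketched below; as an optional cross-check that was not used here, it also computes the ΔK statistic of Evanno et al. [27], |L(K+1) - 2L(K) + L(K-1)| / SD(L(K)), from the replicate means. The input values are hypothetical, and consecutive K values with at least two replicates each are assumed.

```python
from statistics import mean, stdev


def summarize_runs(ln_p):
    """Map each K to (mean, sample SD) of Ln P(D) over its replicate runs."""
    return {k: (mean(runs), stdev(runs)) for k, runs in sorted(ln_p.items())}


def evanno_delta_k(ln_p):
    """Delta K = |L(K+1) - 2 L(K) + L(K-1)| / SD(L(K)), using replicate means.

    Assumes consecutive K values, each with at least two replicate runs;
    Delta K is undefined for the smallest and largest K tested."""
    summary = summarize_runs(ln_p)
    ks = sorted(summary)
    delta = {}
    for k_prev, k, k_next in zip(ks, ks[1:], ks[2:]):
        curvature = summary[k_next][0] - 2 * summary[k][0] + summary[k_prev][0]
        sd = summary[k][1]
        if sd > 0:
            delta[k] = abs(curvature) / sd
    return delta


# Hypothetical Ln P(D) values from two replicate runs per K.
ln_p = {1: [-1001, -999], 2: [-801, -799], 3: [-601, -599],
        4: [-591, -589], 5: [-586, -584]}
for k, (m, sd) in summarize_runs(ln_p).items():
    print(k, m, round(sd, 2))
delta = evanno_delta_k(ln_p)
print(max(delta, key=delta.get))
```

A plateau in the means combined with rising SDs, as described above for K ≥ 18, is visible directly in the tabulated summary; ΔK only adds a single-number summary of where the curve bends.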
The estimated population structure of one run at K = 16 is shown in Figure 2. Each individual accession is represented by a thin vertical line, divided into colored segments that represent its relative membership in the K clusters (the underlying data can be found in Additional file 1). The accessions labeled as S. okadae, S. raphanifolium, S. verrucosum, and S. macropilosum occupy exclusively one cluster, while many other accessions share a cluster with accessions from one or more other species, for instance S. huancabambense with S. sogarandinum. Many accessions labeled with the same species name are distributed over two clusters, e.g., the accessions of S. maglia, S. gourlayi, and S. tarijense. Finally, there are a number of species whose accessions show membership in more than two clusters. Additional file 1 provides detailed results on the composition of the clusters and the percentage of membership of each individual accession in these clusters, for the run with the highest probability. Most clusters defined by STRUCTURE for K = 16 are the same in all 10 runs. The main exception is cluster 3, which was found as a separate unit in only 3 out of 10 runs; in the other 7 runs its accessions were combined with those of cluster 4.
In the 16-cluster arrangement, the variation among clusters (Fst) represented 31% of the genetic variation (Table 1). For comparison, the 89 pre-existing taxa explained 29% of the genetic variation, and a subdivision into 10 groups (one run of a suboptimal STRUCTURE analysis at K = 10) already explained 27%. Treating the 566 accessions as separate groups gave the lowest Fst, as only 15% of the genetic variation was present among accessions. All Fst values were significantly different from each other. The level of genetic differentiation among the accessions was lower within the clusters than among the clusters (Table 2). The lowest values were found for clusters 1, 6, and 15, which consist mainly or exclusively of accessions with a single species label; for example, cluster 15, which contains only S. okadae accessions, has an Fst of 0.0029.
Figure 2. Estimated population structure for K = 16. Each accession is represented by a thin line, which is partitioned into K colored segments that represent the membership to K clusters. The labels below indicate the species labels.
Genetic differentiation among species within clusters that contain accessions from two or more species ranged from 9.8% in cluster 4 to 27.8% in cluster 7. In clusters 4, 10, and 12, the species arrangement added only a small part to the genetic differentiation, relative to the value obtained when all accessions were treated separately.

Further subdivision of the 16 clusters
Because the contribution to the partitioning of genetic variation could differ among the species within a cluster, we performed several rounds of grouping on all 2767 individuals available for these accessions. In each round, the accessions of those species that showed a pairwise Fst lower than the observed overall Fst of the groups within the cluster were combined into one group, so that in the next round the number of groups was lower. The process was repeated, merging species and species groups, until further merging did not increase the Fst value. Table 2 lists the Fst value for the optimal number of groups, together with the values obtained with one group more and one group fewer, and reports the group structure of the optimal configuration. In most of the clusters, one or two merging steps were sufficient to reach a maximum Fst, but in clusters 7, 12, and 14, three cycles were needed, and in clusters 10 and 16 the process took four cycles. In some clusters, the highest overall Fst was reached when most of the species labels were merged together; this was the case in clusters 10, 14, and 16. In other clusters, the optimal Fst was reached at an arrangement that merged only a few of the species in the cluster, while other species remained separate; this was the case in clusters 3, 4, and 13. In cluster 8, no new arrangement yielded a higher Fst. Overall, the 566 accessions were grouped into 44 genetically distinct groups.
The assignment of the 566 accessions to 44 genetically distinct groups was then used to infer the support for the 89 species into which these accessions had been classified. The results are presented according to taxonomic classification in Table 3 and are discussed below. For the 18 species that were represented by only one accession in this study, no conclusion could be drawn. For 43 species there was no supporting evidence, for 20 there was weak evidence, and for 8 there was good evidence.

Discussion
Many described species in section Petota are very similar to each other and are able to cross, suggesting that this section is overclassified. We tested this for the large group of South American species of section Petota, using a population genetic approach that would allow us to identify any structure in this material, if present. The analysis of 566 South American Solanum section Petota accessions with STRUCTURE showed an optimal overall subdivision of these accessions into 16 clusters. By maximizing the partitioning of genetic variation among groups (Fst), we obtained support for additional groups within these clusters, up to a total of 44 units (or 48 units including the accessions of unknown species) (Table 2). This does not automatically mean that 44 is the correct number of species: genetic differentiation is expected among separate species, but it can also be found among populations within a species (see below). Nevertheless, the Fst values of the various species arrangements in Table 1 offer a clear indication of overclassification: Fst increases from 0.145 (the 566 accessions) to 0.273 (10 clusters) and to 0.312 (16 clusters). The highest value is obtained after the nested analysis, when 44 groups explain 35% of the genetic variation (the remainder being present within groups). The Fst value of the 89-species arrangement (0.2953) is even lower than that of the 16 clusters (0.312), indicating that the current species arrangement is 'over the top' but still explains a considerable part of the genetic variation within the dataset.

Misclassification and overclassification
If not all accessions of a species are in one cluster, but one or a few are present in different clusters, this may

Status of subspecies
In nearly all cases there was no support for maintaining taxa at the subspecies level. This is the case for the subspecies within S. microdontum, S. vernei, S. boliviense, and S. megistacrolobum. Only one of the recognized subspecies was supported in our analysis: S. commersonii subsp. malmeanum could be differentiated genetically from S. commersonii subsp. commersonii (Table 3). Some of these (sub)species have previously been studied extensively using morphology. The subspecies S. microdontum subsp. gigantophyllum was already considered a synonym of S. microdontum [32] and should not be recognized, as this is a clear case of overclassification. Giannattasio and Spooner studied the boundaries between S. megistacrolobum subsp. megistacrolobum and S. megistacrolobum subsp. toralapanum using morphological data [33] and molecular markers [34]. Based on their analyses, they suggested preserving S. megistacrolobum subsp. toralapanum as a distinct subspecies, whereas our analysis does not support this. Spooner et al. [35] studied the relationships of S. boliviense and S. astleyi using RAPDs and concluded that S. astleyi should be reduced to a subspecies of S. boliviense. Our data do not support a subspecies level in S. boliviense.

Some species are supported
The following species are supported as genetically distinct units: S. raphanifolium, S. verrucosum (with S. macropilosum as a synonym), S. microdontum, S. commersonii, S. okadae (only the seven accessions in cluster 15), S. huancabambense, and S. sogarandinum. The seven S. okadae accessions that appear in cluster 3 together with S. venturii accessions turned out to be mislabeled and have been corrected to S. venturii (personal communication R. Hoekstra, CGN). The accessions labeled S. microdontum, S. huancabambense, and S. sogarandinum share their cluster with accessions from other species, but the optimal partitioning of genetic variation within the cluster shows that they represent distinct genetic units. This is consistent with the results of Jacobs et al. [9], and most of these species were also recognized in one or more other studies [2,6,32,36,37].

Support for combinations of species, pointing at overclassification
Some species are assigned to one STRUCTURE cluster, but their accessions do not form distinct genetic units within that cluster on their own; they do so only when combined with accessions from another species (Table 2). These are probably cases of overclassification. Examples are the combination of S. verrucosum and S. macropilosum in cluster 2, of S. kurtzianum and S. maglia in cluster 3, of S. venturii and S. okadae in cluster 3, and of S. sandemanii, S. weberbaueri, and S. medians in cluster 5. Some of these combinations have already been recognized in the literature; e.g., S. macropilosum is considered a synonym of S. verrucosum [6]. Spooner and Salas [2] recognized S. medians and S. sandemanii, but not S. weberbaueri, which they apparently considered a synonym (unfortunately, no information about this was provided). Spooner et al. [38] synonymized both S. sandemanii and S. weberbaueri under S. medians.

Accessions scattered across clusters, pointing at mislabelling
The analysis showed that accessions from some species were scattered across two or even three clusters. This was the case for the accessions with the following species labels: S. maglia, S. doddsii, S. chacoense, S. gourlayi, S. virgultorum, S. hoopesii, S. augustii, S. tarijense, S. vernei, S. infundibuliforme, S. alandiae, S. neorosii, S. sucrense, S. pachytrichum, and S. violaceimarmoratum. A major cause of this situation is probably the mislabeling of accessions, although some of these species may be the product of hybridization events that occurred long ago. For instance, Solanum doddsii from Bolivia has been hypothesized to be a hybrid between S. alandiae and S. chacoense [39].
Misclassifications do occur, since identification is often problematic owing to ambiguous species characteristics. Problems with the identification of species were already addressed by Spooner and Salas [2] and Spooner and van den Berg [40], who noted that many of the taxa are extremely similar in morphology and that many species are distinguished only by minor characters with often overlapping character states.

Hybrid accessions
Many authors [1,2,4,41,42] have suggested that certain recognized species in Solanum sect. Petota are the result of hybridization. Recent hybridizations can readily be recognized in the STRUCTURE analysis by the probability with which individuals are assigned to a particular cluster. While most accessions have a very high probability (usually around 0.9) of belonging to one cluster, hybrid individuals tend to have a much lower probability (< 0.5) of belonging to their main cluster and an often only slightly lower probability of belonging to another cluster. Schulte et al. [43] also argue that a posterior probability lower than 0.5 provides strong evidence for a recent hybrid origin of individuals.
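The assignment rule and the hybrid criterion discussed here can be sketched as a small function over a STRUCTURE membership (Q) matrix. The 0.5 threshold follows the text; the accession names and membership values are hypothetical, and a low maximum membership is treated only as a flag for a putative recent hybrid, not as proof of hybrid origin.

```python
def assign_clusters(q_matrix, threshold=0.5):
    """Assign each accession to the cluster with its highest membership.

    q_matrix maps accession name -> list of membership proportions (one per
    cluster). Accessions whose highest membership is below `threshold` are
    flagged as putative recent hybrids. Clusters are numbered from 1."""
    results = {}
    for accession, memberships in q_matrix.items():
        # Cluster indices ordered from highest to lowest membership.
        ranked = sorted(range(len(memberships)), key=lambda i: memberships[i], reverse=True)
        best = ranked[0]
        second_q = memberships[ranked[1]] if len(ranked) > 1 else 0.0
        results[accession] = {
            "cluster": best + 1,
            "max_q": memberships[best],
            "second_q": second_q,
            "putative_hybrid": memberships[best] < threshold,
        }
    return results


# Hypothetical memberships for two accessions at K = 3.
q = {"acc_A": [0.92, 0.05, 0.03], "acc_B": [0.15, 0.45, 0.40]}
for name, info in assign_clusters(q).items():
    print(name, info["cluster"], info["max_q"], info["second_q"], info["putative_hybrid"])
```

Keeping the second-highest membership alongside the maximum makes the pattern described above visible: acc_B is assigned to cluster 2, but its membership there (0.45) is only slightly higher than its membership in cluster 3 (0.40).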
For practical presentation of our results, we assigned each accession to the cluster for which it had the highest  [7,8,44]: S. canasense, S. bukasovii, S. candolleanum, S. coelestipetalum, S. pampasense, S. ambosinum, S. marinasense, S. velardei, S. incamayoense, S. leptophyes, S. ugentii, and S. sparsipilum. Ugent [45] already proposed in 1970 that these should be reduced to one species. The division of these species into two clusters (10 and 16) in our analysis reflects the presence of the northern and southern subgroups of the brevicaule complex (see below). Solanum oplocense was shown to be a well-defined species using morphological data [7] and molecular data [8], but it was distinct neither in an AFLP study [46] nor in ours. Previous results from a morphological study [47] and a more recent molecular study [48] had already suggested that the species S. berthaultii and S. tarijense should be combined. The species in cluster 7 were studied morphologically by Ames and collaborators [49], who placed Solanum immite and S. chancayense among the 6 distinctive species in a group of 29 species, the remainder of which were 'difficult to distinguish'.

Clusters correspond to the geographical origin of the accessions
Many accessions within a cluster come from the same geographical region (Additional file 1). For the largest and most complicated clusters (7, 10, 12, 14, 16), information on the geographic origin of the accessions allows some tentative conclusions to be drawn. Cluster 16 contains mostly accessions from Argentina and Bolivia from the southern brevicaule complex, and cluster 10 consists mostly of accessions from Peru (and northern Bolivia) that can be considered as belonging to the northern brevicaule complex. This separation of the brevicaule complex into a northern and a southern part was already noted by Kardolus [50], was confirmed by Spooner and Salas [2], and is accepted in the treatment of this group on the Solanaceae Source website (http://www.solanaceaesource.org), where Spooner and his collaborators maintain two species, S. candolleanum for the northern representatives and S. brevicaule for the southern representatives. Cluster 7 contains almost exclusively Peruvian accessions, and some species labels in cluster 7 (S. albornozii, S. augustii, S. chancayense, S. dolichocremastrum, S. immite) are associated with series Piurana [1,2], but Jacobs et al. [9] could not find support for including these species in one of the recognized Piurana species groups. Ames and collaborators [49,51] studied putative members of series Piurana with morphological data and COSII markers, respectively, and concluded that, based on morphology, only three out of a total of 33 species could be recognized. The molecular data supported more species, some of them lacking morphological support, and the authors announced that decisions on species boundaries will be formalized in a forthcoming taxonomic monograph.
Cluster 14 contains all S. berthaultii accessions and almost all S. tarijense accessions, plus a few accessions with other species labels, which mostly come from Bolivia and Argentina. Cluster 12 contains accessions from various geographical origins, most of them from Bolivia and Argentina but some from Peru and Paraguay. This group may represent accessions that have exchanged genetic material relatively easily. The geographical distribution of accessions within clusters is consistent with the notion that our approach produces a meaningful arrangement of the accessions into groups that may (have) exchange(d) genetic material. For genetic material to be exchanged, the accessions with different species labels must at least have overlapping or adjacent geographical areas, at present or in the recent past.
Indeed, information on the distribution areas of the species of sect. Petota given in Hijmans et al. [52] confirms overlapping areas for many species within the recognized clusters, e.g. the species S. augustii, S. immite and S. dolichocremastrum in cluster 7, and S. berthaultii and S. tarijense in cluster 14.

Conclusion
A large number of species is presently recognized in the group of South American representatives of Solanum section Petota. The approach taken in the present paper was to determine the genetic distinctiveness of these species. The outcome questions the species and subspecies status of more than half of the taxonomic labels used in the South American part of Solanum section Petota. The genetically distinct clusters and groups within clusters resulting from our analysis can be used as a basis for recognizing groups of species and for evaluating species status (Table 3).

Additional material
Additional file 1: Plant material used and cluster assignment. This file contains information on the accession numbers and geographic origin of the 566 samples used in this study. It also indicates the cluster to which each accession has been assigned and lists the membership probabilities of all accessions for all clusters. In this file, putative hybrid accessions may readily be detected through conditional formatting (probabilities above 0.5 are in dark grey cells, lower probabilities, which may indicate recent hybridization, are in white cells, and negligible probabilities are in light grey font).