Molecular phylogenetics has made remarkable progress in recent decades toward the reconstruction of a largely resolved 'tree of life'. In flowering plants, the relationships are now clear from the deepest branches through to the family level, with only a few exceptions [2–4]. However, developing strong phylogenetic hypotheses for more closely related plants at the interface of species level and population level has been, and still is, quite challenging. At this level, gene exchange and recombination are still possible, and low sequence divergence often limits phylogenetic resolution. As a result, multiple independent phylogenetic markers with sufficient sequence divergence are necessary, and analytical approaches from population genetics may in some cases be more appropriate than traditional phylogenetic methods [5, 6].
The most commonly used marker regions in plant phylogenetics are coding and non-coding sequences from the chloroplast genome and ribosomal gene regions located in the nucleus. Recently, whole plastid genomes (plastomes) have been used to address deep branching questions [e.g. [7–11]] in plants, and a rapid increase in plastome-scale data is underway. Mitochondrial genes are not as widely used because of typically very slow sequence evolution and some complexities such as extreme rate variation in some lineages and occurrences of horizontal gene transfer [12, 13], gene conversion and processed paralogy. Nevertheless, these genes play a substantial role in phylogenies of, for instance, parasitic plants [e.g. [13, 15]], where plastid genomes can be highly modified [16, 17]. However, some of the most extensive efforts to date to resolve the "deep branches" in flowering plant phylogenetics have involved large taxon samples with sequence data obtained from up to 17 genes representing all three genomic compartments.
"The Tortoise and the Hare" series [19–21] addressed the utility of nuclear and chloroplast loci for low-level phylogenetics. These authors noticed increased resolution for recent branching events with the nuclear gene Adh, but the chloroplast markers required much less laboratory effort. Subsequently, a number of promising chloroplast regions have been used to address questions at low taxonomic levels. Plastomes provide numerous genes, introns, and intergenic regions; amplification and direct sequencing of PCR products is easy, as a single cell harbors 1000 or more plastids with multiple (essentially) identical plastid genomes. Plastomes provide an almost unmatched source of orthologous sequence that is not complicated by gene family duplication, and multiple genes can be concatenated to provide a very long sample of orthologous sequence. However, plastome markers are tightly linked on a single (typically) non-recombining molecule that usually reflects only the maternal lineage in angiosperms, limiting the generality of plastid DNA as the sole source of evolutionary markers at and below species level. In addition, plastomes are limited in their variability (i.e. substitution rates, parsimony informative sites) and therefore their utility for low taxonomic level studies may be restricted [19, 22].
The internal transcribed spacer (ITS) regions of nrDNA often provide higher variability, but their orthology usually remains an assumption that is not tested prior to their use in phylogenetic analysis. Hundreds to thousands of copies of highly conserved nrDNA gene regions located in the nucleus make those markers easy to amplify, but their evolution by tandem duplication events results in a gene array more accurately described as a gene family or large collection of paralogs [23, 24]. The paralogous nature of nrDNA can complicate the reconstruction of phylogenetic relationships, as it is impossible to determine orthologous copies from such a large copy number. As a consequence, divergent paralogs may be inadvertently sampled from different organisms in a study; when included in phylogenetic reconstruction, artifacts can appear.
Nuclear markers (excluding nrDNA markers) are essential for a broad range of evolutionary investigations, including systematics and character evolution, hybridization, polyploidization, biogeography, origins of domestication, and speciation. Occurring independently all over the nuclear genome in a virtually inexhaustible repertoire in terms of both number and variability, bi-parentally inherited nuclear loci are promising on different levels, especially when compared to organellar markers.
The first attempts to establish nuclear markers other than ribosomal genes yielded a substantial number of low copy nuclear genes (LCNG). LCNG, such as the ADH-genes, pistillata, GPAT, PRK or LEAFY, were applied in numerous studies. These genes are known to occur either in single copy in one or more focal species in the study group, or as members of small gene families and may not occur in any specific plant lineage. Complete sampling of gene families often involves intense experimental efforts to clone and sequence all members of a family. Thus, for practical reasons, single loci are mostly preferred.
Many different approaches have been used to identify useful nuclear single copy loci in plants, and these approaches vary widely in both their underlying concept and their computational effort. In recent years, the advent of next generation sequencing technologies (NGS), bioinformatic progress, and publicly available sequence data have greatly facilitated the identification of such loci. These rich sources of sequence information have been utilized by many research groups, who identified new nuclear markers for plant, animal and fungal phylogenetics to reveal relationships that could not be resolved with organelle or nuclear ribosomal DNA markers [22, 40–47]. All of them focus on genes that occur in the nucleus in single copy in a sample of organisms with sequenced genomes. The nucleus provides a vast repertoire of unlinked markers. However, since there are thousands of genes in the nucleus, marker selection is challenging, and the majority of nuclear genes occur in small to large gene families. The present study is based on the approach published in Duarte et al., where a global classification of plant protein coding sequences (Tribes) was used to identify a collection of 959 genes that are represented by exactly one copy in each of four sequenced angiosperm genomes (Arabidopsis, Populus, Vitis, and Oryza). "Tribes" are collections of related genes in a specified set of genomes produced by the gene clustering program MCL-Tribe [39, 48]. While many PlantTribes approximate gene families, the global classification also identifies genes that are members of highly distinctive clusters that lack closely related paralogs, including clusters with only a single gene in each taxon. The set of genes identified in the analysis was called APVO SSCG (Arabidopsis, Populus, Vitis, Oryza shared single copy genes). The agt1 gene, applied in the present study, is part of that set, and the abbreviation nSCG (nuclear single copy gene) that is used here refers to this approach.
We use the term nSCG operationally, referring to a gene that has been identified as single copy in a global gene classification of a specified set of genomes, here APVO. This does not necessarily mean that the gene will be single copy in any given lineage, particularly in very recent polyploids [22, 49], where the entire gene set has been recently duplicated and the process of duplicate gene loss is underway. However, as larger numbers of genomes are interrogated, genes that continue to be found in single copy in all but the most recent polyploids are more likely to be single copy in a given uncharacterized lineage.
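The operational criterion for a shared single copy gene (exactly one member per genome within a gene cluster) can be illustrated with a minimal sketch. The data layout below, a dictionary mapping a cluster ID to a list of (taxon, gene ID) pairs, the cluster names and the function name `shared_single_copy` are all hypothetical and chosen only for illustration; they do not reflect the actual MCL-Tribe output format or the pipeline used by Duarte et al.

```python
from collections import Counter

# The four reference genomes named in the text (APVO).
TAXA = ("Arabidopsis", "Populus", "Vitis", "Oryza")


def shared_single_copy(clusters, taxa=TAXA):
    """Return IDs of clusters that contain exactly one gene from each taxon.

    `clusters` maps a cluster ID to a list of (taxon, gene_id) pairs.
    This layout is an assumption made for this sketch.
    """
    selected = []
    for cluster_id, members in clusters.items():
        # Count how many genes each taxon contributes to this cluster.
        counts = Counter(taxon for taxon, _gene in members)
        # Counter returns 0 for absent taxa, so missing genomes fail the test.
        if all(counts[taxon] == 1 for taxon in taxa):
            selected.append(cluster_id)
    return selected


# Hypothetical toy clusters, not real data.
toy_clusters = {
    # One gene per genome: qualifies.
    "tribe_A": [("Arabidopsis", "a1"), ("Populus", "p1"),
                ("Vitis", "v1"), ("Oryza", "o1")],
    # Two Arabidopsis paralogs: rejected.
    "tribe_B": [("Arabidopsis", "a2"), ("Arabidopsis", "a3"),
                ("Populus", "p2"), ("Vitis", "v2"), ("Oryza", "o2")],
    # No Oryza member: rejected.
    "tribe_C": [("Arabidopsis", "a4"), ("Populus", "p3"), ("Vitis", "v3")],
}

print(shared_single_copy(toy_clusters))
```

In this toy example only `tribe_A` passes the filter. Adding further genomes to `taxa` makes the criterion stricter, which mirrors the point that a gene remaining single copy across more genomes is more likely to be single copy in an uncharacterized lineage.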
Hughes et al. proposed the exploration of nuclear loci other than nrDNA for phylogenetic reconstruction, which would require the abandonment of 'universal thinking'. The enhanced variability of nuclear loci compared to other markers provides great potential, but limits the likelihood of identifying universal amplification primers that will function across a wide taxonomic range. In fact, it might be difficult to identify universal gene loci because polyploidy is widespread and recurrent in plants. Duplicated genes are often lost by both random and selective processes in a short time, but there remains a chance of encountering multiple gene copies in a specific lineage.
Peperomia (Piperaceae) ranks among the ten largest angiosperm genera, with approximately 1,650 species. The phylogenetics and classification of such species-rich clades have long been very challenging. In addition, morphological characters have been shown to be subject to parallel evolution and extreme reduction, resulting in a paucity of synapomorphies [52, 53]. Moreover, speciation within Peperomia has likely happened comparatively recently, and as a consequence, reconstructed phylogenies often lack resolution at the species and population level. Recent backbone phylogenies were based on over 4000 molecular characters to overcome the lack of variability [52, 53]. Therefore, Peperomia is an ideal candidate for testing the performance of an nSCG region and comparing outcomes with variable chloroplast markers such as those suggested by Shaw et al. [20, 21]. Within this study, we focus on closely related species belonging to Peperomia subgenus Tildenia, where currently 59 species are recognized [55, 56].