Fig. 1 | BMC Evolutionary Biology

Fig. 1

From: Identification and assessment of variable single-copy orthologous (SCO) nuclear loci for low-level phylogenomics: a case study in the genus Rosa (Rosaceae)

Data-mining workflow to identify single-copy orthologous tags (SCOTags) for phylogenomics. Single-copy genes (SCGs) from reference genomes are identified using a self-blast procedure (step 1). The two SCG sets are compared to each other to retrieve shared single-copy orthologs (SCOs) (step 2). SCOs are target-assembled from unassembled whole-genome shotgun sequencing data using the aTRAM pipeline (step 3). Numbers presented in table (1) correspond to the total number of contigs that were assembled for each Rosa species with an unassembled genome. Contig sequences from each SCO are aligned using MAFFT, and the resulting alignment is sliced into regions ≥300 bp covered by ≥4 taxa, including Rosa ‘Old Blush’ and Rosa persica. For each region, primer pairs are designed on the consensus sequence and the most variable non-overlapping SCOTags are retained (step 4). Additional filtering steps discard SCOTags with non-specific primer pairs (step 5a), SCOTags that do not pass the RBB test of orthology (step 5b), and SCOTags whose number of alleles is inconsistent with the genome ploidy level (step 5c), and then search for the remaining SCOTags in whole-genome shotgun assemblies of three additional Rosa species (step 5d) and of seven outgroups. Numbers in table (2) correspond to the number of SCOTags that were retrieved for each of the four Rosa species with already assembled datasets. The procedure is described in detail in the Methods section. RBB: Reciprocal Best Blast; mcl: Markov CLuster algorithm
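
The alignment-slicing rule in step 4 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `covered_regions`, the input format (a dict mapping taxon names to equal-length aligned sequences), and the definition of "covered" (a taxon covers a column if it has a character other than a gap, `?` or `N` there) are all assumptions made for illustration. Under those assumptions, the function returns maximal runs of columns in which at least `min_taxa` taxa, including every required taxon, are present, keeping only runs of at least `min_len` columns.

```python
from typing import Dict, Iterable, List, Tuple

# Characters treated as "no data" in an alignment column (an assumption).
GAP_CHARS = set("-?Nn")


def covered_regions(
    alignment: Dict[str, str],
    required: Iterable[str],
    min_taxa: int = 4,
    min_len: int = 300,
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges of well-covered regions.

    A column qualifies when at least `min_taxa` taxa have data in it and
    every taxon in `required` is among them. Only maximal runs of
    qualifying columns that span at least `min_len` columns are returned.
    """
    required = list(required)
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    missing = [t for t in required if t not in alignment]
    if missing:
        raise ValueError(f"required taxa absent from alignment: {missing}")

    def column_ok(i: int) -> bool:
        present = {t for t, seq in alignment.items() if seq[i] not in GAP_CHARS}
        return len(present) >= min_taxa and all(t in present for t in required)

    regions: List[Tuple[int, int]] = []
    start = None  # first column of the current qualifying run, if any
    for i in range(n_cols):
        if column_ok(i):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                regions.append((start, i))
            start = None
    # Close a run that extends to the last column.
    if start is not None and n_cols - start >= min_len:
        regions.append((start, n_cols))
    return regions
```

With the thresholds from the caption, a call would look like `covered_regions(aln, required=["Old Blush", "R. persica"], min_taxa=4, min_len=300)`, where `aln` and the taxon labels are placeholders for whatever names the alignment file actually uses.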
