- Research article
- Open Access
Evaluating the feasibility of using candidate DNA barcodes in discriminating species of the large Asteraceae family
BMC Evolutionary Biology volume 10, Article number: 324 (2010)
Five DNA regions, namely rbcL, matK, ITS, ITS2, and psbA-trnH, have been recommended as primary DNA barcodes for plants. However, few studies have evaluated these regions for species identification in large plant taxa that include many closely related species.
We tested the feasibility of using the five proposed DNA regions to discriminate plant species within Asteraceae, the largest family of flowering plants. Among these markers, ITS2 was the most useful in terms of universality, sequence variation, and identification capability in the Asteraceae family. The species-discriminating power of ITS2 was also explored in a large pool of 3,490 Asteraceae sequences representing 2,315 species from 494 genera. ITS2 correctly identified 76.4% and 97.4% of plant samples at the species and genus levels, respectively. ITS2 also showed a variable ability to discriminate related species in different genera.
Of the five candidates, ITS2 is the best DNA barcode for the Asteraceae family. This approach significantly broadens the application of DNA barcoding to resolve classification problems in the family Asteraceae at the genus and species levels.
Asteraceae is the largest family of flowering plants in the world. The family includes over 1,600 genera and 23,000 individual species. Many members of the Asteraceae family are important for medicinal, ornamental, and economic purposes.
Approximately 300 Asteraceae species are already used for medicinal purposes in China. For example, Artemisia annua and its derivatives are effective in treating malaria. Saussurea involucrata, an endangered species, has anti-fatigue, anti-inflammatory, anti-tumor, and free-radical-scavenging properties. Echinacea has immunomodulatory properties: it can reduce inflammation, speed up wound healing, and boost the immune response to bacterial or viral infection. Commercially important food crops in the family include Lactuca sativa (lettuce), Cichorium intybus (chicory), Cynara scolymus (globe artichoke), Smallanthus sonchifolius (yacon), and Helianthus tuberosus (Jerusalem artichoke). Besides being eaten, the seeds of Helianthus annuus (sunflower) and of Carthamus tinctorius (safflower), another Asteraceae member, are used to produce cooking oil. Other commercially important members of Asteraceae belong to the genera Tanacetum, Chrysanthemum, and Pulicaria, which have insecticidal properties. Conversely, Eupatorium adenophorum is one of the most noxious invasive plants worldwide and has a significant effect on local ecosystems.
The great diversity of plants in the family Asteraceae often makes identification at the species level difficult. Given the many valuable members of Asteraceae described above, a simple and accurate method for authenticating Asteraceae species is indispensable for ensuring the safety of internationally traded herbal drugs and foods.
DNA barcoding is a process that uses a short DNA sequence from a standard locus as a species identification tool. DNA barcode regions have already been adopted for animals [6, 7], and several regions have been recommended for plants [8–17]. The Plant Working Group of the Consortium for the Barcode of Life (CBOL) proposed rbcL and matK as the core DNA barcodes for plants. A previous study by Kress et al. tested seven promising barcodes; however, only 15 Asteraceae sequences, representing 14 species from only 9 genera, were analyzed. The CBOL Plant Working Group also evaluated the performance of the leading barcoding loci in species identification, but its Asteraceae data comprised just 75 samples from 38 species in 19 genera. Chen et al. likewise compared the practicality of the suggested barcode sequences across a large number of medicinal plants, but their study included no more than 450 Asteraceae sequences from 306 species in 50 genera. These studies therefore did not provide sufficient evidence that the recommended DNA barcode regions are suitable for species identification in Asteraceae, a family with a large number of closely related species. We address this issue by comparing the feasibility of each of the five proposed DNA barcodes (rbcL, matK, ITS, ITS2, and psbA-trnH) in the Asteraceae family.
Results and Discussion
Assessment of the universality of the five candidate barcodes
A universal DNA barcode must be tractable across a wide range of species; barcode regions should therefore be relatively short to facilitate DNA extraction, amplification, and sequencing. As shown in Figure 1, three regions (ITS2, psbA-trnH, and rbcL) were amplified in the selected samples using a single pair of universal primers per locus, resulting in high amplification success and a sequencing efficiency of 85%. In comparison, ITS had a relatively lower efficiency of 75%. For matK, we used two primer pairs that differed in their universality across Asteraceae: Kim3F/1R and 390F/1326R achieved amplification and sequencing efficiencies of 91% and 25%, respectively.
Measurement of inter- versus intra-specific genetic divergence at each locus
Six metrics were employed to characterize inter- versus intra-specific variation (Figure 2) [11, 12, 19–21]. A favorable barcode should show high inter-specific divergence so that different species can be distinguished. ITS2 and ITS both exhibited significantly higher inter-specific divergence than psbA-trnH and matK, whereas rbcL showed the lowest divergence between species on all inter-specific metrics. Wilcoxon signed-rank tests confirmed that ITS2 had the highest divergence at the inter-specific level and rbcL the lowest (Table 1). Intra-specific differences followed the same pattern, with ITS2 showing the largest and rbcL the smallest variation (Figure 2).
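As an illustration only, the six summaries can be computed from a pairwise distance matrix with the Python sketch below. It assumes each sample carries a (genus, species) label and that distances (for example K2P) have already been calculated; the function name `divergence_metrics` is hypothetical. Following common usage in the barcoding literature, theta averages the per-species mean intra-specific distance, coalescent depth averages the per-species maximum, and theta prime averages the per-genus mean distance between congeneric species. Restricting inter-specific comparisons to congeneric pairs and taking the minimum inter-specific distance per genus are assumptions of this sketch, not necessarily the exact definitions used in this study.

```python
from itertools import combinations
from statistics import mean


def divergence_metrics(labels, dist):
    """Summarize intra- and inter-specific divergence for one locus.

    labels: list of (genus, species) tuples, one per sample.
    dist:   symmetric list-of-lists of pairwise distances (e.g. K2P).
    Assumes at least one species with >= 2 samples and at least one
    genus with >= 2 species; otherwise statistics.mean raises an error.
    """
    intra = {}  # (genus, species) -> distances between conspecific samples
    inter = {}  # genus -> distances between samples of different congeneric species
    for i, j in combinations(range(len(labels)), 2):
        d = dist[i][j]
        if labels[i] == labels[j]:
            intra.setdefault(labels[i], []).append(d)
        elif labels[i][0] == labels[j][0]:
            inter.setdefault(labels[i][0], []).append(d)

    all_intra = [d for ds in intra.values() for d in ds]
    all_inter = [d for ds in inter.values() for d in ds]
    return {
        # Intra-specific variation
        "average_intra": mean(all_intra),
        "theta": mean(mean(ds) for ds in intra.values()),
        "coalescent_depth": mean(max(ds) for ds in intra.values()),
        # Inter-specific divergence (congeneric pairs only, an assumption)
        "average_inter": mean(all_inter),
        "theta_prime": mean(mean(ds) for ds in inter.values()),
        "min_inter": mean(min(ds) for ds in inter.values()),
    }
```

Averaging per species (theta) or per genus (theta prime) keeps heavily sampled taxa from dominating the summary, which is why these metrics are reported alongside the plain averages.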
Testing the efficacy of authentication
BLAST1 and Distance methods were used to test the ability of the candidate barcode sequences to assign correct species identities to the samples [12, 22]. Both methods revealed a clear pattern (Figure 3): the ITS region showed the highest identification efficiency. ITS2 and ITS performed well at the genus level with both methods and at the species level with the Distance method. With the BLAST1 method, ITS2 was 2.5% less efficient than ITS at the species level, and rbcL performed worst. Combining two markers did not improve identification rates over single markers, except for the combination of matK and psbA-trnH, which improved the correct identification rate by 1.4%.
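The Distance method can be sketched as a leave-one-out nearest-neighbour search over a pairwise distance matrix. In the hypothetical Python helper below, each sample is treated in turn as a query, and it counts as correctly identified at the species (or genus) level only if every sample at the minimum distance belongs to the same species (or genus). Treating ties with other taxa as failures is an assumption of this sketch; BLAST1 requires a BLAST search and is not reproduced here.

```python
def nearest_distance_rates(labels, dist, tol=1e-12):
    """Leave-one-out nearest-distance identification for one locus.

    labels: list of (genus, species) tuples, one per sample (n >= 2).
    dist:   symmetric list-of-lists of pairwise distances.
    A query is correct at a rank only if every sample at the minimum
    distance shares the query's label at that rank; ties with other
    taxa count as failures (an assumption of this sketch).
    Returns (species_rate, genus_rate).
    """
    n = len(labels)
    species_ok = genus_ok = 0
    for q in range(n):
        others = [j for j in range(n) if j != q]
        best = min(dist[q][j] for j in others)
        # All reference samples tied at the smallest distance to the query
        nearest = [labels[j] for j in others if dist[q][j] - best <= tol]
        if all(lab == labels[q] for lab in nearest):
            species_ok += 1
        if all(lab[0] == labels[q][0] for lab in nearest):
            genus_ok += 1
    return species_ok / n, genus_ok / n
```

With this scoring, a species represented by a single sample can never be identified correctly at the species level, because its nearest neighbour necessarily belongs to another species.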
A meta-analysis of ITS, psbA-trnH, matK, and rbcL was also performed in parallel with the analysis of ITS2 using GenBank data (see Additional file 1: Identification efficiency of the five regions evaluated in a large pool of Asteraceae samples from GenBank). Correct identification rates were significantly higher for ITS2 than for the other markers, except ITS. The GenBank results were consistent with those from our experimental data. Compared with single markers alone, combinations of markers improved the rate of correct species identification only modestly (by less than 5%).
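How marker combinations are scored determines these gains. A minimal Python sketch of the traffic light rule described in the methods follows, assuming each sample's single-marker outcomes (correctly identified or not) are already known; the function name and the example data in the tests are hypothetical, and a marker missing for a sample is treated as a failure.

```python
def combination_rate(outcomes, markers):
    """Traffic-light scoring of a marker combination.

    outcomes: dict sample_id -> dict marker -> bool, where True means the
              sample was correctly identified with that single marker.
    markers:  the markers in the combination.
    A sample counts as identified by the combination if at least one of
    the markers identified it; it fails only if none of them did.
    """
    identified = sum(
        1
        for per_marker in outcomes.values()
        if any(per_marker.get(m, False) for m in markers)
    )
    return identified / len(outcomes)
```

Because a combination succeeds whenever any of its markers succeeds, its rate can never fall below that of its best single marker; any gain reflects samples that the best marker alone failed to identify.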
Overall, our study demonstrates that, among the five markers examined, ITS2 performs best in terms of universality, genetic divergence, and species discrimination. ITS also proved to be a valuable marker for authenticating species in Asteraceae; however, its relatively low amplification efficiency limits its potential for broad taxonomic use. Although effective amplification primers exist for matK, rbcL, and psbA-trnH, these three markers are less powerful than ITS and ITS2 in discriminating species in the family Asteraceae. Moreover, in theory, regions based on nuclear DNA are much more informative than barcodes based on organellar DNA.
To further evaluate the ability of ITS2 to authenticate a wide range of Asteraceae species, we tested it against a larger database of 3,490 sequences derived from 2,315 species (Table 2). ITS2 performed well, with successful identification rates at the species level of 76.4% (BLAST1 method) and 69.7% (Distance method), and at the genus level of 97.4% (BLAST1 method) and 96.2% (Distance method).
ITS2 is suitable, but not ideal
Our results follow a trend similar to that reported by Chen et al. and show that ITS2 is a promising barcode for authenticating plant species. According to the criteria outlined by Kress et al., the ITS2 region has several advantages that make it well suited to DNA barcoding. It has been proposed as a candidate marker for taxonomic classification and barcoding of medicinal plants because it has both a high correct identification rate and high amplification efficiency [12, 24–27]. Because ITS2 is one of the most common regions used in phylogenetic analyses [28–30], a vast amount of ITS2 sequence data has already been deposited in GenBank and is available for immediate use.
The presence of multiple copies of ITS2 within a genome poses a challenge. However, Coleman proposed that these copies display a high degree of similarity. Coleman also suggested that PCR-amplified copies can represent the ITS2 region of an individual and that ITS2 can be treated as a single locus in most cases.
Among the six large genera (more than 50 species each) of Asteraceae listed in Table 2, the utility of ITS2 for species authentication varied among genera, so each genus was analyzed individually rather than as a group. For the genus Brachyscome, with 57 sequences representing 55 species, ITS2 worked well, with a 96.5% successful identification rate. Satisfactory results were also obtained for the genus Erigeron, for which >80.5% of the sequences were correctly identified. In contrast, ITS2 had lower identification efficiency for the genera Centaurea and Artemisia (48.0% and 59.3%, respectively). For the remaining two genera, Senecio and Stevia, ITS2 was relatively powerful, correctly identifying 73.4% and 76.9% of the samples, respectively. The identification efficiencies of ITS2 in dataset 2 are listed in Additional file 2 (Authentication efficiency of ITS2 using different methods for the genera in dataset 2 containing more than one species).
Improving identification accuracy within a particular genus may require combinations of DNA barcodes. Because ITS showed the highest identification efficiency in our tests, we propose ITS as a complementary barcode for differentiating species within the Asteraceae family.
Application and meaning of DNA barcoding
ITS2, the DNA barcode selected here for Asteraceae, is not perfect, particularly from the standpoint of taxonomists and phylogeneticists. However, even an imperfect barcode can have a major effect on many areas of research and be sufficient for many applications. For instance, ITS2 may be a suitable DNA barcode for general users such as customs officials, forensic examiners, food processors, and research organizations. Because ITS2 assigns plant samples to their correct genus with high accuracy and to their correct species with relatively high accuracy, it is of great practical value to people without formal taxonomic training. Compared with ITS2, ITS or the chloroplast genome is better equipped to deal with the biological complexities of species distinctions, a major focus of taxonomists and phylogeneticists.
Altogether, our results support the claim that ITS2 is a valuable locus for differentiating species within Asteraceae and that DNA barcoding is a useful tool for classifying and identifying individual species. We propose applying DNA barcoding technology to resolve classification problems in the family Asteraceae at the genus and species levels.
Sampling of plant materials
Dataset 1, which consists of 110 samples from 63 species representing 48 genera of Asteraceae (see Additional file 3: Samples in dataset 1 for testing the potential barcodes and accession numbers in GenBank), was collected across a large geographical area in China from July 2007 to January 2008. Great effort was made to ensure that the samples represent the major lineages of Asteraceae. In addition, we collected as many samples of closely related species as possible (Table 3). The plant samples in dataset 1 span two subfamilies (Carduoideae and Cichorioideae) and a total of 11 tribes of the family Asteraceae.
DNA extraction, PCR amplification, and sequencing
DNA extraction, PCR amplification, and sequencing were performed as described previously.
Data acquisition from GenBank
First, all Asteraceae sequences for the five markers were downloaded from GenBank, and the target regions of the five barcoding markers were extracted based on GenBank annotations. Sequences shorter than 100 bp, sequences with more than 15 ambiguous bases ('N's), and sequences from unnamed species (i.e., with 'sp.' in the species name) were removed. Finally, to exclude fungal contaminants, the downloaded ITS2 sequences were searched with a hidden Markov model (HMM) built from well-curated fungal sequences, and sequences likely to be of fungal origin were removed. The meta-analysis was performed using the remaining sequences (see Additional file 4: Accession numbers of the five loci sequences from GenBank for the meta-analysis). The ITS2 sequences were also used to construct dataset 2, which comprises 3,490 sequences from 2,315 Asteraceae species downloaded from GenBank (see Additional file 5: Accession numbers of ITS2 sequences used in dataset 2). Dataset 2 also includes many closely related species (Table 3).
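The length, ambiguity, and naming filters are straightforward to express in code. The hypothetical Python helper below applies them to a single record, assuming the species name and the extracted barcode region are already available as strings. It counts only 'N' as an ambiguous base, as the text describes, and it does not reproduce the HMM-based screen for fungal contaminants.

```python
def passes_filters(species_name, sequence, min_length=100, max_ns=15):
    """Return True if a downloaded record survives the quality filters:
    at least `min_length` bp, at most `max_ns` ambiguous 'N' bases, and
    a species name that is not an unnamed 'sp.' entry."""
    seq = sequence.upper()
    if len(seq) < min_length:          # too short
        return False
    if seq.count("N") > max_ns:        # too many ambiguous bases
        return False
    if "sp." in species_name.split():  # e.g. "Artemisia sp. XY-1"
        return False
    return True
```

Matching 'sp.' as a whole word avoids discarding infraspecific names such as "subsp.", which contain the same characters.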
Sequence alignment and analysis
Contigs were assembled and consensus sequences generated using CodonCode Aligner V 3.5 (CodonCode Co., USA). The sequences of the candidate DNA barcodes were aligned using Clustal W, and genetic distances were calculated using the Kimura 2-parameter (K2P) model. The average intra-specific distance, theta, and coalescent depth were calculated to evaluate intra-specific variation [12, 19]. The average inter-specific distance, the minimum inter-specific distance, and theta prime were used to represent inter-specific divergence [11, 12, 20, 21]. Wilcoxon signed-rank tests were performed as previously described [10–12]. Two methods of species identification, BLAST1 and the nearest distance method, were performed as described previously [12, 22]. Marker combinations were evaluated with the traffic light approach: a sequence was counted as identified by a combination if at least one marker in the combination identified it, and as unidentified only if none of the markers in the combination did.
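For readers who want to reproduce the distance calculations, the K2P distance between two aligned sequences is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of compared sites that differ by a transition or a transversion, respectively. The Python sketch below is a minimal illustration, assuming aligned sequences and pairwise deletion of sites with gaps or ambiguous bases; the function name is hypothetical, and this is not necessarily how the software used in this study handles gaps.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences.

    Sites where either sequence has a gap or a non-ACGT character are
    skipped (pairwise deletion, an assumption of this sketch).
    """
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # Transition: purine<->purine or pyrimidine<->pyrimidine
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```

Separating transitions from transversions matters because transitions accumulate faster in most DNA regions; a plain proportion of differing sites would underestimate divergence between more distantly related sequences.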
Mueller MS, Karhagomba IB, Hirt HM, Wemakor E: The potential of Artemisia annua L. as a locally produced remedy for malaria in the tropics: agricultural, chemical and clinical aspects. J Ethnopharmacol. 2000, 73: 487-493. 10.1016/S0378-8741(00)00289-0.
Wu W, Qu Y, Gao HY, Yang JY, Xu JG, Wu LJ: Novel ceramides from aerial parts of Saussurea involucrata Kar. et. Kir. Arch Pharm Res. 2009, 32: 1221-1225. 10.1007/s12272-009-1906-6.
Haddad PS, Azar GA, Groom S, Boivin M: Natural health products modulation of immune function and prevention of chronic diseases. Complement Alternat Med. 2005, 2: 513-520.
Bayer RJ, Starr JR: Tribal interrelationships and phylogeny of the Asteraceae. Ann Mo Bot Gard. 1998, 85: 242-256. 10.2307/2992008.
Hebert PD, Cywinska A, Ball SL, de Waard JR: Biological identifications through DNA barcodes. Proc Biol Sci. 2003, 270: 313-321. 10.1098/rspb.2002.2218.
Evans KM, Wortley AH, Mann DG: An assessment of potential diatom "barcode" genes (cox1, rbcL, 18S and ITS rDNA) and their effectiveness in determining relationships in Sellaphora (Bacillariophyta). Protist. 2007, 158: 349-364. 10.1016/j.protis.2007.04.001.
Hogg ID, Hebert PD: Biological identification of springtails (Hexapoda: Collembola) from the Canadian Arctic, using mitochondrial DNA barcodes. Can J Zool. 2004, 82: 749-754. 10.1139/z04-041.
Song JY, Yao H, Li Y, Li XW, Lin YL, Liu C, Han JP, Xie CX, Chen SL: Authentication of the family Polygonaceae in Chinese pharmacopoeia by DNA barcoding technique. J Ethnopharmacol. 2009, 124: 434-439. 10.1016/j.jep.2009.05.042.
Kress WJ, Wurdack KJ, Zimmer EA, Weigt LA, Janzen DH: Use of DNA barcodes to identify flowering plants. Proc Natl Acad Sci USA. 2005, 102: 8369-8374. 10.1073/pnas.0503123102.
Kress WJ, Erickson DL: A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region. PLoS One. 2007, 2: e508. 10.1371/journal.pone.0000508.
Lahaye R, van der Bank M, Bogarin D, Warner J, Pupulin F, Gigot G: DNA barcoding the floras of biodiversity hotspots. Proc Natl Acad Sci USA. 2008, 105: 2923-2928. 10.1073/pnas.0709936105.
Chen SL, Yao H, Han JP, Liu C, Song JY, Shi LC, Zhu YJ, Ma XY, Gao T, Pang XH, Luo K, Li Y, Li XW, Jia XC, Lin YL, Leon C: Validation of the ITS2 Region as a novel DNA barcode for identifying medicinal plant species. PLoS One. 2010, 5: e8613. 10.1371/journal.pone.0008613.
Chase MW, Cowan RS, Hollingsworth PM, van den Berg C, Madrinan S, Petersen G, Seberg O, Jørgensen T, Cameron KM, Carine M, Pedersen N, Hedderson TAJ, Conrad F, Salazar GA, Richardson JE, Hollingsworth ML, Barraclough TG, Kelly L, Wilkinson M: A proposal for a standardized protocol to barcode all land plants. Taxon. 2007, 56: 295-299.
Gonzalez MA, Baraloto C, Engel J, Mori SA, Pétronelli P, Riéra B, Roger A, Thébaud C, Chave J: Identification of amazonian trees with DNA barcodes. PLoS One. 2009, 4: e7483. 10.1371/journal.pone.0007483.
Yao H, Song JY, Ma XY, Liu C, Li Y, Xu HX, Han JP, Duan LS, Chen SL: Identification of Dendrobium species by a candidate DNA barcode sequence: the chloroplast psbA-trnH intergenic region. Planta Med. 2009, 75: 667-669. 10.1055/s-0029-1185385.
Fazekas AJ, Burgess KS, Kesanakurti PR, Graham SW, Newmaster SG, Husband BC, Percy DM, Hajibabaei M, Barrett SCH: Multiple multilocus DNA barcodes from the plastid genome discriminate plant species equally well. PLoS One. 2008, 3: e2802. 10.1371/journal.pone.0002802.
Kress WJ, Erickson DL, Jones FA, Swenson NG, Perez R, Sanjur O, Bermingham E: Plant DNA barcodes and a community phylogeny of a tropical forest dynamics plot in Panama. Proc Natl Acad Sci USA. 2009, 106: 18621-18626. 10.1073/pnas.0909820106.
CBOL Plant Working Group: A DNA barcode for land plants. Proc Natl Acad Sci USA. 2009, 106: 12794-12797. 10.1073/pnas.0905845106.
Meier R, Zhang GY, Ali F: The use of mean instead of smallest inter-specific distances exaggerates the size of the "barcoding gap" and leads to misidentification. Syst Biol. 2008, 57: 809-813. 10.1080/10635150802406343.
Meyer CP, Paulay G: DNA barcoding: error rates based on comprehensive sampling. PLoS Biol. 2005, 3: 2229-2238.
Pang XH, Song JY, Zhu YJ, Xu HX, Huang LF, Chen SL: Applying plant DNA barcodes for Rosaceae species identification. Cladistics. 2010, 26:
Ross HA, Murugan S, Li WLS: Testing the reliability of genetic methods of species identification via simulation. Syst Biol. 2008, 57: 216-230. 10.1080/10635150802032990.
Thomas C: Plant barcode soon to become reality. Science. 2009, 325: 526. 10.1126/science.325_526.
Coleman AW: ITS2 is a double-edged tool for eukaryote evolutionary comparisons. Trends Genet. 2003, 19: 370-375. 10.1016/S0168-9525(03)00118-5.
Coleman AW: Pan-eukaryote ITS2 homologies revealed by RNA secondary structure. Nucleic Acids Res. 2007, 35: 3322-3329. 10.1093/nar/gkm233.
Chiou SJ, Yen JH, Fang CL, Chen HL, Lin TY: Authentication of medicinal herbs using PCR-amplified ITS2 with specific primers. Planta Med. 2007, 73: 1421-1426. 10.1055/s-2007-990227.
Coleman AW: Is there a molecular key to the level of "biological species" in eukaryotes? A DNA guide. Mol Phylogenet Evol. 2009, 50: 197-203.
Linder CR, Goertzen LR, Heuvel BV, Francisco-Ortega J, Jansen RK: The complete external transcribed spacer of 18S-26S rDNA: amplification and phylogenetic utility at low taxonomic levels in Asteraceae and closely allied families. Mol Phylogenet Evol. 2000, 14: 285-303. 10.1006/mpev.1999.0706.
Englund M, Pornpongrungrueng P, Gustafsson MHG, Anderberg AA: Phylogenetic relationships and generic delimitation in Inuleae (Asteraceae) based on ITS and cpDNA sequence data. Cladistics. 2009, 25: 319-352. 10.1111/j.1096-0031.2009.00256.x.
Pelser PB, Nordenstam B, Kadereit JW, Watson LE: An ITS phylogeny of tribe Senecioneae (Asteraceae) and a new delimitation of Senecio L. Taxon. 2007, 56: 1062-1077.
Keller A, Schleicher T, Schultz J, Müller T, Dandekar T, Wolf M: 5.8S-28S rRNA interaction and HMM-based ITS2 annotation. Gene. 2009, 430: 50-57. 10.1016/j.gene.2008.10.012.
Chase MW, Salamin N, Wilkinson M, Dunwell JM, Kesanakurthi RP, Haidar N, Savolainen V: Land plants and DNA barcodes: short-term and long-term goals. Phil Trans R Soc B. 2005, 360: 1889-1895. 10.1098/rstb.2005.1720.
We thank YL Lin for the morphological confirmation of plant species, and KL Chen, XC Jia, and K Luo for providing plant samples. We are also grateful to LC Shi for help with data analysis. This work was supported by the Special Funding for Healthy Field (No. 200802043) and the National Natural Science Foundation (No. 30970307).
TG, JYS, SLC, and CL conceived the project and designed the experiments. TG and HY performed the experiments. TG and YJZ analyzed the data. TG and HY wrote the paper. All authors read and approved the final manuscript.