Evaluating the feasibility of using candidate DNA barcodes in discriminating species of the large Asteraceae family
© Gao et al; licensee BioMed Central Ltd. 2010
Received: 18 March 2010
Accepted: 26 October 2010
Published: 26 October 2010
Five DNA regions, namely rbcL, matK, ITS, ITS2, and psbA-trnH, have been recommended as primary DNA barcodes for plants. However, few studies have evaluated these regions for species identification in large plant taxa that include many closely related species.
The feasibility of using the five proposed DNA regions was tested for discriminating plant species within Asteraceae, the largest family of flowering plants. Among these markers, ITS2 was the most useful in terms of universality, sequence variation, and identification capability in the Asteraceae family. The species discriminating power of ITS2 was also explored in a large pool of 3,490 Asteraceae sequences that represent 2,315 species belonging to 494 different genera. The result shows that ITS2 correctly identified 76.4% and 97.4% of plant samples at the species and genus levels, respectively. In addition, ITS2 displayed a variable ability to discriminate related species within different genera.
ITS2 is the best DNA barcode for the Asteraceae family. This approach significantly broadens the application of DNA barcoding to resolve classification problems in the family Asteraceae at the genus and species levels.
Asteraceae is the largest family of flowering plants in the world. The family includes over 1,600 genera and 23,000 individual species. Many members of the Asteraceae family are important for medicinal, ornamental, and economic purposes.
Approximately 300 Asteraceae species are already used for medicinal purposes in China. For example, Artemisia annua and its derivatives are effective in treating malaria. Saussurea involucrata, an endangered species, possesses anti-fatigue, anti-inflammatory, anti-tumor, and free-radical-scavenging properties. Echinacea has immunomodulatory properties: it can reduce inflammation, speed up wound healing, and boost the immune response to bacterial or viral infection. Commercially important plants of the Asteraceae family include the food crops Lactuca sativa (lettuce), Cichorium intybus (chicory), Cynara scolymus (globe artichoke), Smallanthus sonchifolius (yacon), and Helianthus tuberosus (Jerusalem artichoke). Aside from direct consumption, the seeds of Helianthus annuus (sunflower) and of Carthamus tinctorius (safflower), another Asteraceae member, can be used for the production of cooking oil. Other commercially important members of the family belong to the genera Tanacetum, Chrysanthemum, and Pulicaria, which have insecticidal properties. Eupatorium adenophorum, in contrast, is one of the more noxious invasive plants worldwide and has a significant effect on local ecosystems.
The wide variety of plants in the family Asteraceae often makes identification at the species level difficult. Given the many valuable members of Asteraceae described above, an easy and accurate method of authenticating an Asteraceae species is indispensable for ensuring the drug and food safety of internationally traded herbs.
DNA barcoding is a process that uses a short DNA sequence from a standard locus as a species identification tool. DNA barcode regions have already been adopted for animals [6, 7], and several regions have previously been recommended for plants [8–17]. The Plant Working Group of the Consortium for the Barcode of Life (CBOL) proposed rbcL and matK as the core DNA barcodes for plants. A previous study by Kress et al. tested 7 promising barcodes. However, only 15 sequences of Asteraceae, representing 14 species distributed among only 9 genera, were analyzed in that study. The CBOL Plant Working Group also evaluated the performance of the leading barcoding loci in species identification, but the Asteraceae sequences used comprised just 75 samples, representing 38 species belonging to 19 genera. Chen et al. likewise compared the practicality of the suggested barcode sequences across a large number of medicinal plants. However, that study included no more than 450 sequences of Asteraceae, derived from 306 species in 50 genera. These studies therefore did not provide sufficient evidence that the recommended DNA barcode regions are suitable for species identification in the family Asteraceae, which includes a large number of closely related species. We address this issue by comparing the feasibility of using each of the five proposed DNA barcodes (rbcL, matK, ITS, ITS2, and psbA-trnH) in the Asteraceae family.
Results and Discussion
Assessment of the universality of the five candidate barcodes
Measurement of inter- versus intra-specific genetic divergence at each locus
Table 1. Wilcoxon signed-rank test of the inter-specific divergences among the five loci
Each row gives the inter-specific relative ranks (W+, W-), n, and P value, followed by the result where it is given.
W+ = 92, W- = 13, n = 14, P < 0.0132
W+ = 89, W- = 2, n = 13, P < 0.0024; psbA-trnH > rbcL
W+ = 65, W- = 13, n = 12, P < 0.0414
W+ = 21, W- = 0, n = 6, P < 0.0277; psbA-trnH > matK
W+ = 36, W- = 0, n = 8, P < 0.0117
W+ = 355, W- = 23, n = 27, P < 6.6389 × 10^-5; ITS2 > ITS
W+ = 45, W- = 0, n = 9, P < 0.0076
W+ = 45, W- = 0, n = 9, P < 0.0076
W+ = 3, W- = 7, n = 4, P < 0.4615; rbcL = matK
W+ = 45, W- = 0, n = 9, P < 0.0075
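The rank sums reported in the table can be sanity-checked: for n non-zero paired differences, W+ and W- must add up to n(n + 1)/2. The following Python sketch computes both sums from paired values and checks the reported rows against that identity. The names `signed_rank_sums` and `is_consistent` and the toy input are illustrative assumptions, not the authors' code or data; only the tuples in `REPORTED` are copied from the table.

```python
def signed_rank_sums(x, y):
    """Return (W_plus, W_minus, n) for paired samples x and y."""
    # Zero differences are dropped, as in the standard Wilcoxon procedure.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    # Rank the absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, n


def is_consistent(w_plus, w_minus, n):
    """W+ and W- must add up to the total of the ranks 1..n."""
    return w_plus + w_minus == n * (n + 1) / 2


# (W+, W-, n) exactly as reported in the table.
REPORTED = [(92, 13, 14), (89, 2, 13), (65, 13, 12), (21, 0, 6), (36, 0, 8),
            (355, 23, 27), (45, 0, 9), (45, 0, 9), (3, 7, 4), (45, 0, 9)]

if __name__ == "__main__":
    # Hypothetical paired values, used only to exercise the function.
    print(signed_rank_sums([50, 100, 20, 80], [10, 40, 30, 25]))
    print(all(is_consistent(*row) for row in REPORTED))
```

All ten reported rows satisfy the identity, which suggests the W+, W-, and n values survived extraction intact. The sketch does not compute P values.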
Testing the efficacy of authentication
A meta-analysis of the markers ITS, psbA-trnH, matK, and rbcL was also performed in parallel with the analysis of ITS2 using GenBank data (see Additional file 1: Identification efficiency of the five regions evaluated in a large pool of Asteraceae samples from GenBank). The correct identification rates were significantly higher for ITS2 than for the other markers except ITS. The GenBank data analyses were consistent with our experimental results. Compared with single markers, combinations of markers improved the rate of correct species identification only slightly (by <5%).
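The way marker combinations are scored under the traffic light approach described in the methods can be sketched in Python: a sample counts as identified by a combination if at least one marker in it identifies the sample. The function names and the per-sample outcomes below are hypothetical and serve only to show the rule; they are not the study's data.

```python
def combined_outcomes(per_marker):
    """Combine per-marker results with a logical OR per sample.

    per_marker maps a marker name to {sample_id: True/False}, where True
    means the marker identified that sample correctly.
    """
    samples = set()
    for results in per_marker.values():
        samples.update(results)
    # A sample missing from a marker's results is treated as not identified.
    return {
        sample: any(results.get(sample, False) for results in per_marker.values())
        for sample in samples
    }


def success_rate(outcomes):
    """Percentage of samples identified successfully."""
    if not outcomes:
        raise ValueError("no samples to score")
    return 100.0 * sum(outcomes.values()) / len(outcomes)


if __name__ == "__main__":
    # Hypothetical outcomes for four samples and two markers.
    results = {
        "ITS2": {"s1": True, "s2": False, "s3": True, "s4": False},
        "psbA-trnH": {"s1": True, "s2": True, "s3": False, "s4": False},
    }
    print(success_rate(results["ITS2"]))
    print(success_rate(combined_outcomes(results)))
```

Because the rule is a logical OR, a combination can never score lower than its best single marker; the gain depends on how many samples the markers fail on in common.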
Overall, our study demonstrates that, among the five markers examined, ITS2 is the most successful region in terms of universality, genetic divergence, and discrimination between species. ITS also proved to be a valuable marker for authenticating species in Asteraceae. However, its low amplification efficiency limits its potential for broad taxonomic use. Although effective amplification primers exist for matK, rbcL, and psbA-trnH, these three markers are less powerful than ITS and ITS2 in species discrimination in the family Asteraceae. Moreover, in theory, regions based on nuclear DNA are much more informative than barcodes based on organellar DNA.
Table 2. Identification efficiency of the ITS2 locus for the family and six large genera in dataset 2 using different methods
Column: successful identification (%) at the species level / at the genus level
ITS2 is suitable, but not ideal
Our results show a trend similar to that reported by Chen et al. and demonstrate that ITS2 is a promising barcode for authenticating plant species. In accordance with the criteria outlined by Kress et al., the ITS2 region has several advantages that make it a promising candidate for DNA barcoding. It has been proposed as a candidate marker for taxonomic classification and barcoding of medicinal plants because it has both a high correct-identification rate and high amplification efficiency [12, 24–27]. As the ITS2 region is one of the most common regions used for phylogenetic analyses [28–30], a vast amount of sequencing data has already been deposited in GenBank and is ready for immediate use.
The presence of multiple copies of ITS2 sequences is challenging. However, Coleman proposed that the repeats display a high degree of similarity. Coleman also suggested that the PCR-amplified copies could represent the information of the ITS2 region in individuals and that ITS2 could be considered a single locus in most cases.
Among the six large genera (number of species > 50) in the Asteraceae family (Table 2), the utility of ITS2 for species authentication varied and could only be analyzed individually, not as a group. For the genus Brachyscome, with 57 sequences representing 55 species, ITS2 worked well with a 96.5% successful identification rate. Satisfactory results were also obtained for the genus Erigeron, where >80.5% of the sequences were correctly identified. In contrast, ITS2 had lower identification efficiency for the genera Centaurea and Artemisia (48.0% and 59.3%, respectively). In the two other genera (Senecio and Stevia), ITS2 was relatively powerful for taxonomic classification, correctly authenticating 73.4% and 76.9% of the samples, respectively. The identification efficiencies of ITS2 in dataset 2 are listed in Additional file 2 (Authentication efficiency of ITS2 using different methods for the genera in dataset 2 containing more than one species).
To improve identification accuracy within a particular genus, using combinations of DNA barcodes may be necessary. Therefore, ITS is proposed for use as a complementary barcode for differentiating species within the Asteraceae family.
Application and meaning of DNA barcoding
The selected DNA barcode for Asteraceae, ITS2, is not perfect, especially for taxonomists and phylogenetic experts. However, even an imperfect barcode can have a major effect on many areas of research and be sufficient for many applications. For instance, ITS2 might be a suitable DNA barcode for public users, such as customs officials, forensic examiners, food-processing personnel, and research organizations. Because ITS2 has a strong ability to group plant samples into their correct genus and a relatively high accuracy for grouping samples into their correct species, it is of great practical value to individuals without adequate taxonomic training. Compared with ITS2, ITS or the chloroplast genome is better equipped to deal with the biological complexities of species distinctions, a major focus of taxonomists and phylogenetic experts.
Altogether, our results support the claim that ITS2 is a valuable locus for differentiating species within Asteraceae and that DNA barcoding is a useful tool for classification and identification of individual species. We propose applying DNA barcoding technology to resolve classification problems in the family Asteraceae at the genus and species levels.
Sampling of plant materials
Table 3. Number of DNA sequences used in the study
Categories: total no. of sequences; no. of sequences belonging to genera containing more than one species; no. of sequences belonging to species containing more than one sample
DNA extraction, PCR amplification, and sequencing
DNA extraction, PCR amplification, and sequencing were performed as described previously.
Data Acquisition from GenBank
First, all Asteraceae sequences for the five markers were downloaded from GenBank. The regions corresponding to the five barcoding markers were then extracted on the basis of the GenBank annotations. Sequences shorter than 100 bp, sequences with ambiguous bases (more than 15 'N's), and sequences belonging to unnamed species (i.e., sequences with 'sp.' in the species name) were filtered out. Finally, to avoid fungal contamination among the ITS2 sequences, a Hidden Markov Model (HMM) based on well-curated fungal sequences was used to search the downloaded ITS2 sequences and remove those possibly contaminated with fungi. The meta-analysis was performed using the remaining sequences (see Additional file 4: Accession numbers of the five loci sequences from GenBank for the meta-analysis). The ITS2 sequences were also used to construct dataset 2, which comprises 3,490 sequences from 2,315 Asteraceae species downloaded from GenBank (see Additional file 5: Accession numbers of ITS2 sequences used in dataset 2). Many closely related species were also included in dataset 2 (Table 3).
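The three simple filters described above (minimum length, number of ambiguous bases, unnamed species) can be sketched as a single Python predicate. The function name `keep_record` and its parameters are hypothetical, and the HMM-based fungal screening step is not reproduced here; this is an illustration of the stated thresholds, not the authors' pipeline.

```python
def keep_record(species_name, sequence, min_len=100, max_n=15):
    """Return True if a downloaded sequence passes the basic filters.

    Thresholds follow the text: drop sequences shorter than 100 bp,
    sequences with more than 15 'N's, and records whose species name
    contains 'sp.' (unnamed species).
    """
    seq = sequence.upper()
    if len(seq) < min_len:
        return False
    if seq.count("N") > max_n:
        return False
    # Match 'sp.' as a whole word so that genuine epithets are not affected.
    if "sp." in species_name.split():
        return False
    return True


if __name__ == "__main__":
    # Hypothetical records used only to exercise the filter.
    records = [
        ("Artemisia annua", "ACGT" * 30),
        ("Artemisia sp.", "ACGT" * 30),
        ("Senecio vulgaris", "ACGT" * 20),
    ]
    kept = [name for name, seq in records if keep_record(name, seq)]
    print(kept)
```

Exposing the thresholds as parameters makes it easy to see how sensitive the retained dataset is to each cutoff.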
Sequence alignment and analysis
Consensus sequences and contig generation were accomplished using CodonCode Aligner V 3.5 (CodonCode Co., USA). The sequences of the candidate DNA barcodes were aligned using Clustal W, and the genetic distances were calculated using the Kimura 2-parameter (K2P) model. The average intra-specific distance, theta, and coalescent depth were calculated to evaluate intra-specific variation [12, 19]. The average inter-specific distance, the minimum inter-specific distance, and theta prime were used to represent inter-specific divergence [11, 12, 20, 21]. Wilcoxon signed-rank tests were used as previously described [10–12]. Two methods of species identification, namely BLAST1 and the nearest distance method, were performed as described previously [12, 22]. The traffic light approach was used to evaluate combinations of markers: as long as a sequence could be identified by at least one of the markers in a combination, the combination was considered capable of identifying that sequence. If a sequence could not be identified by any of the markers in the combination, the combination was considered incapable of identifying that sequence.
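For readers unfamiliar with the K2P model, the distance between two aligned sequences can be sketched in Python as d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions. The function name `k2p_distance` is hypothetical, and skipping gapped or ambiguous sites (pairwise deletion) is an assumption made here; the text does not state how such sites were handled or which software computed the distances.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Assumption: sites with gaps or ambiguous bases are ignored.
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument not positive).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    # Toy alignment with one transition in ten sites.
    print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

Distances computed this way for every pair of sequences are the raw material for the intra- and inter-specific summaries and for the nearest distance method mentioned above.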
We thank YL Lin for the morphological confirmation of plant species, and KL Chen, XC Jia, and K Luo for providing plant samples. We are also grateful to LC Shi for helping to analyze the data. This work was supported by the Special Founding for Healthy Field (No. 200802043) and the National Natural Science Foundation (No. 30970307).
1. Mueller MS, Karhagomba IB, Hirt HM, Wemakor E: The potential of Artemisia annua L. as a locally produced remedy for malaria in the tropics: agricultural, chemical and clinical aspects. J Ethnopharmacol. 2000, 73: 487-493. 10.1016/S0378-8741(00)00289-0.
2. Wu W, Qu Y, Gao HY, Yang JY, Xu JG, Wu LJ: Novel ceramides from aerial parts of Saussurea involucrata Kar. et. Kir. Arch Pharm Res. 2009, 32: 1221-1225. 10.1007/s12272-009-1906-6.
3. Haddad PS, Azar GA, Groom S, Boivin M: Natural health products modulation of immune function and prevention of chronic diseases. Complement Alternat Med. 2005, 2: 513-520.
4. Bayer RJ, Starr JR: Tribal interrelationships and phylogeny of the Asteraceae. Ann Mo Bot Gard. 1998, 85: 242-256. 10.2307/2992008.
5. Hebert PD, Cywinska A, Ball SL, de Waard JR: Biological identifications through DNA barcodes. Proc Biol Sci. 2003, 270: 313-321. 10.1098/rspb.2002.2218.
6. Evans KM, Wortley AH, Mann DG: An assessment of potential diatom "barcode" genes (cox1, rbcL, 18S and ITS rDNA) and their effectiveness in determining relationships in Sellaphora (Bacillariophyta). Protist. 2007, 158: 349-364. 10.1016/j.protis.2007.04.001.
7. Hogg ID, Hebert PD: Biological identification of springtails (Hexapoda: Collembola) from the Canadian Arctic, using mitochondrial DNA barcodes. Can J Zool. 2004, 82: 749-754. 10.1139/z04-041.
8. Song JY, Yao H, Li Y, Li XW, Lin YL, Liu C, Han JP, Xie CX, Chen SL: Authentication of the family Polygonaceae in Chinese pharmacopoeia by DNA barcoding technique. J Ethnopharmacol. 2009, 124: 434-439. 10.1016/j.jep.2009.05.042.
9. Kress WJ, Wurdack KJ, Zimmer EA, Weigt LA, Janzen DH: Use of DNA barcodes to identify flowering plants. Proc Natl Acad Sci USA. 2005, 102: 8369-8374. 10.1073/pnas.0503123102.
10. Kress WJ, Erickson DL: A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region. PLoS One. 2007, 2: e508. 10.1371/journal.pone.0000508.
11. Lahaye R, van der BM, Bogarin D, Warner J, Pupulin F, Gigot G: DNA barcoding the floras of biodiversity hotspots. Proc Natl Acad Sci USA. 2008, 105: 2923-2928. 10.1073/pnas.0709936105.
12. Chen SL, Yao H, Han JP, Liu C, Song JY, Shi LC, Zhu YJ, Ma XY, Gao T, Pang XH, Luo K, Li Y, Li XW, Jia XC, Lin YL, Leon C: Validation of the ITS2 Region as a novel DNA barcode for identifying medicinal plant species. PLoS One. 2010, 5: e8613. 10.1371/journal.pone.0008613.
13. Chase MW, Cowan RS, Hollingsworth PM, van den Berg C, Madrinan S, Petersen G, Seberg O, Jorgsensen T, Cameron KM, Carine M, Pedersen N, Hedderson TAJ, Conrad F, Salazar GA, Richardson JE, Hollingsworth ML, Barraclough TG, Kelly L, Wilkinson M: A proposal for a standardized protocol to barcode all land plants. Taxon. 2007, 56: 295-299.
14. Gonzalez MA, Baraloto C, Engel J, Mori SA, Pétronelli P, Riéra B, Roger A, Thébaud C, Chave J: Identification of amazonian trees with DNA barcodes. PLoS One. 2009, 4: e7483. 10.1371/journal.pone.0007483.
15. Yao H, Song JY, Ma XY, Liu C, Li Y, Xu HX, Han JP, Duan LS, Chen SL: Identification of Dendrobium species by a candidate DNA barcode sequence: the chloroplast psbA-trnH intergenic region. Planta Med. 2009, 75: 667-669. 10.1055/s-0029-1185385.
16. Fazekas AJ, Burgess KS, Kesanakurti PR, Graham SW, Newmaster SG, Husband BC, Percy DM, Hajibabaei M, Barrett SCH: Multiple multilocus DNA barcodes from the plastid genome discriminate plant species equally well. PLoS One. 2008, 3: e2802. 10.1371/journal.pone.0002802.
17. Kress WJ, Erickson DL, Jones FA, Swensond NG, Perez R, Sanjur O, Bermingham E: Plant DNA barcodes and a community phylogeny of a tropical forest dynamics plot in Panama. Proc Natl Acad Sci USA. 2009, 106: 18621-18626. 10.1073/pnas.0909820106.
18. CBOL Plant Working Group: A DNA barcode for land plants. Proc Natl Acad Sci USA. 2009, 106: 12794-12797. 10.1073/pnas.0905845106.
19. Meier R, Zhang GY, Ali F: The use of mean instead of smallest inter-specific distances exaggerates the size of the "barcoding gap" and leads to misidentification. Syst Biol. 2008, 57: 809-813. 10.1080/10635150802406343.
20. Meyer CP, Paulay G: DNA barcoding: error rates based on comprehensive sampling. PLoS Biol. 2005, 3: 2229-2238.
21. Pang XH, Song JY, Zhu YJ, Xu HX, Huang LF, Chen SL: Applying plant DNA barcodes for Rosaceae species identification. Cladistics. 2010, 26:
22. Ross HA, Murugan S, Li WLS: Testing the reliability of genetic methods of species identification via simulation. Syst Biol. 2008, 57: 216-230. 10.1080/10635150802032990.
23. Thomas C: Plant barcode soon to become reality. Science. 2009, 325: 526. 10.1126/science.325_526.
24. Coleman AW: ITS2 is a double-edged tool for eukaryote evolutionary comparisons. Trends Genet. 2003, 19: 370-375. 10.1016/S0168-9525(03)00118-5.
25. Coleman AW: Pan-eukaryote ITS2 homologies revealed by RNA secondary structure. Nucleic Acids Res. 2007, 35: 3322-3329. 10.1093/nar/gkm233.
26. Chiou SJ, Yen JH, Fang CL, Chen HL, Lin TY: Authentication of medicinal herbs using PCR-amplified ITS2 with specific primers. Planta Med. 2007, 73: 1421-1426. 10.1055/s-2007-990227.
27. Coleman AW: Is there a molecular key to the level of "biological species" in eukaryotes? A DNA guide. Mol Biol Evol. 2009, 50: 197-203.
28. Linder CR, Goertzen LR, Heuvel BV, Francisco-Ortegac J, Jansen RK: The complete external transcribed spacer of 18S-26S rDNA: amplification and phylogenetic utility at low taxonomic levels in Asteraceae and closely allied families. Mol Phylogenet Evol. 2000, 14: 285-303. 10.1006/mpev.1999.0706.
29. Englund M, Pornpongrungrueng P, Gustafsson MHG, Anderberg AA: Phylogenetic relationships and generic delimitation in Inuleae (Asteraceae) based on ITS and cpDNA sequence data. Cladistics. 2009, 25: 319-352. 10.1111/j.1096-0031.2009.00256.x.
30. Pelser PB, Nordenstam B, Kadereit JW, Watson LE: An ITS phylogeny of tribe Senecioneae (Asteraceae) and a new delimitation of Senecio L. Taxon. 2007, 5694: 1062-1077.
31. Keller A, Schleicher T, Schultz J, Müller T, Dandekar T, Wolf M: 5.8S-28S rRNA interaction and HMM-based ITS2 annotation. Gene. 2009, 430: 50-57. 10.1016/j.gene.2008.10.012.
32. Chase MW, Salamin N, Wilkinson M, Dunwell JM, Kesanakurthi RP, Haidar N, Savolainen V: Land plants and DNA barcodes: short-term and long-term goals. Phil Trans R Soc B. 2005, 360: 1889-1895. 10.1098/rstb.2005.1720.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.