Grapes are an important crop plant grown for wine, juice, raisins, and as fresh fruit. In 2004, the world's grape harvest area in 89 grape-producing countries was 7.5 million hectares, and in the United States grapes were grown on 380,000 hectares . The total production of grapes in the US in 2004 was 5,418,160 metric tons, which generated $2.5 billion . There is considerable interest in using chloroplast genetic engineering as an environmentally friendly approach for engineering disease resistance to powdery and downy mildew, two fungal diseases that have a negative impact on the grape industry. Chloroplast genetic engineering offers a number of unique advantages, including a high level of transgene expression , multi-gene engineering in a single transformation event [23–26], transgene containment via maternal inheritance [27–29] or cytoplasmic male sterility , and lack of gene silencing, position effect, pleiotropic effects, and undesirable foreign DNA [20, 31–35]. Thus far, transgenes have been stably integrated and expressed via the chloroplast genome to confer several useful agronomic traits, including insect resistance [36, 37, 23], herbicide resistance [27, 38], disease resistance , drought tolerance , salt tolerance , and phytoremediation . The complete grape chloroplast genome sequence reported in this paper provides valuable characterization of spacer regions for potential integration of transgenes at optimal sites via homologous recombination, as well as endogenous regulatory sequences for optimal expression of transgenes.
Genome organization and evolution
The organization of the Vitis genome, with two copies of an IR separating the SSC and LSC regions, is identical to that of most sequenced angiosperm chloroplast genomes [reviewed in ]. The size of the genome at 160,928 bp is also within the known size range for angiosperms, which generally vary from 150,519 (Lotus ) to 162,686 bp (Amborella ) among photosynthetic genomes from dicots that have both copies of the IR. The size of the Vitis IR at 26,358 bp is also well within the size range of other sequenced dicot genomes, which range from 23,302 (Calycanthus ) to 27,807 bp (Oenothera ). Gene content and order of the Vitis chloroplast genome are virtually identical to those of tobacco and many other unrearranged angiosperm chloroplast genomes. Several previously sequenced rosid chloroplast genomes have lost the rpl22 gene, including legumes [45–49]. The distribution of this loss on the chloroplast phylogeny (arrows in Fig. 4A) indicates that there have been at least two independent losses of rpl22 in rosids. Multiple, independent gene losses in angiosperms have been demonstrated for other genes including infA , rps16  and accD [51, 52]. Thus, it is evident that gene losses are not always reliable indicators of phylogenetic relationships.
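The size comparisons above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the figures quoted in this paragraph; the variable and function names are illustrative and not part of the original analysis.

```python
# Sizes in bp, taken from the paragraph above.
GENOME_BP = 160_928
IR_BP = 26_358

# Ranges quoted above for dicot genomes that retain both IR copies.
GENOME_RANGE = (150_519, 162_686)  # Lotus .. Amborella
IR_RANGE = (23_302, 27_807)        # Calycanthus .. Oenothera


def within(value, bounds):
    """Return True if value lies inside the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= value <= high


# Combined length of the two single-copy regions (LSC + SSC).
single_copy_bp = GENOME_BP - 2 * IR_BP

# Fraction of the genome occupied by the two IR copies.
ir_fraction = 2 * IR_BP / GENOME_BP

print(within(GENOME_BP, GENOME_RANGE), within(IR_BP, IR_RANGE))
print(single_copy_bp, round(ir_fraction, 3))
```

Under these figures, the two IR copies together account for roughly one third of the genome, and the remainder is the combined length of the LSC and SSC regions.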
It is increasingly evident that chloroplast genomes contain repeated sequences other than the IR . Several studies have identified a higher incidence of dispersed repeats in genomes that have experienced extensive rearrangements [53, 54]. However, dispersed repeats are also being detected in unrearranged genomes. In most cases, these repeats are more common in intergenic spacers and introns, which is also true for the Vitis genome. Repeats have been located in a number of other rosids  in the same regions as those identified in the Vitis genome. One of these, a 32 bp repeat in the trnS gene, is in the same location in Gossypium hirsutum , indicating that this repeat may be shared among rosids. Although repeats have been implicated in chloroplast genome rearrangements , their effect, if any, in unrearranged chloroplast genomes is unknown.
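To make the notion of a dispersed repeat concrete, the sketch below indexes every k-mer in a sequence and reports those that occur more than once. It is an illustration only: it finds exact forward repeats of a fixed length, it is not the repeat-finding procedure used in this study, and the toy sequence and the function name `find_direct_repeats` are hypothetical.

```python
from collections import defaultdict


def find_direct_repeats(seq, k):
    """Map each k-mer that occurs more than once to its 0-based start positions.

    Only exact, forward (direct) repeats of length k are reported;
    inverted or approximate repeats are not detected.
    """
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


# Hypothetical toy sequence containing one 8 bp unit twice.
toy = "CC" + "GATTACAG" + "TTTT" + "GATTACAG" + "A"
repeats = find_direct_repeats(toy, 8)
print(repeats)
```

For a real genome, the reported positions would then be compared with the annotation to see whether the repeats fall in intergenic spacers, introns, or genes.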
Based on previous studies of Atropa  and tobacco , posttranscriptional RNA editing events, as well as deamination-facilitating attacks on nucleotides' exocyclic amino groups, yield primarily C-to-U alterations. Analyses of the Vitis chloroplast genome and the corresponding ESTs indicate that the five C-to-U changes likely represent mRNA edits. However, the remaining six differences could reflect either sequencing errors in the genomic DNA or EST sequences, or the use of different cultivars, plants, or tissues for sequencing. Our methods eliminate the latter explanation since we only compared DNA and EST sequences from leaves of the Chardonnay variety of Vitis vinifera. In view of the high depth of coverage (8X) of our genomic DNA sequences, we believe that the non-C-to-U changes represent EST sequencing errors.
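The distinction drawn above, between C-to-U differences and all other DNA/EST mismatches, can be sketched as a simple position-by-position comparison. This is a minimal illustration that assumes the two sequences are already aligned, of equal length, and written in the sense orientation of the transcript; the sequences and the function name `classify_differences` are hypothetical and do not come from the Vitis data.

```python
def classify_differences(genomic, transcript):
    """Split mismatches between aligned sequences into two groups.

    Returns (candidates, others): 1-based positions where the genomic base
    is C and the transcript base is T (an edited U is read as T in cDNA),
    and (position, genomic_base, transcript_base) tuples for all other
    mismatches. Gaps and ambiguous N bases are skipped.
    """
    if len(genomic) != len(transcript):
        raise ValueError("sequences must be aligned to the same length")
    candidates, others = [], []
    pairs = zip(genomic.upper(), transcript.upper())
    for pos, (g, t) in enumerate(pairs, start=1):
        if g == t or "-" in (g, t) or "N" in (g, t):
            continue
        if g == "C" and t == "T":
            candidates.append(pos)
        else:
            others.append((pos, g, t))
    return candidates, others


# Hypothetical aligned fragments.
genomic = "ATGCCATCAGGT"
est = "ATGCTATTAGGA"
candidates, others = classify_differences(genomic, est)
print(candidates, others)
```

Positions in the first list are candidate editing sites; those in the second list are the kind of difference that, as argued above, is more plausibly explained by sequencing error.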
Evolutionary loss of RNA editing sites has been observed in earlier studies and could be attributed to a decrease in the effect of RNA-editing enzymes . Additionally, conversions other than C-to-U in Vitis and other plants suggest that chloroplast genomes may be accumulating a considerable number of nucleotide substitutions, and some genes might accumulate more changes than others, such as the petL and ndh genes that have a high frequency of RNA editing . Therefore, despite high levels of sequence conservation in chloroplast genomes, variations do occur posttranscriptionally, promoting translational efficiency due to transcript-protein complex binding and/or changes in the chloroplast microenvironment (e.g., redox potential or light intensity [61, 62]).
Phylogenetic analyses of 28 (Fig. 3) or 29 (Fig. 4) angiosperms based on 61 protein-coding genes identified many of the major lineages recognized in previous phylogenetic hypotheses of flowering plants [reviewed in ]. Two groups, Amborella and Nymphaeales (represented by Nuphar and Nymphaea), are basal, with Amborella forming the first diverging lineage in MP analyses and Amborella/Nymphaeales together forming the most basal clade in ML trees. These results are congruent with recent 61-gene analyses by Leebens-Mack et al.  and support their contention that limited taxon sampling in earlier whole chloroplast genome phylogenies led some previous workers to suggest that Amborella may not be among the earliest diverging angiosperm lineages [2, 3]. Monophyly of the monocots is strongly supported, and they are sister to the remaining angiosperms. Calycanthus, the sole representative of the magnoliids, is weakly supported as sister to eudicots in the MP analyses (Figs. 3A and 4A) but the genus is weakly supported as sister to a clade that includes both monocots and eudicots in ML trees (Figs. 3B and 4B). Monophyly of eudicots is strongly supported (100% bootstrap values), in agreement with phylogenies based on both pollen [63, 64] and other molecular data [13, 14, 18, 19, 65–67]. Within eudicots, Ranunculales diverge first and are sister to a strongly supported eudicot clade that includes two moderately to well-supported groups comprising the rosids and asterids. The early divergence of Ranunculales among eudicots is in agreement with many recent molecular phylogenies [see chapter 5 in ]. Although previous studies have clearly indicated that Caryophyllales belong in the core eudicot clade , resolution of the relationships of Caryophyllales to other major clades of eudicots remains uncertain. This order has been considered to be closely allied to rosids, asterids, or simply as an unresolved major eudicot clade sister to the Dilleniaceae . Although taxon sampling is limited in our 61-gene phylogeny, there is moderate to strong support for a sister relationship between the Caryophyllales and asterids (Figs. 3 and 4).
The rosid clade is very diverse, including nearly 140 families representing approximately 39% of the species of angiosperms. The most recent phylogenies of this group [summarized in chapter 8 in ] indicate that there are seven major clades whose relationships still remain unresolved. Eight (Fig. 3) or nine (Fig. 4, includes Gossypium) representatives of four of these major clades are included in our phylogenetic analyses, including members of eurosids I, eurosids II, Myrtales, and Vitaceae. Phylogenetic analyses of both datasets using MP and ML clearly indicate that the Vitaceae is sister to the remaining rosids, and therefore represents an early diverging member of the rosid clade. Previous molecular phylogenetic comparisons that included Vitaceae could not resolve its relationship. Phylogenetic analyses of rbcL sequences alone placed the Vitaceae as sister to either the Caryophyllales or asterid clade with weak support . Phylogenies based on atpB provided only weak support for a sister relationship of Vitaceae to Saxifragales . Several phylogenies based on two to four genes suggested that the Vitaceae are sister to the rest of the rosids, with relatively weak support (50–75%; [14–16]). However, phylogenies based on the chloroplast gene matK did not place Vitaceae sister to rosids but instead positioned the family as sister to Dilleniaceae with weak support . In short, the phylogenetic position of Vitaceae has been equivocal, though our results strongly support the earlier findings that Vitaceae represent an early diverging clade within rosids (Figs. 3 and 4).
The two datasets we examined differed by only one taxon but the results of MP analyses differed dramatically regarding the placement of three of the four rosid clades examined (compare Figs. 3A and 4A). The 28-taxon dataset (excluding Gossypium) showed relationships that are incongruent with recent molecular phylogenies of rosids  by placing the eurosids II (represented by only Brassicales) sister to the Fabales in eurosids I. This made eurosids I paraphyletic because the other representative of this clade is Cucumis (Cucurbitales), which is sister to the Brassicales in molecular phylogenies of rosids . The addition of Gossypium in the 29-taxon dataset generates MP trees (Fig. 4A) that are congruent with previous angiosperm phylogenies. The Brassicales and Malvales are sister and there is strong support for the monophyly of eurosid II. The addition of Gossypium also makes the eurosid I clade strongly monophyletic in the MP tree by placing the Cucurbitales sister to the Fabales, both of which are members of the nitrogen-fixing clade [see chapter 8 in ]. In contrast to the MP trees, relationships among the major rosid clades do not differ in the ML trees when Gossypium is added. In both the 28- and 29-taxon data sets the ML trees do not support the monophyly of eurosids I since Cucurbitales (eurosid I) are sister to the Myrtales and Brassicales (eurosid II) are sister to the Fabales (eurosid I). Therefore, the ML analyses are incongruent with currently accepted relationships among rosids , though the strongest support for the monophyly of the eurosid I clade is only 77% (jackknife support) in a three-gene analysis . Thus, our results suggest that additional phylogenetic studies are needed to assess the monophyly of eurosids I and their relationship to other rosids.
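The monophyly statements above can be made explicit with a small check on rooted trees written as nested tuples. The two topologies below are simplified illustrations reduced to order-level labels: the sister pairs follow the text, but the placement of Myrtales in the MP-like tree and the way the two pairs join in the ML-like tree are assumptions made only to complete the example. The function names are likewise hypothetical.

```python
def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def is_monophyletic(tree, group):
    """True if some clade of the rooted tree contains exactly the taxa in group."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Illustrative topologies only (see the assumptions stated above).
mp_like = ("Vitaceae",
           ("Myrtales",
            (("Cucurbitales", "Fabales"), ("Brassicales", "Malvales"))))
ml_like = ("Vitaceae",
           (("Cucurbitales", "Myrtales"), ("Brassicales", "Fabales")))

eurosids_1 = {"Cucurbitales", "Fabales"}
print(is_monophyletic(mp_like, eurosids_1))  # eurosids I form a clade
print(is_monophyletic(ml_like, eurosids_1))  # eurosids I do not form a clade
```

In the MP-like tree the two eurosid I representatives form a clade, whereas in the ML-like tree they are separated, which is the incongruence described above.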
There has been considerable debate regarding the utility of whole genome sequences for phylogeny reconstruction [5, 7–10]. Some have argued that the use of more genes from whole genomes has great potential for providing much more data for resolving phylogenetic relationships [2, 68], whereas others have suggested that problems with limited taxon sampling available for whole genomes [5, 7, 8, 10] and model misspecification [4, 11] overshadow any potential advantages. One example that highlighted each of these concerns centered on the controversy regarding identification of basal angiosperms. Leebens-Mack et al.  demonstrated that inadequate taxon sampling clearly played a role in misleading some previous studies, and Goremykin et al.  demonstrated that ML analyses of whole chloroplast genome data sets can be sensitive to model specification. It is well known that ML methods fail when model parameters are misspecified [69–71]. The phylogenetic analyses in this study provide yet another example of these phenomena. Addition of the Gossypium genome to our parsimony analyses generated trees that are congruent with current understanding of relationships among the major rosid clades. However, the ML analyses are incongruent with the MP trees regarding the monophyly and relationships of the rosid clades, and support for the alternative relationships was very strong in each case (compare Figs. 4A and 4B). It is possible, if not likely, that the use of a single "average" model (GTR + I + Γ) in the ML analyses is inappropriate for a data set of 61 concatenated genes [see  for a discussion of this issue]. Future phylogenetic analyses of complete chloroplast genome sequences should consider using methods in which different models can be applied to different partitions of the data (e.g., genes, codon positions, functional groups) . Development of more appropriate models of evolution of chloroplast sequences  may also improve the accuracy of phylogenies based on these genomes.
Thus, we need more extensive sampling of whole chloroplast genomes from the major lineages of flowering plants and more rigorous phylogenetic analyses before the full potential of this approach can be realized. Ongoing projects by several labs [see  for a list of some of these] should greatly enhance our taxon sampling so that we can generate reliable phylogenies based on whole chloroplast genomes.
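As an illustration of the data partitioning suggested above, the sketch below splits the columns of a concatenated protein-coding alignment by codon position, so that a separate model could in principle be fitted to each partition. It assumes every gene is aligned in frame and spans a whole number of codons; the toy alignment, taxon labels, and function names are hypothetical, and no particular phylogenetics program is implied.

```python
def codon_position_partitions(gene_lengths):
    """Return {1: [...], 2: [...], 3: [...]} of 0-based alignment columns.

    gene_lengths lists the aligned length of each gene in concatenation
    order; codon position restarts at 1 at the beginning of every gene.
    """
    partitions = {1: [], 2: [], 3: []}
    offset = 0
    for length in gene_lengths:
        if length % 3 != 0:
            raise ValueError("each gene must span a whole number of codons")
        for i in range(length):
            partitions[i % 3 + 1].append(offset + i)
        offset += length
    return partitions


def extract(alignment, columns):
    """Build a sub-alignment containing only the given columns."""
    return {taxon: "".join(seq[c] for c in columns)
            for taxon, seq in alignment.items()}


# Hypothetical alignment: two concatenated genes of 6 bp and 3 bp.
alignment = {"taxonA": "ATGGCTATG", "taxonB": "ATGGCCATG"}
parts = codon_position_partitions([6, 3])
third_positions = extract(alignment, parts[3])
print(parts[3], third_positions)
```

The same bookkeeping extends to partitions by gene or by functional group: only the rule that assigns each column to a partition changes.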