Genome organization and evolution of GC content
The organization of the Drimys and Piper genomes, with two copies of an IR separating the SSC and LSC regions, is identical to that of most sequenced angiosperm plastid genomes [reviewed in ]. The sizes of the genomes, at 160,604 and 160,624 bp, respectively, are similar to each other and to that of Liriodendron, but larger than that of the only other sequenced magnoliid genome (Calycanthus, 153,337 bp [23]; Table 1). Most of this size difference is due to the larger size of the IR in Drimys (26,649 bp) and Piper (27,039 bp) relative to Calycanthus (23,295 bp), although some is also due to the larger LSC region (Table 1). Expansion and contraction of the IR is a common phenomenon in land plant plastid genomes, with the IR ranging in size from 9,589 bp in the moss Physcomitrella to 75,741 bp in the highly rearranged angiosperm genome of Pelargonium [36, 37]. Among angiosperms the IR generally ranges in size from 20 to 27 kb, and the magnoliid genomes, except for Calycanthus, are at the high end of that range.
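The contribution of the IR to the genome size difference can be checked with simple arithmetic. The following Python sketch uses only the sizes quoted above and assumes the standard quadripartite layout (genome = LSC + SSC + 2 × IR); the dictionary and function names are illustrative and are not part of the original analysis.

```python
# Sizes in bp, as quoted in the text (Table 1 of the paper).
GENOME_BP = {"Drimys": 160_604, "Piper": 160_624, "Calycanthus": 153_337}
IR_BP = {"Drimys": 26_649, "Piper": 27_039, "Calycanthus": 23_295}


def ir_contribution(taxon, reference="Calycanthus"):
    """Return (total genome size difference, difference due to both IR copies)."""
    total_diff = GENOME_BP[taxon] - GENOME_BP[reference]
    # The IR is present in two copies, so its size difference counts twice.
    ir_diff = 2 * (IR_BP[taxon] - IR_BP[reference])
    return total_diff, ir_diff


for taxon in ("Drimys", "Piper"):
    total_diff, ir_diff = ir_contribution(taxon)
    print(f"{taxon}: genome +{total_diff} bp, two IR copies +{ir_diff} bp")
```

Under these assumptions the two IR copies account for 6,708 of the 7,267 bp difference in Drimys. In Piper the IR difference (7,488 bp) slightly exceeds the total difference (7,287 bp), which by this arithmetic implies that the combined single-copy regions of Piper are marginally shorter than those of Calycanthus.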
Gene order of the magnoliid plastid genomes is identical to that of tobacco and many other unrearranged angiosperm plastid genomes. There are a few differences in gene content, and these can be explained by two phenomena. The first concerns differences in the annotation of two genes in these genomes. Two putative genes (ACRS and ycf15) annotated in Calycanthus were not annotated in Drimys and Piper because several recent studies indicated that they are not functional plastid genes. The sequence of ycf15 has been shown to be highly variable among angiosperms, with conserved motifs at the 5' and 3' ends and an intervening sequence that makes it a pseudogene [38, 39]. An examination of ycf15 transcripts in spinach indicated that it is not a functional protein-coding gene. More recent sequence comparisons of ycf15 in other plastid genomes also supported the conclusion that this putative gene is not functional [9, 39]. The ACRS gene was identified by Goremykin et al. in Calycanthus based on its very high sequence identity with the mitochondrial ACR-toxin sensitivity (ACRS) gene of Citrus jambhiri. This conserved sequence has been identified (as ycf68) in a number of plastid genomes; however, in cases where it has been critically examined, the presence of internal stop codons indicates that it is a pseudogene. The second source of gene content differences among the three magnoliid genomes is the expansion of the IR in Piper, which results in the duplication of trnH. Small expansions of the IR boundary are common in plastid genomes, resulting in duplications of genes at the IR/SC boundaries. The duplication of trnH in Piper is shared with Nuphar, a member of the Nymphaeales (L. Raubeson et al. unpublished). This expansion of the IR to duplicate trnH has clearly happened independently in Piper and Nuphar, since none of the other basal angiosperms or magnoliids have this duplication.
Examination of GC content in 34 seed plant plastid genomes reveals several clear patterns. GC content for the complete genomes ranges from 34 to 39% (Fig. 3A), confirming previous observations that plastid genomes are in general AT rich [24, 39, 41–47]. GC content is also unevenly distributed over the plastid genome, and there are several explanations for this pattern. First, coding regions have a significantly higher GC content than non-coding regions (Figs. 3, 4), which again confirms previous observations based on comparisons of many fewer genomes. Second, GC content varies among regions of the genome, with the highest GC content in the IR and the lowest in the SSC (Figs. 4, 7). The higher GC content in the IR can be attributed to the presence of the four rRNA genes in this region, which have the highest GC content of any coding regions (Fig. 6B). This higher GC content in the IR region is maintained even when one copy of the IR is lost, as in Medicago and Pinus (Fig. 7, bottom right panel). The lower GC content in the SSC region is due to the presence of 8 of the 11 NADH genes, which have the lowest GC content of any of the classes of genes compared (Figs. 5, 6). Third, GC content varies by functional groups of genes. Among protein-coding genes, GC content is highest for photosynthetic genes and lowest for NADH genes, with genetic system genes having intermediate values. This same pattern was observed by Shimada and Sugiura in comparisons of the first three sequenced land plant plastid genomes.
Differences in GC content were also observed by codon position in protein-coding genes (Figs. 3B, 5A,B, 6A). For each of the three classes of genes (photosynthetic, genetic system, and NADH) the third position in the codon has a significant AT bias. This strong A+T bias at the third codon position of plastid genes has been observed previously [41, 45, 46, 48] and has been attributed to codon bias. This is in contrast to a GC bias in codon usage for nuclear genes in plants. Several studies have examined codon usage of plastid genes to attempt to determine if these biases can be attributed to nucleotide compositional bias, selection for translational efficiency, or a balance among mutational biases, natural selection, and genetic drift [49–53]. All of these studies have been limited to examining one or a few genes, and they have been constrained by the limited sampling of complete genome sequences for taking variation in GC content into account. Our comparisons of GC content variation for a wide diversity of angiosperm lineages provide a rich source of information for future investigations of the relationship between GC content and codon usage bias.
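The comparison of GC content by codon position can be illustrated with a minimal Python sketch. It is not the procedure used to produce the figures; the function names and the toy sequence are hypothetical, and the sketch assumes an in-frame coding sequence that starts at codon position 1 and counts only unambiguous bases (A, C, G, T).

```python
def gc_content(seq):
    """Fraction of G or C among the unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    return gc / total if total else 0.0


def gc_by_codon_position(cds):
    """GC fraction at codon positions 1, 2 and 3 of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    # Every third base starting at offset 0, 1 or 2 gives one codon position.
    return tuple(gc_content(cds[pos:usable:3]) for pos in range(3))


# Made-up toy sequence (codons ATG GCA GCT TAA), for illustration only.
toy_cds = "ATGGCAGCTTAA"
print(gc_content(toy_cds))
print(gc_by_codon_position(toy_cds))
```

Applying the same two functions separately to coding and non-coding sequence, or to the LSC, SSC, and IR regions, would give the kind of regional comparison summarized above. A lower value for the third element of the tuple than for the first two corresponds to the third-position AT bias described in the text.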
The debate concerning the identity of the most basal angiosperm lineage continues even though numerous molecular phylogenetic studies of angiosperms have been conducted over the past 15 years [3, 4, 6, 8–11], [14–20], [26–29], [54–66]. Several issues have confounded the resolution of relationships among basal angiosperms, including long branch attraction associated with sparse taxon density and poor taxon sampling, and conflict among trees obtained using different phylogenetic methodologies [4, 6, 7, 11, 13, 18–20, 67]. Most recent studies agree that Amborella and the Nymphaeales represent the earliest diverging angiosperm lineages [3, 4, 6–8], [14–20], [26–29], [57–66]. The most recent multi-gene phylogenetic reconstructions based on nine gene sequences from the plastid, mitochondrial, and nuclear genomes generate trees supporting either of two hypotheses, Amborella alone as the basal lineage or Amborella + Nymphaeales together at the base, depending on the method of phylogenetic analysis and the genes included. Trees generated from plastid genes supported the Amborella basal hypothesis, whereas mitochondrial genes supported the Amborella + Nymphaeales hypothesis. Furthermore, MP analyses tended to support the Amborella basal hypothesis and ML analyses supported Amborella + Nymphaeales. A similar set of relationships was also observed in recent phylogenetic studies using sequences of 61 genes from completely sequenced plastid genomes [18, 19, 67]. In these studies, MP trees placed Amborella alone as the basal-most angiosperm with strong support and ML trees placed Amborella + Nymphaeales at the base with moderate support. These differences were attributed to rapid diversification and the lack of extant lineages that could be used to shorten the branches leading to Amborella and to the most recent common ancestor of lineages within the Nymphaeales.
Our phylogenetic analyses include three additional magnoliids from three different orders. Both MP and ML trees (Figs. 8, 9) support Amborella alone as the earliest diverging lineage of angiosperms. Support for this relationship is very strong in MP trees (100% bootstrap) and weak (63%) in ML trees. However, an SH test that constrained Amborella + Nymphaeales in a basal position indicated that the two hypotheses of basal angiosperm relationships are not significantly different. Thus, although both MP and ML analyses including the three additional magnoliid taxa support Amborella as the basal-most branch in the angiosperm phylogeny, sampling of more taxa and genes and further investigations of model specification in phylogenetic analyses are needed before this issue is fully resolved [see discussion in [18, 23]].
Several earlier molecular phylogenetic studies based on one or a few genes [27, 56, 63] did not support the monophyly of magnoliids. Furthermore, morphological studies of angiosperms failed to detect any synapomorphies for this group. The circumscription, monophyly, and relationships of magnoliids have only recently been established based on phylogenetic analyses of multiple genes [14, 15]. These earlier multigene trees provided only weak to moderate support for the monophyly of magnoliids and the sister group relationships of the Canellales/Piperales and Laurales/Magnoliales. A recent study using eight plastid, mitochondrial, and nuclear genes provided the first strong support for both the monophyly of and the relationships among the four orders of magnoliids. Our phylogenetic trees based on 61 plastid protein-coding genes also provide strong support for the monophyly of magnoliids and the sister relationship between the Canellales/Piperales and Laurales/Magnoliales.
One of the most controversial remaining issues regarding relationships among angiosperms concerns the resolution of relationships among the magnoliids, monocots and eudicots. Previous phylogenetic studies have supported three different hypotheses of relationships among these lineages: (1) (magnoliids (monocots, eudicots)), (2) (monocots (magnoliids, eudicots)), and (3) (eudicots (magnoliids, monocots)). The first hypothesis was supported in phylogenetic analyses based on phytochrome genes and 17 plastid genes, but bootstrap support for a sister relationship of monocots and eudicots was only 67%. Several studies supported the second hypothesis [7, 16, 68]; however, bootstrap support was again weak, ranging from 55 to 78%. The three-gene phylogenetic tree of Soltis et al. supported the third hypothesis with only 56% jackknife support. This relationship was also recovered in a matK gene tree with a parsimony bootstrap value of 78% and a posterior probability of 0.73. Both MP and ML trees based on 61 plastid-encoded protein genes support hypothesis 1 (Figs. 8, 9). Branch support for this hypothesis is moderate (MP, Fig. 8) or strong (ML, Fig. 9). Congruence of the results from both MP and ML analyses is notable because our previous phylogenetic analyses using whole plastid genomes that included only one member of the magnoliid clade (Calycanthus, [18, 67]) were incongruent. In these earlier studies, MP trees supported hypothesis 2 (monocots sister to a clade that included magnoliids and eudicots), whereas ML trees supported hypothesis 1 (magnoliids sister to a clade that included monocots and eudicots). These differences provide yet another example of the importance of expanded taxon sampling in phylogenetic studies using sequences from whole plastid genomes [18, 67].
The addition of other angiosperm lineages, especially members of the Chloranthales, Ceratophyllaceae, and Illiciales, may be critical for providing additional resolution of relationships among the major clades.