Conflicting phylogenetic signals in the SlX1/Y1 gene in Silene
BMC Evolutionary Biology volume 8, Article number: 299 (2008)
Increasing evidence from DNA sequence data has revealed that phylogenies based on different genes may drastically differ from each other. This may be due to either inter- or intralineage processes, or to methodological or stochastic errors. Here we investigate a spectacular case where two parts of the same gene (SlX1/Y1) show conflicting phylogenies within Silene (Caryophyllaceae). SlX1 and SlY1 are sex-linked genes on the sex chromosomes of dioecious members of Silene sect. Elisanthe.
We sequenced the homologues of the SlX1/Y1 genes in several Sileneae species. We demonstrate that different parts of the SlX1/Y1 region give different phylogenetic signals. The major discrepancy is that Silene vulgaris and S. sect. Conoimorpha (S. conica and relatives) exchange positions. To determine whether gene duplication followed by recombination (an intralineage process) may explain the phylogenetic conflict in the Silene SlX1/Y1 gene, we used a novel probabilistic, multiple primer-pair PCR approach. We did not find any evidence supporting gene duplication/loss as an explanation for the phylogenetic conflict.
The phylogenetic conflict in the Silene SlX1/Y1 gene cannot be explained by paralogy or artefacts, such as in vitro recombination during PCR. The support for the conflict is strong enough to exclude methodological or stochastic errors as likely sources. Instead, the phylogenetic incongruence may have been caused by recombination of two divergent alleles following ancient interspecific hybridization or incomplete lineage sorting. These events probably took place several million years ago. This example clearly demonstrates that different parts of the genome may have different evolutionary histories and stresses the importance of using multiple genes in reconstruction of taxonomic relationships.
One of the challenges of evolutionary biology is phylogeny reconstruction. Modern techniques have facilitated the use of DNA sequences as the primary source of phylogenetic data. Usually, only small fractions of the genome are analyzed. In systematic research, such regions are often assumed to reflect the organismal lineage ("species") phylogeny. In principle, however, the resulting phylogenies do not reflect the history of the species, but rather the history of the individual DNA regions themselves. These regions may have different evolutionary histories, which can lead to the presence of several conflicting gene phylogenies. Taken together, these phylogenies can give clues to the organismal phylogeny. Incongruent gene phylogenies may have different causes, e.g. introgression (e.g. [3, 4]), homo- or polyploid hybridization (e.g. [5, 6]), mistaken orthology due to gene duplications and losses, or incomplete lineage sorting of alleles. In fact, coalescence theory predicts that in some cases, the most probable gene tree will not even reflect the species tree [9, 10], and this has also been shown in simulation studies using coalescent models on concatenated data.
In order to understand how gene phylogenies relate to organismal phylogenies, it is important to obtain sequence data from different parts of the genomes [12, 13]. Plant molecular phylogenetics has heavily utilized chloroplast and nuclear ribosomal DNA (nrDNA) data, whereas other parts of the nuclear genome, as well as the mitochondrial genome, are much less utilized. To differentiate between different causes of gene tree discordances, the use of multiple, potentially unlinked low-copy gene regions is desirable (e.g. [12–15]).
If sampling of a multi-copy gene family is poor, paralogues may be misidentified as orthologues . Paralogy problems may, however, exist even if entire genome sequences are at hand, if paralogues have gone extinct haphazardly in different lineages . To complicate matters even further, members of a multi-copy gene family, and also alleles of the same gene, may recombine and make phylogenetic analysis difficult . Paradoxically, this property has facilitated the use of nrDNA sequences, since cistrons from this gene family often are present in very large tandemly repeated numbers. These copies are usually very similar due to the process of concerted evolution, operating by e.g. unequal crossing over and gene conversion (e.g. [17, 18]).
To distinguish intralineage processes such as gene duplications from interlineage processes (e.g. hybridization) causing tree discordances, we are mainly interested in gene copies at three different levels. We follow the terminology of  and use the term "inparalogues" for genes resulting from duplications within terminal taxa/lineages (Figure 1a). Duplications that occur along the internal branches of the ingroup species tree will be referred to as "recent outparalogues" (Figure 1c, cf. ). Genes that have duplicated before the origin of the group of interest will be referred to as "ancient outparalogues" (Figure 1b). "Sequence copies" or "sequence variants" means any kind of alleles, paralogues or orthologues that are similar enough to be aligned with each other.
In organisms where complete genome sequences are not available, paralogy determination has to be considered by heuristic methods. A classical method to find paralogues is to use Southern blot hybridization . A drawback of this method is that it only gives a number of bands able to hybridize with the probe under specific conditions. It does not reveal the actual sequences, which makes it impossible to make any detailed assertions about the homology status of the different bands detected.
Another way to examine paralogues is to use a phylogenetic approach to sort out the different copies of the gene. This has been applied to genomes with complete sequence data (e.g. ). To deal with genomes with incomplete sequence information, Small & al.  suggested a combination of PCR-mediated sequencing, Southern blots, and expression studies. They argue that the sequence-based approach is the weakest and that especially Southern blots can be used to strengthen orthology assessment. However, they did not consider using multiple PCR primer pairs to amplify several overlapping DNA fragments instead of amplifying one single fragment.
While the combination of the approaches certainly is powerful, we think that there is room for improvement of the PCR-mediated approach. In principle, multiple PCR primers could be used with essentially the same coverage as constructed hybridization probes, but with the difference of being more sensitive and, most importantly, giving access to the actual sequences and thus to sophisticated phylogenetic analyses. A major advantage here is that gene trees derived from sequence data provide a possibility to assess the detailed paralogy status (in- or outparalogues of different relative ages) of multiple sequence copies. A weakness of the PCR method, however, is that some sequence copies may be preferred in the amplification (PCR bias, e.g. ). Using multiple primer pairs along the sequence might alleviate this problem to some extent, but if these primers were designed from a single sequence, they will on average be biased towards sequences that are similar to the template sequence. To circumvent this problem, one can design primers placed in more conserved parts of the sequence. Obviously, this requires some knowledge of the sequence diversity present. Although difficult to quantify exactly, PCR bias will decrease and the possibility to find paralogues will increase with the number of independent PCR primer pairs used.
To assess the efficiency of a multiple PCR-primer method, we use a probabilistic approach. Under the assumption that different primer pairs are independent and that they do not, on average, preferentially amplify some copies over others, we can calculate the probability that all sequence copies have been detected with these primer pairs.
In order to use nuclear genes in phylogenetic analyses when entire genome sequence information is not at hand, it is often necessary to develop protocols that are specific for the plant group in question . Popp & Oxelman  developed a protocol to use RNA polymerase genes in phylogenies, which has been successfully used in subsequent Sileneae (Caryophyllaceae) studies to reveal a number of conflicting gene trees ([5, 23], unpublished data). Here, we test the utility of another low-copy nuclear gene, SlX1/SlY1. SlY1 is a sex-linked gene described from the Y chromosome of Silene latifolia Poir., a dioecious member of Silene sect. Elisanthe (Fenzl ex Endl.) Ledeb. . SlX1 is a closely related gene, located on the X chromosome [24, 25]. There are also homologues in the other dioecious species in Elisanthe [26, 27] and on the autosomes in non-dioecious taxa [26–29]. The region is hereafter called "XY1". An initial Southern blot study suggested that there may be several paralogues of XY1 , but in later PCR-based studies, only a single XY1 copy has been found [28, 29].
Preliminary phylogenetic analyses of XY1 sequences indicated that different parts of the gene give rise to conflicting Silene phylogenies. These conflicts mainly involved the relationships between a few distinct lineages: Silene sections Elisanthe and Conoimorpha and some of their relatives in Silene subgenus Behenantha (Otth) Endl. (= subgenus Behen sensu e.g. ). The dioecious species in sect. Elisanthe (S. latifolia, S. dioica (L.) Clairv., S. diclinis (Lag.) Lainz, S. heuffelii Soó, S. marizii Samp.) are characterized by sexual dimorphism and sex chromosomes. Previous studies indicate a close relationship between the dioecious species in Silene sect. Elisanthe and the members of Silene sect. Conoimorpha Otth [30–34]. The members of the Conoimorpha group all have calyces with several (up to 60) prominent parallel nerves. Silene vulgaris (Moench) Garcke represents a group of species having strongly inflated calyces with reticulate nerves. They appear closely related to the annual S. behen L. (with which they share some morphological characteristics) and also to S. pendula L. [30, 33].
The aim of this study is to investigate the historical explanation for a case where different parts of a gene (XY1) give rise to conflicting phylogenies within Silene. To investigate whether gene duplication/loss may be a plausible explanation, we present a novel probabilistic PCR approach to determine the number of sequence variants present in an organism.
Number of sequence variants
In S. conica L., S. conoidea L. and S. pendula only one XY1 sequence variant was found (see Table 1 for voucher information). In S. acaulis L. two variants were found and in S. nutans L. and S. vulgaris three variants were found (Table 2). Using our novel approach to calculate the posterior probability that the actual number of sequence variants equals the observed number (obtained from different independent, partially overlapping PCR products, and using a discrete uniform distribution as prior), we found that for all species except S. nutans, the posterior probability is > 0.99 that there are no undetected variants (Table 2; see Methods for explanation and formula). The divergence between the variants within an individual was always less than the divergence between sequences from different species (Table 2). The entire alignment of the XY1 region contained 6416 bp and 361 indel characters. The reduced 9-taxon alignment, where parts with large amounts of missing data (due to long indels or unsequenced regions) in many sequences were excluded, contained 4045 bp.
Conflicting phylogenetic signals in different parts of the alignment
The GARD recombination detection screening suggested several recombination breakpoints in the reduced 4045-bp alignment, resulting in two larger non-recombinant partitions in the alignment. The central part of the alignment was divided into a few short partitions by additional recombination breakpoints. Different analysis settings resulted in variation in the number and placement of breakpoints. The positions of the outermost breakpoints, however, differed only slightly: directly before exon 8 and in the intron between exons 9 and 10 (Figure 2, Figure 3, Table 3). The main differences between the partitions (hereafter referred to as the 5' and 3' parts, respectively) are that S. vulgaris and S. conica change places in the position closest to the dioecious species S. latifolia and S. dioica (Figure 3).
Applying Bayesian phylogenetic methods to the partitions suggested by GARD on the full 6416 bp alignment with more taxa (Figure 4) resulted in trees compatible with the Neighbor-Joining trees from GARD. In the three cases (S. acaulis, S. nutans, S. vulgaris) where there was more than one sequence variant per individual, these were always monophyletic within the species (inparalogues). Silene conica and S. conoidea, representing sect. Conoimorpha, grouped together with strong support (Figure 4). Also S. dioica and S. latifolia constituted a well-supported group (dioecious Elisanthe) in both partitions (Figure 4). Silene noctiflora L., the type species of sect. Elisanthe, did not form a monophyletic group with the dioecious species (Figure 4).
In the 5' part of the alignment S. vulgaris grouped together with S. noctiflora as a sister-group to the dioecious species in sect. Elisanthe. This whole clade (II) was sister to a clade with the Conoimorpha group and S. pendula (Figure 4a). In the 3' part of the alignment, sect. Conoimorpha grouped together with S. noctiflora as the closest relatives (VI) to the dioecious Elisanthe. Silene vulgaris grouped with S. pendula (Figure 4b).
The ages of the nodes including S. vulgaris and its closest relatives were 5.1 million years in the 5' part (split with S. noctiflora), and 5.0 million years in the 3' part for the split with S. pendula (Figure 4). The corresponding ages for sect. Conoimorpha were 8.0 million years in the 5' part (split with S. pendula) and 4.4 million years in the 3' part (split with S. noctiflora) (Figure 4). However, the 95% HPD intervals for these nodes overlap considerably.
The parsimony trees for the two partitions were congruent with the corresponding trees obtained by Bayesian inference and by the GARD analysis, both using the reduced alignment (data not shown) and the full 6416-bp data set with additional gap characters (MPB values in Figure 4).
Silene vulgaris and the monophyletic group Conoimorpha exchange positions in our phylogenetic trees somewhere between the first 2.7 kbp and the last 2.5 kbp of the 6416 bp XY1 alignment (Figures 3, 4). The last 2.5 kbp gives a phylogeny in agreement with those obtained by the nuclear ribosomal DNA region ITS [30–32] and the low-copy nuclear genes RPA2, RPB2 and the RPD2a gene (Rautenberg & Oxelman, unpublished data). Also chloroplast DNA rps16  and a large chloroplast DNA data set  show agreeing phylogenies. In contrast, the first 2.7 kbp of the alignment gives an unexpected phylogeny, where S. vulgaris is placed closely related to the dioecious Elisanthe species. This deviating phylogenetic pattern was also recently found in the RPD2b gene (Rautenberg & Oxelman, unpublished data). The results from the XY1 gene agree with previous studies that have shown that S. noctiflora does not belong with the dioecious species in Silene sect. Elisanthe [30–33, 36].
Conflicts between separate phylogenies based on different genes can be observed on several genomic levels: when comparing phylogenies based on different genomes (e.g. organellar vs. nuclear), different genes or different copies of a gene (either alleles or paralogues), or even different parts of one gene. These conflicting phylogenies can either reflect complex phylogenetic patterns (e.g. [7, 23]), or simply highlight errors and problems in lab procedures and/or phylogenetic methods. We will here discuss possible causes of the observed phylogenetic conflict.
Lab errors and artefacts
• Contamination or mix-up of extractions, PCR samples or sequence reactions: The sequences are verified by multiple accessions of the species, and multiple PCR and sequence reactions. The data set also includes sequences made in different laboratories from different source plants. Thus, this possibility can be safely rejected.
• Recombination between paralogues or alleles during PCR: Paralogous gene copies or alleles can recombine during PCR (e.g. [37–39]). This process could create a pattern like the observed one, if our sequencing reactions would have been based on single PCR products of the entire gene. However, our PCR products include several overlapping fragments, 700–2500 bp long.
• Inconsistent alignment procedures in separate parts of the alignment could also influence the accuracy of the resulting phylogeny. This explanation is also unlikely, because the included alignment appears unproblematic. Two 700 bp parts of the alignment with much indel variation were excluded in the reduced alignment used in the GARD analysis.
• Inappropriate phylogenetic methods or sampling errors: Choosing the wrong phylogenetic method is a potential cause of inconsistent phylogenies. In this study, we used distance methods, Bayesian methods and parsimony methods for phylogeny reconstruction. The same general pattern was found with all methods. The different methods applied here have very different theories and potential problems , and in addition, the bootstrap support values and the posterior probabilities are high. Therefore, we argue that the observed pattern is not likely to have been caused by inappropriate methodologies. The high support values also make stochastic errors (i.e. unfortunate sampling of substitution patterns along the sequence alignment) an unlikely explanation.
• Existence of paralogues: If a gene duplicates within a lineage, each of the resulting paralogues will have its own evolutionary fate. Difficulties in orthology determination can seriously distort phylogenies and conclusions drawn from them. Two processes are potentially serious when inferring organismal phylogenies from multi-copy sequences. First, in vivo recombination between the gene copies may give rise to mosaic sequences, which can give inconsistent phylogenies, where different parts of the alignment will reflect conflicting topologies. Second, there is a risk that orthology is mistaken in the phylogenetic analysis if some of the existing outparalogues are not detected, or if some of the paralogues are lost in some lineages. The phylogeny will then be influenced by which copy is lost (or not detected) in which lineage . On the other hand, inparalogues (gene copies that are monophyletic within an individual) will not cause such problems (Figure 1a). Ancient gene duplications (earlier than the divergence of the studied organismal group) followed by haphazard losses will lead to a pattern with some parts of the ingroup grouping with the outgroup (i.e. the ingroup will appear non-monophyletic, Figure 1b). This effect will, under the assumption that the in- and outgroups are correctly circumscribed, indicate that there is a paralogy problem. Thus, only recent outparalogues, i.e. genes that have duplicated after the divergence of the ingroup, but before the origin of the terminal taxa (leaves), are of serious concern for orthology/paralogy interpretation in our case (Figure 1c).
Recombination between two or more paralogues could certainly create a pattern like the one we observe. Results from Southern blot hybridization experiments made by Delichère & al. suggest that there may be one or more copies of SlX1/SlY1 on the chromosomes of S. latifolia. It is unknown whether these extra copies are inparalogues, recent outparalogues, or if their origins are more ancient than the origin of Silene (ancient outparalogues). In S. conica, the gene orthologous to SlX1/SlY1 seems to be single copy according to our results and those of Atanassov & al. RT-PCR experiments by Nicolas & al. also revealed only one sequence in all dioecious species analyzed. The within-species sequence variation we observed in S. vulgaris, S. acaulis and S. nutans is possibly caused by allele variation and/or recent gene duplication, resulting in inparalogues. A natural explanation for the multiple bands detected in some of the Southern blot experiments might therefore be ancient outparalogues that diverged before the diversification of Silene, or other genes with a similar sequence. However, an alternative explanation might be that they represent silent pseudogenes that have diverged so much that our PCR experiments have failed to target them. Although not an impossible explanation, we consider this to be unlikely, given the large number of different PCR primer pairs used by us on taxa representing various major lineages in Silene and given the high posterior probabilities from the statistical calculations. Note, however, that relaxation of selective constraints in pseudogenes may result in elevated substitution and indel rates, which would violate the assumption of non-biased targeting by primers constructed from an alignment of apparently functional gene sequences.
• Incomplete lineage sorting: In recently diverged lineages, the alleles from the ancestral gene pool might not yet have become sorted into the new lineages. One way to reject incomplete lineage sorting as a possible cause of incongruence could be to compare the divergence times of the conflicting nodes. Assuming that one of the trees does reflect the organismal tree, lineage sorting can be rejected if the divergence time of the organismal tree node is older than in the deviating tree. If we consider the tree from the 3' part of XY1 to be the most likely organism tree (as is supported by other data, see above), we cannot reject incomplete lineage sorting because the ages are very similar (node IX versus node III, Figure 4) or considerably younger (node VIII versus node V, Figure 4). However, the 95% HPD intervals for the relevant nodes overlap largely (Figure 4), and we lack a robust hypothesis about dating of splits in the species tree. It is therefore not possible either to reject or corroborate incomplete lineage sorting as the cause of the observed incongruence.
• Horizontal gene transfer: Recently, horizontal (or lateral) gene transfer in plants has been reported (reviewed in ). Most of these cases are mitochondrial genes that seem to be transferred between isolated lineages, but two examples of horizontal transfer of nuclear genes have also been suggested [42, 43]. Although we cannot rule out horizontal gene transfer completely, we do not have a reasonable explanation on the mechanisms and series of events that could create the observed pattern.
• Hybridization also creates patterns where different parts of the hybrid's genome reflect relationships with the different parental taxa. If the hybridization is a success, genes or alleles with separate evolutionary histories will become introduced into the offspring. Through repeated backcrossing with one of the parental lineages, only a minority of the other lineage's genes will prevail ("introgression") and the resulting pattern will mimic horizontal gene transfer.
For a hybridization event to be a favourable explanation when groups exchange positions between two trees, the age of the split disagreeing with the species tree should be younger than in the species tree . If we assume that the 3' part of the XY1 alignment reflects the organism tree (as is corroborated by other data), the hybridization explanation is not supported if the ages of nodes VIII and V are taken at face value (Figure 4b). The 95% HPD intervals for the nodes are broad, however, so hybridization cannot be rejected. Nodes IX and III have very similar median ages, making the discrimination between hybridization and intralineage processes even more obscure. Denser taxon sampling could possibly narrow the HPD intervals.
There are no morphological characters suggesting that hybridization has taken place. On the other hand, this putative event probably lies several million years back, and given the rampant morphological homoplasy in Silene in general, this is perhaps not surprising.
An enigmatic feature of the phylogenetic results is that the lineages of S. vulgaris and the Conoimorpha group appear to mutually switch positions in the trees. There is no reason to expect that recombination should take place at the same sequence location in different lineages, either under a paralogy or under a hybridization hypothesis. However, the taxonomic sampling is sparse, and further sampling might reveal that the phylogenetic positions are not mutually exchanged. The fact that GARD actually supports several recombination events may indicate sequential events, rather than a reciprocal switch.
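The relative-dating reasoning used above for incomplete lineage sorting and hybridization can be written out as a small decision rule. The following is a minimal sketch, not part of the original analysis: the function name and the returned labels are hypothetical, the 3' partition is assumed to reflect the organismal tree, and the overlap of the 95% HPD intervals is passed in as a simple yes/no flag because the interval bounds are not given in the text. The median ages are those quoted in the Results.

```python
def interpret_relative_dating(species_tree_age, conflicting_age, hpd_overlap):
    """Read two node ages (same units, e.g. million years) as described in the text.

    species_tree_age: age of the split in the tree assumed to be the organismal tree
    conflicting_age:  age of the corresponding split in the deviating tree
    hpd_overlap:      True if the 95% HPD intervals of the two nodes overlap
    """
    if hpd_overlap:
        # Overlapping intervals: the ages cannot be told apart.
        return "inconclusive"
    if conflicting_age < species_tree_age:
        # A conflicting split younger than the species split favours hybridization.
        return "hybridization favoured; incomplete lineage sorting rejected"
    return "hybridization not supported; incomplete lineage sorting not rejected"


# Median ages from the Results (3' part taken as the organismal tree).
conoimorpha = {"species_tree_age": 4.4, "conflicting_age": 8.0}
vulgaris = {"species_tree_age": 5.0, "conflicting_age": 5.1}

# Taken at face value (ignoring interval overlap):
print(interpret_relative_dating(**conoimorpha, hpd_overlap=False))
# With the considerable HPD overlap reported in the text:
print(interpret_relative_dating(**conoimorpha, hpd_overlap=True))
print(interpret_relative_dating(**vulgaris, hpd_overlap=True))
```

With the overlap flag set, both comparisons come out as inconclusive, which mirrors the conclusion that neither explanation can be rejected on the basis of the dating alone.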
Lychnis flos-jovis appears within Silene in our trees (Figure 4). The phylogenetic status of Lychnis in relation to Silene is not strongly supported , but recent studies (e.g. ) have rather corroborated the sister-group relationship between the two. However, other data (e.g. [23, 34]) also indicate complicated patterns that may involve reticulations. Here, we follow Oxelman & al's  generic classification of the tribe Sileneae, but the purpose of this paper is not to draw any taxonomic conclusions.
A difficulty when working with organisms where the entire genomic sequences are not known is to estimate confidence in whether the number of sequence variants (alleles, paralogues) detected does reflect all variation within the organism. Joly & al. used a binomial distribution to calculate the number of clones from a PCR product that had to be sequenced to achieve a certain probability of sampling all alleles in a tetraploid individual (given that the primer pair picks all variants). Since the binomial distribution assumes the events of finding an allele to be independent, we argue that a Bayesian approach is more appropriate. The methods appear to give similar results, however. Our approach has the advantage that the total number of sequence copies does not need to be known. Also, increasing the number of independent PCR primer pairs decreases the risk of PCR bias.
There is a phylogenetic conflict in different parts of the Silene SlX1/Y1 gene that cannot be explained by gene duplications/losses or artefacts, such as in vitro recombination during PCR. This phylogenetic incongruence may have been caused by recombination of two divergent alleles following horizontal gene transfer, interspecific hybridization or incomplete lineage sorting. Given our results of the relative dating, we can reject none of these hypotheses. However, the fact that we recently discovered a phylogenetic pattern similar to that from the first part of the XY1 alignment also in the RPD2b gene (unpublished data) can be interpreted as support for the hybridization/introgression hypothesis.
Our novel probabilistic PCR approach, in combination with phylogenetic methods, provides a useful way to discriminate between different paralogue types and to determine the number of outparalogues in a genome, when the entire genomic sequence is not known.
This example clearly demonstrates that different parts of the genome may tell us different stories and stresses the importance of using multiple genes in reconstruction of taxonomic relationships.
Six specimens representing various phylogenetic lineages in Sileneae (Table 1) were screened for XY1 sequences using the PCR procedure outlined below. In addition, GenBank sequences, including several representatives of Silene sect. Elisanthe, were used (Table 1). Taxa were chosen to mostly include representatives from Silene subgenus Behenantha. Lychnis flos-jovis (L.) Desr. and representatives from Silene subgenus Silene were used as outgroups [13, 31, 33]. All included taxa are diploid [45, 46]. Genus names follow the generic classification of Sileneae by Oxelman & al.
Isolation of total genomic DNA was performed from herbarium specimens or fresh material using a modified Carlson/Yoon method . Most DNA isolations were purified by the GFX Purification Kit (Amersham Biosciences) and dissolved in EB buffer (10 mM Tris-Cl, pH 8.5, QiaGen). Some were purified by the Ultra Silica Bead Kit (ABgene).
Primers and PCR
Primers for the XY1 region were designed to amplify several partially overlapping fragments of XY1 (Figure 2). A preliminary alignment with several Silene taxa (GenBank accessions in Table 1 and 10 unpublished sequences with similar sequence diversity as the sequences used in the analyses) was used to get the initial primer sequences. The primers were aimed to work on all of the XY1 variants. Most primers were positioned in exons (Figure 2, Table 4). Details on PCR conditions can be obtained from the first author on request.
When the sequences were polymorphic due to indel polymorphisms, specific primers were designed, either directly from the sequenced PCR products, or from cloned sequences. These new primers were used in later PCR (in a few cases) and for direct sequencing of the polymorphic PCR products.
Purified PCR products were sent to Macrogen Inc. in Seoul, South Korea for sequencing (using the BigDyeTM terminator kit and run on ABI 3730XL). Some sequence reactions were run on an ABI 3700 sequencer at Rudbeck lab, Uppsala University, Sweden. Sequencing reactions were in this case carried out using the BigDye 3.1 kit.
In some taxa, cloning of PCR products was performed to overcome problems with indel polymorphisms. The PCR fragments were amplified with Taq (ABgene), and were cloned using the TOPO TA Cloning Kit for Sequencing (Invitrogen), with half the recommended reaction volumes. From each cloning reaction, 7–13 colonies were picked for PCR using the universal primers M13F and M13R. Purified PCR products were sequenced by Macrogen Inc. using the universal primers T3 and T7 promoter available at Macrogen Inc. and otherwise as above. Mostly, the cloned sequences were only used as a base for subsequent primer design, but in some cases the actual sequences were also used in the alignment. Single base polymorphisms that occurred only in single clones were considered PCR artefacts and discarded from subsequent analyses.
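The rule for treating single-clone base differences as PCR artefacts can be illustrated with a short sketch. This is an assumption-laden illustration, not the procedure actually used: the text does not say how such positions were discarded, so here a base seen in only one clone is simply reverted to the majority base of that alignment column. The function name and the example sequences are hypothetical.

```python
from collections import Counter


def revert_singleton_substitutions(clones):
    """Revert bases that occur in exactly one clone to the column majority base.

    clones: aligned, equal-length sequences from a single cloning reaction.
    Assumption: 'discarding' a singleton means replacing it with the majority base.
    """
    if len({len(seq) for seq in clones}) != 1:
        raise ValueError("clone sequences must be aligned to equal length")

    cleaned = [list(seq) for seq in clones]
    for col in range(len(clones[0])):
        counts = Counter(seq[col] for seq in clones)
        majority, majority_count = counts.most_common(1)[0]
        # Only act when a clear majority base exists (not a gap, seen more than once).
        if majority not in "ACGT" or majority_count < 2:
            continue
        for row, seq in enumerate(clones):
            base = seq[col]
            if base in "ACGT" and base != majority and counts[base] == 1:
                cleaned[row][col] = majority
    return ["".join(row) for row in cleaned]


# One clone carries a unique C at the third position; it is treated as an artefact.
print(revert_singleton_substitutions(["ACGTA", "ACGTA", "ACCTA", "ACGTA"]))
# A variant shared by two clones is kept as a genuine polymorphism.
print(revert_singleton_substitutions(["ACGT", "ACGT", "ATGT", "ATGT"]))
```

Variants supported by two or more clones are left untouched, so genuine allelic or paralogous differences survive the filter.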
Assembly and alignment
The Staden package version 1.6.0 for Mac OS X  with phred version 0.020425.c and phrap version 0.990319  was used to assemble readings into contigs. In some cases manual editing of the contigs was made. Base polymorphisms were coded using the NC-IUPAC ambiguity codes.
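As an illustration of the ambiguity coding mentioned above, the sketch below maps a set of bases observed at a polymorphic site to the corresponding standard IUPAC nucleotide code. The function and variable names are hypothetical and this is not the Staden package's own implementation; only the code table itself is the standard one.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

# Reverse lookup: set of bases -> single-letter code.
CODE_FOR_SET = {frozenset(bases): code for code, bases in IUPAC_BASES.items()}


def iupac_code(bases):
    """Return the IUPAC code for an iterable of bases seen at one site."""
    key = frozenset(base.upper() for base in bases)
    if key not in CODE_FOR_SET:
        raise ValueError(f"no IUPAC code for bases: {sorted(key)}")
    return CODE_FOR_SET[key]


print(iupac_code("AG"))        # a purine polymorphism
print(iupac_code({"C", "T"}))  # a pyrimidine polymorphism
```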
The resulting contig sequences were aligned manually using QuickAlign , using the criteria of Popp & Oxelman . The sequences were trimmed to reduce the number of taxa with long stretches of missing data in the beginning and end. Parts of the introns between exons 8/9 and 12/13 were very variable, with long indels in many sequences.
Determination of number of sequence copies
We used a Bayesian approach to calculate the probability that we sampled all sequence copies. Let x be the number of sequence variants in the genome. Each PCR primer pair combination amplifies one or more sequence variants. Assuming that there is no PCR bias, the probability of sampling sequence copies will be analogous to the probability of drawing balls from a big bowl containing balls with an unknown number of different colours (x).
By using a discrete uniform distribution on (1, . . ., M) as prior for x, and approximating the hypergeometric distribution with a multinomial distribution with parameters (1/x, . . ., 1/x), the posterior distribution of x can be calculated as $P(x \mid n, x_{obs}) = \dfrac{x! \,/\, \left[(x - x_{obs})!\, x^{n}\right]}{\sum_{k = x_{obs}}^{M} k! \,/\, \left[(k - x_{obs})!\, k^{n}\right]}$ for $x = x_{obs}, \ldots, M$, where n is the number of ball draws and $x_{obs}$ is the observed number of colours. The number of ball draws (n) needed for the posterior probability of $x = x_{obs}$ to be larger than 0.95 is given in Table 5. For the mathematical arguments, see Additional file 1. Thus, by representing PCR primer pair combinations with balls and using colours to represent paralogues, we can obtain an estimate of whether it is improbable that additional PCR primer pairs will detect additional paralogues in the genome. Cases when one primer pair results in two or more sequences will be interpreted as a draw that accidentally results in more than one ball. Note that the assumptions are that the sequences of the primer pairs are unbiased with respect to the population of sequence variants in the genome (no PCR bias). We define this population by the preliminary alignment, taken to represent the phylogenetic diversity in Silene. Thus, we regard sequences outside of this population as ancient outparalogues, i.e. they are not "balls". We regard this procedure as sufficient to justify the assumption that the ability of the primer pairs to amplify recent outparalogues will not, on average, be biased. Even if this assumption is overly simplistic and almost certainly violated, we think that the PCR approach employed here, with the probabilities given in Table 5, provides a useful framework for determination of the number of paralogues in a genome, when the entire genomic sequence is not known.
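The ball-and-colour calculation can be sketched numerically. The code below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the posterior is proportional to x! / ((x − x_obs)! · x^n) for x between x_obs and M, which is what a uniform prior combined with equal-probability multinomial draws implies. The function names are hypothetical, and the prior upper bound M = 10 in the example is an arbitrary choice, since the value used for Table 5 is not stated here.

```python
from math import exp, lgamma, log


def posterior_num_variants(n_draws, x_obs, max_variants):
    """Posterior P(x | n_draws, x_obs) for x = 1..max_variants (uniform prior).

    Assumes unbiased, independent draws (no PCR bias); returns {x: probability}.
    """
    if n_draws < 1 or not 1 <= x_obs <= max_variants:
        raise ValueError("need n_draws >= 1 and 1 <= x_obs <= max_variants")

    # Unnormalised log-weights: log( x! / ((x - x_obs)! * x**n) ).
    log_w = {
        x: lgamma(x + 1) - lgamma(x - x_obs + 1) - n_draws * log(x)
        for x in range(x_obs, max_variants + 1)
    }
    top = max(log_w.values())
    weights = {x: exp(v - top) for x, v in log_w.items()}
    total = sum(weights.values())
    # Values of x below x_obs are impossible and get probability zero.
    return {x: weights.get(x, 0.0) / total for x in range(1, max_variants + 1)}


def draws_needed(x_obs, max_variants, threshold=0.95, max_draws=10_000):
    """Smallest number of draws for which P(x = x_obs) exceeds the threshold."""
    for n in range(x_obs, max_draws + 1):
        if posterior_num_variants(n, x_obs, max_variants)[x_obs] > threshold:
            return n
    raise RuntimeError("threshold not reached within max_draws")


# Example with a hypothetical prior upper bound M = 10.
post = posterior_num_variants(n_draws=10, x_obs=1, max_variants=10)
print(round(post[1], 4))
print(draws_needed(x_obs=1, max_variants=10))
```

Working in log space avoids overflow for large n; the resulting numbers depend on the chosen M and need not match Table 5.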
To screen for putative recombination breakpoints, GARD (Genetic Algorithm Recombination Detection) was used online. Due to computational limitations, a reduced alignment was analysed. In this reduced data set, only nine sequences were included, and parts of the alignment with large amounts of missing data (the first 870 bp and the last 200 bp) were excluded. The XY1 introns between exons 8/9 and 12/13 varied greatly in length between taxa, and large parts (≈700 bp each) of these introns were also excluded (Figure 2). We ran GARD with the HKY85 nucleotide substitution bias model (as suggested by the model selection tool on the GARD web page), with Beta-Gamma rate variation and 5 rate classes. We also tried 3–4 rate classes and the General Discrete Distribution; the results were similar and did not affect the conclusions.
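Building a reduced alignment of this kind amounts to dropping column ranges from every sequence. The sketch below shows one way to do that; the function names are hypothetical, and only the end trimming (870 bp and 200 bp) is taken from the text. The coordinates of the excluded intron regions are not given here and would have to be supplied by the user.

```python
def exclude_columns(alignment, excluded):
    """Return a copy of the alignment without the columns in `excluded`.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths)
    excluded:  list of (start, end) column ranges, 0-based, end exclusive
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    length = lengths.pop()
    drop = set()
    for start, end in excluded:
        drop.update(range(max(start, 0), min(end, length)))
    keep = [i for i in range(length) if i not in drop]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


def end_trim_ranges(length, head=870, tail=200):
    """Column ranges covering the first `head` and last `tail` positions."""
    return [(0, head), (length - tail, length)]
```

For example, `exclude_columns(aln, end_trim_ranges(n_columns))` removes the poorly covered ends; further ranges for the variable intron regions can be appended to the list before the call.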
BEAST v1.4.7 was used for Bayesian phylogenetic inference and dating of divergence times. Input files for BEAST were created with BEAUti v1.4.7, using a relaxed clock model with a Yule prior and the nucleotide substitution models proposed by MrModeltest version 2.2 under the Akaike information criterion. The prior on the age of the root of the tree was normally distributed, with a mean of 12.57 million years and a standard deviation of 2.018. Two MCMC chains were run for 10 million generations, with trees and parameter values saved every 1000th generation. One of the chains had no constraints on the monophyly of the included groups; in the other chain, three groups of interest were forced to be monophyletic: subgenus Silene, subgenus Behenantha (including L. flos-jovis) and the dioecious species of section Elisanthe (nested within subgenus Behenantha). There were no substantial differences between the two MCMC chains, except for the age of subgenus Behenantha. For this group, the priors had a strong impact on the results, especially when the monophyly constraints were in effect. The resulting log files were checked in Tracer v1.4, and the tree files were summarized using TreeAnnotator v1.4.7 into one maximum clade credibility tree with median node heights (discarding the first 10% of the trees as "burn-in"). Trees were visualized using FigTree 1.1.2.
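Two of the numerical settings above can be made concrete with a short calculation: the range of root ages that the normal prior treats as plausible, and the number of saved trees left after burn-in. This is an illustrative sketch only, assuming the stated values are the mean and standard deviation (in million years) of an untruncated normal prior, and counting one saved tree per 1000 generations while ignoring the initial state.

```python
from statistics import NormalDist

# Root-age prior as described in the text (units: million years).
root_prior = NormalDist(mu=12.57, sigma=2.018)
lower = root_prior.inv_cdf(0.025)  # lower bound of the central 95% prior interval
upper = root_prior.inv_cdf(0.975)  # upper bound of the central 95% prior interval


def post_burnin(samples, burnin_percent=10):
    """Drop the first `burnin_percent` percent of an ordered list of samples."""
    n_discard = len(samples) * burnin_percent // 100
    return samples[n_discard:]


generations = 10_000_000
sample_every = 1_000
n_saved = generations // sample_every  # saved trees per chain, initial state ignored
n_kept = len(post_burnin(list(range(n_saved))))

print(f"95% prior interval for root age: {lower:.2f} to {upper:.2f} million years")
print(f"trees saved per chain: {n_saved}, kept after burn-in: {n_kept}")
```

The prior interval is wide relative to its mean, which is consistent with the remark that priors strongly influenced the age estimate for subgenus Behenantha.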
Maximum parsimony analyses and maximum parsimony bootstrap analyses were performed with PAUP* v.4.0b10 for Unix on the complete alignment with gap coding, as well as on the reduced data matrix, with the data sets partitioned into the non-recombined 5' and 3' parts identified by the GARD recombination detection procedure. Maximum parsimony analyses were carried out using heuristic searches with TBR branch swapping, the multrees option in effect, and 10 random addition sequences. For bootstrap support, 1000 replicates were performed with the multrees option off.
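The bootstrap itself is a resampling of alignment columns with replacement, followed by a tree search on each pseudo-replicate. The sketch below illustrates only the resampling and the support tally; it does not reproduce PAUP*. The callback `has_group` is a hypothetical placeholder for "run a parsimony search on this replicate and report whether the group of interest is recovered".

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one pseudo-replicate).

    alignment: dict mapping sequence name -> aligned sequence (equal lengths)
    rng:       a random.Random instance, so that runs are reproducible
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in cols) for name in names}


def bootstrap_support(alignment, has_group, replicates=1000, seed=0):
    """Percentage of pseudo-replicates in which `has_group` returns True.

    has_group is a user-supplied function (hypothetical here) that analyses
    one resampled alignment and reports whether a given clade is present.
    """
    rng = random.Random(seed)
    hits = sum(
        bool(has_group(bootstrap_alignment(alignment, rng)))
        for _ in range(replicates)
    )
    return 100.0 * hits / replicates
```

Each pseudo-replicate has the same number of columns as the original alignment, so characters supporting a clade are over- or under-represented by chance from replicate to replicate.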
Distances between the sequences were calculated using MEGA4. In addition to the pairwise distances between all sequences (data not shown), we calculated mean distances between sequence copies within an individual and mean distances between sequences from different individuals/taxa. Divergence estimates are Tamura-Nei distances with gamma-distributed rate variation (Γ = 0.6587), given ± standard error (SE) based on 500 bootstrap replicates.
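The within- and between-individual means are simple averages over subsets of the pairwise distance matrix. A minimal sketch follows, assuming the pairwise distances have already been computed (for instance exported from MEGA4); the data layout and function name are illustrative assumptions, and the Tamura-Nei correction itself is not reimplemented here.

```python
from itertools import combinations
from statistics import mean


def mean_distances(dist, individual_of):
    """Mean pairwise distance within and between individuals.

    dist:          dict mapping frozenset({seq_a, seq_b}) -> pairwise distance
    individual_of: dict mapping sequence name -> individual (or taxon) label
    Returns (mean_within, mean_between); a value is None if no such pair exists.
    """
    within, between = [], []
    for a, b in combinations(sorted(individual_of), 2):
        d = dist[frozenset((a, b))]
        if individual_of[a] == individual_of[b]:
            within.append(d)
        else:
            between.append(d)
    return (
        mean(within) if within else None,
        mean(between) if between else None,
    )
```

Comparing the two means indicates whether sequence copies from one individual are more similar to each other than to copies from other individuals or taxa.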
bp/kb: base pairs/1000 base pairs
HPD: highest posterior density
MCMC: Markov Chain Monte Carlo
MPB: maximum parsimony bootstrap
Doyle JJ: Gene Trees and Species Trees: Molecular Systematics as One-Character Taxonomy. Syst Bot. 1992, 17 (1): 144-163. 10.2307/2419070.
Liu L, Pearl DK: Species Trees from Gene Trees: Reconstructing Bayesian Posterior Distributions of a Species Phylogeny Using Estimated Gene Tree Distributions. Syst Biol. 2007, 56 (3): 504-514. 10.1080/10635150701429982.
Soltis DE, Kuzoff RK: Discordance between Nuclear and Chloroplast Phylogenies in the Heuchera Group (Saxifragaceae). Evolution. 1995, 49 (4): 727-742. 10.2307/2410326.
Okuyama Y, Fujii N, Wakabayashi M, Kawakita A, Ito M, Watanabe M, Murakami N, Kato M: Nonuniform Concerted Evolution and Chloroplast Capture: Heterogeneity of Observed Introgression Patterns in Three Molecular Data Partition Phylogenies of Asian Mitella (Saxifragaceae). Mol Biol Evol. 2005, 22 (2): 285-296. 10.1093/molbev/msi016.
Popp M, Erixon P, Eggens F, Oxelman B: Origin and Evolution of a Circumpolar Polyploid Species Complex in Silene (Caryophyllaceae) Inferred from Low Copy Nuclear RNA Polymerase Introns, rDNA, and Chloroplast DNA. Syst Bot. 2005, 30 (2): 302-313. 10.1600/0363644054223648.
Frajman B, Oxelman B: Reticulate phylogenetics and phytogeographical structure of Heliosperma (Sileneae, Caryophyllaceae) inferred from chloroplast and nuclear DNA sequences. Mol Phylogenet Evol. 2007, 43 (1): 140-155. 10.1016/j.ympev.2006.11.003.
Wendel JF, Doyle JJ: Phylogenetic Incongruence: Window into Genome History and Molecular Evolution. Molecular systematics of plants II: DNA sequencing. Edited by: Soltis DE, Soltis PS, Doyle JJ. 1998, New York: Chapman & Hall, 265-296.
Pamilo P, Nei M: Relationships between Gene Trees and Species Trees. Mol Biol Evol. 1988, 5 (5): 568-583.
Degnan JH, Salter LA: Gene tree distributions under the coalescent process. Evolution. 2005, 59 (1): 24-37.
Degnan JH, Rosenberg NA: Discordance of Species Trees with Their Most Likely Gene Trees. PLoS Genetics. 2006, 2 (5): 0762-0768. 10.1371/journal.pgen.0020068.
Kubatko LS, Degnan JH: Inconsistency of Phylogenetic Estimates from Concatenated Data under Coalescence. Syst Biol. 2007, 56 (1): 17-24. 10.1080/10635150601146041.
Linder CR, Rieseberg LH: Reconstructing patterns of reticulate evolution in plants. Am J Bot. 2004, 91 (10): 1700-1708. 10.3732/ajb.91.10.1700.
Popp M, Oxelman B: Evolution of a RNA Polymerase Gene Family in Silene (Caryophyllaceae) – Incomplete Concerted Evolution and Topological Congruence Among Paralogues. Syst Biol. 2004, 53 (6): 914-932. 10.1080/10635150490888840.
Sang T: Utility of Low-Copy Nuclear Gene Sequences in Plant Phylogenetics. Crit Rev Biochem Mol Biol. 2002, 37 (3): 121-147. 10.1080/10409230290771474.
Small RL, Cronn RC, Wendel JF: L.A.S. Johnson Review No. 2: Use of nuclear genes for phylogeny reconstruction in plants. Austral Syst Bot. 2004, 17: 145-170. 10.1071/SB03015.
Martin A: The phylogenetic placement of Chondrichthyes: inferences from analysis of multiple genes and implications for comparative studies. Genetica. 2001, 111: 349-357. 10.1023/A:1013747532647.
Elder JF, Turner BJ: Concerted Evolution of Repetitive DNA Sequences in Eukaryotes. Q Rev Biol. 1995, 70 (3): 297-320. 10.1086/419073.
Zimmer EA, Martin SL, Beverley SM, Kan YW, Wilson AC: Rapid duplication and loss of genes coding for the α chains of hemoglobin. Proc Natl Acad Sci USA. 1980, 77 (4): 2158-2162. 10.1073/pnas.77.4.2158.
Sonnhammer ELL, Koonin EV: Orthology, paralogy and proposed classification for paralog subtypes. Trends Genet. 2002, 18 (12): 619-620. 10.1016/S0168-9525(02)02793-2.
Southern EM: Detection of specific sequences among DNA fragments separated by gel electrophoresis. J Mol Biol. 1975, 98 (3): 503-517. 10.1016/S0022-2836(75)80083-0.
Chiu JC, Lee EK, Egan MG, Sarkar IN, Coruzzi GM, DeSalle R: OrthologID: automation of genome-scale ortholog identification within a parsimony framework. Bioinformatics (Oxford, England). 2006, 22 (6): 699-707. 10.1093/bioinformatics/btk040.
Wagner A, Blackstone N, Cartwright P, Dick M, Misof B, Snow P, Wagner GP, Bartels J, Murtha M, Pendleton J: Surveys of Gene Families Using Polymerase Chain Reaction: PCR Selection and PCR Drift. Syst Biol. 1994, 43 (2): 250-261. 10.2307/2413465.
Frajman B, Eggens F, Oxelman B: Hybrid origins and homoploid reticulate evolution within Heliosperma (Sileneae, Caryophyllaceae) – a multigene phylogenetic approach with relative dating. Taxonomy and Reticulate Phylogeny of Heliosperma and Related Genera (Sileneae, Caryophyllaceae). PhD Thesis. Edited by: Frajman B. 2007, Uppsala University
Delichère C, Veuskens J, Hernould M, Barbacar N, Mouras A, Negrutiu I, Monéger F: SlY1, the first active gene cloned from a plant Y chromosome, encodes a WD-repeat protein. EMBO J. 1999, 18 (15): 4169-4179. 10.1093/emboj/18.15.4169.
Filatov DA, Monéger F, Negrutiu I, Charlesworth D: Low variability in a Y-linked plant gene and its implications for Y-chromosome evolution. Nature. 2000, 404: 388-390. 10.1038/35006057.
Filatov DA, Laporte V, Vitte C, Charlesworth D: DNA Diversity in Sex-Linked and Autosomal Genes of the Plant Species Silene latifolia and Silene dioica. Mol Biol Evol. 2001, 18: 1442-1454.
Nicolas M, Marais G, Hykelova V, Janousek B, Laporte V, Vyskot B, Mouchiroud D, Negrutiu I, Charlesworth D, Monéger F: A gradual process of recombination restriction in the evolutionary history of the sex chromosomes in dioecious plants. PLoS Biol. 2005, 3 (1): e4. 10.1371/journal.pbio.0030004.
Filatov DA, Charlesworth D: Substitution Rates in the X- and Y-Linked Genes of the Plants, Silene latifolia and S. dioica. Mol Biol Evol. 2002, 19 (6): 898-907.
Atanassov I, Delichère C, Filatov DA, Charlesworth D, Negrutiu I, Monéger F: Analysis and Evolution of Two Functional Y-Linked Loci in a Plant Sex Chromosome System. Mol Biol Evol. 2001, 18 (12): 2162-2168.
Desfeux C, Lejeune B: Systematics of Euromediterranean Silene (Caryophyllaceae): evidence from a phylogenetic analysis using ITS sequences. C R Acad Sci III. 1996, 319: 351-358.
Oxelman B, Lidén M: Generic boundaries in the tribe Sileneae (Caryophyllaceae) as inferred from nuclear rDNA sequences. Taxon. 1995, 44: 525-542. 10.2307/1223498.
Oxelman B, Lidén M, Rabeler RK, Popp M: A revised generic classification of the tribe Sileneae (Caryophyllaceae). Nordic J Bot. 2001, 20: 743-748. 10.1111/j.1756-1051.2000.tb00760.x.
Oxelman B, Lidén M, Berglund D: Chloroplast rps16 intron phylogeny of the tribe Sileneae (Caryophyllaceae). Pl Syst Evol. 1997, 206: 393-410. 10.1007/BF00987959.
Erixon P, Oxelman B: Reticulate or treelike chloroplast DNA evolution in Sileneae (Caryophyllaceae)?. Mol Phylogenet Evol. 2008, 48: 313-325. 10.1016/j.ympev.2008.04.015.
Greuter W: Silene (Caryophyllaceae) in Greece: a subgeneric and sectional classification. Taxon. 1995, 44: 543-581. 10.2307/1223499.
Sandbrink JM, Geurts LJNM, Gadella TWJ, van Brederode J: Chloroplast DNA variation in Silene section Elisanthe Demonstrates Silene noctiflora L. is not Properly Classified. Biochem Syst Ecol. 1989, 17 (7/8): 539-549. 10.1016/0305-1978(89)90097-5.
Bradley RD, Hillis DM: Recombinant DNA Sequences Generated by PCR Amplification. Mol Biol Evol. 1997, 14 (5): 592-593.
Judo MSB, Wedel AB, Wilson C: Stimulation and suppression of PCR-mediated recombination. Nucleic Acids Res. 1998, 26 (7): 1819-1825. 10.1093/nar/26.7.1819.
Saiki RK, Gelfand DH, Stoffel S, Scharf SJ, Higuchi R, Horn GT, Mullis KB, Erlich HA: Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science. 1988, 239 (4839): 487-491. 10.1126/science.2448875.
Yang Z: Computational Molecular Evolution. 2006, Oxford: Oxford University Press
Richardson AO, Palmer JD: Horizontal gene transfer in plants. J Exp Bot. 2007, 58 (1): 1-9. 10.1093/jxb/erl148.
Diao X, Freeling M, Lisch D: Horizontal Transfer of a Plant Transposon. PLoS Biol. 2006, 4 (1): e5. 10.1371/journal.pbio.0040005.
Ghatnekar L, Jaarola M, Bengtsson BO: The introgression of a functional nuclear gene from Poa to Festuca ovina. Proc Roy Soc B. 2006, 273: 395-399. 10.1098/rspb.2005.3355.
Joly S, Starr JR, Lewis WH, Bruneau A: Polyploid and Hybrid Evolution in Roses East of the Rocky Mountains. Am J Bot. 2006, 93 (3): 412-425. 10.3732/ajb.93.3.412.
Chater AO, Walters SM, Akeroyd JR: Silene. Flora Europaea. Edited by: Tutin TG, Burges NA, Chater AO, Edmondson JR, Heywood VH, Moore DM, Valentine DH, Walters SM, Webb DA. 1993, Cambridge: Cambridge University Press, 1: 191-218. 2nd edition.
Degraeve N: Etude de diverses particularités caryotypiques des genres Silene, Lychnis et Melandrium. Bol Soc Brot Ser 2. 1980, 53: 595-643.
Staden R: The Staden Sequence Analysis Package. Mol Biotechnol. 1996, 5: 233-241. 10.1007/BF02900361.
Phred, Phrap, Consed. [http://www.phrap.org]
Müller J, Müller K: QuickAlign: a new alignment editor. Plant Molecular Biology Reporter. 2003, 21: 5. 10.1007/BF02773390.
Simmons MP, Ochoterena H: Gaps as characters in sequence-based phylogenetic analyses. Syst Biol. 2000, 49 (2): 369-381. 10.1080/10635159950173889.
Müller K: SeqState – primer design and sequence statistics for phylogenetic DNA data sets. Appl Bioinformatics. 2005, 4: 65-69. 10.2165/00822942-200504010-00008.
Kosakovsky Pond SL, Posada D, Gravenor MB, Woelk CH, Frost SDW: Automated Phylogenetic Detection of Recombination Using a Genetic Algorithm. Mol Biol Evol. 2006, 23 (10): 1891-1901. 10.1093/molbev/msl051.
Datamonkey.org resource for recombination analysis. [http://www.datamonkey.org/GARD/]
Drummond AJ, Rambaut A: BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol. 2007, 7: 214. 10.1186/1471-2148-7-214.
Drummond AJ, Ho SYW, Phillips MJ, Rambaut A: Relaxed Phylogenetics and Dating with Confidence. PLoS Biol. 2006, 4 (5): e88. 10.1371/journal.pbio.0040088.
Nylander JAA: MrModeltest v2. [http://www.abc.se/~nylander/]
Rambaut A, Drummond AJ: Tracer v1.4. [http://beast.bio.ed.ac.uk/Tracer]
Rambaut A: FigTree. [http://tree.bio.ed.ac.uk/software/figtree/]
Swofford DL: PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. 2002, Sunderland, Massachusetts: Sinauer Associates
Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol. 2007, 24: 1596-1599. 10.1093/molbev/msm092.
Holmgren PK, Holmgren NH: Index Herbariorum: A global directory of public herbaria and associated staff. New York Botanical Garden's Virtual Herbarium, [http://sweetgum.nybg.org/ih/]
Sileneae Database. [http://www.sileneae.info]
Råde L, Westergren B: Beta mathematics handbook: concepts, theorems, methods, algorithms, formulas, graphs, tables. 1990, Lund: Studentlitteratur; Bromley: Chartwell-Bratt
The Swedish Research Council for Environment, Agricultural Sciences and Spatial Planning (FORMAS), The Royal Physiographic Society in Lund, Helge Ax:son Johnssons stiftelse, Royal Swedish Academy of Sciences, Swedish Research Council, Liljewalchs resestipendium, Linné-stipendiestiftelsen, and Wallenbergstiftelsen gave financial support to AR and BO. Per Erixon and Douglas Stone helped in the lab. Mike Steel gave valuable input on the probabilistic approach. Two anonymous reviewers gave constructive criticism on an earlier version of this paper and helped to improve its clarity substantially.
AR carried out the molecular genetic studies, sequence alignment, phylogenetic analyses and drafted the manuscript. DF supplied preliminary data and helped to draft the manuscript. BS performed the statistical analysis and drafted the statistical parts of the manuscript. NH carried out parts of the molecular genetic studies (primer design, cloning) in cooperation with AR. BO conceived of the study, participated in its design and coordination and helped to draft the manuscript. All authors read and approved the final manuscript.