In this paper we examined patterns of molecular evolution in three paralogous gene pairs in order to detect signatures of post-duplication functional divergence. We chose a time scale at which natural selection can be analysed through patterns of nucleotide substitution in protein-coding sequences. With that aim, we focused on three sets of paralogs from the Medicago truncatula genome: Lax2/Lax4, Pg3/Pg11 and Pg11a/Pg11c. The duplications leading to these sets of paralogs occurred before the radiation of the 17 species studied but are still recent, as the three pairs Lax2/Lax4, Pg3/Pg11 and Pg11a/Pg11c still exhibit 83, 72 and 88% nucleotide identity, respectively. Furthermore, we selected genes that are putatively involved in symbiotic functions, because interspecific interactions can involve both the evolution of novelty (especially in the legume-rhizobium symbiosis, which evolved relatively recently) and coevolutionary processes that are detectable through signatures of positive selection.
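Nucleotide identity between paralogs is simply the proportion of matching positions in their pairwise alignment. The sketch below shows one way to compute it; the function name `percent_identity` and the convention of excluding gapped columns are our assumptions, not necessarily the procedure used to obtain the 83, 72 and 88% values reported above.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity between two aligned nucleotide sequences.

    Assumption: columns where either sequence carries a gap are excluded
    from both the numerator and the denominator.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns (one possible convention)
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment, not data from this study.
print(percent_identity("ATGGCT-AA", "ATGACTGAA"))
```

Other conventions (for example, counting gaps as mismatches) give lower values, so identities are only comparable when computed the same way.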
Models describing the evolutionary fate of duplicated genes once the duplication is fixed in the species predict different selective pressures. First, under the neofunctionalization model, i.e. the evolution of a new function through functional divergence of one of the duplicated copies, selective pressures are expected to be asymmetrical between paralogs. The copy fulfilling the ancestral function is expected to remain under purifying selection, while the other copy is expected to experience a short period of relaxed constraint followed by positive selection driving the acquisition of its new function. Second, the subfunctionalization model envisions the fixation of complementary degenerative mutations. Under this model, purifying selection is expected to be relaxed during the period of functional redundancy, which may allow the fixation of at least two complementary degenerative mutations (one in each gene). Once both copies are jointly required to fulfil the ancestral gene function, purifying selection is expected to prevail in order to maintain both copies. Although both models have been functionally validated, they are not mutually exclusive, and more complex scenarios combining these steps have been proposed [15, 25].
For all three paralogous gene pairs studied, the two copies exhibit different regimes of selection. This result suggests that these paralogous gene pairs have undergone at least some functional differentiation. Three different tests were used to characterize the selective pressures acting on the paralogs. The first contrasted the average ω (dN/dS) between paralog clades of the phylogeny and yielded significant differences only for Pg11/Pg3 (Table 1). The second test is specifically designed to detect positive selection affecting only a few sites of the sequence. We found signatures of positive selection in both Pg11a and Pg11c, and in Pg11 (Table 2). Finally, the clade model (Table 3) combines branch and site models; it specifically tests for sites evolving under divergent selective pressures between paralogous genes and estimates their proportion. Clade model D (MD) detected a significant increase of ω in Pg11, attributable to the positive selection also detected by site model M8. For the paralogous pair Pg11a/Pg11c, branch models failed to detect any difference in selective pressure. Model D, which is more detailed, shows that sites under positive selection actually experience stronger positive selection in Pg11c than in Pg11a. Lax2 is under strong purifying selection, whereas about 20% of Lax4 sites evolve nearly neutrally (ω = 0.97). Because each test addresses a single hypothesis, combining these complementary tests provides a more complete picture of the selective pressures acting on each set of paralogs. Of these, the clade model, which accounts for variation of ω both among branches and among amino acid positions, appears the most informative for characterizing changes in selective constraint during the evolution of duplicated genes. Its only drawback is that it does not formally test for positive selection.
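Branch, site and clade model comparisons such as those above are nested-model likelihood ratio tests: twice the difference in log-likelihood between the alternative and the null model is compared with a χ² distribution. Programs such as codeml (PAML) report the log-likelihoods; the minimal pure-Python sketch below only performs the final test. The log-likelihood values are hypothetical, the function names are our own, and the degrees of freedom must match the model pair being compared (2 is commonly used for M7 vs M8).

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with
    integer df, computed as the regularized upper incomplete gamma Q(df/2, x/2)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df = 2n: Q(n, y) = exp(-y) * sum_{j=0}^{n-1} y^j / j!
        return math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(df // 2))
    # Odd df = 2n + 1: Q(n + 1/2, y) = erfc(sqrt(y)) + sum_{j=0}^{n-1} y^(j+1/2) e^(-y) / Gamma(j + 3/2)
    total = math.erfc(math.sqrt(y))
    for j in range(df // 2):
        total += y ** (j + 0.5) * math.exp(-y) / math.gamma(j + 1.5)
    return total


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int) -> tuple[float, float]:
    """Return (2 * delta lnL, p-value) for a pair of nested models."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))  # negative values mean no improvement
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods for a null (e.g. M7) and an alternative
# (e.g. M8) model; these are NOT values from Tables 1-3.
stat, p = likelihood_ratio_test(lnl_null=-2510.4, lnl_alt=-2503.1, df=2)
print(f"2dlnL = {stat:.2f}, p = {p:.4f}")
```

Some comparisons, in which a parameter lies on the boundary of its range under the null model, call for a mixture of χ² distributions rather than the plain χ² used here.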
We observed that the Pg3 and Pg11c gene copies were pseudogenes in several species: in M. littoralis and M. tricycla for Pg3, and in M. tornata, M. rigidula and M. polymorpha for Pg11c. Since the three genes are present and potentially functional in at least four other species among those studied, we hypothesise that the mutations disrupting the function of these gene copies occurred, in some lineages, after the two successive rounds of duplication that produced the three copies. This observation suggests that redundancy between copies was sufficient to allow the loss of one copy in several species.
Functional redundancy generated by multiple copies also implies periods of relaxed selective pressure, unless the duplication itself is advantageous, as in the case of a positive dosage effect of copy number. Redundancy is more likely to persist when divergence between copies is slowed, as occurs with gene conversion. The phylogenetic mispositioning we observed for four gene copies (Figure 1) may be explained by gene conversion. One way to test this hypothesis would be to sequence additional individuals of, for example, M. tornata and look for polymorphisms shared between copies, which are a signature of gene conversion.
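The proposed test for gene conversion can be made concrete: given several individuals sequenced for each paralog and aligned to common columns, one looks for columns that are polymorphic in both copies for the same alleles. The sketch below uses a deliberately simple criterion (at least two alleles shared between two polymorphic copies); the input format, the function name and that criterion are our assumptions, and a real analysis would also have to consider ancestral polymorphism and sequencing error.

```python
def shared_polymorphisms(copy_a: list[str], copy_b: list[str]) -> list[int]:
    """Return 0-based alignment columns that are polymorphic in both paralogs
    and share at least two alleles between them.

    copy_a, copy_b: sequences from several individuals for each paralog,
    all aligned to the same columns (hypothetical input format).
    Gaps ('-') and ambiguous bases ('N') are ignored.
    """
    sequences = copy_a + copy_b
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must share one alignment")
    ignore = {"-", "N"}
    shared = []
    for col in range(length):
        alleles_a = {s[col].upper() for s in copy_a} - ignore
        alleles_b = {s[col].upper() for s in copy_b} - ignore
        both_polymorphic = len(alleles_a) > 1 and len(alleles_b) > 1
        if both_polymorphic and len(alleles_a & alleles_b) >= 2:
            shared.append(col)
    return shared


# Hypothetical toy data: three individuals per paralog.
copy_a = ["ACGTA", "ACGTG", "ATGTA"]
copy_b = ["ACCTA", "ACCTG", "ACCTA"]
print(shared_polymorphisms(copy_a, copy_b))
```

In this toy example, column 1 is polymorphic only in the first copy and therefore is not reported, whereas column 4 segregates for A and G in both copies.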
We detected sites under positive selection in Pg11 but not in Pg3. Rodriguez-Llorente et al. suggested that Pg3 had been recruited into symbiosis after the duplication of an ancestral pollen-specific gene, with modifications occurring mainly in the promoter region. Our results show that positive selection targeted both copies of Pg11 independently, possibly indicating the evolution of novel gene functions. The polygalacturonase family contains members in organisms as distantly related as plants and eubacteria. In plants, this gene family has expanded dramatically through whole-genome, segmental and tandem duplications (66 and 59 copies in Arabidopsis thaliana and rice, respectively). This extensive expansion, generating periods of high redundancy, was probably accompanied by pseudogenization events similar to those we detected in the genus Medicago. However, because expression patterns differ among members of the family, subfunctionalization probably contributed to the notably high retention rate of functional genes in this family. Functional divergence among members of large gene families may also be driven by positive selection; prominent examples in plants include disease resistance genes, transcription factors and genes involved in development. In our study, positive selection is detected in Pg11 as a result of the cumulative effects of positive selection in both Pg11a and Pg11c, which arose from the more recent of the two Pg duplications. This mode of selection corresponds to neither neofunctionalization nor subfunctionalization in their strict definitions: subfunctionalization does not predict positive selection in either copy, while neofunctionalization predicts positive selection in only one copy (if detectable).
Both copies could be under positive selection because they inherited from the ancestral Pg11 gene functions that involve a regime of positive selection. Alternatively, neofunctionalization could involve adaptive differentiation of both copies to avoid functional overlap. Selection targets different sites in Pg11a and Pg11c, and the strength of positive selection differs between them (Table 3); this observation is compatible with both hypotheses.
According to the clade models, the paralogs Lax2 and Lax4 experience different modes of selection, although both genes are mainly under purifying selection. Interestingly, no pseudogenes were detected for Lax2 or Lax4. The redundancy stage that followed the duplication generating Lax2 and Lax4 is no longer detectable and may have been shorter than in the Pg gene family. However, Lax4 appeared to be slightly, but significantly, less constrained than Lax2: according to the clade models (with 2 or 3 site classes; Table 3), about 20% of Lax4 sites show relaxed constraint relative to Lax2. This suggests either that Lax4 acquired a function imposing weaker functional constraints, or that both genes underwent subfunctionalization in such a way that the Lax4 protein sequence became less constrained. The precise functions of Lax2 and Lax4 are currently unknown. Both paralogs are expressed in shoots and roots of nodulating M. truncatula plants. Lax2 is also found in Expressed Sequence Tag (EST) libraries built from different tissues (2 in early seed development; 2 in flowers, early seeds, late seeds and stems; 2 in mixed roots and nodules; 1 in nematode-infected roots, developing flowers and phosphate-starved leaves), whereas Lax4 is absent from EST libraries.
The models contrasting ω among branches allowed us to test for the transient relaxation of purifying selection predicted to occur immediately after duplication. A significant increase of ω was detected in basal branches of the Lax2/Lax4 phylogeny. The opposite trend was observed for the Pg11/Pg3 pair, where purifying selection actually appeared weaker in late branches than in early branches, particularly for Pg11 (ω = 0.44). However, the ω estimated for late branches was probably inflated by positive selection in Pg11, because branch models average ω over all sites.
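Why a few positively selected sites can raise a branch-level ω can be seen with a mixture of site classes. As a first approximation, assuming a synonymous rate that is constant across sites, the single ω of a branch model behaves like the proportion-weighted mean of the site-class ω values. The sketch below uses hypothetical proportions and ω values, not estimates from this study, and the function name is our own.

```python
def site_averaged_omega(site_classes: list[tuple[float, float]]) -> float:
    """Proportion-weighted mean of omega over site classes.

    site_classes: (proportion, omega) pairs whose proportions sum to 1.
    Approximation: assumes a constant synonymous rate across sites, so a
    single branch-level omega is treated as sum(p_i * omega_i).
    """
    total_p = sum(p for p, _ in site_classes)
    if abs(total_p - 1.0) > 1e-6:
        raise ValueError("site-class proportions must sum to 1")
    return sum(p * w for p, w in site_classes)


# Hypothetical site classes, not estimates from Tables 1-3.
conserved_only = [(1.0, 0.10)]
with_selected_sites = [(0.90, 0.10), (0.10, 3.0)]  # 10% of sites with omega > 1
print(site_averaged_omega(conserved_only))
print(site_averaged_omega(with_selected_sites))
```

Here 10% of sites under positive selection raise the averaged ω from 0.10 to about 0.39, even though 90% of sites remain under strong purifying selection; a branch model would read this as weaker purifying selection on the whole gene.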