Diversification and adaptive sequence evolution of Caenorhabditis lysozymes (Nematoda: Rhabditidae)

Background Lysozymes are important model enzymes in biomedical research with a ubiquitous taxonomic distribution ranging from phages up to plants and animals. Their main function appears to be defence against pathogens, although some of them have also been implicated in digestion. Whereas most organisms have only few lysozyme genes, nematodes of the genus Caenorhabditis possess a surprisingly large repertoire of up to 15 genes. Results We used phylogenetic inference and sequence analysis tools to assess the evolution of lysozymes from three congeneric nematode species, Caenorhabditis elegans, C. briggsae, and C. remanei. Their lysozymes fall into three distinct clades, one belonging to the invertebrate-type and the other two to the protist-type lysozymes. Their diversification is characterised by (i) ancestral gene duplications preceding species separation followed by maintenance of genes, (ii) ancestral duplications followed by gene loss in some of the species, and (iii) recent duplications after divergence of species. Both ancestral and recent gene duplications are associated in several cases with signatures of adaptive sequence evolution, indicating that diversifying selection contributed to lysozyme differentiation. Current data strongly suggests that genetic diversity translates into functional diversity. Conclusion Gene duplications are a major source of evolutionary innovation. Our analysis provides an evolutionary framework for understanding the diversification of lysozymes through gene duplication and subsequent differentiation. This information is expected to be of major value in future analysis of lysozyme function and in studies of the dynamics of evolution by gene duplication.


Background
Since their discovery by Ian Fleming, lysozymes have become an important model system in molecular biology, biochemistry, and structural biology with major biomedical importance [1]. They are ubiquitous enzymes known from almost all groups of organisms including phages, bacteria, protists, fungi, animals, and plants [2][3][4][5][6][7]. Several distinct lysozyme types are recognised, including the chicken-type, goose-type, invertebrate-type, or amoeba lysozymes [2,7,8]. Because of their ability to break up peptidoglycan (an important component of bacterial cell walls) and their induced expression upon pathogen exposure, their original function was suggested to be defence against bacterial infections. At the same time, some lys-ozymes are involved in digestion. This function is found in vertebrate and insect taxa, which obtain nutrition from microorganisms involved in decomposing organic matter, e.g. the vertebrate foregut fermenters like ruminant artiodactyls, leaf-eating monkeys, the bird hoatzin, and the Drosophila and Musca flies [5,[9][10][11].
Lysozymes have additionally become an important model in studies of molecular evolution. The origin of a digestive function in the leaf-eating monkeys was found to show the characteristic signature of adaptive sequence evolution, i.e. the non-synonymous substitution rate was significantly larger than the synonymous substitution rate, strongly indicating that amino acid-changing mutations were favoured by natural selection [12,13]. Gene duplication appears to play an important role in lysozyme evolution. Impressive examples include the ruminant artiodactyls with at least seven genes per genome [10], Drosophila fruitlies with at least eleven loci [5,14], and the mosquito Anopheles gambiae with at least nine lysozymes [15]. In these examples, some lysozymes have a digestive function. Functional diversification is further indicated by variation in gene expression pattern (e.g., timing, tissue, expression level) and several biochemical characteristics. For instance, the digestive lysozymes differ from the antimicrobial lysozymes by an increased expression in the gut, their resistance to protease degradation, an acidic isoelectric point and pH optimum [5,9]. Taken together, these patterns are consistent with the specific role of gene duplication as a source of evolutionary innovation [16], as known for diverse gene families like the animal hox and the vertebrate MHC genes [17,18].
An unexpected diversity of lysozymes is found in nematodes of the genus Caenorhabditis. They contain up to 15 different lysozymes of two distinct types [19,20]: the invertebrate-type and another distinct type that is charac-terized by lysozymes from various protist taxa (hereafter termed protist-type lysozymes). Although the exact function of these enzymes has not as yet been assessed systematically, some of them are involved in pathogen defence [19][20][21]. In the current paper, we provide a framework for understanding diversification of the Caenorhabditis lysozymes. In particular, we explore the lysozyme genealogy and test the hypothesis that gene duplications associate with diversifying selection, as expected for a role in immunity against the usually rapidly evolving repertoire of pathogens. Lysozyme sequences are considered from the three Caenorhabditis species with completely sequenced genomes, i.e. C. elegans, C. briggsae, and C. remanei [22]. Their genealogies are reconstructed at both protein and DNA sequence level with the help of maximum likelihood (ML) tree inference methods [23]. Signatures of positive selection are assessed across branches of the inferred genealogy and across the aligned sequences with the help of the maximum likelihood approach developed by Ziheng Yang and co-workers [24,25]. The results are related to the current data on lysozyme function.

Overview and general phylogenetic position of the Caenorhabditis lysozymes
The lysozymes from the three Caenorhabditis species are listed in Table 1 and 2. The genomic distribution of clustered genes is illustrated in Fig. 1. As a first step, we compared all complete lysozyme protein sequences from C. elegans with those from various vertebrates, invertebrates, protists, and one phage. For this purpose, a multiple sequence alignment was generated based on a hierarchical method, i.e. similar sequences are aligned first, followed by alignment of less similar sequences (see methods). We noted that the resulting alignment almost exclusively contained variable positions. Moreover, if we varied the settings of the alignment algorithm (e.g. gap opening, gap extension, or gap distance penalties), only few regions could be recovered in identical form. Therefore, positional homology may not be entirely reliable. Since the alignment was inferred from the hierarchical algorithm, it should still be informative as to the general phylogenetic position of the nematode lysozymes. In fact, its phylogenetic analysis highlighted that the nematode possesses two distinct lysozyme types ( Fig. 2; inferred from the alignment obtained using default settings of the alignment programme), thus confirming previous observations. In particular, five C. elegans lysozymes group with the invertebrate-type lysozymes. These lysozyme genes are thus labelled Cel-ilys-1 up to Cel-ilys-5. The remaining ten C. elegans lysozymes are the previously labelled genes lys-1 up to lys-10. They fall into two separate lineages within the distinct clade of protist lysozymes.
For the more detailed phylogenetic analyses, we examined the two lysozyme types separately. For this purpose, we generated two new alignments and several subsets of these (see methods and below).

Evolution of invertebrate-type lysozymes
The genomes of C. briggsae and C. remanei contain two and three invertebrate-type lysozyme genes, respectively.
They were named in consideration of their similarity and phylogenetic affinity to the C. elegans lysozymes (see below; Table 1). Three of the five invertebrate-type lysozymes from C. elegans are found in a single cluster and with the same orientation on chromosome IV (Fig. 1A). None of the other genes are present in clusters (Table 1). All invertebrate-type lysozymes could be reliably aligned to each other at both protein (alignment 2; Fig. 3) and DNA sequence level (alignment 3; position of indels is identical between the two alignments). Two genes show unusual properties in comparison to the others and thus they may be non-functional (i.e. they are pseudogenes). In particular, the gene Cbr-ilys-4 possesses an unusual amino terminus and it lacks a signal peptide. Cel-ilys-1 contains a large insertion, it shows many nucleotide differences to the other sequences, and it also lacks a signal peptide.
Phylogenetic analysis of protein and DNA sequences yielded essentially identical tree topologies (Fig. 4A). The only two differences refer to (i) the exact position of Celilys-1, Cel-ilys-2, and Cel-ilys-3 in relation to each other, and (ii) the position of Cel-ilys-4 and Cbr-ilys-4 in relation to the monophylum of Cre-ilys-4.1 and Cre-ilys-4.2. These discrepancies are reflected by low bootstrap support for the respective branches in both protein and DNA trees, indicating lack of sufficient unambiguous phylogenetic information in the sequences. Otherwise, the inferred genealogy identifies two distinct clades, one with the ilys-4 genes and the other with all remaining genes. Both clades contain genes from all three taxa.
Our analysis did not reveal any indication for adaptive sequence evolution across sequences (likelihood ratio test [LRT] comparison between model M8 with either model M7 or M8a, P ≥ 0.5). However, we consistently identified two episodes of positive selection along the phylogeny, regardless of the analysis method (Table 3; Fig. 4). In both cases, adaptive sequence evolution associates with incidences of intra-lineage lysozyme radiations (in one case within the C. remanei and the other case within the C. elegans lineage; Fig. 4A). Most of the remaining branches have a d N /d S rate ratio well below 1, suggesting purifying selection (i.e. amino acid changes are selectively disfavoured).

Evolution of protist-type lysozymes
The protist-type lysozymes are present with either seven (both C. briggsae and C. remanei) or ten genes (C. elegans; Table 2). Synteny is found for the genes lys-1, lys-2, and lys-3, which are clustered in all three species -in both C. elegans and C. briggsae on chromosome V and in C. remanei on supercontig 13 ( Table 2; Fig. 1B). The gene lys-1 is always found in opposite orientation to the other two. In C. remanei, the lys-3 homologue is separated from the Genomic distribution of the lysozyme genes on A, chromosome IV of C. elegans and C. briggsae, and B, chromosome V of C. ele-gans and C. briggsae and supercontig (sctg) 13 of C. remanei Phylogenetic relationships between the C. elegans lysozymes (red labels) and the invertebrate-type (blue labels), protist-type (green labels), and also c-type, g-type, and phage-type lysozymes (all black labels) Figure 2 Phylogenetic relationships between the C. elegans lysozymes (red labels) and the invertebrate-type (blue labels), protist-type (green labels), and also c-type, g-type, and phage-type lysozymes (all black labels). The tree was reconstructed from amino acid sequences using maximum likelihood. Branches are drawn in proportion to the inferred number of substitutions per site (see bar in bottom left corner). Bootstrap support from 200 replicate data sets is indicated next to branches. Only values larger than 50 are given. Branches interrupted by two slashes were shortened. The unrooted topology is shown, since the position of a possible root is unknown.
Alignment of the Caenorhabditis invertebrate-type lysozyme amino acid sequences other two genes by approximately 10,000 nucleotides (and four open-reading frames) in contrast to both C. elegans and C. briggsae, where the three genes are directly adjacent to each other. In C. elegans, the lys-7 gene is additionally found on chromosome V, but in a different location than the three clustered genes (Fig. 1B). C. elegans contains a second well-defined cluster of protist-type lysozymes on chromosome IV, including Cel-lys-4, Cel-lys-5, and Cel-lys-6. In this case, there is no synteny in the other species. Interestingly, however, the C. briggsae chromosome IV contains a cluster that combines genes from the above C. elegans cluster (in this case the C. briggsae genes Cbr-lys-6.1 and Cbr-lys-6.2) with the gene Cbr-lys-10. The C. elegans orthologue of the latter gene, Cel-lys-10, is similarly present on chromosome IV but in a different location than the cluster (Fig. 1A). In C. remanei, two additional genes are found in relatively close physical proximity to each other: Cre-lys-8.1 and Cre-lys-8.2 are located on supercontig 9 (Table 2)  The overall phylogeny of the protist-type lysozymes from nematodes and one outgroup taxon (Dictyestelium discoideum) was assessed with an alignment of the complete protein sequences (alignment 4; Fig. 5 and Additional file 1). This alignment was robust to variations of the settings of the alignment programme. In contrast, for the corresponding DNA sequences, several regions could not be recovered in identical form under similar variations. Therefore, it cannot be entirely excluded that these regions bear an increased risk of homoplasy. To reduce this risk for the detailed analysis of lysozyme evolution (i.e. inference of non-synonymous and synonymous substitution rates), we extracted five subsets from alignment 4. Of these, alignment 5 consists of the alignable part of all protist-type lysozyme DNA sequences from the Caenorhabdi-Genealogy of the Caenorhabditis invertebrate-type lysozymes, including A, the unrooted tree topology with branch-lengths inferred from DNA sequence analysis, and B, the tree topology with branch-names used in the analysis of positive selection across branches The representation in B serves to illustrate branch-names for the analysis of positive selection; the branch-names are identical to those given in Table 3.
tis nematodes (see vertical black lines with arrows below the alignment in Fig. 5/Additional file 1). Since alignment 5 considered only a comparatively short part of the genes, we additionally analyzed the clade 1 and 2 protist-type lysozymes separately. These separate analyses allowed us to include complete or almost complete genes and thus additional phylogenetic information as contained in the regions excluded in alignment 5. Here, analysis of clade 1 lysozymes was based on the alignable part of the genes (see vertical black lines with arrows above alignment in . Almost all of these differences are again associated with low bootstrap support, suggesting that the available sequences lack sufficient unambiguous phylogenetic information at these two levels. All other relationships were consistently identified, irrespective of the alignment used, indicating the availability of robust phylogenetic information in these cases. The phylogenetic analysis yielded the following information. (i) The protist-type lysozymes fall into two distinct clades (clade 1 and 2), which diverged before separation of the three species (Fig. 6A).
(ii) Within clade 1, four distinct phylogenetic groups are identified (Fig. 7A). Three of them contain one orthologue per species, indicating duplication of genes before species separation. The fourth group includes one gene for C. briggsae, two monophyletic genes for C. elegans, and two monophyletic genes for C. remanei.
(iii) The inferred clade 2 topology shows less hierarchical structure than the clade 1 topology (Fig. 8A). Here, the lys-10 orthologues form a monophyletic group, which is most closely related to Cel-lys-4. The remainder of this clade shows a single gene from C. remanei, two monophyletic genes from C. elegans, and two monophyletic genes from C. briggsae.
The analysis of positive selection across sequence alignments yielded a single significant result. In particular, for the aligned clade 1 coding sequences (alignment 7, see methods and Fig. 5) model M8 differed significantly from both model 7 (LRT, 2ΔL = 8.786, P = 0.012) and model 8a (LRT, 2ΔL = 6.589, adjusted P = 0.005). A single alignment position was found to be subject to adaptive sequence evolution according to the Bayes empirical Bayes method (P = 0.99). The alignment position is found in the middle of the genes and it is highlighted in Fig. 5.
For the other data sets, the comparisons were all insignificant Branch names are as depicted in Fig. 4B. For the first comparison, d N / d S rate ratios were inferred for individual branches with the free-ratio model, in which all branches were allowed to vary; the optimal model had a likelihood of ln L = -2871.29; the significance of individual branches having a d N /d S rate ratios above 1 or below 1 was assessed with non-parametric bootstrapping using 100 replicates; d N /d S rate ratios larger than 1 and with bootstrap support of more than 50 are given in bold. For the second comparison, d N /d S rate ratios were repeatedly inferred with the 2-ratio model, in which only the branch of interest was allowed to differ from the remaining branches; the significance of the individual branches to be different from the remaining branches was assessed via a likelihood ratio test comparison to the null model, in which all branches of the tree were assumed to have identical d N /d S rate ratios; the null model had a likelihood score of ln L = -2895.56; the probability P was calculated from twice the likelihood difference 2ΔL between null model and tested model; significance is indicated by * and ** for α = 0.1 and 0.05 according to the sequential Bonferroni procedure, respectively, and by # and ## for α = 0.1 and 0.05 according to the false discovery rate, respectively; bold d N /d S rate ratios indicate those that are significantly larger than 1 according to either method.
During assessment of adaptive sequence evolution along branches, the different data sets and the two methods for inference of statistical significance produced slightly different results. For instance for the clade 1 sequences, only a single branch was inferred to have a d N /d S rate ratio significantly above 1 by both methods (the branch leading to the lys-1 orthologues). In the analysis of the complete data set (including both clade 1 and 2), the same branch was found to be significant by only one of the two approaches.
In spite of these variations, the results from all data sets and methods, taken together, consistently point to two main tree regions that are likely to be subject to positive selection: (i) the branch leading to the lys-10 orthologues (Tables 4 and 6; Figs. 6 and 8), and (ii) the branches associated with the early radiation of the lys-1, lys-2, and lys-7/ 8 orthologues (Tables 4 and 5; Figs. 6 and 7). The majority of the remaining branches yielded a d N /d S rate ratio that was clearly below 1, indicating purifying selection. Tables 1 and 2 list the characteristics of Caenorhabditis lysozymes, highlighting variation in length, molecular weight, isoelectric point, charge, and the grand average hydropathy. Importantly, the three distinct clades differ Alignment of the Caenorhabditis protist-type lysozyme amino acid sequences Figure 5 Alignment of the Caenorhabditis protist-type lysozyme amino acid sequences. The figure only shows the top quarter of the alignment. The complete alignment is given in Additional file 1. In both cases, black lines at the beginning of the alignment denote the inferred signal peptides. Alignment 4 (see methods and results) includes all taxa and the entire protein sequences. Vertical black lines with arrows below the alignment indicate the regions used for specific DNA sequence analysis of all protist-type lysozymes (alignment 5). Vertical black lines with arrows above the alignment indicate those regions analyzed for the clade 1 lysozymes (alignments 6 and 7 for protein and DNA sequences, respectively). Clade 2 lysozyme analysis was based on complete sequences (alignments 8 and 9 for protein and DNA sequences, respectively). Note that all alignments are subsets of alignment 4, i.e. the position of indels is identical. The red box and arrow indicate the sequence position, which was inferred to be under positive selection for the clade 1 lysozymes.

Characteristics and function of the different lysozymes
significantly in all of these traits with the exception of charge ( Table 7). The most pronounced differences are found between the clade 1 protist-type lysozymes and the invertebrate-type lysozymes (posthoc tests in Table 7). Although the two protist-type lysozyme clades are generally more similar to each other, they do show some variation, especially regarding length and weight.
For the C. elegans lysozymes the current knowledge on the site of gene expression and the role in immune defence is summarized in Table 8. All genes, for which data is available, appear to be expressed in the intestines. Some are additionally expressed in neurons (Cel-lys-1), larval muscles (Cel-lys-7), or the pharynx (Cel-lys-8). The data on immune function highlights clear differences between the three clades. The most pronounced effect is seen for pathogen-induced gene expression. It was reported for all of the clade 1 protist-type genes. Within this clade, individual genes vary as to their response to different pathogens (Table 8). In contrast, both the clade 2 protist-type and the invertebrate-type genes show considerably fewer cases of pathogen-activation, and at the same time, several cases of pathogen-suppression ( Table 8). The above pattern is generally confirmed by the current data on lysozyme regulation through known components of the C. elegans immune system ( Table 8). The clade 1 protist-type genes generally appear to be under positive control of the immune system. At the same time, they show variation as to the importance of different regulatory factors. In contrast, the other two clades rather appear to be under negative influence of immunity pathways (Table 8).

Evolution of Caenorhabditis lysozymes
Caenorhabditis nematodes are among the organisms with the highest number and the most extreme diversity of lysozyme genes. Their lysozymes fall into three distinct clades, one being part of the invertebrate-type and the other two of the evolutionary very distant protist-type lysozymes. Moreover, the Cel-lys-9 gene from C. elegans, which undoubtedly belongs to the protist-type lysozymes (Fig. 2), shows only limited similarities to the other nematode genes and it may thus represent a class of its own. To date, it is impossible to say whether the invertebrate-Genealogy of all Caenorhabditis protist-type lysozymes, including A, the unrooted tree topology with branch-lengths inferred from protein sequence analysis, and B, the tree topology with branch-names used in the analysis of positive selection across branches Figure 6 Genealogy of all Caenorhabditis protist-type lysozymes, including A, the unrooted tree topology with branchlengths inferred from protein sequence analysis, and B, the tree topology with branch-names used in the analysis of positive selection across branches. The branch-names in B are identical to those given in Table 4. All other information as in Fig. 4.
type and the protist-type lysozymes evolved from a common ancestor or not. In the latter case, their general similarity as lysozymes would be a consequence of convergent evolution towards a similar function in defence or digestion. Additional data from more basal nematode as well as metazoan taxa (e.g. cnidarians, poriferans, platyhelminths) is required to distinguish between these alternatives.
Some of the Caenorhabditis lysozyme genes are found in clusters within the genome, as known for about one fifth of the protein-coding genes of C. elegans and apparently characteristic for genes involved in interactions with the environment [26]. Thus, lysozymes may be subject to similar evolutionary dynamics recently described for several of the clustered gene families [27]. These clustered gene families are most likely shaped by concerted molecular evolution. They are characterized by species-specific clades of the gene clusters, the presence of inverted genes that have been proposed to stabilize concerted evolution of clusters over time, and strong purifying selection [27]. However, the inferred evolutionary history of lysozyme clearly contrasts with such patterns. Genes in close genomic proximity do not form species-specific phylogenetic clades. None of the genomic lysozyme clusters con-tain "stabilizing" genes with inverted orientation in the middle of the cluster. Furthermore, although the majority of genes appears to be subject to purifying selection, we did obtain a strong indication for several episodes of diversifying selection.
We conclude that the lysozymes follow a different evolutionary trajectory. Our analysis reveals three main patterns.
(i) Gene duplication prior to species separation and maintenance of the duplicated genes. This scenario is most evident where lysozyme orthologues are monophyletic and distributed in synteny across genomes in all three taxa, e.g. the protist-type lys-1, lys-2, and lys-3 genes. Other likely cases are the protist-type lys-6, lys-8, lys-10, and the invertebrate-type ilys-4 and ilys-5 genes, for which corresponding orthologues fall into monophyletic clades. In all these cases, the orthologous genes must have an age of at least three million years, which is the minimum time since the last most common ancestor of the three Caenorhabditis species [28]. Their maintenance across time suggests an important conserved biological role for each group of orthologues. In this case, their original divergence after gene duplication may have been favoured by diversifying Genealogy of the protist-type clade 1 lysozymes, including A, the unrooted tree topology with branch-lengths inferred from DNA sequence analysis, and B, the tree topology with branch-names used in the analysis of positive selection across branches Figure 7 Genealogy of the protist-type clade 1 lysozymes, including A, the unrooted tree topology with branch-lengths inferred from DNA sequence analysis, and B, the tree topology with branch-names used in the analysis of positive selection across branches. The branch-names in B are identical to those given in Table 5. All other information as in Fig. 4.
selection and thus, it may associate with signatures of adaptive sequence evolution. Such a signature is indeed found for the clade 1 protist-type lysozymes (including lys-1 to lys-3, lys-8, and orthologues).
(ii) Recent gene duplication and diversification. Phylogenetic analysis revealed five cases of lineage-specific duplication events (Figs. 4, 7, and 8). One of these cases (Creilys-4.1 and Cre-ilys4.2) is associated with a significant signature of adaptive sequence evolution, suggesting that diversifying selection favoured lysozyme differentiation upon duplication. The other four cases (Cre-lys-8.1 and Cre-lys-8.2; Cbr-lys-6.1 and Cbr-lys-6.2; Cel-lys-5 and Cellys-6; Cel-lys-7 and Cel-lys-8) appear to be subject to purifying selection. This pattern indicates strong selection for maintenance of gene function after the duplication event.
(iii) Gene duplication prior to species separation followed by differential gene loss. This scenario appears to apply to the Cel-ilys-1, Cel-ilys-2, Cel-ilys-3, and Cel-lys-4 genes, which are each present in only one of the species and diverge from internal nodes, some of them along long branches indicative of old evolutionary age. Loss of genes after duplication events in the other Caenorhabditis species may then suggest redundant functions of lysozymes in these taxa. As above under (i), their original diversification may have been driven by diversifying selection. Indeed, two episodes of adaptive sequence evolution were found to associate with these genes (Figs. 4A, 8A).
Phylogenetic inferences can only yield an approximation of the past and thus come with some uncertainty. Considering that the inferred relationships are generally supported by high bootstrap values and that they are based on the maximum likelihood approach, which was shown in the past to be less susceptible to biases (e.g. longbranch attraction) than other tree reconstruction methods [29], our results should provide a realistic image of Caenorhabditis lysozyme evolution. Taken together, their lysozyme repertoire is shaped by both ancestral and recent gene duplications. Sequence evolution is to a large extent determined by purifying selection. Yet, it also includes several episodes of diversifying selection, which associate with ancient as well as recent duplications. To our knowledge, similar evolutionary dynamics have not as yet been inferred for the lysozymes from other taxa.
Genealogy of the protist-type clade 2 lysozymes, including A, the unrooted tree topology with branch-lengths inferred from DNA sequence analysis, and B, the tree topology with branch-names used in the analysis of positive selection across branches Figure 8 Genealogy of the protist-type clade 2 lysozymes, including A, the unrooted tree topology with branch-lengths inferred from DNA sequence analysis, and B, the tree topology with branch-names used in the analysis of positive selection across branches. The branch-names in B are identical to those given in Table 6. All other information as in Fig. 4.
It is worth noting that we did not find an indication for adaptive sequence evolution between the two main protist-type clades (Fig. 6, Table 4). Two explanations are conceivable. On the one hand, differentiation of the two clades was not subject to diversifying selection. On the other hand, diversifying selection was important but could not be detected due to a lack of power of the analy-sis, which had to be based on a reduced data set including only the conserved sequence regions that could be reliably aligned across the different genes and taxa. At the same time, this specific result (as well as all other cases of comparatively long branches with d N /d S rate ratios below 1) strongly suggests that our analysis is not compromised by a possible saturation of synonymous substitutions along long branches, which could have led to underestimated d S rates and thus artificially high d N /d S rate ratios. It is also worth noting that only a single alignment site was inferred to be under positive selection in our analyses. This is unusual because in most immunity gene data sets associated with adaptive sequence evolution a larger number of positively selected sites is identified, e.g. in MHC class I receptors [30,31]. A possible reason is that the different evolutionary lineages vary as to the position of the positively selected sites or that only few lineages are subject to positive selection on specific sites. In both cases, the method employed would hinder detection of these positively selected sites because it assumes the same pattern of selection across all lineages [25]. We did not attempt to perform an analysis, in which d N /d S ratios are allowed to Branch names as in Fig. 7B. Methods and abbreviations as in Table 3.
The optimal free-ratio model had a likelihood of ln L = -6048.14 and the 1-ratio null model for the LRT comparison had ln L = -6117.89. Branch names are as in Fig. 6B. Methods and abbreviations as in Table  3. The optimal free-ratio model had a likelihood of ln L = -5556.07 and the 1-ratio null model for LRT comparison had ln L = -5646.46.
vary simultaneously across sites and lineages, because these types of analyses may be liable to higher error rates [32,33]. The single site, which we identified to be under positive selection, is thus predicted to be of main -albeit currently unknown -functional importance.

Functional diversification
Gene duplications are likely to be one of the main sources of evolutionary innovation [16]. The duplicated genes may acquire new functions (neo-functionalisation) or they may partition the multiple functions of the ancestral gene (sub-functionalisation) [17]. The relevance of either alternative as well as additional scenarios is a topic of intense current debate [34][35][36][37][38]. Importantly, in all cases the genetic diversity of duplicates is predicted to translate into functional diversity. Such a pattern is found in the ruminantia, which possess at least five different lysozyme types: the stomach, tracheal, intestinal, kidney, and milk lysozymes [10]. The first type is involved in digestion, whereas the others may function as antibacterial enzymes in immunity [10]. A similar pattern is observed for the at least eleven different Drosophila lysozymes. Most of them have a digestive role and show specialisation as to their time and site of expression [5,14]. A recent study additionally suggested an anti-fungal immune function for some of the genes (Lys B, C, D, E and CG16756) [39]. A further example includes the nine lysozymes of the mosquito Anopheles gambiae, which vary as to their role in immunity and digestion and also as to their time and location of expression [15].
The Caenorhabditis lysozymes show clear signatures of functional diversification. Pronounced differences between the three main clades and also within each of the clades are observed for molecular characteristics of the genes, their pathogen-induced expression, and also their regulation by the immune system. Based on the current data, it appears that the protist-type clade 1 lysozymes play an important role in immunity: They are all induced upon pathogen exposure. Most of them are under positive control of immunity pathways, including components of the insulin-like signalling cascade (DAF-16) [40][41][42], the p38 mitogen-activated protein kinase (MAPK) pathway (SEK-1 and PMK-1) [43], the TGF-β pathway (DBL-1, SMA-2) [44], or the GATA transcription factor ELT-2 [45]. Most interestingly, the different genes from this clade vary in their response to pathogens and immunity pathways. This variation may contribute to high immune specificity, as has recently been identified phenomenologically for The comparison focuses on the two clades of the protist-type lysozymes (p-lys) and one clade of the invertebrate-type lysozymes (i-lys), using ANOVA and the Tukey-Kramer posthoc tests performed with the program JMP IN 5.1.2. Significant pairwise posthoc comparisons are indicated by the respective clade numbers. Protein length, molecular weight (MW, to be multiplied by 1000), isoelectric point (pI), charge, and grand average hydropathy are given with the standard error of the mean. Branch names are as depicted in Fig. 8B. Methods and abbreviations as in Table 3. The optimal free-ratio model had a likelihood of ln L = -3525.65 and the 1-ratio null model for the LRT comparison had ln L = -3555.98.
invertebrates [46][47][48] and which is consistent with highly specific C. elegans-pathogen interactions [49]. Although the underlying molecular mechanisms are currently unknown, they are likely to be based on the genetic diversification of pathogen recognition receptors and/or immune effectors such as the lysozymes [21,50,51]. They may also include the synergistic interaction between different components of the immune system [51], as generally known for lysozymes and antimicrobial peptides [5,7,52,53]. In C. elegans, the immune function has been tested for two genes of the clade 1 protist-type lysozymes.
Overexpression of Cel-lys-1 enhances resistance against S. marcescens [20], whereas silencing of Cel-lys-7 increases susceptibility to M. nematophilum [19]. The importance of lysozyme diversification for immunity in general and also for immune specificity clearly warrants further investigation.
The role of the invertebrate-type and also the clade 2 protist-type lysozymes is as yet unclear. The only exception may be Cel-ilys-3. Its silencing enhances susceptibility to M. nematophilum [19]. In the same study, no effect was observed after Cel-ilys-2 knock-down [19]. In general, both invertebrate-type and clade 2 protist-type lysozymes are less often activated by pathogens than the clade 1 protist-type lysozymes. At the same time, several of the genes are downregulated by pathogens and by known immunity pathways. The latter observation may suggest that their main function somehow interferes with the immune response. A similar finding was made for some of the digestive lysozymes from D. melanogaster, which are also downregulated upon immune challenge [5]. This particular similarity may indicate that the primary function of these nematode lysozymes is also digestion. The information on their molecular characteristics (e.g. isoelectric point) or the localization of gene expression is consistent with a role in both immunity and digestion. Unfortunately, the nematode's intestines are the main location for bacterial digestion and at the same time immune defence against pathogens that are easily taken up during feeding [54]. Therefore, lysozymes are expected to have similar characteristics (e.g. regarding pH optimum) even if they vary in function. Future analyses should thus be performed with either exclusive food bacteria or exclusive pathogens, in order to distinguish between the alternative functions.

Conclusion
Our study provides an evolutionary framework for understanding lysozyme diversification in Caenorhabditis nematodes. The comprehensive lysozyme repertoire falls into three distinct clades and it is shaped by both purifying selection and several episodes of adaptive sequence evolution. The genetic diversification appears to translate into functional differentiation. The information obtained should prove useful as a primer for future analysis of lysozyme function in digestion and immunity. The Caenorhabditis lysozymes may further serve as an example of the importance of evolution by gene duplication in invertebrate immune systems.

Sequence alignments
For the three considered Caenorhabditis species, protein and DNA sequences of annotated genes with similarities to known lysozymes were obtained from wormbase [55]. Three main alignments were generated (alignments 1, 2, and 4; see below). The first one of these, alignment 1, served to infer the general phylogenetic relationship of the C. elegans lysozymes to those from other taxa. We specifically considered taxa, which were included in similar lysozyme phylogenetic analyses in the past [8,56,57], thus allowing comparison between our results and those from previous studies. The alignment was produced with the hierarchical method, implemented in the programme CLUSTALW [58] and using the default settings. The resulting alignment contained substantial sequence variation. Moreover, variations of the programme settings (gap open, gap extension, and gap distance penalties) resulted in differences among generated alignments. Therefore, positional homology across alignment 1 may not be entirely reliable. Since it is based on the hierarchical alignment method (i.e. similar sequences are aligned first, followed by subsequent addition of less similar sequences), it should still be informative as to the general phylogenetic position of the Caenorhabditis lysozymes in comparison to those from other taxa.
The more detailed analysis of Caenorhabditis lysozyme evolution was based on the main alignments 2 and 4. Six additional alignments were extracted from these two alignments (see below, alignments 3, 5-9). In particular, alignments 2 and 3 served to analyse the invertebrate-type lysozymes. The overall phylogeny of the protist-type lysozymes from nematodes and one outgroup taxon was examined with alignment 4. Five additional alignments were extracted from alignment 4 for the detailed analysis of lysozyme evolution (alignments 5-9). Here, we excluded highly variable sequence regions from alignments, if these could not be recovered in identical form under alternative settings of the alignment programme (only relevant for alignments 5-7), in order to ensure a high likelihood of positional homology and thus a reduced risk of homoplasy. Alignments 4-9 are subsets of each other with identical position of indels as indicated in Fig. 5.
(ii) Alignment 2 contained protein sequences of all invertebrate-type lysozymes from the three Caenorhabditis species (Fig. 3).
(iii) Alignment 3 is the DNA version of alignment 2.
(iv) Alignment 4 has all protein sequences of the Caenorhabditis protist-type lysozymes and also one from D. discoideum (XP_644284).
(v) Alignment 5 is the modified DNA version of alignment 4. Here, we excluded the taxon D. discoideum and additionally one 5' and one 3' end region, which could not be aligned reliably at the DNA sequence level. The excluded region at the 5' end corresponds to positions 1 to 149 and that at the 3' end to positions 301 to 345 of the protein sequence alignment (see Fig. 5).
(vi) Alignment 6 represents a subset of alignment 4. It contains the Caenorhabditis protein sequences of the clade 1 protist-type lysozymes, whereby we excluded a fragment at the 5' end (positions 1 to 67; Fig. 5), another fragment towards the 3' end (positions 314 to 331; Fig. 5), and a small region at the 3' end (positions 339 to 345; Fig. 5).
(vii) Alignment 7 is the DNA version of alignment 6.
(viii) Alignment 8 is again a subset of alignment 4. It includes all complete Caenorhabditis protein sequences of the clade 2 protist-type lysozymes (Fig. 5).
(ix) Alignment 9 is the DNA version of alignment 8.

Sequence characteristics
General properties of the different lysozymes were inferred with the help of the ProtParam tool of the ExPASy server [59,60], including protein length, molecular weight, isoelectric point (pI), charge, and also the grand average of hydrophobicity. Differences in these traits between lysozyme clades were assessed with an analysis of variance (ANOVA) and posthoc Tukey-Kramer comparisons, using the program JMP IN 5.1.2 (SAS Institute Inc.). The presence and position of a signal peptide was inferred with the SIGNALP 3.0 server [61]. Further information on the genomic location, the function and also regulation of the C. elegans lysozymes were taken from wormbase [55] and the current literature.

Phylogenetic tree inference
Phylogenies were reconstructed using the maximum likelihood (ML) optimality criterium [23]. For protein sequence alignments, the optimal substitution model was first inferred using the program ProtTest version 1.3 [62] and the Akaike information criterion, following the recommended approach [63,64]. The optimal substitution model was employed for a heuristic tree search with the help of the program PhyML [65,66] using default settings. The robustness of the inferred tree topology was evaluated via non-parametric bootstrapping [67] based on 200 replicate data sets.
For DNA sequence alignments, the optimal substitution model was found using the same strategy as above and as implemented in the program ModelTest version 3.7 [63,64,68]. The phylogenetic tree was then inferred with the help of the ML option of the program PAUP* 4.0b10 [69], using the optimal substitution model, a heuristic tree search based on branch-swapping by tree bisection and reconnection (TBR), the random addition of sequences, which was repeated ten times, and otherwise default settings. The robustness of the tree topology was assessed with non-parametric bootstrapping using 500 replicates.