Evolution of the mammalian lysozyme gene family
© Irwin et al; licensee BioMed Central Ltd. 2011
Received: 17 October 2010
Accepted: 15 June 2011
Published: 15 June 2011
Lysozyme c (chicken-type lysozyme) has an important role in host defense, and has been extensively studied as a model in molecular biology, enzymology, protein chemistry, and crystallography. Traditionally, lysozyme c has been considered to be part of a small family that includes genes for two other proteins: lactalbumin, which is found only in mammals, and calcium-binding lysozyme, which is found in only a few species of birds and mammals. More recently, additional testis-expressed members of this family have been identified in human and mouse, suggesting that the mammalian lysozyme gene family is larger than previously known.
Here we characterize the extent and diversity of the lysozyme gene family in the genomes of phylogenetically diverse mammals, and show that this family contains at least eight different genes that likely duplicated prior to the diversification of extant mammals. These duplicated genes have largely been maintained, both in intron-exon structure and in genomic context, throughout mammalian evolution.
The mammalian lysozyme gene family is much larger than previously appreciated and consists of at least eight distinct genes scattered around the genome. Since the lysozyme c and lactalbumin proteins have acquired very different functions during evolution, it is likely that many of the other members of the lysozyme-like family will also have diverse and unexpected biological properties.
The vertebrate lysozyme gene family has traditionally been considered to be composed of three genes: lysozyme c, lactalbumin, and calcium-binding lysozyme [1–4]. Lysozyme c, chicken-type (or conventional) lysozyme, is a bacteriolytic enzyme that is secreted into many body fluids of mammals (e.g., blood, tears, and milk) and is found at a high concentration in the eggs of many bird species [1, 2, 5]. Lysozyme c is widespread in nature; its protein and gene sequences have been characterized from numerous diverse vertebrate and non-vertebrate species [3, 5, 6]. Lactalbumin is related to lysozyme, with around 40% amino acid identity and nearly identical three-dimensional structure, but lacks its bacteriolytic activity [1, 2, 4, 7]. Lactalbumin is expressed in lactating mammary glands, where it binds a calcium ion and modifies the activity of β-galactosyltransferase-1, such that the complex catalyzes the synthesis of lactose [2, 4, 7]. Lactalbumin has recently been shown to have a second activity in the gut, where it loses the calcium ion and binds a fatty acid; this new form of lactalbumin appears to promote apoptosis of tumor cells, and thus has been renamed HAMLET (human lactalbumin made lethal to tumors). Lactalbumin appears to be found only in mammals, and is widely distributed in this group. Calcium-binding lysozyme has bacteriolytic activity like lysozyme c, but also shares with lactalbumin the ability to bind a calcium ion. Calcium-binding lysozymes appear to be relatively rare; they have been found in the milk of only a few mammalian species (e.g., horse, dog, cat, seal, and echidna), as well as in the eggs (e.g., pigeon) and stomachs (e.g., hoatzin) of some bird species [3, 9]. Indeed, calcium-binding lysozyme genes have not been reported for the human or rodent genomes.
Previous phylogenetic analyses of lysozyme c, lactalbumin, and calcium-binding lysozyme sequences had suggested that the earliest divergences within this gene family occurred between lysozyme c and the ancestor of the genes for lactalbumin and calcium-binding lysozyme, and that this initial gene duplication may have preceded the divergence of the lineages leading to fish and mammals [10, 11]. The separation of the lactalbumin and calcium-binding lysozyme genes was proposed to be more recent, with some studies [9, 12] suggesting a divergence on the early mammalian lineage, which would be consistent with the restriction of the lactalbumin gene to mammals. In contrast, another study suggested that the duplication generating the lactalbumin and calcium-binding lysozyme genes predated the bird-mammal divergence. Moreover, the orthology of the mammalian and avian calcium-binding lysozymes has even been questioned [3, 11]. Thus, the origin of these mammalian lysozyme-like genes remains an open question.
Recently, cDNAs for several additional lysozyme-like sequences have been identified from human testis cDNA libraries [13–15]. These cDNAs were found to be encoded by genes that are now annotated by Ensembl as LYZL (lysozyme-like): LYZL2, LYZL4, LYZL6, and LYZL3 (synonym SPACA3, for sperm acrosome associated 3; SPACA3 is also known as SPRSA and SLLP1). The predicted protein sequences of some of these lysozyme-like sequences have amino acid substitutions at sites important for the catalytic activity of lysozyme, suggesting that these proteins would not be able to hydrolyze the glycosidic bonds of bacterial peptidoglycan [13, 15]. Because these four new lysozyme-like genes (LYZL2, LYZL4, LYZL6, and SPACA3) are expressed predominantly in the testes, it has been suggested that they might have a role in reproduction [13–15, 17]. Such a role has been shown for Lyzl4 and Spaca3 in mice [18, 19].
The identification of these LYZL genes in the human genome suggests that the mammalian lysozyme-like gene family is larger than previously appreciated, and raises the possibility that the lysozyme-like proteins encoded by these genes may have novel biological functions. Here we used extensive similarity searches of the human and other vertebrate genomes to identify three additional intact lysozyme-like genes in the human genome; these have been annotated in the databases but not reported in the literature. We also identified multiple lysozyme-like genes in the genomes of diverse vertebrates. Using a combination of phylogenetic and genomic neighborhood (or synteny) analyses, in which the relationships of the genes flanking the lysozyme-like genes in diverse species were examined, we demonstrate that orthologs of the human lysozyme-like genes are found in the genomes of diverse mammalian species. Our analyses suggest that there were at least six, and perhaps as many as nine, diverse types (or subfamilies) of lysozyme-like genes in the genome of the common ancestor of all extant mammals, and that these genes have been maintained on most mammalian lineages. This suggests that their protein products probably have essential biological functions that are yet to be identified.
Results and Discussion
Number of Lysozyme Genes in the Human Genome
Table 1. Chromosomal location of human lysozyme-like genes
Genomic coordinates (bp)
49,024,938 - 49,026,186
29,577,990 - 29,607,257
30,895,152 - 30,918,691
42,438,570 - 42,452,092
34,261,548 - 34,270,674
31,318,887 - 31,324,895
47,863,734 - 47,869,126
47,986,603 - 47,991,995
Table 2. Pairwise identity (%) of human lysozyme-like protein sequences
Lysozyme Genes in Other Vertebrate Genomes
Importantly, the BLAST searches identified, in most mammalian genomes, potential orthologs of all of the divergent lysozyme-like genes found in the human genome (LYZ, LALBA, LYZL1/2, LYZL4, LYZL6, SPACA3, and SPACA5; Figure 2). Initial assignments of orthology of these mammalian genes were based upon sequence similarity and were subsequently confirmed by genomic neighborhood and phylogenetic analyses (see below). Our combined findings about gene number and orthologous relationships of the mammalian lysozyme-like gene family members are outlined in Figure 2. In contrast to the mammalian genes, the lysozyme-like genes found in the other vertebrate genomes could not be readily classified into the above subfamilies based upon sequence similarity, because these genes and their encoded proteins showed similar levels of sequence identity to all of the different mammalian paralogs. Furthermore, no evidence of synteny with the mammalian genes was found for any of the non-mammalian vertebrate genes except the lysozyme c gene. Thus, none of the non-mammalian vertebrate lysozyme-like genes could be definitively classified as orthologs of any mammalian lysozyme-like gene other than Lyz itself.
Many of the genes that were identified in our searches were only partial sequences, most likely because the genome assemblies in question are incomplete. However, all but one of these genes were consistent with a structure similar to that of the mammalian lysozyme and lactalbumin genes -- that is, their coding regions appeared to be composed of four exons with similar intron-exon structures [5, 6]. The lone exception was a Lyzl1/2-like gene found in the treeshrew (on Genescaffold_6044), which had a nearly full-length coding sequence that contained stop codons and frameshifts but no introns; thus, this gene appears to be a processed pseudogene. Taken together, these observations suggest that essentially all of the vertebrate lysozyme-like genes were generated by duplications of genomic DNA, rather than by reverse transcription and insertion into the genome.
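The processed-pseudogene call above rests on two observable signals: disruption of the reading frame (in-frame stop codons, or a length that is not a multiple of three) and the absence of introns. The minimal Python sketch below implements that screen. It assumes the standard genetic code and an already-extracted coding sequence. The function names are hypothetical, and the length test is only a crude proxy for frameshifts, not the procedure used in this study.

```python
# Stop codons of the standard genetic code (an assumption; variant codes exist).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def _clean(cds: str) -> str:
    """Upper-case the sequence and drop any whitespace or line breaks."""
    return "".join(cds.split()).upper()


def internal_stop_positions(cds: str) -> list[int]:
    """Return 0-based codon indices of in-frame stops, ignoring a final terminator."""
    seq = _clean(cds)
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    hits = [i for i, codon in enumerate(codons) if codon in STOP_CODONS]
    if hits and hits[-1] == len(codons) - 1:
        hits.pop()  # a stop in the last codon is the normal terminator
    return hits


def looks_like_processed_pseudogene(cds: str, intron_count: int) -> bool:
    """Hypothetical screen: intronless AND (premature stop OR length not divisible by 3)."""
    seq = _clean(cds)
    disrupted = bool(internal_stop_positions(seq)) or len(seq) % 3 != 0
    return intron_count == 0 and disrupted


# Toy sequences, not real lysozyme data.
print(internal_stop_positions("ATGAAATAGCCCTAA"))             # [2]
print(looks_like_processed_pseudogene("ATGAAATAGCCCTAA", 0))  # True
print(looks_like_processed_pseudogene("ATGAAACCCTAA", 0))     # False
```

An intron-containing locus with a premature stop returns False in this sketch. That pattern points to an unprocessed pseudogene, not a retrotransposed copy, and it is the second signal (no introns) that separates the two.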
Phylogeny of Vertebrate Lysozymes
Importantly, as illustrated by the Bayesian analysis shown in Figure 3, all of these phylogenetic analyses suggested that most lineages of mammals have eight different types of lysozyme-like genes (or pseudogenes): Lyz, Lalba, Lysc1, Lyzl1/2, Lyzl4, Lyzl6, Spaca3, and Spaca5. Regardless of the type of phylogenetic analysis, these mammalian genes always clustered together as monophyletic groups, or clades, supporting their orthologous relationships. These mammalian gene clades routinely had high statistical support, again regardless of the method used. These results are consistent with the orthologous relationships suggested by the original BLAST searches, as well as with our genomic neighborhood analyses (described in more detail in the sections below). However, the genes from the other vertebrates did not consistently group with any of these mammalian orthologs, with the exception of some of the lysozyme c sequences.
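For each gene group, the monophyly criterion reduces to one question: in a rooted tree, does a single node have exactly that group's sequences as its descendants? The Python sketch below answers it for trees written as nested tuples. The tuple representation and the function names are illustrative assumptions. A real analysis would read Newick files with a phylogenetics library and would also weigh posterior probabilities or bootstrap values, which this sketch ignores.

```python
from typing import Union

# A rooted tree written as nested tuples; leaves are strings (hypothetical format).
Tree = Union[str, tuple]


def leaf_set(tree: Tree) -> frozenset:
    """Return all leaf names below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree: Tree):
    """Yield the leaf set of every node (clade) in the rooted tree."""
    yield leaf_set(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree: Tree, taxa) -> bool:
    """True if some single node has exactly `taxa` as its descendants."""
    target = frozenset(taxa)
    return any(clade == target for clade in clades(tree))


# Toy tree with made-up labels; it is not a result from this study.
toy = (("Lyz_human", "Lyz_mouse"), (("Lalba_human", "Lalba_mouse"), "Spaca3_human"))
print(is_monophyletic(toy, {"Lyz_human", "Lyz_mouse"}))       # True
print(is_monophyletic(toy, {"Lalba_human", "Spaca3_human"}))  # False
```

Because the check is rooted, the outcome can change with the choice of root. This is one reason the rooting choices described in the Methods matter.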
In addition, it is clear that most, if not all, of the eight mammalian lysozyme-like genes duplicated and diverged from each other prior to the divergence of the earliest mammalian lineages. Two lines of evidence support this conclusion. First, both the platypus and the eutherian genomes contain copies of most of these gene duplicates; therefore, these genes must have diverged before the species lineages did. Second, some of the non-mammalian vertebrate genes appear to have phylogenetic affinity with some of the mammalian gene lineages, although few of these groupings have much statistical support. This is particularly evident for many of the lizard genes, which, as illustrated in Figure 3, tend to branch with various mammalian orthologs. If this result is not an artifact, then many of the lysozyme-like genes must have duplicated prior to the mammal-reptile (or even mammal-amphibian) divergence. If so, however, these gene duplicates must subsequently have been deleted from the genomes of birds.
Whereas our phylogenetic analyses supported the monophyly of each of the mammalian lysozyme-like gene duplicates, the relationships among the paralogs were poorly resolved (see Figure 3, and Additional files 2-4: Figures S1-S3). Many of our phylogenetic analyses, including the one shown in Figure 3 (and Additional file 3: Figure S2), suggested that the Lalba clade was the earliest diverging lineage and that most of the Lyzl and Spaca genes (Lyzl1/2, Lyzl4, Lyzl6, Spaca3, and Spaca5, but not Lyzl8) were most closely related to each other; however, these relationships were not consistently recovered (e.g., see Additional file 2: Figure S1). Therefore, the phylogenetic analyses are inconclusive concerning the relationships among the different subfamilies of mammalian lysozyme-like genes.
The phylogenetic trees also suggested the possibility that at least some of the gene divergences occurred very early in vertebrate evolution, i.e., prior to the mammal-fish divergence. For example, the mammalian Lyz gene sequences were found to branch with Lyz genes from fish, rather than with the other mammalian lysozyme-like genes (Figure 3); if this branching order reflects the actual evolutionary history of the genes (rather than phylogenetic affinity based upon conserved lysozyme protein structure and function), then the Lyz gene lineage must have diverged from the other lysozyme-like genes prior to the mammal-fish divergence. Again, if this were true, then the duplicates must have been deleted from the genomes of many species lineages. Below, we discuss each subfamily of lysozyme-like genes in mammals and consider its potential non-mammalian orthologs.
Lysozyme c (Lyz) Genes
Lysozyme-like 1/2 (Lyzl1/2) Genes
Lysozyme-like 4 (Lyzl4) Genes
The Lyzl4 gene was either found as a single copy, or was missing, in all of the mammalian genomes examined (Figure 2, and Additional file 1: Table S1). Many of the missing genes likely reflect incomplete genomes rather than deletions. Genomic neighborhood analysis confirmed the orthology of the Lyzl4 genes across placental mammals. However, the flanking genes are on different chromosomes in the opossum, suggesting that chromosomal recombination had occurred (Additional file 6: Figure S5). Phylogenetic analyses of Lyzl4 sequences were consistent with it being a single-copy gene (Additional file 7: Figure S6). The only potential non-mammalian orthologs of Lyzl4 identified by the phylogenetic analyses were from the lizard (LyzM and LyzN in Figure 3, but no evidence for orthology in Additional files 2-4: Figures S1-S3); however, the lizard and mammalian genes were not in similar genomic neighborhoods (data not shown).
Lysozyme-like 6 (Lyzl6) Genes
Most mammals had only one Lyzl6 gene, and the opossum had none. In contrast to most of the other paralogs, however, the number of Lyzl6 genes varied greatly across mammals, with five genes identified in the dog and four in both the alpaca and the hyrax (Figure 2, and Additional file 1: Table S1). Given the distant relationships of these three species, these gene duplication events must have occurred independently. The placental and wallaby Lyzl6 genes reside in a conserved genomic neighborhood, which again suggests that this gene was deleted on the opossum lineage (Additional file 8: Figure S7). The presence of multiple Lyzl6 genes in a genome raises the possibility of concerted evolution; however, the available data were insufficient to examine this possibility for the Lyzl6 genes (Figure 2, and Additional files 1 and 9: Table S1 and Figure S8). The phylogenetic analyses did not suggest any candidate Lyzl6 orthologs in non-mammalian species (Figure 3, and Additional files 2-4: Figures S1-S3).
Sperm acrosomal protein 3 (Spaca3) Genes
Spaca3, like Lyzl4, was not found to be duplicated in any of the mammalian genomes examined (Figure 2, and Additional file 1: Table S1). Spaca3 resides in a conserved genomic neighborhood in placental mammals; however, a Spaca3 gene is absent from this genomic neighborhood in the opossum (Additional file 10: Figure S9). Although the wallaby and platypus Spaca3 genes could not be placed in a genomic context because of the short lengths of their genomic contigs (Additional file 10: Figure S9), phylogenetic analysis of these sequences (Additional file 11: Figure S10) was consistent with their being orthologs. These results suggest that the Spaca3 gene was deleted on the opossum lineage. Again, no non-mammalian orthologs were suggested by phylogenetic analysis (Figure 3, and Additional files 2-4: Figures S1-S3).
Sperm acrosomal protein 5 (Spaca5) Genes
The Spaca5 gene was found only within placental mammals, with no orthologs suggested by phylogenetic analysis or similarity searches in marsupials, platypus, or other vertebrates (Figures 2 and 3, and Additional files 1-4: Table S1 and Figures S1-S3). Thus, this gene duplication may have occurred in the ancestor of placental mammals. Genomic neighborhood analysis showed that the Spaca5 gene was in a similar neighborhood on the human, macaque, mouse, and dog X chromosomes (Additional file 12: Figure S11); this genomic region was not found in marsupials, platypus, or other vertebrates (results not shown). The SPACA5 gene was found to be uniquely duplicated in the human genome (Figure 2, and Additional files 1 and 13: Table S1 and Figure S12). A very recent duplication of SPACA5, after the human-chimpanzee divergence, could account for the perfect identity of the protein sequences (Table 2) without requiring concerted evolution; however, concerted evolution between the human SPACA5 and SPACA5B genes cannot be excluded.
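Statements such as "perfect identity of the protein sequences" rest on a percent-identity calculation over an alignment. The minimal sketch below assumes the sequences are already aligned and excludes gapped positions from the denominator. That convention is an assumption. Other definitions, such as dividing by the length of the shorter sequence, give different numbers, so this sketch need not reproduce how Table 2 was computed.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identical residues over aligned positions where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free aligned positions to compare")
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def identity_table(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise identities for every pair of named, aligned sequences."""
    return {
        (n1, n2): percent_identity(aligned[n1], aligned[n2])
        for n1, n2 in combinations(sorted(aligned), 2)
    }


# Toy aligned fragments with hypothetical names, not real SPACA5 sequences.
toy = {"geneA": "KVF-R", "geneB": "KIFGR", "geneC": "KVF-R"}
print(identity_table(toy))
# {('geneA', 'geneB'): 75.0, ('geneA', 'geneC'): 100.0, ('geneB', 'geneC'): 75.0}
```

As the paragraph above notes, an identity of 100% is consistent with either a very recent duplication or ongoing gene conversion. The number by itself cannot distinguish the two.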
Lysozyme-like 8 (Lyzl8) Gene
The platypus genome contained one lysozyme-like gene, named Lyzl8, that did not group with any of the other mammalian genes (Figure 3, and Additional files 2-4: Figures S1-S3). All of our phylogenetic analyses supported the designation of Lyzl8 as a unique lysozyme-like gene duplicate, as the platypus gene did not fall within any of the other monophyletic gene groups. The relationship of the platypus Lyzl8 gene to the other lysozyme-like genes was highly labile in the phylogenetic analyses (Figure 3, and Additional files 2 and 3: Figures S1 and S2). This result is in accord with the fact that the platypus Lyzl8 gene (or protein) showed little similarity to any of the other lysozyme-like genes (or proteins) in our BLAST searches. When the platypus Lyzl8 gene was used as a query to search mammalian genomes, only one genomic sequence -- from the sloth (Figure 2, and Additional file 1: Table S1) -- was found to have greater similarity to Lyzl8 than to any other lysozyme-like gene. When the short sloth sequence was used as a query against the platypus genome, its best match was the Lyzl8 gene. However, the sloth genomic contig was short, containing only a single exon, and therefore could not be used for phylogenetic or genomic neighborhood analysis; the evidence supporting orthology of the sloth sequence to the platypus Lyzl8 gene is therefore very weak. At present, it is not clear whether this gene duplication happened on the ancestral mammal lineage, with subsequent losses on most descendant lineages, or on the monotreme lineage.
Lactalbumin (Lalba) and Calcium-binding Lysozyme (Lysc1) Genes
An intriguing observation from our genomic neighborhood analysis was that the mammalian calcium-binding lysozyme gene (Lysc1) is located adjacent to the Lalba gene in the dog (Figure 7) and horse (not shown) genomes. Both previous phylogenetic analyses [9–12] and our new phylogenetic analyses (Figure 2, and Additional files 2-4: Figures S1-S3) suggested that the Lysc1 gene originated prior to the radiation of mammals. However, our tBLASTn searches using either dog or horse Lysc1 identified similar sequences in the genomes of only a few diverse mammals -- dog, cat, horse, shrew, sloth, and mouse lemur (Figure 2, and Additional file 1: Table S1). It is also noteworthy that the mammalian (Lysc1) and avian calcium-binding lysozyme genes are not closely related in our phylogenies, a finding in agreement with some earlier analyses [3, 11]. Thus, it is reasonable to speculate that calcium binding evolved independently in these bird and mammal lysozymes. The newly identified Lysc1-like genomic sequences were all found on short genomic contigs (Additional file 1: Table S1); nonetheless, both the cat and mouse lemur genomic contigs also encode part of the c12orf41 gene (Additional file 15: Figure S14A), which is adjacent to the Lysc1 gene in both the dog and horse genomes (Figure 7). This suggests that the Lysc1 gene may also lie near the c12orf41 gene in other mammalian genomes. Using a strategy that has previously succeeded in identifying genes missed by typical BLAST searches [40, 41], we carefully examined the sequences between the Lalba and c12orf41 genes. In 17 of the 37 mammalian genomes available from Ensembl [16, 25], the Lalba and c12orf41 genes were contained in contiguous genomic sequences. In 18 of the remaining 20 species, this genomic region was fragmented into several small genomic contigs; thus, we cannot exclude the possibility that the two genes are contiguous in these genomes. In the pig and the little brown bat, this genomic region was not fragmented.
In the pig, the current genome assembly does not encode the Lalba gene and the c12orf41 gene is embedded within a very large genomic fragment, suggesting that the Lalba - c12orf41 genomic region has been reorganized in the pig genome (or that this region has been incorrectly assembled). In the little brown bat, the Lalba gene is embedded in a large genomic fragment that was not annotated to include c12orf41 (although our BLAST searches did identify a very small fragment with strong similarity). A more careful examination of the little brown bat genomic contig revealed that most of the genomic region is composed of unsequenced gaps.
For the 17 genomes that did have linked Lalba and c12orf41 genes, the distance between these two genes ranged from ~50 kb (mouse, rat, and rabbit) to ~250 kb (cow and opossum). For all of these genomes except the opossum (see below), the only genes (or pseudogenes) annotated between Lalba and c12orf41 were olfactory receptor-like genes, which are not very useful for identifying orthologous and conserved genomic neighborhoods because of their abundance. In the opossum, in addition to the olfactory receptor-like genes, three other genes were annotated between Lalba and c12orf41: Mip, Spryd4, and Gls2. Unfortunately, the wallaby genome is poorly assembled near the Lalba and c12orf41 genes, so the neighboring genes could not be identified. Although the Mip, Spryd4, and Gls2 genes reside on the same chromosome as Lalba and c12orf41 in many mammals (e.g., human, rat, guinea pig, cow, horse, and elephant), they lie more than 8 Mb away; furthermore, in some species (e.g., mouse and dog) they are on different chromosomes. These observations suggest that the organization of the Lalba, c12orf41, Mip, Spryd4, and Gls2 genes, and potentially of a Lysc1 gene, has changed between the opossum (and possibly other marsupials) and placental mammals.
The genomic sequences between the Lalba and c12orf41 genes from the 17 genomes in which these two genes were linked were aligned with MultiPipMaker [42, 43]. Sequences with similarity to the Lysc1 gene were not observed in 9 of the genomic sequences -- those from marmoset, mouse, rat, guinea pig, rabbit, treeshrew, cow, little brown bat, and opossum (Additional file 15: Figure S14B). It should be noted, however, that for 3 of these species (cow, little brown bat, and marmoset) these genomic sequences contain large amounts of unknown sequence (i.e., sequence gaps). Thus, there are only 6 species with nearly complete genomic sequences spanning the Lalba and c12orf41 genes for which we have good evidence for the actual absence of a Lysc1 gene or pseudogene -- mouse, rat, guinea pig, rabbit, treeshrew, and opossum. Pairwise PipMaker alignments of the mouse, rat, or guinea pig genomic sequences with those from dog or horse (or primates) revealed that a large genomic region, which could potentially encode a Lysc1 gene, is missing from these rodent genomes (results not shown). This suggests that this genomic region, including the Lysc1 gene, was deleted either early on the rodent lineage or in the common ancestor of rodents and their close relatives (e.g., rabbit), but after the divergence of the rodent lineage from the primate lineage (see Figure 2).
Intact Lysc1 genes that predict potentially functional calcium-binding lysozymes were found in only a few species (dog, horse, and shrew), whereas pseudogenes were found on several lineages (primates, elephant, and sloth). Phylogenetic and genomic analyses suggested that the pair of Lysc1 genes found in the shrew resulted from a tandem gene duplication event on the lineage leading to this species (Additional files 1 and 17: Table S1 and Figure S16); the divergence of the predicted protein sequences of the two genes suggests that they are not undergoing concerted evolution, however. The Lysc1 gene was deleted from the genome on the lineages leading to rodents (mouse, rat, guinea pig, and squirrel) and treeshrews. Taken together, the above observations suggest that the Lysc1 gene likely arose from a duplication of the lactalbumin gene early in mammalian evolution, and was inactivated several times independently, as summarized in Figure 2.
Here we have shown that the mammalian lysozyme gene family is much larger than previously anticipated, and is composed of at least eight distantly related members (Lyz, Lalba, Lysc1, Lyzl1/2, Lyzl4, Lyzl6, Spaca3, and Spaca5) in most mammalian species. These observations suggest that this family experienced several duplication events prior to the origin of mammals. Several other gene families also underwent such amplifications near the origin of mammals, such as the gene families for keratin-associated proteins, kallikreins [45, 46], and bitter taste receptors. Amplification of these latter genes has been suggested to be associated with the development of new mammal-specific features -- e.g., hair (keratin-associated proteins), skin (kallikreins), and diet (bitter taste receptors) [44–47]. Intriguingly, lactalbumin is essential for lactose synthesis in mammary glands, a mammal-specific trait [2, 4, 7]. These observations raise the possibility that other members of the lysozyme-like family have also evolved mammal-specific roles. The new lysozyme-like genes have been largely conserved within mammals, suggesting that they provide important biological functions. The products of the Spaca3 and Lyzl4 genes have recently been shown to be involved in fertilization in mice [18, 19]. Much further study is needed to identify the enzymatic activities (if any) and biological functions of these newly identified lysozyme-like proteins.
Similar to the keratin-associated protein and bitter taste receptor gene families, genes for the lysozyme-like proteins are dispersed over several chromosomes (Table 1). The mechanisms by which these original gene duplications occurred are unclear: the genes that flank the dispersed lysozyme-like genes show no homology to each other, implying that the lysozyme-like genes were not generated by large segmental duplication events (as we observed for the duplications of LYZL1/LYZL2 and SPACA5/SPACA5B in the human genome). The lysozyme-like gene family also shares with the keratin-associated protein, kallikrein, and bitter taste receptor gene families a propensity for lineage-specific gene duplications (see Figures 2 and 3). The lineage-specific expansions, in contrast to the initial duplications, have frequently been tandem in nature. Such tandem organization increases the likelihood that the duplicated genes undergo concerted evolution [22, 23], which our phylogenetic analyses suggest has occurred in the Lyz and Lyzl1/2 subfamilies. The Lyz subfamily showed the greatest tendency to tandemly duplicate and evolve in concert, whereas the other lysozyme-like genes typically showed conservation of copy number. Tandem duplication or amplification of the Lyz gene has previously been observed in certain mammals, including ruminants and rodents, where lysozyme appears to function as a digestive enzyme in the gut [3, 25–34]. It is of interest that many of the species that we found to possess multiple Lyz genes -- e.g., elephant and wallaby -- are also herbivores, and thus may use lysozyme as a digestive enzyme against gut bacteria. The need for higher levels of digestive lysozyme in the guts of fermenting herbivores could have driven the fixation of the tandem duplications in these lineages.
Gene conversion between the tandem duplicates might then provide a mechanism whereby favorable mutations in one gene copy could spread to the other copies in the cluster [33, 36], as well as a mechanism for retention of sequence similarity in well-adapted proteins.
All vertebrate genomes maintained in the Ensembl and Pre!Ensembl databases (release 57; see Additional file 1: Table S1 for a full list) were searched in April 2010 for lysozyme-like sequences. We initially searched the genomes with the tBLASTn algorithm [20, 48], using previously characterized human and rodent lysozyme c and lactalbumin sequences as queries. Subsequent tBLASTn searches used all of the identified putative lysozyme-like protein sequences. Similar searches were conducted using additional databases (e.g., genome assemblies and ESTs) available at the NCBI website. After the dog and horse calcium-binding lysozyme genes were identified, their sequences were used as tBLASTn queries against the other mammalian genome assemblies in the Ensembl database. All sequences with E-values below 0.01 were examined. Sequences identified by BLAST searches were used in reciprocal BLASTx searches of the human, mouse, and dog proteomes to ensure that their best matches were lysozyme-like sequences. Sequences that were not annotated as encoding lysozyme-like proteins (see Additional file 1: Table S1) were examined for potential coding sequences using published methods [50–52]. Insect and amphioxus lysozyme sequences, used as outgroups for the phylogenetic analysis (see below), were identified by searches of the NCBI Entrez protein database for Drosophila and amphioxus lysozymes; these protein sequences were then used in tBLASTn searches of the Ensembl and NCBI databases for related sequences. Several insect sequences were downloaded to represent the diversity of insect lysozyme sequences.
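The E-value cut-off and the reciprocal BLASTx check described above amount to a two-step filter. The Python sketch below applies that filter to BLAST+ tabular output (`-outfmt 6`, whose twelve default columns are listed in the code). It assumes that the candidate loci were the BLASTx queries and that the set of lysozyme-like protein IDs is supplied by hand. The IDs, the helper names, and the choice of bitscore to rank hits are illustrative assumptions, not the study's actual scripts.

```python
import csv
from io import StringIO

# Column order of BLAST+ default tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular: str, max_evalue: float = 0.01) -> dict[str, str]:
    """Map each query to its highest-bitscore subject among hits with E-value below max_evalue."""
    best: dict[str, tuple[str, float]] = {}
    for row in csv.reader(StringIO(tabular), delimiter="\t"):
        if not row:
            continue
        rec = dict(zip(FIELDS, row))
        if float(rec["evalue"]) >= max_evalue:
            continue  # only hits below the cut-off are examined
        bits = float(rec["bitscore"])
        query = rec["qseqid"]
        if query not in best or bits > best[query][1]:
            best[query] = (rec["sseqid"], bits)
    return {query: subject for query, (subject, _) in best.items()}


def confirmed_candidates(blastx_tabular: str, lysozyme_like: set[str],
                         max_evalue: float = 0.01) -> list[str]:
    """Keep candidate loci whose best BLASTx match is a known lysozyme-like protein."""
    hits = best_hits(blastx_tabular, max_evalue)
    return sorted(q for q, s in hits.items() if s in lysozyme_like)


def _row(q: str, s: str, evalue: str, bits: str) -> str:
    # Placeholder values fill the alignment columns this sketch does not use.
    return "\t".join([q, s, "50.0", "100", "0", "0", "1", "100", "1", "100", evalue, bits])


# Toy BLASTx output with hypothetical IDs.
demo = "\n".join([
    _row("cand1", "LYZ_HUMAN", "1e-40", "150"),
    _row("cand1", "OTHER", "0.001", "40"),
    _row("cand2", "OTHER", "1e-20", "90"),
    _row("cand2", "LALBA_HUMAN", "1e-10", "60"),
    _row("cand3", "SPACA3_HUMAN", "0.5", "25"),
])
print(confirmed_candidates(demo, {"LYZ_HUMAN", "LALBA_HUMAN", "SPACA3_HUMAN"}))  # ['cand1']
```

In this toy run, `cand2` is rejected because its best match is not lysozyme-like, even though a weaker lysozyme-like hit exists. This is the behavior a reciprocal best-match check is meant to have.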
Genomic comparisons of DNA sequences near the lysozyme-like genes were conducted using PipMaker and MultiPipMaker [42, 43, 55]. Genes neighboring the lysozyme-like genes were identified from the genome assemblies at Ensembl  and Pre!Ensembl . The organization of genes adjacent to the lysozyme-like genes was used to determine whether the genes of interest reside in conserved genomic neighborhoods.
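The neighborhood logic used above can be reduced to a small decision. Suppose genes flanking the target on both sides in a reference genome are also found together in another genome. If the target is present there, it is supported as an ortholog. If it is missing, a deletion is possible, as inferred for Spaca3 in the opossum. The Python sketch below encodes that decision. It assumes that orthologous flanking genes have already been matched by name and that each genome is given as an ordered gene list. It ignores orientation, distance, and assembly gaps, which the real analysis had to consider, and all gene names and functions are hypothetical.

```python
def flanks(order: list[str], target: str, k: int = 2) -> tuple[list[str], list[str]]:
    """Return up to k genes on each side of `target` in an ordered gene list."""
    i = order.index(target)
    return order[max(0, i - k):i], order[i + 1:i + 1 + k]


def neighborhood_status(reference: list[str], other: list[str], target: str, k: int = 2) -> str:
    """Classify `other` relative to the neighborhood of `target` in `reference`.

    'present'       : flanking genes on both sides are shared and the target is there
    'absent'        : flanking genes on both sides are shared but the target is missing
    'not conserved' : at least one side shares no flanking gene
    """
    left, right = flanks(reference, target, k)
    shared_left = any(g in other for g in left)
    shared_right = any(g in other for g in right)
    if not (shared_left and shared_right):
        return "not conserved"
    return "present" if target in other else "absent"


# Hypothetical gene orders; names stand in for matched orthologs.
placental = ["geneA", "geneB", "Spaca3", "geneC", "geneD"]
print(neighborhood_status(placental, ["geneA", "geneB", "Spaca3", "geneC"], "Spaca3"))  # present
print(neighborhood_status(placental, ["geneA", "geneB", "geneC", "geneD"], "Spaca3"))   # absent
print(neighborhood_status(placental, ["geneX", "geneY", "geneC"], "Spaca3"))            # not conserved
```

An "absent" result supports a deletion only when the region is fully sequenced. Otherwise an assembly gap could hide the gene, which is the same caveat raised above for the Lalba/c12orf41 interval.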
Phylogenies of vertebrate lysozyme-like gene coding sequences were generated with sequences from human, mouse, dog, horse, opossum, wallaby, and platypus, representing the diversity of mammals, as well as those from other vertebrate species (see Additional file 1: Table S1) and outgroups (Additional file 18: Table S2). Lysozyme-like coding sequences were aligned using MAFFT and Clustal, as implemented at the Guidance web site [58, 59], using default parameters. (A MAFFT alignment of all the full-length sequences is provided in Additional file 19: Figure S17.) Protein sequences were used as guides to generate the DNA sequence alignments. The reliability of the alignments was examined using Guidance [58, 59], and trimmed alignments were generated that retained only sites with scores above the default cut-off of 0.93. Insect and/or amphioxus lysozyme sequences were used to root the trees of vertebrate lysozyme-like sequences.
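A protein-guided DNA alignment is built in two steps. The translated sequences are aligned first, and each coding sequence is then threaded back onto its aligned protein one codon at a time. The reliability step then removes low-scoring columns. The Python sketch below shows both steps under stated assumptions: the protein alignment has already been computed (for example, by MAFFT), a per-column reliability score is available for each protein alignment column, and a column is kept only if its score exceeds 0.93. How Guidance scores were mapped onto the codon alignment in the original analysis is not stated, so trimming whole codons is an assumption, and the function names are hypothetical.

```python
def thread_codons(aligned_protein: str, cds: str) -> str:
    """Back-translate a protein alignment row: each residue -> its codon, each gap -> '---'."""
    cds = "".join(cds.split()).upper()
    n_residues = len(aligned_protein) - aligned_protein.count("-")
    # Allow an optional terminal stop codon after the last residue.
    if len(cds) not in (3 * n_residues, 3 * n_residues + 3):
        raise ValueError("CDS length does not match the number of aligned residues")
    out, pos = [], 0
    for residue in aligned_protein:
        if residue == "-":
            out.append("---")
        else:
            out.append(cds[pos:pos + 3])
            pos += 3
    return "".join(out)


def trim_codon_columns(codon_rows: list[str], column_scores: list[float],
                       cutoff: float = 0.93) -> list[str]:
    """Keep codon columns whose per-column score is strictly above `cutoff`."""
    keep = [j for j, score in enumerate(column_scores) if score > cutoff]
    trimmed = []
    for row in codon_rows:
        if len(row) != 3 * len(column_scores):
            raise ValueError("each row must have three nucleotides per scored column")
        trimmed.append("".join(row[3 * j:3 * j + 3] for j in keep))
    return trimmed


# Toy example: two aligned protein rows and their coding sequences.
rows = [thread_codons("M-K", "ATGAAA"), thread_codons("MPK", "ATGCCCAAGTAA")]
print(rows)                                          # ['ATG---AAA', 'ATGCCCAAG']
print(trim_codon_columns(rows, [0.99, 0.50, 0.95]))  # ['ATGAAA', 'ATGAAG']
```

Threading codons onto the protein alignment keeps the reading frame intact in every row. Aligning nucleotides directly can introduce gaps that are not multiples of three.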
Phylogenetic trees of the sequences were generated by a variety of methods, including MrBayes 3.1.2 [60, 61], PhyloBayes 3.2f, PhyML, MEGA4.0.2, and PAUP* 4beta10. Bayesian trees were generated from coding sequences with MrBayes 3.1.2 using parameters selected by hierarchical likelihood ratio tests with ModelTest version 3.8, as implemented on the ModelTest server [66–68]. MrBayes was run for 2,000,000 generations with four simultaneous Metropolis-coupled Markov chain Monte Carlo (MCMC) chains sampled every 100 generations. The average standard deviation of split frequencies dropped below 0.02 for all analyses. The first 25% of the trees were discarded as burn-in, and the remaining samples were used to generate the consensus trees. Trace files generated by MrBayes were examined with Tracer to verify that the runs had converged. Bayesian phylogenies were also generated from protein sequences using PhyloBayes, with two chains and the automatic stopping rule set to terminate the analysis when bpcomp and tracecomp indicated that discrepancies between the chains were equal to or below 0.2 and all effective sample sizes were greater than 100. The first 10% of the trees were discarded as burn-in. PAUP* was used to construct parsimony trees. Bootstrapped maximum likelihood trees (100 replicates) were generated by PhyML on the PhyML webserver using the substitution-model parameters suggested by ModelTest. The maximum likelihood search was initiated from a tree generated by BIONJ, and the best tree was identified after heuristic searches using the nearest neighbor interchange (NNI) algorithm. MEGA4 was used to construct bootstrapped (1000 replicates) neighbor-joining distance trees, using either Maximum Composite Likelihood distances for the DNA sequences or JTT distances for the protein sequences. Bootstrapped parsimony trees were also generated by PAUP*, with 1000 replicates and the same search method used for maximum likelihood.
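Two quantities in this paragraph can be made concrete. Sampling every 100 generations over 2,000,000 generations gives 20,000 samples per run, and discarding the first 25% removes 5,000 of them. The convergence criterion, an average standard deviation of split frequencies below 0.02, measures how much the frequency of each split (bipartition) differs among independent runs. The Python sketch below reproduces both calculations under stated assumptions: it uses the population standard deviation, applies a minimum split frequency of 0.10, and ignores any extra sample written at generation 0. MrBayes's internal conventions may differ on these points, so the sketch is illustrative and is not a re-implementation of MrBayes.

```python
from statistics import pstdev


def trees_after_burnin(ngen: int, samplefreq: int, burnin_fraction: float) -> int:
    """Samples kept per run (ignores any extra sample written at generation 0)."""
    samples = ngen // samplefreq
    return samples - int(samples * burnin_fraction)


def asdsf(run_split_freqs: list[dict], min_freq: float = 0.10) -> float:
    """Average standard deviation of split frequencies across independent runs.

    Each dict maps a split (any hashable label) to its frequency in one run.
    Splits below `min_freq` in every run are skipped (an assumed threshold).
    """
    splits = set().union(*run_split_freqs)
    sds = []
    for split in splits:
        freqs = [run.get(split, 0.0) for run in run_split_freqs]
        if max(freqs) < min_freq:
            continue
        sds.append(pstdev(freqs))
    return sum(sds) / len(sds) if sds else 0.0


print(trees_after_burnin(2_000_000, 100, 0.25))  # 15000
run1 = {"split1": 1.00, "split2": 0.60, "split3": 0.05}
run2 = {"split1": 1.00, "split2": 0.50, "split3": 0.02}
print(round(asdsf([run1, run2]), 4))  # 0.025
```

With these toy frequencies, the value (0.025) would still be above the 0.02 threshold used here, so such runs would not yet count as converged.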
With respect to orthology-paralogy issues, the choice of outgroup, the alignment method (MAFFT [56] or Clustal [57]), and the use of full-length or trimmed (based on Guidance scores [58, 59]) alignments had little influence on the key findings of these analyses. Methods that relied on shorter sequences (i.e., trimmed alignments or protein sequences) or simpler models of sequence evolution (i.e., neighbor-joining or parsimony) tended to yield weaker support for the earlier-diverging lineages, but none of our analyses were in significant conflict with the key inferences of the phylogeny presented in Figure 3.
For phylogenies that contained only mammalian lysozyme-like sequences, Lalba sequences were arbitrarily used to root the trees. When only mammalian lysozyme-like gene sequences were analyzed, stronger support for each of the orthologous groups was found with all of the phylogenetic methods used, including Bayesian inference, maximum likelihood, distance, and parsimony (see Additional file 4: Figure S3). For the gene-specific phylogenies, the platypus sequence was used as the root, except for Lysc1 and Spaca5, which the platypus lacks. For Lysc1, the sloth sequence was used to root the tree, whereas for Spaca5 the elephant and tenrec sequences provided the root.
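Outgroup rooting of the kind described above amounts to placing the root on the branch that separates the outgroup taxa from all other taxa in an unrooted tree. The sketch below restates the per-gene rules from the text (platypus by default, sloth for Lysc1, elephant plus tenrec for Spaca5), but the taxon labels, the adjacency-list tree representation, and the function names are hypothetical; the text does not state which program performed the rerooting.

```python
from typing import AbstractSet, Dict, FrozenSet, List, Set

Adjacency = Dict[str, List[str]]  # unrooted tree: node -> neighbouring nodes

# Gene-specific rooting rules from the text; taxon labels are hypothetical.
DEFAULT_OUTGROUP: FrozenSet[str] = frozenset({"platypus"})
GENE_OUTGROUPS: Dict[str, FrozenSet[str]] = {
    "Lysc1": frozenset({"sloth"}),
    "Spaca5": frozenset({"elephant", "tenrec"}),
}


def outgroup_for(gene: str) -> FrozenSet[str]:
    return GENE_OUTGROUPS.get(gene, DEFAULT_OUTGROUP)


def leaves_below(adj: Adjacency, node: str, parent: str) -> Set[str]:
    """Leaves reachable from `node` without passing back through `parent`."""
    children = [n for n in adj[node] if n != parent]
    if not children:  # degree-1 node, i.e. a leaf
        return {node}
    found: Set[str] = set()
    for child in children:
        found |= leaves_below(adj, child, node)
    return found


def to_newick(adj: Adjacency, node: str, parent: str) -> str:
    """Newick string for the subtree hanging from `node`, away from `parent`."""
    children = [n for n in adj[node] if n != parent]
    if not children:
        return node
    return "(" + ",".join(to_newick(adj, c, node) for c in children) + ")"


def root_on_outgroup(adj: Adjacency, outgroup: AbstractSet[str]) -> str:
    """Root on the branch that separates `outgroup` from the remaining taxa."""
    target = set(outgroup)
    for u, neighbours in adj.items():
        for v in neighbours:
            if leaves_below(adj, v, u) == target:
                return f"({to_newick(adj, v, u)},{to_newick(adj, u, v)});"
    raise ValueError("outgroup is not monophyletic in this tree")


if __name__ == "__main__":
    adj = {
        "n1": ["platypus", "n2", "dog"],
        "n2": ["n1", "human", "mouse"],
        "platypus": ["n1"], "dog": ["n1"], "human": ["n2"], "mouse": ["n2"],
    }
    print(root_on_outgroup(adj, outgroup_for("Lalba")))
    # (platypus,((human,mouse),dog));
```

A multi-taxon outgroup such as elephant plus tenrec is handled the same way, provided those taxa form a clade in the unrooted tree; otherwise no single branch separates them and the sketch raises an error.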
This work has been supported by grants from the Natural Sciences and Engineering Research Council (to DMI) and from the National Institutes of Health and SUNY-Albany (to CBS). We thank the Associate Editor and two anonymous reviewers for their comments that have helped improve this manuscript.
- McKenzie HA, White FH Jr: Lysozyme and alpha-lactalbumin: structure, function, and interrelationships. Adv Protein Chem. 1991, 41: 173-315.
- McKenzie HA: alpha-lactalbumins and lysozymes. Lysozymes: model enzymes in biochemistry and molecular biology. Edited by: Jollès P. 1996, Basel, Birkhäuser Verlag, 365-409.
- Prager EM, Jollès P: Animal lysozymes c and g: an overview. Lysozymes: model enzymes in biochemistry and molecular biology. Edited by: Jollès P. 1996, Basel, Birkhäuser Verlag, 9-31.
- Qasba PK, Kumar S: Molecular divergence of lysozymes and alpha-lactalbumin. Crit Rev Biochem Mol Biol. 1997, 32: 255-306. 10.3109/10409239709082574.
- Callewaert L, Michiels CW: Lysozymes in the animal kingdom. J Biosci. 2010, 35: 127-160. 10.1007/s12038-010-0015-5.
- Irwin DM, Yu M, Wen Y: Isolation and characterization of vertebrate lysozyme genes. Lysozymes: model enzymes in biochemistry and molecular biology. Edited by: Jollès P. 1996, Basel, Birkhäuser Verlag, 225-241.
- Permyakov EA, Berliner LJ: alpha-Lactalbumin: structure and function. FEBS Lett. 2000, 473: 269-274. 10.1016/S0014-5793(00)01546-5.
- Pettersson-Kastberg J, Aits S, Gustafsson L, Mossberg A, Storm P, Trulsson M, Persson F, Mok KH, Svanborg C: Can misfolded proteins be beneficial? The HAMLET case. Ann Med. 2009, 41: 162-176. 10.1080/07853890802502614.
- Nitta K, Tsuge H, Shimazaki K, Sugai S: Calcium-binding lysozymes. Biol Chem Hoppe Seyler. 1988, 369: 671-675. 10.1515/bchm3.1988.369.2.671.
- Dautigny A, Prager EM, Pham-Dinh D, Jollès J, Pakdel F, Grinde B, Jollès P: cDNA and amino acid sequences of rainbow trout (Oncorhynchus mykiss) lysozymes and their implications for the evolution of lysozyme and lactalbumin. J Mol Evol. 1991, 32: 187-98. 10.1007/BF02515392.
- Grobler JA, Rao KR, Pervaiz S, Brew K: Sequences of two highly divergent canine type c lysozymes: implications for the evolutionary origins of the lysozyme/alpha-lactalbumin superfamily. Arch Biochem Biophys. 1994, 313: 360-366. 10.1006/abbi.1994.1399.
- Nitta K, Sugai S: The evolution of lysozyme and alpha-lactalbumin. Eur J Biochem. 1989, 182: 111-118. 10.1111/j.1432-1033.1989.tb14806.x.
- Mandal A, Klotz KL, Shetty J, Jayes FL, Wolkowicz MJ, Bolling LC, Coonrod SA, Black MB, Diekman AB, Haystead TA, Flickinger CJ, Herr JC: SLLP1, a unique, intra-acrosomal, non-bacteriolytic, c lysozyme-like protein of human spermatozoa. Biol Reprod. 2003, 68: 1525-1537.
- Chiu WW, Erikson EK, Sole CA, Shelling AN, Chamley LW: SPRASA, a novel sperm protein involved in immune-mediated infertility. Hum Reprod. 2004, 19: 243-249. 10.1093/humrep/deh050.
- Zhang K, Gao R, Zhang H, Cai X, Shen C, Wu C, Zhao S, Yu L: Molecular cloning and characterization of three novel lysozyme-like genes, predominantly expressed in the male reproductive system of humans, belonging to the c-type lysozyme/alpha-lactalbumin family. Biol Reprod. 2005, 73: 1064-1071. 10.1095/biolreprod.105.041889.
- Ensembl Genome Browser [http://www.ensembl.org/index.html].
- Kurth BE, Digilio L, Snow P, Bush LA, Wolkowicz M, Shetty J, Mandal A, Hao Z, Reddi PP, Flickinger CJ, Herr JC: Immunogenicity of a multi-component recombinant human acrosomal protein vaccine in female Macaca fascicularis. J Reprod Immunol. 2008, 77: 126-141. 10.1016/j.jri.2007.06.001.
- Sun R, Shen R, Li J, Xu G, Chi J, Li L, Ren J, Wang Z, Fei J: Lyzl4, a novel mouse sperm-related protein, is involved in fertilization. Acta Biochim Biophys Sin. 2011, 43: 346-353. 10.1093/abbs/gmr017.
- Herrero MB, Mandal A, Digilio LC, Coonrod SA, Maier B, Herr JC: Mouse SLLP1, a sperm lysozyme-like protein involved in sperm-egg binding and fertilization. Dev Biol. 2005, 284: 126-142. 10.1016/j.ydbio.2005.05.008.
- Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
- Peters CW, Kruse U, Pollwein R, Grzeschik KH, Sippel AE: The human lysozyme gene. Sequence organization and chromosomal localization. Eur J Biochem. 1989, 182: 507-516. 10.1111/j.1432-1033.1989.tb14857.x.
- Hall L, Emery DC, Davies MS, Parker D, Craig RK: Organization and sequence of the human alpha-lactalbumin gene. Biochem J. 1987, 242: 735-742.
- Bailey JA, Eichler EE: Primate segmental duplications: crucibles of evolution, diversity and disease. Nat Rev Genet. 2006, 7: 552-64.
- Fawcett JA, Innan H: Neutral and non-neutral evolution of duplicated genes with gene conversion. Genes. 2011, 2: 191-209. 10.3390/genes2010191.
- Ensembl Pre-release Genome Browser [http://pre.ensembl.org/index.html].
- Hammer MF, Schilling JW, Prager EM, Wilson AC: Recruitment of lysozyme as a major enzyme in the mouse gut: duplication, divergence, and regulatory evolution. J Mol Evol. 1987, 24: 272-279. 10.1007/BF02111240.
- Cross M, Renkawitz R: Repetitive sequence involvement in the duplication and divergence of mouse lysozyme genes. EMBO J. 1990, 9: 1283-1288.
- Cross M, Mangelsdorf I, Wedel A, Renkawitz R: Mouse lysozyme M gene: isolation, characterization, and expression studies. Proc Natl Acad Sci USA. 1988, 85: 6232-6236. 10.1073/pnas.85.17.6232.
- Yeh TC, Wilson AC, Irwin DM: Evolution of rodent lysozymes: isolation and sequence of the rat lysozyme genes. Mol Phylogenet Evol. 1993, 2: 65-75. 10.1006/mpev.1993.1007.
- Cámara VM, Prieur DJ: Secretion of colonic isozyme of lysozyme in association with cecotrophy of rabbits. Am J Physiol. 1984, 247: G19-G23.
- Ito Y, Hirashima M, Yamada H, Imoto T: Colonic lysozymes of rabbit (Japanese white): recent divergence and functional conversion. J Biochem. 1994, 116: 1346-1353.
- Irwin DM, Wilson AC: Multiple cDNA sequences and the evolution of bovine stomach lysozyme. J Biol Chem. 1989, 264: 11387-11393.
- Irwin DM, Prager EM, Wilson AC: Evolutionary genetics of ruminant lysozymes. Anim Genet. 1992, 23: 193-202.
- Irwin DM: Evolution of the bovine lysozyme gene family: changes in expression and reversion of function. J Mol Evol. 1995, 41: 299-312. 10.1007/BF01215177.
- Irwin DM: Evolution of cow nonstomach lysozyme genes. Genome. 2004, 47: 1082-1090. 10.1139/g04-075.
- Irwin DM, Wilson AC: Concerted evolution of ruminant stomach lysozymes. Characterization of lysozyme cDNA clones from sheep and deer. J Biol Chem. 1990, 265: 4944-4952.
- Wen Y, Irwin DM: Mosaic evolution of ruminant stomach lysozyme genes. Mol Phylogenet Evol. 1999, 13: 474-482. 10.1006/mpev.1999.0651.
- Soulier S, Mercier JC, Vilotte JL, Anderson J, Clark AJ, Provot C: The bovine and ovine genomes contain multiple sequences homologous to the alpha-lactalbumin-encoding gene. Gene. 1989, 83: 331-338. 10.1016/0378-1119(89)90119-4.
- Vilotte JL, Soulier S, Mercier JC: Complete sequence of a bovine alpha-lactalbumin pseudogene: the region homologous to the gene is flanked by two directly repeated LINE sequences. Genomics. 1993, 16: 529-532. 10.1006/geno.1993.1223.
- Kurokawa T, Uji S, Suzuki T: Identification of cDNA coding for a homologue to mammalian leptin from pufferfish, Takifugu rubripes. Peptides. 2005, 26: 745-750. 10.1016/j.peptides.2004.12.017.
- Irwin DM, Zhang T: Evolution of the vertebrate glucose-dependent insulinotropic polypeptide (GIP) gene. Comp Biochem Physiol Part D. 2006, 1: 385-95.
- Schwartz S, Zhang Z, Frazer KA, Smit A, Riemer C, Bouck J, Gibbs R, Hardison R, Miller W: PipMaker--a web server for aligning two genomic DNA sequences. Genome Res. 2000, 10: 577-586. 10.1101/gr.10.4.577.
- Schwartz S, Elnitski L, Li M, Weirauch M, Riemer C, Smit A, NISC Comparative Sequencing Program, Green ED, Hardison RC, Miller W: MultiPipMaker and supporting tools: Alignments and analysis of multiple genomic DNA sequences. Nucleic Acids Res. 2003, 31: 3518-3524. 10.1093/nar/gkg579.
- Wu DD, Irwin DM, Zhang YP: Molecular evolution of the keratin associated protein gene family in mammals, role in the evolution of mammalian hair. BMC Evol Biol. 2008, 8: 241. 10.1186/1471-2148-8-241.
- Pavlopoulou A, Pampalakis G, Michalopoulos I, Sotiropoulou G: Evolutionary history of tissue kallikreins. PLoS One. 2010, 5: e13781. 10.1371/journal.pone.0013781.
- Sotiropoulou G, Pampalakis G, Diamandis EP: Functional roles of human kallikrein-related peptidases. J Biol Chem. 2009, 284: 32989-32994. 10.1074/jbc.R109.027946.
- Dong D, Jones G, Zhang S: Dynamic evolution of bitter taste receptor genes in vertebrates. BMC Evol Biol. 2009, 9: 12. 10.1186/1471-2148-9-12.
- BLAST: Basic Local Alignment Search Tool [http://blast.ncbi.nlm.nih.gov/Blast.cgi].
- National Center for Biotechnology Information [http://www.ncbi.nlm.nih.gov/].
- Irwin DM: Ancient duplications of the human proglucagon gene. Genomics. 2002, 79: 741-746. 10.1006/geno.2002.6762.
- Irwin DM, Gong Z: Molecular evolution of the vertebrate goose-type lysozyme genes. J Mol Evol. 2003, 56: 234-242. 10.1007/s00239-002-2396-z.
- Zhou L, Irwin DM: Fish proglucagon genes have differing coding potential. Comp Biochem Physiol. 2004, 137B: 255-264.
- Kylsten P, Kimbrell DA, Daffre S, Samakovlis C, Hultmark D: The lysozyme locus in Drosophila melanogaster: different genes are expressed in midgut and salivary glands. Mol Gen Genet. 1992, 232: 335-343.
- Liu M, Zhang S, Liu Z, Li H, Xu A: Characterization, organization and expression of AmphiLysC, an acidic c-type lysozyme gene in amphioxus Branchiostoma belcheri tsingtauense. Gene. 2006, 367: 110-117.
- PipMaker and MultiPipMaker [http://pipmaker.bx.psu.edu/pipmaker/].
- Katoh K, Misawa K, Kuma K, Miyata T: MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002, 30: 3059-3066. 10.1093/nar/gkf436.
- Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22: 4673-80. 10.1093/nar/22.22.4673.
- The Guidance Server [http://guidance.tau.ac.il/].
- Penn O, Privman E, Ashkenazy H, Landan G, Graur D, Pupko T: GUIDANCE: a web server for assessing alignment confidence scores. Nucleic Acids Res. 2010, 38: W23-W28. 10.1093/nar/gkq443.
- Huelsenbeck JP, Ronquist F: MRBAYES: Bayesian inference of phylogeny. Bioinformatics. 2001, 17: 754-755. 10.1093/bioinformatics/17.8.754.
- Ronquist F, Huelsenbeck JP: MRBAYES 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003, 19: 1572-1574. 10.1093/bioinformatics/btg180.
- Lartillot N, Philippe H: A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol Biol Evol. 2004, 21: 1095-1109. 10.1093/molbev/msh112.
- Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 2003, 52: 696-704. 10.1080/10635150390235520.
- Tamura K, Dudley J, Nei M, Kumar S: MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) Software Version 4.0. Mol Biol Evol. 2007, 24: 1596-1599. 10.1093/molbev/msm092.
- Swofford DL: PAUP*: Phylogenetic analysis using parsimony (*and other methods), version 4.0b10. 2002, Sunderland, Sinauer Associates.
- Posada D, Crandall KA: ModelTest: testing the model of DNA substitution. Bioinformatics. 1998, 14: 817-818.
- Posada D: ModelTest Server: a web-based tool for the statistical selection of models of nucleotide substitution online. Nucleic Acids Res. 2006, 34: W700-W703. 10.1093/nar/gkl042.
- ModelTest Server 1.0 [http://darwin.uvigo.es/software/modeltest_server.html].
- Rambaut A, Drummond AJ: MCMC Trace Analysis Package, version 1.5 [http://tree.bio.ed.ac.uk/software/tracer/].
- PhyML 3.0: new algorithms, methods and utilities [http://www.atgc-montpellier.fr/phyml/].
- Murphy WJ, Pringle TH, Crider TA, Springer MS, Miller W: Using genomic data to unravel the root of the placental mammal phylogeny. Genome Res. 2007, 17: 413-421. 10.1101/gr.5918807.
- Hallström BM, Janke A: Resolution among major placental mammal interordinal relationships with genome data imply that speciation influenced their earliest radiations. BMC Evol Biol. 2008, 8: 162. 10.1186/1471-2148-8-162.
- Prasad AB, Allard MW, NISC Comparative Sequencing Program, Green ED: Confirming the phylogeny of mammals by use of large comparative sequence data sets. Mol Biol Evol. 2008, 25: 1795-1808. 10.1093/molbev/msn104.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.