Evolution of trappin genes in mammals

Background Trappin is a multifunctional host-defense peptide that has antiproteolytic, antiinflammatory, and antimicrobial activities. The numbers and compositions of trappin paralogs vary among mammalian species: human and sheep have a single trappin-2 gene; mouse and rat have no trappin gene; pig and cow have multiple trappin genes; and guinea pig has a trappin gene and two other derivativegenes. Independent duplications of trappin genes in pig and cow were observed recently after the species were separated. To determine whether these trappin gene duplications are restricted only to certain mammalian lineages, we analyzed recently-developed genome databases for the presence of duplicate trappin genes. Results The database analyses revealed that: 1) duplicated trappin multigenes were found recently in the nine-banded armadillo; 2) duplicated two trappin genes had been found in the Afrotherian species (elephant, tenrec, and hyrax) since ancient days; 3) a single trappin-2 gene was found in various eutherians species; and 4) no typical trappin gene has been found in chicken, zebra finch, and opossum. Bayesian analysis estimated the date of the duplication of trappin genes in the Afrotheria, guinea pig, armadillo, cow, and pig to be 244, 35, 11, 13, and 3 million-years ago, respectively. The coding regions of trappin multigenes of almadillo, bovine, and pig evolved much faster than the noncoding exons, introns, and the flanking regions, showing that these genes have undergone accelerated evolution, and positive Darwinian selection was observed in pig-specific trappin paralogs. Conclusion These results suggest that trappin is an eutherian-specific molecule and eutherian genomes have the potential to form trappin multigenes.


Background
Trappins are a family of small secretory proteins that possess an N-terminal transglutaminase-substrate (TGS) domain and a C-terminal whey acidic protein (WAP) domain [1]. The TGS domain consists of repeats of six semi-conserved amino acids, KGQDPV, that act as anchoring regions. In this case, the lysine or glutamine residues of these regions are cross-linked with extracellular-matrix proteins by the action of transglutaminases, which helps trappin molecules to become concentrated at the site of action [2][3][4]. In contrast, the WAP domain is a four-disulfide core region and is defined by eight conserved cysteine residues. The WAP domain of trappin shows anti-proteolytic [4][5][6] and antimicrobial [7][8][9] activities that allow it to act as an innate immune defense molecule. In fact, trappin-2 displays antibacterial activities against Gram-positive and Gram-negative bacteria [7][8][9]; it also has antifungal activity [9], and the antimicrobial activity is independent of its antiprotease function [9]. The most well characterized trappin is human trappin-2, which is also known as elafin, skinderived antileukoproteinase (SKALP), elastase-specific inhibitor (ESI), or protease inhibitor 3 (PI3) [1,10]. It has strong inhibitory activity against leukocyte and pancreatic elastases and proteinase 3 [4][5][6], and shows antiinflammatory activity [11] as well. The antiproteolytic and antimicrobial activities of trappin-2 are quite similar to those of secretory leukocyte protease inhibitor (SLPI) [12,13], which consists of two WAP domains with the second WAP domain being highly homologous to the WAP domain of trappin-2. Trappin-2 is expressed in the trachea, lung, gut, epidermis, esophagus, vagina, and oral epithelia [2,4]. In these tissues, the expression is induced by proinflammatory cytokines, such as interleukin-1 (IL-1) and tumor necrosis factor (TNF)-α [14,15].
The number of trappin genes varies among mammalian species. For example, humans and sheep have a single trappin-2 gene [16,17], while pigs have at least six: trappin-1, trappin-2, trappin-3, trappin-7, trappin-8, and trappin-9 [18,19]. At the other extreme are the mouse and rat, which lack trappin genes entirely [20], though the guinea pig has genes for trappin-12 and its derivatives caltrin II and seminal vesicle secretory protein (SVP), which lack TGS-and WAP-coding regions, respectively [21,22]. Despite the variance in copy number between the different mammalian lineages, all trappin genes are encoded by three exons. Exon 1 encodes a signal peptide, exon 2 codes for a TGS-and WAPdomains, and exon 3 encodes a 3' untranslated region [18,19,23]. While the exonic organization is highly conserved among various mammalian lineages, there is variation in the number of six-amino-acid repeats in the TGS domain [18,19]. Due to a point mutation of splicing site, guinea pig trappin-12 exceptionally lacks intron 2, which is present at the 3' noncoding region of the trappin gene [22]. A short interspersed element (SINE) is found in intron 2 of the trappin genes of the pig, wart hog, and collared peccary [18,19].
While we have mentioned several species that possess multiple trappin genes, it is not known if (1) these are exceptional cases or (2) trappin genes normally exist as a multigene family. In an attempt to find the answers, we analyzed genome databases developed by the Mammalian Genome Project http://www.broad.mit.edu/mammals/ and identified six trappin genes from the ninebanded armadillo (Dasypus novemcinctus) genome. The nine-banded armadillo belongs to the taxonomic order Xenarthra. Because this lineage is believed to be one of the most ancient lineages of placental mammals [24], the analyses of armadillo trappin genes are quite interesting because the duplication and evolution of armadillo trappin genes are expected to have occurred independently from other species. In contrast, we identified a single trappin-2 gene from the genome databases of many species including the chimpanzee, rhesus macaque, bushbaby, dog, cat, horse, cow, European shrew, European hedgehog, megabat, and microbat. This fact suggests that trappin-2 is the ancestral form of trappin genes, and trappin-null species such as mouse and rat are exceptional. Finally, we identified anciently duplicated trappin-18 gene in Afrotheria such as the elephant (Loxodonta africana), tenrec (Echinops telfairi), and hyrax (Procavia capensis), and trappin-related genes in chicken and opossum, suggesting that the gene family originated as far back as more than 100 million years ago.

Results
Identification of trappin, SLPI, and trappin-related genes in eutherian mammals, opossum, platypus, chicken, and zebra finch To estimate the origin of trappin genes, we analyzed the genome databases of eutherian mammals, opossum, platypus, birds (chicken and zebra finch), Xenopus, fish (Danio rerio, Takifugu rubripes, Tetraodon nigroviridis, Gasterosteus aculeatus, and Oryzias latipes), sea squirts (Ciona intestinalis and Ciona savignyi), insects (Drosophila melanogaster, Anopheles gambiae, and Aedes aegypti), and Caenorhabditis elegans. Typical trappin genes were identified only in the mammalian species ( Figure 1C-D). A single homologous gene was identified in chicken, zebra finch, and opossum, and multiple homologous genes were identified in platypus, ( Figure  1A-B). The other species did not show any homologous genes to trappins except for other WAP-coding genes with a low homology (data not shown).
The presence of the SLPI gene was also analyzed using the genome databases, and a single orthologous gene was identified in all mammalian species except for the guinea pig and the rabbit ( Figure 1A). In contrast, there are no clear direct orthologs for the SLPI gene in the genome databases of chicken, zebra finch, and opossum, and the trappin-homologous genes are also the most homologous to SLPI. In platypus, the above-mentioned trappin-homologous genes encode two-WAP-domain proteins and may be the paralogs of the mammalian SLPI gene.
The trappin-related genes in chicken, zebra finch, and opossum have a single WAP-coding region but lack a TGS-coding region. Only the WAP-coding region is similar to trappin and the SLPI genes, but the other flanking regions lack any significant similarity except for a weak similarity in the signal-peptide coding regions (data not shown). The deduced amino acid sequence of the WAP domains of the trappin-related genes are shown in alignment with those of mammalian trappin and the SLPI genes ( Figure 1B). The catalytically important Met residue (an asterisk in Figure 1, A and 1C) is conserved in the opossum and platypus genes but not in the chicken and zebra finch genes.
Platypus SLPI genes show a stronger identity with trappin-2 (67%) than SLPI (55%) in the deduced amino acid sequences. The phylogenetic analysis for trappin, mammalian SLPI, platypus SLPI, and trappin-related genes of chicken and opossum is shown in Figure 2A.
Trappin-related genes of chicken and opossum and platypus SLPI genes are not clearly categorized as trappin or SLPI.

Identification of trappin-2 genes from various eutherian mammals
We analyzed genome databases for various eutherian mammals including human, chimpanzee, rhesus macaque, bushbaby, mouse, rat, rabbit, dog, cat, cow, European shrew, European hedgehog, microbat, megabat, nine-banded armadillo, sloth, elephant, hyrax, and tenrec. Only mouse, rat, and rabbit lack trappin genes in  Figure 1 WAP domain of the trappin gene family. WAP domains of SLPI (A), trappin-related genes (B), trappin-2 (C), and the other speciesspecific trappin paralogs (D) are shown. Conserved and semiconserved residues among trappin-2 genes are indicated by light gray. Eight conserved Cys residues constituting the WAP motif signature sequence are shaded in black. The asterisk indicates the catalytically important Met residue. The variable region, which is thought to determine the specificity of WAP motifs, is boxed. their genome databases, but the other species have at least one trappin gene in their genome databases ( Figure  3). All the newly identified trappin genes consist of three exons like previously analyzed trappin genes: exon 1 encodes a signal peptide, exon 2 encodes the TGS and WAP domains, and exon 3 encodes the 3' untranslated region (data not shown). We aligned the amino acid sequences of the WAP domains of those genes, and categorized them into two groups: the first group contained the catalytically important Met residue (asterisk in Figure 1C) and was named trappin-2 (Met rule); and the second group lacked the Met residue and was named according to the order of discovery. Most animals have a single trappin-2 gene ( Figures 1C and 3).
This finding suggests that the trappin-2 gene is the ancestral form of the trappin genes. According to this definition, the previously reported porcine trappin-8 should also be renamed trappin-2b as it also has the Met residue. Tenrec trappin-18 also has the Met residue at the catalytic site. However, the phylogenetic analyses using nucleotide sequences of the noncoding regions (introns, exon 3 and 5' and 3' flanking regions) clarified that this gene is closely related with trappin-18 genes of elephant and hyrax. Therefore we call this gene tenrec trappin-18 as an exception to the Met rule.
The average nonsynonymous and synonymous distances were calculated on 150 bp WAP-coding regions among the trappin-2 genes of the various species,  trappin paralogs except for trappin-2, and all trappin genes (Table 1). In the WAP-coding region, the rate of nonsynonymous substitutions is lower than that of synonymous substitutions among trappin-2. The rates of nonsynonymous and synonymous substitutions are similar in the WAP-coding region among trappin paralogs except for trappin-2. These results suggest that the purifying selection is operating on the trappin-2 gene in various eutherian mammals.
Identification of novel trappin multigene families in armadillo and Afrotheria (elephant, tenrec, and hyrax) Database analyses demonstrated the presence of six trappin genes in nine-banded armadillo (Dasypus novemcinctus), which were named trappin-2 and trappins-13-17 ( Figures, 1 and 3). Afrotherian species such as the elephant (Loxodonta africana), tenrec (Echinops telfairi), and hyrax (Procavia capensis) had two trappin genes, which were named trappin-2 and trappin-18 ( Figures, 1 and 3). We also found two novel trappin paralogs from the bovine genome database, and named them trappin-19 and trappin-20 ( Figures 1D and  3). Phylogenetic analyses of the noncoding regions of trappin genes from several mammalian species are shown in Figure 2B. All armadillo trappins-13-17 genes form a single branch with armadillo trappin-2 gene. Bovine trappin-19 and trappin-20 also share the same branch with bovine trappin-2. These results suggest that those genes are recently duplicated species specific    Figure 3). We next calculated the date of duplications individually for each species using the taxon pair that was most closely related to the node of interest as a reference point. We found that some of the trappin gene subfamilies were relatively young. For instance, when the divergence time between sheep and cow (18.3 Mya) was used as a calibration point, the date of the duplication events giving rise to the pig and bovine trappin gene families were calculated as 7.8 and 9.7 Mya, respectively. Similarly, when the divergence time between human and armadillo (96.2 Mya) was used as a reference point, the date of the duplication event giving rise to the armadillo trappin gene family was calculated as 15.5 Mya. On the other hand, certain trapping gene subfamilies appear to be more ancient. For example, when the divergence time between primate and rodent (61.7 Mya) was used as a reference point, the date the guinea pig trappin gene subfamily was estimated to have originated 55.2 Mya, and when the divergence time between elephant and tenrec (48.6 Mya) was used as a reference point, the Afrotherian trappin trappin gene subfamily was calculated to have originated 91.9 Mya (double asterisks in Figure 3).
We also estimated divergence times using the Bayesian method implemented in the BEAST software package [25]. The date of the duplications of trappin genes of pig, cow, armadillo, guinea pig, and Afrotheria were calculated as 3.3, 12.6, 11.4, 34.7, and 244 Mya (arrows in Figure 3), respectively, when the nucleotide sequences of trappin genes were used for the calculation. When amino acid sequences were used for the calculation, the date of the duplications of trappin genes of pig, cow, armadillo, and Afrotheria were calculated as 49.7, 18.3, 50.9, and 154 Mya (arrowheads in Figure 3), respectively. Thus, the dates calculated for nucleotide and amino acid sequence data are very different. The former are similar to the dates generated using the linearized tree method in pig, cow, and armadillo, and the latter are similar to the dates generated using the linearized tree method in Afrotheria. In pig, cow, and armadillo, the protein-coding regions of trappin multigenes have evolved rapidly (Table 2), and the dates calculated for amino acid sequence data showed larger values. In Afrotheria, since amino acid sequences evolve more slowly than nucleotide sequences and are, therefore, less prone to homoplasy over the long evolutionary time frames being considered, it is reasonable to suspect that the dates generated using nucleotide sequence data are not as reliable as those generated from acid sequence data for this particular study.

Synteny analyses around trappin genes
It has been demonstrated that the trappin-2 is mapped on the WAP four-disulphide core (WFDC) domain locus which contains a number of WFDC genes [17]. Moreover, the conserved synteny of WFDC loci has been studied for primates, rodents, and the dog [20,26]. To extend these studies, we analyzed the genes neighboring trappins in species that were included in our study. Most scaffolds containing trappin genes were too short to analyze. However, we could analyze genes neighboring to horse trappin-2, megabat trappin-2, bovine trappins, hyrax trappin-18 and elephant trappins ( Figure 4A). As reported previously for the dog WFDC locus, the horse, megabat, and bovine trappin genes flanked WFDC5 in opposed directions, whereas WFDC12 and WFDC15 were not found between WFDC5 and trappin(s). The bovine trappin genes were tandemly arrayed, suggesting that these genes arose by tandem gene duplication. In contrast, hyrax and elephant trappin-18 flanked WFDC5 in the same direction and was mapped to the same locus as WFDC12. Harr Plot analyses demonstrated that human WFDC12 is highly homologous to the 5'-flanking region, exon 1, and intron 1 of hyrax trappin-18 ( Figure 4C). Similar homology was observed between human WFDC12 and elephant trapin-18 ( Figure 4D).
Recently, Hurle et al. found that primate trappin-2 contains a pseudogene for WFDC12 in intron 1, and suggested that trappin-2 and WFDC12 have a common ancestral gene [26]. All trappin genes contained a pseudogene for WFDC12 in intron 1 (data not shown) except for Afrotherian trappin-18, which codes for a WFDC12-like peptide in intron 1 ( Figure 4B). Accelerated evolution of TGS and WAP coding region trappin multigenes in armadillo, cow, and pig and positive selection of the WAP-coding region of pig trappin paralogs The average distances of the 5'-flanking region, exon 1, intron 1, exon 2, intron 2, exon 3, and 3'-flanking region among trappin multigenes for each species were calculated ( common sites) and those between the non coding regions (87 varied sites in 1691 common sites) demonstrated that the difference is significant (P < 0.01). A similar difference was not observed in the other regions. In cow, the average distances between the exon 2 regions were 0.167 (JC method) and 0.299 (NJ method) ( Table 2, line A4), and were 3.1 and 4.3 times, respectively, higher than those between the non-coding regions ( Table 2, line B1; P < 0.01, Fisher's exact test). In pig, the average distances between the exon 2 regions ( Table 2, line A4) were 12 and 16 times higher than those between the non-coding regions when calculated by the JC and TN methods, respectively (Table 2, line B1; P < 0.01). In contrast, there was no significant difference in the average distances between the exon 2 regions (Table 2, line A4) and those between the noncoding regions ( Table 2, line B1) of elephant, hyrax, and guinea pig trappin genes. In pig, the average distances Average JC (upper) and TN (lower) distances of each region among the trappin multigenes for each species (A1-E3) and estimated times of gene duplication (F1-F2) are shown. For each species, the accelerated evolution of each region (A1-A7, C1-E3) was assessed by making comparisons against the average distances of non-coding regions (B2). The bold letters indicate the distances of regions which evolved faster than the non-coding regions of the same genes. * P < 0.05, ** P < 0.01. *** In guinea pig, there is only one trappin gene that has both the TGS and WAP domains. ND, no data; pre, signal peptide (pre-sequence).  between the exon 1 regions and between the intron 2 regions of different genes ( Table 2, lines A2 and A5) were also higher than those of the non-coding regions ( Table 2, line B1) (P < 0.01).
Next, we calculated distance values for synonymous substitutions per site (ds) and non-synonymous substitutions per site (dn) for the signal peptide (pre peptide), TGS, and WAP coding regions ( Table 2, line C1-E3), and compared against the average distance of the noncoding regions. In armadillo, dn of the TGS coding domain (P < 0.05) and both ds and dn of the WAP coding domain (P < 0.01) were significantly higher than the average distance of the non-coding regions. In cow, only dn of the TGS and WAP coding regions were significantly higher than the average distance of the non-coding regions (P < 0.01). In pig, both dn and ds of the signal peptide, TGS, and WAP coding regions were higher than the average distance of the non-coding regions.
To examine the presence of positive Darwinian selection, we compared dn and ds of the rapidly-evolved coding regions using Fisher's test. When the average values in each species were used for the analyses, we could not detect any statistically significant difference between dn and ds of all the trappin multigenes. However, the pairwise comparison matrix of dn and ds on the paralogs of each species demonstrated a dn/ds rate of 3.3 for porcine trappin-2 vs. trappin-3, 4.9 for porcine trappin-2 vs. trappin-9, and 5.2 for porcine trappin-8 vs. trappin-9 (P < 0.05) ( Table 3). Because trappin-2 is the most conservative gene within the trappin family and porcine trappin-8 is the closest homolog of porcine trappin-2, these data indicate the positive Darwinian selection of porcine trappin-3 and trappin-9. Although the dn/ds rate between bovine trappin-5 and other bovine paralogs are as high as 2.8-5.8, the differences were not statistically significant (P = 0.07) ( Table 3).

Evaluation of the quality of genomic sequence with low coverage
The nucleotide substitutions between seven known cDNAs and corresponding exons in the genome databases were calculated and shown in Table 4. Some of these substitutions may have occurred as a result of sequencing errors or site-specific polymorphism within each species. Yet, we can still infer that the average rates of sequencing errors are lower than the substitution rates. In low coverage genomic sequences of armadillo, rabbit, cat, and elephant, the average substitution rates of the seven genes were 0.10-0.49%. In high coverage genomic sequences of cow and human, the average substitution rates of the seven genes were 0.23 and 0.06%, respectively. These estimates are not substantially different, suggesting that artifacts due to errors or polymorphism are negligible. In the case for armadillo trappin genes, for example, the 129-bp WAP-coding regions and the~2.1-kb entire genes may contain less than 0.4 and 6-base sequence errors, respectively. Among armadillo trappin multigenes, the WAP-coding regions and the~2.1-kb entire genes have 8-50 and 96-165-base substitutions, respectively. Thus, the sequencing errors of genomic sequences with low coverage appear to be negligible, although the caveat remains that we can not negate a small number of possible errors.

Origin of trappin gene
Computer analyses of genome databases revealed that typical trappin is a eutherian mammalian specific gene. The typical trappin genes were found only in eutherian mammals and not other species including Xenopus, fish, sea squirt, insects, and C. elegans. The trappin-related genes were found in chicken and opossum. The computer analyses also showed that most eutherian mammalian species have a single SLPI gene, and platypus has multiple SLPI genes. The trappin-related genes of those animals and platypus SLPI genes show strong similarity with trappin in the WAP domain only, but all the other regions have no significant homology. Therefore, these genes may relate with the ancestoral WAP domain of trappin. Interestingly, platypus SLPI showed higher homology to the WAP domain of trappin-2 than that of mammalian SLPI. This strongly suggests that the WAP domain of trappin and SLPI share a common ancestor.
Trappin is a protein that consists of TGS and WAP domains. SLPI is a two WAP-domain protein. The second WAP domain of SLPI and the WAP domain of trappin-2 are quite similar in their amino acid sequences and functions such as antiproteolytic and antimicrobial activities, suggesting an ancestral relationship. However, except for the WAP-coding regions, there is no significant homology in the nucleotide sequences between trappin and SLPI. Interestingly, trappin genes are known to have weak but significant similarity with other TGS genes in introns, TGS-coding, and noncoding regions ( Figure 5) [27]. This mosaic pattern of homology in trappin genes indicates that trappin genes originated from the TGS gene and obtained a WAP domain possibly by exon-shuffling. Afrotherian trappin-18 codes for a WFDC12-like peptide in intron 1 region, and the other trappins contain a pseudogene for WFDC12 in intron 1. These results support the hypothesis of Hurle et al. [26] that trappin and WFDC12 are derived from a common ancestral gene which codes for both trappin and WFDC12.

Evolution of trappin genes in eutherian mammals
Nineteen species of eutherian mammals were analyzed by a search for the presence of trappin genes within their genome databases, and the results were combined with those of previous experimental analyses of human [17,23], pig [18,19], wart hog [19], collared peccary [19], cow [28], sheep [16], and guinea pig [21,22]. In total, we could compare the trappin genes from 24 eutherian mammals (Figure 3). Within the 24 species analyzed, we could isolate trappin genes from 21 species. A single trappin-2 gene was found in 11 species, and multiple trappin genes were found in 8 species. These results indicate that trappin-2 is the most common and is an ancestral form while the other trappins are specie-specific paralogs. We could not find trappin genes in three mammalian species: mouse, rat, and rabbit. Our experimental analyses (data not shown) and the integrity of the genome databases of mouse and rat suggest that mouse and rat lack trappin genes in their genome [20]. In mouse and rat, other WAP-motif containing proteins such as SLPI and SWAMs may compensate the function Numbers show the nucleotide substitutions per the length of the sequences used for the analysis. The scaffold, contig, or accession numbers of the sequences are shown in Supplementary Table S1 (additional file 1). ND, no data. of trappin. In the case of rabbit, it is not certain whether rabbit really lacks trappin genes or rabbit has a trappin gene that has not yet been analyzed by the genome project. By computer analyses of genome databases, we found that the nine-banded armadillo as well as pig and cow also have recently-duplicated trappin multigene. The computer analyses of bovine genome databases also revealed two novel trappin paralogs and the sequences of the introns and flanking regions, which enabled the detailed evolutional analyses of bovine trappin multigenes. As previously reported porcine trappin multigenes, the WAP-coding regions of the trappin multigenes of armadillo and cow were shown to have evolved under accelerated evolution. Only dn was accelerated in the WAP coding regions of bovine trappins, and both dn and ds of the WAP coding regions were accelerated in armadillo and porcine trappins. The accelerated substitutions of non-synonymous sites of WAP-coding regions may be explained by positive Darwinian selection or relaxation of functional constraints, because we observed statistically significant positive selection of the WAP coding regions of porcine trappin-3 and trappin-9 but no significant difference between dn and ds of other trappins (Table 3). However, the question why synonymous substitutions are also accelerated can not be interpreted simply by the existence of positive Darwinian selection or relaxation of functional constraint. The mechanism whereby the synonymous substitutions are accelerated must be clarified by future studies.

TGS
The molecular clock and Bayesian analyses using the nucleotide sequences estimated the date of duplication as 11.4-15.9, 8.8-12.6, and 3.3-7.8 Mya for trappin multigenes of armadillo, cow, and pig, respectively. These results are consistent with previous experimental analyses demonstrating that the collared peccary that was separated from porcine 33 Mya [29,30], and sheep, which was separated from bovine 19.6 Mya [31], do not have trappin multigenes [16,19]. The findings of recently-duplicated accelerated-evolved trappin multigenes in three individual species demonstrate that mammalian genomes have the potential to form trappin multigenes in several million years. The selective pressure that formed the trappin multigenes may relate with some pathogens, and the variety of amino-acid sequences in the WAP-domain may contribute to the acquisition of antimicrobial activities for a large spectrum of pathogens. Tissue distribution of trappin paralogs in pig and cow has been shown to vary among genes: porcine trappin-2 is expressed in the trachea and the large intestine, porcine trappin-1 in the small intestine, bovine trappin-2 in the epidermis and the tongue, bovine trappin-4 in the trachea and the tongue, and bovine trappin-5 in the trachea [28]. Therefore, the selective pressures might also affect the regulation of the tissue-specific expression of trappin genes.
Our previous analyses revealed that guinea pig has a trappin-12 gene [22] and two derivative genes, SVP [32] and caltrin II [19]. SVP and caltrin II genes have significant homology with trappin including introns, noncoding region of exons, and flanking regions, but lack WAP and TGS domains, respectively. The molecular clock analysis estimated the date of the duplication of the guinea pig genes as 34.7-79.0 Mya. This date of duplication is much earlier than those of pig, cow, and armadillo.
In Afrotherians we found two trappin genes, trappin-2 and trappin-18, whose date of duplication was estimated as 91.9-244 Mya. This date is surprising, because it is earlier than the date of the periods of divergence of the major orders of eutherian mammals (70-10 Mya) [24,31], and suggests that the duplication of trappin-18 occurred in the ancestors of the eutherian mammals before the divergence of the species. In this context, most species lack trappin-18, however, only Afrotheria has retained the gene. The reason is still unknown, but it is conceivable that trappin-18 increases resistance to Afrotheria-specific pathogen. Another possible alternative explanation is that trappin-18 underwent substitutions at a faster rate per year than other trappin genes and that lead to the duplication time being overestimated.

Conclusions
• Typical trappin genes are only found in the genome sequences of eutherians but not in those of other vertebrate species.
• Trappin-2 is the most widely distributed and is the strongest candidate of the ancestral forms of trappin.
Recently-duplicated species-specific trappin paralogs are present in the genomes of armadillo, pig, and cow, and the non-synonymous sites of those genes have undergone accelerated evolution as a result of positive Darwinian selection or relaxation of functional constraint.
• Synonymous sites of recently-duplicated trappin paralogs of armadillo and pig have also undergone accelerated evolution by unknown mechanisms.
• The anciently-duplicated trappin-18 gene is only retained in afrotherian species and is a fossil molecule of the trappin gene family.

Isolation of trappin genes from various animal species
The genome database of various species (URL: http:// www.ensembl.org/index.html) [33] were screened using the amino-acid sequence of human trappin-2. The exon-intron organization was estimated by comparing it with that of the human trappin-2 gene. The nucleotide sequences of the trappin genes were deposited in the DDBJ/EMBL/GenBank DNA databases as third party annotations (TPAs) under accession numbers BR000322 to BR000327 and BR000708 to BR000720.
Evaluation of the quality of genomic sequence with low coverage To evaluate the quality of genomic sequences with low coverage, we compared known cDNA sequences with those of corresponding exons in the genome databases. We used cDNA sequences for SDHA, MDH2, ATP5B, GAPDH, SDHB, CS, and IDH1 which was determined by Kullberg et al. [34]. The corresponding exons of armadillo (2 × coverage), rabbit (2 ×), cat (1.87 ×), elephant (2 ×), cow (7 ×), and human (Genome Reference Consortium GRCh37 assembly) were isolated and the numbers of nucleotide substitutions between the sequences of cDNA and the genome databases were calculated for each species using MEGA software [35]. The sequences used for the analysis are shown in Supplemental Table S1 (see Additional file 1).

Molecular clock analysis and Bayesian divergence time estimation
The introns, exon 3 (noncoding exon), and 5'-and 3'noncoding regions of pig, cow, armadillo, guinea pig, and Afrotheria (elephant and hyrax) trappins were aligned, and a phylogenetic tree was constructed by the NJ method. A linearized tree was constructed and the dates of the duplication events of trappin genes were calculated by MEGA 3.1 using the divergence time between Primate and Artiodactyla (96.2 Mya) [40] as a calibration point for dating.
As an additional method to investigate divergence times, we used the Bayesian method implemented in the software package BEAST 1.4.8 [25]. To generate divergence times, the following nine fossil calibration points were taken from the work by Benton and Donoghue [40] and implemented as priors in the analysis of both DNA sequence and amino acid sequence data: (1) human-chimp: 6.5 Mya; (2) human-Macaque: 23.5 ± 0.5 Mya; (3) dog-cat: 43 ± 0.2 Mya; (4) cow-sheep: 18.3 ± 0.1 Mya; (5) cow-dog 96.2 ± 0.9 Mya; (6) human-cow: 96.2 ± 0.9 Mya; (7) human-armadillo: 96.2 ± 0.9 Mya; (8) tenrec-elephant: 48.6 ± 0.2 Mya; (9) human-opossum: 124.6 ± 0.1 Mya. The chains were run until convergence was reached (i.e., until the effective sample size for each parameter exceeded 200), which was 93 million states for the DNA sequence data and 10 million states for the amino acid sequence data. The HKY + gamma model was used for the analysis of the DNA sequence data, and the WAG model was used for the analysis of the amino acid sequence data. For both sequence data types, the birth-death speciation process was used as a tree prior.

Calculation of nucleotide substitution rates
Nucleotide sequences were separated into the following regions: the 5'-flanking region, exon 1, intron 1, exon 2, the WAP-coding region of exon 2, intron 2, exon 3, and the 3'-flanking region. These regions were aligned separately using ClustalW software. Jukes-Cantor (JC) distances [41] and Tamura-Nei (TN) distances [42] were calculated using MEGA software [35]. For the calculation of TN distances, we estimated the gamma shape parameter using MrBayes [43] plugin for Geneious software. Distance values for the synonymous substitutions per site (ds) and non-synonymous substitutions per site (dn) of the signal-peptide-coding region of exon 1, TGS-and WAP-coding regions of exon 2 were calculated using the modified Nei-Gojobori (NG) method [44]. Standard errors were computed using the bootstrap method [45] with 2,000 replicates. Fisher's exact test was used for the statistical analyses [46].
Additional file 1: Nucleotide sequences used for the evaluation of the quality of genomic sequence with low coverage. Accession numbers for the cDNAs (upper) and scaffold, contig, or accession numbers of the genomic sequences (lower) are shown. ND, no data. Click here for file [ http://www.biomedcentral.com/content/supplementary/1471-2148-10-31-S1.DOCX ]