Evolution, expansion and expression of the Kunitz/BPTI gene family associated with long-term blood feeding in Ixodes Scapularis

Background Recent studies of the tick saliva transcriptome have revealed the profound role of salivary proteins in blood feeding. Kunitz/BPTI proteins are abundant in the salivary glands of ticks and perform multiple functions in blood feeding, such as inhibiting blood coagulation, regulating host blood supply and disrupting host angiogenesis. However, Kunitz/BPTI proteins in soft and hard ticks have different functions and molecular mechanisms. How these differences emerged and whether they are associated with the evolution of long-term blood feeding in hard ticks remain unknown. Results In this study, the evolution, expansion and expression of Kunitz/BPTI family in Ixodes scapularis were investigated. Single- and multi-domain Kunitz/BPTI proteins have similar gene structures. Single-domain proteins were classified into three groups (groups I, II and III) based on their cysteine patterns. Group I represents the ancestral branch of the Kunitz/BPTI family, and members of this group function as serine protease inhibitors. The group I domain was used as a module to create multi-domain proteins in hard ticks after the split between hard and soft ticks. However, groups II and III, which evolved from group I, are only present and expanded in the genus Ixodes. These lineage-specific expanded genes exhibit significantly higher expression during long-term blood feeding in Ixodes scapularis. Interestingly, functional site analysis suggested that group II proteins lost the ability to inhibit serine proteases and evolved a new function of modulating ion channels. Finally, evolutionary analyses revealed that the expansion and diversification of the Kunitz/BPTI family in the genus Ixodes were driven by positive selection. Conclusions These results suggest that the differences in the Kunitz/BPTI family between soft and hard ticks may be linked to the evolution of long-term blood feeding in hard ticks. In Ixodes, the lineage-specific expanded genes (Group II and III) lost the ancient function of inhibiting serine proteases and evolved new functions to adapt to long-term blood feeding. Therefore, these genes may play a profound role in the long-term blood feeding of hard ticks. Based our analysis, we propose that the six genes identified in our study may be candidate target genes for tick control.

Conclusions: These results suggest that the differences in the Kunitz/BPTI family between soft and hard ticks may be linked to the evolution of long-term blood feeding in hard ticks. In Ixodes, the lineage-specific expanded genes (Group II and III) lost the ancient function of inhibiting serine proteases and evolved new functions to adapt to long-term blood feeding. Therefore, these genes may play a profound role in the long-term blood feeding of hard ticks. Based our analysis, we propose that the six genes identified in our study may be candidate target genes for tick control.

Background
Ticks are classified into two major families: Ixodidae (hard ticks) and Argasidae (soft ticks) [1,2]. The family Ixodidae is further divided into two groups, Prostriata and Metastriata. Prostriata contains only a single genus, Ixodes. In contrast, Metastriata contains four subfamilies: Amblyomminae, Haemaphysalinae, Hyalomminae, and Rhipicephalinae [1,2]. All ticks are external bloodfeeding parasites of mammals, birds and reptiles throughout the world [3,4]. They can transmit a wide variety of pathogens causing several human and animal diseases, including Lyme disease, human granulocytic anaplasmosis, and human babesiosis [5,6]. However, hard and soft ticks display different feeding strategies. Hard ticks feed on blood for a few days to over one week, whereas soft ticks typically feed on blood for minutes to hours [7]. The evolutionary drivers of long-term blood feeding in hard ticks remain unknown.
How the functional differences and complex domain architectures of Kunitz/BPTI proteins emerged and whether this evolution is associated with differences in blood-feeding strategies between soft and hard ticks remain unclear. Studying the evolution and expression of Kunitz/BPTI gene family between soft and hard ticks can help answer these questions. The evolution of the Kunitz/BPTI family in soft ticks has been previously studied [11,26]. These studies revealed that the blood coagulation and platelet aggregation inhibitors (Kunitz/ BPTI proteins) evolved in the ancestral soft tick lineage and persisted in all soft ticks. However, to our knowledge, there has not been a comprehensive and systematic study on the evolution and expression of Kunitz/ BPTI gene family in hard ticks. The hard tick Ixodes scapularis and its close relatives, Ixodes pacificus and Ixodes ricinus, are the most important ticks because they transmit the majority of emerging human disease pathogens [28,29]. The genome and saliva transcriptome of Ixodes scapularis are now available [15,18,28], which provides an opportunity to study the evolution and expression of Kunitz/BPTI gene family in Ixodes scapularis.
In our study, we systematically examined the evolution, expansion and expression of Kunitz/BPTI gene family in Ixodes scapularis. We then compared the characteristics of this family between soft and hard ticks by sequence, structure and gene expression analyses. Based on these analyses, we illustrated the profound role of this family in long-term blood feeding and discuss the reasons for different blood-feeding strategies between hard and soft ticks. Finally, we proposed that the six genes (Table 1) with highly dynamic expression identified in our study might be candidate target genes for tick control.

Results
Identification, classification and gene structure of the Kunitz/BPTI family in all ticks The sequence search strategy used here was described in Methods and additional file (Additional File 1). We retrieved 368 sequences in the NR database at the NCBI website. After removing redundant sequences (Additional File 2), partial sequences and other sequences without the cysteine motif of the Kunitz/BPTI domain, a total of 303 Kunitz/BPTI proteins were recognized in Six-KU proteins were not present. Signal peptides of the 303 sequences were predicted using the SignalP 4.0 server [30]. We found that most of the Kunitz/BPTI proteins (205/303 sequences) have signal peptides. Signal peptides were detected in each type of Kunitz/BPTI protein, indicating that Kunitz/BPTI proteins perform their function mainly in the extracellular medium.
The genome data of Ixodes scapularis contain rich information about the organization between Kunitz/ BPTI gene loci and their gene structures. By aligning mRNA to genomic sequences, we obtained gene structures and organization for the Kunitz/BPTI gene loci ( Figure 1). Single-domain proteins have similar gene structure. Furthermore, single and multi-domain Kunitz/BPTI proteins have similar gene structure. The first exon encodes a signal peptide, and the Kunitz/BPTI domain is encoded in a single exon ( Figure 1A). Multiple exons arrayed in tandem in the gene contribute to multi-domain Kunitz/BPTI proteins. The length of the introns is divergent. However, the Kunitz/BPTI domainencoding exon has an average length of 180 bp, which encodes the 60-aa domain. Some Kunitz/BPTI genes (9/ 55) are arranged in tandem in the genome ( Figure 1B), while other Kunitz/BPTI genes (46/55) are dispersed throughout the genome.
Evolution of single-domain Kunitz/BPTI proteins: Evidence for lineage-specific expansion in Ixodes ticks To clarify the phylogenetic relationships of singledomain Kunitz/BPTI proteins in Ixodes scapularis and compare them with other tick Kunitz/BPTI proteins, we first analyzed the Kunitz/BPTI proteins in Ixodes scapularis. Then, all tick Kunitz/BPTI proteins were gathered for further analysis. A total of 80 Kunitz/BPTI proteins in Ixodes scapularis are clustered into three clades (named groups I, II and III) (Figure 2A and Additional File 4). Groups II and III are highly supported with posterior probabilities of 0.99 and 1 in Bayesian inference (MB) tree and bootstrap values of 61% and 100% (NJ tree) and 62% and 100% (ML tree), respectively. In contrast, group I forms one clade with lower support values of 0.79 in the MB tree and 0.75 in the NJ tree. The proteins group I do not cluster into one clade in the ML tree (Additional File 4). The Low supports for the three groups may be due to the long branches and divergence in the group. Groups I, II and III follow the cysteine patterns CX  Figure 2B and Additional File 5). The pattern of group I is the typical cysteine pattern of the Kunitz/BPTI family, and this motif is also common in Bilateria and Cnidaria [24,[35][36][37][38][39]. This suggests that group I represents the ancestral branch of the Kunitz/BPTI family in Ixodes scapularis.
Next, we extended our analysis of single-domain Kunitz/BPTI proteins from Ixodes scapularis to all ticks. Group I proteins in Ixodes scapularis are distributed widely, and they do not form a monophyletic clade (Figure 3 and Additional File 6), which further indicates that group I represents the ancestral branch of the Kunitz/BPTI family in Ixodes scapularis. By contrast, groups II and III form two separate monophyletic clades with high support (posterior probabilities of 0.92 and 0.97 in the MB tree and bootstrap values of 70% and 97% in the NJ tree and 70% and 84% in the ML tree, respectively) ( Figure 3 and Additional File 6). This  suggests that group II and III proteins may be only present in the genus Ixodes.
To confirm this argument, we searched the EST database of NCBI by TBLASTN. We were able to obtain some Kunitz/BPTI ESTs in Ixodes ricinus with translated protein sequences containing cysteine patterns from groups II and III (Additional File 7). However, this cysteine pattern was not found in other tick genera ESTs. Furthermore, we searched the genome of an additional hard tick (Rhipicephalus microplus) using The support values for these two groups (in MB, NJ, and ML tree) were shown in left bottom. The circle, square and triangle in the phylogram indicate the proteins from soft ticks, the genus Ixodes (hard ticks) and other hard ticks, respectively. The sequence name consists of 5 letters (3 from the genus and 2 from the species name) with the NCBI accession number. (Argmo, Argas monolakensis; Ornco, Ornithodoros coriaceus; Ornmo, Ornithodoros moubata; Ornpa, Ornithodoros parkeri; Ornsa, Ornithodoros savignyi; Rhiap, Rhipicephalus appendiculatus; Rhisa, Rhipicephalus sanguineus; Ixosc, Ixodes scapularis; Ixopa, Ixodes pacificus; Ixori, Ixodes ricinus; Rhimi, Rhipicephalus microplus; Hyama, Hyalomma marginatum rufipes; Ambam, Amblyomma americanum; Haelo, Haemaphysalis longicornis).
TBLASTN but did not find any group II or III proteins. Taken together, we infer that the genes in groups II and III are only present in species of the genus Ixodes. Groups II and III are lineage-specific expansions of the Kunitz/BPTI family in Ixodes.
Evolution of multi-domain Kunitz/BPTI proteins: The group I domain is a module for constructing multidomain Kunitz/BPTI proteins There are many multi-domain Kunitz/BPTI proteins in ticks. Two-, three-, four-and five-KU proteins from all available tick sequences were aligned, and their corresponding phylogenetic trees were constructed. A phylogenetic tree for seven-KU proteins was not constructed because only three seven-KU proteins were found (one in Ixodes scapularis and two in Amblyomma maculatum). By carefully checking these phylogenetic trees, we found that two-KU proteins are widespread in both soft and hard ticks ( Figure 4A). Intriguingly, the proteins that have more than two Kunitz/BPTI domains (three-, four-, five-and seven-KU) are only present in hard ticks ( Figure 4B, C, and 4D). This indicates that many multi- domain Kunitz/BPTI proteins were created in hard ticks during the evolution of the Kunitz/BPTI family. These multi-domain Kunitz/BPTI proteins may contribute to the different blood-feeding strategies of soft and hard ticks.
In our analysis, multiple Kunitz/BPTI domains in tandem contribute to multi-domain Kunitz/BPTI proteins. Because single-domain Kunitz/BPTI proteins were classified into three groups, we examined whether multidomain Kunitz/BPTI proteins arose from a single Kunitz/BPTI group or whether they originated from different Kunitz/BPTI groups. First, all multi-domain Kunitz/BPTI proteins in Ixodes scapularis were split into single-domain segments. Then, these single-domain segments were aligned together with all of the singledomain Kunitz/BPTI proteins in Ixodes scapularis, and the corresponding phylogenetic trees were constructed (Additional File 8). Again, groups II and III form two separate monophyletic clades with high support (posterior probabilities of 0.96 and 1 in the MB tree and bootstrap values of 77% and 96% in the NJ tree and 82% and 91% in the ML tree, respectively). In contrast, group I proteins are distributed widely, and they cluster with other domains from the different multi-domain Kunitz/ BPTI proteins. This indicates that only the group I domain was used as a module for constructing multidomain Kunitz/BPTI proteins, and group II and III proteins were not involved in the formation of multidomain Kunitz/BPTI proteins.
This hypothesis is also supported by the species distribution of the Kunitz/BPTI proteins. Groups II and III are only present in the genus Ixodes. However, multidomain Kunitz/BPTI proteins are widely present in hard ticks (including the genera Ixodes, Amblyomma, and Rhipicephalus) and soft ticks (in the case of two-KU proteins). This suggests that multi-domain Kunitz/BPTI proteins emerged earlier than the development of group II and III proteins. Therefore, it is impossible for multidomain Kunitz/BPTI proteins to have originated from group II and III domains.

Functional site analysis of the Kunitz/BPTI family in Ixodes scapularis
The typical Kunitz/BPTI protein has a conserved tertiary structure stabilized by three highly conserved disulfide bridges with the bonding patterns 1-6, 2-4, and 3-5 ( Figure 5A). The structure-function relationship of Kunitz/BPTI proteins has been studied in prior studies [22,24,34,36,37,[40][41][42][43]. Kunitz/BPTI proteins have two functions. One commonly accepted function is the inhibition of serine proteases [8,44]. The second function, which is relatively rare, is the blocking of ion channels [24,45]. At the "apex" of the inhibitor structure, a protruding loop (L1) penetrates deeply in the catalytic-site cleft of a protease to inhibit the protease via the P 1 residue side chain (Lys15 in BPTI) [24,36,46,47] (Figure 5A). In ion channel blockers, the "base" of the structure, which contains a 3 10 helix and a β-turn, binds near the entrance of the pore of an ion channel to block the ion channel activity [39,[48][49][50] ( Figure 5A).
The key residues for inhibiting serine proteases are the amino acids surrounding the P 1 residue that are in direct contact with the protease [51]. Trypsin inhibitors have a basic residue (R or K) in the P 1 position followed by a small residue (motif: C 2 [K/R] s). Chymotrypsin inhibitors have a residue with a large side-chain (F, L, N, Y) in the P 1 position followed by a small residue (motif: C 2 [B] s) (where C 2 , B and s indicate the second residue cysteine, a residue with a large side-chain and a small residue, respectively) [36]. Most of the proteins in group I exhibit the motifs C 2 [K/R] s and C 2 [B] s (Figure 2B and Additional File 5), which suggests that they are serine protease inhibitors. Therefore, group I proteins may function as anti-hemostatic factors by inhibiting serine proteases in the coagulation system. For example, the protein Ir-CPI ( Figure 3) from Ixodes ricinus has the motif C2 [K/R] s, and it can inhibit FXIIa, FXIa, and kallikrein [52].
However, proteins in groups II and III lack the key residues for inhibiting serine proteases ( Figure 2B), suggesting that they lost this function. To investigate whether group II and III proteins have evolved new functions, we performed structure-based phylogenetic analysis ( Figure 5B, C). Group III proteins form a monophyletic clade that does not include homologous structures ( Figure 5C), preventing us from inferring their function. Interestingly, group II proteins and Ra-KLP (PDBID: 2W8X) [24] are clustered into the same clade ( Figure 5C). Ra-KLP, which is a salivary protein in Rhipicephalus appendiculatus (hard tick), can regulate host blood flow and innate immunity through modulating maxiK channels [24]. Ra-KLP is well superimposed with protein AAM93612 (group II) (RMSD: 1.2) ( Figure 5B). The two proteins share strong similarities in the regions of the 3 10 helix and the β-turn at the "base" of the structure ( Figure 5B). Furthermore, structural elements (R or K and a close hydrophobic residue [L, Y, or F]) associated with channel-modulating activity [39,48] are present in the AAM93612 protein and most proteins in group II ( Figure 5B and Additional File 5). These observations indicate that proteins in group II may modulate ion channels, such as maxiK channels. Some of the proteins in group II possess a Q instead of an R or K (Additional File 5), which means that these proteins cannot exhibit channel-modulating activity. Therefore, the possibility that group II may have multiple functions should not be excluded.
The expression patterns of genes in groups II and III exhibit stage-specificity during long-term blood feeding in Ixodes ticks Hard ticks feed on blood for a few days to over one week, whereas soft ticks typically feed on blood for only minutes to hours [7]. Therefore, if genes exhibit significantly higher expression at 6-12 hours or later after host attachment, such genes may have a potentially important role in the long-term blood feeding of hard ticks. Here, we used R statistics [53] to analyze  1BPI). Secondary structural elements were colored from blue (N-terminus) to red (C-terminus). The three conserved disulphide bridges and the P 1 residue (Lys15 in BPTI) are shown as an atom-colored stick model. (B), Structural alignment of proteins AAM93612 (group II) and Ra-KLP (PDBID: 2W8X). The two proteins share strong similarities in the regions of 3 10 helix and β-turn at the "base" of the structure. In contrast, there is significant difference at the "apex" of the structure between the two proteins, due to different type indels. The superimposed structures are shown as cartoon (left) and ribbon (right). The key residues associated with channel-modulating activity are represented as stick (AAM93612: R-34, F-28; Ra-KLP: R-50, Y-44). (C), Structure-based neighbor-joining (NJ) tree of Kunitz/BPTI proteins. Six protein modes (AAM93606, AAM93608, AAM93610 and AAM93612 from group II; AAM93632 and AAM93635 from group III) were aligned with other Kunitz/BPTI protein structures (1BF0, 1BPI, 1BUN, 1DEM, 1DTK, 1DTX, 1SHP, 2CA7, 2JOT, 2KCR, 2UUY, 2W8X) using the MISTRAL online server. The root-mean-square deviation (RMSD) of pairwise structures was used to reconstruct neighbor-joining (NJ) tree. The length of each branch is proportional to the RMSD. Bar denotes 20% RMSD.
We further searched the EST database of Ixodes ricinus and found similar expression patterns for group II and III genes (Additional File 7). In both Ixodes scapularis and Ixodes ricinus, group II genes were expressed in the middle and late stages of long-term blood feeding, whereas group III genes were only expressed in the late stage of long-term blood feeding ( Figure 6 and Additional File 7). The expression patterns of genes in groups II and III exhibit stage-specificity during longterm blood feeding. This suggests that these genes are functionally linked to long-term blood feeding in the Ixodes ticks.

Positive selection drove the evolution of group II and III genes
Positive selection is a major driving force in the expansion of gene families for particular functions [54]. To investigate whether positive selection drove the evolution of new functions in the Kunitz/BPTI family in Ixodes scapularis, site-specific models implemented in the CODEML program of PAML version 4.4c [55] were used to evaluate the selective pressure in each group. We found strong evidence (with p-value < 0.0001 for M1a vs. M2a and M7 vs. M8) for positive selection imposed on both groups II and III, and we identified positively selected sites in each group ( Figure 2B, 7 and Additional File 9 and 10). However, there no evidence for positive selection was found in group I (Figure 7 and Additional File 11).
Substitution saturation could bias the estimation of the dN/dS ratios. In order to assess substitution saturation of three groups (group I, II and III) further, two methods (Steel's method [56] and Xia's method [57]) implemented in the software DAMBE [58] were used. Both methods show that group I has experienced substitution saturation, but groups II and III have experienced little substitution saturation (Additional file 12). Furthermore, previous research shows that saturation does not increase the rate of false positive predictions for the likelihood ratio test [59]. Therefore, the results of detection of positive selection within group II and III sequences remain relevant. Although Bayesian prediction of sites under positive selection is more sensitive to saturation level, the divergence levels estimated here do not pose a serious concern [60,61]. Therefore, we used detected sites under positive selection for further exploratory analyses.
Sequences used in the PAML analysis and the phylogenies with dN and dS on each branch have been provided in additional files (Additional File 13 and 14). There are more positively selected sites in group II than in group III (Figure 7, Additional File 9 and 10). To address whether the identified sites under positive selection relate to the molecular function of the Kunitz/BPTI family, the positively selected sites were mapped to three-dimensional structures modeled by I-TASSER online [62]. In group II, the positively selected sites are in the L1 and L2 loops at the "apex" and the α-helix at the "base" of the structure (Figure 7B), which suggests that there was selection for rapid changes in this region. The rapid amino acid substitutions could lead to structural conformational changes in the 3 10 helix and βturn, which paves the way for the evolution of new functions, such as modulating ion channels. In group III, only three positively selected sites were identified; they are located at the L1 loop, L2 loop and β-turn ( Figure 7B), which suggests that these sites are involved in the function of group III proteins. In both group II and group III, positively selected sites are located in the functional regions. Therefore, rapid evolution in these two groups of Kunitz/BPTI proteins may have contributed to the evolution of new functions.

Discussion
The evolution scenario of Kunitz/BPTI family in ticks Ticks are classified into several groups (Figure 8). Previous studies indicated that Ixodidae (hard ticks) evolved from bird-feeding soft ticks and whole genome duplication occurred once early in tick evolution [15,63,64]. Based on our results combined with those of previous studies, the evolution of the Kunitz/BPTI family in ticks is proposed as follows ( Figure 8A). The common ancestor of ticks carried Kunitz/BPTI genes orthologous to those in group I, because group I genes are present in both soft and hard ticks. Hard and soft ticks diverged between 120 and 92 MYA [26], and whole genome duplication occurred in this divergence. Many multidomain Kunitz/BPTI proteins were created in hard ticks after the split between hard and soft ticks. Three-, four-, five-and seven-KU proteins are only present in hard ticks. After the split of Prostriata and Metastriata, certain indels developed in group I genes followed by multiple gene duplications gave rise to groups II and III in Prostriata. By contrast, other types of indels developed in group I genes in Metastriata. The group II and III type indels lead to structural variations in the Kunitz/ BPTI proteins ( Figure 8B). For example, the L1 loop in group II turns into a longer and distorted loop, and the 3 10 helix is not present in group III. These variations may result in evolutionary novelty by changing the Figure 7 Positive selection detection result of Kunitz/BPTI family in Ixodes scapularis. (A), Column charts of the posterior mean ω for each site in group I, group II and group III. In these charts, the blue, green and red columns are based on M3, M2a and M8, respectively. Positively selected sites are detected in both group II and group III, but not in group I. (B), Surface models of representative proteins in group I, group II and group III. Protein XP_002406145.1, AAM93612.1 and AAK97828.1 from group I, group II and group III, respectively, were modeled using the I-TASSER server. Positively selected sites with posterior probability > 95% (Table S4-S5) are located in the surface models (shown as stick with red color). structure and therefore the function of the protein.
Then, positive selection drove the divergence of genes in groups II and III in Prostriata, as described above. And structural alignment and functional site analysis suggests that genes in group II evolved a new function of modulating ion channels.
Ra-KLP, which is a salivary protein in Rhipicephalus appendiculatus (Metastriata), lost the function of inhibiting serine proteases but gained the function of modulating maxiK channels [24]. Both Ra-KLP and group II proteins are members of the Kunitz/BPTI family, but they resulted from different types of indels and formed different cysteine patterns after the split between Prostriata and Metastriata ( Figure 5B, 8A and Additional File 15). However, the two proteins share strong similarities in their 3 10 helices and β-turns at the "base" of the structures, and both have structural elements associated with channel-modulating activity ( Figure 5B and Additional File 15). This is an example of the phenomenon of convergent evolution. Kunitz/BPTI proteins in the two branches of hard ticks (Prostriata and Metastriata) acquired similar structures and functions despite having different indels and cysteine patterns ( Figure 5B and Additional File 15).
This evolutionary scenario is rare in other species, such as snakes, spiders, cone snails and sea anemones, in which positive selection was detected in the evolution of the Kunitz/BPTI family despite the absence of indels [24,[35][36][37][38][39].
The profound role of Kunitz/BPTI family in long-term blood feeding To successfully feed to repletion, ticks must circumvent host hemostatic, inflammatory and immune responses [7][8][9]. Hemostatic responses rely on the triad of blood coagulation, platelet aggregation, and vasoconstriction [7]. In our study, group I proteins are serine protease inhibitors that function as anti-hemostatic factors by inhibiting kallikrein and the coagulation factors FXIIa, FXIa and FXa (Figure 3). Group II proteins may have evolved a new function in regulating host blood flow and innate immunity through modulating maxiK channels. Hard ticks imbibe large volumes of blood (approximately two thirds of their body volume) and expand enormously in the late stages of long-term blood feeding (named the rapid engorgement stage) [65,66]. Regulating host blood flow can enhance blood-feeding efficiency in the late stages of long-term blood feeding, which is beneficial for the rapid engorgement of hard ticks. Therefore, this function may play a profound role in the long-term blood feeding of hard ticks. Gene expression analysis reveals that the expression of genes in groups II and III is significantly higher in the late stages of long-term blood, which further suggests that these genes are functionally linked to long-term blood feeding.
Different blood-feeding strategies may result from different sizes of the gene family in soft and hard ticks Hard and soft ticks display different blood-feeding strategies: soft ticks typically feed for less than one hour, whereas hard ticks feed for prolonged periods of time, varying from a few days to over one week [7]. The reasons for this difference are not fully clarified [1,7,12]. Previous studies indicated that both hard and soft ticks possess a similar number of gene families, but they have different numbers of gene family members [12]. Multiple gene duplications occurred in hard ticks after whole genome duplication, resulting in a large number of gene family members. Compared with hard ticks, soft ticks possess fewer members in each family [12]. Therefore, different blood-feeding strategies may result from different sizes of the gene family in soft and hard ticks. In our study, there are a large number of Kunitz/BPTI genes, especially in the lineage-specific expanded genes (group II and III) in hard ticks. Group II and III proteins are functionally linked to long-term blood feeding. Furthermore, multi-domain proteins (three-, four-, fiveand seven-KU) are only present in hard ticks. In contrast, the few members of the Kunitz/BPTI family and the absence of groups II and III and multi-domain proteins in soft ticks may partly explain why soft ticks cannot feed on blood for longer periods.
The six genes with highly dynamic expression identified in our study may be candidate target genes for tick control Tick control is important because ticks are vectors of a variety of bacterial and protozoan diseases, and ticks have the ability to transmit pathogens to vertebrates [5,6]. The development of vaccines directed against tick proteins may reduce tick infestations and the transmission of tick-borne pathogens. However, the limiting step in tick vaccine development is the identification of tick protective antigens [67][68][69]. Here, we propose that proteins encoded by the six genes (Table 1) identified in our study may be candidate protective antigens. This proposal is based on the following findings. First, the expression of these six genes increases significantly at 6-12 hours or later after host attachment. Additionally, these genes belong to groups II and III, which are subject to positive selection. The patterns of expression and evolution strongly suggest an important role for these genes in the process of long-term blood feeding. Second, safety issues may arise when protein ortholog sequences are used for vaccination because they can potentially induce autoimmune responses that are damaging to the host [68,70]. Fortunately, these group II and III genes are lineage specific and only present in Ixodes ticks. Although this feature makes these genes safe for vaccine development, it limits the application to only Ixodes ticks. Third, the Salp10 (AAK97828) protein in group III had been previously identified as a salivary gland antigen that elicited antibodies in the host [71]. This provides direct experimental evidence to support our proposal. Taken together, these six genes may be promising target genes for tick control. Additional experimental studies are needed to assess the validity of these genes for vaccine development.

Conclusions
Our study reveals the great differences between soft and hard ticks in the Kunitz/BPTI family. Compared with hard ticks, soft ticks do not possess group II and III proteins and multi-domain proteins (three-, four-, five-and seven-KU). Many multi-domain Kunitz/BPTI proteins were created in hard ticks using the group I domain as a module after the split between hard and soft ticks. Groups II and III, which exhibit significantly higher expression during long-term blood feeding, are only present and expanded in the genus Ixodes. In Ixodes, positive selection drove the expansion of the Kunitz/BPTI family and the evolution of new functions in group II such as ion channel-modulating ability. The two groups may play a profound role in the long-term blood feeding of hard ticks by enhancing blood-feeding efficiency in the late stages of long-term blood feeding, which is beneficial for the rapid engorgement of hard ticks. Therefore, our results suggest that the differences in the Kunitz/BPTI family between soft and hard ticks may be linked to the evolution of long-term blood feeding in hard ticks. Finally, we propose that the six genes (Isc.218, Isc.190, Isc.255, Isc.196, Isc.180 and Isc.179) identified in our study may be candidate target genes for tick control.

Sequence retrieval and characteristics identification
The search strategy used here is described as follow (Additional File 1). First, BPTI (a typical member of the Kunitz/BPTI family, UniProtKB AC: P00974) was used as a query sequence to search for Kunitz/BPTI protein sequences from all ticks against the NR database using BLASTP at the NCBI website under the default parameters [72,73]. Second, all hit sequences were filtered for the existence of a Kunitz/BPTI domain, which was determined by searching against the Pfam database under the default parameters [74]. To confirm that Kunitz/BPTI proteins were completely excavated, these putative Kunitz/BPTI proteins identified in this round of BLASTP search were used as a query to perform new rounds of BLASTP until no new hits appeared. Finally, a total of three rounds of BLASTP searches were performed, and there were 291, 368, and 368 sequences retrieved in the first, second and third rounds of the BLASTP search, respectively. Additionally, we used PSI-BLAST to search for Kunitz/BPTI proteins. These two search strategies have similar performance (Additional File 16). All sequences identified as Kunitz/BPTI proteins were submitted to the SignalP 4.0 server [30] for signal peptide prediction. Gene structures of the Kunitz/ BPTI family were determined using the gene structure display server http://gsds.cbi.pku.edu.cn/index.php [75].

Phylogenetic analysis of the Kunitz/BPTI family
To facilitate analysis of the Kunitz/BPTI family in ticks, Kunitz/BPTI proteins were classified into the following categories based their domain architectures: singledomain Kunitz/BPTI proteins present only in Ixodes Scapularis (ixosc_1domain), single-domain Kunitz/BPTI proteins present in all ticks (all_1domain), two-domain Kunitz/BPTI proteins present in all ticks (all_2domain), three-domain Kunitz/BPTI proteins present in all ticks (all_3domain), four-domain Kunitz/BPTI proteins present in all ticks (all_4domain), five-domain Kunitz/BPTI proteins present in all ticks (all_5domain), seven-domain Kunitz/BPTI proteins present in all ticks (all_7domain), and the combination of single-domain and multidomain Kunitz/BPTI proteins present only in Ixodes scapularis (ixosc_mix). Proteins in each category were aligned using ClustalW 2.0 [76] with fine adjustment by hand with reference to the cysteine residue position. Because the Kunitz domains are subjected to indels, ClustalW may not have treated these indels correctly. Therefore, the program Prank [77,78] was used to infer the indels and confirm the alignments obtained from ClustalW. Based on the alignment above, amino acid substitution models of protein evolution were determined by ProtTest 2.4 [79]. Phylogeny inference was performed using the neighbor-joining (NJ), Bayesian inference (MB) and maximum likelihood (ML) methods. MEGA 4.0 [80], MrBayes 3.1.2 [81] and Tree Finder [82] were used for reconstructing the NJ, MB and ML trees, respectively. All sets of parameters used for sequence alignment, model testing and phylogenetic analysis were listed (Additional File 17).

Evolutionary analysis of the Kunitz/BPTI family
Three groups (groups I, II and III) from single-domain Kunitz/BPTI proteins in Ixodes scapularis were tested to determine whether they were subjected to selection. Tests of selection (nonsynonymous-to-synonymous rate ratio, ω = dN/dS) were accomplished by using codonbased substitution models implemented in the CODEML program of PAML version 4.4c [55]. Codon alignments of nucleotide sequences (Additional File 13) for three groups were constructed from the PAL2NAL server [83] based corresponding protein sequence alignments. The tree topology used for the three groups was based on the NJ trees in Additional File 14. The presence of a positively selected rate class was detected using the likelihood ratio tests (LRTs, i.e., by comparing the likelihood of a neutral model with that of a selection model). For detailed assumptions and descriptions of each model see papers [55,84]. The Model 0 (M0) assumes one ω for all sites. Three pairs of models make three LRTs by comparing M0 (one ratio) against M3 (discrete), M1a (nearly neutral model) against M2a (positive selection model) and M7 (βdistribution model) against M8 (β& ω model). To test for significance, the LRT statistic 2ΔL (twice the log likelihood difference) was compared against the chi-square distribution (with df = 2 with critical values of 9.21 for M1a vs. M2a and M7 vs. M8, but with df = 4 with critical values 13.28 for M0 vs. M3) at the 1% significance level. The Bayes Empirical Bayes (BEB) approach was used to calculate the posterior probability that each site was from a particular site class, and sites with high posterior probabilities coming from the class with ω > 1 (with p > 0.95) were inferred to be under positive selection. All analyses were run twice using different initial ω values to ensure convergence. Substitution saturation could bias the estimation of the dN/dS ratios. In order to assess substitution saturation of three groups (group I, II and III), two methods (Steel's method [56] and Xia's method [57]) implemented in the software DAMBE [58] were used.
Protein modeling, structural alignment and locating positively selected sites Nine proteins (XP_002406145 from group I; AAM93606, AAM93607, AAM93608, AAM93610 and AAM93612 from group II; AAM93632, AAM93635 and AAK97828 from group III) were modeled using the I-TASSER server [62,85]. All model structures predicted here had been deposited in Protein Model DataBase [86] under ID numbers PM0077184-PM0077186 and PM0077307-PM0077312. Structural alignment was performed using the MISTRAL online server [87]. PyMOL