Evolution of the avian β-defensin and cathelicidin genes

Background β-defensins and cathelicidins are two families of cationic antimicrobial peptides (AMPs) with a broad range of antimicrobial activities that are key components of the innate immune system. Due to their important roles in host defense against rapidly evolving pathogens, the two gene families provide an ideal system for studying adaptive gene evolution. In this study we performed phylogenetic and selection analyses on β-defensins and cathelicidins from 53 avian species representing 32 orders to examine the evolutionary dynamics of these peptides in birds. Results and conclusions Avian β-defensins are found in a gene cluster consisting of 13 subfamiles. Nine of these are conserved as one to one orthologs in all birds, while the others (AvBD1, AvBD3, AvBD7 and AvBD14) are more subject to gene duplication or pseudogenisation events in specific avian lineages. Avian cathelicidins are found in a gene cluster consisting of three subfamilies with species-specific duplications and gene loss. Evidence suggested that the propiece and mature peptide domains of avian cathelicidins are possibly co-evolving in such a way that the cationicity of the mature peptide is partially neutralised by the negative charge of the propiece prior to peptide secretion (further evidence obtained by repeating the analyses on primate cathelicidins). Negative selection (overall mean dN < dS) was detected in most of the gene domains examined, conserving certain amino acid residues that may be functionally crucial for the avian β-defensins and cathelicidins, while episodic positive selection was also involved in driving the diversification of specific codon sites of certain AMPs in avian evolutionary history. These findings have greatly improved our understanding of the molecular evolution of avian AMPs and will be useful to understand their role in the avian innate immune response. Additionally, the large dataset of β-defensin and cathelicidin peptides may also provide a valuable resource for translational research and development of novel antimicrobial agents in the future. Electronic supplementary material The online version of this article (doi:10.1186/s12862-015-0465-3) contains supplementary material, which is available to authorized users.


Background
Defensins and cathelicidins are two families of cationic small peptides that have broad-spectrum antimicrobial activities against a wide range of bacterial, fungal or viral pathogens. These peptides are produced in a large variety of invertebrate and vertebrate organisms, representing an ancient form of host defense against microbes. In addition to their antimicrobial function, defensins and cathelicidins have also been found to exhibit diverse immunomodulatory activities, rendering them important components of both innate and adaptive immune systems [1][2][3]. With a key role in host defense against rapidly evolving pathogens, defensin and cathelicidin gene families provide an ideal system for studying adaptive molecular evolution [4]. Previous studies in mammalian lineages have revealed positive selection driving rapid divergence of these host defense peptides [4,5]. Recent whole-genome sequence analysis of 48 bird species [6] has enabled us to perform a comprehensive comparative analysis on the avian lineages of defensins and cathelicidins, which will not only greatly improve our understanding of the evolutionary diversification of these ancient peptides over the past 100 million years through the avian radiation, but also provide a valuable resource for developing novel antibiotics to treat microbial infections in birds and other vertebrates [7,8].
Cathelicidins are characterised by having a conserved cathelin domain and a highly variable mature peptide domain. Four avian cathelicidin genes have been identified in the chicken, designated CATHL1, CATHL2, CATHL3 and CATHB1 [2,[29][30][31]. These genes share similar structures with mammalian cathelicidins, each comprising of four exons, encoding a prepropiece consisting of a signal peptide, the cathelin-like domain (propiece) and the mature peptide. One unusual feature found in the propiece of chicken CATHB1, is the presence of nine octamer repeats (PGLDGSXS) N-terminal to the cathelin domain [30], which is not seen in other cathelicidin genes. Mature avian cathelicidins are activated before secretion with the propiece cleaved off by serine proteases [32] and form an alpha-helix predominant structure with a kink induced by a glycine or proline close to the centre [33][34][35]. All four chicken cathelicidins show high levels of expression in immune organs and gastrointestinal, respiratory and urogenital tracts, with CATHL1, 2 and 3 most highly expressed in the bone marrow and lung, and CATHB1 in the bursa of Fabricius [36]. In addition to broad antimicrobial activity against bacteria and fungi, avian cathelicidins also play a range of immunoregulatory roles, such as blocking lipopolysaccharide-induced cytokine expression [37] or inducing specific chemokine production [31].
In this paper, we studied the evolution of β-defensins and cathelicidins in 53 avian species (Table 1) and discussed the following issues: conservation and lineage-specific duplication/deletion of genes, conserved genomic organisation of β-defensin and cathelicidin gene clusters, coevolution between pro-and mature peptides in cathelicidins, and amino acid sites and evolutionary branches under selection.

Results and discussion
Avian β-defensins and cathelicidins A total of 758 genes, including 714 β-defensins and 44 cathelicidins, were annotated in 53 avian genomes (scaffold number and gene coordinates provided in Additional file 1; nucleotide sequences in Additional file 2). The selected species represent 32 different avian orders spanning 52 genera (Table 1), last sharing a common ancestor around 114 million years ago ( Fig. 1) [38].
Characteristics of AvBD gene subfamilies are summarised in Table 2 (alignments shown in Additional file 3). Ten genes (AvBD2, 4, 5, 7, 8, 9, 10, 11, 12 and 13) are conserved as one-to-one orthologues in most surveyed species, suggesting a high level of conservation of these genes for over 100 million years (Fig. 3). AvBD1 and AvBD3 are subject to lineage-specific expansion, with up to three AvBD1 paralogues found in the killdeer (Charadrius vociferus), saker falcon (Falco cherrug), medium ground finch (Geospiza fortis) and zebra finch; extensive AvBD3 duplications occurred in Passeriformes with up to 14 paralogues found in the white-throated sparrow (Zonotrichia albicollis). AvBD6 is a duplication of AvBD7, which has arisen within Galliformes after the divergence of family Odontophoridae and before Phasianidae, as it is present in chicken and turkey but absent in the northern bobwhite (Colinus virginianus). AvBD7 has degenerated into a pseudogene in the sunbittern (Eurpyga helias) and is missing in all three psittacines. AvBD14 was found in 23 species and has been degraded into a pseudogene in the orders Falconiformes, Passeriformes and Psittaciformes, and species Columbia livia, Gavia stellata, and Calypte anna.
The spacing between cysteines within the defensin motif is conserved within subfamilies with the overall consensus being C (4-7) C (3-6) C (7-10) C (5-6) CC, numbers representing the number of residues between cysteines ( Table 2). The majority of subfamilies have 9-11 residues preceding the first cysteine of the defensin motif. Exceptions include the group of passeriforme AvBD3s immediately preceding AvBD5 which have 2   AvBD14 fragmented a indicates an intact region between AvBD2 and AvBD5 in the genomic assembly, thus the actual number and position of AvBD1(s) and AvBD3(s) can be determined residues prior to the first cysteine in addition to having only 2 exons. AvBD1 and AVBD3 have the same spacing of cysteines but differ in the number of residues prior to the first cysteine. Cathelicidins were only found in 21 surveyed species due to low assembly quality of the genomic regions. Similar to the 10 relatively conserved AvBDs, CATHL2, CATHL3 and CATHB1 have been conserved across a variety of avian orders (Fig. 4). Evidence suggested that CATHL3 has been reduced to a pseudogene in Falconiformes and lost in Passeriformes, yet duplicated to give rise to CATHL1 in Galliformes. Annotation results also suggested that CATHL2 and CATHL3 may have been lost in Sphenisciformes and Ciconiiformes, represented by the emperor penguin (Aptenodytes forsteri) and crested ibis (Nipponia nippon), respectively (Fig. 4), though this could be an artefact caused by high degree of assembly gaps in the genomic region. The octamer repeats feature found in chicken CATHB1 appeared to be unique to Galliformes (or Phasianidae), as it has only been seen in the chicken and turkey.

AvBD3
Up to 14 paralogs in a single species (5-7) C (6) C (4) C (9) C (5) CC 3 (3rd exon: 12-14 residues in Galliformes) AvBD5 one to one ortholog (11) C (6) C (4) C (9) C (5) CC 3 AvBD4 one to one ortholog (9-14) C (6) C (4) C (9) C (5) CC 2-3 AvBD14 One to one ortholog (6) C (6) C (4) C (9) C (6) CC 2 a Genes are listed in the order from the flanking gene cathepsin B to the flanking gene TRAM2 b The number of residues between cysteines are noted in parenthesis c Third exons only code for 2-5 residues unless otherwise noted up to 11 different scaffolds/contigs, thus there may be unrevealed lineage-specific gene rearrangements. Similar to AvBDs, avian cathelicidins also form a conserved gene cluster, which is flanked by kelch-like family member 18 (KLHL18) and transforming growth factor beta regulator 4 (TBRG4) genes (Fig. 4). The majority of species share a conserved gene order of KLHL18, CATHL2, CATHL3, CATHB1 and TBRG4. Exceptions include an inversion of the region containing CATHL3 and CATHL2 in Galliformes, and CATHL3 and CATHB1 being arranged in the reverse order in the common cuckoo (Cuculus canorus).
Such clustering of homologous genes in a tightly linked fashion is rather common for immune genes. Other well-known examples include the major histocompatibility complex (MHC), immunoglobulins (Ig), Fc receptors (FcR) and killer-cell Ig-like receptors (KIR). These immune gene families are believed to be regularly refreshed via in cis duplication, resulting in related genes lying next to each other in linked array in the genome [39]. It has been suggested that immune genes clustering together may be biologically significant in that it may facilitate the coordinated expression of functionally related loci, and therefore has been selectively maintained [39,40].

Net charge of mature peptides of β-defensins and cathelicidins
Disruption of microbial membranes is a major mechanism underlying antimicrobial activity of defensins and cathelicidins [41] and it has been demonstrated that the net charge of the mature peptide directly influences its antimicrobial potency [42]. The net charge of the mature peptide ranges from −2.9 to +10.0 in avian β-defensins and +4.0 to +12.0 in cathelicidins (Fig. 5). On average,  cathelicidins have higher charges than the defensins. Six defensin subfamilies (AvBD5, 8, 9, 10, 11 and 12) have an average net charge lower than +4.0, with AvBD12 showing the lowest average charge (+0.1). AvBD3 and all cathelicidin subfamilies have an average charge higher than +6.0. The low net charge of certain AvBDs indicates that they may have lower activities in terms of direct killing microbes. By contrast, the highly cationic peptides, such as AvBD3 in the emperor penguin, Adélie penguin (Pygoscelis adeliae) and yellow-throated sandgrouse (Pterocles gutturalis) with a net charge of +10.0, and CATHB1 in the kea (Nestor notabilis) with a net charge of +12.0, may provide valuable templates for developing new antimicrobial agents. The mature peptide sequences of all avian β-defensins and cathelicidins are provided in Additional file 4 and net charges shown in Additional file 5.

Avian cathelicidins and the "charge balance hypothesis"
In mammalian α-defensins, Michaelson et al. [43] proposed that the anionic propiece plays a role in preventing autocytotoxicity by neutralising the cationicity of the mature peptide. The authors demonstrated a linear relationship between the net negative charge of the propiece and the positive charge of the mature peptide of seven α-defensins (two human and five rabbit genes). Hughes and Yeager [4,44] provided further supportive evidence of this relationship (r = −0.742; p < 0.001) using 28 α-defensins from five mammalian species (mouse, rat, guinea pig, rabbit and human). This "charge balance hypothesis" is unlikely to apply to the β-defensins due to the short length of the propiece (0-7 amino acids in AvBDs). However, we explored the hypothesis in avian cathelicidins and revealed a similar, though weaker (r = −0.38; p = 0.03) association between the electrostatic charges of the propiece and mature peptides (Fig. 6a). All annotated avian cathelicidins with an intact coding sequence were used in the analysis (n = 24). While the mature form of all cathelicidins is highly cationic (8.51 ± 0.61), the propiece has an anionic character (−5.24 ± 1.13). The inactive form of CATHL2 before secretion has the lowest mean net charge (0.81 ± 0.62) among avian cathelicidins (overall mean = 3.27 ± 1.06), suggesting a better neutralising effect of the propiece in this subfamily. It should be noted that changes in cytoplasmic pH also affect the electrostatic charges of peptides, with higher pH resulting in lower charges (e.g. at pH 7.5, overall mean charge of presecretory avian cathelicidins = 2.79 ± 1.05). Similar level of correlation (r = −0.38; p = 0.02) was also found in 28 primate cathelicidin genes ( Fig. 6a; accession numbers of primate genes listed in Additional file 6).
Further analysis was performed on reconstructed ancestral sequences to infer changes that may have occurred in avian and primate cathelicidins over evolutionary time (Fig. 6b, Additional file 7) [44]. A significant negative correlation was observed between electrostatic property changes in the propiece and mature peptide in both groups (aves: r = −0.35, p = 0.01; primate: r = −0.36, p = 0.01). When substitutions increasing peptide charge occurred in the mature cathelicidin, charge in the propiece tended to either decrease or remain unchanged, whereas when the mature peptide became less cationic, the propiece tended to become less anionic. This is highly similar to what Hughes and Yeager observed in mammalian defensins [44].
Moreover, evidence of intra-molecular amino acid residue co-evolution was detected between two pairs of sites with electrostatic properties in the propiece and mature peptide domains of CATHB1 (highlighted in Additional file 3). Residue 122L/G/E/K and 226G/K/R, and 160R/Q/ P and 222E/D/G/N showed 96.9 and 80.2 % probability of having been co-evolving, respectively (residue positions are based on chicken CATHB1).
These observations indicate that, similar to mammalian α-defensins, the propiece and mature peptide of avian and primate cathelicidins may have co-evolved in such a way that amino acid substitutions in both regions are selected and accumulated to balance the charge. Further experimental evidence will be needed to validate this hypothesis and elucidate the role of cathelicidin propiece in preventing autocytotoxicity.

Amino acid sites under selection
Negative selection was detected in a large proportion of amino acid sites in the examined avian β-defensin (11.4-40.9 %) and cathelicidin (9.7-24.7 %) genes (Fig. 7a, Additional file 3). In AvBD1, 2, 3, 8, 9, 11, 12 and 13, the overall mean rate of nonsynonymous nucleotide substitutions (d N ) was significantly (at 0.05 nominal level) lower than that of synonymous substitutions (d S ) in both mature peptide and the signal domains, suggesting an overall effect of negative selection on these genes (Fig. 7b,  Table 3). In the other five AvBDs significant negative selection was detected only in the mature peptide, whereas the opposite pattern was observed in the avian cathelicidin genes with only the signal and propiece domain showing significantly higher d S than d N .
Despite the strong background of negative selection, evidence indicating specific amino acid sites subject to episodic diversifying selection was found in most studied avian genes (Fig. 8), though the effect of such selection was very weak in AvBD9, AvBD11 and AvBD13 and the three cathelicidins, with less than 4 % of total sites inferred to be positively selected. To reduce chances of false positive detections, only codon sites that were detected by multiple selection test methods were considered significant (see Methods section) [45,46]. In the examined avian AMPs, most positively selected sites were found inside the mature peptide domain, except for AvBD5, AvBD13 and AvBD14 and all cathelicidins, in which more were found in the signal and propiece regions (Figs. 7a and 8). Within mature defensins, diversifying selection appears to mainly (68.3 % cases) affect those residues that are close to (within two residues) the conserved cysteines. For comparison, same analyses were performed on two primate β-defensin subfamilies, DEFB1 (n = 24) and DEFB4 (n = 10), and the primate cathelicidin CAMP (n = 28) (gene accession numbers provided in Additional file 6). Consistent with previous reports, our results suggested that positive selection has involved in driving the evolution of primate DEFB4 (coding for βdefensin 2) [47], whereas DEFB1 is highly conserved with no evidence of diversifying selection in primate lineages [48]. Taken together both positive and negative selection, AvBDs appear to have evolved under higher selective pressures and restraint, while the two examined mammalian β-defensins have evolved more neutrally in primates (Fig. 7a). Contrary to what was observed in avian cathelicidins, in primate CAMP a majority (58.3 %) of positively selected residues are located inside the mature peptide, resulting in a significantly higher overall d N than d S in the gene domain (Fig. 7).
Due to the important role of electrostatic charge on antimicrobial potency of the mature peptide, positively charged amino acid residues are expected to have evolved under positive selection. However, in the examined AvBDs and cathelicidins, a number of such sites showed strong negative selection or neutral results. One possible explanation is that these cationic residues may have been selectively accumulated prior to the divergence of the studied bird lineages (~114 million years ago), and then have been conserved by negative selection through the long evolutionary history due to their significant role on peptide function. Further studies to include more distant taxa (such as reptiles) will help elucidate the evolutionary dynamics of the peptides.

Lineages subject to episodic diversifying selection
Evolutionary branches that were indicated to have experienced diversifying selection episodes at each gene are highlighted in Additional file 8. On average, 20.9 and 26.6 % of nodes over the course of evolution of AvBDs and cathelicidins, respectively, were detected with episodes of positive selection, with the highest percentage observed in AvBD7 (41.4 %) and lowest in AvBD13 (7.1 %). The later stages of evolution within avian lineages seem to have involved relatively stronger diversifying selection as compared to the more ancient branches, which is consistent with the observation that the examined gene families are generally well conserved across all avian orders and families. This result contradicts to what was previously detected in mammalian, particularly primate β-defensins, which involved more positive selection episodes in more ancient branches due to duplication and diversification of β-defensins in the early stages of mammalian evolution [5]. Several evolutionary nodes were detected to have undergone diversifying selection at multiple genes (Additional file 8). For example, AvBD5, 7, 10, 12 and 14 of Galliformes were inferred to be under diversifying selection between 84.5 and 37.9 million years before present (evolutionary time estimated based on branch length in Fig. 1), prior to the divergence between family Phasianidae (chicken and turkey) and Odontophoridae (northern bobwhite Colinus virginianus). Within the order Psittaciformes, AvBD3, 4 and 8 were estimated to have been subject to episodic positive selection between 79.8 and 55.3 million years ago before the divergence of the Puerto

Conclusion
In this study, we investigated the evolution of β-defensins and cathelicidins in 53 bird species. Both gene families form a generally conserved gene cluster in avian genomes with certain genes being more prone to duplication (AvBD1, AvBD3 and AvBD7) or pseudogenisation (AvBD14) events. Intense negative selection was detected in a majority of examined gene domains, likely accounting for the conservation of certain amino acid residues that are essential for the functioning of β-defensins and cathelicidins in birds. Evidence indicated that episodic positive selection also played a role in driving the diversification of specific residues of certain antimicrobial peptides in avian evolutionary history, contributing to high variability of gene sequences and electrostatic property of the peptides. Our results also revealed that selection may have acted on cathelicidins to maintain a balanced charge between the anionic propiece and cationic mature peptide over evolutionary time. This work not only has greatly improved our understanding of the molecular evolution of these host defense peptides, but also provides a valuable resource for potential translational research and development of novel antimicrobial agents.

Database search and gene nomenclature
Fifty-three bird genomes were searched for β-defensin and cathelicidin genes (GenBank Assembly IDs provided in Table 1). For each genome, four steps were taken to identify genes of interest: 1) An initial search was performed on a predicted protein/CDS database with BLAST programs using chicken genes as query sequences. Hits with E-value <0.1 were extracted from the database, aligned to chicken sequences with ClustalW [49], and manually examined to exclude false positives. 2) Then for each gene, a profile hidden Markov model (HMM) [50] was built from a peptide sequence alignment that includes all orthologues found in the previous step. The profile HMMs were searched against the predicted protein databases with HMMER3.1 programs [51] on a GALAXY platform. Hits with both E-values (full sequence and best 1 domain) <1 were extracted, aligned with previously found sequences, and manually checked to confirm real homologues. 3) Sequences identified in the previous steps were then used to BLAST search the whole genome to find any genes, gene fragments, or pseudogenes that are not included in the protein/CDS database. 4) All genomic scaffolds and contigs containing β-defensin or cathelicidin genes were extracted to study the genomic organisation of these genes. In addition, sequences containing CTSB, TRAM2, KLHL18 or TBRG4, the flanking genes of the β-defensin or cathelicidin gene clusters, were also extracted. Scaffolds and contigs were manually curated using Artemis [52].
Annotated sequences were named by tagging the gene name with a five-letter abbreviation as a suffix that distinguishes the species. For example, AvBD1_ACACH refers to gene AvBD1 of the rifleman (Acanthisitta chloris). Duplications of AvBD1 and AvBD3 were numbered 1.n and 3.n from AvBD2. Some AvBD1 and AvBD3 duplicates were found on isolated scaffolds and numbering of these defensins may not represent the actual position in the intact cluster. Previously used identification references for zebra finch duplicates [9] are included in the identification.

Mature peptide prediction and net charge estimation
Sequence features, such as signal peptide, propiece and mature peptide, within annotated genes were speculated based on functional domains in chicken defensins [10] and cathelicidins [30, 31,33,34]. The net charge of mature peptide was estimated as pK a j −pH with N i and pKa i being the number and pKa values of histidine (H), lysine (K) and arginine (R) residues and the N-terminus, and N j and pKa j the number and pKa of aspartic acid (D), glutamic acid (E), cysteine (C) and tyrosine (Y) residues and the C-terminus [53]. Lehninger's set of pKa values were used [54] and intramolecular disulfide bond formation was taken into account in calculation.

Evolutionary analyses
Overall phylogenetic analyses between all avian βdefensins and cathelicidins (Fig. 2) were conducted in MEGA5 with a Maximum Likelihood method [55,56] using four discrete categories for the Gamma distribution to model evolutionary rate differences among sites and 100 bootstrap replicates to infer the level of confidence on the phylogeny (support values lower than 40 % are not shown in the consensus tree) [57].
A range of evolutionary analyses were performed on each β-defensin and cathelicidin gene via the Datamonkey webserver [58], including: 1) Negative selection sites were detected using Fixed Effect Likelihood (FEL) [59] and Fast Unconstrained Bayesian Approximation for Inferring Selection (FUBAR) [60]. 2) Individual sites under positive selection were detected using three test methods, including Mixed Effects Model of Evolution (MEME) [61], FEL, and FUBAR. Codon sites found to be significant for positive or negative selection by more than two methods (MEME p < 0.05, FEL p < 0.1, and FUBAR posterior probability >0.9) were included in the analyses [45,46]. 3) Individual branches with episodic diversifying selection were inferred by combining results from two analyses-MEME (emprical bayes factor >20) and branch-site REL (p < 0.05) [62]. 4) Intramolecular co-evolution of amino acid sites in cathelicidins were detected using the Spidermonkey/Bayesian Graphical Model [63]. Only sites that are involved in electrostatic properties and have more than three branches with nonsynonymous substitutions were included in the analysis for covariation. Assessment of overall mean rates of nonsynonymous (d N ) and synonymous (d S ) nucleotide substitutions in the signal and propiece region and the mature peptide domain, and the significance test of overall selection were calculated in MEGA5 [55] using the Kumar model [64] and 1000 bootstrap replicates to estimate standard errors.
Ancestral cathelicidin sequences were inferred using the Maximum Likelihood method [64] under the Whelan And Goldman model [65] in MEGA5. Rates among sites were treated as a Gamma distribution using five Gamma categories.

Ethics
No human, human data, or animal was used in this study.

Availability of supporting data
GenBank assembly IDs of all 53 examined avian genomes are provided in Table 1. Scaffold and coordinates information of annotated genes are detailed in Additional file 1. Nucleotide sequences of all annotated genes are available in Additional file 2. Full-length amino acid sequence alignments and putative mature peptide sequence of each gene are provided in Additional files 3 and 4, respectively. Gen-Bank accession numbers of primate genes used in analyses are listed in Additional file 6.