Identification and localization of the β-keratin multigene family
The β-keratin nucleotide sequences, amino acid sequences, and unique features associated with the β-keratin genes were obtained from NCBI and published sources . All GI numbers or references (without GI numbers) for β-keratins are listed in Table 3. The NCBI Basic Local Alignment Search Tool (BLAST) was used to search the 2.1 build of G. gallus and the 1.1 build of T. guttata . The genomic data was downloaded via the NCBI ftp site. We used Artemis 8.1  to view the β-keratin genes graphically.
To identify the genomic region in G. gallus that contained the claw, feather, and feather-like β-keratins described by Presland et al  we used very strict E-values and BLAST scores. This strict use of identity was also applied to all the unique features associated with the β-keratins, including the scale β-keratins and the β-keratin from cultured keratinocytes. Table 3 identifies the highest and lowest E-value and BLAST score used for each feature of the genomic region identified by Presland et al  for G. gallus  and for the β-keratin in jun-transformed cells. The sequences used to search T. guttata were identical to those used for G. gallus (Table 3).
In addition to the cluster of β-keratins identified by Presland et al , preliminary analysis of the genomes of both G. gallus and T. guttata revealed additional genomic loci (including chromosome unknown of both the chicken and zebra finch) containing feather or feather-like β-keratins. To identify and classify these genes fully, more lenient parameters were used: an E-value cut-off of 1e-10, reasonable stop codons and start codons, no in-frame stop codons, no frame shift mutations, and no regions of unknown genomic sequence (see Table 2).
All sequences in this study were obtained from the genomic sequences of G. gallus and T. guttata and use a simple annotation pattern. Since all data presented in this paper is from the genomic sequences of G. gallus and T. guttata, the numbering of the β-keratins will follow a 5' to 3' pattern. This annotation also includes the species (abbreviated as GGA or TGU), chromosome number and β-keratin subfamily (feather = FK, feather-like = FL, claw = Claw, or scale = Scale and β-keratin from cultured keratinocytes = keratinocyte). For example, the claw sequence which is found at the 5' end of the cluster on the positive strand, located on microchromosome 25 of G. gallus is annotated as GGA25_claw1 (See Additional File 1 Table S1).
Expressed Sequence Tag Analysis
Expressed sequence tag (EST) databases for both avian species were downloaded via NCBI and contained 599,999 and 91,801 sequences for chicken and zebra finch, respectably. The β-keratin databases for chicken and zebra finch were used as queries for blastn searches of the EST databases. An E-value cutoff of 1e-160 was used for G. gallus β-keratin sequences and an E-value cutoff of 1e-150 was used for the T. guttata β-keratin sequences. The GI number for the EST, the range of the nucleotides corresponding to the β-keratin sequence, the range of the 5' and 3' EST base pairs, and the EST tissue collection information is listed in Additional File 3, Table S3.
Alignments were accomplished using the program CLUSTAL W Multiple Sequence Alignment Program  and PAL2NAL . All default parameters were used in PAL2NAL and CLUSTAL W with the option SLOW/ACCURATE selected for CLUSTAL W alignments. Visual inspection confirmed an adequate alignment. Phylogenetic and molecular evolutionary analyses were conducted using MEGA version 4 .
Tree reconstruction was done using a total of 222 taxa, which included all probable feather β-keratins (located on GGA1, GGA2, TGU2, GGA5, GGA27, TGU27, GGA_Un and TGU_Un) and all probable β-keratins on GGA6 and chromosome 25 of both avian species (see Table 2). As an outgroup for tree reconstruction, three β-keratin nucleotide coding sequences were used from Crocodylus niloticus (Nile crocodile) and were obtained via NCBI and have the following Genbank numbers: 215541571, 215541573 and 187942180. The Modeltest program, using the Akaike Information Criterion (AIC), selected the generalized time reversible evolutionary model with gamma distributed rate heterogeneity and a proportion of invariant sites as the best fit model for these taxa . Using this suggested evolutionary model, tree reconstruction was accomplished using both distance based (Neighbor-Joining) and character based (Maximum Likelihood) methods.
MEGA version 4  was used for Neighbor-Joining tree reconstruction, which was accomplished with 1000 bootstrap replicates, pairwise deletion, all codon positions selected, the LogDet evolutionary model, substitutions including transitions and transversions, and a heterogeneous pattern among lineages [58, 59]. The nile crocodile sequences was chosen as the outgroup. The resulting evolutionary tree was used for figure construction (Additional file 5).
The RA×ML 7.0.3 edition was used for Maximum Likelihood analyses. 1000 bootstrap replicates was accomplished using the GTR+I+G model (GTRGAMMAI) with C. niloticus chosen as the outgroup. The first run was done using the -f i option, which performs a really thorough standard bootstrap procedure (1000 bootstrap replicates). A second run was done with -f d option, which performs a fast rapid hill-climbing algorithm. Bootstrap values from the first run were added to the resulting tree of the second run (Additional file 6) .
The large scale duplication seen among the feather β-keratin sequences were analyzed to determine amino acid sites under varying selective pressures using the PAML 4.0 package, in which six models of evolution were used . Models tested were: M0 (one ratio), M1a (nearly neutral), M2a (positive selection), M3 (discrete), M7 (beta) and M8 (beta + ω). Three models allow for the possibility of positive selection, M2a, M3, and M8, which is the d
ratio of ω being greater than 1. To test the fit of these models to the data and therefore the occurrence of a false positive, likelihood ratio tests (LRTs) were used. LRTs were only used when a model (M2a, M3, or M8) suggested a high d
ratio for amino acid sites. These three models were tested against a null model, using four separate tests (M0 vs. M2a; M0 vs. M3; M1a vs. M2a; and M7 vs. M8), by comparing (2ln Δl) against the χ2-distribution, with the degrees of freedom equal to the number of parameters between models. Each major locus that contained more than two feather β-keratins were analyzed separately (TGU2, GGA2, GGA25, GGA27 and TGU27). This allowed for the possibility that each chromosome/locus of each species (GGA and TGU) has evolved independent or temporally separate and would then be under different selective pressures. The PAL2NAL alignment tool was used to convert a nucleotide alignment into a codon alignment for input into the PAML program in association with the Neighbor-Joining (N-J) tree output from CLUSTAL W [54, 56, 62].
To search for duplicated segments shared between pairs of genes, by mechanisms such as gene conversion and/or unequal crossing over, the program GENCONV 1.81a was utilized and applied to all probable genes in the final data set (See Identification and localization of the β-keratin multigene family above). Global Bonferroni corrected P-values were calculated against a simulated distribution of 10,000 iterations to determine the statistical significance of a shared observed fragment from a pair of sequences. To be considered a significant result the p-value must be lower than 0.01 [31, 63].