Intragenic homogenization and multiple copies of prey-wrapping silk genes in Argiope garden spiders

Background: Spider silks are spectacular examples of phenotypic diversity arising from adaptive molecular evolution. An individual spider can produce an array of specialized silks, with the majority of constituent silk proteins encoded by members of the spidroin gene family. Spidroins are dominated by tandem repeats flanked by short, non-repetitive N- and C-terminal coding regions. The remarkable mechanical properties of spider silks have been largely attributed to the repeat sequences. However, the molecular evolutionary processes acting on spidroin terminal and repetitive regions remain unclear due to a paucity of complete gene sequences and sampling of genetic variation among individuals. To better understand spider silk evolution, we characterize a complete aciniform spidroin gene from an Argiope orb-weaving spider and survey aciniform gene fragments from congeneric individuals.
Results: We present the complete aciniform spidroin (AcSp1) gene from the silver garden spider Argiope argentata (Aar_AcSp1), and document multiple AcSp1 loci in individual genomes of A. argentata and the congeneric A. trifasciata and A. aurantia. We find that Aar_AcSp1 repeats have >98% pairwise nucleotide identity. By comparing AcSp1 repeat amino acid sequences between Argiope species and with other genera, we identify regions of conservation over vast amounts of evolutionary time. Through a PCR survey of individual A. argentata, A. trifasciata, and A. aurantia genomes, we ascertain that AcSp1 repeats show limited variation between species whereas terminal regions are more divergent. We also find that average dN/dS across codons in the N-terminal, repetitive, and C-terminal encoding regions indicates purifying selection that is strongest in the N-terminal region.
Conclusions: Using the complete A. argentata AcSp1 gene and spidroin genetic variation between individuals, this study clarifies some of the molecular evolutionary processes underlying the spectacular mechanical attributes of aciniform silk. It is likely that intragenic concerted evolution and functional constraints on A. argentata AcSp1 repeats result in extreme repeat homogeneity. The maintenance of multiple AcSp1 encoding loci in Argiope genomes supports the hypothesis that Argiope spiders require rapid and efficient protein production to support their prolific use of aciniform silk for prey-wrapping and web-decorating. In addition, multiple gene copies may represent the early stages of spidroin diversification.


Background
Spider silks are ideal for studying the molecular evolutionary processes that create and maintain adaptive characteristics. An individual spider can produce and use different silk types singly or in combination for specific tasks, with each silk type having mechanical properties well-suited to its function. For example, aciniform silk is used in prey immobilization and egg sac construction [1,2]. The mechanical properties of aciniform silk include impressive extensibility and toughness [1], making it excellent for swathing struggling prey because it is easy to stretch but difficult to break. Orb-weaving garden spiders from the genus Argiope are renowned for their use of aciniform silk. Argiope employ many layers of aciniform silk to completely immobilize and envelop their prey (e.g. [3,4]), and Argiope are also a model system for studying the purpose of aciniform-silk web decorations, known as stabilimenta, that have been implicated in predator avoidance, prey attraction, and web stability (for review see [5,6]).
Of the five fibrous silks spun by the silver garden orbweaver Argiope argentata, aciniform silk is the toughest and one of the most extensible [7]. However, little is known about the evolution of aciniform silk's physical attributes. Spider silk mechanical properties are related to the suite of proteins that compose each silk type. The majority of spider silk proteins, or spidroins (a contraction of "spider-fibroins" [8]), are encoded by members of a single gene family. Spidroins are typically very large (>200 kDa), and are dominated by a series of iterated repeats flanked by short amino (N)- and carboxy (C)-terminal regions [9,10]. The length, number, and amino acid (aa) composition of the iterated repeats are silk type-specific, whereas phylogenetic analyses have shown that aa residues in the N- and C-terminal regions are more conserved across spidroins [11,12]. Repeat aa sequence corresponds to secondary structures that are partly responsible for silk mechanical properties (e.g. [13][14][15]), and conservation of the N- and C-terminal regions [12] and their presence in spun silk fibers suggest an important role in spider silk biology [16][17][18].
The evolutionary maintenance of spidroin repeat sequences within a silk type and the divergence of those repeat sequences between silk types are central to spider silk function and diversity. Within a particular spidroin, repeat units tend to be highly similar, or homogeneous, in amino acid and nucleotide sequence. The gene encoding aciniform spidroin (AcSp1) has repeats that are relatively complex among spidroin family members; despite this complexity, AcSp1 repeats are also spectacularly homogenized [1,19]. A recent analysis of a complete AcSp1 from the western black widow Latrodectus hesperus showed that its repetitive region, like those of other spidroins, is dominated by the amino acids glycine (G), alanine (A), and serine (S) [19]. However, L. hesperus AcSp1 repeats have few or none of the short glycine and alanine-rich subunits, such as GGX, poly-GA, and poly-A, that can be the bulk of other spidroin repeats [9]. Nevertheless, L. hesperus AcSp1 repeats are remarkably homogenized (>99% identity at the nucleotide level [19]). This is consistent with results from a partial length AcSp1 cDNA from the banded garden orb-weaver Argiope trifasciata, which has 14 repeats that are each 600 bp and share 99.9% identity at the nucleotide level [1].
The high level of AcSp1 repeat homogeneity is frequently attributed to gene conversion or unequal crossing over resulting in intragenic concerted evolution (e.g. [1,19,20]). Concerted evolution usually refers to homogenization among gene family members, such as rDNA gene copies [21], but it can also occur within a gene [22,23]. Stabilizing selection alone would maintain protein sequence, resulting in a high level of repeat identity at the aa level. However, the extreme level of homogenization reported for AcSp1 repeats provides evidence for concerted evolution because it exists at both the protein and nucleotide levels [1,19].
In addition to concerted evolution, repeat homogeneity in AcSp1 may be maintained by functional constraints. Recent nuclear magnetic resonance (NMR) studies of AcSp1 repeats from both Nephila antipodiana and A. trifasciata delineate different domains in each repeat unit, one domain that is rich in alpha helices and one that is not [24,25]. Xu et al. [25] used NMR and dihedral angles from global likelihood estimate (DANGLE) analyses to predict the chemical shift indices of a 199 aa recombinant A. trifasciata AcSp1 repeat. The consensus secondary structure assignments specified that the last quarter of the protein was unstructured, but that the first three-quarters of the repeat contained six major helical regions. Protein structures such as these six alpha helices are considered the foundation for silk mechanical properties (e.g. [25,26]).
Assessing the extent to which a spidroin is homogenized within a single gene or among individuals is difficult because the repetitive region makes it exceptionally challenging to sequence complete spidroin genes. Indeed, partial length sequences that are biased toward the C-terminus greatly dominate the number of published spidroins [12]. Additionally, the evolutionary processes leading to spidroin divergence between species and silk types are often unclear due to a lack of knowledge about spidroin genetic variation among individuals.
Here, we address these issues by presenting a complete spidroin gene from an Argiope spider, the AcSp1 sequence of A. argentata (Aar_AcSp1), and by screening for AcSp1 variation among individual A. argentata, A. trifasciata, and A. aurantia spider genomes. Sequencing the full array of Aar_AcSp1 repeats enabled us to test hypotheses of concerted evolution and functional constraints. Based on previous spidroin research, Aar_AcSp1 repeats should be extremely homogeneous at the nucleotide and amino acid levels. In addition, amino acid sequences that are predicted to correspond to the structural motifs that contribute to the toughness and extensibility of aciniform silk should be more conserved between Argiope species relative to surrounding regions. Among the surveyed A. argentata individuals, we expected Aar_AcSp1 to be a single-copy gene, similar to L. hesperus AcSp1 [19]. Between species, previous research suggests that the spidroin repeats within each silk type are highly conserved, but that the terminal regions show more variation [12], and we hypothesized that Aar_AcSp1 would also follow this pattern.

Results and discussion
Argiope argentata AcSp1 complete sequence and phylogenetic placement
Despite obtaining 59 AcSp1 cDNA clones, including one that was >8 kb [1], a complete Argiope AcSp1 remained elusive until the present study. By screening a large-insert genomic DNA library, we sequenced and assembled 18,080 bp of A. argentata DNA including a complete open reading frame that is 13,440 bp long and predicted to encode a 4,479 aa A. argentata AcSp1 (Aar_AcSp1; Figure 1; GenBank KJ206620). No introns were detected. The putative protein has a predicted size of ~430 kDa, and the most abundant amino acids are serine (22.6%), alanine (14.4%) and glycine (13.3%). Aar_AcSp1 has three regions: a central repetitive region flanked by conserved N- and C-terminal regions. The repetitive region accounts for ~90% of the protein and is composed of 20 iterated repeats (Figure 1). The first 19 repeats are each 204 aa, and the last repeat is 186 aa due to truncation at the end (Figure 1). The length, amino acid composition, and organization of AcSp1 are all consistent with other spidroin family members [9].
Phylogenetic analyses of Aar_AcSp1 N- and C-terminal coding regions with those from other spidroins grouped Aar_AcSp1 with the Latrodectus (widow spider) AcSp1 sequences in a well-supported clade (bootstrap value = 98%; Figure 2). Latrodectus and Argiope are estimated to have diverged from each other ~175 million years ago (MYA) [27]. Despite this lengthy time period, the recovery of an AcSp1 clade was consistent with prior studies in which spidroin sequences nearly always grouped based on silk type (e.g. [9,10]). The sister group to the AcSp1 clade was TuSp1, tubuliform (egg-case) spidroin, suggesting that these paralogs have a relatively recent common ancestor [19]. Further potential evidence of their shared ancestry is that both of these silk types are used in egg-case construction and both have repeats that are relatively complex compared to other spidroins [28]. In our phylogenetic analysis, a large, weakly supported assemblage of spidroins is sister to the combined AcSp1 and TuSp1 clade. Given the low support, it is unclear which spidroins are most closely related to AcSp1 and TuSp1.
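The size figures quoted above can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis: the constant and function names are hypothetical, and it assumes that the reported 13,440 bp open reading frame includes the stop codon.

```python
# Values quoted in the text; the names are illustrative only.
ORF_BP = 13_440            # complete open reading frame
FULL_REPEATS = 19          # repeats of full length
FULL_REPEAT_AA = 204       # length of each full repeat
LAST_REPEAT_AA = 186       # truncated final repeat


def protein_length(orf_bp: int) -> int:
    """Residues encoded by an ORF, assuming its length includes the stop codon."""
    if orf_bp % 3:
        raise ValueError("ORF length is not a multiple of 3")
    return orf_bp // 3 - 1


protein_aa = protein_length(ORF_BP)
repetitive_aa = FULL_REPEATS * FULL_REPEAT_AA + LAST_REPEAT_AA
repetitive_fraction = repetitive_aa / protein_aa
full_repeat_bp = FULL_REPEAT_AA * 3

print(protein_aa, repetitive_aa, round(repetitive_fraction, 3), full_repeat_bp)
```

Under that assumption the ORF length, the 4,479 aa protein, the roughly 90% repetitive fraction, and the 612 bp repeat unit mentioned later are mutually consistent.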

Argiope argentata AcSp1 repeat homogeneity
As expected, Aar_AcSp1 repeats are complex and spectacularly homogenized. Although glycine, alanine, and serine account for ~50% of its repetitive region composition, Aar_AcSp1 has few of the glycine/alanine-rich motifs such as GGX, GPG, poly-GA, and poly-A that are dominant in the dragline major ampullate spidroins (MaSp1, MaSp2) from Argiope and other taxa [9]. At the nucleotide level, the average pairwise percent identity between Aar_AcSp1 repeats is an astonishing 98.7%. Complexity and extreme homogenization are also features of previously described AcSp1 sequences [1,19].
The extreme nucleotide identity of Aar_AcSp1 is consistent with concerted evolution, and cannot be easily explained by codon usage bias. For example, Aar_AcSp1 codon use is strongly influenced by amino acid position within a repeat. In our repeat alignment (Additional file 1: Figure S1), the neighboring alanine codons at nucleotide positions 103-105 and 106-108 are GCC and GCT, respectively. GCCGCT is present in the same relative location in all twenty repeats. Similarly, the glycine codons that appear at nucleotide positions 64-66 and 130-132 also consistently use different codons (GGT and GGA, respectively). The same alternative codons are used at the same exact positions throughout most, if not all, the repeats. Despite a slight skew toward alanine codons that end in adenine (A) or thymine (T) (55.0% GCW, W being the IUPAC ambiguity code for A or T; Additional file 2: Table S5), it is difficult to postulate that selective forces acting at the level of codon usage are responsible for the extensive homogeneity of codon positions found throughout the 612 bp Aar_AcSp1 repeat. Concerted evolution that fixes particular codons at particular locations across repeats provides a clearer explanation.
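The position-specific codon comparison described above can be sketched as a tally of the codon found at a given alignment position in every repeat. This is an illustrative sketch only: the function names and the toy sequences are hypothetical, and real input would be the aligned Aar_AcSp1 repeats.

```python
from collections import Counter


def codons_at(repeats, start):
    """Tally the codon beginning at 1-based nucleotide position `start`
    in each aligned repeat; repeats with a gap in that codon are skipped."""
    tally = Counter()
    for seq in repeats:
        codon = seq[start - 1:start + 2].upper()
        if len(codon) == 3 and "-" not in codon:
            tally[codon] += 1
    return tally


def is_fixed(repeats, start):
    """True when every repeat carries the same codon at this position."""
    return len(codons_at(repeats, start)) == 1


# Toy repeats (not real data): codons 2 and 3 are alanine codons GCC and GCT.
toy_repeats = ["TCTGCCGCTGGT", "TCAGCCGCTGGA", "TCTGCCGCTGGT"]

for position in (1, 4, 7, 10):
    print(position, dict(codons_at(toy_repeats, position)))
```

A position where neighboring codons for the same amino acid differ from each other but are each fixed across all repeats, as with GCC followed by GCT in the text, is the pattern that a genome-wide codon preference alone would not readily produce.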
Analyses of the full array of Aar_AcSp1 iterated repeats were also consistent with two concerted evolution predictions. First, ML analysis grouped araneid AcSp1 repeats into well-supported, species-specific clades rather than grouping the repeats across species (Figure 3A). Furthermore, nucleotide pairwise identity within each species averaged 98%, but pairwise identity between Aar_AcSp1 repeats and repeats from other species averaged only 78.5% (73.6% vs. Araneus ventricosus, 79.1% vs. A. trifasciata, and 82.8% vs. A. amoena). That repeats are more similar within species than between species regardless of intragenic repeat position can be explained by rapid intra-specific spread of genetic variation via unequal crossing over during recombination [22,23].
Second, the average nucleotide pairwise identity of the first and last Aar_AcSp1 repeats to the rest of the array is slightly lower at 96% and 93%, respectively. Less similar first and last repeats are consistent with some models of concerted evolution [29]. However, araneid AcSp1 first and last repeats still grouped within species-specific clades (Figure 3A), suggesting that these repeat sequences are more homogeneous within a gene than those of previously analyzed spidroins. For example, in an analysis of repetitive units from the flagelliform spidroin (Flag) of the golden orb-weaver Nephila clavipes and the congeneric Nephila inaurata madagascariensis, the first repeats grouped together across species, and the last repeats also formed their own clade. By contrast, the central (not first or last) repeats formed species-specific clades because each repetitive unit was nearly identical within each species yet divergent across species [30]. Longer estimated divergence times between the species in our present study may explain the more thorough homogenization of araneid AcSp1 sequences compared to that of the previously studied Nephila Flag sequences. The estimated divergence time between the Nephila species is ~7.4 MYA [31], whereas Araneus and Argiope are thought to have diverged ~30 MYA, and A. argentata and A. trifasciata ~23 MYA [32].
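The within-species versus between-species comparison above can be illustrated with a small pairwise identity calculation. The sketch below is hypothetical rather than the authors' procedure: the function names and toy sequences are invented, and skipping columns that contain a gap in either sequence is an assumption, since the text does not state how gaps were scored.

```python
from itertools import combinations, product


def percent_identity(a, b):
    """Percent identical bases over ungapped columns of two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper())
               if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * sum(x == y for x, y in columns) / len(columns)


def _mean(values):
    values = list(values)
    if not values:
        raise ValueError("no sequence pairs to compare")
    return sum(values) / len(values)


def mean_identity_within(repeats):
    """Average identity over all pairs of repeats from one gene."""
    return _mean(percent_identity(a, b) for a, b in combinations(repeats, 2))


def mean_identity_between(repeats_a, repeats_b):
    """Average identity over all cross-species pairs of repeats."""
    return _mean(percent_identity(a, b) for a, b in product(repeats_a, repeats_b))


# Toy aligned repeats (not real data).
species_a = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT"]
species_b = ["ACGAACCTAC", "ACGAACCTAC"]

within = mean_identity_within(species_a)
between = mean_identity_between(species_a, species_b)
reported_mean = round(sum((73.6, 79.1, 82.8)) / 3, 1)  # per-species values from the text
print(round(within, 1), round(between, 1), reported_mean)
```

Concerted evolution predicts the pattern shown by the toy data, with the within-gene average exceeding the between-species average. The last line simply confirms that the three per-species values quoted in the text average to the reported 78.5%.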

Functional constraints on AcSp1 repeats
We predicted that functional constraints would result in greater aa sequence conservation in the portion of each repeat proposed by Xu et al. to contain six alpha helices [25]. To test this, we first compared known Araneidae repetitive regions. We aligned consensus AcSp1 repeat sequences from three Argiope species (A. argentata, A. trifasciata, A. amoena) and Araneus ventricosus. We then graphed pairwise identities for each aligned position between the A. trifasciata repeat sequence and the other species, and plotted it against the predicted A. trifasciata domains from Xu et al. [25] (Figure 3B). Using amino acid positions from Xu et al. [25], the average percent pairwise identity over the 150 aa helix-rich domain was 84.0%, but only 54.1% over the remaining 49 aa. Xu et al. [25] also noted a major alpha-helical domain from 102-151 aa, encompassing the region denoted as helix 5 and 6 in Figure 3B. Consistent with being structurally important, the average percent identity in this domain was 90.7%. Moreover, our alignment was slightly longer (216 aa) than the A. trifasciata recombinant repeat length (199 aa) due to indels that only appeared in the unstructured region. Notably, in the region from 200-209 aa (our alignment), the A. trifasciata repeat has a deletion (Figure 3B). These indels further indicate that the final quarter of AcSp1 repeats is less conserved than the first three-quarters.
To investigate amino acid conservation in the predicted AcSp1 helical regions across greater evolutionary time, we also aligned consensus amino acid AcSp1 repeat sequences from L. hesperus and Uloborus diversus to the A. trifasciata repeat from Xu et al. [19,25]. Araneidae, represented here by Argiope and Araneus, and Theridiidae, represented by L. hesperus, are members of the superfamily Araneoidea, with araneids and theridiids estimated to have last shared a common ancestor ~175 MYA [27]. U. diversus is within the Deinopoidea, the sister-group to the Araneoidea. Araneoids and deinopoids diverged from each other ~210 MYA [27]. Together, Araneoidea and Deinopoidea compose the Orbiculariae (orb-web weaving spiders and their relatives).
The AcSp1 repeat units from L. hesperus and U. diversus are almost twice as long as the araneid AcSp1 repeat units. The L. hesperus and U. diversus repeat units can be further subdivided into two parts that align with each other [19]. We aligned each part from each species (two parts per species) to the A. trifasciata repeat separately. We then calculated the average pairwise percent identity for each comparison and for each of the six putative alpha-helical regions predicted by Xu et al. ([25]; Additional file 2: Table S6). The overall pairwise identities of L. hesperus repeat part 1 and U. diversus repeat part 1 with the A. trifasciata repeat were 30% and 29%, respectively. Of note, the percent pairwise identity between L. hesperus repeat part 1 and the A. trifasciata repeat was 47% in the A. trifasciata region associated with helix 4, and it was 41% against U. diversus repeat part 1 in the region associated with helix 6. These two values, 47% and 41%, were the highest pairwise identity percentages.
Our results strongly support the hypothesis that functional constraints are acting to conserve protein sequence in the repetitive region of AcSp1. Our comparison of A. trifasciata AcSp1 repeat sequence with that of other araneid species indicates that a specific amino acid sequence is maintained in the predicted helix-rich domain of AcSp1 repeats across Araneidae. In contrast, comparison of the A. trifasciata repeat with part 1 of repeats from L. hesperus and U. diversus indicates that the amino acid sequences in the regions associated with alpha-helices 4 and 6 are the most highly conserved across Orbiculariae. The higher level of conservation in the amino acid sequences corresponding to helices 4 and 6 may indicate that these regions impart the same general function across Orbiculariae whereas the other predicted helical regions of A. trifasciata impart functions unique to Araneidae. Sequencing AcSp1 from other genera of Araneidae and other families of Orbiculariae will enable further testing of these hypotheses.
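The per-position comparison against predicted structural domains can be sketched as a column-wise identity profile averaged over domain boundaries. This is a minimal illustration under stated assumptions: the function names, toy sequences, and toy domain boundaries are hypothetical, and gaps in the reference are scored as mismatches, which the text does not specify.

```python
def identity_profile(reference, others):
    """For each aligned column, the percentage of `others` whose residue
    matches `reference`; a gap in the reference is scored as a mismatch."""
    if any(len(seq) != len(reference) for seq in others):
        raise ValueError("all sequences must come from the same alignment")
    profile = []
    for i, ref_res in enumerate(reference):
        hits = sum(1 for seq in others if ref_res != "-" and seq[i] == ref_res)
        profile.append(100.0 * hits / len(others))
    return profile


def mean_over(profile, start, end):
    """Mean of the profile over 1-based, inclusive alignment columns."""
    window = profile[start - 1:end]
    if not window:
        raise ValueError("empty region")
    return sum(window) / len(window)


# Toy alignment (not real data): columns 1-5 stand in for a structured
# domain, columns 6-8 for an unstructured tail.
reference = "MKTAYGGS"
others = ["MKTAYAPS", "MKTAYGTT"]

profile = identity_profile(reference, others)
structured = mean_over(profile, 1, 5)
unstructured = mean_over(profile, 6, 8)
print(structured, round(unstructured, 1))
```

With real data, the reference would be the A. trifasciata consensus repeat and the region boundaries would come from the published domain assignments mapped onto the alignment coordinates.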
To our knowledge, there are no current predictions about the secondary structures of L. hesperus or U. diversus AcSp1 repeats. It is feasible that, like the AcSp1 domains of N. antipodiana and A. trifasciata, L. hesperus and U. diversus AcSp1 repeats also feature distinct structural regions. Finally, our analysis may be an underestimation of sequence conservation because it does not include amino acid replacements that are functionally equivalent. However, predicting functional protein similarity is difficult given the extensive physicochemical changes that spider silk undergoes as it is processed from a liquid into dry silk (e.g. [33,34]).

Delineation of AcSp1 variants in individual Argiope spiders
Spidroin sequence variation between individual spiders is an important source of genetic variation for the evolution of different silk types within and between species. To investigate genetic variation in AcSp1 between individuals of A. argentata and the congeneric A. trifasciata and A. aurantia, we first designed PCR primers targeting the repetitive region of Aar_AcSp1. Amplification of genomic DNA across species and individuals resulted in AcSp1 repeat sequences that did not show intraspecific variation but had significant inter-specific variation (Figure 4A). Intraspecific homogenization of the repeats could be explained by biased PCR amplification of a single repeat type in the repetitive region; however, our results are consistent with the high degree of repeat sequence conservation in AcSp1 sequences from araneids (Figure 3A) and L. hesperus [19].
[Figure caption fragment: Box highlights aciniform clade, with Argiope argentata AcSp1 further indicated in red. Vertical bars identify clades by silk type. Bootstrap values greater than 70% are shown. Abbreviations defined in Additional file 2: Table S2. Scale bar represents substitutions per site.]
Next, we designed PCR primers targeting N- and C-terminal coding regions of Aar_AcSp1. We then amplified the same individual genomic DNAs that were surveyed for the repetitive region (Figure 4). Unlike the repetitive region PCR, direct sequencing of terminal region PCR products resulted in extensive numbers of multiple peaks and in some cases, poor sequencing reads due to length differences. Thus, all terminal region PCR products were cloned, a total of 385 amplicons were sequenced, and AcSp1 variants were diagnosed. Each variant was supported by at least two amplicons with sequences that had greater than 95% identical bases (Figure 4; see Methods). Unlike the repetitive region sequences, which showed no intra-specific or allelic variation, all terminal region amplifications were heterogeneous.
The number of variants characterized was surprising because all of the individual spiders surveyed were found to have more than two terminal region variants, indicating that these Argiope species must have multiple AcSp1 encoding loci. Argiope spiders are not known to be polyploid, thus multiple gene copies per genome is the only explanation for more than two N- or C-terminal region variants in a single individual. For example, we found seven C-terminal variants in one A. argentata, suggesting at least four AcSp1 gene copies (Figure 4C). Each A. trifasciata individual possessed a minimum of seven N- or C-terminal variants, again indicating at least four gene copies (Figure 4B, C). Likewise, an A. aurantia individual possessed six N-terminal variants but only two C-terminal variants (Figure 4B, C). The smaller number of C-terminal variants could be explained by lack of variation in the C-terminal region or by incomplete sampling of variants by PCR survey.
ML analysis of sequences from the PCR survey shows that the branch lengths in the repetitive region (Figure 4A) are shorter than the branch lengths of the terminal region trees (Figure 4B, C). The majority of N- and C-terminal variants cluster into well-supported, species-specific groups, and intra-specific branch lengths are very short compared to inter-specific branch lengths (Figure 4B, C). One exception is A. argentata C-terminal coding region variant V1, which forms a weakly supported group with A. trifasciata and A. aurantia C-terminal coding region variants (Figure 4C). Given the weak clade support, this variant is probably an outlier that is not as homogenized as the other A. argentata variants.
The shorter branch lengths of the repetitive region variants tree compared to those of the N- and C-terminal region trees suggest that the repetitive region is the most conserved araneid AcSp1 region (Figure 4). Yet, comparison of the average ratio of non-synonymous to synonymous substitution rates (dN/dS) across codons implies that the N-terminal region has been subject to slightly stronger purifying selection than the repetitive and C-terminal regions (0.20 vs. 0.30 and 0.42 dN/dS, respectively). dN/dS estimations, however, assume independence of sites and thus are confounded by factors such as concerted evolution and recombination. The full-length Aar_AcSp1 and other AcSp1 provide extensive evidence that the repetitive region units are most likely not evolving independently from each other (Figure 3A; [1,19]). Thus concerted evolution and purifying selection both must play a role in the near-perfect homogeneity of Argiope AcSp1 iterated repeats. Recombination can also affect tests of selection [35]. Because we could not conclusively determine the exact number of loci within an individual or assign alleles to specific loci, we were unable to ascertain recombination between loci. Subsequent analyses with additional data could address the impact of recombination on dN/dS estimates.
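The inference from variant counts to a minimum number of gene copies follows from ploidy: a diploid individual carries at most two alleles per locus, so n distinct variants require at least n/2 loci, rounded up. A minimal sketch, with a hypothetical function name and diploidy taken as the default assumption:

```python
import math


def min_loci(n_variants, ploidy=2):
    """Smallest number of loci that can hold `n_variants` distinct alleles
    in one individual, given at most `ploidy` alleles per locus."""
    if n_variants < 0 or ploidy < 1:
        raise ValueError("n_variants must be >= 0 and ploidy >= 1")
    return math.ceil(n_variants / ploidy)


# Variant counts quoted in the text.
for count in (7, 6, 2):
    print(count, "variants ->", min_loci(count), "loci minimum")
```

This is a lower bound only: homozygous loci or variants missed by the PCR survey would mean that the true copy number is higher.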
Previous work with AcSp1 sequences did not find evidence for multiple loci [1,19]. The lack of variation among A. trifasciata AcSp1 cDNA clones [1] may be due to overexpression or preferential cloning of one variant and thus its preponderance in the characterized cDNAs. Alternatively, consistent depletion of aciniform silk may be required to stimulate transcription of multiple AcSp1 loci. This hypothesis is supported by a significant increase in aciniform-silk dependent web-decorating behavior in three species of Argiope in response to a two-week period of aciniform silk depletion [4]. Future work could focus on comparing the number of AcSp1 variants expressed by spiders consistently depleted of aciniform silk versus that from spiders that are not depleted.
A survey of individual L. hesperus genomes also did not find AcSp1 variants [19]. However, the detection of multiple AcSp1 loci in Argiope but not Latrodectus is consistent with the hypothesis that Argiope spiders maintain multiple gene copies as a strategy for efficiently producing large amounts of protein. In contrast with Argiope spiders, Latrodectus spiders use markedly fewer strands of aciniform silk during prey capture [3] and do not make stabilimenta. Increased AcSp1 copy number in Argiope spiders may therefore be a strategy for increasing protein production [36]. Because spidroins are costly, highly expressed proteins [37,38], resource abundance in the form of prey availability may also stimulate aciniform spidroin production in Argiope to prepare for resource scarcity [39,40]. Precedent for this strategy exists. In the bacterium Escherichia coli, multiple copies of rRNA operons provide a competitive advantage by enabling increased growth rates and decreased cell division lag time in environments where resources fluctuate rapidly [41,42].
Previous research has found variants for other spidroins [43][44][45][46], and that the dragline spidroin MaSp1 is encoded by multiple loci in several species [47,48]. Unlike A. argentata AcSp1 variants, the C-terminal coding region of L. hesperus MaSp1 is nearly identical across loci [47]. This difference could indicate functional constraints on the C-terminal coding region of MaSp1 that either differ from or are not acting on AcSp1. A comparison of structural component predictions from the amino acid sequences of terminal regions across different spidroins would greatly inform our understanding of the contribution of the terminal regions to the evolution of different spider silk types.

Conclusions
The highly similar iterated repeat array of our complete Argiope argentata AcSp1 gene combined with sequence conservation of functionally important regions of individual repeats supports a hypothesis of concerted evolution and functional constraints acting together to homogenize Aar_AcSp1 repeats. In addition, several terminal region variants per individual Argiope genome indicate multiple Argiope AcSp1 loci. Across AcSp1 loci within an individual, we found homogenization of the repetitive region, but variation at the terminal coding regions. We also found evidence for stronger purifying selection in the N-terminal region versus the repetitive or C-terminal region, suggesting that the N-terminal region is the most constrained portion of the aciniform spidroin. The maintenance of multiple copies of AcSp1 in Argiope genomes underscores the importance of aciniform silk in Argiope ecology and evolution. Indeed, variation between individuals and multiple gene copies within individuals could provide a method for the rapid synthesis of aciniform silk in this genus, and may represent the early stages of the differentiation that led to the extraordinary sequence and functional diversity of spider silks.

Methods
Isolation and sequencing of AcSp1 containing BAC clone
A bacterial artificial chromosome (BAC) library was constructed by Rx BioSciences (Rockville, MD) with Argiope argentata genomic DNA inserted into pCC1BAC vector (Epicentre, Madison, WI). Colony pools were PCR screened for AcSp1 with primers designed from the repetitive region of Argiope trifasciata AcSp1 (Additional file 2: Table S1), resulting in one positive clone. The positive clone was restriction enzyme digested and a ~17 kb HindIII fragment of the full insert was found to contain the complete AcSp1 gene.
The 17 kb fragment was gel purified with the S.N.A.P. UV-Free Gel Purification Kit (Invitrogen, Carlsbad, CA), ligated into HindIII digested pZErO™-2 plasmid (Invitrogen), and transformed into TOP10 cells (Invitrogen). Seven plasmid clones with the expected insert size and restriction enzyme digest patterns were end-sequenced with M13 and Sp6 primers to identify orientation of the inserts. Two clones (one of each insert orientation) were triple-digested with SpeI, XbaI, and XhoI and the fragments were gel-purified. The two clones were also single-digested with BamHI and the largest fragment from each digest (5.5 kb or 5.9 kb, composed of the vector and either a 2.2 kb or 2.6 kb insert fragment) was gel-purified and re-circularized to produce subclones. The triple digest produced a 12.4 kb SpeI/XbaI fragment that was gel-purified and subcloned into SpeI digested pZErO-2 plasmid. End-sequencing the subclones revealed that the 2.6, 12.4, and 2.2 kb inserts corresponded to the AcSp1 N-terminal, repetitive, and C-terminal encoding regions, respectively. The 2.6 and 2.2 kb fragments were sequenced in their entirety using primer walking (Additional file 2: Table S1). Because the 12.4 kb fragment contained repetitive nucleotide sequence, primer walking was not feasible. Instead, the 12.4 kb fragment was bidirectionally sequenced in its entirety using the transposon-based EZ-Tn5 <TET-1> Insertion Kit (Epicentre Biotechnologies). The complete contig of the 17 kb genomic fragment was manually assembled with Sequencher 4.5 (Gene Codes, Ann Arbor, MI). An additional 1 kb of genomic sequence immediately adjacent to the 3' end of the 17 kb fragment was determined by primer walking (Additional file 2: Table S1) using the original BAC clone as template DNA. The complete Argiope argentata AcSp1 gene was uploaded to GenBank with the accession number KJ206620.
Inter- and intraspecific sampling of N-, repetitive, and C-terminal coding region fragments
Genomic DNA was extracted from single legs removed from four A. argentata, one A. aurantia, and two A. trifasciata individuals using the DNeasy Blood & Tissue Kit (Qiagen, Valencia, CA). N-terminal, repetitive, and C-terminal encoding fragments of AcSp1 were PCR amplified using primers designed from the A. argentata AcSp1 complete gene (Additional file 2: Table S1).
PCR products of the expected size were purified using the AccuPrep Gel Purification Kit (Bioneer Inc., Alameda, CA). Products were directly sequenced. If a chromatogram had overlapping peaks, indicative of heterogeneous amplification, then the product was ligated into pJET1.2 plasmid (Thermo Scientific) and transformed into TOP10 cells. Individual colonies were PCR amplified using pJET1.2 Forward and Reverse sequencing primers. Inserts of the expected size were gel purified and sequenced. If one variant was highly abundant, then additional colonies were PCR amplified and digested with restriction enzymes to identify the abundant variants. The remaining undigested PCR products containing the rare variant were purified and sequenced.

Diagnosing variants
Nucleotide sequences from the PCR fragments from each species were aligned as described below. For variant diagnosis, single nucleotide polymorphisms (SNPs) that were present in only one individual clone were attributed to Taq polymerase error and ignored. If a sequence had a pattern of polymorphism that was not present in at least one other clone, the sequence was discarded. Neighbor joining trees were then used to visualize highly similar sequences. Clusters that had greater than 95% identical sites were considered a variant group. With the exception of the cluster for A. trifasciata C-terminal coding variant 14 (95.2% identical sites), all clusters had greater than 98% identical bases. Clustered sequences were extracted and aligned to derive the majority rule consensus for that variant. Each variant is therefore supported by at least two sequences. Variants were uploaded to GenBank with accession numbers KJ206570-KJ206619.
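The diagnosis steps above (group highly similar clones, require support from at least two clones, take a majority rule consensus) can be sketched as follows. This is not the procedure used in the study: it substitutes a simple single-linkage grouping for the neighbor joining trees that were inspected by eye, and all function names and toy sequences are hypothetical.

```python
from collections import Counter


def identity(a, b):
    """Fraction of identical sites between two aligned, equal-length clones."""
    if len(a) != len(b):
        raise ValueError("clones must come from the same alignment")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster(clones, threshold=0.95):
    """Single-linkage grouping: a clone joins (and merges) every cluster that
    contains at least one member with identity above the threshold."""
    clusters = []
    for clone in clones:
        linked, rest = [], []
        for group in clusters:
            hit = any(identity(clone, member) > threshold for member in group)
            (linked if hit else rest).append(group)
        merged = [clone] + [member for group in linked for member in group]
        clusters = rest + [merged]
    return clusters


def majority_consensus(group):
    """Most common base in each alignment column of a cluster."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*group))


def diagnose_variants(clones, threshold=0.95, min_support=2):
    """Consensus of every cluster supported by at least `min_support` clones."""
    return [majority_consensus(group)
            for group in cluster(clones, threshold)
            if len(group) >= min_support]


# Toy clones (not real data): two A-type, one unsupported C-type, three G-type
# of which one carries a single-clone difference.
toy_clones = ["A" * 40, "A" * 40, "C" * 40, "G" * 40, "G" * 40, "G" * 39 + "T"]
variants = diagnose_variants(toy_clones)
print(len(variants))
```

In this toy run the lone C-type clone is discarded for lack of support, and the single-clone difference in the G-type cluster is outvoted in the consensus, mirroring the treatment of putative polymerase errors described above.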

Phylogenetic analyses
The conserved spidroin N- and C-terminal regions from the complete A. argentata AcSp1 were aligned to 29 published spidroins that also have both N- and C-terminal regions (accession numbers in Additional file 2: Table S2) using ClustalW [49] implemented in Geneious v6.1.6 (Biomatters Ltd., Auckland, NZ). The N- and C-terminal regions were separately aligned with default settings and the alignments were adjusted by eye. The aa alignments dictated nucleotide alignments. N- and C-terminal encoding region alignments were concatenated for phylogenetic analyses of spidroin paralogs (Additional file 1: Figure S2). Despite potential recombination and convergence in the N- and C-terminal encoding regions, previous research found no conflict among the strongly supported nodes of separate N- and C-terminal trees and showed that concatenation of the terminal regions provides greater evolutionary resolution [12].
The 20 repeat units from the complete A. argentata AcSp1 repetitive region were divided into individual files and aligned as above with individual repeat units from published araneid AcSp1 sequences (Additional file 1: Figure S3, accession numbers in Additional file 2: Table S3). Alignments for the N- and C-terminal encoding sequences obtained from the PCR survey of individual genomes were created as above using diagnosed variants. Repetitive region alignments from the PCR survey also used the above method (alignments in Additional file 1: Figures S4-S6).
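Two of the steps above, letting the amino acid alignment dictate the nucleotide alignment and concatenating the terminal region alignments, can be sketched in a few lines. This is an illustrative sketch only: the function names and toy sequences are hypothetical, the coding sequence is assumed to exclude the stop codon, and the study itself performed these steps in Geneious.

```python
def thread_codons(aligned_protein, cds):
    """Place each codon of `cds` under its residue in an aligned protein,
    writing '---' wherever the protein alignment has a gap."""
    residues = aligned_protein.replace("-", "")
    if len(cds) != 3 * len(residues):
        raise ValueError("coding sequence length does not match the protein")
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    return "".join("---" if aa == "-" else next(codons)
                   for aa in aligned_protein)


def concatenate(n_term, c_term):
    """Join two alignments (name -> aligned sequence) that share the same
    sequence names into one alignment."""
    if n_term.keys() != c_term.keys():
        raise ValueError("both alignments must contain the same sequences")
    return {name: n_term[name] + c_term[name] for name in n_term}


# Toy example (not real data).
codon_aligned = thread_codons("M-KL", "ATGAAACTG")
combined = concatenate({"sp1": "ATG---", "sp2": "ATGAAA"},
                       {"sp1": "TTT", "sp2": "TTC"})
print(codon_aligned, combined)
```

Threading codons onto the protein alignment keeps gaps in multiples of three, so the reading frame is preserved for downstream codon-based analyses such as dN/dS estimation.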

Availability of supporting data
All sequences generated in this study are deposited in GenBank (KJ206570-KJ206620). Alignments used in ML analyses are shown in the additional files. Alignments and the corresponding trees for this study are available at TreeBASE (http://purl.org/phylo/treebase/phylows/study/TB2:S15355).
Additional file 2: Table S1. Primers used for full-length Aar_AcSp1 sequencing and targeted amplification of N-terminal, repetitive, and C-terminal coding regions. The name and sequence of primers designed for