Preparation of animals
At least 2000 Tigriopus californicus were collected from each of two natural populations, San Diego (SD: N 32.746842, W −117.254650) and Santa Cruz (SC: N 36.949589, W −122.047036), and maintained in the laboratory at 20°C with a 12-hour light:dark cycle for 1–3 generations before initiation of experiments. Two replicate samples of over 300 individual copepods of all life stages were pooled from laboratory stocks of SD and SC and flushed repeatedly in filtered (0.22 micron) seawater over two days. One sample from each population served as a control treatment, while the other sample served as the acute heat stress treatment. Pooling of large numbers of individuals into single RNA samples allows for biological averaging, and for small experimental designs, pooling has been shown to approximate non-pooled, replicated designs . To initiate an acute heat stress, samples were placed in 50-ml Falcon tubes and immersed in a 35°C water bath for one hour, allowed to recover for 1 hour at 20°C, and then immediately transferred into Tri Reagent (Sigma) for RNA extraction. Samples in the control treatment were maintained at 20°C and otherwise treated identically. Samples in Tri-Reagent were disrupted with a tissue homogenizer and total RNA was then extracted following the manufacturer’s protocol.
The Illumina mRNA sequencing kit (RS-100-0801) and multiplexing kit (PE-400-1001) were used to isolate mRNA and prepare sequencing libraries. Matching barcodes were assigned to the SD control and heat stress sample (TGACCA), as well as the SC control and heat stress samples (GCCAAT), to allow for multiplexing on two lanes of an Illumina GA-II genome analyzer flow cell. This was done to remove possible PCR bias in comparisons of control versus heat stress samples. For sequencing, the two barcoded control samples were pooled in one lane, while the two barcoded heat stress samples pooled in an adjacent lane of the flow cell. Previous experiments have suggested that there is relatively little between-lane bias . Each sample was sequenced for single-end 100 basepair read lengths.
Preparation of reference assemblies
Reference transcriptomes for SD and SC were previously developed from 454 pyrosequencing of each population . To improve coverage and representation of rare gene transcripts in these reference assemblies, we performed de novo assembly using the previously published 454 data and the new Illumina data. Separate assemblies were made for SD and SC sequencing reads. CLC Genomics Workbench v4.5 (CLC Bio) was used to remove adaptors and trim sequences of low quality reads, and the reads were assembled into contigs with the following assembly parameters: minimum fraction length of read overlap = 0.5, minimum sequence similarity = 0.9, minimum contig length 60 basepairs.
Assembled contigs from each population were identified as putative gene products using BLASTX searches in NCBI’s non-redundant (nr) protein database and annotated for gene function using the Blast2GO software [39, 40]. The highest scoring BLAST hit, at a threshold E-value of 10-3, was used to assign a gene name to each assembled contig. For contigs with a positive BLAST hit, Gene Ontology (GO)  terms were retrieved at a threshold of 10-6.
Orthology between SD and SC contigs was assessed by a reciprocal best-hit approach. We used NCBI’s executable scripts to create BLAST-searchable databases with each set of contigs and then performed BLASTN searches between the two data sets. We parsed the output keeping only pairs of sequences that were each other’s best hit, with E ≤10-20, as putatively homologous.
Read mapping and expression analysis
Illumina reads from SD and SC samples were mapped to their respective reference assembly using CLC Genomics Workbench (read mapping parameters: minimum fraction length of read overlap = 0.5, minimum sequence similarity = 0.99). The expression of each gene was determined by calculating the sum of the reads mapping to each contig, and then normalized by dividing the number of observed reads by the length of the contig and by the total number of reads sequenced across the transcriptome (reads per kilobase per million reads, or RPKM). To compare levels of expression between control and heat stressed samples in each population, we used Kal’s Z-test of proportions . The Z-test calculates the difference in the proportion of reads observed between two treatments and compares whether each sample was drawn from the same distribution, using the normal distribution as an approximation of the binomial distribution. Due to the large number of tests, statistical significance was assessed using p-values adjusted by Bonferroni correction. In some cases Bonferroni can be overly conservative, so we consider tests as significant with a Bonferroni p-value ≤ 0.10. This was more conservative than applying a false discovery rate at p < 0.05, which detected many more (>10x) contigs. We then filtered the dataset of significant loci and retained only genes where at least one of the orthologous contigs had >10 mapped reads. For graphical purposes, RPKM values were log10 transformed due to the large range in expression values across genes. Using the GO terms for each transcriptome, we examined whether the group of genes showing a significant change in expression in each population were enriched for particular gene ontology terms relative to their reference transcriptome. Statistical significance was assessed using Fisher’s exact tests implemented in the Blast2GO software.
Evolutionary analysis of the Hsp70 family
As discussed below, RNA-seq results identified 18 loci annotated as hsp70. Because these genes were prominent in the genes responsive to thermal stress, we carried out a more detailed analysis of this gene family. We performed additional BLASTX against the Drosophila melanogaster and Caenorhabditis elegans genomes in order to identify T. californicus contigs that were paralogous hsp70 loci. For each identified hsp70 copy, we aligned the orthologs between SD and SC and performed comparative analyses of sequence divergence. To look for evidence of molecular adaptation, coding sequences were compared to calculate the ratio of non-synonymous (d
N) to synonymous (d
S) nucleotide substitution rates . Typically, d
S ratios much greater than one indicate strong diversifying selection, whereas values close to one are compatible with neutral evolution and values significantly less than one are consistent with balancing or purifying selection. To examine relationships among the Hsp70 proteins, orthologs of Drosophila melanogaster [GenBank: NM_169441.1, NM_079632.5, NM_079615.3, NM_080059.2, NM_169469.1, NM_080188.2, NM_141952.2, NM_167306.1, NM_079017.2], Caenorhabditis elegans [EMBL: F26D10, C15H9, F43E2, C37H5, F44E5.4, F44E5.5, C12C8, F11F1], and Escherichia coli [UniProt: P77319] were obtained as out-group sequences and amino acids were aligned using the ClustalW2 algorithm . A neighbor-joining tree was constructed in Mega5 , using the p-distances model with pairwise deletion of gaps and missing data. Support for topological relationships in the phylogram was evaluated by bootstrapping with 1,000 replicates.
Validation of RNA-seq expression levels
In order to evaluate the accuracy of the RNA-seq expression profiles, we measured the expression of a set of heat shock genes using quantitative PCR (qPCR). We first quantified expression using the same RNA samples prepared for RNA-seq, as a means of technical replication. cDNA from the transcriptome samples was diluted to ~100 ng/ul and 1ul was amplified over 25 cycles with 200 nM of each gene specific primer (Additional file 1) using Brilliant III Ultra-Fast SYBR Green QPCR Mastermix (Agilent Technologies). Several housekeeping genes were chosen to normalize the expression of target genes, including beta-actin, glyceraldehyde 3-phosphate dehydrogenase, tubulin, and myosin. Ct values for each sample were normalized by geometric averaging of the housekeeping genes, implemented in the program GeNorm .
We assessed biological accuracy further by replicating the thermal stress experiment with additional samples (biological replicates). For each replicate, thirty individuals were placed in 15-mL tubes containing filtered seawater, and the experimental thermal stress was repeated as above. RNA was extracted using Tri-Reagent, with tissue homogenization performed using a Mini-BeadBeater 8 (BioSpec) with silica beads (0.175 mg) at full speed for 30s. RNA was quantified by spectrophotometry, and 500 ng of RNA were used for cDNA synthesis with the High Capacity RNA-to-cDNA kit (Applied Biosystems). The experiment was performed with three replicates per temperature for each population. Quantitative PCR was performed in 20-μL reactions with the iTaq Fast SYBR Green Supermix (Bio-Rad), and contained 2 μL of a 1:10 dilution of cDNA and 250 nM of each primer. Fluorescence was measured in a Stratagene Mx 3000P System (Stratagene), with the following thermal profile: 95°C for 3 min, followed by 40 cycles of 95°C for 10 s and 59°C for 30 s. At the end of thermal cycling, a melting curve analysis was performed to confirm the presence of single amplification products. A total of 14 target genes were surveyed, and their expression levels were normalized to the expression of housekeeping genes.