Gene expression variation is widespread among individuals and taxa, has a heritable component, and it is subject to influence by natural selection and genetic drift [56, 57]. Therefore, transcriptome analysis should provide insights into which genes are "important", besides being a common way of discovering differences in gene expression because regulation of gene activity occurs primarily at the transcription level . In this study, we used a new variant of cDNA-AFLP technology to analyze differences in gene expression between two ecotypes of the marine snail Littorina saxatilis, considered an example of incomplete ecological and sympatric speciation. The experimental reproducibility obtained in this work for the cDNA-AFLP technique is in agreement with previous studies using this technology [11, 59–61]. Since a pooling strategy has shown previously similar efficiency to the individual strategy for gene expression comparisons , we analyzed pools of 10 individuals (5 males and 5 females) with the aim of reducing individual effects, thus increasing the power to detect differences between ecotypes. We found that about 4% of the studied transcripts (12% without multitest adjustments) showed differences in expression between ecotypes. We note that our results would be conservative under a hypothetical scenario in which only one of the sexes contributed to expression differences between ecotypes, by averaging in each pool the input due to males and females.
Although the cDNA-AFLP technique is an end-point PCR technique, we note that a crucial characteristic of the AFLP technology is that it is semi-quantitative: the relative intensity of a PCR fragment band in an AFLP fingerprint is related to the original abundance of that fragment in the AFLP template [63, 64]. Even though alternative quantification techniques such as real-time PCR can be several fold more sensitive, our results should be considered for this reason as conservative. This is reinforced by the fact that our conclusions are based on the existence of consistent profiling patterns across both biological and technical replicates. Thus, although small random differences at the start of the amplification may have a large effect on the final amount of PCR product, it is rather unlikely that these fluctuations can produce a repetitive systematic pattern across replicates. Indeed, the probability of getting such a pattern by chance is negligible, as indicated by the ANOVA analysis, even considering multiple testing.
Similarly, any putative technical or biological problem associated to this study was resolved by using two complementary types of controls: First, using both biological and technical replication and showing differences in the biological replicates despite of the detected technical noise. Second, gene expression differences between ecotypes were compared with those observed within ecotypes. So, any possible biases affecting our study should affect similarly to the within-and between-ecotype differentiation, while our results showed that significant transcript differences were observed exclusively between ecotypes.
The observed differences in gene expression could be heritable, caused by environmental effects linked to the different habitats associated to each ecotype, or result from a combination of environmental and heritable factors. Nevertheless, a recent study has shown that much of the variation in mRNA expression is related to genetic variation, and that only a minor part of this variation occurs in response to environmental changes . Similarly, previous studies indicated that most of the protein expression and morphological differentiation observed between the ecotypes in the same population were not plastic [66, 67]. Therefore, and even though it is obvious that environment has a role in gene expression, it seems reasonable to hypothesize that much of the variation found here could be at least primarily influenced by genetic factors, although the exact quantification of the extent of this genetic determination will require further experimental work comparing expression (transcriptomic) differences in wild and laboratory-reared snails.
Sorting or clustering genes into groups according to their expression patterns can provide a broad and interpretable overview of gene regulation . Here, we carried out a cluster analysis in order to identify sets of genes with common expression between ecotypes. We could clearly distinguish two different groups of genes differentially expressed which allowed clustering all the pooled individuals by their ecotype (Figure 1). These two sets of genes could represent different functional groups involved in similar metabolic pathways or, at least, similarly regulated, thought a future confirmation would be needed.
Regarding the identification of the transcripts, most of the differentially expressed genes did not correspond to known sequences in the available databases. This could be due to the scarcity of gastropod sequences in public databases or, alternatively, these sequences may represent novel mRNAs. Another possibility is that they could correspond to the 3' untranslated region of the gene where the sequences are often less conserved than the sequences of protein-coding regions . However, this possibility was reduced in the current study thanks to the modification applied to the original cDNA-AFLP protocol, in which the 3'-end tails close to the poly(A) tail were discarded (see Material and Methods).
When analyzing gene expression data by qPCR, stability of candidate reference genes and an appropriate method of normalization must be carefully evaluated. An ideal reference gene should be expressed at a constant level in each sample, but nowadays it is broadly accepted that the ideal and universal reference gene does not exist [29, 69, 70]. Unfortunately, for many organisms a large sequence dataset is not available, precluding the identification of genes whose transcripts are maintained at stable levels across the samples being surveyed . In fact, this prior validation of reference genes remains uncommon in gastropods [35–42], or even in molluscs, though with exceptions . To the best of our knowledge, the present study represents the first effort aimed at the identification and validation of reference genes for gene expression in a marine gastropod. The recently developed MIQE (the Minimum Information for publication of Quantitative real-time PCR Experiments) guidelines  suggest employing the geometric average of multiple reference genes and assessing gene stability with the support of validated mathematical models such as geNorm . Even when there are few sequences available for L. saxatilis, we decided to discard β-actin gene, one of the most traditionally used in qPCR, for two main reasons: i) both ecotypes show differences in musculature (proportionally, SU ecotype has a bigger food muscle ); ii) it has proven to have an unstable expression in several organisms [73, 74]. According to geNorm, the threshold M value for considering a gene to be unsuitable for data normalization is suggested to be ≥ 1.5 . In this study, the 6 tested genes showed M values below 1.5, being possible good candidates to be used in the normalization. Based on this estimates, geNorm ranks the stability of the six genes in the following order: 18S < Calmodulin < EF1 < EF2 < Histone and Tubulin. Moreover, low values of the pair-wise variation V between two sequential normalization factors containing an increasing number of genes (Figure 2) showed it was unnecessary to include more than the two genes chosen by the geNorm software: histone and tubulin (M = 0.29). Since these two genes are involved in distinct biological processes and metabolic pathways (Table 3), have therefore a smaller chance of being co-regulated genes.
The analysis of qPCR profiles for the candidate gene Cytochrome c Oxidase subunit I (COI) confirmed the results obtained previously by cDNA-AFLP. This up-regulation of the COI in the SU ecotype could result both from an increased number of mitochondria in the SU ecotype and/or changes in transcription rate per se. Previous studies showed that higher levels of mtDNA gene products under physiological stimuli are primarily met by mtDNA replication (reviewed in ). Cytochrome c oxidase (COX) or Complex IV (EC 184.108.40.206), is a large transmembrane protein complex found in bacteria and in the mitochondrion. Specifically, subunit 1, like subunits 2 and 3, is a large and highly hydrophobic protein encoded in the mitochondrial genome . COX is the last enzyme in the respiratory electron transport chain of mitochondria (or bacteria). It plays a fundamental role in energy production of aerobic cells, and also contributes to the storage of energy in the form of an electrochemical gradient that will be used by the oxidative phosphorylation system for synthesis of ATP . Changes in the transcription level of genes involved in energy metabolism have been reported previously, and are especially interesting since they may influence important traits . As COX activity can be modulated according to the energetic requirement of the cell, its increase in SU individuals could be related to a need of energy to avoid the dislodgement by the heavy wave action typical of their habitat. In fact, a very similar result was found in the same population at the proteome level , where enzymes related with the energetic metabolism (arginine kinase and fructose bisphosphate aldolase) were also up-regulated in the SU ecotype. Consequently, the provision of ATP and the control of its metabolism seem to be critical components of the general environmental stress response in all organisms, allowing them to respond with adaptive physiological changes while, at the same time, buffering the changing energy demands . The SU individuals have to develop higher muscular effort than the other ecotype to be able to hold on the rocks while suffering the strong swell of their habitat . Therefore, increasing the energetic metabolism could represent a possible adaptive physiological mechanism underlying differential muscular effort between both ecotypes. Future efforts aimed at determining whether this difference is due to environmental or genetic causes have to be made, for example studying its expression in individuals grown in lab conditions. Similar results have been found between sympatric species of lake whitefish and brook charr, where genes involved in energy metabolism emerged as prime candidates underlying their adaptive divergence [77, 78]. Furthermore, and remarkably, the same transcript corresponding to the COI, together with the transcript 76 not identified, showed a higher expression in RB males versus RB females in a previous work . A similar result was found in gene expression studies between sexes in Drosophila spp, where males showed a higher degree of variation in expression in genes associated with mitochondria and defence functions [79, 80]. This is of particular interest given that sex-biased genes, especially those genes with male-biased expression, appear to evolve faster and, therefore, develop a higher divergence between species [79, 81–84]. Taking into account that the sex dimorphism in size (males are smaller than females ) is much more pronounced in the SU ecotype than in the RB, the coincidence of genes participating in sexual and ecotype differentiation is plausible and raises the hypothesis that genes associated with the male reproductive function may contribute disproportionably to speciation, i.e. faster male theory [85, 86].
A major challenge towards a comprehensive analysis of speciation processes is the integration of data from different "omics" resources and their interpretation. Even when we are very cautious and taking all the precautions due to the inherent differences caused by the distinct techniques, we could make a proxy of what is happening at three different biological levels. At genome level, both ecotypes differed around 3% in their genome by an AFLP scan , whereas the transcripts analyzed in the current study displayed 4% of gene expression differences. In general, it seems that there is a greater differentiation among organisms at transcriptome level than at genome level , maybe due to a higher evolutionary rate of the transcriptome , or to the epistatic and pleiotropic nature of the molecular mechanisms underlying gene expression . On the other hand, the proteome level is the closest to the phenotype, and our previous results  showed higher differentiation between ecotypes at the proteome level, that is around 7%. Since the selection is acting at phenotypic level while variation is generated at the level of the genotype, the proportion of changes caused by selection can be expected to be largest at phenotypic level and smallest at the DNA sequence level , a view that seems consistent with the partial data we have available in the global analysis.
Another central, and yet controversial, question in evolutionary biology concerns the genetic basis of evolutionary change. King and Wilson (1975)  proposed that the key to understand the differences among species is not in the gene-coding, but in the DNA region that regulate the levels, locations, and time of gene expression. An important tenet of evolutionary developmental biology ("evo devo") is that cis-regulatory mutations are more important than structural mutations in phenotypic evolution , although it is also argued that adaptations likely involve a mixture of structural and cis-regulatory changes . Here, the hypothesized increase of gene differentiation between ecotypes along the three different molecular levels could indicate a certain influence of the regulatory elements affecting to gene expression, but such hypothesis will need an independent corroboration. New sequencing effort will be necessary in order to improve our knowledge on the genetic architecture of adaptive traits in this model system.