Recombinational micro-evolution of functionally different metallothionein promoter alleles from Orchesella cincta

Background Metallothionein (mt) transcription is elevated in heavy metal tolerant field populations of Orchesella cincta (Collembola). This suggests that natural selection acts on transcriptional regulation of mt in springtails at sites where cadmium (Cd) levels in soil reach toxic values This study investigates the nature and the evolutionary origin of polymorphisms in the metallothionein promoter (pmt) and their functional significance for mt expression. Results We sequenced approximately 1600 bp upstream the mt coding region by genome walking. Nine pmt alleles were discovered in NW-European populations. They differ in the number of some indels, consensus transcription factor binding sites and core promoter elements. Extensive recombination events between some of the alleles can be inferred from the alignment. A deviation from neutral expectations was detected in a cadmium tolerant population, pointing towards balancing selection on some promoter stretches. Luciferase constructs were made from the most abundant alleles, and responses to Cd, paraquat (oxidative stress inducer) and moulting hormone were studied in cell lines. By using paraquat we were able to dissect the effect of oxidative stress from the Cd specific effect, and extensive differences in mt induction levels between these two stressors were observed. Conclusion The pmt alleles evolved by a number of recombination events, and exhibited differential inducibilities by Cd, paraquat and molting hormone. In a tolerant population from a metal contaminated site, promoter allele frequencies differed significantly from a reference site and nucleotide polymorphisms in some promoter stretches deviated from neutral expectations, revealing a signature of balancing selection. Our results suggest that the structural differences in the Orchesella cincta metallothionein promoter alleles contribute to the metallothionein -over-expresser phenotype in cadmium tolerant populations.


Background
Transcriptional regulation plays an important role in the evolution of many phenotypes, especially when the phenotype correlates with expression level of a particular key gene. Transcriptional regulation is mostly controlled at the level of transcriptional initiation, i.e. the recruitment of transcription factors which determine the stability of the RNA polymerase II holoenzyme complex and hence the frequency of transcription initiation. Due to the modularity of transcription factor binding site clusters and the lack of reading frame constraint, promoters are more evolvable than coding regions [1,2]. A few point mutations in a cis-regulatory region can already confer functionally different phenotypes, even when the pattern does not deviate from neutral expectations [3,4]. Some examples of variation in regulatory loci conferring adaptive phenotypes are: the LdhB promoter of Fundulus heteroclitus [4][5][6][7], the hsp70Ba promoter of Drosophila melanogaster [8,9] the chalcone synthase promoter of Arabidopsis thaliana [3] and the Cyp6g1 promoter of Drosophila melanogaster [10]. In this paper we focus on transcriptional regulation of the Orchesella cincta metallothionein gene (mt), which is assumed to be involved in heavy metal tolerance.
Metallothioneins are low molecular weight metal-binding proteins with a high content of conserved cysteines within certain phylogenetic lineages and a lack of aromatic amino acids and histidine [11,12], although some invertebrate metallothioneins deviate from this pattern [13][14][15]. Metallothioneins are involved in essential metal homeostasis, metal detoxification, free radical scavenging, cell proliferation and apoptosis processes [16]. They are induced by several chemical and physical stresses [17], including free metal ions, altered redox status, oxidative stress and heat shock. The Orchesella cincta metallothionein (MT) [18] has a molecular weight of 7 kDa and consists of 77 amino acids. Almost all of the body burden of Cd is located in the gut epithelium and a major part of it is bound to MT [19,20]. It is suggested that the excretion of Cd from the animal occurs by a molting cycle regulated apoptotic process, by which the Cd-loaded midgut epithelium is shed [21,22]. Higher constitutive [23] and cadmium inducible [24]mt mRNA levels have been observed in populations from heavy metal contaminated sites, compared to populations from reference sites. Parent-offspring comparisons showed that Cd-induced expression of mt (h 2 = 0.48) is a heritable trait. Differences between expression level classes were linked to RFLP patterns of the pmt locus [25]. Although certain alleles of the mt coding sequence are linked to heavy metal pollution of the soil [26], it is rather unlikely that the heavy metal tolerance can be attributed to a single gene [27,28].
Inherited heavy metal tolerance has been associated with duplication events [29] and polymorphisms in the metallothionein coding sequence [26,30] The regulation of MT biosynthesis is mainly transcriptional and depends for the most part on cis-acting regulatory elements, such as the metal responsive element (MRE) which binds metal responsive transcription factor-1 (MTF-1), a Zn-finger protein, and the anti-oxidant responsive element (ARE) which recruits the nuclear erythroid derived related factor-2 (Nrf-2), a protein of the b-zip leucine zipper family [16,31]. However, this situation can not be generalized for all invertebrate phyla [32,33] Therefore, the induction of metallothionein can be considered as a concerted action of general, metal-specific and oxidative stress specific transcription factors.
In the past decades several studies on metallothionein promoters of invertebrates have been performed [34][35][36][37][38][39][40][41][42], although most mechanistic studies have been done on vertebrate model organisms, reviewed by [16,31,43]. One way to study the functionality of promoters is to fuse them to a quantitative reporter gene and analyze the induction in vitro in a host cell line. The comparison of allelic polymorphism in promoters by reporter assays has mainly been applied in medical biology, e.g. [44,45]. Only few studies in evolutionary ecology have compared alleles in this functional approach [5][6][7]46] and only one study compared the metal inducibility of metallothionein promoter alleles [47].
In the present study natural occurring allelic variation of the Orchesella cincta metallothionein promoter (pmt locus) is described. Following discovery of extensive variation in promoter sequence we formulated the following research question: "Are the pmt alleles observed in natural populations differentially induced by Cd, paraquat and 20-hydroxyecdysone (molting hormone) and can their induction be related to their different architecture?" Luciferase constructs were made and tested in an arthropod cell line for dose-dependent inducibilities. Paraquat is included in the experiment, because it generates reactive oxygen species in the electron transport chain [48] causing oxidative stress. This approach allows us to discriminate between the effects of Cd and oxidative stress separately. Finally, we tested if the pmt allele frequency distribution, based on nucleotide diversity measures, deviated from neutrality in a tolerant and sensitive population.

Results
General architecture of the pmt locus Nine different alleles were identified in an alignment of 32 1500 bp promoter sequences (see additional files 1,2,3,4,5,6,7,8,9,10). The consensus sequences of the respective alleles and the O. villosa were submitted to Genbank (DQ523588 to DQ523596, DQ641512, DQ523587 and EF106974). The general architecture of the nine promoter alleles is shown in Fig. 1, which also indicates positions of putative core promoter elements and transcription factor binding sites [16,31,[49][50][51]. The number of the putative TFBS is summarized in Table 1 for each allele. The basal promoter consists of an initiator (Inr) consensus with an overlap of a 20-hydroxyecdysone responsive element (HERE). The pmtA allele contains two extra putative initiators. All the alleles, except pmtC, have a downstream promoter element (DPE) consensus downstream of their Inr. All pmt alleles contain MREs, which are all orientated in the sense direction. The proximal promoter, about 300 bp 5' from the Inr, contains most of the MREs. The alleles pmtA1, pmtA2, pmtB, pmtD1, pmtD2 and pmtBAL have five MREs in this region, named MRE-a to MRE-e. A number of indels in this MRE-rich region make this region variable. The MRE-a was apparently lost from the pmtC allele by a 13 bp deletion. A 19 bp deletion 5' of the MRE-b, relative to pmtC, pmtD1, pmtD2 and pmtBAL, characterizes pmtA, pmtB, pmtE and pmtF. This deletion affects the spatial position of the MREs. The pmtC and pmtF alleles have a point mutation which disrupts the consensus MRE sequence of MRE-d and MRE-e respectively. The pmtD1 and pmtD2 alleles share a HERE between MREa and MRE-b, by one point mutation. An AP-1 binding site consensus was only retrieved in the forward PCR primer D1-36F, and is not further discussed. This primer was developed after genome walking resulting in a clone that was apparently pmtA1. All alleles, except pmtC share a DNA replication-related element (DRE) [50].
Architecture of the nine respective metallothionein promoter alleles (pmt) Figure 1 Architecture of the nine respective metallothionein promoter alleles (pmt). The respective putative transcription factor binding sites are represented as in the legend. Indels are indicated with black triangles. MRE, metal responsive element; ARE, anti-oxidant responsive element; DRE, DNA replication-related element; HERE, 20-hydroxyecdysone responsive element; Inr, initiator; DPE, downstream promoter element; C/EBP, CCAAT enhancer binding protein. The full sequence alignment is given in Additional File , 10.
A putative enhancer, ± 850 bp upstream from the Inr (when the 1269 bp insertion of pmtBAL is not taken into account), contains another MRE (MRE-f) and an anti-oxidant responsive element (ARE). Within this enhancer and towards the proximal promoter a number of HEREs and C/EBP (CCAAT enhancer binding protein binding site) binding sites are found scattered, which differ in number and position between the respective alleles. The 1269 bp indel in the pmtBAL allele is delineated at its edges (relative to the other alleles in the alignment) by two C/EBP binding sites. Furthermore this insertion contains 1 ARE, 3 HEREs and another C/EBP binding site.

Similarities among pmt alleles and recombinant analyses
The phi test for recombination in the Splitstree4 program [52] found statistically significant evidence for recombination (p = 1.09 × 10 -13 ). A bootstrap confidence network [52,53] based on a split decomposition analysis was developed representing the inferred recombination events in the pmt locus (Fig. 2). Split decomposition analysis addresses the problem of conflicting phylogenetic signals due to recombination which is not necessarily a branching or tree-like process. Parallel edges in the network represent evolutionary lineages of conflicting bifurcating trees. The parallel edges are presented in different colors in order to relate the events to results of the analysis below. The 1269 bp indel from the pmtBAL allele was omitted from these analyses. The evolution of the apparently ancestral alleles, pmtBAL, pmtC and pmtF, can be interpreted in a bifurcating pattern, whereas the more recent alleles pmtE, pmtD2, pmtD1, pmtB, pmtA2 and pmtA1 are consistent with a reticulate origin. The recombination analysis is presented in Fig. 3. Data below a bootscan threshold of 70% were omitted from the graphs. Breakpoints with their respective p-values from the Recco analysis, are plotted in the same graphs as a matter of convenience.
The pmtA1 and pmtA2 alleles consists of a ± 500 bp upstream block shared with pmtD1 and a ± 400 bp block shared with pmtB confirmed by very high bootstrap values. In between a slight similarity with pmtE was found. Recco confirmed this bootscan analysis and found a recombination breakpoint between the two dominant blocks. Allele pmtB contains a ± 150 bp upstream block related to pmtC and the ± 400 bp downstream block shared with pmtA as mentioned before. Both regions had recombination breakpoints at their respective 3' and 5' edges. The central part of the pmtB allele shares limited similarity with pmtD2. The pmtC allele has the shared ± 150 bp block with pmtB and a block shared with pmtE at the outer 3' end of the locus (the second exon of the mt gene), both confirmed by the Recco method, although not visible in the reticulate network. In between the two blocks bootscan analysis identified similarity with pmt-BAL. Numerous recombination breakpoints were detected in this region, suggesting that this allele contains a recombination hotspot. The pmtD1 situation is very clearcut. This allele contains the ± 500 bp block shared with pmtA and a ± 400 bp 3' block shared with pmtD2, confirmed by the Recco method. Both regions have numerous putative recombination breakpoints at their edges. The pmtD2 allele shares a ± 500 bp upstream block with pmtE and the downstream block with pmtD1 mentioned before, confirmed by the Recco method. In between the respective blocks a smaller ± 100 bp block similar to pmtB, surrounded by recombination breakpoints, was found. The pmtF allele has a ± 100 bp upstream block shared with pmtBAL immediately followed by a very small block shared with pmtB. Only the former block was confirmed by Recco. The pmtBAL situation was rather contradictory, although some recombination breakpoints provided by Recco confirmed the bootscan similarity profile with pmtF. The limited similarity with pmtC was not supported by Recco. pmtE consists of the clear cut upstream ± 500 bp Table 1

: Summary of the occurrence of the respective putative transcription factor binding sites. a: The number of MREs is given as one in the enhancer region (in the vicinity of the ARE) plus the number in the proximal promoter. b: The number of HEREs is represented as the number in the region upstream of the initiator plus the one overlapping the initiator.
Inr Consensus TCAKTY [49,84] RGWYV [84] TGCRCNC [31] TGA CNNNGC [31] CCAAT [31] TATCGATA [50] KNTCANTNNMM [51] pmtA block shared with pmtD2 and the shared second exon with pmtC. On the other hand, the central part showed some above threshold similarity with pmtA1. Again numerous putative recombination breakpoints, at the edges of some shared blocks, were detected by Recco.
The deep trenches in the respective recombined blocks represent conserved modules, e.g. the one at position ± 400 bp is the region containing the ARE and the distant MRE.

Functional analysis
The functional significance of pmt variation was assessed by evaluating the effect of the different promoters on gene expression. Luciferase reporter assays of pGL3-pmt constructs were performed in Drosophila S2 cell line. No induction of the empty vector pGL3Basic neither the normalization vector pAc5.1/V5-His/lacZ was observed (data not shown), implying that the observed induction of the pGL3-pmt constructs are exclusively due to the interaction of the transcription factors from the host cells with the mt 95% confidence reticulate network of the eight described Orchesella cinca pmt alleles with the Orchesella villosa pmt clone as an out-group, following 1000 replicate bootstraps in Splitstree v4 (uncorrected p for nucleotide substitution, NeighborNet to cal-culate the distance and Reticulate to calculate the splits) Figure 2 95% confidence reticulate network of the eight described Orchesella cinca pmt alleles with the Orchesella villosa pmt clone as an out-group, following 1000 replicate bootstraps in Splitstree v4 (uncorrected p for nucleotide substitution, NeighborNet to calculate the distance and Reticulate to calculate the splits). The colored parallel edges refer to the colors in the recombination analysis.
Recombination analysis of Orchesella cincta metallothionein promoter alleles Figure 3 Recombination analysis of Orchesella cincta metallothionein promoter alleles. Bootscanning analysis representing the percentage of permutated trees (left axis) that did coincide between the respective pmt alleles in a sliding window approach (200 bp width, 20 bp step size, Kimura 2-parameter for nucleotide substitution) relative to the sequence position. Only the relationships which trespass the 70% threshold of the permutated trees are presented. On the right axis the p-values of the respective breakpoints are indicated. The colours refer to the parallel edges in the reticulate network (Fig. 2).
promoters in the luciferase constructs. As a positive control for ecdysone treatment the 20-E exposure was also performed on cells transfected with the construct pEcRhspluc [54], a highly 20-E inducible luciferase construct (results not shown). Indeed, a high induction level was observed, comparable to Poels et al (2004) [45].
Basal luciferase expression data are presented in Fig. 4. Highly significant differences in basal expression were detected between the luciferase constructs from the different alleles, following a One Way ANOVA test, with the Tukey HSD post-hoc test. The basal expression from the pmtCluc construct hardly deviated from the empty vector pGL3Basic (data not shown) and was an order of magnitude lower than the basal expression values of the other constructs. The pmtD1luc had a higher basal expression than the pmtA1, pmtF and the pmtC constructs.
The dose response graphs from the Cd, paraquat and 20-E exposures and their estimated parameters are given in Figs. 5, 6, 7 and 8 respectively (see Additional File , 11 for a numerical summary of the curve fit data). It appeared that all constructs were susceptible to Cd. The RLU max estimates of pmtD2luc and pmtFluc Cd exposures did not differ and were the highest observed. Exposure of the other constructs to Cd resulted in significantly different RLU max values. The pmtCluc constructs, with the lowest RLU max were the least inducible. The slope of pmtD2luc in the Cd exposure was steeper than pmtBluc and pmtFluc, implying a quicker inducibility. The most sensitive construct to induction by Cd, represented by the lowest EC 50 , was pmtA1luc. It differed significantly from the least sensitive constructs pmtBluc, pmtD2luc and pmtFluc.
The pmtCluc construct did not show a significant Pearson correlation with paraquat concentration (p > 0.05) and no model fit was possible. No significant differences in the estimated parameters of the paraquat exposure were found in the responses of the inducible constructs. The 20-E exposure data revealed an inhibition of every construct. The estimated values for the parameters were abandoned, because of the wide associated 95% confidence intervals, therefore only the estimates for the RLUmin value are represented. It appeared that the pmtBluc construct was the least inhibited at the maximum exposure concentrations of 0.1 mM 20-E, compared to the other constructs.
At the lower range of the 20-E exposure a slight induction was observed in all constructs. We tested the RLU estimated from the inducing concentration range relativeto the unexposed control in a One Way ANOVA approach (see Table 2). The pmtBluc and pmtD2luc constructs were both significantly induced at respectively 0.1 nM and 1 nM 20-E exposure.

Allele frequencies in field populations
The occurrence of the pmt alleles was assessed in O. cincta populations from a clean reference site (Roggebotzand) and a metal-polluted site (Plombières). These data are summarized in Table 3, together with information on soil metal concentrations and several indices of Cd tolerance in the two populations obtained from earlier work (Cd excretion, growth reduction and mt expression). The insert of the pmtBAL allele is omitted from this analysis.
Pollution by heavy metals in the abandoned Pb-Zn mine of Plombières dates back to the Middle Ages and has proceeded until the beginning of the 20 th century [55]. This is reflected by the almost 200-fold higher total Cd content of this soil compared to the clean site in Roggebotzand. This latter site is located on reclaimed land, which fell dry in 1968. A G-test for differences of allele frequencies between both populations was highly significant (p < 0.001). The frequencies of pmtA1 and pmtA2 are relatively low in the population from the mining site Plombières, while the pmtC, pmtB and pmtD2 alleles are more represented compared to the situation in the reference Roggebotzand population. In addition, a higher nucleotide, haplotype and nucleotide diversity was observed in the Plombières population. When calculating the Tajima's D test, no deviation from neutral expectations was observed.

Discussion
From our data we have very good evidence that allelic diversity at the pmt locus has evolved by extensive recombination events, although we do not have knowledge about the genetic mechanism, e.g. crossing-over or gene conversion. The patterns in the splits decomposition network could somehow be linked to the results of the recombination analysis. The most important recombination blocks, which became evident in the bootscanning and Recco methods, were visible in the network. Especially the more recent alleles, pmtA1, pmtB, pmtD1, pmtD2 and pmtE revealed clear-cut signals of recombination among each other. Recombination between the older alleles, pmtBAL, pmtF and pmtC, were not detected at all in the Recco and the Splitstree method Reasons for this could be that the parental alleles were not sampled or that former recombination events could be hidden behind the muta-Dose response relationships of the six luciferase constructs tional load. Because the Recco method takes the minimization of recombination and mutation costs into account, instead of the tree-like model in the bootscanning approach, it detects recombination false positives to a lesser extent [56]. The absence of reticulations in the edges of these alleles in the 95% confidence reticulate network, points towards less conflicting bifurcations (Fig. 2). These may be caused by the accumulation of mutations following past recombination events.
The numerous recombination breakpoints at the pmt locus can be explained by the fact that transcriptionally active chromatin with recruited transcription factors is hypersensitive to recombination initiation [57]. Not transcription in se but the recruitment of recombination machinery by environmentally activated transcription factors following chromatin remodeling causes recombination events. For example, in fission yeast, phosphorylation by the stress-activated protein kinase from the MAPK pathway increases the affinity of the transcription factor ATF1. PCR1 for a cAMP-responsive element-like (CRE-like) DNA sequence and remodels the chromatin [58]. This process and the respective CRE-like sequence are associated with a recombination hotspot in fission yeast [59]. In every pmt allele two CRE core sequences, CGTCA [60] were found. Because of the importance of the composition of the flanking sequences of the CRE core sequence and the large number of CREB proteins, these data are not further discussed, although it may be of importance for transcriptional regulation and initiation of recombination at the Orchesella cincta pmt locus.
The 1269 bp insertion in the pmtBAL allele, which does contain some relevant putative transcription factor binding sites, is suggested to be a possibly recombined or Dose response relationships of the six luciferase constructs Figure 6 Dose response relationships of the six luciferase constructs. On the Y-axis the β-galactosidase normalized relative luciferase units (RLU) are presented. On the X-axis the exposure concentrations of paraquat are indicated.
duplicated region from another promoter. Similar events, where promoter regions were swapped between loci, have been described before [46].
The number of MREs in stress gene promoters in general [61] and metallothionein promoters in particular varies extensively. The same applies to their sequence and their spatial context. However, three MRE's (MRE-b, MRE_c and MRE-e) were conserved in sequence (TGCACAC) between O. cincta and the out-group species O. villosa. Previous studies revealed that the proximal MRE cluster in metal-responsive promoters is necessary [62] and MREs in the proximal promoter region need to work cooperatively for the full inductive capacity [36,43,63]. The most proximal MRE to the transcription start (MRE-a), has been shown to be most contributing to induction by Zn and Cd of the human hmt-IIA promoter [64] and cooperates with more upstream MREs for the complete heavy metal induction [65].
The induction by paraquat was several orders of magnitude lower than the induction by Cd, which possibly reflects that not all the MREs are involved. In a study by [66] three different oxidative stressors elicited a twofold response of the rainbow trout MT-B promoter luciferase constructs, comparable to our study with paraquat. Beside the ARE and the MRE-a it was suggested that ARE half-sites (TGAC) are as well responsible for oxidative stress inducibility [66].
The reporter assays show that allele C is hardly inducible by Cd and oxidative stress (Fig. 5) and shows no basal transcription level above background (data not shown).

This may be explained by the deletion of the MRE-a in this
Dose response relationships of the six luciferase constructs  A: RLU max estimates from the cadmium (Cd) exposure data, B: Slope estimates from the Cd exposure data, C: EC 50 estimates from the Cd exposure data D: RLU min from the 20-E exposure data Figure 8 A: RLU max estimates from the cadmium (Cd) exposure data, B: Slope estimates from the Cd exposure data, C: EC 50 estimates from the Cd exposure data D: RLU min from the 20-E exposure data.
allele as follows. During exposure to Cd and paraquat, trans-activation in non-mammalian metallothionein promoters can occur by interaction of an enhancer (containing MREs and/or ARE) with the MRE-a and other proximal MREs [43,63,[66][67][68][69]. The induction of pmtC by Cd is low and completely abolished by paraquat possibly due to the incapability of trans-activation between the MRE-a (lacking in the C allele) and the distal enhancer, directly or in combination with CCAAT/enhancer binding proteins [70,71]. Binding sites for CCAAT/enhancer binding proteins were detected in all alleles of the pmt locus. Also, the MRE-a is often important in determining basal expression levels [43,[64][65][66]. Deletion of MRE-a in pmt C can therefore explain the low basal expression of this allele. Finally the importance of MRE-a is reflected in the highly conserved core sequence: O. cincta MRE-a has the same core sequence as Onchorhynchus mykiss, Strongylocentrotus purpuratus and Drosophila metallothionein promoters.
There may be an alternative explanation for the very low basal expression levels of pmtC as well as low stress induction, related to the absence of DRE. The DRE [50] is an element found in DNA replication related genes, as well as in stress responsive genes (a.o. glutathione-S-transferase, catalase). This element coordinates the cell cycle specific expression of metallothionein. Since pmtC lacks this element (disruption of the consensus by a point mutation) it may contribute to the low inducibility and basal expression.
We realize that we tested O. cincta derived promoters in a Drosophila genetic background. Thus, the observed levels may not reflect the levels in vivo due to absence of O. cincta specific transcription factors or altered binding specificity of Drosophila transcription factors to O. cincta specific transcription factor binding sites. Reporter assays are sometimes unable to generate the tissue, temporal and species specific transcriptional regulation [72][73][74][75]. For instance, aberrant expression levels have been observed when expression levels of 12 orthologous genes of human and chimpanzee were compared in cell lines from different human tissue origin [76] when compared to their in vivo expression levels. In contrast with that, Crawford et al [7] obtained consistent results using two unrelated fish cell lines (rainbow trout hepatoma cells and salmon cardiac cells) to study transcriptional activity of killifish lactate dehydrogenase-B promoters.
The general pattern of 20-E exposure is the inhibition of the metallothionein promoter. This probably occurs because of the overlapping HERE and Inr. The moderate inducibility of pmtB and pmtD2, at respectively 0.1 and 1 nM 20-E, can not be explained straightforward, although pmtD2 has a larger number of HEREs (5+1, See Table 1). Total cadmium (Cd) content of the soil; allele frequencies of the pmt alleles; S, total number of variable (segregating) sites; η, total number of mutations; Hd, haplotype diversity ± SD; π, nucleotide diversity per site ± SD; Tajima's D, Fu and Li's D and Fu and Li's D*, mean normalized expression, p-value of the G-test performed on the reconstituted dataset (df = 8), MNE, mean normalized expression at mRNA level; Cd excretion efficiency; percentage of Cd-induced growth reduction.
The spatial and sequence context could be the reason for their inducibility. In the firebrat Thermobia domestica 20-E equivalents peak to 5 μM during apolysis [77] falling back to the basal concentration of 80 nM, indicating that the order of magnitude of our exposure range was well within the physiologically relevant range. These data suggest that the metallothionein expression is switched off during ecdysone-induced apoptosis of the gut epithelium, when the onset of new cuticle formation is set.
Our small-scale population genetic comparison between populations from the clean site in Roggebotzand and the polluted one in Plombières reflected a positive sign of balancing selection at Plombières, because of the significant positive Fu and Li's D values, higher nucleotide diversity per site, and higher haplotype diversity in the latter population. A study by Timmermans et al. [26] revealed selection on certain alleles of the O. cincta mt coding sequence by heavy metal content in the soil. However, no signatures of any selection were detected in the amino acid sequence (d n /d s , Fisher's exact test) or in the nucleotide composition (Tajima's D). The population at Plombières is characterized by relatively high frequencies of pmtD2 and pmtB, which is in accordance with their greater inducibility compared to the most common pmtA1 allele. This suggests a fitness advantage to phenotypes with high metallothionein expression at polluted sites [23]. On the other hand, the Plombières population also has a relatively high frequency of the less responsive pmtC allele, This is consistent with the signature of balancing selection detected by Fu and Li's D values and can be understood if there are not only advantages in terms of metal tolerance but also fitness costs associated with high mt expression. Further work is necessary to elucidate the precise relationship between fitness and pmt genotype.

Conclusion
The Orchesella cincta metallothionein promoter contains a high degree of polymorphism, reflected in the different number and spacing of consensus transcription factor binding sites involved in relevant regulatory processes, such as heavy metal, oxidative stress induction and the regulation by the molting cycle.
In general, the evolution of transcriptional regulation can occur by stabilizing selection by which erosive mutations in TFBS and the emergence of new ones are compensating each other. Evolution due to transcriptional regulation has been reported in macro-evolutionary processes, e.g. the even-skipped-2 enhancer of Drosophila species [78][79][80]. Alternatively, micro-evolutionary processes rather occur by point mutations in (putative) TFBS [3,6,7]. Here, we provide evidence of extensive recombination, reshuffling the nucleotide variation of insertion, deletions and point mutations in the micro-evolution of a regulatory locus.
Since induction of metallothionein is suggested to be associated with a metal-tolerant phenotype [24,25] the difference in inducibility between the pmt alleles provides a scope for natural selection in field populations. The deviations from neutral expectations in the cadmium tolerant population from Plombières support this suggestion. Future approaches will be the detection of DNAbinding activity of selected transcription factors, measure the in vivo transcription by real-time Q-PCR, and screen field populations for their respective pmt allele frequencies.

DNA purification
DNA of individual animals was purified by the SV genomic DNA purification system (Promega Corporation). The maxiprep used for the genome walking procedure was done by a modified CTAB extraction method [81] on 100 adult individuals.

Transcription Factor Binding Site Analysis
Because of the lack of functional studies on this promoter, the descriptive work of the observed sequence variation was restricted to the identification of consensus binding sites available in the literature by using the program Genepalette 1.2 [83]. The core promoter structure was analyzed by using consensus sequences from the literature [49,83,84]. Consensus binding sites for transcription factors known from metallothionein induction [16,31] cell cycle regulated and stress genes [50] and molting cycle regulated transcription processes [51] were included in the analysis. Firstly, luciferase signals in each well were normalized with an internal luciferase standard, in order to standardize the readings between different plates. Secondly, the normalizations for the number of cells, and the transfection efficiency, were done by dividing the latter values by A 420 values from the ONPG measurement. The basal expression values of the constructs and the pGL3Basic vector were measured in four replicate experiments, six wells per experiment.

Data analysis
DNA sequences were processed and aligned in Vector NTI version 10.0.1 (Invitrogen). Recombination events, which took place at this locus, make the construction of a bifurcating tree less relevant. Therefore, the Splitstree4 software [52] was used to construct a reticulate network to represent evolutionary relationships between the respective alleles. The assumptions under which the network was constructed are: uncorrected P for nucleotide substitution, NeighborNet to calculate the distances and the reticulate method to treat the splits. Gaps, constant and non-parsimonous positions were omitted from the analysis. Bootstrap analysis was performed on one thousand replicates and subsequently a 95% confidence network was constructed.
Recombination sites were detected by using two methods. The first method we used relies on the bootscan method implemented in the program Simplot [86]. It constructs replicate trees in a sliding window approach. The general accepted threshold level for the detection of a recombination is a clustering in 70% of the permutated trees in a certain window. Thousand replicate neighbor-joining trees were made by using the following parameters, window size 200 bp, step size 20 bp and the Kimura 2-parameter as a model to estimate nucleotide substitution. Alternatively, a non-phylogenetic method, applying dynamic programming that minimizes the mutation and recombination cost between sequences was used. This method is implemented in the software Recco [56]. The parameter α, representing the ratio of mutation cost to recombination cost was set to 0.2, the methods to calculate mutation and recombination costs were respectively Hamming and Delta Dirac.
The population genetic data were achieved by RFLP analysis of the amplified pmt fragments by PCR (Janssens unpublished). A reconstituted dataset was made by pasting the consensus for every allele the number of times it was observed in the sample, according to Timmmermans et al., in press These datasets were aligned with each other and an alignment with the out-group Orchesella villosa pmt sequence was provided. Molecular diversity indices and deviations from the neutral theory (Tajima's D, Fu and Li's D and D*) were calculated using DnaSP v4.10 [87].
The basal expression RLU data were log(x+1) transformed to approach normality of the data, A One Way ANOVA with p < 0.05 was executed. The LSD post-hoc test was performed to discriminate significance groups.
Initial curve-fitting was done in Kaleidagraph v 3.5 for a rough estimate of the parameters, and the curve plotting. Fine tuning of the curve fitting was done in SPSS v 12.0.1.
The dose-responses of the Cd and paraquat exposures on the RLU of every construct were compared by estimating the RLU max , EC 50 and the slope by fitting a curve with the following formula, from which RLU max by summing RLU 0 (the average RLU at unexposed conditions) and RLU e .
The 20-E exposure data were fitted with the following formula and RLU min , slope and EC 50 were estimated.
The RLU values from 20-E exposure concentrations at which a putative induction was observed were tested by a One Way ANOVA (p < 0.05) to test for induction.