Bmc Evolutionary Biology Evidence for Positive Selection Acting on Microcystin Synthetase Adenylation Domains in Three Cyanobacterial Genera

Background: Cyanobacteria produce a wealth of secondary metabolites, including the group of small cyclic heptapeptide hepatotoxins that constitutes the microcystin family. The enzyme complex that directs the biosynthesis of microcystin is encoded in a single large gene cluster (mcy). mcy genes have a widespread distribution among cyanobacteria and are likely to have an ancient origin. The notable diversity within some of the Mcy modules is generated through various recombination events including horizontal gene transfer.


Background
Cyanobacteria produce a wealth of bioactive peptide derivatives with a broad range of biological activities and pharmacological properties [1]. Many of these are synthesized on nonribosomal peptide synthetases (NRPS). These megaenzyme complexes typically have a modular architecture. A typical module contains specific functional domains for activation, thioesterification, and condensation of amino acids [2]. Additional domains for the modification of activated amino acids such as epimerization, heterocyclisation, oxidation, formylation, reduction or Nmethylation may also be present [2]. NRPS gene clusters in some cyanobacteria can occupy up to 5 percent of the genome [1].
The modular design of NRPS gene clusters promotes homologous recombination, including recombination within a gene cluster and intragenomic recombination between different gene clusters within the same genome or intergenomic recombination with DNA introduced from other cyanobacteria [3][4][5]. The cellular consequences of recombination will depend on several factors, including the phenotypic effects, if any, of the introduced DNA segment. In order to be successful, the new gene variant should at least not be detrimental to the host. For NRPS systems, important factors will be whether the novel peptide can fulfil the biological role(s) of the original peptide or provide new benefits to the host. Nonetheless, recombination within and among NRPS gene clusters potentially could constitute a mechanism for continuous alteration of the synthetases and peptide products.
The substrate specificity of the adenylation (A) domain is considered to be the primary determinant of substrate selection (for a review, see [18]). Recombination events involving A domains might lead to changes in substrate specificity and subsequently in the microcystin profile [3,15]. The A domains of modules McyB1 (the first module of the McyB protein) and McyC recognise and activate the amino acids that are incorporated in the variable positions X and Z of microcystin (Figure 1). These A domains have been extensively studied within Microcystis [13,15], and Planktothrix [11,14]. Within Microcystis, recombination has lead to the presence of two types of A domains in different strains [15]: a mainly Leu-activating A domain and a mainly Arg-activating A domain that has a high sequence similarity to the Arg-activating A domain in McyC (in the following, these two types of A domains in McyB1 are called B-type and C-like, respectively). Recombination involving A domain coding regions of mcyB1 and mcyC has been detected in Microcystis [13,15] and Anabaena [3,9].
Here, we compare McyB1 and McyC A domains in strains from the three main microcystin-producing genera: Anabaena, Microcystis and Planktothrix to investigate the role of genomic processes in the reshaping of microcystin genes, enzyme complexes and corresponding peptides. We have looked for signs of recombination, both between equivalent and non-equivalent modules, as well as mutations and selective forces acting on these A domains.
Organization of the mcyABC gene cluster (A) Figure 1 Organization of the mcyABC gene cluster (A). Adenylation domains investigated in the present study are indicated in red and green. The relative positions of primers (arrows) are shown. Genus-specific mcyB and mcyC primers are listed in Table 8. (B) The structure of microcystin-LR. Amino acid residues activated by the adenylation domains of McyB1 and McyC are indicated by red and green, respectively. Mdha is N-methyl-dehydroalanine, D-MeAsp is 3methyl-aspartic acid and Adda is 3-amino-9-methoxy-2,6,8,trimethyl-10-phenyl-4,6-decandienoic acid.

Results
In the present study, we have compared A domains of  microcystin synthetase modules McyB1 and McyC from  altogether 58 strains including 21 Anabaena, 19 Microcystis and 18 Planktothrix strains with characterized microcystinisoform profiles, including two non-producers (Table 1). The profiles made it possible to identify the amino acid residue(s) incorporated by each of these modules.

Microcystin isoforms produced by different genera
In total, we identified 22 structural variants (Table 1), mainly differing in the methylation status of D-Asp 3 or Dha 7 , but also the amino acid present at position X (Figure 1). Seven different amino acid residues were found at position X, mainly Leu, Arg and homotyrosine (Hty), but also Phe, homoisoleucine (Hil), homophenylalanine (Hph) and Tyr, while only Arg was found at position Z ( Table 1).
The Anabaena strains mainly produce MC-LR variants, but also several other isoforms, e.g. MC-RR and MC-HtyR (Table 1). Nine Microcystis strains produce MC-RR, either as the only isoform or together with MC-LR and/or MC-YR. One strain produces only MC-YR isoforms, while seven strains produce MC-LR isoforms (together with MC-YR for two strains) ( Table 1). Two of the Microcystis strains examined here have a partial deletion of the mcy gene cluster and do not produce microcystin [13]. Within Planktothrix, 15 of 18 strains produce MC-RR, either as the only isoform (5 strains), together with MC-LR (8 strains) or together with MC-LR and MC-HtyR (2 strains). The remaining three strains produce mainly MC-HtyR, one of them together with MC-LR (Table 1).  Figure S1). Therefore, in the comparisons below of McyB1 sequences from all three genera, only the C-like McyB1 sequences of Microcystis were included.

Variation in evolutionary rates between genera and between McyB1 and McyC
Phylogenetic analyses of the amino acid sequence alignment of the 108 McyB1 and McyC A domain sequences yielded a similar tree topology for all methods used (ML, Bayesian and NJ, Figure 2

Mutation rates and recombination within-and between McyB1 and McyC adenylation domains
The mutation rates ranged from 0.0076 to 0.0359, being lowest in mcyC of Anabaena and Planktothrix (Table 3). Moderate recombination levels (0.010 ≤ ρ ≤ 0.027 per base) ( Table 3) were detected in all data sets except for mcyC from Planktothrix. Low recombination levels were estimated for this data set, but all three permutation tests indicated that recombination rate was not statistically significant different from 0. Recombination rate/mutation    rate ratios below 1 in the Microcystis and Planktothrix data sets (Table 3) suggest that point mutations are the main cause of genetic variation in McyB1 and McyC A domains from these genera. In contrast, a recombination rate/ mutation rate ratio higher than 1 in the Anabaena data sets indicates that recombination has had a major impact on these A domains.
Recombination events were also suggested in all data sets by the mosaic structure of informative sites, with the exception of mcyC from Planktothrix ( Figure 3). The reticulate phylogenies revealed by the split decomposition analysis (Figures 4 and 5) were supported by Phi test (Table 2) in all data sets except mcyC from Planktothrix. Recombination detection programs (RDP, GENECONV and MaxChi) identified several recombination breakpoints along the entire mcyB1 sequences in Anabaena and Planktothrix strains, while only one single putative recombination event was detected within the Microcystis mcyB1 and mcyC data sets (Table 4). No recombination events were suggested by recombination detection programs within the mcyC alignments of Anabaena and Planktothrix. The analyses of the combined mcyB1C data sets ( Figure 6, Table 5) suggested recombination events between mcyB1 and mcyC in Anabaena and Microcystis, but not in Planktothrix.

Substrate specificity of MycB1 and McyC adenylation domains
McyB1 and McyC A domain sequences were aligned with the Phe-activating A domain of GrsA [20] to identify the binding-pocket residues. The binding pocket signatures of McyC A domains (activating mainly Arg) were more or less identical within the genus, while only five residues are identical in binding pocket signatures from all three genera (Table 6). Binding pocket signatures are more diverse in McyB1 A domains, reflecting the diversity of amino acid residues incorporated in position X (Table 6). Some McyB1 modules with identical binding pocket signatures incorporate a somewhat different set of amino acid residues (e.g. Microcystis strains N-C 357 and N-C 496, Table  6), indicating that other residues in the A domain or other  domains, such as the condensation domain influence substrate specificity. A role of the condensation domain in substrate selection has been suggested by several studies (for a review, see [18]).

Adenylation domains and selective forces
An excess of synonymous over non-synonymous substitutions (ω < 1) (  (Table 7). The number of potential sites under positive selection with statistical support (P > 90%) ranged from 3 to 8 (Table 7)  Branch-site models were used to detect possible positive selection acting on the McyB1 sequences from Anabaena and Planktothrix strains that incorporate Hty in position X. (Table 1, Figure 2). There were no statistically significant differences between the log-likelihood values of the alternative models and the null models (data not shown), indicating no evidence for positive selection in domains incorporating Hty.

Discussion
This study is so far the most extensive comparative analysis of microcystin synthetase adenylation domains for the modules McyB1 and McyC. The phylogenetic trees of the 108 adenylation domain sequences showed clustering according to module and genus ( Figure 2). Our data set revealed no signs of recombination between genera, in agreement with previous studies on mcy genes [3,21] and similar studies from other NRPS gene clusters [19,22]. This also is in line with other studies that show that the rate of successful homologous recombination rapidly is reduced with increased genetic distance [23][24][25].  Tables 4 and 5). The low sequence variation (0-1.2%) within Planktothrix mcyC sequences makes it difficult to detect recombination, since for the majority of methods, a minimum sequence variation of 5% is necessary to obtain substantial power [26]. The large sequence divergence between Planktothrix mcyB1 and mcyC sequences might prevent homology-driven recombination, which requires a relatively high level of sequence similarity between the donor and recipient DNA. In Planktothrix, the longest identical DNA segment shared by mcyB1 and mcyC (18 bp) may be too short for initiation of RecA-mediated recombination [27][28][29]. Within Anabaena and Microcystis, the high sequence similarity between these gene segments appears to be maintained by frequent recombination  (Figure 3). In both cases this has resulted in a change of functionality (i.e. amino acid activated) and subsequent production of microcystin-RR.

The evolutionary history of the McyB1 A domain
Our results, together with previous reports [3,11,15], indicate that various types of recombination lead to a continual restyling (remodelling) of the adenylation domains of microcystin synthetase. Recombination within a single domain appears to be frequent and may have little impact on the type of amino acid activated. Recombination between mcyB1 and mcyC appears to be frequent in some genera and may result in changes in the microcystin pro-  file of the recombinant strains. Successful recombination between A domain regions from different NRPS gene clusters [14,15] were found to be infrequent in the strains investigated here.

Positive selection in the adenylation domains of McyB1 and McyC
Overall, the adenylation domains of McyB1 and McyC seem to be under purifying selection, as shown previously  for other segments of the mcy gene cluster [11,12,21], indicating that mutations that affect the amino acid sequence of these domains generally are deleterious. However, the ω-values (0.2-0.49) observed in this study are relatively high compared to ω-values reported for several cyanobacterial house keeping genes (mainly below 0.1) [30], implying a relaxation of selective constraints.

Amino acid residues in the A domains of McyB1 and
McyC that seem to be under positive selective pressure are located throughout the entire analyzed sequence ( Figure  7). Among the positively selected amino acids, residues included in binding pocket signatures are particularly interesting, since they may influence the active site selectivity [31,32]. The amino acid change Phe→Cys in binding pocket position 278 (Table 6) in the McyB1 sequences of Anabaena is an example of this. According to the peptide profiles of the Anabaena strains (Table 1), this change is associated with the incorporation of Hty/Leu, rather than only Leu (or Leu/Arg). Also, the Leu→Phe exchange at the binding pocket position 278 in the McyB1 sequences of Planktothrix (Table 6) is associated with a change in incorporation from Hty to Arg. One could hypothesize that positive selection of these and other residues in the synthetases reflect selection of a particular peptide profile produced by the corresponding strains. Such a causative relationship between these specific genetic changes and phenotypic effects remains to be demonstrated.
Interestingly, a binding pocket residue under positive selection is also present in the McyC sequences of Anabaena (Figure 7). Since all McyC modules studied here mainly incorporate Arg, the selection seemingly does not concern gross substrate specificity. Other properties, like NRPS catalytic efficiency or the ability to produce minor variants, might be the properties selected for. Also in McyB1 sequences from all genera there are several positively selected amino acid residues not associated with substrate selectivity, indicating that some other property is selected for in these A domains. This could for instance again be changes in the catalytic efficiency or in the interactions between neighboring domains and modules.

Sequence comparisons show that the A domain of McyC is more conserved than the McyB1 A domain -also
Informative sites in Anabaena, Microcystis and Planktothrix mcyB1C data sets

Conclusion
Our results revealed no clear indications of recombination across the genera, while frequent recombination events both within and between mcyB and mcyC sequences were detected between strains from same genus, except for mcyC from Planktothrix. We demonstrate remodelling of mcyB and mcyC genes including evidence for positive selection acting at some sites, indicating that the microcystin variant profile of a given strain is likely to influence the ability of the strain to interact with its environment.

Bacterial strains
Cyanobacterial strains were grown at the University of Helsinki and Norwegian Institute of Water Research (NIVA) under continuous white light at a photon irradiance of 7 μmol m -2 s -1 in Z8 medium [33].

Mass spectrometry
Microcystins were extracted from lyophilized biomass collected on glass fiber filters with 50% MeOH as extraction agent. A detailed description of the method can be found in Rohrlack et al. [34].
For the identification of microcystins liquid chromatography with mass spectrometric detection (LC-MS/MS) was used. The instrumental setup included a Waters Acquity UPLC System equipped with a Waters Atlantis C18 column (2.1 × 150 mm, 5 μm particle size) and directly coupled to a Waters Quattro Premier XE tandem quadrupole MS/MS detector. The UPLC system was set to deliver a linear gradient from 20% to 60% acetonitrile in water, both containing 0.1% acetic acid, within 8 minutes at a flow rate of 0.25 mL min -1 . The column and auto sampler temperatures were 20 and 4°C, respectively. At all times, the MS/MS detector was run in positive electrospray mode (ESI+). Other general settings included a source temperature of 120°C, a desolvation temperature of 350°C, a drying gas flow rate of 800 L hour -1 , a gas flow at the cone of 50 L hour -1 , and standard voltages and energies suggested by the manufacturer for the ESI+ mode.
To screen extracts for microcystins, the detector was run in total scanning mode for the mass range from 500 to 1100 Da over the entire UPLC gradient. At this stage, the cone voltage was 60 V and the time for one scan 2 seconds. Afterwards, all mass signals, that represented compounds with a molecular mass within the range of 500-1100 Da, were analyzed in fragmentation experiments. To this end, the detector was run in daughter ion scanning mode and the cone voltage and collision cell settings were optimized to obtain as many fragments of the respective compound as possible. In all cases, argon served as collision gas. Microcystins were identified by their typical fragmentation patterns including a number of immonium ions of amino acids, the characteristic Adda side chain fragment (135 Da), and a number of ring fragments. Identification was further supported by comparing fragmentation patterns with those of Microcystin LR, RR and YR standards that have been purchased from Sigma-Aldrich and by using the fragmentation simulation software HighChem-Mass Frontier (version 3). The precise positions of demethylations in microcystin molecules were not determined.

DNA extraction, PCR amplification and sequencing
For microcystin-producing Anabaena strains supplied by the University of Helsinki strain collection, DNA was extracted from dried cell matter with Qiagen DNeasy Plant Mini Kit (QIAGEN GmbH, Hilden, Germany). Strains from NIVA were lysed according to Chromczynski and Rymaszewski [35] and PCR performed directly on the lysate.
PCR was performed with DynaZyme II DNA polymerase (Finnzymes, Espoo, Finland) and BD Advantage™ 2 polymerase (BD Biosciences, Palo Alto, CA, USA). Primers used for amplification of adenylation domains from the mcyABC operon are listed in Table 8 and their relative positions in the mcyABC operon are shown in Figure 1. Genus-specific primers for Microcystis and Anabaena were designed based on the publicly available mcy gene sequences of Microcystis aeruginosa PCC 7806 (AF183408) and UV027 (AF458094) and Anabaena strain 90 (AJ536156). Primers used for amplifying the mcyB segment from Microcystis strains were placed as far as possible apart from the region involved in the recombination event between the A domain-encoding segments of mcyB and mcyC [13,15]. The mcyB regions flanking the recombination site are highly similar in all Microcystis strains (Additional file 1, Figure S1). The PCR products were purified using E.Z.N.A Gel Extraction Kit (Omega Biotek) and Montage™ PCR Centrifugal Filter Devices (Millipore, Billerica, MA, USA). The purified PCR products were sequenced with both external and internal primers (Table  8). Sequencing was conducted under BigDye™ terminator cycling conditions, and sequencing reactions were purified using ethanol precipitation and separated on an Applied Biosystems 3730xl DNA Analyzer. Chromatograms were examined with the program CHROMAS 2.2 (Technelysium Pty Ltd.), while editing and contig assembly were performed with BIOEDIT sequence alignment editor. All sequences have been submitted to GenBank under accession numbers EU009866-EU009922 (Table 1). Several sequences (for Microcystis strains PCC 7806, K-139, UV027, NIES102 and Anabaena strain 90) were retrived from GenBank together with A domain sequences from Planktothrix spp. generated by Kurmayer and co-workers [11,14] (Table 1).

Sequence alignments and phylogenetic analyses
Amino acid sequences of A domains from all genera were aligned using ClustalW [36]. The best evolution model based on the sequence alignment was determined using ProtTest [37]. The sequences were used to infer the phylogeny in a Bayesian framework applying the program MrBayes v3.1 [38]. Analysis with the following parameters was performed: JTT model, gamma distribution, running 2 million generations and sampling trees every 100 generation, burn-in 3000 trees. The maximum likelihood (ML) tree was estimated using PhyML [39] under the JTT model, gamma distribution and with parameter values indicated by ProtTest. The Neighbor joining (NJ) tree was obtained under the JTT model and gamma distribution using MEGA version 3 [40]. Bootstrap confidence limits were obtained by 1000 replicates in both ML and NJ analysis. DnaSP version 3.51 [41] was used to estimate mutation rates, based on the number of segregating sites, using the Watterson's estimator of Θ [42] and the average nucleotide diversity (π) [43].

Recombination analyses and nucleotide substitution statistics
Recombination was investigated by split decomposition analysis using SplitsTree version 4.8 [44] with default settings (uncorrected P method) and 1000 bootstrap replicates together with Phi test for recombination [45]. In addition, the following statistical tests for detecting recombination were used: GENECONV [46], RDP, and MaxChi [47] analyses in the RDP version 2 b08 program package [48]. For detecting recent and older recombination events using GENECONV G-scale values 0 and 1 were used, respectively. Recombination was also detected by visual analysis of informative sites (variable sites where each variant occurs in at least two sequences) as described by Rudi et al. [24].
The recombination rate, ρ = 2 Nr (N is the effective population size and r is the recombination rate per nucleotide site per generation) was estimated for each data set using the composite likelihood method proposed by Hudson [49] and extended to allow for finite-site mutation models [50]. The method is based on combining the coalescent likelihoods of all pairwise comparisons of segregating sites. The hypothesis of no recombination was tested using the likelihood permutation test (LPT) as in McVean   [2] are shown and binding pocket residues [32] are indicated by red diamonds. Amino acid residues undergoing positive selection are shown in dark blue boxes. Numbering of amino acid residues according to GrsA (swissprot: P0C061). et al. [50] and the permutation tests which detect a decrease in r 2 and |D'|, measures of linkage disequilibrium, with an increase in the physical distance. Both the composite likelihood analysis and the three permutation tests were carried out using the LDhat package [50].
We used CODEML from the PAML v3.15 package [51] to test for the presence of codon sites affected by positive selection and to identify those sites under selection. A likelihood ratio test (LRT) for positive selection [52,53] compares two codon substitution models, one of which accounts for positive selection and the other which does not. The gene is inferred to be under positive selection if (1) ML estimates suggests that there are codon(s) under positive selection (with ω = d n /d s > 1) and (2) the LRT is significant. Simulations by Anisimova et al. [54] showed that high levels of recombination seem to affect dramatically the accuracy of the LRT test and that recombination often mistakenly is seen as evidence of positive selection. LRTs of M0-M3 and M1-M2 are heavily affected, while LRT of M7-M8 is much less (positive selection was falsely detected in only 20% of replicates). Therefore, models M7 (beta) and M8 (beta and ω) were considered in present study. Under the model M7 (beta), the ω ratio various according to the beta distribution and does not allow the positive selected sites (< ω < 1), and thus serves as the null model by comparing with model M8 (beta and ω). Model M8 adds an additional site class to the beta model to account for sites under positive selection (ω > 1). A Bayesian approach implemented in CODEML and shown to be robust to recombination effects [54] was used to identify residues under positive selection. The average ω for A domain sequences was calculated using the parameters of the best fitting model.
Branch-site models [55] were employed to test for positive selection acting on specific branches in the phylogenetic tree. Branches of the tree were divided a priori into foreground and background lineages, and a LRT was constructed by comparing a model that allows positive selection on the foreground lineages (alternative model) with a model that does not allow such positive selection (the null model).

Authors' contributions
ATK designed the study, contributed to molecular studies, performed the phylogenetic, recombination, mutation and selection analysis and drafted the manuscript. DPF: Contributed to molecular studies and helped draft the manuscript. TR: performed the LC-MS/MS analysis and helped draft the manuscript. JJ: performed the LC-MS/MS analysis of certain Anabaena strains. LR: revised the manuscript. KS: participated in coordination of the study at HU and revised the manuscript. TK and KSJ participated in the design of the study, interpretation of the results and revision of the manuscript. All authors read and approved the final manuscript.