Phylogenomics of Cas4 family nucleases
© The Author(s). 2017
Received: 7 August 2017
Accepted: 16 November 2017
Published: 28 November 2017
The Cas4 family endonuclease is a component of the adaptation module in many variants of CRISPR-Cas adaptive immunity systems. Unlike most of the other Cas proteins, Cas4 is often encoded outside CRISPR-cas loci (solo-Cas4) and is also found in mobile genetic elements (MGE-Cas4).
As part of our ongoing investigation of CRISPR-Cas evolution, we explored the phylogenomics of the Cas4 family. About 90% of the archaeal genomes encode Cas4 compared to only about 20% of the bacterial genomes. Many archaea encode both the CRISPR-associated form (CAS-Cas4) and solo-Cas4, whereas in bacteria, this combination is extremely rare. The solo-cas4 genes are over-represented in environmental bacteria and archaea with small genomes that typically lack CRISPR-Cas, suggesting that Cas4 could perform uncharacterized defense or repair functions in these microbes. Phylogenomic analysis indicates that both the CRISPR-associated cas4 genes are often transferred horizontally but almost exclusively, as part of the adaptation module. The evolutionary integrity of the adaptation module sharply contrasts the rampant shuffling of CRISPR-cas modules whereby a given variant of the adaptation module can combine with virtually any effector module. The solo-cas4 genes evolve primarily via vertical inheritance and are subject only to occasional horizontal transfer. The selection pressure on cas4 genes does not substantially differ between CAS-Cas4 and solo-cas4, and is close to the genomic median. Thus, cas4 genes, similarly to cas1 and cas2, evolve similarly to ‘regular’ microbial genes involved in various cellular functions, showing no evidence of direct involvement in virus-host arms races. A notable feature of the Cas4 family evolution is the frequent recruitment of cas4 genes by various mobile genetic elements (MGE), particularly, archaeal viruses. The functions of Cas4 in these elements are unknown and potentially might involve anti-defense roles.
Unlike most of the other Cas proteins, Cas4 family members are as often encoded by stand-alone genes as they are incorporated in CRISPR-Cas systems. In addition, cas4 genes were repeatedly recruited by MGE, perhaps, for anti-defense functions. Experimental characterization of the solo and MGE-encoded Cas4 nucleases is expected to reveal currently uncharacterized defense and anti-defense systems and their interactions with CRISPR-Cas systems.
Cas4 is one of the core CRISPR-associated (Cas) proteins, which is implicated in the adaptation phase in several subtypes of CRISPR-Cas systems [1–4]. During this phase of the CRISPR response, a segment of invading DNA (protospacer), usually from a virus or plasmid, is selected, typically, based on the presence of a protospacer adjacent motif (PAM) [5, 6]. The protospacer then is incorporated by the Cas1-Cas2 adaptation complex into a CRISPR array, most often, next to the first, 5′-terminal repeat. The array is transcribed into a single, long pre-crRNA, which is subsequently processed into mature CRISPR (cr)RNAs and is incorporated into the effector complex or a single effector protein, which scans invading DNA and, once a match is found, recruits nuclease domain-containing Cas proteins to cleave the target DNA [7–10]. So far the cas4 gene has been identified in the subtypes I-A, B, C, D, U of Class 1 CRISPR-Cas systems and subtypes II-B and V-A, B, E of Class 2. In each of the respective loci, the cas4 gene is adjacent to cas1 and cas2, and in some cases, fused with cas1, which is compatible with the involvement of Cas4 in adaptation [11, 12]. Furthermore, it has been shown that in subtype I-A loci, cas1, cas2 and cas4 genes form a single operon , and for subtype I-B, the requirement of Cas4 for adaptation has been demonstrated in a direct experiment . A recent study on the cas gene requirement for adaptation by the type I-A system of the crenarchaeon Sulfolobus islandicus has shown that two paralogous cas4 genes (one denoted csa1) in the cas operon are both essential, suggesting that these proteins could form a complex in which the two paralogs would perform unique functions .
Cas4 is a DNA exonuclease that contains a nuclease domain and a Fe-S cluster-binding module [2, 14–16] and is homologous to the nuclease moieties of well-characterized proteins involved in recombination and repair in bacteria (RecB, AddB) and eukaryotes (Dna2) [17–21]. The Cas4 nuclease belongs to the expansive PD-DEXK phosphodiesterase superfamily, named after the conserved catalytic motif present in most members . Similar to the AddB nuclease, Cas4 is thought to form recombinogenic 3′ ssDNA overhangs in the protospacers, thus facilitating their subsequent incorporation into the CRISPR array as dsDNA spacers . In the structures that have been solved for Cas4 proteins from Pyrobaculum calidifontis (Pcal_0546; pdb: 4R5Q)  and Sulfolobus solfataricus (SSO0001, pdb: 4IC1) , Pcal_0546 is a monomer, whereas SSO0001 forms a toroidal decamer composed of 5 tightly packed dimers. The Pcal_0546 protein has been shown to exhibit a metal-dependent 5′ to 3′ exonuclease activity against ssDNA substrates , whereas the Cas4 protein SSO1391 from S. solfataricus has been reported to cleave ssDNA in both the 5′ to 3′ and the 3′ to 5′ directions . However, in an independent study, only the 5′-3′ exonuclease activity has been detected for SSO1391 as well .
Although the cas4 gene is usually located next to cas1 or cas2 genes and, in some CRISPR-Cas systems, is fused to cas1 (eg. subtypes I-U and V-B), many bacterial and archaeal genomes encompass an additional cas4 gene that is not associated with CRISPR-cas loci (hereinafter, solo-Cas4). Notably, solo-cas4 genes are present in many archaeal and bacterial genomes that lack CRISPR-cas loci .
The recent discovery of Cas4 homologs encoded in bacteriophage genomes has added a new twist to the functional repertoire of the Cas4 family. It has been shown that a Campylobacter phage that encodes a Cas4-like protein, employs this protein to stimulate acquisition of host-derived spacers by the Campylobacter type II-C CRISPR-Cas system (which lacks cas4). The phage appears to use this mechanism to evade host immunity via a mechanism that remains to be characterized [22, 23].
Cas4 homologs have been identified also in archaeal viruses  and casposons, self-synthesizing transposons that employ Cas1 homologs as recombinases and are thought to be the ancestors of the CRISPR-Cas adaptation modules [25, 26]. One archaeal virus, Thermoproteus tenax virus 1 (TTV1), encodes an inactivated derivative of a Cas4-like nuclease that became a component of the viral nucleocapsid .
Despite the current keen interest in the mechanisms, diversity and evolution of the CRISPR-Cas systems, no comprehensive analysis of the Cas4 family has been reported so far. The aim of this work is to present such an analysis with the focus on cas4 gene neighborhoods and comparison of the (predicted) functions and evolutionary regimes of CRISPR-associated Cas4 and solo-Cas4 subfamilies.
Results and discussion
Phylogenomic analysis of the Cas4 protein family
As reported previously, the majority of the Cas4 proteins belong to two families, namely, COG1468 and COG4343; the latter is specific for subtype I-A CRISPR-Cas systems and is also known as Csa1 (these two COGs correspond to pfam01930/DUF83 and pfam06023/ DUF911, respectively) . In addition, Cas4-like nucleases of pfam12705/PDDEXK_1 are occasionally found in association with cas loci and have been shown to affect spacer adaptation by certain CRISPR-Cas systems [22, 23]. In the present analysis, we focused primarily on these three major families of Cas4 homologs and disregarded a few other, less common PD-DExK families that also have been reported in the vicinity of several CRISPR-cas loci .
To collect protein sequences of the Cas4 family, we screened 48,599 prokaryotic genomes (Genbank, March 2016) using PSI-BLAST search  with the sequence profiles related to the three families of Cas4 homologs. The results of this search were filtered to exclude sequences that produced better scores to other profiles in the CDD database (See Material and Methods for details). Additionally, we searched 2996 phage genomes from the PHANTOM database . In total, 7060 protein sequences were retrieved, of which 883 were from complete bacterial and archaeal genomes, and 272 were from bacterial or archaeal viruses.
Inspection of the multiple alignment of Cas4 proteins (see Materials and Methods) shows conservation of the catalytic motifs of the PD-DEXK phosphodiesterase superfamily, suggesting that all Cas4 proteins are active nucleases. All members of the Cas4 family contain a Fe-S-cluster-binding module with 4 conserved cysteines. It has been shown that different Cas4 protein bind either [2Fe-2S] or [4Fe-4S] clusters [2, 15]; however, no consistent difference in the patterns of sequence conservation was detected between proteins binding these distinct ligands.
Subtypes I-A, I-B, I-D and to a lesser extent I-U, are scattered among many strongly supported Cas4 branches suggesting extensive shuffling of the adaptation and effector modules (Fig. 3b-f, Additional file 1: Figure S1, Additional file 3: Information File S1). This evidence of module shuffling is consistent with previous observations from phylogenetic analysis of Cas1, although this effect has been underestimated and was not carefully explored. With many more genomes now available for analysis, it becomes apparent that adaptation modules of subtypes I-A, I-B and I-D can function with (almost) any of the respective effector modules (Fig. 3, Additional file 1: Figure S1, Additional file 3: Information File S1). Furthermore, these adaptation modules apparently can combine also with type III and type IV effector modules (Fig. 3, Additional file 1: Figure S1, Additional file 3: Information File S1) because some branches within well-supported Cas4 clades correspond to these CRISPR-Cas types (Fig. 3, Additional file 1: Figure S1, Additional file 3: Information File S1).
Solo-Cas4 can be clearly differentiated from CAS-Cas4 in the phylogenetic tree where they form several distinct branches that are mostly consistent with the order-level taxonomy of archaea, suggesting predominant vertical inheritance (Additional file 1: Figure S1, Additional file 3: Information File S1). Despite the general trend to retain solo-Cas4 in the genome, in some archaeal lineages, this gene apparently evolved fast, so that, for example, two groups of Thermoproteales, Pyrobaculum/Thermoproteus and Vulcanisaeta, encode dissimilar solo-Cas4 sequences that do not cluster with each other or with homologs from other archaea. Several members of the archaeal DPANN superphylum that mostly lack CRISPR-Cas systems, encode two distinct solo-Cas4 proteins (Additional file 1: Figure S1, Additional file 3: Information File S1). In bacteria, solo-Cas4 shows a patchy distribution, with some notable exceptions. First, solo-Cas4 is present and monophyletic in the majority of Thermoanaerobacteriales and many Chloroflexi. Second, solo-Cas4 is encoded in many genomes of the Candidate Phyla Radiation (CPR), a recently described assemblage of bacteria with small genomes and many unusual features, such as Parcubacteria, Nomurabacteria, Roizmanbacteria and others [34, 35]. Some of the solo-Cas4 proteins from the CPR confidently group with homologs from the archaeal DPANN superphylum (Additional file 1: Figure S1, Additional file 3: Information File S1). Thus, solo-Cas4 is clearly over-represented in environmental bacteria and archaea with small genomes and could perform important biological functions in these organisms that remain to be characterized. Furthermore, dissemination of solo-Cas4 genes via HGT between bacteria and archaea appears likely. In several of the DPANN archaeal genomes, the gene context, namely the adjacency to genes coding for restriction-modification systems, implies that solo-Cas4 functions in so far uncharacterized defense pathways (Additional file 7: Table S1).
Cas4 in mobile genetic elements (MGE-Cas4)
Phylogenetic analysis of the Cas4 family identified several distinct branches that are associated with MGE and viruses, especially, archaeal ones; to our knowledge, the spread of Cas4 homologs in MGE has not been systematically described previously. In particular, a cas4 gene is present in most of the known viruses of Desulfurococcales and Sulfolobales (Additional file 1: Figure S1, Additional file 3: Information File S1). For example, Cas4 proteins encoded in Acidianus filamentous virus genomes form a lineage that is the sister group of solo-Cas4 from Sulfolobales within the archaeal solo-Cas4 branch, suggesting acquisition of these genes from the host by the ancestral virus (Additional file 1: Figure S1, Additional file 3: Information File S1). Sulfolobus spindle shaped virus encodes Cas4 that is most similar to a group of Cas4 proteins associated with type I CRISPR-Cas systems in Thermococcales (Additional file 1: Figure S1, Additional file 3: Information File S1). In this case, the ancestral virus seems to have captured a CAS-Cas4 that became solo in the virus genome. Several Cas4 proteins from Sulfolobus islandicus rudiviruses and Acidianus filamentous virus also form a branch, albeit weakly supported. Cas4 proteins from numerous phages form a large, well-supported clade which includes many phages of Cyanobacteria and Proteobacteria, suggesting dissemination of the cas4 gene among phages. One branch in the Cas4 tree cannot be linked to any known phages but likely corresponds to uncharacterized prophages integrated in genomes of Campylobacteriales. Another large clade of Cas4 is apparently associated with conjugative plasmids from Burkholderiales. Other Cas4 proteins encoded by bacteriophages and archaeal viruses are scattered over the tree and typically do not show clear affinity to solo-Cas4, CAS-Cas4 or each other (Additional file 1: Figure S1, Additional file 3: Information File S1).
Horizontal gene transfer and selection pressures on different groups of cas4 genes.
To estimate the relative contributions of the vertical and horizontal components in the evolutionary history of the cas4 gene family, we compared the tree-induced distances between Cas4 protein sequences and the distances between the 16S rRNAs from the corresponding genomes (proxy for the species tree) as described earlier  (see Methods for details); a parallel analysis of Cas1 proteins was used as a reference (see Methods for the link to the data available online).
Both Cas4 and Cas1 only loosely follow the species tree in their history: the Spearman rank correlation coefficients with the 16S rRNA distances for CRISPR-Cas associated Cas4 and Cas1 are 0.15 and 0.22, respectively (see Methods for details). Notably, both solo cas4 genes and solo cas1 genes show a stronger vertical evolution trend (correlation coefficients of 0.47 and 0.53, respectively). The substantially greater horizontal component in the evolutionary history of the CRISPR-associated genes reflects the more dynamic nature of the evolution of defense systems compared to other categories of microbial genes . The cas4 and cas1 genes encoded in the same locus nearly always evolve (and get transferred) as a single unit (see Methods for the link to the data available online): the correlation between the cas4 and cas1 distances for co-located genes is 0.88, and both show closely similar, low correlation coefficients with the rRNA tree (0.14 and 0.13, respectively).
All cas4 genes seem to evolve under comparable levels of selective pressure, with the medians of the dN/dS ratios for the CAS-cas4 and solo-cas4 gene pairs being in the range of 0.06 to 0.18 (Additional file 5: Information file S3). These values are close to the genome-wide median dN/dS ratios that are in the range of 0.10 to 0.14 . Thus, in agreement with previous observations , all cas4 genes experience moderate levels of purifying selection that are close to the median values for the respective bacteria and archaea. Furthermore, it has been shown that the characteristic selective pressure on cas4 gene is only slightly lower than that on cas1 and cas2, the most strongly conserved cas genes, which is compatible with the joint involvement of the respective Cas proteins in adaptation .
Cas4-like proteins of pfam12705
As mentioned above, Cas4-like proteins that showed the highest similarity to pfam12705 are not associated with CRISPR-Cas systems. However, in addition to the Campylobacter phages that encode a Cas4-like nuclease that has been shown to interfere with the host type II-C CRISPR-Cas system [22, 23], members of this family were identified in several other notable contexts. These include a large clade that consists of phages infecting different Mycobacteria as well as some Propionibacterium and Bacillus phages as well as several smaller branches including phages that infect Bacillus, Thermus and other bacteria (Additional file 1: Figure S1, Additional file 3: Information File S1).
We constructed a tree of UvrD homologs from all genomes that encode the UvrD-Cas4-like fusion and found that the helicase domains of the fusion proteins are also monophyletic, indicating a single origin and, most likely, a unique function of the fusion proteins (Additional file 6: Information File S4). The UvrD-Cas4 proteins are encoded in two distinct, conserved contexts. The most common genomic neighborhood includes genes for methylation (Mod) and restriction (Res) subunits of Type III Restriction-Modification systems [42–44]. Several of these loci also encompass genes for predicted regulatory proteins containing WYL and helix-turn-helix (HTH) domains as well as Shlaffen-like ATPases fused to an HTH domain that are often encoded in other defense contexts.
Another context of the UvrD-Cas4 fusion includes DNA-cytosine methylase Dcm that is responsible for all 5-methylcytosine modifications in E. coli . The physiological role of this methylation remains unknown although it has been shown that Dcm recognition sites correspond to sequence motifs for very short patch repair of T/G mismatches, and the Vsr nucleases involved in this process are often encoded next to the dcm gene . Indeed, several uvrD-cas4 loci also contain a gene for a Vsr nuclease, suggesting that this system retains the same features of recognition sites and functional link between Dcm and Vsr (Fig. 6). The same pair of genes was observed previously in a variety of defense gene contexts, MORC family ATPases in particular . Collectively, these genomic associations strongly suggest that the UvrD-Cas4 helicase-nuclease fusion proteins are components of multiple microbial defense systems.
Considering that, in both cases described above, the UvrD-Cas4 protein appears to belong to the same subfamily and the uvrD-cas4 gene is either stand-alone or is found in non-conserved contexts in archaeal and bacterial genomes, it appears that the fusion protein plays an ancillary role in different pathways by facilitating site-specific DNA unwinding and cleavage. The Cas4-like nuclease domain can be predicted to play a role analogous to the nuclease domains of RecB and AddB in promoting DNA recombination pathways . Most of the bacterial genomes that encode the UvrD-Cas4 fusion also encode RecB (a near ubiquitous protein in bacteria) suggesting that the functions of these helicase-nuclease fusion proteins could be partially redundant rather than complementary, a pattern typical of repair pathways [47, 48].
The link between UvrD and Cas4-like protein evolved at least on one other independent occasion. In several bacteria, mostly Clostridiales, the two genes are not fused but are encoded in a predicted operon in a context that does not appear indicative of defense functions, suggesting that the two enzymes could be jointly involved in other cellular processes, such as DNA repair (Fig. 6).
Finally, in several bacteria and mesophilic archaea, we identified another link between a Cas4-like nuclease and a superfamily 2 helicase that belongs to the uncharacterized YprA-like family (COG1205) and is unrelated to UvrD. This helicase contains a characteristic C-terminal metal-binding domain (pfam09369/DUF1998 family) and large inserts with additional putative metal-binding domains (Fig. 6). Helicases of this family are strongly associated with defense islands , suggesting that, together with the Cas4-like protein, they participate in so far uncharacterized defense pathways.
Unlike most cas genes, Cas4 family endonucleases are about as often encoded outside as they are within CRISPR-cas loci and are also found in many MGE, particularly archaeal viruses. Because of this unusual distribution of cas4 genes in bacterial and archaeal genomes, we sought to explore the phylogenomics of the Cas4 family. The results of this phylogenomic indicate that evolution of those cas4 genes that belong to CRISPR-Cas adaptation modules (Cas-cas4) involves extensive horizontal transfer that, however, almost always involves complete adaptation modules. The evolutionary integrity of the adaption modules sharply contrasts the rampant modular shuffling of CRISPR-Cas systems whereby a given variant of the adaptation module can combine with virtually any variety of effector modules. The evolutionary history of solo-cas4 genes is much better compatible with the microbial species tree and appears to include fewer horizontal gene transfer events. The Cas-cas4 and solo-cas4 genes evolve under moderately strong purifying selection that, for both classes of cas4 genes, is close to the genomic median. Thus, the evolutionary regime of the cas4 genes, along with cas1 and cas2 genes, resembles that of ‘regular’ microbial genes involved in various cellular functions, with no evidence of direct involvement in virus-host arms races. However, cas4 genes were recruited by various MGE on many independent occasions, suggesting that Cas4 nucleases might be involved in anti-defense functions. The solo-Cas4 as well as MGE-encoded Cas4 so far have not been studied experimentally. Functional characterization of these proteins is likely to result in identification of novel defense and anti-defense systems, and their connections with CRISPR-Cas systems.
The prokaryotic genomic dataset was taken from WGS and GenBank repositories at the NCBI as of March 2016 and includes 43,000 genomes. Cas4-like proteins were identified using PSI-BLAST search against sequence profiles corresponding to the following subfamilies: COG1468, COG4343, pfam01930, pfam06023, cd09637, cd09659, cls000170  and pfam12705. To ensure specificity of the extracted set of proteins after the initial identification, all retrieved protein sequences were additionally run against the Conserved Domain Database (CDD) using the RPS-BLAST program with the same parameters and all proteins with higher similarity to any other profiles were discarded. The protein sequences from the cas4 neighborhoods were extracted from the respective genomes or contigs and annotated using the COGs, Pfam and CD profiles from the CDD and the custom defense gene profiles . The cas genes and CRISPR-Cas subtypes were annotated according to the previously described nomenclature .
Clustering and Phylogenetic analysis
The NCBI BLASTCLUST program (ftp://ftp.ncbi.nih.gov/blast/documents/blastclust.html) was used to cluster sequences by similarity. For Cas4 clustering, BLASTCLUST was used with the sequence identity threshold of 50% and length coverage threshold of 0.8; for 16S rRNA nucleotide sequences, the program was used with sequence identity threshold of 90% and length coverage threshold of 0.9. For Cas8 and Cas10d clustering, the program was used with sequence identity threshold of 0.5 and length coverage threshold of 80%. Alignments of protein sequences were constructed using MUSCLE  and MAFFT  programs. Short fragments or disrupted sequences were discarded. Sites with the gap character fraction values >0.5 and homogeneity <0.1 were removed from the alignment. Phylogenetic analysis was performed using the FastTree program , with the WAG evolutionary model for amino acid sequences and the GTR evolutionary model for nucleotide sequences. The same program was used to compute bootstrap values.
Relationships within diverse sequence families were established using the following procedure: initial sequence clusters were obtained using UCLUST  with the sequence similarity threshold of 0.5; sequences were aligned within clusters using MUSCLE . Then cluster-to-cluster similarity scores were obtained using HHsearch  (including trivial clusters consisting of a single sequence each); a UPGMA dendrogram was constructed from the pairwise similarity scores. Highly similar clusters (pairwise score to self-score ratio > 0.1) were aligned to each other using HHALIGN ; the procedure was repeated iteratively. At the last step, sequence-based trees were reconstructed from the cluster alignments using FastTree , as described above and rooted by mid-point; these trees were grafted onto the tips of the profile similarity-based UPGMA dendrogram.
For the comparison of evolutionary distances between Cas4 and Cas1 protein sequences and 16S rRNA sequences from the corresponding genomes (proxy of the species tree), all pairwise distances between CRISPR-associated or solo proteins were calculated from the corresponding phylogenetic trees; for protein sequences from unaligned clusters, an arbitrary high distance of 12 was assigned to the corresponding pair. If a pair of genomes contained more than one Cas4 or Cas1 sequence in either or both of them, the shortest protein-protein distance was selected to represent this genome pair. Distances between the corresponding 16S rRNA sequences were, likewise, calculated from the 16S rRNA trees or, if the two 16S rRNAs belonged to unaligned clusters, an arbitrary distance high distance of 3 was assigned. Spearman rank correlation between protein-protein and rRNA-rRNA sequence distances for all pairs was reported as a measure of coherence between the protein evolution and the species tree.
All pairwise distances are available for download at ftp://ftp.ncbi.nih.gov/pub/wolf/_suppl/cas4.
The authors thank Koonin group members for helpful discussions.
This work was supported by intramural funds of the US Department of Health and Human Services (the National Library of Medicine, to EVK) and by the NIH grant R35 118,160 (to MPT).
Availability of data and materials
The datasets supporting the conclusions of this article are included within the additional files and available at ftp://ftp.ncbi.nih.gov/pub/wolf/_suppl/cas4
Conception and design of the study: KSM, YIW, EVK. Data collection: SH, SS. Data analysis: SH, YIW, MPT, KSM. Manuscript drafting: SH, YIW, KSM. Manuscript revision for critical intellectual content: EVK. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Plagens A, Tjaden B, Hagemann A, Randau L, Hensel R. Characterization of the CRISPR/Cas subtype I-A system of the hyperthermophilic crenarchaeon Thermoproteus tenax. J Bacteriol. 2012;194(10):2491–500.View ArticlePubMedPubMed CentralGoogle Scholar
- Lemak S, Beloglazova N, Nocek B, Skarina T, Flick R, Brown G, Popovic A, Joachimiak A, Savchenko A, Yakunin AF. Toroidal structure and DNA cleavage by the CRISPR-associated [4Fe-4S] cluster containing Cas4 nuclease SSO0001 from Sulfolobus solfataricus. J Am Chem Soc. 2013;135(46):17476–87.View ArticlePubMedPubMed CentralGoogle Scholar
- Li M, Wang R, Zhao D, Xiang H. Adaptation of the Haloarcula hispanica CRISPR-Cas system to a purified virus strictly requires a priming process. Nucleic Acids Res. 2014;42(4):2483–92.View ArticlePubMedGoogle Scholar
- Liu T, Li Y, Wang X, Ye Q, Li H, Liang Y, She Q, Peng N. Transcriptional regulator-mediated activation of adaptation genes triggers CRISPR de novo spacer acquisition. Nucleic Acids Res. 2015;43(2):1044–55.View ArticlePubMedPubMed CentralGoogle Scholar
- Amitai G, Sorek R. CRISPR-Cas adaptation: insights into the mechanism of action. Nat Rev Microbiol. 2016;14(2):67–76.View ArticlePubMedGoogle Scholar
- Jackson SA, McKenzie RE, Fagerlund RD, Kieper SN, Fineran PC, Brouns SJ: CRISPR-Cas: Adapting to change. Science 2017, 356(6333).Google Scholar
- Barrangou R, Horvath P. A decade of discovery: CRISPR functions and applications. Nat Microbiol. 2017;2:17092.View ArticlePubMedGoogle Scholar
- Barrangou R, Marraffini LA. CRISPR-Cas systems: prokaryotes upgrade to adaptive immunity. Mol Cell. 2014;54(2):234–44.View ArticlePubMedPubMed CentralGoogle Scholar
- Marraffini LA. CRISPR-Cas immunity in prokaryotes. Nature. 2015;526(7571):55–61.View ArticlePubMedGoogle Scholar
- Charpentier E, Richter H, van der Oost J, White MF. Biogenesis pathways of RNA guides in archaeal and bacterial CRISPR-Cas adaptive immunity. FEMS Microbiol Rev. 2015;39(3):428–41.View ArticlePubMedGoogle Scholar
- Makarova KS, Wolf YI, Alkhnbashi OS, Costa F, Shah SA, Saunders SJ, Barrangou R, Brouns SJ, Charpentier E, Haft DH, et al. An updated evolutionary classification of CRISPR-Cas systems. Nat Rev Microbiol. 2015;13(11):722–36.View ArticlePubMedPubMed CentralGoogle Scholar
- Koonin EV, Makarova KS, Zhang F. Diversity, classification and evolution of CRISPR-Cas systems. Curr Opin Microbiol. 2017;37:67–78.View ArticlePubMedGoogle Scholar
- Liu T, Liu Z, Ye Q, Pan S, Wang X, Li Y, Peng W, Liang Y, She Q, Peng N. Coupling transcriptional activation of CRISPR-Cas system and DNA repair genes by Csa3a in Sulfolobus islandicus. Nucleic Acids Res. 2017;45(15):8978–92.View ArticlePubMedPubMed CentralGoogle Scholar
- Makarova KS, Grishin NV, Shabalina SA, Wolf YI, Koonin EV. A putative RNA-interference-based immune system in prokaryotes: computational analysis of the predicted enzymatic machinery, functional analogies with eukaryotic RNAi, and hypothetical mechanisms of action. Biol Direct. 2006;1:7.View ArticlePubMedPubMed CentralGoogle Scholar
- Lemak S, Nocek B, Beloglazova N, Skarina T, Flick R, Brown G, Joachimiak A, Savchenko A, Yakunin AF. The CRISPR-associated Cas4 protein Pcal_0546 from Pyrobaculum calidifontis contains a [2Fe-2S] cluster: crystal structure and nuclease activity. Nucleic Acids Res. 2014;42(17):11144–55.View ArticlePubMedPubMed CentralGoogle Scholar
- Zhang J, Kasciukovic T, White MF. The CRISPR associated protein Cas4 is a 5′ to 3′ DNA exonuclease with an iron-sulfur cluster. PLoS One. 2012;7(10):e47232.View ArticlePubMedPubMed CentralGoogle Scholar
- Krajewski WW, Fu X, Wilkinson M, Cronin NB, Dillingham MS, Wigley DB. Structural basis for translocation by AddAB helicase-nuclease and its arrest at chi sites. Nature. 2014;508(7496):416–9.View ArticlePubMedPubMed CentralGoogle Scholar
- Wilkinson M, Chaban Y, Wigley DB. Mechanism for nuclease regulation in RecBCD. elife. 2016;5Google Scholar
- Wigley DB. Bacterial DNA repair: recent insights into the mechanism of RecBCD, AddAB and AdnAB. Nat Rev Microbiol. 2013;11(1):9–13.View ArticlePubMedGoogle Scholar
- Zhou C, Pourmal S, Pavletich NP: Dna2 nuclease-helicase structure, mechanism and regulation by Rpa. Elife. 2015;4. doi:10.7554/eLife.09832.
- Steczkiewicz K, Muszewska A, Knizewski L, Rychlewski L, Ginalski K. Sequence, structure and functional diversity of PD-(D/E)XK phosphodiesterase superfamily. Nucleic Acids Res. 2012;40(15):7016–45.View ArticlePubMedPubMed CentralGoogle Scholar
- Hooton SP, Brathwaite KJ, Connerton IF. The Bacteriophage carrier state of campylobacter jejuni features changes in host non-coding RNAs and the Acquisition of new Host-Derived CRISPR spacer sequences. Front Microbiol. 2016;7:355.View ArticlePubMedPubMed CentralGoogle Scholar
- Hooton SP, Connerton IF. Campylobacter jejuni acquire new host-derived CRISPR spacers when in association with bacteriophages harboring a CRISPR-like Cas4 protein. Front Microbiol. 2014;5:744.PubMedGoogle Scholar
- Krupovic M, Cvirkaite-Krupovic V, Prangishvili D, Koonin EV. Evolution of an archaeal virus nucleocapsid protein from the CRISPR-associated Cas4 nuclease. Biol Direct. 2015;10:65.View ArticlePubMedPubMed CentralGoogle Scholar
- Krupovic M, Beguin P, Koonin EV: Casposons: the mobile elements that gave rise to the adaptation module of CRISPR-Cas systems. Curr Opin Microbiol 2017, in press.Google Scholar
- Krupovic M, Makarova KS, Forterre P, Prangishvili D, Koonin EV. Casposons: a new superfamily of self-synthesizing DNA transposons at the origin of prokaryotic CRISPR-Cas immunity. BMC Biol. 2014;12:36.View ArticlePubMedPubMed CentralGoogle Scholar
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25(17):3389–402.View ArticlePubMedPubMed CentralGoogle Scholar
- Aziz RK, Devoid S, Disz T, Edwards RA, Henry CS, Olsen GJ, Olson R, Overbeek R, Parrello B, Pusch GD, et al. SEED servers: high-performance access to the SEED genomes, annotations, and metabolic models. PLoS One. 2012;7(10):e48053.View ArticlePubMedPubMed CentralGoogle Scholar
- Makarova KS, Koonin EV. Annotation and classification of CRISPR-Cas systems. Methods Mol Biol. 2015;1311:47–75.View ArticlePubMedGoogle Scholar
- Silas S, Makarova KS, Shmakov S, Páez-Espino D, Mohr G, Liu Y, Davison M, Roux S, Krishnamurthy SR, Hansen LL et al: On the origin of reverse transcriptase-using CRISPR-Cas systems and their hyper-diverse, enigmatic spacer repertoires. MBio, in press 2017.Google Scholar
- Rinke C, Schwientek P, Sczyrba A, Ivanova NN, Anderson IJ, Cheng JF, Darling A, Malfatti S, Swan BK, Gies EA, et al. Insights into the phylogeny and coding potential of microbial dark matter. Nature. 2013;499(7459):431–7.View ArticlePubMedGoogle Scholar
- Castelle CJ, Wrighton KC, Thomas BC, Hug LA, Brown CT, Wilkins MJ, Frischkorn KR, Tringe SG, Singh A, Markillie LM, et al. Genomic expansion of domain archaea highlights roles for organisms from new phyla in anaerobic carbon cycling. Curr Biol. 2015;25(6):690–701.View ArticlePubMedGoogle Scholar
- Shmakov S, Abudayyeh OO, Makarova KS, Wolf YI, Gootenberg JS, Semenova E, Minakhin L, Joung J, Konermann S, Severinov K, et al. Discovery and functional characterization of diverse class 2 CRISPR-Cas systems. Mol Cell. 2015;60(3):385–97.View ArticlePubMedPubMed CentralGoogle Scholar
- Brown CT, Hug LA, Thomas BC, Sharon I, Castelle CJ, Singh A, Wilkins MJ, Wrighton KC, Williams KH, Banfield JF. Unusual biology across a group comprising more than 15% of domain bacteria. Nature. 2015;523(7559):208–11.View ArticlePubMedGoogle Scholar
- Burstein D, Sun CL, Brown CT, Sharon I, Anantharaman K, Probst AJ, Thomas BC, Banfield JF. Major bacterial lineages are essentially devoid of CRISPR-Cas viral defence systems. Nat Commun. 2016;7:10613.View ArticlePubMedPubMed CentralGoogle Scholar
- Puigbo P, Makarova KS, Kristensen DM, Wolf YI, Koonin EV. Reconstruction of the evolution of microbial defense systems. BMC Evol Biol. 2017;17(1):94.View ArticlePubMedPubMed CentralGoogle Scholar
- Kristensen DM, Wolf YI, Koonin EV. ATGC database and ATGC-COGs: an updated resource for micro- and macro-evolutionary studies of prokaryotic genomes and protein family annotation. Nucleic Acids Res. 2017;45(D1):D210–8.View ArticlePubMedGoogle Scholar
- Takeuchi N, Wolf YI, Makarova KS, Koonin EV. Nature and intensity of selection pressure on CRISPR-associated genes. J Bacteriol. 2012;194(5):1216–25.View ArticlePubMedPubMed CentralGoogle Scholar
- Dillingham MS. Superfamily I helicases as modular components of DNA-processing machines. Biochem Soc Trans. 2011;39(2):413–23.View ArticlePubMedGoogle Scholar
- Epshtein V. UvrD helicase: an old dog with a new trick: how one step backward leads to many steps forward. BioEssays. 2015;37(1):12–9.View ArticlePubMedGoogle Scholar
- Galperin MY, Makarova KS, Wolf YI, Koonin EV. Expanded microbial genome coverage and improved protein family annotation in the COG database. Nucleic Acids Res. 2015;43(Database issue):D261–9.View ArticlePubMedGoogle Scholar
- Schwarz FW, Toth J, van Aelst K, Cui G, Clausing S, Szczelkun MD, Seidel R. The helicase-like domains of type III restriction enzymes trigger long-range diffusion along DNA. Science. 2013;340(6130):353–6.View ArticlePubMedPubMed CentralGoogle Scholar
- Butterer A, Pernstich C, Smith RM, Sobott F, Szczelkun MD, Toth J. Type III restriction endonucleases are heterotrimeric: comprising one helicase-nuclease subunit and a dimeric methyltransferase that binds only one specific DNA. Nucleic Acids Res. 2014;42(8):5139–50.View ArticlePubMedPubMed CentralGoogle Scholar
- Gupta YK, Chan SH, Xu SY, Aggarwal AK. Structural basis of asymmetric DNA methylation and ATP-triggered long-range diffusion by EcoP15I. Nat Commun. 2015;6:7363.View ArticlePubMedPubMed CentralGoogle Scholar
- Marinus MG, Lobner-Olesen A. DNA Methylation. EcoSal Plus. 2014;6(1). doi:10.1128/ecosalplus.ESP-0003-2013.
- Iyer LM, Abhiman S, Aravind L. MutL homologs in restriction-modification systems and the origin of eukaryotic MORC ATPases. Biol Direct. 2008;3:8.View ArticlePubMedPubMed CentralGoogle Scholar
- Aravind L, Walker DR, Koonin EV. Conserved domains in DNA repair proteins and evolution of repair systems. Nucleic Acids Res. 1999;27(5):1223–42.View ArticlePubMedPubMed CentralGoogle Scholar
- Lenhart JS, Schroeder JW, Walsh BW, Simmons LA. DNA repair and genome maintenance in Bacillus Subtilis. Microbiol Mol Biol Rev. 2012;76(3):530–64.View ArticlePubMedPubMed CentralGoogle Scholar
- Makarova KS, Wolf YI, Snir S, Koonin EV. Defense islands in bacterial and archaeal genomes and prediction of novel defense systems. J Bacteriol. 2011;193(21):6039–56.View ArticlePubMedPubMed CentralGoogle Scholar
- Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32(5):1792–7.View ArticlePubMedPubMed CentralGoogle Scholar
- Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30(4):772–80.View ArticlePubMedPubMed CentralGoogle Scholar
- Price MN, Dehal PS, Arkin AP. FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One. 2010;5(3):e9490.View ArticlePubMedPubMed CentralGoogle Scholar
- Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010;26(19):2460–1.View ArticlePubMedGoogle Scholar
- Soding J. Protein homology detection by HMM-HMM comparison. Bioinformatics. 2005;21(7):951–60.View ArticlePubMedGoogle Scholar
- Soding J, Remmert M, Biegert A, Lupas AN, et al. Nucleic Acids Res. 2006;34(Web Server issue):W374–8.View ArticlePubMedPubMed CentralGoogle Scholar