- Research article
- Open Access
Evolutionary dynamics of origin and loss in the deep history of phospholipase D toxin genes
BMC Evolutionary Biology volume 18, Article number: 194 (2018)
Venom-expressed sphingomyelinase D/phospholipase D (SMase D/PLD) enzymes evolved from the ubiquitous glycerophosphoryl diester phosphodiesterases (GDPD). Expression of GDPD-like SMaseD/PLD toxins in both arachnids and bacteria has inspired consideration of the relative contributions of lateral gene transfer and convergent recruitment in the evolutionary history of this lineage. Previous work recognized two distinct lineages, a SicTox-like (ST-like) clade including the arachnid toxins, and an Actinobacterial-toxin like (AT-like) clade including the bacterial toxins and numerous fungal homologs.
Here we expand taxon sampling by homology detection to discover new GDPD-like SMase D/PLD homologs. The ST-like clade now includes homologs in a wider variety of arthropods along with a sister group in Cnidaria; the AT-like clade now includes additional fungal phyla and proteobacterial homologs; and we report a third clade expressed in diverse aquatic metazoan taxa, a few single-celled eukaryotes, and a few aquatic proteobacteria. GDPD-like SMaseD/PLDs have an ancient presence in chelicerates within the ST-like family and ctenophores within the Aquatic family. A rooted phylogenetic tree shows that the three clades derived from a basal paraphyletic group of proteobacterial GDPD-like SMase D/PLDs, some of which are on mobile genetic elements. GDPD-like SMase D/PLDs share a signature C-terminal motif and a shortened βα1 loop, features that distinguish them from GDPDs. The three major clades also have active site loop signatures that distinguish them from GDPDs and from each other. Analysis of molecular phylogenies with respect to organismal relationships reveals a dynamic evolutionary history including both lateral gene transfer and gene duplication/loss.
The GDPD-like SMaseD/PLD enzymes derive from a single ancient ancestor, likely proteobacterial, and radiated into diverse organismal lineages at least in part through lateral gene transfer.
Comparative evolutionary analyses provide insight into origin of novel function, as well as the dynamics and directionality of phenotypic change. As new sequence data become available, we can better assess the relative importance of convergence, including independent recruitment from widespread gene families, and lateral gene transfer (LGT) in the origin of novel phenotypes. The evolutionary history of phospholipase D (PLD) toxins is a particularly interesting case study. Emerging understanding of the phylogenetic distribution of these toxins has inspired hypotheses about roles of lateral gene transfer and/or convergent recruitment of toxic activity, and the identification of functionally relevant motifs [1,2,3]. The recognition of similarities among disparate lineages that express these toxins has potential to elucidate common features of the toxic effects on mammals, facilitate development of widespread treatments, and illuminate general cascades of human pathophysiological response.
PLD toxins are well known from a variety of pathogenic organisms, most famously in venoms of brown recluse spiders and their relatives. Comparative evidence supports recruitment of the toxin in spider venoms before the most recent common ancestor of sicariid spiders, which include the genera Loxosceles, Sicarius, and Hexophthalma (formerly African Sicarius) [4, 5]. We therefore have referred to the spider toxin gene family as SicTox . Multiple paralogs of SicTox are highly expressed in venom and serve functional roles in prey immobilization , but these enzymes are most notorious for being sufficient to cause severe dermonecrotic lesions, and occasional systemic effects, in mammals . The enzyme activity is also functionally relevant in tick saliva (Ixodes) , venom of a buthid scorpion , and as exotoxins in two genera of pathogenic bacteria . Homologs with less well-described function are also in pathogenic fungi [2, 3]. Felicori and coworkers  recently expanded the known distribution of homologs to include: 11 genera in four orders of fungi; spider taxa with sequenced genomes (Araneomorph Stegodyphus & mygalomorph Acanthoscurria), two new genera of ixodine ticks, predatory mites (Metaseiulus) and dust house mites (Dermatophagiodes); two new suborders of Actinomycetales bacteria (Streptomyces, Austwickia) and a genus of proteobacteria in the order Burkholderiales (Burkholderia). A phylogenetic analysis supported two major clades, a SicTox-like (ST-like) clade that includes the arachnid toxins, and an Actinobacterial-toxin like (AT-like) clade that includes the bacterial toxins as well as fungal homologs, with the Burkholderia homolog a singleton outside of both broader groups .
Phospholipase D toxins belong to a protein domain family known as the GDPD-like SMase D/PLDs (NCBI conserved protein domain family cd08576). This family is unrelated to the better-known HKD PLDs , and instead evolved from glycerophosphoryl diester phosphodiesterase (GDPD), a ubiquitous gene family that hydrolyzes glycerol phosphodiesters and plays a role in glycerol metabolism [1, 12, 13]. PLD toxins were originally described as sphingomyelinase D enzymes (SMase D),  but later studies showed some to be broader spectrum PLDs capable of acting on diverse lysophospholipids [15, 16], as well as other phosphosphingolipids . The name GDPD-like SMase/PLD thus reflects both evolutionary origin and ambiguous substrate preference. While the most well-characterized members of this family are toxins, some members of the family may not be; for example, there are spider homologs expressed in non-venom tissues that are serving some other function within the organism (Binford, unpublished).
PLD toxins have retained some shared or overlapping features with GDPD relatives while also diverging considerably. Key active site residues are conserved between the two groups . GDPDs act primarily on glycerophosphodiester substrates, and PLD toxins are best known as sphingomyelinases, but at least a few GDPDs and many GDPD-like SMaseD/PLDs have lysophospholipase activity [15, 16, 18,19,20]. PLD toxins are generally extracellular enzymes [21, 22], while only a subset of GDPD family members are secreted . All known members of the PLD toxin lineage have a characteristic C-terminal motif, proposed to increase structural stability [2, 4]. This motif is lacking in GDPD domains, and is thus a shared, derived character, or synapomorphy, for the GDPD-like SMaseD/PLD clade . Finally, GDPD enzymes catalyze a two-step reaction in which an alcohol group on the substrate is displaced by intramolecular attack of a hydroxyl group, generating a cyclic intermediate, which is hydrolyzed in a second step . The mechanism of PLD toxins from bacteria, fungi and spiders lacks the hydrolytic step, so that the cyclic intermediate is the final product [19, 24].
The disparate phylogenetic distribution of these enzymes, coupled with their common structural motifs and lack of hydrolytic chemistry, has fueled the question of how the taxonomic distribution of PLD toxins arose. Is it a product of convergent evolution from different GDPD ancestors, or of divergent evolution accompanied by lateral gene transfer [1,2,3]? We initially argued the case for lateral gene transfer based on the evidence of the synapomorphy of the C-terminal motif shared between bacterial and spider SMase D . With the detection of homologs in fungi, Fry et al. argued that the gene family was likely widespread among Opisthokonts and convergently recruited for antagonistic function in fungi and spiders . With a further expanded data set, Felicori and coworkers have suggested that lateral transfer between metazoans and bacteria is likely, but also that the recruitment of the secreted toxin may have occurred convergently within fungi and arachnids . The discovery that bacterial, fungal and spider PLD toxins catalyze cyclization (not hydrolysis) led us to reassert our original hypothesis of divergence plus lateral gene transfer .
Here we use homology searching and phylogenetic analysis to probe the deeper evolutionary history of PLD toxins in the GDPD-like SMaseD/PLD family. Through updated sequence similarity searching, we discover a new lineage of proteins related to PLD toxins, found in a diverse array of aquatic organisms. We infer that the ST-like homologs (which include the spider toxins), the AT-like homologs (which include the actinobacterial toxins and fungal homologs) and the new mainly aquatic lineage comprise three major clades. These three clades diverged from a basal paraphyletic group that likely originated in proteobacteria. We identify multiple molecular synapomorphies indicating that all GDPD-like SMaseD/PLD family members diverged from a single ancestor, probably bacterial. Finally, we find that lateral gene transfer, but also gene duplication and loss, contributed to the unusual species distribution of PLD toxins.
Sequence similarity searching
We conducted initial protein BLAST searches using three representatives of the SicTox-like (ST-like) family and three from the Actinobacterial-toxin like (AT-like) family as query sequences, along with a singleton sequence from the proteobacterium Burkholderia. We accepted hits with E < 1e-05 (E-value) to a single query sequence or E < 1e-03 to at least two query sequences. These initial searches identified PLD-toxin homologs in previously unrepresented lineages (Additional file 1: Table S1). For ST queries, these included other arthropod classes and subphyla such as the Merostomata and crustaceans, but also distantly related organisms such as cnidarians and proteobacteria. For AT and Burkholderia queries these included other fungal phyla such as the basidiomycota, but also oceanic diatoms and other classes within proteobacteria (Additional file 1: Table S1). Conserved domain searches (CD) showed that all the initial protein hits belong to the same broad sequence family as the AT-like and ST-like family sequences (cd08576, GDPD-like SMase D/PLD; E < 1e-10), and not other domain families within the broad GDPD superfamily (see Additional file 1: Table S1). Some hits in lineages such as the Cnidaria or Proteobacteria had such close similarity (E < 1e-25) to ST or AT query sequences as to suggest lateral gene transfer events similar to the one previously identified between fungi and actinobacteria . Meanwhile, a set of 8 relatively weak hits from ST queries, found in aquatic/marine proteobacteria and crustaceans, had high mutual sequence similarity (E < 1e-20 for most pairwise comparisons) and suggested the discovery of a novel, widely distributed, toxin-like sequence family.
To find additional homologs, we performed translated BLAST searches of genome and transcriptome databases, as well as secondary protein and translated BLAST searches from the initial protein BLAST hits above. Secondary searches initiated from the aquatic/marine proteobacterial/crustacean hits led to expansion of this seed group to a wide variety of organisms, almost all from aquatic/marine habitats (Additional file 1: Table S2). Translated genome and transcriptome searches from ST queries led to identification of further close ST homologs in previously unrepresented lineages. Most notably, it solidified the presence of close ST-like homologs in cnidarians and led to the discovery of bona fide close ST-like homologs in myriapods (Additional file 1: Table S3). Meanwhile, translated searches from AT and Burkholderia queries, and secondary searches from unclassified query sequences found in the initial blastp search, led to identification of only a few additional homologs (Additional file 1: Tables S4 and S5). These searches set the stage for a new consideration of the evolutionary history of the GDPD-like SMase D/PLD family, including expanded phylogenetic distribution, potential lateral gene transfer events and/or gene loss, and origin from GDPDs.
We aligned the GDPD-like SMase D/PLD sequences (Additional file 2: Figure S1) and constructed a phylogenetic tree using RaxML (Fig. 1). We then rooted the tree using six highly diverse GDPDs of known structure as outgroups, while restricting the character set to positions that could be confidently aligned using structure-structure alignment between GDPDs and PLD toxins (see Materials and Methods). All six outgroups, considered together or separately, rooted the tree on the same branch.
Most sequences belong to one of three major clades, corresponding to expanded versions of the AT-like and ST-like groups, and a third large new group. These clades are collapsed in Fig. 1, but their internal tree structure and taxonomic representation are discussed further below (see also Figs. 5, 6 and 7 and Additional file 1: Tables S2-S7). The expanded AT-like group (Additional file 1: Tables S4 and S7) is recovered as a clade with 100% bootstrap support. The expanded ST-like group (Additional file 1: Tables S3 and S6) is more weakly supported as a clade (77% bootstrap support), but has a clear catalytic loop signature (see below) that distinguishes it from other toxin-like PLDs and supports its monophyly. The third group is a well-supported clade (100% bootstrap support) that includes homologs from diverse organisms found primarily in marine or aquatic habitats. We refer to this novel family as the Aquatic clade (Additional file 1: Table S2). It is sister to the ST-like group, though with weaker bootstrap support (68%).
The tree also includes a scattering of sequences, mostly from proteobacteria, including the singleton sequence from Burkholderia found by Felicori and coworkers  (Fig. 1; Additional file 1: Table S5). The proteobacterial sequences represent three diverse classes: β (Burkholderia and Methylibium), γ (Pseudomonas) and δ (Desulfoluna). Although representation of each class is limited to one or two sequences, proteobacteria resolve as paraphyletic with respect to inclusion of the AT-like clade and the Aquatic/ST-like clade. In particular, the Methylibium sequence is sister to all other taxa in the outgroup rooting, while a pair of proteobacterial sequences (γ and δ) is sister to the Aquatic/ST-like clades. The Burkholderia singleton is sister to a lineage in which a homolog from the oceanic diatom Thalassiosira oceanica is weakly supported as the closest relative of the AT-like group. The paraphyletic grouping of proteobacterial sequences in this analysis suggests divergent origins of the GDPD-like SMaseD/PLD sequences from a proteobacterial ancestor.
Shared and distinguishing sequence features of the major toxin-like PLD groups
One signature feature of the GDPD-like SMaseD/PLDs is a C-terminal “plug” motif (Fig. 2) identified in previous work . Sequence logos (Fig. 2) show that the motif is similar in the ST-like and AT-like groups, as well as in the proteobacterial-dominated basal sequences. The motif profile diverges in the Aquatic group yet retains a similar overall pattern. The C-terminal motif is a synapomorphy reflecting divergence of GDPD-like SmaseD/PLDs from a single ancestor, rather than convergent origin from different GDPDs in diverse lineages.
A second signature of GDPD-like SMaseD/PLDs is the length of the short βα1 loop connecting strand β1 to helix α1 (Fig. 3). In GDPD families from prokaryotes to mammals, the βα1 loop is at least six residues longer (most typically 7–9 based on an analysis of the NCBI Conserved Domain database) and contains several conserved residues that interact with an adjacent small domain (GDPD-insert domain or GDPD-I domain ) nested within the βα2 loop. In GDPD-like SMaseD/PLDs, the shortening of the βα1 loop may relate to the absence of the adjacent GDPD-I domain (see below). The βα1 loop length is a second signature feature, newly described here, that argues against convergent evolution of GDPD-like SMaseD/PLDs.
The GDPD-I domain is largely deleted in GDPD-like SMaseD/PLDs and replaced by a βα2 loop sequence that differs characteristically between the major groups (Fig. 3). In Sicariid toxins, the βα2 loop is known as the catalytic loop . In the ST-like group it has a conserved length (15 residues), a pair of conserved cysteine residues that form a disulfide bond within the loop, and a third less conserved cysteine that covalently links the βα2 loop to the βα6 loop. In the AT-like group, the catalytic loop is shorter (9–13 residues), more variable, and lacks conserved cysteines. The Aquatic group, meanwhile, conserves the cysteines of the ST-like group but within a longer catalytic loop sequence that contains at least one additional pair of cysteine residues. The signature sequence of the catalytic loop distinguishes the three major families of toxin-like PLDs, though it does not distinguish the AT-like group from the scattered basal sequences. Based on our phylogeny, the ancestral catalytic loop of the GDPD-like SMaseD/PLDs was probably very short and lacked conserved cysteine residues.
Known sicariid and actinobacterial toxins are single-domain proteins, but all basal sequences and the great majority of Aquatic clade sequences have significant regions of sequence outside of the catalytic domain. Some of these regions contain recognizable domains, yielding novel domain architectures (Fig. 4). Other putative domains resisted classification despite conservation in multiple homologs. At least 70% of Aquatic clade members have one to four repeats of a novel, unclassified cysteine-rich domain (Additional file 3: Figure S2). Three basal sequences have one or more PLAT (Polycystin-1, Lipoxygenase, Alpha-Toxin) domains , which commonly exhibit calcium-dependent membrane/lipid recognition and help target proteins to membranes [28,29,30]. Other domains identified include bacterial Ig-like domain repeats, PUD-1/PUD-2 (protein upregulated in daf-2 loss of function) , and VMO-I (vitelline membrane outer layer protein-I)  domains. The Ig-like domains are distant homologs of those found in calcium-dependent bacterial adhesins [33, 34], while the biochemical function of PUD-1/PUD-2 is not well understood. VMO-I is probably a carbohydrate-binding domain , and it bears mentioning that several AT-like clade homologs also contain domains likely to recognize glycans in surface glycoproteins (Materials and Methods; Additional file 1: Table S7). While the functions of these PLD-fused domains are diverse and in some cases unknown, a role in surface recognition and adhesion is a common theme. The multidomain nature of the basal sequences suggests that the ancestral GDPD-like SMase D/PLD may have utilized additional domains for interfacial and other properties. Finally, a majority of sequences in all three major clades, along with several basal sequences, have recognizable signal peptides (Additional file 1: Tables S2-S7). As expected, most members of this family are probably secreted enzymes.
Taxonomic distribution of ST-like PLDs
Following reports of sicariid toxin homologs in ticks  and mites , it seemed possible that this family would prove to be conserved in arachnids and maybe chelicerates in general. Indeed, current BLAST searches with sicariid toxin sequences detect homologs (E-values <1e-20 to Sicariid query sequences) in all 24 available representative chelicerate genomes (see also Fig. 9 below) and many transcriptomes. This includes the horseshoe crab Limulus polyphemus  and representatives of a wide variety of arachnid orders (including scorpions and acarines) and non-sicariid spiders (Additional file 1: Tables S3 and S6; Fig. 5). Many species carry multiple paralogs, with some copies showing deviations from active site consensus sequences that suggest functional divergence or gene decay.
Outside of chelicerates, we detected arthropod ST-like homologs in a broad set of myriapods (centipedes and millipedes) (Additional file 1: Table S3 and Fig. 5). The genome of the centipede Strigamia maritima (class Chilopoda) has two homologs supported by transcript data , and the transcriptome of the millipede Polydesmus angustus (class Diplopoda) also contains a homolog . Fragmentary BLAST hits also support a presence in the class Symphyla, while the fourth class of myriapods, Pauropoda, is not well represented in databases. Some of these homologs, such as the P. angustus sequence, show perfect conservation of canonical sicariid toxin motifs; others, such as the S. maritima sequences, show some evidence of divergence or gene degradation. In our maximum likelihood tree, the myriapod sequences are monophyletic and weakly supported as sister to all chelicerate ST-like sequences, suggesting that ST-like sequences could have been present in the common ancestor of myriapods and chelicerates (Fig. 5). Current hypotheses support myriapods as sister to hexapods and crustaceans, forming the group mandibulata, which is then sister to chelicerates  (see also Fig. 9, below). Assuming these relationships are correct, and their presence in arthropods represents a single evolutionary origin, ST-like homologs have an ancient origin in an aquatic ancestor of arthropods (> 550 Ma) . If true, the apparent absence of ST-like homologs in crustaceans and hexapods is consistent with a loss of representatives of this molecular clade, before the most recent common ancestor of crustaceans and hexapods (pancrustacea).
Beyond arthropods, the only other metazoan taxon with strong evidence for sicariid toxin-like enzymes is the anthozoan class of Cnidaria, a branch of animals quite distant from arthropods. Remarkably, we detect homologs from four different anthozoan orders (Fig. 5; Additional file 1: Table S3), suggesting an ancient presence in this group. The genomes of the stony coral Acropora digitifera (order Scleractinia)  and the sea pansy Renilla reniformis (order Pennatulacea)  both contain genes encoding close homologs (E-value <1e-40 to sicariid toxin query sequences), well supported by transcript data in the case of Acropora. Translated BLAST searches using sicariid toxins yielded very similar hits in transcriptome data from eight other anthozoan species, including representatives of two additional orders (Ceriantharia and Corallimorpharia). Despite this extensive representation, BLAST searches using these sequences as queries failed to find ST-like family sequences in four other representative anthozoan genomes, including those of the sea anemones Nematostella vectensis and Exaiptasia pallida (order Actiniaria) and the stony corals Orbicella faveolata and Stylophora pistillata (order Scleractinia). Other anthozoan transcriptomes also lacked hits, except for three sequences in a reference transcriptome assembly of mesenteries/nematosomes/tentacles from Nematostella  that were unsupported by other transcriptome or genome data for that organism (Additional file 1: Table S3). The deep presence in anthozoans and arthropods, and absence in other metazoan lineages, may be best explained by ancient lateral gene transfer. We return to this point below.
With our expanded understanding of the distribution of, and relationships among, GDPD-like SMaseD/PLDs in the ST-like clade, we confirm broad genomic presence of homologs in chelicerate and myriapod arthopods. Given this distribution, expression in venoms of sicariid spiders represents recruitment from this gene family for venom function . Chelicerate homologs have also convergently emerged for venomous function in ticks , and scorpions . The function of ST-like clade members in the other arthropods, and in anthozoans, is unknown.
Taxonomic distribution of the aquatic clade
The newly discovered Aquatic clade has a strikingly wide species distribution (Fig. 6; Additional file 1: Table S2). Combined genome and transcriptome data strongly support presence in one prokaryotic phylum and seven eukaryotic phyla: Proteobacteria, Amoebozoa, Ichthyosporea, Ctenophora, Cnidaria, Rotifera, Platyhelminthes, and Arthropoda. Transcriptome and proteome data support a presence in Mollusca/Brachiopoda. Ctenophores are particularly well represented, with homologs found in both representative genomes, Mnemiopsis leidyi and Pleurobrachia bachei, and in a wide array of transcriptomes from the major ctenophore classes. Thus, the presence of Aquatic clade PLD genes in ctenophores is probably ancient. With the exception of the slime mold Physarum polycephalum and the wood-decomposing springtail Holacanthella duospinosa, all Aquatic clade members come from organisms that occupy aquatic, marine, or tidal habitats.
Despite their wide species distribution, Aquatic clade homologs have a sparse presence, or no detectable presence, in most metazoan phyla. Within Arthropoda, for example, genome and transcriptome data from multiple species strongly support establishment in the crustacean genus Daphnia. The only other unambiguous arthropod representation is in the hexapod order Collembola, supported by genome and transcriptome data in Holacanthella duospinosa  and by transcriptome data in Anurida maritima, both in the family Neanuridae. Other crustacean and collembolan genomes lacked detectable homologs, as did all other hexapod genomes including insects (see also Fig. 9, below). However, there is some transcriptomic evidence for a limited presence in the crustacean class Malacostraca. Within Rotifera, homologs are well supported by genome and transcriptome data in the bdelloid rotifers Adineta vaga and Rotaria magnacalcarata but not found in the genomes of Adineta ricciae or Rotaria macrura , or the non-bdelloid rotifer Brachionus calyciflorus. Within Cnidaria, there are Aquatic homologs in at least five genera in the hydrozoan class, including raw genome data from Hydractinia, yet no detectable homolog in the genome of the representative hydrozoan Hydra vulgaris . Non-metazoan taxa are represented only by one species of slime molds Amoebozoa (Physarum polycephalum), one species of the Ichthyosporea (Amoebidium parasiticum), and three diverse species of Proteobacteria (Ahrensia, Oceanospirillum and Candidimonas). Glaring absences include plants, all deuterostomes, nematodes, annelids, most molluscs, insects, and all other bacteria.
A paucity of genome and/or transcriptome data is unlikely to account for sparse or absent representation. Scaffold-level genome assemblies with strong sequence coverage have been deposited in NCBI databases for numerous representatives of each of the above phyla (refer to Fig. 9 below for numbers of genomes as of May 2018). In some individual species, absence of a detectable homolog in a genome could reflect gaps in draft genome assembly. It also cannot be ruled out that acquisition of many introns, or extreme sequence divergence, contributes to apparent absence in some lineages. However, such concerns are mitigated by extensive secondary BLAST searches and other arguments (see further discussion below). On the whole, incomplete or sparse representation in a lineage likely reflects gene loss in some cases and lateral gene transfer in others.
For taxa with multiple Aquatic homologs from different species we were able to test whether the proteins were monophyletic within the taxonomic group (Fig. 6). The broadly represented ctenophore sequences are a clade, within which there is evidence of an ancient duplication leading to distinct paralogs. Hydrozoan homologs and most arthropods (with the exception of the Malacostraca) are each recovered as monophyletic. Support for a flatworm and rotiferan clade is strong (98% bootstrap support), which may reflect their taxonomic relatedness, since both phyla are in the platyzoan clade of spiralia [47, 48]. The patterns of conservation of Aquatic homologs across deeper clades, the ctenophores in particular, suggest that the Aquatic group, like the ST-like group, has an ancient origin.
While support for the Aquatic clade is strong (100% bootstrap), the deeper relationships among the taxon-specific clades are not well resolved. However, tree topology tests allow rejection, at a 95% confidence level, of monophyly for the combined set of protostomian sequences (Additional file 1: Table S8). Arthropod and spiralian (Lophotrochozoa) monophyly are also rejected, with the caveat that sequences from Malacostraca and Mollusca are supported only by transcriptome data, albeit from at least three species each (Additional file 1: Table S2). The lack of congruence between the organismal and molecular phylogeny suggests that the Aquatic clade has experienced either lateral gene transfer or a combination of ancient gene duplication and extensive gene loss, or both (see further discussion below).
Taxonomic distribution of AT-like group
Expanding on previous observations, we detected a broad set of fungal and actinobacterial homologs that are strongly supported as a single clade that is not a near relative of ST-like PLDs (Figs. 1 and 7) . The actinobacterial representation includes five orders, adding one order (four genera of Pseudonocardiales) to the taxa detected previously, as well as diversity within the other four orders. Within Ascomycota, we detect representatives from four classes, seven orders, 11 families, and 21 genera, representing an expansion of fungal sequences available in databases since 2013. As with the phylogeny of the Aquatic clade, deeper nodes in this lineage are not well supported. With that caveat, our phylogeny does not support monophyly of Ascomycota, largely because of the inclusion of actinobacteria, one proteobacterium, and the few representative basidiomycota rendering the lineage paraphyletic. Actinobacteria are resolved as a clade, albeit with negligible support (37% bootstrap support).
Having established that GDPD-like SMaseD/PLDs are a monophyletic group descended from GDPD enzymes, and having reexamined their phylogenetic distribution, we now consider two models to explain the origin of this distribution. One limiting model is that it derives purely from vertical descent accompanied by gene duplication and loss; a second model is that lateral gene transfer also contributed significantly. At first glance, the phylogeny and taxonomic representation of GDPD-like SMaseD/PLDs seems to favor involvement of lateral gene transfer. Evidence includes phylogenetically disparate presence of homologs within and among the three major clades, combined with the absence of homologs in major areas of the tree of life that have representative genomes and transcriptomes (including a majority of bacterial clades, archaea, plants, basal Eukaryotes (Excavata), deuterostomes and hexapods).
Indeed, a pure vertical descent model can be falsified by two clearcut cases of lateral gene transfer within the AT-like group, both involving bacteria. One previously described event, also evident in our analysis, occurred between fungi and actinobacteria, the two major groups of organisms carrying AT-like group PLDs . A second involves Xenorhabdus mauleonii, the lone representative of Proteobacteria in the AT-like group (Fig. 6). Dozens of Xenorhabdus genomes have been sequenced (Fig. 8), and phylogenetic relationships among them are well characterized , but only X. mauleonii [50, 51] carries a PLD toxin gene homolog. The homolog in X. mauleonii is found within the pyrBI operon  and is directly flanked by pyrB and cbbBc  (Fig. 8). In other Xenorhabdus species, there is either no gene or one of a highly diverse set of modules between pyrB and cbbBc, suggesting that this locus is a hotspot for recombination and insertion/deletion. These modules include a toxin-antitoxin pair and prophage genes, genetic elements commonly associated with LGT . The basal species X. innexi and a scattering other species contain a gene set that may represent the ancestral gene module in this region. The cross-genome comparisons among Xenorhabdus species strongly support the hypothesis that X. mauleonii acquired its PLD toxin homolog by LGT. Clearly, it is essential to allow for at least some LGT involving bacteria in the history of the GDPD-like SMaseD/PLDs.
As a side note, the acquisition of a toxin-like PLD by Xenorhabdus mauleonii also suggests a functional role in entomopathogenicity. Xenorhabdus bacteria are entomopathogenic endosymbionts of entomopathogenic nematodes . Numerous fungi that carry AT-like PLDs are also entomopathogens, including Aschersonia aleyrodis, a whitefly control agent  with a toxin-like PLD that is sister to X. mauleonii in the tree of the AT-like group (Fig. 7). Whether or not the direct source of LGT to X. mauleonii was an entomopathogenic fungus, the relatively close homology between PLDs from unrelated entomopathogens suggests that the PLD may somehow foster pathogenicity toward insects.
We next consider whether vertical descent could nonetheless account for the unusual distribution of GDPD-like SMaseD/PLDs among complex eukaryotes in the ST-like and Aquatic clades (Fig. 9). Both the ST-like and Aquatic clades, which are sister taxa in our phylogeny (Fig. 1), contain representatives of arthropods and cnidarians, albeit from nonoverlapping subgroups. The Aquatic clade, but not the ST-like clade, also contains representatives of many other metazoan and some non-metazoan phyla, while lacking representatives of large metazoan groups like the deuterostomes or insects. Under a vertical descent model, the common ancestor of the ST-like and Aquatic clades must have arisen in some pre-metazoan eukaryote, then undergone at least one gene duplication to give the ST-like and Aquatic clades, followed by variable paralog retention within both cnidarians and arthropods. A number of major cnidarians (e.g. Hydra vulgaris and Exaiptasia pallida), arthropods (e.g. essentially all hexapods), deuterostomes, and numerous entire metazoan phyla must have lost both paralogs, being unrepresented in either clade. A vertical descent model would thus require massive gene loss in both the ST-like and Aquatic clades (Fig. 9).
Extreme sequence divergence, leading to undetected homologs, could exaggerate the appearance of gene loss, but we doubt this is an issue in our study. In exhaustive PSI-BLAST searches restricted to metazoans, the closest additional hits we found had E-values above 0.1 and belonged largely to the GDE4 family. GDE4-like proteins, which are widespread in metazoans, have signature GDPD features and are therefore not a highly diverged branch of the GDPD-like SMaseD/PLDs, though they do have lysophospholipase D activity. PSI-BLAST searches not restricted to metazoans were quickly swamped by bacterial hits belonging to GDPD families. Thus, any undetected, highly diverged sequences from hexapods or deuterostomes (metazoans that are well represented in protein databases) would have to be more different from GDPD-like SMaseD/PLDs than GDPD-like SMaseD/PLDs are from major groups of GDPD enzymes.
We also noted above that the molecular phylogeny of the Aquatic clade is not fully congruent with deep organismal relationships among eukaryotes (Additional file 1: Table S8). Under a vertical descent model, ancient duplication events within the Aquatic clade, along with even more paralog loss, would have to be invoked to resolve this conflict. When one further considers the scattering of proteobacterial sequences in the Aquatic clade, LGT becomes a more sensible explanation. That having been said, patterns of conservation with certain lineages, such as Anthozoa and Rotifera, do point toward multiple recent gene loss events (Fig. 9, dashed boxes), and extensive gene duplication is also evident in both the ST-like and Aquatic clades (Figs. 5 and 6). While gene duplication and loss by themselves are unlikely to fully explain the observed phylogenetic distribution in these clades, they should not be minimized as contributors.
We suggest that an ancient LGT-mediated radiation, between and among proteobacteria and various eukaryotes, contributed significantly to the observed phylogenetic distribution, in combination with extensive gene duplication/loss. As noted above, the presence of mostly proteobacterial homologs among basal sequences suggests a bacterial ancestor for GDPD-like SMaseD/PLDs. These deep homologs are broadly dispersed among the β, γ and δ proteobacteria, while a few additional homologs from γ and α proteobacteria are also scattered within the major clades. Several of the proteobacterial homologs are found on mobile genetic elements, consistent with the propensity for LGT, particularly of secreted pathogenic molecules . The previously identified Burkholderia cenocepacia sequence , for example, resides on a conjugative or mobilizable plasmid that carries an extensive set of DNA transfer genes, including a conjugative relaxase . The basal sequence from Methylibium resides on a predicted genomic island, as does a sequence from Ahrensia within the Aquatic clade (see Materials and Methods). The sparse presence of GDPD-like SMaseD/PLDs in the proteobacteria, which are widely represented in genome sequencing projects, thus almost certainly results from their high genetic mobility. Given that interkingdom transfer is well supported within the AT-like clade, it is reasonable to suggest that early GDPD-like SMaseD/PLDs radiated widely through LGT, not only among proteobacteria but also between proteobacteria and eukaryotes.
Putative functions of aquatic clade proteins
The AT-like and ST-like groups contain homologs that are known toxins, though some homologs could have other functions. Meanwhile, the newly discovered Aquatic group has not been established to contain toxins, and several lines of evidence suggest a diverse array of functions. For example, the slime mold Physarum polycephalum, which expresses a member of the Aquatic clade, produces a biologically active cyclic lysophosphatidic acid (cPA) that inhibits cell proliferation [59,60,61]. Given that Sicariid PLD toxins generate cPA from lysophospholipid substrates , the Aquatic clade homolog is a candidate for the cPA synthase in Physarum. It is noteworthy that the cPA was isolated from a single-celled form of the organism, the myxamoebae, and the strongest expression evidence for the Aquatic PLD also comes from single-cell transcriptomes. Two other members of the Aquatic clade, from the brachiopod Laqueus rubellus and the hydrozoan Hydractinia symbiolongicarpus, show possible evidence for tissue-specific expression. The Hydractinia sequence was identified from a study of transcriptomes in feeding, reproductive and defensive polyps, and was expressed specifically in feeding polyps . The Laqueus sequence was reported in a proteomic and transcriptomic study of the brachiopod shell matrix, and was identified specifically from the insoluble organic matrix . These observations suggest that members of the Aquatic clade may play diverse biological roles. The majority of members of this family also have one or more copies of a ~ 70-residue cysteine-rich C-terminal domain (Additional file 3: Figure S2). This domain may ultimately provide clues to Aquatic clade function, but as of now it has no apparent homology to any known domain family.
The GDPD-like SMaseD/PLD enzymes share a single common ancestor and are a monophyletic domain family. The ancestor evolved from a GDPD enzyme, most likely a multidomain protein from bacteria, and acquired several novel features including shortened βα1 and βα2 loops and a C-terminal motif. Descendants of this ancestor have radiated extensively, at least in part by ancient lateral gene transfer. Three major clades emerged, with one (ST-like) now found in corals and arthropods, one (AT-like) in bacteria and fungi, and a third (Aquatic) in a wide array of aquatic organisms. GDPD-like SMaseD/PLDs are ancient and broadly established within a few major lineages, such as the chelicerates and ctenophores. Overall, however, the evolution of this family appears highly dynamic and includes gene duplication and gene loss, in addition to extensive lateral gene transfer. Traditionally known as toxins, GDPD-like SMaseD/PLD enzymes may carry out a wide array of biological functions, which may be illuminated by future investigations.
Materials and Methods
Initial protein BLAST searches
Initial protein BLAST (blastp) searches of the NCBI nonredundant protein database, using three sicariid toxins with known structure as query sequences (PDB IDs 4Q6X, 3RLH and 1XX1), yielded over 300 high-coverage (> 75%) hits to proteins from 35 organisms in the class Arachnida (all at E-values <1e-25), including hits to 27 Sicariid spiders, 2 non-Sicariid spiders, and 6 mite and tick species (Acari). A single high-coverage hit was also retrieved for a homolog in Merostomata (horseshoe crab, E = 1e-50 to 1e-63, 35–39% ID), a distantly related class within the arthropod subphylum Chelicerata. This suggested broad conservation of SicTox-like sequences within the Chelicerata. Surprisingly, two high-coverage hits at this high level of similarity (E < 1e-32 to all queries) were also found to hypothetical non-chelicerate proteins, one from Acropora digitifera (a stony coral, in the phylum Cnidaria) and one from Rhagoletis zephyria (snowberry fruit fly, an insect).
The same searches yielded 11 weaker but significant high-coverage hits to non-chelicerate proteins (1e-17 < E < 1e-03, and 24–31% sequence identity, to at least two of the three SicTox query sequences). These included single hypothetical protein sequences from five highly diverse proteobacterial species representing the α-, β-, γ- and δ-proteobacteria (Ahrensia sp. R2A130, Oceanospirillum beijerinckii DSM 7166, Methylibium sp. YR605, Pseudomonas hussainii and Desulfoluna spongiphila), along with six sequences from two water flea (a crustacean) species, Daphnia pulex and Daphnia magna. Each of these proteins showed similarity to at least 1 chelicerate SicTox homolog at E-value <1e-10 when used as queries themselves, while showing no similarity below E < 1e-05 to the group of bacterial/fungal PLD homologs described in Dias-Lopes et al.  A set of eight of these sequences also showed higher similarity to each other (E-values <1e-20) than to the chelicerate SicTox proteins or to the Acropora and Rhagoletis sequences described above. These findings suggested the existence of a novel family of GDPD-like SMase D/PLDs, reasonably closely related to the ST-like family but distinct from it.
Initial blastp searches of the NCBI nonredundant protein database were also conducted using three AT-like (actinobacterial or ascomycotal) PLD toxin representatives as query sequences. Two query sequences were selected based on recent biochemical characterization : one (Genbank: AAA21882) from the actinobacterium Arcanobacterium haemolyticum and one (Genbank: EFW19765) from the fungus Coccidioides posadasii, a representative of the Eurotiomycete class. A third query sequence (Genbank: EFY88254) was selected from Metarhizium acridum CQMa102, a representative of the Sordariomycete class and a basal fungal sequence in the phylogenetic analysis of Dias-Lopes et al. . These searches yielded a combined total of 249 unique high-coverage hits at E < 1e-25. This dataset included representatives of 9 genera, 5 families and 5 orders within the Actinobacteria; and 19 genera, 9 families, 5 orders and 3 classes within Ascomycota. In addition, outside of these lineages, very strong hits (E < 1e-50 to all queries) were found to a hypothetical protein in Xenorhabdus maleonii, a proteobacterium, and Serendipita vermifera, a fungus in the phylum Basidiomycota. Finally, at a much lower level of similarity, a weak but high-coverage hit was observed from all queries (at E < 1e-03) to a hypothetical protein from the oceanic diatom Thalassiosira oceanica. A blastp search with the Thalassiosira sequence showed similarity to 8 actinobacterial/fungal PLD homologs at E < 1e-05. Thus, the new searches revealed that the AT-like family has clear homologs (some very close and some relatively distant) in at least three species outside of Actinobacteria and Ascomycota.
Finally, a blastp search was conducted with a singleton sequence from Burkholderia cenopacia, reported as a toxin-like PLD by Dias-Lopes et al.  This search returned the Thalassiosira oceanica sequence mentioned above as a strong hit (E~1e-30).
Protein BLAST hits from newly represented lineages are summarized in Additional file 1: Table S1. Representative hits (see sequence filtering below) from arachnids, actinobacteria and ascomycota are included in Tables S6 and S7.
Translated blast and other database searching
To supplement the updated protein database for GDPD-like SMase D/PLDs, an exhaustive set of translated BLAST (tblastn) searches was then conducted on NCBI whole genome shotgun (WGS), transcriptome (TSA) and EST databases across all living organisms, along with blastp searches of the NCBI transcriptome protein database. These searches employed the original query sequences (see above) but also included searches initiated from homologs from newly represented lineages identified in the original blastp searches. Hits resulting from these searches are included in Additional file 1: Tables S2-S5. For the novel Aquatic group (Additional file 1: Table S2) and to some extent for the ST-like group (Additional file 1: Table S3), these searches revealed interesting species distributions, and special care was taken to obtain the most complete and accurate phylogenetic representation (see below).
For newly represented phyla, these searches were further supplemented, where possible, with deeper analysis on web databases dedicated to particular organisms or phyla, or on NCBI sequence read archive (SRA) datasets. This included analysis of ctenophore transcriptome data at http://neurobase.rc.ufl.edu/pleurobrachia; Mnemiopsis leidyi genome/transcriptome data at http://research.nhgri.nih.gov/mnemiopsis; Daphnia genome data at http://wfleabase.org and http://genome.jgi.doe.gov/; Acropora digitifera genome data at http://marinegenomics.oist.jp/coral; Physarum polycephalum genome data at http://www.physarum-blast.ovgu.de; Adineta vaga genome/transcriptome data at http://genoscope.cns.fr/adineta; Macrostomum lignano genome/transcriptome data at www.macgenome.org; Strigamia maritima genome data at https://www.hgsc.bcm.edu/arthropods/geophilimorph-centipede-genome-project and http://metazoa.ensembl.org; anthozoan transcriptome data at http://people.oregonstate.edu/~meyere/data.html (keyword search); NCBI SRA genome sequencing data for Amoebidium parasiticum; and a draft assembly for Hydractinia echinata at https://bica.nhgri.nih.gov/hydractinia/. These analyses led, for example, to the inference of widespread representation of the Aquatic group in the phylum Ctenophora, and to genome-level confirmation of representation in the hydrozoan class within Cnidaria.
There was some concern that homologous sequences could be missed in genome searches due to relatively low sequence similarity to the query, combined with interruption by introns. An illustrative example is the member of the Aquatic group from Physarum polycephalum. In tblastn searches of the Physarum transcriptome, all three of the original Aquatic group queries used gave strong hits (E-value <1e-25) to the homologous transcript; by contrast, none of the queries yielded hits to sequences in the Physarum genome. When the translated Physarum transcript was used as a query, however, a set of strong hits to the Physarum genome emerged. The reason for this discrepancy is that the Physarum homolog is relatively distant from the query sequences and its gene has at least 6 introns.
To minimize missed homologs in translated genomes, a second round of tblastn searches was done using representatives of a particular phylum or class against all genomes within that phylum or class. Some of these searches confirmed conspicuous absences suggestive of either gene loss or lateral gene transfer. For example, searches with hydrozoan queries in the Aquatic group gave no hits in the representative hydrozoan genome Hydra vulgaris, other than to a cysteine-rich C-terminal domain; searches with anthozoan queries in the ST-like group gave no hits in several anthozoan genomes including Nematostella vectensis, Exaiptasia pallida, Orbicella faveolata, and Stylophora pistillata. Further examples are discussed within the main text.
For transcriptome hits, hypothetical amino-acid sequences were generally inferred by simple open reading frame analysis of the mRNA sequence, using either DNA Strider or NCBI ORF Finder. Some genome hits were supported by transcript data and vice versa, and in such cases gene models were often available for inference of a hypothetical amino-acid sequence. In certain cases, however, existing gene models appeared to be incorrect. For example, one gene model had a PLD domain within the 5’-UTR of a gene, and more detailed analysis of SRA transcript data suggested that an intron had been missed. In such cases, alternative gene models were used to obtain hypothetical amino-acid sequences. Finally, some translated genome hits lacked any associated transcript data or gene model, but belonged to apparently single-exon genes, based on analysis with NCBI ORF Finder and inspection of amino-acid sequence alignments with homologs. In these cases, hypothetical protein sequences were inferred from direct translation of the genome sequence.
Filtering of sequences
BLAST hits to SicTox queries were dominated by arachnid homologs, and within arachnids, sicariid spider homologs dominate. In generating a representative set of arachnid sequences for further analysis, the sicariid hits were discarded and replaced with a representative set of 18 sequences spanning the known phylogeny of SicTox proteins from sicariids, and including the three initial query sequences of known structure (4Q6X, 1XX1 and 3RLH). Sequences from other arachnids were filtered at 95% redundancy but were otherwise retained unless they were highly incomplete at the termini or contained large deletions. One hypothetical protein sequence from Stegodyphus mimosarum (KFM59798) was retained for phylogenetic analysis despite containing only 75% of a PLD domain, due to its importance as a representative of close SicTox homologs in non-sicariid spiders. The final representative set of arachnid sequences (Additional file 1: Table S6; all ultimately assigned to the ST-like clade) contained a total of 62 sequences, including 32 representatives from spiders, 17 from ticks, 8 from mites and 5 from scorpions. While most arachnid sequences came directly from the nonredundant NCBI protein database, 13 tick sequences were derived from the transcriptome shotgun assembly protein database, and 5 hypothetical spider protein sequences were derived from translated BLAST hits from genomes, in cases where the entire PLD domain appeared to reside within a single exon. Sequences from the scorpion Mesobuthus martensii were derived from translated BLAST hits that were then mapped to protein sequence models downloaded from http://lifecenter.sgst.cn/main/en/scorpion.jsp.
Ascomycotal and actinobacterial sequences
BLAST hits to actinobacterial and fungal queries (AT-like family) were dominated by homologs from these two lineages, and a fairly strict approach was taken in choosing a representative set for further analysis. First, highly incomplete sequences (< 225 residues in length), and sequences with large deletions that included conserved active site residues, were removed unless otherwise specified. The remaining sequences were then filtered for redundancy at 90% identity. Two small sequence subfamilies were also removed from the dataset because they are likely to have diverged functionally and may lack PLD activity, however they may be interesting subjects for future investigation. First, one group of fungal sequences (XP_014576785 from Metarhizium majus, XP_007808303 from Metarhizium acridum, and XP_018143442 from Pochonia chlamydospora) exhibited extremely divergent active sites, including nonconservative active site replacements at His 12, Glu 32 and His 47, and in some cases also Asp 34, Asp 91 and Lys 93. Second, a group of bacterial sequences (WP_083462538 from Kitasatospora griseola and WP_037599565 from Streptacidiphilus rugosus) exhibited a considerably longer active site loop lacking any histidine corresponding to His 47. This group of sequences also contained probable N-terminal carbohydrate- or actin-binding domains, while almost all other AT-like sequences are single-domain proteins. Aside from those sequences, all ascomycotal and actinobacterial genera represented in the original set of BLAST hits were represented in the filtered set, except for Hirsutella, a fungal genus with only a fragmentary blastp hit. The representative protein alignment was supplemented with 14 translated, putatively intronless sequences obtained from tblastn searches of whole genome shotgun or transcriptome data. Sequences were only added if they represented new genera, and were also filtered at 90% redundancy. New phyla represented by these sequences included two different additional classes in the Ascomycota (Dothideomycetes and Lecanoromycetes). The final representative set of homologs from Actinobacteria and Ascomycota (Additional file 1: Table S7; all eventually assigned to the same AT-like clade) contained 72 sequences from 9 genera, 5 families and 5 orders within Actinobacteria; and 32 genera, 14 families, 9 orders and 5 classes of Ascomycota.
Sequences from other lineages
In general, sequences from newly represented lineages were retained for phylogenetic analysis unless they were highly redundant (95% ID level), highly fragmentary (e.g. < 75% complete PLD domain), contained major deletions, or were strongly suspected of being contaminants (see below).
Among ST-like proteins, several sequences putatively belonging to plants (Humulus lupulus and Ambrosia trifida) proved to be identical to ST-like proteins from plant-feeding mites (Tetranychus urticae). On the basis of this apparent instance of mite contamination, ST-like sequences outside of chelicerates showing high identity (> 50%) to known mite sequences were flagged as possible contaminants (see Additional file 1: Table S3). These included two hits putatively from the genome of the snowberry fruit fly Rhagoletis zephyria, which is in an early state of assembly at present; most hits from plant transcriptomes; and a fragmentary hit from the crustacean Talitrus saltator. The Rhagoletis hits are on relatively short unplaced scaffolds and could not be verified with available transcript data, nor were they supported by transcript data from Rhagoletis pomonella. In the AT-like group, two Blastp hits (JAV87767 and JAV94811) were recovered from fragmentary sequences putatively from Photinus pyralis, a species of firefly. These Photinus hits were derived from transcriptome shotgun data, and both are > 80% identical to proteins from fungi in the genus Metarhizium, which is comprised of entomopathogens. Cross-species contamination is strongly suspected here as well. Within the Aquatic group, several sequences putatively belonging to Oreochromis niloticus (a fish commonly known as tilapia) proved to be identical to confirmed genome and transcriptome sequences from Amoebidium parasiticum, a microorganism that is not found in association with tilapia but which had been sequenced at the same institute (Broad Institute). The sequences above were generally removed for phylogenetic analysis, except to illustrate contamination in the case of a Rhagoletis sequence (Fig. 5).
Sporadic hits from weakly represented lineages may be regarded as suspect, or tentative, even if no likely source of contamination can be identified. Specifically, some hits assigned to the Aquatic group (see Additional file 1: Table S2) came from transcriptome or genome data in phyla/classes for which a presence was not supported by at least two types of data (e.g. genome/transcriptome or transcriptome/proteome). These included the crustacean subphylum/class Crustacea/Malacostraca within the phylum Arthropoda (transcriptome hits from three species); the class Bivalvia within the phylum Mollusca (transcriptome hits from four species); the classes Alphaproteobacteria, Betaproteobacteria and Gammaproteobacteria within the bacterial phylum Proteobacteria (1 protein hit each, inferred from genomic data). In the case of the proteobacterial hits, contamination is unlikely because of the presence of genes on the DNA contig/scaffold with close homologs in the genome of species in the same genus. In the other cases, the sequences were retained for phylogenetic analyses but were flagged as lower confidence representatives in Additional file 1: Table S2, and in Fig. 6.
Sequence alignment and phylogenetic tree construction
Sequences were aligned using ClustalX . Maximum-likelihood (ML) phylogenetic trees were constructed with RaxML  at the CIPRES Science Gateway (https://www.phylo.org), using WAG + γ4 models with observed frequencies, as recommended by analysis using ProtTest . Although many homologs contain additional domains, especially at the C terminus, only the catalytic PLD domain sequence was used for tree construction, not including N-terminal signal sequences. Trees were rooted in RaxML using 6 GDPD sequences of known structure as outgroups. GDPD of known structure were judged to be best for outgroup rooting, as they allow for the highest possible quality sequence alignment to the ingroup using structure-structure alignment. Structural similarity searches were conducted using VAST  with known SicTox structures as query structures, and the four most similar GDPD structures were chosen (3QVQ, 3NO3, 3 L12 an 2O55). Second, two-round PSI-BLAST searches of the PDB were conducted with Sicariid toxins of known structure as queries, and the three best GDPD hits were chosen (3 L12, 2PZ0 and 2CH0). This produced a total of 6 candidate GDPD outgroups from the bacterial phyla Proteobacteria, Bacteroidetes and Firmicutes, as well as one sequence from the red algae Galdieria sulphuraria. We aligned the structures and sequences using PROMALS3D  and Chimera. For rooting, the character set was limited to 128 best-aligned positions, including the β-barrel framework plus helices α1, α2, and parts of α3, α7 and α8. These regions corresponded to sequence blocks that were 1) alignable within 5 Å in a Chimera Matchmaker alignment, and 2) only the blocks within that set where the Chimera and PROMALS3D alignments agreed. During rooting no restrictions were placed on the ingroup topology. All 6 outgroups, individually and together, rooted the tree on the same branch in the best ML tree. Although the structural alignment introduces a potential bias toward rooting in the Sicariid toxin-like group, the root position lies outside of it.
Phylogenetic hypothesis testing for the Aquatic clade was performed by calculating best maximum likelihood trees with and without a multifurcating constraint tree representing each hypothetical monophyletic group. Statistical tree topology tests (approximately unbiased test)  were then conducted as implemented on the IQ-Tree Web Server (http://iqtree.cibiv.univie.ac.at/) . Tests were performed for various eukaryotic clade hypotheses on datasets both including and excluding bacterial sequences.
A partial phylogenetic tree for eukaryotic organisms (Fig. 9) was constructed based on a variety of literature sources. Ecdysozoan phylogeny was based on the following references: [39, 71, 72]. Spiralian (Lophotrochozoa) phylogeny was based on the following references: [48, 73]. Cnidarian phylogeny was based on the following references: [42, 74].
Analysis of domains, operons and genomic islands
Domain families were identified using batch CD-search at the NCBI Conserved Domain Database, with E-values of 1e-05 or less being accepted as significant hits. Signal peptides were identified using SignalP version 4.0 . Sequences that included substantial N- or C-terminal regions outside the GDPD-like SMaseD/PLD region were also analyzed using FFAS03 against the Pfam and PDB databases, in an effort to identify more remote homologies missed by CD-search . FFAS scores lower than − 10 were considered significant. Proteobacterial genomic DNA was analyzed for genomic islands with Island Viewer 3 (http://www.pathogenomics.sfu.ca/islandviewer3/browse/), which integrates the programs SIGI-HMM, Island Pick and IslandPath-DIMOB . Prophage regions were predicted using PHAST . The genes of the plasmid from Burkholderia cenocepacia strain HI2424 were analyzed by BLAST and found to include genes coding for a virB4 homolog and a relaxase, along with a complement of tra genes similar to that found on F-plasmids. Based on this analysis the plasmid should be classified as conjugative (or at least mobilizable) .
Basic Local Alignment Search Tool
Fold and Function Assignment
Glycerophosphoryl diester phosphodiester phosphodiesterase
Lateral gene transfer
Messenger ribonucleic acid
National Center for Biotechnology Information
Open reading frame
Protein Data Bank
Sequence read archive
Vector Alignment Search Tool
Cordes MH, Binford GJ. Lateral gene transfer of a dermonecrotic toxin between spiders and bacteria. Bioinformatics. 2006;22(3):264–8.
Dias-Lopes C, Neshich IA, Neshich G, Ortega JM, Granier C, Chavez-Olortegui C, Molina F, Felicori L. Identification of new sphingomyelinases D in pathogenic fungi and other pathogenic organisms. PLoS One. 2013;8(11):e79240.
Fry BG, Roelants K, Champagne DE, Scheib H, Tyndall JD, King GF, Nevalainen TJ, Norman JA, Lewis RJ, Norton RS, et al. The toxicogenomic multiverse: convergent recruitment of proteins into animal venoms. Annu Rev Genomics Hum Genet. 2009;10:483–511.
Binford GJ, Bodner MR, Cordes MH, Baldwin KL, Rynerson MR, Burns SN, Zobel-Thropp PA. Molecular evolution, functional variation, and proposed nomenclature of the gene family that includes sphingomyelinase D in sicariid spider venoms. Mol Biol Evol. 2009;26(3):547–66.
Binford GJ, Wells MA. The phylogenetic distribution of sphingomyelinase D activity in venoms of Haplogyne spiders. Comp Biochem Physiol B Biochem Mol Biol. 2003;135(1):25–33.
Zobel-Thropp PA, Kerins AE, Binford GJ. Sphingomyelinase D in sicariid spider venom is a potent insecticidal toxin. Toxicon. 2012;60(3):265–71.
Tambourgi DV, Magnoli FC, van den Berg CW, Morgan BP, de Araujo PS, Alves EW, Da Silva WD. Sphingomyelinases in the venom of the spider Loxosceles intermedia are responsible for both dermonecrosis and complement-dependent hemolysis. Biochem Biophys Res Commun. 1998;251(1):366–73.
Alarcon-Chaidez FJ, Boppana VD, Hagymasi AT, Adler AJ, Wikel SK. A novel sphingomyelinase-like enzyme in Ixodes scapularis tick saliva drives host CD4 T cells to express IL-4. Parasite Immunol. 2009;31(4):210–9.
Borchani L, Sassi A, Shahbazzadeh D, Strub JM, Tounsi-Guetteti H, Boubaker MS, Akbari A, Van Dorsselaer A, El Ayeb M. Heminecrolysin, the first hemolytic dermonecrotic toxin purified from scorpion venom. Toxicon. 2011;58(1):130–9.
McNamara PJ, Cuevas WA, Songer JG. Toxic phospholipases D of Corynebacterium pseudotuberculosis, C. ulcerans and Arcanobacterium haemolyticum: cloning and sequence homology. Gene. 1995;156(1):113–8.
Selvy PE, Lavieri RR, Lindsley CW, Brown HA. Phospholipase D: enzymology, functionality, and chemical modulation. Chem Rev. 2011;111(10):6064–119.
Binford GJ, Cordes MH, Wells MA. Sphingomyelinase D from venoms of Loxosceles spiders: evolutionary insights from cDNA sequences and gene structure. Toxicon. 2005;45(5):547–60.
Murakami MT, Fernandes-Pedrosa MF, de Andrade SA, Gabdoulkhakov A, Betzel C, Tambourgi DV, Arni RK. Structural insights into the catalytic mechanism of sphingomyelinases D and evolutionary relationship to glycerophosphodiester phosphodiesterases. Biochem Biophys Res Commun. 2006;342(1):323–9.
Kurpiewski G, Forrester LJ, Barrett JT, Campbell BJ. Platelet aggregation and sphingomyelinase D activity of a purified toxin from the venom of Loxosceles reclusa. Biochim Biophys Acta. 1981;678(3):467–76.
Lee S, Lynch KR. Brown recluse spider (Loxosceles reclusa) venom phospholipase D (PLD) generates lysophosphatidic acid (LPA). Biochem J. 2005;391(Pt 2):317–23.
van Meeteren LA, Frederiks F, Giepmans BN, Pedrosa MF, Billington SJ, Jost BH, Tambourgi DV, Moolenaar WH. Spider and bacterial sphingomyelinases D target cellular lysophosphatidic acid receptors by hydrolyzing lysophosphatidylcholine. J Biol Chem. 2004;279(12):10833–6.
Lajoie DM, Roberts SA, Zobel-Thropp PA, Delahaye JL, Bandarian V, Binford GJ, Cordes MH. Variable substrate preference among phospholipase D toxins from Sicariid spiders. J Biol Chem. 2015;290(17):10994–1007.
Corda D, Mosca MG, Ohshima N, Grauso L, Yanaka N, Mariggio S. The emerging physiological roles of the glycerophosphodiesterase family. FEBS J. 2014;281(4):998–1016.
Lajoie DM, Cordes MH. Spider, bacterial and fungal phospholipase D toxins make cyclic phosphate products. Toxicon. 2015;108:176–80.
Ohshima N, Kudo T, Yamashita Y, Mariggio S, Araki M, Honda A, Nagano T, Isaji C, Kato N, Corda D, et al. New members of the mammalian glycerophosphodiester phosphodiesterase family: GDE4 and GDE7 produce lysophosphatidic acid by lysophospholipase D activity. J Biol Chem. 2015;290(7):4260–71.
Songer JG, Libby SJ, Iandolo JJ, Cuevas WA. Cloning and expression of the phospholipase D gene from Corynebacterium pseudotuberculosis in Escherichia coli. Infect Immun. 1990;58(1):131–6.
Cuevas WA, Songer JG. Arcanobacterium haemolyticum phospholipase D is genetically and functionally similar to Corynebacterium pseudotuberculosis phospholipase D. Infect Immun. 1993;61(10):4310–6.
Shi L, Liu JF, An XM, Liang DC. Crystal structure of glycerophosphodiester phosphodiesterase (GDPD) from Thermoanaerobacter tengcongensis, a metal ion-dependent enzyme: insight into the catalytic mechanism. Proteins. 2008;72(1):280–8.
Lajoie DM, Zobel-Thropp PA, Kumirov VK, Bandarian V, Binford GJ, Cordes MH. Phospholipase D toxins of brown spider venom convert lysophosphatidylcholine and sphingomyelin to cyclic phosphates. PLoS One. 2013;8(8):e72372.
Santelli E, Schwarzenbacher R, McMullan D, Biorac T, Brinen LS, Canaves JM, Cambell J, Dai X, Deacon AM, Elsliger MA, et al. Crystal structure of a glycerophosphodiester phosphodiesterase (GDPD) from Thermotoga maritima (TM1621) at 1.60 A resolution. Proteins. 2004;56(1):167–70.
Murakami MT, Fernandes-Pedrosa MF, Tambourgi DV, Arni RK. Structural basis for metal ion coordination and the catalytic mechanism of sphingomyelinases D. J Biol Chem. 2005;280(14):13658–64.
Bateman A, Sandford R. The PLAT domain: a new piece in the PKD1 puzzle. Curr Biol. 1999;9(16):R588–90.
Naylor CE, Jepson M, Crane DT, Titball RW, Miller J, Basak AK, Bolgiano B. Characterisation of the calcium-binding C-terminal domain of Clostridium perfringens alpha-toxin. J Mol Biol. 1999;294(3):757–70.
Kulkarni S, Das S, Funk CD, Murray D, Cho W. Molecular basis of the specific subcellular localization of the C2-like domain of 5-lipoxygenase. J Biol Chem. 2002;277(15):13167–74.
Xu Y, Streets AJ, Hounslow AM, Tran U, Jean-Alphonse F, Needham AJ, Vilardaga JP, Wessely O, Williamson MP, Ong AC. The Polycystin-1, lipoxygenase, and alpha-toxin domain regulates Polycystin-1 trafficking. J Am Soc Nephrol. 2016;27(4):1159–73.
Ding YH, Du YG, Luo S, Li YX, Li TM, Yoshina S, Wang X, Klage K, Mitani S, Ye K, et al. Characterization of PUD-1 and PUD-2, two proteins up-regulated in a long-lived daf-2 mutant. PLoS One. 2013;8(6):e67158.
Shimizu T, Vassylyev DG, Kido S, Doi Y, Morikawa K. Crystal structure of vitelline membrane outer layer protein I (VMO-I): a folding motif with homologous Greek key structures related by an internal three-fold symmetry. EMBO J. 1994;13(5):1003–10.
Griessl MH, Schmid B, Kassler K, Braunsmann C, Ritter R, Barlag B, Stierhof YD, Sturm KU, Danzer C, Wagner C, et al. Structural insight into the giant ca(2)(+)-binding adhesin SiiE: implications for the adhesion of Salmonella enterica to polarized epithelial cells. Structure. 2013;21(5):741–52.
Vance TD, Olijve LL, Campbell RL, Voets IK, Davies PL, Guo S. Ca2+−stabilized adhesin helps an Antarctic bacterium reach out and bind ice. Biosci Rep. 2014;34(4). https://doi.org/10.1042/BSR20140083.
Kido S, Doi Y, Kim F, Morishita E, Narita H, Kanaya S, Ohkubo T, Nishikawa K, Yao T, Ooi T. Characterization of vitelline membrane outer layer protein I, VMO-I: amino acid sequence and structural stability. J Biochem. 1995;117(6):1183–91.
Simpson SD, Ramsdell JS, Watson Iii WH, Chabot CC. The draft genome and transcriptome of the Atlantic horseshoe crab, Limulus polyphemus. Int J Genomics. 2017;2017:7636513.
Chipman AD, Ferrier DE, Brena C, Qu J, Hughes DS, Schroder R, Torres-Oliva M, Znassi N, Jiang H, Almeida FC, et al. The first myriapod genome sequence reveals conservative arthropod gene content and genome organisation in the centipede Strigamia maritima. PLoS Biol. 2014;12(11):e1002005.
Rehm P, Meusemann K, Borner J, Misof B, Burmester T. Phylogenetic position of Myriapoda revealed by 454 transcriptome sequencing. Mol Phylogenet Evol. 2014;77:25–33.
Regier JC, Shultz JW, Zwick A, Hussey A, Ball B, Wetzer R, Martin JW, Cunningham CW. Arthropod relationships revealed by phylogenomic analysis of nuclear protein-coding sequences. Nature. 2010;463(7284):1079–83.
Lozano-Fernandez J, Carton R, Tanner AR, Puttick MN, Blaxter M, Vinther J, Olesen J, Giribet G, Edgecombe GD, Pisani D. A molecular palaeobiological exploration of arthropod terrestrialization. Philos Trans R Soc Lond Ser B Biol Sci. 2016;371(1699). https://doi.org/10.1098/rstb.2015.0133.
Shinzato C, Shoguchi E, Kawashima T, Hamada M, Hisata K, Tanaka M, Fujie M, Fujiwara M, Koyanagi R, Ikuta T, et al. Using the Acropora digitifera genome to understand coral responses to environmental change. Nature. 2011;476(7360):320–3.
Kayal E, Bentlage B, Pankey MS, Ohdera AH, Medina M, Plachetzki DC, Collins AG, Ryan JF. Phylogenomics provides a robust topology of the major cnidarian lineages and insights on the origins of key organismal traits. BMC Evol Biol. 2018;18:68.
Babonis LS, Martindale MQ, Ryan JF. Do novel genes drive morphological novelty? An investigation of the nematosomes in the sea anemone Nematostella vectensis. BMC Evol Biol. 2016;16(1):114.
Wu C, Jordan MD, Newcomb RD, Gemmell NJ, Bank S, Meusemann K, Dearden PK, Duncan EJ, Grosser S, Rutherford K, et al. Analysis of the genome of the New Zealand giant collembolan (Holacanthella duospinosa) sheds light on hexapod evolution. BMC Genomics. 2017;18(1):795.
Nowell RW, Almeida P, Wilson CG, Smith TP, Fontaneto D, Crisp A, Micklem G, Tunnacliffe A, Boschetti C, Barraclough TG. Comparative genomics of bdelloid rotifers: insights from desiccating and nondesiccating species. PLoS Biol. 2018;16(4):e2004830.
Chapman JA, Kirkness EF, Simakov O, Hampson SE, Mitros T, Weinmaier T, Rattei T, Balasubramanian PG, Borman J, Busam D, et al. The dynamic genome of Hydra. Nature. 2010;464(7288):592–6.
Dunn CW, Hejnol A, Matus DQ, Pang K, Browne WE, Smith SA, Seaver E, Rouse GW, Obst M, Edgecombe GD, et al. Broad phylogenomic sampling improves resolution of the animal tree of life. Nature. 2008;452(7188):745–9.
Edgecombe GD, Giribet G, Dunn CW, Hejnol A. Higher-level metazoan relationships: recent progress and remaining questions. Org Divers Evol. 2011;11:151–72.
Tailliez P, Laroui C, Ginibre N, Paule A, Pages S, Boemare N. Phylogeny of Photorhabdus and Xenorhabdus based on universally conserved protein-coding sequences and implications for the taxonomy of these two genera. Proposal of new taxa: X. vietnamensis sp. nov., P. luminescens subsp. caribbeanensis subsp. nov., P. luminescens subsp. hainanensis subsp. nov., P. temperata subsp. khanii subsp. nov., P. temperata subsp. tasmaniensis subsp. nov., and the reclassification of P. luminescens subsp. thracensis as P. temperata subsp. thracensis comb. nov. Int J Syst Evol Microbiol. 2010;60(Pt 8):1921–37.
Fischer-Le Saux M, Mauleon H, Constant P, Brunel B, Boemare N. PCR-ribotyping of Xenorhabdus and Photorhabdus isolates from the Caribbean region in relation to the taxonomy and geographic distribution of their nematode hosts. Appl Environ Microbiol. 1998;64(11):4246–54.
Tailliez P, Pages S, Ginibre N, Boemare N. New insight into diversity in the genus Xenorhabdus, including the description of ten novel species. Int J Syst Evol Microbiol. 2006;56(Pt 12):2805–18.
Pauza CD, Karels MJ, Navre M, Schachman HK. Genes encoding Escherichia coli aspartate transcarbamoylase: the pyrB-pyrI operon. Proc Natl Acad Sci U S A. 1982;79(13):4020–4.
Bommer D, Schaferjohann J, Bowien B. Identification of cbbBc as an additional distal gene of the chromosomal cbb CO2 fixation operon from Ralstonia eutropha. Arch Microbiol. 1996;166(4):245–51.
Van Melderen L, Saavedra De Bast M. Bacterial toxin-antitoxin systems: more than selfish entities? PLoS Genet. 2009;5(3):e1000437.
Thomas GM, Poinar GO. Xenorhabdus, new genus of entomopathogenic, nematophilic bacteria of the family Enterobacteriaceae. Int J Syst Bacteriol. 1979;29:352–60.
Liu M, Chaverri P, Hodge KT. A taxonomic revision of the insect biocontrol fungus Aschersonia aleyrodis, its allies with white stromata and their Hypocrella sexual states. Mycol Res. 2006;110(Pt 5):537–54.
Dobrindt U, Hochhut B, Hentschel U, Hacker J. Genomic islands in pathogenic and environmental microorganisms. Nat Rev Microbiol. 2004;2(5):414–24.
Garcillan-Barcia MP, Francia MV, de la Cruz F. The diversity of conjugative relaxases and its application in plasmid classification. FEMS Microbiol Rev. 2009;33(3):657–87.
Murakami-Murofushi K, Shioda M, Kaji K, Yoshida S, Murofushi H. Inhibition of eukaryotic DNA polymerase alpha with a novel lysophosphatidic acid (PHYLPA) isolated from myxoamoebae of Physarum polycephalum. J Biol Chem. 1992;267(30):21512–7.
Murakami-Murofushi K, Kaji K, Kano K, Fukuda M, Shioda M, Murofushi H. Inhibition of cell proliferation by a unique lysophosphatidic acid, PHYLPA, isolated from Physarum polycephalum: signaling events of antiproliferative action by PHYLPA. Cell Struct Funct. 1993;18(5):363–70.
Takahashi Y, Shimada Y, Shioda M, Yoshida S, Murofushi H, Murakami-Murofushi K. Isolation of a new species of Physarum lysophosphatidic acid, PHYLPA, and its effect on DNA polymerase activity. Cell Struct Funct. 1993;18(3):135–8.
Sanders SM, Shcheglovitova M, Cartwright P. Differential gene expression between functionally specialized polyps of the colonial hydrozoan Hydractinia symbiolongicarpus (phylum Cnidaria). BMC Genomics. 2014;15:406.
Isowa Y, Sarashina I, Oshima K, Kito K, Hattori M, Endo K. Proteome analysis of shell matrix proteins in the brachiopod Laqueus rubellus. Proteome Sci. 2015;13:21.
Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 1997;25(24):4876–82.
Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30(9):1312–3.
Abascal F, Zardoya R, Posada D. ProtTest: selection of best-fit models of protein evolution. Bioinformatics. 2005;21(9):2104–5.
Gibrat JF, Madej T, Bryant SH. Surprising similarities in structure comparison. Curr Opin Struct Biol. 1996;6(3):377–85.
Pei J, Kim BH, Grishin NV. PROMALS3D: a tool for multiple protein sequence and structure alignments. Nucleic Acids Res. 2008;36(7):2295–300.
Shimodaira H. An approximately unbiased test of phylogenetic tree selection. Syst Biol. 2002;51(3):492–508.
Trifinopoulos J, Nguyen LT, von Haeseler A, Minh BQ. W-IQ-TREE: a fast online phylogenetic tool for maximum likelihood analysis. Nucleic Acids Res. 2016;44(W1):W232–5.
Giribet G, Edgecombe GD. Current understanding of Ecdysozoa and its internal phylogenetic relationships. Integr Comp Biol. 2017;57(3):455–66.
Schwentner M, Combosch DJ, Pakes Nelson J, Giribet G. A Phylogenomic solution to the origin of insects by resolving crustacean-hexapod relationships. Curr Biol. 2017;27(12):1818–1824 e1815.
Kocot KM, Struck TH, Merkel J, Waits DS, Todt C, Brannock PM, Weese DA, Cannon JT, Moroz LL, Lieb B, et al. Phylogenomics of Lophotrochozoa with consideration of systematic error. Syst Biol. 2017;66(2):256–82.
Zapata F, Goetz FE, Smith SA, Howison M, Siebert S, Church SH, Sanders SM, Ames CL, McFadden CS, France SC, et al. Phylogenomic analyses support traditional relationships within Cnidaria. PLoS One. 2015;10(10):e0139068.
Petersen TN, Brunak S, von Heijne G, Nielsen H. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Methods. 2011;8(10):785–6.
Jaroszewski L, Rychlewski L, Li Z, Li W, Godzik A. FFAS03: a server for profile--profile sequence alignments. Nucleic Acids Res. 2005;33(Web Server issue):W284–8.
Dhillon BK, Laird MR, Shay JA, Winsor GL, Lo R, Nizam F, Pereira SK, Waglechner N, McArthur AG, Langille MG, et al. IslandViewer 3: more flexible, interactive genomic island discovery, visualization and analysis. Nucleic Acids Res. 2015;43(W1):W104–8.
Zhou Y, Liang Y, Lynch KH, Dennis JJ, Wishart DS. PHAST: a fast phage search tool. Nucleic Acids Res. 2011;39(Web Server issue):W347–52.
Smillie C, Garcillan-Barcia MP, Francia MV, Rocha EP, de la Cruz F. Mobility of plasmids. Microbiol Mol Biol Rev. 2010;74(3):434–52.
This research was supported by NIH grant R15 GM097676, to GJB and by NSF grant CHE1808716, to MHJC and GJB.
Availability of data and materials
All data generated or analysed during the current study are included in this published article [and its supplementary information files], and/or sequence identifiers are supplied that allow retrieval from public databases.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Tables S1-S8. Informational tables for GDPD-like SMaseD/PLD sequences considered in this study. (XLS 140 kb)
Figure S1. Sequence alignment of all GDPD-like SMaseD/PLD domains used for phylogenetic tree construction. Sequence names include an abbreviation for genus and species (e.g. Tetr_ur for Tetranychus urticae), preceded by a number to account for multiple homologs from a given species, and followed by a protein or nucleotide identifier for the database source of the sequence. (PDF 4708 kb)
Figure S2. Sequence alignment of cysteine-rich C-terminal domains found among Aquatic clade homologs. Note the presence of 10 perfectly conserved cysteine residues per domain, along with a glycine-rich motif and conserved tyrosine at the C-terminal end (asterisks). Sequence names include an abbreviation for genus and species, preceded by a number to account for multiple homologs from a given species and a letter to account for multiple domains within a homolog, e.g. “b1_Eurh_ex” denotes the 2nd cysteine-rich C-terminal domain found in homolog 1 from Eurhomalea exalbida. The sequence names is also tagged with a protein or nucleotide identifier for the database source of the sequence. (PDF 375 kb)