Evolutionary fates of universal stress protein paralogs in Platyhelminthes
© The Author(s). 2018
Received: 23 January 2017
Accepted: 23 January 2018
Published: 1 February 2018
Universal stress proteins (USPs) are present in all domains of life. Their expression is upregulated in response to a large variety of stress conditions. The functional diversity found in this protein family, paired with the sequence degeneration of the characteristic ATP-binding motif, suggests a complex evolutionary pattern for the paralogous USP-encoding genes. In this work, we investigated the origin, genomic organization, expression patterns and evolutionary history of the USP gene family in species of the phylum Platyhelminthes.
Our data showed a cluster organization, a lineage-specific distribution, and the presence of several pseudogenes among the USP gene copies identified. The absence of a well conserved -CCAATCA- motif in the promoter region was positively correlated with low or null levels of gene expression, and with amino acid changes within the ligand binding motifs. Despite evidence of the pseudogenization of various USP genes, we detected an important functional divergence at several residues, mostly located near sites that are critical for ligand interaction.
Our results provide a broad framework for the evolution of the USP gene family, based on the emergence of new paralogs that face very contrasting fates, including pseudogenization, subfunctionalization or neofunctionalization. This framework aims to explain the sequence and functional diversity of this gene family, providing a foundation for future studies in other taxa in which USPs occur.
The emergence of gene families is based on successive events of gene duplication. Duplicate copies can result from unequal crossing-over during meiosis or from retrotransposition processes. While a crossing-over mismatch can generate a duplication of the entire gene structure, including promoter regions, introns and exons, retrotransposition events usually result in an intronless gene that is composed only of the exons of the ancestral gene and gives rise to a single transcript. For each new duplicate copy, several outcomes are possible. The first is neofunctionalization, in which the new gene takes on a function different from that of the parental gene. The second is subfunctionalization, in which the new copy preserves its function but under a distinct spatio-temporal regulation (e.g., expression in a specific tissue and at a specific developmental stage). The third is pseudogenization, in which the duplicate copy accumulates deleterious mutations, leading to loss of function [2–4].
Members of the universal stress protein (USP) gene family are found in bacteria, archaea, and eukaryotes, and the number of copies varies between taxa due to lineage-specific expansions. These proteins are highly expressed in response to a large variety of stress conditions, such as oxidative stress, heat shock, and UV exposure [6–8]. In addition to stress resistance, they participate in the regulation of cell growth and host infection in Mycobacterium tuberculosis, and contribute to cell adhesion and motility in Escherichia coli. The USP protein domain contains a motif capable of interacting with ligands such as ATP, ADP, AMP, and GTP [9, 10]. In some USPs, the amino acid sequence making up this motif is partially or completely degenerated. In general, almost all USP crystal structures with the typical ATP-binding motif were solved with ATP or an ATP analog, while for USPs in which this motif is completely degenerated, neither ligand nor ion binding was observed. Although different functions have been identified for USPs with typical and degenerated ATP-binding motifs [6, 8, 12], the functional impact of amino acid substitutions at sites involved in ligand interaction remains poorly understood.
The Platyhelminthes include some harmful parasite species with a considerable negative impact on public health, especially in developing countries (World Health Organization, WHO, 2016). Several of the so-called neglected tropical diseases (NTDs), a diverse group of communicable diseases of the tropics, are caused by Platyhelminthes, including echinococcosis and schistosomiasis, which are responsible for 1200 and 11,700 deaths per year worldwide, respectively. The complex life cycle of parasitic Platyhelminthes involves interaction with two or more hosts, accompanied by drastic physiological and morphological changes [15, 16]. This continuous change of microenvironments exposes the parasites to a wide array of biotic and abiotic stressors [16–18]. A recent review discussed the use of USPs as novel anti-parasitic targets. USPs play an important role in the transition between different stages of the Schistosoma life cycle, including that between the cercaria and schistosomulum stages. Based on this, and on the absence of this gene family in vertebrates, including humans, USPs could represent an interesting target for anti-schistosomal treatment.
Here, we use comparative genomics and the relationship between protein sequence variations and gene expression patterns to build a framework for the evolution of the USP gene family in the Platyhelminthes. This framework aims to explain the sequence and functional diversity of this gene family, providing a foundation for future studies in other taxa in which USPs occur.
Sample collection, genotyping and quantitative PCR
Bovine hydatid cysts were obtained from the Cooperleo Abattoir (São Leopoldo, Rio Grande do Sul, Brazil). The pre-adult stage (protoscoleces, PSC) of Echinococcus ortleppi was collected by hydatid cyst fluid aspiration and washed with phosphate buffered saline (PBS). Genotyping was performed on part of the cytochrome c oxidase subunit I (cox1) gene as previously described.
For quantitative PCR (qPCR) expression analysis, approximately 1,000 PSC were mixed with 0.5 mL of TRIzol reagent (Thermo Fisher Scientific) and immediately frozen in liquid nitrogen until RNA extraction. Total RNA was isolated using TRIzol according to the manufacturer’s protocol. Isolated RNA was subsequently treated with RNase-free DNase I (Sigma-Aldrich) for 30 min at 25 °C to remove genomic DNA. Total RNA concentration was determined using a Qubit fluorometer (Thermo Fisher Scientific). The first strand of cDNA was synthesized from 200 ng of total RNA using M-MLV reverse transcriptase (Thermo Fisher Scientific) and an Oligo(dT)18 primer (0.5 μg/μL), following the manufacturer’s instructions. The final cDNA product was diluted 100-fold with nuclease-free water prior to use in qPCR experiments.
Real-time PCR was performed using an ABI Real-Time 7500 Fast PCR system (Applied Biosystems). Based on the genome of Echinococcus granulosus, specific primers were designed for six USP genes of E. ortleppi (Additional file 1: Table S1), two of which are downregulated and four of which are upregulated in the pre-adult form, according to RNA-seq data [21, 22]. The reaction mixture and the qPCR cycling conditions were as described previously. Control reactions without reverse transcriptase and without template were included to confirm the absence of genomic DNA and other PCR contaminants, respectively. All qPCR reactions were performed in technical and biological triplicates. The amplification efficiency was calculated using the LinRegPCR software. Gene expression was quantified with the ΔΔCt method, using EF-1α as the normalizer gene.
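The ΔΔCt calculation mentioned above can be sketched as follows. This is a minimal illustration, assuming an amplification efficiency close to 100% (a two-fold increase per cycle) and a calibrator sample chosen by the analyst; the function name and the Ct values are hypothetical and are not data from this study.

```python
def ddct_fold_change(ct_target_sample, ct_ref_sample,
                     ct_target_calibrator, ct_ref_calibrator,
                     efficiency=2.0):
    """Relative expression of a target gene by the ΔΔCt method.

    ct_ref_* are the Ct values of the normalizer gene (EF-1α in the text).
    efficiency=2.0 assumes perfect doubling per cycle (an assumption).
    """
    # ΔCt: target normalized to the reference gene within each sample
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    # ΔΔCt: sample compared with the calibrator
    ddct = delta_sample - delta_calibrator
    return efficiency ** (-ddct)


# Hypothetical Ct values, for illustration only
fold = ddct_fold_change(24.0, 18.0, 26.0, 18.0)
print(f"fold change relative to calibrator: {fold}")
```

When measured efficiencies (such as those estimated by LinRegPCR) deviate clearly from 2, an efficiency-corrected model would be a more appropriate choice than this simplified form.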
USP sequence retrieval
The USP sequences of twelve Platyhelminthes species (Gyrodactylus salaris, Macrostomum lignano, Schistosoma mansoni, Schistosoma haematobium, Opisthorchis viverrini, Clonorchis sinensis, Echinococcus granulosus, Echinococcus multilocularis, Echinococcus canadensis, Taenia solium, Hymenolepis microstoma, and Schmidtea mediterranea) were extracted from the WormBase ParaSite and SmedGD databases [25, 26] using the USP Pfam code PF00582 and the keyword “universal stress protein”. Orthologous relationships were initially obtained by reciprocal BLASTn, and confirmed by the presence of a monophyletic clade for each group of orthologs in the phylogenetic trees. USP sequences were retrieved based on the following criteria: synteny, the presence of a single exon in most sequences (or the conserved position of the intron where present), and phylogenetic relationships between orthologs. Low-homology sequences (probable pseudogenes) were identified through a tBLASTn search (BLOSUM45). For each species, we blasted each USP protein sequence against the entire genome, applying an e-value threshold of 1e-1, allowing the alignment of low complexity regions, and using gap opening and extension penalties of 14 and 2, respectively. To avoid the recovery of spurious hits, we used criteria similar to those used in the search for orthologs, as follows: synteny, genes with a single intron or intronless, and amino acid conservation in specific regions of the USP domain related to the interaction with ligands. The USP sequences of the molluscs Lottia gigantea, Crassostrea gigas, and Octopus bimaculoides, and the annelids Helobdella robusta and Capitella teleta, were retrieved from the Ensembl Genomes and JGI databases [27, 28] using the Pfam code PF00582. These last species were used as outgroups in the phylogenetic analysis.
Phylogenetic analyses were performed using the Bayesian Inference (BI) and Maximum Likelihood (ML) probabilistic methods [29, 30]. Protein sequences were aligned with MAFFT v7 using the FFT-NS-i method, and any columns containing more than 95% gaps were deleted using Gap Strip/Squeeze v2.1.0. The best substitution model for our data set was defined with the Smart Model Selection (SMS) tool incorporated in PhyML. The ML tree was generated with PhyML v3.0 using the aLRT-SH method for branch support. The BI tree was generated with BEAST v1.8.4, using two independent runs with a chain length of 50,000,000 generations, sampling every 5,000 generations. The birth-death process and the LG + G substitution model with 4 gamma categories were the priors for the analysis in BEAST v1.8.4. Other parameters (e.g. clock model) were left at their default values. The software TRACER v1.6 was used to check the convergence of the Markov chain Monte Carlo (MCMC) runs and to ensure adequate effective sample sizes (ESS > 200) after the first 10% of generations were discarded as burn-in. The maximum clade credibility tree was estimated with TreeAnnotator, which is part of the BEAST v1.8.4 package, and the tree was visualized using FigTree v1.4.3.
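The reciprocal best-hit logic behind the initial ortholog assignment can be sketched as below. This is only an illustration, assuming that BLAST results have already been parsed into (query, subject, bit score) tuples; the function names and gene identifiers are hypothetical, and the sketch does not include the synteny and phylogenetic checks described above.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query_id, subject_id, bit_score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical parsed BLAST results between species A and species B
a_vs_b = [("A1", "B1", 200.0), ("A1", "B2", 150.0),
          ("A2", "B2", 180.0), ("A3", "B2", 90.0)]
b_vs_a = [("B1", "A1", 198.0), ("B2", "A2", 175.0), ("B2", "A3", 80.0)]
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy input, A3 finds B2 as its best hit but B2 prefers A2, so A3 is left without a reciprocal partner; this is the kind of non-reciprocal relationship that is expected between distantly related paralog-rich families.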
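The gap-filtering step can be illustrated with a short sketch. It is not the Gap Strip/Squeeze implementation, only a minimal stand-in that removes alignment columns whose gap fraction exceeds a threshold; the function name and the toy alignment are assumptions made for illustration.

```python
def strip_gappy_columns(aligned_seqs, max_gap_fraction=0.95):
    """Drop alignment columns in which more than max_gap_fraction of the
    sequences carry a gap character ('-')."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(aligned_seqs)
    # Keep a column when its gap fraction does not exceed the threshold
    keep = [
        col for col in range(length)
        if sum(seq[col] == "-" for seq in aligned_seqs) / n_seqs <= max_gap_fraction
    ]
    return ["".join(seq[col] for col in keep) for seq in aligned_seqs]


# Toy alignment: the third column is all gaps and is removed
print(strip_gappy_columns(["MK-A", "M--A"]))
```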
Positive selection analysis
DNA motif analysis of the promoter regions
To gain insights into the origin and regulation of the USPs, we searched for conserved patterns (DNA motifs) in the promoter region of these genes. DNA motif analysis was performed using the mixture model by expectation maximization (MEME) method, incorporated in the MEME suite. For each USP gene, the 500 bp immediately upstream of the ATG start codon were extracted as the promoter region. The Saccharomyces cerevisiae database was used to compare the identified motifs with others previously described (Tomtom), and to find associations with genes linked to gene ontology terms (GOMo). The motif search was executed with default parameters, considering a maximum width of 10 nucleotides and allowing any number of repetitions of a motif per sequence. All upstream sequences are available in Additional file 2.
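Extracting the upstream region for genes on either strand can be sketched as follows. The coordinate convention (0-based position of the base corresponding to the A of the ATG on the forward strand of the contig), the function name, and the toy contigs are assumptions made for this illustration; they are not taken from the study's pipeline.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def upstream_region(contig, start_pos, strand="+", length=500):
    """Return up to `length` bases upstream of a start codon, 5'->3'
    relative to the gene.

    start_pos: 0-based forward-strand index of the base that corresponds
    to the A of the ATG (for '-' genes, the last base of the 'CAT' seen
    on the forward strand).
    """
    if strand == "+":
        return contig[max(0, start_pos - length):start_pos]
    if strand == "-":
        # Upstream of a minus-strand gene lies at higher forward coordinates
        segment = contig[start_pos + 1:start_pos + 1 + length]
        return segment.translate(_COMPLEMENT)[::-1]
    raise ValueError("strand must be '+' or '-'")


# Toy contigs (hypothetical), using a 3 bp window to keep the example short
print(upstream_region("GGGGATGCCC", 4, "+", length=3))
print(upstream_region("TTTCATAAGG", 5, "-", length=3))
```

Sequences near contig ends are returned shorter than the requested length rather than padded, which seems the safer behavior for downstream motif searches.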
Divergence analysis and USP protein modeling
Evolutionarily conserved amino acids are expected to have an important role in protein structure and function. Therefore, changes at these sites may be an indicator of functional divergence. We used the software Diverge v3.0 to examine site-specific shifts in evolutionary rate by calculating the coefficient of Type I functional divergence (ƟI). Type I divergence results in differing functional constraints (i.e., different evolutionary rates) between duplicated genes, regardless of the underlying evolutionary mechanisms. The null hypothesis (ƟI = 0) is assessed by a likelihood ratio test (LRT), and its rejection indicates some level of functional divergence between the clusters compared. Because the LRT statistic reported by Diverge v3.0 follows a chi-square distribution with one degree of freedom, LRT values greater than or equal to 3.84 indicate functional divergence between clusters. Comparisons were performed between paralog groups of approximately five sequences within the Cestoda and Trematoda classes. Using a cut-off value of 0.9 for the posterior probability, we identified amino acid sites under Type I functional divergence.
For protein modeling, we chose the E. granulosus USP protein EgrG_08736, which exhibits the typical ATP-binding motif [Gx2Gx9G(S/T)]. The 3D protein structure was modeled on a homologous template using Phyre v2. In addition to 3D modeling, Phyre v2 predicts ligand-binding sites and analyzes the effect of amino acid variants (Phyre Investigator). The resulting model was used to evaluate the effect of mutations at conserved sites, and to localize the amino acid residues found to be under functional divergence by Diverge v3.0. The quality of the model was assessed with ModFOLD v4.0.
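The 3.84 threshold corresponds to a significance level of about 0.05 for a chi-square distribution with one degree of freedom. The sketch below makes that link explicit using only the standard library; the function names are hypothetical and the code is independent of Diverge itself.

```python
import math


def chi2_pvalue_1df(lrt):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    For 1 df, P(X >= x) = erfc(sqrt(x / 2)), because X is the square of a
    standard normal variable.
    """
    if lrt < 0:
        raise ValueError("the LRT statistic cannot be negative")
    return math.erfc(math.sqrt(lrt / 2.0))


def indicates_divergence(lrt, critical_value=3.84):
    """Apply the decision rule stated in the text (LRT >= 3.84)."""
    return lrt >= critical_value


print(round(chi2_pvalue_1df(3.84), 3))
print(indicates_divergence(5.2), indicates_divergence(1.7))
```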
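The motif notation [Gx2Gx9G(S/T)] translates directly into a regular expression: a glycine, any two residues, a glycine, any nine residues, a glycine, and then serine or threonine. The sketch below shows one way to screen protein sequences for the intact motif; the function name and the two test sequences are invented for illustration and are not USP sequences from this study.

```python
import re

# G-x(2)-G-x(9)-G-[ST], the typical ATP-binding motif described in the text
ATP_MOTIF = re.compile(r"G.{2}G.{9}G[ST]")


def find_atp_motif(protein):
    """Return (1-based start position, matched segment) of the first intact
    motif, or None when the motif is absent or degenerated."""
    match = ATP_MOTIF.search(protein.upper())
    if match is None:
        return None
    return match.start() + 1, match.group()


# Invented sequences: the second one lacks the third glycine
intact = "MKVGAAGAAAAAAAAAGSLL"
degenerated = "MKVGAAGAAAAAAAAAASLL"
print(find_atp_motif(intact))
print(find_atp_motif(degenerated))
```

A strict pattern like this only separates intact from altered motifs; grading partial degeneration (for example, a single substituted glycine) would require an alignment-based comparison instead.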
USP gene organization in the Platyhelminthes
The high quality and completeness of several Platyhelminthes genomes (e.g. Echinococcus spp., Schistosoma mansoni) [21, 50], together with the criteria used in the search for orthologs (see Methods), allowed us to locate and accurately retrieve the DNA and protein sequences of the USP genes for each species. We found that the number of USP genes varied between Platyhelminthes species: 12 genes in E. granulosus, E. multilocularis, and E. canadensis, 13 in T. solium, 16 in H. microstoma, 10 in S. mansoni and S. haematobium, 18 in C. sinensis and O. viverrini, 17 in S. mediterranea, 6 in G. salaris, and 83 in M. lignano (Additional file 1: Table S2). The high number of USP sequences in this last species, including about 35 identical sequences, could be a consequence of an ancestral whole-genome duplication or recent large segmental duplications, as previously described. To simplify, we removed the two zeros at the beginning and end of each USP identification number (ID) for the Cestoda species. Through reciprocal BLASTn, we detected orthologous relationships within the Cestoda and Trematoda classes; however, between classes, or when including the free-living flatworm S. mediterranea (class Turbellaria), the orthologous relationships between species become ambiguous and non-reciprocal (Additional file 1: Table S2). In all Platyhelminthes species analyzed here, USP genes are distributed in clusters throughout the genome, with lineage-specific losses/expansions (Fig. 1a). Clustering is more pronounced in the Cestoda and Trematoda than in Turbellaria (data available at the SmedGD database). The relaxed tBLASTn analysis detected three pseudogene candidates in the genus Echinococcus, two in T. solium, and one in H. microstoma. Synteny indicates that these pseudogenes may represent lineage-specific gene losses in Echinococcus spp. and T. solium compared with H. microstoma, and in H. microstoma compared with H. diminuta (Fig. 1a and b).
Pseudogenes are characterized by the presence of indels in their coding sequence, which lead to frameshift mutations and thereby generate stop codons (Fig. 1b). Sequence differences between pseudogenes and their respective ortholog are highly variable for paralogous pseudogenes, reflecting an ancient pseudogenization process (Fig. 1b). For the other species, there was no evidence of pseudogenes with our search strategy. Nevertheless, we identified twelve USP genes in the Trematoda, which could not be detected using the USP Pfam code. All of these were located in the vicinity of other USP genes. Two were not previously annotated (Csin107892a, T265_02176a), and one gene was re-annotated (T265_02178, corresponding to genes T265_02178a and T265_02178b). The other copies, which were annotated as “universal stress protein” or without description, were Csin107891, Csin107892, Csin107893, Csin110039, Csin110041, T265_02177, T265_02179, and T265_02180. All protein sequences reported in this work are available in the Additional file 2.
Phylogenetic trees and origin of the USP gene family
DNA motif analysis
DNA motif analysis of the promoter regions detected the heptanucleotide -CCAATCA- between positions −200 and −40 upstream of the start codon for almost all USP genes (Additional file 1: Table S3). This motif is a known DNA binding site for the mammalian nuclear transcription factor Y (NF-Y, HAP in S. cerevisiae), which promotes the initiation of gene transcription. NF-Y consists of three subunits: NF-YA, NF-YB and NF-YC (orthologs of HapB, HapC, and HapE in S. cerevisiae). In the presence of reactive oxygen species (ROS), oxidized HapC is prevented from interacting with the HapE and HapB subunits. Consequently, the formation of the CCAAT-binding complex is abolished, and its nuclear localization and the regulation of its target genes are affected. Using the Pfam numbers PF02045 and PF00808, we identified the orthologs of NF-YA, NF-YB, and NF-YC for the species studied (data available at the WormBase database). In line with this, the GOMo tool reports that the motif -CCAATCA- is associated with oxidation-reduction processes such as ATP synthesis coupled to proton transport. In both the Cestoda and Trematoda classes, some genes lack the conserved -CCAATCA-/-CCAAT- motif (e.g. EgrG_02019, EgrG_08735, EgrG_09839, and their orthologs). This could suggest a common origin for these genes, different regulatory properties, or an ongoing pseudogenization process. In other genes (e.g. EgrG_08736, EgrG_08738, EgrG_07258, EgrG_10769, EgrG_09018, and orthologs), the -CCAATCA- motif occurs at exactly the same position. These data provide insights into the functional diversification of the USP promoter regions and their origins by retrotransposition or tandem duplication events, as well as into the relationship between these two processes.
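Once a motif such as -CCAATCA- is known, its position relative to the start codon can be checked with a simple scan of each upstream sequence. The sketch below is a simplification and not a substitute for MEME: it looks for exact matches on the given strand only, and it assumes that the last base of the input sequence sits immediately upstream of the ATG (position −1). The function names and the toy sequence are hypothetical.

```python
import re


def motif_positions(upstream, motif="CCAATCA"):
    """Return the start positions of exact motif matches, numbered
    relative to the start codon (the last upstream base is -1)."""
    seq = upstream.upper()
    pattern = re.compile(f"(?={re.escape(motif)})")  # lookahead: overlapping hits
    return [m.start() - len(seq) for m in pattern.finditer(seq)]


def has_motif_in_window(upstream, motif="CCAATCA", lo=-200, hi=-40):
    """True when at least one match starts inside the window [lo, hi]."""
    return any(lo <= pos <= hi for pos in motif_positions(upstream, motif))


# Toy upstream sequence: 10 filler bases, the motif, then 40 filler bases
toy = "A" * 10 + "CCAATCA" + "T" * 40
print(motif_positions(toy))
print(has_motif_in_window(toy))
```

Comparing the positions returned for paralogs is one way to express the observation that the motif occurs at the same position in tandemly organized genes.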
3D protein modeling
The highest scoring template in the 3D structural analysis of EgrG_08736 was the USP MJ0577 from Methanococcus jannaschii, with a confidence of homology of 99.9%. Against this template, the alignment coverage was 84%, and the sequence identity between the two proteins was 33%. Analysis with ModFOLD v4.0 returned a global model quality score of 0.7 and a p-value of 6.4E-4 (Additional file 3: Figure S3A). The presence of a large coil between the second beta strand and the second alpha helix (Additional file 3: Figure S3B) is due to the insertion of eleven amino acids in our query sequence relative to the template. This region is highly variable across USP paralogs (Additional file 3: Figure S4); it is located on the outside of the protein pocket in the 3D model (Additional file 3: Figure S3). Using Phyre v2 Investigator, we predicted the likely functional sites in our model, as well as the effect of mutations at these specific sites (described below; see also Additional file 3: Figure S3C).
Gene expression analysis
Positive selection and divergence analysis
Positive selection analysis of USP genes for species of class Cestoda
Estimates of parameters | Positive selected sites (PSS)a
ω = 0.32480 |
ω0 = 0.03822, ω1 = 1, p0 = 0.95404, p1 = 0.04596 |
ω0 = 0.03823, ω1 = 1, ω2 = 1, p0 = 0.95404, p1 = 0.00224, p2 = 0.04372 | 151 N b
p = 0.04543, q = 0.40703 |
M8 (beta & ω): p0 = 0.97089, p = 0.03908, q = 0.98312, p1 = 0.02911, ω = 1 | 7 K 16E 20 T 32 K 45R 48 K 49 K 50R 51D 64 K 65S 671 N 72E 77 L 80E 83 N 114 K 115I 117E 120G 151N
Functional divergence analysis (Type I) within Cestoda and Trematoda species
Columns: Ɵ ± SE | Sites (Qk > 0.9)c
Ɵ ± SE (first group of comparisons): 0.70 ± 0.14; 0.78 ± 0.17; 0.95 ± 0.16; 0.97 ± 0.14; 0.98 ± 0.17; 0.91 ± 0.14; 1.36 ± 0.15; 0.80 ± 0.18; 0.64 ± 0.14; 0.79 ± 0.14
Ɵ ± SE (second group of comparisons): 0.64 ± 0.20; 0.75 ± 0.21; 0.99 ± 0.24; 0.94 ± 0.13; 0.68 ± 0.17; 0.69 ± 0.18; 0.68 ± 0.19; 0.84 ± 0.19
The expansion of gene families by gene duplication represents a successful strategy for the propagation of gene copies through the acquisition of specialized or novel functions (e.g. globin or homeobox gene families) [54, 55]. Although some genes may acquire adaptive novelties that are maintained from one generation to the next, others may follow a pseudogenization process through the accumulation of deleterious mutations. An understanding of when and how fast these duplications occur is key to our understanding of the duplicated genes’ functional diversity.
Here, we explored the evolutionary fates of the USP gene family in Platyhelminthes species of medical relevance. We found that the USP genes of this phylum are mostly intronless, transcribed independently, and encode a single protein domain. A few USPs (EgrG_09018 and orthologs) contain a single intron in a conserved position around amino acid 75. This is similar to what has been described for Hydra, where 22 out of 24 USP genes are intronless. Based on a well-supported monophyletic clade, Forêt and colleagues proposed that a single retrotransposition event had a pivotal role in the emergence of most intronless USP genes after the anthozoan/hydrozoan divergence. Our results suggest that the same process could also have been important after the separation of the Cestoda and Trematoda classes. Nevertheless, the cluster organization of the USP genes in the Platyhelminthes (around 50% of USP genes occur in clusters in both Cestoda and Trematoda) revealed the importance of tandem duplications for the generation of new USP copies. This idea is supported by the presence of a well-conserved DNA motif occurring at the same position in tandemly organized genes (e.g. EgrG_08736, EgrG_08738, and orthologs in the Cestoda; T265_02179 and T265_02181, Smp_001000 and Smp_200240, and orthologs in C. sinensis and S. haematobium, respectively). Surprisingly, we found that the position of the -CCAATCA- motif was also preserved between isolated USP genes and their most closely related homologs (e.g. EgrG_10769 and EgrG_07258, which probably emerged from EgrG_09018), suggesting a retrotransposition event that included both the coding sequence and the promoter region. This might be because transcription start sites (TSSs) tend to be dispersed rather than located at one specific site. If a TSS upstream of the promoter region is used, a large part of the core promoter may be transcribed [56, 57]. This mechanism could ensure the transcriptional activity of the newly retrotransposed genes and would consequently avoid the effects of neutral evolution, i.e., the accumulation of deleterious mutations.
USPs are often classified based on the presence or absence of the conserved ATP-binding motif residues. A positive correlation between the conservation of the [Gx2Gx9G(S/T)] protein motif and crystal structures solved in the presence of ATP, or an ATP analog, has been previously described. In addition, the same authors found a high level of conservation (~ 80%) at amino acid positions forming part of the motif across all crystals extracted from the PDB database, with the exception of the second glycine (G130, which is preserved in 50% of crystals). Our protein model showed that amino acid alterations at specific ligand-binding sites have a negative effect on protein function, including modifications at P11, D13, V41, G127, G130, G140, and S/T141. The residue G130 was the most exchangeable amino acid, and could thus be most easily substituted non-synonymously (Additional file 3: Figure S3C). The USP gene expression data allowed us to associate transcriptional activity with modifications at residues that are critical for ligand interaction. In general, we observed that alterations in the [Gx2Gx9G(S/T)] motif and at other positions within the protein pocket (e.g., D13, V41) are associated with very low or null levels of gene expression in almost all life cycle stages of the parasites, probably as a result of functional redundancy. Additionally, for several USPs, we observed amino acid insertions (SMU15024145, EgrG_08735, HmN_05623, etc.) and deletions (T265_02176, HmN_00323, Csin110041) within the [Gx2Gx9G(S/T)] motif. These modifications could lead to steric hindrance, thereby preventing contact with the ligand. Based on this, we believe that several USP genes in the Cestoda (EgrG_08734, EgrG_08735, and orthologs; HmN_05021 and HmN_05022; etc.) and Trematoda (Smp_136870, Smp_136890; Smp_09793 and its orthologue A_04226; etc.) are in the process of pseudogenization. Like the pseudogenes described here (Fig. 1a and b), the genes undergoing pseudogenization lack the canonical and well conserved -CCAATCA- motif in the promoter region, which probably affects their transcriptional activity. Although less likely, the possibility that changes in the ATP-binding sites might expand the ligand repertoire should not be dismissed. Further functional studies will be necessary to clarify these findings.
In contrast to the process of pseudogenization, many USP paralogs may have acquired new functions, leading to functional diversification within the gene family. Since several amino acid modifications have occurred close to ligand-binding sites, this functional diversification may be associated with the interaction with different types of ligands. Furthermore, several USPs were found to occur as dimers or higher oligomeric complexes [8, 11], suggesting that substitutions at sites involved in protein oligomerization could increase the complexity of the protein-protein interactions. These sequence variations, and those found in promoter regions, could be considered adaptive traits that emerge as part of subfunctionalization or neofunctionalization processes (Fig. 4). The publication of the genome sequences of several Platyhelminthes species [21, 22, 50, 62] revealed that gene expansion, such as in heat shock proteins, species-specific antigens, or proteases, is a widespread process related to the adaptation to parasitism. In this way, some of the USP paralogs could be considered adaptations to the parasitic lifestyle by increasing the repertoire of binding proteins, by establishing complex protein-protein interactions (homo- and heterodimers), or by being expressed in a specific tissue, life cycle stage, or in response to a particular stressor.
In the present work we found that the USP gene family has an ancient origin and follows a complex evolutionary pattern (pseudogenization and sub/neofunctionalization) in several Platyhelminthes species. This scenario may result from different selective pressures acting on the USP genes. Whether these patterns are restricted to parasitic flatworms, or also include the free-living species, remains to be elucidated. Further studies associating functional diversity with the various sequence modifications will help deepen our knowledge about the patterns and regulation of USP gene expression. Additional analyses will be necessary to investigate the role of ncRNAs in the specific spatiotemporal regulation of the USP genes.
We thank the following colleagues for helping us to improve our manuscript through their comments and suggestions: Claudia Thompson (Center for Biotechnology, UFRGS, Porto Alegre, Brazil), Loreta Freitas (Department of Genetics, UFRGS, Porto Alegre, Brazil), and Karen Haag (Department of Genetics, UFRGS, Porto Alegre, Brazil). We also thank the editorial board and the anonymous reviewers for their comments and valuable contributions.
This work was supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (grant number 1278/2011) and the Conselho Nacional de Desenvolvimento Científico e Tecnológico (grant number 472316/2013–3), Ministry of Education, Brazil. The funders had no role in study design, data collection and analysis, decision to publish, interpretation of data, or preparation of the manuscript.
Availability of data and materials
The datasets supporting the conclusions of this article are included within the article and its additional files.
SME and AZ conceived and designed this study. SME, MPC, and LBC performed the experiments and analyzed the data. SME wrote the manuscript. MPC, LBC, and AZ helped to draft the manuscript. All authors read and approved the final version of the manuscript.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Hurles M. Gene duplication: the genomic trade in spare parts. PLoS Biol. 2004;2(7):E206.View ArticlePubMedPubMed CentralGoogle Scholar
- Ohno S. Evolution by gene duplication. Berlin: Springer; 1970.Google Scholar
- Hahn MW. Distinguishing among evolutionary models for the maintenance of gene duplicates. J Hered. 2009;100(5):605–17.View ArticlePubMedGoogle Scholar
- Innan H, Kondrashov F. The evolution of gene duplications: classifying and distinguishing between models. Nat Rev Genet. 2010;11(2):97–108.View ArticlePubMedGoogle Scholar
- Forêt S, et al. Phylogenomics reveals an anomalous distribution of USP genes in metazoans. Mol Biol Evol. 2011;28(1):153–61.View ArticlePubMedGoogle Scholar
- Gustavsson N, Diez A, Nyström T. The universal stress protein paralogues of Escherichia Coli are co-ordinately regulated and co-operate in the defence against DNA damage. Mol Microbiol. 2002;43(1):107–17.View ArticlePubMedGoogle Scholar
- Nachin L, Nannmark U, Nyström T. Differential roles of the universal stress proteins of Escherichia Coli in oxidative stress resistance, adhesion, and motility. J Bacteriol. 2005;187(18):6265–72.View ArticlePubMedPubMed CentralGoogle Scholar
- Jung YJ, et al. Universal stress protein exhibits a Redox-dependent chaperone function in Arabidopsis and enhances plant tolerance to heat shock and oxidative stress. Front Plant Sci. 2015;6:1141.PubMedPubMed CentralGoogle Scholar
- Drumm JE, et al. Mycobacterium tuberculosis universal stress protein Rv2623 regulates bacillary growth by ATP-binding: requirement for establishing chronic persistent infection. PLoS Pathog. 2009;5(5):e1000460.View ArticlePubMedPubMed CentralGoogle Scholar
- Zarembinski TI, et al. Structure-based assignment of the biochemical function of a hypothetical protein: a test case of structural genomics. Proc Natl Acad Sci U S A. 1998;95(26):15189–93.View ArticlePubMedPubMed CentralGoogle Scholar
- Tkaczuk KL, et al. Structural and functional insight into the universal stress protein family. Evol Appl. 2013;6(3):434–49.View ArticlePubMedPubMed CentralGoogle Scholar
- Boes N, et al. The Pseudomonas Aeruginosa universal stress protein PA4352 is essential for surviving anaerobic energy stress. J Bacteriol. 2006;188(18):6529–38.View ArticlePubMedPubMed CentralGoogle Scholar
- World Health Organization. World health statistics 2016: monitoring health for the SDGs, sustainable development goals. Available from: http://www.who.int/en/ Accessed 10 Dec 2016.
- Molyneux DH, Savioli L, Engels D. Neglected tropical diseases: progress towards addressing the chronic pandemic. The Lancet. 2017;389: 312–325.Google Scholar
- Thompson, R., Biology and systematics of Echinococcus. 1995: CAB International, Wallingford. 1-50.Google Scholar
- Maeng S, et al. Oxidative stress-mediated mouse liver lesions caused by Clonorchis sinensis infection. Int J Parasitol. 2016;46(3):195–204.
- Negrão-Corrêa D, et al. Interaction of Schistosoma mansoni sporocysts and hemocytes of Biomphalaria. J Parasitol Res. 2012;2012:743920.
- Cheng Z, et al. Identification and characterisation of Emp53, the homologue of human tumor suppressor p53, from Echinococcus multilocularis: its role in apoptosis and the oxidative stress response. Int J Parasitol. 2015;45(8):517–26.
- Masamba P, et al. Universal stress proteins as new targets for environmental and therapeutic interventions of Schistosomiasis. Int J Environ Res Public Health. 2016;13(10):972–984.
- Bowles J, Blair D, McManus DP. Genetic variants within the genus Echinococcus identified by mitochondrial DNA sequencing. Mol Biochem Parasitol. 1992;54(2):165–73.
- Tsai IJ, et al. The genomes of four tapeworm species reveal adaptations to parasitism. Nature. 2013;496(7443):57–63.
- Zheng H, et al. The genome of the hydatid tapeworm Echinococcus granulosus. Nat Genet. 2013;45(10):1168–75.
- Espinola SM, Ferreira HB, Zaha A. Validation of suitable reference genes for expression normalization in Echinococcus spp. larval stages. PLoS One. 2014;9(7):e102228.
- Ruijter JM, et al. Amplification efficiency: linking baseline and bias in the analysis of quantitative PCR data. Nucleic Acids Res. 2009;37(6):e45.
- Robb SM, et al. SmedGD 2.0: the Schmidtea mediterranea genome database. Genesis. 2015;53(8):535–46.
- Howe KL, et al. WormBase 2016: expanding to enable helminth genomic research. Nucleic Acids Res. 2016;44(D1):D774–80.
- Aken BL, et al. Ensembl 2017. Nucleic Acids Res. 2017;45(D1):D635–42.
- Nordberg H, et al. The genome portal of the Department of Energy Joint Genome Institute: 2014 updates. Nucleic Acids Res. 2014;42(Database issue):D26–31.
- Guindon S, et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 2010;59(3):307–21.
- Drummond AJ, et al. Bayesian phylogenetics with BEAUti and the BEAST 1.7. Mol Biol Evol. 2012;29(8):1969–73.
- Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30(4):772–80.
- HIV database: Gap Strip/Squeeze v2.1.0. Available from: https://www.hiv.lanl.gov/content/sequence/GAPSTREEZE/gap.html. Accessed 10 Sept 2017.
- Lefort V, Longueville JE, Gascuel O. SMS: smart model selection in PhyML. Mol Biol Evol. 2017;34(9):2422–4.
- Gernhard T. The conditioned reconstructed process. J Theor Biol. 2008;253(4):769–78.
- Yang Z. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J Mol Evol. 1994;39(3):306–14.
- Rambaut A, Suchard MA, Xie D, Drummond AJ. Tracer v1.6. 2014. Available from: http://tree.bio.ed.ac.uk/software/tracer/.
- Rambaut A. FigTree: tree figure drawing tool. 2014. Available from: http://tree.bio.ed.ac.uk/.
- Zhang J, Nielsen R, Yang Z. Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol. 2005;22(12):2472–9.
- Doron-Faigenboim A, Pupko T. A combined empirical and mechanistic codon model. Mol Biol Evol. 2007;24(2):388–97.
- Stern A, et al. Selecton 2007: advanced models for detecting positive and purifying selection using a Bayesian inference approach. Nucleic Acids Res. 2007;35(Web Server issue):W506–11.
- Suyama M, Torrents D, Bork P. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 2006;34(Web Server issue):W609–12.
- Yang Z, Nielsen R. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol Biol Evol. 2002;19(6):908–17.
- Bailey TL, et al. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 2009;37(Web Server issue):W202–8.
- Gupta S, et al. Quantifying similarity between motifs. Genome Biol. 2007;8(2):R24.
- Buske FA, et al. Assigning roles to DNA regulatory motifs using comparative genomics. Bioinformatics. 2010;26(7):860–6.
- Gu X, et al. An update of DIVERGE software for functional divergence analysis of protein family. Mol Biol Evol. 2013;30(7):1713–9.
- Gu X. Statistical methods for testing functional divergence after gene duplication. Mol Biol Evol. 1999;16(12):1664–74.
- Kelley LA, et al. The Phyre2 web portal for protein modeling, prediction and analysis. Nat Protoc. 2015;10(6):845–58.
- McGuffin LJ, Buenavista MT, Roche DB. The ModFOLD4 server for the quality assessment of 3D protein models. Nucleic Acids Res. 2013;41(Web Server issue):W368–72.
- Protasio AV, et al. A systematically improved high quality genome and transcriptome of the human blood fluke Schistosoma mansoni. PLoS Negl Trop Dis. 2012;6(1):e1455.
- Wasik K, et al. Genome and transcriptome of the regeneration-competent flatworm, Macrostomum lignano. Proc Natl Acad Sci U S A. 2015;112(40):12462–7.
- Mantovani R. The molecular biology of the CCAAT-binding factor NF-Y. Gene. 1999;239(1):15–27.
- Thön M, et al. The CCAAT-binding complex coordinates the oxidative stress response in eukaryotes. Nucleic Acids Res. 2010;38(4):1098–113.
- Holland PW. Evolution of homeobox genes. Wiley Interdiscip Rev Dev Biol. 2013;2(1):31–45.
- Storz JF. Gene duplication and evolutionary innovations in hemoglobin-oxygen transport. Physiology (Bethesda). 2016;31(3):223–32.
- Frith MC, et al. A code for transcription initiation in mammalian genomes. Genome Res. 2008;18(1):1–12.
- Okamura K, Nakai K. Retrotransposition as a source of new promoters. Mol Biol Evol. 2008;25(6):1231–8.
- Sousa MC, McKay DB. Structure of the universal stress protein of Haemophilus influenzae. Structure. 2001;9(12):1135–41.
- Hingley-Wilson SM, et al. Individual Mycobacterium tuberculosis universal stress protein homologues are dispensable in vitro. Tuberculosis (Edinb). 2010;90(4):236–44.
- Wang J, Marowsky NC, Fan C. Divergent evolutionary and expression patterns between lineage specific new duplicate genes and their parental paralogs in Arabidopsis thaliana. PLoS One. 2013;8(8):e72362.
- Milligan MJ, Lipovich L. Pseudogene-derived lncRNAs: emerging regulators of gene expression. Front Genet. 2014;5:476.
- Berriman M, et al. The genome of the blood fluke Schistosoma mansoni. Nature. 2009;460(7253):352–8.