Diversification of Quiescin sulfhydryl oxidase in a preserved framework for redox relay
© Limor-Waisberg et al.; licensee BioMed Central Ltd. 2013
Received: 9 January 2013
Accepted: 7 March 2013
Published: 19 March 2013
The enzyme family Quiescin Sulfhydryl Oxidase (QSOX) is defined by the presence of an amino-terminal thioredoxin-fold (Trx) domain and a carboxy-terminal Erv family sulfhydryl oxidase domain. QSOX enzymes, which generate disulfide bonds and transfer them to substrate proteins, are present in a wide variety of eukaryotic species including metazoans and plants, but are absent from fungi. Plant and animal QSOXs differ in their active-site amino acid sequences and content of non-catalytic domains. The question arises, therefore, whether the Trx-Erv fusion has the same mechanistic significance in all QSOX enzymes, and whether shared features distinguish the functional domains of QSOX from other instances in which these domains occur independently. Through a study of QSOX phylogeny and an analysis of QSOX sequence diversity in light of recently determined three-dimensional structures, we sought insight into the origin and evolution of this multi-domain redox alliance.
An updated collection of QSOX enzymes was used to confirm and refine the differences in domain composition and active-site sequence motif patterns of QSOXs belonging to various eukaryotic phyla. Beyond the expected phylogenetic distinction of animal and plant QSOX enzymes, trees based on individual redox-active QSOX domains show a particular distinction of the Trx domain early in plant evolution. A comparison of QSOX domains with Trx and Erv domains from outside the QSOX family revealed several sequence and structural features that clearly differentiate QSOXs from other enzymes containing either of these domains. Notably, these features, present in QSOXs of various phyla, localize to the interface between the Trx and Erv domains observed in structures of QSOX that model interdomain redox communication.
The infrastructure for interdomain electron relay, previously identified for animal and parasite QSOXs, is found broadly across the QSOX family, including the plant enzymes. We conclude that the conserved three-dimensional framework of the QSOX catalytic domains accommodates lineage-specific differences and paralog diversification in the amino acid residues surrounding the redox-active cysteines. Our findings indicate that QSOX enzymes are characterized not just by the presence of the two defining domain folds but also by features that promote coordinated activity.
The proper pairing and oxidation of cysteine thiols to disulfides is crucial for the folding of many proteins. Accordingly, specialized enzymes catalyze the formation of disulfide bonds. One such enzyme is the eukaryotic Quiescin Sulfhydryl Oxidase (QSOX). QSOX homologs have been found in protists, plants, and metazoans, but not in fungi. Though disulfide catalysts are conserved components of the endoplasmic reticulum (ER), QSOX is localized outside the ER. In particular, plant QSOX is found in the cell wall, and mammalian QSOX is localized to the Golgi apparatus [4, 5] or secreted from cells. The importance of oxidative protein folding in the early secretory pathway for production of cell-surface and secreted proteins is well-appreciated, but the role of an additional disulfide catalyst downstream of the ER is poorly understood.
The vastly differing substrate sets potentially available to QSOX enzymes in subcellular or extracellular environments across various taxa raise questions regarding QSOX evolution and whether QSOX enzymes in various species have a common role. In particular, a study of QSOX physiological function (Ilani T, Alon A, Grossman I, Horowitz B, Kartvelishvily E, Cohen SR, Fass D: A secreted disulfide catalyst controls extracellular matrix composition and function, submitted) identified a requirement for metazoan QSOX in assembly of extracellular matrix proteins that are not present in plants. A study of QSOX phylogeny will therefore complement the experimental identification of QSOX substrates to distinguish potential general roles for a post-ER disulfide formation pathway from functions into which the enzyme may have been co-opted on particular branches of the evolutionary tree. Furthermore, comparative analysis of QSOX enzymes will aid the identification of structural features that contribute to QSOX catalysis or differentiate QSOX enzymes from one another. Motivated by increasing interest in the biology of this enzyme family and facilitated by the large number of sequences that have become available in recent years, we present an analysis of QSOX evolution, highlighting universal features vs. aspects of the enzyme family that may vary for adaptation to particular functional niches.
Herein we provide an updated analysis of QSOX evolution, performed in light of recent structural studies of the QSOX enzyme family. Amino acid sequences were annotated, curated, and subsequently aligned to reveal and compare functional and structural domains and motifs. This study sheds light on the emergence of different QSOX orthologs, paralogs, and isoforms and presents the major evolutionary events that may have led to the contemporary set of QSOX enzymes.
Many QSOX sequences available in the public databases are currently unannotated or mis-annotated. Due to the presence of a domain with homology to PDI proteins, the most common mis-annotation is the identification of a QSOX as a PDI. Although both QSOX and PDI act as catalysts of disulfide bond formation, they differ in their cellular localization, domain composition, and electron acceptor. Whereas PDI family proteins are defined by localization to the endoplasmic reticulum and the presence of one or more Trx domains, QSOX is defined by the presence of both a Trx domain and an Erv domain. Hence, other annotations such as “Sulfhydryl oxidase like” or “Thioredoxin like” are similarly insufficient. Another misleading annotation is the numbering of QSOX paralogs. In several instances, the numbering of a QSOX paralog in a given species does not correspond to its counterpart in a second, closely related species. We therefore suggest a coherent, phylogeny-based annotation and numbering for all QSOX sequences identified and used throughout this article. Accession numbers and corresponding annotations are provided in Additional file 1: Table S1.
Stramenopiles, Euglenozoa, and Alveolata are three early branches of unicellular eukaryotes for which QSOX sequences were identified. Within Stramenopiles, QSOX sequences were identified in diatoms. QSOX sequences also appear in Oomycetes, which include several genera of plant pathogens: Phytophthora, Hyaloperonospora, and Albugo. In Euglenozoa, QSOX sequences were identified for Leishmania and Trypanosoma species, and among Alveolata for Perkinsus, Plasmodium, and Cryptosporidium.
In plants (Viridiplantae) QSOX is found in both green algae (Chlorophyta) and green plants (Streptophyta). Among green algae, QSOX sequences were identified for Chlorophyceae, Mamiellophyceae, and Trebouxiophyceae. Among Streptophyta, QSOX sequences were identified in mosses (Embryophyta) and vascular plants (Tracheophyta). A QSOX sequence was identified for S. moellendorffii, member of an ancient lineage at the root of vascular plants in the evolutionary tree. Among the vascular plants, QSOX sequences were found for both monocotyledons and eudicotyledons.
The earliest evidence of a QSOX sequence in Opisthokonta is of the marine choanoflagellate M. brevicollis. However, due to a gap in the data affecting the Trx domains, the M. brevicollis QSOX sequence could not be retrieved in its entirety, and the sequence was therefore excluded from subsequent analyses. Within Metazoa, a QSOX sequence was identified for the placozoan T. adhaerens, which has the smallest animal genome known to date. Among Eumetazoa, QSOX sequences were identified for species of Ctenophora, Cnidaria, and Bilateria (within Acoelomata in Trematoda, within Pseudocoelomata in Nematoda, and within Coelomata). Many QSOX sequences were identified for Arthropoda (both within Chelicerata and Mandibulata) and Chordata, in primitive small invertebrates such as Tunicata and the fish-like cephalochordate B. floridae, as well as in Craniata, from fish to humans.
In total, 228 QSOX sequences were identified within the genomes of 132 different species.
Many organisms contain more than one QSOX gene (Additional file 2: Figure S2). In some species, such duplication events seem to be evolutionarily recent, whereas in others, the duplication occurred deep in the origin of the phylum. Two QSOX paralogs can be found in Craniata from fish to humans (Figure 3), with the exception of several species listed below. The branching at the origin of Craniata means that the human QSOX1 paralog is more similar to fish QSOX1 than to human QSOX2. Within the genomes of A. carolinensis, O. cuniculus, M. putorius, and S. scrofa, only the QSOX1 variant was found; in O. anatinus only the QSOX2 variant was found. The complementary QSOX paralogs may not have been identified due to poor coverage of the respective genomes.
In addition to the two QSOX paralogs, QSOX sequences have also diversified in Craniata through alternative splicing. In many Craniata, from rodents to humans, an alternative splice variant of QSOX1 has been identified (Additional file 2: Figure S2). This splice variant interrupts the final exon and eliminates the transmembrane segment but preserves all redox-active domains. The splice variation may thus affect processing or localization of the QSOX enzyme. Except for a single occurrence in M. musculus, splice variants of the QSOX2 paralog of Craniata have not been identified.
For most arthropods, no paralogs were identified. However, significant exceptions to this generalization are the coleopteran T. castaneum and Drosophila, with up to four paralogs in several Drosophila species. The QSOX paralogs of T. castaneum do not share the same duplication event as those of Drosophila. In Drosophila, a major duplication event occurred at the root of the genus, giving rise to the QSOX1 paralog present in all Drosophila and to a second branch of QSOX paralogs that underwent several further duplication events with no clear consistency between the species.
Among Nematoda, several duplication events seem to have occurred before the branching of Chromadorea and Enoplea, followed by inconsistent gene duplication and gene loss events.
Among Viridiplantae, QSOX paralogs were identified only within Streptophyta. The duplication events in plants are recent and seem to have occurred locally at the species or genus level. Accumulation of additional QSOX sequences might reveal slightly deeper branching between closely related plants. Nevertheless, as the QSOX paralogs in the two available Arabidopsis species genomes and those of G. max do not seem to have emerged from the same duplication event, the branching is not expected to be much deeper.
Among Alveolata, QSOX paralogs were identified for the P. marinus species. Due to a lack of genome sequences for closely related species, it is not known how deep this duplication event might be.
Among Stramenopiles, a duplication event may have occurred deep within Oomycetes, followed by multiple inconsistent gene duplication and gene loss events. However, it may be that several relevant paralogs were not identified due to incomplete genome coverage. Another duplication event among Stramenopiles seems to have occurred deep at the level of Bacillariophyta.
This extended catalog of QSOX sequences provides the platform for the following phylogenetic and comparative analyses.
Several branches seem to emerge from the root of the QSOX phylogenetic tree and evolve apart (Figure 3). Although all neighbor-joining (NJ) and maximum likelihood (ML) trees branch according to the corresponding taxonomic affiliations of the examined species, low consensus scores are obtained at the roots of the major branches. It is only in more recent branches that the distinctions between QSOXs of different species are well supported. According to the consensus ML tree (Figure 3), the Viridiplantae cluster, beginning with the Mamiellophyceae branch, is well distinguished from other QSOX sequences. The distinction between QSOXs of monocotyledons and eudicotyledons is also well supported. Among Metazoa, somewhat reliable scores distinguish QSOXs of Arthropoda and Nematoda after the branching of the D. pulex and T. spiralis species, respectively. The distinction between QSOX1 and QSOX2 of Craniata is the first to be well supported within Chordata. Lastly, among protists, the Alveolata branch seems to cluster among the Stramenopile branch together with Oomycetes and apart from Bacillariophyta. From all of the above, it is clear that Viridiplantae evolved apart from the various Metazoan branches, although the relative positioning of the branches is poorly supported and hence does not reveal deeper affiliations. Removal of protists from the ML trees slightly improved the distinction between Viridiplantae and several of the Metazoan branches (Additional file 2: Figure S3). Overall, QSOX sequences of Viridiplantae display a continuous path from Chlorophyta to the higher plants in Streptophyta, with a few recent duplication events, whereas highly divergent forms of Metazoa QSOX sequences emerged from a very ancient ancestor.
To assess the evolutionary paths of the QSOX Trx and Erv domains independently, separate ML phylogenetic trees were constructed (Additional file 2: Figures S4 and S5). Similarly to the original ML tree, the ML tree for each domain supports claims for branching order mainly for recent taxonomic affiliations. Nevertheless, important observations can be made upon closer examination. For example, the Trx ML tree shows notable differences from the Erv ML tree along the plant lineage. The Trx ML tree shows the branching of Viridiplantae following Mamiellophyceae, but the clear distinction between monocotyledons and eudicotyledons is missing. Conversely, the distinction between monocotyledons and eudicotyledons is strongly supported by the Erv ML tree, but this tree fails to show the early distinction of plants from other phyla. In other words, the ancient distinction of plant QSOX Trx domains, evident following Mamiellophyceae, was apparently followed by diversification of the Erv domain, particularly notable at the bifurcation of Streptophyta into monocotyledons and eudicotyledons. Therefore, it appears that the Trx and Erv domains of plant QSOXs were subjected to distinct selection events at different points in evolution.
Some differences are also observed between the Trx and Erv domain trees within Metazoa. For example, the Trx ML tree shows support for a distinction between the Craniata QSOX1 and QSOX2 variants well within Mammalia. In the Erv ML tree, however, the diversification of the QSOX2 variant is already well supported at the root of Craniata. QSOX1, in turn, is distinguished in the Erv ML tree from other animal QSOXs at the level of Mammalia. Therefore, it seems that animal QSOX variants underwent a significant evolutionary event that co-affected both their Trx and Erv domains prior to or during the emergence of mammals.
The Trx-CXXC motif shows strong sequence conservation and characterizes major groups of organisms. Specifically, the CGHC sequence is nearly universal in Euglenozoa and Metazoa, with the exception of certain nematodes and arthropods of the Coleoptera, Diptera, and Hymenoptera lineages. Interestingly, among Drosophila, a QSOX paralog with a CGHC pattern coexists with an array of paralogs displaying a CG[DN]C sequence in their Trx-CXXC motifs. Similarly, in nematodes, QSOXs with a CGHC pattern are found together with a paralog exhibiting a CGAC pattern. To the extent possible, we assigned the designation QSOX1 to paralogs containing the CGHC pattern dominant in Metazoa QSOXs. Among Viridiplantae and the Apicomplexa branch of alveolates, the CPAC pattern is the most abundant, whereas Stramenopiles typically have a CPHC pattern. It is interesting to note that a CPXC pattern is rare among PDI family Trx domains, for which the CGHC pattern is prominent, even in PDIs from plants. Hence, the distinction between the three major patterns (CGHC, CPHC, and CPAC) for Trx redox-active motifs in QSOX enzymes is independent of trends for CXXC motifs in PDI family proteins.
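The grouping of Trx-CXXC motifs described above can be sketched as a small pattern classifier. This is an illustrative sketch only, not the procedure used in the study: the function name `classify_trx_cxxc`, the regular expressions, and the short input fragments are assumptions made for the example, and the code assumes that the first C-X-X-C match in the supplied window is the redox-active motif.

```python
import re

# Motif classes named in the text; encoding them as regexes is this sketch's choice.
TRX_CXXC_CLASSES = {
    "CGHC": re.compile(r"CGHC"),
    "CPHC": re.compile(r"CPHC"),
    "CPAC": re.compile(r"CPAC"),
    "CG[DN]C": re.compile(r"CG[DN]C"),
    "CGAC": re.compile(r"CGAC"),
}


def classify_trx_cxxc(window):
    """Label the first C-X-X-C motif in a short active-site window.

    Returns one of the class labels above, 'other' for an unlisted
    C-X-X-C motif, or 'none' if no C-X-X-C motif is present.
    """
    match = re.search(r"C..C", window)
    if match is None:
        return "none"
    motif = match.group(0)
    for label, pattern in TRX_CXXC_CLASSES.items():
        if pattern.fullmatch(motif):
            return label
    return "other"


# Hypothetical fragments, written for illustration only.
for fragment in ("EFFASWCGHCIAF", "EFFAHWCPACRNL", "EFYASWCGDCKQF"):
    print(fragment, classify_trx_cxxc(fragment))
```

In practice the window would be cut from an alignment column range rather than located by the first match, since a full Trx domain may contain other cysteine pairs.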
The Erv-CXXC patterns are more diverse than the Trx-CXXC patterns. For example, five organisms at the base of Metazoa, T. adhaerens, N. vectensis, H. magnipapillata, S. purpuratus, and B. floridae, all share the common Trx-CXXC sequence CGHC but have the Erv-CXXC sequences CQKC, CSYC, CRYC, CQNC, and CQEC, respectively. In Craniata, variation in the Erv-CXXC motif follows the taxonomic lineage. Specifically, both QSOX1 and QSOX2 of lower taxa often contain a CREC pattern, whereas the patterns diverge in higher taxa such that CRDC becomes dominant for QSOX1 and CKEC for QSOX2. In general, Erv-CXXC patterns delimit narrower groups of organisms than do Trx-CXXC patterns and vary in most instances of paralogs. Even in some recent duplications in which paralogs retain high sequence identity, the Erv-CXXC motifs differ. For example, the Erv-CXXC motifs have diverged following the independent duplications in Arabidopsis (CEEC vs. CEDC) and Trichocarpa (CDDC vs. CDEC).
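The paralog comparison at the end of this paragraph can be made concrete by listing the positions at which two Erv-CXXC motifs differ. The sketch below uses only the motif pairs quoted in the text; the helper name `motif_differences` and the dictionary layout are assumptions introduced for illustration.

```python
def motif_differences(motif_a, motif_b):
    """Return (position, residue_a, residue_b) for each mismatch, 1-based."""
    if len(motif_a) != len(motif_b):
        raise ValueError("motifs must have equal length")
    return [
        (index + 1, res_a, res_b)
        for index, (res_a, res_b) in enumerate(zip(motif_a, motif_b))
        if res_a != res_b
    ]


# Erv-CXXC motif pairs for recent paralogs, as quoted in the text.
PARALOG_ERV_CXXC = {
    "Arabidopsis": ("CEEC", "CEDC"),
    "Trichocarpa": ("CDDC", "CDEC"),
}

for taxon, (first, second) in PARALOG_ERV_CXXC.items():
    print(taxon, motif_differences(first, second))
```

For both pairs the only difference is at the third position of the motif, an acidic-to-acidic exchange (E/D) immediately before the second cysteine.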
Consideration of the various QSOX CXXC patterns in light of the phylogenetic tree allowed us to distinguish between diversity that implies the absence of constraints from diversity that suggests adaptation. In particular, the strong conservation of Trx-CXXC patterns within different lineages suggests tight constraints along these branches, despite the prominent differences in patterns between lineages. A phylogenetic perspective also aids in discrimination between primary and auxiliary paralogs. The primary paralog is present in closely related species and tends to exhibit the same CXXC patterns, whereas auxiliary paralogs greatly increase the sequence diversity of the motif. Finally, a phylogenetic approach allows speculation regarding the primordial patterns that diversified into those exhibited by contemporary organisms. For example, the occurrence of CPAC in the Trx-CXXC of plants may have arisen from a primordial CPHC pattern, still exhibited by Stramenopiles and even in two occurrences of Chlorophyta.
We then sought more insight into the potential origins of the Trx and Erv domains to assess the likelihood of various scenarios that may have produced the current QSOX family. A comparison of the QSOX Trx and Erv domains with other proteins containing only one or the other of these domains demonstrated that the QSOX ψErv/Erv module is clearly distinct from all other known Erv enzymes. The most widespread member of the Erv family is Erv1, a mitochondrial enzyme found broadly throughout eukaryotes, including yeast. Erv1 is a symmetric dimer, in which the two copies of the Erv domain interact. This arrangement differs significantly from QSOX, in which the Erv domain forms a pseudodimer with the ψErv domain and is therefore inaccessible for true dimerization. The presence of the ψErv region immediately upstream of the Erv domain in QSOX, as well as the existence of a third, carboxy-terminal CXXC motif (CT-CXXC), are unique characteristics of the QSOX subfamily of Erv enzymes that distinguish them definitively from other Erv domains. Remarkably, the CT-CXXC motif, despite being dispensable for enzyme assays in vitro, is as highly retained as the two catalytic CXXC motifs of QSOX. In summary, the available data suggest that the ψErv/Erv modules in all contemporary QSOX enzymes share a common ancestor.
In contrast to the clear distinction between the ψErv/Erv module of QSOX enzymes and Erv domains of other proteins, the distinction between Trx domains of QSOXs and PDI family proteins is less evident. The strong structural similarity among Trx domains from PDI proteins, and between PDI and QSOX, makes the identification of defining features of Trx domains in one protein family vs. the other quite challenging. Unlike the extension containing the CT-CXXC motif of QSOX, which is a distinguishing characteristic of its Erv domain, the CX6-8C disulfide in the QSOX Trx1 domain is found also in PDI proteins. No secondary structure element, loop, or other structural feature appears to differentiate QSOX Trx domains from PDI Trx domains. Furthermore, the large and ancient PDI family provides a rich source for both Trx and juxtaposed Trx1/Trx2 domains from which the Trx modules in plant and animal QSOXs may have been derived. Focusing on amino acid sequences rather than structural features, the CGHC and CPAC patterns, found in the Trx domains of major QSOX lineages, are observed in various PDI proteins as well. Interestingly, the CGHC pattern is strongly dominant among PDI proteins that contain non-catalytic Trx domains downstream of a catalytic one, just as the CGHC pattern is characteristic of QSOXs that contain a catalytic/non-catalytic arrangement of Trx domains. In turn, CPXC patterns are frequently observed in PDI family proteins that lack non-catalytic Trx domains, analogous to plant and protist QSOXs.
Despite the similarities between QSOX and PDI Trx modules, in all phylogenetic trees constructed to include both sets, QSOX Trx domains clustered separately, although with very low scores at the root (Additional file 2: Figure S6). Further inspection of QSOX sequences compared to PDIs revealed an amino acid position that differs consistently between the two protein families. Two residues upstream of the CXXC motif in the Trx domain, a proline that is found nearly universally in PDI proteins is replaced, most often, by a serine or threonine in metazoan QSOXs or by a histidine in plant QSOXs (Additional file 2: Figure S7). A proline is found only in select QSOX enzymes, such as those from O. tauri, M. pusilla, and P. infestans. The phylogenetic trees and the pinpointing of particular residues suggest that subtle sequence differences do distinguish QSOX and PDI Trx domains, consistent with either divergent evolution of QSOX Trx domains from a single precursor or convergence of distinct QSOX Trx domain sequences due to shared pressure to function in conjunction with the ψErv/Erv module.
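The position-specific comparison described above can be sketched as a lookup of the residue two positions before the first cysteine of the CXXC motif. This is a minimal sketch under stated assumptions: the function names, the interpretation of "two residues upstream" as index minus two from the first cysteine, and the example fragments are all introduced here for illustration and are not taken from the paper's alignment.

```python
import re


def residue_before_cxxc(window, offset=2):
    """Return the residue `offset` positions before the first C-X-X-C motif.

    Returns None if no motif is found or the window starts too close to it.
    """
    match = re.search(r"C..C", window)
    if match is None or match.start() < offset:
        return None
    return window[match.start() - offset]


def family_hint(residue):
    """Map the upstream residue to the trend reported in the text.

    This is only a hint: the text notes that a few QSOX enzymes
    also carry a proline at this position.
    """
    if residue == "P":
        return "PDI-like"
    if residue in ("S", "T"):
        return "metazoan-QSOX-like"
    if residue == "H":
        return "plant-QSOX-like"
    return "unassigned"


# Hypothetical active-site fragments, for illustration only.
for fragment in ("EFYAPWCGHCK", "EFFASWCGHCI", "EFFAHWCPACR"):
    residue = residue_before_cxxc(fragment)
    print(fragment, residue, family_hint(residue))
```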
In either scenario, the question remains whether the fusion of Trx and Erv domains has a common purpose across all organisms that display this fusion. In particular, is the fusion indicative of a shared biochemical mechanism? The availability of QSOX crystal structures allowed for an analysis of QSOX sequences against the backdrop of the three-dimensional structures. The ensuing analysis of conservation at the level of QSOX domains was conducted to shed more light on the evolution of fundamental structural and mechanistic elements of the QSOX enzyme.
The first two segments of conservation correspond to the Trx1 redox-active site and the CX6-8C disulfide-bonded loop adjacent to it in the tertiary structure. Together, these two regions constitute the interface between the Trx1 and the Erv domains when they are packed against one another in the electron-transfer intermediate. To better appreciate the conservation at the contact site, conservation scores were mapped onto the QSOX structure (Figure 6B). Apart from the Trx-CXXC motif, the most conserved residue in these two segments is a proline at the beginning of the β4 strand. This proline, present in many Trx-fold proteins and found structurally in the cis configuration, was previously suggested to contribute to substrate binding, or shown to facilitate substrate release or to inhibit metal binding by the reactive thiolate-based active site of thioredoxins. Following the cis-proline, there is poor conservation at the sequence level in QSOX, though at the structural level the domain is predicted to retain an α4 helix.
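This section does not state how the conservation scores were computed. As a stand-in, and purely as an assumption of this sketch, per-column conservation can be illustrated with a normalized Shannon entropy over an alignment, where 1.0 means an invariant column. The function name and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Score each alignment column as 1 - H / log2(20).

    `alignment` is a list of equal-length strings; '-' marks a gap
    and is ignored. Columns containing only gaps score 0.0.
    """
    n_columns = len(alignment[0])
    if any(len(row) != n_columns for row in alignment):
        raise ValueError("all aligned sequences must have equal length")
    scores = []
    for column in range(n_columns):
        residues = [row[column] for row in alignment if row[column] != "-"]
        if not residues:
            scores.append(0.0)
            continue
        total = len(residues)
        entropy = -sum(
            (count / total) * math.log2(count / total)
            for count in Counter(residues).values()
        )
        scores.append(1.0 - entropy / math.log2(20))
    return scores


# Toy alignment built from the three major Trx-CXXC patterns named in the text,
# each preceded by a hypothetical tryptophan.
toy_alignment = ["WCGHC", "WCPHC", "WCPAC"]
print([round(score, 2) for score in column_conservation(toy_alignment)])
```

In this toy case the two cysteine columns are invariant and score 1.0, while the two variable positions of the motif score lower; dedicated tools that weight substitutions by evolutionary distance would give different absolute values.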
The third region of conservation (Figure 6C) is another disulfide bonded loop at the amino-terminal junction of the Erv domain (Cys393-Cys405 in the H. sapiens QSOX1 sequence). Within the eleven residues between the cysteines in this loop is a basic residue (Arg401 in the H. sapiens QSOX1 sequence) that makes an electrostatic interaction with the phosphates in the FAD cofactor. The remainder of the loop projects out from the Erv helical bundle to contribute, together with the sixth region of conservation detailed below, to a structural element that has been described as a “backstop,” since it seems to constrain the Trx1 domain sterically as it docks against the Erv domain during interdomain electron transfer. The presence of this backstop is one of the major differences between QSOX Erv domains and other Erv enzymes. This difference may reflect the structural contexts of the di-cysteine motifs that interact with the Erv-CXXC in QSOX compared to other Erv enzymes. Specifically, the QSOX Trx domain, from the opposite end of the multi-domain protein, interacts with the QSOX Erv domain, whereas di-cysteine motifs on flexible regions of polypeptide tethered locally interact with the FAD-proximal disulfide of stand-alone Erv domains [19, 20]. The sequence conservation in QSOX extends from the backstop loop through the first three turns of the α1 helix in the Erv domain, where a tryptophan and histidine project from the same side of the α1 helix to form part of the FAD binding site.
The fourth conserved region in QSOX comprises the Erv redox-active di-cysteine motif. In addition, a highly conserved histidine and phenylalanine follow the CXXC motif in the α3 helix of the Erv domain. The imidazole side chain of the histidine is surface-exposed and interacts with the polar edge of the FAD isoalloxazine. The phenylalanine, together with the α1 helix tryptophan mentioned above, contributes to the hydrophobic pocket occupied by the non-polar edge of the isoalloxazine.
The fifth conserved region is the α4 helix in the Erv-domain helical bundle, which contains a set of side chains that contacts the adenine portion of the FAD. Specifically, a histidine and two asparagines are hallmarks of Erv domains. The α4 histidine interacts with the conserved histidine from the α1 helix, and the two imidazole rings, in a planar arrangement, stack against the FAD adenine ring.
Finally, the sixth conserved segment (Figure 6C) comprises the CT-CXXC motif. The CT-CXXC disulfide is immediately downstream of another basic residue (Lys500 in the H. sapiens QSOX1 sequence) in the vicinity of the FAD phosphates and an aromatic residue that packs against the outer face of the FAD adenine. Non-QSOX Erv domains have these same two functional residues, but in a simpler structural context: they are present in a KXXF/Y motif on the approximately ten-residue linker that directly connects the fourth and fifth helix in the Erv bundle. In QSOX, there are insertions both upstream and downstream of the basic/aromatic residue pair. Upstream is a stretch of about ten residues that constitute the second half of the backstop by packing against the third conserved segment (Figure 6C). Downstream of the basic/aromatic pair is the CT-CXXC and a weakly conserved loop preceding the fifth helix in the domain. The presence of the loop between the CT-CXXC and the fifth helix is conserved, though its composition less so. The function of this loop in the QSOX structure or mechanism is still unclear.
In summary, regions of highly conserved sequences in QSOX correspond to 1) important redox-active or structural disulfide bonds, 2) residues in direct contact with the FAD cofactor, and 3) features that appear to contribute to the interaction of the Trx and Erv domains but do not appear to be essential for the folding or fundamental catalytic activity of each module in isolation. In particular, the backstop in the ψErv/Erv module, contributed by regions #3 and #6 described above, is a prominent feature that appears to promote domain docking for electron transfer.
Inspection of sequences that are poorly conserved may also shed light on aspects of QSOX function. An apparently poorly conserved region that nevertheless is likely to be crucial for QSOX activity is the linker between the Trx domain(s) and the ψErv region. The sequences of this linker are highly variable, but the lengths fall into a narrow range: about 20–35 amino acids between the predicted ends of the flanking secondary structure elements. A rough conservation of linker length is consistent with the proposed role of the linker in providing flexibility for relative domain rotations in QSOX function, whilst tethering the two interacting domains with high effective concentration. Although some outliers appear to contain linkers of up to a hundred amino acids, these may arise from erroneous predictions of splice sites based on genome sequences.
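The linker-length criterion can be expressed as simple residue arithmetic. The sketch below is an illustration under stated assumptions: residue numbers are 1-based, the linker is taken as the residues strictly between the last residue of the upstream secondary structure element and the first residue of the downstream one, and the function names and example coordinates are hypothetical. The 20–35 range is the approximate one quoted in the text.

```python
def linker_length(upstream_end, downstream_start):
    """Count residues strictly between two 1-based boundary positions."""
    if downstream_start <= upstream_end:
        raise ValueError("downstream element must start after upstream element ends")
    return downstream_start - upstream_end - 1


def classify_linker(length, low=20, high=35):
    """Label a linker as 'typical' if within the approximate range from the text."""
    return "typical" if low <= length <= high else "outlier"


# Hypothetical boundary positions, for illustration only.
examples = {"sequence_A": (250, 280), "sequence_B": (250, 351)}
for name, (end, start) in examples.items():
    length = linker_length(end, start)
    print(name, length, classify_linker(length))
```

Sequences flagged as outliers would be candidates for re-examination of the gene model, in line with the possibility of mispredicted splice sites raised above.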
The structures and dynamic relationship of the component parts of the QSOX enzyme, as well as the steps in its internal electron relay, have been recently determined. Furthermore, the targeting and localization of QSOX to the Golgi apparatus and extracellular environment, at least in mammals, is now evident [4, 6] and (Ilani T, Alon A, Grossman I, Horowitz B, Kartvelishvily E, Cohen SR, Fass D: A secreted disulfide catalyst controls extracellular matrix composition and function, submitted). With its unique domain composition, localization, and functional independence, it is likely that QSOX acts in a fundamentally different pathway or pathways than other disulfide forming enzymes in the cell. Although the complement of disulfide catalysts differs between organisms, the presence of one or more disulfide forming enzymes in the ER is considered necessary for the central role of this organelle in oxidative protein folding. QSOX, outside the ER, appears not to play a general role in folding the pool of secretory proteins, but may rather be dedicated to particular targets.
Trx-fold domains are universal in bacteria, archaea, and eukaryotes, but the fusion of a Trx domain with an Erv sulfhydryl oxidase domain is unique to the eukaryotic cell. QSOX is clearly not essential for eukaryotes, however, as yeast appear to lack the enzyme entirely. One may speculate that QSOX activity is related to multicellularity, which is in keeping with an observed role in extracellular matrix formation in animals (Ilani T, Alon A, Grossman I, Horowitz B, Kartvelishvily E, Cohen SR, Fass D: A secreted disulfide catalyst controls extracellular matrix composition and function, submitted). According to this model, the presence of QSOX in unicellular parasites such as trypanosomes and the resemblance of the Trx domain redox-active site of parasite QSOXs to that of their hosts (Figure 4) suggest that host proteins may be the targets of the pathogen-derived disulfide catalyst. Complicating the emphasis on multicellularity, however, is the observation that QSOX is found in unicellular green algae and in diatoms. The possibility therefore remains open that QSOX evolved to oxidize different target proteins on different pathways and for different purposes in plants, animals, and other species, rather than playing a universal biological role in all the species that encode it.
The most evident structural difference between plant and animal QSOXs is their domain composition. A number of different models can be discussed for how the QSOX family acquired its inhomogeneous domain content. If we assume that all current QSOX enzymes derived from the same original fusion between Trx-fold and ψErv/Erv elements, then either the fusion involved a single Trx domain, and metazoans subsequently gained an additional Trx domain, or the fusion occurred with a tandem Trx1/Trx2, and non-metazoans subsequently lost the Trx2 domain. From our analysis, however, we cannot rule out a model in which two independent fusion events of the ψErv/Erv unit with either a single Trx domain or tandem Trx domains led to the set of contemporary QSOXs. Another model that can be considered is one in which a universal QSOX precursor arose by fusion of a single Trx domain with the ψErv/Erv module, and the Trx domain was then replaced by a tandem Trx1/Trx2 early in the evolution of metazoans. In the latter two models, a subsequent modest convergence of QSOX Trx sequences, reflecting shared pressure to function in an electron relay with the ψErv/Erv module, would then explain the weak distinction of both plant and animal QSOX Trx domains as a group from PDI Trx domains. It should be noted that each of the models presented involves a minimum of two major fusion/deletion/substitution events in evolution. Nevertheless, an independent fusion of the ψErv/Erv module with a single Trx domain on one lineage and with tandem Trx domains on another may be less parsimonious, since it requires a ψErv/Erv module to be retained and available for both fusions at distinct points in time. In all examined sequences, the ψErv/Erv unit of QSOX enzymes is clearly distinct from single-domain Erv enzymes present in these same organisms, and there is little doubt that this module of QSOX evolved divergently from a single precursor. In particular, the backstop region and the CT-CXXC motif are present universally in QSOX enzymes but lacking in other Erv proteins. In the absence of evidence that the ψErv/Erv unit has a function on its own, it may be unlikely that it existed in primordial genomes until it was independently incorporated into two evolutionarily distinct fusion proteins with different Trx modules.
In contrast to the ambiguities regarding the sources and evolution of the QSOX domains, the comparative analysis performed against the backdrop of the QSOX enzyme structure suggests that the QSOX biochemical mechanism is well-understood and conserved throughout the evolutionary tree. In particular, the transfer of electrons from the Trx to the Erv domain (Figure 1) by a dithiol/disulfide exchange step appears to be characteristic of QSOX enzymes. As noted above, the required interaction between the Trx and Erv domains is strongly supported by conservation of the “backstop” region, a feature absent from Erv family enzymes that do not receive electrons at their FAD-proximal disulfide directly from a Trx-fold domain. In addition, the conservation of a region with variable sequence but relatively uniform length between the oxidoreductase and sulfhydryl oxidase modules supports the notion that the orientation between the Trx and Erv domains is flexible but the effective concentration of the two redox-active domains is constrained.
Given the support from sequence analysis for a universal interdomain electron transfer mechanism in QSOX, it is therefore puzzling that a biochemical study of recombinant A. thaliana QSOX failed to detect this step. The recombinant enzyme appeared to be well-folded, the Trx redox-active motif could readily accept electrons, and the Erv domains displayed robust sulfhydryl oxidase activity. Nevertheless, the Trx domain did not appear to transfer electrons to the Erv domain in the plant enzyme. The lack of interdomain electron transfer in recombinant plant QSOX cannot be explained merely by the absence of the Trx2 domain, as recombinant T. brucei QSOX, also lacking a Trx2 domain, showed effective interdomain dithiol/disulfide exchange and activity virtually indistinguishable from mammalian QSOX homologs [8, 9, 24]. This conflict between the in vitro enzymology of plant QSOX and the structure-guided sequence analysis presented here suggests that much remains to be discovered regarding the in vivo context of plant QSOX and the manner by which its localization, post-translational modification, and processing may affect its ability to carry out its anticipated enzymatic cycle. The puzzling biochemistry of plant QSOXs may be related to the observation that the Trx and Erv domain sequences appear to have diverged in independent steps along the plant lineage, suggesting that the domains were subjected in plants to other pressures besides their mutual interaction.
The open questions regarding plant QSOX enzymology emphasize that, sterically and electrostatically, the interdomain redox relay that occurs in QSOX is not trivial. It has been noted that redox relays tend not to involve direct transfer of electrons between Trx domains, a generalization that may apply, in most cases, to transfer between Trx and Erv domains as well. The reason for this apparent prohibition is the unfavorable apposition of helix dipoles as the Trx- and Erv-CXXC motifs, both at the amino termini of helices, interact. The elucidated mechanism of QSOX enzymes [8, 9] nevertheless includes direct electron transfer between helix amino-termini, indicating that such an event, while rare, does occur in a natural biological pathway. This unusual interaction is likely to be compensated by other features of the protein-protein interaction interface or more globally within the enzyme. In particular, the tethering of the Trx and Erv modules to one another to increase their effective concentration and the presence of the “backstop” may be necessary, but not sufficient, to promote the interaction. Further studies of plant QSOX, including high-resolution structural information, will provide a wider view of QSOX evolution and correct any biases acquired by a focus on metazoan and parasite representatives of the family.
Despite certain marked differences among QSOX sequences from different phyla, strong similarities within the redox-active domains suggest that fundamental aspects of the mechanism described for metazoan QSOX enzymes are shared among QSOXs throughout Eukarya. Based on sequence alignments of QSOX enzymes from diverse species and an understanding of the physical basis for interdomain electron transfer derived from QSOX crystal structures, not only the presence of the Trx and Erv domains but also aspects of their interaction interfaces appear to be shared. Regardless of the possibly differing substrate sets available to QSOX enzymes in different species, and the potentially different pathways in which these substrates function, there is validity to the annotation of all detected Trx-Erv domain fusions as QSOX enzymes. QSOX enzymes appear to be characterized not only by a Trx domain and an Erv domain at opposite ends of a multi-domain protein, but also by the potential to carry out a complete electron transfer pathway from substrate dithiols to the terminal electron acceptor, molecular oxygen.
The NCBI BLAST tool (http://www.ncbi.nlm.nih.gov/BLAST/) was used with various full-length QSOX protein sequences to exhaust the available databases at NCBI, at both the nucleotide (nt, refseq_rna, refseq_genomic, and est) and protein (nr, refseq_protein, swissprot, and pdb) levels. When identified, expressed sequence tags (ESTs) were assembled into contigs and compared to the corresponding genomic DNA. Exon boundaries, consensus splice sites, and sites of transcription termination were identified by comparison of genomic DNA and cDNA sequences. All retrieved sequences were manually curated. Splice variants of QSOXs were identified by visual inspection.
It should be noted that the resemblance of the QSOX Trx and Erv domains to other thioredoxins and sulfhydryl oxidases made the identification more complex, especially for short (i.e., lacking one of the domains) and isolated (i.e., lacking phylogenetically close homologs) sequences. Sequences were therefore accepted as QSOX if they had both a Trx1 and an Erv domain, as the existence of both domains is a defining characteristic of QSOX sequences. In several instances, segments of the Trx or Erv domains were missing, and the sequence was confirmed as QSOX by comparison with a closely related homolog. In all these instances, strong homology could be observed along the sequences and not only at loci of the remaining domain. There were no predetermined cutoffs, but the minimal similarity was 68%, calculated for L. loa QSOX3 (lacking parts of its Trx and Erv domains) and A. suum QSOX3.
Protein sequences were aligned using the ClustalX program with default settings. The neighbor-joining phylogenetic trees, used to display paralogs and CXXC patterns, were constructed using the embedded algorithms of the ClustalX program.
To assess evolutionary events from QSOX sequences, more robust maximum likelihood (ML) trees were constructed. For the ML trees of the QSOX Trx and Erv domains, the following patterns were generated: Trx: X(5)-[DGEILQKVNRM]-x(5)-[WFYKSGT]-[CS]-x(2)-[CS]-x(4)-[PRSDETGAKHN]-x(10,35)-C-x(5,9)-C-x(8)-P-[AGTSLMFRHKNQ]-X(9), Erv: X(30)-[TNPSLF]-C-[GSTA]-X(18,54)-C-x(2)-[CSG]-x(2)-[HNKEY]-[FIL]-x(40,116)-[WFY]-x(5)-C-x(2)-C. Truncated QSOX sequences (indicated in Additional file 1: Table S1) were omitted from the dataset used for the analysis. To avoid over-representation on particular branches, one representative species from each genus was included (indicated in Additional file 1: Table S1). Concatenation of the Trx and Erv domains was used to build the QSOX ML tree. All ML trees were constructed using the Phylip package. First, 100 bootstrap replicates were generated using the seqboot program (seed set to 9). The ML trees were then generated using the proML program with the Dayhoff and Jones-Taylor-Thornton (JTT) probability models (seed set to 9 and jumble to 3). Lastly, consensus trees were established using ‘Consense,’ available in the Phylip package. Figures were created using FigTree.
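The Trx and Erv patterns given above are written in a PROSITE-like notation, and a short sketch may help show how such a pattern can be applied to a protein sequence. The Python example below translates the notation into a regular expression. It is an illustration only, not the procedure used in this study: the function name `prosite_to_regex` is hypothetical, and the sketch assumes that both `x` and `X` denote any residue, that `(n)` and `(n,m)` denote repeat counts, and that bracketed letters denote allowed residues.

```python
import re

# Patterns copied from the text above.
TRX_PATTERN = ("X(5)-[DGEILQKVNRM]-x(5)-[WFYKSGT]-[CS]-x(2)-[CS]-x(4)-"
               "[PRSDETGAKHN]-x(10,35)-C-x(5,9)-C-x(8)-P-[AGTSLMFRHKNQ]-X(9)")
ERV_PATTERN = ("X(30)-[TNPSLF]-C-[GSTA]-X(18,54)-C-x(2)-[CSG]-x(2)-[HNKEY]-"
               "[FIL]-x(40,116)-[WFY]-x(5)-C-x(2)-C")

# One pattern element: a residue class or single letter, optionally
# followed by a repeat count "(n)" or a range "(n,m)".
_ELEMENT = re.compile(r"(\[[A-Z]+\]|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-like pattern into a Python regex string.

    Assumption (not stated in the source): 'x' and 'X' both mean
    "any residue" and are mapped to the character class [A-Z].
    """
    parts = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element.strip())
        if match is None:
            raise ValueError(f"unrecognized pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue in ("x", "X"):
            residue = "[A-Z]"
        if low is None:
            repeat = ""
        elif high is None:
            repeat = "{" + low + "}"
        else:
            repeat = "{" + low + "," + high + "}"
        parts.append(residue + repeat)
    return "".join(parts)


TRX_REGEX = re.compile(prosite_to_regex(TRX_PATTERN))
ERV_REGEX = re.compile(prosite_to_regex(ERV_PATTERN))


def find_domain(sequence: str, regex: "re.Pattern[str]"):
    """Return the (start, end) span of the first match, or None."""
    hit = regex.search(sequence.upper())
    return hit.span() if hit else None
```

With this sketch, `find_domain(seq, TRX_REGEX)` would return the span of the first segment of `seq` consistent with the Trx pattern, or `None` if no segment matches. Treating `X` as a wildcard is a simplification, since `X` is also used as the one-letter code for an unknown residue.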
A number of QSOX crystal structures are available [9, 10]. These include the H. sapiens structure determined in two fragments (PDB codes 3Q6O and 3LLK), a mutant M. musculus four-domain structure (PDB codes 3T58 and 3T59), and wild-type and mutant T. brucei structures (PDB codes 3QCP and 3QD9). A structure model for a “closed” or “docked” state of H. sapiens QSOX1 was constructed by superposition of the structures of the oxidoreductase and sulfhydryl oxidase portions of H. sapiens QSOX, determined separately by X-ray crystallography (PDB codes 3Q6O and 3LLK respectively), onto the structure of a mutant of M. musculus QSOX designed to mimic the intermediate in interdomain electron transfer (PDB code 3T58).
To improve the accuracy of the QSOX sequence logo, secondary structures were predicted using PSIPRED and used to manually adjust the alignment in certain cases. In some sequences that lack the Trx2 domain, segments of the ψErv domain were erroneously pulled into the gap, but these segments could be returned readily to their proper positions based on the high helix content of the ψErv domain and its conservation on the structural level. Such adjustments were unnecessary in other analyses, which were based on alignments that excluded the Trx2 domain. Logos were created using WebLogo. The QSOX multiple sequence alignment was then aligned to the atomic coordinate file using the ConSurf server, resulting in a color coded scheme of conserved residues mapped onto the model of the H. sapiens QSOX1 closed state structure.
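As an illustration of what a sequence logo summarizes, the following Python sketch computes the per-column information content of a protein alignment, the quantity that usually sets the height of each logo stack. It is a simplified calculation under stated assumptions and not the WebLogo or ConSurf implementation: gaps are ignored, a uniform background over 20 amino acids is assumed, and the small-sample correction that logo tools commonly apply is omitted. The function names and the toy alignment in the usage note are hypothetical.

```python
import math
from collections import Counter


def column_information(column: str, alphabet_size: int = 20) -> float:
    """Information content (bits) of one alignment column.

    Computed as log2(alphabet_size) minus the Shannon entropy of the
    observed residue frequencies. Gap characters are skipped, and a
    column containing only gaps is assigned 0.0 bits (an assumption).
    """
    residues = [c for c in column.upper() if c.isalpha()]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


def alignment_information(aligned_sequences):
    """Information content for every column of an equal-length alignment."""
    lengths = {len(seq) for seq in aligned_sequences}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_information("".join(col)) for col in zip(*aligned_sequences)]
```

For a toy alignment such as `["WCGHC", "WCGHC", "FCPAC"]`, the fully conserved cysteine columns receive the maximum value of log2(20) bits, whereas the variable positions between them score lower.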
We would like to thank Tal Pupko from Tel Aviv University and Irit Orr from the Weizmann Institute Bioinformatics Unit for their remarks and suggestions. This study was supported by the Israel Science Foundation.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.