Ancestry and evolution of a secretory pathway serpin
© Kumar and Ragg. 2008
Received: 13 May 2008
Accepted: 15 September 2008
Published: 15 September 2008
The serpin (serine protease inhibitor) superfamily constitutes a class of functionally highly diverse proteins usually encompassing several dozens of paralogs in mammals. Though phylogenetic classification of vertebrate serpins into six groups based on gene organisation is well established, the evolutionary roots beyond the fish/tetrapod split are unresolved. The aim of this study was to elucidate the phylogenetic relationships of serpins involved in surveying the secretory pathway routes against uncontrolled proteolytic activity.
Here, rare genomic characters are used to show that orthologs of neuroserpin, a prominent representative of vertebrate group 3 serpin genes, exist in early diverging deuterostomes and probably also in cnidarians, indicating that the origin of a mammalian serpin can be traced back far in the history of eumetazoans. A C-terminal address code assigning association with secretory pathway organelles is present in all neuroserpin orthologs, suggesting that supervision of cellular export/import routes by antiproteolytic serpins is an ancient trait, though subtle functional and compartmental specialisations have developed during their evolution. The results also suggest that massive changes in the exon-intron organisation of serpin genes have occurred along the lineage leading to vertebrate neuroserpin, in contrast with the immediately adjacent PDCD10 gene that is linked to its neighbour at least since divergence of echinoderms. The intron distribution pattern of closely adjacent and co-regulated genes thus may experience quite different fates during evolution of metazoans.
This study demonstrates that the analysis of microsynteny and other rare characters can provide insight into the intricate family history of metazoan serpins. Serpins with the capacity to defend the main cellular export/import routes against uncontrolled endogenous and/or foreign proteolytic activity represent an ancient trait in eukaryotes that has been maintained continuously in metazoans though subtle changes affecting function and subcellular location have evolved. It is shown that the intron distribution pattern of neuroserpin gene orthologs has undergone substantial rearrangements during metazoan evolution.
The serpins represent a superfamily of proteins with a common fold that cover an extraordinarily broad spectrum of different biological functions. Most serpins inhibit proteases from one or several different clans of peptidases; some superfamily members, however, exert disparate roles, such as assisting in protein folding or transportation of hormones. This functional diversity is enabled, at least in part, by the unusual structural plasticity of the serpin molecule that, in the native form, often takes a metastable structure. Serpins can perform their activity in the extracellular space or in various subcellular compartments, including the secretory pathway routes [2, 3], and they are found in all high-order branches of the tree of life. Deficiency of some serpins, such as antithrombin or neuroserpin, is lethal or may be associated with serious pathology [5, 6]. Mutations of the neuroserpin gene, for instance, may result in formation of intracellular aggregates in the brain causing dementia, while wild type neuroserpin provides protection of neuronal cells in cerebral ischemia and other pathologies. Neuroserpin inhibits tissue plasminogen activator (tPA), urokinase-type plasminogen activator, nerve growth factor-γ, and plasmin. These enzymes are also believed to represent physiological targets of the inhibitor [8, 9]. Native neuroserpin is found in the medium of some cell lines but also in dense core secretory vesicles of neuronal cells [10, 11], suggesting that it could exert a function within the regulated secretory pathway, though there is no experimental evidence for this. Association of neuroserpin with secretory pathway organelles is mediated via a 13 amino acid C-terminal sorting sequence.
Recently, serpins equipped with a C-terminal endoplasmic reticulum (ER) retention/retrieval signal that efficiently inhibit furin and/or other members of the proprotein convertase (PC) family have been identified in Drosophila melanogaster [12–15], demonstrating for the first time that serpins with antiproteolytic activity may reside in early secretory pathway organelles.
The elucidation of phylogenetic relationships among animal serpins poses a notorious problem. Serpin genes represent a substantial fraction of metazoan genomes, often amounting to several dozen members in mammals. In various vertebrate lineages multiple expansions of serpin genes have occurred [17, 18], resulting in numerous paralogs. In other lineages, such as fungi, serpins seem to be rare. In some species phylogenetic relationships of serpin genes may be obscured further by a propensity for reciprocal or non-reciprocal exchange of cassette exons coding for the hypervariable reactive site loop region (RSL). The sequence of this region plays a primary role in determining the specificity of serpin/target enzyme interaction. Inhibition of target proteases involves cleavage of a scissile bond located between positions P1 and P1' of the inhibitor's RSL. Serpins also occur with a patchy distribution in prokaryotes, but the time point of their first emergence is not known.
Extension of microsynteny analysis to lancelets (Branchiostoma floridae) and sea urchins (Strongylocentrotus purpuratus) showed that a serpin gene is present in either of these species in close vicinity to the PDCD10 gene (Figure 2). As in vertebrates, these genes are arranged in a head-to-head orientation. Sequence comparisons corroborated that the Branchiostoma floridae serpin adjacent to PDCD10, denominated Bfl-Spn-1, is the ortholog of the previously characterised serpin gene Spn-1 from the closely related Branchiostoma lanceolatum (92.2% sequence identity for the C-terminal 385 amino acids) that was recently shown to inhibit proprotein convertases. Each of these serpins contains a highly conserved RSL region (positions P5 – P1': NMMKR ↓ S), and a C-terminal ER retention/retrieval signal (KDEL) (Figure 4). The presence of an N-terminal signal peptide in lancelet Spn-1 mediating access to the secretory pathway is supported by cDNA sequence analysis and expression studies. The gene cluster harbouring the PDCD10/Spn-1 gene pair includes a closely related paralog (Bfl-Spn-2) of B. floridae Spn-1 (Figure 2) that also has a counterpart in B. lanceolatum (not shown).
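A figure such as the 92.2% sequence identity quoted above is derived from a pairwise alignment. The following minimal sketch shows one common way to compute such a value. The source does not state how gaps were treated, so skipping gapped columns is an assumption made here, and the function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the ungapped columns of a pairwise alignment.

    Assumption (not from the article): columns in which either sequence
    has a gap character '-' are excluded from the comparison.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy example with made-up aligned fragments (not real serpin sequences).
identity = percent_identity("ACDE-KR", "ACDFGKR")
```

Other conventions (for example, dividing by the length of the shorter sequence) give slightly different values, so the convention should be reported together with the figure.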
Though microsynteny and signature sequences strongly argue in favour of a common ancestor giving rise to mammalian neuroserpin, Spu-Spn-1 from Strongylocentrotus purpuratus, and Spn-1 from Branchiostoma, their genes depict quite different patterns of intron distribution (Figure 3). The sea urchin Spu-Spn-1 gene does not contain any intron mapping to the serpin core domain, and the single (correctly predicted?) intron resides in the sequence coding for the signal peptide (accession number: NW_001288761). The Spn-1 gene from lancelets harbours introns at positions 75c and 174a (α1-antitrypsin numbering). This intron-poor gene architecture contrasts with the mammalian neuroserpin gene that depicts the characteristic group 3 exon-intron structure with introns at positions 167a, 230a, 290b, 323a, 352a, and 380a (the first intron of the neuroserpin gene mapping to the serpin core domain, tentatively assigned to position ~90a, cannot be assigned reliably, due to alignment ambiguities). Strikingly, none of these intron positions is conserved among neuroserpin orthologs from lancelets, sea urchins or vertebrates. There is also no congruence of the introns at positions 75c and 174a in the lancelet Spn-1 gene with any of the other vertebrate serpin genes (Figure 1). Obviously, massive changes have occurred along the neuroserpin gene lineage concerning exon-intron organisation since divergence of echinoderms, cephalochordates and vertebrates. The Spn-1 gene from Nematostella vectensis also does not contain an intron mapping to the serpin body (Figure 3).
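The lack of shared intron positions can be checked mechanically by treating each gene's serpin-core introns as a set of position labels. The sketch below uses only the positions quoted in the paragraph above (α1-antitrypsin numbering). The tentative ~90a intron of mammalian neuroserpin is left out because the text says it cannot be assigned reliably. The names `parse_intron` and `shared_introns` are illustrative, not taken from the study.

```python
import re

# Serpin-core intron positions as quoted in the text.
INTRONS = {
    "lancelet Spn-1": {"75c", "174a"},
    "sea urchin Spu-Spn-1": set(),  # no intron maps to the serpin core
    "mammalian neuroserpin": {"167a", "230a", "290b", "323a", "352a", "380a"},
}


def parse_intron(label: str) -> tuple[int, str]:
    """Split a label such as '174a' into (residue number, suffix letter)."""
    match = re.fullmatch(r"(\d+)([abc])", label)
    if match is None:
        raise ValueError(f"unrecognised intron label: {label!r}")
    return int(match.group(1)), match.group(2)


def shared_introns(gene_a: str, gene_b: str) -> list[str]:
    """Intron labels present at identical positions in both genes."""
    common = INTRONS[gene_a] & INTRONS[gene_b]
    return sorted(common, key=parse_intron)


# Pairwise comparison of the three orthologs named in the text.
genes = list(INTRONS)
pairs = {
    (a, b): shared_introns(a, b)
    for i, a in enumerate(genes)
    for b in genes[i + 1:]
}
```

Every pair yields an empty list, which matches the statement that none of the intron positions is conserved among these orthologs. A label only counts as shared if both the residue number and the suffix letter agree.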
Contrasting with the neuroserpin gene lineage, comparably few changes are evident in the architecture of the immediately adjacent PDCD10 gene since the split of sea urchins and mammals (Figure 6). The PDCD10 genes from humans and Strongylocentrotus have four out of six intron positions in common. Two introns (positions 50c and 186b, numbering based on the human sequence) seem to have been lost in the sea urchin, since they are present in the earlier diverging cnidarian, Nematostella vectensis. The sea anemone PDCD10 gene contains eight introns, six of which are found at equivalent positions in the human homolog. Nematostella vectensis genes were recently demonstrated to share the majority of intron positions with their mammalian counterparts. None of the PDCD10 introns of C. elegans superimposes on an intron found in the orthologs from humans, the sea urchin, or the sea anemone.
The findings here reveal a clear history of neuroserpin, a prominent group 3 vertebrate serpin. Features derived from the genomic, gene and protein level provide ample discriminatory data to enable drawing of a reliable kinship history of its previously unknown origin. Microsynteny analysis proved to be especially illuminating, demonstrating that rare genomic characters can provide very useful information for decoding of bonds in protein families with intricate evolutionary history. Recent investigations provide a plausible explanation for the strongly conserved syntenic association of PDCD10 and neuroserpin orthologs during diversification of deuterostomes. Apparently, expression of the head-to-head arranged genes is controlled by a bi-directional, asymmetrically acting promoter region inserted within the ~0.9 kb intergenic region separating the transcription units coding for PDCD10 and neuroserpin. Dependence on the common regulatory region thus may have forced the maintenance of linkage of these genes. The rapidly increasing flood of data from genome sequencing projects will certainly continue to provide further discriminatory information from multiple, independent levels of biological organisation, such as codon usage dichotomy, to enable robust classification of other metazoan serpins.
Neuroserpin orthologs from early diverging deuterostomes, like Strongylocentrotus or Branchiostoma, contain classical ER retention signals (KDEL or HEEL) at their C-terminal ends, and the Nematostella Spn-1 sequence terminates with SDEL, which functions as an autonomous ER retention/retrieval signal in HeLa cells when attached to a reporter protein. The C-terminal end of neuroserpin from mammals, chicken, and Xenopus is HDFEEL (Figure 4). In HeLa cells, which express three different KDEL receptors with overlapping, but not identical passenger specificities, the FEEL sequence targets attached passenger proteins primarily to the Golgi, though some 25% of cells depict ER localisation. In transfected COS cells, intracellular neuroserpin localises to either the ER or Golgi; in cells with a regulated secretory pathway, however, neuroserpin resides in large dense core vesicles, mediated by a C-terminal extension encompassing the last 13 amino acids, including the FEEL sequence. Collectively, these data are compatible with the view that, in an ancient ortholog of neuroserpin, a two amino acid insertion (FE) gave rise (in combination with additional residues?) to a modified sorting signal enabling a more specialised subcellular localisation. Irrespective of the still fragmentary data concerning the phylogenetic classification of Spn-1 from the sea anemone, it is clear that surveillance of the secretory pathway routes by serpins is an ancient and conserved trait in eukaryotes. Whether the C-terminal extensions of neuroserpin orthologs from fishes (Figure 4) are functional secretory pathway address signals remains to be determined.
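The C-terminal address codes discussed above can be looked up from the last four residues of a protein sequence. The following sketch encodes only the four tetrapeptides named in the text, with descriptions that paraphrase the observations reported there. The function name `classify_c_terminus` and the example sequence are hypothetical, and the lookup is not a general predictor of subcellular localisation.

```python
# C-terminal tetrapeptides named in the text, with the behaviour reported for them.
C_TERMINAL_SIGNALS = {
    "KDEL": "classical ER retention signal",
    "HEEL": "classical ER retention signal",
    "SDEL": "autonomous ER retention/retrieval signal in HeLa cells",
    "FEEL": "targets passenger proteins primarily to the Golgi in HeLa cells",
}


def classify_c_terminus(protein_seq: str) -> tuple[str, str]:
    """Return the last four residues and the annotation listed for them.

    A trailing '*' (stop symbol in some translated sequences) is ignored.
    """
    tail = protein_seq.strip().upper().rstrip("*")[-4:]
    return tail, C_TERMINAL_SIGNALS.get(tail, "not listed in the text")


# Hypothetical sequence; only the HDFEEL ending is taken from the article.
tail, annotation = classify_c_terminus("MAAAGLLHDFEEL")
```

For the vertebrate HDFEEL ending the lookup reports the FEEL class. In cells with a regulated secretory pathway, however, the text attributes sorting to the full 13 amino acid C-terminal extension, which a tetrapeptide lookup does not capture.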
The regional changes of placement within the secretory route may have come along with diversifications associated with the inhibitors' functions due to changes within the RSL region. Neuroserpin from vertebrates is believed to interact with its preferred target enzyme, tPA, via the single Arg residue (P1 position) in the RSL region. In lancelets, the scissile bond is preceded by the dipeptide motif Lys-Arg (KR), which is characteristic for substrates and inhibitors of proprotein convertases, which indeed, have been identified as target enzymes of lancelet Spn-1. Similar biochemical properties are expected for Spn-1 from the sea urchin, and Spn-1 from the sea anemone (Figure 4). The physiological interaction partners of these inhibitors have not yet been identified.
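The P-numbering used for the RSL counts residues outward from the scissile bond: P1, P2, ... toward the N-terminus and P1', P2', ... toward the C-terminus. As a minimal sketch, the lancelet RSL segment quoted earlier (P5 to P1': NMMKR ↓ S) can be labelled and tested for the Lys-Arg motif as follows. The function names `label_rsl` and `has_kr_motif` are illustrative and not part of the study.

```python
def label_rsl(segment: str, n_primed: int = 1) -> dict[str, str]:
    """Map P-position labels to residues for an RSL segment.

    `n_primed` is the number of residues on the C-terminal (primed) side
    of the scissile bond; the remaining residues are on the unprimed side.
    """
    if not 0 <= n_primed < len(segment):
        raise ValueError("n_primed must leave at least one unprimed residue")
    n_unprimed = len(segment) - n_primed
    labels = {}
    for i, residue in enumerate(segment[:n_unprimed]):
        labels[f"P{n_unprimed - i}"] = residue  # counts down to P1
    for j, residue in enumerate(segment[n_unprimed:], start=1):
        labels[f"P{j}'"] = residue  # counts up from P1'
    return labels


def has_kr_motif(labels: dict[str, str]) -> bool:
    """True if the scissile bond is preceded by Lys-Arg (P2 = K, P1 = R)."""
    return labels.get("P2") == "K" and labels.get("P1") == "R"


# Lancelet Spn-1 RSL segment from the text: NMMKR | S (P5 to P1').
lancelet = label_rsl("NMMKRS", n_primed=1)
```

Here `lancelet["P1"]` is `"R"` and `has_kr_motif(lancelet)` is true, consistent with the proprotein convertase recognition motif described for lancelet Spn-1. A P1 Arg without a P2 Lys, as described for vertebrate neuroserpin, would fail this check.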
Though the data clearly indicate that the roots of mammalian neuroserpin may be traced back far in the history of animals, unequivocal support for a neuroserpin ortholog in arthropods is still lacking. Several labs have provided evidence for a serpin (Spn4) with furin inhibiting activity and containing a canonical ER targeting signal in Drosophila [13–15], and a similar protein has been detected in Anopheles. However, caution is advised, because homoplasy due to convergent evolution currently cannot be excluded. The Spn4 gene is prone to recombination events, especially in the region coding for the RSL. Unraveling the relationships of the Spn4 gene from fruit flies and neuroserpin orthologs from deuterostomes requires further investigation.
The history of the neuroserpin/PDCD10 gene pair reveals some remarkable insights into the evolution of the exon-intron structure of metazoan genes. Even closely adjacent genes that are physically linked at least since divergence of echinoderms and chordates may be subject to quite different trends affecting the intron distribution patterns. Comparably few changes in the exon-intron architecture have happened in PDCD10 orthologs since divergence of lineages leading to sea anemones and vertebrates (Figure 6). In PDCD10 genes, six out of eight intron positions occurring in humans or in the cnidarian are conserved. This is in accordance with findings demonstrating that the majority of genes from early diverging present-day eumetazoans are intron-rich with most introns apparently maintained since ancient times [34, 38]; for serpin genes, however, the situation appears to be different. Regardless of the still rudimentary evidence for the putative sea anemone neuroserpin ortholog, the available data show that serpin genes in Nematostella vectensis are intron-poor. The sea anemone Spn-1 gene does not contain any introns mapping to the serpin body, and the single serpin core intron identified in one (accession number: XP_001627750) of the currently known three Nematostella vectensis serpin genes maps to residue 42c (α1-antitrypsin numbering; not shown). Turning to deuterostomes, the sea urchin neuroserpin ortholog Spu-Spn-1 is also devoid of introns within the region coding for the serpin core. In contrast, the Spn-1 genes from Branchiostoma floridae (Figure 3) and its close relative, Branchiostoma lanceolatum, each depict two introns mapping to identical sites within the serpin body. Their positions, however, are not congruent with any of the introns of mammalian neuroserpin, the prototype group 3 vertebrate serpin gene, or with any other intron location known from vertebrate serpin genes.
Therefore it must be considered that, in the serpin lineage leading to mammalian neuroserpin, an appreciable fraction of introns is not ancient, but may have been acquired during metazoan evolution; however, it cannot be excluded that intron paucity in present-day serpin genes of cnidarians (and in neuroserpin orthologs from sea urchins and lancelets) is due to massive intron loss, in contrast to most other introns that have survived hundreds of millions of years in these creatures. Intron gain is possibly not as rare as sometimes believed; however, it could be confined to certain gene families and/or to discrete evolutionary phases, for as yet unexplored reasons. Several types of processes have been proposed that may explain how introns may be acquired, but definite answers are still awaited.
In this study, we analysed and resolved the evolutionary roots of neuroserpin, a secretory-pathway associated mammalian serpin. Insight into the intricate history of the multi-membered serpin superfamily beyond the fish/tetrapod split was obtained by showing that orthologs of neuroserpin exist at least since the emergence of deuterostomes and probably already since divergence of eumetazoans and Bilateria. The continuous presence of neuroserpin orthologs equipped with C-terminal signal sequences assigning residence within the secretory pathway documents that serpins functioning as guards of the cellular export/import routes represent an ancient trait. This surveillance role has been subject to subtle functional and local variances during evolution as evidenced by changes within the RSL and the subcellular address signal. In contrast to many other, even closely linked genes, in which the majority of intron positions has been conserved for hundreds of millions of years, the intron distribution pattern of neuroserpin gene orthologs has experienced massive changes, perhaps dominated by intron gain.
Serpin protein and DNA sequences of various genomes were extracted from publicly accessible databases (see Additional file 2) via the BLAST software package (including PSI-BLAST) using key words or the human α1-antitrypsin sequence for searching. Chromosomal microsynteny analysis was performed using the NCBI Map Viewer, the ENSEMBL genome browser, the JGI genome browser, the Tetraodon genome browser, the UCSC genome browser, and inspecting the Strongylocentrotus purpuratus genome database.
Alignments of protein sequences were performed with CLUSTAL X and refined manually in GeneDoc. Intron positions were identified and assigned with GENEWISE. Mature human α1-antitrypsin was used as reference for mapping of positions and phasing of introns in serpin genes.
This work was supported by the Deutsche Forschungsgemeinschaft, Graduate Program 'Bioinformatics' at the University of Bielefeld.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.