Analyses of the core eukaryotic protein subunit of telomerase support extensive adaptation to different evolutionary and life histories in the Metazoa

Most animals employ telomerase, which consists of a catalytic subunit known as the telomerase reverse transcriptase (TERT) and an RNA template, to maintain telomere ends. Given the importance of TERT and the apparent importance of telomere biology in core metazoan life history traits like ageing and the control of somatic cell proliferation, we hypothesised that TERT would have patterns of sequence and regulatory evolution reflecting adaptations to diverse evolutionary and life histories across the Animal Kingdom. To test this, we performed a complete investigation of the evolutionary history of TERT across animals. We show that although TERT is almost ubiquitous across Metazoa, it has undergone substantial sequence evolution in canonical motifs. Beyond the known canonical motifs, we also identify and compare regions that are highly variable between lineages, but for which conservation exists within phyla. Recent data have highlighted the importance of alternate splice forms of TERT in non-canonical functions in some animals. Although animals may share some conserved introns, we find that the selection of exons for alternative splicing appears to be highly variable, and regulation by alternative splicing appears to be a very dynamic feature of TERT evolution. Even within a closely related group of triclad flatworms, where alternative splicing of TERT was previously correlated with reproductive strategy, we observe highly diverse alternative splicing patterns. Our work establishes that the evolutionary history and structural evolution of TERT involve previously unappreciated levels of change, supporting the view that this core eukaryotic protein has adapted to the requirements of diverse animal life histories.


Introduction
reaching adulthood, certain animals with indeterminate growth such as some fish (Charnov et al. 1991; Klapper et al. 1998a; Reznick et al. 2002), bivalve molluscs (Owen et al. 2007), sea urchins (Ebert et al. 2008), lobsters (Klapper et al. 1998b) and planarians (Tan et al. 2012) express telomerase through the life cycle.

In addition to its canonical role in telomere maintenance, TERT, the protein subunit of the telomerase enzyme has also been shown to have non-canonical functions (Smith et al. 2003; is known about TERT alternatively spliced (AS) variants in animals this represents a potential evolutionary mode for lineage specific life history adaptation, and thus broader assessment of TERT alternate splicing is required particularly in lineages spanning short evolutionary time frames.

Given that telomerase function is very likely to show adaptation to lineage specific life history traits, it is surprising that an in depth understanding of the evolutionary history of TERT is currently lacking from the literature. Here, we make a detailed study of TERT across the Animal Kingdom using extant data to look for evidence (or lack) of patterns of TERT protein and alternative splicing evolution that would support lineage specific adaptations. We identify TERTs across animal phylogeny and perform within and between phyla comparison of the structure of this gene. We are able to confirm the domain content of the ancestral TERT at the base of the Animal Kingdom and then we are able to show dynamic evolution of the protein in different lineages. This includes both the loss of canonical motifs and the invention of lineage specific domains, for example the invention of novel C-terminal extension and N-terminal linker motifs specific to the vertebrate lineage.
In some cases we hypothesized that novel motifs may simply compensate for lost canonical motifs, and in others they may relate to lineage specific activities and interactions that remain to be discovered. Next we use available data to assess the levels of alternative splicing, implicated in non-canonical TERT function in humans, and describe these in different taxa. We find evidence that alternative splicing is likely to be a common feature of some lineages, but is not broadly conserved with respect to splicing patterns. Finally, we study the evolution of splicing in one particular related group of animals, the free-living Tricladida planarians (Riutort et al. 1993; Carranza et al. 1996; Álvarez-Presas et al. 2008). We find that it evolves relatively rapidly in this group of highly regenerative animals, suggesting that study of alternative splicing over shorter evolutionary timescales may be required to understand adaptive non-canonical roles of TERT. Taken together our work highlights a previously unappreciated evolutionary and likely functional diversity in this core eukaryotic protein. We conclude that telomere biology, a core synapomorphy of eukaryotes, has undergone significant adaptation in different lineages and some of this is through the evolution of the TERT protein subunit of telomerase itself.

Results and Discussion

TERT distribution across the Animal Kingdom and reconstruction of the ancestral TERT domain structure.
While TERT proteins perform core functions across eukaryotes, not much is known about how conserved TERT protein structure is between different metazoan lineages. In order to address this, we set out to define the structure of TERT in the metazoan ancestor and use this as a basis for comparison amongst animals. We searched for TERT orthologs from publicly available Salpingoeca rosetta. Despite this, both M. brevicollis (Robertson 2009) and S. rosetta (Fairclough et al. 2013) have the 5'-TTAGGG(n)-3' telomeric repeat. Based on this finding, a canonical TERT is predicted to be present (Robertson 2009 . 2), while all protostomes have lost one or more canonical motifs suggesting broad differences in how TERT executes its core function may exist in metazoans.

Taxa-specific divergence of canonical TERT motifs
We next examined evolution of TERT canonical motifs in detail across taxa. Sequence analyses of 11 TERT canonical motifs revealed that the TEN (GQ motif) and TRBD (CP, QFP and T motifs) domains have high sequence conservation across their whole lengths (supplementary fig. S2).

Amongst unicellular eukaryotes considered here, only C. owczarzaki TERT has all 11 motifs (fig. 2). An independent study reported that TERT in Leishmania amazonensis also contained all 11 motifs (  S6A). Alignment of unicellular eukaryotes CTE regions revealed that blocks of conservation exist within different protozoan lineages, although not between unicellular species and human TERT (supplementary fig. S6C).

Cryptosporidium sp. CTE regions are considerably shorter than those of other protozoans. Kinetoplastid protozoans, Leishmania sp. and Trypanosoma sp., have a block of conserved residues made up of 48 amino acids at the end of their TERT proteins (supplementary fig. S6C). In other early-branching metazoans, although the parasitic cnidarian T. kitauei and the comb jelly M. leidyi lack CTE regions, we identified some conserved residues in CTE regions when aligning these regions from A. queenslandica and H. vulgaris (supplementary fig. S6A).

Within bilaterians, it has been reported that nematodes from the Caenorhabditis genus have some of the shortest TERT proteins and appear to have lost their CTE structure (Malik et al. 2000; Meier et al. 2006). We observe that this is also true for the parasitic roundworm Ancylostoma ceylanicum where the C-terminal region beyond the E motif only has 42 amino acid residues (supplementary fig. S6E). We discovered that most parasitic nematodes from the Our findings suggest that the ancestor of metazoans possessed a CTE and that while this feature has been largely retained in most animal lineages, it has been lost multiple times.  S7F). It seems highly likely that these conserved motifs have functional roles that will be important for TERT function in each of these lineages.

Extending our analyses to protostomes, we observed that protostomes generally have truncated N-terminal regions. Because arthropods and lophotrochozoans lack the GQ motif altogether, we extracted sequences upstream of the CP motif (for species that have CP motifs) or   many AS variants in mammals and fishes appear to be of substantial lengths, suggesting that they would encode functional proteins given that most of their TRBD and RT motifs are still present (supplementary table 2). In avians however, many AS variants appear to lack any canonical motifs. The wild turkey has 10 AS variants; 8 of these involve the exclusion of multiple exons and/or loss of frame mutations, do not contain any TERT motifs and are therefore likely to be non-coding and subject to nonsense-mediated decay (supplementary fig. S10B). This observation of complex splicing patterns generating many short AS variants with premature termination codons has also been reported in chickens ( to compare N-termini of parasitic flatworms, we were able to identify planarian-specific N-terminal motifs within this region such as the I(K/D)xKC and PIYK motifs (fig. 4D). With regard to CTE regions, there is poor conservation between molluscs and platyhelminthes and they require separate alignments. Within platyhelminthes CTE regions, we observed pockets of conservation, where they share the same positions on the TERT protein (fig. 4F). As planarian TERTs shared a high degree of structural similarity and due to the lack of genomic data for the nine remaining planarian species, we based our AS variant analysis on an inferred genomic structure generated from the Smed-TERT sequence. From the splicing patterns of cloned AS variants, we could visualise exons that are being excluded and this confirmed that the intron-exon boundaries were accurately inferred based on Smed-TERT sequence (supplementary fig. S13 & S14). AS variants were identified from all seven Dugesiidae species, two Planariidae species and one Dendrocoelidae species (supplementary fig. S13, S14A & S14B). We did not find any AS variants from the marine there is a low degree of conservation in the selection of spliced sites for these species. The most prevalent splicing pattern is the deletion of exons 8 to 10 (represented by orange triangles), which is present in six out of ten species where AS variants were found (supplementary fig. S13 & S14). Deletion of exons 11 and 12 (represented by blue triangles) was identified in five out of ten species. These splicing events would affect the TERT RT domain that is encoded by exons 8 to 13 (supplementary fig. S13 & S14). We note that the least amount of TERT splicing is observed in the least regenerative species, P. plebeja and D. lacteum (supplementary figure S15 & S16; Liu et al. 2013). The significance of this, if any, awaits further investigation. Overall our data suggest that alternative splicing of TERT is evolving rapidly within this group of animals and that both fine grained study of TERT alternative splicing as well as functional study of AS variants are required to understand why TERT has evolved so dynamically during animal evolution.
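The exon-skipping reasoning above (a variant is mapped to the full-length transcript, missing exons are read off, and skips falling within exons 8 to 13 are taken to affect the RT domain) can be illustrated with a minimal Python sketch. Only the RT exon range of 8 to 13 comes from the text. The exon count, the exon coordinates, the aligned blocks and the function names are hypothetical and serve purely as an illustration, not as the procedure used in the study.

```python
from typing import Dict, List, Tuple

# From the text: the TERT RT domain is encoded by exons 8 to 13.
RT_EXONS = set(range(8, 14))

# Hypothetical reference layout: 16 exons of 100 nt each on 'isoform 1'
# (1-based, inclusive coordinates). Real exon sizes will differ.
EXONS: Dict[int, Tuple[int, int]] = {
    n: (100 * (n - 1) + 1, 100 * n) for n in range(1, 17)
}


def skipped_exons(
    exons: Dict[int, Tuple[int, int]],
    aligned_blocks: List[Tuple[int, int]],
) -> List[int]:
    """Return exon numbers that no aligned block of the variant overlaps."""
    skipped = []
    for number, (start, end) in sorted(exons.items()):
        covered = any(
            block_start <= end and block_end >= start
            for block_start, block_end in aligned_blocks
        )
        if not covered:
            skipped.append(number)
    return skipped


def affects_rt_domain(skipped: List[int]) -> bool:
    """True if any skipped exon lies within the RT-encoding exons."""
    return bool(set(skipped) & RT_EXONS)


if __name__ == "__main__":
    # Hypothetical variant aligning to reference positions 1-700 and 1001-1600.
    variant_blocks = [(1, 700), (1001, 1600)]
    missing = skipped_exons(EXONS, variant_blocks)
    print(missing, affects_rt_domain(missing))
```

With these made-up coordinates the variant lacks exons 8 to 10, the most prevalent pattern described above, and is flagged as affecting the RT domain.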

Conclusion
We performed thorough analyses of TERT canonical and non-canonical motifs, gene structure and alternative splicing across representative metazoan species. We show that although the ancestral TERT protein in Metazoa is likely to have possessed all 11 canonical motifs, the GQ and CP motifs are prone to lineage specific losses (fig. 2). Beyond the canonical motifs, we demonstrate that the N-terminal linkers and CTE regions of TERTs are highly divergent across phyla. In these regions, we discovered novel motifs that exhibit high levels of sequence conservation over long evolutionary times indicating that they may serve an unknown but important biological function. For example, the CTE regions are distinct within phyla (supplementary fig. S6)

We also demonstrated that the N-terminal linker regions have poor sequence conservation between metazoan phyla implying that these regions are adapted to different lineages (supplementary fig. S7). The N-terminal linkers of TERT proteins have been implicated in mediating a conserved function in enhancing the translocation of the 3' end of DNA substrate relative to the RNA template (Collins, 1999; Peng et al., 2001). Deleting residues in this region reduces the translocation of DNA substrate and overall processivity (Moriarty et al., 2004). Future studies will be needed to unravel the biological significance of phyla specific sequences and whether they fine tune telomerase processivity to life history strategy.

Analysis of TERT intron exon structure revealed that the metazoan ancestor is intron rich and shares many intron positions with TERTs in the vertebrate lineage (fig. 3). TERTs in protostomes have undergone significant intron loss; lepidopterans and aphids do not have any introns, perhaps due to short lifespans of these taxa. Alternative splicing of TERT has been extensively studied in many species.
We report new observations on splicing events for 14 metazoan species and show that the selection of spliced exons is poorly conserved in these animals (supplementary fig. S8, S9 & S10). Lastly, we investigated TERT alternative splicing in Tricladida planarians and demonstrated that splicing is dynamic and rapidly evolving within this group of closely related species (fig. 4, supplementary fig. S13 & S14). It is likely that splicing in planarian TERTs is significant for the highly regenerative and potentially immortal life histories of some of these species and AS variants could possess important non-canonical functions. This study highlights that neither TERT structure nor its regulation is static. A future endeavour will be to unravel mechanistic linkages that connect unique sequence evolution and the dynamic regulation of TERT to actual biological roles.

Animal culture
All freshwater flatworm strains were cultured at 20°C using the planarian water formulation. 1X Montjuic salt solution was prepared using milliQ ddH2O with the following composition: 1.6 mM NaCl, 1 mM CaCl2, 1 mM MgSO4, 0.1 mM MgCl2, 0.1 mM KCl, 1.2 mM NaHCO3 (González-Estévez et al., 2012). The marine flatworm Procerodes plebeja was cultured at 14°C using a salt water formulation made with Tropic Marin sea salt to a salinity of 28-30ppm. Dendrocoelum lacteum was fed with shrimp while all other flatworms used in this study were fed organic beef liver once a week. All flatworms were starved for 1 week prior to any experimental procedures. The worms were kept in the dark at all times apart from feeding and water changing times.
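As a worked example of the salt composition listed above, the short Python sketch below converts the stated millimolar concentrations into masses per volume using mg/L = mM x molar mass (g/mol). The molar masses assume anhydrous salts, which is an assumption of this sketch and not something stated in the text. If hydrated salts are used (for example CaCl2 dihydrate), the molar masses must be adjusted accordingly. The function and variable names are illustrative.

```python
# Concentrations in mM, as listed for 1X Montjuic salt solution in the text.
RECIPE_MM = {
    "NaCl": 1.6,
    "CaCl2": 1.0,
    "MgSO4": 1.0,
    "MgCl2": 0.1,
    "KCl": 0.1,
    "NaHCO3": 1.2,
}

# Molar masses in g/mol. ASSUMPTION: anhydrous forms of each salt.
MOLAR_MASS = {
    "NaCl": 58.44,
    "CaCl2": 110.98,
    "MgSO4": 120.37,
    "MgCl2": 95.21,
    "KCl": 74.55,
    "NaHCO3": 84.01,
}


def masses_mg(litres: float) -> dict:
    """Mass of each salt in mg for the given volume (mmol/L * g/mol = mg/L)."""
    return {
        salt: conc_mm * MOLAR_MASS[salt] * litres
        for salt, conc_mm in RECIPE_MM.items()
    }


if __name__ == "__main__":
    for salt, mass in masses_mg(1.0).items():
        print(f"{salt}: {mass:.1f} mg per litre")
```

Scaling to a stock solution only requires changing the volume argument or multiplying the concentrations by the desired stock factor.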

RNA extraction and reverse transcription in Triclad flatworms
Total RNA was isolated from three animals of each species using the TRIzol reagent (Thermo Fisher Scientific) according to manufacturer's instructions. cDNA was generated from total RNA extract using the QuantiTect reverse transcription kit (Qiagen) containing the RT primer mix (blend of oligo-dT and random hexamers). A genomic DNA elimination step was also performed using the gDNA wipeout buffer included in this kit.

Degenerate polymerase chain reaction (PCR) and 5' 3' RACE for the cloning of TERT genes
TERT transcript sequences for Girardia tigrina (Wheeler et al., 2015), Dugesia japonica (Nishimura et al., 2012), D. lacteum, and P. nigra (Brandl et al., 2015) were retrieved using tblastn with Smed_TERT as a query from published transcriptome sequences of the respective species. The top hit with the best e-value for each species was used for reciprocal blast against the nr database in NCBI to confirm the identification of TERT. For degenerate PCR, first strand cDNA synthesis was performed as mentioned previously. For 5' and 3' RACE, first strand cDNA synthesis was performed on TRIzol extracted total RNA using the SMARTer RACE 5' 3' Kit (Clontech).
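The step of taking the top hit with the best e-value can be sketched in Python as follows. The sketch assumes that the search results were saved in BLAST's standard 12-column tabular format (`-outfmt 6`), in which column 11 is the e-value and column 12 the bit score. The text does not state which output format was used, and the contig names and scores below are invented for illustration only.

```python
from typing import Dict, Tuple


def best_hits(tabular: str) -> Dict[str, Tuple[str, float, float]]:
    """Map each query to (subject, evalue, bitscore) of its best hit.

    The best hit has the lowest e-value; ties are broken by higher bit score.
    Input is BLAST tabular text with the 12 default outfmt 6 columns.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in tabular.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        current = best.get(query)
        if current is None or (evalue, -bitscore) < (current[1], -current[2]):
            best[query] = (subject, evalue, bitscore)
    return best


# Invented example rows (hypothetical contig names and statistics).
EXAMPLE_ROWS = [
    ["Smed_TERT", "contig_A", "45.2", "500", "250", "10",
     "1", "500", "30", "1529", "1e-120", "380"],
    ["Smed_TERT", "contig_B", "30.1", "200", "130", "8",
     "50", "250", "10", "609", "2e-15", "75.5"],
]
EXAMPLE_TABLE = "\n".join("\t".join(row) for row in EXAMPLE_ROWS)

if __name__ == "__main__":
    print(best_hits(EXAMPLE_TABLE))
```

The selected subject sequence would then serve as the query for the reciprocal search against nr, which is performed outside this sketch.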

For the cloning of TERT in Schmidtea lugubris, the same set of degenerate PCR primers (DugesiaFD and DugesiaRD) was used for PCR followed by cloning and sequencing. PCR products were cloned in the pGEMT-easy vector (Promega) followed by colony PCR and sequencing as mentioned previously. The 5' end of Slug_TERT transcript was cloned using the following primer set: Dugesia5F (5'-AATYGAGMGWMATGGTTT-3'; this primer was designed based on 5' sequence region in Dugesia sp.) and LugR (5'-CTGAAATTTGTGCCATTG-3'; Slug-TERT gene specific primer). Next, 3' RACE was performed with the SMARTer RACE 5' 3' Kit using gene specific primers listed in

PCR products were cloned in the pGEMT-easy vector (Promega) followed by colony PCR and sequencing as mentioned previously. The 5' and 3' end of Pfel-TERT was obtained using 5' and 3' RACE with the SMARTer RACE 5' 3' Kit using gene specific primers listed in supplementary table 3 according to manufacturer's instructions.
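Degenerate primers such as Dugesia5F (5'-AATYGAGMGWMATGGTTT-3') use IUPAC ambiguity codes, so each primer stands for a pool of concrete oligonucleotides. The minimal Python sketch below expands a degenerate primer into that pool and counts its degeneracy. The two primer sequences are taken from the text. The function names are illustrative, and the sketch is not part of the original cloning workflow.

```python
from itertools import product
from typing import List

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer: str) -> List[str]:
    """List every concrete sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


DUGESIA5F = "AATYGAGMGWMATGGTTT"  # degenerate primer from the text
LUGR = "CTGAAATTTGTGCCATTG"       # gene specific primer from the text

if __name__ == "__main__":
    print(degeneracy(DUGESIA5F))   # four two-fold positions: Y, M, W, M
    print(expand(DUGESIA5F)[:3])
    print(degeneracy(LUGR))
```

Dugesia5F contains four two-fold degenerate positions (Y, M, W and M) and therefore represents 16 sequences, whereas LugR is a single defined sequence.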

All degenerate PCRs were performed using the Advantage 2 Polymerase (Clontech) according to recommended thermal cycling parameters. All RACE products were run on a gel and gel extraction was performed with the MinElute gel extraction kit (Qiagen) followed by cloning and sequencing of the RACE products.

DNA extraction, 18S rDNA PCR and sequencing
High molecular weight genomic DNA (gDNA) was isolated using the phenol-chloroform method followed by ethanol precipitation. PCR was performed according to conditions in Carranza et al. (1996) using the Advantage 2 Polymerase (Clontech). The 18S primer sequences are provided in supplementary table 3. PCR products were gel extracted using the MinElute gel extraction kit (Qiagen) and eluted in 10 µL of molecular grade water. Gel extracted products were sequenced in both directions using 18S nested primers listed in supplementary products were run on a gel. In order to cut out gel pieces containing the AS variants, the gel image was overexposed to enable visualization of low abundance variants. Gel regions containing bands were excised and extracted using the MinElute gel extraction kit (Qiagen). A-tailing was subsequently performed with GoTaq Polymerase (Promega) for TA-cloning purposes since Phusion PCR products were blunt ended. PCR products were cloned in the pGEMT-easy vector (Promega) followed by colony PCR with M13F and M13R primers. Colony PCR products were run on a gel for 2 hours using 1.3% agarose in 1X TAE buffer to allow good resolution of bands. The colony PCR gel displayed an array of bands of different sizes representing the AS variants. Bands of different sizes were selected for Sanger sequencing. For each species, 25 colonies were selected for sequencing to allow exhaustive identification of TERT AS variants. Gtig-TERT having one less exon. Since genome data is not available for all the other flatworm species, we based our analyses on the gene structure of Smed-TERT and Gtig-TERT and refer to them as inferred exon positions. Further analyses on cloned AS variants for species without genomic data confirmed the accurate positioning of intron-exon boundaries based on the excluded exons. The full-length TERT sequences for each species were referred to as 'isoform 1'. Cloned AS variants were mapped to 'isoform 1', which was set as the reference sequence for each respective species. From this, we were able to visualize the skipped exons, splice site mutations or