Structure-function evolution of the Transforming acidic coiled coil genes revealed by analysis of phylogenetically diverse organisms

Background Examination of ancient gene families can provide an insight into how the evolution of gene structure can relate to function. Functional homologs of the evolutionarily conserved transforming acidic coiled coil (TACC) gene family are present in organisms from yeast to man. However, correlations between functional interactions and the evolution of these proteins have yet to be determined. Results We have performed an extensive database analysis to determine the genomic and cDNA sequences of the TACCs from phylogenetically diverse organisms. This analysis has determined the phylogenetic relationship of the TACC proteins to other coiled coil proteins, the resolution of the placement of the rabbit TACC4 as the orthologue of human TACC3, and RHAMM as a distinct family of coiled coil proteins. We have also extended the analysis of the TACCs to the interaction databases of C. elegans and D. melanogaster to identify potentially novel TACC interactions. The validity of this modeling was confirmed independently by the demonstration of direct binding of human TACC2 to the nuclear hormone receptor RXRβ. Conclusion The data so far suggest that the ancestral TACC protein played a role in centrosomal/mitotic spindle dynamics. TACC proteins were then recruited to complexes involved in protein translation, RNA processing and transcription by interactions with specific bridging proteins. However, during evolution, the TACC proteins have now acquired the ability to directly interact with components of these complexes (such as the LSm proteins, nuclear hormone receptors, GAS41, and transcription factors). This suggests that the function of the TACC proteins may have evolved from performing assembly or coordination functions in the centrosome to include a more intimate role in the functional evolution of chromatin remodeling, transcriptional and posttranscriptional complexes in the cell.


Background
The evolution of complex organisms has been associated with the generation of gene families by the continual duplication of an initial relatively small set of ancestral genes. Through this process, followed by subsequent mutation, reduplication and exon shuffling between gene families, genes have evolved both discrete, and partially redundant functions with their related family members. With the completion of the genome sequencing projects of human, mouse, rat, fruit fly and nematodes, we are now in a position to ask fundamental questions in regard to how genes interact in the context of the whole organism. Thus, with the appropriate application of bioinformatics, it is now possible to trace the lineage of particular genes and gene families, with related gene families in other organisms. Furthermore, with the growing amount of large-scale proteomic and genomic data becoming publicly available, this analysis can now be extended to reveal the complex interplay between evolution of gene structure and protein function.
The first Transforming acidic coiled coil gene, TACC1, was identified during the development of an expression map of the proximal short arm of human chromosome 8 [1]. Two additional TACC family members were subsequently identified and mapped to paralogous chromosomal regions on human chromosomes 4p16 and 10q26, physically close to members of the FGFR gene family [1][2][3]. This mapping data, together with the identification of a single TACC gene in the protostomes Caenorhabditis elegans and Drosophila melanogaster [4][5][6], led to the speculation that the ancestral FGFR and TACC genes were located physically close to each other. Thus, during the evolution of vertebrates, successive duplications of the ancestral gene cluster have given rise to three TACC family members located close to FGFR genes in humans. In accordance with the proposed quadruplication of the vertebrate genome during evolution, there is a fourth FGFR family member in vertebrates, raising the question of whether a fourth TACC gene is associated with FGFR4 in vertebrate genomes. To date, only three active TACC genes have been cloned in humans [1][2][3], and one in each of mouse [7], Xenopus laevis [8], D. melanogaster [4], and C. elegans [5,6]. Although two additional candidate TACC family members, Oryctolagus cuniculus TACC4 [9] and human RHAMM [10], have been proposed, their true identity and placement in the evolution of the TACC family is under debate. Thus, the identification and functional characterization of new members of the TACC family in other organisms, of alternatively spliced isoforms of each TACC, and comparison of the phylogenetic relationship of these genes relative to other members of the coiled coil superfamily will resolve this issue and provide clues to the evolution of TACC function.

Results and Discussion
In silico identification of TACC family members from vertebrate and invertebrate lineages
Sequence similarity searches of the publicly available genome databases with the BLAST and TBLAST programs were performed to identify TACC and RHAMM orthologues, and other members of the coiled coil superfamily in a diverse set of species (Fig. 1). This identified the complete sequence of the TACC genes in representatives of five major phylogenetically distinct clades. Where possible, the construction of the TACC sequences from these organisms was also confirmed by the analysis of the cDNA databases. Several partial sequences in other vertebrate species, the echinoderm Strongylocentrotus purpuratus and the protostome insect Anopheles gambiae were also identified, suggesting an ancient conservation of the TACC genes in metazoan lineages. However, due to the relative infancy of the cDNA/genome projects for these latter organisms, complete characterization of these TACC genes could not be undertaken. No conclusion could be made about the existence of TACC-like sequences in nonbilaterian metazoans, such as Cnidaria or Porifera, due to the paucity of sequence information for these organisms, and additional definitive sequences with a defined TACC domain could not be found in other non-metazoan organisms.
At the base of the chordate branch of life, a single TACC gene was identified in the genome of the urochordate Ciona intestinalis [11], and a partial TACC sequence was identified from an analysis of the Halocynthia roretzi EST database [12]. This confirms the original assumption that a single TACC gene was present in the chordate ancestor. The next major event in the evolution of the chordate genome has been suggested to have occurred 687 ± 155.7 million years ago (MYA), with the first duplication of the chordate genome, and a second duplication occurring shortly thereafter. Thus, if the TACC genes were duplicated at both events, we would expect to identify four TACC genes in the most "primitive" compact vertebrate genome sequenced to date, the pufferfish Takifugu rubripes, with three genes corresponding to the human TACC1-3, and, in keeping with the proposed model for genomic duplication of the chromosomal loci for the TACC genes (discussed below), a possible fourth gene deriving from the TACC3 ancestor. Indeed, four TACC genes were identified in T. rubripes. Of these, two genes corresponded to the T. rubripes orthologues of human TACC2 and TACC3. However, the other two genes, trTACC1A and trTACC1B, are clearly most related to TACC1 (Fig. 1). Although trTACC1A is highly homologous to trTACC1B, the latter encodes a significantly smaller predicted protein. The trTACC1B gene is encoded by 15 exons over approximately 7 kb of the Takifugu Scaffold 191 (see below). A search of this region using the trTACC1A sequence and gene prediction software has so far failed to identify additional exons of trTACC1B. However, given the intron/exon structure of this apparently complete gene, it appears likely that trTACC1B is active in the pufferfish, and presumably fulfils either a temporal-spatial specific function within the organism, or a distinct function from the larger trTACC1A product within the cell.
Thus, based upon the surrounding chromosomal loci (see below), the trTACC1A and trTACC1B genes appear to have arisen from the duplication of the chromosomal segment containing the teleost TACC1 ancestor, during the additional partial genomic duplication that occurred in the teleost lineage. Therefore, this analysis of T. rubripes does not support the hypothesis that the region surrounding the TACC3 ancestor was included in the second round of vertebrate genomic duplication.
Examination of higher vertebrates led to the identification of splice variants of TACC1 and TACC2 in Mus musculus, and the assembly of the previously unidentified orthologues of TACC1-3 from Rattus norvegicus. In addition, the TACC1X sequence was found on mouse chromosome X. This gene is clearly related to the mouse TACC1; however, further examination revealed a mouse B1 repeat distributed over the length of the proposed intron. In addition, no expression of TACC1X was detected in mouse RNA by rt-PCR analysis (data not shown), suggesting that this sequence is a processed pseudogene. Similarly, TACC1 pseudogenes also exist spread over 22 kb of the centromeric region of human chromosome 10 and, in 8q21, a shorter region 86% identical to the final 359 bp of the TACC1 3' untranslated region. No pseudogenes corresponding to TACC2 or TACC3 were identified in any mammalian species.
Figure 1. Phylogenetic analysis of the TACC family members compared to other coiled coil proteins. The phylogenetic tree was constructed as described in the Methods section. The TACC family defines a separate subfamily of coiled coil containing proteins, distinct from other coiled coil families such as the keratins, RHAMM and tropomyosins. Note that the RHAMM proteins form a separate branch more closely related to the tropomyosins and kinesin like proteins (KLP) than to the TACC proteins.

Characterization of vertebrate TACC3 orthologues
Based upon current functional analysis, the characterization of TACC3 orthologues is likely to be pivotal to understanding the sequence and functional evolution of the TACC gene family. As indicated below, the chromosomal region containing the TACC gene precursors was duplicated twice during vertebrate evolution. Although the analysis of T. rubripes, rodents and humans so far suggests that the vertebrate TACC3 precursor was not included in the second round of genomic duplication, it could not be excluded that a TACC4 gene may have been lost during the evolution of these lineages. The cloning of a new member of the TACC family in Oryctolagus cuniculus has added to this controversy [9]. Designated TACC4, the 1.5 kb cDNA was highly related, but proposed to be distinct from TACC3. However, Northern blot data suggested that this gene produces a single 2.3 kb transcript [9], indicating that the cloned cDNA was incomplete. The degree of similarity to the published sequence of human and mouse TACC3 suggested to us that TACC4 actually represents a partial rabbit TACC3 cDNA. To test this hypothesis, we set out to clone the complete rabbit TACC3 sequence, based upon the known features of human and mouse TACC3. We have previously noted that the N-terminal and C-terminal regions of the human and mouse TACC3 proteins are highly conserved ( [2], see below). Therefore, based upon the sequence identity between these genes, we designed a consensus oligonucleotide primer, T3con2, that would be suitable for the identification of the region containing the initiator methionine of the TACC3 cDNAs from primates and rodents. Using this primer, in combination with the TACC4-specific RACE primer (RACE2), initially used by Steadman et al [9], we isolated a 1.5 kb PCR product from rabbit brain cDNA by rt-PCR. 
In combination with 3'RACE, this generated a consensus cDNA of 2283 bp which corresponds to the transcript size of 2.3 kb detected by the "TACC4" sequence reported in Figure 4 of Steadman et al [9]. Thus, while it remains possible that the "TACC4" sequence is an alternative splice product, or is the product of reduplication of the TACC3 gene (events that would be specific to the rabbit), the only transcript detected in rabbit RNA corresponds to the predicted transcript size of the TACC3 sequence that we have identified here. Furthermore, the string of nucleotides found at the 5' end of the "TACC4" sequence is also found at the 5' ends of a number of cDNA sequences (e.g. U82468, NM_023500) that were isolated by 5'RACE, suggesting that they may correspond to an artefact of the 5'RACE methodology used in their construction. The rabbit "TACC4" and the rabbit TACC3 sequence that we have isolated are also found on the same branch of the TACC phylogenetic tree with the other TACC3 orthologues, including maskin (Xenopus laevis), and the newly identified TACC3 sequences in Rattus norvegicus, Gallus gallus, Silurana tropicalis, Danio rerio and T. rubripes, reported in this manuscript (Fig. 1). Thus, it is not in a separate branch, as might be expected if the sequence were a distinct TACC family member.

Placement of the RHAMM gene in the phylogeny of the coiled coil gene family
Human RHAMM has also been proposed to be the missing fourth member of the TACC family [10]. Evidence used in support of this claim included its chromosomal location on 5q32 in humans (discussed below), its sequence similarity in its coiled coil domain to the TACC domain and the subcellular localization of the RHAMM protein in the centrosome. However, if RHAMM were a bona fide TACC family member, then we would predict that its evolution would be similar to that of other TACC family members, and fit with the proposed evolution of the vertebrate genome. Thus, we set out to identify RHAMM orthologues and related genes in metazoans, so that a more complete phylogeny of the coiled coil superfamily could be generated. We identified a single RHAMM gene in all deuterostomes for which cDNA and/or genomic sequence was available, including C. intestinalis. No RHAMM gene was identified in insects or nematodes. This indicates that the RHAMM/TACC genes diverged after the protostome/deuterostome split 833-933 MYA, but prior to the echinodermata/urochordate divergence (>750 MYA). Significantly, sequence and phylogenetic analysis of coiled coil proteins (Fig. 1) clearly shows that RHAMM does not contain a TACC domain and instead forms a distinct family of proteins in the coiled coil superfamily, and is not a direct descendant of the ancestral TACC gene.
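The dating argument above can be illustrated with a minimal sketch of presence/absence mapping on a tree. The topology below is a deliberately simplified assumption, and the presence of RHAMM in echinoderms is inferred here from the dating given in the text rather than stated explicitly; the function names are hypothetical. The gene is assigned to the branch leading to the smallest clade that contains every lineage in which it was found.

```python
# Simplified metazoan topology (an assumption for illustration only).
# Each node is (name, [children]); leaves have an empty child list.
TREE = ("Bilateria", [
    ("Protostomia", [("Insecta", []), ("Nematoda", [])]),
    ("Deuterostomia", [
        ("Echinodermata", []),
        ("Chordata", [("Urochordata", []), ("Vertebrata", [])]),
    ]),
])

# Lineages in which a RHAMM gene was found (echinoderms assumed, see lead-in).
RHAMM_PRESENT = {"Echinodermata", "Urochordata", "Vertebrata"}


def leaves(node):
    """Return the set of leaf names below a node."""
    name, children = node
    if not children:
        return {name}
    found = set()
    for child in children:
        found |= leaves(child)
    return found


def smallest_clade(node, targets):
    """Return the smallest clade of the tree that contains all target leaves."""
    _, children = node
    for child in children:
        if targets <= leaves(child):
            return smallest_clade(child, targets)
    return node


clade = smallest_clade(TREE, RHAMM_PRESENT)
clade_name = clade[0]
# Leaves inside the clade that lack the gene would imply secondary loss.
secondary_losses = leaves(clade) - RHAMM_PRESENT
print(clade_name, secondary_losses)
```

Under these assumptions the gene maps to the deuterostome stem with no secondary losses required, which matches the interval given in the text (after the protostome/deuterostome split, before the echinoderm/urochordate divergence).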

Evolution of the chromosomal segments containing the TACC genes
The phylogenetic tree of the FGFR genes closely resembles that of the vertebrate TACC1-3 genes. Recently, detailed analyses of the chromosomal regions containing the FGFR gene family in humans, mouse and the arthropod D. melanogaster have revealed the conservation of paralogous chromosomal segments between these organisms (Fig. 2, [13], Table 1 [see Additional file 1]). This has provided further support that an ancient chromosomal segment was duplicated twice during vertebrate evolution, with the first duplication that gave rise to the human chromosome 4p16/5q32-ter and human chromosome 8p/10q23-ter ancestors occurring in the early stages after the invertebrate divergence. This suggests that the ancestral FGFR-TACC gene pair most probably arose prior to the initial duplication and subsequent divergence of these paralogous chromosomal segments, estimated to have occurred 687 ± 155.7 MYA. This has raised the suggestion that a fourth TACC gene in vertebrates would reside in the same chromosomal region as FGFR4. Indeed, this hypothesis has been used in support of the RHAMM gene as a member of the TACC family [10]. Human RHAMM maps to chromosome 5q32 in a region bounded by GPX3 and NKX2E. These loci separate two clusters of genes on human chromosome 5 that are paralogous with 4p16. Interestingly, these three clusters are located on different chromosomes in mouse and rat (Fig. 2), further suggesting that this cluster of genes was transposed into this region after the primate/rodent divergence.
Figure 2. Linear organization of gene clusters centering upon the chromosomal loci of the FGFR genes in humans. Paralogous genes present in at least two of the four loci are shown, with the exception of the region between GPX3 and NKX2E on chromosome 5, which appears to represent a series of intervening genes inserted after duplication of the 4p16/5q32-35 clusters, and genes mentioned in Fig. 3. Corresponding syntenic mouse chromosomal regions (mm*) are indicated. Takifugu rubripes scaffolds are shown (TR*) that contain more than one homologous gene from these clusters. Further details on the location of paralogous genes can be found in [see Additional file 1].
Because the conservation of gene order can also provide clues to the evolution of gene regulation, we next attempted to trace the evolution of these paralogous segments by examining the genome of the tunicate C. intestinalis [11] and the most "primitive" compact vertebrate genome sequenced to date, T. rubripes [14].
Although not fully assembled, examination of the genome of T. rubripes confirmed the presence of chromosomal segments paralogous to those found in higher vertebrates (Fig. 2). For instance, the orthologues of GPRK2L and RGS12 are found on T. rubripes scaffold 290 (emb|CAAB01000290.1), and within 300 kb of each other in human 4p16. The T. rubripes orthologues of FGFR3, LETM1 and WHSC1 are located on the same 166 kb genomic scaffold 251 (emb|CAAB01000166.1). Significantly, the three human orthologues of these genes are also located within 300 kb of each other on 4p16. Furthermore, TACC3 and FGFRL map to the overlapping scaffolds 1184/4669 (emb|CAAB01004668). Similarly, elements of these gene clusters, extending from HMP19 to GPRK6 in human chromosome 5q34-ter are also found in the pufferfish, with the T. rubripes orthologues of NSD1, FGFR4 and a RAB-like gene mapping on scaffold 407 (emb|CAAB01000407). However, there is no evidence for a gene corresponding to a TACC4 gene in any of these clusters.
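The scaffold comparison described above amounts to grouping orthologues by the scaffold on which they lie and keeping scaffolds that carry more than one gene from the human clusters. The following is a minimal sketch of that bookkeeping, using only the gene and scaffold assignments quoted in the preceding paragraph; the function name, the "RAB-like" label, and the treatment of the overlapping scaffolds 1184/4669 as a single unit are assumptions made for illustration.

```python
from collections import defaultdict

# Gene -> T. rubripes scaffold, as quoted in the text.
# "1184/4669" stands for the two overlapping scaffolds, treated as one unit here.
GENE_TO_SCAFFOLD = {
    "GPRK2L": "290", "RGS12": "290",
    "FGFR3": "251", "LETM1": "251", "WHSC1": "251",
    "TACC3": "1184/4669", "FGFRL": "1184/4669",
    "NSD1": "407", "FGFR4": "407", "RAB-like": "407",
}


def conserved_clusters(gene_to_scaffold, min_genes=2):
    """Group genes by scaffold and keep scaffolds with at least min_genes genes."""
    by_scaffold = defaultdict(list)
    for gene, scaffold in gene_to_scaffold.items():
        by_scaffold[scaffold].append(gene)
    return {
        scaffold: sorted(genes)
        for scaffold, genes in by_scaffold.items()
        if len(genes) >= min_genes
    }


clusters = conserved_clusters(GENE_TO_SCAFFOLD)
for scaffold, genes in clusters.items():
    print(f"scaffold {scaffold}: {', '.join(genes)}")

# No gene labelled TACC4 appears on any scaffold in this input.
has_tacc4 = any("TACC4" in genes for genes in clusters.values())
```

With the assignments listed in the text, four scaffolds pass the filter and none of them carries a TACC4 candidate.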
As noted above, phylogenetic analysis of the TACC sequences indicates that there are two TACC1 related genes in the pufferfish. trTACC1B is located on the 180 kb scaffold 191 (emb|CAAB01000191.1), which also contains the orthologues of several genes located in human chromosome 8p21-11. Thus, this scaffold represents the more "developed" TACC1 chromosomal segment that is evident in higher vertebrates. On the other hand, the trTACC1A gene is located in the 396 kb scaffold 12 (emb|CAAB010012.1). This scaffold also contains the T. rubripes orthologues of MSX1, STX18, D4S234E and the predicted gene LOC118711, in addition to sequences with homology to LOXL, EVC, LOC159291, and the LDB family. Thus, scaffold 12 contains genes found in the regions of human chromosomes 4 and 10 that also contain the loci for TACC3 and TACC2, respectively, and may therefore more closely resemble the genomic organization resulting from the initial duplication of the ancestral paralogous chromosomal segment.
Conserved paralogous clusters may result from the initial clustering of the genes in a relatively small ancestral genomic contig. Some evidence for the existence of "protoclusters" that could correspond to the paralogous chromosomal segments noted in higher vertebrates is present in the genome of the urochordate C. intestinalis [11]. For instance, the orthologues of FGFR, WHSC1, carboxypeptidase Z and FLJ25359 cluster within an 85 kb region of the C. intestinalis genome, and the human orthologues are still maintained in paralogous segments of 4p16, 8p and 10q (Fig. 3, [see Additional file 1]). However, it should be noted that no clusters of genes from the vertebrate paralogous segments are located close to the TACC or RHAMM genes of C. intestinalis, indicating that the much larger paralogous segments encompassing the FGFR-TACC genes formed later in evolutionary time, or conversely have been subject to extensive rearrangement in tunicates. In combination with the examination of the T. rubripes genome, this also provides additional evidence that either the second round of duplication of the chromosomal segment that contained the FGFR3/4 ancestor did not include a TACC gene, or that such a gene was lost very early in vertebrate evolution, prior to the divergence of the gnathostome lineages. However, the final resolution of the initial evolution of these paralogous segments will await the sequencing of the amphioxus and lamprey genomes, which only have one FGFR gene, and therefore should only contain one copy of the other corresponding genes in this conserved segment.

Comparative genomic structure of the TACC family
The genomic DNA sequences corresponding to the orthologous TACC genes of human, mouse, rat, pufferfish, C. intestinalis, D. melanogaster and C. elegans were extracted and analyzed by Genescan and BLAST to determine the genomic structure of each TACC gene. In some cases, for rat and pufferfish, exons were added or modified based on the best similarity of translated peptides to the corresponding mouse and human proteins. For regions with low sequence similarity in T. rubripes, genomic sequences from the freshwater pufferfish, Tetraodon nigroviridis (http://www.genoscope.cns.fr/), were used as an additional means to verify the predicted exons.
The general structure of the TACC genes and proteins is depicted in Fig. 4. The main conserved feature of the TACC family, the TACC domain, is located at the carboxy terminus of the protein. In the case of the C. elegans TAC protein, this structure comprises the majority of the protein and is encoded by two of the three exons of the gene.
In the higher organisms, D. melanogaster, and the deuterostomes C. intestinalis to human, this feature is also encoded by the final exons of the gene (five in D. melanogaster, seven in the deuterostome genes). Outside of the TACC domain, however, TACC family members show relatively little homology. It is interesting that each TACC gene contains one large exon, which shows considerable variability between TACC orthologues, and constitutes the main difference between the TACC3 genes in the vertebrates (see below). In deuterostomes, this exon contains the SDP repeat (or in the case of the murine TACC3's, a rodent-specific 24 amino acid repeat), which is responsible for the binding of the SWI/SNF chromatin remodeling complex component GAS41 [15,16].
Of the vertebrate TACC proteins, the TACC3 orthologues show the greatest variability in size and sequence, ranging in size from 599 amino acids for the rat TACC3 protein, to 942 amino acids in the Danio rerio protein. The reasons for these differences are apparent from the genomic structure of the TACC3 orthologues. TACC3 can be divided into three sections: a conserved N-terminal region (CNTR) of 108 amino acids, encoded by exons 2 and 3 in each vertebrate TACC3 gene, the conserved TACC domain distributed over the final seven exons, and a highly variable central region. The lack of conservation in both size and sequence of the central portion of the TACC3 proteins of human and mouse has been previously noted, and accounts for the major difference between these two orthologues [2]. The majority of this central portion, which contains the SDP repeat motifs, is encoded by one exon in human and the pufferfish (emb|CAAB01001184). In rodents, however, this region is almost entirely composed of seven 24 amino acid repeats, which are located in a single exon of the mouse and rat TACC3 genes. It has been previously reported that there are four mouse TACC3 splice variants that differ in the number of these repeats [2,7,17]. As these repeats are present in a single exon, it appears likely that these different sequences are the result of the DNA polymerases used in the cDNA synthesis and/or PCR reaction stuttering through the repeat motif. The correct sequence, reported by Sadek et al [7], is the one used throughout this manuscript. These repeats are not evident in the rabbit protein, or any other TACC protein, and may indicate that the rodent TACC3 has evolved distinct functions, as has already been noted for the amphibian Xenopus TACC3, maskin [8].
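As an illustration of how a run of fixed-length repeats such as the rodent 24 amino acid motif could be counted in a protein sequence, the sketch below counts consecutive copies of a repeat unit starting at a given position, tolerating a chosen number of mismatches per copy. The repeat unit and flanking residues are invented placeholders, not the actual TACC3 sequence, and the function name is hypothetical.

```python
def count_tandem_copies(seq, period, start=0, max_mismatch=0):
    """Count consecutive copies of seq[start:start + period].

    Each successive window of length `period` is compared with the first
    copy; counting stops at the first window with more than `max_mismatch`
    differing positions, or when the sequence runs out.
    """
    unit = seq[start:start + period]
    if len(unit) < period:
        return 0
    copies = 0
    pos = start
    while pos + period <= len(seq):
        window = seq[pos:pos + period]
        mismatches = sum(a != b for a, b in zip(unit, window))
        if mismatches > max_mismatch:
            break
        copies += 1
        pos += period
    return copies


# Hypothetical 24-residue unit and flanks (placeholders, not real TACC3 data).
UNIT = "PSEKTLAQDGVRNSPEAMKTLQDG"
toy_protein = "MAS" + UNIT * 7 + "LLEEK"

n_copies = count_tandem_copies(toy_protein, period=len(UNIT), start=3)
print(n_copies)
```

Real repeats are usually imperfect, so in practice a non-zero `max_mismatch` or a dedicated repeat-finding tool would be needed; comparing the copy number across independent clones is one way to flag the polymerase stuttering artefact suggested above.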

Alternative splicing in vertebrate TACC genes
Whereas exon shuffling can drive the functional diversification of gene families over evolutionary time, the temporal and/or tissue specific alternative splicing of a gene can give rise to functional diversification of a single gene during the development of an organism. Although no alternative splicing of TACC3 has been clearly documented, both temporal and tissue specific splicing are observed in the TACC1 and TACC2 genes. In the case of TACC2, an additional large (5 kb) exon accounts for the main difference between the major splice variants of the vertebrate TACC2 genes [3]. The alternative splicing of this exon suggests a major functional difference between the two TACC2 isoforms, TACC2s and TACC2l [3], as well as a significant difference between TACC2 and its closest TACC family relative, TACC1. However, the function of this region of the TACC2l isoform is currently unknown.
Alternative splicing, together with differential promoter usage, has already been noted for the human TACC1 gene [18,19]. In addition, as shown in Fig. 5, we have identified additional TACC1 isoforms that result from alternative splicing of exons 1b-4a. The functions of these different isoforms are unknown; however, the region deleted from the shorter variants can include the binding site for LSm7 [20] (variants C, D, F-I), and/or the nuclear localization signals and binding site for GAS41 [15] and PCTAIRE2BP [20] (isoforms B-D, S). One of these isoforms, TACC1S, is localized exclusively to the cytoplasm [19], suggesting that the shorter isoforms would not be able to interact with elements of the chromatin remodeling and/or RNA processing machinery in the nucleus. Thus, changes in the complement of TACC1 isoforms in the cell may result in alterations in cellular RNA metabolism at multiple levels, and may account for the observation that TACC1D and TACC1F isoforms are associated with tumorigenic changes in gastric mucosa [18].

In silico modeling of the evolution of TACC protein function
The protein and genomic structure of the present day TACC family members suggests that the function of the ancestral TACC protein was mediated solely through the interactions of the conserved TACC domain. Using an in silico protein-protein interaction model based upon known mitotic spindle and centrosomal components, we have previously predicted a number of additional interactions that could be conserved between a functional TACC homologue in yeast, spc-72, and one or more human TACC proteins [21]. Thus, it is known that all the TACC proteins examined to date interact, via the TACC domain, with the microtubule/centrosomal proteins of the stu2/msps/ch-TOG family [5,6,22-24], and with the Aurora kinases [20,21,25]. These interactions are required for the accumulation of the D-TACC, spc72, ceTAC1 and TACC3 proteins to the centrosome [5,6,22-24]. Hence, this functional interaction with the centrosome and mitotic spindle is likely to represent the ancient, conserved function of the TACC family. However, it is apparent that the human TACC proteins also differ in their ability to interact with the Aurora kinases. For instance, TACC1 and TACC3 interact with Aurora A kinase, whereas TACC2 interacts with Aurora C kinase [21], suggesting a degree of functional specialization in the derivatives of the ancestral chordate TACC, after the radiation of the vertebrate TACC genes.
The localization of the vertebrate TACC proteins in the interphase nucleus [15,26,27] suggests that they have additional functions outside their ancient role in the centrosome and microtubule dynamics. Thus, it seems likely that TACC family members in protostomes and deuterostomes have integrated new unique functions as the evolving TACC genes acquired additional exons. The results of the pilot large-scale proteomic analysis in C. elegans and D. melanogaster provide further suggestive evidence for this functional evolution. Yeast two hybrid analysis indicates that ceTAC directly binds to C. elegans lin15A, lin36 and lin37 [28]. These proteins bridge ceTAC to other elements of the cytoskeleton and microtubule network, as well as to components of the ribosome, the histone deacetylase chromatin remodeling machinery such as egr-1 and lin-53 (the C. elegans homologues of the human MTA-1 and RbAP48), and to transcription factors such as the PAL1 homeobox and the nuclear hormone receptor nhr-86 [28] (Fig. 6A). Similarly, large scale proteomics [29] has shown that Drosophila TACC interacts with two proteins, the RNA binding protein TBPH and CG14540 (Fig. 6B), and thus indirectly with the Drosophila SWI/SNF chromatin remodeling complex and DNA damage repair machinery. Significantly, the ceTAC protein has also recently been implicated in DNA repair through its direct interaction with the C. elegans BARD1 orthologue [30]. It should be noted that a number of interactions with the TACC proteins from these organisms have probably been missed by these large scale methods, including the well documented direct interactions with the aurora kinases and the stu2/msps/ch-TOG family.
Because of the evolutionary conservation of the TACC domain, we would predict that some of the functional interactions seen in C. elegans and D. melanogaster would be observed in higher animals. Phylogenetic profiling from these interaction maps suggests two similar sets of predicted interactions for vertebrate TACCs (Fig. 6C and 6D). Strikingly, however, the C. elegans specific proteins lin15A, lin36 and lin37 do not have readily discernible homologues in vertebrates or Drosophila, although the presence of a zinc finger domain in lin36 may suggest that this protein is involved directly in transcription or performs an adaptor role similar to LIM containing proteins. For the DTACC interacting proteins, TBPH corresponds to TDP43, a protein implicated in transcriptional regulation and splicing [31,32]. However, the assignment of the human homologue of CG14540 is less clear, with the closest matches in the human databases corresponding to glutamine rich transcription factors such as CREB and the G-box binding factor.
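The transfer of interactions across species described above can be sketched as a walk through the invertebrate interaction map, followed by a lookup of human homologues. The example below is a minimal illustration and not the modeling procedure actually used: it contains only the interactions and homologue assignments named in this article (the lin36/nhr-86 link and the relationship of nhr-86 to RXRβ are taken from the following section), and the function and variable names are hypothetical.

```python
from collections import deque

# Interactions named in the text (bait -> partners); a deliberately small subset.
INTERACTIONS = {
    "ceTAC": ["lin15A", "lin36", "lin37"],  # C. elegans yeast two hybrid data
    "lin36": ["nhr-86"],                    # bridging link to a nuclear receptor
    "DTACC": ["TBPH", "CG14540"],           # Drosophila large scale proteomics
}

# Human homologues named in the text; proteins without a clear homologue
# (lin15A, lin36, lin37, CG14540) are simply left out of the mapping.
HUMAN_HOMOLOGUE = {
    "TBPH": "TDP43",
    "nhr-86": "RXR-beta",  # described as a close family relative, not an orthologue
}


def predicted_human_partners(bait):
    """Breadth-first walk from a TACC bait.

    Returns {human homologue: number of interaction steps from the bait in the
    source species}; 1 means a direct partner, 2 means reached via one bridge.
    """
    distance = {bait: 0}
    queue = deque([bait])
    predictions = {}
    while queue:
        protein = queue.popleft()
        for partner in INTERACTIONS.get(protein, []):
            if partner in distance:
                continue
            distance[partner] = distance[protein] + 1
            queue.append(partner)
            human = HUMAN_HOMOLOGUE.get(partner)
            if human is not None:
                predictions[human] = distance[partner]
    return predictions


worm_predictions = predicted_human_partners("ceTAC")
fly_predictions = predicted_human_partners("DTACC")
print(worm_predictions, fly_predictions)
```

In this toy map the nuclear hormone receptor is reached from ceTAC only through a bridging protein that has no vertebrate homologue, which is the kind of case where a direct interaction in vertebrates would have to be tested experimentally.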

Comparison of modeled with experimentally defined interactions of the vertebrate TACC proteins
The interaction data for the vertebrate TACCs is relatively limited; however, interaction networks are now beginning to emerge. The results of our functional analysis, as well as other published data, clearly indicate that the vertebrate TACCs interact with proteins that can be divided into two broad categories: 1) proteins with roles in centrosome/mitotic spindle dynamics, and 2) proteins involved in gene regulation, either at the level of transcription, or subsequent RNA processing and translation [3,5-7,15,19-21,24,25,33,34]. Many of these proteins do not appear to interact directly with the protostome TACCs, but would be expected to be in the same protein complex (Fig. 6C, 6D).
Significant analysis of the association of the TACCs with the centrosome and the dynamics of mitotic spindle assembly from yeast to humans has been published [5,6,21-24]. From this analysis, it seems likely that the vertebrate TACC3 protein has retained this direct ancestral function, based upon its location in these structures during mitosis [27], its strong interaction with Aurora Kinase A, and the observation that it is the only human TACC protein phosphorylated by this enzyme [21]. However, the variability of the central domain of the vertebrate orthologues suggests that TACC3 may also have acquired additional, and in some instances, species-specific functions. For instance, in X. laevis, the maskin protein has acquired a binding site for the eIF4E protein, and thus a function in the coordinated control of polyadenylation and translation in the Xenopus oocyte [8,35]. A recent study has suggested that this function may be unique to maskin: although it is unclear whether the other vertebrate TACC3 proteins interact with the eIF4E/CPEB complex, the human TACC1A isoform is unable to interact with the eIF4E/CPEB complex. Instead, some TACC1 isoforms have evolved a related, but distinct function by directly interacting with elements of the RNA splicing and transport machinery [19].
To further characterize the evolving functions of the TACC proteins, we have used an unbiased yeast two-hybrid screening method to identify proteins that bind to the human TACC proteins [3,34]. In a screen of a MATCHMAKER fetal brain library (BD Biosciences Clontech), in addition to isolating the histone acetyltransferase hGCN5L2 [34], we also identified the β3 isoform of retinoid-X receptor β as a protein that interacts with the TACC domain of TACC2. As shown in Fig. 7, this interaction is confirmed in vitro by GST pull-down analysis. Significantly, RXRβ is a close family relative of the nuclear hormone receptor, nhr-86, from C. elegans, which interacts with the ceTAC-binding protein lin36 (Fig. 6A). This suggests that while protostome TACCs may require additional protein factors to interact with such components, the TACCs in higher organisms may have evolved the ability to directly interact with some of the proteins in the predicted interaction map (Fig. 6E). Indeed, this appears to be directly linked to the acquisition of new domains and duplication of the chordate TACC precursor. In fact, the first identified function of a vertebrate TACC protein was as a transcriptional coactivator acting through a direct interaction with the ARNT transcription factor [7]. It is also intriguing that the deuterostome-specific SDP repeat interacts with GAS41, a component/accessory factor of the human SWI/SNF chromatin remodeling complex [3,15]. Although there is a D. melanogaster homologue of GAS41, dmGAS41, the large-scale proteomic interaction database does not indicate a direct interaction of dmGAS41 with DTACC. This may be due to the lack of the SDP repeat region in the Drosophila TACC protein. This further suggests that the vertebrate TACCs have gained the specific ability to directly interact with transcriptional regulatory complexes, and that bridging protein(s) are no longer required.
Thus, whereas the ceTAC protein is composed only of the TACC domain, the significantly larger TACC family members in higher protostomes and deuterostomes may have integrated one or more functions of the bridging protein (in this case lin15A, lin36 or lin37). This may also explain the absence of lin15A, lin36 and lin37 homologues in higher organisms, as they were no longer under selective evolutionary pressure to remain within the complex, and were thus lost from the evolving genome.
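The distinction between direct partners and partners reached only through a bridging protein can be illustrated with a small graph traversal. The following is a minimal sketch and not the procedure used to build the interaction maps in Fig. 6. The edge list is a toy that contains only the C. elegans relationships named in the text (ceTAC with the bridging proteins lin15A, lin36 and lin37, and lin36 with nhr-86), and the function names `build_graph` and `bridged_partners` are hypothetical.

```python
from collections import defaultdict

# Toy edge list (assumption): only the C. elegans relationships named in the text.
EDGES = [
    ("ceTAC", "lin36"),
    ("ceTAC", "lin15A"),
    ("ceTAC", "lin37"),
    ("lin36", "nhr-86"),
]


def build_graph(edges):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def bridged_partners(graph, protein):
    """Return {indirect partner: sorted bridging proteins} for one protein.

    An indirect partner is two steps away and is not itself a direct partner.
    """
    direct = graph.get(protein, set())
    result = {}
    for bridge in direct:
        for partner in graph.get(bridge, set()):
            if partner != protein and partner not in direct:
                result.setdefault(partner, []).append(bridge)
    return {partner: sorted(bridges) for partner, bridges in result.items()}


graph = build_graph(EDGES)
direct = sorted(graph["ceTAC"])
indirect = bridged_partners(graph, "ceTAC")
print("direct:", direct)
print("via bridging protein:", indirect)
```

In this toy network nhr-86 is reachable from ceTAC only through lin36, which mirrors the argument that a direct TACC2-RXRβ interaction in vertebrates would make such a bridging protein dispensable.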

Proposed functional evolution of the TACC family
Examination of the evolution of ancient gene families provides an insight into how gene structure relates to function. We have presented above a detailed examination of one such gene family. The data so far suggest that the functional TACC homologue in yeast (spc72) has a specific role in centrosomal/mitotic spindle dynamics [21,22]. This ancient TACC function is conserved throughout evolution in both protostomes and deuterostomes. In addition, the TACC proteins of lower organisms appear to interact with bridging proteins that are components of several different protein complexes involved in DNA damage repair, protein translation, RNA processing and transcription. However, over evolutionary time, with the acquisition of new domains and duplication of the chordate TACC precursor, the chordate TACC proteins have acquired the ability to directly interact with some of the other components of these complexes (such as the LSm proteins, nuclear hormone receptors, GAS41, accessory proteins and transcription factors), and have thus evolved additional functions within these complexes. Indeed, the first assigned function of a vertebrate TACC protein, mouse TACC3, was as a transcriptional coactivator of the ARNT-mediated transcriptional response to hypoxia and polyaromatic hydrocarbons [7]. Mouse TACC3 has also been reported to interact with the transcription factor STAT5 [33].
Recently, we have demonstrated that TACC2 and TACC3 can bind to nuclear histone acetyltransferases [34], further confirming a more direct role for the TACC proteins in transcriptional and chromatin remodeling events. Interestingly, although all human TACC proteins can directly interact with the histone acetyltransferase pCAF in vitro, the TACC1 isoforms expressed in human breast cancer cells do not interact with this histone acetylase [34]. This may be attributable to the proposed function of the Exon 1 containing TACC1 variants in RNA processing, via the interaction with LSm-7 and SmG [19]. Thus, alternative splicing of the TACC1 gene adds further diversity to TACC1 function, as the deletion of specific exons and their associated binding domains will change the potential protein complexes with which the variants can associate, either directly, or by redirecting the splice variants to different subcellular compartments. With the duplication of the TACC1/TACC2 ancestor, it is apparent that an even greater functional diversity may have been introduced into the TACC family. The TACC2 protein retains the ability of TACC3 to interact with GAS41, INI1, histone acetyltransferases and transcription factors (in this case, RXRβ) (Fig. 7) [3,34]. However, the tissue-specific splicing of the 5 kb exon in the TACC2l isoform [3] indicates that this protein has several temporal and tissue-specific functions yet to be identified.

[Figure: In vitro interaction of RXRβ3 and TACC2s]

Compilation and assembly of previously uncharacterized TACC cDNAs and genes
Corresponding orthologous sequences for the TACC, RHAMM, KLP, KIF, TPM and keratin families were identified initially using the TBLASTN program [36] to search the published genomic and cDNA databases. For Takifugu rubripes, gene predictions were produced by the Ensembl automated pipeline http://www.ensembl.org/Fugu_rubripes/ [37] and the JGI blast server http://bahama.jgi-psf.org/fugu/bin/fugu_search. DNA sequences covering the homology regions were extracted and analyzed by Genscan to obtain potential exons. In some cases, exons were added or modified based on the best similarity of translated peptides to the corresponding mouse and human proteins. For regions with low sequence similarity, genomic sequences http://www.genoscope.cns.fr/ from the freshwater pufferfish, Tetraodon nigroviridis, were used as an additional means to verify the predicted exons. Due to the variability of the central region of vertebrate TACC3 cDNAs (see text), to further confirm prediction of the Takifugu rubripes TACC3, full-length cDNAs corresponding to the Danio rerio TACC3 (IMAGE clones 2639991, 2640369 and 3724452) were also obtained from A.T.C.C. and fully sequenced. Potential paralogous chromosomal segments and scaffolds were identified by searching the public databases deposited at NCBI and at the Human Genome Mapping Project, Cambridge UK.
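The step in which candidate exons are judged by the similarity of their translated peptides to a reference protein can be sketched in a few lines. This is only an illustration under stated assumptions: it does not reproduce TBLASTN or Genscan, it scores a crude ungapped, left-anchored identity rather than a real alignment, and the DNA fragment, the reference peptide and the function names (`translate`, `identity`, `best_frame`) are invented for the example.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame):
    """Translate the forward strand starting at offset `frame` (0, 1 or 2)."""
    seq = dna.upper()[frame:]
    # Unknown or ambiguous codons become 'X'; a trailing partial codon is dropped.
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def identity(a, b):
    """Fraction of identical residues over the shorter sequence, without gaps."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def best_frame(dna, reference):
    """Return (frame, scores) where frame has the highest identity to reference."""
    scores = {f: identity(translate(dna, f), reference) for f in range(3)}
    return max(scores, key=scores.get), scores


# Hypothetical inputs: a made-up fragment and a made-up reference peptide.
dna = "GATGAAAACCGCTTAT"
reference = "MKTAY"
frame, scores = best_frame(dna, reference)
print(frame, translate(dna, frame), scores)
```

In practice the reverse strand, gapped alignment and splice-site context would also need to be considered, so a scoring function of this kind is at most a first filter before manual inspection.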