The integrins of the urochordate Ciona intestinalis provide novel insights into the molecular evolution of the vertebrate integrin family

Background Integrins are a functionally significant family of metazoan cell surface adhesion receptors. The receptors are dimers composed of an alpha and a beta chain. Vertebrate genomes encode an expanded set of integrin alpha and beta chains in comparison with protostomes such as drosophila or the nematode worm. The publication of the genome of a basal chordate, Ciona intestinalis, provides a unique opportunity to gain further insight into how and when the expanded integrin supergene family found in vertebrates evolved. Results The Ciona genome encodes eleven α and five β chain genes that are highly homologous to their vertebrate homologues. Eight of the α chains contain an A-domain that lacks the short alpha helical region present in the collagen-binding vertebrate alpha chains. Phylogenetic analyses indicate the eight A-domain containing α chains cluster to form an ascidian-specific clade that is related to but, distinct from, the vertebrate A-domain clade. Two Ciona α chains cluster in laminin-binding clade and the remaining chain clusters in the clade that binds the RGD tripeptide sequence. Of the five Ciona β chains, three form an ascidian-specific clade, one clusters in the vertebrate β1 clade and the remaining Ciona chain is the orthologue of the vertebrate β4 chain. Conclusion The Ciona repertoire of integrin genes provides new insight into the basic set of these receptors available at the beginning of vertebrate evolution. The ascidian and vertebrate α chain A-domain clades originated from a common precursor but radiated separately in each lineage. It would appear that the acquisition of collagen binding capabilities occurred in the chordate lineage after the divergence of ascidians.


Background
Integrins are cell surface adhesion receptors which mediate cell-extracellular matrix (ECM), cell-cell and cell-path-ogen interactions. Integrin receptors are structurally elaborate and composed of non-covalently associated α and β subunits. Integrins have a large extracellular domain responsible for binding extracellular ligands, a transmembrane domain and, a relatively small intracellular domain that interacts with the cytoskeleton and intracellular signaling pathways. Integrins integrate information from the extracellular and cytoplasmic environments by transducing signals bidirectionally across the plasma membrane. Hence, the binding of a specific ECM ligand by an integrin may elicit the activation of intracellular signaling pathways, cytoskeletal reorganisations and changes in cell adhesion or migration; and conversely, alterations in the intracellular environment and signaling can result in the activation or inhibition of ligand binding by the extracellular domain of an integrin [1,2]. Consequently, integrins have fundamental roles in diverse physiological processes including: tissue morphogenesis and remodeling [3], immune and inflammatory responses [4], and regulation of cell growth, migration and differentiation [5].
Integrin ligands include ECM components such as laminins, fibronectin and collagens [2], cell surface intercellular adhesion molecules (ICAMs) and plasma proteins such as fibrinogen [4]. Since cell adhesion and the production of a collagen-based ECM are essential characteristics of metazoa, it is not surprising that integrins have been detected throughout the multicellular animal kingdom, from the simplest and most primitive phyla (sponge and cnidarians) [6] to higher vertebrates. In humans, 18 α and 8 β integrin subunits combine to form 24 functionally distinct heterodimers [2].
Within the complement of 24 human integrin receptors, distinct functional sub-divisions can be made on the basis of ligand specificity and tissue distribution. A previous study of integrin phylogeny [7] identified 5 α subunit clades with vertebrate chains occupying 4 of these, namely: laminin binding (α3, α6 and α7 -PS1 clade), RGD tri-peptide binding (αV, αIIb, α5 and α8 -PS2 clade), and 2 vertebrate-specific clusters consisting of a small clade comprising the α4 and α9 subunits and a large αA-domain containing clade including both collagenbinding (α1, α2, α10 and α11) and leukocyte-specific (αD, αE, αL, αM and αX) α subunits (I-DOM clade). The αA-domain is structurally homologous to the module identified originally in von Willebrand factor. The collagen-binding αA-domains contain an inserted α-helix which appears to contribute directly to ligand binding. α subunit integrin homologues from model invertebrates C. elegans and D. melanogaster clustered in the laminin clade (PS1) and the RGD binding (PS2) clade with remaining α chains forming a drosophila-specific PS3 clade [7].
The α chain αA-domain, shared by all members of the I-DOM clade, mediates ligand binding in a metal iondependant manner by way of a conserved, non-contigu-ous sequence termed the metal ion-dependent adhesion site (MIDAS) motif [8,9]. It is somewhat surprising that no examples of collagen-binding (i.e. αA-domain-containing) integrins have been found in protostomes since basement membrane and fibrillar collagens are essential components of some of the most primitive invertebrates such as the cnidarian, Hydra vulgaris [10,11]. The origin of the I-DOM clade remains to be fully determined although since the urochordate Halocynthia roretzi has at least one α integrin containing an αA-domain [12] the limited evidence available to date suggests that it may be a chordate invention. In contrast, all known β subunits have a conserved βA-domain (also known as the I-like domain) [13] that must therefore have been present in the prototypic metazoan β subunit.
Vertebrate β subunits have been resolved into three branches by phylogenetic analysis, with a majority of sequences falling into two well-supported clades [7]. The two clades, termed β1 (β1, β2 and β7) and β3 (β3, β5, β6 and β8) included seven of the eight known subunits. In the β1 clade, the β2 and β7 subunits are leukocyte-specific, whereas their common ancestor, the β1 subunit, forms heterodimers with 12 of the 18 α subunits. Clade β3 subunits are all specific to RGD ligand binding integrins. The β4 subunit was positioned separately to other vertebrate clades and contains a unique extended cytoplasmic domain (~1000 residues), which makes direct contact with intermediate filaments via type fibronectin III repeats [14]. Protostome β subunits from C. elegans and D. melanogaster were not found to cluster with vertebrate sequences. Similarly, early deuterostome sequences (sea urchin) formed lineage-specific clusters with poor resolution amongst all invertebrate clades. A recent study by Miyazawa and Nonaka (2004) presented a phylogeny of integrin β subunits including novel sequences from the urochordate Halocynthia roretzi. These subunits are expressed on ascidian phagocytic blood cells (hemocytes) but phylogenetic analyses positioned them distal from vertebrate leukocyte β integrins in an ascidianspecific clade [15].
The recent sequencing of the genome of the ascidian Ciona intestinalis [16], a urochordate and one of the closest invertebrate relatives of vertebrates, provides a unique opportunity to gain insight into the complete set of integrins available in chordates prior to the large-scale or genome duplication events that many believe were associated with the early stages of vertebrate evolution [17][18][19][20]. An early preliminary analysis identified candidate integrin genes in the Ciona genome [21]. Here we have identified and refined the sequences of 11 α and 5 β integrin subunits from the Ciona genome. Eight of the α chains contain an αA-domain (also known as an I-Domain) that lacks the collagen-binding α helix and these chains form an ascidian-specific clade related to, but distinct from, the vertebrate I-DOM clade. The remaining 3 α chains are predicted to bind laminins and RGD-containing motifs. Two Ciona β chains cluster with the vertebrate β1 clade and the remaining 3 form an ascidian-specific β clade. The majority of chains in ascidian-specific clades are expressed on cells in the blood and are likely to be involved in innate immunological processes. These novel data provide further insights into the mechanism of evolution of the vertebrate family of integrins, and specifically when and how specific clades of integrin chains first arose. The differences between the ascidian and vertebrate complements of integrins again emphasises how phyla and species mould their genomes by the amplification of specific subsets of genes as part of the process of acquiring a stable and successful phenotype.

Obtaining refined protein sequences for integrins encoded in the Ciona genome
The C. intestinalis genome database was searched for α and β integrin genes and the sequences listed in Fig. 1A were obtained. These sequences were already annotated and no additional genes were obtained during extensive BLAST searching of the Ciona genome using the 26 human integrin chain sequences. The 12 annotated α chain sequences recovered directly from genome database clearly represented fragments of genes, whereas the five β chain sequences represented a much more complete data set (Fig. 1A). In order to generate more complete α chain data, DNA flanking the predicted gene fragments were downloaded and searched manually for putative "missing" exons. The search involved the identification of ORFs based on their sequence being present in EST clones or conservation of their translated sequence in comparison with homologous genes. Eleven Ciona α chain genes were confirmed by this process (Fig. 1B) with two of the JGI predicted genes (ci0100152017 & ci0100152002 - Fig.  1A) proving to represent different regions of a single gene (Fig. 2). The sequence refinement process was aided by the fact that four of the Ciona α chain genes were very highly conserved and had clearly radiated relatively recently by a process involving tandem duplication. The completion of one full-length gene sequence therefore considerably simplified the elucidation of the exon structures of the remaining three genes since these were well conserved (Table 1), which was not the case when comparing the gene structures of more distantly related integrins (data not shown). The size and relative genomic loci of these four genes (ci100152017, ci0100130149, ci0100152615 & ci0100131399 -subsequently referred to as α5-8) is presented in Fig. 2.

Sequence alignments and characterization
The sequences for the 11 Ciona α chains were aligned with human homologues. An annotated version of this alignment is presented in Fig. 3,4,5. Eight of the 11 Ciona α chains (α1-8) have a well conserved αA-domain including the essential residues that constitute the MIDAS motif (Fig. 3). The position of the αA-domain insertion in the human and Ciona chains is identical, indicating that all these chains have arisen from a common progenitor. However, all 8 Ciona chains lack 9-11 amino acid residues corresponding to the 'collagen binding' α-helical domain present in the collagen binding (Hs_α1 & α10) vertebrate α chains (Fig. 3). The Ciona α chains share all other major features with their vertebrate homologues across the alignment including the well-defined transmembrane and conserved intracellular interaction domains ( Fig. 3,4,5).
The sequences of the five Ciona β chains are also highly conserved with respect to their vertebrate orthologues including the MIDAS motif within the I-like domain, the four EGF domains, the transmembrane domain, and the intracellular interaction motifs and PTB-like domains ( Fig. 6 &7).

Phylogenetic analyses
The α chain analysis is presented in Fig. 8 in the form of a maximum likelihood tree with supporting data from 1,000 neighbor-joining bootstrap replicates and Bayesian analysis. Overall the inferred phylogenetic relationships are consistent with a previous phylogenetic reconstruction [7]. The clades identified by Hughes (PS1, PS2, and the vertebrate I-DOM and α4/9) are all present (Fig. 8). Note, the PS3 clade is not shown because it is specific to Drosophila. Ciona α9 and α10 cluster in the PS1 clade and their position, separating the protostome and vertebrate sequences, is as expected. In contrast, Ciona α11 clusters with its ascidian orthologue (Hr_α2) in the PS2 clade but at an anomalous position distal to the protostome sequences. Only Neighbor Joining analysis (not shown) produced the anticipated branching in this region of the tree with protostome PS2 clade members being most distal. The remaining eight Ciona α chains (Ci_α1 to α8) all have an αA-domain and form an ascidian specific clade that includes the αA-domain containing H. roretzi α1 chain (I-DOM [ascidian] - Fig. 8).
Comparison of sequences recovered directly from the JGI C. intestinalis database before (A -Recovered) and after (B -Refined) sequence refinement Figure 1 Comparison of sequences recovered directly from the JGI C. intestinalis database before (A -Recovered) and after (B -Refined) sequence refinement. The annotated cartoons represent the domain structures of generic α & β integrin chains. Each subsequent row represents: (A -Recovered) the domain structure encoded by a sequence as retrieved directly from the database together with its assigned database accession number; and (B -Refined) the refined version of that gene after detailed analysis of the genomic sequence as described in the methods together with the name assigned during these analyses (e.g. Ci_α1). Refined sequences are presented as alignments in Fig. 3-7 and are also included as amino acid residue sequence files (see additional file 1).

Discussion
The urochordate C. intestinalis occupies a pivotal position in the animal kingdom for understanding the evolution of vertebrates. The Ciona genome provides an insight into the basic set of genes available at the very beginning of vertebrate evolution since the urochordates diverged just prior to the widespread gene duplication processes that are thought to have shaped and transformed the vertebrate genome [16].
The Ciona genome encodes 11 α and five β integrin chains. As expected, some Ciona α chains cluster in the PS1 laminin-binding clade (Ci_α9 & α10), and in the PS2 RGD-binding clade (Ci_α11 together with its ascidian orthologue Hr_ α2 -see Fig 8). It is unclear why the urochordate members of the PS2 clade cluster in an anomalous position distal to the protostome sequences although this suggests that the ancestral urochordate PS2 gene underwent significant and rapid sequence changes after divergence from the lineage leading to vertebrates. The remaining eight Ciona α chains all contain an αA-domain and, somewhat surprisingly, form an ascidian-specific clade related to, but distinct from, the vertebrate I-DOM clade. This phylogenetic relationship suggests the vertebrate and ascidian αA-domain containing clades arose from a common progenitor but that this gene radiated separately in both the ascidian and vertebrate lineages after their divergence. Data supporting this hypothesis includes: i) ascidian and vertebrate genes have the αAdomain inserted at the same location supporting the notion of both lineages having a common progenitor (Fig. 3); ii) all eight ascidian α chain αA-domains lack the same 9-11 amino acids encompassing and adjacent to the α-helical domain in the collagen-binding α chains (Fig.   3); and iii) at least four of the Ciona αA-domain-containing α chains (α5-α8) appear to have arisen very recently as a result of tandem duplications within the ascidian genome based on retained similarities in exon size (Table  1), their high level of sequence identity (Fig. 3,4,5 and 8) and their genomic location (Fig. 2). The common progenitor gene presumably evolved in deuterostomes, possibly in the earliest chordates since the vertebrate and ascidian lineages of αA-domain α chains have radiated entirely separately and no protostome orthologues have been identified. The most likely function for the progenitor αA-domain-containing α chain involves the adhesion of blood cells to complement-like proteins or the extracellular matrix. In the urochordate H. roretzi, the Hr_α1 αAdomain gene, together with the Hr_β1 & β2 genes, are expressed on hemocytes and are thought to act as complement receptors [12]. A large number of EST's for Ciona αA-domain-containing α chains have been found in either blood cell or hemocyte cDNA libraries (Table 2). Finally, the expression of half of the genes comprising the vertebrate I-DOM clade is leukocyte-specific. In addition, it would seem that the collagen-binding property exhibited by the other half of the I-DOM α chains was a late functional acquisition of this vertebrate clade, perhaps associated with an insertional mutagenic event creating the collagen binding α-helix. It is noteworthy that Ciona expresses progenitor forms for all three clades of vertebrate fibrillar collagens [22]. It is therefore apparent that the early evolution of chordates did not require collagenbinding integrins. Functionally important interactions between integrins and collagen triple helices must have developed later in chordate evolution, possibly in the earliest vertebrates and co-incidental with the acquisition of the collagen-binding helix.
The phylogenetic relationships of the Ciona and vertebrate β integrin chains (see Fig. 9) emphasizes the pivotal position invertebrate chordates occupy with respect to understanding how vertebrates and their genes evolved. Previous phylogenetic analysis has suggested that neither protostome nor early deuterostomes (echinoderm) β chain sequences cluster with their vertebrate orthologues [7]. The clustering of a Ciona (Ci_β1) and a previously reported echinoderm sequence (Sp_ βC) with the vertebrate β1 clade genes ( Fig. 9) resolves more clearly how the promiscuous vertebrate β1 chain and its paralogues have evolved from a deuterostome-specific progenitor. In addition, the vertebrate β4 chain, which fails to cluster with any other vertebrate genes, has a Ciona orthologue (Ci_ β5, Fig. 9) and must therefore have evolved prior to the divergence of ascidians. It was not possible to determine for certain whether the Ci_ β5 chain has the extended intracellular C-terminal domain present in the vertebrate β4 chain using either EST analysis or the search for exons containing conserved ORFs. Nevertheless, direct translation of 10 kb of genomic sequence 3' to the predicted Cterminus of the Ciona gene revealed the presence of short ORFs homologous to a domain present in the vertebrate integrin β4 intracellular domain and shared by Na + -Ca 2+ exchangers (data not shown). The remaining three Ciona β gene sequences formed an ascidian-specific clade together with two β chains from H. roretzi (Fig. 9).
The genomic locations and orientations of recovered and refined α integrin genes present on scaffolds 21 and 91  Prediction of dimerisation patterns for novel integrin chains based on a combination of known interactions and phylogeny As phylogenetic relationships between novel α and β chains become defined, it is possible to start predicting likely interactions based on the known dimerisation partners of close relatives (Fig. 10 one laminin binding (PS1), one RGD binding (PS2) and one αA-domain containing chain (Fig. 8). The novel Ciona data therefore clearly indicate that radiation of vertebrate α chain genes took place after the divergence of urochordates. The β chain phylogeny ( Fig. 9) indicates that the common progenitor of urochordates and vertebrates had a minimum of one β1-like gene (that subsequently radiated in vertebrates) and a single Hs_β4/Ci_ β5 progenitor that radiated in neither lineage (Fig. 9). The origins of the ascidian and vertebrate-specific β clades is not resolved.
The ascidian lineage exhibits amplifications of subsets of integrin genes to produce ascidian-specific classes of novel integrins (Fig. 8,9,10). In particular, these novel integrins appear to be expressed in blood (see above) and may be involved in mechanisms associated with innate immunity. It has been proposed that the major metamorphic transformations between ascidian larval and adult body forms may be dependent upon innate immune responses [23]. This suggestion would provide a plausible explanation of the requirement for an expanded set of urochordate hemocyte integrins although more generic explanations, such as the preparation for a sessile life-style under attack by pathogens, are also possible.

Methods
The complete sequences of the 26 known human integrin genes were used to probe the Ciona intestinalis genome and TIGR cDNA gene index (http://genome.jgi-psf.org/ ciona4/ciona4.home.html and http://www.tigr.org/tdb/ tgi/cingi using TBLASTN and PSI-BLAST with cut-off expectancy values of E = 1) to identify homologous genes [16,24]. Ciona gene models were also detected using the orthologue detection program InParanoid using a keyword search using 'integrin' as the query http://abi.mar seille.inserm.fr/cgi-bin/karine/inparanoid-para [25]. To identify all the integrin genes, reciprocal BLAST searches of the Ciona, human and non-redundant databases were used. Frequently, EST for the Ciona genes contradicted the proposed gene models from JGI. In instances where an EST clearly demonstrated the misplacement of exons in the recovered JGI model, the protein sequence was corrected to reflect this. To detect missing exons not supported by EST data, genomic DNA flanking the sequence of interest was retrieved and analysed using the GENES-CAN [26]http://genes.mit.edu/GENSCAN.html and GENEWISE [27]http://www.ebi.ac.uk/Wise2 gene prediction programmes. Modified sequences were checked by aligning with respective human integrin profiles using CLUSTAL X [28] and corrected coding sequences used for subsequent analyses. Expression profiles for the Ciona genes were obtained from the TIGR database (see above).
The α and β integrin sequences were aligned separately using CLUSTAL X. The variable domain structure amongst α integrins necessitated subdivision of the alignment groups based on the presence/absence of an αA-domain. Subgroups were aligned and then combined so that the final alignment contained all the α integrins with a 200residue (approx) gap region corresponding to the αAdomain.
For phylogenetic analysis, gap-containing sites were removed from each alignment and Maximum Likelihood trees were inferred using PROML from the PHYLIP package [29]. The JTT model of amino acid substitutions was used with and without global rearrangements and correction for rate heterogeneity (α value obtained from TREEPUZZLE [30]). The topologies of the trees were tested using two independent methods: Neighbour-joining bootstrap replicates and Bayesian tree inference using PHYLIP and Mr Bayes programmes respectively [31]. The accession numbers for protein sequences used in this study are presented in Tables 3 &4.