Sequencing and comparative analysis of fugu protocadherin clusters reveal diversity of protocadherin genes among teleosts
© Yu et al; licensee BioMed Central Ltd. 2007
Received: 25 October 2006
Accepted: 30 March 2007
Published: 30 March 2007
The synaptic cell adhesion molecules, protocadherins, are a vertebrate innovation that accompanied the emergence of the neural tube and the elaborate central nervous system. In mammals, the protocadherins are encoded by three closely-linked clusters (α, β and γ) of tandem genes and are hypothesized to provide a molecular code for specifying the remarkably-diverse neural connections in the central nervous system. Like mammals, the coelacanth, a lobe-finned fish, contains a single protocadherin locus, also arranged into α, β and γ clusters. Zebrafish, however, possesses two protocadherin loci that contain more than twice the number of genes as the coelacanth, but arranged only into α and γ clusters. To gain further insight into the evolutionary history of protocadherin clusters, we have sequenced and analyzed protocadherin clusters from the compact genome of the pufferfish, Fugu rubripes.
Fugu contains two unlinked protocadherin loci, Pcdh1 and Pcdh2, that collectively consist of at least 77 genes. The fugu Pcdh1 locus has been subject to extensive degeneration, resulting in the complete loss of Pcdh1γ cluster. The fugu Pcdh genes have undergone lineage-specific regional gene conversion processes that have resulted in a remarkable regional sequence homogenization among paralogs in the same subcluster. Phylogenetic analyses show that most protocadherin genes are orthologous between fugu and zebrafish either individually or as paralog groups. Based on the inferred phylogenetic relationships of fugu and zebrafish genes, we have reconstructed the evolutionary history of protocadherin clusters in the teleost fish lineage.
Our results demonstrate the exceptional evolutionary dynamism of protocadherin genes in vertebrates in general, and in teleost fishes in particular. Besides the 'fish-specific' whole genome duplication, the evolution of protocadherin genes in teleost fishes is influenced by lineage-specific gene losses, tandem gene duplications and regional sequence homogenization. The dynamic protocadherin clusters might have led to the diversification of neural circuitry among teleosts, and contributed to the behavioral and physiological diversity of teleosts.
A long-standing mystery facing neurobiologists is the molecular mechanism underlying the highly diversified neural network in vertebrate brains. The discovery of three closely linked protocadherin (Pcdh) clusters in mammalian genomes has led to the intriguing speculation that these genes may provide a molecular code for specifying neuron-neuron connections in the central nervous system [2–4]. Each of the three clusters, designated the Pcdh α, β, and γ clusters, contains a different number of large (~2.4 kb each) 'variable' exons. Each of these exons encodes an extracellular domain comprising six repeats of the calcium-binding ectodomain (EC1-EC6), a transmembrane domain and a short cytoplasmic segment. The 3' ends of the α and γ (but not the β) clusters each contain three 'constant' exons that are alternatively spliced to individual variable exons in their respective clusters. The constant exons encode the main part of the cytoplasmic domain shared by all members of the same cluster [2, 3]. In many ways, this type of genomic organization resembles the immunoglobulin and T-cell receptor gene loci, which are widely known for their ability to generate a remarkably diverse repertoire of antigen-recognizing molecules. Pcdh genes are expressed mainly in neurons, and their proteins are highly enriched on synaptic membranes [2, 5, 6]. The transcription of Pcdh genes is controlled by individual promoters located adjacent to each variable exon [7, 8], which contribute to the differential expression patterns of individual Pcdh genes in the central nervous system. The Pcdh genes also appear to be under a higher order of complex regulation since their expression seems to be allele-selective, and individual neurons, even of the same kind, express an overlapping but distinct combination of Pcdh genes [7, 8]. More recently, two long-range cis-regulatory elements in the Pcdhα cluster have been identified and proposed to underlie the monoallelic expression of the Pcdh genes. Taken together, these features of Pcdh genes suggest that they have the potential to play a fundamental role in establishing neural diversity in the brain.
The Pcdh clusters are essentially a vertebrate innovation that accompanied the emergence of the neural tube and the elaborate central nervous system. No such Pcdh cluster has been identified in invertebrate genomes. Mammals contain a single Pcdh locus consisting of about 60 genes [3, 6, 12–15]. The lobe-finned fish, coelacanth, which is believed to be a forerunner of tetrapods, also contains a single Pcdh locus organized into α, β, and γ clusters similar to mammals, with a total of 49 genes. In contrast, the teleost fish, zebrafish, contains two unlinked Pcdh loci (DrPcdh1 and DrPcdh2), presumably due to the 'fish-specific' genome duplication [17, 18]. The zebrafish genes in each locus are organized into only α and γ clusters. The two loci collectively contain at least 107 genes. The massive expansion of Pcdh genes in zebrafish has been attributed to lineage-specific expansion of individual genes in some Pcdh clusters [19, 20]. Interestingly, the zebrafish Pcdh genes have experienced concerted evolution through adaptive selection and gene conversion. Thus, the structure and organization of Pcdh clusters in zebrafish are quite divergent from those in lobe-finned fish and mammals. It has been speculated that the differences in the complement of Pcdhs in zebrafish and mammals could be related to the anatomical differences of their brains. However, it is not known whether the organization of Pcdh clusters in zebrafish is typical of all teleost fishes or unique to the zebrafish lineage. Teleosts are the largest and most successful group of vertebrates. The extant teleosts include almost the same number of species as all other living vertebrate species combined. Teleost fishes also exhibit wide diversity in their habitat, morphology, behavior, physiology and adaptations. Given the possible function of protocadherins in the formation of neural complexity, it would be of interest to characterize Pcdh clusters from diverse groups of teleost fishes.
In this study, we report the sequencing and comparative analysis of Pcdh clusters in the pufferfish, Fugu rubripes. Pufferfishes are unique in having the smallest genome among vertebrates. The reduction in the genome size of pufferfish is attributed to a paucity of repetitive sequences and to short intergenic regions and introns. At 400 Mb, the fugu genome is one-eighth the size of the human genome and one-quarter the size of the zebrafish genome. A 'draft' sequence of the fugu genome was completed in 2002 purely by the whole-genome shotgun sequencing strategy. However, we found that most of the Pcdh genes were misassembled in the 'draft' genome sequence, most likely due to the presence of a highly similar 3' region (identity >99% across about 750 bp) shared by all the variable exons in the same paralog subcluster (see Results and discussion below). We therefore sequenced overlapping cosmid and BAC clones and meticulously assembled the complete sequence for the Pcdh loci. Our results show that there are two unlinked Pcdh loci in fugu, similar to zebrafish, and that they contain at least 77 genes. The Pcdh1 locus in fugu has undergone extensive degeneration, resulting in the complete loss of the γ cluster. Based on the inferred evolutionary relationships of fugu and zebrafish Pcdh genes, we were able to reconstruct the two Pcdh loci in the common ancestor of fugu and zebrafish and the ancestral single Pcdh locus in the teleost fish lineage prior to the 'fish-specific' whole-genome duplication. Our data indicate that Pcdh clusters in teleost fishes have undergone extensive diversification largely through lineage-specific degeneration, tandem gene duplication, and gene conversion.
Results and discussion
Two unlinked protocadherin loci in fugu
The larger contig includes a complete Pcdhα cluster containing 37 variable exons and three constant exons, followed by the first 23 variable exons of a Pcdhγ cluster (Fig 1). We identified three non-Pcdh genes, fEtf1, fFbxw11 and fFgf18 upstream of the Pcdhα cluster indicating that the α cluster on this contig is complete (Fig 1). The shorter contig contains 14 variable exons and three constant exons of a Pcdhγ cluster, followed by two non-Pcdh genes, fDiaph1 and fTrpc7 (Fig 1). RT-PCR with forward primers corresponding to variable exons of the Pcdhγ cluster of the larger contig and a reverse primer for the constant region of Pcdhγ in the shorter contig (data not shown) showed that the two contigs belong to the same locus. We designate this locus as FrPcdh2 locus (Fig 1). We were unable to fill the gap between the two contigs due to the lack of a genomic clone spanning the two contigs. Attempts to fill the gap by long-template PCR using fugu genomic DNA as a template also failed, presumably due to the large size of the gap between the two contigs. The FrPcdh2 locus contains 37 α variable exons and at least 37 γ variable exons. The exons downstream of the gap have been numbered with a prime sign (24' to 37') to indicate that the numbers may not represent their actual positions in the cluster. Thus, fugu possesses two unlinked Pcdh loci, Pcdh1 and Pcdh2.
The two Pcdh loci in fugu apparently resulted from segmental duplication of an ancestral Pcdh cluster. Phylogenetic analyses using constant (data not shown) and variable (see below) regions of fugu and zebrafish Pcdh genes show that the two Pcdh loci in fugu are orthologous to the duplicate zebrafish Pcdh loci, respectively, indicating that the locus duplication took place before the divergence of the two lineages. This duplication is most likely the result of the 'fish-specific' whole genome duplication event that occurred early during the evolution of ray-finned fishes [17, 18]. Like the two zebrafish Pcdh loci, both fugu Pcdh loci lack β cluster genes. The presence of a β cluster in the lobe-finned fish and tetrapods, and its absence in fugu and zebrafish, suggest that the β cluster either evolved in the lineage that led to the lobe-finned fish and tetrapods, or was already present in the common ancestor of these vertebrates and was subsequently lost in the teleost lineage before the divergence of the fugu and zebrafish lineages.
The two Pcdh loci in fugu and zebrafish show significant differences in their gene content and organization. For instance, the FrPcdh1 cluster is highly degenerate compared to the zebrafish Pcdh1 cluster; as a result, it contains only three α genes compared to ten α genes in the zebrafish Pcdh1 cluster [12, 19, 20]. More strikingly, the fugu Pcdh1 locus completely lacks a γ cluster (Fig 1), whereas the zebrafish Pcdh1 locus contains a γ cluster with at least 28 genes [12, 19, 20]. Thus, unlike the zebrafish genome, which contains two Pcdhγ clusters, the fugu genome contains a single Pcdhγ cluster that is located in the Pcdh2 locus. The whole-genome sequence of a second pufferfish, Tetraodon nigroviridis, has recently been completed. To determine whether Tetraodon contains a single Pcdhγ cluster like fugu, we performed a BLASTX search of the Tetraodon genome database and discovered that Tetraodon also contains two sets of α constant exons belonging to two putative Pcdh clusters but only a single set of γ constant exons, similar to fugu. Thus the second copy of the Pcdhγ cluster associated with the Pcdh1 locus seems to have been lost before the divergence of fugu and Tetraodon. The highly degenerate nature of the Pcdh1 locus in pufferfishes is consistent with the trend of the pufferfish genome towards compaction. The complete loss of the second copy of the Pcdhγ cluster in pufferfish suggests that these Pcdhγ genes may be redundant. However, we cannot rule out the possibility that the loss of this cluster in pufferfishes might have an effect on their phenotype with regard to the structure and function of the central nervous system.
Phylogenetic relationships of fugu, zebrafish and coelacanth protocadherin genes
As shown in Fig 3a, Pcdhα genes of fugu, zebrafish and coelacanth comprise three large paralog/ortholog groups. The first group (group I in Fig 3a) contains genes localized at the two ends of the fugu and zebrafish Pcdhα clusters, including fugu FrPcdh1α1–3, FrPcdh2α1–7, FrPcdh2α37 and zebrafish DrPcdh1α1–2, DrPcdh1α10, DrPcdh2α1–7, DrPcdh2α38, besides all but one of the genes (LmPcdhα14) in the coelacanth Pcdhα cluster. These fugu and zebrafish genes are further divided into four subgroups. The first subgroup (Ia in Fig 3a) consists of two fugu inter-locus paralogs, FrPcdh1α1 and FrPcdh2α1, zebrafish DrPcdh1α1 and LmPcdhα1. The second subgroup (Ib in Fig 3a) comprises two fugu inter-locus paralogs, FrPcdh1α2 and FrPcdh2α2, and their zebrafish orthologs, DrPcdh1α2 and DrPcdh2α1. The third subgroup (Ic in Fig 3a) contains fugu FrPcdh1α3 and FrPcdh2α37 and their zebrafish orthologs DrPcdh1α10 and DrPcdh2α38, as well as the coelacanth ortholog, LmPcdhα21. An interesting feature of these Pcdh genes is that they seem to be resistant to gene duplication. In spite of the heavy turnover of genes in their neighborhood (see below), they have been conserved as single-copy genes throughout the evolution of these vertebrates. This suggests that they may play a fundamental role in the central nervous system. The fourth subgroup (Id in Fig 3a) contains fugu FrPcdh2α3–7 and zebrafish DrPcdh2α2–7. No direct orthologous relationship can be identified between individual genes in this subgroup; instead, FrPcdh2α3–7 as a paralog group seems to be orthologous to DrPcdh2α5–7. This type of phylogenetic relationship indicates that subsequent to the divergence of the two species, the ancestral paralogs have undergone independent lineage-specific gene duplications, giving rise to a multi-gene paralog group in each species. This phylogenetic tree also suggests that subgroups Ia and Ic are derived from a common ancestor, while subgroups Ib and Id share a common ancestor. Except for LmPcdhα1 and LmPcdhα21, the coelacanth genes in this group do not show any direct orthology to fugu and zebrafish genes, suggesting that these genes are either specific to lobe-finned fish and tetrapods or have been lost from the teleost fish lineage.
The second paralog/ortholog group (group II in Fig 3a) in the Pcdhα phylogenetic tree comprises fugu FrPcdh2α26–36, zebrafish DrPcdh1α3–9, DrPcdh2α26–37, and a single coelacanth gene, LmPcdhα14. The subtrees of this group show that a subset of genes in the zebrafish Pcdh1α locus, DrPcdh1α(3–5,7–8), was generated from an ancestral paralog of DrPcdh1α6 through multiple gene duplication events in the zebrafish lineage. No fugu ortholog of the zebrafish DrPcdh1α3–9 genes is found in the FrPcdh1 locus, presumably due to the independent loss of this paralog group of genes in fugu. On the other hand, FrPcdh2α31–35 appear to be derived from a single common ancestor through lineage-specific duplications in fugu. A single fugu gene, FrPcdh2α36, seems to share a common ancestor with a cluster of zebrafish genes, DrPcdh2α(27,31–36), indicating that while the fugu gene was retained as a single copy, the zebrafish gene has undergone multiple duplications. The fourth subset of genes in this paralog/ortholog group consists of multiple fugu and zebrafish genes including FrPcdh2α26–30, DrPcdh2α(26,28–30) and DrPcdh1α9. However, the orthologous relationship between these subsets of genes cannot be inferred with confidence since the bootstrap values at their branch nodes are rather low (< 200). As the genes of this paralog/ortholog group are closely related and are clearly segregated from the genes of other fugu and zebrafish Pcdh paralog/ortholog groups, we consider the whole group as one large paralog/ortholog group. The evolution of such paralog/ortholog groups is likely to have involved many rounds of lineage-specific gene duplication and degeneration. It appears that LmPcdhα14 is a distant ortholog of this group (Fig 3a). Interestingly, this coelacanth gene also shares common ancestry with the entire mammalian α cluster (except the c1 and c2 genes), suggesting that this paralog/ortholog group of fugu and zebrafish genes is perhaps orthologous to the entire mammalian Pcdhα cluster.
The third paralog/ortholog group of Pcdhα genes (group III in Fig 3a) seems to be teleost-specific, containing only fugu FrPcdh2α8–25 and zebrafish DrPcdh2α8–25. These genes can further be divided into three subgroups. The first subgroup (IIIa in Fig 3a) contains fugu FrPcdh2α8 and its zebrafish ortholog DrPcdh2α8, whereas the other two subgroups (IIIb and IIIc in Fig 3a), which contain multiple fugu and zebrafish paralogs, do not exhibit any manifest individual orthologous relationships. However, it is clear that fugu FrPcdh2α(15,18–25) and FrPcdh2α(9–14,16–17) as paralog subgroups are orthologous to zebrafish DrPcdh2α19–25 and DrPcdh2α9–18, respectively.
Similar to Pcdhα genes, the Pcdhγ genes also form three large paralog/ortholog groups (Fig 3b). Orthology between multi-gene groups (between a single gene and a subset of genes and between two subsets of genes in two species) also seems to be a common feature of the Pcdhγ clusters. For example, the fugu FrPcdh2γ32' (group II in Fig 3b) is apparently orthologous to the entire zebrafish paralog group DrPcdh2γ28–31, whereas the fugu FrPcdh2γ1–17 (group I in Fig 3b) as a paralog group is orthologous to zebrafish DrPcdh2γ1–13. Additionally, orthology between two individual genes from two species is also observed in the Pcdhγ cluster. For instance, fugu FrPcdh2γ18 (group I in Fig 3b) is clearly an ortholog of zebrafish DrPcdh2γ14. Consistent with the previous study, the phylogenetic tree for the Pcdhγ cluster also revealed that coelacanth Pcdhγ genes comprise five paralog groups (group III in Fig 3b), of which four, LmPcdhγ(1,3–4,7,9,19), LmPcdhγ11–16, LmPcdhγ(2,5,8,17–18,20) and LmPcdhβ1–4, are closely related to each other, whereas the fifth group, LmPcdhγ21–24, is more closely related to fugu FrPcdh2γ37' and zebrafish DrPcdh1γ28 (Fig 3b). Such a phylogenetic relationship suggests that a massive expansion of Pcdhγ genes has occurred in the coelacanth lineage subsequent to the divergence of these species.
Orthology between an individual gene in one species and a group of genes in another, and between groups of genes in two species rather than between individual genes, is characteristic of multigene families that have experienced continuous lineage-specific gene duplications and losses. The Pcdh cluster is a typical example of such a dynamic cluster of genes in vertebrates. The Pcdh clusters from fugu and zebrafish include instances of orthology between a single fugu gene and a group of paralogous zebrafish genes (e.g., FrPcdh2γ32' and DrPcdh2γ28–31) and between entire paralog groups of fugu and zebrafish genes (e.g., FrPcdh2γ1–17 and DrPcdh2γ1–13). These types of phylogenetic relationships among Pcdh genes in fugu and zebrafish illustrate the exceptionally dynamic evolutionary changes at the Pcdh loci in the teleost fish lineage following the 'fish-specific' whole genome duplication event. Although the single Pcdh locus in mammals and in the coelacanth has experienced gene duplications and losses, the extent of turnover is much lower than that in fugu and zebrafish. Such variations in the complement of Pcdh genes show that Pcdh clusters are much more dynamic in teleost fishes than in mammals and lobe-finned fishes. Since teleost fishes are the most species-rich and most diverse group of vertebrates, it is likely that the evolutionarily dynamic Pcdh clusters in teleosts have contributed to the morphological and behavioral diversity of teleost fishes.
Regional gene conversion in fugu protocadherin locus
Paralog sequence similarity of fugu protocadherin subclusters

| 5' low homology region: Pcdh protein sequence | 5' low homology region: amino acid identity (%) | 5' low homology region: nucleotide identity (%) | 3' high homology region: Pcdh protein sequence | 3' high homology region: amino acid identity (%) | 3' high homology region: nucleotide identity (%) |
|---|---|---|---|---|---|
| | 80.0 ± 13.4 | 79.0 ± 13.9 | | 99.3 ± 0.3 | 99.3 ± 0.2 |
| | 67.6 ± 10.8 | 69.9 ± 9.9 | | 99.4 ± 0.3 | 99.1 ± 0.2 |
| | 57.5 ± 6.6 | 60.7 ± 5.1 | | 99.4 ± 0.3 | 99.4 ± 0.3 |
| | 70.0 ± 10.8 | 72.7 ± 10.0 | | 99.6 ± 0.3 | 99.3 ± 0.3 |
| | 66.4 ± 9.0 | 68.2 ± 8.5 | | 99.3 ± 0.3 | 99.3 ± 0.2 |
| | 66.5 ± 9.5 | 68.7 ± 7.3 | | 99.3 ± 0.6 | 99.6 ± 0.3 |
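Values of the kind reported in this table (mean ± standard deviation of pairwise identity among the paralogs of a subcluster) can be computed from an alignment along the following lines. This is only a sketch under stated assumptions: the function names are hypothetical, positions where either sequence has a gap are ignored, and the sample standard deviation is used; the study does not state which conventions were applied.

```python
from itertools import combinations
from statistics import mean, stdev


def percent_identity(a: str, b: str) -> float:
    """Percent identical positions between two aligned sequences.

    Assumption: columns in which either sequence has a gap ('-') are skipped.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("sequences share no ungapped positions")
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def identity_summary(aligned_seqs):
    """Mean and sample standard deviation over all pairwise identities."""
    values = [percent_identity(a, b) for a, b in combinations(aligned_seqs, 2)]
    if not values:
        raise ValueError("need at least two sequences")
    spread = stdev(values) if len(values) > 1 else 0.0
    return mean(values), spread


if __name__ == "__main__":
    # Toy aligned sequences, purely illustrative.
    avg, sd = identity_summary(["AAAA", "AAAT", "AATT"])
    print(f"{avg:.1f} ± {sd:.1f}")
```

The same function applies to amino acid or nucleotide alignments, since it only compares characters column by column.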
GC3 of fugu paralog subcluster protocadherin sequences

| 5' low homology region | 3' high homology region | |
|---|---|---|
| 41.3 ± 7.2 | 61.2 ± 0.3 | 2.78 × 10⁻⁴ |
| 38.4 ± 4.3 | 54.4 ± 0.5 | 2.68 × 10⁻¹⁶ |
| 45.0 ± 2.2 | 60.7 ± 0.7 | 1.32 × 10⁻¹⁵ |
| 43.6 ± 2.6 | 56.3 ± 0.4 | 2.40 × 10⁻¹⁹ |
| 41.5 ± 3.4 | 58.5 ± 0.6 | 3.28 × 10⁻¹⁵ |
| 40.1 ± 1.2 | 68.8 ± 0.3 | 6.35 × 10⁻⁹ |
Pcdhs have been proposed to provide molecular diversity for neuron-neuron connections through the combinatorial interaction of protocadherin proteins. For classical cadherins, the trans-homophilic interaction (i.e. the interaction between cells) is mainly mediated by the EC1 domain. Although this is yet to be demonstrated experimentally, it is generally believed that Pcdhs engage in a form of homophilic interaction similar to that of the classical cadherins. However, unlike classical cadherins, which contain five ectodomains in their extracellular region, the extracellular region of Pcdhs contains six ectodomains. It is possible that the molecular diversifying signals of Pcdhs in fugu are encoded by the extracellular EC1-EC3 domains, since this region is more divergent among individual Pcdhs than the highly homologous C-terminal extracellular domains. This is consistent with the observation that the EC2 and EC3 domains of zebrafish and mammalian Pcdhs seldom undergo sequence homogenization and thus provide the most diversifying signals for the molecules. Interestingly, it has been shown recently that EC2 and EC3 of mammalian Pcdhs are subject to positive diversifying selection. Collectively, these observations imply that the N-terminal ectodomains of Pcdhs play a crucial role in mediating neuronal connections in the brain. Furthermore, in contrast to the virtually 100% identical C-terminal sequences of paralogs in the same fugu subcluster, the converted regions are highly divergent between subclusters. The consensus sequences for the converted regions of different subclusters of Pcdh2α and Pcdh2γ exhibit on average only 37.7% and 38.9% identity, respectively. This implies that the converted regions in different subclusters may have undergone adaptive selection and acquired diverse functions specific to each subcluster. In contrast to the fugu Pcdh2 cluster genes, the Pcdh1 cluster genes do not contain any signature of gene conversion.
Reconstruction of protocadherin clusters in ancestral fish lineage
We have identified two unlinked fugu Pcdh loci that collectively contain at least 77 Pcdh genes. The gene content of the two fugu Pcdh loci is quite different from that of the two Pcdh loci in zebrafish. We show that following the 'fish-specific' whole-genome duplication, regional sequence homogenization due to repeated lineage-specific gene conversion, secondary gene losses and tandem gene duplications have been the major factors affecting the evolution of Pcdh clusters in teleosts. Based on phylogenetic analyses, we predict that there were at least six α and ten γ genes (or paralog groups) in the Pcdh locus of the ancestral fish genome prior to the whole-genome duplication event. Elucidating the origin and evolutionary dynamics of Pcdh clusters in different lineages of vertebrates is an important endeavor, as it may help to uncover the molecular code for the complex central nervous system of vertebrates.
Sequencing and assembly of fugu Pcdh loci
To identify fugu Pcdh sequences in the fugu 'draft' genome, we performed a TBLASTN search of the fugu genome database using human protocadherin protein sequences as the query. We identified about 70 scaffolds that showed similarity to Pcdh proteins with an E-value of 10⁻¹⁰ or less. Detailed examination showed that most of these scaffolds were misassembled due to the high sequence homology shared by multiple fugu Pcdh variable exons. Only three scaffolds, scaffold_6, scaffold_480 and scaffold_160, were found to contain large reliably assembled sequences. Gaps within the relevant regions of these scaffolds were filled by PCR using fugu genomic DNA as a template. Scaffold_6 contained a complete Pcdh cluster flanked by non-Pcdh genes. We identified three overlapping cosmid clones that cover the Pcdh-containing region on scaffold_6: c117N19, c112D15 and c5N15. For the other two scaffolds, scaffold_480 and scaffold_160, we used only the reliable Pcdh-containing sequences as anchor sequences for identifying overlapping cosmid and BAC clones by BLASTN search of the cosmid or BAC end databases. We first attempted to sequence these cosmid and BAC clones by the shotgun method. However, since this resulted in piling up of many variable exons, we resorted to cloning and sequencing restriction enzyme-digested fragments to obtain contiguous sequences. The protocol for shotgun sequencing of cosmid and BAC clones comprised shearing DNA by ultra-sonication followed by end-filling by Klenow treatment. The blunt-ended DNA fragments were resolved on an agarose gel, and 2–3 kb fragments were isolated and subcloned into the EcoRV site of the pBluescript SK vector. Plasmid inserts were sequenced from both ends using T3 and T7 primers and BigDye Terminator technology (Applied Biosystems). Sequence reads were then edited and assembled using SeqMan (Lasergene).
The Pcdh variable exons and non-Pcdh genes were annotated based on the results of BLASTX search of the non-redundant protein database at NCBI and GENSCAN predictions. Sequences of fugu Pcdh clusters generated in this study have been submitted to GenBank under accession numbers DQ986917 and DQ986918. Human orthologs of the fugu non-Pcdh genes were identified by BLAT search of the human genome database at the UCSC genome browser.
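The initial scaffold screen (keeping scaffolds with a TBLASTN hit at an E-value of 10⁻¹⁰ or less) can be sketched in Python as follows. This is a minimal illustration rather than the procedure actually used in this study: it assumes the search results are available in the 12-column BLAST tabular format, where the subject identifier is the second column and the E-value the eleventh, and the function name and example rows are hypothetical.

```python
def scaffolds_with_hits(tabular_lines, max_evalue=1e-10):
    """Return {subject_id: best E-value} for subjects passing the cutoff.

    Assumes BLAST tabular rows: qseqid, sseqid, pident, length, mismatch,
    gapopen, qstart, qend, sstart, send, evalue, bitscore (tab-separated).
    """
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        subject, evalue = fields[1], float(fields[10])
        if evalue <= max_evalue:
            best[subject] = min(evalue, best.get(subject, evalue))
    return best


if __name__ == "__main__":
    # Hypothetical rows for illustration only; not data from the study.
    rows = [
        "query1\tscaffold_6\t55.0\t300\t100\t5\t1\t300\t1000\t1900\t1e-50\t200",
        "query1\tscaffold_999\t30.0\t80\t50\t2\t1\t80\t10\t250\t0.002\t35",
        "query2\tscaffold_6\t48.0\t250\t110\t6\t1\t250\t5000\t5750\t3e-20\t120",
    ]
    for scaffold, evalue in sorted(scaffolds_with_hits(rows).items()):
        print(scaffold, evalue)
```

A screen of this kind only nominates candidate scaffolds; as described above, each candidate still has to be inspected for misassembly.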
The genomic sequences of zebrafish and coelacanth Pcdh clusters were retrieved from GenBank. The zebrafish Pcdh clusters were assembled from sequences of AC144823, AC144826, AC144828, AC146480, AL929558, AB075928, BX005294 and BX957322 [12, 16, 19, 20], whereas the coelacanth Pcdh clusters were assembled from sequences of AC150238, AC150284 and AC150308-AC150310. Variable exons were identified by BLASTX searches. We used the N-terminal protocadherin ectodomain sequences (EC1-EC3) for constructing phylogenetic trees, as this region is structurally homologous in all species, gives rise to few gaps in the alignment and does not undergo gene conversion. The sequences of EC1-EC3 from various species were aligned using the ClustalX algorithm. Phylogenetic trees were constructed by the Neighbor-joining method based on a sequence distance matrix, and the trees were drawn using NJplot. The robustness of the tree was determined by bootstrap analysis of 1000 replicate sample sequences.
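The two ingredients of this analysis that precede tree building, a pairwise distance matrix and bootstrap resampling of alignment columns, can be illustrated with the short Python sketch below. It is not the ClustalX implementation: the uncorrected p-distance is an assumed stand-in for whatever distance measure was used, the function names are hypothetical, and the Neighbor-joining step itself is omitted.

```python
import random
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing positions, ignoring columns with a gap ('-')."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("sequences share no ungapped positions")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment: dict) -> dict:
    """Map each unordered pair of sequence names to its p-distance."""
    return {
        (m, n): p_distance(alignment[m], alignment[n])
        for m, n in combinations(alignment, 2)
    }


def bootstrap_replicate(alignment: dict, rng: random.Random) -> dict:
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


if __name__ == "__main__":
    # Toy alignment, purely illustrative.
    toy = {"geneA": "ACDEFG", "geneB": "ACDEFH", "geneC": "AC-EYH"}
    rng = random.Random(1)
    replicates = [distance_matrix(bootstrap_replicate(toy, rng)) for _ in range(1000)]
    print(len(replicates), "replicate distance matrices")
```

In a full analysis a tree would be built from each replicate matrix, and the bootstrap value of a node would be the number of replicate trees that recover it; a value below 200 out of 1000 therefore corresponds to less than 20% support.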
Analysis for third position GC content
We used the CODEML program in the PAML package with default parameters to determine the GC content at the third position of codons. The nucleotide sequence alignments were generated by the RevTrans program using amino acid sequence alignments as templates.
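For readers unfamiliar with the quantity, GC3 is simply the percentage of codons whose third base is G or C. The Python sketch below computes it by direct counting; it is an illustrative stand-in and not a reproduction of the CODEML calculation. The function name is hypothetical, and the handling of gapped or ambiguous codons (they are skipped) is an assumption.

```python
def gc3_percent(coding_sequence: str) -> float:
    """Percent G or C at third codon positions of an in-frame sequence.

    Assumptions: the sequence starts in frame; codons containing gaps or
    ambiguous bases are skipped; a trailing partial codon is ignored.
    """
    seq = coding_sequence.upper()
    third_bases = []
    for start in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[start:start + 3]
        if all(base in "ACGT" for base in codon):
            third_bases.append(codon[2])
    if not third_bases:
        raise ValueError("no complete unambiguous codons found")
    gc_count = sum(base in "GC" for base in third_bases)
    return 100.0 * gc_count / len(third_bases)


if __name__ == "__main__":
    # Codons ATG, GCC, AAA, TTC: third bases G, C, A, C.
    print(gc3_percent("ATGGCCAAATTC"))
```

Because codon-based alignments such as those produced by RevTrans place gaps in whole triplets, skipping gapped codons keeps the reading frame intact.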
We thank GeneService Ltd for supplying fugu cosmid and BAC clones. We would like to thank Haslinawaty Bte Kassim, Alex Lim, Boon Hui Tay, Xixi Jia, Lei Ling Thia and Janice Tan for their excellent technical assistance. The research work in WPY's laboratory is supported by the Biomedical Research Council (BMRC), the National Medical Research Council (NMRC), and the SingHealth Foundation Funds, Singapore, and the work in BV's laboratory is supported by the Agency for Science, Technology and Research (A*STAR), Singapore.
- SPERRY RW: Chemoaffinity in the orderly growth of nerve fiber patterns and connections. Proc Natl Acad Sci U S A. 1963, 50: 703-710. 10.1073/pnas.50.4.703.PubMed CentralView ArticlePubMedGoogle Scholar
- Kohmura N, Senzaki K, Hamada S, Kai N, Yasuda R, Watanabe M, Ishii H, Yasuda M, Mishina M, Yagi T: Diversity revealed by a novel family of cadherins expressed in neurons at a synaptic complex. Neuron. 1998, 20: 1137-1151. 10.1016/S0896-6273(00)80495-X.View ArticlePubMedGoogle Scholar
- Wu Q, Maniatis T: A striking organization of a large family of human neural cadherin-like cell adhesion genes. Cell. 1999, 97: 779-790. 10.1016/S0092-8674(00)80789-8.View ArticlePubMedGoogle Scholar
- Shapiro L, Colman DR: The diversity of cadherins and implications for a synaptic adhesive code in the CNS. Neuron. 1999, 23: 427-430. 10.1016/S0896-6273(00)80796-5.View ArticlePubMedGoogle Scholar
- Frank M, Ebert M, Shan W, Phillips GR, Arndt K, Colman DR, Kemler R: Differential expression of individual gamma-protocadherins during mouse brain development. Mol Cell Neurosci. 2005, 29: 603-616. 10.1016/j.mcn.2005.05.001.View ArticlePubMedGoogle Scholar
- Zou C, Huang W, Ying G, Wu Q: Sequence analysis and expression mapping of the rat clustered protocadherin gene repertoires. Neuroscience. 2007, 144: 579-603. 10.1016/j.neuroscience.2006.10.011.View ArticlePubMedGoogle Scholar
- Tasic B, Nabholz CE, Baldwin KK, Kim Y, Rueckert EH, Ribich SA, Cramer P, Wu Q, Axel R, Maniatis T: Promoter choice determines splice site selection in protocadherin alpha and gamma pre-mRNA splicing. Mol Cell. 2002, 10: 21-33. 10.1016/S1097-2765(02)00578-6.View ArticlePubMedGoogle Scholar
- Wang X, Su H, Bradley A: Molecular mechanisms governing Pcdh-gamma gene expression: evidence for a multiple promoter and cis-alternative splicing model. Genes Dev. 2002, 16: 1890-1905. 10.1101/gad.1004802.
- Esumi S, Kakazu N, Taguchi Y, Hirayama T, Sasaki A, Hirabayashi T, Koide T, Kitsukawa T, Hamada S, Yagi T: Monoallelic yet combinatorial expression of variable exons of the protocadherin-alpha gene cluster in single neurons. Nat Genet. 2005, 37: 171-176. 10.1038/ng1500.
- Ribich S, Tasic B, Maniatis T: Identification of long-range regulatory elements in the protocadherin-alpha gene cluster. Proc Natl Acad Sci U S A. 2006, 103: 19719-19724. 10.1073/pnas.0609445104.
- Hill E, Broadbent ID, Chothia C, Pettitt J: Cadherin superfamily proteins in Caenorhabditis elegans and Drosophila melanogaster. J Mol Biol. 2001, 305: 1011-1024. 10.1006/jmbi.2000.4361.
- Wu Q: Comparative genomics and diversifying selection of the clustered vertebrate protocadherin genes. Genetics. 2005, 169: 2179-2188. 10.1534/genetics.104.037606.
- Wu Q, Zhang T, Cheng JF, Kim Y, Grimwood J, Schmutz J, Dickson M, Noonan JP, Zhang MQ, Myers RM, Maniatis T: Comparative DNA sequence analysis of mouse and human protocadherin gene clusters. Genome Res. 2001, 11: 389-404. 10.1101/gr.167301.
- Yanase H, Sugino H, Yagi T: Genomic sequence and organization of the family of CNR/Pcdhalpha genes in rat. Genomics. 2004, 83: 717-726. 10.1016/j.ygeno.2003.09.022.
- Sugino H, Hamada S, Yasuda R, Tuji A, Matsuda Y, Fujita M, Yagi T: Genomic organization of the family of CNR cadherin genes in mice and humans. Genomics. 2000, 63: 75-87. 10.1006/geno.1999.6066.
- Noonan JP, Grimwood J, Danke J, Schmutz J, Dickson M, Amemiya CT, Myers RM: Coelacanth genome sequence reveals the evolutionary history of vertebrate genes. Genome Res. 2004, 14: 2397-2405. 10.1101/gr.2972804.
- Vandepoele K, De VW, Taylor JS, Meyer A, Van de PY: Major events in the genome evolution of vertebrates: paranome age and size differ considerably between ray-finned fishes and land vertebrates. Proc Natl Acad Sci U S A. 2004, 101: 1638-1643. 10.1073/pnas.0307968100.
- Christoffels A, Koh EG, Chia JM, Brenner S, Aparicio S, Venkatesh B: Fugu genome analysis provides evidence for a whole-genome duplication early during the evolution of ray-finned fishes. Mol Biol Evol. 2004, 21: 1146-1151. 10.1093/molbev/msh114.
- Tada MN, Senzaki K, Tai Y, Morishita H, Tanaka YZ, Murata Y, Ishii Y, Asakawa S, Shimizu N, Sugino H, Yagi T: Genomic organization and transcripts of the zebrafish Protocadherin genes. Gene. 2004, 340: 197-211. 10.1016/j.gene.2004.07.014.
- Noonan JP, Grimwood J, Schmutz J, Dickson M, Myers RM: Gene conversion and the evolution of protocadherin gene cluster diversity. Genome Res. 2004, 14: 354-366. 10.1101/gr.2133704.
- Venkatesh B: Evolution and diversity of fish genomes. Curr Opin Genet Dev. 2003, 13: 588-592. 10.1016/j.gde.2003.09.001.
- Aparicio S, Chapman J, Stupka E, Putnam N, Chia JM, Dehal P, Christoffels A, Rash S, Hoon S, Smit A, Gelpke MD, Roach J, Oh T, Ho IY, Wong M, Detter C, Verhoef F, Predki P, Tay A, Lucas S, Richardson P, Smith SF, Clark MS, Edwards YJ, Doggett N, Zharkikh A, Tavtigian SV, Pruss D, Barnstead M, Evans C, Baden H, Powell J, Glusman G, Rowen L, Hood L, Tan YH, Elgar G, Hawkins T, Venkatesh B, Rokhsar D, Brenner S: Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science. 2002, 297: 1301-1310. 10.1126/science.1072104. [http://www.fugu-sg.org]
- Jaillon O, Aury JM, Brunet F, Petit JL, Stange-Thomann N, Mauceli E, Bouneau L, Fischer C, Ozouf-Costaz C, Bernot A, Nicaud S, Jaffe D, Fisher S, Lutfalla G, Dossat C, Segurens B, Dasilva C, Salanoubat M, Levy M, Boudet N, Castellano S, Anthouard V, Jubin C, Castelli V, Katinka M, Vacherie B, Biemont C, Skalli Z, Cattolico L, Poulain J, De B, Cruaud C, Duprat S, Brottier P, Coutanceau JP, Gouzy J, Parra G, Lardier G, Chapple C, McKernan KJ, McEwan P, Bosak S, Kellis M, Volff JN, Guigo R, Zody MC, Mesirov J, Lindblad-Toh K, Birren B, Nusbaum C, Kahn D, Robinson-Rechavi M, Laudet V, Schachter V, Quetier F, Saurin W, Scarpelli C, Wincker P, Lander ES, Weissenbach J, Roest CH: Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto-karyotype. Nature. 2004, 431: 946-957. 10.1038/nature03025.
- Drouin G, Prat F, Ell M, Clarke GD: Detecting and characterizing gene conversions between multigene family members. Mol Biol Evol. 1999, 16: 1369-1390.
- Taguchi Y, Koide T, Shiroishi T, Yagi T: Molecular evolution of cadherin-related neuronal receptor/protocadherin(alpha) (CNR/Pcdh(alpha)) gene cluster in Mus musculus subspecies. Mol Biol Evol. 2005, 22: 1433-1443. 10.1093/molbev/msi130.
- Smith NG, Eyre-Walker A: Synonymous codon bias is not caused by mutation bias in G+C-rich genes in humans. Mol Biol Evol. 2001, 18: 982-986.
- Galtier N, Piganeau G, Mouchiroud D, Duret L: GC-content evolution in mammalian genomes: the biased gene conversion hypothesis. Genetics. 2001, 159: 907-911.
- Yap AS, Brieher WM, Pruschy M, Gumbiner BM: Lateral clustering of the adhesive ectodomain: a fundamental determinant of cadherin function. Curr Biol. 1997, 7: 308-315. 10.1016/S0960-9822(06)00154-0.
- The National Center for Biotechnology Information. 2007, [http://www.ncbi.nlm.nih.gov]
- Burge C, Karlin S: Prediction of complete gene structures in human genomic DNA. J Mol Biol. 1997, 268: 78-94. 10.1006/jmbi.1997.0951. [http://genes.mit.edu/GENSCAN.html]
- UCSC genome browser. 2007, [http://genome.ucsc.edu/]
- Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG: The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 1997, 25: 4876-4882. 10.1093/nar/25.24.4876. [http://www-igbmc.u-strasbg.fr/BioInfo]
- Perriere G, Gouy M: WWW-query: an on-line retrieval system for biological sequence banks. Biochimie. 1996, 78: 364-369. 10.1016/0300-9084(96)84768-7. [http://pbil.univ-lyon1.fr/software/njplot.html]
- Yang Z: PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 1997, 13: 555-556. [http://abacus.gene.ucl.ac.uk/software/paml.html]
- Wernersson R, Pedersen AG: RevTrans: Multiple alignment of coding DNA from aligned amino acid sequences. Nucleic Acids Res. 2003, 31: 3537-3539. 10.1093/nar/gkg609. [http://www.cbs.dtu.dk/services/RevTrans]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.