Molecular evolution of the MAGUK family in metazoan genomes

Background Development, differentiation and physiology of metazoans all depend on cell to cell communication and subsequent intracellular signal transduction. Often, these processes are orchestrated via sites of specialized cell-cell contact and involve receptors, adhesion molecules and scaffolding proteins. Several of these scaffolding proteins important for synaptic and cellular junctions belong to the large family of membrane-associated guanylate kinases (MAGUK). In order to elucidate the origin and the evolutionary history of the MAGUKs we investigated full-length cDNA, EST and genomic sequences of species in major phyla. Results Our results indicate that at least four of the seven MAGUK subfamilies were present in early metazoan lineages, such as Porifera. We employed domain sequence and structure based methods to infer a model for the evolutionary history of the MAGUKs. Notably, the phylogenetic trees for the guanylate kinase (GK)-, the PDZ- and the SH3-domains all suggested a matching evolutionary model which was further supported by molecular modeling of the 3D structures of different GK domains. We found no MAGUK in plants, fungi or other unicellular organisms, which suggests that the MAGUK core structure originated early in metazoan history. Conclusion In summary, we have characterized here the molecular and structural evolution of the large MAGUK family. Using the MAGUKs as an example, our results show that it is possible to derive a highly supported evolutionary model for important multidomain families by analyzing encoded protein domains. It further suggests that larger superfamilies encoded in the different genomes can be analyzed in a similar manner.


Background
The membrane-associated guanylate kinase (MAGUK) family in mammals consists of 22 members, which vary in size and domain organization (Fig. 1A, Table 1). Despite these differences, the MAGUKs share a well-conserved core structure, which comprises one or multiple PDZ domains, a Src homology 3 (SH3) domain and a guanylate kinase (GK) domain. The SH3 and GK domains differ from their canonical counterparts in that they interact with each other to form a super domain. This characteristic is reminiscent of the voltage-gated calcium channel beta subunits, which lack PDZ domains. The large MAGUK family, as already implied by its assortment of different gene architectures, encodes a heterogeneous group of proteins with very diverse biological functions (for review see [1]). Molecular studies have established a wide variety of cellular functions for different MAGUKs. Examples include the establishment of cell polarity, tight junction formation, cell proliferation or apoptosis, cell differentiation and neuronal synapse transmission [2][3][4][5]. Mutations or changes in expression have often been found to cause defects in cell-cell adhesion, cell polarity, cell proliferation and, subsequently, development [6][7][8][9].
The domain architecture of the MAGUKs enables interaction with receptors, the actin cytoskeleton and ion channels, but also allows different MAGUK subfamily proteins to be tethered together [10][11][12]. MAGUK proteins may contain up to six PDZ domains, two L27 and two WW domains (Fig. 1A). In addition to these domains, all MAGUKs, except for the MAGI subfamily, contain a SH3 domain. The GK domain of the MAGUKs shares homology with Guk1 of yeast but appears to have lost GMP binding capacity and catalytic activity [13]. Additionally, the SH3 and GK domains form an intermolecular interaction which renders the GK domain catalytically inactive and can function as a separate binding interface [14,15]. The understanding of the biochemical role and regulation of this super domain is limited, but it is intriguing that a similar SH3-HOOK-GK motif is present in voltage-gated calcium channel beta subunits [11,16].
Structural and molecular studies have shown that PDZ domains are pivotal features of scaffolding proteins and localize MAGUKs and their interaction partners to specialized membrane domains of neuronal and epithelial cells [1,2,17]. PDZ domains have a compact and modular structure, and allow MAGUK proteins to bind to C-terminal recognition sequences. Although originally identified in metazoans, the domain has since been found in bacterial, fungal and plant lineages as well [1,18].
To date, phylogenetic analyses have been carried out only for individual members, such as ZO1 and the MAGIs [19,20]. Through the extensive phylogenetic analysis of the entire MAGUK family presented here, we were able to divide the MAGUKs into 7 subfamilies and to infer a probable evolutionary sequence of events that gave rise to the MAGUK domain architectures. Using a domain-by-domain analysis, we determined that the PDZ-SH3-GK structure evolved only once in the course of evolution. In addition, we confirm our phylogenetic data by molecular modeling and provide evidence for the hypothesis that the MAGUK GK domain originated from a catalytically active GK domain and gradually lost its enzymatic characteristics when new subfamilies emerged.
Figure 1 Domain architectures of the MAGUK subfamilies and their distribution over the eukaryotic phyla. (A) Eight subfamilies have been identified as members of the MAGUK family in our study. Depicted here in a general schematic representation are the domains present in these members. All members contain a central core comprised of a GK domain and several N- and/or C-terminally positioned PDZ domains. All members except the MAGI proteins contain a SH3 domain that is nested between a PDZ and the GK domain. The CASK and MPP proteins contain N-terminal L27 domains, while CARMA genes encode CARD domains at this position. The names used allow a more systematic nomenclature, but do not in all cases reflect the commonly used names, e.g. DLG4 is better known as PSD95. A list of synonyms is given in Table 1. The CACNB subfamily, commonly not regarded as a canonical MAGUK subfamily, only contains the SH3 and GK domains. The phylogenetic analysis presented here, however, shows that it is a related subfamily. (B) The MAGUK subfamily distribution over the eukaryotic phyla shows that no homologs were found for Choanozoans/Protozoans, Plantae and Fungi, here represented by Monosiga ovata, Arabidopsis thaliana and Saccharomyces cerevisiae, respectively. Tetraodon nigroviridis has duplicated DLG4, ZO1 and ZO2 encoding genes, whereas Gallus gallus lacked the gene for DLG4. The arrowheads on the bottom indicate two possible gene duplication events.

Taxonomic distribution and phyla-specific architectures
Many important biological roles have been described for members of the MAGUK family (reviewed in [1]). However, only very limited information is available about their evolutionary history. To analyze the phylogeny and molecular evolution of the MAGUKs, we initially gathered protein sequences for species of all major metazoan phyla, ranging from Porifera to Chordata, by using human, fruit fly, sponge and hydrozoan sequences as seed. Some sequences were readily available in GenBank and in helpful automated databases like Pfam (with many redundant sequences present) and Superfamily [21]. Many sequences were assembled from ESTs and genomic data. Sequences for all functional domains were identified (described in Methods) and categorized (see Additional file 1 for complete list).
No MAGUK homologs or MAGUK-like structures, represented by combinations of a GK domain with a SH3 and/or PDZ domain, were found in protozoans, fungi or plants. Our data show that most canonical MAGUK family members are present throughout all animal phyla investigated (Fig. 1B). Basal metazoans, represented here by the sponges Oscarella carmela and Suberites domuncula and the cnidarians Hydra vulgaris and Hydra magnipapillata, encode several different MAGUK subfamily members (Fig. 1B). This was previously recognized when a MAGI homolog was characterized in S. domuncula [20] and a ZO member was described for H. vulgaris [19]. Here, we are able to add three new members to this list in basal metazoans, which now includes homologs of the MPP, DLG and DLG5 encoding genes (Fig. 1B and Additional file 1).
The protein architectures, with respect to the domain combinations, appear largely consistent and conserved throughout all metazoan phyla investigated. However, some lineage-specific differences can be found. For example, the C. elegans dlg5 gene lacks the sequences for the GK, the SH3 and the fourth PDZ domain. The M. musculus and H. sapiens Dlg5 genes show variations as well, since they encode additional DUF622 or CARD domains (Q3UGX5, NP_004738). These domains are not present in any of the other species investigated.
We observed several species-specific gene duplications within the MAGUK family. For example, several additional ZO homologs are present in the T. nigroviridis genome compared to other vertebrates. A Tetraodon ZO1 gene can be found on both chromosomes 5 and 13, and duplicated sequences for ZO2 are positioned on the same two chromosomes (see Additional file 1). Gene duplications in teleosts have previously been described for the hox clusters [22,23].
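The search criterion described above can be restated as a simple rule on annotated domain content. The following is a minimal sketch and not the procedure used in this study; the function name `classify_architecture`, the domain labels and the returned category names are illustrative assumptions that only restate the domain compositions described for Fig. 1A.

```python
def classify_architecture(domains):
    """Return a coarse architecture label for a list of domain names.

    `domains` is a list such as ["PDZ", "SH3", "GK"]. The rules are
    illustrative: a MAGUK-like structure is taken to be a GK domain
    combined with a SH3 and/or PDZ domain, as stated in the text.
    """
    present = set(domains)
    if "GK" not in present:
        return "not MAGUK-like"
    has_pdz = "PDZ" in present
    has_sh3 = "SH3" in present
    if not has_pdz and not has_sh3:
        return "GK only (Guk-like)"
    if has_sh3 and not has_pdz:
        return "SH3-GK (CACNB-like)"
    if has_pdz and not has_sh3:
        return "PDZ-GK without SH3 (MAGI-like)"
    if "L27" in present:
        return "L27-PDZ-SH3-GK (MPP/CASK-like)"
    return "PDZ-SH3-GK core"


# Hypothetical domain lists, written N- to C-terminally.
print(classify_architecture(["GK"]))
print(classify_architecture(["L27", "L27", "PDZ", "SH3", "GK"]))
print(classify_architecture(["PDZ", "GK", "WW", "WW", "PDZ", "PDZ"]))
```

A rule of this kind only screens for candidate architectures; assignment to a subfamily in this study rests on the phylogenetic analyses described below.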
Figure 2 Guanylate kinase dendrogram and summarized phylogenetic analyses. (A) The dendrogram is based on all protein families currently known to contain GK sequences (note: this is not a phylogeny). These include the three families of the voltage-gated calcium channel beta subunit (CAB), the homologs of Guanylate kinase (GUK) and the MAGUK family. The core structure of the GUK is GK only, while the CAB and MAGUK families have a SH3-GK or PDZ-SH3-GK architecture, respectively. The MAGUK and CAB families were only found in metazoan species, while sequences of GUK family members appeared dispersed over all eukaryotic lineages. In the GUK clade the metazoans are indicated with the letter M, the Fungi with F, Bacteria with B, Viruses with V and the Plants with P. Species that were included in this dendrogram as well as sequences used are listed in Additional file 2. (B-D) Summarized phylogenetic trees based on Bayesian consensus trees (see Additional files 3, 4 and 5) for the GK, SH3 and PDZ domains, respectively. Numbers indicate % Bayesian posterior probability and % bootstrap Maximum Likelihood.

Phylogenetic analysis of guanylate kinase domains
To elucidate the evolution of the MAGUK family genes and their gene architectures, we performed an analysis of the centrally positioned, approximately 250 amino acid long GK domain. It has been suggested that the MAGUK GK domain evolved from other, enzymatically active, GK domains. The latter include domains like those present in the homologs of the S. cerevisiae guk1 gene, which shares approximately 40% homology at the protein level [1]. Enzymatic GK domains can be found in most eukaryotic phyla, including plants, fungi, Protozoa and Metazoa, and are encoded by guanylate kinase homologs (syn. Guk or GMP). The domains are also present in bacteria and certain viruses [e.g. [24]]. Our assembly (Fig. 2A) illustrates that the MAGUK protein subfamilies form a specific clustering pattern. Of particular interest is the clustering of the CASK subfamily with the MPP subfamily and very close to MPP1. Indeed, the MPP and CASK subfamilies are the only proteins in the MAGUK family that share an architecture containing two L27 domains (see Fig. 1A) [25]. The Guk and CACNB sequences clustered in separate clades.
On the basis of the aligned GK sequences that we used to construct the dendrogram, we generated a phylogenetic tree (Fig. 2B; Additional file 3). Guk family sequences (not belonging to the MAGUKs) were used to root the tree. The GK tree suggests that the MPP subfamily split off first from an active GK precursor, closely followed by the MAGI and the CACNB subfamilies.
The phylogenetic analysis of the SH3 domains (Fig. 2C; Additional file 4), which may be 60-70 amino acids in length, showed the same basic evolutionary relationships as the GK domain-based phylogeny, which suggests that the two domains co-evolved along the same path. It must be noted that the MAGI subfamily is not present in the SH3 phylogeny since its members do not contain a SH3 domain.

Comparison of GK 3D structures
To substantiate the phylogeny described above, we created models of the GK domain and of the SH3-HOOK-GK super domain, which is present in each MAGUK subfamily member except the MAGIs. Canonical GK domains catalyze the reversible phosphoryl transfer from ATP to GMP [e.g. [25]]. Like other NMP kinases, GK proteins contain three essential, dynamic regions: the CORE domain, the LID domain and the NMP-binding pocket (Fig. 3) [26]. Unique for the GK is that its GMP-binding domain is comprised of four β-sheets and a short helix, whereas others are purely α-helical [27].
Consistent with our phylogenetic analysis, it is apparent that the MPPs show the most extensive structural similarity with the active GK. The other subfamilies show less structural resemblance, but it is clear that the DLG and ZO subfamilies are more closely related to one another than to the other subfamilies. This was also apparent from our phylogeny.
The lowest two panels of Figure 3 show the SH3-HOOK-GK structures of representatives of the MPP and CACNB subfamilies. Members of the DLG and ZO subfamilies adopted a structure largely similar to the MPP model, which places the CACNB subfamily by itself. This may be explained by the different biological role of CACNB, which requires a different structural organization compared to the scaffolding role of the canonical MAGUKs. The CARMA subfamily GK or SH3-HOOK-GK sequences could not be superimposed on any annotated structure in the database used. The PDB files and sequences used for the superimpositions, as well as their expect values, are listed in Additional file 2.

Insertion of the WW domains in the MAGI subfamily
The MAGI proteins are different from the other MAGUK members as they lack a SH3 domain between the core GK and PDZ domains. They contain, however, two WW domains (Fig. 1A), which have been suggested to function in a similar fashion as the MAGUK SH3 domain, i.e. facilitating oligomerization [1,28]. From sequence alignments and annotation of the protein-protein interaction domains on these alignments we observed overlapping regions of the GK domain range and WW domains in the sequences of invertebrate species (in SMART, not shown). These overlaps were not present in the mammalian sequences, most likely because a larger, not well conserved, amino acid stretch was present between the two domains. Our alignment of the MAGI sequences revealed that there is a partial conservation of the proper GK domain (approx. 80 residues, compared to the normal 250) and indeed this area could also be reliably modeled (Fig. 3). These observations suggest that early in metazoan evolution, the two WW domains inserted into the C-terminal part of the MAGI GK domain, which thereafter resulted in a loss of this part of the domain.
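The overlap between annotated domain ranges mentioned above can be checked mechanically once start and end positions are available. The following is a minimal sketch under stated assumptions: the function names and the example coordinates are hypothetical and are not taken from any MAGI sequence or from the SMART annotation used in this study.

```python
def overlap_length(range_a, range_b):
    """Number of residues shared by two inclusive (start, end) ranges."""
    start = max(range_a[0], range_b[0])
    end = min(range_a[1], range_b[1])
    return max(0, end - start + 1)


def overlapping_pairs(annotations):
    """List (name_a, name_b, shared_residues) for every overlapping pair.

    `annotations` is a list of (domain_name, start, end) tuples with
    1-based, inclusive residue positions.
    """
    pairs = []
    for i, (name_a, start_a, end_a) in enumerate(annotations):
        for name_b, start_b, end_b in annotations[i + 1:]:
            shared = overlap_length((start_a, end_a), (start_b, end_b))
            if shared > 0:
                pairs.append((name_a, name_b, shared))
    return pairs


# Hypothetical coordinates for illustration only.
example = [
    ("PDZ1", 20, 100),
    ("GK", 110, 300),
    ("WW1", 290, 320),
    ("WW2", 340, 370),
]
print(overlapping_pairs(example))
```

In this made-up example only the predicted GK range and the first WW domain share residues, which is the kind of pattern described above for the invertebrate sequences.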

Diversification of the MAGUK architecture
We assumed that a thorough analysis of the PDZ domain (approx. 70-100 amino acids) would provide more insight into the architectural evolution of the MAGUK family. All C-terminally positioned PDZs of the DLG1-3 subfamily, the ZO1-3 subfamily and DLG5 are intimately related (Fig. 2D; Additional file 5). Within the DLG1-3 subfamily the three PDZ domains are also clustered together very closely, which suggests rapid domain duplications. In the DLG5 subfamily, the fourth PDZ domain seems to have arisen from a duplication of the third DLG5 PDZ domain.
The first MAGI PDZ is closely related to the PDZs of the ZO and DLG subfamilies, which was expected from the GK phylogenetic tree. Interestingly, however, all other MAGI PDZs (PDZs 2-6) are more closely related to one another and to the MPP/CASK PDZs than to the most N-terminally positioned PDZ. These results suggest that the MAGI core structure is not due to an inversion of the GK domain as commonly assumed and reflected in the name (MAGI: membrane-associated guanylate kinase with an inverted arrangement of protein-protein interaction domains [28]).

Discussion
The goal of the present study was to gain insight into the general and structural evolution of the MAGUK family. We further wanted to prove that it is possible to derive a supported evolutionary model by analyzing the phylogeny of individual domains found in multidomain protein families, like the MAGUKs. In order to map the general evolutionary history of this gene family we assembled a large dataset of sequences for all major metazoan phyla, and phylogenetically and structurally analyzed the core domains present.
Choanoflagellates may be closely related to metazoans. Based on phylogenetic analyses and the observation that these protists have a collar of feeding tentacles reminiscent of sponge feeding cells, both have been suggested to form a monophyletic group called the Opisthokonta [e.g. [24][30][31][32]]. Other protozoan species like Giardia intestinalis (syn. G. lamblia) were also suggested to be close to basal metazoans [33][34][35]. We attempted to identify MAGUK proteins and closely related structural homologs, which are vital for metazoan processes such as cell to cell communication, in these species as well. However, none could be found in protozoans. In addition, MAGUK sequences were not found in the genomes of bacteria, fungi and plant species. This suggests that the MAGUK structure, with its characteristic, centrally positioned, non-functioning GK domain, is essentially of metazoan origin, and we speculate that the MAGUKs initially played important roles in cell to cell communication. The absence in Plantae can be explained by the late evolution of the MAGUKs, but it is tempting to speculate that cell to cell communication in plants is fundamentally different due to additional cell walls and thus might require different (scaffolding) modules than the MAGUKs.
Our search for MAGUK homologs has, however, led to the identification of three new members in the most basal metazoans, which now include homologs of the MPP, DLG and DLG5 encoding genes (Fig. 1B and Additional file 1). These findings imply that all canonical MAGUK family members arose very early during metazoan evolution. The CARMA genes are a likely exception, as no homologs were identified in species more basal than the Deuterostomia. It is important to note that while this is a reasonable assumption at this time, not all genome projects have yet been completed. These genomes should therefore be revisited after completion.
We initiated our phylogenetic analysis with the GK and SH3 domains, and compared their evolutionary histories as they are both present in most MAGUK subfamilies. Indeed, the GK and SH3 domains show a similar phylogeny (Fig. 2B and 2C; Additional files 3 and 4). Furthermore, these findings are supported by molecular modeling of the three-dimensional structures of the GK domain and are not contradicted by the phylogenetic analysis of the PDZ domains (Fig. 2D; Additional file 5). This is an important finding, showing that the analysis of independent domains of different sizes yields a similar evolutionary scenario.
With regard to the MAGUK family, we found that the Ca2+ channel beta-subunit family seems to have evolved together with the MAGUK family. The CACNBs are commonly not considered to belong to the MAGUKs, but our phylogenetic results show that they are related, share a common ancestor and may thus represent a MAGUK subfamily.
Our analysis of the PDZ domains showed a bifurcated evolution (Additional file 5), with the MAGI PDZs linked to both groups (first PDZ to DLG, ZO and CARMA; the second to sixth PDZ to the MPPs). Additionally, we describe here that the WW domains likely inserted into the MAGI GK domain and then moved toward the C-terminus, leaving the GK domain deprived of its essential CORE and LID domain. At present, we do not have a good explanation for these events, but our findings are evidence against a complete inversion of the MAGI structure as was suggested earlier [28].
Based on our domain-by-domain analysis we propose a model to describe the structural evolution of the MAGUK family, including the CACNB family (Fig. 4). Evolving from an enzymatically active GK encoding gene, the family arose by obtaining both a PDZ and a SH3 domain. Then the MPP subfamily split off, taking up L27 domains. The CACNB and MAGI subfamilies arose through loss of the PDZ and SH3 domains, respectively. The MAGUK core structure, consisting of a PDZ, SH3 and GK domain, evolved further and gave rise, after duplication of the N-terminal PDZ domain, to the DLG, ZO and, lastly, the CARMA subfamily. Duplications of PDZ domains happened twice during evolution of the MAGUKs and are illustrated in our model (Fig. 4, arrows on top of protein structures).
In summary, we have derived here a highly supported evolutionary model for the MAGUK family by analyzing the phylogeny of individual domains. The results of our analysis provide strong evidence that other complex multidomain families and also larger superfamilies can be investigated in a similar way. Additionally, we provide evidence that places the Calcium-channel beta-subunit proteins within the MAGUK family from an evolutionary perspective.

Conclusion
To elucidate the origin and the evolutionary history of the MAGUK family, we investigated full-length cDNA, EST and genomic sequences of species in major phyla. These data indicated that MAGUKs are present only in metazoan species and not encoded in protozoans, bacteria or plants. Phylogenetic analysis of our sequence data showed a matching evolutionary history for the central protein interaction domains of the MAGUKs. Supported further by structural evidence, we postulate that the MAGUKs evolved first as a GK-SH3 structure from an active GK enzyme, which is present in protozoa and plants. Then the PDZ domain was added to this structure, thereby completing the MAGUK core structure. New domains were subsequently added or duplications of the PDZ were made in order to give rise to the MAGUK assortment now present in vertebrates. Additionally, we provide evidence that places the Calcium-channel beta-subunit proteins within the MAGUK family, based on the evolutionary perspective of our research. Our results show that it is possible to derive a supported evolutionary model for important multidomain families by analyzing encoded protein domains. We suggest here that larger superfamilies can be analyzed in a similar manner.

Alignment and phylogenetic analysis
Alignments were performed using ClustalX [39] with default parameter values, and manually refined in GeneDoc. The alignments were used for phylogenetic analysis employing both Bayesian analysis and Maximum Likelihood (ML). Bayesian trees were generated with MrBayes [40]. Rate variation across sites was modeled with a four-category gamma distribution and invariant sites, while the MCMC search itself was continued for 1,000,000 generations, sampled every 100 generations, and 2500 trees were discarded as burn-in. The amino acid substitution model was set to mixed in order to reduce assumptions prior to analysis. For ML, alignments were bootstrapped 1000 times with the program Seqboot from the Phylip package [41]. Subsequently, phylogenetic trees were generated with the ML algorithm implemented in PhyML [42], with the amino acid substitution model set to Jones-Taylor-Thornton. Other PhyML parameters were a gamma distribution with four classes for across-site rate variation and optimization of the alpha parameter of the gamma distribution. Consensus trees were calculated with Consense [41]. Finally, phylogenetic trees were visualized with MEGA 3.1 [43]. In the tree figures shown, the topology support values are labeled on the Bayesian consensus tree in the order % Bayesian posterior probability/% bootstrap Maximum Likelihood to reduce and standardize the characters and figures used.
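The sampling scheme described above implies a fixed number of retained trees, and the node labels follow a fixed format. The following is a minimal sketch of that arithmetic; the function names are hypothetical, the starting tree is assumed not to be counted, and the example support values are invented for illustration.

```python
def mcmc_sample_summary(generations, sample_every, burnin_trees):
    """Summarize how many sampled trees remain after burn-in.

    Assumes one tree is sampled every `sample_every` generations and
    that the starting tree is not counted.
    """
    sampled = generations // sample_every
    if burnin_trees > sampled:
        raise ValueError("burn-in exceeds the number of sampled trees")
    return {
        "sampled": sampled,
        "retained": sampled - burnin_trees,
        "burnin_fraction": burnin_trees / sampled,
    }


def support_label(posterior_probability, bootstrap_count, replicates):
    """Format node support as '% posterior probability/% bootstrap'."""
    posterior_percent = round(100 * posterior_probability)
    bootstrap_percent = round(100 * bootstrap_count / replicates)
    return f"{posterior_percent}/{bootstrap_percent}"


# Settings stated in the text: 1,000,000 generations, sampled every
# 100 generations, 2500 trees discarded as burn-in.
summary = mcmc_sample_summary(1_000_000, 100, 2500)
print(summary)

# Invented support values for a single node, 1000 bootstrap replicates.
print(support_label(0.98, 870, 1000))
```

Under these assumptions, 10,000 trees are sampled, the first quarter is discarded and 7500 trees contribute to the Bayesian consensus.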

Molecular modeling and tree construction on structural information
To create 3D models of the GK domain and the SH3-HOOK-GK super domain of the different MAGUK subfamilies, we used sequences of human origin. The BLAST E-value limit was set at 1.0e-6 for the template identification searches, and the best template was selected through the SWISS-MODEL workspace [44,45]. Swiss-PdbViewer version 3.7 was used for model building, refinement and visualization of the superimpositions [44]. The templates used and their expect values are given in Additional file 2.
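The E-value threshold described above acts as a filter on candidate templates. The following is a minimal sketch of such a filter, not of the SWISS-MODEL workspace itself; the function name, the template identifiers and the E-values are hypothetical, and choosing the hit with the lowest E-value is an assumption about what "best" means here.

```python
E_VALUE_LIMIT = 1.0e-6  # threshold stated in the text


def best_template(hits, limit=E_VALUE_LIMIT):
    """Return the (template_id, e_value) hit with the lowest E-value.

    Only hits with an E-value at or below `limit` are considered.
    Returns None when no hit passes the threshold.
    """
    passing = [hit for hit in hits if hit[1] <= limit]
    if not passing:
        return None
    return min(passing, key=lambda hit: hit[1])


# Hypothetical search results for illustration only.
hits = [("templateA", 3e-40), ("templateB", 2e-5), ("templateC", 1e-12)]
print(best_template(hits))
print(best_template([("templateB", 2e-5)]))
```

A result of `None` corresponds to a query for which no template passes the threshold and therefore no model can be built.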