Comparative genomics of chondrichthyan Hoxa clusters
© Mulley et al. 2009
Received: 28 April 2009
Accepted: 2 September 2009
Published: 2 September 2009
The chondrichthyans, or cartilaginous fish (chimeras, sharks, skates and rays), occupy an important phylogenetic position as the sister group to all other jawed vertebrates and as an early lineage to diverge following the two whole genome duplication events in vertebrate evolution. There have been few comparative genomic analyses incorporating data from chondrichthyan fish and none comparing genomic information from within the group. We have sequenced the complete Hoxa cluster of the Little Skate (Leucoraja erinacea) and compared it to the published Hoxa cluster of the Horn Shark (Heterodontus francisci) and to available data from the Elephant Shark (Callorhinchus milii) genome project.
A BAC clone containing the full Little Skate Hoxa cluster was sequenced and assembled. Analyses of coding sequences and conserved non-coding elements reveal a strikingly high level of conservation across the cartilaginous fish, with twenty ultraconserved elements (100% identity, 100 bp) found between Skate and Horn Shark, compared to three between human and marsupials. We have also identified novel potential non-coding RNAs in the Skate BAC clone, some of which are conserved in other species.
We find that the Little Skate Hoxa cluster is remarkably similar to the previously published Horn Shark Hoxa cluster with respect to sequence identity, gene size and intergenic distance, despite over 180 million years of separation between the two lineages. We suggest that the genomes of cartilaginous fish are more highly conserved than those of tetrapods or teleost fish and so are more likely to have retained ancestral non-coding elements. While useful for isolating homologous DNA, this complicates bioinformatic approaches to identify chondrichthyan-specific non-coding DNA elements.
The Hox genes play important roles in determination of anterior-posterior patterning during embryonic development. Their clustered organisation in the genome is a consequence of their origin from tandem gene duplications and is intimately associated with their expression during development. To date, all invertebrate lineages have been found to have only a single set of Hox genes, generally arranged in a single Hox cluster. In contrast, vertebrates have multiple more compact clusters, with four being present in tetrapods (Hoxa, Hoxb, Hoxc and Hoxd) as a result of two rounds of whole genome duplication at the base of the vertebrates. Teleost fish have up to eight clusters and, within teleosts, the salmonids have at least thirteen as the result of additional genome duplications [9–11]. Furthermore, teleost Hox clusters are known to evolve at a faster rate than those of tetrapods, making it difficult to identify ancient conserved non-coding sequences. Two Hox gene clusters (Hoxa and Hoxd) have been fully sequenced from a chondrichthyan (the Horn Shark Heterodontus francisci [3, 4]). Because of this, and the availability of the complete sequence of the Hoxa cluster of the Senegal bichir (Polypterus senegalus) - a member of the earliest diverging lineage of ray-finned fish and therefore unaffected by the whole genome duplications which increased Hox cluster number in teleosts - we chose to analyse the Hoxa cluster of Little Skate (Leucoraja erinacea), a member of the oldest extant lineage of Elasmobranchs. Together with the data from the more recently diverging Horn Shark and outgroup Holocephalan sequences, we can for the first time carry out a comparative genomic analysis within the cartilaginous fish.
Table: Gene structure of the three Hoxa genes for which there are complete data in the Elephant Shark genome (gene length in bp).
Vertebrate Hox clusters are usually free of complex repeats. However, the Little Skate Hoxa cluster contains a 109 bp region between Hoxa11 and Hoxa10 which shows high (84%) identity to the Deu-domain and 3'-tail region of SacSINE1, identified by Nishihara et al. in the dogfish shark (Squalus acanthias). SacSINE1 is a 463 bp tRNA-derived SINE related to the SINE3 family of zebrafish and thought to be derived from the L2 clade of LINEs. The Little Skate SINE fragment lacks the 5' promoter for transcription by RNA polymerase III, indicating that it is a non-functional retroposon, although it may have acquired some other function which accounts for the degree of sequence conservation to SacSINE1. Alternatively, the Little Skate SINE may represent a recent retroposition into the Hox cluster which has been inactivated by the loss of the 5' promoter region. The orthologous region of the Horn Shark Hoxa cluster shows no sequence similarity to the Little Skate SINE fragment, and it will be interesting to trace the evolution of this SINE within other members of the Rajidae. RepeatMasker identifies 6523 bp (4.87%) of the Little Skate BAC clone as being repetitive, of which only 2432 bp (2.33% of that region) is located in the gene-containing region between Hoxa1 and Hoxa13. The same region of the Horn Shark Hoxa cluster contains only 1205 bp (1.14%) of repetitive DNA, despite the larger genome size of Horn Shark. Aside from the SINE fragment in the Little Skate cluster, the rest of the repeats are simple-sequence elements such as mononucleotide tracts (see below) or microsatellites with a repeating unit of 2-6 bp.
It seems likely therefore that the constraining force on repetitive DNA in vertebrate Hox clusters is not the preservation of precise intergenic distances (since microsatellites are known to expand and contract by strand slippage during DNA replication) but rather the exclusion of alternative transcriptional start sites in LINE or SINE promoters which may disrupt the tightly controlled expression of Hox genes during development.
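The simple-sequence repeat classes described above (mononucleotide tracts and microsatellites with a 2-6 bp repeating unit) can be illustrated with a short scan for tandemly repeated units. This is only a minimal sketch and not the RepeatMasker procedure used in the study: the function name, the minimum copy number of four and the example sequence are all assumptions made for illustration.

```python
import re


def find_simple_repeats(seq, min_unit=1, max_unit=6, min_copies=4):
    """Return (start, end, unit, copies) for perfect tandem repeats.

    Units of length 1 are mononucleotide tracts; units of 2-6 bp are
    microsatellites. Thresholds are illustrative assumptions, not
    values taken from the study.
    """
    seq = seq.upper()
    # The lazy quantifier tries the shortest unit first, so "ACACACAC"
    # is reported with unit "AC" rather than "ACAC".
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), match.end(), unit, copies))
    return hits


def repeat_fraction(seq, hits):
    """Fraction of the sequence covered by the (non-overlapping) hits."""
    covered = sum(end - start for start, end, _unit, _copies in hits)
    return covered / len(seq) if seq else 0.0


if __name__ == "__main__":
    # Hypothetical toy sequence: a dinucleotide repeat and a poly-A tract.
    example = "TTG" + "AC" * 5 + "GG" + "A" * 8 + "TC"
    found = find_simple_repeats(example)
    for start, end, unit, copies in found:
        print(f"{start}-{end}: ({unit}) x {copies}")
    print(f"repetitive fraction: {repeat_fraction(example, found):.2%}")
```

Because only perfect repeats are matched, a scan like this would miss degenerate or interrupted microsatellites that dedicated repeat-finding tools can detect.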
Both the Skate and Horn Shark Hoxa clusters contain a number of guanine-rich sequences (Figure 5b). Regions such as these have been shown to be involved in transcriptional regulation of genes through the formation of stable four-stranded structures (the G-quadruplex or G4 DNA), and it is possible that these sequences are involved in transcriptional control of the Hoxa genes. The Quadparser program identifies 13 putative G4-forming regions in the Little Skate BAC sequence, nine of which are located within the coding region between Hoxa13 and Hoxa1. The orthologous region in Horn Shark contains 14 such sequences and, although some relative positions appear to be conserved (such as between Hoxa9 and Hoxa10 and a doublet between Hoxa4 and Hoxa5), these do not appear to be homologous based on conservation of both the G-quadruplex and surrounding sequences. Because of this, it appears unlikely that G4 DNA plays an ancestral role in Hoxa gene regulation, although it is still possible that there are some cell- or species-specific requirements for the G4 structures in non-embryonic Hox expression.
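To make the idea of a putative G4-forming region concrete, the sketch below searches both strands of a sequence for the folding pattern commonly used in such scans: four runs of at least three guanines separated by loops of 1-7 nucleotides. This is an illustrative approximation only. The Quadparser settings actually used are not stated in the text, and the function names and test sequence here are assumptions.

```python
import re

# Commonly used folding rule for putative G-quadruplexes:
# G{3,} N{1,7} G{3,} N{1,7} G{3,} N{1,7} G{3,}
G4_PATTERN = re.compile(
    r"G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}[ACGTN]{1,7}G{3,}"
)

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_putative_g4(seq):
    """Return sorted (start, end, strand) tuples in forward-strand coordinates.

    Matches are non-overlapping within each strand, so nested or
    overlapping candidates are reported once.
    """
    seq = seq.upper()
    length = len(seq)
    hits = [(m.start(), m.end(), "+") for m in G4_PATTERN.finditer(seq)]
    # A C-rich stretch on the forward strand is G-rich on the reverse strand.
    for m in G4_PATTERN.finditer(reverse_complement(seq)):
        hits.append((length - m.end(), length - m.start(), "-"))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical test sequence containing a telomere-like G-rich motif.
    example = "AT" + "GGGTTAGGGTTAGGGTTAGGG" + "CA"
    for start, end, strand in find_putative_g4(example):
        print(start, end, strand, example[start:end])
```

Comparing hits between species, as described above, would additionally require aligning the flanking sequence around each hit, which this sketch does not attempt.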
The Little Skate Hoxa cluster is strikingly similar to that of the Horn Shark in terms of both sequence conservation and intergenic distances, despite the two lineages having been separated by at least 180 million years. The available data from the ongoing Elephant Shark (Callorhinchus milii) genome project suggest that the Hoxa cluster of this species is also highly conserved with respect to the Skate and Horn Shark, despite having been separated from them for around 375 million years. Comparisons of the human Hoxa cluster to those of marsupials and amphibians reveal a much lower level of sequence conservation over similar periods of time, indicating either a faster rate of molecular evolution in the tetrapod and mammalian lineages or a slower rate of evolution in the cartilaginous fish. This finding echoes the suggestion of Wang et al. that UCEs and protein-coding genes are evolving more slowly in Elephant Shark compared to other vertebrates. If the level of conservation seen in the chondrichthyan Hoxa clusters is reflected across the entire genome, then this will complicate attempts to identify functional conserved non-coding elements within the cartilaginous fish using phylogenetic footprinting. This is an important consideration given the two cartilaginous fish genomes currently in the pipeline and the likely increase in this number due to recent advances in genome sequencing technologies.
The Little Skate BAC clone 0081H20 was isolated in a low stringency pooled homeobox screen of filters from a 4.1× coverage BAC library (BAC Library RE__Ba (Little Skate)) obtained from the Clemson University Genomics Institute (CUGI). The BAC was shotgun sequenced to 9.3× coverage by contract to the Genome Center at Washington University, resulting in 14 major contigs in 4 scaffolds. Remaining gaps were filled either by direct sequencing on BAC DNA or by cloning and sequencing PCR products from primers designed to cross gaps in the scaffolds. Genes were annotated by comparison to published Horn Shark genes (from previously described BAC sequences deposited under accession numbers AF224262 and AF479755). Chondrichthyan non-coding elements were identified using the LAGAN alignment program implemented in mVISTA. Putative ncRNAs were identified using Infernal (version 1.0rc3) and covariance models of miRNA families from Rfam (version 9.1). Candidate microRNA precursors were retained if their sequence length was greater than 60 nt and the minimum free energy of their stem-loop structure was less than -20 kcal/mol. RNA structures were drawn using the RNAfold (Vienna RNA Package 1.8.2) webserver (http://rna.tbi.univie.ac.at).
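The two selection criteria for candidate microRNA precursors stated above (length greater than 60 nt and minimum free energy below -20 kcal/mol) amount to a simple filter. The sketch below shows that filter only. The record type, field names and example candidates are hypothetical, and the free energy values are assumed to have been computed beforehand by an RNA folding program.

```python
from dataclasses import dataclass

MIN_LENGTH_NT = 60         # candidates must be longer than this
MAX_MFE_KCAL_MOL = -20.0   # and fold with a free energy below this


@dataclass
class Candidate:
    """Hypothetical record for one candidate precursor."""
    name: str
    sequence: str
    mfe: float  # minimum free energy of the stem-loop, in kcal/mol


def passes_filter(candidate):
    """Apply both thresholds; each comparison is strict."""
    return (
        len(candidate.sequence) > MIN_LENGTH_NT
        and candidate.mfe < MAX_MFE_KCAL_MOL
    )


def select_precursors(candidates):
    """Keep only candidates that satisfy the length and energy criteria."""
    return [c for c in candidates if passes_filter(c)]


if __name__ == "__main__":
    # Placeholder sequences and energies, made up for illustration.
    candidates = [
        Candidate("cand_a", "A" * 75, -31.2),  # long and stable: kept
        Candidate("cand_b", "A" * 55, -25.0),  # too short: dropped
        Candidate("cand_c", "A" * 80, -12.4),  # too weakly folded: dropped
    ]
    for kept in select_precursors(candidates):
        print(kept.name, len(kept.sequence), kept.mfe)
```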
This research was supported by a grant from the Wellcome Trust.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.