- Open Access
ReXSpecies – a tool for the analysis of the evolution of gene regulation across species
© Struckmann et al; licensee BioMed Central Ltd. 2008
- Received: 31 August 2007
- Accepted: 14 April 2008
- Published: 14 April 2008
Annotated phylogenetic trees that display the evolution of transcription factor binding in regulatory regions are useful for e.g. 1) narrowing down true positive predicted binding sites, providing predictions for binding sites that can be tested experimentally, and 2) giving insight into the evolution of gene regulation and regulatory networks.
We describe ReXSpecies, a web-server that processes the sequence information of a regulatory region for multiple species and associated (predicted) transcription factor binding sites into two figures: a) An annotated alignment of sequence and binding sites, consolidated and filtered for ease of use, and b) an annotated tree labeled by the gain and loss of binding sites, where the tree can be calculated from the data or taken from a trusted taxonomy, and the labels are calculated based on standard or Dollo parsimony. For genes involved in mammalian pluripotency, ReXSpecies trees highlight useful patterns of transcription factor binding site gain and loss, e.g. for the Oct and Sox group of factors in the 3' untranslated region of the cystic fibrosis transmembrane conductance regulator gene, which closely match experimental data.
ReXSpecies post-processes the information provided by transcription factor binding site prediction tools, in order to compare data from many species. The tool eases visualization and successive interpretation of transcription factor binding data in an evolutionary context. The ReXSpecies URL can be found in the Availability and requirements section.
- Cystic Fibrosis Transmembrane Conductance Regulator
- Transcription Factor Binding Site
- Alignment Position
- Phylogenetic Footprinting
- Ornithorhynchus Anatinus
Elucidating how genes are regulated is an important step in understanding the processes of life. One approach to infer gene regulation and regulatory networks is to predict transcription factor binding sites (TFBSs) in genomic sequence data. These TFBSs may be located upstream or downstream of known genes, or be part of their UTRs (untranslated regions). Tools are already available for searching genomic regions from multiple species for TFBSs, such as Mapper [1, 2] or Genomatix MatInspector. These tools use TFBS models represented by Hidden Markov Models (HMM, used by Mapper), Position Specific Weight Matrices (PWM, used by Genomatix), or IUPAC consensus sequences (Genomatix) to predict TFBSs in a DNA sequence. In the case of Mapper, the models are taken from Jaspar and Transfac; Genomatix uses a database of TFBSs developed in-house. The DNA motif that these tools are designed to match is usually short (about 8–20 base pairs), so it is not surprising that there are many false positive matches. We show that visualizing and studying the evolutionary history of regulatory regions can be insightful, and that it helps to separate the wheat from the chaff. We argue that, beyond evolutionary conservation of binding sites, plausible patterns of common gain and loss of TFBSs in evolution ease this separation.
An evolutionary approach to TFBS prediction is phylogenetic footprinting, based on the idea that the sequences coding for a regulatory element should be preserved across different species. Phylogenetic footprinting methods try to discover TFBSs in a set of orthologous regulatory regions from multiple species by identifying the best conserved motifs in those regions. We propose here to go a step beyond well-established phylogenetic footprinting servers such as FootPrinter by providing a tool for analyzing and visualizing the evolution of the binding sites. Up to now, the data for this task, whether large or small in amount, had to be digested and visualized manually, by writing down all predictions for each sequence, positioning these in the alignment, and annotating a trusted species tree with them. The annotated alignment then highlights conserved TFBSs and the annotated tree describes the evolution (gain and loss) of binding sites.
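To make the idea of labeling a tree with gain and loss concrete, the following Python sketch applies standard (Fitch) parsimony to a single presence/absence character, i.e. one binding site at one alignment position. It is an illustration only and not the ReXSpecies implementation (which is written in Perl). It assumes a rooted, strictly binary tree given as nested tuples, and it resolves an ambiguous root to "absent" by default; the function name, tree, and species names are hypothetical. Dollo parsimony, which allows a site to be gained only once, would need a different procedure.

```python
def label_gain_loss(tree, present, root_state=0):
    """Label edges with 'gain' or 'loss' of one binding site (Fitch parsimony).

    tree:    nested 2-tuples with species names (str) as leaves (assumption)
    present: set of species in which the site is predicted
    Returns a list of (leaves_below_edge, event) tuples.
    """
    events = []

    def up(node):
        # Bottom-up pass: compute the Fitch state set of every node.
        if isinstance(node, str):
            return ({1} if node in present else {0}, (node,), [])
        left, right = (up(child) for child in node)  # binary trees only
        common = left[0] & right[0]
        states = common if common else left[0] | right[0]
        return (states, left[1] + right[1], [left, right])

    def down(info, parent_state):
        # Top-down pass: keep the parent's state whenever it is allowed.
        states, leaves, children = info
        state = parent_state if parent_state in states else next(iter(states))
        if state != parent_state:
            events.append((leaves, "gain" if state == 1 else "loss"))
        for child in children:
            down(child, state)

    root = up(tree)
    start = root_state if root_state in root[0] else next(iter(root[0]))
    down(root, start)
    return events


# Hypothetical four-species tree; the site is predicted in three of them.
tree = ((("human", "chimp"), "mouse"), "opossum")
events = label_gain_loss(tree, {"human", "chimp", "mouse"})
print(events)
```

With this input the sketch reports a single gain on the edge leading to the clade of human, chimp, and mouse, which is the kind of label an annotated tree would carry.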
With the exception of Mulan, visualization approaches published up to now neither calculate nor consider phylogenetic trees. Moreover, there is no tool that can annotate phylogenetic trees with TFBS information, nor is there a multiple alignment visualization that also presents a multiple alignment of the TFBSs.
In particular, CONREAL gives an alignment overview for two sequences only. Similarly, rVista only handles pairwise sequence comparison. In contrast, multiTF displays TFBSs and conservation for multiple species, but it does not consider TFBS predictions separately for each species; they are all listed in one track. Mulan produces an annotated alignment, but it uses only pairwise alignments of each sequence with a reference sequence; multiTF identifies conserved TFBSs in the Mulan output and displays the result pairwise using rVista. Mulan also displays an unannotated distance tree of the sequences to inform the user about their phylogenetic relationship. PRODORIC is suitable for bacterial genomes only. The ECRBrowser is a genome browser showing only sequence conservation and precomputed TFBS predictions, just like UCSC [14, 15] and EnsEMBL [16, 17].
- Import TFBS predictions from different sources (since March 2008 TFBS predictions may be obtained directly, see "Note added in proof");
- Filter TFBS predictions to extract the relevant ones;
- Visualize evolution of TFBSs using an annotated tree and an annotated alignment;
- Provide access via a web front end;
- Provide a modular design to make extensions possible;
- Offer a simple Wiki and functionality to share results.
A significant limitation in the understanding of gene regulation and regulatory networks is the difficulty of visualizing and mastering patterns in the very large amounts of data generated by technologies such as DNA sequencing, ChIP on Chip, ChIP-seq, and microarrays. ReXSpecies is intended to reduce this limitation. It can be accessed via a web front end, where a tutorial is also available.
Input and output
The input for ReXSpecies is a set of homologous sequences, predicted TFBSs for these sequences (e.g. produced by Mapper [1, 2] or Genomatix; since March 2008 TFBS predictions may be obtained directly, see "Note added in proof"), and, optionally, a phylogenetic tree. Sequences are read in FASTA format, and ReXSpecies can convert other formats to FASTA. If the sequences are not aligned, ReXSpecies can align them using Muscle [22, 23].
The simplest format for TFBS predictions that ReXSpecies can read is the Mapper output format (a tab-separated plain text file with the columns Model ID, Factor name, Strand, Start, End, Score, and E-value). Moreover, ReXSpecies can read Genomatix-generated HTML tables directly. Finally, XML import/export of TFBS predictions is possible; a document type definition for TFBS lists is given in the reference list. Phylogenetic trees can be read in Newick or NEXUS format.
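As an illustration of the tab-separated format with the seven columns listed above, the following Python sketch reads such a table into typed records. It is not the ReXSpecies parser. It assumes one prediction per line in exactly the stated column order; real Mapper files may carry header lines or further columns. The class name, function name, and sample values (including the model IDs) are hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class TfbsPrediction:
    model_id: str
    factor: str
    strand: str
    start: int
    end: int
    score: float
    evalue: float


def parse_mapper_table(text):
    """Parse rows of: Model ID, Factor name, Strand, Start, End, Score, E-value."""
    predictions = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) != 7:
            continue  # skip blank or malformed lines
        model_id, factor, strand, start, end, score, evalue = row
        try:
            record = TfbsPrediction(model_id, factor, strand,
                                    int(start), int(end),
                                    float(score), float(evalue))
        except ValueError:
            continue  # skip rows with non-numeric fields, e.g. a header row
        predictions.append(record)
    return predictions


# Invented sample data for illustration only.
sample = (
    "Model ID\tFactor name\tStrand\tStart\tEnd\tScore\tE-value\n"
    "M0001\tOct4\t+\t120\t134\t5.2\t0.8\n"
    "M0002\tSox2\t-\t140\t150\t4.1\t3.5\n"
)
predictions = parse_mapper_table(sample)
for p in predictions:
    print(p.factor, p.start, p.end, p.evalue)
```

Converting the numeric columns on input keeps later steps, such as E-value filtering or position-based grouping, free of string handling.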
ReXSpecies generates HTML output containing an annotated alignment and an annotated tree as described below. These HTML documents may be saved locally by current web browsers.
To reduce the number of TFBS annotations, we filter them by the E-value of the matches: predictions with an E-value larger than a given threshold are hidden. Because the prediction of TFBSs based on a short motif in a long sequence is not very accurate, this filter should not be set too strictly. In the case of Mapper we take the E-values as provided. To generate E-values for the Genomatix predictions, we use a patched version of the implementation of the Extreme Value Distribution (EVD) method in Bioperl-ext [25, 26].
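The thresholding step described above amounts to a simple partition of the predictions. The Python sketch below shows one way to express it, assuming each prediction is a dictionary with an "evalue" key; the function name, the key name, and the sample values are hypothetical and do not reflect the ReXSpecies code. Hidden predictions are returned rather than discarded, so that a different threshold can be tried later.

```python
def split_by_evalue(predictions, max_evalue):
    """Return (shown, hidden): predictions with an E-value larger than
    max_evalue are hidden, all others are shown."""
    shown = [p for p in predictions if p["evalue"] <= max_evalue]
    hidden = [p for p in predictions if p["evalue"] > max_evalue]
    return shown, hidden


# Invented sample values for illustration only.
preds = [
    {"factor": "Oct4", "evalue": 0.8},
    {"factor": "Sox2", "evalue": 3.5},
    {"factor": "SMAD", "evalue": 12.0},
]
shown, hidden = split_by_evalue(preds, max_evalue=5.0)
print([p["factor"] for p in shown])
```

A permissive threshold such as the one used here keeps weak matches visible, in line with the advice not to set the filter too strictly.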
Moreover, a filter routine based on regular expressions is implemented. It can hide TFBSs based on any field in the Mapper/Genomatix TFBS prediction record, which contains information about the name of the TFBS, the position of the TFBS in the sequence investigated, and the score of the match. Another relevant field in the filter tool is the set of species the TFBS model is made from; for example, all plant TFBSs can be filtered out if only mammals are investigated, because predictions of plant TFBSs in a mammalian sequence are most likely all false positives.
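A field-based regular expression filter of this kind can be sketched in a few lines of Python. This is an illustration under assumptions, not the ReXSpecies filter routine (which uses Perl regular expressions): predictions are assumed to be dictionaries, and the field name "model_taxa" for the species a model was built from, like the sample records, is hypothetical.

```python
import re


def hide_matching(predictions, field, pattern):
    """Return the predictions whose given field does NOT match the pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [p for p in predictions
            if not regex.search(str(p.get(field, "")))]


# Invented sample records for illustration only.
preds = [
    {"factor": "Oct4", "model_taxa": "vertebrates"},
    {"factor": "Sox9", "model_taxa": "vertebrates"},
    {"factor": "Dof2", "model_taxa": "plants"},
]

# Hide every prediction that stems from a plant-derived model.
mammal_view = hide_matching(preds, "model_taxa", r"^plants?$")
print([p["factor"] for p in mammal_view])

# The same routine works on any other field, e.g. the factor name.
no_sox = hide_matching(preds, "factor", r"^sox")
print([p["factor"] for p in no_sox])
```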
TFBS predictions in the same species may belong to slightly different models of the same factor, and predictions in different species may belong to orthologous factors, and we may wish to group them together if they occur at approximately the same position. Positions may vary slightly, because the alignment is unaware of the TFBSs, or because models are slightly different. Moreover, such slightly different models for the same or for an orthologous transcription factor may have very different names derived from synonyms, e.g. POU5F1 versus Oct4.
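One possible way to automate such grouping is sketched below in Python. It is an assumption-laden illustration rather than a description of how ReXSpecies groups predictions: factor names are first mapped to a canonical name through a small synonym table, and predictions of the same canonical factor are then chained into one group while consecutive alignment positions differ by at most a tolerance. The synonym table, the tolerance of 5 columns, the field names, and the sample data are all hypothetical.

```python
# Hypothetical synonym table; keys are upper-cased factor names.
SYNONYMS = {"POU5F1": "Oct4", "OCT4": "Oct4"}


def canonical(name):
    """Map a factor name to its canonical form if a synonym is known."""
    return SYNONYMS.get(name.upper(), name)


def group_predictions(predictions, tolerance=5):
    """Group predictions of the same factor at approximately the same
    alignment position (consecutive members at most `tolerance` apart)."""
    groups = []
    ordered = sorted(predictions,
                     key=lambda p: (canonical(p["factor"]), p["pos"]))
    for p in ordered:
        name = canonical(p["factor"])
        last = groups[-1] if groups else None
        if (last is not None and last["factor"] == name
                and p["pos"] - last["members"][-1]["pos"] <= tolerance):
            last["members"].append(p)
        else:
            groups.append({"factor": name, "members": [p]})
    return groups


# Invented sample data; "pos" is an alignment column, not a sequence offset.
preds = [
    {"species": "human", "factor": "POU5F1", "pos": 53},
    {"species": "mouse", "factor": "Oct4", "pos": 55},
    {"species": "human", "factor": "Sox2", "pos": 57},
    {"species": "opossum", "factor": "Oct4", "pos": 90},
]
groups = group_predictions(preds)
for g in groups:
    print(g["factor"], [(m["species"], m["pos"]) for m in g["members"]])
```

In this example the human POU5F1 and mouse Oct4 predictions end up in one group, whereas the distant opossum prediction forms a group of its own.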
ReXSpecies consists of a number of modules written in Perl that extend the Web-Application base class using multiple inheritance. These modules should only contain callback functions (i.e. functions called by the base class) to register them in the menu or to generate their user interface. All other functionality should be kept separate in private objects. The Web-Application class provides persistence, user management, and objects of common classes for Perl CGI scripts.
For calculations that last too long to be done interactively (e.g. tree calculation) we have implemented a job spooler running as a server process.
Due to the large set of bioinformatics libraries available, we decided to use Perl. We use many modules from CPAN (Comprehensive Perl Archive Network), available under various open source licenses, e.g. Bioperl. The tree rendering is done by an overloaded version of Bio::Phylo. The database back end is currently based on MySQL. For user management there is a module supporting an LDAP (Lightweight Directory Access Protocol) user database.
Reliability of predicted TFBS
Stem cells and pluripotency
Stem cells are currently a major topic of interest, and in this section we use ReXSpecies to explore the regulation of genes involved in mammalian stem cell pluripotency. We define pluripotency as the ability to undergo self-renewal and the potential to form all different cell types of the body. Embryonic stem cells (ESCs) are pluripotent, and they are important for the development of cellular regenerative therapies for medical conditions with irreversible tissue damage or loss. Efforts to realize this potential, and to reprogram somatic cells to pluripotent-like cells with properties similar to ESCs, require a better understanding of the interplay of the transcription factors and their binding sites in the transcriptional network that underlies the ability of ESCs to maintain the pluripotent state. In recent years, the transcription factors Oct4, Sox2 and Nanog have been identified as master regulators of pluripotency, providing ESCs with extensive self-renewal capacity. For these three key regulators, TFBS models are available for searches by Mapper [1, 2] and Genomatix. For Octamer binding in general, there are 15 HMM models in Mapper and 10 models in Genomatix, but there is no Oct4-specific model. For Sox, there are 8 models in Mapper and 6 models provided by Genomatix. To be as specific as possible for Sox2, we include a Sox2 HMM model based on published binding site data. For Nanog, only Genomatix offers a model, and this single model is not very sensitive; we found matches only in Lemur (see below).
Evolution of the CFTR 3' UTR regulatory region
Evolution of the Nanog 5' regulatory region
In addition to Oct4/Sox2, Nanog is a key player in pluripotency. The evolution of the first upstream conserved part of its 5' regulatory region is visualized in Figures 2 and 5 (and in Additional files 1 and 2). The most prominent observation is the large number of predicted TFBSs of stem-cell relevant transcription factors appearing on the lineage from Theria to Eutheria, which may be associated with the developmental changes that occurred during the evolution from Theria to Eutheria. These predictions appear in a region conserved for all Theria; this region comprises, in part, the region shown in Figure 2 and the first quarter of the region shown in Additional file 1. In particular, "SMAD at position 14" is found from Eutheria onwards with the exception of Insectivora, and "Oct6 at position 53", denoted by the synonym OCTB, is found in all Eutheria except E. europaeus (Insectivora) and Carnivora. Curiously, "Sox9 at position 57" first appears close by for the same set of species, with the caveat that it was lost in Rodents. "Otx2 (orthodenticle) at position 84", denoted by a Genomatix family of weight matrices called HOXF, is also found for all Eutheria except E. telfairi. Very recently, Zhou et al. identified Otx2 as a "core regulator in mouse ESC" (embryonic stem cells), noting that it had not "been implicated in ESC maintenance" before. Finally, outside of the region conserved in all Theria, within the last three quarters of the region shown in Additional file 1, we find a plethora of other relevant predictions, e.g. predicted binding of "EKLF" (erythroid Krueppel-like factor; only very recently the involvement of Klf4 (Krueppel-like factor 4) in pluripotency was shown). However, their uniqueness to Eutheria is not as clear as in the cases described above, because no homologous region could be obtained for the non-Eutherian opossum (M. domestica); it is possible that the region in question still exists in opossum, and that it did not evolve in Eutheria.
Preliminary analyses of conserved regulatory regions of other key pluripotency genes yield further interesting observations that may give rise to hypotheses about the regulation of pluripotency. Downstream of the Oct4 (POU5F1) gene we find conserved predicted TFBSs of stem-cell relevant transcription factors such as Sox, STAT, SMAD, EKLF, SP1, Pax and FKHD. Downstream of the Sox2 gene the most interesting finding is that among all Amniota, only human has a predicted Oct/Sox binding motif (data not shown).
Caveats in interpreting predicted TFBS
We already discussed the most obvious problem with using genomic (sequence) data and associated predicted binding sites, namely the large number of mispredictions. We would like to exemplify two further problems. First, the set of transcription factors and TFBSs known for various species is incomplete, so we never know whether we are dealing with orthologous or paralogous TFs (a very similar problem, called "hidden paralogy", complicates species tree inference; see Martin and Burg). In fact, our Sox2 binding site predictions are based on a model that may also match binding sites of other Sox factors; it is even possible that Sox2 does not bind at TFBSs predicted using this model, but other Sox factors do. Recently, it was shown that Sox binding sites found adjacent to Oct binding sites of genes involved in pluripotency are not functionally important. Other Sox factors (Sox4, Sox11, Sox15) may bind, and Sox2 was shown to be an upstream regulator of pluripotency instead. However, while such insights may modify the evolutionary analysis, they do not usually invalidate it.
Secondly, the significance of our observations is hard to quantify. As in many areas of scientific investigation, the "wheat", i.e. the observations deemed valuable and subsequently reported, may simply be chance findings that are to be expected if a large amount of data is analyzed. In other words, looking at sufficiently many predicted TFBSs, we are bound to find chance correlations that seem to make evolutionary sense, e.g. common gain and loss of TFBSs. Therefore, we should not tire of stressing that all in-silico analysis should be followed up by experimental validation. Evolutionary patterns can narrow down true positive predictions, but they cannot identify them. A combined analysis of in-silico and experimental data is yet another approach, and it is important future work to add experimental TFBS data (e.g. ChIP on Chip) to our visualizations, aiming at a deeper understanding of the evolution of biological features such as the regulation of pluripotency.
The ReXSpecies web-server is able to give deeper insights into the evolution of regulatory regions by providing sequence alignments and phylogenetic trees annotated with predictions for TFBSs and their gain or loss. In the future we plan to automate more tasks so that finally the input will only be a gene and the output will be an overview of its putative regulatory regions across different species annotated with TFBS predictions, a tree labeled with those predictions including gain/loss information at the edges, and possibly even a regulatory network inferred from the TFBS predictions. Towards this end, automation of the retrieval of sequence information and TFBS predictions is planned. Moreover, we wish to add more tree estimation tools besides MrBayes [18, 19], e.g. RAxML, and to add likelihood-based methods for labeling, TFBS prediction modules to enable use without Genomatix or Mapper access, automated grouping by clustering of TFBS predictions, and import of experimental (e.g. ChIP on Chip) data.
Project name: ReXSpecies
Project home page: http://sourceforge.net/projects/rexspecies
Operating system: Web application running on Linux
Programming language: Perl
Other requirements: bioperl, muscle, MySQL, LDAP, MrBayes
License: GNU LGPL
Source code of the version used for this article: See Additional file 4
We thank E. Klassen, M. Kabisch, A. Seeland, C. Scheuner for assistance. RAR was supported by the following grants to Ileana Zucchi, CNR, Milan, Italy: (1) FIRB Internazionali Grant RBIN04CBSM 000 and (2) N.O.B.E.L. Grant funded by Fondazione CARIPLO.
- Marinescu VD, Kohane IS, Riva A: The MAPPER database: a multi-genome catalog of putative transcription factor binding sites. Nucleic Acids Res. 2005, 33 (Database issue): D91-7.
- Marinescu VD, Kohane IS, Riva A: MAPPER: a search engine for the computational identification of putative transcription factor binding sites in multiple genomes. BMC Bioinformatics. 2005, 6: 79. 10.1186/1471-2105-6-79.
- Cartharius K, Frech K, Grote K, Klocke B, Haltmeier M, Klingenhoff A, Frisch M, Bayerlein M, Werner T: MatInspector and beyond: promoter analysis based on transcription factor binding sites. Bioinformatics. 2005, 21 (13): 2933-42. 10.1093/bioinformatics/bti473.
- Wasserman WW, Sandelin A: Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet. 2004, 5 (4): 276-87. 10.1038/nrg1315. [http://view.ncbi.nlm.nih.gov/pubmed/15131651]
- Matys V, Kel-Margoulis OV, Fricke E, Liebich I, Land S, Barre-Dirrie A, Reuter I, Chekmenev D, Krull M, Hornischer K, Voss N, Stegmaier P, Lewicki-Potapov B, Saxel H, Kel AE, Wingender E: TRANSFAC(R) and its module TRANSCompel(R): transcriptional gene regulation in eukaryotes. Nucl Acids Res. 2006, 34 (suppl 1): D108-110. 10.1093/nar/gkj143. [http://nar.oxfordjournals.org/cgi/content/abstract/34/suppl_1/D108]
- Tagle DA, Koop BF, Goodman M, Slightom JL, Hess DL, Jones RT: Embryonic epsilon and gamma globin genes of a prosimian primate (Galago crassicaudatus). Nucleotide and amino acid sequences, developmental regulation and phylogenetic footprints. J Mol Biol. 1988, 203 (2): 439-55. 10.1016/0022-2836(88)90011-3.
- Blanchette M, Tompa M: Discovery of Regulatory Elements by a Computational Method for Phylogenetic Footprinting. Genome Res. 2002, 12 (5): 739-748. 10.1101/gr.6902. [http://www.genome.org/cgi/content/abstract/12/5/739]
- Blanchette M, Tompa M: FootPrinter: a program designed for phylogenetic footprinting. Nucl Acids Res. 2003, 31 (13): 3840-3842. 10.1093/nar/gkg606. [http://nar.oxfordjournals.org/cgi/content/abstract/31/13/3840]
- Ovcharenko I, Loots GG, Giardine BM, Hou M, Ma J, Hardison RC, Stubbs L, Miller W: Mulan: Multiple-sequence local alignment and visualization for studying function and evolution. Genome Res. 2005, 15: 184-194. 10.1101/gr.3007205. [http://www.genome.org/cgi/content/abstract/15/1/184]
- Berezikov E, Guryev V, Cuppen E: CONREAL web server: identification and visualization of conserved transcription factor binding sites. Nucleic Acids Res. 2005, 33 (Suppl 2): W447-450. 10.1093/nar/gki378. [http://nar.oxfordjournals.org/cgi/content/abstract/33/suppl_2/W447]
- Loots GG, Ovcharenko I: rVISTA 2.0: evolutionary analysis of transcription factor binding sites. Nucleic Acids Res. 2004, 32 (Suppl 2): W217-221. 10.1093/nar/gkh383. [http://nar.oxfordjournals.org/cgi/content/abstract/32/suppl_2/W217]
- Munch R, Hiller K, Grote A, Scheer M, Klein J, Schobert M, Jahn D: Virtual Footprint and PRODORIC: an integrative framework for regulon prediction in prokaryotes. Bioinformatics. 2005, 21 (22): 4187-4189. 10.1093/bioinformatics/bti635. [http://bioinformatics.oxfordjournals.org/cgi/content/abstract/21/22/4187]
- Ovcharenko I, Nobrega MA, Loots GG, Stubbs L: ECR Browser: a tool for visualizing and accessing data from comparisons of multiple vertebrate genomes. Nucl Acids Res. 2004, 32 (Suppl 2): W280-286. 10.1093/nar/gkh355. [http://nar.oxfordjournals.org/cgi/content/abstract/32/suppl_2/W280]
- Genome Browser Gateway. [http://genome.ucsc.edu/cgi-bin/hgGateway]
- Karolchik D, Baertsch R, Diekhans M, Furey TS, Hinrichs A, Lu YT, Roskin KM, Schwartz M, Sugnet CW, Thomas DJ, Weber RJ, Haussler D, Kent WJ: The UCSC Genome Browser Database. Nucleic Acids Res. 2003, 31: 51-4. 10.1093/nar/gkg129.
- Ensembl Genome Browser. [http://www.ensembl.org/index.html]
- Hubbard TJP, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, Down T, Dyer SC, Fitzgerald S, Fernandez-Banet J, Graf S, Haider S, Hammond M, Herrero J, Holland R, Howe K, Howe K, Johnson N, Kahari A, Keefe D, Kokocinski F, Kulesha E, Lawson D, Longden I, Melsopp C, Megy K, Meidl P, Ouverdin B, Parker A, Prlic A, Rice S, Rios D, Schuster M, Sealy I, Severin J, Slater G, Smedley D, Spudich G, Trevanion S, Vilella A, Vogel J, White S, Wood M, Cox T, Curwen V, Durbin R, Fernandez-Suarez XM, Flicek P, Kasprzyk A, Proctor G, Searle S, Smith J, Ureta-Vidal A, Birney E: Ensembl 2007. Nucl Acids Res. 2006, [http://nar.oxfordjournals.org/cgi/content/abstract/gkl996v1]
- Huelsenbeck JP, Ronquist F: MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics. 2001, 17 (8): 754-5. 10.1093/bioinformatics/17.8.754.
- Ronquist F, Huelsenbeck JP: MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003, 19 (12): 1572-4. 10.1093/bioinformatics/btg180.
- ReXSpecies – Regulation across species. [http://bio.math-inf.uni-greifswald.de/ReXSpecies]
- ReXSpecies-Tutorial. [http://bio.math-inf.uni-greifswald.de/ReXSpecies-Tutorial.html]
- Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32 (5): 1792-7. 10.1093/nar/gkh340.
- Edgar RC: MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 2004, 5: 113. 10.1186/1471-2105-5-113.
- Document type definition for TFBS lists. [http://www.math-inf.uni-greifswald.de/~struckma/tfbs/dtd/1.0/tfbs.dtd]
- BioPerl. [http://www.bioperl.org/]
- Durbin R, Eddy SR, Krogh A, Mitchison G: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. 1998, Cambridge University Press.
- Perl documentation. [http://perldoc.perl.org/perlre.html]
- Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, Federhen S, Geer LY, Kapustin Y, Khovayko O, Landsman D, Lipman DJ, Madden TL, Maglott DR, Ostell J, Miller V, Pruitt KD, Schuler GD, Sequeira E, Sherry ST, Sirotkin K, Souvorov A, Starchenko G, Tatusov RL, Tatusova TA, Wagner L, Yaschenko E: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2007, 35 (suppl 1): D5-12. 10.1093/nar/gkl1031. [http://nar.oxfordjournals.org/cgi/content/abstract/35/suppl_1/D5]
- Common Taxonomy Tree. [http://www.ncbi.nlm.nih.gov/Taxonomy/CommonTree/wwwcmt.cgi]
- Fitch WM: Toward Defining the Course of Evolution: Minimum Change for a Specific Tree Topology. Systematic Zoology. 1971, 20 (4): 406-416. 10.2307/2412116.
- Quesne WJL: The Uniquely Evolved Character Concept and its Cladistic Application. Systematic Zoology. 1974, 23 (4): 513-517. 10.2307/2412469.
- Farris JS: Phylogenetic Analysis Under Dollo's Law. Systematic Zoology. 1977, 26: 77-88. 10.2307/2412867.
- CPAN – Comprehensive Perl Archive Network. [http://www.cpan.org]
- Bio::Phylo. [http://search.cpan.org/dist/Bio-Phylo/]
- MySQL. [http://www.mysql.com/]
- Lightweight Directory Access Protocol. [http://en.wikipedia.org/wiki/LDAP]
- Boiani M, Scholer HR: Regulatory networks in embryo-derived pluripotent stem cells. Nature Reviews Molecular Cell Biology. 2005, 6 (11): 872-884. 10.1038/nrm1744.
- Mimeault M, Hauke R, Batra SK: Stem cells: a revolution in therapeutics-recent advances in stem cell biology and their therapeutic applications in regenerative medicine and cancer therapies. Clin Pharmacol Ther. 2007, 82 (3): 252-64. 10.1038/sj.clpt.6100301.
- Takahashi K, Yamanaka S: Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell. 2006, 126 (4): 663-76. 10.1016/j.cell.2006.07.024.
- Okumura-Nakanishi S, Saito M, Niwa H, Ishikawa F: Oct-3/4 and Sox2 Regulate Oct-3/4 Gene in Embryonic Stem Cells. J Biol Chem. 2005, 280 (7): 5307-5317. 10.1074/jbc.M410015200. [http://www.jbc.org/cgi/content/abstract/280/7/5307]
- Thomas JW, Touchman JW, Blakesley RW, Bouffard GG, Beckstrom-Sternberg SM, Margulies EH, Blanchette M, Siepel AC, Thomas PJ, McDowell JC, Maskeri B, Hansen NF, Schwartz MS, Weber RJ, Kent WJ, Karolchik D, Bruen TC, Bevan R, Cutler DJ, Schwartz S, Elnitski L, Idol J, Prasad A, Lee-Lin S, Maduro V, Summers T, Portnoy M, Dietrich N, Akhter N, Ayele K, Benjamin B, Cariaga K, Brinkley C, Brooks S, Granite S, Guan X, Gupta J, Haghighi P, Ho S, Huang M, Karlins E, Laric P, Legaspi R, Lim M, Maduro Q, Masiello C, Mastrian S, McCloskey J, Pearson R, Stantripop S, Tiongson E, Tran J, Tsurgeon C, Vogt J, Walker M, Wetherby K, Wiggins L, Young A, Zhang L, Osoegawa K, Zhu B, Zhao B, Shu C, De Jong P, Lawrence C, Smit A, Chakravarti A, Haussler D, Green P, Miller W, Green E: Comparative analyses of multi-species sequences from targeted genomic regions. Nature. 2003, 424 (6950): 788-793. 10.1038/nature01858.
- Vuillaumier S, Dixmeras I, Messai H, Lapoumeroulie C, Lallemand D, Gekas J, Chehab FF, Perret C, Elion J, Denamur E: Cross-species characterization of the promoter region of the cystic fibrosis transmembrane conductance regulator gene reveals multiple levels of regulation. Biochem J. 1997, 327 (Pt 3): 651-62.
- Botquin V, Hess H, Fuhrmann G, Anastassiadis C, Gross MK, Vriend G, Scholer HR: New POU dimer configuration mediates antagonistic control of an osteopontin preimplantation enhancer by Oct-4 and Sox-2. Genes Dev. 1998, 12 (13): 2073-90. 10.1101/gad.12.13.2073.
- Genomatix Matrix Family Library. [http://www.genomatix.de/online_help/help_gems/mat_lib_50.html]
- Chambers I, Colby D, Robertson M, Nichols J, Lee S, Tweedie S, Smith A: Functional expression cloning of Nanog, a pluripotency sustaining factor in embryonic stem cells. Cell. 2003, 113 (5): 643-55. 10.1016/S0092-8674(03)00392-1.
- Zhou Q, Chipperfield H, Melton DA, Wong WH: A gene regulatory network in mouse embryonic stem cells. Proceedings of the National Academy of Sciences. 2007, 104 (42): 16438-16443. 10.1073/pnas.0701014104. [http://www.pnas.org/cgi/content/abstract/104/42/16438]
- Nakatake Y, Fukui N, Iwamatsu Y, Masui S, Takahashi K, Yagi R, Yagi K, Miyazaki JI, Matoba R, Ko MSH, Niwa H: Klf4 cooperates with Oct3/4 and Sox2 to activate the Lefty1 core promoter in embryonic stem cells. Mol Cell Biol. 2006, 26 (20): 7772-82. 10.1128/MCB.00468-06.
- Martin AP, Burg TM: Perils of Paralogy: Using HSP70 Genes for Inferring Organismal Phylogenies. Systematic Biology. 2002, 51 (4): 570-587. 10.1080/10635150290069995. [http://www.ingentaconnect.com/content/tandf/usyb/2002/00000051/00000004/art00003]
- Masui S, Nakatake Y, Toyooka Y, Shimosato D, Yagi R, Takahashi K, Okochi H, Okuda A, Matoba R, Sharov AA, Ko MSH, Niwa H: Pluripotency governed by Sox2 via regulation of Oct3/4 expression in mouse embryonic stem cells. Nature Cell Biology. 2007, 9 (6): 625-635. 10.1038/ncb1589.
- Boyer LA, Lee TI, Cole MF, Johnstone SE, Levine SS, Zucker JP, Guenther MG, Kumar RM, Murray HL, Jenner RG, Gifford DK, Melton DA, Jaenisch R, Young RA: Core transcriptional regulatory circuitry in human embryonic stem cells. Cell. 2005, 122 (6): 947-56. 10.1016/j.cell.2005.08.020.
- Stamatakis A: RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics. 2006, 22 (21): 2688-2690. 10.1093/bioinformatics/btl446. [http://bioinformatics.oxfordjournals.org/cgi/content/abstract/22/21/2688]
- Beckstette M, Homann R, Giegerich R, Kurtz S: Fast index based algorithms and software for matching position specific scoring matrices. BMC Bioinformatics. 2006, 7: 389. 10.1186/1471-2105-7-389. [http://www.biomedcentral.com/1471-2105/7/389]
- Beckstette M, Strothmann D, Homann R, Giegerich R, Kurtz S: PoSSuMsearch: Fast and Sensitive Matching of Position Specific Scoring Matrices using Enhanced Suffix Arrays. Lecture Notes in Informatics (LNI). 2004, P-53:
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.