- Research article
- Open Access
Whole genome duplication events in plant evolution reconstructed and predicted using myosin motor proteins
© Mühlhausen and Kollmar; licensee BioMed Central Ltd. 2013
Received: 14 April 2013
Accepted: 16 September 2013
Published: 22 September 2013
The evolution of land plants is characterized by whole genome duplications (WGD), which drove species diversification and evolutionary novelties. Detecting these events is especially difficult if they date back to the origin of the plant kingdom. Established methods for reconstructing WGDs include intra- and inter-genome comparisons, KS age distribution analyses, and phylogenetic tree constructions.
By analysing 67 completely sequenced plant genomes, we identified and manually assembled 775 myosins. Phylogenetic trees of the myosin motor domains revealed orthologous and paralogous relationships and were consistent with recent species trees. Based on the myosin inventories and the phylogenetic trees, we have identified duplications of the entire myosin motor protein family at timings consistent with 23 WGDs that had been reported before. We also predict 6 WGDs based on further protein family duplications. Notably, the myosin data support the two recently reported WGDs in the common ancestor of all extant angiosperms. We predict single WGDs in the Manihot esculenta and Nicotiana benthamiana lineages, two WGDs for Linum usitatissimum and Phoenix dactylifera, and a triplication or two WGDs for Gossypium raimondii. Our data show another myosin duplication in the ancestor of the angiosperms that could be either the result of a single gene duplication or a remnant of a WGD.
We have shown that the myosin inventories in angiosperms retain evidence of numerous WGDs that happened throughout plant evolution. In contrast to other protein families, many myosins are still present in extant species. They are closely related and have similar domain architectures, and their phylogenetic grouping follows the genome duplications. Because of its broad taxonomic sampling the dataset provides the basis for reliable future identification of further whole genome duplications.
Whole genome duplications have had a strong impact on species diversification and may have triggered evolutionary novelties [1, 2]. Plants underwent several independent rounds of whole genome duplication (WGD) events [3–9]. Traces of these WGDs are still present, although duplication events are usually followed by massive gene loss and structural rearrangements. Nevertheless, many cases of both recent and ancient WGD events have been reported so far, including the hexaploidy event shared by most, if not all, eudicots [11–13], and WGDs dated to the common ancestor of all extant angiosperms and to the common ancestor of all extant seed plants.
Whole genome duplications are usually reconstructed by intra- and inter-genome comparisons to detect syntenic regions (genomic collinearity), by KS age distribution analyses, and by phylogenetic tree constructions. Since collinearity decreases with time, it usually cannot be used to detect old genome duplications. KS describes the number of synonymous substitutions per synonymous site and becomes unreliable in age distribution analyses due to gene loss and saturation effects. Phylogenetic approaches have the advantage that duplication events can be mapped onto gene trees provided that these trees include paralogs created by given WGD events and orthologous genes from other species. However, individual gene trees can be affected by different evolutionary rates of genes between species, pseudogenization and individual gene duplication and loss. To overcome these difficulties, a multigene approach has been undertaken to differentiate between a shared or species-specific WGD in the legumes Glycine max and Medicago truncatula, and a phylogenomics approach has been used to correctly date proposed WGDs early in plant evolution. Nevertheless, for the fast and convenient detection and dating of so far undiscovered WGDs it would be ideal to have a protein family whose evolution has not been affected by the described problems. The difficulty is to identify such a protein family because most genes in plants exist in only one or two copies per genome (e.g. TEL genes, CAP and ARP2/3 proteins) while other families like the expansin superfamily and the MADS-box transcription factor genes might contain dozens to over a hundred gene family members [19, 20].
Myosins constitute one of the largest and most diverse protein families in eukaryotes. They are characterized by a motor domain that binds to actin in an ATP-dependent manner, a neck domain consisting of varying numbers of IQ motifs that each bind either a myosin-specific light chain or a calmodulin or calmodulin-like protein, and amino-terminal and carboxy-terminal domains of various length and function. Myosins are typically classified based on phylogenetic analyses of their motor domains. An analysis of all myosin genes available in 2007 allowed grouping them into 35 classes. While metazoans, fungi and protozoans contain myosins of many different classes, only myosins of class VIII and class XI are present in, and unique to, plants. The formerly algae-specific class XIII myosins have been shown to be part of class XI. Class VIII myosins contain long N-terminal extensions, which have not been characterised in detail so far, and C-terminal coiled-coil regions. Class XI myosins have six IQ motifs followed by an extended coiled-coil region and a DIL domain and thus have domain architectures identical to class V myosins.
Assembling and annotating plant myosins is a continuous effort of our group. Since the major myosin sequence analysis was published in 2007, every newly assembled plant genome has been analysed. Annotated myosin sequences were made available to the community via CyMoBase [24, 25]. Since only a few plant genomes had been sequenced in 2007 [23, 26] we did not develop a concise nomenclature for the many homologs within the two plant myosin classes. Such a nomenclature should account for whole genome and single gene duplications and thus would require a broad taxonomic sampling. The first plant myosins identified in Arabidopsis thaliana had been named ATM1/ATM2 [27, 28] and MYA1/MYA2/MYA3 [28, 29]. Their recently suggested renaming, however, resulted in a mixture of numbers and letters to distinguish class VIII and class XI orthologs and paralogs in order to partly keep the earlier naming of the other 13 Arabidopsis myosins. Thus, a comprehensive naming scheme is still missing that would also be flexible enough to incorporate the myosins from the upcoming sequencing projects.
Here, we used the myosin protein family for reconstructing and predicting WGDs in plant evolution. Myosins represent an outstanding case because in each extant plant species many homologs are present for which unambiguous paralog and ortholog relationships can be reconstructed. We present an analysis of 67 completely sequenced plant species that provides the framework for the identification and placement of WGDs in branches of the plant tree that have not been covered so far.
Identification and annotation of the plant myosins
The genomic regions containing putative myosin genes were identified using Arabidopsis thaliana myosins as queries for TBLASTN searches. The protein sequences were then assembled and annotated using ab initio gene prediction and cross-species gene reconstruction software followed by manual refinement. For ab initio gene predictions we used AUGUSTUS and Genscan. Compared to myosins of other major eukaryotic branches the myosins of plants are relatively conserved and belong to only two classes, class VIII and class XI. As more and more draft genome assemblies of species closely related to already sequenced species become available, known gene annotations can be used as starting point for gene predictions. Here, we used the cross-species search function implemented in the gene reconstruction software WebScipio to obtain myosins from such species. An example is the myosin protein family of Eutrema halophilum, which was annotated based on the preceding annotation of the myosins from Eutrema parvulum. Manual refinement of ab initio predicted and cross-species reconstructed sequences includes correcting wrongly predicted sequence regions, resolving sequencing problems and assembling myosins spread over several contigs. In detail, the comparison of a newly added myosin sequence with already annotated plant myosins in a structure-guided, manually refined multiple sequence alignment allowed us to identify missing regions, whose sequences were added by manually inspecting the respective genomic regions, and to delete extra sequence that had obviously been mis-predicted as exonic although it lies within intronic sequence. Notably, plant myosins contain several very short exons that were missing in almost all ab initio predictions. During manual refinement we also accounted for in-frame stop codons and frame shifts resulting, for example, from local low coverage within genomic sequences.
WebScipio has also been used to reconstruct the gene structures of all plant myosins. Through comparison of intron positions and splice-site phases relative to the multiple protein sequence alignments, several suspicious exon borders could be resolved in the less conserved parts of the C-terminal tail regions. Unfortunately, full-length cDNA sequences are only available for about a dozen plant myosins, covering not even all Arabidopsis thaliana and Oryza sativa myosins. However, the available plant EST and cDNA read data helped in determining for example the correct N-termini of the headless class XI myosins and the C-termini of the short class XI myosins (see below). Plant genomes have been sequenced with different methods (Sanger, Roche/454, Illumina, and combinations of them) and different coverage. Only a few have undergone refinement and extensive closing of assembly gaps. Because myosins are large proteins we only used those genomes in which we could unambiguously reconstruct all myosins. Thus, we excluded the fragmented draft genomes of some species from our analysis. Among these are Penstemon cyananthus, Amaranthus tuberculatus, Lotus japonicus, Vigna radiata and Leersia perrieri. Nevertheless, some myosin genes contain smaller or larger gaps in many plant genomes. Sequences for which only a small part is missing (up to 5% of the average protein length) were termed “Partials”. “Partials” are not expected to considerably influence the phylogenetic tree computations and were used together with complete sequences for these computations. Sequences with gaps accounting for more than 5% of the expected sequence length were termed “Fragments”. “Fragments” are important for the qualitative analysis to denote the presence of this specific myosin subtype in the respective species but were not used in phylogenetic tree computations because of the long gaps in the alignment.
Regions with gaps cannot be excluded from the alignment for tree computations, because the gaps in the “Fragments” are not at the same positions. However, separately adding each single “Fragment” to the alignment and calculating independent trees can unambiguously classify “Fragments”. For instance, a class XI myosin sequence containing about 1,300 residues of the putative 1,560 residues of the full-length sequence would be denoted as “Fragment” but its subtype relationship could be resolved unambiguously. The classification of all annotated myosins from the 67 completely sequenced plants into these three categories based on their respective sequence length is listed in Additional file 1.
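The three completeness categories can be illustrated with a minimal Python sketch. It assumes that an expected full-length value is known for each sequence (for example from the closest complete homolog); the function names and the way the expected length is supplied are illustrative assumptions, not part of the original annotation pipeline.

```python
# Minimal sketch of the "Complete" / "Partial" / "Fragment" categories.
# Assumption: the expected full length of each sequence is supplied by the caller.

PARTIAL_MAX_MISSING = 0.05  # from the text: up to 5% missing counts as "Partial"


def classify_completeness(observed_length: int, expected_length: int) -> str:
    """Return 'Complete', 'Partial' or 'Fragment' for one myosin sequence."""
    if expected_length <= 0:
        raise ValueError("expected_length must be positive")
    missing = max(0, expected_length - observed_length) / expected_length
    if missing == 0:
        return "Complete"
    if missing <= PARTIAL_MAX_MISSING:
        return "Partial"
    return "Fragment"


def usable_for_trees(category: str) -> bool:
    """Only complete and 'Partial' sequences enter the tree computations."""
    return category in ("Complete", "Partial")


# The example from the text: about 1,300 of 1,560 residues are present.
example = classify_completeness(1300, 1560)
print(example, usable_for_trees(example))
```

With the numbers from the example above, roughly 17% of the sequence is missing, so the sequence is a “Fragment” and is kept only for the qualitative analysis.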
The plant myosin dataset contains 828 sequences from 87 plant species. Out of these, 694 motor domain sequences from 67 species are complete and were used in the phylogenetic tree reconstructions. Additionally, phylogenetic trees were calculated based on reduced datasets comprising 380 myosin full length and 221 myosin motor domain sequences of less than 90% identity, respectively. The genome assemblies of Hordeum vulgare, Beta vulgaris, Betula nana (this genome assembly is highly contaminated with DNA from various fungi), Pyrus x bretschneideri, and Jatropha curcas were made available shortly after we had finished our analysis. Therefore, their myosins were not included in the tree computations but added to the qualitative analysis as examples for easily revealing WGDs in newly sequenced genomes. We tried to identify alternative splice variants based on the extensive cDNA/EST data available from plant transcriptome sequencing projects (Additional file 2). Only a few cases have been described for myosins from Oryza sativa and Arabidopsis thaliana that report intron retention events and alternative transcription start sites. We did not find any alternative splicing event in the available cDNA/EST data, and the reported intron retention cases are not even conserved in closely related species, where they would lead to completely different sequences, frame-shifts and in-frame stop codons. Therefore, we conclude that either the reported cases contain incompletely spliced transcripts or that alternative splicing in plants is species-specific in contrast to the strong inter-species conservation of the coding sequence.
Phylogenetic analysis, classification and nomenclature
All new plant myosin sequences have been added to a multiple sequence alignment including all annotated myosins of all classes. This is a structure-guided sequence alignment in which gaps are prohibited within sequence regions mapping to secondary structural elements of the crystal structure of the myosin motor domain. Wherever gaps were present in genome assemblies leading to missing exons, we kept the integrity of the coding sequence of the neighbouring exons. Myosins are usually classified based on phylogenetic analyses of their motor domain sequences [23, 36]. While it is agreed that new classes are defined by strongly supported phylogenetic groupings and conserved domain organisations, a concise nomenclature of multiple members within these classes has not been developed yet. Such a nomenclature should reflect the phylogenetic relation of different subtypes within classes and thus needs to comply with branch- and species-specific whole genome, genomic region and single gene duplications leading to orthologs and paralogs.
Class XI myosins: Similar to the naming scheme for class VIII myosins, spermatophyte class XI myosins were named A to H according to their branching in five major subgroups. Out of these, three subgroups were further refined into subtypes 11A/11B, 11C/11D and 11E/11F. Additional numbers and characters reflect further, branch-specific duplications. Numbers mark duplications affecting whole branches, while homologs in single species, which underwent additional duplications, are described by lowercase letters. For example, Brassica rapa underwent a species-specific whole genome duplication in addition to a whole genome duplication at the origin of the Brassicales clade. Accordingly, myosin homologs of subtype B encoded by Brassica rapa are named myosin-11B1, -11B2a, -11B2b, -11B3, -11B4a, and -11B4b. The ortholog (numbers) and paralog (lowercase letters) relationship becomes apparent immediately. In contrast to the class VIII myosins, the class XI myosins have completely conserved gene structures. Some myosins have lost single introns in the tail regions but these losses are not subtype specific and cannot be used as discriminator.
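The naming scheme can be read mechanically, as the following Python sketch illustrates. The regular expression and the helper names are illustrative assumptions that cover only names of the form shown above (class number, subtype letter, optional branch number, optional lowercase paralog letter); species prefixes such as those used elsewhere in the dataset are not handled.

```python
import re
from collections import defaultdict

# Assumed name layout: optional "myosin-" prefix, class (8 or 11), subtype letter,
# optional number (branch-wide duplication), optional lowercase letter
# (species-specific paralog).
NAME_RE = re.compile(
    r"^(?:myosin-)?(?P<cls>8|11)(?P<subtype>[A-Z])(?P<branch>\d+)?(?P<paralog>[a-z])?$"
)


def parse_name(name: str) -> dict:
    """Split a myosin name into class, subtype, branch copy and paralog letter."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised myosin name: {name}")
    return {
        "class": int(match["cls"]),
        "subtype": match["subtype"],
        "branch_copy": int(match["branch"]) if match["branch"] else None,
        "species_paralog": match["paralog"],
    }


def group_species_paralogs(names):
    """Group names that differ only in the lowercase (species-specific) letter."""
    groups = defaultdict(list)
    for name in names:
        parsed = parse_name(name)
        key = f'{parsed["class"]}{parsed["subtype"]}{parsed["branch_copy"] or ""}'
        groups[key].append(name)
    return dict(groups)


brassica_rapa_11b = [
    "myosin-11B1", "myosin-11B2a", "myosin-11B2b",
    "myosin-11B3", "myosin-11B4a", "myosin-11B4b",
]
print(group_species_paralogs(brassica_rapa_11b))
```

For the Brassica rapa example, the groups 11B2 and 11B4 each contain two species-specific paralogs, whereas 11B1 and 11B3 are single-copy.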
Altogether, 208 of the plant myosins grouped to class VIII and 594 to class XI. 187 of the class VIII and 565 of the class XI myosins were derived from whole genome sequencing projects of 67 plant species (Figure 2).
Class VIII myosins
Class XI myosins
Plant headless myosins
In contrast, myosin-11E3 duplicates (called Myo11E4) have been found in all sequenced species of the Poales clade and are supported by EST/cDNA data for several of the species. These subtype 11E4 myosins encode three IQ motifs and lack the first part of the coiled-coil region of the Myo11E3 homologs due to loss of exon 23, but contain a conserved 40 amino acid long N-terminal extension (Figure 7B). They are less similar to their respective Myo11E3 homologs than AtMyo11E2 is to AtMyo11E (48% identity compared to 72% for the A. thaliana homologs), and they are located independently in the genome rather than in tandem with the Myo11E3 homologs (Additional file 9). This suggests that sub- or neo-functionalization has already occurred.
Short-tailed class XI myosins
Evidence for WGDs can be found by various methods. One of these is the reconstruction of phylogenetic trees from DNA and protein sequences. When analysing gene and protein families in phylogenetic analyses, however, it is very difficult to distinguish between single gene duplications, the duplication of small genomic regions, and WGDs. Theoretically, WGDs lead to the doubling of the entire gene set. However, species cannot maintain the entire set of duplicates because this provides the basis for deleterious mutations that would compromise the fitness of the genome. Therefore, duplicated genomes transform back to the original state by eliminating most of the duplicated gene set. Duplications of genomic regions can be distinguished from single gene duplications due to the micro-synteny that should be present in the first case. In contrast, single gene duplications often result in tandemly arrayed genes. The difficulties in distinguishing between the three types of gene and genome duplications can be overcome through the analysis of multiple independent genes. If multiple genes from different genomic regions were independently duplicated in one genome compared to another, this would strongly support a WGD. Here, we propose using the myosin motor protein family as marker for WGDs in plants. Plant myosins represent a multi-gene family whose members are independent and distributed over all chromosomes in Arabidopsis thaliana (example of a eudicot) and Oryza sativa (example of a monocot; Additional file 9). In addition, we use a very high taxonomic sampling. This allows for the direct comparison of species and branches, which have undergone recent WGDs, to many closely related species/branches that did not duplicate. The first step of our analysis therefore consisted in the identification of the myosin repertoire in as many species as possible.
The complete repertoire of all myosins within a species can only be determined by analysing its genome sequence. Transcriptome data like cDNA, EST and RNASeq data are never complete because not all developmental stages and cell types are covered, and because not all myosins are abundant. By analysing transcriptome data it can therefore never be decided whether a certain myosin subtype is really “absent” in this species or only absent in the data. Another drawback of transcriptome data is their usually short read length. Given the above-average length of the myosin motor domain (compared to the average protein length in eukaryotes) cDNA and EST reads would be spread over the entire motor domain sequence. At the normal read depth of transcriptome data it would thus not be possible to decide which N-terminal read would belong to which read mapping to the middle or C-terminus of the motor domain, or whether these would belong to gene duplicates. The unknown number of gene duplicates in the species to be analysed is a further limitation. Moreover, short, non-overlapping sequences cannot be used in phylogenetic tree reconstructions. Therefore, we only used data from whole genome and high coverage assemblies. Incomplete genome assemblies resulting from low coverage sequencing were not included in the analysis. Examples for the latter are the fragmentary assemblies of Penstemon cyananthus, Amaranthus tuberculatus, Lotus japonicus, Vigna radiata and Leersia perrieri. Unfortunately, a genome sequence of a gymnosperm is not available today. Therefore, whole genome duplications in plants can only be traced back to the last common ancestor of the angiosperms.
Annotated gene datasets are only available for a few sequenced plant genomes, and most of these annotations are based on automatic gene predictions without including cDNA and EST data. Full-length cDNA sequences are only available for the Arabidopsis thaliana and Oryza sativa sequencing projects covering a few of the myosins. Therefore, we had to manually assemble all sequences based on preliminary results from ab-initio gene prediction and cross-species gene reconstruction software. To help in the correct assembly of the myosin coding sequences from the genomic DNA, available cDNA sequences of single homologs from other species have also been used for comparison and are included in the multiple sequence alignment. Altogether, we were able to identify and reconstruct 775 myosins in 67 completely sequenced plant species (Figure 2). In the qualitative analysis of the presence and absence of homologs in species and branches we included all sequences while only complete and “partial” (see Results section for definition) sequences were used in the tree computations. These phylogenetic trees were used to resolve the ortholog-paralog relationship between the analysed plant myosins. The grouping into different myosin subtypes is additionally supported by subtype-specific identical gene structures (Additional file 5) and subtype-specific homologous sequences within the unique regions of the class VIII and class XI myosins (Additional file 10). By mapping the paralogs onto the plant species tree, it can subsequently be determined whether the paralogs resulted from a duplication event before or after a given branching event. In the case of a WGD we suppose that many if not all of the myosins are present as duplicates. It is highly unlikely that several myosin subtypes duplicated independently of each other, e.g. as part of multiple single gene duplications. 
In contrast, if only one or two of the myosins were duplicated in the comparison of two closely related species/branches, it would be rather likely that these duplications are the result of single gene duplications or duplications of genomic regions.
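The reasoning above can be sketched as a simple count of duplicated subtypes between a species and a closely related reference. This is a deliberately crude illustration: the cut-off of three subtypes is an assumption chosen only because the text contrasts “one or two” duplicated myosins with “many if not all”, the copy numbers are made-up toy values, and the actual analysis additionally relies on the phylogenetic grouping of the duplicates.

```python
# Toy sketch: count myosin subtypes that are (at least) doubled relative to a
# closely related reference species. All counts below are hypothetical.

def duplicated_subtypes(target_counts, reference_counts):
    """Subtypes with at least twice as many copies in the target as in the reference."""
    duplicated = []
    for subtype, ref_n in reference_counts.items():
        if ref_n > 0 and target_counts.get(subtype, 0) >= 2 * ref_n:
            duplicated.append(subtype)
    return sorted(duplicated)


def interpret(target_counts, reference_counts, min_subtypes_for_wgd=3):
    """Very rough interpretation; min_subtypes_for_wgd is an assumed cut-off."""
    duplicated = duplicated_subtypes(target_counts, reference_counts)
    if not duplicated:
        return "no duplication signal"
    if len(duplicated) >= min_subtypes_for_wgd:
        return "consistent with a WGD"
    return "single gene or genomic region duplication more likely"


reference = {"8A": 1, "8B": 1, "11A": 1, "11B": 1, "11E": 1, "11G": 1}
many_duplicated = {"8A": 2, "8B": 1, "11A": 2, "11B": 2, "11E": 2, "11G": 1}
one_duplicated = {"8A": 1, "8B": 1, "11A": 2, "11B": 1, "11E": 1, "11G": 1}

print(interpret(many_duplicated, reference))
print(interpret(one_duplicated, reference))
```

A count like this can only flag candidates; whether the duplicates arose before or after a given branching event still has to be read from the gene trees.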
The qualitative analysis together with the phylogenetic tree reconstructions also allows for timing the WGD events. By resolving the phylogenetic relationship between species, we could, for example, support the proposed timing of WGDs in the Brassicales clade [15, 56, 58, 59]. In accordance with these studies, the myosin data also support placing the α and β WGD events after the divergence of the Papaya lineage from the Brassicales clade (Figure 9). Similarly, the WGD found in Malus x domestica was placed at the origin of the Maleae after their divergence from the Amygdaleae (containing Prunus persica, for instance; Figure 11A). We conclude that the myosin gene family could be very suitable for detecting ancient WGDs through phylogenetic reconstructions. Obviously, the plants retained many of the duplicated myosins after the WGD events and additional single gene duplications are rare. So far, the genes reported to have survived the ancient WGDs mainly belonged to transcription factors, transferases and their binding proteins, and protein kinases. The most popular models to describe gene duplications include neo- and subfunctionalization, dosage effects, and shielding against deleterious mutations. The reason for retaining so many similar myosins in plants has, however, not been determined yet. Myosins are not part of metabolic pathways, in which duplications of single genes have very strong effects, but are part of the intracellular transport machinery. Thus, duplicated myosins could have specialized in the transport of specific cargoes. Also, having a higher dosage of myosins after WGDs would probably not be harmful to the species.
In addition to the formerly described WGDs, we also found evidence indicating further WGDs (Figure 9, Additional files 3, 4 and 12): First, we found evidence for two very recent WGDs in Linum usitatissimum, of which only one had been suggested before. The myosin-8B, myosin-11A and myosin-11C subtypes clearly group into one-to-two-to-four patterns and three of the 11D subtype myosins are still present (Figure 2, Figure 11B). It is unlikely that seven independent myosins underwent single gene duplications or genomic region duplications in the short time since the divergence of Linum from Ricinus and Populus. Second, the myosin data indicated genome duplications in Gossypium raimondii. Recently, the genome of this cotton species had been sequenced independently by two groups [4, 61]. One group analysed synonymous nucleotide substitution (KS) values and the resulting single peak had been interpreted as a single WGD. The other determined an abrupt five- to sixfold ploidy in the cotton lineage shortly after its divergence from the ancestor shared with Theobroma cacao although KS values also only showed a single peak. In the genome analysis about 7,000 co-linearity supported gene triplets have been found. The myosins are also present as triplets in subtypes myosin-8A, -8B, -11A, -11E and their phylogeny does not show any one-to-two-to-four pattern (Figure 2, Figure 11C). Thus, instead of two consecutive WGDs our data would support a triplication that happened after separation of Theobroma from Gossypium. For the exact timing genome data from additional species of the Malvales branch would be necessary. Third, the number of homologs encoded by N. benthamiana is doubled in comparison to other Solanaceae, with the exception of subtype myosin-11B, of which only three instead of four homologs were identified in N. benthamiana (Figure 2, Additional file 12A). The N. benthamiana myosins always group together in single branches.
Therefore, we propose a genome duplication in Nicotiana benthamiana after its divergence from the other Solanaceae. Fourth, Manihot esculenta encodes duplicates of myosin-8A, -11A, -11B and -11E compared to Jatropha curcas, which encodes only one homolog of each of the myosin-11 subtypes (Figure 2, Additional file 12B). The one-to-two pattern of the duplicates indicates that the Manihot esculenta WGD happened after separation from Jatropha (Figure 9). Fifth, the myosin data suggest two WGDs or a genome triplication in the evolution of Phoenix dactylifera after its divergence from Musa acuminata (Figure 9). In detail, subtypes myosin-8A, -8B and -11A are present as triplets, subtypes -11E and -11G as duplets (Figure 2, Additional file 12C). In contrast, only a single WGD has been reported recently based on the analysis of a preliminary P. dactylifera annotation. Sixth and most notably, reconstruction of the class XI myosin family suggests another duplication in the ancestor of angiosperms in addition to the ϵ and ζ WGDs (Figure 10). However, in this case we cannot distinguish between a single gene and whole genome duplication. This might become possible when genome assemblies become available for species that diverged after separation of the Lycopodiophyta but before the Magnoliophyta became established.
In general, most whole genome assemblies were reported to contain only 80-90% genome coverage by comparing genome assembly sizes with experimental genome size estimations obtained by e.g. flow cytometry. Although most of the supposed missing genome sequence concerns telomere and other highly repetitive regions, myosin homologs might have been missed in our analysis due to gaps in the genome assemblies. However, the class VIII and class XI myosins consist of many subtypes. Even if one or several of the myosins were missing in a certain genome, the comparison of the (incomplete) myosin repertoire of the genome to the presented table of myosins across the plant phylum (Figure 2) allows reconstruction of WGDs and will also allow prediction of WGDs in upcoming plant genome assemblies. The phylogenetic analysis of the myosins in these upcoming assemblies together with the dataset presented here will also allow the timing of proposed WGDs. This way, WGDs can already be reconstructed and predicted for species for which only fragmented genome assemblies are available, which hinders synteny-based studies.
Based on phylogenetic tree reconstructions, we identified two class VIII myosin subtypes and eight class XI subfamilies. The topology of the subtypes together with the phylogeny of the homologs within the subtype branches allowed reconstructing the WGDs that occurred in the evolution of the tracheophytes. Although most known WGDs could be reproduced, the myosins did not reveal all known WGDs. Therefore, WGDs might have been missed in branches that do not show WGDs based on myosin data and for which further analyses are not yet available. The myosin data revealed evidence for two ancient, angiosperm-wide WGDs, potentially identical with the most ancient, formerly described WGDs occurring during seed plant and angiosperm evolution, called ϵ and ζ. In addition to reconstructing already known WGDs, we also propose further WGDs in the Manihot esculenta, Linum usitatissimum, Gossypium raimondii, Nicotiana benthamiana and Phoenix dactylifera lineages, and another possible WGD in the ancestor of the angiosperms. This is the first analysis of 67 completely sequenced plant genomes revealing most of the known WGD events by analysing a single protein family. We propose that myosin duplications not contained in the presented dataset but found in future sequenced species are very strong hints to further WGDs. The myosins will also be a strong complement where other methods are not appropriate or do not reveal clear answers.
Identification and annotation of the myosin heavy chain genes
The complete myosin heavy chain gene repertoires of Chlamydomonas reinhardtii, Ostreococcus lucimarinus, Ostreococcus tauri, Populus trichocarpa, Arabidopsis thaliana, Sorghum bicolor, and Oryza sativa were obtained from a previous analysis. The sequences were updated based on newer genome assemblies if necessary. Some minor ambiguities in the tail regions were corrected based on the comparative analysis with newly available genomes from plants of the same branch. The myosin genes of most other plant and algae species have essentially been obtained as described previously. Briefly, myosin genes have been identified in TBLASTN searches starting with the protein sequences of the Arabidopsis myosins. The respective genomic regions were submitted to AUGUSTUS to obtain gene predictions. However, feature sets are only available for a few plant species. Therefore, all hits were subsequently manually analysed at the genomic DNA level. When necessary, gene predictions were corrected by comparison with the other myosins as included in the multiple sequence alignment. As the number of plant myosin sequences increased (especially the number of sequences from taxa with few representatives), many of the initially predicted sequences were reanalysed to correctly identify all exon borders in the unique parts of the tail regions. Where possible, EST data have been analysed to help in the annotation process.
Recently, genome sequencing efforts have been extended from sequencing species from new branches to sequencing closely related organisms. Within the plants these species include for example Cucumis melo, Eucalyptus camaldulensis, Solanum pimpinellifolium, Lycopersicon esculentum, Eutrema halophilum (two different assemblies of Eutrema halophilum (Thellungiella halophila) are available [62, 63] that had been analysed independently here), and Fragaria vesca, of which the closely related species Cucumis sativus, Eucalyptus grandis, Solanum tuberosum, Eutrema parvulum, and Prunus persica had been sequenced before. Protein sequences from these closely related species have been obtained by using the cross-species functionality of WebScipio [34, 64]. Nevertheless, for all these genomes TBLASTN searches have been performed. With this strategy, we sought to ensure that we would not miss more divergent myosin homologs, which might have been derived by species-specific inventions or duplications.
All sequence-related data (protein names, corresponding species, GenBank IDs, alternative names, corresponding publications, domain predictions, sequences, and gene structure reconstructions) and references to genome sequencing centres are available at CyMoBase (http://www.cymobase.org). A list of the analysed species, their abbreviations as used in the alignments and trees, as well as detailed information on and acknowledgments of the respective sequencing centres are also available as Additional file 13. Most plant genomes have been published or are available from GenBank. Permission to use the myosin data from Aquilegia coerulea, Citrus clementina, Eucalyptus grandis, Panicum virgatum, and Phaseolus vulgaris was obtained from the genome project leaders. WebScipio [34, 64] was used to reconstruct the gene structure (i.e. the exon/intron pattern) of each sequence.
Generating the multiple sequence alignment
The plant myosin sequences were added to the structure-guided multiple sequence alignment obtained from . In detail, we first aligned every newly predicted sequence to its supposed closest relative using ClustalW and then added it to the multiple sequence alignment. During the subsequent sequence validation process, we manually adjusted the resulting alignment by removing wrongly predicted sequence regions and filling gaps. Still, many gaps remained in sequences derived from low-coverage genomes. To maintain the integrity of exons preceded or followed by gaps, gaps reflecting missing parts of the genomes were added to the multiple sequence alignment. The sequence alignment can be obtained from CyMoBase or Additional file 10. Reduced alignments containing sets of representative sequences of less than 90% identity were obtained by using the CD-HIT suite.
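The idea behind reducing an alignment to representatives sharing less than 90% identity can be sketched as follows. This is a simplified greedy selection in the spirit of CD-HIT (longest sequence first, each later sequence kept only if it stays below the threshold against all representatives so far), not CD-HIT's actual implementation: here identity is computed directly from pre-aligned sequences over columns where both have a residue, which is an assumption of this sketch. All sequence names and residues are invented.

```python
def pairwise_identity(a, b):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def pick_representatives(aligned, threshold=0.9):
    """Greedily keep sequences that share < threshold identity with every kept one.

    aligned maps sequence names to gapped sequences of equal length.
    Sequences are visited from most to fewest residues (ties broken by name).
    """
    order = sorted(
        aligned,
        key=lambda name: (-len(aligned[name].replace("-", "")), name),
    )
    representatives = []
    for name in order:
        if all(
            pairwise_identity(aligned[name], aligned[rep]) < threshold
            for rep in representatives
        ):
            representatives.append(name)
    return representatives


# Invented toy alignment: seqB differs from seqA at one of 20 positions (95%),
# seqC is much more divergent.
toy_alignment = {
    "seqA": "MKTAYIAKQRQISFVKSHFS",
    "seqB": "MKTAYIAKQRQISFVKSHFT",
    "seqC": "MRSAYLGK-RQLSWVRSHYA",
}
reps = pick_representatives(toy_alignment, threshold=0.9)
```

In this toy case seqB is absorbed by seqA, while seqC is retained as a second representative.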
Computing and visualising phylogenetic trees
For calculating phylogenetic trees, only complete and almost complete sequences (missing a maximum of 5% of the supposed full-length sequence, “Partials”) were included in the dataset (Additional file 1). As outgroup, class V myosin sequences from Homo sapiens, Mus musculus, Caenorhabditis elegans, Drosophila melanogaster and Saccharomyces cerevisiae were added. The phylogenetic trees were generated using three different methods: Neighbour Joining, Maximum likelihood and Bayesian inference.
1. ClustalW v.2.0.10 was used to calculate unrooted trees with the Neighbour Joining method. For each dataset, bootstrapping with 1,000 replicates was performed.
2. Maximum likelihood (ML) analysis with estimated proportion of invariable sites and bootstrapping (1,000 replicates) was performed using RAxML. To this end, ProtTest was first used to determine the most appropriate of the 112 available amino acid substitution models. Within ProtTest, the tree topology was calculated with the BioNJ algorithm, and both the branch lengths and the model of protein evolution were optimized simultaneously. The Akaike Information Criterion with a correction for small sample size (AICc, with alignment length representing sample size) identified the JTT model with a gamma model of rate heterogeneity as the best-fitting model.
3. Posterior probabilities were generated using MrBayes v3.2.1. Using the mixed amino-acid option, two independent runs with 10,000,000 generations, four chains, and a random starting tree were performed. MrBayes used the JTT model for all protein alignments. Trees were sampled every 1,000th generation, and the first 25% of the trees were discarded as “burn-in” before generating a consensus tree.
Phylogenetic trees were visualized with the CLC Sequence Viewer (http://www.clcbio.com) and iTOL and are available as Additional files 3 and 11.
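Two of the quantities mentioned above lend themselves to a short worked sketch in Python: the small-sample corrected Akaike Information Criterion, AICc = -2 ln L + 2k + 2k(k + 1)/(n - k - 1), and the number of sampled trees remaining after burn-in. The function names are hypothetical, the log-likelihood values in the example are made up purely for illustration, and the tree count assumes that the starting tree at generation 0 is also written to the sample, which may differ from how a given program counts.

```python
def aicc(log_likelihood, n_params, sample_size):
    """AICc = -2 lnL + 2k + 2k(k + 1) / (n - k - 1).

    sample_size (n) would be the alignment length when used as described above.
    """
    if sample_size - n_params - 1 <= 0:
        raise ValueError("AICc is undefined when n - k - 1 <= 0")
    aic = -2.0 * log_likelihood + 2.0 * n_params
    correction = (2.0 * n_params * (n_params + 1)) / (sample_size - n_params - 1)
    return aic + correction


def trees_after_burnin(generations, sample_every, burnin_fraction):
    """Return (sampled, discarded, kept) tree counts for a single run.

    Assumes generation 0 is sampled as well, hence the '+ 1'.
    """
    sampled = generations // sample_every + 1
    discarded = int(sampled * burnin_fraction)
    return sampled, discarded, sampled - discarded


# Made-up scores for two hypothetical candidate models on an alignment of
# 700 columns; the model with the lowest AICc would be preferred.
candidates = {
    "modelA": aicc(log_likelihood=-51234.0, n_params=150, sample_size=700),
    "modelB": aicc(log_likelihood=-51190.0, n_params=151, sample_size=700),
}
best_model = min(candidates, key=candidates.get)

# Run settings as stated in the text: 10,000,000 generations, sampling every
# 1,000th generation, first 25% discarded.
counts = trees_after_burnin(10_000_000, 1_000, 0.25)
```

Under these assumptions, each run would contribute roughly 7,500 trees to the consensus.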
Availability of supporting data
We would like to thank Björn Hammesfahr for help with CyMoBase and Prof. Christian Griesinger for continuous generous support. We thank Frederick Gmitter (Citrus clementina), Scott Jackson (Phaseolus vulgaris), Scott Hodges (Aquilegia coerulea), Jeremy Schmutz (Panicum virgatum), and Zander Myburg (Eucalyptus grandis) for giving permission to use the myosin data prior to publication of the genome projects. This work was partly supported by the Göttingen Graduate School of Neurosciences and Molecular Biosciences (DFG Grants GSC 226/1 and GSC 226/2).
- Jaillon O, Aury J-M, Wincker P: “Changing by doubling”, the impact of whole genome duplications in the evolution of eukaryotes. C R Biol. 2009, 332: 241-253. 10.1016/j.crvi.2008.07.007.
- Sémon M, Wolfe KH: Consequences of genome duplication. Curr Opin Genet Dev. 2007, 17: 505-512. 10.1016/j.gde.2007.09.007.
- D’Hont A, Denoeud F, Aury J-M, Baurens F-C, Carreel F, Garsmeur O, Noel B, Bocs S, Droc G, Rouard M, Silva CD, Jabbari K, Cardi C, Poulain J, Souquet M, Labadie K, Jourda C, Lengellé J, Rodier-Goud M, Alberti A, Bernard M, Correa M, Ayyampalayam S, Mckain MR, Leebens-Mack J, Burgess D, Freeling M, Mbéguié-A-Mbéguié D, Chabannes M, Wicker T, et al: The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature. 2012, 488: 213-217. 10.1038/nature11241.
- Wang K, Wang Z, Li F, Ye W, Wang J, Song G, Yue Z, Cong L, Shang H, Zhu S, Zou C, Li Q, Yuan Y, Lu C, Wei H, Gou C, Zheng Z, Yin Y, Zhang X, Liu K, Wang B, Song C, Shi N, Kohel RJ, Percy RG, Yu JZ, Zhu Y-X, Wang J, Yu S: The draft genome of a diploid cotton Gossypium raimondii. Nat Genet. 2012, 44: 1098-1103. 10.1038/ng.2371.
- Wang X, Wang H, Wang J, et al: The genome of the mesopolyploid crop species Brassica rapa. Nat Genet. 2011, 43: 1035-1039. 10.1038/ng.919.
- Fawcett JA, Maere S, Van de Peer Y: Plants with double genomes might have had a better chance to survive the Cretaceous–Tertiary extinction event. Proc Natl Acad Sci U S A. 2009, 106: 5737-5742. 10.1073/pnas.0900906106.
- Van de Peer Y: A mystery unveiled. Genome Biol. 2011, 12: 113. 10.1186/gb-2011-12-5-113.
- Velasco R, Zharkikh A, Affourtit J, Dhingra A, Cestaro A, Kalyanaraman A, Fontana P, Bhatnagar SK, Troggio M, Pruss D, Salvi S, Pindo M, Baldi P, Castelletti S, Cavaiuolo M, Coppola G, Costa F, Cova V, Ri AD, Goremykin V, Komjanc M, Longhi S, Magnago P, Malacarne G, Malnoy M, Micheletti D, Moretto M, Perazzolli M, Si-Ammour A, Vezzulli S, et al: The genome of the domesticated apple (Malus × domestica Borkh.). Nat Genet. 2010, 42: 833-839. 10.1038/ng.654.
- Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, Hyten DL, Song Q, Thelen JJ, Cheng J, Xu D, Hellsten U, May GD, Yu Y, Sakurai T, Umezawa T, Bhattacharyya MK, Sandhu D, Valliyodan B, Lindquist E, Peto M, Grant D, Shu S, Goodstein D, Barry K, Futrell-Griggs M, Abernathy B, Du J, Tian Z, Zhu L, et al: Genome sequence of the palaeopolyploid soybean. Nature. 2010, 463: 178-183. 10.1038/nature08670.
- Hufton AL, Panopoulou G: Polyploidy and genome restructuring: a variety of outcomes. Curr Opin Genet Dev. 2009, 19: 600-606. 10.1016/j.gde.2009.10.005.
- Jaillon O, Aury J-M, Noel B, Policriti A, Clepet C, Casagrande A, Choisne N, Aubourg S, Vitulo N, Jubin C, Vezzi A, Legeai F, Hugueney P, Dasilva C, Horner D, Mica E, Jublot D, Poulain J, Bruyère C, Billault A, Segurens B, Gouyvenoux M, Ugarte E, Cattonaro F, Anthouard V, Vico V, Del Fabbro C, Alaux M, Di Gaspero G, Dumas V, et al: The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature. 2007, 449: 463-467. 10.1038/nature06148.
- Jiao Y, Leebens-Mack J, Ayyampalayam S, Bowers JE, McKain MR, McNeal J, Rolf M, Ruzicka DR, Wafula E, Wickett NJ, Wu X, Zhang Y, Wang J, Zhang Y, Carpenter EJ, Deyholos MK, Kutchan TM, Chanderbali AS, Soltis PS, Stevenson DW, McCombie R, Pires JC, Wong GK-S, Soltis DE, Depamphilis CW: A genome triplication associated with early diversification of the core eudicots. Genome Biol. 2012, 13: R3. 10.1186/gb-2012-13-1-r3.
- Tang H, Wang X, Bowers JE, Ming R, Alam M, Paterson AH: Unraveling Ancient Hexaploidy Through Multiply-Aligned Angiosperm Gene Maps. Genome Res. 2008, 18: 1944-1954. 10.1101/gr.080978.108.
- Jiao Y, Wickett NJ, Ayyampalayam S, Chanderbali AS, Landherr L, Ralph PE, Tomsho LP, Hu Y, Liang H, Soltis PS, Soltis DE, Clifton SW, Schlarbaum SE, Schuster SC, Ma H, Leebens-Mack J, de Pamphilis CW: Ancestral polyploidy in seed plants and angiosperms. Nature. 2011, 473: 97-100. 10.1038/nature09916.
- Van de Peer Y, Fawcett JA, Proost S, Sterck L, Vandepoele K: The flowering world: a tale of duplications. Trends Plant Sci. 2009, 14: 680-688. 10.1016/j.tplants.2009.09.001.
- Pfeil BE, Schlueter JA, Shoemaker RC, Doyle JJ: Placing paleopolyploidy in relation to taxon divergence: a phylogenetic analysis in legumes using 39 gene families. Syst Biol. 2005, 54: 441-454. 10.1080/10635150590945359.
- Charon C, Bruggeman Q, Thareau V, Henry Y: Gene duplication within the Green Lineage: the case of TEL genes. J Exp Bot. 2012, 63: 5061-5077. 10.1093/jxb/ers181.
- Hatje K, Kollmar M: A phylogenetic analysis of the brassicales clade based on an alignment-free sequence comparison method. Front Plant Sci. 2012, 3: 192.
- Sampedro J, Lee Y, Carey RE, de Pamphilis C, Cosgrove DJ: Use of genomic history to improve phylogeny and understanding of births and deaths in a gene family. Plant J. 2005, 44: 409-419. 10.1111/j.1365-313X.2005.02540.x.
- Airoldi CA, Davies B: Gene duplication and the evolution of plant MADS-box transcription factors. J Genet Genomics. 2012, 39: 157-165. 10.1016/j.jgg.2012.02.008.
- Krendel M, Mooseker MS: Myosins: tails (and heads) of functional diversity. Physiology (Bethesda). 2005, 20: 239-251. 10.1152/physiol.00014.2005.
- Schliwa M, Woehlke G: Molecular motors. Nature. 2003, 422: 759-765. 10.1038/nature01601.
- Odronitz F, Kollmar M: Drawing the tree of eukaryotic life based on the analysis of 2,269 manually annotated myosins from 328 species. Genome Biol. 2007, 8: R196. 10.1186/gb-2007-8-9-r196.
- CyMoBase - a database for cytoskeletal and motor proteins. http://www.cymobase.org
- Odronitz F, Kollmar M: Pfarao: a web application for protein family analysis customized for cytoskeletal and motor proteins (CyMoBase). BMC Genomics. 2006, 7: 300. 10.1186/1471-2164-7-300.
- Pennisi E: The greening of plant genomics. Science. 2007, 317: 317. 10.1126/science.317.5836.317.
- Knight AE, Kendrick-Jones J: A myosin-like protein from a higher plant. J Mol Biol. 1993, 231: 148-154. 10.1006/jmbi.1993.1266.
- Kinkema M, Wang H, Schiefelbein J: Molecular analysis of the myosin gene family in Arabidopsis thaliana. Plant Mol Biol. 1994, 26: 1139-1153. 10.1007/BF00040695.
- Kinkema M, Schiefelbein J: A myosin from a higher plant has structural similarities to class V myosins. J Mol Biol. 1994, 239: 591-597. 10.1006/jmbi.1994.1400.
- Peremyslov VV, Mockler TC, Filichkin SA, Fox SE, Jaiswal P, Makarova KS, Koonin EV, Dolja VV: Expression, Splicing, and Evolution of the Myosin Gene Family in Plants. Plant Physiol. 2011, 155: 1191-1204. 10.1104/pp.110.170720.
- Reddy AS, Day IS: Analysis of the myosins encoded in the recently completed Arabidopsis thaliana genome sequence. Genome Biol. 2001, 2: research0024.1-research0024.17. 10.1186/gb-2001-2-7-research0024.
- Stanke M, Morgenstern B: AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Res. 2005, 33: W465-W467. 10.1093/nar/gki458.
- Burge C, Karlin S: Prediction of complete gene structures in human genomic DNA. J Mol Biol. 1997, 268: 78-94. 10.1006/jmbi.1997.0951.
- Hatje K, Keller O, Hammesfahr B, Pillmann H, Waack S, Kollmar M: Cross-species protein sequence and gene structure prediction with fine-tuned Webscipio 2.0 and Scipio. BMC Res Notes. 2011, 4: 265. 10.1186/1756-0500-4-265.
- Jiang S, Ramachandran S: Identification and molecular characterization of myosin gene family in Oryza sativa genome. Plant Cell Physiol. 2004, 45: 590-599. 10.1093/pcp/pch061.
- Berg JS, Powell BC, Cheney RE: A millennial myosin census. Mol Biol Cell. 2001, 12: 780-794. 10.1091/mbc.12.4.780.
- Lagercrantz U: Comparative mapping between Arabidopsis thaliana and Brassica nigra indicates that Brassica genomes have evolved through extensive genome replication accompanied by chromosome fusions and frequent rearrangements. Genetics. 1998, 150: 1217-1228.
- Letunic I, Doerks T, Bork P: SMART 7: recent updates to the protein domain annotation resource. Nucleic Acids Res. 2011, 40: D302-D305.
- Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K, Holm L, Sonnhammer ELL, Eddy SR, Bateman A: The Pfam protein families database. Nucleic Acids Res. 2010, 38: D211-D222. 10.1093/nar/gkp985.
- Ponting CP: AF-6/cno: neither a kinesin nor a myosin, but a bit of both. Trends Biochem Sci. 1995, 20: 265-266. 10.1016/S0968-0004(00)89040-4.
- Pashkova N, Jin Y, Ramaswamy S, Weisman LS: Structural basis for myosin V discrimination between distinct cargoes. EMBO J. 2006, 25: 693-700. 10.1038/sj.emboj.7600965.
- Heuck A, Fetka I, Brewer DN, Hüls D, Munson M, Jansen R-P, Niessing D: The structure of the Myo4p globular tail and its function in ASH1 mRNA localization. J Cell Biol. 2010, 189: 497-510. 10.1083/jcb.201002076.
- Crooks GE, Hon G, Chandonia JM, Brenner SE: WebLogo: a sequence logo generator. Genome Res. 2004, 14: 1188-1190. 10.1101/gr.849004.
- Arrigo N, Barker MS: Rarely successful polyploids and their legacy in plant genomes. Curr Opin Plant Biol. 2012, 15: 140-146. 10.1016/j.pbi.2012.03.010.
- Bausher MG, Singh ND, Lee S-B, Jansen RK, Daniell H: The complete chloroplast genome sequence of Citrus sinensis (L.) Osbeck var “Ridge Pineapple”: organization and phylogenetic relationships to other angiosperms. BMC Plant Biol. 2006, 6: 21. 10.1186/1471-2229-6-21.
- Rensing SA, Lang D, Zimmer AD, Terry A, Salamov A, Shapiro H, Nishiyama T, Perroud P-F, Lindquist EA, Kamisugi Y, Tanahashi T, Sakakibara K, Fujita T, Oishi K, Shin-I T, Kuroki Y, Toyoda A, Suzuki Y, Hashimoto S-i, Yamaguchi K, Sugano S, Kohara Y, Fujiyama A, Anterola A, Aoki S, Ashton N, Barbazuk WB, Barker E, Bennetzen JL, Blankenship R, et al: The Physcomitrella Genome Reveals Evolutionary Insights into the Conquest of Land by Plants. Science. 2008, 319: 64-69. 10.1126/science.1150646.
- Cui L, Wall PK, Leebens-Mack JH, Lindsay BG, Soltis DE, Doyle JJ, Soltis PS, Carlson JE, Arumuganathan K, Barakat A, Albert VA, Ma H, de Pamphilis CW: Widespread genome duplications throughout the history of flowering plants. Genome Res. 2006, 16: 738-749. 10.1101/gr.4825606.
- Gaut BS, Doebley JF: DNA sequence evidence for the segmental allotetraploid origin of maize. PNAS. 1997, 94: 6809-6814. 10.1073/pnas.94.13.6809.
- Missaoui AM, Paterson AH, Bouton JH: Investigation of genomic organization in switchgrass (Panicum virgatum L.) using DNA markers. Theor Appl Genet. 2005, 110: 1372-1383. 10.1007/s00122-005-1935-6.
- Beardsley PM, Schoenig SE, Whittall JB, Olmstead RG: Patterns of evolution in western North American Mimulus (Phrymaceae). Am J Bot. 2004, 91: 474-489. 10.3732/ajb.91.3.474.
- The Tomato Genome Consortium: The tomato genome sequence provides insights into fleshy fruit evolution. Nature. 2012, 485: 635-641. 10.1038/nature11119.
- Schlueter JA, Scheffler BE, Jackson S, Shoemaker RC: Fractionation of Synteny in a Genomic Region Containing Tandemly Duplicated Genes across Glycine max, Medicago truncatula, and Arabidopsis thaliana. J Hered. 2008, 99: 390-395. 10.1093/jhered/esn010.
- Shoemaker RC, Polzin K, Labate J, Specht J, Brummer EC, Olson T, Young N, Concibido V, Wilcox J, Tamulonis JP, Kochert G, Boerma HR: Genome Duplication in Soybean (Glycine Subgenus Soja). Genetics. 1996, 144: 329-338.
- Tuskan GA, DiFazio S, Jansson S, et al: The Genome of Black Cottonwood, Populus trichocarpa (Torr. & Gray). Science. 2006, 313: 1596-1604. 10.1126/science.1128691.
- Wang Z, Hobson N, Galindo L, Zhu S, Shi D, McDill J, Yang L, Hawkins S, Neutelings G, Datla R, Lambert G, Galbraith DW, Grassa CJ, Geraldes A, Cronk QC, Cullis C, Dash PK, Kumar PA, Cloutier S, Sharpe AG, Wong GK-S, Wang J, Deyholos MK: The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads. Plant J. 2012, 72: 461-473. 10.1111/j.1365-313X.2012.05093.x.
- Bowers JE, Chapman BA, Rong J, Paterson AH: Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature. 2003, 422: 433-438. 10.1038/nature01521.
- Tang H, Bowers JE, Wang X, Paterson AH: Angiosperm genome comparisons reveal early polyploidy in the monocot lineage. PNAS. 2010, 107: 472-477. 10.1073/pnas.0908007107.
- Ming R, Hou S, Feng Y, Yu Q, Dionne-Laporte A, Saw JH, Senin P, Wang W, Ly BV, Lewis KLT, Salzberg SL, Feng L, Jones MR, Skelton RL, Murray JE, Chen C, Qian W, Shen J, Du P, Eustice M, Tong E, Tang H, Lyons E, Paull RE, Michael TP, Wall K, Rice DW, Albert H, Wang M-L, Zhu YJ, et al: The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus). Nature. 2008, 452: 991-996. 10.1038/nature06856.
- Lyons E, Pedersen B, Kane J, Alam M, Ming R, Tang H, Wang X, Bowers J, Paterson A, Lisch D, Freeling M: Finding and Comparing Syntenic Regions among Arabidopsis and the Outgroups Papaya, Poplar, and Grape: CoGe with Rosids. Plant Physiol. 2008, 148: 1772-1781. 10.1104/pp.108.124867.
- Innan H, Kondrashov F: The evolution of gene duplications: classifying and distinguishing between models. Nat Rev Genet. 2010, 11: 97-108.
- Paterson AH, Wendel JF, Gundlach H, Guo H, Jenkins J, Jin D, Llewellyn D, Showmaker KC, Shu S, Udall J, Yoo M, Byers R, Chen W, Doron-Faigenboim A, Duke MV, Gong L, Grimwood J, Grover C, Grupp K, Hu G, Lee T, Li J, Lin L, Liu T, Marler BS, Page JT, Roberts AW, Romanel E, Sanders WS, Szadkowski E, et al: Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres. Nature. 2012, 492: 423-427. 10.1038/nature11798.
- Wu H-J, Zhang Z, Wang J-Y, Oh D-H, Dassanayake M, Liu B, Huang Q, Sun H-X, Xia R, Wu Y, Wang Y-N, Yang Z, Liu Y, Zhang W, Zhang H, Chu J, Yan C, Fang S, Zhang J, Wang Y, Zhang F, Wang G, Lee SY, Cheeseman JM, Yang B, Li B, Min J, Yang L, Wang J, Chu C, et al: Insights into salt tolerance from the genome of Thellungiella salsuginea. Proc Natl Acad Sci U S A. 2012, 109: 12219-12224. 10.1073/pnas.1209954109.
- Yang R, Jarvis DE, Chen H, Beilstein MA, Grimwood J, Jenkins J, Shu S, Prochnik S, Xin M, Ma C, Schmutz J, Wing RA, Mitchell-Olds T, Schumaker KS, Wang X: The Reference Genome of the Halophytic Plant Eutrema salsugineum. Front Plant Sci. 2013, 4: 46.
- Odronitz F, Pillmann H, Keller O, Waack S, Kollmar M: WebScipio: an online tool for the determination of gene structures using protein sequences. BMC Genomics. 2008, 9: 422. 10.1186/1471-2164-9-422.
- Thompson JD, Gibson TJ, Higgins DG: Multiple sequence alignment using ClustalW and ClustalX. Curr Protoc Bioinformatics. 2002, Chapter 2: Unit 2.3.
- Li W, Godzik A: Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006, 22: 1658-1659. 10.1093/bioinformatics/btl158.
- Stamatakis A, Hoover P, Rougemont J: A rapid bootstrap algorithm for the RAxML Web servers. Syst Biol. 2008, 57: 758-771. 10.1080/10635150802429642.
- Darriba D, Taboada GL, Doallo R, Posada D: ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics. 2011, 27: 1164-1165. 10.1093/bioinformatics/btr088.
- Ronquist F, Huelsenbeck JP: MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003, 19: 1572-1574. 10.1093/bioinformatics/btg180.
- Jones DT, Taylor WR, Thornton JM: A new approach to protein fold recognition. Nature. 1992, 358: 86-89. 10.1038/358086a0.
- Letunic I, Bork P: Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics. 2007, 23: 127-128. 10.1093/bioinformatics/btl529.
- Hammesfahr B, Odronitz F, Mühlhausen S, Waack S, Kollmar M: GenePainter: a fast tool for aligning gene structures of eukaryotic protein families, visualizing the alignments and mapping gene structures onto protein structures. BMC Bioinformatics. 2013, 14: 77. 10.1186/1471-2105-14-77.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.