In EST databases, multiple sequences encoding the same RP are generally found because RP genes are highly expressed, so their mRNAs are abundant and over-represented in cDNA libraries. In addition, large-scale random sequencing of cDNAs consistently yields a high number of sequences encoding RPs. As expected, in the S. cephaloptera EST database, despite the relatively low number of clones representing protein-coding transcripts (only 2396), we were able to deduce the complete amino acid sequences of 55 RPs, probably because a) inserts larger than 800 bp were selected for this cDNA library, b) although only the 5'-ends of the clones from this library were sequenced, the generally short length of RP mRNAs allows complete sequences to be obtained, c) 40.3% of the protein-coding clones correspond to RP mRNAs, and d) most of the RP genes (61%) were found in EST clusters composed of 4 or more sequences. The two extreme examples are the S9 and L44/L36a proteins, which were encoded by 45 and 42 ESTs, respectively (Table 2). However, we found only 1 EST each for SA, S2, S3a and L2/L8; in such cases, if the ORF contains frameshift(s) and/or is partial, the complete protein sequence cannot be obtained. In addition, ESTs encoding 14 RPs are missing. This is not surprising for two of these proteins: P0-like, which is missing in rat, and L24-like, which is probably not associated with the ribosome. For the others, the absence is probably due to a bias in the EST database. The large disproportion in the numbers of ESTs encoding the different RPs in the S. cephaloptera database could also simply be the consequence of factors such as differences in mRNA stability, in the efficiency of mRNA copying, and in the insertion of cDNAs into the lambda phage.
Eukaryotic RP genes appear to belong to multigene families [27, 28]; however, great differences have been found between the kingdoms. In yeast, where approximately half of the RP genes are duplicated, both gene copies are transcribed in all cases, although their expression levels often differ considerably; moreover, the proteins encoded by duplicated genes have identical or virtually identical sequences and are functionally indistinguishable. In plants, multiple functional RP isoforms can be produced. In contrast, in animals generally only a single gene encodes each RP, and the other members of each gene family are pseudogenes. Consequently, analyses of animal EST databases reveal that each RP appears to have only one type of mRNA. Exceptions are rare: for example, in the channel catfish (Ictalurus punctatus), EST analysis has revealed that, except for three protein types, each RP type is encoded by only one type of mRNA [44, 45]. Surprisingly, analysis of the S. cephaloptera EST database reveals a more complicated pattern. Almost every isoform can be encoded by two mRNA subtypes that differ only by a short region in their 5'-end sequences. Moreover, in approximately half of the RP gene families, deduced isoforms have been found, and the generally relatively high number of clones suggests that the corresponding mRNAs are probably translated and the proteins functional. These two features complicate our understanding of the molecular evolutionary history of chaetognaths. Moreover, PCR experiments have shown that, for at least 4 ribosomal gene families, the gene variants are intraindividual variations (Table 4).
Within an mRNA type, the sequences can generally be divided into two subtypes (named TTT and TAC) that differ by a short region in the 5'-UTR. These regions could correspond to two different transcription factor binding sites. In S. cephaloptera, each of the many complete RP cDNAs has only one of the putative binding sites, which means that the two subtypes could have differential transcription patterns, whereas in other taxa this feature is restricted to a small number of RP genes. As differences in promoters generally correspond to diverse control of RP gene expression in specific tissues, our results suggest that one of the putative promoter sites (probably TTT, which yields more mRNA) plays a role under housekeeping conditions, whereas the other site (TAC, which yields less mRNA) would allow expression of RPs when a very large quantity of RPs is essential in specific tissues and/or at the most crucial developmental stages. In addition, almost all the RP genes contain one of the two putative binding sites. Two other elements favour this hypothesis. The first is that the two subtypes for a given gene family encode identical proteins, except in one case. The second is that bioinformatic prediction suggests that one of the 5'-end regions could constitute a binding site for members of a homeobox factor family, which are tissue-specific transcription factors and critical regulators of whole-organ development.
The analysis of the chaetognath EST database has also revealed a relatively large number of RP paralogs. While some of them, represented by a low number of ESTs, could be artefacts, the others, represented by a higher number of ESTs (Table 3), could have a physiological significance. Indeed, the identical or similar sizes of the paralogous members of each protein family, together with cDNA analyses, show that these isoforms are not due to the expression of differentially spliced mRNAs. Moreover, bioinformatic analyses suggest that most of the isoforms exhibit differences in their biologically significant sites (column 5 in Table 3). RP isoforms have been found in other animal taxa, but the number of paralogs is lower. For example, in the channel catfish, if alternatively spliced transcripts (which concern only the S3 family) are excluded, paralogs have been found for only two types of RPs (S26 and S27); one paralog pair has 94.8% amino acid identity, whereas the other pair differs by only one amino acid [44, 45]. In human, two RP genes on the sex chromosomes, one on the Y and one on the X, are both widely transcribed in tissues and encode two isoforms of the S4 RP, which differ at 19 of the 263 amino acids. In addition, two genes encoding different L39 proteins have also been found in human. In rat, two functional genes are reported for S27; multiple transcripts encode each isoform and exhibit different tissue expression patterns. Moreover, in the sponge Suberites domuncula, no RP isoforms have been found. To our knowledge, apart from chaetognaths, the presence of numerous RP isoforms has been demonstrated only in plants (and references therein).
For example, owing to the extensive segmental duplication of the Arabidopsis genome, all its RP genes have between two and several paralogs; assessing RP gene expression by the presence of an EST showed that at least 77% of RP genes (not including the 21 genes with incomplete ORFs) are expressed at a level detectable by an EST.
The roles of multiple functional RP isoforms in plants remain unclear, although it has been proposed that expression of multiple RP genes from a single family may be necessary to accommodate high, or specific, translational needs in growing plant tissue; thus, RP gene copies under developmental regulation may be required in addition to those that are constitutively expressed [29, 49]. For example, the Arabidopsis RP gene L16 is present as two copies in the genome, with one isoform expressed in proliferating tissues and the other in more specific tissues; similarly, differential transcriptional regulation of the two RP L23A genes has been reported. Moreover, differential expression of homeologous (i.e., duplicated by polyploidy) 18S-5.8S-26S rRNA genes has been shown in plant allopolyploids, and expression of multiple genes in an RP gene family may be indicative of ribosome heterogeneity. Surprisingly, chaetognaths exhibit numerous molecular analogies with plants: two classes of paralogs of the 18S-28S rRNA genes have been reported [32, 33], which could be the result of an allopolyploidization event in the ancestor of all extant chaetognaths. Moreover, in S. cephaloptera, one of the 18S classes plays a ubiquitous role whereas the other is specific to oocytes. The large number of RP paralogs in this species could be the result of this allopolyploidy, and we hypothesize that two populations of ribosomes could exist in chaetognath cells. One would contain the housekeeping rRNA (class I) together with the isoforms for which numerous mRNAs have been found in the EST database and which give relatively short branches in phylogenetic reconstructions (data not shown); the other would contain the class II rRNAs together with the other isoforms. Moreover, a preliminary observation suggests that, in chaetognaths, the positive or negative selection of RP families that contain paralogs probably has functional reasons.
Indeed, in Escherichia coli, it has been shown that most RP genes are crucial for ribosome assembly or function, such as those encoding the early assembly proteins (S4, S7, S8, S15, S17, L2, L3, L4, L5, L15, L18), the proteins forming the bridges between the two subunits (S13, S15, S19, L2, L5, L14), those in contact with tRNA (S7, S9, S12, S13, L1, L5), and those surrounding the polypeptide exit channel (L22, L24, L29). It is interesting to compare this list of proteins with that of the chaetognath putative isoforms (Table 3): only S7, S8, S15, S17, L15 and L22 are present in both lists (i.e., have isoforms and fulfil one of the above functions). In addition, for S7, L15 and L22, one paralog is encoded by a single clone (EST = 1 in Table 3), suggesting possible sequencing artefacts, and the two S15 paralogous proteins are 100% similar. Therefore, we hypothesize that paralogs of "crucial RPs" could be strongly unfavourable. If RP paralogs with various non-ribosomal functions were to interact with the ribosome, this could inactivate the translation machinery. Conversely, if this occurs with other, non-crucial RPs, it could be selectively neutral.
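The comparison of the two lists above can be retraced with a few set operations. The following Python sketch is only a bookkeeping aid, not part of the original analysis: the four functional categories and the six overlapping proteins are copied from the text, whereas the variable names are hypothetical and the full content of Table 3 is not reproduced here.

```python
# E. coli RPs considered crucial, grouped by the functional categories
# listed in the text (names of the dictionary keys are illustrative only).
crucial_by_function = {
    "early assembly": {"S4", "S7", "S8", "S15", "S17", "L2", "L3", "L4",
                       "L5", "L15", "L18"},
    "intersubunit bridges": {"S13", "S15", "S19", "L2", "L5", "L14"},
    "tRNA contact": {"S7", "S9", "S12", "S13", "L1", "L5"},
    "polypeptide exit channel": {"L22", "L24", "L29"},
}

# Non-redundant set of crucial RPs (several proteins appear in two categories).
all_crucial = set().union(*crucial_by_function.values())

# Crucial RPs that the text reports as also having putative isoforms (Table 3).
crucial_with_isoforms = {"S7", "S8", "S15", "S17", "L15", "L22"}

# Cases the text flags as weak evidence for a genuinely distinct paralog.
single_clone_paralog = {"S7", "L15", "L22"}   # one paralog has EST = 1
identical_paralogs = {"S15"}                  # paralogs are 100% similar

# Crucial RPs whose paralogs are not discounted by either criterion.
remaining = crucial_with_isoforms - single_clone_paralog - identical_paralogs

print(len(all_crucial))    # 20
print(sorted(remaining))   # ['S17', 'S8']
```

Under these assumptions, only 6 of the 20 non-redundant crucial RPs appear among the putative isoforms, and only S8 and S17 remain once single-clone paralogs and identical paralogs are set aside.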
Alternatively, expression of multiple gene family members may also be indicative of multiple functions for RPs from any given gene family, with some members having ribosomal functions and others extra-ribosomal roles. It is well known that many RPs perform additional extra-ribosomal functions in cells. In mammals, where the number of RP paralogs is very low, RPs also exhibit various secondary functions in DNA repair, apoptosis, drug resistance and proliferation. They are involved in different cellular processes, from replication and regulation of cell growth to apoptosis and malignant transformation [56, 57], and consequently the expression of their genes can vary considerably [58, 59]. In addition, zebrafish carrying heterozygous mutations in a number of RPs are predisposed to cancer. In our view, when two or more paralogous RPs exhibit several differences in their primary sequences, one of the paralogs probably plays its "conventional role" as a component of the ribosome, while the other(s) perform(s) extra-ribosomal functions. Moreover, it has been proposed that gnathostomes underwent two polyploidization events leading to octaploidy, and in this clade generally a single gene is functional in each RP gene family, suggesting that all the paralogs but one are subject to strong counter-selection. This is not the case in chaetognaths, putative allopolyploids, which could have overcome the deleterious effects of paralogous RPs. Interestingly, in S. cephaloptera, in more than half of the RP paralog families, the percentage of identity between the members of each family is less than 93% (Table 3); this could correspond to subfunctionalization, in which, after polyploidization, the homoeologous copies specialize to perform complementary functions [62, 63]. A large number of RP paralogs raises another problem: the ribosome is an intricate ribonucleoprotein complex with a multitude of protein constituents present in equimolar amounts.
Coordinating the synthesis of these RPs presents a major challenge to the cell and results from the sum of all regulatory mechanisms (transcriptional, posttranscriptional, translational and posttranslational) acting on each RP gene. The presence of multiple (often more than two) functional genes encoding each RP makes coordinated expression substantially more complex. Chaetognaths, which to date seem unique among animals in carrying multiple paralogous functional RP genes, contradict current knowledge of the coordinated systems of RP gene expression in animals. This is probably further evidence of the uniqueness of this phylum among animals, as already noted at the anatomical and histological levels. In the future, comparing RP gene regulation in chaetognaths with that in other animals should provide fruitful data.
Despite the use of several methods, the phylogenetic relationships of chaetognaths are not resolved by the present study. Two biases appear likely to affect our reconstructions: the LBA artefact and the composition artefact, the latter evidenced by contrasting G+C levels at the least constrained third codon positions. Such artefacts may produce wrong clades with strong branch support, and we suspect this is the case for the chaetognath and Drosophila "clade". However, for the second type of bias, the ML non-stationary analyses, which allow G+C content to vary, still group the chaetognath with Drosophila, both of which are long-branch species. Furthermore, the second codon position and amino acid datasets should be much less susceptible to composition bias, yet they yield the same group. The non-homogeneous amino acid model CAT, shown to be the most robust method against LBA, although at the cost of lower posterior support values, yielded a topology that did not join chaetognath and Drosophila; this suggests that LBA is a more important bias than composition artefacts when inferring chaetognath phylogenetic relationships. This analysis, which does not group "long branch" species with similar base compositions (chaetognath and Drosophila), is also in agreement with previous works such as Marletaz et al., although the posterior supports are very low, as expected with this method. Therefore, the LBA artefact seems to affect our phylogenetic reconstruction more than the base composition bias: the methods that are supposed to "correct" for GC-content variation among lineages do not change the topology obtained with more standard methods, whereas the method supposed to correct for LBA does change it.
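The composition bias discussed above is usually quantified as the G+C content at third codon positions (often called GC3). The following Python sketch illustrates one simple way to compute it; it is an illustration only, not the procedure used in this study. The function name `gc3` and the two toy sequences are hypothetical, and the input is assumed to be an in-frame coding sequence.

```python
def gc3(cds: str) -> float:
    """Return the fraction of G or C among third codon positions.

    Assumes `cds` is an in-frame coding sequence; trailing bases that do
    not form a complete codon and ambiguous bases (e.g. N) are ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3          # drop an incomplete last codon
    thirds = [b for b in seq[2:usable:3] if b in "ACGT"]
    if not thirds:
        raise ValueError("no unambiguous third codon positions found")
    return sum(b in "GC" for b in thirds) / len(thirds)


# Two toy sequences (not real data) encoding the same peptide, Met-Ala-Asp-Gly,
# with synonymous codons that differ only at the third position.
gc_rich = "ATGGCCGACGGC"
at_rich = "ATGGCTGATGGA"

print(gc3(gc_rich))  # 1.0
print(gc3(at_rich))  # 0.25
```

The toy example shows why third positions are informative here: two sequences encoding identical proteins can differ sharply in GC3, so lineages with similar GC3 values may be grouped together by models that assume a homogeneous base composition.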
Marletaz et al., building a dataset of concatenated RP genes for 17 taxa that included S. cephaloptera, recovered the deuterostomian clade with high bootstrap support, whereas the chaetognaths clustered strongly with protostomes (bootstrap 98%), and their position as a sister group to all other protostomes was supported by a weak bootstrap value (51%). As we analysed the paralogy of all the RP gene families and, after preliminary phylogenetic analyses, used only the paralogs with the shorter branches, we hoped to obtain a similar topology but with strongly supported nodes. This was not the case, probably illustrating that the number of taxa is of major importance when LBA artefacts are at play. Our thorough phylogenetic analyses, using non-homogeneous and non-stationary models as well as the CAT mixture model for the first time on this dataset, helped to identify and correct specific sources of artefactual branch attraction. We can now predict that improvements in inferring the phylogenetic relationships of the chaetognath phylum will rely on using the PhyloBayes program with the CAT model on a taxonomically wider dataset than the one used in the present study, such as that of Marletaz et al. Moreover, compared with that study, we also had the advantage of choosing only the most conserved RP paralogs (by discarding the divergent ones). However, in spite of these various improvements, the present results confirm the difficulty of finding the exact phylogenetic relationships of chaetognaths.