We recently developed an informatics method [13] that estimates the orderliness of synonymous codon usage (SCUO) and hence the amount of synonymous codon usage bias. The method is based on Shannon's information theory and the concept of entropy, and it allows codon usage bias to be compared within and across genomes.

To calculate SCUO, we created a codon table for the amino acids that have more than one codon, indexed arbitrarily so that we could unambiguously refer to the *j*-th (degenerate) codon of amino acid *i*, 1 ≤ *i* ≤ 18. In mycoplasmas, Trp was also included in the codon table, since the standard stop codon TGA encodes Trp in these species, so that 1 ≤ *i* ≤ 19. For simplicity, the following description of the method is based only on the standard genetic code table, although the actual SCUO computation handled the special cases of different organisms.

Let *n*_{i} represent the number of degenerate codons for amino acid *i*, so 1 ≤ *j* ≤ *n*_{i}: for example, 1 ≤ *j* ≤ 6 for leucine and 1 ≤ *j* ≤ 2 for tyrosine. For each sequence, let *x*_{ij} represent the number of times that synonymous codon *j* of amino acid *i* is present, 1 ≤ *i* ≤ 18, 1 ≤ *j* ≤ *n*_{i}. Normalizing the *x*_{ij} by their sum over *j* gives *p*_{ij}, the frequency of the *j*-th degenerate codon for amino acid *i* in each sequence.

Following information theory, we define the entropy *H*_{ij} of the *j*-th codon of the *i*-th amino acid in each sequence by

*H*_{ij} = -*p*_{ij} log₂ *p*_{ij}

Summing over the codons representing amino acid *i* gives the entropy of the *i*-th amino acid in each sequence:

*H*_{i} = Σ_{j=1}^{n_i} *H*_{ij} = -Σ_{j=1}^{n_i} *p*_{ij} log₂ *p*_{ij}

If the synonymous codons for the *i*-th amino acid were used at random, one would expect a uniform distribution of them as representatives of the *i*-th amino acid, i.e., *p*_{ij} = 1/*n*_{i}. Thus, the maximum entropy for the *i*-th amino acid in each sequence is

*H*_{i}^{max} = -Σ_{j=1}^{n_i} (1/*n*_{i}) log₂(1/*n*_{i}) = log₂ *n*_{i}

If only one of the synonymous codons is used for the *i*-th amino acid, i.e., the usage of the synonymous codons is biased to the extreme, then the *i*-th amino acid in each sequence has the minimum entropy:

*H*_{i}^{min} = 0

Unlike Shannon's definition of information, Gatlin [33] and Layzer [34] define information as the difference between the maximum entropy and the actual entropy, and use it as an index of orderliness. The greater the information, the more ordered the sequence [35]. In our case, this information measures the nonrandomness of synonymous codon usage and therefore describes the degree of orderliness of synonymous codon usage for the *i*-th amino acid in each sequence.

Let *O*_{i} be the normalized difference between the maximum entropy and the observed entropy for the *i*-th amino acid in each sequence, i.e.,

*O*_{i} = (*H*_{i}^{max} - *H*_{i}) / *H*_{i}^{max}

Obviously, 0 ≤ *O*_{i} ≤ 1. When the synonymous codon usage for the *i*-th amino acid is random, *O*_{i} = 0. When this usage is biased to the extreme, *O*_{i} = 1. Thus, *O*_{i} can be thought of as a measure of the bias in synonymous codon usage for the *i*-th amino acid in each sequence. We designate the statistic *O*_{i} as the synonymous codon usage order (SCUO) for the *i*-th amino acid in each sequence.

Let *F*_{i} be the composition ratio of the *i*-th amino acid in each sequence:

*F*_{i} = (Σ_{j=1}^{n_i} *x*_{ij}) / (Σ_{i=1}^{18} Σ_{j=1}^{n_i} *x*_{ij})

Then the average SCUO for each sequence can be represented as

*O* = Σ_{i=1}^{18} *F*_{i} *O*_{i}

A software package called codonO was written in the C programming language to calculate SCUO for each open reading frame (ORF). This program is available at http://digbio.missouri.edu/~wanx/cu/codonO/. The 86 unicellular genomes explored in this paper can be found at our Web site http://digbio.missouri.edu/~wanx/cu/genomelist.htm.