
Table 2 Brief description of models: Mechanistic codon substitution models

From: Superiority of a mechanistic codon substitution model even for protein sequences in Phylogenetic analysis

Equal-Constraint-n-F-dGm(r|s|sf)

Equal constraint irrespective of amino acid substitution type is assumed; β = 0 in Eq. 4.

EI-n-F-dGm(r|s|sf)

$w_{ab}^{\mathrm{EI}} \equiv -(\Delta\hat{\varepsilon}_{ab}^{\,c} + \Delta\hat{\varepsilon}_{ab}^{\,v})$, based on the Energy-Increment-based (EI) method [26], is used to estimate $w_{ab}$ in Eq. 4. The terms $\Delta\hat{\varepsilon}_{ab}^{\,c}$ and $\Delta\hat{\varepsilon}_{ab}^{\,v}$ represent the effects of the mean increment of contact energy between residues and of the residue-volume change due to an amino acid replacement, respectively; see Supporting Information, Text S1, in [26].

JTT-ML91+-n-F-dGm(r|s|sf), WAG-ML91+-n-F-dGm(r|s|sf), LG-ML91+-n-F-dGm(r|s|sf)

Selective constraints $\{w_{ab}^{\mathrm{JTT/WAG/LG\text{-}ML91+}}\}$ estimated by maximizing the likelihood of JTT/WAG/LG [5, 10, 11] in the ML-91+ model [26] are used as $\{w_{ab}^{\mathrm{estimate}}\}$ in Eq. 4.

KHG-ML200-n-F-dGm(r|s|sf)

Selective constraints $\{w_{ab}^{\mathrm{KHG\text{-}ML200}}\}$ estimated by maximizing the likelihood of the KHG codon substitution matrix [25] in the ML-200 model [26] are used as $\{w_{ab}^{\mathrm{estimate}}\}$ in Eq. 4.

  1. The "n" in a model name is the number of parameters optimized for the substitution rate matrix. The suffix "-F" means that equilibrium codon frequencies are assumed to be equal to the codon frequencies observed in the codon sequences; for amino acid sequences, equal codon usage is assumed. The suffix "-dGm(r|s|sf)" denotes "-dGmr", "-dGms" or "-dGmsf". The suffixes "-dGmr" and "-dGms" mean that variation across sites in the mutation rate or in the selective constraint, respectively, is approximated by a discrete gamma distribution [38] with m categories of unequal probabilities; see Additional file 1 for details. The "f" following "-dGms" means that the posterior frequencies of amino acids in each category in the first run are used in the second run as the equilibrium frequencies for each category; see the Methods section.
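The "(r|s|sf)" shorthand in the model names can be expanded mechanically. The following is a minimal sketch in Python; the helper name `expand_model_names` is hypothetical and is not part of the original work. It only spells out the three names that each table entry stands for, leaving "n" and "m" as the placeholders used in the table.

```python
import re

# Hypothetical helper (not from the paper): expands the table's "(r|s|sf)"
# shorthand into the individual model names it stands for.
_ALTERNATION = re.compile(r"\(([^()]*)\)")


def expand_model_names(shorthand: str) -> list[str]:
    """Return every model name covered by a shorthand such as 'EI-n-F-dGm(r|s|sf)'."""
    # Drop stray spaces inside the shorthand, e.g. "(r |s|sf)".
    compact = shorthand.replace(" ", "")
    match = _ALTERNATION.search(compact)
    if match is None:
        # No alternation group: the name already refers to a single model.
        return [compact]
    head, tail = compact[:match.start()], compact[match.end():]
    return [head + option + tail for option in match.group(1).split("|")]


if __name__ == "__main__":
    for name in expand_model_names("EI-n-F-dGm(r|s|sf)"):
        print(name)
```

For the entry "EI-n-F-dGm(r|s|sf)" this yields "EI-n-F-dGmr", "EI-n-F-dGms" and "EI-n-F-dGmsf", matching the footnote's reading of the suffix.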