Efficient context-dependent model building based on clustering posterior distributions for non-coding sequences

Table 7 Stepwise context reduction for the Ancestral Repeats dataset using the graph-based approach.

Model	Contexts (parameters)	Annealing	Melting	log BF
GTR16C	16 (96)	[623.2; 638.2]	[645.5; 661.9]	642.2
GTR15C	15 (90)	[658.0; 672.1]	[665.0; 682.0]	669.2
GTR14C	14 (84)	[651.9; 668.4]	[664.3; 678.9]	665.9
GTR13C	13 (78)	[664.9; 679.6]	[676.4; 693.1]	678.5
GTR12C	12 (72)	[673.3; 689.1]	[685.3; 701.7]	687.4
GTR11C	11 (66)	[682.3; 697.9]	[693.5; 710.4]	696.0
GTR10C	10 (60)	[677.5; 693.4]	[697.5; 710.3]	694.7
GTR9C	9 (54)	[693.7; 707.6]	[710.4; 724.6]	709.1
GTR8C	8 (48)	[699.3; 711.7]	[712.4; 727.5]	712.7
GTR7C	7 (42)	[686.5; 700.0]	[705.1; 719.3]	702.7
GTR6C	6 (36)	[650.6; 663.0]	[651.2; 664.8]	657.4
GTR5C	5 (30)	[641.4; 652.3]	[639.2; 649.2]	645.5
GTR	1 (6)	-	-	0

The stepwise context reduction using our graph-based clustering approach identifies an optimal model with 8 clusters for the Ancestral Repeats dataset (GTR8C). It attains a log Bayes factor of 712.7 relative to the context-independent GTR model, a clear improvement over the full context-dependent model (GTR16C, log Bayes factor 642.2), which has twice as many parameters (96 versus 48). This model also outperforms the 10-cluster model determined by the likelihood-based clustering approach.
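The selection described above can be reproduced from the tabulated values. The following is a minimal sketch, not the authors' implementation: it copies the log Bayes factors from Table 7, picks the model with the largest value, and compares parameter counts. The names `LOG_BF`, `PARAMS_PER_CONTEXT`, `n_params` and `best_model` are hypothetical, and the count of six parameters per context is an assumption read off the `1 (6)` entry for the GTR row.

```python
# Contexts and log Bayes factors (versus the context-independent GTR model),
# copied from Table 7. The dictionary layout is illustrative only.
LOG_BF = {
    "GTR16C": (16, 642.2),
    "GTR15C": (15, 669.2),
    "GTR14C": (14, 665.9),
    "GTR13C": (13, 678.5),
    "GTR12C": (12, 687.4),
    "GTR11C": (11, 696.0),
    "GTR10C": (10, 694.7),
    "GTR9C": (9, 709.1),
    "GTR8C": (8, 712.7),
    "GTR7C": (7, 702.7),
    "GTR6C": (6, 657.4),
    "GTR5C": (5, 645.5),
    "GTR": (1, 0.0),
}

# Assumption: each context contributes six GTR parameters,
# as suggested by the "1 (6)" entry for the GTR row.
PARAMS_PER_CONTEXT = 6


def n_params(model):
    """Number of parameters implied by the number of contexts."""
    contexts, _ = LOG_BF[model]
    return contexts * PARAMS_PER_CONTEXT


def best_model(table):
    """Return the model name with the largest log Bayes factor."""
    return max(table, key=lambda name: table[name][1])


best = best_model(LOG_BF)
gain = LOG_BF[best][1] - LOG_BF["GTR16C"][1]     # improvement over the full model
ratio = n_params("GTR16C") / n_params(best)      # parameter reduction factor

print(f"best model: {best}")
print(f"log BF gain over GTR16C: {gain:.1f}")
print(f"GTR16C has {ratio:.0f}x the parameters of {best}")
```

Choosing by the point estimate alone ignores the width of the annealing and melting intervals. Neighbouring models such as GTR9C and GTR8C have overlapping intervals, so the ranking between them should be read with that uncertainty in mind.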

ISSN: 2730-7182