Comparison of Finite and Infinite Mixture Models for Capturing Compositional Heterogeneity Across Sites

Abstract

Phylogenetic modelling of evolutionary processes across sites from sequence alignments has garnered increasing attention over the last few decades. One approach adopts the view that the heterogeneity across observations is a result of the data set having been emitted from several different models, each drawn from a distribution. Finite mixture models provide discretizations of the unknown distribution into a set of sub-models, or components. The level of discretization is chosen through a set of likelihood-based model comparisons. We use Bayesian cross-validation to compare a range of finite mixture models, along with the infinite mixture modelling approach known as 'CAT' and the gamma-distributed rates-across-sites approach. Using simulations and real alignments, our findings indicate that the improvement in model fit from finite mixture models is attained when the number of components is between 20 and 60. The magnitude of improvement is dependent on whether or not the gamma approach is invoked.

Rights Notes

Copyright © 2018 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to Carleton University Institutional Repository and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.

Title	Date Uploaded	Visibility
bujaki-comparisonoffiniteandinfinitemixturemodels.pdf	2023-05-05	Public