Comparison of Finite and Infinite Mixture Models for Capturing Compositional Heterogeneity Across Sites

Creator: 

Bujaki, Thomas James

Date: 

2018

Abstract: 

Phylogenetic modelling of heterogeneity in evolutionary processes across the sites of a sequence alignment has received increasing attention over the last few decades. One approach treats the heterogeneity across sites as the result of the data set having been emitted by several different models, each drawn from an underlying distribution. Finite mixture models discretize this unknown distribution into a set of sub-models, or components, and the level of discretization is chosen through a series of likelihood-based model comparisons. We use Bayesian cross-validation to compare a range of finite mixture models with the infinite mixture approach known as 'CAT' and with the gamma-distributed rates-across-sites approach. Using simulations and real alignments, our findings indicate that the improvement in model fit from finite mixture models is attained when the number of components is between 20 and 60. The magnitude of the improvement depends on whether or not the gamma approach is invoked.
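To make the mixture idea concrete, here is a minimal Python sketch, not the thesis's implementation. It assumes a deliberately simplified toy model: each component is a compositional profile (a vector of state frequencies), and each taxon's state in an alignment column is treated as an independent draw from that profile. This ignores the tree and substitution process that real phylogenetic mixture models include. The per-site likelihood is a weighted sum over components, and a cross-validation score is the log of the posterior-averaged likelihood of held-out columns, given posterior samples assumed to come from a fit to training columns. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) to avoid underflow."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def site_log_likelihood(counts: Sequence[int],
                        weights: Sequence[float],
                        profiles: Sequence[Sequence[float]]) -> float:
    """Toy finite-mixture log-likelihood of one alignment column.

    counts[j] is how many taxa show state j in this column. Assumption:
    each taxon's state is an independent draw from the component profile
    (no tree). Weights are assumed positive and summing to 1.
    """
    terms = []
    for w, profile in zip(weights, profiles):
        # log P(column | component k), plus log of the mixture weight
        ll = sum(c * math.log(p) for c, p in zip(counts, profile) if c > 0)
        terms.append(math.log(w) + ll)
    # log of sum_k w_k * P(column | component k)
    return log_sum_exp(terms)

def alignment_log_likelihood(columns: Sequence[Sequence[int]],
                             weights: Sequence[float],
                             profiles: Sequence[Sequence[float]]) -> float:
    """Sites are assumed independent, so column log-likelihoods add."""
    return sum(site_log_likelihood(c, weights, profiles) for c in columns)

Sample = Tuple[List[float], List[List[float]]]

def cv_log_predictive(test_columns: Sequence[Sequence[int]],
                      posterior_samples: Sequence[Sample]) -> float:
    """log of the average held-out likelihood over posterior samples.

    posterior_samples is a hypothetical list of (weights, profiles)
    pairs, assumed drawn from a posterior fitted to training columns.
    """
    lls = [alignment_log_likelihood(test_columns, w, p)
           for w, p in posterior_samples]
    return log_sum_exp(lls) - math.log(len(lls))

if __name__ == "__main__":
    profiles = [[0.25, 0.25, 0.25, 0.25], [0.7, 0.1, 0.1, 0.1]]
    weights = [0.5, 0.5]
    test_columns = [[3, 0, 0, 0], [1, 1, 1, 0]]
    print(cv_log_predictive(test_columns, [(weights, profiles)]))
```

Comparing models then amounts to computing this score for each candidate (for example, different numbers of components) and preferring the higher value. In a real analysis each component term would be a full tree likelihood under a substitution model whose equilibrium frequencies are that profile; the log-sum-exp step matters because per-site probabilities underflow quickly as the number of taxa grows.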

Subject: 

Bioinformatics

Language: 

English

Publisher: 

Carleton University

Thesis Degree Name: 

Master of Science (M.Sc.)

Thesis Degree Level: 

Master's

Thesis Degree Discipline: 

Chemistry

Parent Collection: 

Theses and Dissertations

Items in CURVE are protected by copyright, with all rights reserved, unless otherwise indicated. They are made available with permission from the author(s).