Clustering Profiles in Generalized Linear Mixed Models Settings Using Bayesian Nonparametric Statistics

Abstract
  • Generalized linear mixed models are used to model clustered and longitudinal data in which the distribution of the response variable is a member of the exponential family. This thesis introduces a novel method for simultaneously clustering such data and estimating the parameters of the underlying generalized linear mixed models. A generalized linear mixed model has two sets of parameters: fixed effects parameters, which associate covariates with the response at the population level, and random effects parameters, which associate covariates with the response at the individual level. Our method identifies homogeneous groups in the data from similarities among the random effects parameters obtained when each homogeneous group is modeled with a generalized linear mixed model. We achieve this by placing a Dirichlet process prior on the random effects parameters. Because draws from a Dirichlet process are discrete, different profiles can share identical random effects, which induces clustering of the random effects and, in turn, clustering of the profiles. As a result, the method simultaneously groups profiles into clusters and estimates the model parameters of each cluster, without assuming that the number of clusters is known in advance. We tested the method on both simulated data and data from the public health domain. In simulations, the method recovered the correct number of clusters, clustered the profiles successfully, and estimated the model parameters correctly. On clustered public health data, it produced parameter estimates very close to those obtained by a frequentist maximum likelihood method, while identifying groups of homogeneous health regions that reveal certain properties of the underlying survey population that cannot easily be obtained with other methods. Similar methods have been proposed for longitudinal data with continuous responses; this thesis extends these models in novel ways to clustered and longitudinal data in which the distribution of the response variable can be any member of the exponential family.
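The way a Dirichlet process prior on random effects induces clusters of profiles can be illustrated with a small generative sketch. It assumes a Poisson response with a log link and a single random intercept per profile; these are illustrative choices only, since the abstract allows any exponential family response and does not specify the random effects structure. The random intercepts are drawn from DP(alpha, G0) with G0 = Normal(0, base_sd) through the Chinese restaurant process representation, so profiles that draw the same intercept belong to the same cluster. All function names and parameter values are hypothetical, and the sketch covers only the data-generating (prior) side, not the posterior inference the thesis uses to recover clusters and parameters.

```python
import math
import random


def crp_random_effects(n_profiles, alpha, base_sd, rng):
    """Draw random intercepts b_1..b_n from DP(alpha, G0), G0 = Normal(0, base_sd),
    using the Chinese restaurant process representation (illustrative sketch).

    Returns (labels, atoms): labels[i] is the cluster of profile i and
    atoms[k] is the random intercept shared by every profile in cluster k.
    """
    labels, atoms, counts = [], [], []
    for i in range(n_profiles):
        # Join existing cluster k with prob. n_k / (i + alpha),
        # or open a new cluster with prob. alpha / (i + alpha).
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(atoms):
            # New cluster: its intercept is a fresh draw from the base measure G0.
            atoms.append(rng.gauss(0.0, base_sd))
            counts.append(0)
        counts[k] += 1
        labels.append(k)
    return labels, atoms


def poisson(lam, rng):
    """Sample Poisson(lam) with Knuth's multiplication method (fine for modest lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_poisson_glmm(n_profiles, n_obs, beta0, beta1, alpha, base_sd, seed=0):
    """Simulate (profile, x, y) rows from a hypothetical Poisson GLMM whose
    random intercepts follow a DP prior; returns (labels, atoms, data)."""
    rng = random.Random(seed)
    labels, atoms = crp_random_effects(n_profiles, alpha, base_sd, rng)
    data = []
    for i in range(n_profiles):
        b_i = atoms[labels[i]]  # profiles in the same cluster share b_i
        for _ in range(n_obs):
            x = rng.uniform(0.0, 1.0)
            mu = math.exp(beta0 + beta1 * x + b_i)  # log link: fixed + random effect
            data.append((i, x, poisson(mu, rng)))
    return labels, atoms, data


if __name__ == "__main__":
    labels, atoms, data = simulate_poisson_glmm(
        n_profiles=50, n_obs=10, beta0=0.5, beta1=1.0, alpha=1.0, base_sd=1.0, seed=42
    )
    print(len(atoms), "clusters among", len(labels), "profiles")
```

The concentration parameter alpha controls how readily new clusters open: smaller values concentrate profiles in a few clusters, larger values spread them across more, and the number of clusters is never fixed in advance.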

Rights Notes
  • Copyright © 2018 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to Carleton University Institutional Repository and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.

Date Created
  • 2018