Clustering Profiles in Generalized Linear Mixed Models Settings Using Bayesian Nonparametric Statistics



Mizdrak, Predrag




Generalized linear mixed models are used to model clustered and longitudinal data in which the distribution of the response variable is a member of the exponential family. This thesis introduces a novel method for simultaneously clustering such data and estimating the parameters of the underlying generalized linear mixed models. Generalized linear mixed models have two sets of parameters: fixed effects parameters, which relate covariates to the response at the population level, and random effects parameters, which relate covariates to the response at the individual level. Our method identifies homogeneous groups in the data based on similarities among the random effects parameters obtained when those groups are modeled using generalized linear mixed models. We achieve this by placing a Dirichlet process prior on the random effects parameters, which induces clustering of the random effects and, in turn, clustering of the profiles. As a result, our method simultaneously groups profiles into clusters and estimates the model parameters of each cluster without assuming that the number of clusters is known in advance. We have tested our method on both simulated data and data from the public health domain. In simulations, we have shown that the method recovers the correct number of clusters, successfully clusters profiles, and correctly estimates model parameters. On clustered public health data, our method produces parameter estimates that are very close to those obtained by a frequentist maximum likelihood method, while identifying groups of homogeneous health regions that reveal certain properties of the underlying survey population that cannot be easily obtained using other methods. Similar methods have been proposed for longitudinal data with continuous responses. This thesis extends these models in novel ways to clustered and longitudinal data in which the distribution of the response variable can be any member of the exponential family.
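
The clustering effect of a Dirichlet process prior can be illustrated with a small simulation. The sketch below is not the thesis's sampler. It only draws from the prior using the Chinese restaurant process representation, in which each new profile joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to a concentration parameter. The function names, the concentration value alpha = 1.0, the normal base measure, and the use of a single random intercept per profile are all assumptions made for illustration.

```python
import random
from collections import Counter


def crp_assignments(n_profiles, alpha, seed=0):
    """Draw cluster labels for n_profiles from a Chinese restaurant process.

    alpha is the concentration parameter (illustrative value, not from the thesis).
    """
    rng = random.Random(seed)
    labels = []
    sizes = []  # sizes[k] = number of profiles currently in cluster k
    for _ in range(n_profiles):
        # An existing cluster k has weight sizes[k]; a new cluster has weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)  # open a new cluster
        else:
            sizes[k] += 1  # join an existing cluster
        labels.append(k)
    return labels


def simulate_random_intercepts(labels, base_sd=2.0, seed=1):
    """Assign one random intercept per cluster, drawn from an assumed N(0, base_sd^2) base measure.

    Profiles in the same cluster share the same value, which is how ties
    under the prior translate into clusters of profiles.
    """
    rng = random.Random(seed)
    atoms = {k: rng.gauss(0.0, base_sd) for k in sorted(set(labels))}
    return [atoms[k] for k in labels]


labels = crp_assignments(n_profiles=50, alpha=1.0)
intercepts = simulate_random_intercepts(labels)
print("clusters found:", len(set(labels)))
print("cluster sizes:", sorted(Counter(labels).values(), reverse=True))
```

The number of clusters is not fixed in advance; it emerges from the draw and tends to grow slowly with the number of profiles. A full analysis would combine this prior with the generalized linear mixed model likelihood and sample from the posterior, which this sketch does not attempt.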






Carleton University

Thesis Degree Name: 

Doctor of Philosophy

Thesis Degree Level: 


Thesis Degree Discipline: 

Probability and Statistics

Parent Collection: 

Theses and Dissertations

Items in CURVE are protected by copyright, with all rights reserved, unless otherwise indicated. They are made available with permission from the author(s).