Creator:
Date:
Abstract:
Correlated data collected from probability-based sample surveys are often used in research studies in economics, health and the social sciences. These surveys usually involve complex designs with features such as stratification, clustering and unequal selection probabilities. Ignoring the correlations or the sampling design features may lead to erroneous inferences. In this thesis, we consider regression analysis of correlated survey data, taking account of both the correlation and the sampling design. In the non-survey context, marginal models and mixed effects models are two approaches commonly used for correlated data. The Generalized Estimating Equation (GEE) method is the main method for marginal models, and likelihood-based methods are often used for mixed effects models. Recent progress has been made on both approaches. Qu, Lindsay and Li (2000) proposed a quadratic inference functions (QIF) approach for marginal models that improves on the GEE in terms of efficiency under misspecification of the second moment, and also possesses other features that the GEE does not. Lindsay (1988) proposed the composite likelihood (CL) approach for multi-level mixed effects models. The CL method reduces high-dimensional likelihood functions to low-dimensional ones, which simplifies computation while retaining many of the good inferential properties of a full likelihood function. In this thesis, we extend these methods to survey data with complex designs. Weighting is often used to account for the sampling design. Carrillo, Chen and Wu (2010) developed a weighted GEE for longitudinal survey data. Following their work, we propose a weighted QIF method that improves on the weighted GEE, in parallel with the improvement in the non-survey context. We study its asymptotic properties related to regression parameter estimation and hypothesis testing. We also study the problem of variable selection under the QIF method.
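As a rough sketch of the QIF construction referred to above, the display below uses the standard notation of Qu, Lindsay and Li (2000) rather than the notation of this thesis. All symbols are assumptions introduced for illustration, and the weighted form shown is one natural way to bring in survey weights, not necessarily the definition adopted in the thesis.

```latex
% Assumed notation (not taken from the thesis):
%   y_i        response vector for cluster i, with mean \mu_i(\beta)
%   D_i        = \partial \mu_i / \partial \beta^\top
%   A_i        diagonal matrix of marginal variances
%   M_1 = I, M_2, \dots, M_m   known basis matrices with R^{-1} \approx \sum_k a_k M_k
%   w_i        survey weight of cluster i in the sample s
\[
g_i(\beta) =
\begin{pmatrix}
D_i^\top A_i^{-1/2} M_1 A_i^{-1/2} \, (y_i - \mu_i) \\
\vdots \\
D_i^\top A_i^{-1/2} M_m A_i^{-1/2} \, (y_i - \mu_i)
\end{pmatrix},
\qquad
\bar g_w(\beta) = \frac{\sum_{i \in s} w_i \, g_i(\beta)}{\sum_{i \in s} w_i},
\]
\[
Q_w(\beta) = \bar g_w(\beta)^\top \, \hat C_w(\beta)^{-1} \, \bar g_w(\beta),
\qquad
\hat\beta = \arg\min_{\beta} Q_w(\beta),
\]
% \hat C_w(\beta): an estimator of the variance of \bar g_w(\beta).
```

With equal weights and the variance estimator taken as N^{-2} times the sum of g_i g_i^T over the N clusters, this reduces to the usual unweighted QIF. The unknown coefficients a_k never need to be estimated, which is one of the features that distinguishes the QIF from the GEE.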
For the CL method, Rao, Verret and Hidiroglou (2013) proposed a survey-weighted pairwise CL approach for two-level survey data. Yi, Rao and Li (2016) further studied its properties for point estimation. In this thesis, we study its properties for analytical inference, in particular inference based on composite likelihood ratio statistics.
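To make the survey-weighted pairwise CL concrete, the weighted log pairwise composite likelihood for a two-level design can be sketched as follows. The notation is assumed for illustration and may differ from that used in the thesis or in the cited papers.

```latex
% Assumed notation (not taken from the thesis):
%   s          sample of clusters (level 2); s(i) sample of units in cluster i
%   w_i        = 1 / \pi_i, inverse inclusion probability of cluster i
%   w_{jk|i}   = 1 / \pi_{jk|i}, inverse joint inclusion probability of
%              units j and k within sampled cluster i
%   f(y_{ij}, y_{ik}; \theta)  bivariate marginal density of a within-cluster
%              pair, obtained by integrating out the cluster random effect
\[
\ell_{wC}(\theta) =
\sum_{i \in s} w_i
\sum_{\substack{j < k \\ j,\,k \in s(i)}} w_{jk|i} \,
\log f(y_{ij}, y_{ik}; \theta),
\qquad
\hat\theta = \arg\max_{\theta} \ell_{wC}(\theta).
\]
```

Only bivariate densities appear, so each term requires integrating over a single cluster-level random effect instead of evaluating the full joint likelihood of the cluster.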