Small Area Estimation: Informative Sampling and Two-fold Models



Chatrchi, Golshid




Small area estimation (SAE) has become a dynamic field of research over the last couple of decades. The increasing demand for small area estimates in both the public and private sectors has opened the door to further investigation of more applied problems. In this thesis, we consider three problems in the context of small area estimation: (i) small area estimation under informative sampling, (ii) small area estimation under unmatched two-fold subarea models, and (iii) variable selection under a two-fold subarea model.

In survey sampling, samples are often drawn with unequal probabilities to improve the statistical efficiency of the sampling strategy. When the selection probabilities are correlated with the variables of interest, even after conditioning on the auxiliary information, the sampling scheme is called informative. Under informative sampling, the population model does not hold for the sample. Hence, to avoid biased results, the selection effects should be accounted for in the inference procedures. To address this problem, we propose a semi-parametric approach that includes a P-spline term of the selection probabilities in the model.
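To make the notion of informative sampling concrete, the following is a minimal simulation sketch, not the P-spline method proposed in the thesis. It assumes a hypothetical population model y = 1 + 2x + e and a hypothetical Poisson sampling design whose selection probabilities depend on the error term e, so they remain correlated with y even after conditioning on x. All function names, parameter values, and the inverse-probability weighting used for comparison are illustrative assumptions.

```python
import numpy as np


def simulate_informative_sample(n_pop=100_000, seed=0):
    """Draw a Poisson sample whose inclusion probabilities depend on the model error."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n_pop)   # auxiliary variable
    e = rng.normal(0.0, 1.0, n_pop)    # model error
    y = 1.0 + 2.0 * x + e              # assumed population model
    # Assumed selection probabilities: increasing in e, hence informative given x.
    pi = 0.02 + 0.16 / (1.0 + np.exp(-1.5 * e))
    sampled = rng.random(n_pop) < pi   # independent Bernoulli draws (Poisson sampling)
    return x[sampled], y[sampled], pi[sampled]


def fit_line(x, y, w=None):
    """(Weighted) least-squares fit of y on an intercept and x; returns [intercept, slope]."""
    X = np.column_stack([np.ones_like(x), x])
    if w is None:
        w = np.ones_like(x)
    sw = np.sqrt(w)
    # Scaling rows by sqrt(w) turns weighted least squares into ordinary least squares.
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef


x_s, y_s, pi_s = simulate_informative_sample()
unweighted = fit_line(x_s, y_s)               # ignores the selection effects
weighted = fit_line(x_s, y_s, w=1.0 / pi_s)   # inverse-probability weights

print("true coefficients:   [1.0, 2.0]")
print("unweighted estimate:", unweighted)
print("weighted estimate:  ", weighted)
```

In this setup the sample over-represents units with large positive errors, so the unweighted intercept drifts away from its population value, while the weighted fit stays close to it. Weighting is only one standard remedy and is shown here purely for contrast.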

The basic area-level model assumes that a linear model relates the small area means to area-specific auxiliary information. However, in some SAE applications that deal with logit relationships, the associated function is non-linear. In this situation, the linking model and the sampling model do not match and cannot be combined to produce the basic area-level model. We propose an empirical best unbiased predictor approach to the estimation of small subarea parameters under a two-fold subarea-level model consisting of a sampling model and an unmatched linking model.
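As a reading aid, the models referred to above can be written out in a common textbook form. The notation is an assumption made here for illustration and may differ from the notation used in the thesis: \hat{\theta} denotes a direct estimator, x a vector of auxiliary variables, e a sampling error, v and u random effects, i an area index, and j a subarea index.

```latex
% Basic (matched) area-level model: sampling model plus linear linking model
\hat{\theta}_i = \theta_i + e_i, \qquad
\theta_i = x_i^{\top}\beta + v_i
\quad\Longrightarrow\quad
\hat{\theta}_i = x_i^{\top}\beta + v_i + e_i .

% Unmatched two-fold subarea-level model: the linking model holds on a
% transformed scale, for example g = \operatorname{logit}
\hat{\theta}_{ij} = \theta_{ij} + e_{ij}, \qquad
g(\theta_{ij}) = x_{ij}^{\top}\beta + v_i + u_{ij} .
```

When g is the identity, substituting the linking model into the sampling model gives a single linear mixed model. When g is non-linear, that substitution is no longer possible, which is the sense in which the two models do not match.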

The main assumption behind standard variable selection methods is that the variable of interest is observed. This is not the case for small area models, where the dependent variables are not directly observed because of sampling errors. In particular, for a matched two-fold subarea-level model, direct estimates of subarea means are used as proxies for the unknown variable of interest. Naively applying standard regression model selection may result in inefficient model selection. To tackle this issue, we propose a two-step adjustment to the standard variable selection method.






Carleton University

Thesis Degree Name: Doctor of Philosophy

Thesis Degree Level: 

Thesis Degree Discipline: Probability and Statistics

Parent Collection: Theses and Dissertations

Items in CURVE are protected by copyright, with all rights reserved, unless otherwise indicated. They are made available with permission from the author(s).