Small area estimation (SAE) has become a dynamic field of research over the last couple of decades. The increasing demand for small area estimates in both the public and private sectors has motivated further investigation of applied problems. In this thesis, we consider three problems in the context of small area estimation: (i) small area estimation under informative sampling, (ii) small area estimation under unmatched two-fold subarea models, and (iii) variable selection under a two-fold subarea model.
In survey sampling, samples are often drawn with unequal probabilities to improve the statistical efficiency of the sampling strategy. When the selection probabilities are correlated with the variables of interest, even after conditioning on the auxiliary information, the sampling scheme is called informative. Under informative sampling, the population model does not hold for the sample. Hence, to avoid biased results, the selection effects should be accounted for in the inference procedures. To address this problem, we propose a semi-parametric approach that includes a P-spline term in the selection probabilities in the model.
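As a rough illustration only, one way such an augmented model could be written is shown below. It assumes a unit-level nested error regression model and a truncated polynomial spline basis; the symbols $y_{ij}$, $p_{ij}$, $m(\cdot)$, $\kappa_l$, $q$, and $K$ are notation introduced here for the sketch and are not taken from the thesis.

```latex
\begin{align*}
y_{ij} &= \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + m(p_{ij}) + v_i + e_{ij},
  \qquad v_i \sim N(0, \sigma_v^2), \quad e_{ij} \sim N(0, \sigma_e^2), \\
m(p) &\approx \sum_{k=0}^{q} \alpha_k\, p^{k} + \sum_{l=1}^{K} \gamma_l\, (p - \kappa_l)_{+}^{q},
  \qquad (t)_{+} = \max(t, 0).
\end{align*}
```

Here $p_{ij}$ denotes the selection probability of unit $j$ in area $i$, and the smooth function $m(\cdot)$ absorbs the part of the response that depends on the selection mechanism. In a penalized spline fit, the knot coefficients $\gamma_l$ are shrunk toward zero, which is commonly done by treating them as random effects.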
The basic area-level model assumes that a linear model relates the small area means to area-specific auxiliary information. However, in some SAE applications, such as those involving logit relationships, the linking function is non-linear. In this situation, the linking model and the sampling model do not match and cannot be combined to produce the basic area-level model. We propose an empirical best unbiased predictor approach to the estimation of small subarea parameters under a two-fold subarea-level model consisting of a sampling model and an unmatched linking model.
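The mismatch can be made concrete with a short sketch in generic notation. The symbols below (direct estimator $\hat{\theta}_{ij}$ for subarea $j$ in area $i$, known sampling variance $\psi_{ij}$, area effect $v_i$, subarea effect $u_{ij}$, link function $g$) are assumptions made for illustration and may differ from the notation used in the thesis.

```latex
\begin{align*}
\text{Sampling model:} \quad
  & \hat{\theta}_{ij} = \theta_{ij} + e_{ij}, \qquad e_{ij} \sim N(0, \psi_{ij}), \\
\text{Matched linking model:} \quad
  & \theta_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + v_i + u_{ij}, \\
\text{Unmatched linking model:} \quad
  & g(\theta_{ij}) = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + v_i + u_{ij},
  \qquad \text{e.g. } g(\theta) = \log\frac{\theta}{1-\theta},
\end{align*}
```

with $v_i \sim N(0, \sigma_v^2)$ and $u_{ij} \sim N(0, \sigma_u^2)$. In the matched case, substituting the linking model into the sampling model gives a single linear mixed model, $\hat{\theta}_{ij} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + v_i + u_{ij} + e_{ij}$. With a non-linear $g$, this substitution no longer yields a linear model in $\hat{\theta}_{ij}$, which is why the two parts cannot be combined.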
The main assumption behind standard variable selection methods is that the variable of interest is observed. This is not the case for small area models, where the dependent variables are not directly observed because of sampling errors. In particular, for a matched two-fold subarea-level model, direct estimates of subarea means are used as proxies for the unknown variable of interest. Naively applying standard regression model selection methods may therefore result in inefficient model selection. To tackle this issue, we propose a two-step adjustment to the standard variable selection method.