On the Correction for Misclassification Bias in Electronic Health Data Using Validation Sample Approaches



Gravel, Christopher Andre




Patient electronic health records (EHRs) are used to assess potential adverse drug reaction risk. These records are assumed to classify the outcomes of interest perfectly; however, this assumption is not necessarily realistic. There are many reasons for outcome misclassification to be present in EHR data: coding issues, diagnostic uncertainty (particularly relative to time), and misdiagnoses are all causes of this type of error. Unbiased estimation with outcome misclassification relies on the availability of additional information. We considered the use of internally validated data and demonstrated misclassification bias adjustment in binary data and in right-censored continuous-time survival data, with and without the presence of competing risks. These data structures are investigated as they pertain to the underlying nature of EHR data. For misclassified binary data, we considered the use of different sampling schemes for acquiring the validation data. We first considered the estimated asymptotic relative efficiencies between the maximum likelihood estimators (MLEs) derived from these sampling approaches. Monte Carlo simulation demonstrated that one sampling scheme may yield an MLE with minimal variance relative to the others; however, it is not possible to assess this prior to sampling. We propose a numerical method that results in a validation sample size determination algorithm, which can be used to approximate the relationship between sample size, the variance of the estimator of the parameter of interest, and the chosen sampling approach. Finally, we considered methods of estimation used to assess association in a two-by-two contingency table, such as the odds ratio and logistic regression. For right-censored continuous-time survival data with and without competing risks, we proposed the use of internal validation to adjust for misclassification bias of two different types.
First, we considered the problem of failing to observe the occurrence of an event of interest and incorrectly concluding that the individual under study is a censored observation. Second, we considered the situation in which we correctly observe an event occurrence but erroneously observe the cause-specific event type. Under assumptions based on EHR data, we used a multi-sample likelihood-based approach to produce unbiased estimators for data in which either form of error, or both simultaneously, is present.
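
As a rough illustration of how an internal validation sample can adjust a misclassified binary outcome, the following minimal sketch reweights the recorded classifications by predictive values estimated from validated records. It is not taken from the thesis: the function name, the argument names, and the counts are hypothetical, and it assumes the validated records were sampled at random within each recorded outcome group.

```python
def adjusted_prevalence(n_rec_pos, n_rec_neg,
                        val_pos_checked, val_pos_true,
                        val_neg_checked, val_neg_true):
    """Estimate the true outcome proportion from misclassified records.

    n_rec_pos, n_rec_neg : counts of records coded positive / negative
    val_pos_checked      : validated records among those coded positive
    val_pos_true         : of those, how many were truly positive
    val_neg_checked      : validated records among those coded negative
    val_neg_true         : of those, how many were truly positive
    """
    # Proportion of all records coded positive (error-prone outcome).
    q = n_rec_pos / (n_rec_pos + n_rec_neg)
    # Positive predictive value, estimated from the validation sample.
    ppv = val_pos_true / val_pos_checked
    # Share of coded negatives that are truly positive (1 - NPV).
    false_omission = val_neg_true / val_neg_checked
    # Law of total probability over the recorded classification.
    return q * ppv + (1 - q) * false_omission


# Hypothetical counts, for illustration only.
naive = 200 / 1000
adjusted = adjusted_prevalence(200, 800, 50, 40, 100, 10)
print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}")
```

With these made-up counts the recorded proportion of 0.20 is adjusted to 0.24, because missed events among records coded negative outweigh false positives among records coded positive.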






Carleton University

Thesis Degree Name: 

Doctor of Philosophy

Thesis Degree Level: 


Thesis Degree Discipline: 

Probability and Statistics

Parent Collection: 

Theses and Dissertations

Items in CURVE are protected by copyright, with all rights reserved, unless otherwise indicated. They are made available with permission from the author(s).