On the Correction for Misclassification Bias in Electronic Health Data Using Validation Sample Approaches

Abstract
  • Patient electronic health records (EHRs) are used to assess the risk of potential adverse drug reactions. These records are assumed to classify the outcomes of interest perfectly; however, this assumption is not necessarily realistic. Outcome misclassification can arise in EHR data for many reasons, including coding issues, diagnostic uncertainty (particularly relative to time), and misdiagnosis. Unbiased estimation in the presence of outcome misclassification relies on the availability of additional information. We considered the use of internally validated data and demonstrated misclassification bias adjustment in binary data and in right-censored continuous-time survival data, with and without competing risks. These data structures are investigated because they reflect the underlying nature of EHR data. For misclassified binary data, we considered different sampling schemes for acquiring the validation data. We first considered the estimated asymptotic relative efficiencies of the maximum likelihood estimators (MLEs) derived from these sampling approaches. Monte Carlo simulation demonstrated that a minimum-variance MLE across the differing sampling schemes can exist; however, it is not possible to assess this prior to sampling. We propose a numerical method that yields a validation sample size determination algorithm, which can be used to approximate the relationship between sample size, the variance of the estimator of the parameter of interest, and the chosen sampling approach. Finally, we considered methods of estimation used to assess association in a two-by-two contingency table, such as the odds ratio and logistic regression. For right-censored continuous-time survival data with and without competing risks, we proposed the use of internal validation to adjust for misclassification bias of two different types.
First, we considered the problem of failing to observe the occurrence of an event of interest and incorrectly concluding that the individual under study is a censored observation. Second, we considered the situation in which we correctly observe that an event occurred but erroneously observe its cause-specific event type. Under assumptions based on EHR data, and using a multi-sample likelihood-based approach, we produced unbiased estimators for data in which either form of error, or both simultaneously, is present.
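
To make the idea of internal validation concrete for the binary case, the following is a minimal sketch of a standard predictive-value correction for a misclassified binary outcome. It is not taken from the thesis and is not claimed to be the author's estimator. The function name, argument names, and all counts are hypothetical. It assumes that validated records are sampled at random within each coded outcome group, so that the validated fractions estimate the probability of a true event given the recorded code.

```python
def corrected_prevalence(n_coded_pos, n_coded_neg,
                         val_pos_true, val_pos_total,
                         val_neg_true, val_neg_total):
    """Adjust an error-prone outcome proportion using internal validation counts.

    n_coded_pos, n_coded_neg: records coded as event / no event in the full data.
    val_pos_true / val_pos_total: validated coded-event records that are true events.
    val_neg_true / val_neg_total: validated coded-no-event records that are true events.
    All names are hypothetical and chosen for illustration only.
    """
    # Proportion of records coded as an event in the full (error-prone) data.
    q = n_coded_pos / (n_coded_pos + n_coded_neg)
    # Estimated P(true event | coded as event), the positive predictive value.
    ppv = val_pos_true / val_pos_total
    # Estimated P(true event | coded as no event), i.e. missed events.
    missed = val_neg_true / val_neg_total
    # Law of total probability over the recorded outcome.
    return q * ppv + (1 - q) * missed


# Made-up counts: 200 of 1000 records are coded as events; chart review
# confirms 40 of 50 sampled coded events and finds 10 true events among
# 100 sampled coded non-events.
naive = 200 / 1000
adjusted = corrected_prevalence(200, 800, 40, 50, 10, 100)
print(f"naive = {naive:.2f}, adjusted = {adjusted:.2f}")
```

With these made-up counts the naive proportion is 0.20 and the adjusted proportion is 0.24, because missed events among coded non-events outweigh the false positives among coded events. The sketch covers only a single proportion; it does not address the sampling-scheme efficiency comparison, the sample size algorithm, or the survival settings described in the abstract.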

Rights Notes
  • Copyright © 2015 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to Carleton University Institutional Repository and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.

Date Created
  • 2015
