Text Classification with Noisy Class Labels

Abstract
  • This thesis focuses on the problem of text classification when the labels of the training data are noisy. Label noise can have several adverse consequences, such as reducing a model's accuracy and increasing its complexity. Designing learning algorithms that maximize a desired performance measure in such noisy settings is important for achieving success on real-world data. This thesis also investigates a recently proposed text classification method called the Tsetlin Machine, which learns human-readable rules made up of clauses. There is currently only one paper that applies the Tsetlin Machine to text classification problems, and this thesis builds on that work. Our experiments show that classical methods and the Tsetlin Machine are reasonably robust to label noise, with only a small loss in performance, while recent state-of-the-art text classification methods are not as robust.
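
For illustration only: label noise of the kind described in the abstract is commonly simulated by reassigning a fixed fraction of training labels to a different class chosen uniformly at random (symmetric noise). The sketch below assumes that noise model. The function name `flip_labels`, its parameters, and the choice of symmetric noise are assumptions made here for illustration and are not taken from the thesis.

```python
import random


def flip_labels(labels, noise_rate, classes, seed=0):
    """Return a copy of `labels` with a fraction `noise_rate` of entries
    reassigned to a different class chosen uniformly at random.

    Hypothetical helper for illustration; not the procedure used in the thesis.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be between 0 and 1")
    if len(set(classes)) < 2:
        raise ValueError("at least two distinct classes are needed to flip labels")

    rng = random.Random(seed)  # fixed seed so the corruption is reproducible
    noisy = list(labels)       # leave the caller's list untouched

    # Pick exactly round(noise_rate * n) distinct positions to corrupt.
    n_flip = round(noise_rate * len(noisy))
    for i in rng.sample(range(len(noisy)), n_flip):
        # Always move to a class other than the current one,
        # so every selected label really changes.
        alternatives = [c for c in classes if c != noisy[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy


clean = ["pos", "neg"] * 50
noisy = flip_labels(clean, noise_rate=0.2, classes=["pos", "neg"])
changed = sum(a != b for a, b in zip(clean, noisy))
```

Training the same classifier on `clean` and on `noisy`, then comparing accuracy on an uncorrupted test set, is one simple way to measure how robust a method is to a given noise rate.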

Rights Notes
  • Copyright © 2020 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to Carleton University Institutional Repository and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.

Date Created
  • 2020
