Text Classification with Noisy Class Labels

Abstract

This thesis focuses on the problem of text classification when the labels of the training data are noisy. Label noise can have many consequences, such as decreasing a model's accuracy and increasing its complexity. Designing learning algorithms that maximize a desired performance measure in such noisy settings is important for achieving success on real-world data. This thesis also investigates a recently proposed text classification method called the Tsetlin Machine, which can learn human-readable rules made up of clauses. There is currently only one paper about the Tsetlin Machine applied to text classification problems, and this thesis builds on that work. Our experiments show that label noise has a reasonably low impact on the performance of classical methods and the Tsetlin Machine, while recent state-of-the-art text classification methods are not as robust.
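
To make the notion of label noise concrete, the sketch below corrupts a list of training labels by replacing each one, with a fixed probability, by a different class chosen uniformly at random. This is only an illustration: the abstract does not state which noise model the thesis uses, so the uniform (symmetric) flipping scheme, the function name `flip_labels`, and the parameters `noise_rate` and `seed` are all assumptions made here.

```python
import random


def flip_labels(labels, classes, noise_rate, seed=0):
    """Return a noisy copy of `labels`.

    Hypothetical helper, not taken from the thesis: each label is
    replaced with probability `noise_rate` by a different class drawn
    uniformly from `classes` (symmetric label noise).
    """
    rng = random.Random(seed)  # local generator so results are repeatable
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            # Pick any class except the true one, so a flip always changes the label.
            alternatives = [c for c in classes if c != y]
            noisy.append(rng.choice(alternatives))
        else:
            noisy.append(y)
    return noisy


# Toy data: 1000 alternating sentiment labels (assumed, for illustration only).
clean = ["pos", "neg"] * 500
noisy = flip_labels(clean, classes=["pos", "neg"], noise_rate=0.2)

# Fraction of labels that actually changed; should be close to noise_rate.
observed_rate = sum(a != b for a, b in zip(clean, noisy)) / len(clean)
print(f"observed noise rate: {observed_rate:.3f}")
```

A classifier trained on `noisy` and evaluated against clean test labels would give one rough measure of how robust a method is at a given noise rate.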

Rights Notes

Copyright © 2020 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to Carleton University Institutional Repository and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.

Title	Date Uploaded	Visibility
pagotto-textclassificationwithnoisyclasslabels.pdf	2023-05-05	Public