Text Classification with Noisy Class Labels

Creator: 

Pagotto, Andrea

Date: 

2020

Abstract: 

This thesis focuses on the problem of text classification when the labels of the training data contain noise. Label noise can have many consequences, such as decreasing a model's accuracy and increasing its complexity. Designing learning algorithms that maximize a desired performance measure in such noisy settings is therefore important for success on real-world data. This thesis also investigates a recently proposed machine learning method, the Tsetlin Machine, as applied to text classification. The Tsetlin Machine learns human-readable rules composed of clauses. There is currently only one paper on applying the Tsetlin Machine to text classification problems, and this thesis builds on that work. Our experiments show that label noise has a relatively small impact on the performance of classical methods and of the Tsetlin Machine, whereas recent state-of-the-art text classification methods are less robust to it.
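As a rough illustration of what "human-readable rules composed of clauses" means, the following is a minimal sketch of Tsetlin Machine inference, assuming binary bag-of-words features. Each clause is a conjunction of literals (a feature or its negation); positive-polarity clauses vote for a class, negative-polarity clauses vote against it, and the class with the highest vote sum is predicted. Learning (the Tsetlin automata that decide which literals each clause includes) is omitted, and the vocabulary, clauses, and function names below are hypothetical and hand-written for illustration, not taken from the thesis.

```python
from typing import Dict, List, Sequence, Tuple

# A literal is (feature_index, negated); a clause is a conjunction of literals.
Literal = Tuple[int, bool]
Clause = List[Literal]
# Per class: (positive-polarity clauses, negative-polarity clauses).
ClassModel = Tuple[List[Clause], List[Clause]]


def clause_output(clause: Clause, x: Sequence[int]) -> int:
    """Return 1 if every literal in the clause holds for binary input x, else 0.

    Simplification: empty clauses are not treated specially here.
    """
    for index, negated in clause:
        value = x[index] == 1
        if negated:
            value = not value
        if not value:
            return 0
    return 1


def class_score(model: ClassModel, x: Sequence[int]) -> int:
    """Sum of positive-clause votes minus sum of negative-clause votes."""
    positive, negative = model
    return sum(clause_output(c, x) for c in positive) - sum(
        clause_output(c, x) for c in negative
    )


def predict(models: Dict[str, ClassModel], x: Sequence[int]) -> str:
    """Predict the class whose clauses give the highest vote sum."""
    return max(models, key=lambda label: class_score(models[label], x))


# Hypothetical vocabulary: index 0 = "good", 1 = "bad", 2 = "not".
good_and_not_negated: Clause = [(0, False), (2, True)]  # "good" AND NOT "not"
bad: Clause = [(1, False)]                              # "bad"

models: Dict[str, ClassModel] = {
    "pos": ([good_and_not_negated], [bad]),
    "neg": ([bad], [good_and_not_negated]),
}

print(predict(models, [1, 0, 0]))  # document containing only "good"
print(predict(models, [0, 1, 0]))  # document containing only "bad"
```

Because each clause is just a list of (possibly negated) words, a trained model can be read off directly as rules such as "good AND NOT not", which is the interpretability property the abstract refers to.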

Subject: 

Computer Science

Language: 

English

Publisher: 

Carleton University

Thesis Degree Name: 

Master of Computer Science (M.C.S.)

Thesis Degree Level: 

Master's

Thesis Degree Discipline: 

Computer Science

Parent Collection: 

Theses and Dissertations

Items in CURVE are protected by copyright, with all rights reserved, unless otherwise indicated. They are made available with permission from the author(s).