Tales of a Coronavirus Pandemic: Topic Modelling with Short-Text Data
Public Deposited
- Resource Type
- Creator
- Abstract
Using more than 13 million tweets about the COVID-19 global pandemic, collected between March 2020 and November 2020, this thesis investigates the topics of discussion with topic models: statistical models that learn the latent topics present in a collection of documents. Topic modelling is first conducted with Latent Dirichlet Allocation (LDA), a method that has seen great success when applied to formal texts. Because LDA learns latent topics by analysing term co-occurrences within documents, it can struggle with short documents such as tweets, which contain few co-occurring terms. To address the shortcomings of LDA on short text, a second topic modelling technique is considered, the Biterm Topic Model (BTM), which instead analyses term co-occurrences over the entire collection of documents. In a comparison of the two models, the topic quality of BTM was found to be superior to that of LDA.
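To illustrate the corpus-level co-occurrence idea behind BTM, the following is a minimal sketch of biterm extraction only, not the thesis's implementation and not a full topic model. It assumes a biterm is any unordered pair of tokens appearing in the same short document; the function names and the toy tweets are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered token pair (biterm) in one short document."""
    # Sorting each pair makes ("vaccine", "trial") and ("trial", "vaccine")
    # count as the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def corpus_biterm_counts(docs):
    """Pool biterms from all documents into one corpus-wide counter."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


# Hypothetical, already-tokenised tweets (illustrative data only).
docs = [
    ["covid", "vaccine", "trial"],
    ["vaccine", "trial", "results"],
    ["lockdown", "covid"],
]

counts = corpus_biterm_counts(docs)
for biterm, n in counts.most_common(3):
    print(biterm, n)
```

Each tweet on its own yields only a handful of pairs, but pooling them across the collection gives a denser co-occurrence signal, which is the motivation the abstract gives for preferring BTM on short text. A real BTM would go on to infer topic assignments for these pooled biterms, a step this sketch omits.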
- Subject
- Language
- Publisher
- Thesis Degree Level
- Thesis Degree Name
- Thesis Degree Discipline
- Identifier
- Rights Notes
Copyright © 2021 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to Carleton University Institutional Repository and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.
- Date Created
- 2021
Relations
- In Collection:
Items
Title | Date Uploaded | Visibility |
---|---|---|
shen-talesofacoronaviruspandemictopicmodelling.pdf | 2023-05-05 | Public |