Creator:
Date:
Abstract:
More than 13 million tweets relating to the COVID-19 global pandemic, collected between March 2020 and November 2020, are used to investigate the topics of discussion with topic models: statistical models that learn the latent topics present in a collection of documents. Topic modelling is first conducted using Latent Dirichlet Allocation (LDA), a method that has seen great success when applied to formal texts. Because LDA learns latent topics by analysing term co-occurrences within individual documents, it can struggle with short documents such as tweets, which contain few co-occurring terms. To address the inadequacies of LDA on short text, a second topic modelling technique is considered, the Biterm Topic Model (BTM), which instead analyses term co-occurrences over the entire collection of documents. Comparing the two, the topics produced by BTM were found to be of higher quality than those produced by LDA.
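
To illustrate the distinction between per-document and collection-wide co-occurrence, the following is a minimal sketch of how biterms (unordered word pairs) might be extracted from short documents and pooled over a whole collection. It is not the implementation used in this work: the function names `extract_biterms` and `corpus_biterm_counts` and the toy tokenised tweets are hypothetical, and the sketch covers only the counting step, not the inference of topics.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) in one short document.

    Each pair is sorted so that ("b", "a") and ("a", "b") count as the same biterm.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def corpus_biterm_counts(docs):
    """Pool biterms from all documents into one collection-wide count."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


# Toy, already-tokenised tweets (illustrative data only).
docs = [
    ["covid", "vaccine", "trial"],
    ["covid", "lockdown"],
    ["vaccine", "covid"],
]

counts = corpus_biterm_counts(docs)
for biterm, n in counts.most_common():
    print(biterm, n)
```

Each tweet on its own yields only a handful of pairs, but pooling them over the collection gives denser co-occurrence statistics, which is the property BTM relies on for short text.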