Identifying Opinion Based Questions in Developer Chat Communication

Abstract
  • Today, software developers work on complex, fast-moving projects that often require instant assistance. Because numerous topics are discussed in parallel on chat servers such as Discord, mining these conversations offers researchers opportunities to develop software tools and services. First, we propose a dataset called DISCO, consisting of one year of public DIScord chat COnversations from four software development communities. Second, we improve ChatEO's existing process for identifying opinion-asking questions by replacing its heuristics with Deep Learning (DL) architectures for Natural Language Processing, combined with various word embeddings. The results show that the DL models outperform the heuristics, and this finding is validated with a manual qualitative study. To further improve DL performance, we used Snorkel, a weak-supervision framework, to automatically label a larger dataset. We also applied two class-balancing techniques to this larger dataset: SMOTE (oversampling) and Near-Miss (undersampling). The combination of SMOTE, Multi-CNN, and GloVe-Twitter embeddings achieves the best performance in this study (a recall of 0.95).
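The abstract names two class-balancing techniques without describing them. The following is a minimal pure-Python sketch of the textbook ideas, not the thesis's implementation. SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. Near-Miss discards majority samples, and the sketch assumes the NearMiss-1 variant, which keeps the majority samples closest to the minority class. The function names `smote` and `near_miss_1` and the toy 2-D data are hypothetical. In practice these steps would operate on embedded chat messages, most likely via a library such as imbalanced-learn, though that too is an assumption.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Hypothetical SMOTE-style oversampler (assumes >= 2 minority samples).

    Each synthetic point lies on the segment between a random minority
    sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic


def near_miss_1(majority, minority, n_keep, k=3):
    """Hypothetical NearMiss-1 undersampler: keep the n_keep majority samples
    whose mean distance to their k nearest minority samples is smallest."""
    def score(p):
        d = sorted(math.dist(p, q) for q in minority)[:k]
        return sum(d) / len(d)
    return sorted(majority, key=score)[:n_keep]


# Toy data: 6 majority vs. 3 minority points (illustrative only)
majority = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (0.2, 0.1), (6.0, 6.0), (0.3, 0.3)]
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3)]

new_points = smote(minority, n_new=3, k=2)
oversampled = minority + new_points                       # 3 -> 6 minority samples
undersampled = near_miss_1(majority, minority, n_keep=3, k=2)  # 6 -> 3 majority samples
print(len(oversampled), len(undersampled))
```

Both approaches end with balanced classes. Oversampling keeps every original message but adds synthetic ones, while undersampling discards real majority-class messages. The trade-off is one reason a study might compare the two.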

Rights Notes
  • Copyright © 2022 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to Carleton University Institutional Repository and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.

Date Created
  • 2022
