Representation Learning for Information Extraction

Abstract
  • Distributed representations, predominantly acquired via neural networks, have been applied to natural language processing tasks, including speech recognition and machine translation, with success comparable to that of sophisticated state-of-the-art algorithms. This thesis investigates the application of such representations to information extraction. Specifically, I explore the suitability of shallow distributed representations for the automatic terminology extraction task and for the bridging reference resolution task. I create a gold-standard dataset for automatic term extraction in the mathematical education domain and carefully assess the performance of existing terminology extraction methods on it. I then introduce a novel method for automatic extraction of one-word terms and evaluate its performance in various terminological domains. The algorithm leverages distributed representations of words from local and global perspectives to encode syntactic, semantic, association, and frequency information at the same time. Furthermore, it can be trained with a minimal number of data points. I show that the algorithm is robust to a change of domain and that information can be transferred from one technical domain to another by leveraging what I call anchor words, whose semantics are consistent across the domains. For the bridging reference resolution task, I build a dataset on the letter portion of the Open American National Corpus and compare the performance of a preliminary method against a majority-class baseline.

Rights Notes
  • Copyright © 2019 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to Carleton University Institutional Repository and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.

Date Created
  • 2019
