Representation Learning  for Information Extraction

Amjadian, Ehsan

Download PDF

Resource Type

Thesis

Creator

Amjadian, Ehsan

Abstract

Distributed representations, predominantly acquired via neural networks, have been applied to natural language processing tasks including speech recognition and machine translation with a success comparable to sophisticated state-of-the-art algorithms. The present thesis offers an investigation of the application of such representations to information extraction. Specifically, I explore the suitability of applying shallow distributed representations to the automatic terminology extraction task, as well as the bridging reference resolution task. I created a dataset as a gold standard for automatic term extraction in the mathematical education domain. I carefully assessed the performance of the existing terminology extraction methods on this dataset. Then, I introduce a novel method %DI it is one method or several? I changed to one method since the next sentence says algorithm at singular not plural %Ehsan: noted, thanks. for automatic terminology extraction for one word terms, and I evaluate the performance of the novel algorithm in various terminological domains. The introduced algorithm leverages the distributed representation of words from the local and global perspectives to encode syntactic, semantic, association, and frequency information at the same time. Furthermore, this novel algorithm can be trained with a minimal number of data points. I show that the algorithm is robust to the change of domain, and that information can be transferred from one technical domain to another, leveraging what we call anchor words with consistent semantics shared between the domains. As for the bridging reference resolution task, a dataset is built on the letter portion of the Open American National Corpus and I compare the performance of a preliminary method against a majority class baseline.

Subject

Linguistics

Language

English

Publisher

Carleton University

Thesis Degree Level

Doctoral

Thesis Degree Name

Doctor of Philosophy (Ph.D.)

Thesis Degree Discipline

Cognitive Science

Identifier

DOI: https://doi.org/10.22215/etd/2019-13514

Rights Notes

Copyright © 2019 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to Carleton University Institutional Repository and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.

Date Created

2019

Relations

In Collection:

Theses and Dissertations

Items

Thumbnail	Title	Date Uploaded	Visibility	Actions
	amjadian-representationlearningforinformationextraction.pdf	2023-05-05	Public	Download

Representation Learning for Information Extraction

Downloadable Content

Relations

Items