MAWT: Multi-Attention-Weight Transformers

Abstract
  • Transformers are machine learning models designed to learn and predict sequential and structured data, which is crucial to tasks such as neural machine translation and semantic parsing. They have become state-of-the-art engines for both of these tasks, and much research in natural language processing is devoted to increasing their performance by modifying their architectures. In light of this trend, this thesis introduces a new Transformer architecture called MAWT: Multi-Attention-Weight Transformers, in an attempt to increase the accuracy and the variety of acceptable predictions of a Transformer. It attempts to achieve this by training multiple weight sets for each Transformer attention head, which are then used to evaluate the accuracy of the engine. This creates a new architecture under which the system produces a candidate set of outputs (instead of a single output), along with a method for selecting from the candidate set. My proposal rests on the assumption -- motivated by statistical considerations -- that having a candidate set increases the probability of finding an exact match within the set. Upon testing, I observed that my system outperforms the regular Transformer on 5/6 benchmark neural machine translation and semantic parsing datasets, where engine performance is measured by exact match accuracy. Exact match accuracy demands syntactic identity between the output and the target. To investigate how well my new architecture generalizes to measures of semantic equivalence that do not also demand syntactic identity, I also recorded the BLEU scores on these datasets. The BLEU score is a measure of performance based on n-grams rather than exact symbolic match (i.e., how many contiguous sequences of n tokens in the predicted output also appear in the desired output). The results I report on the BLEU scores are more mixed, raising important questions, which I highlight, about the role of syntax in measures of semantic equivalence.
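The core idea above -- several trained weight sets per attention head, each yielding one member of a candidate output set -- can be illustrated with a minimal sketch. This is an assumption-laden toy, not the thesis's implementation: the function names, dimensions, and the absence of any selection rule are all hypothetical; only the "one candidate per weight set" structure comes from the abstract.

```python
# Hypothetical sketch: one attention head with several alternative weight
# sets, producing a candidate set of outputs rather than a single output.
# All names and shapes here are illustrative assumptions, not the thesis's code.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, wq, wk, wv):
    # Standard scaled dot-product self-attention with one weight set.
    Q, K, V = x @ wq, x @ wk, x @ wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V

def multi_weight_attention(x, weight_sets):
    # One candidate output per trained weight set (the MAWT idea in miniature).
    return [attention(x, *w) for w in weight_sets]

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(5, d))                      # toy sequence of 5 tokens
weight_sets = [tuple(rng.normal(size=(d, d)) for _ in range(3))
               for _ in range(4)]                # 4 alternative weight sets
candidates = multi_weight_attention(x, weight_sets)
print(len(candidates), candidates[0].shape)      # 4 candidates, each (5, 8)
```

A separate selection method (which the abstract mentions but does not detail) would then pick one output from `candidates`, e.g. by scoring each against some criterion.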

Rights Notes
  • Copyright © 2023 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to Carleton University Institutional Repository and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.

Date Created
  • 2023
