MAWT: Multi-Attention-Weight Transformers

Abstract
  • Transformers are machine learning models designed to learn and predict sequential and structured data, which is crucial to tasks such as neural machine translation and semantic parsing. They have become state-of-the-art engines for both of these tasks, and much research in natural language processing is devoted to increasing their performance by modifying their architectures. In light of this trend, this thesis introduces a new Transformer architecture called MAWT: Multi-Attention-Weight Transformers, in an attempt to increase the accuracy and the variety of acceptable predictions of a Transformer. It attempts to achieve this by training multiple weight sets for each Transformer attention head, which are then used to evaluate the accuracy of the engine. This creates a new architecture under which the system produces a candidate set of outputs (instead of a single output), along with a method for selecting from the candidate set. My proposal rests on the assumption -- motivated by statistical considerations -- that having a candidate set increases the probability of finding an exact match within the set. Upon testing, I observed that my system outperforms the regular Transformer on 5/6 benchmark neural machine translation and semantic parsing datasets, where engine performance is measured by exact match accuracy. Exact match accuracy demands syntactic identity between the output and the target. To investigate how well my new architecture generalizes to measures of semantic equivalence that do not also demand syntactic identity, I also recorded the BLEU scores on these datasets. The BLEU score is a measure of performance based on n-grams rather than exact symbolic match (i.e., how many contiguous sequences of n tokens in the predicted output also appear in the desired output). The results I report on the BLEU scores are more mixed, raising important questions, which I highlight, about the role of syntax in measures of semantic equivalence.
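The core idea above -- several trained weight sets per attention head, each yielding one member of a candidate output set -- can be illustrated with a minimal sketch. This is an assumption-laden toy, not the thesis's implementation: the function names, dimensions, and the absence of any selection rule are all hypothetical; only the "one candidate per weight set" structure comes from the abstract.

```python
# Hypothetical sketch: one attention head with several alternative weight
# sets, producing a candidate set of outputs rather than a single output.
# All names and shapes here are illustrative assumptions, not the thesis's code.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, wq, wk, wv):
    # Standard scaled dot-product self-attention with one weight set.
    Q, K, V = x @ wq, x @ wk, x @ wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V

def multi_weight_attention(x, weight_sets):
    # One candidate output per trained weight set (the MAWT idea in miniature).
    return [attention(x, *w) for w in weight_sets]

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(5, d))                      # toy sequence of 5 tokens
weight_sets = [tuple(rng.normal(size=(d, d)) for _ in range(3))
               for _ in range(4)]                # 4 alternative weight sets
candidates = multi_weight_attention(x, weight_sets)
print(len(candidates), candidates[0].shape)      # 4 candidates, each (5, 8)
```

A separate selection method (which the abstract mentions but does not detail) would then pick one output from `candidates`, e.g. by scoring each against some criterion.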

Rights Notes
  • Copyright © 2023 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to Carleton University Institutional Repository and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.

Date Created
  • 2023
