Sequence Modeling with Linear Complexity
- Abstract
Sequence modeling is a fundamental task in machine learning. Today, non-autoregressive methods based on self-attention are considered the standard approach to modeling sequences. The major drawback of self-attention is the quadratic time complexity, O(n^2), it requires to process a sequence of length n, which makes it expensive and slow on long sequences. In this thesis, we aim to reduce this complexity to linear time without using attention. We introduce Time-aware Large Kernel (TaLK) Convolutions, a novel adaptive convolution operation that learns to predict the size of a summation kernel for each time step instead of using a fixed-size kernel matrix, with a time complexity of O(n). We evaluate our method on benchmark datasets for four different NLP tasks: machine translation, language modeling, abstractive text summarization, and text classification. Our proposed method achieves state-of-the-art results without attention, with a significantly faster execution time and a smaller memory footprint.
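To make the linear-time claim concrete, below is a minimal sketch of the summation-kernel idea from the abstract, not the thesis's actual implementation: each output position averages the input over a per-position window, and a prefix-sum (summed-area) table makes every window sum an O(1) lookup, so processing the whole sequence costs O(n). The function name `talk_conv_1d` and the offset tensors `left_off`/`right_off` are illustrative assumptions; in the actual method these window sizes are learned, which requires a differentiable formulation not shown here.

```python
import torch

def talk_conv_1d(x, left_off, right_off):
    """Hypothetical sketch of a time-aware summation kernel.

    x:         (batch, seq_len, dim) input sequence
    left_off:  (batch, seq_len) per-position left window sizes
    right_off: (batch, seq_len) per-position right window sizes
    Output position i averages x over [i - left_off[i], i + right_off[i]],
    computed in O(n) total via a prefix-sum (summed-area) table.
    """
    batch, n, dim = x.shape
    # Prefix sums along time: S[:, k] holds the sum of the first k steps.
    S = torch.cumsum(x, dim=1)
    S = torch.cat([x.new_zeros(batch, 1, dim), S], dim=1)  # (batch, n + 1, dim)

    pos = torch.arange(n, device=x.device)              # (n,)
    lo = (pos - left_off).clamp(min=0).long()           # inclusive window start
    hi = (pos + right_off).clamp(max=n - 1).long() + 1  # exclusive window end

    # Each window sum is one O(1) lookup: S[hi] - S[lo]; then average.
    lo_idx = lo.unsqueeze(-1).expand(batch, n, dim)
    hi_idx = hi.unsqueeze(-1).expand(batch, n, dim)
    window_sum = S.gather(1, hi_idx) - S.gather(1, lo_idx)
    window_len = (hi - lo).clamp(min=1).unsqueeze(-1).to(x.dtype)
    return window_sum / window_len

# Usage with fixed (hand-set) windows; the thesis learns these per position.
x = torch.randn(2, 8, 4)
left = torch.full((2, 8), 2)      # look two tokens into the past
right = torch.zeros(2, 8)         # causal: no future context
y = talk_conv_1d(x, left, right)  # shape (2, 8, 4), computed in O(n)
```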
- Rights Notes
Copyright © 2020 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to Carleton University Institutional Repository and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.
- Date Created: 2020
Items
| Title | Date Uploaded | Visibility |
|---|---|---|
| lioutas-sequencemodelingwithlinearcomplexity.pdf | 2023-05-05 | Public |