Sequence Modeling with Linear Complexity



Lioutas, Vasileios




Sequence modeling is a fundamental task in machine learning. Today, non-autoregressive, attention-based methods built on self-attention are the standard for modeling sequences. The major drawback of self-attention is its quadratic time complexity, O(n^2), for processing a sequence of length n, which makes it expensive and slow on long sequences. In this thesis, we aim to reduce this complexity to linear time without using attention. We introduce Time-aware Large Kernel (TaLK) Convolutions, a novel adaptive convolution operation that learns to predict the size of a summation kernel instead of using a fixed-size kernel matrix, achieving a time complexity of O(n). We evaluated our method on benchmark datasets across four NLP tasks: machine translation, language modeling, abstractive text summarization, and text classification. The proposed method achieves state-of-the-art results without attention, with significantly faster execution and a smaller memory footprint.
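The key to the O(n) cost is that a summation kernel over an adaptive window can be evaluated in constant time per position via prefix sums. The sketch below illustrates this in a simplified scalar form; the function name and the left/right offset parameterization are illustrative assumptions, not the thesis's exact formulation (which operates on learned, normalized offsets over vector-valued sequences):

```python
from itertools import accumulate

def talk_style_summation(x, left_off, right_off):
    """Adaptive windowed summation over a 1-D sequence (hedged sketch).

    x: list of floats (length n).
    left_off, right_off: per-position non-negative integer window
    extents (in the thesis these sizes are *predicted* per token).

    A prefix-sum table makes every window sum O(1), so the whole pass
    is O(n), versus the O(n^2) pairwise cost of self-attention.
    """
    n = len(x)
    prefix = [0.0] + list(accumulate(x))  # prefix[i] = sum(x[:i])
    out = []
    for i in range(n):
        lo = max(i - left_off[i], 0)       # inclusive left edge
        hi = min(i + right_off[i] + 1, n)  # exclusive right edge
        out.append(prefix[hi] - prefix[lo])
    return out
```

For example, with `x = [0, 1, 2, 3, 4]` and a window of one token to the left and none to the right at every position, each output is `x[i-1] + x[i]`, computed without ever materializing an n-by-n interaction matrix.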


Artificial Intelligence
Computer Science




Carleton University

Thesis Degree Name: Master of Computer Science

Thesis Degree Level: Master's

Thesis Degree Discipline: Computer Science

Parent Collection: Theses and Dissertations

Items in CURVE are protected by copyright, with all rights reserved, unless otherwise indicated. They are made available with permission from the author(s).