Sequence modeling is a fundamental task in machine learning. Today, methods based on self-attention are the de facto standard for modeling sequences. The major drawback of self-attention is the quadratic O(n^2) time complexity it requires to process a sequence of length n, which makes it expensive and slow on long sequences. In this thesis, we aim to reduce this complexity to linear time without using attention. We introduce Time-aware Large Kernel (TaLK) Convolutions, a novel adaptive convolution operation that learns to predict the size of a summation kernel for each time step instead of using a fixed-size kernel matrix, yielding a time complexity of O(n). We evaluate our method on benchmark datasets across four NLP tasks: machine translation, language modeling, abstractive text summarization, and text classification. Our proposed method achieves state-of-the-art results without attention, with significantly faster execution and a smaller memory footprint.
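To make the core idea concrete, the following is a minimal sketch of an adaptive summation kernel in NumPy. It assumes the per-position window sizes have already been predicted (in the actual method they come from a learned function of the input); the function name `talk_convolution` and the offset arrays are illustrative, not the thesis's API. A prefix sum lets each adaptive window be averaged in O(1), so the whole pass is O(n) per feature dimension rather than the O(n^2) of self-attention.

```python
import numpy as np

def talk_convolution(x, left_offsets, right_offsets):
    """Illustrative sketch of an adaptive summation kernel.

    x: (n, d) array, a sequence of n feature vectors.
    left_offsets, right_offsets: (n,) integer window extents per time step
    (assumed here to be given; the real method learns to predict them).
    """
    n, d = x.shape
    # Prefix sums: S[i] holds the sum of x[0..i-1], so any window sum
    # reduces to a single subtraction.
    S = np.concatenate([np.zeros((1, d)), np.cumsum(x, axis=0)], axis=0)
    out = np.empty_like(x)
    for i in range(n):
        lo = max(0, i - int(left_offsets[i]))
        hi = min(n - 1, i + int(right_offsets[i]))
        # Average over the adaptive window [lo, hi] in O(1) time.
        out[i] = (S[hi + 1] - S[lo]) / (hi - lo + 1)
    return out
```

With constant offsets this reduces to an ordinary fixed-width moving average; the point of the adaptive version is that each position can choose how much context to summarize.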