Inverse Visual Question Answering with Multi-Level Attentions



Alwattar, Yaser




Inverse Visual Question Answering (iVQA) is a recent task that emerged from the need to improve visual and language understanding. It tackles the challenging problem of generating a question that corresponds to a given image-answer pair. Current state-of-the-art iVQA models represent images in the conventional way, using a convolutional neural network (CNN) to extract visual features. Although some models leverage semantic concepts to enhance the answer cue, they assign the same importance weight to every concept without considering its correlation with the answer. Moreover, existing iVQA models rely mainly on conventional recurrent neural networks for question modelling, while attention-based sequence learning for question modelling, which could help reduce the number of model parameters, remains unexplored. In this research, we address these issues by developing two novel deep multi-level attention models for the task of inverse visual question answering.


Artificial Intelligence




Carleton University

Thesis Degree Name: 

Master of Computer Science

Thesis Degree Level: 


Thesis Degree Discipline: 

Computer Science

Parent Collection: 

Theses and Dissertations

Items in CURVE are protected by copyright, with all rights reserved, unless otherwise indicated. They are made available with permission from the author(s).