Inverse Visual Question Answering (iVQA) is a contemporary task that emerged from the need to improve visual and language understanding. It tackles the challenging problem of generating a corresponding question for a given image-answer pair. Current state-of-the-art iVQA models represent images in the conventional way, using a convolutional neural network (CNN) to extract visual features. Although some models leverage semantic concepts to enhance the answer cue, they assign equal importance weights to these concepts without considering their correlation with the answer. Moreover, existing iVQA models rely mainly on conventional recurrent neural networks for question modelling, while the attention-based sequence learning mechanism, which could help reduce the number of model parameters, remains unexplored. In this research, we address these issues by developing two novel deep multilevel attention models for the task of inverse visual question answering.
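To illustrate the kind of answer-conditioned weighting argued for above, the sketch below scores each semantic concept embedding against an answer embedding and forms a weighted concept summary, rather than averaging all concepts with equal importance. This is a minimal PyTorch sketch under assumed dimensions and names (`AnswerGuidedConceptAttention`, `concept_dim`, `answer_dim`); it is not the paper's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AnswerGuidedConceptAttention(nn.Module):
    """Weights semantic concepts by their correlation with the answer,
    instead of treating all concepts as equally important.
    Hypothetical sketch, not the paper's exact model."""

    def __init__(self, concept_dim: int, answer_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.concept_proj = nn.Linear(concept_dim, hidden_dim)
        self.answer_proj = nn.Linear(answer_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, concepts: torch.Tensor, answer: torch.Tensor) -> torch.Tensor:
        # concepts: (batch, num_concepts, concept_dim)
        # answer:   (batch, answer_dim)
        joint = torch.tanh(self.concept_proj(concepts)
                           + self.answer_proj(answer).unsqueeze(1))
        # One attention weight per concept, normalized over concepts.
        weights = F.softmax(self.score(joint), dim=1)  # (batch, num_concepts, 1)
        # Answer-correlated summary of the semantic concepts.
        return (weights * concepts).sum(dim=1)

# Example: 5 candidate concepts per image, 300-d embeddings.
attn = AnswerGuidedConceptAttention(concept_dim=300, answer_dim=300)
concepts = torch.randn(2, 5, 300)
answer = torch.randn(2, 300)
summary = attn(concepts, answer)  # (2, 300)
```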