Deep Generative Models for Unsupervised Scale-Based and Position-Based Disentanglement of Concepts from Face Images.

Abstract
  • Among the different categories of natural images, face images are especially important because of the role they play in human social interaction. Despite recent advances in artificial intelligence driven by deep neural networks, computers still struggle to achieve a rich and flexible understanding of face images comparable to human face perception. This thesis aims to find fully unsupervised ways of learning a transformation from the pixel space of face images to a representation space in which the underlying facial concepts are captured and disentangled. We propose that clues from the real 3D world can guide the representation learner toward disentangling facial concepts, and we conduct two studies to test this hypothesis. First, we propose a deep autoencoder model for extracting facial concepts based on their scales. We introduce an adaptive-resolution reconstruction loss, inspired by the fact that different categories of concepts are encoded in (and can be captured from) different resolutions of a face image. With this new reconstruction loss, the autoencoder takes a real face image and computes a representation vector that not only allows faithful reconstruction of the input image but also separates the concepts tied to specific scales. Second, we introduce a new scheme that enables generative adversarial networks to learn a representation of a face image composed of representations of its smaller facial components. This is inspired by the fact that all face images share the same underlying structure: a face image can therefore be divided into fixed-position parts, each containing only specific facial components. Learning a separate distribution for each of these parts is equivalent to disentangling these components in the representation space.
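The abstract's adaptive-resolution reconstruction loss can be illustrated with a minimal sketch: reconstruction error is measured at several resolutions of the image, so coarse-scale concepts are penalised at low resolution and fine details only at full resolution. The function names, the average-pool downsampling, the choice of scale factors, and the per-scale weights below are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def downsample(img, factor):
    # Average-pool a square image by an integer factor
    # (a stand-in for any differentiable downsampling operator).
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def multi_resolution_loss(x, x_hat, factors=(1, 2, 4, 8), weights=None):
    """Hypothetical scale-adaptive reconstruction loss: sum of MSE terms
    computed between the input x and reconstruction x_hat at several
    resolutions, weighted per scale."""
    weights = weights if weights is not None else [1.0] * len(factors)
    loss = 0.0
    for f, w in zip(factors, weights):
        diff = downsample(x, f) - downsample(x_hat, f)
        loss += w * np.mean(diff ** 2)
    return loss
```

In an actual autoencoder, each scale's term could be weighted (or scheduled) so that specific parts of the representation vector are pushed to explain specific resolutions, which is the mechanism the abstract attributes to the scale-based disentanglement.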
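The position-based idea relies on face images being aligned, so a fixed spatial partition assigns each region to specific facial components. A minimal sketch, assuming a simple rectangular grid (the thesis's actual partition of facial components may differ):

```python
import numpy as np

def split_face_parts(img, grid=(2, 2)):
    """Divide an aligned face image into fixed-position parts, e.g. a 2x2
    grid (roughly forehead/eyes on top, mouth/chin below). In the GAN
    scheme each part would get its own learned distribution; the grid
    layout here is an illustrative assumption."""
    rows, cols = grid
    h, w = img.shape[:2]
    return [img[r * h // rows:(r + 1) * h // rows,
                c * w // cols:(c + 1) * w // cols]
            for r in range(rows) for c in range(cols)]

def reassemble(parts, grid=(2, 2)):
    # Stitch the row-major list of parts back into a full image.
    rows, cols = grid
    return np.concatenate(
        [np.concatenate(parts[r * cols:(r + 1) * cols], axis=1)
         for r in range(rows)],
        axis=0)
```

Because the partition is lossless (splitting then reassembling recovers the original image), learning one distribution per part factorises the image distribution without discarding pixels.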

Rights Notes
  • Copyright © 2022 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to Carleton University Institutional Repository and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.

Date Created
  • 2022
