Multiple Model Reinforcement Learning for Environments with Poissonian Time Delays

Abstract
  • This thesis proposes a novel algorithm for reinforcement learning problems in which a stochastic time delay is present in an agent's reinforcement signal. In these problems, the agent does not necessarily receive a reinforcement immediately after the action that caused it. We relax previous constraints in the literature by allowing rewards to reach the agent out of order, or even to overlap with one another. The algorithm combines Q-learning with hypothesis testing so that the agent can learn about the delay itself. A proof of convergence is provided. The algorithm is tested in a MATLAB grid-world simulator, in the Webots mobile-robot simulator, and in an experiment with a real e-Puck mobile robot. In each of these test beds, the algorithm is compared to Watkins' Q-learning, of which it is an extension. In all cases, the novel algorithm outperforms Q-learning in situations where reinforcements are variably delayed.
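
As context for the abstract, the sketch below illustrates the credit-assignment problem the thesis targets: tabular Q-learning where each reward is withheld for a Poisson-distributed number of steps, so rewards can arrive out of order and several can land at once. This is a minimal sketch, not the thesis's algorithm; all names (GAMMA, LAM_HAT, HISTORY, etc.) are assumptions for illustration, and it backdates credit by a fixed delay estimate, whereas the thesis instead learns the delay online via hypothesis testing.

```python
import math
import random
from collections import defaultdict, deque

# Illustrative constants; every name here is an assumption for this sketch.
GAMMA, ALPHA, EPS = 0.95, 0.1, 0.1   # discount, learning rate, exploration
LAM = 3.0        # true mean of the Poisson reward delay
LAM_HAT = 3      # agent's point estimate of the mean delay (assumed known)
HISTORY = 20     # how many recent (s, a, s') transitions to retain

def poisson(lam):
    """Sample from Poisson(lam) via Knuth's method (no external deps)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        k += 1
        p *= random.random()
    return max(k - 1, 0)

Q = defaultdict(float)            # tabular action values, keyed by (s, a)
history = deque(maxlen=HISTORY)   # recent transitions (s, a, s')
pending = []                      # in-flight rewards as (arrival_step, r)

def choose(state, actions):
    """Epsilon-greedy action selection over the current Q estimates."""
    if random.random() < EPS:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def observe(t, state, action, next_state, reward, actions):
    """Record a transition; deliver and credit any rewards now due."""
    history.append((state, action, next_state))
    # The environment withholds this reward for d ~ Poisson(LAM) steps,
    # so deliveries can interleave and overlap across actions.
    pending.append((t + poisson(LAM), reward))
    due = [p for p in pending if p[0] <= t]
    pending[:] = [p for p in pending if p[0] > t]
    for _, r in due:
        # Backdate credit by LAM_HAT steps -- the most probable origin
        # if the delay estimate is accurate. (The thesis learns the
        # delay itself; this fixed estimate is purely illustrative.)
        if len(history) > LAM_HAT:
            s, a, s2 = history[-1 - LAM_HAT]
            target = r + GAMMA * max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])
```

Driving this loop in a toy environment and comparing it against a plain Q-learner that credits each reward to the most recent step would reproduce, in miniature, the kind of comparison the abstract describes against Watkins' Q-learning under variable delays.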

Rights Notes
  • Copyright © 2014 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to the Carleton University Institutional Repository, and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.

Date Created
  • 2014
