This thesis proposes a novel algorithm for use in reinforcement learning problems
where a stochastic time delay is present in an agent's reinforcement signal. In these
problems, the agent does not necessarily receive a reinforcement immediately after the
action that caused it. We relax constraints imposed in the previous literature by allowing
rewards to reach the agent out of order, or even to overlap with one another.
The algorithm combines Q-learning and hypothesis testing to enable the agent to
learn about the delay itself. A proof of convergence is provided. The algorithm is
tested in a grid-world simulator in MATLAB, in the Webots mobile-robot simulator,
and in an experiment with a real e-Puck mobile robot. In each of these test beds,
the algorithm is compared to Watkins' Q-learning, of which it is an extension. In all
cases, the novel algorithm outperforms Q-learning in situations where reinforcements
are variably delayed.