Multiple Model Reinforcement Learning for Environments with Poissonian Time Delays
- Abstract
This thesis proposes a novel algorithm for reinforcement learning problems in which a stochastic time delay affects the agent's reinforcement signal, so the agent does not necessarily receive a reinforcement immediately after the action that caused it. We relax constraints found in the prior literature by allowing rewards to reach the agent out of order, or even to overlap with one another. The algorithm combines Q-learning with hypothesis testing so that the agent can learn about the delay itself, and a proof of convergence is provided. The algorithm is tested in a MATLAB grid-world simulator, in the Webots mobile-robot simulator, and in an experiment with a real e-puck mobile robot. In each of these test beds, the algorithm is compared against Watkins' Q-learning, of which it is an extension. In all cases, the novel algorithm outperforms Q-learning when reinforcements are variably delayed.
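The problem setting the abstract describes can be illustrated with a minimal sketch. The code below is not the thesis algorithm; it shows only the baseline difficulty: a tabular Q-learning agent on a one-dimensional grid world whose rewards arrive after a Poisson-distributed number of steps, so rewards can arrive out of order and overlap, and the naive agent misattributes them to whatever state-action pair is current when they arrive. All function and parameter names here are illustrative.

```python
import random

def poisson(lam, rng):
    # Sample a Poisson-distributed delay via Knuth's multiplication method.
    threshold, k, p = pow(2.718281828459045, -lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def run(n_states=5, steps=2000, lam=2.0, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Naive Q-learning with Poisson-delayed rewards (illustrative only)."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right
    pending = []                               # (arrival_time, reward) pairs
    s, t = 0, 0
    for _ in range(steps):
        # Epsilon-greedy action selection.
        a = rng.randrange(2) if rng.random() < eps else max((0, 1), key=lambda x: Q[s][x])
        s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        # The true reward for this transition is queued with a random delay,
        # so it may arrive after (or interleaved with) later rewards.
        true_r = 1.0 if s2 == n_states - 1 else 0.0
        pending.append((t + poisson(lam, rng), true_r))
        # The naive agent credits whatever rewards arrive *now* to (s, a),
        # regardless of which earlier action actually generated them.
        r = sum(rw for at, rw in pending if at <= t)
        pending = [(at, rw) for at, rw in pending if at > t]
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = 0 if s2 == n_states - 1 else s2    # restart at the left end on success
        t += 1
    return Q
```

Because credit is assigned to the wrong state-action pairs whenever a delayed reward lands, the learned Q-values are systematically distorted; the thesis addresses this by having the agent model the delay distribution itself rather than assuming immediate reinforcement.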
- Rights Notes
Copyright © 2014 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to Carleton University Institutional Repository and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.
- Date Created
2014
Items
Title | Date Uploaded | Visibility
---|---|---
campbell-multiplemodelreinforcementlearningforenvironments.pdf | 2023-05-04 | Public