Multiple Model Reinforcement Learning for Environments with Poissonian Time Delays



Campbell, Jeff




This thesis proposes a novel algorithm for reinforcement learning problems in which the agent's reinforcement signal is subject to a stochastic time delay. In these problems, the agent does not necessarily receive a reinforcement immediately after the action that caused it. We relax previous constraints in the literature by allowing rewards to arrive at the agent out of order, or even to overlap with one another. The algorithm combines Q-learning with hypothesis testing so that the agent can learn about the delay itself. A proof of convergence is provided. The algorithm was tested in a grid-world simulator in MATLAB, in the Webots mobile-robot simulator, and in an experiment with a real e-Puck mobile robot. In each of these test beds, the algorithm is compared to Watkins' Q-learning, of which it is an extension. In all cases, the novel algorithm outperforms Q-learning when reinforcements are variably delayed.


Artificial Intelligence
System Science




Carleton University

Thesis Degree Name: Master of Applied Science

Thesis Degree Level: 

Thesis Degree Discipline: Engineering, Electrical and Computer

Parent Collection: Theses and Dissertations

Items in CURVE are protected by copyright, with all rights reserved, unless otherwise indicated. They are made available with permission from the author(s).