On Multi-Agent Reinforcement Learning in Matrix, Stochastic and Differential Games

Abstract
  • In this thesis, we investigate how reinforcement learning algorithms can be applied to two types of games. The first type comprises matrix and stochastic games, whose states and actions lie in discrete domains. For these games, we propose two multi-agent reinforcement learning algorithms that address learning when each agent has only minimal knowledge of the underlying game and of the other learning agents. We prove mathematically that the proposed CLR-EMAQL algorithm converges to Nash equilibrium in games with a pure Nash equilibrium. We introduce the Win-or-Learn-Slow (WoLS) mechanism for the proposed EMAQL algorithm, so that the algorithm learns slowly when it is losing, and we prove that EMAQL also converges to Nash equilibrium in games with a pure Nash equilibrium. In games with a mixed Nash equilibrium, our analysis shows that EMAQL converges to an equilibrium; although the analysis does not establish that this equilibrium is a Nash equilibrium, our simulation results indicate that it is. The second type comprises differential games, whose states and actions lie in continuous domains. Here we make four contributions. First, we propose a new fuzzy reinforcement learning algorithm for differential games with continuous state and action spaces. Second, we propose a fuzzy reinforcement learning algorithm for pursuit-evasion games whose trained pursuer can capture the evader even when the game environment differs from the training environment. Third, we propose a decentralized fuzzy reinforcement learning algorithm for multi-pursuer pursuit-evasion differential games against a single superior evader whose speed is similar to that of the pursuers. Fourth, we propose a decentralized fuzzy reinforcement learning algorithm for the same class of games in which the superior evader's speed is similar to or higher than that of each pursuer. Simulation results show the effectiveness of the proposed algorithms. (Illustrative sketches of the two settings appear below.)
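To make the discrete-game setting concrete, the following is a minimal sketch of a variable-learning-rate mechanism in the spirit of Win-or-Learn-Slow: two independent Q-learners in a matrix game that update slowly when they are losing. This is not the thesis's EMAQL or CLR-EMAQL algorithm; the matching-pennies payoff matrix, softmax policy, learning-rate values, and the "winning" test against a running-average reward are all illustrative assumptions.

    import numpy as np

    # Matching pennies: the row player's payoff; the column player receives
    # the negative. Illustrative sketch only -- NOT the thesis's EMAQL.
    PAYOFF = np.array([[1.0, -1.0],
                       [-1.0, 1.0]])

    rng = np.random.default_rng(0)

    def softmax(q, tau=0.2):
        z = np.exp((q - q.max()) / tau)
        return z / z.sum()

    q_row, q_col = np.zeros(2), np.zeros(2)   # stateless action values
    avg_row = avg_col = 0.0                   # running-average rewards
    ALPHA_WIN, ALPHA_LOSE = 0.1, 0.01         # learn slowly when losing (assumed)

    for t in range(1, 50_001):
        a = rng.choice(2, p=softmax(q_row))
        b = rng.choice(2, p=softmax(q_col))
        r_row = PAYOFF[a, b]
        r_col = -r_row

        # An agent counts as "winning" when its latest reward is at least
        # its running average; a losing agent updates with the smaller step.
        alpha_row = ALPHA_WIN if r_row >= avg_row else ALPHA_LOSE
        alpha_col = ALPHA_WIN if r_col >= avg_col else ALPHA_LOSE

        q_row[a] += alpha_row * (r_row - q_row[a])
        q_col[b] += alpha_col * (r_col - q_col[b])
        avg_row += (r_row - avg_row) / t
        avg_col += (r_col - avg_col) / t

    print("row policy:", softmax(q_row))
    print("col policy:", softmax(q_col))

Matching pennies has a unique mixed Nash equilibrium at the uniform strategy; simple independent learners like these tend to hover around it rather than converge exactly, which is the kind of difficulty the thesis's convergence analysis for EMAQL addresses.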
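For the continuous setting, the sketch below simulates bare pursuit-evasion kinematics: the pursuer applies pure pursuit along the line of sight while the evader flees along the same line. It illustrates only the continuous state and action spaces of the differential games described above and is not the thesis's fuzzy reinforcement learning controller; the speeds, time step, and capture radius are assumed values.

    import numpy as np

    DT = 0.05              # integration time step (assumed)
    V_PURSUER = 1.0        # pursuer speed (assumed)
    V_EVADER = 0.8         # evader speed; slower here, so capture is possible
    CAPTURE_RADIUS = 0.2   # capture condition (assumed)

    pursuer = np.array([0.0, 0.0])
    evader = np.array([5.0, 3.0])

    for step in range(10_000):
        los = evader - pursuer              # line-of-sight vector
        dist = np.linalg.norm(los)
        if dist <= CAPTURE_RADIUS:
            print(f"captured at step {step}, t = {step * DT:.2f} s")
            break
        heading = los / dist
        pursuer = pursuer + V_PURSUER * DT * heading   # chase along the LOS
        evader = evader + V_EVADER * DT * heading      # flee along the LOS
    else:
        print("no capture within the horizon")

A superior evader, as in the thesis's multi-pursuer games, would correspond to V_EVADER comparable to or greater than V_PURSUER; in that case a single pursuer using pure pursuit cannot close the distance in this sketch, which is why coordinated multi-pursuer strategies become necessary.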

Rights Notes
  • Copyright © 2017 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to the Carleton University Institutional Repository, and no part may be used without proper attribution to the author. No part may be used for commercial purposes, directly or indirectly, via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.

Date Created
  • 2017
