Learning in the Multi-Robot Pursuit Evasion Game

Abstract
  • This thesis investigates learning for mobile robots playing differential forms of the pursuit-evasion (PE) game and proposes several learning algorithms. The algorithms aim to reduce (1) the computational requirements as much as possible without degrading overall learning performance, (2) the learning time, and (3) the capture time and the possibility of collision among the pursuers, and to handle the multi-robot PE game with a single superior evader.

    The computational complexity is reduced by examining four methods of parameter tuning for the Q-Learning Fuzzy Inference System (QFIS) algorithm to decide which parameters are best to tune and which have little impact on performance. Then, two learning algorithms are proposed to reduce the learning time. The first uses a two-stage learning technique that combines the particle swarm optimization (PSO)-based fuzzy logic control (FLC) algorithm with the QFIS algorithm: PSO serves as a global optimizer and QFIS as a local optimizer. The second is a modified version of the fuzzy actor-critic learning (FACL) algorithm, called the fuzzy actor-critic learning Automaton (FACLA) algorithm. It uses the continuous actor-critic learning Automaton (CACLA) algorithm to tune the parameters of the fuzzy inference system (FIS).

    Next, a decentralized learning technique is proposed to enable a group of two or more pursuers to capture a single inferior evader. It uses the FACLA algorithm together with a Kalman filter to reduce both the capture time and the possibility of collision among the pursuers. No communication among the pursuers is assumed. Finally, a decentralized learning algorithm is proposed and applied successfully to the multi-robot PE game with a single superior evader, in which all players have similar speeds. A new reward function guides each pursuer to move either toward the interception point with the evader or in parallel with the evader, depending on whether the pursuer can capture the evader. Simulation results show the feasibility of the proposed learning algorithms.
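
The abstract says FACLA tunes the FIS parameters with a CACLA-style update. As a rough illustration only, the sketch below shows the general CACLA rule on a linear actor and critic: the critic follows the temporal-difference (TD) error, and the actor moves toward the action actually taken only when that error is positive. This is not the thesis's implementation. The function names, the learning rates, and the reading of the feature vector `phi` as normalized rule firing strengths are all assumptions made for the example.

```python
def dot(weights, features):
    """Inner product of two equal-length lists."""
    return sum(w * f for w, f in zip(weights, features))


def cacla_update(actor_w, critic_w, phi, phi_next, action_taken, reward,
                 alpha=0.1, beta=0.05, gamma=0.95):
    """One CACLA-style update for a linear actor and critic (hypothetical sketch).

    phi / phi_next : feature vectors for the current and next state
                     (assumed here to stand in for fuzzy rule firing strengths).
    action_taken   : the exploratory action that was actually executed.
    alpha, beta    : critic and actor learning rates (illustrative values).
    """
    # TD error from the critic's value estimates.
    td_error = reward + gamma * dot(critic_w, phi_next) - dot(critic_w, phi)

    # The critic is always updated along the TD error.
    new_critic = [w + alpha * td_error * f for w, f in zip(critic_w, phi)]

    # The actor is updated only when the executed action turned out
    # better than expected (positive TD error).
    if td_error > 0:
        actor_out = dot(actor_w, phi)
        new_actor = [w + beta * (action_taken - actor_out) * f
                     for w, f in zip(actor_w, phi)]
    else:
        new_actor = list(actor_w)

    return new_actor, new_critic, td_error
```

In this sketch a positive surprise pulls the actor's output toward the executed action, while a negative one leaves the actor unchanged and adjusts only the critic.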

Rights Notes
  • Copyright © 2019 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to Carleton University Institutional Repository and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.

Date Created
  • 2019