This thesis investigates the learning problem for mobile robots playing differential forms of the pursuit-evasion (PE) game and proposes several learning algorithms. These algorithms aim to reduce (1) the computational requirements, as far as possible without degrading overall performance, (2) the learning time, and (3) the capture time and the possibility of collisions among the pursuers, and to handle the multi-robot PE game with a single superior evader.
The computational complexity is reduced by examining four methods of parameter tuning for the Q-Learning Fuzzy Inference System (QFIS) algorithm to determine which parameters are the most worthwhile to tune and which have little impact on performance. Two learning algorithms are then proposed to reduce the learning time. The first uses a two-stage learning technique that combines a particle swarm optimization (PSO)-based fuzzy logic control (FLC) algorithm with the QFIS algorithm: PSO serves as a global optimizer, whereas QFIS serves as a local optimizer. The second is a modified version of the fuzzy actor-critic learning (FACL) algorithm, called the fuzzy actor-critic learning automaton (FACLA) algorithm, which uses the continuous actor-critic learning automaton (CACLA) algorithm to tune the parameters of the FIS.
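The CACLA update that FACLA builds on can be illustrated with a minimal tabular sketch. Note this is a simplification for illustration only: in the thesis the actor and critic are fuzzy inference systems rather than tables, and the state encoding, learning rates, and function name here are assumptions.

```python
def cacla_update(v, actor, s, a_taken, r, s_next,
                 alpha=0.1, beta=0.1, gamma=0.95):
    """Hypothetical tabular CACLA step.

    v      : list of state values (critic)
    actor  : list of preferred actions per state (actor)
    s, s_next : integer state indices; a_taken : explored action; r : reward
    """
    # Critic: temporal-difference (TD) error
    delta = r + gamma * v[s_next] - v[s]
    v[s] += alpha * delta
    # Actor (the CACLA rule): move toward the explored action
    # only when it turned out better than expected (delta > 0)
    if delta > 0:
        actor[s] += beta * (a_taken - actor[s])
    return delta
```

The defining feature of CACLA, as opposed to a standard actor-critic, is that the actor is updated toward the taken action only on a positive TD error, never pushed away on a negative one.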
After that, a decentralized learning technique is proposed to enable a group of two or more pursuers to capture a single inferior evader. It combines the FACLA algorithm with the Kalman filter to reduce the capture time and the possibility of collisions among the pursuers; no communication among the pursuers is assumed. Finally, a decentralized learning algorithm is proposed and applied successfully to the multi-robot PE game with a single superior evader, in which all players have similar speeds. A new reward function is introduced to guide each pursuer either toward the interception point with the evader or into motion parallel to the evader, depending on whether that pursuer can capture the evader. Simulation results show the feasibility of the proposed learning algorithms.
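The decision the reward function encodes can be sketched geometrically. The abstract does not give the reward's actual form, so the following is a hypothetical illustration using the classical lead-angle interception condition; the function names, the cosine shaping term, and the choice of pure-parallel motion in the infeasible case are all assumptions.

```python
import math

def desired_heading(px, py, ex, ey, e_heading, v_p, v_e):
    """Hypothetical target heading for the pursuer: aim at the
    interception point when one exists, otherwise run parallel
    to the evader."""
    # Line-of-sight (LOS) angle from pursuer to evader
    los = math.atan2(ey - py, ex - px)
    # Interception is feasible iff the pursuer can match the evader's
    # velocity component perpendicular to the LOS:
    #   v_p * sin(lead) = v_e * sin(e_heading - los)
    ratio = (v_e / v_p) * math.sin(e_heading - los)
    if abs(ratio) <= 1.0:
        return los + math.asin(ratio)  # lead toward the interception point
    return e_heading                   # infeasible: move parallel to evader

def reward(p_heading, target_heading):
    # Hypothetical shaping reward: 1 when aligned with the target
    # heading, falling off with the (wrapped) angular error
    err = math.atan2(math.sin(target_heading - p_heading),
                     math.cos(target_heading - p_heading))
    return math.cos(err)
```

Under this sketch, a pursuer facing a faster evader that is crossing its line of sight receives the evader's own heading as its target, which is the parallel-motion behavior the abstract describes.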