Deep Reinforcement Learning as Guidance for Aerospace Robotics



Hovell, Kirk Charles




The ability of a manipulator-equipped chaser spacecraft to autonomously capture a target spacecraft is an unsolved prerequisite for space debris removal and on-orbit servicing. This thesis investigates the use of deep reinforcement learning (DRL) to improve the capabilities of a manipulator-equipped chaser at this task. DRL allows behaviour to be learned, rather than designed, according to a simple reward function. However, DRL learns this behaviour through trial and error, which is not feasible to perform on board a spacecraft. Training must therefore be performed in simulation, with the resulting behaviour transferred to the spacecraft. Transferring behaviour learned in simulation to a real robot is difficult due to dynamics differences between the simulator and the real world, i.e., the simulation-to-reality gap. This thesis develops, over the course of four increasingly difficult applications, a solution to the simulation-to-reality gap by restricting DRL to learning only the guidance portion of the guidance, navigation, and control system needed for autonomous spacecraft operations. The first application is spacecraft proximity operations (without capture), where a DRL-based guidance strategy that issues desired-velocity signals is designed, trained, and evaluated in simulation and experiment. Next, the DRL-based guidance strategy is improved and applied to a quadrotor proximity-operations scenario. Here, it is demonstrated in simulation and experiment that desired-acceleration signals lead to better performance than desired-velocity signals. These two proof-of-concept results show that the proposed DRL-based guidance strategy is viable for bringing DRL to real aerospace vehicles.
Next, the DRL-based guidance strategy is applied to a more difficult scenario: a multi-agent cooperative quadrotor runway-inspection task, in which fault-tolerant behaviour is successfully learned and demonstrated both in simulation and at a real, outdoor, GPS-driven quadrotor facility. Finally, with the DRL-based guidance strategy now developed, the author returns to the central motivation for this research: autonomous manipulator-based capture of a spinning spacecraft. The DRL-based guidance strategy learns this task in simulation and is successfully transferred to an experimental facility, where similar results are obtained. Capture also succeeds in experiment despite large perturbations and initial conditions not seen during training. Improvements to the experimental facility were made to enable this research.
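The core architectural idea in the abstract can be sketched as follows: a learned policy produces only a guidance signal (e.g., a desired velocity), while a conventional tracking controller, which remains unchanged between simulation and reality, converts that signal into actuator commands. This is a minimal, hypothetical illustration, not the thesis's actual implementation; the network, its random weights, and the gain value are stand-ins for a trained policy and tuned controller.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer policy standing in for a trained DRL guidance network:
# maps the state (relative position and velocity) to a desired velocity.
W1, b1 = rng.standard_normal((16, 4)) * 0.1, np.zeros(16)
W2, b2 = rng.standard_normal((2, 16)) * 0.1, np.zeros(2)

def guidance_policy(state):
    """Learned guidance: state -> desired-velocity command."""
    h = np.tanh(W1 @ state + b1)
    return W2 @ h + b2

def velocity_controller(v_desired, v_actual, kp=2.0):
    """Conventional controller that tracks the guidance signal.

    Because only guidance is learned, this controller stays fixed when
    moving from simulation to the real vehicle, which is what narrows
    the simulation-to-reality gap.
    """
    return kp * (v_desired - v_actual)  # commanded acceleration

# One guidance-plus-control step for a chaser at position p with velocity v.
p = np.array([1.0, -0.5])
v = np.array([0.0, 0.2])
state = np.concatenate([p, v])

v_des = guidance_policy(state)          # learned guidance layer
accel_cmd = velocity_controller(v_des, v)  # conventional control layer
```

In this split, retraining or redeploying the policy never touches the controller, so the low-level dynamics seen by the vehicle are handled by a component that was never trained in simulation.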


Engineering - Aerospace
Artificial Intelligence




Carleton University


Supervisor and co-author: 
Steve Ulrich
Murat Bronz

Thesis Degree Name: 

Doctor of Philosophy

Thesis Degree Level: 


Thesis Degree Discipline: 

Engineering, Aerospace

Parent Collection: 

Theses and Dissertations

Items in CURVE are protected by copyright, with all rights reserved, unless otherwise indicated. They are made available with permission from the author(s).