Imitation learning is a supervised learning problem in which a model is trained to perform a task in a given environment from expert demonstrations. In this thesis, we propose five metrics for evaluating the performance of imitation learning agents. We compare state-of-the-art imitation learning models against deep neural networks at imitating state-based and reactive behavior, using two partially observable domains: the continuous RoboCup domain and the discrete Vacuum Cleaner domain. We show that, when imitating state-based behavior, our proposed metrics provide more qualitative information about the performance of imitation learners than state-of-the-art metrics do. In addition, we show that our testing methodology yields results consistent with the eye test, which current testing methodologies fail to capture. Finally, we show that Long Short-Term Memory (LSTM) networks outperform state-of-the-art models at imitating state-based behavior in the RoboCup soccer domain.