In this thesis, we aim to learn a deep embedding space suitable for k-nearest-neighbour (k-NN) classification. Our approach is based on minimizing the leave-one-out 1-NN classification error in the embedding space. Directly optimizing this rule is intractable because the 1-NN decision is discontinuous in the embedding. We therefore propose Multi-scale Deep Nearest Neighbour (MsDNN), a differentiable loss function that maximizes the expected sample margin of every training sample. Training with MsDNN yields an embedding space, which we evaluate from two angles. From the classification view, we run a k-NN classifier at test time and report its accuracy. Classification accuracy alone, however, does not tell the entire story about the quality of an embedding space, so we also run k-means clustering in the embedding space. Analogous to hierarchical clustering, subclasses may exist at different scales, and our method provides a mechanism to target subclasses at each of these scales.
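MsDNN's exact margin-based objective is defined later in the thesis. Purely as an illustration of why a smooth surrogate makes the leave-one-out 1-NN error optimizable, the sketch below implements an NCA-style soft leave-one-out loss: each sample distributes soft neighbour weights over all other samples via a softmax on negative squared distances, so the probability of picking a same-class neighbour becomes differentiable in the embeddings. The function name, the `scale` temperature parameter, and the toy data are all hypothetical and not part of MsDNN itself.

```python
import numpy as np

def soft_loo_nn_loss(Z, y, scale=1.0):
    """NCA-style smooth surrogate for the leave-one-out 1-NN error.

    Z: (n, d) array of embeddings; y: (n,) integer class labels.
    Returns the negative mean probability that a sample's soft nearest
    neighbour (self excluded) has the same label. A loss near -1 means
    every point is surrounded by its own class; near 0 means it is not.
    """
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    logits = -scale * d2
    np.fill_diagonal(logits, -np.inf)                    # leave-one-out: exclude self
    logits -= logits.max(axis=1, keepdims=True)          # stabilise the softmax
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)                    # soft neighbour weights
    same = (y[:, None] == y[None, :]).astype(float)
    p_correct = (w * same).sum(axis=1)                   # soft prob. of a same-class hit
    return -p_correct.mean()

# Two tight, well-separated clusters: the surrogate loss approaches -1.
Z = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
y_good = np.array([0, 0, 1, 1])
# The same points with interleaved labels: every nearest neighbour is
# wrong-class, so the loss sits near 0.
y_bad = np.array([0, 1, 0, 1])
```

Because the surrogate is smooth in `Z`, its gradient can be backpropagated through a deep network, which is the role MsDNN's margin-based loss plays in our training pipeline.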