Distributed representations, predominantly acquired via neural networks, have been applied to natural language processing tasks, including speech recognition and machine translation, with success comparable to that of sophisticated state-of-the-art algorithms. This thesis investigates the application of such representations to information extraction. Specifically, I explore the suitability of shallow distributed representations for two tasks: automatic terminology extraction and bridging reference resolution. I create a dataset that serves as a gold standard for automatic term extraction in the mathematical education domain, and I carefully assess the performance of existing terminology extraction methods on this dataset. I then introduce a novel method for the automatic extraction of single-word terms and evaluate its performance in various terminological domains. The method leverages distributed representations of words from both local and global perspectives to encode syntactic, semantic, association, and frequency information simultaneously. Furthermore, it can be trained with a minimal number of data points. I show that the method is robust to domain change and that information can be transferred from one technical domain to another by leveraging what we call anchor words, whose semantics are consistent across the domains. For the bridging reference resolution task, I build a dataset on the letters portion of the Open American National Corpus and compare the performance of a preliminary method against a majority-class baseline.