Object Detection and Tracking for Creation of Interactive Videos



Li, Jueya




Object detection is a fundamental step in creating interactive videos. In this thesis, we propose a new method for object detection that combines object recognition with tracking in a neural network. Specifically, we use GoogLeNet as a feature extractor and then apply a long short-term memory (LSTM) network to adjust the feature vectors extracted by GoogLeNet according to the context provided by the feature vectors of the previous frame. We feed the output of the LSTM to a classifier and a regressor, as in the OverFeat network, to obtain predicted confidences and predicted bounding boxes. We pre-train the feature extractor on the ImageNet dataset and then evaluate our network on the OTB100 dataset. We compare our results with those obtained without tracking. Our model performs better at predicting objects in frames where occlusion and background clutter appear, and it produces more consistent object bounding boxes across frames.
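The data flow described in the abstract (per-frame features, then an LSTM that carries context from earlier frames, then a confidence head and a bounding-box head) can be illustrated with a minimal sketch. This is not the thesis implementation: the feature vectors below are random placeholders standing in for GoogLeNet outputs, the weights are untrained, the two heads are plain linear layers rather than the OverFeat-style classifier and regressor, and every name and size (`LSTMCell`, `detect_sequence`, `FEATURE_SIZE`, `HIDDEN_SIZE`) is hypothetical. Pure Python is used only to keep the example self-contained.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    """Small random weights; placeholders for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """Standard LSTM cell with input (i), forget (f), output (o), candidate (g) gates."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One weight matrix per gate, applied to the concatenation [x, h_prev].
        self.W = {k: rand_matrix(hidden_size, input_size + hidden_size, rng)
                  for k in "ifog"}
        self.b = {k: [0.0] * hidden_size for k in "ifog"}

    def step(self, x, h_prev, c_prev):
        z = x + h_prev  # list concatenation of current features and previous state
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])]
               for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        # New cell state mixes the remembered state with the new candidate.
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c


def detect_sequence(frame_features, cell, w_conf, W_box):
    """Run the LSTM over per-frame features and apply both heads to each frame."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    outputs = []
    for x in frame_features:
        h, c = cell.step(x, h, c)  # features adjusted by context from earlier frames
        confidence = sigmoid(sum(w * v for w, v in zip(w_conf, h)))
        box = matvec(W_box, h)  # four numbers, e.g. (x, y, width, height)
        outputs.append((confidence, box))
    return outputs


rng = random.Random(0)
FEATURE_SIZE, HIDDEN_SIZE = 8, 6  # toy sizes chosen for illustration only
cell = LSTMCell(FEATURE_SIZE, HIDDEN_SIZE, rng)
w_conf = [rng.uniform(-0.1, 0.1) for _ in range(HIDDEN_SIZE)]
W_box = rand_matrix(4, HIDDEN_SIZE, rng)

# Placeholder features for five consecutive frames.
frames = [[rng.random() for _ in range(FEATURE_SIZE)] for _ in range(5)]
results = detect_sequence(frames, cell, w_conf, W_box)
for index, (confidence, box) in enumerate(results):
    print(index, round(confidence, 3), [round(v, 3) for v in box])
```

Because the hidden and cell states persist across the loop, feeding the same feature vector twice in a row yields different predictions for the two frames. In this sketch, that persistence is how the context of earlier frames influences the current prediction.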


Computer Science




Carleton University

Thesis Degree Name: Master of Computer Science

Thesis Degree Level: 


Thesis Degree Discipline: Computer Science

Parent Collection: Theses and Dissertations

Items in CURVE are protected by copyright, with all rights reserved, unless otherwise indicated. They are made available with permission from the author(s).