Object Detection and Tracking for Creation of Interactive Videos

Resource Type

Creator

Abstract

Object detection is a fundamental approach in creating interactive videos. In this thesis, we propose a new method for object detection that combines object recognition with tracking in a neural network. Specifically, we use GoogLeNet as a feature extractor and then apply a long short-term memory (LSTM) network to adjust the feature vectors extracted by GoogLeNet according to the feature vectors of the previous frame. We feed the output of the LSTM to a classifier and a regressor, as in the OverFeat network, to obtain predicted confidences and predicted bounding boxes. We pre-train the feature extractor on the ImageNet dataset and then evaluate our network on the OTB100 dataset. We compare our results with those obtained without tracking. Our model performs better at predicting objects in frames where occlusion and background clutter appear, and it produces more consistent object bounding boxes across frames.
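
The pipeline described in the abstract (per-frame features, an LSTM that carries context from the previous frame, then a confidence head and a box head) can be sketched roughly as follows. This is an illustrative toy in pure Python, not the thesis implementation: the names `LSTMCell` and `Tracker` are hypothetical, the weights are random and untrained, the input vectors merely stand in for GoogLeNet features, and a single box per frame is assumed for simplicity.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """Standard LSTM cell: one weight matrix per gate over [x, h_prev]."""

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, input_size, hidden_size, rng):
        n = input_size + hidden_size
        self.W = {k: rand_matrix(hidden_size, n, rng) for k in self.GATES}
        self.b = {k: [0.0] * hidden_size for k in self.GATES}

    def step(self, x, h_prev, c_prev):
        z = x + h_prev  # concatenate current features with previous hidden state
        pre = {
            k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])]
            for k in self.GATES
        }
        i = [sigmoid(v) for v in pre["input"]]
        f = [sigmoid(v) for v in pre["forget"]]
        o = [sigmoid(v) for v in pre["output"]]
        g = [math.tanh(v) for v in pre["candidate"]]
        # New cell state mixes the previous frame's memory with the new candidate.
        c = [fv * cp + iv * gv for fv, cp, iv, gv in zip(f, c_prev, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h, c


class Tracker:
    """Hypothetical wrapper: LSTM over frame features plus two linear heads."""

    def __init__(self, feat_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        self.cell = LSTMCell(feat_size, hidden_size, rng)
        self.w_conf = rand_matrix(1, hidden_size, rng)  # classifier head (assumed linear)
        self.w_box = rand_matrix(4, hidden_size, rng)   # regressor head: 4 box values

    def run(self, frame_features):
        # frame_features: one feature vector per frame, standing in for
        # the output of a pre-trained feature extractor.
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        outputs = []
        for feat in frame_features:
            h, c = self.cell.step(feat, h, c)
            confidence = sigmoid(matvec(self.w_conf, h)[0])
            box = matvec(self.w_box, h)
            outputs.append((confidence, box))
        return outputs
```

Calling `Tracker(8, 5).run(features)` on a list of 8-dimensional vectors returns one `(confidence, box)` pair per frame. Because the hidden and cell states persist across frames, the same feature vector can yield a different prediction depending on what preceded it, which is the mechanism the abstract relies on for handling occlusion and clutter.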

Subject

Language

Publisher

Thesis Degree Level

Thesis Degree Name

Thesis Degree Discipline

Identifier

Rights Notes

Copyright © 2018 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to Carleton University Institutional Repository and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.

Date Created

Relations

In Collection:

Title	Date Uploaded	Visibility
li-objectdetectionandtrackingforthecreationof.pdf	2023-05-05	Public