A Graph-Based Indexing Technique for Efficient Searching in Large Scale Textual Documents

Creator: 

Kalandar Mohideen, Mohamed Abdulla

Date: 

2020

Abstract: 

This thesis proposes a new graph-based indexing technique, the Graph-Based Index (GBI), to reduce search latency for textual documents. GBI uses a directed graph built using a hash table to capture the co-occurrence of multiple keywords in a document. The objective is to combine the relationships between search keywords captured in the graph structure with a fast hash table lookup so that all the results of a query are retrieved at once. A proof-of-concept prototype has been built for both GBI and Inverted Index. A thorough performance analysis compares GBI with Inverted Index using a synthetic workload. GBI is also compared with Elasticsearch, an enterprise-level search engine. The results show that the graph-based indexing technique can notably reduce query search latency in comparison to Inverted Index and Elasticsearch.
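
The general idea in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the abstract does not specify the edge layout, the edge direction, or how queries with more than two keywords are handled, so the pair-keyed hash table, the alphabetical edge direction, and every function name below are assumptions made purely for illustration. The sketch contrasts an inverted index, which intersects two posting sets at query time, with a keyword-pair graph that answers a two-keyword query with a single hash lookup.

```python
from collections import defaultdict
from itertools import combinations


def tokenize(text):
    """Lowercase whitespace tokenizer; returns the set of distinct keywords."""
    return set(text.lower().split())


def build_inverted_index(docs):
    """Map each keyword to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def build_pair_graph(docs):
    """Hash table of directed edges (a, b) with a < b alphabetically.

    Each edge stores the ids of the documents in which both keywords occur.
    The edge direction is an assumption chosen only to make keys canonical.
    """
    graph = defaultdict(set)
    for doc_id, text in docs.items():
        for a, b in combinations(sorted(tokenize(text)), 2):
            graph[(a, b)].add(doc_id)
    return graph


def search_inverted(index, a, b):
    """Two-keyword AND query: intersect two posting sets at query time."""
    return index.get(a, set()) & index.get(b, set())


def search_graph(graph, a, b):
    """Two-keyword AND query for distinct keywords: one hash lookup."""
    return graph.get(tuple(sorted((a, b))), set())


if __name__ == "__main__":
    docs = {
        1: "graph index search",
        2: "graph hash table",
        3: "hash index search",
    }
    index = build_inverted_index(docs)
    graph = build_pair_graph(docs)
    print(search_inverted(index, "search", "index"))
    print(search_graph(graph, "search", "index"))
```

In this sketch the pair graph trades memory for lookup speed: it stores one entry per keyword pair per document, which grows quadratically with the number of distinct keywords in a document, whereas the inverted index stores one entry per keyword and pays for the intersection at query time.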

Subject: 

System Science
Computer Science

Language: 

English

Publisher: 

Carleton University

Thesis Degree Name: 

Master of Applied Science (M.App.Sc.)

Thesis Degree Level: 

Master's

Thesis Degree Discipline: 

Engineering, Electrical and Computer

Parent Collection: 

Theses and Dissertations

Items in CURVE are protected by copyright, with all rights reserved, unless otherwise indicated. They are made available with permission from the author(s).