A Parallel Processing Technique for Filtering and Storing User Specified Data


Chanda, Bannya




Users are often interested in only a specific type of data (user-preferred data) within a large-volume dataset. A system that stores only the user-preferred data can reduce search latency, allowing users to find relevant information in a timely manner. The motivation behind this thesis is to devise a technique that filters a large dataset and stores only the filtered data, thereby also saving storage space for the user. The filtering operation can be CPU-intensive, which can lead to high latency in extracting the preferred data from the dataset. To address this problem, the technique employs parallel processing and machine learning. A proof-of-concept prototype of the technique has been built on Apache Spark, and its performance on synthetic datasets is analyzed. The experimental results show the viability of the technique and provide insights into system behavior and performance.
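The general idea in the abstract (partition the dataset, filter the partitions concurrently, keep only the matching records) can be illustrated with a minimal sketch. This is not the thesis prototype: it uses Python's standard-library thread pool rather than Apache Spark, and the function names (`matches_preference`, `split_into_partitions`, `parallel_filter`) are hypothetical. In particular, `matches_preference` is a simple keyword test standing in for whatever trained model the thesis uses, which the abstract does not describe.

```python
from concurrent.futures import ThreadPoolExecutor


def matches_preference(record, keywords):
    """Hypothetical stand-in for a trained classifier.

    Keeps a record if it mentions any of the user's keywords
    (case-insensitive substring match).
    """
    text = record.lower()
    return any(k in text for k in keywords)


def split_into_partitions(records, num_partitions):
    """Split the records into contiguous chunks of roughly equal size."""
    size = max(1, -(-len(records) // num_partitions))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def filter_partition(partition, keywords):
    """Filter one partition independently of all the others."""
    return [r for r in partition if matches_preference(r, keywords)]


def parallel_filter(records, keywords, num_partitions=4):
    """Filter the partitions concurrently and return only the kept records."""
    keywords = [k.lower() for k in keywords]
    partitions = split_into_partitions(records, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = pool.map(lambda p: filter_partition(p, keywords), partitions)
        # map() yields results in partition order, so input order is preserved.
        return [r for part in results for r in part]


if __name__ == "__main__":
    data = ["Spark cluster tuning", "Cooking pasta", "Parallel SPARK jobs"]
    # Only the records returned here would be written to storage.
    print(parallel_filter(data, ["spark"], num_partitions=2))
```

Threads are used here only to keep the sketch self-contained. For a genuinely CPU-intensive filter, CPython threads would not run the work in parallel, so a process pool or a cluster framework such as Spark would be the more realistic choice.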


Computer Science
Engineering - Electronics and Electrical




Carleton University

Thesis Degree Name: Master of Applied Science
Thesis Degree Level:
Thesis Degree Discipline: Engineering, Electrical and Computer
Parent Collection: Theses and Dissertations

Items in CURVE are protected by copyright, with all rights reserved, unless otherwise indicated. They are made available with permission from the author(s).