A Workload-Driven Framework for NoSQL Data Modeling and Partitioning

Creator: 

Davoudian, Ali

Date: 

2021

Abstract: 

Due to the scalability problems of traditional relational database systems, a variety of NoSQL stores have emerged over the last decade to deal with big data. The lack of standard processes for designing and partitioning NoSQL datasets, two non-orthogonal principles of distributed database systems, has led to the proposal of several recent methods. On the one hand, the existing design methods provide various conceptual modeling notations and mainly target a particular NoSQL data model, which causes extra effort for designers when switching from one data model to another. Also, because many methods provide just a set of guidelines and heuristics for the design process, they have to be applied manually, which is error-prone and time-consuming. To deal with these limitations, we present a novel method for designing key-value, wide-column, and document NoSQL database schemas from the same conceptual model. It first generates a generic NoSQL logical schema from the conceptual model and query workload of the system. Then it converts the generic schema to the schemas of the targeted NoSQL data models, taking into account their important features and the design trade-offs between read query performance and storage overhead or consistency maintenance. On the other hand, the existing graph partitioning strategies are mostly workload-agnostic, as they presume the same probability of traversing edges or visiting vertices, which does not always hold with different query workloads. In addition, they are mostly graph topology-agnostic, as they do not differentiate between high-degree and low-degree vertices. Furthermore, many existing workload-aware strategies are unable to adapt to dynamic workloads. To address these limitations, we present a novel workload-adaptive and topology-driven approach named Helios, which aims to achieve low-latency and high-throughput online graph queries.
To assess the impact of Helios on a graph store and to show how easily the approach can be plugged on top of such a system, we apply it to a distributed graph-based RDF store. The query engine of the store uses Helios to reduce inter-node communication for future queries and to balance the computational load across a cluster of nodes.

Subject: 

Computer science

Language: 

English

Publisher: 

Carleton University

Thesis Degree Name: 

Doctor of Philosophy: 
Ph.D.

Thesis Degree Level: 

Doctoral

Thesis Degree Discipline: 

Computer Science

Parent Collection: 

Theses and Dissertations

Items in CURVE are protected by copyright, with all rights reserved, unless otherwise indicated. They are made available with permission from the author(s).