A Workload-Driven Framework for NoSQL Data Modeling and Partitioning

Abstract
  • Due to the scalability problems of traditional relational database systems, a variety of NoSQL stores have emerged over the last decade to deal with big data. The lack of standard processes for designing and partitioning NoSQL datasets, two non-orthogonal principles of distributed database systems, has led to the proposal of several recent methods. On the one hand, the existing design methods provide various conceptual modeling notations and mainly target one particular NoSQL data model, which causes extra effort for designers when switching from one data model to another. Also, because many methods provide only a set of guidelines and heuristics for the design process, they have to be applied manually, which is error-prone and time-consuming. To deal with these limitations, we present a novel method for designing key-value, wide-column, and document NoSQL database schemas from the same conceptual model. It first generates a generic NoSQL logical schema from the conceptual model and the query workload of the system. It then converts the generic schema to the schemas of the targeted NoSQL data models, taking into account their important features and the design trade-offs between read query performance and storage overhead or consistency maintenance. On the other hand, the existing graph partitioning strategies are mostly workload-agnostic, as they presume the same probability of traversing edges or visiting vertices, which does not always hold across different query workloads. In addition, they are mostly graph topology-agnostic, as they do not differentiate between high-degree and low-degree vertices. Furthermore, many existing workload-aware strategies are unable to adapt to dynamic workloads. To address these limitations, we present a novel workload-adaptive and topology-driven approach named Helios, which aims to achieve low-latency and high-throughput online graph queries.
To assess the impact of Helios on a graph store and to show how easily the approach can be plugged on top of such a system, we apply it in a distributed graph-based RDF store. The query engine of the store uses Helios to reduce inter-node communication for future queries and to balance the computational load across a cluster of nodes.
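
The point about workload-agnostic partitioning can be illustrated with a small sketch. It is not taken from the thesis and does not describe how Helios works; the graph, the traversal counts, the two placements, and the function name edge_cut are all hypothetical. It only shows that two placements with the same plain edge cut can differ widely once each cut edge is weighted by how often queries traverse it.

```python
def edge_cut(edges, assignment, traversal_counts=None):
    """Sum the cost of edges whose endpoints are placed on different nodes.

    With traversal_counts=None every cut edge costs 1, which is the
    workload-agnostic view. Otherwise a cut edge costs the number of
    times the (hypothetical) workload traverses it.
    """
    total = 0
    for u, v in edges:
        if assignment[u] != assignment[v]:
            if traversal_counts is None:
                total += 1
            else:
                total += traversal_counts.get((u, v), 0)
    return total


# Hypothetical four-vertex cycle and a made-up query workload.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
traversals = {("a", "b"): 90, ("b", "c"): 5, ("c", "d"): 90, ("d", "a"): 5}

# Two balanced placements of the vertices on cluster nodes 0 and 1.
plan_1 = {"a": 0, "b": 0, "c": 1, "d": 1}  # cuts b-c and d-a
plan_2 = {"a": 0, "b": 1, "c": 1, "d": 0}  # cuts a-b and c-d

for name, plan in (("plan_1", plan_1), ("plan_2", plan_2)):
    print(name, edge_cut(edges, plan), edge_cut(edges, plan, traversals))
```

Both placements cut two edges, so an unweighted metric cannot tell them apart, while the weighted cost is 10 for plan_1 and 180 for plan_2 under this made-up workload.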

Rights Notes
  • Copyright © 2021 the author(s). Theses may be used for non-commercial research, educational, or related academic purposes only. Such uses include personal study, research, scholarship, and teaching. Theses may only be shared by linking to Carleton University Institutional Repository and no part may be used without proper attribution to the author. No part may be used for commercial purposes directly or indirectly via a for-profit platform; no adaptation or derivative works are permitted without consent from the copyright owner.

Date Created
  • 2021