One of the most powerful and prominent technologies for knowledge discovery in Decision
Support System environments is Online Analytical Processing (OLAP). OLAP is the foundation
for a wide range of essential business applications. Since its introduction, OLAP
has consistently required massive computational power. However, the physical limits on
single-processor speed bound the performance of any single-processor
solution. Parallel and distributed processing can provide two key ingredients to solve
this problem: increased computational power through parallel processors and increased I/O
bandwidth through parallel storage.
In this thesis, we provide new methods to parallelize OLAP systems on recent parallel
and distributed platforms. We present new algorithms, including two parallel sorting
algorithms (sorting being an important part of data cube construction) for many-core
Graphics Processing Units (GPUs) and multi-core CPUs. In addition, we introduce a method
for parallel construction of static data cubes on multi-core CPUs.
Next, we present the main contribution of this thesis in the area of Real-time OLAP.
We present and discuss a new algorithmic solution, built on a new data structure called the
PDC-tree, that provides Real-time OLAP on multi-core platforms. To our knowledge, the PDC-tree is
the first solution that provides a fully parallel Real-time OLAP using a parallelized tree data
structure. We emphasize that the PDC-tree provides Real-time OLAP without materializing
any data cube, and hence avoids the drawbacks of materialization.
In the last part of this thesis, we focus on parallel and distributed Real-time
OLAP on cloud architectures. We develop a cloud-based framework called CR-OLAP
that forms the basis of our cloud solution. CR-OLAP encompasses a new OLAP data
structure called the PDCR-tree, a non-trivial enhanced successor of the PDC-tree. In addition
to answering OLAP queries in real time, CR-OLAP provides scalability and
load balancing of data among cloud resources, while assuring performance for very large
data warehouses. Experiments on the Amazon EC2 Cloud confirm the real-time responsiveness
of CR-OLAP under a heavy load of OLAP queries in large data warehouses.