In multidimensional (MD) databases, summarizability is a key property for obtaining interactive response times. With summarizable dimensions, aggregate query results that are pre-computed and materialized at lower levels of a dimension hierarchy can be used to correctly compute results at higher levels of the same hierarchy, improving efficiency.
Although summarizability is such a desirable property, we argue that established MD models cannot properly model the summarizability condition, and that this is a consequence of the limited expressive power of their modeling languages. In addition, because of limitations in existing MD models, algorithms for deciding summarizability and for cube view selection are neither efficient nor practical.
We propose an extension to the Hurtado-Mendelzon (HM) MD model, the EHM model, that includes subcategories, and we explore its properties, especially in addressing issues related to summarizability. We investigate the extended model as a way to directly model MD databases (MDDBs), with some clear advantages over HM models. Most importantly, EHM is, in a precise technical sense, more expressive than HM for modeling MDDBs that are subject to summarizability conditions. Moreover, given an MD aggregate query in an EHM database, we can determine in a practical way (one that only requires processing the dimension schema, as opposed to the instance) from which minimal subset of pre-computed cube views it can be correctly computed.
Our extended model allows for a repair approach that transforms non-summarizable HM dimensions into summarizable EHM dimensions. We propose and formalize a two-step process that involves modifying both the schema and the instance of a non-summarizable HM dimension.
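
As an informal illustration of the summarizability property discussed above, the following minimal Python sketch uses a hypothetical Location dimension (City, State, Country) in which one city has no state. All member names, sales figures, and the `roll_up` helper are assumptions made for this example and are not taken from the HM or EHM formalism. The sketch shows that the Country totals are correct when computed from the City level but incorrect when computed from the pre-computed State view.

```python
from collections import defaultdict

# Hypothetical Location dimension (illustrative data only).
# Most cities roll up City -> State -> Country, but "Washington"
# has no state and rolls up directly to its country.
city_to_state = {"Chicago": "Illinois", "Austin": "Texas"}
city_to_country = {"Chicago": "USA", "Austin": "USA", "Washington": "USA"}
state_to_country = {"Illinois": "USA", "Texas": "USA"}

# Base facts: total sales per city.
sales_by_city = {"Chicago": 10, "Austin": 20, "Washington": 5}


def roll_up(totals, parent_of):
    """Sum the totals of child members into their parent members.

    Members without a parent in `parent_of` are silently dropped,
    which is exactly how a non-summarizable hierarchy loses data.
    """
    result = defaultdict(int)
    for member, value in totals.items():
        if member in parent_of:
            result[parent_of[member]] += value
    return dict(result)


# Pre-computed (materialized) cube view at the State level.
sales_by_state = roll_up(sales_by_city, city_to_state)

# Country totals computed directly from the City level: correct.
country_from_city = roll_up(sales_by_city, city_to_country)

# Country totals computed from the State view: Washington's sales are lost.
country_from_state = roll_up(sales_by_state, state_to_country)

print(country_from_city)   # {'USA': 35}
print(country_from_state)  # {'USA': 30}
```

In this toy instance, Country is summarizable from City but not from State alone, so a State-level cube view cannot safely be reused to answer a Country-level query.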