Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Modeling for ad-hoc queries across measures and using nonconforming dimensions

3 posters

Go down

Modeling for ad-hoc queries across measures and using nonconforming dimensions Empty Modeling for ad-hoc queries across measures and using nonconforming dimensions

Post  josmi Tue Nov 13, 2012 11:48 am

We are in the early planning stages for creating the necessary data models for a web-based app that will allow users to query our data in a guided ad-hoc fashion.
What makes this interesting is that the data consists in a large number of measurements where each measurement has its own set of applicable dimensions and where only a few dimensions (mainly date/time) are shared among all measurements.

For example measurement M_a has dimensions Date, D_1 and D_2 while measurement M_b has dimensions Date, D_3 and D_4.
Users would e.g. want to create a report that shows the monthly sums of measurements M_a and M_b side by side where D_1 is restricted to some value and D_3 and D_4 are restricted to some other value.

Moreover, each measurement may be described by sets of non-conforming dimensions. So there will be many combinations where for measurement M_a the dimensions D_1 has some value, but there is no applicable value for D_2, and vice-versa.

We have hundreds of different dimensions and dozens of measurements and for each measurement, several dozens of dimensions may apply but overall, only a few dimensions are fully conforming for all or several of the measurements.

We are now puzzled by how this data should get best modeled into first a set of fact tables with associated dimensions in a star shape and subsequenetly maybe into cubes to take the biggest advantage of pre-existing tools or approaches to implement the web GUI for interactively creating the queries, specifying how to tabulate measures down and across etc.

The obvious way to do this seems to be to essentially factor this into all combinations of measures and dimensions such that each fact table only contains compatible measures and compatible dimensions -- this will result in many hundreds of tables though.
But then, which pre-existing tools would be available for dealing with query generation, drilling down, drilling across etc. especially when it has to be done across different fact tables or involves non-compatible dimension?

This is really two questions, I suppose:
- what is a state-of-the art, clean and scaling way to model data like this?
- how can one benefit from as many pre-existing tools and solutions as possible for implementing a query app based on such data?

josmi

Posts : 3
Join date : 2012-11-13

Back to top Go down

Modeling for ad-hoc queries across measures and using nonconforming dimensions Empty Re: Modeling for ad-hoc queries across measures and using nonconforming dimensions

Post  Mike Honey Thu Nov 15, 2012 10:05 pm

Hi josmi,

I'm imagining you could combine all your measurements into one fact, with an additional Measurement Type dimension (to keep track of what each row means).

I'd probably keep the dimensions separate, but look for opportunities to consolidate multiple dimensions into one dimension with multiple attributes, if they can be related in some way (even as a Junk dimension). Presenting a flat list of "hundreds" of choices will not be useful in any UI.

Each dimension needs an "Unknown" member row. Default your "Unknown" SK value (e.g -1) into the fact FK columns for the dimensions which are not relevant to that fact row. Then the one fact table can be joined to all the dimensions.

My favourite tech for this would be the Microsoft BI stack i.e. SQL Server Analysis Services cubes with Excel (published to Excel Web Services and Power View) and SQL Server Reporting Services as the analysis and reporting tools.

Good luck!
Mike
Mike Honey
Mike Honey

Posts : 185
Join date : 2010-08-04
Location : Melbourne, Australia

http://www.mangasolutions.com

Back to top Go down

Modeling for ad-hoc queries across measures and using nonconforming dimensions Empty Re: Modeling for ad-hoc queries across measures and using nonconforming dimensions

Post  josmi Fri Nov 16, 2012 12:02 pm

Hi Mike,

many thanks for this answer! To keep everything is as few fact tables (or even a single one) would have been what we originally would have liked most, but is this not creating restrictions as to what kind of reports one is able to create? As I pointed out, Dimensions are not conformant and we have measurements which are subdivided by some dimension A but not B and vice-versa.

Example1:
For date YYYYMMDD we have a measurement A which is identified by dimension DA1 having value va1 and DA2 having va2. For the same date YYYYMMDD we have a measurement B which is identified by dimension DB1 having value vb1. Both A and B also have for the conformant dimension C the value vc.

If I split this into two fact tables I would get (showing directly the dimension value instead of the corresponding key of the dimension table)
YYYYMMDDAva1va2vc1

YYYYMMDDBvb1vc1

One report would require to show, for each year accumulated in the rows, the measurements A where DA1=va1 und DA2=va2 in comparison with measurement B where DB1=vb1, possible that comparison done for all values of DC in the columns like so:
Year DC=vc1 DC=vc1 DC=vc2 DC=vc2
A B A B
2000 12 13 22 27
2001 10 11 21 21

How could one get the original data into a single fact table and still be able to create this report?

One way to create the fact table that I see is this:
DateMeasAMeasBDimDA1DimA2DimB1DimC
YYYYMMDDAnullva1va2N/Avc1
YYYYMMDDnullBN/AN/Avb1vc1
With this method, all measurements in a row where we have dimensions that do not apply to this measurement would be null and all dimensions that apply to null measurements would be N/A (not applicable).
Is there another way to do this that would solve all my problems?

Excuse me if I am overlooking the obvious here (i somehow have the feeling that I must be overlooking something basic here), but how does one construct cubes that deal with this so that we can get reports as illustrated with the sample report?

josmi

Posts : 3
Join date : 2012-11-13

Back to top Go down

Modeling for ad-hoc queries across measures and using nonconforming dimensions Empty Re: Modeling for ad-hoc queries across measures and using nonconforming dimensions

Post  ngalemmo Fri Nov 16, 2012 4:51 pm

A fundamental premise of a relational or dimensional model is there exists attributes by which data can be associated with other data.

In a dimensional framework, its call conformance. Without it, all you have are independent measures.

Forget about using cubes. They require conforming attributes, otherwise they are not of much use.

You basically have two options: write a whole bunch of ugly SQL queries, or go back to the drawing board and rethink the situation. One fact table or a hundred fact tables isn't going to make any difference.

ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Modeling for ad-hoc queries across measures and using nonconforming dimensions Empty Re: Modeling for ad-hoc queries across measures and using nonconforming dimensions

Post  josmi Fri Nov 16, 2012 5:42 pm

Thank you for this advice!
It has already dawned on us that because of the many non-conforming dimenions, we won't be able to create useful cubes or use many of the standard tools for analysis and reports.
It is still a bit hard to believe that there is essentially nothing apart from doing it all ourselves that would support analysis, comparison, drill down, aggregation etc. in the case of multiple independet measures. In our domain, comparing those is exactly what is needed to get the required insights.

I am a bit puzzled about your second option:
ngalemmo wrote:
You basically have two options: write a whole bunch of ugly SQL queries, or go back to the drawing board and rethink the situation. One fact table or a hundred fact tables isn't going to make any difference.
In what way could one rethink the situation? Are there any tools or documented approaches how to best model data in such a situation or is the only way to make progress to sit down, think hard and essentially re-invent all the wheels necessary (figure out the data model, write tools for ETL, querying, reporting, managing metadata etc.)?

josmi

Posts : 3
Join date : 2012-11-13

Back to top Go down

Modeling for ad-hoc queries across measures and using nonconforming dimensions Empty Re: Modeling for ad-hoc queries across measures and using nonconforming dimensions

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum