Fed up with ignorance. A miniblog post for your consideration.

View previous topic View next topic Go down

Fed up with ignorance. A miniblog post for your consideration.

Post  durik on Wed Oct 10, 2012 8:20 am

Hello everyone!
If you agree that a fact table with 50-100 dimensions is a good idea, then please, skip the whole post.

I don't really consider you provide a solution, but a comment and a cheerful word could just make my day! I really need it…
And I hope that this would be a most unusual and entertaining post, as the problem I am having is not with the Dimensional Design, but with a person, and maybe even with a lot of them.

I am assigned to a project as a consulting architect and there is no such option to switch people in the project. The pay is really small for such a job, but I just like the challenge and hope for the best in the future, since I consider knowledge gained to be a valid motivator and a driver of the future progress in my life.

We have a large, very complex, and VERY badly structured data-source. That "would-be" Data Warehouse product, as they call it ( not me) had being under constant and active development by monkey coders until recently. They had been brutally unleashed out of supervision just about 10+ years ago, and not so long ago they even developed the means to change the structure of the product via auto-updater, so as to fast-fix incorrect data or to add something quickly. It contains critical government business data with a mix of both aggregates and rows of structured data in a very abusive and cross-referencing meaning, but it is all in itself is a story for another post.

So let us just say, we have a very complex data source on our hands, and such as no Kimball or another author post or book covers it. There is just a little coverage on aggregates, ETL, and how to deal with them in different ways.

The goal of the project is to provide the client with the easy means of analyzing data. That is a problem in the current design, as the aggregates and rows mentioned are built for the Reports coded in Excel via XML in a 1 SQL table cell to 1 XLS report cell relationship. The report engine… is self-coded. So, as you can guess, there is absolutely no way anyone ever thought there of business processes. And everyone sees just one “business” process – adding another column to a new report.

It gets even worse, since the client is a government organization where no one cares about really analyzing data, but just about getting the numbers to the top of hierarchy. Still, they’ve heard about beautiful dashboards on their IPads, and asked for an enterprise OLAP solution, as to please their bosses with more beautiful sets of the same numbers.
Originally on my team no one even knew about Kimball. All the team's knowledge of dimensional design came from relational DB design and some extracts from the OLAP Product books, which, as you know, contain a page or two on the theme. So the team consists of me, an OLAP developer, and a business analyst, who is also a db developer.

So it is here where the fun part begins.

The business analyst analyzed the data source, and gave me a database diagram for the data warehouse. I asked him about dimensions, since they were missing and were gibberish. Let me explain. I was given a relationship database diagram of a fact table in excel (we had powerdesigner btw). That fact table had about 100 columns. The facts were correctly identified, and were completly additive conforming to the legacy reports. However, the main bulk of columns were “dimension” columns. Some dimension columns had foreign keys to dimension tables, but some did not, and still were called dimensions. The ones that had references actually were based on data source having foreign keys relationships. So his logic was simple – if the data source had a foreign key relationship – than that would be a dimension with a “foreing key”. And other columns in the diagram were “dimensions”, but without foreing keys.

I tried to explain, that this is an incorrect way of designing the fact and dimension tables, and given him a lot of references to pages and citations from my honestly bought from my own personal money kindle Kimball books. I have to confess, I downloaded some off the web in PDF, as to show him the books, but I have bought all of them, as I just love my kindle!
To my amusement, he said : “My work is 100% Kimball”. As if he had time to read and understand them, it actually was very fast, and a matter of a day or so.

We had long arguments then.
I tried to explain with citations from Kimball books, with a lot of Excel and Database and even OLAP examples:
1. That no one makes 50-100 dimensions in a fact table, as that is counter-productive to user-friendliness, to speed and complexity of mdx queries in OLAP, and complicates ETL and aggregation mid-tier-database complexity in our case.
2. That dimensions are made from different attributes, and these attributes are gathered into hierarchies, based on business user requirements in a given business process.
3. That correlated values (Yes, I even showed him data profiler) is another, supplemental way to analyze dimensions.
4. I even tried to argue that his table is large junk dimension with supposedly uncorrelated values, when they do infact mean a lot to each other and can be grouped.

His argument still defeats mine!!!!

His argument is, no matter what I say :
“Show me where Kimball tells us to artificially add up those dimensions… into groups”
And I can't answer it...

I showed him an example I made with 8 dimensions, and 5 business processes with a bus matrix diagram.
Then I asked him to provide me with an example of 2 business processes, and he told me, that he analyzed everything and there is 1 business process with 50 dimensions, and that it is all 100% Kimball.

He told me also, that my 8 dimensions and 5 business processes are gibberish, and I’ve agreed so, as I never ever spoken to the client myself.

And then, again.
“Show me where Kimball tells us to artificially add up those dimensions… into groups”

I am at a complete loss and I gave up. His argument is valid...
I am going to accept his “dimensional” diagram and leave the project ASAP.

Still, I did not gave up completely, but trying to contact you for a cheer up or so.

It is a matter of professional pride for a person who actually knows some thoery not only in dimension design, but in statistics and data analysis as well.

Thank you for reading.


Posts : 2
Join date : 2012-10-10

View user profile

Back to top Go down

Re: Fed up with ignorance. A miniblog post for your consideration.

Post  Mike Honey on Thu Oct 11, 2012 11:00 pm

Hi Durik,

How about pointing your friend here:

In particular the 3rd para:
"Although you could lump all context into a wide, logical record associated with each measured fact, you’ll usually find it convenient and intuitive to divide the context into independent logical clumps.
We call these logical clumps dimensions ..."

Good luck!
Mike Honey

Posts : 185
Join date : 2010-08-04
Location : Melbourne, Australia

View user profile http://www.mangasolutions.com

Back to top Go down

Re: Fed up with ignorance. A miniblog post for your consideration.

Post  durik on Fri Oct 12, 2012 10:23 am

Thank you, Mike!!!
That was really helpful, in combination with some other info, but that you've mentioned was just what he needed.
Now we finally have more than 1 business process and the situation is slowly improving.
It's far for being a success, however, since the business process business talks about are data-centric, and not business centric.
That is probably typical within government organizations and it's the main next challenge for us.


Posts : 2
Join date : 2012-10-10

View user profile

Back to top Go down

Re: Fed up with ignorance. A miniblog post for your consideration.

Post  Sponsored content

Sponsored content

Back to top Go down

View previous topic View next topic Back to top

- Similar topics

Permissions in this forum:
You cannot reply to topics in this forum