Dimension with two hierarchies

View previous topic View next topic Go down

Dimension with two hierarchies

Post  neo.helios on Thu Feb 24, 2011 9:41 am

I need to design a dimension with two hierarchies

1. Location -> Building -> Room -> Cabin
2. Location ->Building

I need the hierarchy 1 to be referenced in one FACT table and hierarchy 2 in other FACT. But my ETL tool allows only the dimension key of the lowest level to be mapped to the FACT table and generates a -ve Dimension_Key for the second hierarchy

Is it proper approach to create both the hierarchies in the same dimension and referencing different FACT tables

neo.helios

Posts : 11
Join date : 2010-11-02

View user profile

Back to top Go down

Re: Dimension with two hierarchies

Post  Jeff Smith on Thu Feb 24, 2011 9:56 am

I would build 1 dimension. I would build the dimension in 2 steps. I would create a staging dimension with Location and Building, creating a surrogate key (Location, Building, Loc_bldg_Skey). I would create a second dimension build with Location, Building, Loc_bldg_Skey, Room, Cabin, Loc_bldg_Rm_cbn_skey using the staging dimension as the source for the Location, Building, Loc_bldg_Skey columns.

I would create a view with distinct Location, Building, Loc_bldg_Skey as a mini dimension but I think an argument could be made to go ahead and make that initial staging table as a dimension table - my ETL allows for multiple surrogate keys in a dimension table.

By the way, it's one hierarchy with a rollup point (a mini dimension). Multiple hierachies in a dimension is when the lowest level can roll up along 2 or more different paths.

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Dimension with two hierarchies

Post  ngalemmo on Thu Feb 24, 2011 9:58 am

Is the Location/Building relationship different in #2? If not, how are these two hierarchies different?

Or is the issue that the facts are at different grains?

And what ETL tool are you using?
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Dimension with two hierarchies

Post  neo.helios on Thu Feb 24, 2011 10:05 am

Thanks for the quick reply Jeff.

With the approach that you suggest, if I have more levels in the dimension and if I have to use each level as a hierarchy ref. in different fact tables then I would end up creating many staging and mini dimensions. Please suggest

neo.helios

Posts : 11
Join date : 2010-11-02

View user profile

Back to top Go down

Re: Dimension with two hierarchies

Post  neo.helios on Thu Feb 24, 2011 10:06 am

the fact tables are at different grains. And I am using Oracle Warehouse Builder as the ETL tool

neo.helios

Posts : 11
Join date : 2010-11-02

View user profile

Back to top Go down

Re: Dimension with two hierarchies

Post  dellsters on Thu Feb 24, 2011 11:00 am

ngalemmo wrote:
And what ETL tool are you using?

Why does the ETL tool matter? Is there one in particular that manages the hierarchies better than others?

dellsters

Posts : 39
Join date : 2009-02-11

View user profile

Back to top Go down

Re: Dimension with two hierarchies

Post  Jeff Smith on Thu Feb 24, 2011 11:09 am

If you have mutliple levels in a hierarchy and you plan to create aggregate tables at the various levels, then yes, I would create surrogates at each level. If you plan to only rollup the data in cubes, then I wouldn't bother creating additional surrogate keys.

Let's say your detailed data was at the City level. You wanted to create aggregate tables for the county, state, and region level. I would have 1 dimension table with 4 surrogate keys - 1 for each level, city, county, state, region. I would create 3 views from the dimension table with County/State/Region, State/Region, and Region.

If I was planning to aggregate the data only in Cubes, then I would not create Surrogates for each level.

But, when in doubt, create the additinal surrogates. Easier to do it from the get go then after the fact.

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Dimension with two hierarchies

Post  Jeff Smith on Thu Feb 24, 2011 11:11 am

dellsters wrote:
ngalemmo wrote:
And what ETL tool are you using?

Why does the ETL tool matter? Is there one in particular that manages the hierarchies better than others?

I think the older the version of the ETL tool, the less functionality it was with regards to dimension building.

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Dimension with two hierarchies

Post  ngalemmo on Thu Feb 24, 2011 11:35 am

dellsters wrote:
ngalemmo wrote:
And what ETL tool are you using?

Why does the ETL tool matter? Is there one in particular that manages the hierarchies better than others?

I asked because of the comment:

But my ETL tool allows only the dimension key of the lowest level to be mapped to the FACT table and generates a -ve Dimension_Key for the second hierarchy

Most tools are flexible enough to do whatever you need to do...
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Dimension with two hierarchies

Post  dellsters on Thu Feb 24, 2011 3:12 pm

For high level ETL architecture, don't you populate the dimensions first and when it comes to populating the fact, just source all the dimensions and look up the keys and load the fact table?

dellsters

Posts : 39
Join date : 2009-02-11

View user profile

Back to top Go down

Re: Dimension with two hierarchies

Post  hang on Fri Feb 25, 2011 7:02 am

aluri.prabhakar wrote:But my ETL tool allows only the dimension key of the lowest level to be mapped to the FACT table and generates a -ve Dimension_Key for the second hierarchy
So don't use ETL tool if it does not do the job. Once you understand the process of ETL, the best tool is SQL. Vendor specific ETL tools are pretty for their look and having skills on them somehow makes you a serious ETL professional on the market. In reality, a lot of hard core stuff need to be scripted out by SQL, and believe me, properly scripted native SQL always outperforms visual tools.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Dimension with two hierarchies

Post  Jeff Smith on Fri Feb 25, 2011 10:00 am

hang wrote:
aluri.prabhakar wrote:But my ETL tool allows only the dimension key of the lowest level to be mapped to the FACT table and generates a -ve Dimension_Key for the second hierarchy
So don't use ETL tool if it does not do the job. Once you understand the process of ETL, the best tool is SQL. Vendor specific ETL tools are pretty for their look and having skills on them somehow makes you a serious ETL professional on the market. In reality, a lot of hard core stuff need to be scripted out by SQL, and believe me, properly scripted native SQL always outperforms visual tools.

Not exactly true. Cognos DecisionStream did a very good job with dimensions and was fast, at the time, when it came to loading fact tables and aggregates. But, it was a little limiting. I think either SSIS or SSAS does a real nice job - it's very fast.

I like SQL and am comfortable with it but I don't think it's the fastest thing in the world. At least not for SQL Server databases.

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Dimension with two hierarchies

Post  hang on Fri Feb 25, 2011 7:00 pm

Jeff Smith wrote:I think either SSIS or SSAS does a real nice job - it's very fast.
True, they both do good job in their respective field. But we are talking about ETL, so SSAS is irrelevant, and it is irreplaceable by SQL when a cube is needed.

I like SSIS too when doing ETL, but I use it for its special features that can hardly be implemented in SQL, like connecting to alian database, connection string config, file management, email, cube processig and fauzzy matching etc. However some of the core ETL functionality like Surrogate Key generation and lookup, SCD implementation can be achieved in stored proc more efficiently.

What I am saying is, combine SQL scripts with the ETL tools. When a ETL tool does not do the job, use SQL instead of counting on tools to do everything for you. And also, if tools are too slow, try SQL and I bet the latter will be likely faster.

Jeff, have you compared the performance of using SSIS SCD2 component with stored proc in SQL for million records dimension. If you have you may not want to do everything in ETL by SSIS. My experience is 10 times difference sometimes.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Dimension with two hierarchies

Post  Jeff Smith on Sat Feb 26, 2011 11:58 am

I'll be the first to admit my strength is not ETL. I was just sharing some of the comments I heard among the developers. We have a raging battle going on between 2 camps - one that swears by Stored Proc the other that thinks SQL Server's ETL tool is the way to go.

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Dimension with two hierarchies

Post  hang on Sat Feb 26, 2011 5:37 pm

As I said, SSIS has got pretty face and is impressive to show the picture to other people which is probably a good selling point. On the other hand, stored proc can do all the hard core stuff that SSIS can't, but with ugly old looking SQL code. However the value should only be measured by ..., you know what I mean. My advice is use both to their best strength.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Dimension with two hierarchies

Post  robber on Tue May 03, 2011 8:31 am

ngalemmo wrote:Is the Location/Building relationship different in #2? If not, how are these two hierarchies different?

Or is the issue that the facts are at different grains?

And what ETL tool are you using?

I'm trying to model a similar scenario where the facts are at different grains. Normally I would snow flake at this point but never really like doing that so I'm intrigued about this concept of using multiple surrogate keys. So each dimension row we are going to store 2 surrogate keys, one for each grain? I would prefer not to snow flake but I'm not exactly thrilled about having more than one surrogate key either, which way would you go?


robber

Posts : 41
Join date : 2009-02-28
Location : Canada

View user profile

Back to top Go down

Re: Dimension with two hierarchies

Post  Jeff Smith on Tue May 03, 2011 8:54 am

What's wrong with having multiple surrogate keys in a dimension table representing different levels of the hierarchy? Say you geographic dimension starting multi-state regions drilling down to census block groups. You want to create an aggregate table at the county level. You would need a surrogate key for each county. Create a view from the geographic dimension selecting distinct rows from the County Surrogate Key on up.

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Dimension with two hierarchies

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum