Too many Time Dimension Role Views?

View previous topic View next topic Go down

Too many Time Dimension Role Views?

Post  mea0730 on Thu Jul 15, 2010 2:44 pm

I have a question about date fields and the time dimension.

We have an Order Detail fact table that has about 10 date fields. We're not sure how best to handle these with regards to joining them to the "Time" dimension. If we strictly follow the relational Star model, we'll have 10 surrogate date fields and 10 Role views on top of the customer dimension. what's the best practice for this? Is that the right way to do it! Do we leave some of the dates that we don't need to "rollup" as just "Degenerate Dimension" fields in the fact table and only create surrogate keys against the dates we think we will need to join to the time dimension? My concern is that someone will want to rollup on one of the date fields we didn't create a surrogate key for!

I'd be interested to see how others have handled this!

Thanks,
Mike

mea0730

Posts : 21
Join date : 2010-05-15
Location : San Jose, CA

View user profile

Back to top Go down

Re: Too many Time Dimension Role Views?

Post  ngalemmo on Thu Jul 15, 2010 4:05 pm

Wither you store them as foreign keys or as degenerate values, it doesn't make much difference from a number of columns or space point of view. If there is any chance the date may have a need for the dimension attributes (for rollups, filtering and so on), then, by all means, store the foreign key. However, dates such as effective and expiration dates for the row, don't work well as foreign keys because the dimension attributes are of little use since queries against these columns are almost always based on BETWEEN filters for a specific date.

As for views, I am not a big proponent of defining views in the database. Just about every BI tool has facilities to deal with multiple dimension roles without needing such structures in the schema. It's extra work that doesn't have any payback.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Too many Time Dimension Role Views?

Post  mea0730 on Fri Jul 16, 2010 12:52 am

But without a view, how would you allow the database to maintain the ref integrity with a FK constraint?

Thanks,
Mike

mea0730

Posts : 21
Join date : 2010-05-15
Location : San Jose, CA

View user profile

Back to top Go down

Re: Too many Time Dimension Role Views?

Post  BoxesAndLines on Fri Jul 16, 2010 9:20 am

Views don't implement referential integrity. Additionally, referential integrity constraints are extra overhead in a data warehouse. I agree with ngalemmo here. The value offered by views doesn't correlate to the effort to maintain them.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Too many Time Dimension Role Views?

Post  ngalemmo on Fri Jul 16, 2010 12:00 pm

Yep... as B&L said, RI constraints are extra overhead you don't need. The process of assigning surrogate keys is more than enough to ensure RI. The only DB constraint I've ever implemented is unique PK indexes on dimension tables (and the occasional NOT NULL on Y/N type columns).
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Too many Time Dimension Role Views?

Post  mea0730 on Fri Jul 16, 2010 1:23 pm

So you don't use any FK constraints? I know that Kimball recommends using constraints unless it really becomes a performance issue!

mea0730

Posts : 21
Join date : 2010-05-15
Location : San Jose, CA

View user profile

Back to top Go down

Re: Too many Time Dimension Role Views?

Post  ngalemmo on Fri Jul 16, 2010 1:57 pm

Then I guess I disagree with Ralph... It's not the first time.

If you go through the bother of assigning surrogate keys, why do you need the database to double check? I mean, it is kind of like wearing a belt AND suspenders.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Too many Time Dimension Role Views?

Post  hang on Fri Jul 16, 2010 10:30 pm

mikea730 wrote: I know that Kimball recommends using constraints unless it really becomes a performance issue!

Really? Kimball said that. If so, I think RI constraints should be optional, in DW dimensional store, rather than a rule or best practice.

I started my DW experience quite a few years ago after having read two of Kimball's famous books on dimensional modeling and ETL. I never had the impression that I need to have RI constraints in dimensional data store (DDS) and I have never wanted to do it in all of my BI projects even though many others attempted to do so. Undoubtedly IR constraint is very useful to protect data quality for an OLTP system, but in DW, particularly in dimensional area, I fully agree with ngalemmon and B&L.

In my experience, having IR constraint in DDS is not just a performance issue in production but also an unnecessary hindering factor in development. RI issues in DDS should be resolved by ETL process not relying on built-in RDBMS features. In my current project, after causing so many headaches to ETL (performance, unable to truncate tables, exploding transaction logs) by IR constraints enforced by DBA, the only reason left is the connivence of auto-generating nice ER diagram by certain modeling tools. It's laughable justification when you think about the cost I mentioned in the ETL headache which could easily kill the project altogether. As pointed out many times by Kimball in his books, focus on two priorities in data warehousing, performance and ease of use. My take is don't get sidetracked too early by things like IR constraints, or sometimes even implementing a sophisticated security model at early stage, which could jeopardise the delivery significantly.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Too many Time Dimension Role Views?

Post  mea0730 on Sun Jul 18, 2010 12:41 pm

I see your points with regards to the ETL taking care of the constraints and it make sense, and maybe I'm being too paranoid, but my concern is with that one "offshore" ETL programmer that get added to the project a year from now and decides to "do things his own way" and bypass the surrogate key pipeline process. But listening to what your saying, I'm leaning towards "disabled" constraints! and that's a big step for me!

What do you think about Tom Kyte's case for FK constraints below? Do you think this would apply to ""RELY DISABLE" constraints as well?

http://www.oracle.com/technology/oramag/oracle/09-may/o39asktom.html

This might be a case for using views in roleplaying dimensions. In Oracle you can add a PK with "RELY DISABLE" on a view and then FK's in the facts can also reference the PK of the view (they also need to have "RELY DISABLE").

PS. Hang, check out the Data Warehouse Lifecycle. He discusses disabling and enabling the constraints as part of the ETL processing.


mea0730

Posts : 21
Join date : 2010-05-15
Location : San Jose, CA

View user profile

Back to top Go down

Re: Too many Time Dimension Role Views?

Post  mea0730 on Sun Jul 18, 2010 1:00 pm

Sorry for the links, but here's a couple more references. He makes some good points with regards to FK's.

http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:22885462604505

http://tkyte.blogspot.com/2006/01/stuck-in-rut.html

mea0730

Posts : 21
Join date : 2010-05-15
Location : San Jose, CA

View user profile

Back to top Go down

Re: Too many Time Dimension Role Views?

Post  hang on Mon Jul 19, 2010 7:53 am

The only dimension that can be skipped from the surrogate key pipeline is the date dimension if you use yyyymmdd integer date key, which is recommended, as it can be derived from the date or datetime field existing in the fact. All other dimension keys should be meaningless surrogate key and can only be obtained from the pipeline by joining their respective dimensions. I am not too concerned about joining the date dimension for RI purpose as you would pre-build the dimension to cover all the possible dates.

I don't mind creating views for role playing dimensions, but they've got nothing to do with RI and are purely for clarity purpose to dimension users. If you have created RI constraints, then it is recommended to disable them during ETL if performance becomes an issue.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Too many Time Dimension Role Views?

Post  ngalemmo on Mon Jul 19, 2010 12:40 pm

but my concern is with that one "offshore" ETL programmer that get added to the project a year from now and decides to "do things his own way" and bypass the surrogate key pipeline process

Onshore, offshore, it doesn't much matter... you need to invoke standards, reviews and above all, testing in any project. If you are using a robust ETL tool that supports reusable components, setting up a consistant, reusable SK assignment process is dead simple. There is no valid excuse to 'bypass' it.

And I really do not like the idea of disabling then reenabling FK constraints... If you try to enable them after a load and there is a problem, what exactly do you do then? You are up at 2 AM, and for the next few hours are trying to find the bad row (or rows) and probably, because you are half asleep, will make a few mistakes in the process. Not the way I would like to start my day.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Too many Time Dimension Role Views?

Post  Jeff Smith on Wed Jul 21, 2010 3:52 pm

Usually, Referential Integrety isn't an issue since the Dimension Keys in the fact table come from the Dimension Table. Kimball recommends keeping RI to protect against somebody doing something that shouldn't be allowed to do such as changing the value of Dimension Keys in the Fact Table.

You can always turn off RI during the load and turn it back on after the load (something about an ounce of prevention).

10 dates in a fact table seems like a lot. Should some of them be part of other dimensions? If space is a concern, put the date fields that typically have the same dates into a dimension table of their own. If, for example, you process claims, it's typical that the received date of the claim , the processing date, and the check date are all on the same date. You could put the 3 dates into a single dimension that links to the fact table and the date dimension table. This dimension would have the following columns: Dimension Key, Received Date Key, Processed Date Key, and Check Date Key. You'd take a performance hit with joins involving this table, but if the 3 dates weren't used very often, this design could speed up performance as it would make your fact table smaller. It is a snow flak and some people frown on such things, but if space is an issue, then why not.

I am working with data with lots of dates that are typical the same date Claim Open Date, Claim Closed Date, etc. I'd say 90% of the rows have the same date values in most of the date fields.

If a date is used a lot, then I wouldn't create the pseudo date dimension. It all depends.

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Too many Time Dimension Role Views?

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum