Business keys in Fact. Is this accepted?

View previous topic View next topic Go down

Business keys in Fact. Is this accepted?

Post  raikarleena on Tue Mar 10, 2009 7:28 pm

It is a data warehouse optimization project and a very basic data warehouse is in place. There are a couple of dimensions with only 2, 3 columns only (<30 rows) and where the Business key can be easily interpreted.

Ex: Event_Type_DIM: this has 2 columns (event_type_code,event_type_description) where code store 4 values:SMS,VOICE,GPRS,RECHARGE. Its basically a code that generated revenue.

Currently these Codes (Varchars(10)) are used in Fact for reference and not a surrogate key. The Client is very keen on having these business keys in the FACT and do not want them to be replaced by SKs. The reason is that: they use either these Business keys to filter the record or group by or directly in select clause. but not using as reference because they are rarely used for joining and retriving the other attribute from dimension. So they want to avoid the join as far as possible.

Similar things happen with another 3-4 dimensions. is this accepted? they say they do not care about space (though we have 160 million records) and want only type SCD1. As a result they also have NULLs in fact which is taken care by outer joins.

Is this architecture accepted?

i am actualy trying to introduce a proper dimension with SK, PK and FK but they seem to be reluctant. So thought i would collect good reasons to justify myself.

Thanks in Advance!!!

Leena

raikarleena

Posts : 11
Join date : 2009-03-10

View user profile

Back to top Go down

Re: Business keys in Fact. Is this accepted?

Post  BoxesAndLines on Wed Mar 11, 2009 1:32 pm

Not really. But I think you already know that. The best example I can give you is the time dimension. Sure I can store the date on the fact table (and I sometimes do) but the minute I want to filter by holidays, weekends, Monday's, last week, last month, and so forth, it becomes extremely tedious using SQL.

Additionally, if your users are using a BI tool, the physical design of the underlying data structure should be irrelevant. My guess is you have a group of SQL savvy users that want to run queries directly on the database. As long as that is occurring, you will always fight and likely lose this battle.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Business keys in Fact. Is this accepted?

Post  DilMustafa on Thu Mar 12, 2009 12:30 am

Interesting. We were facing the same scenerio a while ago. My developers wanted to use Date as a key in the fact tables. We ended up convincing them on using SK. I will stick to the best practices and will use SK.'s Refer them to Kimball articles and best practices.

DilMustafa

Posts : 49
Join date : 2009-02-03
Location : Calgary, Canada

View user profile

Back to top Go down

Re: Business keys in Fact. Is this accepted?

Post  raikarleena on Thu Mar 12, 2009 6:08 pm

Thanks a lot for youe reply and the example.
Almost all the modeling/best Practices I know are picked by me from Kimball's books/articles/Tips so Pretty good with basics and this one (using SKs) is i guess one of the top 5 most striked tips. So kind of explected the same answer. The answer definately reassures me that I was correct and give me much more confidence with real life supporting issues to go back and fight for SK incorporation.

Thanks a lot. I am happy to receive the answer

raikarleena

Posts : 11
Join date : 2009-03-10

View user profile

Back to top Go down

Re: Business keys in Fact. Is this accepted?

Post  BrianJarrett on Fri Mar 13, 2009 1:33 pm

I also agree that these values should be forced to the dimensions. Is the join really killing performance? Maybe a DBA needs to take a look at some indexing and/or partitioning strategies.

I'd also get rid of the nulls in your fact table; your star schema really needs to be inner joins only. Point them to a -1 placeholder dimension (or something providing the equivalent functionality). The outer joins will often negatively affect performance; maybe this is one of the reasons why your users don't want to join to the dimension tables.
avatar
BrianJarrett

Posts : 61
Join date : 2009-02-03
Age : 43
Location : St. Louis, MO

View user profile

Back to top Go down

Re: Business keys in Fact. Is this accepted?

Post  raikarleena on Mon Mar 16, 2009 6:52 am

@BoxesAndLines - You are right when you say
"My guess is you have a group of SQL savvy users that want to run queries directly on the database. As long as that is occurring, you will always fight and likely lose this battle"

I perfectly understand the importance of surrogate key in a Data Warehouse but I am a little stuck in explaining it to my client as they are either not interested in using the featured that would be added to their system Or are not bothered with the advantage. They too have some valid questions and I am a little stuck as I am not able to sho them the business benifit.

The points I used to stress on a star schema with SK and the what client manager had to say on them is as follows:
1.
Me: Reduced Space in fact - This would ensure storage is reduced by 50% as we are not repeating the business Keys(20 chars) where as dealing with Surrogate keys (int). This would help reduce paging in memory at query time as fact size would now be much smaller

Client: SQL server 2008 uses mechanism for compression. So 20 char long value is stored just once and pointers are internally stored to referene the location. So no significant benifit in terms of Fact size.

2.
Me:Elimination of NULLS and business keys not mathing values with Dimensions in Fact -
As NULL and unmatched values would point to associated keys (-1 and -2) and all the queries will use equijoin, choosing optimal execution paths.

Client:What about the late arriving dimensions. U would populate a -2 key in dimension and would need re-processing to replace the -2s with appropriate Surrogate keys. As I plan to populate the unidentified business keys in dimension first (with other attributes as 'N/A') and use them in fact and when the dimension arrives no additional processing Required to get them aligned.
Moreover, I generate my 3 summary tables which would be Flat and all Reporting/cubes would access them so the performance only affects 1 time smmary building.It is now taking 6 mins and thats pretty good.

3.
Me: Currently we having all SCD-1 history tracking. In future is SCD-2 is required for any of the confirmed Dims (as many new Facts are expected), we would have to track in fact which is not a good design. Say: I will be repeating the MSISDN version in fact for every call made by that Subscriber etc.

Client:I know its not a good design but show me the benifit.

4.
Me: Ad-hoc Queries- Currently performance is not an issue as we are just running the query just once (to built the summary table) and reporting/cubes done on these summaries. This is possible in initial Phase as queries ar ehighly predictible. Once we open the Ad-Hoc reporting to users (reporting at any level including most granular), it would cause a problem as the queries would then run all the day on DHW and would use Business keys (Varchar(20)) for joins.

Client: I do not plan to open the CDR Level (most granular level Fact) to users. we are getting 30 million records per day and it is not advisable to give them the freedom to query such a huge table.


I am not sure how can I convince them. I know it would be a problem as my warehouse grows but do not have any points to argue

I am looking for some points on about discussions that would help me go back and fight my battle. I know its tough but they telling me 'show us the benifit and we would agree!!!"

Any help is appreciated!!!

-Lee

raikarleena

Posts : 11
Join date : 2009-03-10

View user profile

Back to top Go down

Re: Business keys in Fact. Is this accepted?

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum