Specify acceptable Snowflaking scenarios

View previous topic View next topic Go down

Specify acceptable Snowflaking scenarios

Post  bi_at_nj on Sat Oct 31, 2009 12:49 am

Can someone please specify acceptable snowflaking scenarios.

- Thanks in advance
bi_at_nj

bi_at_nj

Posts : 14
Join date : 2009-10-31

View user profile

Back to top Go down

Re: Specify acceptable Snowflaking scenarios

Post  kapoor_dh on Wed Dec 16, 2009 3:42 am

Snowflaking is advisable when there exists a many to one relationship between the key attributes of the dimension which are readily accessed along with the Fact table and the attributes which are not so often used along with Fact but are related to the key attributres in Dimension resulting in lot of rows in the Dimension table (due to many to one relationship)

Example : Customer Dim in Sales Star Schema,A customer can have multiple addresses,order_address,billing_address,ship_to_address etc.Now a particular sales record can have one ot all of these address aplicable,meaning all of the addresses can be same for a single customer or they can differ,here it is advisable to keep only the key attributes of the Customer (like name,etc) in cusomer dimension and take out the address attributes in the table name customer_address and join the customer dimension (1:m) to this.

Another candidate could be product dimension.Snoflaking makes the join between the Fact and dimension faster avoiding the scan of too many rows.

kapoor_dh

Posts : 24
Join date : 2009-12-08

View user profile

Back to top Go down

Re: Specify acceptable Snowflaking scenarios

Post  ngalemmo on Wed Dec 16, 2009 1:05 pm

Addresses are not an acceptable example of snowflaking. If you implement a separate customer dimension and customer address dimension, you may carry the customer dimension key in the customer address dimension, but this is technically not a snowflake because that foreign key would not (and should not) be used when reporting facts. It would only be used when generating a list of customers.

A fact table should contain foreign keys to all related dimensions. So there would be a customer FK as well as multiple role based FKs to the related addresses as necessary.

Snowflaking, which involves joining from a fact to a dimension, then joining again from the dimension to another dimension, doesn't make queries run faster, particularly on database systems (such as Oracle) which have features specifically to support star joins.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Specify acceptable Snowflaking scenarios

Post  kapoor_dh on Thu Dec 17, 2009 1:03 am

You are correct,but the scenario which I have mentioned,the addressed are not often used for reporting,hence they are unnessesary increasing the size of the customer dimension.Only in very few Adhoc reportings we do need the addresses as well,I agree in such cases there will be an extra join,but the cases are few.

Also this is what I understand of Snowflaking a table that is coming out of dimension and not linked to the fact table,correct me if I am wrong and you have any other definition for the same,explaing with an example.

kapoor_dh

Posts : 24
Join date : 2009-12-08

View user profile

Back to top Go down

Re: Specify acceptable Snowflaking scenarios

Post  BoxesAndLines on Thu Dec 17, 2009 12:51 pm

ngalemmo wrote:...If you implement a separate customer dimension and customer address dimension, you may carry the customer dimension key in the customer address dimension, but this is technically not a snowflake because that foreign key would not (and should not) be used when reporting facts. ...

Just checking here, you would put the customer FK in the address dim or vice versa? I would put the address FK('s) in the customer dim.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Specify acceptable Snowflaking scenarios

Post  ngalemmo on Thu Dec 17, 2009 1:04 pm

Yes, a snowflake is a FK from a dimension to another dimension where the latter dimension is not referenced by the fact.

But as far as an application of snowflaking goes, there is little gained by snowflaking shipping and billing information in a order-to-cash subject area. Just add the FK's to the facts.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Specify acceptable Snowflaking scenarios

Post  ngalemmo on Thu Dec 17, 2009 1:08 pm

BoxesAndLines wrote:Just checking here, you would put the customer FK in the address dim or vice versa? I would put the address FK('s) in the customer dim.

It is typical that a customer may have multiple shipping addresses, and if the company involved is a manufacturer, many billing addresses as well. It would be impractical to place the FK's on the customer. Which is also why you need to put the FK's in the fact table. The shipping and/or billing destination may change from one order to the next.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Specify acceptable Snowflaking scenarios

Post  BoxesAndLines on Sun Dec 20, 2009 3:10 pm

On the flip side of the coin, an address plays in umpteen (very precise term ;^)) different roles within the organization. Burdening an address dimension with all of these roles (i.e. FK's to other dimensions would be unwieldy). I've recently consolidated half of the addresses within my current client's organization and I already have 20M distinct addresses. The only snowflakes I've designed are to demographic dimensions. Customer relationships are temporary relationships to an address and would cause serious performance issues maintaining the current customer in a type 1 or actual history on a type 2 on the address dimension.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Specify acceptable Snowflaking scenarios

Post  Jeff Smith on Tue Dec 22, 2009 1:18 pm

We can a case of a revenue fact table. The fact table had a huge number of rows and 1 single fact - revenue. The dimensions include product, customer, and dates for when the revenue was generated. There were 6 or 7 dates associate with the fact table, only 1 of which was used with any frequency. The database had a date dimension and a decision was made to create snow flake design that involved a table that had 5 columns containing the the date deminsion keys for each of the dates in the revenue data plus a "Revenue Date Key" that was put on the fact table. This greatly reduced the size of the fact table and improved the performance of the vast majority of queries against the fact table since few of them used the Dates in the Revenue Date Dimension.

I would only use such a design for very detailed data. Imagine a large bank with tens of millions of accounts and the interest rate spread changing daily plus transaction fees.

Of course, this design was from several years ago when a terabyte of disk actually took up space in a data center. Now a days, the actual physical space to store 10 columns of integers is less than the old disk used to store 1 column of integers.

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Specify acceptable Snowflaking scenarios

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum