Degenerate Dimension Implementation

View previous topic View next topic Go down

Degenerate Dimension Implementation

Post  ChrisJames on Sat Apr 30, 2011 9:32 am

I am looking for some guidance around the proper scenario to use a degenerate dimension.

I currently have a fact table defined as such:

Date_Key
Distributor_Key
Item_Key
Customer_Key
Invoice_Number
Invoice_TXN_Date
Purchase_QTY
Purchase_AMT


The Date_Key is the date that the data was provided (its provided once a month) and is what all reporting is based from. The Invoice_TXN_Date is the date of the transaction for the item. It is rare that there are multiple Invoice_TXN_Date values for a given Invoice_Number, but it does happen.

When I originally created this my thought was to make the Invoice_Number a Degenerate Dimension. I do not have any Invoice Line item numbers and since this table is at the Item level, the Invoice Number and Invoice Date are repeated for each item related to a specific invoice. This fact table has 8 million records. The distinct Invoice_Number, Invoice_TXN_Date count is roughly 500,000.

The only reporting requirements given around the Invoice_Number and Invoice_Txn_Date are the following:
- Be able to display items and totals by Invoice Number
- Be able to display items and totals by Invoice_Txn_Date

My first question is if it makes sense to make the Invoice_Number a degenerate dimension given the above data scenario or should it be a separate dimension?

Second, if its best to go the degenerate route and considering that I am putting the data into Analysis Services 2008, would I need to add a surrogate key to my fact table for the Invoice_Number to have a unique key to create the Degenerate dimension on?

Thanks!

ChrisJames

Posts : 2
Join date : 2011-04-30

View user profile

Back to top Go down

Re: Degenerate Dimension Implementation

Post  ngalemmo on Sat Apr 30, 2011 6:53 pm

would I need to add a surrogate key to my fact table for the Invoice_Number to have a unique key to create the Degenerate dimension on?

A degenerate dimension is a dimension with no attributes. You simply place the invoice number in the fact table. No surrogate key. End of story.

Its an appropriate thing to do for things like invoice numbers.

Invoice transaction date should be an FK to the date dimension. It is not clear that is the case from your column name.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Degenerate Dimension Implementation

Post  ChrisJames on Sun May 01, 2011 7:37 am

Thanks. I understand about not needing a surrogate key if using the Invoice Number as a degenerate dimension. But in regards to my first question, since the Invoice Number is not the same grain as my fact table and is repeated for each Item that makes up the complete invoice, is it still appropriate to leave it in the Fact table? Fact table is over 8 million records. Distinct Invoice number is under 500,000. I know things like Invoice Number are usually left in the fact table but I am curious as to when it is not appropriate and if my given situation is not an appropriate scenario.

ChrisJames

Posts : 2
Join date : 2011-04-30

View user profile

Back to top Go down

Re: Degenerate Dimension Implementation

Post  ngalemmo on Mon May 02, 2011 12:55 am

Sure. Each row is an item on the invoice, so invoice number is part of the grain.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Degenerate Dimension Implementation

Post  BoxesAndLines on Mon May 02, 2011 11:24 am

That wil also provide an easy method for counting the number of invoices.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Degenerate Dimension Implementation

Post  ngalemmo on Mon May 02, 2011 2:29 pm

Another comment... repetition of a dimensional key or a degenerate dimension value in a fact table is normal and expected. Which is why fact table queries are almost always aggregate queries.

The cardinality, in relation to the fact table, is only an issue with an actual dimension table, never with a degenerate dimension. Usually, when there is a high number of dimension rows relative to the number of fact rows, it is an indication that the dimension needs to be revisited and possibly broken down. That is why, like in your case, you don't usually see Invoice or Order dimensions, because the attributes relating to invoices and orders are broken down into simpler (and much smaller) dimension tables. This leaves just the identity (invoice number, order number, etc...) which winds up as a degenerate value in the fact.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Degenerate Dimension Implementation

Post  datamodeller on Wed May 04, 2011 9:37 am

There is another dimension to your query. From the description it appears that the grain of the fact is at 'Invoice Item level'. If in future, there is a plan to have another fact table with the granularity of 'Invoice Number' and roll over/denormalise ceratin item line attributes to the fact directly, you would also prefer to have the drill down capability from the Invoice Fact to Invoice Item Fact. In such circumstance, I guess keeping 'Invoice Number' as dimension and linking both the fact tables will be beneficial.

datamodeller

Posts : 9
Join date : 2010-07-25

View user profile

Back to top Go down

Re: Degenerate Dimension Implementation

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum