Generating a Junk dimension - Cross join or based on actual values in fact?

View previous topic View next topic Go down

Generating a Junk dimension - Cross join or based on actual values in fact?

Post  kletos on Thu Feb 21, 2013 5:35 pm

Hi there,

Are there any problems if I create a junk dimension and maintain it like a Type 1 SCD, compared to creating the junk dimension as a cross join to build an exhaustive list of permutations.

The rationale for a Type 1 styled Junk dimension is that a cross join results in 10,000 rows, while a distinct of attributes in the fact table results in only 100 rows. This 100 rows may never reach 10,000 rows.

Thanks!
ML

kletos

Posts : 1
Join date : 2013-02-13

View user profile

Back to top Go down

RE: Generating a Junk dimension - Cross join or based on actual values in fact?

Post  rathjeevesh on Thu Feb 21, 2013 10:12 pm

As per the kimball group advice we should not be creating the exhaustive list with the cross join, to begin with. In your scenario it is still manageable if the junk dimension will go up to 10K rows if created through cross join, but there can be cases where the distinct values for each of the columns may be little high and there can be a number of columns to be placed in the junk dimension which may lead to a huge dimension in terms of rows.

Its better to build your junk dimension based upon data realities i.e. populate them with new rows as they come from source. But if you are sure the number of rows by cross join will be at max 10K i.e. source has some restriction already enforced, distinct values for the columns are predefined then you may build the exhaustive list and may not have to worry about maintaining the ETL job to populate the junk dimension.

rathjeevesh

Posts : 15
Join date : 2013-02-16

View user profile

Back to top Go down

Re: Generating a Junk dimension - Cross join or based on actual values in fact?

Post  ngalemmo on Fri Feb 22, 2013 7:35 am

rathjeevesh wrote:As per the kimball group advice we should not be creating the exhaustive list with the cross join, to begin with. In your scenario it is still manageable if the junk dimension will go up to 10K rows if created through cross join, but there can be cases where the distinct values for each of the columns may be little high and there can be a number of columns to be placed in the junk dimension which may lead to a huge dimension in terms of rows.

Its better to build your junk dimension based upon data realities i.e. populate them with new rows as they come from source. But if you are sure the number of rows by cross join will be at max 10K i.e. source has some restriction already enforced, distinct values for the columns are predefined then you may build the exhaustive list and may not have to worry about maintaining the ETL job to populate the junk dimension.

This is correct. There may be a few times when a cross-join makes sense, but most of the time, just create rows for combinations that actually exist.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Generating a Junk dimension - Cross join or based on actual values in fact?

Post  chade25 on Mon Feb 25, 2013 11:40 pm

I like to test with:

select count(*) from (select distinct [column] from [table]>);

and in the case that it is customer ids, then leave it out of the select to get the distinct values of what would be the junk dimension and add columns in to the inner select and see what ones make it sky rocket to get an idea of the best combo.

I did that today for a demographic junk dimension and it was only 1037 for every combo! I was very happy. But I want to keep even the old values because the source system just stomps over them. So if I were to say just truncate the table and recreate it, i would lose accurate demographic history, because some combos would not link to that person...so add it as it comes to maintain accurate history on the fact table at the time of the event.

chade25

Posts : 29
Join date : 2012-04-12
Age : 37
Location : Oregon

View user profile

Back to top Go down

Re: Generating a Junk dimension - Cross join or based on actual values in fact?

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum