Monster Dimensions

View previous topic View next topic Go down

Monster Dimensions

Post  DJ on Wed Jul 21, 2010 11:52 am

My customer list is huge (about 70 millionish). We currently have a 60 million row dimension (CUSTOMER) that holds most of our existing customers, and this existing large DIM is currently performing fine for us.

Now my clients want to type 2 two customer attributes: salesrep id and sales channel code.

Since these two attributes will change more often than my other customer attributes, I'm putting them in a separate dimension (something like CUSTOMER SALES CHANNEL).

In this new CUSTOMER SALES CHANNEL dimension, my client wants to show that one customer can have two salesreps (based on product line). Thus, on my new DIM, my natural key is the concatenation of customer id, product line and salesrep.

SO - my new dimension, CUSTOMER SALES CHANNEL, will likely hold 200 million rows to start with, and it can move to perhaps 400 million rows over time).

We're very concerned about performance from a DIM this large.

We are planning on benchmarking our design before we get too far.

Here are a few ideas being pondered right now.

I was thinking of possibly hash partitioning on my surrogate key. Most of the time we'll be joining this DIM for facts, of course.

If need be I could have a separate DIM for each of my business units, or do something like this. I'd rather not do this, but if I must, I will.

We're also looking at DW appliances, but that is a whole nother Oprah Show.

I'm looking for opinions and ideas right now.

Thanks in advance!

DJ

Posts : 5
Join date : 2010-05-10

View user profile

Back to top Go down

Re: Monster Dimensions

Post  ngalemmo on Wed Jul 21, 2010 1:05 pm

One option would be to junk the Sales Channel dimension and include its FK in facts. Supplement it with a customer/product line/sales channel fact (with effective dates) to track the relationship.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Monster Dimension

Post  DJ on Wed Jul 21, 2010 1:39 pm

Thanks for the ideas!

I'm rather thinking of the cust / prod line / salesrep as a dimension that needs to be associated to many facts. I'm viewing it as dimensional data that happens to be at a lower granularity than customer.

As you realize, if I make the cust / prod line / salesrep a fact, I can't associate it to me other facts.




DJ

Posts : 5
Join date : 2010-05-10

View user profile

Back to top Go down

Re: Monster Dimensions

Post  ngalemmo on Wed Jul 21, 2010 2:08 pm

You associate the junk dimension (sales rep/sales channel) with the facts. The additional fact table is to keep the history of the relationship as it relates to customer.

The junk dimension would be very small and directly associated to facts where applicable. Assuming these facts already have customer and product as dimensions, you can easily associate the correct sales rep/channel row based on the existing context without the need for yet another monster dimension.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Monster Dimensions

Post  hang on Wed Jul 21, 2010 11:36 pm

Why would you have the sales rep/channel in customer dimension in first place. These two are really separate dimensions and their relationship with customers can only be established by sales transactions (fact). Treating them as a single dimension would mean you can predetermine their relationships even though the corresponding transaction has not occurred.

200 million is definitely no-go for a dimension, even for a monster dimension. That size is a typical symptom for a fact, maybe fact-less fact to be be mistaken as dimension. Even ruling out these two entities, your 60 million customer dimension is big enough to be a monster dimension. Then ngalemmo's approach may be used to further reduce the size. Wrapping up low cardinality attributes in a junk dimension, removing them from customer dimension and having a single junk dimension key in the fact is one way to reduce the size, at least horizontally. You may move fast changing attributes into a separate dimension and only relate current attribute set to the customer dimension, and use fact to reflect the more granular changing relationships.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Monster Dim

Post  DJ on Thu Jul 22, 2010 10:58 am

Hang says: Why would you have the sales rep/channel in customer dimension in first place. These two are really separate dimensions and their relationship with customers can only be established by sales transactions (fact). Treating them as a single dimension would mean you can predetermine their relationships even though the corresponding transaction has not occurred.

Hang, as you said, "Treating them as a single dimension would mean you can predetermine their relationships even though the corresponding transaction has not occurred"

That is our exact business requirement, you have very good intuition, and you have articulated our business perfectly. And this is why cust / salerep / sales channel is treated as a single dimension. Our operational systems do not associate the salesrep to the order.

Yes, 60 million is large for a DIM, it is more typical for a fact, and that would be why I'm posting here.

Our 60 million customer volume is not due to changing attributes, it's due to us having that many customers. Unfortunately, moving quickly changing attributes to a junk dim won't solve that. Having many customers is a terrific thing (unless we are dimensional modeler!)

I am pondering several alternatives. Hang, per your intuition (and mine), this is what I am pondering. Even though our transactions don't have salesrep id on them in our operational systems (they have only the cust with his pre-assigned salesrep), I think I'll put the salesrep on the transaction, and the customer on the transaction, and have separate DIMS (cust and salesrep). That will keep my volume down to a mere 60 million.

DJ

Posts : 5
Join date : 2010-05-10

View user profile

Back to top Go down

Re: Monster Dimensions

Post  hang on Thu Jul 22, 2010 7:48 pm

So the bottom line is you have 60 million customers and that is pretty much the minimum size of the customer dimension. But the point is you can still make it leaner by stripping out some low cardinality fields, and put them in a junk dimension that has a direct entry in the fact tables. Don't forget the performance can also benefit when a table is much leaner even if itís still quite deep.

For instance you could have a demographic dimension that may hold gender, age group, income range, marriage status, has child etc. So that you can do most high level analytical queries with you facts joining a very small dimension instead of a monster one. Kimball calls this type of dimension a minidimension, and if you have a foreign key in the customer dimension, itís called the outrigger of customer dimension.

I guess you may need a factless fact for salesrep-customer assignments with effective timestamps in it, so that you can do analysis about the productivities during certain period of time, say finding out the under achievers. I don't think this table would be as big as customer dimension as you may not assign salesreps to every customer, even though you may have many-to-many relationship.

I wonder if youíve already had an order fact table where you may have a few date fields (eg. Order date, payment date, ship date etc.) plus salesrep key, customer key (and possibly customer demographic Key), product key, quantity, dollar amount and perhaps a degenerate dimension, order number. This accumulating snapshot fact table will give you the actual associations between customers, salesreps and products at the time of order, in addition to your assigned associations between customers and salesreps.

You may still need a transaction grain fact table where you may have all the relationships at the time of sales. Your sales channels may appear in this fact table if they are unknown at the time of order.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Monster Dimensions

Post  hang on Sun Aug 01, 2010 9:54 am

I think the Salesrep-customer assignment factless table is called coverage table in DW term. Another typical coverage fact is product promotions. This table can be used to track customer assignments, even before an order is taken. If it is not too big, you could store the assignments snapshot on daily basis. Therefore you would have Date key, Customer Key, Salesrep key and perhaps Salestem key in this table.


hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Monster Dimensions

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum