Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Modeling Sales & Visits

5 posters

Go down

Modeling Sales & Visits Empty Modeling Sales & Visits

Post  wonka Tue Mar 06, 2012 8:51 pm

Here's my latest uncertainty...

I'm building a DW for a company that's tracking sales of items and visits to one of their outlets. I'm good with the sales, but not sure which path to go for the visits. Sales looks something like this (very generic):

SALES
============
ProductID
CustomerID
DateID
Quantity
Amount

My original plan led me to creating a Visit factless fact

CUST_VISITS
============
DateID
VisitTypeID
Visit (defaults to 1)

VISIT_TYPES
============
VisitTypeID
VisitDescr

This seemed best fit but then I started thinking I should just hang the visit dim off the sales fact and have something like:

SALES
============
ProductID
CustomerID
DateID
Quantity
Amount
VisitType1 (YES or NO)
VisitType2 (YES or NO)
VisitType3 (YES or NO)
VisitType4 (YES or NO)

My issue with doing this is the Visit and Sales transactions are at different granularity. Sales happen at an exact point in time (e.g. 3/6/12 @ 14:45:32) whereas a Visit=1 Day (e.g. 3/6/12). Sales can also have multiple records per a customer visit (e.g. there could be 5 sales records on 3/6/12 for customer 123 but only 1 Visit record for 3/6/12).

I'm sure this is very basic and I shouldn't be confused, but wanted to know what you guys all thought. Any comments would be appreciated! Thanks in advance.

wonka

Posts : 13
Join date : 2011-08-10

Back to top Go down

Modeling Sales & Visits Empty Declare the grain ans stick to it

Post  rob.hawken Tue Mar 06, 2012 10:59 pm

Don't go with a mixed grain fact, breaks Ralph's first law of fact table design "keep to the grain".

rob.hawken

Posts : 13
Join date : 2010-09-19

Back to top Go down

Modeling Sales & Visits Empty Re: Modeling Sales & Visits

Post  hang Tue Mar 06, 2012 11:35 pm

You are not really mixing grains in your sales fact table as adding CustomerID and VisitType to it does not render any measure (Quantity, Amount) non-additive. You are on right track by getting rid of the initial CUST_VISITS fact table, as you may easily achieve that by aggregation on sales fact table. All you need is to have a timestamp, in addition to the date key, in the sales fact table.

However I have a little advice on the 4 columns for VisitType in the fact table. The model would be much cleaner if you can wrap them up into a single junk dimension and connect them to the fact table by a single FK.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

Back to top Go down

Modeling Sales & Visits Empty Re: Modeling Sales & Visits

Post  Vishy Wed Mar 07, 2012 1:31 am

I would again go by Ralph view on creating fact as it says, a fact should represent what happened and whatever the dimensions involved during that particular thing happening.

This is how we start and come with dimension while creating a new system.

So when a sale is happening i.e at point of sale (POS), we have following things coming

1) Time is there
2) Customer is there
3) Store location is involved
4) Item is involved

Here when any sale is happening, "visit" is not involved so it should not at all go this fact.

You are also thinking from your comfort perspective, think about how a reports will be delivered out of your this design where you have all the things in a single fact. Performance won't be that good and un-necessary you are also increasing the size of a fact table.

Thanks



Last edited by Vishy on Wed Mar 07, 2012 1:31 am; edited 1 time in total (Reason for editing : typo)

Vishy

Posts : 75
Join date : 2012-02-28
Location : India

Back to top Go down

Modeling Sales & Visits Empty Re: Modeling Sales & Visits

Post  hang Wed Mar 07, 2012 7:07 am

Vishy, what is available in 'Visits' but not available in sales transaction fact, VisitType or anything else you can see from this case. Are you saying VisitType should not be in sales fact? then how can you analyse the sales down to the grain of the time in a day by VisitType.

To me, 'visits' also applies to time of a day if you 'allocate' VisitType down to sales fact table. a daily 'Visit' by customer is just an aggregate based on sales fact table. So only one base fact table as follows:

Sales fact table
DateKey,
SalesDateTime
ProductKey,
CustomerKey,
VisitTypeKey (junk dimension)
Count=1

Don't get carried away by SalesDateTime. Very likely you still load the sales transactions on daily basis to the fact table in data warehouse. So all the dimensions should be available.

Now with this base fact table, you may work out all kinds of aggregate facts, including 'visit'. You may summarise the count by day by customer, considering a customer could visit twice, or have a distinct count on customer. You may create views ( maybe materialised) or physical aggregate tables for performance reason, or you just feed the base fact to cube and let cube aggregate everything for you.

The point is, you don't treat 'Visit' as part of base model. If it is a physical table, it's just a performance booster like many other aggregate (rollup) fact tables.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

Back to top Go down

Modeling Sales & Visits Empty Re: Modeling Sales & Visits

Post  wonka Wed Mar 07, 2012 11:49 am

Thanks for the suggestions. This is what I'm leaning towards doing...

Sales
=============
CustomerID
ProductID
DateID
VisitTypeID
DateTime
Quantity
Amount

Sample records would be:
CustomerID|ProductID|DateID|VisitTypeID|DateTime|Quantity|Amount
123|2|20110203|1001|2011-02-03 06:00:00|1|950.00
123|4|20110203|1001|2011-02-03 15:00:00|5|25.00

dimVisitType
=============
VisitTypeID
Walk (visit without a sale)
Sale (sale generated)
FirstVisit (first visit to store)
Reactivated (first visit to store in a year)

Sample record would be:
VisitTypeID|Walk|Sale|FirstVisit|Reactivated
1000|YES|NO|NO|NO
1001|NO|YES|NO|NO
1002|NO|YES|YES|NO
...
...

Is this kind of more along the lines of what you guys are suggesting?






wonka

Posts : 13
Join date : 2011-08-10

Back to top Go down

Modeling Sales & Visits Empty Re: Modeling Sales & Visits

Post  ngalemmo Wed Mar 07, 2012 3:23 pm

Sales are not visits and visits are not sales. They are two different events and should be tracked by two different fact tables. Of course, one wonders how on earth they are going to get visit information if the customer never buys anything. Its hard enough to know who the customer really is when you have a sale (I assume this is retail and not a membership club). On-line is about the only mechanism to get visit counts, but you almost never know who the person really is.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Modeling Sales & Visits Empty Re: Modeling Sales & Visits

Post  hang Wed Mar 07, 2012 7:56 pm

When I consolidate seemingly two different fact into one, I firstly check the dimensionality and see if they could share exactly the same dimension set. In this case, I guess the only questionable attribute is the DateTime. If there are different registering processes, the DateTime would be different from the sales even if the visit is linked and it would be really hard to properly connect the two fact together.

However as you said, the visit would only be on day level, so obviously the visiting time can only be the sale time, and that makes me wonder if there is anything else that stops you from consolidating the two facts. Would consolidating introduce double counting on any measures, I guess not if you sensibly set them to 0 or null for non sale visit and the value itself is an rough indicator without looking further into VisitType dimension.

Of course splitting them up is easy and consolidating is always harder as it involves more insight into the business. Actually what is even harder is when you either have two many small fact tables or some fact tables are just too wide. What I do is to make an extra effort trying to find out if the small fact tables can be combined based on dimensionalities, and wide fact tables spitted based on the business focus or process to strike out a properly balanced schema.

Why consolidating if possible? Because you don't have to leave the consolidating to user if the link is inherently in the same table. Whereas splitting up an overloaded fact based on business focus may avoid double counting and sparsely populated fact table. So based on these criteria, I would consolidate in this case.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

Back to top Go down

Modeling Sales & Visits Empty Re: Modeling Sales & Visits

Post  Vishy Thu Mar 08, 2012 2:13 am

Transaction fact table itself tells even if you don't have typeID/visit there.

It tells you

1) Customer visited or not on a given day
2) customer bought something or not on a given day

so only thing the fact table is not showing is customer's "window shopping".

Hang, I was not able to understand clearly what did you mean but I am from reporting background also at that sometimes pushes me to have that kind of design so that report is easy and fast.

customer visit itself has too many other attributes also associated with it other then attributes mentioned. Like following

------------------------------------------------------

Date
Customer ID
Walk (visit without a sale) --sale generated or not can be calculated from this column itself
FirstVisit (first visit to store)
Reactivated (first visit to store in a year)
Visit after how many days?
What kind of items (category) purchased ( baby related,grocry,hardwares etc)
reason of not purchasing (wanted to purchase but didn't find that item or price issue)
store location ID
----------------------------------------------------
I am not sure if you get few of these columns details from source but over the period of time you might start getting them or customer may demand this.

I can tell you that both facts can be joined at reporting level, getting details for a (day+customer) from this factless fact table and then for the same (day+customer) from the transaction and then have left outer join b/w these 2 outputs.

Having all the details in your fact is almost doubling the size of the main fact and for a retailed store kind of thing where you have lot many customers+lot many transactions , this fact would grow like anything....






Vishy

Posts : 75
Join date : 2012-02-28
Location : India

Back to top Go down

Modeling Sales & Visits Empty Re: Modeling Sales & Visits

Post  hang Thu Mar 08, 2012 5:03 am

Vishy, I think what you and Ngalemmo said makes sense, as I found out the product may not be involved in the visit. So separating them looks better idea.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

Back to top Go down

Modeling Sales & Visits Empty Re: Modeling Sales & Visits

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum