Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Customer addresses in a high volume retail environment

4 posters

Go down

Customer addresses in a high volume retail environment Empty Customer addresses in a high volume retail environment

Post  Guest Wed May 13, 2009 5:58 am

I have a requirement to start capturing addresses in our data warehouse for detailed analysis of customer returns. Up until now, analysis at postal code has been sufficient.

The company I work is mail order, so the addresses I need to record are:
Current Account Address
Delivery Address
Account Address at time of delivery (may differ from above)
Goods Collection Address (may differ from delivery address)

I could go for the straightforward approach and simply record all the address details against the relevant customer, order and return dimensions. My concern is that recording two addresses against the order will impact general query performance as it significantly increases the size of the row and therefore will require more I/O for any query processing this dimension (as rows per page will go down).

My instinct is to make address a conformant dimension and snowflake - this would protect the general queries from any performance impact but would make queries using addresses a little slow (which we could live with). Problem here is that the delivery address is free format text and we don't have any address cleaning software at the back end so would probably end up with lots of 'duplicate' addresses.

Anyone else dealt with a similar problem or got any suggestions?

Thanks

Philip

Guest
Guest


Back to top Go down

Customer addresses in a high volume retail environment Empty Re: Customer addresses in a high volume retail environment

Post  BoxesAndLines Wed May 13, 2009 9:31 am

You're on the right track by wanting to create an Address dimension table. Addresses take up a large of amount of space so removing redundant addresses by snowflaking is a good idea. To improve performance, put the appropriate address FK's on your fact table too. That way if you want address information on your fact query it is available with a direct join without having to go through Customer, Order, etc.

The lack of data quality software is another problem altogether. In addition to increasing the row counts in your address dimension due to formatting inconsistencies, the validity of the address is unknown. People who are in the business of defrauding retailers purposefully misrepresent their address and name data for the very reason you are adding this data to your warehouse. I'd do some research and push for a business case for identity and DQ software.
BoxesAndLines
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

Back to top Go down

Customer addresses in a high volume retail environment Empty Re: Customer addresses in a high volume retail environment

Post  Guest Thu May 14, 2009 4:38 am

Thanks for the response. I'm going to go down the separate addresses dimension.

With regard to data quality & fraud - we have a lot of up front checking in the operational systems and use credit reference agencies to check customers and so account addresses will be consistent. The data quality problem occurs with delivery addresses that are entered free form without any validation - so the same delivery address may be entered slightly differently on two orders. There's not a lot I can do about this at present, but maybe once it becomes visible in the data warehouse we can find a business case for cleaning delivery addresses as well.

Guest
Guest


Back to top Go down

Customer addresses in a high volume retail environment Empty Snowflake or star depends on usage

Post  Jeff Smith Thu May 14, 2009 11:04 am

You can create an address dimension and then create a Customer dimension that contains the dimension keys for all of the addesses (one column for each address type). The Customer Address Dimension woth be modeled as follows:
Customer Address Key
Delivery Address Key
Account Address at time of delivery Key
Goods Collection Address Key

Where the Customer Address Key is the surrogate key. This would save a lot of space.

The "Current Account Address" was a little unclear, but if you always want to know the Current Account Address, even on history, then you might want to assign this key to the Account Dimension and update it whenever it changes. Otherwise, put it in the Customer Addess Dimension.

Jeff Smith

Posts : 471
Join date : 2009-02-03

Back to top Go down

Customer addresses in a high volume retail environment Empty Re: Customer addresses in a high volume retail environment

Post  Guest Fri May 15, 2009 4:02 am

[quote="Jeff Smith"]You can create an address dimension and then create a Customer dimension that contains the dimension keys for all of the addesses (one column for each address type). The Customer Address Dimension woth be modeled as follows:
Customer Address Key
Delivery Address Key
Account Address at time of delivery Key
Goods Collection Address Key
quote]

Not sure if I'm misunderstanding, but what you seem to be describing above is a factless fact table rather than a dimension table?

To clarify the addresses that we need to hold:
The current address of a customer is the address at which we would contact the customer now.
The delivery address can vary by order - so order 1 may go to Uncle Bob, order 2 may go to the neighbour.
The account address at time of delivery - this would be the current address (above) at the time they placed the order.
The goods collection address is the address from which a customer return would be picked up by the haulier.

Seems to me the best way to record the addresses is an address dimension linked directly into the relevant fact tables, with the compromise of link to the address table from the customer dimension to reduce row size and therefore not impact I/O performance on the customer dimension.

Guest
Guest


Back to top Go down

Customer addresses in a high volume retail environment Empty Re: Customer addresses in a high volume retail environment

Post  tropically Fri May 15, 2009 11:31 pm

Hi
Could you please explain:
"with the compromise of link to the address table from the customer dimension "

From my understanding this is whats being layed out

DIM ADDRESS
------------
DIM_ADDRESS_SYS_ID


FACT
-----------
CUSTOMER_ADDRESS_SYS_ID
DELVERY_ADDRESS_SYS_ID
ACCOUNT_ADDRES_SYS_ID

And all three sys ids in the fact point to dim_address``corrwct

tropically

Posts : 13
Join date : 2009-05-12

Back to top Go down

Customer addresses in a high volume retail environment Empty Re: Customer addresses in a high volume retail environment

Post  Guest Tue Jun 02, 2009 4:21 am

Sorry for the late reply. Hopefully the diagram below will help explain.

On the sales fact table there are two keys that are both used as links back to the Address dimension. By following the delivery address key, we can find out what delivery address was used for the order, and using the Billing Address key tells us what the customer's billing address was at the time of the order.
To find out what the customer's current address is, we use the Current Address Key from the Customer Dimension which links into the Address table as well.

If you use Business Objects, you would show this in a universe by having Delivery Address and Billing Address aliases of the Address Dimension.

This compromise (by not embedding the customer address in the customer table) means we have to do an extra join when doing address analysis (e.g. customers by postal area), but given that most queries do not use the address, this keeps the row size down on the Customer dimension and improves the I/O throughput of that table.
If you use a database that enables compression on tables, the I/O issue is probably not as great a concern, as text is highly compressable.

Philip

Customer addresses in a high volume retail environment Cust_a10

Guest
Guest


Back to top Go down

Customer addresses in a high volume retail environment Empty Re: Customer addresses in a high volume retail environment

Post  Jeff Smith Tue Jun 02, 2009 9:13 am

A couple of things. I would consider keeping Postal Code in a seperate dimension. There is a natural hierarchy with Postal Code to State and you may even add company specific Regional roll ups to the Postal Dimension. The Postal Dimension would contain city name, county name, state name, state postal abbreviation, region, etc. Plus, taking out the Postal Code code reduce the number of rows in the Address Dimension. 100 Main Street, for example, can exist in many Postal Codes.

The current Address should stay on the Customer Dimension but you may also want to put it on the fact table. If you only keep it on the Customer Dumension, you will not be keeping history. You may want to map out the location of everyone that used a particular store. By keeping the Customer Address on the Fact table, you can create an accurate map of where people lived at the time of the sale as opposed to where they live now. Not a big issue in an area where people don't move, but could cause havoc in college towns or in places with a lot of renters.

If space is an isse and if the other addresses aren't used very often, then you could create a Customer Address dimension that, as you pointed out, a lot like a factless fact table, but it assigns a dimension key to each combination of Customer Addresses. For example:

Customer Address Surrogate Key
CUSTOMER_ADDRESS_SYS_ID
DELVERY_ADDRESS_SYS_ID
ACCOUNT_ADDRES_SYS_ID

With the Customer Address Surrogate Key being assigned to the Sales Fact Table. This would save a lot of space if customers frequently used a store. It wouldn't make much sense for a car dealership, but it would make a lot of sense for a Bank where a customer can use an ATM or branch several times a week.

It all depends on how the data is used and how it acts.

Jeff Smith

Posts : 471
Join date : 2009-02-03

Back to top Go down

Customer addresses in a high volume retail environment Empty Re: Customer addresses in a high volume retail environment

Post  Guest Tue Jun 02, 2009 9:38 am

Putting the customer address on the fact table not necessary and to be honest just sounds plain wrong. The Billing address key from the sales fact table points to the customer address at the time of the order so that provides the history you're talking about and it only takes up 4 bytes on the fact table. This makes it sound like I am concerned about space, but it is the resulting I/O of having wide rows that I am trying to avoid.

In the example I've also been a bit lazy which is probably causing some confusion. The customer address is made up of a number of components, I just couldn't be bothered typing them all in. I would leave the postal code on the address dimension, but I am in the UK and the postal code has a hierarchy encoded within it that is extracted into it's component parts. We currently have the postal code table which when the address dimension is implemented would effectively become a conformant dimension.

The one thing this model does not give us is address changes that occur outside the ordering process - this would the factless fact table mentioned earlier in the thread but currently we have no requirement to capture that information.

Guest
Guest


Back to top Go down

Customer addresses in a high volume retail environment Empty Re: Customer addresses in a high volume retail environment

Post  Jeff Smith Tue Jun 02, 2009 11:07 am

Usually when I use the term "current" in a dimension table, it's a field that gets updated. So a Current Address key on the customer dimension table would be the customer's current address as opposed tot he customer's address at the time of the sale. As I stated, it all depends on how the data acts. If Customers move around a lot and you want to keep history, you model it one way. If the customers don't move or if you don't care about history, then you model it another way.

Jeff Smith

Posts : 471
Join date : 2009-02-03

Back to top Go down

Customer addresses in a high volume retail environment Empty Re: Customer addresses in a high volume retail environment

Post  Guest Tue Jun 02, 2009 12:38 pm

Sorry, the lat sentence of previous reply should have read:
"The one thing this model does not give us is an history of address changes that occur outside the ordering process - this would the factless fact table mentioned earlier in the thread but currently we have no requirement to capture that information."

The address key on the customer dimension would be updated when a customer changes their address.

Guest
Guest


Back to top Go down

Customer addresses in a high volume retail environment Empty Re: Customer addresses in a high volume retail environment

Post  DilMustafa Tue Jun 02, 2009 1:15 pm

I have handeled a similar scenerio creating an address dimension (a master list of all addresses) and then role-playing it as mailing address, delivery address and service address.

DilMustafa

Posts : 49
Join date : 2009-02-03
Location : Calgary, Canada

Back to top Go down

Customer addresses in a high volume retail environment Empty Re: Customer addresses in a high volume retail environment

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum