Customer and Addresses

View previous topic View next topic Go down

Customer and Addresses

Post  zahid_ash on Sun Nov 20, 2011 9:55 pm

Hi,

I have to design a dimensional relationship for Customer and Address.

In this case Customer and Address have many to many relationship. i.e. There could be joint customers and each customer has postal and physical address

I am thinking to create a Customer dimension and Address dimension and Create FactLess fact table in between.

My question is: What could be the business key for Address dimension?

Address dimension has the column e.g.: Address Key , Unit, Property, Street , City, Postcode, State, Country.


zahid_ash

Posts : 5
Join date : 2011-09-15

View user profile

Back to top Go down

Address Hash

Post  elmorejr on Mon Nov 21, 2011 10:20 am

Your best bet is to use a HASH function on the fields that you would count as a unique address. For example, in Postgres, you could do something like:

Code:
addr_string = upper(Unit) || upper(Property) || upper(Street) || upper(City) || upper(Postcode) || upper(State) || upper(Country)

addr_hash = md5(addr_string)

Store the "addr_hash" as your business key.

elmorejr

Posts : 25
Join date : 2011-10-20
Location : Columbus, Ohio, United States of America, Earth, Sol

View user profile

Back to top Go down

Re: Customer and Addresses

Post  ngalemmo on Mon Nov 21, 2011 6:30 pm

Depends on the number of address, but generally, it is not a good idea to use a hash as a key, unless you are using something like SHA-256. Given that generates a 32 byte hash, your not gaining much over just using the address itself.

When you deal with hashes in a combinatorial situation (i.e. using it as a key, the hash value needs to be unique among all others in the table) the probability of a collision (two different addresses having the same hash value) increases very rapidly as the population (# rows in the table) increases. (For example, if you have 31 people in a room, the probability that any two people in the room have the same birthday is around 50%). There have been collisions found with md5 and SHA-1.

If the issue is index size and lookup performance, then one approach is to use a small hash, such as CRC-32, as a non-unique index on the table. This generates a 32-bit hash and will likely have collisions as the table reaches 1-2 million rows. But as a non unique index, it will quickly isolate 1-3 rows that you then check field for field.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Customer and Addresses

Post  zahid_ash on Mon Nov 21, 2011 6:45 pm

I have post a reply before, dont know where it has disappeared
--------------------------------------------------------

For every combination of strings their will be a new HASH value , If any column of an Address dimension changes like any street name then the HASH key will change so how can implement the Slowly changing dimension for Address dimension if we use the HASH key as a business Key?

zahid_ash

Posts : 5
Join date : 2011-09-15

View user profile

Back to top Go down

By definition you can not / should not

Post  elmorejr on Tue Nov 22, 2011 11:53 am

If you plan to implement an address dimension, where each entry is a unique address, there should not be a need to implement Type 2 changes. Every address is unique. Thus, a change in address will become a new address entry in your dimension unless an entry for that address already exists.

In this scenario, the following are unique:

100 Anystreet, Apt. 2B, Columbus, Ohio 43210

100 Anystreet, Apt. 2C, Columbus, Ohio 43210


If you would like to implement Type 2 as it applies to the customer changing their address, then you would not want to implement a separate address dimension. You will want to include the address information in your customer dimension and apply Type 2 rules accordingly.

elmorejr

Posts : 25
Join date : 2011-10-20
Location : Columbus, Ohio, United States of America, Earth, Sol

View user profile

Back to top Go down

Re: Customer and Addresses

Post  zahid_ash on Tue Nov 22, 2011 10:09 pm

@ elmorejr

Thanks mate.

Got your point i.e. Address is not a SCD2 dimension and FactLess fact will hold the customer and address relationship for any change in address or customer.




zahid_ash

Posts : 5
Join date : 2011-09-15

View user profile

Back to top Go down

Re: Customer and Addresses

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum