Is it a best practice that Data warehouse follows the source system data type?

View previous topic View next topic Go down

Is it a best practice that Data warehouse follows the source system data type?

Post  M. Khan on Sat Feb 02, 2013 11:21 pm

Currently I am working on a project where the client is insisting to use the data types soure system . I appreciate all input I can receive on this subject.

M. Khan

Posts : 11
Join date : 2012-07-24

View user profile

Back to top Go down

Re: Is it a best practice that Data warehouse follows the source system data type?

Post  Jeff Smith on Mon Feb 04, 2013 10:42 am

Can you be more specific? Is it just Integers Vs CHAR/VARCHAR or something else?

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Is it a best practice that Data warehouse follows the source system data type?

Post  ngalemmo on Mon Feb 04, 2013 11:23 am

It is not uncommon to use VARCHAR for a source ID that is numeric to allow for future changes. Recommend it, but, if the client insists, do what they ask. After all, its T&M and their money.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Is it a best practice that Data warehouse follows the source system data type?

Post  BoxesAndLines on Mon Feb 04, 2013 12:27 pm

ngalemmo wrote:...After all, its T&M and their money.

LOL. The smart money is on T&M. Some companies still insist on signing "fixed bid" projects which always end up badly for all parties.

If there is only one source, then I would match the source datatypes. If there are two or more sources, then obviously, you need to rationalize the datatypes and lengths.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Is it a best practice that Data warehouse follows the source system data type?

Post  Jeff Smith on Mon Feb 04, 2013 12:41 pm

It sounds to me like the client was burned once before. Someone changed data types by cutting the length of CHAR fields or changing a CHAR field to Numeric and it caused problems after the vendor left.

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Is it a best practice that Data warehouse follows the source system data type?

Post  ngalemmo on Mon Feb 04, 2013 1:07 pm

Jeff Smith wrote:It sounds to me like the client was burned once before. Someone changed data types by cutting the length of CHAR fields or changing a CHAR field to Numeric and it caused problems after the vendor left.

Yep, that would be a problem.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Is it a best practice that Data warehouse follows the source system data type?

Post  M. Khan on Tue Feb 05, 2013 3:31 pm

Jeff,
Here is a small example, we have custmer type that defines different types of customer types as per the client definition. It is a set of code (i.e. integer) and description(i.e. varchar). The source system allows to leave customer type in customer master null. As a best practice that I have read, columns should not be left as null in data warehouse to avoid any errors in calcualtions in reports. If I make customer type code as varchar, I can make it empty string(since the client is using SQl server). But, If I make the code as integer as per the source system then I cannot leave it as empty string as there is no equivalent to empty string for integer in SQL server to avoid nulls in customer dimension.


M. Khan

Posts : 11
Join date : 2012-07-24

View user profile

Back to top Go down

Re: Is it a best practice that Data warehouse follows the source system data type?

Post  Jeff Smith on Tue Feb 05, 2013 4:06 pm

Couldn't you define the column so that the default is something other than Null but essentially means Null?

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Is it a best practice that Data warehouse follows the source system data type?

Post  BoxesAndLines on Tue Feb 05, 2013 9:14 pm

M. Khan wrote:Jeff,
... As a best practice that I have read, columns should not be left as null in data warehouse to avoid any errors in calcualtions in reports.

That is not a best practice in my warehouses. Data can be null and as a result, I leave it null. What is a best practice is to associate a fact row to a default dimension row if the dimension is not applicable or known. Additionally, in facts, what do you put in metric columns that don't have a value? 0? Can't use that because its a valid value. -1? That will impact your ability to sum across rows. You have to set the column to null.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Is it a best practice that Data warehouse follows the source system data type?

Post  M. Khan on Wed Feb 06, 2013 11:35 am

Thanks you all for contributing on the subject.
The example that I have discussed earlier is about customer dimension. We are going for two members in custmer dimension with surrogate keys 0 and -1. Customer with surrogate key 0 indicates unknown customer and -1 is reserved for not applicable. Customer type is an attribute of customer dimension and there is a requirement that analysis can be done by customer type. Hence, in case of null, ETL will default the Customer Type Code to 0 and description will be populated with the value "Customer type not set".





M. Khan

Posts : 11
Join date : 2012-07-24

View user profile

Back to top Go down

Re: Is it a best practice that Data warehouse follows the source system data type?

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum