Is it a best practice that Data warehouse follows the source system data type?
4 posters
Page 1 of 1
Is it a best practice that Data warehouse follows the source system data type?
Currently I am working on a project where the client is insisting to use the data types soure system . I appreciate all input I can receive on this subject.
M. Khan- Posts : 11
Join date : 2012-07-24
Re: Is it a best practice that Data warehouse follows the source system data type?
Can you be more specific? Is it just Integers Vs CHAR/VARCHAR or something else?
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Is it a best practice that Data warehouse follows the source system data type?
It is not uncommon to use VARCHAR for a source ID that is numeric to allow for future changes. Recommend it, but, if the client insists, do what they ask. After all, its T&M and their money.
Re: Is it a best practice that Data warehouse follows the source system data type?
ngalemmo wrote:...After all, its T&M and their money.
LOL. The smart money is on T&M. Some companies still insist on signing "fixed bid" projects which always end up badly for all parties.
If there is only one source, then I would match the source datatypes. If there are two or more sources, then obviously, you need to rationalize the datatypes and lengths.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Is it a best practice that Data warehouse follows the source system data type?
It sounds to me like the client was burned once before. Someone changed data types by cutting the length of CHAR fields or changing a CHAR field to Numeric and it caused problems after the vendor left.
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Is it a best practice that Data warehouse follows the source system data type?
Jeff Smith wrote:It sounds to me like the client was burned once before. Someone changed data types by cutting the length of CHAR fields or changing a CHAR field to Numeric and it caused problems after the vendor left.
Yep, that would be a problem.
Re: Is it a best practice that Data warehouse follows the source system data type?
Jeff,
Here is a small example, we have custmer type that defines different types of customer types as per the client definition. It is a set of code (i.e. integer) and description(i.e. varchar). The source system allows to leave customer type in customer master null. As a best practice that I have read, columns should not be left as null in data warehouse to avoid any errors in calcualtions in reports. If I make customer type code as varchar, I can make it empty string(since the client is using SQl server). But, If I make the code as integer as per the source system then I cannot leave it as empty string as there is no equivalent to empty string for integer in SQL server to avoid nulls in customer dimension.
Here is a small example, we have custmer type that defines different types of customer types as per the client definition. It is a set of code (i.e. integer) and description(i.e. varchar). The source system allows to leave customer type in customer master null. As a best practice that I have read, columns should not be left as null in data warehouse to avoid any errors in calcualtions in reports. If I make customer type code as varchar, I can make it empty string(since the client is using SQl server). But, If I make the code as integer as per the source system then I cannot leave it as empty string as there is no equivalent to empty string for integer in SQL server to avoid nulls in customer dimension.
M. Khan- Posts : 11
Join date : 2012-07-24
Re: Is it a best practice that Data warehouse follows the source system data type?
Couldn't you define the column so that the default is something other than Null but essentially means Null?
Jeff Smith- Posts : 471
Join date : 2009-02-03
Re: Is it a best practice that Data warehouse follows the source system data type?
That is not a best practice in my warehouses. Data can be null and as a result, I leave it null. What is a best practice is to associate a fact row to a default dimension row if the dimension is not applicable or known. Additionally, in facts, what do you put in metric columns that don't have a value? 0? Can't use that because its a valid value. -1? That will impact your ability to sum across rows. You have to set the column to null.M. Khan wrote:Jeff,
... As a best practice that I have read, columns should not be left as null in data warehouse to avoid any errors in calcualtions in reports.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: Is it a best practice that Data warehouse follows the source system data type?
Thanks you all for contributing on the subject.
The example that I have discussed earlier is about customer dimension. We are going for two members in custmer dimension with surrogate keys 0 and -1. Customer with surrogate key 0 indicates unknown customer and -1 is reserved for not applicable. Customer type is an attribute of customer dimension and there is a requirement that analysis can be done by customer type. Hence, in case of null, ETL will default the Customer Type Code to 0 and description will be populated with the value "Customer type not set".
The example that I have discussed earlier is about customer dimension. We are going for two members in custmer dimension with surrogate keys 0 and -1. Customer with surrogate key 0 indicates unknown customer and -1 is reserved for not applicable. Customer type is an attribute of customer dimension and there is a requirement that analysis can be done by customer type. Hence, in case of null, ETL will default the Customer Type Code to 0 and description will be populated with the value "Customer type not set".
M. Khan- Posts : 11
Join date : 2012-07-24
Similar topics
» How to handle situations where the data is deleted from the source system?
» Using the Dimensional Data Warehouse as source data for the OLTP process
» data warehouse and data warehouse system
» data warehouse or not ? when is it okay to use OLAP without a data warehouse database
» Deletes in the source system for Type-2...
» Using the Dimensional Data Warehouse as source data for the OLTP process
» data warehouse and data warehouse system
» data warehouse or not ? when is it okay to use OLAP without a data warehouse database
» Deletes in the source system for Type-2...
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum
|
|