Poorly sturctured data at source system

View previous topic View next topic Go down

Poorly sturctured data at source system

Post  JSchroeder on Thu Aug 08, 2013 12:16 pm

If the data in the source system is poorly structured and not normalized in any way, what is the most typical way to handle this? Is it common to try and normalize the data as it moves through the staging area before pushing it out into a dimensional data mart? We're pulling data from one of our own websites that is stored in SQL server, but it isn't really structured in any way. Would the right approach be to try and normalize it first?

JSchroeder

Posts : 12
Join date : 2012-03-29

View user profile

Back to top Go down

Reverse the telescope?

Post  Mike Honey on Thu Oct 03, 2013 4:11 am

Hi JSchroeder,

I wouldn't make any effort to normalize data for a Data Mart application - I prefer the Star Schema design which is heavily de-normalized.

I also prefer the "aggresive load" strategy, where source system data is accepted as valid and correct until proven otherwise. In reality its typically the most heavily scrutinized data available, flawed or not.

I'd see examples of "poor structure" as an opportunity for the Data Mart application to add value. You can clean data, add hierarchies, banding etc.

Good luck!
Mike
avatar
Mike Honey

Posts : 185
Join date : 2010-08-04
Location : Melbourne, Australia

View user profile http://www.mangasolutions.com

Back to top Go down

Re: Poorly sturctured data at source system

Post  BoxesAndLines on Thu Oct 03, 2013 10:55 pm

I wouldn't normalize it either. Normalizing requires two extensive sets of ETL logic. One to extract and normalize to ETL staging area (ODS). Another set of ETL logic to extract and load to the dimensional warehouse. I try to minimize the amount of transformation logic that must be done to stage the data, usually just adding dates to store a few weeks of data in case I need to reload anything to the warehouse.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Poorly sturctured data at source system

Post  harisrana on Thu Feb 12, 2015 1:19 am

I wouldn't normalize it either. Normalizing requires two extensive sets of ETL logic. One to extract and normalize to ETL staging area (ODS). Another set of ETL logic to extract and load to the dimensional warehouse. I try to minimize the amount of transformation logic that must be done to stage the data, usually just adding dates to store a few weeks of data in case I need to reload anything to the warehouse.

harisrana

Posts : 1
Join date : 2015-02-12

View user profile

Back to top Go down

Re: Poorly sturctured data at source system

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum