Some very basic question regarding overall ETL and Modeling strategy/best practices for Data warehouse project

View previous topic View next topic Go down

Some very basic question regarding overall ETL and Modeling strategy/best practices for Data warehouse project

Post  kuldeepchitrakar on Thu Sep 08, 2011 9:45 am


One of my client has multiple source OLTP system (due to acquisition) which stores same type of data e.g. order, sales amount etc now they might be called by different name according to source system but they mean the same e.g. one might call is “customer” one might call is “user”.

Source data gets stored in various databases (Oracle, SQL server, Sybase), Flat Files, excel, XML.

Now thinking from DW perspective, we need to make decision on staging

Question 1:

Now if we think from ETL perspective, we need a staging but Do we need a staging area for each source system because if we don’t do that and keep a common staging area we would need to do a light transformation while bringing the data from source to staging.

Question 2:
Considering the different type of source system, we are planning to have a staging (multiple or single that’s not final) in a database (simple table no constraint, just bring the data in common storage system) so that we leverage the database functionality while transforming the data balance the transformations (so data transformations load is balanced between ETL server and DB server).

Question 3:

We might get very less data read window to read the data form source system so we need to bring required data from source to staging as fast as we can.
Considering the above constraint do you think we can have a PRE-STAGING area whose structure will be same as source (without constraints) so that we can just read the data from source and dump it in pre-staging area (DBMS). This will not have any type of data transformation so ideally it will be fast and we will also have source data in pre-staging area which we can use in pre-staging to staging. (Just a thought)

Final Question:

Considering the multi source systems (might get new systems added in future) huge data. What are the best practices/design strategies which we can adopt right from initial point of design of DW, ETL, overall infrastructure

These questions might sound very basic or easy but would like to know expert thoughts so that we don’t miss anything or don’t do anything in wrong way.



Posts : 17
Join date : 2010-04-21
Age : 34
Location : India

View user profile

Back to top Go down

View previous topic View next topic Back to top

- Similar topics

Permissions in this forum:
You cannot reply to topics in this forum