Extracting Master/Detail data into Staging Tables

View previous topic View next topic Go down

Extracting Master/Detail data into Staging Tables

Post  monserob on Mon Jun 24, 2013 9:34 am

I need to extract invoice data from our source system; this data is made up of header and line info as you'd expect.

My first question is should I denormalise the data at this stage and have just one invoice staging table or should I structure the staging area similar to the source and have both header and line staging tables? If it's relevant the data warehouse invoice fact table will be down to line level.

If the answer to the question above is to keep the data normalised in the staging area then I need to know best practice for extracting the data in separate packages without duplicating the source extraction logic. Possible answers that I've considered are:


  1. Extract the header data into a staging table and then join the staging table with the source line table. The downside to this is the join will be across different databases on different servers so not sure what this will mean in terms of performance
  2. Retrieve the extracted header data from option 1 above and then cycle through each header record and retrieve the line info from the source system. The downside to this is the number of hits to the database but this is the option that feels more like how the ETL (SSIS) should work
  3. Extract the header and line data in a single query and then decompose the data into two staging tables. Downside is it seems like a lot of unnecessary data massaging will take place, which could introduce bugs


Any help would be appreciated!

monserob

Posts : 4
Join date : 2013-01-30

View user profile

Back to top Go down

Re: Extracting Master/Detail data into Staging Tables

Post  Mike Honey on Mon Jun 24, 2013 6:34 pm

Hi monserob

I'd normally design a single query spanning both header and line info - the other options seem more complex and I'd expect them to be less efficient by a factor of 10 or more.

If volumes & requirements permit, I'd deliver the data from that query directly into the fact table.  I only use staging tables where I have to, and you can easily deliver 10m rows using this method with modern hardware and ETL tools.

Good luck!
Mike
avatar
Mike Honey

Posts : 185
Join date : 2010-08-04
Location : Melbourne, Australia

View user profile http://www.mangasolutions.com

Back to top Go down

Re: Extracting Master/Detail data into Staging Tables

Post  ngalemmo on Mon Jun 24, 2013 6:49 pm

I don't like the first 2 options.  Assuming these databases are sitting on different servers, cross-database joins like those tend to perform very poorly.

As for option 3, its not clear why you would 'decompose' the data.  Usually, ETL is running on a separate server so I doesn't impact the operational system.  Depending on your volume and load on the operational environment, you can pull header and detail in the same query, but, it does require more resources on the operational machine.  If you have sufficient information in the source tables that allow you to pull the tables individually (no joins) that would be a more efficient option, putting less load on the operational system.  The latter is usually the way to do it in a 24/7 operational environment.  If the operational system goes down every night, is it not a major consideration.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Extracting Master/Detail data into Staging Tables

Post  zip159 on Mon Jun 24, 2013 8:04 pm

If the volume of data is low you could pull all the header and line data over to a staging area and then do your processing there.  Even if the volume is high it may make more sense to do that than doing cross database joins or cycling through header records and pulling lines for each header (depending on how it's implemented).

I would personally create two staging tables, header and line.  Pull all the relevant headers into the header staging table and pull all the lines for the relevant headers into the line staging table.  If you can determine what the relevant lines are without looking at headers that's great, if not then do a join and only pull back the line information.  You're doing a little more work but I think it's a cleaner approach doing it this way.

zip159

Posts : 6
Join date : 2013-06-24

View user profile

Back to top Go down

Re: Extracting Master/Detail data into Staging Tables

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum