Data Warehouse Purging

View previous topic View next topic Go down

Data Warehouse Purging

Post  bniknar on Thu Sep 01, 2011 2:03 pm

Hello,

I've inherited a data warehouse (Microsoft SQL server) that has a star schema. I've been asked to purge all data older than 1 year. There are SAN space issues, as well as retention policies, which are driving the need to purge., This is a custom-built data warehouse, unfortunately with poor documentation.

Rather than trying to figure this out by hand: are there any tools available which can anaylze the schema and give me data to make purging decisions? We want to do this carefully so we don't impact either the ETL process or the reporting that's based off the DW.

Thank you.


bniknar

Posts : 1
Join date : 2011-09-01

View user profile

Back to top Go down

Re: Data Warehouse Purging

Post  VHF on Wed Sep 14, 2011 6:04 pm

Most of the fact table in a star schema will relate to a date dimension. This should make it easy to identify data older than 1 year. But I don't know of any tool that would do this automatically.

Large DWs often have fact table partitioned by date; the oldest data can be purged simply by dropping partitions.

VHF

Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US

View user profile

Back to top Go down

Re: Data Warehouse Purging

Post  lorenzolamas on Wed Nov 02, 2011 1:42 am

One idea is to copy the data to another table on a weekly or monthly basis.

For example, if you have the table, TABLE, you might want to have a table called TABLE_2001_01 for the first week of 2001. In addition, you might want to have TABLE_2001 with all of the data for the year.

After that, you could delete the data from the original table subject to constraints, etc.

lorenzolamas

Posts : 3
Join date : 2011-10-29

View user profile

Back to top Go down

Data Warehouse Purging

Post  sonia2010 on Mon Jan 09, 2012 6:17 am

Hi,
This is my first day in this forum and I hope from all of you to welcome me...! The Data Warehouse/Mart is expected to house historical data. Based on the duration for how long the end users want to perform the analysis, we keep the data for that long. In other words, we would have users want to compare the Stores Monthly sales and compare it with the monthly sales of for the last 3 years. Here we would have to keep at least 3 years of data so that end users can perform their analysis. Once we have reached a point where we have data for more than 3 years it would become redundant as there would be no analysis done on it and on top of it it would continue occupying space. We need to get rid of the data that is no more required, and it is achieved through the Data Purge process.Data Purging is the process of deleting data from the fact or aggregate tables which are no more required for any data analysis. Data purging is an on going process that keeps getting rid of data which is no longer required because the time frame for which the analysis to be done on it is not required anymore. Data is archived before it is purged so that the users could request for data beyond the required historical date that the they have requested for in the requirements. This is to cater for any future requirements only.

sonia2010

Posts : 5
Join date : 2012-01-09
Age : 27
Location : GERMANY

View user profile

Back to top Go down

Re: Data Warehouse Purging

Post  Jeff Smith on Mon Jan 09, 2012 10:27 am

I hate purging data.

Have you tried using page compression on the Tables and Non-clustered indexes? The page compression can reduce the disk space used by the tables by 1/2 and even more for the nonclustered indexes.

I haven't seen much of an impact on performance. The performance hit on CPU tends to be off set by the performance gain in IO. IO performance hasn't improved as much as CPU performance so anything you can do to shrink the size of the data getting passed to the CPU is good.

I would do the dimension tables first. Running page compression on big tables can be an issue. You might need to rebuild the tables using the table gained from compressing the dimension tables and smaller fact tables. If that's the case and the big fact tables aren't already partitioned, it's a good ooportunity to partition the table. SQL Server doesn't recommend partitioning tables of less than 150 million rows, but I think that's because you won't see a performance improvement from partitioning with smaller tables. However, partitioning tables can make removing old data a lot easier and it gives you the option of putting older data on slower, cheaper disk.

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Data Warehouse Purging

Post  sonia2010 on Tue Jan 10, 2012 7:05 am

Hi,
You will be challenged to learn about business and feeder system changes that will affect the DW/DSS systems You as the system developer would like to know of developments that will affect the DW/DSS systems in time to allow adequate time to assess what is impacted, make changes, test changes, etc. Of course this is no new concern to anyone doing systems maintenance. If you are responsible for a system being fed from, say, 10 sources, you may have much more exposure than you have with the typical transaction processing system. And though intelligent use of the data extraction, cleaning, and loading tools and the information catalogs can greatly ease the burden here, many changes will require a fair amount of effort. By the way, keeping informed and assessing the impact of technically driven changes to the feeder systems may be more difficult than keeping track of the business driven changes. If your IS organization has change control meetings, it is a major mistake for a DW/DSS developer not to attend those meetings regularly.

sonia2010

Posts : 5
Join date : 2012-01-09
Age : 27
Location : GERMANY

View user profile

Back to top Go down

Re: Data Warehouse Purging

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum