how to "reduce the duplicated rows"

View previous topic View next topic Go down

how to "reduce the duplicated rows"

Post  jon on Wed Dec 08, 2010 1:36 am

i have adopted a database with a table, my_daily_snap_tbl, that keep a complete archive of daily transactional detail. For example, today, there are 100 records in the OLTP_table then these 100 rows are copied over and insert to my_daily_snap_tbl. suppose on next day, there are total of 105 rows, then, all 105 are copied over. if I do the following select:
select snap_date, count(snap_date)
from my_daily_snap_tbl
group by count(snap_date)

it will show:
2010-12-07 100
2010-12-08 105

if I do
select count(*)
from my_daily_snap_tbl

I will get 205 rows. (In reality, I've got 7 months worth of data in this my_daily_snap_tbl, which snapped every day, and there are 400 million rows.)

When I asked around to find out why my_daily_snap_tbl is capturing data from transaction liked that, one answer I've got is: the analyst wanted to have the ability to goto a specific day and see all the rows on any given day.

I did a quick comparison of rows, there are over 80% records haven't changed at all. Most of them are duplicated rows.

is there a way to model this table so that i only keep track of the changed, deleted, and new rows? if so, how to resemble the records on a given date? Feel free to elaborate how to handle it, with samples and code. I am a newbie.

Thanks.


jon

Posts : 11
Join date : 2010-05-10

View user profile

Back to top Go down

Re: how to "reduce the duplicated rows"

Post  ngalemmo on Wed Dec 08, 2010 2:31 am

There are two basic forms of fact tables that can be used to maintain a historical record of measures over time: a delta (transactional) fact table and an accumulating snapshot fact table.

A delta fact table stores activity as a series of transactions where each row is the net change. In other words, if a fact is updated, you store the difference between the old version and new version. The table would include the date of the change. By summing facts bound by that date, you can recreate the state of a fact at any point in time.

An accumulating snapshot stores a new image of the row with the addition of effective and expiration dates. When a fact is updated, the old version is expired and a new version is added with an appropriate effective date. Selecting rows bound by these dates would allow you to state the measures as of a particular date. While similar to your current table, this appoach stores far fewer rows as only new or updated rows are added to the table. Unchanged rows simply remain effective.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

thank you.

Post  jon on Wed Dec 08, 2010 11:38 pm

thanks for your explanation.

I've got only 4 words for them... it is amazingly good.

Thanks for your patient and detailed reply.

You guys are the best.

jon

Posts : 11
Join date : 2010-05-10

View user profile

Back to top Go down

Re: how to "reduce the duplicated rows"

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum