Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Rolling Month Measures Issue

3 posters

Go down

Rolling Month Measures Issue Empty Rolling Month Measures Issue

Post  larry_lan Wed Jun 06, 2012 9:15 am

Hi:

I met a difficult requirement about rolling month measures. It's different from regular rolling time reports, hope you can read through this post.

Dimensions and Facts have many fields, to simplify the problem, I use the following example. Suppose there are two tables, EMP and FACT. where EMP has three fields, (id, month, status); and FACT also has three fields, (id, month, amount).

Generally speaking, for dimensional table, we should use surragate key. However, the status for EMPs changes so fast that we think better to use monthly snapshot. So the EMP table may looks like

id, month, status
1, 201201, A
2, 201201, B
1, 201202, B
2, 201202, C
1, 201203, C
2, 201203, D


And in FACT table, we recorded EMP's sales amount every month, such as:

id, month, amount
1, 201201, 10
2, 201201, 20
1, 201202, 30
2, 201202, 40
1, 201203, 50
2, 201203, 60

The requirement is, user specify a month and a rolling number, n. then the report should output n months' data, but the STATUS field for EMP table, should use the LATEST month. For example, base on the sample data, if user specify '201203' for month, and '3' for rolling month. then data for the report should looks like

id, status, month, amount
1, C, 201203, 10
2, D, 201203, 20
1, C, 201203, 30
2, D, 201203, 40
1, C, 201203, 50
2, D, 201203, 60

And the SQL we use is:

select a.id, a.status, a.month, b.amount
from EMP a inner join FACT b ON a.id = b.id
where b.month between 201203 - 3 + 1 and 201203 -- 3 months
and a.month = 201203 -- the latest month

No, so fa so good. However, the most difficult part is performance issue, EMP table has 1 million rows per month, while Fact table has 10 million rows per month, it took me half hour to run such sql if I join these two table.

Now, I wonder if there is a better way to model this requirement? Such as put all information in Fact table and query without join? then I can speed up the query time.

Thanks
Larry

larry_lan

Posts : 5
Join date : 2011-11-07

Back to top Go down

Rolling Month Measures Issue Empty Re: Rolling Month Measures Issue

Post  hang Wed Jun 06, 2012 6:24 pm

Firstly you query would return far more records than you expected. I guess you may want to query as follows:

select a.id, a.status, a.month, sum(b.amount)
from EMP a inner join FACT b ON a.id = b.id
where b.month between 201203 - 3 + 1 and 201203 -- 3 months
and a.month = 201203 -- the latest month
group by a.id, a.status, a.month

If month is the period grain for your report, you may pre-aggregate your monthly snapshot by id and month without status, using similar query.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

Back to top Go down

Rolling Month Measures Issue Empty Re: Rolling Month Measures Issue

Post  larry_lan Wed Jun 06, 2012 7:04 pm

Hi Hang:

Thanks for your suggestion. We have many properties in dimensions, and some of then need sum, while some of them is count distinct. There are millions of combination. We tried pre-aggregate and it took days to calculate.

Thanks
Larry

larry_lan

Posts : 5
Join date : 2011-11-07

Back to top Go down

Rolling Month Measures Issue Empty Re: Rolling Month Measures Issue

Post  hang Wed Jun 06, 2012 7:07 pm

Sounds like a good case of leveraging OLAP cube, SSAS for instance.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

Back to top Go down

Rolling Month Measures Issue Empty Re: Rolling Month Measures Issue

Post  ngalemmo Wed Jun 06, 2012 9:11 pm

Try:

select a.id, a.status, a.month, b.amount
from (SELECT id, max(status) as status from EMP where month = 201203 group by id) a inner join FACT b ON a.id = b.id
where b.month between 201203 - 3 + 1 and 201203
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Rolling Month Measures Issue Empty Re: Rolling Month Measures Issue

Post  larry_lan Thu Jun 07, 2012 4:24 am

Thanks ngalemmo:

The subquery do reduce the joining records and query becomes faster. Thanks a lot. I also wonder if there is any better way to model such requirement in star schema? Since we will put this model in report tools like BO or BIEE, and with these tools, queries sometimes are not so flexible.

Thanks
Larry

larry_lan

Posts : 5
Join date : 2011-11-07

Back to top Go down

Rolling Month Measures Issue Empty Re: Rolling Month Measures Issue

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum