Detalization of fact table

View previous topic View next topic Go down

Detalization of fact table

Post  Aleksandr on Mon Nov 23, 2015 11:01 am

Hi,

There is a scenario. We have a fact table containing Vacation Request duration with the following attributes:
RequestKey (surrogate key)
RequestType
Employee
Total Duration (measure to analyze)
Start_Date
End_Date

This fact table keeps total vac duration. But business users would like to analyze vacation by months. If a vacation started on January and finished on February we should keep duration for each month.

Which scenario looks better?
1. Create a separate fact table with the following columns and configure reference to existing table by RequestKey?
- RequestKey
- Vacation Month
- Month Duration

2. Create two additional columns: Vacation Month and Duration in existing table and keep both duration?

Any thoughts would be very helpful! Thanks in advance!


Last edited by Aleksandr on Mon Nov 23, 2015 12:36 pm; edited 2 times in total

Aleksandr

Posts : 10
Join date : 2015-11-11

View user profile

Back to top Go down

RE: Detalization of fact table

Post  zoom on Mon Nov 23, 2015 11:44 am

If you add Vacation start date and vacation end date in the fact table, then it is going to work for monthly analysis. You don't have to store "Duration", since it can be calculated using those dates. Sometimes a customer checkout early or extend their visit, so this is another reason not to store "Duration" in the fact table.

zoom

Posts : 97
Join date : 2010-08-23
Location : Chicago

View user profile

Back to top Go down

Re: Detalization of fact table

Post  Aleksandr on Mon Nov 23, 2015 12:34 pm

hi zoom,

Thanks for your reply. Now we are planning to load data to Vacation Request fact table from scratch every day, because count of rows is not great. So we can calculate duration by month in ETL process. But I doubt, is it a good idea mix Month Duration and Total duration in the same fact table? Total duration is calculated alreday in source and we could just extract it rather than calculate sum by months.

Aleksandr

Posts : 10
Join date : 2015-11-11

View user profile

Back to top Go down

Re: Detalization of fact table

Post  zoom on Mon Nov 23, 2015 2:36 pm

What is you grain on the fact table? If you are loading daily transactions, then "monthly" or "Total" should not be kept in the same fact table. Using that daily transaction fact table, you can either create a monthly snapshot table or your reporting team can sum totals in the report.

zoom

Posts : 97
Join date : 2010-08-23
Location : Chicago

View user profile

Back to top Go down

Re: Detalization of fact table

Post  Aleksandr on Mon Nov 23, 2015 3:33 pm

Let's look at business process in some details. An employee uses a web-application to request a vacation from October 30, 2015 to November 02, 2015. This application calculates duration (in hours) of vac and writes a row into underlying DB.

Our ETL process extract this vacation info from appication and load data to fact table. For our example it looks like:

RequestKey Employee    Start_Dt     End_Dt         Duration
1000          100            20151030   20151102      24

But business users want to analyze duration of vacation by month and see Total too: October, 2015 - 8 hours, November - 16 hours, Total - 24. So, we should modify our fact table.

Is good idea to store data in such form:
RequestKey Employee    Start_Dt     End_Dt         Month       Month Duration    Total Duration
1000          100            20151030   20151102      201510      8                       24
1000          100            20151030   20151102      201511      16                      24

Or Total Duration should be removed?

Thanks.

Aleksandr

Posts : 10
Join date : 2015-11-11

View user profile

Back to top Go down

Re: Detalization of fact table

Post  ngalemmo on Mon Nov 23, 2015 3:34 pm

To zoom's point, what if the users later decide to do a weekly report?

Given your volumes are not large, a daily grain would allow you to handle anything.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Detalization of fact table

Post  Aleksandr on Mon Nov 23, 2015 4:04 pm

Hi, ngalemmo

Keeping such data in daily grain will increase the number of rows is about 30 times - it's significantly. If store the data in monthly grain, is bad practice to mix Month and Total duration in the same table?

Thank you!

Aleksandr

Posts : 10
Join date : 2015-11-11

View user profile

Back to top Go down

Re: Detalization of fact table

Post  Aleksandr on Tue Nov 24, 2015 1:30 am

What about to keep two tables: the first (described on top the topic, let's name it Main table) stores total duration, the second (called VacationByMonth) stores detailed data? I've two suggestions regarding the second table structure:

1. With columns RequestKey, Month, Month Duration Duration. Configure foreign key to reference to the first table by RequestKey columns. But if I understand correctly Kimball's The Data Warehouse Toolkit doesn't recommend such method.

2. With columns Start_Dt, End_Dt, Employee, RequestType, Month, Month Duration.

At that moment we have a couple of reports. And only one of them needs duration by Months, so it will use VacationByMonth table. Other reports consume Total Duration and they will use Main table.

Is the second structure of VacationByMonth acceptable?

Thanks.

Aleksandr

Posts : 10
Join date : 2015-11-11

View user profile

Back to top Go down

Re: Detalization of fact table

Post  ron.dunn on Tue Nov 24, 2015 6:21 am

How many employees do you have? Unless you're the world's biggest company, you don't have enough data to worry about the size of your fact table.

If it were my choice, I'd store each day of leave as a separate transaction. Something like:

fact_leave (employee_id,date_id,leave_type_id,hours_taken)

I'd use a type-2 slowly changing dimension for Employee, including their current department.

My Date dimension would include year, month, day, etc. to cater for all the roll-up scenarios you might want.

Further, I could now handle scenarios I don't think you've thought of yet, including:
- Employee who takes 1/2 day leave to attend a private appointment
- Employee who takes 10 days leave, but has two days credited back because they were called into work for an emergency
- Employee who took a 5 day break which included both personal leave and bereavement leave

Considering data volume, how many employees do you have? Let's go wild, and say 20,000. How many days leave do they get each year? Let's be a little generous, and say 20 days. That means your table might contain 400,000 rows, plus a small percentage of rows for people who have multiple entries for the one day. That should only take a few seconds to load on any contemporary DBMS.

What problems do you think might not be solved by this approach?


ron.dunn

Posts : 55
Join date : 2015-01-06
Location : Australia

View user profile http://ajilius.com

Back to top Go down

Re: Detalization of fact table

Post  Aleksandr on Tue Nov 24, 2015 7:04 am

Hi ron.dunn,

Thanks for you reply. We have about 15 000 employees. Each employee takes about 25 vacation days (regular, ill, etc.). So, table might contain 400 000 rows per year -you guessed:). I think we will not be faced with the performance problem. But the whole process will look a little weird:

1. We extract from data source already calculated Total duration.
2. Split total duration into set of daily values.
3. Aggregate daily values to get total duration for report.

I agree with you, such method handles a lot of scenarios, but new scenarios are not expected. What about keeping two tables? Why it looks bad?

Thank you!




Aleksandr

Posts : 10
Join date : 2015-11-11

View user profile

Back to top Go down

Re: Detalization of fact table

Post  nick_white on Tue Nov 24, 2015 8:27 am

A well designed data warehouse should be able to support any report that a user can come up with, for a particular topic and, where it doesn't extending your model to support new queries should not break existing queries. It does this by holding the lowest possible level of detail available - as from this it should be possible to fulfill any query; if it can't then it means the data is not actually available, which is a different issue. From this granular fact data it is then possible to build up additional, aggregate, fact tables if there are performance issues with querying the granular data (either due to the volume of the data or the complexity of the query).

What you are trying to do is design a fact table to support just a specific report. While superficially this may seem like an attractive idea, it is normally a very bad idea - you end up with a fact table per report, every new reporting requirement results in the creation of a new fact table and you end up with a mess of of a DW that is basically unsupportable.
You say that there will be no new requirements - and you may be right and therefore your design will work - but in 99% of DWs implemented you'll find that users will decide they want to report on the data in a slightly different way. With your design approach, you'll either have to redesign the fact table(s) - which may well break your existing reports and require them to be rebuilt as well - or you create a new, slightly different fact table which means you have duplicated data, more complex ETL and a design your users will never understand: if they want to create a new report how are they meant to know which of the many Vacation data tables are they meant to query?

nick_white

Posts : 366
Join date : 2014-01-06
Location : London

View user profile

Back to top Go down

Re: Detalization of fact table

Post  Aleksandr on Wed Nov 25, 2015 3:30 pm

Hi,

We've discussed this issue and decided to store vacation data in daily grain. You are really right: such method handles all scenarios. Thanks you all for advice, you patiently explained to me how to solve my question. Your advice is really valuable

Aleksandr

Posts : 10
Join date : 2015-11-11

View user profile

Back to top Go down

Re: Detalization of fact table

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum