Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Too normalized? And question on aggregated fact

4 posters

Go down

Too normalized? And question on aggregated fact Empty Too normalized? And question on aggregated fact

Post  Booma Mon Mar 17, 2014 8:20 am

Hi,

Im designing a dimensional model for a datawarehouse for some kind of webshop.

I've got some attributes I wanna use, but I'm afraid it contains too much outriggers and is too much snowflaked. (i left some of attributes out for this example)

customer dimension
company_name
full_name
gender
email
registration_date : FK -> date dimension
birthday_date : FK -> date dimension
first_purchase_date : FK -> date dimension
last_purchase_date : FK -> date dimension

How many outriggers is acceptable?

Also, I'd like to add 'last_year_spendings' and maybe similair attributes as an aggregated fact. However I can't find much information on this topic. It's only half a page in the book 'The Complete Guide to Dimensional Modelling' by Kimball. How do these aggregated fact tables look?

At last I'd like to thanks this forum and the book. I've come far with my overall dimensional model thanks to these resources

Booma
Booma

Posts : 12
Join date : 2014-03-10

Back to top Go down

Too normalized? And question on aggregated fact Empty Re: Too normalized? And question on aggregated fact

Post  BoxesAndLines Mon Mar 17, 2014 8:26 am

For the date dimension, don't model a FK from the customer dimension. You can always join to the date dimension with a date column. Aggregate fact tables are just sums and group by's of a lower grain fact. Which columns you use or drop is driven by reporting. That is, I want my aggregate fact to handle the vast majority of queries while my detail grained fact should cover everything else.
BoxesAndLines
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

Back to top Go down

Too normalized? And question on aggregated fact Empty Re: Too normalized? And question on aggregated fact

Post  Booma Mon Mar 17, 2014 8:34 am

BoxesAndLines wrote:For the date dimension, don't model a FK from the customer dimension.  You can always join to the date dimension with a date column.  Aggregate fact tables are just sums and group by's of a lower grain fact.  Which columns you use or drop is driven by reporting.  That is, I want my aggregate fact to handle the vast majority of queries while my detail grained fact should cover everything else.
Thank you for your fast reply.

I dont really understand your comment on the customer dimension. You mean just put a Date field for registration_date, birthday_date etc? And if I need to group by month, for example, just join the tables on the DATE field? Would that be much slower than by ID?
Booma
Booma

Posts : 12
Join date : 2014-03-10

Back to top Go down

Too normalized? And question on aggregated fact Empty Re: Too normalized? And question on aggregated fact

Post  BoxesAndLines Mon Mar 17, 2014 9:01 am

You should have an index on the date column in the date dimension to address the joins. I typically add date dimension FK's to the fact table. If I have other dates that end up in dimensions, I leave them as date knowing that I can always join to the date dimension if needed.
BoxesAndLines
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

Back to top Go down

Too normalized? And question on aggregated fact Empty Re: Too normalized? And question on aggregated fact

Post  Booma Wed Mar 19, 2014 3:46 am

I still don't really understand aggregated facts as a dimension attribute.

Here is a snippet of my dimensional model:

I have the standard Customer dimension
Customer dimension
gender
full_name
...other attributes...
spent_fact_id (FK to Spent aggregated fact)

Spent aggregated fact
id
spent_last_year (decimal that indicates the amount of dollars/euros)
average_lifetime_spending

I don't see how that is a fact table, since it contains no dimensions.
Let's say I want a report with the average spending of Male customers, which spent at least $1000 last year. I don't really how you would query that with this model.
How do I record historical spendings? How much did they spent last year, 2 years ago, 5 years ago. The last 5 years in total?
Booma
Booma

Posts : 12
Join date : 2014-03-10

Back to top Go down

Too normalized? And question on aggregated fact Empty Re: Too normalized? And question on aggregated fact

Post  ngalemmo Wed Mar 19, 2014 9:09 am

Customer attributes are dimensional values, last year's spend, for example, is a classification of the customer, not a fact, so it belongs in a dimension table.

In retail it is not unusual to have multiple customer related dimension tables. Each table would carry the same primary key value and one or more tables could be used in a query depending on what is needed. The reason for this is a lot of attributes are not static and often fully recalculated. Maintaining separate tables helps this process as often a table is replaced rather than updated.

As far as the date attributes go, it is probably not a good idea to snowflake to the date dimension, instead include the desired date attributes in the table itself. Customer is going to be your largest dimension, snowflaking such a dimension can significant impact performance.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Too normalized? And question on aggregated fact Empty Re: Too normalized? And question on aggregated fact

Post  Booma Wed Mar 19, 2014 10:14 am

You're right! Thanks, I got it now.


This question has few to do with the other questions I asked.
What is the preferred way to determine age of a product/customer in a dimensional model? For example: What is the average age of all customers?

I thought of a dimension 'TimeDelta'

attribute nameexample values
id1
years1
months17
days523

The day_amount could also be a measure for a fact. But with both solutions, it means that the fact table which contains this dimension needs to be updated daily.
Booma
Booma

Posts : 12
Join date : 2014-03-10

Back to top Go down

Too normalized? And question on aggregated fact Empty Re: Too normalized? And question on aggregated fact

Post  ngalemmo Wed Mar 19, 2014 11:28 am

There are two ways to deal with this, depending on how the question is phrased. Do you want the current age of people who have purchased things or do you want the age of the purchaser?

The former is simply an age calculation off the birthdate attribute of the customer, the latter is the difference between the birthdate and the time of the transaction. You don't need additional structures.

ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Too normalized? And question on aggregated fact Empty Too normalized? And question on aggregated fact

Post  zoom Fri Mar 21, 2014 2:47 pm

Booma wrote:I still don't really understand aggregated facts as a dimension attribute.

Here is a snippet of my dimensional model:

I have the standard Customer dimension
Customer dimension
gender
full_name
...other attributes...
spent_fact_id (FK to Spent aggregated fact)

Spent aggregated fact
id
spent_last_year (decimal that indicates the amount of dollars/euros)
average_lifetime_spending

I don't see how that is a fact table, since it contains no dimensions.  
Let's say I want a report with the average spending of Male customers, which spent at least $1000 last year. I don't really how you would query that with this model.
How do I record historical spendings? How much did they spent last year, 2 years ago, 5 years ago. The last 5 years in total?


Take out first_purchase_date and last_purchase_date from customer dimension and keep one purchase date in the fact table. So your Fact table has all the transactions date and amount in it for all customers. Like here:

Customer Purchase date Amount
Adam 1/1/2012 100
Adam 11/1/2012 200
Adam 5/12/2013 10
Adam 1/1/2014 20
Adam 3/1/2014 20

Now if you want to know how much Adam made in 2012 uses SQL between function. Sum purchase amount where purchase date between 1/1/2012 and 12/31/2012 group by customer is $300 for Adam

You can adjust sql to run for 2 years or 5 years . I hope you get the idea now.

zoom

Posts : 97
Join date : 2010-08-23
Location : Chicago

Back to top Go down

Too normalized? And question on aggregated fact Empty Re: Too normalized? And question on aggregated fact

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum