All dimensions embedded in fact tables

View previous topic View next topic Go down

All dimensions embedded in fact tables

Post  rendybjunior on Thu Oct 23, 2014 4:54 am

Normally, when we design data warehouse we will have fact tables and dimension tables.

However, it does make sense to embed dimension in fact table. Especially for simple dimensions who has no other attributes and rarely change its value.

Having dimensions in fact table will makes query run very fast and no need to maintain dimension table separately, no need to look up dimension table when doing ETL.

What are considerations to keep dimensions in separate table from facts?

Thanks!

rendybjunior

Posts : 7
Join date : 2014-09-30

View user profile

Back to top Go down

Re: All dimensions embedded in fact tables

Post  BoxesAndLines on Thu Oct 23, 2014 7:50 am

What do you do when a dimension column changes value? Update a billion row table?
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: All dimensions embedded in fact tables

Post  hkandpal on Thu Oct 23, 2014 8:22 am

Hi

What database are you using, if you are using an RDBMS then the are created to join, you should not have any problem when joining tables.
What is the dimesnion which you want to store, if it is like an order number, invoice number then it is fine to store in fact.

thanks
Himanshu

hkandpal

Posts : 113
Join date : 2010-08-16

View user profile

Back to top Go down

Re: All dimensions embedded in fact tables

Post  nick_white on Thu Oct 23, 2014 9:41 am

If you only have a single attribute for your dimension you do embed it in your Fact - it's called a Degenerate Dimension

nick_white

Posts : 366
Join date : 2014-01-06
Location : London

View user profile

Back to top Go down

Re: All dimensions embedded in fact tables

Post  ngalemmo on Thu Oct 23, 2014 11:22 am

As B&L pointed out, its a real challenge to update something like that.

But even if you never update it, there are more practical reasons for a proper model. If you are using a typical row/column based DBMS, the table becomes huge. It is not clear that it would be any faster to query due to the row width. DBMS needs more physical reads because fewer rows fit in a block. Taken to extreme, a flattened table could be 15x or more larger than a proper fact table, consuming vast amounts of space and taking much longer to query.

In the case of vector (columnar) databases space isn't an issue and the performance impact is less, but you still have the update issue to contend with.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: All dimensions embedded in fact tables

Post  rendybjunior on Thu Oct 23, 2014 8:52 pm

Thanks for all your responses, happy to join this forum and finds you guys so helpful.

I'm adding more context. I have not decided DBMS to be used, hence I am open to use columnar, document oriented, or RDBMS.

I'll try to summarize my response:

1. Update billion rows when dimension change
@ngalemmo & @BoxesAndLine
What if I still embed the dimension to fact and:
- decided not to update old rows
- have a separate table that store information of previous dimensions so query can refer to that information if someone want to use historical and current value viewed as something equal
Will still be there any problem I will faced in the future?

2. Row column DBMS very wide row which affect performance
@ngalemmo
Assume I decided not to use row column DBMS, then this point will be invalid. I have not decided DBMS to be used, hence I am open to use columnar, document oriented, or RDBMS. Of course if I am using row column DBMS it will be a deal breaker. To continue the discussion forward, let's say I am not using row column DBMS.

3. Join or not join fact and dimension
@hkandapal
Yet if it is even designed to join, not joining any tables will improve performance. CMIIW.

4. Degenerate dimension
@nick_white
It is not something like invoice number, it is about the whole dimensions to be embedded.

In addition, I am trying to find another reason:
5. Let say it is important for user to know possible values of dimension. Separate dimension will give easiness to show all possible values for certain dimension.
However (with some effort) it can be created incrementally when doing ETL. CMIIW.
6. Let say will be helpful when other facts having well defined exactly same dimension to be drilled across.
However, if we maintain the definition somewhere else, it still can be accessed and drilled across.

I totally agree the concept of fact and dimension, no argue on that.
However I am still wondering whether I have to separate them or have them in the same table all at once...


Last edited by rendybjunior on Thu Oct 23, 2014 8:58 pm; edited 1 time in total (Reason for editing : typos)

rendybjunior

Posts : 7
Join date : 2014-09-30

View user profile

Back to top Go down

Re: All dimensions embedded in fact tables

Post  ngalemmo on Fri Oct 24, 2014 11:42 am

Frankly, it isn't worth discussing. It's a very old idea that is impractical in the long run. If you want to try it, go ahead.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: All dimensions embedded in fact tables

Post  hkandpal on Fri Oct 24, 2014 1:00 pm

Hi ,

you are correct not joining and storing in a single file will give you a better performance then any RDBMS, but file systesm has its own drawback like security, disaster recovery, scalability, multi user capability...


thanks

hkandpal

Posts : 113
Join date : 2010-08-16

View user profile

Back to top Go down

Re: All dimensions embedded in fact tables

Post  rendybjunior on Sun Oct 26, 2014 10:37 pm

@ngalemmo
Do you have reference to any discussions for the same subject in the past? Thanks

rendybjunior

Posts : 7
Join date : 2014-09-30

View user profile

Back to top Go down

Re: All dimensions embedded in fact tables

Post  nick_white on Mon Oct 27, 2014 3:37 am

Hi,
Dimensional modelling is the industry standard for data warehouse design and has been developed over the last few decades by some of the top minds in the business - most of whom have experience implementing this methodologies 100s of times across all industries.
If you think you can come up with something better then good luck to you but you're unlikely to get much help designing a new methodology from a Kimball design forum - as, almost by definition, people on this forum are followers of Kimball's methodology.

Regards,

nick_white

Posts : 366
Join date : 2014-01-06
Location : London

View user profile

Back to top Go down

Re: All dimensions embedded in fact tables

Post  BoxesAndLines on Mon Oct 27, 2014 9:17 am

Sure, it's called the Data warehouse Toolkit. I'd recommend buying a copy.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: All dimensions embedded in fact tables

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum