Dimension for booleans
4 posters
Page 1 of 1
Dimension for booleans
Hi All, just hoping for a sanity check here,
I'm looking at building a transactional fact table tracking messages sent from user to user. The source database table is actually already in this structure. It's like so:
table messages(src_user_id int, dest_user_id int, accepted boolean, rejected boolean);
Because of the two boolean columns, it's possible for a message to be in a state where it has been neither accepted nor rejected (false, false). It's also the case that only one can be true. That makes me think that this could be modeled with a single column 'accepted_boolean_key' referencing a dimension for boolean values. Such a dimension might look like:
table dim_booleans(boolean_key int, value varchar), and have tuples like:
0, Unknown (to conform to Pentaho Kettle standards)
1, False
2, True
3, Neither
The value of the accepted column then would initially be 3 and become 1 or 2. I would also use such a dimension in cases where we don't know the truth value of a proposition instead of having a NULL.
My question: is this insane? Do people do this?
Thanks!,
David
I'm looking at building a transactional fact table tracking messages sent from user to user. The source database table is actually already in this structure. It's like so:
table messages(src_user_id int, dest_user_id int, accepted boolean, rejected boolean);
Because of the two boolean columns, it's possible for a message to be in a state where it has been neither accepted nor rejected (false, false). It's also the case that only one can be true. That makes me think that this could be modeled with a single column 'accepted_boolean_key' referencing a dimension for boolean values. Such a dimension might look like:
table dim_booleans(boolean_key int, value varchar), and have tuples like:
0, Unknown (to conform to Pentaho Kettle standards)
1, False
2, True
3, Neither
The value of the accepted column then would initially be 3 and become 1 or 2. I would also use such a dimension in cases where we don't know the truth value of a proposition instead of having a NULL.
My question: is this insane? Do people do this?
Thanks!,
David
djantzen- Posts : 4
Join date : 2011-02-16
Re: Dimension for booleans
Yes, this looks like a good approach.. additionally you can have values like Not applicable, Unknown, Accepted, Rejected in the dimension table.
Alternatively, this can also be implemented in the form of accumulated snapshot fact. So you can have Initial Status Date, Initial Status value, Decision Date, Decision Value
Initial status would be "unknown" and the Decision value would be accepted or rejected. The advantage of this is that you can even trap when or how long it took to get rejected/accepted.
-Datta.
Alternatively, this can also be implemented in the form of accumulated snapshot fact. So you can have Initial Status Date, Initial Status value, Decision Date, Decision Value
Initial status would be "unknown" and the Decision value would be accepted or rejected. The advantage of this is that you can even trap when or how long it took to get rejected/accepted.
-Datta.
dattatraynale- Posts : 3
Join date : 2011-02-16
Re: Dimension for booleans
The status values should be in a dimension, and the statuses themselves should be in clearly identified columns, such as 'Accepted Flag', 'Rejected Flag' and so on. Wither these are true booleans or multi-state, that's your call.
As far as the dimension itself, treat it as a junk dimension. The natural key is the concatenation of all status values. The primary key should be a surrogate... don't try to build any intelligence in the key. If you want to pregenerate all possible combination of values, go ahead... there are not that many rows.
What I would not do is try to reduce everything to a single column where you are encoding bits to represent different states. I makes it difficult to use, particularly for the end user. BI Tools and SQL like columns that don't require bit level manipulation to interpret.
As far as the dimension itself, treat it as a junk dimension. The natural key is the concatenation of all status values. The primary key should be a surrogate... don't try to build any intelligence in the key. If you want to pregenerate all possible combination of values, go ahead... there are not that many rows.
What I would not do is try to reduce everything to a single column where you are encoding bits to represent different states. I makes it difficult to use, particularly for the end user. BI Tools and SQL like columns that don't require bit level manipulation to interpret.
Re: Dimension for booleans
Thanks for the response. If I follow, you're suggesting the tables should be:
with tuples:
- Code:
fact_message(src_user_key, dest_user_key, status_key)
dim_status(accepted boolean, rejected boolean)
with tuples:
- Code:
false, false
true, false
false, true
djantzen- Posts : 4
Join date : 2011-02-16
Re: Dimension for booleans
That's they way I would do it as well. Otherwise you are losing information.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Similar topics
» Replacing Aggregate dimension with Conformed base dimension at lowest level.
» How to handle a Type I or II dimension with a snowflaked customer sub dimension (kimball book page 337, 338)
» bridge table and junk dimension on customer dimension (bank/credit union)
» Tip #136 Adding a Mini-Dimension to a Bridge Table: What if you can't split the dimension?
» Using the Date Dimension for Summary Fact Tables or new specialized Month Dimension?
» How to handle a Type I or II dimension with a snowflaked customer sub dimension (kimball book page 337, 338)
» bridge table and junk dimension on customer dimension (bank/credit union)
» Tip #136 Adding a Mini-Dimension to a Bridge Table: What if you can't split the dimension?
» Using the Date Dimension for Summary Fact Tables or new specialized Month Dimension?
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum
|
|