Dimension for booleans

View previous topic View next topic Go down

Dimension for booleans

Post  djantzen on Wed Feb 16, 2011 10:13 pm

Hi All, just hoping for a sanity check here,

I'm looking at building a transactional fact table tracking messages sent from user to user. The source database table is actually already in this structure. It's like so:

table messages(src_user_id int, dest_user_id int, accepted boolean, rejected boolean);

Because of the two boolean columns, it's possible for a message to be in a state where it has been neither accepted nor rejected (false, false). It's also the case that only one can be true. That makes me think that this could be modeled with a single column 'accepted_boolean_key' referencing a dimension for boolean values. Such a dimension might look like:

table dim_booleans(boolean_key int, value varchar), and have tuples like:

0, Unknown (to conform to Pentaho Kettle standards)
1, False
2, True
3, Neither

The value of the accepted column then would initially be 3 and become 1 or 2. I would also use such a dimension in cases where we don't know the truth value of a proposition instead of having a NULL.

My question: is this insane? Do people do this?

Thanks!,
David

djantzen

Posts : 4
Join date : 2011-02-16

View user profile

Back to top Go down

Re: Dimension for booleans

Post  dattatraynale on Wed Feb 16, 2011 11:14 pm

Yes, this looks like a good approach.. additionally you can have values like Not applicable, Unknown, Accepted, Rejected in the dimension table.

Alternatively, this can also be implemented in the form of accumulated snapshot fact. So you can have Initial Status Date, Initial Status value, Decision Date, Decision Value

Initial status would be "unknown" and the Decision value would be accepted or rejected. The advantage of this is that you can even trap when or how long it took to get rejected/accepted.

-Datta.

dattatraynale

Posts : 3
Join date : 2011-02-16

View user profile

Back to top Go down

Re: Dimension for booleans

Post  ngalemmo on Thu Feb 17, 2011 12:48 am

The status values should be in a dimension, and the statuses themselves should be in clearly identified columns, such as 'Accepted Flag', 'Rejected Flag' and so on. Wither these are true booleans or multi-state, that's your call.

As far as the dimension itself, treat it as a junk dimension. The natural key is the concatenation of all status values. The primary key should be a surrogate... don't try to build any intelligence in the key. If you want to pregenerate all possible combination of values, go ahead... there are not that many rows.

What I would not do is try to reduce everything to a single column where you are encoding bits to represent different states. I makes it difficult to use, particularly for the end user. BI Tools and SQL like columns that don't require bit level manipulation to interpret.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Dimension for booleans

Post  djantzen on Thu Feb 17, 2011 6:33 pm

Thanks for the response. If I follow, you're suggesting the tables should be:
Code:

fact_message(src_user_key, dest_user_key, status_key)
dim_status(accepted boolean, rejected boolean)

with tuples:
Code:

false, false
true, false
false, true

djantzen

Posts : 4
Join date : 2011-02-16

View user profile

Back to top Go down

Re: Dimension for booleans

Post  BoxesAndLines on Fri Feb 18, 2011 9:57 am

That's they way I would do it as well. Otherwise you are losing information.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Dimension for booleans

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum