Dimension for booleans

djantzen Wed Feb 16, 2011 10:13 pm

Hi All, just hoping for a sanity check here,

I'm looking at building a transactional fact table tracking messages sent from user to user. The source database table is actually already in this structure. It's like so:

table messages(src_user_id int, dest_user_id int, accepted boolean, rejected boolean);

Because of the two boolean columns, it's possible for a message to be in a state where it has been neither accepted nor rejected (false, false). It's also the case that only one can be true. That makes me think that this could be modeled with a single column 'accepted_boolean_key' referencing a dimension for boolean values. Such a dimension might look like:

table dim_booleans(boolean_key int, value varchar), and have tuples like:

0, Unknown (to conform to Pentaho Kettle standards)
1, False
2, True
3, Neither

The value of the accepted column then would initially be 3 and become 1 or 2. I would also use such a dimension in cases where we don't know the truth value of a proposition instead of having a NULL.

My question: is this insane? Do people do this?

Thanks!,
David

dattatraynale Wed Feb 16, 2011 11:14 pm

Yes, this looks like a good approach.. additionally you can have values like Not applicable, Unknown, Accepted, Rejected in the dimension table.

Alternatively, this can also be implemented in the form of accumulated snapshot fact. So you can have Initial Status Date, Initial Status value, Decision Date, Decision Value

Initial status would be "unknown" and the Decision value would be accepted or rejected. The advantage of this is that you can even trap when or how long it took to get rejected/accepted.

-Datta.

ngalemmo Thu Feb 17, 2011 12:48 am

The status values should be in a dimension, and the statuses themselves should be in clearly identified columns, such as 'Accepted Flag', 'Rejected Flag' and so on. Wither these are true booleans or multi-state, that's your call.

As far as the dimension itself, treat it as a junk dimension. The natural key is the concatenation of all status values. The primary key should be a surrogate... don't try to build any intelligence in the key. If you want to pregenerate all possible combination of values, go ahead... there are not that many rows.

What I would not do is try to reduce everything to a single column where you are encoding bits to represent different states. I makes it difficult to use, particularly for the end user. BI Tools and SQL like columns that don't require bit level manipulation to interpret.

djantzen Thu Feb 17, 2011 6:33 pm

Thanks for the response. If I follow, you're suggesting the tables should be:

Code:: fact_message(src_user_key, dest_user_key, status_key) dim_status(accepted boolean, rejected boolean)

with tuples:

Code:: false, false true, false false, true

BoxesAndLines Fri Feb 18, 2011 9:57 am

That's they way I would do it as well. Otherwise you are losing information.

Sponsored content

Dimension for booleans

Dimension for booleans

Re: Dimension for booleans

Re: Dimension for booleans

Re: Dimension for booleans

Re: Dimension for booleans

Re: Dimension for booleans