One-to-one dimension to fact
2 posters
Page 1 of 1
One-to-one dimension to fact
I have inherited a design with a very wide fact table. It includes many timestamps, long varchar descriptions, booleans, as well as legitimate FKs to dimensions and numerous measures. There are also numerous dates, but I have a DATE_DIM, and consider the dates as legitimate FKs to to that dim. I have considered a dim table for booleans and then one dim table that has all the varchars and timestamps as attributes. However, that new dim would be one-to-one with the fact. And the varshars would not work well in a junk dim, as they are free-form descriptions (not to search by, but to display on BI reports). Is the one-to-one dim the way to go or am I better leaving these timestamps and varchars in the fact as degenerate dims? This is Oracle. The current fact has 200 columns and 100,000,000 rows.
wjordan- Posts : 6
Join date : 2010-05-05
Re: One-to-one dimension to fact
You have your work cut out for you.
You need to break things up a bit, and you may want to consider moving the text to a junk dimension. Even though the text is free-form, people are creatures of habit and, depending on the nature of the text, there may be more commonality than you may think.
When I have implemented a junk text dimension in the past, I build the table with a surrogate primary key, a hash number as a non-unique alternate key and the text field. I have two indexes, one on the PK and the other on the hash number ( a pseudo natural key). The range of the hash number should be a large as resonably possible, a 32 bit CRC hash of the text is best (the Oracle hash function will let you use 2^30 -1 as the upper limit). The hash will not be unique, but it avoids having to index the text and is close enough so that doing a lookup to find an existing text is efficient.
You need to break things up a bit, and you may want to consider moving the text to a junk dimension. Even though the text is free-form, people are creatures of habit and, depending on the nature of the text, there may be more commonality than you may think.
When I have implemented a junk text dimension in the past, I build the table with a surrogate primary key, a hash number as a non-unique alternate key and the text field. I have two indexes, one on the PK and the other on the hash number ( a pseudo natural key). The range of the hash number should be a large as resonably possible, a 32 bit CRC hash of the text is best (the Oracle hash function will let you use 2^30 -1 as the upper limit). The hash will not be unique, but it avoids having to index the text and is close enough so that doing a lookup to find an existing text is efficient.
Similar topics
» Dimension Design with intermediate tables between fact and dimension
» Using the Date Dimension for Summary Fact Tables or new specialized Month Dimension?
» Modeling an Employee Dimension to a Fact which has two columns relating to the Dimension
» Conformed Dimension for Transaction Fact and Accumulating Snapshot Fact Table
» Using the Date Dimension for Summary Fact Tables or new specialized Month Dimension?
» Modeling an Employee Dimension to a Fact which has two columns relating to the Dimension
» Conformed Dimension for Transaction Fact and Accumulating Snapshot Fact Table
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum
|
|