How long should -1 dummy records exist in fact tables?

View previous topic View next topic Go down

How long should -1 dummy records exist in fact tables?

Post  Scoop on Tue Feb 10, 2009 10:57 am

Dear Kimball Group, we are using the Kimball BUS Architecture method with SQL Server 2005, SSIS, SSAS, ProClarity, PPS, and MOSS.

I am at a clients site that is using Full Refresh process for building 3 data marts so far every night. We are trying to get incremental going using replication and ODS but their parent company is not allowing us to go forward with replication on production due to them thinking this a huge strain and IO issue on the production OLTP servers. We are working on this and they want client to upgrade to 64 bit servers for all OLTP systems. Yet the I/O issues are on the SAN being SAN I/O issues.
Question?
Each Data Mart dimension is using a -1 dummy record. Every time they load a fact record they make sure each dimension exists and if it does not they add the -1 key to that fact record. They have been told that these -1 records should only exist if these fact records are late but will have corresponding dimension the next business day or week, etc. What we have been telling Viracon Inc is that all dimensions need to have 1 or many fact records and these dimensions must be related to the grain of each fact table and they cannot be
-1 dummy record if that dimension does not exist for that grain of that fact table and will never exist. We have a lot of -1 dummy records in each fact table that you can see in the attached sample reports and we are trying to find article or statements that describe or inform businesses to not have fact records with dummy records if they are never going to arrive and this could produce incorrect results.

Briefing books and Dashboards are picking up measures with out related dimensions.

We also have been telling business to narrow down data marts to business process driven marts not departmental. We also are working with clients to to create marts with the necessary fact tables and only relate the dimensions that are equal to the grain of each fact table.

May not have shared everything that is going on, but client seems to think having thousands of -1 dummy records is okay. Any input, advice, articles, comments, suggestions is greatly appreciated.

Thank you for your help and if this was not appropriate we are very sorry.

Scoop
ProjectKey CustomerKey MaterialKey DepartmentKey MachineKey ShiftKey DateKey SquareFeetLost MaterialCostLost
----------- ----------- ----------- ------------- ----------- ----------- ----------- --------------------------------------- ---------------------------------------
-1 -1 124485 10208 16842 1303 545809 0.00 0.00
-1 -1 124409 10208 16842 1303 545836 0.00 0.00
-1 -1 124485 10208 16842 1303 545808 0.00 0.00
-1 -1 124419 10208 16842 1303 545808 0.00 0.00
-1 -1 124483 10208 16842 1303 545808 0.00 0.00
-1 -1 124419 10208 16842 1303 545808 0.00 0.00
-1 -1 124409 10258 -1 1305 545792 552.60 273.54
-1 -1 124409 10258 -1 1306 545793 552.60 273.54
-1 -1 124409 10258 -1 1306 545795 368.40 182.36
-1 -1 124409 10258 -1 1306 545795 184.20 91.18
-1 -1 124409 10239 -1 1303 545798 184.20 91.18
-1 -1 124427 10258 -1 1305 545799 552.60 340.40
-1 -1 124459 10239 -1 1305 545808 184.20 151.04
-1 -1 124409 10212 -1 1305 545809 552.60 273.54
-1 -1 124414 10212 -1 1305 545809 552.60 320.51
-1 -1 124409 10258 -1 1305 545819 368.40 182.36
-1 -1 124409 10258 -1 1305 545819 368.40 182.36
-1 -1 124409 10212 -1 1305 545823 552.60 273.54
-1 -1 124409 10239 -1 1303 545828 736.80 364.72
-1 -1 124409 10258 -1 1306 545830 2947.20 1458.86
-1 -1 124409 10258 -1 1306 545830 921.00 455.90
-1 -1 124409 10258 -1 1306 545830 368.40 182.36
-1 -1 124412 10258 -1 1305 545832 552.60 326.03
-1 -1 124412 10258 -1 1305 545832 552.60 326.03
-1 -1 124480 10239 -1 1303 545835 736.80 785.43
-1 -1 124409 10212 -1 1306 545836 368.40 182.36
-1 -1 124480 10258 -1 1306 545837 921.00 981.79
-1 -1 124409 10212 -1 1304 545842 368.40 182.36
-1 -1 124352 10212 -1 1306 545842 921.00 511.16
-1 -1 124472 10212 -1 1306 545842 552.60 646.54
-1 -1 124409 10239 -1 1303 545847 368.40 182.36
-1 -1 124292 10208 16842 1303 545833 0.00 0.00
-1 -1 124409 10208 16842 1303 545808 0.00 0.00
-1 -1 124292 10208 16842 1303 545833 0.00 0.00
-1 -1 124409 10208 16842 1303 545808 0.00 0.00
-1 -1 124409 10208 16842 1303 545806 0.00 0.00
-1 -1 124498 10208 16842 1303 545854 0.00 0.00
-1 -1 124300 10208 16842 1303 545821 0.00 0.00
-1 -1 124413 10208 16842 1303 545806 0.00 0.00
-1 -1 124302 10208 16842 1303 545793 0.00 0.00
-1 -1 124302 10212 -1 1303 545796 12.00 5.27
-1 -1 124354 10208 16842 1303 545793 0.00 0.00
-1 -1 124412 10208 16842 1303 545808 0.00 0.00
-1 -1 124412 10208 16842 1303 545809 0.00 0.00
-1 -1 124498 10208 16842 1303 545792 0.00 0.00
-1 -1 124414 10208 16842 1303 545828 0.00 0.00
-1 -1 124431 10208 16842 1303 545806 0.00 0.00
-1 -1 124431 10208 16842 1303 545861 0.00 0.00
-1 -1 124427 10208 16842 1303 545802 0.00 0.00
-1 -1 124427 10208 16842 1303 545806 0.00 0.00

Scoop

Posts : 18
Join date : 2009-02-10

View user profile

Back to top Go down

Re: How long should -1 dummy records exist in fact tables?

Post  Edwin Kurian on Tue Feb 10, 2009 11:26 am

You are on the right track by informing business that marts should be created for a business process. Moreover bring in important fields, and then scale the mart to other fields. The beauty of Dimension modelling is that it is easily scalable.
I think the bigger issue here is to analyze why you are having many -1 (Default / NA / Unknown) fact records. Reasons for -1 records could be because of late arriving dimensions, fact source not clean (assuming the dimensions have been cleaned when loaded). Not populating the fact table because "dimension key not found" is not the best idea. Filtering records to the fact table should be a business logic and the decision of the business users.

Edwin
who believes that involvement of the business team is critical to the success of a BI project.

Edwin Kurian

Posts : 13
Join date : 2009-02-03
Location : Milwaukee

View user profile http://www.valordevelopment.com

Back to top Go down

How long should -1 dummy records exist in fact tables?

Post  Scoop on Wed Feb 11, 2009 2:23 pm

Edwin we thank you for your reply.

We do not want business to load data into fact table with dimension records that do not exist for every dimension. The IT folks we are mentoring do not think it is going to cause problems when we create reports we can see the dummy records and actual measures for these dummy. IT folks do not think anything is wrong with data. But, when you load data into fact table for each dimension and a lof of the surrogate keys end up being -1 that is a problem.

You said; Not populating the fact table because "dimension key not found" is not the best idea. Can you please give us any words of wisdom, documents, do's and dont's, why you should or should not.

Just trying to see what other folks would say or do??

We are just with some clients that really do not believe source data is incorrect and that it is okay to have records in facts that are not related to every dimension.

So, advice, help is greatly appreciated not looking for anyone to do our jobs for us just little advice or help.

Thank you,
Scoop

Scoop

Posts : 18
Join date : 2009-02-10

View user profile

Back to top Go down

Re: How long should -1 dummy records exist in fact tables?

Post  BrianJarrett on Wed Feb 11, 2009 3:08 pm

There are other ways to handle missing dimension records. For example; you can load a bogus dimension record for each fact record missing the dimension, created new and specifically for the fact record you're loading. This "on the fly" dimension record would get a new surrogate key just like any other record but will be clearly marked as a bogus dimension (pick a field and give it a standardized bogus flag). When you create the dimension you will populate it with the actual alternate key for the record missing the dimension.

If the problem is a late-arriving dimension then when the dimension does show up it'll update the bogus record, requiring no additional work for the ETL process. Since the record can be clearly defined as bogus it can be easily included on an exception report.

Since the new "on the fly" record doesn't contain a surrogate key of -1 you also have the benefit of it not breaking any unique constraints you have on your fact table's composite primary key.

Some records, I've found, are just fine loaded with generic -1. These are typically not included in the fact table's composite primary key or they're just not that important. It's a case by case basis, really.

Hope this helps.


Last edited by BrianJarrett on Thu Mar 05, 2009 3:25 pm; edited 1 time in total (Reason for editing : Spelling error)
avatar
BrianJarrett

Posts : 61
Join date : 2009-02-03
Age : 43
Location : St. Louis, MO

View user profile

Back to top Go down

Re: How long should -1 dummy records exist in fact tables?

Post  Edwin Kurian on Wed Feb 11, 2009 3:44 pm

Not populating the fact table because "dimension key not found" is not the best idea. The reason is that you would lose the fact record. At a later point if you would run a comparison report between the source system and the data warehouse at an aggregated level, the numbers might not match. You will have a hard time justifying the numbers.

Brian has suggested an excellent solution to missing dimension keys. I have implemented the bogus dimension record solution at a client. You may want to be careful of the load order if the dimension surrogate key is an identity field (i.e database generated), as the key will be assigned at the time of insert.

Another alternative that I have implemented is Fact records that do not find a dimension key are loaded to an Error table. Structure of the error table will be similar to the Fact table, with additional business keys. This way the Fact table will not contain any -1 dimension key record. For every load cycle, the error table is processed first (i.e lookup Dimension keys if they have arrived), and then the Fact table. You can then have audit reports based of the error table.


Edwin
who believes every challenge is unique, and the solution will have to be customized to the needs.

Edwin Kurian

Posts : 13
Join date : 2009-02-03
Location : Milwaukee

View user profile http://www.valordevelopment.com

Back to top Go down

DUMMY RECORDS

Post  sumitkumbhar on Thu Feb 12, 2009 2:45 am

Hi

In our datawarehouse, we do exactly what Brian described.
When a fact record shows up with a business key that does not yet exist in the dimension, we insert that business key in the dimension with default values for other columns (mostly 'unknown' string). The insert creates the record in the dimension and hence a surrogate key. Then while loading the fact record into the fact table we associate this surrogate key to it. Later (after few days) when we get the values for the other columns for this record and we update the dimension table with it.

Hope this helps.
Sumit

sumitkumbhar

Posts : 1
Join date : 2009-02-03

View user profile

Back to top Go down

Re: How long should -1 dummy records exist in fact tables?

Post  Scoop on Thu Feb 12, 2009 11:30 am

I thank all of you for your response.

We agree with what everyone is stating and we are just adding the dummy records to the fact tables. Because we build all dimensions first then build the facts with their natural keys and if the natural key is in our staging table but not in the related dimension we place a -1 dummy record in the fact table.

What we are trying to establish is that the Data Marts that are being created should not be creating any fact records with measures(facts) if they do not relate to each dimension in that given data mart and that these dimensions that are not tied to the fact table in the data mart should be created in another data mart for that particular business process or same data mart but only relate dimensions to the fact they are related and business process they are addressing to complete trending, decisions, and analysis with.

We just see briefing books being created and a lot of dimension records contain dummy values but have facts and we just do not see how this information is going to help the business if they do not know what these facts belong too.

So, we think these fact records that exist that only relate to 3 or 4 out of the 8 or 10 dimensions should be related to a separate fact table.

I know you do not have our Data mart in front of you, but we are refreshing data every night type 1 only and these facts that have dummy dimensions will never exist.

Maybe we are confusing everyone or we are confused. But, in my years of experience creating DM's and EDW's I have never experienced a firm who is demandind the data in source is clean not dirty and that dummy fact records are okay when I can go create results from data mart that have all unknown dimension records and many fact measures.

Maybe just venting, but usually we would do the dummy record process until those late arriving or early arriving facts or dimensions come into play, but these howerver do not.

Thank you for letting us vent and we will let this go until we can convince our Admins to turn on replication so, we can start with incremental and capturing history,etc.

Thank you,
Scoop

Scoop

Posts : 18
Join date : 2009-02-10

View user profile

Back to top Go down

-1 dummy dimensions and no facts

Post  Scoop on Tue Mar 03, 2009 2:11 pm

I need to ask another ignorant question and we may be second guessing ourselfs here but, we would like some opinions.

Our client thinks it is Okay to have a Data Mart with 12 dimensions and one fact table and to load -1 dummy record in every dimension before loading each dimension with source data and frefresh model every night knowing that 6 out of the 12 dimensions will never have a fact record so, loading -1 (1,000 of -1 dummy records into) fact table for each of those 5 dimensions is okay. They are not late arriving dimensions and they are not late arriving facts they are dimensions and facts that will never exist but they still want them loaded with -1.

So, if you select columns from 5 of these 12 dimensions in ProClarity and some measures you will get results for measures but -1 or unknown or item 0 for all dimensions.

So, basically facts are being loaded for our DimCustomer dimension with "unknown" customer name and we select all measures and we arrive with results for each measure even though we do not know who the customer is and will never know. This is a Data issue or a ETL load issue because the dimension is not late arriving.

I hope I am making sense and looking for response? What we are trying to say only load dimension data that will have a corresponding fact, a late arriving fact or a late arriving dimension and if this will not occur in the business then do not load -1 dummy surrogate key to fact table if that dimension will never exist.
Or, Dimensions are built first from source systems with natural keys then fact table is built with natural keys from each dimension if natural keys from fact stage tables correspond with natural key from stage dimension tables load record or add correct surrogate key to fact stage table and if not load that dimension and fact to error tables do not load this bad or non-existent data to the production fact table. So, no -1 dummy records should ever exist in fact table unless they are late arriving?????

Thank you for your responses and help very appreciative.

Scoop

Scoop

Posts : 18
Join date : 2009-02-10

View user profile

Back to top Go down

Re: How long should -1 dummy records exist in fact tables?

Post  tod mckenna on Wed Mar 04, 2009 8:10 am

You may also want to look at the grain of this fact table. If ProjectKey and CustomerKey are not relevant some or most of the time, then I would question their inclusion into this fact. I don't know your business, but I would think that having any significant amount of unknown dimension keys in any fact table is a cause of concern.

With that said, sometimes it is OK. I've handled early arriving facts in different ways:
(1) Using negative surrogate keys as you have described when I know I'll never get corresponding dimension attributes
(2) Using Brian Jarrett's approach of creating dimension rows that I *know* will come in later
(3) Leaving facts with unknown dimensions in the staging area for later processing (much like Edwin Kurian suggested with his "error" table) when I *know* that there *should* be a match

It seems that your group is trying to handle all of these cases using the "-1" approach and that might be the problem. Alternately, consider that having 2 factless fact tables that link customers and projects to your main fact table (although this might get a little messy, your situation might warrant it).

Hope this helps!


Last edited by tod mckenna on Wed Mar 04, 2009 8:23 am; edited 1 time in total
avatar
tod mckenna

Posts : 9
Join date : 2009-02-03

View user profile http://blog.todmeansfox.com

Back to top Go down

Re: How long should -1 dummy records exist in fact tables?

Post  tod mckenna on Wed Mar 04, 2009 8:22 am

Scoop wrote:
Our client thinks it is Okay to have a Data Mart with 12 dimensions and one fact table and to load -1 dummy record in every dimension before loading each dimension with source data and frefresh model every night knowing that 6 out of the 12 dimensions will never have a fact record so, loading -1 (1,000 of -1 dummy records into) fact table for each of those 5 dimensions is okay. They are not late arriving dimensions and they are not late arriving facts they are dimensions and facts that will never exist but they still want them loaded with -1.

My first thought is that this is not a good design at all. I would question what the grain of the fact table and then I would question the mechanics of the ETL environment.

However,

If the purpose of doing this is to mask customers (for compliance or security reasons), then this approach of loading a dimension with nothingness might make sense -- especially if you want to reuse certain reports and functions in the datawarehouse, where these dimensions ARE populated correctly. This approach might also make sense if certain facts simply do not ahve the dimensions in question. But honestly, I try to avoid these cases as they can confuse users and cause some reporting problems (see my previous response).

Scoop wrote:
I hope I am making sense and looking for response? What we are trying to say only load dimension data that will have a corresponding fact, a late arriving fact or a late arriving dimension and if this will not occur in the business then do not load -1 dummy surrogate key to fact table if that dimension will never exist.
Or, Dimensions are built first from source systems with natural keys then fact table is built with natural keys from each dimension if natural keys from fact stage tables correspond with natural key from stage dimension tables load record or add correct surrogate key to fact stage table and if not load that dimension and fact to error tables do not load this bad or non-existent data to the production fact table. So, no -1 dummy records should ever exist in fact table unless they are late arriving?????


The general rule: Load facts after your dimensions so that each fact has a full compliment of dimensions.
avatar
tod mckenna

Posts : 9
Join date : 2009-02-03

View user profile http://blog.todmeansfox.com

Back to top Go down

Re: How long should -1 dummy records exist in fact tables?

Post  Scoop on Thu Mar 05, 2009 1:36 pm

Todd we thank you for your replies and we have designed many DM's in past yet we have never run into a client that is demanding these -1 dummy and fact records exist. This causes so, many books and queries to go out to lunch and demands the business really know what they are dragging and dropping into reports.

Grain is the mentoring we are establishing with the IT folks here that business processes really have to get narrowed down to establish true grain of the facts.

We thank everyone for responses.

We like this site..

Scoop

Scoop

Posts : 18
Join date : 2009-02-10

View user profile

Back to top Go down

Re: How long should -1 dummy records exist in fact tables?

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum