Swiss cheese and dimensional design

View previous topic View next topic Go down

Swiss cheese and dimensional design

Post  cridal on Tue Jun 02, 2009 6:39 pm

I have a potential patient dimension... Wait, this actually sounds like a gross overstatement... Let me explain...

Patient information in the source system resembles Swiss cheese, because any particular piece of the data is entirely voluntary.

So you have Bill without last name, Hernandez without first name and address, or Sparky, who happens to be a dog… All of this is mixed in with other, case related info in one table in the source system with CaseID as PK. It’s a particularly messy pile of s…tuff, that is inherently uncleanable.

Creating a dimension out of it does not make sense to me as that dimension will grow 1:1 with the fact (a natural key of the dimension seems to be the CaseID).
Leaving all the information in the fact table seems extremely ugly and inefficient. Is purging this info the only way out? What if there is a business case for running an occasional report on the quality of patient information gathering?

cridal

Posts : 9
Join date : 2009-03-27

View user profile

Back to top Go down

Re: Swiss cheese and dimensional design

Post  ngalemmo on Tue Jun 02, 2009 6:50 pm

You have to consider what the business case is to collect this data. If there is apparently no business case to collect indentifying information, what would be the business case to store it for analysis?
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Agreed, swiss cheese --> can not analyze

Post  lmorgan on Wed Jun 03, 2009 3:31 pm

If the data is swiss cheese as you say, how can you preface any business "question" using it? You could not even do a simple, Show me totals by patient last name. Sounds like you'll be doing analysis on dimensions other than patient.

lmorgan

Posts : 3
Join date : 2009-02-27

View user profile

Back to top Go down

Re: Swiss cheese and dimensional design

Post  Jeff Smith on Wed Jun 03, 2009 4:27 pm

cridal wrote:I have a potential patient dimension... Wait, this actually sounds like a gross overstatement... Let me explain...

Patient information in the source system resembles Swiss cheese, because any particular piece of the data is entirely voluntary.

So you have Bill without last name, Hernandez without first name and address, or Sparky, who happens to be a dog… All of this is mixed in with other, case related info in one table in the source system with CaseID as PK. It’s a particularly messy pile of s…tuff, that is inherently uncleanable.

Creating a dimension out of it does not make sense to me as that dimension will grow 1:1 with the fact (a natural key of the dimension seems to be the CaseID).
Leaving all the information in the fact table seems extremely ugly and inefficient. Is purging this info the only way out? What if there is a business case for running an occasional report on the quality of patient information gathering?

I personally don't like excluding data from the source if possible. After all, one man's garbage is another man's gold. If you want to include the crappy data, then 1 option is to put the crappy data into junk dimensions. If all of the unique combinations of the crappy data numbers the same of the rows in the fact table, then you could put the data into multiple junk dimensions, selecting different combinations of the crappy columns until you come up with junk dimensions that are manageable in size.

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Swiss cheese and dimensional design

Post  cridal on Wed Jun 03, 2009 4:46 pm

To Imorgan:

I guess the only queries I envision relate to quality of patients data capture. For example, what percentage of cases by operator is missing a patient’s zip code or city? The business reason for this might be to see which operator is lazy. If you can see a historical trend in percentages, then it might have to do with operator's performance on top of patient's unwillingness to share their info. Business case is clear -> fire that operator.

So hypothetically, a reason to come up with the design might be valid. The question is: if it were the case, how would you handle it in the design?

cridal

Posts : 9
Join date : 2009-03-27

View user profile

Back to top Go down

Re: Swiss cheese and dimensional design

Post  BoxesAndLines on Wed Jun 03, 2009 10:20 pm

Are you a provider or an insurance company? If you are a provider, how do you bill the patient if you don't know who they are? If you are an insurance company, you should have some sort of subscriber identifier. Your problem is a common occurrence when data is repurposed. You want the data to provide insights to operator performance, but the data or the system wasn't designed for that problem. Most insurance companies can't even tell you who a particular claim is for. All they know is that a subscriber has a policy and the claim should be paid. Whether the claim is for the primary policy holder or their spouse or their children is not important to the claims processing application. It is very important to the managed care folks who are data mining the claims data to try to identify successful and unsuccessful trends in health care.

If you want to report on the data quality of the incoming data, Ralph has written many articles on the audit dimension to track the data quality of source and fact data. This would be a first step to quantifying the data quality.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Swiss cheese and dimensional design

Post  cridal on Thu Jun 04, 2009 1:01 am

It's actually none of those things. I've read Kimball. If you read my question carefully, I'm posing a question about a particular data modeling decision and not advice about data quality profiling. The source system is what it is.

cridal

Posts : 9
Join date : 2009-03-27

View user profile

Back to top Go down

Re: Swiss cheese and dimensional design

Post  BoxesAndLines on Thu Jun 04, 2009 8:00 am

Stick them on the fact table as nullable.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Event fact

Post  lmorgan on Thu Jun 04, 2009 11:14 am

Agree with Boxes on this. Sounds more like an event fact, with a bunch of nullable columns to allow missing data.

lmorgan

Posts : 3
Join date : 2009-02-27

View user profile

Back to top Go down

Re: Swiss cheese and dimensional design

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum