How to load a dimension when the source data reuses IDs with different descriptions?

View previous topic View next topic Go down

How to load a dimension when the source data reuses IDs with different descriptions?

Post  pzajkowski on Fri Jun 17, 2011 11:26 am

I presently work with healthcare data, and one of our primary data sources reuses service provider IDs with completely different descriptions whenever the provider is out-of-network. (Providers that are "in-network" have unique Provider IDs, fortunately.)

Examples:
ProvID ProvIDDesc
ITS0001 ALBERT EINSTEIN MEDICAL CENTER
ITS0001 ALLEGHENY GENERAL HOSPITAL
ITS0001 ARIA HEALTH
ITS0001 ARROYO GRANDE COMMUNITY H
ITS0001 BAPTIST HOSPITAL WEST
ITS0001 BAPTIST MED CTR, BC
... etc

Since the providers listed above are always out-of-network, I populate Dim_Providers with a single row for ITS001 with a generic description of "ITS Out-Of-Network". There are other ITS-prefixed ProvIDs with the same problem; these are also collapsed to a single row in Dim_Providers with the same generic description: e.g., ITS002, ITS003, ITS004.... all get a description of "ITS Out-Of-Network". On the upside, all of these out-of-network providers are geographically way far away from our network, so we essentially ignore all of these types of medical claims.

While my approach eliminates the complexity of the situation, I'm nonetheless quite frustrated by the mere fact that a source ID can mean completely different providers. I feel my approach is in "violation" of Kimball dimensional modeling.

How would the rest of you handle this kind of data to create a Kimball compliant Dim_Providers dimension?


pzajkowski

Posts : 31
Join date : 2009-08-10

View user profile

Back to top Go down

Re: How to load a dimension when the source data reuses IDs with different descriptions?

Post  ngalemmo on Sat Jun 18, 2011 8:21 am

If you want to retain the name, it basically boils down to modifing the process of assigning surrogate keys. The dimension table itself doesn't need to change.

For the population of 'ITS' id's, use a look-up process that includes name as part of the natural key.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: How to load a dimension when the source data reuses IDs with different descriptions?

Post  pzajkowski on Mon Jun 20, 2011 8:51 am

ngalemmo wrote:

For the population of 'ITS' id's, use a look-up process that includes name as part of the natural key.

Now, why didn't I think of that? Simple and effective. (I think I get stuck in the mindset of merely using source IDs for the look-up process, instead of considering additional fields.)

I have a followup concern, however. Since our data sources provide a total mixed bag of spellings for a given service provider, I presently run a process equivalent to maintaining a master person index so that a consistent provider name can be stamped on all relevant rows, especially since not all of our sources supply a National Provider Index (NPI). For example, at least dozen different source IDs and spellings exist for "Rochester General Hospital": "RG hosp", "Roc Gen H", "Hosp, Rochester G".... etc. Without stamping a single matching provider description of "Rochester General Hospital" in Dim_Providers, I'd be writing a lot of messy sql code to capture the various versions of this single provider. It seems to me that this is a necessary evil, unless you can suggest otherwise?


pzajkowski

Posts : 31
Join date : 2009-08-10

View user profile

Back to top Go down

Re: How to load a dimension when the source data reuses IDs with different descriptions?

Post  ngalemmo on Mon Jun 20, 2011 10:24 am

That gets messy no matter what you do...

One solution would be to have two name fields on the dimension, a raw name and a clean name. For new rows populate both columns with the raw name (which serves as part of the natural key) and allow users to update the clean name value with corrected spellings. Always report using the clean name column. Names get fixed over time and it doesn't mess up reporting queries. By keeping the raw name, future references with the same spelling would be associated with the same dimension row and corrected (clean) spelling.

For normal providers you can leave the raw name blank or populate it, it doesn't matter.

avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: How to load a dimension when the source data reuses IDs with different descriptions?

Post  pzajkowski on Mon Jun 20, 2011 10:32 am

OK - Your suggestions and explanation is in-line with what I've been considering.

As always, thanks for the feedback and assistance.
-- Pete

pzajkowski

Posts : 31
Join date : 2009-08-10

View user profile

Back to top Go down

Re: How to load a dimension when the source data reuses IDs with different descriptions?

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum