How to deal with non-durable NK with split/merge records

View previous topic View next topic Go down

How to deal with non-durable NK with split/merge records

Post  jwalker on Tue Oct 23, 2012 9:58 am

I'm relatively new to warehouse design and am running into an issue on how to address non-durable natural keys. The case I am dealing with is a feed from a data provider that is a list of physicians. The list contains various physician attributes as well as a person_id. A given person_id might change from month to month, which is a bit of a problem. The larger problem, however, is a person might split or join with other persons. Although this doesn't happen often, it's something that needs to be addressed. Here is an example:

"person_id split" scenario
monthperson_idname
June4838J Smith
July2382John Smith
July9928James Smith

"person_id merge" scenario
monthperson_idname
June1831Susy Smith
June1029S Smith
July7642Susan Smith

I am trivializing the data for the purposes of example. My vendor is supposed to be giving me migration information so that I can adjust my data as needed to bring it in line with their system. I will not have to perform any sort of fuzzy name matching or standardization and that is not really the issue at hand.

The issue becomes in how to deal with these changes when it comes to fact table population. There is a need to be able to answer both questions of "what was the view of the world at X point in time" as well as "what is the most current/updated view of the world". Because of the nature of the dimension change isn't simply "end date the old record; insert new record", I don't know of a good way to deal with this. Almost everything I've thought of to date requires a sort of type-2 fact table, and as far as I can tell, that's not generally a preferred option. Depending on the fact table I'm building, there would also be an issue of how to properly split measures if a person_id splits, but that's a different topic.

I might have to deal with it differently for different fact tables, I'm not sure. Here is a high level list of fact tables as I currently see them:

  1. Affiliations: Data provider is the source for this data. This is a factless fact table that's simply a join between physicians and organizations to indicate some sort of relationship between the two
  2. Splitter: Data provider is the source for this data. The core measure of this is "number of claims a physician had at a given organization". This is not the same set of relationships as in the Affiliations fact.
  3. Encounters: Our client (hospital systems) is the source for this encounter data. Measures will likely surround total charges and other financial data


If anyone has suggestions on how to deal with record merge/splits for dimensions with non durable keys, please let me know. I'm really struggling with finding a good way to represent this data.

jwalker

Posts : 2
Join date : 2012-10-23

View user profile

Back to top Go down

Re: How to deal with non-durable NK with split/merge records

Post  ngalemmo on Thu Oct 25, 2012 6:19 pm

What country are you in? I know in the US, and probably most other countries, physicians are licensed and they are assigned a unique, durable ID. If that is part of your data feed, that should be the natural key for the physician dimension. This might be something you need to discuss with your data provider if they do not supply it.

If I understand the gist of what the feed is today, the person id is problematic. Do they reuse ID's over time? If you can't get a license number, you may have to use month/person ID (if they reuse person ID) as the natural key and build a bridge to associate multiple dimension rows to the same person. It won't be easy.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: How to deal with non-durable NK with split/merge records

Post  jwalker on Thu Oct 25, 2012 6:30 pm

Thanks for the response! Yes, I am indeed in the states; I believe the identifier you are thinking of is NPI. Unfortunately, while the fill rate of this is high (around 90%), it is not complete. My understanding is that NPI was something that became required in the last 10 years or so, so anyone (or any system) that was around prior may not have a NPI. Regardless, I don't have 100% fill for that datapoint.

As I understand it, my provider does not re-use person_id keys. I also believe that when a person splits, the 2 split persons have new keys assigned and the old key is not used i.e. if key A splits, it becomes B and C. The same is true for merges i.e. if D and E merge, the new key is F.

I don't like the idea of not being able to answer the question of "how did the system look like as of X date", but I do need to apply migrations so that the current view is accurate. I'm not sure which way is the way to go.

jwalker

Posts : 2
Join date : 2012-10-23

View user profile

Back to top Go down

Re: How to deal with non-durable NK with split/merge records

Post  ngalemmo on Thu Oct 25, 2012 6:44 pm

Use a bridge table to associate various persons to a primary person. This will give you a simple structure to maintain and keep the dimension SK's intact. You can change the primary as you see fit without having to mess with the dimension table. Users can then report using the person on the fact or the primary through the bridge. You would have a similar structure for group affiliations. The bridge can also carry effective date ranges if you need to restate based on relationships in the past.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: How to deal with non-durable NK with split/merge records

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum