Large volume of hospital data into fact table

View previous topic View next topic Go down

Large volume of hospital data into fact table

Post  inddatadm on Thu Oct 15, 2015 8:47 pm

Hi all,
I am working in INDIA for a hospital network, my requirement is create a data model to capture hospital patient register data, the hospital has chain of sub hospital(around 15) and it's spread across all state(25), single state data volume is around 6 mil row, if I bring all state data then it will come around 1 to 2TB data. Please advise me as how to handle this volume of data.

Here are the kind of info I will capture
patient info(name, where he or she from,...)
diagnosis
procedure

I am thinking about creating fact tables as
patient diagnosis/procedure as separate

inddatadm

Posts : 3
Join date : 2015-10-15

View user profile

Back to top Go down

Large volume of hospital data into fact table

Post  zoom on Fri Oct 16, 2015 8:07 am

Do you know what business user is asking to measure and the population for it ....all states or just one state? I am not worry about the data volume but the requirements to built on. Your dim tables are going to be smaller than the Fact table? If you partition Fact table on patient visit's date MONTH then performance is going to be fast if you bring all states data. BTW, keep diagnosis and procedure together. A doctor wants to see them together. Keeping them separated would cause performance issue and joining them together, if kept separate, would be a nightmare.

I do not know if you have done any data profile on the patient data, but a very sick patient can have multiple diagnosis and procedures in a day. Please read Kimball book on how to create bridge table for that purpose. He described it in detail on how to do data modeling for Health care industry.

Data volume is the least challenging part when working with patient's data. The most challenging part is how to de-dup patient's demographic data to create a master/unique patient dim.

zoom

Posts : 97
Join date : 2010-08-23
Location : Chicago

View user profile

Back to top Go down

Large volume of hospital data into fact table

Post  inddatadm on Fri Oct 16, 2015 10:27 am

The requirement is to bring meausre from all state, which is why I am worried about query performance. Currentnly we have diagnosis and procedure as separate fact with each diagnosis or procedure as one row (basically if the patient has 10 diagnosis then , in the diagnosis table we will have 10 rows with sequence number) same way for procedures.

where as I have separate table for visit which is 1 row for each visit.

inddatadm

Posts : 3
Join date : 2015-10-15

View user profile

Back to top Go down

Re: Large volume of hospital data into fact table

Post  ngalemmo on Fri Oct 16, 2015 2:40 pm

Generally, diagnosis is modeled as a multi-valued dimension, with a bridge table between a fact and the dimension table.  Procedures, on the other hand are stored as a fact at the procedure grain(procedures usually have a lot of measures associated with them).  Both visit and procedure facts would reference diagnosis through the bridge.

There are two ways to store diagnosis.  You can either have a bridge keyed by visit and diagnosis, or keyed by diagnosis group and diagnosis.  The diagnosis group is a derived value based on the unique combination of diagnoses.  The latter is more difficult to construct, but my experience doing this with ICD-9 coding was a significantly smaller bridge table (approximately 90% smaller).  This was due to the fact doctors usually don't code very many, there is a high correlation between diagnosis used and most of the time a large numbers of patients have the same diagnosis.  By its nature, the number of groups grows very quickly when initially constructed, but then grows very slowly as most diagnosis combinations have been encountered.

Note that this grouping is based solely on the combination of codes and should not be confused with diagnosis groups, such as the John Hopkins algorithm, which consider patient demographics and severity of the diagnosis.  Building groups is a matter of constructing a natural key made up of the primary diagnosis followed by the remaining diagnoses sorted in ascending sequence.  The bridge should include a 'is primary' flag to identify the primary diagnosis or you can place a separate primary diagnosis key on the facts if that is commonly used in analysis to avoid using the bridge.

However, with the higher precision built into ICD-10 coding, it is not clear if you would experience as significant reduction in size, but it may be worth looking into.

Also, considering the population you need to deal with, it may be worth looking into high power platforms, such as Netezza.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Large volume of hospital data into fact table

Post  inddatadm on Sat Oct 17, 2015 3:39 pm

Thanks for your input, one of the other problem is the registration id(which I am storing as degenerated dimension in the fact table, because each hospital has it's own registration number and it can be same for other hospital) so I am leaving that as DD and storing in Fact table, my problem is now we have all other attributes related to registrationn(like when patient admit, discharge dates and other comment fields) are going to come in, if I don't get them into a dimension table then all other attributes will also end up in fact table , any input ?

if I bring these registration id by each hospital , the dimension table will become like a transaction table .. so not sure what I should di

inddatadm

Posts : 3
Join date : 2015-10-15

View user profile

Back to top Go down

Re: Large volume of hospital data into fact table

Post  ngalemmo on Mon Oct 19, 2015 3:13 pm

Registration ID does not have to be a dimension table. The ID itself can be degenerate. You can represent attributes associated with the registration as other dimensions of the fact (such as admit date), either conforming or junk, as needed.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Large volume of hospital data into fact table

Post  zoom on Mon Oct 19, 2015 3:20 pm

Dates should exist in the fact table. I resolved that "comments" issue by creating a bridge table. The bridge table had all the procedure done for a patient and it also had the comments for a procedure. Below is what I had as a design.... an arrow -> means 1:m . For example 1 or many patient rows exists in the fact table.

Fact table <- patient dim -> patient bridge <- procedure codes

This is a workable solutions for me. Like I mentioned before if you partition fact table on a date, performance should not be an issue.


zoom

Posts : 97
Join date : 2010-08-23
Location : Chicago

View user profile

Back to top Go down

Re: Large volume of hospital data into fact table

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum