Modelling Age / Date of Birth

View previous topic View next topic Go down

Modelling Age / Date of Birth

Post  Tyberious Funk on Thu Feb 21, 2013 7:46 pm

Sorry if this is a simple or frequently repeated question. I'm still relatively inexperienced at dimensional data modelling...

We have a fact table, which for the sake of my question let's call it FACT_SERVICES. One of the fields in the fact table is the Age of the recipient of the services. This field is used for calculating things like the average age of our clients, max/min age and so forth. It is also used for reports, where it is frequently rolled up to groupings, eg 18-25, 26-35 and so forth.

Age is calculated from the date of services, less the client's DOB. Due to the usual data quality issues, this often produces incorrect results like clients with negative ages.

My initial reaction was to create an AGE surrogate key, and then put together a DIM_AGE table. Any age less than 18 and more than 120 would be given a surrogate key of 0, which in the DIM table would relate to 'UNKNOWN'. The advantage of the DIM table is that I could then also create attributes to handle commonly used grouping methodologies eg, "Broad" Age categories such as 18-30, 31-50, 51-65, 65+ or "Narrow" categories such as 18-25, 26-30, 31-35 on so forth. At the moment, grouping ages is handled differently in each report depending on the individual report developer (urgh!)

But I'm not sure how I then handle the question of average age.

In this instance, should age be treated as both a measure and a dimension?




Tyberious Funk

Posts : 3
Join date : 2013-02-21

View user profile

Back to top Go down

RE: Modelling Age / Date of Birth

Post  rathjeevesh on Thu Feb 21, 2013 10:42 pm

This is a thoughtful approach, but will need effort for implementation. You are treating age as a measure and thinking to have a dimension for it... does not sound smart. I do not intend to say that it is wrong. Option to have standard age group is a good case.

Store age as a calculated measure in the fact. You can handle exception during ETL, like replace age >120 and <18 as '0' or 18 if it is <18 and 100 if it is >120 and store these kind of exception separately with a unique identifier (Service ID in this case). Send the list back to the source team to fix these kind of data issues and demonstrate how a DW solution can help improve data quality.

rathjeevesh

Posts : 15
Join date : 2013-02-16

View user profile

Back to top Go down

Re: Modelling Age / Date of Birth

Post  ngalemmo on Fri Feb 22, 2013 7:50 am

If you are calculating the age based on date of service, and you usually use other attributes from the client dimension in queries, I would tend to not bother storing age at all, and calculate it on the fly. This would allow the results to change as you get better birthdate information.

Cases where is it mandatory you store age in the fact usually involve health care and health insurance. In these cases, the age that goes into the fact (as a degenerate dimension) is the age recorded on the claim or admission form. It is not calculated.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Modelling Age / Date of Birth

Post  BoxesAndLines on Fri Feb 22, 2013 10:25 am

Make sure your age banding dimension goes to the "age" level. This is analogous to the date dimension.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Modelling Age / Date of Birth

Post  rathjeevesh on Fri Feb 22, 2013 12:42 pm

I believe AGE that TF is referring to is the age of client at the time he/she received service. Average, max, min age are some of the prospective Age is looked from. That is why I think age is a measure which has a fixed value at the time the service started unlike generic age which keeps on changing and should be calculated as TODAY-DOB. So storing the value in the fact table will keep it simple.

-JR

rathjeevesh

Posts : 15
Join date : 2013-02-16

View user profile

Back to top Go down

Re: Modelling Age / Date of Birth

Post  Tim Webber on Fri Feb 22, 2013 4:13 pm

I like the idea of storing your clients date of birth (DOB) in a client dimension. I also suspect its best to "update' or "overwrite" this DOB if the operational system where it comes from gets a new value UNLESS you have a need to know the DOB (age) of the client as it was when the service transaction was generated in which case you might want a type 3 dimension with current DOB and a slowly changed DOB. This way you have the "latest" and greatest DOB to work with and the DOB at the time of the transaction. If you store these in the fact table you may have lots more rows to update given a "fix" to a date of birth in the operational system.

With this method of storing our DOB in the dimension, your age becomes a calculated field as already suggested above -> also you may have more than one calculation. You could have

1) age as of the date of service
2) cohort calculations (age groups) as of the date of service
3) current age of your clients
4) current age cohorts

Lastly, avg, min, max age can be treated the same way. They become a calculations based on age as of the date of service or current age -> depending on what question you are trying to answer.


Tim Webber

Posts : 11
Join date : 2013-02-01

View user profile

Back to top Go down

Re: Modelling Age / Date of Birth

Post  ngalemmo on Fri Feb 22, 2013 6:38 pm

rathjeevesh wrote:I believe AGE that TF is referring to is the age of client at the time he/she received service. Average, max, min age are some of the prospective Age is looked from. That is why I think age is a measure which has a fixed value at the time the service started unlike generic age which keeps on changing and should be calculated as TODAY-DOB. So storing the value in the fact table will keep it simple.

-JR

Yes it is. But, it is calculated based on the service date and the birthdate on record. It is not the client declaring their age and storing that information. The transaction will always have the service date, and the client will always have a birthdate, so the derived age can always be derived and does not need to be stored.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Modelling Age / Date of Birth

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum