Kimball Forum

Health Catalyst - Healthcare DW/BI

5 posters


Post  brownp123me Tue Feb 05, 2013 10:32 am


Has anyone had exposure to this organization? They make intriguing claims about a "late binding bus architecture".
Any thoughts?

http://www.healthcatalyst.com/company/

After determining that the predominant approaches to data modeling weren’t effective for healthcare data, they discovered the solution, which is now known as the Adaptive Data Architecture. Using a late-binding bus architecture, Catalyst’s adaptive data model is agile, flexible, and can be implemented in a matter of weeks compared to the months or years traditional approaches require.

brownp123me

Posts : 2
Join date : 2013-02-05


Post  Jeff Smith Tue Feb 05, 2013 1:08 pm

What does "late binding" mean?

Jeff Smith

Posts : 471
Join date : 2009-02-03


Post  brownp123me Tue Feb 05, 2013 2:02 pm

That's what I'm trying to figure out. I have asked someone from the company and will post when I get an answer. Should be interesting.

brownp123me

Posts : 2
Join date : 2013-02-05


Post  ngalemmo Tue Feb 05, 2013 2:44 pm

If they are talking about fact/dimension relationships, they are probably using a timestamp qualifier in joins.

A strict dimensional model uses 'early binding'. When a fact table is loaded, FK relationships to type 2 dimensions are assigned at the time of load. This becomes a fixed, unambiguous relationship. It identifies the member as well as the version of the member that is associated with the fact.

In a 'late binding' scenario, the fact is associated with the member, but not with a specific version (i.e. it stores a type 1 FK with the fact). When the dimension is referenced, a timestamp associated with the fact is used to locate the proper version of the dimension row: the query uses a composite key (type 1 key and timestamp) to access the dimension.

In late binding, late arriving dimension data is not a problem because the fact/dimension binding occurs at the time of use, rather than the time of load. If a retroactive dimension update occurs, subsequent queries for facts in that timeframe would carry the changed attributes without needing to rekey the facts.
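
To make the timestamp-qualified join concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (emp_dim, hours_fact, valid_from, valid_to) and the sample rows are assumptions made up for illustration; they do not come from Health Catalyst or from any particular product. The fact stores only the type 1 key, and the dimension version is resolved at query time from the fact's date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp_dim (
    emp_key     INTEGER,  -- type 1 key: stable for each employee
    valid_from  TEXT,     -- inclusive start of this version (ISO date)
    valid_to    TEXT,     -- exclusive end of this version (ISO date)
    department  TEXT
);
CREATE TABLE hours_fact (
    emp_key    INTEGER,   -- no dimension version is stored on the fact
    work_date  TEXT,
    hours      REAL
);
INSERT INTO emp_dim VALUES
    (80, '2013-01-01', '2013-03-01', 'Cardiology'),
    (80, '2013-03-01', '9999-12-31', 'Oncology');
INSERT INTO hours_fact VALUES
    (80, '2013-02-10', 8.0),
    (80, '2013-04-05', 6.0);
""")

# Late binding: the dimension version is chosen when the query runs,
# using the type 1 key plus the fact's date as a composite lookup.
LATE_BINDING_QUERY = """
SELECT f.work_date, f.hours, d.department
FROM hours_fact AS f
JOIN emp_dim AS d
  ON d.emp_key = f.emp_key
 AND f.work_date >= d.valid_from
 AND f.work_date <  d.valid_to
ORDER BY f.work_date
"""

def run_query():
    return conn.execute(LATE_BINDING_QUERY).fetchall()

before = run_query()

# Retroactive correction: the department change really took effect on 2013-02-01.
# Only the dimension rows are updated; the fact rows are never rekeyed.
conn.execute(
    "UPDATE emp_dim SET valid_to = '2013-02-01' "
    "WHERE emp_key = 80 AND valid_from = '2013-01-01'"
)
conn.execute(
    "UPDATE emp_dim SET valid_from = '2013-02-01' "
    "WHERE emp_key = 80 AND valid_from = '2013-03-01'"
)

after = run_query()
print(before)
print(after)
```

In this sketch the February fact row reports Cardiology before the correction and Oncology after it, even though hours_fact was not touched. The trade-off is the range predicate in every join, which is the performance concern raised in the next post.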
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com


Post  Jeff Smith Tue Feb 05, 2013 3:12 pm

How is the performance with such a database? Do the complex joins cause performance to drag? I would think performance would suffer, since you can't define a primary key. You couldn't even build a clustered index on the dimension key and the date fields without really slowing load performance.

Jeff Smith

Posts : 471
Join date : 2009-02-03


Post  ngalemmo Tue Feb 05, 2013 11:27 pm

If you are using a strict dimensional pattern, no, the impact is usually not significant. Netezza is very effective with star schemas. Generally, dimensions tend to be small (under 1GB or so), so Netezza will pull these into memory and use the memory image (containing only the needed columns) to join en masse with the rows in the fact table. It's a giant merge in one pass through the data.

If there is a large dimension that is commonly used in queries, it is usually beneficial to distribute both the dimension and the fact table on the same key. Joins between these two large tables (the large dimension table and the fact table) are then performed on the same SPU. Another strategy is to distribute on a dimension key that is commonly used in GROUP BY. Aggregations then operate in parallel, which can significantly improve the performance of aggregate queries that use that dimension. The challenge with this strategy is getting an appropriately smooth distribution. The more columns you organize on, the less likely that organization is useful to a query.
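
The co-location idea can be sketched in a few lines of Python. This is only a toy model: NUM_SLICES, the modulo distribution function, and the patient_dim and charge_fact data are all assumptions for illustration, and a real appliance uses its own internal hash rather than key % n. The point it shows is that when both tables are distributed on the same key, every fact row finds its dimension row on the same slice, so each slice can join without moving data.

```python
from collections import defaultdict

NUM_SLICES = 4  # hypothetical number of data slices / processing units

def slice_for(key: int) -> int:
    """Toy distribution function (assumption); real systems use an internal hash."""
    return key % NUM_SLICES

def distribute(rows, key_index):
    """Assign each row to a slice based on the value of its distribution key."""
    slices = defaultdict(list)
    for row in rows:
        slices[slice_for(row[key_index])].append(row)
    return slices

# Illustrative data: (patient_key, name) and (patient_key, charge).
patient_dim = [(k, f"patient-{k}") for k in range(1, 9)]
charge_fact = [(1, 10.0), (2, 20.0), (5, 5.0), (5, 7.5), (8, 40.0)]

# Both tables are distributed on the same key, patient_key.
dim_slices = distribute(patient_dim, 0)
fact_slices = distribute(charge_fact, 0)

def local_join(slice_id):
    """Join fact rows to dimension rows using only data held on one slice."""
    lookup = {key: name for key, name in dim_slices[slice_id]}
    return [(key, lookup[key], amount) for key, amount in fact_slices[slice_id]]

# Each slice joins independently; the results are simply concatenated.
joined = [row for s in range(NUM_SLICES) for row in local_join(s)]
print(joined)
```

If the two tables were distributed on different keys, lookup[key] could miss and rows would have to be redistributed or broadcast before the join. The skew problem mentioned above shows up here as slices of very different sizes in fact_slices.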

Also, Netezza has no indexes. It cannot enforce PK or FK constraints, so it doesn't. It allows you to declare them for documentary purposes. Some BI tools and other query tools may use this information to support the user experience.
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com


Late Binding in data warehousing

Post  drsanders Thu May 02, 2013 10:23 am

ngalemmo captures the essence of our late binding methodology very well. I'm a senior VP with Health Catalyst, but a CIO and data warehousing guy first. If you would like to learn more about our methodology, please give me a shout. You can also Google "late binding data warehouse slideshare" for a slide deck that provides an overview.

Dale Sanders, dale.sanders@healthcatalyst.com

drsanders

Posts : 1
Join date : 2013-05-02


Post  sachij3u Mon Jul 22, 2013 3:46 pm

Along the same lines as ngalemmo mentioned, we have used a similar approach: the dimension is defined with a composite key, for example (Dim_id, version_number).
Dim_id is a surrogate key that stays the same for a given natural key, while the version number increases each time a change is detected for that natural key. For example:
For natural key (employee_id = 1000), EMP_DIM_ID = 80 always remains 80, but every time any attribute of employee_id = 1000 changes, the version number increases.
When reporting, the business user always points to the latest record of the employee using the query below:

SELECT *
  FROM fact f, emp_dim e
 WHERE f.emp_dim_id = e.emp_dim_id
   AND e.emp_current_rec = 1  -- the record is the most current and not expired
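
As a rough, runnable version of the query above, here is a sketch using Python's built-in sqlite3 module. The column list, the job_title attribute, and the sample rows are assumptions added for illustration; only emp_dim_id, version_number, employee_id, and emp_current_rec come from the description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp_dim (
    emp_dim_id       INTEGER,  -- surrogate key, stable per natural key
    version_number   INTEGER,  -- incremented on every detected change
    employee_id      INTEGER,  -- natural key
    job_title        TEXT,     -- hypothetical tracked attribute
    emp_current_rec  INTEGER,  -- 1 = most current version, 0 = expired
    PRIMARY KEY (emp_dim_id, version_number)
);
CREATE TABLE fact (emp_dim_id INTEGER, hours REAL);
INSERT INTO emp_dim VALUES
    (80, 1, 1000, 'Analyst',        0),
    (80, 2, 1000, 'Senior Analyst', 1);
INSERT INTO fact VALUES (80, 8.0), (80, 6.5);
""")

# Facts carry only emp_dim_id; the current-record flag picks the latest version.
current_rows = conn.execute("""
    SELECT f.hours, e.job_title, e.version_number
    FROM fact AS f
    JOIN emp_dim AS e ON f.emp_dim_id = e.emp_dim_id
    WHERE e.emp_current_rec = 1
    ORDER BY f.hours
""").fetchall()

# Without the flag, every fact row matches every version of the employee.
unfiltered_count = conn.execute("""
    SELECT COUNT(*)
    FROM fact AS f
    JOIN emp_dim AS e ON f.emp_dim_id = e.emp_dim_id
""").fetchone()[0]

print(current_rows)
print(unfiltered_count)
```

The second query is there to show why the flag matters: joining on emp_dim_id alone fans each fact row out across all versions. Note that filtering on the current record always reports the latest attributes; reporting attributes as they were at the time of the fact would need a date or version qualifier instead.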
sachij3u

Posts : 19
Join date : 2013-07-11
Age : 43
Location : Herndon, VA
