Bridge Table and Customer Hierarchy

View previous topic View next topic Go down

Bridge Table and Customer Hierarchy

Post  mea0730 on Wed May 26, 2010 2:03 am

Does anyone know of a good detailed reference that discusses populating and maintaining customer hierarchy type bridge tables? The only decent resource I've found is the "The Data Warehouse ETL Tool Kit", but it only briefly shows an example on how to initially populate the table. It would seem the most complex part would be updating the table when a node is moved. Maintaining type2 changes in the bridge table would add even more complexity. If a node was removed from one location and inserted into a new location, that would involve creating new records as well as updating the "# levels from parent" on all the effected nodes. This seems like it would be very difficult! Does anyone have experience maintaining customer hierarch bridge tables?

Also, in order to maintain multiple hierarchies (for example, sales and marketing might want to have their own hiearchy structures), would it be best/easeir to have a seperate bridge table for each hierarchy? What's the best way to do this?

Thanks for any information!

mea0730

Posts : 21
Join date : 2010-05-15
Location : San Jose, CA

View user profile

Back to top Go down

Try Nested Sets instead

Post  John Simon on Wed May 26, 2010 2:53 am

I've looked into this area quite extensively. In my opinion the Bridge tables are slow performing and grow quite huge. Most OLAP tools will handle Parent-Child hierarchies quite comfortably these days, and most transaction systems store the data this way so I donít really see the need for the Bridge table anymore.
For performance in reporting i.e. using Stored Procedures as the source for a report, Iíve found using Joe Celkoís Nested Sets to be the fast method available to get all descendents or ancestors. So in my dimension table Iíll store Parent-child keys as well as Lft, Rgt and Level values for the Nested Set queries.

John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

View user profile http://jsimonbi.wordpress.com

Back to top Go down

Re: Bridge Table and Customer Hierarchy

Post  ngalemmo on Wed May 26, 2010 12:46 pm

Support for recursive structures in BI tools is not universal, but if it happens to fit your situation, by all means, take advantage of it.

I wrote a routine a while back to update a bridge structure. Rather than try to figure out the effect of a change to a particular node, I basically regenerated the bridge and updated the existing bridge (with effective dates) by comparing it with the newly generated version... a very simple sort/merge pass on the data.

I havn't seen Celko's approach, but I'm not crazy about the idea of using stored procedures to support reporting in a data warehouse. My feeling is a data warehouse should be an environment where the business can do whatever they need to do without getting IT involved (other than loading the data in the first place or in an advisory capacity).
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Bridge Table and Customer Hierarchy

Post  John Simon on Wed May 26, 2010 6:40 pm

I like Stored Procedures for Reports when the reports are canned. They allow the use of things like temp tables and other database specific SQL features that generic reporting tools donít. Generic ad-hoc reporting tools also write notoriously poor SQL, and as such donít perform well for large queries or queries that contain complex business logic. Another advantage of stored procedures if using SQL Server is that they allow the query plan to be re-used whereas ad-hoc queries using a reporting tool can blow-out the plan cache Ė again affecting performance.
Stored Procedures also allow the front-end reporting tool to be changed without having to rewrite all of the reports.

Regardless, Iíve found that bridge tables just donít perform well. The advantage they do have is if youíre in a situation where the reporting tool doesnít deal well with parent-child hierarchies then the bridge table can be of use.

John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

View user profile http://jsimonbi.wordpress.com

Back to top Go down

Re: Bridge Table and Customer Hierarchy

Post  John Simon on Wed May 26, 2010 6:45 pm

Oh, by the way, Celko's approach to hierarchies is simple and brilliant.
Ralph Kimball even mentions it in his courses now as something to be considered. It's lightning fast for returning descendents and ancestors. The only disadvantage is if you have constant changes to the hierarchy - although in a data warehouse environment this isn't really an issue, especially if the client is only interested in the latest version of a hierarchy.

John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

View user profile http://jsimonbi.wordpress.com

Back to top Go down

Re: Bridge Table and Customer Hierarchy

Post  mea0730 on Wed May 26, 2010 11:38 pm

I'm really supprised that Ralph Kimball, or anyone for that matter, has not written in detail about how to maintain a bridge table. I'm supprised that Ralph would recommend this method without more details or recommendations on how to maintain it.

With regards to my second question, for multiple hierarchies, do you think it would be best to maintain seperate bridge tables to keep the number of records at a minimum? I guess the other way would be to include a Hierarchy Name field in the bridge table.

Thanks,
Mike

mea0730

Posts : 21
Join date : 2010-05-15
Location : San Jose, CA

View user profile

Back to top Go down

Re: Bridge Table and Customer Hierarchy

Post  John Simon on Thu May 27, 2010 12:58 am

Are you using the bridge table to allow users to do ad-hoc querying? How will this table be used?

John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

View user profile http://jsimonbi.wordpress.com

Back to top Go down

Re: Bridge Table and Customer Hierarchy

Post  mea0730 on Thu May 27, 2010 1:29 am

Yes, it could potentially be used for ad-hoc queries and for canned queries like, "Show me all revenue for a specific company and all divisions and sites under that company". I like the bridge table concept for it's query time simplicity and ad-hoc capability. But the complex maintenance is concerning. I'm thinking that a compromise might be to update the whole hierarchy of a particular high level parent node if one of the child nodes is changed in the source system. If just seems like it would be difficult to insert a node and keep all the pointers updated correctly otherwise!

mea0730

Posts : 21
Join date : 2010-05-15
Location : San Jose, CA

View user profile

Back to top Go down

Re: Bridge Table and Customer Hierarchy

Post  John Simon on Thu May 27, 2010 1:37 am

Here's some code I wrote to create a bridge table in SQL Server 2005 that you might find useful:

IF OBJECT_ID('dbo.Product') IS NOT NULL DROP TABLE dbo.Product
GO
CREATE TABLE dbo.Product(ProductID int not null
, ProductName varchar(100)
, ParentProductID INT)
GO
ALTER TABLE dbo.Product ADD CONSTRAINT
PK_Product PRIMARY KEY CLUSTERED
(
ProductID
) ON [PRIMARY]
GO
INSERT INTO dbo.Product(ProductID, ProductName, ParentProductID)
VALUES (0,'Baby Goods',NULL)
, (10,'Baby Food',0)
, (20,'Nappies',0)
, (100,'All Ages Baby Food',10)
, (150,'Beginners',10)
, (200,'Strawberry Yoghurt',100)
, (250,'Baby Cereal',100)
, (300,'Formula Milk',150)
, (310,'Bulk Pack Nappies',300)
, (400,'Small Pack Nappies',20)
, (450,'Bulk Pack Nappies',20)
GO

IF OBJECT_ID('tempdb..#a') IS NOT NULL DROP TABLE #a
;WITH RecursiveCTE
As
( SELECT p.ProductName
, p.ProductID
, 1 AS HLevel
, CAST(p.ProductID AS VARCHAR(100)) AS TreePath
, ParentProductID
FROM dbo.Product p
WHERE p.ParentProductID IS NULL
UNION ALL
SELECT p.ProductName
, p.ProductID
, H.HLevel+1
, CAST(H.TreePath + '/'+ CAST(p.ProductID AS VARCHAR(10)) AS VARCHAR(100)) AS TreePath
, p.ParentProductID
FROM dbo.Product p
INNER JOIN RecursiveCTE H
ON H.ProductID=p.ParentProductID
)
SELECT *
INTO #a
FROM RecursiveCTE
GO
if OBJECT_ID('tempdb..#c') is not null drop table #c
SET NOCOUNT ON;
DECLARE @tmp TABLE (ID INT identity(1,1), ProductName VARCHAR(250),TreePath VARCHAR(100), HLevel INT)
DECLARE @tmpResult TABLE (ProductName VARCHAR(100),TreePath VARCHAR(100), Parent INT, Child INT, Hlevel INT)
DECLARE @parent INT = 1
, @cnt INT = 1
, @MaxCnt INT
, @Hlevel INT
, @TreePath VARCHAR(100)
, @ProductName VARCHAR(100)

INSERT INTO @tmp(ProductName, TreePath, HLevel)
SELECT ProductName, TreePath, HLevel
FROM #A

SELECT @MaxCnt = MAX(ID) FROM @tmp
WHILE @cnt <= @MaxCnt
BEGIN
SELECT @TreePath = TreePath FROM @tmp WHERE ID = @cnt
SELECT @Hlevel = Hlevel FROM @tmp WHERE ID = @cnt
SELECT @ProductName = ProductName FROM @tmp WHERE ID = @cnt
SELECT @parent = 1

WHILE @parent <= @Hlevel
BEGIN
INSERT INTO @tmpResult(ProductName, TreePath, Parent, Child, Hlevel)
VALUES (@ProductName
, @TreePath
, dbo.udf_GetNode(@TreePath,'/', @parent)
, dbo.udf_GetNode(@TreePath,'/', -1)
, @Hlevel-1)
SELECT @parent = @parent + 1
END
SELECT @cnt = @cnt + 1
END
;with TmpResults AS (
SELECT *
FROM @tmpResult
)
SELECT a.ProductName, a.TreePath, a.Parent, a.Child , a.Hlevel
, CASE WHEN a.Child = b.Parent THEN 'Y' ELSE 'N' END AS BottomFlag
INTO #c
FROM TmpResults a
LEFT OUTER JOIN TmpResults b
ON a.Parent = b.Parent
AND NOT EXISTS (SELECT 1 FROM TmpResults c
WHERE b.Parent = c.Parent
AND a.Parent <> c.Child)
ORDER BY TreePath

IF OBJECT_ID('dbo.ProductBridge') IS NOT NULL DROP TABLE dbo.ProductBridge
GO
SELECT Parent AS ParentProductID, Child AS ChildProductID, Hlevel,
BottomFlag
INTO dbo.ProductBridge
FROM #c

John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

View user profile http://jsimonbi.wordpress.com

Back to top Go down

Re: Bridge Table and Customer Hierarchy

Post  John Simon on Thu May 27, 2010 1:40 am

I'll note that I tried to test Bridge tables for a 30 thousand member dimension with 10 levels. I also tried this for Nested Sets, HierarchyID, and parent-child recursion. I ended up killing the bridge table population after 2 hours of it running on my laptop as it was taking forever and was going to be too huge.

I don't think this is a good idea for large dimensions in terms of query performance. Either get a better front-end tool or use Nested Sets in your canned reports.

John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

View user profile http://jsimonbi.wordpress.com

Back to top Go down

Re: Bridge Table and Customer Hierarchy

Post  ngalemmo on Thu May 27, 2010 12:15 pm

John, your experience with performance may have more to do with the platform than anything else. I have built bridges with hundreds of thousands of rows in Oracle (and beefy hardware) with depths down to 12 levels in only a few minutes.

By the way, can you post a link to Celko's article?
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Bridge Table and Customer Hierarchy

Post  mea0730 on Thu May 27, 2010 1:23 pm

Sounds like it will be wise to do expermenting like John did to see what the performance wuold be in our environment.

Nick,
Regarding the bridge table method... To keep the maintenance of the bridge table simple, what would you think about rebuilding the from the top parent, the complete hierachy for a particular customer when one of the child records/nodes is added, moved or changed in the source system. I know that sounds like overkill, but that seems like it would be simpler rather than trying to write logic to insert a node.

Thanks,
Mike

mea0730

Posts : 21
Join date : 2010-05-15
Location : San Jose, CA

View user profile

Back to top Go down

Re: Bridge Table and Customer Hierarchy

Post  ngalemmo on Thu May 27, 2010 3:40 pm

As I mentioned earlier, what I've done before is, after the recursive relationship is updated, I regenerated the entire bridge from the recursive structure.

Now, if you don't need to track changes, you can simply overwrite the old bridge. If you need to track changes, a simple sort/merge pass between the most current old version and the newly generated bridge, matching things up by parent/child key would allow you to obsolete old entries (using effective date ranges) and add new/changed relationships.

I attempted to do this by looking at a single change and identifying what nodes were affected and such, and sort of had something that would work, but I dumped it because it was much too complicated (could not use recursive calls) and I could not be certain it was correct for every situation.

I also would not limit it to regenerating one parent at a time. Then you would have to know you have found all changed parents and that none of those had parents of their own that changed. Rebuilding the whole thing and a sort/merge update pass is simpler, and will always get it right.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Bridge Table and Customer Hierarchy

Post  John Simon on Thu May 27, 2010 6:36 pm

The issue with platform is valid, except that it took me only a couple of minutes on the same platform to build the same size hierarchies using alternate methods.

He's got a book you can buy on Amazon: "Joe celko's trees and hierarchies in sql for smarties"
Here's an article by Joe that explains his model:
http://intelligent-enterprise.informationweek.com/001020/celko.jhtml

BTW, there is an easier way to build the nested set method than what Joe has written in that article. If anyone is interested I can post the code here.

John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

View user profile http://jsimonbi.wordpress.com

Back to top Go down

Re: Bridge Table and Customer Hierarchy

Post  mea0730 on Thu May 27, 2010 7:37 pm

John, Thanks for the reference to the book! I will check it out! I would also be interested in seeing your code. I've not decided which way I want to go yet.

Another chalenge I see with rebuilding only the hierarchy that has changed is, how do you know what the "source" or the old hierachy parent was so you can rebuild that hierarchy. The source system would need some kind of a change log showing what the old linkage was before the change. I see why you resorted to rebuilding every time. My concern is the performance if the rebuild for 500K or more customer records.

mea0730

Posts : 21
Join date : 2010-05-15
Location : San Jose, CA

View user profile

Back to top Go down

Re: Bridge Table and Customer Hierarchy

Post  John Simon on Thu May 27, 2010 7:53 pm

IF OBJECT_ID('dbo.Product') IS NOT NULL DROP TABLE dbo.Product
GO
CREATE TABLE dbo.Product(ProductID int not null
, ProductName varchar(100)
, ParentProductID INT)
GO
ALTER TABLE dbo.Product ADD CONSTRAINT
PK_Product PRIMARY KEY CLUSTERED
(
ProductID
) ON [PRIMARY]
GO
INSERT INTO dbo.Product(ProductID, ProductName, ParentProductID)
VALUES (0,'Baby Goods',NULL)
, (10,'Baby Food',0)
, (20,'Nappies',0)
, (100,'All Ages Baby Food',10)
, (150,'Beginners',10)
, (200,'Strawberry Yoghurt',100)
, (250,'Baby Cereal',100)
, (300,'Formula Milk',150)
, (310,'Heinz Formula',300)
, (400,'Small Pack Nappies',20)
, (450,'Bulk Pack Nappies',20)
GO
IF OBJECT_ID('dbo.ProductsNested') IS NOT NULL DROP TABLE dbo.ProductsNested
GO
;WITH ProductLevels
AS
( SELECT p.ProductName
, p.ProductID
, 1 AS HLevel
, CONVERT(VARCHAR(MAX), ProductID) AS TreePath
, ParentProductID
FROM dbo.Product p
WHERE p.ParentProductID IS NULL
UNION ALL
SELECT p.ProductName
, p.ProductID
, x.HLevel+1
, x.TreePath + '.' + CONVERT(VARCHAR(MAX), p.ProductID) AS TreePath
, p.ParentProductID
FROM dbo.Product p
INNER JOIN ProductLevels x
ON x.ProductID=p.ParentProductID
)
,ProductRows AS(
SELECT
ProductLevels.*,
ROW_NUMBER() OVER (ORDER BY TreePath) AS Row
FROM ProductLevels
)
SELECT ProductName
,ER.ProductID
,ParentProductID
,ER.TreePath
,ER.HLevel
,ER.Row
,(ER.Row * 2) - ER.HLevel AS Lft,
((ER.Row * 2) - ER.HLevel) +
(
SELECT COUNT(*) * 2
FROM ProductRows ER2
WHERE ER2.TreePath LIKE ER.TreePath + '.%'
) + 1 AS Rgt
INTO dbo.ProductsNested
FROM ProductRows ER
ORDER BY TreePath

--Find leaf nodes
SELECT *
FROM dbo.ProductsNested
WHERE rgt - lft = 1


--Get descendents of Baby Food
DECLARE @Root VARCHAR(100);
SET @Root = 'Baby Food'

SELECT child.ProductID, REPLICATE(' ', child.Hlevel - parent.Hlevel) + child.ProductName AS ProductName
FROM ProductsNested parent
INNER JOIN ProductsNested child
ON parent.ProductName = 'Baby Food'
AND child.lft BETWEEN parent.lft AND parent.rgt
ORDER BY child.lft

--Get ancestors of Heinz Formula
DECLARE @Root VARCHAR(100);
SET @Root = 'Heinz Formula'

SELECT parent.ProductID, REPLICATE(' ', parent.Hlevel) + parent.ProductName AS ProductName
FROM ProductsNested parent
INNER JOIN ProductsNested child
ON child.ProductName = 'Heinz Formula'
AND child.lft BETWEEN parent.lft AND parent.rgt
ORDER BY child.lft

IF OBJECT_ID('dbo.ProductSale') IS NOT NULL DROP TABLE dbo.ProductSale
GO
CREATE TABLE dbo.ProductSale(SaleDate DATE, ProductID INT, SaleQty INT)
GO
INSERT INTO dbo.ProductSale
VALUES (GETDATE(), 200, 50)
, (GETDATE(),250, 90)
, (GETDATE(), 310, 80)
, (GETDATE(), 400, 23)
, (GETDATE(), 450, 19)
GO


--Get Totals
;WITH Sales AS (
SELECT lft, ProductName,COALESCE(ps.SaleQty,0) AS SaleQty
FROM ProductsNested child
LEFT OUTER JOIN dbo.ProductSale ps
ON ps.ProductID = child.ProductID
)
SELECT REPLICATE(' ', parent.Hlevel) + parent.ProductName AS ProductName
, SUM(SaleQty) AS SaleQty, parent.lft
FROM ProductsNested parent
INNER JOIN Sales child
ON child.lft BETWEEN parent.lft AND parent.rgt
GROUP BY parent.ProductName, parent.lft, parent.Hlevel
ORDER BY parent.lft

John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

View user profile http://jsimonbi.wordpress.com

Back to top Go down

Re: Bridge Table and Customer Hierarchy

Post  mea0730 on Thu May 27, 2010 7:56 pm

thanks!

mea0730

Posts : 21
Join date : 2010-05-15
Location : San Jose, CA

View user profile

Back to top Go down

Re: Bridge Table and Customer Hierarchy

Post  John Simon on Thu May 27, 2010 8:03 pm

Mike,
You may need to have a Type3 hierarchy i.e. only record the last "n" number of hierarchy changes.
FYI, Here's the results from Performance testing on SQL Server 2008 of various methods including Parent-Child (Adjacency List) using recursion and a loop, the new HierarchyID method, enumerated Path method and Nested Sets.

http://i65.servimg.com/u/f65/15/25/49/01/hierar11.jpg

John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

View user profile http://jsimonbi.wordpress.com

Back to top Go down

Re: Bridge Table and Customer Hierarchy

Post  ngalemmo on Fri May 28, 2010 1:33 pm

John,

Thanks for your link to Celko's article.

I looked it over and at first read his solution is functionally equivalent to Kimball's bridge. Whereas the bridge enumeriates the entire span of control for each parent, Celko's solution is to code that span as a key range and use a self join on the table to enumerate the span. The result is a much, much smaller table, which isn't a bad thing.

From a performance standpoint, given the economies of today's hardware, Celko's solution would probably perform better for very large and deep hierarchies since the entire table could be easily cached into memory, so the extra self-join would not be an issue.

The downside of the structure is it cannot be updated in a way to reflect change history in a traditional manner (i.e. date ranges) because the movement around the hiearchy would mess up the key ranges. So, the structure he decribes is a static snapshot of the hierarchy. This is not necessarily bad, given the fact is it a much smaller table to begin with. Structuring the table as an accumulating snapshot with the addition of a version number or as-of date would allow for a historical record in a resonably sized table provided it is managed in a reasonable way (such as a monthly snapshot history).
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Bridge Table and Customer Hierarchy

Post  mea0730 on Tue Jun 29, 2010 2:14 am

Sorry to open this thread again, but I was wondering how others have handled the situation where multiple hierarchy overlays are needed. For example in our case, Sales, Marketing, finance, each may want to maintain distinct unrelated customer hierarchy relationships. In order to avoid creating bridge tables that are too large by adding a "hierarchy type" field to the bridge table (if we end up using bridge tables), would there be any issues with just creating seperate bridge tables for each hierarchy? Is guess the main draw back is when a new hierarchy is created we have to create a new table instead of adding a new type.

Thanks!

mea0730

Posts : 21
Join date : 2010-05-15
Location : San Jose, CA

View user profile

Back to top Go down

Re: Bridge Table and Customer Hierarchy

Post  ngalemmo on Tue Jun 29, 2010 11:29 am

Depends on your BI platform. It becomes an ease of use issue. If you go with hierarchy type then you can provide a drop down list to choose which hierarchy to use and the same SQL will work regardless of which one they choose. It also allows you to add new hierarchy types without having to change the schema.

Separate tables require more object choices in the BI layer, more code to maintain each table, and the need to change the schema when new hierarchies are introduced.

Table size is a red herring... it would be the least of my worries. Besides, if hierarchy type is the first column in the index, the DB will quickly ignore everything else... and you can probably define a cluster index which will behave as if the other hierarchy types didn't exist.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Bridge Table and Customer Hierarchy

Post  LAndrews on Tue Jun 29, 2010 1:45 pm


Depending on your dbms - you could look into partitioning your bridge table by hierarchy type, and then configuring your BI tools to prune the correct partition for the subject area.

LAndrews

Posts : 132
Join date : 2010-05-13
Location : British Columbia, Canada

View user profile

Back to top Go down

Re: Bridge Table and Customer Hierarchy

Post  mea0730 on Tue Jun 29, 2010 7:28 pm

Partitioning! Excellent suggestions!!

Thanks!

mea0730

Posts : 21
Join date : 2010-05-15
Location : San Jose, CA

View user profile

Back to top Go down

Re: Bridge Table and Customer Hierarchy

Post  John Simon on Sat Jan 15, 2011 6:20 am

If anyone is interested, I've started a blog and covered how to populate the Hierarchy Bridge table. As noted in my blog, with a large data set that was initially taking over an hour before I killed the job, it now runs in just seconds.

http://jsimonbi.wordpress.com/



John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

View user profile http://jsimonbi.wordpress.com

Back to top Go down

Can you please send me the code that you used to populate the hierarchy bridge table.

Post  sharvan.kumar.83@gmail.co on Thu Oct 08, 2015 10:09 pm

Hi Ngalemmo,

Can you please send me the code that you used to populate the hierarchy bridge table in case of dimension undergo type 2 change.

Would like to understand how do i go ahead with this approach.

Thanks

sharvan.kumar.83@gmail.co

Posts : 10
Join date : 2014-11-17

View user profile

Back to top Go down

Re: Bridge Table and Customer Hierarchy

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum