Conformed dimensions

View previous topic View next topic Go down

Conformed dimensions

Post  jochem_van_grondelle on Mon Aug 01, 2011 7:54 am

Hello,

This questions relates to a data model that I am designing, according to Kimball methodology.
This data ware house combines data from a CMDB that contains information about computers (12,000) and scanning software that scans all computers for all installed software products.
So simplified there would be a factless fact table called Software Installation, with some dimensions such as Computer, Date Discovered, and Software Product.

The problem lies in the fact that with software there is a hierarchy of Vendor, Product and Version (Adobe, Photoshop, CS4). Most reports that have to be built have to ignore the version (but not all). If a computer has both Photoshop CS3 and CS4 installed (so two versions but one product) in a report this computer would have to be counted as 1 for the line Adobe Photoshop.

My idea is to make a SoftwareProductDimension, and conformed to that, a SoftwareProductVersionDimension. I could then use two fact tables: SoftwareProductInstallation and SoftwareProductVersionInstallation.

What would you think of that solution?

I just need to save information about software products, such as description and unit price, that are on a product level and not on a version level. Could I just add these extra attributes to the SoftwareProductDimension, or should I create a separate dimension for this.

I appreciate anyone for helping me a step further.

Feel free to let me know if you would have any questions about my case.

Jochem






jochem_van_grondelle

Posts : 11
Join date : 2009-09-22

View user profile

Back to top Go down

Re: Conformed dimensions

Post  ngalemmo on Mon Aug 01, 2011 9:39 am

I imagine you could build another fact, but are you not overcomplicating things? Could you not get the same result with a query using an embedded select?
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Conformed dimensions

Post  VHF on Mon Aug 01, 2011 9:45 am

jochem_van_grondelle wrote:My idea is to make a SoftwareProductDimension, and conformed to that, a SoftwareProductVersionDimension.

What you are describing sounds like snowflaked dimension (rather than conformed dimensions.) It is quite common for new dimensional modelers to create snowflake dimensions because it follows best practices 3NF relational design. However, snowflaking is generally frowned upon in the DW world because it adds complexity (especially for end users that have trouble understanding relational models) and reduces query performance.

Instead, you should denormalize your dimensions in order to create simple star schemas. In your case, this would mean a single SoftwareProductDimension with attributes for Vendor, Product, and Version. There would be one row for each possible version of a product. (Your ETL process will need to handle creating a new row whenever a new vendor/product/row shows up.) You can also add Description and Price* to this dimenson. It is OK to repeat the description information for as many rows as there are versions of a product. This is the denormalized aspect of the dimension.

You can then have a single fact table SoftwareInstalledFact which would point to the appropriate row in SoftwareProductDimension as well as other dimensions for machine (which could include attributes for department, physical location, etc.) and date.

*There are two ways to handle price: Either as an attribute in SoftwareProductDimension or in a fact table of its own. What happens when a price changes? How will you receive the price information? How often will prices change? Do you want to keep historical price information, or only the most current price? If you need only the most current price for a given version of the software, this can easily be stored as a dimension attribute. Tracking changes would require more advanced SCD2 (Slowly Changing Dimension Type 2) techniques or the use of a periodic snapshot fact table to capture price.

VHF

Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US

View user profile

Back to top Go down

Re: Conformed dimensions

Post  jochem_van_grondelle on Mon Aug 01, 2011 9:59 am

ngalemmo wrote:I imagine you could build another fact, but are you not overcomplicating things? Could you not get the same result with a query using an embedded select?
Thank you for your response.

For multiple reasons I would not stick to one fact table and one productdimension.
-Information about products (such as price, reseller and description) is provided on a product level (not version). Should I repeat all of these extra attributes in every version of a product?
-For all users that will have access to a reporting environment, I would like it easy for them to distinguish Product vs Product+Version, and to avoid them to make any mistakes. Because there will be big differences between aggregated data on a Product vs. Product Version level.
-As my case was simplified, there will be an even more grained fact table, concentrating on individual files that were found on a computer that identify a product version (and so a product), which should be designed the same way. Product -> Product Version -> Product Version Files. From this quite detailed data, I could quickly build up the other - kind of aggregated - fact tables.

I hope you understand these, but I am still very curious to your opinion.

Just to understand you well about the embedded select; Is this what you mean if I would stick to one fact table?

Code:

Computer Name | Product  | Version
--------------+-----------+----------
Comp1         | Photoshop | CS3
Comp2         | Photoshop | CS3
Comp2         | Photoshop | CS4

SELECT Product, Count(*)
FROM
(SELECT DISTINCT Computer Name, Product From VersionInstallations) AS ProductInstallations
GROUP BY Product
which would return a Count of 2 for Photoshop

jochem_van_grondelle

Posts : 11
Join date : 2009-09-22

View user profile

Back to top Go down

Re: Conformed dimensions

Post  jochem_van_grondelle on Mon Aug 01, 2011 10:08 am

VHF wrote:What you are describing sounds like snowflaked dimension (rather than conformed dimensions.)

Thanks for your explanation.

I am aware of the important differences between snowflakes and denormalized dimensional modeling.
I believe that in the case I tried to explain, I am still referring to a dimension model. Maybe I have caused some misunderstanding?

My solution would be (very simplified)

FACTS
SoftwareVersionInstallation (FK to Computer, SoftwareProduct, SoftwareProductVersion)
SoftwareInstallation (FK to Computer, SoftwareProduct)

DIMENSIONS
Computer (Computer Name, Owner, IP)
SoftwareProduct (Vendor, Product Name, Description, Price)
SoftwareProductVersion (Vendor, Product Name, Version)

VHF wrote:*There are two ways to handle price: Either as an attribute in SoftwareProductDimension or in a fact table of its own. What happens when a price changes? How will you receive the price information? How often will prices change? Do you want to keep historical price information, or only the most current price? If you need only the most current price for a given version of the software, this can easily be stored as a dimension attribute. Tracking changes would require more advanced SCD2 (Slowly Changing Dimension Type 2) techniques or the use of a periodic snapshot fact table to capture price.
As history of pricing is not important, I could add price as an attribute to the SoftwareProductDimension

jochem_van_grondelle

Posts : 11
Join date : 2009-09-22

View user profile

Back to top Go down

Re: Conformed dimensions

Post  VHF on Mon Aug 01, 2011 10:29 am

jochem_van_grondelle wrote:Should I repeat all of these extra attributes in every version of a product?
Yes, that is best practice in dimensional modeling. It is very hard when coming from the 3NF world to resist the temptation to factor out repeated values!

Let's try it out with a single fact table:

FACTS
SoftwareInstallation (FK to Computer, FK to SoftwareProduct)

DIMENSIONS
Computer (Computer Name, Owner, IP)
SoftwareProduct (Vendor, Product Name, Version, Description, Price)

This allows you get information by version:

SELECT Vendor, Product Name, Version, COUNT(*)
FROM SoftwareInstallation AS f1
JOIN SoftwareProduct AS d1 ON d1.SoftwareProductKey = f1.SoftwareProductKey
GROUP BY Vendor, Product Name, Version

Or just by product:

SELECT Vendor, Product Name, COUNT(*)
FROM SoftwareInstallation AS f1
JOIN SoftwareProduct AS d1 ON d1.SoftwareProductKey = f1.SoftwareProductKey
GROUP BY Vendor, Product Name

(Note that there is no need for a DISTINCT subquery--just GROUP BY whatever level you are after.)

You are concerned about what your users will see and how they will query the system. In general, a single star schema for each subject area keeps things as simple as possible for them. What tool(s) will they be using to query the DW? Most BI tools automatically do the grouping based on which fields the users drag into their reports.

(No concerns with you creating a separate fact table to identify which files were found on the scanned systems. Presumably only IT-insiders will be using that one!)

VHF

Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US

View user profile

Back to top Go down

Re: Conformed dimensions

Post  VHF on Mon Aug 01, 2011 10:34 am

BTW, I see now that you are not snowflaking. You are just creating another dimension and another fact at different grains. Sometimes this is required, so it isn't always a wrong approach, although I still feel that in your case everything can be handled with a single version-grained star schema.

Are there any queries you still feel couldn't be easily answered using the single fact table?

VHF

Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US

View user profile

Back to top Go down

Re: Conformed dimensions

Post  jochem_van_grondelle on Mon Aug 01, 2011 10:36 am

Thanks again.

I need to think a bit. I will get back on this!

jochem_van_grondelle

Posts : 11
Join date : 2009-09-22

View user profile

Back to top Go down

Re: Conformed dimensions

Post  hang on Mon Aug 01, 2011 3:36 pm

VHF's model should work. Just remember the grain of the product dimension is down to the version level. The product name/id is just a grouping (aggregate) attribute in the dimension. The natural key would be on two columns, product id and version number.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Conformed dimensions

Post  ngalemmo on Mon Aug 01, 2011 4:38 pm

I have no problem having separate product and product version dimensions. There is some minor redundancy, but it is no big deal.

I question the need for a second fact table in order to avoid counting multiple versions of the same product twice for a given user. A count of distinct users (i.e. COUNT(DISTINCT ) ) grouped by product (no version) would handle the problem in a version level fact table.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Conformed dimensions

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum