Do I need a second fact table?

View previous topic View next topic Go down

Do I need a second fact table?

Post  lprince on Tue Dec 15, 2015 12:09 pm

I have a "factless" fact table that contains daily snapshots.

The fact table is constructed:

PriKey,
Report_date,
SDCompKey (this connects to a dim table for Sales DistrictComposition - SalesDistrict,State,County - SCD)
CustomerType1 Surrogate Key (this connects to a dim table for CustomerType1 - name, address, etc - SCD)
CustomerType2 Surrogate Key (this connects to a dim table for CustomerType2 - name, address, etc - SCD)
ObjectCount (factored Counter specifically associated to CustomerType1 to sum for a daily particular activity)

I am stuck using a cube to serve the results as the UI, but I need to get daily counts for CustomerType2 by SalesDistrict grouping on the sales district name and customertype 2 name.

I currently do this using a query in my staging database area:

select report_date, salesdistrictname, count(*) as CT2Count
from (select distinct report_date, salesdistrictname, ct2Name
from facttable f
join salesdistrictcompdim s on f.sdcompkey=s.sdcompkey
join ctype2dim c on f.ctype2key=c.ctype2key ) a
group by report_date, salesdistrictname

Should I build a second fact table and cube measure to get the results I need? I can't figure out how to query the current cube to give me the correct total. Record count won't do it and distinct counts using just the keys on the fact table won't do it either. I really don't want to build another ObjectCount field for CustomerType2 if I can avoid it. I am using MS Analysis Server with the fact table and dim tables in a cube.

Thanks


lprince

Posts : 9
Join date : 2015-11-11

View user profile

Back to top Go down

Re: Do I need a second fact table?

Post  ngalemmo on Tue Dec 15, 2015 4:17 pm

Why do you need the name to do a count? Are you not simply counting customers?

Would not

select report_date, salesdistrictname, count(distinct ctype1key) as CT2Count
from facttable f
join salesdistrictcompdim s on f.sdcompkey=s.sdcompkey
group by report_date, salesdistrictname

do the same thing?
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Do I need a second fact table?

Post  lprince on Tue Dec 15, 2015 6:00 pm

Oh I wish it did.... But no, I'm dealing with historic trashy data (not my source, so I've been struggling with how far to clean it.)

I'm using name because all historical reporting had been done grouping on salesdistrict and CustomerType2name so I'm trying to verify my counts against history to make sure I have accurate numbers. That said, It's a twofold problem, because the customerType2 is a company, they may have field offices in multiple locations, which is in the raw data. Additionally, because different people are doing data entry, there are typos (Inc. vs ,Inc, etc) in the contact info across counties/states for same company on the same day. The best I could come up with was to dump all the unique values into the Dim table and then create a durablekey for ones that I could confirm are the same based on reasonable variation in name, same address, same phone #, etc. So as a result, the Dim table has a surrogate key: CType2_key and Cust_DurableKey, customer_name, customer_rollup_name (could be a parent company or corrected version of the name), address, phone, city, state, and zip.

So, unfortunately, while CType2_SurrogateKey is unique in the fact table, the grouping returns too many records because what I technically need is a rollup on the Cust_DurableKey and not the Customername (in the final version). I'm new to learning DWH, but I didn't think I should put the DurableKey into the fact table. If that's what I should do, that might fix my problem.



lprince

Posts : 9
Join date : 2015-11-11

View user profile

Back to top Go down

Re: Do I need a second fact table?

Post  ngalemmo on Tue Dec 15, 2015 8:11 pm

Isn't ctype1key your durable key?

If this key represents a field office, but you are trying to count head offices/companies, you need either a hierarchy or attributes for the head office in your dimension. You could then count head offices based on distinct values of the head office id.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Do I need a second fact table?

Post  lprince on Wed Dec 16, 2015 10:20 am

Nope, sorry to cause confusion, but Ctype1Key belongs to a totally different set of businesses. There are roughly 200 type1s and 3000 type2s with random subsets that come and go on a daily basis. The two groups work in tandem at the same location on the same item, and both of them can be customers of our business so we think of them separately and as a result, I have separate dim tables for each of them. I was able to solve the problem for the CustomerType1 counts by introducing a factored object count (that totals to 1 for a given day/CustomerType1) because it was easy to do and the worst case was a division by 4. In the case of CustomerType2, it could be a division by 2 all the way up to 40 something to come up with a factored count so I was looking for a more elegant solution. I'm thinking a calculated measure might do the trick, but I'm only halfway through the MDX book that I'm reading and I don't know if that would be better or if I really should make a second cube just for CustomerType2 that's aggregated the way I need it. My problem with that though is that they won't be able to drill down through the data once I do it that way.




lprince

Posts : 9
Join date : 2015-11-11

View user profile

Back to top Go down

Re: Do I need a second fact table?

Post  lprince on Thu Dec 17, 2015 11:37 am

I solved it using the following mdx for a single day:

with member measures.uniqCt2
as count(nonempty(([Cust_Type2Dim].[CT2Name].[Ct2Name]*[Measures].[RowCount])))
select measures.uniqCt2 on 0,
nonempty([salesdistrictdim].[salesdistrictnumber].[salesdistrictnumber]) on 1
from CubeName
where ([YMD_date].[DateValue].&[2015-01-05])

I'll still have to fuss with it a bit more to get the date into rows, but it does a distinct count of CustType2 for each sales district.
It took me a couple of days of reading/googling to get the MDX syntax right, so I thought i'd post back and maybe save someone else time.


Last edited by lprince on Thu Dec 17, 2015 11:38 am; edited 1 time in total (Reason for editing : typo in syntax)

lprince

Posts : 9
Join date : 2015-11-11

View user profile

Back to top Go down

Re: Do I need a second fact table?

Post  ngalemmo on Thu Dec 17, 2015 12:58 pm

So, you are trying to allocate measures to different grains?

Usually you handle this with a bridge table. The bridge table joins with a fact dimension fk and the bridge references whichever dimension it needs to. The bridge may contain an allocation factor if you need it.

My question would be, do the facts always contain references to a ctype1 and ctype2 dimension row, or can it sometimes be null (i.e. any)?

The nice thing about a bridge is you build it to support a specific allocation need. Everything needed to support queries is pre-calculated and stored on the bridge. And each specific bridge can be combined with another bridge. Allocation of those facts would be derived by multiplying the allocation factors in the bridges being used.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Do I need a second fact table?

Post  lprince on Thu Dec 17, 2015 2:44 pm

maybe, but not intentionally. The grain of the fact table is "all activities reported on a given day for any given location by these two groups of businesses/customers that use any given piece of equipment." It's not like anything I can find in a book or neat and tidy lesson and it's "factless" which makes looking for examples of the right way to structure my tables/dims that much harder. The businesses are always in tandem, so there isn't ever a record that has one without the other. CustomerType1Dim is the "owner" of the equipment and CustomerType2Dim is the operator of the equipment, but they can both be our customers or be a potential customer, that's a different key I have in each dim table. I tried to reduce the complexity in my example to avoid confusing the issue I was trying to solve. Sales districts are geographic based, but if Customer A is active in District 1 and District 2, they want it counted as 2, not 1 because each sales district is its own universe and ultimately we are using this as a measure of 'potential customers' that exist at a point in time in each district.

Would it be better structure to build an aggregated fact table that contains a single record for each Day, SalesDistrict, Cust1, Cust2 combination and lose the details of what they are working on? The total dataset is less than a million records. I wanted to make it so that the people could look at the high level totals, but ultimately, if they were curious enough, be able to drill down through the data to see what activity and location information related to where the businesses are working.

I've read about bridge tables a little bit on line, but honestly am only halfway through the Data Warehouse Toolkit book, so I'm not sure I understand how to apply them in this situation.

lprince

Posts : 9
Join date : 2015-11-11

View user profile

Back to top Go down

Re: Do I need a second fact table?

Post  nick_white on Fri Dec 18, 2015 5:42 am

I have to admit to getting a bit confused by all the previous posts but reading your last post am I correct in thinking that the grain/uniqueness of your fact table is defined by these keys:
Day, SalesDistrict, Cust1, Cust2 and EquipmentID (or whatever you've called it)?

Also, what you are trying to achieve is a count of unique Cust2 by Day and SalesDistrict?

An aggregate is not going to help, IMO, as aggregates only provide performance improvements they don't give you any more information.

Why won't "SELECT Day, SalesDistrict, count (distinct Cust2) from FactTable GROUP BY Day, SalesDistrict" work?

nick_white

Posts : 366
Join date : 2014-01-06
Location : London

View user profile

Back to top Go down

Re: Do I need a second fact table?

Post  lprince on Fri Dec 18, 2015 9:52 am

Thank you for that confirmation on aggregation.

The query I originally posted in my first post does the totals correctly for me, but I can't query a Cube that way, the interface that we're exposing to the users is measures/dims from a Cube using SSAS and a vendor based solution I think is equivalent to PowerPivot. The uniqueness of the table is based on fields: report_date, geo_locationkey. Everything else isn't unique unless I include report_date and geo_locationkey. There can be multiples of owner/operator (CustType1/CustType2, respectively) doing multiple activities using multiple pieces of equipment in multiple sales districts. The CustType2ID represents unique entries in the Dim Table (containing CustType2ID, Cust_DurableKey, CustName, Cust_Rollup_Name, Phone, Address, etc) for the SAME Customer (Due to Typos in data, field office locations and other variations in contact info (multiple phone #s) - think SCD not only in time, but in many cases no single record per customer that is truly the "right" one, just unique variations that have to be grouped together by the DurableKey). It's the nature of the source data which is outside our control. I pondered building an outrigger dim table that had only the DurableKey and rollup_name in it (so I would have a table with single entry per Customer), but wasn't sure if there was an advantage.

The business explanation: report is used to count "potential customers available on any given day", which is later compared to a count of "customers sold to on any given day" to determine sales effectiveness. The reason I need a count of CustType2 (grouped on customer name) by salesdistrict is to match an existing report that is currently generated manually on a weekly basis to one that will come from the cube "on-demand" and can be sliced/diced in ways other than just the daily totals. The new report will group on SalesDistrict, Cust_DurableKey rather than name, so it will correct some other errors I've found in the report (name identical, but they are not the same company).

The MDX query I posted gives me the solution I needed so I can live with the structures I've got while I do some more research on bridge tables.

lprince

Posts : 9
Join date : 2015-11-11

View user profile

Back to top Go down

Re: Do I need a second fact table?

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum