Time-Stamped transaction tracking not suitable, gets too big

View previous topic View next topic Go down

Time-Stamped transaction tracking not suitable, gets too big

Post  Toby on Mon Oct 03, 2011 12:21 pm

Hi

I am looking for a modelling solution to the following problem:

My base data is a large amount of residents, each of them with a large set of (clearly defined) attributes (name, gender, address, job, name of father, ...). As residents move, marry, die, and so on, their attribute data changes. These changes happen quite frequently. With each change, usually only a small set of attributes change. I need to answer questions about both the present and any point back in time.
Basically, this is the same thing as the time-stamped transaction tracking in a dimension approach that Kimball describes for employee transactions. However, due to the large amount of people (residents instead of employees) and the size of each resident entry, I know that this approach will not work, even though the kind of questions that can be answered by this model seems suitable.

What are my options here?

Thanks,
Toby


Toby

Posts : 3
Join date : 2011-10-03

View user profile

Back to top Go down

Re: Time-Stamped transaction tracking not suitable, gets too big

Post  ngalemmo on Mon Oct 03, 2011 2:31 pm

I'm sure there are frequent updates, but I can't imagine any one person is updated very frequently at all. Most people don't move that often and you only die once (last time I checked). Why don't you think the approach won't work?
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Time-Stamped transaction tracking not suitable, gets too big

Post  Toby on Tue Oct 04, 2011 3:45 am

OK, so I did the maths:

A person has about 3Kbyte worth of data which cannot easily be left out (things like 'name of father', which nobody will ever use for a query already removed). There are a lot of unicode varchar fields that are responsible for this size.
I start with 700'000 people in the largest area.
A typical week has about 10'000 updates in that area.

This leads to:
2GB worth of data to start with (700'000*3/1024/1024)
1.5GB worth of updates for a year (52*10000*3/1024/1024)

I agree that it's managable in terms of data storage. Will it be in terms of query performance? People plan to use PowerPivot to analize the data, which I guess will push their client machines to the limit (I don't know much about PowerPivot yet).

Thanks,
Toby

Toby

Posts : 3
Join date : 2011-10-03

View user profile

Back to top Go down

Re: Time-Stamped transaction tracking not suitable, gets too big

Post  John Simon on Tue Oct 04, 2011 6:10 pm

A few points:
1. PowerPivot has a great compression engine, but if yur values are mostly distinct you won't get any gains
2. PowerPivot has an effective limit of 2Gb. PowerPivot itself does not have this limit, but SharePoint does, which means you cannot share this data in PowerPivot.
3. You may want to consider using a demographics mini-dimension if these attributes are changing often, and link this dimension into your fact table.

John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

View user profile http://jsimonbi.wordpress.com

Back to top Go down

Re: Time-Stamped transaction tracking not suitable, gets too big

Post  Mike Honey on Tue Oct 04, 2011 7:11 pm

Hi Toby,

I'm curious about your statement: "700'000 people in the largest area". How many "areas" do you have to serve? If you have more than one or two, you may find the PowerPivot architecture a bit frustrating. AFAIK, it doesn't offer any built in support for cloning a model for different data slices, so maintenance and support of your models could get out of hand.

You might be better served building an SSAS model to cover all the "areas" in one pass, with a single model. Dimension security (on an "Areas" dimension) could split it for user consumption. The end-user interface could be Excel PivotTables, optionally published to Excel Web Services in SharePoint, so the same experience (actually a little better) to PowerPivot.

Good luck!
Mike
avatar
Mike Honey

Posts : 185
Join date : 2010-08-04
Location : Melbourne, Australia

View user profile http://www.mangasolutions.com

Back to top Go down

Re: Time-Stamped transaction tracking not suitable, gets too big

Post  Toby on Wed Oct 05, 2011 8:02 am

@John:
Thanks for this info, the compression engine will have a lot, since a lot of attributes are enum-like. The client currently wishes to not use sharepoint, so for now at least, the 2GB file size limit will not be a problem. I'll think about the demographics mini dimension, this seems to be a good approach. A lot of questions will be comparisons between different geographic locations for instance.

@Mike:
Thanks for this warning. In my case, it's one area = one customer, and the largest of all installations will be 700'000 people. There currently is one customer, with a good possibility of up to 10 other customers. These customers have quite similar needs. There is however no combined data available, and each customer runs his own database and his own clients with PowerPivot installed.
What I expect to happen is that I copy&paste the first few PowerPivot reports between customers (adjusting headers and such), and then each customer will then manage his reports himself. I will then only be called upon for new requirements that need the warehouse/etl to be adjusted or that require deeper PowerPivot knowledge (which I am not having yet).

Toby

Posts : 3
Join date : 2011-10-03

View user profile

Back to top Go down

Re: Time-Stamped transaction tracking not suitable, gets too big

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum