type 1 dimension - new requirements for attributes that will be updated often

View previous topic View next topic Go down

type 1 dimension - new requirements for attributes that will be updated often

Post  topcat on Tue Feb 05, 2013 1:50 pm

i have a type one product dimension which gets updated maybe once every 6-8 weeks. new requirements are for attributes that will likely be updated quite often, possibly once a day, but most likely 2-3 times a week. the product dimension is quite large so i thought to break the new attributes out into their own dimension (for performance reasons). the new attributes are pretty unique to the product so pretty much 1-to-1 with the product itself. how should this be modeled? should the new dimension have the same PK as the product dimension? include the natural keys in the new dimension as well?
note that there is no reason to track history for any of the data.

thanks!

topcat

Posts : 19
Join date : 2012-08-09

View user profile

Back to top Go down

Re: type 1 dimension - new requirements for attributes that will be updated often

Post  ngalemmo on Tue Feb 05, 2013 2:32 pm

What are you concerned about that would make you consider splitting the table?

Daily updates, particularly in a type 1 dimension should not be a major concern. Do you have performance or service level requirements that you are not able to maintain?

Anyway, if you do, the technique is called 'vertical partitioning'. You would use the same PK on both tables. Only one needs to carry the natural key.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

re: type 1 dimension - new requirements for attributes that will be updated often

Post  topcat on Tue Feb 05, 2013 2:55 pm

Concerned about performance, the dimension is very wide and will potentially contain a quarter million records. The dimension values are "behavioral" dimensions based on facts. for example, something along the lines of how many times the product was ordered (its not that exactly, but something similar). So overall, i am concerned about the db engine updating 100K+ records which are each 10k characters wide. The other thing i didnt mention is the db engine (netezza) works on an insert only strategy, so updates are really inserts with a later reclaim process that removes the old inactive records.

topcat

Posts : 19
Join date : 2012-08-09

View user profile

Back to top Go down

Re: type 1 dimension - new requirements for attributes that will be updated often

Post  ngalemmo on Tue Feb 05, 2013 10:30 pm

Netezza... ok. Vertical partitioning on a Netezza platform is an effective way to improve query performance of a very wide table. But you need to break it down in a way that minimizes the time you need more than one table in a typical query. One of the vertical partitions should be the 'rarely used' collection of attributes... hopefully these may be a sizable chunk of a row that you remove from query overhead almost all the time. This may mean having some attributes in common across tables.

Make sure all tables are distributed on the primary key, and use the same primary key for all tables. Organizing the tables on one or two commonly used predicates will also help. In V7.0 they are going to a finer grained zone map (smaller disk blocks) which will improve the results of organization and zone maps in general.

Another effective mechanism is to declare materialized views. A Netezza materialized view is restricted to columns from one table (no joins). Its purpose is to create vertical partitions to help query performance. The difference is it materializes a copy of those columns from the main table and it keeps the content in sync with the base table. Another advantage it is transparent to the query. The user submits a query to the main table. The query optimizer will check to see if there is an appropriate materialized view and use that instead. User doesn't need to know or care about the views.

A 100K+ row table is no big deal. If you have a moderate sized machine with 48 SPUs, you are dealing with 48 little tables of around 2K rows each. At an average size of 10K bytes, each spu would need to read, at most, 20MB of disk, cache what it needs to memory (it only pulls the attributes you use in the query, so cache used is much, much less that 20MB), then a hash join to the fact table (with any other needed dimensional attributes in memory). Its very quick. If you can come up with no more than 3-4 different subsets (because of the overhead to maintain these tables when updating the base table), materialized views may be a more effective use of the machine.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: type 1 dimension - new requirements for attributes that will be updated often

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum