Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Varchar2 sizing in DW

+2
ngalemmo
barrettd
6 posters

Go down

Varchar2 sizing in DW Empty Varchar2 sizing in DW

Post  barrettd Thu Sep 02, 2010 2:57 pm

Is there a good reason not to make every varchar2 field the same large length, varchar2(255) for example?

I don't want to use the database to enforce size rules as I will still want to insert the oversized value, along with an error record in the audit table.

barrettd

Posts : 1
Join date : 2009-08-18

Back to top Go down

Varchar2 sizing in DW Empty Re: Varchar2 sizing in DW

Post  ngalemmo Thu Sep 02, 2010 4:19 pm

From a database standpoint it doesn't matter (heck, use VARCHAR2(4000)).

But, it does matter from a documentation/useability standpoint. The documentation benefits should be obvious. On the useability side, most query tools will attempt to size columns and fields based on their size declaration in the database. Making everything large will often require the user to manually adjust the display width in reports. It can get to be real annoying...
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Varchar2 sizing in DW Empty What is the answer to the Varchar2 debate?

Post  ianc Mon Nov 22, 2010 11:50 am

This is a subject that keeps occurring and I've yet to find a definitive answer. I'd like to know what the professional opinion is with regards to rapid development and saving costs in an industrial environment. It is of interest with regards to development standards.

In my scenario, the DW accepts data from multiple databases. Much of the data is common, so should populate existing fields. The number of inputs is likely to grow. We have scenarios where reading from a new database is necessary (over time) that increases the size of fields in our existing conformed dimensions (fields used for keys, codes, and descriptions). This means we have to regression test our existing code if we increase sizes, so ideally I'd like us to only have to go back over the minimum.

To speed up development (and reduce the amount of changes to existing code), can you see any drawbacks in the over sizing of VARCHAR2 fields (not excessively, but to a happy medium). As VARCHAR2 only uses the storage that it needs to, is there any harm?

I am aware that some products / developer code may read the DDL definitions and size fields accordingly. The only problem that I may have heard of could be to do with composite indexes and space used if they are on the oversized fields (exceeding a maximum allowed by Oracle).

I've asked trainers and DBAs (from Oracle) who say they can see the advantage of over sizing the fields and that there should not be a performance impact. Other DBAs I've asked say that there may be problems with composite indexes and memory usage. Other DBAs I've asked have said to stop being lazy and just profile / size the fields correctly. Other warehousing experts (of successful systems) I have spoken to recommend over sizing. Tom Kyte's website explains that the DDL metadata is there to enforce data integrity and prevent users from inputting excessive values so we should size accordingly. It also refers to the business requirement. There may also be issues with establishing maximum sizes where we have fields that are derived from derived fields.

I'd just wondered if you'd seen this concept before in your organisations, and what your opinion was, and what best practice was. I’d like to be able to say that when creating DDL, set the field size for all codes to varchar2(x), all descriptions to varchar2(y) etc. Having standarised data types in can be useful when performing joins (in particular work table stages prior to the presentation layer) especially around the use of indexes (with Oracle not having to do conversions e.g. number vs varchar).

ianc

Posts : 1
Join date : 2010-11-22

Back to top Go down

Varchar2 sizing in DW Empty Re: Varchar2 sizing in DW

Post  John Simon Mon Nov 22, 2010 6:07 pm

In SQL Server at least, the maximum size of an index is 900 bytes. So large varchar fields will not be indexable. Generally if I see a table with all fields as varchar(255), the table was built either in a hurry, or by someone not experienced in building data warehouses.

John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

http://jsimonbi.wordpress.com

Back to top Go down

Varchar2 sizing in DW Empty Re: Varchar2 sizing in DW

Post  BoxesAndLines Mon Nov 22, 2010 9:17 pm

I didn't think testing was that difficult. With the testing tools available today, regression testing can be simple as kicking off a test suite of scripts. Personally, I'd rather have my ETL processes fail when it encounters invalid data. That way, I can actively address data quality. For example, if I have a code column that is historically two bytes and one day I get a sampling of rows with lengths > 100 bytes, I want to know. Otherwise, the business users will soon be calling in complaints on bad data.
BoxesAndLines
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

Back to top Go down

Varchar2 sizing in DW Empty Re: Varchar2 sizing in DW

Post  ngalemmo Tue Nov 23, 2010 5:00 am

Bottom line, using some arbitrary large column size is just plain sloppy. Spend the time to understand the data you are modeling and act accordingly. I would only consider doing something like that with a staging table.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Varchar2 sizing in DW Empty My 2 cents

Post  g8rpal Fri Dec 03, 2010 6:43 pm

I've also experienced the index issue described above when a data architect at my previous employer attempted to make all the columns nvarchar(255) for all text based columns in our dimensions.

I'd also like to add that if you are using Microsoft SQL Server Integration Services (SSIS) for your ETL process it can slow down the package loads tremendously. Whenever you apply any transformations to the data it fills up the buffer very quickly causing the engine to have to constantly swap data to disk.

Secondly, if your source column is char(2) you are actually using (a bit) more storage than necessary to store the data.

Great question!

g8rpal

Posts : 10
Join date : 2010-12-03
Location : Jacksonville, FL

Back to top Go down

Varchar2 sizing in DW Empty Re: Varchar2 sizing in DW

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum