BIT vs CHAR

mnachu Wed Jun 15, 2011 10:13 pm

I recently heard that using CHAR(1) is better than using BIT in data warehouse. Also the suggestion was to use 'Y'/'N' when using CHAR(1).

Does anybody know why would that be? I also heard that we should avoid using tinyint in star schema.

Both the suggestions were related to having the star schema in the SSAS Cubes.

Thanks,
Nachi

ngalemmo Wed Jun 15, 2011 10:34 pm

The Y/N thing is primarily from a user-interface point of view. It's usually better understood by end users.

As far a tinyint goes, I would imagine that you don't want to get into a situation of having to modify the schema because the range of values goes beyond what you had expected. There really isn't any savings using tiny integers... saving 3 bytes just isn't worth it.

VHF Tue Jul 26, 2011 4:15 pm

ngalemmo wrote:There really isn't any savings using tiny integers... saving 3 bytes just isn't worth it.

What about 3 bytes * 500M fact records * 5 dimensions with tinyint keys = 7GB? :-)

VHF Tue Jul 26, 2011 4:18 pm

The advice to avoid tinyint with SSAS may be because there is a bug in SQL 2005/2008 SSAS in the Data Source View (DSV) where it doesn't correctly join a tinyint FK in the fact table with a tinyint dimension PK. I'm not sure if it has been fixed R2.

There are a couple workarounds to fix it (one involves editing the DSV XML). What you want to end up with with both the dimension PK and the fact FK having a DataType property of System.Byte. Then things will work correctly with a tinyint PK/FK.

I don't know enough about the internal architecture of SSAS to know if tinyint keys in the relational source translates to smaller storage in the cube.

ngalemmo Tue Jul 26, 2011 4:42 pm

VHF wrote:
ngalemmo wrote:There really isn't any savings using tiny integers... saving 3 bytes just isn't worth it.

What about 3 bytes * 500M fact records * 5 dimensions with tinyint keys = 7GB? :-)

I'll give you the $2.19 the extra disk is going to cost you...

VHF Tue Jul 26, 2011 4:46 pm

ngalemmo wrote:I'll give you the $2.19 the extra disk is going to cost you...

At $0.31/GB you aren't running on a NetApp SAN! :-)

ngalemmo Tue Jul 26, 2011 6:12 pm

OK... $79.99. That's my final offer.

BoxesAndLines Tue Jul 26, 2011 6:12 pm

You can't sum BIT, so I generally don't use them. I like the smallest numeric datatype and use values 1 or 0.

VHF Wed Jul 27, 2011 9:45 am

ngalemmo wrote:OK... $79.99. That's my final offer.

Too funny! I'll have to share this with my network guys!

VHF Fri Jul 29, 2011 11:24 am

Our network engineers tell me the SAN storage comes in at about $6/GB (when you buy SAN disk you are actually buying licensing to expand the SAN as well as disks and racks.) So it's a lot more expensive than buying a hard drive at Office Depot, but even at SAN prices I'm going to have to give you a refund of $37.99!

Now that I am (almost) convinced to go with 32-bit integer SKs for all dimensions, should I go with a single key generator for all dimensions or separate key generators for each dimension?

ngalemmo Fri Jul 29, 2011 4:26 pm

I shop at Fry's... better prices.

If you can get away with it, one generator has the advantage of giving you mutually exclusive key values. It allows you to do some interesting things, such as creating a super-type dimension later on.

VHF Fri Jul 29, 2011 4:49 pm

I can see where that might be very useful, for example a "shipping party" might be a vendor, a customer, or a internal entity. Piece of cake if the SKs are unique across the DW.

Now if I can only talk myself into going 32-bit all the way... even for those little tiny dimensions with only a handful of rows!

lorenzolamas Wed Nov 02, 2011 1:39 am

Bit:
Will store 1 or 0 or null. Only takes a Bit to store the value (by definition!). Usually used for true or false, and many programming languages will interpret a bit as a true or false field automatically.
Char [1]:
A char takes 8 bits, or one byte, so its 8 times larger when stored. You can store (pretty much) any character in there. Will probably be interpreted as a string by programming languages. I think Char [1] will always take the full byte, even when empty, unless you use varchar or nvarchar.

ngalemmo Wed Nov 02, 2011 11:41 pm

so its 8 times larger when stored.

Well, not really. Read your DB docs carefully. Whatever you do, columns need to be byte aligned for obvious reasons (sometimes word aligned, depending on the DBMS and OS). So, if the table contains a single bit column, it will take at least a byte to store it. What differs between DB implementations is what does the DB do if the table contains multiple bit columns... does it pack up to 8 such columns into a single byte or does it store each column in its own byte or does it depend if the columns are declared contiguously in the DDL.

bbjumpman Tue Nov 29, 2011 5:37 am

Bits can be in an index but they generally shouldn't be. The reason bit shouldn't be in an index is you only have 2 values (3 including null). For an index to be effective, the range of values should be larger than that and there should be a good deal of diversity in the values. According to the Microsoft SQL Server 2000 Performance Tuning Guide, the general rule is if a query accesses more than 20% of the rows for a given table, it's better to do a table scan than use an index. Given with equal distribution you're looking at 33.3% per value (I'm including the null... otherwise we're looking at 50%), that'd be a table scan. Hence the reason bits don't generally get put in indexes. It's possible, just typically not very useful.

As for space, SQL Server will automatically allocate 1 byte for the first bit. It'll fill in the rest of the byte with other bit columns as needed, up to 8. Then it'll start on the next byte if there are more bit columns. If space is a concern, take that into account (but it usually isn't). Therefore if you have more than 1 column of this sort, you will see a space savings over char(1) and tinyint, both of which will take up 1 byte per column.

Sponsored content

BIT vs CHAR

BIT vs CHAR

Re: BIT vs CHAR

tinyint keys

tinyint keys

Re: BIT vs CHAR

Re: BIT vs CHAR

Re: BIT vs CHAR

Re: BIT vs CHAR

Re: BIT vs CHAR

Re: BIT vs CHAR

Re: BIT vs CHAR

Re: BIT vs CHAR

Re: BIT vs CHAR

Re: BIT vs CHAR

Re: BIT vs CHAR

Re: BIT vs CHAR