BIT vs CHAR
+2
ngalemmo
mnachu
6 posters
Page 1 of 1
BIT vs CHAR
I recently heard that using CHAR(1) is better than using BIT in data warehouse. Also the suggestion was to use 'Y'/'N' when using CHAR(1).
Does anybody know why would that be? I also heard that we should avoid using tinyint in star schema.
Both the suggestions were related to having the star schema in the SSAS Cubes.
Thanks,
Nachi
Does anybody know why would that be? I also heard that we should avoid using tinyint in star schema.
Both the suggestions were related to having the star schema in the SSAS Cubes.
Thanks,
Nachi
mnachu- Posts : 8
Join date : 2011-04-19
Re: BIT vs CHAR
The Y/N thing is primarily from a user-interface point of view. It's usually better understood by end users.
As far a tinyint goes, I would imagine that you don't want to get into a situation of having to modify the schema because the range of values goes beyond what you had expected. There really isn't any savings using tiny integers... saving 3 bytes just isn't worth it.
As far a tinyint goes, I would imagine that you don't want to get into a situation of having to modify the schema because the range of values goes beyond what you had expected. There really isn't any savings using tiny integers... saving 3 bytes just isn't worth it.
tinyint keys
ngalemmo wrote:There really isn't any savings using tiny integers... saving 3 bytes just isn't worth it.
What about 3 bytes * 500M fact records * 5 dimensions with tinyint keys = 7GB? :-)
Last edited by VHF on Tue Jul 26, 2011 4:28 pm; edited 1 time in total (Reason for editing : changed title)
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
tinyint keys
The advice to avoid tinyint with SSAS may be because there is a bug in SQL 2005/2008 SSAS in the Data Source View (DSV) where it doesn't correctly join a tinyint FK in the fact table with a tinyint dimension PK. I'm not sure if it has been fixed R2.
There are a couple workarounds to fix it (one involves editing the DSV XML). What you want to end up with with both the dimension PK and the fact FK having a DataType property of System.Byte. Then things will work correctly with a tinyint PK/FK.
I don't know enough about the internal architecture of SSAS to know if tinyint keys in the relational source translates to smaller storage in the cube.
There are a couple workarounds to fix it (one involves editing the DSV XML). What you want to end up with with both the dimension PK and the fact FK having a DataType property of System.Byte. Then things will work correctly with a tinyint PK/FK.
I don't know enough about the internal architecture of SSAS to know if tinyint keys in the relational source translates to smaller storage in the cube.
Last edited by VHF on Tue Jul 26, 2011 4:28 pm; edited 1 time in total (Reason for editing : changed title)
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
Re: BIT vs CHAR
VHF wrote:ngalemmo wrote:There really isn't any savings using tiny integers... saving 3 bytes just isn't worth it.
What about 3 bytes * 500M fact records * 5 dimensions with tinyint keys = 7GB? :-)
I'll give you the $2.19 the extra disk is going to cost you...
Re: BIT vs CHAR
ngalemmo wrote:I'll give you the $2.19 the extra disk is going to cost you...
At $0.31/GB you aren't running on a NetApp SAN! :-)
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
Re: BIT vs CHAR
You can't sum BIT, so I generally don't use them. I like the smallest numeric datatype and use values 1 or 0.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Re: BIT vs CHAR
ngalemmo wrote:OK... $79.99. That's my final offer.
Too funny! I'll have to share this with my network guys!
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
Re: BIT vs CHAR
Our network engineers tell me the SAN storage comes in at about $6/GB (when you buy SAN disk you are actually buying licensing to expand the SAN as well as disks and racks.) So it's a lot more expensive than buying a hard drive at Office Depot, but even at SAN prices I'm going to have to give you a refund of $37.99!
Now that I am (almost) convinced to go with 32-bit integer SKs for all dimensions, should I go with a single key generator for all dimensions or separate key generators for each dimension?
Now that I am (almost) convinced to go with 32-bit integer SKs for all dimensions, should I go with a single key generator for all dimensions or separate key generators for each dimension?
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
Re: BIT vs CHAR
I shop at Fry's... better prices.
If you can get away with it, one generator has the advantage of giving you mutually exclusive key values. It allows you to do some interesting things, such as creating a super-type dimension later on.
If you can get away with it, one generator has the advantage of giving you mutually exclusive key values. It allows you to do some interesting things, such as creating a super-type dimension later on.
Re: BIT vs CHAR
I can see where that might be very useful, for example a "shipping party" might be a vendor, a customer, or a internal entity. Piece of cake if the SKs are unique across the DW.
Now if I can only talk myself into going 32-bit all the way... even for those little tiny dimensions with only a handful of rows!
Now if I can only talk myself into going 32-bit all the way... even for those little tiny dimensions with only a handful of rows!
VHF- Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US
Re: BIT vs CHAR
Bit:
Will store 1 or 0 or null. Only takes a Bit to store the value (by definition!). Usually used for true or false, and many programming languages will interpret a bit as a true or false field automatically.
Char [1]:
A char takes 8 bits, or one byte, so its 8 times larger when stored. You can store (pretty much) any character in there. Will probably be interpreted as a string by programming languages. I think Char [1] will always take the full byte, even when empty, unless you use varchar or nvarchar.
Will store 1 or 0 or null. Only takes a Bit to store the value (by definition!). Usually used for true or false, and many programming languages will interpret a bit as a true or false field automatically.
Char [1]:
A char takes 8 bits, or one byte, so its 8 times larger when stored. You can store (pretty much) any character in there. Will probably be interpreted as a string by programming languages. I think Char [1] will always take the full byte, even when empty, unless you use varchar or nvarchar.
lorenzolamas- Posts : 3
Join date : 2011-10-29
Re: BIT vs CHAR
so its 8 times larger when stored.
Well, not really. Read your DB docs carefully. Whatever you do, columns need to be byte aligned for obvious reasons (sometimes word aligned, depending on the DBMS and OS). So, if the table contains a single bit column, it will take at least a byte to store it. What differs between DB implementations is what does the DB do if the table contains multiple bit columns... does it pack up to 8 such columns into a single byte or does it store each column in its own byte or does it depend if the columns are declared contiguously in the DDL.
Re: BIT vs CHAR
Bits can be in an index but they generally shouldn't be. The reason bit shouldn't be in an index is you only have 2 values (3 including null). For an index to be effective, the range of values should be larger than that and there should be a good deal of diversity in the values. According to the Microsoft SQL Server 2000 Performance Tuning Guide, the general rule is if a query accesses more than 20% of the rows for a given table, it's better to do a table scan than use an index. Given with equal distribution you're looking at 33.3% per value (I'm including the null... otherwise we're looking at 50%), that'd be a table scan. Hence the reason bits don't generally get put in indexes. It's possible, just typically not very useful.
As for space, SQL Server will automatically allocate 1 byte for the first bit. It'll fill in the rest of the byte with other bit columns as needed, up to 8. Then it'll start on the next byte if there are more bit columns. If space is a concern, take that into account (but it usually isn't). Therefore if you have more than 1 column of this sort, you will see a space savings over char(1) and tinyint, both of which will take up 1 byte per column.
As for space, SQL Server will automatically allocate 1 byte for the first bit. It'll fill in the rest of the byte with other bit columns as needed, up to 8. Then it'll start on the next byte if there are more bit columns. If space is a concern, take that into account (but it usually isn't). Therefore if you have more than 1 column of this sort, you will see a space savings over char(1) and tinyint, both of which will take up 1 byte per column.
bbjumpman- Posts : 3
Join date : 2011-11-04
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum
|
|