Admin
There is now an 11-digit zip, broken into groups of 5, 4, and 2 digits.
Admin
I agree with this. But being somewhat of a newbie to SQL, I've always declared my columns with a NOT NULL constraint to avoid the whole null issue. Are there downsides to doing that?
Admin
No, the @-symbol is not standard SQL. It is a Microsoft T-SQL distortion.
The Embedded SQL standard uses a colon to indicate a host variable like :first_name.
Also, Embedded SQL uses an INTO clause rather than = for variable assignments.
SELECT FIRST, LAST INTO :first_name, :last_name FROM EMP WHERE EMP_ID = 10;
The stored procedure standard, i.e. Persistent Stored Modules (PSM), might also use colons. Check the documentation. All implementations are non-standard today.
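For contrast, the T-SQL flavour seen in the article would look roughly like this (column sizes invented, same EMP table as above):
DECLARE @first_name varchar(50), @last_name varchar(50);
SELECT @first_name = FIRST, @last_name = LAST FROM EMP WHERE EMP_ID = 10;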
Admin
This is by far the best explanation for a story to date.
Admin
I blame XML for "let's make the names of everything be 100% self describing". That and dolts on facebook whose handles are "ShallowJerseyGirlWhoLovesColliesAndItalianFood"
Admin
nope, not really. a column should only allow nulls if it's actually valid for it to contain no data (and yes, an empty string is different to no data. i can't remember why that is the case off the top of my head, but it's true! look it up.)
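for example (made-up temp table, T-SQL flavour, just a sketch):
CREATE TABLE #Fax (Number varchar(30) NULL);
INSERT INTO #Fax VALUES ('');      -- fax number known to be "none"
INSERT INTO #Fax VALUES (NULL);    -- fax number simply not supplied
SELECT COUNT(*) FROM #Fax WHERE Number = '';      -- 1: only the empty string
SELECT COUNT(*) FROM #Fax WHERE Number IS NULL;   -- 1: only the row with no data
SELECT COUNT(*) FROM #Fax WHERE Number = NULL;    -- 0: NULL never equals anything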
you wanna watch those ntexts though, unless you're constrained to mssql2k or lower i'd go with nvarchar(max) (the ntext will be gone one day).
also, if in your example "First" is a name, i'd consider taking into account that normal people don't have names that long. but that might just be me.
carry on good sirs.
Admin
The original intent of Hungarian notation was to hold additional, and necessary, information about a variable; e.g. msTime, meaning time in milliseconds. This information was originally called the variable's 'type.' That 'type' was defined in the paper, but various people didn't read the definition and assumed it meant the literal type of the variable (integer, etc.).
some info about this can be found here http://www.joelonsoftware.com/articles/Wrong.html
Admin
Yeah, that's about the point where I say "fuck it" and fire up Profiler.
Admin
Hey! I object! I just friended her a couple days ago. We went to high school together, and she's a really sweet person. Shame! Shame on you!
Admin
That is by far the best analogy for why this is bad that I have ever seen, heard or thought of.
You sir, are a genius because you made me laugh at the end of a very hard day.
Admin
As far as I'm concerned, not only is there nothing wrong with that, I would go so far as to say it should be your default for all columns unless you have a specific reason to make it otherwise.
Admin
Why do you guys want crack, don't you know there are drugs in it?
Admin
I've never been able to figure out exactly what triggers it, but I can point to one table in one of our databases that has exactly this behaviour. If I do:
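Something along these lines, presumably (the exact query isn't shown; the table name is invented):
SELECT OtherField FROM TheBigTable WHERE SomeField = 1;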
Then I get a horrific index scan that takes forever. Note that SomeField is the first field in a clustered index, and OtherField is the primary key (which is nonclustered, and yes, there is a good reason for that). Note also that this table has billions of rows, so maybe there's just something in there that's confusing the optimizer. Nevertheless, huge difference in performance.
Admin
it might be because there are plenty of rows where SomeField = 1. Just guessing.
Admin
I'm forced to work with the company-wide system, which has all table and field names without vowels, including 'y', unless it's the first or last letter, or in 'ID'. Double letters are truncated down to one.
So you'll find a table like XmUsrsTms (Users Teams), with columns such as CpCseNts (Case Notes), CpFrstClntAdrs (First Client Address - yes, that's another typical problem)
Also, instead of using an identity column in the tables, there is a 'table of tables' holding the next ID to use. To add a new record, you have to go to this table, get the next ID, and update it.
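For the uninitiated, the ritual goes roughly like this (a T-SQL-ish sketch with invented names, and note the transaction they skip):
BEGIN TRANSACTION;
DECLARE @NextId int;
SELECT @NextId = NextId FROM TblOfTbls WITH (UPDLOCK) WHERE TblNm = 'XmUsrsTms';
UPDATE TblOfTbls SET NextId = NextId + 1 WHERE TblNm = 'XmUsrsTms';
INSERT INTO XmUsrsTms (Id) VALUES (@NextId);
COMMIT;
-- without the transaction (as in their app), two writers can happily grab the same ID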
Also, there are no stored procedures, views or functions, or lookup tables.
I did think, due to the table naming convention, that all the problems were just down to ignorance, but I now think the latter two at least are designed to 'keep the mystery'. Especially as the table IDs are made even more problematic by the fact that the consuming application these people built does NOT use transactions when using them - effectively preventing any external addition to the tables.
Same with the lack of SPs: both mean that even quite trivial external messing with the data has to be done by the developers. Need to add a record anywhere? They have to build a web service or a new screen for it.
I'm not saying that's the case here, though - this looks like just some BASIC developer sitting down in front of SQL for the first time:
"Hmm, what's first? Declare some variables, now fill them, then output them. What's the point of this 'select' on every line? Tch! What a waste of time. OK. EXECUTE. Oops, some values are NULL; I need to check if they're null somehow. Oh good, they have an IF at least. Right, sorted.
EXECUTE. What's wrong with the 'THEN'? You stupid machine???"
Phew! I hope it's not always this complicated.
I remember reading my first TSQL Lessons and thinking, "I'm on lesson 3, and they've not even mentioned how you loop yet FFS!".
Admin
if @lnFax is null begin set @lnFax = '' end
I love this snippet
Admin
Sorry, but I find that to be almost as dumb/OCD as his blogging about aliased fonts. Maybe it works for his eyes, but after reading "sName" enough times I'd want to sName somebody upside the head.
Although he writes off abstraction as not suitable for this situation, I have a hunch that it would work decently.
This may not be the best idea either...but why not make an object that contains the string and a flag stating if it's safe, and a method for cleaning the string and changing the flag? Then you change Write to something that will work transparently with your object or with a string, but throw an exception if you pass it the new datatype and "unsafe."
I don't know, there's got to be something better than creating that much new syntactic sugar. Type checking is something the computer can do better than your eyes, which is why HN is superfluous to begin with. I don't think it matters that the 'type' in this case doesn't match a built-in language type; it's still treated the same way.
Who knows, I haven't been doing this as long as any of you and I wouldn't be offended if anyone pointed out why what I'm saying or what I proposed is way off.
Admin
And why empty strings are not the same as nulls, regardless of what Larry Ellison says.
Admin
Obviously I have misunderstood the phrase 'crack programmer' all these years.
Admin
That's why I always make sure to train my users to type empty strings into the user interface instead of nulls.
uh...
Admin
I'm with you. BASIC used to have type declaration codes that were enforced by the compiler e.g. $ for string, % for integer etc.
So they deprecated them and encouraged Hungarian notation (strBlah, intBlah), which was more effort to type and unenforced by the compiler.
Then Microsoft said forget hungarian, use strict typing and use intellisense to view the variable type - and let the compiler enforce type checking.
Now they've defeated that by adding type inference to .NET despite the great VB4 "Variant data type" fiasco of the 1990s... sigh...
Admin
Teh awesome!
Admin
Awesome .. story in a (do{Nut}shell)
Admin
Not every country in the world uses 5+4 zip codes -- and not every country in the world uses 10 digit phone numbers. Why is it so hard to convince Americans of this?
Admin
Did you miss the ... "declare @lnCountry varchar(255)" ???
Well, at least the @lnTitle is long enough that all the titles fit in: "Prof Dr. rer., Dr. nat, Dark Lord, Holder of the Brass Scepter of the Dukedom of Whereveristan, Keeper of the goals, Warden of the Reindeer etc. pp. "
However, it is true, ZIP codes and phone numbers must be strings and long enough for everyone. Surely 255 chars is a bit of overhead, though. I guess 30 should be more than enough.
At least they chose varchar...
Admin
Actually, I think that the reason for 255 is a bit different. Unless you /really/ want to impose some kind of length limit, there is no point in putting anything other than the maximum there. A DB column declared as varchar(10) will take up just as much space as a column declared varchar(255). But if you declare it as varchar(10), you'll run into trouble later if for some reason you need to store a longer string there.
Admin
give them crack, and eat donuts while I watch the ensuing fun
Admin
Good lord. That's all I can say.....wtf??????
Admin
Why not use a boolean type for a boolean value?
The downside / specific reason is, what do you do when somebody has no, say, fax number?
No way.
"VARCHAR values are stored as a one-byte or two-byte length prefix plus data." "A varchar field will only use up as much space on disk as the data you store in it (plus a one or two byte overhead)." (MySQL 5.)
"For example, assume you declare a column VARCHAR2 with a maximum size of 50 characters. In a single-byte character set, if only 10 characters are given for the VARCHAR2 column value in a particular row, the column in the row's row piece stores only the 10 characters (10 bytes), not 50." (Oracle.)
"varchar(n) variables store non-fixed length character strings consisting of approximately n characters. They consume l+2 bytes of space, where l is the actual length of the string." (SQL Server.)
And so on...
In case you were speaking about the space allocated, then a varchar(n) column usually allocates n+1 or n+2 bytes, so there is still a difference between varchar(10) and varchar(255).
Admin
Ehh, you didn't get it...
What I meant was that you should read the documentation on your DBMS and decide on what type to use. As long as the overhead between choices A and B is the same, use the bigger one.
So, on MySQL, if your string will be short, use varchar(255). That way the storage space requirement will be N+1, where N is the number of characters. If your string might be longer than 255, use varchar(65535) - the maximum you can get with storage space being N+2. But there is no point in using varchar(100), because varchar(255) will have the same overhead, but be more flexible.
On MSSQL you would normally just use nvarchar(max), because anything less than that has the same storage requirements anyway.
I'm not well versed in Oracle, but you get the idea.
As for NULL columns - there is a point for them, because a NULL is a valid value as well, and denotes that the cell is empty. Using some sentinel values for that often makes the code more cumbersome. For example, I once had to work with a MySQL DB where there were no NULLs. So, when someone wanted to make a cell "empty", he put 00-00-0000 00:00 in there (valid in MySQL). Unfortunately .NET's DateTime type does not allow such a value. The hoops I had to jump through to get it working...
Also, NULLs are invaluable for foreign keys. (You DO use Foreign Keys, don't you?) All the JOINs are based around the fact that a non-existent key is NULL, not 0 or some other value.
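e.g. (made-up tables):
CREATE TABLE Team (TeamId int PRIMARY KEY, Name varchar(50) NOT NULL);
CREATE TABLE Emp  (EmpId int PRIMARY KEY, TeamId int NULL REFERENCES Team(TeamId));
-- employees with no team simply come back with a NULL team name; no sentinel rows needed
SELECT e.EmpId, t.Name FROM Emp e LEFT JOIN Team t ON t.TeamId = e.TeamId;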
Admin
And this line of reasoning is why nearly every app I'm currently running uses more memory than my video card had 6 years ago. Sure, that logic might work for some databases, but then some hack-job programmer says "Hey, I can do that in xxx too", only then it takes up an unnecessary amount of memory. Jebus people, how friggen hard is it to properly DESIGN AND SPEC your database tables? If your front end only allows 100 characters, make the backend only allow 100 characters. Inconsistencies only lead to failure.
Admin
Yes, I really don't get it. At all. If you declare a column as varchar(15), then 16 bytes of space get allocated (in single-byte), and if you fill it with just 8 chars, the actual space occupied is 9 bytes. (In double-byte, it would be 32 bytes allocated, 20 occupied.) Why in the world should I declare a varchar(255) --allocating 256 bytes of memory-- to actually use only 9 and leave 247 unused? For each column of each record of each table maybe? I've tried counting the amount of allocated (and unused) space in a 100-table DB made of 100-column, 1M-records tables declared that way... but my head imploded.
Side note: yes, I DO use foreign keys. And, if I ever catch a NULLable column involved in a foreign key, I'll pretty much kill the culprit.
Admin
What I want to know is how all those nulls got in there in the frist place. My keyboard doesn't even have a NULL key.
Admin
I think you got it backwards.
If you declare it varchar(15), and don't fill it, then it will take 1 byte on the disk and 1 byte in memory.
When you put an 8-byte-long string in there, it will allocate a few more bytes and will now take 9 bytes on disk and 9 bytes in memory.
If you have a table with a varchar(255) column, and 100'000 rows, but each row only has 3 characters stored in that column, then the table will take up 400'000 bytes on the disk, in RAM, and wherever else you put it. Well, plus some inevitable overhead of other origins, but that's beside the point.
varchar is not like char, which actually allocates that many bytes for every row and leaves them empty if not used. That's the whole point of varchar! It takes up only as much memory as needed.
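You can see it for yourself on SQL Server (throwaway temp table):
CREATE TABLE #T (Txt varchar(255));
INSERT INTO #T VALUES ('abc');
SELECT DATALENGTH(Txt) FROM #T;   -- 3: only the bytes actually stored, not 255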
Admin
That's exactly why I would not declare a varchar(255) to store a phone number or any other short string.
Admin
This is ORACLE SQL, so the function you're looking for is NVL(fieldname,replacementvalueforwhennull)
Yeah - one select with NVL() wrapped fields would have sufficed.
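i.e. something along these lines (column and variable names guessed from the snippets above; and remember Oracle treats '' as NULL, so use a real placeholder):
SELECT NVL(FIRST, 'n/a'), NVL(LAST, 'n/a'), NVL(FAX, 'n/a')
  INTO :first_name, :last_name, :fax
  FROM EMP
 WHERE EMP_ID = 10;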
Admin
TopCoder, is that you?
Admin
Store it as an empty string!
Are you REALLY trying to distinguish between when a user never enters a fax number, and when they type in a fax number and then delete it? Users cannot enter NULLs into the user interface, so to me, user-entered fields should never support NULL.
Date fields might be an exception, but I personally find it more practical to have a magic date that means "no date". At least then you'll only get a wtf date on your printout if you make a coding error, rather than crashing the app.
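e.g. (just one made-up way to do it):
CREATE TABLE Task (
    TaskId  int PRIMARY KEY,
    DueDate datetime NOT NULL DEFAULT '19000101'   -- '1900-01-01' plays the role of "no date"
);
SELECT TaskId FROM Task WHERE DueDate <> '19000101';  -- tasks that actually have a due date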
Admin
However, as I already mentioned elsewhere, this is going to get off-topic. I won't pretend I can really grasp the guts of data management. I'll be content knowing I will not make 20 queries to retrieve one lonely record.
Admin
Fair point. OK, let's drop this here. If you would still like to argue about this with me (I'd really like to hear at least one good reason why NOT to make them max size), I've opened a question on Stack Overflow:
http://stackoverflow.com/questions/1261441/what-size-to-pick-for-a-nvarchar-column
Admin
See Oracle.
Admin
It might be that I worked too much in an Oracle environment, so I am stuck on its (perhaps outdated, BTW) mechanism, but: let's say you have a table with a column declared as varchar(255) -- varchar2(255) in Oracle, if you prefer:
I hope I've been able to make it clearer now. Of course I might be wrong - I'm recalling it all from memory and NOT looking at the docs (otherwise I'd spoil the discussion! ;) and maybe be more precise :/ ). Let me quote you:
^_^I'll follow the Stack Overflow thread, thanks for sharing.
Admin
A short sample extracted from that mess should suffice as an explanation:
One SQL statement, one round trip to the db, for initializing first and last names with the first and last name fields from a table (if the id matches).
Then another SQL statement to null them out. Why not just test their individual values with plain-vanilla, easy-to-read IF statements?
One single SQL statement followed by locally executed control and assignment statements should suffice to initialize all those variables.
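Something in this direction (variable and column names guessed):
-- one round trip...
SELECT @lnFirstName = FIRST, @lnLastName = LAST, @lnFax = FAX
  FROM EMP
 WHERE EMP_ID = @lnEmpId;
-- ...then cheap local checks, no further trips to the server
IF @lnFirstName IS NULL SET @lnFirstName = '';
IF @lnLastName  IS NULL SET @lnLastName  = '';
IF @lnFax       IS NULL SET @lnFax       = '';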
But instead, the whole thing is a collection of SQL statements being executed, each of which can be expensive.
Well, by themselves and from the point of view of a single client executing them, they are not.
In general, and unless otherwise indicated by some sort of invariant, you should assume every SQL statement (or any form of access to a remote resource) is expensive. This is the type of complexity that cannot be factored out for the sake of simplicity. Design should take it into account and attempt to minimize how often those statements execute.
Incompetent programmers treat them as if they were local method calls with no overhead. Big no-no.
Get a sufficiently large number of clients and/or execute that sh1t at a sufficiently high rate (especially continuously over a sufficiently long period of time), and you are pretty much nuking the living crap out of the capacity of the database server (and perhaps the network as well.)
I've seen this type of supposedly innocuous crap (innocuous from the "single machine" point of view of a crappy developer) bring systems down during peak times. Bad beyond description.