The Daily WTF: Curious Perversions in Information Technology

akatherder · 2007-05-18 Reply Admin

This reminds me of "this works but there has to be a better way".

Of course, I'd be surprised if it even worked all the time.

2007-05-18 Reply Admin

It's very selfish and wasteful, in any case, to use GUIDs as row identifiers. Don't they know there's only a finite number of those things to go round?

Lisae · 2007-05-18 Reply Admin

Wonders what's wrong with using GUIDs anyway, unless this is going to be a small, local database. What to use instead? An auto-increase field? Those are even worse. And you don't want to bother the user to come up with an unique ID every time so GUIDs are useful.

jimlangrunner · 2007-05-18 Reply Admin

Makes me want to cry. I had to read the abortion twice to figure out what it might say.

Maybe it's obfuscation? Job security? Nobody else can read it, so they can't fire me!

2007-05-18 Reply Admin

Excuse my ignorance, but isn't a GUID a little harder to come up with than an auto-incremented number? Don't most modern databases have the notion of a sequence object whose sole purpose is to return unique (semi-) sequential numbers? And how exactly is an auto-increment a bad thing?

2007-05-18 Reply Admin

Why on earth would you spend time refactoring a database to remove GUID primary keys? As good old Jeff Atwood says:

I know what you're thinking. Using sixteen bytes instead of four bytes for a primary key? Have you lost your mind? Those additional 12 bytes do come at a cost. But that cost may not be as great as you think. -- Primary Keys: IDs versus GUIDs

The linked post has links to several articles that address the debate.

I can certainly see not choosing GUIDs when designing a database, but committing resources to removing them and replacing them with integers? WTF?

TheD · 2007-05-18 Reply Admin

Excuse my ignorance, but isn't a GUID a little harder to come up with than an auto-incremented number? Don't most modern databases have the notion of a sequence object whose sole purpose is to return unique (semi-) sequential numbers? And how exactly is an auto-increment a bad thing?

I don't know how much extra overhead is added when you start generating GUIDs versus autoincrement fields. I really like using GUIDs, especially in cases where I need to know the identifier before the insert.

2007-05-18 Reply Admin

Indeed, I also use guid's for record identifers. In Sql 2005, I haven't had any performance problems with them. I know that's not the best way of doing things, but I don't see any reason to change something that's working fine just to regain a few bits per record.

--Taladon [email protected]

2007-05-18 Reply Admin

I see the WTF is the article itself

"K.P. adds, You may notice the use of GUIDs. The resident developers were nearly moved to tears at the news that we would be eliminating GUIDs as row identifiers. Apparently a former “guru” had informed them that numeric row Ids are for row counting, and it is safest to use GUIDs for each PK, “just in case we ever have to use replication.”"

2007-05-18 Reply Admin

This is definitely a case of a procedural programmer not grokking set theory. "Hmm, we need a loop to change an element of an array. That's how we would do it in C..."

As for GUIDs as row identifiers -- I can understand them for a couple of primary tables. But for every table? Your indexes alone would choke the system to a crawl.

KattMan · 2007-05-18 Reply Admin

GUIDS as primary keys, in case of replication. Yes this is somewhat valid, depending on your replication scheme. Keep in mind the ability to insert an auto-increment ID is not given by default, and often set to say it is not for replication. With this in mind all your relationships can be broken by a simple replication process.

2007-05-18 Reply Admin

Their are pros and cons to using GUIDs as unique row identifiers. I certainly wouldn't call it a WTF.

Jeff Atwood, a contributor to this very site, wrote about it in his blog a while back: http://www.codinghorror.com/blog/archives/000817.html

KattMan · 2007-05-18 Reply Admin

As far as the SQL statement is concerned, I have done things similar, but this is when I had to receive an XML file into a stored proc, load it into a table and make updates based off of each record in this new table.

As it applies in this particular case... where policyGUID = @PolicyGUID This means it will be only one record ever, basically a waste.

Sounds like the DB developer used a pattern everywhere it wasn't needed.

savar · 2007-05-18 Reply Admin

WTF!:
I see the WTF is the article itself
"K.P. adds, You may notice the use of GUIDs. The resident developers were nearly moved to tears at the news that we would be eliminating GUIDs as row identifiers. Apparently a former “guru” had informed them that numeric row Ids are for row counting, and it is safest to use GUIDs for each PK, “just in case we ever have to use replication.”"

I read the phrase "well-known healthcare provider" to mean "data out the wazoo". For high throughput situations, GUIDs are unnecessary overhead that makes your indices huge and harder to cache.

2007-05-18 Reply Admin

Blimey. I've seen this before at the SQL end of our shop. Similar thing, huge long cursory mess that refactors out to 'INSERT INTO a SELECT column FROM b'.

What it turned out to be, basically, was that the developer in question had been taught that cursors would be required to do some very advanced queries in SQL.

You got it - "therefore" SQL code is not advanced unless it features a cursor. You want your code to be advanced, don't you?

2007-05-18 Reply Admin

Well, SQL Server has identity ranges now, which seem to work quite well. Prior to that we used a composite key consisting of a site identifier and a identity (autoincrement) column (in fact we often still do). Has the advantage of letting us directly query a variety of things based on the site that created them, whether or not it was a table where we cared enough about the individual user who created that row to store it in the record. Of course, we're merge replicating satellite sites rather than remote users, so the number of subscribers is (relatively) low, and we had a table storing each site's details anyway, and hence a unique site ID to use in the composite key.

travisowens · 2007-05-18 Reply Admin

Yes there are times where GUIDs are a superior choice to just using an incrementing ID. When you need your application to know the entry key in advance before INSERTing the data. Since you are generating the GUID within the app, there's no measurable performance hit, but there is a performance hit if you have to retrieve the resulting incremental ID after your insert. There are even some instances in SQL Server where the ID returned will be the wrong ID. Hence generating your own key (GUID) in your app before you ever touch the database can sometimes be the best solution.

When working with simpler and smaller apps, incremental IDs are fine. I'd also like to point out I thought incremental IDs were the only way to go until I started working on larger and complex projects, so if you think I'm wrong, perhaps you haven't done something big enough yet.

2007-05-18 Reply Admin

KattMan:
GUIDS as primary keys, in case of replication. Yes this is somewhat valid, depending on your replication scheme. Keep in mind the ability to insert an auto-increment ID is not given by default, and often set to say it is not for replication. With this in mind all your relationships can be broken by a simple replication process.

Oops forgot to quote - my bad! My comment about identity ranges/primary keys was in reference to this

2007-05-18 Reply Admin

Priceless!

2007-05-18 Reply Admin

Thats right GUIDs must be bad. I love having so write sql and supporting code that returns values from insert statements.

It looks soooo enterprisey especially when I do bulk inserts :-p

2007-05-18 Reply Admin

KattMan:
As far as the SQL statement is concerned, I have done things similar, but this is when I had to receive an XML file into a stored proc, load it into a table and make updates based off of each record in this new table.
As it applies in this particular case... where policyGUID = @PolicyGUID This means it will be only one record ever, basically a waste.

Sounds like the DB developer used a pattern everywhere it wasn't needed.

That is assuming the policyGUID is the primary key, I would venture to say that it is more likely a foreign key in this case since its a Patient table.

2007-05-18 Reply Admin

damn, beat me to it

2007-05-18 Reply Admin

travisowens:
There are even some instances in SQL Server where the ID returned will be the wrong ID. Hence generating your own key (GUID) in your app before you ever touch the database can sometimes be the best solution.

This is only the case if you use @@IDENTITY. Using SCOPE_IDENTITY() returns the most recent identity within the scope of the sproc.

2007-05-18 Reply Admin

It is pity it was not told if GUIDs are generated on server or on clients. But since it was made "for replication"... i think on server, where SEQUENCEs (GENERATORs in Interbase/Yaffil/old-Firebird terms) could be better. And what about if client would have a broken GUID-generating library ?

But afterall, if this necessary WTF ? Imagine You (personally You) had to port the running system from ISAM to SQL. And you have to do it fast, now and alone. Wouldn't there semi-automatic conversion from xBase-like language into this procedural SQL most quick, easy and error-safe way, comparing to rethinking each algorithm and then inventing the query, hopefully corect one. Don't fix the working program.

Sure at some time, when SQL server will be becoming more value as DMBS and less value as large storage with easy backup, then this scheme will produce bottlenecks(in other word would not be 'working program' no more), and those need to be found and refactored... Oh, isn't this exactly what the writer was hired to do ?

CAPTCHA: RIAA... Jesus, i hoped to have a good weekend! :-(

2007-05-18 Reply Admin

Dang, that is like going from Pittsburgh, PA to New York, NY via Los Angeles, CA.

You still get there, but who would choose to go that way?

Sean · 2007-05-18 Reply Admin

JTK:
This is definitely a case of a procedural programmer not grokking set theory. "Hmm, we need a loop to change an element of an array. That's how we would do it in C..."
As for GUIDs as row identifiers -- I can understand them for a couple of primary tables. But for every table? Your indexes alone would choke the system to a crawl.

Untrue. There is a very interesting article that IMO serves well to debunk that myth.

http://www.informit.com/articles/article.asp?p=25862&seqNum=1&rl=1

Of course, the GUID is 4x larger than an int, but in the era of cheap memory and storage, it is hardly an index-killer. It doesn't necessarily help with all types of replication, but using GUIDs as PKs makes merging databases a snap. The article includes an algorithm for creating an ordered GUID (COMB), which vastly improves performance when that GUID is or is part of a clustered index, to the point that it is just slightly less than the int-key baseline.

However, one hidden benefit that is not mentioned in this article is the scalability of each choice. Using an autoincrementing integer can sometimes lead to tables being locked for an unnecessarily long amount of time -- if you generate a new ID, you have to lock the entire table until the INSERT transaction has completed. If you have a system with a lot of concurrency (and I mean a LOT), this type of locking can become quite expensive in a large, heavily-used table.

This is not to say the GUID is a panacea. It has its drawbacks, especially if you're storage-constrained. But for modern database systems on relatively modern hardware, it's something to think about when designing your architecture.

2007-05-18 Reply Admin

Lazgen:
Dang, that is like going from Pittsburgh, PA to New York, NY via Los Angeles, CA.
You still get there, but who would choose to go that way?

I went from Philadelphia, PA to West Virginia by way of Detroit a few months ago.

NorthWest has a hub in Detroit, and I was told to get the cheapest fare I could find. So a two hour flight became an all-day event.

Hey, I got paid.

2007-05-18 Reply Admin

I've been using GUID rather than ints, longs and counters since about 2000, when I read Kimball's Data Warehouse book. If handling a lot of new rows is a problem, then GUIDs beat counters hollow. Especially so when these INSERTs are coming from distributed sources -- GUIDs avoid the bottleneck of allocating new unique IDs from a single sequence generator.

2007-05-18 Reply Admin

A word about the statement. It probably came a jr dev following a strict "don't use an update with a where clause that isn't the primary key of the table". They probably had issues with replication of cross db udpates in the past and went quite draconian.

2007-05-18 Reply Admin

But the GUIDs may very well help you to build some sort of replications scheme, which in turn may help to share some load among several boxes. Not to mention the situation where you have to be able to move the data to similar, but not identical, system...

I wouldn't probably use them as the primary key, though.

2007-05-18 Reply Admin

gah:
I wouldn't probably use them as the primary key, though.

But what would I use them for? Nevermind, my brain's not working very well right now.

2007-05-18 Reply Admin

Yeh, its funny when the new 'expert' on a project needs to badmouth the old expert but doesn't have a clue what they're talking about.

2007-05-18 Reply Admin

travisowens:
... if you think I'm wrong, perhaps you haven't done something big enough yet.

If this thing had a voting system, I'd give you full marks.

Denis Troller · 2007-05-18 Reply Admin

I can certainly see not choosing GUIDs when designing a database, but committing resources to removing them and replacing them with integers? WTF?

Been there, done that. Had a database with a GUID as a PK in each and every one of the tables (indeed for the exact same reason "we might want easy replication"). It seemed fine at first, but the insert time seemed to increase exponentially...

Moving away from GUIDs to integer sped up the software by an insane amount of speed. After a certain amount of rows with GUIDs, inserting a new one becomes pretty slow, and your logfile turns gigantic. Inserting a legacy chunk of data (out of text files) turned from "not being done in an afternoon" to "done under 2 hours".

And I won't even mention the speed difference for a simple "Count(*)".

Last, but not least, GUIDs make debugging a nightmare when trying to follow what screwed up (memorizing two or three integers is easy, but you have to copy paste each GUID, which makes it a wee bit harder).

Granted, I'm no database "expert" and there might have been a way to make it work with GUIDs, but if you're not sure how to achieve performance, I say stay away from them.

Gsquared · 2007-05-18 Reply Admin

mattnaik:
travisowens:
There are even some instances in SQL Server where the ID returned will be the wrong ID. Hence generating your own key (GUID) in your app before you ever touch the database can sometimes be the best solution.

This is only the case if you use @@IDENTITY. Using SCOPE_IDENTITY() returns the most recent identity within the scope of the sproc.

Which can be a problem if you have multiple insert statements in the same proc and one of them fails. Have to be careful of errors in that. Best to check @@rowcount on your insert before using any of these.

2007-05-18 Reply Admin

Lisae:
Wonders what's wrong with using GUIDs anyway, unless this is going to be a small, local database. What to use instead? An auto-increase field? Those are even worse. And you don't want to bother the user to come up with an unique ID every time so GUIDs are useful.

Oh, I don't know, maybe you could use the natural keys?

2007-05-18 Reply Admin

Denis Troller:
And I won't even mention the speed difference for a simple "Count(*)".
Last, but not least, GUIDs make debugging a nightmare when trying to follow what screwed up (memorizing two or three integers is easy, but you have to copy paste each GUID, which makes it a wee bit harder).

Interesting: I've never noticed a speed difference or even heard this argument made, and I've been using guids since 2000. And I've heard the argument about "normal numbers" being easy to remember and write down before, but I've never had problems with doing some simple cut and pastes, and my memory is crap anyway.

captcha: pinball. Good idea!

2007-05-18 Reply Admin

CaptchaFillrOutr:
Oh, I don't know, maybe you could use the natural keys?

I knew it was only a matter of time before somebody served that nugget. I just hope it was intended to be ironic.

2007-05-18 Reply Admin

You're right. You're no expert. A simple index defragment or reindex (depending on what SHOWCONTIG showed you) would have given you the same speed increase for about 10 minutes worth of the work. Most of the increase came not from switching data types, but from the fact that by switching data types, the indexes were rebuilt.

You have to know what you're using and know how to use it. Just saying "use A all the time!" and blindly following some database design "recipe" will inevitably lead you to make some pretty bad errors.

2007-05-18 Reply Admin

KattMan:
With this in mind all your relationships can be broken by a simple replication process.

Happened to me once. My relationship broke unexpectedly while I was in the middle of a simple replication process... My ex girlfriend caught us in the act. Did I mention I like natural keys more than artifical ones?

Captcha: dubya - well, it was more like "gotcha!"

2007-05-18 Reply Admin

Maybe I should have specified, driving from Pittsburgh to New York via Los Angeles.

2007-05-18 Reply Admin

wow... what a shameful waste.

I'll guess it's someone who doesn't know any better, a programmer without any previous sql knowledge writing this "sql" in a for-loop mentality. Forgivable, but for the love of the cosmos (and performance--- oh the humanity), make sure this poor sap learns some sql.

cursors? bad. I only use cursors to iterate over system views.

2007-05-18 Reply Admin

Sean:
JTK:
This is definitely a case of a procedural programmer not grokking set theory. "Hmm, we need a loop to change an element of an array. That's how we would do it in C..."
As for GUIDs as row identifiers -- I can understand them for a couple of primary tables. But for every table? Your indexes alone would choke the system to a crawl.

Untrue. There is a very interesting article that IMO serves well to debunk that myth.

http://www.informit.com/articles/article.asp?p=25862&seqNum=1&rl=1

Of course, the GUID is 4x larger than an int, but in the era of cheap memory and storage, it is hardly an index-killer. It doesn't necessarily help with all types of replication, but using GUIDs as PKs makes merging databases a snap. The article includes an algorithm for creating an ordered GUID (COMB), which vastly improves performance when that GUID is or is part of a clustered index, to the point that it is just slightly less than the int-key baseline.

However, one hidden benefit that is not mentioned in this article is the scalability of each choice. Using an autoincrementing integer can sometimes lead to tables being locked for an unnecessarily long amount of time -- if you generate a new ID, you have to lock the entire table until the INSERT transaction has completed. If you have a system with a lot of concurrency (and I mean a LOT), this type of locking can become quite expensive in a large, heavily-used table.

This is not to say the GUID is a panacea. It has its drawbacks, especially if you're storage-constrained. But for modern database systems on relatively modern hardware, it's something to think about when designing your architecture.

heh, thanks for the article. interesting read... although it is older, some of the info in it is definitely relevant. I always felt that a GUID would be a boon as a clustered index because of the size of the key to check, which should be less performant. it should be faster to check an int as a clustered index. I could care less about the space involved, errr... guess that really depends on how many rows we are talking about.

I haven't had the luxury of dealing with a DB that gets hit so hard you might want to take into account a GUID to avoid locking issues... very cool stuff, will read more.

FredSaw · 2007-05-18 Reply Admin

And when he had finished coding his SQL, he called a taxi to take him across the street.

2007-05-18 Reply Admin

Re GUIDs, I'm not an MSSQL guru but my impression was that it defaults to GUIDs for unique key generation, or am I thinking about something else?

That aside, I had a project that consisted of isolated databases of identical structure located on servers owned by multiple companies that gathered real-time data using a leased common application but that required all data from all companies be collated in a single database at the application owner's site. Simplest way to keep things organized was to use a GUID with a location flag. It's been running about 7 years now, no glitches.

chrismcb · 2007-05-18 Reply Admin

CaptchaFillrOutr:
Lisae:
Wonders what's wrong with using GUIDs anyway, unless this is going to be a small, local database. What to use instead? An auto-increase field? Those are even worse. And you don't want to bother the user to come up with an unique ID every time so GUIDs are useful.

Oh, I don't know, maybe you could use the natural keys?

WHY would you use natural keys?

bif · 2007-05-18 Reply Admin

RadMat:
I know what you're thinking. Using sixteen bytes instead of four bytes for a primary key? Have you lost your mind? Those additional 12 bytes do come at a cost. But that cost may not be as great as you think.

Before reading this thread, I would have thought using GUIDs as PKs was monumentally idiotic. Now I can see that there are in fact some cases where it could be very useful.

Or...reading WTF is making me more stupider.

2007-05-18 Reply Admin

Oddly enough we're having problems BECAUSE we're using GUIDs right now. It mostly comes with the time it takes to search through GUIDS vs INT because the data is so spread out combined with READ-COMMITTED.

Mainly the fault is that of InnoDB because it prefixes all indexes with the primary key.

Ee can't use the database GUID() because it would break replication. GUID generation is being done on the java app server side so our GUIDs are even more random. At least if we were using database GUID() we'd have some sort of sequence to the values.

CAPTCHA: wigwam (I think I need a sweat lodge after the GUID crap I've had to deal with today)

2007-05-18 Reply Admin

Am I the only one who noticed the subtlety conveyed in the statement that a person named "Kevin P." is working for "a well-known health care provider"? Or is that just considered obvious?

2007-05-18 Reply Admin

TheD:
I really like using GUIDs, especially in cases where I need to know the identifier before the insert.

If you need to know the value before the insert, just SELECT SEQ.NEXTVAL before you do the insert, and include the PK in the insert instead of letting the trigger fill it in from the sequence.

If you need to do a bulk insert, just get a bunch of values from the sequence, possibly via a sproc. If you need to get the ID back after the insert, write a sproc which returns the ID, or just SELECT SEQ.CURRVAL from within the same transaction.

Gosh, is there something that Oracle is actually better for?

An Amusing Way of Saying UPDATE

Leave a comment on “An Amusing Way of Saying UPDATE”