Comment On An Amusing Way of Saying UPDATE

Four months ago, Kevin P. joined a well-known health care provider as a project lead. Little did he know what he would find himself leading. [expand full text]
« PrevPage 1 | Page 2Next »

Re: An Amusing Way of Saying UPDATE

2007-05-18 09:20 • by akatherder
This reminds me of "this works but there has to be a better way".

Of course, I'd be surprised if it even worked all the time.

Re: An Amusing Way of Saying UPDATE

2007-05-18 09:22 • by Dominic (unregistered)
It's very selfish and wasteful, in any case, to use GUIDs as row identifiers. Don't they know there's only a finite number of those things to go round?

Re: An Amusing Way of Saying UPDATE

2007-05-18 09:52 • by Lisae
Wonders what's wrong with using GUIDs anyway, unless this is going to be a small, local database. What to use instead? An auto-increase field? Those are even worse. And you don't want to bother the user to come up with an unique ID every time so GUIDs are useful.

Re: An Amusing Way of Saying UPDATE

2007-05-18 09:57 • by jimlangrunner
Makes me want to cry. I had to read the abortion twice to figure out what it might say.

Maybe it's obfuscation? Job security? Nobody else can read it, so they can't fire me!

Re: An Amusing Way of Saying UPDATE

2007-05-18 10:00 • by APH (unregistered)
Excuse my ignorance, but isn't a GUID a little harder to come up with than an auto-incremented number? Don't most modern databases have the notion of a sequence object whose sole purpose is to return unique (semi-) sequential numbers? And how exactly is an auto-increment a bad thing?

Re: An Amusing Way of Saying UPDATE

2007-05-18 10:01 • by RadiantMatrix (unregistered)
Why on earth would you spend time refactoring a database to remove GUID primary keys? As good old Jeff Atwood says:
I know what you're thinking. Using sixteen bytes instead of four bytes for a primary key? Have you lost your mind? Those additional 12 bytes do come at a cost. But that cost may not be as great as you think. -- Primary Keys: IDs versus GUIDs

The linked post has links to several articles that address the debate.

I can certainly see not choosing GUIDs when designing a database, but committing resources to removing them and replacing them with integers? WTF?

Re: An Amusing Way of Saying UPDATE

2007-05-18 10:03 • by TheD
Excuse my ignorance, but isn't a GUID a little harder to come up with than an auto-incremented number? Don't most modern databases have the notion of a sequence object whose sole purpose is to return unique (semi-) sequential numbers? And how exactly is an auto-increment a bad thing?


I don't know how much extra overhead is added when you start generating GUIDs versus autoincrement fields. I really like using GUIDs, especially in cases where I need to know the identifier before the insert.

Re: An Amusing Way of Saying UPDATE

2007-05-18 10:08 • by John Jackson (unregistered)
137591 in reply to 137590
Indeed, I also use guid's for record identifers. In Sql 2005, I haven't had any performance problems with them. I know that's not the best way of doing things, but I don't see any reason to change something that's working fine just to regain a few bits per record.

--Taladon
--taladon@gmail.com

Re: An Amusing Way of Saying UPDATE

2007-05-18 10:11 • by WTF! (unregistered)
I see the WTF is the article itself

"K.P. adds, You may notice the use of GUIDs. The resident developers were nearly moved to tears at the news that we would be eliminating GUIDs as row identifiers. Apparently a former “guru” had informed them that numeric row Ids are for row counting, and it is safest to use GUIDs for each PK, “just in case we ever have to use replication.”"

Re: An Amusing Way of Saying UPDATE

2007-05-18 10:21 • by JTK (unregistered)
This is definitely a case of a procedural programmer not grokking set theory. "Hmm, we need a loop to change an element of an array. That's how we would do it in C..."

As for GUIDs as row identifiers -- I can understand them for a couple of primary tables. But for every table? Your indexes alone would choke the system to a crawl.

Re: An Amusing Way of Saying UPDATE

2007-05-18 10:25 • by KattMan
GUIDS as primary keys, in case of replication.
Yes this is somewhat valid, depending on your replication scheme. Keep in mind the ability to insert an auto-increment ID is not given by default, and often set to say it is not for replication. With this in mind all your relationships can be broken by a simple replication process.

Re: An Amusing Way of Saying UPDATE

2007-05-18 10:27 • by Joshua (unregistered)
Their are pros and cons to using GUIDs as unique row identifiers. I certainly wouldn't call it a WTF.

Jeff Atwood, a contributor to this very site, wrote about it in his blog a while back:
http://www.codinghorror.com/blog/archives/000817.html

Re: An Amusing Way of Saying UPDATE

2007-05-18 10:28 • by KattMan
As far as the SQL statement is concerned, I have done things similar, but this is when I had to receive an XML file into a stored proc, load it into a table and make updates based off of each record in this new table.

As it applies in this particular case...
where policyGUID = @PolicyGUID
This means it will be only one record ever, basically a waste.

Sounds like the DB developer used a pattern everywhere it wasn't needed.

Re: An Amusing Way of Saying UPDATE

2007-05-18 10:34 • by savar
137597 in reply to 137592
WTF!:
I see the WTF is the article itself

"K.P. adds, You may notice the use of GUIDs. The resident developers were nearly moved to tears at the news that we would be eliminating GUIDs as row identifiers. Apparently a former “guru” had informed them that numeric row Ids are for row counting, and it is safest to use GUIDs for each PK, “just in case we ever have to use replication.”"


I read the phrase "well-known healthcare provider" to mean "data out the wazoo". For high throughput situations, GUIDs are unnecessary overhead that makes your indices huge and harder to cache.

Cursorage.

2007-05-18 10:35 • by Timberwolf (unregistered)
Blimey. I've seen this before at the SQL end of our shop. Similar thing, huge long cursory mess that refactors out to 'INSERT INTO a SELECT column FROM b'.

What it turned out to be, basically, was that the developer in question had been taught that cursors would be required to do some very advanced queries in SQL.

You got it - "therefore" SQL code is not advanced unless it features a cursor. You want your code to be advanced, don't you?

Re: An Amusing Way of Saying UPDATE

2007-05-18 10:36 • by Kev (unregistered)
137600 in reply to 137594
Well, SQL Server has identity ranges now, which seem to work quite well. Prior to that we used a composite key consisting of a site identifier and a identity (autoincrement) column (in fact we often still do). Has the advantage of letting us directly query a variety of things based on the site that created them, whether or not it was a table where we cared enough about the individual user who created that row to store it in the record. Of course, we're merge replicating satellite sites rather than remote users, so the number of subscribers is (relatively) low, and we had a table storing each site's details anyway, and hence a unique site ID to use in the composite key.

Re: An Amusing Way of Saying UPDATE

2007-05-18 10:38 • by travisowens
Yes there are times where GUIDs are a superior choice to just using an incrementing ID. When you need your application to know the entry key in advance before INSERTing the data. Since you are generating the GUID within the app, there's no measurable performance hit, but there is a performance hit if you have to retrieve the resulting incremental ID after your insert. There are even some instances in SQL Server where the ID returned will be the wrong ID. Hence generating your own key (GUID) in your app before you ever touch the database can sometimes be the best solution.

When working with simpler and smaller apps, incremental IDs are fine. I'd also like to point out I thought incremental IDs were the only way to go until I started working on larger and complex projects, so if you think I'm wrong, perhaps you haven't done something big enough yet.

Re: An Amusing Way of Saying UPDATE

2007-05-18 10:38 • by Kev (unregistered)
137603 in reply to 137594
KattMan:
GUIDS as primary keys, in case of replication.
Yes this is somewhat valid, depending on your replication scheme. Keep in mind the ability to insert an auto-increment ID is not given by default, and often set to say it is not for replication. With this in mind all your relationships can be broken by a simple replication process.



Oops forgot to quote - my bad! My comment about identity ranges/primary keys was in reference to this

Re: An Amusing Way of Saying UPDATE

2007-05-18 10:49 • by Gedoon (unregistered)
Priceless!

Re: An Amusing Way of Saying UPDATE

2007-05-18 11:04 • by The Selfish One (unregistered)
137613 in reply to 137578
Thats right GUIDs must be bad.
I love having so write sql and supporting code that returns values from insert statements.

It looks soooo enterprisey especially when I do bulk inserts :-p



Re: An Amusing Way of Saying UPDATE

2007-05-18 11:12 • by SkippyForce (unregistered)
137617 in reply to 137596
KattMan:
As far as the SQL statement is concerned, I have done things similar, but this is when I had to receive an XML file into a stored proc, load it into a table and make updates based off of each record in this new table.

As it applies in this particular case...
where policyGUID = @PolicyGUID
This means it will be only one record ever, basically a waste.

Sounds like the DB developer used a pattern everywhere it wasn't needed.


That is assuming the policyGUID is the primary key, I would venture to say that it is more likely a foreign key in this case since its a Patient table.

Re: An Amusing Way of Saying UPDATE

2007-05-18 11:19 • by mattnaik (unregistered)
137622 in reply to 137617
damn, beat me to it

Re: An Amusing Way of Saying UPDATE

2007-05-18 11:22 • by mattnaik (unregistered)
137623 in reply to 137601
travisowens:
There are even some instances in SQL Server where the ID returned will be the wrong ID. Hence generating your own key (GUID) in your app before you ever touch the database can sometimes be the best solution.


This is only the case if you use @@IDENTITY. Using SCOPE_IDENTITY() returns the most recent identity within the scope of the sproc.

Re: An Amusing Way of Saying UPDATE

2007-05-18 11:54 • by Arioch (unregistered)
137631 in reply to 137582
It is pity it was not told if GUIDs are generated on server or on clients. But since it was made "for replication"... i think on server, where SEQUENCEs (GENERATORs in Interbase/Yaffil/old-Firebird terms) could be better. And what about if client would have a broken GUID-generating library ?



But afterall, if this necessary WTF ?
Imagine You (personally You) had to port the running system from ISAM to SQL.
And you have to do it fast, now and alone.
Wouldn't there semi-automatic conversion from xBase-like language into this procedural SQL most quick, easy and error-safe way, comparing to rethinking each algorithm and then inventing the query, hopefully corect one.
Don't fix the working program.


Sure at some time, when SQL server will be becoming more value as DMBS and less value as large storage with easy backup, then this scheme will produce bottlenecks(in other word would not be 'working program' no more), and those need to be found and refactored... Oh, isn't this exactly what the writer was hired to do ?

CAPTCHA: RIAA... Jesus, i hoped to have a good weekend! :-(

Re: An Amusing Way of Saying UPDATE

2007-05-18 11:57 • by Lazgen (unregistered)
Dang, that is like going from Pittsburgh, PA to New York, NY via Los Angeles, CA.

You still get there, but who would choose to go that way?

Re: An Amusing Way of Saying UPDATE

2007-05-18 12:09 • by Sean
137639 in reply to 137593
JTK:
This is definitely a case of a procedural programmer not grokking set theory. "Hmm, we need a loop to change an element of an array. That's how we would do it in C..."

As for GUIDs as row identifiers -- I can understand them for a couple of primary tables. But for every table? Your indexes alone would choke the system to a crawl.


Untrue. There is a very interesting article that IMO serves well to debunk that myth.

http://www.informit.com/articles/article.asp?p=25862&seqNum=1&rl=1

Of course, the GUID is 4x larger than an int, but in the era of cheap memory and storage, it is hardly an index-killer. It doesn't necessarily help with all types of replication, but using GUIDs as PKs makes merging databases a snap. The article includes an algorithm for creating an ordered GUID (COMB), which vastly improves performance when that GUID is or is part of a clustered index, to the point that it is just slightly less than the int-key baseline.

However, one hidden benefit that is not mentioned in this article is the scalability of each choice. Using an autoincrementing integer can sometimes lead to tables being locked for an unnecessarily long amount of time -- if you generate a new ID, you have to lock the entire table until the INSERT transaction has completed. If you have a system with a lot of concurrency (and I mean a LOT), this type of locking can become quite expensive in a large, heavily-used table.

This is not to say the GUID is a panacea. It has its drawbacks, especially if you're storage-constrained. But for modern database systems on relatively modern hardware, it's something to think about when designing your architecture.

Re: An Amusing Way of Saying UPDATE

2007-05-18 12:27 • by sammy (unregistered)
137645 in reply to 137633
Lazgen:
Dang, that is like going from Pittsburgh, PA to New York, NY via Los Angeles, CA.

You still get there, but who would choose to go that way?


I went from Philadelphia, PA to West Virginia by way of Detroit a few months ago.

NorthWest has a hub in Detroit, and I was told to get the cheapest fare I could find. So a two hour flight became an all-day event.

Hey, I got paid.

GUID rather than an int

2007-05-18 12:29 • by Andy Dingley (unregistered)
I've been using GUID rather than ints, longs and counters since about 2000, when I read Kimball's Data Warehouse book. If handling a lot of new rows is a problem, then GUIDs beat counters hollow. Especially so when these INSERTs are coming from distributed sources -- GUIDs avoid the bottleneck of allocating new unique IDs from a single sequence generator.

Re: An Amusing Way of Saying UPDATE

2007-05-18 12:34 • by Johnboy (unregistered)
A word about the statement. It probably came a jr dev following a strict "don't use an update with a where clause that isn't the primary key of the table". They probably had issues with replication of cross db udpates in the past and went quite draconian.

Re: An Amusing Way of Saying UPDATE

2007-05-18 12:38 • by gah (unregistered)
137651 in reply to 137597
But the GUIDs may very well help you to build some sort of replications scheme, which in turn may help to share some load among several boxes. Not to mention the situation where you have to be able to move the data to similar, but not identical, system...

I wouldn't probably use them as the primary key, though.

Re: An Amusing Way of Saying UPDATE

2007-05-18 12:47 • by gah (unregistered)
137653 in reply to 137651
gah:

I wouldn't probably use them as the primary key, though.

But what would I use them for? Nevermind, my brain's not working very well right now.

Re: An Amusing Way of Saying UPDATE

2007-05-18 12:56 • by Yeh (unregistered)
137657 in reply to 137592
Yeh, its funny when the new 'expert' on a project needs to badmouth the old expert but doesn't have a clue what they're talking about.

Re: An Amusing Way of Saying UPDATE

2007-05-18 13:08 • by Grim (unregistered)
137661 in reply to 137601
travisowens:
... if you think I'm wrong, perhaps you haven't done something big enough yet.


If this thing had a voting system, I'd give you full marks.

Re: An Amusing Way of Saying UPDATE

2007-05-18 13:11 • by Denis Troller
137662 in reply to 137589
I can certainly see not choosing GUIDs when designing a database, but committing resources to removing them and replacing them with integers? WTF?


Been there, done that. Had a database with a GUID as a PK in each and every one of the tables (indeed for the exact same reason "we might want easy replication").
It seemed fine at first, but the insert time seemed to increase exponentially...

Moving away from GUIDs to integer sped up the software by an insane amount of speed.
After a certain amount of rows with GUIDs, inserting a new one becomes pretty slow, and your logfile turns gigantic.
Inserting a legacy chunk of data (out of text files) turned from "not being done in an afternoon" to "done under 2 hours".

And I won't even mention the speed difference for a simple "Count(*)".

Last, but not least, GUIDs make debugging a nightmare when trying to follow what screwed up (memorizing two or three integers is easy, but you have to copy paste each GUID, which makes it a wee bit harder).

Granted, I'm no database "expert" and there might have been a way to make it work with GUIDs, but if you're not sure how to achieve performance, I say stay away from them.



Re: An Amusing Way of Saying UPDATE

2007-05-18 13:11 • by Gsquared
137663 in reply to 137623
mattnaik:
travisowens:
There are even some instances in SQL Server where the ID returned will be the wrong ID. Hence generating your own key (GUID) in your app before you ever touch the database can sometimes be the best solution.


This is only the case if you use @@IDENTITY. Using SCOPE_IDENTITY() returns the most recent identity within the scope of the sproc.
Which can be a problem if you have multiple insert statements in the same proc and one of them fails. Have to be careful of errors in that. Best to check @@rowcount on your insert before using any of these.

Re: An Amusing Way of Saying UPDATE

2007-05-18 13:29 • by CaptchaFillrOutr (unregistered)
137674 in reply to 137582
Lisae:
Wonders what's wrong with using GUIDs anyway, unless this is going to be a small, local database. What to use instead? An auto-increase field? Those are even worse. And you don't want to bother the user to come up with an unique ID every time so GUIDs are useful.


Oh, I don't know, maybe you could use the natural keys?

Re: An Amusing Way of Saying UPDATE

2007-05-18 13:43 • by Tofino (unregistered)
137676 in reply to 137662
Denis Troller:
And I won't even mention the speed difference for a simple "Count(*)".

Last, but not least, GUIDs make debugging a nightmare when trying to follow what screwed up (memorizing two or three integers is easy, but you have to copy paste each GUID, which makes it a wee bit harder).


Interesting: I've never noticed a speed difference or even heard this argument made, and I've been using guids since 2000. And I've heard the argument about "normal numbers" being easy to remember and write down before, but I've never had problems with doing some simple cut and pastes, and my memory is crap anyway.

captcha: pinball. Good idea!

Re: An Amusing Way of Saying UPDATE

2007-05-18 14:08 • by Aaron (unregistered)
137687 in reply to 137674
CaptchaFillrOutr:
Oh, I don't know, maybe you could use the natural keys?

I knew it was only a matter of time before somebody served that nugget. I just hope it was intended to be ironic.

Re: An Amusing Way of Saying UPDATE

2007-05-18 14:11 • by Lionstone (unregistered)
137689 in reply to 137662
You're right. You're no expert.
A simple index defragment or reindex (depending on what SHOWCONTIG showed you) would have given you the same speed increase for about 10 minutes worth of the work. Most of the increase came not from switching data types, but from the fact that by switching data types, the indexes were rebuilt.

You have to know what you're using and know how to use it. Just saying "use A all the time!" and blindly following some database design "recipe" will inevitably lead you to make some pretty bad errors.

Re: An Amusing Way of Saying UPDATE

2007-05-18 15:12 • by the way: (unregistered)
137701 in reply to 137594
KattMan:
With this in mind all your relationships can be broken by a simple replication process.

Happened to me once. My relationship broke unexpectedly while I was in the middle of a simple replication process... My ex girlfriend caught us in the act.
Did I mention I like natural keys more than artifical ones?


Captcha: dubya - well, it was more like "gotcha!"

Re: An Amusing Way of Saying UPDATE

2007-05-18 15:22 • by Lazgen (unregistered)
137703 in reply to 137645
Maybe I should have specified, driving from Pittsburgh to New York via Los Angeles.

Re: An Amusing Way of Saying UPDATE

2007-05-18 15:34 • by barf indeedy (unregistered)
wow... what a shameful waste.

I'll guess it's someone who doesn't know any better, a programmer without any previous sql knowledge writing this "sql" in a for-loop mentality. Forgivable, but for the love of the cosmos (and performance--- oh the humanity), make sure this poor sap learns some sql.

cursors? bad. I only use cursors to iterate over system views.

Re: An Amusing Way of Saying UPDATE

2007-05-18 15:50 • by barf indeedy (unregistered)
137712 in reply to 137639
Sean:
JTK:
This is definitely a case of a procedural programmer not grokking set theory. "Hmm, we need a loop to change an element of an array. That's how we would do it in C..."

As for GUIDs as row identifiers -- I can understand them for a couple of primary tables. But for every table? Your indexes alone would choke the system to a crawl.


Untrue. There is a very interesting article that IMO serves well to debunk that myth.

http://www.informit.com/articles/article.asp?p=25862&seqNum=1&rl=1

Of course, the GUID is 4x larger than an int, but in the era of cheap memory and storage, it is hardly an index-killer. It doesn't necessarily help with all types of replication, but using GUIDs as PKs makes merging databases a snap. The article includes an algorithm for creating an ordered GUID (COMB), which vastly improves performance when that GUID is or is part of a clustered index, to the point that it is just slightly less than the int-key baseline.

However, one hidden benefit that is not mentioned in this article is the scalability of each choice. Using an autoincrementing integer can sometimes lead to tables being locked for an unnecessarily long amount of time -- if you generate a new ID, you have to lock the entire table until the INSERT transaction has completed. If you have a system with a lot of concurrency (and I mean a LOT), this type of locking can become quite expensive in a large, heavily-used table.

This is not to say the GUID is a panacea. It has its drawbacks, especially if you're storage-constrained. But for modern database systems on relatively modern hardware, it's something to think about when designing your architecture.


heh, thanks for the article. interesting read... although it is older, some of the info in it is definitely relevant. I always felt that a GUID would be a boon as a clustered index because of the size of the key to check, which should be less performant. it should be faster to check an int as a clustered index. I could care less about the space involved, errr... guess that really depends on how many rows we are talking about.

I haven't had the luxury of dealing with a DB that gets hit so hard you might want to take into account a GUID to avoid locking issues... very cool stuff, will read more.

Re: An Amusing Way of Saying UPDATE

2007-05-18 15:53 • by FredSaw
And when he had finished coding his SQL, he called a taxi to take him across the street.

Re: An Amusing Way of Saying UPDATE

2007-05-18 16:38 • by Sam Thornton (unregistered)
Re GUIDs, I'm not an MSSQL guru but my impression was that it defaults to GUIDs for unique key generation, or am I thinking about something else?

That aside, I had a project that consisted of isolated databases of identical structure located on servers owned by multiple companies that gathered real-time data using a leased common application but that required all data from all companies be collated in a single database at the application owner's site. Simplest way to keep things organized was to use a GUID with a location flag. It's been running about 7 years now, no glitches.

Re: An Amusing Way of Saying UPDATE

2007-05-18 16:45 • by chrismcb
137724 in reply to 137674
CaptchaFillrOutr:
Lisae:
Wonders what's wrong with using GUIDs anyway, unless this is going to be a small, local database. What to use instead? An auto-increase field? Those are even worse. And you don't want to bother the user to come up with an unique ID every time so GUIDs are useful.


Oh, I don't know, maybe you could use the natural keys?

WHY would you use natural keys?

Re: An Amusing Way of Saying UPDATE

2007-05-18 17:59 • by bif
137736 in reply to 137589
RadMat:

I know what you're thinking. Using sixteen bytes instead of four bytes for a primary key? Have you lost your mind? Those additional 12 bytes do come at a cost. But that cost may not be as great as you think.


Before reading this thread, I would have thought using GUIDs as PKs was monumentally idiotic. Now I can see that there are in fact some cases where it could be very useful.

Or...reading WTF is making me more stupider.

Re: An Amusing Way of Saying UPDATE

2007-05-18 18:34 • by lusis (unregistered)
137742 in reply to 137712
Oddly enough we're having problems BECAUSE we're using GUIDs right now. It mostly comes with the time it takes to search through GUIDS vs INT because the data is so spread out combined with READ-COMMITTED.

Mainly the fault is that of InnoDB because it prefixes all indexes with the primary key.

Ee can't use the database GUID() because it would break replication. GUID generation is being done on the java app server side so our GUIDs are even more random. At least if we were using database GUID() we'd have some sort of sequence to the values.

CAPTCHA: wigwam (I think I need a sweat lodge after the GUID crap I've had to deal with today)

Re: An Amusing Way of Saying UPDATE

2007-05-18 18:51 • by jpl (unregistered)
Am I the only one who noticed the subtlety conveyed in the statement that a person named "Kevin P." is working for "a well-known health care provider"? Or is that just considered obvious?

Re: An Amusing Way of Saying UPDATE

2007-05-18 20:08 • by Darwin (unregistered)
137753 in reply to 137590
TheD:
I really like using GUIDs, especially in cases where I need to know the identifier before the insert.


If you need to know the value before the insert, just SELECT SEQ.NEXTVAL before you do the insert, and include the PK in the insert instead of letting the trigger fill it in from the sequence.

If you need to do a bulk insert, just get a bunch of values from the sequence, possibly via a sproc. If you need to get the ID back after the insert, write a sproc which returns the ID, or just SELECT SEQ.CURRVAL from within the same transaction.

Gosh, is there something that Oracle is actually better for?
« PrevPage 1 | Page 2Next »

Add Comment