- Feature Articles
- CodeSOD
-
Error'd
- Most Recent Articles
- Secret Horror
- Not Impossible
- Monkeys
- Killing Time
- Hypersensitive
- Infallabella
- Doubled Daniel
- It Figures
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
This reminds me of "this works but there has to be a better way".
Of course, I'd be surprised if it even worked all the time.
Admin
It's very selfish and wasteful, in any case, to use GUIDs as row identifiers. Don't they know there's only a finite number of those things to go round?
Admin
Wonders what's wrong with using GUIDs anyway, unless this is going to be a small, local database. What to use instead? An auto-increase field? Those are even worse. And you don't want to bother the user to come up with an unique ID every time so GUIDs are useful.
Admin
Makes me want to cry. I had to read the abortion twice to figure out what it might say.
Maybe it's obfuscation? Job security? Nobody else can read it, so they can't fire me!
Admin
Excuse my ignorance, but isn't a GUID a little harder to come up with than an auto-incremented number? Don't most modern databases have the notion of a sequence object whose sole purpose is to return unique (semi-) sequential numbers? And how exactly is an auto-increment a bad thing?
Admin
Why on earth would you spend time refactoring a database to remove GUID primary keys? As good old Jeff Atwood says:
The linked post has links to several articles that address the debate.I can certainly see not choosing GUIDs when designing a database, but committing resources to removing them and replacing them with integers? WTF?
Admin
I don't know how much extra overhead is added when you start generating GUIDs versus autoincrement fields. I really like using GUIDs, especially in cases where I need to know the identifier before the insert.
Admin
Indeed, I also use guid's for record identifers. In Sql 2005, I haven't had any performance problems with them. I know that's not the best way of doing things, but I don't see any reason to change something that's working fine just to regain a few bits per record.
--Taladon [email protected]
Admin
I see the WTF is the article itself
"K.P. adds, You may notice the use of GUIDs. The resident developers were nearly moved to tears at the news that we would be eliminating GUIDs as row identifiers. Apparently a former “guru” had informed them that numeric row Ids are for row counting, and it is safest to use GUIDs for each PK, “just in case we ever have to use replication.”"
Admin
This is definitely a case of a procedural programmer not grokking set theory. "Hmm, we need a loop to change an element of an array. That's how we would do it in C..."
As for GUIDs as row identifiers -- I can understand them for a couple of primary tables. But for every table? Your indexes alone would choke the system to a crawl.
Admin
GUIDS as primary keys, in case of replication. Yes this is somewhat valid, depending on your replication scheme. Keep in mind the ability to insert an auto-increment ID is not given by default, and often set to say it is not for replication. With this in mind all your relationships can be broken by a simple replication process.
Admin
Their are pros and cons to using GUIDs as unique row identifiers. I certainly wouldn't call it a WTF.
Jeff Atwood, a contributor to this very site, wrote about it in his blog a while back: http://www.codinghorror.com/blog/archives/000817.html
Admin
As far as the SQL statement is concerned, I have done things similar, but this is when I had to receive an XML file into a stored proc, load it into a table and make updates based off of each record in this new table.
As it applies in this particular case... where policyGUID = @PolicyGUID This means it will be only one record ever, basically a waste.
Sounds like the DB developer used a pattern everywhere it wasn't needed.
Admin
I read the phrase "well-known healthcare provider" to mean "data out the wazoo". For high throughput situations, GUIDs are unnecessary overhead that makes your indices huge and harder to cache.
Admin
Blimey. I've seen this before at the SQL end of our shop. Similar thing, huge long cursory mess that refactors out to 'INSERT INTO a SELECT column FROM b'.
What it turned out to be, basically, was that the developer in question had been taught that cursors would be required to do some very advanced queries in SQL.
You got it - "therefore" SQL code is not advanced unless it features a cursor. You want your code to be advanced, don't you?
Admin
Well, SQL Server has identity ranges now, which seem to work quite well. Prior to that we used a composite key consisting of a site identifier and a identity (autoincrement) column (in fact we often still do). Has the advantage of letting us directly query a variety of things based on the site that created them, whether or not it was a table where we cared enough about the individual user who created that row to store it in the record. Of course, we're merge replicating satellite sites rather than remote users, so the number of subscribers is (relatively) low, and we had a table storing each site's details anyway, and hence a unique site ID to use in the composite key.
Admin
Yes there are times where GUIDs are a superior choice to just using an incrementing ID. When you need your application to know the entry key in advance before INSERTing the data. Since you are generating the GUID within the app, there's no measurable performance hit, but there is a performance hit if you have to retrieve the resulting incremental ID after your insert. There are even some instances in SQL Server where the ID returned will be the wrong ID. Hence generating your own key (GUID) in your app before you ever touch the database can sometimes be the best solution.
When working with simpler and smaller apps, incremental IDs are fine. I'd also like to point out I thought incremental IDs were the only way to go until I started working on larger and complex projects, so if you think I'm wrong, perhaps you haven't done something big enough yet.
Admin
Oops forgot to quote - my bad! My comment about identity ranges/primary keys was in reference to this
Admin
Priceless!
Admin
Thats right GUIDs must be bad. I love having so write sql and supporting code that returns values from insert statements.
It looks soooo enterprisey especially when I do bulk inserts :-p
Admin
That is assuming the policyGUID is the primary key, I would venture to say that it is more likely a foreign key in this case since its a Patient table.
Admin
damn, beat me to it
Admin
This is only the case if you use @@IDENTITY. Using SCOPE_IDENTITY() returns the most recent identity within the scope of the sproc.
Admin
It is pity it was not told if GUIDs are generated on server or on clients. But since it was made "for replication"... i think on server, where SEQUENCEs (GENERATORs in Interbase/Yaffil/old-Firebird terms) could be better. And what about if client would have a broken GUID-generating library ?
But afterall, if this necessary WTF ? Imagine You (personally You) had to port the running system from ISAM to SQL. And you have to do it fast, now and alone. Wouldn't there semi-automatic conversion from xBase-like language into this procedural SQL most quick, easy and error-safe way, comparing to rethinking each algorithm and then inventing the query, hopefully corect one. Don't fix the working program.
Sure at some time, when SQL server will be becoming more value as DMBS and less value as large storage with easy backup, then this scheme will produce bottlenecks(in other word would not be 'working program' no more), and those need to be found and refactored... Oh, isn't this exactly what the writer was hired to do ?
CAPTCHA: RIAA... Jesus, i hoped to have a good weekend! :-(
Admin
Dang, that is like going from Pittsburgh, PA to New York, NY via Los Angeles, CA.
You still get there, but who would choose to go that way?
Admin
Untrue. There is a very interesting article that IMO serves well to debunk that myth.
http://www.informit.com/articles/article.asp?p=25862&seqNum=1&rl=1
Of course, the GUID is 4x larger than an int, but in the era of cheap memory and storage, it is hardly an index-killer. It doesn't necessarily help with all types of replication, but using GUIDs as PKs makes merging databases a snap. The article includes an algorithm for creating an ordered GUID (COMB), which vastly improves performance when that GUID is or is part of a clustered index, to the point that it is just slightly less than the int-key baseline.
However, one hidden benefit that is not mentioned in this article is the scalability of each choice. Using an autoincrementing integer can sometimes lead to tables being locked for an unnecessarily long amount of time -- if you generate a new ID, you have to lock the entire table until the INSERT transaction has completed. If you have a system with a lot of concurrency (and I mean a LOT), this type of locking can become quite expensive in a large, heavily-used table.
This is not to say the GUID is a panacea. It has its drawbacks, especially if you're storage-constrained. But for modern database systems on relatively modern hardware, it's something to think about when designing your architecture.
Admin
I went from Philadelphia, PA to West Virginia by way of Detroit a few months ago.
NorthWest has a hub in Detroit, and I was told to get the cheapest fare I could find. So a two hour flight became an all-day event.
Hey, I got paid.
Admin
I've been using GUID rather than ints, longs and counters since about 2000, when I read Kimball's Data Warehouse book. If handling a lot of new rows is a problem, then GUIDs beat counters hollow. Especially so when these INSERTs are coming from distributed sources -- GUIDs avoid the bottleneck of allocating new unique IDs from a single sequence generator.
Admin
A word about the statement. It probably came a jr dev following a strict "don't use an update with a where clause that isn't the primary key of the table". They probably had issues with replication of cross db udpates in the past and went quite draconian.
Admin
But the GUIDs may very well help you to build some sort of replications scheme, which in turn may help to share some load among several boxes. Not to mention the situation where you have to be able to move the data to similar, but not identical, system...
I wouldn't probably use them as the primary key, though.
Admin
Admin
Yeh, its funny when the new 'expert' on a project needs to badmouth the old expert but doesn't have a clue what they're talking about.
Admin
If this thing had a voting system, I'd give you full marks.
Admin
Been there, done that. Had a database with a GUID as a PK in each and every one of the tables (indeed for the exact same reason "we might want easy replication"). It seemed fine at first, but the insert time seemed to increase exponentially...
Moving away from GUIDs to integer sped up the software by an insane amount of speed. After a certain amount of rows with GUIDs, inserting a new one becomes pretty slow, and your logfile turns gigantic. Inserting a legacy chunk of data (out of text files) turned from "not being done in an afternoon" to "done under 2 hours".
And I won't even mention the speed difference for a simple "Count(*)".
Last, but not least, GUIDs make debugging a nightmare when trying to follow what screwed up (memorizing two or three integers is easy, but you have to copy paste each GUID, which makes it a wee bit harder).
Granted, I'm no database "expert" and there might have been a way to make it work with GUIDs, but if you're not sure how to achieve performance, I say stay away from them.
Admin
Admin
Oh, I don't know, maybe you could use the natural keys?
Admin
Interesting: I've never noticed a speed difference or even heard this argument made, and I've been using guids since 2000. And I've heard the argument about "normal numbers" being easy to remember and write down before, but I've never had problems with doing some simple cut and pastes, and my memory is crap anyway.
captcha: pinball. Good idea!
Admin
Admin
You're right. You're no expert. A simple index defragment or reindex (depending on what SHOWCONTIG showed you) would have given you the same speed increase for about 10 minutes worth of the work. Most of the increase came not from switching data types, but from the fact that by switching data types, the indexes were rebuilt.
You have to know what you're using and know how to use it. Just saying "use A all the time!" and blindly following some database design "recipe" will inevitably lead you to make some pretty bad errors.
Admin
Captcha: dubya - well, it was more like "gotcha!"
Admin
Maybe I should have specified, driving from Pittsburgh to New York via Los Angeles.
Admin
wow... what a shameful waste.
I'll guess it's someone who doesn't know any better, a programmer without any previous sql knowledge writing this "sql" in a for-loop mentality. Forgivable, but for the love of the cosmos (and performance--- oh the humanity), make sure this poor sap learns some sql.
cursors? bad. I only use cursors to iterate over system views.
Admin
heh, thanks for the article. interesting read... although it is older, some of the info in it is definitely relevant. I always felt that a GUID would be a boon as a clustered index because of the size of the key to check, which should be less performant. it should be faster to check an int as a clustered index. I could care less about the space involved, errr... guess that really depends on how many rows we are talking about.
I haven't had the luxury of dealing with a DB that gets hit so hard you might want to take into account a GUID to avoid locking issues... very cool stuff, will read more.
Admin
And when he had finished coding his SQL, he called a taxi to take him across the street.
Admin
Re GUIDs, I'm not an MSSQL guru but my impression was that it defaults to GUIDs for unique key generation, or am I thinking about something else?
That aside, I had a project that consisted of isolated databases of identical structure located on servers owned by multiple companies that gathered real-time data using a leased common application but that required all data from all companies be collated in a single database at the application owner's site. Simplest way to keep things organized was to use a GUID with a location flag. It's been running about 7 years now, no glitches.
Admin
Admin
Before reading this thread, I would have thought using GUIDs as PKs was monumentally idiotic. Now I can see that there are in fact some cases where it could be very useful.
Or...reading WTF is making me more stupider.
Admin
Oddly enough we're having problems BECAUSE we're using GUIDs right now. It mostly comes with the time it takes to search through GUIDS vs INT because the data is so spread out combined with READ-COMMITTED.
Mainly the fault is that of InnoDB because it prefixes all indexes with the primary key.
Ee can't use the database GUID() because it would break replication. GUID generation is being done on the java app server side so our GUIDs are even more random. At least if we were using database GUID() we'd have some sort of sequence to the values.
CAPTCHA: wigwam (I think I need a sweat lodge after the GUID crap I've had to deal with today)
Admin
Am I the only one who noticed the subtlety conveyed in the statement that a person named "Kevin P." is working for "a well-known health care provider"? Or is that just considered obvious?
Admin
If you need to know the value before the insert, just SELECT SEQ.NEXTVAL before you do the insert, and include the PK in the insert instead of letting the trigger fill it in from the sequence.
If you need to do a bulk insert, just get a bunch of values from the sequence, possibly via a sproc. If you need to get the ID back after the insert, write a sproc which returns the ID, or just SELECT SEQ.CURRVAL from within the same transaction.
Gosh, is there something that Oracle is actually better for?