- Feature Articles
- CodeSOD
-
Error'd
- Most Recent Articles
- Stop Poking Me!
- Operation Erred Successfully
- A Dark Turn
- Nothing Doing
- Home By Another Way
- Coast Star
- Forsooth
- Epic
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
This comment doesn't have an index.
Admin
This module has File Not Found.
Admin
There are databases (SQLite comes to mind) which don't have an dedicated UUID type, so the only way to store them is as a string. However it's always bad form to mix up types, it just adds additional error opportunities and makes implementing proper code way more complex and results in a high cost in maintainability and poor extendability.
Admin
I've found way too many people who don't understand GUIDs or don't get you can use a GUID as an internal key only for optimization/uniqueness and still expose a descriptive key to the user. Once I even met a group who didn't use GUIDs because they "used too much memory" compared to ints.
Admin
The ID is the key. The GUID is looked up in this table and used to reference something else. The releases module doesn't point at a something else so doesn't have a GUID. Yes, relying on a parsing exception rather than just storing a NULL is indeed a bit weird ...
Admin
But what if you have a trillion rows in your database? You would need a whole extra hard drive in your pool just for the GUIDs!
Admin
Integer IDs are preferable to guids in many cases, but storage size, while technically valid, is a small problem in the grand scheme of things
Admin
You don't store a UUID as a string- you store it as an integer. If your database can't do 128 bit integers, then we can talk about string representations.
Admin
Heyho, that statement is no correct one so many levels but honestly DB vendors themselves have already tried to quell the "integers are not faster then UUIDs" rumors, but still it remains.
To take an example, the SQL server most optimized for UUIDs, MS SQL, has a best case penalty of only 10% over 64 bit integers when using UUIDs are PKs. This due to MS SQL having UUIDs as 1st class primitive type support from day 1 while all other DBs do not but either internally represent them as binary or text block.
Now lets start with ID generation: For an 64bit integer this is a single atomic CPU OP, while obviously an UUID generation takes when done cheap 1,000x, when done correctly 1,000,000x the compute power. This might be irrelevant for hobby DBs but it matters a lot for enterprise databases.
Now lets check the lookup side: Obviously 64bit integer have a perfect distribution without collision while any other structure needs to be hashed or simplified in another way, which additionally generates not only direct one-time overhead, you need a different index structure (think direct array access vs hash table, while in reality in DBs you actually have responsive tree structures). Now obviously this simplification can not be optimized on the expected data, because in the best case it's completely random which means in practice you will never have balanced distributions making the lookup usually slower and less predicatable.
Now I don't get into the details of overfetching issues (the biggest contirbutor to slow DB in practice BTW), client implementations due to mashalling and transcoding etc. etc.
Long story short: UUIDs are always less efficient by far compared 64 bit IDs.
There are some legit purposes for UUIDs though and those are literally what they were made for: Distributed IDs, so IDs between two completely seperated systems. Hence they often get uses as replications IDs, for synchonization purposes and as external IDs generated by a complete different domain.
Admin
Ehm, maybe I'm getting out of touch, but AFAIK neither MS SQL, Postgres, Oracle or Maria support native 128 bit integer types. And MS SQL is to my knowledge still the only one supporting natively UUIDs. But I'm happy to be proven wrong ;-)
Admin
We have a very similar table set up for one of our applications, and the reasoning behind the one here might be the same. We have an integer ID field to uniquely identify the record, and we have a GUID because an external system we interface with requires us to supply its GUID for the object any time we interact with it (which we obtain when we first register that object with the external system).
Some records have a GUID (because the external system knows about them and has supplied us the GUID to store). And some records do not (because they have never been passed to the external system).
My suspicion is that the situation for this wtf is similar.
Storing the GUID as a string? Depends, I guess, on what DB engine is being used.
Admin
that GUID field has a unique value for that row. Doesn't that mean that it does actually have a GUID? (perhaps an "even more unique" one than any other used anywhere else... <wink> )
Admin
So many valid and sensible comments on a totally mis-represented article... WTF is going on?
Admin
I don't know of any databases that support 128 bit integers, but all the ones I know of support storing raw bytes. I'd go with some BINARY(16) column before I resorted to storing them as strings.
Admin
One rationale for both integer and GUID keys comes from SQL Server... using an integer as the table primary key helps with efficient indexing and page fill. It may not be desirable to expose those IDs as public identifiers, so that's where the GUID comes in, but that still doesn't account for either storage as strings or exempting rows, especially when the exemption is marked by free text.
Admin
Jeez I've been out of touch for a while it seems. Last I remember, GUIDs were the preferred way to guarantee uniqueness.
yikes.
Admin
Guids guarantee uniqueness in a global context, which is not needed 99% of the time for a table row.
Addendum 2024-07-03 11:22: PS. MaxiTB's comment explains it all, really.
Admin
Well, that's because you're not thinking enterprisy enough. All you need to do is outsource your GUID generation to a separately scaled microservice, or to a "cloud" provider that does GUID as a Service.
This is a parody, in case that's not obvious.
Admin
THe worst coding I ever encountered-- and I posted to this group decades ago with it, not sure of the link-- was in a Visual Basic 6.0 app which was doing some database read/write operations.
One DB table had a GUID value that was no longer used, and a dev needed to store an integer, so instead of changing the database schema he wrote two functions, IntToGuid() and GuidToInt() and "translated" the values after reading / writing queries.
The result was we had a lot of identical GUID values for int values. Interesting bit of work doing binary coding in VB 6.0 to do the translations, however.
Admin
Next you will be telling us that a telephone number is an interger
Admin
Do you still have the code, or at least the overall logic of it? It sounds like a gem to keep.
Admin
For what it's worth, Postgres has a distinct GUID type. If you're storing GUIDs there you've got no reason not to use the type. It doesn't need to be a full "integer type" because how often are you going to be doing arithmetic with them?
Admin
I'm just impressed that all the stringly GUIDs (except the absent one) are represented in the same format: enclosing braces, lower case alpha, and dashes. In .NET parlance that's the "B" format:
https://learn.microsoft.com/en-us/dotnet/api/system.guid.tostring?view=net-8.0
I would more expect at least one to be upper-cased or have parens instead of braces or ... Anything to ensure a string comparison somewhere in the codebase will return false on what's really the same underlying 128 bit Int.
Kids these days! Can't even screw up correctly.
Admin
Technically a GUID is represented internally like this:
See https://referencesource.microsoft.com/#mscorlib/system/guid.cs
It has historical reasons because in the past GUIDs were constructed via various criteria like hardware attributes, version number etc.
That's by the way another reason why you should avoid UUIDs when possible, they are very expansive to operate compared to simple 64 bit integers:
You have here a ton of inefficient 8bit & 16bit comparisons when implemented correctly, so yeah, using GUIDs as IDs is not really something you want to do client-side either when not needed.
Admin
Count me amongst those who think the GUID is for something external and the only WTF is storing it as a string (which may be necessary anyway.)
Admin
Same. We have a column for “supplied by us” in lookup tables like this, and while clients can modify the values of the other columns, they can’t modify “our” column, so even if they rename something we’ll still know if we’ve added it previously and don’t have to add again. The column name here isn’t great though.