The Daily WTF: Curious Perversions in Information Technology

Mr. TA · 2024-07-03 Reply Admin

This comment doesn't have an index.

2024-07-03 Reply Admin

This module has File Not Found.

MaxiTB · 2024-07-03 Reply Admin

This also tells us that the GUIDs are stored as strings, which is the wrong way to store a GUID.

There are databases (SQLite comes to mind) which don't have an dedicated UUID type, so the only way to store them is as a string. However it's always bad form to mix up types, it just adds additional error opportunities and makes implementing proper code way more complex and results in a high cost in maintainability and poor extendability.

DocMonster · 2024-07-03 Reply Admin

I've found way too many people who don't understand GUIDs or don't get you can use a GUID as an internal key only for optimization/uniqueness and still expose a descriptive key to the user. Once I even met a group who didn't use GUIDs because they "used too much memory" compared to ints.

2024-07-03 Reply Admin

The ID is the key. The GUID is looked up in this table and used to reference something else. The releases module doesn't point at a something else so doesn't have a GUID. Yes, relying on a parsing exception rather than just storing a NULL is indeed a bit weird ...

Dragnslcr · 2024-07-03 Reply Admin

But what if you have a trillion rows in your database? You would need a whole extra hard drive in your pool just for the GUIDs!

Mr. TA · 2024-07-03 Reply Admin

Integer IDs are preferable to guids in many cases, but storage size, while technically valid, is a small problem in the grand scheme of things

Remy Porter · 2024-07-03 Reply Admin

You don't store a UUID as a string- you store it as an integer. If your database can't do 128 bit integers, then we can talk about string representations.

MaxiTB · 2024-07-03 Reply Admin

Heyho, that statement is no correct one so many levels but honestly DB vendors themselves have already tried to quell the "integers are not faster then UUIDs" rumors, but still it remains.

To take an example, the SQL server most optimized for UUIDs, MS SQL, has a best case penalty of only 10% over 64 bit integers when using UUIDs are PKs. This due to MS SQL having UUIDs as 1st class primitive type support from day 1 while all other DBs do not but either internally represent them as binary or text block.

Now lets start with ID generation: For an 64bit integer this is a single atomic CPU OP, while obviously an UUID generation takes when done cheap 1,000x, when done correctly 1,000,000x the compute power. This might be irrelevant for hobby DBs but it matters a lot for enterprise databases.

Now lets check the lookup side: Obviously 64bit integer have a perfect distribution without collision while any other structure needs to be hashed or simplified in another way, which additionally generates not only direct one-time overhead, you need a different index structure (think direct array access vs hash table, while in reality in DBs you actually have responsive tree structures). Now obviously this simplification can not be optimized on the expected data, because in the best case it's completely random which means in practice you will never have balanced distributions making the lookup usually slower and less predicatable.

Now I don't get into the details of overfetching issues (the biggest contirbutor to slow DB in practice BTW), client implementations due to mashalling and transcoding etc. etc.

Long story short: UUIDs are always less efficient by far compared 64 bit IDs.

There are some legit purposes for UUIDs though and those are literally what they were made for: Distributed IDs, so IDs between two completely seperated systems. Hence they often get uses as replications IDs, for synchonization purposes and as external IDs generated by a complete different domain.

MaxiTB · 2024-07-03 Reply Admin

Ehm, maybe I'm getting out of touch, but AFAIK neither MS SQL, Postgres, Oracle or Maria support native 128 bit integer types. And MS SQL is to my knowledge still the only one supporting natively UUIDs. But I'm happy to be proven wrong ;-)

2024-07-03 Reply Admin

We have a very similar table set up for one of our applications, and the reasoning behind the one here might be the same. We have an integer ID field to uniquely identify the record, and we have a GUID because an external system we interface with requires us to supply its GUID for the object any time we interact with it (which we obtain when we first register that object with the external system).

Some records have a GUID (because the external system knows about them and has supplied us the GUID to store). And some records do not (because they have never been passed to the external system).

My suspicion is that the situation for this wtf is similar.

Storing the GUID as a string? Depends, I guess, on what DB engine is being used.

2024-07-03 Reply Admin

that GUID field has a unique value for that row. Doesn't that mean that it does actually have a GUID? (perhaps an "even more unique" one than any other used anywhere else... <wink> )

2024-07-03 Reply Admin

So many valid and sensible comments on a totally mis-represented article... WTF is going on?

Balor64 · 2024-07-03 Reply Admin

I don't know of any databases that support 128 bit integers, but all the ones I know of support storing raw bytes. I'd go with some BINARY(16) column before I resorted to storing them as strings.

2024-07-03 Reply Admin

One rationale for both integer and GUID keys comes from SQL Server... using an integer as the table primary key helps with efficient indexing and page fill. It may not be desirable to expose those IDs as public identifiers, so that's where the GUID comes in, but that still doesn't account for either storage as strings or exempting rows, especially when the exemption is marked by free text.

DocMonster · 2024-07-03 Reply Admin

Jeez I've been out of touch for a while it seems. Last I remember, GUIDs were the preferred way to guarantee uniqueness.

yikes.

Mr. TA · 2024-07-03 Reply Admin

Guids guarantee uniqueness in a global context, which is not needed 99% of the time for a table row.

Addendum 2024-07-03 11:22: PS. MaxiTB's comment explains it all, really.

pcooper · 2024-07-03 Reply Admin

obviously an UUID generation takes when done cheap 1,000x, when done correctly 1,000,000x the compute power. This might be irrelevant for hobby DBs but it matters a lot for enterprise databases.

Well, that's because you're not thinking enterprisy enough. All you need to do is outsource your GUID generation to a separately scaled microservice, or to a "cloud" provider that does GUID as a Service.

This is a parody, in case that's not obvious.

JabberMan · 2024-07-03 Reply Admin

THe worst coding I ever encountered-- and I posted to this group decades ago with it, not sure of the link-- was in a Visual Basic 6.0 app which was doing some database read/write operations.

One DB table had a GUID value that was no longer used, and a dev needed to store an integer, so instead of changing the database schema he wrote two functions, IntToGuid() and GuidToInt() and "translated" the values after reading / writing queries.

The result was we had a lot of identical GUID values for int values. Interesting bit of work doing binary coding in VB 6.0 to do the translations, however.

2024-07-03 Reply Admin

Next you will be telling us that a telephone number is an interger

2024-07-04 Reply Admin

Do you still have the code, or at least the overall logic of it? It sounds like a gem to keep.

Watson · 2024-07-04 Reply Admin

For what it's worth, Postgres has a distinct GUID type. If you're storing GUIDs there you've got no reason not to use the type. It doesn't need to be a full "integer type" because how often are you going to be doing arithmetic with them?

WTFGuy · 2024-07-04 Reply Admin

I'm just impressed that all the stringly GUIDs (except the absent one) are represented in the same format: enclosing braces, lower case alpha, and dashes. In .NET parlance that's the "B" format:
https://learn.microsoft.com/en-us/dotnet/api/system.guid.tostring?view=net-8.0

I would more expect at least one to be upper-cased or have parens instead of braces or ... Anything to ensure a string comparison somewhere in the codebase will return false on what's really the same underlying 128 bit Int.

Kids these days! Can't even screw up correctly.

MaxiTB · 2024-07-04 Reply Admin

underlying 128 bit Int

Technically a GUID is represented internally like this:

        private int         _a;
        private short       _b;
        private short       _c;
        private byte       _d;
        private byte       _e;
        private byte       _f;
        private byte       _g;
        private byte       _h;
        private byte       _i;
        private byte       _j;
        private byte       _k;

See https://referencesource.microsoft.com/#mscorlib/system/guid.cs

It has historical reasons because in the past GUIDs were constructed via various criteria like hardware attributes, version number etc.

That's by the way another reason why you should avoid UUIDs when possible, they are very expansive to operate compared to simple 64 bit integers:

        public bool Equals(Guid g)
        {
            // Now compare each of the elements
            if(g._a != _a)
                return false;
            if(g._b != _b)
                return false;
            if(g._c != _c)
                return false;
            if (g._d != _d)
                return false;
            if (g._e != _e)
                return false;
            if (g._f != _f)
                return false;
            if (g._g != _g)
                return false;
            if (g._h != _h)
                return false;
            if (g._i != _i)
                return false;
            if (g._j != _j)
                return false;
            if (g._k != _k)
                return false;
 
            return true;
        }

You have here a ton of inefficient 8bit & 16bit comparisons when implemented correctly, so yeah, using GUIDs as IDs is not really something you want to do client-side either when not needed.

LorenPechtel · 2024-07-04 Reply Admin

Count me amongst those who think the GUID is for something external and the only WTF is storing it as a string (which may be necessary anyway.)

2024-07-05 Reply Admin

Same. We have a column for “supplied by us” in lookup tables like this, and while clients can modify the values of the other columns, they can’t modify “our” column, so even if they rename something we’ll still know if we’ve added it previously and don’t have to add again. The column name here isn’t great though.

Looks Guid to Me

Leave a comment on “Looks Guid to Me”