• RLB (unregistered)

    Extra WTF: why is the UUID column a varchar 128? Uuids are 128 bits. They're not variable, and not that long, so use a char 32 already.

  • Tim (unregistered)

    "First, users should never see an auto-incrementing ID. That means you need to use UUIDs. But UUIDs are large and expensive, so they should never be your primary key, use an auto-incrementing ID for that.

    This is not, in and of itself, a radical or ridiculous statement"

    Is this really not a radical or ridiculous statement? Maybe I'm old fashioned but It would certainly raise concerns for me. surely you want to have one primary key field for each object?

  • 516052 (unregistered) in reply to Tim

    It makes some sense from a security standpoint if you do your API right.

    Basically imagine your standard MVC layout. Your frontend only ever sees UIDs and all its queries toward the backend only use UIDs. The backend and DB all use the numeric ID for all actual DB queries. This creates a complete separation where the two systems are entirely decoupled while still giving you all the advantage of using numeric IDs for actual processing.

  • (nodebb)

    @RLB: I would hope by now most SQL-ish databases have a native UUID datatype. If there is a native UUID datatype, then storing your UUID value in the database as anything other than that datatype is nuts. ISTM a 128-bit integer datatype is the least bad next choice if the database supports that. Then a 16-byte binary string, and only finally some length 32 Unicode character hexadecimal display format.

  • KarterianMoss (unregistered) in reply to 516052

    I need to think about what you said, but I find it really hard to believe that whatever benefit you get from that separation outweighs the downsides. For starters, if not done very carefully right it would turn any debugging session into a nightmare.

    ================================================================================================== "But UUIDs are large and expensive, so they should never be your primary key, use an auto-incrementing ID for that." The fact that they aren't the PK wouldn't make these UUIDs smaller or less expensive. If Bjørn has already commited himself to including them in the table, adding another (integer) field to be the PK would only make things worse

  • (nodebb)

    The problem with using auto-increment as PK is that then you are using it for FK. If you need to insert 100,000 records AND their attendant child records, you can't pre-generate the PK/FK and insert them all at once. You can insert the primary records, re-fetch them with their UUIDs, and then put the auto-increments into the child records. I'm just not sure if the performance value of auto-increments is worth that extra mess.

  • (nodebb) in reply to Tim

    The UUID is not a primary key, it's an external identifier. It does happen to also uniquely identify a record, but it doesn't fill the role of primary key in the database.

  • (nodebb) in reply to konnichimade

    you can't pre-generate the PK/FK and insert them all at once

    Pretty sure I've been doing that for years in my .NET applications. [looks at code] Yep.

  • (nodebb) in reply to RLB

    No, use the UUID type in Postgres (which they were using). 16-byte storage. Not 32.

  • (author) in reply to KarterianMoss

    It's going to vary a lot by database engine, and how optimized their UUID type is. I still come across databases in production that don't have a native UUID type, or where indexing on them gets weird. And it's not so much about the storage cost, but the indexing cost- if that value is your primary, it's likely going to show up in multiple indexes. And it's 128-bits, in contrast to, say, a bigint, which is 64-bits in many implementations. Or a numeric type, like Oracle's numeric, maxes out at 22-bytes (but is varying size, which introduces its own concerns). And each index is also going to create its own set of stats gathering. Doubling the size of your type (or worse) can have downstream effects- broadly speaking, there's an argument for joining on numeric types and using them as primary keys, and associating a unique UUID for user-facing lookups.

    Which is why I stick with: "It's not a ridiculous statement." It's not necessarily a correct statement. But there are enough caveats and complexities that it's not something to reject on its face.

    I think in the modern world, distributed/sharded database systems are common enough that there's an equally good argument to be made to "prefer a UUID for primary keys", but again, there are so many caveats that it's not inherently right or wrong- it depends.

    All that said: exposing auto-incrementing IDs to users is definitely a bad practice most of the time.

  • Álvaro González (github)

    These tables don't need additional columns to serve as primary key, but try to convince many ORMs and assorted tools to work with a PK that is not a single column called "id".

  • (nodebb) in reply to Remy Porter

    exposing auto-incrementing IDs to users is definitely a bad practice most of the time.

    Yes, people keep saying that but why? The only reason I can think (apart from the one below which applies to all internal ids) of is if your system is so poorly designed that having easily guessable ids might lead to a security breach. If that's the case, fix your system.

    I would argue that having any kind of internal id exposed to the user (as opposed to the client side code) including UUIDs is a bad idea. As soon as you do that, you lose control of it because the user can copy it. For example: put any kind of internal id in a URL and the user can bookmark it. You can never change that id again without breaking the bookmark.

    If you need an extra id to be exposed to the user, can you please make it more ergonomic than a UUID. Make it something that the user has a fighting chance of remembering or writing down by hand without making a mistake. There is a reason why no web site ever says "your login id is 805db880-ea57-41ce-87d7-42fb5d6d6aa5".

  • Foo AKA Fooo (unregistered) in reply to RLB

    "I heard that UUIDs are 128 big, so I make the field 128. Bits, chars, potayto, potahto. And some items might not have a UUID, because reasons, and our coding standards forbid NULL because NULL is evil. So I make it varchar, so they can put an empty string there."

  • (nodebb)

    Yes, people keep saying that but why?

    It exposes things that aren't necessarily security issues, like number of users, number of orders placed, numbers of anything else in the system.

  • A nunny moose (unregistered) in reply to RLB

    Maybe they were afraid a UUID wasn't unique enough and they munged several UUIDs together.

    And if you think nobody could do something so horrible, I once had a co-worker who used the random function to generate primary keys and was adamant that it would be fine.

  • (nodebb) in reply to PJH

    It exposes things that aren't necessarily security issues, like number of users

    
    CREATE TABLE [dbo].[Foo] (
        [FooID]          INT           NOT NULL IDENTITY (10001, 1),
        [FooName]        VARCHAR  (32) NOT NULL,
        [FooDescription] VARCHAR (512) NOT NULL
        CONSTRAINT [PK_Foo] PRIMARY KEY CLUSTERED ([FooID] ASC),
    );
    
  • MaxiTB (unregistered) in reply to Tim

    Your intenal technical primary key for performance reasons should always be a primitive integer type that is as small as possible but not smaller than the word size of your platform (because that would be again inefficient). In other words, on a 64bit platform your primary key should be a 64bit auto-increment integer for every table and that means you end with all 64bit integer foreign keys as well. That breaks on of the fundamental rules on how to design a DB but the performance gains and reduced complexity are simply worth it.

    Now you should also have a public business key that is unique for a table that needs to be addressed either by a stakeholder or the primary data cosumer itself in an CQRS scenario (you need to provider an identifier on INSERT because combined Read/Write operations are not allowed in that case). This allows you to rebuild and optimized the tables technical primiary key by reorganizing it or even completely change the DB structure while still providing a valid ID for the stake holders even if the record itself doesnt even exist anymore and is for example now a partial collection of various records in various tables. In other words, you should never expose your technical key in an publiclly facing API or application to avoid getting completely locked in by exposing internal details.

  • (nodebb) in reply to WTFGuy

    To my knowledge only MS SQL server has a full-supported native UUID DB type, all other vendors have at best a binary type alias but nothing beyond that like version specific indexing to speed up the lookup process. And I know from a MS SQL blog a long time ago that they optimized GUIDs to the max and still the performance is only half of an 64bit (auto inc) integer. On the other hand you have DBs like SQLite which don't have a binary type and the index performance is just horrible and inconsistent. So basically if you want to design to be somewhat database agnostic, using UUIDs as a primary key is not the best idea when it comes to performance and even if you only use MS SQL, you will still end up with half the perf on larger tables.

Leave a comment on “Joined Up”

Log In or post as a guest

Replying to comment #:

« Return to Article