• Jay (unregistered)

    Instead of trying to force-fit the GUID approach, or adopting some date/time pattern with a questionable resolution, I'll be replacing it with a 1-row 1-column table with the next INT or BIGINT to use within the body of the PK.

    It turns out that there are actually multiple new classes of synthetic IDs that need to be added, each with its own prefix letter. My posting only mentioned the 'N' version. They will all share the same table & logic for allocating the next number to be used as part of the PK.

    Simple and correct -- or is there something wrong with this approach too?

  • HowItWorks (unregistered) in reply to Jay
    Jay:
    Instead of trying to force-fit the GUID approach, or adopting some date/time pattern with a questionable resolution, I'll be replacing it with a 1-row 1-column table with the next INT or BIGINT to use within the body of the PK.

    It turns out that there are actually multiple new classes of synthetic IDs that need to be added, each with its own prefix letter. My posting only mentioned the 'N' version. They will all share the same table & logic for allocating the next number to be used as part of the PK.

    Simple and correct -- or is there something wrong with this approach too?

    To do this correctly, it MUST detect and handle race conditions!

    Without details, it appears to be a better approach, provided it is properly implemented.

  • Richard (unregistered)

    Wait...you can't use the whole thing. Ok, just use part of a GUID!

    No, TRWTF is missing the final ">" on your closing "

    " tag! :p

  • JJ (unregistered) in reply to Peter Michael Green
    Peter Michael Green:
    By my recoking [...]
    People did a lot of recoking in the 80's.
  • J (unregistered) in reply to jay
    jay:
    joe.edwards:
    Smug Unix User:
    The real WTF is using anything except id for the single primary key. What value are you adding? Table.Id is not as redundant as Table.TableId. It is not as random as Table.[22345668905233560890]. Can anyone make a valid case for naming a single non-composite primary key with anything other than id?
    *responds to ObviousTroll*

    I use TableNameID because foreign key columns will have the same name as the primary key (always join on x.foo = y.foo), and I don't have to alias the column when I select from a join.

    Exactly. And once the field name when used as a foreign key has to be different from the field name in the original record, you're bound to get people calling it different things. I'm presently working on a system where sometimes transactions.id is referenced as "transactions_id", other times as "transaction_id", "transactionid", "tranid", etc etc. Ditto when you have to alias it.

    x.y_id = y.id y.z_q_id = z_q.id

  • Paul Neumann (unregistered) in reply to J
    J:
    x.y_id = y.id y.z_q_id = z_q.id
    +1
  • F (unregistered) in reply to Nagesh
    Nagesh:
    faoileag:
    Smug Unix User:
    Can anyone make a valid case for naming a single non-composite primary key with anything other than id?
    Think of a class DriversLicence with the following members: id, name, penaltyPointsAccrued.

    Some people prefer to keep the names of the member fields intact in the database (think roll-your-own ORM). So id is taken by the id of the driver's licence, and thus the unique primary key becomes Table.Id

    Yes, I know it's a bad example as the id of the driver's licence should be unique as well, but still.

    In India, I can have driver's license in two states at same time. Both current.

    Schroedinger's Cat is also in two states at the same time.

  • VictorSierraGolf (unregistered) in reply to Valued Service

    Oh, I see, somebody never had to learn German or Russian. Lucky you...

  • B (unregistered) in reply to Jay
    Jay:
    Instead of trying to force-fit the GUID approach, or adopting some date/time pattern with a questionable resolution, I'll be replacing it with a 1-row 1-column table with the next INT or BIGINT to use within the body of the PK.

    It turns out that there are actually multiple new classes of synthetic IDs that need to be added, each with its own prefix letter. My posting only mentioned the 'N' version. They will all share the same table & logic for allocating the next number to be used as part of the PK.

    Simple and correct -- or is there something wrong with this approach too?

    If you're using SQL Server 2012, you can also use a SEQUENCE. It handles a lot of race condition possibilities for you.

    http://msdn.microsoft.com/en-us/library/ff878091.aspx

  • Tractor (unregistered) in reply to SQLer
    SQLer:
    JAPH:
    The pseudo-random requirement for the id is to make it harder for hackers / crackers to guess the next id. How many times have we seen a web interface where the table id is passed in through the uri?

    Security through obscurity is what you're describing... and only prevents access to "unauthorized records" (from bad code) by writing more bad code.

    Indeed, if someone shouldn't be allowed access to something, then don't allow access to it. Making it harder to find and hoping he won't guess it is NOT security, it's wishful thinking.

  • Captain Oblivious (unregistered) in reply to SQLer
    SQLer:
    JAPH:
    The pseudo-random requirement for the id is to make it harder for hackers / crackers to guess the next id. How many times have we seen a web interface where the table id is passed in through the uri?

    Security through obscurity is what you're describing... and only prevents access to "unauthorized records" (from bad code) by writing more bad code.

    You have no idea what you're talking about.

    Hint: if you think 128 or 256 or 512 bits of entropy is merely "obscurity", you don't deserve a job in security.

  • (cs)

    http://xkcd.com/538/

  • DBAVetus (unregistered) in reply to Jay
    Jay:
    Instead of trying to force-fit the GUID approach, or adopting some date/time pattern with a questionable resolution, I'll be replacing it with a 1-row 1-column table with the next INT or BIGINT to use within the body of the PK.

    Oh, god. Not this again.

    You'll have to lock the table, read it, update it, commit it.

    If you don't lock the table, two processes can read it at the same time and get the same "GUID".

    If you do lock the table while reading, the approach "works", but kills your performance as all processes have to queue up to get at the next "GUID", and if one of the processes fails part way through without committing, your whole system stops dead.

    If you can, use a sequence, or something else that your DB undoubtedly has to handle this exact situation.

  • Norman Diamond (unregistered) in reply to Nagesh
    Nagesh:
    MatsH:
    String.Concat("F", "R", "I", "X", "T").ToLower().Replace("X", "S")
    Not work with turkey Localization.
    It doesn't matter. "frixt" and "frıxt" both lack capital "X".
  • Norman Diamond (unregistered) in reply to Valued Service
    Valued Service:
    TRWTF is English.
    That's why they need speakers of Elbonian.
  • Anonymous (unregistered) in reply to Peter Michael Green
    Peter Michael Green:
    Colin 't Hart:
    Assuming 19 ASCII characters, then there are 19*7 = 133 bits available. GUIDs are 128 bits so that fits comfortably.
    That works if we assume all ascii characters are usable whether or not they are printable.

    If we limit ourselves to printable characters including space that gives us 95 usable characters which according to wolfram alpha lets us store just over 124 bits worth of information in 19 characters.

    By my recoking to store the whole GUID in 19 character positions the system needs to be able to safely handle 107 possibilities for each character.

    4 bits in the GUID are used to encode the version, wich is the same for every GUID generated by SQL Server, so there are only 124 bits to be encoded, a perfect match for ASCII.
  • Duc (unregistered)

    The truly WTF is that a project manager can influence the solution by on his/her own interpretation of the requirement. And if Jay keeps letting this happens, his development life will be as miserable as it can get.

  • Swedish tard (unregistered) in reply to herby
    herby:
    faoileag:
    TRWTF is English.
    Maybe true, but for those of you following at home, when doing internationalization, English usually has the shortest (character wise) phrases. If you have silly things like dialog boxes (funny plural) and only provide space for English, other languages will overflow.

    So, yes, English is a WTF, but other languages are WTF as well. They use silly things like diacriticals and even funny letters.

    The next tirade is on our measurement systems, but that is a big WTF that we all know.

    No, pretty much every asian language kicks english ass, and with a wide marging. Also, I'd wager that the scandinavian languages does as well, us being prone to not say things unnecesarily.

  • (cs) in reply to DBAVetus
    DBAVetus:
    Jay:
    Instead of trying to force-fit the GUID approach, or adopting some date/time pattern with a questionable resolution, I'll be replacing it with a 1-row 1-column table with the next INT or BIGINT to use within the body of the PK.

    Oh, god. Not this again.

    You'll have to lock the table, read it, update it, commit it.

    If you don't lock the table, two processes can read it at the same time and get the same "GUID".

    If you do lock the table while reading, the approach "works", but kills your performance as all processes have to queue up to get at the next "GUID", and if one of the processes fails part way through without committing, your whole system stops dead.

    If you can, use a sequence, or something else that your DB undoubtedly has to handle this exact situation.

    Why is this not a featured comment? This guy has just saved me a lot of typing.

  • Paul (unregistered)

    And Firebird and Interbase both have generators but you would have to convert it to a string and put the N at front but that would be handled by the trigger.

    There are better ways than truncating a GUID

  • (cs) in reply to J
    J:
    This assumes that any ASCII character is fair game, including NUL, ACK, BEL, and so on. Whether characters like TAB and CR would be allowed is debatable... But let's say you have 100 printable characters to play with (it's almost certainly a little less):

    100^19 = 1.0e+38 permutations 256^16 = 3.4e+38 permutations

    So you're short on space by more than a factor of three.

    You've got 95 characters, if you allow space. Otherwise 94 (from U+00021 to U+0007e). With 19 characters drawn from that set, you get up to about 124 bits of information. It turns out that whether you allow spaces doesn't matter all that much in how many bits of information you can put in 19 symbols.

    Allowing the “N” to vary over the letters of the alphabet (though still staying upper case) would be enough to allow a complete encoding with 1 bit to spare.

  • faoileag (unregistered) in reply to herby
    herby:
    faoileag:
    TRWTF is English.
    TRWTF is people who mess up the quotes when they quote a post with nested quotes.
  • faoileag (unregistered) in reply to Swedish tard
    Swedish tard:
    Also, I'd wager that the scandinavian languages does as well, us being prone to not say things unnecesarily.
    You are thinking of Finnish, obviously. Danish is actually not too bad (unless you try to understand spoken Danish) and swedish is ok, too. But Finnish... looking at Finnish I'm completely baffled regarding pronounciation, in much the same way I am with Welsh :-)
  • Peter Michael Green (unregistered)

    When working with randomly allocated unique IDs you have to ask yourself two questions.

    1: How many records will the database have (n) 2: What probablity of collision is acceptable (p)

    Let x be the number of values of our unique ID

    Assume that n >> 1 , also assume that since we are trying to find values that will make one collision very unlikely two or more collisions will be extremely unlikely.

    The number of pairs of records is approximately (n^2)/2 The probablity of any particular pair colliding is 1/x The probablity of a collision in the set as a whole is approximately (n^2)/2x

    Seting p = (n^2)/2x and solving for x gives us

    x=(n^2)/2p

    Since we haven't been told how big the database in question is or the acceptable probablity of a collision we can't say whether the unique ID he has generated is sufficiently large or not.

  • (cs) in reply to Swedish tard
    Swedish tard:
    No, pretty much every asian language kicks english ass, and with a wide marging. Also, I'd wager that the scandinavian languages does as well, us being prone to not say things unnecesarily.

    Endis vurn, hoom der inserten der data en der tablen, pootin oom ver databasen! Bork! Bork! Bork!

  • (cs) in reply to B
    B:
    Jay:
    Instead of trying to force-fit the GUID approach, or adopting some date/time pattern with a questionable resolution, I'll be replacing it with a 1-row 1-column table with the next INT or BIGINT to use within the body of the PK.

    It turns out that there are actually multiple new classes of synthetic IDs that need to be added, each with its own prefix letter. My posting only mentioned the 'N' version. They will all share the same table & logic for allocating the next number to be used as part of the PK.

    Simple and correct -- or is there something wrong with this approach too?

    If you're using SQL Server 2012, you can also use a SEQUENCE. It handles a lot of race condition possibilities for you.

    http://msdn.microsoft.com/en-us/library/ff878091.aspx

    SQL Server has had SEQUENCE for a lot longer than that. But it still doesn't have an atomic way to retrieve the generated SEQUENCE value on insert, (unless they added that in 2012; not sure,) which makes for race conditions of its own.

    I know Firebird lets you do that with the INSERT INTO ... RETURNING syntax; what other databases have that?

  • All hail the Google (unregistered) in reply to Mason Wheeler
    Mason Wheeler:
    B:
    Jay:
    Instead of trying to force-fit the GUID approach, or adopting some date/time pattern with a questionable resolution, I'll be replacing it with a 1-row 1-column table with the next INT or BIGINT to use within the body of the PK.

    It turns out that there are actually multiple new classes of synthetic IDs that need to be added, each with its own prefix letter. My posting only mentioned the 'N' version. They will all share the same table & logic for allocating the next number to be used as part of the PK.

    Simple and correct -- or is there something wrong with this approach too?

    If you're using SQL Server 2012, you can also use a SEQUENCE. It handles a lot of race condition possibilities for you.

    http://msdn.microsoft.com/en-us/library/ff878091.aspx

    SQL Server has had SEQUENCE for a lot longer than that. But it still doesn't have an atomic way to retrieve the generated SEQUENCE value on insert, (unless they added that in 2012; not sure,) which makes for race conditions of its own.

    I know Firebird lets you do that with the INSERT INTO ... RETURNING syntax; what other databases have that?

    Oracle appears to have it (and an incredibly easy way to use sequences to resolve this issue); "Insert into ... values (concat('N' + seq.nextval), ...)" I really don't understand why any engineer would let the PM decide on the implementation details of the DB (or really anything). Isn't that my job?

  • B (unregistered) in reply to Mason Wheeler
    Mason Wheeler:
    B:
    Jay:
    Instead of trying to force-fit the GUID approach, or adopting some date/time pattern with a questionable resolution, I'll be replacing it with a 1-row 1-column table with the next INT or BIGINT to use within the body of the PK.

    It turns out that there are actually multiple new classes of synthetic IDs that need to be added, each with its own prefix letter. My posting only mentioned the 'N' version. They will all share the same table & logic for allocating the next number to be used as part of the PK.

    Simple and correct -- or is there something wrong with this approach too?

    If you're using SQL Server 2012, you can also use a SEQUENCE. It handles a lot of race condition possibilities for you.

    http://msdn.microsoft.com/en-us/library/ff878091.aspx

    SQL Server has had SEQUENCE for a lot longer than that. But it still doesn't have an atomic way to retrieve the generated SEQUENCE value on insert, (unless they added that in 2012; not sure,) which makes for race conditions of its own.

    You have not done your research, sir. There's a reason I linked the MSDN page.

    SQL Server has had IDENTITY properties allowed on table fields for a very long time, but SEQUENCE is never-before-seen-brand-new in SQL Server 2012. The query "SELECT * FROM sys.sequences" will fail on any version of SQL Server prior to 2012. They are similar to Oracle SEQUENCEs (unsurprisingly) and they represent a unique way of handling a series of numbers.

    A SEQUENCE is a table-independent value. It is not tied to a specific field in a specific table, but is managed as an entity on it's own.

    You can get a value using the NEXT VALUE FOR construct (examples here: http://msdn.microsoft.com/en-us/library/ff878370.aspx ). Unless you specifically allow your SEQUENCE to restart (or CYCLE) the sequence when the max value has been reached, the server will only return any given value for NEXT VALUE FOR once. It's like a take-a-number paper ticket at the DMV. The values are consumed.

    I know Firebird lets you do that with the INSERT INTO ... RETURNING syntax; what other databases have that?

    SQL Server can do this with the OUTPUT clauses, which are valid for INSERT, UPDATE, and DELETE statements (as well as the SQL Server-specific MERGE statement) using the inserted.* and deleted.* special table references. See here: http://msdn.microsoft.com/en-us/library/ms177564.aspx That has been in place since SQL Server 2005.

  • TortoiseWrath (unregistered)

    What was wrong with "N2013041222124600001"?

  • Guy (unregistered)

    ...and then can you are told that the PK has to be a clustered index...

  • AC (unregistered)

    TRWTF is that a project manager gets to decide these kind of things...

  • AN AMAZING CODER (unregistered) in reply to Tractor
    Tractor:
    SQLer:
    JAPH:
    The pseudo-random requirement for the id is to make it harder for hackers / crackers to guess the next id. How many times have we seen a web interface where the table id is passed in through the uri?

    Security through obscurity is what you're describing... and only prevents access to "unauthorized records" (from bad code) by writing more bad code.

    Indeed, if someone shouldn't be allowed access to something, then don't allow access to it. Making it harder to find and hoping he won't guess it is NOT security, it's wishful thinking.

    Yeah, passwords are dumb, since anyone can get to a website and type one in.

    Ignoring the fact that a UUID has exponentially more entropy than anyone's banking password on thus website, using a UUID instead of a non-linear sequence isn't just for "making it hard to guess". It's also more scalable, and hides business intelligence. It's the same reason I started using invoices prefixed with the year when I started my business: I didn't want anyone to know they were only the first (or 5th, or 20th) person I've ever billed.

  • bambam (unregistered) in reply to DBAVetus
    DBAVetus:
    Jay:
    Instead of trying to force-fit the GUID approach, or adopting some date/time pattern with a questionable resolution, I'll be replacing it with a 1-row 1-column table with the next INT or BIGINT to use within the body of the PK.

    Oh, god. Not this again.

    You'll have to lock the table, read it, update it, commit it.

    If you don't lock the table, two processes can read it at the same time and get the same "GUID".

    If you do lock the table while reading, the approach "works", but kills your performance as all processes have to queue up to get at the next "GUID", and if one of the processes fails part way through without committing, your whole system stops dead.

    If you can, use a sequence, or something else that your DB undoubtedly has to handle this exact situation.

    Used the following SQL to implement a database independent record id generator:

    SELECT curid FROM idtable previd = curid nextid = curid + 1 UPDATE idtable SET curid = nextid WHERE curid = previd if record update count = 0 repeat above

    The UPDATE only succeeds if curid was not updated by another process. No locks required.

  • David Mårtensson (unregistered) in reply to B
    B:
    If you're using SQL Server 2012, you can also use a SEQUENCE. It handles a lot of race condition possibilities for you.

    http://msdn.microsoft.com/en-us/library/ff878091.aspx

    Using insert with an output clause or a select @@identity immediately after an insert will work on older MS SQL and with no race conditions as long as you do not try to share a connection between threads, which you should not do anyway ;)

  • Ross Presser (unregistered)

    Because I have nothing else to contribute, I hereby contribute GuidToBase64 (for MS SQL Server 2008 and higher. Maybe 2005.)

    CREATE FUNCTION dbo.Guidtobase64 ( @guid UNIQUEIDENTIFIER ) RETURNS VARCHAR(22) AS BEGIN DECLARE @bin VARBINARY(20) SET @bin = CONVERT(VARBINARY, @guid) RETURN Replace( Replace( Cast(N'' AS XML) .value('xs:base64Binary(xs:hexBinary(sql:variable("@bin")))' , 'varchar(20)') , '+', '-') , '/', '_') END

    Reversing this is left as an exercise.

    CAPTCHA: I am the SAGACITER

  • Jim (unregistered) in reply to Captain Oblivious
    Captain Oblivious:
    SQLer:
    JAPH:
    The pseudo-random requirement for the id is to make it harder for hackers / crackers to guess the next id. How many times have we seen a web interface where the table id is passed in through the uri?

    Security through obscurity is what you're describing... and only prevents access to "unauthorized records" (from bad code) by writing more bad code.

    You have no idea what you're talking about.

    Hint: if you think 128 or 256 or 512 bits of entropy is merely "obscurity", you don't deserve a job in security.

    That works right up until someone discovers you don't really have that much entropy. All it takes is a little bug in the PRNG to be found an you are TOAST.

    Correct nobody is going to 'guess' valid ideas in a sparsely populated 128-bit or larger space. As a security guy though I would call you FOOLISH to depend on that. Because there have been many vulns due to the discovery that something people though was random was actually directly predictable, or there were at least ways to dramatically reduce the search space.

  • Lis (unregistered) in reply to bambam
    Used the following SQL to implement a database independent record id generator:

    SELECT curid FROM idtable previd = curid nextid = curid + 1 UPDATE idtable SET curid = nextid WHERE curid = previd if record update count = 0 repeat above

    The UPDATE only succeeds if curid was not updated by another process. No locks required.

    Welcome to TDWTF. With advice like this, I look forward to seeing your stuff on the front page any time now.

  • gnasher729 (unregistered) in reply to Jim
    Jim:
    Correct nobody is going to 'guess' valid ideas in a sparsely populated 128-bit or larger space. As a security guy though I would call you FOOLISH to depend on that. Because there have been many vulns due to the discovery that something people though was random was actually directly predictable, or there were at least ways to dramatically reduce the search space.
    I remember someone noticing that various different routers often used the same private keys for encryption. Which is not in itself the problem, since you still can't crack them. However, it implied that these routers would use the same "random" prime numbers in their public keys, and if two private keys have one of their public key components in common (but not both), then both private keys are trivial to crack.
  • Bobby Tables (unregistered)

    The 'N' part of the key has meaning, it's a grouping code, so the OP tells us. That means the database design doesn't even make it to first normal form, fields should not be multi-value. Just use a sequence to generate a unique key, put the CHAR(1) values in a separate column. When an invoice/report is printed out, then concatenate the two values together as a string. Mixing presentation with storage layers, spells poor performance. Also, pro-tip, hire someone who has a clue about databases.

  • B (unregistered) in reply to David Mårtensson
    David Mårtensson:
    B:
    If you're using SQL Server 2012, you can also use a SEQUENCE. It handles a lot of race condition possibilities for you.

    http://msdn.microsoft.com/en-us/library/ff878091.aspx

    Using insert with an output clause or a select @@identity immediately after an insert will work on older MS SQL and with no race conditions as long as you do not try to share a connection between threads, which you should not do anyway ;)

    True, but a SEQUENCE is a bit cleaner, IMO, and it allows you to easily use the same series of numbers for multiple tables. That might be a rare use-case, but it's there. You also don't have to put the logic in the application. You can do this:

    create sequence mainseq as bigint start with 1 increment by 1;

    create table mytable ( id bigint not null primary key constraint DF_mytblid default next value for mainseq, code varchar(20) not null, .... );

    Now other applications can access your DB and can create new IDs and the logic is already clear.

  • Neil (unregistered) in reply to bambam
    bambam:
    Used the following SQL to implement a database independent record id generator:

    SELECT curid FROM idtable previd = curid nextid = curid + 1 UPDATE idtable SET curid = nextid WHERE curid = previd if record update count = 0 repeat above

    The UPDATE only succeeds if curid was not updated by another process. No locks required.

    When I was a beginner rookie database programmer I used to run into various concurrency and deadlock issues, but I don't remember ever having a problem with the following sequence:

    UPDATE idtable SET curid = curid + 1 SELECT curid FROM idtable INSERT ... COMMIT

    Of course, that doesn't make it correct, just lucky.

  • RandomUser423721 (unregistered) in reply to David Mårtensson
    David Mårtensson:
    B:
    If you're using SQL Server 2012, you can also use a SEQUENCE. It handles a lot of race condition possibilities for you.

    http://msdn.microsoft.com/en-us/library/ff878091.aspx

    Using insert with an output clause or a select @@identity immediately after an insert will work on older MS SQL and with no race conditions as long as you do not try to share a connection between threads, which you should not do anyway ;)

    On versions that support it (2k+ ?), use SCOPE_IDENTITY() instead of @@IDENTITY. Reduces issues with triggers, etc.

  • mcmcc (unregistered) in reply to Peter Michael Green
    Peter Michael Green:
    Colin 't Hart:
    Assuming 19 ASCII characters, then there are 19*7 = 133 bits available. GUIDs are 128 bits so that fits comfortably.
    That works if we assume all ascii characters are usable whether or not they are printable.

    If we limit ourselves to printable characters including space that gives us 95 usable characters which according to wolfram alpha lets us store just over 124 bits worth of information in 19 characters.

    By my recoking to store the whole GUID in 19 character positions the system needs to be able to safely handle 107 possibilities for each character.

    If memory serves, only 122 bits of GUIDs are actually random (the rest are type/version info), so it still fits with two bits to spare.

Leave a comment on “Mini-GUID”

Log In or post as a guest

Replying to comment #:

« Return to Article