• jverd (unregistered) in reply to Alexander Morou
    Alexander Morou:
    If you encounter a duplicate when you generate a key, generate a new damn key. Easy fix.

    The real question is why they didn't just call Guid.NewGuid(), or implement a system-specific variation that would check the subset in play for a duplicate and repeat as necessary until a new Guid is found.

    There was some comment in the original WTF to the effect that, at the point of ID generation, they could not know which IDs had already been generated -- hence the whole point of using a pseudo-random GUID generator in the first place.

    So if I'm that developer, I either accept that the risk of GUID collision is small enough (Andy's retarded swap hack notwithstanding), or I go back to the stakeholders and point out that this requirement cannot be implemented as stated.

  • Alexander Morou (unregistered) in reply to jverd

    Seems I skimmed the opening part of Unique Requirement.

    Even for a large number of users generating their own pools of IDs and submitting them after they're done, the same logic still applies. If you can't check the information at the time of creation, you could likely modify the underlying implementation in a way that would be suitable. If a collision did occur, the logical way to handle it is through the correspondence between client and server. When the client uploads a data set that collides, create new IDs for the client's associated information and update the client-side data to reflect that change.

    Mind you, given the relatively minuscule chance that such an occurrence would present itself, it should be an acceptable compromise for the client.

  • Why? (unregistered)

    This article was already submitted to and published by Visual Studio Magazine 2 or 3 years ago. (Submitted by this very web site.) Are we completely out of good WTFs for now?

  • Alexander Morou (unregistered) in reply to Why?

    You are indeed correct.

    Since there was no source cited, is that a bad thing for this website or does it regularly not cite its sources?

  • Alexander Morou (unregistered) in reply to Alexander Morou

    Then again, the author who submitted it to Visual Studio Magazine was Alex Papadimoulis.

  • jverd (unregistered) in reply to Alexander Morou
    Alexander Morou:
    Even for a large number of users generating their own pools of IDs and submitting them after they're done, the same logic still applies. If you can't check the information at the time of creation, you could likely modify the underlying implementation in a way that would be suitable. If a collision did occur, the logical way to handle it is through the correspondence between client and server. When the client uploads a data set that collides, create new IDs for the client's associated information and update the client-side data to reflect that change.

    Mind you, given the relatively minuscule chance that such an occurrence would present itself, it should be an acceptable compromise for the client.

    Well, yeah, we can speculate all we want on the actual requirements and context, and depending on which variation we come up with, there are different sets of viable solutions.

    To get the most enjoyment out of the WTF, I take the requirement and GUID solution at face value. I assume that they had to use a GUID, or something like it, because there's no way to detect a collision until it's too late--when it's either impossible or impractical to work backwards to the source, separate the results, regenerate, and try again. (And I mean "assume" in the sense of "let's make this one of our assumptions, for this iteration of the problem," as opposed to "I wholeheartedly believe that this is true.")

    If this is the case, then your choices are basically:

    1. Determine that GUID is "sufficiently unique."

    2. Come up with some other scheme that you can determine is "sufficiently unique".

    3. Push back on the stakeholders and tell them, "Impossible as stated."

    On the other hand, if it turns out that it's merely inconvenient to deal with a duplicate--say we don't find out until 3 steps down the line, but we CAN reliably detect dupes, and enough of the original request/input/whatever remains that "Reject, Regenerate, Retry" is possible--then we have a different landscape, and as you say, the rarity of such occurrences should mean the "RRR" approach is viable.
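
    (For what it's worth, a minimal sketch of that "Reject, Regenerate, Retry" idea, assuming the generator CAN consult some record of already-used IDs -- the class and the in-memory set here are invented purely for illustration:)

        import java.util.HashSet;
        import java.util.Set;
        import java.util.UUID;

        // Minimal "Reject, Regenerate, Retry" sketch: keep drawing random IDs until
        // one is not already in the (hypothetical) set of known IDs.
        class RrrSketch {
            private final Set<UUID> known = new HashSet<>();  // stand-in for the real store

            UUID nextId() {
                UUID candidate;
                do {
                    candidate = UUID.randomUUID();
                } while (!known.add(candidate));              // add() returns false on a duplicate
                return candidate;
            }
        }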

  • Alexander Morou (unregistered) in reply to jverd

    Well,

    There's a number of things that aren't immediately obvious from the article.

    They mention that these are the client's needs, but say nothing of the client's awareness of potential solutions. Regardless of the unique identifier employed, there will undoubtedly be some chance, scant as it might be, of duplicates. Since the client isn't writing the software themselves (either because they're not programmers, or because your group wrote some underlying closed-source dependency and you're the people to go to), the way to handle duplicates might not be obvious to the client, which is why they would put forth such a requirement in the first place.

    There's really insufficient information to say whether recreating the identifiers, after they're created, is out of the question, since their use outside of synchronization/submission is not explicitly stated. Given that the correspondence between the client software and the server back end can handle the issue, short of the users writing the GUID down and using it internally, the software should easily be able to propagate any necessary changes to the underlying data model. Odds are the only change would be the GUID itself, not the underlying data structure that represents it (since it's probably a client-side database of some sort?)

  • (cs) in reply to Alexander Morou
    Alexander Morou:
    Regardless of the unique identifier employed, there will undoubtedly be some chance, scant as it might be, of duplicates.
    Understatement of the century? Do people understand scale?
  • Alexander Morou (unregistered) in reply to Sutherlands

    Actually, I take that back, let me rephrase:

    Given that most systems store data fields as fixed-length values: regardless of the method employed to generate the unique identifier, so long as that identifier is also fixed-length and cannot dynamically adjust based on the needs of the scale of the system in place, duplicates will undoubtedly occur should the set of unique identifiers in use not be incremental in nature. Given an incremental system, versus random, the identifier is then limited to the finite set of elements capable of being represented in the size allowed.

  • jverd (unregistered) in reply to Alexander Morou
    Alexander Morou:
    Well,

    There's a number of things that aren't immediately obvious from the article.

    Exactly. That was part of my point. :-)

    They mention that these are the client's needs, but say nothing of the client's awareness of potential solutions.

    Hence the potential for dialogue with the stakeholders I mentioned. We (the developers) go back to the client and tell him, "Look, based on your requirements, uniqueness CANNOT be guaranteed. However, we can make it extremely unlikely, such as blah blah million years blah blah," and present possible solutions based on the requirements as understood and on potential loosening of the requirements.

    Of course, then we drastically reduce our chances of a WTF, and where's the fun in that?

    Regardless of the unique identifier employed, there will undoubtedly be some chance, scant as it might be, of duplicates.

    That depends on how much of the article we take at face value. If there's truly no way to know anything at all about what IDs have been generated so far, as the article implies, then yes, there's no way we can guarantee that an ID hasn't been used already. All we can do is reduce the probability of a collision to an acceptable level.

    If we go back to the requirements generators and try to squeeze more details out of them, we may find that there's some way to know for certain that a particular subset of valid IDs has never been used.

    Since the client isn't writing the software themselves (either because they're not programmers, or because your group wrote some underlying closed-source dependency and you're the people to go to), the way to handle duplicates might not be obvious to the client, which is why they would put forth such a requirement in the first place.

    Yup. And why we need to make sure they understand what they're really getting, and what the limitations and alternatives are.

    There's really insufficient information to say whether recreating the identifiers, after they're created, is out of the question

    Yup. There is, however, sufficient information to determine that the "swap the first two digits" hack is a total WTF, and to take childish amusement from it.

  • jverd (unregistered) in reply to Alexander Morou
    Alexander Morou:
    Actually, I take that back, let me rephrase:

    Given that most systems store data fields as fixed-length values: regardless of the method employed to generate the unique identifier, so long as that identifier is also fixed-length and cannot dynamically adjust based on the needs of the scale of the system in place, duplicates will undoubtedly occur should the set of unique identifiers in use not be incremental in nature.

    Not quite. It will only "undoubtedly" occur if we generate at least N + 1 identifiers. It will occur with probability described by the birthday paradox up to that point.

    Given an incremental system, versus random, the identifier is then limited to the finite set of elements capable of being represented in the size allowed.

    This is true of the random generation case as well. The pigeonhole principle applies in both cases. It's just that with the sequential case, we always know, from a single datum, exactly which IDs have been used and which haven't, and we can guarantee that the first N IDs generated will be unique.
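
    (A quick back-of-the-envelope version of that "ever-decreasing probability", using the standard birthday approximation and assuming a version-4 GUID with 122 random bits; the trillion-ID figure is picked arbitrarily for the sketch:)

        // Birthday approximation: p(collision) ~ 1 - e^(-n(n-1)/(2N)),
        // with N = 2^122 possible version-4 GUIDs and n IDs generated.
        class BirthdaySketch {
            public static void main(String[] args) {
                double n = 1e12;                      // a trillion IDs, for the sake of argument
                double N = Math.pow(2, 122);          // size of the random-GUID space
                double p = -Math.expm1(-n * (n - 1) / (2 * N));
                System.out.printf("p(collision) ~ %.2e%n", p);  // on the order of 1e-13
            }
        }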

  • Alexander Morou (unregistered) in reply to jverd

    By undoubtedly, I was referring to a use case that required an indeterminate amount of time, up to, but not including, an infinite amount of time (since it's not finite, you can't really include it, short of making something up.) Granted, both instances are limited in the set of values that they can represent; however, in the case of the random but piecewise (per character) finite set (0-9, A-F), any system which generated a sequence based on a set of rules would be difficult to construct such that it adjusted in size as needed once the current length was determined to be inadequate, since you'd need to enumerate every variation of the 'random' but finite sequence to determine whether the set had been exhausted before increasing the size.

    I was more speaking hypothetically in the facetious sense.

  • jverd (unregistered) in reply to Alexander Morou
    Alexander Morou:
    By undoubtedly, I was referring to a use case that required an indeterminate amount of time, up to, but not including, an infinite amount of time (since it's not finite, you can't really include it, short of making something up.)

    Okay, but still, for both the random and the sequential case, there's no guarantee of a dupe until we've hit N+1 IDs. The only difference is that until that point, with the sequential we're guaranteed no dupes, and with the random, we have an ever-decreasing probability of no dupes.

    And in the domain of a GUID, and assuming it has decent entropy, for any reasonable lifetime, the probability will never climb above "unimaginably minuscule."

    I was more speaking hypothetically in the facetious sense.

    What? You can't be facetious on the internet!

  • (cs) in reply to Coyne
    Coyne:
    His proposal called for 4 digits, with no two employees having the same pin.

    Why do the PINs need to be unique?

  • (cs) in reply to jverd
    jverd:
    Okay, but still, for both the random and the sequential case, there's no guarantee of a dupe until we've hit N+1 IDs. The only difference is that until that point, with the sequential we're guaranteed no dupes, and with the random, we have an ever-decreasing probability of no dupes.
    The key advantage of a GUID over a sequence number is that you don't need to have a centralized service issuing GUIDs; sequence numbers force you to keep a counter somewhere. (GUIDs are also mixed up thoroughly so they are somewhat awkward to guess if you don't already know it; that can be useful in some applications. OTOH, it's not exactly hard to use a hash algorithm to turn a sequence number into an apparently-random number, so it's not really a significant difference.)
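
    (A sketch of that last aside -- a centralized counter still guarantees uniqueness while a hash makes the output look random; the class and method names are invented for illustration, and note that hashing technically trades the hard guarantee for a "collisions are astronomically unlikely" one:)

        import java.nio.ByteBuffer;
        import java.security.MessageDigest;

        // Turn a plain sequence number into an apparently-random identifier.
        class HashedSequenceSketch {
            private long counter = 0;  // the centralized sequence

            synchronized String nextId() throws Exception {
                byte[] seq = ByteBuffer.allocate(Long.BYTES).putLong(counter++).array();
                byte[] digest = MessageDigest.getInstance("SHA-256").digest(seq);
                StringBuilder hex = new StringBuilder();
                for (byte b : digest) hex.append(String.format("%02x", b));
                return hex.toString();  // 64 hex chars with no obvious ordering
            }
        }
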
  • (cs) in reply to Maurits
    Maurits:
    Coyne:
    His proposal called for 4 digits, with no two employees having the same pin.
    Why do the PINs need to be unique?
    So they can be used as primary keys! Duh!
  • Geoff (unregistered)

    340 billion trillion quadrillion wtf? how is that not a typo? It means nothing. Use the proper term listed here for it to make any freaking sense. http://en.wikipedia.org/wiki/List_of_numbers#English_names_for_powers_of_10

  • fial (unregistered)

    And what's the chance that the "newly generated" PGUID ("processed GUID") is already present in the system, either as a PGUID or as an unprocessed GUID? They should at least think of that..

  • Mike (unregistered)

    I get the academic amusement of discussing "the chances" of the two first digits being the same, or of being born on the same day or day of the week as someone else, but there's something I just want to point out. Those discussions all seem to assume that there is an equal chance of all outcomes. There can actually be properties inherent (or engineered) in the process of choosing that favor certain combinations. The GUID generation algorithm is essentially a black box for most of us. And for things like being born on the same weekday... doctors may not like to work late on Fridays, for example, so may induce more births on that day. Statistics are like bikinis -- what they reveal is suggestive, but what they conceal is vital.

  • blackanchorage (unregistered) in reply to JustSomeGuy
    JustSomeGuy:
    Obviously, the answer is to interject your own getguid function along the lines of:

    def myguid():
        guid = "aawhatever"
        while guid[0] == guid[1]:
            guid = getGUID()
        return guid

    Voila! No duplicate first characters. Of course, it reduces your GUID-space by a factor of 16 but there's still plenty of them out there, right :-)

    Nooo... your GUID-space would be heximated.

  • TechNeilogy (unregistered) in reply to blackanchorage

    Once, pre MS GUIDs, I had to write a routine to generate unique strings which were printed and distributed to another branch of the company. For some reason, in the first few pages of strings, even though the strings were fairly short, some pretty serious profanity showed up embedded in the strings. So I had to add a filter.

    This experience forever changed my views on randomness.

  • The Poop... of DOOM! (unregistered) in reply to Synchronos
    Synchronos:
    DaveE:
    the probability of one char equaling the character that preceded it (no matter the char) is 1 in 16.
    WC:
    The chance that the second character is the same as the first is only 1 in 16.
    DaveE:
    the probability of both the chars being 'A' (for instance' is 1 in 256.
    WC:
    No, 1 in 256 is the chance of the 2 characters being a certain character.

    What's the probability of misreading comments on the Internets? Approximately 1 in 1.

    It's more like 1 in 1, cause he misread the comment.

  • The Poop... of DOOM! (unregistered) in reply to ih8u
    ih8u:
    Dave:
    You're the only one, I don't know why I even come here anymore.

    Translation: I don't feel cool, important, superior, or even justified in my sluggish existence unless I don't like things that you like. See how awesome I am? I'm just so miserable because things aren't awesome enough for me.

    Why don't you join the other forum monkeys and try and figure out the probability that shit happens which, by the way, is 103%. You see, it happens ALL THE TIME, and sometimes twice.

    Not quite... Bad things always come in 3. Get fired? Chances are your gf'll break up and your grandma'll die. It's the Rule of 3, but the more realistic version than the one taught in elementary school.

  • WolfWings (unregistered) in reply to K

    Nope, it's actually 1-in-16, since there are 16 'winning' combinations out of those 256 possible permutations.

  • (cs) in reply to Geoff
    Geoff:
    340 billion trillion quadrillion wtf? how is that not a typo? It means nothing. Use the proper term listed here for it to make any freaking sense. http://en.wikipedia.org/wiki/List_of_numbers#English_names_for_powers_of_10
    Well, let's see now. One hundred thousand is one hundred times one thousand.

    I'm guessing it makes sense if you multiply the terms. Or use COMMON FUCKING SENSE.

    Oh wait - nerd site - never mind.

  • jverd (unregistered) in reply to dkf
    dkf:
    jverd:
    Okay, but still, for both the random and the sequential case, there's no guarantee of a dupe until we've hit N+1 IDs. The only difference is that until that point, with the sequential we're guaranteed no dupes, and with the random, we have an ever-decreasing probability of no dupes.
    The key advantage of a GUID over a sequence number is that you don't need to have a centralized service issuing GUIDs; sequence numbers force you to keep a counter somewhere. (GUIDs are also mixed up thoroughly so they are somewhat awkward to guess if you don't already know it; that can be useful in some applications. OTOH, it's not exactly hard to use a hash algorithm to turn a sequence number into an apparently-random number, so it's not really a significant difference.)

    Yes, I know. Not sure of the relevance or what point you're trying to make.

  • jverd (unregistered) in reply to Mike
    Mike:
    I get the academic amusement of discussing "the chances" of the two first digits being the same, or of being born on the same day or day of the week as someone else, but there's something I just want to point out. Those discussions all seem to assume that there is an equal chance of all outcomes. There can actually be properties inherent (or engineered) in the process of choosing that favor certain combinations. The GUID generation algorithm is essentially a black box for most of us.

    Exactly. And since we don't KNOW of any algorithm preference for or against the first two digits being the same, the best we can do is assume that they're random and independent, and hence the probability of a match is 1 in 16.

    If we were to peek inside the black box and learn otherwise, we would of course change our calculations appropriately. Likewise, if we took a statistically significant sample and saw a meaningful trend in one direction or the other.

    Far as I know though, all we have to go on at this point is that it's "best effort random".
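
    (In that spirit, a trivial sampling check -- treat the generator as a black box and just measure how often the first two hex characters of a random UUID match; a quick sketch, assuming java.util.UUID's version-4 generator:)

        import java.util.UUID;

        // Empirical check of the "1 in 16" figure for the first two hex characters.
        class FirstTwoCharsSketch {
            public static void main(String[] args) {
                int trials = 1_000_000, matches = 0;
                for (int i = 0; i < trials; i++) {
                    String s = UUID.randomUUID().toString();  // e.g. "66e3a9b2-..."
                    if (s.charAt(0) == s.charAt(1)) matches++;
                }
                System.out.printf("observed %.4f, expected %.4f%n",
                        (double) matches / trials, 1.0 / 16);
            }
        }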

  • Why? (unregistered) in reply to Alexander Morou
    Alexander Morou:
    You are indeed correct.

    Since there was no source cited, is that a bad thing for this website or does it regularly not cite its sources?

    Well, I was not questioning the source. The question was: if Alex submitted this to VS Mag so long ago, why did he publish the same WTF on this site just now?

  • Pluto (unregistered) in reply to K

    yes ... so it's one out of sixteen: 00, 11, 22, ... ff = 16 combinations out of 256 = 1/16.

  • Numeromancer (unregistered) in reply to jrh
    jrh:
    So, to fix this they need to go back to where they are creating the DatasetID, check to see if the first 2 characters match and if they do create a new guid until they don't. Amiright?

    Could you please post your full name and a picture? We need it for one of the tables in our HR database.

  • I <3 scala (unregistered) in reply to Reynoldsjt
    Reynoldsjt:
    Lies:
    Reynoldsjt:
    For the love of all that is holy, just spell it out for people: (total combinations - non-matching pairs) / total combinations = (16^2 - 16*15) / 16^2 = 16/256 = 1/16

    No. For the thousandth time, this is wrong.

    The GUID in the story starts with "66". The chances of getting a 6 are 1/16, and of it being followed by another 6 is another 1/16. So when Andy asked "what are the chances of that happening?", the chances of getting that GUID are 1/16 * 1/16 = 1/256.

    If you still don't get it, just do it on your fingers.

    00,11,22,33,44,55,66,77,88,99,AA,BB,CC,DD,EE,FF

    are all the possible doubles for two hex characters. There are quite clearly 16 duplicates, not 256. If there are a total of 16 possible outcomes (THAT WE CARE ABOUT) and you can only get one of them, then it's a 1 out of 16 chance.

    What?? Absolutely not. There are 16 outcomes that we care about, out of 256 possible outcomes, so the probability of getting one of those we care about is 16/256 = 1/16. Same number, but your reasoning gets to it by sheer coincidence. Say, if there were 17 characters instead of 16, but you still cared only about the 16 cases above, the probability of getting one of them would be 16/(17^2).
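
    (Or, to settle it by brute force rather than argument -- enumerate all 256 ordered pairs of hex digits and count the doubles; a throwaway sketch:)

        // Exhaustive count: 16 doubles out of 256 ordered pairs, i.e. 1/16.
        class EnumerateDoubles {
            public static void main(String[] args) {
                String hex = "0123456789abcdef";
                int doubles = 0, total = 0;
                for (char a : hex.toCharArray())
                    for (char b : hex.toCharArray()) {
                        total++;
                        if (a == b) doubles++;
                    }
                System.out.println(doubles + " / " + total);  // prints 16 / 256
            }
        }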

  • (cs)
    "But what are the chances of that?"

    I know a lot of students who complain they will never need something as abstract as probability theory in the "real world".

  • UNIX weenie (unregistered)

    Linux programmer here. Am I missing something?

  • Haha (unregistered) in reply to UNIX weenie

    Yes - Windows is what most people use.

  • jverd (unregistered) in reply to UNIX weenie
    UNIX weenie:
    Linux programmer here. Am I missing something?
    Don't know. What part is not making sense?
  • populus (unregistered) in reply to Maurits
    Maurits:
    Why do the PINs need to be unique?

    So they can be used as usernames and passwords (can't be bothered to dig up the WTF where the authentication was effectively by password while the username was ignored, so if the admin rejected your password, you knew somebody else used it).

  • captcha: saluto (unregistered) in reply to Nexzus
    Nexzus:
    It's fun to ask people the chance of being born on the same day of the week as yourself. Assuming they at least know the bare minimum about odds and multiplication, about 90%* will say 1 in 49.

    Where do you find these people? Seriously.

  • L. (unregistered) in reply to vindico

    Nonsense !

    Most programmers are dicky wannabes using and creating non-unique GUID creation functions.

    Many GUID functions have a logical implementation that can NOT guarantee unicity (yes I'm talking to you all you md5-sha-noobstylers ^^).

    Of course Andy here is the leader of the pack, but still, most libraries available for GUIDs aren't any more unique (lol, let's base it on the oh-so-unique MAC address and stuff...), even those coming from "respectable sources".

    Worse than that, most of the fake GUID solutions available are slower than the real GUID ones --

  • jverd (unregistered) in reply to L.
    L.:
    Nonsense !

    Most programmers are dicky wannabes using and creating non-unique GUID creation functions.

    Many GUID functions have a logical implementation that can NOT guarantee unicity (yes I'm talking to you all you md5-sha-noobstylers ^^).

    Of course Andy here is the leader of the pack, but still, most libraries available for GUIDs aren't any more unique (lol, let's base it on the oh-so-unique MAC address and stuff...), even those coming from "respectable sources".

    Worse than that, most of the fake GUID solutions available are slower than the real GUID ones --

    Wow, you sure put us in our place, with your incoherent, unsupported hissy fit.

  • L. (unregistered) in reply to jverd

    http://download.oracle.com/javase/1.5.0/docs/api/java/util/UUID.html

    This is by design not unique. Is that supported enough for you?

  • jverd (unregistered) in reply to L.
    L.:
    http://download.oracle.com/javase/1.5.0/docs/api/java/util/UUID.html

    This is by design not unique. Is that supported enough for you?

    No GUID/UUID can possibly be guaranteed to be unique. If you have a specific failing to address with that class, and you can actually explain coherently why it's insufficient for its intended purpose, and what you think is better and why, please do.

    So far all you've put forth is a rant and a sound-bite.

  • L. (unregistered) in reply to jverd

    You, sir, are correct; I shan't be ranting anymore --

    Basically, I find it useful to have UUIDs that can at the very least be unique in a controlled environment.

    As in, being able to have unique IDs within a limited set of machines (I picked 32 bits for the machine id, carefully distributed by me and intended to be unique, not based on the MAC address, which can/must sometimes be spoofed for other purposes).

    In order to achieve that, several parts are required:

    -> machine unicity
    -> thread unicity
    -> thread time unicity
    -> inside-thread unicity

    The reason I picked those variables is very easy:
    -> machine id is a preset variable
    -> thread id is a readily available variable (you can pre-store it @ start of thread)
    -> thread time unicity (same, you can pre-set it @ start) to avoid mixing thread 1234 from 12 o'clock with thread 1234 from 16 o'clock (with the time id which is in milli or microseconds, you can be sure that ending your thread with a very small sleep will prevent any possibility of two threads on the same time point)
    -> inside-thread unicity (counter... of course you always have at least a counter going somewhere for some reason, so might as well use it)

    So basically, I picked 4 variables which are almost free to get in terms of processing power (machine_id is present even before thread execution, thread_id is present at thread creation, thread_time is present at thread creation too, and the counter is almost inevitably present if you're handling several objects), but can also guarantee unicity.

    This can be guaranteed to be unique, as long as your time is unique (and if it is not, you have so many problems beyond unique identification that it's not worth focusing on it -- ), and from my calculations, that type of unique id is safe for the next 20+ years of processing power growth (although at some point thread ids might get another digit).

    Besides, this method of generating unique IDs is orders of magnitude faster, as it requires only very minor CPU activity (no random, no hash, no time variable generation on every id, etc.) -> the most expensive operations are on the base16 encode / concat side.

    Also, I chose Java on purpose as it's more or less respected as a language, but the implementations in PHP are just as much of a problem.
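
    (For illustration only, here is roughly what the composite scheme described above might look like in Java; the field widths, the hex formatting, and the source of the machine id are all invented for this sketch, and it inherits every caveat raised below about thread ids and clocks:)

        // Rough sketch of the composite ID described above: machine id + thread id
        // + thread start time + in-thread counter, concatenated in hex. MACHINE_ID
        // is assumed to come from some central assignment, not from a MAC address.
        class CompositeIdSketch {
            static final long MACHINE_ID = 0x12345678L;  // invented value for the sketch
            private static final ThreadLocal<long[]> STATE =
                    ThreadLocal.withInitial(() -> new long[] { System.nanoTime(), 0 });  // {thread time, counter}

            static String nextId() {
                long[] s = STATE.get();
                return String.format("%08x-%08x-%016x-%08x",
                        MACHINE_ID,
                        Thread.currentThread().getId(),  // thread id
                        s[0],                            // time captured at first use in this thread
                        s[1]++);                         // in-thread counter
            }
        }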

  • jverd (unregistered) in reply to L.

    Interesting that you're so sure that your ID generator is so superior to those of all the rest of us "dicky wannabes." I too have had to create a unique-ish ID scheme, and it's more than up to the task set for it.

    However, given the choice, I'll simply use java.util.UUID.randomUUID(). On my 6-year-old PC it can create over 800,000 IDs per second, which is just over one microsecond per ID. That's certainly "very minor CPU activity" in every context in which I've ever needed IDs. (Perhaps your ID-generating demands have been heavier.) And if you're worried about a collision, then you haven't done the math.

    So in all honesty, I still fail to understand what point you're trying to make.

    Is it that many developers make bogus assumptions when coding "unique" ID generators? Absolutely.

    Is it that most developers are idiots but you're far superior? Not seen one shred of evidence for that yet.

    Is it that most ID libraries suck? No support for that either.

    Is it that java.util.UUID in general and/or UUID.randomUUID() in particular suck? No evidence of that either. Of course, that doesn't mean there aren't domains where they're not appropriate. But so what? No tool is a panacea.
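
    (The 800,000-per-second figure above is easy to reproduce with a crude loop like the one below; a serious measurement would use JMH with warm-up and a blackhole, so treat the result as a ballpark only:)

        import java.util.UUID;

        // Crude timing of UUID.randomUUID(); results vary by JVM and hardware.
        class UuidTimingSketch {
            public static void main(String[] args) {
                int n = 1_000_000;
                long start = System.nanoTime();
                for (int i = 0; i < n; i++) UUID.randomUUID();
                long elapsedNs = System.nanoTime() - start;
                System.out.printf("%d IDs in %.1f ms (about %.0f per second)%n",
                        n, elapsedNs / 1e6, n / (elapsedNs / 1e9));
            }
        }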

  • jverd (unregistered) in reply to L.
    L.:
    (with the time id which is in milli or microseconds, you can be sure that ending your thread with a very small sleep will prevent any possibility of two threads on the same time point) ... This can be guaranteed to be unique, as long as your time is unique (and if it is not, you have so many problems beyond unique identification it's not worth focusing on it -- ).

    That's highly specious.

    Getting the same instruction to be executed within the same milli- or microsecond on more than one box is not that far fetched, especially when you take into account server farms, time servers, cron jobs, and the like. And even with the sleep() call, while less likely still, they're certainly not "guaranteed" not to collide.

    Now, it may be that your particular context minimizes these issues to the point where the risk is acceptable, which is fine. Just don't present the approach as if it's "guaranteed" and will work for the general case.

  • L. (unregistered) in reply to jverd
    jverd:
    On my 6-year-old PC it can create over 800,000 IDs per second, which is just over one microsecond per ID. That's certainly "very minor CPU activity"
    Right, yet it's slower than the alternative I offer, and that is pure logic, beyond whatever the implementation is, but still you justify that additional CPU activity how?

    My point is: The default libraries for UUID do NOT provide UNIQUE IDs and THEY ARE SLOW.

    There is NO reason to accept inefficient AND ineffective code when given a better alternative.

  • L. (unregistered) in reply to jverd
    jverd:
    L.:
    (with the time id which is in milli or microseconds, you can be sure that ending your thread with a very small sleep will prevent any possibility of two threads on the same time point) ... This can be guaranteed to be unique, as long as your time is unique (and if it is not, you have so many problems beyond unique identification it's not worth focusing on it -- ).

    That's highly specious.

    Getting the same instruction to be executed within the same milli- or microsecond on more than one box is not that far fetched, especially when you take into account server farms, time servers, cron jobs, and the like. And even with the sleep() call, while less likely still, they're certainly not "guaranteed" not to collide.

    Now, it may be that your particular context minimizes these issues to the point where the risk is acceptable, which is fine. Just don't present the approach as if it's "guaranteed" and will work for the general case.

    And thus I suggest you re-read my post, which talks about a unique machine ID, NOT based on the MAC address (which you sometimes can/have to spoof) but still unique by design (i.e. via a unique key distribution service).

    The sleep call is ONLY there to prevent the following case :

    microsecond 0,0: thread fired, thread id set, thread time set, thread count set, action completed, thread killed;
    microsecond 0,3: thread fired, thread id set (o shit it's the same), etc.

    So instead you have this

    microsecond 0,0: thread fired, thread id set, thread time set, thread count set, action completed, sleep(1*time granularity), thread killed;

    Which implies that even if your computing power was infinite, you'd still be in the next microsecond, thus guaranteeing that the time part is unique.

  • L. (unregistered) in reply to jverd
    jverd:
    And if you're worried about a collision, then you haven't done the math.

    In a case with optimized Java software (lol, like that even exists) on very powerful hardware, the numbers you think about DO NOT apply.

    In the same case, on hardware 10 or 20 years from now, you obviously have no idea how "possible" a collision becomes.

    So on one hand, you have a solution that guarantees that a collision is IMPOSSIBLE, and it runs faster.

    On the other hand, you have a solution that guarantees that p(collision) is QUITE SMALL, and it runs slower.

    AND YOU WANT TO PICK THE SECOND ONE BECAUSE ?

    I'm not even stating that my prototype UUID is a panacea, I'm just telling you it's a zillion times better than a solution with a non-null p(collision).

    If it were my decision, I'd drop the RFC in the trash can, have a retarded machine_id ready for the future (let's give it 64 bits), the smallest time fraction available in standard software (nanotime?), a thread count over 16 bits, etc., just to make sure that solution is still valid in 2100.

  • jverd (unregistered) in reply to L.
    L.:
    jverd:
    On my 6-year-old PC it can create over 800,000 IDs per second, which is just over one microsecond per ID. That's certainly "very minor CPU activity"
    Right, yet it's slower than the alternative I offer

    Irrelevant. Once it's fast enough, making it go faster adds no value. And it's already way more than fast enough.

    and that is pure logic, beyond whatever the implementation is, but still you justify that additional CPU activity how?

    I justify it by that fact that since making it faster than fast enough adds no value, then the contrapositive follows--namely, NOT making it faster than fast enough does not diminish the value.

    My point is: The default libraries for UUID do NOT provide UNIQUE IDs and THEY ARE SLOW.

    Your library does not provide unique IDs either. java.util.UUID.randomUUID() provides unique values WITH EXTREMELY HIGH PROBABILITY--certainly no worse than your scheme.

    And no, it is NOT slow. "Slower than the fastest possible" != "slow".

    There is NO reason to accept inefficient AND ineffective code when given a better alternative.

    Agree completely. But since java's UUID is more "efficient" than I'll ever need, and plenty effective, neither your code nor mine is a "better alternative", except that mine was better in the situation where I used it because UUID was simply not an option--for reasons that had nothing to do with its speed or effectiveness.

  • jverd (unregistered) in reply to L.
    L.:
    jverd:
    L.:
    (with the time id which is in milli or microseconds, you can be sure that ending your thread with a very small sleep will prevent any possibility of two threads on the same time point) ... This can be guaranteed to be unique, as long as your time is unique (and if it is not, you have so many problems beyond unique identification it's not worth focusing on it -- ).

    That's highly specious.

    Getting the same instruction to be executed within the same milli- or microsecond on more than one box is not that far fetched, especially when you take into account server farms, time servers, cron jobs, and the like. And even with the sleep() call, while less likely still, they're certainly not "guaranteed" not to collide.

    Now, it may be that your particular context minimizes these issues to the point where the risk is acceptable, which is fine. Just don't present the approach as if it's "guaranteed" and will work for the general case.

    And thus I suggest you re-read my post, which talks about a unique machine ID, NOT based on the MAC address (which you sometimes can/have to spoof) but still unique by design (i.e. via a unique key distribution service).

    Irrelevant. Your claim was that times should never match, and if they do, you have a huge problem. Unique machine IDs do not factor into the fact that that claim is dubious.

    They DO factor into reducing the likelihood of collision of the ID in toto, even if times do match, yes, but I wasn't refuting that, only your claim about time on its own.

    So I suggest you re-read my post.

    The sleep call is ONLY there to prevent the following case :

    microsecond 0,0: thread fired, thread id set, thread time set, thread count set, action completed, thread killed;
    microsecond 0,3: thread fired, thread id set (o shit it's the same), etc.

    So instead you have this

    microsecond 0,0: thread fired, thread id set, thread time set, thread count set, action completed, sleep(1*time granularity), thread killed;

    Which implies that even if your computing power was infinite, you'd still be in the next microsecond, thus guaranteeing that the time part is unique.

    Except that sleep() can wake up spuriously, so there's no guarantee that time has elapsed as far as your code can tell.

    Or the system clock could have been adjusted.

    So, again, you have no guarantee of unique times.
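
    (If the goal really is a time component that never repeats within one process, the usual trick is to clamp against the last value handed out rather than sleeping -- a sketch, which of course says nothing about two different machines agreeing on time:)

        import java.util.concurrent.atomic.AtomicLong;

        // Strictly increasing "time" values within a single JVM, even if the clock
        // stalls or is adjusted backwards; does nothing for cross-machine collisions.
        class MonotonicTimeSketch {
            private static final AtomicLong LAST = new AtomicLong();

            static long nextTimestamp() {
                return LAST.updateAndGet(prev -> Math.max(prev + 1, System.nanoTime()));
            }
        }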

  • jverd (unregistered) in reply to L.
    L.:
    jverd:
    And if you're worried about a collision, then you haven't done the math.

    In a case with optimized Java software (lol, like that even exists)

    Up until now I'd had my doubts about whether you were actually capable of a reasonable discussion, but I was willing to give you a chance. Now I can see you're just a nutter.

    So on one hand, you have a solution that guarantees that a collision is IMPOSSIBLE,

    Wrong.

    and it runs faster.

    Irrelevant.

    Buh-bye. Do post again if you decide to live in the real world someday though!
