The Daily WTF: Curious Perversions in Information Technology

2011-07-15 Reply Admin

Alexander Morou:
If you encounter a duplicate when you generate a key, generate a new damn key. Easy fix.
The real question is why they didn't just call Guid.NewGuid(), or implement a system specific variation that would check the subset in play for a duplicate and repeat as necessary until a new Guid is found.

There was some comment in the original WTF to the effect that at the point of ID generation, they could not know which IDs has already been generated, ergo the whole point of using a pseudo-random GUID generator in the first place.

So if I'm that developer, I either accept that the risk of GUID collision is small enough (Andy's retarded swap hack notwithstanding), or I go back to the stakeholders and point out that this requirement cannot be implemented as stated.

2011-07-15 Reply Admin

Seems I skimmed the opening part of Unique Requirement.

Even for a large series of users generating their own pools of IDs and submitting them in after they're done, the same logic still applies. If you can't check the information at the time of creation, you could likely modify the underlying implementation in a way that would be suitable. If a collision did occur, the logical way to handle that is through the correspondence between client and server. When the client uploads a data set that collides, create new ids for the client's associated information and update the client side data to reflect that change.

Mind you, given the relatively miniscule chance that such an occurrence would present itself, it should be an acceptable compromise for the client.

2011-07-15 Reply Admin

This article has already been submitted to and published by Visual Studio Magazine 2 or 3 years ago. (Submitted by this very web site.) Are we out of good WTFs completely for now?

2011-07-15 Reply Admin

You are indeed correct.

Since there was no source cited, is that a bad thing for this website or does it regularly not cite its sources?

2011-07-15 Reply Admin

Then again, the author that submitted it to the Visual Studio Magazine was Alex Papadimoulis.

2011-07-15 Reply Admin

Alexander Morou:
Even for a large series of users generating their own pools of IDs and submitting them in after they're done, the same logic still applies. If you can't check the information at the time of creation, you could likely modify the underlying implementation in a way that would be suitable. If a collision did occur, the logical way to handle that is through the correspondence between client and server. When the client uploads a data set that collides, create new ids for the client's associated information and update the client side data to reflect that change.
Mind you, given the relatively miniscule chance that such an occurrence would present itself, it should be an acceptable compromise for the client.

Well, yeah, we can speculate all we want on the actual requirements and context, and depending on which variation we come up with, there are different sets of viable solutions.

To get the most enjoyment out of the WTF, I take the requirement and GUID solution at face value. I assume that they had to use a GUID, or something like it, because there's no way to detect a collision until it's too late--when it's either impossible or impractical to work backwards to the source, separate the results, regenerate, and try again. (And I mean "assume" in the sense of "let's make this one of our assumptions, for this iteration of the problem," as opposed to "I wholeheartedly believe that this is true.")

If this is the case, then your choices are bascially:

Determine that GUID is "sufficiently unique."
Come up with some other scheme that you can determine is "sufficiently unique".
Push back on the stakeholders and tell them, "Impossible as stated."

On the other hand, if it turns out that it's merely inconvenient to deal with a duplicate--say we don't find out until 3 steps down the line, but we CAN reliably detect dupes, and enough of the original request/input/whatever remains that "Reject, Regenerate, Retry" is possible--then we have a different landscape, and as you say, the rarity of such occurrences should mean the "RRR" approach is viable.

2011-07-15 Reply Admin

Well,

There's a number of things that aren't immediately obvious from the article.

They mention that these are the client's needs, but say nothing of the client's awareness of potential solutions. Regardless of the unique identifier employed, it will undoubtedly have the chance, scant as it might be, of duplicates. Being as the client isn't writing the software themselves (because either they're not programmers, or your group wrote some underlying dependency software that is closed source, and you're the guys to go to), the solution on handling duplications might not be obvious to the client; which is why they would put forth such a requirement in the first place.

There's really insufficient information to say whether recreating the identifiers, after they're created, is out of the question since their use outside of synchronization/submission is not explicitly stated. Given that the correspondence between the client software and the server back-end can handle the issue, short of the users writing the guid down and using that internally, the software should be easily able to propagate any necessary changes to the underlying data model. Odds are the only change would be the Guid, not the underlying data structure that represents it (since it's probably a client-side database of some sort?)

Sutherlands · 2011-07-15 Reply Admin

Alexander Morou:
Regardless of the unique identifier employed, it will undoubtedly have the chance, scant as it might be, of duplicates.

Understatement of the century? Do people understand scale?

2011-07-15 Reply Admin

Actually, I take that back, let me rephrase:

Given that most systems store data fields with a fixed-length value; regardless of the method employed to generate the unique identifier, so long as said unique identifier is also fixed-length and cannot dynamically adjust based off of the needs of the scale of the system in place, duplicates will undoubtedly occur should the elements of the set representing the unique identifiers in use not be incremental in nature. Given an incremental system, versus random, the identifier is then limited to the finite set of elements capable of being represented in the size allowed.

2011-07-15 Reply Admin

Alexander Morou:
Well,
There's a number of things that aren't immediately obvious from the article.

Exactly. That was part of my point. :-)

They mention that these are the client's needs, but say nothing of the client's awareness of potential solutions.

Hence the potential for dialogue with the stakeholders I mentioned. We (the developers) go back to the client and tell him, "Look, based on your requirements, uniqueness CANNOT be guaranteed. However, we can make it extremely unlikely, such as blah blah million years blah blah," and present possible solutions based on the requirements as understood and on potential loosening of the requirements.

Of course, then we drastically reduce our chances of a WTF, and where's the fun in that?

Regardless of the unique identifier employed, it will undoubtedly have the chance, scant as it might be, of duplicates.

That depends on how much of the article we take at face value. If there's truly no way to know anything at all about what IDs have been generated so far, as the article implies, then yes, there's no way we can guarantee that an ID hasn't been used already. All we can do is reduce the probability of a collision to an acceptable level.

If we go back to the requirements generators and try to squeeze more details out of them, we may find that there's some way to know for certain that a particular subset of valid IDs has never been used.

Being as the client isn't writing the software themselves (because either they're not programmers, or your group wrote some underlying dependency software that is closed source, and you're the guys to go to), the solution on handling duplications might not be obvious to the client; which is why they would put forth such a requirement in the first place.

Yup. And why we need to make sure they understand what they're really getting, and what the limitations and alternatives are.

There's really insufficient information to say whether recreating the identifiers, after they're created, is out of the question

Yup. There is, however, sufficient information to determine that the "swap the first two digits" hack is a total WTF, and to take childish amusement from it.

2011-07-15 Reply Admin

Alexander Morou:
Actually, I take that back, let me rephrase:
Given that most systems store data fields with a fixed-length value; regardless of the method employed to generate the unique identifier, so long as said unique identifier is also fixed-length and cannot dynamically adjust based off of the needs of the scale of the system in place, duplicates will undoubtedly occur should the elements of the set representing the unique identifiers in use not be incremental in nature.

Not quite. It will only "undoubtedly" occur if we generate at least N + 1 identifiers. It will occur with probability described by the birthday paradox up to that point.

Given an incremental system, versus random, the identifier is then limited to the finite set of elements capable of being represented in the size allowed.

This is true of the random generation case as well. The pigeonhole principle applies in both cases. It's just that with the sequential case, we always know, from a single datum, exactly which IDs have been used and which haven't, and we can guarantee that the first N IDs generated will be unique.

2011-07-15 Reply Admin

By undoubtedly, I was referring to a use case that required an indeterminate amount of time, up to, but not including, an infinite amount of time (since it's not finite, you can't really include it, short of making something up.) Granted, both instances are limited in the set of values that they can represent; however, in the case of the random, but piecewise (per character) finite (0-9A-F), any system which generated a sequence based off of a set of rules would be difficult to construct such that it would adjust in size as needed based off of the determination of the current length being inadequate. As you'd need to enumerate each variation of the 'random' but finite sequence to determine whether the set has been completed before increasing the size requirements.

I was more speaking hypothetically in the facetious sense.

2011-07-15 Reply Admin

Alexander Morou:
By undoubtedly, I was referring to a use case that required an indeterminate amount of time, up to, but not including, an infinite amount of time (since it's not finite, you can't really include it, short of making something up.)

Okay, but still, for both the random and the sequential case, there's no guarantee of a dupe until we've hit N+1 IDs. The only difference is that until that point, with the sequential we're guaranteed no dupes, and with the random, we have an ever-decreasing probability of no dupes.

And in the domain of a GUID, and assuming it has decent entropy, for any reasonable lifetime, the probability will never climb above "unimaginably minuscule."

I was more speaking hypothetically in the facetious sense.

What? You can't be facetious on the internet!

Maurits · 2011-07-15 Reply Admin

Coyne:
His proposal called for 4 digits, with no two employees having the same pin.

Why do the PINs need to be unique?

dkf · 2011-07-16 Reply Admin

jverd:
Okay, but still, for both the random and the sequential case, there's no guarantee of a dupe until we've hit N+1 IDs. The only difference is that until that point, with the sequential we're guaranteed no dupes, and with the random, we have an ever-decreasing probability of no dupes.

The key advantage of a GUID over a sequence number is that you don't need to have a centralized service issuing GUIDs; sequence numbers force you to keep a counter somewhere. (GUIDs are also mixed up thoroughly so they are somewhat awkward to guess if you don't already know it; that can be useful in some applications. OTOH, it's not exactly hard to use a hash algorithm to turn a sequence number into an apparently-random number, so it's not really a significant difference.)

dkf · 2011-07-16 Reply Admin

Maurits:
Coyne:
His proposal called for 4 digits, with no two employees having the same pin.
Why do the PINs need to be unique?

So they can be used as primary keys! Duh!

2011-07-16 Reply Admin

340 billion trillion quadrillion wtf? how is that not a typo? It means nothing. Use the proper term listed here for it to make any freaking sense. http://en.wikipedia.org/wiki/List_of_numbers#English_names_for_powers_of_10

2011-07-16 Reply Admin

and what's the possibility that the "newly generated" PGUID ("processed GUID") is already not present in the system either as an PGUID or non processed GUID? they should at least think of that..

2011-07-16 Reply Admin

I get the academic ammusment of discussing "the chances" of the two first digits being the same or being born on the same day or day of week as someone else, but there's something I just want to point out. Those discussions all seem to assume that there is an equal chance of all outcomes. There can actually be properies inherent (or engineered) in the process of choosing that favors certain combinations. The GUID generation algorithm is essentially a black box for most of us. And for things like being born on the same weekday...doctors may not like to work late on Fridays, for example, so may induce more on that day. Statistics are like bikinis -- what they reveal is suggestive, but what they conceal is vital.

2011-07-17 Reply Admin

JustSomeGuy:
Obviously, the answer is to interject your own getguid function along the lines of:
def myguid(): ....guid = "aawhatever" ....while guid[0] == guid[1]: ........guid = getGUID() ....return guid

Viola! No duplicate first characters. Of course, it reduces your GUID-space by a fctor of 16 but there's still plenty of them out there, right :-)

Nooo... your GUID-space would be heximated.

2011-07-17 Reply Admin

Once, pre MS GUIDs, I had to write a routine to generate unique strings which were printed and distributed to another branch of the company. For some reason, in the first few pages of strings, even though the strings were fairly short, some pretty serious profanity showed up embedded in the strings. So I had to add a filter.

This experience forever changed my views on randomness.

2011-07-18 Reply Admin

Synchronos:
DaveE:
the probability of one char equaling the character that preceded it (no matter the char) is 1 in 16.

WC:
The chance that the second character is the same as the first is only 1 in 16.

DaveE:
the probability of both the chars being 'A' (for instance' is 1 in 256.

WC:
No, 1 in 256 is the changes of the 2 characters being a certain character.

What's the probability of misreading comments on the Internets? Approximately 1 in 1.

It's more like 1 in 1, cause he misread the comment.

2011-07-18 Reply Admin

ih8u:
Dave:
You're the only one, I don't know why I even come here anymore.

Translation: I don't feel cool, important, superior, or even justified in my sluggish existence unless I don't like things that you like. See how awesome I am? I'm just so miserable because things aren't awesome enough for me.

Why don't you join the other forum monkeys and try and figure out the probability that shit happens which, by the way, is 103%. You see, it happens ALL THE TIME, and sometimes twice.

Not quite... Bad things always come in 3. Get fired? Chances are your gf'll break up and your grandma'll die. It's the Rule of 3, but the more realistic version than the one taught in elementary school.

2011-07-18 Reply Admin

Nope, it's actually 1-in-16 since there's 16 'winning' combinations out of those 256 possible permutations.

hoodaticus · 2011-07-18 Reply Admin

Geoff:
>340 billion trillion quadrillion wtf? how is that not a typo? It means nothing. Use the proper term listed here for it to make any freaking sense. http://en.wikipedia.org/wiki/List_of_numbers#English_names_for_powers_of_10

Well, let's see now. One hundred thousand is one hundred times one thousand.

I'm guessing it makes sense if you multiply the terms. Or use COMMON FUCKING SENSE.

Oh wait - nerd site - never mind.

2011-07-18 Reply Admin

dkf:
jverd:
Okay, but still, for both the random and the sequential case, there's no guarantee of a dupe until we've hit N+1 IDs. The only difference is that until that point, with the sequential we're guaranteed no dupes, and with the random, we have an ever-decreasing probability of no dupes.
The key advantage of a GUID over a sequence number is that you don't need to have a centralized service issuing GUIDs; sequence numbers force you to keep a counter somewhere. (GUIDs are also mixed up thoroughly so they are somewhat awkward to guess if you don't already know it; that can be useful in some applications. OTOH, it's not exactly hard to use a hash algorithm to turn a sequence number into an apparently-random number, so it's not really a significant difference.)

Yes, I know. Not sure of the relevance or what point you're trying to make.

2011-07-18 Reply Admin

Mike:
I get the academic ammusment of discussing "the chances" of the two first digits being the same or being born on the same day or day of week as someone else, but there's something I just want to point out. Those discussions all seem to assume that there is an equal chance of all outcomes. There can actually be properies inherent (or engineered) in the process of choosing that favors certain combinations. The GUID generation algorithm is essentially a black box for most of us.

Exactly. And since we don't KNOW of any algorithm preference for or against the first two digits being the same, the best we can do is assume that they're random and independent, and hence the probability of a match is 1:16.

If we were to peek inside the black box and learn otherwise, we would of course change our calculations appropriately. Likewise, if we took a statistically significant sample and saw a meaningful trend in one direction or the other.

Far as I know though, all we have to go on at this point is that it's "best effort random".

2011-07-18 Reply Admin

Alexander Morou:
You are indeed correct.
Since there was no source cited, is that a bad thing for this website or does it regularly not cite its sources?

Well, I was not questioning the source. The question was: if Alex submitted this to VS Mag so long ago, why did he publish the same WTF of this site just now?

2011-07-19 Reply Admin

yes ... so it's one out of sixteen: 00, 11, 22, ... ff = 16 combinations out of 256 = 1/16.

2011-07-20 Reply Admin

jrh:
So, to fix this they need to go back to where they are creating the DatasetID, check to see if the first 2 characters match and if they do create a new guid until they don't. Amiright?

Could you please post your full name and a picture? We need it for one of the tables in our HR database.

2011-07-21 Reply Admin

Reynoldsjt:
Lies:
Reynoldsjt:
For the love of all that is holy, just spell it out for people. Total number of combination - duplicates divided by total number of combinations 16^2 - 16*15 / 16^2 = 1/16

No. For the thousandth time, this is wrong.

The GUID in the story starts with "66" the chances of getting a 6 are 1/16 followed by another 6 is another 1/16. So when Andy asked "what are the chances of that happening?" the chances of getting that GUID are 1/16 * 1/16 = 1/256.

If you still don't get it, just do it on your fingers.

00,11,22,33,44,55,66,77,88,99,AA,BB,CC,DD,EE,FF

Are all the possible doubles for two hex characters. There are quite clearly 16 duplicates, not 256. If there are a total of 16 possible outcomes(THAT WE CARE ABOUT) And you can only get one of them. Then its a 1 out of 16 chance.

What?? Absolutely not. There are 16 outcomes that we care about, out of 256 possible outcomes, so the probability of getting one of those we care about about is 16/256 = 1/16. Same number, but your reasoning gets to it by sheer coincidence. Say, if there were 17 chars instead of 16, but you still cared only about the 16 cases above, the probability of getting one of them would be 16/(17^2).

Arancaytar · 2011-07-24 Reply Admin

"But what are the chances of that?"

I know a lot of students who complain they will never need something as abstract as probability theory in the "real world".

2011-07-27 Reply Admin

Linux programmer here. Am I missing something?

2011-07-29 Reply Admin

Yes - Windows is what most people use.

2011-08-02 Reply Admin

UNIX weenie:
Linux programmer here. Am I missing something?

Don't know. What part is not making sense?

2011-08-04 Reply Admin

Maurits:
Why do the PINs need to be unique?

So they can be used as usernames and passwords (can't be bothered to dig up the WTF where the authentication was effectively by password while the username was ignored, so if the admin rejected your password, you knew somebody else used it).

2011-08-05 Reply Admin

Nexzus:
It's fun to ask people the chance of being born on the same day of the week as yourself. Assuming they at least know the bare minimum about odds and multiplication, about 90%* will say 1 in 49.

Where do you find these people? Seriously.

2011-08-08 Reply Admin

Nonsense !

Most programmers are dicky wannabees using and creating non-unique guid creation functions.

Many GUID functions have a logical implementation that can NOT guarantee unicity (yes I'm talking to you all you md5-sha-noobstylers ^^).

Of course Andy here is the leader of the pack, but still most libraries available for guid's aren't any more unique (lol let's base it on oh-so-unique MAC address and stuff...), even those coming from "respectable sources".

Worse than that, most of the fake guid solutions available are slower than the real guid ones --

2011-08-08 Reply Admin

L.:
Nonsense !
Most programmers are dicky wannabees using and creating non-unique guid creation functions.

Many GUID functions have a logical implementation that can NOT guarantee unicity (yes I'm talking to you all you md5-sha-noobstylers ^^).

Of course Andy here is the leader of the pack, but still most libraries available for guid's aren't any more unique (lol let's base it on oh-so-unique MAC address and stuff...), even those coming from "respectable sources".

Worse than that, most of the fake guid solutions available are slower than the real guid ones --

Wow, you sure put us in our place, with your incoherent, unsupported hissy fit.

2011-08-09 Reply Admin

http://download.oracle.com/javase/1.5.0/docs/api/java/util/UUID.html

This is by design not unique. Is that supported enough for you ?

2011-08-09 Reply Admin

L.:
http://download.oracle.com/javase/1.5.0/docs/api/java/util/UUID.html
This is by design not unique. Is that supported enough for you ?

No GUID/UUID can possibly be guaranteed to be unique. If you have a specific failing to address with that class, and you can actually explain coherently why it's insufficient for its intended purpose, and what you think is better and why, please do.

So far all you've put forth is a rant and a sound-bite.

2011-08-10 Reply Admin

You sir are correct, i shan't be ranting anymore --

Basically, I find it useful to have UUID's that can at the very least be unique in a controlled environment.

As in, being able to have unique id's within a limited set of machines(I picked 32bits for the machine id, carefully distributed by me and intended to be unique, not based on MAC which can/must sometimes be spoofed for other purposes).

In order to achieve that, several parts are required:

-> machine unicity -> thread unicity -> thread time unicity -> inside-thread unicity

The reason I picked those variables is very easy : -> machine id is a preset variable -> thread id is a readily available variable (you can pre-store it @ start of thread) -> thread time unicity (same you can pre-set it @ start) to avoid mixing thread 1234 from 12 o'clock with thread 1234 from 16 o'clock (with the time id which is in milli or microseconds, you can be sure that ending your thread with a very small sleep will prevent any possibility of two threads on the same time point) -> inside thread unicity (counter... of course you always have at least a counter going somewhere for some reason, so might as well use it)

So basically, I picked 4 variables which are almost free to get (machine_id is present even before thread execution, thread_id is present @ thread creation, and thread_time is present @ thread creation too, the counter is almost inevitably present if you're handling several objects) in terms of processing power, but can also guarantee unicity.

This can be guaranteed to be unique, as long as your time is unique (and if it is not, you have so many problems beyond unique identification it's not worth focusing on it -- ), and from my calculations, that type of unique id is safe for the next 20+ years according to processing power growth (although at some point threads_ids might get another digit).

Besides, this method of generating unique id's is orders of magnitude faster, as it requires only very minor cpu activity (no random, no hash, no time variable generation on every id, etc. etc.) -> the most expensive operations have to be on the base16 encode / concat side.

Also, I chose Java on purpose as it's more or less respected as a language, but the implementations in PHP are just as much of a problem.

2011-08-10 Reply Admin

Interesting that you're so sure that your ID generator is so superior to those of all the rest of us "dicky wannabes." I too have had to create a unique-ish ID scheme, and it's more than up to the task set for it.

However, given the choice, I'll simply use java.util.UUID.randomUUID(). On my 6-year-old PC it can create over 800,000 IDs per second, which is just over one microsecond per ID. That's certainly "very minor CPU activity" in every context in which I've ever needed IDs. (Perhaps your ID-generating demands have been heavier.) And if you're worried about a collision, then you haven't done the math.

So in all honesty, I still fail to understand what point you're trying to make.

Is it that many developers make bogus assumptions when coding "unique" ID generators? Absolutely.

Is it that most developers are idiots but you're far superior? Not seen one shred of evidence for that yet.

Is it that most ID libraries suck? No support for that either.

Is it that java.util.UUID in general and/or UUID.randomUUID() in particular suck? No evidence of that either,. Of course, that doesn't mean there aren't domains where they're not appropriate. But so what? No tool is a panacea.

2011-08-10 Reply Admin

L.:
(with the time id which is in milli or microseconds, you can be sure that ending your thread with a very small sleep will prevent any possibility of two threads on the same time point) ... This can be guaranteed to be unique, as long as your time is unique (and if it is not, you have so many problems beyond unique identification it's not worth focusing on it -- ).

That's highly specious.

Getting the same instruction to be executed within the same milli- or microsecond on more than one box is not that far fetched, especially when you take into account server farms, time servers, cron jobs, and the like. And even with the sleep() call, while less likely still, they're certainly not "guaranteed" not to collide.

Now, it may be that your particular context minimizes these issues to the point where the risk is acceptable, which is fine. Just don't present the approach as if it's "guaranteed" and will work for the general case.

2011-08-11 Reply Admin

jverd:
On my 6-year-old PC it can create over 800,000 IDs per second, which is just over one microsecond per ID. That's certainly "very minor CPU activity"

Right, yet it's slower than the alternative I offer and that is pure logic, beyond whatever the implementation is, but still you justify that additional CPU activity how ?

My point is : The default libraries for UUID do NOT provide UNIQUE ID's and THEY ARE SLOW.

There is NO reason to accept inefficient AND ineffective code when given a better alternative.

2011-08-11 Reply Admin

jverd:
L.:
(with the time id which is in milli or microseconds, you can be sure that ending your thread with a very small sleep will prevent any possibility of two threads on the same time point) ... This can be guaranteed to be unique, as long as your time is unique (and if it is not, you have so many problems beyond unique identification it's not worth focusing on it -- ).

That's highly specious.

Getting the same instruction to be executed within the same milli- or microsecond on more than one box is not that far fetched, especially when you take into account server farms, time servers, cron jobs, and the like. And even with the sleep() call, while less likely still, they're certainly not "guaranteed" not to collide.

Now, it may be that your particular context minimizes these issues to the point where the risk is acceptable, which is fine. Just don't present the approach as if it's "guaranteed" and will work for the general case.

And thus I suggest you re-read my post, which talks about a unique machine ID, NOT based on MAC address (which you sometimes can /have to spoof) but still unique by design (i.e. unique key distribution service).

The sleep call is ONLY there to prevent the following case :

microsecond 0,0: thread fired, thread id set, thread time set, thread count set, action completed, thread killed; microsecond 0,3: thread fired, thread id set (o shit it's the same), etc.

So instead you have this

microsecond 0,0: thread fired, thread id set, thread time set, thread count set, action completed, sleep(1*time granularity), thread killed;

Which implies that even if your computing power was infinite, you'd still be in the next microsecond, thus guaranteeing that the time part is unique.

2011-08-11 Reply Admin

jverd:
And if you're worried about a collision, then you haven't done the math.

In a case with an optimized java software (lol like that even exists) on very powerful hardware, the numbers you think about DO NOT apply.

In the same case, on hardware 10 or 20 years from now, you obviously have no ideas of how "possible" a collision becomes.

So on one hand, you have a solution that guarantees that a collision is IMPOSSIBLE, and it runs faster.

On the other hand, you have a solution that guarantees that p(collision) is QUITE SMALL, and it runs slower.

AND YOU WANT TO PICK THE SECOND ONE BECAUSE ?

i'm not even stating that my prototype uuid is a panacea, i'm just telling you it's a zillion times better than a solution with a non-null p(collision).

If it were my decision, I'd drop the RFC in the trash can, have a retarded machine_id ready for the future (let's give it 64bits), the smallest time fraction available in standard software (nanotime ?), a thread count over 16bits, etc. just to make sure that solution is still valid in 2100.

2011-08-11 Reply Admin

L.:
jverd:
On my 6-year-old PC it can create over 800,000 IDs per second, which is just over one microsecond per ID. That's certainly "very minor CPU activity"
Right, yet it's slower than the alternative I offer

Irrelevant. Once it's fast enough, making it go faster adds no value. And it's already way more than fast enough.

and that is pure logic, beyond whatever the implementation is, but still you justify that additional CPU activity how ?

I justify it by that fact that since making it faster than fast enough adds no value, then the contrapositive follows--namely, NOT making it faster than fast enough does not diminish the value.

My point is : The default libraries for UUID do NOT provide UNIQUE ID's and THEY ARE SLOW.

Your library does not provide unique IDs either. java.util.UUID.randomUUID() provides unique values WITH EXTREMELY HIGH PROBABILITY--certainly no worse than your scheme.

And not, it is NOT slow. "slower than the fastest possible" != "slow"

There is NO reason to accept inefficient AND ineffective code when given a better alternative.

Agree completely. But since java's UUID is more "efficient" than I'll ever need, and plenty effective, neither your code nor mine is a "better alternative", except that mine was better in the situation where I used it because I UUID was simply not an option--for reasons that had nothing to do with its speed or effectiveness.

2011-08-11 Reply Admin

L.:
jverd:
L.:
(with the time id which is in milli or microseconds, you can be sure that ending your thread with a very small sleep will prevent any possibility of two threads on the same time point) ... This can be guaranteed to be unique, as long as your time is unique (and if it is not, you have so many problems beyond unique identification it's not worth focusing on it -- ).

That's highly specious.

Getting the same instruction to be executed within the same milli- or microsecond on more than one box is not that far fetched, especially when you take into account server farms, time servers, cron jobs, and the like. And even with the sleep() call, while less likely still, they're certainly not "guaranteed" not to collide.

Now, it may be that your particular context minimizes these issues to the point where the risk is acceptable, which is fine. Just don't present the approach as if it's "guaranteed" and will work for the general case.

And thus I suggest you re-read my post, which talks about a unique machine ID, NOT based on MAC address (which you sometimes can /have to spoof) but still unique by design (i.e. unique key distribution service).

Irrelevant. Your claim was that times should never match, and if they do, you have a huge problem. Unique machine IDs do not factor into the fact that that claim is dubious.

They DO factor into making reducing the likelihood of collision of the ID in toto, even if times do match, yes, but I wasn't refuting that, only your claim about time on it own.

So I suggest you re-read my post.

The sleep call is ONLY there to prevent the following case :
microsecond 0,0: thread fired, thread id set, thread time set, thread count set, action completed, thread killed; microsecond 0,3: thread fired, thread id set (o shit it's the same), etc.

So instead you have this

microsecond 0,0: thread fired, thread id set, thread time set, thread count set, action completed, sleep(1*time granularity), thread killed;

Which implies that even if your computing power was infinite, you'd still be in the next microsecond, thus guaranteeing that the time part is unique.

Except that sleep() can wake up spuriously, so there's no guarantee that time has elapsed as far as your code can tell.

Or the system clock could have been adjusted.

So, again, you have no guarantee of unique times.

2011-08-11 Reply Admin

L.:
jverd:
And if you're worried about a collision, then you haven't done the math.

In a case with an optimized java software (lol like that even exists)

Up until now I'd had my doubts about whether you were actually capable of a reasonable discussion, but I was willing to give you a chance. Now I can see you're just a nutter.

So on one hand, you have a solution that guarantees that a collision is IMPOSSIBLE,

Wrong.

and it runs faster.

Irrelevant.

Buh-bye. Do post again if you decide to live in the real world someday though!

A More Unique Identifier

Leave a comment on “A More Unique Identifier”