Admin
The REAL entertaining part about the site is the guys who have even more free time and use it to rant about the comments themselves.
Admin
dsId.Substring(2); returns the chars of the string from index 2 (inclusive) and on, not the first two chars.
Look it up:
String.Substring Method (Int32) .NET Framework 1.1
Retrieves a substring from this instance. The substring starts at a specified character position.
[Visual Basic] Overloads Public Function Substring( _ ByVal startIndex As Integer _ ) As String
[C#] public string Substring( int startIndex );
[C++] public: String* Substring( int startIndex );
[JScript] public function Substring( startIndex : int ) : String;
Admin
mm...that's the problem with trolling, you often(=always) come out looking like an ass... perhaps if us prison coders had better training.
anyhow - you're right of course, the code looks like c# and the substring would take all chars from position 2 to end of string
which raises a different question - if dsID is 2 chars - then the original statement of the article stands, but if it's longer then it really doesn't...(it might generate a legit guid)
Admin
This community service is provided at no extra cost to yourself .... (Just happens to be the middle of the night and I was woken up just to see why the server wasn't available remotely. One taxi ride later ... )
Admin
Dude, .substring(2) returns the substring that begins at index 2. c++ and I think C# do that.
Admin
dsID.Substring(2) does not get the first two characters of the string. It gets all the characters EXCEPT the first two characters of the string.
If they tried to create a GUID consisting of the first 2 characters swapped plus the first two characters in the original order, that would total -- try out my second grade math here -- 4 characters. But a GUID is 32 hex digits (36 characters with the hyphens). So that would not be a valid GUID.
The function does NOT create a 4-character string. It creates a full-length GUID string with the first two characters swapped. Like, if the original was 12000000-0000-0000-0000-000000000000, the output would be 21000000-0000-0000-0000-000000000000, not 2112.
Thus, probability that the output string will be identical to the input string: 1/16.
Note that if the previous poster were correct in his understanding of substring(2), then the probability that the output string would be identical to the input string would be zero, as the two would have different lengths.
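A minimal sketch of the swap described above, written in Python rather than the C# of the original snippet. The function name `swap_first_two` is hypothetical, and the slice `ds_id[2:]` stands in for `dsID.Substring(2)`; this is not the article's actual code.

```python
def swap_first_two(ds_id: str) -> str:
    """Swap the first two characters and keep the rest unchanged.

    ds_id[2:] plays the role of dsID.Substring(2): everything from index 2 on.
    """
    return ds_id[1] + ds_id[0] + ds_id[2:]


original = "12000000-0000-0000-0000-000000000000"
processed = swap_first_two(original)
print(processed)  # 21000000-0000-0000-0000-000000000000

# When the first two characters match, the "new" ID equals the old one.
same = "66000000-0000-0000-0000-000000000000"
print(swap_first_two(same) == same)  # True
```

The output keeps the full length of the input, which is why the result is still GUID-shaped and why a repeated leading digit gives back the original.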
Admin
Hey, this function works 15 times out of 16, or over 93% of the time! That's probably better than most of the code that I have to work with ...
Admin
some have pointed out that i misread the code and that it reverses the first 2 chars and then takes the rest of the original string.
so i'll adjust the output assuming that the orig. string is only 2 chars long (the rest of the data is irrelevant from a probability standpoint regarding the 1/16 or 1/256 argument):
AA = AA   AB = BA   AC = CA
BA = AB   BB = BB   BC = CB
CA = AC   CB = BC   CC = CC
you still get 3^2 = 9 distinct cases or 1/9 chance for each combination.
Admin
"He didn't have a better idea, but was confident that, given enough time, he could cobble something together that utilized the computer's serial number, CPU footprint and a number of other factors."
I would use numeric time, assuming the SLA allowed for the performance hit of synchronization/mutex lock.
Admin
hmm...yes, never mind - i misunderstood the usage of pdId and dsId..
Admin
yes, and if your goal was to create a new combination that is unique (meaning different) than the original, then it fails in 3 out of the 9 cases (AA,BB,CC).
As has been said, you create the same combination in 3/9 or as it is more commonly stated, 1/3 of the time.
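For anyone who wants to check the three-letter toy case by brute force, here is a small sketch. The alphabet "ABC" comes from the example above; everything else (variable names, the use of `itertools`) is just one way to write it.

```python
from fractions import Fraction
from itertools import product

alphabet = "ABC"
pairs = ["".join(p) for p in product(alphabet, repeat=2)]  # AA, AB, ..., CC

# A pair survives the swap unchanged only when both characters match.
unchanged = [p for p in pairs if p[1] + p[0] == p]

print(len(pairs))                            # 9
print(unchanged)                             # ['AA', 'BB', 'CC']
print(Fraction(len(unchanged), len(pairs)))  # 1/3
```

Swapping the alphabet for the 16 hex digits gives 16 unchanged pairs out of 256, which is where the 1/16 figure comes from.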
Admin
Could someone explain to a non-Windows coder why it matters whether or not the first two digits are identical? I get the point that it does, and don't really want to argue about how likely that is, it just seems broken that something purporting to generate a unique identifier is that sensitive to initial input.
I mean, if there's a library routine like this:
int random(unsigned int seed) { srand(seed); return rand(); }
and Andy calls it from his program as random(42), then yes there's some justification in calling Andy an idiot, but perhaps not as much as in calling the person who wrote the library routine an idiot.
Admin
Reminiscent of this....
http://stackoverflow.com/questions/1705008/simple-proof-that-guid-is-not-unique
Admin
I actually can't think of a stupider way to fix the problem. You win.
That should also be a featured comment.
Admin
And some can read code, comments and documentation.
I wish I worked with more people like Hortical.
Admin
It takes the GUID from the API, or else it gets the hose again. It does this, whenever it is told.
Admin
Amazing, the number of people who can't figure stuff like this out.
At our location, one of the programmers was going to generate a pin for each employee: His proposal called for 4 digits, with no two employees having the same pin.
When I pointed out that we had 40,000 employees, and a 4-digit pin has 10,000 unique pins, his response was, "So, what?"
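The arithmetic, spelled out as a quick sketch. The employee count and PIN length are the ones from the story; the variable names are made up for illustration.

```python
employees = 40_000
pin_digits = 4
distinct_pins = 10 ** pin_digits  # 0000 through 9999

print(distinct_pins)              # 10000
print(employees > distinct_pins)  # True: some PIN must be shared

# Smallest digit count that gives every employee a distinct PIN.
needed = len(str(employees - 1))
print(needed)  # 5
```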
(Duhhhhh...)
Admin
Please somebody, I'm so confused. What's the final word on this? Does SubString() take a length or a start index? I first read it as start index and wondered what the problem was (other than just using the GUID out of the box instead of manipulating it). If "length" (starting at index 0), then the story works, but that's not how I read it at first.
Admin
Windows or non-windows doesn't matter. If your GUID is this:
BB63DC25-D37B-4107-9D63-74825C2C7443
Then swapping the first two hex digits doesn't accomplish much. In any language.
Admin
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA!!!!!!!!!!!!!!!!!!!!!!!
HOW MANY OF THOSE HAVE TWO OF THE SAME CHARACTER!!!?!?!?!?!?!!
TTTTTTHHHHHHHHHHRRRRRRRRREEEEEEEEEEE!!!!!!!!!!!!!!!
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA!!!!!!!!!!!!!!!!!!!!!!!
Admin
The problem is they're thinking too hard. They start to trip when approaching the question, but instead of trying to pull themselves up to reassess the question, they just dive right into the concrete.
Admin
Why hasn't anyone pointed out that using a guid in the first place is kind of silly? Why not just use a counter? Using a guid was kind of wtf in the first place if you ask me..
Admin
Wow. Reading through some of the comments here, I'm thinking this roughly characterizes what many of the commentators on this site are feeling right now.
Admin
and for those who are still not sure: "12345".substring(2) = "345"
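A rough Python stand-in for the two `Substring` forms, for readers without a .NET compiler handy. The helper name `substring` is hypothetical, and one difference is assumed away: .NET throws on an out-of-range index or length, while Python slices silently clamp.

```python
from typing import Optional


def substring(s: str, start: int, length: Optional[int] = None) -> str:
    """Approximate .NET's String.Substring with Python slices.

    With one argument it returns everything from `start` onward; with two,
    it returns `length` characters beginning at `start`.
    """
    if length is None:
        return s[start:]
    return s[start:start + length]


print(substring("12345", 2))     # 345
print(substring("12345", 0, 2))  # 12
```

So a single argument is a start index, and getting "the first two characters" would need the two-argument form with a start of 0 and a length of 2.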
Admin
Now that you've pointed it out, I see. Somehow I read a call to something to generate a GUID in there, instead of just taking an existing one and lightly frobbing it.
Admin
ah, fuck it.
You're either obstinately wrong or talking about something that's irrelevant to the discussion.
Admin
Taken from MSDN
Not exactly spam now is it?
Admin
Oh, forgot to mention length defaults to length of string - start position
Admin
No, probably not...
Admin
Good catch! Oh, wait, it isn't VB. You fucking retard.
Admin
What is it?
Admin
Emu wrote: "Retrieves a substring from this instance. The substring starts at a specified character position and has a specified length." First arg is start position and the second is the length to read.
Thank you, Emu. There's no indication what language or toolkit this code is (some intelligent guesses could be made I suppose), and I'm not a Windows programmer.
So, if this is true, wouldn't this code take a guid like {f4204ca9... and turn it into {4f204ca9... ?
In that case, there's a WTF, but it wouldn't cause collisions with a 1/16 probability.
Right?
Admin
proposition: The system would generate a GUID for the unprocessed dataset and another for the processed dataset. They wouldn't need to be related but Andy related them anyway. Here's an example:
Generated Guid:
Andy's half-assed attempt at a Guid: take the 21, swap them and copy the rest of the GUID by using the function Hortical called irrelevant. The result is:
and it would be used in a unique column in the same table. I think I do not need to explain why the system barfed when the 1st and 2nd chars were the same, do I?
Admin
Then, your ass obeys Sturgeon's Law as well it should.
Admin
C#
Admin
The issue is that (apparently) the values of dsID and pdsID are both used to (supposedly) uniquely identify datasets.
pdsID is simply dsID with the first two characters switched so in all GUIDs where the first two characters are the same (1/16th of GUIDs) the pdsID = dsID which then causes the issue.
Admin
For the love of all that is holy, just spell it out for people. (Total number of combinations - combinations with two different digits) divided by total number of combinations: (16^2 - 16*15) / 16^2 = 16/256 = 1/16
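The same count, done by listing every possible two-digit hex prefix. This is a sketch that assumes each leading hex digit is equally likely, which the thread takes for granted but which depends on how the GUIDs are generated.

```python
from fractions import Fraction
from itertools import product

HEX = "0123456789abcdef"
prefixes = ["".join(p) for p in product(HEX, repeat=2)]

# Prefixes whose two digits match are the ones the swap leaves unchanged.
collisions = [p for p in prefixes if p[0] == p[1]]
total = len(prefixes)               # 16^2 = 256
distinct = total - len(collisions)  # 16 * 15 = 240

print(total, distinct, len(collisions))  # 256 240 16
print(Fraction(len(collisions), total))  # 1/16
```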
Admin
No, you don't, because you don't know what I was referring to possibly because I didn't know what the other guy was referring to.
The discussion was pulled off on some tangent where people didn't know what the chances of this problem were (the guy i was epsonsfvdfvdf vdfvd
just tired of talking about it
Admin
yeah, I looked up the var keyword, didn't know that was in c#
Admin
No. For the thousandth time, this is wrong.
The GUID in the story starts with "66". The chance of getting a 6 is 1/16, and the chance of it being followed by another 6 is another 1/16. So when Andy asked "what are the chances of that happening?" the chances of getting that GUID are 1/16 * 1/16 = 1/256.
Admin
Yep, Andy must have felt that the two guids needed to be related in some way and coded to generate the second guid based on swapping the 1st two characters of the 1st guid.
No clue as to why he didn't just generate a new one, unless he wanted to 'track' the relationship of the 2 ids. In which case, it's best to relate the guids using a lookup table. But Andy probably didn't want to do that.
Cal
Admin
Well, there's your problem. That's the whole point of the article - that when the first two characters are the same, we get a collision.
The fact that the second character is identical to the first one is supremely relevant.
Admin
From the sound of the story, the Processed GUID is not allowed to collide with the original GUID. The Processed GUID is "generated" by swapping the first two characters. So, if the first two characters of the original GUID are identical, the Processed GUID will be the same as the original GUID.
So, given that the GUID will collide in any case where the first two characters are the same, we need to look for any situation where the first two characters are the same. You state that the probability of the first two characters both being 6 is 1/256, which is correct, but we don't care what the first one is, only that the second letter is the same as the first. Since the probability of any given character being in both the first and second place is 1/256, and there are 16 possible characters, multiplying gives us...
1/16 chance of collision.
Admin
Except that isn't the question at hand. The question is what are the chances that the first two digits of the GUID will be identical. If every digit has a 1/16 chance of being any one particular character, then the chance of the first two digits being identical is 1/16: the value of the first digit doesn't matter at all, only whether the second digit matches it.
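The two numbers people keep arguing about answer two different questions, and both can be computed exactly. A small sketch, assuming each hex digit is uniform and independent (an assumption, not something the article states):

```python
from fractions import Fraction

p_digit = Fraction(1, 16)  # assumed: each hex digit uniform and independent

# Chance that the ID starts with one specific pair, e.g. "66".
p_specific_pair = p_digit * p_digit

# Chance that the second digit matches the first, whatever the first is:
# sum the specific-pair chance over all 16 possible repeated digits.
p_any_repeat = 16 * p_specific_pair

print(p_specific_pair)  # 1/256
print(p_any_repeat)     # 1/16
```

1/256 is the chance of that exact "66" prefix; 1/16 is the chance of hitting any prefix that makes the swap a no-op, which is the one that matters for collisions.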