Admin
guids are your friends.
Admin
A good manager simply would have informed the bug owner that "must be a fluke" is not a valid conclusion after the first bug report was closed. That's not micromanagement.
Admin
Technical managers are the best. Typically they come from a development background, which means they know how a team works. My hat goes off to Dan E. for taking the gentlest approach :)
Admin
Oh, I see the wtf. He should have used a configurable number of random numbers rather than fixing it at two. Right?
Admin
This comment has been randomly generated.
Admin
The distribution of comments containing "This is not a wtf at all"
or
"The real wtf is ..."
is neither random nor pseudorandom: they are bound to occur, with a probability of exactly one, in every thread on TDWTF.
captcha: gotcha (hit the bots hard in the face)
Admin
The first failure was letting the bug 'slide'; the next was not imposing code reviews on everything that developer did for the next month or two.
Admin
Better yet, a random number of random numbers!
Admin
37% of all statistics are made up on the spot!
Admin
Not really, as they're usually generated at random. So they could still collide. Not a huge chance of it happening, but still.
Admin
When someone talks about incredibly small chances, and says the chances of something happening again are infinitesimal, the correct response is to ask them to prove it mathematically. After all, it's only reasonable to assume a dev must have given the problem this sort of rigorous treatment before declaring the likelihood infinitesimal, right?
Admin
After all, a one in a million chance means it'll happen next Tuesday.
Admin
Incredibly small chances are almost bound to happen when you roll the dice often enough.
Admin
You request permission to put more tracing code in the Production system. (But that's only if you really can't reproduce it.)
Admin
I don't have the rest of the code, but I don't see what pseudo-randomness has to do with it.
Presumably getting multiple orders during the same second is common, so if that's what was broken, it would've broken a lot.
A plain-old pseudo-random number in the 32-bit range would've taken care of this. Even if they get hundreds of orders per second.
A random number in the range of 0-99 isn't useful for disambiguating.
Admin
There is such a thing as "sufficiently improbable" (one in a hundred isn't it, or am I misreading the code?). But I believe the problem was they were seeding the random number generator from the real-time clock, which would mean the chance of a "fluke" would always be 1:86400 (or however many steps are in the real-time clock cycle), regardless of the amount of pseudorandom data they generate.
Also of course, rnd100 is as "random" as two times rnd10.
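The clock-seeding failure mode described above is easy to demonstrate. This is a minimal sketch (the seed value is a hypothetical stand-in for a real-time-clock reading): two generators seeded identically produce identical "random" order IDs, every time.

```python
import random

# Two generators seeded with the same clock reading, as happens when two
# orders arrive within the same tick of the real-time clock.
seed = 1_234_567_890   # hypothetical stand-in for a clock value
a = random.Random(seed)
b = random.Random(seed)

id_a = a.randint(0, 99)
id_b = b.randint(0, 99)
assert id_a == id_b    # identical seeds give identical "random" ids
```

With same-second seeding, the collision isn't a 1-in-100 fluke at all; it is certain whenever two orders share a clock tick.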
Admin
Nice. I bet that developer brings his own bombs onto airline flights, too, to guarantee his safety. After all, what are the odds that there would be TWO bombs on the same flight?
If I were Dan, I would wait for it to happen a third, fourth, maybe fifth time, and keep forwarding it to the developer. Let him try to explain it away. "Wow, the chances of it happening five times are infinitely small. I should buy a lottery ticket!"
captcha: poindexter (amen)
Admin
Dan is a poor dev manager. This comment in a bug report:
"There's absolutely no way this could happen in our code. It had to be some bizarre data fluke."
...is absolutely unacceptable. Dan should have re-opened the bug and assigned it to someone else, or provided mentoring to the developer who closed it, or fixed it himself the first time.
Allowing a bug report to be closed because the developer was incapable of fixing or reproducing it, or because the developer did not feel that the bug was important, is simply a management mistake (we all make mistakes).
Allowing the same thing to happen a second time, is just bad management.
Admin
The REAL WTF is that they didn't use UNIQUE constraints on that row, right? Right? (That, and the not following up on the "there's no way this could be happening" excuse)
Admin
Hahah "infinitesimally small". There's a 1% chance of this happening! That is infinitely more likely than infinitesimal.
Admin
Not true. Bizarre data flukes happen all the time. I think that explains this hard to find bug I'm supposed to be working on right now.
Admin
There is a 1% chance of collision when orders are not placed at the same time... there is a 100% chance of collision when you run the random function twice with the same seed.
Admin
Not true, there are no bizarre data flukes. There are extremely hard-to-find bugs. A computer will NOT simply decide to do something different in rare cases. There is always a reason; the problem is trying to find that reason. This is more of a fringe case than a fluke, but it is repeatable given the same circumstances. Finding what those circumstances are is the difficult part. No one said this job was supposed to be easy.
Admin
Birthday paradox. You have a more than 50% chance of getting a collision within 14 iterations, if I remember my combinatorics class.
I still maintain that the seed has nothing to do with the problem, or else this would happen every time there were 2 orders within the same second. Even a poorly seeded PRNG would be fine for this problem, as long as its period was big enough. (And 100 is definitely not big enough.)
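The birthday-paradox figure can be checked directly. A quick sketch for a 100-value ID space; the exact computation puts the crossover at 13 draws, close to the 14 remembered above:

```python
# Birthday-paradox check for a 100-value id space: how many draws before
# the chance of a repeat passes 50%?
def collision_probability(n, d=100):
    """Probability that n independent uniform draws from d values repeat."""
    p_unique = 1.0
    for k in range(n):
        p_unique *= (d - k) / d
    return 1.0 - p_unique

n = 1
while collision_probability(n) < 0.5:
    n += 1
print(n)   # 13 -- thirteen ids in flight already make a repeat more likely than not
```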
Admin
I lol'd
Admin
While fixing this is a good thing, wasting meeting time explaining to incompetent subordinates how an RNG works is an abuse of authority. See why? If you are wearing the coder's hat, act like a coder. Fix it, talk to the guy who was responsible for the bug, and keep it quiet. Or write a short, unassuming memo. But please don't patronize. You may be a manager, but that does not mean you thereby become a super-coder, omniscient and omnipotent.
Admin
May I humbly suggest you read Vol. II of The Art of Computer Programming, by You-Know-Who?
Admin
Since no fooling around with random numbers is necessary, I think the WTF is on the mark.
But if a random number were called for, one would always be sufficient as long as the full value were used (and only one copy of this process is running). The well-known congruential generators have long cycle times - which means that they do not produce duplicates until they cycle.
If more than one process needs a unique ID, it seems simplest to store a long interlocked counter in the database to avoid coincidental collisions.
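The interlocked-counter idea above can be sketched with SQLite standing in for the shared database (the table and function names here are illustrative, not from the article): one row holds the last issued ID, and each caller bumps it inside a write transaction so concurrent callers serialize instead of reading the same value.

```python
import sqlite3

# One counter row in the shared database; every caller increments it
# inside a write transaction.
conn = sqlite3.connect(":memory:")
conn.isolation_level = None   # autocommit; we issue BEGIN/COMMIT ourselves
conn.execute("CREATE TABLE id_counter (value INTEGER NOT NULL)")
conn.execute("INSERT INTO id_counter VALUES (0)")

def next_order_id(conn):
    conn.execute("BEGIN IMMEDIATE")   # take the write lock up front
    try:
        conn.execute("UPDATE id_counter SET value = value + 1")
        (new_id,) = conn.execute("SELECT value FROM id_counter").fetchone()
        conn.execute("COMMIT")
        return new_id
    except Exception:
        conn.execute("ROLLBACK")
        raise

print([next_order_id(conn) for _ in range(3)])   # [1, 2, 3]
```

No coincidences possible: two processes can never be handed the same number, because the UPDATE and SELECT happen under one lock.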
Admin
I don't understand how randomly generating the id would ever guarantee uniqueness. I mean, regardless of the seed value (or the pseudo-generation), eventually you will get the same number twice. They have to be using this in conjunction with another column as the primary key, right? That part was left out, wasn't it?
Admin
I fixed a similar bug in our code (a pretty popular piece of software) a couple of months ago.
The bug is this: each thread seeds a new random (for the sake of randomness) with current time. Of course two threads starting at the same millisecond produce the same "random" number.
The single-machine solution is to use a static rng; if you want to produce unique ids, there are lots of solutions around, e.g. check out guids algorithm.
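A minimal sketch of both the bug and the two fixes described above (the millisecond value is hypothetical): per-thread generators seeded with the same time are guaranteed to agree, whereas a single shared generator, or a UUID when true uniqueness matters, avoids that.

```python
import random
import uuid

# The bug: each thread seeds its own generator with the current time.
now_ms = 1_700_000_000_000          # two threads reading the same millisecond
thread_a = random.Random(now_ms)
thread_b = random.Random(now_ms)
assert thread_a.randint(0, 2**31) == thread_b.randint(0, 2**31)

# Fix 1: one shared generator for the whole process (Python's module-level
# random functions already wrap a single shared instance).
shared = random.Random()

# Fix 2: if ids must be unique rather than merely unlikely to collide,
# generate UUIDs instead of rolling a PRNG.
order_id = uuid.uuid4()
```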
Admin
It's pretty unlikely the library would seed the PRNG using a timer with 1-second granularity. It probably used a timer with a granularity of 1 millisecond, so you would only get the same seed if there was more than one order in the same millisecond. That's ignoring reasons the timer might have the same value at completely different moments in time.
Admin
It doesn't; it just makes collisions very unlikely.
To be clear, the Right Answer in the article is to just use your database's natural sequence constructs to guarantee uniqueness.
But a lot of real-world applications rely upon the incredible rarity of collisions of random numbers. Such as the Andrew File System.
This also applies for just a plain old incrementer. Eventually it wraps. But it takes a long time to get through 32 bits worth of orders.
Admin
Because berating programmers is the best way to retain them. And true micromanagement helps.
If the guy's an idiot, fire him now. If he's not an idiot, keep him happy and productive.
Admin
I had a similar problem with SQL numbers a while back, and since it was written by a SQL "guru" (or so he said) I never thought it could possibly be his code. Especially since he said it couldn't. So after years of patching other code to avoid it (and after he took his guruness to another company) I discovered his "unique number" logic was even less unique than a random.
Basically, he took today's Julian date as the "base" and added a sequence number to each row of data. Which worked fine the first time it ran each day. The second time would cause an error.
The "newid()" function in SQL is there for a reason...
--Al--
Admin
So, I had multiple occasions where the system behaviour simply didn't match intentions. There were times we could trace to data problems and there were times we suspected data problems, but we never wrote it off as "There's absolutely no way this could happen in our code. It had to be some bizarre data fluke."
Thing is, it HAD happened, the odds are that it WILL happen again, and I'm more than eager to stomp it FLAT into the ground NOW. If my vendor had come back to me with that kind of report, I'd have done nasty things to them. Worse, if I had reported that way to my manager, she'd have stomped me into the ground too.
Data flukes in IT systems can happen, but they usually happen for a reason. It may not happen for a good reason, but there's usually a reason for it.
CAPTCHA: bling. What my vendor isn't getting..
Admin
One day the system went down. We had powerful, important customers calling our top guys wanting to know why we were offline. Now picture this: three of our best developers are huddled around a desk, trying to research the problem, and this manager guy is talking on his cell phone, running around in circles and squawking like a chicken, trying to look like he's doing something to handle the situation. Every few seconds he would interrupt the developers with, "Anything yet? What do we know so far?" and then continue talking to his bosses on the phone. He would crowd his way into the group around the computer and stand peeking over their shoulders, as if he was actually going to understand what he saw onscreen.
Finally some folks from the support team realized that we wouldn't get anything done with him around, so they cooked up an excuse to lead him off to another part of the building, bless their hearts.
I'll take the technically experienced managers any time. At least you don't have to explain to them why the team needs training and the latest tools.
Admin
In this case, that means that if 14 rows are inserted in the time between the INSERT and the SELECT, then you have roughly a 50% chance of collision. It does not mean that if a two-way collision has 14 chances to happen, the probability of an 'accident' is 50% -- in fact, it's 13.13% or so.
Absolutely. The seed part is rubbish, unless the RNG was seeded every second with the time, or if multiple threads had their own RNGs, seeded with the same value, and the load was well-balanced between them so that the RNGs stayed in sync.
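The 13.13% figure above checks out as fourteen independent trials, each with a 1-in-100 chance of a two-way collision:

```python
# Fourteen independent trials, each with a 1% chance of collision:
p = 1 - 0.99 ** 14
print(round(p, 4))   # 0.1313
```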
Admin
Testing and debugging is an exercise in epistemology. We have some observation (the bug report or test report), from which we could draw a conclusion (there is a bug in the code). A fundamental problem is that our observations are not entirely reliable, so given a bug report we have two possible conclusions: the bug report is a manifestation of a bug in the code, or the bug report is itself faulty. Sometimes when a bug report (I prefer the term incident report, since the presence of a bug is not yet established) is not reproducible it is because the reporter has made a mistake and misinterpreted or misreported something. So it can be wise to close a so-called 'bug report' that is not reproducible.
The real expertise, of course, is knowing when doing so is the right thing to do. In this case, it seems the developers could examine the faulty database, so there would be very good reasons to believe there was a bug present.
Admin
The concept with a GUID is that there's a time code and a node identifier. According to the RFC, the node identifier is simply the MAC address of the machine they're being generated on. (Pedants may point out that it's really the MAC address of a network interface in the machine.) Microsoft uses some other scheme to generate a node number, but the concept is similar.
In any case, you're limited in the number of GUIDs per unit time you can generate so they make a really lousy ID number. Plus they involve locking as only one thread per MAC address can generate a GUID at a time.
The Right Solution is the obvious solution: use the built-in database function for generating IDs. They exist for a reason.
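As a sketch of that Right Solution, with SQLite standing in for whatever engine the story's team used: let the database assign the order number. In SQLite an INTEGER PRIMARY KEY column is an alias for the rowid; other engines offer sequences, SERIAL, or IDENTITY columns for the same job.

```python
import sqlite3

# The database hands out the id; no random numbers involved.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")

first = conn.execute("INSERT INTO orders (item) VALUES ('widget')").lastrowid
second = conn.execute("INSERT INTO orders (item) VALUES ('gadget')").lastrowid
assert (first, second) == (1, 2)   # unique by construction, no dice rolled
```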
Admin
who cares if it was a problem with the data or the code. a problem is still a problem, right?
captcha: gotcha
Admin
captcha "dubya"?????????
Admin
Exactly, the problem isn't the period or the granularity of the timer being used to seed rnd. It's pretty clearly implied that each order starts a new thread, and each thread is seeding a new random number generator. So if two threads seed their random number generators with the same time value, they will always get the same random numbers. That's how pseudo-random functions work: it's just a mathematical function that takes a starting value and returns a series of numbers with no obvious pattern. But with the same starting value, the series will always be the same. No matter how high the resolution of your time value, sooner or later you will get two threads with the same seed. The likelihood increases as computers get faster.
If you're going to use pseudo-random numbers for values that need to be unique, it is therefore imperative that you maintain a single seed value for your entire process tree to minimize the chance of collisions. Better yet, don't use pseudo-random values. By its very nature, a pseudo-random series does not guarantee you won't get duplicates. Quite the contrary, it's just a matter of time until you do.
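A minimal sketch of the single-seed advice above (CPython's module-level random functions already share one locked instance; the explicit lock here just makes the pattern portable to other runtimes):

```python
import random
import threading

# One process-wide generator behind a lock: every thread draws from the
# same stream, so identical per-thread seeds can't happen.
_rng = random.Random()            # seeded once, at process start
_rng_lock = threading.Lock()

def next_token():
    with _rng_lock:
        return _rng.getrandbits(32)

tokens = []
workers = [threading.Thread(target=lambda: tokens.append(next_token()))
           for _ in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
# Collisions are now merely improbable, not structurally guaranteed; for
# hard uniqueness, use a counter or a database sequence instead.
```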
Admin
I don't know, the manager gets it for poking his nose in, he gets it for not poking his nose in. Maybe the coders should just realize they don't necessarily know everything and accept the well intentioned advice without being offended.
In the end, the coders might just learn a thing or two.
Admin
Who cares? Use a sequence and the chance of a collision goes to zero.
Admin
Like Arthur Clarke said, "Any sufficiently advanced technology is indistinguishable from magic." Some bugs seem that way as well.
Admin
wow, your advice to fix a random wtf is to use a different type of random. Rather than a sequence, which has a smaller footprint and is definitely unique.
Admin
What is a "bug-two"? </nitpicking>
Admin
Code reviews should not be something "imposed" on a developer. Code reviews complement unit tests; each catches problems that the other will not. It makes as much sense to say that unit tests should be "imposed" on a developer.
A developer's attitude toward these tools reflects on their true goals. If your goal as a software engineer is to make stable, correct, maintainable software (read: diametric opposite of a WTF), then code reviews and unit tests should BOTH be part of your personal work process.