• gomer (unregistered)

    guids are your friends.

  • (cs)

    A good manager simply would have informed the bug owner that "must be a fluke" is not a valid conclusion after the first bug report was closed. That's not micromanagement.

  • (cs)

    Technical managers are the best. Typically they come from a development background, which means they know how a team works. My hat goes off to Dan E. for taking the gentlest approach :)

  • (cs)

    Oh, I see the wtf. He should have used a configurable number of random numbers rather than fixing it at two. Right?

  • (cs)

    This comment has been randomly generated.

  • Quicksilver (unregistered)

    The distribution of comments containing "This is not a wtf at all"

    or

    "The real wtf is ..."

    are neither randomly nor pseudorandomly distributed. They are bound to occur with a probability of exactly one in every thread on tdwtf.

    captcha: gotcha (hit the bots hard in the face)

  • RandyD (unregistered)

    The first failure was letting the bug 'slide'; the next was not imposing code reviews on that developer for everything he did for the next month or two.

  • (cs) in reply to gabba
    gabba:
    Oh, I see the wtf. He should have used a configurable number of random numbers rather than fixing it at two. Right?

    Better yet, a random number of random numbers!

  • (cs)

    37% of all statistics are made up on the spot!

  • bling (unregistered) in reply to gomer
    gomer:
    guids are your friends.

    Not really, as they're usually generated at random. So they could still collide. Not a huge chance of it happening, but still.

  • Russ (unregistered)

    When someone talks about incredibly small chances, and says the chances of something happening again are infinitesimal, the correct response is to ask them to prove it mathematically. After all, it's only reasonable to assume a dev must have given the problem this sort of rigorous treatment before declaring the likelihood infinitesimal, right?

  • Greg D (unregistered)

    After all, a one in a million chance means it'll happen next Tuesday.

  • Loren Pechtel (unregistered)

    Incredibly small chances are almost bound to happen when you roll the dice often enough.

  • (cs) in reply to operagost
    operagost:
    A good manager simply would have informed the bug owner that "must be a fluke" is not a valid conclusion after the first bug report was closed.
    What do you do when you get assigned a bug that you cannot reproduce?
  • (cs) in reply to brazzy
    brazzy:
    operagost:
    A good manager simply would have informed the bug owner that "must be a fluke" is not a valid conclusion after the first bug report was closed.
    What do you do when you get assigned a bug that you cannot reproduce?

    You request permission to put more tracing code in the Production system. (But that's only if you really can't reproduce it.)

  • Anon Fred (unregistered)

    I don't have the rest of the code, but I don't see what pseudo-randomness has to do with it.

    Presumably getting multiple orders during the same second is common, so if that's what was broken, it would've broken a lot.

    A plain-old pseudo-random number in the 32-bit range would've taken care of this. Even if they get hundreds of orders per second.

    A random number in the range of 0-99 isn't useful for disambiguating.

  • Anonymouse (unregistered) in reply to Loren Pechtel
    Incredibly small chances are almost bound to happen when you roll the dice often enough.

    There is such a thing as "sufficiently improbable" (one in a hundred isn't it, or am I misreading the code?). But I believe the problem was they were seeding the random number generator from the real-time clock, which would mean the chance of a "fluke" would always be 1:86400 (or however many steps are in the real-time clock cycle), regardless of the amount of pseudorandom data they generate.

    Also of course, rnd100 is as "random" as two times rnd10.

  • el jaybird (unregistered)

    Nice. I bet that developer brings his own bombs onto airline flights, too, to guarantee his safety. After all, what are the odds that there would be TWO bombs on the same flight?

    If I were Dan, I would wait for it to happen a third, fourth, maybe fifth time, and keep forwarding it to the developer. Let him try to explain it away. "Wow, the chances of it happening five times are infinitely small. I should buy a lottery ticket!"

    captcha: poindexter (amen)

  • BiggerWTF (unregistered)

    Dan is a poor dev manager. This comment in a bug report:

    "There's absolutely no way this could happen in our code. It had to be some bizarre data fluke."

    ...is absolutely unacceptable. Dan should have re-opened the bug and assigned it to someone else, or provided mentoring to the developer who closed it, or fixed it himself the first time.

    Allowing a bug report to be closed because the developer was incapable of fixing or reproducing it, or because the developer did not feel that the bug was important, is simply a management mistake (we all make mistakes).

    Allowing the same thing to happen a second time is just bad management.

  • Jon W (unregistered)

    The REAL WTF is that they didn't use UNIQUE constraints on that row, right? Right? (That, and the not following up on the "there's no way this could be happening" excuse)
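
    A minimal sketch of what such a constraint buys you, using Python's built-in sqlite3 module. The table and column names (orders, order_ref) and the sample values are invented for illustration, since the article doesn't show the real schema:

```python
import sqlite3

# Hypothetical schema: the article does not show the real table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_ref TEXT NOT NULL UNIQUE, item TEXT)")

def insert_order(order_ref, item):
    """Insert one order; return False instead of storing a duplicate ref."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders (order_ref, item) VALUES (?, ?)",
                (order_ref, item),
            )
        return True
    except sqlite3.IntegrityError:
        return False

first = insert_order("20070115-42", "widget")
second = insert_order("20070115-42", "gadget")  # same ref: rejected
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

    The constraint doesn't fix a weak ID generator, but the duplicate fails loudly at insert time instead of surfacing later as a "data fluke".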

  • sweavo (unregistered)

    Hahah "infinitesimally small". There's a 1% chance of this happening! That is infinitely more likely than infinitesimal.

  • (cs) in reply to BiggerWTF
    BiggerWTF:
    Dan is a poor dev manager. This comment in a bug report:

    "There's absolutely no way this could happen in our code. It had to be some bizarre data fluke."

    ...is absolutely unacceptable. Dan should have re-opened the bug and assigned it to someone else, or provided mentoring to the developer who closed it, or fixed it himself the first time.

    Allowing a bug report to be closed because the developer was incapable of fixing or reproducing it, or because the developer did not feel that the bug was important, is simply a management mistake (we all make mistakes).

    Allowing the same thing to happen a second time is just bad management.

    Not true. Bizarre data flukes happen all the time. I think that explains this hard to find bug I'm supposed to be working on right now.

  • JUST ANOTHER WTF (unregistered) in reply to sweavo
    sweavo:
    Hahah "infinitesimally small". There's a 1% chance of this happening! That is infinitely more likely than infinitesimal.

    There is a 1% chance of collision when orders are not placed at the same time... there is 100% chance of collision when you run the random function twice using the same seed.

  • (cs) in reply to gabba
    gabba:
    Not true. Bizarre data flukes happen all the time. I think that explains this hard to find bug I'm supposed to be working on right now.

    Not true: there are no bizarre data flukes, only extremely hard to find bugs. A computer will NOT simply decide to do something different in rare cases. There is always a reason; the problem is finding it. This is more of a fringe case than a fluke, and it is repeatable given the same circumstances. Finding what those circumstances are is the difficult part. No one said this job was supposed to be easy.

  • Anon Fred (unregistered) in reply to sweavo
    sweavo:
    Hahah "infinitesimally small". There's a 1% chance of this happening! That is infinitely more likely than infinitesimal.

    Birthday paradox. You have a more than 50% chance of getting a collision within 14 iterations, if I remember my combinatorics class.

    I still maintain that the seed has nothing to do with the problem, or else this would happen every time there were 2 orders within the same second. Even a poorly seeded PRNG would be fine for this problem, as long as its period was big enough. (And 100 is definitely not big enough.)
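
    The birthday arithmetic is easy to check. A rough sketch, assuming each ID is an independent, uniform pick from 100 possible values; the function names are made up for this example:

```python
def collision_probability(draws, space):
    """Chance that at least two of `draws` uniform picks from `space` values match."""
    p_all_distinct = 1.0
    for i in range(draws):
        p_all_distinct *= (space - i) / space
    return 1.0 - p_all_distinct

def draws_for_even_odds(space):
    """Smallest number of draws whose collision chance exceeds 50%."""
    draws = 1
    while collision_probability(draws, space) <= 0.5:
        draws += 1
    return draws

threshold_100 = draws_for_even_odds(100)  # the 0-99 case from the article
threshold_365 = draws_for_even_odds(365)  # the classic birthday version
```

    Under those assumptions the even-odds point for 100 values is 13 draws, so "within 14" is already past it; the classic 365-day version gives the familiar 23.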

  • Me (unregistered) in reply to SpComb

    I lol'd

  • Vlad Patryshev (unregistered)

    While fixing this is a good thing, wasting meeting time explaining to the incompetent subordinates how RNG works is an abuse of authority. See why. If you are in the coder's hat, act like a coder. Fix it, talk to the guy who was responsible for the bug, and keep it quiet. Or write a short unassuming memo. But please don't patronize. You may be a manager, but this does not mean you thusly become a super-coder, omniscient and omnipotent.

  • Vlad Patryshev (unregistered) in reply to SpComb

    May I humbly suggest that you read Vol. 2 of The Art of Computer Programming, by You Should Know Who?

  • MBC (unregistered)

    Since no fooling around with random numbers is necessary, I think the WTF is on the mark.

    But if a random number were called for, one would always be sufficient as long as the full value were used (and only one copy of this process is running). The well-known congruential generators have long cycle times - which means that they do not produce duplicates until they cycle.

    If more than one process needs a unique ID, it seems simplest to store a long interlocked counter in the database to avoid coincidental collisions.
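
    To illustrate the "no duplicates until it cycles" point, here is a toy congruential generator. The parameters (a=5, c=3, m=16) are picked for this example only because they give a full period on a tiny range; they have nothing to do with whatever generator the article's code used:

```python
def lcg_sequence(seed, count, a=5, c=3, m=16):
    """Yield `count` successive states of x -> (a*x + c) mod m."""
    x = seed
    for _ in range(count):
        x = (a * x + c) % m
        yield x

# Full value used: every state 0..15 appears once before the cycle repeats.
full_cycle = list(lcg_sequence(seed=0, count=16))

# Only part of each value used: duplicates show up almost immediately.
truncated = [x % 4 for x in full_cycle]
```

    The uniqueness comes from using the whole state. Reducing each value to a small range such as 0-99 throws that property away, which is the "as long as the full value were used" caveat.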

  • (cs)

    I don't understand how randomly generating the id would ever guarantee uniqueness. I mean, regardless of the seed value (or the pseudo generation), eventually you will get the same number twice. They have to be using this in conjunction with another column as the primary key, right? That part was left out, wasn't it?

  • Vlad Patryshev (unregistered) in reply to Anon Fred

    I fixed a similar bug in our code (a pretty popular piece of software) a couple of months ago.

    The bug is this: each thread seeds a new random (for the sake of randomness) with current time. Of course two threads starting at the same millisecond produce the same "random" number.

    The single-machine solution is to use a static rng; if you want to produce unique ids, there are lots of solutions around, e.g. check out guids algorithm.
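
    A small sketch of that bug and the static-generator fix, using Python's random module. The clock value and the function name are invented for the example:

```python
import random

def order_suffix(seed):
    """Mimic one 'thread': seed a fresh generator, then draw a 0-99 suffix."""
    rng = random.Random(seed)
    return rng.randrange(100)

# Two requests that read the same clock tick (a made-up value).
same_tick = 1168876800123
suffix_a = order_suffix(same_tick)
suffix_b = order_suffix(same_tick)  # identical seed, identical "random" number

# One generator shared by every caller keeps advancing instead of restarting.
shared_rng = random.Random(same_tick)
shared_draws = [shared_rng.getrandbits(64) for _ in range(1000)]
```

    The shared generator is seeded once, so two callers never replay the same sequence from the start. It still doesn't guarantee uniqueness; it only removes the same-seed collision.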

  • (cs) in reply to Anon Fred
    Anon Fred:
    I still maintain that the seed has nothing to do with the problem, or else this would happen every time there were 2 orders within the same second. Even a poorly seeded PRNG would be fine for this problem, as long as its period was big enough. (And 100 is definitely not big enough.)

    It's pretty unlikely the library would seed the PRNG using a timer with 1-second granularity. It probably used a timer with a granularity of 1 millisecond. So you would only get the same seed if there was more than one order in the same millisecond. That's ignoring reasons the timer may have the same value at completely different moments in time.

  • Anon Fred (unregistered) in reply to T604
    T604:
    I don't understand how randomly generating the id would ever guarantee uniqueness.

    It doesn't; it just makes collisions very unlikely.

    To be clear, the Right Answer in the article is to just use your database's natural sequence constructs to guarantee uniqueness.

    But a lot of real-world applications, such as the Andrew File System, rely upon the incredible rarity of collisions of random numbers.

    T604:
    I mean, regardless of the seed value (or the pseudo generation), eventually you will get the same number twice.

    This also applies for just a plain old incrementer. Eventually it wraps. But it takes a long time to get through 32 bits worth of orders.
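
    For what "use the database's sequence constructs" looks like in practice, here is a sketch with SQLite's AUTOINCREMENT through Python's sqlite3 module. The orders table is hypothetical, and other databases spell this differently (sequences, identity columns), so treat it as one illustration rather than the article's fix:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; AUTOINCREMENT makes SQLite hand out ever-increasing ids.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY AUTOINCREMENT, item TEXT)"
)

def place_order(item):
    """Insert an order and return the id the database assigned to it."""
    with conn:  # commit the insert, or roll back on error
        cursor = conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
    return cursor.lastrowid

order_ids = [place_order(item) for item in ("widget", "gadget", "gizmo")]
```

    The application never invents an ID at all; it asks the database which one it was given.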

  • Hieronymous Coward (unregistered) in reply to RandyD

    Because berating programmers is the best way to retain them. And true micromanagement helps.

    If the guy's an idiot, fire him now. If he's not an idiot, keep him happy and productive.

  • Al H (unregistered)

    I had a similar problem with SQL numbers a while back, and since it was written by a SQL "guru" (or so he said) I never thought it could possibly be his code. Especially since he said it couldn't. So after years of patching other code to avoid it (and after he took his guruness to another company) I discovered his "unique number" logic was even less unique than a random number.

    Basically, he took today's julian as the "base" and added a sequence number to each row of data. Which worked fine the first time it ran each day. The 2nd time would cause an error.

    The "newid()" function in SQL is there for a reason...

    --Al--

  • (cs) in reply to el jaybird
    el jaybird:
    If I were Dan, I would wait for it to happen a third, fourth, maybe fifth time, and keep forwarding it to the developer. Let him try to explain it away.
    When developers have spent sufficiently long in ivory towers staring at computer screens, we start to forget about the world outside our cubes and the fact that it is driven by goals which are quite different from ours. I suggest you approach your boss and ask for a guided tour of your company so as to get the big picture, along with an explanation of the relationship between satisfied customers and your paycheck.
  • Lynx (unregistered) in reply to BiggerWTF
    BiggerWTF:
    Dan is a poor dev manager. This comment in a bug report:

    "There's absolutely no way this could happen in our code. It had to be some bizarre data fluke."

    ...is absolutely unacceptable. Dan should have re-opened the bug and assigned it to someone else, or provided mentoring to the developer who closed it, or fixed it himself the first time.

    I agree. I had been working on a relatively simple leave application system that unfortunately sits on a legacy database (very legacy... as in, the users insist on keeping all, and I mean ALL, leave applications EVER keyed into the system...).

    So, I had multiple occasions where the system behaviour simply didn't match intentions. There were times we could trace to data problems and there were times we suspected data problems, but we never wrote it off as "There's absolutely no way this could happen in our code. It had to be some bizarre data fluke."

    Thing is, it HAD happened, the odds are that it WILL happen again, and I'm more than eager to stomp it FLAT into the ground NOW. If my vendor had come back to me with that kind of report, I'd have done nasty things to them. Worse, if I had reported that way to my manager, she'd have stomped me into the ground too.

    Data flukes in IT systems can happen, but they usually happen for a reason. It may not happen for a good reason, but there's usually a reason for it.

    CAPTCHA: bling. What my vendor isn't getting..

  • (cs) in reply to Squiggle
    Squiggle:
    Technical managers are the best. Typically they come from a development background, which means they know how a team works. My hat goes off to Dan E. for taking the gentlest approach :)
    Yep. My team had a manager a few years back who was all noise and bluster and didn't know anything about what we actually did. He was all about appearance, looking good in front of his superiors.

    One day the system went down. We had powerful, important customers calling our top guys wanting to know why we were offline. Now picture this: three of our best developers are huddled around a desk, trying to research the problem, and this manager guy is talking on his cell phone, running around in circles and squawking like a chicken, trying to look like he's doing something to handle the situation. Every few seconds he would interrupt the developers with, "Anything yet? What do we know so far?" and then continue talking to his bosses on the phone. He would crowd his way into the group around the computer and stand peeking over their shoulders, as if he was actually going to understand what he saw onscreen.

    Finally some folks from the support team realized that we wouldn't get anything done with him around, so they cooked up an excuse to lead him off to another part of the building, bless their hearts.

    I'll take the technically experienced managers any time. At least you don't have to explain to them why the team needs training and the latest tools.

  • Richard (unregistered) in reply to Anon Fred
    Anon Fred:
    sweavo:
    Hahah "infinitesimally small". There's a 1% chance of this happening! That is infinitely more likely than infinitesimal.

    Birthday paradox. You have a more than 50% chance of getting a collision within 14 iterations, if I remember my combinatorics class.

    In this case, that means that if 14 rows are inserted in the time between the INSERT and the SELECT, then you have a 50% chance of collision. It does not mean that if a two-way collision happens 14 times, the probability of an 'accident' is 50% -- in fact, it's 13.13% or so.

    Anon Fred:
    I still maintain that the seed has nothing to do with the problem, or else this would happen every time there were 2 orders within the same second. Even a poorly seeded PRNG would be fine for this problem, as long as its period was big enough. (And 100 is definitely not big enough.)

    Absolutely. The seed part is rubbish, unless the RNG was seeded every second with the time, or if multiple threads had their own RNGs, seeded with the same value, and the load was well-balanced between them so that the RNGs stayed in sync.

  • (cs) in reply to BiggerWTF
    BiggerWTF:
    Allowing a bug report to be closed because the developer was incapable of fixing or reproducing it, or because the developer did not feel that the bug was important, is simply a management mistake (we all make mistakes).

    Testing and debugging is an exercise in epistemology. We have some observation (the bug report or test report), from which we could draw a conclusion (there is a bug in the code). A fundamental problem is that our observations are not entirely reliable, so given a bug report we have two possible conclusions: the bug report is a manifestation of a bug in the code, or the bug report is itself faulty. Sometimes when a bug report (I prefer the term incident report, since the presence of a bug is not yet established) is not reproducible, it is because the reporter has made a mistake and misinterpreted or misreported something. So it can be wise to close a so-called 'bug report' that is not reproducible.

    The real expertise, of course, is knowing when doing so is the right thing to do. In this case, it seems the developers could examine the faulty database, so there would be very good reasons to believe there was a bug present.

  • Anon (unregistered) in reply to bling
    bling:
    gomer:
    guids are your friends.

    Not really, as they're usually generated at random. So they could still collide. Not a huge chance of it happening, but still.

    If you follow the RFC on GUIDs they most certainly are not generated randomly. They follow a strict set of rules and are time based. The tick in a GUID is every 100 nanoseconds, which is usually smaller than the granularity of a PC's clock (which I think is around 10 milliseconds) but they're not random.

    The concept with a GUID is that there's a time code and a node identifier. According to the RFC, the node identifier is simply the MAC address of the machine they're being generated on. (Pedants may point out that it's really the MAC address of a network interface in the machine.) Microsoft uses some other scheme to generate a node number, but the concept is similar.

    In any case, you're limited in the number of GUIDs per unit time you can generate so they make a really lousy ID number. Plus they involve locking as only one thread per MAC address can generate a GUID at a time.

    The Right Solution is the obvious solution: use the built-in database function for generating IDs. They exist for a reason.
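
    Both descriptions of GUIDs in this thread match a real variant: the RFC defines a time-based version 1 (timestamp plus node identifier) and a random version 4. A quick look using Python's uuid module, offered as an illustration only, since nothing in the article says which kind (if any) the shop had available:

```python
import uuid

time_based = uuid.uuid1()    # version 1: timestamp + node identifier
random_based = uuid.uuid4()  # version 4: random bits

# The version number is stored inside the identifier itself.
versions = (time_based.version, random_based.version)

# Version 1 timestamps count 100-nanosecond intervals.
timestamp_ticks = time_based.time

# A batch of random ones, to see how often they repeat in practice.
batch = {uuid.uuid4() for _ in range(1000)}
```

    So "they're random" and "they're time based" can both be true, depending on which version the platform hands out.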

  • Dude (unregistered)

    who cares if it was a problem with the data or the code. a problem is still a problem, right?

    captcha: gotcha

  • blindman (unregistered) in reply to bling
    bling:
    gomer:
    guids are your friends.

    Not really, as they're usually generated at random. So they could still collide. Not a huge chance of it happening, but still.

    Globally. UNIQUE. Identifier. And don't spout crap about the uniqueness being relaxed for privacy concerns. They are still unique for any machine, and the chance of two machines generating the same GUID really IS infinitesimally small.

    captcha "dubya"?????????

  • ViciousPsicle (unregistered) in reply to stinch
    stinch:
    Anon Fred:
    I still maintain that the seed has nothing to do with the problem, or else this would happen every time there were 2 orders within the same second. Even a poorly seeded PRNG would be fine for this problem, as long as its period was big enough. (And 100 is definitely not big enough.)

    It's pretty unlikely the library would seed the PRNG using a timer with 1-second granularity. It probably used a timer with a granularity of 1 millisecond.

    Exactly, the problem isn't the period or the granularity of the timer being used to seed rnd. It's pretty clearly implied that each order starts a new thread, and each thread is seeding a new random number generator. So if two threads seed their random number generators with the same time value, they will always get the same random numbers. That's how pseudo-random functions work. It's just a mathematical function that takes a starting value and returns a series of numbers with no obvious pattern. But with the same starting value, the series will always be the same. No matter how high the resolution of your time value, sooner or later you will get two threads with the same seed. The likelihood increases as computers get faster.

    If you're going to use pseudo-random numbers for values that need to be unique, it is therefore imperative that you maintain a single seed value for your entire process tree to minimize the chance of collisions. Better yet, don't use pseudo-random values. By its very nature, a pseudo-random series does not guarantee you won't get duplicates. Quite the contrary, it's just a matter of time until you do.
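
    As a sketch of the "don't use pseudo-random values" option inside a single process, here is a lock-protected counter shared by several threads. The names, thread count, and loop size are arbitrary choices for the example, and a counter like this only covers one running process; anything that must survive restarts or span machines still wants the database to hand out the IDs:

```python
import itertools
import threading

_counter = itertools.count(1)
_counter_lock = threading.Lock()

def next_order_id():
    """Return the next id from one process-wide counter."""
    with _counter_lock:
        return next(_counter)

issued = []
issued_lock = threading.Lock()

def worker(n):
    """Simulate one order-handling thread asking for n ids."""
    for _ in range(n):
        value = next_order_id()
        with issued_lock:
            issued.append(value)

threads = [threading.Thread(target=worker, args=(250,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

    No matter how the threads interleave, every caller gets a different number, which is the guarantee a pseudo-random series can't give.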

  • Joel (unregistered) in reply to Vlad Patryshev
    Vlad Patryshev:
    While fixing this is a good thing, wasting meeting time explaining to the incompetent subordinates how RNG works is an abuse of authority. See why. If you are in the coder's hat, act like a coder. Fix it, talk to the guy who was responsible for the bug, and keep it quiet. Or write a short unassuming memo. But please don't patronize. You may be a manager, but this does not mean you thusly become a super-coder, omniscient and omnipotent.

    I don't know, the manager gets it for poking his nose in, he gets it for not poking his nose in. Maybe the coders should just realize they don't necessarily know everything and accept the well intentioned advice without being offended.

    In the end, the coders might just learn a thing or two.

  • Franz Kafka (unregistered) in reply to blindman
    blindman:
    bling:
    gomer:
    guids are your friends.

    Not really, as they're usually generated at random. So they could still collide. Not a huge chance of it happening, but still.

    Globally. UNIQUE. Identifier. And don't spout crap about the uniqueness being relaxed for privacy concerns. They are still unique for any machine, and the chance of two machines generating the same GUID really IS infinitesimally small.

    captcha "dubya"?????????

    Who cares? Use a sequence and the chance of a collision goes to zero.

  • Bob (unregistered) in reply to Lynx
    Lynx:
    Data flukes in IT systems can happen, but they usually happen for a reason. It may not happen for a good reason, but there's usually a reason for it.

    Like Arthur Clarke said, "Any sufficiently advanced technology is indistinguishable from magic." Some bugs seem that way as well.

  • guids? lol? seriously? (unregistered) in reply to gomer

    Wow, your advice for fixing a random WTF is to use a different type of random, rather than a sequence, which has a smaller footprint and is definitely unique.

  • Eduardo Habkost (unregistered)

    What is a "bug-two"? </nitpicking>

  • Some Engineer (unregistered) in reply to RandyD
    RandyD:
    The first failure was letting the bug 'slide'; the next was not imposing code reviews on that developer for everything he did for the next month or two.

    Code reviews should not be something "imposed" on a developer. Code reviews complement unit tests; each catches problems that the other will not. It makes as much sense to say that unit tests should be "imposed" on a developer.

    A developer's attitude toward these tools reflects on their true goals. If your goal as a software engineer is to make stable, correct, maintainable software (read: diametric opposite of a WTF), then code reviews and unit tests should BOTH be part of your personal work process.
