• Duderoni (unregistered) in reply to brazzy
    brazzy:
    T.T.:
    I agree.

    Who in his right mind would use random numbers to generate an ID??? Developers like that definitely have no education whatsoever.

    Or they may be competent and doing the right thing (not in the given example, though). In a distributed system, performance considerations or architectural limitations or a requirement to allow offline operation may result in a random ID being your only choice. If it's done right, there's a hundred things that are more likely to go wrong and break your app than two large random numbers being identical.

    Nonsense. If you have to resolve such technical issues, then you should implement a decent workaround: something like local caching and a solid merging algorithm. In any case, there should be a system in place that can detect and resolve collisions in an acceptable way.

    Why do you think people have spent so much time solving hard problems related to parallel computation? Not because these were all trivial issues that could be solved with blind faith in statistics.

    Using random numbers is just a false sense of security, no matter how small the chances of collision are. Especially if the stakes are high ("oops, sorry sir, our software accidentally lost your 1 billion dollar transfer, but we're very confident it will not happen again").

    I pity the poor sod who has to maintain your code in the future. The trick in solid software development is to make your system as predictable as possible. Only then can you reason your way through intricate bugs and implement a fix.

  • (cs) in reply to Duderoni

    "Using random numbers is just a false sense of security, no matter how small the chances of collisions are"

    It's acceptable if the chances of collision are low enough.

  • Dan E (unregistered)

    Wow, my first submission to be published :) I never realized how much they modify and make up in these stories... Even my name is slightly wrong. Anyway, most of the comments about my behaviour are meaningless, as things really didn't go down the way they are described in the article. A colorful reenactment of the events, but not very accurate.

    Regarding the granularity of the random function - this is VBScript we're talking about; by default it seeds using the number of seconds that have passed since midnight.

    Funny thing is, though, a few months later we had another incident with random numbers. A legacy VB6 program was used to change a weekly password in a customer's database. It came up with a very limited range of passwords - not always the same passwords, but very often. When testing the software, though, you could run it a thousand times and it would always produce new, different passwords. The developer who was assigned to the case made the software iterate until the password didn't match last week's password, then marked the bug as resolved. When reviewing the developer's "fix", I realized the actual problem had less to do with the code, and more to do with the fact that the application had been scheduled to run at precisely 1 AM every Monday.
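    A hedged illustration of both incidents (in Python rather than VBScript/VB6, with `make_booking_id` as a hypothetical stand-in for the script's ID logic): seeding a PRNG from a clock with one-second granularity means two runs in the same second, or a job scheduled at the same instant every week, get identical "random" output.

```python
import random

def make_booking_id(seconds_since_midnight: int) -> int:
    # Hypothetical helper: re-seed from a clock with one-second
    # granularity, roughly the way VBScript's Randomize does,
    # then draw a "random" booking ID.
    rng = random.Random(seconds_since_midnight)
    return rng.randint(0, 99)

# Two orders placed within the same second get the same seed and
# therefore the same "random" ID -- and a weekly job launched at
# precisely 1 AM every Monday gets nearly the same seed every run.
a = make_booking_id(43_200)  # order at 12:00:00
b = make_booking_id(43_200)  # another order, same second
print(a == b)                # identical seeds give identical streams
```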

  • (cs) in reply to Vlad Patryshev
    Vlad Patryshev:
    May I humbly suggest you read vol. II of The Art of Computer Programming by You Should Know Who?

    Voldemort?

  • blindman (unregistered) in reply to CynicalTyler
    CynicalTyler:
    blindman:
    the chance of two machines generating the same GUID really IS infinitesimally small
    Infinitesimally small is infinitely larger than exactly zero, which is what you get if you use a sequence where you can guarantee wrapping is not a problem.
    RandomJoe:
    GUIDs are useful too if you're returning the ID number back to the customer.
    Ibuprofen is useful too if you're going to shoot yourself in the foot. I would much rather encode such values, or better yet, avoid returning them at all.

    Oh and don't feed me the line about using GUIDs for spanning databases across machines. Saying "there's a really small chance of GUIDs colliding, so let's just use it" is logically the same as saying "we did not bother to eliminate the possibility of failure".

    The number of GUID combinations is 2^122. Obviously, you have NO concept of how large this is. Bigint is only 2^64, making the number of GUIDs more than 288 MILLION BILLION times larger. I have never, in my entire life, bought a lottery ticket. But GUIDs are odds I can live with.
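    For what it's worth, that ratio checks out; a quick sanity check:

```python
# 122 random bits in a v4 GUID versus a 64-bit bigint:
ratio = 2**122 // 2**64   # exactly 2**58
print(ratio)              # 288230376151711744, ~288 million billion
```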

  • T.T. (unregistered) in reply to Cthulhu

    That is a wrong assessment, IMHO. If the stakes are high enough, NO WAY will you ever use a random number for unique IDs.

    Try that in satellite hardware/software, and you should be fired instantly. Because you can bet your ass that a collision will happen, and then what? Satellite straight down the drain?

    CAPTCHA: booyakasha

  • (cs) in reply to Duderoni
    Duderoni:
    Nonsense, if you have to resolve such technical issues then you should implement a decent work around. Something like locally caching and a solid merging algorithm. In any case, there should be a system in place that can detect and resolve collisions in an acceptable way.
    Again, the conditions and requirements may rule all such solutions out, or the consequences of collisions might be not important enough to warrant the additional complexity.
    Why do you think people have spent so much time in solving hard problems related to parallel computations? Not because these were all part of some trivial issue that can be solved with blind faith in statistics.
    Straw man argument.
    Using random numbers is just a false sense of security, no matter how small the chances of collisions are.
    Come back when you have acquired a sense of proportion. Following that argument, doing anything at all "is just a false sense of security". Cosmic rays may cause a critical bit to flip and create an ID collision despite a perfect software solution. CRC checks may not catch the error because they rely on statistics. And your super-safe 256 bit AES or 2048 bit RSA encryption? Relies on the "false sense of security" that someone doesn't simply guess the right key.
    Especially if the stakes are high ("oops sorry sir, our software accidentally lost your 1 bilion dollar transfer, but we're very confident it will not happen again").
    If you can guarantee that the error is less likely than one in 1 trillion cases, and the simpler design saves you 1 cent per transaction, you could just refund the billion dollars and it would be at most 10% of what you've saved - any insurance company would be more than happy to take on that kind of risk.
    I pitty to poor sod who has to maintain your code in the future. The trick in solid software development is to make your system as predictable as possible. Only then can you reason your way through intricate bugs and implement a fix.
    If making it completely predictable requires a considerably more complex system, then while you may be better able to reason your way through complex bugs, you will also have considerably more of them.
  • Nutmeg Programmer (unregistered)

    Lots of shops have a regular program (e.g. monthly lunch) with a short tech talk. Reviewing random numbers might be appropriate.

    That said, I've been doing business programming for a long time, and the only times I've ever needed random numbers were in doing Monte Carlo simulations and the like. I can see that you may need randomness for passwords, etc., but it seems pretty sketchy to me for database keys.

  • Paul (unregistered) in reply to brazzy

    If you use the right algorithm, you can guarantee that GUIDs are unique to a reasonable level.

    Of course, this depends on using the appropriate algorithm:

    • using a unique 'node id' (e.g. the NIC's MAC address)
    • using a high resolution timer (100ns per 'tick') or simulating it
    • tracking backwards clock changes and modifying the 'clock sequence' counter as required

    Once you have this, the only way you can generate duplicates is if you swap a network card from one PC to another PC with a clock which is behind the first PC's clock by at least the time it took to swap the network card. If all PCs are running with synchronised clocks, then there's no way you can get duplicates with version 1 GUIDs.

    Version 4 GUIDs, on the other hand - I'd never want to consider using those for important stuff; they just don't seem determinate enough.
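    Python's standard `uuid` module implements both variants discussed here, which makes the difference easy to see:

```python
import uuid

# Version 1: 60-bit timestamp in 100ns ticks + clock sequence +
# 48-bit node ID (normally the MAC address). Unique by construction
# as long as the node is unique and the clock never goes backwards.
u1 = uuid.uuid1()
print(u1.version, hex(u1.node))

# Version 4: 122 random bits; uniqueness is probabilistic only.
u4 = uuid.uuid4()
print(u4.version)
```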

  • Matthew (unregistered) in reply to KattMan
    KattMan:
    Not true, there are no bizarre data flukes. There are extremely hard-to-find bugs. A computer will NOT simply decide to do something different in rare cases.

    Well, I guess you can tell all those computer scientists who think up error correction and detection algorithms to go home. Clearly they have nothing to worry about. RAM never stores the wrong bits. Hard drives never return bad data.

    There is always a reason,

    Of course there is always a reason. It isn't magic. It is just that the reason isn't always the software (or the user). This isn't to say you can't use software to try and detect and deal with unpredictable hardware errors, but they do happen.

    the problem is trying to find that reason.

    OR sometimes you don't bother finding the specific reason and you just toss in some error detection and try to deal with errors as gracefully as a possible.

    -matthew

  • Anonymous Comment Generatatron (unregistered)

    To be honest there are two WTF's here:

    First, that the development manager let this go three times before doing something about it.

    Second, his response was to find the problem, then have a big meeting and rub everyone's noses in it.

    Both indicate bad management (be it technical or not). If there is an issue where you know you can help, then help. Don't sit back and then parade the problem around for all to see.

    Of course, I've seen any number of instances where ex-developer-turned-manager types have sworn blind the issue is with X, only for it to turn out to be the pink zebra walking past the building. Just because you used to be a developer does not mean that you have omniscient knowledge of all things.

    We're human; get over it and move the hell on.

  • Maarten (unregistered) in reply to brazzy
    brazzy:
    In the real world, a theoretical certainty that a failure cannot happen is NOT more valuable than sufficiently small likelihood.

    Actually it is, because predicting exactly WHEN this failure with a small likelihood is going to happen, and proving that it is absolutely positively NOT next Tuesday, is harder and more time-consuming than actually making the correct design decisions and KNOWING it will not happen next Tuesday.

  • (cs) in reply to Maarten
    Maarten:
    brazzy:
    In the real world, a theoretical certainty that a failure cannot happen is NOT more valuable than sufficiently small likelihood.

    Actually it is, because predicting exactly WHEN this failure with a small likelihood is going to happen, and proving that it is absolutely positively NOT next Tuesday, is harder and more time-consuming than actually making the correct design decisions and KNOWING it will not happen next Tuesday.

    You don't understand. No prediction is happening (would not be possible anyway) - you don't care when it happens because it is so unlikely that it will probably never happen at all. Your "correct design decisions", on the other hand, still need to be implemented, implemented correctly, and then may not matter because by next Tuesday you've won $100mil in the lottery, the company headquarters were wiped out by an earthquake, and the customer has decided to scrap the project due to political in-fighting - all things that are far more likely than a collision between sufficiently large random numbers and against which you cannot guard at all.

    The rational reason why it's better to use a sequence ID rather than a random number is that in most cases it requires no extra work, and it uses far less space and thus offers better performance. A sufficiently safe random ID would need to be several hundred bits long, which would lead to rather bloated database indices.

  • blindman (unregistered) in reply to brazzy
    brazzy:
    The rational reason why it's better to use a sequence ID rather than a random number is that in most cases it requires no extra work, and it uses far less space and thus offers better performance. A sufficiently safe random ID would need to be several hundred bits long, which would lead to rather bloated database indices.
    Using GUIDs saves coding time, saves database calls, and simplifies many administrative processes. I absolutely GUARANTEE that the likelihood of an error being introduced by colliding identities during administrative processing is far greater than the likelihood of ever having two independently generated GUIDs collide. Those are the real world odds. And as to performance, you won't see any noticeable difference between GUIDs and identities until you are closing in on the terabyte range.
  • (cs)
    A month or so later, the same problem happened again. There were two "mixed up" bookings, one that had almost completely overwritten the other. Again, Dan held back and assigned the bug to his developer. A few days later, he received the second resolution: "The chance of this happening was incredibly small; it's impossible to guarantee it won't happen again, but now the chance of it happening again is infinitesimally small."

    This is like the pilot that brings a bomb onto a plane, and explains to the alarmed copilot, "the odds of there being 2 bombs on a plane are way worse than the odds of just one, so I brought one on myself."

  • Matt (unregistered) in reply to ParkinT
    37% of all statistics are made up on the spot!
    Forfty* percent of people know that!
    • I swear to God, that's what he said: "Forf-ty".
  • Maarten (unregistered) in reply to brazzy
    brazzy:
    Maarten:
    brazzy:
    In the real world, a theoretical certainty that a failure cannot happen is NOT more valuable than sufficiently small likelihood.

    Actually it is, because predicting exactly WHEN this failure with a small likelihood is going to happen, and proving that it is absolutely positively NOT next Tuesday, is harder and more time-consuming than actually making the correct design decisions and KNOWING it will not happen next Tuesday.

    You don't understand. No prediction is happening (would not be possible anyway) - you don't care when it happens because it is so unlikely that it will probably never happen at all. Your "correct design decisions", on the other hand, still need to be implemented, implemented correctly, and then may not matter because by next Tuesday you've won $100mil in the lottery, the company headquarters were wiped out by an earthquake, and the customer has decided to scrap the project due to political in-fighting - all things that are far more likely than a collision between sufficiently large random numbers and against which you cannot guard at all.

    The rational reason why it's better to use a sequence ID rather than a random number is that in most cases it requires no extra work, and it uses far less space and thus offers better performance. A sufficiently safe random ID would need to be several hundred bits long, which would lead to rather bloated database indices.

    There is one point, however, that you missed: if the number was really random, you might get away with it. Most random number generators don't guarantee anything, so the next-Tuesday approach is the only feasible one, especially if it is extremely simple to come up with a correct solution.

  • s. (unregistered) in reply to Anon Fred
    Anon Fred:
    I still maintain that the seed has nothing to do with the problem, or else this would happen every time there were 2 orders within the same second. Even a poorly seeded PRNG would be fine for this problem, as long as its period was big enough. (And 100 is definitely not big enough.)

    True. The seed likely included the hash of the order. Two different orders happening in the same second are common. Two identical orders (same source, same destination, same plane, same class, same seat requirements and so on) are quite unlikely (unless you're buying 2 tickets at once, but this case is unlikely to be instantiated as two separate cases of buying the same ticket at the same time).

    Recently I'd worked on an app that is hardly mission-critical, but still serves lots of users. The "unique ID" (valid 30 days) was first intended to be the time since the beginning of the month in milliseconds plus the user's own IP. Unfortunately NATs are quite common around here. The ID had to be extended by 4 random digits (generated client-side so the seed would be very different each time).

    (Yep, the user can mess with the ID. Our policy is "you are free to shoot yourself in the foot".)

  • CynicalTyler (unregistered) in reply to blindman
    blindman:
    The number of GUID combinations is 2^122. Obviously, you have NO concept of how large this is.
    Obviously you have no concept of the fact that no matter how large a number is, it's still not infinity. 1/2^122 is not equal to zero, 1/infinity is.

    Go math!

  • (cs) in reply to Maarten
    Maarten:
    There is one point however, that you missed: If the number was really random, you might get away with it. Most random number generators don't guarantee anything, so the next Tuesday approach is the only feasible one, especially if it is extremely simple to come up with a correct solution.
    It's not that difficult to get random numbers right, so random IDs are quite feasible.

    I believe we can agree on this: in most cases (most notably whenever you don't have a distributed system), DB-generated serial IDs are easy to use, have no drawbacks and are the best solution.

  • Konrad Zielinski (unregistered) in reply to KattMan

    When you are dealing with an event-driven system, the chances of bizarre data flukes become much more likely, as it becomes harder and harder to predict which two events could be occurring at exactly the same time.

    Using OO generally doesn't help, as it makes the actual code executed harder to trace in many cases. It's essentially normal to just treat the symptom the first time a particular data corruption occurs. The second time it happens, however, arguing that the chances of it happening again are infinitesimally small doesn't hold (as it already did happen again).

  • AdT (unregistered)

    Loophole in Windows Random Number Generator - an interesting coincidence of WTFs

  • (cs) in reply to RTFC
    RTFC:
    "the importance of reading the documentation"

    Now there's the real WTF. The documentation has nothing to do with the code. If you want to know what the code is doing, you read the code. If you read the documentation, you add code that depends on the documentation, you add bugs. RTFD? FOAD.

    Dimwit.

    The compiler has nothing to do with the code. The assembler has nothing to do with the code. Even the CPU (with attendant registers, pipe-lining, etc) has nothing to do with the code.

    (Incidentally, Wiki-style emphasis has nothing to do with BBCode, either.)

    It would be nice to think that even a cloth-eared moron who has sold his soul to the Church of Agile, such as yourself for example, would recognise that the salient principle here is (and I can't emphasise it too strongly):

    If you supply a service to some other person, it is good form to explain what that service does.

    You're lucky, because the chain beneath your (no doubt) deplorable code relies on thirty years of work to avoid the need for "documentation." (Unless you count compiler/linker errors, which presumably you ignore. Or, in an interpreted/script language, you happily catch errors and exceptions and then throw them away. God knows, Unix shell scripts have been doing this for years.)

    Up the chain, though, you're basically screwing your clients.

    All of them.

    In general, and without exception. And the horse that ran in after you.

    But, in the meantime, consider this: what would you do when confronted with a third-party library with copious documentation -- actually, this is optional -- and no code?

    Writing documentation that may, or may not, be out-of-date "adds bugs?"

    Bugs?

    Have you got tertiary syphilis? Because I certainly don't recognise the idea of out-of-date documentation as a bug in, say, exactly the same way that referencing a null pointer in C or C++ is a bug.

    Nutmeg Programmer:
    This is a Bohr bug, the most dependable kind.

    heisenbug: /hi:'zen-buhg/ n. [from Heisenberg's Uncertainty Principle in quantum physics] A bug that disappears or alters its behavior when one attempts to probe or isolate it. (This usage is not even particularly fanciful; the use of a debugger sometimes alters a program's operating environment significantly enough that buggy code, such as that which relies on the values of uninitialized memory, behaves quite differently.)

    Bohr bug: /bohr buhg/ n. [from quantum physics] A repeatable bug; one that manifests reliably under a possibly unknown but well-defined set of conditions.

    mandelbug: /man'del-buhg/ n. [from the Mandelbrot set] A bug whose underlying causes are so complex and obscure as to make its behavior appear chaotic or even non-deterministic. This term implies that the speaker thinks it is a Bohr bug, rather than a heisenbug.

    schroedinbug: /shroh'din-buhg/ n. [MIT: from the Schroedinger's Cat thought-experiment in quantum physics] A design or implementation bug in a program that doesn't manifest until someone reading source or using the program in an unusual way notices that it never should have worked, at which point the program promptly stops working for everybody until fixed. Though (like bit rot) this sounds impossible, it happens; some programs have harbored latent schroedinbugs for years.

    Now, those are fucking bugs.

    You couldn't even categorise them in bugzilla, even if you could type better than an ape, or think with something better than your back-brain well enough to make a lucid comment such as "RTFC".

  • Annonymous Coulterstein (unregistered) in reply to Salami

    So Microsoft Basic gets a pass for business use but Random is a problem? Can't wait to try incrementing your session IDs.

    Hey why do I get the same word for this CAPTCHA test every time? :)

  • Bisual Vasic (unregistered) in reply to RandomJoe

    1 in a quintillion is still not zero. How much would it suck if your system had that much chance of a collision, and it collided almost immediately? Sure the odds are ridiculously low, but it could happen.

    RandomJoe:
    GUIDs are useful too if you're returning the ID number back to the customer.

    It adds one more layer of difficulty for a malicious user to look at someone else's order/reservation/whatever.

    Say the customer's status page is http://foo.com/status.php?id=4392.

    If they can just increment the id and see someone else's data, that's bad.

    If it's http://foo.com/status.php?id=439fa2e8c8d8e, they're going to have a harder time finding the next URL to try (obviously this can't be the only layer, but it can be a layer).

    So the solution to this problem is make it harder to guess valid IDs? Instead of just fixing the bug that lets one user use another user's ID? I hope you don't work anywhere important.

  • AdT (unregistered) in reply to RTFC
    RTFC:
    "the importance of reading the documentation"

    Now there's the real WTF. The documentation has nothing to do with the code. If you want to know what the code is doing, you read the code. If you read the documentation, you add code that depends on the documentation, you add bugs. RTFD? FOAD.

    You're on the right track, but of course the source code has nothing to do with the machine code. The documentation for programming languages is no good and if you rely on it, you will introduce bugs. You gotta read the ones and the zeros if you want to be sure. RTFB (Read The Fucking Binary)!

    Actually, you still can't be sure that the CPU manufacturer's Instruction Reference is any good - it's only documentation after all and documentation is by definition erroneous.

    So Read The Fucking Die!

  • (cs) in reply to Bisual Vasic
    Bisual Vasic:
    1 in a quintillion is still not zero.

    It might be close enough, though. If you generate random 128-bit IDs for 1,000,000 users, the chance of any two users having the same ID assigned is so low that it can be assumed never to occur.

    Spending time implementing some other method that had absolute zero chance of producing duplicates may not be worth the time.

  • AdT (unregistered) in reply to Cthulhu
    Cthulhu:
    Bisual Vasic:
    1 in a quintillion is still not zero.

    It might be close enough, though. If you generate random 128-bit IDs for 1,000,000 users, the chance of any two users having the same ID assigned is so low that it can be assumed never to occur.

    Actually, that depends very much on the quality of the random distribution, which is often poor (as seen in the article), i.e. very far from uniform. Usually, you can't just concatenate the output of four subsequent calls to rnd.

  • blindman (unregistered) in reply to brazzy
    brazzy:
    I believe we can agree on this: in most cases (most notably whenever you don't have a distributed system), DB-generated serial IDs are easy to use, have no drawbacks and are the best solution.
    Sigh... No, we can't agree on that. Identities have LOTS of drawbacks for performance, administration, and coding. I have used GUIDs and I have used identities thousands of times over the last fifteen years. I frequently regret using identities (my current project, for instance, where identities were required by the client but have made the project much more difficult). But I have NEVER, in FIFTEEN YEARS, said to myself "I wish I hadn't used GUIDs in this database." Few areas of database design grant people so much opportunity to demonstrate how misinformed they are as GUIDs do. I always throw in a few GUID questions when I am interviewing a job candidate.
  • blindman (unregistered) in reply to CynicalTyler
    CynicalTyler:
    blindman:
    The number of GUID combinations is 2^122. Obviously, you have NO concept of how large this is.
    Obviously you have no concept of the fact that no matter how large a number is, it's still not infinity. 1/2^122 is not equal to zero, 1/infinity is.

    Go math!

    Nice. So, when you are purchasing hardware for a new system, do you always buy the highest quality of every single component, regardless of price? Or do you find the best bang for your buck? Because the chance of your system crashing due to hardware issues is MUCH greater than the chance of it crashing due to colliding GUIDs. Do you have every line of code reviewed by two, three, four, or more developers? Or is that prohibitively expensive? Because the chance of your system crashing due to a bug in your code is MUCH greater than the chance of it crashing due to colliding GUIDs. Your innumeracy is glaring.

  • blindman (unregistered)

    ...and 1/infinity is NOT zero. http://mathforum.org/library/drmath/view/62486.html "Go math?" Go back to high school, buddy.

  • Mitch (unregistered)

    Every experienced coder should know that random functions - be it in C or C# - don't actually return truly random numbers.

    I'm just a BSA, and even I knew that one. Remind me to tell you sometime about having to teach SQL to alleged coders.

  • AdT (unregistered) in reply to Cthulhu
    Cthulhu:
    It might be close enough though. If you generate random 128 bit IDs for 1000,000 users the chance of any two users having the same ID assigned is so low that it can be assumed never to occur.

    Also, you didn't take the so-called birthday problem into account.

    If you generate a million 128-bit random IDs for each of a billion users, you already have a non-negligible chance of producing a dupe. Still less than 0.0000002%, though, so it's more likely that an E.L.E. will happen in the next 10 years.
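    A back-of-the-envelope check of that figure, using the standard birthday approximation p ≈ n²/2N for n draws from a space of size N:

```python
# A million 128-bit IDs for each of a billion users:
n = 10**9 * 10**6
# Birthday approximation for the chance of at least one duplicate.
p = n * n / (2 * 2**128)
print(f"{p:.2e}")  # ~1.47e-09, i.e. under 0.0000002%
```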

  • (cs) in reply to Bisual Vasic
    Bisual Vasic:
    1 in a quintillion is still not zero. How much would it suck if your system had that much chance of a collision, and it collided almost immediately? Sure the odds are ridiculously low, but it _could_ happen.
    How much would it suck if you have a 1 in a million chance of getting struck by lightning, and it happened tomorrow? Sure the odds are ridiculously low, but it _could_ happen. Do you refuse to go outside because of it?
    So the solution to this problem is make it harder to guess valid IDs? Instead of just fixing the bug that lets one user use another user's ID? I hope you don't work anywhere important.
    So, how would you fix the bug? Have the user log in and then... huh? First of all, a long enough random number is far harder to guess than any password. Second, after the user has logged in, you need to confirm that he has logged in successfully for subsequent requests. How are you going to do that - give him cookie "loggedIn=true"? No? What else then? Perhaps a cookie with a large RANDOM number that you'll also store in the session?

    I hope you don't work anywhere important.

  • (cs) in reply to blindman
    blindman:
    brazzy:
    I believe we can agree on this: in most cases (most notably whenever you don't have a distributed system), DB-generated serial IDs are easy to use, have no drawbacks and are the best solution.
    Sigh....No, we can't agree on that. Identities have LOTS of drawbacks for performance, administration, and coding. I have used GUIDs and I have used identities thousands of times over the last fifteen years. I frequently regret using identities (my current project, for instance, where identities were required by the client but have made the project much more difficult). But I have NEVER, in FIFTEEN YEARS, said to myself "I wish I hadn't used GUIDs in this database."
    Can we get more specific please?

    Do you seriously propose to use GUIDs as primary keys for all DB tables?

  • Jay Levitt (unregistered)
    This is like the pilot that brings a bomb onto a plane, and explains to the alarmed copilot, "the odds of there being 2 bombs on a plane are way worse than the odds of just one, so I brought one on myself."

    Ha! That's like Mitch Hedberg's roundabout AIDS test:

    I don’t get the regular AIDS test anymore. I get the roundabout AIDS test. I ask my friend Brian, “Do you know anybody who has AIDS?”. He says, “No”. I say, “Cool, because you know me.”

    When I worked on OLTP systems that had (what were at the time) ridiculous transaction volumes, doubling every three to six months, I learned the hard way that there is "no such thing" as a data fluke. (Yes, I know that that is imprecise. I know that even with fault-tolerant hardware, ECC, etc., there is a slim chance of some sort of fluke happening, and thus I'm indulging in the very imprecision that I criticize. If it makes you happy, you can say "the chances of a data fluke are significantly lower than the chances of said fluke being caused by almost any 'inconceivably improbable' bug.")

    I had a mail server that would occasionally crash, and I couldn't figure out the cause. There didn't seem to be any commonalities - or, rather, any commonalities that we observed didn't seem to help us reproduce it. The hardware was way too slow for any sort of tracing code; disk writes took maybe a third of a second, and each server instance was processing dozens per second.

    But the crashes were increasing exponentially with our growth. Finally I added a circular buffer, and quickly figured it out from the core files:

    The server processed mail for many types of mail clients. The bug only happened with an ancient client, used by 2% of our user base.

    The server processed both mail that originated locally and mail that originated on the Internet. The bug only happened with mail from the Internet, which comprised 50% of our volume.

    Some 98% of our mail was less than 8K long (this was before HTML e-mail was invented and attachments were widespread). This bug only happened with mail that was longer than 8K, or less than 2% of our volume.

    Internet mail has lots of headers, unlike local mail, and to avoid confusing novice users, we moved those headers to the bottom, separated by a short dashed line. Guess when the bug hit?

    When the dashed line fell across the 8K boundary.

    (I could calculate the probability of that, too, but that would require advanced calculation equipment of some sort, possibly involving math.)

    So out of a million transactions a day per instance, this bug hit 2% * 50% * 2% * teeny% of the time. And, even then, I think it was a stray pointer, so it might or might not cause a crash.

    After that, I started saying "There is no 'sometimes'. There is only 'when it'."

  • blindman (unregistered) in reply to brazzy
    brazzy:
    Can we get more specific please?

    Do you seriously propose to use GUIDs as primary keys for all DB tables?

    I don't propose any rigid rules. But I use GUIDs by default. So basically, I look at a project and decide whether NOT to use GUIDs.

  • (cs) in reply to brazzy
    brazzy:

    Do you seriously propose to use GUIDs as primary keys for all DB tables?

    In a few circumstances, possibly yes. For example, when using a certain database that has poor support for multi-master replication and updating multiple servers could cause a collision, then yes, use a GUID.

  • Charly (unregistered) in reply to Anon

    And because of a political outcry about using MAC addresses, the "standard" algorithm for generating GUIDs does not even include the MAC address anymore... So they are not even guaranteed to be unique

  • Charly (unregistered) in reply to notromda

    And because generated GUIDs reverse the least significant bit/most significant bit ordering, using them as PKs in an OLTP table will guarantee rapidly fragmented indices and really, really terrible read performance

  • blindman (unregistered) in reply to Charly
    Charly:
    And because generated GUIDs reverse the least significant bit/most significant bit ordering, using them as PKs in an OLTP table will guarantee rapidly fragmented indices and really, really terrible read performance
    Yet another misconception....
  • AdT (unregistered) in reply to brazzy
    brazzy:
    Second, after the user has logged in, you need to confirm that he has logged in successfully for subsequent requests. How are you going to do that - give him cookie "loggedIn=true"? No? What else then? Perhaps a cookie with a large RANDOM number that you'll also store in the session?

    What is HTTPS?

    IOW: Wouldn't it be ironic to use some incredibly hard to guess random number for authentication and then send it in cleartext over an insecure network?

  • Charly (unregistered) in reply to blindman
    blindman:
    Charly:
    And because generated GUIDs reverse the least significant bit/most significant bit ordering, using them as PKs in an OLTP table will guarantee rapidly fragmented indices and really, really terrible read performance
    Yet another misconception....

    Not a misconception at all... Google it, or here, read the link

    http://www.sql-server-performance.com/articles/per/guid_performance_p1.aspx

    There are ways to fix it, but they basically obviate most of the rationale for using GUIDs in the first place..
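(The fragmentation claim is easy to play with in a toy model. The sketch below is an illustration in Python, not a SQL Server benchmark: it simulates a clustered-index leaf level as fixed-capacity pages and counts mid-page splits, i.e. the kind that fragment an index. Appends at the rightmost edge just start a fresh page, modeling the ever-increasing-key fast path that identity columns benefit from. All names here are made up for the sketch.)

```python
import bisect
import uuid

PAGE_CAPACITY = 100  # rows per index page in this toy model

def mid_page_splits(keys):
    """Insert keys into a simulated clustered-index leaf level and
    count mid-page splits. An insert at the end of the rightmost page
    just opens a fresh page (no split counted), mirroring how an
    ever-increasing key appends instead of fragmenting."""
    keys = iter(keys)
    pages = [[next(keys)]]  # list of sorted pages, in key order
    splits = 0
    for key in keys:
        mins = [p[0] for p in pages]            # page routing keys
        i = max(0, bisect.bisect_right(mins, key) - 1)
        page = pages[i]
        bisect.insort(page, key)
        if len(page) > PAGE_CAPACITY:
            if i == len(pages) - 1 and page[-1] == key:
                pages.append([page.pop()])      # cheap rightmost append
            else:
                half = page[PAGE_CAPACITY // 2:]
                del page[PAGE_CAPACITY // 2:]   # fragmenting split:
                pages.insert(i + 1, half)       # half-full pages remain
                splits += 1
    return splits

sequential = mid_page_splits(range(20_000))                        # identity-like keys
random_ids = mid_page_splits(uuid.uuid4().hex for _ in range(20_000))  # GUID-like keys
```

In this model the sequential keys never cause a mid-page split, while the random GUID-like keys cause hundreds, which is the intuition behind both Charly's complaint and the sequential-GUID workarounds.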

  • blindman (unregistered)

    Did YOU read the article? And did you actually take time to UNDERSTAND it? Apparently not... In his test, inserting 100,000 rows took 30 seconds longer than an identity. Next time I create an OLTP system where users are inserting 100,000 customer records at a time, I'll keep this in mind. The difference for normal OLTP transactions would be absolutely negligible. Plus, it looks like his test tables did not contain any columns other than the key values. Throw a few varchar(100)s on there, and watch the performance ratio evaporate... Then, for individual inserts, add the extra time needed for calling SCOPE_IDENTITY() to send the pkey value back to the application, and subtract the time necessary to call NEWID() because those are being generated client-side, and then tell me where your performance gains go. And what did the article say about reads? "The tests seem to prove that the binary comparison of the GUID performs quite well with the other alternatives." And what is the conclusion drawn in the article? "It would seem, as a result of this testing, that the uniqueidentifier data type performs about the same as an integer data type when filtering the data through the WHERE clause." I have never encountered performance issues from using GUIDs. Those of you who spend your time reading articles you don't understand are really annoying those of us who have been doing this for a living for the last decade. Now, don't you have some tapes that need changing?

  • mt (unregistered)

    As a programmer turned manager (not by choice) I know how the poor guy feels. I have to tell myself 10 times a day not to do something that I hated when my managers did it.

  • (cs) in reply to AdT
    AdT:
    brazzy:
    Second, after the user has logged in, you need to confirm that he has logged in successfully for subsequent requests. How are you going to do that - give him cookie "loggedIn=true"? No? What else then? Perhaps a cookie with a large RANDOM number that you'll also store in the session?

    What is HTTPS?

    IOW: Wouldn't it be ironic to use some incredibly hard to guess random number for authentication and then send it in cleartext over an insecure network?

    Maybe; it is a quite common practice though. Ebay does it, Sourceforge does it, Gmail does it.

    In any case, I don't see what it has to do with the discussion at hand - HTTPS ensures nobody can eavesdrop and enables the client to verify the identity of the server, but it does not enable the server to authenticate the client's identity, unless you use client certificates, which is very, very rarely done.
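(For reference, the "large random number in a cookie" session scheme brazzy describes is straightforward to sketch with a CSPRNG. This is a minimal, hypothetical illustration in Python; the `sessions` dict stands in for whatever server-side store with expiry a real application would use, and the function names are made up for the sketch.)

```python
import secrets

# Server-side session store: token -> user id. A real app would use a
# database or cache with expiry, and send the cookie only over HTTPS.
sessions = {}

def log_in(user_id):
    """Issue an unguessable session token after a successful login."""
    token = secrets.token_urlsafe(32)   # 256 bits of CSPRNG output
    sessions[token] = user_id
    return token                        # sent to the client as a cookie value

def authenticate(token):
    """Map a presented cookie value back to a user, or None."""
    return sessions.get(token)
```

The security of the scheme rests entirely on the token being drawn from a cryptographically strong generator with enough bits to make guessing infeasible, which is exactly why "a large RANDOM number" is not inherently bad design.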

  • (cs) in reply to notromda
    notromda:
    brazzy:

    Do you seriously propose to use GUIDs as primary keys for all DB tables?

    In a few circumstances, possibly yes. For example, when using a certain database that has poor support for multi-master replication and updating multiple servers could cause a collision, then yes, use a GUID.

    Note that I explicitly said "whenever you don't have a distributed system".

    Charly:
    And because of a political outcry about using MAC addresses, the "standard" algorithm for generating GUIDs does not even include the MAC address anymore... So they are not even guaranteed to be unique
    They weren't guaranteed to be unique before that, either. MAC addresses can be set via software, and manufacturers have been known to reuse them as well.
  • Charly (unregistered) in reply to blindman
    blindman:
    Did YOU read the article? And did you actually take time to UNDERSTAND it? Apparently not... In his test, inserting 100,000 rows took 30 seconds longer than an identity. Next time I create an OLTP system where users are inserting 100,000 customer records at a time, I'll keep this in mind. The difference for normal OLTP transactions would be absolutely negligible. Plus, it looks like his test tables did not contain any columns other than the key values. Throw a few varchar(100)s on there, and watch the performance ratio evaporate... Then, for individual inserts, add the extra time needed for calling SCOPE_IDENTITY() to send the pkey value back to the application, and subtract the time necessary to call NEWID() because those are being generated client-side, and then tell me where your performance gains go. And what did the article say about reads? "The tests seem to prove that the binary comparison of the GUID performs quite well with the other alternatives." And what is the conclusion drawn in the article? "It would seem, as a result of this testing, that the uniqueidentifier data type performs about the same as an integer data type when filtering the data through the WHERE clause." I have never encountered performance issues from using GUIDs. Those of you who spend your time reading articles you don't understand are really annoying those of us who have been doing this for a living for the last decade. Now, don't you have some tapes that need changing?

    boy are you obnoxious.. Are you always like this?
    And - yes I did read the article, and yes I do understand it - and no - don't expect any further interaction as I'm done wasting time with someone as immature and obnoxious as you are.

  • (cs) in reply to Charly
    Charly:
    boy are you obnoxious.. Are you always like this? And - yes I did read the article, and yes I do understand it - and no - don't expect any further interaction as I'm done wasting time with someone as immature and obnoxious as you are.

    Nice logical fallacy, refusing to admit that you had your whole shit handed to you on a silver platter by someone that is a lot smarter and more knowledgeable than you, and doesn't feel the need to just throw around buzzwords like OLTP FRAGMENTATION TABELZ LOLZ.

    "I'd take the time to explain it but you're an idiot so i won't waste my time"

    man am i glad that i'm not you. or your friend. or anyone that will ever interact with you outside of this discussion.

  • (cs) in reply to AdT
    AdT:
    Cthulhu:
    Bisual Vasic:
    1 in a quintillion is still not zero.

    It might be close enough though. If you generate random 128-bit IDs for 1,000,000 users, the chance of any two users having the same ID assigned is so low that it can be assumed never to occur.

    Actually, that depends very much on the quality of the random distribution which is often poor (as seen in the article), i.e. very far away from uniformity. Usually, you can't just concatenate the output of four subsequent calls to rnd.

    Calling rand() multiple times will do it fine
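(The disagreement above can be quantified with the standard birthday bound: for n uniformly random b-bit IDs, the collision probability is approximately 1 - exp(-n(n-1)/2^(b+1)). A quick sketch in Python; the function name is made up for the illustration. With truly uniform 128-bit IDs and a million users, the probability is around 10^-27; with 100,000 IDs drawn from a single 31-bit rand(), it is over 90% - which is why the width and quality of the generator matter far more than how many calls you concatenate.)

```python
import math

def collision_probability(n, bits):
    """Birthday-bound approximation: the chance that at least two of
    n uniformly random `bits`-bit IDs collide,
    p ~= 1 - exp(-n*(n-1) / 2**(bits+1))."""
    return -math.expm1(-n * (n - 1) / 2.0 ** (bits + 1))

p_guid = collision_probability(1_000_000, 128)  # roughly 1.5e-27
p_rand = collision_probability(100_000, 31)     # roughly 0.9 for one 31-bit rand()
```

Note that the bound assumes a uniform distribution; a poorly seeded or correlated generator, like the ones AdT mentions, can collide far more often than this formula predicts.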

Leave a comment on “Identity Crisis ”
