Admin
Nonsense, if you have to resolve such technical issues then you should implement a decent workaround, something like local caching plus a solid merging algorithm. In any case, there should be a system in place that can detect and resolve collisions in an acceptable way.
Why do you think people have spent so much time solving hard problems in parallel computation? Not because these were all trivial issues that could be solved with blind faith in statistics.
Using random numbers just gives a false sense of security, no matter how small the chance of collision is. Especially if the stakes are high ("oops, sorry sir, our software accidentally lost your 1 billion dollar transfer, but we're very confident it will not happen again").
I pity the poor sod who has to maintain your code in the future. The trick in solid software development is to make your system as predictable as possible. Only then can you reason your way through intricate bugs and implement a fix.
Admin
"Using random numbers is just a false sense of security, no matter how small the chances of collisions are"
It's acceptable if the chances of collision are low enough.
Admin
Wow, my first submission to be published :) I never realized how much they modify and make up in these stories... even my name is slightly wrong. Anyway, most of the comments about my behaviour are moot, as things really didn't go down the way the article describes. A colorful reenactment of the events, but not a very accurate one.
Regarding the granularity of the random function - this is VBScript we're talking about; by default it seeds using the number of seconds that have passed since midnight.
Funny thing is, though, a few months later we had another incident with random numbers. A legacy VB6 program was used to change a weekly password in a customer's database. It came up with a very limited range of passwords - not always the same ones, but very often. When testing the software, though, you could run it a thousand times and it would always produce new, different passwords. The developer assigned to the case made the software iterate until the password didn't match last week's, then marked the bug as resolved. When reviewing the developer's "fix", I realized the actual problem had less to do with the code and more to do with the fact that the application had been scheduled to run at precisely 1 AM every Monday.
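The failure mode is easy to reproduce in any language. Here's a minimal Python sketch (the `weekly_password` helper is hypothetical, standing in for the VB6 program): when the generator is seeded from the time of day, a job that always fires at the same scheduled time keeps reusing the same seed, and therefore keeps producing the same "random" password.

```python
import random
import string

def weekly_password(seed: int, length: int = 8) -> str:
    """Generate a password from a PRNG seeded with the given value."""
    rng = random.Random(seed)  # deterministic: same seed -> same output
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))

# VB6's Randomize seeds from the time of day. A job scheduled for
# exactly 1 AM every Monday therefore reuses (almost) the same seed
# each week, regenerating (almost) the same password.
seed_1am = 1 * 3600  # seconds since midnight at the scheduled run time
print(weekly_password(seed_1am))
print(weekly_password(seed_1am))  # identical: same seed, same password
```

Manual testing never catches this, because a human running the tool interactively hits a different seed every time.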
Admin
Voldemort?
Admin
Admin
That is a wrong assessment IMHO. If the stakes are high enough, NO WAY will you ever use a random number for unique IDs.
Try that in satellite hardware/software, and you should be fired instantly. Because you can bet your ass that a collision will happen, and then what? Satellite straight down the drain?
CAPTCHA: booyakasha
Admin
Admin
Lots of shops have a regular program (e.g. monthly lunch) with a short tech talk. Reviewing random numbers might be appropriate.
That said, I've been doing business programming for a long time, and the only times I've ever needed random numbers was in doing Monte Carlo simulations and the like. I can see that you may need randomness for passwords, etc., but it seems pretty sketchy to me for database keys.
Admin
If you use the right algorithm, you can guarantee that GUIDs are unique to a reasonable level.
Of course, this depends on using the appropriate algorithm - one using a unique 'node id' (e.g. the NIC's MAC address).
Once you have this, the only way you can generate duplicates is if you swap a network card from one PC to another PC with a clock which is behind the first PC's clock by at least the time it took to swap the network card. If all PCs are running with synchronised clocks, then there's no way you can get duplicates with version 1 GUIDs.
Version 4 GUIDs, on the other hand - I'd never want to consider using those for important stuff; they just don't seem deterministic enough.
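Python's standard `uuid` module exposes both flavours, which makes the contrast easy to see (note that `uuid1` falls back to a random node id when the MAC address can't be read):

```python
import uuid

# Version 1: time-based layout - a 60-bit timestamp, a clock sequence,
# and a node id (normally the NIC's MAC address). Uniqueness follows
# from the structure, not from chance.
u1 = uuid.uuid1()
print(u1.version, hex(u1.node), u1.time)

# Version 4: essentially 122 random bits. Uniqueness is purely
# probabilistic - which is exactly the objection above.
u4 = uuid.uuid4()
print(u4.version)
```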
Admin
Well, I guess you can tell all those computer scientists who think up error correction and detection algorithms to go home. Clearly they have nothing to worry about. RAM never stores the wrong bits. Hard drives never return bad data.
Of course there is always a reason. It isn't magic. It is just that the reason isn't always the software (or the user). This isn't to say you can't use software to try and detect and deal with unpredictable hardware errors, but they do happen.
Or sometimes you don't bother finding the specific reason and just toss in some error detection, trying to deal with errors as gracefully as possible.
-matthew
Admin
To be honest, there are two WTFs here:
First, that the development manager let this go three times before doing something about it.
Second, that his response was to find the problem, then hold a big meeting and rub everyone's noses in it.
Both indicate bad management (be it technical or not). If there is an issue where you know you can help, then help. Don't sit back and then parade the problem around for all to see.
Of course, I've seen any number of instances where ex-developer-turned-manager types have sworn blind that the issue is with X, only for it to turn out to be the pink zebra walking past the building. Just because you used to be a developer does not mean that you have omniscient knowledge of all things.
We're human; get over it and move the hell on.
Admin
Actually it is, because predicting exactly WHEN a failure with a small likelihood is going to happen, and proving that it is absolutely positively NOT next Tuesday, is harder and more time-consuming than simply making the correct design decisions and KNOWING it will not happen next Tuesday.
Admin
The rational reason why it's better to use a sequence ID rather than a random number is that in most cases it requires no extra work, and it uses far less space and thus offers better performance. A sufficiently safe random ID would need to be several hundred bits long, which would lead to rather bloated database indices.
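The space argument can be made concrete with the birthday bound. A rough Python sketch (the helper names are mine; the formula is the standard n²/2^(b+1) approximation for the probability of at least one collision among n random b-bit IDs):

```python
import math

def collision_probability(n: int, bits: int) -> float:
    """Birthday approximation: P(collision) ~= n^2 / 2^(bits+1)."""
    return n * n / 2 ** (bits + 1)

def bits_needed(n: int, max_p: float) -> int:
    """Smallest ID width (in bits) keeping collision odds under max_p."""
    return math.ceil(math.log2(n * n / (2 * max_p)))

# A 64-bit random ID is already risky at scale ...
print(collision_probability(10**9, 64))   # ~2.7% for a billion rows
# ... while a plain sequence fits the same billion rows in 30 bits.
print(bits_needed(10**9, 10**-12))        # width for a 1-in-a-trillion budget
```

So a random key wide enough to be "safe" is several times larger than the sequence that would index the same rows, which is where the index bloat comes from.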
Admin
Admin
This is like the pilot that brings a bomb onto a plane, and explains to the alarmed copilot, "the odds of there being 2 bombs on a plane are way worse than the odds of just one, so I brought one on myself."
Admin
Admin
There is one point however, that you missed: If the number was really random, you might get away with it. Most random number generators don't guarantee anything, so the next Tuesday approach is the only feasible one, especially if it is extremely simple to come up with a correct solution.
Admin
True. The seed likely included a hash of the order. Two different orders happening in the same second are common. Two identical ones (same source, same destination, same plane, same class, same seat requirements and so on) are quite unlikely (unless you're buying 2 tickets at once, but that case is unlikely to be instantiated as two separate purchases of the same ticket at the same time).
Recently I worked on an app that is hardly mission-critical, but still serves lots of users. The "unique ID" (valid 30 days) was first intended to be the time since the beginning of the month in milliseconds plus the user's own IP. Unfortunately, NATs are quite common around here, so the ID had to be extended by 4 random digits (generated client-side so the seed would be very different each time).
(Yep, the user can mess with the ID. Our policy is "you are free to shoot yourself in the foot".)
Admin
Go math!
Admin
I believe we can agree on this: in most cases (most notably whenever you don't have a distributed system), DB-generated serial IDs are easy to use, have no drawbacks and are the best solution.
Admin
When you are dealing with an event-driven system, bizarre data flukes become much more likely, as it becomes harder and harder to predict which two events could be occurring at exactly the same time.
Using OO generally doesn't help, as it makes the actual code executed harder to trace in many cases. It's essentially normal to just treat the symptom the first time a particular data corruption occurs. The second time it happens, however, arguing that the chances of it happening again are infinitesimally small doesn't hold (as it already did happen again).
Admin
Loophole in Windows Random Number Generator - an interesting coincidence of WTFs
Admin
The compiler has nothing to do with the code. The assembler has nothing to do with the code. Even the CPU (with attendant registers, pipe-lining, etc) has nothing to do with the code.
(Incidentally, Wiki-style emphasis has nothing to do with BBCode, either.)
It would be nice to think that even a cloth-eared moron who has sold his soul to the Church of Agile, such as yourself for example, would recognise that the salient principle here is (and I can't emphasise it too strongly):
If you supply a service to some other person, it is good form to explain what that service does.
You're lucky, because the chain beneath your (no doubt) deplorable code relies on thirty years of work to avoid the need for "documentation." (Unless you count compiler/linker errors, which presumably you ignore. Or, in an interpreted/scripted language, you happily catch errors and exceptions and then throw them away. God knows, Unix shell scripts have been doing this for years.)
Up the chain, though, you're basically screwing your clients.
All of them.
In general, and without exception. And the horse that ran in after you.
But, in the meantime, consider this: what would you do when confronted with a third-party library with copious documentation -- actually, this is optional -- and no code?
Writing documentation that may, or may not, be out-of-date "adds bugs?"
Bugs?
Have you got tertiary syphilis? Because I certainly don't recognise the idea of out-of-date documentation as a bug in, say, exactly the same way that referencing a null pointer in C or C++ is a bug.
Now, those are fucking bugs. You can't even categorise them in Bugzilla, should you be able to type better than an ape, or think with your back-brain well enough to make a more lucid comment than RTFC.
Admin
So Microsoft Basic gets a pass for business use but Random is a problem? Can't wait to try incrementing your session IDs.
Hey why do I get the same word for this CAPTCHA test every time? :)
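For what it's worth, if a session ID has to be unguessable, the usual answer today is a cryptographic RNG rather than either a counter or a plain `rand()`. A minimal Python sketch:

```python
import secrets

# A sequential session ID lets an attacker walk the whole ID space;
# an ID drawn from a CSPRNG cannot feasibly be predicted even by
# someone who has observed every previously issued ID.
session_id = secrets.token_hex(16)  # 128 bits of cryptographic randomness
print(session_id, len(session_id))  # 32 hex characters
```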
Admin
1 in a quintillion is still not zero. How much would it suck if your system had that much chance of a collision, and it collided almost immediately? Sure the odds are ridiculously low, but it could happen.
So the solution to this problem is to make it harder to guess valid IDs? Instead of just fixing the bug that lets one user use another user's ID? I hope you don't work anywhere important.
Admin
You're on the right track, but of course the source code has nothing to do with the machine code. The documentation for programming languages is no good and if you rely on it, you will introduce bugs. You gotta read the ones and the zeros if you want to be sure. RTFB (Read The Fucking Binary)!
Actually, you still can't be sure that the CPU manufacturer's Instruction Reference is any good - it's only documentation after all and documentation is by definition erroneous.
So Read The Fucking Die!
Admin
It might be close enough, though. If you generate random 128-bit IDs for 1,000,000 users, the chance of any two users being assigned the same ID is so low that it can be assumed never to occur.
Spending time implementing some other method with absolutely zero chance of producing duplicates may not be worth the time.
Admin
Actually, that depends very much on the quality of the random distribution which is often poor (as seen in the article), i.e. very far away from uniformity. Usually, you can't just concatenate the output of four subsequent calls to rnd.
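A classic power-of-two LCG (the kind behind many old `rnd` implementations; VBScript's exact generator differs, but the failure class is the same) shows why: its lowest bit simply alternates, so concatenating low-order output from successive calls yields a fixed pattern, not extra randomness.

```python
def lcg(seed, a=1103515245, c=12345, m=2**31):
    """A classic power-of-two-modulus LCG (glibc-style constants)."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

# With an odd multiplier and an odd increment, the parity of the state
# flips on every step, so the lowest bit is a strict 0/1 alternation.
gen = lcg(42)
low_bits = [next(gen) & 1 for _ in range(8)]
print(low_bits)  # [1, 0, 1, 0, 1, 0, 1, 0]
```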
Admin
Admin
Admin
...and 1/infinity is NOT zero. http://mathforum.org/library/drmath/view/62486.html "Go math?" Go back to high school, buddy.
Admin
Every experienced coder should know that random functions - be it in C or C# - don't actually return truly random numbers.
I'm just a BSA, and even I knew that one. Remind me to tell you sometime about having to teach SQL to alleged coders.
Admin
Also, you didn't take the so-called birthday problem into account.
If you generate a million 128-bit random IDs for each of a billion users, you already have a non-negligible chance of producing a dupe. Still only around 0.00000015%, though, so it's more likely that an E.L.E. will happen in the next 10 years.
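Plugging those numbers into the birthday approximation (a quick sketch; Python's big integers make the exponent arithmetic painless):

```python
import math

# Birthday approximation: P(dupe) ~= 1 - exp(-n^2 / 2^(bits+1))
n = 10**6 * 10**9          # a million IDs for each of a billion users
bits = 128
p = 1 - math.exp(-n * n / 2 ** (bits + 1))
print(p)  # roughly 1.5e-9, i.e. about 0.00000015%
```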
Admin
I hope you don't work anywhere important.
Admin
Do you seriously propose to use GUIDs as primary keys for all DB tables?
Admin
Ha! That's like Mitch Hedberg's roundabout AIDS test:
When I worked on OLTP systems that had (what were at the time) ridiculous transaction volumes, doubling every three to six months, I learned the hard way that there is "no such thing" as a data fluke. (Yes, I know that that is imprecise. I know that even with fault-tolerant hardware, ECC, etc., there is a slim chance of some sort of fluke happening, and thus I'm indulging in the very imprecision that I criticize. If it makes you happy, you can say "the chances of a data fluke are significantly lower than the chances of said fluke being caused by almost any 'inconceivably improbable' bug.")
I had a mail server that would occasionally crash, and I couldn't figure out the cause. There didn't seem to be any commonalities - or, rather, any commonalities that we observed didn't seem to help us reproduce it. The hardware was way too slow for any sort of tracing code; disk writes took maybe a third of a second, and each server instance was processing dozens per second.
But the crashes were increasing exponentially with our growth. Finally I added a circular buffer, and quickly figured it out from the core files:
The server processed mail for many types of mail clients. The bug only happened with an ancient client, used by 2% of our user base.
The server processed both mail that originated locally and mail that originated on the Internet. The bug only happened with mail from the Internet, which comprised 50% of our volume.
Some 98% of our mail was less than 8K long (this was before HTML e-mail was invented and attachments were widespread). This bug only happened with mail that was longer than 8K, or less than 2% of our volume.
Internet mail has lots of headers, unlike local mail, and to avoid confusing novice users, we moved those headers to the bottom, separated by a short dashed line. Guess when the bug hit?
When the dashed line fell across the 8K boundary.
(I could calculate the probability of that, too, but that would require advanced calculation equipment of some sort, possibly involving math.)
So out of a million transactions a day per instance, this bug hit 2% * 50% * 2% * teeny% of the time. And, even then, I think it was a stray pointer, so it might or might not cause a crash.
After that, I started saying "There is no 'sometimes'. There is only 'when it'."
Admin
Admin
In a few circumstances, possibly yes. For example, when using a certain database that has poor support for multi-master replication, where updating multiple servers could cause a collision - then yes, use a GUID.
Admin
And because of a political outcry about using MAC addresses, the "standard" algorithm for generating GUIDs does not even include the MAC address anymore... so they are not even guaranteed to be unique.
Admin
And because generated GUIDs reverse the least-significant/most-significant bit order, using them as PKs in an OLTP table will guarantee rapidly fragmenting indices and really, really terrible read performance.
Admin
Admin
What is HTTPS?
IOW: Wouldn't it be ironic to use some incredibly hard to guess random number for authentication and then send it in cleartext over an insecure network?
Admin
Not a misconception at all... Google it, or here, read the link
http://www.sql-server-performance.com/articles/per/guid_performance_p1.aspx
There are ways to fix it, but they basically obviate most of the rationale for using GUIDs in the first place.
Admin
Did YOU read the article? And did you actually take time to UNDERSTAND it? Apparently not... In his test, inserting 100,000 rows took 30 seconds longer than with an identity. Next time I create an OLTP system where users are inserting 100,000 customer records at a time, I'll keep this in mind. The difference for normal OLTP transactions would be absolutely negligible. Plus, it looks like his test tables did not contain any columns other than the key values. Throw a few varchar(100)s on there, and watch the performance ratio evaporate... Then, for individual inserts, add the extra time needed for calling SCOPE_IDENTITY() to send the pkey value back to the application, and subtract the time necessary to call NEWID(), because those are being generated client-side, and then tell me where your performance gains go.
And what did the article say about reads? "The tests seem to prove that the binary comparison of the GUID performs quite well with the other alternatives." And what is the conclusion drawn in the article? "It would seem, as a result of this testing, that the uniqueidentifier data type performs about the same as an integer data type when filtering the data through the WHERE clause."
I have never encountered performance issues from using GUIDs. Those of you who spend your time reading articles you don't understand are really annoying those of us who have been doing this for a living for the last decade. Now, don't you have some tapes that need changing?
Admin
As a programmer turned manager (not by choice) I know how the poor guy feels. I have to tell myself 10 times a day not to do something that I hated when my managers did it.
Admin
In any case I don't see what it has to do with the discussion at hand - HTTPS ensures nobody can eavesdrop and enables the client to verify the identity of the server, but does not enable the server to authenticate the client's identity, unless you use client certificates, which is very, very rarely done.
Admin
Admin
Boy, are you obnoxious... Are you always like this?
And - yes, I did read the article, and yes, I do understand it - and no, don't expect any further interaction, as I'm done wasting time with someone as immature and obnoxious as you are.
Admin
Nice logical fallacy, refusing to admit that you had your shit handed to you on a silver platter by someone who is a lot smarter and more knowledgeable than you, and who doesn't feel the need to just throw around buzzwords like OLTP FRAGMENTATION TABELZ LOLZ.
"I'd take the time to explain it but you're an idiot so i won't waste my time"
Man, am I glad that I'm not you. Or your friend. Or anyone that will ever interact with you outside of this discussion.
Admin
Calling rand() multiple times will do it just fine.