- Feature Articles
- CodeSOD
- Error'd
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
Were you responding to me or someone else? Again, a correctly-implemented try-write inside a loop will work; the first insert works on the first try, the second insert fails on the first try and (after another increment) works on the second.
Of course, if auto-increment is built into the DB, then you should use that instead. And your web app should check that you have permission to get the data you request, to block adjacent-ID snooping.
Admin
The question is, what did Hugo's post have to do with what he quoted? He gave a valid and humorous response - but to a question that wasn't asked. That, I believe, was OneFactor's point.
Admin
ergh. Can somebody tell me if I've been naughty? I'm using a random seed generator for my sessions (in the name of security, rather than setting cookies with the username set in it or the like I have a server-side session ID and various other things.)
God, this forum software is awful. I could write something better than this >_>
Admin
Whoops, HTML typo. this...
Should read as...
rashin frashin brillant forum...
If it doesn't work this time, I give up...
Admin
AUTO_INCREMENT? And where is that in the sybase documentation?
Admin
Random ID are occasionally required. E.g. a company want random order ID numbers so that stock analysts can't guess at their sales figures by looking at the IDs. Or a cell phone manufacturer might want to make their phone serial numbers random for the same reason.
This WTF isn't as WTF as might appear at first. Make the random selection simpler (minor improvement as the SQL query dominates timings anyway), make it transaction safe, and make the sequence space larger than six digits. If the application requires random IDs then the real WTF is if it was "fixed" by using a database sequence...
Admin
This is the standard FAQ: http://perl.plover.com/FAQs/Namespaces.html
Admin
what auto_increment?
Admin
NOOOO!!!!!
If I had a pound for every time I've had to fix code that uses the MAX() + 1 abomination, I'd have err...about £10
Admin
Maybe I'm going to say something silly, but in Java you can ask your Statement for all keys it has generated via updates:
I'm not sure if you can do it too for multiple updates
Admin
I wouldn't even start to think about rekeying. Instead, I'd start with negative identity keys and go downwards from there. The 'alternate ID domain' technique works in many interesting scenarios...
Best Regards,
Gene Pool
Admin
..insert stuff here..
declare @ID int
set @ID = @@indentity
first WTF-post ever. Hope this works!
Admin
That's me! Can I say hello? Hello, mom! I'm on WTF!
Jokes aside, I have learned lots of things with the WTFs posted here, ehem, I mean, with the posts commenting the WTFs, of course :)
For a first, the discussion about how adding arbitrary columns in a database using a XML-filled TEXT column, that finally brought me to use ALTER TABLE ADD/DROP COLUMN in a system I'm building.
For a second, I learnt about left joins on this incredible misuse of joins, and posterior discussions on how "right join" is so much better and clearer that "left join" , you just can't make these things up :) .
And, of course, rewriting the WTF in a sensible manner serves me to learn how to program better, and forces me to confront new problems I will probably find later anyways. We're not all jaded profesionals, you know.
P.D.: my avatar is Schrodinger's cat, awaiting its fate
Admin
I've used loops not unlike this where precision wasn't particularly important. But //any// while() loop is potentially dangerous, which is why I (try to) always put bounds on the while loop.
EG:
$i=0;
while (somecondition)
{
if ($i++>1000)
return error("Not found error seeking next user id");
// do whatever you wanted to do
}
Even in cases like this WTF, where there are more than 1,000 possible options, having a system that would perform over 1,000 select queries and STILL not have a right answer is a mis-estimation of what needs to be accomplished - it's still a proper error.
Admin
But it's easy to "randomize" the customer ID. Try using MD5 with a common "salt" phrase appended to the UID, for smaller databases, for instance.
Admin
The problem with GUIDs is that they are not really unique, just very, very, very, very, very, very, very, very, very, very, very, very, very, very likely to be unique.
Admin
I can't blame Jeff. In the middle of the night I'd probably just improve the hack as well and properly fix it in the morning.
The operations department isn't responsible for technology/engineering.
Adding a nice // FIXME would be responsible though, while making sure procedures are in place to spend time on removing FIXMEs.
Admin
That'll work if it is ok to have customer IDs that are fairly large numbers: d41d8cd98f00b204e9800998ecf8427e. It isn't ok e.g. if you want to be able to ask over the phone: "what is your customer ID please". Or it won't work for, say, bank account numbers which have smallish length limitations (been there, done that).
Using MD5 like that essentially amounts to using it as a random number generator with the salt+sequence as the seed for the RNG. Also you would want more randomness than just a static salt because anyone who has or ever has had access to the salt can easily reverse engineer the checksum. It only takes one ex-sysadmin who used to have root access to the file containing the salt and who has turned bad...
Admin
*sigh*
As already stated here a number of times, @@identity has been deprecated since SQL Server 2000 came out. Use scope_identity() instead, unless you like adding a sense of mystery to your apps.
Admin
It is likely a US Government Agency
Admin
If you are working with something like postgres you can run a query:
select nextval('sequence_name');
And this will return a value from the sequence that you can then use, and also increment the sequence so that that value will not be given out again. more details...
Admin
Hugo unfortunately didn't quote the entire post -- just the second paragraph. Go back and check it out.
Admin
Again, someone completely missing the point. The problem is NOT that the new ID's need to be random. the problem is the try-and-then-try-again "algorithm" used to generate them!
Admin
If the IDs are *really* random, there is hardly way to avoid the try-and-then-try-again method; but of course the range has to be big enough to make collissions rather unlikely and the number of trials should be limited, in case something goes wrong (it's much easier to find the code that threw the "cantFindNewIdException" than to find the infinite loop).
On the other hand, if pseudo-random is good enough, it's not difficult to completely avoid collissions.
Admin
hahaha i love this perl!
Admin
"Select @@IDENTITY" right after a row insert returns the last generated value in the identity column.
Admin
So how would you generate random IDs without checking if you have generated the same one already, then looping and retrying if there is a collision? Please do explain some more how I'm missing the point.
Admin
I am no expert on hashing, but I am pretty sure the are 1-1 hash functions out there, that given a range of numbers, will map one to one with the same set, but with seemingly random patterns.
if the range is 0-99
f(1) = 53
f(2) = 12
f(3) = 99
f(x) = z
where it is never the case that f(x)==f(y) if x != y
so, now just use a sequence in your database to come up with a number, and use the hash function to create the primary key. Obviously you need to plan ahead and make sure your range never needs to increase, unless there truly is a magic hash function that can increase the range without collisions with the smaller range.
Eck, I babble.
Admin
Assuming, of course, that the original developer was bright enough for the solution to "feel stupid"....
Admin
Damn Perl. While it is certainly true that a WTF can be written in any language, Perl sure makes it a lot easier than some. Not having read the above scoping FAQ, I got tripped up by the difference between "my $x, $y" (which seemed a logical extension of "my $x") and "my ($x, $y)". Not that you get an error message or anything if you use the wrong syntax; most of your variables just silently fail to get locally scoped. WTF?
Admin
(...talking to myself...)
There is a variant of the generate-select-loop algorithm: generate-insert-loop. Generate a random ID, then try to insert the actual business object into the database. Loop and retry if you get a unique index violation. Extra bonus: automatically deals with clustering. Disadvantage: you'll know the ID number only after a successful insert, bad if you happen to need it before.
I've actually done that. It's a bit scary though: error codes are database dependent, and if you have other unique indexes besides the ID it requires even more hackery to figure out which index broke. But I had to stuff the maximum number of transactions to a slightly underpowered database, and that bought me one insert instead of select+insert. The application was an online stock IPO subscription system, and I needed transaction reference numbers without disclosing the number of subscribers so far (= popularity of the IPO).
A further variant would be to have a dedicated table with just the ID numbers and a unique index; insert there to check for duplicates and to "reserve" an ID.
Admin
My bad. I only read Hugo's post which quoted the second paragraph and it looked like the original question was asking about returning the identity (perfectly legit) so it could be used. Then Hugo's post appeared like it was making fun of the the original question for asking about using an identity and generating it beforehand. I was trying to point out that the original question did not ask for an identity before hand. But I can see now that the original question in fact did ask for an identity beforehand.
So yes, Hugo's post is both funny and addresses the original question adequately. So to sum it all up:
1. you can't generate an identity beforehand
2. if you want something beforehand use a guid
3. if you need an identity, return the result of scope_identity
4. please do not use @@identity or else baby Jesus will cry.
Do we have a count on the number of people giving out that bad advice of @@identity? duplicate WTF's within the WTF. I'm beginning to see why religious zealots get so touchy over heresies. They just keep coming up over, and over, and over...
Admin
Two things:
1. Is it possible to automatically reject all posts with the text "Select @@IDENTITY," because that is the most ubiquitous wrong answer to any question on the internet.
2. I wonder if the original system referenced by this WTF used the same random seed. I would find it personally hillarious if each iteration tested THE SAME sequence of pseudo-random numbers until finding the next available one. Nested WTFs are the best.
Admin
Yeah, you can do that. One problem is that once someone (such as the vendor who wrote the code) knows the algorithm he can easily reverse engineer it. Security by obscurity. Probably can't change the algorithm either because a new algorithm would generate clashes with the old IDs already in the database. It would somehow have to be an algorithm that is somehow seeded with a seed that nobody can find out later. Is this starting to sound like a nonexistengorithm...?
If you can trust your software vendors and your IT staff (past as well as current) then that'll work. I come from banking background where nobody including me is to be trusted. I quit my job at the bank and I have suspicion my ex-boss is happy I didn't walk out with the keys to the vault in my head...
Admin
Your code is wrong, but because of other problems. See below (long post about randomness of random())
Is this is a frigging recursive function? I'm sorry, but, you see, if your process ever gets stuck searching for free keys, not only you will enter an infinite "loop", but you will run out of heap because of so many recursions :)
I suggest, first of all, rewriting it as iterative, so that simple mortals have a chance to understand it, instead of staying a few minutes in front of the computer scratching their heads in disbelief. I know that is an inferior approach, because then you can't make certain hacks to the code, but....
The problem here is that rand() is not totally random on a normal computer, so, if you hit the same seed as another time, you will get a loooooong series of "random" numbers equal to the last time you hit that same seed. So you may wind up calling mysql_query lots of times, because rand() already gave you all those numbers in a former run, and they are still in the database. Even if you don't get the same seed, you may encounter repetitions and series of numbers similar to series you already got.
(maybe Perl automatically seeds using the system time, or maybe to need to seed yourself, I don't know)
You can improve randomness with some tricks. I don't know Perl, so I'm using a made up function. "shift(number,how_many)" will shift to the left the parameter "chain" by "how_many" number of bits. Overflow bits reappear to the right (I forgot the exact name for this). You can also uses mask, sums and multiplies. "tail_to_size(string,size)" is supossed to reduce the string to a "size" length. We want the lowest bytes here, because they are sligthly more random.
The reason for this is that "rand" may be returning always the exact series of ":random" numbers when it gets the same seed, so you always need to either: a) give "rand" an arbitrary seed set by exterior events (system time, for example) or b) operate the random number with an arbitrary number. Most users will have different IPs, and you get for free 255 different last_bytes_of_IP ones :)
If you generate many, many, many random numbers, and apply them somewhere, you will start seeing enormous biases depending on how you seed. Here you are using a 16 digit number, so biases are less visible.
Rant: in a small car videogame for university, I once had to try getting a random number between 2 and -2 five times a second. This number decided what side the road moved as the screen scrolled. When I hit the screen border, I only accepted part of the range (if I hit the right border, I ignore values over zero, because I would fall out of the screen).
I kept getting a bias for values over zero, so the road kept hitting the right border, going back to the middle of the screen, then moving between the middle and the right border, occasionally making small incursions towards the left border, sometimes even reaching it. I also got different bias depending on what computer I ran it on!
At the end, I squashed the bias by combining two different random numbers with different seeds, XORing them, shifting by a *third* random number that had already been un-biased, then using the most random bits. In most computers, all biases disappeared or became very small.
In my case, it was a very visible bias. In your case, you may be getting a bias, but it is not visible to you, or you may have avoided it altogether by using 16 digits numbers :) In my case, I had a very small output set, so any bias had enormous effects.
Admin
In defense of the WTF, while it isn't thread safe, it is at least an order of magnatude less likely to fail.
With the MAX, it has a chance to fail each time two threads are looking for an ID at the same time. At least with the WTF, there is only a 1 in 1000000 chance that the two threads looking for an ID will generate the same random value. (Of course as the table fills up more and more threads will be looking for values at the same time)
Admin
This is easier to understand if you remember that the originators of perl were interested in fun, creativity, making puns and other low class jokes, feeling good about yourself and creating an ambiguous, redundant, language, unsuited to production, not a language meant for development of reliable, maintainable code.
Admin
looks different between IE/Firefox
Admin
use warning;
and perl will give you a warning, isn't that easy?
Admin
Ah, but Sybase AutoIncrement has issues, or at least it has in the past, when the system is halted it can start back with an incorrect seed and will create duplicate identities, which can cause a great WTF!
Admin
One cheap way for a unique pk is use yyyymmddHHiiss. It'll be the same length. Plus you know when the record was added. Also it's sortable.
And If you can hit the submit button at the same exact second someone else does...I'll kiss your butt.
Maybe this is a wtf, but I've got a system running 10 years now with no problems. [{][B]
--------------------
Admin
IE doesn't support position fixed;
Admin
Don't forget, we are talking about the generation of "random" numbers. I udnerstand your point that by using non-sequential numbers -- you can obscure the # of users you have, make it harder to guess other "IDs", or the # of transactions in a system. But knowing that secret pseudo-random number algorithm is *slightly* different than knowing the "back door that will access the bank vault" algorithm.
Even if no developers can ever be trusted with this secret algorithm (i.e., your manager also believes that UFO's are watching him and so on) you can always lock away the key value that the hash algorithm uses -- kind of like an encryption key.
Admin
Probably because either a) not many people are using your system concurrently or b) you code is ignoring constraint violations in your SQL statements.
Admin
create table Seed (id int identity, when DateTime)
declare @seed int, @random int
DO_IT:
insert into Seed values (getdate())
select @seed = @@identity
/* check just in case...*/
if(not exists(select id from Seed where id = @seed))
goto DO_IT
set @random = fGenRandom(@seed)
Done and done.
Wasn't that easy?
Admin
Incorrecto. ¡Cuarto!
:-P
Admin
I know it was not part of the original issue, but a lot of these key generation schemes will really bite you in the arse if you ever need replication.
Admin
Read his question AGAIN. You're advise is good, but it's not addressing his problem. He needs a way to assign a PK outside the database before it's written. I already outlined 2 basic ways to approach the problem.
You will run into this design problem whenever you have to write offline applications (i.e. an order taking application that sits on a laptop and doesn't upload back to the database until much later). You can make the unique number a reference number but NOT the PK of the real record (which many do), or you can implement a smart algorithm at the client that ensures uniqueness (the constraint should be heavily verified at the database side anyway to prevent duplicate inserts).
Admin
I did. Ther person he quoted acknowledged that the way he was doing it was a WTF and then asked how to get the ID after it was inserted, not before like he had been doing.
Admin
A relatively easy way to do that is "(c^n) mod p" (c and p are well-choosen constants, p should be a prime, n the number of the key to generate. For example, c=75 and p=65537 will generate all numbers in the range of 1..65536. (Of course this range is too small for practial purposes, take it as an illustrative example). To make it harder to guess n, just add a secret constant ...
key=((c^n)+s) mod p
Obviously there is no way to create such a magic hash function in a non-obvious way.
(Obvious way: range 1..x maps to 1..x, x+1..y maps to x+1, y etc.)