The Daily WTF: Curious Perversions in Information Technology

emurphy · 2006-03-02 Reply Admin

craig:

Except that leaves open the possibility of two simultaneous calls getting the same max value and the first insert to be submitted to the DB will work, while the other will fail.

Were you responding to me or someone else? Again, a correctly-implemented try-write inside a loop will work; the first insert works on the first try, the second insert fails on the first try and (after another increment) works on the second.

Of course, if auto-increment is built into the DB, then you should use that instead. And your web app should check that you have permission to get the data you request, to block adjacent-ID snooping.

mugs · 2006-03-03 Reply Admin

Jeff S:
OneFactor:
Jeff S:
Hugo:
Anonymous:

I know MSSQL has an auto-increment option, but how would I implement a non-WTF solution that returns the ID so I can use it?

If:

a) the auto-increment MUST be an identity, and

b) you MUST know the new value before actually inserting,

then:

1) Restart reading from the beginning of the specs. You have already read past the WTF, you just didn't realise yet.

2) If you still want this, use IDENT_CURRENT('tablename') and add 1. Make sure to use a transaction and force a table lock before calling the function. Don't forget to send your code to Alex when you're done, then we can all have a laugh at it.

Best, Hugo

Hugo -- that is a GREAT post! Could not agree more, and very well put.

Thank you, that made my day. :)

the original question did not demand knowing the identity before inserting. It required returning the inserted identity so it could be used. scope_identity() works just fine for that. The original question was perfectly legit and has a workable solution. As of Sql Server 2000. The real WTF is that it took so long for SqlServer to provide this solution.

uh ... that's great, thanks for telling us ... not sure what any of that has to do with what you've quoted, but thanks for clearing that up.

The question is, what did Hugo's post have to do with what he quoted? He gave a valid and humorous response - but to a question that wasn't asked. That, I believe, was OneFactor's point.

Vector · 2006-03-03 Reply Admin

ergh. Can somebody tell me if I've been naughty? I'm using a random seed generator for my sessions (in the name of security, rather than setting cookies with the username set in it or the like I have a server-side session ID and various other things.)

function fSeed() {

$s = 0;

for($x = 0; $x < 16; $x++) {
$s .= rand(0, 9);
}

$q = mysql_query("SELECT * FROM `sessions` WHERE `SID` = '{$s}'") or die(mysql_error());
if (mysql_num_rows($q) == 1) {
return fSeed();
} else {
return $s;
}
}

God, this forum software is awful. I could write something better than this >_>

Vector · 2006-03-03 Reply Admin

Whoops, HTML typo. this...

for($x = 0; $x < 16; $x++) {
$s .= rand(0, 9);
}

Should read as...

for($x = 0; $x < 16; $x++) {
$s .= rand(0, 9);
}

rashin frashin brillant forum...

If it doesn't work this time, I give up...

2006-03-03 Reply Admin

AUTO_INCREMENT? And where is that in the sybase documentation?

2006-03-03 Reply Admin

Anonymous:
This is what I don't get. What's the benefit of being random?

Random ID are occasionally required. E.g. a company want random order ID numbers so that stock analysts can't guess at their sales figures by looking at the IDs. Or a cell phone manufacturer might want to make their phone serial numbers random for the same reason.

This WTF isn't as WTF as might appear at first. Make the random selection simpler (minor improvement as the SQL query dominates timings anyway), make it transaction safe, and make the sequence space larger than six digits. If the application requires random IDs then the real WTF is if it was "fixed" by using a database sequence...

devdas · 2006-03-03 Reply Admin

JohnO:
wintermyute:
Waiting for someone to say that the Real WTF is that the system was written in Perl...

I was wondering what that was. What the hell is My? Local variable declaration?

This is the standard FAQ: http://perl.plover.com/FAQs/Namespaces.html

giannis · 2006-03-03 Reply Admin

Anonymous:
Anonymous:
You could try to get the highest ID (with MAX()) and simply and 1 to that number and use it as your ID.

Really bad if you delete some rows. MAX ID will be 10 and auto_increment will be 15.

what auto_increment?

2006-03-03 Reply Admin

NOOOO!!!!!

If I had a pound for every time I've had to fix code that uses the MAX() + 1 abomination, I'd have err...about £10

Enric Naval · 2006-03-03 Reply Admin

Maybe I'm going to say something silly, but in Java you can ask your Statement for all keys it has generated via updates:


    //open connection

    //make 1 update

    preparedStatement.prepare(......);
    preparedStatement.executeUpdate(......);

    //get 1 key

    final int key;
    final ResultSet rs = preparedStatement.getGeneratedKeys();
    if ( rs.next() ) {
      key = rs.getInt("GENERATED_KEY");
    } else {
      throw new java.sql.SQLException("No hay ninguna llave generada!");
    }
    return key;

    //close connection

I'm not sure if you can do it too for multiple updates

    //open connection

    //make several updates

    preparedStatement.prepare(......);
    preparedStatement.executeUpdate(......);
    preparedStatement.prepare(......);
    preparedStatement.executeUpdate(......);
    preparedStatement.prepare(......);
    preparedStatement.executeUpdate(......);

    //get several keys

    final Vector<Integer> key; 
    final ResultSet rs = preparedStatement.getGeneratedKeys();
    while ( rs.next() ) {
      key.add( rs.getInt("GENERATED_KEY") );
    }
    return key;

2006-03-03 Reply Admin

I wouldn't even start to think about rekeying. Instead, I'd start with negative identity keys and go downwards from there. The 'alternate ID domain' technique works in many interesting scenarios...

Best Regards,

Gene Pool

2006-03-03 Reply Admin

Anonymous:

Ok, I have an honest question.

I use a method like this because I need to know the ID number before I write something to the database (like, to create a link to the new object in several tables). So I generate a random ID, check to make sure it isn't already in use, and return it.

I know MSSQL has an auto-increment option, but how would I implement a non-WTF solution that returns the ID so I can use it?

..insert stuff here..

declare @ID int

set @ID = @@indentity

first WTF-post ever. Hope this works!

Enric Naval · 2006-03-03 Reply Admin

Jeff S:
Hey did anyone mention yet that you can use SCOPE_IDENTITY() ?

It scares me how lately many people seem to think these WTF's are little programming challenges and feel the need to present their "solutions".

That's me! Can I say hello? Hello, mom! I'm on WTF!

Jokes aside, I have learned lots of things with the WTFs posted here, ehem, I mean, with the posts commenting the WTFs, of course :)

For a first, the discussion about how adding arbitrary columns in a database using a XML-filled TEXT column, that finally brought me to use ALTER TABLE ADD/DROP COLUMN in a system I'm building.

For a second, I learnt about left joins on this incredible misuse of joins, and posterior discussions on how "right join" is so much better and clearer that "left join" , you just can't make these things up :) .

And, of course, rewriting the WTF in a sensible manner serves me to learn how to program better, and forces me to confront new problems I will probably find later anyways. We're not all jaded profesionals, you know.

P.D.: my avatar is Schrodinger's cat, awaiting its fate

mcrbids · 2006-03-03 Reply Admin

I've used loops not unlike this where precision wasn't particularly important. But //any// while() loop is potentially dangerous, which is why I (try to) always put bounds on the while loop.

EG:

$i=0;
while (somecondition)
{
if ($i++>1000)
return error("Not found error seeking next user id");
// do whatever you wanted to do
}

Even in cases like this WTF, where there are more than 1,000 possible options, having a system that would perform over 1,000 select queries and STILL not have a right answer is a mis-estimation of what needs to be accomplished - it's still a proper error.

mcrbids · 2006-03-03 Reply Admin

Anonymous:
Anonymous:
This is what I don't get. What's the benefit of being random?

Random ID are occasionally required. E.g. a company want random order ID numbers so that stock analysts can't guess at their sales figures by looking at the IDs. Or a cell phone manufacturer might want to make their phone serial numbers random for the same reason.

But it's easy to "randomize" the customer ID. Try using MD5 with a common "salt" phrase appended to the UID, for smaller databases, for instance.

Anonymouse · 2006-03-03 Reply Admin

JohnO:
Use UUID/GUID.

The problem with GUIDs is that they are not really unique, just very, very, very, very, very, very, very, very, very, very, very, very, very, very likely to be unique.

2006-03-03 Reply Admin

I can't blame Jeff. In the middle of the night I'd probably just improve the hack as well and properly fix it in the morning.

The operations department isn't responsible for technology/engineering.

Adding a nice // FIXME would be responsible though, while making sure procedures are in place to spend time on removing FIXMEs.

2006-03-03 Reply Admin

mcrbids:
Anonymous:
Anonymous:
This is what I don't get. What's the benefit of being random?

Random ID are occasionally required. E.g. a company want random order ID numbers so that stock analysts can't guess at their sales figures by looking at the IDs. Or a cell phone manufacturer might want to make their phone serial numbers random for the same reason.

But it's easy to "randomize" the customer ID. Try using MD5 with a common "salt" phrase appended to the UID, for smaller databases, for instance.

That'll work if it is ok to have customer IDs that are fairly large numbers: d41d8cd98f00b204e9800998ecf8427e. It isn't ok e.g. if you want to be able to ask over the phone: "what is your customer ID please". Or it won't work for, say, bank account numbers which have smallish length limitations (been there, done that).

Using MD5 like that essentially amounts to using it as a random number generator with the salt+sequence as the seed for the RNG. Also you would want more randomness than just a static salt because anyone who has or ever has had access to the salt can easily reverse engineer the checksum. It only takes one ex-sysadmin who used to have root access to the file containing the salt and who has turned bad...

John Smallberries · 2006-03-03 Reply Admin

Anonymous:
Anonymous:

Ok, I have an honest question.

I use a method like this because I need to know the ID number before I write something to the database (like, to create a link to the new object in several tables). So I generate a random ID, check to make sure it isn't already in use, and return it.

I know MSSQL has an auto-increment option, but how would I implement a non-WTF solution that returns the ID so I can use it?

..insert stuff here..

declare @ID int

set @ID = @@indentity

first WTF-post ever. Hope this works!

*sigh*
As already stated here a number of times, @@identity has been deprecated since SQL Server 2000 came out. Use scope_identity() instead, unless you like adding a sense of mystery to your apps.

ParkinT · 2006-03-03 Reply Admin

Anonymous:

I feel really sorry for the users of this system.

It is likely a US Government Agency

2006-03-03 Reply Admin

If you are working with something like postgres you can run a query:

select nextval('sequence_name');

And this will return a value from the sequence that you can then use, and also increment the sequence so that that value will not be given out again. more details...

Jeff S · 2006-03-03 Reply Admin

mugs:
Jeff S:
OneFactor:
Jeff S:
Hugo:
Anonymous:

I know MSSQL has an auto-increment option, but how would I implement a non-WTF solution that returns the ID so I can use it?

If:

a) the auto-increment MUST be an identity, and

b) you MUST know the new value before actually inserting,

then:

1) Restart reading from the beginning of the specs. You have already read past the WTF, you just didn't realise yet.

2) If you still want this, use IDENT_CURRENT('tablename') and add 1. Make sure to use a transaction and force a table lock before calling the function. Don't forget to send your code to Alex when you're done, then we can all have a laugh at it.

Best, Hugo

Hugo -- that is a GREAT post! Could not agree more, and very well put.

Thank you, that made my day. :)

the original question did not demand knowing the identity before inserting. It required returning the inserted identity so it could be used. scope_identity() works just fine for that. The original question was perfectly legit and has a workable solution. As of Sql Server 2000. The real WTF is that it took so long for SqlServer to provide this solution.

uh ... that's great, thanks for telling us ... not sure what any of that has to do with what you've quoted, but thanks for clearing that up.

The question is, what did Hugo's post have to do with what he quoted? He gave a valid and humorous response - but to a question that wasn't asked. That, I believe, was OneFactor's point.

Hugo unfortunately didn't quote the entire post -- just the second paragraph. Go back and check it out.

Jeff S · 2006-03-03 Reply Admin

Anonymous:
Anonymous:
This is what I don't get. What's the benefit of being random?

Random ID are occasionally required. E.g. a company want random order ID numbers so that stock analysts can't guess at their sales figures by looking at the IDs. Or a cell phone manufacturer might want to make their phone serial numbers random for the same reason.

This WTF isn't as WTF as might appear at first. Make the random selection simpler (minor improvement as the SQL query dominates timings anyway), make it transaction safe, and make the sequence space larger than six digits. If the application requires random IDs then the real WTF is if it was "fixed" by using a database sequence...

Again, someone completely missing the point. The problem is NOT that the new ID's need to be random. the problem is the try-and-then-try-again "algorithm" used to generate them!

ammoQ · 2006-03-03 Reply Admin

Jeff S:
Anonymous:
Anonymous:
This is what I don't get. What's the benefit of being random?

Random ID are occasionally required. E.g. a company want random order ID numbers so that stock analysts can't guess at their sales figures by looking at the IDs. Or a cell phone manufacturer might want to make their phone serial numbers random for the same reason.

This WTF isn't as WTF as might appear at first. Make the random selection simpler (minor improvement as the SQL query dominates timings anyway), make it transaction safe, and make the sequence space larger than six digits. If the application requires random IDs then the real WTF is if it was "fixed" by using a database sequence...

Again, someone completely missing the point. The problem is NOT that the new ID's need to be random. the problem is the try-and-then-try-again "algorithm" used to generate them!

If the IDs are *really* random, there is hardly way to avoid the try-and-then-try-again method; but of course the range has to be big enough to make collissions rather unlikely and the number of trials should be limited, in case something goes wrong (it's much easier to find the code that threw the "cantFindNewIdException" than to find the infinite loop).
On the other hand, if pseudo-random is good enough, it's not difficult to completely avoid collissions.

2006-03-03 Reply Admin

hahaha i love this perl!

2006-03-03 Reply Admin

JohnO:
Anonymous:

Ok, I have an honest question.

I use a method like this because I need to know the ID number before I write something to the database (like, to create a link to the new object in several tables). So I generate a random ID, check to make sure it isn't already in use, and return it.

I know MSSQL has an auto-increment option, but how would I implement a non-WTF solution that returns the ID so I can use it?

Use UUID/GUID.

"Select @@IDENTITY" right after a row insert returns the last generated value in the identity column.

2006-03-03 Reply Admin

Jeff S:
Anonymous:
Anonymous:
This is what I don't get. What's the benefit of being random?

Random ID are occasionally required. E.g. a company want random order ID numbers so that stock analysts can't guess at their sales figures by looking at the IDs. Or a cell phone manufacturer might want to make their phone serial numbers random for the same reason.

This WTF isn't as WTF as might appear at first. Make the random selection simpler (minor improvement as the SQL query dominates timings anyway), make it transaction safe, and make the sequence space larger than six digits. If the application requires random IDs then the real WTF is if it was "fixed" by using a database sequence...

Again, someone completely missing the point. The problem is NOT that the new ID's need to be random. the problem is the try-and-then-try-again "algorithm" used to generate them!

So how would you generate random IDs without checking if you have generated the same one already, then looping and retrying if there is a collision? Please do explain some more how I'm missing the point.

2006-03-03 Reply Admin

Anonymous:
Jeff S:
Anonymous:
Anonymous:
This is what I don't get. What's the benefit of being random?

Random ID are occasionally required. E.g. a company want random order ID numbers so that stock analysts can't guess at their sales figures by looking at the IDs. Or a cell phone manufacturer might want to make their phone serial numbers random for the same reason.

This WTF isn't as WTF as might appear at first. Make the random selection simpler (minor improvement as the SQL query dominates timings anyway), make it transaction safe, and make the sequence space larger than six digits. If the application requires random IDs then the real WTF is if it was "fixed" by using a database sequence...

Again, someone completely missing the point. The problem is NOT that the new ID's need to be random. the problem is the try-and-then-try-again "algorithm" used to generate them!

So how would you generate random IDs without checking if you have generated the same one already, then looping and retrying if there is a collision? Please do explain some more how I'm missing the point.

I am no expert on hashing, but I am pretty sure the are 1-1 hash functions out there, that given a range of numbers, will map one to one with the same set, but with seemingly random patterns.

if the range is 0-99

f(1) = 53
f(2) = 12
f(3) = 99
f(x) = z

where it is never the case that f(x)==f(y) if x != y

so, now just use a sequence in your database to come up with a number, and use the hash function to create the primary key. Obviously you need to plan ahead and make sure your range never needs to increase, unless there truly is a magic hash function that can increase the range without collisions with the smaller range.

Eck, I babble.

2006-03-03 Reply Admin

Assuming, of course, that the original developer was bright enough for the solution to "feel stupid"....

stevekj · 2006-03-03 Reply Admin

devdas:
JohnO:
wintermyute:
Waiting for someone to say that the Real WTF is that the system was written in Perl...

I was wondering what that was. What the hell is My? Local variable declaration?

This is the standard FAQ: http://perl.plover.com/FAQs/Namespaces.html

Damn Perl. While it is certainly true that a WTF can be written in any language, Perl sure makes it a lot easier than some. Not having read the above scoping FAQ, I got tripped up by the difference between "my $x, $y" (which seemed a logical extension of "my $x") and "my ($x, $y)". Not that you get an error message or anything if you use the wrong syntax; most of your variables just silently fail to get locally scoped. WTF?

2006-03-03 Reply Admin

Anonymous:

So how would you generate random IDs without checking if you have generated the same one already, then looping and retrying if there is a collision? Please do explain some more how I'm missing the point.

(...talking to myself...)

There is a variant of the generate-select-loop algorithm: generate-insert-loop. Generate a random ID, then try to insert the actual business object into the database. Loop and retry if you get a unique index violation. Extra bonus: automatically deals with clustering. Disadvantage: you'll know the ID number only after a successful insert, bad if you happen to need it before.

I've actually done that. It's a bit scary though: error codes are database dependent, and if you have other unique indexes besides the ID it requires even more hackery to figure out which index broke. But I had to stuff the maximum number of transactions to a slightly underpowered database, and that bought me one insert instead of select+insert. The application was an online stock IPO subscription system, and I needed transaction reference numbers without disclosing the number of subscribers so far (= popularity of the IPO).

A further variant would be to have a dedicated table with just the ID numbers and a unique index; insert there to check for duplicates and to "reserve" an ID.

OneFactor · 2006-03-03 Reply Admin

Jeff S:
mugs:
Jeff S:
OneFactor:
Jeff S:
Hugo:
Anonymous:

I know MSSQL has an auto-increment option, but how would I implement a non-WTF solution that returns the ID so I can use it?

If:

a) the auto-increment MUST be an identity, and

b) you MUST know the new value before actually inserting,

then:

1) Restart reading from the beginning of the specs. You have already read past the WTF, you just didn't realise yet.

2) If you still want this, use IDENT_CURRENT('tablename') and add 1. Make sure to use a transaction and force a table lock before calling the function. Don't forget to send your code to Alex when you're done, then we can all have a laugh at it.

Best, Hugo

Hugo -- that is a GREAT post! Could not agree more, and very well put.

Thank you, that made my day. :)

the original question did not demand knowing the identity before inserting. It required returning the inserted identity so it could be used. scope_identity() works just fine for that. The original question was perfectly legit and has a workable solution. As of Sql Server 2000. The real WTF is that it took so long for SqlServer to provide this solution.

uh ... that's great, thanks for telling us ... not sure what any of that has to do with what you've quoted, but thanks for clearing that up.

The question is, what did Hugo's post have to do with what he quoted? He gave a valid and humorous response - but to a question that wasn't asked. That, I believe, was OneFactor's point.

Hugo unfortunately didn't quote the entire post -- just the second paragraph. Go back and check it out.

My bad. I only read Hugo's post which quoted the second paragraph and it looked like the original question was asking about returning the identity (perfectly legit) so it could be used. Then Hugo's post appeared like it was making fun of the the original question for asking about using an identity and generating it beforehand. I was trying to point out that the original question did not ask for an identity before hand. But I can see now that the original question in fact did ask for an identity beforehand.

So yes, Hugo's post is both funny and addresses the original question adequately. So to sum it all up:
1. you can't generate an identity beforehand
2. if you want something beforehand use a guid
3. if you need an identity, return the result of scope_identity
4. please do not use @@identity or else baby Jesus will cry.

Do we have a count on the number of people giving out that bad advice of @@identity? duplicate WTF's within the WTF. I'm beginning to see why religious zealots get so touchy over heresies. They just keep coming up over, and over, and over...

maldrich · 2006-03-03 Reply Admin

Two things:

1. Is it possible to automatically reject all posts with the text "Select @@IDENTITY," because that is the most ubiquitous wrong answer to any question on the internet.

2. I wonder if the original system referenced by this WTF used the same random seed. I would find it personally hillarious if each iteration tested THE SAME sequence of pseudo-random numbers until finding the next available one. Nested WTFs are the best.

2006-03-03 Reply Admin

Anonymous:

I am no expert on hashing, but I am pretty sure the are 1-1 hash functions out there, that given a range of numbers, will map one to one with the same set, but with seemingly random patterns.

Yeah, you can do that. One problem is that once someone (such as the vendor who wrote the code) knows the algorithm he can easily reverse engineer it. Security by obscurity. Probably can't change the algorithm either because a new algorithm would generate clashes with the old IDs already in the database. It would somehow have to be an algorithm that is somehow seeded with a seed that nobody can find out later. Is this starting to sound like a nonexistengorithm...?

If you can trust your software vendors and your IT staff (past as well as current) then that'll work. I come from banking background where nobody including me is to be trusted. I quit my job at the bank and I have suspicion my ex-boss is happy I didn't walk out with the keys to the vault in my head...

Enric Naval · 2006-03-03 Reply Admin

Vector:
ergh. Can somebody tell me if I've been naughty? I'm using a random seed generator for my sessions (in the name of security, rather than setting cookies with the username set in it or the like I have a server-side session ID and various other things.)
function fSeed() {

$s = 0;

for($x = 0; $x < 16; $x++) {
$s .= rand(0, 9);
}

$q = mysql_query("SELECT * FROM `sessions` WHERE `SID` = '{$s}'") or die(mysql_error());
if (mysql_num_rows($q) == 1) {
return fSeed();
} else {
return $s;
}
}

God, this forum software is awful. I could write something better than this >_>

Your code is wrong, but because of other problems. See below (long post about randomness of random())


the original function

function fSeed() 
{
  $s = 0;
  for($x = 0; $x < 16; $x++) {
    $s .= rand(0, 9);
  }

  $q = mysql_query("SELECT * FROM `sessions` WHERE `SID` = '{$s}'") or die(mysql_error());

  if (mysql_num_rows($q) == 1) {
    return fSeed();
  } else {
    return $s;
  }
}

Is this is a frigging recursive function? I'm sorry, but, you see, if your process ever gets stuck searching for free keys, not only you will enter an infinite "loop", but you will run out of heap because of so many recursions :)

I suggest, first of all, rewriting it as iterative, so that simple mortals have a chance to understand it, instead of staying a few minutes in front of the computer scratching their heads in disbelief. I know that is an inferior approach, because then you can't make certain hacks to the code, but....


function fSeed() 
{


  do {

    $s = 0;
    for($x = 0; $x < 16; $x++) {
      $s .= rand(0, 9);
    }

    $q = mysql_query("SELECT * FROM `sessions` WHERE `SID` = '{$s}'") or die(mysql_error());


  } while (mysql_num_rows($q) > 0) {

  return $s;
}

The problem here is that rand() is not totally random on a normal computer, so, if you hit the same seed as another time, you will get a loooooong series of "random" numbers equal to the last time you hit that same seed. So you may wind up calling mysql_query lots of times, because rand() already gave you all those numbers in a former run, and they are still in the database. Even if you don't get the same seed, you may encounter repetitions and series of numbers similar to series you already got.

(maybe Perl automatically seeds using the system time, or maybe to need to seed yourself, I don't know)

You can improve randomness with some tricks. I don't know Perl, so I'm using a made up function. "shift(number,how_many)" will shift to the left the parameter "chain" by "how_many" number of bits. Overflow bits reappear to the right (I forgot the exact name for this). You can also uses mask, sums and multiplies. "tail_to_size(string,size)" is supossed to reduce the string to a "size" length. We want the lowest bytes here, because they are sligthly more random.

The reason for this is that "rand" may be returning always the exact series of ":random" numbers when it gets the same seed, so you always need to either: a) give "rand" an arbitrary seed set by exterior events (system time, for example) or b) operate the random number with an arbitrary number. Most users will have different IPs, and you get for free 255 different last_bytes_of_IP ones :)


This function here is a bit exaggerated, but I hope it gets the point accross. It has reduced a lot the possibility of hitting even twice the same SID because of problems with rand():


function fSeed($last_byte_of_IP) 
{

  //we should also seed random() with system time before calling rand

  do {

    $s = 0;
    for($x = 0; $x < 16; $x++) {
      $s .= rand(0, 9);
    }
    $s = shift($s,$last_byte_of_IP);
    $s = $s * $last_byte_of_IP;
    $s = tail_to_size(s,16);


    $q = mysql_query("SELECT * FROM `sessions` WHERE `SID` = '{$s}'") or die(mysql_error());


  } while (mysql_num_rows($q) > 0) {

  return $s;
}

If you generate many, many, many random numbers, and apply them somewhere, you will start seeing enormous biases depending on how you seed. Here you are using a 16 digit number, so biases are less visible.

Rant: in a small car videogame for university, I once had to try getting a random number between 2 and -2 five times a second. This number decided what side the road moved as the screen scrolled. When I hit the screen border, I only accepted part of the range (if I hit the right border, I ignore values over zero, because I would fall out of the screen).

I kept getting a bias for values over zero, so the road kept hitting the right border, going back to the middle of the screen, then moving between the middle and the right border, occasionally making small incursions towards the left border, sometimes even reaching it. I also got different bias depending on what computer I ran it on!

At the end, I squashed the bias by combining two different random numbers with different seeds, XORing them, shifting by a *third* random number that had already been un-biased, then using the most random bits. In most computers, all biases disappeared or became very small.

In my case, it was a very visible bias. In your case, you may be getting a bias, but it is not visible to you, or you may have avoided it altogether by using 16 digits numbers :) In my case, I had a very small output set, so any bias had enormous effects.

hank miller · 2006-03-03 Reply Admin

Anonymous:
Anonymous:
You could try to get the highest ID (with MAX()) and simply and 1 to that number and use it as your ID.

Not threadsafe. The WTF isn't either, but not a good solution

In defense of the WTF, while it isn't thread safe, it is at least an order of magnatude less likely to fail.

With the MAX, it has a chance to fail each time two threads are looking for an ID at the same time. At least with the WTF, there is only a 1 in 1000000 chance that the two threads looking for an ID will generate the same random value. (Of course as the table fills up more and more threads will be looking for values at the same time)

Cooper · 2006-03-03 Reply Admin

stevekj:
devdas:
JohnO:
wintermyute:
Waiting for someone to say that the Real WTF is that the system was written in Perl...

I was wondering what that was. What the hell is My? Local variable declaration?

This is the standard FAQ: http://perl.plover.com/FAQs/Namespaces.html

Damn Perl. While it is certainly true that a WTF can be written in any language, Perl sure makes it a lot easier than some. Not having read the above scoping FAQ, I got tripped up by the difference between "my $x, $y" (which seemed a logical extension of "my $x") and "my ($x, $y)". Not that you get an error message or anything if you use the wrong syntax; most of your variables just silently fail to get locally scoped. WTF?

This is easier to understand if you remember that the originators of perl were interested in fun, creativity, making puns and other low class jokes, feeling good about yourself and creating an ambiguous, redundant, language, unsuited to production, not a language meant for development of reliable, maintainable code.

2006-03-03 Reply Admin

Anonymous:
Looks like someone found a bug.... (found floating just under No one likes the second page. on the second page)

what only 1............ as pointed out many times here this forum software is the biggist cluster of bugs most of us have ever seen!!!!!

looks different between IE/Firefox

2006-03-03 Reply Admin

stevekj:
devdas:
JohnO:
wintermyute:
Waiting for someone to say that the Real WTF is that the system was written in Perl...

I was wondering what that was. What the hell is My? Local variable declaration?

This is the standard FAQ: http://perl.plover.com/FAQs/Namespaces.html

Damn Perl. While it is certainly true that a WTF can be written in any language, Perl sure makes it a lot easier than some. Not having read the above scoping FAQ, I got tripped up by the difference between "my $x, $y" (which seemed a logical extension of "my $x") and "my ($x, $y)". Not that you get an error message or anything if you use the wrong syntax; most of your variables just silently fail to get locally scoped. WTF?

use warning;

and perl will give you a warning, isn't that easy?

2006-03-03 Reply Admin

Ah, but Sybase AutoIncrement has issues, or at least it has in the past, when the system is halted it can start back with an incorrect seed and will create duplicate identities, which can cause a great WTF!

kidd1270 · 2006-03-03 Reply Admin

One cheap way for a unique pk is use yyyymmddHHiiss. It'll be the same length. Plus you know when the record was added. Also it's sortable.

And If you can hit the submit button at the same exact second someone else does...I'll kiss your butt.

Maybe this is a wtf, but I've got a system running 10 years now with no problems. [{][B]

--------------------

John Smallberries · 2006-03-03 Reply Admin

Anonymous:
Anonymous:
Looks like someone found a bug.... (found floating just under No one likes the second page. on the second page)

what only 1............ as pointed out many times here this forum software is the biggist cluster of bugs most of us have ever seen!!!!!

looks different between IE/Firefox

IE doesn't support position fixed;

Jeff S · 2006-03-03 Reply Admin

Anonymous:
Anonymous:

I am no expert on hashing, but I am pretty sure the are 1-1 hash functions out there, that given a range of numbers, will map one to one with the same set, but with seemingly random patterns.

Yeah, you can do that. One problem is that once someone (such as the vendor who wrote the code) knows the algorithm he can easily reverse engineer it. Security by obscurity. Probably can't change the algorithm either because a new algorithm would generate clashes with the old IDs already in the database. It would somehow have to be an algorithm that is somehow seeded with a seed that nobody can find out later. Is this starting to sound like a nonexistengorithm...?

If you can trust your software vendors and your IT staff (past as well as current) then that'll work. I come from banking background where nobody including me is to be trusted. I quit my job at the bank and I have suspicion my ex-boss is happy I didn't walk out with the keys to the vault in my head...

Don't forget, we are talking about the generation of "random" numbers. I udnerstand your point that by using non-sequential numbers -- you can obscure the # of users you have, make it harder to guess other "IDs", or the # of transactions in a system. But knowing that secret pseudo-random number algorithm is *slightly* different than knowing the "back door that will access the bank vault" algorithm.

Even if no developers can ever be trusted with this secret algorithm (i.e., your manager also believes that UFO's are watching him and so on) you can always lock away the key value that the hash algorithm uses -- kind of like an encryption key.

Jeff S · 2006-03-03 Reply Admin

kidd1270:
One cheap way for a unique pk is use yyyymmddHHiiss. It'll be the same length. Plus you know when the record was added. Also it's sortable.

And If you can hit the submit button at the same exact second someone else does...I'll kiss your butt.

Maybe this is a wtf, but I've got a system running 10 years now with no problems. [{][B]

--------------------

Probably because either a) not many people are using your system concurrently or b) you code is ignoring constraint violations in your SQL statements.

John Smallberries · 2006-03-03 Reply Admin

create table Seed (id int identity, when DateTime)

declare @seed int, @random int
DO_IT:
insert into Seed values (getdate())
select @seed = @@identity
/* check just in case...*/
if(not exists(select id from Seed where id = @seed))
goto DO_IT

set @random = fGenRandom(@seed)

Done and done.
Wasn't that easy?

kanet77 · 2006-03-03 Reply Admin

BlackTigerX:
tercero!

Incorrecto. ¡Cuarto!
:-P

Charlie Marlow · 2006-03-03 Reply Admin

I know it was not part of the original issue, but a lot of these key generation schemes will really bite you in the arse if you ever need replication.

2006-03-03 Reply Admin

maldrich:
Anonymous:

Ok, I have an honest question.

I use a method like this because I need to know the ID number before I write something to the database (like, to create a link to the new object in several tables). So I generate a random ID, check to make sure it isn't already in use, and return it.

I know MSSQL has an auto-increment option, but how would I implement a non-WTF solution that returns the ID so I can use it?

Anonymous ... beware! There's a lot of bad advice in this thread. There is exactly one optimal way to deal with Identity (sequence) in MS SQL 2000 and later:

Insert into your table
select scope_identity()

Do not use max(), do not use @@identity, and look in BOL so you don't copy some incorrect syntax from the kind folks here.

Read his question AGAIN. You're advise is good, but it's not addressing his problem. He needs a way to assign a PK outside the database before it's written. I already outlined 2 basic ways to approach the problem.

You will run into this design problem whenever you have to write offline applications (i.e. an order taking application that sits on a laptop and doesn't upload back to the database until much later). You can make the unique number a reference number but NOT the PK of the real record (which many do), or you can implement a smart algorithm at the client that ensures uniqueness (the constraint should be heavily verified at the database side anyway to prevent duplicate inserts).

mugs · 2006-03-03 Reply Admin

Jeff S:
mugs:
Jeff S:
OneFactor:
Jeff S:
Hugo:
Anonymous:

I know MSSQL has an auto-increment option, but how would I implement a non-WTF solution that returns the ID so I can use it?

If:

a) the auto-increment MUST be an identity, and

b) you MUST know the new value before actually inserting,

then:

1) Restart reading from the beginning of the specs. You have already read past the WTF, you just didn't realise yet.

2) If you still want this, use IDENT_CURRENT('tablename') and add 1. Make sure to use a transaction and force a table lock before calling the function. Don't forget to send your code to Alex when you're done, then we can all have a laugh at it.

Best, Hugo

Hugo -- that is a GREAT post! Could not agree more, and very well put.

Thank you, that made my day. :)

the original question did not demand knowing the identity before inserting. It required returning the inserted identity so it could be used. scope_identity() works just fine for that. The original question was perfectly legit and has a workable solution. As of Sql Server 2000. The real WTF is that it took so long for SqlServer to provide this solution.

uh ... that's great, thanks for telling us ... not sure what any of that has to do with what you've quoted, but thanks for clearing that up.

The question is, what did Hugo's post have to do with what he quoted? He gave a valid and humorous response - but to a question that wasn't asked. That, I believe, was OneFactor's point.

Hugo unfortunately didn't quote the entire post -- just the second paragraph. Go back and check it out.

I did. Ther person he quoted acknowledged that the way he was doing it was a WTF and then asked how to get the ID after it was inserted, not before like he had been doing.

ammoQ · 2006-03-03 Reply Admin

Anonymous:

I am no expert on hashing, but I am pretty sure the are 1-1 hash functions out there, that given a range of numbers, will map one to one with the same set, but with seemingly random patterns.

if the range is 0-99

f(1) = 53
f(2) = 12
f(3) = 99
f(x) = z

where it is never the case that f(x)==f(y) if x != y

so, now just use a sequence in your database to come up with a number, and use the hash function to create the primary key.

Eck, I babble.

A relatively easy way to do that is "(c^n) mod p" (c and p are well-choosen constants, p should be a prime, n the number of the key to generate. For example, c=75 and p=65537 will generate all numbers in the range of 1..65536. (Of course this range is too small for practial purposes, take it as an illustrative example). To make it harder to guess n, just add a secret constant ...
key=((c^n)+s) mod p

Obviously you need to plan ahead and make sure your range never needs to increase, unless there truly is a magic hash function that can increase the range without collisions with the smaller range.

Obviously there is no way to create such a magic hash function in a non-obvious way.
(Obvious way: range 1..x maps to 1..x, x+1..y maps to x+1, y etc.)

This Should Work Just Fine

Leave a comment on “This Should Work Just Fine”