The Daily WTF: Curious Perversions in Information Technology

2009-03-13 Reply Admin

Aaah, now this is why I come here. Good old-fashioned painful WTF. No random stories here, just ugly code. Well done.

WayneCollins · 2009-03-13 Reply Admin

Anonymous:
After your insert query, call this:
SELECT LAST_INSERT_ID() FROM
It will be thread safe if you're not using persistent connections.
What part of "in a single request" is so hard to understand?

Auction_God · 2009-03-13 Reply Admin

WayneCollins:
JamesQMurphy:
128 bits? You're essentially using GUIDs. A company that I worked at 10 years ago used GUIDs for their ID column -- SQL Server supports it.

This was Oracle. I might have overlooked its GUID type if it had one. What I ended up doing was randomly generating two longs in C#, then concatting their "ToString()" output. Which, come to think of it, means that there was less than 128 bits of randomness, since multiple pairs of numbers yield the same concatted string.

Oops.

The Oracle Function is: SYS_GUID()

2009-03-13 Reply Admin

WayneCollins:
Anonymous:
After your insert query, call this:
SELECT LAST_INSERT_ID() FROM
It will be thread safe if you're not using persistent connections.
What part of "in a single request" is so hard to understand?
The part where it's
insert into sometable (somefields) values (somevalues) select SCOPE_IDENTITY() as lastId
How hard is that and how is that not a single request?

Leak · 2009-03-13 Reply Admin

WayneCollins:
LHC:
@@IDENTITY returns the last inserted identity value
Just have your row identfied as an identity

Yeah, but that's still two requests.

Ummm - you do know that you can execute more than one SQL statement per query?

Using "INSERT INTO ... VALUES (...) SELECT @@IDENTITY" does the trick just fine, in just one query. Run it with ExecuteScalar (at least that's what it's called in .NET) and you'll get the ID as the result.

np: Fink - This Is The Thing (Distance And Time)

obediah · 2009-03-13 Reply Admin

WayneCollins:
Anonymous:
After your insert query, call this:
SELECT LAST_INSERT_ID() FROM
It will be thread safe if you're not using persistent connections.
What part of "in a single request" is so hard to understand?
You sidetracked a thread to ask a lazy noob question that anyone with a shred of respect for others would have went to google to piggyback on one of the thousands of other times someone was nice enough to answer this question. You don't get to be snarky.
I on the other hand, and an innocent bystander. So Anonymous, What part of "in a single request" is so hard to understand?

2009-03-13 Reply Admin

The true WTF isn't using random ids; the TWTF is picking the id from a very narrow range of values, thus having a crazy number of collisions. In fact, when you have heavy concurrency, sequential numbers can be a source of contention in indexes, since every one is trying to insert more or less at the same place in the index (contention is often more severe in indexes than in tables). There was no primary key, but there might have been (let's be optimistic) an index, possibly unique (optimism will kill me) on the column. But it's likely that this is a case where they was no concurrency at all ... It's very typical of ideas that are not totally insane by themselves, which in some cases are quite suitable, but were poorly understood and consequently misused.

Auction_God · 2009-03-13 Reply Admin

I worked with someone I originally thought was a talented developer. He did almost the exact same thing. Upon insert, he would randomly generate a number and try the insert. If it failed he would change the number (there was at least a unique index on the column!).

He didn't want to use the "new fangled" identity column that was just made available (in SQL Server 6.0). Much worse than this WTF, was the fact that this was a distributed database (used by the USPS no less). So the key he generated was always prepended by another number indicating the "location". That way when the data was brought together (with our custom replication solution - SQL Server didn't have that yet) - the data would not collide. There was no opportunity for expanding the range of numbers because whatever was done could collide with another location's initial numbers. He didn't want to use a CHAR or VARCHAR for the primary key. He had read an article from MS that they were slower than a numeric column. (If he had relented on this, at least locations could have been text either pre- or post- pended to the ID).

I still have palm marks on my forehead from doing the code review.

2009-03-13 Reply Admin

That is not an exponential-time function. As the table fills up, execution time grows asymptotically towards infinity.

Probably faster, if the random number generator is not strong.

Bobbo · 2009-03-13 Reply Admin

SomeCoder:
Sorry to be pedantic but "eachother" is NOT a word. It's a big pet peeve of mine since way back in school, I had a teacher attempt to teach us that "eachother" was correct. It's two words people, "EACH" and "OTHER".

WTF? That's the first time I've encountered that - thanks a lot for giving me a new pet peeve to watch out for! Still doesn't top "could of", "prolly" and "PIN number" though. (Yes I do need to get out more!)

As a side-WTF, Merriam-Webster actually returns results for 'eachother'. Good old Oxford English restores sanity though by returning no results.

Sutherlands · 2009-03-13 Reply Admin

DaveShaw:
Vollhorst:
He could have run that code in multiple threads to speed it up!

Argghh! That made me Laugh.

So what happens when there are 999999 invoices in the database...??

Dave

You take one down, pass it around, then there's 999998 invoices in the database.

2009-03-13 Reply Admin

but why:
Why aren't the invoice numbers allowed to be sequential ?

May be a security issue. If the numbers are sequential, then if you have X, then you know X+/-N are likely valid as well. If they are random, then it's harder to guess them.

I once worked at a place where a developer did a sequential numbering scheme, and lo and behold, someone could increment/decrement the ID in the URL and see other client's data. I had to patch in a random number validation code to hack a fix for that.

Kermos · 2009-03-13 Reply Admin

WayneCollins:
themagni:
WayneCollins:
I must confess that I have written random ID generators in code to feed to a table.
In my defense, the ID was 128 bits, so it would be a lot longer before you would expect a collision. Also, there was a primary key constraint on the ID, so you couldn't put in a duplicate record... which would bring you back to the ID generating loop, except much later in the growth of the table and with a round trip to the database every time.

By the way, is there a good way to insert a row with a unique key and get back the key in a single request to an Oracle or MSSQL server?

You could use the primary key, add each one sequentially, then if you don't want customers to know what their number is, hash it up and give them that number. Save the hash in the db and you can search by that, so you could write something like "select * from customers where hashID = value"

I do PLC programming and I know that much.

Yeah, that's great, until some other app, or multiple instances of the same app, want to insert things into the same table.

Or until you get a Hash collision...but those never happen, right?

Sutherlands · 2009-03-13 Reply Admin

Buddy:
but why:
Why aren't the invoice numbers allowed to be sequential ?

May be a security issue. If the numbers are sequential, then if you have X, then you know X+/-N are likely valid as well. If they are random, then it's harder to guess them.

I once worked at a place where a developer did a sequential numbering scheme, and lo and behold, someone could increment/decrement the ID in the URL and see other client's data. I had to patch in a random number validation code to hack a fix for that.

Wait, clients could see other clients data by putting in the id, and your fix was to make the ids random? Didn't want to try server-side validation with some sort of permissions and login?

Sutherlands · 2009-03-13 Reply Admin

WayneCollins:
JamesQMurphy:
128 bits? You're essentially using GUIDs. A company that I worked at 10 years ago used GUIDs for their ID column -- SQL Server supports it.

This was Oracle. I might have overlooked its GUID type if it had one. What I ended up doing was randomly generating two longs in C#, then concatting their "ToString()" output. Which, come to think of it, means that there was less than 128 bits of randomness, since multiple pairs of numbers yield the same concatted string.

Oops.

Could you give me an example of 2 pairs of numbers that concat to the same string? I'm curious.

Sutherlands · 2009-03-13 Reply Admin

Kaosadvokit:
I almost hate to say this, but since I love playing devil's advocate, I actually enjoy it.
In principle, it is done correctly, the problem is that the range of allowed random numbers is insufficient. Version 4 UUIDs are generated as 122-bit pseudorandom numbers. If he was issuing a GUID and retrying in the event of a conflict he would be applauded for having error detection/correction that many people omit.

Of course, it would make even more sense to use an auto-increment integer column with a unique constraint, as so many have pointed out, but that doesn't mean that this method is invalid.

In theory, theory and practice are the same thing, but in practice, they aren't.

2009-03-13 Reply Admin

DaveShaw:
So what happens when there are 999999 invoices in the database...??

Something tells me that the firm won't be in business long enough to generate 999999 invoices...

Kermos · 2009-03-13 Reply Admin

I've seen this:
jaspax:
The sad thing is that after reading the title and the first paragraph, I could see exactly what the WTF was going to be. The rest of the article just confirmed my suspicions.

Same thing here. And on top of that, I know exactly how this code came into existence:

Developer builds invoicing system. PHB says "I don't want the unique invoice numbers incrementing because then our clients will think we don't do much business. Can you make them random and still unique?"

Better solution would have been to just increment the invoice ID by a value other than 1, pick like a prime number or something. That way, invoice numbers still have an order to them but it looks like more are being generated than there really are. And if you're incrementing by some oddball value, chances are nobody that sees the invoices will ever notice.

Any customer that does pay attention to the Invoice Numbers will likely thing something is weird when they one week get invoice #104545 and then a week later #50

jspenguin · 2009-03-13 Reply Admin

Sutherlands:
WayneCollins:
JamesQMurphy:
128 bits? You're essentially using GUIDs. A company that I worked at 10 years ago used GUIDs for their ID column -- SQL Server supports it.

This was Oracle. I might have overlooked its GUID type if it had one. What I ended up doing was randomly generating two longs in C#, then concatting their "ToString()" output. Which, come to think of it, means that there was less than 128 bits of randomness, since multiple pairs of numbers yield the same concatted string.

Oops.
Could you give me an example of 2 pairs of numbers that concat to the same string? I'm curious.

1 and 23456 12 and 3456 123 and 456 1234 and 56 12345 and 6 etc.

2009-03-13 Reply Admin

Could you give me an example of 2 pairs of numbers that concat to the same string? I'm curious.

"1" + "21" = "12" + "1" = "121"

Archimidas · 2009-03-13 Reply Admin

SomeCoder:
Sorry to be pedantic but "eachother" is NOT a word. It's a big pet peeve of mine since way back in school, I had a teacher attempt to teach us that "eachother" was correct. It's two words people, "EACH" and "OTHER".
/pedantic

I guess the only possible explanation for this code being written is that the developer had exactly zero knowledge of SQL. I... guess... I'm grasping at straws here because this is pretty terrible :)

This is how language evolves. If 'eachother' isn't aword itwill besoon.

2009-03-13 Reply Admin

Oi vey.

Solution to the "we don't want to reveal our low volume to our customers" problem: CYYMMDDHHNNSS - yeah its a 13 digit invoice id, but what non programmer customer is going to pick up on the sequence?

2009-03-13 Reply Admin

oracle: INSERT INTO ........ RETURNING

Avoid it, use a sequence manually, and two statements instead.

dpm · 2009-03-13 Reply Admin

GregM:
Could you give me an example of 2 pairs of numbers that concat to the same string? I'm curious.

"1" + "21" = "12" + "1" = "121"

Is "zero-filled" that difficult a concept?

2009-03-13 Reply Admin

pjt33:
Burdieburd:
But why is it suddenly slowing down?
I don't see a "suddenly" in the article. It's probably just passed the point at which someone perceives that it's slower than it used to be. Also note the lack of index, so with n rows it could take O(n) time per query and the expected number of queries is N / (N - n) where N is the number of possible keys. nN / (N - n) grows fairly fast once n isn't small.
Incidentally, I'm not familiar with PHP but am I correct in thinking that the error handling is dodgy? It seems to echo an error message and carry straight on.

Actually, your average runtime is going to be solved as the limit of the series (n/N) + 2*(n/N)^2 + ... k*(n/N)^k. This makes the calculation decidedly non-linear.

F(n/N)(to 15 terms) 0.1 = 0.12 0.2 = 0.31 0.5 = 2.00 0.7 = 7.57 0.75 = 11.24 0.76 = 12.21 0.9 = 43.67

So it grows exponentially from 0 at 0% load to infinity at 100% load; the performance would fall off sharply once it passed a particular threshold and it went exponential.

2009-03-13 Reply Admin

VLPRONJ:
IANARP (I am not a real programmer), and even I can tell that random numbers have limited application in business invoicing...

If your application says "Enterprise" anywhere in the title then you use random numbers to determine the number of zeros to append to the cost.

2009-03-13 Reply Admin

WayneCollins:
LHC:
Been awhile, but maybe...
@@IDENTITY returns the last inserted identity value

Just have your row identfied as an identity

Yeah, but that's still two requests.

A stored procedure could do it in one request. I should have done that, but I was still revolted by TSQL and PL/SQL and tried to do everything from a "real" language...

Also, Oracle Sequencers are very handy. I use them quite frequently to populate key values. An in-line SELECT SEQUENCE_NAME.NEXTVAL FROM dual, and you're good to go.

A single statement, outside of the database? Ugh. Don't do it. Stored procs are the way to go if you're using Oracle. Well, they are IMHO.

2009-03-13 Reply Admin

WayneCollins:
Anonymous:
After your insert query, call this:
SELECT LAST_INSERT_ID() FROM
It will be thread safe if you're not using persistent connections.
What part of "in a single request" is so hard to understand? I know in JDBC you can set an option on the query object to return the generated ID... but even still, why are you so psychotic about doing everything in one request? OH NO I HAVE A MINOR INEFFIENCY

snoofle · 2009-03-13 Reply Admin

WayneCollins:
I must confess that I have written random ID generators in code to feed to a table.
In my defense, the ID was 128 bits, so it would be a lot longer before you would expect a collision. Also, there was a primary key constraint on the ID, so you couldn't put in a duplicate record... which would bring you back to the ID generating loop, except much later in the growth of the table and with a round trip to the database every time.

By the way, is there a good way to insert a row with a unique key and get back the key in a single request to an Oracle or MSSQL server?

use a "sequence"

2009-03-13 Reply Admin

Anon:
but why:
Why aren't the invoice numbers allowed to be sequential ?

Because you can't have a customer place order number 800234 one day and then order 800235 the next day and realize that you have no other customers!

If that's causing a problem for you, use Fibbonacci sequential invoice numbers. Then your customer base will just keep growing!

2009-03-13 Reply Admin

dpm:
GregM:
Could you give me an example of 2 pairs of numbers that concat to the same string? I'm curious.

"1" + "21" = "12" + "1" = "121"

Is "zero-filled" that difficult a concept?

Given that the original poster was the one that realized that they have less randomness than they originally thought, I think it's fair to say they hadn't thought of padding the string with zeros. Otherwise they wouldn't have a problem.

2009-03-13 Reply Admin

What I want to know is this: what was the random-number generator seeded with? Please, oh please, tell me they use the same seed every time -- it would make this WTF so much more ... err ... WTFier.

2009-03-13 Reply Admin

moz:
Anon:
but why:
Why aren't the invoice numbers allowed to be sequential ?

Because you can't have a customer place order number 800234 one day and then order 800235 the next day and realize that you have no other customers!
If that's causing a problem for you, use Fibbonacci sequential invoice numbers. Then your customer base will just keep growing!

It's thinking like that that will see you go far in management, my son.

2009-03-13 Reply Admin

but why:
Why aren't the invoice numbers allowed to be sequential ?

Because someone specified "A random invoice ID number" when they really meant "An arbitrary invoice ID number"

Kermos · 2009-03-13 Reply Admin

clm:
What I want to know is this: what was the random-number generator seeded with? Please, oh please, tell me they use the same seed every time -- it would make this WTF so much more ... err ... WTFier.

If it's PHP 4.2 or higher than the seeding is done automatically according to the documentation. However, I suspect that it's only by accident that this is saving the code from that particular WTF.

2009-03-13 Reply Admin

Oh, I get it... they're missing the definition!

define('RAND', 90210);
// ... snip ... 
$code = RAND;
// ... snip ...

WayneCollins · 2009-03-13 Reply Admin

Anon:
dpm:
GregM:
Could you give me an example of 2 pairs of numbers that concat to the same string? I'm curious.

"1" + "21" = "12" + "1" = "121"

Is "zero-filled" that difficult a concept?

Given that the original poster was the one that realized that they have less randomness than they originally thought, I think it's fair to say they hadn't thought of padding the string with zeros. Otherwise they wouldn't have a problem.

Nope, didn't pad it. So I got somewhat more than 127 bits of randomness... every combination in which the second number was negative was unique, since the second number in string form then began with a "-".

I should have used a stored procedure. Or a real GUID type. Or both. Or just fired off two SQL statements, perhaps bundled in one request. There was much room for improvement all around. At least I won't make those particular mistakes again. I'm sure I'll find exciting new ones...

DaveK · 2009-03-13 Reply Admin

snoofle:

User1 User2 GetNewId GetNewId n=Random(000000,999999) // 1234 n=Random(000000,999999) // 1234 select select if 1234 not in db if 1234 not in db insert 1234 insert 1234

universe explodes

Well, that would be a mercy. At least we wouldn't have to look at this stupid code any more.

DaveK · 2009-03-13 Reply Admin

icywindow:
Or perhaps he should have taken advantage of Perl's relationship with PHP and put it into a hash function to get the number out, then increment until you find the next untaken number, after all, 1000000 is prime, right guys?

Hmmm, you're not under the misapprehension that hash tables *have* to have a prime number of entries, are you? A commonly-held canard.

DaveK · 2009-03-13 Reply Admin

Kaosadvokit:
I almost hate to say this, but since I love playing devil's advocate, I actually enjoy it.
In principle, it is done correctly, the problem is that the range of allowed random numbers is insufficient.

In principle, it was the wrong thing to do in principle. Also, in practice, it was the wrong thing to do in principle. Also, in practice, it was the wrong thing to do in practice.

Also, in principle, it was the wrong thing to do in practice.

Doing the completely wrong thing, no matter how flawless your implementation, is never correct. Not even in principle. Especially not in practice.

2009-03-13 Reply Admin

I unfortunately have encountered this exact solution before also.

2009-03-13 Reply Admin

HK47:
WayneCollins:
Anonymous:
After your insert query, call this:
SELECT LAST_INSERT_ID() FROM
It will be thread safe if you're not using persistent connections.
What part of "in a single request" is so hard to understand? I know in JDBC you can set an option on the query object to return the generated ID... but even still, why are you so psychotic about doing everything in one request? OH NO I HAVE A MINOR INEFFIENCY
Because apparently someone has never done multithreaded programming...

DaveK · 2009-03-13 Reply Admin

asdf:
Oh, I get it... they're missing the definition!
define('RAND', 90210);
// ... snip ... 
$code = RAND;
// ... snip ... 

inb4 xkcd ref.

Also, that's kinda lame. You have to squint and pretend not to have noticed that there already is a statement assigning a value to $code and that it doesn't need any extra definitions and you're just proposing to throw that existing value away. Any joke that relies on you being dumber than the guy who coded the original WTF in order to be able to see it is headed for fail.

mxx · 2009-03-13 Reply Admin

fanha:
... So it grows exponentially from 0 at 0% load to infinity at 100% load; the performance would fall off sharply once it passed a particular threshold and it went exponential.

Can't be bothered to do the maths right now, but it can't be growing exponentially to infinity; exponential functions don't go to infinity.

2009-03-13 Reply Admin

WayneCollins:
At least I won't make those particular mistakes again. I'm sure I'll find exciting new ones...

And really, can any of us ask for anything more than the continuous opportunity to make exciting new mistakes?

2009-03-13 Reply Admin

This reminds me of a sorting algorithm that a friend invented (in jest). He called it a "bumble sort".

The code randomly looked at an array location and compared it's contents to the location immediately after. If it was lower than the first, it swapped the location's contents and repeated the process on another randomly selected array location.

2009-03-13 Reply Admin

mxx:
fanha:
... So it grows exponentially from 0 at 0% load to infinity at 100% load; the performance would fall off sharply once it passed a particular threshold and it went exponential.

Can't be bothered to do the maths right now, but it can't be growing exponentially to infinity; exponential functions don't go to infinity.

My exponential functions all go to 11.

2009-03-13 Reply Admin

Sutherlands:
Buddy:
May be a security issue. If the numbers are sequential, then if you have X, then you know X+/-N are likely valid as well. If they are random, then it's harder to guess them.
I once worked at a place where a developer did a sequential numbering scheme, and lo and behold, someone could increment/decrement the ID in the URL and see other client's data. I had to patch in a random number validation code to hack a fix for that.

Wait, clients could see other clients data by putting in the id, and your fix was to make the ids random? Didn't want to try server-side validation with some sort of permissions and login?

I don't know what Buddy was doing, but adding random characters essentially gives each data item its own password.

I've seen this where clients want to share their read-only access to their data with other other people who do not have credentials for the server. Think photo sharing: People post photos and then send the link to their friends or customers. The server has to provide the image to anyone with the right URL, because that is its purpose. However, users may start to complain if they figure out that their image URLs can be guessed by random strangers.

The folks at SmugMug went through this a year or two ago. The guessability of URLs was obvious to anyone who thought about it, and users had the option of requiring authentication. Then one day it erupted as a "scandle" and SmugMug fixed it by adding random numbers to make it harder to brute force the URLs.

Greebo · 2009-03-13 Reply Admin

WayneCollins:

Since you've identified Oracle and MSSQL as your target databases, the single request should be accomplished with a stored procedure call.

In Oracle, use a sequence to reserve the number, and then insert it as the unique ID.

In MSSQL, us an Identity column, and SCOPE_IDENTITY() (not @@IDENTITY) to retrieve the newly inserted ID from the current scope (@@IDENTITY ignores scope).

That solves the multi-threaded conflict issue and the "single statement" is a sproc.

ContraCorners · 2009-03-13 Reply Admin

VLPRONJ:
IANARP (I am not a real programmer), and even I can tell that random numbers have limited application in business invoicing...

Unless you work Enron, of course.

The Quest for the Unique ID

Leave a comment on “The Quest for the Unique ID”