• Mister Cheese (unregistered) in reply to Kermos
    Kermos:
    I've got MySQL running next to me on the server here and it's been doing its job faithfully for over a year now with no problems.
    Brave thing to say on Friday 13th...
  • (cs) in reply to DaveK
    DaveK:
    Sixnutz Esq.:
    DaveK:
    icywindow:
    Or perhaps he should have taken advantage of Perl's relationship with PHP and put it into a hash function to get the number out, then increment until you find the next untaken number, after all, 1000000 is prime, right guys?
    Hmmm, you're not under the misapprehension that hash tables *have* to have a prime number of entries, are you? A commonly-held canard.

    A commonly held duck? I don't get it.

    Well, I canardly explain it better than it's already explained in the other thread at http://forums.thedailywtf.com/forums/p/11152/193432.aspx#193432.
    HST: Look, you're Samoan. Maybe you can explain this to me. Buffalo Attorney: My people are bleeding on the streets, and you want me to click on this? HST Yeah ... I'm expecting bats and lizards, and all I see is this goddamn duck. Attorney: Honky Bullshit! (*Enraged*) It's a Diplodocus! You want a bat on a skateboard? Madre de dios, Kentucky boy, I can show you lizards. Hell, Streisand is out there in the Tropicana piano lounge ... but that's not important right now. HST: You need to suck on this root ... and then we need to break into the penthouse suite. It's the only place within 50 miles of this dump that has The Fear. Attorney: The Fear of waking up after drinking 7/8 of a bottle of Mescal and realising that the Worm is still at the bottom? HST: I'm ready to face that Fear. Besides, bats and lizards have too many bones in them. Where did we put the ether?
  • (cs) in reply to moz
    moz:
    Anon:
    but why:
    Why aren't the invoice numbers allowed to be sequential ?

    Because you can't have a customer place order number 800234 one day and then order 800235 the next day and realize that you have no other customers!

    If that's causing a problem for you, use Fibbonacci sequential invoice numbers. Then your customer base will just keep growing!

    At a previous job we generated license serial numbers by adding a random number to the previously assigned serial number.

  • Mahesh (unregistered)

    In 2003, I saw similar code generated by a Micrsoft consultant. In that case he/she used GUID's and MS SQL Server.

  • David Mandlebrot (unregistered)

    Well, I've used exactly that method, but not for small key widths and large tables. More like 64 bit keys and tables with 1000 entries.

  • terry (unregistered) in reply to Buddy

    can't you just keep the ID in POST instead of GET /newb question

  • (cs) in reply to obediah
    obediah:
    rycamor:
    Why bother with a loop? Just increase the range and you're fine. At least, that's the thinking behind the transaction id generator' in a particular PHP/MySQL financial app I inherited:

    function tid()

    {

    $tid=srand((double) microtime() * 1000000);

    $randa = rand(48,57); ...

    $str = sprintf("0%c%c%c%c%c%c%c%c%c",$randa,$randb,$randc,$randd,$rande,$randf,$randg,$randh,$randi);

    return $str;

    }

    What's the magic of 48 and 57?

    They really should have beefed it up a bit:

    $randa = rand(rand(40,49), rand(50,59); ...

    And then shuffle $randa through $randi just to be sure :)

    I'm not sure of the magic of 48 and 57 myself. (I wonder if there is some numerical representation of the character zero?)

    Of course I don't know what numbers that would come up with if I randomly selected a "(" or ")" or ";" or "."

  • (cs) in reply to pink_fairy
    pink_fairy:
    Go to any English village green.

    The green will have a pond in it, or by it.

    The pond will most probably have ducks resident in it.

    English village greens are held "in common," unless subject to some hideous parliamentary Act of Enclosure.

    Thus, the ducks are held "in common."

    Simple, really. And now back to the real universe, in which words can have more than one meaning and are interpreted according to context. A difficult thing to achieve in SQL or PHP, but one that's oddly satisfying in English.

    Uhm, don't those ducks belong to the Queen? Or does she just lay claim to all the swans?

  • (cs) in reply to fanha
    fanha:
    pjt33:
    <snip> Also note the lack of index, so with n rows it could take O(n) time per query and the expected number of queries is N / (N - n) where N is the number of possible keys. nN / (N - n) grows fairly fast once n isn't small. <snip>

    Actually, your average runtime is going to be solved as the limit of the series (n/N) + 2*(n/N)^2 + ... k*(n/N)^k. This makes the calculation decidedly non-linear.

    F(n/N)(to 15 terms) 0.1 = 0.12 0.2 = 0.31 0.5 = 2.00 0.7 = 7.57 0.75 = 11.24 0.76 = 12.21 0.9 = 43.67

    So it grows exponentially from 0 at 0% load to infinity at 100% load; the performance would fall off sharply once it passed a particular threshold and it went exponential.

    Where do you get your series from? I'm operating on the basis that you have an exponential distribution with success probability (N - n) / N so expectation N / (N - n).

    There's clearly something wrong with your formula because at 0% load the expected number of trials is 1.

  • Craig Lewis (unregistered) in reply to obediah

    Ascii 48 is 0 Ascii 57 is 9

    printf's %c will take a character or an integer (since a character is just an 8 bit unsigned int).

    So it just generates a 9 digit random number.

    Captcha: damnum

  • chryse (unregistered) in reply to WayneCollins

    Yes, with auto identity columns. Upon insert you can read a @@identity key in a single call.

  • Anon (unregistered)
    evilspoons:
    2. I would *love* to see an invoice where my invoice number came in the form of a GUID - that would be awesome.

    Wouldn't it be fun when you have to call somebody and read the GUID invoice number to them other the phone?

  • (cs) in reply to FELIPE
    FELIPE:
    PLEASE DEFINE NONTRIVIAL
    The simplest way to define it is to use it in a sentence, e.g.
    I would have thought it was easy to turn off the caps lock before posting, but apparently for some commenters it is non-trivial.
  • (cs) in reply to hatterson
    hatterson:
    DaveK:
    hatterson:
    Brett:
    This reminds me of a sorting algorithm that a friend invented (in jest). He called it a "bumble sort".

    The code randomly looked at an array location and compared it's contents to the location immediately after. If it was lower than the first, it swapped the location's contents and repeated the process on another randomly selected array location.

    So how did it know when it was done? Did it just run forever and give you a higher and higher degree of confidence of sorting?

    Perhaps it did a full scan of the array after every flip to see if it was now in order? That seems the most logical solution for such a sorting algorithm.

    Also, bogosort.

    Also, Intelligent Design Sort. The most efficient sort algorithm yet discovered. Or do I mean invented?
    I would have gone with 'designed' or 'created'
    Well, I was going for 'subtle', where you have chosen to go for 'spelling out the joke in a fairly heavy-handed fashion'!
  • MF (unregistered) in reply to Anon

    until you get a race condition

  • (cs) in reply to chrismcb
    chrismcb:
    pink_fairy:
    Go to any English village green.

    The green will have a pond in it, or by it.

    The pond will most probably have ducks resident in it.

    English village greens are held "in common," unless subject to some hideous parliamentary Act of Enclosure.

    Thus, the ducks are held "in common."

    Simple, really. And now back to the real universe, in which words can have more than one meaning and are interpreted according to context. A difficult thing to achieve in SQL or PHP, but one that's oddly satisfying in English.

    Uhm, don't those ducks belong to the Queen? Or does she just lay claim to all the swans?

    No, not ducks. And only swans that live on rivers or lakes connecting to the sea.
  • C.K. (unregistered) in reply to Anon
    Anon:
    but why:
    Why aren't the invoice numbers allowed to be sequential ?

    Because you can't have a customer place order number 800234 one day and then order 800235 the next day and realize that you have no other customers! Same reason you don't start your invoice numbers at 000001.

    I was once in the position of choosing a invoicing scheme for a small business and I opted for something like 090313001 for the number system. first 6 digits are just the date and the last three are sequential numbers starting at 001 each day.

    I also could have randomized the start number each day to further obfuscate sales numbers.

  • pmv (unregistered)

    This code is actually brilliant. There coder has solved the problem of losing keys when deleting rows.

    Here's a example of the problem:

    CREATE TABLE aTable (Key PRIMARY KEY AUTO_INCREMENT, Val INT);

    for (i=0; i<one million zillion; i++) { query("INSERT INTO aTable (Val) VALUES (i)"; }

    DELETE FROM aTable WHERE Key = 2;

    Crap, we will never be able to fill row 2, what will we do? Lets remove the primary key, and insert using random numbers for the key!

  • Excalibur (unregistered)

    PHP actually has a uniqid() function: http://www.php.net/uniqid

    It "Gets a prefixed unique identifier based on the current time in microseconds." It can output a unique id with 13 or 23 characters.

  • (cs) in reply to Anon

    I went to college so long ago that we studied random number generators (and before relational databases existed). If you code your own linear congruential psuedo-random number generator, you can generate random looking six digit invoice numbers and never have to check for existance. X[K+1] = F(X[K]) and it only cycles after all possible values of X have been stepped through. 796380 follows 308047 reliably and if 308047 was the last one used then you are guaranteed that 796380 hasn't been used yet. They only look random. No, this doesn't solve the concurrent update problem.

  • (cs)

    There's a proper article on generating sequential keys, re-use etc. here: http://www.faqs.org/faqs/databases/sybase-faq/part14/ about half-way down. It's a bit Sybase-oriented, but applicable to any database. I'm still not convinced that anyone really really needs 'random' numbers to be generated for invoice numbers etc. How many customers do analysis of their invoice number sequences, and who cares if they do?

  • (cs)

    And I also guess that you think bogosort isn't of practical use? :o

  • Peets (unregistered) in reply to DaveShaw
    DaveShaw:
    Vollhorst:
    He could have run that code in multiple threads to speed it up!

    Argghh! That made me Laugh.

    So what happens when there are 999999 invoices in the database...??

    Dave

    To keep up with the programming principles at work in this fine example of code craft, I assume this where the quest for some GOTO equivalent instruction starts.

    I must applaud the creativity though, such an invoice number is implicitly secure because it ties in with absolutely nothing. Translated: it is useless in more ways than one - sequential or date+sequential gives you at least an idea where to start if you need to reference back to the data - it's taking a medium sized nuke to the associated business processes as you remove fallback.

    Suggestion to the original coder: take up gardening or something.

  • Will (unregistered) in reply to WayneCollins

    "By the way, is there a good way to insert a row with a unique key and get back the key in a single request to an Oracle or MSSQL server?"

    In Oracle, something like this:

    insert into invoice (InvoiceId, InvoiceAmount) values (invoice_seq.nextval, :a) returning InvoiceId INTO :i

  • Avalanche (unregistered) in reply to Anon

    In The Netherlands a system generating random invoice numbers would be breaking the law. Invoice numbers are required by law to be increasing without gaps (or gaps need to be accounted for) for a given time period (eg day, week, month or year). Also a number issued on the 15th must be higher than all numbers issued on the 14th.

    This rule is there to make auditing records easier (harder to fudge the records by adding or removing invoices)

  • (cs) in reply to WayneCollins

    Take a guess.

  • (cs) in reply to WayneCollins
    WayneCollins:
    hikari:
    Wow. At least my predecessors had the, uh, decency, to do "SELECT MAX(id)+1 FROM some_table;" to get the next free identity.

    Every day I go to work, I die a bit more.

    Did they have the decency to wrap that in a transaction or something?

    Take a guess...

  • m0ffx (unregistered) in reply to chrismcb
    chrismcb:
    Uhm, don't those ducks belong to the Queen? Or does she just lay claim to all the swans?
    Just the swans. And I think only those on the Thames are considered hers 'by default' - other rivers she would have to explicitly claim them.

    There was a very angry swan the other day when I was out rowing. Kept attacking the boats. But I think it came off worse when it flew into my oar!

  • Alan (unregistered)

    This is how BASF writes code.

  • (cs) in reply to m0ffx
    m0ffx:
    chrismcb:
    Uhm, don't those ducks belong to the Queen? Or does she just lay claim to all the swans?
    Just the swans. And I think only those on the Thames are considered hers 'by default' - other rivers she would have to explicitly claim them.

    There was a very angry swan the other day when I was out rowing. Kept attacking the boats. But I think it came off worse when it flew into my oar!

    Ducks, on the other hand, are great fun when punting. I vividly (well, actually, aurally) remember running over one -- accidentally -- on the Cherwell many years back. You know the saying "90% of a duck's/swan's activity happens under water?" Try putting 100% of a very, very buoyant water-fowl under water.

    Basically, the thing sounded like it was testing out Galileo's theory of the inclined plane. I'd swear I heard a very indignant, and water-logged, quack half-way through.

    Don't try this with swans. Not only does the Queen own them (under specific circumstances, as pointed out by DaveK above), but they're vicious. Not only that, but they have a penis. I'm told that ducks do, too, but somehow I don't have any fear of being raped by a duck.

    Know of any rowing club on or near the Leam, btw? All I can find is goddamn human ducks.

  • Edward Royce (unregistered)

    Hmmm.

    Ok so the guy was too lazy to not even bother with a "NOT IN"??

  • (cs) in reply to Edward Royce
    Edward Royce:
    Hmmm.

    Ok so the guy was too lazy to not even bother with a "NOT IN"??

    Hi Ed.

    The word is "DUCK". "NOT" optional.

    What, youm a fan of static typing?

  • Tommy (unregistered) in reply to WayneCollins

    Yes, like this (Java and with SQLSERVER):

    INSERT INTO Customer (Name, Address) VALUES (?,?) SELECT SCOPE_IDENTITY()

    Excute it as a query and you'll get the Identity :)

  • _dew_ (unregistered) in reply to pink_fairy
    pink_fairy:
    Actually, I believe that "thinking like a manager" is an oxymoron. Not just because managers are an imaginary irritant, but because, by definition, they are incapable of thinking.
    s/"incapable of"/"not paid for"
  • (cs) in reply to Anon
    Anon:
    Yes, but you're thinking like a rational person, try thinking like a manager instead.
    I'm sorry, but I find myself incapable of believing that technical expertise makes a person unable to offer advice in that specific area of expertise, or that productivity is able to be increased infinitely without bound. Guess I'm not management material.
  • The Queen (unregistered)

    We do not like being associated with ducks. We are not amused.

  • EmperorOfCanada (unregistered)

    This code might be more appropriate if the invoice number needed to be random (which would be somewhat useless in such small number). I which case you don't want something sequential coming from the database. I suspect that the key problem was also bad in a different way, and that was that the invoice number column was not indexed.

  • (cs)

    Alex/Jake:

    Do be serious for once.

    It ill befits an interwobbly site to remove a posting just because it mentions the word "pnis." Especially if you've sorted things out with your grandmother (god bless her). And particularly if the pnis in question belongs to a swan and/or duck.

    I am sad. I am morose. Please refund my subscription.

    As an unrelated point, has anyone out there been raped by a duck? I'm told you can get compensation in the tens of thousands of ... quacks.

    Addendum (2009-03-14 18:36): I'ev just checked this on my expert system. Everbofy needs an expret system. Guess what it said"

    "No Quack."

  • (cs)

    Have you been snorting the fairy dust again?

    Anyway, why is everybody here assuming that if the database has a unique, primary, auto-incremented, index key as an identifier, it must be identical to the invoice number? If you want a random-looking invoice number, just take this unique identifier and disguise it a bit. You know, for instance multiply it by 777 and add a random number between 0 and 776 or whatever. Then the customer won't be able to guess other invoice numbers, and the company can tell the original identifier from a customer's invoice number by dividing it by 777. Or it that too simple?

  • BjörkFan86 (unregistered)

    Stop being so anal. Just make the number 999999 a billion times bigger and this code will run fine for a 1000 more years.

  • (cs)

    This mess doesn't look too distanced from an open-address hash table with very high load factor, really.

  • rycamor (unregistered) in reply to Craig Lewis
    Craig Lewis:
    Ascii 48 is 0 Ascii 57 is 9

    printf's %c will take a character or an integer (since a character is just an 8 bit unsigned int).

    So it just generates a 9 digit random number.

    Captcha: damnum

    Wow, your captcha was even better than mine (validus).

    Yes, it just generates a random 9-digit number and prefixes a '0'. The whole function could have been:

    function tid()
    {
       srand((double) microtime() * 1000000);
       return sprintf("%010d",rand(0,9999999999));
    }

    Oops!! I just made the function 10 times better by randomizing the first digit. Or is that 10 billion times better?

    Anyway, it was only part of the WTF, the main one being that the developer had not really thought out the nature of financial systems, confusing transactions for payments. Thus, the 'transactions' table had a 'tid' column that could not be made unique because any one transaction would involve two sub-charges, thus 3 rows of payments, so the 'tid' column could not be unique.

    You know, the sort of thing one solves handily by having a 'transactions' table with a 'tid' column and a 'payments' table with a 'tid' foreign key...

    Sigh

  • smilr (unregistered)

    To my shame - I have used a variant of this system before.

    Picture a blackjack game written in TI-Basic, running on a TI-86 graphing calculator. Considering this was typed in on the calculator itself and not compiled on a computer the available variables were rather limited.

    So - to represent and store the "deck" of cards I used the current graphics buffer for the graphplot as memory. 52 pixels were mapped out and turned dark on the display. Instead of shuffling out a sequence of cards, my program randomly selected from the "deck" by picking a card (pixel), and if it was still dark - "deal" it out and turn the pixel dark to mark the card as no longer in the deck.

    Even then on the calculator one could FEEL the slowdown as the available unused cards dwindled. I shudder to think this was used in a real production database.

  • smilr (unregistered) in reply to smilr
    smilr:
    if it was still dark - "deal" it out and turn the pixel dark to mark the card as no longer in the deck.

    Dang - make that "turn the pixel light" to mark it as dealt from the deck. W

  • (cs) in reply to Burdieburd
    Burdieburd:
    But why is it suddenly slowing down? This would just gradually become slower as the table filled up, and take forever after no slots were left.

    "Gradually"? Wait, are you assuming the shape of a curve you haven't even seen, let alone calculated? All you know is that the time it takes goes up. You don't know how quickly it becomes slower.

    Here's how to derive:

    Let p(x) = the probability that a randomly chosen number is valid, when x IDs are used up. It is clear that p(x) = (1,000,000-x)/1,000,000 when there are 1,000,000 IDs in total. p'(x) is the probability of failure, or 1-p(x).

    Next, let q(x) = the average (mean) number of attempts it will take to get a single valid ID, each attempt succeeding with probability p(x).

    This weighted average is calculated by multiplying the number of attempts with the likelihood of success occurring exactly after that number of attempts. That is 1 times p(x), plus 2 times p'(x) (failure on one attempt) times p(x) (success on the second), and so on.

    q(x) = p(x) + 2p'(x)p(x) + 3(p'(x))^2p(x) ...

    = Sum[for i=0 to infinity] ( (i+1) * p'(x)^i * p(x) )

    (I've forgotten how to derive the limit of this converging series, but for the graph you could probably get away with just calculating the first 100 terms.)

    When you've graphed that for x=0 to x=1,000,000, then you can talk about it gradually becoming slower. My un-founded guess is that the slope gets increasingly steep, but I'll know more after a bit of scripting. :)

    Addendum (2009-03-15 05:59): Here's a curve showing collision count:

    [image]

    As expected, the collisions rise slowly at first, and then dramatically increase. I had to use a logarithmic scale to even show it as a curve instead of a sort of " _| ".

  • (cs)

    Didn't anyone notice the "echo $DB->erro();"? Outputting error messages when your code is very likely to be vulnerable to SQL injections certainly makes life easier for wannabe hackers. After all, you want to know whether little Bobby really dropped that table. Maybe this code isn't vulnerable (who can tell?*), but other code by the same author probably is given that this stuff has all the telltale signs of Luke 23:34 programming ("Forgive them, for they know not what they do."). SELECT * FROM ".$_TB['key'] ? I bet register_globals is on. For convenience. Where do $a and $Type come from, anyway?

    *) By the way, this is THE argument to end the old prepared (more accurately: preparable) statements vs. argument-escaping debate: With prepared statements, you can (in most cases) easily verify that your code is invulnerable to SQL injection, barring a library bug of course. With argument escaping and PHP in particular, on the other hand, it is very hard to verify that your code is invulnerable under all circumstances. You can only verify that some code is vulnerable in retrospect. In other words, absence of proof is once again much weaker than proof of absence.

    Of course, I'm just dreaming about this argument ending the debate. There are already a dozen reasons for preferring preparable statements over argument escaping, yet people still rationalize the use of the latter. banghead

  • (cs) in reply to Untouchable
    Untouchable:
    $code = mt_rand(1000000, 99999999);
    Because you know, Just do what it takes.

    Nominated for blue color.

  • (cs) in reply to snoofle
    snoofle:
    User1                              User2
    GetNewId                           GetNewId
    n=Random(000000,999999) // 1234    n=Random(000000,999999) // 1234
    select                             select
    if 1234 not in db                  if 1234 not in db
       insert 1234                        insert 1234
    

    universe explodes

    you know, i suspect this could very well be the reason for the random selection...the probability of 2 users randomly selecting the same number is 1/9999999^2, the probability of select max(userid) +1 causing duplicate id's is almost certain. thus - better solution!

    (btw: i always worry when i leave out the <sarcasm> tags)

  • Barbie (unregistered) in reply to Burdieburd
    Burdieburd:
    But why is it suddenly slowing down? This would just gradually become slower as the table filled up, and take forever after no slots were left.

    Tipping point, geometric increase in time as the number of invoices increases.

  • (cs) in reply to rycamor
    rycamor:
    Craig Lewis:
    Ascii 48 is 0 Ascii 57 is 9

    printf's %c will take a character or an integer (since a character is just an 8 bit unsigned int).

    So it just generates a 9 digit random number.

    Captcha: damnum

    Wow, your captcha was even better than mine (validus).

    Yes, it just generates a random 9-digit number and prefixes a '0'. The whole function could have been:

    function tid()
    {
       srand((double) microtime() * 1000000);
       return sprintf("%010d",rand(0,9999999999));
    }

    Oops!! I just made the function 10 times better by randomizing the first digit. Or is that 10 billion times better?

    Actually you just made it a lot worse. See yours just generates 1 random number. The first one generated 9 random numbers.

    9 > 1

    This is really simply stuff.

Leave a comment on “The Quest for the Unique ID”

Log In or post as a guest

Replying to comment #:

« Return to Article