• (cs) in reply to John
    Anonymous:

    Slightly off topic - Is this also a requirements / analysis WTF.

    Too bad we don't have any context for why it needed to be random in the first place.Does it truly need to be random, or just perceived to be random? Is there also requirements that all records should be viewed over time. Should we replace the word 'random' with 'not the same every time'

    I'm thinking of the case of banner ads or similar where the client just wants the content to change on subsequent requests. You really wouldn't want to be completely random because you can't guarantee that everything gets viewed and the quality of the random algorithm could bias your results.

    You might also have contractual obligations to show content, and if random, you can't assure the business it will be shown (unless of course you can convince the boss that your website will still be running when the universe dies)

     

    Isn't using a database backend to serve banner ads a OMGWTFBBQ? sounds enterprisey... throw in some XML logging and you have yourself a $2000 product! ("Guaranteed to serve 50 random images per minute!")

  • (cs) in reply to Anonymoose DBA

    Been working with database driven business applications for nearly 20 years and have yet to encounter a situation where random data was desired to be returned.  I have needed the occasional random number, but random data?  Can anyone give an example where that is useful? 

    Problem statement:  "10 million transactions have been carried out in the past year.  I need to perform QA (by hand) enough records to establish (with a 95 % confidence) that < 1 % of all transaction where in error.  Produce a suitable result."

  • (cs) in reply to Anonymoose DBA
    Anonymous:

    Been working with database driven business applications for nearly 20 years and have yet to encounter a situation where random data was desired to be returned.  I have needed the occasional random number, but random data?  Can anyone give an example where that is useful?  Only thing I can think of is a "Quote of the Day" scenario where you are unlikely to have millions of rows . . . So while this is an interesting mental exercise I wonder about the WTF'ery of a business application where random data is desired. 

    I mostly aggree, but here is another example where randomness can be desireable: In the management of an automated warehouse, containers have be (equally) distributed over aisles etc. Using a random generator can, in some cases, deliver better results than a cycle counter. For example, imagine you have a rack with two sides ("left" and "right") and you want to load them equally. Using a cycle counter, the first container goes "left", the second one "right", the third one "left" etc. On one unlucky day, you receive a big delivery where the original containers are a bit larger than the storage containers. Let's assume the original containers contain 12 pieces but only 10 pieces fit into the storage boxes. (For simplicity, we assume all products are of the same size)
    So the workers put 10 pieces in the first box, the remaining 2 in the second box. Both boxes go into the warehouse. Then the worker opens the next original container and puts 10 pieces into the third box, the remaining 2 into the fourth box. And so on. At the end of the day, there are many boxes with 10 pieces on the left side and many boxes with 2 pieces on the right side of the rack, so the load (weight) is unequal, which means mechanical stress for the rack.

    Granted, this example is a bit over-simplified and artificial, but I've experienced similar effects in some of the projects I've been working on.

  • Eli (unregistered) in reply to Patrik
    Anonymous:

    Some of the previous posts showed horrible solutions that were no better than the original code. Here's how I would do it:

    list($min, $max) = mysql_fetch_row(mysql_query('SELECT MIN(id), MAX(id) FROM abc'));
    $a = rand($min, $max)
    $row = mysql_fetch_row(mysql_query('SELECT * FROM abc WHERE id >= '.$a.' LIMIT 1'));

    Nope. That's not really random. Since you're using >= it's biased towards IDs that come right after records that have been deleted.
     
    Think about it: If your table only had IDs: 1, 2, 3, and 100 then your algorithm is going to pick 100 a heckuva lot more often then it'll pick 1, 2, or 3.
     

  • (cs) in reply to Ron Pakston
    To all you CHOPS who suggest that you SELECT COUNT(*) FROM BLA and then choose a random number between 0 and the result and the select are living/working in a dream world.

    What about application servers that generate a seed for the unique id on every restart ?

    What about seeding the PK to not start at 0 ?


    Care to explain that a little more simply?  In what situation would using the COUNT(*) to generate a LIMIT (or OFFSET) clause fail?
  • Boner (unregistered) in reply to Earl Purple

    If I remember correctly array_rand() at one point did not work very well at all, and returned a biased order. Besides why not just ORDER BY RAND() ?

  • (cs) in reply to Boner

    Did anyone here actually try to ORDER BY RAND() on a big dataset? Performance is questionable to say the least. I believe a "LIMIT 1" will trigger an optimized linear algorithm instead of a full sort, but I still hope there's a better alternative.

  • John (unregistered) in reply to GeneWitch

    @GeneWitch

    Despite the example being contrived, I don't see how that would be 'enterprisey'. A database is just a relational store of data.

    Maybe you skipped ahead a few steps and misread the intention of serving the ad (gif, flash, etc) from the database using BLOBs or something. I would agree that could be a WTF - especially when the original WTF'er did a SELECT * just to grab the row id's.

  • anonymous (unregistered) in reply to Anonymoose DBA
    Anonymoose DBA:
    Been working with database driven business applications for nearly 20 years and have yet to encounter a situation where random data was desired to be returned. I have needed the occasional random number, but random data? Can anyone give an example where that is useful? Only thing I can think of is a "Quote of the Day" scenario where you are unlikely to have millions of rows . . . So while this is an interesting mental exercise I wonder about the WTF'ery of a business application where random data is desired.

    Really? I can think of a few applications of random numbers in business:

    1. Lucky draws. This is a very common way of promoting products and services. The task is to find a random row from the customers table for the prize.

    2. Quality Control. In manufacturing, they pick a small sample (e.g. 1%) randomly from the finished products, and check if the sample meets the standards. In the service industries, companies may survey their customers (again, randomly picked from the customers table) and done a telephone questionnaire with them to find out if customers are satisfied with the services provided by the company. (The company may offer a small reward -- coupons or souvenirs -- to those who answered the questionnaire.) Why sampling? For manufacturing, checking the product would ruin it. So, you can only check a small sample. For telephone questionnaires, it'd be too costly to do a complete census on EVERY customer (even if you confine to "EVERY customer" in the age group 25--30).

  • James (unregistered)

    Wikipedia has a random page function with a few million rows. We store a random number in the records with an index on it, generate a random number in the client, select the first record greater than or equal to client value LIMIT 1. The LIMIT can use the index so it's very fast, no sorting required.

    Easy to get 20 rows with this, LIMIT 20, you're done if you get all 20, try again to get the rest if you don't. Repeat until you do or notice that the row count is zero. :)

  • sssasfg (unregistered) in reply to Konrads dot smelkovs at gmail dot com
    Konrads dot smelkovs at gmail dot com:
    So far, this is the sanest post out here , as in provides the best solution so far.Random sampling in SQL seems to be a domain of knowledge itself, this is what I found so far and think it is worth it:An academic bit (a homework actually)The quick and practical approachFor those not disgruntled with MS SQL ServerDB2 implements sql'03 ideas of bernoulli and system samplingBigger (albeit old) research on topic. The download bit is in top right corner. Citation list shows updated works on the subjectI myself would probably use the practical approach, as it is simple widely available even on feature poor systems, such as MySQL 4.0 and likes.P.S. A note the the editor: order by random() is not a wise idea :)
  • Adam (unregistered)

    Oh my. I present a random picture in the sidebar of my personal webpage whenever it's reloaded. Some photos have been deleted, COUNT(picture_id)=185, MAX(picture_id)=250:

    "SELECT picture_id FROM picture" 185 Rows in 0.01sec This list is selected into an array, a random id selected, then the rest of the data is picked. "SELECT * FROM picture WHERE picture_id = my_rand_id" 1 row in set (0.90 sec)

    The page comes up in around a second. If I were to try and be funky and use the ORDER BY rand() method it would take:

    "SELECT picture_id FROM picture ORDER BY rand() LIMIT 1" 1 row in set (40.06 sec)

    Using random(min(picture_id), max(picture_id)) is similarly out of the question as ids are sparse..

  • Jasmine (unregistered)

    ORDER BY Rand() doesn't work on MSSQL Version 2000 and 2005

    The original code isn't that bad. Sometimes the brute-force approach is fine, and it's much more reliable than something 'clever' like ORDER BY RAND... which seems to only work on random systems here and there...

  • Patrick (unregistered)

    Well, for selecting a single row, this would seem the fastest:

    SELECT COUNT(primarykey) as length FROM table

    then

    $num = mt_rand(0, length);

    then

    SELECT * FROM table WHERE primarykey = $num

    untested.

  • surender singh (unregistered) in reply to qbolec

    please give me solution fetch N-th row using LIMIT N,1 is proportional to N (or at least strictly monotone:) ) [/quote]

  • I've seen far, far worse (unregistered)

    considering it's PHP from a newbie programmer, it's not all that bad. ok, it's a log way round of doing something, but there's even use of looping constructs in there, something I've seen self-styled "PHP gurus with over ten years of experience" have no clue about whatsoever.

    captcha: tristique - over the years, PHP hasn't lost any of it's tristique (even with added namespace support)

  • sqiar (unregistered) in reply to John

    Thanks for the post, In this complex environment business need to present there company data in meaningful way.So user easily understand it .Sqiar (http://www.sqiar.com/why-data-visualization/) which is in UK,provide services like Tableau and Data Warehousing etc .In these services sqiar experts convert company data into meaningful way.

Leave a comment on “Finding Random Rows”

Log In or post as a guest

Replying to comment #110124:

« Return to Article