- Feature Articles
- CodeSOD
-
Error'd
- Most Recent Articles
- Stop Poking Me!
- Operation Erred Successfully
- A Dark Turn
- Nothing Doing
- Home By Another Way
- Coast Star
- Forsooth
- Epic
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
Isn't using a database backend to serve banner ads a OMGWTFBBQ? sounds enterprisey... throw in some XML logging and you have yourself a $2000 product! ("Guaranteed to serve 50 random images per minute!")
Admin
Problem statement: "10 million transactions have been carried out in the past year. I need to perform QA (by hand) enough records to establish (with a 95 % confidence) that < 1 % of all transaction where in error. Produce a suitable result."
Admin
I mostly aggree, but here is another example where randomness can be desireable: In the management of an automated warehouse, containers have be (equally) distributed over aisles etc. Using a random generator can, in some cases, deliver better results than a cycle counter. For example, imagine you have a rack with two sides ("left" and "right") and you want to load them equally. Using a cycle counter, the first container goes "left", the second one "right", the third one "left" etc. On one unlucky day, you receive a big delivery where the original containers are a bit larger than the storage containers. Let's assume the original containers contain 12 pieces but only 10 pieces fit into the storage boxes. (For simplicity, we assume all products are of the same size)
So the workers put 10 pieces in the first box, the remaining 2 in the second box. Both boxes go into the warehouse. Then the worker opens the next original container and puts 10 pieces into the third box, the remaining 2 into the fourth box. And so on. At the end of the day, there are many boxes with 10 pieces on the left side and many boxes with 2 pieces on the right side of the rack, so the load (weight) is unequal, which means mechanical stress for the rack.
Granted, this example is a bit over-simplified and artificial, but I've experienced similar effects in some of the projects I've been working on.
Admin
Nope. That's not really random. Since you're using >= it's biased towards IDs that come right after records that have been deleted.
Think about it: If your table only had IDs: 1, 2, 3, and 100 then your algorithm is going to pick 100 a heckuva lot more often then it'll pick 1, 2, or 3.
Admin
Care to explain that a little more simply? In what situation would using the COUNT(*) to generate a LIMIT (or OFFSET) clause fail?
Admin
If I remember correctly array_rand() at one point did not work very well at all, and returned a biased order. Besides why not just ORDER BY RAND() ?
Admin
Did anyone here actually try to ORDER BY RAND() on a big dataset? Performance is questionable to say the least. I believe a "LIMIT 1" will trigger an optimized linear algorithm instead of a full sort, but I still hope there's a better alternative.
Admin
@GeneWitch
Despite the example being contrived, I don't see how that would be 'enterprisey'. A database is just a relational store of data.
Maybe you skipped ahead a few steps and misread the intention of serving the ad (gif, flash, etc) from the database using BLOBs or something. I would agree that could be a WTF - especially when the original WTF'er did a SELECT * just to grab the row id's.
Admin
Really? I can think of a few applications of random numbers in business:
Lucky draws. This is a very common way of promoting products and services. The task is to find a random row from the customers table for the prize.
Quality Control. In manufacturing, they pick a small sample (e.g. 1%) randomly from the finished products, and check if the sample meets the standards. In the service industries, companies may survey their customers (again, randomly picked from the customers table) and done a telephone questionnaire with them to find out if customers are satisfied with the services provided by the company. (The company may offer a small reward -- coupons or souvenirs -- to those who answered the questionnaire.) Why sampling? For manufacturing, checking the product would ruin it. So, you can only check a small sample. For telephone questionnaires, it'd be too costly to do a complete census on EVERY customer (even if you confine to "EVERY customer" in the age group 25--30).
Admin
Wikipedia has a random page function with a few million rows. We store a random number in the records with an index on it, generate a random number in the client, select the first record greater than or equal to client value LIMIT 1. The LIMIT can use the index so it's very fast, no sorting required.
Easy to get 20 rows with this, LIMIT 20, you're done if you get all 20, try again to get the rest if you don't. Repeat until you do or notice that the row count is zero. :)
Admin
Admin
Oh my. I present a random picture in the sidebar of my personal webpage whenever it's reloaded. Some photos have been deleted, COUNT(picture_id)=185, MAX(picture_id)=250:
"SELECT picture_id FROM picture" 185 Rows in 0.01sec This list is selected into an array, a random id selected, then the rest of the data is picked. "SELECT * FROM picture WHERE picture_id = my_rand_id" 1 row in set (0.90 sec)
The page comes up in around a second. If I were to try and be funky and use the ORDER BY rand() method it would take:
"SELECT picture_id FROM picture ORDER BY rand() LIMIT 1" 1 row in set (40.06 sec)
Using random(min(picture_id), max(picture_id)) is similarly out of the question as ids are sparse..
Admin
ORDER BY Rand() doesn't work on MSSQL Version 2000 and 2005
The original code isn't that bad. Sometimes the brute-force approach is fine, and it's much more reliable than something 'clever' like ORDER BY RAND... which seems to only work on random systems here and there...
Admin
Well, for selecting a single row, this would seem the fastest:
SELECT COUNT(primarykey) as length FROM table
then
$num = mt_rand(0, length);
then
SELECT * FROM table WHERE primarykey = $num
untested.
Admin
please give me solution fetch N-th row using LIMIT N,1 is proportional to N (or at least strictly monotone:) ) [/quote]
Admin
considering it's PHP from a newbie programmer, it's not all that bad. ok, it's a log way round of doing something, but there's even use of looping constructs in there, something I've seen self-styled "PHP gurus with over ten years of experience" have no clue about whatsoever.
captcha: tristique - over the years, PHP hasn't lost any of it's tristique (even with added namespace support)
Admin
Thanks for the post, In this complex environment business need to present there company data in meaningful way.So user easily understand it .Sqiar (http://www.sqiar.com/why-data-visualization/) which is in UK,provide services like Tableau and Data Warehousing etc .In these services sqiar experts convert company data into meaningful way.