•  ☺ (unregistered) in reply to dkfal

    This is why I love reading dailyWTF: the comments display more hilarious ignorance than the articles. If it's O(n^4), then doubling n means that it will require 16 times more processing power. Or, to put it another way, if you can get to line four in several seconds, with 16 times the power you'd be able to get to line eight. With 256 times the power, you'd be able to get to line 16. With 4096 times the power (a good ballpark for a Core 2 Xeon), you'd get to line 32.

    I'll allow that you might be trying to make a joke. But "I don't understand time complexity" is not really a very funny one.
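
    To spell out the arithmetic, taking the O(n^4) premise at face value ("power" here meaning work done in the same several seconds):

        t(n) ~ n^4   =>   p times the power moves the reachable line from n to n * p^(1/4)
        p = 16    ->  2x   (line 4 -> line 8)
        p = 256   ->  4x   (line 4 -> line 16)
        p = 4096  ->  8x   (line 4 -> line 32)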

  • (cs) in reply to dkfal
    dkfal:
    we won't have noticed that in the modern days with modern processors =p
    Except that we now have help files that are several pages long (think SQL Server Books Online)... even a modern processor would take some time getting to the last character in a 5MB help file. So yes, with small files it's not noticeable on new processors; with large files it would be. Does anyone want to duplicate this and run a speed trial on a large file?
  • Tom (unregistered)

    Her peers would probably laugh at our movie making skills.

  • (cs) in reply to bobday
    bobday:
    dkfal:
    we won't have noticed that in the modern days with modern processors =p
    I have the sneaking suspicion that it would be noticeable on any file larger than a few MB.

    If a modern computer can do file access 10000 times faster than the old Mac in question, the file only needs to be about 10 times bigger to see the same performance degradation.

    Addendum (2007-04-20 11:21): I should say about 21 times bigger if it's an O(n^3) algorithm.

    Indeed. One thing I've learned about optimization is that no amount of hardware speed can stop complete idiots from writing code that runs really slow.

  • (cs) in reply to Welbog

    Just reading this article, I can feel that trolls about women & computers, and about Pascal, will soon arise...

  • (cs) in reply to Wrathful Uzbek
    Please. My work looks good all on its own.

    http://www.dilbert.com/comics/dilbert/archive/images/dilbert2006109570418.gif

    ^_^

  • anonymous (unregistered) in reply to Tom
    Tom:
    Her peers would probably laugh at our movie making skills.

    Good thing we aren't employed as cinematographers, then.

  • Anonymous Coward (unregistered) in reply to fennec
    fennec:
    I've searched through both pages for the phrase, and have found it lacking, so I suppose I must be the one to present the congratulations:

    Brilliant!

    Shouldn't that be "brillant"?

  • Maks Verver (unregistered)

    As only a few people noted correctly, the complexity of the algorithm is only O(N^2) (assuming that the length of a line is bounded by some constant, which seems reasonable in this context), or just O(K^2) taking K to be the offset in the file where line N ends (and making no assumptions about line lengths). That would suggest that the runtime for line N would be about four times the runtime for line N/2 (at least for N "near infinity").

    The times given in the article seem to rise far more steeply than that, although it's a bit hard to judge the situation at such low values of N. But consider this: reading line 6 would cause about 100,000 calls to Read (and that would be the bottleneck of the program). Surely that could execute well within a minute on a processor that executes 8 million instructions per second (or: well over 4,000 instructions for each character to be read), especially since it's reasonable to assume that an essential library function like Read was implemented somewhat efficiently.

    So, I'm assuming that the story was embellished a bit for our entertainment. (Not that there's anything wrong with that.)
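
    For what it's worth, the "about 100,000" figure is easy to reconstruct. Assuming roughly 80 characters per line (an assumption; the article doesn't say), line 6 ends near file offset K = 480, and a scheme that re-scans from the start of the file for every character costs about

        sum(i, i = 1..K) = K(K+1)/2 = 480 * 481 / 2 ≈ 115,000 calls to Read

    which is indeed in the ballpark Maks gives.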

  • Jonno (unregistered) in reply to Maks Verver
    Maks Verver:
    As only a few people noted correctly, the complexity of the algorithm is only O(N^2) (assuming that the length of a line is bounded by some constant, which seems reasonable in this context), or just O(K^2) taking K to be the offset in the file where line N ends (and making no assumptions about line lengths). That would suggest that the runtime for line N would be about four times the runtime for line N/2 (at least for N "near infinity").

    The times given in the article seem to rise far more steeply than that, although it's a bit hard to judge the situation at such low values of N. But consider this: reading line 6 would cause about 100,000 calls to Read (and that would be the bottleneck of the program). Surely that could execute well within a minute on a processor that executes 8 million instructions per second (or: well over 4,000 instructions for each character to be read), especially since it's reasonable to assume that an essential library function like Read was implemented somewhat efficiently.

    So, I'm assuming that the story was embellished a bit for our entertainment. (Not that there's anything wrong with that.)

    Aren't you forgetting the time the I/O itself takes? With a driver that is unable to cache reads, that would be incredibly slow. (On a machine that old, the HDD would probably have been much slower than the new-fangled 10k RPM stuff we take for granted nowadays.)

  • bling (unregistered) in reply to Welbog

    but the algorithm is basically a linear search. Maybe the author intended O(4).

  • bling pirates (unregistered) in reply to bling

    and there's essentially just 4 basic operations

    but i'm just playing devil's advocate now

    i hope she was a warrior princess

  • bling pirates stinky (unregistered) in reply to bling pirates

    but i think the author was just being creative,

    it works as an O(4) algorithm... it doesn't work as anything else... isn't it all just relative?

  • bling pirates stinky (unregistered) in reply to KattMan
    KattMan:
    WhatTheFrack:
    I work for a computer company; So I am really getting a kick out of most of these replies. Some of you guys are very good at making it sound like you know what you are talking about. But trust me.... You don't. I think you just want to make yourself sound smart, when in reality you don't know what you are talking about. This is how bad info gets passed around. If you don't know about the topic....Don't make yourself sound like you do. Cos some people believe anything they hear.

    ( http://en.wikipedia.org/wiki/Big_O_notation )

    ( and for the uninitiated, http://forums.fark.com/cgi/fark/comments.pl?IDLink=1688267 )

    I take offense to that as I know my Ramones music very well thank you. I have their entire anthology. Not to mention I also knew a girl named Sheena, but regretfully she wasn't a punk rocker.

    I'll have you all know that I've used the internet now for quite some time, and you'r positings are very rude.

  • Anonymous Coward (unregistered)

    At least she followed the specification. Maybe they should've mentioned something about real-time...

  • jarno (unregistered) in reply to Matthew Wakeling
    Matthew Wakeling:
    You are very unlikely to ever see an example of code that has a complexity of a power of N that isn't an integer, and such code will usually have a very big clue in it to indicate why it isn't an integer.

    Uh... What about Karatsuba or Toom multiplication? Or anything taking advantage of the birthday paradox?

    Where are the big clues? No sqrt's spelled out there.

  • bos_tsip (unregistered) in reply to zip
    zip:
    It's a constant time algorithm? Awesome!!

    could it be the answer to time travel!?

  • Matthew Wakeling (unregistered) in reply to Mariano
    Mariano:
    Matthew Wakeling:
    for (int i = 2; i < n; i++) {
       boolean isPrime = true;
       for (int o = 2; o < Math.sqrt(i); o++) {
          if (i % o == 0) {
             isPrime = false;
          }
       }
       if (isPrime) {
          System.out.println(i);
       }
    }

    O(n^1.5) is a rather coarse estimate of the running time of your example. You need knowledge of the distribution of the smallest prime divisor of numbers to do anything sensible.

    That's why I didn't make the algorithm exit early in the loop once isPrime becomes false. The above algorithm really is O(n^1.5), and really is dumb. If you added the trivial optimisation of changing "for (int o = 2; o < Math.sqrt(i); o++) {" into "for (int o = 2; o < Math.sqrt(i) && isPrime; o++) {" then your statement would be true.

  • (cs)

    Man, Sheena's husband was a total dick. He balanced the extra income against Sheena's misery at doing a job she had no business doing and the extra burden on her coworkers, and went with the extra income?

    Then again, Sheena should have had the sense to quit on her own.

  • CB (unregistered) in reply to kimbo305
    kimbo305:
    Man, Sheena's husband was a total dick.

    Then again, Sheena should have had the sense to quit on her own.

    Yeah, I was thinking that. Maybe it was a tax cheat? Still, that must have been several years of pure misery for poor Sheena, unless she got the hang of programming later on. You would have thought that they could have transferred her to marketing, at the very least.

  • XML Hater (unregistered) in reply to akatherder
    akatherder:
    Boyaa:
    Im guessing sheena was Indian and all code copied of the internet.

    I can't place a good date on either, but the series of tubes that make up our Internets was hardly in place back when there were 8 MHz Macs. Maybe she just got it out of computer magazines or on the back of cereal boxes.

    Maybe the web wasn't around then, but I was using the internet when the Mac was introduced (FTP, telnet, usenet).

    Remember kids, Internet != www

  • (cs) in reply to CB
    CB:
    You would have thought that they could have transferred her to marketing, at the very least.
    I don't see marketing being any easier or more interesting to someone who's got a background in film.
  • Daryl B (unregistered)

    You guys laugh, but in the year 2002 I worked with a developer who held both B.S. and M.S. degrees in EECS from MIT (yes, the Massachusetts Institute of Technology, the most prestigious of American engineering schools) and wrote code that was not all that different from Sheena's.

    To pull up a list of processed items in the database, he would retrieve all of them, iterate against the database to get the details, filter locally in the Java tier, and display them. At one point, he was making over 600 round trips to the database, and the page took 5 minutes to load. The whole thing was doable in a single trip to the database, and a stored procedure to fully support him was already written and available.

    The best part is that he became the architect for the next product he worked on.
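
    For readers who haven't met this antipattern, here is a minimal JDBC sketch of the shape Daryl describes; the table and column names are invented for illustration, and only the round-trip structure matters. The first method makes one query for the IDs and then one more round trip per row (600 items means 601 database calls); the second lets the database do the join in a single trip.

        import java.sql.*;
        import java.util.*;

        public class NPlusOneDemo {
            // One query for the IDs, then one more round trip per row.
            static List<String> slowDetails(Connection con) throws SQLException {
                List<String> details = new ArrayList<>();
                try (Statement st = con.createStatement();
                     ResultSet ids = st.executeQuery("SELECT id FROM items")) {
                    while (ids.next()) {
                        try (PreparedStatement ps = con.prepareStatement(
                                "SELECT detail FROM item_details WHERE item_id = ?")) {
                            ps.setInt(1, ids.getInt("id"));
                            try (ResultSet rs = ps.executeQuery()) {
                                if (rs.next()) details.add(rs.getString("detail"));
                            }
                        }
                    }
                }
                return details;
            }

            // One round trip total: let the database do the join.
            static List<String> fastDetails(Connection con) throws SQLException {
                List<String> details = new ArrayList<>();
                try (Statement st = con.createStatement();
                     ResultSet rs = st.executeQuery(
                         "SELECT d.detail FROM items i JOIN item_details d ON d.item_id = i.id")) {
                    while (rs.next()) details.add(rs.getString("detail"));
                }
                return details;
            }
        }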

  • (cs) in reply to bobday
    bobday:
    The ReadC function parses the entire file up to the character, so calling it once is O(n) on filesize.

    ReadL calls ReadC for every character in a line (say of average length m), so we have O(mn).

    ReadLine calls ReadL for every line up to the requested one. So if the file has l lines, the expected runtime is O(lmn). But n = lm, so the runtime of a single call to ReadLine is O(n^2), and parsing a whole file is O(n^3).

    Addendum (2007-04-20 11:22): Errr... parsing a whole file is O(ln^2)

    l is n/m, so ln^2 is (n/m)n^2 = n^3/m. If m << n, we can discard the m term for big-O notation (of course we can in any case, since big-O is an upper bound, but let's pretend we really mean big-theta), and parsing the whole file is O(n^3), as you originally claimed.

    Obviously there are potential real-world examples where m is close to or even greater than n (a help file with relatively few lines), but I think it's fair to go with O(n^2) for finding a single line and O(n^3) for iterating through all of them as estimates for realistic performance.
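
    To make the shape of the thing concrete, here is a rough reconstruction of the access pattern bobday describes. The original was Pascal and these names and details are guesses, but the nesting is the point: ReadC re-scans the file from the top on every call, ReadL calls ReadC once per character, and ReadLine calls ReadL once per line.

        import java.io.*;

        public class SheenaReader {
            // Fetch the i-th character by scanning the file from the very
            // beginning every single time -- O(i) per call.
            static char readC(File f, int i) throws IOException {
                try (FileReader r = new FileReader(f)) {
                    int c = -1;
                    for (int pos = 0; pos <= i; pos++) c = r.read();
                    return (char) c;  // (char) -1 serves as an EOF marker
                }
            }

            // Read one line starting at offset `start`, one readC per character.
            static String readL(File f, int start) throws IOException {
                StringBuilder sb = new StringBuilder();
                for (int i = start; ; i++) {
                    char c = readC(f, i);
                    if (c == '\n' || c == (char) -1) break;
                    sb.append(c);
                }
                return sb.toString();
            }

            // Fetch the l-th line by re-reading every line before it.
            static String readLine(File f, int l) throws IOException {
                int offset = 0;
                String line = "";
                for (int cur = 0; cur <= l; cur++) {
                    line = readL(f, offset);
                    offset += line.length() + 1;  // +1 skips the newline
                }
                return line;
            }
        }

    Every character fetch re-reads the whole file prefix, which is exactly where the quadratic (single call) and cubic (whole file) figures in this thread come from.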

  • Luds (unregistered)

    I can't believe how many people are struggling with Big-O notation. This talk about a constant 80 characters per line or accessing the first item in the help file or later is irrelevant. You are overcomplicating it. It doesn't matter if it's the first line or not. Similar to performing a linear search: if I have 100 items to iterate through to find my match and luckily the item I'm looking for is the first item in the list, it doesn't change the operation to O(1) just because I got lucky. You always assume the worst case (or, as others have said, "as n approaches infinity"). Looping through/iterating a collection is an O(n) algorithm.

    Simply put, the nested function calls above are no different from having 3 for loops all nested together. Each loop is iterating through its own collection, an O(n) operation. Nesting like that gives us O(n) * O(n) * O(n) = O(n^3).

    Furthermore, if it was only O(n^2), we wouldn't see the performance drop off as dramatically as was stated above when they moved to later items in the help file.

  • Bit-Rot (unregistered) in reply to Jonno
    Jonno:
    Maks Verver:
    As only a few people noted correctly, the complexity of the algorithm is only O(N^2) (assuming that the length of a line is bounded by some constant, which seems reasonable in this context), or just O(K^2) taking K to be the offset in the file where line N ends (and making no assumptions about line lengths). That would suggest that the runtime for line N would be about four times the runtime for line N/2 (at least for N "near infinity").

    The times given in the article seem to rise far more steeply than that, although it's a bit hard to judge the situation at such low values of N. But consider this: reading line 6 would cause about 100,000 calls to Read (and that would be the bottleneck of the program). Surely that could execute well within a minute on a processor that executes 8 million instructions per second (or: well over 4,000 instructions for each character to be read), especially since it's reasonable to assume that an essential library function like Read was implemented somewhat efficiently.

    So, I'm assuming that the story was embellished a bit for our entertainment. (Not that there's anything wrong with that.)

    Aren't you forgetting the time the I/O itself takes? With a driver that is unable to cache reads, that would be incredibly slow. (On a machine that old, the HDD would probably have been much slower than the new-fangled 10k RPM stuff we take for granted nowadays.)

    On the machine in question there was no hard drive. The disk was a single-sided 400K floppy. The 90,000 disk reads in the 'line 6' example would have been a really REALLY big bottleneck on that disk drive.

  • Tinkerghost (unregistered) in reply to TheRider

    Pascal was the official language for the Mac up until about OS 7 or 8. I still have my books around. It was fun when they started shifting the books to C++, because all the variables had to be typecast to Pascal types prior to making the system calls.

  • CamelCase (unregistered) in reply to Luds
    Luds:
    I can't believe how many people are struggling with Big-O notation. This talk about a constant 80 characters per line or accessing the first item in the help file or later is irrelevant. You are overcomplicating it. It doesn't matter if it's the first line or not. Similar to performing a linear search: if I have 100 items to iterate through to find my match and luckily the item I'm looking for is the first item in the list, it doesn't change the operation to O(1) just because I got lucky. You always assume the worst case (or, as others have said, "as n approaches infinity"). Looping through/iterating a collection is an O(n) algorithm.

    Simply put, the nested function calls above are no different from having 3 for loops all nested together. Each loop is iterating through its own collection, an O(n) operation. Nesting like that gives us O(n) * O(n) * O(n) = O(n^3).

    Furthermore, if it was only O(n^2), we wouldn't see the performance drop off as dramatically as was stated above when they moved to later items in the help file.

    If a function f(n) can be written as a finite sum of other functions, then the fastest growing one determines the order of f(n). In particular, if a function may be bounded by a polynomial in n, then as n tends to infinity, one may disregard lower-order terms of the polynomial.

    If n is the number of characters, x is the number of lines, and y is the accumulation of z in (n-z)...(n-4)+(n-3)+(n-2)+(n-1)+n (her character reading algorithm), then the formula for her total application algorithm is: x * n^2 -y

    As stated above we can disregard the -y. We can then simplify: x/x * n/x *n/x = (divide all terms by x) 1 * n/x * n/x = (simplify - x/x = 1) 1 * n^2/x or just n^2/x = (simplify - 1 times anything is that thing) n^2 (simplify - the divisor is less than n and grows as linearly as n grows quadratically, so we can ignore the divisor as well for large values of n).

    Therefore we have O(n^2) as lim n -> infinity.

    I hope this puts that to bed now once and for all.

  • macnewbie (unregistered) in reply to Tinkerghost
    Tinkerghost:
    Pascal was the official language for the Mac up until about OS 7 or 8. I still have my books around. It was fun when they started shifting the books to C++, because all the variables had to be typecast to Pascal types prior to making the system calls.

    I am happy that one of the 'official' languages on the Mac today is another one that starts with the letter 'p' -- Python.

    Captcha - darwin, how appropriate!

  • slamb (unregistered) in reply to Luds
    Luds:
    I can't believe how many people are struggling with Big-O notation. This talk about a constant 80 characters per line or accessing the first item in the help file or later is irrelevant. You are overcomplicating it. It doesn't matter if it's the first line or not. Similar to performing a linear search: if I have 100 items to iterate through to find my match and luckily the item I'm looking for is the first item in the list, it doesn't change the operation to O(1) just because I got lucky. You always assume the worst case (or, as others have said, "as n approaches infinity"). Looping through/iterating a collection is an O(n) algorithm.

    Simply put, the nested function calls above are no different from having 3 for loops all nested together. Each loop is iterating through its own collection, an O(n) operation. Nesting like that gives us O(n) * O(n) * O(n) = O(n^3).

    Your statement that the algorithm is O(n^3) is correct, but the statement that the algorithm is O(n^2) is not only correct but more precise.

    Your problem is that you are not using a consistent definition of "n". Let's define variables, with multicharacter names for clarity:

    lines - total number of lines within the file
    chars - total number of characters within the file
    maxlinelen - the maximum length of a line

    ReadC is obviously O(chars). ReadL calls it O(maxlinelen) times, so it is O(maxlinelen*chars). ReadLine calls ReadL O(lines) times, so it is O(lines*maxlinelen*chars). Clearly if all lines are the same length, lines*maxlinelen = chars, so ReadLine is O(chars^2). In fact, if they are not all the same length, the same thing is true, though it's a bit trickier to show. "chars" is what everyone but you is calling "n", so O(n^2) is correct.

    I'll make this O(lines*maxlinelen) => O(chars) step more obvious, since more than one person has tripped over it:

    void group_iterate(int n)
    {
        const int groupsize = ...;
        int i, group_num, j;
    
        for (i = 0, group_num = 0; group_num < n/groupsize; group_num++) {
            for (j = 0; j < groupsize && i < n; i++, j++) {
                do_something(i);
            }
        }
    }
    

    This code calls do_something only O(n) times, even though it has nested for loops. (It is also O(n^2), since Big-Oh bounds do not need to be stated as tightly as possible. If you were to state that it is T(n^2), however, you would be wrong.) This is analogous to the outer loop over lines and the middle loop over characters within a line.

    That not all lines are the same length doesn't change the outcome. Still for any given call to ReadL, ReadC is called exactly once for every character from the beginning of the file to the last character of the line, inclusive.

    And quoting again part of the above:

    You always assume the worst case (or, as others have said, "as n approaches infinity").

    Those are two orthogonal ideas. All asymptotic notation describes an asymptote (a simpler notation the result approaches as n approaches infinity). O(...) is by definition worst-case. O(...) is by definition best-case. T(...) is both (and thus is only defined if they are the same.) It's not unusual to give a best-case time function (precise definition, not using asymptotic notation), and it's not unusual to give a best-case O(...) (as n approaches infinity).

    Furthermore, if it was only O(n^2), we wouldn't see the performance drop off as dramatically as was stated above when they moved to later items in the help file.

    Yeah, even at O(n^4) ReadLine(5) would only take 2.4X as long as ReadLine(4), not change from "several seconds" to "over a minute". Their claims are exaggerated and/or there's something more to it.

    As I briefly mentioned in an earlier post, I doubt Sheena's was the only WTF in this system...the interface for the tiny piece they gave her was ill-considered. ReadLine(l) is just not an appropriate interface for files with variable-length lines with no indexes, as it simply must be at least O(l). They probably are looping calls to ReadLine(0) ... ReadLine(n) and obviously this interface doesn't even optimize the "I'm on line l-1, they want line l" case. So that takes them to cubic right there...who knows what else lurks.
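
    For contrast, here is a minimal sketch of the kind of interface slamb is gesturing at: keep the file position between calls, so that walking the whole help file is one linear pass. This is just one obvious alternative, not the article's actual code.

        import java.io.*;

        // "Give me the next line" never re-reads what came before:
        // one pass over an n-character file is O(n) total.
        public class SequentialHelpReader implements Closeable {
            private final BufferedReader in;

            public SequentialHelpReader(File f) throws IOException {
                this.in = new BufferedReader(new FileReader(f));
            }

            /** Returns the next line, or null at end of file. */
            public String nextLine() throws IOException {
                return in.readLine();
            }

            @Override
            public void close() throws IOException {
                in.close();
            }
        }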

  • slamb (unregistered) in reply to slamb

    Sorry, a couple paragraphs of my last post got mangled in a confusing way because apparently wtf's forum software transliterated my Greek characters to Latin ones or something.

    slamb:
    Those are two orthogonal ideas. All asymptotic notation describes an asymptote (a simpler notation the result approaches as n approaches infinity). O(...) is by definition worst-case. O(...) is by definition best-case. T(...) is both (and thus is only defined if they are the same.) It's not unusual to give a best-case time function (precise definition, not using asymptotic notation), and it's not unusual to give a best-case O(...) (as n approaches infinity).

    I'll write out the names of the characters instead. That should read:

    Those are two orthogonal ideas. All asymptotic notation describes an asymptote (a simpler notation the result approaches as n approaches infinity). O(...) is by definition worst-case. Omega(...) is by definition best-case. Theta(...) is both (and thus is only defined if they are the same.) It's not unusual to give a best-case time function (precise definition, not using asymptotic notation), and it's not unusual to give a best-case Omega(...) (as n approaches infinity).

    There are a couple other paragraphs that say nonsensical things like "O(n^2) is true but O(n^2) is false". In each case, the second one is supposed to be Omega(n^2).

  • slamb (unregistered)

    er, Theta(n^2). sigh I can't even blame the forum software that time...I just wrote the wrong character name.

  • oOoOo (unregistered) in reply to CamelCase
    We can then simplify: x/x * n/x *n/x = (divide all terms by x)
    Huh? How did you justify dividing the expression by x^3 all of a sudden? Remember, x depends on n (and you have made no assumptions as to how yet). You might as well have said, "divide all terms by x*n^2" and concluded that the algorithm runs in constant time.
    1 * n/x * n/x = (simplify - x/x = 1) 1 * n^2/x or just n^2/x = (simplify - 1 times anything is that thing)
    (n/x)*(n/x) = (n/x)^2, not n^2/x.

    A reasonable assumption is x = Cn; that is, the average number of characters per line is constant. (0 < C < 1, but this is irrelevant.)

    Then, working from your x*n^2: substituting x = Cn gives x*n^2 = C*n^3, so the complexity is O(n^3).
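
    Spelling the same thing out end to end, with n total characters, m the (assumed constant) average line length, and x = n/m lines:

        One call:   ReadLine(l) fetches each of the first l*m characters with a
                    scan from the start, costing about sum(i, i = 1..l*m) = O(l^2 * m^2).
        Whole file: sum over l = 1..n/m of O(l^2 * m^2) = O(n^3 / m), i.e. O(n^3) for fixed m.

    That reconciles the two camps above: O(n^2) is the cost of a single ReadLine call, and O(n^3) is the cost of displaying the whole file with it.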

  • grg (unregistered) in reply to oOoOo

    Thank you all for the totally pointless numerical analysis.

    Pointless, because if you read my earlier addendum, you'd know the 128K Mac had no extra RAM for anything sane like a disk cache. So as soon as the 6th line, the poor disk would have to, for EACH CHARACTER, re-read the first block of the file, scan all of it, read the second block, then scan it for the next character. Plus, if more than a second or so had elapsed, the disk driver would have turned off the drive motor, requiring a spin-up to speed, not to mention finding the block and reading it.

  • matt (unregistered)

    Ummm, so you're saying that the algorithm performs at a constant '4'??

  • (cs) in reply to matt

    Ummm, so you're saying that the algorithm performs at a constant '4'??

    Around this neck of the woods, everybody knows that O() notation is about performance for large values of n. So rather than waste a second, we talk about O(2) and O(3), everybody realizing it could only refer to O(n^2), O(n^3), et cetera.

    Curiously enough, in this one case there are a couple of large constant multipliers that kick in. Going from one disk block to another is a big factor, especially if the blocks are on different tracks, or, woe is you, if the tracks are recorded at different speeds.

  • Nitpicker (unregistered)

    You mean O(n**4) presumably.

  • Pecos Bill (unregistered) in reply to Welbog

    So, my question is why wasn't help text stored in the STR/STR# resource? Then tell the OS to pull it. Plus, you can support many languages that way and still change it as needed. (It's been a while since I looked at classic resources so my memory may be off on their names.) Not a WTF per se, but a huh?

  • slamb (unregistered) in reply to grg
    grg:
    Thank you all for the totally pointless numerical analysis.

    Pointless, because if you read my earlier addendum, you'd know the 128K Mac had no extra RAM for anything sane like a disk cache. So as soon as the 6th line, the poor disk would have to, for EACH CHARACTER, re-read the first block of the file, scan all of it, read the second block, then scan it for the next character. Plus, if more than a second or so had elapsed, the disk driver would have turned off the drive motor, requiring a spin-up to speed, not to mention finding the block and reading it.

    I still don't buy it. Even on an original Mac, I think they had at least 512-byte blocks. At reasonable line sizes, the sixth line would still be on the first block. And I don't see how a second would have elapsed without reading a character, since you say there was no disk cache. So I don't see how there'd be any magic threshold between five and six lines.
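
    Quick sanity check on that, again assuming ~80-character lines: 6 lines * 80 characters = 480 bytes, which indeed fits within a single 512-byte block.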
