- Feature Articles
- CodeSOD
- Error'd
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
This is why I love reading dailyWTF: the comments display more hillarious ignorance than the articles. If it's O(n^4), then doubling n means that it will require 16 times more processing power. Or, to put it another way, if you can get to line four in several seconds, with 16 times the power you'd be able to get to line eight. With 256 times the power, you'd be able to get to line 16. With 4096 times the power (a good ballpark for a core2 Xeon), you'd get to line 24.
I'll allow that you might be trying to make a joke. But "I don't understand time complexity" is not really a very funny one.
Admin
Admin
Her peers would probably laugh at our movie making skills.
Admin
Indeed. One thing I've learned about optimization is that no amount of hardware speed can stop complete idiots from writing code that runs really slow.
Admin
Just reading this article, I feel trolls about women & computers, and pascal will soon arise...
Admin
http://www.dilbert.com/comics/dilbert/archive/images/dilbert2006109570418.gif
^_^
Admin
Good thing we aren't employed as cinematographers, then.
Admin
Shouldn't that be "brillant" ?
Admin
As only a few people noted correctly, the complexity of the algorithm is only O(N^2) (assuming that the length of a line is bounded by some constant, which seems reasonable in this context), or just O(K^2) taking K to be the offset in the file where line N ends (and making no assumptions about line lengths). That would suggest that the runtime for line N would be about four times the runtime for line N/2 (at least for N "near infinity").
The times given in the article seem to rise far more steeply than that, although it's a bit hard to judge the situation at such low values of N. But consider this: reading line 6 would cause about 100,000 calls to Read (and that would be the bottle neck of the program). Surely that could execute well within a minute on a processor that executes 8 million instructions per second (or: well over 4000 instructions for each character to be read) especially since it's reasonable to assume that an essential library function like Read was implemented somewhat efficiently.
So, I'm assuming that the story was embellished a bit for our entertainment. (Not that there's anything wrong with that.)
Admin
Admin
but the algorithm is basically a linear search. Maybe the author intended O(4).
Admin
and there's essentially just 4 basic operations
but i'm just playing devil's advocate now
i hope she was a warrior princess
Admin
but i think the author was just being creative,
it works as an O(4) algorithm... it doesn't work as anything else... isn't it all just relative?
Admin
I'll have you all know that I've used the internet now for quite some time, and you'r positings are very rude.
Admin
At least she followed the specification. Maybe they should've mentioned something about real-time...
Admin
Uh... What about Karatsuba or Toom multiplication? Or anything taking advantage of the birthday paradox?
Where are the big clues? No sqrt's spelled out there.
Admin
could it be the answer to time travel!?
Admin
That's why I didn't make the algorithm exit early in the loop if isPrime becomes true. The above algorithm really is O(n^1.5), and really is dumb. If you added the trivial optimisation of changing "for (int o = 2; o < sqrt(i); o++) {" into "for (int o = 2; o < sqrt(i) && isPrime; o++) {" then your statement would be true.
Admin
Man, Sheena's husband was a total dick. He balanced the extra income against Sheena's misery at doing a job she had no business doing and extra burden to her coworkers, and went with the extra income?
Then again, Sheena should have had the sense to quit on her own.
Admin
Yeah, I was thinking that. Maybe it was a tax cheat? Still, that must have been several years of pure misery for poor Sheena, unless she got the hang of programming later on. You would have thought that they could have transferred her to marketing, at the very least.
Admin
Maybe the web wasn't around then, but I was using the internet when the Mac was introduced (FTP, telnet, usenet).
Remember kids, Internet != www
Admin
Admin
You guys laugh, but in the year 2002, I worked with a developer who held both B.S. and M.S. degrees in EECS from MIT (yes, the Massachusetts Institute of Technology, the most prestigious of American schools of engineering), wrote code that was not all that different from Sheena's.
To pull up a list of processed items in the database, he would retrieve all of them, iterate against the database to get the details, filter locally in the Java tier, and display them. At one point, he was making over 600 round trips to the database, and the page took 5 minutes to load. The whole thing was doable in a single trip to the database, and a stored procedure to fully support him was already written and available.
The best part is that he became the architect for the next product he worked on.
Admin
l is n/m, so ln^2 is (n/m)n^2 is (n^3)/m. If m << n, we can discard the m term for big-O notation (of course we can in any case, since big-O is an upper bound, but let's pretend we really mean big-theta), and parsing the whole file is O(n^3), as you originally claimed.
Obviously there are potential real-world examples where m is close to or even greater than n (a help file with relatively few lines), but I think it's fair to go with O(n^2) for finding a single line and O(n^3) for iterating through all of them as estimates for realistic performance.
Admin
I can't believe how many people are struggling with Big-O notation. This talk about constant 80 characters per line or accessing the first item in the help file or later is irrelevant. You are over complicating it. It doesn't matter if it's the first line or not. Similar to performing a linear search, if I have a 100 items to iterate through to find my match and luckily the item I'm looking for is the first item in the list, it doesn't change the operation to a O(1) just because I got lucky. You always assume worse case (or as others have said, "as n approaches infinity"). Looping through/iterating a collection is an O(n) algorithm.
Simply put, the nested function calls above are no different than having 3 for loops all nested together. Each loop is iterating through it's own collection, an O(n) operation. Nesting like that gives us O(n) * O(n) * O(n) = O(n^3).
Furthermore, if it was only O(n^2), we wouldn't see the performance drop off as dramatically as was stated above when they moved to later items in the help file.
Admin
On the machine in question there was no hard drive. The disk was a single sided 400K floppy. The 90,000 Disk reads in the 'line 6' example would have been a really REALLY big bottleneck on that disk drive.
Admin
Pascal was the official language for the mac up until about OS 7 or 8. I still have my books around. It was fun when they started shifting the books to C++ because all the variables had to be typecast to Pascal prior to making the system calls.
Admin
If a function f(n) can be written as a finite sum of other functions, then the fastest growing one determines the order of f(n). In particular, if a function may be bounded by a polynomial in n, then as n tends to infinity, one may disregard lower-order terms of the polynomial.
If n is the number of characters, x is the number of lines, and y is the accumulation of z in (n-z)...(n-4)+(n-3)+(n-2)+(n-1)+n (her character reading algorithm), then the formula for her total application algorithm is: x * n^2 -y
As stated above we can disregard the -y. We can then simplify: x/x * n/x *n/x = (divide all terms by x) 1 * n/x * n/x = (simplify - x/x = 1) 1 * n^2/x or just n^2/x = (simplify - 1 times anything is that thing) n^2 (simplify - the divisor is less than n and grows as linearly as n grows quadratically, so we can ignore the divisor as well for large values of n).
Therefore we have O(n^2) as lim n -> infinity.
I hope this puts that to bed now once and for all.
Admin
I am happy that one of the 'official' languages on the Mac today is another one that starts with the letter 'p' -- Python.
Captcha - darwin, how appropriate!
Admin
Your statement that the algorithm is O(n^3) is correct, but the statement that the algorithm is O(n^2) is not only correct but more precise.
Your problem is that you are not using a consistent definition of "n". Let's define variables, with multicharacter names for clarity:
lines - total number of lines within the file chars - total number of characters within the file maxlinelen - the maximum length of a line
ReadC is obviously O(chars). ReadL calls it O(maxlinelen) times, so it is O(maxlinelenchars). ReadLine calls ReadL O(lines) times, so it is O(linesmaxlinelenchars). Clearly if all lines are the same length, linesmaxlinelen = chars, so ReadLine is O(chars^2). In fact, if they are not all the same length, the same thing is true, though it's a bit trickier to show. "chars" is what everyone but you is calling "n", so O(n^2) is correct.
I'll make this O(lines*maxlinelen) => O(chars) step more obvious, since more than one person has tripped over it:
This code calls do_something only O(n) times, even though it has nested for loops. (It is also O(n^2), since Big-Oh bounds do not need to be stated as tightly as possible. If you were to state that it is T(n^2), however, you would be wrong.) This is analogous to the outer loop over lines and the middle loop over characters within a line.
That not all lines are the same length doesn't change the outcome. Still for any given call to ReadL, ReadC is called exactly once for every character from the beginning of the file to the last character of the line, inclusive.
And quoting again part of the above:
Those are two orthogonal ideas. All asymptotic notation describes an asymptote (a simpler notation the result approaches as n approaches infinity). O(...) is by definition worst-case. O(...) is by definition best-case. T(...) is both (and thus is only defined if they are the same.) It's not unusual to give a best-case time function (precise definition, not using asymptotic notation), and it's not unusual to give a best-case O(...) (as n approaches infinity).
Yeah, even at O(n^4) ReadLine(5) would only take 2.4X as long as ReadLine(4), not change from "several seconds" to "over a minute". Their claims are exaggerated and/or there's something more to it.
As I briefly mentioned in an earlier post, I doubt Sheena's was the only WTF in this system...the interface for the tiny piece they gave her was ill-considered. ReadLine(l) is just not an appropriate interface for files with variable-length lines with no indexes, as it simply must be at least O(l). They probably are looping calls to ReadLine(0) ... ReadLine(n) and obviously this interface doesn't even optimize the "I'm on line l-1, they want line l" case. So that takes them to cubic right there...who knows what else lurks.
Admin
Sorry, a couple paragraphs of my last post got mangled in a confusing way because apparently wtf's forum software transliterated my greek characters to latin ones or something.
I'll write out the names of the characters instead. That should read:
There are a couple other paragraphs that say nonsensical things like "O(n^2) is true but O(n^2) is false". In each case, the second one is supposed to be Omega(n^2).
Admin
er, Theta(n^2). sigh I can't even blame the forum software that time...I just wrote the wrong character name.
Admin
A reasonable assumption is x = Cn; that is, the average number of characters per line is constant. (0 < C < 1, but this is irrelevant.)
Then, working from your xn^2, xn^2 = Cn^3, so the complexity is O( n^3 ).
Admin
Thank you all for the totally pointless numerical analysis.
Pointless, because if you read my earlier addendum, you'd know the 128K Mac had no extra RAM for anything sane like a disk cache. So as soon as 6'th line, the poor disk would have to, for EACH CHARACTER, re-read the first block of the file, scan all of it, rad the second block, then scan it for the next character. Plus if more than a second or so had elapsed, the disk driver would have turned off the drive motor, requiring a spin-up to speed, not to mention finding the block and reading it.
Admin
Ummm, so your saying that the algorithm performs at a constant '4'??
Admin
Around these neck of the woods, everybody knows that O() notation is about performance for large values of n. So rather than waste a second, we talk about O(2) and O(3), everybody realizing it could only refer to O(n^3) et cetera.
Curiously enough, in this one case there's a couple large constant multiplier factors that kick in. Going from one disk block to another is a big factor, especially if the blocks are on different tracks, or woe is you, if the tracks are recorded at different speeds.
Admin
You mean O(n**4) presumably.
Admin
So, my question is why wasn't help text stored in the STR/STR# resource? Then tell the OS to pull it. Plus, you can support many languages that way and still change it as needed. (It's been a while since I looked at classic resources so my memory may be off on their names.) Not a WTF per se, but a huh?
Admin
I still don't buy it. Even on an original Mac, I think they had at least 512-byte blocks. At reasonable line sizes, the sixth line would still be on the first block. And I don't see how a second would have elapsed without reading a character, since you say there was no disk cache. So I don't see how there'd be any magic threshold between five and six lines.
Admin
kinds of watches and women handbags for shopping
www.replica038.com