The Daily WTF: Curious Perversions in Information Technology

2013-02-18 Reply Admin

"Fr1st".toLowerCase()

2013-02-18 Reply Admin

Y U NO ESCAPE CHARS IN CODE? :P

2013-02-18 Reply Admin

Dat validation

Hmmmm · 2013-02-18 Reply Admin

Troll:
Y U NO ESCAPE CHARS IN CODE? :P

Perhaps this was an ironic meta-WTF by the author given the second sentence, though I doubt it...

GettinSadda · 2013-02-18 Reply Admin

For added Lulz, this post only makes sense if you read the HTML source!

biziclop · 2013-02-18 Reply Admin

At least there will be no debate about what TRWTF is.

2013-02-18 Reply Admin

ESCAPE FROM THIS MADNESS

2013-02-18 Reply Admin

Time to go write some really WTFy code containing malicious Javascript and wait for it to be published.

2013-02-18 Reply Admin

F***-it Fred:
Time to go write some really WTFy code containing malicious Javascript and wait for it to be published.

phynol · 2013-02-18 Reply Admin

if(pageHTML.toLowerCase().regionMatches(index, " tag }

What C family language supports arbitrarily closing a paren with a brace? Or are we just making WTFs up as we go along?

2013-02-18 Reply Admin

But the PRE tag shuts off the HTML interpreter, doesn't it?

Hint: They're trying to train us to view source on every article.

Remy Porter · 2013-02-18 Reply Admin

Someone forgot to escape their "<". It's been fixed, and for good measure, run through a syntax highlighter so that you can see the WTFness in the code IN COLOR.

Remy Porter · 2013-02-18 Reply Admin

There honestly should be such a tag. There should also be a tag that allows you to pass its contents to a different interpreter, thus making it easier to inline binary data.

2013-02-18 Reply Admin

phynol:
Erik Gern:
if(pageHTML.toLowerCase().regionMatches(index, " tag }
What C family language supports arbitrarily closing a paren with a brace? Or are we just making WTFs up as we go along?

None.

But sometimes people forget that "pre" in HTML does not let you use angle brackets directly; they still have to be encoded as their corresponding HTML entities.
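
For illustration, a minimal sketch of that encoding step, in Java to match the article's snippet. The class and method names (HtmlEscape, escape) are made up for this example; they are not from the article or from any particular library.

```java
final class HtmlEscape {
    // Hypothetical helper: replaces the characters HTML treats as markup,
    // even inside <pre>, with their entity equivalents.
    static String escape(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Escape a line shaped like the article's code before pasting it into a page.
        System.out.println(escape("regionMatches(index, \"<img\", 0, 4)"));
    }
}
```

The ampersand has to be handled along with the brackets, otherwise text that already contains an entity gets decoded a second time by the browser.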

Remy Porter · 2013-02-18 Reply Admin

And regarding the article, this isn't a WTF. Regular expressions are expensive and difficult to maintain!

2013-02-18 Reply Admin

So you look at your code, and you think: "hmmm... maybe I shouldn't call toLowerCase() more than once on the same string".

Bang, along comes Donald Knuth and says "premature optimization is the root of all evil!" ;-)

Profiler, anyone?

Raedwald · 2013-02-18 Reply Admin

Parsing HTML with regular expressions? That never goes well.

2013-02-18 Reply Admin

So the toLowerCase is clearly a WTF. Comparing the text to every known tag is at best a borderline WTF. There are more efficient methods, but they are more complicated to implement. I can think of:

- Construct a tree-structure before processing, containing all known tags, where each node is a character. Then read each tag one character at a time while navigating the tree. (or do this implicitly, with switch, but that could be even uglier and more WTF-y)
- Search for the first non-letter character, and use the string up to that as a key into a hash table.
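
A rough sketch of the second idea (the hash table), assuming Java like the article. The names TagLookup, KNOWN_TAGS and knownTagAt are invented for this example, and the tag set is only a small illustrative subset, since the article's full list of 70+ tags isn't reproduced here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class TagLookup {
    // Illustrative subset only; not the article's actual tag list.
    static final Set<String> KNOWN_TAGS =
            new HashSet<>(Arrays.asList("a", "applet", "area", "img", "pre"));

    // Returns the lowercased name of the known tag opening at index,
    // or null if html.charAt(index) is not '<' or the name is unknown.
    static String knownTagAt(String html, int index) {
        if (index < 0 || index >= html.length() || html.charAt(index) != '<') {
            return null;
        }
        // Scan to the first non-letter character, as described above.
        // Names containing digits (h1, h2, ...) would need isLetterOrDigit instead.
        int end = index + 1;
        while (end < html.length() && Character.isLetter(html.charAt(end))) {
            end++;
        }
        // Only the few characters of the tag name are lowercased, not the page.
        String name = html.substring(index + 1, end).toLowerCase(Locale.ROOT);
        return KNOWN_TAGS.contains(name) ? name : null;
    }
}
```

One set lookup per opening bracket replaces the whole chain of comparisons, and the cost no longer depends on how many tags are known.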

snoofle · 2013-02-18 Reply Admin

article:
...it also lower cased the entire document multiple times...

So it converted the entire 1+M document to lower case 70+ times for every tag in the file? That's a lot of cpu-grinding. This generates unnecessary heat.

Forget carbon emissions; this is where global warming comes from people!

2013-02-18 Reply Admin

As a PHP newb, I'd be thankful if someone could name one of those "reliable libraries a developer could use to do the heavy lifting." A simple one, please.

2013-02-18 Reply Admin

Remy Porter:
And regarding the article, this isn't a WTF. Regular expressions are expensive and difficult to maintain!

Sure, and as The Guru told us, "the delay is a little price to pay as long as the code keeps its essence. Just put more CPU power and memory". And the boss just bent before those deep words, while we were hearing it with astonishing devotion. Not a WTF at all. Just as The Guru told us.

2013-02-18 Reply Admin

faoileag:
So you look at your code, and you think: "hmmm... maybe I shouldn't call toLowerCase() more than once on the same string".
Bang, along comes Donald Knuth and says "premature optimization is the root of all evil!" ;-)

Profiler, anyone?

Once you figure out that your code crashes, or takes a day to process a large page, the optimization is not premature anymore.

2013-02-18 Reply Admin

I actually like the first test in the sample given: it fires on all tags starting with "<a", not only the anchor tag.

Ah well, the "Do a test involving the tag"-Test will probably weed out applets, areas and the like.
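
To make the point concrete, here is a small sketch, assuming the first test has the same shape as the img check quoted elsewhere in the thread, i.e. regionMatches(index, "<a", 0, 2). The class and method names are hypothetical, and strictMatch is only a guess at what was intended.

```java
final class AnchorTest {
    // Article-style check (assumed shape): true for ANY tag whose name starts with "a".
    static boolean looseMatch(String lowerHtml, int index) {
        return lowerHtml.regionMatches(index, "<a", 0, 2);
    }

    // Stricter variant: the tag name must end right after the "a".
    static boolean strictMatch(String lowerHtml, int index) {
        if (!lowerHtml.regionMatches(index, "<a", 0, 2)) {
            return false;
        }
        int next = index + 2;
        return next >= lowerHtml.length()
                || !Character.isLetterOrDigit(lowerHtml.charAt(next));
    }
}
```

With these definitions looseMatch accepts "<applet>" and "<area>" as well as "<a href=...>", while strictMatch accepts only the last one.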

2013-02-18 Reply Admin

gnasher729:
faoileag:
So you look at your code, and you think: "hmmm... maybe I shouldn't call toLowerCase() more than once on the same string".
Bang, along comes Donald Knuth and says "premature optimization is the root of all evil!" ;-)
Once you figure out that your code crashes, or takes a day to process a large page, the optimization is not premature anymore.

Definitely not. And "Pedro the Profiler" rightfully comes to the rescue.

But storing the result of toLowerCase() in a temp var and working on that variable would be :-)

2013-02-18 Reply Admin

faoileag:
But storing the result of toLowerCase() in a temp var and working on that variable would be :-)

But storing the result of toLowerCase() in a temp var and working on that variable straightaway before the method has had a chance to choke on large pages would be.

FTFM

2013-02-18 Reply Admin

Slow yes, but who here thinks it would take 24 hours to process a single page?

snoofle · 2013-02-18 Reply Admin

Black Bart:
Slow yes, but who here thinks it would take 24 hours to process a single page?

In fairness, have you seen some of the crap generated by Frontpage?

2013-02-18 Reply Admin

Black Bart:
Slow yes, but who here thinks it would take 24 hours to process a single page?

Methinks. Can you imagine how painful it must be to lowercase Finnish text? And more than 70 times?

2013-02-18 Reply Admin

It's worse than that.

Every time a tag is found the entire page is converted to lowercase.

2013-02-18 Reply Admin

ZoomST:
Black Bart:
Slow yes, but who here thinks it would take 24 hours to process a single page?
Methinks. Can you imagine how painful it must be to lowercase Finnish text? And more than 70 times?

It's worse than that - every time a tag is found on the page the whole page is converted to lowercase. 70+ times.

2013-02-18 Reply Admin

ZoomST:
Black Bart:
Slow yes, but who here thinks it would take 24 hours to process a single page?
Methinks. Can you imagine how painful it must be to lowercase Finnish text? And more than 70 times?

You must be a Russian.

2013-02-18 Reply Admin

Bobby Tables:
ZoomST:
Black Bart:
Slow yes, but who here thinks it would take 24 hours to process a single page?
Methinks. Can you imagine how painful it must be to lowercase Finnish text? And more than 70 times?
It's worse than that - every time a tag is found on the page the whole page is converted to lowercase. 70+ times.

It's worse than that - every time an opening angle bracket is found, the whole page is converted to lowercase 70+ times, because all if-clauses are executed every time, no matter how early the current tag appears in that list of if-clauses.

That makes it N * 70+ lowercase calls, where N is the number of opening angle brackets in the page.
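
For comparison, a sketch of the same kind of scan with the lowercasing hoisted out of the loop, so it runs once per page instead of N * 70+ times. Everything here is an assumption for illustration: the class name, the countHits method, and the three patterns standing in for the article's 70+ if-clauses.

```java
import java.util.Locale;

final class LowercaseOnce {
    // Hypothetical stand-in for the article's 70+ tag patterns.
    static final String[] PATTERNS = {"<a", "<img", "<pre"};

    // Counts pattern hits at every '<', lowercasing the page exactly once.
    static int countHits(String pageHTML) {
        String lower = pageHTML.toLowerCase(Locale.ROOT); // single pass over the page
        int hits = 0;
        for (int index = lower.indexOf('<'); index >= 0;
                 index = lower.indexOf('<', index + 1)) {
            for (String p : PATTERNS) {
                // Each test only looks at p.length() characters at index.
                if (lower.regionMatches(index, p, 0, p.length())) {
                    hits++;
                }
            }
        }
        return hits;
    }
}
```

The per-bracket work is now a handful of short comparisons, so the total cost grows roughly linearly with the page size rather than with page size times the number of brackets.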

DaveK · 2013-02-18 Reply Admin

fa2k:
So the toLowerCase is clearly a WTF. Comparing the text to every known tag is at best a borderline WTF. There are more efficient methods, but they are more complicated to implement. I can think of: - Construct a tree-structure before processing, containing all known tags, where each node is a character. Then read each tag one character at a time while navigating the tree. (or do this implicitly, with switch, but that could be even uglier and more WTF-y) - Search for the first non-letter character, and use the string up to that as a key into a hash table.

If you really think that using a hash table to do string lookups is "complicated" and that sequential strcmps against every possible match is only a "borderline WTF", you should not be programming. Hash tables are about as basic as fire or the wheel.

2013-02-18 Reply Admin

DaveK:
fa2k:
So the toLowerCase is clearly a WTF. Comparing the text to every known tag is at best a borderline WTF. There are more efficient methods, but they are more complicated to implement. I can think of: - Construct a tree-structure before processing, containing all known tags, where each node is a character. Then read each tag one character at a time while navigating the tree. (or do this implicitly, with switch, but that could be even uglier and more WTF-y) - Search for the first non-letter character, and use the string up to that as a key into a hash table.
If you really think that using a hash table to do string lookups is "complicated" and that sequential strcmps against every possible match is only a "borderline WTF", you should not be programming. Hash tables are about as basic as fire or the wheel.

Or the fiery wheel. Which is all kinds of awesome!

2013-02-18 Reply Admin

Wait, does this actually work?

Featured Comment Baby!

2013-02-18 Reply Admin

DaveK:
If you really think that using a hash table to do string lookups is "complicated" and that sequential strcmps against every possible match is only a "borderline WTF", you should not be programming. Hash tables are about as basic as fire or the wheel.

Actually, with a good strcmp implementation, a dozen calls to strcmp will likely be faster than your homegrown hash implementation. Have a look at the instruction set of a newer Intel processor. There are additions to the instruction set that were specifically made because processing of XML etc. takes significant percentages of total CPU time.

2013-02-18 Reply Admin

Joe tester:

Wait, does this actually work?

Featured Comment Baby!

Works for me. Must be your fault. :)

dkf · 2013-02-18 Reply Admin

gnasher729:
Actually, with a good strcmp implementation, a dozen calls to strcmp will likely be faster than your homegrown hash implementation.

While strcmp is awesomely fast, the hashing might be a reasonable approach if the string is long (since if the data is large enough, you'll effectively flush the DCache and your performance will be back to that of main memory), depending on exactly what sort of match is desired.

2013-02-18 Reply Admin

gnasher729:
DaveK:
If you really think that using a hash table to do string lookups is "complicated" and that sequential strcmps against every possible match is only a "borderline WTF", you should not be programming. Hash tables are about as basic as fire or the wheel.
Actually, with a good strcmp implementation, a dozen calls to strcmp will likely be faster than your homegrown hash implementation. Have a look at the instruction set of a newer Intel processor. There are additions to the instruction set that were specifically made because processing of XML etc. takes significant percentages of total CPU time.

As always, it depends. With some techniques you can search for different things simultaneously (e.g. a lexer generator such as flex, which uses parallel regular expressions), so you could shave off a factor of 70 here. Specialized CPU instructions can hardly match that.

Then again, if you get rid of the quadratic complexity (i.e. converting the whole string to lower-case and possibly anything else that traverses the whole string in each loop), you can shave off a factor on the order of a million for large files, so that's clearly the more important thing here. If that's done and it's still too slow (unlikely), you can care about a measly 70x speedup next.
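
One way to drop the per-test lowercasing entirely, sketched in Java: String has a regionMatches overload that takes an ignoreCase flag, so no lowercased copy of the page is needed at all. The class and method names are made up for this example, and this is not claimed to be how the article's code was eventually fixed.

```java
final class IgnoreCaseMatch {
    // True if pattern occurs at index, compared case-insensitively,
    // without building a lowercased copy of the page.
    static boolean tagAt(String pageHTML, int index, String pattern) {
        return pageHTML.regionMatches(true, index, pattern, 0, pattern.length());
    }

    public static void main(String[] args) {
        System.out.println(tagAt("<IMG src='x'>", 0, "<img"));
    }
}
```

Each call touches only pattern.length() characters, so the whole-string traversal inside the loop disappears.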

2013-02-18 Reply Admin

if(pageHTML.toLowerCase().regionMatches(index, "<img", 0, 4)){
    //Do a test involving the <img> tag
}

But what if your code needs to be international? Do you really want to rewrite this to parse the Finnish [image] tag?

Plan ahead. Maybe you should include your list of tags expressed in every possible language, just to be sure.

2013-02-18 Reply Admin

Bobby Tables:
It's worse than that. Every time a tag is found the entire page is converted to lowercase.

It's worse than that, he's dead, Jim.

2013-02-18 Reply Admin

DaveK:
fa2k:
There are more efficient methods, but they are more complicated to implement. I can think of: - Construct a tree-structure before processing, containing all known tags, where each node is a character. Then read each tag one character at a time while navigating the tree. - Search for the first non-letter character, and use the string up to that as a key into a hash table.
If you really think that using a hash table to do string lookups is "complicated" and that sequential strcmps against every possible match is only a "borderline WTF", you should not be programming. Hash tables are about as basic as fire or the wheel.

Right out of college I worked for a giant global consulting firm with a one-word name that sounds like a sneeze. I wrote crap-tons of J2EE for lots of huge enterprise applications. At that firm, we would have been given bad marks on our review if we had implemented either of the solutions you suggest.

Speed and efficiency weren't really what our project leads cared about; making the code maintainable by cheap commodity programmers later was more their concern. If performance testing showed that the application had a bottleneck, they would just tell the client they're going to need some more infrastructure to drive the finished product.

More than once I brought a module to my lead for a code review, and in the module I had done fairly simple things, like caching the results of expensive methods, or adding a subclass so I could pass data around in logical, sensical ways, and I would be told that it was "too complicated" for future developers to understand, and would I please just code the simplest and most straightforward procedure that met the (barely coherent) specifications and not spend time thinking about how "best" to do it?

Anyway, my bitterness aside, it's entirely plausible that this code was written this way not because the developer thought it was a good idea, but because management found the good idea to be too complicated for their poor little brains.

2013-02-18 Reply Admin

gnasher729:
Have a look at the instruction set of a newer Intel processor. There are additions to the instruction set that were specifically made because processing of XML etc. takes significant percentages of total CPU time.

This is TRWTF. A general-purpose processor should not have application-specific instructions implemented in hardware.

Sometimes I wish Intel would let their engineers design the chips, instead of having the marketing department do it. (Pentium 4, I'm looking at you.)

chubertdev · 2013-02-18 Reply Admin

this

Raedwald:
Parsing HTML with regular expressions? That never goes well.

2013-02-18 Reply Admin

Thank some entity that my homework is only partially implementing the HTTP protocol... Why can't they have a nice strict spec on the web... Arbitrary white space and no enforced case.

2013-02-18 Reply Admin

The Taginator -- destroying the web one page lookup at a time.

2013-02-18 Reply Admin

faoileag:
So you look at your code, and you think: "hmmm... maybe I shouldn't call toLowerCase() more than once on the same string".
Bang, along comes Donald Knuth and says "premature optimization is the root of all evil!" ;-)

Profiler, anyone?

No, in this case you have an easy reply to Donald: "It is not optimizing, I am only following DRY!"

2013-02-18 Reply Admin

This shows that Donald's advice is still good: If you don't write sh*tty code, there is probably no need to optimize. And if you wrote sh*tty code, it won't get better if you try to optimize it. Either way rule one of optimization holds: Don't do it.

2013-02-18 Reply Admin

A. Nonymous:
sh*tty

I don't recognize that word. It isn't in my dictionary. Can someone tell me what it means?

I hope it isn't a bad word. But if it is, I'm safe. As long as I don't know what it means, your bad word won't make me think a bad thought.

However if you've made some kind of error, that other people still understand, then they're still thinking bad thoughts despite your error.

So that couldn't be it.

Still confused.

2013-02-18 Reply Admin

Joe:
A. Nonymous:
sh*tty
I don't recognize that word. It isn't in my dictionary. Can someone tell me what it means?

Probably just a typo, seems to mean shoddy.

Internet.toLowerCase
