• Steve (unregistered)

    "Fr1st".toLowerCase()

  • Troll (unregistered)

    Y U NO ESCAPE CHARS IN CODE? :P

  • Dat validation (unregistered)

    Dat validation

  • Hmmmm (cs) in reply to Troll
    Troll:
    Y U NO ESCAPE CHARS IN CODE? :P
    Perhaps this was an ironic meta-WTF by the author given the second sentence, though I doubt it...
  • GettinSadda (cs)

    For added Lulz, this post only makes sense if you read the HTML source!

  • biziclop (cs)

    At least there will be no debate about what TRWTF is.

  • Anonymous (unregistered)

    ESCAPE FROM THIS MADNESS

  • F***-it Fred (unregistered)

    Time to go write some really WTFy code containing malicious Javascript and wait for it to be published.

  • foo (unregistered) in reply to F***-it Fred
    F***-it Fred:
    Time to go write some really WTFy code containing malicious Javascript and wait for it to be published.
  • phynol (cs)
    if(pageHTML.toLowerCase().regionMatches(index, " tag } 

    What C family language supports arbitrarily closing a paren with a brace? Or are we just making WTFs up as we go along?

  • Preakness (unregistered)

    But the PRE tag shuts off the HTML interpreter, doesn't it?

    Hint: They're trying to train us to view source on every article.

  • Remy Porter (cs) in reply to phynol

    Someone forgot to escape their "<". It's been fixed, and for good measure, run through a syntax highlighter so that you can see the WTFness in the code IN COLOR.

  • Remy Porter (cs) in reply to Preakness

    There honestly should be such a tag. There should also be a tag that allows you to pass its contents to a different interpreter, thus making it easier to inline binary data.

  • faoileag (unregistered) in reply to phynol
    phynol:
    Erik Gern:
    if(pageHTML.toLowerCase().regionMatches(index, " tag }
    What C family language supports arbitrarily closing a paren with a brace? Or are we just making WTFs up as we go along?
    None.

    But sometimes people forget that "pre" in HTML does not allow you to use angle brackets directly (without encoding them as their corresponding HTML entities).

  • Remy Porter (cs)

    And regarding the article, this isn't a WTF. Regular expressions are expensive and difficult to maintain!

  • faoileag (unregistered)

    So you look at your code, and you think: "hmmm... maybe I shouldn't call toLowerCase() more than once on the same string".

    Bang, along comes Donald Knuth and says "premature optimization is the root of all evil!" ;-)

    Profiler, anyone?

  • Raedwald (cs)

    Parsing HTML with regular expressions? That never goes well.

  • fa2k (unregistered)

    So the toLowerCase is clearly a WTF. Comparing the text to every known tag is at best a borderline WTF. There are more efficient methods, but they are more complicated to implement. I can think of:

    • Construct a tree-structure before processing, containing all known tags, where each node is a character. Then read each tag one character at a time while navigating the tree. (or do this implicitly, with switch, but that could be even uglier and more WTF-y)
    • Search for the first non-letter character, and use the string up to that as a key into a hash table.
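
    The hash-table idea in the second bullet can be sketched roughly as follows. This is only an illustration: the class name `TagLookup`, the three tags and their handler labels are made up, since the thread does not show the real list of 70+ tags. It also accepts digits in the tag name (for names like `h1`), which goes slightly beyond "first non-letter character".

```java
import java.util.Locale;
import java.util.Map;

final class TagLookup {
    // Hypothetical handler labels; the real tag list is not shown in the thread.
    private static final Map<String, String> HANDLERS = Map.of(
            "a", "anchor",
            "img", "image",
            "applet", "applet");

    /** Returns the lower-cased tag name that starts right after html[index] == '<', or "" if none. */
    static String tagNameAt(String html, int index) {
        int start = index + 1;
        int end = start;
        while (end < html.length() && Character.isLetterOrDigit(html.charAt(end))) {
            end++;
        }
        // Lower-case only the few characters of the name, not the whole page.
        return html.substring(start, end).toLowerCase(Locale.ROOT);
    }

    /** One hash lookup per tag instead of one comparison per known tag. */
    static String handlerFor(String html, int index) {
        return HANDLERS.getOrDefault(tagNameAt(html, index), "unknown");
    }

    public static void main(String[] args) {
        String page = "<P>Hi <IMG src=x> <Applet code=y>";
        for (int i = page.indexOf('<'); i >= 0; i = page.indexOf('<', i + 1)) {
            System.out.println(tagNameAt(page, i) + " -> " + handlerFor(page, i));
        }
    }
}
```

    Closing tags such as `</p>` yield an empty name here and fall through to "unknown"; a real scanner would have to handle them explicitly.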
  • snoofle (cs) in reply to Steve
    article:
    ...it also lower cased the entire document multiple times...
    So it converted the entire 1+M document to lower case 70+ times for every tag in the file? That's a lot of cpu-grinding. This generates unnecessary heat.

    Forget carbon emissions; this is where global warming comes from people!

  • Noumenon (unregistered)

    As a PHP newb, I'd be thankful if someone could name one of those "reliable libraries a developer could use to do the heavy lifting." A simple one, please.

  • ZoomST (unregistered) in reply to Remy Porter
    Remy Porter:
    And regarding the article, this isn't a WTF. Regular expressions are expensive and difficult to maintain!
    Sure, and as The Guru told us, "the delay is a little price to pay as long as the code keeps its essence. Just put more CPU power and memory". And the boss just bent before those deep words, while we were hearing it with astonishing devotion. Not a WTF at all. Just as The Guru told us.
  • gnasher729 (unregistered) in reply to faoileag
    faoileag:
    So you look at your code, and you think: "hmmm... maybe I shouldn't call toLowerCase() more than once on the same string".

    Bang, along comes Donald Knuth and says "premature optimization is the root of all evil!" ;-)

    Profiler, anyone?

    Once you figure out that your code crashes, or takes a day to process a large page, the optimization is not premature anymore.

  • faoileag (unregistered)

    I actually like the first test in the sample given: it fires on all tags starting with "<a", not only the anchor tag.

    Ah well, the "Do a test involving the tag"-Test will probably weed out applets, areas and the like.
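
    A small sketch of the pitfall, assuming the first test looks like the other ones quoted in this thread but with `"<a"` and a length of 2 (the exact line is not shown here). The class and method names are made up for illustration.

```java
import java.util.Locale;

final class PrefixMatch {
    /** The test as described above: fires on any tag whose text starts with "<a". */
    static boolean naiveIsAnchor(String html, int index) {
        return html.toLowerCase(Locale.ROOT).regionMatches(index, "<a", 0, 2);
    }

    /** Same prefix test, plus a check that the tag name actually ends after "<a". */
    static boolean isAnchor(String html, int index) {
        // Case-insensitive overload, so no lower-cased copy of the page is needed.
        if (!html.regionMatches(true, index, "<a", 0, 2)) {
            return false;
        }
        int next = index + 2;
        if (next >= html.length()) {
            return false;
        }
        char c = html.charAt(next);
        return Character.isWhitespace(c) || c == '>' || c == '/';
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"<A href=x>", "<applet code=y>", "<area shape=rect>"}) {
            System.out.println(tag + " naive=" + naiveIsAnchor(tag, 0) + " checked=" + isAnchor(tag, 0));
        }
    }
}
```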

  • faoileag (unregistered) in reply to gnasher729
    gnasher729:
    faoileag:
    So you look at your code, and you think: "hmmm... maybe I shouldn't call toLowerCase() more than once on the same string".

    Bang, along comes Donald Knuth and says "premature optimization is the root of all evil!" ;-)

    Once you figure out that your code crashes, or takes a day to process a large page, the optimization is not premature anymore.
    Definitely not. And "Pedro the Profiler" rightfully comes to the rescue.

    But storing the result of toLowerCase() in a temp var and working on that variable would be :-)

  • faoileag (unregistered) in reply to faoileag
    faoileag:
    But storing the result of toLowerCase() in a temp var and working on that variable would be :-)
    But storing the result of toLowerCase() in a temp var and working on that variable straightaway before the method has had a chance to choke on large pages would be.

    FTFM

  • Black Bart (unregistered)

    Slow yes, but who here thinks it would take 24 hours to process a single page?

  • snoofle (cs) in reply to Black Bart
    Black Bart:
    Slow yes, but who here thinks it would take 24 hours to process a single page?
    In fairness, have you seen some of the crap generated by Frontpage?
  • ZoomST (unregistered) in reply to Black Bart
    Black Bart:
    Slow yes, but who here thinks it would take 24 hours to process a single page?
    Methinks. Can you imagine how painful it must be to lowercase Finnish text? And more than 70 times?
  • Bobby Tables (unregistered) in reply to ZoomST

    It's worse than that.

    Every time a tag is found the entire page is converted to lowercase.

  • Bobby Tables (unregistered) in reply to ZoomST
    ZoomST:
    Black Bart:
    Slow yes, but who here thinks it would take 24 hours to process a single page?
    Methinks. Can you imagine how painful it must be to lowercase Finnish text? And more than 70 times?

    It's worse than that - every time a tag is found on the page the whole page is converted to lowercase. 70+ times.

  • Doctor_of_Ineptitude (unregistered) in reply to ZoomST
    ZoomST:
    Black Bart:
    Slow yes, but who here thinks it would take 24 hours to process a single page?
    Methinks. Can you imagine how painful it must be to lowercase Finnish text? And more than 70 times?

    You must be a Russian.

  • faoileag (unregistered) in reply to Bobby Tables
    Bobby Tables:
    ZoomST:
    Black Bart:
    Slow yes, but who here thinks it would take 24 hours to process a single page?
    Methinks. Can you imagine how painful it must be to lowercase Finnish text? And more than 70 times?
    It's worse than that - every time a tag is found on the page the whole page is converted to lowercase. 70+ times.
    It's worse than that - every time an opening angle bracket is found, the whole page is converted to lowercase 70+ times, because all if-clauses are executed every time, no matter how early the current tag appears in that list of if-clauses.

    That makes it N * 70+ lowercase calls, where N is the number of opening angle brackets in the page.
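
    For comparison, here is a rough sketch of two ways to avoid the N * 70+ copies: lower-case the page once before the loop, or skip the copy entirely with the case-insensitive `regionMatches` overload. The three-entry `TAGS` array is a stand-in for the real list, and the class and method names are invented for this example.

```java
import java.util.Locale;

final class LowerCaseOnce {
    // Stand-in for the 70+ tag prefixes in the original code.
    static final String[] TAGS = {"<a", "<img", "<table"};

    /** Counts '<' positions that match a known prefix, lower-casing the page exactly once. */
    static int countHoisted(String pageHTML) {
        String lower = pageHTML.toLowerCase(Locale.ROOT); // one copy, made outside the loop
        int hits = 0;
        for (int i = lower.indexOf('<'); i >= 0; i = lower.indexOf('<', i + 1)) {
            for (String tag : TAGS) {
                if (lower.regionMatches(i, tag, 0, tag.length())) {
                    hits++;
                    break; // stop at the first matching prefix
                }
            }
        }
        return hits;
    }

    /** Same count with no copy at all, using regionMatches(ignoreCase = true, ...). */
    static int countIgnoreCase(String pageHTML) {
        int hits = 0;
        for (int i = pageHTML.indexOf('<'); i >= 0; i = pageHTML.indexOf('<', i + 1)) {
            for (String tag : TAGS) {
                if (pageHTML.regionMatches(true, i, tag, 0, tag.length())) {
                    hits++;
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String page = "<A href=x><IMG src=y><p>text</p><TABLE>";
        System.out.println(countHoisted(page) + " " + countIgnoreCase(page));
    }
}
```

    Either variant still compares against every prefix in turn; it only removes the repeated whole-page copies.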

  • DaveK (cs) in reply to fa2k
    fa2k:
    So the toLowerCase is clearly a WTF. Comparing the text to every known tag is at best a borderline WTF. There are more efficient methods, but they are more complicated to implement. I can think of: - Construct a tree-structure before processing, containing all known tags, where each node is a character. Then read each tag one character at a time while navigating the tree. (or do this implicitly, with switch, but that could be even uglier and more WTF-y) - Search for the first non-letter character, and use the string up to that as a key into a hash table.
    If you really think that using a hash table to do string lookups is "complicated" and that sequential strcmps against every possible match is only a "borderline WTF", you should not be programming. Hash tables are about as basic as fire or the wheel.
  • Anon (unregistered) in reply to DaveK
    DaveK:
    fa2k:
    So the toLowerCase is clearly a WTF. Comparing the text to every known tag is at best a borderline WTF. There are more efficient methods, but they are more complicated to implement. I can think of: - Construct a tree-structure before processing, containing all known tags, where each node is a character. Then read each tag one character at a time while navigating the tree. (or do this implicitly, with switch, but that could be even uglier and more WTF-y) - Search for the first non-letter character, and use the string up to that as a key into a hash table.
    If you really think that using a hash table to do string lookups is "complicated" and that sequential strcmps against every possible match is only a "borderline WTF", you should not be programming. Hash tables are about as basic as fire or the wheel.

    Or the fiery wheel. Which is all kinds of awesome!

  • Joe tester (unregistered)

    Wait, does this actually work?

    Featured Comment Baby!

  • gnasher729 (unregistered) in reply to DaveK
    DaveK:
    If you really think that using a hash table to do string lookups is "complicated" and that sequential strcmps against every possible match is only a "borderline WTF", you should not be programming. Hash tables are about as basic as fire or the wheel.
    Actually, with a good strcmp implementation, a dozen calls to strcmp will likely be faster than your homegrown hash implementation. Have a look at the instruction set of a newer Intel processor. There are additions to the instruction set that were specifically made because processing of XML etc. takes significant percentages of total CPU time.
  • foo (unregistered) in reply to Joe tester
    Joe tester:

    Wait, does this actually work?

    Featured Comment Baby!

    Works for me. Must be your fault. :)
  • dkf (cs) in reply to gnasher729
    gnasher729:
    Actually, with a good strcmp implementation, a dozen calls to strcmp will likely be faster than your homegrown hash implementation.
    While strcmp is awesomely fast, hashing might be a reasonable approach if the string is long (since if the data is large enough, you'll effectively flush the DCache and your performance will be back to that of main memory), depending on exactly what sort of match is desired.
  • foo (unregistered) in reply to gnasher729
    gnasher729:
    DaveK:
    If you really think that using a hash table to do string lookups is "complicated" and that sequential strcmps against every possible match is only a "borderline WTF", you should not be programming. Hash tables are about as basic as fire or the wheel.
    Actually, with a good strcmp implementation, a dozen calls to strcmp will likely be faster than your homegrown hash implementation. Have a look at the instruction set of a newer Intel processor. There are additions to the instruction set that were specifically made because processing of XML etc. takes significant percentages of total CPU time.
    As always, it depends. With some techniques you can search for different things simultaneously (e.g. a lexer generator such as flex, which uses parallel regular expressions), so you could shave off a factor of 70 here. Specialized CPU instructions can hardly match that.

    Then again, if you get rid of the quadratic complexity (i.e. converting the whole string to lower-case and possibly anything else that traverses the whole string in each loop), you can shave off a factor on the order of a million for large files, so that's clearly the more important thing here. If that's done and it's still too slow (unlikely), you can care about a measly 70x speedup next.

  • Huck Finn (unregistered)
    if(pageHTML.toLowerCase().regionMatches(index, "<img", 0, 4)){ do a test involving the <img> tag }
    But what if your code needs to be international? Do you really want to rewrite this to parse the Finnish tag?

    Plan ahead. Maybe you should include your list of tags expressed in every possible language, just to be sure.

  • Jazz (unregistered) in reply to Bobby Tables
    Bobby Tables:
    It's worse than that. Every time a tag is found the entire page is converted to lowercase.

    It's worse than that, he's dead, Jim.

  • Jazz (unregistered) in reply to DaveK
    DaveK:
    fa2k:
    There are more efficient methods, but they are more complicated to implement. I can think of: - Construct a tree-structure before processing, containing all known tags, where each node is a character. Then read each tag one character at a time while navigating the tree. - Search for the first non-letter character, and use the string up to that as a key into a hash table.
    If you really think that using a hash table to do string lookups is "complicated" and that sequential strcmps against every possible match is only a "borderline WTF", you should not be programming. Hash tables are about as basic as fire or the wheel.

    Right out of college I worked for a giant global consulting firm with a one-word name that sounds like a sneeze. I wrote crap-tons of J2EE for lots of huge enterprise applications. At that firm, we would have been given bad marks on our review if we had implemented either of the solutions you suggest.

    Speed and efficiency weren't really what our project leads cared about; making the code maintainable by cheap commodity programmers later was more their concern. If performance testing showed that the application had a bottleneck, they would just tell the client they're going to need some more infrastructure to drive the finished product.

    More than once I brought a module to my lead for a code review, and in the module I had done fairly simple things, like caching the results of expensive methods, or adding a subclass so I could pass data around in logical, sensical ways, and I would be told that it was "too complicated" for future developers to understand, and would I please just code the simplest and most straightforward procedure that met the (barely coherent) specifications and not spend time thinking about how "best" to do it?

    Anyway, my bitterness aside, it's entirely plausible that this code was written this way not because the developer thought it was a good idea, but because management found the good idea to be too complicated for their poor little brains.

  • Jazz (unregistered) in reply to gnasher729
    gnasher729:
    Have a look at the instruction set of a newer Intel processor. There are additions to the instruction set that were specifically made because processing of XML etc. takes significant percentages of total CPU time.

    This is TRWTF. A general-purpose processor should not have application-specific instructions implemented in hardware.

    Sometimes I wish Intel would let their engineers design the chips, instead of having the marketing department do it. (Pentium 4, I'm looking at you.)

  • chubertdev (cs)

    this

    Raedwald:
    Parsing HTML with regular expressions? That never goes well.
  • Rnd( (unregistered)

    Thank some entity that my homework is only partially implementing the HTTP protocol... Why can't they have a nice strict spec on the web... Arbitrary whitespace and no case enforcement.

  • Gary Olson (unregistered)

    The Taginator -- destroying the web one page lookup at a time.

  • A. Nonymous (unregistered) in reply to faoileag
    faoileag:
    So you look at your code, and you think: "hmmm... maybe I shouldn't call toLowerCase() more than once on the same string".

    Bang, along comes Donald Knuth and says "premature optimization is the root of all evil!" ;-)

    Profiler, anyone?

    No, in this case you have an easy reply to Donald: "It is not optimizing, I am only following DRY!"

  • A. Nonymous (unregistered) in reply to A. Nonymous

    This shows that Donald's advice is still good: If you don't write sh*tty code, there is probably no need to optimize. And if you wrote sh*tty code, it won't get better if you try to optimize it. Either way rule one of optimization holds: Don't do it.

  • Joe (unregistered) in reply to A. Nonymous
    A. Nonymous:
    sh*tty
    I don't recognize that word. It isn't in my dictionary. Can someone tell me what it means?

    I hope it isn't a bad word. But if it is, I'm safe. As long as I don't know what it means, your bad word won't make me think a bad thought.

    However if you've made some kind of error, that other people still understand, then they're still thinking bad thoughts despite your error.

    So that couldn't be it.

    Still confused.

  • A. Nonymous (unregistered) in reply to Joe
    Joe:
    A. Nonymous:
    sh*tty
    I don't recognize that word. It isn't in my dictionary. Can someone tell me what it means?

    Probably just a typo, seems to mean shoddy.

Leave a comment on “Internet.toLowerCase”
