• minty (unregistered) in reply to tgape
    tgape:
    MET:
    Nikolai:
    Actually, this indicates that the developer knows what he is doing and tries to write the most efficient code possible (the kind of thing a lot of modern software engineers lack). Definitely not WTF.
    In my 13 years of experience a focus on efficiency first is always the mark of a n000b. After a while most realise that correctness and maintainability are almost always more important, and only a very few places need to be coded as if every cycle counts. I agree a good developer should know how to code very efficiently, I just think they should know not to do so most of the time.

    While I agree that overly focusing on efficiency is the mark of a n00b, I certainly hope that you have some sense beyond that of your words.

    Have to agree with @tgape there. While premature optimization is the root of all evil, being an idiot is the root of all slow programs.

    Noobs just code by accident. They get something working and carefully step away lest it break. Since they don't know what they are doing, their attempts at optimization are sure to fail and just make the code YAWTF.

    Good programmers (hopefully) try to write decent code all of the time, and part of that is avoiding things that they know are slow or performance killers. It's not a premature optimization if you're not writing code you know to be bad.

    If that was the case, @MET, would you punish everyone using a StringBuilder without a performance test to validate it was necessary?

  • mutax (unregistered) in reply to m0ffx

    auuugghh my eyes!

    ===: is php :(

    php is the language:

    if ($a===true) { ... } else if ($a===false) { ... } else { fileNotFound(); }

    :((

  • more randomer than you (unregistered)

    never coded PHP, but a 5 second google search would have me believe that the naive way of doing this properly would be:

    $spider_footprint = array('googlebot', 'crawler',...); if (in_array(strtolower($agent), $spider_footprint)) { $is_spider = 1; }
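One caveat on the snippet above: in_array() tests whether the whole lowercased agent string equals one of the list entries, so it would not fire on a full user-agent header that merely contains "googlebot". A minimal sketch of a substring variant follows. The helper name is_spider_agent and the third list entry are assumptions for illustration, since the original list is elided.

```php
<?php
// Illustrative footprint list; the original post elides the real one.
$spider_footprint = ['googlebot', 'crawler', 'spider'];

// Hypothetical helper: true if any footprint occurs anywhere in the agent,
// compared case-insensitively.
function is_spider_agent(string $agent, array $footprints): bool
{
    foreach ($footprints as $footprint) {
        // stripos() returns an offset (possibly 0) or false, hence !==.
        if (stripos($agent, $footprint) !== false) {
            return true;
        }
    }
    return false;
}

$agent = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';

// Whole-string membership test: the full header is not a list entry.
$exact = in_array(strtolower($agent), $spider_footprint);

// Substring test: "googlebot" occurs inside the header.
$substring = is_spider_agent($agent, $spider_footprint);

var_dump($exact, $substring);
```

With this agent string the membership test is false and the substring test is true.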

  • Rob (unregistered) in reply to m0ffx

    I wonder what the worst case complexity on that comparison is; perhaps we should just check if it's somewhat equal and assume statistical luck.

  • more randomer than you (unregistered) in reply to W. Snapper
    W. Snapper:
    it couldn't be. All the states were known. If you're writing code that tests the validity of every single read from the database every place you use it, then you're doing it wrong.

    Is there an image meme for when something goes completely over someone's head? Typing "woosh" sometimes doesn't do it enough justice.

  • (cs) in reply to Jay
    Jay:
    dpm:
    W. Snapper:
    I just did this on a recent project, to find the status of account applications. Their database had both "Approved" and "approved," "declined" and "Declined," etc.

    Why would I not simply omit the first letter? And for those who are somehow claiming clarity, is there anyone who didn't instantly understand the code?

    status = "Reclined". Now what?

    So if a new browser hits the market that's named "Foogle" or "Yebcrawler", this code will break. I'd take a bet on the odds of that happening any time soon.

    That's hardly the point. Your profession is Software Engineering. You always cover your ass.

    Like it's been pointed out a hundred times in the comments already, stripos would do just fine. What's with this dropping the first character bullshit? An efficiency hack has a time and place, and this is not it.

  • Travis (unregistered)

    I actually had this problem a while back when trying to do substring matching...

    Ended up using stristr() instead.

  • Minh (unregistered) in reply to m0ffx

    ===== is so equal that it overflows into testing inequality! So to prevent against such overflows, you should use a YesNoFactory.

  • Vollhorst (unregistered) in reply to m0ffx
    m0ffx:
    Steve:
    = : assignment == : is equal test === : is really f****ng equal test
    ==== : is genuinely, truly, ultimately, indisputably, beyond all reasonable doubt equal test
    No, ==== is File not Found
  • Access boy (unregistered) in reply to fmobus
    fmobus:

    The Real WTF(tm) is a library function returning int or bool. It should rather behave like C's strpos, Java's indexOf, Python's find(); they all return -1 if the haystack does not contain the needle. It makes more sense that way: you're testing the position of a substring, which is a number. A boolean would be expected if you're testing if string contains substring, regardless of position.

    Great, a function returning Unsigned Int, except when it returns Signed Int.

    The real WTF is that people still think in-band signalling is appropriate 40 years after Captain Crunch.

  • (cs) in reply to Access boy
    Access boy:
    Great, a function returning Unsigned Int, except when it returns Signed Int.

    The real WTF is that people still think in-band signalling is appropriate 40 years after Captain Crunch.

    You try doing something else in a language without sane variadic functions.
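For readers following the in-band signalling point: PHP's strpos() returns an int offset on a hit and false on a miss, so a hit at offset 0 looks like a miss under a loose truthiness check. A minimal sketch of the pitfall; the helper name contains_substring and the sample agent string are made up for illustration.

```php
<?php
// Hypothetical helper (not from the thread): wraps strpos() so callers
// get a plain bool instead of the int-or-false return value.
function contains_substring(string $haystack, string $needle): bool
{
    // Compare with !== so that offset 0 is not confused with false.
    return strpos($haystack, $needle) !== false;
}

// Sample value for illustration only.
$agent = 'Googlebot/2.1';

// Loose check: the match is at offset 0, which casts to false.
$loose = (bool) strpos($agent, 'Google');

// Strict check: distinguishes offset 0 from a miss.
$strict = contains_substring($agent, 'Google');

var_dump($loose, $strict);
```

Wrapping the call in a bool-returning helper is one way to keep the sentinel value out of calling code.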

  • Bob Holness (unregistered) in reply to m0ffx

    ===== : is irrigation system

  • Steve H (unregistered) in reply to ChrisB
    ChrisB:
    if you Read The Fine Manual for strstr, you'll find the following message:
    Note: If you only want to determine if a particular needle occurs within haystack , use the faster and less memory intensive function strpos() instead.

    No, you won't. Either way, substr_count() is actually designed for the job unlike either of those two.

  • Inglorion (unregistered) in reply to m0ffx

    Actually, it makes sense to distinguish between different kinds of equality. For example, Common Lisp has numerous equality operators. To name just a few (with rough semantics in parentheses): eq (is the same object as), eql (has the same value as), equalp (has the same value or contains the same values as), string= (case-sensitive string comparison), string-equal (case-insensitive string comparison).

  • wombat (unregistered) in reply to m0ffx
    m0ffx:
    Steve:
    = : assignment == : is equal test === : is really f****ng equal test
    ==== : is genuinely, truly, ultimately, indisputably, beyond all reasonable doubt equal test

    does ==== == === ?

    or ==== === === ?

  • me (unregistered)

    This is also a nice way to not have to check for upper or lower case...

  • Jug (unregistered) in reply to ParkinT

    What about open-licensed code, e.g. under a BSD license? I know YUI is licensed under BSD but has named its global JavaScript object 'YAHOO'.

  • TInkerghost (unregistered) in reply to James
    James:
    And giving a negative length to substr should also generate an error, so whether it returns "-1" or "false", substr *should* bomb out.
    Depends; negative lengths and negative positions do have a purpose in some languages. E.g., grabbing the last 4 digits of a CC#: $l4 = substr($ccn, -4, 4); This works regardless of the format the CC was entered in, and to me is more readable than: $l4 = substr($ccn, strlen($ccn) - 4, 4);

    I have to admit I can't come up with a rationale for using negative lengths off the top of my head, but I suppose you could make a similar claim for readability.
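As a rough illustration of the point about negative arguments, here is how PHP's substr() treats them. A negative start counts from the end of the string, and a negative length stops that many characters before the end, which is one plausible use: dropping a fixed-size suffix. The sample number is made up, and strlen($ccn) - 4 is the positive offset that corresponds to a start of -4.

```php
<?php
// Sample value is invented for illustration; any entry format works.
$ccn = '4111-1111-1111-1234';

// Negative start: take the last four characters.
$last4 = substr($ccn, -4);

// Equivalent positive-offset form.
$last4_long = substr($ccn, strlen($ccn) - 4, 4);

// Negative length: everything except the last four characters.
$without_last4 = substr($ccn, 0, -4);

var_dump($last4, $last4_long, $without_last4);
```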

  • (cs) in reply to Jay
    Jay:
    So if a new browser hits the market that's named "Foogle" or "Yebcrawler", this code will break. I'd take a bet on the odds of that happening any time soon.

    Said the lead COBOL programmer who decided that there was no way COBOL code would last long, and therefore two digit years were a good way to save space.

  • (cs) in reply to minty
    minty:
    tgape:
    MET:
    Nikolai:
    Actually, this indicates that the developer knows what he is doing and tries to write the most efficient code possible (the kind of thing a lot of modern software engineers lack). Definitely not WTF.
    In my 13 years of experience a focus on efficiency first is always the mark of a n000b. After a while most realise that correctness and maintainability are almost always more important, and only a very few places need to be coded as if every cycle counts. I agree a good developer should know how to code very efficiently, I just think they should know not to do so most of the time.

    While I agree that overly focusing on efficiency is the mark of a n00b, I certainly hope that you have some sense beyond that of your words.

    Have to agree with @tgape there. While premature optimization is the root of all evil, being an idiot is the root of all slow programs.

    Good programmers (hopefully) try to write decent code all of the time, and part of that is avoiding things that they know are slow or performance killers. It's not a premature optimization if you're not writing code you know to be bad.

    If that was the case, @MET, would you punish everyone using a StringBuilder without a performance test to validate it was necessary?

    No I wouldn't. Not using StringBuilder would be the opposite extreme to premature optimisation; the mistake we see so often on this site of not using good library facilities where these already exist. I agree with tgape's post and your comments here. What I was railing against was programmers who consider efficiency first every time. I just didn't have the time to write such a detailed explanation!

    It's nice to write a post that generates so much traffic though ;)

  • Andrey (unregistered) in reply to plaidfluff

    You are wrong!

    Here are some examples:

    Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
    Yandex/2.01.000 (compatible; Win16; Dyatel; Z)

    Do you think Yahoo has an ill-behaved spider?

  • Jim McDish (unregistered)

    Brilliant! Personally, I always kinda liked Spiders! LOL

    RD www.FireMe.To/udi

  • Jay (unregistered) in reply to donniel
    donniel:
    Jay:
    dpm:
    W. Snapper:
    I just did this on a recent project, to find the status of account applications. Their database had both "Approved" and "approved," "declined" and "Declined," etc.

    Why would I not simply omit the first letter? And for those who are somehow claiming clarity, is there anyone who didn't instantly understand the code?

    status = "Reclined". Now what?

    So if a new browser hits the market that's named "Foogle" or "Yebcrawler", this code will break. I'd take a bet on the odds of that happening any time soon.

    That's hardly the point. Your profession is Software Engineering. You always cover your ass.

    Like it's been pointed out a hundred times in the comments already, stripos would do just fine. What's with this dropping the first character bullshit? An efficiency hack has a time and place, and this is not it.

    Wow, I'm surprised how many negative responses I got to that post.

    Let me see if I can sum up the criticism:

    The scenario: We are checking a string that comes from an external source and which has no specific formatting rules. We want to identify strings that represent a certain attribute, i.e. robot vs browser, that is not explicitly identified by any rigorous attribute of the string. We decide to search for some likely substrings within that string in the hopes of making this identification reasonably reliably. So for example, we want to catch "Spider" and "spider".

    Excellent, praise-worthy solution: Look for a match against the 6 characters "spider" using a case-insensitive compare.

    Ignorant, stupid, worthless solution: Look for the 5 characters "pider" with a case-sensitive compare.

    The reasoning here is apparently that the second solution might fail because someone might someday create a browser named "Tpider" which would then incorrectly look like a robot, while the first solution would handle this correctly. But of course it is absolutely impossible that anyone would ever create a browser called "InSPIderation", which would generate incorrect results on the first solution but correct results on the second solution.

    When testing against external data that is not formulated according to a well-defined spec, any solution is inherently unreliable. The best you can do is look at examples of real data and come up with something that works with all the cases you are able to identify.

    I'm not saying I would have used the "pider" solution, but it is not objectively worse than any other solution I have seen proposed.
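The trade-off described in this post can be checked directly. A minimal sketch comparing the two approaches on the names mentioned above; both function names are hypothetical, and the test strings are the post's own invented examples.

```php
<?php
// Sketches of the two approaches argued over in the thread.
// Function names are hypothetical.

// Case-insensitive search for the full word.
function matches_case_insensitive(string $agent): bool
{
    return stripos($agent, 'spider') !== false;
}

// Case-sensitive search with the first letter dropped.
function matches_dropped_first_letter(string $agent): bool
{
    return strpos($agent, 'pider') !== false;
}

$agents = ['Spider', 'spider', 'Tpider', 'InSPIderation'];
foreach ($agents as $agent) {
    printf(
        "%-14s stripos: %-5s pider: %s\n",
        $agent,
        var_export(matches_case_insensitive($agent), true),
        var_export(matches_dropped_first_letter($agent), true)
    );
}
```

Both approaches accept "Spider" and "spider"; they disagree on "Tpider" (only the "pider" search matches) and on "InSPIderation" (only the case-insensitive search matches), which is the symmetry the post is pointing at.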

  • (cs) in reply to benh
    benh:
    Where is the WTF? It's not the most visually appealing, but it would clearly work and is not too roundabout.

    TRWTF is the people who think this "clearly" works. Granted, I don't have the spec; perhaps they want to find things that match Tpider...

  • (cs)

    Stop the bus.

    This is PHP we are talking about.

    Surely there is a SpiderDetectionAndCoffeeBeanOptimization module, and he could just import that and use the currentUserAgentIsASpider() function.

    I mean, really!

  • Emil Vikström (unregistered) in reply to Joon
    Joon:
    Surely there is a SpiderDetectionAndCoffeeBeanOptimization module, and he could just import that and use the currentUserAgentIsASpider() function.
    Almost correct, except that it's not an external module. get_browser() is a builtin function and with a good browscap file for your PHP installation you should be able to use this code:
    $browser = get_browser($agent, true);
    if($browser['crawler']) {
       $is_spider = 1;
    }

    This code will catch other spiders too, which may or may not be desirable.

  • Pantero Blanco (unregistered)

    I don't really see a WTF here. Sure, in the unlikely event that "Foogle" comes out, someone will have to rewrite the code...Which shouldn't prove to be hard at all. No one here had trouble seeing what this guy was doing, right?

    It wasn't the best way to do it, but it's a minor mistake at worst.

  • omar (unregistered)

    is it wrong that I misread the array name as $pider_footprint even before I read the rest of the article?

  • Vroomfundel (unregistered) in reply to MET

    Ain't that smart! But it's wrong!

    Using StringBuilder explicitly results in awfully cumbersome code, and string concatenation compiles into StringBuilder anyway.

    So, according to the quoted posts, anyone using StringBuilder is a n00b

  • Vroomfundel (unregistered) in reply to MET
    MET:
    minty:
    tgape:
    MET:
    Nikolai:
    Actually, this indicates that the developer knows what he is doing and tries to write the most efficient code possible (the kind of thing a lot of modern software engineers lack). Definitely not WTF.
    In my 13 years of experience a focus on efficiency first is always the mark of a n000b. After a while most realise that correctness and maintainability are almost always more important, and only a very few places need to be coded as if every cycle counts. I agree a good developer should know how to code very efficiently, I just think they should know not to do so most of the time.

    While I agree that overly focusing on efficiency is the mark of a n00b, I certainly hope that you have some sense beyond that of your words.

    Have to agree with @tgape there. While premature optimization is the root of all evil, being an idiot is the root of all slow programs.

    Good programmers (hopefully) try to write decent code all of the time, and part of that is avoiding things that they know are slow or performance killers. It's not a premature optimization if you're not writing code you know to be bad.

    If that was the case, @MET, would you punish everyone using a StringBuilder without a performance test to validate it was necessary?

    No I wouldn't. Not using StringBuilder would be the opposite extreme to premature optimisation; the mistake we see so often on this site of not using good library facilities where these already exist. I agree with tgape's post and your comments here. What I was railing against was programmers who consider efficiency first every time. I just didn't have the time to write such a detailed explanation!

    It's nice to write a post that generates so much traffic though ;)

    Ooops, forgot to quote.

  • Alex "pHARMa" Zarubin (unregistered)

    Probably the original code writer just didn't know whether or not to capitalize the first letter of each probable spider...

    Nevertheless, the word pider in Russian means someone sexually incomprehent and having unordinary sexual oriantation (in the bad meaning of this sentence, of course)...

    So, who knows what David meant under pider search - gay search or smth else...

  • Megamannen (unregistered)

    He is probably thinking "I don't know if it's Google or 'google', better safe than sorry"

  • Anonymous (unregistered) in reply to m0ffx

    ===== : Case-sensitive version of ====

  • (cs) in reply to m0ffx

    Is === equivalent to:

    if(value==otherValue&&value==otherValue){}

    Or

    while(true) { if(value==otherValue) { //do stuff break; } else { value=otherValue; } }

    I'd like to know before I start incorporating it into my source.

  • Friday (unregistered)

    cider?

    ps: new fav captcha of all time: transverbero !

