"I came across this snippet in our header file," wrote David, "it's a basic webspider detector that is used later on to record certain actions differently if $is_spider was set to 1."
"Rather than check the difference between 0 and FALSE (or use a more appropriate function), the original developer just dropped the first letter of each crawler name so that strpos doesn't return 0."
$spider_footprint = array('ooglebot', 'rawler', 'pider', 'ulliver',
                          'arvest', 'ahoo! Slurp');
foreach($spider_footprint as $spider_name)
{
    if (strpos($agent, $spider_name))
    {
        $is_spider = 1;
        break;
    }
}
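For comparison, here is a minimal sketch of the check David alludes to, where the result of the search is compared strictly against FALSE so that a match at offset 0 still counts. The function name is_spider_agent() is hypothetical, the restored first letters are a guess at what the original developer dropped, and the use of stripos() (case-insensitive) instead of strpos() is an assumption made so that both "Crawler" and "crawler" match.

```php
<?php
// Hypothetical helper; is_spider_agent() is not part of the original header file.
function is_spider_agent($agent)
{
    // Assumed full crawler names, with the dropped first letter restored.
    $spider_footprint = array('Googlebot', 'Crawler', 'Spider', 'Gulliver',
                              'Harvest', 'Yahoo! Slurp');

    foreach ($spider_footprint as $spider_name) {
        // stripos() returns an integer offset (possibly 0) on a match
        // and false on no match, so the comparison must be strict.
        if (stripos($agent, $spider_name) !== false) {
            return true;
        }
    }

    return false;
}
```

With this form, a user agent that begins with a crawler name (offset 0) is still detected, which is exactly the case the truncated names were working around.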
ee roblems or ther oders hen hey ry o igure ut hat s oing n n his ode...
Don't you realize that each of those spider names is a Registered Trademark?
If they appeared in the source code, David's company would be obliged to pay usage fees! {or post a disclaimer in the comments}
Here's a much simpler spider detection routine, which I use all the time:
if (strpos($_SERVER['HTTP_USER_AGENT'], '+http') === false)
Every well-behaved spider puts a URL, prefixed with +, into the user-agent, to give information about said spider. Any spider which isn't well-behaved is likely to spoof itself as something else entirely. Ergo, it's only worth worrying about ones which put the +http indicator into the UA.
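As a rough sketch of how that heuristic could be wired into an $is_spider flag like the one in the article: note that the condition as posted (=== false) is true when the marker is absent, so the spider case is the opposite comparison. The function name has_bot_url() is hypothetical, and falling back to an empty string when the header is missing is an assumption.

```php
<?php
// Hypothetical wrapper around the '+http' heuristic; the name is illustrative.
function has_bot_url($user_agent)
{
    // strpos() returns false only when '+http' does not occur at all;
    // a match at offset 0 returns 0, hence the strict comparison.
    return strpos($user_agent, '+http') !== false;
}

// Assumed fallback: treat a missing User-Agent header as an empty string.
$user_agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$is_spider = has_bot_url($user_agent) ? 1 : 0;
```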
Re: pider Detection
2008-08-18 13:35 • by m0ffx (unregistered)
==== : is genuinely, truly, ultimately, indisputably, beyond all reasonable doubt equal test