Yahoo + POS website = fun
Last post 07-28-2008 4:17 PM by BeenThere. 13 replies.
07-24-2008 11:18 PM
Ion9


- Joined on 08-24-2006
- Posts 11
Yahoo + POS website = fun
A month or so ago, one of the websites under my care got slammed hard for a solid three weeks to a month.
What really confused all of us was the gradual increase in bandwidth without a matching increase in access_log, error_log, or Google Analytics hits, so we suspected foul play. The break in the bandwidth is when we implemented an abuse throttle that temporarily banned IPs with more than 60 connections to the server within one minute. Unfortunately, we had to make some minor "tweaks" because we were banning a number of corporate gateways and Googlebot, and those tweaks let the culprit get back into the swing of things like nothing had happened.

It was at that point that we figured out who or what was responsible. About $4,800 USD in unscheduled bandwidth charges later, it turns out Yahoo had found a circular hole in the rat's nest of mod_rewrite rules for the website. Smarter robots like Googlebot and MSN have a bunch of sanity checks to prevent this, both to save themselves money and to avoid killing what they're aggregating... but not Yahoo. Yahoo appears to get impatient and re-request a transaction if it's taking too long, exacerbating an already bad situation. To make things even more fun, looking back over the logs, its various threads would often grab the same JPG or SWF object repeatedly... just in case the file had changed in the span of a minute or so.

The obvious WTF is that our website is a POS that needs to be printed out, burned, and mailed to the genius who implemented his own version of MVC that doesn't even come close to what an MVC framework is. The other WTF, in my mind, is that Yahoo appears oblivious to the havoc it creates with a search aggregator bot that is borderline DoS'ing its targets.
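A minimal sketch of an abuse throttle like the one described above, assuming a sliding 60-second window per client IP and a temporary ban once an IP exceeds 60 connections. The class name, the 10-minute ban length, and the allowlist (standing in for the "tweaks" that exempted corporate gateways and Googlebot) are hypothetical; this is not the site's actual implementation. Timestamps are passed in explicitly to keep it easy to test.

```python
from collections import defaultdict, deque


class AbuseThrottle:
    """Hypothetical sliding-window throttle: temporarily ban an IP that opens
    more than `limit` connections within `window` seconds."""

    def __init__(self, limit=60, window=60.0, ban_seconds=600.0, allowlist=()):
        self.limit = limit
        self.window = window
        self.ban_seconds = ban_seconds      # assumed ban length
        self.allowlist = set(allowlist)     # e.g. known gateways or crawlers
        self.hits = defaultdict(deque)      # ip -> timestamps of recent connections
        self.banned_until = {}              # ip -> time at which the ban expires

    def allow(self, ip, now):
        """Record a connection from `ip` at time `now`; return False if it should be refused."""
        if ip in self.allowlist:
            return True
        if self.banned_until.get(ip, 0.0) > now:
            return False
        q = self.hits[ip]
        q.append(now)
        # Drop timestamps that have slid out of the window.
        while q and q[0] <= now - self.window:
            q.popleft()
        if len(q) > self.limit:
            self.banned_until[ip] = now + self.ban_seconds
            q.clear()
            return False
        return True
```

Note the weakness the post ran into: any allowlist broad enough to spare shared gateways and legitimate crawlers can also end up sparing the crawler that is doing the damage.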
"If it's not broken, don't even look at it funny." There are nightmares waiting for you in anything labeled "experimental".
morbiuswilters


- Joined on 01-15-2008
- Cambridge, MA
- Posts 2,634
Re: Yahoo + POS website = fun
Ion9: The other wtf in my mind is that Yahoo appears oblivious to the havoc it creates with a search aggregator bot that is borderline DoS'ing its targets.
A few years ago I had a problem with Googlebot DoSing a web app. The problem was that we had a random GET var on most requests to get around stupid caching proxies, and the bot got stuck in a loop and couldn't get out. By the time the DB server stopped accepting connections and the load hit 100, Googlebot was doing 900 GETs per second.
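Why a random GET var can trap a crawler: every response links to URLs with a fresh random value, so every URL looks new and a naive "have I seen this URL?" check never fires. A minimal sketch of normalizing URLs before deduplicating them, assuming the random variable is called `rnd` (a hypothetical name; the post doesn't say what it was):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

CACHE_BUSTERS = {"rnd"}  # hypothetical name of the random cache-busting GET variable


def normalize(url):
    """Return `url` with cache-busting parameters removed and the remaining ones sorted."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k not in CACHE_BUSTERS]
    query.sort()
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment))


def dedupe(urls):
    """Keep only the first URL for each normalized form."""
    seen, unique = set(), []
    for url in urls:
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

The same normalization can be applied on either side: a crawler can use it to spot a loop, and a site can use it to emit a single canonical link instead of endlessly varying ones.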
< pstorer> Bans don't mean shit on the forum. It's like being on the Sex Offender List. You can still entice kids into your van with candy.
Want more? Go to the IRC channel #TDWTFMafia on irc.slashnet.org.
Bush^3 vs. /O(s|b)ama Bi(n La)?den/ -- YOU DECIDE!
benryves


- Joined on 04-07-2006
- Posts 56
Re: Yahoo + POS website = fun
Yahoo's bot is also the single worst offender here -- Yahoo Slurp accounting for 183.62 MB, Googlebot accounting for 31.15 MB and MSNBot-media accounting for 16.25 MB.
I'm really not sure what it's doing to make it use nearly 6 times as much bandwidth as Google, or 11 times as much as MSN, though I'm not using mod_rewrite anywhere so I can rule out your explanation!
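The ratios quoted here can be checked directly against the reported figures; this trivial sketch just divides them (the constant names are only labels):

```python
# Bandwidth per crawler as reported in the post, in MB.
SLURP_MB = 183.62
GOOGLEBOT_MB = 31.15
MSNBOT_MEDIA_MB = 16.25

vs_google = SLURP_MB / GOOGLEBOT_MB
vs_msn = SLURP_MB / MSNBOT_MEDIA_MB
print(f"Slurp used {vs_google:.1f}x Googlebot and {vs_msn:.1f}x MSNBot-media")
```

That works out to roughly 5.9x and 11.3x, consistent with "nearly 6 times" and "11 times".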
Re: Yahoo + POS website = fun
Oooooooooooold. I once had an HTTP server behind a DSL line for some temporarily hosted data. After I linked to an image on it from a forum, many search engines started checking even the start page. That's nice, and most search engines didn't cost much traffic: they only made one connection at a time, and only about once a day. Not Yahoo, which repeatedly read the very same page over and over again, which cost a noticeable amount of bandwidth. I ended up banning Yahoo's bot with Apache rules, as the site couldn't be found on Yahoo anyway.
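Re-reading an unchanged page (or, in Ion9's case, the same JPG or SWF) is what HTTP conditional requests are meant to avoid: a well-behaved client sends `If-Modified-Since`, and the server answers `304 Not Modified` with no body. The thread doesn't establish whether Slurp sent such headers. This is a minimal sketch of the server-side check using only the standard library; the function name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return (status, headers) for a GET, given the client's If-Modified-Since
    header value (or None) and the resource's last-modified time as an aware UTC datetime."""
    # HTTP dates have one-second resolution, so compare at that granularity.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # malformed header: ignore it and send the full response
        if since is not None and since.tzinfo is not None and last_modified <= since:
            return 304, headers  # unchanged: no body needs to be sent
    return 200, headers


if __name__ == "__main__":
    modified = datetime(2008, 7, 24, 23, 18, tzinfo=timezone.utc)
    status, headers = conditional_status(None, modified)
    print(status, headers["Last-Modified"])
    print(conditional_status(headers["Last-Modified"], modified)[0])
```

This only saves bandwidth for clients that actually send the header; a bot that ignores caching will still get a full 200 response every time.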
dhromed


- Joined on 04-13-2005
- Dutchland
- Posts 2,541
Re: Yahoo + POS website = fun
I just checked my stats, fearing the worst. And What The Fuck?

| # | Hits | Hits % | Files | Files % | KBytes | KBytes % | Visits | Visits % | Hostname |
|---|------|--------|-------|---------|--------|----------|--------|----------|----------|
| 1 | 165 | 0.48% | 115 | 0.45% | 88944 | 12.39% | 58 | 0.93% | llf320038.crawl.yahoo.net |
| 2 | 202 | 0.59% | 172 | 0.67% | 71044 | 9.90% | 105 | 1.69% | llf520173.crawl.yahoo.net |
| 3 | 173 | 0.51% | 139 | 0.54% | 43039 | 6.00% | 81 | 1.30% | llf520064.crawl.yahoo.net |
| 4 | 42 | 0.12% | 28 | 0.11% | 41219 | 5.74% | 11 | 0.18% | llf320056.crawl.yahoo.net |
| 5 | 70 | 0.21% | 53 | 0.21% | 34655 | 4.83% | 28 | 0.45% | llf520125.crawl.yahoo.net |
| 6 | 106 | 0.31% | 95 | 0.37% | 31317 | 4.36% | 57 | 0.92% | llf520190.crawl.yahoo.net |
| 7 | 29 | 0.09% | 25 | 0.10% | 22313 | 3.11% | 14 | 0.23% | llf520107.crawl.yahoo.net |
| 8 | 8 | 0.02% | 8 | 0.03% | 21892 | 3.05% | 0 | 0.00% | |
| 9 | 38 | 0.11% | 22 | 0.09% | 18813 | 2.62% | 18 | 0.29% | llf520165.crawl.yahoo.net |
| 10 | 37 | 0.11% | 31 | 0.12% | 16527 | 2.30% | 1 | 0.02% | |
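Summing the Yahoo crawl hosts in the table makes the disproportion explicit: the eight named Yahoo hosts account for about 2.4% of hits but about 49% of the KBytes served. This sketch just adds up the rows as listed; the two rows without a hostname are left out.

```python
# Yahoo crawl hosts from the top-10 table: (hostname, hits, hits %, KBytes, KBytes %).
YAHOO_ROWS = [
    ("llf320038.crawl.yahoo.net", 165, 0.48, 88944, 12.39),
    ("llf520173.crawl.yahoo.net", 202, 0.59, 71044, 9.90),
    ("llf520064.crawl.yahoo.net", 173, 0.51, 43039, 6.00),
    ("llf320056.crawl.yahoo.net", 42, 0.12, 41219, 5.74),
    ("llf520125.crawl.yahoo.net", 70, 0.21, 34655, 4.83),
    ("llf520190.crawl.yahoo.net", 106, 0.31, 31317, 4.36),
    ("llf520107.crawl.yahoo.net", 29, 0.09, 22313, 3.11),
    ("llf520165.crawl.yahoo.net", 38, 0.11, 18813, 2.62),
]

hits = sum(row[1] for row in YAHOO_ROWS)
hits_pct = sum(row[2] for row in YAHOO_ROWS)
kbytes = sum(row[3] for row in YAHOO_ROWS)
kbytes_pct = sum(row[4] for row in YAHOO_ROWS)
print(f"{len(YAHOO_ROWS)} Yahoo hosts: {hits} hits ({hits_pct:.2f}%), "
      f"{kbytes} KB ({kbytes_pct:.2f}%)")
```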
— Flurp.
sentix


- Joined on 05-06-2008
- Posts 5
Re: Yahoo + POS website = fun
OperatorBastardusInfernalis: Oooooooooooold.
I know that the search engines' conquest to ass-rape the internet, one site at a time, is old... but that just makes things worse. You'd think a symbiotic relationship (search & content) wouldn't be parasitic or destructive to the techs and coders who aren't OCD about looking for flaws in their sites.
dcardani


- Joined on 11-11-2005
- Posts 49
Re: Yahoo + POS website = fun
What gets me about this is that it not only cost the OP's company $4800 in bandwidth charges, it also probably cost Yahoo additional money in bandwidth charges, too. And if they're doing this to other sites, which it appears they are, they're really screwing themselves. There's all this talk of lost ad revenue at Yahoo. I wonder how much they could make back by writing decent software?
sootzoo


- Joined on 02-12-2007
- Posts 164
Re: Yahoo + POS website = fun
All in all, this is either the stupidest thing I've read all week (and I'm porting SSDS to .NET!), or the worst trolling attempt ever. -bstorer
morbiuswilters


- Joined on 01-15-2008
- Cambridge, MA
- Posts 2,634
Re: Yahoo + POS website = fun
dcardani: What gets me about this is that it not only cost the OP's company $4800 in bandwidth charges, it also probably cost Yahoo additional money in bandwidth charges, too.
The difference between the OP's bandwidth costs and Yahoo's would be dramatic. Companies like Yahoo lease their own lines and have a static pool of bandwidth, whereas the OP is obviously paying on some kind of commitment plan. In the long run this would require Yahoo to have more lines coming in, but the cost is not nearly what you think it is, at least relative to the $4800 the OP had to pay in overages.
sentix


- Joined on 05-06-2008
- Posts 5
Re: Yahoo + POS website = fun
Yeah, the difference between a conglomerate web search company's network setup and a website with 200K unique visits would be like comparing a small office LAN to a national telecom. The site is parked in oblivion and meant to serve only about 350-500 GB a month, not 1.8 TB.
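For scale, a quick sketch of the arithmetic behind "350-500 GB a month, not 1.8 TB": the site served roughly 1.3 to 1.45 TB more than planned, about 3.6 to 5.1 times its intended volume. Treating 1.8 TB as 1800 GB (decimal units) is an assumption; a host billing in binary units would give slightly different numbers.

```python
# Planned monthly transfer vs. what was actually served, in GB.
# Assumption: 1.8 TB = 1800 GB (decimal units).
PLANNED_LOW_GB, PLANNED_HIGH_GB = 350, 500
ACTUAL_GB = 1800

overage_range = (ACTUAL_GB - PLANNED_HIGH_GB, ACTUAL_GB - PLANNED_LOW_GB)
multiple_range = (ACTUAL_GB / PLANNED_HIGH_GB, ACTUAL_GB / PLANNED_LOW_GB)
print(f"Excess: {overage_range[0]}-{overage_range[1]} GB, "
      f"i.e. {multiple_range[0]:.1f}x-{multiple_range[1]:.1f}x the planned volume")
```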
morbiuswilters


- Joined on 01-15-2008
- Cambridge, MA
- Posts 2,634
Re: Yahoo + POS website = fun
sentix: Yeah, the difference between a conglomerate web search company's network setup and a website with 200K unique visits would be like comparing a small office LAN to a national telecom. The site is parked in oblivion and meant to serve only about 350-500 GB a month, not 1.8 TB.
I should take this opportunity to point out that TRWTF is that the OP's server isn't using bandwidth throttling to prevent massive overcharges. I mean, what if I were just some jerk who wanted to cost his company a lot of money? I could sit and suck down tons of bandwidth before anyone noticed.
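The post doesn't say how that throttling should be done. One common technique is a token bucket: bytes may be sent only while "tokens" are available, and tokens refill at a fixed rate up to a burst capacity. A minimal sketch with a hypothetical class name and illustrative numbers; it could be kept per client or for the whole server, and in practice the web server's own modules or the host's traffic shaping would usually handle this rather than application code.

```python
class TokenBucket:
    """Hypothetical bandwidth limiter: refills `rate` bytes per second,
    allowing bursts of up to `capacity` bytes."""

    def __init__(self, rate, capacity):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)  # start full
        self.last = None               # time of the previous call

    def consume(self, nbytes, now):
        """Try to send `nbytes` at time `now` (seconds); return True if allowed."""
        if self.last is not None:
            # Refill for the elapsed time, never beyond the burst capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```

For example, `TokenBucket(rate=1000, capacity=5000)` lets a client burst 5000 bytes and then sustain about 1000 bytes per second; a refused request would typically be delayed or answered with an error rather than served in full.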
Dalden


- Joined on 06-18-2008
- Cape Town, South Africa
- Posts 17
Re: Yahoo + POS website = fun
OperatorBastardusInfernalis: Not Yahoo, which repeatedly read the very same page over and over again, which cost a noticeable amount of bandwidth. I ended up banning Yahoo's bot with Apache rules, as the site couldn't be found on Yahoo anyway.
If you don't want to be indexed, why not use a robots.txt file? Or doesn't Yahoo honour it anymore?
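A sketch of the robots.txt route, checked with Python's standard `urllib.robotparser`: the file shuts out Yahoo's crawler (which identifies itself with the token `Slurp`) while leaving the site open to everyone else. Keep in mind robots.txt is voluntary; this only shows what a compliant crawler would conclude, and `example.com` is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# Block Yahoo's crawler entirely; an empty Disallow allows everything for other bots.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("Slurp", "Googlebot"):
    print(agent, parser.can_fetch(agent, "http://example.com/index.html"))
```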
sentix


- Joined on 05-06-2008
- Posts 5
Re: Yahoo + POS website = fun
Dalden: If you don't want to be indexed, why not use a robots.txt file? Or doesn't Yahoo honour it anymore?
Well, we do want to be indexed: for the website's target market, it's #1 and #2 or #3 in the search results for all of its core keywords. I just wish it didn't involve the server being raped in the process.
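For a site that wants to stay indexed, robots.txt also has the non-standard `Crawl-delay` directive, which asks a crawler to wait between requests. Yahoo's Slurp was among the crawlers that honoured it, while Googlebot ignores it, so treat it as a request rather than a guarantee. A sketch, again checked with `urllib.robotparser`; the 10-second value is an assumption.

```python
from urllib.robotparser import RobotFileParser

# Stay fully indexed, but ask Slurp to pause between requests (10 s is an assumed value).
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10
Disallow:

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print("Slurp may fetch:", parser.can_fetch("Slurp", "http://example.com/page"))
print("Slurp crawl delay:", parser.crawl_delay("Slurp"))
print("Googlebot crawl delay:", parser.crawl_delay("Googlebot"))
```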
BeenThere


- Joined on 04-11-2008
- Posts 128
Re: Yahoo + POS website = fun
sentix: Well, we do want to be indexed: for the website's target market, it's #1 and #2 or #3 in the search results for all of its core keywords. I just wish it didn't involve the server being raped in the process.
Well of course you're #1: your site has thousands and thousands of pages of keyword-related content, all almost identical except for the URL, of course. Actually, after raping your server, you're lucky they didn't accuse you of "blacklisted SEO tactics" and ban your site. Abusing a poor Yahoo indexing bot just to crank your listings up - shame on you!
The mind boggles, And yet the goggles, They do nothing.