Every so often, Bob B. observed that his company's e-commerce site would crash. Hard. No one had any clue as to why it happened, but everyone knew how to fix it: restart the IIS and SQL Server processes and, voilà, within a minute the site would be up and running again.
Like the owner of an old car with a few quirks, the company worried that tinkering with the application might make things worse. But after a few months and a handful of customer complaints, Bob was permitted to investigate the issue, so long as he wasn't too intrusive.
Waiting for Failure
The first problem Bob encountered was with the home-grown logging module. Whenever the application crashed, the logger crashed right along with it, leaving Bob with a log filled with "error occurred while logging an error" messages. A few weeks and a crash or two later, Bob fixed the logging code and deployed it to production.
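That kind of repair boils down to one rule: the logger itself must never throw. A minimal sketch of the idea, with hypothetical names and paths (the story doesn't show Bob's actual code):

using System;
using System.IO;

// A logger that can never take the application down with it. If writing
// the entry fails, the failure is swallowed rather than rethrown.
public static class SafeLogger
{
    private const string LogPath = @"C:\logs\app.log"; // assumed location

    public static void LogError(string message, Exception ex)
    {
        try
        {
            File.AppendAllText(LogPath,
                $"{DateTime.UtcNow:o} ERROR {message}: {ex}{Environment.NewLine}");
        }
        catch
        {
            // Deliberately empty: an "error occurred while logging an
            // error" must not crash along with the app it reports on.
        }
    }
}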
It didn't take long for another crash to occur. Bob dove right into the log files and saw the dreaded "Server Out Of Memory" error coming from SQL Server. A Google search suggested that installing a service pack would most certainly fix things. So, after cajoling the bosses into letting him upgrade SQL Server, Bob had the service pack installed. Then it became a waiting game to see whether the real problem would resurface.
A tense few weeks passed without a single crash being reported, and Bob assumed the service pack had fixed the problem. Then the site crashed again with the same error message: "Server Out Of Memory." Bob started digging further and noticed that the server was, indeed, out of memory. The reason was pretty clear, too: there were nearly 2,000,000 active visitor sessions open.
For a niche e-commerce Web site with an average of a thousand shoppers a day, two million sessions were far out of the ordinary. Bob wondered about a Denial of Service (DoS) attack, but quickly ruled it out; no one would bother to DoS their site.
Going Around in Circles
Bob started looking through the IIS log files from around the time of the crash, but saw no promising leads; it looked like an everyday log file. After a half hour of reading line after line, Bob held down the PageDown key while he tried to think of a better approach. That's when he noticed a slew of requests from the same IP address. Cross-referencing the request headers, he saw that the IP address belonged to an AOL user who seemed to be browsing from somewhere in Ohio:
66.77.93.50 - [08:34:29] "GET /access?action=forward&uri=%2Ferror.aspx HTTP/1.1" 302 - "-" "-" "-"
66.77.93.50 - [08:34:29] "GET /error.aspx HTTP/1.1" 302 - "-" "-" "-"
Pieces started coming together. Some Web surfer from Ohio got into an infinite redirect loop that was creating a new session with each iteration. Apparently, the AOL user was patient enough to let that loop continue for almost 11 hours.
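The arithmetic from the logs even fits: two million sessions over eleven hours comes to roughly fifty new sessions a second, well within reach of a tight redirect loop on a fast connection. The pile-up itself follows from how cookie-based sessions work, sketched here in generic form (not the site's actual code):

using System;
using System.Collections.Generic;

// Generic sketch of cookie-based session tracking. A client with cookies
// disabled never echoes its session ID back, so every request looks
// brand-new and allocates a fresh session.
class SessionStore
{
    private readonly Dictionary<string, object> sessions =
        new Dictionary<string, object>();

    public string GetOrCreate(string cookieSessionId)
    {
        // Returning visitor: the browser sent its session ID back.
        if (cookieSessionId != null && sessions.ContainsKey(cookieSessionId))
            return cookieSessionId;

        // Cookieless client: mint a brand-new session on every request,
        // roughly fifty times a second for eleven hours straight.
        string id = Guid.NewGuid().ToString("N");
        sessions[id] = new object(); // session state would live here
        return id;
    }
}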
By disabling cookies in his browser and typing in a specially crafted URL, Bob confirmed that he could trigger an endless loop of redirection himself.
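The same confirmation is easy to script. A sketch of the idea, with a placeholder host and an arbitrary hop limit (only the path comes from the logs): follow each 302 by hand, send no cookies, and see whether the chain ever ends.

using System;
using System.Net;

// Follows redirects manually, sending no cookies, to confirm the loop.
class RedirectLoopCheck
{
    static void Main()
    {
        string url = "http://shop.example.com/access?action=forward&uri=%2Ferror.aspx";

        for (int hop = 1; hop <= 10; hop++)
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.AllowAutoRedirect = false; // inspect each 302 ourselves

            // (A 4xx/5xx response would throw a WebException here.)
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                Console.WriteLine($"{hop}: {(int)response.StatusCode} {url}");
                if (response.StatusCode != HttpStatusCode.Found)
                    return; // not a redirect, so no loop

                // Resolve the Location header against the current URL.
                url = new Uri(new Uri(url), response.Headers["Location"]).ToString();
            }
        }

        Console.WriteLine("Still bouncing after 10 hops: that's the loop.");
    }
}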
But the mystery was how the loop had been initiated in the first place. Bob dug until he found the first log entry from the user:
66.77.93.50 - [08:34:29] "GET /favicon.ico HTTP/1.1" 302 - "-" "-" "-"
It was for the favicon, that small icon that appears next to the site's address in the browser. Even stranger, it was the Ohioan's first and, aside from the redirects, only request.
In the end, Bob figured out exactly how the problem had happened: some random visitor using an older version of AOL had bookmarked their site. The user wasn't trying to visit the site, let alone waiting 11 hours for a page to come up. He or she just happened to have the AOL browser open, which would periodically attempt to update the favicons for its bookmarked sites and then diligently follow the endless redirects, exactly as ordered.
Bob quickly added a favicon.ico to the root folder and patched up the infinite-redirect problem. The crashing went away and never returned. But for some time, the company's Web site had been at the mercy of the browsing habits of some random family living in Ohio.
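As for the redirect patch, it needs nothing fancier than a guard that refuses to forward the error page to itself. A sketch of that idea (assumed logic and names, not Bob's actual fix):

using System;
using System.Web;

// Breaks the 302 chain: if the error page is what failed, answer with a
// plain 500 instead of bouncing the client back into the forwarder.
public static class ErrorForwarder
{
    public static void HandleFailedRequest(HttpContext context)
    {
        if (context.Request.Path.Equals("/error.aspx", StringComparison.OrdinalIgnoreCase))
        {
            context.Response.StatusCode = 500;
            context.Response.Write("An unexpected error occurred.");
            return;
        }

        context.Response.Redirect("/access?action=forward&uri=%2Ferror.aspx");
    }
}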