Ammar A. uses Node to consume an HTTP API. The API uses cookies to track session information and ensures those cookies expire, so clients need to periodically re-authenticate. How frequently?

That's an excellent question, and one that neither Ammar nor the docs for this API can answer. The documentation provides all the clarity and consistency of a religious document, which is to say you need to take it on faith that these seemingly random expirations happen for a good reason.

Ammar, not feeling particularly faithful, wrote a little "watchdog" function. Once you log in, it tries to fetch a URL every thirty seconds, hopefully keeping the session alive; if the request fails, it starts a new session.

That's what it was supposed to do, but Ammar made a mistake.

const request = require('request'); // assumed: the 'request' HTTP client library
const config = require('./config'); // assumed: holds the username/password credentials

function login(cb) {
    request.post({url: '/login', form: {username: config.username, password: config.password}}, function(err, response) {
        if (err) return cb(err);
        if (response.statusCode != 200) return cb(response.statusCode);
        console.log("[+] Login succeeded");
        setInterval(watchdog, 30000);
        return cb();
    });
}

function watchdog() {
    // Random request to keep session alive
    request.get({url: '/getObj', form: {id: 1}}, (err, response) => {
        if (!err && response.statusCode == 200)
            return console.log("[+] Watchdog ping succeeded");
        console.error("[-] Watchdog encountered error, session may be dead");
        login((err) => {
            if (err) console.error("[-] Failed restarting session:", err);
        });
    });
}

login sends the credentials, and if we get a 200 status back, we call setInterval to schedule the watchdog every 30,000 milliseconds.

In the watchdog, we fetch a URL, and if that fails, we call login. Which attempts to log in, and if it succeeds, schedules the watchdog every 30,000 milliseconds.

Late one night, Ammar got a call from the API vendor's security team. "You are attacking our servers, and not only have you already cut off access to most of our customers, you've caused database corruption! There will be consequences for this!"

Ammar checked, and sure enough, his code was sending hundreds of thousands of requests per second. It didn't take him long to figure out why: requests from the watchdog were failing with a 500 error, so it called the login method. The login method had been succeeding, so another watchdog got scheduled. Thirty seconds later, that failed, as did all the previously scheduled watchdogs, which all called login again. Which, on success, scheduled a fresh round of watchdogs. Every thirty seconds, the number of scheduled calls doubled. Before long, Ammar's code was DoSing the API.
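To put numbers on that doubling: every thirty-second cycle, each pending interval's watchdog fails and calls login, and each successful login adds one more interval. A quick back-of-the-envelope loop (my model of the cascade, not Ammar's code) shows how little time it takes to become an attack:

// Rough model: each 30-second cycle, every pending interval's watchdog
// fails, calls login, and the successful login schedules one more interval,
// so the count of pending intervals doubles per cycle.
let intervals = 1;
for (let cycle = 1; cycle <= 22; cycle++) {
    intervals *= 2;
    const reqPerSec = Math.round(intervals / 30); // each interval fires once per 30s
    console.log(`after ${cycle * 30}s: ${intervals} intervals, ~${reqPerSec} requests/sec`);
}
// After 22 cycles, about eleven minutes, that's over four million intervals
// and well past 100,000 requests per second.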

In the aftermath, it turned out that Ammar hadn't caused any database corruption; on the contrary, it was an error in the vendor's database that caused all the watchdog calls to fail. The vendor corrupted their own database, without Ammar's help. Coupled with Ammar's careless setInterval management, that error became even more punishing.
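The client-side half of the problem, at least, is a small fix. Here's a sketch of the general pattern (not Ammar's actual patch; the startWatchdog helper and watchdogTimer variable are my own invention): keep a single timer handle and replace it on re-login instead of stacking new ones.

let watchdogTimer = null; // single handle, shared across re-authentications

function startWatchdog() {
    if (watchdogTimer) clearInterval(watchdogTimer); // never more than one timer
    watchdogTimer = setInterval(watchdog, 30000);
}

function login(cb) {
    request.post({url: '/login', form: {username: config.username, password: config.password}}, function(err, response) {
        if (err) return cb(err);
        if (response.statusCode != 200) return cb(response.statusCode);
        console.log("[+] Login succeeded");
        startWatchdog(); // replaces the old interval instead of adding another
        return cb();
    });
}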

All of which raises the question: why wasn't the vendor better prepared for this? Ammar's code was bad, sure, a lurking time bomb just waiting to go off. But any customer could have produced something like that, and a hostile entity could easily have done it on purpose. Yet this vendor had nothing in place to deal with what looked like a denial-of-service attack, short of calling the customer responsible.
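Even the crudest per-client throttle would have capped the damage long before the phone call. Purely as an illustration (a toy fixed-window limiter, nothing like production-grade DoS protection, and not anything we know about the vendor's stack), Express-style middleware can do it in a dozen lines:

const express = require('express');
const app = express();

// Toy fixed-window limiter: at most 100 requests per IP per minute.
const hits = new Map();
app.use((req, res, next) => {
    const now = Date.now();
    const recent = (hits.get(req.ip) || []).filter((t) => now - t < 60000);
    recent.push(now);
    hits.set(req.ip, recent);
    if (recent.length > 100) return res.status(429).send('Too Many Requests');
    next();
});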

I don't know what was going on with the vendor's operations team, but I suspect that the real WTF is somewhere in there.
