Joe K was a developer at a company that provided a SaaS Natural Language Processing system. As Chief Engineer of the Data Science Team (a term that made him feel like some sort of mad professor), his duties included coding the Data Science Service. It provided the back end for the complex, heavy-lifting type of processing that had to happen in real time. Since it was very CPU-intensive, Joe spent a lot of time battling latency. But that was the least of his problems.


The rest of the codebase was a cobbled-together mess that had been coded by the NLP researchers, scientists with no background in programming or computer science. Their mantra was “If it gets us the results we need, who cares how it looks behind the scenes?” This meant Joe’s well-designed data service somehow had to interface with applications made from a pile of ugly hacks. It was difficult at times, but he managed to get the job done while also keeping CPU usage to a minimum.

One day Joe was working away when Burt, the company CEO, burst into their humble basement computer lab in an obvious tizzy. Burt rarely visited the “egghead dungeon”, as he called it, so something had to be amiss. “JOE!” he cried out. “The production data science service is completely down! Every customer we have gave me an angry call within the last ten minutes!”

Considering this was an early-stage startup with only five customers, Burt’s assertion was probably true, if misleading. “Wow, ok Burt. Let me get right on that!” Joe offered, feeling flustered. He took a look at the error logging service and there was nothing to be found. He then attempted to SSH to each of the production servers, with success. He decided to check performance on the servers and an entire string of red flags shot straight up the proverbial flag pole. Every production server was at 100% CPU usage.

“I have an effect for you, Burt, but not a cause. I’ll have to dig deeper but it almost seems like… a Denial of Service attack?” Joe offered, not believing that would actually be the case. With only five whitelisted customers able to connect, all of them using the NLP system to its fullest shouldn’t come even close to causing this.

While looking further at the server logs, Joe got an instant message from Xander, the software engineer who worked on the dashboards. “Hey Joe, I noticed prod was down… could it be related to something I’m doing?”

“Ummm… maybe? What is it you are doing exactly?” Joe replied, with a new sense of concern. Xander’s dashboard shouldn’t have any interaction with the Data Science Service (DSS), so it seemed like an odd question. Requests to the NLP site would initially come to a front-end server, and if there was some advanced analysis that needed to happen, that server would make an RPC to the DSS. After the response was computed, the front-end server would log the request and response to Xander’s dashboard system so it could monitor usage stats.

“Well, the dashboard is out of sync,” Xander explained. A bug had kept events from reaching the dashboard system for the past month, and they would need to be added to make the dashboard accurate. This could have been a simple change to the dashboard’s database, but instead Xander decided to replay all of the actual HTTP requests to the front end. Many of those requests triggered processing on the DSS, processing which had already been done. And since it was taking a long time, Xander had batched up the resent requests and was running them from three different machines, thus providing a remarkably good simulation of a DDoS.


“OK, OK, sorry. I’ll get this cleaned up,” Xander assured Joe. Within 15 minutes, the server CPU usage returned to normal levels and everything was great again. Joe was able to get Burt off his back and return to his normal duties.

A few minutes later, Joe’s IM dinged again with a message from Xander. “Hey Joe, sorry about that, LOL. But are we 100% sure that was the problem? Should I do it again just to be sure?”

If there was a way for Joe to use instant messaging to send a virtual strangulation to Xander, he would have done it. But a “HELL NO!!!” would have to suffice.
