• Christopher Jefferson (google)

    For any company, there is a largest DoS attack they can cope with -- I bet if I started hammering the servers of many small companies I could bring them down with a fairly similar "attack".

    The fact they thought he had corrupted the database was a mistake on their part, but most people panic when they see all their servers go down.

  • (author)

    If you're selling access to an API, I feel like "one badly behaved client" should be below your threshold of "largest DoS they can cope with".

  • (nodebb) in reply to Remy Porter

    @Remy Sort of. If the API hits a very niche market you might be lucky to have a pool of a thousand customers. I'd say it's reasonable to be unable to serve a million times that much traffic from one nutjob. (It takes Ammar's bug only ten minutes to multiply its traffic a millionfold.)

  • Dave (unregistered) in reply to Remy Porter

    Unless the client is particularly badly behaved, it's all coming from one IP address. There's no excuse for being unable to handle that.
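
A rough sketch of what handling a single misbehaving address could look like, assuming a simple fixed-window counter per client IP. The name `createIpLimiter`, its parameters, and the injectable `now` clock are all hypothetical and not taken from the article; a real deployment would more likely do this at a load balancer or firewall.

```javascript
// Fixed-window request counter keyed by client IP (illustrative only).
// `now` is injectable so the behavior can be checked without waiting.
function createIpLimiter(maxPerWindow, windowMs, now = Date.now) {
  const windows = new Map(); // ip -> { start, count }

  return function allow(ip) {
    const t = now();
    const w = windows.get(ip);
    if (!w || t - w.start >= windowMs) {
      // First request from this IP, or the previous window has expired.
      windows.set(ip, { start: t, count: 1 });
      return true;
    }
    w.count += 1;
    return w.count <= maxPerWindow;
  };
}
```

With a limit like this in front of the login endpoint, a runaway client would be refused after its quota rather than served at whatever rate it manages to generate.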

  • WTFGuy (unregistered)

    We also don't know how many machines Ammar commands. If this goof got deployed to 20 of his servers, or 20K of his clients, that makes the exponential even more exciting. Though as Steve almost says, adding a constant multiplier to an exponential doesn't change the end-game much.

    For a while I was in Steve's boat. My firm was a minnow providing a niche web service to Fortune 500-scale outfits. If any of them had had a screw-up in their server infrastructure and started flogging us hard, never mind exponentially, they could have crushed our total bandwidth like a grape shortly after utterly DoSing our servers into quivering, thrashing, leaking submission. All of that would have happened long before they noticed anything noteworthy in the shape of their own network traffic.

    We were fortunate to never have that happen. But we thought about the possibility.

  • (nodebb) in reply to Remy Porter

    Vendors often resort to yelling at their customers instead of fixing things.

    I just went through a case where a vendor's system has a reporting interface which is a read-only SQL database that we access over a VPN. Their database was caching bad query plans for weeks. Bad as in: a particular query will take 90 seconds to execute, but if you modify any part of the query, it executes in less than a tenth of a second.

    I made a detailed write-up and sent it to the vendor. I got back a "SQL Server caches query plans, that's normal" response. After another attempt to explain that I was aware of caching, but theirs is broken, I gave up.

    I tweaked the database layer to add a comment containing a GUID at the beginning of every query. Now their server will never cache anything. They made me do it.
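
A minimal sketch of the trick described above, assuming Node.js. `withCacheBuster` is a hypothetical helper, not the commenter's actual code, and it rests on the assumption (taken from the comment) that the server matches cached plans on the exact query text, so a unique leading comment makes every statement look new.

```javascript
const { randomUUID } = require("node:crypto");

// Prefix a SQL string with a unique comment so its text never repeats.
// `makeId` is injectable so the output format can be checked deterministically.
function withCacheBuster(sql, makeId = randomUUID) {
  return `/* ${makeId()} */ ${sql}`;
}
```

The trade-off, if the assumption holds, is that every statement gets planned from scratch, which costs the vendor's server some work on each call in exchange for never reusing a bad plan.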

  • hater (unregistered)

    was it oracle? that's oracle's playbook: blame the customer

  • Brian (unregistered)

    Heh, I did something similar once. I was tweaking some code that spawned up a bunch of threads, each of which called a third-party API. The vendor had a frequency limit, but they mostly relied on customers policing themselves and some automated tools to detect overages after the fact. (As opposed to another vendor we were using who had implemented a proper throttling mechanism.) So one of my tests accidentally created far too many of these threads, and sure enough we got an angry email saying we were using up too much bandwidth and would we kindly leave some for their other customers. And no matter how many times we responded that it was just an isolated incident and not our regular operation, we would get another one every couple of days, because our averages for the month had been thrown all out of whack.

    And yes, I'm fully aware that we should have been using mocks of their interface for testing, but this was a startup that was still in the "just git 'r done" phase and hadn't quite reached that level of maturity yet.

  • Foo AKA Fooo (unregistered) in reply to WTFGuy

    "adding a constant multiplier to an exponential doesn't change the end-game much."

    Actually, it does. The exponential will soon be limited by the available bandwidth or processing power of the machine, so what actually matters is the total bandwidth of all machines participating. That's why DDOS is a thing.

  • Sole Purpose of Visit (unregistered) in reply to Foo AKA Fooo

    Well, that's obviously true, but not a particularly relevant "correction."

    If you take yer average data center, say Microsoft, there are about 5000 servers. Simplifying by assuming a single network connection to each, that there is your constant -- 5000.

    Now start with an external client cloud (could be a single IP, as in this instance) that does a fork bomb for logins. That's your exponential. Start with two logins, move on to four logins in thirty seconds, and (as here) we get a million "DoS" events after ten minutes, if I have my exponentials right. Obviously, at some point shortly after that, one of those 5000 servers is going to fall over. Which leaves 4,999 other servers to cope.

    Leaving alone internal considerations such as database replications, and assuming pure REST semantics ... well now, 5000 is 2^~14.

    Congrats: you've given yourself seven minutes more leeway. I'd say that is, indeed, "not changing the endgame much."
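
The arithmetic in this comment can be checked with a short sketch. It assumes the load starts at one request and doubles every 30 seconds; the function names are made up for illustration.

```javascript
// Load after `elapsedSec`, starting from `initial` and doubling every `periodSec`.
function loadAfter(initial, periodSec, elapsedSec) {
  return initial * 2 ** Math.floor(elapsedSec / periodSec);
}

// Extra time a constant capacity multiplier buys against that doubling.
function leewaySeconds(multiplier, periodSec) {
  return Math.log2(multiplier) * periodSec;
}

console.log(loadAfter(1, 30, 600));               // 20 doublings in ten minutes
console.log(leewaySeconds(5000, 30) / 60);        // leeway in minutes for 5000 servers
```

Under these assumptions ten minutes gives 2^20, a little over a million. Note that log2(5000) is about 12.3 rather than 14, so by this model the leeway comes out nearer six minutes than seven; the point about the constant not changing the endgame much is unaffected.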

  • Sole Purpose of Visit (unregistered) in reply to Sole Purpose of Visit

    Or, alternatively, and I think this was your point: 5000 separate client machines might enable a usefully distributed attack, because as you say there is a limit to how many logins a client can request. In which case you are correct, but not usefully so. All you're really saying is that the exponential is limited to some asymptote by the resources involved, which frankly shouldn't be of any interest to anybody running a server farm. Call it DoS, call it DDoS, the constant here doesn't really matter.

  • Sole Purpose of Visit (unregistered)

    Bond: "You expect me to talk?" Goldfinger: "No, Mr Bond, I expect you to die!" Bond: "Don't forget, there's this pesky constant. Gimme seven more minutes. I'll figure something out."

  • Sleezo the Programmer (unregistered)

    I read about a programmer who closed a ticket by writing "I no longer write software. I make furniture now." I'm thinking of a similar approach. Instead of making furniture, I'm going to sell cars. Find a spot with high road traffic. Fill it up with loud signage, foil banner flags waving around, a towering fan man, and blinking lights and honking horns. Wait in a tow truck across the street and as soon as someone turns their head, BAM!, crash right into them. Get out and bully them into a sale. If not a new car, at least junk theirs and take the tires.

    Now that I think about it more, with all the amateur API implementations, you can do something similar. After login, download 1 MB profile picture. Profile picture is hosted on a CDN you resell. Client signs up for free tier CDN with API.

    Imagine if this client implementation were deployed, where they had to get thousands of users to uninstall or update software to stop the requests.

    hmmmmm......

  • Andrew A. Gill (unregistered) in reply to Remy Porter

    I remember having a similar reaction back on this comment

    https://thedailywtf.com/articles/comments/robot-anarchy#comment-506060

    I'm gonna apply the same logic here.

    The vendor knew how many people they were supporting. They knew what network segment those users were on, and should have known the maximum bandwidth that segment could pump out, as well as the equivalent number of HTTP requests it could carry.

    They also, I'm sure, have dealt with clients who got phishing attacks which made their users zombies which might have attacked the vendor's hosts.

    You have all the data you need for robust DDoS protection. If you don't actually implement it, that's your problem, fine Vendor.

  • Pierre (unregistered)

    A (large) number of years ago I worked at a company which aggregated prices of travel products. A customer on the website would give us a few bits of info and then we would call partners' APIs and return products and prices etc. We also had loads of deals which linked to a partner website and would also hit their API. These deals appeared on 1M+ landing pages. Every now and then we would get angry complaints from our partners claiming we were DoSing their APIs. We couldn't work out why, and our internal data didn't match.

    Turned out that when we replatformed the website we hadn't included nofollow attributes on the links. So whenever Google, Microsoft, ad networks or anything else crawled our site, they would follow the links and take down the partners' websites briefly... ooops.

  • Foo AKA Fooo (unregistered) in reply to Sole Purpose of Visit

    I don't really follow your setup. Why is one server "obviously" failing? Do you picture some action-combat-like scenario where you direct all your firepower at one victim (which doesn't shoot back)? In reality, the data center would distribute your traffic, so with the power of one machine, you can't even tickle 5000 servers. That holds even if you run at full load from the start, so the initial exponential growth is irrelevant (exponential growth is only sustainable for a short time, almost always, including in nature, see COVID).

    For that further "2^14" increase, you need 2^14 machines on your side to cope. That's the constant multiplier you need.

  • imdumb (unregistered)

    I don't get why the number of watchdogs was doubling? Watchdog calls login, this schedules next watchdog in 30 seconds... why are there two watchdogs after that period? They are also triggered from the outside?

  • (nodebb)

    I did something similar way way back when my company installed one of the first DEC VAX-780s. I learned about this cool command "spawn," and of course had to test it out (pseudocode, not the actual macro)

      function foo() { spawn(foo); spawn(foo); exit }

    Wow... I quickly deleted the original file "foo" and considered myself lucky I knew the head guy responsible for keeping the computer running smoothly

  • Foo AKA Fooo (unregistered) in reply to cellocgw

    I inadvertently did that at university (in shell). It was spawning so fast that I couldn't abort it or do anything. Already on my way to the supervisors to get my head washed, I realized it was running in an NFS mounted home directory, so I could log into another machine and delete (or indeed, "chmod 0") the script from there. The first computer became responsive again after a while. :)

  • youre-not-so-dumb (unregistered) in reply to imdumb

    This is a classic JavaScript error. The programmer likely intended to use setTimeout, which runs the function once, rather than setInterval, which runs the function repeatedly on the interval.

  • Dave (unregistered) in reply to imdumb

    "The setInterval() method repeats a given function at every given time-interval."

    "The clearInterval() method stops the executions of the function specified in the setInterval() method."

    https://www.w3schools.com/js/js_timing.asp

    the watchdog functions kept being called forever every 30s

  • Chakat Firepaw (unregistered) in reply to imdumb

    Because Watchdog didn't stop when it called Login.

    Someone triggers Login, which starts Watchdog(1).
    Watchdog(1) checks after 30s, gets an error, calls Login. Login starts Watchdog(2), tells Watchdog(1) it's finished.
    Watchdog(2) checks after 30s, gets an error, calls Login. Login starts Watchdog(3), tells Watchdog(2) it's finished.
    Watchdog(1) checks after 30s, gets an error, calls Login. Login starts Watchdog(4), tells Watchdog(1) it's finished.

    And so on, with each instance of Watchdog causing another to be spawned every 30s.
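
The growth described here can be sketched as a small counting model. It assumes every check fails and that each login starts one new interval without cancelling any existing one; `simulateLeakyWatchdogs` is a made-up name and models only the count, not the article's actual code.

```javascript
// Count live watchdog intervals after `ticks` rounds of 30 s,
// assuming every check fails and nothing is ever cleared.
function simulateLeakyWatchdogs(ticks) {
  let live = 1; // the first login started one watchdog
  for (let i = 0; i < ticks; i++) {
    // Every live watchdog fires, calls login, and login starts one more
    // interval while the caller's own interval keeps running.
    live += live;
  }
  return live;
}
```

Twenty ticks is ten minutes of 30-second rounds, which is where the "million" figure elsewhere in the thread comes from.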

  • Olivier (unregistered) in reply to Dave

    That makes even less sense... If clearInterval knows how to stop the calls to the function, why would setInterval allow two or more repeating timers to be started for the same function name?

    The problem is not that watchdog is being called every 30 seconds, that is the expected behaviour, but after the 1st failure and execution of login, 2 watchdog timers are running, and 4 after another 30 seconds, etc.

    Any call to setInterval(watchdog, 30000) should cancel the previous timer set for the function watchdog; or else how will clearInterval(watchdog) know which one of the 2^20 it is supposed to clear?

    I am so glad I don't have to touch that JS mess.

  • (nodebb) in reply to Olivier

    Any call to setInterval(watchdog, 30000) should cancel the previous timer set for the function watchdog; or else how will clearInterval(watchdog) know which one of the 2^20 it is supposed to clear?

    Because setInterval() returns an identifier that can be passed to clearInterval() to cancel the timer. Here though it just gets thrown away so there's no way to identify the interval later to clear it.

    https://developer.mozilla.org/en-US/docs/Web/API/WindowOrWorkerGlobalScope/setInterval
    https://developer.mozilla.org/en-US/docs/Web/API/WindowOrWorkerGlobalScope/clearInterval

    There are perfectly valid reasons why you might want more than one periodic call to the same function, so you don't want setting one to have the added effect of cancelling another. That's why it's called setInterval, not setOneIntervalAndMaybeClearAnother.
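
One possible shape for a fix, sketched under assumptions: the identifier returned by setInterval is kept, and a re-login clears the old timer before starting a new one. `createSession`, `isAlive`, and the injectable `timers` object are hypothetical names for illustration, not the code from the article.

```javascript
// Watchdog that keeps its interval handle so a re-login replaces the timer
// instead of stacking a second one. `timers` is injectable for testing.
function createSession(
  isAlive,
  timers = {
    set: (fn, ms) => setInterval(fn, ms),
    clear: (id) => clearInterval(id),
  }
) {
  let handle = null;

  function login() {
    if (handle !== null) timers.clear(handle); // cancel the old watchdog first
    handle = timers.set(watchdog, 30000);
  }

  function watchdog() {
    if (!isAlive()) login();
  }

  function logout() {
    if (handle !== null) timers.clear(handle);
    handle = null;
  }

  return { login, logout };
}
```

Using setTimeout and rescheduling after each check, as suggested elsewhere in the thread, would be another way to get the same effect; either way there is never more than one pending timer per session.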

  • LCrawford (unregistered)

    This actually happened recently - https://twitter.com/OSM_Tech/status/1309872590299844608

  • imdumb (unregistered) in reply to youre-not-so-dumb

    Thanks everyone, now I understand that it's because setInterval was used, and with this construct it should actually be setTimeout. And they say that it's C++ that shoots you in the foot!

  • (nodebb) in reply to Sole Purpose of Visit

    Bond: "You expect me to talk?" Goldfinger: "No, Mr Bond, I expect you to die!" Bond: "Don't forget, there's this pesky constant. Gimme seven more minutes. I'll figure something out."

    That's effectively what happened. The laser took several seconds to reach his nether regions and in that time he was able to persuade Goldfinger he was more useful alive than dead.

  • Heisenberg200 (unregistered)

    I've written the exact same code, with the exact same bug! But I only hammered our company's production servers, until the node instance threw so many errors I had to kill it.

    It's almost a rite of passage to make this error in node!

  • Some Ed (unregistered) in reply to imdumb

    Every programming language provides a gun aimed at your feet. Some of them are easier to trigger than others, but they all have it.

    Even after my roommate questioned me on what the exact definition of 'a programming language' is, I still feel this is true. It's just the scope of the answer was a bit broader than I expected.
