
It'd been two hours since Mike had gone to bed. Two slow, miserable hours of counting sheep, staring at the barely visible ceiling, and trying to shake off the stress of the last few weeks at work.

His company had (finally) moved from bare metal to a modern, virtualized environment. Mike and a few coworkers were in charge of revamping the whole backup system to match the efficiency of the servers. They'd been handed a generous budget and a single task: make daily backups of all the key servers feasible.

Hundreds of emails, days of overtime work, and plenty of bickering had resulted in an impressive setup. Massive JBOD arrays, managed by virtual servers in a cabinet full of blades and hooked up to top-of-the-line 10G switches, all hummed along nicely in the server room.

The backup solution was ready for its first test. Mike had personally started the backup job late in the afternoon. The whole team had gathered around his monitor to watch the speed slowly crawl up.

One, two... six gigabits per second. With 30TB worth of snapshots, the whole process would take under half a day, meeting the original goal with a huge margin. Mike had gone home a few minutes later, hoping to see the backup completed by morning. But during the evening, his initial optimism had slowly faded.

What if we misconfigured something? he kept thinking. Or worse, what if the hardware fails?

Every time he felt close to sleep, his mind conjured yet another disastrous scenario, jolting him wide awake again.

Finally, Mike's eyes darted towards the laptop resting in a bag by his bedside. He knew he shouldn't be doing work this late at night ... but surely a little peek at the administration panel wouldn't hurt?

Mike crawled out of bed and booted up the laptop while rationalizing his surrender to temptation. It'll just take a few minutes. See that everything's fine, then get some sleep.

He logged in to the corporate VPN, opened up the dashboard ... and realized he wouldn't be returning to bed that night. The backup was stuck at less than four percent, and the speed had dropped from six gigabits to barely over a megabit per second.
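To put those two speeds in perspective, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes), a perfectly constant transfer rate, and rounds "barely over a megabit" to 1 Mbit/s and "less than four percent" to 4%. The function and constant names are made up for illustration.

```python
# Back-of-the-envelope transfer times, assuming decimal units
# (1 TB = 10**12 bytes) and a perfectly constant link rate.

def transfer_seconds(size_bytes: float, rate_bits_per_s: float) -> float:
    """Seconds needed to move size_bytes at rate_bits_per_s."""
    return size_bytes * 8 / rate_bits_per_s

TOTAL_BYTES = 30 * 10**12   # 30 TB of snapshots (from the story)
FAST_RATE = 6 * 10**9       # six gigabits per second
SLOW_RATE = 1 * 10**6       # assumption: "barely over a megabit" ~ 1 Mbit/s
DONE_FRACTION = 0.04        # assumption: "less than four percent" ~ 4%

full_run_hours = transfer_seconds(TOTAL_BYTES, FAST_RATE) / 3600
remaining_days = transfer_seconds(
    TOTAL_BYTES * (1 - DONE_FRACTION), SLOW_RATE
) / 86400

print(f"Full backup at 6 Gbit/s: {full_run_hours:.1f} hours")
print(f"Remaining 96% at 1 Mbit/s: {remaining_days:.0f} days")
```

Under these assumptions, the full run at six gigabits per second takes about 11 hours, while finishing the remaining data at one megabit per second would take on the order of seven years.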

Mike clicked around frantically, trying to pinpoint the problem. The only thing he could find was a warning in VMware's logs saying that one of the logical units had ended up with over a second of latency. He browsed the network configurations and the SMART error logs, but everything else seemed to be in perfect order. All he knew was that a few minutes after he'd left work, the latency had spiked and never returned to a reasonable level.

Over the next 20 minutes, Mike tried changing almost every single option even vaguely related to the problem. He was ready to give up when finally, after closing what seemed to be the thousandth dialog window, the backup speed suddenly ramped back up to several gigabits per second.

It took a while for Mike to remember what exactly he'd just changed. All he'd done was switch the load balancing algorithm used to communicate between the servers and the storage.

That shouldn't have changed a thing, he thought, puzzled. He decided to watch the process for a while.

The problems came back no more than a few minutes later, but by that time, Mike had an idea of what could be causing them. He remotely disabled the currently used port on the network switch.

Sure enough, the traffic flowed freely through a different port for a few minutes before clogging up again.

The next few days, spent mostly on support calls and filing tickets with the switch manufacturer, confirmed Mike's suspicions. The large amount of data being pushed through the port slowly filled more and more space in the port's buffer. When that space ran out, the port crashed, decimating the network speed and slowing the backups down to a crawl.

Luckily, just a few firmware updates later, the issue was fully resolved, and Mike was able to enjoy not only the blazing-fast backups and the convenience of virtual servers, but also a good night's sleep.
