Developers sometimes fail to appreciate how difficult a job Operations really is. In companies that don't hold with newfangled DevOps, the division of labor often comes with a division of reputation as well. After all, developers do the hard work of making software. What are Ops guys even for? They don't make software. They don't generate leads or fix your desktop PC. Why bother paying for talented senior Ops professionals?
Spend a few days with the Ops team, however, and you start to see why you should pay them a little more than your average garbageman. The Ops lifecycle is a daily grind of deployments, patching, and sticking fingers in dikes, trying to keep that expensive cesspit the devs call "software" running. Simple tasks such as spinning up new infrastructure in AWS often get pushed to the back burner behind putting out fires and making sure critical maintenance tasks that didn't get done last year don't explode into flames.
Still, companies like to cut corners. Often, Ops folks have very little programming expertise and no training budget, meaning repetitive tasks are automated using cobbled-together bits of shell script found via Google. In the Ops world, a bit of Perl or Python is worth its weight in gold.
Today's snippet, as you can probably guess, is not in Perl or Python. It is instead in a common paradigm: Bash embedded in Perl. Most likely, the original script was written by a senior who knows Perl, and this chunk was bolted on by a strapped-for-time mid-level engineer who doesn't:
my $secs = `cut -f1 -d. /proc/uptime`;
$data{lastboottime} = strip(`date -d "$secs seconds ago" '+%Y'-'%m'-'%d'T'%T' 2>/dev/null`);
The point of this snippet is to record when the machine last booted; later code sends that timestamp to a central inventory system. The bug is that the reported boot time would drift by a second or so between updates, not because the machine had rebooted, but because the code computing it was imprecise.
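To see why the value wobbles instead of staying put, here is a minimal Python sketch of the snippet's arithmetic. It assumes a hypothetical machine whose true boot time is wall-clock second 1000.6, ignores the small delay between the `cut` and `date` calls, and models `date`'s `%T` output as truncation to whole seconds. `TRUE_BOOT` and `snippet_boot_time` are made-up names for illustration only.

```python
import math

# Hypothetical: the machine really booted at wall-clock time 1000.6 s.
TRUE_BOOT = 1000.6


def snippet_boot_time(now: float) -> int:
    """Reproduce the snippet's arithmetic for a run at wall-clock time `now`."""
    uptime = now - TRUE_BOOT             # what /proc/uptime would report
    whole_secs = math.floor(uptime)      # `cut -f1 -d.` drops the fraction
    return math.floor(now - whole_secs)  # `date` prints whole seconds only


# Four runs of the inventory script at different points within a second.
samples = [1500.2, 1500.5, 1500.7, 1501.3]
print([snippet_boot_time(t) for t in samples])  # [1001, 1001, 1000, 1001]
```

One boot, four runs, two different answers. Keeping the fractional part of the uptime would shrink the error, but the gap between reading the uptime and reading the clock would still occasionally push the result across a second boundary.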
For those who spend their days at a higher level of abstraction, let me walk through it. We start by reading /proc/uptime, which I'll let the manpage describe:
This file contains information detailing how long the system has been on since its last restart. The output of /proc/uptime is quite minimal:
350735.47 234388.90
The first number is the total number of seconds the system has been up. The second number is how much of that time the machine has spent idle, in seconds.
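For anyone who would rather poke at this from Python, here is a minimal sketch (the function names are hypothetical, not from the original script) that splits the sample /proc/uptime line above into its two numbers and mimics what `cut -f1 -d.` keeps from a single line.

```python
def parse_proc_uptime(text: str) -> tuple[float, float]:
    """Return (seconds_up, seconds_idle) from the contents of /proc/uptime."""
    up, idle = text.split()
    return float(up), float(idle)


def cut_first_field(line: str, delim: str = ".") -> str:
    """Mimic `cut -f1 -d.` on one line: everything before the first delimiter."""
    return line.rstrip("\n").split(delim, 1)[0]


sample = "350735.47 234388.90\n"
print(parse_proc_uptime(sample))  # (350735.47, 234388.9)
print(cut_first_field(sample))    # 350735
```

Note that the `cut` version throws away both the fractional seconds and the idle figure, leaving only a whole number of seconds.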
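We then use cut to snip the output, using a period as the delimiter and taking only the first field, which means we take the floor of the uptime in seconds and throw away the rest. We store that in a Perl variable, then feed it back into Bash in the middle of a date string so that it reads "X seconds ago." date then works out that moment relative to the current time and formats it as year-month-day plus the time of day, the shell throws away any errors, and the result is trimmed and handed back to Perl for the rest of the script to forward on. That is where the drift comes from: the fraction of a second that cut discarded ends up added onto the computed boot time, and date only prints whole seconds, so depending on where within a second the script happens to run, the reported boot time lands on one side or the other of a second boundary.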
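One way to sidestep the arithmetic entirely, sketched here in Python rather than taken from the original script, is to ask the kernel for the boot time directly: on Linux, /proc/stat carries a `btime` line holding the boot time in whole seconds since the Unix epoch. The helper names below are hypothetical, and the formatting mirrors the snippet's year-month-dayThh:mm:ss output, in local time unless a time zone is passed in.

```python
from datetime import datetime, timezone, tzinfo
from typing import Optional


def parse_btime(stat_text: str) -> int:
    """Return the boot time, in seconds since the Unix epoch, from /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "btime":
            return int(fields[1])
    raise ValueError("no btime line in /proc/stat text")


def format_boot_time(btime: int, tz: Optional[tzinfo] = None) -> str:
    """Render btime as YYYY-MM-DDTHH:MM:SS, in local time when tz is None."""
    return datetime.fromtimestamp(btime, tz).strftime("%Y-%m-%dT%H:%M:%S")


# Sample /proc/stat excerpt with a made-up btime value.
sample = "cpu  10 0 5 100\nbtime 1700000000\nprocesses 42\n"
print(format_boot_time(parse_btime(sample), timezone.utc))  # 2023-11-14T22:13:20
```

On a real Linux host you would feed it the contents of /proc/stat instead of the sample string. Because nothing is being subtracted from a moving clock, repeated runs should report the same value; my understanding is that only a step change to the system clock would shift it.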
Some days I feel this is the real reason DevOps was invented: a bunch of devs saw the code the Ops guys were writing and cringed so hard, they found themselves volunteering to write "Whatever code you need, man. Just ask me, I'll get it done for you. Please."