Admin
It'd be good if there was some way to stop frost comments. They're soooo tedious.
Admin
Damn autocorrect... Frist, frist, frist...
Admin
Lucky the server wasn't set to UTC
Admin
And nobody investigated the cron mails they received daily, around lunchtime, saying something along the lines of:
kill: (21342) - No such process
I would consider that to be TRWTF.
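For anyone wondering where that mail comes from: killing a PID that no longer belongs to any process fails with ESRCH, which the shell reports as "No such process". Here is a minimal Python sketch of the same check, assuming a POSIX system. The helper name `pid_exists` is made up for illustration, and signal 0 is used so that nothing actually gets killed.

```python
import os
import subprocess
import sys

def pid_exists(pid):
    """Probe a PID with signal 0, which checks for existence without delivering anything."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:   # errno ESRCH, reported by kill as "No such process"
        return False
    except PermissionError:      # the process exists but belongs to another user
        return True
    return True

# Start and reap a short-lived child so that its PID is (almost certainly) free again.
child = subprocess.Popen([sys.executable, "-c", "pass"])
child.wait()
stale_pid = child.pid

print(pid_exists(os.getpid()))   # this process is certainly alive
print(pid_exists(stale_pid))     # the reaped child's PID should now be unused
```

A hard-coded PID in a crontab is the same probe repeated forever: most days it misses, and on the day the PID has been handed to something else it hits.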
Admin
At least it was only kill and not kill -9, or data might have been lost or corrupted.
Admin
Yeah, but a kill -9 might have been logged, letting them find the problem quicker.
Admin
Somebody was knowledgeable enough to set up a cron job but did not know or care that the job was a kill with a hard-coded PID? Sounds too far-fetched.
Admin
Boss says: "Set up a CRON job to stop (such-and-such a program) on 22nd December."
Apprentice says: "How do I do that?"
Boss says: "Here's how to stop a program: first you do (whatever command it was to identify what the process ID is of the program in question, I can't remember), and then when you have its process ID, kill it." (Demonstrates by using "kill" to stop the program in question, which happens to have the ID 21342).
Apprentice says: "Okay, but how do I set up a CRON job?"
Boss says: "Use man for instructions, gotta run, got a meeting to go to."
Admin
Ah yes, this was the WTF with the inexplicable food obsession.
Admin
OK, so that explains the process dying at 10:12pm, but why was it dying at 12:22?
Admin
But that doesn't change the kill vs. kill 9 part.
Admin
In the former case, the application might log a SIGTERM followed by a normal shutdown. In the latter case, the process would die before it could log anything, so it would log nothing until it was restarted, at which point it might log that it was restarting from an unclean shutdown.
Admin
So the correct fix is for the mission-critical process(es) to check if their ID is 21342. If so, gracefully exit and restart... Still leaves a tiny window... but should be a major improvement...
Admin
Ahhh, thanks.
Admin
22:12 != 12:22 so yeah why was it dying at 12:22?
Admin
Guys, seriously, this is the internet. Sometimes, you just have to spell it out.
The actual article stated that the process always died during lunch, and specifically stated "12:22PM" in the last paragraph. However, the crontab line given in the article got the minute and hour reversed, which should have caused the process kill to occur at 22:12, or 10:12PM.
tl;dr: anonymization failure.
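For reference, the first two crontab fields are minute and then hour. A small Python sketch of the mix-up follows. The two crontab lines are reconstructions for illustration, not copied from the article, and the helper `cron_fire_time` is hypothetical and only handles plain numeric minute and hour fields.

```python
def cron_fire_time(line):
    """Return 'HH:MM' for a crontab line whose minute and hour fields are plain numbers."""
    minute, hour = line.split()[:2]   # crontab order: minute, hour, day, month, weekday
    return f"{int(hour):02d}:{int(minute):02d}"

# Hypothetical crontab lines; the command and PID are illustrative.
as_published = "12 22 * * * kill 21342"
as_intended = "22 12 * * * kill 21342"

print(cron_fire_time(as_published))   # 22:12
print(cron_fire_time(as_intended))    # 12:22
```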
Admin
22:12 is a strange time for a lunch
Admin
Maybe the WTF is that they aren't running a decent network service monitor?
I'd much rather have nagios tell me that a service has crashed than my boss, or my users.
Admin
I don't quite get how the process can have been running for days and then get killed by that cron job. I mean, unless this thing respawns itself each day or does something silly like that, it would have had that same process ID since it started, so why didn't it get killed the first time that cron job ran?
Admin
Presumably there's some sort of scheduled daily restart. Just go with it.
Admin
They used UTC for their server clocks.
Admin
(Or, possibly, Steve made the anonymisation error by retyping the crontab line from memory, and misremembered the crontab format.)
Admin
The REAL WTF is the fact that they have no auto-restart service active!
Like daemontools, monit, upstart...
I thought that was a standard toolset for every system admin, and that every mission-critical process is monitored!
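The core idea behind those tools fits in a few lines. This is only a toy Python sketch of a restart loop, not how daemontools, monit, or upstart are actually implemented. The `supervise` function and its `max_restarts` parameter are made up for illustration.

```python
import subprocess
import sys

def supervise(cmd, max_restarts):
    """Run cmd, restarting it each time it exits non-zero, up to max_restarts restarts.

    Returns the list of exit codes observed, one per run.
    """
    codes = []
    for _ in range(max_restarts + 1):
        code = subprocess.call(cmd)
        codes.append(code)
        if code == 0:        # clean exit: nothing to restart
            break
    return codes

# A stand-in "service" that always crashes with exit status 3.
crashing = [sys.executable, "-c", "import sys; sys.exit(3)"]
# A stand-in "service" that exits cleanly.
healthy = [sys.executable, "-c", "pass"]

print(supervise(crashing, max_restarts=2))   # three runs, all failing
print(supervise(healthy, max_restarts=2))    # one run, clean exit
```

A real supervisor would also add backoff between restarts, logging, and alerting, which is exactly the part that would have exposed the daily lunchtime kill.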
Admin
This is the best joke I had on DailyWTF this year.
Admin
            Logical          Linguistic
Inclusive   "or"             "and/or"
Exclusive   "exclusive or"   "or"
As you can see, "or" occurs twice in this table (unless clarified, i.e. "exclusive or" or "and/or"). One "or" is logical, and the other "or" is linguistic. I meant the logical one.
Admin
The existence of a signal handler does not automagically protect against data corruption. Many library functions are not signal safe, so it is easy to call the wrong function from a handler and cause memory corruption.
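The usual way around that is to do almost nothing in the handler: set a flag and let the main loop shut down at a safe point. A minimal Python sketch of the pattern follows, assuming a POSIX system. Python runs its handlers between bytecode instructions, so the C-level hazard does not apply in the same way, but the structure carries over. All names here are illustrative.

```python
import os
import signal

shutdown_requested = False

def request_shutdown(signum, frame):
    # Do the bare minimum in the handler: record the request and return.
    global shutdown_requested
    shutdown_requested = True

signal.signal(signal.SIGTERM, request_shutdown)

def main_loop(work_items):
    """Process items until they run out or a shutdown has been requested."""
    done = []
    for item in work_items:
        if shutdown_requested:
            break                 # safe point: flush buffers, close files, then exit
        done.append(item)
        if item == 2:
            # Simulate the cron job's plain kill arriving mid-run.
            os.kill(os.getpid(), signal.SIGTERM)
    return done

processed = main_loop([1, 2, 3, 4])
print(processed)
```

The item being worked on when the signal arrives is finished normally, and nothing after it is started.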
Admin
Your turn.
Admin
TRWTF is that they fired someone and then let them work on the crontab...