Admin
I asked a question, you responded with speculation, I countered with another speculation, that's all.
No support -> longer downtime -> losing money. How is that hard to get? And don't forget driving through a snowstorm at night. Sure I get that ... that it's speculation. Depends entirely on the kind of web site. Many web sites are used more during leisure time, I guess.
BTW, I'm not really interested in beating the dead horse, as you rightly called it. It's just that arguing with you is so much fun because you make it so easy -- contradicting yourself, "proving" speculation with more speculation. You don't often get to see that for so long. Please keep up the good work! (And I do apologise for the last paragraphs which actually discuss parts of the story.)
Admin
Yes, but did you do it on a busy production server during peak hours (not during a scheduled downtime), and were you breaking any rules in the SOP your company has outlined for you? Also, were you doing it at a time when you could not get to the server, or get reliable hands to it (such as during a record blizzard)?
If so you should be fired for incompetence. If not then mistakes are mistakes and you will learn.
Admin
That said, I can agree that I'd personally never want to drive through a snowstorm at night for work. If night-maintenance would have required that, I'd have simply rescheduled the maintenance work for another night, one without a snowstorm.
Also, if the night-shift was really so fucking incompetent, to the point where I'd rather put up with the risks of daytime maintenance (even if those risks include downtime during peak hours), then I'd postpone the maintenance until close to the end of the day-shift, rather than a time of day that would cost a whole day's worth of orders.
That's cute. Maybe you could point out for me exactly where I've contradicted myself, or when I claimed speculation was proof of speculation. I'm curious to know, so I could be less likely to make the mistake again in future.
Admin
FTFY
Admin
RUN!! IT'S A MOTIE!!
Admin
+1
Admin
Well, I'd have thought that was OK (not being a Linux admin)
But, I bet most people have done something similar at least once in their lives. It's one of the things you learn quickly by experience...
I know I've reconfigured remote servers and made it so I can't log in remotely - that's why I always make sure we have remote KVM support to our servers as well - that way there's two things I have to kill before I get total loss of access...
IMV, the real WTFs are:
Alex made two mistakes, one which wasn't blindingly obvious to someone who hadn't already made it before... The other mistake (restarting the service during the day) probably wasn't that big in the end - if he'd restarted it in the evening, it would probably have taken much longer to get it fixed.
The rest was him doing a valiant job trying to fix it, and being hindered by everyone/everything else.
So, Alex deserves a medal. The main thing he did wrong, you can bet he won't be doing again...
Admin
FTFY
Admin
Also, how would you learn not to do that? Is there a 'things a sysadmin should never do' book?
I bet most people learn not to do it by doing it, then saying 'oh crap', and having to walk^H^H^H^Hrun madly down to the server room to fix things.
Alex was just unlucky that his learning experience was at a hosting company hours away, and that his employer was too cheap to get a decent hosting company, remote power switching, or remote KVM.
Admin
A female sys admin ??????
Admin
It is a small comfort to realize one is not the only idiot.
Admin
I did say I was missing the part that was speculation (giving you the opportunity to clarify) - at first it seemed you were saying the idea of most users being on during peak hours was speculation, which is absurd (since that's what peak hours means). But rather than clarify, you threw out hints that only seemed to confirm my initial assumption.
Since then I realized that the speculation was actually my assumption that office hours coincided with peak hours. Still, I haven't heard you say that (despite giving you ample opportunity) - and why would you? If you did that, I might have said "oh, gotcha, yes you're right", and the discussion would have been over. You'd have missed out on the chance to pretend you were such a smart guy.
Admin
Orders weren't lost. Orders being lost would mean they were placed in the first place and then lost. These weren't. This company missed out on a day's worth of orders.
Admin
The only troll I see here is you.
Admin
Uh yea, senior management looks to save money. Period. Working for a living stinks. No matter how good YOU are..
Admin
Try that again with a network change that will make the current connection invalid. Depending on your operating system, at best your connection dies but the start command completes. But on the OS where I ran that command line remotely without screen (I only ever did it once), it terminated the ssh connection and HUPped everything running on the TTY it had opened. That left the box with a running network. YAY!
'/etc/init.d/network stop' killed all of the server parent processes - sshd, sendmail, apache, etc.
'/etc/init.d/network start' was killed before it started any server parent processes.
So the box was on the network, but still inaccessible. Not a distinction that most people would make, as it's effectively offline. But it's crucial to understanding exactly why the process failed.
Running '/etc/init.d/network restart' instead wouldn't have helped - it still would've killed everything on the TTY. Similarly, '(/etc/init.d/network stop; /etc/init.d/network start)&' would not have helped.
Running '/etc/init.d/network stop; nohup /etc/init.d/network start' would have. Running the command-line in a screen session would have also helped.
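For anyone who wants the nohup idea as something reusable, here is a rough sketch. It is a variation on the command above rather than exactly what I'd have typed: it wraps the whole stop/start sequence in nohup, so the shell running the sequence ignores SIGHUP too. The function name detached_restart and the default log path are made up for illustration, and whether your init script behaves well when detached from a TTY is an assumption you should check on a box you can walk to.

```shell
#!/bin/sh
# Sketch: run "stop" then "start" of an init script so that a hangup (SIGHUP)
# on the controlling terminal cannot kill the sequence half-way through.
# detached_restart and the default log path are hypothetical names.
detached_restart() {
    script="$1"                               # e.g. /etc/init.d/network
    log="${2:-/tmp/detached_restart.log}"     # where the output goes instead of the TTY

    # nohup makes the child shell ignore SIGHUP, and ignored signals stay
    # ignored in the commands that shell runs. stdin, stdout and stderr are
    # redirected so nothing tries to read from or write to a dead terminal.
    # Inside the child shell, "$0" is the init script path passed after -c.
    nohup sh -c '"$0" stop; "$0" start' "$script" </dev/null >"$log" 2>&1 &
}

# Usage (commented out on purpose, this really would bounce the network):
# detached_restart /etc/init.d/network
```

The point of the redirections is that the restart should not depend on the ssh session in any way once it has been launched. Running the same command line inside a screen session gets you a similar result by a different route.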
Note when I made my blunder, I was just down the hallway from the computer. It was back in service in about five minutes. (Long hallway, but I ran. Big, mostly empty building, so I could get away with that.) We did have remote terminals - which worked on all of our equipment except that one. Unfortunately, someone thought it'd be a good idea to give the box an array that required the serial port for a separate command interface, and it only had one serial port, and the terminal switch only did serial port consoles. Sigh.