Admin
So the software has a bug that occurs because of an unforeseen scenario, and the IT guy, instead of actually fixing the recurring problem with the hardware connection breaking or asking the vendor for support, just complains to his boss that it's not his fault and lets the problem persist.
I think I see the WTF.
Admin
It's the OG... [image]
Admin
oops... [image]
Admin
Similar to what I was thinking, but more humorously :)...adminfish had no option in the matter. What the vendor had was a good salesman. The OP confirmed this in a followup msg.
Admin
I know! I know! The "#$%&" stands for "Failure", right? :)
Admin
The real wtf is that it took him 15 minutes to get from a parking lot to the server room
Admin
That was my thought as well. Any admin worth a shit would drive to the office once, perhaps twice, before coming up with a better solution than having to lay hands every time the software got into a funk.
Massimo fails it.
Admin
I'm glad you two geniuses are here to solve all these problems. We'd all probably still be programming on punch cards if it wasn't for your smarts.
Admin
Honestly, if you're getting split-brain problems more often than you're getting actual server failures, and there's no (affordable) way to fix the split-brain problem... you unplug the secondary and save it as a cold backup for when the primary goes offline.
Admin
Not yet it isn't.
This is a redundant comment.
Admin
Quite efficient. My record for gaining access to a server room is 1 hour 47 minutes. Also due to "security".
Admin
Which bit of "Rather than take advantage of fancy-schmancy Windows functionality, the developers of the Redundancy Manager chose to take their own, custom approach" did you have trouble with? Windows clustering wouldn't have suffered in this way.
Admin
I miss the days when management had two choices for redundancy:
(a) Buy incredibly expensive hardware (e.g., Stratus) at a million a pop. Automatic, self-correcting redundancy.
(b) Pay people the going rate because they know what they're doing ...
... bwaah hah hah hah hah ...
Well, it did at least work occasionally. I mean, what's this supposed to mean?
"With neither systems being aware of the other, they would start fighting for hardware resources which were meant to be accessed by only one server. They would collide, give up due to various errors, then they would try and fail again, stuck in an infinite loop because they had no way of handling such situations."
I'm old-school. I'd go for the Stratus solution first. I'd go for the "don't buy software from certifiable morons" solution second. But ... "no way of handling such situations?"
Can nobody say "Aloha?"
Admin
Is there any way we can get her drunk?
Admin
"...simple coloration between problem <-> solution..." - This would require a colourful sloution.
"...take at least 4 detaurs, get on a bus..." - We're talking bus as in mode of transport here, right?
Admin
Okay, that makes a lot more sense. Anonymization is really getting out of hand. Since you posted them, you obviously don't mind the details being 'out there,' and the truth makes a much better story this time. Thanks for coming by to correct the record. I need to start keeping a compendium of posters who come by to say, "That's not what REALLY happened!"
Admin
That event, named "split brain", is a common problem in cluster architectures, and there are standard ways to solve it without disconnecting a cable. The problem is that some IT workers are clueless.
Admin
Just wondering ...
Does he really exist??
.. Alex => ALEX => A.L.E.X. (On a highly polished brass plate)
=> Advanced Lexical Editing Xenophon? or Awesome Language Editorial Xerox? (As in table [Sorry])
Very very small letters: Long time appreciative lurker bursting out with OT first post. Please don't harm me or take away my keyboard ...
[Ship of Strangers; A.E.S.O.P.]
Admin
Or perhaps ANDY?
Admin
In this situation, it would be in the guy's best interest to try to fix it, even if the only way to do so involved cheating a little, like hacking the object code or... maybe spoofing the traffic from Host B to Host A and Host A to Host B so that neither of them ever think the other one is down except when it's really down.
I mean, no he's probably not going to get a raise, heck he probably won't even get recognition. But he would get to sleep in a little later.
Admin
By definition, the purpose of having a cluster is to tolerate failure.
The network may be fine tonight, and just fine 99% of the time, but one night, there is a hiccup, possibly congestion caused by the application using the link.
The cluster failover isn't going to work very well if server A still kills server B, when the failover operation is starting.
A script cannot automatically determine the physical state of the equipment.
Just because A is able to run scripts doesn't necessarily mean A is working.
Just because B is generating traffic doesn't mean a split-brain has arisen.
Dirty hacks are dangerous and prevent clustering systems from working reliably.
The solution is proper configuration of the clustering software, and/or replacement of clustering software with software that works properly.
Just because 100k was spent on it doesn't mean the 100k was a valuable investment; it's just money lost in the past, a sunk cost.
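The usual standard fix, in the spirit of "proper configuration of the clustering software" above, is a quorum rule: a partition may run the service only if it can see a strict majority of the votes. A minimal Python sketch under that assumption; the names `Cluster`, `role`, and the extra `witness` vote are hypothetical and not something the system in the story had:

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class Cluster:
    # Every voting member. "witness" below is a hypothetical tie-breaker
    # (a third box or a file share), not part of the product in the story.
    voters: FrozenSet[str]

    def has_quorum(self, reachable: Set[str]) -> bool:
        """True if the nodes a partition can see hold a strict majority of votes."""
        return len(reachable & self.voters) * 2 > len(self.voters)

    def role(self, node: str, reachable: Set[str]) -> str:
        """A node runs the service only when its partition (itself included) has quorum."""
        return "active" if self.has_quorum(reachable | {node}) else "standby"


# The heartbeat link between A and B drops.
two_node = Cluster(frozenset({"A", "B"}))
print(two_node.role("A", {"A"}), two_node.role("B", {"B"}))  # standby standby

with_witness = Cluster(frozenset({"A", "B", "witness"}))
print(with_witness.role("A", {"A", "witness"}), with_witness.role("B", {"B"}))  # active standby
```

With only two voters neither half of a partition has a majority, so the safe outcome is that both stand down instead of fighting over the hardware; adding a third vote is what lets exactly one side keep running.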
Admin
Ya, the tray ejection could press the return key on a PC with a script that does "net stop crappyProgramFoo" followed by "net start crappyProgramFoo".
back to sleep!
Admin
And whitelisting it in Kaspersky, if you use it.
Admin
That's OVER NI.... nevermind
Admin
You have a very optimistic view of how things work. That would be a tacit acknowledgement by management that they made a mistake in the initial contract award, which won't happen.
Which isn't to say all managers are like that, but a lot are. Usually proportional to their actual technical ability; I think it's a bell curve: the ones who know nothing or know a lot are more likely to actually do their jobs properly. It's the ones who know a bit that you need to run like hell from.
Admin
On Windows I think you'd have to use VBScript, so you can hook in to the WMI API to control the adapter.
"Windoze" stopped being funny a long time ago, you know.
Admin
Oh yeah, that's her. I feel like buying some latex.
Admin
Talking about redundancy... why didn't he get two computers and set them up properly, so that he could VPN into the first one to eject its CD tray, which would press eject on the second computer, whose tray would then press reboot on the clustered servers?
Admin
Yes, when Vista came out it became sad instead of funny.
Admin
Of course, trying to do that while logged in remotely might be a Bad Idea.
Admin
ipconfig -release networkBetweenServers
I'm guessing that there's at least one WAN link and another link between the servers. It's possible to simply release and renew the DHCP lease of one connection. *arches eyebrow*
In Soviet Russia, you screw computer!
CAPTCHA: populus. Ooooh I loved that game.
Admin
Thought you were talking about what you'd do to Irish Girl for a minute there...
Admin
You're forgetting 2 things:
1. There's no standard telnetd or sshd on Windows. You'd have to RDP in, which is pretty heavyweight.
2. Much more importantly, if you're coming in over the network, it's kind of hard to turn the network back on if you just turned it off. Maybe there's some script you could run that sleeps between the off and the on, or you could have a separate management interface on the box.
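The "sleep in the middle of off/on" idea might look like this Python sketch. It assumes Windows, admin rights, and the `netsh interface set interface name=... admin=disabled|enabled` syntax (worth checking on your Windows version); the adapter name `Heartbeat` and the function name are hypothetical. The `run` parameter exists only so the commands can be checked without touching a real NIC.

```python
import subprocess
import time
from typing import Callable, List

Runner = Callable[[List[str]], None]


def _run(cmd: List[str]) -> None:
    # check=True raises CalledProcessError if netsh reports a failure.
    subprocess.run(cmd, check=True)


def bounce_interface(name: str, down_seconds: float = 30.0, run: Runner = _run) -> None:
    """Disable a NIC, wait, then re-enable it, all inside one process.

    The re-enable step lives in the same script (in a finally block), so the
    link comes back even though whoever started the script over that link
    loses contact as soon as it goes down.
    """
    base = ["netsh", "interface", "set", "interface", f"name={name}"]
    run(base + ["admin=disabled"])
    try:
        time.sleep(down_seconds)
    finally:
        run(base + ["admin=enabled"])


if __name__ == "__main__":
    bounce_interface("Heartbeat", down_seconds=30)  # hypothetical adapter name
```

Launching it detached (for example from a scheduled task) rather than from inside the remote session is probably the safer choice, since that session may not survive the link going down.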
Go Vandelay, where do I sign up?
captcha: tego (no stra-?)
Admin
One of the most common problems with management in this industry (and probably in many other industries as well) is: when deciding whether to implement something like this, they often base their decisions solely on what it will cost right now. The potential benefits of something like a VPN at some point in the future, in circumstances that are (in their judgment at least) unlikely to ever occur, rarely enter into the decision-making process.
Admin
Let us hope that Irish Girl never reads these comments. (See, Irish Girl? I'm defending your honor, you want me!)
(Honor? I hardly know 'er!)
Admin
FTFY. HTH. HAND.
For example, compare prices on MySQL, PostgreSQL, Oracle.
(Yes, MySQL has certain problems. PostgreSQL has certain other problems. Oracle also has certain problems - some from column A, some from column B, some from column OMGWTF.)
Admin
What I want to know is why the following 2 comments aren't "Featured"... Or don't we get featured comments anymore?
Also (despite my LOLing at the above), I think some of you guys need to back off on the Irish girl comments. Yes, she is quite good looking, but settle down. If she was yours, would you want the other guys talking like that about her?
Admin
First off there is the FUD factor. A bug in the clustering could cause two machines to gain access to the same disks. With a STOMITH solution (or turning off the port in the FC-switch...) that risk is reduced.
With generic PC hardware there is also the "computer freezes for a while" problem that often happens after someone has inserted a bad cd in the drive... During the freeze a split brain error can happen. I've seen it happen on server hardware from HP.
With VMS running on known hardware, that is of course a much less likely error.
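STOMITH (more often spelled STONITH, "shoot the other node in the head") means a node forcibly powers off its peer before touching the shared disks. The race this creates in a two-node cluster, and the fixed per-node delay commonly used as a tie-breaker, can be modeled in a few lines of Python. This is a toy simulation with hypothetical names (`simulate_fencing`, the delay values); real fencing goes through a power switch or, as mentioned above, the FC switch port.

```python
from typing import Dict, Set


def simulate_fencing(delays: Dict[str, float]) -> Set[str]:
    """Return the nodes still powered on after every node loses the heartbeat at t=0.

    Each node waits its own fence delay, then powers off every other node
    that is still up. Shots fired at the same instant all land, so two nodes
    with equal delays take each other out.
    """
    alive = set(delays)
    for t in sorted(set(delays.values())):
        # Only nodes that are still up when their timer fires get to shoot.
        shooters = {n for n, d in delays.items() if d == t and n in alive}
        victims: Set[str] = set()
        for n in shooters:
            victims |= alive - {n}
        alive -= victims
    return alive


print(simulate_fencing({"A": 0, "B": 0}))  # set()
print(simulate_fencing({"A": 0, "B": 5}))  # {'A'}
```

With equal delays both nodes end up dark, which is the failure mode critics of STOMITH point to; giving one node a head start means exactly one survivor.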
Admin
This is a redundant comment.
Admin
Well, if your OS understands shared filesystems and clustering at boot time, then two machines getting to the same disk is a good thing. Again, the problem was solved over 20 years ago.
STOMITH is flawed. All the cluster nodes shoot each other all at the same time. Now your whole cluster is down. If you have a bad driver that walks all over a disk, then it is a problem outside of clustering. Killing the FC port is just putting a bandaid on a gunshot wound.
The reason a freezing machine can take a cluster down is that the important parts of clustering, such as node membership, are not in the kernel and require a user-space process to run. This is the flawed Red Hat design for clustering.
Put the right sanity timers in the kernel and be done with it. "Oh but it is sooooo hard to debug things in the kernel, let's move to user space."
Yeah, I keep hearing that "known hardware" thing, and it is BS.
Admin
Simple solution is to have the servers reboot themselves every night... ghetto engineering at its finest.
Admin
Perhaps investing in an IP-based switch to manually shut off the offending port and a remote console for the server in question would save the moron from having to drive to work.
Seriously, fix your network.
Admin
This might be a bit easier:
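evict.bat:
rem Power off node1 remotely (/h only hibernates the local box), then bounce the app service on node2
shutdown /s /f /m \\node1
sc \\node2 stop Appservice
timeout /t 15
sc \\node2 start Appservice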
Admin
I wouldn't care, I wouldn't be wasting time reading this!