• fanha (unregistered)

    So the software has a bug that occurs because of an unforeseen scenario, and the IT guy, instead of actually fixing the recurring problem with the hardware connection breaking or asking the vendor for support, just complains to his boss that it's not his fault and lets the problem persist.

    I think I see the WTF.

  • anon (unregistered) in reply to clickey McClicker
    clickey McClicker:
    I admit I am not fully versed on the "irish girl" thing, it was before my time here, so I am left to wonder if this is the original irish girl or what? Or just some busted tees girl.

    It's the OG... [image]

  • anon (unregistered) in reply to anon

    oops... [image]

  • Dr. Phil (unregistered) in reply to Some shmoe who had to write such 'cluster'
    Some shmoe who had to write such 'cluster':
    Oh, you just don't understand 'enterprise' grade hardware and software.

    For you see, you take anything that's available to end user customer, mark it up 500-1500%, hit it with a hammer sometime, if cables, tie a couple of knots in them. Then you sell it to 'enterprises'.

    As for software.

    If there is a simple coloration between problem <-> solution, you must take at least 4 detours, get on a bus with a floppy, upload it via satellite, change byte ordering a couple of times and then you may be approaching the 'enterprise' grade solution for the problem.

    Similar to what I was thinking, but more humorously :)...adminfish had no option in the matter. What the vendor had was a good salesman. The OP confirmed this in a followup msg.

  • (cs)

    I know! I know! The "#$%&" stands for "Failure", right? :)

  • Jan Hančič (unregistered)

    The real wtf is that it took him 15 minutes to get from a parking lot to the server room

  • (cs) in reply to m0ffx
    m0ffx:
    Indeed, Massimo has an inability to administer these servers.

    That was my thought as well. Any admin worth a shit would drive to the office once, perhaps twice, before coming up with a better solution than having to lay hands every time the software got into a funk.

    Massimo fails it.

  • (cs) in reply to wee
    wee:
    m0ffx:
    Indeed, Massimo has an inability to administer these servers.

    That was my thought as well. Any admin worth a shit would drive to the office once, perhaps twice, before coming up with a better solution than having to lay hands every time the software got into a funk.

    Massimo fails it.

    I'm glad you two geniuses are here to solve all these problems. We'd all probably still be programming on punch cards if it wasn't for your smarts.

  • Wisq (unregistered)

    Honestly, if you're getting split-brain problems more often than you're getting actual server failures, and there's no (affordable) way to fix the split-brain problem... you unplug the secondary and save it as a cold backup for when the primary goes offline.

  • (cs) in reply to anon
    anon:
    oops... [image]
    Is there any way we can get her as the Official TheDailyWTF Mascot?

  • cennef (unregistered) in reply to fennec
    fennec:
    Charles400:
    This is a redundant comment.

    Not yet it isn't.

    This is a redundant comment.

    Not yet it isn't.

    This is a redundant comment.

  • Mister Cheese (unregistered) in reply to Jan Hančič
    Jan Hančič:
    The real wtf is that it took him 15 minutes to get from a parking lot to the server room

    Quite efficient. My record for gaining access to a server room is 1 hour 47 minutes. Also due to "security".

  • AndyC (unregistered) in reply to Satanicpuppy
    Satanicpuppy:
    Windows Clustering is an oxymoron. A real failover cluster should never have that sort of conflict.

    Which bit of "Rather than take advantage of fancy-schmancy Windows functionality, the developers of the Redundancy Manager chose to take their own, custom approach" did you have trouble with? Windows clustering wouldn't have suffered in this way.

  • (cs)

    I miss the days when management had two choices for redundancy:

    (a) Buy incredibly expensive hardware (e.g. Stratus) at a million a pop. Automatic, self-correcting redundancy.
    (b) Pay people the going rate because they know what they're doing ...

    ... bwaah hah hah hah hah ...

    Well, it did at least work occasionally. I mean, what's this supposed to mean?

    "With neither systems being aware of the other, they would start fighting for hardware resources which were meant to be accessed by only one server. They would collide, give up due to various errors, then they would try and fail again, stuck in an infinite loop because they had no way of handling such situations."

    I'm old-school. I'd go for the Stratus solution first. I'd go for the "don't buy software from certifiable morons" solution second. But ... "no way of handling such situations?"

    Can nobody say "Aloha?"
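The "Aloha" quip refers to the ALOHA approach of resolving collisions with a random retry delay. Below is a minimal sketch, loosely in the spirit of slotted ALOHA, of why a random delay breaks the "infinite loop" quoted above. Two nodes that always retry after the same fixed delay collide forever, while random waits let one of them get through. The function `contend` and its parameters are hypothetical and are not taken from the vendor's Redundancy Manager.

```python
import random


def contend(max_slots, backoff, rng):
    """Two nodes compete for one shared resource in discrete time slots.

    A node tries to claim the resource when its wait counter is 0. If both try
    in the same slot they collide, and each draws a new wait from backoff(rng).
    Returns the slot in which exactly one node tried (and won), or None if the
    nodes were still colliding after max_slots slots.
    """
    waits = [0, 0]  # both nodes start by trying at once, as in the article
    for slot in range(max_slots):
        trying = [i for i in (0, 1) if waits[i] == 0]
        if len(trying) == 1:
            return slot  # a lone claimant: no collision, resource acquired
        for i in (0, 1):
            if waits[i] == 0:
                waits[i] = backoff(rng)  # collided: pick a new wait
            else:
                waits[i] -= 1  # still backing off
    return None


# A fixed one-slot wait keeps the nodes in lockstep, so they never stop colliding.
print(contend(1000, lambda r: 1, random.Random(0)))
# A random wait of 0 to 3 slots resolves the contention almost immediately.
print(contend(1000, lambda r: r.randint(0, 3), random.Random(0)))
```

With the random wait, each collision has a 3-in-4 chance of being the last one, because the two nodes only collide again if they draw the same value.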

  • Jo (unregistered) in reply to KattMan
    KattMan:
    anon:
    oops... [image]
    Is there any way we can get her as the Official TheDailyWTF Mascot?

    Is there ant way we can get her drunk?

  • Wil (unregistered) in reply to Dr. Phil
    Some shmoe who had to write such 'cluster':
    Oh, you just don't understand 'enterprise' grade hardware and software.

    For you see, you take anything that's available to end user customer, mark it up 500-1500%, hit it with a hammer sometime, if cables, tie a couple of knots in them. Then you sell it to 'enterprises'.

    As for software.

    If there is a simple coloration between problem <-> solution, you must take at least 4 detours, get on a bus with a floppy, upload it via satellite, change byte ordering a couple of times and then you may be approaching the 'enterprise' grade solution for the problem.

    "...simple coloration between problem <-> solution..." - This would require a colourful solution.

    "...take at least 4 detours, get on a bus..." - We're talking bus as in mode of transport here, right?

  • Downfall (unregistered) in reply to Massimo
    Massimo:
    Ok, as usual Alex spiced up the thing a bit, so maybe some clarification is due ;-) <snip>

    Okay, that makes a lot more sense. Anonymization is really getting out of hand. Since you posted them, you obviously don't mind the details being 'out there,' and the truth makes a much better story this time. Thanks for coming by to correct the record. I need to start keeping a compendium of posters who come by to say, "That's not what REALLY happened!"

  • Gaetano (unregistered)

    That event is called "split brain". It is a common problem in cluster architectures, and there are standard ways to solve it without disconnecting a cable. The problem is that some IT workers are clueless.
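One of the standard approaches is quorum: a node may run the shared service only if it holds a strict majority of votes. With just two servers, a network split leaves each side with exactly half the votes. For that reason, clusters usually add a third vote from a tie-breaker, such as a quorum disk, a file share witness, or a third machine.

The sketch below is a minimal illustration under stated assumptions. It assumes three votes in total and a hypothetical `Witness` that grants its vote to only one node at a time. The names are illustrative and do not belong to any product's API.

```python
TOTAL_VOTES = 3  # assumption: server A, server B, and one witness vote


class Witness:
    """Hypothetical tie-breaker: grants its single vote to the first node that asks."""

    def __init__(self) -> None:
        self.holder = None

    def grant(self, node: str) -> bool:
        if self.holder is None:
            self.holder = node
        return self.holder == node


def may_run(node: str, sees_peer: bool, witness) -> bool:
    """Return True if `node` holds a strict majority of the votes.

    `witness` is None when the node cannot reach the witness at all.
    """
    votes = 1  # a node always counts its own vote
    if sees_peer:
        votes += 1
    if witness is not None and witness.grant(node):
        votes += 1
    return 2 * votes > TOTAL_VOTES


w = Witness()
print(may_run("A", False, w))  # link between servers is down, A asks the witness first
print(may_run("B", False, w))  # B is cut off from A and loses the witness vote
```

In a split where both servers can still reach the witness, only the first one to ask gets the extra vote. The other stays passive instead of fighting over the hardware. When the servers can see each other, both have quorum, and the cluster's normal protocol (not shown here) decides which one is active.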

  • Axil (unregistered)

    Just wondering ...

    Does he really exist??

    .. Alex => ALEX => A.L.E.X. (On a highly polished brass plate)

    => Advanced Lexical Editing Xenophon? or Awesome Language Editorial Xerox? (As in table [Sorry])

    Very very small letters: Long time appreciative lurker bursting out with OT first post. Please don't harm me or take away my keyboard ...

    [Ship of Strangers; A.E.S.O.P.]

  • (cs) in reply to James
    James:
    Not related to the article, but "We found an Irish girl to hold the book" is full of win.

    (if you don't get what I'm talking about, whitelist this site on Adblock like a good little forum monkey)

    Hasn't that one been around for weeks? I guess I looked at a dozen ads yesterday for nothing. What are the odds of me refreshing the page for a couple of minutes without that particular ad being chosen to display? Higher than I thought, I figure.

  • (cs) in reply to Jo
    Jo:
    KattMan:
    anon:
    oops... [image]
    Is there any way we can get her as the Official TheDailyWTF Mascot?

    Is there ant way we can get her drunk?

    I think you mean ANY.

    Or perhaps ANDY?

  • WIWTF (unregistered) in reply to KattMan
    KattMan:
    anon:
    oops... [image]
    Is there any way we can get her as the Official TheDailyWTF Mascot?
    She's my WIWTF mascot! (What I Want To ...)

  • (cs) in reply to WhiskeyJack
    Anonymous Coward:
    Thank you.
    WhiskeyJack:
    Chris:
    The real WTF here is that he has to drive to the office to work on this.

    Yup. Just ssh into the remote box and use the "unplugNetworkCable" command, and go back to sleep...

    sudo ifconfig eth0 down; sleep 10; sudo ifconfig eth0 up

    I'm sure it's similar on Windoze.

  • Wyrd (unregistered)

    In this situation, it would be in the guy's best interest to try to fix it, even if the only way to do so involved cheating a little, like hacking the object code or... maybe spoofing the traffic from Host B to Host A and Host A to Host B so that neither of them ever think the other one is down except when it's really down.

    I mean, no he's probably not going to get a raise, heck he probably won't even get recognition. But he would get to sleep in a little later.

  • Jimmy (unregistered) in reply to Procedural
    Procedural:
    Agreed with previous commentators: fix the network. If that's not enough, write some code to shut down the network connection of Server A when Server B's traffic spikes (or touches certain resources indicating that it is live)

    By definition, the purpose of having a cluster is to tolerate failure.

    The network may be fine tonight, and just fine 99% of the time, but one night, there is a hiccup, possibly congestion caused by the application using the link.

    The cluster failover isn't going to work very well if server A still kills server B, when the failover operation is starting.

    A script cannot automatically determine the physical state of the equipment.

    Just because A is able to run scripts doesn't necessarily mean A is working.

    Just because B is generating traffic doesn't mean a split-brain has arisen.

    Dirty hacks are dangerous and prevent clustering systems from working reliably.

    The solution is proper configuration of the clustering software, and/or replacement of clustering software with software that works properly.

    Just because $100k was spent on it doesn't mean it was a valuable investment; it's just money lost in the past.
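Jimmy's point about one-off network hiccups is normally handled inside the clustering software by a failure detector that tolerates short gaps. The peer is only suspected after no heartbeat has arrived for longer than a timeout.

Here is a minimal sketch under stated assumptions. `HeartbeatMonitor` and the 5-second timeout are hypothetical. Timestamps are passed in explicitly so the logic is easy to follow; a real implementation would read a monotonic clock. Suspicion is not proof that the peer is dead, so it should feed a quorum or fencing decision rather than trigger a takeover on its own.

```python
class HeartbeatMonitor:
    """Suspect the peer only after `timeout` seconds without a heartbeat."""

    def __init__(self, timeout: float, now: float = 0.0) -> None:
        self.timeout = timeout
        self.last_seen = now  # time of the most recent heartbeat from the peer

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat received from the peer at time `now`."""
        self.last_seen = now

    def peer_suspected(self, now: float) -> bool:
        """True once the silence since the last heartbeat exceeds the timeout."""
        return now - self.last_seen > self.timeout


m = HeartbeatMonitor(timeout=5.0)
m.heartbeat(1.0)
print(m.peer_suspected(3.0))  # a 2-second hiccup is tolerated
print(m.peer_suspected(7.0))  # 6 seconds of silence exceeds the timeout
```

Picking the timeout is the usual trade-off. If it is too short, congestion on the link looks like a dead peer. If it is too long, a real failure goes unhandled for longer.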

  • chris (unregistered) in reply to Crash Magnet

    Ya, the tray ejection could press the Return key on a PC running a script that does "net stop crappyProgramFoo <CR><LF> net start crappyProgramFoo"

    back to sleep!

  • (cs) in reply to James
    James:
    Not related to the article, but "We found an Irish girl to hold the book" is full of win.

    (if you don't get what I'm talking about, whitelist this site on Adblock like a good little forum monkey)

    And whitelisting it in Kaspersky, if you use it.

  • Over (unregistered) in reply to Anonymous
    Anonymous:
    A hundred thousand dollars??? I wonder how many seats of VMWare Infrastructure that would buy. Plenty, I would have thought.

    That's OVER NI.... nevermind

  • (cs) in reply to Morry
    Morry:
    I don't understand why massimo (or is this Alex's edits?) would argue what the supplier will and won't do. That's management's job to go back to the supplier and tell them their setup isn't working and they need to fix it.

    You have a very optimistic view of how things work. Going back to the supplier would be a tacit acknowledgement by management that they made a mistake in the initial contract award, which won't happen.

    Which isn't to say all managers are like that, but a lot are. It usually depends on their actual technical ability, and I think it's a bell curve: the ones who know nothing or know a lot are more likely to actually do their jobs properly. It's the ones who know a bit that you need to run like hell from.

  • (cs) in reply to lolwtf
    lolwtf:
    sudo ifconfig eth0 down; sleep 10; sudo ifconfig eth0 up

    I'm sure it's similar on Windoze.

    On Windows I think you'd have to use VBScript, so you can hook into the WMI API to control the adapter.

    "Windoze" stopped being funny a long time ago, you know.

  • JimmyVile (unregistered) in reply to clickey McClicker
    clickey McClicker:
    Anonymous Coward:

    I admit I am not fully versed on the "irish girl" thing, it was before my time here, so I am left to wonder if this is the original irish girl or what? Or just some busted tees girl.

    Oh yeah, that's her. I feel like buying some latex.

  • NerfedCharPlayer (unregistered)

    Talking about redundancy... why didn't he get two computers and set them up properly, so that he could VPN into the first one, eject its CD tray to press eject on the second one, and have that tray press reboot on the clusters?

  • AdT (unregistered) in reply to hikari
    hikari:
    "Windoze" stopped being funny a long time ago, you know.

    Yes, when Vista came out it became sad instead of funny.

  • Buffled (unregistered) in reply to hikari
    hikari:
    lolwtf:
    sudo ifconfig eth0 down; sleep 10; sudo ifconfig eth0 up

    I'm sure it's similar on Windoze.

    On Windows I think you'd have to use VBScript, so you can hook into the WMI API to control the adapter.

    "Windoze" stopped being funny a long time ago, you know.

    ipconfig /release

    Of course, trying to do that while logged in remotely might be a Bad Idea.

  • gamers2000 (unregistered)

    ipconfig -release networkBetweenServers

    I'm guessing that there's at least one WAN link and another link between the servers. It's possible to simply release and renew the DHCP lease of one connection. (arches eyebrow)

    In Soviet Russia, you screw computer!

    CAPTCHA: populus. Ooooh I loved that game.

  • asdf (unregistered) in reply to lolwtf
    lolwtf :
    sudo ifconfig eth0 up;sleep 10;sudo ifconfig eth0 down

    thought you were talking about what you'd do to irish girl for a minute there...

  • (cs) in reply to Massimo
    Massimo:
    Lastly, VPN: there wasn't one (due to "security" reasons), and although I suggested implementing it, management just didn't like the idea.
    But they liked the idea of extended downtime while you drove in?

  • Rich (unregistered) in reply to lolwtf

    You're forgetting 2 things:

    1. No standard telnetd or sshd on Windows. You'd have to RDP, which is pretty heavyweight.

    2. much more importantly, if you're coming in over the network, it's kind of hard to turn the network back on if you just turned it off. Maybe there's some script you could run that could sleep in the middle of off/on, or have a separate management interface on the box.

    go vandelay, where do i sign up?

    captcha: tego (no stra-?)

  • (cs) in reply to operagost
    operagost:
    Massimo:
    Lastly, VPN: there wasn't one (due to "security" reasons), and although I suggested implementing it, management just didn't like the idea.
    But they liked the idea of extended down time while you drove in?

    One of the most common problems with management in this industry (and probably in many other industries as well) is: when deciding whether to implement something like this, they often base their decisions solely on what it will cost right now. The potential benefits of something like a VPN at some point in the future, in circumstances that are (in their judgment at least) unlikely to ever occur, rarely enter into the decision-making process.

  • CynicalTyler (unregistered)

    Let us hope that Irish Girl never reads these comments. (See, Irish Girl? I'm defending your honor, you want me!)

    (Honor? I hardly know 'er!)

  • (cs) in reply to Some shmoe who had to write such 'cluster'
    Some shmoe who had to write such 'cluster':
    Oh, you just don't understand 'enterprise' grade hardware and software.

    For you see, you take anything that's available to end user customer, mark it up 500-150000%, hit it with a hammer sometime, if cables, tie a couple of knots in them. Then you sell it to 'enterprises'.

    FTFY. HTH. HAND.

    For example, compare prices on MySQL, PostgreSQL, Oracle.

    (Yes, MySQL has certain problems. PostgreSQL has certain other problems. Oracle also has certain problems - some from column A, some from column B, some from column OMGWTF.)

  • (cs) in reply to asdf

    What I want to know is why the following 2 comments aren't "Featured"... Or don't we get featured comments anymore?

    undrline:
    fennec:
    Charles400:
    This is a redundant comment.
    Not yet it isn't.

    This is a redundant comment.

    As the Master Comment, I feel that you should just both duke it out for commenting resources.

    asdf:
    lolwtf :
    sudo ifconfig eth0 up;sleep 10;sudo ifconfig eth0 down

    thought you were talking about what you'd do to irish girl for a minute there...

    Also (despite my LOLing at the above), I think some of you guys need to back off on the Irish girl comments. Yes, she is quite good looking, but settle down. If she was yours, would you want the other guys talking like that about her?

  • Sebastian (unregistered) in reply to PG
    PG:
    Salmonymous:
    This is commonly referred to as a "split-brain" problem. Apparently, testing was not the priority for such a high price. Nothing to see here, move along.

    Yes and as another poster has said VMS had ways to take care of this way back in the 80s, and without the stupid Linux Clustering "Shoot the other guy in the head" way of thinking.

    First off there is the FUD factor. A bug in the clustering could cause two machines to gain access to the same disks. With a STOMITH solution (or turning off the port in the FC-switch...) that risk is reduced.

    With generic PC hardware there is also the "computer freezes for a while" problem that often happens after someone has inserted a bad cd in the drive... During the freeze a split brain error can happen. I've seen it happen on server hardware from HP.

    With VMS running on known hardware, that is of course a much less likely error.
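The freeze case is what a watchdog (a "sanity timer") is meant to cover. The node has to keep resetting a timer, and if it stalls for too long the watchdog resets the node. That only prevents split brain if the watchdog fires before the peer gives up waiting and takes over.

The sketch below encodes that ordering rule as a minimal illustration. The function name, the outcome strings, and the example timeouts are assumptions.

```python
def freeze_outcome(freeze_s: float, watchdog_s: float, takeover_s: float) -> str:
    """Classify what happens when the active node stalls for `freeze_s` seconds.

    watchdog_s: stall length after which the node's own watchdog resets it.
    takeover_s: heartbeat silence after which the peer takes over shared resources.
    """
    if freeze_s <= min(watchdog_s, takeover_s):
        return "recovered"  # the stall ended before anyone acted
    if watchdog_s < takeover_s:
        return "self-reset, clean takeover"  # node is reset before the peer moves in
    # The peer may take over while the stalled node is still alive; if the node
    # wakes up before its watchdog fires, it keeps using the shared hardware.
    return "split-brain risk"


print(freeze_outcome(2, 10, 30))   # short stall, nobody reacts
print(freeze_outcome(15, 10, 30))  # watchdog (10 s) beats takeover (30 s)
print(freeze_outcome(45, 60, 30))  # takeover (30 s) beats watchdog (60 s)
```

The design rule falls out directly: the watchdog timeout has to be shorter than the peer's takeover timeout, with some margin.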

  • Dan (unregistered) in reply to fennec
    fennec:
    Charles400:
    This is a redundant comment.
    Not yet it isn't.

    This is a redundant comment.

    Not yet it isn't.

    This is a redundant comment.

  • PG (unregistered) in reply to Sebastian
    Sebastian:
    PG:
    Yes and as another poster has said VMS had ways to take care of this way back in the 80s, and without the stupid Linux Clustering "Shoot the other guy in the head" way of thinking.

    First off there is the FUD factor. A bug in the clustering could cause two machines to gain access to the same disks. With a STOMITH solution (or turning off the port in the FC-switch...) that risk is reduced.

    With generic PC hardware there is also the "computer freezes for a while" problem that often happens after someone has inserted a bad cd in the drive... During the freeze a split brain error can happen. I've seen it happen on server hardware from HP.

    With VMS running on known hardware, that is of course a much less likely error.

    Well if your OS understands shared filesystems and clustering at boot time then two machines getting to the same disk is a good thing. Again the problem was solved over 20 years ago.

    STOMITH is flawed. All the cluster nodes shoot each other all at the same time. Now your whole cluster is down. If you have a bad driver that walks all over a disk, then it is a problem outside of clustering. Killing the FC port is just putting a bandaid on a gunshot wound.

    A freezing machine can take a cluster down because the important parts of clustering, such as node membership, are not in the kernel and require a user-space process to run. This is the flawed Red Hat design for clustering.

    Put the right sanity timers in the kernel and be done with it. "Oh but it is sooooo hard to debug things in the kernel, let's move to user space."

    Yea, I keep hearing that "known hardware" thing, and it is BS.
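PG's objection describes a fence race. After the heartbeat link drops, both nodes decide to shoot the other at once, and if both shots go out, both nodes end up powered off. A common mitigation is to give one node a head start by configuring a fencing delay on the other.

The sketch below rests on two assumptions: a shot powers off its target `fence_latency` seconds after it is fired, and a node can only fire while it is still running. `fence_race` is a hypothetical helper, not a real fencing agent.

```python
def fence_race(delay_a: float, delay_b: float, fence_latency: float) -> set:
    """Return the set of nodes still running after both try to fence each other.

    Both nodes lose the heartbeat link at t=0. Node X fires at t=delay_x, and
    its shot powers off the target at t=delay_x + fence_latency (latency > 0).
    """
    if fence_latency <= 0:
        raise ValueError("fence_latency must be positive")
    first, second = sorted([("A", delay_a), ("B", delay_b)], key=lambda p: p[1])
    # The earlier node always gets its shot off: no shot can land before it fires.
    if second[1] < first[1] + fence_latency:
        return set()  # the second node fired before it was powered off
    return {first[0]}  # the second node was down no later than its own firing time


print(fence_race(0.0, 0.0, 2.0))  # equal delays: both nodes shoot, nobody survives
print(fence_race(0.0, 5.0, 2.0))  # B waits 5 s, so A's shot lands first
```

The head start only helps if it is longer than the time a shot takes to land. `fence_race(0.0, 1.0, 2.0)` still ends with both nodes down.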

  • Mitur Binesderti (unregistered)

    Simple solution is to have the servers reboot themselves every night... ghetto engineering at its finest.

  • Dude (unregistered)

    Perhaps investing in an IP-based switch to manually shut off the offending port, plus a remote console for the server in question, would save the moron from having to drive to work.

    Seriously, fix your network.

  • (cs) in reply to m0ffx

    This might be a bit easier:

    evict.bat:

    shutdown /s /m \\node1

    sc \\node2 stop Appservice
    sc \\node2 start Appservice

  • David (unregistered) in reply to tin
    tin:
    Also (despite my LOLing at the above), I think some of you guys need to back off on the Irish girl comments. Yes, she is quite good looking, but settle down. If she was yours, would you want the other guys talking like that about her?

    I wouldn't care, I wouldn't be wasting time reading this!

Leave a comment on “Cluster#$%&”
