• (nodebb)

    Wait, so the mainframe guy is not the cause of the WTF? There's got to be a frist for everything I guess.

  • LCrawford (unregistered)

    I presume they HAD backups and didn't depend on the SAN being a RAID+x having enough redundancy?

  • Me (unregistered)

    Great to see someone get praise on this website, for a change. Yes, the old geezers do know something that the young whippersnappers have yet to learn.

    BTW: TRWTF is sending your most junior tech-support helpdesk employee for training. Been there, done that. Still wearing the T-shirt.

  • RLB (unregistered)

    Erwin wore a "best"? Is that a new-fangled combination of vest and belt?

  • Andrew Miller (google)

    I'd point at the message that said "possible data loss". That ain't the same thing as "This WILL delete ALL your data".

    BTW If you turn up to the office in the UK in a vest and suspenders, you would probably be reported to HR....

  • DQ (unregistered)

    TRWTF is not having one of the guys (or girls) who have been doing it for the last ten years present but relying on some external guy.

  • David-T (unregistered) in reply to RLB

    A best and a velt?

  • Robert Morson (google) in reply to nerd4sale

    I was expecting to find out that Erwin was sabotaging the SAN in a misguided attempt to prove that mainframes were superior.

  • anon (unregistered) in reply to DQ

    if you are paying them for the teaching, why not use it? if something like this happens, you can hold them responsible.

  • DQ (unregistered) in reply to anon

    It's always nice if you can blame the s*it on somebody else, but that doesn't help much if you're the one cleaning it up...

  • IHeartVMWare (unregistered)

    I submitted this article, and I wanted to say that Remy did a great job keeping the main underlying facts in the story. I was not a storage or MF admin at the time (I was a virtualization admin who suffered some consequences), but this is one of my favorite 'war stories' to talk about from my career. I have a couple more I hope to submit soon.

  • Dale (unregistered)

    How did mainframes solve the storage problem? Is it just a case of not having much data to store?

  • Prime Mover (unregistered)

    Half way through this story I was preparing to rise to my feet.

    A little further on and I had my hands raised.

    At the story's end I was giving Erwin the standing ovation that he deserves.

    I myself have been the recipient of a process for performing a task. And, like Erwin, I ensure that the person handing it off to me dictates in full detail exactly what he does. And I write it down carefully.

    (Oh, and as soon as the technician or whoever is out of the door, I write a script or build a robot (coughITAPPMONcough) to do as much of this job automatically. Like the time when a colleague taught me how to use an "automated" procedure which evolved into 5 pages of my A4 notebook crammed with instructions on exactly what to do.)

  • Prime Mover (unregistered) in reply to IHeartVMWare

    Wow, yes, please do. This is already one of my favourite stories on TDWTF and I've read a few of them, I can tell you.

  • (nodebb) in reply to Andrew Miller

    BTW If you turn up to the office in the UK in a vest and suspenders, you would probably be reported to HR....

    Doubly so if you're a guy and anyone can tell you're wearing suspenders.

  • MiserableOldGit (unregistered) in reply to Steve_The_Cynic

    Doubly so if you're a guy and anyone can tell you're wearing suspenders.

    You clearly never worked in the UK Civil Service. Such things are mandatory if you wish to ascend the ranks there.

    Even in a boring software firm in Surrey we had a guy on the team who regularly turned up with eyeliner and lipstick and nail varnish because he hadn't actually seen a bathroom since his night out on the town. I tried to avoid getting near enough to find out if he had perfume or what his breath smelled of .... ahem.

  • Foo AKA Fooo (unregistered) in reply to Andrew Miller

    'I'd point at the message that said "possible data loss". That ain't the same thing as "This WILL delete ALL your data".'

    This, and the great difference between "initialize" and "re-initialize". What's wrong with those people, aren't they capable of writing clear messages, or do they make it unclear for ... reasons? (Can't be in order to improve their reputation, if the result is deleting everything their product is storing.)

    FWIW, when I wrote a tool that had to overwrite a disk, I wrote "will delete everything on the disk" ..., even though usually the disk would be empty. Better to err on the side of caution here. Additionally I would display the contents (so the user can check that the disk was actually empty, unless they intentionally want to overwrite a disk), and required them to type "yes" (at least the vendor here did that). All of it seemed kind of obvious to me (and still does), and it's not even my main area of expertise (unlike the vendor here -- at least it should be theirs).
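
    The confirmation flow described above can be sketched roughly as follows. This is only an illustration, not the original tool: the function name `confirm_overwrite` and its `ask`/`show` parameters are assumptions made up for the example, and the caller is assumed to supply the disk's contents as a list of strings.

```python
from typing import Callable, Iterable


def confirm_overwrite(
    disk_name: str,
    contents: Iterable[str],
    ask: Callable[[str], str] = input,
    show: Callable[[str], None] = print,
) -> bool:
    """Return True only if the user explicitly types 'yes'.

    Hypothetical helper: `ask` and `show` are injectable so the prompt
    can be exercised without a terminal.
    """
    entries = list(contents)

    # State the consequence plainly, even if the disk is usually empty.
    show(f"WARNING: this will delete everything on {disk_name}.")

    # Show what is on the disk so the user can check it really is empty.
    if entries:
        show(f"{disk_name} currently contains:")
        for entry in entries:
            show(f"  {entry}")
    else:
        show(f"{disk_name} appears to be empty.")

    # Anything other than the full word "yes" counts as a refusal.
    answer = ask('Type "yes" to continue: ')
    return answer.strip() == "yes"
```

    The one design choice worth noting is that every answer except the literal word "yes" (including an empty line or "y") is treated as "no", so a stray keypress cannot trigger the overwrite.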

  • Worf (unregistered)

    I have to admit, I didn't see that coming.

    And I have to admit, I half expected Erwin to have screwed it up by turning a 5 minute procedure into an hour long one. Or showing his disdain for the SAN by being horribly incompetent.

    But no, he was being diligent and getting clarity into the procedure making sure to do things right. And careful at that. And asking lots of questions.

    Though the UI could be improved, including not calling it "Initialize" or "Reinitialize" but something plainly obvious like "Add replacement disk to array" or "Replace missing disk". If it was going to delete all the data because it effectively deletes the entire array, it should say that: "Reinitialize will delete the entire array, all data will be lost!". Plus, since replacing failed disks is a common operation, that should be an extremely visible button on all pages: "A disk is missing, and I found a new disk, replace missing disk with the new disk?"

    If the disk has a hot or cold spare, it could even say that "Missing disk replaced with hot spare. Use new disk as spare?"

    Common operations are common and should be easy to do. Things like deleting the array are extremely uncommon and may only be done once in the unit's lifetime, so hiding it away is a good idea.
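
    As a rough sketch of the prompts suggested above, the UI could pick its message from the array's state. Everything here is hypothetical: `ArrayState`, its fields, and `suggest_action` are invented for illustration and do not correspond to any real controller's interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArrayState:
    """Hypothetical snapshot of what the controller knows."""
    missing_disks: int        # disks the array expects but cannot see
    new_disk_found: bool      # an unassigned disk has been inserted
    hot_spare_in_use: bool    # a spare has already taken over


def suggest_action(state: ArrayState) -> Optional[str]:
    """Return the prompt to show prominently, or None if nothing to offer."""
    if not state.new_disk_found:
        return None
    if state.hot_spare_in_use:
        # The spare already covered the failure; the new disk can refill the spare slot.
        return "Missing disk replaced with hot spare. Use new disk as spare?"
    if state.missing_disks > 0:
        return ("A disk is missing, and I found a new disk. "
                "Replace missing disk with the new disk?")
    return None
```

    Note that nothing in this path mentions initializing or deleting the array; the destructive operation would live somewhere far less visible.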

  • MiserableOldGit (unregistered)

    I suspect the details of whatever the UI said and what went on at that point have been lost in time and subsequently made up. The issue there is they probably did send along a non-expert to offer "training" because they didn't take the client's concerns seriously. Erwin might have been a petty pedantic PITA on a mission to prove a point, but they made it easy for him.

  • MiserableOldGit (unregistered)

    I had a very similar experience around the same time, but I was working in the tourist industry. Not "conservative" but certainly a hub for IT WTF of epic proportions. My sector was doing incredibly well, had no idea what to do with the wheelbarrows of cash they were getting, other than spaff it away on absurd marketing consultants and very ill-advised capital IT "investments".

    I think it was around 2004 for us, I remember speaking to friends of friends in the insurance industry who started to drool at the tech we were buying, and laugh uncontrollably at what was being done with it.

    We got a nice big SAN, complete with this robot backup machine (it was a beauty to watch it swap a tape) and a load of fibre optic connectors between them and the servers. Of course it fixed nothing, because it may have served up 2 TB, but if the morons in marketing have no concept of resources, they'll gobble that in weeks, and they did.

    It became clear the IT manager who had done this had no clue what he was doing, so just before he went on holiday the company (in a rare moment of clarity) got a couple of consultants in to be his standby as they didn't trust his assurances that nothing could possibly go wrong while he was away. During his tour of the server room, they asked the same question about swapping disks and he said "oh, it's all hot swappable, see?" and popped two disks out the array and pushed them back in.

    He'd left the building and was on his way to the airport before the horrible reality of the cluster-fuck he'd caused truly came to light ... the SAN had some sort of caching that (sort of) kept shit going for 45 minutes, but then everything went off-line for an 8 hour array rebuild and there was nothing we could do.

    We did have an Erwin dealing with the "Medium Iron" that travel companies use, his track record was only slightly better, but he was better at company politics.

  • JohnyBgood (unregistered)

    I'm a field tech. Whenever I'm left alone working, everything goes well and the customer is satisfied. Whenever I get an "Erwin" getting on my nerves and breaking my line of thought every damn minute, well, shit happens.

  • (nodebb)

    I once worked at a place where the "little iron", an AS/400 (a.k.a. iSeries) about the size of a small chest fridge, was so fragile that accidentally unplugging the network cable meant that it had to be cold booted. Meanwhile, the x86 junk surrounding it was all virtualized and clustered so that a large chunk of hardware would have to go down before anyone noticed.

  • (nodebb) in reply to MiserableOldGit

    I remember a case where a technician was sent for an expensive repair on a large scientific apparatus. The company had outsourced support to local tech companies, and didn't even provide them with documentation — remember, science apparatus, so we are talking about 100k€, if not several, in purchase costs, and adequately expensive support contracts, plus billing for technicians to actually provide on-site support.

    After a week of wasting several scientists' time, the device still didn't work and the measurement PC was broken too. The company stepped back from billing the technician hours, but by all rights the institute should have been compensated a few thousand € in wasted working hours.

  • RLB (unregistered) in reply to Robert Morson

    I was expecting to find out that Erwin was sabotaging the SAN in a misguided attempt to prove that mainframes were superior.

    Nah. Guys like this have their professional pride.

  • RLB (unregistered) in reply to Steve_The_Cynic

    Doubly so if you're a guy and anyone can tell you're wearing suspenders.

    Who are you to look down on my dear papa?!

  • Hal (unregistered) in reply to Worf

    I have a sneaking suspicion, based on the vagaries of the story, that this is/was a Compaq/HP smart array. Those were rife with WTFery, starting with the clumsy web interface intended to make it "easy" but in practice requiring that you either have a Compaq tech come out and do it for you, because they did it enough to know the procedures cold, or read the manual through front to back and only then walk through whatever procedure step by step.

    The UI was positively littered with vague, inconsistently used, and confusing terminology like "initialize" vs "re-initialize", or "attach" on one screen vs "online" on another ...

    That is before you got to the hardware. SAS shelves and SCSI had the same SCSI cable connector. Wanna guess if you could put them on the same chain and controller? Well, you could cable it that way and it would work ... for a while ... then bad stuff would happen. Why not use a different cable to avoid mistakes when moving things, or at least detect the error and let the user know they should not start the arrays cabled that way? Because ... where would be the fun in that, I guess.

  • MiserableOldGit (unregistered) in reply to RLB

    Nah. Guys like this have their professional pride.

    Really? I've worked with plenty who spent a good chunk of their day sabotaging and backstabbing. I remember one who used to take every integration project as an opportunity to DOS-attack the other servers with floods of meaningless and unnecessary requests to try and "prove" some sort of point.

  • RLB (unregistered) in reply to MiserableOldGit

    I've worked with plenty who spent a good chunk of their day sabotaging and backstabbing.

    That would, I believe, be the kind that doesn't listen to field techs at all, not the kind that makes them go slower so they can take notes.

  • MiserableOldGit (unregistered) in reply to RLB

    True ... I recall getting dragged into an "emergency" meeting with this one, the head of IT and some poor unfortunate field tech whom he'd brought in to install some software tool. Apparently the tool needed MySQL for its datastore, and at the time we had no MySQL presence. Turned out he had assumed SQL Server and MySQL were one and the same, and it had become my fault when the field tech had pointed out he couldn't put his datastore on our SQL Servers. Well, everyone's fault, except his.

    I did say it would probably take little more than half an hour to spin up a VM, put the MySQL version of their choice on it and sort out whatever network settings and drivers were necessary, and that we could deal with who was going to look after it later on ... but as shouty man had already burned through an hour arguing with the tech and trying to make him solve a problem he couldn't, the poor guy just wanted to get out of there before he wrecked even more appointment slots.

    He didn't really listen to anyone... I mean he had ears, but they were just there so his glasses would stay on.

  • (nodebb)

    The real WTF is that the controller doesn't automatically rebuild the drive (at the very least after you tell it about the replacement, but some controllers will even skip that step for you).

    Which I guess is another advantage of having a hot spare, as they always rebuild automatically, and also the operation of adding a new hot spare (or equivalent) is much less dangerous.
