• rc4 (disco)

    This feels like déjà vu...

  • Fox (disco) in reply to rc4

    Temporal Bug.

  • LB_ (disco) in reply to rc4

    I definitely read it before, but Google disagrees with me.

  • Maciejasjmj (disco) in reply to LB_

    I think I've seen that one somewhere before, too...

  • RFoxmich (disco)

    TRWTF: "We'll talk about compensation when you're back."

  • PWolff (disco) in reply to RFoxmich
    RFoxmich:
    TRWTF: "We'll talk about compensation when you're back."

    Well, he didn't say back from where. So it might as well be "from the afterlife" or something like that.

    And nowhere is it hinted that Mitch thought any differently.

  • GettinSadda (disco)

    OK, I'm confused... maybe it's just ignorance, but I don't see this as an interesting story.

    As far as I can tell the receivers were configured with a wrong IP address as a managing device and that device was at a different office (or perhaps this was the correct address as the receivers were supposed to be remotely monitored) but they locked up if the managing device did not respond. IP address changed and they work again. The rest is window-dressing.

  • JBert (disco) in reply to GettinSadda
    GettinSadda:
    The rest is window-dressing.

    Shush, don't give away the secret!

  • RFoxmich (disco) in reply to GettinSadda

    Well maybe TRWTF is that this worked as long as it did.

  • redwizard (disco) in reply to GettinSadda
    GettinSadda:
    As far as I can tell the receivers were configured with a wrong unresponsive IP address as a managing device and that device was at a different office (or perhaps this was the correct address as the receivers were supposed to be remotely monitored) but they locked up by design if the managing device did not respond.

    This in itself wasn't a WTF technically (but from a business process viewpoint, I'd say yes it was). The timing of the failure (New Year's Eve, quite inconvenient) was the WTF.

    I especially liked the solution: the device reports SNMP traps to itself. Hey, it got everything working again!

    A tech coming in to troubleshoot a setup doesn't necessarily know all the reasons why something might fail, and even the engineer who designed it wouldn't think to check that off the cuff. In the end, he got the job done.

    RFoxmich:
    TRWTF: "We'll talk about compensation when you're back."

    +1

  • RFoxmich (disco) in reply to redwizard

    I especially liked the solution: the device reports SNMP traps to itself. Hey, it got everything working again!

    Not quite. From the article:

    Mitch sighed, at the end of his rope. There's one more thing I can try, he thought. "But you can remote to the router in here, right?"

    "I can," Paul said.

    "Can you add that IP as its secondary interface? Maybe it will play nicer with the trap?"

    The solution was to reconfigure their local router so that its secondary interface can recognize the same IP as the remote router. TRWTF, then, is that they didn't just configure the devices to TRAP to the local router's normal IP address instead of this horrid kludge. An even bigger WTF is why they don't have the TRAPs going out to a real sink so that they can monitor why the devices are trapping in the first place.
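
    For what it's worth, a "real sink" doesn't have to be fancy. Below is a minimal sketch, assuming all you want is something that accepts UDP datagrams and hands them back so they can be logged. It does not parse SNMP at all, and the helper names (`make_trap_sink`, `receive_one`, `demo`) are made up for illustration. Real traps conventionally go to UDP port 162, which usually needs elevated privileges to bind, so the demo uses an ephemeral loopback port instead.

```python
import socket


def make_trap_sink(host="127.0.0.1", port=0, timeout=2.0):
    """Bind a UDP socket that accepts datagrams (hypothetical helper).

    port=0 lets the OS pick a free port; a real trap sink would
    normally listen on UDP 162, which typically requires privileges.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    sock.settimeout(timeout)  # don't block forever if nothing arrives
    return sock


def receive_one(sock, bufsize=65535):
    """Wait for a single datagram and return (payload, sender_address)."""
    return sock.recvfrom(bufsize)


def demo():
    """Send one fake 'trap' to the sink over loopback and read it back."""
    sink = make_trap_sink()
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # The payload is a placeholder, not a real SNMP PDU.
        sender.sendto(b"fake-trap", sink.getsockname())
        payload, _peer = receive_one(sink)
        return payload
    finally:
        sender.close()
        sink.close()


if __name__ == "__main__":
    print(demo())
```

    The point is only that something is listening at the trap destination and keeping a record, rather than an address that merely exists.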

  • redwizard (disco) in reply to RFoxmich

    "Can you add that IP as its secondary interface? Maybe it will play nicer with the trap?"

    You said:

    RFoxmich:
    The solution was to reconfigure their local router so that its secondary interface can recognize the same IP as the remote router.

    Read that again. The secondary interface ITSELF is the IP of the router. Therefore, the router is sending SNMP traps to itself.

    RFoxmich:
    TRWTF then is that they didn't just configure the devices to TRAP to the local router's normal IP address instead of this horrid kludge. An even bigger WTF is why they don't have the TRAPs going out to a real sink so that they can monitor why the devices are trapping in the first place.

    This I agree with.

  • Dreikin (disco) in reply to redwizard
    redwizard:
    Read that again. The secondary interface ITSELF is the IP of the router. Therefore, the router is sending SNMP traps to itself.

    FTA:

    some of our receivers try to send an SNMP trap to some IP, then the ICMP Destination Unreachable message pipes in and they just die.

    The devices are sending the SNMP traps. The router is just taking over as the target for the traps via adding a secondary interface with the IP address that the devices are sending to.

  • Crunger (disco) in reply to redwizard
    redwizard:
    GettinSadda:
    As far as I can tell the receivers were configured with a wrong unresponsive IP address as a managing device

    This in itself wasn't a WTF technically (but from a business process viewpoint, I'd say yes it was). The timing of the failure (New Year's Eve, quite inconvenient) was the WTF.

    I'd reverse those. It's almost expected that failures like that will happen at bad times. During normal business hours, there are usually a number of people alert to things (like say a screeching UPS for a power socket that "was working fine"), that end up nipping disasters like this in the bud.

    Meanwhile, the fact that the system has a single point of failure that nobody can fix remotely -- there are few truer indicators of human incompetence. That's a golden :wtf:, but nothing less than what we expect of a monopoly utility industry.

    I especially liked the solution: the device reports SNMP traps *to itself*. Hey, it got everything working again!

    I didn't quite read it that way.

    The devices are all set to trap to a device that is currently offline. So, let's make them all trap to a different device, which is closer and can be maintained.

    While saving the day, Ensign Wesley completely glossed over the reasons why all the devices needed to trap to that device, and the nasty conflict that should happen when the correct device comes back online. If his boss weren't so obviously cool, this story could easily have ended poorly for our hero.
