• Wisq (unregistered) in reply to Gump
    Gump:
    Exchange, by default, does not behave this way. You get 4 tries across something like 48 or 72 hours, and the message dies.

    I believe you're confusing a temporary 400-series error (e.g. "can't connect to the target mailserver") with a fatal 500-series error (e.g. "invalid account").

    With a 500 error, you don't keep retrying for 48 to 72 hours, you stop right there and immediately generate a bounce message to be sent to the sender. And, if the sender has mail rules set to send somewhere else (and not to ignore bounces), that could indeed generate additional bounces.

    Of course, it's possible that you might actually be correct and it's Exchange that treats fatal errors as temporary ones. But that's a pretty serious violation of the spec, and as a mailserver admin, I've never seen Exchange servers pounding on my door despite repeatedly receiving fatal errors.
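
    To make the 400-series versus 500-series distinction concrete, here is a minimal Python sketch of how a sending MTA might map reply codes to actions. The function name and the action labels are made up for illustration; only the code ranges come from the SMTP convention discussed above.

```python
def classify_smtp_reply(code: int) -> str:
    """Map an SMTP reply code to the action a sending MTA would normally take."""
    if 200 <= code <= 299:
        return "delivered"      # positive completion
    if 300 <= code <= 399:
        return "intermediate"   # e.g. 354 after DATA: keep talking
    if 400 <= code <= 499:
        return "retry-later"    # transient failure: keep the message queued
    if 500 <= code <= 599:
        return "bounce-now"     # permanent failure: stop and notify the sender
    return "unexpected"


for code in (250, 451, 550):
    print(code, classify_smtp_reply(code))
```

    The point of the split is that only the "retry-later" branch keeps a message in the queue; a "bounce-now" result ends delivery attempts immediately.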

  • Anonymous (unregistered) in reply to nonpartisan
    nonpartisan:
    Quirkafleeg:
    Anonymous:
    Qyn:
    This reminds me of the "Reply All" fiasco at my workplace last year.
    I've seen this happen numerous times and have heard stories of it happening with over 100000 recipients. It's a common problem among idiots with no clue about e-mail.
    Is this Bcc thing for when I fill up the Cc?
    Bcc means that the recipients listed there will get the message BEFORE those listed in the cc.
    No no, Quirkafleeg is absolutely correct. The "B" stands for "bonus" as in you get all these bonus recipients!
  • Krenn (unregistered) in reply to your_mom
    your_mom:
    We've literally had a server down for over half a day with a fix that only took about 20 minutes because of this sh*t.

    This is why I love our escalation management team. They handle the barrage of status demands and also the scutwork of figuring out who needs to be told when a high-severity issue comes in (i.e., who are the stakeholders for app QXN), leaving the techs and admins to just fix the damn thing.

    Of course, we also curse them when they're reminding us that our 30 minute ticket update is overdue. ;)

  • Anonymous (unregistered) in reply to Steve
    Steve:
    Two words: Nagios monitoring
    Excuse me Steve but this is multi-cultural site and we would appreciate it if you could keep your racial epithets to yourself.
  • Mike (unregistered)

    My only question is "Who HASN'T had this happen?"

    CAPTCHA: validus - I feel validated now.

  • Anonymoose (unregistered) in reply to Steve
    Steve:
    Two words: Nagios monitoring
    Because it's better to have all your users yelling at you and your monitoring system nagging you at the same time?
  • Anon (unregistered) in reply to TroZ
    TroZ:
    Qyn:
    This reminds me of the "Reply All" fiasco at my workplace last year. One of the company-wide email blasts went out for some such or the other. One of the management recipients had a question, and replied to the email. Of course, being management, technology is a mystery, so instead of sending a direct reply, they sent a "reply all" to over 20,000 people. <snip> Needless to say, several heads rolled, and the "reply all" button was stricken from the email client.

    That is why company wide emails should be BCCed to all instead of TO all.

    My old boss (actually boss' boss) hated BCC. He felt if you're going to copy somebody else's boss when you bitch at them you should man up and CC them so your victim can see it.

  • Matt (unregistered) in reply to DES
    DES:
    Pardon my saying so, but this is bulls*. A normal Sendmail setup will not behave as described in the article. Either Chris or his predecessor made a complete hash of configuring sendmail, or whoever sent this in made it all up.

    IS there even such a thing as a "normal" Sendmail setup? Not that I've ever seen. To each his own.

  • (cs) in reply to Steve
    Steve:
    Two words: Nagios monitoring

    When will you people stop posting your CAPTCHAs?

  • Spike (unregistered)

    Wow, that company must have very shitty conference rooms. If the room is cramped with just one occupant, how much worse must it be to actually attend a meeting there?

  • mh (unregistered) in reply to Mason Wheeler
    Mason Wheeler:
    So why not just buy a larger hard drive?
    Sorry but this kind of thing is TR-R-WTF. Anyone with any experience in this game knows that usage always expands to fill capacity. All that buying a bigger disk would give you is now you have 500 GB of crap in your queue to worry about.
  • Old fart (unregistered)

    So this just confirms former Senator Ted Stevens' assertion that "the internet is a series of tubes" and that "sending an internet can cause the tubes to become clogged."

  • (cs) in reply to your_mom
    your_mom:
    "Bounce loops should not occur either. The first bounce goes to the sender, the second to the admin (postmaster), and the third is discarded."

    Should is a key word there. I've seen systems set up by clueless sysadmins that would change stuff like that because they are afraid they would lose something. In fact I've seen it quite often. They would set something like send all bounces to the sender and the admin and never discard anything. They take an approach of enable everything, save everything, and notify me about everything with no understanding of what they are doing.

    Yea, I remember a few years ago when our (Exchange) mailserver went down for half a day because someone turned 'out of office' on in Outlook and for some reason sent an email to himself (likely to check if it worked, what do I know)....

    End result: Exchange server goes down and there are some 40000 mails in the queue which had to be sifted through for the real ones which should not be discarded.

    What fun.....

    Yazeran

    Plan: To go to Mars one day with a hammer

  • Eric (unregistered)

    When will people learn. EMail is for sending messages, not files...

  • (cs)

    That takes me back a bit. I remember, probably about ten years ago, I had a client who managed to kill IMAP on our mail server because she was attempting to download a two-gig attachment over dialup. Good times.

  • Lod (unregistered) in reply to Anonymoose

    I completely agree. Sometimes this site seems to be a chronicle of admins so clueless they don't know that they are clueless.

    sendmail is old and crusty, but it works well enough when configured correctly. sendmail provides plenty of controls that could have avoided this situation, as you pointed out.

    since the admin could have prevented the meltdown without violating the terms set by his management, how does responsibility for this situation fall on anyone else?

  • bramster (unregistered)

    Arrived at work this morning, after a day away . . . A whole sh*tload of messages telling me my mailbox was full.

    Now, ALMOST FULL I can understand.

    It's like "This page intentionally left blank"

    WTF

  • ronron (unregistered)

    "The queue directory contained such a preposterous number of files that wildcards such as ? and * could not expand. That meant there was no immediate way to list only the Sendmail "q*" files.." Ever heard of:

    ls | grep "^q"
    ?

  • smxlong (unregistered)

    Uhh.. Isn't TRWTF the use of SMTP as a file transfer system?

  • Franz Kafka (unregistered) in reply to Lod
    Lod:
    I completely agree. Sometimes this site seems to be a chronicle of admins so clueless they don't know that they are clueless.

    sendmail is old and crusty, but it works well enough when configured correctly. sendmail provides plenty of controls that could have avoided this situation, as you pointed out.

    since the admin could have prevented the meltdown without violating the terms set by his management, how does responsibility for this situation fall on anyone else?

    If you look at the end of the story, he tried to get the 'retry every hour' policy changed and was denied. This leads me to believe that it was imposed by mgmt in the first place. Otherwise, he'd just change the damn policy.

  • GP (unregistered) in reply to Mason Wheeler
    Mason Wheeler:
    Things like this happened with distressing frequency when I was in high school. I don't remember what the idiotic client we had for the school email system was called, but the default when you clicked "reply" was "reply to all," not "reply to sender." You had to go out of your way to hunt down the "reply to sender" command if you didn't want everyone to see your reply. Needless to say, hilarity ensued, repeatedly.

    I think it was mailx (on SunOS) that was configured that way.

    'r' was reply-all; 'R' was reply-sender.

    You could 'set flipr' in your .mailrc to reverse the behavior.

  • (cs) in reply to Franz Kafka
    Franz Kafka:
    Lod:
    I completely agree. Sometimes this site seems to be a chronicle of admins so clueless they don't know that they are clueless.

    sendmail is old and crusty, but it works well enough when configured correctly. sendmail provides plenty of controls that could have avoided this situation, as you pointed out.

    since the admin could have prevented the meltdown without violating the terms set by his management, how does responsibility for this situation fall on anyone else?

    If you look at the end of the story, he tried to get the 'retry every hour' policy changed and was denied. This leads me to believe that it was imposed by mgmt in the first place. Otherwise, he'd just change the damn policy.

    There is absolutely nothing wrong with "retry every hour". What's wrong is sending NDNs for transient errors. The MTA should only send an NDN when it's given up.
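
    The rule described here (retry as often as you like, but only send an NDN once you have given up) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how sendmail implements it: the function name, the action labels, and the five-day give-up time are all hypothetical choices.

```python
from datetime import timedelta

# Assumed give-up time for queued mail; purely illustrative.
MAX_QUEUE_AGE = timedelta(days=5)


def next_action(reply_code: int, age: timedelta) -> str:
    """Decide what to do with a queued message after a delivery attempt."""
    if 200 <= reply_code <= 299:
        return "done"        # delivered, nothing more to do
    if 500 <= reply_code <= 599:
        return "send-ndn"    # permanent failure: notify the sender once
    if age >= MAX_QUEUE_AGE:
        return "send-ndn"    # transient failures, but we have now given up
    return "requeue"         # transient failure: retry quietly, no NDN
```

    With this shape, an hourly retry interval only changes how often "requeue" is returned; it never multiplies the number of NDNs.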

  • (cs) in reply to ronron
    ronron :
    "The queue directory contained such a preposterous number of files that wildcards such as ? and * could not expand. That meant there was no immediate way to list only the Sendmail "q*" files.." Ever heard of:
    ls | grep "^q"
    ?

    Won't work. Try it yourself. Even

    ls -f
    probably wouldn't have helped if this was more than ~5 years ago.
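
    For what it's worth, a directory too large for shell wildcards can still be walked one entry at a time, without sorting and without building an argument list. A minimal Python sketch, assuming the queue files simply start with "q" as the article says; the function name is made up for illustration, and whether this would have been fast enough on the filesystem in the story is not something this thread establishes.

```python
import os
from typing import Iterator


def iter_queue_files(queue_dir: str, prefix: str = "q") -> Iterator[str]:
    """Yield file names starting with `prefix`, one at a time, unsorted."""
    # os.scandir reads directory entries lazily, so nothing is expanded
    # on a command line and the full listing is never held in memory.
    with os.scandir(queue_dir) as entries:
        for entry in entries:
            if entry.name.startswith(prefix):
                yield entry.name
```

    A caller would loop over something like iter_queue_files("/var/spool/mqueue"), where that path is an assumption about where the queue lives, and handle each name as it arrives.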

  • Worf (unregistered) in reply to Qyn
    Qyn:
    This reminds me of the "Reply All" fiasco at my workplace last year. One of the company-wide email blasts went out for some such or the other. One of the management recipients had a question, and replied to the email. Of course, being management, technology is a mystery, so instead of sending a direct reply, they sent a "reply all" to over 20,000 people. Needless to say, there were many confused souls that promptly replied that they should probably not be receiving this reply... and wouldn't you know it THEY all hit "reply all" too. It became the joke of the day to send out a reply company-wide and wait for someone to reply that their mailbox was filling faster than a plate at a Vegas buffet. Under the crushing weight of hundreds of thousands of emails, the servers crashed, and email was out for several hours. Needless to say, several heads rolled, and the "reply all" button was stricken from the email client.

    That scenario took down Microsoft's email system as well...

    http://msexchangeteam.com/archive/2004/04/08/109626.aspx

    Took Microsoft a couple of days to fix.

  • wheresthefire (unregistered)
    Zylon:
    Checking free space before trying to cram more stuff onto the disk IS a trivial operation, dingbat.

    Yes, and Sendmail certainly does this. It kept running in a configuration that involved mailing error messages to an account that ended up getting full. It didn't crash. It presumably returned proper SMTP response codes for resource exhaustion. In short, it did what it was told to do. What's the alternative?

    void HandleIncomingEmail(Message* msg) {
      if(DiskIsTooFullForMessage(msg)) {
        DoZylonsAwesomeAlgorithmForHandlingIncomingEmailMessageWhenDiskIsFull();
      } else {
        DoNormalStuff(msg);
      }
    }
    
  • (cs) in reply to wheresthefire
    wheresthefire:
    Zylon:
    Checking free space before trying to cram more stuff onto the disk IS a trivial operation, dingbat.

    Yes, and Sendmail certainly does this. It kept running in a configuration that involved mailing error messages to an account that ended up getting full. It didn't crash. It presumably returned proper SMTP response codes for resource exhaustion. In short, it did what it was told to do. What's the alternative?

    void HandleIncomingEmail(Message* msg) {
      if(DiskIsTooFullForMessage(msg)) {
        DoZylonsAwesomeAlgorithmForHandlingIncomingEmailMessageWhenDiskIsFull();
      } else {
        DoNormalStuff(msg);
      }
    }
    

    TRWTF is using branching when you should be throwing exceptions ;)

  • Dave (unregistered) in reply to Ozz
    Ozz:
    md5sum:
    Addendum (2010-02-11 10:46): I guess I shouldn't say that "people thank you"... people just pretty much ignore it if it doesn't crash, they only notice when it DOESN'T work. However, it looks good on your review to have high uptime on the systems you are administrating.
    SysAdmins are like custodians. If you're doing your job right then no-one cares about you, or even thinks about you, until they need you to clean up the mess they made.

    No, no, NO, NO! Bad sysadmin, giving the users ideas like that. No doggychocs for you. Users should daily be grateful for your rapid and magical intervention that allows them to perform their daily tasks. They must never know that you broke it in the first place so that you could look good by knowing what was wrong.

    In an ideal world, you set them up so they think they broke it. You don't ever let them break it by themselves, though.

    TRWTF with the original wossname is that he didn't manage to blame it on the time-travelling effects of the people demanding updates after it went wrong. If management will believe that 'electron leakage' and 'bit slippage' caused that million dollar outage, they won't ask too many questions if you tell them that since electronic signals are propagated through the chip at close to the speed of light, significant time dilation can occur in exceptional circumstances.

  • Right Wing-Nut (unregistered) in reply to Dave
    Dave:
    Ozz:
    md5sum:
    Addendum (2010-02-11 10:46): I guess I shouldn't say that "people thank you"... people just pretty much ignore it if it doesn't crash, they only notice when it DOESN'T work. However, it looks good on your review to have high uptime on the systems you are administrating.
    SysAdmins are like custodians. If you're doing your job right then no-one cares about you, or even thinks about you, until they need you to clean up the mess they made.

    No, no, NO, NO! Bad sysadmin, giving the users ideas like that. No doggychocs for you. Users should daily be grateful for your rapid and magical intervention that allows them to perform their daily tasks. They must never know that you broke it in the first place so that you could look good by knowing what was wrong.

    In an ideal world, you set them up so they think they broke it. You don't ever let them break it by themselves, though.

    TRWTF with the original wossname is that he didn't manage to blame it on the time-travelling effects of the people demanding updates after it went wrong. If management will believe that 'electron leakage' and 'bit slippage' caused that million dollar outage, they won't ask too many questions if you tell them that since electronic signals are propagated through the chip at close to the speed of light, significant time dilation can occur in exceptional circumstances.

    Thanks. I've been suffering from lack of BOFH for the last two months...

  • Mike (unregistered) in reply to Dave
    Dave:
    If management will believe that 'electron leakage' and 'bit slippage' caused that million dollar outage, they won't ask too many questions if you tell them that since electronic signals are propagated through the chip at close to the speed of light, significant time dilation can occur in exceptional circumstances.
    Management may buy that, but any admin worth paying knows you can put a reverse flux capacitor across the binary conduit extrusions to dampen any probabilistic reflections. Unless, of course, you're operating under the Copenhagen interpretation.
  • Right Wing-Nut (unregistered) in reply to Qyn
    Qyn:
    This reminds me of the "Reply All" fiasco at my workplace last year. One of the company-wide email blasts went out for some such or the other. One of the management recipients had a question, and replied to the email. Of course, being management, technology is a mystery, so instead of sending a direct reply, they sent a "reply all" to over 20,000 people. Needless to say, there were many confused souls that promptly replied that they should probably not be receiving this reply... and wouldn't you know it THEY all hit "reply all" too. It became the joke of the day to send out a reply company-wide and wait for someone to reply that their mailbox was filling faster than a plate at a Vegas buffet. Under the crushing weight of hundreds of thousands of emails, the servers crashed, and email was out for several hours. Needless to say, several heads rolled, and the "reply all" button was stricken from the email client.

    We had a ~200 man occurrence of this at AMD around '97. When I opened up my email & saw about 50 of these replies, I figured that most were from engineers administering an intranet beating. So I replied to all. :)

    The whole thing died down on its own in about 45 minutes. After about two hours, corporate IT sent out a company-wide (AMD had on the order of 10k employees at the time) email in very stern tones about not replying to all. I was sorely tempted....

    The next day, the sysadmin for the affected group put up a hand-drawn cartoon. Panel 1: Angry desk worker typing: Reply to all: "Remove me from your email list". Panel 2: A cat sitting behind a desk: "Your request to be removed from the corporate email list has been granted."

  • (cs) in reply to bramster
    bramster:
    Arrived at work this morning, after a day away . . . A whole sh*tload of messages telling me my mailbox was full.

    Now, ALMOST FULL I can understand.

    It's like "This page intentionally left blank"

    WTF

    "This page intentionally left blank" notices do have their reasons. From wiki:

    Chapters conventionally start on an odd-numbered page; therefore, if the preceding chapter happens to have an odd number of pages, a blank page is inserted at the end. Book pages are often printed on large sheets because of technical and financial considerations. Thus, a group of 8, 16, or 32 consecutive pages will be printed on a single sheet in such a way that when the sheet is mechanically folded and cut, the pages will be in the correct order for binding. Such a group is called a section or signature. Books printed in this manner will always have as many pages as a multiple of the large sheets they were printed on, such as a multiple of 8, 16, or 32. As a result, these books will usually have pages left blank.

    Now, for novels it may not matter that much, but there are certain documents where the appearance of missing pages or printing errors can have consequences, like legal documents, operating manuals or contracts. Hence the notice.

    Also: Use BCC to access the pirated software!

  • (cs) in reply to mh
    mh:
    All that buying a bigger disk would give you is now you have 500 GB of crap in your queue to worry about.
    So... buy a larger hard drive _and_ expand people's mailbox quotas?
  • jeffls (unregistered) in reply to Qyn
    Qyn:
    This reminds me of the "Reply All" fiasco at my workplace last year.

    Only one? I worked at DEC->CPQ->HPQ for a decade and these email storms happened very often (i.e., more than once a year).

    I keep saying: People are the most insidious computer virus devised.

  • Anthony (unregistered) in reply to Steve
    Steve:
    Two words: Nagios monitoring
    That was four words...
  • legal weasel (unregistered) in reply to DES
    DES:
    Pardon my saying so, but this is bulls*. A normal Sendmail setup will not behave as described in the article. Either Chris or his predecessor made a complete hash of configuring sendmail, or whoever sent this in made it all up.

    Absolutely correct. The sideways man should go before the fish.

    [Doot doot trying to look less like spam doot doot.]

  • Jonathan (unregistered)

    OK, if there are too many files to use wildcard expansion, you can always pipe ls into egrep:

    ls | egrep '^q.*'

  • FeepingCreature (unregistered)

    The correct solution, imho, is keeping a log of attachments sent; if you send an attachment more than ~50 times, md5 it, move it to a cache folder on a webserver, and replace the attachment with a link.
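
    A rough Python sketch of that idea, kept in memory so it stays self-contained. Everything named here is hypothetical: the class, the base URL, and the dictionary standing in for the web server's cache folder. It also swaps SHA-256 in for the MD5 suggested above.

```python
import hashlib
from collections import Counter
from typing import Dict, Union

SEND_THRESHOLD = 50  # the "~50 times" suggested above
BASE_URL = "https://files.example.invalid/cache/"  # hypothetical web server


class AttachmentTracker:
    """Count how often each attachment is sent; swap repeats for a link."""

    def __init__(self) -> None:
        self.send_counts: Counter = Counter()
        self.cache: Dict[str, bytes] = {}  # stands in for the cache folder

    def process(self, payload: bytes) -> Union[str, bytes]:
        digest = hashlib.sha256(payload).hexdigest()
        self.send_counts[digest] += 1
        if self.send_counts[digest] > SEND_THRESHOLD:
            # Seen too often: keep a single copy and hand back a link.
            self.cache.setdefault(digest, payload)
            return BASE_URL + digest
        return payload  # below the threshold: send the attachment as-is
```

    The first 50 sends of a given file pass through untouched; from the 51st on, process() returns a URL keyed by the file's hash instead of the bytes.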

  • alister (unregistered) in reply to frits
    frits:
    I gave a letter to the postman, he put it in his sack. Bright and early next morning, he brought my letter back.

    She wrote upon it: Return to sender, address unknown. No such number, no such zone. We had a quarrel, a lover's spat I write I'm sorry but my letter keeps coming back.

    So then I dropped it in the mailbox And sent it special D. Bright and early next morning it came right back to me.

    In real life, I sent a parcel by Special Delivery in the UK (registered post in the US? Basically, it gets there by noon). I was woken up at 07:30 with "parcel for you". Err, no. I pointed out the 26pt addressee marked "To:" and my own address in 12pt, double struck through, marked "From:". He wasn't happy... nor was I.

  • (cs) in reply to smxlong
    smxlong:
    Uhh.. Isn't TRWTF the use of SMTP as a file transfer system?

    QFT.

    Also the apparent lack of single-instance storage for attachments in sendmail.

  • quisling (unregistered) in reply to frits
    frits:
    wheresthefire:
    Zylon:
    Checking free space before trying to cram more stuff onto the disk IS a trivial operation, dingbat.

    Yes, and Sendmail certainly does this. It kept running in a configuration that involved mailing error messages to an account that ended up getting full. It didn't crash. It presumably returned proper SMTP response codes for resource exhaustion. In short, it did what it was told to do. What's the alternative?

    void HandleIncomingEmail(Message* msg) {
      if(DiskIsTooFullForMessage(msg)) {
        DoZylonsAwesomeAlgorithmForHandlingIncomingEmailMessageWhenDiskIsFull();
      } else {
        DoNormalStuff(msg);
      }
    }
    

    TRWTF is using branching when you should be throwing exceptions ;)

    Ah, frits, this time, we agree! :D

  • Jay L (unregistered) in reply to Sebastian
    An admin without a clue?

    Without several:

    • As you point out, you could use find
    • Or you could use the handy command mailq, which has been a part of sendmail since at least 1994
    • He relies on the postmaster mailbox for troubleshooting? Does he know about log files?
    • postmaster isn't configured to drop double bounces? Hasn't that been sendmail's default since.. ever?

    I mean, I don't even do this stuff for a living. Chris, you are a Linux sysadmin?

  • Iago (unregistered) in reply to Anonymous
    Anonymous:
    Qyn:
    This reminds me of the "Reply All" fiasco at my workplace last year.
    I've seen this happen numerous times and have heard stories of it happening with over 100000 recipients. It's a common problem among idiots with no clue about e-mail.

    At least one of which seems to have involved the entire Australian armed forces. Most entertaining.

    Nicely appropriate captcha, by the way :) (it was "saluto")

  • Quirkafleeg (unregistered) in reply to Old fart
    Old fart:
    So this just confirms former Senator Ted Stevens' assertion that "the internet is a series of tubes" and that "sending an internet can cause the tubes to become clogged."
    Congraturation, you win an Internet. Hold on while I send it…
  • Cheong (unregistered) in reply to toth
    toth:
    if the message bounced, an error message including the entire original message and attachment was sent back to the sender

    This is TRWTF.

    Agreed. The sender should have a copy sleeping quietly in the "Sent mail" folder.

    Perhaps they should write a new webmail/Outlook plugin that has the bounced-back message contain a reference to the original mail (such as the MessageID), so even if the bounce message does not contain the original attachment, the user can still send it again with one click.

    But wait... you now have two problems... :O

  • Edward Royce (unregistered) in reply to mh
    mh:
    Mason Wheeler:
    So why not just buy a larger hard drive?
    Sorry but this kind of thing is TR-R-WTF. Anyone with any experience in this game knows that usage always expands to fill capacity. All that buying a bigger disk would give you is now you have 500 GB of crap in your queue to worry about.

    In that case buy a -smaller- hard disk.

    Why does it always have to be me to solve these issues!

    :)

  • (cs)

    Aren't modern mail servers smart enough to just hash every message and then only save one copy of otherwise identical messages?

    Yeah yeah, I know, either they've already been doing that for decades, or there's some horrible inherent flaw in such a scheme.

  • WilliamF (unregistered) in reply to nonpartisan

    Bcc is BLIND carbon copy which sends to the addresses without storing them in the message.
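
    A small Python sketch of what that means in practice, using only the standard library's email package. The helper name split_envelope and the example.com addresses are made up for illustration: the Bcc addresses end up in the list of envelope recipients handed to the mail server, while the Bcc header itself is removed before the message goes out.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def split_envelope(msg: EmailMessage):
    """Return (envelope_recipients, message_with_bcc_header_removed)."""
    fields = (msg.get_all("To", []) + msg.get_all("Cc", [])
              + msg.get_all("Bcc", []))
    recipients = [addr for _name, addr in getaddresses(fields) if addr]
    del msg["Bcc"]  # recipients never see who was blind-copied
    return recipients, msg


msg = EmailMessage()
msg["From"] = "sender@example.com"
msg["To"] = "alice@example.com"
msg["Cc"] = "bob@example.com"
msg["Bcc"] = "carol@example.com, dave@example.com"
msg.set_content("Company-wide announcement.")

recipients, wire_msg = split_envelope(msg)
print(recipients)
```

    Python's own smtplib.SMTP.send_message does something similar, in that it does not transmit Bcc headers present in the message.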

  • Nick H (unregistered)

    The beauty of email is that there is no guarantee of delivery.

    Imagine how complex the protocol (and the overall system) would have been if email delivery had to be guaranteed.

    So, deleting the whole queue should be considered a perfectly acceptable solution from the start (after the cause was discovered).

  • nonpartisan (unregistered) in reply to WilliamF
    WilliamF:
    Bcc is BLIND carbon copy which sends to the addresses without storing them in the message.
    Whoosh!!
  • cite (unregistered)
    The queue directory contained such a preposterous number of files that wildcards such as ? and * could not expand.

    That's the real WTF. Hi @ "find"!

Leave a comment on “The Great Cascade”
