• Dark Shikari (cs)

    The real WTF is the fact that the FTP software defaulted to ASCII for file transfer.

  • Darrell (unregistered)

    Management didn't want to spend money on a critical backup system? Shocker! ;)

  • brian (unregistered)

    uh, yeah.

    Random file corruption DOES happen.

    But in this case, STUPIDITY happened.

    The real WTF is that they allowed stupid people to make decisions regarding data integrity.

  • Thief^ (cs)

    Someone was still using a command-line ftp? How old is this?

    EDIT: More to the point, why wasn't it automated, and why did they never test the backups after changing to the new system?

  • wtfwtf (unregistered)

    The new WTF is the crazy opinions in the articles. Files don't get corrupted? On what planet, in what alternate reality, are bits always stored perfectly, never get flipped, and a program never does anything wrong?? I want to live there. Maybe they even name their wtf sites properly.

  • ErikTheRed (unregistered)

    <MOOD STYLE="grumpy_old_geek">These damn kids today don't even know what a command prompt looks like. All that GUI crap...it's rottin' their brains! When I was a kid, we used a keyboard and commands to interact with our computers. Now these fancy mice and easy screens are spoilin' em!

    They should teach the rudiments of command-line navigation to every newbie. I'm a Windows sysadmin, and can't stand it when one of the admins I work with is "afraid" to automate a process because they haven't touched the command prompt...it happens!! </MOOD>

    Seriously, any sysadmin who didn't get it in writing from the higher-ups who didn't want to replace a tape backup system isn't worth their salt. Imagine if you were the guy caught out because someone's automated FTP script assumed binary file transfers!!

  • Joachim Otahal (unregistered) in reply to Dark Shikari
    Dark Shikari:
    The real WTF is the fact that the FTP software defaulted to ASCII for file transfer.

    It is not. Every command-line FTP client I know starts in ASCII mode, simply because the server you connect to is in ASCII mode when the connection opens. I don't know the RFC, but every FTP client I know can handle this, and most of them (like most GUI FTP programs, browsers, etc.) silently switch to binary mode.

  • gerrr (unregistered) in reply to Thief^
    Thief^:
    Someone was still using a command-line ftp? How old is this?

    I use the command-line ftp in DOS all the time. I'm sure a non-GUI Linux or Unix system would have a very similar client.

  • (cs) in reply to Thief^
    Thief^:
    Someone was still using a command-line ftp? How old is this?

    EDIT: More to the point, why wasn't it automated, and why did they never test the backups after changing to the new system?

    Because just telling the intern to upload it is cheaper and less complex than telling someone to write a script to do it, and because no one ever checks if backups work anyways. It's sad, but true.

  • Joachim Otahal (unregistered) in reply to Thief^
    Thief^:
    Someone was still using a command-line ftp? How old is this?

    I occasionally still use command-line FTP; every OS (even Windows XP) offers a command-line FTP client.

    But I only use it to check whether something is wrong with the browser, or to check that the username and password work. After that, it goes into the browser or an ncftp batch job, etc.

  • (cs)

    I'll try to remember that when my 2GB thumb drive wouldn't let me access a directory last week, it wasn't random file corruption that caused the errors... it was some weird little gremlin that decided to chew on it while I was sleeping.

    Good story but bad absolute statement. Random data corruption does happen through no fault of the user. I've had it happen on a thumb drive recently, a hard drive in the past (no, it wasn't a dead hard drive, it was corrupted sectors that appeared overnight and caused massive problems with the files), and sometimes just a file that saved "funny" when I pressed "ALT-F S".

    Seejay

  • Darth Vader (unregistered) in reply to Dark Shikari

    No, the real WTF is that he used a month old backup instead of running sed -e 's/\r\n/\n/g' on the backed up file. Seriously, WTF was he doing as an admin?

  • Jimmy (unregistered)

    Bit rot is real!

  • Joachim Otahal (unregistered) in reply to Darrell
    Darrell:
    Management didn't want to spend money on a critical backup system? Shocker! ;)

    Things like that happen all the time: management decides without first consulting the company that implemented the system or the person who knows how it works.

    What has happened most often here: "Woohoo, we save money by switching ISPs" or "Woohoo, we save money by switching to a cheaper rate with our ISP".

    Usually it ends up with: "Oh, what is a fixed IP?" "We need two Internet connections to have a VPN between our offices?" "What is a smarthost?" "MX record?" etc.

    The few euros (usually around 10 per month) it would have saved are only a fraction of what the mistake costs. I especially like the cut-off offices that then go back to faxing on real paper!

  • bling (unregistered) in reply to Darth Vader
    Darth Vader:
    No, the real WTF is that he used a month old backup instead of running sed -e 's/\r\n/\n/g' on the backed up file. Seriously, WTF was he doing as an admin?

    Maybe you have a sed recipe for appending all the stuff that was truncated after the 'end-of-file' character was encountered, too?

  • Mattastic (unregistered) in reply to Darth Vader

    Oh... no friend, no. Not having BIN on sends the file as 7bit ASCII instead of 8 bits to save bandwidth. I don't think line endings have anything to do with this WTF

  • Jenkins (unregistered) in reply to Darth Vader
    Darth Vader:
    No, the real WTF is that he used a month old backup instead of running sed -e 's/\r\n/\n/g' on the backed up file. Seriously, WTF was he doing as an admin?

    Hahaha! It was binary data! What makes you think the original data didn't have 0x0D 0x0A (CR LF) byte sequences? And that's assuming character sets didn't create additional issues. WTF indeed.
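    To put a rough number on that: if a binary file's bytes behaved like uniform random bytes (an assumption; real formats are far from uniform), each adjacent pair would be CR LF with probability 1/65536, so a 2,000,000,000-byte file would be expected to contain roughly 30,500 of them, each one a spot where a CR LF to LF translation silently deletes a byte. A quick Python sketch (function names are illustrative; `randbytes` needs Python 3.9+):

```python
import random

CRLF = b"\r\n"  # bytes 0x0D 0x0A


def expected_crlf_pairs(n_bytes: int) -> float:
    """Expected number of CR LF pairs in n independent, uniformly random bytes."""
    if n_bytes < 2:
        return 0.0
    # n - 1 adjacent pairs, each equal to CR LF with probability (1/256) * (1/256).
    return (n_bytes - 1) / 256 ** 2


def count_crlf_pairs(data: bytes) -> int:
    # CR LF cannot overlap itself, so a non-overlapping count is exact.
    return data.count(CRLF)


if __name__ == "__main__":
    print(f"{expected_crlf_pairs(2_000_000_000):.0f}")  # 30518
    sample = random.Random(42).randbytes(1 << 20)  # 1 MiB of seeded pseudo-random bytes
    print(count_crlf_pairs(sample), "observed vs",
          round(expected_crlf_pairs(len(sample)), 1), "expected")
```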

  • -j (unregistered) in reply to Darth Vader

    and what if there were some \r\n's in the binary before it was corrupted?

  • Steve Wahl (unregistered) in reply to Darth Vader
    Darth Vader:
    No, the real WTF is that he used a month old backup instead of running sed -e 's/\r\n/\n/g' on the backed up file. Seriously, WTF was he doing as an admin?

    The ASCII translation done by FTP is not always reversible. For instance, transferring from a DOS/Windows world to a Unix world changes CR/LF pairs to LF only, like you show above. But it leaves bare LF ('\n') characters alone. Translating back, you can't tell the difference between a \n that had its companion \r deleted and a \n that wasn't paired with a \r to begin with.
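    This can be checked in a few lines. The Python sketch below models the DOS-to-Unix ASCII transfer as nothing more than "every CR LF becomes LF" (a simplification; real clients may translate more than that) and shows two different originals colliding, so no "reverse" step, sed or otherwise, can restore both. Function names are illustrative:

```python
def dos_to_unix(data: bytes) -> bytes:
    """Simplified ASCII-mode transfer from DOS/Windows to Unix: CR LF -> LF."""
    return data.replace(b"\r\n", b"\n")


def unix_to_dos(data: bytes) -> bytes:
    """The naive "fix": turn every LF back into CR LF."""
    return data.replace(b"\n", b"\r\n")


if __name__ == "__main__":
    had_crlf = b"\x00\x01\r\n\x02"  # binary data that happened to contain CR LF
    bare_lf = b"\x00\x01\n\x02"     # binary data with a lone LF
    # Both arrive as the same bytes, so the information is gone.
    print(dos_to_unix(had_crlf) == dos_to_unix(bare_lf))  # True
    # The naive fix restores the first file but corrupts the second.
    print(unix_to_dos(dos_to_unix(had_crlf)) == had_crlf)  # True
    print(unix_to_dos(dos_to_unix(bare_lf)) == bare_lf)    # False
```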

    That's not the only example, either. Some ASCII file formats end the file when they first encounter a Ctrl-Z. The fact that the backup files were noticeably smaller than they should have been suggests to me that something more drastic like this probably happened to the faulty backups.
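    The Ctrl-Z case is even less forgiving, because data is dropped rather than rewritten. A sketch of a hypothetical text-mode reader that treats the first 0x1A byte as end of file (which real software behaves this way varies) shows how a binary file shrinks:

```python
CTRL_Z = 0x1A  # old DOS text-mode end-of-file marker


def read_as_dos_text(data: bytes) -> bytes:
    """Hypothetical text-mode reader: everything from the first Ctrl-Z on is discarded."""
    cut = data.find(bytes([CTRL_Z]))
    return data if cut == -1 else data[:cut]


if __name__ == "__main__":
    blob = bytes(range(256)) * 4  # 1024 bytes; the first 0x1A sits at offset 26
    survived = read_as_dos_text(blob)
    print(len(blob), "->", len(survived))  # 1024 -> 26
```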

  • obediah (cs)

    I guess "random" depends on your frame of reference; a random coin flip is simply a function of a few million (or more) variables.

    Anyway, as someone who managed a collection of Samba shares serving 30,000 users: working with a Word doc over a network drive will lead to "random" file corruption.

  • Spoe (unregistered) in reply to Darth Vader
    Darth Vader:
    No, the real WTF is that he used a month old backup instead of running sed -e 's/\r\n/\n/g' on the backed up file. Seriously, WTF was he doing as an admin?

    It might work if the "backup" server were the one with the larger files, that is, if the ASCII transfer was mapping '\n' to "\r\n". But, from the article, the files there were smaller, so the mapping was presumably "\r\n" to '\n'. There's no way to determine whether a particular newline in the backup file was a carriage return and newline or just a newline in the original file, so the equivalent "sed -e 's/\n/\r\n/g'" wouldn't (necessarily) work.

    Also, you're assuming that was the only mapping in the transfer.

    If '\n' to "\r\n" was the only mapping that occurred, then yes, it'd work.

  • Belcat (unregistered) in reply to wtfwtf
    wtfwtf:
    The new WTF is the crazy opinions in the articles. Files don't get corrupted? On what planet, in what alternate reality, are bits always stored perfectly, never get flipped, and a program never does anything wrong?? I want to live there. Maybe they even name their wtf sites properly.

    The real WTF is the comments... Why do you think Alex crosses out the words? He's joking around. Do you really think that 30 backups would just randomly get corrupted? No, never, impossible; there has to be a cause.

  • Joachim Otahal (unregistered) in reply to Darth Vader
    Darth Vader:
    No, the real WTF is that he used a month old backup instead of running sed -e 's/\r\n/\n/g' on the backed up file. Seriously, WTF was he doing as an admin?

    ASCII transfer doesn't only translate CRLF<->LF. It also aborts on seeing ^D, since that is the end-of-transmission character (terminals, printing, etc.). I do not know whether it should abort on that, but I see that it does on most *nix-like machines.

    BTW: sed on a (supposedly) binary file??

    BTW2: The article mentions that the backups were far too small.

  • hprotagonist (unregistered) in reply to Thief^
    Thief^:
    Someone was still using a command-line ftp? How old is this?

    I use lftp as my only/preferred FTP client.

    Among other things, it takes the stupid out of ASCII/binary transfers and just does the Right Thing.

  • htg (unregistered)

    rsync? scp? Tested automated scripts?

    captcha: sanitarium (The Loony BIN)

  • A Nonny Mouse (unregistered)

    Skeptical Sally, is Ness's real name Scot? If so I think I know why

  • LintMan (unregistered)

    I'm not an admin, but even to me it seems obvious that you need to do a full test of your backup system when you make a major process change of that sort. Is that not SOP?

  • Anonymus (unregistered)

    Some people hate it when someone types "First!"

    Some people hate it when people type their CAPTCHA.

    Some people dislike it when people say "The real wtf is..."

    My personal pet peeve is people that don't understand sarcasm, especially on this site. Of course data corruption exists, did you not notice the author mention WORD DOCUMENTS? I'm not sure how he could have made the sarcasm any more obvious...

    CAPTCHA: ewww: Electronic World Wide Web?

  • Eric (unregistered) in reply to Thief^

    There are those of us who find that using the shell based 'ftp' command is far quicker for transferring a file or two than loading a gui equivalent. Even then though, when I use ftp, I automatically type 'hash <return> bin <return> prompt <return>' before anything remotely related to any transferring of files.
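    A scripted version of the same habit, sketched with Python's standard `ftplib`: `FTP.storbinary()` sends `TYPE I` (binary) before uploading, while `storlines()` uses `TYPE A` (ASCII), so choosing the method fixes the mode. The helper name is hypothetical, and the size check assumes the server supports the `SIZE` command, a widely implemented extension that is not part of RFC 959 (`ftplib` raises an error if the server rejects it):

```python
import os


def upload_backup(ftp, local_path: str, remote_name: str) -> None:
    """Upload one file in binary mode, then refuse to trust it unless the sizes match.

    `ftp` is an already logged-in ftplib.FTP (or anything with the same methods).
    """
    with open(local_path, "rb") as fh:
        # storbinary() switches the session to TYPE I itself: no line-ending translation.
        ftp.storbinary(f"STOR {remote_name}", fh)
    local_size = os.path.getsize(local_path)
    remote_size = ftp.size(remote_name)  # SIZE extension; int on a 213 reply
    if remote_size != local_size:
        raise RuntimeError(
            f"{remote_name}: local {local_size} bytes, remote {remote_size} bytes"
        )


# Usage (hypothetical host, credentials and file names):
#
#     import ftplib
#     with ftplib.FTP("backup.example.com") as ftp:
#         ftp.login("backup_user", "secret")
#         upload_backup(ftp, "nightly.dump", "nightly.dump")
```

    Failing loudly on a size mismatch is deliberate: a backup that is quietly smaller than its source is exactly the symptom described in the article.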

    Eric

  • Shinobu (unregistered)

    While random file corruptions may perhaps not occur anymore in a properly controlled environment (i.e. redundant storage, parity archives and what not), in practice most file corruptions I have endured were caused by buggy software, or rather, lousy programmers. And while these may not be properly random in the strictest sense of the word, they do appear pretty random to the user. So there.

  • Anodyne (unregistered) in reply to Mattastic
    Mattastic:
    Oh... no friend, no. Not having BIN on sends the file as 7bit ASCII instead of 8 bits to save bandwidth. I don't think line endings have anything to do with this WTF

    Not with any FTP setup I've ever heard of. Anyway, line ending translations are quite bad enough...

    The translation is reversible if the ASCII-mode transfer was from a UN*X machine to a Windows machine - the single byte LF (line feed) becomes CRLF (carriage return followed by line feed), and this is fairly straightforward to fix - just replace all occurrences of CRLF with LF. Even if there was a CRLF pair in the original file, this will have become CRCRLF so no problem.
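    The reversible case can be checked mechanically too. Modeling the Unix-to-Windows transfer as "every LF becomes CR LF" (again a simplification), each LF in the result is immediately preceded by an inserted CR, so replacing CR LF with LF removes exactly those CRs and nothing else. A Python sketch with illustrative names that round-trips random inputs biased toward CR and LF:

```python
import random


def unix_to_dos(data: bytes) -> bytes:
    """Simplified ASCII-mode transfer from Unix to Windows: LF -> CR LF."""
    return data.replace(b"\n", b"\r\n")


def undo_unix_to_dos(data: bytes) -> bytes:
    """Every LF in the transferred file follows an inserted CR, so dropping it is exact."""
    return data.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    rng = random.Random(0)
    # Small alphabet so awkward runs like CR CR LF show up often.
    samples = [bytes(rng.choice(b"\r\n\x00A") for _ in range(rng.randrange(50)))
               for _ in range(1000)]
    ok = all(undo_unix_to_dos(unix_to_dos(s)) == s for s in samples)
    print("round trip exact:", ok)  # True
```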

    Unfortunately, the really bad case (transferring from a Windows to a UN*X machine) is the more likely situation here. In that case, any CRLF pairs get replaced with a single LF - and that's irreversible. There's no way to distinguish between an LF that started off as part of a CRLF, and an LF that stood alone. Well, there might be, but it's likely to require knowledge of the specific binary file format and most likely considerable human intervention.

    I have encountered both cases several times. The most common source of the problem these days is FTP software that tries (and fails) to autodetect the file type. Of course, users of such software have no idea that the ASCII/binary decision even needs to be made, so fixing the problem can be... fun.

    Me: "Sorry, your FTP software didn't do it right and the file is broken. I'm afraid you'll have to upload the file again." User: "OK, how do I do it right?" Me: "(sigh) Which of the 4 billion FTP client apps are you using?"

  • (cs)

    So the intern messed up the backups. That's great.

    But who corrupted the production database? And how does dropping in the backup fix that problem?

  • derby (unregistered) in reply to ErikTheRed
    When I was a kid, we used a keyboard

    Keyboards! You were lucky. We had to flip toggle switches all day long if we wanted to interact with our computers. Kids and their damn fancy buffered input.

  • Fred4 (unregistered) in reply to Anodyne
    Anodyne:
    Oh... no friend, no. Not having BIN on sends the file as 7bit ASCII instead of 8 bits to save bandwidth. I don't think line endings have anything to do with this WTF

    Not with any FTP setup I've ever heard of. Anyway, line ending translations are quite bad enough...

    The translation is reversible if the ASCII-mode transfer was from a UN*X machine to a Windows machine - the single byte LF (line feed) becomes CRLF (carriage return followed by line feed), and this is fairly straightforward to fix - just replace all occurrences of CRLF with LF. Even if there was a CRLF pair in the original file, this will have become CRCRLF so no problem.

    Unfortunately, the really bad case (transferring from a Windows to a UN*X machine) is the more likely situation here. In that case, any CRLF pairs get replaced with a single LF - and that's irreversible. There's no way to distinguish between an LF that started off as part of a CRLF, and an LF that stood alone. Well, there might be, but it's likely to require knowledge of the specific binary file format and most likely considerable human intervention.

    I have encountered both cases several times. The most common source of the problem these days is FTP software that tries (and fails) to autodetect the file type. Of course, users of such software have no idea that the ASCII/binary decision even needs to be made, so fixing the problem can be... fun.

    Me: "Sorry, your FTP software didn't do it right and the file is broken. I'm afraid you'll have to upload the file again." User: "OK, how do I do it right?" Me: "(sigh) Which of the 4 billion FTP client apps are you using?"

    That's correct. Frankly this was a pretty boneheaded decision on the part of the original FTP designers. A simple transmission protocol has no business messing with the contents of files, especially not by default! If an ASCII file is accidentally transmitted in binary, big deal, the carriage return characters are a little funny. If a binary file is accidentally transmitted in ASCII, it's a complete disaster. Why do this??

  • Asd (unregistered)

    ASCII translation in FTP is the real WTF. Same goes for CVS translation (and database charset conversions). Why can't they just assume I don't want my files fucked with? It is trivial for me to change the files if I want to.

    One of the funnier cases of "corruption" that I have seen was a developer shutting down Oracle with shutdown abort, and not realizing that was anything unusual. Where did he get the idea that would be OK?

    Real file corruption does happen though, I have seen it. Word can magically destroy files when it feels like it.

  • (cs)

    Why would you ever transfer something in ASCII mode any more? Any half-decent editor can handle any kind of line ending anyway. What's wrong with FTP clients defaulting to binary mode and leaving the data alone?

  • old bloke (unregistered)

    http://www.ietf.org/rfc/rfc959.txt, section 3.1.1.1, ASCII TYPE:

    "This is the default type and must be accepted by all FTP implementations."

  • (cs)

    And the lesson to be learned, folks, is: always confirm your backups are actually... well... backing up.

    The best (and smartest) example I've encountered is a company that uses the backup for research purposes. (So, active work is done on A, which copies to B. Everyone gets read-only access to B to look up historical data.) If something goes wrong, people figure it out pretty darned fast.

    As an aside, best definition of data corruption was a computer used in the warehouse. IT was asked if they could recover any of the data. Turns out, there's no software that will recover from forklifts through the case.

  • Anon (unregistered) in reply to obediah
    obediah:
    I guess "random" depends on your frame of reference; a random coin flip is simply a function of a few million (or more) variables.

    Ever heard of quantum mechanics?

  • operagost (unregistered) in reply to Asd
    Asd:
    One of the funnier cases of "corruption" that I have seen was a developer shutting down Oracle with shutdown abort, and not realizing that was anything unusual. Where did he get the idea that would be OK?

    With Oracle 7.1 for VMS at a company I used to work for, this was the only way to get Oracle to actually shut down:
    - Shutdown abort
    - Startup restrict
    - Then shutdown

    This must have been okay with Oracle because we had a support contract with them.

  • Mattastic (unregistered) in reply to Anodyne
    Anodyne:
    Oh... no friend, no. Not having BIN on sends the file as 7bit ASCII instead of 8 bits to save bandwidth. I don't think line endings have anything to do with this WTF

    Not with any FTP setup I've ever heard of. Anyway, line ending translations are quite bad enough...

    The translation is reversible if the ASCII-mode transfer was from a UN*X machine to a Windows machine - the single byte LF (line feed) becomes CRLF (carriage return followed by line feed), and this is fairly straightforward to fix - just replace all occurrences of CRLF with LF. Even if there was a CRLF pair in the original file, this will have become CRCRLF so no problem.

    Unfortunately, the really bad case (transferring from a Windows to a UN*X machine) is the more likely situation here. In that case, any CRLF pairs get replaced with a single LF - and that's irreversible. There's no way to distinguish between an LF that started off as part of a CRLF, and an LF that stood alone. Well, there might be, but it's likely to require knowledge of the specific binary file format and most likely considerable human intervention.

    I have encountered both cases several times. The most common source of the problem these days is FTP software that tries (and fails) to autodetect the file type. Of course, users of such software have no idea that the ASCII/binary decision even needs to be made, so fixing the problem can be... fun.

    Me: "Sorry, your FTP software didn't do it right and the file is broken. I'm afraid you'll have to upload the file again." User: "OK, how do I do it right?" Me: "(sigh) Which of the 4 billion FTP client apps are you using?"

    Yeah, I appear to have spoken too soon. I started second-guessing myself right after I posted and looked it up. RFC 114 and RFC 913 mention 7-bit, but this does not appear to be the norm. Was I thinking of SMTP?

  • Confused (unregistered)

    I work for a company with a product that performs FTP, so I am really getting a kick out of these replies...

    I hate, loathe and despise ASCII mode. The concept is flawed to begin with. Typically people put stuff like MS-DOS-format files on a Unix server, use FTP to download them to something like a Mac, and finally open the document with MS Windows. Then they wonder why it's a mess.

    ASCII mode means that the server is supposed to assume that the file is in its local text encoding, translate that into NVT-ASCII as best it can and send that, and the receiver should translate from NVT-ASCII to whatever text encoding makes sense for it. In practice, clients and servers do a weird mishmash of either not doing anything or just doing simple CRLF mappings. If your text file was in Latin-1, UTF-8, cp1252 or anything more complicated than 7-bit ASCII, it's as good as lost.
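    To see why anything beyond 7-bit ASCII is "as good as lost", here is a Python sketch of one possible "as best it can" conversion that replaces every unmappable character with '?' (an assumption; real servers differ, and many just pass the bytes through). Once different characters have all become '?', there is nothing left to translate back. The function name is illustrative:

```python
def squeeze_to_ascii(data: bytes, source_encoding: str) -> bytes:
    """Hypothetical best-effort conversion: decode locally, force into 7-bit ASCII."""
    return data.decode(source_encoding).encode("ascii", errors="replace")


if __name__ == "__main__":
    text = "naïve café"
    for enc in ("cp1252", "utf-8"):
        sent = text.encode(enc)
        received = squeeze_to_ascii(sent, enc)
        print(enc, len(sent), "bytes ->", received)  # both arrive as b'na?ve caf?'
```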

    To my delight, lately I've seen ftp servers that simply ignore ASCII mode. They transfer in binary no matter what you say. Highly nonstandard, but I'd like to see more of that.

  • Evo (unregistered) in reply to bling
    bling:
    Darth Vader:
    No, the real WTF is that he used a month old backup instead of running sed -e 's/\r\n/\n/g' on the backed up file. Seriously, WTF was he doing as an admin?

    Maybe you have a sed recipe for appending all the stuff that was truncated after the 'end-of-file' character was encountered, too?

    Hmmm, let's say we're missing about 2GB (= 2.000.000.000 bytes (Yes, really)). We could just do: head -c 2000000000 /dev/random and append that to the file. If we're lucky, we have the original file....

    A nice chance of 1:256^2000000000...
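    For the record, the odds: all n = 2,000,000,000 random bytes must match, and each does so with probability 1/256, so the chance is 256^-n = 2^-(8n). That underflows any floating-point type, so a Python sketch works with the base-10 logarithm instead (function name illustrative):

```python
import math


def log10_chance_of_exact_match(n_bytes: int) -> float:
    """log10 of the probability that n uniformly random bytes equal one fixed sequence."""
    return -n_bytes * math.log10(256)


if __name__ == "__main__":
    exponent = log10_chance_of_exact_match(2_000_000_000)
    print(f"about 1 in 10^{-exponent:,.0f}")  # about 1 in 10^4,816,479,931
```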

    Then again, pipe enough data to a file, run it, and you'll have recreated Windows. What? This isn't an illegal copy of Windows, it's simply the first X bytes from /dev/random !

  • (cs) in reply to obediah

    Could not agree more. Word is awful about files over a network, or LARGE files in general.

  • Sarcastic Ahole (unregistered)

    you guys are all dorks

  • (cs) in reply to Evo
    Evo:
    bling:
    Darth Vader:
    No, the real WTF is that he used a month old backup instead of running sed -e 's/\r\n/\n/g' on the backed up file. Seriously, WTF was he doing as an admin?

    Maybe you have a sed recipe for appending all the stuff that was truncated after the 'end-of-file' character was encountered, too?

    Hmmm, let's say we're missing about 2GB (= 2.000.000.000 bytes (Yes, really)). We could just do: head -c 2000000000 /dev/random and append that to the file. If we're lucky, we have the original file....

    A nice chance of 1:256^2000000000...

    Then again, pipe enough data to a file, run it, and you'll have recreated Windows. What? This isn't an illegal copy of Windows, it's simply the first X bytes from /dev/random !

    What the heck are you trying to say??

  • nobody (unregistered) in reply to derby
    derby:
    When I was a kid, we used a keyboard

    Keyboards! You were lucky. We had to flip toggle switches all day long if we wanted to interact with our computers. Kids and their damn fancy buffered input.

    Luxury! We had to magnetize little magnetic cores, and check the magnetization with tiny magnets!

  • Andrey (unregistered)

    Forgetting to manually set BIN mode in an FTP client that doesn't do it for you (99% do) is an entirely forgivable mistake. What is inexcusable is that even ls -l on the local machine and the backup server would've shown that something was seriously messed up. In a critical process such as this, the absolute minimum of testing would've been to compare MD5 hashes of the local files and the ones on the remote server to make sure no corruption happened. While I doubt that a place too cheap to buy a few extra tapes would have a test server for test restores, a couple of extra checks in the backup script/instructions would've revealed the problem very quickly.
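    That minimum check is easy to script. A Python sketch, assuming the remote copy has been fetched back (or mounted) to a local path; MD5 is adequate for catching accidental corruption like this, though `hashlib.sha256` is the safer default if tampering is also a concern. The function names are illustrative:

```python
import hashlib
import os


def file_digest_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in chunks so multi-gigabyte backups need not fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def backup_matches(original: str, copy: str) -> bool:
    # Cheap size check first (it alone would have caught the shrunken backups),
    # then the hash, which also catches same-size corruption.
    if os.path.getsize(original) != os.path.getsize(copy):
        return False
    return file_digest_hex(original) == file_digest_hex(copy)
```

    Run it over every file after each backup and treat a False as a failed backup, not a curiosity.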

  • (cs) in reply to old bloke
    old bloke:
    http://www.ietf.org/rfc/rfc959.txt, section 3.1.1.1, ASCII TYPE:

    "This is the default type and must be accepted by all FTP implementations."

    I believe that the command line ftp in Linux distros is set to binary by default (common sense!).

  • jimlangrunner (unregistered) in reply to derby
    derby:
    When I was a kid, we used a keyboard

    Keyboards! You were lucky. We had to flip toggle switches all day long if we wanted to interact with our computers. Kids and their damn fancy buffered input.

    Toggle switches! Lucky. We had to flip those bits by hand, with magnets. I won't even tell you how much fun it was to read it back out! Kids and their damned mechanical input devices.

Leave a comment on “The Loony BIN”
