Admin
The real WTF is the fact that the FTP software defaulted to ASCII for file transfer.
Admin
Management didn't want to spend money on a critical backup system? Shocker! ;)
Admin
uh, yeah.
Random file corruption DOES happen.
But in this case, STUPIDITY happened.
The real WTF is that they allowed stupid people to make decisions regarding data integrity.
Admin
Someone was still using a command-line ftp? How old is this?
EDIT: More to the point, why wasn't it automated, and why did they never test the backups after changing to the new system?
Admin
The new WTF is the crazy opinions in the articles. Files don't get corrupted? On what planet, in what alternate reality, are bits always stored perfectly, never get flipped, and a program never does anything wrong?? I want to live there. Maybe they even name their wtf sites properly.
Admin
<MOOD STYLE="grumpy_old_geek">These damn kids today don't even know what a command prompt looks like. All that GUI crap...it's rottin' their brains! When I was a kid, we used a keyboard and commands to interact with our computers. Now these fancy mice and easy screens are spoilin' em!
They should teach the rudiments of command-line navigation to every newbie. I'm a Windows sysadmin, and can't stand it when one of the admins I work with is "afraid" to automate a process because they haven't touched the command prompt...it happens!! </MOOD>
Seriously, any sysadmin who didn't get it in writing from the higher-ups who didn't want to replace a tape backup system isn't worth their salt. Imagine if you were the guy caught out because someone's automated FTP script assumed binary file transfers!!
Admin
It is not, any commandline FTP I know is first in ascii mode, simply 'cause the server you access is in ascii mode upon connection. I don't know the RFC, but ANY ftp client I know can handle this and most of them silently go to bin mode (like most GUI ftp programs, browsers etc).
Admin
I use the ftp command line in dos all the time. I am sure if you have a non-gui version of Linux or Unix it would be a very similar client.
Admin
Because just telling the intern to upload it is cheaper and less complex than telling someone to write a script to do it, and because no one ever checks if backups work anyways. It's sad, but true.
Admin
But I only use it to check if there is something wrong with the browser, or to check username and password working. After that it goes into the browser or an ncftp batch job etc.
Admin
I'll try to remember that it wasn't random file corruption that caused the errors when my 2GB thumb drive wouldn't allow me access to a directory last week... it was some weird little gremlin that decided to chew on it while I was sleeping.
Good story but bad absolute statement. Random data corruption does happen through no fault of the user. I've had it happen on a thumb drive recently, a hard drive in the past (no, it wasn't a dead hard drive, it was corrupted sectors that appeared overnight and caused massive problems with the files), and sometimes just a file that saved "funny" when I pressed "ALT-F S".
Seejay
Admin
No, the real WTF is that he used a month old backup instead of running sed -e 's/\r\n/\n/g' on the backed up file. Seriously, WTF was he doing as an admin?
Admin
Bit rot is real!
Admin
What happened most often here: "Woohoo, we save money by switching ISPs" or "Woohoo, we save money by changing to a cheaper rate with our ISP"
Usually it ends up with: "Oh, what is a fixed IP?" "We need two internet connections to have a VPN between our offices?" "What is a smarthost?" "MX record?" etc.
The result is that the few Euros ('round 10 per month usually) it would have saved is only a fraction of the cost that mistake generates. I especially like the cut-off-offices who return to FAX with real paper then!
Admin
Maybe you have a sed recipe for appending all the stuff that was truncated after the 'end-of-file' character was encountered, too?
Admin
Oh... no friend, no. Not having BIN on sends the file as 7-bit ASCII instead of 8 bits to save bandwidth. I don't think line endings have anything to do with this WTF.
Admin
Hahaha! It was binary data! What makes you think the original data didn't have \x0D\x0A byte sequences? And that's assuming character sets didn't create additional issues. WTF indeed.
Admin
and what if there were some \r\n's in the binary before it was corrupted?
Admin
The ASCII translation done by FTP is not always reversible. For instance, transferring from a DOS/Windows world to a Unix world changes CR/LF pairs to LF only, like you show above. But it leaves bare LF ('\n') characters alone. Translating back, you can't tell the difference between a \n that had its companion \r deleted and a \n that wasn't paired with a \r to begin with.
That's not the only example, either. Some ASCII file formats end the file when they first encounter a Ctrl-Z. The fact that the backup files were noticeably smaller than they should have been says to me that it was probably something more drastic like this that happened to the faulty backups.
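To make the irreversibility concrete, here is a minimal Python sketch. It assumes the ASCII-mode transfer did nothing except turn CRLF into LF, which is a simplification of what real clients and servers do; the function names and sample bytes are made up for illustration.

```python
def ascii_to_unix(data: bytes) -> bytes:
    """Simplified model of a DOS-to-Unix ASCII-mode transfer: CRLF becomes LF."""
    return data.replace(b"\r\n", b"\n")


def naive_restore(data: bytes) -> bytes:
    """Attempted undo: turn every LF back into CRLF."""
    return data.replace(b"\n", b"\r\n")


# Two different "binary" originals (hypothetical sample data).
original_a = b"\x01\x02\r\n\x03\n\x04"   # CRLF pair first, bare LF second
original_b = b"\x01\x02\n\x03\r\n\x04"   # bare LF first, CRLF pair second

received_a = ascii_to_unix(original_a)
received_b = ascii_to_unix(original_b)

# Both originals collapse to the same bytes, so no tool can tell them apart.
same_after_transfer = received_a == received_b

# The naive undo adds a CR in front of the bare LF too, so it matches neither.
restored = naive_restore(received_a)
restore_failed = restored != original_a and restored != original_b
```

Since two distinct inputs produce identical output, the information about which LF had a CR in front of it is simply gone.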
Admin
I guess "random" depends on your frame of reference, a random coin flip is simply a function of a few million (or more) variables.
Anyway, as someone that managed a collection of samba shares to 30,000 users - working with a word doc over a network drive will lead to "random" file corruption.
Admin
It might work if the "backup" server were the one with the larger files. That is, if the ASCII transfer was mapping '\n' to "\r\n". But, from the article, the files there were smaller so the mapping was presumably "\r\n" to '\n'. There's no way to determine if a particular new line in the backup file was a carriage return and new line or just a new line in the original file so the equivalent "sed -e 's/\n/\r\n/g'" wouldn't (necessarily) work.
Also, you're assuming that was the only mapping in the transfer.
If '\n' to "\r\n" was the only mapping that occurred, then yes, it'd work.
Admin
ASCII transfer not only translates crlf<->lf. It aborts upon seeing the ^d, since it is the end character (terminals, printing etc). Though I do not know if it should abort on that, but I see that it does on most *nix like machines.
BTW: sed on a (supposed to be) binary file??
BTW2: The article mentions that the backups were too small by far.
Admin
I use lftp as my only/preferred FTP client.
among other things, it takes the stupid out of ascii/bin transfers, and just does the Right Thing.
Admin
rsync? scp? Tested automated scripts?
captcha: sanitarium (The Loony BIN)
Admin
Skeptical Sally, is Ness's real name Scot? If so I think I know why
Admin
I'm not an admin, but even to me it seems obvious that you need to do a full test of your backup system when you make a major process change of that sort. Is that not SOP?
Admin
Some people hate it when someone types "First!"
Some people hate it when people type their CAPTCHA.
Some people dislike it when people say "The real wtf is..."
My personal pet peeve is people that don't understand sarcasm, especially on this site. Of course data corruption exists, did you not notice the author mention WORD DOCUMENTS? I'm not sure how he could have made the sarcasm any more obvious...
CAPTCHA: ewww: Electronic World Wide Web?
Admin
There are those of us who find that using the shell based 'ftp' command is far quicker for transferring a file or two than loading a gui equivalent. Even then though, when I use ftp, I automatically type 'hash <return> bin <return> prompt <return>' before anything remotely related to any transferring of files.
Eric
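For anyone who would rather script the habit than type 'bin' every time, here is a rough Python sketch using the standard ftplib module. It assumes the server supports the SIZE command (not all do), and the host, login and file names in the usage comment are placeholders rather than anything from the article.

```python
import os


def upload_and_check(ftp, local_path, remote_name):
    """Upload a file in binary (image) mode, then compare sizes.

    `ftp` is expected to be a connected, logged-in ftplib.FTP object.
    storbinary() switches the connection to TYPE I before sending, so the
    bytes are not translated on the way.
    """
    with open(local_path, "rb") as fh:
        ftp.storbinary(f"STOR {remote_name}", fh)
    remote_size = ftp.size(remote_name)  # may be None if SIZE is unsupported
    return remote_size == os.path.getsize(local_path)


# Hypothetical usage (placeholder host and credentials):
#
#   from ftplib import FTP
#   with FTP("backup.example.com") as ftp:
#       ftp.login("backupuser", "secret")
#       if not upload_and_check(ftp, "db_dump.bak", "db_dump.bak"):
#           raise SystemExit("size mismatch, do not trust this backup")
```

A matching size does not prove the contents are identical, but a mismatch like the one in the article would be caught on the first run.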
Admin
While random file corruptions may perhaps not occur anymore in a properly controlled environment (i.e. redundant storage, parity archives and what not), in practice most file corruptions I have endured were caused by buggy software, or rather, lousy programmers. And while these may not be properly random in the strictest sense of the word, they do appear pretty random to the user. So there.
Admin
The translation is reversible if the ASCII-mode transfer was from a UN*X machine to a Windows machine - the single byte LF (line feed) becomes CRLF (carriage return followed by line feed), and this is fairly straightforward to fix - just replace all occurrences of CRLF with LF. Even if there was a CRLF pair in the original file, this will have become CRCRLF so no problem.
Unfortunately, the really bad case (transferring from a Windows to a UN*X machine) is the more likely situation here. In that case, any CRLF pairs get replaced with a single LF - and that's irreversible. There's no way to distinguish between an LF that started off as part of a CRLF, and an LF that stood alone. Well, there might be, but it's likely to require knowledge of the specific binary file format and most likely considerable human intervention.
I have encountered both cases several times. The most common source of the problem these days is FTP software that tries (and fails) to autodetect the file type. Of course, users of such software have no idea that the ASCII/binary decision even needs to be made, so fixing the problem can be... fun.
Me: "Sorry, your FTP software didn't do it right and the file is broken. I'm afraid you'll have to upload the file again." User: "OK, how do I do it right?" Me: "(sigh) Which of the 4 billion FTP client apps are you using?"
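The reversible direction described above (UN*X to Windows) can be sketched in a few lines of Python. This assumes the transfer did nothing but expand each LF to CRLF; the function names and sample data are invented for the illustration.

```python
def unix_to_dos(data: bytes) -> bytes:
    """Simplified model of a Unix-to-Windows ASCII-mode transfer: LF becomes CRLF."""
    return data.replace(b"\n", b"\r\n")


def repair(data: bytes) -> bytes:
    """Undo the expansion by replacing every CRLF with LF."""
    return data.replace(b"\r\n", b"\n")


samples = [
    b"plain\nlines\n",
    b"already\r\ndos",         # original CRLF becomes CR CR LF in transit
    b"\x00\r\r\n\n\xff",       # stray CRs and a bare LF mixed together
    b"",
]

# Every sample survives the trip there and back unchanged.
round_trips = [repair(unix_to_dos(s)) == s for s in samples]
```

It works because every LF in the damaged file is preceded by a CR that the transfer inserted, so removing exactly those CRs restores the original bytes.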
Admin
So the intern messed up the backups. That's great.
But who corrupted the production database? And how does dropping in the backup fix that problem?
Admin
Keyboards! You were lucky. We had to flip toggle switches all day long if we wanted to interact with our computers. Kids and their damn fancy buffered input.
Admin
That's correct. Frankly this was a pretty boneheaded decision on the part of the original FTP designers. A simple transmission protocol has no business messing with the contents of files, especially not by default! If an ASCII file is accidentally transmitted in binary, big deal, the carriage return characters are a little funny. If a binary file is accidentally transmitted in ASCII, it's a complete disaster. Why do this??
Admin
ASCII translation in FTP is the real WTF. Same goes for CVS translation (and database charset conversions). Why can't they just assume I don't want my files fucked with? It is trivial for me to change the files if I want to.
One of the funnier cases of "corruption" that I have seen was a developer shutting down Oracle with shutdown abort, and not realizing that was anything unusual. Where did he get the idea that would be OK?
Real file corruption does happen though, I have seen it. Word can magically destroy files when it feels like it.
Admin
Why would you ever transfer something in ASCII mode any more? Any half-decent editor will be able to process any kind of line ending anyway. What's wrong with ftp clients defaulting to binary mode and leaving the data alone?
Admin
http://www.ietf.org/rfc/rfc959.txt
3.1.1.1. ASCII TYPE
This is the default type and must be accepted by all FTP implementations.
Admin
And the lesson to be learned, folks, is: always confirm your backups are actually.. well.. backing up.
Best (and smartest) example I've encountered is a company who uses the backup for research purposes. (So, active work is done on A, which copies to B. Everyone gets read-only access to B to look up historical data). If something goes wrong, people figure it out pretty darned fast.
As an aside, best definition of data corruption was a computer used in the warehouse. IT was asked if they could recover any of the data. Turns out, there's no software that will recover from forklifts through the case.
Admin
Ever heard of quantum mechanics?
Admin
This must have been okay with Oracle because we had a support contract with them.
Admin
Yeah, I appear to have spoken too soon. I started second guessing myself right after I posted and looked it up. RFC 114 & RFC 913 make mention of 7-bit, but this does not appear to be the norm. Was I thinking SMTP?
Admin
I work for a company with a product that performs FTP, so I am really getting a kick out of these replies...
I hate, loathe and despise ASCII mode. The concept is flawed to begin with. Typically people put stuff like MSDOS-format files on a Unix server, use FTP to download it to something like a Mac, and finally open the document with MS Windows. Then they wonder why it's a mess.
ASCII mode means that the server is supposed to assume that the file is in its local text encoding, translate that into NVT-ASCII as best it can and send that, and the receiver should translate from NVT-ASCII to whatever text encoding makes sense for it. In practice, clients and servers do a weird mishmash of either not doing anything or just doing simple CRLF mappings. If your text file was in Latin1, UTF-8, cp1252 or anything more complicated than 7-bit ASCII it's as good as lost.
To my delight, lately I've seen ftp servers that simply ignore ASCII mode. They transfer in binary no matter what you say. Highly nonstandard, but I'd like to see more of that.
Admin
Hmmm, let's say we're missing about 2GB (= 2.000.000.000 bytes (Yes, really)). We could just do: head -c 2000000000 /dev/random and append that to the file. If we're lucky, we have the original file....
A nice chance of 1:256^2000000000...
Then again, pipe enough data to a file, run it, and you'll have recreated Windows. What? This isn't an illegal copy of Windows, it's simply the first X bytes from /dev/random !
Admin
Could not agree more. Word is awful about files over a network, or LARGE files in general.
Admin
you guys are all dorks
Admin
What the heck are you trying to say??
Admin
Forgetting to manually set BIN mode in an ftp client that doesn't do it for you (99% do) is an entirely forgivable mistake. What is inexcusable is the fact that even ls -l on the local machine and the backup server would've shown that something is seriously messed up. In a critical process such as this, the absolute minimum of testing would've been to compare md5 hashes of the local files and the ones on the remote server to make sure no corruption happened. While I doubt that a place too cheap to buy a few extra tapes would have a test server for test restores, a couple of extra checks in the backup script/instructions would've revealed the problem really quickly.
Admin
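The md5 comparison suggested in the comment above could look something like this in Python. It is only a sketch: it assumes both copies can be read as local paths (for example the backup is mounted or has been pulled back for checking), and the function names are made up.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(local_path, backup_path):
    """True only if both files have identical MD5 digests."""
    return md5_of(local_path) == md5_of(backup_path)
```

A backup script would call backup_matches() right after the transfer and complain loudly on False. MD5 is fine for spotting accidental corruption like this, though it is not a defence against deliberate tampering.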
I believe that the command line ftp in Linux distros is set to binary by default (common sense!).
Admin
toggle switches! Lucky. We had to flip those bits by hand, with magnets. I won't even tell you how much fun it was to read it back out! Kids and their damned mechanical input devices.