Comment On The Loony BIN

Call me a Skeptical Sally (actually, don't), but whenever I hear someone complaining of random file corruption, I don't really believe them. Of course, it's a wonderful excuse if you don't know why your code doesn't work or you just slacked off and didn't get some Word document done; maybe you've even used it a few times. Still, that doesn't change the fact that random file corruption rarely never happens.

Re: The Loony BIN

2007-05-01 11:10 • by Dark Shikari
The real WTF is the fact that the FTP software defaulted to ASCII for file transfer.

Re: The Loony BIN

2007-05-01 11:10 • by Darrell (unregistered)
Management didn't want to spend money on a critical backup system? Shocker! ;)

Re: The Loony BIN

2007-05-01 11:10 • by brian (unregistered)
uh, yeah.

Random file corruption DOES happen.

But in this case, STUPIDITY happened.

The real WTF is that they allowed stupid people to make decisions regarding data integrity.

Re: The Loony BIN

2007-05-01 11:17 • by Thief^
Someone was still using a command-line ftp? How old is this?

EDIT: More to the point, why wasn't it automated, and why did they never test the backups after changing to the new system?

Re: The Loony BIN

2007-05-01 11:27 • by wtfwtf (unregistered)
The new WTF is the crazy opinions in the articles. Files don't get corrupted? On what planet, in what alternate reality, are bits always stored perfectly, never get flipped, and a program never does anything wrong?? I want to live there. Maybe they even name their wtf sites properly.

Re: The Loony BIN

2007-05-01 11:28 • by ErikTheRed (unregistered)
<MOOD STYLE="grumpy_old_geek">These damn kids today don't even know what a command prompt looks like. All that GUI crap...it's rottin' their brains! When I was a kid, we used a keyboard and commands to interact with our computers. Now these fancy mice and easy screens are spoilin' em!

They should teach the rudiments of command-line navigation to every newbie. I'm a Windows sysadmin, and can't stand it when one of the admins I work with is "afraid" to automate a process because they haven't touched the command prompt...it happens!!
</MOOD>

Seriously, any sysadmin who didn't get it in writing from the higher-ups who didn't want to replace a tape backup system isn't worth their salt. Imagine if you were the guy caught out because someone's automated FTP script assumed binary file transfers!!

Re: The Loony BIN

2007-05-01 11:29 • by Joachim Otahal (unregistered)
134416 in reply to 134408
Dark Shikari:
The real WTF is the fact that the FTP software defaulted to ASCII for file transfer.


It is not, any commandline FTP I know is first in ascii mode, simply 'cause the _server_ you access is in ascii mode upon connection. I don't know the RFC, but ANY ftp client I know can handle this and most of them silently go to bin mode (like most GUI ftp programs, browsers etc).

Re: The Loony BIN

2007-05-01 11:29 • by gerrr (unregistered)
134417 in reply to 134412
Thief^:
Someone was still using a command-line ftp? How old is this?


I use the ftp command line in dos all the time. I am sure if you have a non-gui version of Linux or Unix it would be a very similar client.

Re: The Loony BIN

2007-05-01 11:29 • by halcyon
134418 in reply to 134412
Thief^:
Someone was still using a command-line ftp? How old is this?

EDIT: More to the point, why wasn't it automated, and why did they never test the backups after changing to the new system?


Because just telling the intern to upload it is cheaper and less complex than telling someone to write a script to do it, and because no one ever checks if backups work anyways. It's sad, but true.

Re: The Loony BIN

2007-05-01 11:32 • by Joachim Otahal (unregistered)
134419 in reply to 134412
Thief^:
Someone was still using a command-line ftp? How old is this?

I occasionally still use commandline ftp; every OS (even Windows XP) offers a commandline ftp client.

But I only use it to check if there is something wrong with the browser, or to check username and password working. After that it goes into the browser or an ncftp batch job etc.

Re: The Loony BIN

2007-05-01 11:34 • by seejay
I'll try to remember that when my 2GB thumb drive wouldn't allow me access to a directory last week that it wasn't random file corruption that caused the errors... it was some weird little gremlin that decided to chew on it while I was sleeping.

Good story but *bad* absolute statement. Random data corruption does happen through no fault of the user. I've had it happen on a thumb drive recently, a hard drive in the past (no, it wasn't a dead hard drive, it was corrupted sectors that appeared overnight and caused massive problems with the files), and sometimes just a file that saved "funny" when I pressed "ALT-F S".

Seejay

Re: The Loony BIN

2007-05-01 11:35 • by Darth Vader (unregistered)
134422 in reply to 134408
No, the real WTF is that he used a month old backup instead of running
sed -e 's/\r\n/\n/g' on the backed up file. Seriously, WTF was he doing as an admin?

Re: The Loony BIN

2007-05-01 11:40 • by Jimmy (unregistered)
Bit rot is real!

Re: The Loony BIN

2007-05-01 11:42 • by Joachim Otahal (unregistered)
134425 in reply to 134409
Darrell:
Management didn't want to spend money on a critical backup system? Shocker! ;)

Things like that happen all the time, the management decides without contacting the company who implemented it first or the person who knows how it works.

What happened most often here: "Wohoo, we save money by switching the ISP" or "Wohoo, we save money by changing the rate to a cheaper one at our ISP"

Usually it ends up with:
"Oh, what is a fixed IP?"
"We need two internet connections to have a VPN between our offices?"
"What is a smarthost?"
"MX record?"
etc.

The result is that the few Euros ('round 10 per month usually) it would have saved are only a fraction of the cost that mistake generates.
I especially like the cut-off-offices who return to FAX with real paper then!

Re: The Loony BIN

2007-05-01 11:43 • by bling (unregistered)
134426 in reply to 134422
Darth Vader:
No, the real WTF is that he used a month old backup instead of running
sed -e 's/\r\n/\n/g' on the backed up file. Seriously, WTF was he doing as an admin?


Maybe you have a sed recipe for appending all the stuff that was truncated after the 'end-of-file' character was encountered, too?

Re: The Loony BIN

2007-05-01 11:43 • by Mattastic (unregistered)
134427 in reply to 134422
Oh... no friend, no. Not having BIN on sends the file as 7bit ASCII instead of 8 bits to save bandwidth. I don't think line endings have anything to do with this WTF

Re: The Loony BIN

2007-05-01 11:43 • by Jenkins (unregistered)
134428 in reply to 134422
Darth Vader:
No, the real WTF is that he used a month old backup instead of running
sed -e 's/\r\n/\n/g' on the backed up file. Seriously, WTF was he doing as an admin?


Hahaha! It was binary data! What makes you think the original data didn't have \x0D\x0A byte-sequences? And that's assuming charactersets didn't create additional issues. WTF indeed.

Re: The Loony BIN

2007-05-01 11:46 • by -j (unregistered)
134429 in reply to 134422
and what if there were some \r\n's in the binary before it was corrupted?

Re: The Loony BIN

2007-05-01 11:49 • by Steve Wahl (unregistered)
134431 in reply to 134422
Darth Vader:
No, the real WTF is that he used a month old backup instead of running
sed -e 's/\r\n/\n/g' on the backed up file. Seriously, WTF was he doing as an admin?


The ascii translation done by FTP is not always reversible. For instance, transferring from a DOS/Windows world to a Unix world changes CR/LF pairs to LF only, like you show above. But it leaves bare LF ('\n') characters alone. Translating back, you can't tell the difference between a \n that had its companion \r deleted and a \n that wasn't paired with a \r to begin with.

That's not the only example, either. Some ascii file formats end the file when they first encounter a Ctrl-Z. The fact that the backup files were noticeably smaller than they should have been says to me that it was probably something more drastic like this that happened to the faulty backups.
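To make the irreversibility concrete, here is a minimal Python sketch. It only simulates the two effects described above on in-memory bytes; the helper names `crlf_to_lf` and `truncate_at_ctrl_z` are made up for illustration, and a real FTP client or server may well behave differently.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Simulate a Windows-to-Unix ASCII-mode transfer: CR LF becomes LF."""
    return data.replace(b"\r\n", b"\n")


def truncate_at_ctrl_z(data: bytes) -> bytes:
    """Simulate a reader that treats the first Ctrl-Z (0x1A) as end of file."""
    end = data.find(b"\x1a")
    return data if end == -1 else data[:end]


# Two different "binary" originals: one has CR LF first, the other has it second.
original_a = b"\x00\x01\r\n\x02\n\x03"
original_b = b"\x00\x01\n\x02\r\n\x03"

# Both collapse to the same bytes, so nothing run afterwards can tell them apart.
mangled_a = crlf_to_lf(original_a)
mangled_b = crlf_to_lf(original_b)
print(mangled_a == mangled_b)
print(len(original_a), len(mangled_a))

# Everything from the first 0x1A byte onwards is simply gone.
print(truncate_at_ctrl_z(b"header\x1apayload"))
```

Since two different inputs map to the same output, no substitution applied to the backup alone can know which original to restore.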

Re: The Loony BIN

2007-05-01 11:50 • by obediah
I guess "random" depends on your frame of reference, a random coin flip is simply a function of a few million (or more) variables.

Anyway, as someone that managed a collection of samba shares to 30,000 users - working with a word doc over a network drive will lead to "random" file corruption.

Re: The Loony BIN

2007-05-01 11:50 • by Spoe (unregistered)
134433 in reply to 134422
Darth Vader:
No, the real WTF is that he used a month old backup instead of running
sed -e 's/\r\n/\n/g' on the backed up file. Seriously, WTF was he doing as an admin?


It *might* work if the "backup" server were the one with the larger files. That is, if the ASCII transfer was mapping '\n' to "\r\n". But, from the article, the files there were smaller so the mapping was presumably "\r\n" to '\n'. There's no way to determine if a particular new line in the backup file was a carriage return and new line or just a new line in the original file so the equivalent "sed -e 's/\n/\r\n/g'" wouldn't (necessarily) work.

Also, you're assuming that was the only mapping in the transfer.

If '\n' to "\r\n" was the only mapping that occurred, then yes, it'd work.

Re: The Loony BIN

2007-05-01 11:51 • by Belcat (unregistered)
134434 in reply to 134414
wtfwtf:
The new WTF is the crazy opinions in the articles. Files don't get corrupted? On what planet, in what alternate reality, are bits always stored perfectly, never get flipped, and a program never does anything wrong?? I want to live there. Maybe they even name their wtf sites properly.

The real wtf is the comments... Why do you think Alex crosses out the words? He's joking around. Do you really think that 30 backups would just randomly get corrupted? No, never, impossible, there has to be a cause.

Re: The Loony BIN

2007-05-01 11:52 • by Joachim Otahal (unregistered)
134435 in reply to 134422
Darth Vader:
No, the real WTF is that he used a month old backup instead of running
sed -e 's/\r\n/\n/g' on the backed up file. Seriously, WTF was he doing as an admin?


ASCII transfer not only translates crlf<->lf.
It aborts upon seeing the ^d, since it is the end character (terminals, printing etc). Though I do not know if it should abort on that, but I see that it does on most *nix like machines.

BTW: sed on a (supposed to be) binary file??

BTW2:The article mentions that the backups were too small by far.

Re: The Loony BIN

2007-05-01 11:52 • by hprotagonist (unregistered)
134436 in reply to 134412
Thief^:
Someone was still using a command-line ftp? How old is this?


I use lftp as my only/preferred FTP client.

among other things, it takes the stupid out of ascii/bin transfers, and just does the Right Thing.

Re: The Loony BIN

2007-05-01 11:55 • by htg (unregistered)
rsync?
scp?
Tested automated scripts?

captcha: sanitarium (The Loony BIN)

Re: The Loony BIN

2007-05-01 11:59 • by A Nonny Mouse (unregistered)
Skeptical Sally, is Ness's real name Scot? If so I think I know why

Re: The Loony BIN

2007-05-01 12:04 • by LintMan (unregistered)
I'm not an admin, but even to me it seems obvious that you need to do a full test of your backup system when you make a major process change of that sort. Is that not SOP?

Re: The Loony BIN

2007-05-01 12:05 • by Anonymus (unregistered)
Some people hate it when someone types "First!"

Some people hate it when people type their CAPTCHA.

Some people dislike it when people say "The real wtf is..."

My personal pet peeve is people that don't understand sarcasm, especially on this site. Of course data corruption exists, did you not notice the author mention WORD DOCUMENTS? I'm not sure how he could have made the sarcasm any more obvious...

CAPTCHA: ewww: Electronic World Wide Web?

Re: The Loony BIN

2007-05-01 12:06 • by Eric (unregistered)
134441 in reply to 134412
There are those of us who find that using the shell based 'ftp' command is far quicker for transferring a file or two than loading a gui equivalent. Even then though, when I use ftp, I automatically type 'hash <return> bin <return> prompt <return>' before anything remotely related to any transferring of files.

Eric
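For anyone who would rather script that habit than type it, here is a minimal sketch using Python's standard `ftplib`. The helper name `upload_binary` and the host, login and file names in the usage comment are placeholders, not anything from the article. `FTP.storbinary` switches the connection to binary (`TYPE I`) before sending, so there is no `bin` step to forget.

```python
from ftplib import FTP


def upload_binary(ftp, local_path: str, remote_name: str) -> None:
    """Upload one file in binary mode over an already logged-in FTP session."""
    # Open in "rb" so Python itself does no newline translation either.
    with open(local_path, "rb") as fh:
        # storbinary issues TYPE I before the STOR command.
        ftp.storbinary(f"STOR {remote_name}", fh)


# Hypothetical usage (placeholder host and credentials):
#
#   with FTP("backup.example.com") as ftp:
#       ftp.login("backupuser", "secret")
#       upload_binary(ftp, "nightly.dmp", "nightly.dmp")
```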

Re: The Loony BIN

2007-05-01 12:07 • by Shinobu (unregistered)
While random file corruptions may perhaps not occur anymore in a properly controlled environment (i.e. redundant storage, parity archives and what not), in practice most file corruptions I have endured were caused by buggy software, or rather, lousy programmers. And while these may not be properly random in the strictest sense of the word, they do appear pretty random to the user. So there.

Re: The Loony BIN

2007-05-01 12:09 • by Anodyne (unregistered)
134443 in reply to 134427
Oh... no friend, no. Not having BIN on sends the file as 7bit ASCII instead of 8 bits to save bandwidth. I don't think line endings have anything to do with this WTF

Not with any FTP setup I've ever heard of. Anyway, line ending translations are quite bad enough...

The translation is reversible if the ASCII-mode transfer was from a UN*X machine to a Windows machine - the single byte LF (line feed) becomes CRLF (carriage return followed by line feed), and this is fairly straightforward to fix - just replace all occurrences of CRLF with LF. Even if there was a CRLF pair in the original file, this will have become CRCRLF so no problem.

Unfortunately, the really bad case (transferring from a Windows to a UN*X machine) is the more likely situation here. In that case, any CRLF pairs get replaced with a single LF - and that's irreversible. There's no way to distinguish between an LF that started off as part of a CRLF, and an LF that stood alone. Well, there might be, but it's likely to require knowledge of the specific binary file format and most likely considerable human intervention.
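A small Python sketch of the two directions, assuming the transfer does nothing beyond the simple line-ending mapping described here (the function names are invented for the example, and real servers may do more):

```python
def lf_to_crlf(data: bytes) -> bytes:
    """Simulate a UN*X-to-Windows ASCII transfer: every LF becomes CR LF."""
    return data.replace(b"\n", b"\r\n")


def crlf_to_lf(data: bytes) -> bytes:
    """Simulate a Windows-to-UN*X ASCII transfer: every CR LF becomes LF."""
    return data.replace(b"\r\n", b"\n")


# Contains a bare LF, a CR LF pair and a bare CR.
original = b"\x00\n\x01\r\n\x02\r\x03"

# UN*X to Windows: the original CR LF turns into CR CR LF, and replacing
# CR LF with LF afterwards restores every byte.
print(crlf_to_lf(lf_to_crlf(original)) == original)

# Windows to UN*X: the CR of the pair is dropped, and turning LF back into
# CR LF also "repairs" the LF that never had a CR.
print(lf_to_crlf(crlf_to_lf(original)) == original)
```

The first comparison comes out true and the second false, which is the asymmetry in a nutshell.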

I have encountered both cases several times. The most common source of the problem these days is FTP software that tries (and fails) to autodetect the file type. Of course, users of such software have no idea that the ASCII/binary decision even needs to be made, so fixing the problem can be... fun.

Me: "Sorry, your FTP software didn't do it right and the file is broken. I'm afraid you'll have to upload the file again."
User: "OK, how do I do it right?"
Me: "(sigh) Which of the 4 billion FTP client apps are you using?"

Re: The Loony BIN

2007-05-01 12:10 • by Kiriai
So the intern messed up the backups. That's great.

But who corrupted the production database? And how does dropping in the backup fix that problem?

Re: The Loony BIN

2007-05-01 12:19 • by derby (unregistered)
134446 in reply to 134415
When I was a kid, we used a keyboard


keyboards! You were lucky. We had to flip toggle switches all day long if we wanted to interact with our computers. Kids and their damn fancy buffered input.

Re: The Loony BIN

2007-05-01 12:22 • by Fred4 (unregistered)
134447 in reply to 134443
Anodyne:
Oh... no friend, no. Not having BIN on sends the file as 7bit ASCII instead of 8 bits to save bandwidth. I don't think line endings have anything to do with this WTF

Not with any FTP setup I've ever heard of. Anyway, line ending translations are quite bad enough...

The translation is reversible if the ASCII-mode transfer was from a UN*X machine to a Windows machine - the single byte LF (line feed) becomes CRLF (carriage return followed by line feed), and this is fairly straightforward to fix - just replace all occurrences of CRLF with LF. Even if there was a CRLF pair in the original file, this will have become CRCRLF so no problem.

Unfortunately, the really bad case (transferring from a Windows to a UN*X machine) is the more likely situation here. In that case, any CRLF pairs get replaced with a single LF - and that's irreversible. There's no way to distinguish between an LF that started off as part of a CRLF, and an LF that stood alone. Well, there might be, but it's likely to require knowledge of the specific binary file format and most likely considerable human intervention.

I have encountered both cases several times. The most common source of the problem these days is FTP software that tries (and fails) to autodetect the file type. Of course, users of such software have no idea that the ASCII/binary decision even needs to be made, so fixing the problem can be... fun.

Me: "Sorry, your FTP software didn't do it right and the file is broken. I'm afraid you'll have to upload the file again."
User: "OK, how do I do it right?"
Me: "(sigh) Which of the 4 billion FTP client apps are you using?"


That's correct. Frankly this was a pretty boneheaded decision on the part of the original FTP designers. A simple transmission protocol has no business messing with the contents of files, especially not by default! If an ASCII file is accidentally transmitted in binary, big deal, the carriage return characters are a little funny. If a binary file is accidentally transmitted in ASCII, it's a complete disaster. Why do this??

Re: The Loony BIN

2007-05-01 12:27 • by Asd (unregistered)
ASCII translation in FTP is the real WTF. Same goes for CVS translation (and database charset conversions). Why can't they just assume I don't want my files fucked with? It is trivial for me to change the files if I want to.

One of the funnier cases of "corruption" that I have seen was a developer shutting down Oracle with shutdown abort, and not realizing that was anything unusual. Where did he get the idea that would be OK?

Real file corruption does happen though, I have seen it. Word can magically destroy files when it feels like it.

Re: The Loony BIN

2007-05-01 12:28 • by Thief^
Why would you ever transfer something in ascii mode any more? Any half-decent editor will be able to process any of the kinds of line-ending anyway. What's wrong with ftp clients defaulting to binary mode and leaving the data alone?

Re: The Loony BIN

2007-05-01 12:29 • by old bloke (unregistered)
http://www.ietf.org/rfc/rfc959.txt
3.1.1.1. ASCII TYPE
This is the default type and must be accepted by all FTP implementations.

Re: The Loony BIN

2007-05-01 12:37 • by AGould
And the lesson to be learned, folks, is: *always* confirm your backups are actually.. well.. backing up.

Best (and smartest) example I've encountered is a company who uses the backup for research purposes. (So, active work is done on A, which copies to B. Everyone gets read-only access to B to look up historical data). If something goes wrong, people figure it out pretty darned fast.

As an aside, best definition of data corruption was a computer used in the warehouse. IT was asked if they could recover any of the data. Turns out, there's no software that will recover from forklifts through the case.

Re: The Loony BIN

2007-05-01 12:47 • by Anon (unregistered)
134452 in reply to 134432
obediah:
I guess "random" depends on your frame of reference, a random coin flip is simply a function of a few million (or more) variables.


Ever heard of quantum mechanics?

Re: The Loony BIN

2007-05-01 12:47 • by operagost (unregistered)
134453 in reply to 134448
Asd:

One of the funnier cases of "corruption" that I have seen was a developer shutting down Oracle with shutdown abort, and not realizing that was anything unusual. Where did he get the idea that would be OK?

With 7.1 for VMS at a company I used to work with, this was the only way to get Oracle to actually shut down:
- Shutdown abort
- Startup restrict
- then Shutdown

This must have been okay with Oracle because we had a support contract with them.

Re: The Loony BIN

2007-05-01 12:51 • by Mattastic (unregistered)
134454 in reply to 134443
Anodyne:
Oh... no friend, no. Not having BIN on sends the file as 7bit ASCII instead of 8 bits to save bandwidth. I don't think line endings have anything to do with this WTF

Not with any FTP setup I've ever heard of. Anyway, line ending translations are quite bad enough...

The translation is reversible if the ASCII-mode transfer was from a UN*X machine to a Windows machine - the single byte LF (line feed) becomes CRLF (carriage return followed by line feed), and this is fairly straightforward to fix - just replace all occurrences of CRLF with LF. Even if there was a CRLF pair in the original file, this will have become CRCRLF so no problem.

Unfortunately, the really bad case (transferring from a Windows to a UN*X machine) is the more likely situation here. In that case, any CRLF pairs get replaced with a single LF - and that's irreversible. There's no way to distinguish between an LF that started off as part of a CRLF, and an LF that stood alone. Well, there might be, but it's likely to require knowledge of the specific binary file format and most likely considerable human intervention.

I have encountered both cases several times. The most common source of the problem these days is FTP software that tries (and fails) to autodetect the file type. Of course, users of such software have no idea that the ASCII/binary decision even needs to be made, so fixing the problem can be... fun.

Me: "Sorry, your FTP software didn't do it right and the file is broken. I'm afraid you'll have to upload the file again."
User: "OK, how do I do it right?"
Me: "(sigh) Which of the 4 billion FTP client apps are you using?"


Yeah, I appear to have spoken too soon. I started second guessing myself right after I posted and looked it up. RFC 114 & RFC 913 make mention of 7-bit, but this does not appear to be the norm. Was I thinking SMTP?

Re: The Loony BIN

2007-05-01 12:52 • by Confused (unregistered)
I work for a company with a product that performs FTP, so I am really getting a kick out of these replies...

I hate, loathe and despise ASCII mode. The concept is flawed to begin with. Typically people put stuff like MSDOS-format files on a Unix server, use FTP to download it to something like a Mac, and finally open the document with MS Windows. Then they wonder why it's a mess.

ASCII mode means that the server is supposed to assume that the file is in its local text encoding, translate that into NVT-ASCII as best it can and send that, and the receiver should translate from NVT-ASCII to whatever text encoding makes sense for it. In practice, clients and servers do a weird mishmash of either not doing anything or just doing simple CRLF mappings. If your text file was in Latin1, UTF-8, cp1252 or anything more complicated than 7-bit ASCII it's as good as lost.
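A rough illustration of that last point in Python, assuming a translator that can only emit 7-bit ASCII and substitutes a placeholder for everything else. That substitution policy and the name `to_seven_bit_ascii` are assumptions made for the example; actual FTP implementations vary, which is rather the problem.

```python
def to_seven_bit_ascii(text: str) -> bytes:
    """Encode text as 7-bit ASCII, replacing anything that does not fit with '?'."""
    return text.encode("ascii", errors="replace")


original = "na\u00efve caf\u00e9"   # two characters outside 7-bit ASCII
sent = to_seven_bit_ascii(original)

print(sent)
# Decoding on the receiving side cannot bring the accented letters back.
print(sent.decode("ascii") == original)
```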

To my delight, lately I've seen ftp servers that simply *ignore* ASCII mode. They transfer in binary no matter what you say. Highly nonstandard, but I'd like to see more of that.

Re: The Loony BIN

2007-05-01 12:54 • by Evo (unregistered)
134456 in reply to 134426
bling:
Darth Vader:
No, the real WTF is that he used a month old backup instead of running
sed -e 's/\r\n/\n/g' on the backed up file. Seriously, WTF was he doing as an admin?


Maybe you have a sed recipe for appending all the stuff that was truncated after the 'end-of-file' character was encountered, too?


Hmmm, let's say we're missing about 2GB (= 2.000.000.000 bytes (Yes, really)).
We could just do:
head -c 2000000000 /dev/random
and append that to the file. If we're lucky, we have the original file....

A nice chance of 1:256^2000000000...

Then again, pipe enough data to a file, run it, and you'll have recreated Windows.
What? This isn't an illegal copy of Windows, it's simply the first X bytes from /dev/random !

Re: The Loony BIN

2007-05-01 13:20 • by JCM
134457 in reply to 134432
Could not agree more. Word is awful about files over a network, or LARGE files in general.

Re: The Loony BIN

2007-05-01 13:21 • by Sarcastic Ahole (unregistered)
you guys are all dorks

Re: The Loony BIN

2007-05-01 13:23 • by zip
134459 in reply to 134456
Evo:
bling:
Darth Vader:
No, the real WTF is that he used a month old backup instead of running
sed -e 's/\r\n/\n/g' on the backed up file. Seriously, WTF was he doing as an admin?


Maybe you have a sed recipe for appending all the stuff that was truncated after the 'end-of-file' character was encountered, too?


Hmmm, let's say we're missing about 2GB (= 2.000.000.000 bytes (Yes, really)).
We could just do:
head -c 2000000000 /dev/random
and append that to the file. If we're lucky, we have the original file....

A nice chance of 1:256^2000000000...

Then again, pipe enough data to a file, run it, and you'll have recreated Windows.
What? This isn't an illegal copy of Windows, it's simply the first X bytes from /dev/random !


What the heck are you trying to say??

Re: The Loony BIN

2007-05-01 13:26 • by nobody (unregistered)
134460 in reply to 134446
derby:
When I was a kid, we used a keyboard


keyboards! You were lucky. We had to flip toggle switches all day long if we wanted to interact with our computers. Kids and their damn fancy buffered input.

Luxury! We had to magnetize little magnetic cores, and check the magnetization with tiny magnets!

Re: The Loony BIN

2007-05-01 13:26 • by Andrey (unregistered)
Forgetting to manually set BIN mode in an ftp client that doesn't do it for you (99% do) is an entirely forgivable mistake. What is inexcusable is the fact that even `ls -l` on the local machine and the backup server would've shown that something is seriously messed up. In a critical process such as this, the absolute minimum of testing would've been to compare md5 hashes of the local files and the ones on the remote server to make sure no corruption happened. While I doubt that a place too cheap to buy a few extra tapes would have a test server for test restores, a couple of extra checks in the backup script/instructions would've revealed the problem really quickly.
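A minimal sketch of such a check in Python, assuming both copies can be read as ordinary file paths (for example, the backup fetched back or the backup share mounted locally). The function names `md5_of` and `backup_matches` are illustrative, not part of any backup tool.

```python
import hashlib
import os


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(local_path: str, backup_path: str) -> bool:
    """Cheap size check first (the `ls -l` test), then compare MD5 digests."""
    if os.path.getsize(local_path) != os.path.getsize(backup_path):
        return False
    return md5_of(local_path) == md5_of(backup_path)
```

The size comparison alone would have flagged the backups in the article, since they came out smaller than the originals; the hash catches the cases where the size happens to survive.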

Re: The Loony BIN

2007-05-01 13:29 • by jergosh
134462 in reply to 134450
old bloke:
http://www.ietf.org/rfc/rfc959.txt
3.1.1.1. ASCII TYPE
This is the default type and must be accepted by all FTP implementations.


I believe that the command line ftp in Linux distros is set to binary by default (common sense!).

Re: The Loony BIN

2007-05-01 13:33 • by jimlangrunner (unregistered)
134463 in reply to 134446
derby:
When I was a kid, we used a keyboard


keyboards! You were lucky. We had to flip toggle switches all day long if we wanted to interact with our computers. Kids and their damn fancy buffered input.


toggle switches! Lucky. We had to flip those bits by hand, with magnets. I won't even tell you how much fun it was to read it back out! Kids and their damned mechanical input devices.