Admin
For number 2, it's not a copy, it's a move, which is an atomic action under Linux. Unless you are moving between two separate partitions (why would you?), it happens pretty much instantaneously, so there is no need to monitor progress.
Admin
TRWTF is that this message board doesn't do threaded replies!!!
Admin
That wouldn't surprise me at all. I get this all the time. On my resume, underneath the list of the various programming languages I know and my experience, I used to have a phrase: "I've been exposed to other languages such as ... (list of languages) ... and could do simple tasks using them as a part of my other duties."
I don't know how many times I got called by recruiters for Python or C# jobs. "But it says you've been exposed to it!". Yeah, I can take a program and debug it probably. Or I might be able to make small changes. I'm not going to get through an interview for a senior level position.
I've pretty much simply refused the interviews. No point in wasting my time for a job I know I won't get (or if I did get it would be a living hell since anyone hiring me for a C# job would have to be off the scale on cluelessness)
Admin
Anyone who does interviews and doesn't see how clear-cut this was, is a walking liability and needs to get trained.
Admin
The concept of a non-atomic move on a filesystem horrifies me. What would the intermediate state look like? Would there be an entry in the directory but the inode isn't filled in yet?
Admin
That's easy - he's hiring toadies and footstools. In a less pejorative example, I'm allowed to pass on hiring someone because they don't fit in with the team, even if they're professionally qualified.
Being an abusive jerk isn't illegal.
Admin
This is linux - a non-atomic move would mean that there are two links to the file - the old one and the new one.
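To make the "two links" picture concrete, here is a minimal sketch that performs a move as two separate steps (add a second hard link, then remove the first) and reports the link count in between. The function name `move_via_link` and the file names are made up for illustration; it assumes both names live on the same file system, since hard links cannot cross file systems.

```python
import os
import tempfile


def move_via_link(src, dst):
    """Move src to dst in two steps and return the link count seen in between.

    Hypothetical helper: link() adds a second name for the same inode,
    unlink() then drops the old name. The data itself never moves.
    """
    os.link(src, dst)                      # step 1: new name, same inode
    links_midway = os.stat(dst).st_nlink   # both names exist right now
    os.unlink(src)                         # step 2: remove the old name
    return links_midway


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        old = os.path.join(d, "old.dat")
        new = os.path.join(d, "new.dat")
        with open(old, "w") as f:
            f.write("payload")
        print("links while both names exist:", move_via_link(old, new))
```

At no point is there a half-written file under either name; the worst intermediate state is one complete file with two names.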
Admin
A while back, I interviewed for a position as a programmer writing C code under QNX for embedded controls - a position requiring quite a bit of experience. I did fairly well on the technical part of the interview and matched personality well with the other team members.
However, the manager kept going back to the fact that I didn't have a degree. He did this in a number of ways. It was a bit irritating, but overall, a pleasant experience.
So an hour after I get home, the recruiter I was using called me excited that she had an offer for me. I politely declined it, and she was dumbfounded... even after I told her that, if the manager was harping on me during an interview for not having a degree, what kind of manager would he be? I had the qualifications for the job but didn't think it would be a good match.
The recruiter never called me back.
Admin
Finally, why should any software developer dare to say "just don't do it" to a customer? It is the customer who pays the money.
Read my words again. You may still be able to download the file; you may just not have any temporary directory. So you may have only two options to choose between: either you download the file right into the watched directory, or you cannot download the file at all. Example: an ADSL router with either an NFS/Samba-mounted volume or an external hard drive whose root directory is being monitored by the Watcher. It is not the solutions which are bad. The solutions perfectly fit the problem as it was presented. It is the problem which is poorly defined and leaves a lot of unclear spots. The candidate's solutions are overly complex but will work (unless the game rules change again, say, to prohibit kernel hacking). Jeremy's solution is simple but may not work. If you are writing a 10-bucks-per-bunch document backup utility, it is sufficient. If you are writing non-upgradeable firmware for a hardware root DNS server, you are doomed. Was the candidate ever told that he is not writing non-upgradeable firmware for a hardware root DNS server?
But finally, anyway, no sensible person may call a working solution "bad" while contrasting it with a non-working "good" solution. Such a dialogue is worth another WTF.
Admin
LOL...OK, so it's a jargon issue.
The filesystem, in Unixish parlance, is the directory tree as a whole. It spans partitions and physical devices. Oh, it is also the arrangement of data and files within a partition but when people talk about "the filesystem" they mean the entire hierarchy. It can easily span partitions, logical volumes, devices, etc.
I'm guessing you are more familiar with the DOS/Windows world where filesystem has a more limited meaning. However, the context of the conversation made it clear that they were talking about the files spanning physical devices or partitions within the filesystem...which means they were using the unixish meaning of the word.
You should follow your own advice and do some reading.
Admin
No, you don't use temp at all. And you either control the install or write the install requirements in the docs.
Yes. Pick another directory on the same file system. Any one will do, but something like /staging is reasonable.
who cares? You're the one who wants to bung it into /tmp.
not very bright, are we? the third option is to download into /a then move to /b, which takes no space from /tmp. Duh.
No sensible person will accept kernel hacks to support a file downloader. If you're writing a $10/mo, then you absolutely want a simple solution (not his), because otherwise, support costs will kill you.
Admin
One day I heard the Unix guys called ReiserFS "a filesystem"...
Admin
No, I don't think I will. The filesystem refers to a logical volume. The directory tree is everything that the unix file subsystem can see. I'm guessing you don't know much about unix to be lecturing me about something so simple - I've been at it for about 15 years and I know more than you.
Now then, the filesystem is a single volume. Always. fsck operates on one volume at a time, for instance, and fstab lists them out, one per line. The context didn't really cover what you think at all - that's why I was specific in my advice.
Besides, if filesystem referred to the whole tree, then concepts like 'same fs' are meaningless.
Admin
I swear, I could have been the interviewer on that last interview. Except it was a different company and I was the only technical interviewer. It's funny... every company has a different idea about what a "senior developer" should be able to do. I've worked at some places where that meant you had to be a rocket scientist, and then others where at most you had to know how to include a file. There's no industry "standard." Even as the interviewer for a company like that, you have to pretend that you agree with the title given to the job. I didn't consider that a WTF, because it sounded too familiar :(
Admin
Shrug. You really should do some reading. Knowledge is always good. I can't force you of course but it will help you to understand what people are saying.
Filesystem has two meanings.
Your job as a reader is to figure out which of those two meanings the writer intended.
You failed.
Admin
True that. To compound things, some places will post ads for a software developer III and a programmer IV and just assume that people know what that means. Since, as you say, there aren't any standards, why bother with even finer grained distinctions?
Admin
I mentioned /tmp because it is usually open for writing, if it exists. Nothing changes for the better if you use "/a" instead of "/tmp", though. Things just become even more complex, because you need even more assumptions to be allowed to write to the /a dir than to the /tmp dir. Not to mention that the root dir in Unix is mounted read-only too often to assume otherwise.
Note that, according to the rules, the only place you are allowed to write is the working directory. It was never mentioned that you may write somewhere else. If you don't verbally confirm such an obvious thing, you may have "a lack of initiative" or something, but if you assume this thing while it is never mentioned, you have a "lack of bordercase feeling". Frankly, I would hire the person with the lack of initiative.
I've personally been on a team which had the Linux kernel severely patched for the sake of the userspace applications running on that system. The system had a 6-to-7-digit cost, contained legacy code about 10 years old, and required specially designed hardware to run. Looking at Jeremy's story, I never noticed any mention that this downloader is not going to become a part of a similar system.
Admin
Go read http://linux.die.net/man/8/mount. Filesystem in this context means one thing only. Your job as a reader was to understand that, and you failed. I don't care if you think that filesystem describes the whole shebang - it doesn't. You can refer to the virtual filesystem if you like, but that has different semantics.
Admin
I keep getting calls for .NET programming. I should note there is absolutely NO .NET programming on my resume. I don't mention it at all. There's even a minimum on my resume about windows.
I finally figured out why I get the calls. It's my email address. One of these days I'm going to get a call from someone 'who read my resume and thinks I'd be perfect for the .NET position!' and lose my temper and ask them in what universe they read it. Recruiters no longer read resumes; their computer programs key off of certain words and then you get the calls from the idiots who waste your time with... well... idiocy. (And then they're shocked that you're not interested in the interview!)
Admin
yeah, I'm lazy and didn't include an implicit root. Here you go:
the third option is to download into ${DL_ROOT}/a then move to ${DL_ROOT}/b
Now stick DL_ROOT somewhere that's got about 50-100G free - twice whatever your expected retention is.
yeah, see above. I didn't want to repeat myself for the 3rd time.
I didn't mention it to you because we haven't got anything like requirements. There are lots of other choices that fall under the purview of write to dir a, move to dir b, and I don't need to mention them here - it's pointless on such a fuzzy problem.
Me too, minus the special hardware and kernel hacks. If I'm writing a file catcher, I'm not adding to the mess when there are better solutions available.
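As a rough sketch of the "write to dir a, move to dir b" idea: the `DL_ROOT`, `a` and `b` names come from the comment above, while the function name and the idea of receiving the download as an iterable of byte chunks are assumptions made for illustration. It assumes both subdirectories sit on the same file system, so the final rename never copies data.

```python
import os


def download_then_publish(chunks, name, dl_root):
    """Write chunks to <dl_root>/a/<name>, then rename into <dl_root>/b/.

    Hypothetical helper. The watcher is assumed to look only at
    <dl_root>/b, so it never sees a file that is still being written.
    """
    staging = os.path.join(dl_root, "a")
    watched = os.path.join(dl_root, "b")
    os.makedirs(staging, exist_ok=True)
    os.makedirs(watched, exist_ok=True)

    partial = os.path.join(staging, name)
    with open(partial, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())      # make sure the data is on disk first

    final = os.path.join(watched, name)
    os.rename(partial, final)     # a single rename on one file system
    return final
```

If the download dies halfway, the leftover stays in `a` and the watched directory is untouched.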
Admin
Yep, I most definitely spread the word about that company on my personal blog (which is just a tiny blip on the web in a strange foreign language but is indexed nicely by google).
Admin
I've had some similar interviews myself, I was about to write them up but I'll save them for my own submission at some point I think...
The best practice approach would of course be to use advisory locking in both applications, and this would be fine with the scenario as originally outlined. You should pretty much always use advisory locking when reading/writing to files; it's amazing how often people don't bother with any sort of file locking (and how many minor but nevertheless irritating problems arise as a result).
Halfway through, an arbitrary condition of "let's just say you can't modify the Watcher" is introduced (albeit with the best of intentions), but in a real world environment, in my experience, if you've got the code for the Downloader you will usually have the code for the Watcher and so be able to fix both applications to work as they should have done in the first place, or at least you'd be able to walk over to / call someone who does look after that application and mention it to them.
If in some unusual scenario that was not the case, I would absolutely fire a bug report to whoever was responsible, requesting that support for observing advisory locks be added (given that the application crashes on files that are still being written to, that's a pretty critical show stopper and the Watcher application's problem).
In the meantime, mandatory locking in the Downloader would prevent the Watcher from opening the file.
Okay, the Watcher could in theory still crash in this scenario, if it has a timeout mechanism wrapped round the file open instruction and there is an unhandled exception that occurs when that happens, but that's highly unlikely really, especially given it doesn't even call flock() / lockf() in the first place. Usually a mandatory lock will just cause the application that is trying to get a read on a file (without looking for a lock) to sit there until the file is cleared for reading, with no other ill effects, but I mention this just for completeness.
Ob: Mandatory locking is usually poor form though and arguably at best only /slightly/ less evil than a crashing app. ;-)
In this situation, it's also a good idea to consider setting up a monitor to look out for the Watcher process, if it really is that bad.
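For readers who haven't used advisory locks, a minimal sketch of the cooperative version follows, using Python's `fcntl.flock` wrapper around flock(). The two function names are invented for illustration, and it only helps if both the Downloader and the Watcher play along, which is exactly the condition the interview scenario removes. It assumes a Unix-like system with a local file system.

```python
import fcntl


def write_locked(path, data):
    """Append data while holding an exclusive advisory lock (hypothetical helper)."""
    with open(path, "ab") as f:
        fcntl.flock(f, fcntl.LOCK_EX)      # blocks until no one else holds a lock
        try:
            f.write(data)
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def try_read_locked(path):
    """Return the file contents, or None if a writer currently holds the lock."""
    with open(path, "rb") as f:
        try:
            # Shared, non-blocking: fails at once if a writer has LOCK_EX.
            fcntl.flock(f, fcntl.LOCK_SH | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        try:
            return f.read()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

A well-behaved watcher would call something like `try_read_locked` and simply retry later when it gets `None`.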
Admin
To head off another exchange, this is almost certainly not tmp - you're probably going to keep the file around, so you may as well stick it in a dir on the destination fs.
Admin
After reading all the comments, I'm pretty sure there are a lot of people here who really want to make things complicated and try to avoid simple solutions at any costs.
/Bla/DownAndWatch for downloading
/Bla/DownAndWatch/complete for the Watcher
mv really IS the fastest and most simple solution. Forget about /tmp and stuff like that - this stuff just needs to be on the same physical volume, otherwise you need to double your free space needs.
Admin
And btw, even
does not ensure that the downloading and the watching directories are on the same physical volume. Oh, those Unix IT specialists, they are so inventive...
Admin
In that case, watcher might start trying to process .bash_profile. Anyway, the only requirement is to get another directory on the same filesystem. watcher is most likely configured using something like watcher.cfg, but even if it isn't, this isn't a big deal. So you create ~download/staging and download there, then move over (yeah, glossing over permissions - set your umask or run them under the same account).
Admin
that falls under incompetence; you write your install doc to specify that they are on the same filesystem. mounting /Bla/DownAndWatch/complete as a new fs is not a supported config.
Admin
Make sure you do body { text-align: center; } too, otherwise IE 6 won't play nice.
Admin
If you are really paranoid, you can forgo the Unix mv command and write your own wrapper to the rename system call. The rename system call will fail with EXDEV if it attempts to move a file across file systems.
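A small sketch of such a wrapper, assuming Python's `os.rename`, which calls the rename system call directly and does not fall back to copy-and-delete the way the mv command does. The function name and the choice of `RuntimeError` are illustrative only.

```python
import errno
import os


def rename_same_fs(src, dst):
    """Rename src to dst, refusing to cross file systems (hypothetical helper).

    os.rename surfaces EXDEV from the underlying system call instead of
    silently copying, so a cross-device move is reported, not performed.
    """
    try:
        os.rename(src, dst)
    except OSError as exc:
        if exc.errno == errno.EXDEV:
            raise RuntimeError(
                f"{src} and {dst} are on different file systems; "
                "refusing a non-atomic move"
            ) from exc
        raise
```

The point of failing loudly is that a misconfigured staging directory shows up at the first download rather than as a half-copied file in front of the watcher.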
If you don't think the download temp file/rename solution is the best one, why not check out your favourite browser as it downloads a 500 megabyte porn movie. In this case, of course, the watcher is not even a process on the computer, but a human being. Firefox, IE and Safari all implement variations on the theme.
Admin
The interviewer started making the situation more complex than it had to be, for the same end-result of figuring out if a candidate is right for the job posted. Isn't he exemplifying the exact kind of candidate that he was trying to filter-out?
Admin
Funny, the IE I have does the download and copy thing, even when I tell it to save the file to a desktop. Porn movie, jdk, fedora ISO, whatever.
Admin
Temporary file locations are a good way to indicate to end users of desktop applications when a thing is finished downloading (in the way that say, that Firefox does with downloads when it renames them) but it doesn't seem a particularly great solution here and seems odd to insist it is "the one true way" when it comes to writing systems software - I certainly don't think it is any such thing (and is more obtuse than file locking, which exists to handle exactly the kind of problem described).
If you went for moving a file around instead of simply locking you'd have to have a configuration option to indicate where the separate temporary directory is, and add quite a bit of error handling - e.g. to see if the temp directory is really not the same as the directory it's saving files to (and also possibly - from the description we have of how the Watcher works - that it's not a subdirectory), that the temporary directory is on the same volume, that the directory is not a link to somewhere on another volume (and or link to the same folder as the download folder) and possibly other things that don't immediately spring to mind.
I can't see anything in favor of moving over simple locking. If you are writing to a file at any point you ought to be getting at least an advisory file lock anyway, assuming you are not a cowboy, so moving the file ultimately just adds a bunch of extra possible failure scenarios.
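For a sense of scale, the checks listed above (same directory, subdirectory, different volume, symlinks) can be sketched in a few lines. The function name and the wording of the messages are made up; comparing `st_dev` is used here as a stand-in for "same volume", which is an assumption that holds for ordinary local mounts.

```python
import os


def check_staging(staging, watched):
    """Return a list of problems with a staging/watched pair (hypothetical helper)."""
    # realpath resolves symlinks, so a link pointing at the watched
    # directory (or at another volume) is judged by its real target.
    staging_real = os.path.realpath(staging)
    watched_real = os.path.realpath(watched)

    problems = []
    if staging_real == watched_real:
        problems.append("staging and watched resolve to the same directory")
    elif os.path.commonpath([staging_real, watched_real]) == watched_real:
        problems.append("staging is inside the watched directory")
    if os.stat(staging_real).st_dev != os.stat(watched_real).st_dev:
        problems.append("staging and watched are on different volumes")
    return problems
```

An empty list means the pair passed these particular checks; it says nothing about permissions or free space.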
Admin
No, that wouldn't work. The file record will show up in the watched directory in the midst of copying and your watcher program catches it in mid-copy. You have the same problem as mid-download, just a much narrower range of failure opportunity.
The better solution (assuming you cannot PUSH from the downloading location) is to have the downloader and watcher's logic (not its directory monitoring features) in the same application. On top of that, you probably ideally want some sort of streaming solution with a backup-to-disk feature if you still want those files sitting around.
A directory monitoring approach will never work in today's filesystem implementations. What is needed is a true indicator of download/copy completeness flag within the file's metadata; and naturally for OS functions and custom downloaders to support said metadata.
Admin
Actually, on most file systems, "moving" a file, provided it's on the same disk and partition, doesn't actually move the data; it just changes the record in the allocation table that says where the file is. So as long as the file was fully downloaded before the move, the problem of hitting the end of an incomplete file could never occur.
Admin
suppose you're downloading that 5G file and the router dies halfway through. Your download thingy shoots itself in the head because woops, bug in the code. File lock goes away, watcher chews on half downloaded file, explodes.
Now suppose you're using one of the other two methods - lock files (adv. locking) or some other dir. Same scenario, but the fallout is a half downloaded file and the watcher doesn't try to eat it. As a bonus, you can recover easily when the router comes back, depending on your protocol, or just try again. It's also easy to write monitors to catch stale files and that sort of thing compared with file locking.
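A stale-file monitor of the kind mentioned can be as small as this sketch. The function name, the age threshold and the injectable `now` parameter are all illustrative choices, and it assumes the downloader writes into a separate staging directory.

```python
import os
import time


def stale_files(staging_dir, max_age_seconds, now=None):
    """List files in staging_dir not modified for max_age_seconds (hypothetical helper)."""
    now = time.time() if now is None else now
    stale = []
    for entry in os.scandir(staging_dir):
        # A download still in progress keeps bumping its mtime;
        # an abandoned one does not.
        if entry.is_file() and now - entry.stat().st_mtime > max_age_seconds:
            stale.append(entry.name)
    return sorted(stale)
```

Run from cron or a loop, the result can drive a retry, a resume, or just an alert.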
Admin
No, this is a move - there is no such thing as mid-copy.
That's basically what the whole .PROCESSING, .TRANSMIT convention does.
Admin
(yes. IE does it wrong. No surprise there. FF and others get it right.)
Admin
You have to handle the connection disappearing during the download, locking or no locking - I've written many applications and scripts in a range of languages which handle large file transfers reliably, and it's not made any more complicated by locking, quite the contrary (it is really wacky to write to files without locking though). Neither, I should add, does it in any way prevent you from implementing a resumable download.
File locking is really, really something you should always be doing when writing to a file regardless of whether you are using a temp file or not. Simply not bothering to use advisory locking is bad form.
Hint for one reason why: If you get two copies of the Downloader running, let alone anything else trying to access that file while it's still being written it is going to end up corrupt and the Watcher can either end up processing duplicate data (or simply crashing, which is what it seems to do when it comes across malformed data).
What you're suggesting is an easier solution only for the sort of rent-a-coder who (a) doesn't intend to do any file locking at all and (b) doesn't intend to do error handling on the download and (c) doesn't care about the integrity of the data.
Of course, doing BOTH temporary file handling AND advisory locking is arguably an ideal technical solution, but of course it will take considerably longer to implement and test the appropriate level of error handling for all the new potential issues it raised compared just doing some simple file locking that should be in the application in the first place. One is a straightforward bug fix, the other introduces new functionality to the application.
Admin
My thoughts exactly. Apparently, you're writing the downloader, so a) Why, exactly, can't it have an EOF marker, since you're also writing the file processor? b) Regardless of 'a': Simplest solution of all -- Rename "download file xx" to "Done-download file xx". Your watcher only sees files beginning with "Done-" thanks to file filtering.
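Option (b) might look something like the sketch below. The "Done-" prefix is taken from the comment; the two function names are invented, and it assumes the watcher's file filter can be expressed as a simple prefix match.

```python
import os

DONE_PREFIX = "Done-"


def mark_done(directory, name):
    """Rename a finished download so it gains the Done- prefix (hypothetical helper)."""
    os.rename(
        os.path.join(directory, name),
        os.path.join(directory, DONE_PREFIX + name),
    )


def ready_files(directory):
    """What the watcher's filter would see: only names starting with Done-."""
    return sorted(n for n in os.listdir(directory) if n.startswith(DONE_PREFIX))
```

Because the rename stays inside one directory, it cannot cross file systems, which sidesteps the whole same-volume argument.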
Admin
a: we haven't really said anything about the format of the file, and we don't really want the downloader to know any format details because that means you have to update it when a new format comes around.
Admin
I have to confess to not seeing the solution to the downloader/watcher problem.
I would have had the watcher check the file size, and when it stops increasing, process the file. However, that will fail if the download completely stalls.
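For comparison with the rename-based answers, the size-polling idea can be sketched as follows. The function name, the default interval and the number of unchanged polls are arbitrary illustrative values, and the sketch has exactly the weakness described: a stalled download looks the same as a finished one.

```python
import os
import time


def wait_until_stable(path, interval=1.0, checks=3, sleep=time.sleep):
    """Return the file size once it is unchanged for `checks` polls in a row.

    Hypothetical helper. `sleep` is injectable so the loop can be
    exercised without real waiting.
    """
    last = os.path.getsize(path)
    unchanged = 0
    while unchanged < checks:
        sleep(interval)
        size = os.path.getsize(path)
        if size == last:
            unchanged += 1
        else:
            last, unchanged = size, 0   # still growing: start counting again
    return last
```

It also requires modifying the Watcher, which the interview scenario eventually rules out.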
Still, I'm not a developer.
Admin
Is it just me, or was the interviewer from the first story a total jerk?
Admin
It's not correct because he was giving a specific example and should have used "e.g." Using "i.e." would have meant that the only meaning of diamond in the rough was the situation he described specifically.
For example if you worked for a company that only printed newspapers you could say "our publications (i.e. newspapers)". If your company printed books and magazines as well it would only be correct to say "our publications (e.g. newspapers)".
Admin
Not at all.. UNIX. Using rename() just moves the link to the file to a new directory, provided source and destination are on the same filesystem.
The file is already saved to disk, nothing about the file changes other than an entry is created in the new directory.
As far as software running on the system may be concerned, this is completely atomic. And this is very frequently the method used by various tools.
For example, rsync uses this method when transferring a new file to a host (a temporary file is created during the transfer, and after the transfer, it is rename()'d to the final location of the file).
Admin
In an ideal world that would be great. However in the real world this isn't always the case. Especially when dealing with government employees you often run across the "that's the way it is, deal with it" or "it's not specified in the contract"
Admin
So guarantee the rename() is atomic. Put the new file in the same directory, but prefix it with a ".", give it a naming convention that will cause other processes not to mess with it, or place it in a subdirectory. Open exclusive does not work, because there is no such feature.
I mount my filesystems using NFS on UNIX which does not provide proper file locking of any sort. It is common practice to use such filesystems.
rename() is pretty sure to work everywhere; "open exclusive" is only valid if you make liberal and unwarranted assumptions about OS and filesystem types.
Admin
1 - Requires having access to the server the watcher is running on. This may or may not be the same server as the downloader.
2 - Requires modifying the watcher.
3 - Is essentially the same as the proposed solution. Rename is exactly the same as move when the move is done on the same filesystem.
4 - This assumes that you can modify the run schedule of the watcher and also places a hard cap on the size of file that can be transferred; although it may work, it's far from ideal.
5 - In an ideal world that would be great; in the real world it's likely that you'll have to interface with a PoS at some point and simply have to deal with it.