Admin
Agreed. But using "i.e." when one means "e.g." is almost as annoying as IE.
Admin
With the temp directory solution, the file has already been written completely to disk, and all the mv has to do is to create a new directory entry for it in the new directory. So what might happen if the mv is not atomic? The directory entry might not be complete, and the Watcher only sees half the file name and therefore can't find the file? It'll just have to try again later.
Obviously, when setting up the solution you would have to make sure the temp directory is on the right file system. /var/tmp might indeed not cut it.
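For illustration only, a minimal Python sketch of the temp-directory approach described above. The function name `publish` and its parameters are hypothetical; the staging directory is assumed to be on the same file system as the watched directory, which the code checks by comparing `st_dev`.

```python
import os
import tempfile


def publish(data: bytes, staging_dir: str, watched_dir: str, name: str) -> str:
    """Write data in staging_dir, then rename it into watched_dir."""
    # Assumption: both directories must share a file system, otherwise a
    # rename cannot be a pure directory-entry operation.
    if os.stat(staging_dir).st_dev != os.stat(watched_dir).st_dev:
        raise ValueError("staging and watched directories are on different file systems")

    # Write the complete file where the Watcher is not looking.
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

    # Only now does the file appear in the watched directory, fully written.
    final_path = os.path.join(watched_dir, name)
    os.rename(tmp_path, final_path)
    return final_path
```

A real downloader would stream chunks from the network instead of taking `data` in one piece; the rename at the end is the part that matters.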
The interview QUESTION is good enough. The problem might be that when I'm told I may not modify the Watcher, how the heck am I supposed to know that I might be allowed to modify the Downloader?
Admin
We have 2 open positions on our team, due to high turnover, so I interview probably about 5 or 6 people a week and have gotten really good at giving technical interviews. It usually involves giving them a real problem from some code we have, and seeing if they solve it the right way, and then I explain to them how it should be done and make sure they agree.
It is hard to find people who have the right mix of skills and personality. Some people realize halfway through my technical interview that they lack the required knowledge and simply cut it short and walk out of the room, I assume in embarrassment.
Admin
Yes, it is just you. Actually, I would say that he should rephrase the question, something like this maybe:
Users are running on a version of Linux which they install and upgrade themselves. We have written a downloader application to download some important data for their business. They are using a 3rd party processing app which watches a specific directory for new downloads. As soon as the processing application sees a new file in the directory, it begins processing it. However, processing happens faster than downloading, and the watcher will produce an error if it processes an incomplete file. How can we modify the downloader to make sure this doesn't happen?
Now it's clear that you can't modify the OS because you don't control it, and you can't modify the watcher because you don't control it. You've also made it clear that you want a solution which modifies the downloader program.
In any event, I don't think Jeremy was being a jerk at all. He was just having trouble describing a real-world problem accurately. It's very common for inexperienced people to become flummoxed at real-world questions like these because they don't have the experience to know that these problems will arise. It just requires a little forethought when phrasing the question.
Admin
Every file system command? No.
Rename? Yes, it's guaranteed to be atomic, by the relevant standards, at least in the way we care about.
Seriously, it links the inode to the new directory, and unlinks it from the old one (link + unlink is another way to do rename, BTW). In no sane implementation can that result in seeing a partial file.
And before you mention different filesystems, rename (and link) generally do not support that (they return errors), and, if they do support it, they are STILL required to be atomic.
RTFS — http://www.unix.org/single_unix_specification/
Admin
What is the point of an async watcher/downloader when the files need to be synchronized? I'd just make the downloader start the watcher to process the file it finished downloading.
Though if they were to keep the original design, on Linux and with the same assumptions as the question, I'd have the downloader create a second file to mark download completion for each file downloaded. This avoids moving gigs of data...
The second file would hold the MD5 read from the server, the MD5 computed on the downloaded data, and other metadata about the file for verification purposes...
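A rough Python sketch of such a completion marker, under loud assumptions: the `.md5` suffix, the JSON layout, and the name `write_marker` are all made up for illustration, and the watcher would have to be one that waits for the marker, which the original question does not allow.

```python
import hashlib
import json
import os


def write_marker(data_path: str, server_md5: str) -> str:
    """Verify the download against the server's MD5, then drop a marker file."""
    h = hashlib.md5()
    with open(data_path, "rb") as f:
        # Hash in 1 MiB chunks so large downloads are not read into memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    local_md5 = h.hexdigest()
    if local_md5 != server_md5:
        raise ValueError("checksum mismatch: download is incomplete or corrupt")

    # Hypothetical marker format: checksum plus size, stored as JSON.
    marker_path = data_path + ".md5"
    tmp_path = marker_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"md5": local_md5, "size": os.path.getsize(data_path)}, f)
    # Rename so the marker itself never shows up half written.
    os.rename(tmp_path, marker_path)
    return marker_path
```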
Admin
Instead of mv, why not ln?
Admin
I think I know what happened with Jeremy's interviewee, because I tripped over exactly the same thing. When Jeremy said "you can't change the Watcher", I, and I presume the interviewee, interpreted that as "you can't change the Watcher or the Downloader". Without pausing to check the constraint, the solution proposed is not unreasonable; without restating the constraints back to the interviewer, the wrong assumption will go unnoticed (and produce an apparently bizarre solution).
So I don't think this is a WTF, just a misunderstanding. I hope this question wasn't the only reason the guy didn't get the job, because he's exactly the kind of person you'd want to have around when things really are that intractable.
Admin
So even if it was not possible to modify the downloader, this is STILL easy to do.
Admin
"margin: auto" Thanks for the tip.
Admin
Everybody is worried about mv being atomic or not, but if you can't assume that you have access to an additional temporary directory on the same disk, couldn't you download to some temp directory you do have access to, and then toss a link into the directory being polled? Cleanup would be a bit more of an issue: the watcher would take care of the links, but the original file(s) might require a bit of trickery. But this would certainly guarantee atomic operation: download file, create link, ..., clean up file. Just random ideas.
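A small Python sketch of the link idea, assuming a symbolic link is meant (a hard link cannot cross file systems, so it would not help when the temp directory is on another disk). The name `publish_symlink` is hypothetical, and cleanup of the original file is left out, as noted above.

```python
import os


def publish_symlink(finished_path: str, watched_dir: str) -> str:
    """Expose a fully downloaded file to the watcher through a symlink."""
    link_path = os.path.join(watched_dir, os.path.basename(finished_path))
    # The link is created in a single step and points at a complete file.
    # Use an absolute target so the link works from the watched directory.
    os.symlink(os.path.abspath(finished_path), link_path)
    return link_path
```

Whether the Watcher follows symlinks, and what it deletes afterwards, is not stated anywhere in the problem, so this would have to be checked first.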
Admin
It's certainly possible. I think it means id est, and loosely translates into "That is to say".
I used it between two attempts at explaining the metaphor: a literal one, and one where I attempted to show a situation where the description might be more valid.
What do you think it means?
Admin
Have you ever asked why you have high turnover?
Admin
Problem solved.
I am totally amazed at how many 'over complicators' there are posting messages here. One would think they would be too busy over complicating things and not have any free time to post to TDWTF.
Admin
Surely a better solution to the download problem is to add ".unfinished" to the file name then rename it when it's complete.
This avoids the problem of some smartass putting the temporary folder on another disk and then you get expensive copy operations, run out of disk space when you try to move the file, etc.
Admin
Ah, but what if his cell phone didn't work?
Admin
Well, I think it means nothing, because it was written as 'IE', not 'i.e.'. That was kind of his point to begin with.
Admin
That part isn't in the original problem statement. It's only included at the very end, where the interviewer says, "so you’re saying, to solve the problem of the Watcher processing files that are not done downloading, you would modify the Linux kernel?"
Admin
So. Why did the interviewee for a PHP developer position get asked about CSS?
Admin
Move within the same filesystem ("rename") is atomic and doesn't involve copying/moving the file bits. In any reasonable OS, which Windows is, too, no matter how you would object. As soon as the file appears in the target directory, it's there instantly, and "Watcher" can read it without any problem.
Admin
Seriously?!? They said the office called him and his cell phone worked perfectly.
Admin
Touché.
Admin
I was referring to a myriad of over complicators posting. However, if you are asking for general knowledge, rather than to answer the interview question properly...
Physical drives are irrelevant in Linux. Moves across file systems are not atomic, but are implemented as copy and remove. Traditionally file systems did not span physical drives, but today they can in various ways.
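To make the copy-and-remove point concrete, here is a hedged Python sketch. `os.rename` fails with `EXDEV` when source and destination are on different file systems, and `shutil.move` is one way to fall back to copying; the function name `move_reporting_atomicity` is made up for illustration.

```python
import errno
import os
import shutil


def move_reporting_atomicity(src: str, dst: str) -> bool:
    """Move src to dst; return True if it was a plain rename, False if copied."""
    try:
        os.rename(src, dst)
        return True
    except OSError as exc:
        # EXDEV means "cross-device link": src and dst are on different
        # file systems, so no rename is possible.
        if exc.errno != errno.EXDEV:
            raise
    # Fallback: copy then remove. A watcher can see dst while it is still
    # being written, which is exactly the original problem.
    shutil.move(src, dst)
    return False
```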
Admin
No, don't make it blue! It's just our favorite troll, TopCod3r.
Admin
A situation similar to this comes up at my office on a daily basis. We virtually never have control over both the watcher and the downloader. Often they don't even run on the same machine and simply interact through a shared folder or something similar.
Saying "just have a script wait till the downloader finishes to start the watcher" assumes you have complete control over the system, which may or may not be true. The solution of a tmp name/directory (or something like a poll timer checking for file size if you're the watcher) is the most universally accepted, as it requires only modifying and/or controlling one of the applications.
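For the watcher-side variant, a rough Python sketch of a poll timer that checks file size. This is a heuristic, not a guarantee: a stalled download looks the same as a finished one. The name `wait_until_stable` and the default interval and check count are assumptions.

```python
import os
import time


def wait_until_stable(path: str, interval: float = 1.0, checks: int = 3) -> int:
    """Block until the file size is unchanged for `checks` polls in a row."""
    last_size = -1
    stable_polls = 0
    while stable_polls < checks:
        size = os.path.getsize(path)
        if size == last_size:
            stable_polls += 1
        else:
            # Size changed, so the writer is still active: start over.
            stable_polls = 0
            last_size = size
        time.sleep(interval)
    return last_size
```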
Admin
Another way to process large files is to ensure that the processor does not start processing a file until it sees a small semaphore file.
E.g. a 100MB Movie.MOV file won't be processed until a 1byte Movie.MOV.GO file is in the same directory. I've been using this for years.
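A short Python sketch of that semaphore-file convention, assuming the processor is one you can write yourself. The `.GO` suffix comes from the example above; the function names `signal_ready` and `ready_files` are hypothetical.

```python
import os


def signal_ready(path: str, marker_suffix: str = ".GO") -> None:
    """Downloader side: drop a 1-byte marker once the data file is complete."""
    with open(path + marker_suffix, "wb") as f:
        f.write(b"1")


def ready_files(directory: str, marker_suffix: str = ".GO") -> list:
    """Processor side: list data files that have a matching marker."""
    names = set(os.listdir(directory))
    return sorted(
        name
        for name in names
        if not name.endswith(marker_suffix) and name + marker_suffix in names
    )
```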
Admin
Even more fun, ask for a SECOND simple solution.
The temporary directory is a good solution, but it has problems. E.g., in Unix you can rename atomically, but I can imagine situations where you would have to manually read/write the entire file. Is there another standard approach?
It turns out there is --- and it's actually more correct in some ways. Have the downloader get a 'write' lock on the file. Have the watcher get a 'read' lock on the file. (Or its own 'write' lock if it deletes the file as the final step in the process.) You're fine as long as everyone uses locks and the watcher program is smart enough to keep retrying to get a lock. (I assume it's not so stupid that it will wait indefinitely for that lock.)
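A hedged Python sketch of the watcher side of that locking scheme, using `fcntl.flock` (Unix only). These locks are advisory, so this only works if the downloader takes an exclusive lock with `fcntl.flock(f, fcntl.LOCK_EX)` before it writes anything, and there is still a window between creating the file and locking it. The name `try_read_locked` and the retry defaults are assumptions.

```python
import fcntl
import time


def try_read_locked(path: str, retries: int = 5, delay: float = 0.1):
    """Read the file under a shared lock; return None if it stays locked."""
    for _ in range(retries):
        with open(path, "rb") as f:
            try:
                # Non-blocking shared lock: fails while a writer holds LOCK_EX.
                fcntl.flock(f, fcntl.LOCK_SH | fcntl.LOCK_NB)
            except BlockingIOError:
                time.sleep(delay)
                continue
            try:
                return f.read()
            finally:
                fcntl.flock(f, fcntl.LOCK_UN)
    # Give up instead of waiting indefinitely.
    return None
```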
Admin
I'm pretty sure the actual point was that i.e. != e.g. http://www.wsu.edu/~brians/errors/e.g.html
Admin
The problem statement said that the downloader downloads multiple files. Presumably you'd want the watcher to process file 1 while the downloader was downloading file 2.
Admin
I still think e.g. was what was needed here. I believe e.g. would mean "here is one of many examples."
**EDIT - adding a link that shows my point http://ancienthistory.about.com/od/abbreviations/f/ievseg.htm
Admin
No, I think the usage is correct here IE = "Id Est" to denote clarification or further explanation.
captcha: dolor. ie lorem ipsum dolor est...
Admin
Yeah, I'm guessing that you are the problem. People don't just walk out of interviews out of technical embarrassment; if they don't know something, they usually say something like "If I had Google in front of me I could look it up in 2 seconds!" They walk out in situations where they find the interview to be asinine, or decide the company sucks considering the events that led up to that point.
Admin
So the real WTF is me for misreading move vs copy...
Or everyone else for assuming that your average developer has control over where the SAN administrator chooses to store various file paths?
Or are you all assuming that this is a small/medium shop where a developer also controls the infrastructure? That just isn't true in enterprise organizations (at least, not all of them).
I admit my original comment was short-sighted, but come on people... if you need a caveat on your response, then your response is flawed, too.
Admin
I think these interview questions must hit a raw nerve with programmers who know they would have failed them. The level of both ignorance and hostility in the comments is phenomenal.
Admin
Don't feed the trolls
Admin
I've got a "downloader/watcher" set of apps I wrote. There are a few ways to solve the problem. In fact, moving files around would be my last choice.
My watcher and downloader are in the same app. Each thread knows what to pass along so that the files can be processed correctly. An old version had two separate apps, and used temporary file names. It was much slower, though.
Admin
Recruiters are not always your friend. He may have been sending you just to get some information about the position, or to make the next guy look better.
Admin
No. "Id est" means "that is", or "in other words." It does not mean "for example." http://en.wikipedia.org/wiki/Inter_alia#I
Admin
Try inotifywait, it can report file names when they are closed. But I would still go for the temporary location. Suppose the download gets interrupted and is restarted. That could lead to incomplete files being processed.
Admin
umm... because web development involves css? Even if you have a dedicated design team that gives you the static html, you will still likely need to modify it as you build out the site, and that requires knowing html and css.
Admin
It would really depend on what the "Watcher" was "Processing". If we are talking about multi-GB files, I would consider coding the "Watcher" so that it could start processing before the entire file had been downloaded and/or moved to its proper directory. And what exactly is this data, and how is it used? The "simpler" solution may be to set up a one-method web service that accesses the bits of data as needed by the user, vs. constantly keeping two data sets in sync and eating up bandwidth.
I suppose attempting to solve the functional problem and not a technical problem is what separates Engineers from Programmers (the men from the boys).
If I were to ask a question like this in an interview, I would expect to hear several questions from the candidate regarding what the goal of this feature was and what his restrictions were, but hopefully modifying the kernel is assumed to be outside his domain of control.
Admin
JQM: Please do keep up. TopCod3r really is tdwtf's resident troll. ;) Just see other posts by him and you may detect a pattern.
Admin
I had a similar thing happen wherein the guy wanted me to come meet him, but the first opportunity I had was after I got done with work on Friday (was looking for another job but I wasn't about to endanger my current one). So there I was driving over there after 5:30 PM on a Friday, to go somewhere that takes at least an hour to drive to in normal traffic.
Recruiter calls me when I'm halfway there and offers to meet on Monday instead. I said I'm halfway there and we can just go ahead and do this today.
When I finally get there, the recruiter is gone. He said f*** this and left - apparently I was crimping his Friday night plans (in all fairness meeting on Friday after work was his idea). I was pissed since I had driven this whole way for nothing.
On the upside they did have someone else to talk to me - a blonde former cheerleader who had been in the workforce all of three weeks and apparently had not had the "don't dress provocatively" speech with her boss yet.
Admin
BUZZZZZ
Thanks for playing. You can't change the Watcher, and it needs to take out a lock, and *NIX doesn't do automatic locking as part of the filesystem.
Admin
What if the downloader crashes or the network goes away?
Anyway, the solution I've seen work (and work well) is this:
Download the file. When done, create file.DOWNLOADED.
The file muncher notices a file ending in .DOWNLOADED, creates a .PROCESSING file in the same manner, and writes its pid to it.
When the file muncher finishes, it writes file.DONE and deletes file.PROCESSING.
You can fill in the error checks pretty easily, and using ls will tell you what's going on.
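A minimal Python sketch of the file muncher side of that marker scheme. Two details are my own assumptions and not part of the description: `O_EXCL` is used so two munchers cannot claim the same file, and files that already have a .DONE marker are skipped. The name `claim_and_process` is hypothetical.

```python
import os


def claim_and_process(directory: str, process) -> list:
    """Process every file that has a .DOWNLOADED marker and is not yet done."""
    handled = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".DOWNLOADED"):
            continue
        base = os.path.join(directory, name[: -len(".DOWNLOADED")])
        if os.path.exists(base + ".DONE"):
            continue  # already processed on an earlier pass
        try:
            # O_EXCL makes the claim fail if another muncher got there first.
            fd = os.open(base + ".PROCESSING", os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            continue
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        process(base)
        # Write .DONE first, then remove .PROCESSING, as described above.
        open(base + ".DONE", "w").close()
        os.remove(base + ".PROCESSING")
        handled.append(base)
    return handled
```

A stale .PROCESSING file left by a crashed muncher would block that file forever here; the pid inside it is what a fuller version would check.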
Admin
But the way Linux works, the entire filesystem is represented as being contiguous, even when the physical storage isn't.
All it would take is one "clever" sysadmin to put the temp directory on a separate partition and all of a sudden you've reintroduced the same race condition you had before -- except this time you're not even aware of it.
The best idea is to use some sort of locking. Either using built-in operating system locking or just something as simple as dropping a lock file in the directory that both the Downloader and Watcher respect.
Yes, I realize the Watcher isn't supposed to be modified... but stating that you can't modify the Watcher does make this an absurd scenario to begin with.
Admin
"...remote server and save them to a certainly directory on disk. A Watcher program monitors this directory..."
and then:
“What about if the Downloader just wrote files to a temporary directory, and then moved the file to the appropriate directory when the download was complete.”
I think the first door he closed was that leading to his 'brilliant' solution... And as always, if you know how the conjurer does his tricks, they are so easy you could have thought of them yourself.