A Problem at the Personal Level & More

  • Alan 2008-09-24 11:06
    “Oh,” he scoffed, “so, we’re equals now, is that it?”

    No. I'm not an asshole. Good day.
  • crystal mephistopheles 2008-09-24 11:08

    “Oh, okay,” the candidate replied. He pondered for a full minute and said “so in that case, I would have the Watcher listen on a TCP/IP port, and have the Downloader tell it when it was done downloading.”

    “That seems like a lot of work,” I said....


    I don't think it's fair to say network I/O is "a lot of work". Granted, his temporary directory solution is even simpler, but most high level languages (.NET, Java, etc.) have fully defined classes you can use to implement this in only a few lines of code, and I would classify this as an acceptable solution.
  • MiffTheFox 2008-09-24 11:09
    First!

    And if it's not first, I'll just patch CS so it identifies all third comments as first.
  • Max 2008-09-24 11:10
    Oh, seriously. Each one of these is a job you shouldn't take anyway.

    1) Obvious reasons...
    2) An interviewer who thinks downloads in progress are a problem but file copies in progress are not shows a lack of understanding.
    3) Interviews are two-way -- if the people interviewing you are clueless, the job will suck.
  • Outlaw Programmer 2008-09-24 11:12
    Disagree on (3) here. The people interviewing him weren't clueless. This kind of "fizz-buzz" problem is pretty common and it's amazing how many "developers" can't pull off the simplest things.
  • Azeroth 2008-09-24 11:12
    There is another simple solution with the downloader/watcher problem - downloader should open the file exclusively while it's being downloaded, this way watcher won't be able to access it until it's closed. This way it's not even required to move anything anywhere.
  • andrew 2008-09-24 11:14
    This is actually a bit more graceful, even. There is no polling of the directory involved; you know exactly when to process the file. This could be 100x better if the mount is NFS'ed. Also, what if you need to change the "copy" framework to something else? I think mv /tmp/foo /end/foo is fine to start, but it's not robust for large(r) scale or more complicated designs.
  • The Undroid 2008-09-24 11:14
    Better Dave should reveal his tendencies at the interview, even at the cost of an hour and a half wasted, than hire Shari and gradually work out whatever his problem is with Brown grads.

    Also, how do we know there are no troublesome latencies in the Downloader's local copy?
  • blabla 2008-09-24 11:15
    Max:
    2) An interviewer who thinks downloads in progress are a problem but file copies in progress are not shows a lack of understanding.


    Umm... I think the guy said 'move' rather than 'copy', which is at least closer to atomic, if not atomic?
  • ThomasP 2008-09-24 11:17
    I think you missed the part where they were talking about file moves in Linux, not Windows.
  • Marc 2008-09-24 11:22
    Rename?
  • jpaull 2008-09-24 11:22
    Speaking of bad recruiters... I once dealt with a recruiter that kept trying to hook me up with positions I was not qualified for. I only had a couple of years' experience as a Developer, and on three different occasions he scheduled me for interviews for Senior positions, and then would be totally baffled when I would tell him that I bombed the interviews.

    I wonder sometimes whether or not these recruiters get paid by the number of interviews they schedule and not by the number of positions they actually fill.
  • SoonerMatt 2008-09-24 11:24
    Marc:
    Rename?

    Yeah I was thinking that too. Rather than make it move a 3 gb file (which could fail in itself), I would start the transaction as a .tmp file then remove the .tmp when it's completed.
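
    A minimal sketch of the rename idea, assuming the Watcher skips names ending in .tmp (which the story does not confirm). The directory, the file names, and the printf standing in for the real download are all made up for illustration.

```shell
#!/bin/sh
# Sketch: write under a temporary name, then rename once complete.
# Assumption: the Watcher ignores names ending in ".tmp".
# All paths are hypothetical; printf stands in for the download.
watch_dir=$(mktemp -d)

printf 'payload\n' > "$watch_dir/data.bin.tmp"     # "download" in progress

# Same directory, therefore same filesystem: mv is a plain rename here.
mv "$watch_dir/data.bin.tmp" "$watch_dir/data.bin"

ls "$watch_dir"
```

    Because the rename never leaves the directory, no data is copied, so the size of the file should not matter.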
  • asdsdf 2008-09-24 11:24
    2) File copies/moves across a filesystem are (probably) much quicker than the downloading. The interviewer never said there was a problem with multiple processes accessing a file, just that things break if the Watcher reaches the end of the file too quickly. If a copy/move can copy bytes faster than the Watcher can process the bytes, then it should be fine, no?
  • spenk 2008-09-24 11:26
    Max:

    2) An interviewer who thinks downloads in progress are a problem but file copies in progress are not shows a lack of understanding.


    How about a lack of reading comprehension? Move, not copy - makes a big difference if the files are moving between folders / directories on the same volume.
  • smasher 2008-09-24 11:27
    "moving" a file and "copying" a file are not the same. mv is hardly atomic, but if the file appears in the target directory, it's already been written to disk.
  • spenk 2008-09-24 11:27
    SoonerMatt:
    Marc:
    Rename?

    Yeah I was thinking that too. Rather than make it move a 3 gb file (which could fail in itself), I would start the transaction as a .tmp file then remove the .tmp when it's completed.


    The watcher appears to process *whatever* files are in that particular folder - simply altering the extension might not be enough.
  • Alan 2008-09-24 11:28
    If you are qualified for the job, of course the test will be easy. My company does basic testing for every applicant, and every new recruit slags off the test until they see the reject pile.
  • Gorfblot 2008-09-24 11:29
    A diamond in the rough is something that looks to be of little value, but is actually worth quite a lot once some polishing has been done. IE, You talked with one of the Tier 1 helpdesk people and realized they had some talent and were quick to learn. You get them some further training, move them to a junior development position, and in a short time they become a major contributor to success- That's a diamond in the rough.
  • snoofle 2008-09-24 11:33
    jpaull:
    Speaking of bad recruiters... I once dealt with a recruiter that kept trying to hook me up with positions I was not qualified for. I only had a couple of years' experience as a Developer, and on three different occasions he scheduled me for interviews for Senior positions, and then would be totally baffled when I would tell him that I bombed the interviews.

    I wonder sometimes whether or not these recruiters get paid by the number of interviews they schedule and not by the number of positions they actually fill.

    I've found that sometimes recruiters will hold back some critical piece of info (say, requires relocation across the country) when they tell you how great the job is, so that you'll go to an interview you'd otherwise pass up. This way it looks like they're providing qualified candidates to the company.

    It's a waste of your time, and if I find out about it before leaving the interview, I'll let the interviewer know that the headhunter wasted his time and mine, and perhaps they should use a different agent in the future. I come off honest and the loser headhunter gets what he deserves.
  • emurphy 2008-09-24 11:36
    Max:

    2) An interviewer who thinks downloads in progress are a problem but file copies in progress are not shows a lack of understanding.


    As noted, atomic renames (and I think moving files to another directory is atomic even under Windows, provided you stay within the same volume, right?) shouldn't have this problem.

    What if you can't modify the Downloader, either? Other than "replace one or both programs with something that can be modified", my first thought is to write a third program that also checks the remote site, and alters file permissions on the downloads-in-progress so the Watcher can't access them until the download is finished.

    Max:

    3) Interviews are two-way -- if the people interviewing you are clueless, the job will suck.


    They were clueful enough to reject unqualified candidates. And it's possible (though I wouldn't bet on it) that the recruiter isn't a WTF either, that the pool of prospects just really is that lousy in that area (on the theory that good prospects get hired quickly while bad ones keep trying).
  • imMute 2008-09-24 11:37
    Max:
    2) An interviewer who thinks downloads in progress are a problem but file copies in progress are not shows a lack of understanding.


    File copies would still be a problem, but file moves (also known as renames) are extremely quick as long as the src/dst are on the same volume: the FS only moves the inode data. Hell, even windows handles this as well as *nix.
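
    One way to see the "only the directory entry moves" claim for yourself, as a rough sketch: compare the inode number before and after a mv inside one filesystem. The scratch directory and names below are hypothetical.

```shell
#!/bin/sh
# Sketch: a move within one filesystem keeps the same inode,
# i.e. only the directory entry changes, not the file data.
# The scratch directory and names are hypothetical.
base=$(mktemp -d)
mkdir "$base/incoming" "$base/ready"
printf 'some data\n' > "$base/incoming/file"

# ls -i prints "<inode> <name>"; keep only the inode number.
before=$(ls -i "$base/incoming/file" | awk '{print $1}')
mv "$base/incoming/file" "$base/ready/file"
after=$(ls -i "$base/ready/file" | awk '{print $1}')

echo "inode before: $before, after: $after"
```

    If the two directories were on different filesystems, mv would fall back to copying and the inode numbers would generally differ.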
  • snoofle 2008-09-24 11:37
    Alan:
    “Oh,” he scoffed, “so, we’re equals now, is that it?”

    No. I'm not an asshole. Good day.

    *bows in awe*
  • ST 2008-09-24 11:39
    Thanks for the interview tales, this is one of my favourite sections. Mind you, I'm pretty shocked at how many of the resident professionals are trying to come up with alternative answers for the problem in the second tale. Obviously you use a temp filename (ignored by the watcher) or a temp directory. What kind of mindset comes up with anything else? Especially considering that the candidate is being prompted by the interviewer that he is looking for a simple solution. I don't think they lost any worthwhile talent by turning down that hire.
  • Aaron 2008-09-24 11:42
    If it's on Linux, the lsof command works quite well at telling you a file is still open, without having to modify the kernel or anything.

    lsof | grep filename || start whatever you wanted

  • David Emery 2008-09-24 11:46
    The "open exclusive" is the -right way*- to do this. If /tmp is located on a different file system/drive than the destination, then the file move operation is not guaranteed atomic. You can have the same problem on the disk copy as you have on the download (although the latency is less, it's certainly not zero.)

    dave

    (* I won't say "the only right way" but I might strongly imply it :-)
  • Someone You Know 2008-09-24 11:47
    Gorfblot:
    A diamond in the rough is something that looks to be of little value, but is actually worth quite a lot once some polishing has been done. IE, You talked with one of the Tier 1 helpdesk people and realized they had some talent and were quick to learn. You get them some further training, move them to a junior development position, and in a short time they become a major contributor to success- That's a diamond in the rough.


    I don't think "IE" means what you think it means.
  • James 2008-09-24 11:51
    David Emery:
    The "open exclusive" is the -right way*- to do this. If /tmp is located on a different file system/drive than the destination, then the file move operation is not guaranteed atomic. You can have the same problem on the disk copy as you have on the download (although the latency is less, it's certainly not zero.)

    dave

    (* I won't say "the only right way" but I might strongly imply it :-)


    a) since you're designing the fix, you can design it so the /tmp is on the same volume as the /final, so move is effectively guaranteed atomic

    b) I was going to suggest having Watcher poll the filesize and do its scan a fixed time after it stops changing. I think that can work, if you're able to be sure that Downloader is using a protocol with a spec'd timeout. Assuming, of course, that Watcher is interested in failed downloads as well as those that finish...
  • wds 2008-09-24 11:52
    imMute:
    File copies would still be a problem, but file moves (also known as renames) are extremely quick as long as the src/dst are on the same volume: the FS only moves the inode data. Hell, even windows handles this as well as *nix.

    All this assuming they're on the same partition right? So if they're not, you're in serious trouble. Considering how /tmp is the usual place to drop stuff in, and /tmp is often on another partition I don't see how this is an acceptable solution. Not to mention the problem with having allocated space in /var/program/blah but not in /tmp and thus running out of room to drop your multigig executables.

    See I'd just have used a lockfile.
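
    A rough sketch of what a lockfile convention could look like, using mkdir as the lock because it either creates the directory or fails. This assumes both programs can be changed to honour the lock, which the story rules out for the Watcher. The paths and the try_lock helper are made up.

```shell
#!/bin/sh
# Sketch of a lock-directory convention (mkdir is atomic: it either
# creates the directory or fails because it already exists).
# Assumption: both programs can be changed to honour the lock.
# Paths and the helper name try_lock are hypothetical.
dir=$(mktemp -d)
lock="$dir/download.lock"

try_lock() {
    mkdir "$lock" 2>/dev/null
}

# Downloader side: take the lock, then start writing the file.
try_lock && downloader_has_lock=yes
printf 'partial' > "$dir/file"

# Watcher side: the lock is held, so it must skip this round.
if try_lock; then watcher_first=processed; else watcher_first=skipped; fi

printf ' and complete\n' >> "$dir/file"
rmdir "$lock"                       # Downloader releases the lock

# Watcher side, next poll: the lock is free now.
if try_lock; then
    watcher_second=processed
    rmdir "$lock"
else
    watcher_second=skipped
fi

echo "first poll: $watcher_first, second poll: $watcher_second"
```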
  • sibtrag 2008-09-24 11:53
    I'd say that expecting move to be atomic is a rather poor assumption. You certainly open yourself up to problems if the volume structure changes. For instance, someone replaces the local directory with a symbolic link to a large scratch disk if the multi-GB downloads are filling the disk (or interfering with other activity on that drive).

    It may be more robust to download into another location & then create a symlink in the original directory.
  • Mike P 2008-09-24 11:53
    ST:
    Obviously you use a temp filename (ignored by the watcher) or a temp directory. What kind of mindset comes up with anything else?


    Well, my first thought was "Start the Downloader, wait for it to complete, then start the Watcher".

    Presumably the interviewer would have told me I couldn't do this, in which case the temp directory solution would have been my second answer.

    On Linux (mentioned by the interviewer) "mv" is perfectly atomic provided both source and destination are within the same filesystem.
  • ThomsonsPier 2008-09-24 11:55
    Someone You Know:
    Gorfblot:
    A diamond in the rough is something that looks to be of little value, but is actually worth quite a lot once some polishing has been done. IE, You talked with one of the Tier 1 helpdesk people and realized they had some talent and were quick to learn. You get them some further training, move them to a junior development position, and in a short time they become a major contributor to success- That's a diamond in the rough.


    I don't think "IE" means what you think it means.


    Internet Explorer? If that's the case, I make sure to pronounce it, 'AIEEEEEE!'
  • jwa 2008-09-24 11:55
    And your whole response is predicated upon the assumption that the term temporary file meant strictly /tmp. Which is not always the case, more likely it was "a file which is temporary" not "a file in the system temporary directory".
  • Charles Duffy 2008-09-24 12:01
    David Emery:
    The "open exclusive" is the -right way*- to do this. If /tmp is located on a different file system/drive than the destination, then the file move operation is not guaranteed atomic. You can have the same problem on the disk copy as you have on the download (although the latency is less, it's certainly not zero.)


    It's a move, not a copy. Moves within the same partition are atomic.

    Moreover, the tempfile-and-move approach (where, yes, you put your temporary file in the same directory or otherwise somewhere known to be on the same partition) is the generally accepted solution to this problem; if you don't know it, there's no way in hell I want you on my systems engineering team.

    Again, the solution the interviewer was looking for is the way every single UNIX developer with any kind of a clue does atomic file updates.
  • hatterson 2008-09-24 12:02
    crystal mephistopheles:

    “Oh, okay,” the candidate replied. He pondered for a full minute and said “so in that case, I would have the Watcher listen on a TCP/IP port, and have the Downloader tell it when it was done downloading.”

    “That seems like a lot of work,” I said....


    I don't think it's fair to say network I/O is "a lot of work". Granted, his temporary directory solution is even simpler, but most high level languages (.NET, Java, etc.) have fully defined classes you can use to implement this in only a few lines of code, and I would classify this as an acceptable solution.


    True, but this requires you to have access to both the downloader and the watcher. In the "ideal" solution you only need access to the downloader.
  • klenow 2008-09-24 12:02
    Is it just me, or does Jeremy sound like a bit of a jerk? He simply wouldn't let the guy use any solution that wasn't his pet solution. Seems the simplest solution is to have the Downloader activate the Watcher when it's done.
  • SomeCoder 2008-09-24 12:02
    Yeah, solution #2 is just great. Because we all know that every file system command is guaranteed to be atomic, right?

    mv may be CLOSE to atomic but it's definitely not guaranteed to always be atomic. And if we suddenly have to change directories across partitions then it damn well is NOT atomic.

    I think Jeremy H should get a better interview question.
  • Paul 2008-09-24 12:03
    re: File watcher.

    Ignoring the fact that several people have raised that files appear before they've been fully copied or moved!

    The interviewer shot themselves in the foot: halfway through the (reasonable) list of solutions given by the candidate, they said it was not possible to modify the downloader. However, their simple solution given at the end required a modification of the application they said couldn't be modified!
  • oppeto 2008-09-24 12:07
    ST:
    Thanks for the interview tales, this is one of my favourite sections. Mind you, I'm pretty shocked at how many of the resident professionals are trying to come up with alternative answers for the problem in the second tale. Obviously you use a temp filename (ignored by the watcher) or a temp directory. What kind of mindset comes up with anything else? Especially considering that the candidate is being prompted by the interviewer that he is looking for a simple solution. I don't think they lost any worthwhile talent by turning down that hire.

    Start with the assumption that you can modify the watcher, but not the downloader (exactly like here at my workplace, actually)... check the file size, wait a minute, check again, only process if the size is the same. Not foolproof, but "good enough" and doable in a small shell script.
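
    Something like the following would be one take on the "small shell script" version, as a sketch only. The is_stable helper and the 1-second interval are illustrative choices (the comment above suggests a minute), and a stalled download would look exactly like a finished one.

```shell
#!/bin/sh
# Sketch: treat a file as complete when its size has not changed
# across a wait interval. A heuristic, not a guarantee: a stalled
# download looks identical to a finished one.
# is_stable and the 1-second interval are illustrative choices.
is_stable() {
    file=$1
    interval=$2
    size1=$(wc -c < "$file" | tr -d ' ')
    sleep "$interval"
    size2=$(wc -c < "$file" | tr -d ' ')
    [ "$size1" -eq "$size2" ]
}

f=$(mktemp)
printf 'finished download\n' > "$f"   # stands in for a completed download

if is_stable "$f" 1; then
    result=ready
else
    result=growing
fi
echo "$result"
```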
  • Not Worthy 2008-09-24 12:13
    my first thought is to write a third program that also checks the remote site, and alters file permissions on the downloads-in-progress so the Watcher can't access them until the download is finished.


    THANK YOU! Seriously, the problem states that the Downloader downloads to a particular directory, and the official solution is "just change that directory". Even the official solution violates the terms of the problem.

    Seriously, run the watcher and downloader as different users. If the directory has the right permissions and the right umasks are in place, the watcher cannot even see the files which the downloader downloads until something (maybe a third process, but preferably the downloader) changes the permissions on the file, say to world-readable. _That's_ the right way to do it.
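
    A single-user sketch of the permissions idea, which can only show the mode bits changing. It assumes the Downloader and Watcher really do run as different users, and that the Watcher treats an unreadable file as "not there yet", neither of which the story confirms. Paths are hypothetical.

```shell
#!/bin/sh
# Sketch: keep a download unreadable to other users until it is done.
# Assumption: Downloader and Watcher run as different users, so the
# "other" permission bits decide what the Watcher may read.
# Paths are hypothetical; printf stands in for the download.
# (A real setup also needs the directory itself to be accessible
# to the Watcher's user; mktemp -d makes it owner-only.)
dir=$(mktemp -d)
umask 077                                  # new files: owner-only (rw-------)

printf 'downloading...' > "$dir/file"
mode_during=$(ls -l "$dir/file" | cut -c1-10)

printf ' done\n' >> "$dir/file"
chmod o+r "$dir/file"                      # publish: now world-readable
mode_after=$(ls -l "$dir/file" | cut -c1-10)

echo "during: $mode_during  after: $mode_after"
```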
  • hatterson 2008-09-24 12:15
    SomeCoder:
    Yeah, solution #2 is just great. Because we all know that every file system command is guaranteed to be atomic, right?

    mv may be CLOSE to atomic but it's definitely not guaranteed to always be atomic. And if we suddenly have to change directories across partitions then it damn well is NOT atomic.

    I think Jeremy H should get a better interview question.


    Actually I believe that if you stick within the same volume it is guaranteed to be atomic. Given that you can modify the downloader it's fairly simple to have the tmp location on the same volume as the final location
  • Branan 2008-09-24 12:15
    A file move is different from a file copy in Linux. A move involves changing one pointer in the filesystem, no information is actually "moved". So it's not actually a problem.
  • AMerrickanGirl 2008-09-24 12:19
    The first story about the interviewer going out to lunch reminded me of a somewhat similar experience.

    I was contacted by a recruiter who asserted that he had a couple of jobs that were perfect for me. But first, he insisted on meeting me in person.

    His office was 40 miles away, but I agreed to be there at 9 am.

    I arrived there at 9 am. He wasn't there yet. They put me in a small windowless room and asked me to wait.

    Twenty minutes later I wandered out to find someone and ask what was going on. They didn't know why Recruiter Guy wasn't in yet, but they sent out his cube mate, another recruiter, to talk to me. Problem was, she didn't have any of my information available to her so it was kind of a waste of time.

    Another twenty minutes went by and someone finally called Recruiter Guy. He was stuck in traffic, with a cell phone that worked perfectly, since they got through on the first try, but he hadn't bothered to call his office to tell them to make nice to the 9 am interview that he was going to be very late for.

    They had the nerve to ask me if I would wait another 40 minutes for him to arrive. I said no thanks and walked out. If he had made the effort to call his office and apologize to me for being late, I would have waited. But he could have cared less, so he wasn't getting any commission off my back.
  • Chris 2008-09-24 12:19
    I think the problem with the downloader/watcher question is actually in the wording of the question, which serves to misdirect the candidate.

    It is phrased thusly: "Every night, a Downloader program will... save them to a certain directory on disk. A Watcher program monitors this directory"

    The clear implication is that the Watcher watches the download directory. This would probably give the candidate a fairer shot:

    "Every night, a Downloader program will... save them to disk. A Watcher program monitors a certain directory."


  • persto 2008-09-24 12:20
    Paul:
    re: File watcher.

    Ignoring the fact that several people have raised that files appear before they've been fully copied or moved!
    For moves within a filesystem (renames), this *does* *not* happen. That's why it's called "rename", you're taking a file that *already* *exists*, and just pointing a new name at it. It is essentially a pointer assignment, "inode *file; ...; dir_ent.target = file".

    The interviewer shot themselves in the foot: halfway through the (reasonable) list of solutions given by the candidate, they said it was not possible to modify the downloader. However, their simple solution given at the end required a modification of the application they said couldn't be modified!

    Read again... they said not to modify the *watcher*.
  • mauhiz 2008-09-24 12:28
    I don't know what I am missing there, why not just wait for the Downloader to finish its job?

    a script would look like this :

    Downloader && Watcher

    I don't think having a daemon poll a directory is a good practice anyways...
  • fruey 2008-09-24 12:30
    AMerrickanGirl:
    The first story about the interviewer going out to lunch reminded me of a somewhat similar experience.

    I was contacted by a recruiter who asserted that he had a couple of jobs that were perfect for me. But first, he insisted on meeting me in person.

    His office was 40 miles away, but I agreed to be there at 9 am.

    I arrived there at 9 am. He wasn't there yet. They put me in a small windowless room and asked me to wait.

    Twenty minutes later I wandered out to find someone and ask what was going on. They didn't know why Recruiter Guy wasn't in yet, but they sent out his cube mate, another recruiter, to talk to me. Problem was, she didn't have any of my information available to her so it was kind of a waste of time.

    Another twenty minutes went by and someone finally called Recruiter Guy. He was stuck in traffic, with a cell phone that worked perfectly, since they got through on the first try, but he hadn't bothered to call his office to tell them to make nice to the 9 am interview that he was going to be very late for.

    They had the nerve to ask me if I would wait another 40 minutes for him to arrive. I said no thanks and walked out. If he had made the effort to call his office and apologize to me for being late, I would have waited. But he could have cared less, so he wasn't getting any commission off my back.


    I worked for a boss who was always late. He hated me from when I told him that being late showed an absolute disrespect for the value of other people's time.

    Funnily enough, he left soon afterwards...
  • Brandon 2008-09-24 12:37
    Alan:
    “Oh,” he scoffed, “so, we’re equals now, is that it?”

    No. I'm not an asshole. Good day.


    And then when he starts to come back with some witty remark you interrupt him with, "I said GOOD DAY!"
  • Marc 2008-09-24 12:39
    You could build a custom FPGA that intercepts packets on the network and copies them to flash memory. The hardware can use a serial interface to indicate when the download is complete to a third program, 'The Mounter', which mounts the flash disk to the location the Watcher is expecting.

    The hardware can have a pool of flash memory disk areas, one being written to from the network, one mounted. Each flash memory area would only hold one file at a time.

    Since the Watcher is always running, I'm assuming it uses some sort of event handling system. An operating system hook to the event which indicates the Watcher is done processing and is now watching could be used to tell 'The Mounter' when it's time to unmount a flash disk and mount the next one in the queue.
  • AnonymousInterviewer 2008-09-24 12:40
    The problem with the bad recruiter is hardly a joke for me. I have faced this on numerous instances where we were looking for candidates. I felt they were resorting to corrupt HR practices, like nepotism. Of course, I have never been able to gather any substantial evidence to file a formal complaint with HR.

    The only time I came close to nailing them down was when I got so frustrated with them that I posted a job on a local bulletin board, totally bypassing them. I got more responses from good local candidates in about 6 hours than I got from them in 2 weeks. Of course, there was no way I could proceed with this because my action (bypassing them) itself was in violation of HR policy.

    This proof was enough for me to quit the company. I don't think this practice has changed, though. BTW, I'm talking about a large corporation, here - a big name in software.
  • Someone You Know 2008-09-24 12:41
    ThomsonsPier:
    Someone You Know:
    Gorfblot:
    A diamond in the rough is something that looks to be of little value, but is actually worth quite a lot once some polishing has been done. IE, You talked with one of the Tier 1 helpdesk people and realized they had some talent and were quick to learn. You get them some further training, move them to a junior development position, and in a short time they become a major contributor to success- That's a diamond in the rough.


    I don't think "IE" means what you think it means.


    Internet Explorer? If that's the case, I make sure to pronounce it, 'AIEEEEEE!'


    Agreed. But using "i.e." when one means "e.g." is almost as annoying as IE.
  • Mover And Copier 2008-09-24 12:45
    SomeCoder:
    mv may be CLOSE to atomic but it's definitely not guaranteed to always be atomic. And if we suddenly have to change directories across partitions then it damn well is NOT atomic.

    I think Jeremy H should get a better interview question.
    We shouldn't get hung up on exactly what "atomic" means. Just focus on the question whether or not there is any chance the Watcher might process a file too soon.

    With the temp directory solution, the file has already been written completely to disk, and all the mv has to do is to create a new directory entry for it in the new directory. So what might happen if the mv is not atomic? The directory entry might not be complete, and the Watcher only sees half the file name and therefore can't find the file? It'll just have to try again later.

    Obviously, when setting up the solution you would have to make sure the temp directory is on the right file system. /var/tmp might indeed not cut it.
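
    A quick way to check the "right file system" condition, sketched under the assumption of GNU or BusyBox stat (the -c %d format; BSD and macOS stat use different flags). The same_fs helper and the directories are made up.

```shell
#!/bin/sh
# Sketch: check that the temp and final directories share a filesystem
# by comparing device IDs. Assumes GNU or BusyBox stat ("-c %d");
# BSD/macOS stat uses different flags. same_fs is a made-up helper.
same_fs() {
    [ "$(stat -c %d "$1")" = "$(stat -c %d "$2")" ]
}

base=$(mktemp -d)
mkdir "$base/tmp" "$base/ready"

if same_fs "$base/tmp" "$base/ready"; then
    echo "same filesystem: mv will be a rename"
else
    echo "different filesystems: mv will copy, then delete"
fi
```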

    The interview QUESTION is good enough. The problem might be that when I'm told I may not modify the Watcher, how the heck am I supposed to know that I might be allowed to modify the Downloader?
  • TopCod3r 2008-09-24 12:46
    We have 2 open positions on our team, due to high turnover, so I interview probably about 5 or 6 people a week and have gotten really good at giving technical interviews. It usually involves giving them a real problem from some code we have, and seeing if they solve it the right way, and then I explain to them how it should be done and make sure they agree.

    It is hard to find people who have the right mix of skills and personality. Some people realize halfway through my technical interview that they lack the required knowledge and simply cut it short and walk out of the room, I assume in embarrassment.
  • dcardani 2008-09-24 12:47
    klenow:
    Is it just me, or does Jeremy sound like a bit of a jerk? He simply wouldn't let the guy use any solution that wasn't his pet solution. Seems the simplest solution is to have the Downloader activate the Watcher when it's done.


    Yes, it is just you. Actually, I would say that he should rephrase the question, something like this maybe:

    Users are running on a version of Linux which they install and upgrade themselves. We have written a downloader application to download some important data for their business. They are using a 3rd party processing app which watches a specific directory for new downloads. As soon as the processing application sees a new file in the directory, it begins processing it. However, processing happens faster than downloading, and the watcher will produce an error if it processes an incomplete file. How can we modify the downloader to make sure this doesn't happen?

    Now it's clear that you can't modify the OS because you don't control it, and you can't modify the watcher because you don't control it. You've also made it clear that you want a solution which modifies the downloader program.

    In any event, I don't think Jeremy was being a jerk at all. He was just having trouble describing a real-world problem accurately. It's very common for inexperienced people to become flummoxed at real-world questions like these because they don't have the experience to know that these problems will arise. It just requires a little forethought when phrasing the question.
  • September ain't over yet 2008-09-24 12:49
    Every file system command? No.

    Rename? Yes, it's guaranteed to be atomic, by the relevant standards, at least in the way we care about.

    Seriously, it links the inode to the new directory, and unlinks it from the old one (link + unlink is another way to do rename, BTW). In no sane implementation can that result in seeing a partial file.

    And before you mention different filesystems, rename (and link) generally do not support that (they return errors), and, if they do support it, they are STILL required to be atomic.

    RTFS — http://www.unix.org/single_unix_specification/
  • uxor 2008-09-24 12:53
    What is the point of an async watcher/downloader when the files need to be synchronized? I'd just make the downloader start the watcher to process the file it finished downloading.

    Though if they were to keep the original design, with this being on Linux and the same assumptions as in the question:

    I'd have the downloader create a second file to mark download completion for each file downloaded. This avoids moving gigs of data...

    The second file would hold an MD5 read from the server and one computed on the downloaded data, plus other metadata about the file for verification purposes...
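
    A sketch of the marker-file idea, assuming the consumer can be taught to look for the marker (the story's Watcher cannot be modified). cksum, a POSIX CRC, is swapped in for MD5 here purely to stay portable; the .done suffix and file names are hypothetical.

```shell
#!/bin/sh
# Sketch: write a ".done" marker holding a checksum after the data
# file is complete; a consumer verifies before processing.
# cksum (POSIX CRC) stands in for MD5 here to stay portable.
# File names and the ".done" suffix are hypothetical.
dir=$(mktemp -d)
data="$dir/file.bin"

printf 'downloaded bytes\n' > "$data"          # stands in for the download
cksum < "$data" > "$data.done"                 # marker is written last

# Consumer side: process only if the marker exists and matches.
if [ -f "$data.done" ] && [ "$(cksum < "$data")" = "$(cat "$data.done")" ]; then
    verdict=verified
else
    verdict=not-ready
fi
echo "$verdict"
```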
  • E 2008-09-24 12:55
    Instead of mv, why not ln?
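
    For what it's worth, the ln variant might look like the sketch below: publish with a hard link, then remove the temporary name. Hard links only work within one filesystem, and the paths are made up.

```shell
#!/bin/sh
# Sketch: publish with a hard link, then remove the temporary name
# (the "link + unlink" way of doing a rename).
# Hard links only work within one filesystem; paths are hypothetical.
base=$(mktemp -d)
mkdir "$base/tmp" "$base/ready"
printf 'complete file\n' > "$base/tmp/file"    # stands in for the download

ln "$base/tmp/file" "$base/ready/file"     # second name, same inode
rm "$base/tmp/file"                        # drop the temporary name

cat "$base/ready/file"
```

    One difference from mv: plain ln refuses to replace an existing destination, which may or may not be what you want.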
  • gwenhwyfaer 2008-09-24 12:55
    I think I know what happened with Jeremy's interviewee, because I tripped over exactly the same thing. When Jeremy said "you can't change the Watcher", I, and I presume the interviewee, interpreted that as "you can't change the Watcher or the Downloader". Without pausing to check the constraint, the solution proposed is not unreasonable; without restating the constraints back to the interviewer, the wrong assumption will go unnoticed (and produce an apparently bizarre solution).

    So I don't think this is a WTF, just a misunderstanding. I hope this question wasn't the only reason the guy didn't get the job, because he's exactly the kind of person you'd want to have around when things really are that intractable.
  • September ain't over yet 2008-09-24 12:56
    #!/bin/sh
    
    # Assume that the watcher is already running, and is watching
    # for new files in /var/local/whatever/ready using inotify
    # or whatever

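    cd /var/local/whatever || exit 1
    mkdir -p tmp

    # Only publish the file if the download succeeded; a failed wget
    # would otherwise leave a partial file for the watcher to find.
    wget -O "tmp/$$" "http://www.whatever.com/file" &&
        mv -f "tmp/$$" "ready/$$"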


    So even if it was not possible to modify the downloader, this is STILL easy to do.
  • David 2008-09-24 12:56
    "margin: auto"
    Thanks for the tip.
  • fert 2008-09-24 12:58
    Everybody is worried about whether mv is atomic, but if you can't assume that you have access to an additional temporary directory on the same disk, couldn't you download to some temp directory you do have access to, and then toss a link into the directory being polled? Cleanup would be a bit more of an issue: the watcher would take care of the links, but the original file(s) might require a bit of trickery. But this would certainly guarantee atomic operation: download file - create link - .... - clean up file. Just random ideas.
  • Gorfblot 2008-09-24 12:59
    Someone You Know:

    I don't think "IE" means what you think it means.


    It's certainly possible. I think it means id est, and loosely translates into "That is to say".

    I used it between two attempts at explaining the metaphor- A literal one, and one where I attempted to show a situation where the description might be more valid.

    What do you think it means?
  • JamesQMurphy 2008-09-24 12:59
    TopCod3r:
    We have 2 open positions on our team, due to high turnover, so I interview probably about 5 or 6 people a week and have gotten really good at giving technical interviews. It usually involves giving them a real problem from some code we have, and seeing if they solve it the right way, and then I explain to them how it should be done and make sure they agree.

    It is hard to find people who have the right mix of skills and personality. Some people realize halfway through my technical interview that they lack the required knowledge and simply cut it short and walk out of the room, I assume in embarrassment.


    Have you ever asked why you have high turnover?
  • adsfg 2008-09-24 13:00
    Branan:
    A file move is different from a file copy in Linux. A move involves changing one pointer in the filesystem, no information is actually "moved". So it's not actually a problem.
    What if you move from one physical drive to another?
  • Rick 2008-09-24 13:07
    adsfg:
    Branan:
    A file move is different from a file copy in Linux. A move involves changing one pointer in the filesystem, no information is actually "moved". So it's not actually a problem.
    What if you move from one physical drive to another?

    Don't.

    Problem solved.

    I am totally amazed at how many 'over complicators' there are posting messages here. One would think they would be too busy over complicating things and not have any free time to post to TDWTF.
  • Jimmy Jones 2008-09-24 13:07
    Surely a better solution to the download problem is to add ".unfinished" to the file name then rename it when it's complete.

    This avoids the problem of some smartass putting the temporary folder on another disk and then you get expensive copy operations, run out of disk space when you try to move the file, etc.
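
    Something like the following Python sketch, assuming the Watcher ignores names ending in ".unfinished" (the problem statement does not say that it does) and that `stream` is any readable binary file-like object; `save_download` is a made-up name:

```python
import os
import shutil


def save_download(stream, final_path, chunk_size=1 << 20):
    """Write stream to final_path + '.unfinished', then rename into place."""
    partial = final_path + ".unfinished"
    with open(partial, "wb") as out:
        shutil.copyfileobj(stream, out, chunk_size)
        out.flush()
        os.fsync(out.fileno())       # push the data to disk before the rename
    os.replace(partial, final_path)  # same directory, so same filesystem
    return final_path
```

    Because the temporary name lives next to the final name, the rename can never turn into a cross-disk copy.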

  • RBoy 2008-09-24 13:11
    AMerrickanGirl:
    He was stuck in traffic, with a cell phone that worked perfectly, since they got through on the first try, but he hadn't bothered to call his office to tell them to make nice to the 9 am interview that he was going to be very late for.


    Ah, but what if his cell phone didn't work?
  • ChiefCrazyTalk 2008-09-24 13:14
    Azeroth:
    There is another simple solution with the downloader/watcher problem - downloader should open the file exclusively while it's being downloaded, this way watcher won't be able to access it until it's closed. This way it's not even required to move anything anywhere.

    Another solution I've seen - download a 0-byte semaphore AFTER the file is downloaded, and check for the presence of that file before starting your processing of the main file.
  • Pat 2008-09-24 13:18
    Gorfblot:
    Someone You Know:

    I don't think "IE" means what you think it means.


    It's certainly possible. I think it means id est, and loosely translates into "That is to say".

    I used it between two attempts at explaining the metaphor- A literal one, and one where I attempted to show a situation where the description might be more valid.

    What do you think it means?


    Well, I think it means nothing, because it was written as 'IE', not i.e.. That was kind of his point to begin with.
  • Reader X 2008-09-24 13:19
    ThomasP:
    I think you missed the part where they were talking about file moves in Linux, not Windows.


    That part isn't in the original problem statement. It's only included at the very end, where the interviewer says, "so you’re saying, to solve the problem of the Watcher processing files that are not done downloading, you would modify the Linux kernel?"

    Original Problem:
    Every night, a Downloader program will retrieve a handful of several-gigabyte files from a remote server and save them to a certainly directory on disk. A Watcher program monitors this directory and immediately processes whichever files show up. However, because downloading takes significantly longer than processing, the Watcher program will crash if it reads a file that has not been fully downloaded. How would you prevent this from occurring?

  • Spectre 2008-09-24 13:22
    So. Why did the interviewee for a PHP developer position get asked about CSS?
  • alegr 2008-09-24 13:22
    ThomasP:
    I think you missed the part where they were talking about file moves in Linux, not Windows.


    Move within the same filesystem ("rename") is atomic and doesn't involve copying/moving the file bits. In any reasonable OS, which Windows is, too, no matter how you would object. As soon as the file appears in the target directory, it's there instantly, and "Watcher" can read it without any problem.
  • adsfg 2008-09-24 13:23
    Rick:
    adsfg:
    Branan:
    A file move is different from a file copy in Linux. A move involves changing one pointer in the filesystem, no information is actually "moved". So it's not actually a problem.
    What if you move from one physical drive to another?

    Don't.

    Problem solved.

    I am totally amazed at how many 'over complicators' there are posting messages here. One would think they would be too busy over complicating things and not have any free time to post to TDWTF.
    Actually, I was just asking what Linux does in this case. I'm not asking in relation to the problem.
  • SoonerMatt 2008-09-24 13:24
    RBoy:
    AMerrickanGirl:
    He was stuck in traffic, with a cell phone that worked perfectly, since they got through on the first try, but he hadn't bothered to call his office to tell them to make nice to the 9 am interview that he was going to be very late for.


    Ah, but what if his cell phone didn't work?


    Seriously?!? They said the office called him and his cell phone worked perfectly.
  • Gorfblot 2008-09-24 13:31
    Pat:
    Well, I think it means nothing, because it was written as 'IE', not i.e.. That was kind of his point to begin with.


    Touché.
  • Rick 2008-09-24 13:32
    adsfg:
    Rick:
    adsfg:
    Branan:
    A file move is different from a file copy in Linux. A move involves changing one pointer in the filesystem, no information is actually "moved". So it's not actually a problem.
    What if you move from one physical drive to another?

    Don't.

    Problem solved.

    I am totally amazed at how many 'over complicators' there are posting messages here. One would think they would be too busy over complicating things and not have any free time to post to TDWTF.
    Actually, I was just asking what Linux does in this case. I'm not asking in relation to the problem.


    I was referring to a myriad of over complicators posting. However, if you are asking for general knowledge, rather than to answer the interview question properly...

    Physical drives are irrelevant in Linux. Moves across file systems are not atomic, but are implemented as copy and remove. Traditionally file systems did not span physical drives, but today they can in various ways.
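
    If you want to catch the cross-filesystem case up front, here is a rough Python sketch that compares device IDs before renaming. Both function names are hypothetical, and the `st_dev` comparison is only a heuristic (bind mounts, for example, can share a device ID and still refuse the rename), so `os.rename` remains the final word:

```python
import os


def same_filesystem(path_a, path_b):
    """True if both existing paths report the same device ID."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def move_atomically(src, dst_dir):
    """Rename src into dst_dir, refusing to fall back to copy + remove."""
    if not same_filesystem(src, dst_dir):
        raise OSError("cross-filesystem move would not be atomic")
    dst = os.path.join(dst_dir, os.path.basename(src))
    os.rename(src, dst)   # raises OSError itself if the kernel disagrees
    return dst
```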
  • duplicity 2008-09-24 13:40
    No, don't make it blue! It's just our favorite troll, TopCod3r.
  • hatterson 2008-09-24 13:41
    mauhiz:
    I don't know what I am missing there, why not just wait for the Downloader to finish its job?

    a script would look like this :

    Downloader && Watcher

    I don't think having a daemon poll a directory is a good practice anyways...


    A situation similar to this comes up at my office on a daily basis. We virtually never have control over both the watcher and the downloader. Often they don't even run on the same machine and simply interact through a shared folder or something similar.

    Saying "just have a script wait till the downloader finishes to start the watcher" assumes you have complete control over the system, which may or may not be true. The solution of a tmp name/directory (or something like a poll timer checking for file size if you're the watcher) is the most universally accepted, as it requires only modifying and/or controlling one of the applications.
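
    For the watcher-side variant, a minimal Python sketch of the "poll the file size" idea. The function name and default values are assumptions, and this is only a heuristic: a stalled download looks exactly like a finished one.

```python
import os
import time


def wait_until_stable(path, interval=1.0, checks=3):
    """Block until the file size is unchanged for `checks` consecutive polls."""
    last_size = -1
    stable = 0
    while stable < checks:
        size = os.path.getsize(path)
        if size == last_size:
            stable += 1          # same size as the previous poll
        else:
            stable = 0           # still growing; start counting again
            last_size = size
        time.sleep(interval)
    return last_size
```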
  • fortyrunner 2008-09-24 13:42
    Another way to process large files is to ensure that the processor does not start processing a file until it sees a small semaphore file.

    E.g. a 100MB Movie.MOV file won't be processed until a 1-byte Movie.MOV.GO file is in the same directory. I've been using this for years.
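
    On the processing side, the check could look roughly like this Python sketch. The ".GO" suffix comes from the example above; the function name is made up for illustration:

```python
import os


def ready_files(directory, flag_suffix=".GO"):
    """Return data files whose companion flag file is present."""
    names = set(os.listdir(directory))
    ready = []
    for name in sorted(names):
        if name.endswith(flag_suffix):
            data_name = name[: -len(flag_suffix)]
            if data_name in names:   # only report data files that exist
                ready.append(os.path.join(directory, data_name))
    return ready
```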

  • Chris 2008-09-24 13:44
    Even more fun, ask for a SECOND simple solution.

    The temporary directory is a good solution, but it has problems. E.g., in Unix you can rename atomically, but I can imagine situations where you would have to manually read/write the entire file. Is there another standard approach?

    It turns out there is --- and it's actually more correct in some ways. Have the downloader get a 'write' lock on the file. Have the watcher get a 'read' lock on the file. (Or its own 'write' lock if it deletes the file as the final step in the process.) You're fine as long as everyone uses locks and the watcher program is smart enough to keep retrying to get a lock. (I assume it's not so stupid that it will wait indefinitely for that lock.)
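
    A minimal sketch of that locking scheme in Python, using advisory `flock` locks from the standard `fcntl` module (Unix only). The helper names are hypothetical, and it only works if every program involved actually takes the lock:

```python
import fcntl


def try_lock(fileobj, exclusive):
    """Try to take an advisory flock without blocking; True on success."""
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        fcntl.flock(fileobj, mode | fcntl.LOCK_NB)
        return True
    except (BlockingIOError, PermissionError):
        return False             # someone else holds a conflicting lock


def unlock(fileobj):
    """Release a lock taken with try_lock."""
    fcntl.flock(fileobj, fcntl.LOCK_UN)
```

    The downloader would hold the exclusive lock for the whole transfer; the watcher would retry `try_lock(f, exclusive=False)` until it succeeds.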
  • KM 2008-09-24 13:46
    Pat:
    Gorfblot:
    Someone You Know:

    I don't think "IE" means what you think it means.


    It's certainly possible. I think it means id est, and loosely translates into "That is to say".

    I used it between two attempts at explaining the metaphor- A literal one, and one where I attempted to show a situation where the description might be more valid.

    What do you think it means?


    Well, I think it means nothing, because it was written as 'IE', not i.e.. That was kind of his point to begin with.


    I'm pretty sure the actual point was that i.e. != e.g.
    http://www.wsu.edu/~brians/errors/e.g.html
  • Steve Burnap 2008-09-24 13:47
    The problem statement said that the downloader downloads multiple files. Presumably you'd want the watcher to process file 1 while the downloader was downloading file 2.
  • DeLos 2008-09-24 13:50
    KM:
    Pat:
    Gorfblot:
    Someone You Know:

    I don't think "IE" means what you think it means.


    It's certainly possible. I think it means id est, and loosely translates into "That is to say".

    I used it between two attempts at explaining the metaphor- A literal one, and one where I attempted to show a situation where the description might be more valid.

    What do you think it means?


    Well, I think it means nothing, because it was written as 'IE', not i.e.. That was kind of his point to begin with.


    I'm pretty sure the actual point was that i.e. != e.g.
    http://www.wsu.edu/~brians/errors/e.g.html


    I still think e.g. was what was needed here. I believe e.g. would mean "here is one of many examples".

    **EDIT - adding a link that shows my point http://ancienthistory.about.com/od/abbreviations/f/ievseg.htm
  • LEGO 2008-09-24 13:54
    Someone You Know:
    Gorfblot:
    A diamond in the rough is something that looks to be of little value, but is actually worth quite a lot once some polishing has been done. IE, You talked with one of the Tier 1 helpdesk people and realized they had some talent and were quick to learn. You get them some further training, move them to a junior development position, and in a short time they become a major contributor to success- That's a diamond in the rough.


    I don't think "IE" means what you think it means.


    No, I think the usage is correct here IE = "Id Est" to denote clarification or further explanation.

    captcha: dolor. ie lorem ipsum dolor est...
  • Mizchief 2008-09-24 13:56
    JamesQMurphy:
    TopCod3r:
    We have 2 open positions on our team, due to high turnover, so I interview probably about 5 or 6 people a week and have gotten really good at giving technical interviews. It usually involves giving them a real problem from some code we have, and seeing if they solve it the right way, and then I explain to them how it should be done and make sure they agree.

    It is hard to find people who have the right mix of skills and personality. Some people realize halfway through my technical interview that they lack the required knowledge and simply cut it short and walk out of the room, I assume in embarrassment.


    Have you ever asked why you have high turnover?


    Yeah, I'm guessing that you are the problem. People don't just walk out of interviews out of technical embarrassment; if they don't know something, they usually say something like "If I had Google in front of me I could look it up in 2 seconds!" They walk out in situations where they find the interview to be asinine, or decide the company sucks considering the events that led up to that point.
  • Max 2008-09-24 13:57
    So the real WTF is me for misreading move vs copy...

    Or everyone else for assuming that your average developer has control over where the SAN administrator chooses to store various file paths?

    Or are you all assuming that this is a small/medium shop where a developer also controls the infrastructure? That just isn't true in enterprise organizations (at least, not all of them).

    I admit my original comment was short-sighted, but come on people... if you need a caveat on your response, then your response is flawed, too.

  • Aaron 2008-09-24 13:57
    David Emery:
    The "open exclusive" is the -right way*- to do this.

    As always, TRWTF is in the comments. You're like the 10th person now to suggest something involving locking or permissions. If the Watcher program is going to crash if it tries to process an incomplete file, what the hell makes you think it won't crash when it tries to open a file that's locked or privileged? You're making all sorts of unstated assumptions there that just aren't necessary because there's a much simpler solution that works just as well.

    I think these interview questions must hit a raw nerve with programmers who know they would have failed them. The level of both ignorance and hostility in the comments is phenomenal.
  • duplicity 2008-09-24 13:58
    Mizchief:
    JamesQMurphy:
    TopCod3r:
    We have 2 open positions on our team, due to high turnover, so I interview probably about 5 or 6 people a week and have gotten really good at giving technical interviews. It usually involves giving them a real problem from some code we have, and seeing if they solve it the right way, and then I explain to them how it should be done and make sure they agree.

    It is hard to find people who have the right mix of skills and personality. Some people realize halfway through my technical interview that they lack the required knowledge and simply cut it short and walk out of the room, I assume in embarrassment.


    Have you ever asked why you have high turnover?


    Yeah, I'm guessing that you are the problem. People don't just walk out of interviews out of technical embarrassment; if they don't know something, they usually say something like "If I had Google in front of me I could look it up in 2 seconds!" They walk out in situations where they find the interview to be asinine, or decide the company sucks considering the events that led up to that point.


    Don't feed the trolls
  • wee 2008-09-24 14:01
    I've got a "downloader/watcher" set of apps I wrote. There are a few ways to solve the problem. In fact, moving files around would be my last choice.

    My watcher and downloader are in the same app. Each thread knows what to pass along so that the files can be processed correctly. An old version had two separate apps, and used temporary file names. It was much slower, though.

  • yellowstuff 2008-09-24 14:04
    Recruiters are not always your friend. He may have been sending you just to get some information about the position, or to make the next guy look better.
  • yellowstuff 2008-09-24 14:08
    No. "Id est" means "that is", or "in other words." It does not mean "for example." http://en.wikipedia.org/wiki/Inter_alia#I
  • Gerrit 2008-09-24 14:08
    Aaron:
    If its in linux, the lsof command works quite well at telling you a file is still open, without having to modify the kernel or anything.

    lsof | grep filename || start whatever you wanted



    Try inotifywait, it can report file names when they are closed. But I would still go for the temporary location. Suppose the download gets interrupted and is restarted. That could lead to incomplete files being processed.
  • Bush 2008-09-24 14:08
    Spectre:
    So. Why did the interviewee for a PHP developer position get asked about CSS?


    umm... because web development involves css? Even if you have a dedicated design team that gives you the static html, you will still likely need to modify it as you build out the site, and that requires knowing html and css.
  • Mizchief 2008-09-24 14:10
    It would really depend on what the "Watcher" was "Processing". If we are talking about multi-GB files, I would consider coding the "Watcher" so that it could start processing before you had to download the entire file and/or move it to its proper directory. And what exactly is this data and how is it used? The "simpler" solution may be to set up a one-method web service to access the bits of data as needed by the user vs. constantly keeping two data sets in sync and eating up bandwidth.

    I suppose attempting to solve the functional problem and not a technical problem is what separates Engineers from Programmers (the men from the boys).

    If I were to ask a question like this in an interview, I would expect to hear several questions from the candidate regarding what the goal of this feature was and what his restrictions were, but hopefully modifying the kernel is assumed to be outside his domain of control.

  • Kozz 2008-09-24 14:12
    JQM:
    Please do keep up. TopCod3r really is tdwtf's resident troll. ;) Just see other posts by him and you may detect a pattern.
  • Schnapple 2008-09-24 14:12
    AMerrickanGirl:
    The first story about the interviewer going out to lunch reminded me of a somewhat similar experience.

    I was contacted by a recruiter who asserted that he had a couple of jobs that were perfect for me. But first, he insisted on meeting me in person.

    His office was 40 miles away, but I agreed to be there at 9 am.

    I arrived there at 9 am. He wasn't there yet. They put me in a small windowless room and asked me to wait.

    Twenty minutes later I wandered out to find someone and ask what was going on. They didn't know why Recruiter Guy wasn't in yet, but they sent out his cube mate, another recruiter, to talk to me. Problem was, she didn't have any of my information available to her so it was kind of a waste of time.

    Another twenty minutes went by and someone finally called Recruiter Guy. He was stuck in traffic, with a cell phone that worked perfectly, since they got through on the first try, but he hadn't bothered to call his office to tell them to make nice to the 9 am interview that he was going to be very late for.

    They had the nerve to ask me if I would wait another 40 minutes for him to arrive. I said no thanks and walked out. If he had made the effort to call his office and apologize to me for being late, I would have waited. But he could have cared less, so he wasn't getting any commission off my back.


    I had a similar thing happen wherein the guy wanted me to come meet him, but the first opportunity I had was after I got done with work on Friday (was looking for another job but I wasn't about to endanger my current one). So there I was driving over there after 5:30 PM on a Friday, to go somewhere that takes at least an hour to drive to in normal traffic.

    Recruiter calls me when I'm halfway there and offers to meet on Monday instead. I said I'm halfway there and we can just go ahead and do this today.

    When I finally get there, the recruiter is gone. He said f*** this and left - apparently I was crimping his Friday night plans (in all fairness meeting on Friday after work was his idea). I was pissed since I had driven this whole way for nothing.

    On the upside they did have someone else to talk to me - a blonde former cheerleader who had been in the workforce all of three weeks and apparently had not had the "don't dress provocatively" speech with her boss yet.
  • PG 2008-09-24 14:13
    Chris:

    It turns out there is --- and it's actually more correct in some ways. Have the downloader get a 'write' lock on the file. Have the watcher get a 'read' lock on the file. (Or its own 'write' lock if it deletes the file as the final step in the process.) You're fine as long as everyone uses locks and the watcher program is smart enough to keep retrying to get a lock. (I assume it's not so stupid that it will wait indefinitely for that lock.)


    BUZZZZZ

    Thanks for playing. You can't change the Watcher, and it needs to take out a lock, and *NIX doesn't do automatic locking as part of the filesystem.
  • Franz_Kafka 2008-09-24 14:19
    Azeroth:
    There is another simple solution with the downloader/watcher problem - downloader should open the file exclusively while it's being downloaded, this way watcher won't be able to access it until it's closed. This way it's not even required to move anything anywhere.


    What if the downloader crashes or the network goes away?

    anyway, the solution I've seen work (and work well) is this:

    download file.
    when done, create file.DOWNLOADED
    file muncher notices a file ending in .DOWNLOADED, creates a .PROCESSING file in the same manner and writes its pid to it.
    when file muncher finishes, write file.DONE, delete file.PROCESSING

    you can fill in the error checks pretty easily, and using ls will tell you what's going on.
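
    Sketched in Python, the steps above might look like this. The function names are made up, and creating the .PROCESSING file with O_EXCL (so two munchers cannot both claim a file) is an addition for the sketch, not part of the scheme as described:

```python
import os


def mark_downloaded(path):
    """Downloader side: signal that path is complete."""
    open(path + ".DOWNLOADED", "w").close()


def claim(path):
    """Muncher side: claim a downloaded file; False if already claimed."""
    try:
        # O_EXCL makes creation fail if another muncher got here first.
        fd = os.open(path + ".PROCESSING",
                     os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as fh:
        fh.write(str(os.getpid()))   # record who is working on it
    return True


def mark_done(path):
    """Muncher side: record completion, then drop the claim."""
    open(path + ".DONE", "w").close()
    os.remove(path + ".PROCESSING")
```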
  • savar 2008-09-24 14:20
    imMute:


    File copies would still be a problem, but file moves (also known as renames) are extremely quick as long as the src/dst are on the same volume: the FS only moves the inode data. Hell, even windows handles this as well as *nix.


    But the way Linux works, the entire filesystem is represented as being contiguous, even when the physical storage isn't.

    All it would take is one "clever" sysadmin to put the temp directory on a separate partition and all of a sudden you've reintroduced the same race condition you had before -- except this time you're not even aware of it.

    The best idea is to use some sort of locking. Either using built-in operating system locking or just something as simple as dropping a lock file in the directory that both the Downloader and Watcher respect.

    Yes, I realize the Watcher isn't supposed to be modified... but stating that you can't modify the Watcher does make this an absurd scenario to begin with.
  • m&m 2008-09-24 14:21
    "...remote server and save them to a certainly directory on disk. A Watcher program monitors this directory..."

    and then:

    “What about if the Downloader just wrote files to a temporary directory, and then moved the file to the appropriate directory when the download was complete.”

    I think the first door he closed was that leading to his 'brilliant' solution...
    And as always, if you know how the conjurer does his tricks, they are so easy you could have thought of them yourself.
  • David Emery 2008-09-24 14:21
    Charles Duffy:
    David Emery:
    The "open exclusive" is the -right way*- to do this. If /tmp is located on a different file system/drive than the destination, then the file move operation is not guaranteed atomic. You can have the same problem on the disk copy than you have on the download (although the latency is less, it's certainly not zero.)


    It's a move, not a copy. Moves within the same partition are atomic.

    Moreover, the tempfile-and-move approach (where, yes, you put your temporary file in the same directory or otherwise somewhere known to be on the same partition) is the generally accepted solution to this problem; if you don't know it, there's no way in hell I want you on my systems engineering team.

    Again, the problem the interviewer was looking for is the way every single UNIX developer with any kind of a clue does atomic file updates.


    Read what I and others have pointed out. mv is atomic If-and-ONLY-IF you're on the same partition.

    Before we got O_EXCL or advisory locking in Unix, the common trick was to use the fact that the kernel would execute a hardlink (ln) atomically to create a lockfile, e.g. ln filename.lock. The application would -poll for the absence- of filename.lock to determine that the file was unlocked and that the download was completed.
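
    A rough Python sketch of that hard-link lockfile trick. The function names and the per-process temporary name are assumptions for the example; the point is that `link` either creates the lock name or fails, with nothing in between:

```python
import os


def acquire_lock(target):
    """Try to create target + '.lock' via link(2); True if we got it."""
    lock_path = target + ".lock"
    unique = "%s.%d.tmp" % (lock_path, os.getpid())
    open(unique, "w").close()        # private file to link from
    try:
        os.link(unique, lock_path)   # fails if the lock already exists
        return True
    except FileExistsError:
        return False
    finally:
        os.unlink(unique)            # the extra name is no longer needed


def release_lock(target):
    """Remove the lock so pollers see the file as unlocked."""
    os.unlink(target + ".lock")
```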

    The advantage of opening a file either exclusive or using advisory locking, is that it doesn't have to be a polling operation. It can be a blocking operation where the block/unblock is all managed by the OS (kernel and supporting library.)

    If you don't understand why polling in this situation is A Bad Thing, I don't want to be on your systems engineering team.*

    dave

    *sorry, but the original posting was (a) a bit insulting because it was (b) a bit uninformed and (c) failed to RTFPosting; and (d) I'm feeling cranky today.
  • Smash King 2008-09-24 14:23
    wds:
    See I'd just have used a lockfile.
    Have you by any chance designed MS-Access?
  • Smash King 2008-09-24 14:28
    klenow:
    Is it just me, or does Jeremy sound like a bit of a jerk? He simply wouldn't let the guy use any solution that wasn't his pet solution. Seems the simplest solution is to have the Downloader activate the Watcher when it's done.
    You cross a dimensional portal and get stuck in a world where interviewers are unable to use adaptive interviews "IE" those that change to turn whatever answer the interviewee provided automatically wrong. Now what do you do?
  • Franz_Kafka 2008-09-24 14:32
    savar:
    imMute:


    File copies would still be a problem, but file moves (also known as renames) are extremely quick as long as the src/dst are on the same volume: the FS only moves the inode data. Hell, even windows handles this as well as *nix.


    But the way Linux works, the entire filesystem is represented as being contiguous, even when the physical storage isn't.

    All it would take is one "clever" sysadmin to put the temp directory on a separate partition and all of a sudden you've reintroduced the same race condition you had before -- except this time you're not even aware of it.

    The best idea is to use some sort of locking. Either using built-in operating system locking or just something as simple as dropping a lock file in the directory that both the Downloader and Watcher respect.

    Yes, I realize the Watcher isn't supposed to be modified... but stating that you can't modify the Watcher does make this an absurd scenario to begin with.


    if the two directories are in the same filesystem, it doesn't matter if they're on different disks. In fact, I can guarantee that they don't. Also, you have to assume that the system (the part you control) is correctly configured, so the file is downloaded to a staging dir, then moved. Allowing for malicious config makes the problem impossible.

    I already pounded your locking scheme, so never mind about that. Anyway, nfs locking isn't reliable.
  • Franz_Kafka 2008-09-24 14:34
    David Emery:


    The advantage of opening a file either exclusive or using advisory locking, is that it doesn't have to be a polling operation. It can be a blocking operation where the block/unblock is all managed by the OS (kernel and supporting library.)

    If you don't understand why polling in this situation is A Bad Thing, I don't want to be on your systems engineering team.*

    dave

    *sorry, but the original posting was (a) a bit insulting because it was (b) a bit uninformed and (c) failed to RTFPosting; and (d) I'm feeling cranky today.


    Polling isn't a bad thing, depending on the interval - poll once per minute (files are downloaded a few times per day) and the fast fail case makes for almost no system load. Also, it's simple and predictable.
  • akatherder 2008-09-24 14:35
    Mover And Copier:
    The problem might be that when I'm told I may not modify the Watcher, how the heck am I supposed to know that I might be allowed to modify the Downloader?


    So you're given a problem with 2 entities. The person who gives you the problem says "Solve this problem without changing entity #1." You assume that you can't do anything to entity #2?

    The only rational thing I can assume is that changing entity #2 is the ONLY way to solve it or else this is a bullshit trick question.
  • Nicolas Verhaeghe 2008-09-24 14:43
    Yes, I also passed many interviews with flying colors, after answering very basic questions.

    This comes from the fact that our industry attracts many wannabes who just don't have what it takes, or just spent years slacking in college and managed somehow to pass their exams by cramming but have little understanding of what it is we IT guys do.

    I even told an interviewer once that, really, these questions were very simple, to which he agreed and said that I'd be surprised at how bad most applicants are.

    A piece of advice I always give is this: when you do not know, say you do not know; do not make anything up. Better: say that you do not know *YET*.
  • brodie 2008-09-24 14:45
    Someone You Know:
    Gorfblot:
    A diamond in the rough is something that looks to be of little value, but is actually worth quite a lot once some polishing has been done. IE, You talked with one of the Tier 1 helpdesk people and realized they had some talent and were quick to learn. You get them some further training, move them to a junior development position, and in a short time they become a major contributor to success- That's a diamond in the rough.


    I don't think "IE" means what you think it means.

    It looks like he's using it as the abbreviation for "id est," which essentially means "in other words..." Which is perfectly fine, as used.

    You use e.g. when you want to give specific examples, and i.e. when you want to elaborate on something.
  • Visual 2008-09-24 14:51
    "“What about if the Downloader just wrote files to a temporary directory, and then moved the file to the appropriate directory when the download was complete.”"

    No experience with coding on NIX but that could still fault on windows.

    If you write a service/application that monitors a folder for filewrites you might try to process the file before it's fully copied there. Thread a loop for exclusive access on the file to prevent locks/errors.
  • Nicolas Verhaeghe 2008-09-24 14:51
    Smash King:
    klenow:
    Is it just me, or does Jeremy sound like a bit of a jerk? He simply wouldn't let the guy use any solution that wasn't his pet solution. Seems the simplest solution is to have the Downloader activate the Watcher when it's done.
    You cross a dimensional portal and get stuck in a world where interviewers are unable to use adaptive interviews "IE" those that change to turn whatever answer the interviewee provided automatically wrong. Now what do you do?


    I too could find FIVE different solutions, just as simple

    1-The watcher is activated by the downloader
    2-The watcher and the downloader are one and the same application
    3-While being downloaded, the file is named "tmp_[filename]" and then renamed "[filename]" and the watcher knows not to open files starting with "tmp_"
    4-Downloader is scheduled to work at the top of the hour, Watcher scheduled to work at the bottom of the hour, and if you have a T1, what kind of a file would take more than 30 minutes to download?!?
    5-Watcher is coded properly with TRY/CATCH clauses and you go from there...
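
    Option 3 might look something like the following minimal Python sketch. The "tmp_" prefix comes from the list above; the function names are made up for illustration, and it assumes the watcher can be taught to skip prefixed files.

```python
import os
import tempfile

TMP_PREFIX = "tmp_"  # marker for in-progress downloads, as in option 3


def download(directory, filename, chunks):
    """Write chunks under a temporary name, then rename into place (hypothetical helper)."""
    tmp_path = os.path.join(directory, TMP_PREFIX + filename)
    final_path = os.path.join(directory, filename)
    with open(tmp_path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
    # Both names live in the same directory, so the rename stays on one filesystem.
    os.replace(tmp_path, final_path)
    return final_path


def ready_files(directory):
    """Names the watcher may open: everything without the tmp_ prefix."""
    return sorted(n for n in os.listdir(directory) if not n.startswith(TMP_PREFIX))
```

    Because the temporary and final names share a directory, this sidesteps the "temp folder on another disk" objection, at the cost of changing the watcher.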

    Why would you have a "watcher" spend its time hogging processor time and memory 24/7 anyway?

    In my world, my SSIS packages fire on schedule and do their tasks sequentially.

    If the interviewer does not accept any solution other than his, this means only one thing: control freak. Take your stuff and say bye bye.
  • blindio 2008-09-24 14:51
    Seems to me that this is a 3rd party application (the watcher) as it's not something we can change, yet it crashes on a pretty obvious scenario. I think the solution is to contact the vendor and tell them to fix their buggy POS software and send me a patch.
  • Franz_Kafka 2008-09-24 14:53
    Visual:
    "“What about if the Downloader just wrote files to a temporary directory, and then moved the file to the appropriate directory when the download was complete.”"

    No experience with coding on *NIX, but that could still fail on Windows.

    If you write a service/application that monitors a folder for file writes, you might try to process the file before it's fully copied there.


    That's why you move the file.
  • Jules Winnfield 2008-09-24 14:55
    brodie:
    Someone You Know:
    Gorfblot:
    A diamond in the rough is something that looks to be of little value, but is actually worth quite a lot once some polishing has been done. IE, You talked with one of the Tier 1 helpdesk people and realized they had some talent and were quick to learn. You get them some further training, move them to a junior development position, and in a short time they become a major contributor to success- That's a diamond in the rough.


    I don't think "IE" means what you think it means.

    It looks like he's using it as the abbreviation for "id est," which essentially means "in other words..." Which is perfectly fine, as used.

    You use e.g. when you want to give specific examples, and i.e. when you want to elaborate on something.
    I use IE when I want to install malware.
  • Gorfblot 2008-09-24 14:56
    KM:

    I'm pretty sure the actual point was that i.e. != e.g.
    http://www.wsu.edu/~brians/errors/e.g.html


    That is to say, you talked with one of the Tier 1 helpdesk people and realized they had some talent and were quick to learn…

    Naw, still sounds allright too me. Wereas if you replaced teh "That is" width "For Example", it'd sound awkward w/o some miner changes- at very least put the hole thing in present tense.

    Its ok, I can handle bean rong. Guess Im a diamond in the rogue.
  • Franz_Kafka 2008-09-24 14:56
    Nicolas Verhaeghe:
    Smash King:
    klenow:
    Is it just me, or does Jeremy sound like a bit of a jerk? He simply wouldn't let the guy use any solution that wasn't his pet solution. Seems the simplest solution is to have the Downloader activate the Watcher when it's done.
    You cross a dimensional portal and get stuck in a world where interviewers are unable to use adaptive interviews "IE" those that change to turn whatever answer the interviewee provided automatically wrong. Now what do you do?


    I too could find FIVE different solutions, just as simple

    1-The watcher is activated by the downloader
    2-The watcher and the downloader are one and the same application
    3-While being downloaded, the file is named "tmp_[filename]" and then renamed "[filename]" and the watcher knows not to open files starting with "tmp_"
    4-Downloader is scheduled to work at the top of the hour, Watcher scheduled to work at the bottom of the hour, and if you have a T1, what kind of a file would take more than 30 minutes to download?!?
    5-Watcher is coded properly with TRY/CATCH clauses and you go from there...



    4: no locking - that's insane.
    5: try/catch doesn't work when the process dies


    Why would you have a "watcher" spend its time hogging processor time and memory 24/7 anyway?


    sitting around watching a dir takes no appreciable cpu, and likely not much memory either.
  • allo 2008-09-24 14:57
    The best solution would not need a change such as a temporary directory. The Watcher would just wait for two files. Then it processes one and waits until the Downloader begins the next. When the Downloader stops, the Watcher would process the last file.
  • campkev 2008-09-24 14:58
    brodie:
    Someone You Know:
    Gorfblot:
    A diamond in the rough is something that looks to be of little value, but is actually worth quite a lot once some polishing has been done. IE, You talked with one of the Tier 1 helpdesk people and realized they had some talent and were quick to learn. You get them some further training, move them to a junior development position, and in a short time they become a major contributor to success- That's a diamond in the rough.


    I don't think "IE" means what you think it means.

    It looks like he's using it as the abbreviation for "id est," which essentially means "in other words..." Which is perfectly fine, as used.

    You use e.g. when you want to give specific examples, and i.e. when you want to elaborate on something.


    A specific example like "You talked with one of the Tier 1 helpdesk people and realized they had some talent and were quick to learn. You get them some further training, move them to a junior development position, and in a short time they become a major contributor to success?"

    I.e. was wrong here. E.g. would have been the correct choice. Deal with it.
  • Smash King 2008-09-24 14:59
    Rick:
    adsfg:
    Branan:
    A file move is different from a file copy in Linux. A move involves changing one pointer in the filesystem, no information is actually "moved". So it's not actually a problem.
    What if you move from one physical drive to another?

    Don't.

    Problem solved.
    Or if it's not an option:
    - Download the file to \\drive1\somefolder
    - Upon completion, have the downloader move the file to \\drive2\tempfolder
    - Only then move the file to the destination folder

    The first move is not atomic but it doesn't matter because the watcher watches over the other directory. Problem solved anyway.
  • James R. Twine 2008-09-24 14:59
    Ok - so the majority here believe that mv is an atomic operation if the src and dst are on the same filesystem (or partition?)...

    Does this hold true for ALL filesystems running under Linux/Unix? It might be that the directory is not running on ext2/ext3... What if it was a mounted FAT32 partition, or ReiserFS, or UFS, or an SMB share?

    That is the problem with questions that have "only one right answer" IMHO -- they either omit important assumptions (i.e. the underlying filesystem supports atomic moves and the src/dst are on the same filesystem/partition), or are poorly conceived and disregard important implementation details like that, which the really smart people think about. :)
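
    One of those omitted assumptions can at least be checked at runtime. A minimal Python sketch, assuming a POSIX-like system where st_dev identifies the mounted filesystem; the function names are hypothetical. It does not answer whether a given filesystem (FAT32, an SMB share) actually renames atomically, only whether a plain rename is possible without falling back to a copy.

```python
import os
import tempfile


def same_filesystem(path_a, path_b):
    """True when both paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def move_if_same_filesystem(src, dst_dir):
    """Rename src into dst_dir, refusing to proceed across filesystems."""
    src_dir = os.path.dirname(os.path.abspath(src))
    if not same_filesystem(src_dir, dst_dir):
        raise OSError("staging and destination are on different filesystems")
    dst = os.path.join(dst_dir, os.path.basename(src))
    # os.rename never copies data; it fails instead of degrading to a slow copy.
    os.rename(src, dst)
    return dst
```

    Failing loudly here means a misconfigured staging directory shows up at deployment time instead of as a half-copied file in front of the watcher.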
  • Asiago Chow 2008-09-24 15:08
    if the two directories are in the same filesystem, it doesn't matter if they're in different disks. In fact, i can guarantee that they don't.


    Not sure what you meant to say here. It does matter whether they are on the same disk or partition. I don't know what you can guarantee.

    Here's what they are talking about:


    echo "hello" > /mine/tmp/tmpfile
    mv /mine/tmp/tmpfile /mine


    Now imagine the filesystem was set up one of these ways:


    1:
    mkdir /mine
    mkdir /mine/tmp

    2:
    mkdir /mine
    mkdir /mine/tmp
    mount /dev/sda3 /mine/tmp

    3:
    mkdir /mine
    mount /dev/sda1 /mine
    mkdir /mine/tmp
    mount /dev/sdb1 /mine/tmp





    #1 is going to be effectively atomic. The other two are not.

    Also, you have to assume that the system (the part you control) is correctly configured, so the file is downloaded to a staging dir, then moved. Allowing for malicious config makes the problem impossible.


    It's not malicious config, it's a very common type of config in Unix-esque systems... and one which is normally hidden and inconsequential but could cause unstable behavior due to this shortcut. It doesn't necessarily make the problem impossible.

    The mv method works and I've used it. However, there are other methods which add functionality that might well be needed in the real world so his characterization of it as the "only real solution" was questionable either on the "only" or "real" front.
  • vman 2008-09-24 15:23
    Of course, there's no real reason given to not have the downloader kick off the "watcher" (or processor in this case). Which could easily be set up so that watcher *never* starts before the downloaded file is ready.
  • mccoyn 2008-09-24 15:26
    > There’s really only one correct answer, and I try my best to guide candidates to get there.

    Jeremy isn't very good at guiding a candidate to the correct answer. His guidance was repeatedly 'you can't do that'. Rather than say "you can’t modify the Watcher" why not say "you can only modify the Downloader". This immediately gets the candidate out of the *incorrect* mindset of modifying the processor and into the *correct* mindset of modifying the downloader.
  • akatherder 2008-09-24 15:33
    allo:
    The best solution would not need a change such as a temporary directory. The Watcher would just wait for two files. Then it processes one and waits until the Downloader begins the next. When the Downloader stops, the Watcher would process the last file.


    I was going to suggest that but it has some of its own ugliness.

    1. The Watcher doesn't necessarily have a way of knowing which of the two files is fully downloaded and which is in progress. This is easy to work around if the files are named sequentially.

    2. If the Watcher takes a long time to process a certain file and the next file is the "runt" of the bunch, you could end up with 3 (or more) files in the directory at the same time. Again, if the files are named sequentially... not a big problem.

    3. I don't think the Downloader has a way to notify that it has "stopped", and you can process the last file now. Otherwise it would probably have a way to notify that it has finished each file and we wouldn't need to worry about anything.
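
    With sequential names, points 1 and 2 reduce to "everything except the newest file is done". A tiny Python sketch of that rule, assuming names sort in download order (e.g. zero-padded sequence numbers) and only one download runs at a time; the function name is made up:

```python
def safe_to_process(names):
    """Return the files assumed complete: all but the newest in sort order.

    Assumes names sort in download order and that only the newest
    file can still be in progress.
    """
    ordered = sorted(names)
    return ordered[:-1]
```

    Point 3 is still open: the last file never gets a successor, so it stays unprocessed until something else signals that the Downloader has stopped.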
  • wds 2008-09-24 15:34
    Smash King:
    wds:
    See I'd just have used a lockfile.
    Have you by any chance designed MS-Access?

    Heh no, but think about it. The recruiter said the watcher program couldn't be modified, but only to guide the person to the right answer. So if you were able to modify the watcher program, there are two easy solutions. You can make the watcher program ignore certain files and then just rename them to files it does see, or you can use a lockfile. Lockfile is the first I came up with reading the question (quite a common pattern, actually, think of pidfiles); renaming is probably a bit easier and more stable since there's no cleanup required if the downloader crashes.

    That being said, trying to start up some sort of IPC link for this was a bit overkill either way. Envisioning a problem when trying to put the temporary directory on different drives might be premature as well, but then that's the problem with these types of recruiting questions, you're working off of a set of assumptions and you never really know what they are.
  • Edward Royce 2008-09-24 15:43
    Azeroth:
    There is another simple solution with the downloader/watcher problem - downloader should open the file exclusively while it's being downloaded, this way watcher won't be able to access it until it's closed. This way it's not even required to move anything anywhere.


    That was my thought. Lock the file until fully downloaded and then have the downloader invoke the processing program.

    No mucking about with polling a file for its lock status. No goofing around. No worries some idiot will see an empty directory and remove it in a "clean up" effort.
  • EatenByAGrue 2008-09-24 15:44
    Jimmy Jones:
    Surely a better solution to the download problem is to add ".unfinished" to the file name then rename it when it's complete.

    This avoids the problem of some smartass putting the temporary folder on another disk and then you get expensive copy operations, run out of disk space when you try to move the file, etc.



    You're right, it is. And stop calling me Shirley.
  • Jasmine 2008-09-24 15:58
    “Where the hell are you getting these people then? Any amateur could answer those questions.”

    I actually said something like that once, after being told I was the only person who answered a simple SQL question correctly. After three interviews and flying halfway across the USA (on their dime) for a fourth interview, I'm pretty sure that comment is why I'm not living in Phoenix right now.

    Also, from reading the comments, I'm wondering if anyone really has any practical experience with the download/pickup routine. Having worked for possibly the largest printing company in the world, I can tell you this directory-watching thing is pretty common, and on every system I've worked on, which includes Unix, Windows, and AS400 - the "move file when fully downloaded" procedure is the simplest and most reliable, and most other schemes are prone to problems you can't even imagine - things that I never thought could happen, such as whole file systems failing because of race conditions between processes. The interviewer has this one right - however, it's not the *only* simple solution, and it does require some experience to know that renaming schemes and such can lead to problems. If the intent is to see if the person can come up with simple ideas for simple problems, he could probably go with a simpler question that isn't so reliant on practical experience - like the one about the weight of a 747.
  • and Weeeeeeee! 2008-09-24 15:58
    SoonerMatt:
    Marc:
    Rename?

    Yeah I was thinking that too. Rather than make it move a 3 gb file (which could fail in itself), I would start the transaction as a .tmp file then remove the .tmp when it's completed.


    Yeah... kinda like Firefox!
  • Franz_Kafka 2008-09-24 16:00
    Asiago Chow:
    if the two directories are in the same filesystem, it doesn't matter if they're in different disks. In fact, i can guarantee that they don't.


    Not sure what you meant to say here. It does matter whether they are on the same disk or partition. I don't know what you can guarantee.


    No, it only matters that they are on the same filesystem. If you don't know what that means, go read a book.


    Here's what they are talking about:


    echo "hello" > /mine/tmp/tmpfile
    mv /mine/tmp/tmpfile /mine


    Now imagine the filesystem was set up one of these ways:


    1:
    mkdir /mine
    mkdir /mine/tmp

    2:
    mkdir /mine
    mkdir /mine/tmp
    mount /dev/sda3 /mine/tmp

    3:
    mkdir /mine
    mount /dev/sda1 /mine
    mkdir /mine/tmp
    mount /dev/sdb1 /mine/tmp





    #1 is going to be effectively atomic. The other two are not.


    #2 and #3 are not on the same filesystem.


    Also, you have to assume that the system (the part you control) is correctly configured, so the file is downloaded to a staging dir, then moved. Allowing for malicious config makes the problem impossible.


    It's not malicious config, it's a very common type of config in Unix-esque systems... and one which is normally hidden and inconsequential but could cause unstable behavior due to this shortcut. It doesn't necessarily make the problem impossible.

    The mv method works and I've used it. However, there are other methods which add functionality that might well be needed in the real world so his characterization of it as the "only real solution" was questionable either on the "only" or "real" front.


    Setting up the move so it goes across filesystems is a serious config error and part of what I was referring to as malicious behavior. The mv method works across filesystems now, but it didn't always work like that. Used to be, it would fail if the filesystems differed.
  • Franz_Kafka 2008-09-24 16:01
    James R. Twine:
    Ok - so the majority here believe that mv is an atomic operation if the src and dst are on the same filesystem (or partition?)...

    Does this hold true for ALL filesystems running under Linux/Unix? It might be that the directory is not running on ext2/ext3... What if it was a mounted FAT32 partition, or ReiserFS, or UFS, or an SMB share?

    That is the problem with questions that have "only one right answer" IMHO -- they either omit important assumptions (i.e. the underlying filesystem supports atomic moves and the src/dst are on the same filesystem/partition), or are poorly conceived and disregard important implementation details like that, which the really smart people think about. :)


    It holds true for all sensible choices for a server filesystem. I assume sensible configuration and enforce it when possible, because allowing stupid things like FAT32 on a server just encourages idiots.
  • Franz_Kafka 2008-09-24 16:07
    Jasmine:

    Also, from reading the comments, I'm wondering if anyone really has any practical experience with the download/pickup routine. Having worked for possibly the largest printing company in the world, I can tell you this directory-watching thing is pretty common, and on every system I've worked on, which includes Unix, Windows, and AS400 - the "move file when fully downloaded" procedure is the simplest and most reliable, and most other schemes are prone to problems you can't even imagine - things that I never thought could happen, such as whole file systems failing because of race conditions between processes. The interviewer has this one right - however, it's not the *only* simple solution, and it does require some experience to know that renaming schemes and such can lead to problems. If the intent is to see if the person can come up with simple ideas for simple problems, he could probably go with a simpler question that isn't so reliant on practical experience - like the one about the weight of a 747.


    I have experience with the whole download/pickup routine, and the way it worked was this:

    1. transfer a file to some staging dir
    2. add a .PICKUP or whatever zero size file after done
    3. other process sees .PICKUP file, does some logic with .PROCESSING and whatever, then starts processing the file
    4. other process finishes and moves file into an archive dir
    5. monitoring software watches dir also and alerts for various error cases, like stale files, .PROCESSING over a certain age, etc.

    this process is easy to understand and troubleshoot, and also to repair when it burps, which is handy when dealing with the one part of the IF that can't be easily modified remotely and must not break, even at 2am.
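
    Steps 1 through 4 can be sketched roughly as below, in Python. This is an illustration under assumptions, not the system described: the marker is taken to be "<filename>.PICKUP", the claim step renames it to "<filename>.PROCESSING", and all function names are hypothetical. The monitoring in step 5 is left out.

```python
import os
import tempfile

PICKUP = ".PICKUP"          # zero-size "ready" marker (assumed naming)
PROCESSING = ".PROCESSING"  # marker held while a worker owns the file


def publish(staging_dir, filename, data):
    """Steps 1-2: write the file, then drop the marker once it is closed."""
    path = os.path.join(staging_dir, filename)
    with open(path, "wb") as f:
        f.write(data)
    open(path + PICKUP, "wb").close()


def claim_next(staging_dir):
    """Step 3: claim one ready file; return its path, or None if nothing is ready."""
    for name in sorted(os.listdir(staging_dir)):
        if not name.endswith(PICKUP):
            continue
        marker = os.path.join(staging_dir, name)
        data_path = marker[: -len(PICKUP)]
        try:
            os.rename(marker, data_path + PROCESSING)
        except FileNotFoundError:
            continue  # another worker claimed it first
        return data_path
    return None


def finish(data_path, archive_dir):
    """Step 4: move the processed file to the archive and drop the marker."""
    archived = os.path.join(archive_dir, os.path.basename(data_path))
    os.replace(data_path, archived)
    os.remove(data_path + PROCESSING)
    return archived
```

    A .PROCESSING marker that outlives its worker is exactly the kind of stale state the step 5 monitoring would alert on.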
  • Duke of New York 2008-09-24 16:09
    Shari would do well to rat out this company and employee by name somewhere (not necessarily here), because what's described is a rather blatant case of illegal employment discrimination.
  • sadwings 2008-09-24 16:10
    ftp the file to /somedir/.filename.dat and rename it to /somedir/filename.dat after you are finished writing it.
  • El Jeffe 2008-09-24 16:10
    Who watches the watcher...?
  • Franz_Kafka 2008-09-24 16:10
    Duke of New York:
    Shari would do well to rat out this company and employee by name somewhere (not necessarily here), because what's described is a rather blatant case of illegal employment discrimination.


    Since when is being a Brown alum a protected class?
  • Marc 2008-09-24 16:11
    You could give the watcher a low priority and starve CPU cycles indefinitely until you know the file has been downloaded.

    This is accomplished with two monkeys trained in Morse code. A monkey on the server side taps the file size to a monkey on the client side through a sophisticated can/string setup. When the file sizes are equal, the client side monkey can remove a banana from a bin placed on a balance scale. The downward pressure from the other side of the scale causes fluid motion in a hydraulic system. A lever arm connected to the hydraulic presses a button with four independent contacts. If 2 out of 4 contacts are detected, the priority of the watcher task is temporarily increased. The hydraulic system will include a pressure operated release valve to ensure the button is unpressed in a timely manner.

    The client side must constantly display a security application which logs file access. A video camera pointed at the display uses normalized cross correlation and unsupervised learning algorithms to detect additional file access in the security log. This results in the watcher being downgraded to the starvation priority until the redundant switch contacts are again activated.

    Periodically, both the client side monkey and the server side monkey must finish a game of Tic Tac Toe with a trained chicken to verify they are still paying attention/not dead. The game must end in a tie. If the game is not completed or the monkey loses, an alarm is sounded and the monkey is replaced. If the chicken loses, the chicken is replaced.
  • Franz_Kafka 2008-09-24 16:11
    El Jeffe:
    Who watches the watcher...?


    host monitoring watches the dir and alerts when files sit around too long.
  • sfgsfdgdsfgdsfg 2008-09-24 16:11
    Rick:
    adsfg:
    Branan:
    A file move is different from a file copy in Linux. A move involves changing one pointer in the filesystem, no information is actually "moved". So it's not actually a problem.
    What if you move from one physical drive to another?

    Don't.

    Problem solved.


    Depends on how long you want to keep the file around.
    If it's download -> process -> delete then the following can be applied:

    Assume: /final is partition A, /tmp is partition B.

    Download the file 'whatever' in /tmp,
    create a symbolic link in /final/ to /tmp/whatever
    process the file
    remove symbolic link
    remove real file

    Done.
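
    A rough Python sketch of those steps, assuming a POSIX system where creating symlinks is allowed; the function names and the handler callback are made up for illustration. The link only appears in /final after the download has finished, so the watcher never sees a partial file, and no data crosses partitions.

```python
import os
import tempfile


def publish_link(tmp_path, final_dir):
    """Expose a finished download in final_dir via a symbolic link."""
    link = os.path.join(final_dir, os.path.basename(tmp_path))
    os.symlink(tmp_path, link)
    return link


def process_and_clean(link, handler):
    """Process the file through the link, then remove the link and the real file."""
    target = os.path.realpath(link)
    result = handler(link)
    os.remove(link)    # remove symbolic link
    os.remove(target)  # remove real file
    return result
```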
  • Duke of New York 2008-09-24 16:19
    Franz_Kafka:
    Since when is being a Brown alum a protected class?

    Are you serious?

    - The candidate's name (although changed) is "Shari."
    - The interviewer's behavior was consistent with someone intentionally stalling to avoid the interview.
    - He specifically said that she had a problem "even if qualified professionally," before the interview began. In employment discrimination, that's the equivalent of a smoking gun.
  • Franz_Kafka 2008-09-24 16:22
    Duke of New York:
    Franz_Kafka:
    Since when is being a Brown alum a protected class?

    Are you serious?

    - The candidate's name (although changed) is "Shari."
    - The interviewer's behavior was consistent with someone intentionally stalling to avoid the interview.
    - He specifically said that she had a problem "even if qualified professionally," before the interview began. In employment discrimination, that's the equivalent of a smoking gun.


    So she's a woman and it's automatically sex discrimination? He took a long lunch and blew off an appointment because he's a self-centered schmuck and mostly looked down on her for going to Brown. The personality fit is a valid test, but in this case, the personality he wanted was 'sycophant'.
  • Honeyman 2008-09-24 16:23
    > “What about if the Downloader just wrote files to a temporary directory, and then moved the file to the appropriate directory when the download was complete.”

    What if the temporary directory is located on a different storage volume? You'll end up moving a several-gigabyte file from one volume to another, which is not that fast - isn't that why you explicitly mentioned "several gigabytes", so that this approach would be ruled out?
    What if the volume with the temporary directory is smaller than the work dir? The download just fails if done into the temporary directory - while it would successfully work if done into the work dir.
    What if the temporary directory is on a ramdisk? A nice side effect of your approach would be a complete system lock-up.
    What if your work directory is located on a different filesystem, which doesn't support several-GB files? FAT-16 with its 4 GB limit? Or what if the FS on your work directory doesn't support all the possible names which are allowed in the work dir? CP-1250 limited, while you need to download a file with Chinese characters in the name? Or supports 8.3 names only?
    Finally - what if you don't have a temporary directory AT ALL? If it is an embedded solution - an ATM, a hardware switch or something? Have you heard of ADSL switches which have built-in BitTorrent clients?

    Too many "what if"s. Do you know, Jeremy, what they mean? That you didn't mention any of these in your problem preconditions, while you should have. Instead, you were changing the rules during the game, again and again. A proper software architect plays by the rules given by the customer (i.e. YOU) and tries his best to cover all cases which are not defined explicitly. If he doesn't know whether he is using a Windows-based one-HDD PC or a memory-limited Linux/RISC-based ADSL router with no embedded HDD at all (but an SMB client), he assumes the worst.
    Mark it, Jeremy: nowhere in your questions did you ever mention the conditions which make your own "solution" valid, while ALL of the solutions from your candidate are perfectly valid. His solutions are complex and valid. Your solution is simple and invalid.
    Sorry to say, Jeremy. You failed your own interview.
  • Val 2008-09-24 16:27
    I'm amazed nobody has answered "chmod" yet, as "I Guess That Would Work, Too"
  • some_other_dave 2008-09-24 16:28
    JamesQMurphy:
    Have you ever asked why you have high turnover?


    I suspect they already know exactly why they have high turnover, and the reason is something beyond their control. (I know it is where I worked recently...)
  • swordfishBob 2008-09-24 16:29
    Marc:
    You could build custom FPGA that intercepts packets on the the network and copies them to flash memory. The hardware can use a serial interface to indicate when the download is complete to a third program, 'The Mounter', which mounts the flash disk to the location the Watcher is expecting.

    The hardware can have a pool of flash memory disk areas, one being written to from the network, one mounted. Each flash memory area would only hold one file at a time.

    Since the Watcher is always running, I'm assuming it uses some sort of event handling system. An operating system hook to the event which indicated the Watcher is done processing and is now watching could be used to tell 'The Mounter' when its time to unmount a flash disk and mount the next one in the queue.

    Don't forget to mount it on a wooden table.
  • swordfishBob 2008-09-24 16:30
    Val:
    I'm amazed nobody has answered "chmod" yet, as "I Guess That Would Work, Too"

    Someone did, on page 1, but they didn't say "chmod" explicitly.
  • Franz_Kafka 2008-09-24 16:39
    Honeyman:
    > “What about if the Downloader just wrote files to a temporary directory, and then moved the file to the appropriate directory when the download was complete.”

    What if the temporary directory is located on a different storage volume? You'll end up moving a several-gigabyte file from one volume to another, which is not that fast - isn't that why you explicitly mentioned "several gigabytes", so that this approach would be ruled out?

    Don't do that.


    What if the volume with the temporary directory is smaller than the work dir? The download just fails if done into the temporary directory - while it would successfully work if done into the work dir.
    What if the temporary directory is on a ramdisk? A nice side effect of your approach would be a complete system lock-up.


    What's your fixation on /tmp? you download to one dir, then move it over. your questions are the equivalent of asking 'what if you deleted all your source code and set yourself on fire?'. completely irrelevant.


    What if your work directory is located on a different filesystem, which doesn't support several-GB files? FAT-16 with its 4 GB limit? Or what if the FS on your work directory doesn't support all the possible names which are allowed in the work dir? CP-1250 limited, while you need to download a file with Chinese characters in the name? Or supports 8.3 names only?


    That's stupid. Don't do it.


    Finally - what if you don't have a temporary directory AT ALL? If it is an embedded solution - an ATM, a hardware switch or something? Have you heard of ADSL switches which have built-in BitTorrent clients?


    Well if you can't download the file at all, then you're SOL.


    Too many "what if"s. Do you know, Jeremy, what they mean? That you didn't mention any of these in your problem preconditions, while you should have. Instead, you were changing the rules during the game, again and again. A proper software architect plays by the rules given by the customer (i.e. YOU) and tries his best to cover all cases which are not defined explicitly. If he doesn't know whether he is using a Windows-based one-HDD PC or a memory-limited Linux/RISC-based ADSL router with no embedded HDD at all (but an SMB client), he assumes the worst.
    Mark it, Jeremy: nowhere in your questions did you ever mention the conditions which make your own "solution" valid, while ALL of the solutions from your candidate are perfectly valid. His solutions are complex and valid. Your solution is simple and invalid.
    Sorry to say, Jeremy. You failed your own interview.


    his solutions were pretty bad - hack the kernel for the sake of a file downloader?
  • Duke of New York 2008-09-24 16:40
    Franz_Kafka:
    So she's a woman and it's automatically sex discrimination? He took a long lunch and blew off an appointment because he's a self-centered schmuck and mostly looked down on her for going to Brown. The personality fit is a valid test, but in this case, the personality he wanted was 'sycophant'.

    What "personality fit"? The interviewer was talking about some "personal level problem" that he had somehow discovered before even doing the interview.

    This interviewer did several things that HR people specifically tell you in interview training never to do, because they are difficult to explain in court as something other than discrimination. If the company has an EEO policy, he violated it, regardless of his reasons.
  • poochner 2008-09-24 16:40
    Marc:
    You could give the watcher a low priority and starve CPU cycles indefinitely until you know the file has been downloaded.

    This is accomplished with two monkeys trained in Morse code. A monkey on the server side taps the file size to a monkey on the client side through a sophisticated can/string setup. When the file sizes are equal, the client side monkey can remove a banana from a bin placed on a balance scale. The downward pressure from the other side of the scale causes fluid motion in a hydraulic system. A lever arm connected to the hydraulic presses a button with four independent contacts. If 2 out of 4 contacts are detected, the priority of the watcher task is temporarily increased. The hydraulic system will include a pressure operated release valve to ensure the button is unpressed in a timely manner.

    The client side must constantly display a security application which logs file access. A video camera pointed at the display uses normalized cross correlation and unsupervised learning algorithms to detect additional file access in the security log. This results in the watcher being downgraded to the starvation priority until the redundant switch contacts are again activated.

    Periodically, both the client side monkey and the server side monkey must finish a game of Tic Tac Toe with a trained chicken to verify they are still paying attention/not dead. The game must end in a tie. If the game is not completed or the monkey loses, an alarm is sounded and the monkey is replaced. If the chicken loses, the chicken is replaced.

    Just don't forget to mount different scratch monkeys on the client and server.
  • Tim 2008-09-24 16:40
    One time I was interviewing people for this job. One of the candidates was completely unqualified. I mean, he went to BROWN. ALL of our other candidates came from top notch schools, like Harvard or MIT. So about 30 minutes after the interview was supposed to start, I went to lunch.

    When I got back, I made him wait another 15 minutes. By then he seemed a little pissed, so I asked him why. He had just got mad and left! Can you believe it? He should be giving me sexual favors! I have a PHD from Harvard!
  • Franz_Kafka 2008-09-24 16:44
    Duke of New York:
    Franz_Kafka:
    So she's a woman and it's automatically sex discrimination? He took a long lunch and blew off an appointment because he's a self-centered schmuck and mostly looked down on her for going to Brown. The personality fit is a valid test, but in this case, the personality he wanted was 'sycophant'.

    What "personality fit"? The interviewer was talking about some "personal level problem" that he had somehow discovered before even doing the interview.

    This interviewer did several things that HR people specifically tell you in interview training never to do, because they are difficult to explain in court as something other than discrimination. If the company has an EEO policy, he violated it, regardless of his reasons.


    He interpreted her objection to getting blown off for nearly an hour as a personality problem. If I were that guy (I'm not) and I got sued, I would stand up in court and say "Sorry your Honor, but I'm a complete asshole. I treat men and women equally poorly".
  • Matt 2008-09-24 16:51
    For number 2, it's not a copy, it's a move, which is an atomic action under Linux, and unless you are moving between two separate partitions (why would you?) it happens pretty much instantaneously, so there's no need to monitor progress.
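
    A rough sketch of the download-then-move idea, for anyone who wants to see it spelled out. The function name and the staging/watched directory layout are made up for illustration, and it assumes both directories sit on the same filesystem so that the rename is a single atomic step:

```python
import os


def publish(data: bytes, staging_dir: str, watched_dir: str, name: str) -> str:
    """Write the whole file into staging_dir, then rename it into watched_dir.

    Assumes staging_dir and watched_dir are on the same filesystem, so the
    watcher only ever sees a complete file appear under its final name.
    """
    staged = os.path.join(staging_dir, name)
    with open(staged, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes are on disk before publishing

    final = os.path.join(watched_dir, name)
    os.rename(staged, final)  # one step: the name moves, the data does not
    return final
```

    The Watcher never has to know any of this happened; from its side a finished file simply shows up.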
  • Matt 2008-09-24 16:53
    TRWTF is that this message board doesn't do threaded replies!!!
  • cfreak 2008-09-24 16:53
    That wouldn't surprise me at all. I get this all the time. On my resume, underneath the list of the various programming languages I know and my experience, I used to have the phrase "I've been exposed to other languages such as ... (list of languages) ... and could do simple tasks using them as a part of my other duties".

    I don't know how many times I got called by recruiters for Python or C# jobs. "But it says you've been exposed to it!". Yeah, I can take a program and debug it probably. Or I might be able to make small changes. I'm not going to get through an interview for a senior level position.

    I've pretty much simply refused the interviews. No point in wasting my time for a job I know I won't get (or if I did get it would be a living hell since anyone hiring me for a C# job would have to be off the scale on cluelessness)
  • Duke of New York 2008-09-24 16:58
    Franz_Kafka:
    If I were that guy (I'm not) and I got sued, I would stand up in court and say "Sorry your Honor, but I'm a complete asshole. I treat men and women equally poorly".

    That wouldn't keep him from having to testify as to the specific nature of the "personal problem" (not "personality problem") that was not related to her professional qualifications for the job. Or from having lawyers sift through his e-mails for evidence of a past pattern of behavior.

    Anyone who does interviews and doesn't see how clear-cut this was, is a walking liability and needs to get trained.
  • Mike D. 2008-09-24 16:59
    Matt:
    For number 2, it's not a copy, it's a move, which is an atomic action under Linux, and unless you are moving between two separate partitions (why would you?) it happens pretty much instantaneously, so there's no need to monitor progress.

    Since the original setup has the same directory for downloading and processing, it's implied that both directories are on the same partition, unless the directory is the partition's root, in which case you'd only be in trouble if one directory couldn't be a subdirectory of the other. If that's a problem, you need to LART the programmer responsible.

    The concept of a non-atomic move on a filesystem horrifies me. What would the intermediate state look like? Would there be an entry in the directory but the inode isn't filled in yet?
  • Franz_Kafka 2008-09-24 17:01
    Duke of New York:
    Franz_Kafka:
    If I were that guy (I'm not) and I got sued, I would stand up in court and say "Sorry your Honor, but I'm a complete asshole. I treat men and women equally poorly".

    That wouldn't keep him from having to testify as to the specific nature of the "personal problem" (not "personality problem") that was not related to her professional qualifications for the job. Or from having lawyers sift through his e-mails for evidence of a past pattern of behavior.

    Anyone who does interviews and doesn't see how clear-cut this was, is a walking liability and needs to get trained.


    That's easy - he's hiring toadies and footstools. In a less pejorative example, I'm allowed to pass on hiring someone because they don't fit in with the team, even if they're professionally qualified.

    Being an abusive jerk isn't illegal.
  • Franz_Kafka 2008-09-24 17:03
    Mike D.:
    Matt:
    For number 2, it's not a copy, it's a move, which is an atomic action under Linux, and unless you are moving between two separate partitions (why would you?) it happens pretty much instantaneously, so there's no need to monitor progress.

    Since the original setup has the same directory for downloading and processing, it's implied that both directories are on the same partition, unless the directory is the partition's root, in which case you'd only be in trouble if one directory couldn't be a subdirectory of the other. If that's a problem, you need to LART the programmer responsible.

    The concept of a non-atomic move on a filesystem horrifies me. What would the intermediate state look like? Would there be an entry in the directory but the inode isn't filled in yet?


    This is linux - a non-atomic move would mean that there are two links to the file - the old one and the new one.
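
    To make that intermediate state visible, here is a toy version of a move done by hand as link-then-unlink. This is only an illustration with a made-up function name, not a claim about what rename does internally; it assumes a filesystem that supports hard links:

```python
import os


def move_via_links(src: str, dst: str) -> int:
    """Move src to dst in two separate steps and report the link count between them."""
    os.link(src, dst)                      # step 1: add a second name for the same inode
    links_midway = os.stat(src).st_nlink   # both names exist at this point
    os.unlink(src)                         # step 2: drop the old name
    return links_midway
```

    Between the two steps the file is simply reachable under both names; there is never a directory entry pointing at a half-filled inode.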
  • Josh 2008-09-24 17:08
    A while back, I interviewed for a position as a programmer writing C code under QNX for embedded controls - a position requiring quite a bit of experience. I did fairly well on the technical part of the interview and matched personality well with the other team members.

    However, the manager kept going back to the fact that I didn't have a degree. He did this in a number of ways. It was a bit irritating, but overall, a pleasant experience.

    So an hour after I get home, the recruiter I was using called me excited that she had an offer for me. I politely declined it, and she was dumbfounded... even after I told her that, if the manager was harping on me during an interview for not having a degree, what kind of manager would he be? I had the qualifications for the job but didn't think it would be a good match.

    The recruiter never called me back.
  • Honeyman 2008-09-24 17:08
    Don't do that.

    Say that to customer? Was the candidate explicitly permitted to apply his own restrictions - to be able to say to an imaginary customer the words like "well, we'll do the simple solution, but it won't work if you mount the temporary directory on a different disk, etc, etc"?
    Or do you mean, that the candidate "should not do that"? Well, he didn't. He just designed the solution for the world, where this is probably done already.

    What's your fixation on /tmp? you download to one dir, then move it over.

    Do you have any sensible way to find any other directory available for temporary writing?
    Was this way given to the candidate?
    Thanks to this ostrich-head-in-the-sand approach, I was lucky enough to see several Windows programs which could not be installed to disk D if disk C was short on space. I've seen programs which try to put their temporary files in the root directory. I've seen MP3 players which require administrator rights to run. I've seen enough programs written with "keep it simple" and "just don't do it" thoughts in mind.

    Finally, why should any software developer dare to say "just don't do it" to a customer? It is the customer who pays money.

    Well if you can't download the file at all, then you're SOL.

    Read my words again. You may still be able to download the file. You just may not have any temporary directory. So you may have only the simple options to choose between - either you download the file right to the watched directory, or you cannot download the file at all. Example: an ADSL router with either an NFS/Samba mounted volume, or an external hard drive whose root directory is being monitored by the Watcher.

    his solutions were pretty bad - hack the kernel for the sake of a file downloader?

    It is not the solutions which are bad. The solutions perfectly fit the problem - as it was presented.
    It is the problem which is poorly defined and leaves a lot of unclear spots.
    The candidate's solutions are overly complex but will work (unless the game rules change again, say, to prohibit the kernel hacking). Jeremy's solution is simple but may not work. If you are writing a 10-bucks-per-bunch document backup utility - it is sufficient. If you are writing non-upgradeable firmware for a hardware root DNS server, you are doomed. Was the candidate ever told that he is not writing non-upgradeable firmware for a hardware root DNS server?

    But finally, anyway, no sensible person may call a working solution "bad" while opposing it to a non-working "good" solution. Such a dialogue is worth another WTF.
  • Asiago Chow 2008-09-24 17:14
    Franz_Kafka:
    Asiago Chow:
    if the two directories are in the same filesystem, it doesn't matter if they're in different disks. In fact, i can guarantee that they don't.


    Not sure what you meant to say here. It does matter whether they are on the same disk or partition. I don't know what you can guarantee.


    No, it only matters that they are on the same filesystem. If you don't know what that means, go read a book.



    LOL...OK, so it's a jargon issue.

    The filesystem, in Unixish parlance, is the directory tree as a whole. It spans partitions and physical devices. Oh, it is also the arrangement of data and files within a partition but when people talk about "the filesystem" they mean the entire hierarchy. It can easily span partitions, logical volumes, devices, etc.

    I'm guessing you are more familiar with the DOS/Windows world where filesystem has a more limited meaning. However, the context of the conversation made it clear that they were talking about the files spanning physical devices or partitions within the filesystem...which means they were using the unixish meaning of the word.

    You should follow your own advice and do some reading.
  • Franz_Kafka 2008-09-24 17:16
    Honeyman:
    Don't do that.

    Say that to customer? Was the candidate explicitly permitted to apply his own restrictions - to be able to say to an imaginary customer the words like "well, we'll do the simple solution, but it won't work if you mount the temporary directory on a different disk, etc, etc"?


    No, you don't use temp at all. And you either control the install or write the install requirements in the docs.

    What's your fixation on /tmp? you download to one dir, then move it over.

    Do you have any sensible way to find any other directory available for temporary writing?


    Yes. Pick another directory on the same file system. Any one will do, but something like /staging is reasonable.

    Well if you can't download the file at all, then you're SOL.

    Read my words again. You may still be able to download the file. You just may not have any temporary directory.


    who cares? You're the one who wants to bung it into /tmp.

    So you may have the simple options to choose between - either you download the file right to the watched directory, or you cannot download the file at all.


    not very bright, are we? the third option is to download into /a then move to /b, which takes no space from /tmp. Duh.

    his solutions were pretty bad - hack the kernel for the sake of a file downloader?

    It is not the solutions which are bad. The solutions perfectly fit the problem - as it was presented.
    It is the problem which is poorly defined and leaves a lot of unclear spots.
    The candidate's solutions are overly complex but will work (unless the game rules change again, say, to prohibit the kernel hacking). Jeremy's solution is simple but may not work. If you are writing a 10-bucks-per-bunch document backup utility - it is sufficient. If you are writing non-upgradeable firmware for a hardware root DNS server, you are doomed. Was the candidate ever told that he is not writing non-upgradeable firmware for a hardware root DNS server?

    But finally, anyway, no sensible person may call a working solution "bad" while opposing it to a non-working "good" solution. Such a dialogue is worth another WTF.


    No sensible person will accept kernel hacks to support a file downloader. If you're writing a $10/mo utility, then you absolutely want a simple solution (not his), because otherwise, support costs will kill you.
  • Honeyman 2008-09-24 17:18
    One day I heard the Unix guys called ReiserFS "a filesystem"...
  • Franz_Kafka 2008-09-24 17:22
    Asiago Chow:
    Franz_Kafka:
    Asiago Chow:
    if the two directories are in the same filesystem, it doesn't matter if they're in different disks. In fact, i can guarantee that they don't.


    Not sure what you meant to say here. It does matter whether they are on the same disk or partition. I don't know what you can guarantee.


    No, it only matters that they are on the same filesystem. If you don't know what that means, go read a book.



    LOL...OK, so it's a jargon issue.

    The filesystem, in Unixish parlance, is the directory tree as a whole. It spans partitions and physical devices. Oh, it is also the arrangement of data and files within a partition but when people talk about "the filesystem" they mean the entire hierarchy. It can easily span partitions, logical volumes, devices, etc.

    I'm guessing you are more familiar with the DOS/Windows world where filesystem has a more limited meaning. However, the context of the conversation made it clear that they were talking about the files spanning physical devices or partitions within the filesystem...which means they were using the unixish meaning of the word.

    You should follow your own advice and do some reading.


    No, I don't think I will. The filesystem refers to a logical volume. The directory tree is everything that the unix file subsystem can see. I'm guessing you don't know much about unix to be lecturing me about something so simple - I've been at it for about 15 years and I know more than you.

    Now then, the filesystem is a single volume. Always. fsck operates on one volume at a time, for instance, and fstab lists them out, one per line. The context didn't really cover what you think at all - that's why I was specific in my advice.

    Besides, if filesystem referred to the whole tree, then concepts like 'same fs' are meaningless.
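
    For what it's worth, the 'same fs' question can be asked from user space. A minimal sketch with a made-up helper name, assuming a POSIX system where stat's st_dev identifies the device (mounted volume) that holds a path:

```python
import os


def same_filesystem(path_a: str, path_b: str) -> bool:
    """Return True if both paths live on the same mounted filesystem.

    Compares st_dev, the ID of the device containing each path. Two
    directories in one tree can still differ here if one is a mount point.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

    If this returns False for the staging and destination directories, a move between them is a copy plus delete, not a rename.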
  • Montoya 2008-09-24 17:26
    I swear, I could have been the interviewer on that last interview. Except it was a different company and I was the only technical interviewer. It's funny... every company has a different idea about what a "senior developer" should be able to do. I've worked at some places where that meant you had to be a rocket scientist, and then others where at most you had to know how to include a file. There's no industry "standard." Even as the interviewer for a company like that, you have to pretend that you agree with the title given to the job. I didn't consider that a WTF, because it sounded too familiar :(
  • Asiago Chow 2008-09-24 17:29
    Shrug. You really should do some reading. Knowledge is always good. I can't force you of course but it will help you to understand what people are saying.

    Filesystem has two meanings.

    Your job as a reader is to figure out which of those two meanings the writer intended.

    You failed.
  • Franz_Kafka 2008-09-24 17:30
    Montoya:
    I swear, I could have been the interviewer on that last interview. Except it was a different company and I was the only technical interviewer. It's funny... every company has a different idea about what a "senior developer" should be able to do. I've worked at some places where that meant you had to be a rocket scientist, and then others where at most you had to know how to include a file. There's no industry "standard." Even as the interviewer for a company like that, you have to pretend that you agree with the title given to the job. I didn't consider that a WTF, because it sounded too familiar :(


    True that. To compound things, some places will post ads for a software developer III and a programmer IV and just assume that people know what that means. Since, as you say, there aren't any standards, why bother with even finer grained distinctions?
  • Duke of New York 2008-09-24 17:31
    Franz_Kafka:
    Duke of New York:
    Franz_Kafka:
    If I were that guy (I'm not) and I got sued, I would stand up in court and say "Sorry your Honor, but I'm a complete asshole. I treat men and women equally poorly".

    That wouldn't keep him from having to testify as to the specific nature of the "personal problem" (not "personality problem") that was not related to her professional qualifications for the job. Or from having lawyers sift through his e-mails for evidence of a past pattern of behavior.

    Anyone who does interviews and doesn't see how clear-cut this was, is a walking liability and needs to get trained.


    That's easy - he's hiring toadies and footstools. In a less pejorative example, I'm allowed to pass on hiring someone because they don't fit in with the team, even if they're professionally qualified.

    In an EEO shop you're not allowed to do that to the candidate's face before even starting the interview.
  • Honeyman 2008-09-24 17:32
    not very bright, are we? the third option is to download into /a then move to /b, which takes no space from /tmp. Duh.

    I thought we are talking about Unix, right?
    ~:$ mkdir /a
    mkdir: cannot create directory `/a': Permission denied

    Why I mentioned /tmp, is because it is usually open for writing. If it exists.
    Though, nothing changes to better side if you use "/a" instead of "/tmp". The things just become even more complex. Cause you need even more assumptions to be legally able to write to /a dir than to /tmp dir.
    Not to say that the root dir in Unix is mounted read-only too often to consider the opposite.

    Note, that, according to the rules, the only place you are allowed to write is the working directory. You've never been mentioned that you may write somewhere else. If you don't verbally confirm such obvious thing - you may have "a lack of initiative" or something, but if you assume this thing while it is never mentioned - you have a "lack of bordercase feeling". Frankly, I would hire the person with the lack of initiative.

    No sensible person will accept kernel hacks to support a file downloader.

    I've personally been in the team which had the Linux kernel severely patched for the purpose of the userspace applications running on that system. The system had a 6-to-7-digits cost, contained about 10 years old legacy code, and required a specially designed hardware to run.
    Looking at the Jeremy's story, I've never noticed a mention that this downloader is not going to become a part of a similar system.
  • Franz_Kafka 2008-09-24 17:33
    Asiago Chow:
    Shrug. You really should do some reading. Knowledge is always good. I can't force you of course but it will help you to understand what people are saying.

    Filesystem has two meanings.

    Your job as a reader is to figure out which of those two meanings the writer intended.

    You failed.


    Go read http://linux.die.net/man/8/mount. Filesystem in this context means one thing only. Your job as a reader was to understand that, and you failed. I don't care if you think that filesystem describes the whole shebang - it doesn't. You can refer to the virtual filesystem if you like, but that has different semantics.
  • Leigh 2008-09-24 17:38
    I keep getting calls for .NET programming. I should note there is absolutely NO .NET programming on my resume. I don't mention it at all. There's even a minimum on my resume about windows.

    I finally figured out why I get the calls. It's my email address. One of these days I'm going to get a call from someone 'who read my resume and thinks I'd be perfect for the .NET position!' and lose my temper and ask them in what universe did they read it. Recruiters no longer read resumes, their computer programs key off of certain words and then you get the calls from the idiots who waste your time with... well... idiocy. (And then they're shocked that you're not interested in the interview!)
  • Franz_Kafka 2008-09-24 17:41
    Honeyman:
    not very bright, are we? the third option is to download into /a then move to /b, which takes no space from /tmp. Duh.

    I thought we are talking about Unix, right?
    ~:$ mkdir /a
    mkdir: cannot create directory `/a': Permission denied


    yeah, I'm lazy and didn't include an implicit root. Here you go:

    the third option is to download into ${DL_ROOT}/a then move to ${DL_ROOT}/b

    Now stick DL_ROOT somewhere that's got about 50-100G free - twice whatever your expected retention is.

    Why I mentioned /tmp, is because it is usually open for writing. If it exists.
    Though, nothing changes to better side if you use "/a" instead of "/tmp". The things just become even more complex. Cause you need even more assumptions to be legally able to write to /a dir than to /tmp dir.
    Not to say that the root dir in Unix is mounted read-only too often to consider the opposite.


    yeah, see above. I didn't want to repeat myself for the 3rd time.

    Note, that, according to the rules, the only place you are allowed to write is the working directory. You've never been mentioned that you may write somewhere else. If you don't verbally confirm such obvious thing - you may have "a lack of initiative" or something, but if you assume this thing while it is never mentioned - you have a "lack of bordercase feeling". Frankly, I would hire the person with the lack of initiative.


    I didn't mention it to you because we haven't got anything like requirements. There are lots of other choices that fall under the purview of write to dir a, move to dir b, and I don't need to mention them here - it's pointless on such a fuzzy problem.

    No sensible person will accept kernel hacks to support a file downloader.

    I've personally been in the team which had the Linux kernel severely patched for the purpose of the userspace applications running on that system. The system had a 6-to-7-digits cost, contained about 10 years old legacy code, and required a specially designed hardware to run.
    Looking at the Jeremy's story, I've never noticed a mention that this downloader is not going to become a part of a similar system.


    Me too, minus the special hardware and kernel hacks. If I'm writing a file catcher, I'm not adding to the mess when there are better solutions available.
  • Sam 2008-09-24 17:43
    Max:

    2) An interviewer who thinks downloads in progress are a problem but file copies in progress are not shows a lack of understanding.


    2) Actually, he was talking about a Move, not a Copy. Assuming your temporary directory is on the same filesystem as the destination, then a move requires only moving the root node of the file to the new directory node and is pretty much instant. Perhaps that's why he's the interviewer and you wouldn't get the job.
  • Shari 2008-09-24 17:44
    Yep, I most definitely spread the word about that company on my personal blog (which is just a tiny blip on the web in a strange foreign language but is indexed nicely by google).
  • Iain Collins 2008-09-24 17:56
    1. It would be nice to have a database with stories like this, listing the companies. A bit like FuckedCompany.com used to be, but from a prospective employee perspective (pointers to existing sites that fit the bill appreciated).

    I've had some similar interviews myself, I was about to write them up but I'll save them for my own submission at some point I think...


    2. I don't like to nitpick ideas, but moving the file is not really a great solution, never mind the only one - plain old file locking makes more sense here (that's kinda what it's for...and by "kinda" I mean exactly).

    The best practice approach would of course be to use advisory locking in both applications, and this would be fine with the scenario as originally outlined. You should pretty much always use advisory locking when reading/writing to files; it's amazing how often people don't bother with any sort of file locking (and how many minor but nevertheless irritating problems arise as a result).

    Halfway through, the arbitrary condition of "let’s just say you can’t modify the Watcher" is introduced (albeit with the best of intentions), but in a real world environment, in my experience, if you've got the code for the Downloader you will _usually_ have the code for the Watcher and so be able to fix both applications to work as they should have done in the first place, or at least you'd be able to walk over to / call someone who does look after that application and mention it to them.

    If in some unusual scenario that was not the case, I would absolutely fire a bug report to whoever was responsible, requesting that support for observing advisory locks be added (given the application *crashes* on files that are still being written to - that's a pretty critical show stopper and the Watcher application's problem).

    In the mean time, mandatory locking in the Downloader would prevent the Watcher from opening the file.

    Okay, the Watcher could in theory still crash in this scenario, _if_ it has a timeout mechanism wrapped round the file open instruction _and_ there is an unhandled exception that occurs when that happens, but that's highly unlikely really, especially given it doesn't even call flock() / lockf() in the first place. Usually a mandatory lock will just cause the application that's trying to get a read on a file (without looking for a lock) to sit there until the file is cleared for reading, with no other ill effects, but I mention this just for completeness.

    Ob: Mandatory locking is usually poor form though and arguably at best only /slightly/ less evil than a crashing app. ;-)

    In this situation, it's also a good idea to consider setting up a monitor to look out for the Watcher process, if it really is that bad.
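
    For the cooperative case, where both applications can be changed, the advisory locking could look roughly like this. It is only a sketch using flock() via Python's fcntl module on a Unix-like system; the function names are invented, and it does nothing for a Watcher that never asks for the lock:

```python
import fcntl


def write_locked(path: str, data: bytes) -> None:
    """Downloader side: hold an exclusive advisory lock while appending data."""
    with open(path, "ab") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until no one else holds the lock
        f.write(data)
        f.flush()
        # closing the file releases the lock


def try_read(path: str):
    """Watcher side: return the file contents, or None if a writer holds the lock."""
    with open(path, "rb") as f:
        try:
            # Shared, non-blocking: fails immediately if an exclusive lock is held.
            fcntl.flock(f, fcntl.LOCK_SH | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        return f.read()
```

    Note there is still a small window between creating the file and taking the lock, so a real Downloader would want to hold the lock for the whole download rather than per write.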


    3. I've been on both sides of interviews like that - well pretty much, I never over egg simple questions as being deeply technical, but I go out of my way to avoid making people feel uncomfortable in interviews and ramp up the questions slowly and leave them open ended - it's often not the candidate's fault when they are sent to an interview for a job that they are not really ready for and there is no point in making someone feel bad who is already probably quite stressed out.
  • Franz_Kafka 2008-09-24 17:59
    Sam:
    Max:

    2) An interviewer who thinks downloads in progress are a problem but file copies in progress are not shows a lack of understanding.


    2) Actually, he was talking about a Move, not a Copy. Assuming your temporary directory is on the same filesystem as the destination, then a move requires only moving the root node of the file to the new directory node and is pretty much instant. Perhaps that's why he's the interviewer and you wouldn't get the job.


    To head off another exchange, this is almost certainly not tmp - you're probably going to keep the file around, so you may as well stick it in a dir on the destination fs.
  • mda 2008-09-24 18:01
    After reading all the comments, I'm pretty sure there are a lot of people here who really want to make things complicated and try to avoid simple solutions at any costs.

    /Bla/DownAndWatch for downloading
    /Bla/DownAndWatch/complete for the Watcher

    mv really IS the fastest and most simple solution. Forget about /tmp and stuff like that - this stuff just needs to be on the same physical volume, otherwise you need to double your free space needs.
  • Honeyman 2008-09-24 18:11
    /Bla/DownAndWatch for downloading
    /Bla/DownAndWatch/complete for the Watcher

    A tiny thing may stop you from thinking you are the smartest guy on all of TDWTF.
    You are not allowed to edit Watcher.
    So if, for some reason, the watcher watches the whole "/home/watcher/" directory, you'll just have no place for a separate downloading directory.
    And if you decide to download to "/home/watcher/download/" and watch the "/watcher/", then watcher may consider the /home/watcher/download/ directory a file too, and fail. Or it may watch the files recursively. Well, at least the opposite is not proven.
  • Honeyman 2008-09-24 18:13
    And btw, even
    /Bla/DownAndWatch for downloading
    /Bla/DownAndWatch/complete for the Watcher

    does not ensure that the downloading and the watching directories are on the same physical volume. Oh, those Unix IT specialists, they are so inventive...
  • Franz_Kafka 2008-09-24 18:15
    Honeyman:
    /Bla/DownAndWatch for downloading
    /Bla/DownAndWatch/complete for the Watcher

    A tiny thing may stop you from thinking you are the smartest guy on all of TDWTF.
    You are not allowed to edit Watcher.
    So if, for some reason, the watcher watches the whole "/home/watcher/" directory, you'll just have no place for a separate downloading directory.
    And if you decide to download to "/home/watcher/download/" and watch the "/watcher/", then watcher may consider the /home/watcher/download/ directory a file too, and fail. Or it may watch the files recursively. Well, at least the opposite is not proven.


    In that case, watcher might start trying to process .bash_profile. Anyway, the only requirement is to get another directory on the same filesystem. watcher is most likely configured using something like watcher.cfg, but even if it isn't, this isn't a big deal. So you create ~download/staging and download there, then move over (yeah, glossing over permissions - set your umask or run them under the same account).
  • Franz_Kafka 2008-09-24 18:17
    Honeyman:
    And btw, even
    /Bla/DownAndWatch for downloading
    /Bla/DownAndWatch/complete for the Watcher

    does not ensure that the downloading and the watching directories are on the same physical volume. Oh, those Unix IT specialists, they are so inventive...


    that falls under incompetence; you write your install doc to specify that they are on the same filesystem. mounting /Bla/DownAndWatch/complete as a new fs is not a supported config.
  • Mike 2008-09-24 18:17
    Make sure you do body { text-align: center; } too, otherwise IE 6 won't play nice.
  • jeremypnet 2008-09-24 18:29
    Honeyman:
    Don't do that.

    Do you have any sensible way to find any other directory available for temporary writing?

    Yes, it's quite simple. You put the temporary file in the destination directory but make sure it's hidden from the watcher in some way. How do you hide it? Well you give it an extension the watcher isn't looking for or you put it in a sub directory.

    If you are really paranoid, you can forgo the Unix mv command and write your own wrapper to the rename system call. The rename system call will fail with EXDEV if it attempts to move a file across file systems.

    If you don't think the download temp file/rename solution is the best one, why not check out your favourite browser as it downloads a 500 megabyte porn movie. In this case, of course, the watcher is not even a process on the computer, but a human being. Firefox, IE and Safari all implement variations on the theme.
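
    A minimal Python sketch of the temp-file-then-rename idea described above. The function name and the paths are hypothetical, and it assumes the watcher ignores whatever temporary name you pick. os.rename wraps the rename system call, so a cross-filesystem move surfaces as an OSError with errno EXDEV instead of silently turning into a copy:

```python
import errno
import os


def publish(data: bytes, tmp_path: str, final_path: str) -> None:
    """Write data under a temporary name, then rename it into place."""
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before publishing
    try:
        # Within one filesystem this only rewrites directory entries.
        os.rename(tmp_path, final_path)
    except OSError as e:
        if e.errno == errno.EXDEV:
            # tmp_path and final_path live on different filesystems.
            raise RuntimeError("temp file is on another filesystem") from e
        raise
```

    Called as, say, publish(payload, "/watched/.report.csv.part", "/watched/report.csv"), the watcher would only ever see the finished file under its final name.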
  • undrline 2008-09-24 18:38
    The interviewer started making the situation more complex than it had to be, for the same end-result of figuring out if a candidate is right for the job posted. Isn't he exemplifying the exact kind of candidate that he was trying to filter-out?
  • Franz_Kafka 2008-09-24 18:39
    jeremypnet:
    Honeyman:
    Don't do that.

    Do you have any sensible way to find any other directory available for temporary writing?


    If you don't think the download temp file/rename solution is the best one, why not check out your favourite browser as it downloads a 500 megabyte porn movie. In this case, of course, the watcher is not even a process on the computer, but a human being. Firefox, IE and Safari all implement variations on the theme.


    Funny, the IE I have does the download and copy thing, even when I tell it to save the file to a desktop. Porn movie, jdk, fedora ISO, whatever.
  • Iain Collins 2008-09-24 18:53
    Charles Duffy:
    Moreover, the tempfile-and-move approach (where, yes, you put your temporary file in the same directory or otherwise somewhere known to be on the same partition) is the generally accepted solution to this problem; if you don't know it, there's no way in hell I want you on my systems engineering team.

    Again, the problem the interviewer was looking for is the way every single UNIX developer with any kind of a clue does atomic file updates.

    Like David Emery, the commentator you were replying to, I would tend very strongly towards locking the file rather than moving the file.

    Temporary file locations are a good way to indicate to end users of desktop applications when a thing is finished downloading (in the way that, say, Firefox does with downloads when it renames them) but it doesn't seem a particularly great solution here and seems odd to insist it is "the one true way" when it comes to writing systems software - I certainly don't think it is any such thing (and is more obtuse than file locking, which exists to handle exactly the kind of problem described).

    If you went for moving a file around instead of simply locking you'd have to have a configuration option to indicate where the separate temporary directory is, and add quite a bit of error handling - e.g. to see if the temp directory is really not the same as the directory it's saving files to (and also possibly - from the description we have of how the Watcher works - that it's not a subdirectory), that the temporary directory is on the same volume, that the directory is not a link to somewhere on another volume (and or link to the same folder as the download folder) and possibly other things that don't immediately spring to mind.

    I can't see anything in favor of moving over simple locking. If you are writing to a file at any point you ought to be getting at least an advisory file lock anyway, assuming you are not a cowboy, so moving the file ultimately just adds a bunch of extra possible failure scenarios.
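
    For reference, a rough Python sketch of advisory locking with flock on a Unix system. The function names are made up for illustration. Note the catch: the lock only protects the file if the reader also asks for it, so it helps only when the Watcher cooperates.

```python
import fcntl
import os


def write_locked(path: str, data: bytes) -> None:
    """Hold an exclusive advisory lock for the whole write."""
    with open(path, "ab") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until we own the lock
        try:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def is_being_written(path: str) -> bool:
    """Cooperating reader: True if someone else holds the exclusive lock."""
    with open(path, "rb") as f:
        try:
            # Non-blocking shared lock; fails while a writer holds LOCK_EX.
            fcntl.flock(f, fcntl.LOCK_SH | fcntl.LOCK_NB)
        except BlockingIOError:
            return True
        fcntl.flock(f, fcntl.LOCK_UN)
        return False
```

    A reader that never calls flock can still open and read the file mid-write, which is why this is called advisory locking.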

  • James 2008-09-24 18:55
    No, that wouldn't work. The file record will show up in the watched directory in the midst of copying and your watcher program catches it in mid-copy. You have the same problem as mid-download, just a much narrower range of failure opportunity.

    The better solution (assuming you cannot PUSH from the downloading location) is to have the downloader and watcher's logic (not its directory monitoring features) in the same application. On top of that, you probably ideally want some sort of streaming solution with a backup-to-disk feature if you still want those files sitting around.

    A directory monitoring approach will never work in today's filesystem implementations. What is needed is a true indicator of download/copy completeness flag within the file's metadata; and naturally for OS functions and custom downloaders to support said metadata.
  • Nate 2008-09-24 18:55
    smasher:
    "moving" a file and "copying" a file are not the same. mv is hardly atomic, but if the file appears in the target directory, it's already been written to disk.


    Actually, on most file systems, "moving" a file, provided it's on the same disk and partition, doesn't actually move the data, it just changes the record in the allocation table that says where the file is, so as long as the file was fully downloaded before the move, the problem of hitting the end of an incomplete file could never occur.
  • Franz_Kafka 2008-09-24 18:59
    Iain Collins:
    Charles Duffy:
    Moreover, the tempfile-and-move approach (where, yes, you put your temporary file in the same directory or otherwise somewhere known to be on the same partition) is the generally accepted solution to this problem; if you don't know it, there's no way in hell I want you on my systems engineering team.

    Again, the problem the interviewer was looking for is the way every single UNIX developer with any kind of a clue does atomic file updates.

    Like David Emery, the commentator you were replying to, I would tend very strongly towards locking the file rather than moving the file.

    I disagree.


    I can't see anything in favor of moving over simple locking. If you are writing to a file at any point you ought to be getting at least an advisory file lock anyway, assuming you are not a cowboy, so moving the file ultimately just adds a bunch of extra possible failure scenarios.


    suppose you're downloading that 5G file and the router dies halfway through. Your download thingy shoots itself in the head because woops, bug in the code. File lock goes away, watcher chews on half downloaded file, explodes.

    Now suppose you're using one of the other two methods - lock files (adv. locking) or some other dir. Same scenario, but the fallout is a half downloaded file and the watcher doesn't try to eat it. As a bonus, you can recover easily when the router comes back, depending on your protocol, or just try again. It's also easy to write monitors to catch stale files and that sort of thing compared with file locking.
  • Franz_Kafka 2008-09-24 19:01
    James:
    No, that wouldn't work. The file record will show up in the watched directory in the midst of copying and your watcher program catches it in mid-copy. You have the same problem as mid-download, just a much narrower range of failure opportunity.


    No, this is a move - there is no such thing as mid-copy.


    A directory monitoring approach will never work in today's filesystem implementations. What is needed is a true indicator of download/copy completeness flag within the file's metadata; and naturally for OS functions and custom downloaders to support said metadata.


    That's basically what the whole .PROCESSING, .TRANSMIT convention does.
  • DaveK 2008-09-24 19:10
    Franz_Kafka:
    jeremypnet:
    Honeyman:
    Don't do that.

    Do you have any sensible way to find any other directory available for temporary writing?


    If you don't think the download temp file/rename solution is the best one, why not check out your favourite browser as it downloads a 500 megabyte porn movie. In this case, of course, the watcher is not even a process on the computer, but a human being. Firefox, IE and Safari all implement variations on the theme.


    Funny, the EG I have does the download and copy thing, even when I tell it to save the file to a desktop. Porn movie, jdk, fedora ISO, whatever.

    FTFY :)

    (yes. IE does it wrong. No surprise there. FF and others get it right.)
  • Iain Collins 2008-09-24 19:35
    Franz_Kafka:
    suppose you're downloading that 5G file and the router dies halfway through. Your download thingy shoots itself in the head because woops, bug in the code. File lock goes away, watcher chews on half downloaded file, explodes.

    Now suppose you're using one of the other two methods - lock files (adv. locking) or some other dir. Same scenario, but the fallout is a half downloaded file and the watcher doesn't try to eat it. As a bonus, you can recover easily when the router comes back, depending on your protocol, or just try again. It's also easy to write monitors to catch stale files and that sort of thing compared with file locking.


    You have to handle dealing with the connection disappearing during the download, locking or no locking - I've written many applications and scripts in a range of languages which handle large file transfers reliably and it's not made any more complicated by locking, quite the contrary (it is really wacky to write to files without locking though). Neither, I should add, does it in any way prevent you from implementing a resumable download.

    File locking is really, really something you should always be doing when writing to a file regardless of whether you are using a temp file or not. Simply not bothering to use advisory locking is bad form.

    Hint for one reason why: If you get two copies of the Downloader running, let alone anything else trying to access that file while it's still being written it is going to end up corrupt and the Watcher can either end up processing duplicate data (or simply crashing, which is what it seems to do when it comes across malformed data).

    What you're suggesting is an easier solution only for the sort of rent-a-coder who (a) doesn't intend to do any file locking at all and (b) doesn't intend to do error handling on the download and (c) doesn't care about the integrity of the data.

    Of course, doing BOTH temporary file handling AND advisory locking is arguably an ideal technical solution, but of course it will take considerably longer to implement and test the appropriate level of error handling for all the new potential issues it raises compared with just doing some simple file locking that should be in the application in the first place. One is a straightforward bug fix, the other introduces new functionality to the application.

  • OBloodyhell 2008-09-24 19:52
    Marc:
    Rename?


    My thoughts exactly. Apparently, you're writing the downloader, so
    a) Why, exactly, can't it have an EOF marker, since you're also writing the file processor?
    b) Regardless of 'a': Simplest solution of all -- Rename "download file xx" to "Done-download file xx". Your watcher only sees files beginning with "Done-" thanks to file filtering.
  • Franz_Kafka 2008-09-24 20:16
    OBloodyhell:
    Marc:
    Rename?


    My thoughts exactly. Apparently, you're writing the downloader, so
    a) Why, exactly, can't it have an EOF marker, since you're also writing the file processor?
    b) Regardless of 'a': Simplest solution of all -- Rename "download file xx" to "Done-download file xx". Your watcher only sees files beginning with "Done-" thanks to file filtering.


    a: we haven't really said anything about the format of the file, and we don't really want the downloader to know any format details because that means you have to update it when a new format comes around.
  • m0ffx 2008-09-24 20:19
    I have to confess to not seeing the solution to the downloader/watcher problem.

    I would have had the watcher check the file size, and when it stops increasing, process the file. However, that will fail if the download completely stalls.

    Still, I'm not a developer.
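
    The size-polling idea can be sketched in a few lines of Python. This is an illustration only: the function name and the default interval are invented, it would have to live in the watcher (which the problem says cannot be changed), and as noted above a stalled download fools it.

```python
import os
import time


def looks_complete(path: str, interval: float = 1.0, checks: int = 3) -> bool:
    """True if the file size stayed the same across `checks` polls."""
    last = os.path.getsize(path)
    for _ in range(checks):
        time.sleep(interval)
        size = os.path.getsize(path)
        if size != last:
            return False  # still growing (or truncated): not done yet
        last = size
    return True  # unchanged for checks * interval seconds; a heuristic only
```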
  • Bizzle 2008-09-24 20:21
    Is it just me, or was the interviewer from the first story a total jerk?
  • Xythar 2008-09-24 20:32
    brodie:
    Someone You Know:
    Gorfblot:
    A diamond in the rough is something that looks to be of little value, but is actually worth quite a lot once some polishing has been done. IE, You talked with one of the Tier 1 helpdesk people and realized they had some talent and were quick to learn. You get them some further training, move them to a junior development position, and in a short time they become a major contributor to success- That's a diamond in the rough.


    I don't think "IE" means what you think it means.

    It looks like he's using it as the abbreviation for "id est," which essentially means "in other words..." Which is perfectly fine, as used.

    You use e.g. when you want to give specific examples, and i.e. when you want to elaborate on something.


    It's not correct because he was giving a specific example and should have used e.g. Using i.e. would have meant that the *only* meaning of diamond in the rough was the situation he described specifically.

    For example if you worked for a company that only printed newspapers you could say "our publications (i.e. newspapers)". If your company printed books and magazines as well it would only be correct to say "our publications (e.g. newspapers)".
  • Dracolith 2008-09-24 20:56
    Max:

    2) An interviewer who thinks downloads in progress are a problem but file copies in progress are not shows a lack of understanding.


    Not at all.. UNIX. Using rename() just moves the link to the file to a new directory, provided source and destination are on the same filesystem.

    The file is already saved to disk, nothing about the file changes other than an entry is created in the new directory.

    As far as software running on the system may be concerned, this is completely atomic. And this is very frequently the method used by various tools.

    For example, rsync uses this method when transferring a new file to a host (a temporary file is created during the transfer, and after the transfer, it is rename()'d to the final location of the file).




  • hatterson 2008-09-24 21:00
    blindio:
    Seems to me that this is a 3rd party application (the watcher) as it's not something we can change, yet it crashes on a pretty obvious scenario. I think the solution is to contact the vendor and tell them to fix their buggy POS software and send me a patch.


    In an ideal world that would be great. However in the real world this isn't always the case. Especially when dealing with government employees you often run across the "that's the way it is, deal with it" or "it's not specified in the contract"
  • Dracolith 2008-09-24 21:06
    David Emery:
    The "open exclusive" is the -right way*- to do this. If /tmp is located on a different file system/drive than the destination, then the file move operation is not guaranteed atomic.


    So guarantee the rename() is atomic. Put the new file in the same directory, but prefix it with a ".", give it a naming convention that will cause other processes not to mess with it, or place it in a subdirectory.
    Open exclusive does not work, because there is no such feature.

    I mount my filesystems using NFS on UNIX which does not provide proper file locking of any sort.
    It is common practice to use such filesystems.

    rename() is pretty sure to work everywhere, "open exclusive" is only valid if you make liberal and unwarranted assumptions about OS and filesystem types.






  • hatterson 2008-09-24 21:06
    Nicolas Verhaeghe:
    Smash King:
    klenow:
    Is it just me, or does Jeremy sound like a bit of a jerk? He simply wouldn't let the guy use any solution that wasn't his pet solution. Seems the simplest solution is to have the Downloader activate the Watcher when it's done.
    You cross a dimensional portal and gets stuck in a world where interviewers are unable to use adaptive interviews "IE" those that change to turn whatever answer the interviewee provided automatically wrong. Now what do you do?


    I too could find FIVE different solutions, just as simple

    1-The watcher is activated by the downloader
    2-The watcher and the downloader are one same application
    3-While being downloaded, the file is named "tmp_[filename]" and then renamed "[filename]" and the watcher knows not to open files starting with "tmp_"
    4-Downloader is scheduled to work at the top of the hour, Watcher scheduled to work at the bottom of the hour, and if you have a T1, what kind of a file would take more than 30 minutes to download?!?
    5-Watcher is coded properly with TRY/CATCH clauses and you go from there...

    Why would you have a "watcher" spend its time hogging processor time and memory 24/7 anyway?

    In my world, my SSIS packages fire on schedule and do their tasks sequentially.

    If the interviewer does not accept any other solution than his, this means only one thing: control freak. Take your stuff and say bye bye.


    1 - Requires having access to the server the watcher is running on. This may or may not be the same server as the downloader.
    2 - Requires modifying the watcher.
    3 - Is essentially the same as the proposed solution. Rename is the exact same as move when the move is done on the same filesystem.
    4 - This assumes that you can modify the run schedule of the watcher and also places a hard cap on the size of file that can be transferred, although it may work it's far from ideal.
    5 - In an ideal world that would be great, in the real world it's likely that you'll have to interface with a PoS at some point and simply have to deal with it.
  • moz 2008-09-24 21:27
    Jules Winnfield:
    brodie:
    You use e.g. when you want to give specific examples, and i.e. when you want to elaborate on something.
    I use IE when I want to install malware.

    Really? How odd. I use Apache, myself.
  • Franz_Kafka 2008-09-24 21:38
    moz:
    Jules Winnfield:
    brodie:
    You use e.g. when you want to give specific examples, and i.e. when you want to elaborate on something.
    I use IE when I want to install malware.

    Really? How odd. I use Apache, myself.


    All depends on whether you're giving or receiving.
  • gero 2008-09-24 21:52
    OK, a FAQ about the different "solutions" offered

    Q. What if it's on a different partition?
    A. Then make a temporary directory inside the same directory. You're guaranteed it will be on the same partition.

    Q. Locking?
    A. You can't control the watcher

    Q. Rename instead of move?
    A. It's the same damn thing. The system call, not the system utility "mv" (which is the same thing when on the same partition).

    Q. Activate the watcher after the downloader.
    A1. You are downloading multiple files... the watcher could process any "unfinished" file in this case.
    A2. There is probably a difference between the "watcher" and the file "processor" (as in functionality, though it sounds they are the same program in this case). You probably want to start the processor on the downloaded file, but crap, you can't modify it.
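
    On the first question, if you would rather check than trust the directory layout, a small Python sketch (the function name is hypothetical) can compare the device IDs that stat reports for the staging and destination directories. A subdirectory can itself be a mount point, so the check is cheap insurance:

```python
import os


def same_filesystem(path_a: str, path_b: str) -> bool:
    """Compare the device IDs that stat() reports for two existing paths."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

    If this returns False for the staging and watched directories, a rename between them would fail with EXDEV and mv would fall back to a copy.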
  • ContraCorners 2008-09-24 21:56
    blindio:
    Seems to me that this is a 3rd party application (the watcher) as it's not something we can change, yet it crashes on a pretty obvious scenario. I think the solution is to contact the vendor and tell them to fix their buggy POS software and send me a patch.


    The post didn't say anything about a retail application. How did you know it was Point Of Sale? oh... I get it...

    never mind
  • JimBob 2008-09-24 21:57

    Say that to customer? Was the candidate explicitly permitted to apply his own restrictions - to be able to say to an imaginary customer the words like "well, we'll do the simple solution, but it won't work if you mount the temporary directory on a different disk, etc, etc"?

    What if's are great and a valuable tool to any software engineer. However, if you follow the thread for too long, you never get anything done. There are an infinite chain of "what if's" that must be accounted for in even the most trivial of exercises. The difference between someone good at asking questions and someone good at finding solutions is knowing when to draw the line.

    The candidate could have established that line immediately by just asking a couple of basic questions to determine the parameters of the problem. Instead he follows a progressively more convoluted trail to answer a problem he was told has a simple solution - culminating at modifying the linux kernel itself.

    The interviewer did not say it was the best solution. He said it was simple. If I get someone in front of me who can't see the forest for the trees, I'm going to pass on him too.

  • tgape 2008-09-24 22:01
    James R. Twine:
    Ok - so the majority here believe that mv is an atomic operation if the src and dst are on the same filesystem (or partition?)...

    Does this hold true for ALL filesystems running under Linux/Unix? It might be that the directory is not running on ext2/ext3... What if it was a mounted FAT32 partition, or ReiserFS, or UFS, or a SMB share?


    It holds for every single file on any POSIX OS, regardless of whether that file is on a native filesystem or not. This is because rename(2) is atomic, according to POSIX, so if you want to be POSIX, rename better be atomic. And it's incredibly easy to have rename(2) be sufficiently atomic for this purpose. (Note: even if rename was implemented by doing a link(2) followed by an unlink(2) of the old name, it would *still* be fast enough, unless Watcher is so fragile that it dies if it ever sees a file with multiple links. For it to do that, it'd pretty much have to be coded to do that, however - and it would still be such a tiny window that it's virtually inconceivable that it would happen regularly.)

    Btw, this means Windows, too. POSIX is a big thing.

    (Note: there are many unix OSes which have optional POSIX support. However, their non-POSIX mode is still loosely POSIX, and still does this. It's too necessary to not do it, and it's too useful to write the code to do it and then not always use it.)
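
    To make the link-then-unlink variant mentioned above concrete, here is a short Python sketch. The function name is made up, and this is not how any particular kernel implements rename; it only shows why the data never moves and why the file briefly has two names:

```python
import os


def rename_via_link(src: str, dst: str) -> None:
    """Emulate a rename with link(2) followed by unlink(2)."""
    os.link(src, dst)  # dst now names the same inode; fails if dst exists
    # Between these two calls the file is visible under both names
    # and os.stat(dst).st_nlink reports 2.
    os.unlink(src)  # drop the old name; the file contents are untouched
```

    Unlike rename(2), this version refuses to overwrite an existing destination, so it is not a drop-in replacement.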
  • tgape 2008-09-24 22:03
    klenow:
    Is it just me, or does Jeremy sound like a bit of a jerk? He simply wouldn't let the guy use any solution that wasn't his pet solution. Seems the simplest solution is to have the Downloader activate the Watcher when it's done.


    The Watcher program runs as a different user. suid doesn't work. Since both programs run as non-privileged users, one cannot spawn the other.

    Oh, and Watcher's a fragile piece of crap; it'll freeze if it hits a file it can't read, it requires a kill -9 to restart, and that will corrupt its database. So don't even go there.

    I'm sorry about that - I didn't write Watcher; it's 25 years old, and it's 25M of hand-written assembly - no comments. All of the developers retired; most of them are dead, but there's one guy left who's living in an insane asylum. Some say it was from working with that code for five years after all of the other developers had retired.


    For what it's worth, it's not 'changing the scenario' if you're revealing more information about the situation which wasn't previously known. If the new information is inconsistent with the old, that's a different story.
  • tgape 2008-09-24 22:06
    James:
    No, that wouldn't work. The file record will show up in the watched directory in the midst of copying and your watcher program catches it in mid-copy. You have the same problem as mid-download, just a much narrower range of failure opportunity.

    The better solution (assuming you cannot PUSH from the downloading location)


    I call troll. Nobody could possibly think having the 'downloading' application running on a different machine would improve the IPC (Inter-process communication, for the clueless who are obviously reading (and posting) here.)
  • ContraCorners 2008-09-24 22:11
    Franz_Kafka:
    Duke of New York:
    Franz_Kafka:
    If I were that guy (I'm not) and I got sued, I would stand up in court and say "Sorry your Honor, but I'm a complete asshole. I treat men and women equally poorly".

    That wouldn't keep him from having to testify as to the specific nature of the "personal problem" (not "personality problem") that was not related to her professional qualifications for the job. Or from having lawyers sift through his e-mails for evidence of a past pattern of behavior.

    Anyone who does interviews and doesn't see how clear-cut this was, is a walking liability and needs to get trained.


    That's easy - he's hiring toadies and footstools. In a less pejorative example, I'm allowed to pass on hiring someone because they don't fit in with the team, even if they're professionally qualified.

    Being an abusive jerk isn't illegal.


    No one is saying that you have to hire someone with a "problem on the personal level." What (I think) the Duke is saying is that at the very least...

    a) being an abusive jerk isn't a defense in court.
    b) "Shari" would have a legitimate case against which you would have to defend yourself and who wants to go through that hassle.

    You don't have to hire anyone who doesn't fit in... you have my permission to be an abusive jerk... But when you decide, based on a piece of paper and two spoken sentences, that a job applicant has "problems at the personal level" you had better be prepared to explain what those problems are and why they make her someone you wouldn't hire.
  • tgape 2008-09-24 22:19
    akatherder:
    Mover And Copier:
    The problem might be that when I'm told I may not modify the Watcher, how the heck am I supposed to know that I might be allowed to modify the Downloader?


    So you're given a problem with 2 entities. The person who gives you the problem says "Solve this problem without changing entity #1." You assume that you can't do anything to entity #2?

    The only rational thing I can assume is that changing entity #2 is the ONLY way to solve it or else this is a bullshit trick question.


    In fact, I like to give interview questions which require one to understand that "not being able to change entity #1 does not mean one cannot change entity #2." It's such a fundamental bit of logic, which has come up many times in my career.

    And, in the many times I've been called to clean up someone else's royal mess, a failure to apply that bit of logic has almost always been the root cause of the initial mess, as well as the cause of numerous of the additional messes added on top of that initial mess to raise it from a simple mess through complete mess to royal mess.

    So, while I feel sympathy for those people who cannot understand it, there is no way I want to sponsor employment for such people in any 'thinking' job.
  • Vertigo 2008-09-24 22:19
    Dave never wanted to give you the job, whoever you turned out to be. He just wanted the interview to go badly so he could get back to other things.
  • gero 2008-09-24 22:23
    ContraCorners:
    you had better be prepared to explain what those problems are and why they make her someone you wouldn't hire.
    the receptionist:
    And actually, Dave, the guy you’ll be meeting with, is out to lunch.

    Being "politically correct" could probably be the right thing to do in certain situations (I personally hate this use of female gender personal pronouns but that's not the point) but in this case, calling Dave "her" would be close to insulting.
    And because I hate this political "correctness" I pick on it.
  • tgape 2008-09-24 22:32
    I see a common thread on these interview response discussions I'd like to mention.

    Specifically, the interviewer sets up a scenario. He or she then refines the scenario by adding complications into the picture.

    Then, the interviewee and the various forum members start adding their own complications into the picture. "What if there's no other directory on the partition you can write to?", for example.

    Please stop this. It is the interviewer's place to add clarification. (Yes, I realize I put a post above with a clarification claiming that the Watcher was excessively fragile, and I spawned some interviewer-ish explanation why. But I wasn't explaining why the interviewer's answer wouldn't work; I was providing justification for the conditions the interviewer gave arbitrarily.)

    As an interviewee, you should be looking for possibilities that the interviewer has left open, rather than trying to close the remaining holes.

    Then, later, as a new employee, you should be looking for possibilities that the various PHBs and clueless gits who have preceded you have not managed to block off. Don't complete the barricade, but figure out what path remains open and take it.
  • tgape 2008-09-24 23:20
    The first story reminds me of one of the many interviews I had when I was first looking for an IT job, many years ago.

    I arrived 25 minutes early, but I brought a book to read. I chatted briefly with the receptionist, and indicated that I was intentionally early, as that way I couldn't possibly lose track of time - she'd alert me when he was ready to interview me.

    About 5 minutes later, a man hurried out of the office area. He told the receptionist he was running late and going to grab a quick bite - he might be late for his 1:00 interview. She indicated I was already there. He looked over, and said, "Hey, want to grab lunch?"

    The interview didn't go well; he didn't get the employee. But, on the bright side, he did get a few pointers on where he was going wrong on some of his projects, and he was able to get reimbursed for the meal. Years later, I encountered the guy he did end up hiring; my suggestions saved him several months of effort.

    (Note: technically, he turned me down as 'over-qualified'. I got that a lot, as I'd blown off my classes, but to mess around with computers more and learn more about them.)


    As someone who has given interviews to quite a few people (my first job was at a rapidly growing contracting company; it was continuously hiring for several years, during which time I gave dozens if not hundreds of interviews), I'd say Shari appears to have an excellent case. It's possible that discovery would turn up a consistent behavior pattern for the defendant, but it's also possible her alma mater isn't the only thing in the story which is brown, in which case that may not matter so much. (Same goes for any other racial minority, although to a lesser extent.) I'm not 100% certain, because it has been quite some time since I've given an interview, but I think Dave broke virtually every rule which we were explicitly given for interview behavior. Since it doesn't say anything about how he looked at her, I would rather just assume he broke that rule, too, to keep things simple. (Note: they didn't explicitly forbid any overt sexual actions, except by stating that the employee behavior guidelines still applied, since interviewers were getting paid for their time, so I'm not including those. I'm just betting that he didn't keep his gaze "neutral.")
  • fred1024 2008-09-25 00:21
    you need a better operating system if mv is not atomic
  • Cpt 2008-09-25 00:54
    crystal mephistopheles:

    “Oh, okay,” the candidate replied. He pondered for a full minute and said “so in that case, I would have the Watcher listen on a TCP/IP port, and have the Downloader tell it when it was done downloading.”

    “That seems like a lot of work,” I said....


    I don't think it's fair to say network I/O is "a lot of work". Granted, his temporary directory solution is even simpler, but most high level languages (.NET, Java, etc.) have fully defined classes you can use to implement this in only a few lines of code, and I would classify this as an acceptable solution.


    And the temporary directory solution will pose the same problem: what if the copy function is not finished yet and the Watcher gets more CPU cycles? The only solution in this case is to modify the protocol (or rather introduce a very crude one) with a separate flag file which contains a value indicating that either the provider is downloading, the provider is ready, the watcher is reading or the watcher is ready. From these 4 states both parties can derive what the other is or has been doing (or not) and whether it is safe to do its stuff. If there is a contention for this status file you could add a second locking file (just keep it locked, nothing more) and create a true double semaphore.
  • Zaippa 2008-09-25 01:06
    I haven't read more than page 1 and 2, but about copying not being atomic - i.e. that you can see partial files in a directory while copying...

    I would assume that the OS doesn't put the filename into a directory before it has finished copying the data. If that is the case, you'd never get partial files in a directory from a copy.

    I mean, this is how i thought copying worked:
    - first copy the data to the destination sectors on the hd
    - AFTER this is done, place a new file entry into the directory

    But it does it the other way around?
  • Zaippa 2008-09-25 01:10
    Hmm... just tried copying a large file (on windows). It does show the file in the destination directory while copying (in explorer and cmd)... So it looks like it does insert the file entry first.. Never mind :)

    (but, well.. this is also a somewhat contrived example)
  • Franz Kafka 2008-09-25 01:18
    tgape:
    I'm not 100% certain, because it has been quite some time since I've given an interview, but I think Dave broke virtually every rule which we were explicitly given for interview behavior. Since it doesn't say anything about how he looked at her, I would rather just assume he broke that rule, too, to keep things simple. (Note: they didn't explicitly forbid any overt sexual actions, except by stating that the employee behavior guidelines still applied, since interviewers were getting paid for their time, so I'm not including those. I'm just betting that he didn't keep his gaze "neutral.")


    This reminds me of one of my jobs (well known software company) - they don't have a process for vetting that you are allowed to interview people, they just toss you in with a vague idea of how to proceed. I didn't hear about us being sued when I was there, but I sort of expected to.

    Cpt:

    And the temporary directory solution will pose the same problem: what if the copy function is not finished yet and the Watcher gets more cpu cycles?


    For the last time, this is a copy and not a move. move takes almost no time, and is atomic. Just forget about it.
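
    The difference can be sketched in a few lines. This is only an illustration, not the Downloader from the story: the helper name `publish` and both directory arguments are made up, and it assumes the staging directory and the watched directory are on the same filesystem, which is the case where a rename is atomic.

```python
import os
import tempfile


def publish(data: bytes, staging_dir: str, watched_dir: str, name: str) -> str:
    """Write `data` in a staging directory, then rename it into the watched one.

    Hypothetical helper; assumes both directories are on the same filesystem.
    """
    staging_path = os.path.join(staging_dir, name)
    with open(staging_path, "wb") as f:
        f.write(data)          # the slow part happens out of the Watcher's sight
        f.flush()
        os.fsync(f.fileno())   # push the bytes to disk before publishing the name

    final_path = os.path.join(watched_dir, name)
    # A rename within one filesystem only changes directory entries, so the
    # Watcher sees either no file at all or the complete file.
    os.replace(staging_path, final_path)
    return final_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        staging = os.path.join(root, "staging")
        watched = os.path.join(root, "watched")
        os.mkdir(staging)
        os.mkdir(watched)
        print(publish(b"payload", staging, watched, "big.dat"))
```

    A copy, by contrast, creates the destination name first and fills it in afterwards, which is exactly the window the Watcher falls into.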
  • Kensey 2008-09-25 01:19
    You're missing the fact that the first quote refers to the candidate and the second refers to the interviewer.
  • Franz Kafka 2008-09-25 01:20
    Zaippa:
    I haven't read more than page 1 and 2, but about copying not being atomic - i.e. that you can see partial files in a directory while copying...

    I would assume that the OS doesn't put the filename into a directory before it has finished copying the data. If that is the case, you'd never get partial files in a directory from a copy.

    I mean, this is how i thought copying worked:
    - first copy the data to the destination sectors on the hd
    - AFTER this is done, place a new file entry into the directory

    But it does it the other way around?


    Where would you store the data you were copying if there wasn't a file to put it in? Just how do you think files are stored? (hint: lots of stuff on google to satisfy your curiosity).
  • Tourist 2008-09-25 01:21
    Nicolas Verhaeghe:

    I too could find FIVE different solutions, just as simple


    1-The watcher is activated by the downloader

    > what's the point with the watcher in this context?

    2-The watcher and the downloader are one same application

    > sounded from the description they were not the same process.

    3-While being downloaded, the file is named "tmp_[filename]" and then renamed "[filename]" and the watcher knows not to open files starting with "tmp_"

    > modifying the watcher which wasn't allowed

    4-Downloader is scheduled to work at the top of the hour, Watcher scheduled to work at the bottom of the hour, and if you have a T1, what kind of a file would take more than 30 minutes to download?!?

    > here you are assuming the size and the download speed are always predictable and that downloader is not downloading say every five minutes. You know what they say about assumptions

    5-Watcher is coded properly with TRY/CATCH clauses and you go from there...

    > ;-) the problem is not on this level

    Why would you have a "watcher" spend its time hogging processor time and memory 24/7 anyway?

    > because maybe it is not predictable when files arrive? or the watcher feeds some other program where they want to see the results asap etc.


  • Kensey 2008-09-25 01:22
    gero:
    ContraCorners:
    you had better be prepared to explain what those problems are and why they make her someone you wouldn't hire.
    the receptionist:
    And actually, Dave, the guy you’ll be meeting with, is out to lunch.

    Being "politically correct" could probably be the right thing to do in certain situations (I personally hate this use of female gender personal pronouns but that's not the point) but in this case, calling Dave "her" would be close to insulting.
    And because I hate this political "correctness" I pick on it.


    The above is what my previous comment would be placed under if I had any clue what I was doing at a time of day when I haven't slept yet and am supposed to get up in five hours to go to work.
  • Duke of New York 2008-09-25 01:30
    tgape:
    my first job was at a rapidly growing contracting company

    Sounds like the company I worked for in the 90s. It grew rapidly, then contracted.
  • Eternal Density 2008-09-25 01:45
    Smash King:
    You cross a dimensional portal and get stuck in a world where interviewers are unable to use adaptive interviews "IE" those that change to turn whatever answer the interviewee provided automatically wrong. Now what do you do?
    I pick up my crowbar and look for zombies to bash. No, not zombie processes, headcrab zombies.
  • Zaippa 2008-09-25 01:48
    Long time since i've been looking at real industrial strength filesystems, but in the old CBM DOS V2.0 a file is 1) an entry in the directory, 2) the data itself.

    The entry in the directory contains the filename and a pointer to the first track/sector of the data, the file should contain.

    The data itself, is of course stored at the track/sectors of the disk.

    So, it is _easily_ possible to save the data first, and insert the entry in the directory (which essentially is just a pointer to the data + a name) afterwards. (at least in CBM DOS V2)

    It seems like you suggest that the data for a file is stored inside the directory entry itself..? (if that is the case, i don't see how hardlinking in unix could work - that would have to copy all the data for each hardlink..)

    Or am i missing something? (maybe it's all different in modern filesystems)
  • Zaippa 2008-09-25 01:56
    My above post should have been a reply to:

    Franz Kafka:
    Zaippa:
    I haven't read more than page 1 and 2, but about copying not being atomic - i.e. that you can see partial files in a directory while copying...

    I would assume that the OS doesn't put the filename into a directory before it has finished copying the data. If that is the case, you'd never get partial files in a directory from a copy.

    I mean, this is how i thought copying worked:
    - first copy the data to the destination sectors on the hd
    - AFTER this is done, place a new file entry into the directory

    But it does it the other way around?


    Where would you store the data you were copying if there wasn't a file to put it in? Just how do you think files are stored? (hint: lots of stuff on google to satisfy your curiosity).
  • Franz Kafka 2008-09-25 01:59
    Zaippa:
    Long time since i've been looking at real industrial strength filesystems, but in the old CBM DOS V2.0 a file is 1) an entry in the directory, 2) the data itself.

    The entry in the directory contains the filename and a pointer to the first track/sector of the data, the file should contain.

    The data itself, is of course stored at the track/sectors of the disk.

    So, it is _easily_ possible to save the data first, and insert the entry in the directory (which essentially is just a pointer to the data + a name) afterwards. (at least in CBM DOS V2)

    It seems like you suggest that the data for a file is stored inside the directory entry itself..? (if that is the case, i don't see how hardlinking in unix could work - that would have to copy all the data for each hardlink..)

    Or am i missing something? (maybe it's all different in modern filesystems)


    It's all different in modern filesystems, at least unix ones.

    A common implementation for unix filesystems is as follows:
    each file is stored in an inode, which stores permission and ownership info along with pointers to the blocks that contain the data. Directory entries contain a name and an inode and are themselves files.

    Generally, inodes contain enough block pointers (that point to the data blocks) to fill up leftover space in a data block, minus some entries at the end that do various levels of indirection for big files. The actual data is stored contiguously (mostly) when possible, and fses like ext2 do things like allocating 8 blocks at a time to speed up access.

    Meanwhile, directories can be indexed by name, so 10,000 files in a dir is fast as hell.

    So you see, there's no way to store data without creating a file to put it in, but since you can just write the file somewhere, then create a file entry in some other dir, it's no big deal.
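
    The "write it somewhere, then add an entry elsewhere" step can be tried out with a hard link. This is a rough sketch assuming a POSIX-style filesystem where both directories are on the same volume; `link_into` and the directory names are invented for the example.

```python
import os
import tempfile


def link_into(src_path: str, dest_dir: str) -> str:
    """Add a second directory entry (hard link) for an already written file."""
    dest_path = os.path.join(dest_dir, os.path.basename(src_path))
    os.link(src_path, dest_path)   # new name, same inode, no data is copied
    return dest_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        work = os.path.join(root, "work")
        inbox = os.path.join(root, "inbox")
        os.mkdir(work)
        os.mkdir(inbox)
        src = os.path.join(work, "data.bin")
        with open(src, "wb") as f:
            f.write(b"x" * 1024)
        dest = link_into(src, inbox)
        # Compare the inode numbers behind the two names, then the link count.
        print(os.stat(src).st_ino == os.stat(dest).st_ino)
        print(os.stat(dest).st_nlink)
```

    Removing the original name afterwards leaves the data reachable through the new entry, which is roughly what a same-filesystem move amounts to.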
  • Zaippa 2008-09-25 02:23
    Franz Kafka:

    It's all different in modern filesystems, at least unix ones.

    A common implementation for unix filesystems is as follows:
    each file is stored in an inode, which stores permission and ownership info along with pointers to the blocks that contain the data. Directory entries contain a name and an inode and are themselves files.

    Generally, inodes contain enough block pointers (that point to the data blocks) to fill up leftover space in a data block, minus some entries at the end that do various levels of indirection for big files. The actual data is stored contiguously (mostly) when possible, and fses like ext2 do things like allocating 8 blocks at a time to speed up access.

    Meanwhile, directories can be indexed by name, so 10,000 files in a dir is fast as hell.

    So you see, there's no way to store data without creating a file to put it in, but since you can just write the file somewhere, then create a file entry in some other dir, it's no big deal.


    Ok, thanks for the clarification. Not sure i get the point about why the directory entry can't be made afterwards, though. But i'm tired, so that is probably why. ;)

    (I just pulled out my old tanenbaum book (modern operating systems), took a _quick_ look at the section on the UNIX V7 file system. It says that a directory entry, is in fact just a filename + a pointer to an i-node. So it does seem to me that it is possible to create the i-node (etc) before creating the directory entry.)

    Anyway, i'm tired, and you're right - it's getting a bit off topic. :)
  • Earl Colby Pottinger 2008-09-25 02:32
    ST:
    Thanks for the interview tales, this is one of my favourite sections. Mind you, I'm pretty shocked at how many of the resident professionals are trying to come up with alternative answers for the problem in the second tale. Obviously you use a temp filename (ignored by the watcher) or a temp directory. What kind of mindset comes up with anything else?


    Real people. Personally, while reading the problem I right away came up with a solution - Have 'Watcher' look for the next file before processing the present file.

    Example, If you expect to see File1, File2, File3 .... FileX
    then process File1 when File2 appears, process File2 when File3 appears ... etc ... FileX I would process after a reasonable delay or add a dummy End_Of_Files file.
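
    As a sketch of that rule: the function below assumes names of the form File<N> (taken from the example above) and treats every file except the highest-numbered one as complete. The function name is hypothetical, and the last file would still need the delay or the dummy End_Of_Files marker.

```python
import re
from typing import List


def ready_to_process(names: List[str]) -> List[str]:
    """Return files that are safe under the 'next file has appeared' rule.

    Assumes hypothetical names of the form File<N>; everything except the
    highest-numbered file is treated as complete.
    """
    numbered = []
    for name in names:
        match = re.fullmatch(r"File(\d+)", name)
        if match:
            numbered.append((int(match.group(1)), name))
    numbered.sort()   # numeric order, so File10 sorts after File9
    return [name for _, name in numbered[:-1]]
```

    Note that this changes the Watcher, so it only helps where that is allowed.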
  • Wild Thing 2008-09-25 02:40
    Matt:
    TRWTF is that this message board doesn't do threaded replies!!!


    Seconded.
  • fnordheim 2008-09-25 03:03
    Typical CS graduate solution.

    In the real world, the watcher finds the file, tries to open it, fails, dumps about 347 error messages which trash /var/log and your patience, refuses to process anything further (thus overflowing the directory, causing a week of outage), sacrifices your first-born to Steve, and then it will get *real* nasty.
  • ClaudeSuck.de 2008-09-25 03:22
    crystal mephistopheles:

    “Oh, okay,” the candidate replied. He pondered for a full minute and said “so in that case, I would have the Watcher listen on a TCP/IP port, and have the Downloader tell it when it was done downloading.”

    “That seems like a lot of work,” I said....


    I don't think it's fair to say network I/O is "a lot of work". Granted, his temporary directory solution is even simpler, but most high level languages (.NET, Java, etc.) have fully defined classes you can use to implement this in only a few lines of code, and I would classify this as an acceptable solution.


    And there we are again: the beginning of an enterprisy solution.
  • Readthis 2008-09-25 03:24
    JamesQMurphy:
    TopCod3r:
    We have 2 open positions on our team, due to high turnover, so I interview probably about 5 or 6 people a week and have gotten really good at giving technical interviews. It usually involves giving them a real problem from some code we have, and seeing if they solve it the right way, and then I explain to them how it should be done and make sure they agree.

    It is hard to find people who have the right mix of skills and personality. Some people realize halfway through my technical interview that they lack the required knowledge and simply cut it short and walk out of the room, I assume in embarrassment.


    Have you ever asked why you have high turnover?


    Woosh!
  • ClaudeSuck.de 2008-09-25 03:26
    Max:
    Oh, seriously. Each one of these is a job you shouldn't take anyway.

    1) Obvious reasons...
    2) An interviewer who thinks downloads in progress are a problem but file copies in progress are not shows a lack of understanding.
    3) Interviews are two-way -- if the people interviewing you are clueless, the job will suck.


    We solved this by deleting the file from the source directory after transfer and have The Watcher (we could modify The Watcher) have a look into this directory. Once a file is in the working directory but not in the source directory anymore it was assumed that the file was transferred completely (and successfully). Not fool-proof but worked quite well, though.
  • ClaudeSuck.de 2008-09-25 03:31
    Azeroth:
    There is another simple solution with the downloader/watcher problem - downloader should open the file exclusively while it's being downloaded, this way watcher won't be able to access it until it's closed. This way it's not even required to move anything anywhere.


    http://thedailywtf.com/Articles/I-Didn%e2%80%99t-Know-You-Could-Do-That!.aspx
  • gero 2008-09-25 04:01
    I missed the point by posting here.

    Apologies to ContraCorners.

    I'll have to find another case of political correctness to flame....
  • Joel H. 2008-09-25 04:01
    Jeremy H. is a tool. He said that you can't modify the Watcher but you can modify the Downloader?

    I've had tools like this ask me questions where they really want one golden answer. He is not looking at how you approach the problem, rather whether you reach his solution.

    The temporary directory solution is a poor one - it creates a race condition. It assumes that a copy operation from temp to the real dir is going to be atomic, and it allows the watcher to pick up half written files. Sure it will crash less - but it will still crash.

    Communication between the downloader and watcher makes sense to me. Sounds like the guy was just getting annoyed that Jeremy is a tool.
  • Aran 2008-09-25 04:02
    Was that first interviewer Steve Jobs? I've heard anecdotes about that kind of thing...
  • Chris Leather 2008-09-25 04:09
    “What about if the Downloader just wrote files to a temporary directory, and then moved the file to the appropriate directory when the download was complete.”
    Hang on a minute - surely the watcher program will try to process the file while the downloader program was moving it from the temporary directory (which would take a while due to its size), which would produce precisely the same result - the Watcher program crashes as it processes an incomplete file. Surely the answer is for the Watcher program to check the last edited timestamp of the file - once that is, say, 10 minutes old, then it's likely the file has finished downloading.
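
    A minimal sketch of that timestamp check, assuming the Watcher could be changed at all (the story says it cannot). The function name and the `now` parameter are made up for the example, and the 600-second default simply mirrors the "10 minutes" above.

```python
import os
import time
from typing import Optional


def looks_finished(path: str, quiet_seconds: float = 600.0,
                   now: Optional[float] = None) -> bool:
    """Heuristic: True if `path` has not been modified for `quiet_seconds`."""
    if now is None:
        now = time.time()
    return now - os.stat(path).st_mtime >= quiet_seconds
```

    It is only a heuristic: a download that stalls for longer than the quiet period looks exactly like a finished one.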
  • gero 2008-09-25 04:30
    Joel H.:
    The temporary directory solution is a poor one - it creates a race condition. It assumes that a copy operation from temp to the real dir is going to be atomic, and it allows the watcher to pick up half written files. Sure it will crash less - but it will still crash.

    Chris Leather :
    Hang on a minute - surely the watcher program will try to process the file while the downloader program was moving it from the temporary directory (which would take a while due to its size), which would produce precisely the same result - the Watcher program crashes as it processes an incomplete file.


    We're looooooping here...

    Seriously guys, the real solution is to mount the remote location (there are plenty of FUSE modules for every imaginable protocol and we already know these guys are using Linux from the context), have the downloader just add a symbolic link to the remote file (that should be as atomic as it gets), and then the watcher will act on the file pointed to by the symbolic link.

    Sheesh.
  • The General 2008-09-25 05:11
    SoonerMatt:
    Marc:
    Rename?

    Yeah I was thinking that too. Rather than make it move a 3 gb file (which could fail in itself), I would start the transaction as a .tmp file then remove the .tmp when it's completed.

    That's just what we do to solve a very similar problem - the "watcher" is a 3rd party app looking for a particular file type. Even though the files are rarely 3Mb let alone 3Gb, and there's no network connection involved in writing them, that damn watcher is just too fast. The files are therefore written as .tmp then renamed when done.
  • Casey 2008-09-25 05:15
    Azeroth:
    There is another simple solution with the downloader/watcher problem - downloader should open the file exclusively while it's being downloaded, this way watcher won't be able to access it until it's closed. This way it's not even required to move anything anywhere.


    Nope, sorry.

    Watcher tries to open file, gets a cannot-open-file error, and abends. -904 resource unavailable

    The **proper** way to do this is to have the downloader trigger the watcher program when done. Has no one heard the term "batch" or "job schedule" before?

    It's a good thing you *nix kiddies haven't tried to reinvent the car, as I suspect it would not have wheels.
  • Rhialto 2008-09-25 05:31
    SoonerMatt:
    Marc:
    Rename?

    Yeah I was thinking that too. Rather than make it move a 3 gb file (which could fail in itself), I would start the transaction as a .tmp file then remove the .tmp when it's completed.

    Why does everybody seem to think that moving a file involves copying it? Not even MSWindows does it that stupidly, I think. (Or does it?) Anyway this was apparently about Linux.
  • Sam 2008-09-25 05:32
    Joel H.:
    Jeremy H. is a tool. He said that you can't modify the Watcher but you can modify the Downloader?

    I've had tools like this ask me questions where they really want one golden answer. He is not looking at how you approach the problem, rather whether you reach his solution.

    The temporary directory solution is a poor one - it creates a race condition. It assumes that a copy operation from temp to the real dir is going to be atomic, and it allows the watcher to pick up half written files. Sure it will crash less - but it will still crash.

    Communication between the downloader and watcher makes sense to me. Sounds like the guy was just getting annoyed that Jeremy is a tool.



    You are the tool for insulting others. The "simple" solution is a move, as POSIX guarantees atomicity. This is discussed at length in the first 4 pages of comments, where the differences between copy and move have also been discussed.

    As you are a tool, you wouldn't get the job for unspecified "personal problems", but we'll all really know it's because you are a tool.
  • NeoMojo 2008-09-25 05:52
    JamesQMurphy:
    Have you ever asked why you have high turnover?

    It's because of the little one. He's always saying "turn over. turn over." So they all turn over and one falls out.
  • Rhialto 2008-09-25 05:54
    James:
    b) I was going to suggest having Watcher poll the filesize and do its scan a fixed time after it stops changing. I think that can work, if you're able to be sure that Downloader is using a protocol with a spec'd timeout. Assuming, of course, that Watcher is interested in failed downloads as well as those that finish...

    My idea was to watch the directory, and once a new file appears, the other ones must be finished. (For the last file, your solution or something like it would be needed)

    Anyway, modifying the Watcher was not an option, apparently.
  • Mike 2008-09-25 06:23
    Sorry but that temporary files copying question is a crock of shit. You still have the same problem. If the file is large, and the disks are slow/physically separate from each other, the operating system will copy it piece by piece from the source directory to the target; it's not an atomic operation, so you still can't know when it's fully complete.
  • balancer100 2008-09-25 06:37
    So how will the Watcher tell the difference between downloading a file from the internet to a directory and copying a file from another directory to the one it is watching?

  • Doesn't matter 2008-09-25 06:42
    AAAaaaaaarghhh!!! Enough!!! Someone say something DIFFERENT... PLEASE!!
  • Iain Collins 2008-09-25 06:42
    gero:

    Q. Locking?
    A. You can't control the watcher


    It doesn't necessarily matter if you can't modify the Watcher (which, incidentally was a condition introduced only halfway through to try to point the candidate in another direction, not a limitation from the outset).

    Not being able to modify the Watcher to use flock()/lockf() (which any app like that _should_ be doing in the first place if it's been written professionally) just means you have to get a mandatory lock, rather than an advisory lock.
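
    For reference, the advisory flavour looks roughly like this in Python, whose `fcntl.flock` wraps the flock() call on Unix-like systems. The helper name is invented, and the sketch only shows the cooperative case where both programs ask for the lock; it does not show a mandatory lock.

```python
import fcntl


def try_exclusive_lock(fileobj) -> bool:
    """Try to take an advisory exclusive lock without blocking.

    Returns False if another open file description already holds the lock.
    """
    try:
        fcntl.flock(fileobj.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False
```

    A reader that never calls this can still open and read the file, which is the whole difficulty with a Watcher nobody is allowed to touch.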
  • A Problem At The File System Level 2008-09-25 06:49
    Rhialto:
    SoonerMatt:
    Marc:
    Rename?
    Yeah I was thinking that too. Rather than make it move a 3 gb file (which could fail in itself), I would start the transaction as a .tmp file then remove the .tmp when it's completed.
    Why does everybody seem to think that moving a file involves copying it? Not even MSWindows does it that stupidly, I think. (Or does it?) Anyway this was apparently about Linux.
    What if the file is 3 gigs in size? The .tmp idea will fail the same way as the temp-directory idea, i.e. the "mv xxx.tmp xxx" command will obviously take a while and the Watcher might already see the file when the rename operation hasn't finished writing the file's data to disk.

    Just kidding, of course. :-)
  • Iain Collins 2008-09-25 07:00
    Dracolith:
    I mount my filesystems using NFS on UNIX which does not provide proper file locking of any sort. It is common practice to use such filesystems.


    "Proper" file locking is absolutely supported on NFS under UNIX, even *Windows* supports locking over NFS.

    Most systems just require an appropriate RPC daemon and portmap to be running (I think it's a Service in Windows). This is not that unusual, I've certainly used commercial/3rd party software under UNIX which requires file locking on NFS be enabled if you want to run it from an NFS share.

    It's not "a liberal and unwarranted" assumption, as it's trivial to check if you got the lock (and something you would need to do anyway, even if you were writing to another location and then moving the file) - certainly it's less work than the conditional checking you'd have to do to handle using a temporary file location.
  • Carra 2008-09-25 07:06
    Heck, my first idea was "just send a signal when a file is done" too. Watcher just listens on port x, the downloader sends a signal to port y. Or a triple handshake.

    Although I did get the "use a temp dir" when he said the watcher could not be changed.

    Can't blame the recruit for assuming it's one of those M$ trick questions where for every solution you give, they dismiss it.
  • vr602 2008-09-25 07:10
    What a lot of old crap about locks, atomic moves, filesystem semantics, *nix etc. It's exhausting!

    The simple solution has been suggested a couple of times above, and ignored, i.e. the semaphore file which is sent after the big file. The Watcher only registers the arrival of the semaphore file, and only processes the big file.

    And don't tell me "the Watcher can't be modified." - this is the real world, and it's a piece of code; it can be modified.
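
    A rough sketch of the semaphore-file idea, assuming the Watcher can indeed be modified. The `.done` suffix and both function names are made up for illustration.

```python
import os
from typing import List

DONE_SUFFIX = ".done"  # hypothetical marker naming, not from the story


def mark_complete(data_path: str) -> None:
    """Downloader side: create an empty marker once the data file is closed."""
    with open(data_path + DONE_SUFFIX, "w"):
        pass


def completed_files(directory: str) -> List[str]:
    """Watcher side: list data files whose marker file is present."""
    names = set(os.listdir(directory))
    return sorted(
        name[: -len(DONE_SUFFIX)]
        for name in names
        if name.endswith(DONE_SUFFIX) and name[: -len(DONE_SUFFIX)] in names
    )
```

    The Downloader would call `mark_complete` only after the big file is fully written, and the Watcher would process only what `completed_files` returns.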
  • pincho 2008-09-25 07:29
    >2) File copies/moves across a filesystem are (probably) much
    >quicker than the downloading. The interviewer never said it was
    >a problem with multiple processes accessing a file, just that if
    >the Watcher reached the end of the file too quickly. If a
    >copy/move can copy bytes faster than the Watcher can process the
    >bytes, then it should be fine, no?

    That's what I call sound approach to creating a robust solution free of timing dependencies ;-)
  • alioth 2008-09-25 07:32
    I don't think it's fair to say network I/O is "a lot of work". Granted, his temporary directory solution is even simpler


    Basic network I/O is easy, but often you also need security. You're probably going to need (by policy) to have some authentication for any network traffic. While you can use SSL to provide this (which makes it easy from the programming level) you now have to manage certificates.

    It's therefore about two orders of magnitude easier to just use the filesystem, which both programs already have access to and which is protected by the necessary permissions or ACLs, rather than having to add yet more authentication and authorization to make sure these things remain secure - especially if you're encumbered by an ISO27001 security process.
  • Iain Collins 2008-09-25 07:33
    vr602:
    What a lot of old crap about locks, atomic moves, filesystem semantics, *nix etc. It's exhausting!

    From the comments it seems like a lot of posters don't lock files (be they temporary or otherwise) even when writing (let alone reading) because it's "old crap" and "exhausting".

    "Locking files is so 1990's - this is Web 2.0 baby!

    We just try to append and hope for the best!"
  • A Problem At The File System Level 2008-09-25 08:07
    gero:
    We're looooooping here...
    It would seem the reason why we're looping here is because...

    The real WTF is the forum software, of course! If you could view posts by thread, it might make it a lot easier to check whether something was already said before (by the way: was it already mentioned that a file move from the temp directory might not be atomic?), or whether a certain post has already been replied to, at least by letting you easily ignore all the posts on a completely unrelated topic.
  • Inigo Montoya 2008-09-25 08:22
    A diamond in the rough is a gem that stands out among a collection of dull rocks.

    What you're talking about is a "rough diamond".

    Inconceivable!
  • smeg 2008-09-25 08:39
    This is the real world; inter-department and inter-company politics and bureaucracy will ensure no-one gets access to the piece of code in order to modify it - assuming whoever is responsible actually feels responsible enough to do something
  • vr602 2008-09-25 08:45
    Iain Collins:
    vr602:
    What a lot of old crap about locks, atomic moves, filesystem semantics, *nix etc. It's exhausting!

    From the comments it seems like a lot of posters don't lock files (be they temporary or otherwise) even when writing (let alone reading) because it's "old crap" and "exhausting".

    "Locking files is so 1990's - this is Web 2.0 baby!

    We just try to append and hope for the best!"


    Yes, well I knew someone would miss the point completely.

    Who says I don't lock files? I lock files if I need to; the point with my solution is it doesn't matter if the file is locked or not, it's simply not a factor. The "lock the file" solution is old crap not because locking files is wrong, or because I can't be bothered to do it, but because it's irrelevant.
    Oh and incidentally, I don't do any web development at all, I'm very much an old-fashioned dinosaur of a programmer with ancient standards and working practices that date back to 1986, so please don't accuse me of being new-style lazy.
  • Philluminati 2008-09-25 08:49
    I have to admit I failed this like everyone else it seems.

    My first reaction is that the watcher should be invoked by the downloader when it has finished getting the file rather than constantly running.

    If you can't do that I have some other suggestions but certainly one of them wasn't "download them to a temporary directory and move them when they're finished".

    Why would you be able to "chain" an event to the end of the downloader, such as a move, but not chain the watcher launching then?
  • Stephen 2008-09-25 08:54
    Erm.. Moving a file is *always* the right answer to this problem. In Linux you can even move a file on top of an existing file and it won't even break processes that have the original open. They can merrily keep using that file descriptor until they close and reopen the file.

    Seriously... anyone who fails to get this right is not worth hiring. They're usually the ones that develop horrifically overcomplex systems for simple problems.
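
    The "move on top of an existing file" behaviour can be checked with a short experiment. This is a sketch assuming a POSIX system such as Linux; the function name and the `.new` suffix are only there for the example.

```python
import os


def replace_while_open(path: str, new_bytes: bytes):
    """Rename a new file over `path` while an old handle stays open.

    Returns (bytes seen through the old handle, bytes seen by a fresh open).
    Assumes a POSIX system; the `.new` suffix is just for the example.
    """
    old_handle = open(path, "rb")           # a reader that opened the old file
    try:
        tmp_path = path + ".new"
        with open(tmp_path, "wb") as f:
            f.write(new_bytes)
        os.replace(tmp_path, path)          # the name now points at the new file
        seen_by_old = old_handle.read()     # still reads the original contents
    finally:
        old_handle.close()
    with open(path, "rb") as f:
        seen_by_new = f.read()
    return seen_by_old, seen_by_new
```

    The old descriptor keeps referring to the original file until it is closed, while anyone opening the name afterwards gets the replacement.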

  • Spiun 2008-09-25 09:21
    SomeCoder:
    Yeah, solution #2 is just great. Because we all know that every file system command is guaranteed to be atomic, right?

    mv may be CLOSE to atomic but it's definitely not guaranteed to always be atomic. And if we suddenly have to change directories across partitions then it damn well is NOT atomic.

    I think Jeremy H should get a better interview question.


    Well, the best self-advice I can squeeze out of this situation is thinking out loud at an interview. Basically it's possible that the temporary directory solution was a candidate solution, but the developer thought about the possibility that it may not be on the same volume so he silently discarded it (and the probability is not that small either).

    So... if you figure your reasoning is good, you can think-out-loud the discarded solutions too, possibly getting bonus points.
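
    The same-volume concern can at least be tested before relying on a move. A small sketch, with a made-up function name, that compares the device IDs reported by stat():

```python
import os


def same_filesystem(path_a: str, path_b: str) -> bool:
    """True if both paths live on the same device according to stat()."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

    If this returns False for the temporary and the watched directory, a plain rename is not available: in Python, `os.rename` raises an error across filesystems and `shutil.move` falls back to copying, which brings the partial-file window back.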
  • Dan 2008-09-25 09:26
    “Oh,” he scoffed, “so, we’re equals now, is that it?”

    ROFL!!
    I highly doubt it!
  • Arlen 2008-09-25 09:30
    Nice trackback, too.
  • equex 2008-09-25 09:42
    I once went for an interview and was told to meet up directly at their client's offices, where the interviewer would appear and meet up with me to discuss the next project with them.

    This company makes a well-known open source CMS.

    The guy doesn't show up, and I have to act as a stand-in with their clients, so I sit there and bullshit as well as I can.

    20 minutes later he shows up for the still-ongoing meeting.

    Afterwards, I team up with him and we go to their offices, where the actual interview would take place.

    The interview consisted of 30 minutes of slides about their biggest customers (NASA, etc. etc. NASA can't make their own CMS? lol) and that's when I started to smell a rat.

    After the presentation they go on about how they don't want to hire anyone, they only set up franchises. So they wanted me to start my own company and enter into their franchise so they wouldn't have to hire me.

    I went for a cigarette and left without saying anything.

  • Steve 2008-09-25 09:44
    Regarding story #2, I've had to deal with kluges like that in the past.

    A larger number of years ago than I can comfortably think about, I worked in an installation where we had a Cray X-MP running something called CTSS (Cray Time Sharing System) and a bunch of VAXes, which acted more or less as front ends to the Cray. We had a film recorder system running on one of the VAXes and the Cray was connected to that system by means of something called Hyperchannel, back when that was Hot Stuff.

    The Cray would deposit image files to be recorded on the film recorder for movies.

    As with the problem posed, there was no way of telling when the file was complete (or if it actually was complete, since the Hyperchannel connection would occasionally go wonky).

    It's difficult to describe how ugly and complicated this became.

    It's not a trivial problem. Believe me. I have the scars to prove it.
  • James R. Twine 2008-09-25 09:59
    In Windows, moving files across drives (letters/partitions) is done via a copy and then a delete behind the scenes (look it up and/or try it with a large enough file - the article does talk about multi-GB files). As such, the file can appear in the directory listing of the destination before it is fully "there".

    POSIX or not, assumption is a big thing, too.
  • James R. Twine 2008-09-25 10:07
    tgape:
    James R. Twine:
    Ok - so the majority here believe that mv is an atomic operation if the src and dst are on the same filesystem (or partition?)...
    Does this hold true for ALL filesystems running under Linux/Unix? It might be that the directory is not running on ext2/ext3... What if it was a mounted FAT32 partition, or ReiserFS, or UFS, or an SMB share?

    It holds for every single file on any POSIX OS, regardless of whether that file is on a native filesystem or not. This is because rename(2) is atomic, according to POSIX, so if you want to be POSIX, rename better be atomic. And it's incredibly easy to have rename(2) be sufficiently atomic for this purpose. (Note: even if rename was implemented by doing a link(2) followed by an unlink(2) of the old name, it would *still* be fast enough, unless Watcher is so fragile that it dies if it ever sees a file with multiple links. For it to do that, it'd pretty much have to be coded to do that, however - and it would still be such a tiny window that it's virtually inconceivable that it would happen regularly.)

    Btw, this means Windows, too. POSIX is a big thing.

    (Note: there are many unix OSes which have optional POSIX support. However, their non-POSIX mode is still loosely POSIX, and still does this. It's too necessary to not do it, and it's too useful to write the code to do it and then not always use it.)


    In Windows, moving (not renaming) files across drives (letters/partitions) is done via a copy and then a delete behind the scenes (look it up and/or try it with a large enough file - the article does talk about multi-GB files). As such, the file can (and does) appear in the directory listing of the destination before it is fully "there". Oh, and be sure to try that on different filesystem types, too.

    POSIX or not, assumption is also a big thing.

    Remember, my original point was that these kinds of "only one right answer"-type questions often take a lot of assumptions for granted.
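
    The same-filesystem assumption argued over above can be checked rather than assumed. A rough Python sketch follows; the helper names `same_filesystem` and `rename_or_refuse` are hypothetical, and comparing `st_dev` is only a heuristic. The part that can be relied on is that `os.rename` fails with `EXDEV` when the two names are on different filesystems, instead of quietly falling back to copy-and-delete.

```python
import errno
import os


def same_filesystem(dir_a: str, dir_b: str) -> bool:
    """Heuristic: True if both existing paths report the same device ID."""
    return os.stat(dir_a).st_dev == os.stat(dir_b).st_dev


def rename_or_refuse(src: str, dst: str) -> None:
    """Rename src to dst, refusing loudly if that would need a copy."""
    try:
        os.rename(src, dst)
    except OSError as exc:
        if exc.errno == errno.EXDEV:
            # Different filesystems: a plain rename is impossible here.
            raise RuntimeError(
                f"{src} and {dst} are on different filesystems; "
                "stage the file next to the destination instead"
            ) from exc
        raise
```

    A Downloader could run the check once at startup and fail early, which turns the hidden assumption into an explicit one.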
  • hatterson 2008-09-25 10:27
    Joel H.:
    Jeremy H. is a tool. He said that you can't modify the Watcher but you can modify the Downloader?

    I've had tools like this ask me questions where they really want one golden answer. He is not looking at how you approach the problem, rather whether you reach his solution.

    The temporary directory solution is a poor one - it creates a race condition. It assumes that a copy operation from temp to the real dir is going to be atomic, and it allows the watcher to pick up half-written files. Sure it will crash less - but it will still crash.

    Communication between the downloader and watcher makes sense to me. Sounds like the guy was just getting annoyed that Jeremy is a tool.


    Actually he's very much looking at how the person approached the problem. He presented a scenario and refined it to see how the interviewee would respond to a changing scenario.

    Instead of using the new information (no EOF, can't modify the watcher) to step back and rethink how he would approach the problem he continued to simply modify his solution despite the fact that it was getting increasingly convoluted, complicated and time consuming. The interviewer wanted to see if the employee had an open mind to changing his proposed solution. Instead the employee started with the assumption that the file cannot be moved and decided to just figure out harder ways to do that rather than step back and ask if his initial assumption was correct.
  • hatterson 2008-09-25 10:28
    gero:
    Joel H.:
    The temporary directory solution is a poor one - it creates a race condition. It assumes that a copy operation from temp to the real dir is going to be atomic, and it allows the watcher to pick up half-written files. Sure it will crash less - but it will still crash.

    Chris Leather :
    Hang on a minute - surely the watcher program will try to process the file while the downloader program was moving it from the temporary directory (which would take a while due to its size), which would produce precisely the same result - the Watcher program crashes as it processes an incomplete file.


    We're looooooping here...

    Seriously guys, the real solution is to mount the remote location (there are plenty of FUSE modules for every imaginable protocol and we already know these guys are using Linux from the context), have the downloader just add a symbolic link to the remote file (that should be as atomic as it gets), and then the watcher will act on the file pointed to by the symbolic link.

    Sheesh.


    Yea, mv is way too complicated.
  • hatterson 2008-09-25 11:03
    Casey:
    Azeroth:
    There is another simple solution with the downloader/watcher problem - downloader should open the file exclusively while it's being downloaded, this way watcher won't be able to access it until it's closed. This way it's not even required to move anything anywhere.


    Nope, sorry.

    Watcher tries to open file, gets a cannot-open-file error and abends. -904 resource unavailable

    The **proper** way to do this is to have the downloader trigger the watcher program when done. Has no one heard the term "batch" or "job schedule" before?

    It's a good thing you *nix kiddies haven't tried to reinvent the car, as I suspect it would not have wheels.


    So what about the scenario where the downloader runs on server xyz. It downloads a file from stuff.com and puts it in a shared folder \\xyz\files. The watcher runs on server abc which is entirely outside your domain of control, you are not allowed access to this server in any way thus there is no possible way to trigger the watcher.

    Granted there are times when downloading it to a temp directory will not work but they are very very very few and far between. I would much rather use the very simple, very reliable solution and only make things more complex when it is required rather than using the more complicated and less usable solution as my default.
  • KenW 2008-09-25 11:03
    Cpt:

    And the temporary directory solution will pose the same problem: what if the copy function is not finished yet and the Watcher gets more cpu cycles? The only solution in this case is to modify the protocol (or rather introduce a very crude one) with a separate flag file which contains a value indicating that either the provider is downloading, the provider is ready, the watcher is reading or the watcher is ready. From these 4 states both parties can derive what the other is or has been doing (or not) and whether it is safe to do its stuff. If there is a contention for this status file you could add a second locking file (just keep it locked, nothing more) and create a true double semaphore.


    Why is it all you dolts that are suggesting locking/semaphores/IPC and so forth don't seem to understand that none of those will work unless you can modify both the Watcher and the Downloader, and you were expressly told that wasn't possible?

    It isn't really that hard to understand. Let me try one more time:

    You cannot use a solution that requires modifying both sides of the process because you're NOT ALLOWED TO MODIFY BOTH SIDES OF THE PROCESS!

    Does that help you at all? If not, please find another career you (collectively speaking to all people posting solutions that fail to comprehend the above) are more qualified for, like flipping burgers, washing dishes, or digging ditches. Thanks so much for playing.
  • KenW 2008-09-25 11:05
    Zaippa:
    Hmm... just tried copying a large file (on windows). It does show the file in the destination directory while copying (in explorer and cmd)... So it looks like it does insert the file entry first.. Never mind :)


    Which is why everyone keeps stressing that you move the file, not copy the file. Please pay attention.
  • KenW 2008-09-25 11:10
    Earl Colby Pottinger:
    Real people. Personally, while reading the problem I right away came up with a solution - Have 'Watcher' look for the next file before processing the present file.

    Example, If you expect to see File1, File2, File3 .... FileX
    then process File1 when File2 appears, process File2 when File3 appears ... etc ... FileX I would process after a reasonable delay or add a dummy End_Of_Files file.


    Really, brainiac? You found a solution that requires modification to the Watcher after you were expressly told that wasn't allowed? Wow! You're an immediate hire! For the new job listing for a janitor, of course; you're obviously not qualified for anything more technical.

    See Security down the hall for your pass. They'll know what areas you're not allowed to access.
  • KenW 2008-09-25 11:14
    Joel H.:
    Jeremy H. is a tool. He said that you can't modify the Watcher but you can modify the Downloader?

    I've had tools like this ask me questions where they really want one golden answer. He is not looking at how you approach the problem, rather whether you reach his solution.

    The temporary directory solution is a poor one - it creates a race condition. It assumes that a copy operation from temp to the real dir is going to be atomic, and it allows the watcher to pick up half-written files. Sure it will crash less - but it will still crash.

    Communication between the downloader and watcher makes sense to me. Sounds like the guy was just getting annoyed that Jeremy is a tool.


    Joel H. is a tool. He doesn't seem to know enough about operating and file systems to understand the difference between a copy and a move operation. Therefore, he came up with yet another moronic description of a problem that doesn't exist to justify why he too would have failed to pass Jeremy's easy interview question.
  • Ray 2008-09-25 11:18
    How about checking timestamps too? If the file is unchanged for a minute and the file size hasn't changed, it's probably done.

    If it's really that big of an issue, I'd probably use lsof or fuser too.
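
    A rough Python sketch of the "unchanged for a while" check suggested above. The name `is_quiescent` and its parameters are invented for illustration, and it is only a heuristic: a stalled download looks exactly like a finished one, so it reports "probably done", never "done".

```python
import os
import time


def is_quiescent(path: str, settle_seconds: float = 60.0, sleep=time.sleep) -> bool:
    """Heuristic: True if size and mtime are identical before and after a wait."""
    before = os.stat(path)
    sleep(settle_seconds)  # injectable so callers can substitute their own wait
    after = os.stat(path)
    return (before.st_size, before.st_mtime_ns) == (after.st_size, after.st_mtime_ns)
```

    Whatever runs this check has to sit in front of the Watcher, since the problem as stated does not allow changing the Watcher itself.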
  • KenW 2008-09-25 11:25
    Mike:
    Sorry but that temporary files copying question is a crock of shit. You still have the same problem. If the file is large, and the disks are slow/physically separate from each other, the operating system will copy it piece by piece from the source directory to the target; it's not an atomic operation, so you still can't know when it's fully complete.


    Sorry, but you're wrong. Thanks for playing. Don't bother trying again.

    (For an explanation of why you're wrong, read the other 100+ posts in this thread that tell you.)
  • Publius 2008-09-25 11:27
    Gorfblot:
    Someone You Know:

    I don't think "IE" means what you think it means.


    It's certainly possible. I think it means id est, and loosely translates into "That is to say".

    I used it between two attempts at explaining the metaphor- A literal one, and one where I attempted to show a situation where the description might be more valid.

    What do you think it means?


    You don't use id est ("that is") to introduce an example, you use it to restate the previous sentence more clearly and succinctly. You use exempli gratia ("By grace of example") to introduce examples, always, no exceptions.
  • Mr^B 2008-09-25 11:31
    I think we've discovered TRWTF:

    "Developers who don't actually listen to the customer and produce what they think the customer wants, rather than what they asked for."

    Hoorah! Tea and Scones all round!
  • Tepsifüles 2008-09-25 11:35
    Philluminati:
    I have to admit I failed this like everyone else it seems.

    My first reaction is that the watcher should be invoked by the downloader when it has finished getting the file rather than constantly running.

    If you can't do that I have some other suggestions but certainly one of them wasn't "download them to a temporary directory and move them when they're finished".

    Why would you be able to "chain" an event to the end of the downloader, such as a move, but not chain the watcher launching then?

    Because the downloader is invoked by your own cronjob, while the watcher has to be consuming the files in its working directory constantly. Scenario: the watcher converts medical data files from one format to another, users bring in tapes written by devices in the next room without a network connection and just throw them into the directory; every night, a load of such records from another institution arrives via the downloader. So you can control when the downloader runs and what is run immediately before/after, but can't/don't really want to mess with the watcher, which needs to get all the files in its working directory, downloaded or not.

    And to break the loop: chmod FTW. With a slight exaggeration, that's what permissions are made for.
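
    One way the chmod idea could look in Python, as a sketch only. The function name `download_unreadable_until_done` is hypothetical, and it assumes the Watcher does not run as root (root ignores permission bits) and that the Watcher copes with a file it cannot open yet.

```python
import os
from typing import Iterable


def download_unreadable_until_done(
    path: str, chunks: Iterable[bytes], final_mode: int = 0o644
) -> None:
    """Write chunks to a file that nobody can open until the last byte is in."""
    # Create the file with no permission bits at all. The descriptor returned
    # here is still writable, because permissions are checked when a file is
    # opened, not on each write.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o000)
    with os.fdopen(fd, "wb") as out:
        for chunk in chunks:
            out.write(chunk)
    # Only now make the file readable; chmod is not affected by the umask.
    os.chmod(path, final_mode)
```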
  • hatterson 2008-09-25 11:37
    Tepsifüles:
    And to break the loop: chmod FTW. With a slight exaggeration, that's what permissions are made for.


    Assuming the watcher doesn't crash when it can't read a file due to permission issues.
  • dkf 2008-09-25 11:40
    Marc:
    You could build a custom FPGA that intercepts packets on the network and copies them to flash memory. The hardware can use a serial interface to indicate when the download is complete to a third program, 'The Mounter', which mounts the flash disk to the location the Watcher is expecting.

    The hardware can have a pool of flash memory disk areas, one being written to from the network, one mounted. Each flash memory area would only hold one file at a time.

    Since the Watcher is always running, I'm assuming it uses some sort of event handling system. An operating system hook to the event which indicated the Watcher is done processing and is now watching could be used to tell 'The Mounter' when it's time to unmount a flash disk and mount the next one in the queue.
    Gloves.
  • grammernarzee 2008-09-25 11:58
    KenW:
    Why is it all you dolts that are suggesting locking/semaphores/IPC and so forth don't seem to understand that none of those will work unless you can modify both the Watcher and the Downloader, and you were expressly told that wasn't possible?

    It isn't really that hard to understand. Let me try one more time:

    You cannot use a solution that requires modifying both sides of the process because you're NOT ALLOWED TO MODIFY BOTH SIDES OF THE PROCESS!

    Does that help you at all? If not, please find another career you (collectively speaking to all people posting solutions that fail to comprehend the above) are more qualified for, like flipping burgers, washing dishes, or digging ditches. Thanks so much for playing.


    1 Stop shouting.
    2 Get over yourself. Just because *you* think your solution is better than everyone else's, that doesn't mean they should be flipping burgers.
    3 It's perfectly reasonable to ask why e.g. the Watcher can't be modified. Why not? No really, why not? Because someone says so? Well big flipping burger deal, maybe he's wrong and should be washing dishes. And if it can't be modified then let's write our own - even you could do that, right? Why jump through hoops to comply with a crummy piece of software that doesn't do what you want, when it easily could do?
    Maybe you'll shout back that I'm not "solving the problem as stated". Well so what, I'm getting things done, not writing overcomplicated solutions for foolish problems, in an attempt to justify my arrogance.
  • Good Greif 2008-09-25 12:04
    Cpt:
    Why is it all you dolts that are suggesting locking/semaphores/IPC and so forth don't seem to understand that none of those will work unless you can modify both the Watcher and the Downloader, and you were expressly told that wasn't possible?


    What in the name of all that's holy do you think "mandatory" file locks are, exactly?

    "Thanks so much for playing." ... indeed.
  • Zaippa 2008-09-25 12:20
    KenW:
    Zaippa:
    Hmm... just tried copying a large file (on windows). It does show the file in the destination directory while copying (in explorer and cmd)... So it looks like it does insert the file entry first.. Never mind :)


    Which is why everyone keeps stressing that you move the file, not copy the file. Please pay attention.
    Thank you, KenW. Perhaps you should have been paying attention to what I wrote as well. I am well aware that a move would solve the problem, but if you read the post above the post you quoted, you would realize that I was saying that copying could easily be made atomic as well (but I guess that would introduce other problems, which is why it isn't done like that).

    Normally i would ignore posts like yours, but you seem like a bitch and i'm in a bitchy mood as well, so there you go :)
  • Franz Kafka 2008-09-25 13:02
    James R. Twine:
    tgape:
    James R. Twine:
    Ok - so the majority here believe that mv is an atomic operation if the src and dst are on the same filesystem (or partition?)...
    Does this hold true for ALL filesystems running under Linux/Unix? It might be that the directory is not running on ext2/ext3... What if it was a mounted FAT32 partition, or ReiserFS, or UFS, or an SMB share?

    It holds for every single file on any POSIX OS, regardless of whether that file is on a native filesystem or not. This is because rename(2) is atomic, according to POSIX, so if you want to be POSIX, rename better be atomic. And it's incredibly easy to have rename(2) be sufficiently atomic for this purpose. (Note: even if rename was implemented by doing a link(2) followed by an unlink(2) of the old name, it would *still* be fast enough, unless Watcher is so fragile that it dies if it ever sees a file with multiple links. For it to do that, it'd pretty much have to be coded to do that, however - and it would still be such a tiny window that it's virtually inconceivable that it would happen regularly.)

    Btw, this means Windows, too. POSIX is a big thing.

    (Note: there are many unix OSes which have optional POSIX support. However, their non-POSIX mode is still loosely POSIX, and still does this. It's too necessary to not do it, and it's too useful to write the code to do it and then not always use it.)


    In Windows, moving (not renaming) files across drives (letters/partitions) is done via a copy and then a delete behind the scenes (look it up and/or try it with a large enough file - the article does talk about multi-GB files). As such, the file can (and does) appear in the directory listing of the destination before it is fully "there". Oh, and be sure to try that on different filesystem types, too.

    POSIX or not, assumption is also a big thing.

    Remember, my original point was that these kinds of "only one right answer"-type questions often take a lot of assumptions for granted.


    We aren't talking about Windows, and we aren't assuming that it goes across partitions - you don't have to deal with that case, so don't.
  • Hans 2008-09-25 13:02
    And what if the two directories don't actually reside on the same partition?
  • Hans 2008-09-25 13:03
    Ah, I see this point has already been flogged to death
  • Interviewer 2008-09-25 13:14
    The program runs on Windows, so there are no user permissions. What do you do now?
  • Hans 2008-09-25 13:19
    This solution depends entirely on whether the file copy is doing it one file at a time (e.g. rsync), but you could simply watch the directory and just work on any file that isn't the newest one. Yeah, so you have to change the watcher script, and you need to add a little logic to deal with the last file (have the watcher monitor whether the file has changed size in x seconds/minutes/whatever, or just have the downloader script drop an empty file at the very end of its copy).

    I just assume this would be simple but also avoid any potential issues like moving atomicity when your file system is partitioned in different ways.
  • Hans 2008-09-25 13:25
    Interviewer:
    The program runs on Windows, so there are no user permissions. What do you do now?


    Interviewee tries to answer. Interviewer quickly interjects: "... it's Windows CE specifically and you are about to experience a power outage in a few seconds and the server is on fire. What do you do now? Huh? Hypothetically..."

    Halp!

    :p
  • jtwine 2008-09-25 13:25
    (And neither is "cp", I tried both in this scenario...)

    OK - so on one of my Linux boxes I created a ~858MB file filled with zeros by doing a cat /dev/zero > test.bin and then doing a ^C after a few seconds.

    That file is located on one LVM filesystem using ext3. The other filesystem is ext2 on a software RAID mirror. Using two terminal sessions I started the copy from the src directory on one session, and kept doing ls -laF in the other session which was in the dest directory.

    The filename appears immediately and its size grows while the mv (or cp) is in progress.

    Result: mv Is Not Atomic.

    Details:
    uname -a output: Linux LCARS 2.6.25.6-55.fc9.x86_64 #1 SMP Tue Jun 10 16:05:21 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

    directory listings while mv in progress:
    [root@LCARS raidmirror]# ls -laF
    
    total 70064
    drwxr-xr-x 8 root root 4096 2008-09-25 13:05 ./
    drwxr-xr-x 4 root root 4096 2008-07-21 19:50 ../
    drwxrwxrwx 2 root users 4096 2008-06-20 11:58 BackedUp/
    drwxr-xr-x 6 backuppc backuppc 4096 2008-06-23 05:14 BackupPC/
    drwx------ 2 root root 16384 2008-06-20 09:51 lost+found/
    drwxrwxrwx 4 nobody users 4096 2008-09-12 12:00 Mirrored/
    drwxrwxrwx 3 root root 4096 2008-06-22 08:56 Subversion/
    drwxrwxrwx 3 root root 4096 2008-06-22 08:49 temp/
    -rw------- 1 root root 71622656 2008-09-25 13:05 test.bin

    [root@LCARS raidmirror]# ls -laF
    total 85912
    drwxr-xr-x 8 root root 4096 2008-09-25 13:05 ./
    drwxr-xr-x 4 root root 4096 2008-07-21 19:50 ../
    drwxrwxrwx 2 root users 4096 2008-06-20 11:58 BackedUp/
    drwxr-xr-x 6 backuppc backuppc 4096 2008-06-23 05:14 BackupPC/
    drwx------ 2 root root 16384 2008-06-20 09:51 lost+found/
    drwxrwxrwx 4 nobody users 4096 2008-09-12 12:00 Mirrored/
    drwxrwxrwx 3 root root 4096 2008-06-22 08:56 Subversion/
    drwxrwxrwx 3 root root 4096 2008-06-22 08:49 temp/
    -rw------- 1 root root 87838720 2008-09-25 13:05 test.bin

    [root@LCARS raidmirror]# ls -laF
    total 98764
    drwxr-xr-x 8 root root 4096 2008-09-25 13:05 ./
    drwxr-xr-x 4 root root 4096 2008-07-21 19:50 ../
    drwxrwxrwx 2 root users 4096 2008-06-20 11:58 BackedUp/
    drwxr-xr-x 6 backuppc backuppc 4096 2008-06-23 05:14 BackupPC/
    drwx------ 2 root root 16384 2008-06-20 09:51 lost+found/
    drwxrwxrwx 4 nobody users 4096 2008-09-12 12:00 Mirrored/
    drwxrwxrwx 3 root root 4096 2008-06-22 08:56 Subversion/
    drwxrwxrwx 3 root root 4096 2008-06-22 08:49 temp/
    -rw------- 1 root root 100982784 2008-09-25 13:05 test.bin

    If you doubt the results, how about trying a test yourself before posting? Oh, and Windows does something similar.

    Thanks!
  • JPK 2008-09-25 13:29
    Nice reference to Hypothetical Questions

  • jtwine 2008-09-25 13:35
    Franz Kafka:
    We aren't talking about windows, and we aren't assuming that it goes across partitions - you don't have to deal with that case, so don't.

    (OK - as I already proved, the mv can be non-atomic on Linux, too -- at least on mine, so Windows/non-Windows is not a factor.)

    Where in the OP was it indicated what assumptions are and are not valid? Or did you just assume that the mv was not across filesystems/partitions?

    AGAIN...

    "Only one right answer" questions need to include the conditions (assumptions) in the scenario. Otherwise, only the people with limited experience with complex setups will assume that the mv would be atomic because the file would not have to cross filesystems/partitions, or because they just do not know any better.

    More experienced people, that know (for example) that /tmp should be on its own filesystem so that you do not get DoSed out of your own box by something filling up /tmp with garbage, will consider the way the real world works, and may not come up with the "one right answer."
  • JPK 2008-09-25 13:36
    SoonerMatt:
    RBoy:
    AMerrickanGirl:
    He was stuck in traffic, with a cell phone that worked perfectly, since they got through on the first try, but he hadn't bothered to call his office to tell them to make nice to the 9 am interview that he was going to be very late for.


    Ah, but what if his cell phone didn't work?


    Seriously?!? They said the office called him and his cell phone worked perfectly.


    Let's try again...

    Nice reference to Hypothetical Questions
  • Franz_Kafka 2008-09-25 13:53
    jtwine:
    Franz Kafka:
    We aren't talking about windows, and we aren't assuming that it goes across partitions - you don't have to deal with that case, so don't.

    (OK - as I already proved, the mv can be non-atomic on Linux, too -- at least on mine, so Windows/non-Windows is not a factor.)


    I already said on page 1 that malicious config was out. Both directories are on the same filesystem, so shut up about non-atomic moves. move the syscall is always atomic. period.


    Where in the OP was it indicated what assumptions are and are not valid? Or did you just assume that the mv was not across filesystems/partitions?


    No, I stated it in my solution.


    More experienced people, that know (for example) that /tmp should be on its own filesystem so that you do not get DoSed out of your own box by something filling up /tmp with garbage, will consider the way the real world works, and may not come up with the "one right answer."


    you're the one with a hardon for /tmp. don't use /tmp.
  • Rick 2008-09-25 14:19
    So did Jeremy ever find a candidate that could read minds? I'm sure the real point is to see what kind of solution an interviewee comes up with, not to see how long it takes for them to guess your 'right' solution. Why not try 'hot and cold' or 'Password'? 'Twenty questions' would be even more fair. I spy, with my little eye, something ending in .avi..


  • ChadN 2008-09-25 14:22
    Holy crap! People are *still* discussing the atomicity of renames(), etc. in this thread? After a day, and 6 pages of responses? Just download to some alternate directory, regardless of whether it is on the same filesystem or not, and create a symlink when the file is downloaded. Easy. The (one and only, imo) correct answer to the interview question for any modern OS, and most non-modern. Otherwise, use a tmpname() function to create a directory on the same filesystem for downloading, then use an atomic rename(). And yes, atomic renames within a filesystem are the rule, not the exception.
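
    A minimal Python sketch of the two variants described above. All names here (`publish_by_symlink`, `publish_by_rename`, the `fetch` callable that writes the download) are hypothetical, and the second variant assumes that a sibling of the pickup directory lives on the same filesystem as the pickup directory itself.

```python
import os
import shutil
import tempfile


def publish_by_symlink(fetch, staging_dir, pickup_dir, name):
    """Download into staging_dir, then expose the finished file via a symlink.

    staging_dir may be on a different filesystem from pickup_dir.
    """
    target = os.path.join(staging_dir, name)
    with open(target, "wb") as out:
        fetch(out)              # hypothetical callable that writes the payload
        out.flush()
        os.fsync(out.fileno())  # push the data to disk before publishing
    link = os.path.join(pickup_dir, name)
    os.symlink(os.path.abspath(target), link)  # the link appears only now
    return link


def publish_by_rename(fetch, pickup_dir, name):
    """Download into a temp directory next to pickup_dir, then rename."""
    parent = os.path.dirname(os.path.abspath(pickup_dir))
    # Assumption: a sibling of pickup_dir is on the same filesystem.
    workdir = tempfile.mkdtemp(prefix=".staging-", dir=parent)
    try:
        partial = os.path.join(workdir, name)
        with open(partial, "wb") as out:
            fetch(out)
            out.flush()
            os.fsync(out.fileno())
        final = os.path.join(pickup_dir, name)
        os.rename(partial, final)  # one step when both paths share a filesystem
        return final
    finally:
        # Remove the work directory, including any partial file left by a failure.
        shutil.rmtree(workdir, ignore_errors=True)
```

    In both cases the pickup directory never contains a partially written file: the name only shows up after the download has been closed.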
  • jtwine 2008-09-25 14:35
    Franz_Kafka:
    I already said on page 1 that malicious config was out. Both directories are on the same filesystem, so shut up about non-atomic moves. move the syscall is always atomic. period.

    Sorry, experience and wisdom tell me not to presume that different filesystems/partitions is malicious.

    As for "shutting up" (mind yourself) about non-atomic moves, I have faith that (maybe) with a few more years experience, you will understand the difference between robust and "just works". In order to design "robust", you need to know all existing assumptions and not make any new ones.

    Franz_Kafka:
    you're the one with a hardon for /tmp. don't use /tmp.

    OK - another precondition/assumption of the exercise = do not use /tmp. We should get a list somewhere...
  • jtwine 2008-09-25 14:39
    ChadN:
    Holy crap! People are *still* discussing the atomicity of renames(), etc. in this thread? After a day, and 6 pages of responses? Just download to some alternate directory, regardless of whether it is on the same filesystem or not, and create a symlink when the file is downloaded. Easy. The (one and only, imo) correct answer to the interview question for any modern OS, and most non-modern. Otherwise, use a tmpname() function to create a directory on the same filesystem for downloading, then use an atomic rename(). And yes, atomic renames within a filesystem are the rule, not the exception.
    Yes, because people are making statements without backing them up with proof. Also, I believe that softlinking to the file has already been covered (and I agree that it would work well).

    You make a good point about how (in general) atomic renames (or moves) are the rule and not the exception. That point has been made before, too. My point, at least, is that too many assumptions go into the whole "one right answer" type of problem. Although many seem to suffer from a severe case of "just don't get it" when it comes to that... :)
  • Franz_Kafka 2008-09-25 14:51
    jtwine:

    Sorry, experience and wisdom tell me not to presume that different filesystems/partitions is malicious.

    Too bad you forgot to actually read the whole solution in your experience and wisdom. Deployed config is part of the solution, and I will not consider the case where the two dirs are on different file systems. I will design the solution to fail hard if someone does that, because I want a solution that just works.


    As for "shutting up" (mind yourself) about non-atomic moves, I have faith that (maybe) with a few more years experience, you will understand the difference between robust and "just works". In order to design "robust", you need to know all existing assumptions and not make any new ones.

    there are no assumptions about where the second dir lives; that is part of a proposed solution and is not subject to change - you may as well demand a solution that can deal with random code edits. Oh, and shut up about nonatomic moves - you and some other people have whined about it 6 or 7 times already and been shut down every time.

    Maybe when you graduate college and have to actually work with people for more than 3 months at a time you'll learn to listen.

    OK - another precondition/assumption of the exercise = do not use /tmp. We should get a list somewhere...


    Yeah, it's called "jtwine adding assumptions of his own and not owning up to them when called out". I never said /tmp, I said "A temp directory" right next to the part where I said "on the same filesystem" shortly before someone decided to interpret filesystem as the whole file tree. Either discuss the solution or don't. Setting up strawmen is just juvenile.
  • jtwine 2008-09-25 15:37
    Franz_Kafka:
    jtwine:

    Sorry, experience and wisdom tell me not to presume that different filesystems/partitions is malicious.
    ...blah... Deployed config is part of the solution, and I will not consider the case where the two dirs are on different file systems.
    OK - I see where the problem is... you are failing to realize that my posts are not about your solution, they are about the "one right answer" as proposed in the OP.

    As to why you are failing to realize this, it might have to do with the fact that I was not logged in when I wrote the "James R. Twine" posts...

    Franz_Kafka:
    Oh, and shut up about nonatomic moves - you and some other people have whined about it 6 or 7 times already and been shut down every time.
    Shhhh...
    You are confusing me with someone that did not actually demonstrate their experience with that topic. My point has not yet been (intelligently) contested.

    Franz_Kafka:
    Maybe when you graduate college and have to actually work with people for more than 3 months at a time you'll learn to listen.
    Shhhh...
    Again, you are confused. And also severely lacking in awareness of my position and experience.

    Given the above three issues, there is no need to address the remainder of your post.
  • Tii--ii--iime, must be on yoouurrr side.... 2008-09-25 16:17
    You know, we have open positions, but if I had the time to sit with 5 or 6 interviewees a week and explain a solution to even *half* of them, I'd be divorced because I'd have to spend that time plus some at home just to keep up with my project schedules.

    JamesQMurphy:
    TopCod3r:
    We have 2 open positions on our team, due to high turnover, so I interview probably about 5 or 6 people a week and have gotten really good at giving technical interviews. It usually involves giving them a real problem from some code we have, and seeing if they solve it the right way, and then I explain to them how it should be done and make sure they agree.

    It is hard to find people who have the right mix of skills and personality. Some people realize halfway through my technical interview that they lack the required knowledge and simply cut it short and walk out of the room, I assume in embarrassment.


    Have you ever asked why you have high turnover?
  • Franz_Kafka 2008-09-25 16:39
    jtwine:

    Given the above three issues, there is no need to address the remainder of your post.


    Well I guess you told me. Not like I have anything to judge you on other than your harping on /tmp incessantly.
  • jtwine 2008-09-25 17:33
    Franz_Kafka:
    jtwine:

    Given the above three issues, there is no need to address the remainder of your post.
    Well I guess you told me. ...
    Yes, I know... :P

    Seriously, I believe that the tone of our exchanges was due to a misunderstanding and I am man enough to admit that the misunderstanding may have been my fault and that I may have gotten more aggressive than was necessary.

    Peace!
  • ChadN 2008-09-25 18:15
    jtwine:
    Franz_Kafka:
    Well I guess you told me. ...

    ... I am man enough to admit that the misunderstanding may have been my fault and that I may have gotten more aggressive than was necessary.

    Peace!

    Are you seriously trying to make nice with a guy calling himself Franz_Kafka? Good luck with that.
  • Franz_Kafka 2008-09-25 19:11
    ChadN:
    jtwine:
    Franz_Kafka:
    Well I guess you told me. ...

    ... I am man enough to admit that the misunderstanding may have been my fault and that I may have gotten more aggressive than was necessary.

    Peace!

    Are you seriously trying to make nice with a guy calling himself Franz_Kafka? Good luck with that.


    Hey, I can be nice and bury hatchets and all that. Now if you'll excuse me, I'm needed in the bureaucracy thread.
  • WAAman 2008-09-25 19:28
    Franz_Kafka:
    So she's a woman and it's automatically sex discrimination?


    Like it or not, Yes.

    If, because you're a douchebag, you choose not to hire a qualified person, and that person happens to be female, black, hispanic, gay[*], or crippled, then that person has the option of suing your ass into the ground.

    If, because you're a douchebag, you choose not to hire a qualified person, and that person happens to be a straight white or asian male, then that person has the option of interviewing with a non-douchebag employer.


    [*] in some states
  • Trevor 2008-09-25 19:32
    Everyone is missing what is clearly the correct solution. It doesn't involve modifying the watcher, the kernel, having a temp directory, or even moving or copying files.

    Modify the downloader to be a FUSE file system that makes the desired file appear to be local (this is technically not modifying the kernel because it is user space). When a process tries to open and read a file, it will initiate the download and the filesystem will block the read until the download is complete. Just configure the watcher to watch whatever directory the new filesystem is mounted as and you're done.
  • Bill 2008-09-25 20:07
    I pose interviewees problems from our current projects, and if the solution sounds good I use it. Free work.
  • Steve 2008-09-25 20:38
    Publius:
    The correct solution is pretty obvious. You have the watcher and downloader walk into a talent agency. The watcher says to the talent agent, "We have a really amazing act. You should represent us."

    . . .

    For the longest time, the agent just sits in silence. Finally, he manages, "That's a hell of an act. What do you call it?"

    And the watcher says, "The Aristocrats!"
    Win!
  • AntiQuercus 2008-09-25 22:31
    Indeed, I see this every week with a little backup process I have scheduled, that "moves" a 1 GB file from a Samba-mounted Unix file system to a Windows workstation drive.

    During the "move", the windows directory listing shows the file present on both systems, in its full size. When the "move" is complete, the source file disappears. This is an example of a move being done as a copy/delete. I haven't tried accessing the files during the "move", so I can't tell you if locking occurs. This takes about 5 minutes typically.

    I notice this in contrast to what happens when moving similar files between Unix filesystems. The file is present on both systems during the move, but the new file has a size that grows while the file is being moved. The original file disappears when the new file reaches full size. Between different Unix file systems, move is done as a copy/delete, and takes about a minute per gig.

    Of course, if I move the file to different directories on the same file system, it happens apparently instantaneously, as the move is not a copy/delete, but just a relink.

    Folks, you're all right. If you can count on being able to use a temporary file on the same filesystem, use the move (mv) method to supply a complete file to the watcher. You can count on this, as there will need to be space in the watcher's pickup area for typical downloaded files, so the staging area will have that much space available anyway, being on the same file system.
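
    Whether the staging area and the pickup area really share a filesystem can be checked up front. Here is a small Python sketch, assuming a POSIX system where `os.stat` reports a device ID in `st_dev`; the function names are hypothetical. Matching device IDs are a strong hint rather than a guarantee, so the rename itself still needs its error checked.

```python
import os


def same_filesystem(dir_a, dir_b):
    """Return True when both directories report the same device ID (st_dev)."""
    return os.stat(dir_a).st_dev == os.stat(dir_b).st_dev


def require_same_filesystem(staging_dir, pickup_dir):
    """Hypothetical startup check: refuse to run if a move would need a copy."""
    if not same_filesystem(staging_dir, pickup_dir):
        raise RuntimeError(
            f"{staging_dir} and {pickup_dir} are on different filesystems"
        )
```

    Calling `require_same_filesystem` once at startup turns a silent copy/delete into an immediate, visible configuration error.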

    If the staging area cannot be on the same filesystem as the watcher's pickup area (for whatever reason--policy, security, because other processes depend on that staging area), then you need to think a bit harder, considering all the suggestions above: serialising download/pickup, rearranging filesystems, getting the watcher fixed, etc.

    It was an interview hypothetical, with a simplified scenario. Otherwise, in a real shop, with a network of different platforms, cross-mounted filesystems and legacy systems, I think people are right to ask, "Whoa. Can we assume there won't be some race condition in this other method? Can we assume we have the space we need for a staging area? What else might be using the pickup area? Can we rearrange our filesystems without breaking something else? Can we work around the limitations of the components? Can we replace the braindead components?"

    I sometimes suspect that people who say, "What's the big deal? Use this simple solution, bam, sorted." have not got a lot of experience in a larger shop with multiple interlocking constraints. Solutions that work fine on a single linux desktop in your bedroom might not be appropriate in a 20+ mixed platform shop with high throughput and availability requirements. One poster whose response to one what-if was "don't do it that way" overlooks the case where it "has to be done that way" because firewall policy says so, and their higher-ups have refused requests to change the policy, or because some fragile legacy system requires it that way, and it's too hard to change.

    And you know, I reckon there's some back story we're not hearing too. Jeremy seems to be what, zinging someone in retrospect? "Nyah nyah, see how simple my solution is?"

    In the real world, it's not always that simple. Sometimes it is, sometimes it ain't.
  • Casey 2008-09-25 23:44

    So what about the scenario where the downloader runs on server xyz. It downloads a file from stuff.com and puts it in a shared folder \\xyz\files. The watcher runs on server abc which is entirely outside your domain of control, you are not allowed access to this server in any way thus there is no possible way to trigger the watcher.


    Well, if that's the case then the watcher program isn't *my* problem. Let *them* deal with a possible incomplete file.
  • Duke of New York 2008-09-26 04:29
    WAAman:
    Franz_Kafka:
    So she's a woman and it's automatically sex discrimination?


    Like it or not, Yes.

    If, because you're a douchebag, you choose not to hire a qualified person, and that person happens to be female, black, hispanic, gay[*], or crippled, then that person has the option of suing your ass into the ground.

    Again, the mere fact of refusing to hire is not decisive. A discrimination lawyer would need some facts to work with, and Dave handed them to her on a silver platter.

    1. Failing to start the interview at the scheduled time
    2. Expressing an opinion of the candidate before doing the interview
    3. Saying specifically that this opinion was not related to the candidate's professional qualifications.
  • Tepsifüles 2008-09-26 07:01
    Interviewer:
    The program runs on Windows, so there is no user permissions, what do you do now?

    attrib +H
  • Contractor.. 2008-09-26 07:55
    Oooh look at you!! Perhaps you should go onto Experts Exchange rather than read the DailyWTF!!!

  • belgariontheking 2008-09-26 08:00

    That's one hell of a link there.
  • Nova 2008-09-26 08:19
    And what if the file move also takes time?
  • tgape 2008-09-26 08:25
    grammernarzee:
    3 It's perfectly reasonable to ask why e.g. the Watcher can't be modified.


    Yes, it is. But until the interviewer answers, "Well, ok, it can be," any answers that require it be modified are not going to result in a job offer. In fact, they're WTF.
  • captain obvious 2008-09-26 08:49
    PHP should have been the hint for the third. I remember a phone interview for such a PHP role where the most advanced questions were OOP definitions. I had to raise the level of the discussion myself, and ended up with two short discussions on IDE and framework choices and the reasons for them, plus a slightly longer one on feasibility, scalability, and how PHP and the LAMP stack fit in.
  • Someone You Know 2008-09-26 09:22
    Nova:
    And what if the file move also takes time?


    Here we go again...
  • tgape 2008-09-26 09:55
    jtwine:
    "Only one right answer" questions need to include the conditions (assumptions) in the scenario.


    No. "Only one right answer" questions need to make the conditions open to discovery. Jeremy did that.

    In the real world, you don't get your scenarios handed to you on a silver platter. You need to discover them. It could be that Watcher is owned by a different user, and you don't have permission, and the owner refuses to change it. It could be that it's a legacy app, with no source code, and as yet no rewrite. It could be that it's a third party app. It could be that it's deemed business critical, and they're not willing to let junior programmers touch it.

    It could just be that it's a complete mess, and all the current programmers refuse to touch it because they're afraid of breaking it. (I've been given a few of those to maintain. No fun. On the other hand, I've never handed one off to someone else - I've always cleaned them up first.)


    However, when you're in an interview question, your best bet is to take the interviewer at their word - if they say don't touch, don't touch.

    jtwine:
    More experienced people, that know (for example) that /tmp should be on its own filesystem so that you do not get DoSed out of your own box by something filling up /tmp with garbage,


    Obviously, you've not seen what happens to a Solaris box when /tmp fills, given their default config. (/tmp is a separate partition there, and it's not pretty. It's also a DoS.)

    jtwine:
    will consider the way the real world works, and may not come up with the "one right answer."


    Actually, /tmp is for temporary files, generally those whose deletion on reboot would not matter. It is not a temporary holding place for more permanent files, or those which may need to be retained between system boots. As such, unless Watcher is looking in a /tmp directory, /tmp is not appropriate for this sort of file.


    For what it's worth, I've known people who've managed to answer questions with "one right answer" with a different answer, and still get hired - it usually just needs to actually handle the situation at least as well as the "one right answer". And, none of the other answers given here have.

    I've been a system administrator for over 10 years. I know how complicated the real world is. I also see time and time again people applying a complicated fix to a simple situation, because they're too focused on the complicated. Most of the time, it works, and I don't complain too much. Some of the time, it creates (or would create - sometimes I can veto) a WTF.

    I've seen one bit of code that answers the above problem by writing the file to a temporary directory (and with a leading '.', even), locking the file with flock, locking it with fcntl, using a semaphore, and a '.lock' file, then renaming it, and undoing all of the locks. Not in the reverse order, for what it's worth. The coder defended it with, "But I didn't know the target OS would remain static, nor what the underlying filesystem would be."

    Note that this complicated solution had several failure points (program names changed to match this example):

    1. It applied flock and fcntl on the same file. On most OSes, this would deadlock with itself. Of all the possible target OSes at that shop, it only worked on the target OS the program was initially written for.

    2. It applied flock and fcntl to the temporary file name - not the new name. (It couldn't hit the new name, because it used rename(2), and so the move completed before it could possibly target the new name - which should've been a clue.)

    3. Its .lock file could have caused the same problem that the whole process was made to circumvent - watcher would've crashed on it had watcher ever seen that file, just like watcher would any other 'incomplete' file.

    4. watcher didn't honor any of those locks. Putting them in downloader didn't do anything useful.

    5. rename(2) is required to fail if it's not atomic - so even if the underlying filesystem is NFS, and remotely the two directories are actually on separate partitions, it either works atomically or fails.

    6. He didn't check for rename(2) failures.

    7. Since he also put a lock around his semaphore (just in case it ran on a system that didn't do semaphores correctly, but did atomic file creates correctly), and he mangled his unwrap just right, he had a potential deadlock on unlocking.

    8. His group owned the server on which watcher ran, and were authoritative for the filesystem layout. The responsibility for making sure watcher ran properly went along with that authority, so whoever owned watcher would be able to ensure that constraint continued to be enforced. And, since downloader died on rename(2) failures (he didn't check it, but that doesn't mean his code handled it correctly), if they failed to enforce that constraint, they would have been alerted to that fact immediately, and it would have been associated with a change that they had performed, thus blame would naturally fall to them.

    9. He wasn't fired for it. Not even though he claimed to have fixed the problem, but the problem was still there. (Hint: downloader wasn't the only program writing files to that directory - but it was the only one he updated.)
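
    Points 5 and 6 can be sketched in Python, where `os.rename` maps onto rename(2) and raises `OSError` with `errno.EXDEV` when source and destination are on different filesystems. The `publish` function, the `CrossFilesystemMove` exception, and the injectable `rename` parameter are hypothetical names added for illustration. Note that `shutil.move` would instead fall back to a copy followed by a delete, which is exactly the non-atomic behaviour to avoid here.

```python
import errno
import os


class CrossFilesystemMove(Exception):
    """Raised when staging and pickup paths are on different filesystems."""


def publish(src, dst, rename=os.rename):
    """Move src to dst in one step, or fail loudly without copying anything.

    The rename parameter exists only so the failure path can be exercised
    without needing two real filesystems.
    """
    try:
        rename(src, dst)
    except OSError as exc:
        if exc.errno == errno.EXDEV:
            raise CrossFilesystemMove(
                f"{src} and {dst} are on different filesystems"
            ) from exc
        raise  # any other failure is reported unchanged
```

    Either the file shows up under its final name in one step, or the caller gets an exception and the pickup directory is left untouched.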

    Disclaimer: it wasn't actually the same situation, because 'downloader' wasn't downloading from the net, but rather pulling data from a database. 'watcher' didn't crash on incomplete files, it just corrupted its own database, and continued merrily running. Until, of course, it corrupted it so bad it got a null pointer where it wasn't expecting it. And it wasn't that 'watcher' *couldn't* be modified, it's just that nobody was willing to until I came along. The situation was made much simpler when I modified a 'strcmp(file, ".")' to 'strncmp(file, ".", 1)'. Although, not as much simpler as when I replaced 12k lines of C code (no comments) with 150 lines of perl code (with comments).
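
    The effect of that one-line change, sketched in Python rather than C. This assumes the comparison was used to decide which directory entries to skip, which the story above implies but does not show; `visible_entries` is a hypothetical name.

```python
import os


def visible_entries(directory):
    """List the entries a watcher should process, skipping hidden names.

    Mirrors a prefix test such as strncmp(name, ".", 1) == 0: every name
    starting with a dot is ignored, not only the "." entry itself.
    """
    return sorted(
        name for name in os.listdir(directory) if not name.startswith(".")
    )
```

    With a filter like this, a writer can keep an in-progress file under a dot-prefixed name in the same directory and rename it to its final name when it is complete.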
  • tgape 2008-09-26 10:11
    Franz Kafka:
    tgape:

    It holds for every single file on any POSIX OS...

    Btw, this means Windows, too. POSIX is a big thing.


    We aren't talking about windows, and we aren't assuming that it goes across partitions - you don't have to deal with that case, so don't.


    Cutting out the WTF, I was talking Windows - because it's the same situation. If the partition is the same between src and dst, rename(2) is atomic. If the partition is not the same between src and dst, rename(2) is atomic - but probably an error.

    The WTF changed the case he was talking about, despite my having quoted enough of his prior statements that his switcheroo showed.

    We weren't simply assuming it wasn't going across partitions; we were requiring that the rename(2) did not go across partitions.
  • tgape 2008-09-26 10:16
    gero:
    ... we already know these guys are using Linux from the context)...


    Actually, we don't. We know the interviewee indicated that a Linux kernel patch was a possibility. But, since we know that the interviewee indicated that a Linux kernel patch was a possibility, we can easily surmise that the interviewee was not a good source of information on stuff.

    I mean, honestly - a kernel patch to make a downloader/watcher scenario work? That's even worse than suggesting FUSE when people don't like the trivial rename from another directory on the same partition answer.

    (Hint: FUSE may already be implemented in the kernel, but it's still a sledgehammer. A big one, at that. Not to mention, the watcher program could be timing sensitive, and could possibly crash or corrupt data if it's suspended for too long. Yes, I'm reaching.)
  • Trevor 2008-09-26 13:01
    tgape:
    I mean, honestly - a kernel patch to make a downloader/watcher scenario work? That's even worse than suggesting FUSE when people don't like the trivial rename from another directory on the same partition answer.

    (Hint: FUSE may already be implemented in the kernel, but it's still a sledgehammer. A big one, at that. Not to mention, the watcher program could be timing sensitive, and could possibly crash or corrupt data if it's suspended for too long. Yes, I'm reaching.)

    Yes, I know FUSE was overkill when I suggested it. I just wanted to get the conversation away from the "it is" "it isn't" argument of whether or not move is atomic.
  • Duke of New York 2008-09-26 13:17
    I'm glad I only read the first story.
  • Georgy Porgy 2008-09-26 16:20
    Article:
    “What about if the Downloader just wrote files to a temporary directory, and then moved the file to the appropriate directory when the download was complete.”


    I like the simplicity, but that wouldn't solve the problem, would it?
    Wouldn't that essentially be the same problem, because a multi-gig file would be incomplete during a move, and the Watcher would crash while processing the incomplete file?
    Instead of moving a file slowly piece by piece from the internet, you're just moving a file faster piece by piece from another folder.
  • James O'Boston 2008-09-26 17:35
    SomeCoder:
    Yeah, solution #2 is just great. Because we all know that every file system command is guaranteed to be atomic, right?

    mv may be CLOSE to atomic but it's definitely not guaranteed to always be atomic. And if we suddenly have to change directories across partitions then it damn well is NOT atomic.

    I think Jeremy H should get a better interview question.


    Wait wait wait wait. How can it not be atomic? Remove the directory entry from that place, and create one right here in its place? How could this /not/ be atomic? Seems a directory entry either exists, or does not exist, excluding of course bi-state SchrödingerVerzeichnisse and bi-coastal FlyoverDirectories.
  • FatherStorm 2008-09-26 18:42
    I've had to do that exact thing. My solution was to have the watcher watch a separate log directory, and on completion of uploading of orders, the downloader would dump a log of the downloads and paths to the log directory. This ensured all transfers were successful before the log was written and the watcher acted. It also helped with journaling the download as a separate, transaction-based log.
  • El_oscuro 2008-09-26 18:43
    For the downloader problem, I would have suggested the script touch another file in a different directory, indicating the file is done.

    However, I guess modifying the Linux Kernel works too...
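
    A rough Python sketch of the marker-file idea, assuming the watcher can be made to consult the markers (which the original interview question ruled out). The `.done` suffix and both function names are hypothetical.

```python
import os

MARKER_SUFFIX = ".done"  # hypothetical naming convention


def mark_done(done_dir, name):
    """Create an empty marker named after the finished download."""
    marker = os.path.join(done_dir, name + MARKER_SUFFIX)
    # O_EXCL makes creation fail if the marker already exists.
    fd = os.open(marker, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    os.close(fd)
    return marker


def finished_downloads(download_dir, done_dir):
    """Return paths of downloads that have a marker, in sorted order."""
    ready = []
    for entry in sorted(os.listdir(done_dir)):
        if entry.endswith(MARKER_SUFFIX):
            path = os.path.join(download_dir, entry[: -len(MARKER_SUFFIX)])
            if os.path.exists(path):
                ready.append(path)
    return ready
```

    The downloader calls `mark_done` only after closing the file, so a marker never refers to a partial download.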
  • John 2008-09-26 21:10
    ChadN:
    Holy crap! People are *still* discussing the atomicity of renames(), etc. in this thread? After a day, and 6 pages of responses? Just download to some alternate directory, regardless of whether it is on the same filesystem or not, and create a symlink when the file is downloaded. Easy. The (one and only, imo) correct answer to the interview question for any modern OS, and most non-modern. Otherwise, use a tmpname() function to create a directory on the same filesystem for downloading, then use an atomic rename(). And yes, atomic renames within a filesystem are the rule, not the exception.


    Holy Crap!X2 ... Did I really read through 7 pages before someone came up with what I thought was obvious on the 1st?

    And has an added bonus of not mucking up the real directory in case of a crash.
  • Casey 2008-09-26 23:47
    Has anyone ever heard of Scheduling Software, maybe like Control-M?
    I swear, the real WTF is NO ONE around here has ever worked for an IT shop with more than a handful of people.

  • hvm 2008-09-27 08:25
    Why not just let the Watcher know which files are done through some file like finished.log. You could save the date/time of the events and keep track of all the downloaded files. You would also have better control on how the watcher works and I think it's more efficient for it to read one file instead of checking the whole directory for new files.
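
    One possible shape for such a log, sketched in Python. The tab-separated line format and the function names are assumptions, and like any approach of this kind it requires a watcher that can be changed to read the log.

```python
import os
import time


def record_finished(log_path, filename):
    """Append one 'timestamp<TAB>filename' line once a download is complete."""
    line = f"{time.strftime('%Y-%m-%dT%H:%M:%S')}\t{filename}\n"
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(line)
        log.flush()
        os.fsync(log.fileno())


def read_new_entries(log_path, offset=0):
    """Return (filenames, new_offset) for complete lines written after offset."""
    names = []
    with open(log_path, "rb") as log:
        log.seek(offset)
        for raw in log:
            if not raw.endswith(b"\n"):
                break  # partially written line; pick it up on the next call
            offset += len(raw)
            _, _, name = raw.decode("utf-8").rstrip("\n").partition("\t")
            names.append(name)
    return names, offset
```

    The watcher keeps the returned offset between calls, so each poll reads only the lines added since the last one instead of rescanning the directory.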
  • George 2008-09-28 08:41
    Oddly enough, as a person who doesn't have a job programming, that was the first thought I had.

    Let the last function of the Downloader be to start the Watcher. Why complicate matters having something run all the time when it doesn't need to?
  • Shari 2008-09-28 10:52
    belgariontheking:

    That's one hell of a link there.


    Sorreeee! That's what happens when your blog is in Hebrew, and you send a trackback with a link to this post... *sigh*
  • Cube 2008-09-29 04:34
    A Downloader program will retrieve a handful of several-gigabyte files from a remote server and save them to a certain directory on disk. A Watcher program monitors this directory and immediately processes whichever files show up.


    If the Downloader program downloads files to a directory and the Watcher program processes them immediately, you couldn't use a temp dir -- as it would become the "main" directory which is processed by the Watcher program.

    I'd say there's a slight contradiction between the task description and the "correct" solution.
  • GregoryD 2008-09-29 13:58
    Meh.

    I've been a technical recruiter. Being a technical recruiter requires no technical experience. All it requires is the ability to sell. You have to be able to sell the candidate on your ability to place them (and then get the candidate and manager leads from them that will eventually get you more business), and the client on your ability to find qualified candidates. You can do those two things? Great, you too can make $100K+ a year working 9-5, as long as you don't mind a phone being surgically attached to your head and making 100+ phone calls a day.

    Recruiters don't have the technical knowledge to know, based on your resume, whether you're qualified for the job. They're just going off of buzzwords and requirements that the client has given them. Even the best recruiters will make mistakes half the time. Their bread and butter are the repeat superstar candidates who they can place multiple times, combined with exclusive clients who have become fed up with poor results from various agencies.

    I did quite well as a recruiter, but I didn't enjoy the work. It might be one of the few positions that are less honorable than trial lawyers. To be really good, you will have to be able to work people with half-truths until it becomes a natural way of dealing with people.
  • tbrown 2008-09-29 15:28
    TopCod3r:
    We have 2 open positions on our team, due to high turnover, so I interview probably about 5 or 6 people a week and have gotten really good at giving technical interviews. It usually involves giving them a real problem from some code we have, and seeing if they solve it the right way, and then I explain to them how it should be done and make sure they agree.

    It is hard to find people who have the right mix of skills and personality. Some people realize halfway through my technical interview that they lack the required knowledge and simply cut it short and walk out of the room, I assume in embarrassment.


    Too funny!

    I know it's just a troll, but I can't resist giving my "what he said/what he meant" translation...

    Because I'm such an a** to work with, people on our team jump ship as soon as they find something even tentatively viable, as a result we currently have 2 open positions. I interview about 5 or 6 people a week and have memorized my technical interview questions, based on problems that I perceive in our code. I see if they can immediately see how I would solve the problem or, lacking that, whether they can be convinced to agree with my solution.

    It is hard to find people who either think exactly like I do or are good enough suck-ups to agree with my opinion no matter what. Some people realize halfway through my technical interview that I am so thoroughly obnoxious that they simply cut it short and walk out of the room, I assume in embarrassment (because if I acknowledged the real reason it would lower my sense of self esteem).
  • Zoinks 2008-09-29 20:16
    +2 for Marc's suggestion.
    .. I'd suggest checking that the Estate of Heath Robinson wouldn't have any claims over any profits you make from marketing this solution -- or at least negotiate a fee/royalty up-front.

    Do not feed the lawyers!
  • trolltard 2008-09-29 20:19
    I like that line of thinking.
    First I would ask exactly what the downloader and watcher do though and who owns the code. Then I would most likely be misled by the phrasing of the question as most candidates apparently were and propose one or two "complicated" answers. Then I would ask them to read the problem definition again. Then I would suggest getting the code for whatever the watcher does and adding it to the downloader program so that the watcher's function is called on the file that has been downloaded as soon as the downloading is complete. Then I would probably fight against the irrational insistence that neither program be modified (probably this was an unnecessary design flaw grandfathered in from the beginning) for a minute or two and eventually give up on that track and finally suggest downloading to a temp file first or using a rename (mv).

    By the way, the real WTF here really is the fact that there are 300 plus comments on these mundane item(s) (although maybe what makes it so popular is that they are common situations that we can all relate to). Or maybe we are all just common.
  • jk 2008-09-29 23:12
    Franz_Kafka:
    Duke of New York:
    Franz_Kafka:
    If I were that guy (I'm not) and I got sued, I would stand up in court and say "Sorry your Honor, but I'm a complete asshole. I treat men and women equally poorly".

    That wouldn't keep him from having to testify as to the specific nature of the "personal problem" (not "personality problem") that was not related to her professional qualifications for the job. Or from having lawyers sift through his e-mails for evidence of a past pattern of behavior.

    Anyone who does interviews and doesn't see how clear-cut this was, is a walking liability and needs to get trained.


    That's easy - he's hiring toadies and footstools. In a less pejorative example, I'm allowed to pass on hiring someone because they don't fit in with the team, even if they're professionally qualified.

    Being an abusive jerk isn't illegal.


    in fact, at thedailywtf.com, it's a requirement.
  • AndiWand 2008-09-30 09:22
    SIGSTOP? SIGCONT?
  • gnasher 2008-09-30 18:39
    From the article: "However, because downloading takes significantly longer than processing, the Watcher program will crash if it reads a file that has not been fully downloaded."

    So this is clearly not related to the problem that the watcher starts processing before the file is completely downloaded, the problem is that the watcher is processing faster than the download happens, and reaches the end of the download file while the download is still in progress. It surely requires some clarification with the interviewer, but it is quite likely that copying the file would be much faster than processing, and that starting the processing before the copying is finished is no problem, as long as the copying is faster.
  • Random832 2008-10-01 12:29
    spenk:
    SoonerMatt:
    Marc:
    Rename?

    Yeah I was thinking that too. Rather than make it move a 3 gb file (which could fail in itself), I would start the transaction as a .tmp file then remove the .tmp when it's completed.


    The watcher appears to process *whatever* files are in that particular folder - simply altering the extension might not be enough.


    Yes, but if the interviewee had come up with that in the first place, the interviewer probably would not have resorted to "You can't modify the watcher, now what do you do?"
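
    A rough sketch of the ".tmp, then rename" idea from the quoted comments. It only helps if the Watcher ignores names ending in ".tmp", which is the assumption spenk questions; the helper names are made up for illustration.

```python
import os
import tempfile


def download_then_publish(chunks, final_path):
    """Write to final_path + '.tmp', then rename once every chunk is on disk."""
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "wb") as out:
        for chunk in chunks:
            out.write(chunk)
        out.flush()
        os.fsync(out.fileno())  # push the data to disk before exposing the name
    # Both names live in the same directory, so this is a rename, not a copy.
    os.replace(tmp_path, final_path)
    return final_path


def ready_files(directory):
    """What a Watcher that skips '.tmp' names would see (hypothetical filter)."""
    return sorted(n for n in os.listdir(directory) if not n.endswith(".tmp"))


# Usage: a half-finished download stays invisible to the filtered listing.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "partial.bin.tmp"), "wb").close()
download_then_publish([b"ab", b"cd"], os.path.join(demo_dir, "done.bin"))
visible = ready_files(demo_dir)
```

    No multi-gigabyte move is involved: the bytes are written once, and only the directory entry changes at the end.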
  • Random832 2008-10-01 12:50
    James R. Twine:
    Ok - so the majority here believe that mv is an atomic operation if the src and dst are on the same filesystem (or partition?)...

    Does this hold true for ALL filesystems running under Linux/Unix? It might be that the directory is not running on ext2/ext3... What if it was a mounted FAT32 partition, or ReiserFS, or UFS, or an SMB share?


    It's true for any unix filesystem because it is required by POSIX and nobody is going to design a unix filesystem that doesn't provide for this.

    It's true for NTFS - I don't know if it's atomic as implemented on windows (but it probably is), but NTFS's support for hardlinks means the operation can be broken down into "create new hardlink, delete old hardlink" which, while two operations, can be done quickly enough that it can be done without the kernel returning control to the application (and the rename() function [used for moving when possible] is guaranteed atomic - it will _fail_ if it can't be done atomically, so you can check this in the new code in the downloader)

    I don't know about FAT. Why would you be using FAT?
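
    The "it will fail if it can't be done atomically" point can be checked in the Downloader roughly like this. This is a sketch in Python, assuming a POSIX system where os.rename reports EXDEV for a cross-filesystem move; publish is a hypothetical helper name. Note that shutil.move would quietly fall back to copy-and-delete in that case, which is exactly the partial-file window being avoided.

```python
import errno
import os
import tempfile


def publish(tmp_path, final_path):
    """Rename tmp_path to final_path, refusing to fall back to a copy."""
    try:
        os.rename(tmp_path, final_path)
    except OSError as exc:
        if exc.errno == errno.EXDEV:
            # Source and destination are on different filesystems.
            raise RuntimeError(
                f"{tmp_path} and {final_path} are on different filesystems; "
                "copying here would expose a partial file to the Watcher"
            ) from exc
        raise


# Usage: the temp directory sits inside the watched directory's parent.
base = tempfile.mkdtemp()
staging = os.path.join(base, "tmp")
os.mkdir(staging)
staged = os.path.join(staging, "big.bin")
with open(staged, "wb") as f:
    f.write(b"payload")
publish(staged, os.path.join(base, "big.bin"))
```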
  • Random832 2008-10-01 12:54
    Asiago Chow:
    if the two directories are in the same filesystem, it doesn't matter if they're in different disks. In fact, i can guarantee that they don't.


    Not sure what you meant to say here. It does matter whether they are on the same disk or partition. I don't know what you can guarantee.

    Here's what they are talking about:


    echo "hello" > /mine/tmp/tmpfile
    mv /mine/tmp/tmpfile /mine


    Now imagine the filesystem was set up one of these ways:

    1:
    mkdir /mine
    mkdir /mine/tmp

    2:
    mkdir /mine
    mkdir /mine/tmp
    mount /dev/sda3 /mine/tmp


    What we have here is a failure to communicate.

    In this situation, /mine and /mine/tmp are no longer "in the same filesystem" as the terms are used by people who know unix.

    When savar said "But the way Linux works, the entire filesystem is represented as being contiguous, even when the physical storage isn't.", Franz_Kafka assumed he was talking about RAID or LVM or something, not "mounting two partitions in subdirectories" (since those are traditionally _still called_ two filesystems)
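
    One way to tell the two setups apart before relying on mv is to compare device IDs. This is a small sketch, assuming both paths already exist; same_filesystem is a made-up name, and comparing st_dev is a common heuristic rather than a guarantee on every platform or network mount.

```python
import os
import tempfile


def same_filesystem(path_a, path_b):
    """True when both existing paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


# Usage: a freshly created subdirectory shares its parent's filesystem
# (setup 1 above); after mounting a partition on it (setup 2) it would not.
parent = tempfile.mkdtemp()
child = os.path.join(parent, "tmp")
os.mkdir(child)
rename_is_safe = same_filesystem(parent, child)
```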
  • Random832 2008-10-01 13:00
    Duke of New York:
    Franz_Kafka:
    So she's a woman and it's automatically sex discrimination? He took a long lunch and blew off an appointment because he's a self centered schmuck and mostly looked down on her for going to Brown. The personality fit is a valid test, but in this case, the personality he wanted was 'sycophant'.

    What "personality fit"? The interviewer was talking about some "personal level problem" that he had somehow discovered before even doing the interview.


    The rest of us assumed the 'personal level problem' was acting "clearly annoyed" when the interviewer finally showed up.

    If someone really did come in with an actual unexplained attitude problem (not the case here since it was clearly justified), you're supposed to go through the entire interview and make up some bullshit reason?
  • Random832 2008-10-01 13:10
    Asiago Chow:
    Shrug. You really should do some reading. Knowledge is always good. I can't force you of course but it will help you to understand what people are saying.

    Filesystem has two meanings.

    Your job as a reader is to figure out which of those two meanings the writer intended.

    You failed.


    Except that someone else used the term _first_, with the "one volume/partition/drive" meaning, and savar _changed_ it (in a way that was _just_ ambiguous enough, given the meaning previously in use, that he could have been talking about software RAID or LVM) in the middle of the discussion
  • Duke of New York 2008-10-01 13:50
    Random832:
    If someone really did come in with an actual unexplained attitude problem (not the case here since it was clearly justified), you're supposed to go through the entire interview and make up some bullshit reason?

    You're supposed to go through the interview and make a no-hire recommendation for specific reasons related to the job. If you don't, you risk a lawsuit.

    This is not some big secret. Ask any HR person.
  • Dave 2008-10-07 04:10
    The "Watcher" problem is a really terrible interview question, what you're asking candidates is "I have a magic silver bullet solution that I want you to guess and I'll shoot down any approach that doesn't match my silver bullet". I'd hire the candidate simply for the ingenuity he showed in approaching the problem.
  • Valeria Williams 2009-08-11 14:44
    Every time that I download, my computer does not respond. What do I need to do? I don't believe that every time my system comes up there's always a problem. Please help me, because trying to do applications online is a pain when the system keeps going down. Advise me on what you can. Thank You!!!!!
  • Alex 2009-08-18 13:53
    savar:
    imMute:
    All it would take is one "clever" sysadmin to put the temp directory on a separate partition and all of a sudden you've reintroduced the same race condition you had before -- except this time you're not even aware of it.

    I'm a sysadmin and I'm deeply offended by your lack of vision.
    Putting /tmp on a different partition allows you to mount it with useful flags, such as noexec, which is a good security practice.
  • Grammar Nazi 2009-10-08 11:29
    AMerrickanGirl:
    But he could have cared less, so he wasn't getting any commission off my back.


    No, he "couldn't have cared less". If he could have cared less, then he cared. What you mean is that he didn't care. Also you shouldn't start a sentence with a conjunction.
  • Yboy403 2011-11-12 18:51
    I loled at the first one…I'm a 17-year-old kid with no real programming experience other than the most basic dhtml and c++/Java experience, and I guessed the answer in thirty seconds.
  • Sean 2013-08-02 14:18
    Someone You Know:
    Gorfblot:
    A diamond in the rough is something that looks to be of little value, but is actually worth quite a lot once some polishing has been done. IE, You talked with one of the Tier 1 helpdesk people and realized they had some talent and were quick to learn. You get them some further training, move them to a junior development position, and in a short time they become a major contributor to success- That's a diamond in the rough.


    I don't think "IE" means what you think it means.


    I concur, what you are looking for is "EG"
  • John tan 2013-09-27 07:38
    I Guess That Would Work, Too
    from Jeremy H.

    I think this Jeremy H. doesn't have what it takes to be in the IT field.

    The best answer is a CRC32 hash. It's a two-in-one function: it confirms the file has downloaded completely and verifies that it is 100% identical to the original.
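
    A sketch of what the CRC32 check could look like, assuming the remote server publishes an expected checksum for each file (the article does not say that it does). file_crc32 and is_complete are hypothetical names. CRC32 catches truncation and accidental corruption, but it is not a cryptographic hash.

```python
import os
import tempfile
import zlib


def file_crc32(path, chunk_size=1 << 20):
    """CRC32 of a file, read in chunks so a multi-gigabyte file is never held in memory."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def is_complete(path, expected_crc):
    """True only when the file exists and matches the checksum the server advertised."""
    return os.path.exists(path) and file_crc32(path) == expected_crc


# Usage: a truncated download fails the check, the full file passes.
payload = b"hello world"
expected = zlib.crc32(payload)
target = os.path.join(tempfile.mkdtemp(), "file.bin")
with open(target, "wb") as f:
    f.write(payload[:5])          # simulate a download still in progress
partial_ok = is_complete(target, expected)
with open(target, "wb") as f:
    f.write(payload)
full_ok = is_complete(target, expected)
```

    The cost is that every check re-reads the whole file, so on several-gigabyte downloads it is a verification step after the transfer rather than something to poll with.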

  • John tan 2013-09-27 07:43
    I Guess That Would Work, Too
    from Jeremy H.

    And one more important flaw in this stupid interview question: who in their right mind transfers several-gigabyte files remotely over the internet? At least consider file compression first....

    That software engineer is a raw gem: a great programmer, but without a high EQ. With a smart manager's leadership, he could be turned into a very valuable company asset.


    Shame how a raw gem gets treated like dirt by interviewers.
  • Don't be a cockface 2014-09-09 07:45
    wds:
    imMute:
    File copies would still be a problem, but file moves (also known as renames) are extremely quick as long as the src/dst are on the same volume: the FS only moves the inode data. Hell, even windows handles this as well as *nix.

    All this assuming they're on the same partition right? So if they're not, you're in serious trouble. Considering how /tmp is the usual place to drop stuff in, and /tmp is often on another partition I don't see how this is an acceptable solution. Not to mention the problem with having allocated space in /var/program/blah but not in /tmp and thus running out of room to drop your multigig executables.

    See I'd just have used a lockfile.


    Several years later I still can't help myself - DON'T BE SO FUCKING DELIBERATELY OBTUSE YOU KNOBHEAD