• (disco)

    Nice.... Everyone knows what happens when you buttume...

  • (disco)

    I think there's some sort of order of operations issue here. Shouldn't it start with the file extension, check that the file extension matches the anticipated contents (and continue), or if it doesn't, then do some file-detection magic?

    Or, failing that, just move the priority of Fortran detection lower in the ladder (so that CSV is detected first). ... ... Oh! Oh! I get it! We can store our enable-disable settings in the config!
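A minimal sketch of that ordering: trust the extension when the contents agree with it, and only fall back to sniffing when they don't. It assumes two illustrative formats (CSV and PNG) and deliberately crude content checks. `detect_type`, `looks_like_csv` and the other names are hypothetical, not the third-party library's API.

```python
import os
from typing import Callable, Optional

def looks_like_csv(data: bytes) -> bool:
    # Crude check: valid UTF-8 and every non-blank line contains a comma.
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    lines = [ln for ln in text.splitlines() if ln.strip()]
    return bool(lines) and all("," in ln for ln in lines)

def looks_like_png(data: bytes) -> bool:
    # PNG files start with a fixed 8-byte signature.
    return data.startswith(b"\x89PNG\r\n\x1a\n")

# Extension -> (MIME type, content check). Illustrative entries only.
EXPECTED: dict[str, tuple[str, Callable[[bytes], bool]]] = {
    ".csv": ("text/csv", looks_like_csv),
    ".png": ("image/png", looks_like_png),
}

def sniff(data: bytes) -> Optional[str]:
    # Fallback "file-detection magic": only consulted when the extension is
    # unknown or the contents don't match what the extension promised.
    for mime, check in EXPECTED.values():
        if check(data):
            return mime
    return None

def detect_type(filename: str, data: bytes) -> Optional[str]:
    ext = os.path.splitext(filename)[1].lower()
    if ext in EXPECTED:
        mime, check = EXPECTED[ext]
        if check(data):
            return mime  # extension and contents agree, so trust the extension
    return sniff(data)
```

With this order, `detect_type("trwtf.csv", b"C COLON BACK SLASH,2,bupkis,9999\n")` returns `"text/csv"`, and the sniffer never gets a vote.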

  • (disco) in reply to Tsaukpaetra
    Tsaukpaetra:
    Shouldn't it start with the file extension, check that the file extension matches the anticipated contents (and continue), or if it doesn't, then do some file-detection magic?

    But if you do that, my FORTRAN files that I've given the .csv extension (for "C-like by SteVen") won't work :(

  • (disco) in reply to sloosecannon

    Proposed solution: Name them .SVC, for Steven's C.

  • (disco) in reply to Tsaukpaetra

    But that conflicts with "Steven's Visual C" :P

  • (disco) in reply to sloosecannon

    Proposed solution: Name them .VCS, for Visual-C by Steven

  • (disco) in reply to Tsaukpaetra

    But that clashes with the files for my "Video Control System"

  • (disco) in reply to PleegWat

    Proposed solution: Name them .CVS, for Control for Video System. :P ;P

    C'mon guys, there's only so many combinations of these three letters I can do....

  • (disco) in reply to Tsaukpaetra
    Tsaukpaetra:
    Proposed solution: Name them .CVS, for Control for Video System.

    Control directory for the Concurrent Versions System? That's kind of a stretch, since those are directories rather than files and are literally called CVS. Maybe someone else knows something better.

    Tsaukpaetra:
    C'mon guys, there's only so many combinations of these three letters I can do....

    That's the challenge!

  • (disco)

    Reminds me of this bug: https://support.microsoft.com/en-us/kb/215591

  • (disco)

    This is why using magic numbers in production code (as opposed to one-off manual checks) is a horrible idea.

    And yes, I've seen people defending this idea. (Thankfully, only on Reddit, not in code I've had to work with)

  • (disco)

    CSV, or: How I Learned to Stop Using Magic Numbers and Love the Filename Extensions

  • (disco)

    Sorry, but Andrew has a quick and slick workround available: get the code that imports these CSV files to check whether the file starts with "C " and, if so, add a dummy line at the top.
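A minimal sketch of that workround. It assumes the detector only looks at the very start of the file. A sniffer that scans further in would still trip on later lines that start with "C ". It also assumes whatever consumes the CSV can skip or tolerate the extra line. `DUMMY_LINE` and the function name are hypothetical.

```python
# Hypothetical placeholder; anything that doesn't look like a FORTRAN comment.
DUMMY_LINE = b"dummy\n"

def add_dummy_line_if_needed(data: bytes) -> bytes:
    # Only the first line is inspected, matching the "starts with" check above.
    if data.startswith(b"C "):
        return DUMMY_LINE + data
    return data
```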

  • (disco) in reply to Quite

    Or perhaps check for "C " and add quotes around the first field if found.
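Here is a sketch of the quoting variant. It rewrites every line that starts with "C ", not just the first, on the assumption that a sniffer may look past line one. An unquoted CSV field shouldn't contain commas, so splitting at the first comma isolates the field. Any stray double quotes are doubled so the quoted field stays valid. The function name is hypothetical.

```python
def quote_leading_c_fields(text: str) -> str:
    out = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # preserve the original line ending
        if body.startswith("C "):
            field, sep, rest = body.partition(",")
            # Wrap the first field in quotes, escaping embedded quotes CSV-style.
            body = '"' + field.replace('"', '""') + '"' + sep + rest
        out.append(body + ending)
    return "".join(out)
```

The result still parses to the same values with a standard CSV reader, but no line begins with "C " any more.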

  • (disco) in reply to immibis_
    immibis_:
    This is why using magic numbers in production code (as opposed to one-off manual checks) is a horrible idea.

    And yes, I've seen people defending this idea. (Thankfully, only on Reddit, not in code I've had to work with)

    Magic numbers are always useful; they're fast and mostly reliable, and always work as a first-fail. Since they're not 100% reliable, obviously you have to handle failures, jumping back and parsing each possibility in turn, but so many engineers have that "aha!" moment and never the "but what if....?" epiphany. Ah, the fresh smell of hubris in the morning.
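A minimal sketch of "magic as a first pass, then parse each possibility in turn". Candidates whose quick check matches are tried first. Every candidate still has to survive a full parse, so a wrong first guess costs time rather than correctness. The two formats, their quick checks, and the "consistent multi-column" rule for CSV are illustrative assumptions.

```python
import csv
import io
import json
from typing import Callable, Optional

# (name, quick magic/heuristic check, full parse that raises on failure)
Candidate = tuple[str, Callable[[bytes], bool], Callable[[bytes], object]]

def parse_json(data: bytes) -> object:
    return json.loads(data.decode("utf-8"))

def parse_csv(data: bytes) -> object:
    rows = list(csv.reader(io.StringIO(data.decode("utf-8"))))
    widths = {len(r) for r in rows if r}
    # Any text is "single-column CSV", so require a consistent multi-column shape.
    if len(widths) != 1 or widths == {1}:
        raise ValueError("not a consistent multi-column CSV")
    return rows

CANDIDATES: list[Candidate] = [
    ("json", lambda d: d.lstrip()[:1] in (b"{", b"["), parse_json),
    ("csv", lambda d: b"," in d[:1024], parse_csv),
]

def identify(data: bytes) -> Optional[str]:
    # Stable sort: quick-check hits first, everything else afterwards.
    ordered = sorted(CANDIDATES, key=lambda c: not c[1](data))
    for name, _quick, parse in ordered:
        try:
            parse(data)
            return name
        except (ValueError, csv.Error):  # UnicodeDecodeError is a ValueError
            continue
    return None
```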

  • (disco) in reply to Zemm
    Zemm:
    Reminds me of this bug: https://support.microsoft.com/en-us/kb/215591
    I like the careful explanation of how to insert an apostrophe in the text file.

    It reminded me of the manual I saw for a big, big, big IBM pen plotter back in the day, with a careful explanation of something that everyone older than about three days will have learned: how to plug an appliance into an electrical outlet.

  • (disco) in reply to Zemm
    Zemm:
    Reminds me of this bug: https://support.microsoft.com/en-us/kb/215591

    Simple solution: don't use Apple products. [Applies to 2004, so over a decade old.]

  • (disco) in reply to PleegWat

    .FU -- FORTRAN, Uncompiled

    There! Problem solved.

    Although, a Real Programmer should be able to do anything in FORTRAN.

  • (disco) in reply to ka1axy

    Humph. You'd think so, but half the kids coming out of coding boot camp nowadays haven't even heard of FORTRAN.

  • (disco)

    TRWTF is that Andrew didn't replace the "enterprise software" hooks for ingesting the CSV with a quick subroutine or module call, available in practically any modern programming language, that ingests the file and creates the necessary output for the other subunit, rather than trying to wrangle the 3rd party software. That saves the irritation and effort of trying to convince the 3rd party that there is a problem.

  • (disco) in reply to Hasteur

    But that sounds like work!

  • (disco) in reply to immibis_

    @Immibis_, I'd like to learn more about this...

    • NodeBB needs to detect file types when a user uploads an image (as avatar/attachment)
    • We use a module called mmmagic. We chose this module because it detects file type via magic numbers. It requires compilation.
    • I've made it my mission to eliminate compiled libraries from NodeBB. We no longer use imagemagick (in favour of jimp) or compiled bcrypt (in favour of bcrypt.js), but all the other MIME detection packages detect via file extension, which is not exactly safe: someone could upload a .exe with a .png extension and the software would be none the wiser.

    So to me, my two options are detection via extension, or via magic numbers. If using magic numbers in production code is a horrible idea, then what is the correct course of action?

  • (disco) in reply to julianlam
    julianlam:
    So to me, my two options are detection via extension, or via magic numbers. If using magic numbers in production code is a horrible idea, then what is the correct course of action?

    As long as you don't need to detect CSV (which has no “magic number” detection scheme at all that can distinguish it from plain text) then you can use your current strategy. The best MIME detector I know of is Apache Tika, but it's not perfect and is also a pretty heavyweight dependency unless you're using Java already (which qualifies it for the Elephant In The Room award I guess).

  • (disco) in reply to dkf

    You can always use my preferred method for handling MIME types:

    try
    {
        // assume you have received the correct MIME type
    }
    catch (Exception e)
    {
    }

  • (disco) in reply to Vault_Dweller

    Sounds like...

    Steam for Linux Launcher

    :point_up: Found in the Steam for Linux Launcher code.

  • (disco) in reply to immibis_
    immibis_:
    This is why using magic numbers in production code (as opposed to one-off manual checks) is a horrible idea.

    "Magic numbers" does not mean what you think it means. This is content sniffing (done terribly).

  • (disco)

    I had used libmagic (the library behind the file command)... And guess what? That didn't work for me because, for the weirdest reasons, libmagic does that looking-into-the-file as well. You don't go and misinterpret my files and get away with it. I have now switched to the lighttpd way of mapping an extension to a type.
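A sketch of extension-only typing in the spirit of lighttpd's `mimetype.assign` table. The entries below are illustrative assumptions, not lighttpd's defaults. The file's contents are never read, so a CSV whose first line starts with "C " stays `text/csv`.

```python
import os

# Illustrative extension -> MIME table; not lighttpd's actual defaults.
EXTENSION_TYPES = {
    ".csv": "text/csv",
    ".f": "text/x-fortran",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
}

DEFAULT_TYPE = "application/octet-stream"  # fallback for unknown extensions

def type_for(filename: str) -> str:
    # Only the name is consulted; the contents play no part.
    ext = os.path.splitext(filename)[1].lower()
    return EXTENSION_TYPES.get(ext, DEFAULT_TYPE)
```

The trade-off is the one raised elsewhere in the thread: a renamed .exe gets whatever type its extension claims, so anything that matters still has to be validated when the file is actually parsed.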

  • (disco)

    This also reminds me of the Notepad glitch where a text file containing the text pattern:

    they hid the facts
    

    was misinterpreted as UTF-16 ("Unicode" in Notepad's terms) and displayed as Chinese characters.


    Filed under: Looks like they fixed it. Facts no longer hidden!
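This sketch doesn't reproduce Notepad's actual detection heuristic. It only shows why the guess was plausible. The 18 ASCII bytes of that sentence form valid UTF-16LE, and every resulting code unit lands in the CJK ideograph block.

```python
text = "they hid the facts"
raw = text.encode("ascii")          # the bytes as stored on disk
as_utf16 = raw.decode("utf-16-le")  # reinterpret each byte pair as one code unit

print(len(raw), len(as_utf16))  # 18 9
# Every code unit is a CJK Unified Ideograph (U+4E00..U+9FFF).
print(all("\u4e00" <= ch <= "\u9fff" for ch in as_utf16))
```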

  • (disco) in reply to sloosecannon

    TRWTF is that they're only checking for an uppercase C. Various FORTRAN dialects allow C, c, D, d, !, *, and // as comment starters.

    One trick we used to pull on FORTRAN programmers: we would run a script that added an invisible display code 74 as the first character of each line. The compiler treated character 74 as a comment marker, so the whole program was one long comment. The compiler would cheerfully compile and run that! Real fast!
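A sketch of a check that accepts every comment starter listed in the post above. Which markers are valid depends on the FORTRAN dialect and on fixed- versus free-form source, so the set is an assumption taken from the post. It also shows how much broader a "complete" check gets. In fixed form, the marker in column 1 is enough, whatever follows it.

```python
# Comment starters as listed in the post; dialect-dependent.
STARTERS = ("C", "c", "D", "d", "!", "*", "//")

def is_fortran_comment_line(line: str) -> bool:
    # Fixed-form style: the marker must sit at the very start of the line.
    return line.startswith(STARTERS)
```

By that rule, a perfectly ordinary CSV header such as `Date,Amount` would also be flagged as a FORTRAN comment.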

  • (disco) in reply to Zemm
    Zemm:
    Reminds me of this bug: https://support.microsoft.com/en-us/kb/215591

    Reminds me of this bug: https://bugs.launchpad.net/ubuntu/+source/file/+bug/248619

    Or rather: https://bugs.launchpad.net/ubuntu/+source/cupsys/+bug/255161

  • (disco) in reply to mihi

    Greatest bug of all time.

  • (disco) in reply to mihi

    “Help! My files won't print on Tuesdays!” :laughing:

  • (disco) in reply to julianlam
    julianlam:
    So to me, my two options are detection via extension, or via magic numbers. If using magic numbers in production code is a horrible idea, then what is the correct course of action?

    In your case, I'd probably elect to detect via extension but verify that the file matches the image format, which in the avatar case you need to do anyway for size checks/resizing.

    If the image doesn't pass either end, you've got a human to return the error to anyway.
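A minimal sketch of "detect via extension, then verify", assuming the allowed avatar formats are PNG, JPEG and GIF. The signature check is only a cheap first gate. The resize step that follows is what actually decodes the image. `check_upload` is a hypothetical name.

```python
import os
from typing import Optional

# Leading signature bytes for each allowed extension.
SIGNATURES = {
    ".png": (b"\x89PNG\r\n\x1a\n",),
    ".jpg": (b"\xff\xd8\xff",),
    ".jpeg": (b"\xff\xd8\xff",),
    ".gif": (b"GIF87a", b"GIF89a"),
}

def check_upload(filename: str, data: bytes) -> Optional[str]:
    """Return an error message for the user, or None if the upload looks OK."""
    ext = os.path.splitext(filename)[1].lower()
    expected = SIGNATURES.get(ext)
    if expected is None:
        return f"unsupported file type: {ext or '(none)'}"
    # Test for the one type the extension claims; no open-ended guessing.
    if not data.startswith(expected):
        return f"file contents don't look like {ext}"
    return None
```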

  • (disco) in reply to PleegWat

    You mean I should just process it, and if my image manipulation lib derps out, then throw an error?

    How does this fare security-wise? Is it exploitable?

  • (disco) in reply to julianlam

    Well, I'd say if your image handling library is safe, you can hand it a Windows executable and it'll say "Hey bro, that's not a jpeg".

    If your image handling library is not safe, you can probably hand it a crafted jpeg that passes your magic checker but turns into an exploit further down.

    And I'd say the image library is more likely to error out on a Windows executable than on a crafted image file.

  • (disco)

    Out of curiosity, I tried this:

    $ file - <<< 'C Hello'
    /dev/stdin: ASCII FORTRAN program text
    
  • (disco)

    I'm the original submitter. As usual, the story has been extensively fictionalised and elaborated, but the basics are still there. Disallowing uploads if the content doesn't pass a magic test might possibly make sense for formats with standard headers, where magic is reliable. But it's just madness for loosely defined formats like CSV, where pretty much any text file with commas might qualify, and where a less common but more specific pattern like a FORTRAN comment is therefore (quite reasonably) chosen as the higher-probability match. So it's not sensible as a general rule for all uploads. In reality, I just edited one line in the third-party library to disable the magic test and accept the uploaded file regardless of its contents. It's then up to the CSV parser to fail if the file is invalid.
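A sketch of "accept the upload and let the CSV parser fail", with no content sniffing at all. What counts as "invalid" is an assumption here: the file must be UTF-8, must be well-formed under Python's `csv` module in strict mode, and must have a consistent column count. `parse_uploaded_csv` and `InvalidCsv` are hypothetical names.

```python
import csv
import io

class InvalidCsv(ValueError):
    pass

def parse_uploaded_csv(data: bytes) -> list[list[str]]:
    try:
        text = data.decode("utf-8-sig")  # also tolerates a leading BOM
    except UnicodeDecodeError as exc:
        raise InvalidCsv("not UTF-8 text") from exc
    try:
        reader = csv.reader(io.StringIO(text, newline=""), strict=True)
        rows = [row for row in reader if row]  # skip blank lines
    except csv.Error as exc:
        raise InvalidCsv(f"malformed CSV: {exc}") from exc
    if not rows:
        raise InvalidCsv("empty file")
    width = len(rows[0])
    for n, row in enumerate(rows, start=1):  # n counts non-blank rows
        if len(row) != width:
            raise InvalidCsv(f"row {n} has {len(row)} fields, expected {width}")
    return rows
```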

    And TRWTF is probably that Excel (which output the CSV files in question) doesn't default to quoting string columns, which would also have avoided the issue. Of course you can't ask end-users of a website to adjust their Excel settings.
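For comparison, this is what quoting string columns looks like when writing CSV with Python's `csv` module. This is not Excel, and its settings aren't shown here. `csv.QUOTE_NONNUMERIC` quotes every field that isn't an int or float, so each line starts with a double quote rather than "C ".

```python
import csv
import io

rows = [
    ["C COLON BACK SLASH", 2, "bupkis", 9999],
    ["C IS FOR COOKIE", 6, "Donald Trump", 0],
]

def to_csv(rows) -> str:
    buf = io.StringIO()
    # Strings get quoted; ints and floats are written bare.
    csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC).writerows(rows)
    return buf.getvalue()
```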

  • (disco) in reply to Troy
    $ cat trwtf.csv 
    C COLON BACK SLASH,2,bupkis,9999
    C IS FOR COOKIE,6,Donald Trump,0
    C SPACE SOMETHING,1,Dillis Filler,112
    $ file trwtf.csv 
    trwtf.csv: FORTRAN program, 
    

    Edit: There is no input that returns CSV when given to file.

  • (disco) in reply to julianlam
    julianlam:
    So to me, my two options are detection via extension, or via magic numbers. If using magic numbers in production code is a horrible idea, then what is the correct course of action?

    Perhaps it was badly worded? Using magic numbers to test for a particular file type is fine (and I assume you only have a small set of allowed image types). Using magic numbers to guess the file type is bad.

    Or: all JPEG files start with FFD8, but not all files that start with FFD8 are JPEG files.

Leave a comment on “You're Not My MIME Type”
