Admin
Nice.... Everyone knows what happens when you buttume...
Admin
I think there's some sort of order of operations issue here. Shouldn't it start with the file extension, check that the file extension matches the anticipated contents (and continue), or if it doesn't, then do some file-detection magic?
Or, failing that, just move the priority of Fortran detection lower in the ladder (so that CSV is detected first). ... ... Oh! Oh! I get it! We can store our enable-disable settings in the config!
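For what it's worth, here is a rough Python sketch of the "extension first, sniff only as a fallback" ordering suggested above. The `EXPECTED_PREFIX` table, the `TEXT_EXTENSIONS` set and the `classify` helper are all made up for illustration; nothing here is the actual third-party library's code.

```python
from pathlib import Path

# Hypothetical table: extension -> leading bytes expected for that format.
# Only formats with a fixed header belong here; CSV has none.
EXPECTED_PREFIX = {
    ".png": b"\x89PNG\r\n\x1a\n",
    ".gif": b"GIF8",
    ".pdf": b"%PDF-",
}

# Plain-text formats with no signature to verify: the extension is all we have.
TEXT_EXTENSIONS = {".csv", ".txt"}


def classify(filename: str, head: bytes) -> str:
    """Return 'accept' if the extension can be trusted, else 'sniff'."""
    ext = Path(filename).suffix.lower()
    if ext in TEXT_EXTENSIONS:
        return "accept"
    prefix = EXPECTED_PREFIX.get(ext)
    if prefix is not None and head.startswith(prefix):
        return "accept"
    # Unknown extension or contents that don't match it:
    # only now fall back to content detection.
    return "sniff"
```

With this ordering, a `.csv` whose first row happens to start with "C " never reaches the Fortran check at all.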
Admin
But if you do that, my Fortran files that I have the .csv file extension on (for "C-like by SteVen") won't work :(
Admin
Proposed solution: Name them .SVC, for Steven's C.
Admin
But that conflicts with "Steven's Visual C" :P
Admin
Proposed solution: Name them .VCS, for Visual-C by Steven
Admin
But that clashes with the files for my "Video Control System"
Admin
Proposed solution: Name them .CVS, for Control for Video System. :P ;P
C'mon guys, there's only so many combinations of these three letters I can do....
Admin
Control directory for the Concurrent Versioning System? Kind of a stretch since those aren't files and are literally called .CVS, maybe someone else knows something better. That's the challenge!
Admin
Reminds me of this bug: https://support.microsoft.com/en-us/kb/215591
Admin
This is why using magic numbers in production code (as opposed to one-off manual checks) is a horrible idea.
And yes, I've seen people defending this idea. (Thankfully, only on Reddit, not in code I've had to work with)
Admin
CSV, or: How I Learned to Stop Using Magic Numbers and Love the Filename Extensions
Admin
Sorry, but Andrew has a quick and slick workaround available: get the code which imports these csv files to check whether it starts "C " and if so add a dummy line at the top.
Admin
Or perhaps check for "C " and add quotes around the first field if found.
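A quick Python sketch of the quoting variant, assuming the file is already in memory as text and that a first field starting with "C " is unquoted. `quote_leading_c` is a hypothetical helper name, not anything from the story.

```python
def quote_leading_c(text: str) -> str:
    """If the CSV text starts with 'C ', wrap its first field in double quotes."""
    if not text.startswith("C "):
        return text
    first_line, newline, rest = text.partition("\n")
    first_field, comma, other_fields = first_line.partition(",")
    # Escape embedded quotes the CSV way (double them) before wrapping.
    quoted = '"' + first_field.replace('"', '""') + '"'
    return quoted + comma + other_fields + newline + rest
```

The file then starts with a double quote instead of "C ", so a check for a leading Fortran comment no longer matches, and a CSV parser still reads the same value.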
Admin
Magic numbers are always useful; they're fast and mostly reliable, and always work as a first-fail. Since they're not 100% reliable, obviously you have to handle failures, jumping back and parsing each possibility in turn, but so many engineers have that "aha!" moment and never the "but what if....?" epiphany. Ah, the fresh smell of hubris in the morning.
Admin
It reminded me of the manual I saw for a big, big, big IBM pen plotter back in the day, with a careful explanation of something that everyone older than about three days old will have learned: how to plug an appliance into an electrical outlet.
Admin
Simple solution: don't use Apple Products [applies to 2004 - so over a decade old]
Admin
.FU -- FORTRAN, Uncompiled
There! Problem solved.
Although, a Real Programmer should be able to do anything in FORTRAN.
Admin
Humph. You'd think so, but half the kids coming out of coding boot camp nowadays haven't even heard of FORTRAN.
Admin
TRWTF is that Andrew didn't replace the "enterprise software" hooks for ingesting the CSV with a quick subroutine or module call, available in practically any modern programming language, that ingests the file and creates the necessary output for the other subunit, rather than trying to wrangle the 3rd-party software. That saves the irritation and effort of trying to convince the 3rd party that it is a problem.
Admin
But that sounds like work!
Admin
@Immibis_, I'd like to learn more about this...
mmmagic. We chose this module because it detects file type via magic numbers. It requires compilation.
.exe with a .png extension and the software would be none-the-wiser.
So to me, my two options are detection via extension, or via magic numbers. If using magic numbers in production code is a horrible idea, then what is the correct course of action?
Admin
As long as you don't need to detect CSV (which has no “magic number” detection scheme at all that can distinguish it from plain text) then you can use your current strategy. The best MIME detector I know of is Apache Tika, but it's not perfect and is also a pretty heavyweight dependency unless you're using Java already (which qualifies it for the Elephant In The Room award I guess).
Admin
You can always use my preferred method for handling MIME types:
Admin
Sounds like...
Steam for Linux Launcher
:point_up: Found in the Steam for Linux Launcher code.
Admin
"Magic numbers" does not mean what you think it means. This is content sniffing (done terribly).
Admin
I had used libmagic by GNU... And guess what. That didn't work for me because, for the weirdest reasons, libmagic does that looking-into-the-file as well. You don't go and misinterpret my files and get away with it. I have now switched to the lighttpd kind of way to map a type to an extension.
Admin
This also reminds me of the Notepad glitch where a text file containing the text pattern:
was interpreted as Unicode.
Filed under: Looks like they fixed it. Facts no longer hidden!
Admin
TRWTF is that they're only checking for an uppercase C. Various FORTRANs allow C, c, D, d, !, *, and // as comment starters.
One trick we used to pull on Fortran programmers: we would run a script that would add an invisible display code 74 as the first character of each line. The compiler considered character 74 as a comment, so the whole program was one long comment. The compiler would cheerfully compile and run that! Real fast!
Admin
Reminds me of this bug: https://bugs.launchpad.net/ubuntu/+source/file/+bug/248619
Or rather: https://bugs.launchpad.net/ubuntu/+source/cupsys/+bug/255161
Admin
Greatest bug of all time.
Admin
“Help! My files won't print on Tuesdays!” :laughing:
Admin
In your case, I'd probably elect to detect via extension but verify the file matches the image format. Which in the avatar case you need to do anyway for size checks/resizing.
If the image doesn't pass either end, you've got a human to return the error to anyway.
Admin
You mean I should just process it, and if my image manipulation lib derps out, then throw an error?
How does this fare, security-wise, is it exploitable?
Admin
Well, I'd say if your image handling library is safe, you can hand it a windows executable and it'll say "Hey bro, that's not a jpeg".
If your image handling library is not safe, you can probably hand it a crafted jpeg that passes your magic checker but turns out into an exploit further down.
And I'd say the image library is more likely to error out on a windows executable than on a crafted image file.
Admin
Out of curiosity, I tried this:
Admin
I'm the original submitter. As usual, the story has been extensively fictionalised and elaborated, but the basics are still there. Disallowing uploads when the content doesn't pass a magic test might make sense for formats with standard headers, where magic is reliable. For loosely defined formats like CSV, where pretty much any text file with commas might qualify and so less common patterns like FORTRAN comments are (quite reasonably) given a higher probability, it is just madness. So it's not sensible as a general rule for all uploads, and in reality I just edited one line in the third-party library to disable the magic test and accept the uploaded file regardless of contents. It's then up to the CSV parser to fail if the file is invalid.
And TRWTF is probably that Excel (which output the CSV files in question) doesn't default to quoting string columns, which would also have avoided the issue. Of course you can't ask end-users of a website to adjust their Excel settings.
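To illustrate "let the CSV parser decide", here is a minimal Python sketch using the standard `csv` module. The `parse_csv_or_fail` name and the rule that every row must have the same number of fields are my own assumptions, not what the real importer did.

```python
import csv
import io


def parse_csv_or_fail(text: str) -> list[list[str]]:
    """Parse text as CSV; raise ValueError if it is empty or has ragged rows."""
    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows:
        raise ValueError("empty file")
    width = len(rows[0])
    for number, row in enumerate(rows, start=1):
        # Assumed validity rule: every row has as many fields as the first.
        if len(row) != width:
            raise ValueError(f"row {number} has {len(row)} fields, expected {width}")
    return rows
```

A file whose first row starts with "C " parses like any other, which is the point: the parser only cares whether the rows make sense, not what the first two bytes happen to resemble.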
Admin
Edit: There is no input that returns CSV when given to file.
Admin
Perhaps it was badly worded? Using magic numbers to test for a particular file type is fine (and I assume you only have a small set of allowed image types). Using magic numbers to guess the file type is bad.
Or: all JPEG files start with FFD8, but not all files that start with FFD8 are JPEG files.
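In Python terms, the "test, don't guess" idea might look like the sketch below. `may_be_jpeg` and `check_upload` are hypothetical names, and the check is deliberately only a precondition: passing it says nothing more than "this could be a JPEG".

```python
JPEG_SOI = b"\xff\xd8"  # the FFD8 start-of-image marker mentioned above


def may_be_jpeg(data: bytes) -> bool:
    """Necessary but not sufficient: does the data start with FFD8?"""
    return data.startswith(JPEG_SOI)


def check_upload(data: bytes) -> str:
    """Reject obvious non-JPEGs early; everything else still needs a real decoder."""
    if not may_be_jpeg(data):
        return "rejected: not a JPEG"
    return "passed to decoder"
```

Note that `may_be_jpeg(b"\xff\xd8not really an image")` is also true, so the image library still has the final say.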