• (cs) in reply to Bim Job
    Bim Job:
    MIME much?
    Context much?
  • dlikhten (unregistered)

    What if you told him that someone might, by accident, give their file an extension that is a random combination of 1-6 letters, so he needs to list them all out?

    See if the guy would actually type it all up. If he at least makes a loop we know that he read the basics of programming books :P

  • Carl (unregistered) in reply to History Teacher
    History Teacher:
    Being evolved from a system developed on old and currently ridiculously obsolete technology is not retarded.

    ...we have retards who call this retarded without suggesting any working alternative.

    The fact is, there won't be an alternative in our lifetime

    Unix is an alternative, it is working now, and was working essentially the same way even back before your "system developed on old and currently ridiculously obsolete technology" was slapped together to close a sale and maybe we'll fix it later. Oh, and it worked on that old hardware too, so you can't use that as an excuse.

    Don't try to spin history to someone who was there.

  • (cs) in reply to Anon
    Not if you are using regional settings that use comma as the decimal separator. I've been stung by that before.

    Well, then you should use (and check for) quotes when you open a CSV file:

    Example:

    John,"5th Street, number 10", 100$

    contains three columns.
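
    A minimal sketch of that quoting rule, assuming Python's standard `csv` module as the reader (the thread itself does not name a parser, and `split_csv_line` is a made-up helper name):

```python
import csv
import io

def split_csv_line(line):
    """Split one CSV line, treating commas inside double quotes as data."""
    return next(csv.reader(io.StringIO(line)))

fields = split_csv_line('John,"5th Street, number 10", 100$')
print(len(fields))  # 3
print(fields[1])    # 5th Street, number 10
```

    Note that the third field keeps its leading space, so a real importer would probably still want to trim values.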

  • Headchips (unregistered)

    Actually his idea initially was correct. Extension really doesn't matter except for certain files that are executables or might contain code that you don't want uploaded. You don't want someone uploading an executable that could contain a virus, for example. As long as it was not a dangerous extension, then check to see if the contents are valid. But he got carried away with extension-mania and/or failed to comment his code well enough so others understood the intent.

  • AndyC (unregistered) in reply to Carl
    Carl:
    Unix is an alternative, it is working now, and was working essentially the same way even back before your "system developed on old and currently ridiculously obsolete technology" was slapped together to close a sale and maybe we'll fix it later. Oh, and it worked on that old hardware too, so you can't use that as an excuse.

    Don't try to spin history to someone who was there.

    It works just fine. Till someone gives you a file with absolutely no context as to what it is or what it was created with and you either have to keep guessing or just go back and ask them.

    The file extension is metadata. It's also the one piece of metadata that is pretty much guaranteed to survive the translation to any file system, to any OS, that can easily be transferred as part of transmitting a file (regardless of protocols etc) and that can always be used consistently on any kind of file (even plain text which is utterly unstructured).

    It may not be fancy, it may not be particularly clever, but it does work. Which is why you even see Apple using it these days.

  • ClutchDude (unregistered)

    I think we almost don't have enough background here.

    The situation could be that a delimited .txt file is being spit out to Windows users, who then make a small change to it via the system, a viewer or plain TextPad (hence the .txt, probably!). The user then uploads the changes.

    However, like others have said, the extension is a convenience, not a requirement. Your validation logic is going to tell you if the file is good or not, so what's the point in caring about the extension?

    Truth be told, this thing will probably only be used by 4 people who should know what file they are uploading and the extension check is a simple "Whoops. Wrong file." catch.
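
    For what "let the validation logic decide" could look like, here is a rough sketch. It assumes the upload is tab-delimited with a fixed column count; `validate_tab_delimited` and `expected_columns` are invented names, not anything from the article.

```python
def validate_tab_delimited(text, expected_columns):
    """Return a list of problems found; an empty list means the content looks valid."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line == "":
            continue  # ignore completely empty lines
        columns = line.split("\t")
        if len(columns) != expected_columns:
            problems.append(
                f"line {number}: expected {expected_columns} columns, found {len(columns)}"
            )
    return problems
```

    With something like this in place, the extension check is only a shortcut for the "Whoops. Wrong file." case; the content check is what actually decides.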

    Captcha: validus-def:strong, mighty, powerful, exceeding

  • My Name? (unregistered) in reply to Anonymous Cow-Herd
    Anonymous Cow-Herd:
    My Name?:
    Frits:
    The real WTF is validating the extension at all. Let the data validation fail if the file is the incorrect type.

    No, the real WTF is using tabs as delimiters. Also relying on actual files is the wrong way to do it. By using streams the mime-type table becomes obsolete.

    Surely should be using XML. That's more "enterprise" than using tabs or commas or semicolons or rectums as delimiters.

    I did not complain about commas or semicolons! But tabs, spaces and rectums all look alike (e.g. did you mean a tab instead of 8 spaces?) and should be avoided as single delimiters.

  • (cs) in reply to Headchips
    Headchips:
    Extension really doesn't matter except for certain files that are executables or might contain code that you don't want uploaded. You don't want someone uploading an executable that could contain a virus, for example. As long as it was not a dangerous extension...
    There's no such thing as a "dangerous extension"; the extension changes nothing about the file contents. If you're reading it like a text file, an .exe that (if executed) would be the world's most disastrous computer virus presents absolutely no more security risk than any other file, and certainly no more security risk than the same file renamed to .txt.

    Where do people get the idea that data in itself can be malicious? It's only when you (or your OS) try and do something with it that it can be seen as such.

  • Sam (unregistered)

    You people are really missing the point of the article:

    "To solve this, my colleague figured the best way was to verify that the uploaded file's name had the correct extension of .txt. It's a decent first step that one would normally code as follows."

    It doesn't claim to be a perfect system, the point is how the guy attempted to program said condition.

  • some other dude (unregistered) in reply to AndyC

    If you really can't figure out what you are supposed to do with a file based on context, try using the 'file' command to determine what sort of file it is, and if it is a text file, open it up to see what is in it.

    Really this is unneeded 99.99% of the time because it is generally pretty damned obvious what a file is from context.
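
    The idea behind `file` is to look at the leading bytes rather than the name. A deliberately tiny sketch of that idea follows; it is not how `file` itself is implemented, it only knows a handful of well-known signatures, and `sniff` is a made-up name.

```python
# A few well-known leading byte signatures ("magic numbers").
SIGNATURES = [
    (b"MZ", "Windows executable"),
    (b"%PDF", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"PK\x03\x04", "ZIP archive (also .xlsx, .docx, .jar)"),
]

def sniff(data):
    """Guess what a byte string is from its contents, ignoring any file name."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown binary data"
    return "text (valid UTF-8)"
```

    A text file that happens to start with "MZ" would be misreported, which is the kind of guessing the replies above complain about.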

  • Swedish tard (unregistered) in reply to mariushm
    mariushm:
    Not if you are using regional settings that use comma as the decimal separator. I've been stung by that before.

    Well, then you should use (and check for) quotes when you open a CSV file:

    Example:

    John,"5th Street, number 10", 100$

    contains three columns.

    And as suggested, what if it is used as a decimal separator sometimes and sometimes not? Will you parse it as a string or as a number? Or, oh joy, what if someone with a keyset different from the one used on the server enters a character that translates to " on the server? Seen those a few times actually. IMHO, in CSVs, position says what type each field holds. If it doesn't parse straight away, or if there is ambiguity between acceptable characters and separators, then swap out the separator instead of pushing more rules into a system that is pretty easy from the start.

    Comma is a rather bad separator, especially since there are lots of characters hardly used in normal text or numbers to pick from in a normal keymap. Perhaps pipe? Or some other even more archaic character.

    For example; John|5th Street, number 10|100,00$

  • Jim Steichen (unregistered)

    Stupid is as Stupid does, sir.

  • cinnamon colbert (unregistered)

    as a nonprogrammer, i think the idea of checking for .txt is good. after all, how many "regular" people actually know about extensions in the first place? if the admins who are uploading the files are not programmers, they may not even know what extensions are .....

    anyway, having it in .txt means you used notepad, or saved it somehow in a special way from excel or word, which means you actually thought about the file for a second, and, maybe, there is a prayer you got it right

  • (cs)

    Just a note: back when I was doing this kind of thing, one of the mail merge formats Microsoft Word used was the Microsoft Word Table, which was a tab-delimited text file with the extension .DOC

  • Sea Shells (unregistered) in reply to Markp
    Where do people get the idea that data in itself can be malicious?
    ...says the coworker who wonders why anyone would use strncpy instead of strcpy...

    Just sayin'...you know...

    - SS
  • dan sichel (unregistered) in reply to Jasper

    Dude, that's so 20th century. You should use a loop in a text-to-speech engine to have the conversation with this dude.

    I was going to include code for a loop to go through all the permutations but then realized that would make me dumber than the guy who did the coding, so I'll leave that as an exercise for any masochistic moron reading this.

  • (cs)

    Well, you know what they say, when you eliminate the improbable, whatever remains, however idiotic, must be TRWTF.

  • Ike (unregistered) in reply to Shishire
    Shishire:
    That being said, the day my *nix box starts checking for file extensions to determine what type of a file it is is the day I throw it out of a 5th story window.
    mv myhugeprog.c myhugeprog.o; make clean
  • Joseph M. (unregistered) in reply to cinnamon colbert
    cinnamon colbert:
    as a nonprogrammer, i think the idea of checking for .txt is good. after all, how many "regular" people actually know about extensions in the first place? if the admins who are uploading the files are not programmers, they may not even know what extensions are .....

    Ah, yes, that line of thinking is such a common source of WTFs: "I don't use that particular feature, so I won't bother to include it in my program."

  • (cs) in reply to DJ Maze
    DJ Maze:
    Is magic mime not available anymore? How about "filename.exe\x00.txt"

    Given it's ToUpper() and not .toUpper() I'd guess it's .NET, rather than Java. Which still has exceptions.

  • (cs) in reply to BSDGuy
    BSDGuy:
    ding, ding!

    Why check the extension at all? If the file meets the delimiting parameters then it's all good... if it doesn't, return BAD_FILE_FORMAT (some number, probably negative, or use an exception if it's Java, which I think it is based on system.IO...).

    Two WTFs don't make a right.

    Given it's using ToUpper() and not .toUpper() I'd guess it's .NET, rather than Java. Which still has exceptions.

  • Grandpa's Long Drawers (unregistered) in reply to Jasper
    And also: Some extensions such as .asp and .aspx are in the list twice. Probably those are files you really don't want people to upload. Just to be extra sure you check twice.

    This is retarded. If you read it correctly you'll realize that it's actually listing the extensions ".aspx;.asp", ".asp;.cer", and ".aspx;.axd". It's unsurprising that you would miss this though, as knowledge of these extensions has only been made available to non-retarded people.

  • Grandpa's Long Drawers (unregistered) in reply to My Name?
    My Name?:
    Anonymous Cow-Herd:
    My Name?:
    Frits:
    The real WTF is validating the extension at all. Let the data validation fail if the file is the incorrect type.

    No, the real WTF is using tabs as delimiters. Also relying on actual files is the wrong way to do it. By using streams the mime-type table becomes obsolete.

    Surely should be using XML. That's more "enterprise" than using tabs or commas or semicolons or rectums as delimiters.

    I did not complain about commas or semicolons! But tabs, spaces and rectums all look alike (e.g. did you mean a tab instead of 8 spaces?) and should be avoided as single delimiters.

    ASCAPE THE DERIMETER

  • Grandpa's Long Drawers (unregistered) in reply to some other dude
    some other dude:
    somedude:
    Assuming that it was stated in the requirements that the files had to have a .txt extension, the whole file extension debate is moot.

    And assuming everyone involved was a narwhale then the entire debate is also moot, because who really gives a damn what narwhales do?

    The fact is that both of these assumptions are stupid to make, being completely baseless.

    // Incorrect extension. Show error message.

    Sorry dude, there's no such thing as a "narwhale".

  • Grandpa's Long Drawers (unregistered) in reply to Shishire
    Shishire:
    Markp:
    Maybe it's just Windows people not getting it, but I absolutely never name text files with .txt on Unix-like systems. I just leave out the extension altogether.
    I'd like to point out that I make sure to label all text files on UNIX systems *.txt. Why? Because that way, I (not the computer, but the person) know that it's supposed to be text. That being said, the day my *nix box starts checking for file extensions to determine what type of a file it is is the day I throw it out of a 5th story window.

    yeah I did that too but then the system didn't boot any more. something about not being able to find the boot loader menu file. fortunately, the guy it landed on knew it wasn't an anvil, so he was okay.

  • Mr.Googler (unregistered) in reply to Engival

    Seriously?

    Why do 90% of people on here not understand "example"? This isn't gospel, people, this is humorous snippets.

    I'm sure the original poster could have given a ten page case study with notes of the "correct" way to do it, just as I'm sure you would have gone through each of the ten pages picking out typos in his comments.

  • Anonymous Cow-Herd (unregistered) in reply to cinnamon colbert
    cinnamon colbert:
    anyway, having it in .txt means you used notepad, or saved it somehow in a special way from excel or word,

    or just saved it normally and renamed it to .txt thinking it's been "converted". (See also .bmp -> .jpg)

  • Another Nonny Mouse (unregistered) in reply to SenTree

    Thought I was the only person who remembered Mr Pode of Croydon. Saluto!

  • (cs) in reply to Swedish tard
    Swedish tard:
    Comma is a rather bad separator, especially since there are lots of characters hardly used in normal text or numbers to pick from in a normal keymap.
    It made sense originally, for US numeric data without thousand separators. It's just that it's ended up being used for a lot more than that (and many implementors of CSV writers and readers are truly retarded as they don't know how to do quoting, but that's another story). All your criticism really amounts to is “it's stupid to use things outside where they were designed for”, yet I can assure you that it happens all the time. I'm aware of some code I wrote a few years ago which caused a significant WTF moment, not because the code was wrong, but because it was being used for something completely outside the original design brief. C'est la vie.
  • Anonymous Cow-Herd (unregistered) in reply to HeebyJeeby
    HeebyJeeby:
    It's retarded because you're retarded

    YOUR RETARDED

    Discuss

  • (cs) in reply to Ike
    Ike:
    Shishire:
    That being said, the day my *nix box starts checking for file extensions to determine what type of a file it is is the day I throw it out of a 5th story window.
    mv myhugeprog.c myhugeprog.o; make clean
    And? This has nothing to do with extensions, unless you mean you have a makefile that does things like "rm *.o" as its clean action. But then the writer of the makefile was thinking of ".o" as an extension; make itself doesn't assume anything there. And if you have a makefile that does a "rm myhugeprog.o" or "rm *.o" then the WTF is doing "mv myhugeprog.c myhugeprog.o". But the whole thing is up to the programmer/writer of the makefile; there are no assumptions about any extensions. (gcc and some other compilers do have some defaults though, maybe those can provide a better example?) Still, *nix itself doesn't care about extensions. Some programs do, but not your example make.

    For a nice read about make, you might be interested in this: http://miller.emu.id.au/pmiller/books/rmch/

    I see extensions as just a bit of metadata that helps users work out which program they should try to use the file with; the programs themselves should make as few assumptions as possible based on the file name or its extension. Depending on the kind of program, it should have other ways to identify a file, or it should defer to the user (who might use an extension to help him).

    For example: If I get a files.tar file, I might try running it through tar, but if it's not a valid tar archive, tar should complain it's not.
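
    The same behaviour can be sketched in Python, assuming the standard `tarfile` module stands in for the tar program; `looks_like_tar` and the file names are made up for the demonstration.

```python
import os
import tarfile
import tempfile

def looks_like_tar(path):
    """Ask the tar reader itself, ignoring whatever the file is called."""
    return tarfile.is_tarfile(path)

with tempfile.TemporaryDirectory() as folder:
    # A plain text file that merely claims to be a tar archive.
    fake = os.path.join(folder, "files.tar")
    with open(fake, "w") as handle:
        handle.write("just some text, not an archive\n")

    # A genuine tar archive with no extension at all.
    real = os.path.join(folder, "archive_without_extension")
    with tarfile.open(real, "w") as archive:
        archive.add(fake, arcname="files.tar")

    print(looks_like_tar(fake))  # False
    print(looks_like_tar(real))  # True
```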

  • Helfdane (unregistered)

    Of course the real real wtf is in the validation as well ;-)

    if (System.IO.Path.GetExtension(fileName).ToLower() == "txt")

    should've been:

    if (System.IO.Path.GetExtension(fileName).ToLower().Equals(txt))

  • (cs) in reply to Another Nonny Mouse
    Another Nonny Mouse:
    Thought I was the only person who remembered Mr Pode of Croydon. Saluto!
    Greetings, fellow connoisseur. The reference happened to be in my medium-term memory because the series was recently re-run on BBC Radio 7 - an excellent station for those of us who like classic comedy.
  • BlackStar (unregistered)

    My god, I seriously facepalmed...

  • Anon (unregistered) in reply to mariushm
    mariushm:
    Not if you are using regional settings that use comma as the decimal separator. I've been stung by that before.

    Well, then you should use (and check for) quotes when you open a CSV file:

    Example:

    John,"5th Street, number 10", 100$

    contains three columns.

    Try telling Excel

    Quotes don't fix the problem. Example:

    John,"5,12",100

    Now is 5,12 a string or a number?
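
    To make the ambiguity concrete: the quotes only settle where the field ends, not what it means. In this sketch the caller has to state the decimal separator explicitly; `parse_decimal` is a hypothetical helper, and it simply strips the other character as a thousands separator.

```python
from decimal import Decimal

def parse_decimal(text, decimal_separator):
    """Parse a numeric string given an explicitly stated decimal separator."""
    if decimal_separator not in (".", ","):
        raise ValueError("decimal_separator must be '.' or ','")
    thousands_separator = "," if decimal_separator == "." else "."
    cleaned = text.replace(thousands_separator, "").replace(decimal_separator, ".")
    return Decimal(cleaned)

print(parse_decimal("5,12", ","))  # 5.12
print(parse_decimal("5,12", "."))  # 512
```

    Same bytes, two different numbers, and nothing in the file says which one was meant.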

  • bored (unregistered) in reply to Helfdane
    Helfdane:
    Of course the real real wtf is in the validation as well ;-)

    if (System.IO.Path.GetExtension(fileName).ToLower() == "txt")

    should've been:

    if (System.IO.Path.GetExtension(fileName).ToLower().Equals(txt))

    fail

  • Who (unregistered) in reply to Carl
    Carl:
    File "extensions" are TRWTF.

    A file name is just a name. You can name a file anything you want, as long as you use valid characters. It can have zero or 100 dots and the computer shouldn't care. The name shouldn't be expected to contain any metadata that the computer cares about. It is for use by humans, and the computer should let the human use whatever name is meaningful to the human.

    Here we go with this file-extension-hating argument... The one thing that I've always liked about Windows is that every file has a meaningful file extension. I don't NEED to look at any metadata or anything more complicated than the name. If it says txt, there's a 99.99% chance that it's a simple text file. When I double click that file, the only thing Windows needs to check is the file extension and it knows how to open the file.

    If I want to use Notepad++ to open a text file, I have to update exactly one record in the registry and now ALL the txt files automatically open with the new program.

    Your argument about the date, owner, etc. is ridiculous. All of those items can change and some change frequently. Of course they wouldn't be a part of the file name. The type of file will never change. A JPG doesn't just turn into a TXT file. It's always a JPG... Sometimes a type can change (i.e. from a TXT to a CSV) but for the most part, that doesn't happen, and when it does, it's most likely because the contents of the file did fundamentally change or it was given the wrong name to begin with.

    Anyway, just because you prefer to not have file extensions doesn't make them wrong or broken. I understand that you can't rely on them to be 100% accurate, but it's a good, easy first line of defense to help select the right files in the first place. Do I think that an app should reject files with the wrong extension? No. But if this was a quick, internal app, and the users were told specifically to use a TXT file, then why not reject anything else immediately? If they couldn't get that right, who knows what else they messed up!

    For argument's sake, if you had a file named "Configuration" how would you keep track of the fact that it's a text-based file that is human readable? On my machine, it'd be called "configuration.ini" and I would know exactly what to expect.

  • Frits (unregistered)

    Be conservative in what you do; be liberal in what you accept from others.

  • Callin (unregistered) in reply to Helfdane
    Helfdane:
    Of course the real real wtf is in the validation as well ;-)

    if (System.IO.Path.GetExtension(fileName).ToLower() == "txt")

    should've been:

    if (System.IO.Path.GetExtension(fileName).ToLower().Equals(txt))

    Of course, the REAL real WTF is why you forgot to put "txt" in quotes.

    This entire discussion is an absolute mess.

  • Callin (unregistered)

    ^

    and "Equals()" should be lowercase

  • Swedish tard (unregistered) in reply to dkf
    dkf:
    Swedish tard:
    Comma is a rather bad separator, especially since there are lots of characters hardly used in normal text or numbers to pick from in a normal keymap.
    It made sense originally, for US numeric data without thousand separators. It's just that it's ended up being used for a lot more than that (and many implementors of CSV writers and readers are truly retarded as they don't know how to do quoting, but that's another story). All your criticism really amounts to is “it's stupid to use things outside where they were designed for”, yet I can assure you that it happens all the time. I'm aware of some code I wrote a few years ago which caused a significant WTF moment, not because the code was wrong, but because it was being used for something completely outside the original design brief. C'est la vie.

    Quite true, but comma is still being used, even in today's international world of programming. It is a very unfortunate relic of the past, and one that is so easy to remedy that it is silly. Not that I mind. I see a lot of worse things being done again and again just because they were always done... (And I know that wasn't really your point there, I just went off on a tangent)

  • aasfrd (unregistered) in reply to Callin
    Callin:
    Helfdane:
    Of course the real real wtf is in the validation as well ;-)

    if (System.IO.Path.GetExtension(fileName).ToLower() == "txt")

    should've been:

    if (System.IO.Path.GetExtension(fileName).ToLower().Equals(txt))

    Of course, the REAL real WTF is why you forgot to put "txt" in quotes.

    This entire discussion is an absolute mess.

    Don't forget the "."
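
    Putting the corrections in this sub-thread together: the extension returned by .NET's Path.GetExtension includes the leading period, so the comparison has to be against ".txt", not "txt". Python's `os.path.splitext` behaves the same way, so here is an equivalent sketch in Python (`has_txt_extension` is a made-up name, and this is not the article's code):

```python
import os.path

def has_txt_extension(file_name):
    """Case-insensitive check that the final extension is .txt."""
    extension = os.path.splitext(file_name)[1]  # includes the leading dot, e.g. ".TXT"
    return extension.lower() == ".txt"
```

    Only the last extension counts here, so "report.txt.exe" does not pass, and a name with no dot at all yields an empty extension.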

  • Anonymous (unregistered) in reply to Callin
    Callin:
    ^

    and "Equals()" should be lowercase

    Incorrect. This is .NET, not Java. In .NET, method names are capitalised, so Equals is absolutely correct. But wait, this doesn't even have anything to do with the article. Please pay attention.

  • Anonymous (unregistered) in reply to cinnamon colbert

    Let's be realistic. What is more likely to happen is that the software will tell the user that the extension has to be ".txt", so they'll take their .xlsx and rename it to .txt and upload that.

  • Bim Job (unregistered) in reply to rfsmit
    rfsmit:
    Bim Job:
    MIME much?
    Context much?
    Bim Job:
    Shishire:
    Markp:
    Maybe it's just Windows people not getting it, but I absolutely never name text files with .txt on Unix-like systems. I just leave out the extension altogether.
    I'd like to point out that I make sure to label all text files on UNIX systems *.txt. Why? Because that way, I (not the computer, but the person) know that it's supposed to be text. That being said, the day my *nix box starts checking for file extensions to determine what type of a file it is is the day I throw it out of a 5th story window.
    MIME much?
    Not often, no. I'm selective. On the whole, I like to think that I select an appropriate subset of the original opinion or statement.

    You, much, you little prick?

  • Bim Job (unregistered) in reply to Frits
    Frits:
    Be conservative in what you do; be liberal in what you accept from others.
    Yup, like that worked so well.

    Three characters: W3C.

  • History Teacher (unregistered) in reply to Carl
    Carl:
    History Teacher:
    Being evolved from a system developed on old and currently ridiculously obsolete technology is not retarded.

    ...we have retards who call this retarded without suggesting any working alternative.

    The fact is, there won't be an alternative in our lifetime

    Unix is an alternative, it is working now,
    Uh, no, it's not!

    You can't even compile your precious Unix without using tools that depend on extension to be accurate metadata about contents of the file.

    Then there's editors determining what editing mode a file should be opened in.

    There's GUI file managers determining what application should be used to open a file (no, file manager looking for magic strings in file contents is not reliable nor easily user-configurable).

    There's web servers needing to easily and efficiently provide correct mime type for all static files.

    Need I go on, or elaborate more on above cases?
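
    As one illustration of the web server case, this is roughly what an extension-to-type lookup looks like, sketched with Python's standard `mimetypes` module. `content_type_for` is a hypothetical helper, and the fallback type is a common convention rather than anything stated in this thread.

```python
import mimetypes

def content_type_for(file_name):
    """Pick a Content-Type from the file name alone, without opening the file."""
    guessed_type, _encoding = mimetypes.guess_type(file_name)
    return guessed_type or "application/octet-stream"
```

    The lookup never touches the file contents, which is what makes it cheap, and also what makes it wrong whenever the extension lies.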

    and was working essentially the same way even back before your "system developed on old and currently ridiculously obsolete technology" was slapped together to close a sale and maybe we'll fix it later. Oh, and it worked on that old hardware too, so you can't use that as an excuse.

    Don't try to spin history to someone who was there.

    If you're thinking of current "unix-like" filesystems, I believe they are a rather decent invention, from the 1980s or so, much more recent than for example CP/M with its 8+3 filenames. But I'm not saying you're wrong about there having been better alternatives suitable for small media capacities, abysmal seek times (8" floppy) and small RAM sizes. But the current kind of filesystem would have been a totally retarded choice for the personal computer technology of the 70's... Are you sure you were there?

  • History Teacher (unregistered) in reply to RogerWilco
    RogerWilco:
    Ike:
    Shishire:
    That being said, the day my *nix box starts checking for file extensions to determine what type of a file it is is the day I throw it out of a 5th story window.
    mv myhugeprog.c myhugeprog.o; make clean
    And? This has nothing to do with extensions, unless you mean you have a makefile that does things like "rm *.o" as its clean action. But then the writer of the makefile was thinking of ".o" as an extension; make itself doesn't assume anything there. And if you have a makefile that does a "rm myhugeprog.o" or "rm *.o" then the WTF is doing "mv myhugeprog.c myhugeprog.o". But the whole thing is up to the programmer/writer of the makefile; there are no assumptions about any extensions. (gcc and some other compilers do have some defaults though, maybe those can provide a better example?)
    So, without using extension, how would you create a set of rules to create linkable object files from different kind of source files? Needless to say, listing every source-object filename pair and correct compiler explicitly is not an acceptable solution.
    Still *nix itself doesn't care about extensions, some programs do, but not your example make.
    Some programs and scripts that are part of a Unix OS belong to the group of programs that rely on extension being valid metadata.
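
    A toy version of such a suffix-driven rule set, to show why the extension ends up doing the work. The rule table and `compile_command` are hypothetical, and the commands are illustrative rather than taken from any real makefile.

```python
import os.path

# Hypothetical suffix rules: source extension -> compiler invocation prefix.
COMPILE_RULES = {
    ".c": ["cc", "-c"],
    ".cpp": ["c++", "-c"],
}

def compile_command(source):
    """Build the command line for one source file, chosen purely by its suffix."""
    stem, suffix = os.path.splitext(source)
    if suffix not in COMPILE_RULES:
        raise ValueError(f"no rule for suffix {suffix!r}")
    return COMPILE_RULES[suffix] + [source, "-o", stem + ".o"]
```

    Rename myhugeprog.c to something without a recognised suffix and the rule lookup has nothing to go on, short of listing every file explicitly.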
