• Peter Harvey (unregistered) in reply to travisowens

    Hmm, you claim he's running in JavaScript, but the function looks like Java and has a comment "function is called from jsp". I do not believe this is JavaScript. Unless JSP has a means to invoke a JavaScript on the client side?

    This method looks fine to me. And I am curious about how you get a '\0' into a String. Not saying it's impossible, but I'd be surprised if Java let you. :)

  • Ancient_Hacker (cs)

    ... and nobody's mentioned the worse WTF in there---

    ... It's not the use of three lines where one line with three nested replaces would do.

    ... it's not the use of three replaces wwhere one replaceall() would do.

    It's the WTF of doing things backwards and incompetently-- trimming THREE bad characters and letting anything else through.

    Perhaps it would make more sense and use the algorithm of "just let through letters that we think can appear in a name".

  • foobarbaz (unregistered) in reply to Peter Harvey
    Peter Harvey:
    This method looks fine to me. And I am curious about how you get a '\0' into a String. Not saying it's impossible, but I'd be surprised if Java let you. :)

    I don't know about Java, but in C I would:

    void getNull(char string[])
    {
        printf("Please press the NULL key and hit enter\n");
        string[strlen(string)] = getchar();
    }
    

    *The real WTF is I can make a better WTF than The Daily WTF.

  • savar (cs) in reply to WIldpeaks
    WIldpeaks:
    So the wtf is the lame joke ?

    I guess he's trying to strip special characters out of a SQL string?? But I have no idea why he's replacing "&", or why inserts spaces in their place.

    This WTF makes very little sense. I'm sure some people (read: nobody) like the stylized story that accompanies each code segment, but frankly today it's a detractor.

    Why does the double quote have it's pants around its ankles...and the single quote is toting a pager?

  • A. N. Consultant :p (unregistered)
    Comment held for moderation.
  • Fred (unregistered) in reply to Dan
    A user should never have to conform to the wills of the system, the system should have to conform to the will of the user. If the system doesn't need quotes and can work fine simply by removing them, why would that not be the obvious solution? You propose that the system should just reject input and ask the user to type it in again without quotes???

    Got that right! And they should also type in their credit card number as a string of 16 digits NO SPACES OR DASHES PLEASE and if their card is an Amex that has 15 digits then they can just sit there and stew.

  • Art (unregistered) in reply to Peter Harvey
    Peter Harvey:
    And I am curious about how you get a '\0' into a String. Not saying it's impossible, but I'd be surprised if Java let you. :)

    String str = "\0";

    :o)

  • Peter Harvey (unregistered) in reply to Ancient_Hacker
    Ancient_Hacker:
    It's the WTF of doing things backwards and incompetently-- trimming THREE bad characters and letting anything else through.

    Perhaps it would make more sense and use the algorithm of "just let through letters that we think can appear in a name".

    Um, let us imagine that the developer is using some kind of proprietary wire protocol. Something used, say, for communicating with a simple device that displays text. Or for communicating medical records over old modems in Australia. This is a situation where you might like to use a JSP to give access to a number of people. And it's a situation where people know exactly what characters are problematic, because its defined in the protocol.

    Why would you remove "everything but the characters that are illegal" when the set of "illegal" characters is only the control characters for the wire protocol? I mean, the set of "legal" characters is also defined in the protocol, so the two methods would be equivalent.

    Remember that this code is taken way out of context. You can't say "this code is wrong!" without know what it was trying to achieve. All we know is that it "removes special characters" and by golly it does just that. The definition of "special" is context-dependent, and we don't know the context.

  • htg (unregistered)

    So it does 3 O(n) functions rather than one custom one that handled all 3 undesirable characters within (it's certainly not worth doing it with RegExps for such a simple task).

    Now in the context of the data that it will be processing it could be dodgy, e.g., if there are things being passed in like "foo & bar", but that's not the function's fault, it's the specification (or not using a library method that sanitises/fixes "web" strings for processing).

  • Peter Harvey (unregistered) in reply to Art
    Art:
    String str = "\0";

    :o)

    Hehe, fair enough. :)

  • WhatTheFrack (cs) in reply to Licky Lindsay
    Licky Lindsay:
    s:
    Except when you know the input shouldn't contain any of these characters.

    In which case the input should be totally rejected, not "sanitized".

    Cool...then you'd end with another appearance on WTF for your application, after someone called "O'Brien" tried to enter their name and got an "invalid format" response.

  • aquanight (cs) in reply to travisowens
    travisowens:
    1. He's doing it in JavaScript, which is ran on the client side, which can easily be disabled, allowing me to submit any naughty chars I want.

    Incorrect. Neither 'public' 'static' nor 'String' are valid in JavaScript. This is plain Java, likely server-side Java (aka JSP).

    travisowens:
    2. He's skipped other chars which can cause damage: \0 ; \n \r (I forget the whole list)

    Without knowing context, we cannot be sure what characters can cause damage. It may very simply be that only ' " and & mess up the system, but all other characters are okay.

    travisowens:
    3. He's replacing the bad characters with spaces, when he should stop processing if they're that nasty, or convert them into HTML char codes if he just wants to make them safe to process.

    There are times when you should reject, and times when you should sanitize. Sanitizing could be either escaping or replacing with spaces, depending on requirements. Again, we do not know what this function is for.

    travisowens:
    4. If a user entered an HTML char code, which contain a &, the third replace will really foul it up.

    HTML is irrelevant. Although JSP is mentioned, this function could easily have nothing to do with user input. Also, users generally don't input HTML entities, but rather have their input converted as such prior to rendering in a browser.

    travisowens:
    PS: RegEx isn't always as slow as people think it is, but for a simple replace, Regex is probably overkill. I've found cases where RegEx was much faster than non RegEx methods, it depends on the language & compiler.

    It also depends on using regex properly. Passing a string literal as a regex probably means it'll have to be parsed every time the function is called (barring compiler optimizations, which require extreme intelligence on the compiler's part). If you want any efficiency with regexes that aren't one-shots in java, you want to use the java.util.regex classes. For example, here is the OP's function using a regex:

    private static final java.util.regex.Pattern specialChars = Pattern.compile("['"&]");
    public static String removeSpecialChars(String a_sValue) {
        if (a_sValue != null) {
            Matcher m = specialChars.matcher(a_sValue);
            a_sValue = m.replaceAll(" ");
        }
        return a_sValue;
    }

    I leave it to the experts of the board to determine if this is faster or slower than the OP function (I figure it'll probably be a whole ±5 nanosecond difference.)

    The only WTF to see here is the use of hungarian notation in the formal paramter's name (a_sValue).

  • Pol (unregistered)

    uh... Lessee. Shouldn't be using a single replace char to char. What if a character (needing to be escaped) is added later to the standard (whatever it is) and maintenance don't update this code? Processing strings on a character basic in a web context (or anything high-level) is dangerous if you don't have the full picture. Almost no one does (me neither). What about Unicode or Localized strings? What if the character you replace it with may become dangerous (see escaping)?

    If he's building a sql query and wants to get rid of dangerous chars, he should use QueryBuilder. Doing anything else is dangerous (see point A). If he's building an XML/HTML string, he should use another way (templates or Builders). This code per se is not a problem, but it should never be called.

    This is a perfect example of Really Bad Web Practice. The fact that people in this forum find code like this one ok is only proof that something must be done at the education level...

    Use libraries for building strings from user inputs, don't manage them yourself. Otherwise, you're aiming your foot and waiting for the user to trigger.

    Need coffee (Not A Captcha (tm))

  • AssimilatedByBorg (cs) in reply to dkf
    dkf:
    AssimilatedByBorg:
    I don't like, but can accept, a system that uppercases everything and leaves it that way, but if it's going to allow some sort of mixed case, e.g., Mckay, why not allow McKay? grrr...
    You are aware that capitalization rules are locale-specific? That means if you work with the same data in the USA, the Netherlands and Turkey, you run the risk of getting three different answers if you don't take care...
    Oh, I'm totally unaware of capitalization rules. What I'm trying to say is that if I enter McKay, then the system darn well better have a good reason for changing what I enter. I have a hard time imagining why it should ever change what I type in, when there are basically no rules (in the English world, at least) about what can appear in a person's name.

    (Now, postal codes typically have rules -- that any letters in them (Canada/UK) are only uppercase -- then it makes sense for the system to sanitize the input. But not, IMHO, people's names.)

  • Ben (unregistered)

    The real WTF is that this method is totally unsafe for translating HTML to JavaScript. Plus he shouldn't be using Perl for that anyway.

    AND he doesn't take into account that curse words are harmful to little children and should be removed as well.

    And don't get me started on the indentation.

  • Ron B (unregistered) in reply to Dan

    Hear, hear! My pet peeve is the 99.8% of the credit card forms that will not accept the spaces that are in the account number. How am I supposed to read the number back to make sure it is correct with all the digits scrunched together? If your code doesn't like the spaces, then you should take them out -- not force me to do it for you.

    Captcha: tesla. Nikola was da man! Inventor of radio, and OC before OC was cool.

  • N/k (cs) in reply to Pol
    Pol:
    uh... Lessee. Shouldn't be using a single replace char to char. What if a character (needing to be escaped) is added later to the standard (whatever it is) and maintenance don't update this code? Processing strings on a character basic in a web context (or anything high-level) is dangerous if you don't have the full picture. Almost no one does (me neither). What about Unicode or Localized strings? What if the character you replace it with may become dangerous (see escaping)?

    If he's building a sql query and wants to get rid of dangerous chars, he should use QueryBuilder. Doing anything else is dangerous (see point A). If he's building an XML/HTML string, he should use another way (templates or Builders). This code per se is not a problem, but it should never be called.

    This is a perfect example of Really Bad Web Practice. The fact that people in this forum find code like this one ok is only proof that something must be done at the education level...

    Use libraries for building strings from user inputs, don't manage them yourself. Otherwise, you're aiming your foot and waiting for the user to trigger.

    Need coffee (Not A Captcha (tm))

    Amen.

    I was about to write how all those comments here are The Real WTF™. Come on, people. We know it's Java, we know it's in a JSP, it's "Justice 2.0", and we should know the escaped characters are, due to some some curious coincidence, very similar to those which happen frequently in web-related stuff (SQL queries, HTML tag properties, HTML entities, URL queries). And the best you can say is that this code is out of context. Or that a public static String with no "function" keyword is JavaScript.

    Yeah, sometimes the real WTF is in the comments.

  • Joe (unregistered)

    I'm sure the wtf is not using a built in escaping function. To you know, sanitize the data. If quotes and &'s are an issue then they are either evaluating a string as code or running a sql query with the string.

  • GettinSadda (cs)

    OK, here is my WTF of the same type...

    /* Decrement - subtract one from an integer unless this would make it negative */
    int Decrement(int Value)
    {
      if(Value >= 0) Value--;
      return Value;
    }

    So, what's wrong with it?

    • It decrements a variable - why doesn't it increment it, that's much more useful!
    • It is a C function, but it would have been more useful as a C++ template function
    • I prefer Java
    • It uses the wrong brace style
    </sarcasm> There is nothing wrong with writing a function that succinctly performs the task required by the specification - this is not a WTF!
  • Lihtox (unregistered) in reply to BA
    BA:
    Stephen Cochran:
    In which case the input should be totally rejected, not "sanitized".
    There are many, many systems in which you can't arbitrarily reject data. Many times, you have to accept the input and make sure it doesn't do any harm.

    Agreed. As an example consider the many ways a phone number can be written down....

    Unfortunately, some websites take Stephen Cochran's advice and tell the user to "enter your phone/credit-card number without spaces or dashes". The laziness of this amazes me: the script that would strip out non-numeric characters from the number is not much longer than the warning they have now.
  • GettinSadda (cs) in reply to Joe
    Joe:
    I'm sure the wtf is not using a built in escaping function. To you know, sanitize the data. If quotes and &'s are an issue then they are either evaluating a string as code or running a sql query with the string.
    For all you know the specification could have said "all quotes and ampersands to be replaced with spaces". You are just guessing at the application - I hope I never have to use any of your code if that is how you design it!
  • SkittlesAreYum (unregistered) in reply to GettinSadda
    GettinSadda:
    OK, here is my WTF of the same type...
    /* Decrement - subtract one from an integer unless this would make it negative */
    int Decrement(int Value)
    {
      if(Value >= 0) Value--;
      return Value;
    }
    So, what's wrong with it? * It decrements a variable - why doesn't it increment it, that's much more useful! * It is a C function, but it would have been more useful as a C++ template function * I prefer Java * It uses the wrong brace style </sarcasm> There is nothing wrong with writing a function that succinctly performs the task required by the specification - this is not a WTF!

    I agree with what you're saying. However, before you post comments with your code, you should make sure they're correct. The function will decrement Value to a minimum of -1, not 0 like your comments claim. You want >, not >=.

  • code guy (unregistered) in reply to travisowens

    [quote user="travisowens"]For those who couldn't figure it out (in order of WTF-ness):

    1. He's doing it in JavaScript, which is ran on the client side, which can easily be disabled, allowing me to submit any naughty chars I want.

    public static String ...

    This is Java embedded within the JSP, not Javascript.

    (Javascript doesn't have public/private methods)

  • kanet77 (cs) in reply to Uberbandit
    Uberbandit:
    bonzombiekitty:
    Plus it's done a pretty inefficient way. I assume this is java, so an easier way to have done it would be in a single replaceAll() statment. Lets assume he has a reason to replace the special characters with an empty space, he could have just done:

    if(a_sValue != null){ return a_sValue.replaceAll("[\\'|\\"|&|]", " "); }

    But I really can't complain, I've done things like that before because I was lazy.

    side note: the four backslashes thing always screws me up.

    Why the last | ? I'm not very familiriazed with Java but Shouldn't it be: return a_sValue.replaceAll("[\\'|\\"|&]", " ");

    Actually, I prefer the original code to this. The original code is quick and easy to write and much more readable and maintainable. The fact that we're talking about "four backslashes" and "why the last | ?" demonstrates that. And with the original code you know at a glance how many replacements you do. I like that.

    Incidentally, this is what you want (you don't need 4 backslashes): return a_sValue.replaceAll("['|"|&]", " ");

    There's a lot of personal preference that goes into this decision. I'm not claiming that I'm right and you're wrong. But you say your code is more efficient. The original code takes less time for me to work with. Even if your code runs faster, it will not be noticeable unless you do this operation millions of times per second.

  • GettinSadda (cs) in reply to SkittlesAreYum
    SkittlesAreYum:
    I agree with what you're saying. However, before you post comments with your code, you should make sure they're correct. The function will decrement Value to a minimum of -1, not 0 like your comments claim. You want >, not >=.
    You are 100% right! That's what I get for writing that quickly due to being called for food! Next time I will eat first, then proof-read
  • ben (unregistered) in reply to Ancient_Hacker

    Again, you're making assumptions about the context that may not be valid. There's nothing that says this is a name, or anything else. We don't know what the inputs and outputs are.

  • MichaelWojcik (cs) in reply to Nelle
    Nelle:
    can someone please spell out what exactly is wrong with this peace of code ?

    I don't understand about indirection, I don't know why this thing won't load; CVS rejects my corrections - All I want is to have my peace of code...

    Right - I'll get me coat.

  • MichaelWojcik (cs) in reply to s
    s:
    Regexp is almost always slower than straight-off character replace, even repeated 3 times.

    Care to provide some evidence for that?

    Used correctly, a regex solution would be compiled once to a DFA. In this case that'd be a very simple state machine that should fit in any modern general-purpose CPU's cache and would have excellent branch-prediction characteristics. The regex compilation time would be amortized over all the calls to the function, so if this is at all performance-critical the compilation overhead can be ignored.

    Both solutions are O(N), but it's far from obvious that the simple-search approach will have a better constant. Especially since the search patterns are single bytes, so (assuming a byte machine) there's no benefit to using any searching method more sophisticated than brute force.

  • Phat Wednesday (unregistered)

    Poor spelling aside, this isn't very WTF-y.

    If I were to offer a critique, it would be the hard-coded "special" characters. What's so hard about adding a param that allows any array of characters to be "removed".

  • AdT (unregistered)

    This Error'd is rather boring, but there are some nice premature optimization WTFs in the comments. Using a regex that is reparsed on every single call and selling that as an optimization (although it may in practice be much slower, as one commenter pointed out) is an awesome feat.

    Rule 1 of premature optimization: Do it! Rule 2 (for the clueless only): Do it immediately!

    And then there is the adverse extreme, the classic "the compiler does all the thinking for me"-WTF. Not only does the Java compiler assemble all consecutive String.replace calls into one efficient loop (with a single memory allocation). And not only does it replace all the string literals you use in String.replaceAll calls with pre-compiled patterns, no, it also redeems your mortgage and cooks you a nice dinner!

    And there is certainly no difference between what a compiler can do and what it actually does do. So real world measurements are superfluous in any case.

  • Aaron (unregistered) in reply to AssimilatedByBorg
    AssimilatedByBorg:
    What I'm trying to say is that if I enter McKay, then the system darn well better have a good reason for changing what I enter. I have a hard time imagining why it should ever change what I type in, when there are basically no rules (in the English world, at least) about what can appear in a person's name.
    My guess is that the system probably stores it in all caps, and the developers added some code to the front-end which they thought would make it a little less ugly (and does, at least 9 times out of 10).

    If the same developers made both design decisions (all uppercase in the database, first letter only in front-end) then that would just be poor design, but sometimes they have to interface with an anal-retentive legacy system, probably made in COBOL, which they have no control over and which no one knows how to update.

    On a related note, I wonder how Prince enters his name into online forms...

  • Stephen Cochran (unregistered) in reply to Lihtox
    Lihtox:
    BA:
    Stephen Cochran:
    In which case the input should be totally rejected, not "sanitized".
    There are many, many systems in which you can't arbitrarily reject data. Many times, you have to accept the input and make sure it doesn't do any harm.

    Agreed. As an example consider the many ways a phone number can be written down....

    Unfortunately, some websites take Stephen Cochran's advice and tell the user to "enter your phone/credit-card number without spaces or dashes". The laziness of this amazes me: the script that would strip out non-numeric characters from the number is not much longer than the warning they have now.

    WTF did I give any such advice? All I said is sometimes you have to use the data you got, and can't just reject it.

    Has no one here ever written EDI code?

  • Reaver (unregistered) in reply to MichaelWojcik
    MichaelWojcik:
    Nelle:
    can someone please spell out what exactly is wrong with this peace of code ?

    I don't understand about indirection, I don't know why this thing won't load; CVS rejects my corrections - All I want is to have my peace of code...

    Right - I'll get me coat.

    WOW, I never thought I would ever see a Boston song from 1976 reworked into a computing context.

    golf clap

    Cheers,

  • Reaver (unregistered) in reply to Aaron
    Aaron:
    On a related note, I wonder how Prince enters his name into online forms...

    Probably by breaking it down into its constituent ASCII parts for later reassembly

    ^<O>||/...

  • Anon (unregistered) in reply to MichaelWojcik
    MichaelWojcik:
    s:
    Regexp is almost always slower than straight-off character replace, even repeated 3 times.

    Care to provide some evidence for that?

    No direct evidence (sorry) but in Java that's almost always true. I've replaced a bunch of replaceAll-style calls with simple replace statements and the speed increase was something insane like 10 times.

    MichaelWojcik:
    Used correctly, a regex solution would be compiled once to a DFA. In this case that'd be a very simple state machine that should fit in any modern general-purpose CPU's cache and would have excellent branch-prediction characteristics. The regex compilation time would be amortized over all the calls to the function, so if this is at all performance-critical the compilation overhead can be ignored.
    Unfortunately, this is Java code, and Java regexs are compiled into a DFA using a ton of objects. So you get massive overhead of creating the objects, and enough overhead from the invoking the virtual functions on the objects to make your assertion false.

    Plus, Java recompiles the regex every time replace is called. In order to cache the compiled regex, you'd have to use a Pattern object and not use String's replaceAll method.

    So you'd be correct in non-Java languages (like Perl), but you're unfortunately quite incorrect for Java. In Java, it's best to avoid regexs unless you really need them.

  • Derrick Pallas (cs)

    For anyone that doesn't realize it, JSP is for generating HTML server-side. Essentially, you embed Java in HTML.

    PHP : C :: JSP : Java

  • fennec (cs) in reply to Derrick Pallas
    Derrick Pallas:
    For anyone that doesn't realize it, JSP is for generating HTML server-side. Essentially, you embed Java in HTML.

    PHP : C :: JSP : Java

    You got it right, except for the PHP : C part. JSP programming basically does some magic tag replacement, wraps all the plain page stuff in print statements, and passes it on to the Java compiler when you're building the project. You can use Java in JSPs.

    PHP compiles programs to the Zend virtual machine, which is not the C compiler. It usually compiles the programs at runtime. You cannot use C code in PHP files.

    PHP is a lot like Windows. Very convenient, ubiquitous, and its soul is an abominable junk-heap filled with the most disgraceful sort of rubbish imaginable, mangled up in tangled-up knots.

  • - (unregistered) in reply to kanet77
    kanet77:
    Incidentally, this is what you want (you don't need 4 backslashes): return a_sValue.replaceAll("[\'|\"|&]", " ");
    You can drop one of the backslashes. And you use | wrong. | inside [] means the actual |-character

    A more elegant expression would be this:

    return a_sValue.replaceAll("['"&]", " ");

  • kanet77 (cs) in reply to -
    -:
    kanet77:
    Incidentally, this is what you want (you don't need 4 backslashes): return a_sValue.replaceAll("[\'|\"|&]", " ");
    You can drop one of the backslashes. And you use | wrong. | inside [] means the actual |-character

    A more elegant expression would be this:

    return a_sValue.replaceAll("['"&]", " ");

    Good call. Thanks for pointing that out. I admit I didn't look closely enough. I've barely dealt with regex in years. I just knew the ton of backslashes in the other poster's code had to go!

    Comparing the original code to your code I'd say it's a wash. It comes down to personal preference unless there are performance issues.

  • Joseph Newton (unregistered) in reply to BA
    BA:
    Stephen Cochran:
    In which case the input should be totally rejected, not "sanitized".
    There are many, many systems in which you can't arbitrarily reject data. Many times, you have to accept the input and make sure it doesn't do any harm.

    Agreed. As an example consider the many ways a phone number can be written down: 800 555 1234 800.555.1234 (800) 555-1234 800-555-1234 555-1234 555.1234

    And the internal format the application expects... 800/555-1234

    The answer in this particular case would be: Sanitize when possible, reject when not.

    And the real WTF is ragging on people for misspelling in comments when the meaning and intention are perfectly clear. Maybe the guy is ESL and isn't familiar with English idioms.

    For shame worsethanfailure.com, for shame.

    Another option presents itself--preventioin is the best cure, so I would suggest designing the interface so that it precludes bad data, or at least makes the appropriate format clear. For phone numbers, this would usually mean having a separate entry control for each of its three elements. Some validation would still be in order to ensure that the input is present and numerical, but the upfront work should prevent most of the format errors.

    Captcha? Nah, this is more like Tai Chi.

  • Zygo (unregistered) in reply to Zen
    Zen:
    I had a system "sanitize" my password on account creation. Everyting that was not [a-zA-Z0-9_] got replaced by an underscore. Took me some time to figure out why I couldn't log in.
    How do you know the _ wasn't replaced by an underscore?

    ;-)

  • Jon W (unregistered)

    Come on, the real WTF is SQL injection.

    If this runs as JavaScript, then it's on the client.

    Remove quotes on the client, and then submitting to a back-end smells of ... you know what I'm talking about :-)

    Captcha: kungfu -- it knows it's me!

  • Stevenovitch (unregistered)

    the RWTF is that you guys think you know better when you don't have a single freaking clue what this function is trying to accomplish in relation to the system.

    Schadenfreund is all handy dandy, but seriously, try to be a bit more open minded.

  • BA (unregistered) in reply to Joseph Newton
    Joseph Newton:
    BA:
    Stephen Cochran:
    In which case the input should be totally rejected, not "sanitized".
    There are many, many systems in which you can't arbitrarily reject data. Many times, you have to accept the input and make sure it doesn't do any harm.

    Agreed. As an example consider the many ways a phone number can be written down: 800 555 1234 800.555.1234 (800) 555-1234 800-555-1234 555-1234 555.1234

    And the internal format the application expects... 800/555-1234

    The answer in this particular case would be: Sanitize when possible, reject when not.

    And the real WTF is ragging on people for misspelling in comments when the meaning and intention are perfectly clear. Maybe the guy is ESL and isn't familiar with English idioms.

    For shame worsethanfailure.com, for shame.

    Another option presents itself--preventioin is the best cure, so I would suggest designing the interface so that it precludes bad data, or at least makes the appropriate format clear. For phone numbers, this would usually mean having a separate entry control for each of its three elements. Some validation would still be in order to ensure that the input is present and numerical, but the upfront work should prevent most of the format errors.

    Captcha? Nah, this is more like Tai Chi.

    Usability trumps prevention. And sanitization is prevention. I don't keep my phone number on three pieces of paper, so why should I have to enter it like that. Of course, I'm not going to go the extreme and make the address one huge textarea, but phone numbers, credit card numbers, social security numbers, etc. should be one textbox that is fairly liberal in the formats it accepts.

    I mean, I'm sorry for the lazy people who don't want to write the few extra lines (or regexps) to handle this, but think of the end user.

  • silverpie (cs) in reply to AssimilatedByBorg
    AssimilatedByBorg:
    Oh, I'm totally unaware of capitalization rules. What I'm trying to say is that if I enter McKay, then the system darn well better have a good reason for changing what I enter. I have a hard time imagining why it should ever change what I type in, when there are basically no rules (in the English world, at least) about what can appear in a person's name.

    Take the non-word "ijams"; in English, titlecasing it yields "Ijams." In Dutch, the ij is a digraph that must be capitalized together, so you get "IJams." And in Turkish, the letter i with or without a dot are two separate letters that must be preserved, yielding something that this forum software can't handle--a capital I with a dot on top (U+0130) followed by "jams"...

  • Andrew (unregistered) in reply to travisowens
    travisowens:
    For those who couldn't figure it out (in order of WTF-ness):
    1. He's doing it in JavaScript, which is ran on the client side, which can easily be disabled, allowing me to submit any naughty chars I want.

    2. He's skipped other chars which can cause damage: \0 ; \n \r (I forget the whole list)

    3. He's replacing the bad characters with spaces, when he should stop processing if they're that nasty, or convert them into HTML char codes if he just wants to make them safe to process.

    4. If a user entered an HTML char code, which contain a &, the third replace will really foul it up.

    PS: RegEx isn't always as slow as people think it is, but for a simple replace, Regex is probably overkill. I've found cases where RegEx was much faster than non RegEx methods, it depends on the language & compiler.

    This is not ECMA Script (Javascript). It's not run on the user's web browser.

    It is SUN's Java. Notice the type decalarations like "public static String". The Javadoc comment even says that it's "called from jsp".

    So, this probably generates HTML via JSP. The escaped letters are important to HTML (double/single-quote and ampersand). I think the WTF is that this Java function breaks the HTML output.

  • SuperousOxide (cs) in reply to Thief^
    Thief^:
    I've had something similar where a system accepted but truncated the password I gave it on signup, but didn't tell me and didn't truncate the password on login. It's very annoying when a system changes things without even telling you.

    Why would you be sanitizing passwords anyway? Any character you can type should be a valid character for a password.

  • Someone (unregistered)

    without using replaceAll, I do not think I even understand what this could be good for.

    removeSpecialChars( "a&b&c") returns "a b&c"

  • ContractorInLivingHell (unregistered) in reply to bonzombiekitty

    That code is more compact and 'elegant'. I used to try to do stuff like that too. Problem is, it's harder to read and maintain, and harder for some programmer who is less sophisticated than some of us to deal with it. 'Good code' is maintainable code, and can be maintained by people who are less-adept than you. Yes, this means you have to 'dumb-down' your code some, but that isn't so bad. Also, in three months after you have moved on to some other project, do you want to have to go back and decipher your own code?

  • Icelight (unregistered) in reply to GettinSadda
    GettinSadda:
    OK, here is my WTF of the same type...
    /* Decrement - subtract one from an integer unless this would make it negative */
    int Decrement(int Value)
    {
      if(Value >= 0) Value--;
      return Value;
    }
    So, what's wrong with it? * It decrements a variable - why doesn't it increment it, that's much more useful! * It is a C function, but it would have been more useful as a C++ template function * I prefer Java * It uses the wrong brace style </sarcasm> There is nothing wrong with writing a function that succinctly performs the task required by the specification - this is not a WTF!

    It's not thread safe. What if you had a race condition! What if TWO! threads were sending a global value and it suddenly got decremented too far! yikes!!!!

Leave a comment on “Eliminating Shady Characters”

Log In or post as a guest

Replying to comment #:

« Return to Article