• (cs) in reply to Joseph Newton
    Joseph Newton:
    Another option presents itself--preventioin is the best cure, so I would suggest designing the interface so that it precludes bad data, or at least makes the appropriate format clear. For phone numbers, this would usually mean having a separate entry control for each of its three elements. Some validation would still be in order to ensure that the input is present and numerical, but the upfront work should prevent most of the format errors.

    No, don't do that. This prevents me from copying and pasting the whole phone number in. Most implementations I've seen do not get the navigation right, so if I mistype the last number in one of those sections, I can't just hit backspace to correct it, I have to move back to the previous control and hit backspace.

    Just do the simple replace in this case. Computers are really good at this sort of thing.

  • Cory (unregistered)

    I like to call this method "the Plagiarizer"

  • AdT (unregistered) in reply to GettinSadda
    GettinSadda:
    int Decrement(int Value)
    {
      if(Value >= 0) Value--;
      return Value;
    }

    There is nothing wrong with writing a function that succinctly performs the task required by the specification - this is not a WTF!

    But this function does not work, try it for yourself:

    Decrement(5);
    assert(5 == 4); // we decremented 5, so now it's 4.
    

    This breaks!

  • Cewl (unregistered) in reply to Someone

    No it doesn't. According to the javadoc for java.lang.String

    replace(char oldChar, char newChar) Returns a new string resulting from replacing all occurrences of oldChar in this string with newChar

    all occurences

  • Cewl (unregistered) in reply to Cewl
    Cewl:
    No it doesn't. According to the javadoc for java.lang.String

    replace(char oldChar, char newChar) Returns a new string resulting from replacing all occurrences of oldChar in this string with newChar

    all occurences

    Duh, that was in response to the statement:

    removeSpecialChars( "a&b&c") returns "a b&c"

  • Stevenovitch (unregistered) in reply to AdT

    god i hope you're not serious.

    seriously though it's very possible to end up with -1 there :)

  • (cs) in reply to Stevenovitch
    Stevenovitch:
    god i hope you're not serious.
    Hey, I used sarcasm tags!
  • J H (unregistered) in reply to Licky Lindsay
    Licky Lindsay:
    s:
    Except when you know the input shouldn't contain any of these characters.

    In which case the input should be totally rejected, not "sanitized".

    If your app was an employee, what kind of employee would it be? When it's employee performance review time, how would you rate it?

    Anal-Retentive Guy [image]

    From CPU

  • Nonymous (unregistered) in reply to bonzombiekitty
    bonzombiekitty:
    tiro:
    Alex G.:
    I don't really get it. Either that or I just look too much into it.

    It's generally not the best of ideas to just replace stuff with an empty space when you should be escaping it.

    Plus it's done a pretty inefficient way. I assume this is java, so an easier way to have done it would be in a single replaceAll() statment. Lets assume he has a reason to replace the special characters with an empty space, he could have just done:

    if(a_sValue != null){ return a_sValue.replaceAll("[\\'|\\"|&|]", " "); }

    But I really can't complain, I've done things like that before because I was lazy.

    side note: the four backslashes thing always screws me up.

    I think the method in the post makes the most sense since its at least readable. Your regexp is wrong too.

    Your regexp is effectively

    /['|"|&|]/
    which matches the characters ' " & or | since you placed them in a character class. There's no reason to escape both quote types in a character class either, nor do you need that last pipe if it wasn't in a character class.

    /["'&]/
    or
    /('|"|&)/
    would have matched what the original post did.

    For a language that requires quoting (like Java) that's

    "["'&]"
    or
    "('|"|&)"

  • Nonymous (unregistered) in reply to kanet77

    That regexp is still wrong. You want ["'&] using a | in a character class is a literal |, not a logical operator and there's no reason to escape quotes in a regexp, the only reason to escape them is because of a language forcing you to place the regexp in a quoted string.

    "[\"'&]"
    or
    "(\"|'|&)"
    is what you wanted.
  • (cs)

    This site needs a Digg-like system for voting whether or not this is actually a WTF.

    My vote for this one is no.

  • Yossi (unregistered) in reply to s

    while you are right if the replacement only occurs once in the application, for a repeated replacement like this one, regexp is much more efficient. by compiling a java Pattern object once with a regexp to do all your replacements, you can reach much better performane. there's a reason 'they' invented regexp.

  • Yossi (unregistered) in reply to Art

    sure, because when you use "replaceAll" , the method call generates a java "Pattern" object and from it you generate a Matcher object for the string. the compilation of a regexp Pattern in java is expensive. for efficient code you need to do: Pattern replacer = Pattern.compile(""[\\'|\\"|&|]"); Matcher matcher = replacer.Matcher(myString); String result = matcher.replaceAll(" ");

    which is exactly what happens when you just use someString.replaceAll(pattern,somethingToReplaceWith);

    by caching the Pattern, you can achieve much grater performance.

  • fleischdot (unregistered) in reply to Licky Lindsay
    Licky Lindsay:
    s:
    Except when you know the input shouldn't contain any of these characters.

    In which case the input should be totally rejected, not "sanitized".

    aah yeah. and you're going to tell the customers every day, how the input is typed correctly ?

    Sorry, but whe thinking about similiar behaviour of some of my UI's, the standard "we're in 2007, man, why on hell is this not working as i expect it"-customer would sanitize me ;)

    and always a great captcha: sanitarium LOL

  • guest (unregistered) in reply to Welbog

    The name of the replaceAll function of java.lang.String is misleading. replaceRegex would have been a better name. It's not WTF (worse than failure) though.

  • Spud (unregistered) in reply to Yossi
    Yossi:
    while you are right if the replacement only occurs once in the application, for a repeated replacement like this one, regexp is much more efficient. by compiling a java Pattern object once with a regexp to do all your replacements, you can reach much better performane. there's a reason 'they' invented regexp.

    For a complex regex yes, but hardly for a straight replacement of a few predefined characters. Consider the following test class, the replaceCharacters method still outperforms replaceRegex with a factor of nearly 10 to 1 in my tests.

    Of course if the replace were more complex or if you wanted to make it configurable, that would be another thing.

    public class test {

    private static Pattern pattern = Pattern.compile("[\"'&]");
    
    static public void main(String a[]) {
    	int TEST_RUNS = 10000000;
    	String TEST_STRING = "   aaa ''' &&& \"\"\" ABC 12345678  ''' &&& \"\"\" "; 
    
    	System.out.println(replaceRegex(TEST_STRING).equals(replaceCharacters(TEST_STRING)));
    	
    	long start1 = System.currentTimeMillis();
    	for (int i = 0; i < TEST_RUNS; i++) {
    		replaceRegex(TEST_STRING);
    	}
    	long stop1 = System.currentTimeMillis();
    	System.out.println("Regexp: " + (stop1-start1) + "ms");
    	
    	long start2 = System.currentTimeMillis();
    	for (int i = 0; i < TEST_RUNS; i++) {
    		replaceCharacters(TEST_STRING);
    	}
    	long stop2 = System.currentTimeMillis();
    	System.out.println("Replace: " + (stop2-start2) + "ms");
    }
    
    private static String  replaceCharacters(String a_sValue) {
    	a_sValue = a_sValue.replace('\'', ' ');
    	a_sValue = a_sValue.replace('\"', ' ');
    	a_sValue = a_sValue.replace('&', ' ');
    	return a_sValue;
    }
    	
    private static String replaceRegex(String a_sValue) {
    	Matcher matcher = pattern.matcher(a_sValue);
    	return matcher.replaceAll(" ");
    }
    

    }

  • csrster (unregistered) in reply to Licky Lindsay
    Licky Lindsay:
    s:
    Except when you know the input shouldn't contain any of these characters.

    In which case the input should be totally rejected, not "sanitized".

    Rubbish. I run a system which automatically ingests tv-program information into a Digital Object Management System. The program information contains special characters, the DOMS requires that we use pid's without special characters, so we remove them pretty much as shown in this "wtf".

    The idea that we should reject programs containing non-standard characters would seriously compromise our data completeness, a key target in our system - essentially we'd fail to ingest a large fraction of our target programs, which would then be unavailable to our end-users.

  • s (unregistered) in reply to Pol
    Pol:
    This is a perfect example of Really Bad Web Practice. The fact that people in this forum find code like this one ok is only proof that something must be done at the education level...

    I think few people thought it's a Good Code(tm). But being Bad Code, it's still far from the WTF level. It's just moderately ugly and a bit too rigid. For SQL query it may be dangerous. For XML maybe cleaner methods exist. Doing quick raw conversion before processing the text for stats or matching against patterns, preparing a string for full text search, or otherwise preparing the text for processing which analyses it, doesn't store/display it though, it may be perfectly okay. Especially if adding extra characters to the set would be bad, because, say, all others must be allowed.

  • Chris (unregistered) in reply to travisowens
    travisowens:
    For those who couldn't figure it out (in order of WTF-ness):
    1. He's doing it in JavaScript, which is ran on the client side, which can easily be disabled, allowing me to submit any naughty chars I want.

    Nope. Java not JavaScript.

    2. He's skipped other chars which can cause damage: \0 ; \n \r (I forget the whole list)

    Nope. As others have pointed out, the characters might be the only ones of concern for the context that this code is used in.

    3. He's replacing the bad characters with spaces, when he should stop processing if they're that nasty, or convert them into HTML char codes if he just wants to make them safe to process.
    1. If a user entered an HTML char code, which contain a &, the third replace will really foul it up.

    You're assuming this is data that's intended to be used in a webpage (in which case an escaping function such as fn:escapeXml for JSP expression language should be used).

    PS: RegEx isn't always as slow as people think it is, but for a simple replace, Regex is probably overkill. I've found cases where RegEx was much faster than non RegEx methods, it depends on the language & compiler.

    Also depends on whether you precompile the regex up front. Too many people are used to Perl, where this is possibly done for you. Using the regex functions that come with many Unix systems libc, or the regex support in Java, you need to explicitly precompile the refex to avoid the overhead of donig it every time.

    Chris

  • Chris (unregistered) in reply to travisowens
    travisowens:
    Oh I forgot the Javascript syntax for RegEx.Replace. A MAJOR mistake he made is that global replace is not enabled, so he replaces only the first naughty char of each type, but no further repeating chars.

    As I pointed out in another post, this is Java not JavaScript, and the String replace() methods replace each occurence of the character or substring, not just the first one.

    Chris

  • Chris (unregistered) in reply to Derrick Pallas
    Derrick Pallas:
    For anyone that doesn't realize it, JSP is for generating HTML server-side. Essentially, you embed Java in HTML.

    PHP : C :: JSP : Java

    Apart from the fact that it should be PHP : PHP :: JSP : Scriptlets, most people who know what they're doing don't embed Java code in JSP anymore. Ever since the standard tag library and expression language became available, it's good practice to disable scriptlet support in your web container.

    Chris

  • Chris (unregistered) in reply to guest
    guest:
    The name of the replaceAll function of java.lang.String is misleading. replaceRegex would have been a better name. It's not WTF (worse than failure) though.

    I agree it's badly named. The replace() methods should be called replaceAll(), and the replaceAll() that currently takes a regex to be compiled into a pattern should be called replacePattern().

    Chris

  • Nelle (unregistered) in reply to MichaelWojcik
    MichaelWojcik:
    Nelle:
    can someone please spell out what exactly is wrong with this peace of code ?

    I don't understand about indirection, I don't know why this thing won't load; CVS rejects my corrections - All I want is to have my peace of code...

    Right - I'll get me coat.

    :) Made me laugh .... thx ...

    p.s. In my defence, I'n not a native speaker ..

  • Neil (unregistered)

    Wait a sec ... if the goal of the first two replacements is to replace single and double quotes with spaces, should he have used double quotes around his escaped quote marks instead of single ones?

    Am I the only one who noticed this, or does Java not distinguish between single and double quotes?

Leave a comment on “Eliminating Shady Characters”

Log In or post as a guest

Replying to comment #:

« Return to Article