- Feature Articles
- CodeSOD
- Error'd
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
Hmm, you claim he's running in JavaScript, but the function looks like Java and has a comment "function is called from jsp". I do not believe this is JavaScript. Unless JSP has a means to invoke a JavaScript on the client side?
This method looks fine to me. And I am curious about how you get a '\0' into a String. Not saying it's impossible, but I'd be surprised if Java let you. :)
Admin
... and nobody's mentioned the worse WTF in there---
... It's not the use of three lines where one line with three nested replaces would do.
... it's not the use of three replaces wwhere one replaceall() would do.
It's the WTF of doing things backwards and incompetently-- trimming THREE bad characters and letting anything else through.
Perhaps it would make more sense and use the algorithm of "just let through letters that we think can appear in a name".
Admin
I don't know about Java, but in C I would:
*The real WTF is I can make a better WTF than The Daily WTF.
Admin
I guess he's trying to strip special characters out of a SQL string?? But I have no idea why he's replacing "&", or why inserts spaces in their place.
This WTF makes very little sense. I'm sure some people (read: nobody) like the stylized story that accompanies each code segment, but frankly today it's a detractor.
Why does the double quote have it's pants around its ankles...and the single quote is toting a pager?
Admin
For those who are still having issues, here is reasonably simple summary of functions for "healing" strings in javascript.
The RWTF is using a web browser for rendering anything more than a document with a couple of headings and some paragraphs of text (ok I will allow the img tag ... reluctantly)
Captcha: <sarcasm>TBL, you're my hero!</sarcasm>
Admin
Got that right! And they should also type in their credit card number as a string of 16 digits NO SPACES OR DASHES PLEASE and if their card is an Amex that has 15 digits then they can just sit there and stew.
Admin
String str = "\0";
:o)
Admin
Um, let us imagine that the developer is using some kind of proprietary wire protocol. Something used, say, for communicating with a simple device that displays text. Or for communicating medical records over old modems in Australia. This is a situation where you might like to use a JSP to give access to a number of people. And it's a situation where people know exactly what characters are problematic, because its defined in the protocol.
Why would you remove "everything but the characters that are illegal" when the set of "illegal" characters is only the control characters for the wire protocol? I mean, the set of "legal" characters is also defined in the protocol, so the two methods would be equivalent.
Remember that this code is taken way out of context. You can't say "this code is wrong!" without know what it was trying to achieve. All we know is that it "removes special characters" and by golly it does just that. The definition of "special" is context-dependent, and we don't know the context.
Admin
So it does 3 O(n) functions rather than one custom one that handled all 3 undesirable characters within (it's certainly not worth doing it with RegExps for such a simple task).
Now in the context of the data that it will be processing it could be dodgy, e.g., if there are things being passed in like "foo & bar", but that's not the function's fault, it's the specification (or not using a library method that sanitises/fixes "web" strings for processing).
Admin
Hehe, fair enough. :)
Admin
Admin
Incorrect. Neither 'public' 'static' nor 'String' are valid in JavaScript. This is plain Java, likely server-side Java (aka JSP).
Without knowing context, we cannot be sure what characters can cause damage. It may very simply be that only ' " and & mess up the system, but all other characters are okay.
There are times when you should reject, and times when you should sanitize. Sanitizing could be either escaping or replacing with spaces, depending on requirements. Again, we do not know what this function is for.
HTML is irrelevant. Although JSP is mentioned, this function could easily have nothing to do with user input. Also, users generally don't input HTML entities, but rather have their input converted as such prior to rendering in a browser.
It also depends on using regex properly. Passing a string literal as a regex probably means it'll have to be parsed every time the function is called (barring compiler optimizations, which require extreme intelligence on the compiler's part). If you want any efficiency with regexes that aren't one-shots in java, you want to use the java.util.regex classes. For example, here is the OP's function using a regex:
I leave it to the experts of the board to determine if this is faster or slower than the OP function (I figure it'll probably be a whole ±5 nanosecond difference.)
The only WTF to see here is the use of hungarian notation in the formal paramter's name (a_sValue).
Admin
uh... Lessee. Shouldn't be using a single replace char to char. What if a character (needing to be escaped) is added later to the standard (whatever it is) and maintenance don't update this code? Processing strings on a character basic in a web context (or anything high-level) is dangerous if you don't have the full picture. Almost no one does (me neither). What about Unicode or Localized strings? What if the character you replace it with may become dangerous (see escaping)?
If he's building a sql query and wants to get rid of dangerous chars, he should use QueryBuilder. Doing anything else is dangerous (see point A). If he's building an XML/HTML string, he should use another way (templates or Builders). This code per se is not a problem, but it should never be called.
This is a perfect example of Really Bad Web Practice. The fact that people in this forum find code like this one ok is only proof that something must be done at the education level...
Use libraries for building strings from user inputs, don't manage them yourself. Otherwise, you're aiming your foot and waiting for the user to trigger.
Need coffee (Not A Captcha (tm))
Admin
(Now, postal codes typically have rules -- that any letters in them (Canada/UK) are only uppercase -- then it makes sense for the system to sanitize the input. But not, IMHO, people's names.)
Admin
The real WTF is that this method is totally unsafe for translating HTML to JavaScript. Plus he shouldn't be using Perl for that anyway.
AND he doesn't take into account that curse words are harmful to little children and should be removed as well.
And don't get me started on the indentation.
Admin
Hear, hear! My pet peeve is the 99.8% of the credit card forms that will not accept the spaces that are in the account number. How am I supposed to read the number back to make sure it is correct with all the digits scrunched together? If your code doesn't like the spaces, then you should take them out -- not force me to do it for you.
Captcha: tesla. Nikola was da man! Inventor of radio, and OC before OC was cool.
Admin
I was about to write how all those comments here are The Real WTF™. Come on, people. We know it's Java, we know it's in a JSP, it's "Justice 2.0", and we should know the escaped characters are, due to some some curious coincidence, very similar to those which happen frequently in web-related stuff (SQL queries, HTML tag properties, HTML entities, URL queries). And the best you can say is that this code is out of context. Or that a public static String with no "function" keyword is JavaScript.
Yeah, sometimes the real WTF is in the comments.
Admin
I'm sure the wtf is not using a built in escaping function. To you know, sanitize the data. If quotes and &'s are an issue then they are either evaluating a string as code or running a sql query with the string.
Admin
OK, here is my WTF of the same type...
So, what's wrong with it?
Admin
Admin
Admin
I agree with what you're saying. However, before you post comments with your code, you should make sure they're correct. The function will decrement Value to a minimum of -1, not 0 like your comments claim. You want >, not >=.
Admin
[quote user="travisowens"]For those who couldn't figure it out (in order of WTF-ness):
public static String ...
This is Java embedded within the JSP, not Javascript.
(Javascript doesn't have public/private methods)
Admin
Actually, I prefer the original code to this. The original code is quick and easy to write and much more readable and maintainable. The fact that we're talking about "four backslashes" and "why the last | ?" demonstrates that. And with the original code you know at a glance how many replacements you do. I like that.
Incidentally, this is what you want (you don't need 4 backslashes): return a_sValue.replaceAll("['|"|&]", " ");
There's a lot of personal preference that goes into this decision. I'm not claiming that I'm right and you're wrong. But you say your code is more efficient. The original code takes less time for me to work with. Even if your code runs faster, it will not be noticeable unless you do this operation millions of times per second.
Admin
Admin
Again, you're making assumptions about the context that may not be valid. There's nothing that says this is a name, or anything else. We don't know what the inputs and outputs are.
Admin
I don't understand about indirection, I don't know why this thing won't load; CVS rejects my corrections - All I want is to have my peace of code...
Right - I'll get me coat.
Admin
Care to provide some evidence for that?
Used correctly, a regex solution would be compiled once to a DFA. In this case that'd be a very simple state machine that should fit in any modern general-purpose CPU's cache and would have excellent branch-prediction characteristics. The regex compilation time would be amortized over all the calls to the function, so if this is at all performance-critical the compilation overhead can be ignored.
Both solutions are O(N), but it's far from obvious that the simple-search approach will have a better constant. Especially since the search patterns are single bytes, so (assuming a byte machine) there's no benefit to using any searching method more sophisticated than brute force.
Admin
Poor spelling aside, this isn't very WTF-y.
If I were to offer a critique, it would be the hard-coded "special" characters. What's so hard about adding a param that allows any array of characters to be "removed".
Admin
This Error'd is rather boring, but there are some nice premature optimization WTFs in the comments. Using a regex that is reparsed on every single call and selling that as an optimization (although it may in practice be much slower, as one commenter pointed out) is an awesome feat.
Rule 1 of premature optimization: Do it! Rule 2 (for the clueless only): Do it immediately!
And then there is the adverse extreme, the classic "the compiler does all the thinking for me"-WTF. Not only does the Java compiler assemble all consecutive String.replace calls into one efficient loop (with a single memory allocation). And not only does it replace all the string literals you use in String.replaceAll calls with pre-compiled patterns, no, it also redeems your mortgage and cooks you a nice dinner!
And there is certainly no difference between what a compiler can do and what it actually does do. So real world measurements are superfluous in any case.
Admin
If the same developers made both design decisions (all uppercase in the database, first letter only in front-end) then that would just be poor design, but sometimes they have to interface with an anal-retentive legacy system, probably made in COBOL, which they have no control over and which no one knows how to update.
On a related note, I wonder how Prince enters his name into online forms...
Admin
WTF did I give any such advice? All I said is sometimes you have to use the data you got, and can't just reject it.
Has no one here ever written EDI code?
Admin
WOW, I never thought I would ever see a Boston song from 1976 reworked into a computing context.
golf clap
Cheers,
Admin
Probably by breaking it down into its constituent ASCII parts for later reassembly
^<O>||/...
Admin
Plus, Java recompiles the regex every time replace is called. In order to cache the compiled regex, you'd have to use a Pattern object and not use String's replaceAll method.
So you'd be correct in non-Java languages (like Perl), but you're unfortunately quite incorrect for Java. In Java, it's best to avoid regexs unless you really need them.
Admin
For anyone that doesn't realize it, JSP is for generating HTML server-side. Essentially, you embed Java in HTML.
PHP : C :: JSP : Java
Admin
You got it right, except for the PHP : C part. JSP programming basically does some magic tag replacement, wraps all the plain page stuff in print statements, and passes it on to the Java compiler when you're building the project. You can use Java in JSPs.
PHP compiles programs to the Zend virtual machine, which is not the C compiler. It usually compiles the programs at runtime. You cannot use C code in PHP files.
PHP is a lot like Windows. Very convenient, ubiquitous, and its soul is an abominable junk-heap filled with the most disgraceful sort of rubbish imaginable, mangled up in tangled-up knots.
Admin
A more elegant expression would be this:
return a_sValue.replaceAll("['"&]", " ");
Admin
Good call. Thanks for pointing that out. I admit I didn't look closely enough. I've barely dealt with regex in years. I just knew the ton of backslashes in the other poster's code had to go!
Comparing the original code to your code I'd say it's a wash. It comes down to personal preference unless there are performance issues.
Admin
Another option presents itself--preventioin is the best cure, so I would suggest designing the interface so that it precludes bad data, or at least makes the appropriate format clear. For phone numbers, this would usually mean having a separate entry control for each of its three elements. Some validation would still be in order to ensure that the input is present and numerical, but the upfront work should prevent most of the format errors.
Captcha? Nah, this is more like Tai Chi.
Admin
;-)
Admin
Come on, the real WTF is SQL injection.
If this runs as JavaScript, then it's on the client.
Remove quotes on the client, and then submitting to a back-end smells of ... you know what I'm talking about :-)
Captcha: kungfu -- it knows it's me!
Admin
the RWTF is that you guys think you know better when you don't have a single freaking clue what this function is trying to accomplish in relation to the system.
Schadenfreund is all handy dandy, but seriously, try to be a bit more open minded.
Admin
Usability trumps prevention. And sanitization is prevention. I don't keep my phone number on three pieces of paper, so why should I have to enter it like that. Of course, I'm not going to go the extreme and make the address one huge textarea, but phone numbers, credit card numbers, social security numbers, etc. should be one textbox that is fairly liberal in the formats it accepts.
I mean, I'm sorry for the lazy people who don't want to write the few extra lines (or regexps) to handle this, but think of the end user.
Admin
Take the non-word "ijams"; in English, titlecasing it yields "Ijams." In Dutch, the ij is a digraph that must be capitalized together, so you get "IJams." And in Turkish, the letter i with or without a dot are two separate letters that must be preserved, yielding something that this forum software can't handle--a capital I with a dot on top (U+0130) followed by "jams"...
Admin
This is not ECMA Script (Javascript). It's not run on the user's web browser.
It is SUN's Java. Notice the type decalarations like "public static String". The Javadoc comment even says that it's "called from jsp".
So, this probably generates HTML via JSP. The escaped letters are important to HTML (double/single-quote and ampersand). I think the WTF is that this Java function breaks the HTML output.
Admin
Why would you be sanitizing passwords anyway? Any character you can type should be a valid character for a password.
Admin
without using replaceAll, I do not think I even understand what this could be good for.
removeSpecialChars( "a&b&c") returns "a b&c"
Admin
That code is more compact and 'elegant'. I used to try to do stuff like that too. Problem is, it's harder to read and maintain, and harder for some programmer who is less sophisticated than some of us to deal with it. 'Good code' is maintainable code, and can be maintained by people who are less-adept than you. Yes, this means you have to 'dumb-down' your code some, but that isn't so bad. Also, in three months after you have moved on to some other project, do you want to have to go back and decipher your own code?
Admin
It's not thread safe. What if you had a race condition! What if TWO! threads were sending a global value and it suddenly got decremented too far! yikes!!!!