- Feature Articles
- CodeSOD
- Error'd
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
If only we could get ridoff those guys who like to say "first".
Admin
owned them good!
Admin
owned them good!
Admin
Hey, what a lovely manual "quotation" system! But I'll still keep using prepared statements. At least they don't corrupt my data :)
Admin
"ridoff", is that related to "rip-off"?
Admin
So the wtf is the lame joke ?
Admin
I don't really get it. Either that or I just look too much into it.
Admin
It's generally not the best of ideas to just replace stuff with an empty space when you should be escaping it.
Admin
Except when you know the input shouldn't contain any of these characters. Usually the replacement is more heavyweight then. But there happen situations when the target system has magic characters which can't be quoted.
This entry is not a WTF without context. At worst it's a "could be done better" practice.
Admin
Sometimes it makes perfect sense to just eat the character or replace it with something benign.
It all depends on context.
Admin
Totally agree. The function does exactly what it says on the tin!
Admin
Plus it's done a pretty inefficient way. I assume this is java, so an easier way to have done it would be in a single replaceAll() statment. Lets assume he has a reason to replace the special characters with an empty space, he could have just done:
if(a_sValue != null){ return a_sValue.replaceAll("[\\'|\\"|&|]", " "); }
But I really can't complain, I've done things like that before because I was lazy.
side note: the four backslashes thing always screws me up.
Admin
I don't get it ... Sorry, but I don't ... It may be funny or WTF, but it is far beyond my comprehension ... Perhaps when I pass a couple of Sun's Java Certification tests I may be able to laugh at this or to say WTF, but until then can someone please spell out what exactly is wrong with this peace of code ?
Admin
In which case the input should be totally rejected, not "sanitized".
Admin
Well, I think its just a little strange that the special characters gotten rid of are ' " and &. I would expect !@#$%^&*() or something to also be included. But then the submission says very little about what kinds of strings are being manipulated or for what purpose.
I certainly don't get the 'clever' flavor text surrounding the code. It just feels like someone trying to be funny and falling flat on his face. It certainly doesn't explain the situation, unless the situation was that someone wanted to get rid of some special characters and the submitter thought that was bad.
Admin
Regexp is almost always slower than straight-off character replace, even repeated 3 times. The original code is enormously more readable too.
Sure if you can afford the comfort of discarding a piece of data and reporting an error instead, go ahead.
Admin
I must admit I thought the same thing. But it is hardly a wtf. In fact, one could even argue that is more readable the way it is written. I wouldn't be surprised if the compiler could even optimise it to the same thing. At worst, it's just a bit inefficient, but not shockingly so.
Admin
Admin
It is all about context. The guy who wrote it may have had a boss that said "I'm sick of those goddam quotes and ampersands, get ridoff em!". In that case, the function is fine!
Admin
Exactly, how do we know that the string isn't a ", ' or & delimited list of integers that the coder wants to generalize into a space-delimited list. That's obviously not likely, but man, we really can't comment on something as valid as this without any context.
Admin
Agreed This is the first time I have felt that this was not WTF worthy.
I would write this code if I needed this functionality... although I'd never leave a typo in the comment, I'm too o.c. fourthat.
CAPTCHA: poindexter - a person who is intelligent but socially inept; a nerd...hehesnort
Admin
Yeah, I'm not sure what's wrong here. Assuming the programmer had a good reason to remove single quotes, double quotes, and ampersands, that seems like a perfectly reasonable way to do it. Sure, I'm sure there are other ways, even more efficient ways to do it -- but the code itself really isn't bad or buggy or wtf'ey.
So I'm guessing the problem lies in my assumption that there was a good reason to remove those things. But we really need more information here.
Admin
I will really hate myself for what I'm about to say, but...
The real WTF is that all of you guys argue over the code while the real WTF is in the comments: there is no such an expression as "get ridoff". The idiom is "get rid of smt".
Admin
What? It is this type of thought that keeps some systems from being extremely easy to use.
A user should never have to conform to the wills of the system, the system should have to conform to the will of the user. If the system doesn't need quotes and can work fine simply by removing them, why would that not be the obvious solution? You propose that the system should just reject input and ask the user to type it in again without quotes???
Admin
[quote user="TheJasper"] if(a_sValue != null){ return a_sValue.replaceAll("[\\'|\\"|&|]", " "); } [/quote]
I must admit I thought the same thing. But it is hardly a wtf. In fact, one could even argue that is more readable the way it is written. I wouldn't be surprised if the compiler could even optimise it to the same thing. At worst, it's just a bit inefficient, but not shockingly so.[/quote]
A single-character search&replace is almost always faster than a regexp. A smart compiler in the above case will realize the first argument is a static string and statically compile the replacement code. A somewhat dumber compiler will assume the replacement expression -could- be a variable, and so will first pass the first argument through a regexp parser, then generate copy&replace code, then parse the value using said code, all during runtime.
People think stuff like eval() or regexp replaces is harmless because it gets magically changed during original compilation. But what about regex=someExternalUserInput; somestring.replaceAll(regex,target); ?
Admin
I had a similar experience @ work
we were pushing XML files to a server that was not under our control, and the software was not XML compliant...
following the request of the customer, we had to replace ">" by "greater than", "<" by "lower than", and so on in the text nodes...
that's the best way for sure, fuck the standards !
Admin
If the WTF is a poor wording of a comment to a valid piece of code, then this story is Worse Than Failure.
Admin
I had a system "sanitize" my password on account creation. Everyting that was not [a-zA-Z0-9_] got replaced by an underscore. Took me some time to figure out why I couldn't log in.
Admin
If you're building a filename or directory in Windows you have to do "something" with the invalid characters. A space or underscore is probably your best bet.
Although the code still doesn't make much sense.
Admin
Context is always the key. Sometimes sanitizing by dropping/replacing characters is not really a valid option. In those situations one should reject the input (if we're not going to escape it) If this was a password value, we would not want silently to turn a password of &'""''&&" into a bunch of spaces.
I agree this may not be best way to deal with special characters, or maybe not the most efficient. But without know what they are doing with these strings or why they decided these characters are bad, ti's hardly a wtf
Admin
A test run on my PC doing the same operation 10000000 times showed that the character replace outperforms the regexp by a factor of 10 to 1.
I would say the same about readability. The fact that you mistyped it in your post (missing a backslash) says something.
Admin
Admin
"cash&items".match(/shit/)
Admin
Agreed. As an example consider the many ways a phone number can be written down: 800 555 1234 800.555.1234 (800) 555-1234 800-555-1234 555-1234 555.1234
And the internal format the application expects... 800/555-1234
The answer in this particular case would be: Sanitize when possible, reject when not.
And the real WTF is ragging on people for misspelling in comments when the meaning and intention are perfectly clear. Maybe the guy is ESL and isn't familiar with English idioms.
For shame worsethanfailure.com, for shame.
Admin
I've used a system that truncated the password to 8 characters. I was quite annoyed when after a year of using the system just fine, typing my 12-letter password with special characters at the end I learned my password was really a common dictionary word, all-lowercase. Still beats the hell of a system which required the password to contain at least 2 digits, at least 5 lowercase characters, no other characters than [0-9a-z] and be between 8 and 10 characters long, then refuse anything else.
A basic place for use of the OP code would be a simple search engine entry preprocessing. Remove the quotes after switching the "sentence mode" on, remove '&' as AND operator which is default, leave spaces because otherwise the remaining words would get concatenated.
Admin
I remember back in the bad old days when I used vb4, access and csv. Quotes were the bane of my life and no matter the amount of times we tried to filter them out, some function somewhere would let em through and screw up the whole system.
Og course we just told off the users - "How dare you use a quote in your item descriptions - its your fault the application lost all your data"
Admin
I don't like, but can accept, a system that uppercases everything and leaves it that way, but if it's going to allow some sort of mixed case, e.g., Mckay, why not allow McKay? grrr...
Admin
Admin
I would be surprised. And impressed.
I'm guessing that the direct character replace version (the one featured) is faster, but I'd be willing to bet that if the regex were precompiled then the regex would be faster by a factor of about 3.
As though it mattered. This isn't the kind of site where we are supposed to be picking over people's code for being cycles slower! Let's see some bad code snippets. Give me exponential time solutions to log (or constant) time problems! :-) I can go to YouTube if I want to see people who can't spell.
Luke
Admin
The Sparc workstations at my University had the same password weakness where it only looked at the first 8 characters. I proclaimed my password un-hackable to my roommate. When he wasn't looking I copied my password. Then I told him to watch. I pasted my password then proceeded to slam on keys for about 5 full seconds. Lo and behold it worked when I hit enter.
Admin
Nah. Most regexes (specifically, those without backreferences) will be just as fast as scanning for single characters, as long as you stay away from Unicode. They are asymptotically equivalent, and it takes a pretty big regex to get outside a modern CPU cache.
Luke
Admin
Admin
Why the last | ? I'm not very familiriazed with Java but Shouldn't it be: return a_sValue.replaceAll("[\\'|\\"|&]", " ");
CAPTCHA: Again "Slashbot"? I filled my quota of creativity with you last time.
Admin
For those who couldn't figure it out (in order of WTF-ness):
He's doing it in JavaScript, which is ran on the client side, which can easily be disabled, allowing me to submit any naughty chars I want.
He's skipped other chars which can cause damage: \0 ; \n \r (I forget the whole list)
He's replacing the bad characters with spaces, when he should stop processing if they're that nasty, or convert them into HTML char codes if he just wants to make them safe to process.
If a user entered an HTML char code, which contain a &, the third replace will really foul it up.
PS: RegEx isn't always as slow as people think it is, but for a simple replace, Regex is probably overkill. I've found cases where RegEx was much faster than non RegEx methods, it depends on the language & compiler.
Admin
Oh I forgot the Javascript syntax for RegEx.Replace. A MAJOR mistake he made is that global replace is not enabled, so he replaces only the first naughty char of each type, but no further repeating chars.
Admin
Now, if the original code had been doing the opposite (a complex RE to perform a simple one-character-for-one-character replacement) then we'd have had our WTF! for sure on the grounds of that being a stupidly expensive way to do a trivial operation. We'd have also known for sure that the original author was proving that (s)he could write Perl in any language.
Admin
I think the wtf here is that the double quote is escaped although it shouldn't be.
Admin
Without context, I don't see the wtf. It's not in javascript and the comment clearly states that the function removes "some" special characters. Never claimed to remove them all.
Admin
Until we know what the return value should be used for and where the input value comes from, I see nothing wrong with that code.
Admin
travisowens: QFT
on another note: we're fond of doing all sorts of bizarre character replacements/sanitization here. We really like: ' to become ` And we can't handle , because we do naive CSV parsing And we translate = into ~ in some places because we're passing x=y type strings as values in GET or POST contexts z=x~y
It makes me giggle every time one of those comes back to bite us in the ass. I especially liked when we just removed ' from people's names, changing O'Reilly into Oreilly. I'm sure they loved that.