Admin
Way, way too much. :facepalm:
Admin
Bonus WTF: How many times does it recalculate name.Length - 1 and name[i].Length - 1?
Admin
I'll additionally posit that the author of that code probably couldn't touch-type, or he'd have put the string in alphabetical order.
Admin
I thought that to keep regular you just had to follow this guide?
Admin
A WTF which includes a regular expression, but where the regular expression is not the WTF?!?!? What's next? Fixing kernel bugs with Visual Basic?
Admin
And how expensive do you think those are? The concatenation problems are more of an issue. The repeated linear searching of a string is just extra fun, as nobody has ever heard of character properties and could hardly ever conceive of implementing them by an efficient lookup table…
Admin
Compared to everything else, not very, but .Length and -1 are both unnecessary calculations which further add to the performance hit caused by the primary :wtf:, and these unnecessary calculations could be easily avoided. That sort of thing is a pet peeve of mine.
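To make the point concrete, a minimal sketch of the hoisted version (the variable names are hypothetical, since we only see fragments of the original loop):

// Read Length once per string instead of recomputing
// names[i].Length - 1 in every iteration's loop condition.
static void ProcessNames(string[] names)
{
    for (int i = 0; i < names.Length; i++)
    {
        string name = names[i];
        int length = name.Length;         // evaluated once
        for (int j = 0; j < length; j++)  // j < length rather than j <= length - 1
        {
            char c = name[j];
            // ... per-character work goes here ...
        }
    }
}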
Admin
If the compiler knows the length never changes, surely it optimizes it to cache the result of the calculation?
Admin
According to the link that I provided:
...Though I now see a :wtf: in that benchmark, depending on how well C# handles int declarations versus just accessing one.Length directly.
Admin
Does the C# language definition exclude all ways the string (and hence its length) might be modified while the loop is running?
Admin
yes. string is immutable.
Admin
A “sufficiently good” compiler could do that — the real semantics would support it safely — but the tricky bit would be telling the compiler that the length really really is immutable. Also, I've no idea how much work has been put into hoisting such things in the Microsoft C# compiler; there are other, more important, fish to fry. A heck of a lot bigger win would be detecting the “build a string by appending without really doing so properly” case and optimising that. That would do an awesome amount to improve the speed of user code as it is depressingly common.
It's definitely possible to do the analysis and determine that this sort of thing is applicable. I've seen code that does things of equivalent complexity when compiling other languages…
Admin
So:
Everybody will know the answer to (2). Not everybody will necessarily remember the answer to (1) offhand. And what if (2) is what was really wanted?
Self-documentation is valuable even at the price of inefficiency.
Admin
Also let's not forget the whole mess with j <= names[i].Length - 1, which is (at least logically, in a loop context) identical to j != names[i].Length.
At least this is the case in Perl (and I think in other C-derived languages as well):
perl -e 'for ($x=0; $x != 5; $x++){print "$x\n"}'
0 1 2 3 4
perl -e 'for ($x=0; $x <= 4; $x++){print "$x\n"}'
0 1 2 3 4
What remains is whether this makes any difference in runtime or not (and considering the create-and-throw-away strings in this code, I guess not... :smile: ).
Admin
Well then use a regex like "[^QWERTYUIOPASDFGHJKLYXCVBNMqwertyuiopasdfghjklyxcvbnm0123456789_]" instead of "[^\w]" if you don't know what \w means. But PLEASE don't loop through every character, concatenating hundreds of immutable strings together and thus creating even more immutable strings.
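For instance, a one-pass sketch (input is a hypothetical stand-in for whatever the original code filters; pick whichever character class matches the actual requirement):

using System.Text.RegularExpressions;

// Strip disallowed characters in one pass, with no per-character
// intermediate strings.
static string StripDisallowed(string input)
{
    return Regex.Replace(input, "[^A-Za-z0-9_]", "");
    // or, if Unicode-aware \w semantics are acceptable:
    // return Regex.Replace(input, @"\W", "");
}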
Admin
What if someone stupid places a j++ at the end of the loop block for whatever reason (e.g. to skip a character)? Suddenly j > names[i].Length - 1, and != won't catch that but <= will.
Admin
Or they could have used j < names[i].Length, which is usually a lot more common in C-style for loops.
Admin
https://dotnetfiddle.net/hOxdEC
Admin
Anyone who doesn't simply do i < length is suspect.
I prefer to list the characters in the regex (the range support keeps it small but readable) unless there is a really good reason to use something such as \w, for instance being anal about security. Specify exactly what you want. I rarely use classes like that, but they do come in useful with internationalisation sometimes, POSIX properties especially.
Admin
Why not [^A-Za-z0-9_]-berg?
Also:
It is harder to read: more operations and more code. And even if this is a tiny micro-optimisation, it costs nothing when the fast, simple way is written by default. Things add up: if your code is twice as slow everywhere, your program takes twice the cycles everywhere, and sooner or later those unnecessary slowdowns get multiplied. People who add extra complexity that slows things down, has no benefit and isn't even a shorthand, I half suspect of planting small cases to optimise later so they can keep their job. Write a loop like that all the time and you'll soon find one that iterates a billion times and benefits from this optimisation. I wouldn't go crazy with premature micro-optimisation, but things like this should be written appropriately in the first place.
As a rule of thumb, you shouldn't rely on the compiler when you don't have to.
Admin
Synthetic benchmarks are bad, but any benchmarks with no environment information are absolutely worthless. There is no difference between the two on a current CLR.
Either way it comes down to function call overhead (since Length is a property), because the length itself is never calculated on the fly (String has been immutable from the beginning). The subtraction part is really not even worth thinking about. Know your things etc.
Admin
Of course, but we're talking about :wtf:s here, not clean code.
Admin
Again, :wtf:-ery, not clean regex. If he doesn't trust \w to include some (human) language-specific characters, what makes you think he trusts A-Z not to include ÄÖÜ or something like that?
Admin
This reminds me of a good point actually. I test and train people. As a rule of thumb, validate your input, don't corrupt it. Sometimes you have to go the sanitisation route, but others do it out of a misguided sense of convenience, or a feeling that something must always work. Another rule is that if you must sanitise, try to do it on display output only, or consider whether you need to retain your source data in case sanitisation gets it wrong. A big focus is on preserving data integrity.
The WTF sample is potentially bad because it might be doing sanitisation (lossy corruption) where validation would do. In one test I ask people not to allow special characters. If this WTF were validation, it would be doing that in the way most likely to be right for a given scenario, but we don't know the context so we can't be sure. In validation it is usually safer to deny too much than not enough, especially when feedback is given, allowing the user to work around the restriction or raise an issue. You can also offer a sanitised version as a proposal if it is user input rather than an external data source.
The funny thing about the validation test is that everyone thinks special characters are just a few, because the English language gives that impression (special = rare), as does the keyboard, and they always create huge lists of special characters (all the punctuation on the keyboard, accented characters from other languages, normally no more, and often in the wrong charset, so they are just dumping arbitrary binary data into the regex). Very few realise they should instead allow only the characters that are not special.
A-Z introducing characters not in the ASCII range would be weird. Some level of mistrust is good for writing more predictable, reliable and secure code, but with too much of it nobody could write code at all; it would be too slow. Overly cautious code normally indicates a lack of knowledge or of deeper understanding. Normally you try to maintain enough control over your environment not to have crazy internal variability :D. So you wouldn't set your regex interpreter to anything other than ASCII or some form of Unicode, and you wouldn't have your source compiled as anything but ASCII or UTF-8. I have never knowingly used a regex implementation that would allow a charset not extending ASCII, and I doubt many exist. If you needed something else, it would be converted by the program according to need.
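To make the validate-don't-corrupt rule concrete, a minimal sketch (the method names are mine; the whitelist is the [A-Za-z0-9_] set under discussion):

using System.Text.RegularExpressions;

// Validation: reject bad input and tell the user, rather than silently
// rewriting their data.
static bool IsValidName(string input)
{
    return Regex.IsMatch(input, @"^[A-Za-z0-9_]+$");
}

// Sanitisation (lossy): only where the requirements genuinely call for it,
// and preferably on display output, keeping the source data intact.
static string SanitiseName(string input)
{
    return Regex.Replace(input, "[^A-Za-z0-9_]", "");
}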
Admin
How many check with you to verify that the characters you regard as "special" are the same ones they regard as "special"?
Admin
I'm having problems assigning a meaning to that sentence. It's almost meaningful, but not quite…
In reference to what I was talking about, an optimisation which was pointed at making the concatenate-in-a-loop antipattern less horribly bad would improve a lot of user code. Yes, the compiler is not obligated to do such a thing, and there's a better solution already that better programmers use, but there's a lot of crap programmers out there and this would make a big difference to their code. It's also a really constrained case that it would be possible to actually do something about. (Yes, I do know what sort of things compilers can do. This is one of them.)
You might be arguing that compilers shouldn't be that smart. (I'm guessing that that's what you're talking about above, but I could be wrong as your sentence managed to be a bit slippery in meaning.) I understand such a position, but I know the shit that gets hacked together day in day out by the grunts in this industry: what I propose would definitely help.
Admin
I mean that in this specific case (the loop condition) the compiler shouldn't have to fix it. The programmer should not have done it like that to begin with.
Admin
OK, that's what I thought most likely, and responded to. In short, yes, but no. Because incompetent programmers.
Admin
Not many do, professionally, as it usually means the requirements and specifications aren't being done properly.
In training assignments that opportunity is deliberately not given, because it is useful to test people's independent thinking about how to address the problem based on the context. If you don't specify it exactly, so that there is no single right answer, you give them a chance to produce a better scheme than your own.
To be fair though, more people should ask. A lot of people will work with unclear requirements and try their best without considering that there might be a problem.
Admin
Trick question. The code as presented doesn't compile.
String.Contains does not accept a char parameter.
Change it to
and it works.
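(For context: whether this compiles depends on the framework version; older .NET Framework only offers Contains(string), and a char overload arrived later. A sketch of the usual workaround, with hypothetical names:)

// "allowed" is the whitelist string, "c" the character being tested.
static bool IsAllowed(string allowed, char c)
{
    return allowed.IndexOf(c) >= 0;  // string.IndexOf(char) is available on every framework version
    // or: return allowed.Contains(c.ToString());  // allocates a one-character string
}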
Admin
Even assuming a programmer was forced to loop through this string character-by-character, that's the whole reason for StringBuilder.
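A sketch of that approach, with hypothetical names:

using System.Text;

// Accumulate into one buffer instead of allocating a new string for
// every appended character.
static string KeepAllowed(string input, string allowed)
{
    var sb = new StringBuilder(input.Length);
    foreach (char c in input)
    {
        if (allowed.IndexOf(c) >= 0)
            sb.Append(c);
    }
    return sb.ToString();
}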
Admin
At the additional cost of an extra comparison to avoid out of range access to an array. Maybe unless you're 102% sure the string is \0-terminated and nobody will ever add another j++ to the loop body.
(I love when a dialect considers all loop variables immutable within the loop.)
In C# it does.
(The code compiles without change.)
Admin
This was my favorite part. He added to the obscurity by typing out all the letters in an order that makes it unclear whether he skipped one or not.
Admin
If only the compiler had some way to know whether a local variable was assigned to during its way round the loop. If only it were possible to tell whether that might have happened…
Admin
Yeah, that one's a tough problem. I've heard that there's some guys at MIT working on it, and they might have a solution in a couple of years.
:scream:
Admin
In C#, 62.
Admin
There's a worrying undercurrent of :wtf: here which I'm having difficulty accepting on a philosophical level. It goes: "It doesn't matter if my code is terminally appalling, because the optimiser will iron out all the kinks and make it as efficient as if I'd written it properly in the first place."
Yes I know I've deliberately misrepresented the intent of some of the above, but it's instructive to see where we end up.
If the compiler did not do any optimisation, then any bad code would soon be caught by the fact that it takes too long to run. But with optimisers to take the sting out of some of the unthinking sillinesses perpetrated by incompetent, naive and/or insouciant programmers, the accumulated codebase of the universe cannot help but be compromised.
While the above may be viewed as a naive and unrealistic view of the industry, I still believe it needs to be borne in mind.
Admin
At first, I thought TR :wtf: was going to be that the author missed a letter in their blacklist string (but it turns out they got all of [A-Za-z0-9_]).
Unicode is hard.
Yup. Turns out, in a lot of contexts (especially security), whitelists are better than blacklists. Don't try to enumerate all of the bad things, you'll undoubtedly fail.
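Worth noting while we're on Unicode: .NET's \w is Unicode-aware by default, and if ASCII-only behaviour is actually what's wanted, the engine can be told so directly. A small sketch, as I understand the RegexOptions.ECMAScript behaviour:

using System;
using System.Text.RegularExpressions;

class WordClassDemo
{
    static void Main()
    {
        // By default \w matches Unicode letters, digits and the underscore.
        Console.WriteLine(Regex.IsMatch("Ä", @"^\w$"));                           // True
        // ECMAScript mode restricts \w to [a-zA-Z0-9_].
        Console.WriteLine(Regex.IsMatch("Ä", @"^\w$", RegexOptions.ECMAScript));  // False
    }
}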
Admin
If there's no performance difference then performance is not a relevant factor when comparing two approaches to implementing something. "Properly" is not absolutely defined.
This is a dumb argument. If a given implementation has an optimiser, then there is no point inventing a hypothetical situation in which it doesn't. And if it doesn't, then it likely already incurs enough overhead on its own to make micro-optimisation doubly useless, and/or it is not used in a context where shaving every possible microsecond matters.
Except that's demonstrably untrue. We're not running 386s any more. And actual high-performance computation code does way more than optimisers can do automatically at the moment (and that's things like vectorisation).
The fact is, there's not many applications that need optimisations in which the difference is measured in cycles. The code in OP is algorithmically bad and that's MAYBE the reason to optimise it, and not that oh no it calculates length - 1 more than once. And I say maybe because it might not be worth it. Especially if it's surrounded by I/O.
Profile first, dammit.
Not naive, just woefully outdated.
Admin
In other words, shut up grandad, we write shit and are proud of it.
Admin
If you want to ignore all of the other words I typed, sure.
Admin
Welcome to TDWTF.
Admin
I didn't in my copy of VS 2013...
Admin
Where "bad code" is at least partly defined as code written without respect to the compiler's implementation details.
Most major performance problems IME have pretty near 0% to do with particular compiler optimizations.
Admin
Indeed. A bit of a trick question for a training assignment, one of my favourites. The wrong answer is nearly always deny,allow (blacklist) rather than allow,deny (whitelist). Another correct answer is to complain about the lack of a specific specification. The best way to explain it seems to be to tell people to think of it like a shopping list: in that scenario, would you create a list containing all of the items you don't want to buy in the shop?
A real WTF would be the same code sample, functionally 100% correct and equivalent, but with a list of all the disallowed characters. I am sure there has to be one of those in the dailywtf archives. In training assignments, when you see the blacklist growing bigger than the whitelist would be, you start to laugh.
The only real problem I see with the WTF code at a glance is that it will allow strange names and might be slower than it could be, through a lack of what appears to be common knowledge: a bad loop condition, Contains being slow (O(n) per call, could be O(1) with a lookup table), and no StringBuilder. Regex patterns can be pre-compiled as well in most implementations, so the regex itself probably wouldn't give anyone performance concerns in most scenarios.
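For what it's worth, a sketch of both of those fixes (a HashSet<char> standing in for a raw lookup table; the names are mine):

using System.Collections.Generic;
using System.Text.RegularExpressions;

static class NameFilter
{
    // O(1) membership test instead of a linear scan of the whitelist string.
    static readonly HashSet<char> Allowed = new HashSet<char>(
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_");

    // Compiled once and reused, rather than re-parsed on every call.
    static readonly Regex Disallowed = new Regex("[^A-Za-z0-9_]", RegexOptions.Compiled);

    public static bool IsAllowed(char c) => Allowed.Contains(c);
    public static string Strip(string input) => Disallowed.Replace(input, "");
}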
Admin
Wait, what? "Deny all characters except ABC" is a whitelist. "Allow all characters except DEF" is a blacklist.
Admin
It's from Apache httpd, but it's also a basic principle for firewalls in general (the default action). It just means: allow [list], then deny everything else. Everything in security has to be obscure and confusing (misdirection is key), even the instruction manual; I don't know why. Have you seen the source code for some cryptographic libraries? When people go into doing-security mode it leaks out everywhere and over everything. Ironically, it's probably one cause of the bugs that create security holes, because "secure-obscure code" (security by obfuscation) tends to hide bugs from the people fixing them rather than from hackers.
Admin
Yeah, I recognised the syntax.
Something like:
Right? Where you literally have the string deny,allow. It's a whitelist ;)
Admin
I see, they have a third option which I don't use.
"First, all Allow directives are evaluated; at least one must match" -> "any requests which do not match an Allow or a Deny directive are denied by default."
That's all I use. I suppose the middle denies are to apply a blacklist to the whitelist.