Admin
This is pretty awful code but... you know, we've seen at least 20 such string validation functions in the past few months. I thirst for some more diverse and extraordinary WTFs ;). Besides, this one strikes me as being just too awkward to be real...
Admin
Then this language is a WTF in itself. See, in Delphi, if you replace some char with '' (an empty string), the rest of the string is not lost.
Example:
Replacing "a" in "abc" with '' will result in "bc"
See?
Admin
Yeah, nice thread: first some guy thinks that the ForUpdate part is always optimized out, so that modifying data during the loop is not possible at all, then some guy believes there are only null-terminated strings in existence...
Btw, you can use null-terminated strings in Delphi as well.
Admin
Of course, '' != NULL ... (at least according to JavaScript)
Drak
Admin
He's right. The NULL character is '\0'. '' is the empty char or the empty string.
Admin
That's not a WTF. That's just how null-terminated strings behave.
Admin
That's l=4 :)
Admin
Nope, the undersquare is not allowed :oP
PS. One of my teachers in high school kept calling underscores undersquares. I guess he thought the rest of the square had fallen off or something...
Admin
Write 50 times on the blackboard:
Premature optimization is the root of all evil.
Admin
Ahem, this one compiles AND works as intended.
login.replace("h",""); does nothing at all. You have to do:
login = login.replace("h","");
(damn Java)
Notice the empty space in the last output. It's the null character. When viewing the output in HTML, the last output is invisible because the browser seems to stop at the null character; you have to look at the source code :)
Conclusion: Java strings are not null-terminated strings
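A quick sketch of the point above (the login value is made up for illustration):

```java
public class ReplaceDemo {
    public static void main(String[] args) {
        String login = "hhello";
        // Java Strings are immutable: replace() returns a NEW String
        // and leaves the original untouched.
        login.replace("h", "");      // return value discarded, no effect
        System.out.println(login);   // prints "hhello"

        // Reassigning captures the new String.
        login = login.replace("h", "");
        System.out.println(login);   // prints "ello"
    }
}
```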
Admin
Nice jpg Savior...
Admin
This proves diddly squat. The IL is not what gets executed. The JIT is where most of the optimization happens, and it can make use of the fact that the list is not accessible to any other thread and the length therefore constant.
Quite apart from that, the cost of accessing a property is so minuscule that it should never, never EVER be used to justify premature optimization.
Admin
My god! This is the worst thing you can do!
The pattern "for (int index = 0; index < array.Length; index++)" is known to the JIT compiler and optimized by it. The optimization is not only that the Length property is accessed only once, but also that no access of array[index] in the loop is checked for an IndexOutOfRangeException: the JIT compiler just knows that index has to be in the valid range (unless you modify index in the loop, of course).
Of course you won't see this optimization in the IL code - that's what the C# compiler emits, not what's emitted by the JIT compiler and executed.
And you won't see the optimized code in the debugger either, because optimization has to be disabled to enable debugging.
If you want to see it: try and measure!
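For what it's worth, Java's HotSpot JIT recognizes the same canonical shape: array.length is a final field it can hoist, and it can prove the index stays in bounds. A minimal sketch (the array contents are made up):

```java
public class LoopDemo {
    public static void main(String[] args) {
        int[] array = {1, 2, 3, 4, 5};
        int sum = 0;
        // Canonical loop shape: the JIT can hoist array.length and
        // eliminate per-access bounds checks, since index provably
        // stays within [0, array.length).
        for (int index = 0; index < array.length; index++) {
            sum += array[index];
        }
        System.out.println(sum); // prints 15
    }
}
```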
Greetings
Admin
Hey guys, are you aware that "" is not "\0" and that most modern languages don't use raw null-terminated strings anyway?
Admin
I used to think of that example as one of premature optimisation, but since it's essentially no more difficult or less clear either to understand or to type, it's not really a good example at all. There's none of the downside associated with premature optimisation, and since loops are the sort of thing your brain kinda goes and "types itself", typing:
for (int i=0, n=whatever.Count; i<n; i++)
takes a trivial difference in time from:
for (int i=0; i<whatever.Count; i++)
So, it's not a serious problem. The example in the code loads the new variable outside of the loop syntax though, which could unnecessarily bloat up the method. In general people load things into new variables as a matter of code clarity though, which is a perfectly reasonable idea, since it'll have no effect on the performance, and anything that makes something more readable is worth it.
As for loops, I prefer the "foreach" pattern in C#.
Admin
this is just not good
Admin
i dunno their is somthin wrong with the cod it isnt right
Admin
No. '\0' is the NUL character.
Admin
Brazzy: "Write 50 times on the blackboard:
Premature optimization is the root of all evil."
T'aint premature optimisation - it's habitual avoidance of pessimisation, same as preferring prefix increment in C++.
Perhaps the language's optimiser can turn the getter version into the same thing, but why write inefficient code in the first place just to rely on the optimiser always making the improvement?
And no, I'm not arguing that everyone should do this. I'm just pointing out that it's a habit that certain coders have got into for a good reason, so if you don't like it, you'd better come up with a better reason than Ytram's one.
(Perhaps Ytram would like to re-examine his feelings. Few things are more irritating in other people's code that stuff done for no apparent reason, but this isn't the case here.)
Admin
Honestly, I have to say that I still can't touch type properly (after 10 years, I think), while I do believe that I have some good programming skills.
It's stupid to judge someone's professionalism by their typing skill unless they're doing clerical work.
Admin
To me the lack of typing skills said "I don't really use a computer much," and in this guy's case I'm pretty sure that was true. As I said in the sentence before, when he was asked to make a for loop to go through an array, he typed "for (" and waited for some kind of instruction as to what to do next.
Admin
While I understand your argument that there is a certain level of optimization that the JIT compiler can perform, it's not nearly as much an optimizer as it is a converter from IL to native code. To contend that the JIT optimizations would actually be intelligent enough to perform complete run-time thread analysis and optimize away a virtual call is presumptuous at best.
From even MSDN online's PAG documentation, "Chapter 5 - Improving Managed Code Performance" (link below):
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnpag/html/scalenetchapt05.asp
"Avoid Repetitive Field or Property Access
If you use data that is static for the duration of the loop, obtain it before the loop instead of repeatedly accessing a field or property. The following code shows a collection of orders being processed for a single customer.
for ( int item = 0; item < Customer.Orders.Count ; item++ ){
CalculateTax ( Customer.State, Customer.Zip, Customer.Orders[item] );
}
Note that State and Zip are constant for the loop and could be stored in local variables rather than accessed for each pass through the loop as shown in the following code.
string state = Customer.State;
string zip = Customer.Zip;
int count = Customers.Orders.Count;
for ( int item = 0; item < count ; item++ )
{
CalculateTax (state, zip, Customer.Orders[item] );
}
Note that if these are fields, it may be possible for the compiler to do this optimization automatically. If they are properties, it is much less likely. If the properties are virtual, it cannot be done automatically."
Pay close attention to the last paragraph, especially the part concerning the virtual call on the ArrayList collection class's "Count" property as was given in my example. This property is virtual and cannot be inlined by the JIT compiler, unlike a non-virtual property used to access a single field. The particular code I once had to maintain relied on looping through large numbers of items, several times, in different types of .NET framework collections. While I grant the design itself may have been flawed, we managed to profile the code (using DevPartner's community profiler) and saw that a good portion of the time was being spent in the 'Count' property within the loop. Modifying the loop to what I had posted earlier yielded substantial speed improvements when profiled against the original property being used as the loop terminator.
Admin
I still haven't seen any compelling reason to use an extra variable for loop iteration. It hurts readability (which I admit is subjective) and is just plain annoying to me. Optimization has been mentioned several times, but that doesn't hold up, mainly because we know that the guy who wrote today's WTF has no clue what optimization even means. I also haven't seen a valid argument explaining how doing this intentionally is not early optimization without using a made-up word like "pessimisation". [:P]
Note: I will admit that doing this in C probably actually makes a lot of sense, as a call to strlen would probably have a somewhat significant performance hit for each loop iteration. In a lot of modern languages, though, the length property of an array or any other list object is a maintained count stored in an instance variable.
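For comparison, a sketch of the two styles in Java (the list contents are invented); for ArrayList, size() just returns a stored field, so both loops do essentially the same work:

```java
import java.util.ArrayList;
import java.util.List;

public class HoistDemo {
    public static void main(String[] args) {
        List<String> items = new ArrayList<>(List.of("a", "b", "c"));

        // Style 1: size() is called on every iteration.
        for (int i = 0; i < items.size(); i++) {
            System.out.println(items.get(i));
        }

        // Style 2: the count is hoisted into a local. For ArrayList this
        // saves only a trivial field read per iteration, but it would
        // matter if computing the size were expensive (a strlen-style scan).
        for (int i = 0, n = items.size(); i < n; i++) {
            System.out.println(items.get(i));
        }
    }
}
```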
Admin
Do you hear that? I believe it was the sound of brazzy being SERVED!!
Admin
It doesn't need to do thread analysis at all. An object created locally in a method and not passed anywhere is not visible to any other threads than the current one. This has nothing to do with calls being virtual and requires not much intelligence.
And if the .NET JIT compiler really does little optimization, then that reflects very poorly on the skills of Microsoft's developers - in the Java world it's well known that the JIT compiler is where nearly all the optimization happens - and a lot of it DOES happen. The difference in speed between the same byte code being executed on a 1.3 VM and a 1.5 VM can be massive.
Admin
Joe: I wouldn't believe everything that's in the MSDN in the first place. But to your post.
This WTF was Java code, and someone was talking about the JIT. It kind of doesn't make sense to argue against that with specs of the MS C# compiler or even help entries about that, now does it?
In Java you might notice that the code testing the behaviour of the ArrayList will execute insanely fast. Actually, it won't be executed at all, because the Java optimizer will notice that the code does nothing. So much for bad JIT optimization. The JIT does indeed compile stuff like the given code into a direct iteration of the array.
And the optimization: did you ever hear about the coding principle "say what you mean"? You don't mean: store the length of the array as it was before the loop in a local variable and then loop over it. You mean: loop over the array. Say what you mean, and in the end, if you notice (using a profiler) that this is actually not optimized by your compiler and is a really big performance problem, then you can optimize it.
There's no point in doing micro-optimizations, making (uneducated) assumptions about compiler behaviour etc. It will only introduce bugs, make the code lengthy, and - worst - more difficult to read, because it doesn't say what you mean, but what you somehow expect to be faster. Bad idea.
Admin
Well, then a better reason: it hurts code clarity. Each additional variable means more state, and more state means more things to keep in mind when reading the code, and more things that can go wrong. It may be a very small loss in code clarity, but it's also a very small gain in performance, except in a few extreme cases, which is what profilers are for.
Admin
In my experience, people create a lot of throwaway variables simply because a line got too long and they're unfamiliar with the possibility of having a statement span lines.
The same people then tend to reuse those throwaway variables for unrelated tasks and create really nasty bugs. I know, because once upon a time I wrote code like that myself.
Admin
I can tell you that I have run into my share of problems with MSDN documentation over time. Also, I do understand that the original WTF was presented in Java. That's why I made it quite clear that a) what I had done was in C#; b) that I had examined the compiler-optimized generated code; c) profiled the code both before and after the optimization; and d) read the PAG documentation regarding when certain optimizations can and cannot be done.
With that being said, I still maintain that I was not making any uneducated assumptions in choosing to use that pattern while looping through .NET collections. The argument that the original code was in Java (and I do wholeheartedly admit I am unfamiliar with its compiler and virtual machine behavior) does not make my arguments any less valid. Yes, I agree it is important not to make uneducated assumptions, but by the same token, I also think it is a bit close-minded to dismiss a coding style just because it irritates you, or to wave it off as an "uneducated assumption".
Admin
The same principle applies in VB6 and to an extent C++ as well. Why couldn't it apply equally to Java? What if the function to get the length of a collection-like object were processor intensive? Sure, it only takes 3ms to get an answer, but multiply that over 1000 iterations of a loop and you have a problem.
This is implementation specific. Further if it does what you say it does, it can have unwanted side-effects: What if something changes the number of items while in the loop, and this was the reason for checking that field each iteration?
Wouldn't an iterator be a better option in this case? (If you truely should adhere to the say what you mean metaphor.)
This is not what I would call something falling into the category of premature optimization. It's a simple thing to do and can be used to avoid a serious bottleneck. Writing esoteric code that does seemingly strange things that no one can understand just to squeeze a few more cycles from the processor does fall into this category; storing the count in a variable is very well understood and does not affect readability.
Do what you want to, every time I come across a program that takes several seconds to populate a listbox in a dialog, I'll think of you.
Admin
To think that a reference to a local object cannot be given to a thread while the function is performing work with that same object and in turn waiting for the thread to complete is fallacious thinking. Just because an object is local to a function does not mean that it is closed off to the rest of your application.
This article has a fairly good overview of the majority of JIT optimizations that are performed:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dndotnet/html/dotnetperftechs.asp
I still maintain that in writing C# code using the .NET collections with virtual properties, that my 'optimization' was neither uneducated nor premature.
Admin
Yeah, I would put it as my avatar, but since this forum somehow managed to disable the edit where I should enter the url...
Admin
There is no such thing as a NULL character, and we're not talking about C-style ASCIIZ strings either.
Admin
Yes it is, for two reasons: first, it takes more effort to write (unlike prefix increment), and secondly it could actually hurt performance in some cases, and checking this would require performance measurements in the first place (also unlike prefix increment, which never costs more cycles in any reasonable implementation).
Admin
No, you don't, because it takes not 3ms, but something on the order of a few ns, and anything you actually DO in that loop is almost certain to completely dominate the execution time.
As is your anecdote about once having found a .NET app bottlenecked on property access.
In the given code, this is impossible.
Quite definitely.
A very, very, very unlikely bottleneck.
Admin
It does if, as I wrote, it is never passed anywhere outside that method.
That may be so, but what does it have to do with the code that was originally criticized, which is Java code and uses the length of a String, which cannot change - a rather fundamental fact that the JIT optimizer could either depend on as a given or even easily deduce from the code (private field, never modified outside the constructor).
Admin
I've found one reason (after I removed all of them from code I inherited): debugging. Breakpoints are usually set on lines.
Admin
I was being sarcastic... Guess you missed it..
Admin
Apparently, two different anonymous posters above.
It is not "NULL character" but "NUL character" (if referring to ASCII). "NULL" is a value for pointers.
Sincerely,
Gene Wirchenko
Admin
That, in itself, is a premature optimisation.
How would you know how expensive it is to access a property without actually having a look?
Sincerely,
Gene Wirchenko
Admin
This comment is a huge WTF. How do you know there is a user? This could very well be buried in b2b code. Do you know what happens when you throw up a messagebox in a daemon process? Even if this is a GUI-based system, doing as you suggest is a huge maintenance headache. It makes it difficult to add to the error handling and to reuse the code in different contexts.
People who are against exceptions are just plain wrong. Stop demonstrating your ignorance. Just keep it to yourself. It's you who are advocating bad practice.
Admin
First of all, it does not have anything specifically to do with the code that was posted; not every thread in this message board does, since many of them do tend to branch off into tangents - often some of these containing misleading information.
What it does have to do with is my initial post, which was in reply to a comment regarding a coding style I had seen used before and had used myself. It can be, and was, used correctly in the context in which I presented it. However, several people did not take the time to understand that before quickly making a blanket statement that I "was wrong" or that the practice was inherently evil and a premature optimization unless one knew what they were doing.
In my defense, I presented not only code samples but also documentation to support my claim, the validity of which was not acknowledged, other than one person noting that they were skeptical of the MSDN documentation while not providing one shred of documentation to support that it was either incorrect or untrue.
In any case, I am not backing down. I still maintain that: a) in C#, the virtual 'Count' property in .NET collections is neither inlined nor optimized to something less than a function call by the JIT compiler; and b) assigning the property to a local and comparing your loop iterator to that local to terminate the loop is a necessary optimization, and in cases where iterations happen *many* times, a significant increase in performance is realized. I think a profiler is just as necessary a tool as an editor, a compiler and a debugger.
Admin
Case in point, I had to work with another team at my old job with some application integration. They had, as you advocate, "just handled it" when there was an error condition. When they moved this code into a service on a remote server, this great message box would 'pop-up' invisibly to no user and wait for someone to close the dialog. Basically, I had to log in and restart the service every time it happened.
Admin
Any word was made-up at some point. "pessimising" has been in my vocabulary for several years now. Take a look at the jargon file:
http://www.catb.org/~esr/jargon/html/P/pessimal.html
http://www.catb.org/~esr/jargon/html/P/pessimizing-compiler.html
Readability is very subjective. Optimisation may hold up: it depends on the language and implementation.
Sincerely,
Gene Wirchenko
Admin
Storing the result of .length() and then using that stored result in your loop conditional is a good practice. Always using the "getter" (not storing the result and having "i < x.length" as the conditional) is a worse practice because it results in many unnecessary extra calls to length() (length() will be called every time the loop conditional is evaluated!)
Yes, with modern computers the cost of those extra method call is pretty negligible, but suppose you are looping over a list of 10,000 items - do you really want to make 10,000 unnecessary method calls?
Admin
No thanks, but you can write this 50 times on the blackboard:
Stupid code is stupid in all languages.
Writing code that intentionally makes function calls unnecessarily is stupid. Even if you know it'll be optimized out, you're still writing it in a way that makes the assumption that it will not be optimized out.
The code:
for (int i = 0; i < something.doSomething(); i++) {}
is explicitly telling the program to call doSomething() several times. Now, doSomething could do anything. If the compiler is able to recognize that it doesn't actually have to make that function call and thus optimize the runtime, great. But when *you* are able to recognize that you don't need to call doSomething() a number of times, but do anyway, well, that's programmer error and a WTF.
Admin
And that doesn't consider the fact that the given code essentially has a side effect. If you want to call doSomething() a number of times, then you should have that code inside the loop itself instead of having it as part of the test condition. From a pure readability standpoint, the code is less readable because you have a test condition in a loop that makes a function call, even if the actual runtime code doesn't make that function call. Especially so, in fact, because the assumption should be that it *does* make said call, because that what the code tells it to do.
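To make the point concrete, here is a sketch with a hypothetical doSomething() (the method and its counter are invented):

```java
public class ConditionCallDemo {
    static int calls = 0;

    // Hypothetical stand-in for the doSomething() discussed above.
    static int doSomething() {
        calls++;       // side effect, observable outside the loop
        return 3;
    }

    public static void main(String[] args) {
        for (int i = 0; i < doSomething(); i++) {
            // empty body, as in the example above
        }
        // The condition runs for i = 0, 1, 2 and once more for the
        // failing check at i = 3, so doSomething() is called 4 times.
        System.out.println(calls); // prints 4
    }
}
```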
Admin
It depends on many factors. Maybe it isn't a factor in this code; maybe it's just getting raw information out of an array that merely stores its size. Maybe it's not some lousy 3rd-party class that actually runs some code to figure out the count, then returns it. Maybe they don't really want thousands of function calls. So, should I only look for those special cases and profile after the fact, especially when I'm on a tight deadline, or am I going to do one simple thing that could turn thousands of function calls and cache misses into a single function call? If it means the difference between a function taking 3ms-4ms and a function taking several seconds, I'll choose the first.
Maybe you have the wrong person here, but I don't recall making one single mention of having found any bottlenecks in any .NET classes... considering I don't really work with .NET anyway.
Okay. What if what was passed to the function was a subclass of string that did take a long time to find out its length - let's call it an "ASCII-Z Compatible String" - which interoperates with a legacy system that passes around said strings?
You've obviously never worked with Visual Basic 6 [:)]
Admin
A better, more modern and more relevant example:
Suppose for a moment that the Java runtime on your particular environment/implementation internally stores strings in a compact UTF-8 format. There is no way to just store a length without analysing the string to determine how many characters it contains.
I suppose after each manipulation you could recalculate and store the length, but, when performing a large number of manipulations, you've again introduced a performance bottleneck, and this one is even more problematic: there's no way to get around it. You can't tell the string class to quit recalculating the length, but you can alter your code to get the length first, then iterate through the loop. In this case there will even be a slight penalty on an iterator object, but that's a lot less than processing the string several times.
I tend to make a point that function calls in loop control statements are a bad thing, unless you absolutely need them. It is an especially bad thing when the return value from the function call you're testing against is known to be invariant.
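Java as actually shipped stores strings in UTF-16, where length() is an O(1) field read, but the analogous cost does exist when counting code points, which requires scanning the string. A sketch:

```java
public class CodePointDemo {
    public static void main(String[] args) {
        // 'a', one emoji encoded as a surrogate pair, then 'b'.
        String s = "a\uD83D\uDE00b";

        // length() returns the stored UTF-16 unit count in O(1).
        System.out.println(s.length()); // prints 4

        // codePointCount() must walk the string looking for surrogate
        // pairs, so it's O(n) - the kind of cost you'd want to hoist
        // out of a loop condition.
        System.out.println(s.codePointCount(0, s.length())); // prints 3
    }
}
```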
Admin
Though in this particular case, the programmer obviously couldn't decide whether to signal verification errors using a return code or an exception.
A valid non-empty string produces a return code of 1, an empty string produces a return code of 0, and an invalid non-empty string leads to an exception.
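A hypothetical reconstruction of the behaviour described (the method name and the validity rule are invented; the original code isn't shown here):

```java
public class Validator {
    // Mirrors the three outcomes described above: 1 for a valid
    // non-empty string, 0 for an empty string, and an exception for
    // an invalid non-empty string.
    static int verify(String s) {
        if (s.isEmpty()) {
            return 0;
        }
        if (!s.matches("[A-Za-z0-9]+")) {  // invented validity rule
            throw new IllegalArgumentException("invalid input: " + s);
        }
        return 1;
    }

    public static void main(String[] args) {
        System.out.println(verify("abc123")); // prints 1
        System.out.println(verify(""));       // prints 0
        try {
            verify("not ok!");
        } catch (IllegalArgumentException e) {
            System.out.println("threw: " + e.getMessage());
        }
    }
}
```

Mixing the two signalling styles like this forces every caller to handle both a status code and a try/catch, which is exactly why it reads as a WTF.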