Admin
Exactly for the purpose you mentioned: to avoid unnecessary calculations.
Admin
Hey, that's not fair! :'( I only use my index fingers when I type, too. OK, I don't look at the keyboard all the time when I type, but I'm faster with my two index fingers than most of my classmates who use ten fingers.
Admin
Belated pessimization is the leaf of no good.
There is a difference between good coding practices and premature optimization. In C++, for example, passing a large object by reference (not by value) is a good practice that you do automatically to avoid a needless performance penalty. It's not an optimization. It is the removal of an avoidable performance "pessimization".

The same goes for caching results of potentially expensive or extremely repetitive calls, like checking a length using an accessor or attribute. The compiler may not be able to optimize the call out of the loop safely (something might change the state of the object externally), so it will not do so. Good coding practice says to hoist all code out of a loop that is invariant inside the loop. If the string length will not change in the loop, why keep computing (calling) it? Cache it in a local variable, and use the result of that. Now the compiler will probably be able to optimize the use of the local variable because it knows it is not being changed in the loop.
You might think that it is a waste of time to worry about small costs like this, but believe me, these small costs add up to a big one in a large code base. Read any decent book on performance optimization for confirmation.
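The hoisting described above can be sketched in Java. This is a minimal, hypothetical illustration (the class and method names are invented, not from the original post):

```java
// Hypothetical example of hoisting a loop-invariant call, as described above.
public class HoistDemo {
    // Counts spaces in s; s.length() is invariant inside the loop because
    // Java Strings are immutable, so we cache it in a local once.
    static int countSpaces(String s) {
        int count = 0;
        final int len = s.length(); // hoisted out of the loop
        for (int i = 0; i < len; i++) {
            if (s.charAt(i) == ' ') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSpaces("a b c")); // prints 2
    }
}
```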
Admin
Haha, I had a similar experience with someone who had been "a programmer for 5 years". He was not at any university (nor in possession of a degree), though, so I think your story wins.
Admin
What's up with the size of Viewstate on the front page? It's humongous.
Admin
The String class is final and cannot be subclassed.
Admin
What manipulations are you speaking of? Note that Java Strings are defined to be immutable, so at the very least a sane implementation would cache the value at first access.
Then again, implementing String like this in the first place would be far from sane, since you'd then have a LOT of methods with O(n) rather than O(1) running time - including charAt(). So your loop is going to be O(n^2) anyway and it doesn't really matter.
Admin
No. Doing microoptimizations that decrease code clarity as a matter of course is stupid, a programmer error and a WTF.
Admin
Brazzy: "No. Doing microoptimizations that decrease code clarity as a matter of course is stupid, a programmer error and a WTF."
If it makes the code less clear, sure. That is not of course the case here, so your statement is somewhat irrelevant.
To compute the loop bounds at the start of the loop is not objectively unclear. It is merely unfamiliar to you. It's not as though there's any more state to worry about. In the 'traditional' style,
for (int i = 0; i != ctr.end(); ++i)
you have to worry about the state of two variables, i and ctr (whatever that may be). In the alternate form
for (int i = 0, end = ctr.end(); i != end; ++i)
you have to worry about the state of i and end. My personal opinion is that the latter is simpler, since the end variable is a simpler type.
It also very clearly states what is going on:
1: here are the 'first good' and 'first ungood' indices; 2: the loop then advances from the former to the latter.
You may find the second form less clear. So be it. You may prefer not to do it. Also so be it - although a style that encourages a pessimisation might be discouraged round here, we all agree that profiling is what should be used before active optimisation.
But don't show your ignorance in dissing the style.
Admin
Java strings are not mutable; the length is therefore computed at creation time and never has to be computed again.
A Java implementation that didn't store its strings' length would be at best stupid.
Admin
Agreed. With the former one has to know that the internals of the loop do not alter the values within ctr, whereas in the second one knows that we are always searching for a constant value.
Admin
oh my.
Admin
So, essentially, with Java you must do operations that copy entire strings when doing any simple operation, such as concatenating strings (which results in recalculating the length and storing it...)?
Admin
Basically correct. Operations that appear to modify a String object actually return new String object instances:
http://java.sun.com/j2se/1.4.2/docs/api/java/lang/String.html
The .NET String class works in a similar fashion:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpref/html/frlrfsystemstringclasstopic.asp
If you are performing large quantities of concatenation (in a loop, for example) you can use the StringBuffer (Java) or StringBuilder (.NET) objects to do this efficiently. In fact, both systems will typically use these "under the covers" as an optimization for simple concatenation operations.
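As a rough illustration of the two approaches in Java (the class and method names are invented for this sketch):

```java
// Hypothetical sketch: naive '+' concatenation vs. StringBuffer in Java.
public class ConcatDemo {
    // Naive: each += allocates a new String and copies the old contents.
    static String concatNaive(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i;
        }
        return s;
    }

    // Buffered: one mutable StringBuffer, a single String copy at the end.
    static String concatBuffered(int n) {
        StringBuffer buf = new StringBuffer();
        for (int i = 0; i < n; i++) {
            buf.append(i);
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(concatNaive(5));    // prints 01234
        System.out.println(concatBuffered(5)); // prints 01234
    }
}
```

Both produce the same result; the buffered version simply avoids the intermediate copies.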
Admin
Tell me how you would concatenate two Strings in any other language without copying at least one of them. And not every operation requires copying;
e.g. the substring() method returns a string object backed by the same character array, just with a different begin index and length.
What does require copying is any operation that changes the content of the string. This has performance implications, but not only negative ones. The immutability of strings means you never have to do defensive copying or synchronization to prevent someone from changing a String you are working on.
If you really need a mutable string, the class StringBuffer is just next door.
Admin
While mine is that it's more complicated because there is an additional variable.
Can we simply agree to disagree on this point and stop repeating that one or the other is stupid, mandatory, or whatever?
Which might be the intended behaviour or (IMO more likely) a bug waiting to happen.
Admin
I disagree. Either way, if you don't know that the contents of ctr are not changed you have a bug waiting to happen in the general case.
Some (but not all) containers invalidate your iterator when you change the contents of the container. So no matter what you need to either know the internals in details to know that the changes you are making to the container will not invalidate the iterator, or you need to not alter the values within the container.
If your iterator is valid, is it valid in a useful way? Deleting the current element means the iterator has to move, will the loop move it again? If you add an element should the iterator get to that new element?
Note that I'm speaking in general. ctr was not defined in the example so we do not know how it works.
Admin
When you work with an iterator, you usually don't have to index-loop anyway; in fact, doing so would be a WTF.
Admin
No. Because a) it's not a microoptimization except in the most simplistic case, and b) it doesn't decrease code clarity, it increases it because now you're not making an implicit function call.
I mean, really, an extra variable decreases clarity? Are you serious? In at least one implementation given, the variable is set as part of the loop initialization. While I dislike that style myself, it's good in that it defines both your start and end points right there in the loop initialization chunk of code. What decreases code clarity is not specifying explicitly at what point a counting loop ends. Instead you're making a function call to determine whether to end the loop or not. That function call could do anything; without looking at it, you have no way to know. You just have to make assumptions. It's anything but clear.
Admin
No problem.
Admin
They manage to portray their argument without any proof at all. They've managed to agree to disagree since they can't take the heat. We know who it was, however, that stated that they had re-used 'extra' variables for purposes that were not intended, thus introducing bugs into code; let's see:
"In my experience, people create a lot of throwaway variables simply because a line got too long and they're unfamiliar with the possibility of having a statement span lines.
The same people then tend to reuse those throwaway variables for unelated [sic] tasks and create really nasty bugs. I know, because once upon a time I wrote code like that myself."
It takes that special person who has learned how to program computers and suffers from such great hubris that they close their minds to actually learning something new and fight against it without any solid argument other than using themselves as a prime example.
Instead, they continue to voice their own opinions instead of actually taking the time to read everyone's input and understand it. Their main goal is to do nothing but spread fear, uncertainty and doom.
It is people like that who keep ignorant, uneducated and shoddy coders in a profession where others who actually do a good job and enjoy it for the "love of the art" are slaves to cleaning up after their mess.
That's why Budweiser salutes you, "Mr. Daily WTF Perpetuator".
Admin
Extra variables introduce extra state. Extra state means more to keep track of (mentally). At least use a run-time constant if you need to introduce a named temporary - in languages that support this -, that way it's obvious it's not going to get modified.
If that is in fact what you're doing, then never mind.
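In Java, the run-time constant suggested above is a final local. A sketch (class and method names invented; raw, pre-generics collections to match the thread's era):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a final local makes it obvious (and compiler-enforced)
// that the hoisted loop bound is never reassigned.
public class FinalBoundDemo {
    static long sum(List list) {
        long acc = 0;
        final int end = list.size(); // run-time constant: cannot be reassigned below
        for (int i = 0; i < end; i++) {
            acc += ((Integer) list.get(i)).intValue();
        }
        return acc;
    }

    public static void main(String[] args) {
        List list = new ArrayList();
        for (int i = 1; i <= 4; i++) {
            list.add(new Integer(i));
        }
        System.out.println(sum(list)); // prints 10
    }
}
```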
Admin
They, they, they ... they are all stupid and mean and they all made up their minds about everything. Plus, they all overgeneralize things every single second! Those idiots! I hate them.
By the way: Where's your argument?
Admin
Have you reacted to anyone's shortcomings today with physical violence? I know that you like to break the bones of people who make coding errors, because that's such an effective teaching tool.
By the way, does your uncontrollable penchant for violence extend to your non-working hours? For instance, if a waitress got your order wrong, would you smash her head through the counter-top? Come on tough guy, tell me how you beat people up for making errors.
Admin
haven't you brought this up with someone else before?
Admin
Dunno if he has or not, but it sure sounds vaguely familiar.............
Admin
Nevermind all that rubbish - have you ever had a debate with a libertarian over the value of silver?
Admin
yes, and in fact, today there was more banter with the libertarian.
as a libertarian, do you collect silver or gold? because i saw on the television today that they're selling gold coins for only 20% down that feature some kind of orchestra ensemble, presumably the most popular gold coins in the world. also, if you called their number, 1.888.567.GOLD, they would send you a free pamphlet explaining why you should invest in gold.
Admin
Just a pamphlet? Man, you need to get in touch with my supplier - they give you a whole book about buying gold and how the price of gold is guaranteed to go up.
Admin
Nice, but pretty heavyweight, and hardly fit as a general String implementation.
Admin
Didn't you know? Calling the person you disagree with ignorant and narrow-minded beats any argument!
Admin
Absolutely. Ask the functional programmers.
That's a perfectly fine specification right there.
If you are uncertain what a function call will do, you have a problem of a completely different scope. Whether you call it once (why do you think your criticism doesn't apply then?) or many times is then the least of your worries.
Admin
that's perfectly reasonable. but you don't go around telling people that gold has intrinsic value. it's the same thing as saying that shrubbery out there is feeling hyper, or that the fedex truck desires to be a sky baron.
Admin
Oops, good point.
This doesn't invalidate my point though. If you are working over members of some set, and modifying that set, you need to know how the internals work well enough to ensure that your modifications do not change the set in such a way that your iterator/index becomes invalid.
If your set is a list and you insert before the current element, then the next element may be the one you just worked with. If you append to the end of the list, did you really mean to get that appended element this time you work through the list? (I can think of reasons to go either way.)
I can think of a hash table that would allow iteration in various forms, but where adding or deleting an element would change the order in which things are iterated (if you cross a threshold where the table decides it is worth re-hashing the entire table with a different hash function, for example).
So my point stands: if the elements of a container change, then you need to know how the container works internally with respect to those changes to be sure that your iterator/index is valid.
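For what it's worth, Java's standard collections make this hazard explicit: their iterators are fail-fast, so a structural modification during iteration is detected rather than silently producing a stale traversal. A sketch (class and method names invented):

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: modifying an ArrayList while iterating over it
// trips the iterator's fail-fast check.
public class InvalidationDemo {
    static boolean modificationDetected() {
        List list = new ArrayList();
        list.add("a");
        list.add("b");
        try {
            for (Iterator it = list.iterator(); it.hasNext();) {
                it.next();
                list.add("c"); // structural modification invalidates the iterator
            }
            return false; // never reached
        } catch (ConcurrentModificationException e) {
            return true; // the next it.next() detected the change
        }
    }

    public static void main(String[] args) {
        System.out.println(modificationDetected()); // prints true
    }
}
```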
Admin
You're still missing the point. If you're calling a function many times that you know you only need to call once, then that is a WTF, don't you think?
Slice it any way you like, but you're intentionally writing code designed to do something notably stupid. Expecting the compiler to clean up your sloppy coding style is more than a bit silly.
Admin
Thanks for the advice chief - not that I have ever made such a claim.
Admin
I agree completely. So what are we arguing about?
Admin
No, obviously I don't think so.
No. I'm intentionally writing code that minimizes state and thus increases clarity, which is notably smart. I do so in the full knowledge that it may result in non-optimal performance, with a very low probability of this actually turning out to be performance-relevant and needing to be changed later. But if I wanted to avoid any possibility of non-optimal performance, I'd be writing assembler and never get anything done.
Admin
Is

numElements = cnr.getNumElements();
for (i = 0; i < numElements; ++i)

equivalent to

for (i = 0; i < cnr.getNumElements(); ++i)

in all cases? Clearly, if they are not equivalent, you cannot use the first form, and thus we cannot argue that one form or the other is better.
We have agreed that we can go ahead with the latter argument now. I forget which side I was taking though. Do you have any preference to which side you want to take, cause I can do the other then.
Admin
Ah, I think we can now resolve this promptly. Obviously, they are NOT equivalent in all cases, since cnr could theoretically be manipulated by a different thread. However, what I was talking about was the special case where cnr is created locally and never passed anywhere.
Admin
In 1999, the following article appeared about Java Hotspot regarding optimizations, such as method inlining.
http://java.sun.com/developer/technicalArticles/Networking/HotSpot/index.html
But such "method inlining" becomes problematic within the world of object oriented code design. "Often, the address you're going to jump to is hard coded in the instruction," says Stoutamire. "But that's not always the case. In some instances, with what's called dynamic dispatch, or virtual method invocation, you have a pointer at runtime that's used to call one of a set of different methods."
....and goes on to state....
"The problem with inlining," continues Stoutamire, "is that you can't inline across dynamic dispatch. The reason for that is that you're never really sure what method you're going to call, so you can't bring the body of the method up into the call."
This supports my earlier argument about any type of getter function that is virtual. Now, this article, which appeared on WebSphere Advisor, also notes that one should avoid using the method call in a for() loop... note that there is still no mention of any object which is 'constrained' within a single function call.
http://doc.advisor.com/doc/12053
There is another, which I consider very well written article here:
http://www.webcom.com/~haahr/essays/java-style/single-page.html
Under the section "Create new variables rather than reassigning old ones", which I am sure you can relate to, it states the following:
"Local variables are useful for providing names for intermediate results. The value of doing so is diminished if a local variable is used to hold several different values with different interpretations during its lifetime. (This does not apply to loop or accumulator variables, which, by their nature, are meant to change throughout their lifetimes, but their meaning should always be the same, relative to the current iteration of a loop.)"
...also germane is the following under "Set loop limits in for-initialization clauses":
"The initialization clause of a for statement is executed exactly once, where the termination test is executed every time around the loop. If the upper bound on a numeric (typically integer) loop could be changed by execution of the loop and one does not want to use the changed value (or if the upper bound is time-consuming to compute) the loop limit can be declared and set along with the iteration variable in the initialization clause."
Another good site that talks about performance improvement techniques in loops for java:
http://www.precisejava.com/javaperf/j2se/Loops.htm
In the section entitled "Overview of loops" they state:
"Always avoid anything that can be done outside of the loop like method calls, assigning values to variables, or testing for conditions."
An article on O'Reilly's site even proffers the same advice:
http://java.oreilly.com/news/javaperf_0900.html
What's really interesting is what he states in the section entitled "Eliminating the Unnecessarily Repeated Method Call". This I found somewhat interesting:
"Amazingly, while most of the VMs gain one or two percent in speed from this change, eliminating the repeated method call actually makes the 1.2 JIT VM run 40 percent slower! ... (While I can't imagine what makes the change so disadvantaged, I suspect it's some inefficient artifact of the native-code generation on my processor, such as an incorrect integer alignment. This likelihood is confirmed by a further test, which combines the two optimizations used so far.)"
So much for the spectacular intelligence of the JIT at the time the article was written. Perhaps it has changed quite a bit over the years; however, I don't know if it's quite intelligent enough to track all references that a person might make to an object that was only declared in a single function. In fact, what's to say that a reference (or several) couldn't be made to an instance of that one object and one of them eventually passed to another function? Is the JIT compiler now keeping track of all object references?
The Java Performance Portal, also touts:
http://www.jptc.org/article10.html
Note that they state (in one of their bullet points), "Avoid using a method call in the loop termination test. The overhead is significant, because after each iteration the method has to be called."
I suppose with all of this information that's been posted everywhere, it's a conspiracy. Either that, or you're the inventor of the first omniscient Java JIT compiler that you're keeping hidden from the rest of the world.
Admin
What exactly do you mean by copy? Of course, one could use a constant reference from the literal pool. Within C#:
const string something = "No " + "problem.";
The literal pool is generated at compile time and its constant reference is used throughout the code. There is no copying. Also, because the '+' operator is used between two literal strings, they are concatenated at compile time. Any other questions?
FWIW, Alexis had made a joke that had me rolling on the floor laughing when I read his reply. After all, he did concatenate two strings in the English language without copying either of them. If you don't specify your bounds, you're open to some very creative interpretation. Perhaps that's how loop bounds should be treated.
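The Java side of the same point can be checked directly: javac folds a concatenation of two string literals into a single constant-pool entry, so both references end up pointing at the same interned String. A sketch (class and method names invented):

```java
// Hypothetical sketch: compile-time constant folding of string literals in Java.
public class FoldDemo {
    static boolean foldedAtCompileTime() {
        String a = "No " + "problem."; // folded by javac into "No problem."
        String b = "No problem.";
        return a == b; // reference-equal: both refer to the same interned constant
    }

    public static void main(String[] args) {
        System.out.println(foldedAtCompileTime()); // prints true
    }
}
```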
Admin
No, it doesn't, because you failed to read the next paragraph, where they say that the HotSpot JVM has a way around this - and note that this article is from 1999, which is more than half of Java's lifetime ago. JIT technology has advanced a LOT since then.
I'd take any performance advice from a page that also touts "exception terminated loops" (one of the worst Java performance myths, which has shown up here as a WTF some time ago) a few paragraphs on with about a tonne of salt.
No, it's just that when it comes to performance tuning in Java, there are a lot of persistent myths being circulated. Some of them are just total garbage, others were right at one time but became invalid because of advances in JIT compiler technology. Since Java is so popular, but also so popularly maligned for performance problems, a lot gets written about this and, once published on the web, tends to stick around long after it's ceased to be meaningful.
But I'm tired of theoretical arguments, why not let the numbers speak? I ran the code below (hope the less-than signs turn out right), which loops through a list (which isn't even local!) of 200,000 elements 1,000 times each with calling the size() method once and calling it always. It does the minimal amount of actual work I can imagine (adding an integer) to avoid the loops being optimized away completely (which the Java JIT is capable of). On my P4 Mobile 2GHz, running on JDK 1.5.0_2 with the server VM, this is the result:
Total time taken(ms) when calling size() once: 3665
Total time taken(ms) when calling size() always: 3445
If this does not prove to your satisfaction that either the JIT is capable of optimizing away the repeated call to ArrayList.size(), or that even 2,000,000 calls incur so little overhead that it's not even measurable, then please suggest a better methodology.
import java.util.*;

public class TestLoop
{
    public static void main(String[] args)
    {
        List list = new ArrayList();
        for (int i = 0; i < 200000; i++)
        {
            list.add(new Integer(i));
        }
        long acc = 0;
        long start;

        start = System.currentTimeMillis();
        for (int i = 0; i < 1000; i++)
        {
            acc += loopCallAlways(list);
        }
        long callAlways = System.currentTimeMillis() - start;

        start = System.currentTimeMillis();
        for (int i = 0; i < 1000; i++)
        {
            acc += loopCallOnce(list);
        }
        long callOnce = System.currentTimeMillis() - start;

        System.out.println(acc);
        System.out.println("Total time taken(ms) when calling size() once: " + callOnce);
        System.out.println("Total time taken(ms) when calling size() always: " + callAlways);
    }

    // size() called once, hoisted into the initialization clause
    private static long loopCallOnce(List list)
    {
        long acc = 0;
        for (int i = 0, end = list.size(); i < end; i++)
        {
            acc += ((Integer)list.get(i)).intValue();
        }
        return acc;
    }

    // size() called on every iteration
    private static long loopCallAlways(List list)
    {
        long acc = 0;
        for (int i = 0; i < list.size(); i++)
        {
            acc += ((Integer)list.get(i)).intValue();
        }
        return acc;
    }
}
Admin
Yeah, C# borrowed that (as it did pretty much anything) from Java, where it works exactly the same way.
What if the strings are not compile-time constants? I think this is the case Mike R meant when he implied that Java Strings being immutable would be a performance problem, which is what I was replying to.
Umm... you mean, by typing them out in one go? I fail to see what's so funny about that and doubt he even had that in his mind when he wrote his reply.
Admin
Oh, I see - I hurt your feelings and now you're stalking me.
Admin
I may have read too much into that reply; if so, it was a very good unintended 'funny'. It seemed to me it could have been a very obscure jest. Either way, I had a good chuckle.
Admin
I hardly think that the next paragraph helps your case, given that it states:
"The Java HotSpot VM handily works around this problem by using dynamic deoptimization. Deoptimization is the ability to, at any point, revert from compiled code to interpreted code...."
I will concede that the JIT compiler is smart enough to "know when it is beat" with regards to optimization, not the other way around as was presented.
I can definitely agree with that; it is even touted as one of the 'myths' that has been eclipsed by JIT compiler technology. (I did look for those as well.) Sometimes in the past there was apparently some performance gain for poor programming practice; I imagine at one time, with JIT intelligence in its infancy, someone did eke out more CPU cycles by catching the exception. The article does go on to say that they would not recommend doing that. However, in my search for Java optimization myths, I have yet to find any that cite using a temporary in the scope of the for loop as the terminator as faster or slower either way. I do maintain that declaring another local *within* the scope of a for loop is hardly poor practice, and is still a valid optimization technique, at least in the .NET arena; it is also more palatable than abusing the language (such as with the exception-catching technique).
Touche. Yes, things do change over time and sometimes they do improve greatly. Although even with today's virtual machine technology, I find that the loop terminator tidbit of advice is still valid for performance tuning (at least with the .NET virtual machine).
Well, I'm skeptical about the results; however, I would probably be less inclined to argue with results from a profiling tool instead of having the application do its own timing. I've usually found that to be not so accurate.
Admin
Well, I'm not going to spend a lot of effort to find something I'm convinced doesn't exist, though I may run the benchmark through a profiler on my work PC on Monday (where I did a lot of profiling this week and found a MAJOR performance WTF in a massively-deployed framework for financial apps).