Admin
This could easily be optimized further by skipping the intermediate xypoint list and immediately storing everything in a double[] list.
So, even the solution is still kinda wtf-ey...
Admin
It really should be:
List xyPoints = new ArrayList();
although you ought to start thinking about moving up to Java 5 sometime and do something more like:
List<XYPoint> xyPoints = new ArrayList<XYPoint>();
... and then you may get people to believe you're a software professional.
Admin
He should at the same time fix the exception handling and, if possible, convert it to a try-with-resources.
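For illustration, a minimal sketch of what that might look like, assuming the points come from a line-oriented, comma-separated text file (the class name, file format, and method are mine, not from the article):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class PointReader {
    // try-with-resources closes the reader automatically,
    // even if an exception is thrown mid-read.
    static List<double[]> readPoints(String path) throws IOException {
        List<double[]> points = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(",");
                points.add(new double[] {
                    Double.parseDouble(parts[0]),
                    Double.parseDouble(parts[1])
                });
            }
        } // no finally block needed; close() is guaranteed here
        return points;
    }
}
```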
Admin
Wouldn't it be even slightly better to use a real linked list (for example, LinkedList), not an array-backed list? An ArrayList still needs to be recreated by copying the data to a new array whenever the backing array is full when adding an entry. If I understand it correctly, it at least increases the size by roughly 1.5x instead of +1 as in the original code. So an ArrayList without an initial size still needs to be recreated about 23 times to reach a size of 100K...
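That resize count is easy to sanity-check. The exact growth formula has varied across JDK versions (older ones used (oldCapacity * 3) / 2 + 1, newer ones use oldCapacity + (oldCapacity >> 1)), but simulating the ~1.5x rule from the default capacity of 10 does land on 23 resizes:

```java
public class GrowthCount {
    // Simulate ArrayList-style growth: new capacity is roughly
    // old * 1.5, starting from the default capacity of 10.
    static int resizesToReach(int target) {
        int capacity = 10;
        int resizes = 0;
        while (capacity < target) {
            capacity = capacity + (capacity >> 1); // grow by ~50%
            resizes++;
        }
        return resizes;
    }

    public static void main(String[] args) {
        System.out.println(resizesToReach(100_000)); // 23
    }
}
```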
Admin
and here forevermore he is looked at as a miracle worker... The following week he is tasked with figuring out why their outside connection is slow; when he says it can't be fixed unless they go to the provider, he is sacked for his inability to get a simple job done. The ones who hired him are also sacked, and the ones who sacked them are sacked.
Admin
Remcoder : but he doesn't know the length of the array until it has been read in.
That would seem to justify the intermediate xypoint list.
Admin
But, but, but the refactored code lost the enterpriseyness!
Admin
The original code also has a garbage collection fiasco which means that it will use a great deal more memory than it needs as well
Admin
I had this brillant programmer (or somebody just like him) on one of my teams several years ago. I wondered what became of him after we let him go.
Admin
I'm sure the original version worked just fine on a test file of 20 points or so... and beware of premature optimization ;)
Admin
Yes, but beware also of premature pessimization.
Admin
Even if this code was written pre-1998 (and therefore pre-ArrayList in the JCF), I'd still want to find the developer and wallop him with either a data structures or an algorithms textbook. Growing an array while amortizing the cost of array copies (as ArrayList does) was a known technique in the 90s. Or, if you didn't realize that (or if it wouldn't work in this situation) at least use a linked list and then copy it into the array once you know the size.
If this code was written post-1998 (and there weren't any management directives to keep all code backwards compatible to Java 1.1) then I think the wallop would need to be harder.
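For reference, the amortized technique in question is only a few lines. A sketch of a hand-rolled growable double array (class and method names are mine):

```java
public class GrowableArray {
    private double[] data = new double[16];
    private int size = 0;

    // Amortized O(1) append: double the backing array when full,
    // so each element is copied only O(1) times on average.
    void add(double value) {
        if (size == data.length) {
            double[] bigger = new double[data.length * 2];
            System.arraycopy(data, 0, bigger, 0, size);
            data = bigger;
        }
        data[size++] = value;
    }

    // Final exact-size copy, analogous to ArrayList.toArray().
    double[] toArray() {
        double[] result = new double[size];
        System.arraycopy(data, 0, result, 0, size);
        return result;
    }
}
```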
Admin
Why is XYPoint a "convenience class"?
That seems to be an inconvenience class.
Admin
A better, but not best, solution would have been to use memcpy in place of looping through the hard-coded copy of all x and y values.
Admin
Meant to add that the memcpy solution is basically the ArrayList solution, but capacity grows 1 at a time instead of ArrayList's 50%.
Rant: WTF does ArrayList not allow the user to set the capacity increase? Why 50%? I wish I could balance performance with memory use. Granted, there is the trimToSize() method (or whatever it is) that can be called once the ArrayList is populated.
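You can't change the growth factor, but for the memory side of the rant, ArrayList does expose ensureCapacity() and trimToSize() (the loader method below is an illustrative sketch, not from the article):

```java
import java.util.ArrayList;
import java.util.List;

public class CapacityDemo {
    static List<Double> loadValues(int expectedCount) {
        ArrayList<Double> values = new ArrayList<>();
        // Pre-size the backing array up front: no intermediate
        // resize-and-copy cycles at all if the estimate is right.
        values.ensureCapacity(expectedCount);
        for (int i = 0; i < expectedCount; i++) {
            values.add((double) i);
        }
        // Shrink the backing array to the exact element count
        // once we're done adding.
        values.trimToSize();
        return values;
    }
}
```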
Admin
The original programmer could at least have reduced it from N^2 to NlogN by using the well-known technique of doubling the array size when it becomes full, rather than adding 1 for each point.
Admin
TRWTF is the estimation of 2 billion copy operations.
Admin
The original code has the array increasing (copying) by 1. At least he could have used powers of two to get close, or given that a rough estimate of the file size might be available, start there.
Maybe if there were only three points this original solution might be OK, but after 1,000 you had better look harder for a viable solution.
Yeah, I know, it requires a bit of thought, which appears to be sadly lacking.
Admin
TRWTF is people in the comments demanding that the convenience class be refactored when the performance problem is already fixed. Is it readable? Yes. Does it perform well enough? Yes.
Admin
The real WTF is that the boss allowed him to fix something that was actually functioning, albeit poorly.
Admin
While many of the points above are valid, I'll clarify a few things. There were many things in this application that our dev team wanted to refactor or replace, but it wasn't high enough priority to spend much time on, beyond fixing the occasional bug that popped up (such as the date formats mentioned). I only found this code because I was already in the class and had some extra time in our current development cycle (pre-Sprint days).
The original code was likely written in the Java 4 or 5 days, and when I refactored it, I think we were running Java 6, maybe? Either way, it was back in 2010, so my memory of the details is a little fuzzy. I still remember the core WTF clearly though, since I've used it as an example of what not to do many times since then.
And yes, I did the math wrong when I was submitting, apparently. Using 100,000 points and N*(N+1)/2 should have given around 5 billion operations, doubled to 10 billion to account for both X and Y copies. Not sure where I got the 2 billion from.
Admin
!
http://bit.ly/2tpWxTq
Admin
Well, let's imagine how the flyweight "convenience class" is written.
Requirement: Don't bother to use the built-in format-based input parsing on ReadLine(). We can do better than that, can't we? Oh, and don't bother to use a regexp either, because "now you have two problems." (For nit-pickers, you can always inject the Regexp. Which, of course, you could also do on the format string.)
Solution: build your own nitwit class with three methods: a constructor, and two accessors.
Or, you could just inline the code in the constructor, however haphazardly. Goodbye, GC pressure! (Although, since we're measuring things, the GC only gets hit 100,000 times, so hey, let's just burn those cycles.)
So, yes. Lose the stupid extra class. (And to Kriby: sure, it doesn't matter in terms of optimisation in this case, but let's imagine you or I are going to be code-reviewing the original garbage. Slap a man around the head with a well-aged halibut, and he's just going to end up a bit smelly. Slap him repeatedly until the fish turns round and tells you "wait a minute! I'm not dead yet!"... and you've achieved a mild evolution in the code-monkey progression.)
Admin
Which is, um, what practically any implementation of class (X)List does.
So ... back to the obvious approach, as taken by our hero in this case.
Admin
Wow. Looking at StackOverflow, apparently it does grow by 50%. (Actually a factor of 3/2, plus 1, but I don't care about off-by-one errors, hee hee.) I thought you'd mistyped; I'd always assumed a straightforward doubling.
Nonetheless, I don't see where the constant factor involved can be the basis of a rant. This is just the characteristic behaviour of this particular container. Presumably it works in 99% of cases (ie, the common cases).
A more plausible rant might consider the difference between reading from and writing to an ArrayList. (And I don't know how it's implemented right now, so consider this merely a theoretical point.) In general, I couldn't care less whether my container is based upon repeated reallocate-copy semantics, or whether it is based upon "tell me the expected size, I'll allocate chunks in that size, if you don't tell me I'll use a default, and I'll link all those chunks together" semantics.
I say "in general." But for those corner cases, I'd prefer the latter, which is what you seem to be asking for. But that wouldn't be an ArrayList. It would be something more specialised, and it would be specialised for a very good reason. Most people don't need this sort of performance hike ... see "Premature specialisation," etc.
(And the refactored example in the OP most certainly does not need this performance hike.)
Admin
Totally agree, when I looked at the solution I was like - um arrays are objects in Java ...
seems to work perfectly fine to me ...
Admin
They know the file size, right? And they know the format of said file? It should be easy enough to figure out how much memory you need from that information. Plus it should be more efficient than using an ArrayList, which in a pessimistic case could grow almost twice as big as necessary. Plus you wouldn't need any copies...
Admin
"Initial code pulled from memory"? Those guys don't use version control...
Admin
It's convenient for the future maintainer who has to figure out what your code does.
Admin
Thanks, Captain Obvious!
Admin
In what possible parallel universe?
Jump (using your favourite visual programming environment, say, emacs) to the constructor for this totally ridiculous class, look at the constructor, and enjoy! It's not as if a single optimiser or compiler out there is going to be able to elide this nonsense.
Alternatively, do it inline, and return a tuple. With named members, if you really insist.
Tell me again. In what possible parallel universe does this "convenience class" offer a "convenience" to the maintainer?
Admin
TRWTF is 2 billion? Actually, that is low:
N = 100,000
For each N, we go through N - 1 copies:
N * (N - 1) / 2
Now, TFA states 1 copy for X and 1 for Y (i.e., times 2):
This gives us around 10 billion copies (assuming my maths are right).
FredW
Admin
There's always that moment in a code WTF where the light bulb goes off. I got to ...
//Okay, we've got one point so far. points = new double[1][2];
And just started yammering "oh no no no no, please no ..."
Admin
Regardless of your classes, regardless of your data, regardless of the version of Java you use or the structures available to you, it seems to me you could always just run through the file once, count the points, and then declare an array of exactly that size and run through the file a second time.
Unless file I/O is horrendously slow, there's no guessing or class conversion involved. Get count, reset to start of file, read point, put in doubles array.
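A minimal sketch of that two-pass idea, assuming one comma-separated point per line (the class name and file format are illustrative):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class TwoPassReader {
    static double[][] readPoints(String path) throws IOException {
        // Pass 1: count the lines so we can size the array exactly.
        int count = 0;
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            while (reader.readLine() != null) {
                count++;
            }
        }
        // Pass 2: allocate once, fill directly -- no growing, no copying.
        double[][] points = new double[count][2];
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            for (int i = 0; i < count; i++) {
                String[] parts = reader.readLine().split(",");
                points[i][0] = Double.parseDouble(parts[0]);
                points[i][1] = Double.parseDouble(parts[1]);
            }
        }
        return points;
    }
}
```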
Admin
Yeah, that number's way too low.
On the first point, you copy 0 points from the old array. On the second point, you copy 1 point. On the third point, you copy 2 points. . . . On the 100,000th point, you copy 99,999 points.
If you add all of those copies, you get 4,999,950,000 points copied. And each point is really copying two items, X and Y.
So for 100,000 points, it involves 100k shy of 10 billion copies, not 2 billion.
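For the record, the arithmetic:

```java
public class CopyCount {
    // Growing by one each time copies 0 + 1 + ... + (n-1) points,
    // and each point copy moves both an X and a Y value.
    static long pointCopies(long n) {
        return n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        long n = 100_000;
        System.out.println(pointCopies(n));     // 4999950000
        System.out.println(pointCopies(n) * 2); // 9999900000 -- X and Y
    }
}
```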
Admin
The reference to Paula is a bit unfair, since the guilty party had at least written some functioning code.
Admin
"The original programmer could at least have reduced it from N^2 to NlogN by using the well-known technique of doubling the array size when it becomes full, rather than adding 1 for each point."
That is an improvement, but it is much better than O(n log n): it is actually O(n). You double the array size log n times, but the first doublings are of much smaller arrays. So if n = 1024, the array sizes would be 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and 1024, and only 1023 items are copied in total.
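This is easy to verify with a few lines; a sketch that doubles from capacity 1 and counts element copies:

```java
public class DoublingCopies {
    // Count total element copies when the array doubles from
    // capacity 1 up to n, copying the full old array each time.
    static long totalCopies(int n) {
        long copies = 0;
        int capacity = 1;
        while (capacity < n) {
            copies += capacity; // old contents copied into the new array
            capacity *= 2;
        }
        return copies;
    }

    public static void main(String[] args) {
        // 1 + 2 + 4 + ... + 512 = 1023, i.e. O(n) rather than O(n log n)
        System.out.println(totalCopies(1024)); // 1023
    }
}
```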
Admin
TRWTF is Tim, who can't add 1 through 100,000 to get 5*10^9.
Admin
!
https://twitter.com/fun_stevehawkin/status/971560487354880005
Admin
While the performance problem was solved, another solution would be to skip the helper class XYPoint and instead just push all X and Y values to a List<Double>. When finished, this list will contain an even number of values, and you just need to reconstruct it as the desired double[].
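A sketch of that reconstruction, assuming the library ultimately wants point pairs as in the article's double[n][2] (the class and method names are mine):

```java
import java.util.List;

public class FlatPoints {
    // X and Y values are interleaved in one flat list while reading;
    // afterwards, unpack them into the double[n][2] the library expects.
    static double[][] toPairs(List<Double> flat) {
        double[][] points = new double[flat.size() / 2][2];
        for (int i = 0; i < points.length; i++) {
            points[i][0] = flat.get(2 * i);     // X
            points[i][1] = flat.get(2 * i + 1); // Y
        }
        return points;
    }
}
```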
Admin
Ran into a similar problem years ago with the linker used in an embedded environment, which was excruciatingly slow. Sent in a support request, after some experimentation, asking whether they were perhaps using an n^2 algorithm to resolve fixups. After a few days got back a somewhat embarrassed "actually it's n^3, not n^2".
Admin
By using Gauss's method we can work out that N = (100,000 * (100,000 + 1)) / 2, which equals 5,000,050,000. But we just calculated one copy for each point; since every point has two coordinates, double that value for a whopping 10 billion and a hundred thousand operations. Just 5 times Tim's measly "jaw-dropping" 2 billion.
Admin
Another approach would be to read the file twice, once to count the rows and then again to load the array allocated from the count.
Admin
Sigh. It's another example of real-world code where you start with something which works as per the spec, then have to make an emergency fix which isn't well thought out because people are calling you into meetings every hour demanding to know why it hasn't been fixed, and then it never gets looked at because the manager's attention is on the next sale or problem.
The major WTF on this site is that we're shown examples without context.
Admin
I hit the same problem in code building "database record changed" messages in C++. It seems to be a fairly common anti-pattern. :-/
Admin
Just in case - please do not use
for (int i = 0; i < list.size(); i++) {list.get(i);}
for iteration over a List in Java. In case it's a LinkedList and not an ArrayList, you'll get O(n^2) performance on iteration. Use a for-each loop if you're on Java 1.5 and above. And if you're below that (really?), please use iterators.
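For example, this stays O(n) for any List implementation, because the for-each loop compiles down to the list's own Iterator rather than repeated get(i) calls:

```java
import java.util.LinkedList;
import java.util.List;

public class IterationDemo {
    static double sum(List<Double> values) {
        double total = 0;
        // The for-each loop uses an Iterator under the hood, so this is
        // O(n) even for LinkedList, where get(i) alone costs O(i).
        for (double v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Double> values = new LinkedList<>(List.of(1.0, 2.0, 3.0));
        System.out.println(sum(values)); // 6.0
    }
}
```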
Admin
True: the average point is copied 50,000 times, with two operations in each copy, so there are actually 100,000 x 50,000 x 2 = 10 billion copy operations.
Not that 2 billion wouldn't have been bad enough.
Addendum 2018-03-12 03:40: Huh. Guess I didn't get shown all the comments the first time...
Admin
No, you're better off using an ArrayList<double[]>, as once you've finished populating it you can just use toArray to get the array to pass to the third-party library.
Admin
So it's a subtype of convenience.
Admin
The story is missing the end where the manager is so freaked out from the speed of the proper solution that they go back to using the old code.