• Remcoder (unregistered)

    This can easily be further optimized by skipping the intermediate XYPoint list and storing everything directly in a double[] list.

    So even this solution is still kinda WTF-ey...

  • Quite (unregistered)

    It really should be:

    List xyPoints = new ArrayList();

    although you ought to start thinking about moving up to Java 5 sometime and do something more like:

    List<XYPoint> xyPoints = new ArrayList<XYPoint>();

    ... and then you may get people to believe you're a software professional.

  • Solitario (unregistered)

    He should at the same time fix the exception handling and, if possible, convert it to try-with-resources.

  • Krischan (unregistered)

    Wouldn't it even be slightly better to use a real list (for example a LinkedList) instead of an array-backed list? An ArrayList still needs to be recreated by copying the data to a new array if the array is full when adding an entry. If I understand it correctly, it at least increases the size roughly by a factor of 1.5, instead of +1 as in the original code. So an ArrayList without an initial size still needs to be recreated 23 times to reach a size of 100K...
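
    As a back-of-the-envelope check of that claim, here is a minimal sketch (assuming OpenJDK's growth rule newCapacity = oldCapacity + (oldCapacity >> 1) and a default initial capacity of 10) that counts the grow-and-copy operations needed to reach 100,000 elements:

        // Sketch only: simulates ArrayList's assumed growth rule rather than
        // instrumenting the real class, just to count the resizes.
        public class ArrayListGrowthEstimate {
            public static void main(String[] args) {
                int capacity = 10;               // assumed default initial capacity
                int target = 100_000;
                int grows = 0;
                while (capacity < target) {
                    capacity += capacity >> 1;   // grow by ~1.5x
                    grows++;
                }
                // Prints 23 resizes to reach 106,710 slots -- versus ~100,000
                // resizes with the original grow-by-one code.
                System.out.println(grows + " grow operations, final capacity " + capacity);
            }
        }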

  • (nodebb)

    And here forever more he is looked at as a miracle worker... The following week he is tasked with figuring out why their outside connection is slow; when he says it can't be fixed unless they go to the provider, he is sacked for his inability to get a simple job done. The ones that hired him are also sacked, and the ones that sacked them are sacked.

  • Ian M (unregistered) in reply to Remcoder

    Remcoder: but he doesn't know the length of the array until it has been read in.

    That would seem to justify the intermediate xypoint list.

  • CEO (unregistered)

    But, but, but the refactored code lost the enterpriseyness!

  • Kanitatlan (unregistered)

    The original code also has a garbage collection fiasco, which means that it will use a great deal more memory than it needs as well.

  • Mr Bits (unregistered)

    I had this brillant programmer (or somebody just like him) on one of my teams several years ago. I wondered what became of him after we let him go.

  • Jerepp (unregistered)

    I'm sure the original version worked just fine on a test file of 20 points or so... and beware of premature optimization ;)

  • (nodebb) in reply to Jerepp

    Yes, but beware also of premature pessimization.

  • guest (unregistered)

    Even if this code was written pre-1998 (and therefore pre-ArrayList in the JCF), I'd still want to find the developer and wallop him with either a data structures or an algorithms textbook. Growing an array while amortizing the cost of array copies (as ArrayList does) was a known technique in the 90s. Or, if you didn't realize that (or if it wouldn't work in this situation) at least use a linked list and then copy it into the array once you know the size.

    If this code was written post-1998 (and there weren't any management directives to keep all code backwards compatible to Java 1.1) then I think the wallop would need to be harder.
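
    For illustration, a minimal sketch of that linked-list fallback, assuming one whitespace-separated x/y pair per line (the class and method names are just placeholders):

        import java.io.BufferedReader;
        import java.io.FileReader;
        import java.io.IOException;
        import java.util.LinkedList;

        public class LinkedListThenArray {
            // Accumulate points in a linked list (cheap appends, no resizing),
            // then copy into an array once the final size is known.
            public static double[][] readPoints(String path) throws IOException {
                LinkedList<double[]> buffer = new LinkedList<>();
                try (BufferedReader in = new BufferedReader(new FileReader(path))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        line = line.trim();
                        if (line.isEmpty()) continue;
                        String[] parts = line.split("\\s+");
                        buffer.add(new double[] {
                            Double.parseDouble(parts[0]),
                            Double.parseDouble(parts[1])
                        });
                    }
                }
                return buffer.toArray(new double[0][]);   // single copy into the final array
            }
        }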

  • Magyar Monkey (unregistered)

    Why is XYPoint a "convenience class"?

    1. Create an XYPoint from a string of data
    2. Create a new two-element array
    3. Assign (probably the only) two properties of the XYPoint to the two elements of the array
    4. Discard the XYPoint

    That seems to be an inconvenience class.

  • I jsut want ot know what was wrong (unregistered)

    A better but not best solution would have been to use memcpy (System.arraycopy in Java) in place of looping through the hard-coded copy of all x and y values.

  • Meant to add ... (unregistered) in reply to I jsut want ot know what was wrong

    Meant to add that the memcpy solution is basically the ArrayList solution, but the capacity grows 1 at a time instead of ArrayList's 50%.

    Rant: WTF does ArrayList not allow the user to set the capacity increase? Why 50%? I wish I could balance performance against memory use. Granted, there is the trimToSize() method that can be called once the ArrayList is populated.
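
    You can't change the growth factor, but ArrayList does expose an initial capacity in the constructor (or ensureCapacity() later) to pre-allocate, and trimToSize() to release the slack. A short sketch, with sizes that are just illustrative:

        import java.util.ArrayList;

        public class CapacityControl {
            public static void main(String[] args) {
                int expected = 100_000;                        // rough estimate of the point count
                // Pre-sized: no grow-and-copy happens below 100K elements.
                ArrayList<double[]> points = new ArrayList<>(expected);
                for (int i = 0; i < expected; i++) {
                    points.add(new double[] { i, i * 2.0 });   // dummy data for the sketch
                }
                points.trimToSize();                           // drop any unused capacity afterwards
                System.out.println(points.size() + " points stored");
            }
        }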

  • Barry Margolin (google)

    The original programmer could at least have reduced it from N^2 to NlogN by using the well-known technique of doubling the array size when it becomes full, rather than adding 1 for each point.

  • James (unregistered)

    TRWTF is the estimation of 2 billion copy operations.

  • (nodebb)

    The original code has the array increasing (copying) by 1. At least he could have used powers of two to get close, or given that a rough estimate of the file size might be available, start there.

    Maybe if there were only three points this original solution might be OK, but after 1,000 you had better look harder for a viable solution.

    Yeah, I know, it requires a bit of thought, which appears to be sadly lacking.

  • kriby (unregistered)

    TRWTF is people in the comments demanding that the convenience class be refactored when the performance problem is already fixed. Is it readable? Yes. Does it perform well enough? Yes.

  • xrc (unregistered)

    The real WTF is that the boss allowed him to fix something that was actually functioning, albeit poorly.

  • Tim B. (unregistered)

    While many of the points above are valid, I'll clarify a few things. There were many things in this application that our Dev team wanted to refactor or replace, but it wasn't high enough priority to spend much time on, beyond fixing the occasional bug that popped up (such as the date formats mentioned). I only found this code because I was already in the class and had some extra time in our current development cycle (pre-Sprint days).

    The original code was likely written in the Java 4 or 5 days, and when I refactored it, I think we were running Java 6, maybe? Either way, it was back in 2010, so my memory of the details is a little fuzzy. I still remember the core WTF clearly though, since I've used it as an example of what not to do many times since then.

    And yes, I did the math wrong when I was submitting, apparently. Using 100,000 points and N*(N+1)/2 should have given around 5 billion operations, doubled to 10 billion to account for both X and Y copies. Not sure where I got the 2 billion from.

  • remy_grand_finale (unregistered)

    !

    http://bit.ly/2tpWxTq

  • Sole Purpose of Visit (unregistered) in reply to Magyar Monkey

    Well, let's imagine how the flyweight "convenience class" is written.

    Requirement: Don't bother to use the built-in format-based input parsing on ReadLine(). We can do better than that, can't we? Oh, and don't bother to use a regexp either, because "now you have two problems." (For nit-pickers, you can always inject the Regexp. Which, of course, you could also do on the format string.)

    Solution: build your own nitwit class with three methods: a constructor, and two accessors.

    Or, you could just inline the code in the constructor, however haphazardly. Goodbye, GC pressure! (Although since we're measuring things, the GC only gets hit 100,000 times, so hey, let's just burn those cycles.)

    So, yes. Lose the stupid extra class. (And to Kriby: sure, it doesn't matter in terms of optimisation in this case, but let's imagine you or I are going to be code-reviewing the original garbage. Slap a man around the head with a well-aged halibut, and he's just going to end up a bit smelly. Slap him repeatedly until the fish turns round and tells you "wait a minute! I'm not dead yet!" ... and you've achieved a mild evolution in the code-monkey progression.)

  • Sole Purpose of Visit (unregistered) in reply to Barry Margolin

    Which is, um, what practically any implementation of class (X)List does.

    So ... back to the obvious approach, as taken by our hero in this case.

  • Sole Purpose of Visit (unregistered) in reply to Meant to add ...

    Wow. Looking at StackOverflow, apparently it does grow by 50%. (Actually a factor of 3/2 plus 1, but I don't care about off-by-one errors, hee hee.) I thought you'd mistyped; I'd always assumed a straightforward doubling.

    Nonetheless, I don't see where the constant factor involved can be the basis of a rant. This is just the characteristic behaviour of this particular container. Presumably it works in 99% of cases (ie, the common cases).

    A more plausible rant might consider the difference between reading from and writing to an ArrayList. (And I don't know how it's implemented right now, so consider this merely a theoretical point.) In general, I could care less whether my container is based upon repeated reallocate-copy semantics, or whether it is based upon "tell me the expected size, I'll allocate chunks in that size, if you don't tell me I'll use a default, and I'll link all those chunks together" semantics.

    I say "in general." But for those corner cases, I'd prefer the latter, which is what you seem to be asking for. But that wouldn't be an ArrayList. It would be something more specialised, and it would be specialised for a very good reason. Most people don't need this sort of performance hike ... see "Premature specialisation," etc.

    (And the refactored example in the OP most certainly does not need this performance hike.)

  • Baboon (unregistered) in reply to Remcoder

    Totally agree; when I looked at the solution I was like: um, arrays are objects in Java ...

        List<double[]> points = new ArrayList<double[]>();
        for (int i = 0; i < 1000; ++i) {
            double[] point = new double[2];
            point[0] = 10;
            point[1] = 20;
            points.add(point);
        }
        for (double[] point : points.toArray(new double[][] {})) {
            System.out.println(point[0] + ":" + point[1]);
        }
    

    seems to work perfectly fine to me ...

  • Jezor (unregistered)

    They know the file size, right? And they know the format of said file? It should be easy enough to figure out how much memory you need from that information. Plus it should be more efficient than using an ArrayList, which in a pessimistic case could grow almost twice as big as necessary. Plus you wouldn't need any copies...
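
    A rough sketch of that idea, pre-sizing from the file length on the assumption of a roughly constant number of bytes per line (the file name and the bytes-per-line figure are just guesses for illustration):

        import java.io.File;
        import java.util.ArrayList;
        import java.util.List;

        public class PreSizeFromFileLength {
            public static void main(String[] args) {
                File file = new File("points.txt");            // hypothetical input file
                int approxBytesPerLine = 20;                   // assumed average record length
                long estimate = Math.max(16, file.length() / approxBytesPerLine);
                // Pre-size the list so it rarely (or never) has to grow.
                List<double[]> points = new ArrayList<>((int) estimate);
                System.out.println("Pre-sized for roughly " + estimate + " points");
            }
        }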

  • bif (unregistered)

    "Initial code pulled from memory"? Those guys don't use version control...

  • siciac (unregistered) in reply to Magyar Monkey

    Why is XYPoint a "convenience class"? ... [stuff you do with it] ... That seems to be an inconvenience class.

    It's convenient for the future maintainer who has to figure out what your code does.

  • siciac (unregistered) in reply to herby

    At least he could have used powers of two to get close, or given that a rough estimate of the file size might be available, start there.

    Maybe if there were only three points this original solution might be OK, but after 1,000 you had better look harder for a viable solution.

    Thanks, Captain Obvious!

  • Sole Purpose of Visit (unregistered) in reply to siciac

    In what possible parallel universe?

    Jump (using your favourite visual programming environment, say, emacs) to the constructor for this totally ridiculous class, look at the constructor, and enjoy! It's not as if a single optimiser or compiler out there is going to be able to elide this nonsense.

    Alternatively, do it inline, and return a tuple. With named members, if you really insist.

    Tell me again. In what possible parallel universe does this "convenience class" offer a "convenience" to the maintainer?

  • Ratboy (unregistered) in reply to James

    TRWTF is 2 billion? Actually, that is low:

    N = 100,000

    For each N, we go through N - 1 copies:

    N * (N - 1) / 2

    Now, TFA states 1 copy for X and 1 for Y (or times 2):

    This gives us around 10 billion copies. (assuming my maths are right).

    FredW

  • ckg (unregistered)

    There's always that moment in a code WTF where the light bulb goes off. I got to ...

        // Okay, we've got one point so far.
        points = new double[1][2];

    And just started yammering "oh no no no no, please no ..."

  • tbo (unregistered)

    Regardless of your classes, regardless of your data, regardless of the version of Java you use or the structures available to you, it seems to me you could always just run through the file once, count the points, and then declare an array of exactly that size and run through the file a second time.

    Unless file I/O is horrendously slow, there's no guessing or class conversion involved. Get count, reset to start of file, read point, put in doubles array.
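
    A minimal sketch of that two-pass approach, assuming one whitespace-separated x/y pair per line (class and method names are placeholders):

        import java.io.BufferedReader;
        import java.io.FileReader;
        import java.io.IOException;

        public class TwoPassLoader {
            public static double[][] load(String path) throws IOException {
                // Pass 1: count the points.
                int count = 0;
                try (BufferedReader in = new BufferedReader(new FileReader(path))) {
                    while (in.readLine() != null) count++;
                }
                // Allocate exactly once, then pass 2: parse and store.
                double[][] points = new double[count][2];
                try (BufferedReader in = new BufferedReader(new FileReader(path))) {
                    for (int i = 0; i < count; i++) {
                        String[] parts = in.readLine().trim().split("\\s+");
                        points[i][0] = Double.parseDouble(parts[0]);
                        points[i][1] = Double.parseDouble(parts[1]);
                    }
                }
                return points;
            }
        }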

  • tbo (unregistered) in reply to James

    Yeah, that number's way too low.

    On the first point, you copy 0 points from the old array. On the second point, you copy 1 point. On the third point, you copy 2 points. ... On the 100,000th point, you copy 99,999 points.

    If you add all of those copies, you get 4,999,950,000 points copied. And each point is really copying two items, X and Y.

    So for 100,000 points, it involves 100k shy of 10 billion copies, not 2 billion.

  • Duke of New York (unregistered)

    The reference to Paula is a bit unfair, since the guilty party had at least written some functioning code.

  • gnasher729 (unregistered) in reply to Barry Margolin

    "The original programmer could at least have reduced it from N^2 to NlogN by using the well-known technique of doubling the array size when it becomes full, rather than adding 1 for each point."

    That is an improvement, but it's much better than O(n log n): it is actually O(n). You are doubling the array size log n times, but the first doublings are for much smaller arrays. So if n = 1024, the array sizes would be 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and 1024, and only 1,023 items (1 + 2 + 4 + ... + 512) are copied in total.
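
    A small sketch that makes the amortized cost visible (the counter tallies how many slots get moved during resizes; the stored values are dummy data):

        import java.util.Arrays;

        public class DoublingGrowth {
            public static void main(String[] args) {
                double[][] points = new double[1][];
                int size = 0;
                long moved = 0;
                for (int i = 0; i < 100_000; i++) {
                    if (size == points.length) {
                        moved += size;                          // slots copied by this resize
                        points = Arrays.copyOf(points, size * 2);
                    }
                    points[size++] = new double[] { i, i + 0.5 };
                }
                // Prints 131,071 moved slots for 100,000 adds -- O(n), not O(n log n),
                // and nowhere near the ~5 billion of the grow-by-one version.
                System.out.println("slots moved during resizes: " + moved);
            }
        }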

  • SuperAnnoyed (unregistered)

    TRWTF is Tim, who can't add 1 to 100,000 to get 5*10^9.

  • the_games_begin (unregistered)

    !

    https://twitter.com/fun_stevehawkin/status/971560487354880005

  • M (unregistered)

    While the performance problem was solved, another solution would be to skip the helper class XYPoint and instead just push all X and Y values into a List<Double>. When finished, this list will contain an even number of values, and you just need to reconstruct it as the desired double[].
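
    A minimal sketch of that idea (names are illustrative; the flat x0, y0, x1, y1, ... layout is taken from the description above):

        import java.util.ArrayList;
        import java.util.List;

        public class InterleavedValues {
            // Unpack an even-length list of interleaved X/Y values into a flat double[].
            public static double[] toFlatArray(List<Double> values) {
                double[] result = new double[values.size()];
                for (int i = 0; i < result.length; i++) {
                    result[i] = values.get(i);
                }
                return result;
            }

            public static void main(String[] args) {
                List<Double> values = new ArrayList<>();
                values.add(1.0); values.add(2.0);              // x0, y0
                values.add(3.0); values.add(4.0);              // x1, y1
                System.out.println(toFlatArray(values).length + " values unpacked");
            }
        }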

  • Dave (unregistered)

    Ran into a similar problem years ago with the linker used in an embedded environment, which was excruciatingly slow. Sent in a support request, after some experimentation, asking whether they were perhaps using an n^2 algorithm to resolve fixups. After a few days got back a somewhat embarrassed "actually it's n^3, not n^2".

  • Smash (unregistered) in reply to Magyar Monkey

    Tim determined that for the average file of 100,000 points, each import required a jaw-dropping 2 billion copy operations (1 billion for the Xs, 1 billion for the Ys).
    Well, we can definitely determine that Tim could work a bit on his math skills. The first point requires one copy (actually two, since we have X and Y, but we'll leave that multiplication for later), the second does two copy operations, the tenth does ten, etc. In other words, this is just a bigger version of the problem that Gauss supposedly solved when he was eight! This one goes to 100,000, but it's still the same thing.

    By using Gauss's method we can work out that the total is (100,000 * (100,000 + 1)) / 2, which equals 5,000,050,000. But we just calculated one copy for each point; since every point has two coordinates, double that value for a whopping 10 billion and a hundred thousand operations. Just 5 times Tim's measly "jaw-dropping" 2 billion.

  • (nodebb)

    Another approach would be to read the file twice, once to count the rows and then again to load the array allocated from the count.

  • I Am A Robot (unregistered)

    Sigh. It's another example of real-world code where you start with something which works as per the spec, then have to make an emergency fix which isn't well thought out because there are people calling you into meetings every hour demanding to know why it hasn't been fixed, and then it never gets looked at because the attention of the manager is on the next sale or problem.

    The major WTF on this site is that we're shown examples without context.

  • Jeff Grigg (unregistered)

    I hit the same problem in code building "database record changed" messages in C++. It seems to be a fairly common anti-pattern. :-/

  • Dmytro Polovinkin (unregistered)

    Just in case - please do not use

    for (int i = 0; i < list.size(); i++) {list.get(i);}

    for iteration over a List in Java. If it's a LinkedList and not an ArrayList, you'll get O(n^2) performance on iteration. Use a for-each loop if you're on Java 1.5 or above. And if you're below that (really?), please use iterators.
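
    A short sketch contrasting the two styles (sizes are arbitrary; the point is the cost of get(i) on a LinkedList):

        import java.util.LinkedList;
        import java.util.List;

        public class IterationStyles {
            public static void main(String[] args) {
                List<Integer> list = new LinkedList<>();
                for (int i = 0; i < 10_000; i++) list.add(i);

                // Indexed access: every get(i) walks the list from an end, so the loop is O(n^2).
                long indexedSum = 0;
                for (int i = 0; i < list.size(); i++) {
                    indexedSum += list.get(i);
                }

                // For-each uses the list's own iterator, so the whole loop is O(n).
                long forEachSum = 0;
                for (int value : list) {
                    forEachSum += value;
                }
                System.out.println(indexedSum + " == " + forEachSum);
            }
        }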

  • (nodebb) in reply to James

    TRWTF is the estimation of 2 billion copy operations.

    True: the average point is copied 50,000 times, with two operations in each copy, so there are actually 100,000 x 50,000 x 2 = 10 billion copy operations.

    Not that 2 billion wouldn't have been bad enough.

    Addendum 2018-03-12 03:40: Huh. Guess I didn't get shown all the comments the first time...

  • (nodebb) in reply to M

    No, you're better off using an ArrayList<double[]>, as once you've finished populating it you can just use toArray to get the array to pass to the third-party library.

  • David (unregistered) in reply to Magyar Monkey

    Why is XYPoint a "convenience class"?

    1. Create an XYPoint from a string of data
    2. Create a new two-element array
    3. Assign (probably the only) two properties of the XYPoint to the two elements of the array
    4. Discard the XYPoint

    That seems to be an inconvenience class.

    So it's a subtype of convenience.

  • Shortest Circuit (unregistered)

    The story is missing the end where the manager is so freaked out from the speed of the proper solution that they go back to using the old code.
