• NULLPTR (unregistered)

    ((record*) nullptr)->updateDatabaseFormat();

  • Bob (unregistered)

    The vector is also being passed by value, so no matter what's done to it, the original object is never going to get changed!
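
    A minimal sketch of that difference (the function names and element type below are just for illustration, not taken from the article's code):

    #include <algorithm>
    #include <cassert>
    #include <vector>

    // Pass by value: the function deduplicates a copy; the caller's vector is untouched.
    void dedupByValue(std::vector<int> v) {
        std::sort(v.begin(), v.end());
        v.erase(std::unique(v.begin(), v.end()), v.end());
    }

    // Pass by reference: the caller's vector really is modified.
    void dedupByReference(std::vector<int>& v) {
        std::sort(v.begin(), v.end());
        v.erase(std::unique(v.begin(), v.end()), v.end());
    }

    int main() {
        std::vector<int> a{3, 1, 3, 2, 1};
        dedupByValue(a);
        assert(a.size() == 5);   // unchanged: only the copy was deduplicated
        dedupByReference(a);
        assert(a.size() == 3);   // now {1, 2, 3}
    }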

  • bvs23bkv33 (unregistered) in reply to NULLPTR

    Access violation: Dereferenced Null-Pointer

  • Sole Purpose of VIsit (unregistered)

    Well, at least Sylvester is using the STL, which is better than 90% of C++ programmers (including the one I'm working with at the moment).

    It would have been nice if he'd used std::set, which delivers exactly the functionality he's looking for. And it would have been nice if he'd made the method const and returned a new, duplicate-free vector. But I suppose committing something that actually works is, as usual, the main thing ...
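
    A rough sketch of what that suggestion might look like; the class, member names, and element type are made up for illustration, since the original class isn't shown:

    #include <set>
    #include <utility>
    #include <vector>

    class Container {
        std::vector<int> elements_;
    public:
        explicit Container(std::vector<int> e) : elements_(std::move(e)) {}

        // const member function: the object itself is left untouched, and a new,
        // duplicate-free (and, as a side effect of std::set, sorted) vector is returned.
        std::vector<int> withoutDuplicates() const {
            std::set<int> unique(elements_.begin(), elements_.end());
            return std::vector<int>(unique.begin(), unique.end());
        }
    };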

  • ray10k (unregistered)

    Bananakin, is that "Ban Anakin" or "Banana Kin"?

  • BK (unregistered)

    Regarding the comment from Bob: that's my fault. When "anonymizing" the code for posting it here, I forgot to add the reference symbol. The real code did change the container's elements, so we had a bug in the making: as per the documentation, the "erased" elements are not actually removed from the container, so they are (probably) left in an unspecified state. The if condition that erases only one element was there, in Sylvester's words, "because without it, the code didn't compile". I decided not to ask for more details. That was enough for me...

  • BK (unregistered) in reply to ray10k

    Long story short: both. There is a Bananas already and this one is his padawan.

  • iWantToKeepAnon (unregistered)

    Sylvester's second commit? 'SolveBug2'.

  • Your Name (unregistered)

    Seems reasonable that if you've asked for some code to be reviewed and you've labelled the bug, then the person looking at it will ask you for details. The WTF is the company not explaining their workflow.

  • sizer99 (google)

    | Sylvester ... comes from a more "enterprise" background... also, Sylvester might not be particularly good at the job

    But you repeat yourself.

  • Alexis_de_Torquemada (nodebb) in reply to Sole Purpose of VIsit

    OK, I'll grant up front the fair objection that having a slow program that works is infinitely better than having a fast program that doesn't. But… that said, I still feel the spontaneous urge to go on a rant about std::set.

    This is in fact what's wrong with it: nothing per se. It works as advertised; it's just horribly slow compared to just about any other in-memory associative container. And that includes small vectors (dozens to hundreds of elements), which will almost certainly beat std::set. When it was standardized, ease of use was preferred over performance.

    The only reasonable way to implement it, mainly due to the required pointer stability, is as a balanced binary search tree, typically a red-black tree, because that performs better in common use cases than e.g. AVL and is easy to implement. However, the depth of a balanced binary tree is proportional to the base-2 logarithm of the number of elements, and an RB tree only guarantees that the depth is never greater than twice the minimum. That means, for example, that with merely 513 items in the tree (minimum depth 10), the depth can reach up to 20.

    The STL design, in particular the pointer stability mentioned above, forces every value to occupy a separate block from the free store, and the tree structure is typically included in this "node". So a single lookup may require 20 pointer indirections in a row. If your cache is cold, every one of these has a high chance of being a miss, and since even the comparison against the next node's key depends on the indirection, the whole thing is completely serialized, forming a critical path of, in the worst case, 20 RAM round trips. One RAM round trip is currently estimated at up to 100 processor cycles on an average PC, and common desktop CPUs can execute up to 3 simple instructions from a thread per cycle, simultaneously. So, worst case for depth 20: 2000 cycles wasted, enough to execute 6000 instructions. (tbc)

  • Alexis_de_Torquemada (nodebb)

    (cont'd) Juxtapose this with a vector, where even with a cold cache you most likely suffer only one or two memory round trips, because the CPU is very good at detecting linear access patterns and will most likely prefetch the remaining cache lines. Or juxtapose it with an open-addressing hash map, again with most likely only a few cache misses. Since unordered_map is designed to use closed addressing with a linked list per bucket, it is significantly worse in this regard, but it should still generally be much faster than the horrible std::set.

    As a final note, std::set is largely immune to any fundamental improvement because of the way it is specified. In particular, you cannot even use an n-ary balanced tree, such as a B-tree or one of its derivatives, to reduce the depth and hence the number of critical-path cache misses. A true B-tree requires that the keys be stored in the node itself. But since keys may be of a non-copyable type (as of C++11) and pointer stability must be provided, they cannot in general coexist in a single allocation, which defeats much of the purpose of the B-tree.

    This is all, effectively, the result of these data structures (balanced binary search trees) not having aged well. When the STL was standardized, people still thought in terms of outdated non-superscalar processors, where it made no difference, performance-wise, whether you add two 32-bit integers or fetch a 32-bit value from memory. The only memory hierarchy that was well known, and for which solutions such as n-ary trees existed, was the duality of fast RAM and slow non-volatile storage such as disks and tapes. Even if you had had the tremendous foresight to criticize std::set as being non-implementable with cache-friendly n-ary trees, you would most likely have been laughed out of the room. Why would anyone in their right mind use database algorithms to deal with data in memory, after all?
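
    For what it's worth, here is a tiny sketch of the contrast being described: lookups through a node-based std::set versus binary search in one contiguous sorted vector. It is not a rigorous benchmark; the element count is arbitrary and the numbers will vary wildly by machine. It only illustrates the pointer-chasing point:

    #include <algorithm>
    #include <chrono>
    #include <cstdio>
    #include <random>
    #include <set>
    #include <vector>

    int main() {
        constexpr int kCount = 1 << 20;              // arbitrary element count
        std::mt19937 rng(42);
        std::uniform_int_distribution<int> dist;
        std::vector<int> values(kCount);
        for (int& v : values) v = dist(rng);

        std::set<int> tree(values.begin(), values.end());   // one heap node per element
        std::vector<int> sorted(values);                     // one contiguous block
        std::sort(sorted.begin(), sorted.end());

        auto time = [&](const char* label, auto&& lookup) {
            auto start = std::chrono::steady_clock::now();
            long hits = 0;
            for (int v : values) hits += lookup(v) ? 1 : 0;
            auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                          std::chrono::steady_clock::now() - start).count();
            std::printf("%s: %ld hits in %lld us\n", label, hits, (long long)us);
        };

        time("std::set lookup     ", [&](int v) { return tree.count(v) != 0; });
        time("sorted vector lookup", [&](int v) {
            return std::binary_search(sorted.begin(), sorted.end(), v);
        });
    }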

  • Chuck (unregistered) in reply to BK

    When "anonymizing", you probably changed Sylvester's name, too. May I guess that the guy is called Arnold?

  • (nodebb)

    It sounds like "SolveBug" is what he wanted the others to do during the code review. :P

  • I dunno LOL ¯\(°_o)/¯ (unregistered)

    Sylvester? Sufferin' buggotash!

  • ismo (unregistered) in reply to Sole Purpose of VIsit

    Using a set could be OK, but it messes up the original order. Then again, the sort does that too. To select a better (faster, more reliable, etc.) method, one would need more information about the problem domain.
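
    For illustration, one possible order-preserving alternative (just a sketch, not what the article's code does): keep the first occurrence of each element and skip later duplicates, without sorting at all:

    #include <unordered_set>
    #include <vector>

    std::vector<int> dedupKeepOrder(const std::vector<int>& input) {
        std::vector<int> result;
        std::unordered_set<int> seen;
        result.reserve(input.size());
        for (int v : input) {
            if (seen.insert(v).second)   // insert() reports whether v was new
                result.push_back(v);
        }
        return result;
    }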

  • Little Bobby Tables (unregistered) in reply to Chuck

    "I'll be bug."

  • Greg (unregistered) in reply to BK

    std::unique doesn't erase elements; it just shuffles the duplicates towards the back of the container (strictly speaking, everything past the iterator it returns is left with unspecified values).
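
    A small self-contained demonstration of that behaviour (the sample values are arbitrary):

    #include <algorithm>
    #include <cassert>
    #include <iterator>
    #include <vector>

    int main() {
        std::vector<int> v{1, 1, 2, 2, 3};
        auto new_end = std::unique(v.begin(), v.end());
        assert(v.size() == 5);                            // nothing was erased
        assert(std::distance(v.begin(), new_end) == 3);   // 1, 2, 3 at the front
        // Elements from new_end to v.end() hold unspecified values until erased.
        v.erase(new_end, v.end());
        assert(v.size() == 3);
    }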

  • Sulla (unregistered) in reply to Sole Purpose of VIsit

    Const? Non-duplicate? What is this, a PhD assignment?

    We should consider ourselves lucky that he isn't using some twisted custom implementation of bubble sort that creates at least two large memory leaks on each pass. (I've actually seen even that happen in real-life C; qsort() was apparently "too slow".)

  • nasch (unregistered) in reply to Alexis_de_Torquemada

    You're worried about 6,000 instructions? According to a table I found, that's 0.0005 seconds on a 386DX. 3.7 x 10^-8 seconds on a Core i7. Unless this scales up with the square of the size of the set or something, it seems unlikely to be a performance concern. If it's the best solution for some other reason, such as readability, go for it and fix it if the performance is too slow. If there is no particular reason to use it, then use something faster.

  • fintux (unregistered)

    From https://en.cppreference.com/w/cpp/algorithm/unique:

    "Return value: Forward iterator to the new end of the range"

    The error isn't in the std::unique part at all. The problem is in the erase, which removes only a single element when it's given one iterator instead of a range up to the end. The code should have been "container.erase(it, container.end())", and it would have worked. The check for it != container.end() is then redundant, and this can be made even simpler with an erase-remove-idiom style approach:

    container.erase(std::unique(container.begin(), container.end()), container.end())

    will be enough after the sorting.
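
    Putting that together, a complete version of the fix might look like the sketch below; the function name and element type are guesses, since the original signature isn't shown in the article:

    #include <algorithm>
    #include <vector>

    // sort + unique + erase: std::unique shifts the unique elements to the front
    // and returns the new logical end; the single erase(first, last) call then
    // removes the leftover tail in one go.
    void removeDuplicates(std::vector<int>& container) {
        std::sort(container.begin(), container.end());
        container.erase(std::unique(container.begin(), container.end()),
                        container.end());
    }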

Leave a comment on “One Way to Solve a Bug”
