• (nodebb)

    I love the use of qsort while completely ignoring the fact that the memory barrier is one of the main reasons why the performance of low-level functions degrades. I rate this two small rubber ducks out of ten.

    Addendum 2023-04-18 07:02: I know this is C++ and there is no shuffle function available by default, but in modern languages there is (like Random.Shuffle() in C#). However, as always with random things, one has to be very, very careful: pseudo-RNGs and simple algorithms like Fisher-Yates are not suited for real randomness, because both have predictable distribution features, so doing a real shuffle should be left to people who implement proper crypto libraries :-)

  • LZ79LRU (unregistered)

    Custom compiler best compiler. Because custom compilers WTFs don't HALT.

  • Industrial Automation Engineer (unregistered)

    For each position in the array, swap its element with an element at a random position of said array. Rinse and repeat 3 times for extra randomness. </LOL>

  • Smithers (unregistered)

    This C code is correct

    No, it isn't. int n = rand() % nb means there are only nb possible weights, so there will (very likely) be collisions, and elements with the same weight won't get shuffled. E.g. if nb = 2, half the time the weights will both be 0.0 or both be 0.5, and qsort won't do anything. Hence you'll get [0, 1] three times as often as [1, 0].

    Using rand() / (double) RAND_MAX as the weight (or better yet, skip the division and leave as an int) would do a lot better. (At least until nb gets to a reasonable proportion of sqrt(RAND_MAX) and you start chancing collisions again.)

  • Pabz (unregistered)

    Maybe this code runs really efficiently after being built by their compiler? (No, I don't think so either - it's probably even worse than it being built by a good compiler.)

  • (nodebb) in reply to Pabz

    Nah, this is impossible. First you have to allocate the array, which is a non-zero-cost operation; then you have to copy the array, which is O(n); then you have to qsort it, which is O(n log n); then you have to copy it back, which is again O(n); then you have to free the allocated array, which again is not a zero-cost operation.

    In comparison, a Fisher-Yates shuffle is a simple O(n) operation.

    So it is literally impossible that this algorithm will be faster or more resource efficient on any binary computer :)

  • LZ79LRU (unregistered) in reply to MaxiTB

    This said, I'd love to peek under the hood of that compiler. It's been a while since I did something like that.

  • Pabz (unregistered) in reply to MaxiTB

    Yes, I completely agree. I was just trying to make a joke out of both the presented code and the compiler that this type of programmer would come up with!

  • dpm (unregistered)

    Let's trace through this.

    Let's not. I enjoy the posts on this site, but I have a limit and one glance at today's code shut me down . . . I get enough of that at work.

  • (nodebb)

    Why do so many programmers, when confronted with a problem that clearly is not a new one (like shuffling an array of numbers), jump to "I need to code my own solution to this from first principles" instead of "there's probably an algorithm that already does this"?

  • (nodebb) in reply to LZ79LRU

    If you're interested in modern compilers, there are actually a lot of competent ones out there, like gcc and the Roslyn .NET compiler:

    • https://github.com/gcc-mirror/gcc
    • https://github.com/dotnet/roslyn/blob/main/docs/wiki/Roslyn-Overview.md

    If you are interested in bad compiler implementations... ehm, those are harder to find :-)

  • (nodebb) in reply to MaxiTB

    they are not suited for real randomness

    That depends on the random number source. If that's good, Fisher-Yates is optimal in that every possible permutation has the same probability. (That algorithm also minimises memory usage and number of swaps; it's really very good indeed.)

    If you're using a junk RNG, that's a separate problem.

  • LZ79LRU (unregistered) in reply to MaxiTB

    The worse the better.

  • GuestoGuest (unregistered)

    The real crime here is the use of rand()

  • LZ79LRU (unregistered)

    Ideally I'd want a C++ compiler (because I like C++) that technically obeys all the standards to the letter BUT violates any and all unwritten rules including the principle of least astonishment, common sense and common decency.

    It should abuse every possibility to make the code perform badly. This includes generating code which, whilst technically compliant, is as inefficient as possible without breaking said compliance.

    And I don't just mean memory mishandling or extra operations here. I mean like how modern compilers take advantage of knowing how modern CPUs work and their unique quirks and features to optimize toward their architecture and doing the absolute opposite by optimizing away from these.

    Also, it should generate code that is as spaghettified as possible by deliberately abusing long jumps as much as possible, to make decompilation for the purposes of debugging as painful as possible. This should be marketed as a security feature.

    Finally bonus points if it can also bloat the size of compiled .exe files by sheer inefficiency (no cheating like padding allowed).

  • Barry Margolin (github)

    "they are not suited for real randomness"

    How many applications need "real" randomness?

  • Randal L. Schwartz (github)

    Kinda reminds me of those ways to generate a "random order" of an SQL table's rows by sorting on a random value. Uh, yeah. Right.

  • Worf (unregistered) in reply to Smithers

    Quick sort is an unstable sort. If two numbers have the same weight, it's just as fair that they may swap positions each time it's evaluated.

    If you want to make sure the numbers won't swap positions despite having the same weight, you need to use a stable sort.

  • Shadow of Smithers (unregistered)

    Smithers gets it. The code shown, with the random number taken modulo nb, will have collisions in the variable being sorted. The collisions will cause a noticeable bias in the resulting permutations, because qsort() presumably defaults to "do nothing" if the sort variables are compared as equal.

  • JiffyPop (unregistered)

    I suspect the following would be better than the example code while being worse than every correct implementation. Thoughts?

    int f_cmp_valweight(const void *pv1, const void *pv2) { return rand() % 3 - 1; }

  • (nodebb) in reply to Worf

    I think an unstable sort just means it isn't defined how it will handle tied values, rather than it being 50/50 each time. E.g. when sorting two 0's it will either always swap them, or never swap them.

  • (nodebb) in reply to MaxiTB
    Comment held for moderation.
  • Felix Palmen (unregistered)

    Another WTF is naming a type *_t. This is done a lot in POSIX, and guess what, POSIX actually reserves *_t "to the implementation".

  • Gaetan (unregistered)

    For a couple of seconds I believed somebody had submitted some part of the code I inherited (still there after more than 12 years). Fortunately it avoids the bias Smithers noticed. JiffyPop's idea is funny for esthetics, but it is also probably slower than the original, because it calls rand() more often.

  • Balor64 (unregistered) in reply to JiffyPop

    That would be undefined behavior, which is worse than the code in the article. Your version can return that a < b, b < c, and c < a, or even that a < b and b < a, both of which are impossible.

  • (nodebb) in reply to prueg

    E.g. when sorting two 0's it will either always swap them, or never swap them.

    Or sometimes it will do it and sometimes it won't. The algorithm simply isn't required to be consistent at all (and might not be if it is doing things in multiple phases). In a stable sort, it's as if you have the original index of the value as an auxiliary key so that values are never truly tied.

  • (nodebb) in reply to Randal L. Schwartz

    Kinda reminds me of those ways to generate a "random order" of an SQL table's rows by sorting on a random value. Uh, yeah. Right.

    Forgive my ignorance (I'm not a developer) but what's wrong with this?

  • Gearhead (unregistered) in reply to JiffyPop

    int f_cmp_valweight(const void *pv1, const void *pv2) { return rand() % 3 - 1; }

    LOL, I had the same idea.

    Serious question: are there any sorting algorithms which rely on consistent comparison results, such that an inconsistent comparator may result in an infinite loop?

    JiffyPop’s idea is funny for esthetics, but it is also probably slower than the original, because it calls Rand() more often.

    Not a problem. Just cache the results of each comparison in a map.

  • SwineOne (unregistered)
    Comment held for moderation.
  • Andrew Klossner (unregistered)

    This isn't even the correct use of rand(). As described by the ANSI C committee, the low-order bits are much less random than the high-order bits, so the sequence generated by taking the remainder after division by n is not uniformly distributed. To get a number from 0 to n-1, correct usage is:

    ((unsigned long long)rand() * n) / ((unsigned long long)RAND_MAX + 1)

    And if there isn't an srand() call to change the seed, you'll get the same sort order every time you run the program.

    This quote from the on-line Numerical Recipes in C (https://www.unf.edu/~cwinton/html/cop4300/s09/class.notes/LCGinfo.pdf) is enlightening:

    "Be very, very suspicious of a system-supplied rand() that resembles the one just described. If all scientific papers whose results are in doubt because of bad rand()s were to disappear from library shelves, there would be a gap on each shelf about as big as your fist."

  • gordinfish (unregistered) in reply to Andrew Klossner
    Comment held for moderation.
  • Gaetan (unregistered) in reply to LZ79LRU

    Actually, this may come from a very old submission of mine under a different name (I would be sorry to learn that someone else has to deal with a similar piece of crap): the context really rings a bell.

    So assuming this comes from my program, here is what I can tell: it is a niche language targeted at non-programmers. The core implementation was made some 26 years ago by an intern, who did the best they could with the help of one of the Dragon books. Over the years, additional developers added warts upon warts to get additional features.

    Of course it is full of bugs, limitations, quirks and mishandled cases, but fortunately the users are actually quite smart and they know what to avoid, although this has a cost in boilerplate code.

    At last the powers that be have allowed a refactoring of the code. We cannot start from scratch because of the tremendous amount of work that relies on the program, so this is going to take a while, but with good unit tests it should work nicely in the end.

  • Officer Johnny Holzkopf (unregistered) in reply to Felix Palmen
    Comment held for moderation.

Leave a comment on “Random Permutation”
