• (nodebb)

    I love the use of qsort while completely ignoring the fact that the memory barrier is one of the main reasons why the performance of low-level functions degrades. I rate this two small rubber ducks out of ten.

    Addendum 2023-04-18 07:02: I know this is C++ and there is no shuffle function available by default, but in modern languages there is (like Random.Shuffle() in C#). However, as always with random things, one has to be very, very careful: pseudo-RNGs and simple algorithms like Fisher-Yates are not suited for real randomness, because both have predictable distribution features, so doing a real shuffle should be left to people who implement proper crypto libraries :-)

  • LZ79LRU (unregistered)

    Custom compiler best compiler. Because custom compilers WTFs don't HALT.

  • Industrial Automation Engineer (unregistered)

    For each position in the array, swap its element with an element at a random position of said array. Rinse and repeat 3 times for extra randomness. </LOL>

  • Smithers (unregistered)

    This C code is correct

    No, it isn't. int n = rand() % nb means there are only nb possible weights, so there will (very likely) be collisions, and elements with the same weight won't get shuffled. E.g. if nb = 2, half the time the weights will both be 0.0 or both be 0.5, and qsort won't do anything. Hence you'll get [0, 1] three times as often as [1, 0].

    Using rand() / (double) RAND_MAX as the weight (or better yet, skip the division and leave as an int) would do a lot better. (At least until nb gets to a reasonable proportion of sqrt(RAND_MAX) and you start chancing collisions again.)

  • Pabz (unregistered)

    Maybe this code runs really efficiently after being built by their compiler? (No, I don't think so either - it's probably even worse than it being built by a good compiler.)

  • (nodebb) in reply to Pabz

    Nah, this is impossible. First you have to allocate the array, which is a non-zero-cost operation; then you have to copy the array, which is O(n); then you have to qsort it, which is O(n log n); then you have to copy it back, which is again O(n); then you have to free the allocated array, which again is not a zero-cost operation.

    In comparison, a Fisher-Yates shuffle is a simple O(n) operation.

    So it is literally impossible that this algorithm will be faster or more resource efficient on any binary computer :)

  • LZ79LRU (unregistered) in reply to MaxiTB

    This said, I'd love to peek under the hood of that compiler. It's been a while since I did something like that.

  • Pabz (unregistered) in reply to MaxiTB

    Yes, I completely agree. I was just trying to make a joke out of both the presented code and the compiler that this type of programmer would come up with!

  • dpm (unregistered)

    Let's trace through this.

    Let's not. I enjoy the posts on this site, but I have a limit and one glance at today's code shut me down . . . I get enough of that at work.

  • (nodebb)

    Why do so many programmers, when confronted with a problem that clearly is not a new one (like shuffling an array of numbers), jump to "I need to code my own solution to this from first principles" instead of "there's probably an algorithm that already does this"?

  • (nodebb) in reply to LZ79LRU

    If you're interested in modern compilers, there are actually a lot of competent ones out there, like gcc and the Roslyn .NET compiler:

    • https://github.com/gcc-mirror/gcc
    • https://github.com/dotnet/roslyn/blob/main/docs/wiki/Roslyn-Overview.md

    If you are interested in bad compiler implementations... ehm, those are harder to find :-)

  • (nodebb) in reply to MaxiTB

    they are not suited for real randomness

    That depends on the random number source. If that's good, Fisher-Yates is optimal in that every possible permutation has the same probability. (That algorithm also minimises memory usage and number of swaps; it's really very good indeed.)

    If you're using a junk RNG, that's a separate problem.

  • LZ79LRU (unregistered) in reply to MaxiTB

    The worse the better.

  • GuestoGuest (unregistered)

    The real crime here is the use of rand()

  • LZ79LRU (unregistered)

    Ideally I'd want a C++ compiler (because I like C++) that technically obeys all the standards to the letter BUT violates any and all unwritten rules including the principle of least astonishment, common sense and common decency.

    It should abuse every possibility to make the code perform badly. This includes generating code which, whilst technically compliant, is as inefficient as possible without breaking said compliance.

    And I don't just mean memory mishandling or extra operations here. I mean like how modern compilers take advantage of knowing how modern CPUs work and their unique quirks and features to optimize toward their architecture and doing the absolute opposite by optimizing away from these.

    Also, it should generate code that is as spaghettified as possible by deliberately abusing long jumps as much as possible, to make decompilation for the purposes of debugging as painful as possible. This should be marketed as a security feature.

    Finally bonus points if it can also bloat the size of compiled .exe files by sheer inefficiency (no cheating like padding allowed).

  • Barry Margolin (github)

    "they are not suited for real randomness"

    How many applications need "real" randomness?

  • Randal L. Schwartz (github)

    Kinda reminds me of those ways to generate a "random order" of an SQL table's rows by sorting on a random value. Uh, yeah. Right.

  • Worf (unregistered) in reply to Smithers

    Quick sort is an unstable sort. If two numbers have the same weight, it's just as fair that they may swap positions each time it's evaluated.

    If you want to make sure the numbers won't swap positions despite having the same weight, you need to use a stable sort.

  • Shadow of Smithers (unregistered)

    Smithers gets it. The code shown, with the random number taken modulo nb, will have collisions in the variable being sorted. The collisions will cause a noticeable bias in the resulting permutations, because qsort() presumably defaults to "do nothing" if the sort variables are compared as equal.

  • JiffyPop (unregistered)

    I suspect the following would be better than the example code while being worse than every correct implementation. Thoughts?

    int f_cmp_valweight(const void *pv1, const void *pv2) { return rand() % 3 - 1; }

  • (nodebb) in reply to Worf

    I think an unstable sort just means it isn't defined how it will handle tied values, rather than it being 50/50 each time. E.g. when sorting two 0's it will either always swap them, or never swap them.

  • (nodebb) in reply to MaxiTB
    Comment held for moderation.
  • Felix Palmen (unregistered)

    Another WTF is naming a type *_t. This is done a lot in POSIX, and guess what, POSIX actually reserves *_t "to the implementation".

  • Gaetan (unregistered)

    For a couple of seconds I believed somebody had submitted some part of the code I inherited (still there after more than 12 years). Fortunately it avoids the bias Smithers noticed. JiffyPop's idea is funny for esthetics, but it is also probably slower than the original, because it calls rand() more often.

  • Balor64 (unregistered) in reply to JiffyPop

    That would be undefined behavior, which is worse than the code in the article. Your version can return that a < b, b < c, and c < a, or even that a < b and b < a, both of which are impossible.

  • (nodebb) in reply to prueg

    E.g. when sorting two 0's it will either always swap them, or never swap them.

    Or sometimes it will do it and sometimes it won't. The algorithm simply isn't required to be consistent at all (and might not be if it is doing things in multiple phases). In a stable sort, it's as if you have the original index of the value as an auxiliary key so that values are never truly tied.

  • (nodebb) in reply to Randal L. Schwartz

    Kinda reminds me of those ways to generate a "random order" of an SQL table's rows by sorting on a random value. Uh, yeah. Right.

    Forgive my ignorance (I'm not a developer) but what's wrong with this?

  • Gearhead (unregistered) in reply to JiffyPop

    int f_cmp_valweight(const void *pv1, const void *pv2) { return rand() % 3 - 1; }

    LOL, I had the same idea.

    Serious question: are there any sorting algorithms which rely on consistent comparison results, such that an inconsistent comparator may result in an infinite loop?

    JiffyPop’s idea is funny for esthetics, but it is also probably slower than the original, because it calls Rand() more often.

    Not a problem. Just cache the results of each comparison in a map.

  • SwineOne (unregistered)
    Comment held for moderation.
  • Andrew Klossner (unregistered)

    This isn't even the correct use of rand(). As described by the ANSI C committee, the low-order bits are much less random than the high-order bits, so the sequence generated by taking the remainder after division by n is not uniformly distributed. To get a number from 0 to n-1, correct usage is:

    ((unsigned long long)rand() * n) / ((unsigned long long)RAND_MAX + 1)

    And if there isn't an srand() call to change the seed, you'll get the same sort order every time you run the program.

    This quote from the on-line Numerical Recipes in C (https://www.unf.edu/~cwinton/html/cop4300/s09/class.notes/LCGinfo.pdf) is enlightening:

    "Be very, very suspicious of a system-supplied rand() that resembles the one just described. If all scientific papers whose results are in doubt because of bad rand()s were to disappear from library shelves, there would be a gap on each shelf about as big as your fist."

  • gordinfish (unregistered) in reply to Andrew Klossner
    Comment held for moderation.
  • Gaetan (unregistered) in reply to LZ79LRU

    Actually, this may come from a very old submission of mine under a different name (I would be sorry to learn that someone else has to deal with a similar piece of crap): the context really rings a bell.

    So assuming this comes from my program, here is what I can tell: it is a niche language targeted at non-programmers. The core implementation was made some 26 years ago by an intern, who did the best they could with the help of one of the Dragon books. Over the years, additional developers added warts upon warts to get additional features.

    Of course it is full of bugs, limitations, quirks and mishandled cases, but fortunately the users are actually quite smart and they know what to avoid, although this has a cost in boilerplate code.

    At last the powers that be have allowed a refactoring of the code. We cannot start from scratch because of the tremendous amount of work that relies on the program, so this is going to take a while, but with good unit tests it should work nicely in the end.

  • Officer Johnny Holzkopf (unregistered) in reply to Felix Palmen
    Comment held for moderation.

Leave a comment on “Random Permutation”
