Rust is one of the "cool" languages these days. It promises all the low-level power of C with memory safety and "modern" programming conventions like iterators and maps. High performance, an expressive language, and low-level power seem like a great combination for certain domains.
Now, Jenna Winchester needed to do some Morton coding, or Z-indexing: an algorithm that takes multidimensional points and maps them to one-dimensional values in a way that preserves their spatial relationships- essentially a fast way of traversing a quadtree. It's a fairly simple and fast algorithm, especially if you implement it using bitwise operations. A naive implementation, without optimizations, can do its job in very few CPU cycles, relatively speaking.
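To make that concrete, here's a minimal sketch of the bitwise approach- not Jenna's actual code- for the simplest case of interleaving two u32 coordinates into one u64:

// A minimal sketch of naive Morton encoding: interleave the bits of
// two u32 coordinates into one u64, using nothing but shifts and masks.
fn morton_encode_2d(x: u32, y: u32) -> u64 {
    let mut result: u64 = 0;
    for i in 0..32 {
        result |= (((x >> i) & 1) as u64) << (2 * i);     // bit i of x -> position 2i
        result |= (((y >> i) & 1) as u64) << (2 * i + 1); // bit i of y -> position 2i+1
    }
    result
}
// morton_encode_2d(3, 1) == 0b0111: x-bits land in even positions, y-bits in odd.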
And while Jenna could have implemented her own version of it, never reinvent a wheel that someone else probably has. So she tracked down a Rust library (or crate, in Rust terminology) which promised to do the job. Jenna's expectation was that she could feed in her 5-dimensional point and get back the z-index by simply doing something like let output = input.z_index(). Let's call the library morty_code, because we should focus more on the painful experience of working with a badly designed API than on calling out a library for a mildly niche language in a very specific problem domain.
That, of course, would be too easy. The code which Jenna needed to write to perform the core purpose of what the library claimed to do was this:
fn morton_encode_u8_5d_zdex (input: [u8; 5]) -> u64 {
use zorder::*;
let usize_bits = 8*core::mem::size_of::<usize>();
let transmute_input = |x: &u8| -> FromU8 {(*x).into()};
input // Take the original input,
.iter() // element by element...
.map(transmute_input) // Transform each to the custom input types zindex needs...
.z_index() // Compute the result...
.unwrap() // Panic if there's an error... (Can there be one? Who knows!)
.iter_storage() // Take the result usize by usize...
.fold(0 as u64, |acc, a| (acc<<usize_bits) | a as u64)
// ...and finally, unify the iterator of usizes into a single u64.
// Can you just FEEL the ergonomics?
}
Now, even if you don't know Rust, which I don't, this looks menacing, even before you read Jenna's comments. Here's the key thing: you can compute the Z-index using bitwise operations. The library author, however, either didn't understand this or didn't care, and instead used a different data structure: a vector of bits. The line where we define transmute_input invokes FromU8, which takes an 8-bit number and turns it into an 8-item vector of bits. And despite knowing that it will always need exactly 8 items to hold 8 bits, the actual implementation of FromU8 dynamically allocates that memory.
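Since morty_code is a pseudonym, we can't see the real code, but the pattern as described would look something like this hypothetical sketch:

// Hypothetical reconstruction of the pattern- not the library's real code:
// every u8 gets exploded into a heap-allocated vector of eight booleans.
struct FromU8 {
    bits: Vec<bool>, // always exactly 8 entries, yet allocated dynamically
}

impl From<u8> for FromU8 {
    fn from(value: u8) -> Self {
        // Most-significant bit first.
        let bits = (0..8).rev().map(|i| ((value >> i) & 1) == 1).collect();
        FromU8 { bits }
    }
}

A [bool; 8]- or just the original u8- would hold the same information with zero allocations.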
So, with that in mind, we can trace through the implementation. We take our 5 dimensions of 8-bit integers as input. We iterate across the elements, converting each to a vector-of-bits using .map(transmute_input). On that iterator we call z_index() to compute the result as another vector-of-bits; since the call can apparently fail, we have to unwrap() it. Then iter_storage() hands the result back usize by usize, and finally we fold those chunks together, bitshifting them into a single u64.
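For contrast, here's roughly what a naive bitwise version with the same signature might look like- a sketch assuming plain bit interleaving is all that's required, with no trait conversions, no allocations, and no unwrapping:

// A naive sketch with the same shape as morton_encode_u8_5d_zdex:
// bit i of dimension d lands at position 5*i + d.
fn morton_encode_u8_5d_naive(input: [u8; 5]) -> u64 {
    let mut result: u64 = 0;
    for i in 0..8 {
        for (d, &coord) in input.iter().enumerate() {
            result |= (((coord >> i) & 1) as u64) << (5 * i + d);
        }
    }
    result // uses the low 40 bits; the top 24 stay zero
}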
If the library's pipeline seems like a lot of work to implement a simple algorithm first described in the 1960s, you'd be right. Jenna ran some performance tests comparing her naive implementation with the implementation from this library:
I checked the assembly that's emitted for a simple case of two u32s to one u64. A very naive version needed 600 machine instructions. morty_code needed more than three thousand. And since it contains multiple subroutines, morty_code turns out to be two orders of magnitude slower than the naive version.
But hey, we wouldn't want to use the naive version, because then we'd have to worry about things like edge cases and faulty assumptions- which surely means the library has to be more correct, right?
I whipped up a couple simple tests to ensure that the functions operate correctly. Surprise! The morty_code version doesn't. It ends up putting the high-significance bits at the end and the low-significance bits at the beginning. Printing the vector-of-bits directly shows the result correctly, but printing it after transforming it into a u64 shows the bits reversed.
Which is to say that the internal representation surprises you with its endianness. I suspect it was that endianness problem which initially led to the creation of the vector-of-bits type used internally, but there are far easier ways to resolve conflicts with byte order.
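For instance, if byte or bit order is the concern, Rust's standard library already covers it- a brief sketch of the usual tools, no custom bit-vector type required:

// Standard-library options for byte and bit order, shown on an arbitrary value.
fn main() {
    let x: u64 = 0x0123_4567_89AB_CDEF;
    println!("{:016x}", x.swap_bytes());   // reverse byte order (endianness)
    println!("{:016x}", x.reverse_bits()); // reverse all 64 bits
    println!("{:02x?}", x.to_be_bytes());  // explicit big-endian byte array
}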
Jenna contacted the original developer of the library, hoping to maybe help improve the experience for other developers.
This was the point at which I decided that the code has absolutely no redeeming features. A few fruitless posts of dialogue later, I realised that talking to TDWTF would be much more productive than talking to the maintainer. So... here we are.
Here we are, but where is "here" on the z-order curve?