Rusted chain

Working with a legacy codebase is like embarking on an archaeological expedition to a foreign land: you never know what ancient artifacts you're going to uncover. Will it be the mighty fast inverse square root? The rusty yet still operational Duff's device? An old COBOL module forgotten by time, quietly holding the universe together?

Unfortunately for Matteo I., his escapade wasn't nearly as fruitful. Digging through the muddy codebase, all he found were horrible C++ and linked lists. Lots and lots and lots of linked lists.

Now, the thing about linked lists is that they haven't aged very well. Back when processors were simpler and locality wasn't an issue, they were a reasonable trade-off. But as technology marched on, new data structures were invented, and processor caches grew, the drawbacks started to overshadow the benefits. And so, they were relegated to college textbooks and old programmers' tales.

Still, in the hands of a skilled programmer, these outmoded structures can be made to perform well. Judging by the code below, however, "skilled programmers" were not a resource Matteo’s predecessors had in abundance...

REP:
 for (P = PPfirst; P != NULL; P = P->next) {
     if (some_condition(P)) {
         // ...
         PPLinkElem *PL;
         if (P == PPfirst) {
             PPfirst = P->next;
             delete P;
             goto REP;
         } else {
             for (PL = PPfirst; PL != NULL && PL->next != P; PL = PL->next);
             if (PL == NULL || PL->next != P) {
                 DIE("Error in ...");
                 return 0;
             }
             PL->next = P->next;
             delete P;
             goto REP;
         }
     }
 }

According to our submitter, this code is supposed to remove elements from a list based on some_condition(). Normally, that's where linked lists shine: removing a single element is a simple, constant-time operation, so filtering the whole list is only O(n), as opposed to O(n²) in the case of an array, where every removal shifts the remaining elements. So, let's see whether the original programmer was up to the task.

The procedure begins innocently enough, with a standard traversal loop. When the condition is met, we first check if we're on the first element, and if so, we fix the start pointer, deallocate the element... then jump back to the very beginning of the loop. It's certainly an awkward way to handle that special case, and I'm sure that goto made a few people twitch, but so far no points are deducted. We’re still in the linear complexity bracket.

In the other branch, however, things get a little more heated. To remove an element from the middle of a list, we need to have the node before it point to the node after it, effectively bypassing the removed one. But it turns out we don't know where the previous node is, since we forgot to store its location! So we get to walk through the whole list all over again and search for it, which means we've brought our performance down by a notch.

Then, in a show of defensive programming, we protect ourselves from cosmic rays by checking if the previous element was actually found in the list. If not, the program crashes and returns a success code to the OS, like any good crashing program should. But if we're lucky enough to find our previous node, we can then filter the element out... and then go back to the very beginning, starting the filtering from scratch.

For those keeping score at home, the grand total comes to O(n²): there can be up to n removals, and each one pays for a restarted traversal plus a full rescan for the predecessor, both O(n). The implementation is not only worse than it should be, but it also takes one of the very few aspects in which linked lists can be better than arrays and turns it into a crippling disadvantage.

Unfortunately, aside from the horrible performance, the code does work—so it ended up copied, pasted and peppered all over the 160kLOC codebase. We just wish we could send Matteo a bigger shovel to dig through all this dirt.