• (disco)

    Removing a single element is a simple, constant-time operation, so filtering the whole list is barely O(n)—as opposed to O(n²) in case of an array.

    The way you phrased that derailed my brain for a little while. I initially read "removing a single element" as meaning the entire process, including finding the element to remove (which of course is not constant-time), rather than just the actual removal. :facepalm:

    But it turns out we don't know where the previous node is, since we forgot to store its location!

    :headdesk:

    checking if the previous element was actually found in the list.

    :headdesk: :headdesk:

    starting the filtering from scratch.

    :headdesk: :headdesk: :headdesk: :gun:

  • (disco)
    PaulaBean:
    we protect ourselves from cosmic rays

    You always have to wonder what kind of bug drove someone to make checks that can't fail. I wonder if I've left that kind of code in from old debugging sessions at some point. Debugging makes you insane.

  • (disco)

    a) "If not, the program crashes and returns a success code to the OS, like any good crashing program should"..... Actually it is not clear what DIE(...) does, it may actually terminate the program and return a failed code to the OS [such that the return 0 is never executed].

    b) I would not consider the above a "crashing program"

    c) I may have misread things, since I am just now drinking my first cup of coffee.

  • (disco) in reply to TheCPUWizard

    The code snippet could be from main for all we know ...

  • (disco) in reply to TheCPUWizard
    TheCPUWizard:
    it is not clear

    Agreed. But if it returns an error code, why return 0 next line? If it doesn't, what does it do? Is the zero returned from main, or from a procedure?

    I'd say if the program exits, it counts as a crash - but other than that, I'm at a total loss.

  • (disco)
    PaulaBean:
    [...] as opposed to O(n²) in case of an array. [...]

    :doingitwrong.iso:

  • (disco) in reply to Maciejasjmj
    Maciejasjmj:
    But if it returns an error code, why return 0 next line? If it doesn't, what does it do? Is the zero returned from main, or from a procedure?

    What if it does NOT RETURN -- for example throws an exception or even has a simple "exit(4);"

  • (disco) in reply to Maciejasjmj
    Maciejasjmj:
    I'd say if the program exits, it counts as a crash - but other than that, I'm at a total loss.

    So every program that completes execution has crashed????

  • (disco) in reply to TheCPUWizard

    Think the point is that it ended prematurely, thus not really completing its job. It quit gracefully, maybe, but still failed. Might as well crash, then.

  • (disco)

    The article is certainly right about obsolescence. I think I last used a linked list in around 2000, and that was proof of concept for a batch process in C that had to load in and clean up csv files, and was rapidly replaced by a database approach (thank you, Monty Widenius, for making it possible for free.) I think I last tore my hair out and replaced someone's linked list implementation (in Java) in about 2006 - children, do not use linked lists where there is a much better library solution. It is not big and it is not clever.

  • (disco) in reply to foxyshadis

    foxyshadis:
    Debugging makes you insane

    Debugging lets you express the insanity you already have.

  • (disco) in reply to Maciejasjmj
    Maciejasjmj:
    Agreed. But if it returns an error code, why return 0 next line?

    Because you never know when die refuses to die... due to cosmic rays perhaps.

  • (disco) in reply to RFoxmich
    RFoxmich:
    Because you never know when die refuses to die... due to cosmic rays perhaps

    Or because the compiler does not know that DIE will never return, and the explicit (but never executed) return statement is required to get proper analytics (such as flow path analysis)

  • (disco) in reply to cvi
    cvi:
    :doingitwrong.iso:

    Hm... The naive solution is definitely O(n²). I think you can bring it to O(n), but with O(n) space requirement (mark-and-remove)?

    TheCPUWizard:
    What if it does NOT RETURN

    Well, then it doesn't, and returning 0 is redundant unless your compiler complains because it can't see exit() as an exit point.

    TheCPUWizard:
    So every program that completes execution has crashed????

    It halts (I assume) execution in the middle of processing and drops you off to the system. Whether it's via throw new UnhandledException(), abort(), return or a segfault hardly matters - end result is the same.

  • (disco) in reply to Maciejasjmj
    Maciejasjmj:
    Hm... The naive solution is definitely O(n²). I think you can bring it to O(n), but with O(n) space requirement (mark-and-remove)?

    See the example implementation of std::remove_if.

    It works something like this:

    1. invalidIdx = iterate forward to find the first to-be-removed element
    2. iterate forward from invalidIdx+1: for a valid element, copy it to invalidIdx and increment invalidIdx; for an invalid element, do nothing.
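
    In C++ terms that's the erase-remove idiom. A minimal sketch (is_negative stands in for whatever predicate marks an element for removal):

    #include <algorithm>
    #include <vector>

    // Stand-in predicate: marks an element as to-be-removed.
    bool is_negative(int x) { return x < 0; }

    void filter(std::vector<int>& v)
    {
        // remove_if performs the compaction described above in one
        // O(n) pass and returns the new logical end; erase trims the
        // leftover tail.
        v.erase(std::remove_if(v.begin(), v.end(), is_negative), v.end());
    }
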
  • (disco) in reply to cvi

    Hm, that makes sense too. Well, one more reason not to use LLs...

  • (disco) in reply to Maciejasjmj
    /* Pointer-to-pointer traversal: pitem always points at the link
       leading to the current node, so unlinking the head needs no
       special case. */
    Item * item;
    Item ** pitem;

    for( pitem = &list ; *pitem ; /**/ )
    {
        item = *pitem;
        if( some_condition( item ) )
        {
            *pitem = item->next;   /* bypass the node... */
            item_free(item);       /* ...and free it; pitem already
                                      points at the next link */
            continue;
        }
        else
        {
            pitem = &item->next;   /* advance to the next link */
        }
    }
    
  • (disco)

    Calling a data structure like a linked list obsolete is a bit extreme. It certainly has its place depending on what you're trying to accomplish, and almost every language either has it or makes it very easy to implement.

  • (disco) in reply to TheCPUWizard

    Well, in that case TRWTF is that the only return statement is after the die... well, then I suppose the die was cast.

  • (disco) in reply to Maciejasjmj

    With the array version you still end up doing O(n) copies/moves, so if the copy/move is expensive that might be an issue.

    Still, IMO you need a special reason for picking an LL over some other data structure (e.g., array).

  • (disco) in reply to PleegWat

    Array version:

    int readidx, writeidx;
    
    for( readidx = 0, writeidx = 0; readidx < len; readidx++ )
    {
        if( some_condition( array[readidx] ) )
        {
            item_free(array[readidx]);
        }
        else
        {
            array[writeidx++] = array[readidx];  /* compact survivors forward */
        }
    }
    
    len = writeidx;
    /* optional realloc */
    
  • (disco) in reply to cvi
    cvi:
    you need a special reason for picking an LL over some other data structure
    Needing a high speed insert in the middle of the list is the best reason there. You can get a lot better performance (at the cost of space, of course) by building a subsidiary indexing structure that points to the entries in the linked list. And for goodness sake, make it a deque! The space-cost is easily compensated by the simpler algorithms.
  • (disco) in reply to dkf
    dkf:
    Needing a high speed insert in the middle of the list is the best reason there.

    Kinda high speed - you still need to get to the previous node to fix it, so it's only good when you're already enumerating the list. And it's only relevant if you need to keep the order, because hashtables.

    dkf:
    deque

    Obviously. It's the age of 8GB of RAM - while I can see some (if still rather exotic) uses for deques, SLLs can die in a fire.

  • (disco) in reply to dkf
    dkf:
    Needing a high speed insert in the middle of the list is the best reason there.

    Good point.

    I always end up with something like per-pixel linked lists (of which you can build many in parallel & lock-free while allocating elements from a shared pool) as an example for a valid use of linked lists.

  • (disco)

    Let's see, implementing deletion code in a linked list, based on a condition.

    template <class T>
    class LLNode
    {
        public:
        T value;
        LLNode<T> *next;
    };
    
    template <class T>
    void nukem(LLNode<T> *&pFirst, bool (*some_condition)(LLNode<T> *))
    {
        LLNode<T> *pLast = NULL, *pCurrent;
    
        for (pCurrent = pFirst; pCurrent != NULL; /* increment inside the loop */)
        {
            if (some_condition(pCurrent))
            {
                if (pLast == NULL)  // still at the head of the list
                {
                    pFirst = pCurrent->next;
                    delete pCurrent;
                    pCurrent = pFirst;
                }
                else
                {
                    pLast->next = pCurrent->next;
                    delete pCurrent;
                    pCurrent = pLast->next;
                }
            }
            else
            {
                pLast = pCurrent;
                pCurrent = pCurrent->next;
            }
        } // for
    } // nukem
    

    Sorry about the lack of comments, but that should do the deletion in O(n), depending on the complexity of some_condition(). In an array:

    template <class T>
    void nukem(T *pArray, int &size, bool (*some_condition)(T *))
    {
        int iCurrent, iCopy;
    
        for (iCurrent = 0; iCurrent < size; /* increment happens inside the loop */)
        {
            if (some_condition(pArray + iCurrent))
            {
                --size;
                // Copy the following elements over the current one.
                for (iCopy = iCurrent; iCopy < size; ++iCopy)
                    pArray[iCopy] = pArray[iCopy+1];
            }
            else
               ++iCurrent;
        }  // for
    } // nukem
    

    Again with the lack of comments! This is just off the top of my head and not well tested, but I have to get off to work. This would be around O(n²), by copying all the following elements each time, again depending on the complexity of the condition and the assignment operation. Hmm, could I make it O(n log(n))?

  • (disco) in reply to Nutster
    Nutster:
    Hmm, could I make it O(n log(n))?

    Yes.

    And then you could make it O(n), and not be doing it wrong. :wtf:

  • (disco) in reply to TheCPUWizard
    TheCPUWizard:
    What if it does NOT RETURN -- for example throws an exception or even has a simple "exit(4);"

    For that matter what if the list is malformed and has a loop in it muhuhuhahahah.

  • (disco) in reply to Maciejasjmj
    Maciejasjmj:
    Kinda high speed - you still need to get to the previous node to fix it, so it's only good when you're already enumerating the list. And it's only relevant if you need to keep the order, because hashtables.

    Actually, hash tables tend to use a linked list hanging off each of the hash buckets. Because the length of the list tends to be limited (depending on when you do a hash rebuild, which is a Black Magic tuning parameter) you don't pay much cost for the list iteration.

  • (disco) in reply to dkf
    dkf:
    Actually, hash tables tend to use a linked list hanging off each of the hash buckets.

    Sure, but if your hash function isn't complete shit, those will rarely if ever contain more than one element.

  • (disco)

    Constant time is O(1). Linear time is O(n).

  • (disco) in reply to Maciejasjmj
    Maciejasjmj:
    Sure, but if your hash function isn't complete shit, those will rarely if ever contain more than one element.

    You're wrong. It's usual for a hash table to have multiple elements per bucket precisely because that means that the actual hash table size is smaller for almost no overhead. Given random elements (i.e., even distribution of hash values) you get a random distribution of bucket loadings, with some buckets having several elements and others none. Because randomness. Preventing that requires a very carefully tuned hash function, and it's just not worth the effort to write one of those! (You'd have to retune every time you used different input data. Belgium that!)

    The trick is to rebuild the hash table once the number of elements reaches a critical load factor. That redistributes the elements across a larger hash table with a (slightly) different hash function. (The “difference” in hash function is usually just that a different mod factor is used to bound the hash value to the hash table size. Trivial.) That and getting a good (pre-mod) hash function seems to be genuinely difficult.
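
    A toy sketch of that rebuild-on-load-factor trick, assuming a chained table of ints (the threshold and growth factor here are made up):

    #include <cstddef>
    #include <list>
    #include <vector>

    struct IntHashSet
    {
        std::vector<std::list<int>> buckets;
        std::size_t count = 0;

        IntHashSet() : buckets(8) {}

        void insert(int value)
        {
            // Rebuild once the load factor crosses a (made-up) threshold.
            if (count + 1 > buckets.size() * 3 / 4)
                rehash(buckets.size() * 2);
            buckets[index(value, buckets.size())].push_back(value);
            ++count;
        }

        void rehash(std::size_t newSize)
        {
            // Same hash, different mod factor: redistribute everything.
            std::vector<std::list<int>> bigger(newSize);
            for (auto& bucket : buckets)
                for (int v : bucket)
                    bigger[index(v, newSize)].push_back(v);
            buckets.swap(bigger);
        }

        static std::size_t index(int v, std::size_t n)
        {
            return static_cast<std::size_t>(v) % n;  // the "mod factor"
        }
    };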

  • (disco)

    What's kind of embarrassing: the author claims that the algorithm is O(n³), which is plain wrong. It's actually still O(n²)…

  • (disco) in reply to AbDQyzLRKr

    Well, technically, any function that is O(n²) is also O(n³). However, it is pretty clear that the author means Θ(n³), which is wrong. The worst-case running time is Θ(n²).

    (When you have done a "goto REP;" after deleting an element, all elements before the just-deleted one will have some_condition evaluate to false. Hence only linear time is spent passing over them before we get to the next element that is a candidate for deletion.)

  • (disco) in reply to AbDQyzLRKr

    Ah, but you guys proved that filtering an array is O(n), so my original assertion that this code makes linked lists less efficient than arrays at filtering still holds!


    Filed under: two wrongs make a right

  • (disco) in reply to foxyshadis

    I know I've been guilty of putting checks that should never be able to fail in places where, in the event of cosmic rays, an infinite loop or data corruption might happen. I usually comment such checks as being overly-pessimistic and unnecessary, though. I just always remember two quotes I read years ago:

    "Computers make very fast, very accurate mistakes."

    and

    "Beware of errors in the above code. I have only proven it correct, I have not attempted to compile."

    It's not like there's a significant performance penalty on modern CPUs for doing a NULL comparison, or having a "default" in a switch statement that will never (barring cosmic rays) happen.

  • (disco) in reply to tenshino

    It's not like it protects you from anything, since any compiler worth its salt would remove the dead code. So even if cosmic rays happen, you probably still won't be saved.

  • (disco) in reply to AbDQyzLRKr

    One more consideration: not knowing the scenario this code is used in, it may well be that the number of removed elements is small (or even constant) with respect to n. In that case, this method is Θ(n), too.

    Also, in terms of cache locality, a linked list is pretty much equivalent to having an array of object references/pointers. So in a language like Java, or with objects that are either non-copyable or expensive to copy, arrays wouldn't solve this problem.

  • (disco) in reply to Maciejasjmj

    Fair point, but it makes me feel better ;)

  • (disco)

    As some people already pointed out, arrays and linked lists have the same O-notation cost for insertion or removal. What the article misunderstands is that O-notation is not the whole picture. Arrays and linked lists have different trade-offs that are completely independent of the O-notation cost.

    For example, deletion from an array requires "shifting" every subsequent element to keep the array compact. Aside from the fairly cheap cost of doing this (especially compared to the constant cache misses caused by following links in a list), it means that pointers or iterators to every element after the removed one are invalidated. A linked list avoids this issue, since the nodes remain where they were.
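
    A minimal sketch of that invalidation (the values are arbitrary):

    #include <cstdio>
    #include <vector>

    int main()
    {
        std::vector<int> v = {10, 20, 30, 40};
        int* p = &v[2];           // points at 30
        v.erase(v.begin());       // shifts every later element left
        std::printf("%d\n", *p);  // same address, new occupant: prints 40
        return 0;
    }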

    Performance-wise, there have been tests showing that on current computers, even in the cases where the linked list should excel, an array performs better, because of the cost of following pointers around versus the locality advantage of the array. The only reason to use a linked list is if your design relies on its behavior (not invalidating pointers, for example). And even then, you can probably make a smarter structure (an array of pointers, for example).

  • (disco) in reply to Kian
    Kian:
    arrays and linked lists have the same O-notation cost for insertion or removal

    Pretty sure it's not the case. You need to include squashing the array in the cost since if you don't squash, you no longer have an array, but I don't think you need to include lookup.

    Kian:
    What the article misunderstands is that O-notation is not the whole picture

    Pretty sure I mentioned that.

  • (disco) in reply to Maciejasjmj
    Maciejasjmj:
    Pretty sure it's not the case. You need to include squashing the array in the cost since if you don't squash, you no longer have an array, but I don't think you need to include lookup.

    Sorry, should have clarified: For the case of filtering, where you are already traversing the whole thing and already have a O(n) procedure. The single act of eliminating one element you already have a pointer to is O(1) in a doubly linked list and O(n) in an array because of the squashing. In a singly linked list it is O(n) because you have to find the previous element.
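
    For instance, with std::list (doubly linked), assuming you kept the iterator from the original insert:

    #include <list>

    void demo()
    {
        std::list<int> items = {1, 2, 3};
        // Keep the iterator handed back at insertion; list iterators
        // stay valid until their element is erased.
        std::list<int>::iterator it = items.insert(items.end(), 4);
        items.erase(it);  // O(1): just relinks the neighbours
    }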

    Maciejasjmj:
    Pretty sure I mentioned that.

    Did you? I couldn't find it.

  • (disco) in reply to Kian
    Kian:
    The single act of eliminating one element you already have a pointer to

    The act of obtaining a pointer to an element is O(1) for an array, and O(n) for a linked list. Meaning the total "remove" operation, from data structure to data structure minus one element, is O(n) in both.

  • (disco) in reply to Yamikuronue

    Not necessarily. You could have the pointer from when you created the element, without ever having looked for it, and then remove it. You would never have traversed the list, nor cared about its length.

  • (disco) in reply to Kian

    Sure, but then you're intentionally handing the linked list an advantage by excluding what it's bad at in order to prove it's faster.

  • (disco) in reply to Yamikuronue

    My point was that arrays are still faster for almost everything, and that O-notation doesn't tell you everything you need to know.

    The O-notation cost for removing a single element is lower for a doubly linked list, that is a fact. But that's a dumb point to make because the whole point of storing things in containers is to traverse the container at some point, and linked lists are awful at traversal. I'm not defending linked lists, but you need to understand them in order to hate them properly.

  • (disco) in reply to Maciejasjmj

    [quote="Maciejasjmj, post:17, topic:49925] It halts (I assume) execution in the middle of processing and drops you off to the system. Whether it's via throw new UnhandledException(), abort(), return or a segfault hardly matters - end result is the same. [/quote]

    Exiting, even prematurely, with a predictable error code is definitely not the same as "crashing" with a random segfault. If it was, nobody would ever bother to check exit codes in batch files.

    This DIE() directive may do exactly what's expected of some provisional form of exception handling. It takes an exceptional situation and acts on it in a more or less adequate, but controlled, way.

  • (disco)

    The runtime is not really O(n³). It is only (O(n)+O(k*n)), where k is the number of nodes which are removed.

    If the number of nodes to remove is constant, or has a known upper bound, this method is O(n), but with some horrible constants. If the number of elements removed is proportional to n, then the method is O(n²).

  • (disco) in reply to Kian
    Kian:
    Sorry, should have clarified: For the case of filtering, where you are already traversing the whole thing and already have a O(n) procedure. The single act of eliminating one element you already have a pointer to is O(1) in a doubly linked list and O(n) in an array because of the squashing. In a singly linked list it is O(n) because you have to find the previous element.

    Yeah, that adds up (though it's O(n) compares as opposed to O(n) swaps, which may be an issue).

    Any reasonable removal code would probably require being handed a pointer to a previous element, and it's only the case for SLLs (which kinda clashes with the "reasonable" part), but yeah.

    Kian:
    Did you? I couldn't find it.

    Back when processors were less smart and locality wasn't an issue, they were a reasonable trade-off—but as technology marched on, new data structures were invented, and processor caches grew, the drawbacks started to overshadow the benefits.

    Kinda-sorta.

    Yamikuronue:
    The act of obtaining a pointer to an element is O(1) for an array, and O(n) for a linked list. Meaning the total "remove" operation, from data structure to data structure minus one element, is O(n) in both.

    If you can assume you know the index of the element you want to remove, why can't you assume you know the pointer to the element? In both cases, you have a piece of information related to the data structure itself, and not the data being stored.

    emkael:
    Exiting, even prematurely, with a predictable error code is definitely not the same as "crashing" with a random segfault. If it was, nobody would ever bother to check exit codes in batch files.

    As far as we know, the exit code is zero, so. Also from the batch file's point of view, a segfault also returns a predictable error code (139, AFAIR).

    Martin_Tilsted1:
    only (O(n)+O(k*n))

    Fine, fine, my mad complexity analysis skills got a bit rusty since I lost the "CS student" status from the author blurb.

  • (disco) in reply to Maciejasjmj
    Maciejasjmj:
    since if you don't squash, you no longer have an array

    Surely by now someone's invented a UDT backed by an array, but also has an array of valid bits so you can skip the squashing. (Or use a thread to do it a little bit at a time.)

  • (disco) in reply to FrostCat
    FrostCat:
    Surely by now someone's invented a UDT backed by an array, but also has an array of valid bits so you can skip the squashing. (Or use a thread to do it a little bit at a time.)

    Kind of a sparse-ish array. It looks like it would make things somewhat awkward though - straightforward enumeration time would depend on the number of elements that have ever been in the array, there's a disparity between "n-th element (using this index I have here)" and "n-th element (counting from the first)", and inserting an element still requires reallocation.
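
    A toy sketch of such a structure, with the lazy squash (all names made up):

    #include <cstddef>
    #include <vector>

    template <typename T>
    class TombstoneArray
    {
    public:
        void push_back(const T& v)
        {
            items.push_back(v);
            alive.push_back(true);
        }

        void remove(std::size_t i) { alive[i] = false; }  // O(1), no shifting

        // Enumeration visits every slot ever used, alive or not --
        // the caveat above.
        template <typename F>
        void for_each(F f)
        {
            for (std::size_t i = 0; i < items.size(); ++i)
                if (alive[i]) f(items[i]);
        }

        // Compact lazily, e.g. once the dead slots pile up.
        void squash()
        {
            std::size_t w = 0;
            for (std::size_t r = 0; r < items.size(); ++r)
                if (alive[r]) items[w++] = items[r];
            items.resize(w);
            alive.assign(w, true);
        }

    private:
        std::vector<T> items;
        std::vector<bool> alive;
    };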
