Admin
The way you phrased that derailed my brain for a little while. I initially read "removing a single element" as meaning the entire process, including finding the element to remove (which of course is not constant-time), rather than just the actual removal. :facepalm:
:headdesk:
:headdesk: :headdesk:
:headdesk: :headdesk: :headdesk: :gun:
Admin
You always have to wonder what kind of bug drove someone to make checks that can't fail. I wonder if I've left that kind of code in from old debugging sessions at some point. Debugging makes you insane.
Admin
a) "If not, the program crashes and returns a success code to the OS, like any good crashing program should"..... Actually it is not clear what DIE(...) does, it may actually terminate the program and return a failed code to the OS [such that the return 0 is never executed].
b) I would not consider the above a "crashing program"
c) I may have misread things, since I am just now drinking my first cup of coffee.
Admin
The code snippet could be from `main` for all we know ...
Admin
Agreed. But if it returns an error code, why return 0 next line? If it doesn't, what does it do? Is the zero returned from main, or from a procedure?
I'd say if the program exits, it counts as a crash - but other than that, I'm at a total loss.
Admin
:doingitwrong.iso:
Admin
What if it does NOT RETURN -- for example, throws an exception or even has a simple "exit(4);"?
Admin
So every program that completes execution has crashed????
Admin
I think the point is that it ended prematurely, thus not really completing its job. It quit gracefully, maybe, but it still failed. Might as well crash, then.
Admin
The article is certainly right about obsolescence. I think I last used a linked list in around 2000, and that was proof of concept for a batch process in C that had to load in and clean up csv files, and was rapidly replaced by a database approach (thank you, Monty Widenius, for making it possible for free.) I think I last tore my hair out and replaced someone's linked list implementation (in Java) in about 2006 - children, do not use linked lists where there is a much better library solution. It is not big and it is not clever.
Admin
Debugging lets you express the insanity you already have.
Admin
Because you never know when die refuses to die... due to cosmic rays perhaps.
Admin
Or because the compiler does not know that DIE will never return, and the explicit (but never executed) return statement is required to satisfy its analysis (such as flow-path analysis).
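For what it's worth, a minimal sketch of that situation - the DIE macro and the surrounding function here are invented stand-ins, not the article's actual code:

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for the article's DIE() macro -- an assumption, not
// the original code. With [[noreturn]] the compiler knows control never comes
// back; without it, a function whose last statement is DIE(...) still needs
// an explicit return to silence "control reaches end of non-void function".
[[noreturn]] void die_with_message(const char* msg) {
    std::fprintf(stderr, "%s\n", msg);
    std::exit(EXIT_FAILURE);
}
#define DIE(msg) die_with_message(msg)

// Invented example caller.
int find_index(const int* table, int size, int wanted) {
    for (int i = 0; i < size; ++i) {
        if (table[i] == wanted) return i;
    }
    DIE("element not found");
    return 0;  // never executed; only there to placate compilers that
               // cannot prove DIE() does not return
}
```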
Admin
Hm... The naive solution is definitely O(n²). I think you can bring it to O(n), but with O(n) space requirement (mark-and-remove)?
Well, then it doesn't and returning 0 is redundant unless your compiler complains because it can't see exit() as an exit point.
It halts (I assume) execution in the middle of processing and drops you off to the system. Whether it's via `throw new UnhandledException()`, `abort()`, `return`, or a segfault hardly matters - end result is the same.
Admin
See the example implementation of `std::remove_if`. It works something like this: `invalidIdx` = iterate forward to find the first to-be-removed element. From `invalidIdx+1` on: for a valid element, copy it to `invalidIdx` and increment `invalidIdx`. For an invalid element, do nothing.
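Roughly, in code (a sketch of the idea only, not the actual standard-library implementation):

```cpp
#include <cstddef>

// Sketch of the std::remove_if idea on a plain array: walk once, copying
// every element that should be kept down to `invalidIdx`, then report the
// new logical size. O(n) time, O(1) extra space.
template <typename T, typename Pred>
std::size_t remove_if_compact(T* data, std::size_t n, Pred should_remove) {
    std::size_t invalidIdx = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (!should_remove(data[i])) {
            if (i != invalidIdx) data[invalidIdx] = data[i];
            ++invalidIdx;
        }
    }
    return invalidIdx;  // elements [invalidIdx, n) are now garbage
}
```

Followed by a single truncation of the container, that is the usual erase-remove pattern.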
Admin
Hm, that makes sense too. Well, one more reason not to use LLs...
Admin
Admin
Calling a data structure like a linked list obsolete is a bit extreme. It certainly has its place, depending on what you're trying to accomplish, and almost every language either has it or makes it very easy to implement.
Admin
Well in that case TRWTF is that the only return statement is after the die...well then I suppose the die was cast.
Admin
With the array version you still end up doing O(n) copies/moves, so if the copy/move is expensive that might be an issue.
Still, IMO you need a special reason for picking an LL over some other data structure (e.g., array).
Admin
Array version:
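(The code block from this post appears to have been lost; here is a guess at the shape of an array version, using the erase-remove idiom - the predicate and names are assumptions, not the original code.)

```cpp
#include <algorithm>
#include <vector>

// Placeholder predicate -- an assumption, the original post's condition is unknown.
bool some_condition(int value) { return value % 2 == 0; }

// One O(n) compaction pass plus a single tail erase.
void filter(std::vector<int>& values) {
    values.erase(std::remove_if(values.begin(), values.end(), some_condition),
                 values.end());
}
```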
Admin
Admin
Kinda high speed - you still need to get to the previous node to fix it, so it's only good when you're already enumerating the list. And it's only relevant if you need to keep the order, because hashtables.
Obviously. It's the age of 8GB of RAM - while I can see some (if still rather exotic) uses for deques, SLLs can die in a fire.
Admin
Good point.
I always end up with something like per-pixel linked lists (of which you can build many in parallel & lock-free while allocating elements from a shared pool) as an example for a valid use of linked lists.
Admin
Let's see, implementing deletion code in linked list, based on a condition.
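(The original code block didn't survive; below is a sketch of what a condition-based singly-linked-list deletion typically looks like - the node type and predicate are assumptions, not the poster's actual code.)

```cpp
// Placeholder predicate -- the original post's some_condition() is unknown.
bool some_condition(int value) { return value < 0; }

struct Node {
    int   value;
    Node* next;
};

// Single O(n) pass over a singly linked list: `link` always points at the
// pointer that would have to be rewritten if the current node is removed.
Node* remove_matching(Node* head) {
    Node** link = &head;
    while (*link != nullptr) {
        if (some_condition((*link)->value)) {
            Node* doomed = *link;
            *link = doomed->next;   // unlink, then revisit the same slot
            delete doomed;
        } else {
            link = &(*link)->next;  // keep the node, advance the link
        }
    }
    return head;
}
```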
Sorry about the lack of comments, but that should do the deletion in O(n), depending on the complexity of some_condition(). In an array:
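(Again, the original code is missing; this is a guess at the naive array version being described, where every deletion shifts the whole tail - hence the roughly O(n²) behavior.)

```cpp
#include <cstddef>

// Same placeholder predicate as above.
bool some_condition(int value) { return value < 0; }

// Naive in-place deletion from an array: each removal shifts the tail left
// by one, so if every element matches this does O(n^2) assignments.
std::size_t remove_matching_naive(int* data, std::size_t n) {
    std::size_t i = 0;
    while (i < n) {
        if (some_condition(data[i])) {
            for (std::size_t j = i + 1; j < n; ++j)
                data[j - 1] = data[j];
            --n;            // logical length shrinks; re-check slot i
        } else {
            ++i;
        }
    }
    return n;               // new logical length
}
```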
Again with the lack of comments! This just off the top of my head and not well tested, but I have to get off to work. This would be around O(n^2), by copying all the following elements each time, again depending on the complexity of the condition and the assignment operation. Hmm, could I make it O(n log(n))?
Admin
Yes.
And then you could make it O(n), and not be doing it wrong. :wtf:
Admin
For that matter what if the list is malformed and has a loop in it muhuhuhahahah.
Admin
Actually, hash tables tend to use a linked list hanging off each of the hash buckets. Because the length of the list tends to be limited (depending on when you do a hash rebuild, which is a Black Magic tuning parameter) you don't pay much cost for the list iteration.
Admin
Sure, but if your hash function isn't complete shit, those will rarely if ever contain more than one element.
Admin
Constant time is O(1). Linear time is O(n).
Admin
You're wrong. It's usual for a hash table to have multiple elements per bucket precisely because that means that the actual hash table size is smaller for almost no overhead. Given random elements (i.e., even distribution of hash values) you get a random distribution of bucket loadings, with some buckets having several elements and others none. Because randomness. Preventing that requires a very carefully tuned hash function, and it's just not worth the effort to write one of those! (You'd have to retune every time you used different input data. Belgium that!)
The trick is to rebuild the hash table once the number of elements reaches a critical load factor. That redistributes the elements across a larger hash table with a (slightly) different hash function. (The “difference” in hash function is usually just that a different mod factor is used to bound the hash value to the hash table size. Trivial.) That and getting a good (pre-mod) hash function seems to be genuinely difficult.
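For illustration, a toy sketch of that scheme - chained buckets plus a rebuild once a load factor is crossed; all names here are invented, this is not any real library's implementation:

```cpp
#include <cstddef>
#include <forward_list>
#include <functional>
#include <string>
#include <vector>

// Toy chained hash set: each bucket is a short singly linked list, and the
// table is rebuilt once the average load per bucket crosses a threshold.
// The "different" hash function after a rebuild is just the same pre-mod
// hash taken modulo a larger bucket count.
class RehashingSet {
public:
    void insert(const std::string& key) {
        if (contains(key)) return;
        if (size_ + 1 > buckets_.size() * kMaxLoadFactor)
            rehash(buckets_.size() * 2);
        buckets_[index_of(key, buckets_.size())].push_front(key);
        ++size_;
    }

    bool contains(const std::string& key) const {
        for (const auto& k : buckets_[index_of(key, buckets_.size())])
            if (k == key) return true;
        return false;
    }

private:
    static constexpr double kMaxLoadFactor = 1.0;

    static std::size_t index_of(const std::string& key, std::size_t buckets) {
        return std::hash<std::string>{}(key) % buckets;
    }

    void rehash(std::size_t new_bucket_count) {
        std::vector<std::forward_list<std::string>> fresh(new_bucket_count);
        for (auto& bucket : buckets_)
            for (auto& key : bucket)
                fresh[index_of(key, new_bucket_count)].push_front(std::move(key));
        buckets_.swap(fresh);
    }

    std::size_t size_ = 0;
    std::vector<std::forward_list<std::string>> buckets_ =
        std::vector<std::forward_list<std::string>>(16);
};
```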
Admin
What's kind of embarrassing: the author claims that the algorithm is O(n³), which is plain wrong. It's actually still O(n²)…
Admin
Well, technically, any function that is O(n²) is also O(n³). However, it is pretty clear that the author means Θ(n³), which is wrong. The worst-case running time is Θ(n²).
(When you have done a "goto REP;" after deleting an element, all elements before the just-deleted one will have some_condition evaluate to false. Hence only linear time is spent passing over them before we get to the next element that is a candidate for deletion.)
Admin
Ah, but you guys proved that filtering an array is O(n), so my original assertion that this code makes linked lists less efficient than arrays at filtering still holds!
Filed under: two wrongs make a right
Admin
I know I've been guilty of putting checks that should never be able to fail in places where, in the event of cosmic rays, an infinite loop or data corruption might happen. I usually comment such checks as being overly-pessimistic and unnecessary, though. I just always remember two quotes I read years ago:
"Computers make very fast, very accurate mistakes."
and
"Beware of errors in the above code. I have only proven it correct, I have not attempted to compile."
It's not like there's a significant performance penalty on modern CPUs for doing a NULL comparison, or having a "default" in a switch statement that will never (barring cosmic rays) happen.
Admin
It's not like it protects you from anything, since any compiler worth its salt would remove the dead code. So even if cosmic rays happen, you probably still won't be saved.
Admin
One more consideration: not knowing the scenario this code is used in, it may well be that the number of removed elements is small (or even constant) with respect to n. In that case, this method is Θ(n), too.
Also, in terms of cache locality, a linked list is pretty much equivalent to having an array of object references/pointers. So in a language like Java, or with objects that are either non-copyable or expensive to copy, arrays wouldn't solve this problem.
Admin
Fair point, but it makes me feel better ;)
Admin
As some people already pointed out, arrays and linked lists have the same O-notation cost for insertion or removal. What the article misunderstands is that O-notation is not the whole picture. Arrays and linked lists have different trade-offs that are completely independent of the O-notation cost.
For example, deletion from an array requires "shifting" every subsequent element to keep the array compact. Aside from the fairly cheap cost of doing this (especially compared to constant cache misses caused by following links in a list), this means that pointers or iterators to every element behind the removed ones are invalidated. A linked list avoids this issue since the nodes remain where they were.
Performance-wise, there have been tests showing that on current computers, even in the cases where the linked list should excel, an array performs better because of the cost of following pointers around versus the locality advantage of the array. The only reason to use a linked list is if your design relies on its behavior (not invalidating pointers, for example). And even then, you can probably make a smarter structure (an array of pointers, for example).
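To make the invalidation point concrete, a contrived sketch (std::vector and std::list standing in for "array" and "linked list"; none of this is from the article):

```cpp
#include <iterator>
#include <list>
#include <vector>

// Erasing from a vector shifts the tail and invalidates pointers to later
// elements, while erasing from a list leaves every surviving node -- and
// iterators to it -- untouched.
int main() {
    std::vector<int> vec{1, 2, 3, 4};
    std::list<int>   lst{1, 2, 3, 4};

    int* vec_ptr = &vec[3];                 // handle to the vector's 4
    auto lst_it  = std::prev(lst.end());    // handle to the list's 4

    vec.erase(vec.begin() + 1);             // vec_ptr is now invalid
    lst.erase(std::next(lst.begin()));      // lst_it still points at 4

    (void)vec_ptr;                          // dereferencing it would be UB
    return (*lst_it == 4) ? 0 : 1;
}
```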
Admin
Pretty sure it's not the case. You need to include squashing the array in the cost since if you don't squash, you no longer have an array, but I don't think you need to include lookup.
Pretty sure I mentioned that.
Admin
Sorry, should have clarified: For the case of filtering, where you are already traversing the whole thing and already have a O(n) procedure. The single act of eliminating one element you already have a pointer to is O(1) in a doubly linked list and O(n) in an array because of the squashing. In a singly linked list it is O(n) because you have to find the previous element.
Did you? I couldn't find it.
Admin
The act of obtaining a pointer to an element is O(1) for an array, and O(n) for a linked list. Meaning the total "remove" operation, from data structure to data structure minus one element, is O(n) in both.
Admin
Not necessarily. You could have the pointer from when you created the element, without ever having looked for it, and then removed it. You would never have traversed the list, nor cared about its length.
Admin
Sure, but then you're intentionally handing the linked list an advantage by excluding what it's bad at in order to prove it's faster.
Admin
My point was that arrays are still faster for almost everything, and that O-notation doesn't tell you everything you need to know.
The O-notation cost for removing a single element is lower for a doubly linked list, that is a fact. But that's a dumb point to make because the whole point of storing things in containers is to traverse the container at some point, and linked lists are awful at traversal. I'm not defending linked lists, but you need to understand them in order to hate them properly.
Admin
[quote="Maciejasjmj, post:17, topic:49925] It halts (I assume) execution in the middle of processing and drops you off to the system. Whether it's via
throw new UnhandledException()
,abort()
,return
or a segfault hardly matters - end result is the same. [/quote]Exiting, even prematurely, with a predictable error code is definitely not the same as "crashing" with a random segfault. If it was, nobody would ever bother to check exit codes in batch files.
This DIE() directive may do exactly what's expected of some provisional form of exception handling. It takes an exceptional situation and acts on it in a more or less adequate, but controlled, way.
Admin
The runtime is not really O(n^3). It is only O(n) + O(k*n), where k is the number of nodes which are removed.
If the number of nodes to remove is constant, or has a known upper bound, this method is O(n), but with some horrible constants. If the number of elements removed is proportional to n, then the method is O(n^2).
Admin
Yeah, that adds up (though it's O(n) compares as opposed to O(n) swaps, which may be an issue).
Any reasonable removal code would probably require being handed a pointer to a previous element, and it's only the case for SLLs (which kinda clashes with the "reasonable" part), but yeah.
Kinda-sorta.
If you can assume you know the index of the element you want to remove, why can't you assume you know the pointer to the element? In both cases, you have a piece of information related to the data structure itself, and not the data being stored.
As far as we know, the exit code is zero, so. Also from the batch file's point of view, a segfault also returns a predictable error code (139, AFAIR).
Fine, fine, my mad complexity analysis skills got a bit rusty since I lost the "CS student" status from the author blurb.
Admin
Surely by now someone's invented a UDT backed by an array, but that also has an array of valid bits so you can skip the squashing. (Or use a thread to do it a little bit at a time.)
Admin
Kind of a sparse-ish array. It looks like it would make things somewhat awkward, though - straightforward enumeration time would depend on the number of elements that have ever been in the array, there's a discrepancy between "n-th element (using this index I have here)" and "n-th element (counting from the first)", and inserting an element still requires reallocation.
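Something along those lines is easy enough to sketch - an array plus a parallel validity mask, where removal is just a bit flip and iteration skips dead slots (a rough sketch, all names invented):

```cpp
#include <cstddef>
#include <vector>

// Sketch of an "array with tombstones": removal only clears a valid bit, so
// nothing is shifted, at the price of iteration having to skip holes and of
// indices meaning "slot" rather than "n-th live element".
template <typename T>
class TombstoneArray {
public:
    std::size_t push_back(T value) {
        data_.push_back(std::move(value));
        alive_.push_back(true);
        return data_.size() - 1;           // stable slot index
    }

    void remove(std::size_t slot) { alive_[slot] = false; }

    template <typename Fn>
    void for_each(Fn fn) const {           // O(slots ever used), not O(live elements)
        for (std::size_t i = 0; i < data_.size(); ++i)
            if (alive_[i]) fn(data_[i]);
    }

private:
    std::vector<T>    data_;
    std::vector<bool> alive_;
};
```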