Admin
I haven't seen a hash table implementation using a linked list for ages. Typically, you calculate the hash value, and then you calculate a sequence of possible locations in the table based on the hash value. No linked list involved. And if the table gets full, you create a bigger table.
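A minimal sketch of that open-addressing idea in C, assuming linear probing and a power-of-two table size (all names here are illustrative, not from any particular implementation):

    #include <stddef.h>
    #include <string.h>

    struct slot {
        const char *key;   /* NULL means the slot is empty */
        int value;
    };

    /* Probe a sequence of locations derived from the hash until we hit
     * the key or an empty slot. table_size must be a power of two. */
    static int probe_lookup(const struct slot *table, size_t table_size,
                            unsigned long hash, const char *key, int *out)
    {
        size_t mask = table_size - 1;
        size_t i;
        for (i = 0; i < table_size; i++) {
            const struct slot *s = &table[(hash + i) & mask];
            if (s->key == NULL)
                return 0;                  /* empty slot: key not present */
            if (strcmp(s->key, key) == 0) {
                *out = s->value;
                return 1;
            }
        }
        return 0;                          /* table full, key absent */
    }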
Admin
Java's built-in HashMap (unless it's changed in 1.8) uses linked lists to handle collisions.
Admin
Well I was thinking of something like, in C++ terms, redefining operator() for callers, so nothing but internal use would see the actual array. Implementing all of that--including the feature you mention--sounds like a fun WTFy project.
You could, in theory, handle insertions via overflow arrays. :) The background thread would gradually unify the backing array.
Admin
A data structure handled by a background thread. No way that could cause any problems...
Admin
Nobody seems to have picked it up the first time I mentioned it, so I figured it was worth doing it again.
I think someone who really knew what they were doing could implement it properly, although it's obviously a huge :wtf: and most people should run screaming from the idea. ;)
Somewhat more seriously (and I forgot to mention this in my last post): if you were able to make sure nothing in the array ever got aliased, so that you could "safely" move things around, then some kind of cleanup process designed to amortize its work across the calls that access the backing array could probably function. Many years ago I saw a game that used a 1D array to implement the map. The code built a randomized list containing a pointer to each element of the array, along with a static variable that kept track of where it was in that list. The update thread would periodically run partway through the list, updating cells. That made the map update look like it wasn't sweeping across the screen like the scan lines on a CRT, and it also amortized the cost of updating the map.
In modern hardware I'm sure you'd never bother doing that, but this was code written to run on OS/2 in the mid 90s.
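Roughly the shape of that trick, as a hedged sketch (the sizes, names, and per-cell work here are made up):

    #define MAP_CELLS      4096
    #define CELLS_PER_TICK   64

    static int map[MAP_CELLS];
    static int shuffled[MAP_CELLS];   /* indices 0..MAP_CELLS-1 in random order,
                                         filled once at startup */

    static void update_cell(int *cell)
    {
        *cell += 1;                   /* placeholder for the real per-cell work */
    }

    /* Called from the update loop: touches only a slice of the map each call,
     * walking the shuffled index list so updates don't sweep like scan lines. */
    static void update_some_cells(void)
    {
        static int cursor = 0;
        int i;
        for (i = 0; i < CELLS_PER_TICK; i++) {
            update_cell(&map[shuffled[cursor]]);
            cursor = (cursor + 1) % MAP_CELLS;
        }
    }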
Admin
That's a different strategy. It's more efficient with optimal hash functions but in reality it's slower. The selection and tuning of hash functions is really hard, much more so than most people think, and small differences can have quite an impact. The main downside is that it encourages the use of prime-sized hash tables, and those trigger slower CPU code paths in some cases (in the mod/remainder operation used to compute the actual hash table entry address).
The highest performance hash tables only use a single level of hashing (with a very fast hash function) and then store a list hanging off the hash bucket. If your hash table is large enough, the cost is the same whatever strategy you use, but with the use of lists hanging off the buckets you don't need to resize/rebuild the table (an expensive operation) nearly so often. With good choice of growth factors, the average length of list that needs to be traversed to get an element can be kept small (i.e., less than 2).
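A sketch of that bucket-list layout (illustrative names; the hash function and the growth/rebuild policy are left out):

    #include <stddef.h>
    #include <string.h>

    struct entry {
        struct entry *next;   /* short chain hanging off the bucket */
        const char *key;
        int value;
    };

    struct table {
        struct entry **buckets;
        size_t nbuckets;
    };

    /* One level of hashing, then walk the (ideally very short) chain. */
    static int chained_lookup(const struct table *t, unsigned long hash,
                              const char *key, int *out)
    {
        const struct entry *e;
        for (e = t->buckets[hash % t->nbuckets]; e != NULL; e = e->next) {
            if (strcmp(e->key, key) == 0) {
                *out = e->value;
                return 1;
            }
        }
        return 0;
    }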
FWIW, I maintain one of the world's fastest hash table implementations (because it's a super-hot-spot in a library I maintain). I've read quite a lot of the literature on hashing (of non-malicious data) and almost everything I've seen written about the area is bunk. What I can say is that a hash function tuned for malicious data is usually comparatively slow on non-malicious data, that getting good cache locality matters hugely, and that tuning hash tables is not just about twiddling with hash functions (the rebuild algorithm is sensitive too).
It changed in 1.8. It switches to hanging balanced trees off the hash table buckets once a size threshold is passed (no, I don't know the sorting criterion when using a non-comparable key, or if one is even possible). Quite a lot more complicated, and might give poorer code locality (would need to use the right sort of profiler to figure that out) but will give better worst-case behaviour if the change kicks in.
Admin
I'm pretty sure an array of pointers still has better cache behaviour than a linked list, because the array allows prefetching further in advance. In a linked list you can only fetch element N+1 once element N has been fetched.
Regarding deleting in the middle of a linked list, I've got a long double-ended queue used for cleaning up elements that are otherwise accessed from a hash table. Being able to cheaply delete elements from the list here is a huge benefit. The main problem in that code is not so much its usage of linked lists, but the fact that the linked list is iterated over in circumstances where it really shouldn't be.
Admin
Linked lists are not broken per se. In many cases, they are a perfectly good structure for the task. And there are languages (like pure C) in which you may not have any sane data structures in the standard library - when you need to roll your own, the linked lists' simplicity shines.
TRWTF for me is not even trying to wrap that linked list in any kind of sane and generic API, but coding all those ->nexts manually, and COPYING them all over the codebase. Add a total lack of ideas about basic algorithms, and we get a total mess.
Admin
Just as well it wasn't a doubly-linked list. Imagine the performance if they'd had to do all those traversals in both directions!
Filed under: You know they would have
Admin
It changed in 1.8.
EDIT: :hanzo:'d
Admin
Concurrent programming is very very hard.
EhCache uses a background thread to expire objects in the diskstore for the cache.
Admin
My emphasis added. The mistakes are precise (consistently repeatable), not accurate (giving the correct value, almost a contradiction in terms for a mistake).
Admin
No, it's just that when you do concurrent programming, the CPUs hate your guts and will find creative ways to spite you.
Admin
I commonly implement iteration over my linked lists in place; that's really too simple to bother with wrappers (and C doesn't have a foreach construct with generic iterators). Inserts/deletes/etc. on doubly linked lists I do tend to abstract into separate functions, as I'm always triple-checking whether I updated all the pointers and it'll still turn out I missed one...
Admin
That's what macros are for! Something vaguely like this (assuming you're not doing updates at the same time):
so you can write:
WARNING: untested. I use similar things in my code, but they're iterating over a more complex structure. The use is still pretty similar though…
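A minimal sketch of the kind of macro being described, assuming a singly linked struct with a next member (FOREACH, struct item, and the member names are illustrative):

    #include <stdio.h>

    struct item {
        struct item *next;
        int value;
    };

    /* C99: the iteration variable is declared in the for-init. */
    #define FOREACH(type, var, head) \
        for (type *var = (head); var != NULL; var = var->next)

    static void print_all(struct item *list)
    {
        FOREACH(struct item, it, list) {
            printf("%d\n", it->value);
        }
    }

Naming the iteration variable in the macro call keeps nesting possible, at the cost of a slightly noisier call site.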
Admin
True, you could do that, though in our case not with a generic foreach name as we tend to prefix our structure members.
Also I think your example case is missing a temporary variable. Not sure if I could introduce that in that location - we've been on C89 for way too long, and that doesn't allow variable declarations in the initialization section of the foreach.
Admin
Ah, in that case you'd take the prefix as an additional parameter and use macro magic to build the member names. :fried_shrimp:
Could well be so. I prefer to name the iteration variable explicitly so that I can nest the iterators easily (for those algorithms where an N-against-M matching is the right approach).
Admin
Or this https://troydhanson.github.io/uthash/utlist.html
Admin
I used to code C! You can remove all matching items from a linked list without needing a special case for the head pointer if you use doubly-indirect pointers:
So compact! So simple!
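A minimal sketch of the trick, using the item/pitem names that come up in the discussion below (the struct itself is illustrative):

    #include <stdlib.h>
    #include <string.h>

    struct Item {
        char payload[16];
        struct Item *next;
    };

    /* pitem always points at the link that holds the current item: first the
     * caller's head pointer, then each previous item's next pointer. */
    static void remove_all(struct Item **head, const char *key)
    {
        struct Item **pitem;
        struct Item *item;

        for (pitem = head; *pitem != NULL; ) {
            item = *pitem;
            if (strcmp(item->payload, key) == 0) {
                *pitem = item->next;  /* unlink: same code for head and middle */
                free(item);
            } else {
                pitem = &item->next;  /* advance to the next link */
            }
        }
    }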
Admin
:hanzo:
Yup, nice trick to know, hard trick for some people to understand, formerly mentioned by me and praised by @Steve_The_Cynic
This trick also works to insert things into a sorted list.
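A sketch of the same trick for sorted insertion, reusing the struct Item from the sketch above (ordering by payload is just for illustration):

    #include <string.h>

    static void insert_sorted(struct Item **head, struct Item *new_item)
    {
        struct Item **pitem;

        for (pitem = head; *pitem != NULL; pitem = &(*pitem)->next) {
            if (strcmp(new_item->payload, (*pitem)->payload) < 0)
                break;               /* found the first larger element */
        }
        new_item->next = *pitem;     /* splice in: works at head, middle, or tail */
        *pitem = new_item;
    }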
Admin
"used"... You used to program in C? Or did you write code in C?
If you call "programming" by the new crappy diminishing name "coding", you are part of the problem, and a traitor, and should line up over there by the wall. Don't worry about this thing over here with the bundle of tubes pointed at you. It won't hurt very much or for very long if it starts up.
Admin
I don't think you want to dereference the possible null in the for condition.
Admin
pitem never becomes NULL. It starts as the address of the head element, and it is only ever updated to the address of the next pointer of an existing element.
Admin
So you're saying that in this implementation of Item, whatever it is, the last element will always be a fully-initialized terminal object that has a default value of zero?
In every implementation of a list I've seen, the last item->next points to NULL and dereferencing that will do bad things to your program, so testing pitem instead of *pitem is used.
Admin
item is a single pointer. pitem is a double pointer. Neither item nor pitem is ever set to NULL.

On the last item, pitem is set to &item->next. item is not null, so &item->next is neither null nor some other invalid value, but item->next is null and so is *pitem, so the loop terminates. At termination of the loop, *pitem points to the trailing NULL pointer of the list.
Admin
I see you've never used Xcode before.
Admin
Actually, @PleegWat's implementation is not as big of a :wtf: as I thought. It has an unnecessary continue statement, but it works in principle.
The challenge here is dealing with the case where the first item(s) in the list must be deleted, and the case where items in the middle of the list must be deleted. You can code this with separate logic (as @Nutster did), but this code solves that problem using an "extra level of indirection" in the form of the double-pointer pitem. Once the "else" clause gets run, pitem really does hold a pointer to a previous next, and that next pointer can get changed in a later iteration.
I may have some crow for dinner, but this extra layer of indirection really makes the code difficult to read. I prefer the "dummy head node" technique, where the first node in your list is empty, but has a next pointer that goes to the first real item. This lets you operate the same way on every node, using regular pointers only.
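A hedged sketch of that dummy-head variant, again reusing the struct Item from the earlier sketch (names illustrative):

    #include <stdlib.h>
    #include <string.h>

    static void remove_all_dummy(struct Item **head, const char *key)
    {
        struct Item dummy;       /* empty first node; only its next is used */
        struct Item *prev;

        dummy.next = *head;
        for (prev = &dummy; prev->next != NULL; ) {
            if (strcmp(prev->next->payload, key) == 0) {
                struct Item *dead = prev->next;
                prev->next = dead->next;   /* same unlink code for every node */
                free(dead);
            } else {
                prev = prev->next;
            }
        }
        *head = dummy.next;
    }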
Admin
I prefer using languages where you don't have to deal with this low-level bullshit technique.
Admin
What's fun is when an optimiser manages to deduce functionally the same thing from a higher-level (BS-free) description. No need for manual trickiness if a compiler can make it for you.
Admin
At work, we offload all our performance issues on the MS SQL Server developers. Seems to be a winning strategy.
Admin
I find it hard to believe this isn't deliberate. I've seen cases of "I'll optimise that later if it needs to be optimised." but in this code it looks simpler to do it properly than to do it how it has been done. I think someone wanted to make sure they had maintenance work to do later.
Admin
The other solution I tend to go for is to write a foreach function that does the iterating, pass a function pointer as an argument, and call that function on each list element.
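Something like this sketch, say (the type, function, and context names are made up):

    #include <stdio.h>

    struct item {
        struct item *next;
        int value;
    };

    typedef void (*item_fn)(struct item *item, void *ctx);

    /* Walk the list once, calling fn on each element. */
    static void list_foreach(struct item *head, item_fn fn, void *ctx)
    {
        struct item *it;
        for (it = head; it != NULL; it = it->next)
            fn(it, ctx);
    }

    static void print_item(struct item *item, void *ctx)
    {
        (void)ctx;
        printf("%d\n", item->value);
    }

    /* usage: list_foreach(list, print_item, NULL); */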
Admin
That's OK in a language with lambdas, but C doesn't have them (unless you rely on GNU extensions, which I won't do). Without lambdas, that's seriously annoying. Also, you can't do break and continue and early return that way. Not without serious hackery (well, for early return, that is; the others can be done with a smarter function definition).
Admin
Shouldn't List be a template type so that you can make the next be of the right type, whatever the subclass is?
Admin
It was a quote. Don't remember by who, off the top of my head.
Admin
It's been called coding for a long time, back from the days when it involved machine language opcodes.
My first real job was in '86, on the ASCO computer-assisted coding program for the '86 (Australian) Census. We had 640k ram, a 10MHz 80386 processor, 10 megabyte hard drives and we were thankful! Had to fight tooth and nail to get colour monitors, too, even though colour was a key part of the user interface.
Oh, what days! It was crap, but I didn't know it. I was young and earning actual money for the very first time.
Admin
Speaking of old school:
There were two ways to get output on the screen. One involved talking to the BIOS; the other involved putting your bytes directly into video card memory. The magic incantation was
which defined video memory as a 2-dimensional array of unsigned 2-byte ints. The top byte was the colour, the low byte was the character. To put an uppercase 'A' in bright white on blue at row 5, column 6, it was
The problem was that the old CGA cards used dual-ported memory, and would dump snow all over the screen if you did this. So you would watch the hardware port that indicated that the screen was in the middle of a retrace (pointing the electron gun back at the top of the screen again), and dump things to the screen during that.
Thing is: this isn't some hack I invented. It was standard with the old PC clones using CGA cards. Everyone did it.
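From memory, it looked something like this under the DOS-era compilers (Borland/Turbo-style far pointers; treat it as an approximate sketch, not the original code):

    /* Sketch only: DOS real mode. 0xB800:0000 is the colour text-mode buffer,
     * 80x25 cells of (attribute << 8) | character. "far" is a compiler
     * extension of that era, not standard C. */
    void put_char_direct(int row, int col, unsigned char attr, char ch)
    {
        unsigned int far *screen = (unsigned int far *) 0xB8000000L;
        screen[row * 80 + col] = ((unsigned int) attr << 8) | (unsigned char) ch;
    }

    /* Bright white (0x0F) on blue (0x10) 'A' at row 5, column 6:
     *     put_char_direct(5, 6, 0x1F, 'A');
     */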
Admin
The pointer pitem is never null. At the end of the list, it will point to a pointer that holds a null. The hand is quicker than the eye.
Admin
You assume that the code is single threaded and doesn't have to deal with reentrancy problems (e.g., due to OS interrupts firing off signal handlers). Which is probably a reasonable assumption, but let's make it explicit before the code blows up in a way that generates maximum amusement here… :D
Admin
Despite all that, I have the impression that calling this job "coding" and calling programmers "coders" is, this time round, of a different character. Given that you have non-programmer types calling us "coders" and then going on one-day "coding" courses, one might be forgiven for thinking that PHBs are going to start assuming that we are overpaid - after all, the PHB knows about coding now...
(About coding, perhaps, a little bit, but about all that a programmer / developer / software engineer does? Ha Ha!)
Admin
And the key was that they didn't use dual-ported memory. They used ordinary single-ported DRAM, and external logic to control access, and that external logic gave priority to the CPU rather than the graphics hardware. (On the XT clone I bought in 1987, however, the CGA card's external memory access gave priority to the graphics over the CPU. It tended to be slower than slow, but never a trace of snow on-screen.)
Oh, and it wasn't just during vertical retrace (bottom-to-top return) but also during horizontal retrace (right-to-left return) that you could scribble into the memory without causing snow. Also, you don't point the gun itself, because the electron emitter is fixed (horizontal retrace on NTSC is 15KHz, and you don't really want a physical component with wires attached wobbling back and forth that fast on a consumer device), and the CRT uses electromagnets to steer and focus the stream of electrons. Those electromagnets take a non-zero time to change from all-right to all-left, and from all-down to all-up, and these times are the retrace intervals.
If you're going to lecture people on history, it pays to get it right. (I'm sure there are details I got wrong, and I apologise for that.)
Admin
It might've been a 286, which had a 10MHz version, or it might have been a 12MHz 386. Apparently the first models had yield problems which meant that the clock speeds were reduced. Hard to be sure, since both options were quite possible in the reported timeframe and it's quite a while ago now. :smile:
Admin
They know how to write it down and read it again. Do they understand how to choose what to write down? Programming isn't (usually) just throwing characters at a wall and seeing what sticks.
Except for when it's Javascript or PHP, it seems.
Admin
This is more concise:
    #include <cstring>

    class node { public: char payload[16]; node *next; };

    void foo(node *ll) {
        for (node *n = ll; n != NULL; ) {
            if (n->next != NULL && strcmp(n->next->payload, "foo") == 0) {
                // Unlink the matching successor and stay put, in case the new
                // successor matches too. Note this never examines ll itself,
                // and the unlinked node is leaked.
                n->next = n->next->next;
            } else {
                n = n->next;
            }
        }
    }