Admin
I haven't seen a hash table implementation using a linked list for ages. Typically, you calculate the hash value, and then you calculate a sequence of possible locations in the table based on the hash value. No linked list involved. And if the table gets full, you create a bigger table.
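A minimal sketch of that open-addressing idea in C, assuming linear probing and a power-of-two table size (all names here are illustrative, not from any particular implementation):

    #include <stddef.h>
    #include <string.h>

    struct slot {
        const char *key;   /* NULL means the slot is empty */
        int value;
    };

    /* Probe a sequence of locations derived from the hash until we hit
     * the key or an empty slot. table_size must be a power of two. */
    static int probe_lookup(const struct slot *table, size_t table_size,
                            unsigned long hash, const char *key, int *out)
    {
        size_t mask = table_size - 1;
        size_t i;
        for (i = 0; i < table_size; i++) {
            const struct slot *s = &table[(hash + i) & mask];
            if (s->key == NULL)
                return 0;                  /* empty slot: key not present */
            if (strcmp(s->key, key) == 0) {
                *out = s->value;
                return 1;
            }
        }
        return 0;                          /* table full, key absent */
    }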
Admin
Java's built-in HashMap (unless it's changed in 1.8) uses linked lists to handle collisions.
Admin
Well I was thinking of something like, in C++ terms, redefining operator() for callers, so nothing but internal use would see the actual array. Implementing all of that--including the feature you mention--sounds like a fun WTFy project.
You could, in theory, handle insertions via overflow arrays. :) The background thread would gradually unify the backing array.
Admin
A data structure handled by a background thread. No way that could cause any problems...
Admin
Nobody seems to have picked it up the first time I mentioned it, so I figured it was worth doing it again.
I think someone who really knew what they were doing could implement it properly, although it's obviously a huge :wtf: and most people should run screaming from the idea. ;)
Somewhat more seriously (and I forgot to mention this in my last post): if you were able to make sure nothing in the array ever got aliased, so that you could "safely" move things around, then some kind of cleanup process designed to amortize its work across the calls that access the backing array could probably function. Many years ago I saw a game that used a 1D array to implement the map. The code built a randomized list containing a pointer to each element of the array, along with a static variable that kept track of where it was in that list. The update thread would periodically run partway through the list, updating cells. That made the map update look like it wasn't sweeping across the screen like the scan lines on a CRT, and it also amortized the cost of updating the map.
In modern hardware I'm sure you'd never bother doing that, but this was code written to run on OS/2 in the mid 90s.
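Roughly the shape of that trick, as a hedged sketch (the sizes, names, and per-cell work here are made up):

    #define MAP_CELLS      4096
    #define CELLS_PER_TICK   64

    static int map[MAP_CELLS];
    static int shuffled[MAP_CELLS];   /* indices 0..MAP_CELLS-1 in random order,
                                         filled once at startup */

    static void update_cell(int *cell)
    {
        *cell += 1;                   /* placeholder for the real per-cell work */
    }

    /* Called from the update loop: touches only a slice of the map each call,
     * walking the shuffled index list so updates don't sweep like scan lines. */
    static void update_some_cells(void)
    {
        static int cursor = 0;
        int i;
        for (i = 0; i < CELLS_PER_TICK; i++) {
            update_cell(&map[shuffled[cursor]]);
            cursor = (cursor + 1) % MAP_CELLS;
        }
    }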
Admin
That's a different strategy. It's more efficient with optimal hash functions but in reality it's slower. The selection and tuning of hash functions is really hard, much more so than most people think, and small differences can have quite an impact. The main downside is that it encourages the use of prime-sized hash tables, and those trigger slower CPU code paths in some cases (in the mod/remainder operation used to compute the actual hash table entry address).
The highest performance hash tables only use a single level of hashing (with a very fast hash function) and then store a list hanging off the hash bucket. If your hash table is large enough, the cost is the same whatever strategy you use, but with the use of lists hanging off the buckets you don't need to resize/rebuild the table (an expensive operation) nearly so often. With good choice of growth factors, the average length of list that needs to be traversed to get an element can be kept small (i.e., less than 2).
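A sketch of that bucket-list layout (illustrative names; the hash function and the growth/rebuild policy are left out):

    #include <stddef.h>
    #include <string.h>

    struct entry {
        struct entry *next;   /* short chain hanging off the bucket */
        const char *key;
        int value;
    };

    struct table {
        struct entry **buckets;
        size_t nbuckets;
    };

    /* One level of hashing, then walk the (ideally very short) chain. */
    static int chained_lookup(const struct table *t, unsigned long hash,
                              const char *key, int *out)
    {
        const struct entry *e;
        for (e = t->buckets[hash % t->nbuckets]; e != NULL; e = e->next) {
            if (strcmp(e->key, key) == 0) {
                *out = e->value;
                return 1;
            }
        }
        return 0;
    }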
FWIW, I maintain one of the world's fastest hash table implementations (because it's a super-hot-spot in a library I maintain). I've read quite a lot of the literature on hashing (of non-malicious data) and almost everything I've seen written about the area is bunk. What I can say is that a hash function tuned for malicious data is usually comparatively slow on non-malicious data, that getting good cache locality matters hugely, and that tuning hash tables is not just about twiddling with hash functions (the rebuild algorithm is sensitive too).
It changed in 1.8. It switches to hanging balanced trees off the hash table buckets once a size threshold is passed (no, I don't know the sorting criterion when using a non-comparable key, or if one is even possible). Quite a lot more complicated, and might give poorer code locality (would need to use the right sort of profiler to figure that out) but will give better worst-case behaviour if the change kicks in.
Admin
I'm pretty sure an array of pointers still has better cache behaviour than a linked list, because the array allows prefetching further in advance. In a linked list you can only fetch element N+1 once element N has been fetched.
Regarding deleting in the middle of a linked list, I've got a long double-ended queue used for cleaning up elements that are otherwise accessed from a hash table. Being able to cheaply delete elements from the list here is a huge benefit. The main problem in that code is not so much its usage of linked lists, but the fact that the linked list is iterated over in circumstances where it really shouldn't be.
Admin
Linked lists are not broken per se. In many cases, they are a perfectly good structure for the task. And there are languages (like pure C) in which you may not have any sane data structures in the standard library - when you need to roll your own, the linked lists' simplicity shines.
TRWTF for me is not even trying to wrap that linked list in any kind of sane and generic API, but coding all those ->nexts manually, and COPYING them all over the codebase. Add a total lack of ideas about basic algorithms, and we get a total mess.
Admin
Just as well it wasn't a doubly-linked list. Imagine the performance if they'd had to do all those traversals in both directions!
Filed under: You know they would have
Admin
It changed in 1.8.
EDIT: :hanzo:'d
Admin
Concurrent programming is very very hard.
EhCache uses a background thread to expire objects in the diskstore for the cache.
Admin
My emphasis added. The mistakes are precise (consistently repeatable), not accurate (giving the correct value, almost a contradiction in terms for a mistake).
Admin
No, it's just that when you do concurrent programming, the CPUs hate your guts and will find creative ways to spite you.
Admin
I commonly implement iteration over my linked lists in place; that's really too simple to bother with wrappers (and C doesn't have a foreach construct with generic iterators). Inserts/deletes/etc. on doubly linked lists I do tend to abstract into separate functions, as I'm always triple-checking whether I updated all the pointers and it'll still turn out I missed one...
Admin
That's what macros are for! Something vaguely like this (assuming you're not doing updates at the same time):
so you can write:
WARNING: untested. I use similar things in my code, but they're iterating over a more complex structure. The use is still pretty similar though…
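A minimal sketch of the kind of macro being described, assuming a singly linked struct with a next member (FOREACH, struct item, and the member names are illustrative):

    #include <stdio.h>

    struct item {
        struct item *next;
        int value;
    };

    /* C99: the iteration variable is declared in the for-init. */
    #define FOREACH(type, var, head) \
        for (type *var = (head); var != NULL; var = var->next)

    static void print_all(struct item *list)
    {
        FOREACH(struct item, it, list) {
            printf("%d\n", it->value);
        }
    }

Naming the iteration variable in the macro call keeps nesting possible, at the cost of a slightly noisier call site.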
Admin
True, you could do that, though in our case not with a generic foreach name as we tend to prefix our structure members.
Also I think your example case is missing a temporary variable. Not sure if I could introduce that in that location - we've been on C89 for way too long, and that doesn't allow variable declarations in the initialization section of the foreach.
Admin
Ah, in that case you'd take the prefix as an additional parameter and use macro magic to build the member names. :fried_shrimp:
Could well be so. I prefer to name the iteration variable explicitly so that I can nest the iterators easily (for those algorithms where an N-against-M matching is the right approach).
Admin
Or this https://troydhanson.github.io/uthash/utlist.html
Admin
I used to code C! You can remove all matching items from a linked list without needing a special case for the head pointer if you use doubly-indirect pointers:
So compact! So simple!
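A minimal sketch of the trick, using the item/pitem names that come up in the discussion below (the struct itself is illustrative):

    #include <stdlib.h>
    #include <string.h>

    struct Item {
        char payload[16];
        struct Item *next;
    };

    /* pitem always points at the link that holds the current item: first the
     * caller's head pointer, then each previous item's next pointer. */
    static void remove_all(struct Item **head, const char *key)
    {
        struct Item **pitem;
        struct Item *item;

        for (pitem = head; *pitem != NULL; ) {
            item = *pitem;
            if (strcmp(item->payload, key) == 0) {
                *pitem = item->next;  /* unlink: same code for head and middle */
                free(item);
            } else {
                pitem = &item->next;  /* advance to the next link */
            }
        }
    }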
Admin
:hanzo:
Yup, nice trick to know, hard trick for some people to understand, formerly mentioned by me and praised by @Steve_The_Cynic
This trick also works to insert things into a sorted list.
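A sketch of the same trick for sorted insertion, reusing the struct Item from the sketch above (ordering by payload is just for illustration):

    #include <string.h>

    static void insert_sorted(struct Item **head, struct Item *new_item)
    {
        struct Item **pitem;

        for (pitem = head; *pitem != NULL; pitem = &(*pitem)->next) {
            if (strcmp(new_item->payload, (*pitem)->payload) < 0)
                break;               /* found the first larger element */
        }
        new_item->next = *pitem;     /* splice in: works at head, middle, or tail */
        *pitem = new_item;
    }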
Admin
"used"... You used to program in C? Or did you write code in C?
If you call "programming" by the new crappy diminishing name "coding", you are part of the problem, and a traitor, and should line up over there by the wall. Don't worry about this thing over here with the bundle of tubes pointed at you. It won't hurt very much or for very long if it starts up.
Admin
I don't think you want to dereference the possible null in the for condition.
Admin
pitem never becomes NULL. It starts as the address of the head element, and it is only ever updated to the address of the next pointer of an existing element.
Admin
So you're saying that in this implementation of Item, whatever it is, the last element will always be a fully-initialized terminal object that has a default value of zero?
In every implementation of a list I've seen, the last item->next points to NULL and dereferencing that will do bad things to your program, so testing pitem instead of *pitem is used.
Admin
item is a single pointer. pitem is a double pointer. Neither item nor pitem is ever set to NULL.

On the last item, pitem is set to &item->next. item is not null, so &item->next is neither null nor some other invalid value, but item->next is null and so is *pitem, so the loop terminates. At termination of the loop, *pitem points to the trailing NULL pointer of the list.
Admin
I see you've never used Xcode before.
Admin
Actually, @PleegWat's implementation is not as big of a :wtf: as I thought. It has an unnecessary continue statement, but it works in principle.
The challenge here is dealing with the case where the first item(s) in the list must be deleted, and the case where items in the middle of the list must be deleted. You can code this with separate logic (as @Nutster did), but this code solves that problem using an "extra level of indirection" in the form of the double-pointer pitem. Once the "else" clause gets run, pitem really does hold a pointer to a previous next, and that next pointer can get changed in a later iteration.
I may have some crow for dinner, but this extra layer of indirection really makes the code difficult to read. I prefer the "dummy head node" technique, where the first node in your list is empty, but has a next pointer that goes to the first real item. This lets you operate the same way on every node, using regular pointers only.
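A hedged sketch of that dummy-head variant, again reusing the struct Item from the earlier sketch (names illustrative):

    #include <stdlib.h>
    #include <string.h>

    static void remove_all_dummy(struct Item **head, const char *key)
    {
        struct Item dummy;       /* empty first node; only its next is used */
        struct Item *prev;

        dummy.next = *head;
        for (prev = &dummy; prev->next != NULL; ) {
            if (strcmp(prev->next->payload, key) == 0) {
                struct Item *dead = prev->next;
                prev->next = dead->next;   /* same unlink code for every node */
                free(dead);
            } else {
                prev = prev->next;
            }
        }
        *head = dummy.next;
    }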
Admin
I prefer using languages where you don't have to deal with this low-level bullshit technique.
Admin
What's fun is when an optimiser manages to deduce functionally the same thing from a higher-level (BS-free) description. No need for manual trickiness if a compiler can make it for you.
Admin
At work, we offload all our performance issues on the MS SQL Server developers. Seems to be a winning strategy.
Admin
I find it hard to believe this isn't deliberate. I've seen cases of "I'll optimise that later if it needs to be optimised." but in this code it looks simpler to do it properly than to do it how it has been done. I think someone wanted to make sure they had maintenance work to do later.
Admin
The other solution I tend to go for is to write a foreach function that does the iterating, pass a function pointer as an argument, and call that function on each list element.
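Something like this sketch, say (the type, function, and context names are made up):

    #include <stdio.h>

    struct item {
        struct item *next;
        int value;
    };

    typedef void (*item_fn)(struct item *item, void *ctx);

    /* Walk the list once, calling fn on each element. */
    static void list_foreach(struct item *head, item_fn fn, void *ctx)
    {
        struct item *it;
        for (it = head; it != NULL; it = it->next)
            fn(it, ctx);
    }

    static void print_item(struct item *item, void *ctx)
    {
        (void)ctx;
        printf("%d\n", item->value);
    }

    /* usage: list_foreach(list, print_item, NULL); */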
Admin
That's OK in a language with lambdas, but C doesn't have them (unless you rely on GNU extensions, which I won't do). Without lambdas, that's seriously annoying. Also, you can't do break and continue and early return that way. Not without serious hackery (well, for early return, that is; the others can be done with a smarter function definition).
Admin
Shouldn't List be a template type so that you can make the next be of the right type, whatever the subclass is?
Admin
It was a quote. Don't remember by who, off the top of my head.
Admin
It's been called coding for a long time, back from the days when it involved machine language opcodes.
My first real job was in '86, on the ASCO computer-assisted coding program for the '86 (Australian) Census. We had 640k ram, a 10MHz 80386 processor, 10 megabyte hard drives and we were thankful! Had to fight tooth and nail to get colour monitors, too, even though colour was a key part of the user interface.
Oh, what days! It was crap, but I didn't know it. I was young and earning actual money for the very first time.
Admin
Speaking of old school:
There were two ways to get output on the screen. One involved talking to the BIOS; the other involved putting your bytes directly into video card memory. The magic incantation was
which defined video memory as a 2-dimensional array of unsigned 2-byte ints. The top byte was the colour, the low byte was the character. To put an uppercase 'A' in bright white on blue at row 5, column 6, it was
The problem was that the old CGA cards used dual-ported memory, and would dump snow all over the screen if you did this. So you would watch the hardware port that indicated that the screen was in the middle of a retrace (pointing the electron gun back at the top of the screen again), and dump things to the screen during that.
Thing is: this isn't some hack I invented. It was standard with the old PC clones using CGA cards. Everyone did it.
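From memory, it looked something like this under the DOS-era compilers (Borland/Turbo-style far pointers; treat it as an approximate sketch, not the original code):

    /* Sketch only: DOS real mode. 0xB800:0000 is the colour text-mode buffer,
     * 80x25 cells of (attribute << 8) | character. "far" is a compiler
     * extension of that era, not standard C. */
    void put_char_direct(int row, int col, unsigned char attr, char ch)
    {
        unsigned int far *screen = (unsigned int far *) 0xB8000000L;
        screen[row * 80 + col] = ((unsigned int) attr << 8) | (unsigned char) ch;
    }

    /* Bright white (0x0F) on blue (0x10) 'A' at row 5, column 6:
     *     put_char_direct(5, 6, 0x1F, 'A');
     */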
Admin
The pointer pitem is never null. At the end of the list, it will point to a pointer that holds a null. The hand is quicker than the eye.
Admin
You assume that the code is single threaded and doesn't have to deal with reentrancy problems (e.g., due to OS interrupts firing off signal handlers). Which is probably a reasonable assumption, but let's make it explicit before the code blows up in a way that generates maximum amusement here… :D
Admin
Despite all that, I have the impression that calling this job "coding" and calling programmers "coders" is, this time round, of a different character. Given that you have non-programmer types calling us "coders" and then going on one-day "coding" courses, one might be forgiven for thinking that PHBs are going to start assuming that we are overpaid - after all, the PHB knows about coding now...
(About coding, perhaps, a little bit, but about all that a programmer / developer / software engineer does? Ha Ha!)
Admin
And the key was that they didn't use dual-ported memory. They used ordinary single-ported DRAM, and external logic to control access, and that external logic gave priority to the CPU rather than the graphics hardware. (On the XT clone I bought in 1987, however, the CGA card's external memory access gave priority to the graphics over the CPU. It tended to be slower than slow, but never a trace of snow on-screen.)
Oh, and it wasn't just during vertical retrace (bottom-to-top return) but also during horizontal retrace (right-to-left return) that you could scribble into the memory without causing snow. Also, you don't point the gun itself, because the electron emitter is fixed (horizontal retrace on NTSC is 15KHz, and you don't really want a physical component with wires attached wobbling back and forth that fast on a consumer device), and the CRT uses electromagnets to steer and focus the stream of electrons. Those electromagnets take a non-zero time to change from all-right to all-left, and from all-down to all-up, and these times are the retrace intervals.
If you're going to lecture people on history, it pays to get it right. (I'm sure there are details I got wrong, and I apologise for that.)
Admin
It might've been a 286, which had a 10MHz version, or it might have been a 12MHz 386. Apparently the first models had yield problems which meant that the clock speeds were reduced. Hard to be sure, since both options were quite possible in the reported timeframe and it's quite a while ago now. :smile:
Admin
They know how to write it down and read it again. Do they understand how to choose what to write down? Programming isn't (usually) just throwing characters at a wall and seeing what sticks.
Except for when it's Javascript or PHP, it seems.
Admin
This is more concise:
    #include <cstring>

    class node { public: char payload[16]; node *next; };

    void foo(node *ll) {
        for (node *n = ll; n != NULL; ) {
            if (n->next != NULL && strcmp(n->next->payload, "foo") == 0) {
                // Unlink the matching successor and stay put, in case the new
                // successor matches too. Note this never examines ll itself,
                // and the unlinked node is leaked.
                n->next = n->next->next;
            } else {
                n = n->next;
            }
        }
    }