The Daily WTF: Curious Perversions in Information Technology

2023-01-05

The repository you're looking for is "ADLINK-IST / opensplice".

2023-01-05

I wanted to have the first post, but I overslept.

Jeremy Pereira · 2023-01-05

"For some reason, this code doesn't use mutexes to avoid two threads from entering the block."

The reason is that they didn't want every thread to execute the block one at a time; they wanted only one thread of many to execute it at all. With a standard mutex, they'd need another thread-safe variable to record that the spliced thread was already being created. This way, they not only save on a second expensive variable, but other threads don't have to wait for the first thread to finish starting the spliced thread. They can immediately carry on with the next task, which is... (checks notes) to wait for the spliced thread.
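The mutex-plus-flag alternative described above could look roughly like this. It is a minimal sketch in Python for brevity, not the C of the original code; `SplicedStarter`, `start_fn`, and `ensure_started` are hypothetical names for illustration and are not part of OpenSplice.

```python
import threading


class SplicedStarter:
    """Runs start_fn at most once, however many threads call ensure_started().

    Hypothetical helper: the names are illustrative, not OpenSplice's API.
    """

    def __init__(self, start_fn):
        self._start_fn = start_fn
        self._lock = threading.Lock()
        self._started = False  # the "second variable", guarded by the mutex

    def ensure_started(self):
        # Every caller serializes on the lock, so late arrivals block here
        # until the first thread has finished starting the service.
        with self._lock:
            if not self._started:
                self._start_fn()
                self._started = True


calls = []
starter = SplicedStarter(lambda: calls.append("started"))
threads = [threading.Thread(target=starter.ensure_started) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(calls))  # 1
```

The trade-off is the one the comment points out: the flag costs an extra variable, and every late thread waits on the lock while the first one finishes.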

The only way the code can possibly be correct is if startSplicedWithinProcess has a way of checking if a spliced thread is already running and does nothing if it is. Otherwise, a second thread arriving at the first line of code after the first thread had decremented the init count would create another spliced thread. In fact, if the weird sleep code were before the decrement, I'd assume that was why it was put in: to make sure (for some value of being sure less than 100%) that the other threads all get to the increment before the first thread gets to the decrement.

However, the only reason I can think of for it being where it is is that startSplicedWithinProcess checks whether the spliced thread is running in a way that doesn't work properly in a threaded environment. For example, it might rely on a flag that isn't guaranteed to have been flushed from cache into main memory, and the sleeps improve the chances that the flush has actually happened.

Addendum 2023-01-05 08:22: It's just occurred to me that for my hypothesis, the sleep is still in the wrong place. Maybe the sleep itself is enough to force a context switch and a cache flush.

AGlezB · 2023-01-05

I've seen those delay_100ms constants before. The name says 100, but if you go to the definition you'll usually find a different number, because some test was failing and now a million places in the code are waiting a full second or more instead of 100ms.

2023-01-05

TRWTF is why is my mouse cursor a pointer when I hover over the top 2 lines of comment text in the second code block?

2023-01-05

Mutexes and semaphores are great if the resources you're accessing are all within the same process but in today's world of microservices, parallelization and statelessness you're often trying to synchronize with processes and resources that might be in a different data centre running on a different technology stack. Sleep and retry is usually a lot simpler to code than the alternatives.

2023-01-05

What the hell does it mean for a thread to be "spliced", anyway?

Remy Porter · 2023-01-05

Sleep and retry is still the wrong solution to that problem: embrace the asynchronicity and do it all on a message bus. Emit events, pend on a message queue when waiting for replies. Synchronize that way. It's a different way to think about programming, but it isn't just for massive-scale data centers. At #dayjob, we use cFS, a NASA-built robotics platform, and it's entirely built out of asynchronous message queues running on embedded systems. Anytime I see a sleep, it's 100% a code smell that can be replaced with an operation that pends on a resource of some kind (whether the message queue or actual hardware).
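As an illustration of pending on a queue instead of sleeping, here is a minimal sketch using Python's standard `queue` and `threading` modules. It is not cFS code; `worker`, `inbox`, and `outbox` are hypothetical names, and the doubling is a stand-in for real work.

```python
import queue
import threading


def worker(inbox, outbox):
    """Blocks on the inbox; there is no polling loop and no sleep."""
    while True:
        msg = inbox.get()       # pends until a message arrives
        if msg is None:         # sentinel: shut down
            break
        outbox.put(msg * 2)     # stand-in for real work


inbox = queue.Queue()
outbox = queue.Queue()
t = threading.Thread(target=worker, args=(inbox, outbox))
t.start()

inbox.put(21)
result = outbox.get(timeout=5)  # pend on the reply, bounded by a timeout
inbox.put(None)
t.join()
print(result)  # 42
```

Both sides wake up only when there is something to do, so no delay constant needs tuning.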

2023-01-05

If you want a bad way to delay your thread, you can throw in a busy loop or a sleep. This is the first time I've seen a busy loop with a sleep. Why not just sleep(delay_100ms * 10) instead of having all that context switching just to wake up and go back to sleep again?
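A small sketch of the comparison, in Python. It assumes the loop checks nothing between sleeps, as the comment implies, and that the constant really is 100 ms; `looped_delay`, `single_delay`, and `record_sleeps` are hypothetical names.

```python
import time

DELAY_100MS = 0.1  # seconds; assumed to really be 100 ms


def looped_delay(sleep=time.sleep):
    # Ten short sleeps: ten wake-ups just to go back to sleep again.
    for _ in range(10):
        sleep(DELAY_100MS)


def single_delay(sleep=time.sleep):
    # One sleep of the same total length: a single wake-up.
    sleep(DELAY_100MS * 10)


def record_sleeps(delay_fn):
    """Runs delay_fn with a fake sleep and returns the requested durations."""
    durations = []
    delay_fn(sleep=durations.append)
    return durations


looped = record_sleeps(looped_delay)
single = record_sleeps(single_delay)
print(len(looped), len(single))  # 10 1
```

Both versions request the same total delay; the looped one only earns its wake-ups if it checks a condition on each pass.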

2023-01-05

It's times like these that make me very happy that Dart has share-nothing threads.

Steve_The_Cynic · 2023-01-05

"Otherwise, a second thread arriving at the first line of code after the first thread had decremented the init count would create another spliced thread."

No, because the first thread (the one that got into the block to launch the thread) doesn't decrement the count. The scenario of interest is:

1. I arrive and atomically +1 the count
2. I get 1 for the count and enter the "true" branch of the if() where I begin to create the thread.
3. You arrive and atomically +1 the count
4. You get 2 and do the other wait (the else of if(count == 1)), and atomically -1 the count.
5. I finish creating the thread and I don't -1 the count. It is now still 1.

Items 2 and 3 can be inverted and the situation doesn't change.
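The scenario can be replayed with a small Python sketch. It is a model of the counting logic only, not the original C code: `AtomicCounter` is a lock-backed stand-in for an atomic integer, and `init_count`, `arrive`, and `creators` are hypothetical names. Arrivals are replayed one after another so the result is deterministic.

```python
import threading


class AtomicCounter:
    """Lock-backed stand-in for an atomic integer (hypothetical helper)."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def add(self, delta):
        # Returns the value after the update, like an atomic add-and-fetch.
        with self._lock:
            self._value += delta
            return self._value

    @property
    def value(self):
        with self._lock:
            return self._value


init_count = AtomicCounter()
creators = []


def arrive(name):
    if init_count.add(1) == 1:
        creators.append(name)  # would create the spliced thread here
        # The creator does NOT decrement, so the count never drops below 1.
    else:
        # Would wait for the spliced thread here, then undo the increment.
        init_count.add(-1)


arrive("I")
arrive("you")
arrive("a third thread")
print(creators, init_count.value)  # ['I'] 1
```

Because the creator never gives its increment back, no later arrival can see a count of 1, whatever the interleaving.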

2023-01-05

I had to deal with legacy code like this. These bits of hacktastic crap are usually to handle old O/S versions with broken synchronization internals.

Obviously the right solution is to not support those old O/S versions. But we've all dealt with pig-headed customers who refuse to upgrade anything. As a small company you kinda have to go along.

When we got bought, it was a pleasure telling these stubborn fools they were not supported anymore. Ripping out these rotten code bits was pure joy.

Mark Wilson · 2023-01-05

release the mutex!

WatersOfOblivion · 2023-01-05

"Put it all on a message bus" sounds great ... until your message bus is on a different host and there's a network hiccup. After all, the first fallacy of distributed computing is "The network is reliable." You have to sleep and retry (or, more properly, exponentially back off and retry) the act of communicating the message on the message bus in the first place. At #dayjob at one of those "massive scale data centers" building those microservices Tim mentioned, we do as much as possible async for throughput, yes, but at a low level backoff and retry of failed communications is automatically baked into every client for reliability.
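A minimal sketch of exponential backoff and retry, in Python. The function name, the default delays, the jitter range, and the choice of `ConnectionError` as the retryable failure are all assumptions for illustration; real clients differ in the details.

```python
import random
import time


def retry_with_backoff(operation, attempts=5, base_delay=0.1, max_delay=5.0,
                       sleep=time.sleep):
    """Calls operation() until it succeeds or the attempts run out.

    The delay doubles after each failure, is capped at max_delay, and gets
    random jitter so that many clients do not retry in lockstep.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))


# Usage with a hypothetical flaky publish that fails twice, then succeeds.
outcomes = iter([ConnectionError("hiccup"), ConnectionError("hiccup"), "ok"])


def flaky_publish():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


slept = []
result = retry_with_backoff(flaky_publish, sleep=slept.append)
print(result, len(slept))  # ok 2
```

Passing `sleep` as a parameter keeps the example fast to run and makes the chosen delays easy to inspect.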

dkf · 2023-01-06

You can't always safely wait for someone to wake you up. I've got some code that has to do a blind sleep(), but in that case it's waiting for some control hardware to boot (it takes about 15-20 seconds to stabilize) and there isn't a reliable method for that hardware to signal my code that it is done. UDP messages (all I practically have available there) really do get lost sometimes, especially at scale, so a blind sleep() and then a poke at the thing to see if it is up is all I have.

That's very much the exception to the rule.
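The blind-sleep-then-poke approach might be sketched like this in Python. `wait_for_boot` and `probe` are hypothetical names; the 20-second boot delay comes from the figure quoted above, while the poll interval and poll limit are assumptions.

```python
import time


def wait_for_boot(probe, boot_delay=20.0, poll_interval=1.0, max_polls=10,
                  sleep=time.sleep):
    """Sleeps blindly through the boot window, then polls probe().

    Returns True once probe() reports the device is up, False if it never does.
    """
    sleep(boot_delay)  # no signal is available, so just wait
    for _ in range(max_polls):
        if probe():
            return True
        sleep(poll_interval)
    return False


# Usage with a fake device that answers on the third poke.
answers = iter([False, False, True])
slept = []
is_up = wait_for_boot(lambda: next(answers), sleep=slept.append)
print(is_up, slept)  # True [20.0, 1.0, 1.0]
```

Bounding the number of polls keeps a device that never comes up from hanging the caller forever.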

2023-01-09

Some (embedded) operating systems can switch threads only when sleep() is called.

Exclusive Threads
