- Feature Articles
- CodeSOD
- Error'd
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
Admin
Oh man. Knowing what the contract on sleep usually is, I can't wait to see the comments on this one.
Admin
The code is almost certainly a WTF. But TRWTF is it (apparently) survived for a "decade" as Production Code (either that or expecting us to believe it).
Whilst the code may be correctly identified as to the "reason", the analysis as to the "why" leaves something to be desired.
Assumptions (based on code and provenance):
Best Case: 50/50 Chance of the Main Log being produced because:
This requires these further assumptions:
Worst Case:
There are variations and subtitles of the above, but I'll be damned if I am going to type them all out for you. Hopefully, I have given you two points in a range of possibilities that are sufficiently far apart to be considered "upper and lower limits".
Yes. I know. In a perfect world it would be 100% success or failure because times.now.t.tv_sec would always be even or odd (and let's face it. If it was permanently odd, would we be here discussing it?).
To reiterate, the code is possible a WTF and certainly it could be killed, or otherwise seriously destabilised, with anything above the power of a popgun. Another possible WTF (and thus allowing to survive undetected) was that was "test" once (if at all) and worked.
Assume that it is only a recent problem: I would suspect that after working fine for years, an external influence affected the physical logs. I.e. It took longer and longer to compile / produce them, so that they had to be queued. Something changed (or not) that resulted in some of the queued items being "chucked away". Because the Quick Logs are, by definition, smaller than the Main Logs. They would not have (or have not yet) been affected in the same manner.
Yawn.... bored... hmmmmm lunchtime.....
Admin
Those operations are definitely not always quick to execute. Any system call can take an appreciable amount of time, but thread/process spawning is one of the prime candidates for really doing so, especially when under load…
Admin
Pedantic Fail accepted :grinning:
I was trying to allude to a concept as simply and as broadly as I could. So, to :persevere: I will try some other allusion such as: "Fire and Forget" - whatever that might translate to in the coding heaven (or hell) of your choice.
Admin
sleep()
will typically sleep for at least 2 seconds. Might be more, depending on scheduling. Might be less if a signal arrives betweentime. Notably the duration of the loop is the 2 second sleep, plus however long the rest of the loop takes.Admin
@remy, @pjh
Admin
This topic is now sleepwalking. It might appear in the full log.
Admin
I just assume that no one cared enough, and everyone unable to debug because of the bug screamed impotently to the heavens (or Microsoft). For any problem, most people who want to fix it can't, and most people who can fix it won't, so your company probably has to land intern D.H. before anything ever actually works right.
Admin
I would totally agree with you, in principle, in that a fresh pair of eyes - meaning somebody new to the Code at whatever skill / pay level will see the blindingly obvious. The fun then becomes the game of determining the extent of the WTFery:
Some of the above would result in a CLM, if "pushed" to hard by "the new kid on the block".
However, in this case the Code was "Inherited" and, as we know, the provenance and veracity of these "incidents" result in an evaluation (===) of true. I don't think any Interns were harmed or mentally scared whilst experiencing it :smiley:
Admin
This reminds me of a monitoring solution I once saw for sale at a newly-invented-wheel-dealership.
Someone was faced with the nigh-insurmountable task of watching a log file on a remote server. It was pretty complex -- The log file was encoded in something called "ASCII", and each line started with a timestamp containing only the year, month, day, hour, minute and second with a decimal part which only went down to microseconds. Worse yet, there was no way of knowing when that line ended, unless you happened to catch on to the presence of a so-called "NEWLINE" character at the end.
Fortunately, our heroic developer had access to the most sophisticated monitoring tool ever invented. It ran on Windows and had a name something like "SCUMM", although I doubt that it did a very good job of playing Maniac Mansion. Since there was absolutely no way that there could have been any kind of native support for something as arcane as reading log files, it was time to do Something Clever(tm).
So, one minute after the previous check, like clockwork, a secure shell connection was opened. It called sudo to get the root privileges required to then call su to another user ID, and then finally called ls to check the modification time of the globally readable log file in question.
The mtime was then compared with the current time, and if the difference was less than sixty seconds, the monitoring system would sign in again, run sudo again, then su again, and finally use "tail -1" to gather all of the new lines added to the file since the last time.
Well, as long as there was only one of them. And it had to have been added within the last 59 seconds, even though the check interval was guaranteed to be longer than that. And don't even get me started on the incredibly unlikely chance that new messages would be added while the check was running. Or how the monitoring account was hooked in to active directory, and could take several seconds to authenticate each time because AD is really hard to use.
The truly amazing thing is that nobody could figure out how this mess was somehow sleeping in through critical alerts.
Admin
I personally wouldn't categorize this as a WTF. Bugs happen, and I can see it as a totally reasonable one to have.
It probably survived as long as it did because none of the log's consumers actually needed up to the minute information, so getting it once a day or whatever was probably fine. And as long as it doesn't lose data during the minutes it skips (presumably it just dumps all the happenings since the latest attempt), it was apparently "good enough" for their use case.
Admin
Admin
Yeah. The sleep() directive could be replaced with if (time.now.milliseconds() >= (time.then.milliseconds() + 720000)) then... to make it more accurate. And you could even put some fancy code in to modify the 720000 to account for the "ticks" used to process the functionality.
But wouldn't this sort of elevate the CPU priority of process [and the priority of the functionality], seeing as it would need to monitor it in pseudo real-time (possibly tipping it into an uncontrollable spin and [to mix my metaphors a bit] cause it to hang [in mid-air - so to speak and really WTFify the metaphor] ? :smiley:
Admin
The right way to do it is to ask the OS to call your callback at the right time; it's the only way to get anywhere close to accurate.
Admin
...which really requires a real time OS to do it right. However, if you are logging something every minute, real time accuracy almost certainly isn't needed. A lot depends on how long the 1 minute logging process takes to execute compared to the 2 second process; if it is more than 2 seconds we're in trouble. I personally would run them in separate threads; that to me is the wtf.
Admin
Hmmmm, dial down the sarcasm a bit - I'm sorry, did I say that with my "outside voice" lol.
Where do I need to send your leg? Seeing as I was pulling it and it came off in my hand.
I think I was aiming for something along the lines of "...agreed, but everything has it time and place. Given the (apparent) role of this code, I don't think it, or anything connected to it, (via cause and effect or synchronicity - not a spelling WTF, a dictionary WTF -- if it shows as such) gives a damn if it takes 1.75 or 2.25 mins to complete.
Admin
It seems like these problems are so easy to solve (at a detail that is sufficient). Just keep track of when you did the full log save last and if it has been 1 minute, do it on the next iteration and store that time. And if saving takes a long time, you could decide whether you want a 1-minute gap between them or 1-minute interval (e.g. whether to store the "time since last saved" before or after save).
And I'd say keep a sleep statement in there for simplicity (although keep it relative, 2 seconds minus time it took to do the save, to keep a ~2 second resolution on quick log).
Don't overcomplicate things, but at least do it so it works.