• Prime Mover (unregistered)

    A software concern in Nantucket

    Kept all of their mem in a bucket

    A coder named Drake

    Showed their stupid mistake

    And as for the bucket, said f*** it.

  • (nodebb)

    At a guess, we're looking at exhaustion of the memory pool and the differences in the numbers of allocations relate to how large the pool actually was (which, in those pre-virtual memory days, will have depended a lot on what else was on the machine). Many programs will have had no problem at all because they simply never actually allocated enough memory to burn through the pool, but images were very chunky things (at that time) so they stressed the pool more.

    There's a few possibilities for the actual nature of the failure, with the simplest and WTFiest being that free() was a NO-OP. A failure to return freed chunks (or at least freed chunks over a threshold size) to the pool would have also worked, which might have been the sort of thing that happens if the pool itself has a limited chunk size it handles and delegates all larger allocations to the base runtime, but never handles that case during free(), yet nobody caught it because their internal test cases didn't use a lot of large allocs. I could totally believe either of those scenarios, especially the latter.
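
    A minimal sketch of the second scenario, purely to illustrate the guess above and not Clipper's actual allocator: `LeakyPool`, `MAX_POOL_CHUNK`, and the capacities are all hypothetical. Small blocks are tracked by the pool; larger ones are handed off and never seen again by `free()`, so every pass through the loop leaks the large block and the failure count follows however much memory the machine had.

```python
class LeakyPool:
    """Toy allocator: small blocks are tracked by the pool, large ones are delegated."""

    MAX_POOL_CHUNK = 1024  # hypothetical threshold, not a real Clipper value

    def __init__(self, capacity):
        self.capacity = capacity   # total bytes available on this "machine"
        self.used = 0              # bytes currently handed out
        self.next_handle = 0
        self.pooled = {}           # handle -> size, small blocks only

    def alloc(self, size):
        if self.used + size > self.capacity:
            raise MemoryError("pool exhausted")
        self.used += size
        handle = self.next_handle
        self.next_handle += 1
        if size <= self.MAX_POOL_CHUNK:
            self.pooled[handle] = size
        # Large blocks are "delegated" and never recorded anywhere.
        return handle

    def free(self, handle):
        # The bug: only pooled blocks are returned; delegated ones are ignored.
        size = self.pooled.pop(handle, None)
        if size is not None:
            self.used -= size


def iterations_until_failure(capacity, sizes):
    """Allocate every size, free in reverse order, repeat; count completed passes."""
    pool = LeakyPool(capacity)
    passes = 0
    try:
        while True:
            handles = [pool.alloc(size) for size in sizes]
            for handle in reversed(handles):
                pool.free(handle)
            passes += 1
    except MemoryError:
        return passes


# Two made-up machines that differ only in how much memory is left over.
small_machine = iterations_until_failure(2048 * 22 + 100, [64, 2048])
large_machine = iterations_until_failure(2048 * 151 + 100, [64, 2048])
```

    With these invented numbers, the machine with room for 22 large blocks fails after 22 passes and the one with room for 151 fails after 151. A program that only ever makes small allocations would loop forever, which matches the "many programs never noticed" part of the guess.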

  • (nodebb)

    We were told that some of the developers were complaining that our program was silly, because it didn't do anything with the memory we were allocating.

    Working in an engineering software project, I've recently been seeing this argument too. "Is the bug actually a bug if it only shows in this unrealistic example?" Heck yes!

    In this case I could simply argue that the bug does occur in realistic configurations too; it's just less clearly distinguished from numerical noise. But it contributes to the failure of some realistic models to provide correct results, and thus at the very least makes debugging more obvious bugs harder.

    Curiously enough, I recently found out that nobody ever tests the "nothing to do" configurations. Obviously they are useless to end-users, but they are damn valuable for internal testing!

  • Industrial Automation Engineer (unregistered) in reply to R3D3

    Curiously enough, I recently found out that nobody ever tests the "nothing to do" configurations.

    Operations on empty data-sets. Will trigger the most horrendous freezes and crashes every single time (for any given value of "every")

  • Prime Mover (unregistered) in reply to Industrial Automation Engineer

    Recently emerged from a deep dive into low-level library code where nobody ever checked whether an input parameter was blank or null. An unanticipated sequence of tightly-knitted RPCs results in an occasional case where an arrayful of data in one place is accessed and used a fraction of a second after one of the data items has been removed from the DB. Shouldn't happen, not a big deal if it does -- except that said low-level library code seriously objects to the DB returning a no-such-object message and blows the app out of the water.

  • hanche (unregistered)

    Many years ago, I reported a bug in Solaris. It concerned /bin/kill, which as far as I can recall, was a single line:

    kill "$@"

    That would or would not work, depending on which exec… call you used to call it. The idea being, some versions of exec* would fall back on running a shell on the file if other methods of executing it would not work, which of course this didn't, missing a #! line. I enclosed a tiny C program demonstrating the bug.
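
    A small sketch of that difference, assuming a POSIX system with `sh` on the PATH; the file name `noshebang` and the helper `run_headerless_script` are made up for the illustration. Python's `subprocess` hands the path to the kernel's exec with no shell fallback, so it behaves like the exec variants that failed, while running the same file through `sh` works.

```python
import errno
import os
import stat
import subprocess
import tempfile


def run_headerless_script():
    """Return (errno from direct exec, stdout when run through sh)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "noshebang")
        with open(path, "w") as script:
            script.write('echo "$@"\n')      # one line, no #! header
        os.chmod(path, stat.S_IRWXU)         # executable, but not a binary

        direct_errno = None
        try:
            # Direct exec: the kernel does not recognise the file format.
            subprocess.run([path, "hello"], capture_output=True)
        except OSError as exc:
            direct_errno = exc.errno

        # Explicitly running a shell on the file, which is what the
        # fallback in some exec* variants does on the caller's behalf.
        via_shell = subprocess.run(
            ["sh", path, "hello"], capture_output=True, text=True, check=True
        )
        return direct_errno, via_shell.stdout.strip()


direct_errno, shell_output = run_headerless_script()
print("direct exec failed with:", errno.errorcode.get(direct_errno, direct_errno))
print("through sh:", shell_output)
```

    In C, `execlp` and `execvp` are the variants POSIX specifies to retry through `sh` when the kernel reports `ENOEXEC`; the others return the error to the caller.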

    After several months, I got a response back saying, in essence, you should not run /bin/kill that way. Instead, you should use the signal() system call. Excellent advice, most of the time. But where I really encountered this bug, was in the es shell, which did not have a “kill” builtin, so I needed to be able to run the external kill command. (The problem was easy to work around, just creating a function in the shell that would run “sh -c "kill …"” instead.)

    Years later, the script was replaced by one with a proper #! line followed by a long and scary-looking copyright statement, followed by a single line along the lines of

    $0 "$@"

    That was hardlinked to the names of a whole lot of builtin shell commands. Including umask, which is rather pointless (except at least it doesn't produce any errors).

  • Patrick (unregistered) in reply to Industrial Automation Engineer

    Even when "every" == 0 ?

  • (nodebb)

    Lordy does that bring back memories... both general and specific. Wrote quite a few dBase (II, then III, and IV) and Clipper programs back in the day, and encountered what appears to be the same memory issue. Was not as lucky as Drake in getting any real communications at all from Nantucket, so it was time to "experiment and qualify" the error in order to see if there was a way around it.... IIRC (it has been well over 30 years) the issue centered on details of the size of the allocated memory blocks and the issue could be dodged by allocating blocks of specific repeatable sizes.....

  • Brian (unregistered) in reply to Industrial Automation Engineer

    #1 most common bug in my current codebase: missing null checks. Everything works fine in dev & QA testing, because everyone makes sure that the feature actually works with the intended data. But as soon as there's a scenario with no data it's null-reference exceptions all the way.
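
    A tiny sketch of what testing the "no data" case can look like; `summarize` is a made-up function, not code from any codebase mentioned here. The point is only that the empty and missing inputs get their own explicit checks next to the happy path.

```python
def summarize(readings):
    """Return (count, mean); an empty or missing list is treated as valid input."""
    if not readings:                 # covers both None and []
        return 0, None
    return len(readings), sum(readings) / len(readings)


# The intended-data case that everyone already tests.
assert summarize([2, 4]) == (2, 3.0)

# The "nothing to do" cases that tend to be skipped.
assert summarize([]) == (0, None)
assert summarize(None) == (0, None)
```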

  • (nodebb)

    So, the language was clipped from history?

  • LCrawford (unregistered)

    Back in the day, checking for and handling results of malloc, free, and realloc were devilishly tricky to get right. But the specific numbered failure counts smell like something related to some type of internal memory pool handling.

  • DB5 (unregistered)

    Only 22 iterations until failure is kind of fast to run out of memory but in Win95 and Win NT 4 memory fragmentation was the culprit for many long-running applications. I had to write a handler that caught the out-of-memory exception and coalesced the heap before trying again and that fixed many of our stability issues.
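
    A rough sketch of that catch-and-retry pattern, not the actual handler: `allocate` and `compact` are hypothetical callables standing in for the real allocation call and whatever heap-coalescing routine the platform offers.

```python
def alloc_with_retry(allocate, compact, size, retries=1):
    """Try to allocate; on MemoryError, compact the heap and try again."""
    for attempt in range(retries + 1):
        try:
            return allocate(size)
        except MemoryError:
            if attempt == retries:
                raise              # compaction did not help; give up
            compact()


# Stand-in allocator that fails until the "heap" has been compacted once.
state = {"compacted": False, "calls": 0}


def fake_allocate(size):
    state["calls"] += 1
    if not state["compacted"]:
        raise MemoryError("heap too fragmented")
    return bytearray(size)


def fake_compact():
    state["compacted"] = True


buffer = alloc_with_retry(fake_allocate, fake_compact, 16)
```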

    Ah! The good old days.

  • (nodebb) in reply to Prime Mover

    If this were Reddit, I would have just gilded you.

  • MaxiTB (unregistered)

    The most interesting thing about the article to me was that I'm from this era and before and I never encountered or heard of Clipper before. Must be a really localized "everywhere".

  • ZZartin (unregistered)

    Then Windows 95 came out, and Clipper refused to pivot to Windows and became a footnote in history.

    Yeah, who could ever have predicted this crazy newfangled "GUI" would ever catch on.

  • Andrew Klossner (unregistered)

    My guess is the PCs that could allocate memory 151 times were running a memory expander like QEMM. Adding 64KB RAM to a 640KB system made a big difference back in the day.

  • Yehuda (unregistered)

    I updated the Wikipedia page with your in-depth analysis. You're welcome. :) https://en.wikipedia.org/wiki/Clipper_(programming_language)#cite_note-9

  • JC (unregistered)

    It could just be a fragmentation issue. Certain block sizes may be pathological to some allocators, especially if an intermediate allocation (maybe hidden by the runtime) happens in between.
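
    A toy illustration of that idea, assuming a simple first-fit allocator over a fixed arena; the `Arena` class and all sizes are invented for the example. One small "hidden" allocation lands between the user's blocks, and a later request fails even though more than enough memory is free in total.

```python
class Arena:
    """First-fit allocator over a fixed address range; adjacent holes are merged."""

    def __init__(self, size):
        self.holes = [(0, size)]           # sorted list of (start, length)

    def alloc(self, size):
        for i, (start, length) in enumerate(self.holes):
            if length >= size:
                if length == size:
                    del self.holes[i]
                else:
                    self.holes[i] = (start + size, length - size)
                return start
        return None                        # no single hole is big enough

    def free(self, start, size):
        self.holes.append((start, size))
        self.holes.sort()
        merged = []
        for hole_start, hole_len in self.holes:
            if merged and merged[-1][0] + merged[-1][1] == hole_start:
                merged[-1] = (merged[-1][0], merged[-1][1] + hole_len)
            else:
                merged.append((hole_start, hole_len))
        self.holes = merged

    def total_free(self):
        return sum(length for _, length in self.holes)


arena = Arena(100)
user_block = arena.alloc(40)      # lands at address 0
hidden_block = arena.alloc(10)    # the runtime's own block, at address 40
arena.free(user_block, 40)        # leaves holes of 40 and 50 bytes

free_bytes = arena.total_free()   # 90 bytes free in total
big_block = arena.alloc(60)       # fails: the largest hole is only 50 bytes
```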

  • Loren Pechtel (unregistered)

    Count me in the memory fragmentation group. It doesn't even need to be something like QEMM at work; small changes in memory can have a big effect on when the fragmentation hits the wall.

  • (nodebb) in reply to Industrial Automation Engineer

    Or will cause your query builder to issue a DELETE FROM <table>; statement. Yes, with no WHERE clause. Or so I've been told...
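
    A sketch of how that can happen and one way to guard against it; both builders are hypothetical and not taken from any real library, and the table and column names are assumed to be trusted identifiers.

```python
def naive_delete(table, conditions):
    """Build a DELETE; with no conditions this quietly targets every row."""
    sql = f"DELETE FROM {table}"
    if conditions:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in conditions)
    return sql + ";"


def guarded_delete(table, conditions):
    """Build a parameterized DELETE; refuse the empty-filter case outright."""
    if not conditions:
        raise ValueError("refusing to build a DELETE without a WHERE clause")
    where = " AND ".join(f"{col} = ?" for col in conditions)
    return f"DELETE FROM {table} WHERE {where};", list(conditions.values())


empty_filter_sql = naive_delete("orders", {})
normal_sql, normal_params = guarded_delete("orders", {"id": 7})
```

    Here `naive_delete("orders", {})` produces the unfiltered statement, while `guarded_delete` raises instead of guessing what an empty filter was supposed to mean.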

  • 3298 (unregistered)

    Oh, what fun it is to run into buggy low-level code... and have its developers dismissively respond to your report.

    My own encounter with such a situation involved a pet project which worked fine when linked against libc, but crashed on exit when compiled in freestanding mode. The crash wasn't in my code, it was in ld-linux-x86-64.so.2. As you might guess from the presence of that option, this pet project was low-level itself, so I was at least equipped to track the bug down. It wasn't near the crash location, of course (that's normal for failed assertions), but I did find it. The report contained the usual requested description of symptoms and a minimal reproducing piece of code (a few lines of x86-64 ASM) with compilation instructions - and then a code analysis pointing out where exactly the runtime linker takes a wrong turn, a few lines about the history of the bug including a mention of the exact commit it first appeared in, plus a two-line patch fixing it.

    The only response hints at a somewhat ham-fisted possible fix by completely removing a minor feature (the one making the "if" block necessary in the first place) ... and then dings the minimal reproducer: "You really need to use the glibc-supplied startup files when linking, there is no way around that." Man, if you cared to read the code analysis, or the proposed patch, you would instantly realize that those startup files are utterly irrelevant to this bug! Adding them to the compilation instructions is just unnecessary noise - the opposite of what the requirement for a minimal reproducer is supposed to achieve. They don't even have an effect until after the buggy code runs.

    End of the story: glibc bug #27772 simply got buried without a fix.

    PS: I'm also consistently reproducing 500 Internal Server Errors while trying to post this comment. If you see this, then cookies must be turned on to help TDWTF "authenticate" you as a not-logged-in guest. :wtf:

  • Aled (unregistered) in reply to MaxiTB

    Clipper had interesting features as a programming language compared with DBase. For example, it had closures, although there were some limitations to their use that I don't remember now. Compared with the competition it was very much better for Real Programmers (tm). Remember that this was MS-DOS, before Windows.

  • (nodebb) in reply to R3D3

    We were told that some of the developers were complaining that our program was silly, because it didn't do anything with the memory we were allocating.

    "Oh, I'm sorry, here's a 275,000 line real-world program in 187 separate modules that also demonstrates the bug. Have fun tracking it down".

  • ClipperHead52 (unregistered) in reply to ZZartin

    It wasn't that they didn't try to adapt to Windows; it was that they decided to redo it as a pure object-oriented language that was so different from Clipper that it was just easier to invest time in learning any other language than to start from scratch learning the replacement, known as Visual Objects. Clipper derivatives are still in use today, all over the world, but are more popular in countries where you cannot get the latest hardware.

  • Appie (unregistered)

    In those days == didn't exist yet.

  • Zygo (unregistered)

    "Is the bug actually a bug if it only shows in this unrealistic example?"

    This is usually shorthand for "is the bug a thing we should divert development and testing time away from revenue-generating activity to fix?" and it's the kind of question teams will ask whenever such resources are insufficient (which they almost always are) and there are clearly viable alternatives to fixing every problem known to exist.

    The real fun begins when you have to start debugging a client's corner-case issues, so you turn on auditing tools and find the thousands of pre-existing bugs in the code, because nobody fixes (or even tests for) the special cases. That's also when you discover that there's a decade or two of legacy code piled on top of all this that couldn't handle errors, even if they were properly detected and reported, without non-trivial architectural changes in the upper layer code. The effort to fix the lower layers will be wasted on existing customers.

    That sort of story usually ends with "so we started over with a less-broken lower layer and gave it to customers with less-mature apps" as this one did.

    "Did you find the bug?" "Not yet. Every 3rd line of code claims to be Spartacus. I'm crucifying as fast as I can!"

  • Chakat Firepaw (unregistered) in reply to Yehuda

    Your Wikipedia edit was quickly reverted through the "deletionist two-step": Your reference back here was removed, then the rest was removed as "original research".

    (Why I have no desire to edit Wikipedia, reason #6.)

  • jay (unregistered)

    This doesn't seem very mysterious. There was a bug in their memory manager. Under some conditions they didn't release the memory, or didn't release all of it, or didn't release it correctly. The bug was hard to find and it was easier to just rewrite the module than to debug it. Or maybe they'd already started on an improved memory manager and it wasn't worth the trouble to even try to fix the old one when they were replacing it anyway. Which suddenly reminds me, I read once that a teenager posted on his Facebook page, "Sometimes it's not worth fixing something. It's better to just start over." And his father posted, "That's why you have a little brother."

  • jay (unregistered)

    "We were told that some of the developers were complaining that our program was silly, because it didn't do anything with the memory we were allocating." I've heard this dumb argument so many times. When debugging a problem is difficult, it makes good sense to strip it down to the bare minimum to reproduce the bug. I don't care that the "bare minimum" may not be a useful application. I've had a few times I've posted such a stripped-down example on Stackoverflow or the like asking for help, and I get people saying that my problem could be solved by just leaving out the unnecessarily complex thing that doesn't work. Likewise, I've heard developers say they applied for a job and were asked to write some sample code to solve some simple problem. And instead of rationally understanding that of course this is an over-simplified problem to test their programming skills, they'll argue about it. I recall one person saying that he demanded of the interviewer, "What department requested this and why?" Sigh.

  • Diane B (unregistered)

    Given the era, I'd wager the difference between 22 and 151 had to do with HIMEM shenanigans.

    Well, either that or the computer couldn't decide whether to shoot itself (22) or drink itself to death (151).

  • (nodebb) in reply to Zygo

    This is usually shorthand for "is the bug a thing we should divert development and testing time away from revenue-generating activity to fix?" and it's the kind of question teams will ask whenever such resources are insufficient (which they almost always are) and there are clearly viable alternatives to fixing every problem known to exist.

    Depends. The unrealistic example almost certainly is just the much-requested "minimal reproducible example" meant to demonstrate the issue. The issue almost certainly was found during actual application. In my case it was that way, which is why the fix was after all assigned to the necessary colleague.

  • (nodebb) in reply to hanche

    After several months, I got a response back saying, in essence, you should not run /bin/kill that way. Instead, you should use the signal() system call. Excellent advice, most of the time. But where I really encountered this bug, was in the es shell, which did not have a “kill” builtin, so I needed to be able to run the external kill command. (The problem was easy to work around, just creating a function in the shell that would run “sh -c "kill …"” instead.)

    https://github.com/microsoft/pylance-release/issues/2417

    They treated the bug report as a support ticket ultimately, specifying a work-around and closing it.

    Sure, that helped me. But it's still just a workaround, not a fix for the issue, and it quite explicitly denies acknowledgement that there is an issue.

  • Drake Christensen (unregistered)

    I'm the person who submitted this. Sorry I'm so late, here. I'm a couple of weeks behind in my feeds.

    A couple of details that got lost in the rewrite.

    First, it was only eight or ten allocations, and the largest was a couple of K. A few were under a hundred bytes. And, we allocated all of them in a group and then released all of them, in reverse order. So, it was nowhere near an out of memory issue.

    And, at this point in our development, we weren't using extended memory, yet.

    We weren't using any of our code for our example. It was a simple while (true) loop with the allocations and releases and the print statement, using calls directly into the Clipper libraries.

    Also, none of the short tempers were between us and Nantucket. But, from vague references by our contact, we deduced that tempers were continuing to flare within Nantucket.

    And, one thing I forgot to spell out. The new memory manager wasn't being released on our account. It was already available in their shipping library. Our code was still using the older calls. This is where we learned that they were deprecating the older memory manager that we had been using.

    So, yeah, it sure seems to have been a pathological case, caused by the particular sizes we were allocating. It's just my insatiable curiosity that wants to know more about the details.

    Knowing that might explain why it crashed at two particular iteration counts on different machines. We didn't spend a lot of time trying to narrow down what was the same and different with those machines. They were a variety of machines purchased at different times and personalized by their users. The fact that we had a test that demonstrated the problem in less than a second was where we stopped looking for things on our side.
