The Daily WTF: Curious Perversions in Information Technology

2018-08-20

So TRWTF is that the 3rd party provider was able to track down a memory leak in OP's system that OP could not?

2018-08-20

that is 60 mlps - memory leaks per second

2018-08-20

It's worse than that. Daniel knew there was a memory leak and still deployed it with a vague "It will be all right, because they told me it will be re-booted nightly", and did not verify that the re-boots would actually occur. I have built machine vision systems and custom code that run in 24/7 industrial processes, I do week-long burn tests before releasing code, and I would never accept a memory leak like that.

Steve_The_Cynic · 2018-08-20

8 bytes per frame plus at least 16 bytes, maybe more, of overhead per allocation. I'd be willing to speculate that the total leaked per allocation was more like 32 bytes (16 bytes of arena header plus 8 bytes for the allocation plus 8 bytes to pad the allocation to a 16-byte boundary) once you include all the overheads, so the application crashes because it runs out of address space (that 0.4GB becomes 1.6GB, plus all the other stuff in the first 2GB), regardless of the disk policies.

2018-08-20

I did have the luxury of learning to program in C, but it was all simple stuff that didn't involve malloc at all.

2018-08-20

Quite. With competent use of a memory monitor and an appropriately controlled test, Daniel would have seen average memory use ramping up steadily over a few hours. Once it was clear that the increase was a smooth, steady ramp of 8 x 60 bytes (or whatever the numbers are) per second, it would have become considerably clearer where to focus his attention.

2018-08-20

"Those of us that had the luxury of learning to program in C or other non-auto-gc'd langauges, learned early on the habit of writing the allocation and deallocation of a block of memory at the same time, and only then filling in the code in between afterward. This prevented those nasty I-forgot-to-free-it memory leaks. "

Then we grew up and learned C++.

2018-08-20

You might be surprised how common that is. At my previous company, all of the persistent applications were set up to automatically kill themselves and restart every night, because apparently things got weird if you left them running too long. I shuddered a bit when I heard that... and sure enough, as the volume of our transactions grew, the whole system started crashing due to some memory leak in a homebrew container class that took the CTO weeks to track down (mainly because he didn't want to admit it was a fault in his 20-year-old personal library).

2018-08-20

You may have the luxury of killing a process for the hell of it, but in my industry (where code runs equipment 24/7) you can't just re-boot without a co-ordinated effort in meatspace.

2018-08-20

8 bytes per frame sounds like a filename, perhaps? What's the betting the frames were MJPEG and used temporary filenames?

2018-08-20

8 bytes is also a double. Or a pair of pointers (on a 32-bit build). Or some struct that is 8 bytes. I seriously doubt it's a temporary filename on every single frame. The article says this was in the OP's user code, not the vision library itself.

2018-08-20

That's the right moment to retell the famous story from Kent Mitchell, isn't it? https://groups.google.com/forum/message/raw?msg=comp.lang.ada/E9bNCvDQ12k/1tezW24ZxdAJ

Steve_The_Cynic · 2018-08-20

It's a fair point, although I'd prefer if the calculation took into account the "carry time" of the missile, while it's still in its launcher / hanging from the wing or fuselage of the plane / etc.

2018-08-20

While it's in its launcher/hanging from a wing, it's going to be in an off/standby state. Frequently the power for the missile is supplied by "thermal batteries", which are heat-activated primary (non-reusable) batteries. It may get power before launch, but most likely only shortly before ("hey, target is at this location, and updates will be using this key code").

The only real exceptions I can think of are optically guided missiles like the Maverick, which lock on before launch and sometimes had their imaging seekers on for a really long time (notably, A-10s used them for night vision in ODS).

There's a case where something similar happened on the Patriot system, and it actually did cause problems (also in ODS). The radars/launchers weren't meant to be left on as long as they were, so the timestamps started going wonky: http://www-users.math.umn.edu/~arnold/disasters/patriot.html

2018-08-20

This happened to me once. I'd just inherited a hellish Frankenstein project that had largely been assembled by copying and pasting bits from other projects. The project was at a very unstable early prototype stage but had to go live ASAP. I sorted out what I could for the app itself, and then, after little more than a month of that, I was sent to a tiny windowless room (whose lights also did not work) in another country, with ten hulking, monolithic machines built like tanks crammed in. No chairs, only one foreign keyboard to go around, and all sorts of other deprivations. To make things worse, people had done things to these machines: each one had been messed with in some way that made it different. Clearly we weren't the first rats in this maze.

I distinctly remember each one being in some state of malfunction. One would be in a constant reboot cycle, another would reboot every few minutes, and another completely at random. If you rebooted them all, you would see a myriad of results. Under the hood was a nasty tangled mess of hacks, and an especial favourite was the gratuitous use of sleep and reboot. If the network went out? Reboot. Hence unplugging a network cable put a machine into an infinite reboot loop. If something didn't initialise fast enough, or initialised in the wrong order? Reboot. In many places, reboot was used in place of die!

Eventually, with common sense (like getting one working first, then copying it to the others), we reached some semblance of sanity. Then one night a few months later, while I was working late, I was startled when all the machines rebooted simultaneously. Some knucklehead had put reboot in a cron job and not told anyone.

2018-08-20

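Instead of "C", write your code in Rust. It can compile to a binary that exposes the same ABI as C, and its ownership model (by design!) makes I-forgot-to-free-it memory leaks much harder to write, although it does not rule leaks out entirely.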

2018-08-20

A little brutish, but I'm curious why the gent didn't use the Windows Task Scheduler to force the OS to reboot every 24 hours.

Not the most genteel fix, but it's a solution...

CoyneTheDup · 2018-08-21

I once got an out-of-GDI-memory error in a Windows application. I managed to figure out how to reproduce it and then reported it. It turned out they had been looking for that memory leak for a while, and since I could reproduce it, one of the developers showed up at my desk: try this version, try this version, try this version. He was actually creating and building versions on the fly on a laptop he brought with him. On the other hand, I got really good at running the system out of GDI memory; it would take only about 3 or 4 seconds after starting the app.

He traced the leak to a call into a third-party library that returned a handle to a bitmap. The caller was supposed to release the handle when finished, and that wasn't happening. It turns out you can use up a lot of GDI memory with leaked bitmaps.

2018-08-21

Your analysis clearly contradicts the facts. We know how often the leak occurs, we know how much memory there is available, we know the software runs for ten days, so we can calculate exactly how much memory is lost per frame. And it's eight bytes, not 32. If it was 32 bytes, the cameras would crash after 2 1/2 days.

2018-08-21

TRWTF is that our hero spent days on this and couldn't see the difference between a process growing and using all RAM and a process writing to disk and using all RAM (disk).

urkerab · 2018-08-21

Not all memory allocators have overhead for small allocations; for instance, jemalloc can be configured to give you true 1-byte allocations. Note that some compilers generate code based on the alignment of memory handed out by their CRT, so don't mix the two together.

2018-08-21

It's bad, but we can often be distracted by where the problem looks most likely to be. It needs a rubber duck.

I once spent quite a while trying to hunt down a memory leak in an XML processor I wrote. I reached a point where I thought it might be in the library's internals and started digging into all kinds of crazy tools and hacks to try to get to the bottom of it.

I decided to do a sanity check on the close pointer in GDB, though, and it turned out it wasn't the library itself but the documentation that had thrown me off...

Function: xmlTextReaderClose
int	xmlTextReaderClose		(xmlTextReaderPtr reader)
This method releases any resources allocated by the current instance changes the state to Closed and close any underlying input.
reader:	the xmlTextReaderPtr used
Returns:	0 or -1 in case of error

versus

Function: xmlFreeTextReader
void	xmlFreeTextReader		(xmlTextReaderPtr reader)
Deallocate all the resources associated to the reader
reader:	the xmlTextReaderPtr

The descriptions differ only subtly, and if you read the description for close too quickly you might get caught out.

2018-08-21

In my first position out of college (20 years ago), I had a DOS machine that would sometimes lock up at night BEFORE it ran the reports at 2am, 3am, etc. I don't know if it was a memory leak per se, because the machine would be locked up at the DOS screen... not in our apps. I grabbed one of those nifty "light timers" from Lowe's and set it to turn the computer off at 1:30am and back on at 1:45am... never had a problem after that. I'm sure it wasn't the right way to reboot it, but it worked :)

2018-08-21

And this is why I run my embedded code under valgrind or AddressSanitizer for extended periods before foisting it off on QA.

2018-08-21

I once did a consulting gig at a company that accidentally made a whole SAN read-only. It took everyone about 8 hours to figure out how to bring their whole system back up. My understanding is that it even took down AD, so to get things started they had to use local accounts. It was a cluster!

2018-08-22

hey - this didn't totally suck ass. good to see a change in trajectory.

2021-09-09

LOL

2023-05-27

I know it's late but...

I'm quite sure I know which device and library were used. In fact, I can hazard a few guesses about which specific humans at the vendor might have analyzed the code and written that response.

The images are acquired straight from the camera into memory. It is difficult to envision a scenario where Daniel would have needed to save or load any images from disk during analysis. Certainly it wouldn't have been recommended to do so, since this is a low powered device operating at a high framerate and waiting on file I/O would have been death for the critical loop.

I do not have any insight into what the actual leak may have been, but I'm going to guess that it was something incidental to the image processing and probably had nothing to do with the library.

Leaky Fun For the Whole Family
