Admin
"Thankfully" is the wrong word here. "Fortunately" or "luckily" would be better.
It doesn't actually have any choice about that if we want it to remain a C++ compiler.
And the specific reason it's UB is that the compiler doesn't have to keep "bss" globals (i.e. globals without an explicit initialiser) in the same order as their definitions. <<== Here speaks the voice of experience.
Sherman! Set the WABAC machine to 1992! In this fateful year, I had to port an embedded system from one C compiler to another(1) for reasons that aren't really relevant here(2). The code in question was fairly well dosed with WTFs(3), and among them was how it coped with saving some variables to non-volatile memory (battery-backed static RAM, in fact). There were just two big lists of global variables, one in BSS (no explicit init) and one in DATA (yes explicit init), and the code memcpy()ed between them.
And, of course, the old compiler put both sets in memory in definition order, while the new compiler put the explicitly initialised variables in definition order, and the BSS variables in whatever random order it felt like. Needless to say, this ended badly, since the "endmarker" BSS variables weren't at the beginning and end any more... I repackaged the variables as struct fields, and could replace the save by saved_data = live_data; and the restore by live_data = saved_data;. And yes, there were tests for this change. The test plan weighed in at 35 pages of A4.
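A minimal sketch of that repackaging, with made-up field names (the originals aren't shown here): once everything that needs saving lives in one struct, the compiler controls the layout of both copies, and the save and restore collapse to plain assignments.

struct persistent_state {              // one definition controls the layout of both copies
    unsigned int boot_count;           // fields are hypothetical, for illustration only
    unsigned int error_log[8];
    unsigned int checksum;
};

static persistent_state live_data;     // ordinary RAM
static persistent_state saved_data;    // battery-backed SRAM section in the real thing

void save(void)    { saved_data = live_data; }   // whole-struct copy, identical layout guaranteed
void restore(void) { live_data  = saved_data; }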
(1) The old compiler was Computer Innovations' "CI-86 Optimising Compiler", and the new one was Aztec C. The CI-86 compiler was the latest version (I called them up to verify), and hadn't been updated in six years by that point.
Addendum 2024-10-01 06:58: (2) The CI-86 compiler couldn't generate small code to save its life, and we ran out of ROM
Admin
(3) Notably the central control variable for the primitive threading system, named by a previous developer to celebrate buying a new car. He bought a golf_gti.
Admin
if( h->quant4_bias[i] ) free( h->quant4_bias[i] );
Again!?
Admin
Until developers are liable by law for the harm they have caused, with all of their property and real jail time if the realized property value is insufficient, no "safety culture" will emerge, ever. Either there is skin in the game, or there isn't. With the current laws governing employment, there never will be.
Admin
Just to be the devil's advocate, something so uncommon on the internet: how about we define some reasonable behavior instead of all that UB BS? In this case, defining that two subsequently declared variables are at contiguous addresses would keep the program compilable and safe until the end of time. We could even insist that they align properly. No need for perfection or idealism.
Or we could accept that UB is a conspiracy to slowly make life for C programmers more and more difficult by subtly changing the way old code works, and thus make everyone stop writing C.
Admin
I wrote a commercial C/C++ memory allocator library in the 1990s, including a memory-checking version for this type of scenario. Given how sophisticated the tools are these days, it's tragic that people still don't use them. Today I prefer using a language that has no explicit pointers and is less prone (or not prone at all) to this kind of badness.
Admin
Member when there was still memory you could not write to twice? Those were the days :-)
Admin
In my 15 years of actively coding in C/C++ I had exactly four instances where I "forgot" to free memory or handled memory incorrectly in a way that only appeared after testing:
1. Was actually a private project on my Amiga. To be fair, this was my first multi-tasking 32-bit machine (as in, the CPU was 32-bit aligned; I don't care that the data bus was 16-bit multiplexed); my only experience before was the 8-bitter C64.
2. Was actually in school: I messed up allocations in my custom-made memory manager for the graphical UI I made (DOS back then supported XMS and EMS, and basically I messed up a call there, so it wasn't even directly heap related).
3. Was when I worked as a game developer: I freed memory through the same pointer twice instead of through two different pointers during game shutdown (so technically not relevant). The reason was the most evil development practice one can use, called Search & Replace. Obviously it was noticed in stress testing before release, and I had to actually argue with the lead dev a bit to get the five minutes to fix it and resubmit the code to automatic testing.
4. Was on AIX for a bank - my code was so efficient that the back pressure killed follow-up services (obviously that made me the bad guy). While adding an artificial delay, I noticed that in one case my code had a circular reference blocking cleanup, which added up to around 100 kB total for a few million transactions. Obviously nothing to worry about, because the service got started up once daily, but I fixed it anyway without asking, because that was the lesson I learned from (3) :-)
My point is, memory management is only an issue for two types of developers: the ones that are sloppy and the ones that don't know what they are actually doing. And from my experience both types can crash whatever they touch, no matter the language. Or to quote someone more famous: "There is no fix for stupid".
Admin
In this specific case, it's actually defined provided all the pointers are the same size (which is overwhelmingly the normal case these days, especially for things that you'd give to free()) as they're inside a struct, the number of elements in the first array is 4, and the pointers will all be standard-aligned. If the arrays had instead been directly on the stack, it would have been entirely undefined; relative positioning of arrays with respect to each other is not guaranteed at all in that case. (Theoretically, they could have been placed with the difference between them being not an integer multiple of the size of a pointer. Nothing actually does that, because that would be crazy, but placing them non-contiguously or in either order would be quite expected.) This code is bad and its author should feel bad, but it won't crash.
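To make the layout argument concrete, here is a hedged sketch of the situation being discussed - the struct and the second array's name are guesses, not the actual x264 definitions: two pointer arrays sitting next to each other inside one struct, with a cleanup loop whose bound runs past the first array into the second.

#include <stdint.h>
#include <stdlib.h>

struct quant_tables {              // illustrative stand-in, not the real x264 struct
    uint16_t *quant4_bias[4];      // first array: 4 pointers
    uint16_t *quant8_bias[2];      // second array: 2 pointers, laid out right after
};

void free_tables(quant_tables *h)
{
    // A loop bound of 6 walks off the end of quant4_bias and into quant8_bias.
    // It only "works" because both arrays are members of the same struct, hold
    // same-sized pointers, and happen to sit contiguously - precisely the layout
    // assumption being argued about above.
    for (int i = 0; i < 6; i++)
        if (h->quant4_bias[i])
            free(h->quant4_bias[i]);
}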
Admin
Well, the idea behind undefined behavior in the 70s/80s could be summed up by two reasons: Multi-Platform & Performance.
Multi-platform is obvious; if you do not strictly define behaviors, it is easier to implement a compiler for a specific platform. This was very important back in the old days, when even architectures between product families were further apart than x64 and ARM are these days. Performance is obvious too: with a wide array of architectures, it's easier to implement performant code if you are not too limited in what you can do. Keep in mind, back then performance was not a nice-to-have, it was a must; there was simply no room left for comfort, and developers were all experts in the field (or in multiple other scientific fields), so those languages left a lot of room open for implementations.
Ironically, the approach comes with a lot of downsides; an obvious sin was the fluid type definitions instead of fixed-size ones. That made sense when some architectures needed 24-bit support, but these days it doesn't matter; hence a lot of modern languages have platform-independent fixed types. Another example from C++ was the lack of binary-compatible contracts; name mangling was a compiler/vendor-specific thing - heck, everyone even implemented their own virtual call structure. Both cases worked against the original intention and restricted the language family on the multi-platform (portability) and performance level instead of helping it.
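As a small illustration of the fixed-size-type point (a generic sketch, not tied to any particular codebase): the <stdint.h> aliases pin down widths that the classic built-in types leave floating.

#include <stdint.h>

long          counter_fluid;   // 'long' may be 32 or 64 bits depending on platform and ABI...
int32_t       counter_32;      // ...whereas this is exactly 32 bits wherever it exists
uint64_t      counter_64;      // exactly 64 bits, unsigned
int_least16_t small_min16;     // at least 16 bits - the old flexibility, without the ambiguity
int_fast16_t  small_fast16;    // the fastest type of at least 16 bits on this platform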
So, long story short, I think it's not a matter of UB being good or bad; it's more a matter that times have changed. In the old days it was required because the compute power was not there. Today Microsoft is wasting millions of dollars every single day because they run LLMs on Python, while even their in-house managed language C# would be around 10% of the cost, and with C++ you could get lower. But they buy a nuclear power plant instead.
Admin
They were, indeed, the days, and I'm glad to be rid of them.
Admin
"The reality is that our safety culture in software is woefully behind the role software plays in society." This is just one way that our profession is still in its infancy. We haven't yet gone through the kind of reflection and self-regulation that lawyers and physicians have, and if we don't do it ourselves, the government will do it for us and probably poorly. This is not a "government = bad" position, but that the legislation and regulations would be written by people who do not understand what we do, and so are incapable of addressing the problems appropriately.
Admin
For things you'd give to free(), yes, but more generally, in C++ there are things called pointers that are not the same size as an ordinary void *, at least:
a. Pointers to functions - they aren't required to be the same size as pointers to data. (Case in point: the Medium and Compact memory models on x86-16, where for Compact, function pointers are 16 bits and data pointers are 32 bits, and for Medium, it's the other way around.)
b. Pointers to member variables or member functions in C++, SomeType SomeClass::*variable and the like. In certain cases, they can have ... complicated internal structure to account for stuff like classes that inherit virtually, and the consequent weird effects that has on object layout.
Admin
Yes, that's why I mentioned the caveats that I did. And passing a pointer-to-member to free() is likely to be "happy fun times" for all. I don't think I want to try debugging those crashes. (You also probably shouldn't be using free() directly in C++ unless you're implementing custom memory handling.)
The different memory model stuff is... unlikely to apply in this case, especially as both POSIX and Windows genuinely don't support that nonsense, and nor do most people on embedded platforms. It's the sort of hackery that doesn't really apply that much now. And passing an actual pointer to a function to free() would take us back into "happy fun time" space.
Admin
"Many programmers derive the major part of their intellectual satisfaction and professional excitement from not quite understanding what they are doing”
Admin
I would suggest that nothing we can do as a program-producing profession will be effective without larger changes and cooperation with other parts of the software economy.
Many of these kinds of issues arise from a simple skill deficit, but many others arise from short-sighted time and money pressure in the management structure. No time for testing, no time for code reviews, no budget for high-skill mentoring.
If an organization ships bad software, the primary fault may be in that management structure.
I don't disagree that a government-imposed solution is unlikely to be effective, but I wonder if customer-driven solutions would be, in some cases?
Admin
The problem is that these are not constants, they are computed properties (to use Swift terminology). I consider myself to be a good programmer, but on a bad day, I can be either or both of those things, depending on such variables as how much beer I drank last night and how much I understand the domain in which I am working.
Admin
There's a joke somewhere in there about the white house and memory issues, but I'm having trouble remembering it right now.
Admin
Yes but no. 32-bit builds of Windows 10 do, indeed, still support running x86-16 code, with all the segmenty memory-modelish horribilicity that that implies. (Whether you're talking about x86-16 for DOS or x86-16 for Win16, it's still x86-16. Windows 11, having dropped support for 32-bit builds, doesn't have to put up with that nonsense.)
Curiously, x86-32's memory model is, aside from some pedantry about the exact numbers in the segment registers, roughly equivalent to the x86-16 Tiny model.
Admin
"This code is bad and its author should feel bad, but it won't crash."
Oh yes it will. Well, that code might not crash, but the code that accesses the arrays you're ALSO freeing will crash.
Admin
Hmm. The whole if statement inside the loop is unnecessary. free() is guaranteed to be safe on a null pointer, and the pointer is either null or not null, so there's no need for the if wrapper at all. The loop should simply iterate and call free() on each pointer. (A good optimizing compiler will probably strip the if out anyway.)
What is missing is the setting of the freed pointer to null after the call to free, because the result of calling free() on already-freed memory is also undefined.
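A hedged sketch of that suggestion, reusing the h and quant4_bias names from the snippet above and assuming the first array really has 4 elements:

for (int i = 0; i < 4; i++) {        // 4 = the declared size of quant4_bias
    free(h->quant4_bias[i]);         // free(NULL) is defined to do nothing, so no if needed
    h->quant4_bias[i] = NULL;        // a later free() on the same slot is now harmless
}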
Admin
I assumed that it was a beneficial (though probably unintentional) side effect. Perhaps originally there was only a single array of 6 elements, and the code worked as intended. But then someone split it into two arrays and forgot to change the freeing code, but it just still happened to work.
Admin
I mean they might want to free both arrays, so iterating over 6 elements is - for now - equivalent to iterating over and freeing the first 4 then iterating over the next 2. Of course there is a worse possibility - that the developer was trying to be "clever" in recognising that they could just run over the end of the first array to access the second.
Admin
Seriously? Nobody? That's not C++ there. free() has no place in C++.
Admin
I'd submit that memory management is an issue for a third type of developer: the one that inherits the code from the sloppy developer.
I can't remember how many hours I spent cleaning up the base EmlenMUD code years ago, so that it was actually possible to keep the MUD running for 60 days on end, instead of it crashing once or twice a day due to memory issues.
Admin
I came here for the same reason and was surprised nobody said that before.
If free is used, you should be in C, not C++...
Admin
If you want a system without UB, feel free to use one. Buy a PDP-11 and install a 70's compiler. Define the behaviour as being whatever the system does (such as the old behaviour that you could continue to use a block after freeing it until you allocated or freed another, because that was just how the allocator happened to be implemented), and you're happy.
One of the main values of UB is to help programmers think carefully about the difference between the accident of current implementation and the true requirements. This means that code written at one point in time has a good chance of working with minor changes decades later when the technology has hugely changed. If anything, C didn't have enough UB, as the lack of stricter limits on aliasing has meant that things like vectorisation are difficult.
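As an aside on the aliasing point, a small generic sketch (restrict is standard C99; __restrict is the common non-standard spelling accepted by C++ compilers such as GCC, Clang and MSVC): promising the compiler that two pointers don't alias is the kind of stricter contract that makes loops like this straightforward to vectorise.

// The no-overlap promise lets the compiler keep src values in registers and
// vectorise the loop; without it, every store through dst could in principle
// modify *src, forcing reloads and defeating vectorisation.
void scale(float *__restrict dst, const float *__restrict src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[i] * 2.0f;
}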
Admin
“The legislation and regulations would be written by people who do not understand what we do, and so are incapable of addressing the problems appropriately.” Unfortunately, the U.S. Supreme Court just cut back sharply on the power of federal agencies to interpret the laws they administer and ruled that courts should rely on their own interpretation of ambiguous laws. So, even if Congress creates an agency that hires software experts to determine software safety requirements, any judge anywhere can override or throw out the regulations for any reason.
Admin
I have exactly one class for memory allocation. It's a template to interface with C. There is no other code in anything I've ever written which uses "new" or "malloc".
X somefunc() {
    sdMem<sometype> x(n);   // C api
    // shove x in a C++ container X
    return X;
}
..typically used for strings sdMem<chartype>..
Oh, they said, that's inefficient! Yeah, like spending man-hours debugging isn't. Modern compilers optimise most, if not all, of it out anyway.
"You can't use macros in C++, that's not the way." Yes you can. Change SD_AP from auto_ptr to unique_ptr - modify two lines of code in decades-old code.
Wrap pointer args in a macro. Type that as religiously as the arg declaration to the function. At the start of the function, SDE_NULL(arg) before doing anything. I used to have that as debug code, but it has long been in release. It's typically one asm conditional branch instruction. The payoff is an exception detailing the filename and line number (man-hours again).
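A hedged guess at what a macro like SDE_NULL might look like - the original implementation isn't shown, so this is only a reconstruction: a single branch in the normal case, and an exception carrying __FILE__ and __LINE__ when a null pointer sneaks in.

#include <stdexcept>
#include <string>

// Hypothetical reconstruction - not the author's actual code.
#define SDE_NULL(p)                                                         \
    do {                                                                    \
        if ((p) == nullptr)                                                 \
            throw std::runtime_error(std::string("null pointer: ") + #p +   \
                                     " at " + __FILE__ + ":" +              \
                                     std::to_string(__LINE__));             \
    } while (0)

void copy_name(const char *name)
{
    SDE_NULL(name);   // one conditional branch when the pointer is fine
    // ... use name ...
}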
Admin
free(0) is a NOP in both C and C++. No test is needed in the first place!
Admin
I'd argue the real WTF is that stack allocation of uint16_t quant4_bias[4][16]; is more memory and speed efficient than the uint16_t* quant4_bias[4][16]; + malloc/free
Admin
Perhaps you'd find it interesting to review the actual behavior of the legal and medical professions. It is almost unheard of for a lawyer or doctor to experience significant consequences for violating professional ethics. They do, however, have some shiny documents that say how ethical their professions are. TLDR: programmers hate to write documentation.
Admin
Trouble is, that kind of approach to "safety culture" doesn't really translate to software. To extend the metaphor, it would be like requiring the same safety features for a supermarket checkout counter as we do for the earth mover.
Software simply has a vastly different idea of "safety". In the real world we're generally prioritizing the safety of people first, of property second and of profits last. In the digital world it's almost exactly the opposite, with only a small number of exceptions such as medical equipment.