Admin
But magic values appearing out of nowhere are what make our job fun...
Admin
Finally a real wtf article.
Admin
This is pretty typical: the boss doesn't understand the problem, but rather than risk looking like they know less than a subordinate, they choose not to get involved.
Admin
If you know the taxrate will be 0.06 - why have it in a configuration file at all? You won't be able to reconfigure it anyway without replacing the test value in the code wherever you test... That imho is the major WTF. That, and the fact that you don't get stack/heap collision errors...
Admin
wow...
if (vtaxrate != 0.06) vtaxrate = 0.06;
?
captcha: paint... yeah, they must have huffed a lot.
Admin
I'm just wondering why they are using an equality to test what appears to be a float...
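It only "works" because the exact same literal appears on both sides of the test. A quick sketch (my own illustration, nothing from the article) of how this kind of comparison falls apart the moment the value is actually computed:

    #include <stdio.h>

    int main(void)
    {
        double rate = 0.01 + 0.05;   /* mathematically 0.06 */

        /* 0.06 has no exact binary representation, so on a typical
           IEEE 754 implementation the computed sum and the literal
           0.06 are two slightly different doubles. */
        if (rate != 0.06)
            printf("not equal: %.20f vs %.20f\n", rate, 0.06);
        return 0;
    }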
Admin
Now that is what I call a real WTF.
Doesn't the compiler detect potential stack overflows, or at least give runtime errors when they occur? Apparently not, I guess, because it would impact runtime performance.
This application very likely runs in the old 8086-style "real mode" which uses segmentation such that you can address >64KB of memory using 16-bit pointers. In this mode you have segments of 64KB starting every 16 bytes, overlapping each other. Hence, you can address 1 MB in total. A 16-bit pointer is always relative to a particular segment. So, I would assume the counter (pointer) within the segment to overflow, and hence start overwriting values at the very bottom of the stack, not the heap. If the "tax" percentage was on the stack somewhere, it could obviously become mangled by this. I doubt it was on the heap though.
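Roughly, the address calculation works like this (just an illustrative sketch of real-mode addressing, obviously not their code):

    #include <stdio.h>

    /* Real-mode 8086: physical address = segment * 16 + offset.
       The offset is only 16 bits, so everything reachable through a
       near pointer stays inside the same 64KB window; push enough onto
       the stack and SP simply walks down into whatever else lives in
       that segment. */
    unsigned long physical(unsigned short segment, unsigned short offset)
    {
        return (unsigned long)segment * 16UL + offset;
    }

    int main(void)
    {
        unsigned short ss = 0x2000;   /* hypothetical stack segment */
        unsigned short sp = 0x0004;   /* SP already near the bottom */

        printf("%05lX\n", physical(ss, sp));                          /* 20004 */
        printf("%05lX\n", physical(ss, (unsigned short)(sp - 16)));   /* wraps to 2FFF4 */
        return 0;
    }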
If the program wouldn't crash this way, it would likely crash later upon returning from a few function calls. (wait...who am I kidding...did they even use functions?)
I may be wrong though; it's been 12 years or so since I last wrote x86 assembler code (speaking of WTFs, I know, I know).
Admin
Looks like I should learn to read. They specifically used the mode where the stack + heap are within the same segment. Arghhh. That was quite unusual even in those times. A simple recompile with the compiler set to have separate stack+heap segments might have fixed at least that part of the WTF, which makes it even worse!
Captcha: sanitarium, indeed.
Admin
simply set project settings to 'large' model.. don't tell management..
Admin
Variables won't. Constants aren't.
Admin
Even in real-mode DOS, you don't have to have a 64KB shared stack and heap! Change the memory model!
Or, failing that, use a real compiler. It shouldn't be that hard to write wrappers for Borland's proprietary stuff.
Admin
In C, if you're using a memory model where pointers default to near, the data segment and stack segment have to be the same. Otherwise taking the address of a stack variable wouldn't work.
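For instance (a made-up fragment; in a near-data memory model a plain pointer is just a 16-bit offset the compiler assumes is relative to DS):

    /* A plain "int *" in a near-data memory model is a 16-bit offset
       that the compiler assumes is relative to DS. */
    static void zero(int *p)
    {
        *p = 0;
    }

    static void caller(void)
    {
        int local;       /* lives on the stack, addressed through SS */
        zero(&local);    /* the offset is only meaningful if DS == SS */
    }

    int main(void)
    {
        caller();
        return 0;
    }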
Admin
Maybe 0.06 just happens to occur exactly in their floating point system ;)
Yeah, this comparison gave me the chills too. Of course, it's just one minor WTF compared to the major problem with the memory management which, if solved, would remove the need for such a comparison.
Admin
Never change a "running" system...
Admin
It was a long time ago, and I'm not sure how much I've forgotten, but changing the memory model was a good way to introduce subtle bugs in a program, especially if it's written by non-C-programmers.
IIRC a whole lot of non-standard rules about pointers pop up on the 8086: there were two or three semantically different kinds of pointers, and up to four distinct address spaces for pointers.
In the tiny model all segment registers point to the same address and all pointers are 16-bit "near" pointers, so the entire program, all of its code, heap, data, and stack must fit in the same 64K. Such code would be fastest (no multi-word arithmetic) and smallest (only 16 bits per pointer) and most portable (the pointer semantics are exactly those mandated by ISO C), but the 64K limit was absolute.
At the other extreme, all pointers were 32 bits (thus using the full 1MB of address space), all pointer arithmetic was expensive (there was multi-word math involved, and comparisons had to take aliasing into account), and there were still gotchas and restrictions (structs always had to be smaller than 64K, and all libraries you link with had to be built with the same memory model).
In between were various hybrid models, most of which changed the default behaviors for pointer types. You could also specify these behaviors explicitly, so speed-critical pointers could be declared "near" while pointers to large data values would be "far". IIRC there were also modifiers to specify where static and global variables were allocated (i.e. whether they consumed space in the precious first 64K of the data segment that could be accessed by near pointers) but maybe I'm thinking of a 6809 instead of an 8086 on that one.
The big problem with the hybrid models is that not all pointers are created equal--if you take a pointer to function and cast it to pointer to character, it might have the wrong address because these pointer types use different segment registers and therefore different address spaces. If you simply cast one pointer type to the other, the compiler usually did the right thing, or at least gave you a compilation error if it couldn't, but it couldn't help you if you had code that did a pointer conversion by casting a pointer to a long, then the long to another pointer.
Arrays were especially interesting since you could in some cases have a far pointer to the first element of the array, but only a near unsigned (!) offset within the array, so strange things happen if you e.g. try to access the 32769th integer in an array, or if you were compiling yacc output which is full of expressions like "foo(yytok[-6],yytok[-8])" that could be interpreted as "foo(yytok[65530],yytok[65528])" if you weren't careful about how yytok was declared.
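To make that concrete, the declarations and the classic foot-gun looked roughly like this (from memory; the near/far keywords are stubbed out here so it still compiles on a modern compiler, and exact behavior varied by compiler):

    /* Stub the old keywords so a modern compiler accepts this; on a DOS
       compiler "near" and "far" actually changed the pointer size. */
    #define near
    #define far

    char near *np;                 /* 16-bit offset, relative to DS */
    char far  *fp;                 /* 32-bit segment:offset         */

    void example(void)
    {
        fp = (char far *)np;       /* fine: the compiler supplies DS */

        /* The dangerous pattern: laundering the pointer through a long.
           The compiler can no longer fix up the segment, so whether the
           rebuilt far pointer points anywhere sensible is pure luck. */
        long laundered = (long)np;
        fp = (char far *)laundered;
    }

    int main(void)
    {
        example();
        return 0;
    }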
Back to this specific case, I wouldn't trust a bunch of ex-COBOL programmers to have vetted their code to make sure it didn't pass a near pointer to a heap object to a function that is expecting a near pointer to a stack object.
I would turn on stack probing, though, since a modern machine should have no problems with an extra few instructions per function call and the error checking is important; however, two months before Y2K I'm not sure what I would have done had any stack/heap collision errors shown up--rewrite the whole thing? Port to a 32-bit compiler? Buy as much fuel and non-perishable food as possible and hide in a bunker until February?
Admin
Sure, and then you don't know what other damage is done by having your heap trashed. At least by aborting you start over with a (hopefully) clean slate.
If the program was structured to be able to handle partial processing of the input, you probably wouldn't have the error happen on the next run.
Admin
What an ABORTion
Admin
They should really market that as a feature of their software. The buzz word "dynamic" comes to mind.
Admin
The clear answer here is to put vtaxrate at the top of the heap, so it explodes first.
This isn't impossible even with modern hardware/software -- I work on an embedded system that has data space at the bottom of a particular memory bank and the stack at the top.
Admin
I have no idea what this heap is or what alloc\delloc does, but I want to play with pointers too mommy. Can I use pointer multiplication mommy pretty please OO
Admin
Or they do understand the problem, but don't care because fixing it would be so much more expensive than it's worth...
Sucks, but hey that's business.
Admin
Medium model: 64KB (2 ** 16 bytes) shared by heap AND stack.
You can use the rest of memory for code. If I remember correctly, each compiled file got its own segment, controlled by the CS (code segment) register, which was set by the far subroutine call.
This is for C programs, of course.
CAPTCHA: muhahaha which is what I was thinking.
Oh, and I better not laugh at this, I was working on a DOS legacy program as late as 2003, with all these problems.
Admin
Interchangeable subroutines aren't.
Admin
OK, so you're recompiling anyway to fix the Y2K issues. In that case, why not change the memory model while you're at it?
What am I missing?
Admin
The original code was COBOL, which doesn't use binary floats. The 0.06 tax rate is a PACKED DECIMAL (BCD) value with 3 digits and a scale of 2. An exact '=' comparison is possible.
These PACKED DECIMALs match base-10 money. This is a major reason why banking still uses COBOL.
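For those who never had the pleasure: packed decimal stores two decimal digits per byte plus a sign nibble, so there's no rounding anywhere and equality really is exact. A rough sketch in C (my own toy encoding, not actual COBOL internals):

    #include <stdio.h>
    #include <string.h>

    /* Toy packed-decimal type: 3 decimal digits with an implied scale
       of 2 (i.e. PIC 9V99), two digits per byte, sign nibble 0xC for
       positive.  0.06 -> digits 0,0,6 -> bytes 0x00 0x6C. */
    typedef struct { unsigned char b[2]; } pd_9v99;

    static pd_9v99 pd_from_hundredths(int v)     /* 6 means 0.06 */
    {
        pd_9v99 p;
        p.b[0] = (unsigned char)(((v / 100) << 4) | ((v / 10) % 10));
        p.b[1] = (unsigned char)(((v % 10) << 4) | 0x0C);
        return p;
    }

    int main(void)
    {
        pd_9v99 taxrate  = pd_from_hundredths(6);   /* 0.06 */
        pd_9v99 expected = pd_from_hundredths(6);

        /* No rounding anywhere, so equality is a plain byte compare. */
        printf("%s\n", memcmp(&taxrate, &expected, sizeof taxrate) == 0
                       ? "equal" : "not equal");
        return 0;
    }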
Admin
I bet you won't be surprised: a famous German bank (with a small electronics department) suffered from the same thing back in 1995:
http://www.khd-research.net/Bahn/PCR/PC_and_Railways_1.html
Read 17.03.1995 and the next two articles on this topic...
Admin
Captcha: cognac (but I kept typing the missing "i")
Admin
This reminds me of how I've seen Visual Studio.net occasionally decide that Boolean TRUE was really Boolean FALSE.
("If [thing that evaluates to true, and the runtime even tells me it's true] = FALSE then [code]" will step into [code] even though I watch it see that the expression is TRUE.
This is, of course, a signal that it's time to restart the environment because something is Deeply Wrong.
Fortunately whatever we were doing that was causing that, we seem to have stopped doing, because it hasn't happened in a while.)
Admin
Well, that's obviously BS. There are plenty of industrial-strength programming languages out there that support user-definable classes, operator overloading, and the notion of a library. That's just one combination of ingredients you might need to support packed decimals, and that library only needs to be written once to be used a million times.
Admin
Large DOS programs tended to be rather fragile with regard to the memory model.
In particular, there might be assumptions about the size of a pointer (2 bytes for data, 4 bytes for function) in the code.
Admin
I also remember that we really tried to avoid writing C code for DOS. Installing Linux 0.99 using the very first distros was so much simpler than comprehending what was going on in the memory model (let alone using a DOS extender...)
Admin
Perhaps you accidentally used = instead of == or something like that. VS is not that buggy in my experience; it seems more reasonable that there was a typo in your code.
Admin
If [thing that evaluates to true, and the runtime even tells me it's true] = FALSE
Admin
Ahhh stack bugs.
Reminds me of an interesting Windows GDI bug I had to track down a few years ago. The problem was that some controls on a dialog wouldn't draw in debug builds (but would work just fine in release).
Turns out, in the good ol' days GDI would store a struct at the bottom of the stack (well, rather at the start of the 16K chunk of memory holding the stack). I can't remember what the struct was for, but I THINK it had info on the stack size. During drawing, GDI would request a "fast alloc". If there was room on the stack, it would just grab it from the stack (you know, just decrease the stack pointer). But if there wasn't room, it would do a normal alloc.
So what's the bug? Well, nowadays we don't have a stack size limit, but GDI still checks this ONE member of this struct that lives at the 16K boundary. When GDI does a thunk, it writes the necessary value into this location.
I don't remember exactly how to make this occur, but what you have to do is grow the stack enough that it falls into another 16K chunk, then pop back off the stack so it goes back to the previous 16K chunk, THEN make a GDI call. (Actually I don't think you make the GDI call yourself; rather you return from the window proc, which makes the GDI call.)
What happens is: if you grow the stack and write stuff to it, you overwrite this magical location. That's OK, because now the bottom of the other 16K chunk holds the magical value. But when you then pop stuff off the stack, the magical location remains unchanged (and no longer holds a magical value). If this magical location has a "bad" enough value, the next GDI call will fail; well, they all will, until another thunk occurs. And you end up with a dialog with no text in the controls.
The interesting thing is that this never happened in release builds, only debug. The magical value had to be "large enough", and in our debug builds we would fill the stack with filler bytes (typically 0xCC or something), which would trigger the issue!
This bug was in WinXP, and probably in Vista as well.
Admin
Aaaaaarrgggh. [More screams of pain deleted].
If [Thing that evaluates to true] = FALSE Then .... IS NOT THE SAME THING AS If Not [Thing that evaluates to true] Then ....
Which is probably what you meant.
Adelle.
Admin
if (vtaxrate != 0.06) vtaxrate = 0.06;
You're not understanding. A bad tax rate was an indication that all sorts of things might be screwed up. I think it's impressive that the submitter figured out the real problem rather than the "obvious" brute force fix.
Admin
BCD, huh? Boy, that does take me back. My first venture into computer science included drawing a logic diagram for a BCD adder circuit. All on paper, of course, so I had no way to actually test the soundness of the design, but it was quite a fun mental exercise.
Admin
Well, at first I thought: if you know it should be 0.06, why not make it a constant? Then it won't be stored in the 'data' memory, but rather in the code itself, so no overwriting should occur.
It took me a moment to realize that this if() was probably added for the sole purpose of detecting the stack/heap collision - I'd prefer a better error message, though.
Still, they should make it constant, remove it from the config, and either add better and self-explanatory tests, or just move to the year 2001, where we use all the memory available :)
Admin
Isn't this what C's "volatile" keyword is for - variables which can change due to external factors?
PS: Reading all the posts from people who're desperately trying to remember about 8086 segments:
I'm pretty sure there's a memory model which stores all pointers as four bytes - segment + offset. This allows you to take the address of stack variables, etc., and would fix the problem.
Admin
I think one of the comments did mention a 32-bit pointer.
Admin
I guess the programmer tried to do the Right Thing by making the program configurable. But when the program started crashing, and the managers started screaming, he just did a quick and dirty fix to get it done.
Admin
Actually I think the error message was "the real WTF" (tm) here. It should have said something like "Restart the program, heap is fucked up". And the value they used to check whether everything was properly initialised or not is the tax rate. Could have been another one. For all you know, other values are messed up.
Of course, they should fix the initialisation instead of this cheap workaround. But we're living in the real world...
Admin
@ Doubts
No, I think the programmer didn't figure out the problem itself. He only figured out that sometimes the value suddenly changed without explanation, hence the hacked test.
Had he guessed what caused that spooky value change, he would have used a better method. In such a situation, if there's no time or possibility to adapt the code to a better memory model, one shouldn't check the parameters (and thus make them constant); one should use stack-smashing protection: you put a known value at a critical location. In this situation, the programmer should have put an additional variable on top of the heap (say, a volatile char array holding the string "Canary") and, at each critical place (before/after calling subroutines), check whether the content of the canary has changed. It's a little more efficient than constantly checking the heap and stack pointers (as proposed by another reader), but the job still gets done.
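Roughly like this (names made up, and on a real DOS build you'd want the buffer placed at the actual top of the heap; here it's just a static buffer, so treat it as a sketch of the idea):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* A known pattern parked in front of the data you care about; if the
       stack grows down into it, the check below notices before the
       program silently keeps computing with garbage. */
    static volatile char canary[8];

    static void arm_canary(void)
    {
        memcpy((void *)canary, "Canary!", 8);
    }

    static void check_canary(const char *where)
    {
        if (memcmp((const void *)canary, "Canary!", 8) != 0) {
            fprintf(stderr, "heap/stack collision detected near %s - restart\n", where);
            exit(EXIT_FAILURE);
        }
    }

    int main(void)
    {
        arm_canary();
        /* ... do a chunk of work, call subroutines ... */
        check_canary("main loop");
        return 0;
    }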
@ Dan P
And thus use an overblown amount of space and processing power to do something that exists as plain stupid x86 CPU opcodes. What you propose is exactly like using complex libraries and operator overloading just to do floating point in software, when there's a perfectly good implementation in the FPU part of the processor that could do the job and which is supported by a myriad of languages. The logic is: a CPU can do base-10 math. In banking, they need base-10 math. Just pick a language that can do base-10 math using the BCD opcodes instead of hacking up some kludge.
@ Joce :
No. Not at all. Volatile is for variables that are shared between different tasks. It forces the compiler to always read the variable from memory instead of caching it in a register, because some background process (say, a hardware interrupt) may have changed it in the meantime.
In this situation, it's a variable that shouldn't change, but that gets overwritten when the heap and stack collide.
And this is exactly why they can't just change it at will without testing: because now all pointers will be 4 bytes instead of 2, and the upper half isn't actually a memory location but a segment identifier (4-byte far-pointer math != 4-byte 32-bit flat-pointer math). You can be sure there are several locations in the code that assume either a 2-byte pointer size, or that assume you can shift/multiply pointers (you can't; you can only multiply/shift the lower half, and if it overflows, correctly update the segment). Given the short time frame, it's better that they left it as it was. The only thing they could do is:
Admin
And overwrite the stack???
However, I can't understand why they chose such a memory model. Stack and data should be in different segments, and the heap should use the rest of the 1 MB of memory, outside both of them. If Turbo Pascal could do it, then Turbo C could too.
Admin
It's not "business". It's stupidity - no question.
Well, ok, unless the output of the program is never used, in which case it's ok for it to be a random number generator.
Unless you have psychic powers that guarantee no other data is ever corrupted, the fact that your memory is being corrupted in a way that no one actually understands proves that there will be other variables getting corrupted but not detected. At an absolute minimum, even if the problem isn't fixed, a competent person would verify the source of the memory corruption and, when it turns out to be the stack, add code to detect instances of the stack corrupting the heap. This would, at least, ensure that the output is valid.
Admin
No - there was an actual bug in the Visual Studio "unmanaged code to managed code" compilation. You would return true from an unmanaged C++ method and it would magically change to false when being tested by the .NET (say, C#) code. Fixed with a service pack...