The Daily WTF: Curious Perversions in Information Technology

2007-04-13 Reply Admin

I'm suffering form mental diarrhea.... this has completely obliterated my mind.

captcha: alarm... sure!

2007-04-13 Reply Admin

But that's a very nice technique. In C, in constrained memory situations, where you can't afford to waste eight bytes or so by allocating two chunks of memory and having to store an extra pointer. I think I used it last, oh, 1995 coding C on an Amiga.

In C++ and the typical environments where that is used, it's a bit more ridiculous, but there's not enough context here to safely scoff at it.

2007-04-13 Reply Admin

'summand'?

I guess this guy was too clever to just create a generally useful linked list node when he has a collection of things to add up...

Faxmachinen · 2007-04-13 Reply Admin

This post was supposed to read "THIS POST MUST BE FIRST", but then Rafael Larios had to go and break it. From now on, Worse Than Failure will only be serving recipies involving ketchup.

2007-04-13 Reply Admin

I wish I could say this is not common practice, but it's made it into the dirent structure in the POSIX(?) spec.

2007-04-13 Reply Admin

Ok.

I don't see what is wrong. Isn't that a typical variable-sized structure?

The only wtf is those poor souls who can't understand the concept.

Captcha: onomatopopepia... or something like that

2007-04-13 Reply Admin

Yeah, that's not too bad. It can be nice to just have an array inside the structure, so you just have to malloc or free one thing at a time (and increase space efficiency too, although I doubt that that would matter normally). Of course, then you have to start using "malloc" instead of "new" in C++ code, and worry about lower-level things than you otherwise would have to. In any case, I know that the linux kernel uses this technique.

Volmarias · 2007-04-13 Reply Admin

This line must be last!

Ah fuck.

2007-04-13 Reply Admin

I've seen cases where this can be useful, but it certainly is an invitation to read random memory and for memory corruption. But the size is nowhere in the structure, meaning it is even easier for these problems to occur. (I don't usually post captchas, but ewww was too good)

2007-04-13 Reply Admin

This ABSOLUTELY MUST BE THE LAST COMMENT. Post another and everything could come crashing down!

asuffield · 2007-04-13 Reply Admin

There is nothing wrong with this - it is not only standard practice, but it is explicitly blessed by the C specification itself as the correct method for creating variable-size arrays before C99.

In C99, you use [] instead of [1] and the compiler will check for you that it is the last member in the struct.

2007-04-13 Reply Admin

In C code, there's nothing wrong with this technique. C is a low-level language (consider things like offsetof), and there's nothing wrong with using it as such, as long as you know what you're doing.

In C++, it is more likely to be bad form, although if it's a POD class, and there is reason to use something like that, it isn't really a WTF - using something like this in a non-POD class would be a WTF.

It might also be a slight WTF if the particular program didn't need the memory-efficiency or low-level control - but that would only be the usual case of premature optimization.

I'm sure everybody can agree that when, say, using C to implement the runtime for a high-level language, a reasonable string representation might be something like:

struct string { size_t length; char data[1]; };

2007-04-13 Reply Admin

So? This is just the "struct hack", which has been used in C since the dawn of time.

It does depends on how he uses it and how hard such a hack was needed, of course. And compilers have been getting too smart for some undefined code lately.

Strictly speaking record->myOrd[3] is undefined even if allocated, since the compiler is allowed to know that myOrd only has 1 element. The safe way to use it (when enough is allocated) is, er, this, I think:

((OrdArith::OrdElem *) ((char *)record + offsetof(struct summand, myOrd[0])))[3]

...at least if C++ works like C in that respect.

And the hack is now officially supported in C99 if you remove the index size in the declaration, but not in C++98.

2007-04-13 Reply Admin

How is portability a "myth"? The mythical aspects of the software I'm using to write this post seems to carry with them pretty tangible, real benefits.

2007-04-13 Reply Admin

While valid C, I certainly hope no one ever derives a class from that... those new class members gotta go somewhere, and it's certainly not BEFORE the old ones in more implementations.

2007-04-13 Reply Admin

This is a pretty standard declare for a variable sized structure... The benefit of having an array of [1] over having a pointer is that you'll have the structure in contiguous memory, which is typically how these things are returned from the database, files or whatever.

The only thing a bit strange is there is no length given in the structure telling you how long your dynamic structure is, but I'll assume that the length can be obtained through OrdArith::OrdElem.

Just one example of a variable length structure in common use is here: http://publib.boulder.ibm.com/infocenter/iseries/v5r4/topic/apiref/conParameter.htm?resultof=%22%76%61%72%69%61%62%6c%65%22%20%22%76%61%72%69%61%62%6c%22%20%22%6c%65%6e%67%74%68%22%20%22%73%74%72%75%63%74%75%72%65%22%20%22%73%74%72%75%63%74%75%72%22%20

Anyways, it looks OK to me. Seems like another non-wtf.

2007-04-13 Reply Admin

mav:
This is a pretty standard declare for a variable sized structure... The benefit of having an array of [1] over having a pointer is that you'll have the structure in contiguous memory, which is typically how these things are returned from the database, files or whatever.
The only thing a bit strange is there is no length given in the structure telling you how long your dynamic structure is, but I'll assume that the length can be obtained through OrdArith::OrdElem.

Just one example of a variable length structure in common use is here:

Anyways, it looks OK to me. Seems like another non-wtf.

Whoops, I gave the wrong link. Try this.

http://publib.boulder.ibm.com/infocenter/iseries/v5r4/topic/apiref/error.htm

2007-04-13 Reply Admin

The real WTF is "why didn't the poster know about this standard technique?".

It's perfectly reasonable in C and is, for compatibility reasons, acceptable in C++ as long as the struct is just "plain old data". (It may be considered 'poor style' in C++ by std::string afficionados, but I fart in their general direction - actually, they're usually right, but not always).

I use it myself for storing highly-replicated data. Consider a cache of filenames, for example. They're up to 64K bytes long in Windows, so you don't want to allocate a maximally-sized array. If you're going to store tens of thousands of the buggers, there might be some benefit in having the name contiguous with the rest of the structure rather than indirected, thus requiring a separate heap allocation.

2007-04-13 Reply Admin

asuffield:
There is nothing wrong with this - it is not only standard practice, but it is explicitly blessed by the C specification itself as the correct method for creating variable-size arrays before C99.
In C99, you use [] instead of [1] and the compiler will check for you that it is the last member in the struct.

I was going to say exactly this. I will now go away and come crashing down.

bobday · 2007-04-13 Reply Admin

the real wtf is all too often a meta-wtf these days:
How is portability a "myth"? The mythical aspects of the software I'm using to write this post seems to carry with them pretty tangible, real benefits.

Get your head in the game

2007-04-13 Reply Admin

This is standard way to perform unbound allocation in C. It is supported by the standard. It seems to me that quite a few WTF recently have been the submitter not understanding the code.

What is next? Putting some of the inner of a device driver for embedded hardware and complaining that it access memory with fixed addresses ? Showing this code:

float SquareRootFloat(float number) { long i; float x, y; const float f = 1.5F;

x = number * 0.5F;
y  = number;
i  = * ( long * ) &y;
i  = 0x5f3759df - ( i >> 1 );
y  = * ( float * ) &i;
y  = y * ( f - ( x * y * y ) );
y  = y * ( f - ( x * y * y ) );
return number * y;

}

as a WTF ? (For the uninformed, this is the square root implementation of quake 3. And yes, it is faster that the fsqrt assembly instruction).

2007-04-13 Reply Admin

Am I the only one that noticed that the 'public' keyword was applied to a 'struct', where (unless the rules have changed) all members of a struct in C++ are public by default? Things that make you go hrmmmm.....

2007-04-13 Reply Admin

its actually a very common technique in C when you need to have variable length structures and dealing with libraries. For example, thats how PostgreSQL handles variable length data types. First a 4-byte signed integer to indicate length in bytes and then a binary blob to hold the actual data. I still can't think of why this would be useful an any language with variable length data types.

2007-04-13 Reply Admin

Last Comment!

jlevin · 2007-04-13 Reply Admin

I think that code looks fine. The daily WTF really has been getting worse lately. The name change was only one step in that direction, the entries follow the same trend.

I have seen several well designed APIs that use that convention. The only example I can find right now is Microsoft's DRIVE_LAYOUT_INFORMATION_EX ( http://www.osronline.com/ddkx/storage/k306_6cvm.htm ), but there are many more.

2007-04-13 Reply Admin

Former Jr. Programmer:
Last Comment!

With sinking heart, I sense a new WTF meme....

Thief^ · 2007-04-13 Reply Admin

Fred:
This is standard way to perform unbound allocation in C. It is supported by the standard. It seems to me that quite a few WTF recently have been the submitter not understanding the code.
What is next? Putting some of the inner of a device driver for embedded hardware and complaining that it access memory with fixed addresses ? Showing this code:

float SquareRootFloat(float number) { long i; float x, y; const float f = 1.5F;
x = number * 0.5F;
y  = number;
i  = * ( long * ) &y;
i  = 0x5f3759df - ( i >> 1 );
y  = * ( float * ) &i;
y  = y * ( f - ( x * y * y ) );
y  = y * ( f - ( x * y * y ) );
return number * y;
}

as a WTF ? (For the uninformed, this is the square root implementation of quake 3. And yes, it is faster that the fsqrt assembly instruction).

I read that as "this code accesses memory directly", which it doesn't.

It is a wtf that that code is faster than the assembly square-root instruction though.

EDIT: I also can't figure out how it works :-S

DaveAronson · 2007-04-13 Reply Admin

UnwiredBen:
While valid C, I certainly hope no one ever derives a class from that...

My C++ is kinda rusty, but note that it said "struct" not "class". Can you derive a class from a struct? I wouldn't think so, but C++ has a lot of things I don't think make sense. :-)

On the other claw, you are right if you rephrase it as "ever includes it directly in another struct or class". Doing so could be OK so long as it is, again, the very last thing. However, this severely limits reusability since you can't use both it and another element that needs to be last!

snoofle · 2007-04-13 Reply Admin

I once inherited an implementation of a pseudo random number generator built with this technique. They put a single struct like this on the base of the stack at the start of the program, and then indexed up the stack to get some 'random' values. Supposedly, it took them a bit of trial and error to get the typical stack depth, so they knew how far to index.

The numbers weren't too random (for some <sarcasm>inexplicable reason</sarcasm>, the code behaved in a predictable manner and the numbers on the stack were surprisingly consistent). I wound up replacing it with something a bit more standard...

Last!

iwpg · 2007-04-13 Reply Admin

Anonymous Coward:
Am I the only one that noticed that the 'public' keyword was applied to a 'struct', where (unless the rules have changed) all members of a struct in C++ are public by default? Things that make you go hrmmmm.....

STFU. That's like complaining that someone wrote braces round a single-statement loop body.

Thief^:
It is a wtf that that code is faster than the assembly square-root instruction though.

It's faster because it's approximate, but good enough for a game. The assembly instruction is general purpose, and has to take more time to get an accurate answer.

2007-04-13 Reply Admin

Thief^:
Fred:
This is standard way to perform unbound allocation in C. It is supported by the standard. It seems to me that quite a few WTF recently have been the submitter not understanding the code.
What is next? Putting some of the inner of a device driver for embedded hardware and complaining that it access memory with fixed addresses ? Showing this code:

float SquareRootFloat(float number) { long i; float x, y; const float f = 1.5F;
x = number * 0.5F;
y  = number;
i  = * ( long * ) &y;
i  = 0x5f3759df - ( i >> 1 );
y  = * ( float * ) &i;
y  = y * ( f - ( x * y * y ) );
y  = y * ( f - ( x * y * y ) );
return number * y;
}

as a WTF ? (For the uninformed, this is the square root implementation of quake 3. And yes, it is faster that the fsqrt assembly instruction).
I read that as "this code accesses memory directly", which it doesn't.

It is a wtf that that code is faster than the assembly square-root instruction though.

EDIT: I also can't figure out how it works :-S

Oh, I wasn't clear. I talked about two separate examples. A device driver that access memory directly, OR the sqrt code from q3a.

You can get start there to get explanation of the code http://www.codemaestro.com/reviews/review00000105.html (see pdf linked at the end). But you'll need some hard numerical algo knowledge. Also, I think the code is not from carmack, and can be traced to someone else.

The point was: it is not because one doesn't understand the code that it is a wtf.

2007-04-13 Reply Admin

DaveAronson:
UnwiredBen:
While valid C, I certainly hope no one ever derives a class from that...
My C++ is kinda rusty, but note that it said "struct" not "class". Can you derive a class from a struct? I wouldn't think so, but C++ has a lot of things I don't think make sense. :-)

Sure you can (and sure, C++ has a lot of stuff that don't make much sense).

On the other hand, this evil programmer may probably need to get a small bullet in the head. To be sure he won't add anything after his "this must be the last" comment.

2007-04-13 Reply Admin

And suddently I feel sick ... Cough how can one possybly writhe ... that ... Yirk !

2007-04-13 Reply Admin

Fred:
as a WTF ? (For the uninformed, this is the square root implementation of quake 3. And yes, it is faster that the fsqrt assembly instruction).

Only on the hardware of the time. On modern processors, the fsqrt instruction is faster (somebody benchmarked this, but I don't have a link handy).

2007-04-13 Reply Admin

The REAL wtf is that the programmer added a comment to explain what he was doing and why. Everyone knows REAL programmers don't use comments, especially when they do anything "clever". :-)

2007-04-13 Reply Admin

Fred:
This is standard way to perform unbound allocation in C. It is supported by the standard. It seems to me that quite a few WTF recently have been the submitter not understanding the code.
What is next? Putting some of the inner of a device driver for embedded hardware and complaining that it access memory with fixed addresses ? Showing this code:

float SquareRootFloat(float number) { long i; float x, y; const float f = 1.5F;
x = number * 0.5F;
y  = number;
i  = * ( long * ) &y;
i  = 0x5f3759df - ( i >> 1 );
y  = * ( float * ) &i;
y  = y * ( f - ( x * y * y ) );
y  = y * ( f - ( x * y * y ) );
return number * y;
}

as a WTF ? (For the uninformed, this is the square root implementation of quake 3. And yes, it is faster that the fsqrt assembly instruction).

Last time I checked, C didn't have the "public" keyword. Maybe that's a hint, something like "maybe the used language is C++?".

And, for the "uninformed", this implementation is faster than the CPU's fsqrt instruction, but slower than the inverse square root instruction you'll find in the multimedia oriented SIMD instruction sets. Especially since you don't have to tackle with int-to-float conversions.

And it works because of the careful choice of the magic constant. A quick search on Carmacks inverse square root (because that's what the algorithm computes, as the "return number * y" statement clearly says) will give you the reasonning behind this ugly-but-clever piece of code (not that the code is from Carmack himself. He borrowed it from who knows who, last time I checked).

That's all I had to say. Yummy.

savar · 2007-04-13 Reply Admin

JustSomeone:
I'm sure everybody can agree that when, say, using C to implement the runtime for a high-level language, a reasonable string representation might be something like:
struct string { size_t length; char data[1]; };

Why char[1] instead of char*?

2007-04-13 Reply Admin

Thief^:
Fred:
This is standard way to perform unbound allocation in C. It is supported by the standard. It seems to me that quite a few WTF recently have been the submitter not understanding the code.
What is next? Putting some of the inner of a device driver for embedded hardware and complaining that it access memory with fixed addresses ? Showing this code:

float SquareRootFloat(float number) { long i; float x, y; const float f = 1.5F;
x = number * 0.5F;
y  = number;
i  = * ( long * ) &y;
i  = 0x5f3759df - ( i >> 1 );
y  = * ( float * ) &i;
y  = y * ( f - ( x * y * y ) );
y  = y * ( f - ( x * y * y ) );
return number * y;
}

as a WTF ? (For the uninformed, this is the square root implementation of quake 3. And yes, it is faster that the fsqrt assembly instruction).
I read that as "this code accesses memory directly", which it doesn't.

It is a wtf that that code is faster than the assembly square-root instruction though.

EDIT: I also can't figure out how it works :-S

Google the hex number and you'll find some articles explaining how it works.

It is NOT faster, as far as I know, on modern CPUs. It only provides a tangible benefit when:

a) You don't need exact accuracy (its about 1.7% off) b) You're on a processor where floating point operations are slow.

I've tested it on modern CPUs and it is slightly slower than the built-in sqrt function.

2007-04-13 Reply Admin

savar:
JustSomeone:
I'm sure everybody can agree that when, say, using C to implement the runtime for a high-level language, a reasonable string representation might be something like:
struct string { size_t length; char data[1]; };

Why char[1] instead of char*?

Because char* requires a separate allocation, while char data[1] will be contiguous memory. You do, after all, only want to malloc once. :-)

Thief^ · 2007-04-13 Reply Admin

savar:
JustSomeone:
I'm sure everybody can agree that when, say, using C to implement the runtime for a high-level language, a reasonable string representation might be something like:
struct string { size_t length; char data[1]; };

Why char[1] instead of char*?

Because char* would mean that you'd have to allocate a separate external-to-the-structure array. The idea is that you do (struct string*)malloc(sizeof(size_t)+length*sizeof(char)), and then you can just access the elements of the array as if it was char[length] instead of char[1].

Makes passing the struct to functions a pain though, because pass-by-value would only pass the first char, because that's all the struct definition says it contains.

2007-04-13 Reply Admin

There's a clue for you in the non-C pubic keyword, meaning it has to be compiled with a C++ compiler, and will be treated as a class with all members public. Unlike C, in C++ there is no requirement that members of classes are contiguous (which is how VS2003 can get away with extending managed classes in C++), though usually they are, and are allocated in the order presented in the source file (otherwise it might have to be first).

With a compiler where you know the layout, using a reinterpret cast or operator new on some memory you've allocated yourself to create a pointer to this type is safe. But not portable.

I'd imagine the length of p->myOrd is given by (reinterpret_cast<void*>(p->myNext)-reinterpret_cast<void*>(p->myOrd)-sizeof(summand*))/sizeof(OrdArith::OrdElem ), or if the sizes of the types are the same: (p-p->next-1).

So maybe the WTF is that a C programmer has got his hands on a C++ compiler and not noticed.

cparker · 2007-04-13 Reply Admin

OrdArith::OrdElem myOrd[1]; // myOrd

Ah, I see. Thanks for clarifying that for me. I wouldn't have guessed "myOrd" from "myOrd[1]". You're a real pal.

2007-04-13 Reply Admin

This is totally off topic, but I was listening to random music on my fairly large mp3 drive when I realized that no matter how happy I am, I'm only 1 Emmylou Harris song away from depressed.

CAPTCHA: (smile) I would, if it wasn't for that damned Emmylou.

2007-04-13 Reply Admin

Pete Kirkham:
Unlike C, in C++ there is no requirement that members of classes are contiguous (which is how VS2003 can get away with extending managed classes in C++), though usually they are, (...)

Nonsense. In both languages, members are required to have proper alignment just like other data. Which means the standard can _not_ require them to be contiguous.

With a compiler where you know the layout, using a reinterpret cast or operator new on some memory you've allocated yourself to create a pointer to this type is safe. But not portable.

Safe and portable, as long as you don't depend on more layout than the standard requires, and use code which doesn't allow the compiler to "know" that there is just one member.

I'd imagine the length of p->myOrd is given by (reinterpret_cast<void*>(p->myNext)-reinterpret_cast<void*>(p->myOrd)-sizeof(summand*))/sizeof(OrdArith::OrdElem ), or if the sizes of the types are the same: (p-p->next-1).

No. The compiler - and sizeof()- doesn't know that the programmer uses another size than the structure definition imples. The programmer has to take care of the size himself.

drfloyd5 · 2007-04-13 Reply Admin

I can lurk no longer. I made an account just to make this post in person.

The CodeSOD now officially blows. I will use this 1 WTF to prove it.

The WTF isn't.
There are 30 posts explaining in non-unique ways why #1 is true.

The signal to noise ratio of the CodeSOD WTFs is pretty crappy. And the comments s/n ratio is even worse. So all in all the whole CodeSOD is a useless read.

Now, believing that one should not critize without offering a suggestion: Simple, Change the CodeSOD to the CodeSOW. Just one a week. That would releive the pressure to release sub-standard CodeS just to fill todays slot. Complex, Create a Which Code is the Bigger Failure Challange. Present two snippits of code, user vote determines wich ones suck more. Winers go on to fight last weeks winners, and so forth.

-DF5

2007-04-13 Reply Admin

iwpg:
Anonymous Coward:
Am I the only one that noticed that the 'public' keyword was applied to a 'struct', where (unless the rules have changed) all members of a struct in C++ are public by default? Things that make you go hrmmmm.....
STFU. That's like complaining that someone wrote braces round a single-statement loop body.

Hey now! I actually like putting braces around single-statement loop bodies. You never know if or when you're gonna want to add other stuff later.

I just think the effort of adding the public specifier to a struct, along with the adorning comments is just dumb.

sibtrag · 2007-04-13 Reply Admin

I am reminded of the first bug I investigated during my limited career as an OS developer as an undergraduate intern at a long-gone computer company in the early 80s.

I was left pretty much on my own to figure out how to build the OS after my code finally went through the assembler. Having done so, I waited for my time slot to use the machine for testing. This was a big iron machine, so the little development office I worked in only had one...we used it for coding, building & OS testing. Thus, testing came in time slots allocated before & after the normal business day.

My code quickly hit an illegal instruction trap & dropped me into the rudimentary debugger. When I looked around, all my code was gone....replaced by 0s. My mentor & I looked around to try to figure out what happened to the code when he noticed that the "OSEND" symbol was not at the end of the OS.

It was then revealed to me that I had broken the one rule about building that OS: The OSEND module must be at the end of the list of modules on the link list. The memory initializer started clearing memory at that symbol and thus wiped out much of the OS.

Happily, that could be disabled with a simple patch so that I could find my second bug.

2007-04-13 Reply Admin

Well it was clear to me that it wasn't a WTF once I saw "impunity" in the comments.

Not having done C in a while, I must say that I miss hacks like that..

2007-04-13 Reply Admin

The real WTF is he's not declaring it:

typedef std::list< std::vector< OrdArith::OrdElem > > summand;

(unless there's more to the class that has been commented-out)

Also, thanks to the poster above that mentioned C99 codifies the variable struct trick with elem[] instead of elem[1]. That's quite cool.

2007-04-13 Reply Admin

This isn't a WTF. In fact, look at the implementation of any STL std::string class in C++. It is what makes an efficient implementation versus one that fragments the hell out of your heap by requiring you to allocate 4-byte slots for pointers to pointers to data.

From GNU std::string:

      struct _Rep_base
      {
    size_type       _M_length;
    size_type       _M_capacity;
    _Atomic_word        _M_refcount;
      };

      struct _Rep : _Rep_base
      {
    // Types:
    typedef typename _Alloc::template rebind<char>::other _Raw_bytes_alloc;

    // (Public) Data members:

    // The maximum number of individual char_type elements of an
    // individual string is determined by _S_max_size. This is the
    // value that will be returned by max_size().  (Whereas npos
    // is the maximum number of bytes the allocator can allocate.)
    // If one was to divvy up the theoretical largest size string,
    // with a terminating character and m _CharT elements, it'd
    // look like this:
    // npos = sizeof(_Rep) + (m * sizeof(_CharT)) + sizeof(_CharT)
    // Solving for m:
    // m = ((npos - sizeof(_Rep))/sizeof(CharT)) - 1
    // In addition, this implementation quarters this amount.
    static const size_type  _S_max_size;
    static const _CharT _S_terminal;

    // The following storage is init'd to 0 by the linker, resulting
        // (carefully) in an empty string with one reference.
        static size_type _S_empty_rep_storage[];

        static _Rep&
        _S_empty_rep()
        {
      void* __p = reinterpret_cast<void*>(&_S_empty_rep_storage);
      return *reinterpret_cast<_Rep*>(__p);
    }

Must Be Last

Leave a comment on “Must Be Last”