The Daily WTF: Curious Perversions in Information Technology

2007-04-13 Reply Admin

Not only is putting a variably-sized array at the end of a struct a common trick in C, the gcc compiler has an extension (not used here) for the sole purpose of making that trick easier: zero-length arrays. There's a slight issue in that the programmer is using C idioms in a C++ project, and he's probably in for constructor-related troubles, but that doesn't qualify as a wtf.

2007-04-13 Reply Admin

DaveAronson:
UnwiredBen:
While valid C, I certainly hope no one ever derives a class from that...
My C++ is kinda rusty, but note that it said "struct" not "class". Can you derive a class from a struct? I wouldn't think so, but C++ has a lot of things I don't think make sense. :-)

The main (only) difference between a struct and a class in C++ is that a struct's members default public and classes members default private.

2007-04-13 Reply Admin

savar:
JustSomeone:
I'm sure everybody can agree that when, say, using C to implement the runtime for a high-level language, a reasonable string representation might be something like:
struct string { size_t length; char data[1]; };

Why char[1] instead of char*?

In addition to avoiding overhead of allocating memory external to the struct (As has been mentioned), you can also avoid copying data around unnecessarily. Think of dealing with data that has some kind of fixed-width header followed by a body whose size is indicated in the header, like a file record, or a network packet.

You can cast a pointer to the raw network packet to a struct like this, access the required fields, and manipulate the body without allocating separate space for the header and payload, and copying stuff around. This is especially handy if your packet is layered -- something like TCP in IP in an Ethernet frame. Once you know your ethernet frame contains an IP packet, you can simply cast the pointer to the body to a pointer of type "struct my_ip_packet", etc., and go from there without copying anything.

2007-04-13 Reply Admin

Quite true. Of course just because you can do something doesn't mean you should. You could make all your classes structs (make the members private) and have functions in them too. Same diff, makes your code a little more difficult to understand, but you want to make yourself indispensable right?

AnthonyG · 2007-04-13 Reply Admin

Cryptic struct names+unneeded public access specifier(C++ structs default to public)+linked list= Fun :)

2007-04-13 Reply Admin

dave:
(It may be considered 'poor style' in C++ by std::string afficionados, but I fart in their general direction - actually, they're usually right, but not always).

No, we are always right.

If it seems like we're not right, it's because you forgot to provide a partial template specialization of a member template function of a template class you've never heard of.

Somewhere.

In your code.

;-)

2007-04-13 Reply Admin

iwpg:

Thief^:
It is a wtf that that code is faster than the assembly square-root instruction though.
It's faster because it's approximate, but good enough for a game. The assembly instruction is general purpose, and has to take more time to get an accurate answer.

Aha! So that's why I keep getting nailed by assholes with railguns, while my return fire just burns the walls behind them! Stupid inaccurate sqrt functions, messing up my perfect aim...

;-)

Last!

2007-04-13 Reply Admin

Just to clarify on the Quake 3 code, if anyone was still interested:

http://games.slashdot.org/comments.pl?sid=209372&cid=17070964

It's Newtons Method a few times, with a decent first guess.

2007-04-13 Reply Admin

My recollections of C (quite dated now) are that declarations of the form "char* x" declare pointers not arrays.

So the programmer is declaring a structure that consists of a pointer followed by an OrdArith::OrdElem (whatever that is).

The implications are that, somewhere, an array of summands is to be created, perhaps like this:

summand sa[6];

The comments would then seem to indicate that the code might address sa[6], which is beyond the bounds of the array (which would have subscripts from 0..5), because "sa[6]^.summand" would overlay the OrdArith::OrdElem of sa[5] in memory, right?

So, unless someone can explain how I got this wrong, it seems this is WTF:

But the OrdArith::OrdElem isn't initialized (unless the constructor initializes it out of sight of the code snippet) and so the program might be accessing "random" data in sa[6]^.summand.
A OrdArith::OrdElem is allocated in all of sa[0] through sa[4], even though only the one in sa[5] is supposedly needed. It would seem better to leave it out and declare the array with 7 elements.
Even that assumes that there is some logical reason why the programmer can't keep the array subscripts in within bounds.

So did I misunderstand all this somehow?

2007-04-13 Reply Admin

asuffield:
There is nothing wrong with this - it is not only standard practice, but it is explicitly blessed by the C specification itself as the correct method for creating variable-size arrays before C99.
In C99, you use [] instead of [1] and the compiler will check for you that it is the last member in the struct.

The reason C99 added this is that there was not a blessed way to do it beforehand. It's not strictly conforming now, and it wasn't then... To say nothing of it being obviously C++.

2007-04-13 Reply Admin

Ok, in C this is almost valid, for the reasons explained above many times. However in C++ it should almost certainly be one of the following:

template<int S> struct summand { static const int size = S; summand *myNext; OrdArith::OrdElem myOrd[S]; };

or

struct summand { summand *myNext; std::vectorOrdArith::OrdElem myOrd; };

Or the above with the typedef, would make it a doubly linked list though.

So the WTF is that it's extremely poor style and probably wrong for C++, and technically just about wrong for C99.

2007-04-13 Reply Admin

seebs:
asuffield:
There is nothing wrong with this - it is not only standard practice, but it is explicitly blessed by the C specification itself as the correct method for creating variable-size arrays before C99.
In C99, you use [] instead of [1] and the compiler will check for you that it is the last member in the struct.

The reason C99 added this is that there was not a blessed way to do it beforehand. It's not strictly conforming now, and it wasn't then... To say nothing of it being obviously C++.

I would expect compilers to provide a (optional) warning whenever char[1] appears other than at the end of a struct (including use of that struct in an array or derived type) or when the struct is copied by value.

Seriously, who goes out of their way to allocate an array of char with a single element?

2007-04-13 Reply Admin

This is not WTF at all -- varariable size structs do make a lot of sense, e.g.

for use in shared memory
cache-conciousness
serialization
...

Though they are noz the most beautful construct C/C++ offers...

2007-04-13 Reply Admin

The real WTF is what's wrong with the guy. Instead of putting this comment, he would have been better with using sizeof( summand ) and not having to care about elements ordering. What's wrong with (Element*)(((char*)aSummand) + sizeof( summand ))? You can #define it.

2007-04-13 Reply Admin

Windows headers even declare

#define ANYSIZE_ARRAY 1

so that you can do this...

typedef struct _THING { int arraysize; int AllMyNumbers[ANYSIZE_ARRAY]; } THING;

captcha: bling (rhythms with THING)

2007-04-13 Reply Admin

I don't really understand today's snippet, but I do have one like it:

We have a monster global dummy object (well over 500 simple member variables). If you need to add to the object (and one often does), you must:

Add your new variable at the end.
Determine how many bytes your variable will need.
Go to a nearby byte array and subtract the amount of bytes your variable will occupy from the amount declared for that array (dunno what will happen when the amount declared gets to 0, but it seems sized massively for future needs).

This is because the bytes are streamed out for persistence. Straight up bits. Who needs structured data of any kind?

Take that for inflexible.

2007-04-13 Reply Admin

Thief^:
Fred:
This is standard way to perform unbound allocation in C. It is supported by the standard. It seems to me that quite a few WTF recently have been the submitter not understanding the code.
What is next? Putting some of the inner of a device driver for embedded hardware and complaining that it access memory with fixed addresses ? Showing this code:

float SquareRootFloat(float number) { long i; float x, y; const float f = 1.5F;
x = number * 0.5F;
y  = number;
i  = * ( long * ) &y;
i  = 0x5f3759df - ( i >> 1 );
y  = * ( float * ) &i;
y  = y * ( f - ( x * y * y ) );
y  = y * ( f - ( x * y * y ) );
return number * y;
}

as a WTF ? (For the uninformed, this is the square root implementation of quake 3. And yes, it is faster that the fsqrt assembly instruction).
I read that as "this code accesses memory directly", which it doesn't.

It is a wtf that that code is faster than the assembly square-root instruction though.

EDIT: I also can't figure out how it works :-S

Newton's Method.

2007-04-13 Reply Admin

I've read the comments, but I'm just not understanding today's snippet and why some say it is "good"?

Is it attempting to set up a situation where one can write to memory after the end of the heap? How can one make sure that that the expected space there even exists?

Would declaring some sort of uninitialized (and unused) array of the maximum expected size after that declaration solve any issues about "this must be last", or would it refuse to compile? Or would the optimizer get rid of it?

2007-04-13 Reply Admin

Hallvard:
Pete Kirkham:
Unlike C, in C++ there is no requirement that members of classes are contiguous (which is how VS2003 can get away with extending managed classes in C++), though usually they are, (...)
Nonsense. In both languages, members are required to have proper alignment just like other data. Which means the standard can _not_ require them to be contiguous.
With a compiler where you know the layout, using a reinterpret cast or operator new on some memory you've allocated yourself to create a pointer to this type is safe. But not portable.
Safe and portable, as long as you don't depend on more layout than the standard requires, and use code which doesn't allow the compiler to "know" that there is just one member.
I'd imagine the length of p->myOrd is given by (reinterpret_cast<void*>(p->myNext)-reinterpret_cast<void*>(p->myOrd)-sizeof(summand*))/sizeof(OrdArith::OrdElem ), or if the sizes of the types are the same: (p-p->next-1).
No. The compiler - and sizeof()- doesn't know that the programmer uses another size than the structure definition imples. The programmer has to take care of the size himself.

Your primary problem with this is if you tried to do this with a base class in C++, or a class involved in multiple inheritance. You'd be hosed. You can't reliably predict which base is going to be located relative to is peers. This trick is really only reliable for simple inheritance, and only when done at the most derived class, or all child classes don't add any new data members.

2007-04-13 Reply Admin

whicker:
I've read the comments, but I'm just not understanding today's snippet and why some say it is "good"?

This code is a fairly common way to allow for variable sized data structures. Here's how you use this sort of code (imagine this is some kind of networking API):

typedef struct message_struct
{
    host_addr send_to;
    int length;
    char data[1]; // must be last member
} message;

message* make_message(host_addr send_to, int data_length, char* data)
{
    message* ret = malloc(sizeof(packet) + (data_length - 1));

    ret.send_to = send_to;
    ret.length = data_length;
    memcpy(ret.data, data, data_length);

    return ret;
}

This works in C++ too, as long as the struct or class has nothing that couldn't be in a C struct (i.e., no private or protected members, no constructors, no virtual functions, etc).

darin · 2007-04-13 Reply Admin

EmmanuelD:
And, for the "uninformed", this implementation is faster than the CPU's fsqrt instruction, but slower than the inverse square root instruction you'll find in the multimedia oriented SIMD instruction sets.

Accurate square-root computation can be expensive in hardware, but relatively simple in software. Many RISC computers don't have "fsqrt" since the instruction will take longer to execute than other floating point instructions. Software can converge on the answer in only a few iterations.

2007-04-13 Reply Admin

Pete Kirkham:
There's a clue for you in the non-C pubic keyword, ...

let's not hit below the belt there

kirchhoff · 2007-04-13 Reply Admin

whicker:
Is it attempting to set up a situation where one can write to memory after the end of the heap? How can one make sure that that the expected space there even exists?
Would declaring some sort of uninitialized (and unused) array of the maximum expected size after that declaration solve any issues about "this must be last", or would it refuse to compile? Or would the optimizer get rid of it?

Let me explain this by way of example. Say you have a piece of software that has to do some math with variable length arrays of coefficients, say to represent filters for processing audio. You want to have arrays of variable length, but you also want to store some flags for that array that describe the purpose or special handling during processing. And you don't want to create additional different data structures just for that purpose... you think you might need to add more data/relationships to the coefficients struct later in development and don't want to make your object model a nightmare and end up with a badly fragmented heap. So you come up with the following data structure:

typedef struct {
  size_t n_coeffs;
  int my_funky_flags;
  double coefficients[1];
} CV;

The idea is that each instance of CV could hold a different number of doubles (specifically n_coeffs). You are supposed to allocate sizeof(CV) + sizeof(CV::coefficients) * (number_of_elements_to_store - 1) bytes of memory in the constructor, cast it to a CV pointer, and assign number_of_elements to the n_coeffs member before returning the pointer.

CV * CVFactory(size_t n_elems) {
  void * ptr;
  ptr = malloc(sizeof(CV)) + malloc(sizeof(double) * (n_elems - 1));
  if(ptr) {
    ((CV *) ptr)->n_elems = n_elems;
    ((CV *) ptr)->flags = FLAGS_DEFAULT;
  }
  return (CV *) ptr;
}

/// ...

CV * myCoeffs = CVFactory(10); //Allocate a 10 element CV
assert(myCoeffs && myCoeffs->n_coeffs == 10);
myCoeffs->coefficients[0] = 0.0;
myCoeffs->coefficients[1] = -1.0;
//blah blah
myCoeffs->coefficients[9] = sqrt(phase_of_the_moon("Blargh"));
myCoeffs->flags |= FLAGS_NEWTONS_METHOD;

Whenever the user access CV->coefficients[i], it's okay for i > 1... i can go all the way up to CV->n_coeffs, because that's how much extra memory you allocated. The compiler won't issue a warning if i > 1 if it is capable of doing so. Most understand that a 1-length array (which isn't normally useful to define in a struct) is to be treated as an UNBOUNDED array for just this kind of construction. In any case the element is never optimized out because it's not zero sized. And all compilers will attempt to locate it at the end of the struct. But not all compilers will try to align it... that's something you have to handle yourself if that matters.

Incidentally, the C99 standard now allows for you to just use the [] notation, omitting the 1, and it will do the right thing. In that case, the array member has no size contribution when you use sizeof(), but the name of the array and it's type is still kept track of. In such cases it is "optimized out", while retaining the desired behavior. Accessing that array will access the memory location right beyond the struct itself... remember you allocated extra just for that purpose in the constructor/factory. GCC has an extension that recognized the nonsensical "[0]" for the same purpose.

Why is this feature useful? You do this when you have simple types wrapping a variably sized array to avoid fragmenting the heap by allocating both arrays AND nodes to point to them.

2007-04-13 Reply Admin

mi.size = sizeof(mi); //seems to be what ms thinks is good form

2007-04-13 Reply Admin

drfloyd5:
I can lurk no longer. I made an account just to make this post in person.
The CodeSOD now officially blows. I will use this 1 WTF to prove it.

The WTF isn't.

There are 30 posts explaining in non-unique ways why #1 is true.

The signal to noise ratio of the CodeSOD WTFs is pretty crappy. And the comments s/n ratio is even worse. So all in all the whole CodeSOD is a useless read.

I beg to differ -- I cut my teeth on C++ back in ~1997, and never learned "correct" C. If this apparently hackish attempt at fooling the compiler is the "right" way to do variable-sized structs, maybe there shouldn't be such things (in C anyway) as variable-sized structs. In a vacuum, I still consider this example a WTF, simply because any code that does non-obvious things and isn't commented accordingly is a WTF, at least if the reader doesn't know better.

I'm glad this got posted, because from time to time I have to get my hands dirty with somebody else's C code, and where I would previously have recoiled in horror at seeing this abuse of assumptions, now I can understand that at least there's precedent -- maybe even a good reason.

One of the main reasons I read WTF is to learn more about the "right" way of doing things in different languages (usually by watching reaction to the "wrong" way). I want to see the CodeSOD stick around!

CAPTCHA: "ewww" Heh. Apropos!

2007-04-13 Reply Admin

As others have said, this is a fairly normal variable-length struct in C. Weird comment, but normal.

But it's a mistake in C++. Maybe not WTF-level, but a mistake.

Why? It should have been a struct, not a class. A C++ class may have a vtable, and that vtable can be stored at the beginning or end of the class in memory. If it's at the end then accesses past [0] will be conflicting with the vtable. Don't think it doesn't happen ... a few years ago I ran into some code that had problems because we switched to a new version of the compiler that relocated the vtables.

This class happens to not get a vtable because it's plain-old-data (POD) and isn't subclassed. But it's implicitly relying on a pretty subtle and fragile/unguaranteed behavior -- that nobody is using a subclassed version of it, that there either isn't a vtable, and if there is a vtable that it's kept at the start of the class.

2007-04-13 Reply Admin

eraserhd:
From GNU std::string:
  static size_type _S_empty_rep_storage[];

Unless I'm very much mistaken, static breaks it totally. And this code is doing something else entirely.

2007-04-13 Reply Admin

That is a perfectly fine imeplementation idiom. The WTF was that the guy had to comment on it -- does he work with people who don't know the langauge or something?

No, I'm serious! There are many cases where the number of heap blocks (or the number of cache misses getting to your data) matters significantly.

2007-04-13 Reply Admin

Drew Thaler:
Why? It should have been a struct, not a class.

Look again, it's a struct.

2007-04-13 Reply Admin

In C++, you allocate objects like this one on the stack, using reference counting (and move semantics in the copy ctor / assignment operator), which allows simple and fast handling of the objects (e.g. using in STL containers by value). The object then allocates its buffer on the heap which doesn't fragment more than in the WTF solution.

2007-04-13 Reply Admin

It's also a problem if you put more than one of them into an array.

chrismcb · 2007-04-13 Reply Admin

savar:
JustSomeone:
I'm sure everybody can agree that when, say, using C to implement the runtime for a high-level language, a reasonable string representation might be something like:
struct string { size_t length; char data[1]; };

Why char[1] instead of char*?

I know several people replied to this, but...

An array is not synonymous with a pointer. Although you can treat them the same in code, in reality they aren't the same.

When you do char *p (NOOOO the star doesn't belong on the char!) the compiler will allocate (on a 32 bit machine) 4 bytes for a pointer. When you do char p[1] the compiler (compiler independed) will reserve 1 byte (while a char is 7 bits, it will reserve 1 byte, possibly 4) for your character.

In both cases you can refer to your character as *p, but you can see they are implemented differently.

As other people pointed out, if you do char *p, you now have to point your pointer at something. Either malloc(new, whatever) up some more memory, or point it to some that already exist. But with a char p[1] the memory has already been reserved.

Now one common use of this pattern is to read in a chunk of memory, and stick a header on the front, ala BITMAPINFO.

ifthenelse · 2007-04-13 Reply Admin

Laff. The WTF is people who think this is a WTF.

chrismcb · 2007-04-13 Reply Admin

Coyne:
My recollections of C (quite dated now) are that declarations of the form "char* x" declare pointers not arrays.
So the programmer is declaring a structure that consists of a pointer followed by an OrdArith::OrdElem (whatever that is).

The implications are that, somewhere, an array of summands is to be created, perhaps like this:
summand sa[6];
The comments would then seem to indicate that the code might address sa[6], which is beyond the bounds of the array (which would have subscripts from 0..5), because "sa[6]^.summand" would overlay the OrdArith::OrdElem of sa[5] in memory, right?

So, unless someone can explain how I got this wrong, it seems this is WTF:

But the OrdArith::OrdElem isn't initialized (unless the constructor initializes it out of sight of the code snippet) and so the program might be accessing "random" data in sa[6]^.summand.

A OrdArith::OrdElem is allocated in all of sa[0] through sa[4], even though only the one in sa[5] is supposedly needed. It would seem better to leave it out and declare the array with 7 elements.

Even that assumes that there is some logical reason why the programmer can't keep the array subscripts in within bounds.

So did I misunderstand all this somehow?

Yeah, you missed the boat.

It appears that summand is supposed to be a linked list, so he would not do "summand sa[6]" But if he did, then he accessed "summand[6]" he would be access memory outside is block (period. It wouldn't overlay OrdElem, because no extra memory was given for OrdElem)

This setup was to create large blocks of memory, and link them together through a "summand."

It would go something like this:

pNewnode = (summand*)malloc(sizeof(summand)+sizeof(OrdArith::OrdElem)999)/ some large number*/) pNewnode->myNext = prevSummand; for (t = 0 to 1000) summand->myOrd[t] = fooOrd[t]; // Or whatever copy mechanism you want to use

I have to say, if this struct ONLY contains a pointer, and a chunk of memory, there are other better ways to do this. If this struct has other metadata other than a next pointer, then its ok.

2007-04-13 Reply Admin

OK, I think you guys need a new rule: no more Java programmers submitting C/C++ WTFs. Placing a variable-length array at the end of a structure is a long-standing technique in C. It avoids the space overhead of storing a pointer to an extra heap block, and likewise avoids the inherent problems with static array sizes. But syntactically, it can be accessed just like a member array.

It is actually very robust. ISO C99 has even blessed the (already mentioned by jimrandomh above) gcc extension allowing a "zero size" member at the end of the array.

Really guys. The criteria for a WTF has to be a real problem, not just "OMG WTF duz this c0d du!!!"

2007-04-13 Reply Admin

nobody:
I've seen cases where this can be useful, but it certainly is an invitation to read random memory and for memory corruption. But the size is nowhere in the structure, meaning it is even easier for these problems to occur. (I don't usually post captchas, but ewww was too good)

I thought this was crazy too, the first time I saw it in C. It's used in the PostgreSQL "VARYING" data types. The "text" datatype (i.e. VARCHAR(N)) is defined this way:

typedef struct { unisigned int varlen; char data[1]; } text;

Apparently, "data" couldn't be a (char *) type; that stores a pointer in the struct. the "data[1]" makes it store a byte, and "data" references it's first part as an array.

PostgreSQL requires the "data" to be allocated with the struct, and the real size set in "varlen". For an ASCII VARCHAR(30), add 30 bytes:

/* Store "Hello World" in STR VARCHAR(30); */ text str; str = malloc(sizeof(text) + 30sizeof(char)); memcpy(str->data, "Hello World", 11); str->varlen = 11;

That said, C++ should never use this technique! Use a constructor and the new() operator. Even a cross-compiler to a PDA or embedded-chip makes tight code from this.

aquanight · 2007-04-13 Reply Admin

Drew Thaler:
As others have said, this is a fairly normal variable-length struct in C. Weird comment, but normal.

Normal for C. Not for C++.

Drew Thaler:
Why? It should have been a struct, not a class.

It is a struct. The public keyword is redundant (though I've been known to start class declarations with private: even though it's default. Readability, people, seriously). As for the comment next to the public, well, let me just say that I'd rather use friends than make everything public. (If needed, and the functions would've been global scope, you could just put the functions as static members of a singleton class. Then friend the class.) Then again, you shouldn't need friend anyway except for operator overloading. Two words: Accessor methods.

Drew Thaler:
A C++ class may have a vtable, and that vtable can be stored at the beginning *or* end of the class in memory. <snip> This class happens to not get a vtable because it's plain-old-data (POD) and isn't subclassed. But it's implicitly relying on a pretty subtle and fragile/unguaranteed behavior -- that nobody is using a subclassed version of it, that there either isn't a vtable, and if there is a vtable that it's kept at the start of the class.

Again, it's a struct, not a class. But in C++ the keywords are equivalent for all purposes except default visibility (struct: public, class: private).

As you said, this POD will have no vtable. IIRC, only the declaration of a virtual method or destructor introduces a vtable. If it is inherited by a class that introduces virtuals, that subclass will also introduce the vtable (and thus put the vtable at the start/end of that class - nothing else would make sense really). Of course, any sort of inheriting, vtable or no, will mess up the variable-length feature of this structure. You could actively prevent this by abusing the struct==class equivalency, a'la private constructor + public static factory method (that takes the size paramter) (this works because inheriting basically requires at least one usable constructor). But having never used the trick myself, I don't honestly know if having a private constructor wrecks the trick. Alternatively, you could actively prevent it by not inheriting your struct, put a comment saying DON'T INHERIT OR ELSE, and rely on other coders using your struct to not be idiots with it. Alternatively, you could use more C++-friendly approaches (See below).

Now, as for why this is a WTF. No matter what, it's a WTF because the trick needs a size field, else you can't know how far "greater than 1" you can index "with impunity". sizeof won't tell you (it'll just give the size of the pointer, or the compile-time known size of the struct (with the array being [1])). I guess you could negative-index into the malloc header or something, but then you're no longer portable. Otherwise, we'd need to know what OrdArith::OrdElem is. If it's anything but a typedef to char, then the program could just as easily use std::vectorOrdArith::OrdElem (or std::vector<const OrdArith::OrdElem*> if it's a big class/struct so you don't want massive-copy-fun). If it's a typedef to char, then there's two possibilities:

It's a string. So it should be std::string.
It's meant for some element that the struct doesn't know in advance what. In which case, the C++ way is template <typename T> struct ...... (yes, you can do this to structs). In fact, you could use template for the struct even if it's an array of something. template <int S> struct ... { ...; OrdArith::OrdElem myOrd[S]; } - As someone has already pointed out.

2007-04-13 Reply Admin

Petli:
In C++ and the typical environments where that is used, it's a bit more ridiculous, but there's not enough context here to safely scoff at it.

There's plenty of context to scoff at that. Even in 1995, you would not do something this ugly to save 8 bytes. Typical 1995 machines had over 1mb of RAM. There's no excuse for this type of code, anywhere, except maybe on some embedded system from the early 80s, and even then, it's poor quality work.

In fact, I can say that that code is unacceptable at any point in time since the existence of C++, which is what that code is.

Captcha: bathe: what I feel like I need to do after reading that code

2007-04-13 Reply Admin

The real WTF is (and I always hated the people that say that): How lowly he thinks of his peers.

captcha: cognac - no thanks, I had enough

2007-04-13 Reply Admin

can lurk no longer. I made an account just to make this post in person.
The CodeSOD now officially blows. I will use this 1 WTF to prove it.

The WTF isn't.

There are 30 posts explaining in non-unique ways why #1 is true.

The signal to noise ratio of the CodeSOD WTFs is pretty crappy. And the comments s/n ratio is even worse. So all in all the whole CodeSOD is a useless read.

I also can lurk no longer. More quality less quantity. I am going back to reading the old articles I have missed and hope things are fixed by the time I return.

Things are worse than fucked.

To try and also make a suggestion: You took the daily out of the name and now there are articles pretty much every day? I dont see an rss feed, just let us know when there is something new and we don't have to read low standard articles.

How about a section like - you might think this is a WTF but its actually quite clever.

Anyone remeber lints bug of the month? http://www.gimpel.com/html/bugs.htm

2007-04-13 Reply Admin

This is even standardized (at least in C99, but perhaps even before). It's called a "flexible array member" and the standard says you can declare that array as:

struct s { int i; char a[]; };

But for compatibility with older GCCs, you'd often declare it as a[0]. And for compatibility with compilers that don't support that either, there's always a[1].

2007-04-13 Reply Admin

aquanight:

It's a string. So it should be std::string.

It's meant for some element that the struct doesn't know in advance what. In which case, the C++ way is template <typename T> struct ...... (yes, you can do this to structs). In fact, you could use template for the struct even if it's an array of something. template <int S> struct ... { ...; OrdArith::OrdElem myOrd[S]; } - As someone has already pointed out.

Your template<int S> solution only works if the size of the array is known at compile time.

And you don't need a size field if OrdElem contains some value that tells you you're at the end of the list.

And if you're mapping to a buffer (a very common case where this trick is used), you can't use a std::vector without a lot of wasted work/memory.

Face it people, this isn't enough to call this a real WTF unless there's more context we aren't seeing.

2007-04-14 Reply Admin

I know everybody's pointed out already that this is an old, well-known abuse of a dark corner of C. For the story, Dennis Ritchie himself used it and attached a funny comment to it: "unwarranted chumminess with the compiler" (some sources list it as ""unwarranted chumminess with the implementation"). FWIW

2007-04-14 Reply Admin

Andrew:
nobody:
I've seen cases where this can be useful, but it certainly is an invitation to read random memory and for memory corruption. But the size is nowhere in the structure, meaning it is even easier for these problems to occur. (I don't usually post captchas, but ewww was too good)

I thought this was crazy too, the first time I saw it in C. It's used in the PostgreSQL "VARYING" data types. The "text" datatype (i.e. VARCHAR(N)) is defined this way:

typedef struct { unisigned int varlen; char data[1]; } text;

Apparently, "data" couldn't be a (char *) type; that stores a pointer in the struct. the "data[1]" makes it store a byte, and "data" references it's first part as an array.

PostgreSQL requires the "data" to be allocated with the struct, and the real size set in "varlen". For an ASCII VARCHAR(30), add 30 bytes:

/* Store "Hello World" in STR VARCHAR(30); */ text str; str = malloc(sizeof(text) + 30sizeof(char)); memcpy(str->data, "Hello World", 11); str->varlen = 11;

That said, C++ should never use this technique! Use a constructor and the new() operator. Even a cross-compiler to a PDA or embedded-chip makes tight code from this.

Thanks Andrew,

This is the first comment on this thread that actually makes sense of the technique used.

2007-04-14 Reply Admin

Perfectly sensible C that either inexplicably survived a C -> C++ port or was written by a C programmer who didn't actually know C++.

2007-04-14 Reply Admin

aquanight:

Again, it's a struct, not a class. But in C++ the keywords are equivalent for all purposes except default visibility (struct: public, class: private).

As you said, this POD will have no vtable. IIRC, only the declaration of a virtual method or destructor introduces a vtable. If it is inherited by a class that introduces virtuals, that subclass will also introduce the vtable (and thus put the vtable at the start/end of that class - nothing else would make sense really). Of course, any sort of inheriting, vtable or no, will mess up the variable-length feature of this structure. You could actively prevent this by abusing the struct==class equivalency, a'la private constructor + public static factory method (that takes the size paramter) (this works because inheriting basically requires at least one usable constructor). But having never used the trick myself, I don't honestly know if having a private constructor wrecks the trick. Alternatively, you could actively prevent it by not inheriting your struct, put a comment saying DON'T INHERIT OR ELSE, and rely on other coders using your struct to not be idiots with it. Alternatively, you could use more C++-friendly approaches (See below).

....

This illustrates very well why some languages provide a "final" keyword. Thanks.

Captcha: So who or what is "vern" without the trailing "e", and what does he/it have to do with programming, WTFs, etc.? Second consecutive appearance.

2007-04-14 Reply Admin

Fellow Lurker:
To try and also make a suggestion: You took the daily out of the name and now there are articles pretty much every day? I dont see an rss feed, just let us know when there is something new and we don't have to read low standard articles.

For someone being so smug you sure don't have the intelligence to find the RSS link at the bottom of the main page.

no name:
And you don't need a size field if OrdElem contains some value that tells you you're at the end of the list.

Yay for buffer overruns!

Now this is a WTF because it's technically bad code. One look at that by an unseasoned programmer would ilicit a "WTF" especially since there's no comments explaining the use of the function. No mention of fact that you'd need to allocate this struct with malloc with extra size added on (especially important to mention since we're in a C++ environment). Just because it's a well used trick doesn't mean it's not abusing anything and doesn't magically make it not a WTF (especially since it's being used in a C++ environment).

2007-04-14 Reply Admin

no more Java programmers submitting C/C++ WTFs

The real WTF is people thinking such a thing as "C/C++" exists. There is C, and there is C++. The fact that most C is valid C++ is not an excuse. The SOD might have been a C clever hack to circumvent the limitations of the language, but is completely unportable "bad behaviour" if you are writing in C++.

GeneWitch · 2007-04-15 Reply Admin

Federico:
no more Java programmers submitting C/C++ WTFs

The real WTF is people thinking such a thing as "C/C++" exists. There is C, and there is C++. The fact that most C is valid C++ is not an excuse. The SOD might have been a C clever hack to circumvent the limitations of the language, but is completely unportable "bad behaviour" if you are writing in C++.

No mo' C++ prgrammers submitting ANSI BASIC/Visual Basic.Net wtfs either. Bastards.

fist-poster · 2007-04-15 Reply Admin

While I must admit I didn't know about the trick, it seems to be part of the language, and indeed std::string is also implemented in a similar way. However, the WTF worthy thing is the comment about making everything public to avoid friends. This gives some idea about what the rest of the code might look like. So, instead of keeping the (necessary and useful) ugliness all hidden in one place and providing an easy and straightforward interface, can we assume that every bit of code using this struct needs to know about its memory layout? I wonder, why std::string looks so nice and why I never even knew how it is laid out in memory...

Must Be Last

Leave a comment on “Must Be Last”