The Daily WTF: Curious Perversions in Information Technology

Steve_The_Cynic · 2020-05-13 Reply Admin

The entire world needs to stop ragging on Hungarian notation in general. There are effectively two kinds of HN, often called "systems" and "application". The original HN was what's now called "application HN", where the wart shows some additional information beyond simply the type. If the thing is a string, what flavour of string (HTML-encoded, XSS-safe, etc.) is it, that sort of thing. This gives us "c" (the thing is a count) and "cb" (the thing is a count of bytes) and "s" (the thing is a size) and "he" (the thing is HTML-encoded) and "xs" (the thing is XSS-safe), but not "i" (the thing is an integer). The last one is really "systems HN", where the wart shows what the type is.

Application HN has been supplanted by better use of better type systems, especially if you aren't using C. To represent that HTML-encoded string, you store it in an HtmlEncodedString rather than an XssSafeString or a plain String. Each provides appropriate conversion methods towards the others.
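
As a rough sketch of that idea in C++ (the class and function names here are made up for illustration, and the encoder handles only a few characters), the compiler can refuse to mix the flavours:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical wrapper for unencoded text; not taken from any real library.
class PlainString {
public:
    explicit PlainString(std::string text) : text_(std::move(text)) {}
    const std::string& str() const { return text_; }
private:
    std::string text_;
};

// Hypothetical wrapper for HTML-encoded text.
class HtmlEncodedString {
public:
    // The only way to obtain one is to encode plain text.
    static HtmlEncodedString fromPlain(const PlainString& plain) {
        std::string out;
        for (char c : plain.str()) {
            switch (c) {
                case '&': out += "&amp;"; break;
                case '<': out += "&lt;"; break;
                case '>': out += "&gt;"; break;
                case '"': out += "&quot;"; break;
                default:  out += c; break;
            }
        }
        return HtmlEncodedString(std::move(out));
    }
    const std::string& str() const { return text_; }
private:
    explicit HtmlEncodedString(std::string text) : text_(std::move(text)) {}
    std::string text_;
};

// Accepts only encoded text; passing a PlainString is a compile error.
std::string renderParagraph(const HtmlEncodedString& body) {
    return "<p>" + body.str() + "</p>";
}

void demo() {
    PlainString raw("a < b");
    HtmlEncodedString encoded = HtmlEncodedString::fromPlain(raw);
    assert(renderParagraph(encoded) == "<p>a &lt; b</p>");
    // renderParagraph(raw);  // would not compile: no implicit conversion
}
```

The wart has moved from the variable name into the type, so the check no longer depends on a human reading carefully.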

Systems HN remains a cardinal sin, and it has given us gems like wParam which is not a WORD. (It used to be a WORD, but then we moved from Win16 to Win32, where it's a DWORD, and should logically, therefore, be dwParam.)

2020-05-13 Reply Admin

On that note, can anyone explain what the "IInterface" convention is supposed to accomplish?

Remy Porter · 2020-05-13 Reply Admin

Since we use PascalCase for types, it's nice to know if a type is instantiatable or not. Mind you, I'm not personally fond of the IType convention, but it doesn't irk me the way a lot of other uses of Hungarian notation do.

I used to do VB6, and haven't recovered from too many objects named btnButton1.

2020-05-13 Reply Admin

It's only time to stop ragging on Hungarian notation when it's no longer in use.

2020-05-13 Reply Admin

IHelper and IDoohickey are probably COM interfaces, and the AddRef calls are therefore correct. Reference counting is an alternative to garbage collection, and does have its merits.

2020-05-13 Reply Admin

I thought you had to do your own garbage collection with C++?

Poor old Simonyi. There was an old thread somewhere explaining what the prefix was supposed to achieve, as opposed to what it was subverted into. After reading it, the intent became clear, as did the fact that the common usage was a massive misinterpretation.

Jaime · 2020-05-13 Reply Admin

Application HN has been supplanted by better use of better type systems

Systems HN remains a cardinal sin

This is why we rag on it. Both reasons for its use are bad for most of us. The second has always been bad for everyone. The first was only ever good for languages that had a weak type system. Of course, it was invented to help writing in C, a language that, at the time, didn't even have a native way to communicate that a variable is being used as a boolean.

If I ever write C89 again, I'll be sure to name my pointers pXXX. Other than that, HN must die.

2020-05-13 Reply Admin

Maybe I'm old & cynical, but I only had to read the code as far as the word "short" to know the source of the rare & seemingly random crashes.

Though the rest of the code was a treat too. Correct enough as far as we can tell, but still a goopy mess of half-baked thinking.

Kudos this time for an excellent guided tour by Remy. He's taken some heat of late; sometimes deserved, often not. This was a good walk-through.

Steve_The_Cynic · 2020-05-13 Reply Admin

The key use-case of Application HN is where the type system is insufficient. If you name things correctly you end up with stuff like this (using the name convention above, and given typedef char *StringType):

  StringType heTextToSend = NULL;
  StringType plainRawText = plainSomeFunction(params);

  heTextToSend = heFromPlain( plainRawText );

The type system here is insufficient to capture the operations correctly, but we can see that plainRawText receives plain text from plainSomeFunction (they both begin with plain), and that heTextToSend receives "he" text(1) from heFromPlain which received plain text from plainRawText. An inconsistency would show up immediately to a human reader, and the job can conceivably be automated. (Inconsistent: heTextToSend = plainSomeFunction(params);)

No, of course this is not a replacement for a proper type system, but it at least mitigates part of the pain. Once you have a proper type system, you should in general do away with even AHN. People advocating the use of SHN in new code should be shot.

(1) The clear problem is a lack of clarity about what "he" means as an AHN wart.

Steve_The_Cynic · 2020-05-13 Reply Admin

People rag on HN generally because other people are too fond of SHN because they misunderstand Simonyi's intent, or because they are cargo-culting from people (possibly through several generations) who misunderstood Simonyi's intent.

2020-05-13 Reply Admin

@Jaime ref

This is why we rag on it. Both reasons for its use are bad for most of us. The second has always been bad for everyone. The first was only ever good for languages that had a weak type system. Of course, it was invented to help writing in C, a language that, at the time, didn't even have a native way to communicate that a variable is being used as a boolean.

If I ever write C89 again, I'll be sure to name my pointers pXXX. Other than that, HN must die.

I fully agree with the sentiment. BUT...

Now that typeless languages like JS, Python, Ruby, et al, seem to be retaking the world by storm, paradoxically the total LOC written in 2020 that could benefit from correct Application Hungarian type designators is far higher than the LOC written in C and VB back in Ye Olden Tymes of the 1990s.

Not that I favor this trend mind you. I'm just pointing out that it exists.

2020-05-13 Reply Admin

A COM object would delete itself as soon as the last reference to it was Release()-d. In this code example, this would typically be achieved by looping through the vector and the map in the destructor of CDisplayControl and releasing each element in turn. I guess you could view this as a form of garbage collection, though it's a different concept from the GCs we know today.
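
A minimal sketch of that pattern, without the real COM headers: IHelper below is a hypothetical stand-in that only imitates the AddRef/Release shape of IUnknown, the liveCount hook exists purely so the example can check itself, and only the vector is shown, not the map.

```cpp
#include <cassert>
#include <vector>

// Hypothetical COM-style interface; real COM derives from IUnknown.
struct IHelper {
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
protected:
    virtual ~IHelper() = default;  // objects are destroyed only via Release()
};

class Helper : public IHelper {
public:
    explicit Helper(int& liveCount) : liveCount_(liveCount) { ++liveCount_; }
    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        unsigned long remaining = --refs_;
        if (remaining == 0) delete this;  // last reference gone
        return remaining;
    }
private:
    ~Helper() override { --liveCount_; }
    unsigned long refs_ = 1;  // the creator holds the first reference
    int& liveCount_;          // test hook, not part of any COM contract
};

class DisplayControl {
public:
    DisplayControl() = default;
    DisplayControl(const DisplayControl&) = delete;  // a copy would double-release
    DisplayControl& operator=(const DisplayControl&) = delete;
    void AddHelper(IHelper* helper) {
        helper->AddRef();             // the container takes its own reference
        helpers_.push_back(helper);
    }
    ~DisplayControl() {
        for (IHelper* helper : helpers_) helper->Release();
    }
private:
    std::vector<IHelper*> helpers_;
};

void demo() {
    int live = 0;
    IHelper* helper = new Helper(live);
    {
        DisplayControl control;
        control.AddHelper(helper);
        assert(helper->Release() == 1);  // creator lets go; control still holds it
        assert(live == 1);
    }                                    // control's destructor releases the last ref
    assert(live == 0);
}
```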

2020-05-13 Reply Admin

I thought you had to do your own garbage collection with C++?

That's correct, you are supposed to free up any heap memory that you allocate. But that becomes very difficult and error-prone when you don't follow good practices and start passing around pointers all over your application without clearly defining a single "owner" who's responsible for managing their lifetimes. This leads to people creating their own garbage collectors using reference counting... which is even more difficult and error-prone.

Nowadays, if you really need automatic garbage collecting, the standard library has smart-pointer classes such as std::shared_ptr that will do it for you.
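
For instance, assuming std::shared_ptr is the class meant here (it is reference counting rather than a tracing collector), a small sketch with a made-up Helper type and storeHelper function:

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical payload type for the example.
struct Helper {
    explicit Helper(int id) : id(id) {}
    int id;
};

// Stores the helper and returns how many owners now share it.
long storeHelper(std::vector<std::shared_ptr<Helper>>& registry,
                 const std::shared_ptr<Helper>& helper) {
    registry.push_back(helper);  // copying the shared_ptr bumps the count
    return helper.use_count();
}

void demo() {
    std::vector<std::shared_ptr<Helper>> registry;
    auto helper = std::make_shared<Helper>(7);
    std::weak_ptr<Helper> watcher = helper;  // observes without owning

    assert(storeHelper(registry, helper) == 2);
    helper.reset();                 // the registry still owns the object
    assert(!watcher.expired());
    registry.clear();               // last owner gone, object destroyed
    assert(watcher.expired());
}
```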

Steve_The_Cynic · 2020-05-13 Reply Admin

A COM object would delete itself as soon as the last reference to it was Release()-d

Subject to the caveat that COM refcounts are on interfaces, not on objects.

Addendum 2020-05-13 09:17: The difference often isn't visible until you cross apartment, process or machine boundaries using DCOM, at which point it becomes critical.

2020-05-13 Reply Admin

Ill Igive It Ia IShot ....

Leaving aside COM interfaces (for which I believe the original CXXXX and IXXXX conventions were invented), let's take a semi-general case. (I say semi-general because I'm stipulating two DLLs/sos, although this is nothing more than a specialisation of generic SOLID principles.)

One DLL is the provider. It has a class (non-sealed, potentially at the head of an inheritance hierarchy) called DooHickey.

The other DLL is the consumer. There are only three things it cares about DooHickeys: they Foo and they Bar and they Baz. Or, at a higher level (design?) they behave like a DooHickey. In fact, to the consumer, for all intents and purposes, they are a DooHickey.

Except, to the consumer, they're only a DooHickey in Liskov substitution terms. At some point in the future, the consumer might be provided with an "enhanced" or "derived" or even "monkey-patched" DooHickey. In fact, the provider might be a different DLL entirely -- maybe even the consumer DLL. (For purposes of testing, throwing away nasty unperformant third party code, etc.)

So, at a semantic level, you want to know that an object is a DooHickey, but you can't call it a DooHickey, because functionally it isn't one -- it's only a subset. But you have to call it something. And since it's an interface to a DooHickey, you might as well call it an IDooHickey.
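
In C++ terms, where an interface is just a class of pure virtual functions, the arrangement might look something like this. Foo, Bar and Baz come from the description above; the return types, FakeDooHickey and useDooHickey are invented for the sketch.

```cpp
#include <cassert>
#include <string>

// The subset of DooHickey behaviour the consumer cares about.
class IDooHickey {
public:
    virtual ~IDooHickey() = default;
    virtual std::string Foo() = 0;
    virtual std::string Bar() = 0;
    virtual std::string Baz() = 0;
};

// Provider side: one concrete implementation.
class DooHickey : public IDooHickey {
public:
    std::string Foo() override { return "foo"; }
    std::string Bar() override { return "bar"; }
    std::string Baz() override { return "baz"; }
};

// A replacement the consumer's own test build might supply.
class FakeDooHickey : public IDooHickey {
public:
    std::string Foo() override { return "fake-foo"; }
    std::string Bar() override { return "fake-bar"; }
    std::string Baz() override { return "fake-baz"; }
};

// Consumer side: depends only on the interface, never on DooHickey itself.
std::string useDooHickey(IDooHickey& thing) {
    return thing.Foo() + "," + thing.Bar() + "," + thing.Baz();
}

void demo() {
    DooHickey real;
    FakeDooHickey fake;
    assert(useDooHickey(real) == "foo,bar,baz");
    assert(useDooHickey(fake) == "fake-foo,fake-bar,fake-baz");
}
```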

CDooHickeys are an abomination, though. Unless they're part of COM, which is complicated enough to need further annotation.

2020-05-13 Reply Admin

Clarification to the above - there are two simple rules for handling memory in C & C++.

Rule #1: Don't. Most of the time you don't need to allocate pointers; a stack variable will suffice. You can still pass references to those stack variables to the methods you call, but the object gets cleaned up with the rest of the stack frame, so its lifetime is safely managed. (Provided you don't do something silly like storing a pointer to a stack variable after it goes out of scope - but if you do that, you've got other problems.)

Rule #2: Don't shift ownership. If you absolutely must allocate heap memory, free it in the same scope. New it in a method? Delete it in the same method. New it in a class constructor? Delete it in the destructor. Create it with a factory? Destroy it with the same factory (and whoever calls the create should call the destroy within the same scope). And so on.

Those two rules will solve 90% of your memory management problems without the need for reference counting.
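
A quick sketch of both rules, with invented names (Totals, addTo, Buffer):

```cpp
#include <cassert>
#include <cstddef>

// Rule #1: a stack variable, passed by reference to the function that needs it.
struct Totals { int sum = 0; };

void addTo(Totals& totals, int value) { totals.sum += value; }

int sumOnStack() {
    Totals totals;       // lives on the stack, no new/delete anywhere
    addTo(totals, 2);
    addTo(totals, 3);
    return totals.sum;
}                        // totals is destroyed with the stack frame

// Rule #2: new in the constructor, delete in the destructor, one owner.
class Buffer {
public:
    explicit Buffer(std::size_t size) : data_(new int[size]()), size_(size) {}
    ~Buffer() { delete[] data_; }
    Buffer(const Buffer&) = delete;             // copying would blur ownership
    Buffer& operator=(const Buffer&) = delete;
    int& at(std::size_t i) { return data_[i]; }  // no bounds check in this sketch
    std::size_t size() const { return size_; }
private:
    int* data_;
    std::size_t size_;
};

void demo() {
    assert(sumOnStack() == 5);
    Buffer buffer(4);            // zero-initialised by new int[size]()
    assert(buffer.at(0) == 0);
    buffer.at(3) = 9;
    assert(buffer.at(3) == 9);
}                                // buffer's destructor frees the array
```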

2020-05-13 Reply Admin

To be picky: It may wrap around to negative, it may not. It's undefined behavior.

2020-05-13 Reply Admin

Speaking as someone who has largely been working with "typeless" languages (Ruby and JS) in the past 8 years, I have to say I've never seen the need to use AHN either. You can almost always tell what type a variable is by seeing how you deal with it. If you're gsubbing something, it's likely a string; if you're %ing it, it's probably a number, and if all you're doing is printing it out, more often than not it doesn't even matter since the interpreter is smart enough to coerce things as necessary.

Right now the trouble with lack of types is more about the parameters functions take - in that case, no matter what you name it, since you didn't declare it, you have no guarantee that your "sUsername" actually is a string. This is why things like TypeScript and Sorbet have started gaining in popularity.

Steve_The_Cynic · 2020-05-13 Reply Admin

I've never seen the need to use AHN either. You can almost always tell what type a variable is by seeing how you deal with it.

Except that you've misunderstood the difference between AHN (warts indicate flavour) and SHN (warts indicate type). The interesting question is not "is it a string or a number?" but "what kind/flavour of string is it?". Does it contain plain text, HTML-encoded text, etc.?

2020-05-13 Reply Admin

I've worked in C# (with IInterface convention) and Java (which doesn't do that), and, despite not liking type prefixes in general, there is actually something nice about knowing whether you're referencing a real class or an interface. I can't really explain what the added value is, but there definitely is some to me. It also prevents the "interface Wotsit" -> "class WotsitImpl implements Wotsit" problem - "class Wotsit : IWotsit" is a lot cleaner, and it's quite a common pattern to have a boundary interface with one default implementation that you want to give the same name to.

(Of course the .NET Framework is full of those - IList -> List etc)

"CThingy" is an abomination though.

As for the WTF - can't you compile with overflow checking enabled? But TRWTF here is not the wrong type on that variable; it's having a separate count variable in the first place when you can just request the count off the vector.
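
Something along these lines, as a guess at the shape of the class (the article's real code isn't reproduced here, so Helper, AddHelper and NumHelpers are assumptions):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Helper {};  // hypothetical stand-in for the article's helper type

class DisplayControl {
public:
    void AddHelper(Helper* helper) { helpers_.push_back(helper); }
    // No separate counter to keep in sync, and nothing narrower than size_t.
    std::size_t NumHelpers() const { return helpers_.size(); }
private:
    std::vector<Helper*> helpers_;  // non-owning in this sketch
};

void demo() {
    Helper helper;
    DisplayControl control;
    // More entries than a 16-bit signed short could count.
    for (int i = 0; i < 40000; ++i) control.AddHelper(&helper);
    assert(control.NumHelpers() == 40000u);
}
```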

Jaime · 2020-05-13 Reply Admin

As for the WTF - can't you compile with overflow checking enabled?

OK.... but the problem was that the overflow created an intermittent and difficult to locate problem. How would you know to do that? You're proposing a solution to a really easy to solve problem, but you're not simplifying the really hard to do task of finding the bug.

TheCPUWizard · 2020-05-13 Reply Admin

A METHOD was called to pass the DooHicky the Helper.... No property necessarily involved. No knowledge that it could be read back, or even retained....

2020-05-13 Reply Admin

It also prevents the "interface Wotsit" -> "class WotsitImpl implements Wotsit" problem - "class Wotsit : IWotsit" is a lot cleaner[.]

Maybe it's a matter of different programming styles, but I actually feel exactly the opposite. I tend to use "WotsitImpl" only in a Wotsit#create static factory method or a WotsitFactory/WotsitBuilder class, so the "Impl" only shows up in two places - the implementation itself and the factory/builder. By contrast, the "I" prefix shows up everywhere you use the interface (No shit, Naomi!), so I find it a lot more intrusive.

And I guess there's also a matter of - for lack of a better word! - semantic aesthetics. If I have an interface Wotsit, to me, it sounds like that interface embodies the concept of "Wotsit-ness", and a WotsitImpl is only one kind of Wotsit. Whereas the Microsoft convention suggests - again, to me! - that the implementation embodies the concept and the interface is some kind of... ill-defined other thing. (Come to think of it, I also shy away from "Impl" in favor of something more descriptive whenever it's applicable, probably for the exact same reason.)

Paramecium 13 · 2020-05-13 Reply Admin

It looks like BobC was rather short sighted.

2020-05-13 Reply Admin

I like using the 'I' prefix on interfaces when the rest of the interface name is a verb or verb phrase. It makes the interface name describe what the interface is covenanting to be able to do.

So things like IIterate, IProcessBackups, and IExportConfiguration.

But if you have something like 'class DooHickey' with 'interface IDooHickey' or equivalently 'class DooHickeyImpl' with 'interface DooHickey', then you likely need to rethink your interface's purpose. You probably have an abstract base class instead of an interface.

2020-05-13 Reply Admin

And for a bonus, the initial value of m_nNumHelpers is undefined (unless there is a constructor that's been elided for the article). So you can't even be sure it'll start at zero.
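
If the counter has to stay, a default member initializer (C++11 and later) at least pins down the starting value. The class below is a cut-down guess at the original, not the article's code:

```cpp
#include <cassert>

class DisplayControl {
public:
    short NumHelpers() const { return m_nNumHelpers; }
private:
    // Without "= 0" and without a constructor, a local DisplayControl
    // would start with an indeterminate value here.
    short m_nNumHelpers = 0;
};

void demo() {
    DisplayControl control;
    assert(control.NumHelpers() == 0);
}
```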

2020-05-13 Reply Admin

It's been a while, but I seem to recall a bug in Visual Studio 5 where COM would drop things early, and the workaround was to litter the code with AddRef and Release calls. It was fixed in Visual Studio 6, where the leftover calls led to debugging new memory leak issues in recompiled code....

2020-05-13 Reply Admin

People rag on Hungarian notation simply because it doesn't handle two things.

First, it adds a bunch of seemingly random characters to a variable name, which can obscure what the object actually is. I mean, is psz really more readable than C string? (Pointer to String Zero Terminated, aka, C "char*"). It might be important in interfaces, but generally speaking you make it a convention what kind of strings you'd use or make your own string format and functions to go between the language native strings and your own format. (And really, if your interface has multiple string types, you might want to rethink things through).

Second, it says absolutely nothing about changing types. After all, psz refers to C-style strings, pwcsz refers to wide strings, but now you have to rename every )(*%&#@$ variable if you decide to add i18n support.

Hungarian notation is shorthand; it's basically putting the type in variable names. Why not use "int intIndex" or "short shortNumber"? Same thing; you just don't do it because someone may decide a short is too short and change it to an int. And of course, people think it's a short, and now it's an int, and the build breaks because the compiler sees a mismatch that's buried deep away.

2020-05-13 Reply Admin

We should add Polish notation and remove inference, so declarations look like this:

signed int_counter_int = 0i;

2020-05-13 Reply Admin

Yes. Except when you don't.

The whole point of an interface declaration is that you, the client, depend upon the interface declaration and nothing but the interface declaration. That makes it quite flexible -- and so much more flexible (from a client's point of view) than the declaration of an abstract base class.

In other words, even if you provide an abstract base class to the client (via linking or metadata or marshalling or whatever), you are needlessly narrowing the semantics of what you provide. This isn't just an abstract (hem hem) concept. I've spent about six months explaining to my hyper-intelligent C++ coworker that there are important differences between using an interface and using the root of a class inheritance graph.

Give me another six months, and I'll get him accustomed to the idea. Or else kill him. Either way suits me. I am a man of little patience.

2020-05-14 Reply Admin

I find it helpful when reading code to see a variable is declared of the type ISomething. That tells me straight away that I'm looking at something that could be just about anything, and that I can only guarantee it will implement whatever that interface has. Whereas seeing something of type "Something" implies a particular concrete class (although, it could still be a sub-class with god-knows-what overridden). The Interface warns me, at least in my line of work, that it could be something completely different in different environments / builds / markets.

A Short Trip on the BobC
