- Feature Articles
- CodeSOD
-
Error'd
- Most Recent Articles
- Office Politics
- Secret Horror
- Not Impossible
- Monkeys
- Killing Time
- Hypersensitive
- Infallabella
- Doubled Daniel
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
Re CS degrees: I have a computer science degree (cue calls of "Oh, bad luck!"). During the course of my, ah, course I learned many things, several of which I remembered for years afterwards. One or two of them were even useful in my career.
"How to write code" was not one of these things. I learned that myself, by doing it, like everybody else.
Admin
I don't think so. I'd guess, the code started out as a special map for at most 4 entries and tries to avoid the overhead of a hasmap. (Creating the hashmap does have an overhead; it might have been a worthwhile optimization.) Later it became apparent that more than four entries would be needed, and adding the hashmap was easier than adapting the existing interface to the real hashmap. Maybe changing the interface was out of the question due to political reasons.
Admin
Amen.
Admin
Let's face it, it's obvious WHAT it does - it doesn't need comments - I followed it without trouble. But a comment or two to explain WHY wouldn't go amiss.
Admin
This is a very good question, and the answer, at least today, is: You make it up as you go, just like everyone else!
This, of course, is why TDWTF is such a well-stocked site. :)
I think that the only reason that this abominable state of affairs is allowed to persist, instead of being rectified with a system of software development certification similar to what other engineering professions use, is that to most people, code is totally inscrutable and largely invisible. No one can tell what a mess you made except your fellow programmers, and most of the time they're in worse shape than you are, so they can't tell either.
Part of the problem is that software development is nearly as much an art as it is a science. And part of the reason for this is that the tools change so fast that by the time any given development environment has time to mature from an art to a science, it's obsolete. At least in Microsoft-land.
That doesn't mean that it's impossible to train and certify high-quality software developers. But no one's doing it, and no one cares that no one's doing it, because most people, especially the ones paying for software development, have no idea what difference it would make. To them, software is software, regardless of whether it's hacked together by a bunch of half-trained monkeys in India, or a bunch of local half-trained monkeys.
Contrast this with, say, architecture, where it's more or less intuitively obvious to most people that if you try to get someone with no training in architecture to design a building, it's probably going to fall over, explode, or sink into the ground. Or fall over, sink into the ground, and then explode. And that's after it's late and over budget. The same is true of software development, but this fact has not yet penetrated the consciousness of the general public, which includes most software managers, not to mention clients of software development companies. Not only that, but it hasn't penetrated the consciousness of many people who pass themselves off as software developers, because how hard can coding be?
Admin
So does this [pi]
Admin
I just added up all your milliseconds and it looks like your benchmark took 407 days for the Hashmap alone. You must have started on this benchmark back before July of 2004. So this must be your code yes?
Admin
My guess was 2771835000ns, but that is 2.7 seconds (according to google).
Admin
Sorry. As Bob answered, this is in nanoseconds - System.nanoTime().
Admin
No, EMPTY_MAP and singletonMap are immutable so they can't be used here (given the requirements, you must be able to put additional elements).
Admin
No, the hashCode is computed only once and it is cached in the String class.
FastMap is not complicated (I'm sure you understood it easily). Did you find bugs? Using an ArrayList or even an Array would reduce the performance and increase the memory use. It would reduce the code size but this is not something to worry when you're optimizing for speed.
Admin
Still means it has to be computed once, which will usually happen the first time that particular String instance is used as a key in a HashMap get(), put() or contains() call. If you're using the same String instances many times, you could probably save a lot more time by using an algorithm that doesn't require this.
The toString() method shows wrong data after the switch to "real map mode" if the map got modified. Also, the references to the 4 values don't get cleared in the switch (or ever), which is a potential memory leak.
Using ArrayList would not reduce performance and would (using the values for Sun's 32 bit JVMs) require a whopping 24 bytes of memory, which is pretty insignificant compared to the will over 200 bytes the whole data structure uses even with very small keys and values. In fact, it would use LESS memory than the current implementation if there is only 1 entry, because the 3 pairs of empty pointers also take up 24 bytes and you wouldn't need the size field.
Admin
The more important question is: How do you remain a "professional" programmer?
Since this is a profession changing at fast pace, it's probably more difficult to keep in touch with all the new stuff along the way while doing real work at the same time.
Admin
Amen, brother.
Admin
I have similar experience. It is good to find out what others have done in an area. Studying computing has helped me refine some points, given me the theory behind a thing so that I can apply it more generally, and exposed me to things that I might not have thought of otherwise. Using something already constructed, whether a library of routines or a body of knowedge, is good leveraging of other people's effort.
Sincerely,
Gene Wirchenko
Admin
Am too drunk to read thru this shit now... to much to read.... take your word that its hilarious since all the snippets on this site are an absolute classic and a joy to me......
</remark>
Admin
Well, I have a pretty good idea where he got the idea. If you use the Commons Collections, they have a Flat3Map (http://jakarta.apache.org/commons/collections/api-release/org/apache/commons/collections/map/Flat3Map.html) that probably does something relatively similar. Of course, the situations where I know I will only ever be using a Map of size 3 or less and speed is critical are rare, so I have yet to use that interesting little utility....
Admin
Any good books that hit the nail on the head of what every "professional" programmer should know? I consider myself professional (self employed and get good results and happy clients) but being self taught I am always trying to see what other developers swear by when it comes to good methodologies and design practices.
To date, a lot of the WTF code I run into usually involves sloppy work (ie, no planning, no refactoring, just a growing leviathon that gets a stamp and pricetag and shipped out the door) or straight out lazy or unethical practices (the cookie "admin=y" should never give complete control to a site...even if the client doesn't know the difference - btw, this one made it into a widely distributed application) which are more about attitude and a lack of professionalism than professional grade skills.
PS: Happy Bird Day everyone
Admin
"The Pragmatic Programmer - From Journeyman to Master" by Andrew Hunt and David Thomas.
Admin
My personal list of classics:
Admin
What gets to be lots of fun is when some of the constructor is called with some nulls sprinkled in [:P]
Admin
I'd add 'Code Complete' by Steve McConnel to that list. He focuses on the details of implementation of a design in a fairly language agnostic manner. You don't have to agree with every thing he says but he raises all the questions that you're going to have to think about and provide an answer to in the end, which is why I always find myself coming back to this book....
Admin
I'm a CS major graduating next Summer. Maybe I should just go ahead and jump off a bridge...
[:S]
Admin
Eh, I like comments. Comments tell me what the original author intended the code to do, not just what it is doing (which any idiot can see). They tell me why something might look like a WTF, but is actually a reasonable way of doing something, or warn me that some part of the code is 'good enough' for the current design, but that if something else is changed this should be updated as well, etc.
Comments are a window into the thought processes of the previous programmer, they shouldn't explain what, but why. If I know what he (or, rarely, she) was thinking, I can more quickly see where problems are and maybe what else might be affected by the change.
Admin
I can only assume that he thought that the "map" would only ever have 4 items, but wanted to plan for the future while being "efficent". Hashs are NOT efficient at either low or large numbers of elements. They have a sweet spot for a FIXED size, and only about about an 80% fill rate, beyond that, they go "funky" and can QUICKLY degrade into worse than worst case scenarios. How anyone thinks adding in a "proxy" kludge could possibly improve the situation... <shrug>
Might as well go whole hog and write an AVL tree routine. At least then you know your psychosis is well rewarded (except that the MATH behind that fails to account for cache hits as your data gets scattered to hell and gone all over the heap. Flat arrays scan scan actually be more efficient simply because they cause less expensive page faults. If you REALLY wanna go nuts, keep a sorted flat array so that you can at least do binary sorts and take advantage of the MMU when you want to serious block copying for an insertion sort.
Admin
I like "Code Complete", too, and for the same reasons. I think anyone wanting to become a professional programmer should read this book.
Sincerely,
Gene Wirchenko
Admin
Admin
I can not belive it. Someone knows Java. Someone knows the idea of data structures. And yet someone is able to create such a WTF... (btw. I really like this one... must have been some kind of CS student; they do sometimes such things...)
But unfotunately he/she forgot to override the equals and hashCode methods...
Admin
http://en.wikipedia.org/wiki/Office_Space
Admin
Admin
They SHOULD tell you what the intention is, not just what it is doing, I've yet to run into comments written by others which explain why. The best I've run into is comments saying what it is doing when it is not obvious what is going on. Though in that case, improving variable names, renaming methods, or extracting methods would be an equally viable solution.
That being said, my comments stink too. The other problem is that even good comments are often used as "de-odorant" for a fundamental underlying design / implementation problem.
The best use of comments are when the comments contradict the code - that becomes a great tip off there is a big problem. I think Donald Knuth said that "<FONT size=2>If the code and the comments disagree, then both are probably wrong"</FONT>
Admin
In my code there are comments like
"do xxx only when condition yyy is met, because some race condition with zzz might otherwise lead to wrong results"
"changed on 24.12.2005 because user xxx requested it to handle case yyy" (IMO it's good to have comments like that in the code, not only the CVS, in case another user requests the opposite)
"checking for value xxx in field yyy; function zzz sets it to that value to indicate case xyz"
as well as many comments describing what is done, in terms of business logic
Admin
Maybe one day I will get to read your code. I noticed your comments usually have the word "because" in them. Maybe that is the key. What I really really loathe are the "auto-comment" generators that "helpfully" list the parameter names and return types. Thanks but I know how to read method signatures.
One kind of comment I would like to see more of is indications of whether the method can return null (and perhaps under what conditions), and whether it is ok to send in null parameters. Then again maybe that sort of indication belongs to the language syntax so the compiler can check it. I hear Forte had some sort of syntax to declare at compile time whether variables / return values could be null.
Admin
I must admit I sometimes also write totaly useless comments, like
// get pending events
static int getPendingEvents(int millisecondstowait, EventCode *events, int maxevents, int useGDK, int checkJoydevice, int checkXKeyboard, long *timing) {
Admin
With IntelliJ IDEA 5 it is possible to mark the methods/variables that can be null or are always non-null, and the IDE will display a warning if a NullPointerException cound happen: http://www.jetbrains.com/idea/features/newfeatures.html#nullable
Usually in the beginning of every method the first thing I do is to check if any parameter is null and throw a NullPointerException if it is so. In the Javadoc I'll write a "@throws NullPointerException if any parameter is null." If the null value is really allowed for some parameter, that I'll always mention in the documentation.
I'm always willing to hear about other ways to handle null values.
Admin
Agreed, it is better to improve the design than the code. But he wanted a fast replacement for HashMap. Even if you're using the same String instances many times, the hashcode is computed only once. For speed optimisation, you could implement a faster hashCode() method. Or better, not to use the String class at all.
Yes the code has some serious bugs. Not sure if it comes froma real product.
You would need loops. I will try to code it.
Yes but it would be much slower if you resize the array.
Admin
You would hopefully not explicitly code a loop but call the contains() and indexOf() methods of ArrayList.. Of course those are internally implemented with loops, but I'm sure the JIT can unroll them.
If any change is made to the map, the given implementation immediately switches to a "real" HashMap, so that cannot be held against the ArrayList based implementation.
Admin
My "Perfect World" commenting scheme is as follows...
Classes should be commented with a brief overview of the class's purpose. This ought to be a cut and paste job from the spec document, if there is one. Implementation details only should be mentioned if they would be important to someone using the class.
Methods should be commented with a brief statement of the method's purpose. Again this should be a cut and paste from the spec document. A brief description of non-obvious parameters should be included. When describing valid inputs and outputs, pay particular attention to edge conditions.
Taken together, class and method comments should allow a reasonably skillful programmer to utilize this class without having to study the code.
Code comments should be avoided except in the following circumstances...
1) If the method is long or unusually complex, a brief comment describing the algorithm should appear toward the top of the method body.
2) Any assumptions or WTF written as an expedient should be clearly commented.
3) Any workaround for some bug somewhere else should be commented.
4) Any code that is particularly subtle or non-obvious should be commented, particularly something that causes a side-effect somewhere else.
5) Magic numbers in the parameter list or return of someone else's code should be commented.
I don't fall into the school of thought that says that methods should have one and only one exit point. However, when my exit point might be obscure I like to point that out with a comment too.
Admin
What about the method name comments? I have a lot of code with a coding standard that requires the following:
//---> getIndex() - Return index # <-------------------------------------
ULONG TapeDevice::getIndex(CHAR *pszName)
At least that doesn't get in the way. That code also has the version control comments inserted at the start, meaning I often compile, test, check code in, send to test. If the test group finds a problem I can use the debugger to find what line I crashed on, but that line has no simple relation to what line I need to goto in my editor.
Most of my comments are of one of the following forms:
//I appologise for the length of this function, it grew much more than I expected,
//and should be split up.
// Use an O(n^2) algorithm because I'm too lazy to find a good hash implementation
// when n should always be small anyway.
// Management won't allow me to do this right of the bogus argument: ...
Admin
This is what I call a method comment. I'm primarily a java programmer. The java language specifies a format for these comments, and will automatically generate documentation based on the contents of these comments. This system is called javadoc.
I'd consider all three of these an instance of my rule #2.
Admin
And it's what I call useless crap, because if done as in the example, it contains no information you don't already get from the method name. Its only effect is to reduce readability.
It's typical for the kind of junk you get when the coding standard mandates that each method must be commented or enforces a minimum comment ratio.
Don't take me wrong - meaningful comments can be a lifesaver on non-trivial methods, but if people have a technical "comment quota" enforced on them, they are NOT more likely to write meaningful comments; instead, the code gets diluted with lots of meaningLESS comments you'd be better off without.
Admin
"People who write crap code also write crap comments that you're actually better off without."
No, it's those people that you're actually better off without.
(That solves the crap code and crap comment problems at the same time)
Admin
I didn't say it was a good method comment. It is certainly useless crap as-is. But if it were fleshed out a little as I described earlier it would be useful.
This would be useful text in an API document autogenerated from the source...
Parameters:
CHAR *pszName: The name of the tape device
Returns:
ULONG: The index number of the named device
This method returns the index number of the device named in the first parameter. Only the first 32 characters of the name parameter are considered significant, and any additional characters will be ignored. If the device does not exist, it returns 0. If a null pointer is passed, it returns -1. If some error occurs, it returns -2.
I agree with you 100% here. That is one of things I hate about IDEs is that they like to autogenerate meaningless comments for you. I do, however, like to keep documentation and code in the same source. The javadoc concept is a good one (not a perfect one) that does make code more supportable and reusable in the long run. I'm willing to pay the price by having additional comments in my source file, provided that those comments are not in the code itself, but at the class declaration and before the method declaration.
Admin
That made it to the apache's commons-collections api!
http://jakarta.apache.org/commons/collections/api-release/org/apache/commons/collections/map/Flat3Map.html
Admin
Well said.
Admin
Begs the question: how much time does he lose a) on all those boolean checks b) on the additional call, which isn't inlined by the JIT BECAUSE of those boolean checks c) the recreation of the map when some joker goes and adds a 6th element.
I'm tempted to run one of those exhaustive, boring tests like the ones in Java Performance Tuning to get the relative performance between this FHM and ordinary HM on every JVM released since 1.2.2...
Admin
Admin
It can be because your assumption that calculating the key is "very fast" is not necessarily right. If calculating the key takes on average longer than 3 equals() comparisons, this will always be faster than a normal hashmap without collisions - and probably even sooner, because it can often give a result with fewer equality tests.
A good hash function should take into account all data that is used for equality tests, otherwise you run the risk of degenerated behaviour with skewed data. Since the hash function has to do computations on the data to fit it into an int, while the equality tests just does comparisons, the hash function will nearly alwas be slower than the equality test, especially since the equality test can often give a negative result before actually looking at all the data.