Admin
I can't get less than one line but I can make it more obvious. In Ruby:
my_array.uniq
(or my_array.uniq! if you wish to modify the array in place instead of generating a new array).
Admin
The real WTF is that the original code finds the unique elements (at great cost) and then throws it all away without returning any information.
Admin
Doh! Now I'm the idiot.
Ok, hashtables don't use binary trees; I was thinking of a different data structure. Even still, the lookup time from a hashtable is faster than a full iteration: O(1) on average, O(n) at worst.
Admin
LOL. It's sad when people try to look smart and just end up exposing their ignorance. Let me give you a hint. It's not a tree either. Maybe it has something to do with a table...
Admin
I saw the same thing in Java, so I realised that any algorithm can be expressed using only try/throw/catch statements.
:-) Here is the simple proof: http://99-bottles-of-beer.net/language-java-866.html
Admin
Yeah, already self corrected, but thanks.
Admin
Since I also believe that exceptions should only be involved when there is a good reason for it, I've come up with a simple solution. Every time an exception is thrown for our website, it EMAILS THE ADMIN (me in this case) what the exception was, etc. etc.
To keep myself from getting spammed all the time, I make sure to have good, clean code that doesn't throw exceptions.
And if an exception is uncaught, I get a nasty gram from the application...
Admin
Doesn't every non-unique item take you one frame down the stack?
Admin
Thanks. That was the only response in this whole topic I felt was really worth reading.
Admin
You could use a set. I don't know about .NET, but in Java you could do:
Set deduped = new HashSet(ListOfDuplicatedValues);
A set, by its nature, contains only unique values.
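A fuller sketch of that set-based approach (Java with generics; `LinkedHashSet` is assumed here so first-seen order survives, which a plain `HashSet` wouldn't guarantee):

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class Dedup {
    // LinkedHashSet drops duplicates on construction while preserving
    // first-seen order; use HashSet instead if order doesn't matter.
    static Set<String> dedup(List<String> values) {
        return new LinkedHashSet<>(values);
    }

    public static void main(String[] args) {
        List<String> listOfDuplicatedValues = Arrays.asList("a", "b", "a", "c", "b");
        System.out.println(dedup(listOfDuplicatedValues)); // prints [a, b, c]
    }
}
```

Each insertion into the set is an average-case O(1) hash operation, so the whole dedup is O(n) expected time.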
Admin
To be fair it is an unedited reader submitted article and others already pointed out the problem with that method.
This is not a reflection on CodeProject, it's a reflection on the author of the article not revising it to improve that method.
After reading some of the comments on here I can see there are a lot of little wtf'ers waiting to happen. :)
This type of thing is not up to the usual WTF standards. Seriously, is there no worse code than this out there in actual production use, or is this site just running out of steam?
Admin
Not quite:
Admin
Ummm, individual hash operations are O(n) in the worst case, so using a hash insert operation N times is O(N^2). It doesn't actually matter whether you call contains(value) first, because 2 * O(n) is still O(n). The lookup cost is the same for insert as it is for contains, since the hash table must determine whether the value is already present so it can behave correctly (whether that be to overwrite the contained value, add a second contained value, throw an exception, return FILE_NOT_FOUND, etc.).
The reason hash operations are O(n), not O(1) as most people would like to believe, is the linear search once you've found a hash bucket. In practice, most hash buckets have only one element, so a carefully used hash table gets O(1) amortized performance; however, if you're in an "enterprise" situation you might find that there's no bucket size small enough, and your hashes will get all O(n) on you.
Admin
In .Net, as with other languages, i goes out of scope as soon as the for loop is exited. So:
FindUniqueElements(i+1);
would cause this method to not even compile. For this to work the try catch block would have to be on the inside of the for loop.
Admin
This is the actual implementation in .NET 2.0:
private struct bucket { public object key; public object val; public int hash_coll; }
public virtual bool ContainsKey(object key)
{
    uint seed;
    uint incr;
    Hashtable.bucket bucket;
    if (key == null)
    {
        throw new ArgumentNullException("key", Environment.GetResourceString("ArgumentNull_Key"));
    }
    Hashtable.bucket[] buckets = this.buckets;
    uint num3 = this.InitHash(key, buckets.Length, out seed, out incr);
    int num4 = 0;
    int index = (int) (seed % buckets.Length);
    do
    {
        bucket = buckets[index];
        if (bucket.key == null)
        {
            return false;
        }
        if (((bucket.hash_coll & 0x7fffffff) == num3) && this.KeyEquals(bucket.key, key))
        {
            return true;
        }
        index = (int) ((index + incr) % ((ulong) buckets.Length));
    } while ((bucket.hash_coll < 0) && (++num4 < buckets.Length));
    return false;
}
Admin
"Enterprise" here having one of two meanings:
Your data set is so huge that the hash table involves disk pages, and linear searches through disk pages suck. Actually, even O(1) searches through disk pages suck. This is why most DBs use some kind of balanced tree: it keeps similar values together, so you can keep a bunch of related ones on each page.
The hash container you are forced to use was written by weasels, and all data values end up in a single hash bucket, producing slightly worse results than linear search, but consuming more memory.
Admin
Throws stone
Admin
Call me crazy, but what's wrong with the way I do it:
?
I'm not a Java/.NET/whatever coder, but isn't there also a potential stack overflow problem from the recursive nature of the WTF?
Admin
This WTF is especially meaningful to me, because I have been there no more than a couple months ago.
I wrote a small utility class to parse an ini file. It was C# with .NET 2.0, btw.
So, I was using exceptions to handle syntax errors. Like, each time I found a syntax error, I'd throw an exception, catch it, log it, then analyse the rest of the file. Then I wrote an ini file full of possible mistakes to test. There were about 15 syntax errors to catch, and one exception for each one. Execution time : 47 to 50 seconds.
It was too much, so I decided I'd try to anticipate all the error causes (like duplicate key, malformed line, etc.) and return false with an error message instead. Execution time: less than what a DateTime object can measure.
Lesson learned: NEVER USE EXCEPTIONS unless you really have to. And most of the time when you think you need them, you actually don't.
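A rough Java sketch of that anticipate-the-errors style (the syntax rules, method names, and error messages here are invented for illustration; the poster's actual C# parser wasn't shown):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IniParser {
    // Instead of throwing on each syntax error, collect error messages and
    // keep going; the caller inspects the returned list.
    static List<String> parse(List<String> lines, Map<String, String> out) {
        List<String> errors = new ArrayList<>();
        int lineNo = 0;
        for (String line : lines) {
            lineNo++;
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith(";")) {
                continue; // blank line or comment
            }
            int eq = trimmed.indexOf('=');
            if (eq <= 0) {
                errors.add("line " + lineNo + ": malformed line");
                continue;
            }
            String key = trimmed.substring(0, eq).trim();
            if (out.containsKey(key)) {
                errors.add("line " + lineNo + ": duplicate key '" + key + "'");
                continue;
            }
            out.put(key, trimmed.substring(eq + 1).trim());
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        List<String> errors = parse(
            List.of("a = 1", "bogus", "a = 2", "; comment", "b = 3"), values);
        System.out.println(errors);
    }
}
```

No exception objects are constructed on the error path, which is the whole point of the poster's speedup.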
Admin
The quote is, "Let he who is without sin, cast the first stone."
if (!(he & SIN)) { first_stone = (Stone) stones[0]; }
Crash Magnet
Admin
Looks to me like a good way to avoid the "enterprisey" situations mentioned above. If my math is correct, your algorithm would be O(n log n) (sorting: O(n log n), iteration: O(2n) = O(n)). So IN THEORY you would be a good bit faster than the O(n^2) worst case of the hash tables. But since in practice the buckets contain only a few elements and the runtime often nears O(n), faster than you again, that probably only matters in extreme cases.
Feel free to throw exceptions at me if I'm wrong...
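The sort-then-scan algorithm being analyzed wasn't quoted, but a plausible version of it (a sketch, not the original poster's code) looks like this:

```java
import java.util.Arrays;

public class SortedDedup {
    // Sort a copy, then keep each element that differs from its predecessor:
    // O(n log n) for the sort plus a single O(n) scan.
    static int[] unique(int[] input) {
        int[] a = input.clone();
        Arrays.sort(a);
        int count = 0;
        for (int i = 0; i < a.length; i++) {
            if (i == 0 || a[i] != a[i - 1]) {
                a[count++] = a[i]; // compact unique values to the front
            }
        }
        return Arrays.copyOf(a, count);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(unique(new int[] {3, 1, 3, 2, 1})));
        // prints [1, 2, 3]
    }
}
```

Unlike the hash-based approach, this loses the original element order but never degrades past O(n log n), no matter how hostile the input.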
Admin
Throwing fifteen exceptions DOES NOT TAKE 50 seconds.
Admin
Actually, were you running under a debugger? It could have spent that time loading symbols when the first exceptions were being thrown. Outside of a debugger, that wouldn't happen; it would still be on the order of a millisecond or two.
Admin
Here is the source code for the 2.0 version of Hashtable.ContainsKey. As you see, it does operate as a hashtable.
Public Overridable Function ContainsKey(ByVal key As Object) As Boolean
    Dim seed As UInt32
    Dim incr As UInt32
    Dim bucket As bucket
    If (key Is Nothing) Then
        Throw New ArgumentNullException("key", Environment.GetResourceString("ArgumentNull_Key"))
    End If
    Dim buckets As bucket() = Me.buckets
    Dim num3 As UInt32 = Me.InitHash(key, buckets.Length, seed, incr)
    Dim num4 As Integer = 0
    Dim index As Integer = CInt((seed Mod buckets.Length))
    Do
        bucket = buckets(index)
        If (bucket.key Is Nothing) Then
            Return False
        End If
        If (((bucket.hash_coll And &H7FFFFFFF) = num3) AndAlso Me.KeyEquals(bucket.key, key)) Then
            Return True
        End If
        index = CInt(((index + incr) Mod CULng(buckets.Length)))
    Loop While ((bucket.hash_coll < 0) AndAlso (++num4 < buckets.Length))
    Return False
End Function
Admin
Beyond putting Sleep() statements in the exception constructors, I really can't think of a way to make 15 exceptions take more than a couple milliseconds. Is this how you managed it?
Admin
Also, exceptions shouldn't be used for mainstream logic because they are supposed to occur in exceptional circumstances (hence the name) and disrupt standard control flow through the code. If you're reading code like this and see a try-catch clause, you automatically think that if the catch block is entered, something has gone wrong. Ignoring this pattern just makes it less intuitive.
Admin
I take exception to today's submission!
Admin
Sounds like a few experiments with exceptions could be useful.
How long do the following snippets run on your machine?
1:
2:
3:
Addendum (2007-03-16 14:38): oops, was referring to:
Admin
Admin
Admin
or actually
Admin
For Integer.Parse, it seems to take 100x longer to catch an exception than it would have taken if the parse completed successfully. In the IDE it takes 4000x longer.
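A crude way to reproduce that kind of measurement (absolute numbers vary wildly by machine, runtime, and whether a debugger is attached; this only illustrates the shape of the comparison, in Java rather than .NET):

```java
public class ParseCost {
    // The exception-driven way: let parseInt throw on bad input.
    static boolean parseWithCatch(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // The anticipate-errors way: check the characters up front
    // (digits only; signs and overflow are ignored in this sketch).
    static boolean looksNumeric(String s) {
        if (s == null || s.isEmpty()) return false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') return false;
        }
        return true;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        for (int i = 0; i < 10_000; i++) parseWithCatch("oops");
        long caught = System.nanoTime() - t0;

        t0 = System.nanoTime();
        for (int i = 0; i < 10_000; i++) looksNumeric("oops");
        long checked = System.nanoTime() - t0;

        System.out.println("catch: " + caught + " ns, check: " + checked + " ns");
    }
}
```

The gap comes almost entirely from constructing the exception (stack trace capture), not from the try block itself.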
Admin
Do you mean to suggest it takes 50 seconds to throw 15 exceptions? You are mistaken, I believe.
What exactly do you believe to be the smallest accuracy of the DateTime structure?
Hogwash. Pure hogwash.
Admin
WTF!!!
This has gotta be one of the stupidest lists of comments I've ever seen.
Yes, it's absolutely moronic to write a process that actually depends on exceptions to work.
Haven't any of the above commentators figured out what would happen if the last element WASN'T unique?
Admin
Nice! You get a star :) </srsly>.
Admin
System dependent. It's probably about a millisecond. Nothing wrong with using the return false thing internally, but you'd probably want to throw an exception at the api level.
Admin
Admin
You might want to look at the ticks property of the DateTime structure. You're not just wrong, you're four orders of magnitude wrong.
Admin
Admin
What? Why? That variable is a static global.
Admin
Whether hash operations are O(n) or O(1) depends on the exact type of hash table you use. Using a fixed number of hash buckets will definitely get you O(n). Using a varying number of hash buckets (as .NET apparently does), or using open addressing, gives you O(1) expected time but not, as you mentioned, guaranteed O(1). However, cuckoo hashing gives you O(1) lookup time, guaranteed.
Admin
Especially in .NET
Throwing exceptions is an expensive operation in .NET, people! They're not meant to be used for things that you know will happen.
Admin
Your statement is clearly absurd, as measured against three possible yardsticks:
(a) It isn't concise enough. I'm sure that APL only needs, at most, five characters to do the same thing. (Plus you get lazy evaluation for free!) (b) It uses far too many individual characters. (I can see a WTF recursion coming at me right now...) Whitespace would be far better suited to the task. (c) It only uses fourteen different keystrokes. Your keyboard will get bored. Parts of it will jam up, just when you need that elusive "x".
It's hardly in the spirit of things to wonder what the underlying mechanism to "^aCollection asSet" might be. After all, it would detract from the many wonderful arguments we're all having about whether hash tables use trees, or lists, or have duplicates, or might optionally have duplicates, or are anyway a suitable .Net 2.0 implementation compared to Dictionary; which might in turn use trees, or lists, or have duplicates, or... God knows. It might even be implemented in terms of quantum computing by now.
There is nothing wrong with Smalltalk programmers. They tend to be academics, or else the people responsible for bringing the replacement HR system of America's third largest auto manufacturer crashing ignominiously to the ground; but there's nothing wrong with Smalltalk programmers.
Good Smalltalk programmers use Smalltalk appropriately and wisely. But remember:
In Soviet Russia, Smalltalk uses you!
Admin
pedantic
Of course, it would be easy to be even MORE pedantic here...
Admin
However, the only sense in which hashtables operate at between O(1) at best and O(n) at worst is that they operate at O(1). Trumpets please:
Knuth famously claims that there are only seven different categories of sorting algorithms. (I think he's quoting Aristotle here, or perhaps he's just indulging in that well known geek humour that we all love so much.) I can think of about four different hashing mechanisms, but they only really differ in terms of how they cope with collisions. Collisions are a higher-order problem.
Lots (not, obviously, all, so don't yell at me) of posters on this site seem to miss the point of hash tables.
Hash tables do one thing, and they do it well.
Basically, you take your key, and you take your value. The key (K) may or may not be related to the value (V). This is entirely irrelevant.
You apply a hashing algorithm to K and use it as an index into, basically, a vector/array/whatever. A perfectly decent algorithm in many cases is to XOR every byte in the key. Or every short int. Or whatever. (If you're insane enough to roll your own, I recommend co-primes.)
You now have (given more advanced usage as described by John, above) O(1) insertion and O(1) deletion and O(1) location. It is not possible to beat this.
What you lose is the ability to iterate in any meaningful manner. You lose sorting, you lose ... well, I'm getting bored. But you still have O(1) for what you want.
Even if you don't use C++, I strongly recommend that you look at the documentation for the STL. It makes all of this stuff fairly clear, and you can probably translate the ideas into your own code.
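A toy illustration of the XOR-every-byte hashing idea described above (a sketch only; real hash functions use stronger mixing, e.g. multiplying by co-primes as suggested, and must handle non-ASCII and sign issues):

```java
public class XorHash {
    // XOR every byte of the key together, as described above.
    // For ASCII keys the result stays in 0..127, so it is non-negative.
    static int hash(String key) {
        int h = 0;
        for (byte b : key.getBytes()) {
            h ^= b;
        }
        return h;
    }

    // Map the hash onto a bucket index in a fixed-capacity table:
    // this is the O(1) "location" step, independent of table size.
    static int bucketIndex(String key, int capacity) {
        return hash(key) % capacity;
    }

    public static void main(String[] args) {
        System.out.println(bucketIndex("FILE_NOT_FOUND", 16));
        System.out.println(bucketIndex("unique", 16));
    }
}
```

Note how weak this particular function is: any two keys that are permutations of each other collide, which is exactly the kind of behaviour that produces the "enterprisey" single-bucket degradation discussed earlier.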
Admin
I throw boulder.
Admin
I'm pretty sure you were running the code in a debugger. Compile for release, run it outside the debug environment, and again I'm pretty sure you will find that exceptions do not, in fact, take 3 seconds each on average. Consider also that .NET programs all over the world are throwing and catching exceptions all the time; don't you think it would be common knowledge by now if .NET exception handling were truly that slow?
Admin
Eric is right; however, what the OP probably meant is 'less than what the computer can measure', not 'less than what a DateTime can store'. The resolution of time measurement done by a PC depends on the RTC frequency, which I believe is set to 18.2 Hz. So yeah, 55 milliseconds is correct. Unless you want to be pedantic :)
Admin
Tell me about it. One time I threw a bunch of exceptions but they found their way back home and killed my dog. I will not rest until those fucking exceptions are caught.
Admin
For what it's worth, Michael, there are languages which do optimize small hashtables to linear lists, and somewhat larger ones to (possibly balanced) binary trees, only actually building a true hash table when the size and certain other properties of the table indicate a good probability of a good tradeoff between the extra space consumed and the running time.
(Yeah, that means that, if the hashtable starts with fewer than ten elements in its declaration and the compiler can't find initialization code that makes it big enough to be worth it, it could at different times during runtime have each of three different internal representations.)
(Oh, and I seem to remember that one implementation of the language actually added some other internal representations including trie and such.)
That doesn't mean, of course, that O(n^2) is going to be the typical search time for hashtables in said language, but if you've seen such languages you have an excuse for seeing things differently.