• S (unregistered) in reply to Nutster
    Nutster:
    Oops, I missed the call to clear(). So, ignore half my argument. But still, recreating the HashMap for each loop is not the way to go, and if the result from the loop is needed afterwards, well this should fail to compile.

    Since it compiled after the change, the map can't have been used outside the loop. And with the change, it's now enforced that there are no side-effects involving that variable, making things a little bit clearer for the next person to have to maintain it.

    If that change costs 0.0001% performance from just one extra object allocation each time through the loop, it's worth it.
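
    The scoping point is easy to demonstrate; a minimal sketch (class name and loop bounds made up) of what the compiler enforces once the declaration moves inside the loop:

      import java.util.HashMap;
      import java.util.Map;

      class ScopeDemo {
          static void run() {
              for (int i = 0; i < 3; i++) {
                  // Declared inside the loop: the map goes out of scope at the end of each iteration.
                  Map<String, Integer> map = new HashMap<String, Integer>();
                  map.put("i", i);
              }
              // map.get("i");  // would not compile: "cannot find symbol"
              // Moving the declaration inside the loop therefore proves at compile
              // time that nothing after the loop depends on the map's contents.
          }
      }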

  • Tasmanian (unregistered)

    TRWTF is MM-DD-YY date formatting.

  • CLR (unregistered) in reply to S
    S:
    Nutster:
    Oops, I missed the call to clear(). So, ignore half my argument. But still, recreating the HashMap for each loop is not the way to go, and if the result from the loop is needed afterwards, well this should fail to compile.
    Since it compiled after the change, the map can't have been used outside the loop. And with the change, it's now enforced that there are no side-effects involving that variable, making things a little bit clearer for the next person to have to maintain it.
    Clearly.
  • S (unregistered) in reply to CLR
    CLR:
    S:
    Since it compiled after the change, the map can't have been used outside the loop. And with the change, it's now enforced that there are no side-effects involving that variable, making things a little bit clearer for the next person to have to maintain it.
    Clearly.

    I was trying to avoid using words like that in my post... :)

  • Baboon (unregistered) in reply to IA
    IA:
    I would really want to see the code where
    map.clear()
    works well and
    map = new HashMap()
    causes trouble. This looks like a real nano-optimization that brings nothing. The new HashMap can be even faster if you have enough memory and the GC never runs.
    Jeremy:
    ... code should be written the easiest to understand and maintain way

    fully agree

    +1 This seems like premature optimization. I personally don't reuse objects unless profiling indicates that it's a hotspot. Now, depending on the application, this code may not be the most efficient in terms of "supposed" GC overhead; however, if it talks to a database, this code is going to be nowhere near as slow as the actual call to the db itself.

    Let's think about this: how long the method runs basically determines a few things. Are these maps still in the young gen, or have they been promoted to the old gen? How many hash buckets have been created, and has the load factor triggered a rehash, etc.?

    So let's assume the default of 16 hash buckets, since we don't know what the inner code actually does ... that's potentially 16 references dropped, while the new allocation is really only a small, super-efficient operation. Also, let's not downplay some of the optimizations the JIT could potentially perform, such as dead code elimination ... really, I would have loved to see the bytecode generated here, the JIT logs and the performance metrics.

    Hmm, I think Snoofle needs to read Charlie Hunt's excellent Java Performance book instead of the usual self-righteous BS he spouts. Sorry Snoofle, without more info I don't agree with that one part of the WTF; for me, code should be clear, and you optimize when necessary, simple as that. The code also reads like shit, so that in and of itself is a WTF ...

    As for the other part, which made me smile: I love people who use the excuse that virtual calls are expensive. You should have seen the mess I saw in some C++ and Java code written with this belief.
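
    For anyone wanting to gather that evidence, a rough sketch of the kind of harness one could use; the class name and loop body are invented, but the flags are standard HotSpot options:

      import java.util.HashMap;
      import java.util.Map;

      // Hypothetical harness (not from the article) for collecting that evidence:
      //   bytecode:  javap -c MapLoop
      //   JIT log:   java -XX:+PrintCompilation MapLoop
      //   GC log:    java -verbose:gc MapLoop
      class MapLoop {
          public static void main(String[] args) {
              long sink = 0;  // consume the map so the JIT cannot eliminate the loop body entirely
              for (int i = 0; i < 1000000; i++) {
                  Map<String, Integer> map = new HashMap<String, Integer>();
                  map.put("i", i);
                  sink += map.size();
              }
              System.out.println(sink);
          }
      }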

  • (cs)

    TRWTF is that Taran got what he deserved. Most TDWTF stories end with the bad employees forcing the good ones to quit.

  • Marcel (unregistered) in reply to Norman Diamond
    But my latest one appears to be memory pressure. On a PC with 4GB of memory, 3.75GB usable by 32-bit Windows XP...

    You're using a PC with 4GB of RAM in 2013? With 32-bit Win XP? Why? I currently have 16GB and I was shocked to get an out of memory error recently... so this Christmas means 32GB for me (at least!).

  • (cs)
      // Original code
      Map<String,Integer> map = new HashMap<String,Integer>();
      for (int i=0; i<1000000; i++) {
          // use map to do stuff
          map.clear(); // OP: empties map
      }
    
      // Refactored code
      // Taran: Obviously, the gc will take care of this; allocate as close to use as possible
      // OP: yes it will; all 1,000,000 times
      for (int i=0; i<1000000; i++) {
          Map<String,Integer> map = new HashMap<String,Integer>();
          // use map to do stuff
      }
    

    Actually, the new code may be faster. Creating and GCing a map is O(1), but clearing a map is O(map-size), which can be costly for large maps.

  • anon (unregistered)
    To his credit, Taran was a fairly bright person.

    People say crap like this too often. No, no he wasn't. He was a simpleton who ruined everything, he wasn't bright at all.

  • csrster (unregistered) in reply to Duncan Simpson
    Duncan Simpson:
    As someone who has a degree in mathematics and CS, I know that there are several meanings of obvious, including 1. True 2. True but a proof is non-trivial 3. False

    Working out which meaning of obvious applies is not always easy to do. Things which are obvious but actually false include that most significant digits are uniformly distributed. In reality 1 wins about 30% of the time and 1-4 about 55%.

    4. Not provable within the axioms of Zermelo–Fraenkel set theory.
  • faoileag (unregistered) in reply to IA
    IA:
    I would really want to see the code where
    map.clear()
    works well and
    map = new HashMap()
    causes trouble. This looks like a real nano-optimization that brings nothing. The new HashMap can be even faster if you have enough memory and the GC never runs.
    "if you have enough memory" might be a real limitation here. If, for whatever reason, the GC does indeed not run while the loop is running, each instantiation of the HashMap adds to the memory load. With enough iterations and large HashMaps, this could definitely mean you run out of memory at some point.

    Also, the GC will eventually run. But if your system does not run out of memory while looping, your GC will at some point in the future have a lot of HashMaps to dispose of.

    Nothing is free, but with the HashMap declared outside the loop and only cleared inside it, you will get much more predictable behaviour.

  • faoileag (unregistered) in reply to S
    S:
    faoileag:
    But creating a hash map is rather expensive. So you do it just once (outside the loop) and then just erase whatever's in it once you're done with it inside the loop.

    That looks like an unnecessary optimisation to me. Creating and discarding the actual HashMap object each time through the loop is only going to be a tiny fraction of the cost of creating and discarding all of its internal data structures, which will be done on a clear() anyway.

    Please keep in mind that Garbage Collection does not happen at a specific point in the code, but rather when the Garbage Collector feels like doing so.

    In other words: if you instantiate the HashMap inside the loop, the end of each iteration is the earliest point at which the Garbage Collector is allowed to dispose of that HashMap. When it actually does so is largely beyond your control, so you could end up with quite a lot of HashMap instances in memory that just take up space while waiting to be garbage collected.

    However, clearing a HashMap is something you do have control over. It happens when you call clear().

    So, when instantiating the HashMap once outside the loop and then only clearing it inside it, you will get a much more predictable behaviour.

  • Krunt (unregistered)

    Is this Java, or did someone just shit a bunch of semicolons and un-intuitive methods all over a screen?

    Oh wait...

  • IA (unregistered) in reply to faoileag
    faoileag:
    ...
    Well, either you hit OOM or you don't. If you do, it means you probably allocate a lot of objects per iteration (you already said you will have a large map, which means a lot of other objects). If you .clear(), then you save exactly two allocations per iteration (you have to zero the table in both cases). So, if the map is really used, the cost is nothing compared to the allocation of all the other objects.

    What is the cost of throwing away a lot of HashMaps that are not referenced anymore? Nothing, zero. You can always attribute the cost of zeroing to the newly created objects.

    Please, get a profiler or heap monitor, write a simple (but preferably non-trivial) piece of code using a HashMap, and run 1e6 iterations in both scenarios. You won't be able to tell the difference.
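
    A poor man's version of that experiment, using Runtime's memory counters instead of a real profiler; the loop sizes and key names below are arbitrary:

      import java.util.HashMap;
      import java.util.Map;

      // Rough heap-watching loop: prints approximate used heap every 100,000 iterations
      // so the two variants (reused map + clear() vs. a fresh HashMap each time) can be
      // eyeballed side by side. A real profiler or GC log gives far better data.
      class HeapWatch {
          public static void main(String[] args) {
              boolean reuse = args.length > 0 && args[0].equals("reuse");
              Runtime rt = Runtime.getRuntime();
              Map<String, Integer> shared = new HashMap<String, Integer>();
              for (int i = 0; i < 1000000; i++) {
                  Map<String, Integer> map = reuse ? shared : new HashMap<String, Integer>();
                  for (int j = 0; j < 100; j++) map.put("k" + j, j);
                  map.clear();  // harmless for the fresh map, required for the shared one
                  if (i % 100000 == 0) {
                      System.out.println(i + ": used ~" + (rt.totalMemory() - rt.freeMemory()) / 1024 + " KiB");
                  }
              }
          }
      }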

  • anonymous (unregistered) in reply to foo AKA fooo
    foo AKA fooo:
    anonymous:
    For uniformly distributed arbitrarily large numbers
    Which obviously(3) exist.
    All real numbers exist. If you're talking about real-world numbers... well shit, I could make up any statistic I wanted and claim it applied and unless I provided a ream of documentation you shouldn't take me very seriously either.

    Also, I could amend my previous statement: for uniformly distributed numbers that are NOT arbitrarily large, but which ARE evenly distributed from base^i (or zero) to base^j - 1 for any positive integers i and j, the distribution of the digit in the most significant place will also be uniform.
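
    A quick, throwaway way to check that claim empirically; the bounds, sample count and seed below are arbitrary choices:

      import java.util.Random;

      // Count leading digits of integers drawn uniformly from [10^1, 10^3 - 1].
      // With power-of-the-base bounds like these, each leading digit 1..9 should
      // come out at roughly 1/9 of the samples, unlike a Benford-style data set.
      class LeadingDigits {
          public static void main(String[] args) {
              int lo = 10, hi = 999;  // 10^1 .. 10^3 - 1, chosen arbitrarily
              int[] counts = new int[10];
              Random rnd = new Random(42);
              for (int n = 0; n < 1000000; n++) {
                  int x = lo + rnd.nextInt(hi - lo + 1);
                  while (x >= 10) x /= 10;  // strip down to the leading digit
                  counts[x]++;
              }
              for (int d = 1; d <= 9; d++) {
                  System.out.printf("%d: %.3f%n", d, counts[d] / 1000000.0);
              }
          }
      }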

  • Cptain Troll (unregistered)

    It seems 'fairly bright' persons can also suffer from the Dunning-Kruger effect.

    Captch: commoveo - that did not commove me

  • Neil (unregistered) in reply to Scarlet Manuka
    Scarlet Manuka:
    foo AKA fooo:
    Duncan Simpson:
    Things which are obvious but actually false include that most significant digits are uniformly distributed. In reality 1 wins about 30% of the time and 1-4 about 55%.
    This obviously depends on the number system.
    The percentages will depend on the base you're using, but the situations where this sort of skewed distribution arises are precisely the ones that are scale-independent. Indeed the distribution is a consequence of the scale independence.
    I thought that scale independence would give you a uniform distribution of the logarithm, so that a leading digit less than n would occur with probability log_b(n), where b is the base; so in base 10 a first digit of 1-4 should occur log_10(5), or about 70%, of the time.
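
    For reference, the standard Benford's-law first-digit formula gives exactly that figure (and the roughly 30% for a leading 1 mentioned earlier):

      \[
      P(D = d) = \log_b\!\left(1 + \tfrac{1}{d}\right), \qquad
      P(D \le 4) = \log_{10} 5 \approx 0.699, \qquad
      P(D = 1) = \log_{10} 2 \approx 0.301 .
      \]
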
  • foo AKA fooo (unregistered) in reply to anonymous
    anonymous:
    foo AKA fooo:
    anonymous:
    For uniformly distributed arbitrarily large numbers
    Which obviously(3) exist.
    All real numbers exist. If you're talking about real-world numbers... well shit, I could make up any statistic I wanted and claim it applied and unless I provided a ream of documentation you shouldn't take me very seriously either.
    The numbers exist, but a uniform distribution doesn't.
    Also, I could amend my previous statement: for uniformly distributed numbers that are NOT arbitrarily large, but which ARE evenly distributed from base^i (or zero) to base^j - 1 for any positive integers i and j,
    That's an artificial assumption. The original point was that naturally occurring distributions are not like this (and not uniform since that's impossible), so indeed 1 is more common than higher first digits. Check out any larger data set (population, financial, whatever) and you'll see this pattern.
  • anonymous (unregistered) in reply to foo AKA fooo
    foo AKA fooo:
    anonymous:
    foo AKA fooo:
    anonymous:
    For uniformly distributed arbitrarily large numbers
    Which obviously(3) exist.
    All real numbers exist. If you're talking about real-world numbers... well shit, I could make up any statistic I wanted and claim it applied and unless I provided a ream of documentation you shouldn't take me very seriously either.
    The numbers exist, but a uniform distribution doesn't.
    Also, I could amend my previous statement: for uniformly distributed numbers that are NOT arbitrarily large, but which ARE evenly distributed from base^i (or zero) to base^j - 1 for any positive integers i and j,
    That's an artificial assumption. The original point was that naturally occurring distributions are not like this (and not uniform since that's impossible), so indeed 1 is more common than higher first digits. Check out any larger data set (population, financial, whatever) and you'll see this pattern.
    The only thing remotely close to that in the original statement was "in reality". Mathematics is one subset of reality, and I described the conditions under which the leading digit will be evenly distributed between the non-zero digits of whatever base you're using. I suspected that all along it was supposed to be talking about "real world" numbers, but I don't have the full data set of "real world" numbers and neither does anyone else. I suppose you could do a full copy of the set of "real world" numbers, and then specify that the set also contains itself, since you made the copy. It might take you a while, but luckily it won't change the distribution of the leading digits of the numbers.
  • anonymous (unregistered) in reply to foo AKA fooo

    p.s. and then I'll define a set that includes all real numbers that aren't included in your set, so that the leading digits of all the numbers will be evenly distributed again (LIKE THEY'RE SUPPOSED TO BE).

  • (cs) in reply to anonymous
    anonymous:
    foo AKA fooo:
    anonymous:
    foo AKA fooo:
    anonymous:
    For uniformly distributed arbitrarily large numbers
    Which obviously(3) exist.
    All real numbers exist. If you're talking about real-world numbers... well shit, I could make up any statistic I wanted and claim it applied and unless I provided a ream of documentation you shouldn't take me very seriously either.
    The numbers exist, but a uniform distribution doesn't.
    Also, I could amend my previous statement: for uniformly distributed numbers that are NOT arbitrarily large, but which ARE evenly distributed from base^i (or zero) to base^j - 1 for any positive integers i and j,
    That's an artificial assumption. The original point was that naturally occurring distributions are not like this (and not uniform since that's impossible), so indeed 1 is more common than higher first digits. Check out any larger data set (population, financial, whatever) and you'll see this pattern.
    The only thing remotely close to that in the original statement was "in reality". Mathematics is one subset of reality, and I described the conditions under which the leading digit will be evenly distributed between the non-zero digits of whatever base you're using. I suspected that all along it was supposed to be talking about "real world" numbers, but I don't have the full data set of "real world" numbers and neither does anyone else. I suppose you could do a full copy of the set of "real world" numbers, and then specify that the set also contains itself, since you made the copy. It might take you a while, but luckily it won't change the distribution of the leading digits of the numbers.
    Instead of having this endless and pointless discussion, read about Benford's law on Wikipedia and STFU.

    PS: https://en.wikipedia.org/wiki/Benford%27s_law

  • Taddam (unregistered) in reply to Kuli

    Oh, yes it does have to do with patterns, namely anti-patterns.

  • (cs) in reply to derari
    derari:
      // Original code
      Map<String,Integer> map = new HashMap<String,Integer>();
      for (int i=0; i<1000000; i++) {
          // use map to do stuff
          map.clear(); // OP: empties map
      }
    

      // Refactored code
      // Taran: Obviously, the gc will take care of this; allocate as close to use as possible
      // OP: yes it will; all 1,000,000 times
      for (int i=0; i<1000000; i++) {
          Map<String,Integer> map = new HashMap<String,Integer>();
          // use map to do stuff
      }

    Actually, the new code may be faster. Creating and GCing a map is O(1), but clearing a map is O(map-size), which can be costly for large maps.

    Doing nothing with the Map: reusing the Map is about 6x faster (6 vs 36 ms).

    Reuse Map and add 100 elements on each iteration: 5.1 s
    Create new Map and add 100 elements on each iteration: 6.7 s

    Reuse Map and add 1000 elements on each iteration: 62 s
    Create new Map and add 1000 elements on each iteration: 81 s

    In both cases reusing the Map is about 23% faster.
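
    Not the actual code behind those numbers, but a minimal sketch of the kind of harness that could produce figures in that ballpark; the key names, element counts and iteration counts are assumptions:

      import java.util.HashMap;
      import java.util.Map;

      // Crude timing comparison of map.clear() versus allocating a new HashMap on
      // every iteration. Not a proper benchmark (no JMH, no warmup separation), so
      // treat the output as a rough indication only.
      class ClearVsNew {
          static long reuse(int iters, int elems) {
              long t0 = System.nanoTime();
              Map<String, Integer> map = new HashMap<String, Integer>();
              for (int i = 0; i < iters; i++) {
                  for (int j = 0; j < elems; j++) map.put("k" + j, j);
                  map.clear();
              }
              return (System.nanoTime() - t0) / 1000000;  // ms
          }

          static long fresh(int iters, int elems) {
              long t0 = System.nanoTime();
              for (int i = 0; i < iters; i++) {
                  Map<String, Integer> map = new HashMap<String, Integer>();
                  for (int j = 0; j < elems; j++) map.put("k" + j, j);
              }
              return (System.nanoTime() - t0) / 1000000;  // ms
          }

          public static void main(String[] args) {
              for (int elems : new int[] {0, 100, 1000}) {
                  System.out.printf("elems=%d reuse=%dms fresh=%dms%n",
                          elems, reuse(1000000, elems), fresh(1000000, elems));
              }
          }
      }

    Running each variant in its own JVM, with a warmup pass first, would make the comparison more trustworthy.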

  • (cs) in reply to MoSlo
    MoSlo:
    I rank the 'Obviously' people in the same group as those who keep saying 'clearly'.

    Obviously!

  • (cs) in reply to S
    S:
    CLR:
    S:
    Since it compiled after the change, the map can't have been used outside the loop. And with the change, it's now enforced that there are no side-effects involving that variable, making things a little bit clearer for the next person to have to maintain it.
    Clearly.

    I was trying to avoid using words like that in my post... :)

    Obviously.

  • Valued Service (unregistered) in reply to faoileag
    faoileag:
    S:
    faoileag:
    But creating a hash map is rather expensive. So you do it just once (outside the loop) and then just erase whatever's in it once you're done with it inside the loop.

    That looks like an unnecessary optimisation to me. Creating and discarding the actual HashMap object each time through the loop is only going to be a tiny fraction of the cost of creating and discarding all of its internal data structures, which will be done on a clear() anyway.

    Please keep in mind that Garbage Collection does not happen at a specific point in the code, but rather when the Garbage Collector feels like doing so.

    In other words: if you instantiate the HashMap inside the loop, the end of each iteration is the earliest point at which the Garbage Collector is allowed to dispose of that HashMap. When it actually does so is largely beyond your control, so you could end up with quite a lot of HashMap instances in memory that just take up space while waiting to be garbage collected.

    However, clearing a HashMap is something you do have control over. It happens when you call clear().

    So, when instantiating the HashMap once outside the loop and then only clearing it inside it, you will get a much more predictable behaviour.

    The predictable behavior being that your hashMap lost references to the objects it knew about.

    That does not guarantee that the memory of the objects the hashMap referred to is released, even if the hashMap held the only references to those objects. You're still dealing with the GC releasing those objects whenever it feels like it.

  • anonymous (unregistered) in reply to derari
    derari:
    Instead of having this endless and pointless discussion, read about Benford's law on Wikipedia and STFU.

    PS: https://en.wikipedia.org/wiki/Benford%27s_law

    Well, if foo AKA fooo had been more specific and had originally said Benford's law, then there wouldn't have been any discussion. Note that the very first sentence of the Wikipedia article reads thus:
    Benford's Law, also called the First-Digit Law, refers to the frequency distribution of digits in many (but not all) real-life sources of data.
    He stated it as if it applied to all real-life sources of data, and that's the only reason I felt compelled to argue that it doesn't.

  • Norman (unregistered) in reply to Marcel
    Marcel:
    But my latest one appears to be memory pressure. On a PC with 4GB of memory, 3.75GB usable by 32-bit Windows XP...
    You're using a PC with 4GB of RAM in 2013? With 32-bit Win XP? Why?
    One reason for continuing to use that PC is that the screen is 1920x1200, and newer ones max out at 1920x1080. Another reason is that the PC is still perfectly serviceable most of the time. It hasn't even suffered overheating of the RAM slots or video chip.

    As for migrating the OS, I'm getting used to Windows 7 on another PC so that will be a possibility some day. It will take more time to migrate applications than to migrate the OS.

  • nobulate (unregistered)

    Adding more whitespace in their code will speed it up significantly:

    :s/obviously/ /g
  • faoileag (unregistered) in reply to Valued Service
    Valued Service:
    faoileag:
    So, when instantiating the HashMap once outside the loop and then only clearing it inside it, you will get a much more predictable behaviour.

    The predictable behavior being that your hashMap lost references to the objects it knew about.

    That does not guarantee a release in memory of the objects the hashMap referred to, even if the hashMap was the only reference to those objects. You're still dealing with the GC releasing those objects whenever it feels like.

    You've got a point there. At least as long as what's stored in the hash map has been copied into the hash map rather than referenced. But with a temporary hash map I doubt that that is the case.

  • no_vi_in_obvious (unregistered) in reply to Migala
    Migala:
    A nony mouse...:
    Obviously the problem is with the Code Review Process... Seeing such an artifact ONCE should have put an immediate stop to it. Having it pervade the codebase is TRWTF, and firmly at the leader's door!

    Obviously he's using the word obviously so much so that he won't be asked to explain himself.

    He's using the word obviously too often to excuse not saying whatever the other person said before he got to it.

  • use_a_profiler (unregistered) in reply to beginner_

    Try adding 1000 elements first, then add 100 each iteration. Clearing will be slower, because it has to clear a map sized for 1000 elements regardless of how many are actually in it.
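
    That matches the OpenJDK implementation, where clear() nulls out every slot of the backing table and the table never shrinks. A small illustration of the effect; the sizes are arbitrary and this is not a rigorous benchmark:

      import java.util.HashMap;
      import java.util.Map;

      // Once a HashMap has grown to hold 1000 entries, its backing table keeps that
      // capacity (OpenJDK's HashMap never shrinks it), so every later clear() walks
      // the full-size table even when only 100 entries were put back in.
      class ClearAfterGrowth {
          static long loop(Map<String, Integer> map, int iters) {
              long t0 = System.nanoTime();
              for (int i = 0; i < iters; i++) {
                  for (int j = 0; j < 100; j++) map.put("k" + j, j);
                  map.clear();
              }
              return (System.nanoTime() - t0) / 1000000;  // ms
          }

          public static void main(String[] args) {
              Map<String, Integer> small = new HashMap<String, Integer>();

              Map<String, Integer> grown = new HashMap<String, Integer>();
              for (int j = 0; j < 1000; j++) grown.put("k" + j, j);  // grow the table once
              grown.clear();

              System.out.println("small map: " + loop(small, 1000000) + " ms");
              System.out.println("grown map: " + loop(grown, 1000000) + " ms");
          }
      }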

  • Developer Dude (unregistered) in reply to ObiWayneKenobi

    No - Obviously.

    I have personally seen patterns implemented poorly - singletons that were not thread safe, factories that were simply a facade (wrappers around creating the object - a separate method for each type of object) and totally unneeded and on and on.

    The only pattern usage I saw in the examples for this WTF was a pattern of a total lack of understanding of the very basic fundamentals of logic.

    Unfortunately I have seen way too many examples of a lack of understanding the fundamentals too; e.g.:

    HashMap someTuple = new HashMap(5);

    To hold five items. If why this is flawed isn't obvious to someone, then they should go look up how hash maps work.
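
    The usual gotcha is that the constructor argument is the initial table capacity, not the number of entries the map holds before resizing: the resize threshold is capacity times the load factor (0.75 by default), and whether a given count actually triggers a resize depends on how the implementation rounds the capacity up. A hedged sketch of the common workaround; the helper name is made up:

      import java.util.HashMap;
      import java.util.Map;

      // new HashMap(n) sets the initial table capacity, not the number of entries the
      // map will accept before rehashing; the map resizes once size exceeds
      // capacity * loadFactor (0.75 by default). Dividing by the load factor up front
      // pre-sizes the map for an expected entry count.
      class PreSizedMap {
          static <K, V> Map<K, V> withExpectedSize(int expected) {
              int capacity = (int) Math.ceil(expected / 0.75);
              return new HashMap<K, V>(capacity);
          }

          public static void main(String[] args) {
              Map<String, Integer> tuple = withExpectedSize(5);  // room for 5 entries without rehashing
              for (int i = 0; i < 5; i++) tuple.put("k" + i, i);
              System.out.println(tuple.size());
          }
      }

    (Guava's Maps.newHashMapWithExpectedSize does the same arithmetic.)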

  • anonymous (unregistered) in reply to Developer Dude
    Developer Dude:
    No - Obviously.

    I have personally seen patterns implemented poorly - singletons that were not thread safe, factories that were simply a facade (wrappers around creating the object - a separate method for each type of object) and totally unneeded and on and on.

    The only pattern usage I saw in the examples for this WTF was a pattern of a total lack of understanding of the very basic fundamentals of logic.

    Unfortunately I have seen way too many examples of a lack of understanding the fundamentals too; e.g.:

    HashMap someTuple = new HashMap(5);

    To hold five items. If why this is flawed isn't obvious to someone, then they should go look up how hash maps work.

    It'd hold five items as long as none of the hashes collide...
