• (cs) in reply to Julian Calaby
    Julian Calaby:
    Problems with the binary search function:
    1. Creates three unnecessary variables and doesn't use one of them.
    2. The type of sr.
    3. No returns on the recursive calls
    4. Recursive calls are utterly unnecessary as this can easily be performed iteratively.
    5. Confusing comment.
    6. immsi will be parsed as a float for every comparison.
    7. Single space indents are bad style.
    8. strcmp() is not the right way to compare two floats, unless you really want 0 to not equal 0.00.
    9. Why the hell are we passing around a huge array of strings representing floating point numbers anyway?
    10. I hope the data is sorted =)

    1, 3, 4 and 5 indicate to me that this was originally implemented as an iterative function, then made recursive. Why, I can't fathom.

    Because he's an idiot. No, really. Things have progressed a lot since I joined this company almost eight years ago: whereas there used to be developers in several sections (in this case, the billing section for post-paid customers), there's now just one development section, with clear guidelines and a manager and senior development team that does not take kindly to sloppy design, testing or implementation.

    This particular developer, who left a long time ago and is now making more money than we do (we haven't quite caught up yet with Western European standards), is now referred to as "he who shall not be mentioned".

    @1: He grabbed an algorithm off the internet and modified it, without actually understanding it.
    @2: No, that is actually correct. Well, it should have been char**, but at the time we had fewer than 20,000 contract customers.
    @3: See #1.
    @4: See #1.
    @5: That comment is actually quite unique. Usually there are none, and there's just confusing code. He probably couldn't be bothered to remove it.
    @6: And it really should have used strcmp()...
    @7: If it were just the indents...
    @8: But atof() is used for string comparison, because the strings happen to consist of 15 digits.
    @9: Just a reference, this is C after all, and 20,000 post-paid mobile phone customers should be enough for anybody. :)
    @10: Good point. Since the code actually worked, I presume it was, but then again...

    Read my previous comments on this article for some more gems, like a goto statement in C code. :)
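    For reference, the iterative shape the commenters have in mind looks something like this (a sketch in Python rather than the original C; the one detail carried over is that equal-width digit strings, like 15-digit IMSIs, sort the same lexicographically as numerically, so plain string comparison works and no atof()-style conversion is needed per probe):

```python
def binsearch(keys, key):
    """Iterative binary search over a sorted list; returns an index or -1."""
    lo, hi = 0, len(keys) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if keys[mid] == key:
            return mid              # every branch returns a value
        elif keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

# 15-digit strings, as in the billing code; compared as strings throughout
imsis = sorted("%015d" % n for n in (204080000000007, 204080000000042, 204080000000099))
assert binsearch(imsis, "%015d" % 204080000000042) == 1
assert binsearch(imsis, "%015d" % 123) == -1
```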

  • sobek (unregistered) in reply to SenTree
    SenTree:
    Ian Gent:
    eric76:
    Consider this problem:

    George bought a sack of 100 pieces of candy at the store. 90 of the pieces are lemon flavored and ten are cherry flavored. Of the two, George prefers the lemon flavored candies.

    Every day George randomly picks a piece of candy out of the bag. If it is lemon flavored, he eats it and puts the bag away for the next day.

    But if the candy he chose is cherry flavored, he puts it back in the bag and then randomly picks a candy out of the bag and eats it regardless of the flavor. In other words, he'll only put a piece of candy back at most once per day.

    What are the odds that when one piece of candy remains, it will be lemon flavored?

    I can't see the solution, but I can guess the answer by the following logic.

    With these kinds of puzzles, the answer is always 0, 1/2, or 1. It can't be 0 or 1, because there is a non-zero chance of picking out lemons until there are none left, or cherries until there are none left.

    Therefore the answer is a half.

    Actually, there's a small chance that the answer is either 1/100 or 99/100, because those numbers come up sometimes, but I think 1/2 is more likely.

    I think he's effectively picking a random candy every day, so the odds remain what they were at the start, 9/10.

    I also think 9/10 is the correct answer. However, I think it's more complicated. Each day the chances of getting a lemon are different: on the first day it's 9/10. On the second day it's 89/99 or 90/99, depending on which flavor was picked on the first day. And so on. This is called permutations with repetition.

    And you guys using Excel - how did you get this 0.(81)? I don't get it at all. Also, I don't think a 10x90 table is enough to brute-force this case.

    Anyone feel free to correct me if I'm wrong; the last time I had maths at school was over 10 years ago!
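    For what it's worth, George's procedure is easy to simulate directly; a quick Monte Carlo sketch (the function and trial count here are made up for illustration) suggests the answer is neither 1/2 nor 9/10:

```python
import random

def last_is_lemon(lemons, cherries, rng):
    """Simulate George's routine until one candy remains; True if it's lemon."""
    while lemons + cherries > 1:
        if rng.randrange(lemons + cherries) < lemons:
            lemons -= 1                                  # lemon picked first: eaten
        elif rng.randrange(lemons + cherries) < lemons:
            lemons -= 1                                  # cherry went back, lemon eaten
        else:
            cherries -= 1                                # cherry went back, cherry eaten
    return lemons == 1

rng = random.Random(12345)
trials = 100_000
p = sum(last_is_lemon(90, 10, rng) for _ in range(trials)) / trials
assert 0.06 < p < 0.10    # clusters around 0.08, not 0.5 and not 0.9
```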

  • (cs) in reply to sobek
    sobek:
    I also think 9/10 is the correct answer. However, I think it's more complicated. Each day the chances of getting a lemon are different: on the first day it's 9/10.
    Start with an invalid assumption, and you end up with an invalid answer.

    On the first day, there is 9/10 chance of picking a lemon. 1/10 chance of picking a cherry. But you failed to note the bit where picking a cherry gives a second chance. And on that second chance (all the drops are back in the bag), he again has a 9/10 chance of picking a lemon anyway.

    So, to eat a cherry on the first day requires a 1/10 * 1/10 chance. There is 99% chance of picking a lemon, not 90%. This is why the lemons get consumed much faster than you expect.

    (FWIW, I left school much longer ago than you.)

  • sobek (unregistered) in reply to Bellinghman
    Bellinghman:
    sobek:
    I also think 9/10 is the correct answer. However, I think it's more complicated. Each day the chances of getting a lemon are different: on the first day it's 9/10.
    Start with an invalid assumption, and you end up with an invalid answer.

    On the first day, there is 9/10 chance of picking a lemon. 1/10 chance of picking a cherry. But you failed to note the bit where picking a cherry gives a second chance. And on that second chance (all the drops are back in the bag), he again has a 9/10 chance of picking a lemon anyway.

    So, to eat a cherry on the first day requires a 1/10 * 1/10 chance. There is 99% chance of picking a lemon, not 90%. This is why the lemons get consumed much faster than you expect.

    (FWIW, I left school much longer ago than you.)

    Ok, you're perfectly right. Thanks for explanation.

  • (cs) in reply to sobek
    sobek:
    Bellinghman:
    <snip> On the first day, there is 9/10 chance of picking a lemon. 1/10 chance of picking a cherry. But you failed to note the bit where picking a cherry gives a second chance. And on that second chance (all the drops are back in the bag), he again has a 9/10 chance of picking a lemon anyway.

    So, to eat a cherry on the first day requires a 1/10 * 1/10 chance. There is 99% chance of picking a lemon, not 90%. <snip>

    Ok, you're perfectly right. Thanks for explanation.

    No, he's perfectly wrong.... if you consider the fact that he swapped the odds for lemons and cherries. If there are 10 lemon drops on a 100-drops package, it's obvious that the chance of a first pick being a lemon one is 1/10, and not 9/10 . But other than all this (including the "1/10 * 1/10 chance" - that should be a 9/10 * 9/10 chance), he's actually right.

    Addendum (2008-12-23 10:32): Stupid me! I knew I should have re-read the actual problem... Well who looks like a fool now?

  • (cs) in reply to PC Paul
    PC Paul:
    SpamBot:
    The person who said that the probability of the last candy being lemon is zero is absolutely right. I don't care what maths the rest of you did - you're wrong. If every time he gets a cherry he puts it back, it's IMPOSSIBLE for this to end any other way.

    How am I so certain? My daughter and I spent all night picking out all the red sweets and putting the lemon ones back - result, pot of 100% lemon sweets.

    Impressive. But next time read the question properly first...

    "But if the candy he chose is cherry flavored, he puts it back in the bag and then randomly picks a candy out of the bag and eats it regardless of the flavor."

    Cherry ones only go back if they are the first ones out. Second ones get eaten.

    I only know because I thought the same thing at first...

    I was somehow misunderstood, defended, and then argued against with the opposite result, all in the same post by the same person.

    At any rate, the principle of the Halloween Candy Corn Law clearly states that the least desirable candy will be left when the bottom of the sack is reached. In George's case, this is cherry. Hence, there's no possible way a delicious lemon treat is waiting for him as he polishes off the bag of goodness. QED.

  • scruffy (unregistered) in reply to ajw

    Since the original question indicates that George takes another candy, i.e. a different individual sweet than the one involved in the first trial, cherry flavoured sweets will be eliminated until there is only one left, at which point the second trial will always evaluate as "lemon".

    As such the last sweet in the bag would be a charry flavoured one.

    If the candidates for the second trial selection included the cherry flavoured specimen from the first trial, then things get markedly more complicated! However, even when the proportion reaches parity, lemon will still be eliminated at least three times more often than cherry. Cherry will only be eliminated more often when they outnumber lemon by 2.41 to 1. This would hint that it won't happen until there are about four lemons left!

    Getting to the very end of my lunchbreak here I'm going to say "meh, it probably tends to 1".

    Meh, it probably tends to 1.

    The true WTF is that the guy who used excel didn't use matlab instead!

  • (cs) in reply to scruffy
    scruffy:
    As such the last sweet in the bag would be a charry flavoured one.
    The bag caught on fire?
  • Paul (unregistered)

    Apart from all the other problems with the binsearch algorithm (e.g. atof), I'm not really sure what the issue is with the recursion.

    It's a binary search! So even for zillions of entries it will only recurse a few levels deep. It may well not have needed further optimisation. If it's only searching 20,000 entries, it will have a depth of about 15. That'll hardly be noticeable.

    It's not necessarily a case of 'for someone who doesn't know about optimisation it's OK'. It could be more like 'for something that doesn't need optimisation it's reasonable'.

    As for the 4000->40,000 thing. The only issue I can see with that is that it crashes with a buffer overflow (rather than exiting cleanly) when it reaches 40,001 entries. Programmers are always setting arbitrary limits, eg 'size of memory' or '65535' or '4294967295' or whatever. 40,000 is just another example of that.

    Without more information we don't know if 40,000 is a reasonable limit. Each item may be 40kB, or they may just have 200 new entries a year or whatever.

    Dynamic allocation won't solve the issue, just move it a bit further away. At some point virtual memory space will run out, and that will be less deterministic than a fixed maximum size, so harder to deal with.

    However, it DOES need a clean termination (with a sensible error message) if it goes over the limit, rather than just waiting for it to crash.

  • illum (unregistered)

    What kind of crappy candy company sells bags of candy that contain 90 lemon and 10 cherry pieces?

    I'm not sure I could trust a company selling lemon candy that always seems to drop 10 cherry candies I don't like into my bag of lemon candy...

    Also, why doesn't he just buy a bag with 100 lemon candies? If you don't like cherry, don't buy them.

  • Ben (unregistered) in reply to illum

    I'd love to understand the Haskell version, especially the "optimized" code. Can anyone shed some light on it? In particular, some of the operators aren't clear to me, nor is the whole ps business.

    I'm trying to get into functional programming, but I haven't looked at Haskell once, and it would probably take me quite some time to get it on my own.

    Any takers?

  • (cs) in reply to Paul
    Paul:
    As for the 4000->40,000 thing. The only issue I can see with that is that it crashes with a buffer overflow (rather than exiting cleanly) when it reaches 40,001 entries. Programmers are always setting arbitrary limits, eg 'size of memory' or '65535' or '4294967295' or whatever. 40,000 is just another example of that.
    Read my comments on the previous pages. It was actually 900 and 90,000, not 4,000 and 40,000, and this was for loading records in memory that didn't need to be loaded in the first place, because scanning could happen on the fly.

    It all boils down to the fact that He Who Shall Not Be Mentioned just didn't know what he was doing, or just didn't care, and there was nobody to check his code.

    Nowadays, we run source code analysis tools like PMD on the Subversion repository, so things have changed a bit since then.

  • Sean (unregistered) in reply to Severity One
    Severity One:
    Julian Calaby:
    10. I hope the data is sorted =)
    @10: Good point. Since the code actually worked, I presume it was, but then again...

    That's not a good point at all.

    I can't fathom why people who are worried about the execution time and stack issues of a binary search algorithm done recursively rather than iteratively on a data set that's only 20,000 records (max of 15 recursive calls) are implying that he should check whether the array is sorted. Sure, let's throw a linear scan of the array in there to see if it's sorted before we do our binary search; that'll be great for performance!

    A binary search algorithm should not care if the array is sorted; that concern should be left to the person calling the algorithm. If you are stupid enough to call binsearch on an unsorted array, you deserve to not find your result.

    I'm not defending this guy's code; anyone with a brain can see that it's poorly designed and written, but some people just pick the stupidest stuff to complain about.


  • (cs) in reply to Sean
    Sean:
    A binary search algorithm should not care if the array is sorted; that concern should be left to the person calling the algorithm. If you are stupid enough to call binsearch on an unsorted array, you deserve to not find your result.

    I'm not defending this guy's code; anyone with a brain can see that it's poorly designed and written, but some people just pick the stupidest stuff to complain about.

    The thing is, after having seen his programming prowess, and since he was the one who called his own function, he might actually be stupid enough to do such a thing.

    Well, not stupid. He simply didn't care.

  • Downfall (unregistered) in reply to scruffy
    scruffy:
    Since the original question indicates that George takes another candy, i.e. a different individual sweet than the one involved in the first trial, cherry flavoured sweets will be eliminated until there is only one left, at which point the second trial will always evaluate as "lemon".

    Objection, facts not in evidence. The OP indicates that, "But if the candy he chose is cherry flavored, he puts it back in the bag and then randomly picks a candy out of the bag and eats it regardless of the flavor."

    If he puts it back in the bag, then it's possible he can indeed select the same cherry piece twice.

  • Capt. Obvious (unregistered) in reply to Sean
    Sean:
    I can't fathom why people who are worried about the execution time and stack issues of a binary search algorithm done recursively rather than iteratively on a data set that's only 20,000 records (max of 15 recursive calls) are implying that he should check to see if the array is sorted. Sure lets throw a linear scan of the array in there to see if it's sorted before we do our binary search, that'll be great for performance!

    Sean, I think you misunderstand. One person raised the worry about the data being sorted. Given reasonable coding standards, it's a reasonable assumption.

    What most people are complaining about with the recursion is that it is often better practice* to write such searches as a loop rather than as a recursive function. That's what they mean by "iterative" as opposed to "recursive".

    *These habits are often formed when the compiler won't optimize the code to a loop anyway. The concerns are function overhead in run-time and RAM usage (for the iterative people) vs. ease of understanding (for the recursive people).

  • Capt. Oblivious (unregistered) in reply to Capt. Obvious
    Sean:
    I can't fathom why people who are worried about the execution time and stack issues of a binary search algorithm done recursively rather than iteratively on a data set that's only 20,000 records (max of 15 recursive calls) are implying that he should check to see if the array is sorted. Sure lets throw a linear scan of the array in there to see if it's sorted before we do our binary search, that'll be great for performance!

    Sean, I think you misunderstand. One person raised the worry about the data being sorted. Given reasonable coding standards, it's a reasonable assumption.

    What most people are complaining about with the recursion is that it is often better practice* to write such searches as a loop rather than as a recursive function. That's what they mean by "iterative" as opposed to "recursive".

    *These habits are often formed when the compiler won't optimize the code to a loop anyway. The concerns are function overhead in run-time and RAM usage (for the iterative people) vs. ease of understanding (for the recursive people).

  • (cs) in reply to Smash King
    Smash King:
    No, he's perfectly wrong.... if you consider the fact that he swapped the odds for lemons and cherries. If there are 10 lemon drops on a 100-drops package, it's obvious that the chance of a first pick being a lemon one is 1/10, and not 9/10 . But other than all this (including the "1/10 * 1/10 chance" - that should be a 9/10 * 9/10 chance), he's actually right.

    Addendum (2008-12-23 10:32): Stupid me! I knew I should have re-read the actual problem... Well who looks like a fool now?

    You, sir, are incorrect. Don't feel bad; you fell for a common mistake. Read up on the Gambler's Fallacy. The odds of picking a lemon or cherry do not change on the second pick. Think of it this way: on a 6-sided die, what are your chances of rolling a 1? Exactly 1 in 6. Now say you roll a 1, and you roll again; what are your chances of rolling a 1? Once again, 1 in 6. Even though the odds of rolling two 1's in a row are less, each roll's odds are figured on their own, not influenced by what has gone before.

    downfall:
    Objection, facts not in evidence. The OP indicates that, "But if the candy he chose is cherry flavored, he puts it back in the bag and then randomly picks a candy out of the bag and eats it regardless of the flavor."

    If he puts it back in the bag, then it's possible he can indeed select the same cherry piece twice.

    See, downfall gets it. If there are 90% lemons, then there will be a 90% chance that a lemon will be last. No programming needed.

  • evol262 (unregistered) in reply to KattMan

    Cumulative probability. Look into it.

  • (cs) in reply to Tim
    Tim:
    Seems to me the 4,000 Records solution is exactly the correct one in most circumstances - it minimises effort, retesting and chance of any unintended consequences.

    If this was in a production application, the only circumstances under which I would normally let one of my developers spend time rewriting this module to do dynamic allocation would be (a) if I thought the amount of data was likely to grow to 10 times its current value within the lifetime of the application (and even then I would be tempted to just increase the size of the array further), or (b) if the allocation of 40,000 records used an excessive amount of memory.

    I see what you've done there. Well played, sir. Well played.

  • Iago (unregistered)

    I love all these people assuming that all compilers will always perform tail call optimisation.

    Most mainstream compilers only perform it in very limited circumstances, if at all. It is NEVER guaranteed in any mainstream compiler. And it is guaranteed NOT to happen in cases like the code in the WTF, which doesn't contain any tail calls ...

  • (cs) in reply to KattMan
    KattMan:
    Smash King:
    No, he's perfectly wrong.... if you consider the fact that he swapped the odds for lemons and cherries. If there are 10 lemon drops on a 100-drops package, it's obvious that the chance of a first pick being a lemon one is 1/10, and not 9/10 . But other than all this (including the "1/10 * 1/10 chance" - that should be a 9/10 * 9/10 chance), he's actually right.

    Addendum (2008-12-23 10:32): Stupid me! I knew I should have re-read the actual problem... Well who looks like a fool now?

    You, sir, are incorrect. Don't feel bad; you fell for a common mistake. Read up on the Gambler's Fallacy. The odds of picking a lemon or cherry do not change on the second pick. Think of it this way: on a 6-sided die, what are your chances of rolling a 1? Exactly 1 in 6. Now say you roll a 1, and you roll again; what are your chances of rolling a 1? Once again, 1 in 6. Even though the odds of rolling two 1's in a row are less, each roll's odds are figured on their own, not influenced by what has gone before.

    downfall:
    Objection, facts not in evidence. The OP indicates that, "But if the candy he chose is cherry flavored, he puts it back in the bag and then randomly picks a candy out of the bag and eats it regardless of the flavor."

    If he puts it back in the bag, then it's possible he can indeed select the same cherry piece twice.

    See, downfall gets it. If there are 90% lemons, then there will be a 90% chance that a lemon will be last. No programming needed.

    No, Smash King is right, you're wrong. a) Smash King first believed that Bellinghman got the numbers of lemon flavoured and cherry flavoured candies wrong. He then noticed his error and conceded that that made him look like a fool. He never insinuated that the chances of picking a lemon or cherry changed on the second pick. b) Since the chances of a cherry being permanently removed from the bag are not c/(c+l) but (c/(c+l))^2, the chances of the last candy being lemon flavoured if there are initially 90% lemons is indeed less than 90% (0.9/(1+number of cherries), to be exact).

  • (cs)

    The thing I find the craziest about the candy problem is that there is...

    a 1763 in 753249978808893720 chance of having 53 lemons and no cherries,

    a 1763 in 501668471336346545 chance of having 52 lemons and no cherries,

    a 1763 in 334885997323933770 chance of having 51 lemons and no cherries, and

    a 1763 in 223989322800270456 chance of having 50 lemons and no cherries!

    Isn't that wild?!

  • Jimmy Jones (unregistered) in reply to synp

    Ummm, that's what "realloc()" is for...

    if (size > capacity) { size_t ncap = capacity * 2; void *tmp = realloc(foo, ncap * sizeof *foo);
                           if (tmp) { foo = tmp; capacity = ncap; } else { /* handle failure; foo is still valid */ } }

  • Pedro (unregistered)

    Here is my code:

    def probabilityLastIsLemon(numberOfLemons, numberOfCherries):
        memory = {}

        def auxProbabilityLastIsLemon(lemonNumber, cherrieNumber, memoization):
            if lemonNumber == 0:
                return 0
            if cherrieNumber == 0:
                return 1

            currentState = (lemonNumber, cherrieNumber)
            if currentState not in memoization:
                numberOfFruits = lemonNumber + cherrieNumber

                isLemonProbability = lemonNumber / numberOfFruits
                isCherrieProbability = cherrieNumber / numberOfFruits
                isLemonProbability = isLemonProbability * isCherrieProbability + isLemonProbability
                isCherrieProbability = isCherrieProbability * isCherrieProbability

                memoization[currentState] = (auxProbabilityLastIsLemon(lemonNumber - 1, cherrieNumber, memoization) * isLemonProbability
                                             + auxProbabilityLastIsLemon(lemonNumber, cherrieNumber - 1, memoization) * isCherrieProbability)

            return memoization[currentState]

        return auxProbabilityLastIsLemon(numberOfLemons, numberOfCherries, memory)
    

    print(probabilityLastIsLemon(90, 10))

    It runs in 171 ms; how are you guys getting 26 ms?

  • Pedro (unregistered)

    [The previous poster]

    The general expression is very easy to derive provided you know the hypergeometric distribution.

  • (cs) in reply to Ilya Ehrenburg
    Ilya Ehrenburg:
    No, Smash King is right, you're wrong. a) Smash King first believed that Bellinghman got the numbers of lemon flavoured and cherry flavoured candies wrong. He then noticed his error and conceded that that made him look like a fool. He never insinuated that the chances of picking a lemon or cherry changed on the second pick. b) Since the chances of a cherry being permanently removed from the bag are not c/(c+l) but (c/(c+l))^2, the chances of the last candy being lemon flavoured if there are initially 90% lemons is indeed less than 90% (0.9/(1+number of cherries), to be exact).

    I still have to differ. Given 90% lemon, 10% red, the chance of any single candy being a lemon is 90%. It doesn't matter if you pick one or 99; a single candy has a 90% chance of being lemon. The trick here is in the way the picking was described: since the original candy was returned to the bag before the next candy was picked, it has no effect on the resulting pick. Now, if the second pick were made before returning the first pick to the bag, then the odds of the last one being a lemon would be less than 90%. For a simple example, say there are two candies left, one lemon, one cherry; what are the odds the last one will be lemon? 50%. Now pick one and get a cherry, put it back; now what are the odds the last will be a lemon? Once again, 50%. Now pick one and get a cherry, hold it and pick again; what are your odds? This time, 100%. The odds only change when you hold the first pick until after the second pick is made. Returning it brings the odds back in line with the original count.

  • eric76 (unregistered) in reply to KattMan
    KattMan:
    For a simple example, say there are two candies left, one lemon, one cherry; what are the odds the last one will be lemon? 50%. Now pick one and get a cherry, put it back; now what are the odds the last will be a lemon? Once again, 50%. Now pick one and get a cherry, hold it and pick again; what are your odds? This time, 100%. The odds only change when you hold the first pick until after the second pick is made. Returning it brings the odds back in line with the original count.

    Consider the case where there are exactly two candies left, one lemon, one cherry. The possible draws are:

    lemon (50% odds)
    cherry (50% odds)

    But then you put the cherry back and redraw. So half of the time you make two draws and half of that time it is also a lemon. Thus, you draw

    lemon (50% odds)
    cherry then lemon (25% odds)
    cherry then cherry (25% odds)

    Thus, when you have precisely one lemon candy and one cherry candy left, you will eat the lemon candy 75% of the time and the cherry candy 25% of the time and the final piece will thus be cherry 75% of the time and lemon 25% of the time.
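    eric76's 75/25 split can be checked by brute enumeration (a small sketch; listing the redraw outcome even when it goes unused keeps all four branches equally likely):

```python
from itertools import product

# With one lemon and one cherry left: enumerate (first pick, redraw).
# The redraw only matters when the first pick is the cherry (it goes back).
outcomes = []
for first, redraw in product(["lemon", "cherry"], repeat=2):
    eaten = first if first == "lemon" else redraw
    outcomes.append(eaten)

# Four equally likely branches: the lemon is eaten in 3, the cherry in 1,
# so the final piece is cherry 75% of the time.
assert outcomes.count("lemon") == 3
assert outcomes.count("cherry") == 1
```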

  • ḲⱯⱵⱵⱮⱯƝƝ (unregistered)

    The world is so simple!

    And the year is 365.000 days long!!

    And pi is 3.0000!!!

    And the probability of any event is the same as the probability of any other event!!!!

  • (cs) in reply to Pedro
    Pedro:
    Here is my code:

    (snip)

    it runs in 171 ms how do you guys are getting 26 ms?

    Easy:

    1. Use a fast-lookup memoisation structure (e.g. an array).
    2. Fill it bottom up.
    3. If that isn't enough, use a compiled language.

    Runs in 2ms on my six year old thing. Of course working out the formula leads to a faster algorithm :-)
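    Ilya's first two suggestions might look like this (a sketch, not the actual 26 ms code): a bottom-up table over the recurrence P(l,c) = (1-q)*P(l-1,c) + q*P(l,c-1) with q = (c/(l+c))^2 and boundaries P(l,0) = 1, P(0,c) = 0:

```python
def prob_last_is_lemon(lemons, cherries):
    """Probability the last candy is lemon, table filled bottom-up row by row."""
    prev = [0.0] * (cherries + 1)           # row l = 0: P(0, c) = 0
    for l in range(1, lemons + 1):
        cur = [1.0] + [0.0] * cherries      # P(l, 0) = 1
        for c in range(1, cherries + 1):
            q = (c / (l + c)) ** 2          # chance today's eaten candy is a cherry
            cur[c] = (1 - q) * prev[c] + q * cur[c - 1]
        prev = cur
    return prev[cherries]

assert abs(prob_last_is_lemon(90, 10) - 9 / 110) < 1e-9    # matches l/((l+c)(c+1))
assert abs(prob_last_is_lemon(1, 1) - 0.25) < 1e-12        # the two-candy endgame
```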

  • Pedro (unregistered) in reply to Ilya Ehrenburg

    Never mind, Python 3's interpreter takes 171 ms to start up.

  • (cs) in reply to eric76
    eric76:
    KattMan:
    For a simple example, say there are two candies left, one lemon, one cherry; what are the odds the last one will be lemon? 50%. Now pick one and get a cherry, put it back; now what are the odds the last will be a lemon? Once again, 50%. Now pick one and get a cherry, hold it and pick again; what are your odds? This time, 100%. The odds only change when you hold the first pick until after the second pick is made. Returning it brings the odds back in line with the original count.

    Consider the case where there are exactly two candies left, one lemon, one cherry. The possible draws are:

    lemon (50% odds)
    cherry (50% odds)

    But then you put the cherry back and redraw. So half of the time you make two draws and half of that time it is also a lemon. Thus, you draw

    lemon (50% odds)
    cherry then lemon (25% odds)
    cherry then cherry (25% odds)

    Thus, when you have precisely one lemon candy and one cherry candy left, you will eat the lemon candy 75% of the time and the cherry candy 25% of the time and the final piece will thus be cherry 75% of the time and lemon 25% of the time.

    Gambler's Fallacy, look it up.

    The previous draw has no effect on the next draw. Given two pieces, your chances are always 50% to draw a chosen piece, regardless of what you drew the last time.

  • (cs) in reply to KattMan
    KattMan:
    The previous draw has no effect on the next draw.

    Here it is with the percentages written out more explicitly:

    Draw 1:

    lemon (50% odds)
    cherry (50% odds)

    Draw 2 if Lemon in Draw 1:

    lemon (100% odds)
    cherry (0% odds)

    Draw 2 if Cherry in Draw 1:

    lemon (50% odds)
    cherry (50% odds)
    (same odds as Draw 1, see Gambler's Fallacy)

    Overall odds over both draws:

    lemon (50% * 100% + 50% * 50% = 75%)
    cherry (50% * 0% + 50% * 50% = 25%)

    hth, Douglas

  • (cs) in reply to KattMan
    KattMan:
    Gambler's Fallacy, look it up.

    The previous draw has no effect on the next draw. Given two pieces, your chances are always 50% to draw a chosen piece, regardless of what you drew the last time.

    KattMan, there only is a second draw if the first candy drawn was a cherry. Therefore the chance of a cherry being eaten is the proportion of cherries squared. It's not a gambler's fallacy, it's a "best of three with lemon leading 1-0".

  • Ben (unregistered) in reply to KattMan
    KattMan:
    Gambler's Fallacy, look it up.

    The previous draw has no effect on the next draw. Given two pieces, your chances are always 50% to draw a chosen piece, regardless of what you drew the last time.

    If you are trolling, you aren't funny. If you aren't trolling, you are stupid (and damn arrogant in not even bothering to check to see if you are wrong)

  • c (unregistered) in reply to Someone
    Someone:
    I doubt the recursive function is executing enough to cause stack overflow issues on a 32 bit system but that might be something to watch out for if you increase the dataset size.
    Not going to happen. Even if the dataset had 2 billion records, the recursive function would be called only ~30 times (that's why binary search is so fast, remember?).
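That figure is easy to check: each comparison halves the remaining range, so even 2 billion records are narrowed down in about 30 steps. A quick sanity check (illustrative, not the original code):

```python
# Count how many halvings it takes to narrow 2 billion records down to one.
n, probes = 2_000_000_000, 0
while n > 1:
    n //= 2       # each comparison discards half the remaining records
    probes += 1
print(probes)  # 30
```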
  • (cs) in reply to Ilya Ehrenburg
    Ilya Ehrenburg:
    Of course working out the formula leads to a faster algorithm :-)

    The formula was presented before, and I've checked the recurrence relation, but how would you "work out" the formula given the problem description?

  • (cs) in reply to UpNDown
    UpNDown:
    Ilya Ehrenburg:
    Of course working out the formula leads to a faster algorithm :-)

    The formula was presented before, and I've checked the recurrence relation, but how would you "work out" the formula given the problem description?

    As one usually does it. First, work out the general recursion: P(l,c) = (1 - (c/(l+c))^2) P(l-1,c) + (c/(l+c))^2 P(l,c-1) for l, c not 0; that follows directly from the description. Then find P(l,c) explicitly for small numbers, see if you spot a pattern, and if you do, (try to) prove it (by induction or otherwise).

    Here, the cases of 0 lemons or 0 cherries are trivial: P(l,0) = 1 and P(0,c) = 0. So start with 1 cherry. P(l,1) = (1 - 1/(l+1)^2) P(l-1,1) + 1/(l+1)^2 P(l,0), so P(1,1) = 1/4. Then P(2,1) = (1 - 1/3^2)*1/4 + 1/3^2*1 = 2/9 + 1/9 = 1/3 (= 2/6), P(3,1) = (1 - 1/4^2)*1/3 + 1/4^2 = 5/16 + 1/16 = 6/16 = 3/8, P(4,1) = (1 - 1/5^2)*3/8 + 1/5^2 = 10/25 = 2/5 (= 4/10), P(5,1) = (1 - 1/6^2)*2/5 + 1/6^2 = 15/36 = 5/12. You see P(l,1) = l/(2(l+1)), which is then simple to prove by induction.

    Two cherries next: P(1,2) = (1 - (2/3)^2)*0 + (2/3)^2*P(1,1) = 1/3^2 = 1/9, P(2,2) = (1 - (2/4)^2)*1/9 + (2/4)^2*P(2,1) = 1/12 + 1/12 = 2/12 = 1/6, P(3,2) = (1 - (2/5)^2)*1/6 + (2/5)^2*3/8 = 7/50 + 3/50 = 1/5 (= 3/15), P(4,2) = (1 - (2/6)^2)*1/5 + (2/6)^2*2/5 = 8/45 + 2/45 = 10/45 = 2/9 (= 4/18). Continue until you are fed up or see P(l,2) = l/(3(l+2)); if the latter, prove it. Go on until you see the pattern P(l,c) = l/((l+c)(c+1)), and prove it by induction.
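The recurrence and the conjectured pattern can also be machine-checked with exact rational arithmetic; here is a short sketch (the function name `P` and the use of Python's `Fraction` are my choices, not from the thread):

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def P(l, c):
    """The recurrence above, with l lemons and c cherries remaining."""
    if c == 0:
        return Fraction(1)
    if l == 0:
        return Fraction(0)
    q = Fraction(c, l + c)  # probability a single pick is a cherry
    return (1 - q**2) * P(l - 1, c) + q**2 * P(l, c - 1)

# Check the conjectured closed form P(l,c) = l / ((l+c)(c+1)) on a grid:
for l in range(1, 10):
    for c in range(1, 10):
        assert P(l, c) == Fraction(l, (l + c) * (c + 1))
print(P(90, 10))  # 9/110
```

Exact fractions avoid floating-point noise, so the induction hypothesis can be verified literally rather than approximately.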

  • (cs) in reply to mykelyk
    mykelyk:
    Please stop bashing recursive algorithms into iterative ones for the sake of your very own "good style".

    Thanks for writing the rant I would otherwise have written. It seems deeply ingrained in the imperative mindset that recursion = recomputing intermediate results = inefficient. There's nothing wrong with programming in imperative languages to accomplish a goal. I do it all the time. But it's rather sad that so many programmers are unable to fathom even the possibility that non-imperative and loop-free yet efficient solutions exist. Unfortunately, eric76 still doesn't get it.

  • (cs) in reply to eric76
    eric76:
    This example shows one in which the performance hit is very large. For example, you can code this problem as follows

    So in other words, you found a way to write a slow recursive function, and therefore there are no ways to write fast recursive functions.

    eric76:
    If you modify the algorithm so that it keeps a table of those values it has already calculated and uses those values when needed instead of recalculating them over and over and over again, then the algorithm is very well behaved.

    The point that you completely missed is that "keeping a table" (or memoization, as it is called more generally) has nothing to do with recursion or looping. You are confusing control flow (recursion vs. looping) with data flow (memoization). In fact, because in Haskell expression evaluation is non-strict and free of side-effects, runtime control flow is determined by data dependencies and compiler whims, and is mostly irrelevant.

    Here's another, timeless and simpler example of naive vs. non-naive use of recursion:

    -- calculate n-th fibonacci number, n starting at 0
    -- dog slow!
    fib 0 = 0
    fib 1 = 1
    fib n = fib (n - 2) + fib (n - 1)
    
    -- fox fast!
    fib' n = fst (foldl (\(x,y) _ -> (y,x+y)) (0,1) [1..n])
    
    -- Library function, exposed to show its recursive nature
    foldl _ a [] = a
    foldl f a (x:xs) = foldl f (f a x) xs
    
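For the imperatively minded, the same naive-vs-accumulating contrast can be sketched in Python (a rough translation of mine, not Alexis's code): the naive version recomputes subproblems exponentially often, while carrying the last two values forward, as the foldl does, is linear.

```python
def fib_slow(n):
    """Naive double recursion, like the first Haskell fib: exponential time."""
    if n < 2:
        return n
    return fib_slow(n - 2) + fib_slow(n - 1)

def fib_fast(n):
    """Carry the pair (F(k), F(k+1)) forward, mirroring the foldl version."""
    x, y = 0, 1
    for _ in range(n):
        x, y = y, x + y
    return x

# Both agree on small inputs, but only fib_fast is usable for large n.
assert [fib_slow(i) for i in range(15)] == [fib_fast(i) for i in range(15)]
print(fib_fast(30))  # 832040
```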
  • OBloodyhell (unregistered) in reply to drf
    drf:
    Tim:
    Seems to me the 4,000 Records solution is exactly the correct one in most circumstances - it minimises effort, retesting and the chance of any unintended consequences.

    If this was in a production application, the only circumstances under which I would normally let one of my developers spend time rewriting this module to do dynamic allocation would be if (a) I thought the amount of data was likely to grow to 10 times its current value within the lifetime of the application (and even then I would be tempted to just increase the size of the array further), or (b) the allocation of 40000 records used an excessive amount of memory.

    Can't agree with you there.

    1. Assuming the lifetime of the application isn't known, it becomes a liability. Its uses cannot be known a priori.
    2. The potential security implications of a buffer overrun attack, depending on where the application is used.
    3. The developer implemented a workaround "feature" to accommodate a buffer overrun that would still require testing.
    4. Depending again on the specifics of the application, the conversion to dynamically-allocated memory would likely be trivial.
    5. Potential end-user support requests and user grievances should the application crash unexpectedly.
    6. Additional documentation necessary to properly document the circumstances and workaround in (3).

    If the application were to be replaced, strictly non-critical, not security-sensitive, used by technical users, and poorly-coded, I might be tempted then to simply increase the size of the array. But only then. It is not the correct solution.

    Oh, gimme a break. Without knowing what the records are and what sort of activity causes them to expand, there may or may not be justification for using a dynamic scheme.

    Since that data isn't given, this may or may not qualify as a WTF.

    If the records in question use a keycode specification that can only handle 3999 records, max, then a fixed 4000 records would be a reasonable spec. That doesn't seem to be the case here, but that doesn't mean that there aren't constraints on the record count which make a presumed maximum count a reasonable notion.

    Sometimes people just love to complicate the hell out of things (and proper de-allocation of dynamic records and GC is always an extra step of complication -- even when it's handled automagically by the system in the background) for the sake of a foolish consistency.

    Then people wonder how code gets so ridiculously bloated and everyone needs to have a quadcore @ 3Ghz with 4 gigs of RAM.

    "I see no reason why any microcomputer, anywhere, should ever require more than 16 mb of RAM" -- Bill Gates, ca. 1982

    Sure, 16mb is a bit low for a modern graphical OS, but the whole "just throw more hardware at it" notion breaks down at some point.

    Sooner or later you bozos are going to have to actually pay some attention to code efficiency.

  • (cs) in reply to Alexis de Torquemada
    Alexis de Torquemada:
    eric76:
    This example shows one in which the performance hit is very large. For example, you can code this problem as follows

    So in other words, you found a way to write a slow recursive function, and therefore there are no ways to write fast recursive functions.

    To be fair, he didn't say that. And it seems not uncommon that in imperative languages recursion is indeed slower than an equivalent iteration, as function calls can have significant overhead. However, I've had gcc turn tail-recursive functions into faster loops than equivalent iterative code.

    Another point which may reinforce the prejudice against recursion is that often a naive (and horribly inefficient) recursion is obvious while an iterative solution is not easy to see; but once the iterative solution is seen, the avoidance of the worst inefficiencies is obvious, as for example in the candy problem.

    Alexis de Torquemada:
    In fact, because in Haskell expression evaluation is non-strict and free of side-effects, runtime control flow is determined by data dependencies and compiler whims and mostly irrelevant.
    And that's why tail recursion is mostly a red herring in Haskell land.
    Alexis de Torquemada:
    Here's another, timeless and simpler example of naive vs. non-naive use of recursion:
    -- calculate n-th fibonacci number, n starting at 0
    -- dog slow!
    fib 0 = 0
    fib 1 = 1
    fib n = fib (n - 2) + fib (n - 1)
    
    -- fox fast!
    fib' n = fst (foldl (\(x,y) _ -> (y,x+y)) (0,1) [1..n])
    

    Not really incredibly fast, you should use foldl' and a strict pair, otherwise you'll still build an enormous thunk and get a stack overflow for large arguments (~500,000 - 1,000,000).

    But isn't the official definition of the Fibonacci sequence

    fix ((0:) . scanl (+) 1)
    
    ?
  • John Muller (unregistered) in reply to KattMan
    KattMan:
    Smash King:
    No, he's perfectly wrong.... if you consider the fact that he swapped the odds for lemons and cherries. If there are 10 lemon drops on a 100-drops package, it's obvious that the chance of a first pick being a lemon one is 1/10, and not 9/10 . But other than all this (including the "1/10 * 1/10 chance" - that should be a 9/10 * 9/10 chance), he's actually right.

    Addendum (2008-12-23 10:32): Stupid me! I knew I should have re-read the actual problem... Well who looks like a fool now?

    You sir are incorrect. Don't feel bad, you fell for a common mistake. Read up on the Gambler's Fallacy. The odds of picking a lemon or cherry do not change on the second pick. Think of it this way: on a 6-sided die, what are your chances of rolling a 1? Exactly 1 in 6. Now say you roll a 1, and you roll again; what are your chances of rolling a 1? Once again 1 in 6. Even though the odds of rolling two 1's in a row are lower, each roll's odds are figured on their own and not influenced by what has gone before.

    downfall:
    Objection, facts not in evidence. The OP indicates that, "But if the candy he chose is cherry flavored, he puts it back in the bag and then randomly picks a candy out of the bag and eats it regardless of the flavor."

    If he puts it back in the bag, then it's possible he can indeed select the same cherry piece twice.

    See, downfall gets it. If there are 90% lemons, then there will be a 90% chance that a lemon will be last. No programming needed.

    The sad part of the gambler's fallacy is that even knowing my odds were no better (or worse) than the last few times I went, I still went to a casino this week and lost another $500 on the penny slots.

  • Mark V Shaney (unregistered)

    In case anybody wants it, naive Mathematica code for the problem (with memoization):

    p[0, y_] = 0; p[x_, 0] = 1;
    px[x_, y_] := x/(x + y)
    py[x_, y_] := y/(x + y)
    p[x_, y_] := p[x, y] = (py[x, y] + 1) px[x, y] p[x - 1, y] + py[x, y]^2 p[x, y - 1]
    p[90, 10]

    result:

    9/110

    (in about 40 milliseconds on a slow machine)

  • Oh, Mark (unregistered) in reply to Severity One
    Severity One:
    Mark left out some essential information: an IMSI is a 15-digit number that identifies a mobile telephone, or at least its SIM card. So yes, you could think of it as a very big number, but in reality it's a string.

    The Real WTF is that, instead of using a string comparison, he converted this 15-digit string into a floating point number and compared with that.

    And I got the numbers wrong with the '4000' and '40,000': it was actually '900' and '90,000'.

    You have no idea what sort of tripe I came across when I had to go through his code. Basically every single violation of good programming practices imaginable was there, and even a few that are unimaginable. It took me a couple of months to remedy the worst excesses.

    Yet another author coming by to set the creative writing majors straight. QFT.

  • unknowen9000 (unregistered) in reply to mykelyk
    mykelyk:
    jmucchiello:
    blah:
    TRWTF is
    test = ((low + high) / 2);
    is susceptible to *signed* integer overflow. Not pretty.
    That's true. Except it's probably also wrong. A binary search would use
    low + ((high - low) / 2)
    to find the break point.

    Learn math.

    Yeah, it should be obvious that both formulae are the same formula refactored.

    z	=	x+((y-x)/2)
    z-x	=	(y-x)/2
    2*(z-x)	=	y-x
    z+z-x-x	=	y-x
    z+z-x	=	y
    z+z	=	x+y
    2*z	=	x+y
    z	=	(x+y)/2
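The algebra above is valid over the unbounded integers, but the original objection was about fixed-width arithmetic: in a 32-bit `int`, `low + high` can overflow before the division ever happens. A small sketch simulating two's-complement 32-bit arithmetic (the helper `to_int32` and both midpoint functions are mine, for illustration):

```python
def to_int32(n):
    """Interpret n modulo 2**32 as a signed 32-bit two's-complement value."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n

def midpoint_naive(low, high):
    # test = (low + high) / 2 -- the sum itself can exceed INT_MAX
    return to_int32(to_int32(low + high) // 2)

def midpoint_safe(low, high):
    # low + (high - low) / 2 -- every intermediate stays in range for low <= high
    return to_int32(low + (high - low) // 2)

low, high = 1_500_000_000, 2_000_000_000   # both valid int32 values
print(midpoint_safe(low, high))    # 1750000000
print(midpoint_naive(low, high))   # -397483648: the sum wrapped around
```

So the two formulae are only "the same formula refactored" when no intermediate result overflows; for large indices, only the second one is safe.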

Leave a comment on “The 8 Report and Other Gems”
