The Daily WTF: Curious Perversions in Information Technology

2008-12-22 Reply Admin

8

2008-12-22 Reply Admin

Select 8 from comments

2008-12-22 Reply Admin

Seems to me the 4,000 Records solution is exactly the correct one is most circumstances - it minimises effort, retesting and chance of any unintended consequences.

If this was in a production application, The only circumstances under which I would normally let one of my developers spend time rewriting this module to do dynamic allocation would be if (a) if I thought the amount of data would was likely grow to 10 times its current value within the lifetime of the application (and even then I would be tempted to just increase the size of the array further, or (b) the allocation of 40000 records used an excessive amount of memory.

2008-12-22 Reply Admin

It's funny cause there's no while loop in that last one.

2008-12-22 Reply Admin

Tim:
Seems to me the 4,000 Records solution is exactly the correct one is most circumstances - it minimises effort, retesting and chance of any unintended consequences.
If this was in a production application, The only circumstances under which I would normally let one of my developers spend time rewriting this module to do dynamic allocation would be if (a) if I thought the amount of data would was likely grow to 10 times its current value within the lifetime of the application (and even then I would be tempted to just increase the size of the array further, or (b) the allocation of 40000 records used an excessive amount of memory.

Because it takes so much time to change struct foo foo[4000]; to struct foo *foo = malloc(sizeof(foo) * count);

2008-12-22 Reply Admin

Hmm no, it's recursive. I don't know C(++?) but it actually seems to work.

snoofle · 2008-12-22 Reply Admin

Re 8's - nobody reads reports.

At my first job, some secretary pestered me to write a longer status report than the 2 lines I had written. I appended several relevant source listings and a very large hex core-dump to my status report. It got included up the chain until 3 weeks later somebody finally noticed that the status report took too long to open in mail, scanned through it and found out why. For the next 5 years, I never had to write another status report :)

2008-12-22 Reply Admin

Gunnar:
Hmm no, it's recursive. I don't know C(++?) but it actually seems to work.

Should be 'return binsearch(...)'

but nevertheless hardly a WTF

drf · 2008-12-22 Reply Admin

Tim:
Seems to me the 4,000 Records solution is exactly the correct one is most circumstances - it minimises effort, retesting and chance of any unintended consequences.
If this was in a production application, The only circumstances under which I would normally let one of my developers spend time rewriting this module to do dynamic allocation would be if (a) if I thought the amount of data would was likely grow to 10 times its current value within the lifetime of the application (and even then I would be tempted to just increase the size of the array further, or (b) the allocation of 40000 records used an excessive amount of memory.

Can't agree with you there.

Assuming the lifetime of the application isn't known, it becomes a liability. Its uses cannot be known a priori.
The potential security implications of a buffer overrun attack, depending on where the application is used.
The developer implemented a workaround "feature" to accomodate a buffer overrun that would still require testing.
Depending again on the specifics on the application, the conversion to dynamically-allocated memory would likely be trivial.
Potential end-user support requests and user grievances should the application crash unexpectedly.
Additional documentation necessary to properly document the circumstances and workaround in (3).

If the application were to be replaced, strictly non-critical, not security-sensitive, used by technical users, and poorly-coded, I might be tempted then to simply increase the size of the array. But only then. It is not the correct solution.

Dwedit · 2008-12-22 Reply Admin

The real WTF is that the char arrays are not const.

Zecc · 2008-12-22 Reply Admin

Gunnar:
Hmm no, it's recursive. I don't know C(++?) but it actually seems to work.

Yes, but the comment says "//end while loop", and like someone already pointed out, there's no return before the recursive calls.

ParkinT · 2008-12-22 Reply Admin

Dwedit:
The real WTF is that the char arrays are not const.

The real WTF is that this forum does not filter out the phrase "The real WTF"

kennytm · 2008-12-22 Reply Admin

The real WTF is they store numbers using char[] array.

2008-12-22 Reply Admin

jahhaj:
Gunnar:
Hmm no, it's recursive. I don't know C(++?) but it actually seems to work.

Should be 'return binsearch(...)'

but nevertheless hardly a WTF

atof parses a floating point number. It'll only work if the strings are all numbers and sorted that way.

Code Dependent · 2008-12-22 Reply Admin

The recursion isn't much use without checking the returned result.

2008-12-22 Reply Admin

➈

2008-12-22 Reply Admin

Since the function 'falls of the bottom' without returning a value, the return value will be the value that happens to be in the correct register. This will be the value from the inner call - so will be correct!

The compiler will also, alomost certainly, use 'tail recursion' - converting the code into a loop.

2008-12-22 Reply Admin

Problems with the binary search function:

Creates three unnecessary variables and doesn's use one of them.
The type of sr.
No returns on the recursive calls
Recursive calls are utterly unnecessary as this can easily be performed iteratively.
Confusing comment.
immsi will be parsed as a float for every comparison.
Single space indents are bad style.
strcmp() is not the right way to compare two floats, unless you really want 0 to not equal 0.00.
Why the hell are we passing around a huge array of strings representing floating point numbers anyway?
I hope the data is sorted =)

1, 3, 4 and 5 indicate to me that this was originally implemented as an iterative function, then made recursive. Why, I can't fathom.

2008-12-22 Reply Admin

The binary search code is messy, but I think it does work. The only real error is omitting the return statement. This is undefined behaviour, but apparently their compiler generates code that happens to return false at the end of the function.

Then we have:

a comment which makes no sense
an arbitrary array size in the parameter declaration (which is actually ignored by the compiler; this code works for arrays of any size)
unnecessary use of recursion; a while-loop would have sufficed.

So while I agree this is sloppy coding, none of it makes me scream out WTF?!

2008-12-22 Reply Admin

The algorithm on the binary search is essentially correct. The implementation is buggy and the recursiveness suboptimal, unless you are SURE it will optimize the tail recursion.

I worry, though, about the atoi(). An IMSI has 15 digits, so the chances of an integer being enough aren't very good.

2008-12-22 Reply Admin

If IMSI is what I think it is (international mobile subscriber identity), the immense likelihood is that it's stored internally as BCD. Therefore storing it as a char[] rather than int isn't such a WTF.

2008-12-22 Reply Admin

TRWTF is that the 4000 record program crashed when it reached 4001 records. Unless of course "crash" is new shorthand for aborting execution with an error message like "Unrecoverable error: Found 4001 records in results. Only expected a maximum of 4000 records."

2008-12-22 Reply Admin

well the binsearch function can be wright or wrong, coding such a basic algorithm shows blatant ignorance of common libraries. This is typically a beginner's mistake...

2008-12-22 Reply Admin

I work at a telecom company, and at least here, its common practice to store numbers like an IMSI, MSISDN, etc as plaintext in strings because it makes working with it much easier than having to deal with BCD all the time. The only time you really need it in BCD is when sending it in MAP messages.

2008-12-22 Reply Admin

For some reason, people seem to prefer to solve things recursively even when that is a ridiculously bad approach.

Consider this problem:

George bought a sack of 100 pieces of candy at the store. 90 of the pieces are lemon flavored and ten are cherry flavored. Of the two, George prefers the lemon flavored candies.

Every day George randomly picks a piece of candy out of the bag. If it is lemon flavored, he eats it and puts the bag away for the next day.

But if the candy he chose is cherry flavored, he puts it back in the bag and then randomly picks a candy out of the bag and eats it regardless of the flavor. In other words, he'll only put a piece of candy back at most once per day.

What are the odds that when one piece of candy remains, it will be lemon flavored?

I posed the problem at a company where I used to work. All but one person tried to do it recursively. The remaining person tried to do it using an Excel spreadsheet!!!

2008-12-22 Reply Admin

There is nothing wrong with that function as far as I can tell. The real WTF here is why people keep submitting good code that is above their heads?

The only thing I would be concerned about is: atof(imsi) > atof(sr[test])

But the data set/validation might make it logical to do.

As far as the "it fails to check return values" goes; there really is no need. false can only be hit on the first iteration and true is set by the last iteration of the function before the self recursive function ends.

I doubt the recursive function is executing enough to cause stack overflow issues on a 32 bit system but that might be something to watch out for if you increase the dataset size.

2008-12-22 Reply Admin

ObstinateCoder:
Tim:
Seems to me the 4,000 Records solution is exactly the correct one is most circumstances - it minimises effort, retesting and chance of any unintended consequences.
If this was in a production application, The only circumstances under which I would normally let one of my developers spend time rewriting this module to do dynamic allocation would be if (a) if I thought the amount of data would was likely grow to 10 times its current value within the lifetime of the application (and even then I would be tempted to just increase the size of the array further, or (b) the allocation of 40000 records used an excessive amount of memory.

Because it takes so much time to change struct foo foo[4000]; to struct foo *foo = malloc(sizeof(foo) * count);

This assumes that you have "foo" beforehand. Not true if you're reading from a file or a socket.

bryan986 · 2008-12-22 Reply Admin

Article:
However, what Peter can't figure out is how this report has been sent out for several years an nobody has ever complained about receiving a single-column report consisting solely of the number 8.

They all set up an e-mail filter to delete the report.

2008-12-22 Reply Admin

New word of the day: "segmentaiton", noun, definition: victim of Hooked on Phonics and/or dyslexia.

2008-12-22 Reply Admin

eric76:
For some reason, people seem to prefer to solve things recursively even when that is a ridiculously bad approach.
Consider this problem:

George bought a sack of 100 pieces of candy at the store. 90 of the pieces are lemon flavored and ten are cherry flavored. Of the two, George prefers the lemon flavored candies.

Every day George randomly picks a piece of candy out of the bag. If it is lemon flavored, he eats it and puts the bag away for the next day.

But if the candy he chose is cherry flavored, he puts it back in the bag and then randomly picks a candy out of the bag and eats it regardless of the flavor. In other words, he'll only put a piece of candy back at most once per day.

What are the odds that when one piece of candy remains, it will be lemon flavored?

I posed the problem at a company where I used to work. All but one person tried to do it recursively. The remaining person tried to do it using an Excel spreadsheet!!!

Don't get it. I mean a spreadsheet seems a sensible approach (ok, ignoring issues about rounding etc.) Have a row for each number of remaining candies, and a cell in each row for the number of cherries (so up to 11 cells per row for 0 to 10). The entry is the probability of that combination occurring when we have got to that number of remaining candies.

I assume the issue is meant to be that people should use their brains to solve it and therefore there is a cute answer. I can't see the solution but I can guess the answer on the following logic.

These kind of puzzles, the answer is always 0, 1/2, or 1. It can't be either 0 or 1, because there is a non-zero chance of picking out lemons until there are none left, or cherries till there are none left.

Therefore the answer is a half.

Actually there's a small chance that the answer is either 1/100 or 99/100, because those numbers come up sometime, but I think 1/2 is more likely.

2008-12-22 Reply Admin

The implementation is buggy and the recursiveness suboptimal, unless you are SURE it will optimize the tail recursion.

Are you talking about maximal recursion depth? It's bounded by the base-two logarithm of the input size, so that shouldn't be a problem (less than 40 for a trillion records).

AFAIK tail recursion folding isn't typically done by C/C++ compilers (not that it's impossible, but I wouldn't count on it).

I worry, though, about the atoi(). An IMSI has 15 digits, so the chances of an integer being enough aren't very good.

Ah, an IMSI is some integer number? They are actually using atof() which parses (double-precision) floating point numbers, which happen to support storing 53-bit integers too, so maybe they use atof() as a kludge to support larger integers (up to at least 15 decimal digits without loss of precision).

Severity One · 2008-12-22 Reply Admin

Mark left out some essential information: an IMSI is a 15-digit number that identifies a mobile telephone, or at least its SIM card. So yes, you could think of it as a very big number, but in reality it's a string.

The Real WTF is that, instead of using a string comparison, he converted this 15-digit string into a floating point number and compared with that.

And I got the numbers wrong with the '4000' and '40,000': it was actually '900' and '90,000'.

You have no idea what sort of tripe I came along when I had to go through his code. Basically every single violation of good programming practices imaginable was there, and even a few that are unimaginable. It took me a couple of months to remedy the worst excesses.

SenTree · 2008-12-22 Reply Admin

Ian Gent:
eric76:
Consider this problem:
George bought a sack of 100 pieces of candy at the store. 90 of the pieces are lemon flavored and ten are cherry flavored. Of the two, George prefers the lemon flavored candies.

Every day George randomly picks a piece of candy out of the bag. If it is lemon flavored, he eats it and puts the bag away for the next day.

But if the candy he chose is cherry flavored, he puts it back in the bag and then randomly picks a candy out of the bag and eats it regardless of the flavor. In other words, he'll only put a piece of candy back at most once per day.

What are the odds that when one piece of candy remains, it will be lemon flavored?

I can't see the solution but I can guess the answer on the following logic.

These kind of puzzles, the answer is always 0, 1/2, or 1. It can't be either 0 or 1, because there is a non-zero chance of picking out lemons until there are none left, or cherries till there are none left.

Therefore the answer is a half.

Actually there's a small chance that the answer is either 1/100 or 99/100, because those numbers come up sometime, but I think 1/2 is more likely.

I think he's effectively picking a random candy every day, so the odds remain what they were at the start, 9/10.

2008-12-22 Reply Admin

No: for a lemon to be taken out it needs only be picked at random, but for a cherry to be taken out two cherries need to be picked in succession. That'll shift the distribution substantially towards the end.

2008-12-22 Reply Admin

Julian Calaby:
Problems with the binary search function:

Creates three unnecessary variables and doesn's use one of them.

The type of sr.

No returns on the recursive calls

Recursive calls are utterly unnecessary as this can easily be performed iteratively.

Confusing comment.

immsi will be parsed as a float for every comparison.

Single space indents are bad style.

strcmp() is not the right way to compare two floats, unless you really want 0 to not equal 0.00.

Why the hell are we passing around a huge array of strings representing floating point numbers anyway?

I hope the data is sorted =)

1, 3, 4 and 5 indicate to me that this was originally implemented as an iterative function, then made recursive. Why, I can't fathom.

And, on top of that, the entire array is passed on the stack. Every recursive call, 20 kilobytes are copied onto the stack. I sincerely hope the compiler is smart enough to optimize out the tail recursion.

VibroKatana · 2008-12-22 Reply Admin

Adam:
And, on top of that, the entire array is passed on the stack. Every recursive call, 20 kilobytes are copied onto the stack. I sincerely hope the compiler is smart enough to optimize out the tail recursion.

The arrays are passed as a pointer. However the integers are copied.

2008-12-22 Reply Admin

Adam:

And, on top of that, the entire array is passed on the stack. Every recursive call, 20 kilobytes are copied onto the stack. I sincerely hope the compiler is smart enough to optimize out the tail recursion.

Huh, what? C/C++ doesn't even support passing arrays on the stack...

2008-12-22 Reply Admin

Had Tim mentioned a smarter answer, he might have had more takers for a static allocation.

Where's the file coming from ? We just hear it's "imported". From where ? What's the input to that file ? Is it something that 3rd party folks can add/manipulate ?

If so, every one of you doinks suggesting full dynamic allocation just introduced a different security vulnerability. With 4000 (and better error checking than a plain crash), potential smash the stack. With dynamic, you introduce other exploitable memory issues. Say the size is -1, malloc fails, and you start writing all over memory. Or the size is 4 billion, same issue.

Severity One · 2008-12-22 Reply Admin

Tim:
Seems to me the 4,000 Records solution is exactly the correct one is most circumstances - it minimises effort, retesting and chance of any unintended consequences.
If this was in a production application, The only circumstances under which I would normally let one of my developers spend time rewriting this module to do dynamic allocation would be if (a) if I thought the amount of data would was likely grow to 10 times its current value within the lifetime of the application (and even then I would be tempted to just increase the size of the array further, or (b) the allocation of 40000 records used an excessive amount of memory.

Well, we're talking about mobile phone call records. So even in a country as small as ours (400,000 inhabitants, over half of which are our customers), 900 is not an awful lot of leeway.

The sad bit is that this file read a BER encoded file, looks for some magic numbers (the ASN.1 tags) and stores them into those 900 (or 90,000) records. Then it goes over these records, finds the starting and ending sequence numbers, the start and end date, and how many records were processed. (It actually outputs an SQL query.) None of this information actually requires all these records to be in memory.

Some more code from the same program:

            while (!feof(fpin))
            {
                x1 = getc(fpin);
                if (x1 == 0xDF)
                {
                    y1 = getc(fpin);
                    if (y1 == 0x2F)
                        goto nxt;
                }
            }
nxt:

Why use the 'break' statement if C offers a perfectly good 'goto' statement?

Or this:

frmdt(char *dte)
{
    char s[20];
    time_t now;
    now = time(NULL);
    strftime(s,20,"%Y",localtime(&now));

    if (dte[1] == 0)
    {
        dte[1] = dte[0];
        dte[0] = '0';
    }

    if (dte[3] == 0)
    {
        dte[3] = dte[2];
        dte[2] = '0';
    }

    ndte[0] = dte[0];
    ndte[1] = dte[1];
    ndte[2] = '/';
    ndte[3] = dte[2];
    ndte[4] = dte[3];
    ndte[5] = '/';
    ndte[6] = s[0];
    ndte[7] = s[1];
    ndte[8] = s[2];
    ndte[9] = s[3];

    return 0;
}

Apart from stuff like global variables, for each and every record it uses strftime to find the century. And, incidentally, it takes the current date, rather than the date of the input file.

It just goes on and on...

2008-12-22 Reply Admin

segmentaiton violation.

(Segmentation Fault|Segfault)

bene: 'I have bene here forever' -- Spelling Nazi

2008-12-22 Reply Admin

I liked the first WTF.

2008-12-22 Reply Admin

eric76:
Consider this problem:
George bought a sack of 100 pieces of candy at the store. 90 of the pieces are lemon flavored and ten are cherry flavored. Of the two, George prefers the lemon flavored candies.

Every day George randomly picks a piece of candy out of the bag. If it is lemon flavored, he eats it and puts the bag away for the next day.

But if the candy he chose is cherry flavored, he puts it back in the bag and then randomly picks a candy out of the bag and eats it regardless of the flavor. In other words, he'll only put a piece of candy back at most once per day.

What are the odds that when one piece of candy remains, it will be lemon flavored?

I tried DP approach and got 0.0(81) or 8.18182%.

2008-12-22 Reply Admin

eb0ny:
eric76:
Consider this problem:
...

What are the odds that when one piece of candy remains, it will be lemon flavored?
I tried DP approach and got 0.0(81) or 8.18182%.

Ok, maybe I've been watching too many of those sort of movies, because when I read "DP" I thought of something about as far removed from statistical analysis as you can get...

Captcha: opto (stop it, it'll make you go blind)

2008-12-22 Reply Admin

TRWTF is

test = ((low + high) / 2);

is susceptible to signed integer overflow. Not pretty.

2008-12-22 Reply Admin

Agreed with 0.8181. Done in Excel, by the way. Just a 10x90 grid to brute-force the whole path through the possibility space. In each cell, sum the probability of getting there from the cell directly above, with the probability of getting there from the cell to the left, and it all converges down to an answer at the bottom left.

I'm not sure I can see any greatly simpler solution, so I'd be delighted if the OP could enlighten us on this one.

2008-12-22 Reply Admin

You're a manger right? ok, where? I want to make sure I don't send my resume there. I understand the bottom dollar idea, but a buffer overrun only has to "destroy everything" once.

2008-12-22 Reply Admin

ugh, typo in the post giving x10 error. Actual figure is 0.08181... as previously given. Sorry.

2008-12-22 Reply Admin

Julian Calaby:
4. Recursive calls are utterly unnecessary as this can *easily* be performed iteratively.

But it's easier to read and understand if written recursively and most compilers are smart enough to understand tail recursion and will create a sequence of operations equivalent if not even better to manually done iteration.

Have a look at this http://dl.fefe.de/optimizer-isec.pdf

I totally agree with the rest of your points.

2008-12-22 Reply Admin

That is a much better algorithm than a purely recursive one... I believe we call that dynamic programming. With the caveat that dynamic programming is sort of recursive. I too would like to know the trick to solving this one from the OP.

2008-12-22 Reply Admin

Q:
jahhaj:
Gunnar:
Hmm no, it's recursive. I don't know C(++?) but it actually seems to work.

Should be 'return binsearch(...)'

but nevertheless hardly a WTF

atof parses a floating point number. It'll only work if the strings are all numbers and sorted that way.

Oh my, I totally missed the atof(). I was obsessing over the fact that the function never checked return value of its recursive calls and would fall off the end without returning, plus that it had no way I could see of communicating back the index at which it had found a match.

The 8 Report and Other Gems

Leave a comment on “The 8 Report and Other Gems”