The Daily WTF: Curious Perversions in Information Technology

D2oris · 2008-03-11 Reply Admin

I kinda like it. It's funny.

PSWorx · 2008-03-11 Reply Admin

Well, at least his cs degree is justified. He obviously paid attention during classes...

2008-03-11 Reply Admin

The code itself looks like it works correctly though.

I'm trying to imagine what this programmer would come up with if you told him the next version needs to support Unicode.

PSWorx · 2008-03-11 Reply Admin

The same solution of course. He might take a bit longer and his file might become a bit larger, but he'll still be finished after finite time, so no need to change a working solution ;)

TGV · 2008-03-11 Reply Admin

Did anyone see the comment above arg == NULL: If the string is empty, then it does not contain a float. Looks like someone has been copy-n-pasting.

I also like that he uses a "sink" state, instead of immediately quitting as soon as the prefix is illegal. But well, that's my obsession, I guess.

2008-03-11 Reply Admin

The behaviour is different than with atoi. atoi("123blah") would be 123, but this function would return 0, because of the non-numeric characters.
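To make that difference concrete, here is a minimal sketch. `is_unsigned_int_strict` is a hypothetical stand-in for the article's function (it is not the original code), written on the assumption that the function rejects any string with trailing non-digit characters; `lenient_value` just wraps the standard `atoi`, which stops at the first non-digit.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical strict validator: an optional leading '+', then one or
   more decimal digits, and nothing else. Not the article's code. */
static bool is_unsigned_int_strict(const char *s)
{
    if (s == NULL)
        return false;
    if (*s == '+')
        ++s;
    if (*s == '\0')
        return false;          /* empty string or a lone "+" */
    for (; *s != '\0'; ++s)
        if (*s < '0' || *s > '9')
            return false;      /* any non-digit rejects the whole string */
    return true;
}

/* atoi converts the leading digits and ignores whatever follows them. */
static int lenient_value(const char *s)
{
    return atoi(s);
}
```

With these definitions, `lenient_value("123blah")` yields 123 while `is_unsigned_int_strict("123blah")` yields false.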

This code makes me cry; it reminds me too much of my own workplace where everybody is reinventing the wheel!

dpm · 2008-03-11 Reply Admin

const char *arg; . . . (unsigned char)(*arg)

Lovely. Truly his understanding of pointers --- and casting thereof --- is unsurpassed. On those platforms which support signed chars, what is the value of negative-fifty-four cast to be an unsigned char?

And bonus points: the only reason for casting is to serve as an index in an array. We cannot but admire this mind.

2008-03-11 Reply Admin

dpm:
> const char *arg; . . . (unsigned char)(*arg)
Lovely. Truly his understanding of pointers --- and casting thereof --- is unsurpassed. On those platforms which support signed chars, what is the value of negative-fifty-four cast to be an unsigned char?

A crash.

2008-03-11 Reply Admin

Except that state 2 and state 3 are identical, so he couldn't have been paying that much attention.

2008-03-11 Reply Admin

dpm:
On those platforms which support signed chars, what is the value of negative-fifty-four cast to be an unsigned char?

202, why?

2008-03-11 Reply Admin

Tony:
Except that state 2 and state 3 are identical, so he couldn't have been paying that much attention.

I thought that at first myself, but the difference is that if the code ends in state 2 it's not a valid integer (it's just "+")

topspin · 2008-03-11 Reply Admin

I think it's quite a nice hack in theory.

More of a "WhyTF did he do this?"

zip · 2008-03-11 Reply Admin

Unknown attacker:
dpm:
> On those platforms which support signed chars, what is the value of negative-fifty-four cast to be an unsigned char?

A crash.

You sure about that?

I'm not about to do the twos-complement math but I'm pretty confident that whatever 8 bits represent -54 will make a perfectly good unsigned char.

2008-03-11 Reply Admin

I can't help wondering whether this was actually all hand-entered or generated by some finite state automation program from a specification for unsigned numbers.

topcat_arg · 2008-03-11 Reply Admin

but you have to admit.. they used a finite automaton to do that! I don't understand why they don't use a Turing machine.

dpm · 2008-03-11 Reply Admin

zip:
I'm pretty confident that whatever 8 bits represent -54 will make a perfectly good unsigned char.

Casting a negative value to an unsigned type is "undefined", meaning the compiler can do anything it likes.

Why on earth do you think the concept meaningful?

Vollhorst:
202, why?

That's two. Anyone else going to make that mistake?

2008-03-11 Reply Admin

is not finite state arg. is finite state argh.

Sad Bug Killer · 2008-03-11 Reply Admin

The Real WTF (tm) is that he initialized a 256-sized array writing 20 numbers per line thus leaving some space on the last line. That's ugly. Everyone knows 256 sized array should be initialized with a nice 16x16 block.

2008-03-11 Reply Admin

I'm sorry, but that rocks. It may not be the kind of code you want in production, but from a mathematical/CS point of view, it's just cool.

vt_mruhlin · 2008-03-11 Reply Admin

If efficiency was his goal, shouldn't he have put

// If the string is empty, then it does not contain a float
if(arg == NULL) return false;

before allocating that big array?

I would have just made it a global variable.

2008-03-11 Reply Admin

Can someone explain this code? I don't totally understand how it works.

2008-03-11 Reply Admin

He he,

Though I would perhaps question the wisdom in using a state table on a per-character basis as done here, I did once use a state table for checking the correctness of a PHP array before it was uploaded to a production environment.

Though it didn't deal with single characters but regular expression patterns, i.e. what pattern could legally occur after a given previous pattern had been found.

dpm · 2008-03-11 Reply Admin

Ikke:
Can someone explain this code? I don't totally understand how it works.

static bool isArgUnsignedInt(const char *arg)
{
    bool bIsUInt = false;
    int nDigits = 0;
    int nPlusSigns = 0;

    if (arg != NULL)
    {
        while (*arg)
        {
            if (*arg == '+' && nDigits == 0 && nPlusSigns == 0)
            {
                ++nPlusSigns;
            }
            else if (isdigit(*arg))
            {
                ++nDigits;
                bIsUInt = true;
            }
            else
            {
                bIsUInt = false;
                break;
            }
            ++arg;
        }
    }

    return bIsUInt;
}

2008-03-11 Reply Admin

vt_mruhlin:
I would have just made it a global variable.

That's not the right answer. There's a keyword I think you should meet. static, meet vt_mruhlin. vt_mruhlin, meet static. There, you've been introduced. Now it is your job to use static responsibly in your code. See the Wikipedia article for more details.
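For anyone who hasn't met it, a minimal sketch of the idea: a function-local `static const` table has the lifetime of a global but is visible only inside the function, and it is not rebuilt on every call. The function and table names here are hypothetical and the table is a simple digit lookup, not the article's state arrays.

```c
#include <stdbool.h>

/* Hypothetical example: the table exists for the whole program run but
   cannot be touched from outside this function. */
static bool is_decimal_digit(char c)
{
    /* Entries not listed are zero-initialized, i.e. false. */
    static const bool digit_table[256] = {
        ['0'] = true, ['1'] = true, ['2'] = true, ['3'] = true, ['4'] = true,
        ['5'] = true, ['6'] = true, ['7'] = true, ['8'] = true, ['9'] = true
    };
    /* Cast so that a negative char cannot produce a negative index. */
    return digit_table[(unsigned char)c];
}
```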

2008-03-11 Reply Admin

dpm:
zip:
I'm pretty confident that whatever 8 bits represent -54 will make a perfectly good unsigned char.
Casting a negative value to an unsigned type is "undefined", meaning the compiler can do anything it likes.

Yeah, sure.

From the C++ standard, §4.7 Integral conversions: (2) If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation).]

2008-03-11 Reply Admin

States 2 and 3 are necessary, to catch a lonely "+" sign, or sequences of pluses ("++...+"). That said, too bad he did not exit the loop as soon as state 0 is entered.

IMO the approach certainly is not stupid, it's very fast and robust. The next best thing would have been to use a regex.

2008-03-11 Reply Admin

Arrgh! The goggles they do nothing! Dear lordy, W-H-Y, was this code written to win a bet or something??!

2008-03-11 Reply Admin

@dpm:

Science vs. brute force.

Now try to detect a real, including in scientific notation.

;-)

2008-03-11 Reply Admin

You know what, people?

Artificial Intelligence exists. It lives on the net, takes up mundane jobs like freelance programming to earn its server space and bandwidth, and tries to act and look human but sometimes fails miserably.

gabba · 2008-03-11 Reply Admin

I especially like how maintainable it is; for example, adapting this to support recognizing hex numbers would only involve hunting down and modifying, what, 36 constants?

zip · 2008-03-11 Reply Admin

dpm:
Casting a negative value to an unsigned type is "undefined", meaning the compiler can do anything it likes.

It can do anything it likes. But I'm talking about what it will do.

2008-03-11 Reply Admin

NiceWTF:
The code itself looks like it works correctly though.

Maybe it's just me, but it doesn't seem to me like this code "works" (in the sense that it accurately accomplishes the task at hand). Unless I'm mistaken, since he never exits his loop when an incorrect character is found then any string that ends "correctly" would be considered an unsigned int, no?

wouldn't 'wtf123' return "true" ?

Claxon · 2008-03-11 Reply Admin

Beeblebrox:
vt_mruhlin:
I would have just made it a global variable.
That's not the right answer. There's a keyword I think you should meet. static, meet vt_mruhlin. vt_mruhlin, meet static. There, you've been introduced. Now it is your job to use static responsibly in your code.

And carefully, because as we know, if there's too much static you can get a shock as soon as you try to touch it!

2008-03-11 Reply Admin

dpm:
Casting a negative value to an unsigned type is "undefined", meaning the compiler can do anything it likes.

Nothing's as bad as an authoritative prick, except an authoritative prick who gets it wrong. You don't actually program in C, right? Admit it. You just heard that C has "undefined" things and assumed this was one of them.

From my copy of the (C99) standard, par. 6.3.1.3:

1. When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.

Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

There's nothing undefined about this. Now, what you might have said is that the result is implementation-defined, because the actual value depends on the size of a char. That's right, boys and girls, the C standard doesn't say a char has to be 8 bits. If you actually see an implementation where a char is not 8 bits, though, congratulations, you're probably working on embedded software, and good luck with that.
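The quoted rule can be followed literally in a few lines. This is an illustrative sketch only: `convert_to_unsigned` is a hypothetical helper that takes the maximum value of some unsigned type as a parameter, and `cast_to_uchar` shows the real cast for whatever `unsigned char` is on the platform at hand.

```c
#include <limits.h>

/* Applies the C99 6.3.1.3 rule literally: add or subtract (max + 1)
   until the value lies in [0, max]. `max` is the largest value of a
   hypothetical unsigned type; this helper is for illustration only. */
static long long convert_to_unsigned(long long value, long long max)
{
    const long long modulus = max + 1;
    while (value < 0)
        value += modulus;
    while (value > max)
        value -= modulus;
    return value;
}

/* The conversion the compiler performs for a real unsigned char. */
static unsigned char cast_to_uchar(int value)
{
    return (unsigned char)value;
}
```

For an 8-bit char (`max` of 255), -54 becomes 202 after a single addition of 256, and `cast_to_uchar(-54)` agrees with `convert_to_unsigned(-54, UCHAR_MAX)` whatever the width of a char.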

Random832 · 2008-03-11 Reply Admin

Unknown attacker:
dpm:
> const char *arg; . . . (unsigned char)(*arg)
Lovely. Truly his understanding of pointers --- and casting thereof --- is unsurpassed. On those platforms which support signed chars, what is the value of negative-fifty-four cast to be an unsigned char?

A crash.

202, if characters are eight bits. 458 if it's nine bits. etc, etc, 4294967242 if it's 32 bits. And, before anyone says anything, yes this IS guaranteed on non-twos-complement systems - the negative-to-unsigned conversion is well-defined in C.

dpm · 2008-03-11 Reply Admin

Axel R.:
States 2 and 3 are necessary

I fail to see how, since they are identical.

to catch a lonely "+" sign, or sequences of pluses ("++...+").

No. Only the first character can be a plus sign. After that, only digits are allowed.

IMO the approach certainly is not stupid, it's very fast and robust.

What the hell is "robust"? It _is_ stupid; you don't need states at all when you have isdigit(), and you *certainly* don't need two identical arrays.

The next best thing would have been to use a regex.

Clearly you are ignorant of the Jamie Zawinski quotation.

2008-03-11 Reply Admin

Confused:
NiceWTF:
The code itself looks like it works correctly though.

Maybe it's just me, but it doesn't seem to me like this code "works" (in the sense that it accurately accomplishes the task at hand). Unless I'm mistaken, since he never exits his loop when an incorrect character is found then any string that ends "correctly" would be considered an unsigned int, no?

wouldn't 'wtf123' return "true" ?

Nevermind... I of course missed the "genius" of his sink state.

dpm · 2008-03-11 Reply Admin

Confused:
wouldn't 'wtf123' return "true" ?

No. Once it enters "state 0", it never leaves.

Random832 · 2008-03-11 Reply Admin

JM:

1. When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.

And if it's being converted to _Bool, it's _True if it's positive, _False if it's zero, and _File_Not_Found if it's negative, right?

2008-03-11 Reply Admin

Confused:
Maybe it's just me, but it doesn't seem to me like this code "works" (in the sense that it accurately accomplishes the task at hand). Unless I'm mistaken, since he never exits his loop when an incorrect character is found then any string that ends "correctly" would be considered an unsigned int, no?
wouldn't 'wtf123' return "true" ?

No. The first 'w' would drop the state to 0. It is not possible to switch to any other states from 0.

I, for one, think this is actually quite cool. Even cooler would be generating the state machine from a regex and making the generator available somewhere.

2008-03-11 Reply Admin

dpm:

Ikke:
Can someone explain this code? I don't totally understand how it works.

static bool isArgUnsignedInt(const char *arg)
{
    bool bIsUInt = false;
    int nDigits = 0;
    int nPlusSigns = 0;

    if (arg != NULL)
    {
        while (*arg)
        {
            if (*arg == '+' && nDigits == 0 && nPlusSigns == 0)
            {
                ++nPlusSigns;
            }
            else if (isdigit(*arg))
            {
                ++nDigits;
                bIsUInt = true;
            }
            else
            {
                bIsUInt = false;
                break;
            }
            ++arg;
        }
    }

    return bIsUInt;
}

This is functionally equivalent I believe (above code handles +'s differently):

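static bool isArgUnsignedInt(const char *arg)
{
    /* NULL or empty string: not an unsigned int */
    if (!arg || !(*arg))
        return false;
    /* skip a single optional leading '+' */
    if (*arg == '+')
        ++arg;
    /* a lone "+" is not a number */
    if (!(*arg))
        return false;
    /* every remaining character must be a digit */
    for (; *arg; ++arg)
        if ((*arg) < '0' || (*arg) > '9')
            return false;
    return true;
}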

2008-03-11 Reply Admin

dpm:
zip:
I'm pretty confident that whatever 8 bits represent -54 will make a perfectly good unsigned char.
Casting a negative value to an unsigned type is "undefined", meaning the compiler can do anything it likes.

Intrigued by this comment, I checked the C99 standard, which states for signed-to-unsigned conversion:

the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

Which, in layman's terms, for like-sized chars/ints/longs, flips the most significant bit on/off (which in turn means "nothing to do": the unsigned value's MSB is at the same position as the signed value's sign bit). It works as expected for Mr CodeSOD, and is perfectly defined behavior.

Concerning the unsigned-to-signed conversion, the behaviour is indeed implementation-defined (aka "undefined" as dpm pointed out). However, I can't think of any C compiler that wouldn't simply read the bits as a signed value: same code as for the signed-to-unsigned conversion (aka "nothing to do"), and for error-checking, a compile-time warning is useful and certainly enough.

dpm · 2008-03-11 Reply Admin

nat42:

if (*arg=='+')
	++arg;

Definitely better than mine. Thanks.

Welbog · 2008-03-11 Reply Admin

OJ:
Even cooler would be generating the state machine from a regex and making the generator available somewhere.

This is what happens when you use a compiled regular expression in any modern language. The regular expression is converted into a state machine and executed in a loop much like the one in the OP, albeit with a lot more features.
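As a rough illustration of the regex route in C, here is a sketch using the POSIX `regcomp`/`regexec` API. The pattern is an assumption about what "unsigned int" means in this thread (an optional '+' followed by one or more digits), and whether a particular regex library builds a DFA internally or uses some other matching strategy is up to the implementation.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Sketch only: compiles the pattern on every call for simplicity. */
static bool matches_unsigned_int(const char *s)
{
    regex_t re;
    bool ok;

    if (s == NULL)
        return false;
    /* Assumed pattern: optional '+', then at least one decimal digit. */
    if (regcomp(&re, "^[+]?[0-9]+$", REG_EXTENDED | REG_NOSUB) != 0)
        return false;                  /* pattern failed to compile */
    ok = (regexec(&re, s, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

In real code the compiled `regex_t` would normally be built once and reused rather than recompiled per call.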

darkstar949 · 2008-03-11 Reply Admin

Looks to me like someone was bored.

2008-03-11 Reply Admin

dpm:
nat42:
if (*arg=='+')
	++arg;
Definitely better than mine. Thanks.

Actually I'm feeling silly, because I just realised I misread your code, I'm sorry dpm

Welbog · 2008-03-11 Reply Admin

dpm:
Axel R.:
States 2 and 3 are necessary
I fail to see how, since they are identical.

Identical in their transitions, but only state 3 is the accepting state of the DFA. If the string ends in any state other than 3, it is not accepted. That's the difference between the two states. Sure, the arrays are the same, but the abstract states themselves are vastly different.
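A compact sketch of the DFA as described in this thread may help. It is a reconstruction from the comments, not the article's code: the state numbering (0 as the sink, 2 after a leading '+', 3 as the accepting state) follows the discussion above, state 1 as the start state is an assumption, and the names are hypothetical. The table is filled in by a loop here rather than written out as 256 constants per state.

```c
#include <stdbool.h>
#include <stddef.h>

enum { SINK = 0, START = 1, PLUS = 2, DIGITS = 3, NUM_STATES = 4 };

/* next_state[state][byte]; zero-initialized, so every entry starts as SINK. */
static unsigned char next_state[NUM_STATES][256];
static bool table_ready = false;

static void build_table(void)
{
    for (int c = '0'; c <= '9'; ++c) {
        next_state[START][c]  = DIGITS;
        next_state[PLUS][c]   = DIGITS;   /* same row contents as DIGITS... */
        next_state[DIGITS][c] = DIGITS;
    }
    next_state[START]['+'] = PLUS;        /* '+' is legal only as first char */
    table_ready = true;
}

static bool dfa_is_unsigned_int(const char *arg)
{
    int state = START;

    if (arg == NULL)
        return false;
    if (!table_ready)
        build_table();
    /* No early exit: once in SINK, every transition leads back to SINK. */
    for (; *arg != '\0'; ++arg)
        state = next_state[state][(unsigned char)*arg];
    return state == DIGITS;               /* ...but only DIGITS accepts */
}
```

The PLUS and DIGITS rows end up identical, yet a string that stops in PLUS (a lone "+") is rejected because only DIGITS is accepting.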

dpm:
What the hell is "robust"?

Robust, meaning the function does what it says it will do, does it very quickly and doesn't screw up on any inputs. It is completely bug-free code.

dpm:
The next best thing would have been to use a regex.
Clearly you are ignorant of the Jamie Zawinski quotation.

That quote is used by those ignorant (and often fearful) of regular expressions since they are often misapplied. Much like Mr. Zawinski was.

SurfMan · 2008-03-11 Reply Admin

Another wtf is that the TDWTF homepage reports this article as having 1222 words....

Stupidumb · 2008-03-11 Reply Admin

Is everyone here an expert in all aspects of computers? I mean, it's like you guys know the exact details of every single submission. I forgot about FSAs and shit as soon as class ended. Do you people do this for your jobs? My biggest concern at work is timing my lunch break to coincide with co-worker because he makes noises when he eats. I have to leave the room. And why does everyone in the office have to shuffle their feet when they walk? What the fuck! Lift your fucking legs.

2008-03-11 Reply Admin

Integral conversions are similarly defined in C89 -- see A6.1. What is dpm going on about?

Finite State Arg
