• (cs)

    I kinda like it. It's funny.

  • (cs)

    Well, at least his cs degree is justified. He obviously paid attention during classes...

  • NiceWTF (unregistered)

    The code itself looks like it works correctly though.

    I'm trying to imagine what this programmer would come up with if you told him the next version needs to support Unicode.

  • (cs)

    The same solution of course. He might take a bit longer and his file might become a bit larger, but he'll still be finished after finite time, so no need to change a working solution ;)

  • (cs)

    Did anyone see the comment above arg == NULL: If the string is empty, then it does not contain a float. Looks like someone has been copy-n-pasting.

    I also like that he uses a "sink" state, instead of immediately quitting as soon as the prefix is illegal. But well, that's my obsession, I guess.

  • John Doe (unregistered)

    The behaviour is different than with atoi. atoi("123blah") would be 123, but this function would return 0, because of the non-numeric characters.

    This code makes me cry; it reminds me too much of my own workplace where everybody is reinventing the wheel!
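The difference from atoi noted above can be sketched in a few lines. This is only an illustration: `lenient_value` and `consumed_entirely` are hypothetical names, and the `strtoul`-based check is a stand-in rather than the article's function (unlike the code under discussion, `strtoul` also accepts leading whitespace and a minus sign).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* atoi() converts the leading digits and silently ignores whatever follows. */
static int lenient_value(const char *s)
{
    return atoi(s);
}

/* Hypothetical stricter check: true only if strtoul() converted something
   and stopped exactly at the terminating '\0'. */
static bool consumed_entirely(const char *s)
{
    char *end;

    if (s == NULL || *s == '\0')
        return false;
    (void)strtoul(s, &end, 10);
    return end != s && *end == '\0';
}
```

With "123blah", `lenient_value` yields 123 while `consumed_entirely` reports false, which is the behavioural gap the comment points out.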

  • (cs)

    const char *arg; . . . (unsigned char)(*arg)

    Lovely. Truly his understanding of pointers --- and casting thereof --- is unsurpassed. On those platforms which support signed chars, what is the value of negative-fifty-four cast to be an unsigned char?

    And bonus points: the only reason for casting is to serve as an index in an array. We cannot but admire this mind.

  • Unknown attacker (unregistered) in reply to dpm
    dpm:
    const char *arg; . . . (unsigned char)(*arg)

    Lovely. Truly his understanding of pointers --- and casting thereof --- is unsurpassed. On those platforms which support signed chars, what is the value of negative-fifty-four cast to be an unsigned char?

    A crash.

  • Tony (unregistered) in reply to PSWorx

    Except that state 2 and state 3 are identical, so he couldn't have been paying that much attention.

  • Vollhorst (unregistered) in reply to dpm
    dpm:
    On those platforms which support signed chars, what is the value of negative-fifty-four cast to be an unsigned char?
    202, why?
  • Chuck (unregistered) in reply to Tony
    Tony:
    Except that state 2 and state 3 are identical, so he couldn't have been paying that much attention.

    I thought that at first myself, but the difference is that if the code ends in state 2 it's not a valid integer (it's just "+")
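For readers trying to picture the machine being discussed, here is a minimal sketch of a table-driven recogniser with the same state roles the thread describes: 0 as the sink, 1 as the start, 2 after a leading "+", and 3 as the only accepting state. The names, and the use of three character classes instead of 256-entry rows, are assumptions made for brevity and are not taken from the article's code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state names; the numbering follows the thread. */
enum { SINK = 0, START = 1, PLUS = 2, DIGITS = 3 };

/* Character classes stand in for the 256 columns of a full table. */
enum { C_PLUS, C_DIGIT, C_OTHER };

static int classify(unsigned char c)
{
    if (c == '+')
        return C_PLUS;
    if (c >= '0' && c <= '9')
        return C_DIGIT;
    return C_OTHER;
}

/* Rows PLUS and DIGITS have identical transitions; only DIGITS accepts. */
static const int next_state[4][3] = {
    /* SINK   */ { SINK, SINK,   SINK },
    /* START  */ { PLUS, DIGITS, SINK },
    /* PLUS   */ { SINK, DIGITS, SINK },
    /* DIGITS */ { SINK, DIGITS, SINK },
};

static bool is_unsigned_int(const char *arg)
{
    int state = START;

    if (arg == NULL)
        return false;
    for (; *arg != '\0'; ++arg)
        state = next_state[state][classify((unsigned char)*arg)];
    return state == DIGITS;
}
```

A lone "+" ends in state 2 and is rejected, "+123" ends in state 3 and is accepted, and anything that falls into the sink (for example "wtf123") stays there until the end of the string.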

  • (cs)

    I think it's quite a nice hack in theory.

    More of a "WhyTF did he do this?"

  • (cs) in reply to Unknown attacker
    Unknown attacker:
    dpm:
    On those platforms which support signed chars, what is the value of negative-fifty-four cast to be an unsigned char?

    A crash.

    You sure about that?

    I'm not about to do the twos-complement math but I'm pretty confident that whatever 8 bits represent -54 will make a perfectly good unsigned char.

  • Steve (unregistered)

    I can't help wondering whether this was actually all hand-entered or generated by some finite state automation program from a specification for unsigned numbers.

  • (cs)

    But you have to admit... they used a finite automaton to do that! I don't understand why they don't use a Turing machine.

  • (cs) in reply to zip
    zip:
    I'm pretty confident that whatever 8 bits represent -54 will make a perfectly good unsigned char.
    Casting a negative value to an unsigned type is "undefined", meaning the compiler can do anything it likes.

    Why on earth do you think the concept meaningful?

    Vollhorst:
    202, why?
    That's two. Anyone else going to make that mistake?
  • ruurd (unregistered)

    is not finite state arg. is finite state argh.

  • (cs)

    The Real WTF (tm) is that he initialized a 256-sized array writing 20 numbers per line thus leaving some space on the last line. That's ugly. Everyone knows 256 sized array should be initialized with a nice 16x16 block.

  • AVF (unregistered)

    I'm sorry, but that rocks. It may not be the kind of code you want in production, but from a mathematical/CS point of view, it's just cool.

  • (cs)

    If efficiency was his goal, shouldn't he have put

    // If the string is empty, then it does not contain a float
    if(arg == NULL) return false;

    before allocating that big array?

    I would have just made it a global variable.

  • Ikke (unregistered)

    Can someone explain this code? I don't totally understand how it works.

  • Jens (unregistered)

    He he,

    Though I would perhaps question the wisdom in using a state table on a per-character basis as done here, I did once use a state table for checking the correctness of a PHP array before it was uploaded to a production environment.

    Though it didn't deal with single characters but regular expression patterns, i.e what pattern could legally occur after a given previous pattern had been found.

  • (cs) in reply to Ikke
    Ikke:
    Can someone explain this code? I don't totally understand how it works.
    static bool isArgUnsignedInt(const char *arg)
    {
        bool bIsUInt = false;
        int nDigits = 0;
        int nPlusSigns = 0;
    
        if (arg != NULL)
        {
            while (*arg)
            {
                if (*arg == '+' && nDigits == 0 && nPlusSigns == 0)
                {
                    ++nPlusSigns;
                }
                else if (isdigit(*arg))
                {
                    ++nDigits;
                    bIsUInt = true;
                }
                else
                {
                    bIsUInt = false;
                    break;
                }
                ++arg;
            }
        }
    
        return bIsUInt;
    }
    
  • Beeblebrox (unregistered) in reply to vt_mruhlin
    vt_mruhlin:
    I would have just made it a global variable.
    That's not the right answer. There's a keyword I think you should meet. static, meet vt_mruhlin. vt_mruhlin, meet static. There, you've been introduced. Now it is your job to use static responsibly in your code. See the Wikipedia article for more details.
  • me (unregistered) in reply to dpm
    dpm:
    zip:
    I'm pretty confident that whatever 8 bits represent -54 will make a perfectly good unsigned char.
    Casting a negative value to an unsigned type is "undefined", meaning the compiler can do anything it likes.

    Yeah, sure.

    From the C++ standard, §4.7 Integral conversions: (2) If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation).]

  • Axel R. (unregistered)

    States 2 and 3 are necessary, to catch a lonely "+" sign, or sequences of pluses ("++...+"). That said, too bad he did not exit the loop as soon as state 0 is entered.

    IMO the approach certainly is not stupid, it's very fast and robust. The next best thing would have been to use a regex.
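As a rough illustration of the regex alternative mentioned above, the same language ("an optional leading plus followed by one or more digits") can be written as a POSIX extended regular expression. This sketch assumes a POSIX system that provides `<regex.h>`; the function name `matches_unsigned_int` and the pattern are illustrative choices, not anything from the article.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if arg is an optional '+' followed by digits only. */
static bool matches_unsigned_int(const char *arg)
{
    regex_t re;
    bool ok;

    if (arg == NULL)
        return false;
    /* REG_NOSUB: only a yes/no answer is needed, no match offsets. */
    if (regcomp(&re, "^[+]?[0-9]+$", REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    ok = (regexec(&re, arg, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

Compiling the pattern on every call keeps the sketch short; code that validates many strings would more likely compile it once and reuse it.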

  • nat42 (unregistered)

    Arrgh! The goggles they do nothing! Dear lordy, W-H-Y, was this code written to win a bet or something??!

  • Axel R. (unregistered) in reply to Axel R.

    @dpm:

    Science vs. brute force.

    Now try to detect a real, including in scientific notation.

    ;-)

  • s. (unregistered)

    You know what, people?

    Artificial Intelligence exists. It lives on the net, takes up mundane jobs like freelance programming to earn its server space and bandwidth, and tries to act and look like human but sometimes fails miserably.

  • (cs)

    I especially like how maintainable it is; for example, adapting this to support recognizing hex numbers would only involve hunting down and modifying, what, 36 constants?

  • (cs) in reply to dpm
    dpm:
    Casting a negative value to an unsigned type is "undefined", meaning the compiler can do anything it likes.

    It can do anything it likes. But I'm talking about what it will do.

  • Confused (unregistered) in reply to NiceWTF
    NiceWTF:
    The code itself looks like it works correctly though.

    Maybe it's just me, but it doesn't seem to me like this code "works" (in the sense that it accurately accomplishes the task at hand). Unless I'm mistaken, since he never exits his loop when an incorrect character is found then any string that ends "correctly" would be considered an unsigned int, no?

    wouldn't 'wtf123' return "true" ?

  • (cs) in reply to Beeblebrox
    Beeblebrox:
    vt_mruhlin:
    I would have just made it a global variable.
    That's not the right answer. There's a keyword I think you should meet. static, meet vt_mruhlin. vt_mruhlin, meet static. There, you've been introduced. Now it is your job to use static responsibly in your code.

    And carefully, because as we know, if there's too much static you can get a shock as soon as you try to touch it!

  • JM (unregistered) in reply to dpm
    dpm:
    Casting a negative value to an unsigned type is "undefined", meaning the compiler can do anything it likes.
    Nothing's as bad as an authoritative prick, except an authoritative prick who gets it wrong. You don't actually program in C, right? Admit it. You just heard that C has "undefined" things and assumed this was one of them.

    From my copy of the (C99) standard, par. 6.3.1.3:

    1. When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
    2. Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
    There's nothing undefined about this. Now, what you might have said is that the result is implementation-defined, because the actual value depends on the size of a char. That's right, boys and girls, the C standard doesn't say a char has to be 8 bits. If you actually see an implementation where a char is not 8 bits, though, congratulations, you're probably working on embedded software, and good luck with that.
  • (cs) in reply to Unknown attacker
    Unknown attacker:
    dpm:
    const char *arg; . . . (unsigned char)(*arg)

    Lovely. Truly his understanding of pointers --- and casting thereof --- is unsurpassed. On those platforms which support signed chars, what is the value of negative-fifty-four cast to be an unsigned char?

    A crash.

    202, if characters are eight bits. 458 if it's nine bits. etc, etc, 4294967242 if it's 32 bits. And, before anyone says anything, yes this IS guaranteed on non-twos-complement systems - the negative-to-unsigned conversion is well-defined in C.
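The arithmetic quoted above is easy to check. In this small sketch the helper name `to_unsigned_char` is made up for illustration; the only thing it relies on is the standard rule that conversion to an unsigned type reduces the value modulo one more than the type's maximum, so -54 becomes `UCHAR_MAX + 1 - 54`.

```c
#include <limits.h>

/* Converts an int to unsigned char; the result is the value reduced
   modulo UCHAR_MAX + 1, which is 256 when CHAR_BIT is 8. */
static unsigned int to_unsigned_char(int value)
{
    return (unsigned char)value;
}
```

On an implementation with 8-bit chars, `to_unsigned_char(-54)` gives 202; with wider chars the result grows accordingly, as the comment says.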

  • (cs) in reply to Axel R.
    Axel R.:
    States 2 and 3 are necessary
    I fail to see how, since they are identical.
    to catch a lonely "+" sign, or sequences of pluses ("++...+").
    No. Only the first character can be a plus sign. After that, only digits are allowed.
    IMO the approach certainly is not stupid, it's very fast and robust.
    What the hell is "robust"? It _is_ stupid; you don't need states at all when you have isdigit(), and you *certainly* don't need two identical arrays.
    The next best thing would have been to use a regex.
    Clearly you are ignorant of the Jamie Zawinski quotation.
  • Confused (unregistered) in reply to Confused
    Confused:
    NiceWTF:
    The code itself looks like it works correctly though.

    Maybe it's just me, but it doesn't seem to me like this code "works" (in the sense that it accurately accomplishes the task at hand). Unless I'm mistaken, since he never exits his loop when an incorrect character is found then any string that ends "correctly" would be considered an unsigned int, no?

    wouldn't 'wtf123' return "true" ?

    Nevermind... I of course missed the "genius" of his sink state.

  • (cs) in reply to Confused
    Confused:
    wouldn't 'wtf123' return "true" ?
    No. Once it enters "state 0", it never leaves.
  • (cs) in reply to JM
    JM:
    1. When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.

    And if it's being converted to _Bool, it's _True if it's positive, _False if it's zero, and _File_Not_Found if it's negative, right?

  • OJ (unregistered) in reply to Confused
    Confused:
    Maybe it's just me, but it doesn't seem to me like this code "works" (in the sense that it accurately accomplishes the task at hand). Unless I'm mistaken, since he never exits his loop when an incorrect character is found then any string that ends "correctly" would be considered an unsigned int, no?

    wouldn't 'wtf123' return "true" ?

    No. The first 'w' would drop the state to 0. It is not possible to switch to any other states from 0.

    I, for one, think this is actually quite cool. Even cooler would be generating the state machine from a regex and making the generator available somewhere.

  • nat42 (unregistered) in reply to dpm
    dpm:
    Ikke:
    Can someone explain this code? I don't totally understand how it works.
    static bool isArgUnsignedInt(const char *arg)
    {
        bool bIsUInt = false;
        int nDigits = 0;
        int nPlusSigns = 0;
    
        if (arg != NULL)
        {
            while (*arg)
            {
                if (*arg == '+' && nDigits == 0 && nPlusSigns == 0)
                {
                    ++nPlusSigns;
                }
                else if (isdigit(*arg))
                {
                    ++nDigits;
                    bIsUInt = true;
                }
                else
                {
                    bIsUInt = false;
                    break;
                }
                ++arg;
            }
        }
    
        return bIsUInt;
    }
    

    This is functionally equivalent I believe (above code handles +'s differently):

    static bool isArgUnsignedInt(const char *arg)
    {
        if (!arg||!(*arg))
            return false;
        if (*arg=='+')
            ++arg;
        if (!(*arg))
            return false;
        /* Every remaining character must be a digit. */
        for (;*arg;++arg)
            if ((*arg)<'0' || (*arg)>'9')
                return false;
        return true;
    }
  • moltonel (unregistered) in reply to dpm
    dpm:
    zip:
    I'm pretty confident that whatever 8 bits represent -54 will make a perfectly good unsigned char.
    Casting a negative value to an unsigned type is "undefined", meaning the compiler can do anything it likes.

    Intrigued by this comment, I checked the c99 standard, which states for signed-to-unsigned conversion :

    the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
    Which, in layman's terms, for like-sized chars/ints/longs, flips the most significant bit on/off (which in turn means "nothing to do" : the unsigned value's MSB is at the same position as the signed value's sign bit). It works as expected for Mr CodeSOD, and is a perfectly defined behavior.

    Concerning the unsigned-to-signed conversion, the behaviour is indeed implementation-defined (aka "undefined" as dpm pointed out). However, I can't think of any C compiler that wouldn't simply read the bits as a signed value: same code as for the signed-to-unsigned conversion (aka "nothing to do"), and for error-checking, a compile-time warning is useful and certainly enough.

  • (cs) in reply to nat42
    nat42:
    if (*arg=='+')
    	++arg;

    Definitely better than mine. Thanks.

  • (cs) in reply to OJ
    OJ:
    Even cooler would be generating the state machine from a regex and making the generator available somewhere.
    This is what happens when you use a compiled regular expression in any modern language. The regular expression is converted into a state machine and executed in a loop much like the one in the OP, albeit with a lot more features.
  • (cs)

    Looks to me like someone was bored.

  • nat42 (unregistered) in reply to dpm
    dpm:
    nat42:
    if (*arg=='+')
    	++arg;

    Definitely better than mine. Thanks.

    Actually I'm feeling silly, because I just realised I misread your code, I'm sorry dpm

  • (cs) in reply to dpm
    dpm:
    Axel R.:
    States 2 and 3 are necessary
    I fail to see how, since they are identical.
    Identical in their transitions, but only state 3 is the accepting state of the DFA. If the string ends in any state other than 3, it is not accepted. That's the difference between the two states. Sure, the arrays are the same, but the abstract states themselves are vastly different.
    dpm:
    What the hell is "robust"?
    Robust, meaning the function does what it says it will do, does it very quickly and doesn't screw up on any inputs. It is completely bug-free code.
    dpm:
    The next best thing would have been to use a regex.
    Clearly you are ignorant of the Jamie Zawinski quotation.
    That quote is used by those ignorant (and often fearful) of regular expressions since they are often misapplied. Much like Mr. Zawinski was.
  • (cs)

    Another wtf is that the TDWTF homepage reports this article as having 1222 words....

  • (cs)

    Is everyone here an expert in all aspects of computers? I mean, it's like you guys know the exact details of every single submission. I forgot about FSAs and shit as soon as class ended. Do you people do this for your jobs? My biggest concern at work is timing my lunch break to coincide with co-worker because he makes noises when he eats. I have to leave the room. And why does everyone in the office have to shuffle their feet when they walk? What the fuck! Lift your fucking legs.

  • relaxing (unregistered) in reply to me

    Integral conversions are similarly defined in C89 -- see A6.1. What is dpm going on about?
