• Bored Bystander (unregistered) in reply to this webcomic is a wtf
    this webcomic is a wtf:
    dpm:
    IMO the approach certainly is not stupid, it's very fast and robust.
    What the hell is "robust"? It _is_ stupid; you don't need states at all when you have isdigit(), and you *certainly* don't need two identical arrays.

    Removing the duplicate array is an optimization. You aren't supposed to optimize unless you need to. Remember you don't want "premature optimizations."

    If the arrays had been declared "const", the compiler might be able to optimize them into one.

    With regard to the people proposing solutions using if-statements, surely using switch statements would be better (although more tedious unless you're using a C compiler which supports ranges in cases). That way the compiler can decide the best way to compile it, it might even create the array/fsm version anyway.

  • (cs) in reply to moltonel
    moltonel:
    Intrigued by this comment, I checked the c99 standard,
    FAIL
    moltonel:
    Concerning the unsigned-to-signed conversion, the behaviour is indeed implmentation-defined (aka "undefined" as dpm pointed out).
    "AKA" my ass. The standard goes into great depth and detail about the precise meanings of "undefined" and "implementation-defined", and the two aren't remotely the same. Go back and check appendix K. (I haven't looked that up. I think it's K, but it might be H. It's there in the appendices anyway. And the terms are both defined in the glossary at the start as well.)
  • (cs) in reply to Stupidumb
    Stupidumb:
    Is everyone here an expert in all aspects of computers?
    Not since you turned up.
    Stupidumb:
    I mean, it's like you guys know the exact details of every single submission.
    That's the great thing about programming computers. You don't need to memorise every single combination of statements in every single possible order. You can understand them by going through them logically in the same step-by-step way that the computer is going to do when it runs them.
    Stupidumb:
    I forgot about FSAs and shit as soon as class ended. Do you people do this for your jobs?
    Wow, you haven't realised that this is a website for computer programmers to sound off about their jobs? How did you end up here, anyway?
    Stupidumb:
    My biggest concern at work is
    whether I want fries with my order. Now get back to work, bitch!
  • MM (unregistered) in reply to John Doe
    John Doe:
    The behaviour is different than with atoi. atoi("123blah") would be 123, but this function would return 0, because of the non-numeric characters.

    This code makes me cry; it reminds me too much of my own workplace where everybody is reinventing the wheel!

    Ok, I'm a little confused. (Assuming atoi is the "wheel" you're refering to here.) First you point out how this is NOT the same because it has a different and more accurate result, but then you still complain that it's reinventing the wheel.

    Incidentally, if I'd been asked to do this, I would probably have used the same state machine, but without the arrays. My version would look something like:

    static bool isArgUnsignedInt(const char *arg)
    {
        char curState = 1;
    
        if(arg == NULL) return false;
    
        while (*arg) { 
            switch (curState) {
               case 1: if (*arg == '+') curState = 2;
                       else if (isdigit(*arg)) curState = 3;
                       else curState = 0;
               break;
               case 2: if (isdigit(*arg)) curState = 3;
                       else curState = 0;
               break;
               case 3: if (!isdigit(*arg)) curState = 0;
            }
            arg++;
        }
    
        return(curState == 3);
    }

    (Replacing "curState = 0" with "return false" would speed it up slightly, but for maintainability I usually prefer to have only a single return at the end of a function. If this is just for error checking and errors are rare, then the speed advantage of multiple returns would be minimal and only worthwhile if speed optimization were particularly critical. The version with arrays might also be somewhat more efficient than mine if they had been declared as static, but again, for most purposes maintainability outweighs such a minor efficiency boost.)

  • dcardani (unregistered)

    Everyone keeps saying it's bug free. I'm not sure I agree. I mean it does what it claims, but what it claims is very likely not intended. You could pass it a 4 Gig string of digits, and it would properly claim that it's an integer. However, you would probably not be able to extract or use that value. So in practice, it validates something you probably don't want it to.

  • (cs) in reply to DaveK
    DaveK:
    Stupidumb:
    Is everyone here an expert in all aspects of computers?
    Not since you turned up. ...
    Stupidumb:
    My biggest concern at work is
    whether I want fries with my order. Now get back to work, bitch!
    Wow, take it easy man. I wasn't angry and i wasn't yelling at you. Here are some references to help you understand what I was doing: http://dictionary.reference.com/browse/humor http://dictionary.reference.com/browse/exaggeration

    But seriously, I wanted to know if some of you guys (especially the ones more vocal about the current story's technical stuff) actually have to employ/deal with things like FSA's and stuff in your jobs. Or maybe it's just something you guys research in spare time. I honestly don't have a problem with either. Please don't freak out and attack my intelligence. (It's rude and pitiful). The reason I wanted to know is that I would like to get a job that is a little more challenging than my current one. I thought that being funny Wouldn't hurt.

    Wow, you haven't realiZed that this is a website for computer programmers to submit WTFs? How did you end up here, anyway? Also, the comment section is a great place to exchange ideas and make jokes. Though some of you like using it to insult others. It's great if you know a little something, just don't brag about it or anything. Have some class.

  • (cs) in reply to DaveK
    DaveK:
    Stupidumb:
    My biggest concern at work is
    whether I want fries with my order. Now get back to work, bitch!
    You screwed that part up. I assume you wanted to insult me by implying that I worked at some fast food place. Here is what you should have done:
    DaveK:
    Stupidumb:
    "My biggest concern at work is
    whether you want fries with your order." Now get back to work, bitch!
  • grumpy (unregistered)

    That... is made of pure awesome!

    I want coworkers who write that kind of code

  • Robin Goodfellow (unregistered) in reply to Ikke
    Ikke:
    Can someone explain this code? I dont totally understand how it works.

    It's a finite state machine, the tables contain lists of state values and information on state transitions.

    This is the key bit:

    for(; *arg != '\0'; arg++)
            curState = fsa[curState][(unsigned char)(*arg)];

    The state information is kept in an "unsigned char" value, which is used as a the first index to a 2-dimensional array (state,char -> newstate). The algorithm loops through the characters in the string and sets the state according to the existing state and the character. There are tables for each state and each table contains 256 entries. The return value is whether the final state at the last character of the string was state 3. If you look at the state tables you see that if you are in states 1, 2 or 3 the next state is 3 if the character is a digit 0-9, whereas you'll go to the sink state 0 if the character is something else and you'll stay in state 0 until the end of the string.

    Actually, this code has one too many states, probably indicating it was copied and modified from elsewhere. State 2 is not used. The overall result of the finite state machine logic is to produce true only if all of the characters in the string (except \0) are numerical digits and produce false otherwise. It's an elaborate way of doing this:

    {   
       if(arg == NULL) return false;
       for(; *arg != '\0'; arg++)
       {
          if (*arg != '0' && *arg != '1' && *arg != '2' && *arg != '3' 
                 && *arg != '4' && *arg != '5' && *arg != '6' 
                 && *arg != '7' && *arg != '8' && *arg != '9')
             { return false; }
       }
       return true;
    }
  • Krisztian Pinter (unregistered)

    pretty outdated. i would use a decision tree forest.

  • Not telling you.... (unregistered) in reply to Stupidumb
    Stupidumb:
    But seriously, I wanted to know if some of you guys (especially the ones more vocal about the current story's technical stuff) actually have to employ/deal with things like FSA's and stuff in your jobs.

    Yes and no. 95% of the time I don't have to think/care about FSA's, but at least once a year I have to solve some problem where a well defined FSA (and not if/else/switch/for/break code) is the cleanest, although not necessarily only, solution.

    Sure I could always do it with some jumble of procedural logic, but sometimes, just sometimes, it's the right thing to do.

  • Ikke (unregistered) in reply to Robin Goodfellow
    Robin Goodfellow:
    Ikke:
    Can someone explain this code? I dont totally understand how it works.

    It's a finite state machine, the tables contain lists of state values and information on state transitions.

    This is the key bit:

    for(; *arg != '\0'; arg++)
            curState = fsa[curState][(unsigned char)(*arg)];

    The state information is kept in an "unsigned char" value, which is used as a the first index to a 2-dimensional array (state,char -> newstate). The algorithm loops through the characters in the string and sets the state according to the existing state and the character. There are tables for each state and each table contains 256 entries. The return value is whether the final state at the last character of the string was state 3. If you look at the state tables you see that if you are in states 1, 2 or 3 the next state is 3 if the character is a digit 0-9, whereas you'll go to the sink state 0 if the character is something else and you'll stay in state 0 until the end of the string.

    Actually, this code has one too many states, probably indicating it was copied and modified from elsewhere. State 2 is not used. The overall result of the finite state machine logic is to produce true only if all of the characters in the string (except \0) are numerical digits and produce false otherwise. It's an elaborate way of doing this:

    {   
       if(arg == NULL) return false;
       for(; *arg != '\0'; arg++)
       {
          if (*arg != '0' && *arg != '1' && *arg != '2' && *arg != '3' 
                 && *arg != '4' && *arg != '5' && *arg != '6' 
                 && *arg != '7' && *arg != '8' && *arg != '9')
             { return false; }
       }
       return true;
    }

    Ah ok, i understand now. The index means the character value, and the actual value means the next state.

    But i disagree that state two is never used. In state 1 there is one value that is 2.

  • (cs) in reply to Not telling you....
    Not telling you....:
    Stupidumb:
    But seriously, I wanted to know if some of you guys (especially the ones more vocal about the current story's technical stuff) actually have to employ/deal with things like FSA's and stuff in your jobs.

    Yes and no. 95% of the time I don't have to think/care about FSA's, but at least once a year I have to solve some problem where a well defined FSA (and not if/else/switch/for/break code) is the cleanest, although not necessarily only, solution.

    Sure I could always do it with some jumble of procedural logic, but sometimes, just sometimes, it's the right thing to do.

    Cool, thanks. I know, from what people are saying, that the post's code is pretty excessive I guess. I just wasn't sure if anyone needed in more practical situations. In school it seemed more of a theoretical thing and something you only need to worry about if you are making a compiler or something.

  • Ikke (unregistered) in reply to Ikke
    Ikke:
    [..]

    Ah ok, i understand now. The index means the character value, and the actual value means the next state.

    But i disagree that state two is never used. In state 1 there is one value that is 2.

    Ok, i see what you meant. It's the same as state 3

  • Maarten (unregistered) in reply to MM

    The wheel would be 'strtol', as stated in the article.

    Something like:

    if (arg == NULL || *arg =='\0' || arg=='-') return 0; char res; strtol(arg, &res, 10); return *res == '\0';

    And anybody else notice the comment for this intchecker:

    // If the string is empty, then it does not contain a float

  • (cs) in reply to Ikke
    Ikke:
    Ikke:
    [..]

    Ah ok, i understand now. The index means the character value, and the actual value means the next state.

    But i disagree that state two is never used. In state 1 there is one value that is 2.

    Ok, i see what you meant. It's the same as state 3

    I thought it meant that it allows numbers with a plus sign at the very beginning.

  • MM (unregistered) in reply to Robin Goodfellow
    Robin Goodfellow:
    Actually, this code has one too many states, probably indicating it was copied and modified from elsewhere. State 2 is not used. The overall result of the finite state machine logic is to produce true only if all of the characters in the string (except \0) are numerical digits and produce false otherwise.
    State 2 is used, because not all characters have to be digits. State 2 allows the first (and only the first) character to be a plus sign. The array is the same as 3 because the array just shows what can follow the current state, and either the plus sign or digits can be followed by digits.
  • Ikke (unregistered) in reply to MM
    MM:
    Robin Goodfellow:
    Actually, this code has one too many states, probably indicating it was copied and modified from elsewhere. State 2 is not used. The overall result of the finite state machine logic is to produce true only if all of the characters in the string (except \0) are numerical digits and produce false otherwise.
    State 2 is used, because not all characters have to be digits. State 2 allows the first (and only the first) character to be a plus sign. The array is the same as 3 because the array just shows what can follow the current state, and either the plus sign or digits can be followed by digits.

    there is no difference if you go from state 1 to state 2 or from state 1 to state 3.

    if the first char is a + sign, then it goes to state 2, and then if the next char is a number, then it goes the state 3, where it stays as long it is a number.

    Would the + sign make it to go to state 3, it still only accepts numbers after the + sign.

  • Anonymous (unregistered) in reply to Ikke
    Ikke:
    there is no difference if you go from state 1 to state 2 or from state 1 to state 3.

    if the first char is a + sign, then it goes to state 2, and then if the next char is a number, then it goes the state 3, where it stays as long it is a number.

    Would the + sign make it to go to state 3, it still only accepts numbers after the + sign.

    sigh

    Hint: Check your solution for the input of "+".

  • MM (unregistered) in reply to Ikke
    Ikke:
    there is no difference if you go from state 1 to state 2 or from state 1 to state 3.

    if the first char is a + sign, then it goes to state 2, and then if the next char is a number, then it goes the state 3, where it stays as long it is a number.

    Would the + sign make it to go to state 3, it still only accepts numbers after the + sign.

    That would only be true if it didn't need to have a number. If the string consists exclusively of "+", then it goes to state 2 and ends there, returning false. The implementation in the article requires a digit to follow and bring it to state 3 before accepting it as a valid integer.

  • Scottford (unregistered) in reply to Random832
    Random832:
    And if it's being converted to _Bool, it's _True if it's positive, _False if it's zero, and _File_Not_Found if it's negative, right?

    No way! It's _False if it's positive, _True if it's zero.

  • Scottford (unregistered) in reply to Stupidumb
    Stupidumb:
    My biggest concern at work is timing my lunch break to coincide with co-worker because he makes noises when he eats. I have to leave the room. And why does everyone in the office have to shuffle their feet when they walk? What the fuck! Lift your fucking legs.

    This was the best post among all you morons.

  • Foo Victim (unregistered) in reply to Ikke

    apologies in advance for any technical errors -- i don't do C if i can help it, and i'm not interested enough to read the specs.

    "the state-transition arrays should be static" i know where you're coming from, but wouldn't making them static mean they remain allocated for the life of the program? as opposed to declaring/initialising them within the method, where they'll be allocated on the stack/heap. (?)

    suggesting state 2 is the same as state 3: you fail. state 2's transitions are the same as state 3's, yes, so there's the opportunity to save some memory there, if you're allowed to re-type the arrays...

    (unsigned char[] *) fsa[4];
    fsa[0] = ...
    fsa[1] = ...
    fsa[2] = ...
    fsa[3] = fsa[2];
    

    In case that wasn't flame-bait enough: you may as well do away with C, just use assembly (wrapped up in macros if you prefer, and/or want it to be "portable" like C is) -- I'm talking in general here, not just for this argv-parsing FSA. As far as I can see, C doesn't really give you anything extra. I prefer readable code, and I'm not sure that's possible in C (except for trivial examples).

  • PseudoNoise (unregistered)

    Bug fix for non-8-bit-char systems:

        // Evaluate the fsa for the entire string 
        for(; *arg != '\0'; arg++)
            curState = fsa[curState][((unsigned char)(*arg))&0xFF];
    

    There, all better!

  • MM (unregistered) in reply to dcardani
    dcardani:
    Everyone keeps saying it's bug free. I'm not sure I agree. I mean it does what it claims, but what it claims is very likely not intended. You could pass it a 4 Gig string of digits, and it would properly claim that it's an integer. However, you would probably not be able to extract or use that value. So in practice, it validates something you probably don't want it to.
    Congradulations. I think that's the most valid criticism yet. Yes, it should really include a length check.

    Tricky to do if you really want to allow all valid numbers up to MAXINT, since you can't just look at the length of its decimal representation.

    The first thing that comes to mind (though probably not the most efficient) would be to keep a count of the digits (the state-3 bytes) as you're going through the state machine. At the end, if the state machine says it's good (i.e. state 3), compare that number of digits to the number of digits in the decimal representation of MAXINT. If it's either fewer or more digits, then your length check is done and you can return true or false, but if it's the same number of digits, you have to go through it again to compare them. At least it would place most of the secondary checking on only certain cases of long numbers.

  • Péter (unregistered) in reply to Alexander Fedyukin

    I think the guy who wrote the code has read enough books. He probably wrote a tool many years ago that -given regular grammar- would generate automata in C. No need for any external libraries. The real WTF is... Well, you know...

  • Axel R. (unregistered)
    dpm:
    However, since you are so confident that it can be done so easily, feel free impress us, at which point I will admit your "robustness".

    Straw man. I'm analyzing the code as it is. Until you show that it is as extensible as you claim, I have nothing to prove.

    Nothing like a small contest, uh?

    static bool isArgReal(const char *arg)
    
    /* Given:    A 8-bits char string argument.                                 */
    /* Task:     Determine if the string is a valid real            */
    /*           value.                                             */
    /* Return:   Whether the string contains a unsigned integer.    */
    /* Author:   [redacted], float/double version by Axel R.        */
    
    {
        // hardcoded finite state automaton to recognize any ASCII  
        // float/double
    
        static const char fsa[14][256] = {
            {    // State 0
                0
            }, { // State 1
                0,  0,  0,  0,  0,  0,  0,  0,  0,  1,  1,  0,  0,  1,  0,  0,
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  2,  0,  2,  9,  0,
                4,  4,  4,  4,  4,  4,  4,  4,  4,  4
            }, { // State 2
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  3,  0,
                2,  2,  2,  2,  2,  2,  2,  2,  2,  2
            }, { // State 3
               -1,  0,  0,  0,  0,  0,  0,  0,  0,  8,  8,  0,  0,  8,  0,  0,
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                8,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  0,  0,  0,  0,  0,  0,
                0,  0,  0,  0,  0, 10,  8,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                0,  0,  0,  0,  0, 10,  8
            }, { // State 4
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  5,  0,
                4,  4,  4,  4,  4,  4,  4,  4,  4,  4
            }, { // State 5
               -1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  0,  0,  0,  0,  0,  0,
                0,  0,  0,  0,  0,  0,  8,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                0,  0,  0,  0,  0,  0,  8
            }, { // State 6
               -1,  0,  0,  0,  0,  0,  0,  0,  0,  8,  8,  0,  0,  8,  0,  0,
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                8,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                6,  6,  6,  6,  6,  6,  6,  6,  6,  6,  0,  0,  0,  0,  0,  0,
                0,  0,  0,  0,  0,  7,  8,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                0,  0,  0,  0,  0,  7,  8
            }, { // State 7
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 13,  0, 13,  0,  0,
                3,  3,  3,  3,  3,  3,  3,  3,  3,  3
            }, { // State 8
               -1,  0,  0,  0,  0,  0,  0,  0,  0,  8,  8,  0,  0,  8,  0,  0,
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                8
            }, { // State 9
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                5,  5,  5,  5,  5,  5,  5,  5,  5,  5
            }, { // State 10
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 11,  0, 11,  0,  0,
               12, 12, 12, 12, 12, 12, 12, 12, 12, 12
            }, { // State 11
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
               12, 12, 12, 12, 12, 12, 12, 12, 12, 12
            }, { // State 12
               -1,  0,  0,  0,  0,  0,  0,  0,  0,  8,  8,  0,  0,  8,  0,  0,
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                8,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
               12, 12, 12, 12, 12, 12, 12, 12, 12, 12,  0,  0,  0,  0,  0,  0,
                0,  0,  0,  0,  0,  0,  8,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                0,  0,  0,  0,  0,  0,  8
            }, { // State 13
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
                3,  3,  3,  3,  3,  3,  3,  3,  3,  3
            }
        };  
    
        int state = 1;
    
        // If the string is null or empty, then it does not contain a Real
    
        if (!arg || !*arg) return false;
    
        // Evaluate the FSA
        
        for (; (state = fsa[state][*arg]) > 0; arg++);
    
        return state == -1;
    }
    
    int main(int argc, char* argv[])
    {
        printf("OK: %d\r\n", isArgReal("1."));
        printf("OK: %d\r\n", isArgReal("1.f"));
        printf("OK: %d\r\n", isArgReal(" .12"));
        printf("OK: %d\r\n", isArgReal(".12F "));
        printf("OK: %d\r\n", isArgReal("+.12"));
        printf("OK: %d\r\n", isArgReal("-.12"));
        printf("OK: %d\r\n", isArgReal("1.23"));
        printf("OK: %d\r\n", isArgReal("+1.23"));
        printf("OK: %d\r\n", isArgReal("-1.23"));
        printf("OK: %d\r\n", isArgReal(".12e3"));
        printf("OK: %d\r\n", isArgReal(".12e+3f"));
        printf("OK: %d\r\n", isArgReal(".12e-3"));
        printf("OK: %d\r\n", isArgReal("1.23e4"));
        printf("OK: %d\r\n", isArgReal("1.23e+45"));
        printf("OK: %d\r\n", isArgReal("1.234E-567"));
        printf("OK: %d\r\n", isArgReal("+1.234E+567"));
        printf("OK: %d\r\n", isArgReal(" -1.234E+567"));
        printf("OK: %d\r\n", isArgReal("+1.234E-567"));
        printf("OK: %d\r\n", isArgReal("-1.234E-567 "));
        printf("OK: %d\r\n", isArgReal("1.7976931348623158e+308"));
        printf("OK: %d\r\n", isArgReal("\t\r\n 2.2250738585072014e-308 \t\r\n"));
        
        printf("KO: %d\r\n", isArgReal("."));
        printf("KO: %d\r\n", isArgReal(".1 2"));
        printf("KO: %d\r\n", isArgReal(". 1 2"));
        printf("KO: %d\r\n", isArgReal("1"));
        printf("KO: %d\r\n", isArgReal("123"));
        printf("KO: %d\r\n", isArgReal("-.12x"));
        printf("KO: %d\r\n", isArgReal("1,23"));
        printf("KO: %d\r\n", isArgReal("12f"));
        printf("KO: %d\r\n", isArgReal("12.34."));
        printf("KO: %d\r\n", isArgReal("12.34.67"));
        printf("KO: %d\r\n", isArgReal("++12.34"));
        printf("KO: %d\r\n", isArgReal("--12.34"));
        printf("KO: %d\r\n", isArgReal("+-12.34"));
        printf("KO: %d\r\n", isArgReal("+12.34-"));
        printf("KO: %d\r\n", isArgReal("+12.34+"));
        printf("KO: %d\r\n", isArgReal("-1.23e+"));
        printf("KO: %d\r\n", isArgReal("+1.23e-"));
        printf("KO: %d\r\n", isArgReal("e"));
        printf("KO: %d\r\n", isArgReal(".e"));
        printf("KO: %d\r\n", isArgReal("-.e"));
        printf("KO: %d\r\n", isArgReal(".1e"));
        printf("KO: %d\r\n", isArgReal("1.1e"));
        printf("KO: %d\r\n", isArgReal("+ 1.234E-567"));
        printf("KO: %d\r\n", isArgReal("+1 .234E-567"));
        printf("KO: %d\r\n", isArgReal("+1. 234E-567"));
        printf("KO: %d\r\n", isArgReal("+1.2 34E-567"));
        printf("KO: %d\r\n", isArgReal("+1.234 E-567"));
        printf("KO: %d\r\n", isArgReal("+1.234E -567"));
        printf("KO: %d\r\n", isArgReal("+1.234E- 567"));
        printf("KO: %d\r\n", isArgReal("fail"));
    }
    
    

    Note the extreme simplicity of the code. Any bug in the tables can be fixed in a matter if minutes. Drawing the state transition diagram is left as an exercise to the reader :-)

    Once the principle is understood, it’s child play to write state transition tables for anything, phone numbers, SSNs, whatever. The next step is to learn what finite state automatons can do for you in other areas of programming, for example in handling complex business processes.

    Good luck with the if/then/else version ;-)

    Cheers, Axel

  • Axel R. (unregistered) in reply to Axel R.
    for (; (state = fsa[state][*arg & 0xff]) > 0; arg++);
    is better for non-ASCII input.
  • Anders (unregistered)

    would be fun to see the unicode version.

  • incoherent (unregistered) in reply to PerdidoPunk
    PerdidoPunk:
    topcat_arg:
    but you have to admit.. they used a finite automata to do that! I don't understand why they don't use a turing machine.

    ????

    Turing machines, in practice, are FS machines...

    If by "in practice" you mean "we don't actually have access to a tape of unlimited length", then sure. But by definition, a finite state machine has no memory except for what state it's currently in and the string it's processing, and a Turing machine has an unlimited amount of state.

    Also, Turing machines are a royal pain to define.

  • DHager (unregistered) in reply to MM
    MM:
    dcardani:
    You could pass it a 4 Gig string of digits, and it would properly claim that it's an integer. However, you would probably not be able to extract or use that value. So in practice, it validates something you probably don't want it to.
    Congradulations. I think that's the most valid criticism yet. Yes, it should really include a length check.

    It depends, because there's equivocation over "integer".

    There are cases where you may want to check that something is a mathematical integer, rather than a platform-dependent computing "integer" type.

  • Axel R. (unregistered)
    Anders:
    would be fun to see the unicode version.

    Here you go:

    #ifdef UNICODE
        for (; 256 > (unsigned) *arg && (state = fsa[state][*arg & 0x00ff]) > 0; arg++);
    #else
        for (; (state = fsa[state][*arg & 0xff]) > 0; arg++);
    #endif
    
    

    What did you expect?

    Cheers, Axel

  • (cs) in reply to Stupidumb
    Stupidumb:
    But seriously, I wanted to know if some of you guys (especially the ones more vocal about the current story's technical stuff) actually have to employ/deal with things like FSA's and stuff in your jobs. Or maybe it's just something you guys research in spare time. I honestly don't have a problem with either.

    One place finite state machines can be found is embedded systems programming. A couple of dated examples would be keeping track of the state of analog modems or implementing an ISDN protocol.

    I guess your exposure to this kind of stuff depends on whether you are doing high- or low-level programming.

  • (cs) in reply to Axel R.
    Axel R.:
    dpm:
    However, since you are so confident that it can be done so easily, feel free impress us, at which point I will admit your "robustness".
    Nothing like a small contest, uh?
    I'm not even going to try building that, let alone run it. I will stipulate that it works perfectly and is completely bug-free. Congratulations.
    Note the extreme simplicity of the code.
    Sure, the code is simple. It ought to be, it's essentially one line. The *data*, though . . . that's just crap. If you can look at those arrays and immediately see that lower-case-e is 4 here and 0 there, you're at the far end of the curve. I wouldn't code that in a normal environment. FSAs are fine where appropriate, but I look at that and see high maintenance costs. In fact I'm laughing at the hard-coded "12"s and "4"s and so on. That's just ridiculous --- why not at least use macros to add some meaning to those element values?
    Once the principle is understood, it’s child play to write state transition tables for anything, phone numbers, SSNs, whatever.
    I don't believe that those are easier and more readable than equivalent C code. The more complex, the more appropriate it is to use a FSA, but phone numbers and SSNs are too small and too fixed to bother.
    The next step is to learn what finite state automatons can do for you in other areas of programming, for example in handling complex business processes.
    No, the next step is to figure out how to write them in a more maintainable fashion.
    Good luck with the if/then/else version ;-)
    I'll get right on that after I'm released from jury duty, which begins in ten hours.
  • Axel R. (unregistered)

    Sure enough, I forgot one possible notation: 8e-007

    State 4 should become:

    { // State 4
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  5,  0,
    4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  0,  0,  0,  0,  0,  0,
    0,  0,  0,  0,  0,  7,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
    0,  0,  0,  0,  0,  7,  0
    }

    and

        printf("OK: %d\r\n", isArgReal("1e-001"));
    printf("OK: %d\r\n", isArgReal("1e+001"));
    can be added to the tests.

  • RTH (unregistered) in reply to leppie
    Funny how everyone finds this sad, yet not a single code snippet given will be faster.

    Be careful when you throw out challenges like that!

    <pre>/*--------------------------------------------------------------*/
    static bool isArgUnsignedInt(const char *arg)
    
     /* Given:    A string argument.                               */
     /* Task:     Determine if the string is a valid unsigned      */
     /*           integer value.                                   */
     /* Return:   Whether the string contains a unsigned integer.  */
     /* Author:   [redacted]                             */
    
    {
       // hardcoded finite state automaton to recognize any ASCII  
       // integer
    
       // State 0 - sink state
       static unsigned char fsa0[256] = {
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
    
       // State 1 - initial state [+] -> 2   [0-9] -> 3
       static unsigned char fsa1[256] = {
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 2, 0, 0, 0, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
       // State 2 - [0-9] -> 3
       // State 3 - [0-9] -> 3
       static unsigned char fsa23[256] = {
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
        
       static unsigned char *fsa[4] = {fsa0, fsa1, fsa23, fsa23};
    
        // State 1 is the initial state
        unsigned char curState = 1;
    
        // If the string is empty, then it does not contain a float
        if(arg == NULL) return false;
    
        // Evaluate the fsa for the entire string 
        for(; *arg != '\0'; arg++)
            curState = fsa[curState][(unsigned char)(*arg)];
    
        // States 3 is accepting
        return(curState == 3);
    }
    
  • nat42 (unregistered) in reply to Axel R.
    Axel R.:
    dpm:
    However, since you are so confident that it can be done so easily, feel free impress us, at which point I will admit your "robustness".

    Straw man. I'm analyzing the code as it is. Until you show that it is as extensible as you claim, I have nothing to prove.

    Nothing like a small contest, uh?

    Um, this is my version, forgive me for it is truely awful. I wrote it [sorry I mean hacked it up from hell] to test the speed and I'm not surprised to report it is 41% slower when I tested (with your test data looped 100000 times, GCC 4.1.3 set to -O3). It handles the "f" character and whitespace differently to your code (as per the original code whitespace is a deal breaker for matching)

    bool isArgFloat(const char *arg)
    {
    	bool EncounteredDigit=false;
    
      if (!arg)
    		return false;
    	if ((*arg)<'0')
    	{
    		if (IsPointChar(*arg))
    			goto PostPointHandling;
    		else
    			if (!IsSignChar(*arg))
    				return false;
    	}
    	else
    	{
    		if ((*arg)>'9')
    			return false;
    		else
    			EncounteredDigit=true;
    	}
    
    	while (true)
    	{
    		++arg;
    		if ((*arg)<'0')
    		{
    			if (!(*arg))
    				return EncounteredDigit;
    			if (IsPointChar(*arg))
    				goto PostPointHandling;
    			else
    				return false;
    		}
    		else
    		{
    			if ((*arg)>'9')
    				if (EncounteredDigit && IsExpChar(*arg))
    					goto PostExpHandling;
    				else
    					return false;
    		}
    		EncounteredDigit=true;
    	}
    
    PostPointHandling:
    	EncounteredDigit=false;
    	while (true)
    	{
    		++arg;
    		if ((*arg)<'0')
    		{
    			if (!(*arg))
    				return EncounteredDigit;
    			return false;
    		}
    		else
    		{
    			if ((*arg)>'9')
    				if (EncounteredDigit && IsExpChar(*arg))
    					goto PostExpHandling;
    				else
    					return false;
    		}
    		EncounteredDigit=true;
    	}
    
    PostExpHandling:
    	++arg;
    	EncounteredDigit=false;
    	if ((*arg)<'0')
    	{
    		if (!IsSignChar(*arg))
    			return false;
    	}
    	else
    	{
    		if ((*arg)>'9')
    			return false;
    		else
    			EncounteredDigit=true;
    	}
    
    	while (true)
    	{
    		++arg;
    		if ((*arg)<'0')
    		{
    			if (!*arg)
    				return EncounteredDigit;
    			else
    				return false;
    		}
    		else
    		{
    			if ((*arg)>'9')
    				return false;
    		}
    		EncounteredDigit=true;
    	}
    }
    

    Sighs, Got I need to find work...

  • nat42 (unregistered) in reply to Axel R.
    Axel R.:
    dpm:
    However, since you are so confident that it can be done so easily, feel free impress us, at which point I will admit your "robustness".

    Straw man. I'm analyzing the code as it is. Until you show that it is as extensible as you claim, I have nothing to prove.

    Nothing like a small contest, uh?

    Um, this is my version, forgive me for it is truely awful. I wrote it [sorry I mean hacked it up from hell] to test the speed and I'm not surprised to report it is 41% slower when I tested (with your test data looped 100000 times, GCC 4.1.3 set to -O3). It handles the "f" character and whitespace differently to your code (as per the original code whitespace is a deal breaker for matching)

    bool isArgFloat(const char *arg)
    {
    	bool EncounteredDigit=false;
    
      if (!arg)
    		return false;
    	if ((*arg)<'0')
    	{
    		if (IsPointChar(*arg))
    			goto PostPointHandling;
    		else
    			if (!IsSignChar(*arg))
    				return false;
    	}
    	else
    	{
    		if ((*arg)>'9')
    			return false;
    		else
    			EncounteredDigit=true;
    	}
    
    	while (true)
    	{
    		++arg;
    		if ((*arg)<'0')
    		{
    			if (!(*arg))
    				return EncounteredDigit;
    			if (IsPointChar(*arg))
    				goto PostPointHandling;
    			else
    				return false;
    		}
    		else
    		{
    			if ((*arg)>'9')
    				if (EncounteredDigit && IsExpChar(*arg))
    					goto PostExpHandling;
    				else
    					return false;
    		}
    		EncounteredDigit=true;
    	}
    
    PostPointHandling:
    	EncounteredDigit=false;
    	while (true)
    	{
    		++arg;
    		if ((*arg)<'0')
    		{
    			if (!(*arg))
    				return EncounteredDigit;
    			return false;
    		}
    		else
    		{
    			if ((*arg)>'9')
    				if (EncounteredDigit && IsExpChar(*arg))
    					goto PostExpHandling;
    				else
    					return false;
    		}
    		EncounteredDigit=true;
    	}
    
    PostExpHandling:
    	++arg;
    	EncounteredDigit=false;
    	if ((*arg)<'0')
    	{
    		if (!IsSignChar(*arg))
    			return false;
    	}
    	else
    	{
    		if ((*arg)>'9')
    			return false;
    		else
    			EncounteredDigit=true;
    	}
    
    	while (true)
    	{
    		++arg;
    		if ((*arg)<'0')
    		{
    			if (!*arg)
    				return EncounteredDigit;
    			else
    				return false;
    		}
    		else
    		{
    			if ((*arg)>'9')
    				return false;
    		}
    		EncounteredDigit=true;
    	}
    }
    

    Sighs, God I need to find work...

  • (cs) in reply to incoherent
    incoherent:
    PerdidoPunk:
    topcat_arg:
    but you have to admit.. they used a finite automata to do that! I don't understand why they don't use a turing machine.

    ????

    Turing machines, in practice, are FS machines...

    If by "in practice" you mean "we don't actually have access to a tape of unlimited length", then sure. But by definition, a finite state machine has no memory except for what state it's currently in and the string it's processing, and a Turing machine has an unlimited amount of state.
    That depends on whether you consider the tape a part of the Turing machine or an external I/O mechanism. I think the latter view (In which a Turing machine is nothing but a FSM with a strictly defined I/O mechanism) is the only sensible one, since a FSM with no I/O is rather pointless.

  • John (unregistered) in reply to Vollhorst
    On those platforms which support signed chars, what is the value of negative-fifty-four cast to be an unsigned char?
    202, why?

    You sure? It was 65482 on one platform I've worked on.

  • Rhialto (unregistered) in reply to Welbog
    Welbog:
    dpm:
    The expression "(unsigned char)(*arg)" is a bug. It should be "*((unsigned char *) arg)". Agreed?
    ...However, I agree that your suggestion is also correct, and would work even if C didn't guarantee sign-to-unsigned conversion as it does.
    I don't agree. The second version would change the type of the pointer, which is a very different thing.

    Paragraph 6.3.2.3 of the C99 standard describes conversions of pointers and such a signed to unsigned change is not defined.

  • Jimmy Jones (unregistered) in reply to Tony

    The real WTF is how many WTF readers fail on the coding WTFs.

  • (cs) in reply to Rhialto
    Rhialto:
    Welbog:
    dpm:
    The expression "(unsigned char)(*arg)" is a bug. It should be "*((unsigned char *) arg)". Agreed?
    ...However, I agree that your suggestion is also correct, and would work even if C didn't guarantee sign-to-unsigned conversion as it does.
    I don't agree. The second version would change the type of the pointer, which is a very different thing.

    Paragraph 6.3.2.3 of the C99 standard describes conversions of pointers and such a signed to unsigned change is not defined.

    Indeed, but in any two's complement implementation of C integer types (being the majority of environments), casting the pointer has the same end effect as casting the value. I'm saying that casting the value is the right way to do it, but casting the point will work on most platforms.

    Odds are if you're working on a nonstandard platform you wouldn't be writing code that assumes eight-bit chars in the first place, so I think it's a safe assumption that this code was designed to run on a standard platform, so I think it's a safe assumption that casting the pointer will work.

    I suppose I should have explained that in my original reply.

  • icebird (unregistered)

    I just hacked a benchmark program in C with a regular and a fsa based "isdigit" function. The results vary greatly from different machines.

    This was run on an Athlon64 :
    Number: 4327993
    fsa_isnumeric: 2.440000 sec
    my_isnumeric: 1.630000 sec
    
    Number: +3
    fsa_isnumeric: 0.910000 sec
    my_isnumeric: 0.630000 sec
    
    Number: hello
    fsa_isnumeric: 1.910000 sec
    my_isnumeric: 0.680000 sec
    
    Number:
    fsa_isnumeric: 0.500000 sec
    my_isnumeric: 0.500000 sec
    
    Number: 123457894302846359
    fsa_isnumeric: 6.890000 sec
    my_isnumeric: 3.910000 sec
    

    This was run on a pII (number of iterations scaled down) :

    Number: 4327993
    fsa_isnumeric: 1.210000 sec
    my_isnumeric: 2.570000 sec
    
    Number: +3
    fsa_isnumeric: 0.660000 sec
    my_isnumeric: 0.670000 sec
    
    Number: hello
    fsa_isnumeric: 0.740000 sec
    my_isnumeric: 0.610000 sec
    
    Number:
    fsa_isnumeric: 0.290000 sec
    my_isnumeric: 0.250000 sec
    
    Number: 123457894302846359
    fsa_isnumeric: 2.360000 sec
    my_isnumeric: 5.670000 sec
    

    They were both compiled with gcc -O. Not the same version tho. So I guess both win.

    Here is the source : http://pastebin.ca/939540

  • lost my password (unregistered) in reply to Stupidumb
    Stupidumb:
    magetoo:
    brazzy:
    Welbog:
    OJ:
    Even cooler would be generating the state machine from a regex and making the generator available somewhere.
    This is what happens when you use a compiled regular expression in any modern language. The regular expression is converted into a state machine and executed in a loop much like the one in the OP, albeit with a lot more features.
    [Captain Obvious]
    Well, it's not obvious to everyone.

    Isn't that the point of abstraction, though? By having the "regexp" abstraction, you don't have to learn about the underlying details, so you need only to learn that "regexps are good, use them if you can" (and the syntax...).

    I strongly doubt most people here who use them daily could tell how things work under the hood -- and that is a good thing.

    Your comment just shows your stupidity and ignorance. You don't deserve to be here. People who don't understand the inner-workings of regex are lazy and stupid and don't deserve to be here. I suggest they commit seppuku. If you decide you want to get into IT, you better get ready to program in machine code, motherfucker. You make me sick. Everything in computers comes naturally to me, and is painfully obvious. If it's not obvious to you, I pity you (but not really because I hate you). Now I have to go develop a new shader algorithm for the pI187x.3. You have no idea what that is, no surprise...

    "People who don't understand the inner-workings of regex are lazy and stupid and don't deserve to be here." That tone sounds somewhat presumptuous in my ears. As soon as you get used to actually look at current implementations [1] of regular expression engines instead of blindly reproducing trendy academic myths, it becomes obvious that regular expressions usually are not converted into state machines precisely because of the "lot more features" beyond the generative capacity of finite state automata.

    There was a discussion about regular expression algorithms on the python-ideas mailing list [2], as "Secret Labs' Regular Expression Engine" in Python [2] actually is not based on FSAs. J. Carlson felicitously commented about the "lot more features": "along with groups, named references, etc., which are a PITA for non-recursive engines". I once tried to implement match group extraction in a finite state transducer environment, and I badly experienced the "mathematically impossible, stupid!" nuance of the meaning of "PITA"...

    [1] OK, ignore my comment if you definitely refuse to work with anything less modern than Plan 9: http://swtch.com/~rsc/regexp/regexp1.html [2] http://svn.python.org/view/python/trunk/Lib/re.py [3] http://mail.python.org/pipermail/python-ideas/2007-April/000407.html

  • flux (unregistered) in reply to NeoMojo

    And here's a little picture (made with Graphviz):

    [image]

    I took the liberty to use regular expression syntax in the edges, and "others" to indicate, well, the obvious..

    This kind of code would go better with a DSL, such as with the Regel state machine compiler. Although the advantages, if any, compared to using a regular expression in that case are very small :).

  • Dave (unregistered) in reply to Stupidumb
    Stupidumb:
    Is everyone here an expert in all aspects of computers? I mean, it's like you guys know the exact details of every single submission. I forgot about FSAs and shit as soon as class ended.

    You're not a web designer by any chance, are you? To be honest, this site doesn't really require any input from HTML hair dressers...

  • Paolo G (unregistered)

    Obviously, there had to be a reason to avoid 'strtol', 'atoi'

    There's a good reason to avoid both of these: pass a string that cannot be interpreted as a number, as you get back 0, the same as if you pass "0" (or "0.0", or anything similar). You're better off using something like Boost's numeric_cast library instead.

    CAPTCHA: (numerus non) validus

  • DCRoss (unregistered)

    Sure it works but the real question is this: Can it handle Roman Numerals?

  • gallier2 (unregistered) in reply to Scottford

    Don't laugh, here is a real excerpt of a header of the library a coworker maintains. If the rest of the code wasn't so damn complicated, I could provide Alex with ammunition for years. They are full of WTF but would need a lot of background info to be seen as such.

    #ifndef _boolean
    #define _boolean
      typedef enum {true = 0, false } boolean;
    #endif
    

Leave a comment on “Finite State Arg”

Log In or post as a guest

Replying to comment #:

« Return to Article