Comment On Finite State Arg

"It's not every day that you come across a hand-coded, table based parser," writes Joel Davis. "That's pretty hardcore. I figured it must have been needed for checking if millions of strings were uints in some super-important inner-loop. Obviously, there had to be a reason to avoid 'strtol', 'atoi' or even 'isdigit'..." [expand full text]
« PrevPage 1 | Page 2 | Page 3 | Page 4Next »

Re: Finite State Arg

2008-03-11 08:02 • by D2oris
I kinda like it. It's funny.

Re: Finite State Arg

2008-03-11 08:11 • by PSWorx
Well, at least his cs degree is justified. He obviously paid attention during classes...

Re: Finite State Arg

2008-03-11 08:12 • by NiceWTF (unregistered)
The code itself looks like it works correctly though.

I'm trying to imagine what this programmer would come up with if you told him the next version needs to support Unicode.

Re: Finite State Arg

2008-03-11 08:15 • by PSWorx
The same solution of course. He might take take a bit longer and his file might become a bit larger, but he'll still be finished after finite time, so no need to change a working solution ;)

Re: Finite State Arg

2008-03-11 08:19 • by TGV
Did anyone see the comment above arg == NULL: If the string is empty, then it does not contain a float. Looks like someone has been copy-n-pasting.

I also like that he uses a "sink" state, instead of immediately quitting as soon as the prefix is illegal. But well, that's my obsession, I guess.

Re: Finite State Arg

2008-03-11 08:22 • by John Doe (unregistered)
The behaviour is different than with atoi.
atoi("123blah") would be 123, but this function would return 0, because of the non-numeric characters.

This code makes me cry; it reminds me too much of my own workplace where everybody is reinventing the wheel!

Re: Finite State Arg

2008-03-11 08:25 • by dpm
> const char *arg;
> .
> .
> .
> (unsigned char)(*arg)

Lovely. Truly his understanding of pointers --- and casting thereof --- is unsurpassed. On those platforms which support signed chars, what is the value of negative-fifty-four cast to be an unsigned char?

And bonus points: the only reason for casting is to serve as an index in an array. We cannot but admire this mind.

Re: Finite State Arg

2008-03-11 08:29 • by Unknown attacker (unregistered)
182661 in reply to 182659
dpm:
> const char *arg;
> .
> .
> .
> (unsigned char)(*arg)

Lovely. Truly his understanding of pointers --- and casting thereof --- is unsurpassed. On those platforms which support signed chars, what is the value of negative-fifty-four cast to be an unsigned char?


A crash.

Re: Finite State Arg

2008-03-11 08:31 • by Tony (unregistered)
182662 in reply to 182652
Except that state 2 and state 3 are identical, so he couldn't have been paying that much attention.

Re: Finite State Arg

2008-03-11 08:37 • by Vollhorst (unregistered)
182664 in reply to 182659
dpm:
On those platforms which support signed chars, what is the value of negative-fifty-four cast to be an unsigned char?
202, why?

Re: Finite State Arg

2008-03-11 08:37 • by Chuck (unregistered)
182665 in reply to 182662
Tony:
Except that state 2 and state 3 are identical, so he couldn't have been paying that much attention.


I thought that at first myself, but the difference is that if the code ends in state 2 it's not a valid integer (it's just "+")

Re: Finite State Arg

2008-03-11 08:38 • by topspin
I think it's quite a nice hack in theory.

More of a "WhyTF did he do this?"

Re: Finite State Arg

2008-03-11 08:38 • by zip
182667 in reply to 182661
Unknown attacker:
dpm:
> On those platforms which support signed chars, what is the value of negative-fifty-four cast to be an unsigned char?


A crash.


You sure about that?

I'm not about to do the twos-complement math but I'm pretty confident that whatever 8 bits represent -54 will make a perfectly good unsigned char.

Re: Finite State Arg

2008-03-11 08:46 • by Steve (unregistered)
I can't help wondering whether this was actually all hand-entered or generated by some finite state automation program from a specification for unsigned numbers.

Re: Finite State Arg

2008-03-11 08:48 • by topcat_arg
but you have to admit.. they used a finite automata to do that! I don't understand why they don't use a turing machine.

Re: Finite State Arg

2008-03-11 08:59 • by dpm
182671 in reply to 182667
zip:
I'm pretty confident that whatever 8 bits represent -54 will make a perfectly good unsigned char.

Casting a negative value to an unsigned type is "undefined", meaning the compiler can do anything it likes.

Why on earth do you think the concept meaningful?

Vollhorst:
202, why?

That's two. Anyone else going to make that mistake?

Re: Finite State Arg

2008-03-11 09:03 • by ruurd (unregistered)
is not finite state arg.
is finite state argh.

Re: Finite State Arg

2008-03-11 09:14 • by Sad Bug Killer
The Real WTF (tm) is that he initialized a 256-sized array writing 20 numbers per line thus leaving some space on the last line. That's ugly. Everyone knows 256 sized array should be initialized with a nice 16x16 block.

Re: Finite State Arg

2008-03-11 09:17 • by AVF (unregistered)
I'm sorry, but that rocks. It may not be the kind of code you want in production, but from a mathematical/CS point of view, it's just cool.

Re: Finite State Arg

2008-03-11 09:23 • by vt_mruhlin
If efficiency was his goal, shouldn't he have put
// If the string is empty, then it does not contain a float
if(arg == NULL) return false;

before allocating that big array?

I would have just made it a global variable.

Re: Finite State Arg

2008-03-11 09:25 • by Ikke (unregistered)
Can someone explain this code? I dont totally understand how it works.

Re: Finite State Arg

2008-03-11 09:25 • by Jens (unregistered)
He he,

Though I would perhaps question the wisdom in using a state table on a per-character basis as done here, I did once use a state table for checking the correctness of a php array before it was upload to a produciton environment.

Though it didn't deal with single characters but regular expression patterns, i.e what pattern could legally occur after a given previous pattern had been found.

Re: Finite State Arg

2008-03-11 09:32 • by dpm
182680 in reply to 182678
Ikke:
Can someone explain this code? I dont totally understand how it works.



static bool isArgUnsignedInt(const char *arg)
{
bool bIsUInt = false;
int nDigits = 0;
int nPlusSigns = 0;

if (arg != NULL)
{
while (*arg)
{
if (*arg == '+' && nDigits == 0 && nPlusSigns == 0)
{
++nPlusSigns;
}
else if (isdigit(*arg))
{
++nDigits;
bIsUInt = true;
}
else
{
bIsUInt = false;
break;
}
++arg;
}
}

return bIsUInt;
}

Re: Finite State Arg

2008-03-11 09:34 • by Beeblebrox (unregistered)
182682 in reply to 182677
vt_mruhlin:
I would have just made it a global variable.

That's not the right answer. There's a keyword I think you should meet. static, meet vt_mruhlin. vt_mruhlin, meet static. There, you've been introduced. Now it is your job to use static responsibly in your code. See the Wikipedia article for more details.

Re: Finite State Arg

2008-03-11 09:35 • by me (unregistered)
182683 in reply to 182671
dpm:
zip:
I'm pretty confident that whatever 8 bits represent -54 will make a perfectly good unsigned char.

Casting a negative value to an unsigned type is "undefined", meaning the compiler can do anything it likes.


Yeah, sure.

From the C++ standard, §4.7 Integral conversions:
(2) If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2n where n is the number of bits used to represent the unsigned type).
[Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation).]

Re: Finite State Arg

2008-03-11 09:36 • by Axel R. (unregistered)
States 2 and 3 are necessary, to catch a lonely "+" sign, or sequences of pluses ("++...+"). That said, too bad he did not exit the loop as soon as state 0 is entered.

IMO the approach certainly is not stupid, it's very fast and robust. The next best thing would have been to use a regex.

Re: Finite State Arg

2008-03-11 09:42 • by nat42 (unregistered)
Arrgh! The goggles they do nothing! Dear lordy, W-H-Y, was this code written to win a bet or something??!

Re: Finite State Arg

2008-03-11 09:43 • by Axel R. (unregistered)
182686 in reply to 182684
@dpm:

Science v.s. brute force.

Now try to detect a real, including in scientific notation.

;-)

Re: Finite State Arg

2008-03-11 09:44 • by s. (unregistered)
You know what, people?

Artificial Intelligence exists. It lives on the net, takes up mundane jobs like freelance programming to earn its server space and bandwidth, and tries to act and look like human but sometimes fails miserably.

Re: Finite State Arg

2008-03-11 09:47 • by gabba
I especially like how maintainable it is; for example, adapting this to support recognizing hex numbers would only involve hunting down and modifying, what, 36 constants?

Re: Finite State Arg

2008-03-11 09:47 • by zip
182690 in reply to 182671
dpm:

Casting a negative value to an unsigned type is "undefined", meaning the compiler can do anything it likes.


It *can* do anything it likes. But I'm talking about what it *will* do.

Re: Finite State Arg

2008-03-11 09:49 • by Confused (unregistered)
182691 in reply to 182653
NiceWTF:
The code itself looks like it works correctly though.


Maybe it's just me, but it doesn't seem to me like this code "works" (in the sense that it accurately accomplishes the task at hand). Unless I'm mistaken, since he never exits his loop when an incorrect character is found then any string that ends "correctly" would be considered an unsigned int, no?

wouldn't 'wtf123' return "true" ?

Re: Finite State Arg

2008-03-11 09:54 • by Claxon
182693 in reply to 182682
Beeblebrox:
vt_mruhlin:
I would have just made it a global variable.

That's not the right answer. There's a keyword I think you should meet. static, meet vt_mruhlin. vt_mruhlin, meet static. There, you've been introduced. Now it is your job to use static responsibly in your code.


And carefully, because as we know, if there's too much static you can get a shock as soon as you try to touch it!

Re: Finite State Arg

2008-03-11 09:57 • by JM (unregistered)
182695 in reply to 182671
dpm:
Casting a negative value to an unsigned type is "undefined", meaning the compiler can do anything it likes.

Nothing's as bad as an authoritative prick, except an authoritative prick who gets it wrong. You don't actually program in C, right? Admit it. You just heard that C has "undefined" things and assumed this was one of them.

From my copy of the (C99) standard, par. 6.3.1.3:

1. When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.

2. Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

There's nothing undefined about this. Now, what you *might* have said is that the result is implementation-defined, because the actual value depends on the size of a char. That's right, boys and girls, the C standard doesn't say a char has to be 8 bits. If you actually see an implementation where a char is not 8 bits, though, congratulations, you're probably working on embedded software, and good luck with that.

Re: Finite State Arg

2008-03-11 09:57 • by Random832
182696 in reply to 182661
Unknown attacker:
dpm:
> const char *arg;
> .
> .
> .
> (unsigned char)(*arg)

Lovely. Truly his understanding of pointers --- and casting thereof --- is unsurpassed. On those platforms which support signed chars, what is the value of negative-fifty-four cast to be an unsigned char?


A crash.


202, if characters are eight bits. 458 if it's nine bits. etc, etc, 4294967242 if it's 32 bits. And, before anyone says anything, yes this IS guaranteed on non-twos-complement systems - the negative-to-unsigned conversion is well-defined in C.

Re: Finite State Arg

2008-03-11 09:57 • by dpm
182697 in reply to 182684
Axel R.:
States 2 and 3 are necessary

I fail to see how, since they are identical.

to catch a lonely "+" sign, or sequences of pluses ("++...+").

No. Only the first character can be a plus sign. After that, only digits are allowed.

IMO the approach certainly is not stupid, it's very fast and robust.

What the hell is "robust"? It _is_ stupid; you don't need states at all when you have isdigit(), and you *certainly* don't need two identical arrays.

The next best thing would have been to use a regex.

Clearly you are ignorant of the Jamie Zawinski quotation.

Re: Finite State Arg

2008-03-11 09:58 • by Confused (unregistered)
182698 in reply to 182691
Confused:
NiceWTF:
The code itself looks like it works correctly though.


Maybe it's just me, but it doesn't seem to me like this code "works" (in the sense that it accurately accomplishes the task at hand). Unless I'm mistaken, since he never exits his loop when an incorrect character is found then any string that ends "correctly" would be considered an unsigned int, no?

wouldn't 'wtf123' return "true" ?


Nevermind... I of course missed the "genius" of his sink state.

Re: Finite State Arg

2008-03-11 09:59 • by dpm
182700 in reply to 182691
Confused:
wouldn't 'wtf123' return "true" ?

No. Once it enters "state 0", it never leaves.

Re: Finite State Arg

2008-03-11 10:00 • by Random832
182701 in reply to 182695
JM:

1. When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.



And if it's being converted to _Bool, it's _True if it's positive, _False if it's zero, and _File_Not_Found if it's negative, right?

Re: Finite State Arg

2008-03-11 10:01 • by OJ (unregistered)
182702 in reply to 182691
Confused:

Maybe it's just me, but it doesn't seem to me like this code "works" (in the sense that it accurately accomplishes the task at hand). Unless I'm mistaken, since he never exits his loop when an incorrect character is found then any string that ends "correctly" would be considered an unsigned int, no?

wouldn't 'wtf123' return "true" ?


No. The first 'w' would drop the state to 0. It is not possible to switch to any other states from 0.

I, for one, think this is actually quite cool. Even cooler would be generating the state machine from a regex and making the generator available somewhere.

Re: Finite State Arg

2008-03-11 10:05 • by nat42 (unregistered)
182705 in reply to 182680
dpm:
Ikke:
Can someone explain this code? I dont totally understand how it works.



static bool isArgUnsignedInt(const char *arg)
{
bool bIsUInt = false;
int nDigits = 0;
int nPlusSigns = 0;

if (arg != NULL)
{
while (*arg)
{
if (*arg == '+' && nDigits == 0 && nPlusSigns == 0)
{
++nPlusSigns;
}
else if (isdigit(*arg))
{
++nDigits;
bIsUInt = true;
}
else
{
bIsUInt = false;
break;
}
++arg;
}
}

return bIsUInt;
}



This is functionally equivalent I believe (above code handles +'s differently):

static bool isArgUnsignedInt(const char *arg)
{
if (!arg||!(*arg))
return false;
if (*arg=='+')
++arg;
if (!(*arg))
return false;
for (;!(*arg);++arg)
if ((*arg)<'0' || (*arg)>'9')
return false;
return true;
}

Re: Finite State Arg

2008-03-11 10:05 • by moltonel (unregistered)
182706 in reply to 182671
dpm:
zip:
I'm pretty confident that whatever 8 bits represent -54 will make a perfectly good unsigned char.

Casting a negative value to an unsigned type is "undefined", meaning the compiler can do anything it likes.


Intrigued by this comment, I checked the c99 standard, which states for signed-to-unsigned conversion :
the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

Which, in layman's terms, for like-sized chars/ints/longs, flips the most significant bit on/off (which in turn means "nothing to do" : the unsigned value's MSB is at the same position as the signed value's sign bit). It works as expected for Mr CodeSOD, and is a perfectly defined behavior.

Concerning the unsigned-to-signed conversion, the behaviour is indeed implmentation-defined (aka "undefined" as dpm pointed out). However, I can't think of any C compiler that wouldn't simply read the bits as a signed value : same code as for the signed-to-unsigned conversion (aka "nothing to do"), and for error-checking, a compile-time warning is usefull and certainly enough.

Re: Finite State Arg

2008-03-11 10:08 • by dpm
182707 in reply to 182705
nat42:
if (*arg=='+')

++arg;


Definitely better than mine. Thanks.

Re: Finite State Arg

2008-03-11 10:09 • by Welbog
182708 in reply to 182702
OJ:
Even cooler would be generating the state machine from a regex and making the generator available somewhere.
This is what happens when you use a compiled regular expression in any modern language. The regular expression is converted into a state machine and executed in a loop much like the one in the OP, albeit with a lot more features.

Re: Finite State Arg

2008-03-11 10:12 • by darkstar949
Looks to me like someone was bored.

Re: Finite State Arg

2008-03-11 10:13 • by nat42 (unregistered)
182713 in reply to 182707
dpm:
nat42:
if (*arg=='+')

++arg;


Definitely better than mine. Thanks.


Actually I'm feeling silly, because I just realised I misread your code, I'm sorry dpm

Re: Finite State Arg

2008-03-11 10:15 • by Welbog
182715 in reply to 182697
dpm:
Axel R.:
States 2 and 3 are necessary

I fail to see how, since they are identical.
Identical in their transitions, but only state 3 is the accepting state of the DFA. If the string ends in any state other than 3, it is not accepted. That's the difference between the two states. Sure, the arrays are the same, but the abstract states themselves are vastly different.

dpm:
What the hell is "robust"?
Robust, meaning the function does what it says it will do, does it very quickly and doesn't screw up on any inputs. It is completely bug-free code.

dpm:
The next best thing would have been to use a regex.
Clearly you are ignorant of the Jamie Zawinski quotation.
That quote is used by those ignorant (and often fearful) of regular expressions since they are often misapplied. Much like Mr. Zawinski was.

Re: Finite State Arg

2008-03-11 10:17 • by SurfMan
Another wtf is that the TDWTF homepage reports this article as having 1222 words....

Re: Finite State Arg

2008-03-11 10:20 • by Stupidumb
Is everyone here an expert in all aspects of computers? I mean, it's like you guys know the exact details of every single submission. I forgot about FSAs and shit as soon as class ended. Do you people do this for your jobs? My biggest concern at work is timing my lunch break to coincide with co-worker because he makes noises when he eats. I have to leave the room. And why does everyone in the office have to shuffle their feet when they walk? What the fuck! Lift your fucking legs.

Re: Finite State Arg

2008-03-11 10:27 • by relaxing (unregistered)
182719 in reply to 182683
Integral conversions are similarly defined in C89 -- see A6.1.
What is dpm going on about?
« PrevPage 1 | Page 2 | Page 3 | Page 4Next »

Add Comment