The Daily WTF: Curious Perversions in Information Technology

WhiskeyJack · 2008-03-11 Reply Admin

Brilliant, in a useless sort of way.

Kinda like the marble-powered adding machine that this guy built:

http://woodgears.ca/marbleadd/index.html

(In fact, I imagine a similar kathunka-kathunka noise as I picture this code executing on some great long string of digits...)

dpm · 2008-03-11 Reply Admin

Welbog:
dpm:
Axel R.:
States 2 and 3 are necessary
I fail to see how, since they are identical.
Identical in their transitions, but only state 3 is the accepting state of the DFA. If the string ends in any state other than 3, it is not accepted. That's the difference between the two states. Sure, the arrays are the same, but the abstract states themselves are vastly different.

Pfui. A single "bool" variable can replace the entire state #2, and should. It is unnecessary and overly complicating.

Welbog:
dpm:
What the hell is "robust"?
Robust, meaning the function does what it says it will do, does it very quickly and doesn't screw up on any inputs. It is completely bug-free code.

It is quite the opposite. How long does it take to look at that code and ensure that it operates correctly, versus the few lines of C code it can be done in? How stupid is it to hardcode the positions of the digits as ASCII values? Why waste the space (both in code and in runtime memory) with these unnecessary arrays? "Robust" is a good adjective for code you enjoy maintaining, and so does not apply here.

Welbog:
dpm:
The next best thing would have been to use a regex.
Clearly you are ignorant of the Jamie Zawinski quotation.
That quote is used by those ignorant (and often fearful) of regular expressions since they are often misapplied. Much like Mr. Zawinski was.

Mr. Zawinski was what? Ignorant? Fearful? Misapplied?

Jamie meant "use regular expressions only when they are truly needed". This function does not need a regex. It can accomplish what it does with a very few lines of simple C code --- no states, no large static arrays with hardcoded digit representation, and no questionable casting.

2008-03-11 Reply Admin

@dpm

I fail to see how, since they are identical

That's precisely my point. You fail to see how.

No. Only the first character can be a plus sign. After that, only digits are allowed.

That's precisely why states 2 AND 3 are needed (and to catch sequences of pluses at the start)

What the hell is "robust"? It is stupid; you don't need states at all when you have isdigit(), and you certainly don't need two identical arrays.

It's the most robust and failsafe approach to do this detection. Your code is a prime example of exactly NOT what to do (unless you were just trying to "translate" the FSM into procedural code). Try to exent it just a bit (n.nn[eE][+]nn) for example and you end-up with undebuggable spagetthi.

You say you don't need states at all but you code is full of states already, two counters and a bool! You are maintining state all over the place without even knowing it :-)

Now, again, try to detect a real number (float, double, possibly in scientific notation). I keep my stopwatch ready to benchmark your code against the modified FSM. Be sure to pay attention to all possible notations, and to catch all possible illegal forms.

dpm · 2008-03-11 Reply Admin

Axel R.:
You say you don't need states at all but you code is full of states already, two counters and a bool!

Very well. You don't need state tables at all to accomplish what this function does. Even if this is in some library-less environment, you can ignore the isdigit() macro, write

      if ('0' <= *arg && *arg <= '9')

and still achieve the same results with far less code that is far more readable.

Now, again, try to detect a real number (float, double, possibly in scientific notation).

Pass. I made no claim about my code being extensible to that degree. However, since you are so confident that it can be done so easily, feel free impress us, at which point I will admit your "robustness". Assuming of course that most of the readers can understand the state-driven code.

dpm · 2008-03-11 Reply Admin

Axel R.:
Your code is a prime example of exactly NOT what to do (unless you were just trying to "translate" the FSM into procedural code).

Of course that's what I was doing. What else would I do? What did you think it was what not to do?

Thanks to nat42's pointer, my code is reduced to

static bool isArgUnsignedInt(const char *arg)
{
    bool bIsUInt = false;

    if (arg != NULL)
    {
        if (*arg == '+') ++arg;
        while (*arg)
        {
            if (! ('0' <= *arg && *arg <= '9')) return false;
            bIsUInt = true;
            ++arg;
        }
    }

    return bIsUInt;
}

which, I submit, is much shorter, uses much less memory, is much more readable, and is actually faster than the original for most cases which fail.

Try to exent it just a bit (n.nn[eE][+]nn) for example and you end-up with undebuggable spagetthi.

Straw man. I'm analyzing the code as it is. Until you show that it is as extensible as you claim, I have nothing to prove.

brazzy · 2008-03-11 Reply Admin

Welbog:
OJ:
Even cooler would be generating the state machine from a regex and making the generator available somewhere.
This is what happens when you use a compiled regular expression in any modern language. The regular expression is converted into a state machine and executed in a loop much like the one in the OP, albeit with a lot more features.

[image]

Welbog · 2008-03-11 Reply Admin

dpm:
Pfui. A single "bool" variable can replace the entire state #2, and should. It is unnecessary and overly complicating.

That's fair enough, but once you use a boolean for that, you are now interspersing a DFA with normal C code, which would impact readability and understandability (which I will elaborate on below).

dpm:
How long does it take to look at that code and ensure that it operates correctly, versus the few lines of C code it can be done in?

Honestly, once I recognized that it was an implementation of a DFA, I understood how the code works and that it was exhaustive and correct nearly immediately. From my point of view, anyone with an understanding of DFAs would be able to understand this code just fine. The understanding of DFAs comes from a university education (mine, at least - intro to theory of computing was a second-year course which covered regular, context-free and recursive grammars and their respective automata).

dpm:
How stupid is it to hardcode the positions of the digits as ASCII values? Why waste the space (both in code and in runtime memory) with these unnecessary arrays?

These are both very good arguments against the code, and indeed why it is on The Daily WTF and why it deserves to be here. The code wastes space and cannot be used for anything other than ASCII.

I agree with you completely that this is foolish; however, I am also saying that the code itself is bug free. It was obviously designed to only work with ASCII, and with ASCII it works perfectly.

dpm:
"Robust" is a good adjective for code you enjoy maintaining, and so does not apply here.

The algorithm is robust. It's very easy to change to accept a different type of string. There is practically nothing to it. You just change a few constants and you're done. Extending it to Unicode would be a nightmare. I'm not suggesting you do that.

2008-03-11 Reply Admin

Stupidumb:
Is everyone here an expert in all aspects of computers? I mean, it's like you guys know the exact details of every single submission. I forgot about FSAs and shit as soon as class ended. Do you people do this for your jobs? My biggest concern at work is timing my lunch break to coincide with co-worker because he makes noises when he eats. I have to leave the room. And why does everyone in the office have to shuffle their feet when they walk? What the fuck! Lift your fucking legs.

...my biggest concern would be being dumb and lazy.

vt_mruhlin · 2008-03-11 Reply Admin

Can I use static the way my predecessor did?

mq.java:
public volatile static AtomicBoolean True = new AtomicBoolean(true); public volatile static AtomicBoolean False = new AtomicBoolean(false);

Use True and False to see if an MQ connector is in use... MQ.getActiveState() == True.

What to see if a File connector is in use?

File.java:
public volatile static AtomicBoolean FTrue = new AtomicBoolean(true); public volatile static AtomicBoolean FFalse = new AtomicBoolean(false);

File.getActiveState() == FTrue

See, they're static because the truth never changes. But Files and MQs totally have different ideas of what the truth is....

2008-03-11 Reply Admin

This is a really remarkable WTF. It's bug-free, extremely efficient, remembers its CS courses, and has no reason to exist. I'm very impressed.

The only WTF here is that it's a parser for cmdline args. If this actually existed in an optimized inner loop, I would stand up and applaud.

Stupidumb · 2008-03-11 Reply Admin

anon:
Stupidumb:
Is everyone here an expert in all aspects of computers? I mean, it's like you guys know the exact details of every single submission. I forgot about FSAs and shit as soon as class ended. Do you people do this for your jobs? My biggest concern at work is timing my lunch break to coincide with co-worker because he makes noises when he eats. I have to leave the room. And why does everyone in the office have to shuffle their feet when they walk? What the fuck! Lift your fucking legs.
...my biggest concern would be being dumb and lazy.

How did you know I was dumb?

2008-03-11 Reply Admin

Funny how everyone finds this sad, yet not a single code snippet given will be faster. I admit it's not pretty, but it's as fast as you can probably get it.

As for the cast, you cant exactly have negative indexes on an array. So seeing its 256 in size, any value below 0 (char is signed you know) will simply be converted to it's unsigned counterpart to point to valid and positive index in the array.

dpm · 2008-03-11 Reply Admin

Welbog:
Honestly, once I recognized that it was an implementation of a DFA, I understood how the code works and that it was exhaustive and correct nearly immediately.

You can only do that by counting where the '3's are in all three arrays, ensuring that there are the correct number of them and that they are in the correct locations. I politely disagree that such can be done "immediately".

Welbog:
From my point of view, anyone with an understanding of DFAs would be able to understand this code just fine.

Agreed, understanding it _absolutely_ is not being questioned. The amount of time necessary to understand it _relative_ to the function it serves, is. At the risk of making a point you seem to have conceded, it's simply overkill to use state tables here.

Welbog:
I agree with you completely that this is foolish; however, I am also saying that the code itself is bug free.

I stand corrected on my earlier claim about negative-cast-to-unsigned as being "undefined". However, it remains inherently wrong, and I have to reject your claim about this code being "bug free". If the first character of the passed string is 0xb4 and the second is 0x00, what is the result? What is the intended result?

The expression "(unsigned char)(arg)" is a bug. It should be "((unsigned char *) arg)". Agreed?

brazzy · 2008-03-11 Reply Admin

leppie:
Funny how everyone finds this sad, yet not a single code snippet given will be faster. I admit it's not pretty, but it's as fast as you can probably get it.

Sad how there are still people who believe you get brownie points for premature optimization.

As for the cast, you cant exactly have negative indexes on an array.

It's C code, and in C, you most certainly can have negative array indexes.

dpm · 2008-03-11 Reply Admin

leppie:
Funny how everyone finds this sad, yet not a single code snippet given will be faster. I admit it's not pretty, but it's as fast as you can probably get it.

Given that it continues examining bytes even after it is "sunk", I would disagree with that rather broad assessment.

leppie:
any value below 0 (char is signed you know)

Not always. On many platforms, "char" is unsigned, period.

leppie:
will simply be converted to it's unsigned counterpart to point to valid and positive index in the array.

Valid, yes. Positive, yes. Correct, no. Ten of those 128 negative values will end up pointing to '3' instead of '0', and the function would mistakenly return true instead of false.

2008-03-11 Reply Admin

[cote user=dpm] Casting a negative value to an unsigned type is "undefined", meaning the compiler can do anything it likes. [/cote]

Nope. A signed char with the value of -54 is represented as '11001010'. If this value is cast as a unsigned char, it's 202. The compiler will treat that variable as an unsigned, just because you told it to.

2008-03-11 Reply Admin

leppie:
Funny how everyone finds this sad, yet not a single code snippet given will be faster.

Well, the obvious optimization is to throw state 0 over board and just return with false. This increases best-case run time from O(n) to O(1).

Apart from that, one could simply write the state transitions with if-elses inside a switch statement, which would make the code much more readable.

If both would be the case, it still would be a hopelessly over-optimized and over-engineered solution to the problem.

2008-03-11 Reply Admin

I like how everybody is pummeling this guy for a working solution. Maybe this was the first time he did a state machine, and he just did it for fun, or to learn something. Once it's wrapped in a routine, who cares how it's implemented? If it's really just parsing program arguments, who cares if it exits once it's sure the string is invalid? It's not the most maintainable solution, I'll grant, but it's not impossible either. If I were maintaining it, I'd probably just replace it with... with...

magetoo · 2008-03-11 Reply Admin

brazzy:
Welbog:
OJ:
Even cooler would be generating the state machine from a regex and making the generator available somewhere.
This is what happens when you use a compiled regular expression in any modern language. The regular expression is converted into a state machine and executed in a loop much like the one in the OP, albeit with a lot more features.
[Captain Obvious]

Well, it's not obvious to everyone.

Isn't that the point of abstraction, though? By having the "regexp" abstraction, you don't have to learn about the underlying details, so you need only to learn that "regexps are good, use them if you can" (and the syntax...).

I strongly doubt most people here who use them daily could tell how things work under the hood -- and that is a good thing.

Stupidumb · 2008-03-11 Reply Admin

Dave:
I like how everybody is pummeling this guy for a working solution. Maybe this was the first time he did a state machine, and he just did it for fun, or to learn something. Once it's wrapped in a routine, who cares how it's implemented? If it's really just parsing program arguments, who cares if it exits once it's sure the string is invalid? It's not the most maintainable solution, I'll grant, but it's not impossible either. If I were maintaining it, I'd probably just replace it with... with...

They aren't really pummeling the submission's coder. The people here are just bashing each other...as usual.

2008-03-11 Reply Admin

dpm:
Ten of those 128 negative values will end up pointing to '3' instead of '0', and the function would mistakenly return true instead of false.

Eh? Which ten values "mistakenly return true instead of false"?

Whether we read them as signed or unsigned, there are only 256 possible (ascii) values to consider and the code here considers all of them correctly. It correctly parses every possible ascii string.

2008-03-11 Reply Admin

At least with UTF8 (the most common encoding), the same finite state machine would work without the slightest change.

ASCII is still ASCII in UTF8.

2008-03-11 Reply Admin

Any sufficiently good code will be seen as a WTF because people don't understand why it's good.

That's now Sandy's law.

Welbog · 2008-03-11 Reply Admin

dpm:
You can only do that by counting where the '3's are in all three arrays, ensuring that there are the correct number of them and that they are in the correct locations. I politely disagree that such can be done "immediately".

To each his own, I guess. I absolutely loved automata in university so I probably have a bias toward understanding them. To me, this is a readily understandable piece of code.

dpm:
At the risk of making a point you seem to have conceded, it's simply overkill to use state tables here.

You'll have no argument from me on this. I would only support code like this if it were a lot more complicated and/or absolutely needed to be as time-efficient as possible, and even then, only if it makes better use of memory by storing these arrays in a static context and recycling state 2 for 3 somehow.

dpm:
If the first character of the passed string is 0xb4 and the second is 0x00, what is the result? What is the intended result?

The character 0xb4 puts it into the sink state on the first iteration, and the 0x00 stops the loop from iterating further. The final test state == 3 returns false.

dpm:
The expression "(unsigned char)(*arg)" is a bug. It should be "*((unsigned char *) arg)". Agreed?

As it has been brought up, one can cast a signed integer type to an unsigned integer type and the result will be as if the signed type were read as unsigned (assuming two's compliment storage), and that this is guaranteed by C's specification. However, I agree that your suggestion is also correct, and would work even if C didn't guarantee sign-to-unsigned conversion as it does.

Stupidumb · 2008-03-11 Reply Admin

magetoo:
brazzy:
Welbog:
OJ:
Even cooler would be generating the state machine from a regex and making the generator available somewhere.
This is what happens when you use a compiled regular expression in any modern language. The regular expression is converted into a state machine and executed in a loop much like the one in the OP, albeit with a lot more features.
[Captain Obvious]
Well, it's not obvious to everyone.
Isn't that the point of abstraction, though? By having the "regexp" abstraction, you don't have to learn about the underlying details, so you need only to learn that "regexps are good, use them if you can" (and the syntax...).

I strongly doubt most people here who use them daily could tell how things work under the hood -- and that is a good thing.

Your comment just shows your stupidity and ignorance. You don't deserve to be here. People who don't understand the inner-workings of regex are lazy and stupid and don't deserve to be here. I suggest they commit seppuku. If you decide you want to get into IT, you better get ready to program in machine code, motherfucker. You make me sick. Everything in computers comes naturally to me, and is painfully obvious. If it's not obvious to you, I pity you (but not really because I hate you). Now I have to go develop a new shader algorithm for the pI187x.3. You have no idea what that is, no surprise...

SQB · 2008-03-11 Reply Admin

 /* Return:   Whether the string contains a unsigned integer.  */
 /* Author:   12345678911234567892                             */

His name has been redacted, but is apparently 20 characters long.

2008-03-11 Reply Admin

IMHO checking FSM code takes longer because there's alot more cases to check, made worse in this case because of the coder's choice of 20 elements across (16 would have aligned better with std ASCII tables/char maps). Off by one errors and other mishandled cases have got to be harder to spot inside the FSM (this code seems to have dodged these, but I've a new appreciation for "for" now).

As for this code being the holy-grail of speed, I don't buy it. Yes, it could be further optimised (eg remove the hidden multiply/shift required for the array indexing and force the stat into a single dimension array). But if it's only run once or twice I'm not sure the potential gain justify the startup cost of loading it's data into the L1 cache.

Just my thoughts

magetoo · 2008-03-11 Reply Admin

Stupidumb:
magetoo:
Well, it's not obvious to everyone.
Isn't that the point of abstraction, though? ...
Your comment just shows your stupidity and ignorance. You don't deserve to be here. People who don't understand the inner-workings of regex are lazy and stupid and don't deserve to be here. I suggest they commit seppuku. If you decide you want to get into IT, you better get ready to program in machine code, motherfucker. ...

Meh. Wrong site for that. Well, the "Side Bar" forum section perhaps...

2008-03-11 Reply Admin

The expression "(unsigned char)(*arg)" is a bug. It should be "*((unsigned char *) arg)". Agreed?

It's not a bug. It's the same as

const char c; c = *arg; (unsigned char)c

The only problem is if CHAR_BIT != 8 as others have pointed out. But your expression has then same problem, because the error is then in the size of the arrays.

brazzy · 2008-03-11 Reply Admin

magetoo:
brazzy:
Welbog:
OJ:
Even cooler would be generating the state machine from a regex and making the generator available somewhere.
This is what happens when you use a compiled regular expression in any modern language. The regular expression is converted into a state machine and executed in a loop much like the one in the OP, albeit with a lot more features.
[Captain Obvious]
Well, it's not obvious to everyone.

You're right, let me adjust the quip:

Is it a bird? Is it a plane? No, it's sarcasm flying right over Welbog's head!!

Sorry, don't a have a picture for that one.

2008-03-11 Reply Admin

Stupidumb:
People who don't understand the inner-workings of regex are lazy and stupid and don't deserve to be here. I suggest they commit seppuku. If you decide you want to get into IT, you better get ready to program in machine code, motherfucker. You make me sick. Everything in computers comes naturally to me, and is painfully obvious. If it's not obvious to you, I pity you (but not really because I hate you). Now I have to go develop a new shader algorithm for the pI187x.3. You have no idea what that is, no surprise...

Wow - new troll! It's like surreal swamp guy a.k.a "Why is everybody so clueless on importance of low level shader algorithms and if you don't know how mosfet works you don't even belong here" :) (You have no idea what mosfet is, no surprise...)

2008-03-11 Reply Admin

nat42:
...but I've a new appreciation for "for" now).

Geez, I mean "if" not "for", I gotta get more sleep

PerdidoPunk · 2008-03-11 Reply Admin

topcat_arg:
but you have to admit.. they used a finite automata to do that! I don't understand why they don't use a turing machine.

????

Turing machines, in practice, are FS machines...

Simetrical · 2008-03-11 Reply Admin

dpm:
zip:
I'm pretty confident that whatever 8 bits represent -54 will make a perfectly good unsigned char.
Casting a negative value to an unsigned type is "undefined", meaning the compiler can do anything it likes.
Why on earth do you think the concept meaningful?

Vollhorst:
202, why?
That's two. Anyone else going to make that mistake?

Okay, nice theory. Are you aware of any compilers that actually do something different from that in practice? I'm not a C programmer, so I have no idea in this case, but come on, pretending that "undefined behavior" is really never predictable is stupid sometimes.

Addendum (2008-03-11 12:59): Ignore this. I should RTFC's before responding.

2008-03-11 Reply Admin

Welbog:
dpm:
The expression "(unsigned char)(*arg)" is a bug. It should be "*((unsigned char *) arg)". Agreed?
As it has been brought up, one can cast a signed integer type to an unsigned integer type and the result will be as if the signed type were read as unsigned (assuming two's compliment storage), and that this is guaranteed by C's specification. However, I agree that your suggestion is also correct, and would work even if C didn't guarantee sign-to-unsigned conversion as it does.

Not without putting more work into defining "work".

"((unsigned char) arg" and "(unsigned char) (*arg)" are not equivalent. The former forces the compiler to interpret the signed value as an unsigned value (as in, having the exact same bits), which is guaranteed to compile, but not to have the value you'd expect it to have. You can only make it work if you know how signed values are stored on your platform (and adjust your tables accordingly). The second expression properly invokes a conversion, which has a well-defined result.

Yes, on a two's complement architecture where CHAR_BIT == 8 (that is, 99% of the world) it doesn't matter, but for that remaining 1% you might as well not introduce gratuitous incompatibility. (And of course the code still assumes that CHAR_BIT == 8, but fixing that isn't hard, assuming you're still only interested in, say, 256 different characters at most.) So no, the original code is not a bug, and the "fix" just makes things harder.

If you're not convinced by now: don't program in C, use one of the new, sexy O-O languages instead. The drawback of those languages is that they don't allow you to argue fine points like these, because they've all made more assumptions. The advantage of those languages is that they don't allow you to argue fine points like these, because they've all made more assumptions.

Zecc · 2008-03-11 Reply Admin

"Irish I were drunk" girl forever

2008-03-11 Reply Admin

Ikke:
Can someone explain this code? I dont totally understand how it works.

the function creates a 4x256 array of integers (the state array).

each point in the second dimension of the array is populated by either 0, 2 or 3.

The 0th array of integers is enrirely 0s, this is the failed state or sink state or whatever

the 1st array is the starting state.

The 2nd state is the state reach if the string passed in starts with a '+'.

The 3rd state is the one reached if the string is an unsigned integer.

once the array is populated the current state is set to 1 then the input string is checked to see if it is null.

A null string causes the function to exit and return false.

Then the input string is iterated through, one chracter at a time. The curent state is set to 2, 3 or 0 after each character comparison. This is done by retieving the value from the state array for the current state and character.

So if the string "12d" is passed in the the state will changes will go accordingly:

curState = 1 curState = fsa[1][49] = 3 curState = fsa[3][50] = 3 curState = fsa[3][100] = 0

This will cause the function to return false. 49, 50 and 100 are the ascii values for 1, 2 and d respectively.

Hope that helps.

Stupidumb · 2008-03-11 Reply Admin

anonymous coward:
Stupidumb:
People who don't understand the inner-workings of regex are lazy and stupid and don't deserve to be here...
Wow - new troll! It's like surreal swamp guy a.k.a "Why is everybody so clueless on importance of low level shader algorithms and if you don't know how mosfet works you don't even belong here" :) (You have no idea what mosfet is, no surprise...)

You say trollo, I say sarcasmo.

Ok ok, so you're now trying to debunk my claims of being a super shader algorithm genius by assuming I don't know what mosfet is? Man, I hate explaining jokes/sarcasm, it just ruins it... Dude, maybe you're joking too, but from the context of your comment (calling me out to be a troll) I doubt it.

Ok, I'll be honest, I have never used the pI187x.3. I couldn't even find it on wikipedia. I did, however, find what mosfet is: metal–oxide–semiconductor field-effect transistor. I assume that since you brought it up, you understand it completely and can simply will it into existence. You are better than me and in no way resemble the person I emulated in a sarcastic fashion. (If you were just joking, touche, my friend...touche...)

Stupidumb · 2008-03-11 Reply Admin

magetoo:
Stupidumb:
magetoo:
Well, it's not obvious to everyone.
Isn't that the point of abstraction, though? ...
Your comment just shows your stupidity and ignorance. You don't deserve to be here. People who don't understand the ... ...
Meh. Wrong site for that. Well, the "Side Bar" forum section perhaps...

Sigh. wrong site for that. Perhaps on Thursdays...

2008-03-11 Reply Admin

I program in C tictactoe so I don't really get this...but it looks cool

2008-03-11 Reply Admin

dpm:
IMO the approach certainly is not stupid, it's very fast and robust.
What the hell is "robust"? It _is_ stupid; you don't need states at all when you have isdigit(), and you *certainly* don't need two identical arrays.

Removing the duplicate array is an optimization. You aren't supposed to optimize unless you need to. Remember you don't want "premature optimizations."

2008-03-11 Reply Admin

More proof that there are not only noobs, but a whole variaty of them :D

2008-03-11 Reply Admin

dpm:
zip:
I'm pretty confident that whatever 8 bits represent -54 will make a perfectly good unsigned char.
Casting a negative value to an unsigned type is "undefined", meaning the compiler can do anything it likes.
Why on earth do you think the concept meaningful?

Vollhorst:
202, why?
That's two. Anyone else going to make that mistake?

Something like that indeed. Undefined behaviour is just that, the standard doesn't define its behaviour. Each compiler can document this to specify behaviour on its platform.

2008-03-11 Reply Admin

vt_mruhlin:
If efficiency was his goal, shouldn't he have put
// If the string is empty, then it does not contain a float if(arg == NULL) return false;
before allocating that big array?
I would have just made it a global variable.

DUH! He should have made it static anyway... oh boy, some people!

2008-03-11 Reply Admin

nat42:
IMHO checking FSM code takes longer because there's alot more cases to check, made worse in this case because of the coder's choice of 20 elements across (16 would have aligned better with std ASCII tables/char maps). Off by one errors and other mishandled cases have got to be harder to spot inside the FSM (this code seems to have dodged these, but I've a new appreciation for "for" now).
As for this code being the holy-grail of speed, I don't buy it. Yes, it could be further optimised (eg remove the hidden multiply/shift required for the array indexing and force the stat into a single dimension array). But if it's only run once or twice I'm not sure the potential gain justify the startup cost of loading it's data into the L1 cache.

Just my thoughts

Very good point IMHO.

2008-03-11 Reply Admin

iT_Ti:
dpm:
zip:
I'm pretty confident that whatever 8 bits represent -54 will make a perfectly good unsigned char.
Casting a negative value to an unsigned type is "undefined", meaning the compiler can do anything it likes.
Why on earth do you think the concept meaningful?

Vollhorst:
202, why?
That's two. Anyone else going to make that mistake?

Something like that indeed. Undefined behaviour is just that, the standard doesn't define its behaviour. Each compiler can document this to specify behaviour on its platform.

Hmm, it seems it is defined after all. Scratch my remark ;)

dextron · 2008-03-11 Reply Admin

So, assuming this code was used more than just a handful of times at program start, what would the best way to exit the loop

for(; *arg != '\0'; arg++)
    curState = fsa[curState][(unsigned char)(*arg)];

when curState==0?

Something like argcurStat!='\0' or just, *org|curStat

2008-03-11 Reply Admin

    // Evaluate the fsa for the entire string 
    while ( curState )
        curState = fsa[curState][(unsigned char)(*arg++)];

I'm so premature I can hardly optimize myself ;)

2008-03-11 Reply Admin

There was a slashdot article a while ago about efficient argv parsing. It suggested using a tool which had a file format reminiscent to lex/yacc (with %%-delimited sections of stuff to generate C++ code). Yes, someone has generalized and advocated for this ridiculousness.

http://it.slashdot.org/article.pl?sid=07/07/29/1453253

http://www.ibm.com/developerworks/linux/library/l-gperf.html?S_TACT=105AGX59&S_CMP=GR&ca=dgr-lnxw01GPERF

2008-03-11 Reply Admin

A time that was spent writing this code would be much better invested into reading a good book on software development.

Finite State Arg

Leave a comment on “Finite State Arg”