- Feature Articles
- CodeSOD
- Error'd
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
Brilliant, in a useless sort of way.
Kinda like the marble-powered adding machine that this guy built:
http://woodgears.ca/marbleadd/index.html
(In fact, I imagine a similar kathunka-kathunka noise as I picture this code executing on some great long string of digits...)
Admin
It is quite the opposite. How long does it take to look at that code and ensure that it operates correctly, versus the few lines of C code it can be done in? How stupid is it to hardcode the positions of the digits as ASCII values? Why waste the space (both in code and in runtime memory) with these unnecessary arrays? "Robust" is a good adjective for code you enjoy maintaining, and so does not apply here.
Mr. Zawinski was what? Ignorant? Fearful? Misapplied?
Jamie meant "use regular expressions only when they are truly needed". This function does not need a regex. It can accomplish what it does with a very few lines of simple C code --- no states, no large static arrays with hardcoded digit representation, and no questionable casting.
Admin
@dpm
That's precisely my point. You fail to see how.
That's precisely why states 2 AND 3 are needed (and to catch sequences of pluses at the start)
It's the most robust and failsafe approach to do this detection. Your code is a prime example of exactly NOT what to do (unless you were just trying to "translate" the FSM into procedural code). Try to exent it just a bit (n.nn[eE][+]nn) for example and you end-up with undebuggable spagetthi.
You say you don't need states at all but you code is full of states already, two counters and a bool! You are maintining state all over the place without even knowing it :-)
Now, again, try to detect a real number (float, double, possibly in scientific notation). I keep my stopwatch ready to benchmark your code against the modified FSM. Be sure to pay attention to all possible notations, and to catch all possible illegal forms.
Admin
Very well. You don't need state tables at all to accomplish what this function does. Even if this is in some library-less environment, you can ignore the isdigit() macro, write
and still achieve the same results with far less code that is far more readable.
Pass. I made no claim about my code being extensible to that degree. However, since you are so confident that it can be done so easily, feel free impress us, at which point I will admit your "robustness". Assuming of course that most of the readers can understand the state-driven code.Admin
Of course that's what I was doing. What else would I do? What did you think it was what not to do?
Thanks to nat42's pointer, my code is reduced to
which, I submit, is much shorter, uses much less memory, is much more readable, and is actually faster than the original for most cases which fail. Straw man. I'm analyzing the code as it is. Until you show that it is as extensible as you claim, I have nothing to prove.Admin
Admin
I agree with you completely that this is foolish; however, I am also saying that the code itself is bug free. It was obviously designed to only work with ASCII, and with ASCII it works perfectly.
The algorithm is robust. It's very easy to change to accept a different type of string. There is practically nothing to it. You just change a few constants and you're done. Extending it to Unicode would be a nightmare. I'm not suggesting you do that.Admin
Admin
Can I use static the way my predecessor did?
Use True and False to see if an MQ connector is in use... MQ.getActiveState() == True.
What to see if a File connector is in use?
File.getActiveState() == FTrue
See, they're static because the truth never changes. But Files and MQs totally have different ideas of what the truth is....
Admin
This is a really remarkable WTF. It's bug-free, extremely efficient, remembers its CS courses, and has no reason to exist. I'm very impressed.
The only WTF here is that it's a parser for cmdline args. If this actually existed in an optimized inner loop, I would stand up and applaud.
Admin
How did you know I was dumb?
Admin
Funny how everyone finds this sad, yet not a single code snippet given will be faster. I admit it's not pretty, but it's as fast as you can probably get it.
As for the cast, you cant exactly have negative indexes on an array. So seeing its 256 in size, any value below 0 (char is signed you know) will simply be converted to it's unsigned counterpart to point to valid and positive index in the array.
Admin
The expression "(unsigned char)(arg)" is a bug. It should be "((unsigned char *) arg)". Agreed?
Admin
Admin
Admin
[cote user=dpm] Casting a negative value to an unsigned type is "undefined", meaning the compiler can do anything it likes. [/cote]
Nope. A signed char with the value of -54 is represented as '11001010'. If this value is cast as a unsigned char, it's 202. The compiler will treat that variable as an unsigned, just because you told it to.
Admin
Well, the obvious optimization is to throw state 0 over board and just return with false. This increases best-case run time from O(n) to O(1).
Apart from that, one could simply write the state transitions with if-elses inside a switch statement, which would make the code much more readable.
If both would be the case, it still would be a hopelessly over-optimized and over-engineered solution to the problem.
Admin
I like how everybody is pummeling this guy for a working solution. Maybe this was the first time he did a state machine, and he just did it for fun, or to learn something. Once it's wrapped in a routine, who cares how it's implemented? If it's really just parsing program arguments, who cares if it exits once it's sure the string is invalid? It's not the most maintainable solution, I'll grant, but it's not impossible either. If I were maintaining it, I'd probably just replace it with... with...
Admin
Isn't that the point of abstraction, though? By having the "regexp" abstraction, you don't have to learn about the underlying details, so you need only to learn that "regexps are good, use them if you can" (and the syntax...).
I strongly doubt most people here who use them daily could tell how things work under the hood -- and that is a good thing.
Admin
Admin
Eh? Which ten values "mistakenly return true instead of false"?
Whether we read them as signed or unsigned, there are only 256 possible (ascii) values to consider and the code here considers all of them correctly. It correctly parses every possible ascii string.
Admin
At least with UTF8 (the most common encoding), the same finite state machine would work without the slightest change.
ASCII is still ASCII in UTF8.
Admin
Any sufficiently good code will be seen as a WTF because people don't understand why it's good.
That's now Sandy's law.
Admin
Admin
Admin
His name has been redacted, but is apparently 20 characters long.
Admin
IMHO checking FSM code takes longer because there's alot more cases to check, made worse in this case because of the coder's choice of 20 elements across (16 would have aligned better with std ASCII tables/char maps). Off by one errors and other mishandled cases have got to be harder to spot inside the FSM (this code seems to have dodged these, but I've a new appreciation for "for" now).
As for this code being the holy-grail of speed, I don't buy it. Yes, it could be further optimised (eg remove the hidden multiply/shift required for the array indexing and force the stat into a single dimension array). But if it's only run once or twice I'm not sure the potential gain justify the startup cost of loading it's data into the L1 cache.
Just my thoughts
Admin
Admin
It's not a bug. It's the same as
const char c; c = *arg; (unsigned char)c
The only problem is if CHAR_BIT != 8 as others have pointed out. But your expression has then same problem, because the error is then in the size of the arrays.
Admin
Is it a bird? Is it a plane? No, it's sarcasm flying right over Welbog's head!!
Sorry, don't a have a picture for that one.
Admin
Admin
Admin
????
Turing machines, in practice, are FS machines...
Admin
Addendum (2008-03-11 12:59): Ignore this. I should RTFC's before responding.
Admin
"((unsigned char) arg" and "(unsigned char) (*arg)" are not equivalent. The former forces the compiler to interpret the signed value as an unsigned value (as in, having the exact same bits), which is guaranteed to compile, but not to have the value you'd expect it to have. You can only make it work if you know how signed values are stored on your platform (and adjust your tables accordingly). The second expression properly invokes a conversion, which has a well-defined result.
Yes, on a two's complement architecture where CHAR_BIT == 8 (that is, 99% of the world) it doesn't matter, but for that remaining 1% you might as well not introduce gratuitous incompatibility. (And of course the code still assumes that CHAR_BIT == 8, but fixing that isn't hard, assuming you're still only interested in, say, 256 different characters at most.) So no, the original code is not a bug, and the "fix" just makes things harder.
If you're not convinced by now: don't program in C, use one of the new, sexy O-O languages instead. The drawback of those languages is that they don't allow you to argue fine points like these, because they've all made more assumptions. The advantage of those languages is that they don't allow you to argue fine points like these, because they've all made more assumptions.
Admin
"Irish I were drunk" girl forever
Admin
the function creates a 4x256 array of integers (the state array).
each point in the second dimension of the array is populated by either 0, 2 or 3.
The 0th array of integers is enrirely 0s, this is the failed state or sink state or whatever
the 1st array is the starting state.
The 2nd state is the state reach if the string passed in starts with a '+'.
The 3rd state is the one reached if the string is an unsigned integer.
once the array is populated the current state is set to 1 then the input string is checked to see if it is null.
A null string causes the function to exit and return false.
Then the input string is iterated through, one chracter at a time. The curent state is set to 2, 3 or 0 after each character comparison. This is done by retieving the value from the state array for the current state and character.
So if the string "12d" is passed in the the state will changes will go accordingly:
curState = 1 curState = fsa[1][49] = 3 curState = fsa[3][50] = 3 curState = fsa[3][100] = 0
This will cause the function to return false. 49, 50 and 100 are the ascii values for 1, 2 and d respectively.
Hope that helps.
Admin
You say trollo, I say sarcasmo.
Ok ok, so you're now trying to debunk my claims of being a super shader algorithm genius by assuming I don't know what mosfet is? Man, I hate explaining jokes/sarcasm, it just ruins it... Dude, maybe you're joking too, but from the context of your comment (calling me out to be a troll) I doubt it.
Ok, I'll be honest, I have never used the pI187x.3. I couldn't even find it on wikipedia. I did, however, find what mosfet is: metal–oxide–semiconductor field-effect transistor. I assume that since you brought it up, you understand it completely and can simply will it into existence. You are better than me and in no way resemble the person I emulated in a sarcastic fashion. (If you were just joking, touche, my friend...touche...)
Admin
Admin
I program in C tictactoe so I don't really get this...but it looks cool
Admin
Removing the duplicate array is an optimization. You aren't supposed to optimize unless you need to. Remember you don't want "premature optimizations."
Admin
More proof that there are not only noobs, but a whole variaty of them :D
Admin
Something like that indeed. Undefined behaviour is just that, the standard doesn't define its behaviour. Each compiler can document this to specify behaviour on its platform.
Admin
DUH! He should have made it static anyway... oh boy, some people!
Admin
Very good point IMHO.
Admin
Hmm, it seems it is defined after all. Scratch my remark ;)
Admin
So, assuming this code was used more than just a handful of times at program start, what would the best way to exit the loop
when curState==0?
Something like argcurStat!='\0' or just, *org|curStat
Admin
I'm so premature I can hardly optimize myself ;)
Admin
There was a slashdot article a while ago about efficient argv parsing. It suggested using a tool which had a file format reminiscent to lex/yacc (with %%-delimited sections of stuff to generate C++ code). Yes, someone has generalized and advocated for this ridiculousness.
http://it.slashdot.org/article.pl?sid=07/07/29/1453253
http://www.ibm.com/developerworks/linux/library/l-gperf.html?S_TACT=105AGX59&S_CMP=GR&ca=dgr-lnxw01GPERF
Admin
A time that was spent writing this code would be much better invested into reading a good book on software development.