- Feature Articles
- CodeSOD
- Error'd
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
If the arrays had been declared "const", the compiler might be able to optimize them into one.
With regard to the people proposing solutions using if-statements, surely using switch statements would be better (although more tedious unless you're using a C compiler which supports ranges in cases). That way the compiler can decide the best way to compile it, it might even create the array/fsm version anyway.
Admin
Admin
Admin
Incidentally, if I'd been asked to do this, I would probably have used the same state machine, but without the arrays. My version would look something like:
(Replacing "curState = 0" with "return false" would speed it up slightly, but for maintainability I usually prefer to have only a single return at the end of a function. If this is just for error checking and errors are rare, then the speed advantage of multiple returns would be minimal and only worthwhile if speed optimization were particularly critical. The version with arrays might also be somewhat more efficient than mine if they had been declared as static, but again, for most purposes maintainability outweighs such a minor efficiency boost.)
Admin
Everyone keeps saying it's bug free. I'm not sure I agree. I mean it does what it claims, but what it claims is very likely not intended. You could pass it a 4 Gig string of digits, and it would properly claim that it's an integer. However, you would probably not be able to extract or use that value. So in practice, it validates something you probably don't want it to.
Admin
But seriously, I wanted to know if some of you guys (especially the ones more vocal about the current story's technical stuff) actually have to employ/deal with things like FSA's and stuff in your jobs. Or maybe it's just something you guys research in spare time. I honestly don't have a problem with either. Please don't freak out and attack my intelligence. (It's rude and pitiful). The reason I wanted to know is that I would like to get a job that is a little more challenging than my current one. I thought that being funny Wouldn't hurt.
Wow, you haven't realiZed that this is a website for computer programmers to submit WTFs? How did you end up here, anyway? Also, the comment section is a great place to exchange ideas and make jokes. Though some of you like using it to insult others. It's great if you know a little something, just don't brag about it or anything. Have some class.
Admin
Admin
That... is made of pure awesome!
I want coworkers who write that kind of code
Admin
It's a finite state machine, the tables contain lists of state values and information on state transitions.
This is the key bit:
The state information is kept in an "unsigned char" value, which is used as a the first index to a 2-dimensional array (state,char -> newstate). The algorithm loops through the characters in the string and sets the state according to the existing state and the character. There are tables for each state and each table contains 256 entries. The return value is whether the final state at the last character of the string was state 3. If you look at the state tables you see that if you are in states 1, 2 or 3 the next state is 3 if the character is a digit 0-9, whereas you'll go to the sink state 0 if the character is something else and you'll stay in state 0 until the end of the string.
Actually, this code has one too many states, probably indicating it was copied and modified from elsewhere. State 2 is not used. The overall result of the finite state machine logic is to produce true only if all of the characters in the string (except \0) are numerical digits and produce false otherwise. It's an elaborate way of doing this:
Admin
pretty outdated. i would use a decision tree forest.
Admin
Yes and no. 95% of the time I don't have to think/care about FSA's, but at least once a year I have to solve some problem where a well defined FSA (and not if/else/switch/for/break code) is the cleanest, although not necessarily only, solution.
Sure I could always do it with some jumble of procedural logic, but sometimes, just sometimes, it's the right thing to do.
Admin
Ah ok, i understand now. The index means the character value, and the actual value means the next state.
But i disagree that state two is never used. In state 1 there is one value that is 2.
Admin
Cool, thanks. I know, from what people are saying, that the post's code is pretty excessive I guess. I just wasn't sure if anyone needed in more practical situations. In school it seemed more of a theoretical thing and something you only need to worry about if you are making a compiler or something.
Admin
Ok, i see what you meant. It's the same as state 3
Admin
The wheel would be 'strtol', as stated in the article.
Something like:
if (arg == NULL || *arg =='\0' || arg=='-') return 0; char res; strtol(arg, &res, 10); return *res == '\0';
And anybody else notice the comment for this intchecker:
// If the string is empty, then it does not contain a float
Admin
Admin
Admin
there is no difference if you go from state 1 to state 2 or from state 1 to state 3.
if the first char is a + sign, then it goes to state 2, and then if the next char is a number, then it goes the state 3, where it stays as long it is a number.
Would the + sign make it to go to state 3, it still only accepts numbers after the + sign.
Admin
Hint: Check your solution for the input of "+".
Admin
Admin
No way! It's _False if it's positive, _True if it's zero.
Admin
This was the best post among all you morons.
Admin
apologies in advance for any technical errors -- i don't do C if i can help it, and i'm not interested enough to read the specs.
"the state-transition arrays should be static" i know where you're coming from, but wouldn't making them static mean they remain allocated for the life of the program? as opposed to declaring/initialising them within the method, where they'll be allocated on the stack/heap. (?)
suggesting state 2 is the same as state 3: you fail. state 2's transitions are the same as state 3's, yes, so there's the opportunity to save some memory there, if you're allowed to re-type the arrays...
In case that wasn't flame-bait enough: you may as well do away with C, just use assembly (wrapped up in macros if you prefer, and/or want it to be "portable" like C is) -- I'm talking in general here, not just for this argv-parsing FSA. As far as I can see, C doesn't really give you anything extra. I prefer readable code, and I'm not sure that's possible in C (except for trivial examples).
Admin
Bug fix for non-8-bit-char systems:
There, all better!
Admin
Tricky to do if you really want to allow all valid numbers up to MAXINT, since you can't just look at the length of its decimal representation.
The first thing that comes to mind (though probably not the most efficient) would be to keep a count of the digits (the state-3 bytes) as you're going through the state machine. At the end, if the state machine says it's good (i.e. state 3), compare that number of digits to the number of digits in the decimal representation of MAXINT. If it's either fewer or more digits, then your length check is done and you can return true or false, but if it's the same number of digits, you have to go through it again to compare them. At least it would place most of the secondary checking on only certain cases of long numbers.
Admin
I think the guy who wrote the code has read enough books. He probably wrote a tool many years ago that -given regular grammar- would generate automata in C. No need for any external libraries. The real WTF is... Well, you know...
Admin
Nothing like a small contest, uh?
Note the extreme simplicity of the code. Any bug in the tables can be fixed in a matter if minutes. Drawing the state transition diagram is left as an exercise to the reader :-)
Once the principle is understood, it’s child play to write state transition tables for anything, phone numbers, SSNs, whatever. The next step is to learn what finite state automatons can do for you in other areas of programming, for example in handling complex business processes.
Good luck with the if/then/else version ;-)
Cheers, Axel
Admin
for (; (state = fsa[state][*arg & 0xff]) > 0; arg++);
is better for non-ASCII input.Admin
would be fun to see the unicode version.
Admin
Also, Turing machines are a royal pain to define.
Admin
It depends, because there's equivocation over "integer".
There are cases where you may want to check that something is a mathematical integer, rather than a platform-dependent computing "integer" type.
Admin
Here you go:
What did you expect?
Cheers, Axel
Admin
One place finite state machines can be found is embedded systems programming. A couple of dated examples would be keeping track of the state of analog modems or implementing an ISDN protocol.
I guess your exposure to this kind of stuff depends on whether you are doing high- or low-level programming.
Admin
Admin
Sure enough, I forgot one possible notation: 8e-007
State 4 should become:
and
can be added to the tests.Admin
Be careful when you throw out challenges like that!
Admin
Um, this is my version, forgive me for it is truely awful. I wrote it [sorry I mean hacked it up from hell] to test the speed and I'm not surprised to report it is 41% slower when I tested (with your test data looped 100000 times, GCC 4.1.3 set to -O3). It handles the "f" character and whitespace differently to your code (as per the original code whitespace is a deal breaker for matching)
Sighs, Got I need to find work...
Admin
Um, this is my version, forgive me for it is truely awful. I wrote it [sorry I mean hacked it up from hell] to test the speed and I'm not surprised to report it is 41% slower when I tested (with your test data looped 100000 times, GCC 4.1.3 set to -O3). It handles the "f" character and whitespace differently to your code (as per the original code whitespace is a deal breaker for matching)
Sighs, God I need to find work...
Admin
Admin
You sure? It was 65482 on one platform I've worked on.
Admin
Paragraph 6.3.2.3 of the C99 standard describes conversions of pointers and such a signed to unsigned change is not defined.
Admin
The real WTF is how many WTF readers fail on the coding WTFs.
Admin
Odds are if you're working on a nonstandard platform you wouldn't be writing code that assumes eight-bit chars in the first place, so I think it's a safe assumption that this code was designed to run on a standard platform, so I think it's a safe assumption that casting the pointer will work.
I suppose I should have explained that in my original reply.
Admin
I just hacked a benchmark program in C with a regular and a fsa based "isdigit" function. The results vary greatly from different machines.
This was run on a pII (number of iterations scaled down) :
They were both compiled with gcc -O. Not the same version tho. So I guess both win.
Here is the source : http://pastebin.ca/939540
Admin
"People who don't understand the inner-workings of regex are lazy and stupid and don't deserve to be here." That tone sounds somewhat presumptuous in my ears. As soon as you get used to actually look at current implementations [1] of regular expression engines instead of blindly reproducing trendy academic myths, it becomes obvious that regular expressions usually are not converted into state machines precisely because of the "lot more features" beyond the generative capacity of finite state automata.
There was a discussion about regular expression algorithms on the python-ideas mailing list [2], as "Secret Labs' Regular Expression Engine" in Python [2] actually is not based on FSAs. J. Carlson felicitously commented about the "lot more features": "along with groups, named references, etc., which are a PITA for non-recursive engines". I once tried to implement match group extraction in a finite state transducer environment, and I badly experienced the "mathematically impossible, stupid!" nuance of the meaning of "PITA"...
[1] OK, ignore my comment if you definitely refuse to work with anything less modern than Plan 9: http://swtch.com/~rsc/regexp/regexp1.html [2] http://svn.python.org/view/python/trunk/Lib/re.py [3] http://mail.python.org/pipermail/python-ideas/2007-April/000407.html
Admin
And here's a little picture (made with Graphviz):
[image]I took the liberty to use regular expression syntax in the edges, and "others" to indicate, well, the obvious..
This kind of code would go better with a DSL, such as with the Regel state machine compiler. Although the advantages, if any, compared to using a regular expression in that case are very small :).
Admin
You're not a web designer by any chance, are you? To be honest, this site doesn't really require any input from HTML hair dressers...
Admin
Obviously, there had to be a reason to avoid 'strtol', 'atoi'
There's a good reason to avoid both of these: pass a string that cannot be interpreted as a number, as you get back 0, the same as if you pass "0" (or "0.0", or anything similar). You're better off using something like Boost's numeric_cast library instead.
CAPTCHA: (numerus non) validus
Admin
Sure it works but the real question is this: Can it handle Roman Numerals?
Admin
Don't laugh, here is a real excerpt of a header of the library a coworker maintains. If the rest of the code wasn't so damn complicated, I could provide Alex with ammunition for years. They are full of WTF but would need a lot of background info to be seen as such.