• Bim Job (unregistered) in reply to MG
    MG:
    gallier2:
    void remove_dblanks(char *s)
    {
      char *p = s;
      int   ch = 0;

      if (s) {
        do {
          ch = *p++;
          if (ch == ' ')
            while (*p == ' ') p++;
          *s++ = ch;
        } while (ch);
      }
    }

    Why throw in the extra copy?

    void remove_dblanks(char *s) {
        char *p = s++;
        while (*p) {
            if (*p != ' ' || *s != ' ')
                *++p = *s;
            s++;
        }
    }
    
    Why not just write a correct version in C#, or possibly Delphi?

    I suppose you could just SWiG it from C. Good luck on getting that one past an idiot boss such as the OP has. In fact, the idiot boss might actually be right, for once.

  • Matthias (unregistered) in reply to Mcoder

    It is possible to write recursive regexes in C++ with boost.xpressive.

  • Bim Job (unregistered) in reply to MG
    MG:
    MG:
    void remove_dblanks(char *s) {
        char *p = s++;
        while (*p) {
            if (*p != ' ' || *s != ' ')
                *++p = *s;
            s++;
        }
    }
    

    Assuming a C99 compiler, this is the most compact I can make it.

    void remove_dblanks3(char *s) {
        for (char *p = s++; *p; s++) (*p != ' ' || *s != ' ') && (*++p = *s);
    }
    
    Or you could just do it in C#.

    It's not like anybody would notice you getting the language completely wrong.

  • MG (unregistered) in reply to Bim Job
    Bim Job:
    Or you could just do it in C#.

    It's not like anybody would notice you getting the language completely wrong.

    You are assuming that my comments were responses to the article. They were not, they were responses to the first guy who did it in C. It wouldn't make sense for me to criticize someone's C code and say "you could have done it better in language X," would it? (Ignoring the fact that the original poster did exactly that.)

    I am told that C# supports pointers to some extent. Probably it wouldn't take much massaging to get my little snippet into C#. I wouldn't know. I'm a lowly UNIX sysadmin, not a super-genius programmer. That means I only know C, Perl, shell, etc. So I am told, anyway.

  • Anonymous Coward (unregistered)

    Why ask for two spaces or more? Replacing one space with one space is perfectly acceptable, and (without having done any benchmarking) I'd expect a simpler regex to work faster.

    regexReplace(itemDesc, " +", " ");

    strikes me as more sensible.

  • Bim Job (unregistered) in reply to MG
    MG:
    Bim Job:
    Or you could just do it in C#.

    It's not like anybody would notice you getting the language completely wrong.

    You are assuming that my comments were responses to the article. They were not, they were responses to the first guy who did it in C. It wouldn't make sense for me to criticize someone's C code and say "you could have done it better in language X," would it? (Ignoring the fact that the original poster did exactly that.)

    I am told that C# supports pointers to some extent. Probably it wouldn't take much massaging to get my little snippet into C#. I wouldn't know. I'm a lowly UNIX sysadmin, not a super-genius programmer. That means I only know C, Perl, shell, etc. So I am told, anyway.

    You are sadly misinformed. Leave it to the fucking programmers. Your job is to be a lowly UNIX sysadmin. It doesn't take a super-genius programmer to recognise that the OP is written in C# (or possibly Delphi).

    When somebody told you that C# supports pointers to some extent, did you ask them why? And when you should use them?

    Thought not. I'm a C and C++ programmer, myself, and I'm damned if I'm going to descend to pointer semantics like "unsafe" if I program in C#.

    Best left to idiot SysAdmins who have support from their idiot PHBs. And well worthy of the next WTF.

  • foxyshadis (unregistered) in reply to Bim Job
    Bim Job:
    Or you can just be a dummy, like me, and ask "Why would anybody need back-references? Even Jamie Zawinski could only deal with two problems at a time."
    Backreferences make a large class of very complex regular expressions much simpler. That's generally worth the potentially unbounded worst case, unless your brain is retrofitted with a full regex parser.

    So what's your problem anyway? Asperger's?

  • (cs) in reply to Bim Job
    Bim Job:
    Just as a theoretical follow-up (and you'll be able to make more sense of it than I can), I've found the Google cache for Russ Cox article I was searching for. It's essentially a defence of DFAs over NDFAs.
    Granted, I only skimmed the article, but my impression is that's not a very accurate description of it. For instance, in his graph, the black "good" lines are based off of a DFA, but the think blue line is an NFA.

    From at least that measure, I get that NFA vs DFA isn't a big deal. What is a big deal is either FA-based approach vs. what many real regex libraries (Perl, PCRE, Python, and Ruby) do, which is a third option -- recursive backtracking.

    And the point of the article is that recursive backtracking algorithms have extraordinarily poor worst-case behavior, as compared to the Thompson FA construction (either DFA or NFA).

    Obviously the recursive backtracking construction is usually OK, or it wouldn't be so prevalent. But it's harder to characterize the performance of such an engine than one based on an FA, and ukslim's original comment, "When you write a regex, you're programming a finite state machine", is flat-out wrong for most regex libraries.
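
    To make the worst-case point concrete, here is a minimal sketch of a backtracking matcher, not any real library's engine. It only understands a toy syntax (literal characters, each optionally followed by '?'), and bt_match and pathological_calls are names invented for this illustration. It is fed the pattern family a?a?...a?aa...a against a string of n 'a's, which, as far as I recall, is the family Cox's article uses.

```c
#include <stddef.h>

/* Counts recursive calls so the blow-up is visible. */
static unsigned long calls;

/* Anchored backtracking match for a toy syntax: literal characters,
   each optionally followed by '?'. Not a real regex engine. */
static int bt_match(const char *re, const char *text)
{
    calls++;
    if (*re == '\0')
        return *text == '\0';
    if (re[1] == '?') {
        /* Greedy: try consuming the character first, then try skipping it. */
        if (*text == re[0] && bt_match(re + 2, text + 1))
            return 1;
        return bt_match(re + 2, text);
    }
    if (*text != '\0' && *text == re[0])
        return bt_match(re + 1, text + 1);
    return 0;
}

/* Builds n copies of "a?" followed by n copies of "a", matches it against
   n copies of 'a', and returns how many calls the matcher needed.
   n is limited to 32 because of the fixed buffers (an arbitrary choice). */
unsigned long pathological_calls(int n, int *matched)
{
    char re[3 * 32 + 1], text[32 + 1];
    int i, k = 0;

    *matched = 0;
    if (n < 0 || n > 32)
        return 0;
    for (i = 0; i < n; i++) { re[k++] = 'a'; re[k++] = '?'; }
    for (i = 0; i < n; i++) re[k++] = 'a';
    re[k] = '\0';
    for (i = 0; i < n; i++) text[i] = 'a';
    text[n] = '\0';

    calls = 0;
    *matched = bt_match(re, text);
    return calls;
}
```

    The match always succeeds, but only after the greedy branches have been tried and abandoned, so the call count roughly doubles each time n grows by one. An automaton-based matcher tracks a set of states per input character instead and does not have this behaviour.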

  • Bim Job (unregistered) in reply to foxyshadis
    foxyshadis:
    Bim Job:
    Or you can just be a dummy, like me, and ask "Why would anybody need back-references? Even Jamie Zawinski could only deal with two problems at a time."
    Backreferences make a large class of very complex regular expressions much simpler. That's generally worth the potentially unbounded worst case, unless your brain is retrofitted with a full regex parser.

    So what's your problem anyway? Asperger's?

    Could be. <slaps side of head. Ooh, fun!>

    Got an example of "a large class of very complex regular expressions <that back-references make> much simpler?"

    Or are you just bull-shitting?

    That's the great thing about blogs. They appeal to the worst in all of us.

    You'll note (or perhaps you're too stupid to note) that I didn't specifically object to using PCRE and back-tracking where they make "a large class of very complex regular expressions much simpler."

    I'm merely offering links to other possible solutions.

    Can I help you with your problem?

  • Bim Job (unregistered) in reply to EvanED
    EvanED:
    Bim Job:
    Just as a theoretical follow-up (and you'll be able to make more sense of it than I can), I've found the Google cache for Russ Cox article I was searching for. It's essentially a defence of DFAs over NDFAs.
    Granted, I only skimmed the article, but my impression is that's not a very accurate description of it. For instance, in his graph, the black "good" lines are based off of a DFA, but the think blue line is an NFA.

    From at least that measure, I get that NFA vs DFA isn't a big deal. What is a big deal is either FA-based approach vs. what many real regex libraries (Perl, PCRE, Python, and Ruby) do, which is a third option -- recursive backtracking.

    And the point of the article is that recursive backtracking algorithms have extraordinarily poor worst-case behavior, as compared to the Thompson FA construction (either DFA or NFA).

    Obviously the recursive backtracking construction is usually OK, or it wouldn't be so prevalent. But it's harder to characterize the performance of such an engine than one based on an FA, and ukslim's original comment, "When you write a regex, you're programming a finite state machine", is flat-out wrong for most regex libraries.

    Thanks for reading it, anyway. The derivations of the OP are a particularly silly way of examining a Finite State Automaton (there! I've said it!), and I think that ukslim's comment was, shall we say, under-prepared.

    I'm yet to be convinced that a non-Thompson FSA is the way to go. In thirty years of computers, I've come across too many cases where "you don't need to worry about that exponential thang, honey..." right up to the point where you do.

    I was just trying to add a few relevant references. They won't help many people. You've parsed stuff (I parsed, you parse, they fuck up). Just hoping it would give you a few good leads ... that's all.

    PS I'll go back and look at the thin blue line. Good to have an intelligent response, for once.

  • ZOMG (unregistered)

    So... did the guy fix it or did he just quit because there is (gasp) bad code in the company? Even Microsoft has shit code for chrissakes.

    If he quit because of that then I think his expectations are too high. And the phrase "many past jobs" would indicate that he is either a very old guy or just some arrogant prick who loves jumping ship every few months.

  • (cs)

    What always amazes me is that people producing this kind of code never look at it and think, "Wow, that really looks weird. Can this really be the best approach?"

  • Mike (unregistered) in reply to dkf
    dkf:
    Voice of Reason:
    There's far too much "clever" in these comments...
    So you decided to compensate?

    Great, I just spit sunflower seeds all over the carpet.

  • gomer (unregistered) in reply to ZOMG
    ZOMG:
    Especially Microsoft has shit code for chrissakes.

    FTFY

  • Da' man (unregistered) in reply to Patrick
    Patrick:
    Just another reason why RegEx Rules!

    /ha4@\s+((d)|([t+]h))[3ea4@]\s+p[1l][a4@]n[3e][t+]/i

    Guys!

    Three pages of RegEx talk, and no one thought of posting this link?

    http://ars.userfriendly.org/cartoons/?id=20070628

    Where is this all going to end? Did you guys give up reading other sites completely?

    Shame on you!

    ;-)

  • (cs) in reply to Voice of Reason
    Voice of Reason:
    you certainly can't fault code that's clearer and _may not_ be any slower.
    It's not "clearer". It uses three lines where one would do, and it requires people to think about loop logic instead of reading a single simple command.

    There are many scenarios where regular expressions are unnecessarily complicated and unreadable. But this is not one of them. This is exactly the kind of situation where a simple regex is appropriate.

  • SR (unregistered) in reply to Anonymous Coward
    Anonymous Coward:
    Why ask for two spaces or more? Replacing one space with one space is perfectly acceptable, and (without having done any benchmarking) I'd expect a simpler regex to work faster.

    regexReplace(itemDesc, " +", " ");

    strikes me as more sensible.

    I reckon the " {2,}" version might be a tiny bit faster. That said, I'd go with yours (" +") for readability unless it proved to be a bottleneck.

    Somewhere in the comments is a really nice one: "[ ]+", which is effectively the same expression as " +" but even more readable.

    My favourite solution is this one:

    Jonathan Collins:
    Also, I kind of like "[ ]+" better than " +", if only for clarity.

    Effectively exactly the same expression, but so readable. One I'll be using in future.
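
    For anyone who wants to check that the three spellings in this thread (" +", " {2,}" and "[ ]+") give the same result, here is a rough sketch using POSIX regcomp/regexec from <regex.h>. regex_replace_all is a hypothetical helper written for this illustration; it is not the regexReplace from the article, whose implementation isn't shown, and it says nothing about which pattern is faster.

```c
#include <regex.h>
#include <string.h>

/* Replace every match of the POSIX extended regex `pattern` in `in` with
   `repl`, writing the result to `out` (capacity `outsz`).
   Returns 0 on success, -1 on a bad pattern or a too-small buffer.
   Assumes the pattern never matches the empty string. */
int regex_replace_all(const char *pattern, const char *in,
                      const char *repl, char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m;
    size_t used = 0, rest, rlen = strlen(repl);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    while (regexec(&re, in, 1, &m, 0) == 0 && m.rm_eo > m.rm_so) {
        size_t before = (size_t)m.rm_so;   /* text in front of the match */
        if (used + before + rlen >= outsz) {
            regfree(&re);
            return -1;
        }
        memcpy(out + used, in, before);
        used += before;
        memcpy(out + used, repl, rlen);
        used += rlen;
        in += m.rm_eo;                     /* continue after the match */
    }

    rest = strlen(in);                     /* unmatched tail */
    if (used + rest >= outsz) {
        regfree(&re);
        return -1;
    }
    memcpy(out + used, in, rest + 1);      /* copies the NUL as well */
    regfree(&re);
    return 0;
}
```

    Calling it with each of the three patterns and a single space as the replacement should produce identical output; the only difference is that " +" and "[ ]+" also "replace" lone spaces with themselves.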

  • (cs)

    reads thread

    sighs

                  TDWTF
      _______/  |---^---|  \_______
    
  • oheso (unregistered) in reply to ZOMG
    ZOMG:
    Even Microsoft has shit code for chrissakes.

    Say no! gasp!

  • Henkie (unregistered) in reply to Bim Job

    It doesn't take a super-genius programmer to recognise that the OP is written in C# (or possibly Delphi).

    Delphi uses := as an assignment operator

  • Paul A. Bean (unregistered) in reply to eViLegion
    eViLegion:
    MD:
    Boring!

    And yet not quite boring enough for you to post a comment, thereby suggesting that your life must be even more so.

    catch(DullAndPredicatableStockResponseException yawn)

  • mh (unregistered) in reply to Iago
    Iago:
    It's not "clearer". It uses three lines where one would do, and it requires people to think about loop logic instead of reading a single simple command.
    Pardon me, but if someone is having issues with the clarity of a 3 line loop I would suggest that they might be in the wrong job.
  • SR (unregistered) in reply to mh
    mh:
    Pardon me, but if someone is having issues with the clarity of a 3 line loop I would suggest that they might be in the wrong job.

    I'd suggest the same about someone unable to understand " +".

  • (cs) in reply to Bim Job
    Bim Job:
    MG:
    MG:
    void remove_dblanks(char *s) {
        char *p = s++;
        while (*p) {
            if (*p != ' ' || *s != ' ')
                *++p = *s;
            s++;
        }
    }
    

    Assuming a C99 compiler, this is the most compact I can make it.

    void remove_dblanks3(char *s) {
        for (char *p = s++; *p; s++) (*p != ' ' || *s != ' ') && (*++p = *s);
    }
    
    Or you could just do it in C#.

    It's not like anybody would notice you getting the language completely wrong.

    Or just wrap it in a C++/CLI block and 'pin' your string down so the garbage collector doesn't move it.

  • Evoex (unregistered)

    I'd say this is bullshit. See, 24 replaces of " " to " ", means that 2^25 spaces is turned into two spaces. So for two subsequent spaces to be left there must be 34MB! of spaces. I don't believe anyone will actually have 34MB of spaces, really.

    For those who don't believe me, imagine 8 spaces. After replacing " " with " " once there will be 4 spaces. Again, there will be 2. It will divide by 2 every time.

    Unless, of course, stringReplace would replace only the first occurrence of a string, which would be another wtf in itself.
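
    The halving argument can be checked with a few lines of arithmetic. This is only a sketch under one assumption: each stringReplace call replaces every non-overlapping pair of spaces from left to right, so a run of n spaces becomes ceil(n / 2). run_after_passes is a name made up for this example.

```c
/* Length of a run of n spaces after `passes` replace-all passes of
   "two spaces -> one space", assuming each pass turns a run of n
   into ceil(n / 2). A run of 0 or 1 spaces is left alone. */
unsigned long long run_after_passes(unsigned long long n, int passes)
{
    while (passes-- > 0 && n > 1)
        n = (n + 1) / 2;   /* integer form of ceil(n / 2) */
    return n;
}
```

    Under that assumption a run of 2^25 spaces does come out as two spaces after 24 passes, as claimed. A run of just over 2^24 (about 16.8 million) would already leave a pair behind, so the threshold is roughly half the quoted figure, which doesn't change the conclusion much.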

  • Evoex (unregistered) in reply to Evoex
    Evoex:
    I'd say this is bullshit. See, 24 replaces of " " to " ", means that 2^25 spaces is turned into two spaces. So for two subsequent spaces to be left there must be 34MB! of spaces. I don't believe anyone will actually have 34MB of spaces, really.

    For those who don't believe me, imagine 8 spaces. After replacing " " with " " once there will be 4 spaces. Again, there will be 2. It will divide by 2 every time.

    Unless, of course, stringReplace would replace only the first occurrence of a string, which would be another wtf in itself.

    Hooray for the board replacing two spaces with one! I'm tempted to try 2^25 spaces. Let's try just a few: " ".

  • g (unregistered)

    How many stringReplaces does it take to change a lightbulb?

  • SR (unregistered) in reply to g
    g:
    How many stringReplaces does it take to change a lightbulb?

    [0-9]

  • Kyle Franz (unregistered)

    On the "goto" issue:

    In C/C++ at least, the code will end up the same in assembly whether you use a do/while loop or if/goto. They both say the same thing.
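
    As a small illustration of that equivalence, here are the two forms side by side. The function names are invented for the example, and whether a given compiler really emits identical assembly for both depends on the compiler and its optimisation settings; the sketch only shows that they compute the same thing.

```c
/* Sums 1..n with a do/while loop (n >= 1 assumed). */
int sum_do_while(int n)
{
    int i = 1, total = 0;
    do {
        total += i;
        i++;
    } while (i <= n);
    return total;
}

/* The same loop written with if/goto. */
int sum_goto(int n)
{
    int i = 1, total = 0;
again:
    total += i;
    i++;
    if (i <= n)
        goto again;
    return total;
}
```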

  • Anonymousse (unregistered) in reply to dkf

    (Schlemiel the Painter)

    Interesting. So part of the inefficiency problem is that strcat(s1, s2) returns s1. For multiple concatenations it would be much better if it returned a pointer to the NUL terminator (called strcat_z() below).

    While returning s1 allows you to nest calls, you might as well write s1 since it has the same value:

    strcat(strcat(s, "a"), "b");

    is the same as the barely longer

    strcat(s, "a"), strcat(s, "b");

    but

    strcat(strcat_z(s, "a"), "b");

    would be more efficient.

    Further optimizations are left as an exercise for the reader.
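
    A minimal sketch of what such a strcat_z() could look like follows. The function is hypothetical, as in the comment above, and not part of the standard library; POSIX stpcpy() is a close relative, since it returns a pointer to the terminating NUL of the destination.

```c
#include <string.h>

/* Hypothetical helper: appends src to dst like strcat, but returns a
   pointer to the new terminating NUL instead of the start of dst.
   The caller must make sure dst has room, exactly as with strcat. */
char *strcat_z(char *dst, const char *src)
{
    dst += strlen(dst);                 /* find the current end once */
    while ((*dst = *src++) != '\0')     /* copy, including the NUL */
        dst++;
    return dst;                         /* points at the terminator */
}
```

    When calls are chained, each call starts from the previous end of the string, so the strlen() scan inside it returns immediately instead of walking the whole buffer again.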

  • rawr (unregistered)

    TRWTF is that he used regex instead of just enabling global search.

  • Bob (unregistered) in reply to eViLegion
    eViLegion:
    Alekz:
    eViLegion:
    MD:
    Boring!

    And yet not quite boring enough for you to post a comment, thereby suggesting that your life must be even more so.

    No it does not suggest that.

    Oh, right, it suggests MD is a nob with nothing better to do than post an unconstructive single word answer, that is neither funny nor clever. And by extension, you are also one of those, for defending him.

    And, therefore, you are also a nob for the same reasons.

    Oh, shit...

  • BSDGuy (unregistered) in reply to dkf

    w00t Compilers...my FAVOURITE class. C coding and language theory, awesome.

  • Kirby L. Wallace (unregistered)

    I hate RegEx!!! RegEx Die! Die! Die!

    how about:

    while (itemDesc.indexOf("  ") >= 0) {
        itemDesc = stringReplace(itemDesc,"  "," ")
    }
    
  • calculator.ftvb (unregistered) in reply to NMe

    Wouldn't regexReplace(itemDesc, " +", " "); be slightly slower, though, because it is running extra replaces?

  • Joel (unregistered) in reply to Kirby L. Wallace
    Kirby L. Wallace:
    I hate RegEx!!! RegEx Die! Die! Die!

    how about:

    while (itemDesc.indexOf("  ") >= 0) {
        itemDesc = stringReplace(itemDesc,"  "," ")
    }
    

    Any flavor of regex is guaranteed to be more efficient than this if you have more than 3 spaces. A regex replaces a run of n spaces in one pass, whereas this loop needs about log2(n) passes.
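
    The pass count is easy to measure. The sketch below re-creates the loop in C with two invented helpers: replace_pass stands in for a replace-all stringReplace (an assumption about how the article's function behaves), and collapse_by_passes plays the role of the while loop, returning how many passes it took.

```c
#include <stddef.h>
#include <string.h>

/* One left-to-right pass that replaces each non-overlapping pair of
   spaces with a single space, in place. Returns the replacement count. */
static size_t replace_pass(char *s)
{
    char *out = s;
    size_t hits = 0;

    while (*s) {
        if (s[0] == ' ' && s[1] == ' ') {
            *out++ = ' ';
            s += 2;
            hits++;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
    return hits;
}

/* Repeats the pass until no double space is left; returns the number
   of passes, each of which rescans the whole string. */
int collapse_by_passes(char *s)
{
    int passes = 0;

    while (strstr(s, "  ") != NULL) {
        replace_pass(s);
        passes++;
    }
    return passes;
}
```

    A run of 8 spaces shrinks to 4, then 2, then 1, so the loop needs 3 passes (plus the final search that finds nothing), where a single-pass approach touches each character once.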

  • SR (unregistered) in reply to calculator.ftvb
    calculator.ftvb:
    Wouldn't regexReplace(itemDesc, " +", " "); be slightly slower, though, because it is running extra replaces?

    I would imagine that it would be slower, but only negligibly so. I'd leave it in as it's more readable unless it actually proves to be a bottleneck.

  • (cs)

    Maybe it's because I have a fairly strong compiler background, but I don't understand the objections to something like "[ ]+". How is a regexReplace call with that as the regex at all hard to understand?

    You say the loop is only negligibly slower than the regex, but it's also only negligibly easier to understand, if at all.

  • SR (unregistered) in reply to EvanED
    EvanED:
    Maybe it's because I have a fairly strong compiler background, but I don't understand the objections to something like "[ ]+". How is a regexReplace call with that as the regex at all hard to understand?

    You say the loop is only negligibly slower than the regex, but it's also only negligibly easier to understand, if at all.

    I think the commenter meant that " +" is slower than " {2,}"

  • Bim Job (unregistered) in reply to bjolling
    bjolling:
    Bim Job:
    Or you could just do it in C#.

    It's not like anybody would notice you getting the language completely wrong.

    Or just wrap it in a C++/CLI block and 'pin' your string down so the garbage collector doesn't move it.
    Nothing wrong with pinning one or more variables if there's some sort of essential algorithmic win. In this case, there isn't. Now you're just adding needless complexity, and expecting some poor sod of a C# maintenance programmer to understand C/C++. Or vice versa. Where's the sense in that?

    I know sod all about C#, but I do know that in this case the language is perfectly feature-rich for the task. I'd either look up the various string library features, or else I'd settle for a simple regexp.

    There are typically more important ways to utilise your mad skillz.

  • suscipere (unregistered)

    When your superior does a task in a certain way you shouldn't rip all his code out and replace it with something different. That could make him look bad or even be considered insubordination. If your boss used a large stack of stringReplace()s instead of a single regular expression, he probably had a good reason and you shouldn't question it, and especially not change the whole thing without at least asking him about it respectfully.

    And you know what they say about regular expressions...

  • MG (unregistered) in reply to Bim Job
    Bim Job:
    It doesn't take a super-genius programmer to recognise that the OP is written in C# (or possibly Delphi).

    Maybe not, but don't you think it might take someone who has seen Delphi or C# code before?

    Bim Job:
    When somebody told you that C# supports pointers to some extent, did you ask them why? And when you should use them?

    Didn't care. I don't have any intention of using C# any time soon.

    Bim Job:
    ... fucking ... idiot SysAdmins ...

    Has anyone mentioned you're kind of an asshole?

  • (cs) in reply to SR
    SR:
    I think the commenter meant that " +" is slower than " {2,}"
    Ah, you're probably right with respect to the poster I followed. Still, there are other people advocating the loop-based approach as being more readable, and my comment was aimed at them too.
  • Chetan (unregistered)

    Good, at least they use source control.

  • lost soul (unregistered)

    I envy all of you that get and understand this programming regex stuff. I just can't seem to wrap my little brain around it all. Maybe I haven't tried hard enough, who knows. It's just depressing to think that it could very well be a mental capacity issue, that's kind of hard to swallow or admit to. Anyhow, it may not be related to mental capabilities so where would be the best place to start learning all of this in your opinions??

    Thanks in advance.

  • Bim Job (unregistered) in reply to MG
    MG:
    Bim Job:
    It doesn't take a super-genius programmer to recognise that the OP is written in C# (or possibly Delphi).

    Maybe not, but don't you think it might take someone who has seen Delphi or C# code before?

    Let me put it another way. It doesn't take a super-genius programmer to realise that the OP is not written in C or C++. Stripping out the other possibilities leaves a normal programmer with the suspicion that it is Java (I've seen a bit of Java, and it isn't) or Delphi (I stand corrected by a poster above who points out that the assignment operator in Delphi is derived from Pascal) or C#. Or some obscure language that is unaccountably popular on TDWTF. But, no, it's probably C#.

    MG:
    Bim Job:
    When somebody told you that C# supports pointers to some extent, did you ask them why? And when you should use them?

    Didn't care. I don't have any intention of using C# any time soon.

    Neither do I. But I'm prepared to tackle a TDWTF issue at the level on which it is presented -- in this case, C#. It's possible to use pointers in C# (in a convoluted sort of way), but it's a damn silly approach.

    MG:
    Bim Job:
    ... fucking ... idiot SysAdmins ...
    I had to search quite hard to fill in the ellipses here. Ever thought about reviewing movies in Tinseltown? The actual quote is:
    Bim Job:
    Leave it to fucking programmers. Your job is to be a lowly UNIX sysadmin. It doesn't take a super-genius programmer to recognise that the OP is written in C# (or possibly Delphi).

    When somebody told you that C# supports pointers to some extent, did you ask them why? And when you should use them?

    Thought not. I'm a C and C++ programmer, myself, and I'm damned if I'm going to descend to pointer semantics like "unsafe" if I program in C#.

    Best left to idiot SysAdmins. (Note that you described yourself as "a lowly UNIX SysAdmin".) I'm not describing you as lowly, nor as an idiot. I'm simply pointing out that the solution you propose would only be accepted by a PHB in thrall to an idiot SysAdmin. At no point did I suggest that you were that idiot SysAdmin.

    Note that the real thing is a comment about C#, C++, C, pointer semantics, and the C# keyword "unsafe." All of these are relevant to the OP and subsequent discussions of "how to improve the OP."

    fucking idiot SysAdmins might well be relevant to you, on the other hand. It's just not what I said.

    MG:
    Has anyone mentioned you're kind of an asshole?
    Nobody I give a shit about, no. Did Mommy never tell you that it's impolite to misrepresent other people with fraudulent quotes?

    The point is, this is written in C#. I don't like it. You don't like it. Get real. It should be fixed in C#, not in some fantasy reality involving random choices based on your own preference.

    You won't be maintaining it.

  • sino (unregistered) in reply to lost soul
    lost soul:
    I envy all of you that get and understand this programming regex stuff. I just can't seem to wrap my little brain around it all. Maybe I haven't tried hard enough, who knows. It's just depressing to think that it could very well be a mental capacity issue, that's kind of hard to swallow or admit to. Anyhow, it may not be related to mental capabilities so where would be the best place to start learning all of this in your opinions??

    Thanks in advance.

    Gee, I dunno... I guess I'd take a look on the internet. Wait, what's this? Lucky you! You're already there!

    In fact, http://lmgtfy.com?q=learn+regular+expressions.

    May the burning shame you now feel imbue you with the power to answer your own questions in the future. I see good things for you.

    IOW, retarded troll is retarded.

  • Bim Job (unregistered) in reply to lost soul
    lost soul:
    I envy all of you that get and understand this programming regex stuff. I just can't seem to wrap my little brain around it all. Maybe I haven't tried hard enough, who knows. It's just depressing to think that it could very well be a mental capacity issue, that's kind of hard to swallow or admit to. Anyhow, it may not be related to mental capabilities so where would be the best place to start learning all of this in your opinions??

    Thanks in advance.

    I've never used it, but nobody has a bad word to say about RegexBuddy. There's no evaluation copy, but it's €30 and there's a money-back guarantee. For other IT needs, I might recommend a book. With regexps, you really kind of have to play with them.

  • Bim Job (unregistered) in reply to sino
    sino:
    Gee, I dunno... I guess I'd take a look on the internet. Wait, what's this? Lucky you! You're already there!

    In fact, http://lmgtfy.com?q=learn+regular+expressions.

    May the burning shame you now feel imbue you with the power to answer your own questions in the future. I see good things for you.

    IOW, retarded troll is retarded.

    Good job on being more obnoxious than me ...

  • aptent (unregistered) in reply to Bim Job
    Bim Job:
    lost soul:
    I envy all of you that get and understand this programming regex stuff. I just can't seem to wrap my little brain around it all. Maybe I haven't tried hard enough, who knows. It's just depressing to think that it could very well be a mental capacity issue, that's kind of hard to swallow or admit to. Anyhow, it may not be related to mental capabilities so where would be the best place to start learning all of this in your opinions??

    Thanks in advance.

    I've never used it, but nobody has a bad word to say about RegexBuddy. There's no evaluation copy, but it's €30 and there's a money-back guarantee. For other IT needs, I might recommend a book. With regexps, you really kind of have to play with them.
    Now hold on, there! I certainly agree with the
    kind of have to play with them
    but there's really no call to tell a guy who can't even figure out how to string together enough words to google his question that he needs to drop €30 on help he can get for free!?
