• Bim Job (unregistered) in reply to aptent
    aptent:
    Bim Job:
    lost soul:
    I envy all of you that get and understand this programming regex stuff. I just can't seem to wrap my little brain around it all. Maybe I haven't tried hard enough, who knows. It's just depressing to think that it could very well be a mental capacity issue, that's kind of hard to swallow or admit to. Anyhow, it may not be related to mental capabilities so where would be the best place to start learning all of this in your opinions??

    Thanks in advance.

    I've never used it, but nobody has a bad word to say about RegexBuddy. There's no evaluation copy, but it's €30 and there's a money-back guarantee. For other IT needs, I might recommend a book. With regexps, you really kind of have to play with them.
    Now hold on, there! I certainly agree with the
    kind of have to play with them
    but there's really no call to tell a guy who can't even figure out how to string together enough words to google his question that he needs to drop €30 on help he can get for free!?
    Good point.

    Well, we've been asked for advice, and bandwidth is practically free. I see no reason to presuppose that an enquiry about learning regexps is some bizarro troll. So let's take it at face value.

    • There are (a very large number of Googleable) sites out there. I'd suggest the search phrase "regexp tutorial."
    • There are surprisingly few decent books (I'm open to correction.) I'd suggest the classic, although it may be a little over the top. An alternative is the Cookbook, about which I know nothing. Most O'Reilly cookbooks are a good introduction to the subject.
    • You're still better off with a GUI that tells you where you're going wrong, helps you with hints, and lets you play around. Thus, RegexBuddy. It's about the price of a book.

    Uncontroversial? Helpful? Me?

    Buy thirty euros worth of coffee, sit down, and browse the web. Otherwise, buy a book (if that's how you learn), or buy the software (if that's how you learn).

    I'm not going to dictate choices. It's just nice to know that they're out there.

  • (cs) in reply to Bim Job
    Bim Job:
    MG:
    Bim Job:
    It doesn't take a super-genius programmer to recognise that the OP is written in C# (or possibly Delphi).

    Maybe not, but don't you think it might take someone who has seen Delphi or C# code before?

    Let me put it another way. It doesn't take a super-genius programmer to realise that the OP is not written in C or C++. Stripping out the other possibilities leaves a normal programmer with the suspicion that it is Java (I've seen a bit of Java, and it isn't).
    How do you know? It's syntactically correct code in all of C, C++, C# and Java. None of them have (as far as I could ascertain) functions stringReplace or regexReplace in their standard libraries, so you can't use that to decide which language it's in. The fact that the method names follow the convention more widespread in Java casts doubt on the C# assumption. Since C# has String.Replace and Regex.Replace in the standard libs, it would make more sense to use those. Since 1.4, Java has Matcher.replaceAll in the standard libs, so unless it's old code, Java isn't the best candidate either. A C programmer would do it differently from the beginning, so it's got to be C++. (Just kidding, the point is, after the redaction, it's impossible to tell which language it was written in.)

  • gallier2 (unregistered)

    Funny, my first troll post! I was OP of the C solution and hadn't imagined that it would start a flame war about sysadmin, C# and whatnot. I knew that the original program was in C# or Java or any other "high-level" "leaking abstraction" (see Joel Spolsky) language. As for the RegEx theoreticians, yes one can use RegEx for that, but one can also use a Howitzer to hunt ducks. I call them RegEx theoreticians because they think regexes are the fastest way of doing things while they completely forget (or never knew) that they are heavy to start, especially in managed languages, and they are not of O(n) complexity:

    Regular expressions - Perl's regular expression engine is so called NFA (Non-deterministic Finite Automaton), which among other things means that it can rather easily consume large amounts of both time and space if the regular expression may match in several ways. Careful crafting of the regular expressions can help but quite often there really isn't much one can do (the book "Mastering Regular Expressions" is required reading, see perlfaq2). Running out of space manifests itself by Perl running out of memory.

    So I provided a rather optimized implementation of the problem at hand, in the language I work in day by day. PS: for the one who responded to my post (MG), there is no supplemental copy in my routine, the inner loop is to find the next non blank character. The use of the temporary variable ch has the purpose to avoid a double dereferencing of the *p pointer that certain compilers (gcc 3.4.6) are unable to avoid, so I did it explicitly here.
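
For anyone who doesn't have gallier2's C routine to hand (it isn't quoted on this page), here is a rough Python sketch of the shape the PS describes: an outer copy loop, plus an inner loop that skips ahead to the next non-blank character. The function name `squeeze_spaces` is made up for illustration, and this is a guess at the structure, not gallier2's actual code.

```python
def squeeze_spaces(text: str) -> str:
    """Collapse runs of spaces to one space in a single left-to-right pass."""
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]          # read the character once, like the temporary `ch`
        out.append(ch)
        i += 1
        if ch == " ":
            # Inner loop: skip ahead to the next non-blank character.
            while i < n and text[i] == " ":
                i += 1
    return "".join(out)
```

Each character is looked at once, so the work grows linearly with the length of the input, which seems to be the point gallier2 is making against the regex crowd.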

  • Paul N (unregistered) in reply to Bim Job
    Bim Job:
    Why not just write a correct version in C#, or possibly Delphi?

    I suppose you could just SWiG it from C. Good luck on getting that one past an idiot boss such as the OP has. In fact, the idiot boss might actually be right, for once.

    C#, how about this?

                StringBuilder sbItemDesc = new StringBuilder();
                string[] splitItemDesc = itemDesc.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
                foreach (string item in splitItemDesc)
                {
                    sbItemDesc.Append(item);
                    sbItemDesc.Append(" ");
                }
                itemDesc = sbItemDesc.ToString().Trim();

    captcha: vindico Is that a threat?
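
The same split-and-rejoin idea can be sketched in a couple of lines of Python, assuming (as in the C# above) that only the space character counts as a separator and that leading and trailing spaces should go too. `collapse_by_split` is a hypothetical name.

```python
def collapse_by_split(item_desc: str) -> str:
    """Split on single spaces, drop the empty pieces, rejoin with one space."""
    # Splitting "a  b" on " " yields ["a", "", "b"]; the empty strings mark
    # the extra spaces, so filtering them out removes the duplicates.
    parts = [part for part in item_desc.split(" ") if part]
    return " ".join(parts)
```

Because the join only puts separators between the pieces, there is nothing left to trim at the ends.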

  • PRMan (unregistered)

    You can get Expresso for Windows/.NET for free:

    http://www.ultrapico.com/Expresso.htm

    It even has a builder tool and decodes regexes for you.

  • Bim Job (unregistered) in reply to Ilya Ehrenburg
    Ilya Ehrenburg:
    How do you know? It's syntactically correct code in all of C, C++, C# and Java. None of them have (as far as I could ascertain) functions stringReplace or regexReplace in their standard libraries, so you can't use that to decide which language it's in. The fact that the method names follow the convention more widespread in Java casts doubt on the C# assumption. Since C# has String.Replace and Regex.Replace in the standard libs, it would make more sense to use those. Since 1.4, Java has Matcher.replaceAll in the standard libs, so unless it's old code, Java isn't the best candidate either. A C programmer would do it differently from the beginning, so it's got to be C++. (Just kidding, the point is, after the redaction, it's impossible to tell which language it was written in.)
    I'll fess up -- I'm a (fucking) idiot. AFAIK, stringReplace() is only a library function in Delphi (which, of course, uses the Pascal assignment operator -- so it isn't that). regexReplace() doesn't seem to exist in any known standard library. I suppose I was just assuming that the redaction involved replacing "String." with "string" and "Regex." with "regex", but that's no excuse, because it could be any of VB.Net, C#, or even ASP.Net (I think).

    It could also be C or C++, syntactically.

    I'll go away and whip myself, justifiably, with birch twigs. I think my inadvertent point still stands, though. If you're faced with a problem in language X, there is little or no point in providing a solution in language Y. And, as a small mea culpa, I can't imagine a C or C++ programmer writing anything like the code in the OP. Use pointers or use higher-level constructs. Whichever the language supports best.

  • (cs) in reply to Bim Job
    Bim Job:
    I'll fess up -- I'm a (fucking) idiot.
    Fishing for compliments? Okay, here is one: You know that better than I do.
    I can't imagine a C or C++ programmer writing anything like the code in the OP.
    C, no way. C++, weeeelllll... (No, not really, though I've seen idiots programming in C++ which I wouldn't put it past).
  • Andrzej Jarmoniuk (unregistered)

    The new programmer has screwed up big time. What about the LOC?

  • (cs) in reply to suscipere
    suscipere:
    ... he probably had a good reason and you shouldn't question it
    suscipere:
    ... [you shouldn't] change the whole thing without at least asking him about it

    You shouldn't change it without asking the boss about it. You shouldn't ask the boss about it. Therefore.....

  • (cs) in reply to Bim Job
    Bim Job:
    * There are surprisingly few decent books (I'm open to correction.) I'd suggest the classic, although it may be a little over the top. An alternative is the Cookbook, about which I know nothing. Most O'Reilly cookbooks are a good introduction to the subject.

    Another option for the OP is to sit down with an automata theory textbook. The seemingly canonical suggestion here is Sipser's, but I learned from a different one. Hopcroft's book probably goes more into depth on regular languages, etc. than Sipser's.

    But what do I know; I'm just an ivory tower academic. This may not be the best way to learn REs in terms of how you'll use them in actual code. (And, as discussed earlier in this thread, it can fill you with misconceptions about how regex libraries actually behave in practice, because they tend to not actually use the techniques you'd learn this way.)

  • SR (unregistered) in reply to Bim Job
    Bim Job:
    * There are surprisingly few decent books (I'm open to correction.) I'd suggest the classic, although it may be a little over the top. An alternative is the Cookbook, about which I know nothing. Most O'Reilly cookbooks are a good introduction to the subject.

    Sams Teach Yourself Regular Expressions in 10 Minutes is a great introduction. After priming yourself with that, Google should have most of the answers.

    I really ought to get round to reading Mastering Regular Expressions but life is short :o)

  • usitas (unregistered) in reply to luis.espinal
    luis.espinal:
    Steenbergh:
    He shouda used this:

    comefrom RemoveMoreSpaces; strItem = StringReplace(" ", " "); if instr(" ") > 0 then RemoveMoreSpaces:

    We don't need no regular expressions!

    Goto? Dude, what the...? Of all the things considering that DO/WHILE loops have existed since BASIC dialects evolved out of the GW-BASIC/PICK-BASIC/TARD-BASIC decades ago? Goto? Worst fix ever. I hope that was a joke.

    Better now?

  • anon (unregistered)

    Why not just use the trim function?

  • A_S (unregistered)

    while ( $original != str_replace(" "," ",$original) ) ;

  • A_S (unregistered)

    Oops.. :)

  • MG (unregistered) in reply to gallier2
    gallier2:
    PS: for the one who responded to my post (MG), there is no supplemental copy in my routine, the inner loop is to find the next non blank character. The use of the temporary variable ch has the purpose to avoid a double dereferencing of the *p pointer that certain compilers (gcc 3.4.6) are unable to avoid, so I did it explicitly here.

    Ahh, that makes sense.

  • Yan (unregistered) in reply to Mcoder

    Heh. Snobol, back in the day....

    label string " " = " " :s(label)

    Not quite recursive, but not bad for a language that's been dead for, what, 30 years?

  • Bim Job (unregistered) in reply to EvanED
    EvanED:
    Bim Job:
    * There are surprisingly few decent books (I'm open to correction.) I'd suggest the classic, although it may be a little over the top. An alternative is the Cookbook, about which I know nothing. Most O'Reilly cookbooks are a good introduction to the subject.

    Another option for the OP is to sit down with an automata theory textbook. The seemingly canonical suggestion here is Sipser's, but I learned from a different one. Hopcroft's book probably goes more into depth on regular languages, etc. than Sipser's.

    But what do I know; I'm just an ivory tower academic. This may not be the best way to learn REs in terms of how you'll use them in actual code. (And, as discussed earlier in this thread, it can fill you with misconceptions about how regex libraries actually behave in practice, because they tend to not actually use the techniques you'd learn this way.)

    Comment is, in a way, superfluous, but I'd like to think that we (all)'ve pointed somebody at a decent set of regexp alternatives. Even if the original poster was a troll, which I dispute, I'd like to think that the next reader on has something worthwhile to chew on.

    Mind you, the fact that nobody reads a three day old blog-posting stands in the way, somewhat.

    I feel a Marshall McLuhan moment coming on.

  • Bim Job (unregistered) in reply to Ilya Ehrenburg
    Ilya Ehrenburg:
    Bim Job:
    I'll fess up -- I'm a (fucking) idiot.
    Fishing for compliments? Okay, here is one: You know that better than I do.
    I can't imagine a C or C++ programmer writing anything like the code in the OP.
    C, no way. C++, weeeelllll... (No, not really, though I've seen idiots programming in C++ which I wouldn't put it past).
    Most of the idiots programming in C++ I've seen (and I'd guesstimate the figure at 90%, idiot-based) are actually mis-translating their original idiot C into even more inept C++.

    Your plus-plusage may vary.

  • (cs) in reply to Bubba
    Bubba:
    Spaces are good. They give the compiler time to think.

    You win today's Out Of Left Field award. Well done.

  • (cs) in reply to amischiefr
    amischiefr:

    Oh right, because 1/2 of the tards out there are lazy, uneducated fucks that think because they write stupid script in their mother's basement that qualifies them as a programmer.

    Yes I know, there are some talented guys out there that don't have a degree, yada yada, some exceptions to the rule, yada yada. And I'm pretty sure that there are people who could be good trial lawyers that can't get into Law school because they don't have the money or can't do well on the LSAT. But guess what? We don't let them practice law! And we shouldn't let just anybody write code.

    Just my rant.

    Mostly agree, in principle, but if you implemented that, the next sound you would hear would be the WHOOMP as air rushed to fill the vacuum left by the remaining US programming jobs being sent overseas where they would have no such silly restrictions.

  • (cs) in reply to Bim Job
    Bim Job:

    It's also slightly sad that you can't spell out "fucking ignorant ranting asshole" without self-censorship.

    Oh. "assHOLE"! I thought he meant "assHAT".

    Thanks for clearing that up.

  • (cs) in reply to toth
    toth:
    Complete Moron:
    What about stripping html tags or finding a date in a string?

    So THAT'S where you find them. I've been looking in the wrong places.

    Thank you, thank you, I'll be here all night.

    Thank you for not saying "How about finding a string in your date?"

  • feugiat (unregistered) in reply to SQLDave
    SQLDave:
    toth:
    Complete Moron:
    What about stripping html tags or finding a date in a string?

    So THAT'S where you find them. I've been looking in the wrong places.

    Thank you, thank you, I'll be here all night.

    Thank you for not saying "How about finding a string in your date?"

    gulp, rubs rosary "a good [christian] warrior isn't afraid of a little blood!" *gulp, turns pale, vomits and passes out

  • Ajonos (unregistered)

    what about something along the lines of:

    String RemoveDupeSpaces(String itemDesc)
    {
        // C# strings are immutable, so compact the characters in a char[] copy.
        char[] chars = itemDesc.ToCharArray();
        int to = 0;
        int from = 0;
        bool last_was_space = false;
        for (int scan = 0; scan < chars.Length; ++scan)
        {
            bool is_this_space = chars[from] == ' ';
            if (last_was_space && is_this_space)
            {
                ++from; // skip a repeated space
            }
            else
            {
                chars[to++] = chars[from++];
            }
            last_was_space = is_this_space;
        }
        return new String(chars, 0, to);
    }

    If you're concerned about efficiency, you may as well do it yourself... right?

  • hoodaticus (unregistered)

    "As tempted as Don was to simply add in another bunch of stringReplace calls just for fun, he replaced the whole thing with this.

    // pull double spaces
    regexReplace(itemDesc, "[ ][ ]+", " ");"

    Oh. Well I guess that works too.

  • (cs)

    Step 1:

    Convert the string variable to a linked list of byte characters

    Step 2:

    Walk the list and set bFoundSpace to true when a space is encountered, or to False when a space is not encountered.

    If space is encountered while bFoundSpace is true, jump back to the prior element of the list and shift its pointer forward one character, then follow the pointer and continue as before.

    Step 3:

    Walk the linked list again to reassemble the string.

    Step 4:

    Fired.

  • Someone (unregistered) in reply to bjolling
    void remove_dblanks3(char *s) {
        for (char *p = s++; *p; s++) (*p != ' ' || *s != ' ') && (*++p = *s);
    }
    

    I find this code much harder to understand than the regex code.
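
For comparison, here is the same one-pass logic unrolled into Python, which may make the pointer juggling easier to follow. It is only a sketch: `remove_dblanks_unrolled` is an invented name, and a list of characters with a trailing NUL stands in for the C string.

```python
def remove_dblanks_unrolled(text: str) -> str:
    """Step-by-step version of the pointer one-liner: p writes, s reads."""
    buf = list(text) + ["\0"]   # emulate a NUL-terminated C string
    p = 0                       # index of the last character kept so far
    s = 1                       # index of the next character to examine
    while buf[p] != "\0":
        # Copy the next character unless it would put a space after a space.
        if buf[p] != " " or buf[s] != " ":
            p += 1
            buf[p] = buf[s]
        s += 1
    return "".join(buf[:p])     # p now sits on the copied terminator
```

The loop stops once the terminator itself has been copied into position `p`, which is why no separate length check is needed.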

  • (cs)
    adfa adfasd:
    Assuming that nobody's been silly enough to go for the higher-level languages at that point; cheap ed hardy clothing the complexity there starts to become a bit hard to work with.)

    Weirdest spam evar.

  • Quirkafleeg (unregistered) in reply to EvanED
    EvanED:
    adfa adfasd:
    Assuming that nobody's been silly enough to go for the higher-level languages at that point; spam spam spam spam the complexity there starts to become a bit hard to work with.)
    Weirdest spam evar.
    The text is from a comment on the first page; my guess is that it's a bot (running on some zombie Windows box? Wouldn't surprise me) which takes some random sentence from the page then ‘clicks’ on the “comment” link and fills in the form. It's probably able to deal with the captcha too...

    Other sites are seeing this too; a quick search found this posting.

  • (cs)

    While s.count(" "): s.replace(" ", " ")

    It's not rocket science.

  • (cs) in reply to BlueKitties
    BlueKitties:
    While s.count(" "): s.replace(" ", " ")

    It's not rocket science.

    Ah, I get it. The extra space in your first argument to the replace method was removed by the web formatting on this site.

    BlueKitties actually posted this:

    While s.count("  "): s.replace("  ", " ")
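
Read as Python (an assumption; the snippet doesn't say what language it is), the loop also needs a lowercase `while` and an assignment, since `str.replace` returns a new string rather than changing `s` in place. A minimal sketch, with a made-up function name:

```python
def collapse_by_loop(s: str) -> str:
    """Keep replacing double spaces with single ones until none remain."""
    while "  " in s:              # two spaces
        s = s.replace("  ", " ")  # str.replace returns a new string
    return s
```

Each pass roughly halves the length of a run of spaces, so long runs take a handful of passes rather than one.
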
  • Barf4Eva (unregistered) in reply to eViLegion
    eViLegion:
    Ben Jammin:
    Edward Royce:
    Boring!

    Now that's funny!

    Yes it is.

    Shame the comment section has gone downhill so much in general though.

    lol! yeah....... right..... never heard that around here before...

  • (cs) in reply to hoodaticus

    Haha, yeah it did clip my extra whitespace. Shame on you WTF admins, you could have just caused a WTF with that feature. =p

  • me (unregistered)

    TRWTF is that the correction makes essentially the same mistake, it should (pedantically) be s/[ ]{2,}/ /g.
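
In Python, that substitution might look like the sketch below. `re.sub` replaces every match by default, which covers the `/g` flag; `collapse_by_regex` and its `pattern` parameter are names invented here so the article's `[ ][ ]+` can be tried alongside `[ ]{2,}`.

```python
import re


def collapse_by_regex(s: str, pattern: str = r"[ ]{2,}") -> str:
    """Replace every run of spaces matched by `pattern` with a single space."""
    return re.sub(pattern, " ", s)


if __name__ == "__main__":
    sample = "a    b  c d"
    print(collapse_by_regex(sample))               # default pattern
    print(collapse_by_regex(sample, r"[ ][ ]+"))   # pattern from the article
```

Both patterns describe a run of two or more spaces, so single spaces are left alone either way.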

  • PRNULC (unregistered) in reply to dkf

    Oh really?

    I just translated his code into vbs and in order for any appreciable impact to be evident the starting string had to be 72170 chars. It is just possible that some user may enter that many spaces into a text input. I suppose.

    This will be my last visit for a while. It's an entertaining site at first. Then it gets irritating or even saddening. The catty bickering is just such a waste of life. And I do use regular expressions.

  • Reow (unregistered)

    I'm surprised the correct regex wasn't the original check-in, replaced by a code monkey who didn't understand it.

Leave a comment on “A Spacy Problem”
