• (cs)

    I used to play quite a bit of NASCAR 2003 online. Lousy racing series; fantastic sim. Anyway, the online component had a swear filter (luckily only enforced when using the publisher's servers). It was pretty aggressive, to prevent people going, "s h i t" or "s&h*i#t" and so forth.

    If it found a bad string - even inside another string - the whole bit would turn into cartoon style curse characters: @#$*#@% this!

    So:

    "He'll run out of gas" -> "@#$@# run out of gas"

    "He's hit the wall!" -> "$@#$!%@#% the wall!"

    "Uff. Truck-like handling there." -> "@$!#$@%@#%@#%!@%^ handling there."

    Even when it was detecting words correctly, the level of censorship bordered on lunacy. If I recall correctly, words like "balls" got censored, too - thus delightfully implying far worse things than intended - "You guys are such a bunch of screwballs!" turned into, "You guys are such a bunch of @#$(!@%*!$&!"

    Great stuff.
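
    A minimal sketch of how a filter like this could misfire. It assumes (a guess, not the game's actual code) that the filter lowercases the message, strips everything that is not a letter, and then searches for blocked substrings. The `normalize` and `is_flagged` helpers and the two-entry blocklist are hypothetical:

    ```shell
    # Hypothetical normalizer: lowercase, then delete every non-letter.
    normalize() {
        printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cd '[:lower:]'
    }

    # Hypothetical blocklist check; the real list is not known.
    is_flagged() {
        printf '%s\n' "$(normalize "$1")" | grep -qE 'hell|shit'
    }

    for msg in "s h i t" "He'll run out of gas" "He's hit the wall!" "Nice pass"; do
        if is_flagged "$msg"; then
            echo "censored: $msg"
        else
            echo "allowed:  $msg"
        fi
    done
    ```

    Stripping spaces and punctuation catches "s h i t", but it also glues "He's hit" into "heshit...", which is how innocent sentences end up flagged.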

  • teh_n1gz (unregistered) in reply to PeriSoft

    IIRC the profanity filter on Lotro was quite strict in that sense too.

    Not particularly amusing, except it did filter out the name of a certain software giant's primary product as profanity.

  • Jon haugsand (unregistered) in reply to C
    C:
    The point to UNIX is as you say to chain small tools together. It's also considered somewhat useful if you understand the tools in the first place. For example,

    cat bar | grep foo

    can be replaced with

    grep foo bar

    This was the intent of the "useless use of cat" comment.

    Of course, if you are a patzer who doesn't want to be exposed to true UNIX mastery you probably don't get this.

    Actually, it is way more efficient to use 'cat bar | grep foo' because it makes a more useful abstraction. The fragment 'cat bar' produces lines of text. 'grep foo' filters it. Then you can replace the pipe lined combination with something else that produces lines of text, or you can replace the grep filter with something else. The 'grep foo bar' does not have this option.

    It is all about making small abstractions that help you solve a seemingly difficult problem by breaking it up piece by piece.

    Web sites like the one pointed to are hopeless and demonstrate a lack of understanding of abstraction.

    • Jon
  • Tom_fan_63 (unregistered) in reply to Joe
    Joe:
    Sixth!
    Shixth!
  • (cs) in reply to Moss
    Moss:
    Joe:
    Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth! Sixth!

    Only a Sixth deals in absolutes.

    That comment wins!

  • (cs) in reply to C
    C:
    So here we are on a website dedicated to the different ways that people can convolute, obfuscate, and otherwise screw up their coding projects...

    The irony is not missed, I hope.

    I hope the irony of someone trying to show us all their super Linux skillz being unable to correctly use BBCode to quote isn't missed, either. ;-)

  • R (unregistered) in reply to Jon haugsand
    Jon haugsand:
    Actually, it is way more efficient to use 'cat bar | grep foo' because it makes a more useful abstraction. The fragment 'cat bar' produces lines of text. 'grep foo' filters it.

    And then grep has only a string to text to work with. It doesn't know that it's coming from a bunch of files, so you'll have to do that part yourself.

    Virtually all of the time, when I'm looking for a string, I'm also looking for the file that contains it, so it's useful for grep to tell me both what it found and where it found it.
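
    A small sketch of the difference, using two throwaway files (`a.txt` and `b.txt` are made-up names). Given several file arguments, grep prefixes each hit with its file name; fed through cat, it only sees one anonymous stream:

    ```shell
    tmp=$(mktemp -d)
    printf '%s\n' 'alpha' 'foo in a' > "$tmp/a.txt"
    printf '%s\n' 'foo in b' 'beta' > "$tmp/b.txt"

    # grep opens the files itself, so it can say where each match came from.
    direct=$(cd "$tmp" && grep foo a.txt b.txt)

    # cat merges the files first; grep can no longer tell them apart.
    piped=$(cd "$tmp" && cat a.txt b.txt | grep foo)

    printf 'direct:\n%s\n' "$direct"
    printf 'piped:\n%s\n' "$piped"
    rm -rf "$tmp"
    ```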

  • Anon Fred (unregistered) in reply to R
    R:
    And then grep has only a string to text to work with. It doesn't know that it's coming from a bunch of files, so you'll have to do that part yourself.
    If you are really using a bunch of files, then of course you should pass them all on the command line to grep.

    But there's more to grep than "find the file that has foo in it."

  • (cs)

    Ah... badly thought-out swear filters. Reminds me of one that some admins set up back in the eggdrop heyday. Even though the bots were really "guard-bots" for IRC, their "inner chat system" was heavily used as a chat system in its own right. That, coupled with bot-linking, gave us a pretty wide inter-bot chat network to play with.

    One day, some of the bot owners decided to place a swear "punisher" that kicked off badmouthers... except it didn't check much what it was bashing on. We found out that people saying "Articulo" (article) or "Computadora" (computer) were being kicked. Why, you may ask??

    Well, each line was matched against regexes for "culo" (slang for anus) and "puta" (slang for whore). The "swear detector" didn't even check whether a hit was the actual word or just part of another word! So basically, very similar to this WTF.
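
    A rough sketch of the bug, using grep as a stand-in for whatever the bots actually ran (that part is an assumption). The plain pattern matches inside longer words; adding `-w`, which GNU and BSD grep both support, restricts it to whole words:

    ```shell
    # Count lines hit by a substring match versus a whole-word match.
    substring_hits=$(printf '%s\n' 'Articulo' 'Computadora' 'culo' | grep -ciE 'culo|puta')
    word_hits=$(printf '%s\n' 'Articulo' 'Computadora' 'culo' | grep -ciwE 'culo|puta')

    echo "substring matches: $substring_hits"
    echo "whole-word matches: $word_hits"
    ```

    Whole-word matching would have spared "Articulo" and "Computadora", though it does nothing against deliberate evasions like spacing the letters out.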

  • (cs) in reply to PeriSoft
    PeriSoft:
    I used to play quite a bit of NASCAR 2003 online. Lousy racing series; fantastic sim. Anyway, the online component had a swear filter (luckily only enforced when using the publisher's servers). It was pretty aggressive, to prevent people going, "s h i t" or "s&h*i#t" and so forth.

    If it found a bad string - even inside another string - the whole bit would turn into cartoon style curse characters: @#$*#@% this!

    So:

    "He'll run out of gas" -> "@#$@# run out of gas"

    "He's hit the wall!" -> "$@#$!%@#% the wall!"

    "Uff. Truck-like handling there." -> "@$!#$@%@#%@#%!@%^ handling there."

    Even when it was detecting words correctly, the level of censorship bordered on lunacy. If I recall correctly, words like "balls" got censored, too - thus delightfully implying far worse things than intended - "You guys are such a bunch of screwballs!" turned into, "You guys are such a bunch of @#$(!@%*!$&!"

    Great stuff.

    The great thing about this is that it's probably legal Perl.

    And no, I didn't run it through my local interpreter.

  • (cs)

    $ grep -cE 'd.*a.*m.*n|f.*u.*c.*k|s.*h.*i.*t|p.*e.*n.*i.*s|b.*e.*a.*v.*i.*s|b.*u.*t.*h.*h.*e.*a.*d|c.*u.*n.*y|c.*u.*n.*t|^ass' *
    english.txt:2627
    swedish.txt:267

    And that's just using scrabble-wordlists!
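
    The same effect on a tiny scale, as a sketch: the five-word list below is made up, and only a subset of the alternatives above is used. Because `.*` allows anything between the letters, perfectly ordinary words match:

    ```shell
    loose='d.*a.*m.*n|f.*u.*c.*k|s.*h.*i.*t|^ass'

    # Four of these five harmless words are caught by the loose pattern.
    flagged=$(printf '%s\n' 'shirt' 'diamond' 'assess' 'firetruck' 'table' | grep -E "$loose")
    flagged_count=$(printf '%s\n' "$flagged" | grep -c .)

    printf 'flagged words:\n%s\n' "$flagged"
    echo "count: $flagged_count"
    ```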

  • (cs) in reply to jarlz0r

    Which proves that this was an example of some shortsighted, unsophisticated preconceptions, which resulted in quite a predicament for attempts at inconsequential communications.

  • (cs) in reply to Jon haugsand

    Using cat might be easier on the mind here, but - damn it - it's not more efficient. I shall appear elitist at this point, but if you need an abstraction to help you here that's a problem in and of itself.

    Also - note that cat is useless here, so I can't understand why people act like the award is misplaced.

  • 8bitwizard (unregistered)

    Actually, the real WTF is Digg's profanity filter. If it finds a match, it replaces the entire word (not just the offending substring) with five asterisks. "Wristwatch" matches its filter. (Of course Scunthorpe matches too.)

    And you can only turn it off by creating an account, logging in, and setting a preference. Because cookies just aren't "enterprisey" enough, so they can't allow the riffraff to turn it off.

  • *(*#*)* (unregistered)

    Am I the only one who sees him?

  • (cs) in reply to Jon haugsand
    Jon haugsand:
    C:
    The point to UNIX is as you say to chain small tools together. It's also considered somewhat useful if you understand the tools in the first place. For example,

    cat bar | grep foo

    can be replaced with

    grep foo bar

    This was the intent of the "useless use of cat" comment.

    Of course, if you are a patzer who doesn't want to be exposed to true UNIX mastery you probably don't get this.

    Actually, it is way more efficient to use 'cat bar | grep foo' because it makes a more useful abstraction.

    Abstraction is not the same as efficiency.

    "< bar" also produces lines of text, so use the command line "< bar grep foo" to get the efficiency gain while retaining the useful abstraction.
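
    A quick sketch showing that all three spellings give the same output for a single file (the file contents here are invented for the demo). Only the first one starts an extra process and pushes the data through a pipe:

    ```shell
    tmp=$(mktemp -d)
    printf '%s\n' 'foo one' 'nothing here' 'another foo' > "$tmp/bar"

    with_cat=$(cat "$tmp/bar" | grep foo)      # extra process plus a pipe
    with_arg=$(grep foo "$tmp/bar")            # grep opens the file itself
    with_redirect=$(< "$tmp/bar" grep foo)     # shell opens the file as grep's stdin

    printf '%s\n' "$with_redirect"
    rm -rf "$tmp"
    ```

    With the redirect form the input source still comes first on the line, so it can be swapped for another producer about as easily as the cat version.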

  • AdT (unregistered) in reply to Jon haugsand
    Jon haugsand:
    Actually, it is way more efficient to use 'cat bar | grep foo' because it makes a more useful abstraction. The fragment 'cat bar' produces lines of text. 'grep foo' filters it. Then you can replace the pipe lined combination with something else that produces lines of text, or you can replace the grep filter with something else. The 'grep foo bar' does not have this option.

    This is so wrong. "cat bar" does not produce "lines of text", it writes a stream of bytes to stdout. The line splitting is performed by the grep process and none other. Even if cat cared about line endings, the pipe abstraction will flatten the data into a stream of consecutive bytes again anyhow. Abstraction can buy you flexibility, but quite often the price is loss of information. What this means with respect to grep's -n and -l flags is left as an exercise to the reader.
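
    One way to work that exercise, as a sketch with two made-up files. With file arguments, `-n` numbers lines per file and `-l` lists file names; behind a pipe, the line numbers refer to the merged stream and `-l` can only report a placeholder for standard input:

    ```shell
    tmp=$(mktemp -d)
    printf '%s\n' 'x' 'foo' > "$tmp/a.txt"   # foo is line 2 of a.txt
    printf '%s\n' 'foo' 'y' > "$tmp/b.txt"   # foo is line 1 of b.txt

    direct_n=$(cd "$tmp" && grep -n foo a.txt b.txt)
    piped_n=$(cd "$tmp" && cat a.txt b.txt | grep -n foo)

    direct_l=$(cd "$tmp" && grep -l foo a.txt b.txt)
    piped_l=$(cd "$tmp" && cat a.txt b.txt | grep -l foo)

    printf 'grep -n, files:\n%s\n' "$direct_n"
    printf 'grep -n, piped:\n%s\n' "$piped_n"
    printf 'grep -l, files:\n%s\n' "$direct_l"
    printf 'grep -l, piped:\n%s\n' "$piped_l"
    rm -rf "$tmp"
    ```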

    Furthermore, the command line with the useless cat is not more efficient. It's a lot less efficient, as a simple real world test illustrates unambiguously. In one case, I ran grep 225 times in a tight loop using a non-matching pattern and a 92M binary file which was entirely in the cache. I then introduced a useless cat and ran the functionally equivalent command 225 times in a loop. bash's time command was then used to report the result. Both benchmarks were conducted on a MacBook Pro with 2.6GHz Core 2 Duo, 6MB L2 Cache, Mac OS Leopard. Some other applications were drawing a certain amount of CPU power, most notably Firefox 2.0.0.14.

    Here are the results for 225x grep:

    real 0m24.998s
    user 0m10.524s
    sys  0m13.574s

    And here for 225x cat | grep:

    real 0m30.166s
    user 0m14.365s
    sys  0m35.411s

    Obviously, without the useless cat, the loop finishes about 5 seconds earlier, arguably not that much of a difference. But note how the "sys" time in the second case is actually greater than the "real" time. What does that mean? Well, the Mac OS version of bash computes the user and sys times by adding up the times spent on each CPU. On a multi-core system, user + sys can thus end up greater than real. It thus appears that the grep version saturates only one core, whereas the cat | grep version almost saturates both. And indeed, further inspection with the Activity Monitor shows us that this is exactly what happens. The useless cat version effectively requires twice the amount of actual CPU time to perform the same function.

    This should not come as a surprise to anyone familiar with UNIX shells. Not only does bash have to spawn twice as many processes, but all of the data has to be read from the file, then written to a pipe, then read from the pipe again before it can actually be processed. Two out of three of these I/O operations are completely redundant.
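
    A scaled-down sketch of that kind of comparison, not the original test script. The 1 MiB file of NUL bytes, the 5 iterations, and the `run_plain` / `run_with_cat` names are all placeholders; the real run used a 92M file and far more loops:

    ```shell
    tmp=$(mktemp -d)
    head -c 1048576 /dev/zero > "$tmp/blob"   # stand-in for the large binary file

    run_plain() {
        i=0
        while [ "$i" -lt 5 ]; do
            # grep exits non-zero when nothing matches; that is expected here.
            grep -c nomatch "$tmp/blob" > /dev/null || true
            i=$((i + 1))
        done
    }

    run_with_cat() {
        i=0
        while [ "$i" -lt 5 ]; do
            cat "$tmp/blob" | grep -c nomatch > /dev/null || true
            i=$((i + 1))
        done
    }

    # In bash, prefix these calls with the `time` keyword to compare real/user/sys.
    run_plain
    run_with_cat

    # Sanity check: both forms must agree on the answer before timing means anything.
    plain_count=$(grep -c nomatch "$tmp/blob" || true)
    piped_count=$(cat "$tmp/blob" | grep -c nomatch || true)
    echo "plain: $plain_count, piped: $piped_count"
    rm -rf "$tmp"
    ```

    At this size the gap will be tiny and noisy; the input and loop count need to be much larger before the numbers say anything.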

    Jon haugsand:
    It is all about making small abstractions that help you solve a seemingly difficult problem by breaking it up piece by piece.

    Web sites like the one pointed to are hopeless and demonstrate a lack of understanding of abstraction.

    A real lack of understanding is exhibited when people take abstraction - which can indeed be very useful - as a panacea and apply it to problems indiscriminately. If the simple file is ever to be replaced by the result of an on-the-fly computation, "grep foo bar" can be changed as easily as "cat bar | grep foo". Keep in mind that you will have to ditch the "cat" in any case - or let the command line deteriorate into the absolutely ridiculous "frobnicate | cat | grep foo".

  • AdT (unregistered)

    It was actually 285 loops in both cases, not 225. The total amount of data processed was 25.544 (binary) gigabytes.

  • (cs) in reply to AdT
    AdT:
    It was actually 285 loops in both cases, not 225. The total amount of data processed was 25.544 (binary) gigabytes.
    I'm so glad you took the trouble to do all that.

    The UNIX-HATERS Handbook

  • (cs)

    Hooray for muds/mushes.

  • (cs)

    Was just thinking: perhaps this regex is more what they intended:

    [[:punct:][:space:]]*d[[:punct:][:space:]]*a[[:punct:][:space:]]*m[[:punct:][:space:]]*n[[:punct:][:space:]]*

    and so on. This way it only blocks "damn", " d a m n ", "/d/a/m/n/", etc.
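
    A sketch comparing the two approaches on a handful of made-up samples. It assumes every character class in the regex above is followed by `*`, which is what the three examples require; the `sep`, `strict`, and `loose` variable names are just for the demo:

    ```shell
    sep='[[:punct:][:space:]]*'
    strict="${sep}d${sep}a${sep}m${sep}n${sep}"
    loose='d.*a.*m.*n'

    # Three disguised spellings and two innocent phrases.
    loose_hits=$(printf '%s\n' 'damn' ' d a m n ' '/d/a/m/n/' 'dial a man' 'diamond' | grep -cE "$loose")
    strict_hits=$(printf '%s\n' 'damn' ' d a m n ' '/d/a/m/n/' 'dial a man' 'diamond' | grep -cE "$strict")

    echo "loose pattern flags:  $loose_hits of 5"
    echo "strict pattern flags: $strict_hits of 5"
    ```

    The tighter pattern leaves "dial a man" and "diamond" alone, but it is still unanchored, so a word that merely contains the letters in a row would still match.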

  • John Aines (unregistered)

    Thank Christ you put asterisks on those horrible four letter words! The last time I saw the letter "u" in the middle of the otherwise respectable "f", "c", and "k", I shat my pants.

Leave a comment on “Nuns and Regexes Do Not Mix”
