Admin
I used to play quite a bit of NASCAR 2003 online. Lousy racing series; fantastic sim. Anyway, the online component had a swear filter (luckily only enforced when using the publisher's servers). It was pretty aggressive, to prevent people going, "s h i t" or "s&h*i#t" and so forth.
If it found a bad string - even inside another string - the whole bit would turn into cartoon style curse characters: @#$*#@% this!
So:
"He'll run out of gas" -> "@#$@# run out of gas"
"He's hit the wall!" -> "$@#$!%@#% the wall!"
"Uff. Truck-like handling there." -> "@$!#$@%@#%@#%!@%^ handling there."
Even when it was detecting words correctly, the level of censorship bordered on lunacy. If I recall correctly, words like "balls" got censored, too - thus delightfully implying far worse things than intended - "You guys are such a bunch of screwballs!" turned into, "You guys are such a bunch of @#$(!@%*!$&!"
Great stuff.
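A filter that behaves this way can be sketched roughly as follows. This is a guess at the mechanism, not the game's actual code: the word list and the `is_flagged` name are hypothetical, and the real filter evidently caught more than this does. The idea is to throw away everything except letters, then look for a listed word anywhere in what remains.

```shell
# Hypothetical word list; the game's real list is not known.
BAD_WORDS="hell shit"

# Succeeds if the message, reduced to lowercase letters only,
# contains any listed word as a substring.
is_flagged() {
  local squeezed word
  squeezed=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z')
  for word in $BAD_WORDS; do
    case "$squeezed" in
      *"$word"*) return 0 ;;
    esac
  done
  return 1
}

for msg in "He'll run out of gas" "He's hit the wall!" "Nice pass there"; do
  if is_flagged "$msg"; then
    echo "censored: $msg"
  else
    echo "ok: $msg"
  fi
done
```

Stripping spaces and punctuation is what catches "s h i t", and it is also exactly what turns "He's hit" into a match.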
Admin
IIRC the profanity filter on Lotro was quite strict in that sense too.
Not particularly amusing, except that it did filter out the name of a certain software giant's primary product as profanity.
Admin
Actually, it is way more efficient to use 'cat bar | grep foo' because it makes a more useful abstraction. The fragment 'cat bar' produces lines of text. 'grep foo' filters it. Then you can replace the pipe lined combination with something else that produces lines of text, or you can replace the grep filter with something else. The 'grep foo bar' does not have this option.
It is all about making small abstractions that help you solve a seemingly difficult problem by breaking it up piece by piece.
Web sites like the one pointed to are hopeless and demonstrate a lack of understanding of abstraction.
Admin
That comment wins!
Admin
I hope the irony of someone trying to show us all their super Linux skillz being unable to correctly use BBCode to quote isn't missed, either. ;-)
Admin
And then grep has only a stream of text to work with. It doesn't know that it's coming from a bunch of files, so you'll have to do that part yourself.
Virtually all of the time, when I'm looking for a string, I'm also looking for the file that contains it, so it's useful for grep to tell me both what it found and where it found it.
Admin
But there's more to grep than "find the file that has foo in it."
Admin
Ah... badly thought-out swear filters. Reminds me of one that some admins set up back in the eggdrop heydays. Even though the bots were actually "guard-bots" for IRC, their "inner chat system" was heavily used as a chat system itself. That, coupled with bot-linking, gave us a pretty wide inter-bot chat network to play with.
One day, some of the bot owners decided to place a swear "punisher" that kicked off badmouthers... except it didn't check much what it was bashing on. We found out that people saying "Articulo" (article) or "Computadora" (computer) were being kicked. Why, you may ask??
Well, they matched line regexes for "culo" (slang for anus) and "puta" (slang for whore). The "swear detector" didn't even check if these instances were the actual word, or just inside another word! So basically, very similar to this WTF.
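For illustration, here is the difference between the substring match described above and a whole-word match. This is only a sketch using grep rather than whatever the bots actually ran; the function names are made up, and `-w` is the whole-word option found in GNU and BSD grep.

```shell
# Substring match: fires on any line containing the letters anywhere.
substring_hit() { printf '%s\n' "$1" | grep -qiE 'culo|puta'; }

# Whole-word match: fires only when the match stands alone as a word.
word_hit() { printf '%s\n' "$1" | grep -qiwE 'culo|puta'; }

for msg in "Articulo" "Computadora"; do
  substring_hit "$msg" && echo "substring filter kicks: $msg"
  word_hit "$msg" || echo "whole-word filter allows: $msg"
done
```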
Admin
And no, I didn't run it through my local interpreter.
Admin
$ grep -cE 'd.*a.*m.*n|f.*u.*c.*k|s.*h.*i.*t|p.*e.*n.*i.*s|b.*e.*a.*v.*i.*s|b.*u.*t.*h.*h.*e.*a.*d|c.*u.*n.*y|c.*u.*n.*t|^ass' *
english.txt:2627
swedish.txt:267
And that's just using scrabble-wordlists!
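The same effect shows up on a tiny scale. The sketch below uses a shortened version of the pattern and four words picked for illustration (they are not taken from the word lists above); `count_hits` is a made-up helper name.

```shell
# Count the input lines in which the letters of a listed word appear in
# order, with anything at all in between (shortened pattern).
count_hits() { grep -cE 'd.*a.*m.*n|f.*u.*c.*k|s.*h.*i.*t'; }

# dalmatian, firetruck and shortsighted match; apple does not.
printf '%s\n' dalmatian firetruck shortsighted apple | count_hits   # prints 3
```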
Admin
Which proves that this was an example of some shortsighted, unsophisticated preconceptions, which resulted in quite a predicament for attempts at inconsequential communications.
Admin
Using cat might be easier on the mind here, but - damn it - it's not more efficient. I shall appear elitist at this point, but if you need an abstraction to help you here that's a problem in and of itself.
Also - note that cat is useless here, so I can't understand why people act like the award is misplaced.
Admin
Actually, the real WTF is Digg's profanity filter. If it finds a match, it replaces the entire word (not just the offending substring) with five asterisks. "Wristwatch" matches its filter. (Of course Scunthorpe matches too.)
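The whole-word replacement can be sketched with sed. The two listed substrings are my assumption about what triggers on "Wristwatch" and "Scunthorpe"; Digg's real list is not known, and `censor_word` is a made-up name.

```shell
# Replace every word that contains a listed substring with five asterisks.
# The surrounding [[:alnum:]]* runs make the match swallow the whole word.
censor_word() {
  printf '%s\n' "$1" | sed -E 's/[[:alnum:]]*(twat|cunt)[[:alnum:]]*/*****/g'
}

censor_word "My Wristwatch stopped in Scunthorpe"
```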
And you can only turn it off by creating an account, logging in, and setting a preference. Because cookies just aren't "enterprisey" enough, so they can't allow the riffraff to turn it off.
Admin
Am I the only one who sees him?
Admin
Abstraction is not the same as efficiency.
"< bar" also produces lines of text, so use the command line "< bar grep foo" to get the efficiency gain while also retaining the useful abstraction.
Admin
This is so wrong. "cat bar" does not produce "lines of text", it writes a stream of bytes to stdout. The line splitting is performed by the grep process and none other. Even if cat cared about line endings, the pipe abstraction will flatten the data into a stream of consecutive bytes again anyhow. Abstraction can buy you flexibility, but quite often the price is loss of information. What this means with respect to grep's -n and -l flags is left as an exercise to the reader.
Furthermore, the command line with the useless cat is not more efficient. It's a lot less efficient, as a simple real world test illustrates unambiguously. In one case, I ran grep 225 times in a tight loop using a non-matching pattern and a 92M binary file which was entirely in the cache. I then introduced a useless cat and ran the functionally equivalent command 225 times in a loop. bash's time command was then used to report the result. Both benchmarks were conducted on a MacBook Pro with 2.6GHz Core 2 Duo, 6MB L2 Cache, Mac OS Leopard. Some other applications were drawing a certain amount of CPU power, most notably Firefox 2.0.0.14.
Here are the results for 225x grep: real 0m24.998s user 0m10.524s sys 0m13.574s
And here for 225x cat | grep: real 0m30.166s user 0m14.365s sys 0m35.411s
Obviously, without the useless cat, the loop finishes about 5 seconds earlier, arguably not that much of a difference. But note how the "sys" time in the second case is actually greater than the "real" time. What does that mean? Well, the Mac OS version of bash computes the user and sys times by adding up the times spent on each CPU. On a multi-core system, user + sys can thus end up greater than real. It thus appears that the grep version saturates only one core, whereas the cat | grep version almost saturates both. And indeed, further inspection with the Activity Monitor shows us that this is exactly what happens. The useless cat version effectively requires twice the amount of actual CPU time to perform the same function.
This should not come as a surprise to anyone familiar with UNIX shells. Not only does bash have to spawn twice as many processes, but all of the data has to be read from the file, then written to a pipe, then read from the pipe again before it can actually be processed. Two out of three of these I/O operations are completely redundant.
A real lack of understanding is exhibited when people take abstraction - which can indeed be very useful - as a panacea and apply it to problems indiscriminately. If the simple file is ever to be replaced by the result of an on-the-fly computation, "grep foo bar" can be changed as easily as "cat bar | grep foo". Keep in mind that you will have to ditch the "cat" in any case - or let the command line deteriorate into the absolutely ridiculous "frobnicate | cat | grep foo".
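For anyone who wants to try the three spellings side by side, here is a small sketch. The temporary file and its contents are made up for the demonstration. All three forms print the same matching lines, but only the direct form lets grep know which file it is reading, which is where -l stops being useful behind a pipe.

```shell
# Scratch file standing in for "bar".
tmpdir=$(mktemp -d)
bar="$tmpdir/bar"
printf 'foo one\nnothing here\nfoo two\n' > "$bar"

direct=$(grep foo "$bar")            # grep opens the file itself
piped=$(cat "$bar" | grep foo)       # extra process plus a pipe
redirected=$(< "$bar" grep foo)      # the shell opens the file, no cat

# -l reports the name of the matching file; behind a pipe there is none.
direct_name=$(grep -l foo "$bar")
piped_name=$(cat "$bar" | grep -l foo)

echo "direct -l: $direct_name"
echo "piped  -l: $piped_name"

rm -rf "$tmpdir"
```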
Admin
It was actually 285 loops in both cases, not 225. The total amount of data processed was 25.544 (binary) gigabytes.
Admin
The UNIX-HATERS Handbook
Admin
Hooray for muds/mushes.
Admin
Was just thinking: perhaps this regex is more what they intended:
[[:punct:][:space:]]*d[[:punct:][:space:]]*a[[:punct:][:space:]]*m[[:punct:][:space:]]*n[[:punct:][:space:]]*
and so on. This way it only blocks "damn", " d a m n ", "/d/a/m/n/", etc.
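A quick way to check a pattern of this shape, assuming a `*` after every separator class, is sketched below. The `matches` helper is a made-up name. Note that the pattern is still unanchored, so it can match across a word boundary, as the last test string shows.

```shell
# Optional run of punctuation or whitespace between the letters.
sep='[[:punct:][:space:]]*'
pattern="${sep}d${sep}a${sep}m${sep}n${sep}"

matches() { printf '%s\n' "$1" | grep -qE "$pattern"; }

for s in "damn" " d a m n " "/d/a/m/n/" "dalmatian" "madam, no"; do
  if matches "$s"; then
    echo "blocked: [$s]"
  else
    echo "allowed: [$s]"
  fi
done
```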
Admin
Thank Christ you put asterisks on those horrible four letter words! The last time I saw the letter "u" in the middle of the otherwise respectable "f", "c", and "k", I shat my pants.