- Feature Articles
- CodeSOD
-
Error'd
- Most Recent Articles
- Secret Horror
- Not Impossible
- Monkeys
- Killing Time
- Hypersensitive
- Infallabella
- Doubled Daniel
- It Figures
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
Admin
Some people, who use regular expressions, know what they are doing and when they use them they go from one problem to no problems.
Some people never tire or repeating a remark which is only valid in a limited context. It's not funny, it's not generally true and, if you really think using regular expressions invariably creates problems, then you are ignorant, incomptent or both.
Admin
The real WTF is one of the following: a) they bothered to keep Jed's solution for v2.0 or b) they lied about that, to spare his feelings
What they should have done was to take Jed out and either set him up for some Special High-Intensity Training or reassigned him to, er, another company.
Admin
No, no it's not.
Admin
Admin
If there are 80 search terms on the 77th appears inside bigstring, your code will run
77 times. Each in O(n), and we assume n is very large here. I'm guessing itself is a character-by-character (short-circuiting) comparison. So we could say your solution runs in O(n (bigstring length) * m (num of search terms) * k (length of search terms, assuming for simplicity they are all in the same length) )A regex match, on the other hand, will only run on bigstring once. O(n). Not even O(n*k) since it will match characters against the regex automata as it goes. Granted, taking branches in the automata is more expensive than character comparison, but in general the regex solution does have a significant performance advantage here.
Admin
Admin
For shits and giggles (after a second look): Your function returs 0 (bool false) for no matches and any matches will be non 0 (bool true) since it is expecting a boolean return, correct?
Admin
Very true, except that this is a perfect illustration of the "two problems" scenario.
Admin
TRWTF is that almost everyone here seems to be incapable of solving this problem efficiently. Hint: index the big string somehow, search for terms until first found! Sheesh! How hard can it be?
Admin
Depends. For matching without any captures, it can be blazing fast once the expression is in memory (even if it's compiled, it has to load the compiled expression). With captures or groups... Doing it without regular expressions can be MUCH faster (I cut SECONDS off a service call in our application simply by removing two regular expressions that were used once each).
Admin
Pattern.quote() and word boundary are your friends.
Only build a hash from the body text, if you plan to search exactly the same text 100s of times, as building the hash is expensive.
On the other hand if Jed had used Pattern.quote(searchTerm[i]) and word boundaries then solution is efficient and elegant. - assuming you are OK with (regex idea of word boundaries)
Admin
If you have access to system(), why not just call grep directly?
Admin
Nope - Boolean is a defined type in VB - though it will implictly cast that way. The code I have given is VB.Net incidentally, framework 4...
Admin
For regex phobic, there is also Boyer Moore algorithm,
but Jed was close - just not quite there.
Admin
"b) they lied about that, to spare his feelings"
c. Anyone who thinks the original code is acceptable doesn't understand the second version and lied to spare everyone's feelings.
Admin
A quote from the article:
jaja! Reminds me of this joke image from XKCD (a comic on-line) [image].Admin
You picture thanks! me say.
Admin
Is that you, Santor?
Admin
Admin
this kind of shit makes me glad I write device drivers and VHDL.
Admin
TRWTF is that the phrase XML has not come up in thye comments yet. Captcha: augue.
Admin
Captcha: secundom = those dummies who try to post "frist" after someone else already has.
Admin
This kind of shit makes me glad I code in Ruby.
Captcha: sino (actually, Ruby is Japano)
Admin
Admin
...that's enough for now. Stay tuned for the rest of easyk's Adventures in V.H.D.L.!
Admin
Admin
P.S. if you ever pop the hood on fgrep, it uses Aho-Corasick internally.
Admin
Not all languages have true booleans. The full if...then syntax guarantees a particular return value. This is not a sheeple thought process.
C evaluates a < b to either 0 or1. It won't even support a < b < c syntax, since that makes (a < b) => 1 which is often less than c.
Furthermore, a device drive might not like 0x0001. It may need 0xffff as its true input.
Admin
Admin
That no laughing matter joke needs to take an arrow to the knee.
Admin
Pfft. Here's in Python:
50% of the lines.
And yes, it short-circuits: the expression inside the any() is a generator which only processes each term as they're consumed by any.
Admin
Stupid Comment 1: "This requires n*n splits".
no it isn't. It performs at most 2*n splits. Or even n+1, as IIRC in VB the loop boundaries are computed once, at the begining.
Stupid Comment 2: "Use hash, sort, bsearch"
no, just put all search terms in hash, then check each word in O(1).
Stupid Comment 3: this comment
Admin
You taught him to wave his cock around.
Admin
Admin
Just the other day I came across a line of VB code reading:
What exactly do we need the iif for?
Admin
I thought the same thing.
And then I thought, What part of my brain is used up remembering theme songs to decades-old television programs? How many can I remember on the spur of the moment? If I could send these files to the recycle bin, could I free up some space for more useful data?
Flintstones, meet the Flintstones ...
... story, of a man named Brady ...
... a three-hour tour, a three-hour tour ...
I still remember some truly obscure ones, like:
It's about time, it's about space ...
Captain Scarlet! Captain Scarlet! he's the one who knows ...
Admin
Umm, wait ...
We have to make assumptions about how the regex processor works, but let's suppose it does traverse the big string just once. It still has to compare a string beginning at each position in the big string to each of the search terms in turn. i.e. number of comparisons is len(bigstring)*count(terms)
The proposed "contains" search will pick each term and for each loop through the big string. Number of comparisons is count(terms)*len(bigstring).
As multiplication is commutative, the number of comparisons is the same in either case. Well, assuming that none of the search terms are found and so you have to search all the way to the end.
Admin
In C#:
or if the Split was really intened:
For large texts you could also throw an AsParallel in between to make it multithreaded.
Admin
Yay! Late-binding FTW!
Admin
Need I point out that Jeb's function will not give the same results as Bob's function? Bob splits the big string at space boundaries and looks for full-word matches. Jeb does not look for word boundaries. Actually, this will probably make Jeb's version slower, as if the average word length is, say, six, then Jeb will do six times as many compares as Bob.
Not to mention that Jeb makes zero effort to escape the strings, so if search terms can contain dollar signs, periods, brackets, etc, etc, he will give incorrect results. (I recently used a function that I thought did a string compare but actually did a Regex search. I wanted to search a template file for the token "[DATE]" and replace it with the current date. This gave rather amusing results.)
Admin
Why do programmers regularly write code that keeps looking for something after they've already found it?
It's like the cliche line, "When I find it, it will be in the last place I look!" Which I normally reply, well duh, why would you keep looking for it after you've found it? But for many programmers, they DON'T find it in the last place they look, because they don't stop looking.
Admin
That "fixed" code is pretty terrible, too.
Um...
And how about all of the +s when & is the real concatenation operator in VB.NET? Why in the world would you ever write Step 1? The If around the For is redundant, too. rx is being declared as an Object - it should be a Regex. And why not just declare it As New Regex anyway? Also, making a MatchCollection is pointless, just take the Count right away. These people need to turn Option Strict On.
Finally, why would you even use joined regular expressions for this in the first place? Especially without escaping? It's not very efficient, and if someone searches for a term with some invalid character, the whole thing blows up.
AND WHY NOT JUST USE Join() INSTEAD OF THE FOR LOOP?
So here's my rewrite:
3 lines. Wow, wonder if they'd hire me.
tl;dr: TRWTF is the new code.
Admin
you're making it too complicated
Admin
Geez, both solutions are pretty crappy... for starters, they both continue checking for subsequent search terms, instead of returning true immediately upon finding one of them...
At least Jeb got rid of the useless "Step 1" from the for loops, though...
Admin
I'm not sure if it would work in VB, nor does Contains takes words boundaries into account...
Admin
But keep posting your ignorance. I'm sure it works in many other languages as well.
Admin
Few lines of [clean, understandable] code IS efficient in developer time - usually one of the most important metrics there is.
Admin
The brilliance about regexs is that they actually can match a string against an arbitrarily complicated regex (excluding back-references and other not-strictly-regex features), in particular against an "or" of many expressions, in O(length(string)). The complexity of the regex only matters for the initial "compilation" step.
If that sounds magic to you, it probably is, just like cars would have been magic to medieval people. (Except that those medieval people couldn't just use the internet to learn about how cars really work.)
But you (and all the other ignorants) just keep posting your boring "two problems" quote without understanding its context.
Admin
Look at the muscles on those arms, they're like hammers! Look, it's that nut who flies around in pajamas!
A nineteen-twenty-eight Porter, that's my mother dear.
There's a scout troop short a child, Khruschev's due at Idlewild.
People yakkety-yak a streak and waste your time of day.
We get the funniest looks from everyone we meet.
Dobie wants a gal who's creamy, Dobie!
Falling, falling, into the hat he fell, spinning, turning, whirling, twirling...Down! Down!
But you gotta love the Animaniacs take on "Mary Tyler Moore":
Who can turn the stove on with her smile? Who can take a bubble bath And suddenly fill it with crocodiles?