Admin
Why? There is nothing in the original implementation that suggests that. In fact, searchText.Split(" ") pretty much says that no, that is not what we are doing here.
Why are people still pissing all over each other trying to solve the substring problem when this isn't one???
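The word-versus-substring distinction can be sketched in Python. The original code is VB.NET, and these function names are hypothetical. Splitting on spaces turns the check into a whole-word test, which behaves differently from a substring search:

```python
from typing import Iterable


def contains_any_word(search_text: str, search_terms: Iterable[str]) -> bool:
    """True if any search term equals one of the space-separated words."""
    words = set(search_text.split(" "))  # mirrors searchText.Split(" ")
    return any(term in words for term in search_terms)


def contains_any_substring(search_text: str, search_terms: Iterable[str]) -> bool:
    """True if any search term occurs anywhere, even inside a longer word."""
    return any(term in search_text for term in search_terms)


text = "please concatenate these"
print(contains_any_word(text, ["cat"]))       # False: no word is exactly "cat"
print(contains_any_substring(text, ["cat"]))  # True: "cat" is inside "concatenate"
```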
Hats off to Mark Bowytz for this WTF. It's a 3 bagger.
Bob's code is a WTF. Jeb's solution is a bigger WTF (it fails basic regression tests). Most solutions posted here FAIL.
WTF?
Admin
Lol @ this all. Why doesn't anyone study data structures and algorithms anymore?
Seriously. Want to persistently and incredibly quickly sort and search strings and substrings? Ever heard of a trie? A Judy array, specifically. I mean seriously! TRWTF is that this is (IMO) basic algorithmic stuff that 99% of TDWTF commenters are failing at.
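Here is a minimal Python sketch of the trie idea, assuming whole-word lookup as in the original Split(" ") code. This is a plain dictionary-of-children trie, not a Judy array, and the class names are hypothetical. A lookup costs time proportional to the word's length rather than to the number of stored terms.

```python
from typing import Dict, Iterable, Optional


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_terminal = False  # True if a stored word ends at this node


class Trie:
    def __init__(self, words: Iterable[str] = ()) -> None:
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_terminal = True

    def _walk(self, key: str) -> Optional[TrieNode]:
        # Follow one edge per character; None means the path does not exist.
        node: Optional[TrieNode] = self.root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word: str) -> bool:
        node = self._walk(word)
        return node is not None and node.is_terminal

    def has_prefix(self, prefix: str) -> bool:
        return self._walk(prefix) is not None


terms = Trie(["foo", "foobar", "baz"])
text = "the foobar quux"
print(any(terms.contains(w) for w in text.split(" ")))  # True ("foobar")
```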
Admin
Each to their own.
I personally get pissed off when some douche bag spends a full day implementing some special x, y, z, followed by days of testing and modification to get it working 'just right', when in 15 minutes I created a simple n^2 loop, threaded it and moved on with only 5 lines of code to ever worry about should there be problems down the track.
IMO you have to PROVE that the naive way is too slow with the given requirements before you even think about implementing a non-trivial alternative. I note you haven't.
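In the spirit of proving it first, here is a minimal Python timing harness that assumes whole-word matching. The function names, input sizes and random data are hypothetical. Meaningful numbers should come from production-like inputs, not from this sketch.

```python
import random
import string
import timeit


def naive(words, terms):
    # O(len(words) * len(terms)) nested loop: the "simple n^2" approach
    for term in terms:
        for word in words:
            if word == term:
                return True
    return False


def hashed(words, terms):
    # Build a set once, then O(1) average-case lookups per term
    word_set = set(words)
    return any(term in word_set for term in terms)


def random_word(rng, length=6):
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def benchmark(n_words, n_terms, repeat=5, seed=0):
    rng = random.Random(seed)
    words = [random_word(rng) for _ in range(n_words)]
    terms = [random_word(rng) for _ in range(n_terms)]  # hits are unlikely: near worst case
    assert naive(words, terms) == hashed(words, terms)  # same answer before timing
    t_naive = min(timeit.repeat(lambda: naive(words, terms), number=10, repeat=repeat))
    t_hashed = min(timeit.repeat(lambda: hashed(words, terms), number=10, repeat=repeat))
    return t_naive, t_hashed


for n_words, n_terms in [(10, 3), (1000, 50)]:
    t_naive, t_hashed = benchmark(n_words, n_terms)
    print(f"{n_words} words, {n_terms} terms: naive {t_naive:.6f}s, set {t_hashed:.6f}s")
```

With small inputs the two may be indistinguishable, which is the point: measure at the sizes you actually expect.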
Admin
Ah, the Sunk Cost fallacy. The manager is a fail.
Admin
Is there no Regexp Pattern quoting in that language?
In Java I would expect someone to at least call Pattern.quote(searchstring) before concatenating them all with pipes.
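Python's rough counterpart to Pattern.quote is re.escape. The following is a minimal sketch, with a hypothetical helper name, that escapes each term before joining the terms with pipes. It still matches substrings, not whole words.

```python
import re
from typing import Iterable


def build_alternation(search_terms: Iterable[str], ignore_case: bool = True) -> "re.Pattern[str]":
    # re.escape plays the role of Java's Pattern.quote: metacharacters such as
    # '+', '(' or '.' in a term are matched literally instead of as regex syntax.
    escaped = [re.escape(term) for term in search_terms]
    if not escaped:
        return re.compile(r"(?!)")  # empty alternation would match everything; never match instead
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile("|".join(escaped), flags)


pattern = build_alternation(["c++", "a.b"])
print(bool(pattern.search("I write C++ daily")))  # True: '+' is literal
print(bool(pattern.search("axb")))                # False: '.' no longer matches any char
```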
Admin
By the same token:
For i As Integer = 1 To paramTable.stackHeight() - 1 Step 1
    If searchTable.Contains(paramTable.Pop()) Then
        truthTable.Push(True)
    Else
        truthTable.Push(False)
    End If
Next
Should be
For i As Integer = 1 To paramTable.stackHeight() - 1 Step 1
    truthTable.Push(searchTable.Contains(paramTable.Pop()))
Next
But on the whole I actually prefer the first version. TRWTF is VB. VB is a toy and has no place in professional programming.
Admin
Agree with your solution, but Ruby has a function in the standard library for creating a regexp from a list of strings.
Admin
The purpose of a code review is not to determine if the solution is elegant.
A solution should work, should be maintainable, and should have reasonable performance.
If the first solution worked, if a novice programmer could read the code and understand it, and if the solution did not create a noticeable performance issue, then changing to a different solution is risky with no benefit.
Admin
It should probably flag up where it's shit though.
Admin
I'm glad you code in Ruby. More ignorant fools should code in Ruby just like you do, and then we'd only have Ruby code to ignore.
Even if your Ruby were good code, which it obviously is not, it'd still be slower than a lot of alternatives.
Seriously... people like you should just quit coding. Completely.
Admin
AAAAnd you're fired too. Friggin' tourists posting code -- wtf.
Admin
Well, if we're going to make assumptions, it's safe to assume that if you're programming in a GOOD language (not VB), your regex processor has been written by people who know a good deal about it, and will thus be way more efficient than anything you would come up with.
Now stop being ridiculous, use the damn regex WHICH IS MUCH FASTER THAN LOOPED CONTAINS, and be done with it.
Yes, many people get things wrong etc., but if you're not smart enough to see that Contains sucks balls and that regex is better unless implemented by someone like you (which luckily is not the case), stop thinking, stop discussing, and use the damn regex.
Admin
AAAnd another loser. Seriously... how can you possibly post that crap after so many people have posted (somewhat) informed replies??
Admin
And again... another topper-idiot...
Admin
And morons like you would know... because?
This code is not just clean and understandable, it's also an excellent example of retarded, slow, inefficient bullcrap, slower by at least an order of magnitude.
By writing it correctly once, you save much more developer time than by accepting a crappy version in the first place.
The fact that devs who can write it correctly are few does imply some productivity limits, but on the other hand you won't spend 100x the dev time fixing bugs and the like.
Admin
Suddenly, a search term with a regular expression in it.
"(x+x+)+y" anyone?
http://www.codinghorror.com/blog/2006/01/regex-performance.html
Admin
You, sir, are ignorant.
The very fact that you can ask such a question shows that you cling to the myth that you can "code".
Please, do acknowledge your total inability to do such an activity and tell your brethren that they too should renounce coding forever.
Admin
Another one bites the dust? Seriously. Don't code, don't post code, don't talk about code.
Admin
Threading is no solution either, since the number of CPU cores will hardly grow as fast as typical data sizes grow in the foreseeable future.
Admin
Please show some sensitivity. I had a son who made a ridiculous application once. It allowed hymns to be selected in real-time during a funeral service. And let me assure you: it was no laughing matter.
Admin
Except that today we virtualize everything, and those performance improvements are at least 10x, not just 2x. That means 10x higher server density or 10x longer-lasting batteries, so yes, it matters.
Note: the 10x factor was pulled out of my ass, and it's pretty CONSERVATIVE compared to reality (I've often improved the speed of existing software by more than 100x, very rarely by less than 2x). Also, it does not take into account stupid slow languages like Ruby or node.js or other FOTM funky-looking, nice-paradigm, slow-as-shit stuff.
Admin
... and besides, if the joke is already lame, what good does it do rendering it lame?
Admin
And, like the man said, it's more than likely that a library function is by-and-large pretty damn efficient, and better brains than yours have spent considerable effort improving it over more time than you have. If you can demonstrate that a given popular library algorithm is really underperforming compared with your own implementation, then go to it - but I won't believe it till I see it.
Admin
Not that hard: for (@searchTerms) { return 1 if $searchText =~ /\Q$_\E/i } (drop the i flag if you want case-sensitive matching.)
Admin
You know, L., one of the main problems with being a complete prick is that even if you're right, no one listens.
Admin
Please stop enumerating the hundreds of possible much better solutions. They are just VB.NET developers, so praying for their souls is the best thing we can do.
Admin
It seems our little post-whore L. loves to throw himself under every bridge he passes by. Even if they are only being tended by billy goats.
Admin
Might be why I like C#...
If I really had to use regex, I'd use ex.IsMatch(...) instead of ex.Matches(...).Count > 0, since it stops evaluating after finding the first match. No need to process the whole string at all :-)
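Python's re module has the same distinction. The snippet below is only an analogy, not the .NET API, and the pattern and text are made up. re.search returns at the first match, while counting the results of findall forces a scan of the whole string.

```python
import re

pattern = re.compile(r"foo|bar")
text = "foo " + "x" * 100_000 + " bar"

# Analogous to Regex.IsMatch: re.search stops at the first match it finds.
found_early = pattern.search(text) is not None

# Analogous to Matches(...).Count > 0: findall scans the whole string and
# builds a list of every match before its length is tested.
found_late = len(pattern.findall(text)) > 0

print(found_early, found_late)  # True True
```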
Admin
Or even as an extension method...
The call would be like:
Admin
Oh, by the way, assuming n >> p, that hash/sort/binsearch solution winds up being O(n log n) anyway due to the sort step. How is that better than O(m + n) (i.e. linear time)? Also, all it would take to make Aho-Corasick or Rabin-Karp match on word boundaries is to bracket each of the keywords being searched for with spaces. Case-insensitive searching can be implemented by case-normalizing the input in a preprocessing step (this gives you a chance to do other things too, such as tab expansion, Unicode normalization, ...). You still haven't changed the big-O complexity of your algorithm with this. Also, you don't have to modify the implementation of the algorithm itself, which allows you to use a known-good implementation (say from a library, or just by invoking fgrep).
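Here is a compact Python sketch of Aho-Corasick along the lines just described, with case normalization and space-bracketing done as preprocessing rather than inside the algorithm. It is illustrative only. The class and function names are hypothetical, and it assumes words are separated by single spaces, as with Split(" "). In practice a tested library or fgrep would be the safer choice.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


class AhoCorasick:
    """Multi-pattern matcher: one pass over the text, O(n + m + matches)."""

    def __init__(self, patterns: Iterable[str]) -> None:
        self.goto: List[Dict[str, int]] = [{}]  # trie transitions per state
        self.fail: List[int] = [0]              # failure link per state
        self.out: List[Set[str]] = [set()]      # patterns recognized at each state
        for pattern in patterns:
            self._add(pattern)
        self._build_failure_links()

    def _add(self, pattern: str) -> None:
        state = 0
        for ch in pattern:
            nxt = self.goto[state].get(ch)
            if nxt is None:
                nxt = len(self.goto)
                self.goto.append({})
                self.fail.append(0)
                self.out.append(set())
                self.goto[state][ch] = nxt
            state = nxt
        self.out[state].add(pattern)

    def _build_failure_links(self) -> None:
        # BFS so that every failure target (a shallower state) is finished first.
        queue = deque(self.goto[0].values())  # depth-1 states fail to the root
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                self.out[nxt] |= self.out[self.fail[nxt]]

    def any_match(self, text: str) -> bool:
        state = 0
        for ch in text:
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            if self.out[state]:
                return True
        return False


def contains_any_word(search_text: str, terms: Iterable[str]) -> bool:
    # Preprocessing only: lower-case everything and bracket each keyword
    # (and the text) with spaces so matches land on whole words.
    matcher = AhoCorasick(f" {t.lower()} " for t in terms)
    return matcher.any_match(f" {search_text.lower()} ")


print(contains_any_word("The Quick fox", ["quick"]))  # True
print(contains_any_word("quickly now", ["quick"]))    # False
```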
CAPTCHA: feugiat
Admin
Ah, here come the VB insults. I'd be willing to bet that more business value has been delivered with VB than all other languages combined.
Admin
I believe COBOL may trump it.
Admin
Only comment I have is that it's possible that the compiler will optimize the complexity away. Maybe.
Also, it might be that N is a small number. If that's the case, N*M may win.
We have no idea what the previous solution was. It might have been even worse. Or it might have been a technically better algorithm, but as I've found, sometimes the 100x boost you get from using library routines in an interpreted language wins.
Choosing regex over strstr is the sign of a programmer whose priorities are out of whack.
Neither solution attempts to strip extraneous characters.
The real WTF here is not forwarding a copy of the email to my boss with the comment:
'You want me to waste any time on this guy? Because I sure don't.'
Admin
Yes, it absolutely is more readable, and the compiler will make the end result to be the same, so what's your problem?
Admin
Pfft. Here it is in C:
It's half as many lines as the python version.
Admin
If my team spent this much time pontificating over such a triviality I'd shoot them all
Admin
You seem jealous that Python programmers don't have to spend most of their time rewriting functions that were probably invented in Charles Babbage's time.
I personally think Python should have MOAR built-in functions, the more the merrier.
Admin
Perhaps my point was too subtle.
I'll show you all of the code that the C version uses, including "some_function_someone_else_wrote()" and any other functions that end up (directly or indirectly) called by it; if you show me all of the code that the Python version uses, including the code for "any()" and any other functions that end up (directly or indirectly) called by it.
Then we'd have a fair comparison. I'd probably be posting about 50 lines of C, and you'd probably be posting several thousand lines of C (and a little Python).
If you're going to suggest that a line of code ceases to be a line of code if someone else wrote it (e.g. it exists in a library or as a built-in), then I'm going to suggest that all problems (regardless of how hard) can be solved with 1 line of C (by finding someone to write it for you).
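The Python documentation describes any() as equivalent to the short loop below, renamed here to avoid shadowing the built-in. CPython's actual built-in is implemented in C, so this shows the semantics, not the real source.

```python
def my_any(iterable):
    # Short-circuits: returns as soon as one truthy element is seen.
    for element in iterable:
        if element:
            return True
    return False


print(my_any(term in {"foo", "bar"} for term in ["baz", "bar"]))  # True
```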
Admin
You sound like the kind of person who sits around pondering how you can optimize "Hello World" by eliminating the massive overhead in stdio.h.
Besides, for your comparison to be really fair, you'd need to include the entire source code for the compiler, assembler, linker, and any included source files or linked libraries, not to mention the operating system and hardware drivers (and don't forget the shell you launched the compiler from!). After all, you didn't write any of those things; you're just taking advantage of other people's work in your "one line program".
Admin
I'm only pointing out the futility of comparing languages based on lines of code. You're helping to prove my point.
You seem like the sort of person whose deductive reasoning has atrophied. Because it's not polite to poke fun at disadvantaged people, I won't suggest that this is caused by spending all your time gluing together other people's code (rather than actually writing code and needing to think).
Admin
Such bravado.
You're completely neglecting prefactors, and have no knowledge of how big N is. If N is typically small, then your O(log N) solution with its (typically) large prefactor will be slower than a simple O(N) solution.
If 99.9% of the searches are for short words in a short list of words, then you've wasted valuable developer time for a negligible performance gain, or even a possible performance loss.
Get it working first. Optimize it later, when you know what the actual bottlenecks are. Critical thought and analysis is better done with real-world data.
Admin
Besides, his "O(log N)" solution is really an O(N log N) solution. (Blame the sort step.) So, even if this was a performance bottleneck in the production system, he's wasting his developer time trying to tinker with something he's pulled out of his hat while yielding a result that is still not properly optimized.
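To make the complexity point concrete, here is a hypothetical Python sketch of a sort-then-binary-search approach. The commenter's actual code isn't shown. For n words and m terms the total cost is O(n log n + m log n), so the sort, not the O(log n) lookup, sets the bound.

```python
import bisect
from typing import List, Sequence


def any_term_in_words(words: Sequence[str], terms: Sequence[str]) -> bool:
    # Sorting costs O(n log n) for n words and dominates when there are few
    # terms; each lookup afterwards is only O(log n).
    sorted_words: List[str] = sorted(words)
    for term in terms:
        i = bisect.bisect_left(sorted_words, term)
        if i < len(sorted_words) and sorted_words[i] == term:
            return True
    return False


print(any_term_in_words("the quick brown fox".split(" "), ["cat", "fox"]))  # True
```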
TL;DR: He could have gotten an O(N) solution (yes, linear time for the entire problem) without jumping through all those hoops, considering that Aho-Corasick has been implemented quite a few times over already.