Admin
Assuming that the sub should return a ref and that adjacent tabs should produce a blank entry (as when parsing a tab-separated-value file, for example):
sub split_at_tabs { return [split("\t", shift)]; }
Admin
Hm, maybe so. But the latest Java example I saw in the thread was:
String[] parts = str.trim().split("\\s*;\\s*");
Which is not nearly as suckish as the Perl, but still kind of lame.
Admin
Hmm, I think the problem is that Girlfriend has registered a listener for Chatup events.
By the way, does the Fiance class cause an additional drain on resources when compared to Girlfriend? I'm finding that since a call to the RelationshipFactory class my idle time has all but disappeared and subsequent profiling shows an increase in activity around Shopping related methods.
Anyway, must go as, like the DBA on my team, I like to take a full dump before leaving the office.
Admin
Through use of the term "contiguous", and no context, I'd assume you're talking to me if there was a C function called split()
Admin
I just ran my micro benchmarks using Java 1.3 (omitting the String.split of course, which doesn't exist), and the StringTokenizer implementation seems to have made big strides since then. I didn't realize how much (I haven't done much Java programming on strings for a while).
Since 1.4, it appears to be close to 200x faster than it was.
So if you started coding with 1.2, as I did, and avoided using StringTokenizer because it was so slow, as I did, you would likely fall on String.split as the answer to your prayers, while not realizing just how much better StringTokenizer has become.
Hence the cries of "WTF!" from the more grizzled Java programmers.
Admin
What did you say about my mother?...
I hate Perl :( (although I know about its power!)
Admin
Makes no difference, Perl treats the first parameter as a regex, whether you specify it that way or not.
Admin
To me, "split_at_tabs($line)" means "junior programmer wrote a f***ed up split function that I'm going to spend time and coffee trying to bugfix, or at least prove correct" while "split /\t/, $line" means "split line on tab characters, creating empty fields for leading and consecutive tab characters but no fields for trailing tab characters."
If I work on 5 different Perl projects, "split_at_tabs" may mean 6 different things, while "split /\t/, $line" means exactly the same in each. Sometimes that kind of ambiguity is desirable, e.g. if there were a lot of parser classes in the project and each one had a different way to split at tabs, then I'd expect split_at_tabs to be a method defined differently over a group of classes to implement the correct polymorphic behavior on each one; however, at this point the project is long past questions of readability...
Imagine what you said in another language:
Part of the problem is that "split_at_tabs" is a bad function name. When I read that I immediately think of three implementations, and the function name doesn't provide a hint as to which:
If the function splits lines in some specific or non-trivial way, then it needs to be named after the requirement for the function (e.g. "split_nntp_xover_line" or "split_csv_line" or "split_psql_copy_table_output_line", which name file formats that provide a context for what we can expect from splitting lines).
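The behavior described above for "split /\t/, $line" (empty fields for leading and consecutive tabs, none for trailing tabs) can be approximated in Python. This is only a sketch: the helper name `perl_like_split_tabs` is hypothetical, and it emulates the described Perl semantics rather than reproducing any code from the thread.

```python
from typing import List


def perl_like_split_tabs(line: str) -> List[str]:
    # Hypothetical helper: approximates Perl's split on a tab pattern.
    # str.split keeps empty fields for leading and consecutive tabs.
    fields = line.split("\t")
    # Perl drops trailing empty fields by default; Python keeps them,
    # so remove them explicitly.
    while fields and fields[-1] == "":
        fields.pop()
    return fields


print(perl_like_split_tabs("a\t\tb\t\t"))
```

The point of the sketch is that a name like split_at_tabs says nothing about which of these edge cases the function handles, whereas the builtin's rules are fixed.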
Admin
For split there's no difference between /\t/ and "\t", the doc for split says specifically that the first argument is a regex.
You don't need map or grep; if you want to treat multiple tabs as a single delimiter, you can use \t+ in the regex.
If you want to store the result in an array reference you can build it on the fly:
Ergo:
PS. Yes, there's more than one way to do it ... and the elegant one is preferred
Admin
In C#:
string[] tokens = str.Split( ';' );
Java sucks.
Admin
Lol, 2 pages of arguments about the best way to split a string on a single-character delimiter in java...
I change my mind.
Java REALLY sucks.
Admin
Because TWTOWTDI holds true even in Java?
Plus, most of the argument is about perl. Go figure.
Admin
This is so much better than the rest of the thread.
Admin
I’m flabbergasted. You just penned tomorrow’s DailyWTF submission, right in the comments of DailyWTF itself… because you are wrong on every single count. Ugh. (Congratulations?) Sorry, dude. :)
1. Capturing parens mean “keep this part of the delimiter.” So that split isn’t just splitting on tabs, it’s also returning every tab as an element of its own.
2. split always uses a regex. If you pass it a string, the string will be interpreted as a regex. There is only one special case, which is if you pass it a string consisting of a single space.
3. The appended tab is possibly a misguided attempt to force split to return trailing empty fields (by default, they are dropped).
4. There won’t be any empty elements in the resulting list. The map block returns an empty list for tabs, which results in nothing being added to the output list. You can return any number of elements from each iteration of a map, including 0.
5. You didn’t get this one wrong… but you didn’t make much of a point in it either.
6. Sorry, but because you were wrong in #1, you’re also wrong in this.
Did you actually test your code?
Let’s clean this up. Starting point:
First, “I want trailing empty fields” (I can only assume this is what the original programmer wanted) means passing -1 to split:
Second, don’t use map to filter out elements from a list. That’s grep’s job.
But why capture tabs only to throw them out?
That array is unnecessary:
Well, that’s not much of a function. Just inline it in the caller:
So that’s 4 things wrong:
1. Use of capturing parens to keep the tabs, only to then throw them out.
2. Useless appending of a tab to the input string.
3. map used in place of grep to filter a list.
4. Unnecessary function written for a job for which an inline call to a builtin function would suffice.
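The first and third points can be illustrated by analogy in Python, whose `re.split` also returns captured delimiters. This is not the Perl code under discussion, only a sketch with a made-up sample line. One difference to keep in mind: Python's `re.split` keeps trailing empty fields by default, so Perl's -1 limit has no counterpart here.

```python
import re

line = "a\tb\t\tc"  # sample input, assumed for illustration

# A capturing group makes re.split return the delimiters as elements too.
with_tabs = re.split(r"(\t)", line)

# Filtering the captured tabs back out afterwards ...
filtered = [part for part in with_tabs if part != "\t"]

# ... yields the same list as never capturing them in the first place.
plain = re.split(r"\t", line)

print(with_tabs)
print(filtered == plain)
```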
Admin
This kind of bad attitude is why so many people dislike Perl.
Admin
I've read so many string related WTFs recently, that I have decided to construct my own from the screaming scars they've left on my consciousness:
Caveat: The above is written using Visual C++ 7 and you compile it at your own risk, the author is not liable for any subsequent damage to any memory location that is incurred with its use.
Admin
hahaha
Admin
The short, equivalent java way to do it is:
s.trim().split("(\\s*;\\s*)+");
Admin
So much easier in Mumps. There is a $P function. It can be used to set or get a specified piece of a string. It can be used in conjunction with $L which tells you the number of pieces the string contains.
Admin
Ahem....
$ perl -e 'print split /,/, "foo, bar, baz";'
foo bar baz
Admin
It just means you did not understand what the original code does.
Admin
Okay, let's summarize:
The input: a string like "foo, bar , ,,,zab,,lalab,,"
The output: a list, like [ "foo", "bar", "zab", "lalab" ]
Python solution:
Perfectly readable if you understand python's [ x for x in a if cond ] syntax.
Ruby Solution:
Again, perfectly readable. I can't see any Java solution with less than 4 lines.
CAPTCHA: we are DOOMed
Admin
(ok, small brain fart, I forgot the input.split(",") in Python's solution)
CAPTCHA: will you spare me for this, DUBYA?
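The commenter's own Python code did not survive in this thread, so the following is only a sketch of what a list-comprehension solution for the stated input and output could look like. The function name `split_and_clean` is an assumption made for illustration.

```python
def split_and_clean(text, sep=","):
    # Hypothetical helper: split on the separator, strip each piece once,
    # and keep only the pieces that are non-empty after stripping.
    stripped = (piece.strip() for piece in text.split(sep))
    return [piece for piece in stripped if piece]


print(split_and_clean("foo, bar , ,,,zab,,lalab,,"))
```

Stripping inside a generator first and filtering second means each piece is stripped only once, rather than once for the test and once for the result.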
Admin
just realized: if you change the order between map and find_all in my ruby solution, you avoid calling strip twice.
Admin
Unless setXXXXXX is being called many times per second, the difference between the performance of StringTokenizer and the performance of a regex split is negligible. And let us not forget that a regex split doesn't have to be str.split(";"). It can also be done with a single private Pattern object, allowing the code to do pattern.split(str). I bet that will affect your benchmark. (Accomplished Java programmers know that String.split creates and destroys a Pattern object with every call, so frequent calls to String.split using the same delimiter should be replaced with calls to a single Pattern instance.)
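The compile-once idea carries over to other regex libraries. Below is a Python analogue of holding a single pattern object and calling its split method, offered as a sketch rather than as the Java code in question; the names `SEMICOLON` and `split_fields` are assumptions. Python's `re` module also caches recently compiled patterns internally, so the saving is likely smaller there than the one described for Java.

```python
import re

# Compiled once at module level and reused for every call,
# analogous to keeping one private pattern object around.
SEMICOLON = re.compile(r"\s*;\s*")


def split_fields(value):
    # Hypothetical helper: trim the ends, then split on semicolons
    # together with any whitespace surrounding them.
    return SEMICOLON.split(value.strip())


print(split_fields(" a ; b;c "))
```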
Admin
guess I didn't understand it either.
contrary to what some PP said, it's the unintelligible syntax that makes me dislike Perl, not the friendliness (or lack thereof) of its fans
Admin
Just four?
Admin
If you don't know, then RTFM before you comment. Yes, the Perl regex engine is indeed smart enough to process any regular expression with only the amount of processing implicit in the complexity of the expression. It does not need to be optimized away--it just needs to be provided with thoughtfully designed search patterns.
[Unlike the Captcha machine, which could use a spell-checker. The US Supremes have never to my knowledge, issued an infamous "Dread Scott" decision, so the proper historical allusion is "Dredlocks". ...whatever the Firefox idiot machine may say to the contrary--RTFH, and start with "Marcus Garvey"]
Admin
It's got nothing to do with Perl specifically. Like others have pointed out, if you don't find idiomatic constructs like
readable enough for you, and instead have to wrap them with functions like
then you have no business programming in ANY language.
I dislike ugly Perl code as much as anyone (especially since good Perl code is very easy to write...) - but an inline call to split() is absolutely not an example of it.
Admin
Could well be, but that's beside the point. The original suggestion was that using "\t" instead of /\t/ would be more efficient because it would use a constant string match, not a regex match. This is not true.
Regardless of what the underlying implementation does to optimise them, both expressions will be treated identically.
Admin
Especially when all arrive at screwy, non-portable deadweight. Unless you are creating and consuming an array in a single scope, there is rarely good reason to create an explicit array in Perl. Here, in one attempt:
sub split_and_trim_at_tabs($string) { return $records = [split /\s*\t\s*/, $string]; }
BTW: the return statement is really a comment, since Perl always returns the last expression evaluated.
Admin
What the fuck? It shouldn't even be a function.
Unless of course you're fond of hardcoding parameters as part of the function name. After all, we all know that two_plus_two(); is way easier than 2 + 2, right?
Admin
Hmmm. In which language?
[code]
my_prompt>perl -w
use strict; #always
use warnings; #ditto

my $string = "Hi, I am string,, which may or may not contain null strings, particularly between commas,, which doesn't make much grammatical sense, but does illustrate the actions of the Perl split().";
my $tokens = [split /\s*,\s*/, $string];
foreach (@$tokens) {
    print "token is $_\n";
}
^Z
token is Hi
token is I am string
token is
token is which may or may not contain null strings
token is particularly between commas
token is
token is which doesn't make much grammatical sense
token is but does illustrate the actions of the Perl split().
[/code]
Admin
OMG! Whoops, I've been away from Perl for too long, luxuriating in the land of PHP, with its standard function signatures. That should be:
Haven't seen the responses to this WTF yet, but I do indeed quake with trepidation at the flames to come.
Admin
Lexical variables (declared with my) are reallocated every time the scope in which they are declared is re-entered. (That also means that if you declare a my variable inside a loop and then store a reference to that variable on every iteration, you’ll have references to 20 different variables after 20 iterations, not 20 references to the same variable. Very handy.)
Admin
Btw, why are you trimming whitespace when the original function didn’t? Do you know what the format of the data looks like, and that fields should be trimmed?
Admin
Perl... Java... who needs them if you have Haskell?
Admin