Admin
split() is a very slow function built on regex patterns. Using StringTokenizer is far more efficient. This probably isn't an issue in most applications, but applications that need performance can't use String.split().
Jake is probably a nice guy, but it looks like he doesn't know anything about Java development...
Admin
What idiot wrote that Perl code?
It should be:
Admin
Why are you returning a reference?
would work just as well, and let you say
Admin
It's not even used :s nothing is returned or am I missing something...
Admin
Why was that perl code even a function? It is a one-liner in perl, and a really short one at that, AND a really common one anyone who uses perl should instantly recognize.
split /\t/,$line
Done. What does a function gain you?
Admin
Is the name of that function being setXXXXXX the result of anonymization? Because it's pretty obvious that it used to be setBureauLocations().
Admin
It's amusing that in a function (built in or otherwise) that's designed to throw away all the tabs, so much effort goes into keeping them:
my @_record = split(/(\t)/,$line."\t");
why the capturing parens in the regexp? Why append a tab character to the end?
my @record = map { $_ eq "\t" ? () : $_ } @_record; #now that you've split it on tabs, why would there be any tabs in the array? Why create a whole new array @record when you could just reassign it to @_record?
Anyway, that's four, plus the fact that you should just use the built-in split function in the first place. Well, as the Perl mongers say, "there's more than one way to do it." I'd never really considered that some of those ways are not good.
Admin
Too bad java isn't object oriented or anything. You have to store all data in a string and parse it out like that every single time.
/jerk
I suppose this "could" be the start of loading everything into an object, but I'm not that naive.
Admin
When reading code, I understand split_at_tabs() a lot quicker than split /\t/,$line.
For a very small price in performance (calling a function) you get a lot in readability and maintainability.
Admin
Hey, I'm all for maintaining readability, but if split /\t/,$line is that hard for you to read, code readability isn't your problem.
Admin
then never code in perl. You are not suited to it :)
Admin
The original function trims the individual fields and does not count empty fields. It also uses a semicolon as the field delimiter.
As of Java 1.4, it can be replaced with:
String[] parts = str.trim().split("\\s*;\\s*");
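A minimal sketch of how that one-liner behaves, written with the backslashes doubled as a Java string literal requires. The class and method names (SemicolonSplit, splitFields) are made up for illustration. Note that split() only drops trailing empty strings, so empty fields in the middle of the input survive; if the original function really skips those, this is close but not an exact replacement.

```java
import java.util.Arrays;

class SemicolonSplit {
    // Hypothetical helper: trim the whole input, then split on semicolons,
    // swallowing any whitespace around each delimiter.
    static String[] splitFields(String str) {
        return str.trim().split("\\s*;\\s*");
    }

    public static void main(String[] args) {
        // Whitespace around fields is removed by trim() plus the \s* in the pattern.
        System.out.println(Arrays.toString(splitFields("  a ; b;c  ")));
        // Trailing empty fields are dropped, but the interior one is kept.
        System.out.println(Arrays.toString(splitFields("a;;b;;")));
    }
}
```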
Admin
The name split_at_tabs, IMO, is worthless when a one-liner would do, especially one as common in idiomatic perl as split. Better to name the function in a way that gives a purpose and isn't wrong if they change from tab-separated to comma-separated, eg. extract_bureau_locations().
Admin
First of all, the perl-code and the java-code are not equivalent (the java-version has error-checking, trims each part and removes empty elements).
Second, the java-version can be made simpler (java has a library function converting Vector to array which can replace the last 6 lines, has been in java since 1.2)
Third, if you want functionality like the perl-version, you can get it down to 6 lines as well:
Captcha: there is nothing like a flame-war at the end of the week...
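For what it's worth, here is a rough sketch of the behaviour described in this comment: null check, trim each part, skip empty elements, and let the library turn the Vector into an array. The original function isn't reproduced in this thread, so the names (TokenizerSplit, splitFields) and the semicolon delimiter are assumptions, and generics are used only for brevity (on pre-5 Java you'd use a raw Vector and cast the result).

```java
import java.util.StringTokenizer;
import java.util.Vector;

class TokenizerSplit {
    // Hypothetical reconstruction: split on ';', trim each field, drop empties.
    static String[] splitFields(String str) {
        Vector<String> parts = new Vector<String>();
        if (str != null) { // the "error checking": a null input yields an empty array
            StringTokenizer st = new StringTokenizer(str, ";");
            while (st.hasMoreTokens()) {
                String field = st.nextToken().trim();
                if (field.length() > 0) {
                    parts.add(field);
                }
            }
        }
        // Vector.toArray(Object[]) has been in the library since 1.2 and
        // replaces a hand-written copy loop.
        return parts.toArray(new String[0]);
    }
}
```

Swapping Vector for ArrayList would work the same way here; Vector is kept only because that's what the comments say the original used.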
Admin
split() takes either a regex or a string as its delimiter. I don't know whether Perl is smart enough to optimize away the invocation of the regex engine in this case, but why risk it?
Admin
I think the 2nd WTF here is that it takes several attempts to get the Perl right.
Admin
This hits home, as I am currently splitting a short string. Do let me point out some WTF: "A sequence of two or more contiguous delimiter characters in the parsed string is considered to be a single delimiter. Delimiter characters at the start or end of the string are ignored. Put another way: the tokens returned by strtok() are always non-empty strings."
so: "foo,bar,,baz" and "foo,bar,baz" and "foo,,,,,,,,,,,,,,,,,,,,,bar,baz"
are equivalent.
No errors, no NULL returned, just silent "pretending it's okay". Gee, thanks!
So I'm using strchr().
Admin
Even prior to Java 1.4 this could be accomplished much more succinctly:
Admin
And I guess that's five things. :)
The whole thing could be written shorter and better as
Or, if you're into readability and actually do want to return the array reference:
Admin
grep { length $_ }
Admin
I was just going to say that. However, you might want to just remove the regex part of it since we don't need it:
my @record = split("\t", $line);
Done.
And if anyone can't read that easily, then please stop programming. I know Perl can be really hard to read but that line right there couldn't be more self explanatory.
Admin
Bollocks, omitted the check for a zero length string:
Admin
Better question: Why are you writing readable perl?
Admin
I always thought that readability was anathema to Perl.
Admin
If you want to retain the empty strings between contiguous delimiters, then use split(). So "foo,bar,,baz" gives "foo", "bar", "", "baz" and "foo,bar,baz" gives "foo", "bar", "baz".
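The comment doesn't say which language's split() it means, so take this as a sketch in Java terms only: StringTokenizer collapses runs of delimiters much like strtok() does, while String.split() keeps the empty strings between them. The class and method names are invented for the example.

```java
import java.util.StringTokenizer;

class EmptyFieldDemo {
    // StringTokenizer never returns an empty token, so runs of commas collapse.
    static int countWithTokenizer(String s) {
        return new StringTokenizer(s, ",").countTokens();
    }

    // String.split() keeps empty strings between adjacent delimiters
    // (it only discards trailing ones).
    static int countWithSplit(String s) {
        return s.split(",").length;
    }

    public static void main(String[] args) {
        String input = "foo,bar,,baz";
        System.out.println("tokenizer: " + countWithTokenizer(input));
        System.out.println("split:     " + countWithSplit(input));
    }
}
```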
Admin
(He counts four things wrong with that function. Can you find them all?)
Yes, I can.
Admin
What you just said would be like someone saying of C code that "increment_by_one(&x);" is more clear than "x++;" (ugh, it looks like line noise!)
Admin
I'm not a java programmer, but I thought that language didn't have any pointers? If so, what's up with all the NULL stuff?
Admin
No it should not. There should be no grep at all.
The rwtf is the suggested "fixes" for this godawful function.
Admin
Sadly, there are lots of app server installations in the Java world that are stuck at Java 1.3. Personally, I have never been able to use 1.4, and I think this is true of many people who work with IBM frameworks that are built on top of WebSphere. So there might be good reasons for rolling your own split(), or there may be no compelling reason to go back and swap in the real split after an upgrade to 1.4.
Of course, that doesn't explain using Vector...
Admin
Um, in Java EVERYTHING is a pointer.
Admin
But Java has references, and those can point to nothing.
Admin
Java does have pointers (hence the existence of the famous NullPointerException). It doesn't have pointer arithmetic, though.
In fact, in Java, all the variables that don't contain a basic type (char, int, float, etc.) are pointers to objects. And these pointers are passed by value.
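A small sketch of what "pointers passed by value" means in practice. All names here (ReferenceDemo, mutate, reassign, dereferencingNullThrows) are made up for the example: a method can change the object the caller's variable points at, but reassigning its own parameter has no effect on the caller, and dereferencing a null reference is where the NullPointerException comes from.

```java
class ReferenceDemo {
    // Changes the object the caller's reference points to; the caller sees it.
    static void mutate(StringBuilder sb) {
        sb.append(" world");
    }

    // Only rebinds the local copy of the reference; the caller's variable is untouched.
    static void reassign(StringBuilder sb) {
        sb = new StringBuilder("replaced");
    }

    // Dereferencing a reference that points to nothing throws NullPointerException.
    static boolean dereferencingNullThrows() {
        String s = null;
        try {
            s.length();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("hello");
        mutate(sb);
        reassign(sb);
        System.out.println(sb);
        System.out.println(dereferencingNullThrows());
    }
}
```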
Admin
What this thread really shows is that Java and Perl both suck
PythonWin 2.5 (r25:51908, Mar 9 2007, 17:40:28) [MSC v.1310 32 bit (Intel)] on win32. Portions Copyright 1994-2006 Mark Hammond - see 'Help/About PythonWin' for further copyright information.
Admin
C#: str.Split(';');
Admin
I've just done some micro benchmarks (it's a slow morning here) that show the string split method is 50% slower than the original StringTokenizer method. Chris's more efficient StringTokenizer method is 3x faster than using split. And the regex split doesn't even omit zero length strings like the original.
Considering how slow string manipulation is in Java, I would go for the longer StringTokenizer method every time.
Admin
Chris's method, as I mentioned above, doesn't discard empty strings. It's something akin to saying, "Brute-forcing 1024-bit key RSA is a much slower method than simply adding 2 + 2." Well, sure, but they don't accomplish the same thing, do they? Apples to oranges, chief.
Admin
Neither does your regex.
Anyway, I used Chris's amended method - look for his second post (assuming it's the same Chris). And I added code to iterate over the array returned from the regex split to remove empty strings, and double checked that the two methods did indeed return the same arrays.
How about trying it yourself, mate.
And none of that changes the fact that the original inefficient code is still faster than the string split.
Addendum (2007-05-04 11:54): Sorry, I shouldn't have said "your regex" ... I didn't check who posted what.
Admin
Yup, both those posts reference the same Chris object. In fact I like to behave much like a Singleton, but it causes exceptions from a particular Girlfriend object whenever I try to interact with other instances of the HotBabe class.
Admin