|
split() is a very slow function built on regex patterns. Using StringTokenizer is far more efficient. This probably isn't an issue in most applications, but applications that need performance cannot use String.split().
Jake is probably a nice guy, but it looks like he doesn't know anything about Java development... |
|
What idiot wrote that Perl code?
It should be:
|
The first thing wrong is that he is using PERL!!! [But, better than using VB, I suppose] |
Why are you returning a reference? return @record; would work just as well, and let you say
|
The current API regards StringTokenizer as a legacy class which should not be used in new code, instead opting for split. Plus, split would have allowed the acts of tokenizing and then trimming to be implemented in one step. I'm not saying it's a performance improvement to do so, because I haven't bothered to test it. Still, even with StringTokenizer the code is a travesty. It puts the data into a StringTokenizer, into a Vector, into an Enumeration, and finally into an array. Let's not even go into the fact that StringTokenizer implements Enumeration. |
|
It's not even used :s nothing is returned or am I missing something...
|
|
Why was that perl code even a function? It is a one-liner in perl, and a really short one at that, AND a really common one anyone who uses perl should instantly recognize.
split /\t/,$line Done. What does a function gain you? |
|
Is the name of that function being setXXXXXX the result of anonymization? Because it's pretty obvious that it used to be setBureauLocations().
|
Nah... one X was not anonymized: setBureauxLocations() :) |
|
It's amusing that in a function (built in or otherwise) that's designed to throw away all the tabs, so much effort goes into keeping them:
my @_record = split(/(\t)/,$line."\t"); # why the capturing parens in the regexp? Why append a tab character to the end?
my @record = map { $_ eq "\t" ? () : $_ } @_record; # now that you've split it on tabs, why would there be any tabs in the array? Why create a whole new array @record when you could just reassign it to @_record?
Anyway, that's four, plus the fact that you should just use the built-in split function in the first place. Well, as the Perl mongers say, "there's more than one way to do it." I'd never really considered that some of those ways are not good. |
Don't be silly. Expecting the name of the function to reflect the code inside is crazy, given the incompetency of the rest of the code. For all we know, the name of the function was setInitech, and was anonymized to keep us from knowing the function was from Initech, even though we all knew that anyway. |
|
Too bad java isn't object oriented or anything. You have to store all data in a string and parse it out like that every single time.
/jerk I suppose this "could" be the start of loading everything into an object, but I'm not that naive. |
Re: Splitting Headache
2007-05-04 09:51
•
by
Robert Hanson
(unregistered)
|
When reading code, I understand split_at_tabs() a lot quicker than split /\t/,$line. For a very small price in performance (calling a function) you get a lot in readability and maintainability. |
Re: Splitting Headache
2007-05-04 09:57
•
by
sammy
(unregistered)
|
Hey, I'm all for maintaining readability, but if split /\t/,$line is that hard for you to read, code readability isn't your problem. |
Re: Splitting Headache
2007-05-04 09:57
•
by
Bart B
(unregistered)
|
then never code in perl. You are not suited to it :) |
|
The original function trims the individual fields and does not count empty fields. It also uses a semicolon as the field delimiter.
As of Java 1.4, it can be replaced with: String[] parts = str.trim().split("\\s*;\\s*"); |
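A minimal sketch of that one-liner in use, assuming semicolon-separated input as described above. The class name, method name, and sample string are made up for illustration. One thing worth checking when swapping it in: split() keeps an empty string between two adjacent delimiters, so it is not quite the same as a function that skips empty fields.

```java
import java.util.Arrays;

// Hypothetical demo class; the names and the sample input are illustrative only.
class SemicolonSplit {
    // Trim the whole string, then split on ';' with optional whitespace around it.
    static String[] splitFields(String str) {
        return str.trim().split("\\s*;\\s*");
    }

    public static void main(String[] args) {
        // The doubled ';' produces an empty field in the middle of the result.
        System.out.println(Arrays.toString(splitFields("  London ; Paris;; Tokyo  ")));
    }
}
```

If empty fields must be dropped, the result still needs a filtering pass afterwards.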
Re: Splitting Headache
2007-05-04 10:01
•
by
aaaaaaaaaaaaaaaaaaa
(unregistered)
|
The name split_at_tabs, IMO, is worthless when a one-liner would do, especially one as common in idiomatic perl as split. Better to name the function in a way that gives a purpose and isn't wrong if they change from tab-separated to comma-separated, e.g. extract_bureau_locations(). |
|
First of all, the Perl code and the Java code are not equivalent (the Java version has error checking, trims each part, and removes empty elements).
Second, the Java version can be made simpler (Java has a library function for converting a Vector to an array, which can replace the last 6 lines; it has been in Java since 1.2). Third, if you want functionality like the Perl version, you can get it down to 6 lines as well:
Captcha: there is nothing like a flame-war at the end of the week... |
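A rough sketch of the simplification described in the second point, assuming the original tokenizes on semicolons, trims each part, and drops empty ones. The library call is presumably Vector.toArray(Object[]), which has existed since Java 1.2; that is an assumption, as are the class and method names. Generics are used for brevity even though they arrived later than 1.2.

```java
import java.util.StringTokenizer;
import java.util.Vector;

// Hypothetical class and method names, for illustration only.
class BureauSplitter {
    // Tokenize on ';', trim each part, drop empty parts, and return an array.
    static String[] splitTrimmed(String str) {
        Vector<String> parts = new Vector<String>();
        if (str != null) {
            StringTokenizer tokens = new StringTokenizer(str, ";");
            while (tokens.hasMoreTokens()) {
                String part = tokens.nextToken().trim();
                if (part.length() > 0) {   // skip parts that were only whitespace
                    parts.add(part);
                }
            }
        }
        // One library call replaces a hand-written copy loop.
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        for (String s : splitTrimmed(" London ; ;Paris;; Tokyo ")) {
            System.out.println("[" + s + "]");
        }
    }
}
```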
Re: Splitting Headache
2007-05-04 10:10
•
by
Larry Rubinow
(unregistered)
|
|
split() takes either a regex or a string as its delimiter. I don't know whether Perl is smart enough to optimize away the invocation of the regex engine in this case, but why risk it?
|
Re: Splitting Headache
2007-05-04 10:11
•
by
David
(unregistered)
|
I think the 2nd WTF here is that it takes several attempts to get the Perl right. |
|
This hits home, as I am currently splitting a short string.
Do let me point out some WTF: "A sequence of two or more contiguous delimiter characters in the parsed string is considered to be a single delimiter. Delimiter characters at the start or end of the string are ignored. Put another way: the tokens returned by strtok() are always non-empty strings." so: "foo,bar,,baz" and "foo,bar,baz" and "foo,,,,,,,,,,,,,,,,,,,,,bar,baz" are equivalent. No errors, no NULL returned, just silent "pretending it's okay". Gee, thanks! So I'm using strchr(). |
|
Even prior to Java 1.4 this could be accomplished much more succinctly:
|
Re: Splitting Headache
2007-05-04 10:23
•
by
Shinobu
(unregistered)
|
Are you sure you'd rather spend your life Perling than VB'ing? Based on my experience in both, I'd rather not. Although I do concede that, as long as you don't need a GUI, a Perl program is generally shorter than the same in VB. |
So, the four things wrong:
1. Uses grouping parens in the split regex where not needed
2. Uses a regex in the split where it's not needed
3. No reason to add the trailing tab to the split input string
4. Use of 'map' instead of 'grep' results in empty fields
5. No obvious reason to return an array ref rather than the array (though there may be design considerations extrinsic to the example)
And I guess that's five things. :) The whole thing could be written shorter and better as
Or, if you're into readability and actually do want to return the array reference:
|
Re: Splitting Headache
2007-05-04 10:28
•
by
Larry Rubinow
(unregistered)
|
Okay, I'm an idiot; add another thing wrong. The grep should more simply be grep { length $_ } |
Re: Splitting Headache
2007-05-04 10:33
•
by
SomeCoder
(unregistered)
|
I was just going to say that. However, you might want to just remove the regex part of it since we don't need it: my @record = split("\t", $line); Done. And if anyone can't read that easily, then please stop programming. I know Perl can be really hard to read but that line right there couldn't be more self explanatory. |
|
Bollocks, omitted the check for a zero length string:
|
Re: Splitting Headache
2007-05-04 10:34
•
by
anon
(unregistered)
|
Better question: Why are you writing readable perl? |
Re: Splitting Headache
2007-05-04 10:34
•
by
Anon
(unregistered)
|
I always thought that readability was an anathema to Perl. |
Re: Splitting Headache
2007-05-04 10:44
•
by
burned
(unregistered)
|
|
If you want to retain the empty strings between contiguous delimiters then use split().
so "foo,bar,,baz" = "foo","bar","","baz" and "foo,bar,baz" = "foo","bar","baz" |
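For the Java side of the thread, the same contrast can be sketched like this: String.split keeps the empty field, while StringTokenizer behaves like strtok() and collapses runs of delimiters. The class and method names are hypothetical. The -1 limit is an addition here, used so that trailing empty fields are kept as well.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; names are illustrative only.
class EmptyFieldDemo {
    // split() keeps empty strings between adjacent delimiters;
    // the -1 limit also keeps trailing empty strings.
    static List<String> withSplit(String s) {
        return Arrays.asList(s.split(",", -1));
    }

    // StringTokenizer treats a run of delimiters as one and never yields "".
    static List<String> withTokenizer(String s) {
        List<String> out = new ArrayList<String>();
        StringTokenizer tokens = new StringTokenizer(s, ",");
        while (tokens.hasMoreTokens()) {
            out.add(tokens.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(withSplit("foo,bar,,baz").size());
        System.out.println(withTokenizer("foo,bar,,baz").size());
    }
}
```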
Re: Splitting Headache
2007-05-04 10:47
•
by
Steve
(unregistered)
|
Then you're not a Perl programmer. And Perl will inline the function anyway. HTH. HAND. |
(He counts four things wrong with that function. Can you find them all?)
Yes, I can.
1. my ($line) = @_;
2. my @_record = split(/(\t)/,$line."\t");
3. my @record = map { $_ eq "\t" ? () : $_ } @_record;
4. return \@record; |
Re: Splitting Headache
2007-05-04 10:52
•
by
Mike
(unregistered)
|
What you just said would be like someone saying of C code that "increment_by_one(&x);" is more clear than "x++;" (ugh, it looks like line noise!) |
|
I'm not a java programmer, but I thought that language didn't have any pointers? If so, what's up with all the NULL stuff?
|
Re: Splitting Headache
2007-05-04 10:56
•
by
TylerK
(unregistered)
|
No it should not. There should be no grep at all. The rwtf is the suggested "fixes" for this godawful function. |
|
Sadly, there are lots of app server installations in the Java world that are stuck at Java 1.3. Personally, I have never been able to use 1.4, and I think this is true of many people who work with IBM frameworks that are built on top of WebSphere. So there might be good reasons for rolling your own split(), or there may be no compelling reason to go back and swap in the real split after an upgrade to 1.4.
Of course, that doesn't explain using Vector... |
Except that in your example, should a trimmed string be empty, it is added. In the original, it's discarded.
Everything in Java (except native types like int) is actually a reference to the object. Thus, the reference can be to null. For example:
Foo obj;
Foo obj2 = new Foo();
In this example, obj is simply created as a reference to null. No object is created by default. On the other hand, obj2 references a new object using the zero-parameter constructor. The difference between these references and pointers is that you can't just point to random places in memory, but only reference either null or objects already created. |
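A small runnable sketch of the same idea, with Foo as a stand-in class and both variables declared as fields (an assumption made here, since only fields default to null; a local variable would have to be assigned before the compiler allows it to be read).

```java
// Hypothetical names throughout; Foo stands in for any class.
class NullReferenceDemo {
    static class Foo {}

    // A field of reference type defaults to null; no Foo object is created.
    static Foo obj;

    // This reference points at a new object built with the zero-argument constructor.
    static Foo obj2 = new Foo();

    public static void main(String[] args) {
        System.out.println("obj is null: " + (obj == null));
        System.out.println("obj2 is null: " + (obj2 == null));
        try {
            obj.toString();   // calling a method through a null reference
        } catch (NullPointerException e) {
            System.out.println("caught NullPointerException");
        }
    }
}
```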
Re: Splitting Headache
2007-05-04 11:03
•
by
Dave
(unregistered)
|
|
Um, in Java EVERYTHING is a pointer.
|
Re: Splitting Headache
2007-05-04 11:05
•
by
nuller
(unregistered)
|
|
But Java has references, and those can point to nothing.
|
Re: Splitting Headache
2007-05-04 11:10
•
by
JB
(unregistered)
|
|
Java does have pointers (hence the existence of the famous NullPointerException). It doesn't have pointer arithmetic, though.
In fact, in Java, all the variables that don't contain a basic type (char, int, float, etc.) are pointers to objects. And these pointers are passed by value. |
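To illustrate "passed by value", here is a short sketch with made-up class and method names. Reassigning the parameter inside a method does not affect the caller's variable, but mutating the object it refers to is visible to the caller.

```java
// Hypothetical demo class; method names are illustrative only.
class PassByValueDemo {
    // Rebinds only the local copy of the reference; the caller's variable is untouched.
    static void reassign(StringBuilder sb) {
        sb = new StringBuilder("replaced");
    }

    // Follows the copied reference and changes the shared object.
    static void mutate(StringBuilder sb) {
        sb.append(" world");
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder("hello");
        reassign(text);
        System.out.println(text);   // still the original object
        mutate(text);
        System.out.println(text);   // the original object, now modified
    }
}
```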
I hate it when people say this. First, native types are obviously not pointers. Second, neither is anything else. You can't access arbitrary memory, you can't do pointer arithmetic, and you don't have to concern yourself with the deallocation of the object. But most importantly -- and pay close attention because this is a subtle, but vital, difference -- Java's references are not first-class objects. You can't create a reference to a reference and you can't manipulate them directly (except to change what they reference). In some sense, they are merely a syntactic convenience to bridge the way the computer will behave with the way we want to think about it. |
|
What this thread really shows is that Java and Perl both suck
PythonWin 2.5 (r25:51908, Mar 9 2007, 17:40:28) [MSC v.1310 32 bit (Intel)] on win32.
Portions Copyright 1994-2006 Mark Hammond - see 'Help/About PythonWin' for further copyright information.
>>> print 'foo, bar, baz'.split(', ')
['foo', 'bar', 'baz'] |
|
C#:
str.Split(';'); |
Sounds good to me.
But how did we prove Java and Perl suck? You can do the exact same thing in Java. |
I've just done some micro benchmarks (it's a slow morning here) that show the string split method is 50% slower than the original StringTokenizer method. Chris's more efficient StringTokenizer method is 3x faster than using split. And the regex split doesn't even omit zero length strings like the original. Considering how slow string manipulation is in Java, I would go for the longer StringTokenizer method every time. |
NullPointerException is a horrible misnomer. It represents what happened in the underlying architecture of the JVM (which obviously uses pointers), and not what happened in the programming language. The Java programming language does not have pointers. It has references, which are different, but have many of the same features. There are tons of programmers out there who "know" that pointer == reference. But that don't make it true. |
Chris's method, as I mentioned above, doesn't discard empty strings. It's something akin to saying, "Brute-forcing 1024-bit key RSA is a much slower method than simply adding 2 + 2." Well, sure, but they don't accomplish the same thing, do they? Apples to oranges, chief. |
Neither does your regex. Anyway, I used Chris's amended method - look for his second post (assuming it's the same Chris). And I added code to iterate over the array returned from the regex split to remove empty strings, and double checked that the two methods did indeed return the same arrays. How about trying it yourself, mate. And none of that changes the fact that the original inefficient code is still faster than the string split. Addendum (2007-05-04 11:54): Sorry, I shouldn't have said "your regex" ... I didn't check who posted what. |
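The kind of check described above could look roughly like the sketch below. This is not the benchmark code from the thread; the class name, method names, sample input, and loop count are all assumptions. The timing loop is deliberately crude, and as far as I know newer JDKs special-case single-character delimiters in String.split, so numbers from a current JVM may not match those reported here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical comparison harness; not the code that was actually benchmarked.
class SplitComparison {
    // split(), then trim each part and drop the empty ones.
    static String[] viaSplit(String s) {
        List<String> out = new ArrayList<String>();
        for (String part : s.split(";")) {
            String trimmed = part.trim();
            if (trimmed.length() > 0) {
                out.add(trimmed);
            }
        }
        return out.toArray(new String[0]);
    }

    // StringTokenizer, with the same trimming and empty-string check.
    static String[] viaTokenizer(String s) {
        List<String> out = new ArrayList<String>();
        StringTokenizer tokens = new StringTokenizer(s, ";");
        while (tokens.hasMoreTokens()) {
            String trimmed = tokens.nextToken().trim();
            if (trimmed.length() > 0) {
                out.add(trimmed);
            }
        }
        return out.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String input = " a ; b;;  ; c ";
        // First confirm both methods return the same array.
        System.out.println("same result: " + Arrays.equals(viaSplit(input), viaTokenizer(input)));

        int rounds = 200000;   // arbitrary loop count
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) viaSplit(input);
        long splitNanos = System.nanoTime() - start;

        start = System.nanoTime();
        for (int i = 0; i < rounds; i++) viaTokenizer(input);
        long tokenizerNanos = System.nanoTime() - start;

        System.out.println("split:     " + splitNanos / 1000000 + " ms");
        System.out.println("tokenizer: " + tokenizerNanos / 1000000 + " ms");
    }
}
```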
Re: Splitting Headache
2007-05-04 11:56
•
by
Chris
(unregistered)
|
Yup, both those posts reference the same Chris object. In fact I like to behave much like a Singleton, but it causes exceptions from a particular Girlfriend object whenever I try to interact with other instances of the HotBabe class. |
I don't question that StringTokenizer would be faster. I merely wanted to make sure we compare like functions. I had missed Chris's second post, which does take care of the issue. For the record, had I been given this problem, I would likely have used StringTokenizer; splitting on a static regex seems too heavy-weight to me. |
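If the regex cost is the worry, one middle ground (a suggestion added here, not something proposed in the thread) is to compile the pattern once and reuse it, since String.split with a regex like this one compiles it on every call. The class and field names are hypothetical, and the semicolon delimiter is carried over from the earlier comments.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
class PrecompiledSplit {
    // Compiled once when the class is loaded, then reused for every call.
    private static final Pattern SEPARATOR = Pattern.compile("\\s*;\\s*");

    static String[] split(String str) {
        return SEPARATOR.split(str.trim());
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("  a ; b;c  ")));
    }
}
```

Whether this closes the gap with StringTokenizer would need measuring; it only removes the repeated compilation.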