• RichiH (unregistered) in reply to Anonymous!

    If you absolutely must use a function, how about

    sub split_at_tabs { return (split('\t', $_[0])); }

  • Noone (unregistered) in reply to Veinor

    Why wouldn't you return a reference? If you return the actual array, it forces a copy of all the data, rather than passing a single value referencing the original. If this is called often or with sizable records, this can be considerably slower.

    Besides, if you're too lazy to deal with the reference, you can always force it back to a copy with:

    @records = @{split_at_line($line)};

    Three more characters, but the option for better performance if you need it. If you want to get all fancy (and harder to read), you can also use wantarray to determine whether the caller wants the scalar (reference) or the actual array in the first place, but this is getting way over-enterprisey for a function that doesn't really need to exist, anyhow.

  • AdT (unregistered)

    The java.util.Vector class is a WTF itself. The Java designers wanted to implement a resizable array-based container but did it wrong, ending up with O(n) amortized time complexity for adding a single element at the end. When they realized this (or maybe the Java users were realizing it because things were so painfully slow), they saw that they couldn't change this behavior without altering Vector's interface. They decided against that and wrote a new class called ArrayList which does essentially the same thing, only in a remotely sane way, and so has O(1) time complexity for "add".

    C++ and the STL are often cited as a bad example of "design by committee" but if design by committee is the alternative to "design by ad-hoc rectal extraction", then I, for one, welcome our new committee of overlords.

  • (cs) in reply to bstorer
    bstorer:
    Making a distinction is vital because they are two separate things. A reference is simply an abstraction within the language.

    A pointer is an abstraction within the language as well. A C-style pointer hides stuff like segments, virtual memory and memory-mapped IO from you.

    So if you're going to be precise, you need to speak of "C pointers" for what you mean and then you might as well also speak of "Java pointers", or just "pointers" if you're being sloppy anyway. The main reason not to do this is that "reference" is what the Java Language Specification calls it.

  • (cs) in reply to AdT
    AdT:
    The java.util.Vector class is a WTF itself. The Java designers wanted to implement a resizable array-based container but did it wrong, ending up with O(n) amortized time complexity for adding a single element at the end. When they realized this (or maybe the Java users were realizing it because things were so painfully slow), they saw that they couldn't change this behavior without altering Vector's interface. They decided against that and wrote a new class called ArrayList which does essentially the same thing, only in a remotely sane way, and so has O(1) time complexity for "add".
    Um, nonsense.

    Vector CAN be O(n) for adding an element if you set a capacityIncrement because then the size of the underlying array is increased by a constant when said array is full. This is only "painfully slow" if you built a very large Vector without setting an appropriately large increment. However, the default behaviour is to double the array size, which is the same thing ArrayList does. It is possible that some ancient (pre 1.1) implementation of Vector did not have that default behaviour, but obviously the problem WAS fixed without having to change the interface.

    ArrayList was created for two reasons: First, to implement the List interface as part of the Java 1.2 Collections framework without the legacy methods present in Vector and second, to offer an implementation without the speed penalty of Vector's inbuilt synchronization, which is usually unneeded or insufficient.
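The growth behaviour described above can be observed through Vector.capacity(). The following is a minimal sketch, not taken from the thread; the class and method names are made up for illustration. It assumes a standard JDK, where a Vector without a positive capacityIncrement doubles its backing array when full (current OpenJDK ArrayList also grows geometrically, by roughly 1.5x rather than 2x).

```java
import java.util.Vector;

// Hypothetical demo class, not part of any library.
final class VectorCapacityDemo {
    // Appends `count` integers and returns the capacity the Vector then reports.
    static int capacityAfter(Vector<Integer> v, int count) {
        for (int i = 0; i < count; i++) {
            v.add(i);
        }
        return v.capacity();
    }

    public static void main(String[] args) {
        // Initial capacity 10, no capacityIncrement: the array doubles when full.
        System.out.println(capacityAfter(new Vector<>(10), 11));
        // Initial capacity 10, capacityIncrement 5: the array grows by 5 slots when full.
        System.out.println(capacityAfter(new Vector<>(10, 5), 11));
    }
}
```

With only eleven elements the difference is small; it matters once a fixed increment is tiny compared to the final size.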

  • (cs) in reply to AdT
    AdT:
    The java.util.Vector class is a WTF itself. The Java designers wanted to implement a resizable array-based container but did it wrong, ending up with O(n) amortized time complexity for adding a single element at the end. When they realized this (or maybe the Java users were realizing it because things were so painfully slow), they saw that they couldn't change this behavior without altering Vector's interface. They decided against that and wrote a new class called ArrayList which does essentially the same thing, only in a remotely sane way, and so has O(1) time complexity for "add".

    C++ and the STL are often cited as a bad example of "design by committee" but if design by committee is the alternative to "design by ad-hoc rectal extraction", then I, for one, welcome our new committee of overlords.

    Both Vector's and ArrayList's add operations have an amortized complexity of O(N), where N is the number of elements being added (that is, O(1) per add), unless you set Vector's capacityIncrement to a fixed positive value, in which case a single add becomes O(N), with N being the number of elements in the array.
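To make the amortized argument concrete, here is a rough cost model that only counts how many elements get copied during reallocations. It is a sketch under simplifying assumptions (a positive initial capacity, every reallocation copies all current elements), and the class and method names are invented for this illustration; it does not call Vector or ArrayList at all.

```java
// Hypothetical cost model, not part of any library.
final class GrowthCostModel {
    // Counts element copies caused by reallocation over n appends.
    // increment > 0: grow by that many slots; increment <= 0: double the capacity.
    // Assumes initialCapacity > 0.
    static long copiesForAppends(int n, int initialCapacity, int increment) {
        long copies = 0;
        int capacity = initialCapacity;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                copies += size; // the whole array is copied into the new one
                capacity += (increment > 0) ? increment : capacity;
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        System.out.println("doubling:        " + copiesForAppends(1000, 10, 0));
        System.out.println("increment of 10: " + copiesForAppends(1000, 10, 10));
    }
}
```

Doubling keeps the total number of copies proportional to the number of appends, while a fixed increment makes it grow quadratically.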

    You're also ignoring the fact that in both cases "add" can be used to insert into the middle of the list as well as append to the list, which with both classes is a O(N) operation, with N being the size of the array.

    I'm not a Java expert, but it looks to me like they got into trouble because they made Vector threadsafe. At some point they realized that 90% of coders didn't need the threadsafe behavior, but that they couldn't change it without breaking code for the other 10%, so they made a new class which wasn't threadsafe instead. So it wasn't really the interface that was the problem, it was the defined behavior.

    edit:

    Doh! Beaten...

  • AdT (unregistered) in reply to brazzy
    brazzy:
    However, the default behaviour is to double the array size, which is the same thing ArrayList does.

    Ok, so they did change the interface, but didn't care to update the documentation properly.

    "The capacity is always at least as large as the vector size; it is usually larger because as components are added to the vector, the vector's storage increases in chunks the size of capacityIncrement."

    Only in the details about the protected field capacityIncrement do they mention the size doubling behavior. Maybe I'm a purist, but first of all, a protected member is, for obvious reasons, not a part of the public interface, and second, "declaring data members protected is usually a design error" (Stroustrup, The C++ Programming Language, section 15.3.1.1).

    brazzy:
    It is possible that some ancient (pre 1.1) implementation of Vector did not have that default behaviour, but obviously the problem WAS fixed without having to change the interface.

    We had performance problems using java.util.Vector back in 2000 which we solved by switching to ArrayList. Borland JBuilder incidentally suffered from a similar performance problem: If you concatenated hundreds of string literals using "+", it would take a long time to compile the code. After doing some performance measurements (showing that there was quadratic growth depending on the number of consecutive literals), it occurred to me that the Java compiler must have concatenated the string literals using java.lang.String's + operator instead of a StringBuilder...

    Back to java.util.Vector: I'm quite certain that the documentation back then was largely the same as the documentation mentioned above, which contradicts Vector's modern behavior. It may not be a big contradiction and there probably were few programs noticeably affected by the change, but it's still odd.

    ArrayList was created for two reasons: First, to implement the List interface as part of the Java 1.2 Collections framework without the legacy methods present in Vector and second, to offer an implementation without the speed penalty of Vector's inbuilt synchronization, which is usually unneeded or insufficient.

    And you'll also notice that this time they decided not to include a "capacityIncrement" property and wrote in ArrayList's documentation: "The add operation runs in amortized constant time, that is, adding n elements requires O(n) time.", something which they didn't bother to do for java.util.Vector. Guess why?

    (I agree with you regarding the pointlessness of implicit synchronization. Unfortunately, many inexperienced programmers are vulnerable to this kind of snake oil.)

    It's also interesting that ArrayList doesn't have any protected fields except one inherited from AbstractList. I suppose Sun would be more than happy to remove the badly designed Vector from the API if that weren't such a huge compatibility problem.

  • AdT (unregistered) in reply to Devi
    Devi:
    You're also ignoring the fact that in both cases "add" can be used to insert into the middle of the list as well as append to the list, which with both classes is a O(N) operation, with N being the size of the array.

    Thanks for being pedantic^W precise. I was talking about the add(Object o) method, not the add(int index, Object element) method which almost every other container library calls insert.
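For readers who have not met the two overloads, a small sketch of the difference follows. The class name and sample values are made up for illustration; the two add methods are the standard java.util.List ones.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of any library.
final class AddOverloadsDemo {
    static List<String> build() {
        List<String> fields = new ArrayList<>();
        fields.add("a");     // add(E): append at the end, amortized constant time
        fields.add("c");
        fields.add(1, "b");  // add(int, E): insert at index 1, shifting "c" to the right
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

The second form has to move every element after the insertion point, which is where the O(N) cost for inserting into the middle comes from.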

    Devi:
    I'm not a Java expert, but it looks to me like they got into trouble because they made Vector threadsafe. At some point they realized that 90% of coders didn't need the threadsafe behavior, but that they couldn't change it without breaking code for the other 10%, so they made a new class which wasn't threadsafe instead.

    Don't buy the fairy tales - java.util.Vector is synchronized, not thread-safe. There is no such thing as built-in thread safety. Anyway, I guess Sun had several reasons for writing ArrayList (i.e. several design flaws in Vector).
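The distinction can be sketched with a check-then-act sequence. Each Vector method takes the Vector's own lock, but two calls in a row are not atomic as a pair, so the caller still has to hold the lock across both. The example below is illustrative only: the class and method names are invented, and it shows the externally locked version, which is the one that behaves predictably.

```java
import java.util.Vector;

// Hypothetical demo class, not part of any library.
final class PutIfAbsentDemo {
    // contains() and add() are each synchronized, but without the outer block
    // another thread could add the same value between the two calls.
    static void addIfAbsent(Vector<Integer> v, Integer value) {
        synchronized (v) {
            if (!v.contains(value)) {
                v.add(value);
            }
        }
    }

    // Starts several threads that all try to add the same values; returns the final size.
    static int run(int threads, int distinctValues) {
        Vector<Integer> shared = new Vector<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < distinctValues; i++) {
                    addIfAbsent(shared, i);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 100));
    }
}
```

Dropping the synchronized block would leave every individual call synchronized and the result still open to duplicates, which is the gap between "synchronized" and "thread-safe".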

  • (cs) in reply to AdT
    AdT:
    It's also interesting that ArrayList doesn't have any protected fields except one inherited from AbstractList. I suppose Sun would be more than happy to remove the badly designed Vector from the API if that weren't such a huge compatibility problem.
    True, and this is not helped by the large amount of very old Java tutorials floating around the web that kept and still do keep Vector popular in newly-written code.

    But I guess Sun considers the much-touted (and admittedly quite impressive) compatibility of Java code and even bytecode across platforms and versions too important to ever break it, even for things that are not only ugly or performance-degrading but outright dangerous like Thread.stop()

  • Anonymous (unregistered) in reply to Shinobu
    Shinobu:
    ParkinT:
    (He counts four things wrong with that function. Can you find them all?)
    The first thing wrong is that he is using PERL!!! [But, better than using VB, I suppose]
    Are you sure you'd rather spend your life Perling than VB'ing? Based on my experience in both, I'd rather not. Although I do concede that, as long as you don't need a GUI, a Perl program is generally shorter than the same in VB.

    I'd rather program in Perl than VB. Both have ugly syntaxes, but Perl at least allows me to do quick and dirty things with a very concise syntax. VB's syntax is not only clumsy, but also more irregular. Also VB's libraries are so useless and inconsistent. Perl is much better and the documentation is much clearer.

    As for GUI, there is Perl/Tk, which is quite nice. (I find TCL's syntax much more terrible than Perl.) When will VB have a widget library that does AUTOMATIC geometry management?

  • (cs) in reply to JD
    JD:
    Split is a very slow function implemented upon regex pattern. Using StringTokenizer is far most efficient. This isn't probably an issus in most application, but some application that require performances could not use String.split().

    Jake is probably a nice guy, but it look like he don't know anything in Java developement...

    I can't decide if the first or second comment is a better example of comment WTFs on this site. Both excellent examples.

  • Sasha (unregistered) in reply to Anon
    Anon:
    I always thought that readability was an anathema to Perl.

    If you try hard you can may unreadable progrma in any language. With Perl, remove "hard". But you still need to work on it.

    P.S. Too many Perl examples are still written like in Perl 4. I prefer 5.10.

  • (cs) in reply to Sasha
    Sasha:
    Anon:
    I always thought that readability was an anathema to Perl.

    If you try hard you can may unreadable progrma in any language. With Perl, remove "hard". But you still need to work on it.

    P.S. Too many Perl examples are still written like in Perl 4. I prefer 5.10.

    Say rather you can program readably in any language. With Perl add hard, but you still need to work on it.

    can't say I ever recall anyone complaining about any language about how hard it was to obfuscate.

    Addendum (2007-05-07 05:05): Oh well, my reply was just about as grammatical as the original post, upon second reading. If you try you can write readably in any language. With me add hard ;}

  • (cs) in reply to TheJasper
    TheJasper:
    can't say I ever recall anyone complaining about any language about how hard it was to obfuscate.
    That's probably because the first, easiest and most effective step towards obfuscation is to use meaningless or misleading identifiers, and I don't see how any programming language will ever be able to make that difficult, since *avoiding* it is actually a constant struggle.

  • (cs) in reply to brazzy
    brazzy:
    AdT:
    It's also interesting that ArrayList doesn't have any protected fields except one inherited from AbstractList. I suppose Sun would be more than happy to remove the badly designed Vector from the API if that weren't such a huge compatibility problem.
    True, and this is not helped by the large amount of very old Java tutorials floating around the web that kept and still do keep Vector popular in newly-written code.

    But I guess Sun considers the much-touted (and admittedly quite impressive) compatibility of Java code and even bytecode across platforms and versions too important to ever break it, even for things that are not only ugly or performance-degrading but outright dangerous like Thread.stop()

    All true. You guys may be interested in this (if you haven't seen it already). A great many people, including myself, would love to see Vector, Hashtable and Enumeration go extinct, but there's just no end in sight.

  • Worf (unregistered)

    I will note that there may be a good reason to not use strtok...

    From the GNU C manpage:

    BUGS
           Never use these functions. If you do, note that:
    
                  These functions modify their first argument.
    
                  The identity of the delimiting character is lost.
    
                  These functions cannot be used on constant strings.
    
                  The  strtok()  function  uses a static buffer while
                  parsing, so it's not thread safe. Use strtok_r() if
                  this matters to you.
    

    And don't provide an equivalent that does the same thing. strtok is very useful, after all...

    If someone is reading that page and runs across that, well, they may just write their own. After all, the C library guys know what they're talking about, and if they say not to use it, well, why should we use it?

  • Andrew (unregistered) in reply to ParkinT
    ParkinT:
    (He counts four things wrong with that function. Can you find them all?)
    The first thing wrong is that he is using PERL!!!

    No, he's not using PERL. He's using Perl.

  • Cruncher (unregistered)

    Bad perl code competing with bad java code?

    Why not

    @stuff = split /\s+/,$some_string;

    then there is no need to trim each element of stuff, it has been done, automagically, by the split operation. If your split does not do this, then your split function is rather borked. Based upon all the java comments I see going on here, it appears either java's split is borked, or people don't know how to use it.

    If you want perl code approximating the java code, why not

    $some_string =~ s/\s+/ /g;  # make multiple white space become a single white space
    @stuff = split /\s/, $some_string;

    That is, make the perl be more verbose, and less obvious. Iterate this enough times, add in function calls with funny names, and a few long chained method calls

    that().sorta().look().like().this;

    and you will eventually get the Java code.
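For comparison, Java's String.split takes the same kind of regex, so the short version is available there too. This is a minimal sketch with invented class and method names. One caveat it assumes you care about: leading whitespace produces an empty first element in Java, and splitting an empty string yields a single empty element, so the sketch trims first and special-cases the empty line.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of any library.
final class WhitespaceSplitDemo {
    // Splits a line on runs of whitespace; the tokens come back without padding.
    static String[] fields(String line) {
        String trimmed = line.trim();
        // "".split(...) would return one empty string, so handle that case here.
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fields("  a \t b  c ")));
    }
}
```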

  • Simon O'Doherty (unregistered) in reply to nilp

    You shouldn't use StringTokenizer. If it is speed you are worried about use Pattern. It should outperform StringTokenizer unless you have put the compile part into the loop as well.
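A sketch of what "keeping the compile part out of the loop" can look like, next to the StringTokenizer version. The class and method names are made up, and no timing claims are made here. What the example does show is a behavioural difference worth knowing before swapping one for the other: StringTokenizer silently skips empty fields, while Pattern.split keeps them (with a negative limit, trailing ones too).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
final class TabSplitDemo {
    // Compiled once and reused for every line.
    private static final Pattern TAB = Pattern.compile("\t");

    static String[] withPattern(String line) {
        return TAB.split(line, -1); // -1 keeps trailing empty fields
    }

    static List<String> withTokenizer(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, "\t");
        while (st.hasMoreTokens()) {
            out.add(st.nextToken()); // consecutive tabs never yield an empty token
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "a\t\tb\t";
        System.out.println(withPattern(line).length);
        System.out.println(withTokenizer(line));
    }
}
```

For tab-separated records where an empty column is meaningful, that difference usually matters more than the speed.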
