• div (unregistered) in reply to Larry Rubinow
    Larry Rubinow:
    4. Use of 'map' instead of 'grep' results in empty fields
    And of course would be unnecessary under split("\t", @) or split(/\t+/, @).
    Larry Rubinow:
    5. No obvious reason to return an array ref rather than the array (though there may be design considerations extrinsic to the example)
    It's often useful to return refs, however this is not the way to do it - taking a reference to a lexical causes the value to remain live and not be garbage collected when the scope is left, however the perl optimizer will often choose to use the same storage on re-entry to the scope. It is highly unlikely that it was intended that subsequent calls to split_at_tabs would trash the saved results of a previous call.

    Assuming that the sub should return a ref and that adjacent tabs should insert a blank entry (parsing a tab-separated-value file frex):

    sub split_at_tabs { return [split("\t", shift)]; }

  • (cs) in reply to bstorer
    bstorer:
    The Java programming language does not have pointers. It has references, which are different, but have many of the same features. There are tons of programmers out there who "know" that pointer == reference. But that don't make it true.
    From the point of view of a Java programmer, making a distinction between a pointer and a reference is pointless. The distinction appears under the covers of the VM, and since you can't really inspect a reference, this isn't visible to someone programming in the language.
  • (cs) in reply to Chris
    Chris:
    nilp:
    Chris's method, as I mentioned above, doesn't discard empty strings. It's something akin to saying, "Brute-forcing 1024-bit key RSA is a much slower method than simply adding 2 + 2." Well, sure, but they don't accomplish the same thing, do they? Apples to oranges, chief.

    Neither does your regex.

    Anyway, I used Chris's amended method - look for his second post (assuming it's the same Chris). And I added code to iterate over the array returned from the regex split to remove empty strings, and double checked that the two methods did indeed return the same arrays.

    How about trying it yourself, mate.

    Yup, both those posts reference the same Chris object. In fact I like to behave much like a Singleton, but it causes exceptions from a particular Girlfriend object whenever I try to interact with other instances of the HotBabe class.

    Clearly you need to use a transparent proxy. Give me the addresses of these HotBabe objects, and I'll interface with them for you. Now all I have to do is catch those pesky FianceExceptions before they become fatal errors...
  • infidel (unregistered) in reply to bstorer
    bstorer:
    infidel:
    What this thread really shows is that Java and Perl both suck

    PythonWin 2.5 (r25:51908, Mar 9 2007, 17:40:28) [MSC v.1310 32 bit (Intel)] on win32. Portions Copyright 1994-2006 Mark Hammond - see 'Help/About PythonWin' for further copyright information.

    >>> print 'foo, bar, baz'.split(', ')
    ['foo', 'bar', 'baz']

    Sounds good to me.
    irb(main):001:0> 'foo, bar, baz'.split(',')
    => ["foo", " bar", " baz"]
    
    But how did we prove Java and Perl suck? You can do the exact same thing in Java.

    Hm, maybe so. But the latest Java example I saw in the thread was:

    String[] parts = str.trim().split("\\s*;\\s*");

    Which is not nearly as suckish as the Perl, but still kind of lame.

  • (cs) in reply to Thuktun
    Thuktun:
    bstorer:
    The Java programming language does not have pointers. It has references, which are different, but have many of the same features. There are tons of programmers out there who "know" that pointer == reference. But that don't make it true.
    From the point of view of a Java programmer, making a distinction between a pointer and a reference is pointless. The distinction appears under the covers of the VM, and since you can't really inspect a reference, this isn't visible to someone programming in the language.
    Making a distinction is vital because they are two separate things. A reference is simply an abstraction within the language. A pointer, in Java, just doesn't exist. Should I claim that in BASIC we should be able to refer to arrays as vectors because BASIC doesn't have vectors, but an array is kinda similar?
  • Chris (unregistered) in reply to bstorer
    bstorer:
    Chris:
    Yup, both those posts reference the same Chris object. In fact I like to behave much like a Singleton, but it causes exceptions from a particular Girlfriend object whenever I try to interact with other instances of the HotBabe class.
    Clearly you need to use a transparent proxy. Give me the addresses of these HotBabe objects, and I'll interface with them for you. Now all I have to do is catch those pesky FianceExceptions before they become fatal errors...

    Hmm, I think the problem is that Girlfriend has registered a listener for Chatup events.

    By the way, does the Fiance class cause an additional drain on resources when compared to Girlfriend? I'm finding that since a call to the RelationshipFactory class my idle time has all but disappeared and subsequent profiling shows an increase in activity around Shopping related methods.

    Anyway, must go as like the DBA on my team I like to take a full dump before leaving the office.

  • (cs) in reply to Chris
    Chris:
    bstorer:
    Chris:
    Yup, both those posts reference the same Chris object. In fact I like to behave much like a Singleton, but it causes exceptions from a particular Girlfriend object whenever I try to interact with other instances of the HotBabe class.
    Clearly you need to use a transparent proxy. Give me the addresses of these HotBabe objects, and I'll interface with them for you. Now all I have to do is catch those pesky FianceExceptions before they become fatal errors...

    Hmm, I think the problem is that Girlfriend has registered a listener for Chatup events.

    By the way, does the Fiance class cause an additional drain on resources when compared to Girlfriend? I'm finding that since a call to the RelationshipFactory class my idle time has all but disappeared and subsequent profiling shows an increase in activity around Shopping related methods.

    Anyway, must go as like the DBA on my team I like to take a full dump before leaving the office.

    I found that upon casting Girlfriend to Fiance, normal program flow is interrupted frequently by SIG_WEDDING_PLAN. You'll learn not to extend Listener in a hurry. I have to cast Fiance to Wife in a few months, and my main concern is that others have warned me that Wife tries to spawn child processes, which are a huge drain on resources.
  • (cs) in reply to burned
    burned:
    If you want to retain the empty strings between contiguous delimiters then use split(). so "foo,bar,,baz" = "foo","bar","","baz" and "foo,bar,baz" = "foo","bar","baz"

    Through use of the term "contiguous", and no context, I'd assume you're talking to me if there was a C function called split()

  • (cs) in reply to bstorer
    bstorer:
    I don't question that StringTokenizer would be faster. I merely wanted to make sure we compare like functions. I had missed Chris's second post, which does take care of the issue. For the record, had I been given this problem, I would likely have used StringTokenizer; splitting on a static regex seems too heavy-weight to me.

    I just ran my micro benchmarks using Java 1.3 (omitting the String.split of course, which doesn't exist), and the StringTokenizer implementation seems to have made big strides since then. I didn't realize how much (I haven't done much Java programming on strings for a while).

    Since 1.4, it appears to be close to 200x faster than it was.

    So if you started coding with 1.2, as I did, and avoided using StringTokenizer because it was so slow, as I did, you would likely fall on String.split as the answer to your prayers, while not realizing just how much better StringTokenizer has become.

    Hence the cries of "WTF!" from the more grizzled Java programmers.
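
    A minimal sketch of the "compare like functions" caveat above, not anyone's benchmark code: StringTokenizer and String.split disagree about empty fields, so timing one against the other only makes sense once that difference is handled. The class name, method names and sample input are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; the name and the sample input are illustrative only.
final class SplitVersusTokenizer {

    // StringTokenizer treats a run of delimiters as a single delimiter,
    // so it never yields an empty token.
    static List<String> withTokenizer(String s, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(s, delims);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    // String.split keeps empty strings between adjacent delimiters
    // and drops only the trailing empty strings.
    static List<String> withSplit(String s, String regex) {
        return Arrays.asList(s.split(regex));
    }

    public static void main(String[] args) {
        String input = "foo;;bar;";
        System.out.println(withTokenizer(input, ";")); // [foo, bar]
        System.out.println(withSplit(input, ";"));     // [foo, , bar]
    }
}
```

    To make the two agree, the split result would still need a pass that removes empty strings, which is the extra loop mentioned earlier in the thread.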

  • Rafael Larios (unregistered) in reply to Tom
    Tom:

    my @record = map { $_ eq "\t" ? () : $_ } @_record;

    What did you say about my mother?...

    I hate perl :( (although I know about its power!)

  • KingPong (unregistered) in reply to SomeCoder
    I was just going to say that. However, you might want to just remove the regex part of it since we don't need it:

    my @record = split("\t", $line);

    Makes no difference, Perl treats the first parameter as a regex, whether you specify it that way or not.

  • Zygo (unregistered) in reply to Robert Hanson
    Robert Hanson:
    Lrep:
    split /\t/,$line

    When reading code, I understand split_at_tabs() a lot quicker than split /\t/,$line.

    For a very small price in performance (calling a function) you get a lot in readability and maintainability.

    To me, "split_at_tabs($line)" means "junior programmer wrote a f***ed up split function that I'm going to spend time and coffee trying to bugfix, or at least prove correct" while "split /\t/, $line" means "split line on tab characters, creating empty fields for leading and consecutive tab characters but no fields for trailing tab characters."

    If I work 5 different Perl projects, "split_at_tabs" may mean 6 different things, while "split /\t/, $line" means exactly the same on each. Sometimes that kind of ambiguity is desirable, e.g. if there were a lot of parser classes in the project and each one has a different way to split at tabs, then I'd expect split_at_tabs to be a method defined differently over a group of classes to implement the correct polymorphic behavior on each one; however, at this point the project is long past questions of readability...

    Imagine what you said in another language:

    When reading code, I understand "dereference_and_increment_pointer()" a lot quicker than *p++.

    Part of the problem is that "split_at_tabs" is a bad function name. When I read that I immediately think of three implementations, and the function name doesn't provide a hint as to which:

        split /\t+/, $line;   # "\tfoo\t\tbar" is 3 fields
        split /\t/, $line;    # "\tfoo\t\tbar" is 4 fields
        grep(length($_), split /\t+/, $line); # "\tfoo\t\tbar" is 2 fields
    

    If the function splits lines in some specific or non-trivial way, then it needs to be named after the requirement for the function (e.g. "split_nntp_xover_line" or "split_csv_line" or "split_psql_copy_table_output_line", which name file formats that provide a context for what we can expect from splitting lines).

  • kixx (unregistered)
    1. For split there's no difference between /\t/ and "\t", the doc for split says specifically that the first argument is a regex.

    2. You don't need to map or grep, if you want to treat multiple tabs as a delimiter you can use \t+ in the regex

    3. If you want to store the result in an array reference you can build it on the fly:

     $arrayref = [ function_that_returns_an_array ]; 

    Ergo:

     $fields = [ split /\t+/, $string ]; 

    PS. Yes, there's more than one way to do it ... and the elegant one is preferred

  • RON (unregistered)

    In C#:

    string[] tokens = str.Split( ';' );

    Java sucks.

  • RON (unregistered)

    Lol, 2 pages of arguments about the best way to split a string on a single-character delimiter in java...

    I change my mind.

    Java REALLY sucks.

  • Insensitive Clod (unregistered) in reply to RON
    RON:
    Lol, 2 pages of arguments about the best way to split a string on a single-character delimiter in java...

    I change my mind.

    Java REALLY sucks.

    Because TWTOWTDI holds true even in Java?

    Plus, most of the argument is about perl. Go figure.

  • (cs) in reply to div
    div:
    taking a reference to a lexical causes the value to remain live and not be garbage collected when the scope is left, however the perl optimizer will often choose to use the same storage on re-entry to the scope. It is highly unlikely that it was intended that subsequent calls to split_at_tabs would trash the saved results of a previous call.
    No, the "my" means that a new object is referenced every time. The "don't take references to arrays" gotcha applies to globals, not lexicals.
  • (cs) in reply to bstorer
    bstorer:
    Chris:
    bstorer:
    Chris:
    Yup, both those posts reference the same Chris object. In fact I like to behave much like a Singleton, but it causes exceptions from a particular Girlfriend object whenever I try to interact with other instances of the HotBabe class.
    Clearly you need to use a transparent proxy. Give me the addresses of these HotBabe objects, and I'll interface with them for you. Now all I have to do is catch those pesky FianceExceptions before they become fatal errors...

    Hmm, I think the problem is that Girlfriend has registered a listener for Chatup events.

    By the way, does the Fiance class cause an additional drain on resources when compared to Girlfriend? I'm finding that since a call to the RelationshipFactory class my idle time has all but disappeared and subsequent profiling shows an increase in activity around Shopping related methods.

    Anyway, must go as like the DBA on my team I like to take a full dump before leaving the office.

    I found that upon casting Girlfriend to Fiance, normal program flow is interrupted frequently by SIG_WEDDING_PLAN. You'll learn not to extend Listener in a hurry. I have to cast Fiance to Wife in a few months, and my main concern is that others have warned me that Wife tries to spawn child processes, which are a huge drain on resources.

    This is so much better than the rest of the thread.

  • (cs) in reply to Larry Rubinow
    Larry Rubinow:
    So, the four things wrong:
    1. Uses grouping parens in the split regex where not needed
    2. Uses a regex in the split where it's not needed
    3. No reason to add the trailing tab to the split input string
    4. Use of 'map' instead of 'grep' results in empty fields
    5. No obvious reason to return an array ref rather than the array (though there may be design considerations extrinsic to the example)

    And I guess that's five things. :)

    The whole thing could be written shorter and better as

    sub split_at_tabs2 {
       return grep { $_ eq "\t" ? () : $_ } split "\t", shift;
    }
    
    Larry Rubinow:
    Okay, I'm an idiot; add another thing wrong. The grep should more simply be

    grep { length $_ }

    I’m flabbergasted. You just penned tomorrow’s DailyWTF submission, right in the comments of DailyWTF itself… because you are wrong on every single count. Ugh. (Congratulations?) Sorry, dude. :)

    1. Capturing parens mean “keep this part of the delimiter.” So that split isn’t just splitting on tabs, it’s also returning every tab as an element of its own.
    2. split always uses a regex. If you pass it a string, the string will be interpreted as a regex. There is only one special case, which is if you pass it a string consisting of a single space.
    3. The appended tab is possibly a misguided attempt to force split to return trailing empty fields (by default, they are dropped).
    4. There won’t be any empty elements in the resulting list. The map block returns an empty list for tabs, which results in nothing being added to the output list. You can return any number of elements from each iteration of a map, including 0.
    5. You didn’t get this one wrong… but you didn’t make much of a point in it either.
    6. Sorry, but because you were wrong in #1, you are also wrong in this.

    Did you actually test your code?

    Let’s clean this up. Starting point:

    sub split_at_tabs {
       my ($line) = @_;
       my @_record = split(/(\t)/,$line."\t");
       my @record = map { $_ eq "\t" ? () : $_ } @_record;
       return \@record;
    }

    First, “I want trailing empty fields” (I can only assume this is what the original programmer wanted) means passing -1 to split:

    sub split_at_tabs {
       my ($line) = @_;
       my @_record = split /(\t)/, $line, -1;
       my @record = map { $_ eq "\t" ? () : $_ } @_record;
       return \@record;
    }

    Second, don’t use map to filter out elements from a list. That’s grep’s job.

    sub split_at_tabs {
       my ($line) = @_;
       my @_record = split /(\t)/, $line, -1;
       my @record = grep { $_ ne "\t" } @_record;
       return \@record;
    }

    But why capture tabs only to throw them out?

    sub split_at_tabs {
       my ($line) = @_;
       my @record = split /\t/, $line, -1;
       return \@record;
    }

    That array is unnecessary:

    sub split_at_tabs {
       my ($line) = @_;
       return [ split /\t/, $line, -1 ];
    }

    Well, that’s not much of a function. Just inline it in the caller:

    $fields = [ split /\t/, $line, -1 ];

    So that’s 4 things wrong:

    1. Use of capturing parens to keep the tabs, only to then throw them out.
    2. Useless appending of a tab to the input string.
    3. map used in place of grep to filter a list.
    4. Unnecessary function written for a job for which an inline call to a builtin function would suffice.

  • Frostcat (unregistered) in reply to Bart B
    Bart B:
    then never code in perl. You are not suited to it :)

    This kind of bad attitude is why so many people dislike Perl.

  • (cs)

    I've read so many string related WTFs recently, that I have decided to construct my own from the screaming scars they've left on my consciousness:

    size_t SplitAtTabs(char *toSplit, const char *out[])
    {
    	const char **output=out;
    	*output++=toSplit;
    	while(*((&(*toSplit++-=(*toSplit=='\t'?((*output++=toSplit+1)!=NULL)*'\t':0)))+1));
    
    	return (size_t)(output-out);
    }
    

    Caveat: The above is written using Visual C++ 7 and you compile it at your own risk, the author is not liable for any subsequent damage to any memory location that is incurred with its use.

  • Nub (unregistered) in reply to Devi
    Devi:
    I've read so many string related WTFs recently, that I have decided to construct my own from the screaming scars they've left on my consciousness:
    size_t SplitAtTabs(char *toSplit, const char *out[])
    {
    	const char **output=out;
    	*output++=toSplit;
    	while(*((&(*toSplit++-=(*toSplit=='\t'?((*output++=toSplit+1)!=NULL)*'\t':0)))+1));
    
    	return (size_t)(output-out);
    }
    

    Caveat: The above is written using Visual C++ 7 and you compile it at your own risk, the author is not liable for any subsequent damage to any memory location that is incurred with its use.

    I'm almost gonna have to steal it.
  • Brandon (unregistered) in reply to sammy

    hahaha

  • (cs) in reply to TylerK
    TylerK:
    Yes, I can. 1. my ($line) = @_; 2. my @_record = split(/(\t)/,$line."\t"); 3. my @record = map { $_ eq "\t" ? () : $_ } @_record; 4. return \@record;
    Hilarious:DDD
  • Guest (unregistered) in reply to qbolec

    The short, equivalent java way to do it is:

    s.trim().split("(\\s*;\\s*)+");

  • Ed (unregistered)

    So much easier in Mumps. There is a $P function. It can be used to set or get a specified piece of a string. It can be used in conjunction with $L which tells you the number of pieces the string contains.

  • lefty (unregistered) in reply to bstorer

    Ahem....

    $ perl -e 'print split /,/, "foo, bar, baz";'
    foo bar baz

  • Guest (unregistered) in reply to RON

    It just means you did not understand what the original code does.

  • Guest (unregistered) in reply to RON
    RON:
    In C#:

    string[] tokens = str.Split( ';' );

    Java sucks.

    It just means you did not understand what the original code does.

  • snqow (unregistered)

    Okay, let's summarize:

    The input: a string like "foo, bar , ,,,zab,,lalab,,"
    The output: a list, like [ "foo", "bar", "zab", "lalab" ]

    Python solution:

    [ x.strip() for x in input.split if len(x.strip()) > 0 ]

    Perfectly readable if you understand python's [ x for x in a if cond ] syntax.

    Ruby solution:

    input.split(",").find_all { |x| x.strip.length > 0 }.map { |x| x.strip }

    Again, perfectly readable. I can't see any Java solution with less than 4 lines.
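
    For comparison, a hedged sketch of a short Java version. It assumes Java 8 or later (the stream API did not exist when this thread was written), and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper; class and method names are illustrative only.
final class TrimmedFields {

    // Split on commas, trim each piece, and drop the pieces that end up empty.
    static List<String> parse(String input) {
        return Arrays.stream(input.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(parse("foo, bar , ,,,zab,,lalab,,")); // [foo, bar, zab, lalab]
    }
}
```

    Trimming before filtering means each piece is trimmed only once.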

    CAPTCHA: we are DOOMed

  • snqow (unregistered)

    (ok, small brain fart, I forgot the input.split(",") in Python's solution)

    CAPTCHA: will you spare me for this, DUBYA?

  • snqow (unregistered)

    just realized: if you change the order between map and find_all in my ruby solution, you avoid calling strip twice.

  • Holli (unregistered) in reply to JavaHead
    JavaHead:
    First of all, the perl-code and the java-code are not equivalent (the java-version has error-checking, trims each part and removes empty elements).

    Second, the java-version can be made simpler (java has a library function converting Vector to array which can replace the last 6 lines, has been in java since 1.2)

    Third, if you want functionality like the perl-version, you can get it down to 6 lines as well:

    	public List setC(String str) {
    		StringTokenizer st = new StringTokenizer(str, ";");
    		Vector v = new Vector(st.countTokens());
    		while(st.hasMoreElements()) { v.add(st.nextElement()); }
    		return v;
    	}
    

    Captcha: there is nothing like a flame-war at the end of the week...

    If you want to remove empty elements just say
    split /;+/, $line;

  • Holli (unregistered) in reply to Larry Rubinow
    Larry Rubinow:
    sub split_at_tabs { my ($line) = @_; my @_record = split(/(\t)/,$line."\t"); my @record = map { $_ eq "\t" ? () : $_ } @_record; return \@record; }
    So, the four things wrong:
    1. Uses grouping parens in the split regex where not needed
    2. Uses a regex in the split where it's not needed
    3. No reason to add the trailing tab to the split input string
    4. Use of 'map' instead of 'grep' results in empty fields
    5. No obvious reason to return an array ref rather than the array (though there may be design considerations extrinsic to the example)

    And I guess that's five things. :)

    The whole thing could be written shorter and better as

    sub split_at_tabs2 {
       return grep { $_ eq "\t" ? () : $_ } split "\t", shift;
    }
    

    Or, if you're into readability and actually do want to return the array reference:

    sub split_at_tabs3 {
       my $input = shift;
       my @array = split "\t", $input;
       @array = grep { $_ eq "\t" ? () : $_ }, @array;
       return \@array;
    }
    
    sub split_at_tabs4 {
      return [split /\t+/, shift];
    }
    
  • (cs) in reply to nilp
    nilp:
    I've just done some micro benchmarks (it's a slow morning here) that shows the string split method is 50% slower than the original StringTokenizer method. Chris's more efficient StringTokenizer method is 3x faster than using split. And the regex split doesn't even omit zero length strings like the original.

    Considering how slow string manipulation is in Java, I would go for the longer StringTokenizer method every time.

    Then you would be guilty of premature optimization.

    Unless setXXXXXX is being called many times per second, the difference between the performance of StringTokenizer and the performance of a regex split is negligible. And let us not forget that a regex split doesn't have to be str.split(";"). It can also be done with a single private Pattern object, allowing the code to do pattern.split(str). I bet that will affect your benchmark. (Accomplished Java programmers know that String.split creates and destroys a Pattern object with every call, so frequent calls to String.split using the same delimiter should be replaced with calls to a single Pattern instance.)
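
    A minimal sketch of the single-Pattern approach described above. The class name is hypothetical, and the delimiter regex is an assumption borrowed from the `\s*;\s*` example earlier in the thread rather than from the original setXXXXXX code.

```java
import java.util.regex.Pattern;

// Hypothetical class; the name SemicolonSplitter is illustrative only.
final class SemicolonSplitter {

    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern DELIMITER = Pattern.compile("\\s*;\\s*");

    // Trim the ends, then split on semicolons with optional surrounding whitespace.
    static String[] split(String str) {
        return DELIMITER.split(str.trim());
    }

    public static void main(String[] args) {
        for (String part : split(" foo ; bar;baz ")) {
            System.out.println("[" + part + "]");
        }
    }
}
```

    Whether this beats String.split in practice depends on the JDK: later releases add a fast path in String.split for simple single-character delimiters, so it is worth measuring before assuming.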

  • infidel (unregistered) in reply to Guest
    Guest:
    RON:
    In C#:

    string[] tokens = str.Split( ';' );

    Java sucks.

    It just means you did not understand what the original code does.

    guess I didn't understand it either.

    contrary to what some PP said, it's the unintelligible syntax that makes me dislike Perl, not the friendliness (or lack thereof) of its fans

  • Morty (unregistered)
    sub split_at_tabs { my ($line) = @_; my @_record = split(/(\t)/,$line."\t"); my @record = map { $_ eq "\t" ? () : $_ } @_record; return \@record; }

    (He counts four things wrong with that function. Can you find them all?)

    Just four?

    1. Using capture parentheses; clearly, the tabs were not wanted
    2. Appending an extra tab at the end of $line without specifying a "LIMIT" for split that would save trailing empty fields
    3. Writing any code at all to eliminate tabs from @_record/@record when it was the result of splitting on tabs, and therefore should contain no tabs
    4. If removing tabs had been necessary, grep is a better function than map.
    5. If removing tabs had been necessary, you don't need a temp variable @_record. Perl fully evaluates the right-hand-side before assignment, so you don't need a temp variable. You can say things like: @record=grep {$_ ne "\t"} @record
    6. You normally can return just the array, rather than a reference to the array. For really large arrays, you sometimes want return-by-reference, but if the array is composed of substrings from a string that you passed by value, it should be OK to return the array by value.
    7. If you're going to return an array by reference, you probably don't want to do so using a simple reference to a named array created within the scope of a function. If you've saved the reference in the calling context, the value referred to by the reference will be overwritten the next time the function is called. This is probably not what was meant or desired. Instead, create an anonymous array reference.
    8. Given that there is only one statement in the function that does something useful -- the split statement -- this should have been inlined.
  • (cs) in reply to kixx
    kixx:
    1. For split there's no difference between /\t/ and "\t", the doc for split says specifically that the first argument is a regex.
    Aristotle Pagaltzis:
    2. split always uses a regex. If you pass it a string, the string will be interpreted as a regex. There is only one special case, which is if you pass it a string consisting of a single space.
    Aren't constant regexes with no match syntax just optimized into constant strings anyway, so that aside from the initial parse there's no functional difference at all?
  • Joseph Newton (unregistered) in reply to Larry Rubinow
    Larry Rubinow:
    split() takes either a regex or a string as its delimiter. I don't know whether Perl is smart enough to optimize away the invocation of the regex engine in this case, but why risk it?

    If you don't know, then RTFM before you comment. Yes, the Perl regex engine is indeed smart enough to process any regular expression with only the amount of processing implicit in the complexity of the expression. It does not need to be optimized away--it just needs to be provided with thoughtfully designed search patterns.

    [Unlike the Captcha machine, which could use a spell-checker. The US Supremes have never to my knowledge, issued an infamous "Dread Scott" decision, so the proper historical allusion is "Dredlocks". ...whatever the Firefox idiot machine may say to the contrary--RTFH, and start with "Marcus Garvey"]

  • AT (unregistered) in reply to Frostcat
    Frostcat:
    Bart B:
    then never code in perl. You are not suited to it :)

    This kind of bad attitude is why so many people dislike Perl.

    It's got nothing to do with Perl specifically. Like others have pointed out, if you don't find idiomatic constructs like

       split(/\t/, $str);
       *p++;
       str.trim();
    

    readable enough for you, and instead have to wrap them with functions like

       split_at_tabs($str);
       dereference_and_increment_pointer(&p);
       trim_string(str);
    

    then you have no business programming in ANY language.

    I dislike ugly Perl code as much as anyone (especially since good Perl code is very easy to write...) - but an inline call to split() is absolutely not an example of it.

  • AT (unregistered) in reply to foxyshadis
    foxyshadis:
    Aren't constant regexes with no match syntax just optimized into constant strings anyway, so that aside from the initial parse there's no functional difference at all?

    Could well be, but that's beside the point. The original suggestion was that using "\t" instead of /\t/ would be more efficient because it would use a constant string match, not a regex match. This is not true.

    Regardless of what the underlying implementation does to optimise them, both expressions will be treated identically.

  • Joseph Newton (unregistered) in reply to David
    David:
    Veinor:
    Anonymous!:
    What idiot wrote that Perl code?

    It should be:

    sub split_at_tabs {
       my ($line) = @_;
       my @record = split(/\t/,$line);
       return \@record;
    }
    

    Why are you returning a reference?

    return @record;
    would work just as well, and let you say

    @records = split_at_tabs($line);
    

    I think the 2nd WTF here is that it takes several attempts to get the Perl right.

    Especially when all of them arrive at screwy, non-portable deadweight. Unless you are creating and consuming an array in a single scope, there is rarely good reason to create an explicit array in Perl. Here, in one attempt:

    sub split_and_trim_at_tabs($string) { return $records = [split /\s*\t\s*/, $string]; }

    BTW: the return statement is really a comment, since Perl always returns the last expression evaluated.

  • Stephen Touset (unregistered) in reply to Anonymous!
    Anonymous!:
    What idiot wrote that Perl code?

    It should be:

    sub split_at_tabs {
       my ($line) = @_;
       my @record = split(/\t/,$line);
       return \@record;
    }
    

    What the fuck? It shouldn't even be a function.

    split(/\t/, $line);

    Unless of course you're fond of hardcoding parameters as part of the function name. After all, we all know that two_plus_two(); is way easier than 2 + 2, right?

  • Joseph Newton (unregistered) in reply to skztr
    skztr:
    This hits home, as I am currently splitting a short string. Do let me point out some WTF: "A sequence of two or more contiguous delimiter characters in the parsed string is considered to be a single delimiter. Delimiter characters at the start or end of the string are ignored. Put another way: the tokens returned by strtok() are always non-empty strings."

    so: "foo,bar,,baz" and "foo,bar,baz" and "foo,,,,,,,,,,,,,,,,,,,,,bar,baz"

    are equivalent.

    No errors, no NULL returned, just silent "pretending it's okay". Gee, thanks!

    So I'm using strchr().

    Hmmm. In which language?

    my_prompt>perl -w
    use strict; #always
    use warnings; #ditto

    my $string = "Hi, I am string,, which may or may not contain null strings, particularly between commas,, which doesn't make much grammatical sense, but does illustrate the actions of the Perl split().";
    my $tokens = [split /\s*,\s*/, $string];
    foreach (@$tokens) {
       print "token is $_\n";
    }

    ^Z
    token is Hi
    token is I am string
    token is
    token is which may or may not contain null strings
    token is particularly between commas
    token is
    token is which doesn't make much grammatical sense
    token is but does illustrate the actions of the Perl split().

  • Joseph Newton (unregistered) in reply to Joseph Newton
    Joseph Newton:

    Especially when all of them arrive at screwy, non-portable deadweight. Unless you are creating and consuming an array in a single scope, there is rarely good reason to create an explicit array in Perl. Here, in one attempt:

    sub split_and_trim_at_tabs($string) { return $records = [split /\s*\t\s*/, $string]; }

    BTW: the return statement is really a comment, since Perl always returns the last expression evaluated.

    OMG! Whoops, I've been away from Perl for too long, luxuriating in the land of PHP, with its standard function signatures. That should be:

    sub split_and_trim_at_tabs {
       my $string = $_[0];
       return $records = [split /\s*\t\s*/, $string];
    }
    

    Haven't seen the responses to this WTF yet, but I do indeed quake with trepidation at the flames to come.

  • (cs) in reply to foxyshadis
    foxyshadis:
    Aristotle Pagaltzis:
    split always uses a regex. If you pass it a string, the string will be interpreted as a regex. There is only one special case, which is if you pass it a string consisting of a single space.
    Aren't constant regexes with no match syntax just optimized into constant strings anyway, so that aside from the initial parse there's no functional difference at all?
    Yes, but that’s an implementation detail of the regex engine. It has no bearing on whether split interprets its first argument as a regex or not.
  • (cs) in reply to Morty
    Morty:
    6. You normally can return just the array, rather than a reference to the array. For really large arrays, you sometimes want return-by-reference, but if the array is composed of substrings from a string that you passed by value, it should be OK to return the array by value.
    And that qualifies for a WTF, how?
    Morty:
    7. If you're going to return an array by reference, you probably don't want to do so using a simple reference to a named array created within the scope of a function. If you've saved the reference in the calling context, the value referred to by the reference will be overwritten the next time the function is called.
    I am not sure what language you’ve been using, but Perl does not behave like that – unless maybe you are talking about global variables used within functions, in which case congratulations for providing another WTF.

    Lexical variables (declared with my) are reallocated every time the scope in which they are declared is re-entered. (That also means that if you declare a my variable inside a loop and then store a reference to that variable on every iteration, you’ll have references to 20 different variables after 20 iterations, not 20 references to the same variable. Very handy.)

  • (cs) in reply to Joseph Newton
    Joseph Newton:
    Whoops, I've been away from Perl for too long, luxuriating in the land of PHP, with its standard function signatures. That should be:
    sub split_and_trim_at_tabs {
       my $string = $_[0];
       return $records = [split /\s*\t\s*/, $string];
    }
    
    “Luxuriating,” yeah right. Why are you assigning to an (undeclared!) `$records` variable in the return statement? Could it be because you’ve been “luxuriating” in the brokenness of PHP for too long?
    sub split_and_trim_at_tabs {
       my $string = $_[0];
       return [split /\s*\t\s*/, $string];
    }

    Btw, why are you trimming whitespace when the original function didn’t? Do you know what the format of the data looks like, and that fields should be trimmed?

  • Jelle Fresen (unregistered)

    Perl... Java... who needs them if you have Haskell?

    setXXXXXXXXXX :: String -> [String]
    setXXXXXXXXXX s = (filter (not.isDelimeter).groupBy delimeter) s
            where delimeter x y = isDelimeter x == isDelimeter y
                  isDelimeter x = x <= ' ' || x == ';'
  • Jelle Fresen (unregistered) in reply to Jelle Fresen
    Jelle Fresen:
    setXXXXXXXXXX :: String -> [String]
    woopsie, the function provided to filter needs a slight adjustment: it should be
    (not.isDelimeter.head)

Leave a comment on “Splitting Headache”
