• bvs23bkv33 (unregistered)

    yep, that is really scrape

  • Yazeran (unregistered)

    So actually this could be done like this (Perl syntax):

    $var =~ s/[\s\t]+/ /g; $var =~ s/split_by/\n/g;

  • WTFguy (unregistered)

    If you squint real hard, it's a form of normalization. The output of this mess is a cleaned-up version of the input. Which makes the next meta-level of their parser simpler since it can rely on everything being in their home-brew version of a canonical form.

    Of course it fails massively on complicated stuff like CDATAs, escaped entities, etc. Such fun.

    And the multiple comprehensions just scream "I wrote the first (innermost) filter believing it solved the whole problem, then later in testing something leaked through, so I wrote a wrapper filter to handle the leakage, then ..." lather rinse repeat. The only reason this isn't nested another 6 levels is their testing wasn't thorough enough to uncover all the other ways this sucks.

  • kktkkr (unregistered)

    But no? It seems that split_by would always become a newline regardless of whether it was already whitespace, an effect that would be lost if you just did two regex replaces.
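
    A minimal Python sketch of that difference, assuming split_by is a literal string that happens to be whitespace (a tab here). The function names and the sample text are made up for illustration, and the second function only approximates the article's logic as reconstructed in the clean-ups further down the thread:

```python
import re


def two_regex_replaces(text, split_by):
    """Collapse whitespace first, then turn each separator into a newline."""
    collapsed = re.sub(r"\s+", " ", text)
    return collapsed.replace(split_by, "\n")


def split_then_normalize(text, split_by):
    """Split on the separator first, then tidy each segment on its own."""
    segments = (" ".join(part.split()) for part in text.split(split_by))
    return "\n".join(segment for segment in segments if segment)


sample = "first  line\tsecond   line"

print(repr(two_regex_replaces(sample, "\t")))
print(repr(split_then_normalize(sample, "\t")))
```

    In the first version the tab has already been collapsed into a space by the time the separator is looked for, so no newline is ever produced.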

  • (nodebb)

    I tried to clean it up:

        def split_lines_words(root, split_by):
            """All calls to list() are unnecessary, I think???."""
            def segments():
                for y in root.text_content().split(split_by):
                    yield y.strip()
            segments = list(segments())
    
            def non_empty_segments():
                for _f in segments:
                    if _f:
                        yield _f
            non_empty_segments = list(non_empty_segments())
    
            def split_words(segment):
                for word in segment.split():
                    yield word.strip()
    
            def cleaned_up_segments():
                for segment in non_empty_segments:
                    yield ' '.join(split_words(segment))
            cleaned_up_segments = list(cleaned_up_segments())
    
            return '\n'.join(cleaned_up_segments)
    

    Addendum 2019-12-02 08:43: Seems code blocks are defined by triple backticks, not via extra indentation.

    what
        if
    
  • (nodebb) in reply to jimbo1qaz 0

    If you worry about intent and not preserving the madness you can clean it up better I think:

    import re
    
    def minimize_whitespace(s):
      whitespace_pattern = re.compile(r'\s+')
      return whitespace_pattern.sub(' ', s.strip())
    
    def scrape_ext(root, split_by):
      lines = root.text_content().split(split_by)
      cleaned = map(minimize_whitespace, lines)
      return '\n'.join(cleaned)
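
    One thing this variant seems to drop is the filtering of empty segments, so an empty piece between two separators would come out as a blank line. A hedged sketch that keeps the filter; FakeRoot is a hypothetical stand-in for whatever object provides text_content(), not a real library class:

```python
import re

WHITESPACE = re.compile(r"\s+")


class FakeRoot:
    """Hypothetical stand-in for a parsed element exposing text_content()."""

    def __init__(self, text):
        self._text = text

    def text_content(self):
        return self._text


def minimize_whitespace(text):
    """Trim the ends and squeeze every whitespace run down to one space."""
    return WHITESPACE.sub(" ", text.strip())


def scrape_ext(root, split_by):
    lines = map(minimize_whitespace, root.text_content().split(split_by))
    # Skip segments that are empty after cleaning, so no blank lines appear.
    return "\n".join(line for line in lines if line)


print(scrape_ext(FakeRoot("a  b;; ;c"), ";"))
```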
    
  • Foo AKA Fooo (unregistered) in reply to jimbo1qaz 0

    Ugh, that's what you call cleaning up? Getting rid of the actual benefits of comprehension (compact syntax), at the cost of lots of one-time functions, while not removing any of the problems mentioned in the article such as the redundant strip and the unnecessary extra pass for filtering?

    Good if you're paid by LOC, otherwise I'd say this is even a pessimization of the original WTF.

    Here's some real cleaning up (may not be 100% correct as I don't do Python, but you get the idea):

    (I hope triple backticks work as you say, otherwise the formatting will be messed up, but should still be readable.)

    def scrape_ext(root, split_by):
      a = [y.strip() for y in root.text_content().split(split_by)]
      return '\n'.join([' '.join(c.split()) for c in a if c])
    
  • (nodebb) in reply to Foo AKA Fooo

    even better (or worse? depends on how you view the world and your life)

    def scrape_ext(root, split_by):
        return '\n'.join([' '.join(c.split()) for c in map(str.strip, root.text_content().split(split_by)) if c])
    

    This avoids that extra allocation by using map, which in Python 3 returns a lazy iterator rather than a list.
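
    A small sketch of what that laziness means in practice, assuming Python 3. The helper name clean_segments and the sample data are invented for illustration:

```python
def clean_segments(text, split_by):
    """Strip each segment lazily, then keep only the non-empty ones."""
    # map() does not build a list of stripped strings; it yields them on demand.
    stripped = map(str.strip, text.split(split_by))
    return [" ".join(segment.split()) for segment in stripped if segment]


lazy = map(str.strip, ["  a ", "", " b  "])
first_pass = list(lazy)   # consumes the iterator
second_pass = list(lazy)  # the iterator is already exhausted here

print(clean_segments(" a  b ;; c ", ";"))
print(first_pass, second_pass)
```

    The flip side of saving the allocation is that the result can only be walked once.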

  • (nodebb)

    Does everyone in this comment section think there's a variable name drought or something? You can use more than three characters to name them!

  • (nodebb) in reply to Sulis

    Yeah, quite a few of them seem to still be making the same mistake as the original WTF by forgetting this:

    "Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live. Code for readability." -- John F. Woods

  • Foo AKA Fooo (unregistered) in reply to Sulis

    K.

  • P (unregistered)

    I think this comment thread shows pretty clearly that when it comes to Python, most people are just as clueless as Remy as to what is good practice aka Pythonic.

    This is why we told you to stick to your Java/.NET/PHP and never touch languages that you're clearly not qualified to talk about!

  • (nodebb)

    Ironically, list comprehensions are one of the things where Python somewhat fails in terms of programmer convenience.

    In JavaScript you can chain list operations like this:

    return array.map(func).filter(pred).reduce(red);
    

    In Python, chaining requires using intermediate names, or re-binding a name, e.g.

    result = map(func, array)
    result = filter(pred, result)
    return reduce(red, result)  # functools.reduce(function, iterable) in Python 3
    

    This is compounded by the absence of scope-limiting constructs and a const keyword, which somehow encourages me to avoid "unnecessary" local variables.
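
    For reference, a runnable sketch of that three-step pipeline in Python 3, where reduce lives in functools and takes the function first and the iterable second. The function name and the lambdas are placeholders, not anything from the article:

```python
from functools import reduce


def sum_of_even_squares(numbers):
    # Each step is bound to its own name, mirroring the chained JavaScript call.
    squares = map(lambda n: n * n, numbers)
    even_squares = filter(lambda n: n % 2 == 0, squares)
    # The third argument is the initial value, so an empty input gives 0.
    return reduce(lambda total, n: total + n, even_squares, 0)


print(sum_of_even_squares([1, 2, 3, 4]))
```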

  • (nodebb) in reply to kbauer

    You realize that instead of

    result = map(func, array)
    result = filter(pred, result)
    

    you can do

    result = filter(pred, map(func, array))
    

    Addendum 2019-12-02 16:23: to be clear, if you're doing something like chaining a map and a filter, you might as well use a list comprehension:

    result = [j for j in map(func, array) if pred(j)]
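
    One caveat worth sketching: filter(pred, map(func, array)) tests the mapped value, so a comprehension only matches it if the condition is applied to the result of func and not to the raw element. The helper functions and numbers below are made up for illustration:

```python
def add_one(n):
    return n + 1


def is_even(n):
    return n % 2 == 0


numbers = [1, 2, 3, 4]

nested = list(filter(is_even, map(add_one, numbers)))
on_output = [m for m in map(add_one, numbers) if is_even(m)]  # same as nested
on_input = [add_one(n) for n in numbers if is_even(n)]        # filters before mapping

print(nested, on_output, on_input)
```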
    
  • Little Bobby Tables (unregistered) in reply to Sulis

    Thaaaaat's NOTHING!

    I've just had a cow-orker ask me to debug his code. He didn't send me his source code, just a snippet of one block where he had 4 variables, all completely different types, called "arg1", "arg2", "arg3" and "arg4". There's a line where he refers to "arg2 [arg4]" and at that stage I decided I need a new job.

  • (nodebb) in reply to P

    Sticking to .NET would mean array-to-string joining being a String method instead of an Array method wouldn't be considered "unusual".

  • (nodebb) in reply to felpy

    In trivial examples it works out well. But usually these chains get a good deal longer, and usually they contain lambdas. The result tends to be deeply nested function calls or list comprehensions, especially when some iteration is involved in making the code do what it was meant to. Or refactoring that involves naming intermediate steps.

    With JavaScript's chaining notation, nesting is avoided.

  • (nodebb)

    I think Python's join is a string method because its parameter can be any iterable, whereas if you have an iterable in JavaScript, you have to use an array comprehension to turn it into an array first before you can join it.
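
    A short sketch of the Python half of that claim: str.join accepts a list, a generator, or any other iterable of strings. The numbered helper is invented for the example:

```python
def numbered(words):
    """Generator yielding labels such as '1.a'; it never builds a list."""
    for index, word in enumerate(words, start=1):
        yield f"{index}.{word}"


from_list = ", ".join(["a", "b", "c"])
from_generator = ", ".join(numbered(["a", "b"]))
from_dict_keys = ", ".join({"x": 1, "y": 2})  # iterating a dict yields its keys

print(from_list, from_generator, from_dict_keys, sep=" | ")
```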

  • Herr Otto Flick (unregistered)

    The code in the article was rewritten by python-future, which abhors filter() and map(). You can spot it immediately: it will turn "filter(None, iter)" into "[_f for _f in iter if _f]". The original comprehension was probably much more readable.
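
    To illustrate the two forms side by side (only the comprehension shape is taken from the comment above; the sample values are made up): the built-in is called as filter(None, iterable), function first, and passing None keeps just the truthy items.

```python
values = ["a", "", None, "b", 0, "c"]

# filter(None, ...) drops every falsy item: "", None, 0, and so on.
with_filter = list(filter(None, values))

# The comprehension in the rewritten style does the same job.
with_comprehension = [_f for _f in values if _f]

print(with_filter, with_comprehension)
```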

  • Vincent (unregistered) in reply to Herr Otto Flick

    Actually, after submitting I've noticed the botched indentation is 2to3's fault and yes, there was a filter() that got replaced. Doesn't make it much better though.

Leave a comment on “List Incomprehension”
