Loads of languages, like Python, have some sort of "comprehension" as a form of syntactic sugar. Instead of doing something awkward like:
my_list = [1, 2, 3, 4]
res = []
for x in my_list:
    res.append(x * x)
# res contains: [1, 4, 9, 16]
You can instead do:
my_list = [1, 2, 3, 4]
res = [x * x for x in my_list]
# res contains: [1, 4, 9, 16]
Used correctly, it's not just code golf: it can make the intent and purpose of your code clearer. Used incorrectly, it can accomplish the exact opposite.
Vincent took over a product with a lot of modules which had, at one time, been very important bits of functionality, but were now deprecated. For example, there used to be an lxml-based parser which loaded data from an XML-based web service. That web service was long dead, and the parser was thus no longer needed, but the code wasn't so well organized that you could just delete the module without doing a review.
That's how Vincent found this:
def scrape_ext(root, split_by):
    return '\n'.join([
        ' '.join([b.strip() for b in c.split()]) for c in [_f for _f in [
            y.strip() for y in
            root.text_content().split(split_by)] if _f]])
This is an impressive triply-nested comprehension, with useless variable names and a bonus bit of awkward indentation to help keep it unreadable and unclear. So much for Python's whitespace-as-syntax helping developers keep their code blocks properly indented.
Let's see if we can make sense of this by taking it from the inside out. First:
[y.strip() for y in root.text_content().split(split_by)]
This is easy, on its own: take the text of an HTML element, and create a list by splitting on some character, but also stripping whitespace. This, alone, is a pretty textbook example of a simple comprehension: it iterates across a list and manipulates each item in the list in a small way. The next comprehension, wrapping around that:
[_f for _f in split_and_stripped if _f]
This highlights another feature of Python comprehensions: filtering. You have an if _f at the end, which selects only the elements that are truthy values, so any empty strings will be filtered out.
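As a small illustration of that filter, here is a sketch on a made-up input. The sample string and the names raw, pieces, and kept are invented for this example and are not from Vincent's codebase.

```python
# Hypothetical input: segments separated by ';', some of them blank.
raw = " alpha ; ;beta;;  gamma "

# Split and strip, as the innermost comprehension does.
pieces = [y.strip() for y in raw.split(";")]

# An empty string is falsy, so `if _f` drops the blank segments.
kept = [_f for _f in pieces if _f]

print(pieces)  # ['alpha', '', 'beta', '', 'gamma']
print(kept)    # ['alpha', 'beta', 'gamma']
```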
There's only one problem with that filter: it doesn't need a comprehension of its own. The next comprehension is for c in [_f for … if _f], so we could just as easily have written for c in split_and_stripped if c. And what do we do with c anyway?
Another nested comprehension:
[b.strip() for b in c.split()]
Split the string on whitespace, strip the whitespace… that we just split on. Called without arguments, Python's split splits on runs of whitespace and discards any at the ends, so none is left in the pieces, making the strip unnecessary.
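A quick check of that claim, using a made-up string (the sample text and variable names are assumptions for the sake of the example):

```python
# Hypothetical sample with messy spacing, a tab, and a newline.
c = "  lots   of\tmessy \n whitespace  "

# With no arguments, str.split() splits on runs of whitespace and ignores
# leading/trailing whitespace, so no piece has anything left to strip.
parts = c.split()
stripped_parts = [b.strip() for b in c.split()]

print(parts)                    # ['lots', 'of', 'messy', 'whitespace']
print(parts == stripped_parts)  # True
```

Note that this only holds for the no-argument form. With an explicit separator, such as split(';'), the pieces can keep their surrounding whitespace.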
Then we ' '.join([b.strip() for b in c.split()]), which shows us Python's unusual approach to joins (they're string methods, not array methods; this joins the array using a space between each element).
Then we join the results of all the other comprehensions with a \n.
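For readers unfamiliar with the idiom, a minimal sketch of both joins on invented data (words and lines are hypothetical names):

```python
words = ["turn", "into", "one", "line"]
lines = ["first line", "second line"]

# join is called on the separator string and takes the iterable as its argument.
sentence = " ".join(words)
block = "\n".join(lines)

print(sentence)     # turn into one line
print(repr(block))  # 'first line\nsecond line'
```

Since join accepts any iterable of strings, the inner square brackets in the original are optional: a generator expression would work just as well.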
So the real purpose of this code: collapse each run of whitespace into a single space, drop any empty segments, and replace an arbitrary separator (split_by) with a newline. But you wouldn't get that by just reading it, and I'm not entirely certain the original developer actually realized that's what they were doing, because this isn't the kind of code written by someone who understands the problem they're solving.
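For comparison, here is one way the same behavior might be written more readably. This is only a sketch. The names normalize_segments and FakeElement are invented for this example, the indentation of the reproduced original is a guess, and taking a plain string instead of the element is a design choice of the sketch, not something from Vincent's codebase.

```python
def scrape_ext_original(root, split_by):
    # The code Vincent found, reproduced for comparison.
    return '\n'.join([
        ' '.join([b.strip() for b in c.split()]) for c in [_f for _f in [
            y.strip() for y in
            root.text_content().split(split_by)] if _f]])


def normalize_segments(text, split_by):
    """Collapse whitespace in each split_by-delimited segment, drop the
    empty segments, and put each remaining segment on its own line."""
    segments = (" ".join(segment.split()) for segment in text.split(split_by))
    return "\n".join(segment for segment in segments if segment)


class FakeElement:
    """Hypothetical stand-in for an lxml element; only text_content() is stubbed."""

    def __init__(self, text):
        self._text = text

    def text_content(self):
        return self._text


root = FakeElement("  a   b ;; \t c\nd ;  ")
print(repr(scrape_ext_original(root, ";")))               # 'a b\nc d'
print(repr(normalize_segments(root.text_content(), ";")))  # 'a b\nc d'
```

Collapsing whitespace before filtering means a segment containing only whitespace becomes an empty string and is dropped, which matches what the original does with its strip-then-filter step.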
Like so much bad code, this was fortunately unused in the program, and Vincent was free to dispose of it.