Very Productive Code

Today's anonymous submitter sends us a pile of code that exists to flummox us. It's consfuing, weird, and I don't understand it, even as I understand it. So we're going to walk through this packet of Python in steps, to see if we can understand what's going on.

First is this handy helper function, to turn a value into a number:

def to_number(s):
    try:
        if not s:
            return 0
        number = float(s)
        return int(number) if number.is_integer() else number
    except (TypeError, ValueError):
        return s

This doesn't start too badly. If the value is falsy (None, for example), return 0. If it has a value, though, turn it into a float. If the resulting float doesn't have a decimal part, convert it into an integer. I've got some complaints about a method returning two different types, but I understand the urge.

But then we get to the exception handler. If, for some reason, this conversion fails we return the input. This method shouldn't be named to_number, it should be named to_number_if_the_input_is_a_number_otherwise_we_return_the_input, because that's a behavior that's guaranteed to confuse someone someday.

Okay, I understand that method, so let's look into another.

@staticmethod
def get_product_property(product_id, property):
    from utils import to_number

    prop = ProductProperty.objects.filter(product_id=product_id, property=property).first()

    value = None if prop is None else prop.value
    if not value:
        return [0, 0] if property.lower() in PROPERTIES_TO_SPLIT else 0

    if property.lower() in PROPERTIES_TO_SPLIT:
        return list(map(to_number, value.split(",")))

    return to_number(value)

One of the first rules in Python: put all your imports in one place, and that place is definitely not at the start of a method. Now, admittedly, this can boost startup times if it's an expensive module to load, but it also means the first call to this method will take a lot longer.

Okay, so we take a product_id and the property and search the ProductProperty lookup table to find the first match, so we can use the value stored in the DB. Then, we get to something terrible. There is a PROPERTIES_TO_SPLIT constant, containing a list of property IDs. Because sometimes, the data in the database is a list of values as a comma-separated string. Sometimes it isn't. This is worse than stringly typed data, it's stringly-multiply-typed data.

But that explains why to_number returns the input on an error- sometimes we pass it an array of numbers.

So now we get to the centerpiece of this sample.

@classmethod
def get_closest_named_product(cls, product):
    from core.products import PRODUCTS
    """
        product = {"interval": (123, 456), "value": (789, )}
        named_products = {
            "some_name": {"interval": (987, 654), "value": (321, )},
            "some_other_name": {"interval": (123, 456), "value": (789, )},
            ...
        }
        -> returns "some_other_name"

        do this by multiplying the product dollar values and finding the named_products
        with a product that's closest to it
    """

    def values_product(item):
        return functools.reduce(operator.mul, itertools.chain(*item.values()))

    product_product = values_product(product)  # 123 * 456 * 789
    named_products = {
        k: values_product(v)
        for k, v in PRODUCTS.items()
    }  # {"some_name": 987 * 654 * 321, ...}

    def key(item):
        return abs(product_product - item)

    selected_product = min(named_products.values(), key=key)
    product_name = {v: k for k, v in named_products.items()}
    return product_name[selected_product]

This helpfully has a very thorough code sample in the documentation. Well, maybe not helpfully, because I have no idea what's going on. A product is apparently a combination of an interval and a value, which are dollar values, so I don't think this is time series data. The way we determine if two products are "close" is based on "multiplying the product dollar values". Why? I don't understand what this is for. Maybe the code will make it clearer.

We create a convenience function value_product with does some functools and iterntools magic to flatten a product and multiply all the numbers together.

Then we repeat this for all the products in our PRODUCTS constant, which we imported at the top (again). But wait, a PRODUCTS constant? Yes! There's a hard-coded list of products in our application code- our application code which queries a database. Well, that's fantastic.

Then we chuck another convenience function, key in there: just take the absolute value of our product_product's difference from the item in question. We use that to (ab)use the min function to search for the minimum difference. Finally, we invert the relationship between keys and values, so that we can lookup the product based on that minimum difference… wait, does this mean that the products in named_products don't store the whole interval/value pair? I don't understand. Does this code even work?

Our submitter writes:

Not that anyone seems to care, this code has been in production for 6 years already.

In this case, the code does work, sometimes, but it's extremely error prone. And the original author knows this. They wrote some unit tests, and commented the unit test for to_number like so:

        # XXX: utils.to_number is returning same string if conversion failed
        # FIXME: Either update unit test expectation or function return value

[Advertisement] BuildMaster allows you to create a self-service release management platform that allows different teams to manage their applications. Explore how!