Today's anonymous submitter sends us a pile of code that exists to flummox us. It's consfuing, weird, and I don't understand it, even as I understand it. So we're going to walk through this packet of Python in steps, to see if we can understand what's going on.
First is this handy helper function, to turn a value into a number:
def to_number(s):
try:
if not s:
return 0
number = float(s)
return int(number) if number.is_integer() else number
except (TypeError, ValueError):
return s
This doesn't start too badly. If the value is falsy (None
, for example), return 0. If it has a value, though, turn it into a float. If the resulting float doesn't have a decimal part, convert it into an integer. I've got some complaints about a method returning two different types, but I understand the urge.
But then we get to the exception handler. If, for some reason, this conversion fails we return the input. This method shouldn't be named to_number
, it should be named to_number_if_the_input_is_a_number_otherwise_we_return_the_input
, because that's a behavior that's guaranteed to confuse someone someday.
Okay, I understand that method, so let's look into another.
@staticmethod
def get_product_property(product_id, property):
from utils import to_number
prop = ProductProperty.objects.filter(product_id=product_id, property=property).first()
value = None if prop is None else prop.value
if not value:
return [0, 0] if property.lower() in PROPERTIES_TO_SPLIT else 0
if property.lower() in PROPERTIES_TO_SPLIT:
return list(map(to_number, value.split(",")))
return to_number(value)
One of the first rules in Python: put all your imports in one place, and that place is definitely not at the start of a method. Now, admittedly, this can boost startup times if it's an expensive module to load, but it also means the first call to this method will take a lot longer.
Okay, so we take a product_id
and the property
and search the ProductProperty
lookup table to find the first match, so we can use the value
stored in the DB. Then, we get to something terrible. There is a PROPERTIES_TO_SPLIT
constant, containing a list of property IDs. Because sometimes, the data in the database is a list of values as a comma-separated string. Sometimes it isn't. This is worse than stringly typed data, it's stringly-multiply-typed data.
But that explains why to_number
returns the input on an error- sometimes we pass it an array of numbers.
So now we get to the centerpiece of this sample.
@classmethod
def get_closest_named_product(cls, product):
from core.products import PRODUCTS
"""
product = {"interval": (123, 456), "value": (789, )}
named_products = {
"some_name": {"interval": (987, 654), "value": (321, )},
"some_other_name": {"interval": (123, 456), "value": (789, )},
...
}
-> returns "some_other_name"
do this by multiplying the product dollar values and finding the named_products
with a product that's closest to it
"""
def values_product(item):
return functools.reduce(operator.mul, itertools.chain(*item.values()))
product_product = values_product(product) # 123 * 456 * 789
named_products = {
k: values_product(v)
for k, v in PRODUCTS.items()
} # {"some_name": 987 * 654 * 321, ...}
def key(item):
return abs(product_product - item)
selected_product = min(named_products.values(), key=key)
product_name = {v: k for k, v in named_products.items()}
return product_name[selected_product]
This helpfully has a very thorough code sample in the documentation. Well, maybe not helpfully, because I have no idea what's going on. A product
is apparently a combination of an interval and a value, which are dollar values, so I don't think this is time series data. The way we determine if two products are "close" is based on "multiplying the product dollar values". Why? I don't understand what this is for. Maybe the code will make it clearer.
We create a convenience function value_product
with does some functools
and iterntools
magic to flatten a product and multiply all the numbers together.
Then we repeat this for all the products in our PRODUCTS
constant, which we imported at the top (again). But wait, a PRODUCTS
constant? Yes! There's a hard-coded list of products in our application code- our application code which queries a database. Well, that's fantastic.
Then we chuck another convenience function, key
in there: just take the absolute value of our product_product
's difference from the item
in question. We use that to (ab)use the min
function to search for the minimum difference. Finally, we invert the relationship between keys and values, so that we can lookup the product based on that minimum difference… wait, does this mean that the products in named_products
don't store the whole interval
/value
pair? I don't understand. Does this code even work?
Our submitter writes:
Not that anyone seems to care, this code has been in production for 6 years already.
In this case, the code does work, sometimes, but it's extremely error prone. And the original author knows this. They wrote some unit tests, and commented the unit test for to_number
like so:
# XXX: utils.to_number is returning same string if conversion failed
# FIXME: Either update unit test expectation or function return value