Part of the pitch of Python is that it's a language that values simplicity and ease of use. Whether it lives up to that pitch is its own discussion, but the good news is that if you want to write truly complex code with loads of inheritance and nasty interrelationships, you can.
Today's anonymous submitter started out by verifying some of the math in an extremely math-y piece of analytical code. Previously, they'd had problems where the documentation and the code were subtly different, so it was important that they understood exactly what the code was doing.
In one file, they found the core class they cared about:
class WeibullFitter(KnownModelParametricUnivariateFitter):
    # snip: some math
Now, the math looked more or less right. But there was just one problem: who knows how the superclass interacts with that? Subtle changes in behavior could appear there.
class KnownModelParametricUnivariateFitter(ParametricUnivariateFitter):
    _KNOWN_MODEL = True
The "base" class isn't terribly "base" at all, as you can see: all it does is set a property to True
. So once again, up the inheritance tree we go, to see the base class:
class ParametricUnivariateFitter(UnivariateFitter):
    # ...snip...
We're still not at the actual base class, though this one at least has some implementation. Is that a good thing? We haven't hit TRWTF just yet, but this strikes me as a good chance to talk about why "inheritance is considered harmful": inheritance is an automatic dependency. To understand the behavior of a child class, you also have to understand the behavior of its ancestor classes. Certainly, well-implemented inheritance should keep those boundaries neat, but as we can see, this example isn't "well implemented".
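To make that concrete, here's a minimal, hypothetical sketch (nothing from this library): the child class reads as self-contained, but its correctness silently depends on a detail defined one ancestor up.

class Base:
    _SCALE = 2  # an implementation detail the child silently depends on

    def value(self, x):
        return x * self._SCALE

class Child(Base):
    def doubled_plus_one(self, x):
        # Reads as "double it and add one", but only because Base._SCALE == 2.
        return self.value(x) + 1

print(Child().doubled_plus_one(3))  # 7 today; edit Base, and Child changes too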
More than that, Python is purposefully dynamically typed and duck typed, so one of the key benefits of inheritance, polymorphism, isn't even something you need inheritance for: any object with the right methods will do. And yes, one could use Python's type annotations to get some lint-time type checking, which would sort of bring back the case for subtype polymorphism, but it still doesn't justify this whole inheritance tree.
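Here's a quick sketch of what duck typing buys you; the class names are invented for illustration:

class ExponentialFitter:
    def fit(self, data):
        return "exponential fit"

class GammaFitter:
    def fit(self, data):
        return "gamma fit"

# No shared base class, no registration; anything with a fit() method works.
for fitter in (ExponentialFitter(), GammaFitter()):
    print(fitter.fit([1.0, 2.0, 3.0]))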
That all aside, one of the main reasons we use inheritance is so that we can cut out common conditional logic and let the type system worry about it. I call concreteInstance.someMethod() and the right thing happens, even if I have a dozen possible types, each with some differing behavior. I bring this up because in the ParametricUnivariateFitter class, we have this:
def _fit_model(self, Ts, E, entry, weights, show_progress=True):
    if utils.CensoringType.is_left_censoring(self):  # Oh no.
        negative_log_likelihood = self._negative_log_likelihood_left_censoring
    elif utils.CensoringType.is_interval_censoring(self):  # Oh no no no.
        negative_log_likelihood = self._negative_log_likelihood_interval_censoring
    elif utils.CensoringType.is_right_censoring(self):  # This is exactly what I think it is, isn't it.
        negative_log_likelihood = self._negative_log_likelihood_right_censoring
    # ...snip...
Comments provided by the submitter. In addition to having a whole tree of child classes, each of these child classes may have a censoring type applied, and our behavior is different based on the censoring type. This is 100% a code smell, and it becomes even clearer when we take a look at CensoringType:
from enum import Enum
from functools import wraps
from typing import Callable

class CensoringType(Enum):
    LEFT = "left"
    INTERVAL = "interval"
    RIGHT = "right"

    @classmethod
    def right_censoring(cls, function: Callable) -> Callable:
        @wraps(function)
        def f(model, *args, **kwargs):
            cls.set_censoring_type(model, cls.RIGHT)
            return function(model, *args, **kwargs)

        return f

    @classmethod
    def left_censoring(cls, function: Callable) -> Callable:
        @wraps(function)
        def f(model, *args, **kwargs):
            cls.set_censoring_type(model, cls.LEFT)
            return function(model, *args, **kwargs)

        return f

    @classmethod
    def interval_censoring(cls, function: Callable) -> Callable:
        @wraps(function)
        def f(model, *args, **kwargs):
            cls.set_censoring_type(model, cls.INTERVAL)
            return function(model, *args, **kwargs)

        return f

    @classmethod
    def is_right_censoring(cls, model) -> bool:
        return cls.get_censoring_type(model) == cls.RIGHT

    @classmethod
    def is_left_censoring(cls, model) -> bool:
        return cls.get_censoring_type(model) == cls.LEFT

    @classmethod
    def is_interval_censoring(cls, model) -> bool:
        return cls.get_censoring_type(model) == cls.INTERVAL

    @classmethod
    def get_censoring_type(cls, model) -> "CensoringType":
        return model._censoring_type

    @classmethod
    def str_censoring_type(cls, model) -> str:
        return model._censoring_type.value

    @classmethod
    def set_censoring_type(cls, model, censoring_type) -> None:
        model._censoring_type = censoring_type
For those not up on Python, Enum is exactly what you think it is. The @classmethod decorator, though, is not quite Python's version of a static method (that's @staticmethod); it makes a class method. In the same way instance methods take self as their first parameter (the Pythonic "this"), class methods take cls as their first parameter: a reference to the class itself.
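A quick, self-contained illustration of the difference (not from the library in question):

class Example:
    @classmethod
    def who(cls):
        return cls.__name__  # class methods receive the class itself

    @staticmethod
    def plain():
        return "no self, no cls"  # static methods receive neither

print(Example.who())    # Example
print(Example.plain())  # no self, no cls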
It's also important to note methods like right_censoring, because those are themselves decorator definitions. See how they def a local function, itself decorated with @wraps? The right_censoring(cls, function) signature expects a callable (probably a constructor method) and replaces its implementation with the implementation of f, the inner function. Here, f tampers with the first argument, the model, stamping a censoring type onto it before calling the wrapped callable itself.
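Here's that pattern in isolation, as a minimal sketch with invented names: a decorator that stamps a marker onto its first argument before delegating to the wrapped callable.

from functools import wraps

def tag_red(function):
    @wraps(function)  # preserve the wrapped callable's name and docstring
    def f(model, *args, **kwargs):
        model["color"] = "red"  # the side effect, smuggled in before the call
        return function(model, *args, **kwargs)
    return f

@tag_red
def fit(model, data):
    return model, data

print(fit({}, [1, 2]))  # ({'color': 'red'}, [1, 2])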
If you aren't doing a lot of Python, you might be completely confused at this point, so let me just show you how this gets used:
@CensoringType.right_censoring
class SomeCurveFitterType(SomeHorribleTreeOfBaseClasses):
    def __init__(self, model, *args, **kwargs):
        # snip

instance = SomeCurveFitterType(model, *args, **kwargs)
On that last line, we don't directly call the __init__ constructor; we first pass through that inner f function which, most notably, does this: cls.set_censoring_type(model, cls.RIGHT) before invoking the constructor.
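That's because decorator syntax is just sugar for an explicit call. Continuing the hypothetical example above, the decorated class definition is equivalent to:

SomeCurveFitterType = CensoringType.right_censoring(SomeCurveFitterType)
# The name now points at the inner f, so every "construction" routes through
# it, stamping the censoring type onto model before the real callable runs.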
If you're confused by all of this, don't feel bad. Decorators are a Pythonic way to tamper with the implementation of classes and functions, allowing you to mix declarative programming with more traditional techniques. In the end, to understand what the WeibullFitter class does, you have to walk up a half dozen ancestor classes to reach the BaseFitter type, note which decorators are applied to it and to each of its ancestor classes, and know what all of their implementations are.
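If you ever face a similar tree, Python will at least enumerate the ancestry for you. Here's a self-contained sketch using stand-in classes that mirror the names in this article:

class BaseFitter: pass
class UnivariateFitter(BaseFitter): pass
class ParametricUnivariateFitter(UnivariateFitter): pass
class KnownModelParametricUnivariateFitter(ParametricUnivariateFitter): pass
class WeibullFitter(KnownModelParametricUnivariateFitter): pass

print([k.__name__ for k in WeibullFitter.__mro__])
# ['WeibullFitter', 'KnownModelParametricUnivariateFitter',
#  'ParametricUnivariateFitter', 'UnivariateFitter', 'BaseFitter', 'object']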
If you're the person who wrote this code, this mix of decorators and inheritance probably feels wonderfully extensible. It's probably quick and easy to slap a new curve fitting function into the framework, at least if you're the galaxy-brained individual who dreamed up this over-engineered framework. The rest of us just have to poke at it with sticks until the behavior we want falls out.
Our anonymous submitter adds:
I almost feel bad about this. The library is generally good, the math itself is good, this is a very useful library that has made my job a lot easier…
This sort of behavior has real performance implications, too. I had to re-implement a different piece because the sheer number of nested function calls (some of which were actually recursive, in a chain that spanned three different classes) completely destroyed the performance of a conceptually simple process, such that it could take upwards of ten minutes.
The rewritten function runs near-instantly and fits in 20 lines with comments.