If you want millions in VC funding, mumble the words “machine learning” and “disruption” and they’ll blunder out of the woods to just throw money at your startup.

At its core, ML is really about brute-forcing a statistical model. And today’s code from Norine could have possibly been avoided by applying a little more brute force to the programmer responsible.

This particular ML environment, like many, uses Python to wrap around lower-level objects. The ease of Python coupled with the speed of native/GPU-accelerated code. It has a collection of Model datatypes, and at runtime, it needs to decide which concrete Model type it should instantiate. If you come from an OO background in pretty much any other language, you’re thinking about factory patterns and abstract classes, but that’s not terribly Pythonic. Not that this developer’s solution is Pythonic either.

def choose_model(data, env):
  ModelBase = getattr(import_module(env.modelpath), env.modelname)
  class Model(ModelBase):
    def __init__(self, data, env):
      if env.data_save is None:
        if env.counter == 0:
          self.data = data
          raise ValueError("data unavailable with counter > 0")
        with open(env.data_save, "r") as df:
          self.data = json.load(df)
      ModelBase.__init__(self, **self.data)
  return Model(data, env)

This is an example of metaprogramming. We use import_module to dynamically load a module at runtime- potentially smart, because modules may take some time to load, so we shouldn’t load a module we don’t know that we’re going to use. Then, with get_attr, we extract the definition of a class with whatever name is stored in env.modelname.

This is the model class we want to instantiate. But instead of actually instantiating it, we instead create a new derived class, and slap a bunch of logic and file loading into it.

Then we instantiate and return an instance of this dynamically defined derived class.

There are so many things that make me cringe. First, I hate putting file access in the constructor. That’s maybe more personal preference, but I hate constructors which can possibly throw exceptions. See also the raise ValueError, where we explicitly throw exceptions. That’s just me being picky, though, and it’s not like this constructor will ever get called from anywhere else.

More concretely bad, these kinds of dynamically defined classes can have some… unusual effects in Python. For example, in Python2 (which this is), each call to choose_model will tag the returned instance with the same type, regardless of which base class it used. Since this method might potentially be using a different base class depending on the env passed in, that’s asking for confusion. You can route around these problems, but they’re not doing that here.

But far, far more annoying is that the super-class constructor, ModelBase.__init__, isn’t called until the end.

You’ll note that our child class manipulates self.data, and while it’s not pictured here, our base model classes? They also use a property called data, but for a different purpose. So our child class inits a child class property, specifically to build a dictionary of key/value pairs, which it then passes as kwargs, or keyword arguments (the ** operator) to the base class constructor… which then overwrites the self.data our child class was using.

So why do any of that?

Norine changed the code to this simpler, more reliable version, which doesn’t need any metaprogramming or dynamically defined classes:

def choose_model(data, env):
  Model = getattr(import_module(env.modelpath), env.modelname)
  if env.data_save is not None:
    with open(env.data_save, "r") as df:
      data = json.load(df)
  elif env.counter != 0:
    raise ValueError('if env.counter > 0 then must use data_save parameter')

  return Model(**data)

Norine adds:

I’m thinking of holding on to the original, and showing it to interviewees like a Rorschach test. What do you see in this? The fragility of a plugin system? The perils of metaprogramming? The hollowness of an overwritten argument? Do you see someone with more cleverness than sense? Or someone intelligent but bored? Or perhaps you see, in the way the superclass init is called, TRWTF: a Python 2 library written within the last 3 years.

[Advertisement] Utilize BuildMaster to release your software with confidence, at the pace your business demands. Download today!