Last week, we took a look at a hash array anti-pattern in JSON. This week, we get to see a Python version of that idea, with extra bonus quirks, from an anonymous submitter.
In this specific case, the code needed to handle CSV files. The order of the columns absolutely matters, and thus the developer needed to make sure that they always handled columns in the correct order. This led to code like this:
FIELD_NAME_ORDER = collections.OrderedDict({
    1: 'Field1',
    2: 'Field2',
    # etc. There are over a hundred fields.
})
# Elsewhere in the code, the only usage of FIELD_NAME_ORDER...
for field_name in FIELD_NAME_ORDER.values():
    AddField(field_name)
Now, the first thing you notice is that this is, once again, a hash array. The keys are the indexes. It doesn't look like that much of a WTF, and you'll note the use of OrderedDict, which ensures that the dictionary retains insertion order. So this is just a silly little block of code…
Except, there are a few problems. First, starting with Python 3.7, every plain dict is guaranteed to preserve insertion order (CPython 3.6 already behaved that way as an implementation detail), so you don't really need the OrderedDict constructor in there. That's no big deal, except that prior to that, a dictionary literal like {1: 'Field1', 2: 'Field2'} carried no ordering guarantee at all: it was just a hash table, which means the iteration order of the keys was arbitrary.
From the docs:
Keys and values are listed in an arbitrary order which is non-random, varies across Python implementations, and depends on the dictionary’s history of insertions and deletions.
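For contrast, here's a quick check of the modern guarantee, assuming it runs on Python 3.7 or later. The keys are deliberately written out of numeric order, and the field names are made up for illustration.

```python
from collections import OrderedDict

# Hypothetical fields, deliberately inserted out of numeric order.
fields = {2: 'Field2', 1: 'Field1', 3: 'Field3'}

# On Python 3.7+, a plain dict iterates in insertion order, not key order.
plain_order = list(fields)

# Wrapping it in OrderedDict just copies that same order.
wrapped_order = list(OrderedDict(fields))

print(plain_order)
print(wrapped_order)
```

Both lists come out in the order the keys were written, so on a modern interpreter the OrderedDict wrapper adds nothing here.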
Now, this code targets Python 2.7, which is old and out of support, and clearly TRWTF. But in 2.7, this absolutely was how dictionaries worked, so this code, on the surface, shouldn't work. But it does, and the reason isn't surprising once you think about it: what would you expect the hash of the number 1 to be?
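CPython, the main implementation of Python, quite reasonably hashes ints like these to their value: hash(1) == 1. A plain 2.7 dict doesn't actually sort anything, but it does store each key in a slot picked from its hash, and it iterates over those slots in order. With small, consecutive integer keys and no collisions, slot order is numeric order. So the dict literal will iterate in the order of the numeric keys, and when we pass that into an OrderedDict, it will preserve the insertion order, which is the numeric order.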
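A rough way to see why integer keys come out in numeric order is to model how a hash table picks a slot for each key. The sketch below is a simplified model, not CPython's actual dict code: it assumes a power-of-two table and ignores collision probing and resizing, and slot_order and table_size are names invented for illustration.

```python
def slot_order(keys, table_size):
    """Return keys in the order a simplified hash table would iterate them.

    Simplified model: each key goes in slot hash(key) & (table_size - 1),
    and iteration walks the slots from lowest to highest. Collisions are
    not modeled, so inputs that would collide are rejected.
    """
    mask = table_size - 1  # table_size is assumed to be a power of two
    slots = {}
    for key in keys:
        slot = hash(key) & mask
        if slot in slots:
            raise ValueError("collision; this toy model does not probe")
        slots[slot] = key
    return [slots[slot] for slot in sorted(slots)]


# Small non-negative ints hash to themselves in CPython.
int_hashes = [hash(n) for n in (1, 2, 100)]

# Insert keys in scrambled order; slot order still comes out numeric.
scrambled = [3, 1, 120, 2, 57]
ordered = slot_order(scrambled, table_size=256)

print(int_hashes)
print(ordered)
```

Feed it string keys and there's no such luck: their hashes have nothing to do with the order you wrote them in.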
The developer who wrote this blundered into a working solution by what appears to be an accident.
Our anonymous submitter took the extra few seconds to replace the OrderedDict with a list, which, y'know, is already going to guarantee order without you needing to blunder into how hashes work.
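The fix would look something like the sketch below. The field names are placeholders, and add_field is a stand-in that just records its calls, since the real AddField isn't shown.

```python
# Placeholder names; the real file has over a hundred fields.
FIELD_NAMES = [
    'Field1',
    'Field2',
    'Field3',
]

added = []


def add_field(field_name):
    """Stand-in for the codebase's AddField; records the call order."""
    added.append(field_name)


# A list iterates in the order it was written, on every Python version.
for field_name in FIELD_NAMES:
    add_field(field_name)

print(added)
```

No keys to number by hand, and no dependence on what any particular interpreter does with a hash.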