Heading On Out

Madeline inherited some Python 2.7 code, with an eye towards upgrading it to a more modern Python version. This code generates CSV files, and it does so by cramming everything into a 2D array in memory and then dumping the array out with some join operations. That's the real WTF, because joining fields by hand produces invalid CSV as soon as a value contains a comma, a quote, or a line break. Like so many things, CSV files are way more complicated than people think.
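To make the join problem concrete, here's a minimal sketch in modern Python. The row contents are invented for illustration and don't come from Madeline's data; the point is only to compare a hand-rolled join against the standard library's csv module.

```python
import csv
import io

# Hypothetical row (not from the original code): one field contains a comma,
# another contains double quotes.
row = ["42", "Springfield, IL", 'No "overnight" parking']

# The hand-rolled approach: glue the fields together with commas.
naive_line = ",".join(row)

# Reading that line back splits the city into two separate fields.
naive_parsed = next(csv.reader([naive_line]))

# csv.writer quotes and escapes fields as needed, so the row round-trips.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
proper_line = buffer.getvalue()
proper_parsed = next(csv.reader(io.StringIO(proper_line)))
```

The three-field row comes back as four fields from the joined line, while the csv.writer output reads back as the original three.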

But we're going to focus in on a smaller subset of this pile of WTFs. I'll lead with the caveat from Madeline: "I've changed some of the variable names around, for a bit of anonymity, but couldn't get variable names quite as terrible as the original ones."

columns = []
headColumns = "Record ID, Record Category,  CITY,Restrictions,UNIQUE_GENERATED_ID".split(
	","
)
for currentColumn in headColumns:
	columns.append(currentColumn)

This code tracks the column names as a comma separated string. It splits the string into a list. Then it iterates across that list and appends each item into… a list. headColumns is never referenced again.

This could have just been… a list of column names. No splits. No appends. Just… have a list of column names.
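As a sketch of what that might look like: the first half below reproduces the original logic (with snake_case names of my own choosing), and the second half is the plain list. I've kept the stray leading spaces in the literal, on the assumption that something downstream might depend on the exact header text.

```python
# The original approach, reproduced for comparison.
columns_original = []
head_columns = "Record ID, Record Category,  CITY,Restrictions,UNIQUE_GENERATED_ID".split(",")
for current_column in head_columns:
    columns_original.append(current_column)

# The same data as a plain list literal. The leading spaces are preserved
# so the result is identical to what the original code builds.
columns = [
    "Record ID",
    " Record Category",
    "  CITY",
    "Restrictions",
    "UNIQUE_GENERATED_ID",
]
```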

That's annoying, but then there are all the hinted-at details here. Because the string is split on a bare comma, some of the column names keep an arbitrary number of leading spaces. There's a mix of conventions: spaces and underscores in names, Title Case and ALL CAPS. Two columns, "Record ID" and "UNIQUE_GENERATED_ID", both suggest that they're unique identifiers. Madeline says that the CSV file ends up containing three different identifier columns, and says, "I think that's likely due to an improperly designed database schema."
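The whitespace is easy to miss when reading the string, so here's a small sketch that counts the leading spaces each name ends up with after the split. The variable names are mine, not the original code's.

```python
header = "Record ID, Record Category,  CITY,Restrictions,UNIQUE_GENERATED_ID"
names = header.split(",")

# Map each trimmed column name to the number of leading spaces it carries
# after splitting on a bare comma.
leading_spaces = {name.strip(): len(name) - len(name.lstrip(" ")) for name in names}
```

Whether those spaces should be stripped depends on what consumes the file, so treat any cleanup as a deliberate change rather than a free fix.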

Yes, I suspect Madeline is right about that.
