Since it's election day in the US, many people are thinking about counting today. We frequently discuss counting here, and how to do it wrong, so let's look at some code from RK.
This code may not be counting votes, but whatever it's counting, we're not going to enjoy it:
case LogMode.Row_limit: // row limit excel = 65536 rows
    if (File.Exists(personalFolder + @"\" + fileName + ".CSV"))
    {
        using (StreamReader reader = new StreamReader(personalFolder + @"\" + fileName + ".CSV"))
        {
            countRows = reader.ReadToEnd().Split(new char[] { '\n' }).Length;
        }
    }
Now, this code is from a rather old application, originally released in 2007. So the comment about Excel's row limit really puts us in a moment in time: Excel 2007 raised the row limit to 1,048,576 rows, but older versions of Excel did cap out at 65,536. And it wasn't the case that everyone just up and switched to Excel 2007 when it came out; transitioning to the new Office file formats took years.
But we're not even reading an Excel file, we're reading a CSV.
I enjoy that we construct the name twice, because that's useful. But the real magic of this one is how we count the rows. Because while Excel can handle 65,536 rows at this time, I don't think this program is going to do a great job of it, because we read the entire file into memory with ReadToEnd, then Split on newlines, then count the length that way.
As you can imagine, in practice, this performed terribly on large files, of which there were many.
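For contrast, here is a minimal sketch of how the count could be done without holding the whole file in memory. It is not RK's code: the RowCounter class and CountRows method are hypothetical names, and returning 0 for a missing file is an assumption (the original simply leaves countRows untouched in that case). It builds the path once and reads one line at a time with StreamReader.ReadLine, which is available even on the older .NET versions an application from 2007 would target. Note that the result can differ from the original by one: splitting on '\n' counts an empty trailing entry when the file ends with a newline, while ReadLine does not. Neither approach understands quoted CSV fields that contain line breaks.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the original application.
static class RowCounter
{
    // Counts the lines in <folder>\<fileName>.CSV by streaming the file.
    // Assumption: a missing file is reported as 0 rows.
    public static int CountRows(string folder, string fileName)
    {
        // Build the path once instead of concatenating it twice.
        string path = Path.Combine(folder, fileName + ".CSV");
        if (!File.Exists(path))
        {
            return 0;
        }

        int count = 0;
        using (StreamReader reader = new StreamReader(path))
        {
            // ReadLine returns null at end of file; only one line
            // is held in memory at a time.
            while (reader.ReadLine() != null)
            {
                count++;
            }
        }
        return count;
    }
}
```

In the original switch, the case body would then shrink to something like countRows = RowCounter.CountRows(personalFolder, fileName); though whether a line count that drops the trailing empty entry is acceptable depends on how the rest of the application uses countRows.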
Unfortunately for RK, there's one rule about old, legacy code: don't touch it. So despite fixing this being a rather easy task, nobody is working on fixing it, because nobody wants to be the one who touched it last. Instead, management is promising to launch a greenfield replacement project any day now…