• my name (unregistered)

    Greenfield is a myth, and the big-bang replacement usually never happens.

  • (nodebb)

    My election day prediction is a lengthy argument about the most efficient way to count the number of lines in a CSV file.

  • Greg (unregistered)

    Bonus point for the potential off-by-one error: you can have n rows with only n-1 newlines.

  • (nodebb) in reply to Greg

    Or, worse, n rows and n newlines, so the split creates a blank row at the end, and you finish with n+1 rows to process, one of which is guaranteed to be empty.
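
    To make both off-by-one cases concrete, here's a minimal sketch, assuming a naive split on Environment.NewLine like the one under discussion:

        // Hypothetical illustration of both off-by-one cases with a naive split-based count.
        using System;

        class SplitOffByOne
        {
            static void Main()
            {
                // Two rows, one newline: a count that assumes "rows == newlines" comes up one short.
                string noTrailingNewline = "a,b" + Environment.NewLine + "c,d";
                // Two rows, two newlines: the split yields a trailing empty "row" that isn't real data.
                string withTrailingNewline = noTrailingNewline + Environment.NewLine;

                var separator = new[] { Environment.NewLine };
                Console.WriteLine(noTrailingNewline.Split(separator, StringSplitOptions.None).Length);   // 2
                Console.WriteLine(withTrailingNewline.Split(separator, StringSplitOptions.None).Length); // 3
            }
        }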

  • Jaloopa (unregistered)

    What are the odds the row count is later passed in to another method that reads the entire file again and loops through the lines up to countRows to process them?

  • TS (unregistered)

    Just because it's a CSV file doesn't mean it isn't going to run into the 65535 row limit. And just because this code dates from 2007 doesn't mean the same thing can't bite you badly a decade or more later. https://www.bbc.co.uk/news/technology-54423988

  • (nodebb)

    And of course if any of the CSV data consists of quoted string fields containing embedded newlines, then all newline-based counts are meaningless. Despite the existence of a standard (RFC 4180) the reality is CSV is a very loose format with lots of variations in what gets generated or accepted. Maybe these devs know embedded unescaped newlines are impossible. Far more likely they just never considered the possibility.
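
    For anyone who hasn't been bitten by this yet, a hypothetical example of why split-based counting falls apart on RFC 4180-style quoted fields (a real CSV parser tracks quote state instead of splitting on newlines):

        // Hypothetical example: a header plus one logical record whose quoted field spans two lines.
        using System;

        class QuotedNewlines
        {
            static void Main()
            {
                string csv = "id,comment" + Environment.NewLine +
                             "1,\"line one" + Environment.NewLine + "line two\"" + Environment.NewLine;

                // A split on newlines sees three "rows", but the file contains only a header
                // and a single data record, so any newline-based count is simply wrong here.
                int splitCount = csv.Split(new[] { Environment.NewLine },
                                           StringSplitOptions.RemoveEmptyEntries).Length;
                Console.WriteLine(splitCount); // 3, not 2
            }
        }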

  • (nodebb)

    I am not really religious, but Jesus Christ.

    Addendum 2024-11-05 08:08: Even though this code made me nearly puke, it has to be pointed out that they at least correctly disposed the stream. So there's that.

  • (nodebb) in reply to MaxiTB

    @MaxiTB :

    it has to be pointed out that they at least correctly disposed the stream.

    Ah, so you're the sort of person who'd be up there on that cross at the end of The Life of Brian, singing "Always Look on the Bright Side of Life" with all the rest???

  • (nodebb)

    There's also our old friend, the check-if-exists-then-open antipattern.

  • Lurk (unregistered) in reply to dkf

    It's not that daft. In .NET, at least, checking for existence fails gracefully: you'll get false if the process doesn't have access rights to the file, whereas trying to do things with the file without that check will land you in exception-handling country.

    Why a process hasn't got rights to a file it's supposed to be processing is another problem entirely.
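
    The usual counter-argument, sketched below under the assumption that the real code opens the file right after the check: the file can vanish or become inaccessible between File.Exists and the open, so the exception handling is needed anyway, and the check only hides one failure mode.

        // Hypothetical sketch: the existence check doesn't remove the need for exception handling,
        // because the file can disappear or lose permissions between the check and the open.
        using System;
        using System.IO;

        static class SafeRowCounter
        {
            public static int CountRows(string path)
            {
                try
                {
                    using (var reader = new StreamReader(path))
                    {
                        int rows = 0;
                        while (reader.ReadLine() != null)
                            rows++;
                        return rows;
                    }
                }
                catch (IOException)                    // missing, locked, deleted mid-flight...
                {
                    return 0;
                }
                catch (UnauthorizedAccessException)    // the "no access rights" case above
                {
                    return 0;
                }
            }
        }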

  • OldCoder (unregistered)

    I think I'd be just as concerned about the fact that the first line is a case statement!! Particularly as the value of that constant might just be 65536. The mind boggles at what the other case values might be - and at WTF the switch statement is doing.

  • Steve (not that one) (unregistered)

    "But we're not even reading an Excel file, we're reading a CSV." -- which was probably dumped from an Excel file -- and then read back in after the counting.

  • (nodebb) in reply to Steve_The_Cynic

    Nope, again, not religious. But it's funny to me that someone reads a complete file into memory, then creates countless additional substrings just to count how many Environment.NewLine occurrences are in a file, all while making sure they're a good citizen and don't waste OS resources like file handles. Then again, on second thought, just remember the last laugh is on you :-)

  • Anon (unregistered)

    Excel 2007 raised the row limit to 1,000,000 rows.

    Anyone who has pressed Ctrl+Down Arrow one too many times, or understands how computers store numbers, would know that the row limit increased to 1,048,576.

  • (nodebb)

    Someone needs to explain to me the usefulness of going to the very last row of a spreadsheet. Going to the last used row of a spreadsheet, sure, but the absolutely very last row?

  • Loren Pechtel (unregistered)

    This isn't as clear-cut as it seems. If the typical CSV file is small, this would actually do fairly well. The thing is, you are going to read the whole file no matter what you do, and there are performance advantages to minimizing the number of reads. So long as the file is small compared to memory, reading it in one gulp is probably a good idea.

    The split by line is inherently inefficient, as it copies the data and allocates memory, but for fairly small files that is probably negligible compared to the disk read time. This routine will perform quite badly, though, if the files aren't small relative to available memory.
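
    For comparison, a hypothetical streaming alternative: still one pass over the file, but without holding the whole thing (plus an array of substring copies) in memory at once. It inherits all the newline and quoting caveats discussed above.

        // Hypothetical streaming count: File.ReadLines enumerates lines lazily,
        // so memory use stays flat regardless of file size.
        using System.IO;
        using System.Linq;

        static class StreamingRowCounter
        {
            public static int CountRows(string path)
            {
                return File.ReadLines(path).Count();
            }
        }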

  • (nodebb) in reply to Worf

    Hmmm, maybe we can "solve" the problem by introducing infinite scrollers (as per yesterday's discussion, https://thedailywtf.com/articles/comments/a-matter-of-understanding#comment-665749) as UX of choice for spreadsheets (yes, and CSVs too). What could possibly go wrong?

  • LZ79LRU (unregistered) in reply to Worf

    It's to make sure some idiot user didn't misunderstand how hiding cells works and end up with random data at rows 5327 and 332211 because of some sorting rule, or because somebody does not understand row freezing.

    I love Excel with all my heart. It is God's gift to mankind and proof that not all is yet bad with the world. But the things people do with it also make it the canvas upon which is displayed the totality of human ineptitude.

  • (nodebb)

    @Worf:

    Someone needs to explain to me the usefulness of going to the very last row of a spreadsheet. Going to the last used row of a spreadsheet, sure, but the absolutely very last row?

    No mystery at all. Back in the day it was common to store macros and such at the far corners of the sheet to keep them away from prying eyes and random cut/copy/paste destruction. So that set the rationale for e.g. Ctrl-End to go where it does.

    Then backwards compatibility ensures it still goes there. Even though now we have hidden regions, locked regions and more than one sheet in a "workbook". None of which existed when the keystroke decisions were made.

    Addendum 2024-11-06 08:42: And of course sheets now have vast numbers of rows and columns, which also didn't exist back then.

  • (nodebb)

    They should have used a regex.

  • Strahd Ivarius (unregistered)

    CSV files are not always used for Excel, but for exporting/importing data between systems. I remember a case a few years ago when someone had to export data from a SAP application to import it into an Oracle one, something that went far above 1 million lines. It took around 24 hours to export, and then, just before performing the import, they noticed that the first line (listing the field names) was missing... of course the import needed to be done "right now or else". You don't know how difficult it is to prepend a single line to a file when no text editor manages to read all the CSV content without hanging or crashing... (command-line to the rescue!!!)
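
    For the record, the same rescue works from code as well; a hypothetical streaming prepend that never loads the whole file:

        // Hypothetical sketch: write the missing header, then stream-copy the original file after it,
        // so memory use stays constant no matter how big the export is.
        using System;
        using System.IO;
        using System.Text;

        static class HeaderPrepender
        {
            public static void Prepend(string sourcePath, string targetPath, string headerLine)
            {
                using (var output = File.Create(targetPath))
                {
                    byte[] header = Encoding.UTF8.GetBytes(headerLine + Environment.NewLine);
                    output.Write(header, 0, header.Length);
                    using (var input = File.OpenRead(sourcePath))
                    {
                        input.CopyTo(output);
                    }
                }
            }
        }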

  • LuzrBum.com (unregistered) in reply to my name

    There’s a book called “Kill It with Fire” that dives into this and comes to similar conclusions.
