Admin
Greenfield is a myth, and the big-bang replacement almost never happens.
Admin
My election day prediction is a lengthy argument about the most efficient way to count the number of lines in a CSV file.
Admin
Bonus point for the potential off-by-one error: you can have n rows with only n-1 newlines.
Admin
Or, worse, n rows and n newlines, so the split creates a blank row at the end, and you finish with n+1 rows to process, one of which is guaranteed to be empty.
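To make the two off-by-one cases above concrete, here is a minimal Python sketch (not the original C# code; the function names are made up for illustration). It assumes plain `\n` or `\r\n` line endings and no embedded newlines in the data.

```python
def count_rows_naive(text: str) -> int:
    # Mirrors the split-and-count approach: a trailing newline
    # leaves an extra empty string at the end of the list.
    return len(text.split("\n"))


def count_rows(text: str) -> int:
    # splitlines() does not emit a trailing empty element when the
    # text ends with a line terminator.
    return len(text.splitlines())


with_trailing = "a,b\nc,d\n"     # 2 rows, 2 newlines
without_trailing = "a,b\nc,d"    # 2 rows, 1 newline

naive_with = count_rows_naive(with_trailing)        # counts the phantom empty row
naive_without = count_rows_naive(without_trailing)
fixed_with = count_rows(with_trailing)
fixed_without = count_rows(without_trailing)
```

Note that `splitlines()` also breaks on a few less common separators (form feed, for instance), so it is only a safe swap when the data cannot contain those.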
Admin
What are the odds the row count is later passed in to another method that reads the entire file again and loops through the lines up to countRows to process them?
Admin
Just because it's a CSV file doesn't mean it isn't going to run into the 65536 row limit. And just because this code dates from 2007 doesn't mean the same thing can't bite you badly a decade or more later. https://www.bbc.co.uk/news/technology-54423988
Admin
And of course if any of the CSV data consists of quoted string fields containing embedded newlines, then all newline-based counts are meaningless. Despite the existence of a standard (RFC 4180) the reality is CSV is a very loose format with lots of variations in what gets generated or accepted. Maybe these devs know embedded unescaped newlines are impossible. Far more likely they just never considered the possibility.
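A small illustration of the embedded-newline problem, sketched in Python with the standard `csv` module rather than whatever the original code used. The sample data and the `count_records` helper are invented for the example.

```python
import csv
import io

# Three records (header plus two rows); the second record has a quoted
# field that contains a line break.
data = 'id,comment\r\n1,"first line\r\nsecond line"\r\n2,plain\r\n'

# Counting raw line terminators sees four "lines".
newline_count = data.count("\r\n")


def count_records(text: str) -> int:
    # newline="" keeps the line endings untouched so the csv reader can
    # tell which ones sit inside quoted fields.
    reader = csv.reader(io.StringIO(text, newline=""))
    return sum(1 for _ in reader)


record_count = count_records(data)
```

Whether a real parser is worth it depends on whether the producer can ever emit quoted line breaks, which is exactly the question the devs probably never asked.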
Admin
I am not really religious, but Jesus Christ.
Addendum 2024-11-05 08:08: Even though this code made me nearly puke, it has to be pointed out that they at least correctly disposed of the stream. So there's that.
Admin
@MaxiTB :
Ah, so you're the sort of person who'd be up there on that cross at the end of The Life of Brian, singing "Always Look on the Bright Side of Life" with all the rest???
Admin
There's also our old friend, the check-if-exists-then-open antipattern.
Admin
It's not that daft. In .NET at least, checking for existence fails gracefully: you get false if the process doesn't have access rights to the file, whereas trying to do things with the file without that check lands you in exception handling country.
Why a process hasn't got rights to a file it's supposed to be processing is another problem entirely.
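For comparison, the usual objection to check-then-open is that the file can vanish or change permissions between the check and the open, so the exception path has to exist anyway. A rough sketch of the "just open it" alternative, in Python rather than .NET, with a hypothetical `read_if_present` helper:

```python
import tempfile
from pathlib import Path
from typing import Optional


def read_if_present(path: Path) -> Optional[str]:
    # Attempt the read directly; a missing or unreadable file is
    # reported as None instead of being pre-checked.
    try:
        return path.read_text(encoding="utf-8")
    except (FileNotFoundError, PermissionError):
        return None


with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "rows.csv"
    present.write_text("a,b\n", encoding="utf-8")
    found = read_if_present(present)
    missing = read_if_present(Path(tmp) / "no-such-file.csv")
```

This is a matter of taste as much as correctness; if nothing else touches the directory, the existence check is harmless.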
Admin
I think I'd be as much concerned about the fact that the first line is a case statement!! Particularly as the value of that constant might just be 65536. The mind boggles what the other case values might be - and WTF the switch statement is doing.
Admin
"But we're not even reading an Excel file, we're reading a CSV." -- which was probably dumped from an Excel file -- and then read back in after the counting.
Admin
Nope, again, not religious. But it's funny to me that someone reads a complete file into memory, then creates countless additional substrings just to count how many Environment.NewLine are in a file, while making sure they are a good citizen and don't waste OS resources like file handles. Then again, on second thought, just remember the last laugh is on you :-)
Admin
Anyone who has pressed Ctrl+Down Arrow one too many times, or understands how computers store numbers, would know that the row limit increased to 1,048,576.
Admin
Someone needs to explain to me the usefulness of going to the very last row of a spreadsheet. Going to the last used row of a spreadsheet, sure, but the absolutely very last row?
Admin
This isn't as clear-cut as it seems. If the typical CSV file is small, this would actually do fairly well. The thing is, you are going to read the whole file no matter what you do, and there are performance advantages to minimizing the number of reads. So long as the file is small compared to memory, reading it in one gulp is probably a good idea.
The split by line is inherently inefficient, as it copies the data and allocates memory, but for fairly small files this cost is probably small compared to the disk read time. This routine will perform quite badly, though, if the files aren't small relative to available memory.
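A middle ground between one gulp and line-by-line reading is counting newlines in fixed-size chunks, which keeps the number of reads low without holding the file or any substrings in memory. A minimal Python sketch, assuming byte-oriented input and a chunk size picked arbitrarily for illustration:

```python
import io


def count_newlines(stream, chunk_size: int = 1 << 20) -> int:
    # Read fixed-size blocks and count the single byte b"\n" in each.
    # Because the delimiter is one byte, it cannot straddle a chunk
    # boundary, and "\r\n" endings are still counted once each.
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += chunk.count(b"\n")
    return total


# A tiny chunk size here just to exercise the boundary handling.
sample = io.BytesIO(b"a,b\r\nc,d\r\ne,f\r\n")
newlines = count_newlines(sample, chunk_size=4)
```

For a real file the stream would come from `open(path, "rb")`. The same caveats about trailing newlines and quoted fields apply to the number it returns.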
Admin
Hmmm, maybe we can "solve" the problem by introducing infinite scrollers (as per yesterday's discussion, https://thedailywtf.com/articles/comments/a-matter-of-understanding#comment-665749) as UX of choice for spreadsheets (yes, and CSVs too). What could possibly go wrong?
Admin
It's to make sure some idiot user didn't misunderstand how hiding cells works and end up with random data at row 5327 and 332211 because of some sorting rule or because somebody does not understand row freezing.
I love Excel with all my heart. It is God's gift to mankind and proof that not all is yet bad with the world. But the things people do with it also make it the canvas upon which is displayed the totality of human ineptitude.
Admin
@ Worf Ref
No mystery at all. Back in the day it was common to store macros and such at the far corners of the sheet to keep them away from prying eyes and random cut/copy/paste destruction. So that set the rationale for e.g. Ctrl-End to go where it does.
Then backwards compatibility ensures it still goes there. Even though now we have hidden regions, locked regions and more than one sheet in a "workbook". None of which existed when the keystroke decisions were made.
Addendum 2024-11-06 08:42: And of course sheets now have vast numbers of rows and columns.
Admin
They should have used a regex.
Admin
CSV files are not always used for Excel, but also for exporting / importing data between systems. I remember a case a few years ago when someone had to export data from a SAP application to import it into an Oracle one, something that went far above 1 million lines. It took around 24 hours to export, and then just before performing the import they noticed that the first line (listing the field names) was missing... of course the import needed to be done "right now or else". You don't know how difficult it is to prepend a single line to a file when no text editor manages to read the whole CSV file without hanging or crashing... (command-line to the rescue!!!)
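The command-line fix presumably came down to streaming: write the header to a new file, then copy the old file after it without ever loading it whole. A sketch of that idea in Python (this is an assumption about the approach, not what was actually run; `prepend_line` and the sample file names are hypothetical):

```python
import os
import shutil
import tempfile


def prepend_line(path: str, header: bytes) -> None:
    # Write the header to a temp file in the same directory, stream the
    # original after it in blocks, then swap the temp file into place.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as out, open(path, "rb") as src:
            out.write(header)
            shutil.copyfileobj(src, out)
        os.replace(tmp_path, path)
    except BaseException:
        # The swap did not happen, so clean up the partial temp file.
        os.unlink(tmp_path)
        raise


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "export.csv")
    with open(target, "wb") as f:
        f.write(b"1,alice\n2,bob\n")
    prepend_line(target, b"id,name\n")
    with open(target, "rb") as f:
        result = f.read()
```

It needs enough free disk space for a second copy of the file, which at that size is worth checking before starting.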
Admin
There’s a book called “Kill It with Fire” that dives into this and comes to similar conclusions.