• MiserableOldGit (unregistered)

    TRWTF is trying to harvest data for databases in all sorts of unstructured, unmanaged spreadsheets.

    It's always going to be a load of dirty pain.

    Although that approach is awful, I suspect someone thought hibernate would grease the wheels, rather than just add more sides to them ....

  • Biri (unregistered)

    The progenitor of this story prefers to be called Mr. Syntax, so we will not call him that at any point in this story.

  • Whoever (unregistered)

    TRWTF is posting a huge amount of code to illustrate a point that is not clearly explained in the text. If this is the OP's idea of "brevity", I'd hate to be around when he gets verbose.

  • not a robot (unregistered)

    tldr; whats this? a code review?

  • Sole Purpose of VIsit (unregistered) in reply to Whoever

    I think the "huge amount of code" is supposedly the point, although it doesn't really convince me. Not being Hibernate-aware, it wouldn't surprise me to find out that most of this is required boilerplate, irrespective of the WTF rubric. ("YAGNI -- there are only two basic spreadsheet 'schemas' out there.") Mostly, it just looks like incredibly badly written code with little or no encapsulation involved.

    A question: what was the fix? (We usually get to be told what the fix is.) I'm guessing the two lines after:

    //Op: Null pointer thrown from row.getCell(...)

    ,,, which would be a WTF by the submitter. There's no point in leaving the original bug in the code, even if you comment it out.

  • Koen van Dratel (unregistered)

    @Biri, Whoever, not a robot and Sole Purpose of VIsit : yes, we're into some meta-WTF domain (I can only add the moot point that row.getCell(...) does not throw "Null pointer" but that it can - sensibly - return null)

  • Rick (unregistered)

    The article says that there's no stack trace and no logging, but it looks to me like most of the code is wrapped in try-catch blocks and the logger is including the exception that was caught. The only spot I can see is the getValue() method that returns empty string instead of throwing or logging an exception, but that's pretty standard when you're parsing dirty data and want a process to continue. Languages like C# even include tryParse() methods that do this for you.

  • guest (unregistered)

    While there is some bad coding here. The code could be much worse. I don't see why it would take 8 hours to find the bug. This original coder knew about local final variables. The programmer throws RuntimeExceptions. They even use not null annotations. This code doesn't look wtf level of bad.

  • name (unregistered) in reply to Koen van Dratel

    The problem is more likely to be caused by sheet.getRow returning null for a row that is not defined in the spreadsheet. "We cannot assume that any given cell will exist on all spreadsheets", and we cannot assume that every row will do as well.

  • MiserableOldGit (unregistered) in reply to Koen van Dratel
    I can only add the moot point that row.getCell(...) does not throw "Null pointer" but that it can - sensibly - return null

    I assume the NPE gets thrown because the parameter in df.formatCellValue doesn't accept nulls, or something in there trips over. So his comment is sloppy.

    Regardless, the fact getCell() chucks you a null at times when most of us would have expected an empty string is just one of those gotchas waiting for you if you end up working with the Excel object model. 8 hours does seem a bit of a long time to find it if you have the source code and a clue what you are doing. Not sure I like his fix much either, I'd have brought the value back into a more loosely type variable and tested it rather than bounce off the error handler potentially millions of times in routine data capture. Unused cells are not an unexpected occurrence, if they are in this case his solution is then wrong.

  • name (unregistered) in reply to MiserableOldGit

    A more correct solution would involve checking if currentRow is null before attempting to use it, as well as checking the cells found in a non-null row. The getValue method is not too bad, but I would have checked the cell is not null as you suggest rather than using try-catch.

  • Scott (unregistered)

    tl;dr. Scanning through, it seems like once again, trying to solve problems which really don't exist leads to unnecessary complexity and fragile code. No surprise there.

  • MiserableOldGit (unregistered) in reply to name

    You could, I wouldn't bother as the cell might still be null even if the row is valid.

    Theoretically, if he's only looping through the usedrange he shouldn't find totally missing rows, per se, although Excel can be less than reliable over that.

    If it can be that Excel can have "missing" rows within the usedrange then there's another big bug there, he's looping from a starting point (presumably zero, but I don't see the const declaration) to < getPhysicalNumberOfRows. Normally, getPhysicalNumberOfRows = getLastRowNum+1 = UsedRange.Rows.Count (if you're lucky!), but of course what you describe would mean getPhysicalNumberOfRows <= getLastRowNum so we'll miss final rows ... unless it doesn't do what it says on the tin ... not an uncommon problem with Excel object methods and properties.

  • Da expert (unregistered)

    But can it handle ANY spreadsheet? Huh?

  • Sole Purpose of VIsit (unregistered) in reply to guest

    I'm presuming that the reason it took eight hours to find the bug is down to one of three things (all memes on the modern TDWTF):

    1. Snoofle has anonymized the story out of all recognition. (Probably not, given the wads of useless code.)

    2. The OP doesn't have a clue how to set up an IDE, craft a unit test (via, yes, this is terriffically difficult) canned spreadsheet input, or alternatively input dragged out of the log). I'm no more experienced in Eclipse or NetBeans than I am in Hibernate. But I'm led to believe that it is possible to attach a break point to general exceptions. (This being Java, maybe there's a difference between checked and unchecked exceptions -- but I don't see a reason to care in this case.)

    3. The OP is an imbecile.

    These here are not an XOR., btw.

  • Sole Purpose of VIsit (unregistered)

    I'd respectfully disagree about the level of WTFery, however. There are finals sprinkled all over the place, which is peculiar (to say the least) if this horrid mess is supposed to support arbitrary formats in future. There's practically no parsing going on, which would also come in handy if you're going to support future formats. (Note: protected virtual bool VerifyParse(object input) { return true; } is perfectly acceptable in YAGNI terms.)And the kicker is this, to quote an earlier poster:

    "We cannot assume that any given cell will exist on all spreadsheets", and we cannot assume that every row will do as well.

    Well, er, no. Of course not.

    I don't care if there is only a single spreadsheet "format" involved. Only a nutter wouldn't verify the data.

    In other words, terrible as the original solution is (and it is. It is terrible. Unspeakably so), the WTF here is that more or less everybody else involved in the OP somehow believes that a better solution would be a trivial implementation. The OP guy, the boss, Snoofle ... just about everybody else. (Props to the commenter above who emphasised that importing any spreadsheet format into a DB is ... non trivial.)

  • Dave (unregistered)

    TRWTF is doing anything this daft when Excel has very good converters for almost any imaginable file format. It's trivial to import a file into Excel, let the converter do its thing, and save in Excel format. Then you only need to worry about importing Excel spreadsheets.

    Sometimes the simplest method isn't just the best, but the only sane way. However much you hate Excel and VBA. (Full disclosure, I don't hate VBA: it's better to have shonky tools to hand than good tools that you didn't bring with you. I just hate the things people do with VBA...)

  • Dave (unregistered)

    "blindly assumed that every cell would exist in all spreadsheets, regardless of potential format differences."

    Pfft, you can't even assume that every spreadsheet-file contains so much as a single cell. If you're buggering about in VBA, it can make sense to have a 0x0 worksheet*. I expect there are plenty of other reasons to do the same.

    *FSVO 'make sense' involving protecting things from idiot users with more admin powers than sense.

  • Anon (unregistered) in reply to Sole Purpose of VIsit

    The idea that all developers have this magical access to production data is TRWTF in your comment.

    There's a secondary WTF in your comment, where you imply that it should be easy to generate input which will cause the problem, excepting that if the problem isn't obvious and the program flow is unclear, how could one POSSIBLY generate the input to produce the specific exception at the required location, AND know that the surfaced exception has exactly the same cause as the one in the bug?

  • MiserableOldGit (unregistered)

    But he gave the explanation and the answer in the article ... it's an empty cell. The program flow may be a thing of horror and dismay, but the scenario should have been flushed out by even the most basic unit testing, and if not that, troubleshooting.

    Excel is horrid to interact with this way, but it's not so different to going through a normal recordset and creating a type mismatch or null error because you forgot to check the field definition or were a bit optimistic about data quality.

    Verify the data is pointless with a workflow that involves people giving you spreadsheets, you know it will be littered with every conceivable fuck up and users will assume if they put a new column in, a field will magically appear in their system, if they paste a word doc/pdf/email/jpg in a cell, somehow the content will be extracted and find itself in the comments box even though you were told that was Yes/No/Unknown (but catered for blank and file_not_found, coz you is professional). It needs inline validation with the processing and proper transactions, wherever possible just pop out a list of shitty rows for manual handling and import the rest, otherwise you'll spend a lot of late nights "cleaning data".

    more or less everybody else involved in the OP somehow believes that a better solution would be a trivial implementation.
    That's because some of us deal with this a lot, granted snoofle may have anonymised the sense out of the real submission, but it looks ridiculous. I've got a generic spreadsheet/word doc/xml/pdf/graffiti-on-the-side-of-a-donkey bulk importer to SQL server via Access abomination in my day job at the moment, yes it is what you'd get if you printed off five years of CodeSOD and shoved it through a Nutribullet with camel dung and the tears of orphans, but it's a thing of beauty compared to that solution in the article. I'd bet my left bollock it runs quicker and with a better success rate too. And I didn't write it, I despise it with a passion.
  • Jeremy Hannon (google)

    I wrote a generic CSV importer, but it doesn't import anything. It has to be one of the types defined in the ORM. Then, as long as the column headers are right, in any order, it should import.

    That actually works and isn't a lot of special code because all of the processing rules are already built into the ORM/DAO classes for other uses. I made sure I had constraints on what it could be in my design. I can see where one might easily go south trying to build one, though.

  • Steve (unregistered)

    I suspect a healthy chunk of the eight hours was spent narrowing the problem down to the sections of code that were posted.

  • Zenith (unregistered)

    Excel has some weird corner cases to say the least. There's a process where I have to import an Excel sheet into a SQL table. Before I can run a bulk insert, I have to convert Excel to some kind of delimited format. For some reason, I always get several blank rows at the end, which is usually just enough to make the bulk insert fail. What maybe the convertor should've done was accept some kind of flag for whether or not the top row is column headers, loop through that list, and use it to map the subsequent rows. But I agree with other comments that these files are rarely what you're told they are and that's probably why you wrote a parser rather than suffer through something like SSIS...

  • bodkin (unregistered)

    "Contract our thesis composing administrations UK to get shabby exposition assistance from qualified and master essayists. Get your thesis finished inside 5 days." Truth in spam?

  • Dave (unregistered) in reply to bodkin

    FWIW, I worked for a while writing essays for that kind of business. Including high-level dissertations. It was a summer job when I was 15, I just slapped together bits from Google (or yahoo etc, come to think of it) and reworded them. I got a fiver an hour, if I remember rightly. Quit when I discovered just how much profit the boss was making out of me, having asked for a payrise I didn't get.

  • MiserableOldGit (unregistered)

    If the quality of the thesis authoring is anything like the unintelligible drivel in these spam bots I wouldn't hold out much hope for a high mark.

    Mind you, mine was probably unintelligible drivel with the right buzzwords chucked in the mix and the correct people listed on the grovelling thank you page.

  • Koen van Dratel (unregistered) in reply to name

    You're right: Op might have written "Null pointer thrown from df.formatCellValue(...)" if the problem would have been first encountered at a present row with an absent cell. Same problem in a slightly different scenario, which gives extra support for Sole Purpose of VIsit's closing remark.

  • Catprog (unregistered) in reply to MiserableOldGit

    It is not an empty cell. It is a cell that doesn't exists (From my reading of the article)

  • Dave (unregistered) in reply to MiserableOldGit

    "If the quality of the thesis authoring is anything like the unintelligible drivel in these spam bots I wouldn't hold out much hope for a high mark."

    Oddly enough, the hardest part was writing down to a low grade. The kinds of students who used essay-writers weren't going to hand in A-grade work of their own. Much harder to write a C for them than an A, IME.

  • MiserableOldGit (unregistered) in reply to Catprog
    It is not an empty cell. It is a cell that doesn't exists (From my reading of the article)

    Not a distinction that matters in excel VBA, that's the gotcha that trips many people (including this coder). Unless you've gone and put some data in, calling cell.value will vomit at you, even if it's surrounded on all sides by cells with values. Theoretically, if an empty string has been explicitly placed in a cell, you'll get that back, but I wouldn't be surprised if Excel doesn't remove those when it's "cleaning up".

    In our heads, Excel creates a table of rows and columns as big as we need for what we are doing, and we just need to be careful about the limits. In reality, I think Excel is instantiating cell objects as data is entered in a cell, rows and columns are merely collections of pointers to those objects. That could, of course, be total bollocks, but that's how the behaviour appears when plating with the object model.

  • johnny b mediocre (unregistered)

    I saw this and asked myself, how on earth did it pass code review?

    Then I asked myself, what on earth makes me think they did code review?

  • urkerab (nodebb) in reply to Sole Purpose of VIsit

    I think the //Op comments are not part of the code but were added by the original poster.

Leave a comment on “You Absolutely Don't Need It”

Log In or post as a guest

Replying to comment #:

« Return to Article