• Galumfry (unregistered)

    Optimization is the root of all evil. Jeff is going to hell.

  • RFox (unregistered)

    I think there was a spilling mistak. It should have read read the frist line...

  • JWB (unregistered)

    This developer is clearly paid by line.

  • (cs)

    But look at the VALUE that was created by leveraging the existing infrastructure (disk drive and network) in such a manner...

  • Qvazar (unregistered)

    I've seen this before.. just with database table rows instead of files. I got that monstrosity down to 45 seconds from 45 minutes..

  • np (unregistered)

    Even a minute runtime seems pretty large. Reading 30k lines and sending it across should take a second.

    If it has to take 30k round-trips, then I see another place for an optimization.

  • Accalia (unregistered)

    Huzzah! Remy's back!

    We missed you and your sneaky HTML comments!

    well, i did at any rate.

    Captcha1: tristique

  • Pista (unregistered)

    Oh boy, there are so many WTFs in the first paragraph, that I didn't have to read the rest of submission.

    BTW, it's "ukrainian", not "ukranian" (those who read HTML comments will understand)

  • Miriam (unregistered)

    I guess this is an interesting and efficient way to do things.

  • Miriam (unregistered)

    I guess this is an interesting and efficient way to do things. Like writing comments, for example.

  • Miriam (unregistered)

    I guess this is an interesting and efficient way to do things. Like writing comments, for example. Captcha: luctus

  • Miriam (unregistered)

    I guess this is an interesting and efficient way to do things. Like writing comments, for example. Captcha: luctus TRWTF is not writing this over multiple articles.

  • MrBester (unregistered) in reply to Pista
    Pista:
    BTW, it's "ukrainian", not "ukranian" (those who read HTML comments will understand)

    Yeah, but that extra i costs another 50 quid and puts it over budget.

  • Derp (unregistered)

    Pemy Rorter

  • QJo (unregistered)

    The suspicion is that the original mistake was to put the "open" command within the loop by accident. On finding that this caused there to be a large number of unclosed files (breaking the maximum allowed), he said, d'oh! I forgot to close it, and added the "close" to the loop.

    Then it was: "But it's reading the first line over and over again!" Ah yes, so let's fix that by deleting that pesky first line after I've processed it, so as to get to the second line.

    Oh but hang on, that's no good, I might want to keep the original file. Okay, let's, um, how do we do this? Er um (etc. etc.)

    ... and all for the want of being unable to identify what his first stupid mistake was.
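
    A minimal Python sketch of the pattern described above, next to the one-pass fix. This is not the article's VB6 code; the function names, the handle callback and the temporary file are assumptions made for illustration. The slow variant counts how many lines it rewrites so the quadratic cost is visible.

```python
import os
import tempfile


def process_by_deleting_first_line(path, handle):
    """Anti-pattern: read line 1, process it, rewrite the file without it.

    Returns the total number of lines rewritten, to make the cost visible.
    """
    rewritten = 0
    while True:
        with open(path, "r", encoding="utf-8") as f:
            lines = f.readlines()
        if not lines:
            return rewritten
        handle(lines[0].rstrip("\n"))
        # "Delete" the first line by writing every remaining line back out.
        with open(path, "w", encoding="utf-8") as f:
            f.writelines(lines[1:])
        rewritten += len(lines) - 1


def process_in_one_pass(path, handle):
    """The fix: open once, iterate line by line, leave the file intact."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            handle(line.rstrip("\n"))


def demo(n=100):
    """Run both variants on a temporary n-line file (hypothetical data)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "input.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.writelines(f"row {i}\n" for i in range(n))
        fast = []
        process_in_one_pass(path, fast.append)
        slow = []
        # Run the destructive variant last: it empties the file.
        rewritten = process_by_deleting_first_line(path, slow.append)
    return fast, slow, rewritten
```

    For an n-line file the slow variant rewrites n(n-1)/2 lines in total, so a 30,000-line file means roughly 450 million line writes, against a single sequential read for the one-pass version.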

  • (cs) in reply to Pista
    Pista:
    Oh boy, there are so many WTFs in the first paragraph, that I didn't have to read the rest of submission.

    BTW, it's "ukrainian", not "ukranian" (those who read HTML comments will understand)

    No, I believe it's Ukrainian. Unless they are species, like dogs, humans or elves.

  • (cs) in reply to Pista
    Pista:
    BTW, it's "ukrainian", not "ukranian" (those who read HTML comments will understand)
    BTW this typo is so common that it even briefly made it into Firefox 29 alpha.
  • ip-guru (cs) in reply to Galumfry

    This is just defensive programming. Should the process fail, it will resume where it left off instead of having to process the whole file again ;-)

  • Anonymii (unregistered)
    It didn’t take Jeff long to figure out why it performed poorly

    Indeed. VB6.

  • rtlgrmpf (unregistered) in reply to Qvazar
    Qvazar:
    I've seen this before.. just with database table rows instead of files. I got that monstrosity down to 45 seconds from 45 minutes..
    I once got a CSV-processing masterpiece. "It's a leettle slow, please look if we need a faster machine or maybe 2 machines or maybe more RAM..."

    After some iterations it was down from 20h to less than a minute...

  • Moss (unregistered) in reply to np
    np:
    Even a minute runtime seems pretty large. Reading 30k lines and sending it across should take a second.

    If it has to take 30k round-trips, then I see another place for an optimization.

    Perhaps processing whatever is in those 30k lines is what's making it take that long?

  • Barry the Builder (unregistered)

    I recently worked with a Ukrainian offshore company that suddenly went silent, right around the time the Russians (we all know it was you) invaded. Haven't heard from them since. Shame really as I was getting a pretty good collection of WTFs ready to send in.

  • Dictum Barchon (unregistered)

    The classic O(n^2) algorithm for reading a file line-by-line... still doesn't beat the time I found it, in some similarly "enterprise" code, nested inside a crude bubblesort to make the whole thing O(n^4).

  • (cs) in reply to Barry the Builder
    Barry the Builder:
    I recently worked with a Ukrainian offshore company that suddenly went silent, right around the time the Russians (we all know it was you) invaded. Haven't heard from them since. Shame really as I was getting a pretty good collection of WTFs ready to send in.

    So what you're saying is, the Russian invasion had some good points to it after all?

  • (cs)
    It didn’t take Jeff very long to rewrite this to simply read the file, one line at a time.

    heh

  • DigitalDan (unregistered)

    I guess there really are people who do things like this.

    Back in the dark ages, I improved a numerical search by an average factor of 8000 or so -- turned a linear search through an unordered table into a binary search through a sorted one. Was never subsequently able to celebrate with colleagues who chipped a percent or two off running times . . .
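
    A minimal sketch of that change, assuming the table is a plain Python list of comparable keys (the original data and language are not given). It uses the standard library's bisect_left; the function names are illustrative.

```python
from bisect import bisect_left


def linear_search(table, key):
    """Scan entries in order until a match is found; O(n) comparisons."""
    for i, value in enumerate(table):
        if value == key:
            return i
    return -1


def binary_search(sorted_table, key):
    """Halve the search range at each step; O(log n) comparisons.

    Only valid if sorted_table is sorted in ascending order.
    """
    i = bisect_left(sorted_table, key)
    if i < len(sorted_table) and sorted_table[i] == key:
        return i
    return -1
```

    The one-off cost of sorting the table is paid back when the same table is searched many times, which is the situation described above.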

  • autark (unregistered)

    These are the types of DailyWTFs I really enjoy. It's like a good horror movie - all the fun is in the anticipation. I read the two routine names - ProcessFile and DeleteLine - and my stomach lurched.

    The satisfaction in reading the code and confirming the train wreck of an implementation is just icing on the cake.

  • David (unregistered)

    And he got a mumbled "thanks" for his efforts in saving the company hundreds of hours over the course of a year.

  • Dan (unregistered) in reply to Pista
    Pista:
    Oh boy, there are so many WTFs in the first paragraph, that I didn't have to read the rest of submission.

    BTW, it's "ukrainian", not "ukranian" (those who read HTML comments will understand)

    Actually it's український. Arguing over a transliteration is pointless unless it fails to approximate the original pronunciation.

  • Mason Wheeler (unregistered)

    Something similar happened at my last job. We had an ad-hoc piece of software thrown together to parse a bunch of flatfiles, load them into a database, and create a specialized report. It worked fine originally, but didn't scale well; some of our larger clients (huge TV networks with 3 letters in their names) were just generating too much data, and the report could take upwards of 10 hours to run.

    Profiling showed that the bulk of the time was being spent uploading the data to the database. (Apparently sending a zillion individual INSERT requests over the wire, while holding a transaction open all that time, while the database was also managing regular business processes, can take a long time!) So I recoded it to use SQL Server's Bulk Insert capabilities and sent the new tool off to the Product Management person who was our contact with this client.

    About 45 minutes later I got a panicked call from the Product person. "I just checked on the report tool, and it's just sitting there. It doesn't look like it's doing anything at all!"

    So I logged in to the test database to check, and everything looked fine to me. When I told the PM person that it looked like the report had run successfully, she was shocked. "You mean it's done already?!?"
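
    A rough sketch of the difference between row-at-a-time inserts and a batched load. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in, not SQL Server's Bulk Insert, and the table name report_rows and its columns are made up for illustration. With no network involved it shows only the shape of the two approaches, not the speedup.

```python
import sqlite3

INSERT_SQL = "INSERT INTO report_rows (id, payload) VALUES (?, ?)"


def insert_row_by_row(conn, rows):
    """One statement per row; against a remote server each is a round trip."""
    for row in rows:
        conn.execute(INSERT_SQL, row)
    conn.commit()


def insert_in_bulk(conn, rows):
    """Hand the whole batch to the driver in one call, in one transaction."""
    with conn:  # commits on success, rolls back on an exception
        conn.executemany(INSERT_SQL, rows)


def demo(inserter, n=1000):
    """Load n hypothetical rows with the given inserter and count them."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE report_rows (id INTEGER PRIMARY KEY, payload TEXT)"
    )
    inserter(conn, [(i, f"line {i}") for i in range(n)])
    count = conn.execute("SELECT COUNT(*) FROM report_rows").fetchone()[0]
    conn.close()
    return count
```

    Both functions load the same rows; the batched one gives the driver and the database the chance to do the work in far fewer steps.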

  • Publius (unregistered) in reply to Dan
    Dan:
    Pista:
    Oh boy, there are so many WTFs in the first paragraph, that I didn't have to read the rest of submission.

    BTW, it's "ukrainian", not "ukranian" (those who read HTML comments will understand)

    Actually it's український. Arguing over a transliteration is pointless unless it fails to approximate the original pronunciation.

    That was pre-russian invasion. Now it has to be just укранський

  • Cassy (unregistered) in reply to np

    I think the 30k number was an arbitrary amount given for the sake of example. It could easily have been hundreds of thousands and into the millions of lines, I imagine.

  • Paul Neumann (unregistered)

    In the spirit of throwing good money after bad, this is all negligible on an SSD drive. Let me order a 500GB SSD for my workstation to test and, then once confirmed, we can get another for the server.

    There is no software problem too big for hardware to solve!

  • DrPepper (cs) in reply to ip-guru
    ip-guru:
    This is just defensive programming. Should the process fail, it will resume where it left off instead of having to process the whole file again ;-)

    Plus one. Sure, it's slow, and sure it does a lot of operations (hint -- isn't that what computers are for?) AND if you power off the computer in the middle of the job, it will just pick up where it left off.

    Who's to say that this is a WTF?

    Without knowing the actual requirements, who knows. For all we know, perhaps the requirements are that "it's bullet proof as far as crash/restart; and that performance is irrelevant."

    Sure, there are other ways to do it -- for example, put each line in a DB, and mark the record after it's been processed; and when the job restarts, only process records that are not marked.

    But suppose the task was assigned like this: As quickly as possible, write a processing loop that is guaranteed to send a line from the file to the processor exactly one time. Performance is not an issue, but I've allocated less than one hour of time for this task.

    How else would you do it? Not everything needs to be "enterprise quality"; and not every task has the luxury of a week of design/programming time.

  • Alexander (unregistered) in reply to rtlgrmpf

    RinkWorks has a similar example

    RinkWorks:
    One of our customers, a major non-US defense contractor, complained that their code ran too slowly. It was a comedy of errors.

    Act I

    Contractor: "Can you make our code run faster?"
    Tech Support: "Yes, but we have to take a look at it."
    Contractor: "We can't, the code is classified."
    Tech Support: "Can you explain to me what your code is doing?"
    Contractor: "No, that's classified."
    Tech Support: "Can you tell us what functions you use?"
    Contractor: "No that's classified."
    

    Act II

    So, on a hunch, we sent them the latest version of our software for Windows NT.

    Contractor: "Why is this running faster on our 800MHz Pentium than on our VAX?"
    Tech Support: "When did you buy that VAX?"
    Contractor: "Some time in the late 1980s."
    

    Act III

    Finally, some of their code was declassified. We looked at it, and one piece of it contained a routine for reading one million or so integers from a file. Rather than opening the file once and reading them all in, there was a loop: it would open the file, read the first integer, and close it; then open it again, read the second integer, and close it; etc.

  • Butthats 4Eva (unregistered)

    Every time it's some damn relative fkn it all up...

  • Jimmy Rustler (unregistered) in reply to Paul Neumann

    Perhaps this tool would be good as an SSD write cycle stress tester.

  • Canuck (unregistered) in reply to DigitalDan

    Bragging rights ~= improvement x (quality of original programmer).

  • radarbob (unregistered) in reply to JWB
    JWB:
    This developer is clearly paid by line.

    This developer is clearly paid by the I/O operation

    FIFY.

  • Butthats 4Eva (unregistered) in reply to radarbob
    radarbob:
    JWB:
    This developer is clearly paid by a relationship to a bigwig.

    This developer is clearly paid by a relationship to a bigwig

    FIFY.

    FTFBOY

  • Some Guy (unregistered) in reply to np

    This looks like VBA, which is abysmally slow. A minute runtime for a 10000 line file is actually pretty good in VBA if you can believe that.

  • Surgeon Salt (unregistered) in reply to DrPepper
    DrPepper:
    ip-guru:
    This is just defensive programming. Should the process fail, it will resume where it left off instead of having to process the whole file again ;-)

    Plus one. Sure, it's slow, and sure it does a lot of operations (hint -- isn't that what computers are for?) AND if you power off the computer in the middle of the job, it will just pick up where it left off.

    Who's to say that this is a WTF?

    Without knowing the actual requirements, who knows. For all we know, perhaps the requirements are that "it's bullet proof as far as crash/restart; and that performance is irrelevant."

    Sure, there are other ways to do it -- for example, put each line in a DB, and mark the record after it's been processed; and when the job restarts, only process records that are not marked.

    But suppose the task was assigned like this: As quickly as possible, write a processing loop that is guaranteed to send a line from the file to the processor exactly one time. Performance is not an issue, but I've allocated less than one hour of time for this task.

    How else would you do it? Not everything needs to be "enterprise quality"; and not every task has the luxury of a week of design/programming time.

    You're joking, right?

  • Mickey (unregistered) in reply to Alexander
    Alexander:
    RinkWorks has a similar example
    RinkWorks:
    One of our customers, a major non-US defense contractor, complained that their code ran too slowly. It was a comedy of errors.

    Act I

    Contractor: "Can you make our code run faster?"
    Tech Support: "Yes, but we have to take a look at it."
    Contractor: "We can't, the code is classified."
    Tech Support: "Can you explain to me what your code is doing?"
    Contractor: "No, that's classified."
    Tech Support: "Can you tell us what functions you use?"
    Contractor: "No that's classified."
    

    Act II

    So, on a hunch, we sent them the latest version of our software for Windows NT.

    Contractor: "Why is this running faster on our 800MHz Pentium than on our VAX?"
    Tech Support: "When did you buy that VAX?"
    Contractor: "Some time in the late 1980s."
    

    Act III

    Finally, some of their code was declassified. We looked at it, and one piece of it contained a routine for reading one million or so integers from a file. Rather than opening the file once and reading them all in, there was a loop: it would open the file, read the first integer, and close it; then open it again, read the second integer, and close it; etc.

    Re III, I think this is not uncommon.

    Every tutorial I've ever seen about reading files emphasises "Close the fucking thing", and I can imagine a lot of devs who read an example to read a line (open, read, close) and then whack it in a loop to make it do it multiple times... Unfortunately, a lot of devs out there seem to think it's all about knowing the syntax of a language, not understanding how things work.

    NB: I think modern programming languages abstracting a lot of the actual work away haven't helped any. I have nothing against these languages per se (in fact I agree they can help whip up relatively complex stuff pretty quickly), but I think wannabe devs (i.e. students, I suppose) should be forced to use low-level languages (preferably on low-spec machines) at least some of the time to get a bit of an understanding of what's going on and the impact that stupidity (and naivety, and laziness, and...) can have on performance.

  • AP² (unregistered) in reply to DrPepper

    @DrPepper: someone failed their concurrency class :)

    The implemented solution doesn't actually solve that problem. Say it crashed inside ProcessFile(), how would you know if it had already sent the line or not?

    There's a reason why multi-phase commit protocols have been invented (and even those don't guarantee it, since a perfect solution is probably impossible).

  • Dan (unregistered) in reply to DrPepper
    DrPepper:
    ip-guru:
    This is just defensive programming. Should the process fail, it will resume where it left off instead of having to process the whole file again ;-)

    Plus one. Sure, it's slow, and sure it does a lot of operations (hint -- isn't that what computers are for?) AND if you power off the computer in the middle of the job, it will just pick up where it left off.

    Who's to say that this is a WTF?

    Without knowing the actual requirements, who knows. For all we know, perhaps the requirements are that "it's bullet proof as far as crash/restart; and that performance is irrelevant."

    Sure, there are other ways to do it -- for example, put each line in a DB, and mark the record after it's been processed; and when the job restarts, only process records that are not marked.

    But suppose the task was assigned like this: As quickly as possible, write a processing loop that is guaranteed to send a line from the file to the processor exactly one time. Performance is not an issue, but I've allocated less than one hour of time for this task.

    How else would you do it? Not everything needs to be "enterprise quality"; and not every task has the luxury of a week of design/programming time.

    You HAVE to be trolling!

  • Rocky Mountain Coder (unregistered)

    This is BRILLIANT!

    He built a Service Bus's queue functionality on top of a text file!

    I've never been this drunk.

  • Sensi (unregistered) in reply to Anonymii
    Anonymii:
    It didn’t take Jeff long to figure out why it performed poorly

    Indeed. VB6.

    At first I wondered why the explanation of the WTF was necessary. After all, we don't get explanations for Java, C# or PHP WTFs. Then I realized that the people who come here probably can't even tell the difference between VB6 and VBScript.

  • Undeclared (unregistered) in reply to DigitalDan

    Been there, done that. Colleague shows me a slow performing program that had a linear search in a largish, already correctly sorted, array for each input record read.

    Me: well, duh, use a binary search.
    Colleague: Use a what?
    Me: (sigh) Here, give me that keyboard.

  • Gerry (unregistered) in reply to DrPepper

    Or simply record where you are up to in a separate file...
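
    A minimal sketch of that idea, assuming the input is a UTF-8 text file and that progress is kept as a byte offset in a small side file. The names process_with_checkpoint, checkpoint_path and handle are hypothetical. As pointed out earlier in the thread, a crash between handling a line and recording the offset means that line is handled again on restart, so this gives at-least-once processing, not exactly-once.

```python
import os
import tempfile


def process_with_checkpoint(path, checkpoint_path, handle):
    """Process lines from path, resuming at the byte offset stored in
    checkpoint_path if an earlier run was interrupted."""
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "r", encoding="utf-8") as cp:
            offset = int(cp.read().strip() or 0)

    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line:
                break
            handle(line.decode("utf-8").rstrip("\r\n"))
            # Record progress only after the line was handled. Writing to a
            # temporary file and renaming avoids a half-written checkpoint.
            tmp_path = checkpoint_path + ".tmp"
            with open(tmp_path, "w", encoding="utf-8") as cp:
                cp.write(str(f.tell()))
            os.replace(tmp_path, checkpoint_path)


def demo():
    """Simulate a crash on the third line, then restart and finish."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "input.txt")
        checkpoint = os.path.join(tmp, "input.offset")
        with open(path, "wb") as f:
            f.write(b"a\nb\nc\nd\n")
        seen = []
        crashed = []

        def flaky(line):
            # Fail the first time line "c" is reached, succeed afterwards.
            if line == "c" and not crashed:
                crashed.append(True)
                raise RuntimeError("simulated crash")
            seen.append(line)

        try:
            process_with_checkpoint(path, checkpoint, flaky)
        except RuntimeError:
            pass
        process_with_checkpoint(path, checkpoint, flaky)
    return seen
```

    The input file is never modified, and each line costs one small write to the side file instead of a rewrite of everything that remains.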

  • Lanni Barovich (unregistered) in reply to Rocky Mountain Coder
    Rocky Mountain Coder:
    This is BRILLIANT!

    He built a Service Bus's queue functionality on top of a text file!

    I've never been this drunk.

    You mean BRILLANT!
