Admin
Svillen? Is that you?
Admin
Not that it matters, because the first reaction of any sane person would be "Rewrite!" And no, this is not a good example of why you'd use C rather than C++ (in case anybody asks). Here are, in rough order, the problems:
(1) Shlemiel the painter algorithm.
(2) Use of delete[]. This is almost invariably an awful idea; particularly in a loop.
(3) Failure to ask yourself what you're trying to achieve. If you can actually make use of a stringified file that's 2GB long, then, fine. Preallocate *pResult as 2GB.
(3a) See (1) and (2) in support of (3).
(4) RAII. Some poor fool is going to have to delete pResult. Oops, sorry, was that free() pResult? No, actually, it's delete[] pResult.
(4a) If, for some insane reason, you wanted to go this way, use an auto_ptr reference.
(5) Do not coerce NULL into a long (vide the returned value if the file cannot be opened).
(6) Gouge your eyes out before you even think of using ZeroMemory(). This is a surprisingly common Anti-Pattern amongst people who consider themselves C++ programmers, but haven't yet weaned themselves off the C string library. Mind you, it's not remotely necessary in this case. Use std::string, if you really have to do something like this.
(7) I wonder what happens if the file is 2.5GB long? Using the type "long" on purpose, of course ...
(8) How Long, O Lord? How Long before the Rapture?
and, of course,
(0) Consider using a more suitable tool. Not the tool whose sorry ass you just fired, obviously.
Apologies for stating the bleeding obvious, nine times over. Ya gotta start counting from 0, because that is the Zen of C.
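Point (4) in a nutshell: let std::string own the buffer. A minimal sketch (the helper name read_file is mine, not from the OP):

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Read a whole file into a std::string. The string owns its storage, grows
// geometrically (no Shlemiel), and frees itself when it goes out of scope --
// nobody ever has to guess between free(), delete and delete[].
std::string read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    std::ostringstream buf;
    buf << in.rdbuf();   // one streamed pass, no repeated realloc-and-copy
    return buf.str();
}
```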
Admin
Really? There's such a thing as finishing a task so that it's confirmable as done and free from logic errors - there's more to the task than mere runtime. And while it's nice to go from 20 seconds to 13, what matters is that (a) it scales that much better and (b) it will likely need minimal attention for the next year or two. It's worth a couple of hours to make sure that (b) happens, because if you're wrong, it'll take more than two hours to fix it.
Admin
"I saved 99% on the first optimization, and then another 35% on the second optimization."
Beancounter reports a 134% savings.
Admin
What are you talking about? This is first-class code which surely does not terminate in time.
Admin
As someone who was named Richard by my parents, I strongly object to this use of the name Dick. There are plenty of synonyms available.
Admin
Wrong! If it hasn't been tested, it doesn't exist.
Admin
http://www.lfgcomic.com/page/3
Admin
Could be worse - your last name could be Smalls
Admin
One of my pet peeves is how, when GUI OSs moved to multitasking, they would demonstrate how awesome that was by multithreading the copy operation. Watch! You can copy one file while copying another! All the little progress bars move at the same time!
But if you pay attention, they're all moving ever slower, and the predicted finish time shoots way up. If you're quiet, you can hear the hard drives thrashing (i.e. dying) as what would normally be efficient sequential writes become rather random writes. And, unless your OS takes great pains to prevent it (I've never heard of that), you're fragmenting the files at the same time.
The best copy process I've ever used was something called CopyDoubler for the old Mac OS 7.x. What it would do is spin off copies to a background process, and that process would perform the copies sequentially. Just as fast as copying one at a time, without making you wait for each copy to complete before starting another, and all the benefits of sequential execution as described in Joel's article. And I have never seen that technique used since.
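For what it's worth, CopyDoubler's trick is easy to sketch: one background worker draining a queue, so submissions return immediately but the copies themselves never interleave on disk. Class name is mine, and std::function jobs stand in for the actual file copies:

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// One background worker drains the queue in order: callers get their prompt
// return, the disk gets strictly sequential writes.
class SerialCopier {
public:
    SerialCopier() : worker_([this] { run(); }) {}
    ~SerialCopier() {                       // drain remaining jobs, then stop
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_one();
        worker_.join();
    }
    void submit(std::function<void()> job) {   // e.g. [=]{ copy_file(src, dst); }
        { std::lock_guard<std::mutex> lk(m_); jobs_.push(std::move(job)); }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !jobs_.empty(); });
                if (jobs_.empty()) return;     // done_ set and queue drained
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();                             // runs strictly one at a time
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool done_ = false;
    std::thread worker_;                       // declared last: starts fully initialised
};
```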
I get that a lot. Some people bury cars on their ranch. Me, I see everything as puzzles. It's a fun quirk.

Which one? The AI one or the social-bigotry-as-law one? I'm not sure I'll pass either, honestly.

Hungarian notation is its own WTF. If your function gets big enough that it "needs" special variable name notation, it's big enough to need to be broken up into several smaller functions.

The version I use more often is: "To a man with a new hammer, everything looks like a nail." It's a slightly different twist.
Admin
This is an artifact of the OS' IO layer and FS - a decent block cache will accumulate writes and spit them out in larger blocks, while a reasonable FS impl (like ext2) allocates files several blocks at a time, leading to overall less fragmentation. And yeah, if you have 1 disk, then copying 4 files should take about 4 times as long; it would be interesting to see them serialize the jobs.
Admin
PLEASE PLEASE don't post articles like this without posting the code you wrote as a solution. The WTF code is only half the story.
Admin
my replacement for this mess is rsync + some cfg files.
Admin
See my comment above.
Please do not write any more C++ code unless you have absorbed
(a) The STL, (b) Scott Meyers (all three books, please), (c) optionally, anything at all by Herb Sutter.
I'm sick and tired of lazy C++ "programmers."
I'm sick and tired of nitwit C programmers who claim that C++ would just "make things more difficult."
Actually, I'm just sick and tired of programmers. It's no bloody wonder that complete cretins like Rik V's predecessor don't just get employed, but can co-exist in the goldfish tank for a comfortable couple of years whilst being fed cheesy wotsits.
I'm on the right site, then.
Admin
Given the rubric, why would you want to pester YerLuvinUncleSysAdmin to install rsync and deal with the cfg files?
You know, and I know, that it's a simple twenty line script in Perl. Or Python. Or PowerShell. Or WonderBra (now available in Enterprise Edition!).
You still convinced about rsync? I can show you some 8 by 10 color glossy pictures that will suggest that PowerShell is a better solution.
Admin
Am I the only one thinking this is what Scripting languages were designed for?
I would have thought that in a situation like this (where optimum efficiency is not imperative, just reasonable efficiency) the effort of writing and compiling an application is a bit OTT, when a script can call a few simple OS commands to achieve much the same result. Perhaps it wouldn't be the quickest way to go, but adequate surely (especially compared to the 40 minutes the original app was taking).
(I'll concede that he might be working in a Windoze environment, and things may be different there, although I would have thought that even a batch file could be set up to do a task as simple as checking that the checksums of two files are the same, and moving one if they differ (even if he then needs to check which is newer). Perhaps I've missed the point....)
Admin
a) given a choice, I'd probably use Boost on a new project. Or python if I can get away with it - most of the C++ patterns are language features in high level languages.
b) he's not the only game in town - MS publishes a number of good SW dev books, even if they can't get OSes right. c) hey, a threads book.
oh, and C++ is frequently a pain in the ass. simple stuff is better if it isn't performance critical in some way.
Admin
I'll bet that admin prefers rsync with its good docs and community support to whatever madness I'd come up with in 2 hours. Besides, powershell sounds sort of windowsy - run windows in production and you deserve what you get.
Admin
It's not likely, but you could have been.
Admin
I can't believe he didn't handle shenanigans.
Old Captcha: genitus = p3nis
Admin
And once again we have insufficient proofreading. It's supposed to be spelled "forty", not "fourty". (By the way, I've once heard somebody claim that "fourty" is the British spelling. It's not. Check your dictionary.)
Admin
Idiocy abounds in this one.
"This was a semi-frequent task, and one that he needed to devote quite a bit of time to each month."
OK, so you had an idiot doing this task. But what the hell were you doing, allowing this situation to continue? This should have been a cronjobbed script that required no human interaction at all. Instead, every month, you allowed your idiot to mess around and waste time while you moped around passively, saying "oh, woe is me, this no-brainer routine task is taking a lot of time every month. Maybe next month it'll be better."
Admin
I agree that there's little point spending significant time for relatively minor optimisation on a relatively insignificant app (let's face it, if it was allowed to run 40min originally, then 13 sec is not that much better than 20 sec). On the other hand, if the app were a little more critical (or resource hungry), it makes sense spending some time (within reason) to speed up process and minimise system impact.
Spending an hour to save 7 seconds seems a bit excessive (assuming the running of the app directly affects your work, it would take about 515 runs for you to get that hour back) - At once an hour (every hour) on weekdays it would take more than 4 weeks to regain that hour - business hours only (and assuming 8 hr days) it would take closer to 13 weeks.
If there is a constant overhead to initialise it, the saving would take even longer to gain.
If the process runs in the background, the change is basically insignificant.
It sometimes amazes me how obsessed people are with optimisation - and they often use the argument 'but if the system were much bigger....' This argument is only valid (IMO) if the system actually has a realistic chance (ie probability not possibility) of one day actually being significantly bigger. When people argue 'but we don't necessarily know if it will get that big, it could....' one wonders what they do in the early stages of the SDLC. Saving seconds is only important when you often save the seconds, and when there is potential harm in not saving those seconds. A database transaction that will be called a million times a day that can be reduced even by milliseconds may be of some value. A process that is run 10 times (or even 100 times) a day that affects nothing (or very little) has no great benefit in saving a few seconds (especially if the cost to make such a saving is relatively high).
<gets off soapbox> Perhaps the Universities/Colleges are at fault because they try to teach people the importance of optimisation, but use examples such as enrollment systems at a school (and these enrollments would be reasonably constant at several thousand at most schools, not to mention would really only be used once or twice a year). Optimisation for the sake of optimisation is pointless, if it costs time. If a process runs without affecting anything else (ie doesn't block other processes), and is not required in 'realtime' (ie there isn't a client waiting for a response who is going to get upset that the user stares blankly at the screen for 20 seconds) then an optimisation, even by a few seconds, is probably useless.
Admin
The latter shall be the former: Well, if you're on a Windows system (and I have a sneaking suspicion that the OP is on a Windows system), then being sort of windowsy is not necessarily a bad idea. There's a general mass-hallucinogenic belief out there that quite a number of wacky, laff-a-minute, FT100 companies out there run windows in production because they've been sold a pup. Being a libertarian socialist, I personally don't believe that they deserve what they get -- it's a nauseating amount of money for no real effort whatsoever. Personally, I think they should be sat in a chair with their eyes propped open with match-sticks, a la Clockwork Orange, and forced to "do their business" on the Ubuntu desktop, or on top of a LAMP spike.
But that's just me.
The former shall thus be the latter (and frankly something we'd have a more fruitful discussion upon):
Interesting, because I don't think it's sane to choose C above C++ unless you're dealing with performance-critical software -- RTOS, these days. I'm not sure that the argument applies, even then. It's a bit like the idea that "Money is the root of all evil," isn't it?
You can strip C++ of all the bits you don't like. Please, no multiple inheritance (akkk..). Please, no ... well, here's a minimum:
You can even ban inheritance altogether (outside the libraries, obviously).
I've never been able to work out what the apparent difficulty is, here.
Anyway. We can both agree that the code in the OP is astonishingly bad, whether viewed through a C prism or a C++ prism. (I hope.) Therefore a meritorious WTF.
Admin
[Pedantry] Are you sure? Wouldn't fileData point to the beginning of the file, and *fileData contain the value of the first character? *fileData's type is a BYTE - can you fit the whole of the file in one BYTE?
Of course it would depend on what BYTE is defined as, but I assume (given how it is used) that BYTE would be an 8-bit Character....
[/pedantry]
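The pedantry holds up, and takes two lines to demonstrate (BYTE typedef'd locally for the sake of the example):

```cpp
typedef unsigned char BYTE;

// fileData points at the first byte of the buffer; dereferencing it yields
// exactly one BYTE -- the first character -- not the whole file.
BYTE first_byte(const BYTE* fileData) { return *fileData; }
```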
Admin
Where I work, we have multiple inheritance toys such as (project)::Role::HasUUID (for a database object with a special UUID column), Role::HasCredentials (for objects with passwords and the like), and Role::Historian (it logs changes), and we're really happy with them. It's quite effective to use them in a shallow manner like that (one deep inheritance tree, with assorted one-off "mix-in" style classes, which might possibly have ancestors themselves but generally not). (Though this isn't C++.)
Multiple inheritance is like nuclear power. In the right hands, it can work wonders... but you can also use it to render your code uninhabitable without too much difficulty.
Admin
I think he should have written it to use memcached or some kind of MPI so that he could just keep allocating memory from a farm of servers. That way, if designed properly, the program would have been somewhat fault tolerant and could survive a failure of a machine or two while running. It would probably only need each machine to have 16 or so Gigs to compare a pair of large directories.
A Beowulf cluster of PS3's would be perfect.
Also what the hell was he doing writing it in a language that others know? He should have written it in LISP or something.
Admin
I was actually thinking that a task like this belongs to rsync or a scripting language like perl or python: a python script to do sysadminny tasks will often be a page or two of fairly clear code, and a lot of the STL stuff will be irrelevant to it - language features or simply not visible. Sure, it's slower, but if you're doing a simple task once a day, who cares? This task, done in python is probably the same speed (IO limited).
Admin
Err, you work the same place as me? Or - if not, wtf, this is scary.
You might recognize these two.
If the business-critical application fails, reboot the machine. Trouble is, with no state remembered over the reboot, if the business-critical application continues to fail - well, hello, here's your endless reboot cycle.
If the email alert job failed, queue a job to email-alert the operator that we can't email-alert the operator.
I live in this world.
captcha: abbas
Admin
The people that jump around as told do horrible work, IMHO. The reason is they don't care about anything they do and are happy to go where they're told, so long as they keep getting paid. The ones that do care aren't able to just stop caring at a moment's notice. You may force them to stop what they're doing, but it won't make them productive where you want them. I find that sometimes I just need time to get something off my chest before I can continue on with what I'm actually supposed to do. Otherwise, I'm much less productive.
In any case, he was also further developing his skills by shaving off those further 7 seconds, which might not have mattered now, but might end up being mission critical sometime in the future.
Admin
Meanwhile, back in Realsville ...
The CxO said I must do 'Yada'. I explained to CxO that, okay, 'Yada' is a GoodIdea(tm), but if we do that then ... (badness) ... so let's do 'Blah'. 'Blah' gets all the upsides of 'Yada' with none of the downsides ... (badness) ... and the CxO said, "No, do 'Yada'", and now I am away to a completely unrelated very important meeting with 'Blankety-Blank Co' - report back to me on progress with 'Yada' two weeks from now.
... what's a developer to do? Answer: 'Yada'! Here's the real WTF, every day, everywhere.
captcha: jugis
Admin
If you'll forgive me (or even not):
Which was, of course, a joke. As an aside, I got paranoid when I typed int21h into Google on Chrome, and my computer crashed ... but even I don't believe in Assembler Injection. It happened, honest.

Yes, I'd use Python, because I like Python. Actually, I'd use Perl, because that's easier to get past the SysAdmin Troll-Gate, and just as good for a simple file-munging script. If you remember, I pointed out that this is obviously a Windows shop: learning enough PowerShell wouldn't cause either one of us much pain.
It's a bit too late, I guess, and I screwed it up -- but the main point of the WTF is that it is wrong on many, many, levels. Not the least of which is that the original manager in charge should have asked for something that "compared two directories and put any files that had changed into a third." Not a lump of cretinous pseudo-C++ that needs to be wrapped in something else before achieving the task at a glacial pace, and even then failing, because the next layer up (which we don't see) is presumably going to have to do something like a memcmp() on two arbitrarily large buffers in memory. And then free the memory. Using the correct operators. Repeatedly, for each file in each directory.
Personally, I'd like to see the implementation of the memcmp().
Failing that, I trust you to sit down with me in some horrible pair programming environment, and we'll get a rock-solid solution to the actual problem, in ten lines or less, using a mutually-agreed (but not rsync and conf files, even though they have their place) scripting language.
It'll probably still not get past the SysAdmin Gatekeeper Trolls, though. One -- or both of us -- can only do their best.
For incidental pain:
Umm, I was arguing for a limited subset of C++ that works as a simple, but improved, drop-in for the C equivalent. It's a non-problem that I often have to face when dealing with an RTOS project. In that context, I could have argued for the use of a binary predicate functor in association with std::sort(), which is faster than a qsort() in C. But I didn't. Life is too (q)short.

Is this one of those daft SAT questions for which rich WASPs spend outrageous amounts of money in order for a menial college graduate to knock enough syllogisms into their spotty cretinous little sprog's head before said sprog gets to go to Stanford? (Motto: "We will never fail you. Partly because you have a theoretically huge and pointy head, plus which your hair is blond and your eyes are blue and you, or your parents, will be valued alumni. Plus which, we tried to give you miserable cretinous little bastards an F, and you went on strike!")
There is no possible meaningful way that you can compare or contrast multiple inheritance with nuclear power.
Show me the tensors.
Admin
... around 2.6.12-2.6.18 sort of timescale, as I discovered when I had to figure out why a 400MHz ARM CPU was apparently unable to shovel a mere 6MB of data in RAM from userspace to kernelspace before the hotplug firmware upload script timed out - after 60 seconds!
Turns out it was allocating kernel space one page at a time, copying a page worth of data from user space, then allocating a new space one page larger, copying the existing data across and uploading the rest, etc. etc.; after I told it to double the allocation every time up to 1MB and go in 1MB steps thereafter, it suddenly took under a second.
Eventually it got fixed, but it's pretty sad that it even made it through review like that in the first place.
I would say it's TRWTF, but this seems to be the week for it all over. We found a case in GNU grep just the other day where it was using this same kind of algorithm to handle i18n.
Learning not to write quadratic algorithms - or at least, to know when you're writing one and be able to decide based on the context if it'll matter - is one of the things that separates the professionals from the cowboys and amateurs in this industry.
(And learning when sometimes even a linear algorithm won't be fast enough and what to do about it is the next level...)
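You can see the difference with nothing but bookkeeping. This toy model (names mine) counts the bytes a realloc-style grow would copy under each policy, without allocating anything:

```cpp
#include <cstddef>

// Toy cost model: every time capacity grows, a realloc-style grow has to
// copy everything held so far. Page-at-a-time growth copies O(n^2) bytes
// in total; doubling copies O(n) -- hence 60 seconds vs under a second.
struct GrowthCost {
    static constexpr std::size_t kPage = 4096;
    std::size_t capacity = 0, size = 0, bytes_copied = 0;
    void append(std::size_t n, bool doubling) {
        while (size + n > capacity) {
            bytes_copied += size;                        // the copy-across step
            capacity = doubling ? (capacity ? capacity * 2 : kPage)
                                : capacity + kPage;      // one page larger
        }
        size += n;
    }
};
```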
Admin
Given the task to do myself, I'd either use a checksum lib like MD5 and compare checksums + file size (constant memory use), or load 64k of each file at a time, compare, then overwrite the buffers with the next 64k - again, constant memory use.
But yeah, we'd probably end up with something workable and yes, this is a really WTFy WTF.
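Since several of us keep describing the same fix, here's roughly what the 64k-chunk version looks like in C++17. Function names are mine, and std::filesystem post-dates the original article (which would have used Win32 or Boost.Filesystem), so treat this as a sketch of the idea, not the author's code:

```cpp
#include <algorithm>
#include <filesystem>
#include <fstream>
#include <vector>

namespace fs = std::filesystem;

// Compare two files 64k at a time: constant memory, early exit on the
// first differing chunk (or on a size mismatch, before reading anything).
bool same_contents(const fs::path& a, const fs::path& b) {
    if (fs::file_size(a) != fs::file_size(b)) return false;
    std::ifstream fa(a, std::ios::binary), fb(b, std::ios::binary);
    std::vector<char> ba(64 * 1024), bb(ba.size());
    while (fa && fb) {
        fa.read(ba.data(), static_cast<std::streamsize>(ba.size()));
        fb.read(bb.data(), static_cast<std::streamsize>(bb.size()));
        if (fa.gcount() != fb.gcount() ||
            !std::equal(ba.data(), ba.data() + fa.gcount(), bb.data()))
            return false;
    }
    return true;
}

// The whole WTF task: copy into `out` every file in `left` that is missing
// from, or differs from, its counterpart in `right`.
void copy_changed(const fs::path& left, const fs::path& right, const fs::path& out) {
    fs::create_directories(out);
    for (const auto& entry : fs::directory_iterator(left)) {
        if (!entry.is_regular_file()) continue;
        const fs::path other = right / entry.path().filename();
        if (!fs::exists(other) || !same_contents(entry.path(), other))
            fs::copy_file(entry.path(), out / entry.path().filename(),
                          fs::copy_options::overwrite_existing);
    }
}
```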
Admin
We can talk about nuclear power, and to restore your adventitious addition here:
Not really fair to misquote fennec, is it? Particularly since I led him in to your little eco-fetish ex post facto. I mean, it's not as though he'd have known.

Did you read the OP? Do you have any comprehension of what an utter failure that was? Do you wish to argue about the OP? Have you followed the (standard blog, with excellent exceptions, as one might expect from the TDWTF) arguments?
Get to the fucking point, man. You're not an idiot, and you score highly on "Reading Comprehension."
You'll note that I'm not asking you to define or defend a "relevant cut." I'm happy wasting words. I'm not quite sure what makes you happy, but ... good luck.
Admin
There's actually one better out there somewhere, but I couldn't find it. Something to do with "it isn't necessary, but in two or three edge cases, it makes it easier to express what you mean."
You right. Me asshole.
I have to re-learn metaphors. Damn, it's been so long since Sophocles, and too many WTFs in between.
Little Jim: He's fallen in the water!
Admin
Captcha: acsi. In the land of the blind, the one-eyed dyslexic character set is not binary.
Admin
WinMerge is my weapon of choice, although I do use robocopy for batch files.
Admin
Ninety FRist!
Admin
From fourty minutes? Is this the bonus WTF?
Admin
The real WTF is that all you "C++" programmers posting here are still using new[] and delete[].
...and testing the result of memory allocation for null
(hint: it throws an exception and would leak any local memory which was allocated with new[])
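Agreed - and the replacement is shorter than the disease. A sketch (function name mine): std::vector gives you the zero-fill ZeroMemory was faking and the cleanup delete[] keeps getting wrong, and allocation failure arrives as std::bad_alloc rather than a null pointer nobody checks correctly.

```cpp
#include <cstddef>
#include <vector>

// RAII buffer: zero-initialised on construction, freed automatically, and
// exception-safe -- no new[], no delete[], no null check (new throws).
std::vector<unsigned char> make_buffer(std::size_t n) {
    return std::vector<unsigned char>(n, 0);
}
```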
Admin
So you perform extensive cost-benefit analysis on everything you do, and expect the same from everyone else? Sounds like a waste of some valuable time, to me
Admin
That code makes programming look like politics
Admin
I once saw a piece of code:
while(getTrue()) ...
but what made it particularly funny, was the implementation of the getTrue-function:
bool getTrue() { return false; }
Admin
I was talking about the following declaration:
This is standard C/C++ syntax for declaring an array of 256 pointers to BYTE. Given your interesting explanation, which was completely irrelevant to pBuffer, perhaps you were looking at the declaration of the function parameter pResult?
Admin