Admin
Usually they are the fastest to process. Especially if you're dealing with files containing billions of records, say a terabyte or two in size - I've seen such an application once first-hand. High-energy detectors if you need keywords. You can then have some very nice, unrolled assembly that will go through those at speeds that are basically limited by your IO subsystem. Given a bunch of IO subsystems feeding your CPU, you really want every subsystem to stream, usually at a 100MB/s rate or somesuch. On a modern enough server-grade PC you can process 5+ of those in parallel without loss of streaming. The loop that processes those was hand-tuned assembly spanning 64 KB (16 pages; the loop unrolling was machine generated, of course). There were 4, count 'em, four conditional jumps in those 16 pages of x86 assembly. Kept the CPU nice and warm. All it did was generate data for some histograms :)
Cheers, Kuba
Admin
Google search for 'get "my documents" folder c++'
2nd result
Admin
Now that I've read all the piss poor suggestions from the audience on how to "fix" the code, there's no way that I'd ever consider hiring one of you. As a matter of fact, I've just finished signing into our HR policy that worsethanfailure.com readers are not to be offered jobs (Yes, we'll ask that as a question).
Do any of you realise the damage you must be doing to Alex's Non-WTF jobs site? Anyone in the industry looking to hire candidates only has to read a single article's comments to say to themselves "I don't want an applicant from there, half of them can't speak English and the other half can't code... but they all think they can do both."
Captcha: pinball – go ahead, slay those poodles.
Admin
Mind you, if we look at the code and try to figure out what it is really doing, we see that it is converting a (probably) fixed-width pre-multiplied decimal number into a format either for display or for feeding into some other program. The "obvious" way, i.e. parsing as a double and then reformatting after a quick multiply, is fraught with tricky aspects. The real choice of simplifications is between pure string-domain processing (what the original code does, though ineptly) and parsing as a decimal into a (sufficiently large) integer, using idiv/mod to split into two pieces and then formatting. I think I'd favour the latter on clarity grounds as the pure-string versions of this sort of processing have a tendency towards obfuscation (even in simpler situations) but there are cases where doing some performance testing to determine the quickest way would be called for.
Whether that's entirely the right thing to do at all depends on context not revealed in the original article. As usual.
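The integer route described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class and method names are made up, the field is assumed to hold a non-negative amount pre-multiplied by 100, and `long` is assumed to be wide enough for the field.

```java
class FixedWidthAmount {
    // Converts a fixed-width field holding an amount pre-multiplied by 100
    // (for example "0000012345") into a string such as "123.45".
    // Assumption: only decimal digits, optionally padded with blanks.
    static String format(String field) {
        long cents = Long.parseLong(field.trim());
        if (cents < 0) {
            throw new IllegalArgumentException("Negative amounts are not handled in this sketch");
        }
        long whole = cents / 100;     // integer division gives the units
        long fraction = cents % 100;  // remainder gives the two decimal digits
        // Pad the fractional part to two digits so 5 becomes "05".
        return whole + "." + (fraction < 10 ? "0" : "") + fraction;
    }
}
```

For fields wider than 18 digits, `java.math.BigInteger` with `divideAndRemainder` would be the natural substitute for `long`.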
Admin
[quote user="I hate you all so very, very much"]captcha: NEXT MORON THAT TELLS ME WHAT THEIR CAPTCHA WAS I SLAUGHTER 10,000 POODLE PUPPIES.[/quote]
captcha: xevious (I hate poodle puppies.)
Admin
All these examples in the comments are not optimal or preferred in my opinion.
There is java.text.NumberFormat, which does exactly what one wants; it is locale-dependent and very easy to use. Rather than getting into XXXX.parse methods or regular expressions, the use of NumberFormat is slightly better due to its capability to handle user input from web applications, for example, where a browser's settings may differ slightly in numerical formats.
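A minimal sketch of what locale-aware output with NumberFormat might look like. The class and method names are hypothetical, and it assumes the amount has already been turned into a BigDecimal with two decimal places; it only covers the formatting side, not the fixed-width parsing.

```java
import java.math.BigDecimal;
import java.text.NumberFormat;
import java.util.Locale;

class LocaleAmountFormatter {
    // Formats an amount with exactly two fraction digits, using the decimal
    // separator of the given locale and no grouping separators.
    static String format(BigDecimal amount, Locale locale) {
        NumberFormat nf = NumberFormat.getNumberInstance(locale);
        nf.setMinimumFractionDigits(2);
        nf.setMaximumFractionDigits(2);
        nf.setGroupingUsed(false);
        return nf.format(amount);
    }
}
```

With `Locale.US` the separator is a period, with `Locale.GERMANY` a comma, which is the abstraction being argued for here.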
Admin
Well, I'm truly sorry that our innocent attempts at producing useful code snippets proved so offensive to a sophisticated audience such as you. Perhaps you could show us how a real English-speaking coder would solve that problem? This way we could all improve our skills -- perhaps to the point that, one day, we could be allowed the chance of working in a company with standards as high as yours.
Admin
That's where slow and bloaty Java programs come from!
The original code is not perfect, but it will work 10 times faster with 1,000,000 lines of incoming data.
I'd suggest following:
Read char by char to length-2. Truncate all zeros off the result. If result is zero - print "0", otherwise print result. Print ".". Print the rest of the line.
Admin
Well then, could you show us how you would use NumberFormat to solve the problem at hand? Apart from abstracting the decimal separator, I can't see how it would help.
Note that the input data has (or at least is expected to have) no separators of any kind. So the number formatting configurations of a hypothetical browser would make little difference in the input. Perhaps it could improve the output, but what if it is to be fed not to a human user, but to another program which has fixed number-formatting settings? I've had my share of number formatting hell, you know.
Something that has already been said here is that good programmers take the right tool for the right job. Should I use "Wrapper.parseWrapped" methods or regular expressions? Hard-code number separators or abstract them through NumberFormat? It all depends on what the specs are. The only thing that is for certain is that programming solutions are rarely, if ever, universal.
Admin
Sorry. If the result is empty, then ...
Admin
Actually, that's where easily optimizable Java programs come from. That snippet might not be particularly fast, but it clearly expresses its intent -- or, in non-XP words, it's easy to understand. If later, when profiling the system, I find that piece of code to be a bottleneck, I can easily optimize it; if it turns out otherwise, I can leave it like that.
Admin
Ok. May I know how you would [possibly] easily optimize your chunk of code?
Admin
I agree that XML has its uses, when you have the ability to build both sides of an application (or communications during development with the builders of the other side). I worked with the developers at one of the companies we work with and used XML to transfer data from our PC-based network to their Solaris system. Works great, and easy to debug issues with the data; the validation against the schema catches everything.
Where XML fails, though, is when you have to deal with other people's systems or legacy systems. I transfer data via diskette from disparate systems I have no control over, that offer to export in fixed width or CSV formats. I ask for fixed, and supply the format I want the data in. They export it to that format, and I have canned code that parses it into our system. It works really well, and I very seldom get bad data. When I do, the canned code handles it and logs the error so I can figure out what happened.
It's all in choosing the right tool for the job. Sometimes the right tool doesn't mean it's necessarily the tool you prefer, but the one that gets the job done instead.
Admin
Probably implementing just the algorithm you proposed. Just so we can compare, it would go somewhat like this:
String convert(String amount) {
    // Just one possible way to deal with this condition.
    if (amount == null)
        throw IllegalArgument("Amount must not be a null string");

    int index = 0, length = amount.length();

    // Just one possible way to deal with this condition.
    if (length == 0)
        throw IllegalArgument("Amount must not be an empty string");

    // Read char by char to length-2.
    // Truncate all zeros off the result.
    for (int n = length - 2; index < n; index++) {
        char c = amount.charAt(index);
        if (c != ' ' && c != '0')
            break;
    }

    StringBuilder builder = new StringBuilder();
    for (int n = length - 2; index < n; index++)
        builder.append(amount.charAt(index));

    // If result is empty - print "0", otherwise print result.
    if (builder.length() == 0)
        builder.append('0');

    // Print ".".
    builder.append('.');

    // Print the rest of the line.
    for (int n = length; index < n; index++)
        builder.append(amount.charAt(index));

    return builder.toString();
}
Now, what I am saying is not that this wouldn't work, nor that it wouldn't be considerably faster; what I'm questioning is whether it is worth the trouble, considering that I don't even know yet whether this function will be a performance bottleneck.
While my previous example is no programming pearl, it will do the job. It was straightforward to write, it is easy to identify its weaknesses (such as where it could use improved error handling), it's fast to explain and understand, and it avoids (or automatically handles) most of the special cases I have to pay attention to in the optimized version. If it proves not good enough, it won't mean much time lost to throw it away and replace it with a faster implementation; it's just that, as far as humans are concerned, it's best to favour the clearer, more concise implementation wherever performance constraints don't exclude it.
Admin
Oops! Where I've written
throw IllegalArgument...
Please read
throw new IllegalArgumentException...
I have been using jEdit / Beanshell to test my code snippets, but because I forgot to test for the error conditions, the interpreter didn't show me those errors. Just one advantage of compiled over interpreted code...
Admin
Which is why atoi is deprecated in favor of strtol and friends, which happen to take a "base" as a parameter. If you say 0, it'll parse a leading 0 as octal, 0x as hex, and everything else as decimal. If you say 10, everything's decimal. 16, hex, 8, octal. Or leave it in "auto" mode...
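For comparison with the Java code elsewhere in this thread, here is a rough Java analogue of the same idea. This is not strtol itself: `Long.decode` plays the part of the "auto" mode (a leading 0 means octal, 0x means hex), while `Long.parseLong(s, radix)` takes an explicit base. The class and method names are made up for illustration.

```java
class RadixParsing {
    // "Auto" mode: the prefix of the string decides the base.
    static long parseAuto(String s) {
        return Long.decode(s);
    }

    // Explicit base: the caller decides, and no prefixes are interpreted.
    static long parseWithBase(String s, int base) {
        return Long.parseLong(s, base);
    }
}
```

So `parseAuto("010")` yields 8, while `parseWithBase("010", 10)` yields 10, which is the distinction that matters for zero-padded fixed-width fields.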
Admin
In Java you can just parse as a Long then divide by 100, or if you need a BigDecimal, you can just insert a period just before the last two chars and use the String constructor.
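A small sketch of both suggestions, assuming the input consists only of digits and is at least three characters long; the class and method names are hypothetical. Note that plain long division keeps only the whole units and drops the last two digits.

```java
import java.math.BigDecimal;

class ImpliedDecimal {
    // Inserts a period before the last two characters and lets the
    // BigDecimal(String) constructor do the parsing.
    static BigDecimal toBigDecimal(String digits) {
        if (digits.length() < 3) {
            throw new IllegalArgumentException("Expected at least three digits");
        }
        int split = digits.length() - 2;
        return new BigDecimal(digits.substring(0, split) + "." + digits.substring(split));
    }

    // Parses as a long and divides by 100; the fractional part is discarded.
    static long wholeUnits(String digits) {
        return Long.parseLong(digits) / 100;
    }
}
```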
Admin
...which is yet another example that Perl IS the lossy compression for programming languages...
Admin
BZZZZT! Wrong!
double result = Double.parseDouble( amount ) / 100;
Admin
In fact you could say they don't need to be parsed at all. And you will never get hung up on a pesky separator in the string you are trying to convert into a long, either.
Human readability and space efficiency sort of suck though.
captcha = pointer (I kid you not)
Admin
So are you volunteering to rewrite and debug the company's legacy 10 million line code base? In your spare time, naturally...
Admin
As a COBOL programmer who has coded to output variable-length fields and file sizes, I can assure you the added complexity of doing this with a language that was only designed to handle fixed-length fields and files will double the development time, and scare the crap out of most programmers who may need to change or maintain it.
Since the usual use of a CSV file is to stuff it into Excel, which has no problem with fixed-length fields padded with spaces, and numerics with leading zeros, the extra effort is not justified.
Not stripping out the separator characters from the data fields, or replacing reserve characters in XML output, is just sloppy testing and design.
Admin
Yes, this problem calls for many rocket scientists indeed.
Admin
This all looks nice. But how many CPU cycles does it take? I tried to explain it before, but to little understanding, it seems.
Think of zillions of records, folks.
The original (and bashed) code is nearly perfect in the real world.
Admin
Please... I believe I've made a clear point that, although performance is important, optimization is risky business; so it's best to first try the most straightforward solutions, then go back to the system bottlenecks (which you can only reliably determine through testing) and optimize only those critical areas that are holding the system back. If you find some problem with that reasoning, please feel free to point it out; but don't claim your oh so wise point to have been ignored when it just wasn't.
Remember, however, that performance is just one requirement among several. If those fixed-length files contain financial or other highly sensitive data, misprocessing them may have dire consequences; so the benefit of applying a faster but more complex solution must be weighed against the relative risk of an incorrect implementation that isn't unearthed until the system enters production.
It may very well be that, in the real world, we'd end up having to apply a more convoluted solution than sepi's to that formatting problem. But understand the difference: we'd have to apply a more complex solution because the (performance) requirements demand it, not because we think it is "better". Because, as long as humans are involved, it just isn't.
You do realize that each time you call String.valueOf() and String.substring() a new String object is created, right? Because that's what the "nearly perfect" original code does a lot, thus managing to be both confusing and inefficient.
Admin
Why would I want to look in the Float or Double class? It is a String. I want the String class toDouble() method. All I see is the toString() method.
:)
This is one of Java's failures... too many objects and not enough documentation. Almost like using a dictionary to get the spelling of a word. How do you spell "Opossum"? I can't find it anywhere under the P's.