The Daily WTF: Curious Perversions in Information Technology

2007-04-01

andy brons:
The biggest WTF is fixed length data files. Who writes these things? They are the biggest pain in @ss. I just don't understand the reasoning for them.

Usually they are the fastest to process, especially if you're dealing with files containing billions of records, say a terabyte or two in size. I've seen one such application first-hand (high-energy detectors, if you need keywords). You can then have some very nice, unrolled assembly that goes through those records at speeds basically limited by your I/O subsystem. Given a bunch of I/O subsystems feeding your CPU, you really want every subsystem to stream, usually at 100 MB/s or so. On a modern enough server-grade PC you can process 5+ of those in parallel without breaking the streaming. The loop that processed those records was hand-tuned assembly spanning 64 KB (16 pages; the loop unrolling was machine-generated, of course). There were four, count 'em, four conditional jumps in those 16 pages of x86 assembly. Kept the CPU nice and warm. All it did was generate data for some histograms :)

Cheers, Kuba
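To make the speed argument concrete, here is a minimal Java sketch (not Kuba's assembly) of why fixed-length records are cheap to process: every field sits at an offset computed by plain arithmetic, so there is nothing to scan for and nothing to parse. The class name, the 16-byte record size and the position of the int "energy" field are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class FixedRecordHistogram {
    // Hypothetical layout: every record is exactly 16 bytes and holds a
    // big-endian int "energy" value at byte offset 4 within the record.
    static final int RECORD_SIZE = 16;
    static final int ENERGY_OFFSET = 4;

    // Fills a histogram from all complete records in the buffer. The field
    // position is i * RECORD_SIZE + ENERGY_OFFSET: no delimiter scanning.
    static long[] histogram(ByteBuffer data, int bins, int binWidth) {
        long[] counts = new long[bins];
        int records = data.limit() / RECORD_SIZE;
        for (int i = 0; i < records; i++) {
            int energy = data.getInt(i * RECORD_SIZE + ENERGY_OFFSET);
            // Out-of-range values are clamped into the first or last bin.
            int bin = Math.min(Math.max(energy / binWidth, 0), bins - 1);
            counts[bin]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(3 * RECORD_SIZE);
        int[] energies = {5, 15, 17};
        for (int i = 0; i < energies.length; i++) {
            buf.putInt(i * RECORD_SIZE + ENERGY_OFFSET, energies[i]);
        }
        System.out.println(Arrays.toString(histogram(buf, 4, 10))); // [1, 2, 0, 0]
    }
}
```

A real tool might count out-of-range values separately rather than clamping them; the point is only that the inner loop does arithmetic, a load and an increment per record.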

2007-04-02

Google search for 'get "my documents" folder c++'

2nd result

2007-04-02

Now that I've read all the piss poor suggestions from the audience on how to "fix" the code, there's no way that I'd ever consider hiring one of you. As a matter of fact, I've just finished writing into our HR policy that worsethanfailure.com readers are not to be offered jobs (yes, we'll ask that as a question).

Do any of you realise the damage you must be doing to Alex's Non-WTF jobs site? Anyone in the industry looking to hire candidates only has to read a single article's comments to say to themselves "I don't want an applicant from there, half of them can't speak English and the other half can't code... but they all think they can do both."

Captcha: pinball – go ahead, slay those poodles.

2007-04-02

Knobhead:
Anyone in the industry looking to hire candidates only has to read a single article's comments to say to themselves "I don't want an applicant from there, half of them can't speak English and the other half can't code... but they all think they can do both."

I think you'll find that the ability to speak (well, write) English and the ability to write code are largely independent variables. For example, consider the large number of people who evidently can't do either.

Mind you, if we look at the code and try to figure out what it is really doing, we see that it is converting a (probably) fixed-width pre-multiplied decimal number into a format either for display or for feeding into some other program. The "obvious" way, i.e. parsing as a double and then reformatting after a quick multiply, is fraught with tricky aspects. The real choice of simplifications is between pure string-domain processing (what the original code does, though ineptly) and parsing as a decimal into a (sufficiently large) integer, using idiv/mod to split into two pieces and then formatting. I think I'd favour the latter on clarity grounds as the pure-string versions of this sort of processing have a tendency towards obfuscation (even in simpler situations) but there are cases where doing some performance testing to determine the quickest way would be called for.

Whether that's entirely the right thing to do at all depends on context not revealed in the original article. As usual.
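The integer route favoured above can be sketched in a few lines of Java. This is a minimal illustration, assuming the field is a non-negative string of decimal digits holding hundredths; the class and method names are hypothetical. A long holds any 18-digit value, which covers typical fixed-width amount fields.

```java
final class CentsFormatter {
    // Hypothetical helper: formats a fixed-width count of hundredths,
    // e.g. "000000028000", as a decimal string such as "280.00".
    static String format(String digits) {
        // parseLong uses radix 10, so leading zeros are plain decimal zeros.
        long value = Long.parseLong(digits.trim());
        if (value < 0) {
            throw new IllegalArgumentException("negative amount: " + digits);
        }
        long whole = value / 100;    // integer division: the units part
        long fraction = value % 100; // remainder: always 0..99 here
        // Pad single-digit fractions so 5 cents prints as "05", not "5".
        return whole + "." + (fraction < 10 ? "0" : "") + fraction;
    }

    public static void main(String[] args) {
        System.out.println(format("000000028000")); // 280.00
        System.out.println(format("000000000005")); // 0.05
    }
}
```

The padding test is the easy place to slip: the condition has to be `fraction < 10`, otherwise a fraction of exactly 9 comes out as ".9".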

2007-04-02

I hate you all so very, very much:
captcha: NEXT MORON THAT TELLS ME WHAT THEIR CAPTCHA WAS I SLAUGHTER 10,000 POODLE PUPPIES.

captcha: xevious (I hate poodle puppies.)

Chiraz · 2007-04-02

In my opinion, none of the examples in the comments is optimal or preferable.

There is java.text.NumberFormat, which does exactly what one wants; it is locale-dependent and very easy to use. Rather than resorting to XXXX.parse methods or regular expressions, using NumberFormat is slightly better because it can handle user input from web applications, for example, where browser settings may differ slightly in their number formats.
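No code is given here, so the following is only one guess at what a NumberFormat-based version might look like. Since the input field has no separators at all (a point raised further down the thread), NumberFormat is used only on the output side. The class and method names are hypothetical; `NumberFormat.getNumberInstance`, `setMinimumFractionDigits`, `setMaximumFractionDigits` and `BigDecimal.movePointLeft` are standard JDK calls.

```java
import java.math.BigDecimal;
import java.text.NumberFormat;
import java.util.Locale;

final class LocaleAmount {
    // Hypothetical helper: formats a fixed-width count of cents for display
    // in the given locale.
    static String format(String cents, Locale locale) {
        // movePointLeft(2) turns 28000 into 280.00 exactly, with no rounding.
        BigDecimal amount = new BigDecimal(cents.trim()).movePointLeft(2);
        NumberFormat nf = NumberFormat.getNumberInstance(locale);
        nf.setMinimumFractionDigits(2);
        nf.setMaximumFractionDigits(2);
        return nf.format(amount);
    }

    public static void main(String[] args) {
        System.out.println(format("000000028000", Locale.US));      // 280.00
        System.out.println(format("000000028000", Locale.GERMANY)); // 280,00
    }
}
```

Note that the grouping separator is also locale-dependent, which is exactly what a downstream program with fixed expectations may not want.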

2007-04-02

Knobhead:
Now that I've read all the piss poor suggestions from the audience on how to "fix" the code, there's no way that I'd ever consider hiring one of you. As a matter of fact, I've just finished writing into our HR policy that worsethanfailure.com readers are not to be offered jobs (yes, we'll ask that as a question).

Well, I'm truly sorry that our innocent attempts at producing useful code snippets proved so offensive to a sophisticated audience such as you. Perhaps you could show us how a real English-speaking coder would solve that problem? That way we could all improve our skills, perhaps to the point that, one day, we might be allowed the chance of working in a company with standards as high as yours.

2007-04-02

Helio:

String convert(String amount) {
    if (amount == null) return amount;

    String trimmed = amount.trim();
    if ("".equals(trimmed)) return amount;

    int value = Integer.parseInt(trimmed);
    int integer = value / 100;
    int fraction = value % 100;
    // Pad fractions 0..9 with a leading zero ("05", "09").
    return integer + "." + (fraction < 10 ? "0" : "") + fraction;
}

Granted, still no pinnacle of perfection, but a lot better, don't you think?

That's where slow and bloaty Java programs come from!

The original code is not perfect, but it will work 10 times faster with 1,000,000 lines of incoming data.

I'd suggest the following:

Read char by char up to length-2, truncating the leading zeros off the result. If result is zero - print "0", otherwise print result. Print ".". Print the rest of the line.

2007-04-02

Chiraz:
There is java.text.NumberFormat, which does exactly what one wants; it is locale-dependent and very easy to use. Rather than resorting to XXXX.parse methods or regular expressions, using NumberFormat is slightly better because it can handle user input from web applications, for example, where browser settings may differ slightly in their number formats.

Well then, could you show us how you would use NumberFormat to solve the problem at hand? Apart from abstracting the decimal separator, I can't see how it would help.

Note that the input data has (or at least is expected to have) no separators of any kind, so the number-formatting settings of a hypothetical browser would make little difference to the input. Perhaps they could improve the output, but what if the output is fed not to a human user but to another program with fixed number-formatting settings? I've had my share of number-formatting hell, you know.

Something that has already been said here is that good programmers pick the right tool for the job. Should I use "Wrapper.parseWrapped" methods or regular expressions? Hard-code number separators or abstract them through NumberFormat? It all depends on what the specs are. The only thing that is certain is that programming solutions are rarely, if ever, universal.

2007-04-02

Vladas:
If result is zero - print "0", otherwise print result.

Sorry. If the result is empty, then ...

2007-04-02

Vladas:
That's where slow and bloaty Java programs come from!

Actually, that's where easily optimizable Java programs come from. That snippet might not be particularly fast, but it clearly expresses its intent (or, in non-XP words, it's easy to understand). If later, when profiling the system, I find that piece of code to be a bottleneck, I can easily optimize it; if it turns out otherwise, I can leave it like that.

2007-04-02

Helio:
If later, when profiling the system, I find that piece of code to be a bottleneck, I can easily optimize it;

OK. May I ask how you would [possibly] easily optimize your chunk of code?

KenW · 2007-04-02

brazzy:
chrismcb:
I love that some people think XML is the greatest thing since the zero was invented.
It's a standardized, flexible file/data format, with parsers readily available for most languages, and relatively easy to read and debug by humans.
IMO these are such massive advantages that there need to be grave, specific reasons for me to use anything else for storing or transmitting data between different systems.

Sure, there are cases where it's the wrong tool and it can be horribly abused. But I can think of very few cases where fixed-length fields would be a better alternative.

I agree that XML has its uses when you have the ability to build both sides of an application (or can communicate during development with the builders of the other side). I worked with the developers at one of the companies we work with and used XML to transfer data from our PC-based network to their Solaris system. It works great, and it's easy to debug issues with the data; the validation against the schema catches everything.

Where XML fails, though, is when you have to deal with other people's systems or legacy systems. I transfer data via diskette from disparate systems I have no control over, that offer to export in fixed width or CSV formats. I ask for fixed, and supply the format I want the data in. They export it to that format, and I have canned code that parses it into our system. It works really well, and I very seldom get bad data. When I do, the canned code handles it and logs the error so I can figure out what happened.
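The canned parser itself isn't shown, so here is a minimal Java sketch of the same idea under an invented layout: each line must have exactly the agreed length, the amount field must be all digits, and anything else is logged and skipped rather than aborting the import. The layout, the `Row` type and the `parse` method are hypothetical; logging uses the JDK's `java.util.logging`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

final class FixedWidthImport {
    private static final Logger LOG = Logger.getLogger(FixedWidthImport.class.getName());

    // Hypothetical layout supplied to the exporting system:
    // columns 0-9 account id, 10-29 name (space padded), 30-41 amount in cents.
    static final int RECORD_LENGTH = 42;

    static final class Row {
        final String account;
        final String name;
        final long amountCents;

        Row(String account, String name, long amountCents) {
            this.account = account;
            this.name = name;
            this.amountCents = amountCents;
        }
    }

    // Parses every well-formed line; malformed lines are logged and skipped.
    static List<Row> parse(List<String> lines) {
        List<Row> rows = new ArrayList<>();
        for (int n = 0; n < lines.size(); n++) {
            String line = lines.get(n);
            if (line.length() != RECORD_LENGTH) {
                LOG.warning("line " + (n + 1) + ": expected " + RECORD_LENGTH
                        + " characters, got " + line.length());
                continue;
            }
            String amount = line.substring(30, 42);
            if (!amount.chars().allMatch(c -> c >= '0' && c <= '9')) {
                LOG.warning("line " + (n + 1) + ": non-numeric amount '" + amount + "'");
                continue;
            }
            rows.add(new Row(line.substring(0, 10).trim(),
                    line.substring(10, 30).trim(),
                    Long.parseLong(amount)));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "ACC0000001" + String.format("%-20s", "John Smith") + "000000028000",
                "TOO SHORT");
        for (Row r : parse(lines)) {
            System.out.println(r.account + " " + r.name + " " + r.amountCents);
        }
    }
}
```

Because the layout is agreed up front, validation reduces to a length check and a per-field character check.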

It's all in choosing the right tool for the job. Sometimes the right tool doesn't mean it's necessarily the tool you prefer, but the one that gets the job done instead.

2007-04-02

Vladas:
Helio:
If later, when profiling the system, I find that piece of code to be a bottleneck, I can easily optimize it;

OK. May I ask how you would [possibly] easily optimize your chunk of code?

Probably implementing just the algorithm you proposed. Just so we can compare, it would go somewhat like this:

String convert(String amount) {
    // Just one possible way to deal with this condition.
    if (amount == null) throw IllegalArgument("Amount must not be a null string");

    int index = 0, length = amount.length();

    // Just one possible way to deal with this condition.
    if (length == 0) throw IllegalArgument("Amount must not be an empty string");

    // Read char by char to length-2.
    // Truncate all zeros off the result.
    for (int n = length - 2; index < n; index++) {
        char c = amount.charAt(index);
        if (c != ' ' && c != '0') break;
    }

    StringBuilder builder = new StringBuilder();
    for (int n = length - 2; index < n; index++) builder.append(amount.charAt(index));

    // If result is empty - print "0", otherwise print result.
    if (builder.length() == 0) builder.append('0');

    // Print ".".
    builder.append('.');

    // Print the rest of the line.
    for (int n = length; index < n; index++) builder.append(amount.charAt(index));

    return builder.toString();
}

Now, what I am saying is not that this wouldn't work, nor that it wouldn't be considerably faster; what I'm questioning is whether it is worth the trouble, considering that I don't even know yet whether this function will be a performance bottleneck.

While my previous example is no programming pearl, it will do the job. It was straightforward to write, it is easy to identify its weaknesses (such as where it could use improved error handling), it's quick to explain and understand, and it avoids (or automatically handles) most of the special cases I have to pay attention to in the optimized version. If it proves not good enough, little time is lost by throwing it away and replacing it with a faster implementation; it's just that, as far as humans are concerned, it's best to favour the clearer, more concise implementation wherever performance constraints don't rule it out.

2007-04-02

Helio:
Probably implementing just the algorithm you proposed. Just so we can compare, it would go somewhat like this:

Oops! Where I've written

throw IllegalArgument...

Please read

throw new IllegalArgumentException...

I have been using jEdit / Beanshell to test my code snippets, but because I forgot to test for the error conditions, the interpreter didn't show me those errors. Just one advantage of compiled over interpreted code...

2007-04-02

C programmer:
Maybe it was a (bad) C programmer who remembered that C library functions for number conversions interpret numbers starting with zero as being octal and therefore unable to parse 0000028000 correctly.

Which is why atoi is generally discouraged in favor of strtol and friends, which happen to take a "base" parameter. If you pass 0, it'll parse a leading 0 as octal, 0x as hex, and everything else as decimal. If you pass 10, everything's decimal; 16, hex; 8, octal. Or leave it in "auto" mode...
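Java has the same split, which is worth knowing for the code in the article (this is an aside, not something posted in the thread): `Long.parseLong` defaults to radix 10, like `strtol` with base 10, while `Long.decode` inspects the prefix (`0x`, `0X`, `#`, or a leading `0`) much like base 0. The wrapper class and method names below are hypothetical.

```java
final class LeadingZeros {
    // Radix 10 by default, much like strtol(s, NULL, 10): zeros are just zeros.
    static long decimal(String s) {
        return Long.parseLong(s);
    }

    // Prefix detection, roughly like strtol's base 0: a leading 0 means octal.
    static long autoDetect(String s) {
        return Long.decode(s);
    }

    public static void main(String[] args) {
        System.out.println(decimal("0000027000"));    // 27000
        System.out.println(autoDetect("0000027000")); // 11776 (read as octal)
        try {
            autoDetect("0000028000"); // '8' is not an octal digit
        } catch (NumberFormatException e) {
            System.out.println("decode rejected 0000028000");
        }
    }
}
```

So in Java, as in C, the octal surprise only happens if you ask for prefix detection.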

2007-04-02

From TFA:
You’d think that the developers of Java would provide a simple way to convert a string, say "000000028000", into a number, say 280.00

Aaaaah, the joys of ISO 8583... amounts are expressed in cents.

In Java you can just parse it as a long and split it with / 100 and % 100, or, if you need a BigDecimal, you can just insert a period before the last two chars and use the String constructor.
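Both BigDecimal routes fit in a few lines. A minimal sketch, with hypothetical class and method names: the first follows the insert-a-period suggestion literally; the second is a swapped-in alternative that treats the digits as an unscaled integer with a scale of 2 using the `BigDecimal(BigInteger, int)` constructor.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class CentsToDecimal {
    // Insert a '.' before the last two digits and let BigDecimal(String)
    // parse the result. Assumes at least two digits, which fixed-width
    // amount fields provide.
    static BigDecimal viaString(String cents) {
        String s = cents.trim();
        int split = s.length() - 2;
        return new BigDecimal(s.substring(0, split) + "." + s.substring(split));
    }

    // Read the digits as an unscaled integer and give it scale 2, i.e.
    // value = digits / 10^2, computed exactly with no rounding.
    static BigDecimal viaScale(String cents) {
        return new BigDecimal(new BigInteger(cents.trim()), 2);
    }

    public static void main(String[] args) {
        System.out.println(viaString("000000028000")); // 280.00
        System.out.println(viaScale("000000028000"));  // 280.00
    }
}
```

When the field is known to fit in a long, `BigDecimal.valueOf(Long.parseLong(s), 2)` gives the same value as `viaScale` without the BigInteger.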

2007-04-03

strcmp:
Inserting the dot first simplifies things (and you don't lose precision by converting to anything). In Perl:
substr($_, -2, 0) = '.';
s/^0+(?=\d)//;

...which is yet another example that Perl IS the lossy compression for programming languages...
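For comparison, a Java take on the same insert-the-dot-first idea. This is a sketch with hypothetical names; the lookahead `(?=\d)` removes only zeros that are followed by another digit, so an amount below one unit keeps a single zero in front of the '.'.

```java
final class InsertDotFirst {
    // Insert the '.' before the last two digits, then strip leading zeros,
    // keeping one zero before the '.' ("0000000000.05" becomes "0.05").
    // Assumes at least two digits of input.
    static String format(String cents) {
        StringBuilder sb = new StringBuilder(cents.trim());
        sb.insert(sb.length() - 2, '.');
        return sb.toString().replaceFirst("^0+(?=\\d)", "");
    }

    public static void main(String[] args) {
        System.out.println(format("000000028000")); // 280.00
        System.out.println(format("000000000005")); // 0.05
    }
}
```

Not quite as compressed as the Perl, but the same two steps.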

2007-04-03

Taki:
Just for fun:
double result = Double.parseDouble( amount ) / 10;

BZZZZT! Wrong!

double result = Double.parseDouble( amount ) / 100;
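Even with the divisor fixed, a double is a shaky fit for amounts, as an earlier comment in the thread warns. A small illustration with hypothetical names and inputs chosen for the demonstration: `Double.toString` drops the second decimal place, and integers above 2^53 cannot all be represented exactly.

```java
final class DoublePitfalls {
    // The corrected "just for fun" approach: parse as a double, divide by 100.
    static double naive(String cents) {
        return Double.parseDouble(cents) / 100;
    }

    public static void main(String[] args) {
        // Prints "280.0", not "280.00": formatting is still needed.
        System.out.println(naive("000000028000"));

        // 2^53 + 1 has no exact double representation and parses to 2^53,
        // so one cent silently disappears.
        double d = Double.parseDouble("9007199254740993");
        System.out.println((long) d); // 9007199254740992
    }
}
```

Realistic amounts rarely reach 2^53 cents, but the display problem shows up on every call.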

2007-04-03

Old Timer:
andy brons:
The biggest WTF is fixed length data files. Who writes these things? They are the biggest pain in @ss. I just don't understand the reasoning for them.

Fixed length data files are a lot easier to parse than delimited files or (yuck!) XML.

In fact you could say they don't need to be parsed at all. And you will never get hung up on a pesky separator in the string you are trying to convert into a long, either.
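A small Java contrast makes the separator point concrete. It is only a sketch: the layout (name in columns 0-19, amount in columns 20-31) and the names are hypothetical, and a properly quoted CSV reader would also cope; the comparison is with a naive split.

```java
final class SeparatorVsFixed {
    // Naive delimited parsing: split on every comma.
    static String[] naiveCsv(String line) {
        return line.split(",");
    }

    // Fixed-width parsing: fields are found by position only, so a comma
    // inside the name cannot shift anything.
    static String fixedName(String record) {
        return record.substring(0, 20).trim();
    }

    static long fixedAmount(String record) {
        return Long.parseLong(record.substring(20, 32));
    }

    public static void main(String[] args) {
        String csv = "ACME, Inc.,000000028000";
        System.out.println(naiveCsv(csv).length); // 3 fields instead of 2

        String fixed = String.format("%-20s%s", "ACME, Inc.", "000000028000");
        System.out.println(fixedName(fixed));   // ACME, Inc.
        System.out.println(fixedAmount(fixed)); // 28000
    }
}
```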

Human readability and space efficiency sort of suck, though.

captcha: pointer (I kid you not)

2007-04-04

So are you volunteering to rewrite and debug the company's legacy 10-million-line code base? In your spare time, naturally...

2007-04-04

As a COBOL programmer who has coded output with variable-length fields and file sizes, I can assure you that the added complexity of doing this in a language that was only designed to handle fixed-length fields and files will double the development time, and scare the crap out of most programmers who may need to change or maintain it.

Since the usual use of a CSV file is to stuff it into Excel, which has no problem with fixed-length fields padded with spaces or numerics with leading zeros, the extra effort is not justified.

Not stripping out the separator characters from the data fields, or not replacing reserved characters in XML output, is just sloppy testing and design.
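Both fixes are small. A minimal Java sketch with hypothetical names: `escapeXml` replaces the five characters XML reserves (strictly only `&` and `<` must be escaped in text content, but escaping all five is a common safe choice), and `csvField` quotes a field instead of stripping its separators, doubling embedded quotes as in the RFC 4180 convention.

```java
final class Escaping {
    // Replaces the five XML-reserved characters with their entities.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Quotes a CSV field only when it contains a comma, quote or line break,
    // doubling any embedded quotes.
    static String csvField(String s) {
        if (s.indexOf(',') < 0 && s.indexOf('"') < 0
                && s.indexOf('\n') < 0 && s.indexOf('\r') < 0) {
            return s;
        }
        return '"' + s.replace("\"", "\"\"") + '"';
    }

    public static void main(String[] args) {
        System.out.println(escapeXml("a<b & \"c\"")); // a&lt;b &amp; &quot;c&quot;
        System.out.println(csvField("ACME, Inc."));   // "ACME, Inc."
    }
}
```

Quoting keeps the data intact, whereas stripping separators silently changes it; which is acceptable depends on the receiving system.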

sepi · 2007-04-04

Yes, this problem calls for many rocket scientists indeed.

String in = "000000028000";
BigDecimal result = new BigDecimal(in)
                   .divide(new BigDecimal(100)).setScale(2);

2007-04-05

sepi:
Yes, this problem calls for many rocket scientists indeed.
String in = "000000028000";
BigDecimal result = new BigDecimal(in)
                   .divide(new BigDecimal(100)).setScale(2);

This all looks nice. But how many CPU cycles does it take? I tried to explain this before, but nobody seemed to understand.

Think of zillions of records, folks.

The original (and bashed) code is nearly perfect in the real world.

2007-04-05

Vladas:
sepi:
String in = "000000028000";
BigDecimal result = new BigDecimal(in)
                   .divide(new BigDecimal(100)).setScale(2);
This all looks nice. But how many CPU cycles does it take? I tried to explain this before, but nobody seemed to understand.

Please... I believe I've made a clear point that, although performance is important, optimization is risky business; so it's best to first try the most straightforward solutions, then go back to the system bottlenecks (which you can only reliably determine through testing) and optimize only those critical areas that are holding the system back. If you find some problem with that reasoning, please feel free to point it out; but don't claim your oh so wise point to have been ignored when it just wasn't.

Vladas:
Think of zillions of records, folks.

Remember, however, that performance is just one requirement among several. If those fixed-length files contain financial or other highly sensitive data, misprocessing them may have dire consequences; so the benefit of applying a faster but more complex solution must be weighed against the risk of an incorrect implementation that isn't unearthed until the system enters production.

It may very well be that, in the real world, we'd end up having to apply a more convoluted solution than sepi's to that formatting problem. But understand the difference: we'd have to apply a more complex solution because the (performance) requirements demand it, not because we think it is "better". Because, as long as humans are involved, it just isn't.
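One cheap way to reduce that risk is to keep the straightforward version as a reference and compare the fast one against it on many generated inputs before trusting it. A minimal sketch, assuming 12-digit inputs: the reference is sepi's BigDecimal expression, while the class name, the `fast` implementation, the seed and the trial count are hypothetical.

```java
import java.math.BigDecimal;
import java.util.Random;

final class DifferentialCheck {
    // Straightforward reference: sepi's BigDecimal expression.
    static String reference(String in) {
        return new BigDecimal(in).divide(new BigDecimal(100)).setScale(2).toPlainString();
    }

    // Hand-rolled version: skip leading zeros (keeping at least one digit
    // before the '.') and insert the '.' before the last two characters.
    // Assumes at least three characters of input.
    static String fast(String in) {
        int n = in.length();
        int i = 0;
        while (i < n - 3 && in.charAt(i) == '0') {
            i++;
        }
        return in.substring(i, n - 2) + "." + in.substring(n - 2);
    }

    // Compares both versions on random 12-digit inputs with a random number
    // of leading zeros; throws on the first mismatch.
    static int check(int trials, long seed) {
        Random rnd = new Random(seed);
        for (int t = 0; t < trials; t++) {
            int zeros = rnd.nextInt(13); // 0..12 leading zeros
            StringBuilder sb = new StringBuilder(12);
            for (int k = 0; k < 12; k++) {
                sb.append(k < zeros ? '0' : (char) ('0' + rnd.nextInt(10)));
            }
            String in = sb.toString();
            String expected = reference(in);
            String actual = fast(in);
            if (!expected.equals(actual)) {
                throw new AssertionError(in + ": " + expected + " vs " + actual);
            }
        }
        return trials;
    }

    public static void main(String[] args) {
        System.out.println(check(100_000, 42L) + " inputs agree");
    }
}
```

Such a check does not prove the fast version correct, but it is a cheap way to catch the edge cases (all zeros, amounts under one unit) before production does.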

Vladas:
The original (and bashed) code is nearly perfect in the real world.

You do realize that each call to String.valueOf() or String.substring() creates a new String object, right? That's exactly what the "nearly perfect" original code does, a lot, which makes it both confusing and inefficient.

2007-04-27

RevMike:
brazzy:

I love that some people think XML is the greatest thing since the zero was invented. They fail to realize that it's just a freaking file format.
It's a standardized, flexible file/data format, with parsers readily available for most languages, and relatively easy to read and debug by humans.
IMO these are such massive advantages that there need to be grave, specific reasons for me to use anything else for storing or transmitting data between different systems.

Sure, there are cases where it's the wrong tool and it can be horribly abused. But I can think of very few cases where fixed-length fields would be a better alternative.

Don't forget that it takes 100 times as much memory and 100 times as much CPU to deal with XML as opposed to fixed width formats. If you are dealing with small payloads, XML is great. When you start dealing with larger payloads you start feeling pain very quickly. Try performing an XSLT transformation on a 600 MB XML file someday. It isn't pretty. I've been there, got the t-shirt, and never want to deal with it again.

Fixed format feeds have the singular advantage of being able to scale larger than delimited and much much much larger than XML. If you think that there is no place for fixed, you probably have not been working on domains that deal with large data problems.

errmmm... BACS tapes, anyone? Or indeed just about anything involving millions of financial transactions

2007-05-08

jtl:
float f = Float.parseFloat("123.4");

Why would I want to look in the Float or Double class, it is a String. I want the String class toDouble() method. All I see is the toString() method.

:)

This is one of Java's failures... too many objects and not enough documentation. It's almost like needing a dictionary to get the spelling of a word. How do you spell "Opossum"? I can't find it anywhere under the P's.

The Longest Way to Zero
