• Prime Mover (unregistered)
        import java.math.BigDecimal;
        import java.text.DecimalFormat;
        import java.text.NumberFormat;
        import java.util.Locale;

        String numberString = "frist";

        NumberFormat nf = NumberFormat.getInstance(Locale.GERMAN);
        if (nf instanceof DecimalFormat) {
            DecimalFormat df = (DecimalFormat) nf;
            df.setParseBigDecimal(true);
            // parse() throws java.text.ParseException if the input isn't numeric
            BigDecimal parsed = (BigDecimal) df.parse(numberString);

            System.out.println(parsed);
        }
    
  • (nodebb)

    They'll fix it, eventually. They just need 4 more years!

  • Alexander (unregistered)

    Actually, one of the worst ideas is to parse (and output anywhere except read-only UI) numbers in locale-specific format. It looks like a cool feature, but it produces a lot of unexpected errors.

  • Mobeer (unregistered)

    so 1.2.3 is a number = 1.23 ?

    The code might be ugly but the QA work looks worse.

  • Michael (unregistered)

    Oh, did Paula indeed find a new job?

  • 516052 (unregistered)

    Well she is brilliant.

  • RLB (unregistered) in reply to Alexander

    Actually, one of the worst ideas is to parse (and output anywhere except read-only UI) numbers in locale-specific format

    That's an extremely Anglo attitude. If your users are 70-year-old grannies volunteering for a French or German charity, they will enter their prices with a decimal comma, and your lack of flexibility is your fault, not theirs.

  • WTFGuy (unregistered)

    'Zactly. The problem is dealing with user input, not output.

    Especially in a world powered by JSON, etc., everything is explicitly string-ified somewhere in the transmission chain and needs to be reliably de-stringified for use as a numeric value. So you're either parsing locale-aware in the browser to get a numeric value to then inject into your JSON or whatever as an invariant-culture stringified numeric, or you're passing it to the server as-keystroked and doing the locale-aware conversion there.
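
    A minimal sketch of that round trip in Java, the language of the snippet above (the class and method names here are mine, not from the article's code): parse in the user's locale, then emit an invariant string for the wire format.

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public class RoundTrip {
    // Parse user input in the user's locale, then emit an invariant
    // string for transmission (always '.' decimal, no grouping).
    static String toInvariant(String userInput, Locale userLocale) throws ParseException {
        // getInstance returns a DecimalFormat on standard JDKs
        DecimalFormat df = (DecimalFormat) NumberFormat.getInstance(userLocale);
        df.setParseBigDecimal(true);
        BigDecimal value = (BigDecimal) df.parse(userInput);
        return value.toPlainString();
    }

    public static void main(String[] args) throws ParseException {
        // German-locale keystrokes: grouping '.', decimal ','
        System.out.println(toInvariant("1.234,56", Locale.GERMAN)); // 1234.56
    }
}
```

    The server-side variant is the same code run against the as-keystroked string, with the locale taken from the session instead of the browser.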

  • Robin (unregistered) in reply to 516052

    Don't you mean "brillant"?

    (Autocorrect has a lot to answer for though!)

  • ZPedro (unregistered)

    Ah, 2017. The far-off time of the Trump administration.

  • Brian (unregistered)

    Given that there are constants defined for DOT and COMMA (and SPACE, and DASH, and who knows what else), and that the DOT is referred to in the original comments as a "decimal separator", I have a feeling that whoever originally wrote this code had localization in mind. But as these things usually go, it was never fully implemented, and as the code rolled from "current" to "legacy" and that dev moved on to his next gig, the localization hooks were simply forgotten.

  • The Shadow Knows (unregistered)

    Legacy code often means what was written under a previous management structure - "that was written by IT Business Financials, but you're now Business Financials (IT) and company policy is that each team is only responsible for their own code as there's different practices in use"

  • Carl Witthoft (google)

    And what if the customer happened to have a name with a numeral inside it? Like, say, "X Æ A​-12" . Not that anyone would ever do that. And don't even get me started on little Bobby Tables.

  • WWYD? (unregistered)

    Yeah, ok, so the big complication is they don't know whether to resolve a DOT/SPACE/COMMA as a decimal point or not and so they're trying to infer it based on the number of digits following said character. Hence, the 2-digit limit imposed for decimals - a crux of the approach which was broken by the next developer.

    So, the question is, how would you attempt to parse a localized number if you don't know the locale? Perhaps that's the problem they were trying to solve... In truth, it would require some external mechanism, which maybe they asked for just to get shot down.

    Ex: "453.451" vs "453,451" - impossible to distinguish without more info - unless you want to be Bayesian about it and work with other numerical entries, where some of them can be determined to definitely be of one locale type (or erroneous entries, hence the Bayesian/statistical component).

    Ex:
    453.451 <- indeterminable
    97,4849 <- leaning towards Euro-style, but could be an error
    45.1 <- leaning towards US-style, but could be an error
    etc...

    Totally fruitless if there's a mix of styles and the really funny part would be the first time someone uses the software and can't verify any inputs for a while... ;)

  • jochem (unregistered) in reply to Mr. TA

    Nah, in 4 years they will go to 4 decimals...

  • I dunno LOL ¯\(°_o)/¯ (unregistered) in reply to jochem

    Fuck everything, we're doing five decimals. And we'll have an aloe strip AND a lather strip too.

  • mihi (unregistered)

    I also assume that the intention was to auto-detect locale based on the number format, while allowing both decimal and thousands separators (and maybe other groupings like 4 digits). Cases like '123,456' are indeed ambiguous, as is 12,3456 in case a country uses groups of four digits. As soon as there are two different separators present, it becomes a lot less ambiguous (take the one that appears only once; in case both appear once, take the rightmost one).

    But for solving such a problem, I would have used regular expressions for the different cases. And in case of ambiguity, use the rightmost separator as decimal separator unless it is a space.

    In case users may edit already pre-filled formatted output, make sure that every number output includes a decimal separator :)
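
    One possible reading of that disambiguation rule as a Java sketch (the class name and the single-separator fallback are my own assumptions, not the commenter's):

```java
import java.math.BigDecimal;

public class SeparatorGuess {
    // Heuristic from the comment above: if '.' and ',' both appear,
    // the one occurring exactly once is the decimal separator;
    // if both occur once, take the rightmost one.
    static BigDecimal guess(String s) {
        int dots = s.length() - s.replace(".", "").length();
        int commas = s.length() - s.replace(",", "").length();
        char decimal;
        if (dots > 0 && commas > 0) {
            if (dots == 1 && commas != 1)      decimal = '.';
            else if (commas == 1 && dots != 1) decimal = ',';
            else decimal = (s.lastIndexOf('.') > s.lastIndexOf(',')) ? '.' : ',';
        } else if (commas > 1) {
            decimal = '.';   // repeated commas can only be grouping
        } else if (dots > 1) {
            decimal = ',';   // repeated dots can only be grouping
        } else {
            // a single separator ("123,456") is genuinely ambiguous;
            // arbitrarily treat it as the decimal separator here
            decimal = (commas == 1) ? ',' : '.';
        }
        char grouping = (decimal == '.') ? ',' : '.';
        String normalized = s.replace(String.valueOf(grouping), "")
                             .replace(decimal, '.');
        return new BigDecimal(normalized);
    }
}
```

    The unambiguous cases ("1.234,56", "1,234,567.89") come out right; the single-separator case remains a guess, which is exactly the ambiguity the thread is complaining about.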

  • xtal256 (unregistered) in reply to The Shadow Knows

    In my experience, legacy code means it was written by the previous team.

    When all members of that team leave (and one time they all left on the same day!) and new members are brought in, they don't understand the code and don't want to. They decide to write it "better" and the cycle begins again.

    In the past decade, I've seen this happen about 4 times to the same product. I seem to be the only one who stays long enough to see the pattern.

  • 516052 (unregistered) in reply to mihi

    Does anyone ever use thousands separators in real life though? I mean, I guess we can't exclude them for completeness' sake, but it's been years since I've seen one live.

  • Prime Mover (unregistered)

    Two decimal places behind the d.p. is appropriate when you're parsing a string that contains a monetary value, e.g. dollars, pounds sterling or euros. Not necessarily universally, of course, which is TRWTF.

  • Moogle (unregistered) in reply to RLB

    A 70-year-old granny will quickly learn how to put in a "." rather than a "," and will grumble for 2 minutes and then Just Get On With It for the entire rest of the time that they're using that software. It's really not a big problem. But if you ever want to take exported data to a different locale and import it where "123,45" is interpreted as ("123" and "45") or as "123", then may God help anyone who has to sort out the resulting mess. Especially if nobody notices, and then exports and imports and re-interpretations happen merrily between systems for days or weeks...

    So. Either just stick to a standard when importing/exporting (feel free to display in whatever locale you want in the UI), or disallow import/export.

  • Moogle (unregistered) in reply to mihi

    You want to autodetect the correct value of 123,456 based on ... a regex, and some guesses based on what the surrounding numbers look like?

    Well in the worst case, I guess you won't be off by more than 2-4 orders of magnitude. So you go right ahead, as long as that thought doesn't bother you at all.

  • Patsy Bean (unregistered)

    I guess I deserve to be made fun of for calling 2017 code "legacy". The previous team did all leave on the same day and none of their knowledge was passed down to us. We want to understand it and we'd love to rewrite it, but neither of those goals is realistic. We spend most of our time keeping the ship from sinking.

    The intention is to take a chunk of input text our app flagged as numeric, parse it as a number, perform some derived calculations, and eventually display everything to the user for verification. The more accurate we are, the less manual labor they have to do. It's used for numbers of any precision, not just currency.

    Detecting the locale isn't required since the user has already set the text's locale. I think I put that in the submission but maybe I forgot. So this is a case where parsing based on locale probably makes sense. The locale was already available in parseNumberInString but both devs chose not to use it.

  • (nodebb) in reply to Moogle

    Excel likes to make such a merry mess when data is (reasonably) exported as CSV. Not only does it silently assume locale-formatted floats, converting all numbers into strings which are then plotted as zeros, it also supports only one character as separator, when common conventions would require at least comma and tab to be supported.

    Basically Excel tries to auto-detect the format, and does it all wrong.

    Easiest solution? Import with Libreoffice Calc, which has a proper dialog for interpreting CSV files.

  • David Mårtensson (unregistered) in reply to Prime Mover

    That depends on whether you're limited to values that can be represented by coins and notes; for some prices, especially for small things, the decimal part often uses more digits to allow for more fine-grained values.

    And having worked with customers with both comma and dot for decimal all working in the same system the best option is to use local validation, like preventing the use of the thousands separator and only allow one separator as that will force the user to write the numbers in a way you can always identify.

    If they paste, make any changes directly in the form so any errors will be visible to the user.

    Sure, some might complain, but the risk of errors is way lower.

  • (nodebb) in reply to 516052

    Does anyone ever use thousands separators in real life though?

    Oh yes. My second most common ETL breakage these days is seeing $ and/or , appear in what's supposed to be a numeric field in the XML document I'm trying to process. (The system generates values without them, but users can edit it. And apparently the XMLifier on the source side can't be bothered to strip the excess characters out.)
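
    A minimal defensive cleanup along those lines, assuming US-style fields where ',' is only grouping and '$' is noise (the class and method names are mine, and this deliberately does NOT handle locale-dependent decimals):

```java
import java.math.BigDecimal;

public class EtlClean {
    // Strip currency symbols, grouping commas, and stray whitespace
    // that users typed into a field meant to be plain numeric.
    static BigDecimal cleanNumeric(String raw) {
        String stripped = raw.replaceAll("[$,\\s]", "");
        return new BigDecimal(stripped);
    }
}
```

    Arguably this belongs on the source side's XMLifier, as the commenter says, but doing it again at ingest is cheap insurance.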

  • Ugh (unregistered) in reply to Brian

    I work with someone like this. "DOT" is a variable higher up. So is DASH. They're consts in that //variables section. There is absolutely no consideration of being "locale-aware" in this code -- this is a consideration of, "What if I want to use something other than a dash? It's better to have it in one place." You know, in case the dash symbol changes.

    This constantly came up. "Our company uses the name "Xyz" for all of our products. Store that in a variable; never, ever use a bare string. Then when we use /etc/xyz/myapp/conf.conf, store it as ETC=/etc, COMPANY_NAME=xyz, MY_APP_NAME=myapp, CONF_DIRECTORY=$ETC/$COMPANY_NAME/$MY_APP_NAME; CONF_FILE="$CONF_DIRECTORY/conf.conf""

    It pisses me and so many other people off to such a great degree. It obfuscates and complexifies everything to the point that you're literally swearing at the coworker trying to un-break the shit that they wrote. DOT being a variable (constant) is in no way related to locale, it's simply one dumbass demanding that string-deduplication be done by an incompetent developer, instead of the compiler.

Leave a comment on “Big Number One”
