• Hans (unregistered)

    At least that predecessor has the "honor" of supplying a good WTF to this site

  • WTFGuy (unregistered)

    The only nice things to say about this mess are a) the guy is consistent; brutally consistent. And b) whoever laid out their data model can both spell & CamelCase consistently.

    I give it 2 stars: Not recommended, but I've seen worse.

    Separately, I have to say I've seen some "novel" uses of stringly language features before, but iteratively using the C# string interpolation operator as a "buildulator" (great term!) is a new one on me. Whodathunkit?!?11?!
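
    The "buildulator" pattern can be sketched in Python for illustration (the original used C# string interpolation, but the cost model is the same): each step re-copies the whole accumulated string, so building n fields costs roughly O(n^2) character copies, where a join-style build is linear.

```python
# Hypothetical sketch of the "buildulator" antipattern (not the article's
# actual code): every f-string re-copies the record accumulated so far.
def build_row_buildulator(fields):
    record = ""
    for value in fields:
        record = f"{record}{value},"  # copies everything built so far
    return record

def build_row_join(fields):
    # The linear alternative: one pass, one final allocation.
    return ",".join(str(v) for v in fields) + ","
```

    Both produce the same row; only the copying behaviour differs.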

  • Tim Ward (unregistered) in reply to WTFGuy

    In fact, there is so much consistency that I was wondering whether this code was written by a machine rather than a human: they re-run the generator whenever they change the schema.

  • Alistair (unregistered)

    Modern optimisers can use constant folding to evaluate these expressions at compile time.

  • Brian (unregistered)

    And the whole design fails miserably with non-US data. Both names and addresses are best handled as free-form text.

  • (nodebb)

    OK, a WTF... but: "but I suspect this is going to be pretty much the most expensive option." -- Does it matter? What is the overall impact to observed application performance between this implementation and an "ideal" (no such thing) one? What is the performance specification relative to this observed behaviour?

  • WTFGuy (unregistered)

    @Brian the schema looks remarkably like the fields of a Contact record in MSFT Outlook. Which of course originally dates from the US-centric days of PC programming and has since been extended in various ways for internationalization, interoperability, storage in Exchange and later the MSFT cloud, etc.

    I could sure imagine a US-centric programmer or data designer simply firing up Outlook and cloning their data model with the thought that "If it's good enough for MSFT's products, it's good enough for us."

    I'm not aware of any specific agreed multi-vendor multi-national standard on contact record schemas. That's just not in my area of practice. If anyone knows more about such a thing I'd be curious to get a link to get started reading.

  • rosuav (unregistered)

    Even if this isn't an abysmal way to build a string, it is definitely an abysmal way to build a CSV file.

  • Shiwa (unregistered)

    In this situation, shouldn't the choice of concatenation method be left to the String.join(array) method (or whatever the equivalent is in your language)?
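
    For example, in Python (the C# equivalent being string.Join):

```python
# Delegating the separators to a join function avoids both trailing-comma
# and off-by-one bugs; empty fields are handled naturally.
fields = ["Doe", "John", "123 Main St", ""]
line = ",".join(fields)
```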

  • LCrawford (unregistered)

    On the other hand, the stringbuilder was fast on first try... It was also a completely unreadable mess.

    I often choose readability over maximum execution speed in cases where speed isn't critical to the application.

  • Jonathan (unregistered)

    Unless the input is sanitized, the string will happily accept commas in the concatenated values and no doubt break the file format.

    This to me appears to be a CSV file, in which case I can't recommend enough that the "CsvHelper" NuGet package be used. Not only can it automatically escape values, but it has a whole range of configurable options and will almost certainly work for whatever your use case is.

    I was once flabbergasted when a bank we were integrating with used a form of CSV files, and in all their extensive documentation they failed to mention what to do when a field contains the separator character.
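
    For comparison, Python's standard csv module does the same job CsvHelper does in .NET: it quotes a field only when the field contains the separator, a quote, or a newline, and doubles any embedded quotes.

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)  # quoting and escaping per RFC 4180
writer.writerow(["Acme, Inc.", 'He said "hi"', "plain"])
line = buf.getvalue()
# The row round-trips cleanly despite the embedded separator and quotes.
```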

  • Mark (unregistered)

    TRWTF is what happens if any of those fields contains a comma. NEVER build a CSV file (or XML or JSON or any other "structured text" format) by concatenating strings; ALWAYS use a mechanism that automatically handles the necessary quoting and escaping.

  • RLB (unregistered) in reply to rosuav

    Even if this isn't an abysmal way to build a string, it is definitely an abysmal way to build a CSV file.

    Very timely - just this morning, I adapted a function to create a complete list of all financial data in a certain system to a new use.

    The original, written by my predecessors, creates a set of HTML tables for display in the user's browser, and did so by concatenating strings. It's usually a bit slow, but not annoyingly so, except that we have this one customer who has tens of thousands of bookings a month in a specific ledger, and thus creates a table of almost a hundred thousand rows. All using string concatenation. Either the server or the browser does not like that, and crashes.

    The new one creates a CSV file for import into Excel or some analysis package, and does so line by line. Still takes ten seconds, but then, it does create a 10 MB CSV...

    (In other news, I hate Excel localisation.)

  • Kell S (unregistered)

    In the following centuries this will be recognised as one of the earliest attempts at PDD - %PATH% Driven Development.

  • Yazeran (unregistered) in reply to RLB

    So in other words, now the analyst's computer crashes due to Excel trying to open a 10 MB CSV file.... :)

    Been there and all that, which is one of the reasons I like Linux, which comes with a whole range of tools for data analysis (sed, awk, head, tail, grep, gnuplot, etc.).


  • Loren Pechtel (unregistered)
    1. I would expect the optimizer to clean this up: all those references to the previous string are going to go away. A StringBuilder will keep the garbage down, but the extra lines aren't causing garbage. Write for readability unless you're in a loop that's going to execute many, many, many times. And most optimization is a matter of choosing the right algorithm in the first place.

    2. This is going to be I/O bound anyway; even if it is a bit slow, nobody is going to notice.

    3. The else block is because somebody's expecting the lines to be there even if they're blank.
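
    Point 1 in miniature: a handful of straight-line concatenations is harmless, but the same pattern inside a loop is quadratic. A toy wrapper (hypothetical, for illustration) makes the copy cost visible by counting characters copied:

```python
# Counts characters copied by repeated concatenation vs. a single join.
class CountingStr:
    copied = 0  # class-wide tally of characters copied

    def __init__(self, s=""):
        self.s = s

    def __add__(self, other):
        CountingStr.copied += len(self.s) + len(other)
        return CountingStr(self.s + other)

def concat_in_loop(parts):
    acc = CountingStr()
    for p in parts:
        acc = acc + p  # re-copies the whole accumulator each time
    return acc.s

parts = ["x" * 10] * 1000
CountingStr.copied = 0
concat_in_loop(parts)
loop_copies = CountingStr.copied   # grows roughly as n^2
join_copies = len("".join(parts))  # grows as n
```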

  • (nodebb)

    His header is full of content, but his head doesn't seem to be.

  • (nodebb)

    I'm saddened to learn and inform you all that even the latest version of .NET is unable to optimize this way:


    And no, anti-performance atrocities like these should not be allowed. Sentence: life of writing HTML/CSS without parole.

    Addendum 2022-05-19 12:25: PS. Will post this to the .NET GitHub

  • Barf4Eva (unregistered)

    meh barf

  • (nodebb) in reply to WTFGuy

    the guy is consistent; brutally consistent

    It is a well-known trait of incompetence.

  • Yikes (unregistered) in reply to RLB

    Hmmm.... I wrote a quick script in Python to write a 100,000-line CSV averaging 105 bytes per line. Even when each row is composed entirely of randomly generated numbers which are individually converted to strings, and even when the output is written to a network drive, the script still manages to complete in under 1 second...
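
    Roughly the kind of throwaway benchmark described (a reconstruction for illustration, not the actual script; row count and column width are assumptions):

```python
# Writes ~100,000 CSV rows of randomly generated numbers, each number
# converted to a string individually, and times the whole run.
import os
import random
import tempfile
import time

def write_csv(path, rows=100_000, cols=12):
    rng = random.Random(42)
    with open(path, "w") as f:
        for _ in range(rows):
            f.write(",".join(str(rng.randint(0, 99_999_999))
                             for _ in range(cols)) + "\n")

path = os.path.join(tempfile.gettempdir(), "bench.csv")
start = time.perf_counter()
write_csv(path)
elapsed = time.perf_counter() - start  # typically well under a second locally
```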

  • MaxiTB (unregistered)

    About the first example:

    In the current version of .NET (6) the compiler will optimize interpolation literals as well. This particular example may not be optimized due to its recursive syntax-tree nature, but I don't know the details. However, the devs have stated often that this implementation was only the first iteration, so I wouldn't be surprised if the compiler unwound expressions like that in the future.

    Personally, I never understood why something like that is needed at all: just declare a POCO, annotate it fluently or via attributes, serialize it (header included) with whatever NuGet library fits best, done. There is pretty much never a need for a construct like that, especially since code generators have been maturing more and more since their introduction and integration with .NET 5.
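
    The POCO-plus-serializer idea, sketched with Python's dataclasses and csv module standing in for a C# POCO plus a CsvHelper class map (the field names here are made up for illustration):

```python
import csv
import io
from dataclasses import astuple, dataclass, fields

@dataclass
class Contact:  # the "POCO": a plain record type with no formatting logic
    last_name: str
    first_name: str
    city: str

def to_csv(records):
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([f.name for f in fields(Contact)])  # header from the type
    writer.writerows(astuple(r) for r in records)
    return buf.getvalue()
```

    The header and the escaping both fall out of the type declaration; no hand-built rows anywhere.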

  • John (unregistered)

    Maintainability is more often important than worrying about performance for something which runs rarely.

  • (nodebb)

    Re: #3, those are columns, not rows. And it could have just been a single statement appending 7 commas in a row instead of 7 statements each appending one comma.

  • (nodebb) in reply to radarbob

    I don't know that I would go so far as to call strict consistency a sign of incompetence, so much as a lack of imagination. After all, this code probably does its job just fine. And I think consistency has value in high-risk scenarios. E.g., I would expect NASA code to be brutally consistent; but NASA code would probably also be a little more thoughtful, and not just consistency for consistency's sake.

  • (nodebb)

    But yes, TRWTF is, as usual, a homebrew solution for a well-solved problem (converting a data structure into a CSV).

  • Aaron (unregistered)

    If this is C#:

        string s = getSomeString();
        s = s + "some suffix";

    then it is correct: the cost to create a StringBuilder is more expensive than the cost to create three strings.

  • bp (unregistered) in reply to Mark

    Ah, but the author defended against commas by wrapping every interpolated value in quotes

    So now the code will break on quotes instead.
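
    The standard fix (RFC 4180) is to double any embedded quote rather than just wrapping the value; a sketch in Python:

```python
import csv
import io

def naive_quote(value):
    # Wrapping alone: breaks as soon as the value itself contains a quote.
    return f'"{value}"'

def rfc4180_quote(value):
    # Double embedded quotes, then wrap: survives quotes and commas.
    return '"' + value.replace('"', '""') + '"'

field = 'He said "hi", twice'
```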

  • (nodebb)

    Looks like the String Buildinator should meet Trogdor the Burninator.

  • RLB (unregistered) in reply to Yazeran

    So in other words, now the analyst's computer crashes due to excel trying to open an 10MB csv file.... :)

    His problem - he asked for it!

    (Actually, Excel on my computer did open it. It gets a bit sluggish, true, but I can't really blame it.)

  • RLB (unregistered) in reply to Yikes

    the script still manages to complete in under 1 second...

    There may well be some network lag in my numbers. And getting all the data out of an admittedly over-complicated database (which, no, I can't change).

    10 seconds is acceptable to me, given that a. this guy's one of our largest users, and b. they're only going to need this data once or twice a year, for the yearly audit.


Leave a comment on “The String Buildulator”
