• Hans (unregistered)

    At least that predecessor has the "honor" of supplying a good WTF to this site

  • WTFGuy (unregistered)

    The only nice things to say about this mess are a) the guy is consistent; brutally consistent. And b) whoever laid out their data model can both spell & CamelCase consistently.

    I give it 2 stars: Not recommended, but I've seen worse.

    Separately, I have to say I've seen some "novel" uses of stringly language features before, but iteratively using the C# string interpolation operator as a "buildulator" (great term!) is a new one on me. Whodathunkit?!?11?!

  • Tim Ward (unregistered) in reply to WTFGuy

    In fact there is so much consistency that I was wondering whether this code was written by a machine rather than a human: they re-run the generator when they change the schema.

  • Alistair (unregistered)

    Modern optimisers can use constant folding to evaluate these expressions at compile time.
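
    For reference, a minimal sketch of what constant folding covers in C#: concatenations of compile-time constants get folded by the compiler, whereas expressions involving runtime values generally do not (the middleName variable below is purely illustrative):

        // Folded at compile time: the compiled output contains the single literal "John Smith".
        const string FullName = "John" + " " + "Smith";

        // Not a compile-time constant: middleName is only known at runtime,
        // so this concatenation happens at runtime.
        string middleName = Console.ReadLine() ?? "";
        string greeting = "Hello, " + middleName + "!";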

  • Brian (unregistered)

    And the whole design fails miserably with non-US data. Both names and addresses are best handled as free-form text.

  • (nodebb)

    OK, a WTF... but regarding "I suspect this is going to be pretty much the most expensive option": does it matter? What is the overall impact on observed application performance between this implementation and an "ideal" (no such thing) one? What is the performance specification relative to this observed behaviour?

  • WTFGuy (unregistered)

    @Brian the schema looks remarkably like the fields of a Contact record in MSFT Outlook, which of course originally dates from the US-centric days of PC programming and has since been extended in various ways for internationalization, interoperability, storage in Exchange and later the MSFT cloud, etc.

    I could sure imagine a US-centric programmer or data designer simply firing up Outlook and cloning their data model with the thought that "If it's good enough for MSFT's products, it's good enough for us."

    I'm not aware of any specific agreed multi-vendor multi-national standard on contact record schemas. That's just not in my area of practice. If anyone knows more about such a thing I'd be curious to get a link to get started reading.

  • rosuav (unregistered)

    Even if this isn't an abysmal way to build a string, it is definitely an abysmal way to build a CSV file.

  • Shiwa (unregistered)

    In this situation, shouldn't the choice of concatenation method be left to the String.join(array) method (or whatever the equivalent is in your language)?
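
    A minimal C# sketch of that approach (the field names and values here are made up for illustration):

        // Hypothetical field values.
        string firstName = "Jane", lastName = "Doe", city = "Springfield";

        // One call joins the values with the separator instead of hand-appending commas.
        string line = string.Join(",", firstName, lastName, city);

        Console.WriteLine(line); // Jane,Doe,Springfield

    Note that this still does nothing about commas or quotes inside the values themselves.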

  • LCrawford (unregistered)

    On the other hand, the StringBuilder version was fast on the first try... It was also a completely unreadable mess.

    I often choose readability over maximum execution speed in cases where speed isn't critical to the application.

  • Jonathan (unregistered)

    Unless the input is being sanitized, the string will happily accept commas in the concatenated values and no doubt break the file format.

    This appears to me to be a CSV file, in which case I can't recommend the "CsvHelper" NuGet package enough (a quick sketch follows at the end of this comment). Not only can it automatically escape values, but it has a whole range of configurable options and will almost certainly work for whatever your use case is.

    I was once flabbergasted when a bank we were integrating with used a form of CSV files and, in all their extensive documentation, failed to mention what to do when the separator character appears inside a field.
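
    For what it's worth, a rough sketch of the CsvHelper route mentioned above (the Contact type, file name and field names are made up; check the library's documentation for the current API):

        using System.Collections.Generic;
        using System.Globalization;
        using System.IO;
        using CsvHelper;

        public record Contact(string FirstName, string LastName, string City);

        public static class CsvExport
        {
            public static void Write(IEnumerable<Contact> contacts)
            {
                using var writer = new StreamWriter("contacts.csv");
                using var csv = new CsvWriter(writer, CultureInfo.InvariantCulture);

                // Writes a header row from the property names, then one row per record,
                // quoting and escaping field values as needed.
                csv.WriteRecords(contacts);
            }
        }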

  • Mark (unregistered)

    TRWTF is what happens if any of those fields contains a comma. NEVER build a CSV file (or XML or JSON or any other "structured text" format) by concatenating strings; ALWAYS use a mechanism that automatically handles the necessary quoting and escaping.
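
    For reference, the quoting such a mechanism has to handle looks roughly like this (an RFC 4180-style sketch, not a complete implementation):

        // Quote a field if it contains a separator, a quote or a line break,
        // doubling any embedded quotes.
        static string EscapeCsvField(string field)
        {
            bool needsQuoting = field.Contains(",") || field.Contains("\"")
                                || field.Contains("\n") || field.Contains("\r");
            return needsQuoting ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
        }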

  • RLB (unregistered) in reply to rosuav

    Even if this isn't an abysmal way to build a string, it is definitely an abysmal way to build a CSV file.

    Very timely - just this morning, I adapted a function that creates a complete list of all financial data in a certain system to a new use.

    The original, written by my predecessors, creates a set of HTML tables for display in the user's browser, and does so by concatenating strings. It's usually a bit slow, but not annoyingly so, except that we have this one customer who has tens of thousands of bookings a month in a specific ledger, and thus creates a table of almost a hundred thousand rows. All using string concatenation. Either the server or the browser does not like that, and crashes.

    The new one creates a CSV file for import into Excel or some analysis package, and does so line by line. Still takes ten seconds, but then, it does create a 10 MB CSV...

    (In other news, I hate Excel localisation.)

  • Kell S (unregistered)

    In the following centuries this will be recognised as one of the earliest attempts at PDD - %PATH% Driven Development.

  • Yazeran (unregistered) in reply to RLB

    So in other words, now the analyst's computer crashes due to Excel trying to open a 10 MB CSV file... :)

    Been there and all that, which is one of the reasons I like Linux, which comes with a whole range of tools for data analysis (sed, awk, head, tail, grep, gnuplot, etc.).

    Yazeran

  • Loren Pechtel (unregistered)
    1. I would expect the optimizer to clean this up; all those references to the previous string are going to go away. A StringBuilder will keep the garbage down, but the extra lines aren't causing garbage. Write for readability unless you're in a loop that's going to execute many, many, many times (see the sketch after this list), and most optimization is a matter of the right choice of algorithm in the first place.

    2. This is going to be I/O-bound anyway; even if it is a bit slow, nobody is going to notice.

    3. The else block is because somebody's expecting the lines to be there even if they're blank.
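
    Regarding point 1, a quick sketch of the trade-off (the loop count is arbitrary): repeated concatenation allocates a new string on every iteration, while a StringBuilder appends into a reusable buffer, which only starts to matter when the loop runs many times.

        using System.Text;

        // Repeated concatenation: each += allocates a brand-new string.
        string slow = "";
        for (int i = 0; i < 100_000; i++)
            slow += i + ",";

        // StringBuilder: appends into a growing buffer, producing far less garbage.
        var sb = new StringBuilder();
        for (int i = 0; i < 100_000; i++)
            sb.Append(i).Append(',');
        string fast = sb.ToString();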

  • (nodebb)

    His header is full of content, but his head doesn't seem to be.

  • (nodebb)

    I'm saddened to learn and inform you all that even the latest version of .NET is unable to optimize this way:

    SharpLab

    And no, anti-performance atrocities like these should not be allowed. Sentence: life of writing HTML/CSS without parole.

    Addendum 2022-05-19 12:25: PS. Will post this to the .NET GitHub

  • Barf4Eva (unregistered)

    meh barf

  • (nodebb) in reply to WTFGuy

    the guy is consistent; brutally consistent

    It is a well-known trait of incompetence.

  • Yikes (unregistered) in reply to RLB

    Hmmm... I wrote a quick script in Python to write a 100,000-line CSV averaging 105 bytes per line. Even when each row is composed entirely of randomly generated numbers that are individually converted to strings, and even when the output is written to a network drive, the script still manages to complete in under 1 second...

  • MaxiTB (unregistered)

    About the first example:

    In the current version of .NET (6) the compiler will optimize interpolation literals as well. This example will maybe not be optimized due to its recursive syntax tree nature, but I don't know the details - however, the devs have often stated that this implementation was only the first iteration. So I wouldn't be surprised if the compiler would unwind expressions like that in the future.

    Personally I never understood why something like that is needed at all; just declare a POCO, annotate it fluently or via attributes, serialize it (with a header) using whatever NuGet library fits best, done. There is pretty much never a need for a construct like that, especially since code generators have been maturing more and more since their introduction and integration with .NET 5.
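
    Roughly what that looks like with attributes, assuming a serializer along the lines of CsvHelper (the Contact type is made up, and the attribute names follow that library's conventions, so double-check against whatever package you pick):

        using CsvHelper.Configuration.Attributes;

        // A plain POCO; the attributes control column order and header names.
        public class Contact
        {
            [Index(0), Name("FirstName")]
            public string FirstName { get; set; } = "";

            [Index(1), Name("LastName")]
            public string LastName { get; set; } = "";

            [Index(2), Name("City")]
            public string City { get; set; } = "";
        }

        // Handed to the library's writer (e.g. CsvWriter.WriteRecords), this produces
        // a header row plus one properly escaped line per Contact.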

  • John (unregistered)

    Maintainability is more often important than worrying about performance for something which runs rarely.

  • (nodebb)

    Re: #3, those are columns, not rows. And it could have just been a single statement appending 7 commas in a row instead of 7 statements each appending one comma.

  • (nodebb) in reply to radarbob

    I don't know that I would go so far as to call strict consistency a sign of incompetence, so much as a lack of imagination. After all, this code probably does its job just fine. And I think consistency has value in high-risk scenarios. E.g., I would expect NASA code to be brutally consistent; but NASA code would probably also be a little more thoughtful, and not just consistent for consistency's sake.

  • (nodebb)

    But yes, TRWTF is, as usual, a homebrew solution for a well-solved problem (converting a data structure into a CSV).

  • Aaron (unregistered)

    If this is C#:

        string s = getSomeString();
        s = s + "some suffix";

    then it is correct: the cost to create a StringBuilder is more expensive than the cost to create three strings.

  • bp (unregistered) in reply to Mark

    Ah, but the author defended against commas by wrapping every interpolated value in quotes

    So now the code will break on quotes instead.

  • (nodebb)

    Looks like the String Buildinator should meet Trogdor the Burninator.

  • RLB (unregistered) in reply to Yazeran

    So in other words, now the analyst's computer crashes due to excel trying to open an 10MB csv file.... :)

    His problem - he asked for it!

    (Actually, Excel on my computer did open it. It gets a bit sluggish, true, but I can't really blame it.)

  • RLB (unregistered) in reply to Yikes

    the script still manages to complete in under 1 second...

    There may well be some network lag in my numbers. And getting all the data out of an admittedly over-complicated database (which, no, I can't change).

    10 seconds is acceptable to me, given that a. this guy's one of our largest users, and b. they're only going to need this data once or twice a year, for the yearly audit.

