The Daily WTF: Curious Perversions in Information Technology

2021-10-14 Reply Admin

If (frist) { frist = false; } else { str += delimiter; }

Mr. TA · 2021-10-14 Reply Admin

I actually have my own version of this method which is an extension on IEnumerable<T> and it takes Action<StringBuilder, T>. Why? First, prior to .NET Framework 4, string.Join only took arrays, so you had to convert whatever you had to a string array, which entails N+1 allocations. Now there's an overload which takes IEnumerable<T>, but it calls ToString on each item, which still means N allocations. Whereas my append action can call sb.Append on each property of the item.

Mr. TA · 2021-10-14 Reply Admin

Basically, instead of

string.Join(", ", persons.Select(p=>p.FirstName + " " + p.LastName))

I do

persons.ToDelimitedString(", ", (sb, p)=>{sb.Append(p.FirstName); sb.Append(' '); sb.Append(p.LastName); })

Notice how this involves no per-item string allocations whatsoever.
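The method itself was not posted in the thread, so the following is only a minimal sketch of what such an extension might look like. The class name, parameter names, and the first-item flag are assumptions made for illustration, not the commenter's actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class DelimitedStringExtensions
{
    // Hypothetical reconstruction: the real ToDelimitedString was not shown.
    // The caller's append action writes each item straight into the builder,
    // so no intermediate string is created per item.
    public static string ToDelimitedString<T>(
        this IEnumerable<T> items,
        string delimiter,
        Action<StringBuilder, T> append)
    {
        var sb = new StringBuilder();
        var first = true;
        foreach (var item in items)
        {
            if (first)
            {
                first = false;          // no delimiter before the first item
            }
            else
            {
                sb.Append(delimiter);   // delimiter before every later item
            }
            append(sb, item);
        }
        return sb.ToString();           // the one string this sketch allocates
    }
}
```

Even in this sketch the StringBuilder and the final string are still allocated; what disappears is the temporary string per element.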

chucker23n · 2021-10-14 Reply Admin

I like the part where they did abstract "what if we want a different delimiter later?", but did not abstract "wait, what if delimiting isn't about tags at all? We can delimit all kinds of things, man!"

2021-10-14 Reply Admin

Note that if the Tags array starts with empty strings, those entries will not be delimited.
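To illustrate the point, here is a small sketch that assumes the article's code decides whether to emit a delimiter by checking the builder's length. That shape is an assumption, and JoinWithLengthCheck is a hypothetical name.

```csharp
using System.Text;

public static class TagJoinDemo
{
    // Assumed shape of the approach under discussion: a delimiter is written
    // only when the builder already contains something.
    public static string JoinWithLengthCheck(string[] tags, string delimiter)
    {
        var sb = new StringBuilder();
        foreach (var tag in tags)
        {
            if (sb.Length > 0)
            {
                sb.Append(delimiter);
            }
            sb.Append(tag);
        }
        return sb.ToString();
    }
}
```

With the input { "", "", "a", "b" } and a comma delimiter, this sketch returns "a,b", whereas string.Join returns ",,a,b". Empty entries after the first non-empty one are delimited the same way by both.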

2021-10-14 Reply Admin

I am guilty of this, too. I learned about string.Join() several years after I discovered string.Split(), just because the name was too abstract for my brain. I might have discovered it earlier if it was named string.Concat() or string.ToCsv() - which is the name of the extension method I wrote at the time. And there we are back at one of the hardest problems in programming: naming stuff.

2021-10-14 Reply Admin

Mark me in the guilty camp as well. Took me a couple years to find string.Join. My reimplementation looked a lot like this one. Fortunately I had written my own in a library, so it was easy to make the switch.

Jeremy Pereira · 2021-10-14 Reply Admin

I've used almost exactly that approach in Java and Swift (although I wouldn't have bothered with the string interpolation part). Apart from ignoring any equivalent standard library function, I don't see anything particularly wrong with it.

Of course, in Swift, calculating the length of a string is an O(n) operation because it attempts to be Unicode safe, so it's better to use isEmpty.

2021-10-14 Reply Admin

Me thrid. When I discovered it, I realized join() was handy for inserting directory path separators.

Naturally, every(), some(), map(), reduce(), and so on soon followed.

Mr. TA · 2021-10-14 Reply Admin

That's actually the pattern I've been using and I just realized something - it would be faster to use IEnumerator<T> directly, to avoid evaluating the if on every iteration:

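// IEnumerator<T> is IDisposable, so dispose it when done.
using var enumerator = list.GetEnumerator();
if (enumerator.MoveNext())
{
  // First item: nothing in front of it.
  append(sb, enumerator.Current);
  while (enumerator.MoveNext())
  {
    // Every later item is preceded by the delimiter.
    sb.Append(delimiter);
    append(sb, enumerator.Current);
  }
}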

2021-10-14 Reply Admin

I don't know anything about your use case, but if performance wasn't an issue, I think I prefer the version with the persons.Select method, just because it is cleaner and easier to read.

Mr. TA · 2021-10-14 Reply Admin

Yeah that's the downside of LINQ - it makes things look "cleaner and easier to read" often at the expense of performance. Allocations are NOT free, even in memory-managed (GCed) frameworks, because these objects still have to be collected, recycled, whatever. My version is a bit more verbose, yes, but I think it's still easy to understand, and crucially, doesn't have a performance hit.

2021-10-14 Reply Admin

"leading or training delimiters" So the latter are training to overtake the former? ;-)

2021-10-14 Reply Admin

Premature optimization is the root of all evil.

Ephemeral GC is very old technology now, so collecting short-lived temporary objects should be reasonably efficient. So unless you're joining very large collections, doing this in critical inner loops, or writing a library that could be used in such situations, don't sweat these details.

Mr. TA · 2021-10-14 Reply Admin

I couldn't disagree with you more. This phrase - "Premature optimization is the root of all evil." - very much sounds to me like an excuse to write poor quality code (no offense intended or implied). Coding is about patterns, and patterns are about consistency. It doesn't matter if you have a collection of 10, 1000, or 1000000 items, using the correct pattern at all times ensures optimal performance.

Now the more important question is, are these Join()-like methods necessary at all? I think whenever you can write straight to a TextWriter (StreamWriter, HtmlWriter/HttpWriter, etc.), you should do so. I have extension methods for this, too - AppendDelimited<T>(this TextWriter/StringBuilder This, IEnumerable<T> list, Action<TextWriter, T> append).

Notice how we improved our code: string.Join(string[]) = N+2 allocations; string.Join(IEnumerable<>) = N+1 allocations; my IEnumerable<T>.ToDelimitedString() = 1 allocation; my TextWriter.AppendDelimited = 0 allocations.

Before you know it, you're shipping code which takes less memory and users don't need 8-core ThreadDestroyer CPUs with 128GB of RAM to start Windows.
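The AppendDelimited extension mentioned in this comment was not shown, so what follows is a hedged sketch of the TextWriter variant only. The explicit delimiter parameter and the class name are assumptions; the comment's signature does not list a delimiter.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class TextWriterDelimitedExtensions
{
    // Hypothetical sketch: writes the items directly to the writer, so no
    // joined string is ever built in memory by this method.
    public static void AppendDelimited<T>(
        this TextWriter writer,
        IEnumerable<T> items,
        string delimiter,               // assumed parameter, not in the comment
        Action<TextWriter, T> append)
    {
        using var enumerator = items.GetEnumerator();
        if (!enumerator.MoveNext())
        {
            return;                     // empty sequence: write nothing
        }
        append(writer, enumerator.Current);
        while (enumerator.MoveNext())
        {
            writer.Write(delimiter);
            append(writer, enumerator.Current);
        }
    }
}
```

Any TextWriter works as the target, for example a StringWriter in tests or a StreamWriter over a response stream.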

2021-10-14 Reply Admin

I agree.

I agree because premature optimization really is a bad idea.

And I agree because, unless you have a working understanding of the way your GC operates, you have no business even thinking about it.

I'm no expert, but fwiw I view the generational .NET GC as an "allocation free" system. It isn't quite free, but absent pathological cases, it's close enough. The key observation with generational GCs is that, in practise, 90% of objects can just be ignored on "de-allocation." It's only the other 10% that need to be moved to the next generation, which can be done via a bit-blat and a reallocation of pointers. Now, maintenance and disposal of unmanaged objects ... that's a different question.

In re the OP, I do particularly admire the way that the "author" is trying to be oh-so-efficient by using StringBuilder ... and then trashes whatever efficiency there is by using string formatting. Which is where we come back to premature optimization ...

2021-10-14 Reply Admin

"Easy Reader Version: And then Python goes and makes join an instance method, and it makes sense, but still feels alien"

As does JavaScript. And for extra fun, Python makes it a method on the delimiter string, while JavaScript makes it a method on the array. So if you use both languages you have to remember to ' '.join(array) in Python but array.join(' ') in JS.

2021-10-14 Reply Admin

C# 10 has a proposed feature for interpolated string handlers, which could, in theory, allow you to write

persons.ToDelimitedString(",", (w, p) => w.Append($"{p.FirstName} {p.LastName}"));

without incurring the cost of allocating the temporary interpolated string. Much better than a series of Appends.
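As a rough illustration of that idea: on .NET 6 and later with C# 10, StringBuilder.Append has an overload that takes a StringBuilder.AppendInterpolatedStringHandler, and an interpolated string argument binds to it. The class and method names below are made up for the example.

```csharp
using System.Text;

public static class InterpolationDemo
{
    // Illustrative helper, not from the thread.
    public static string FullName(string firstName, string lastName)
    {
        var sb = new StringBuilder();
        // With C# 10 on .NET 6+, this call binds to the interpolated string
        // handler overload, which appends each part directly to sb.
        // On older targets the same line compiles but builds a temporary
        // string first and then appends it.
        sb.Append($"{firstName} {lastName}");
        return sb.ToString();
    }
}
```

The result is the same either way; only the intermediate allocation differs.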

2021-10-14 Reply Admin

"As does JavaScript. And for extra fun, Python makes it a method on the delimiter string, while JavaScript makes it a method on the array. So if you use both languages you have to remember to ' '.join(array) in Python but array.join(' ') in JS."

which somehow strangely makes it one place where the decision in JavaScript makes more sense than the decision in Python

After all, the reason why most people find string.split early on, but take a while to discover string.join, is likely this: when you have a string and need to break it up, you start by looking at what methods are available on your string, find string.split and move on. However, when you have an array/list/iterable/whatever and need to join the pieces together into a string, you start by looking at what methods are available on your array/list/iterable/whatever, see nothing (unless it's JS, or another similar language which does have the join method right there, in which case you're done!), assume it doesn't exist and then start writing your own, only to find some time later that it does exist, just not in the first place you thought to look.

2021-10-14 Reply Admin

A nice highlight in the Pythonic world is how easy it is to add quotes around strings (escaping those inside), while leaving numbers alone, while at the same time joining them all with commas.

2021-10-14 Reply Admin

Why does everybody understand this wrong? Original quote: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%"

For me, Mr. TA's solution (enumeration) is not a small efficiency. It speeds up string concatenation greatly. It also isn't more complicated than the original solution, and it takes no more time to write. Furthermore, it is a utility function that is most likely used in many places in the program, including time-critical parts.

The only case I can think of where changing the original solution to Mr. TA's solution is really an "evil premature optimization" is when it is guaranteed that this function is only used to, for example, concat a person's first, middle and last name, and never to concat thousands of small string parts. But then I would name it "concatPersonName" to make sure that it is not misused in time-critical functions. Only then can we say it is evil when somebody optimizes it, because then the performance gain is only small.

2021-10-15 Reply Admin

I noticed that too. Maybe the method is brilliant after all: no delimiters at the beginning of the result if the tags in front are empty. Dunno how the .join works, too lazy to look it up.

2021-10-15 Reply Admin

Python can join anything that is iterable, and that means more than just instances of some Iterable base class.

Mr. TA · 2021-10-15 Reply Admin

Regardless of the exact internals of GC, creating new strings has a cost. It's a small cost, no doubt, but it's not 0. Following the approach of never allocating objects when it's not needed will always yield higher performance, whether the gain is big or small.

WRT this code or any other snippet being of low quality because of poorly thought out optimization, that doesn't mean optimization as a concept is flawed. Bad programmers do optimization wrong, but then they also do database queries wrong. We don't say "database access is the root of all evil" just because some database code is bad, do we?

In my example, the only thing to sacrifice to improve performance is a minuscule change in code style. That's probably the lowest price ever to pay for performance gain. You don't sell your kidney to pay for a new Hermes tie.

2021-10-15 Reply Admin

Spoiler, string.Concat exists too, but it does something different :-)
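For anyone wondering what "something different" means, a tiny sketch (the class and method names are just for illustration): string.Concat glues the pieces together with nothing in between, while string.Join puts the separator between them.

```csharp
public static class ConcatVersusJoin
{
    // string.Concat: no separator at all.
    public static string Concatenated(string[] parts) => string.Concat(parts);

    // string.Join: separator between consecutive parts.
    public static string Joined(string[] parts) => string.Join(", ", parts);
}
```

For the input { "a", "b", "c" } the first returns "abc" and the second returns "a, b, c".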

Lyle Seaman · 2021-10-15 Reply Admin

I agree in principle, however you might find that the C# compiler is smarter than you think. I know that in your example of "string" + "string" + "string", the Java compiler doesn't do two allocations -- it uses an intermediate StringBuilder. I would hope the C# compiler is equally clever.

Mr. TA · 2021-10-16 Reply Admin

The concatenation like you described is definitely optimized into a call to string.Concat. Still, as part of that, a string object is allocated and populated; and inside a loop, it happens repetitively, for these strings to be appended and immediately discarded. My issue isn't the double allocation inside the iteration; it's the allocation being needed at all.

2021-10-17 Reply Admin

I didn't mean to disrespect the method Mr. TA came up with. Having it as an option allows someone to choose which method best suits their needs, and there are definitely use cases where the cost of the allocations is not negligible.

In my opinion, the extension method version is perhaps 5% less readable. Like when someone uses the wrong spelling of "there/their/they're". You can see pretty quickly what's going on, but you might need to pause for a brief moment to parse it with your brain.

I think the bigger difference is in terms of separation of concerns. With the extension method, you are using the method's StringBuilder. The extension method is then tied to using that StringBuilder and can't change it for anything else later. Not really a big deal, but I think Join is a bit more elegant, if not as performant.

2021-10-17 Reply Admin

The biggest objection to pre-optimisation is that performance is just one goal. One thing that has been learnt over the decades is that getting the design right, and creating something more manageable and maintainable, is often a much more important goal. Some people get so wrapped up in saving a few cycles that they build an unholy mess in the process.

In this case, though, the difference in readability is fairly minimal, and the gain can be worth it in some circumstances, which is why I said "if performance wasn't an issue".

2021-10-21 Reply Admin

For the .NET commentary, where are the new strings being created? Strings are reference types, even if they often act like value types, so the overhead of turning an enumerable container of strings into an array is entirely in the array and ought to just create additional references to the existing strings.
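A quick way to check the reference-copy claim, as a sketch with made-up names: copy a List<string> to an array and compare the elements by reference.

```csharp
using System.Collections.Generic;

public static class ReferenceCopyDemo
{
    // Returns true when every element of the array copy is the very same
    // string object as in the source list (no string was duplicated).
    public static bool ArraySharesStrings(List<string> source)
    {
        string[] copy = source.ToArray();   // allocates the array only
        for (var i = 0; i < source.Count; i++)
        {
            if (!object.ReferenceEquals(source[i], copy[i]))
            {
                return false;
            }
        }
        return true;
    }
}
```

This only covers the case where the items already are strings. When the joined text is computed per item, as in the FirstName + " " + LastName projection earlier in the thread, each of those results is a new string.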

There isn't a cute parable on it, but the key with optimization is measure, measure, measure, because what's actually slow often isn't the same as what you expected to be slow.
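One possible way to start measuring allocations, offered as a rough sketch rather than a proper benchmark: GC.GetAllocatedBytesForCurrentThread reports how many bytes the current thread has allocated so far, so the difference around a call gives a first estimate. AllocationProbe is a hypothetical helper name.

```csharp
using System;

public static class AllocationProbe
{
    // Returns roughly how many bytes the current thread allocated
    // while running the given action.
    public static long AllocatedBytes(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

Wrapping the string.Join call and the StringBuilder-based call in this probe would give numbers to compare. For anything beyond a first look, a benchmarking harness that handles warm-up and repetition is the safer choice.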

Joining the Rest of Us
