- Feature Articles
- CodeSOD
- Error'd
- Forums
-
Other Articles
- Random Article
- Other Series
- Alex's Soapbox
- Announcements
- Best of…
- Best of Email
- Best of the Sidebar
- Bring Your Own Code
- Coded Smorgasbord
- Mandatory Fun Day
- Off Topic
- Representative Line
- News Roundup
- Editor's Soapbox
- Software on the Rocks
- Souvenir Potpourri
- Sponsor Post
- Tales from the Interview
- The Daily WTF: Live
- Virtudyne
Admin
If the input contains an =, we set blAction to true. mh, no: If the input contains an =, we set blAction to false.
So if the input contains an =, we return everything after the first = (with spaces and further = removed). Otherwise we return the input with spaces removed.
Admin
Isnt it the other way round? The first check sets blAction to false. So the foreach loop actually skips all chars until after the first '='
It returns an empty string if there is no '=' but returns everything after the frist '=' if there is any '='
Admin
Before anyone thinks this is an optimization; this line here:
foreach (char t in value)
generates an Enumerator for string; so it calls string.GetEnumerator() which in return uses a combination of MoveNext() and Current to iterate the characters.If you want to go for performance, always use the for statement instead.
I'm not commenting the rest of the code, because it's weird with the space trim.
Admin
if the value contains multiple spaces, I think this function munches the value by removing all those spaces, not just trimming it.
Admin
It doesn't, in fact, do that. It returns all non-spaces that meet the criteria you described.
So if the string is
"tokenname=Some Text With Spaces"
, it returns"SomeTextWithSpaces"
.Admin
Yeah, I think the acceptance criteria where:
if(startIndex < 0) startIndex = 0;
).Somehow the first requirement was turned upside down in the original implementation.
Admin
Everything after the first "=", except that all spaces are removed.
Admin
Also, the hilariously named variable blAction
Admin
It's Voodoo code ;-)
Admin
Maybe the whole purpose of that function has always been to serve as a constructor for empty strings the entire time.
Admin
What you said about enumerating/looping over a string is false.
For immutable objects, the compiler is smart enough to avoid using enumerators, and will instead rewrite it as a for loop for you (no pun intended): https://sharplab.io/#v2:D4AQTAjAsAUCDMACciDCAbAhgZ27A3rIsckiACyIBiA9gE4CimAxgBYAUIEADIgJYA7AA4BXAC4BKIiUIwS8xADN6AUxatE7NpjqJm/AQdGTpChVwCcWiQG5TiAL737CZJVp1OPI+KlyZ9vLKuuyCYvyIALyI3DYRADw+YgB0ADIqAgDmYqxxfADU+X5mZpahwuIA2nwAurb2TjAOQA=
In addition, pulling the Length property call out to a variable is useless. See this JIT assembly (they're identical): https://sharplab.io/#v2:C4LghgzgtgPgAgJgIwFgBQcDMACR2DCANpBOgN7rZW45wAs2AYgPYBOAFHEgAzYCWAOwAOAV2ABKStQppqc7ADM22doOD9sAXmzcA3BoA8/YWIB0AGQCmAgObAAFvr4BqZ5NnzPXAJyqTwAG0+AF1xXSkqAF8I7BisXAYWVgBRAA9gVjAAY2BLABMrWwdOHmNRCRiZTyo1bEJrO3stMrNCxvCPTyVWFVq+Zr1DOoaHJ1d3avkfP3Kg0I65aLRIoA
The compiler and runtime are smarter than you think. If you really wanted to optimize, you'd replace your iterate-by-character to use spans. Or even just a simple string.Replace.
NOTE: above code is untested.
Addendum 2024-02-27 07:59: Should use
CultureInfo.Ordinal
when getting the indexes of the spaces.Also, was I just nerd-sniped by a comment?
Addendum 2024-02-27 08:12: Yes: https://sharplab.io/#v2:C4LgTgrgdgPgAgJgIwFgBQ7EGZ0G90AERBiChx+ax1BAKgKYDOwAFAETAD2A1vVFAEMAtvQC8AZU4i69AB7ACAdQCWwABYFxABwEB
Admin
As far as I can see, this will return everything after the first "=" with spaces removed, or if no "=" is present, the whole parameter with spaces removed. So it does something quite different from split(). It could have been done better, surely.
Admin
https://sharplab.io/#v2:C4LgTgrgdgPgAgJgIwFgBQ7EGZ0G90AERBiChx+ax1BAKgKYDOwAFAETAD2A1vVFAEMAtvQC8AZU4i69AB7ACAdQCWwABYFxABwEBjJmwCUAbnJEAvujMksJJADYSAFhnMWcJAAYCyqFojAhtaUNMQeAJwsvv6BplShdpEAJGwEALQAfAS4AGLKADbA9GAAgmAA5lF+AYbmRnHUlhjxRHC2Ho4e3nmFxWWVXQQAbgL5EPRBLdnW1L4K9ACOEKOMAJJQACZyBKLDo+MAdOtbsgDyAGYsAOSiVwA0msBgvuUAwlI6z4ycUAenYBtfKMTDNiMpzgQWItlvk1pttgAeAieSYJGhwADsBDYbDioKIujUAjAAG0ALoETgBGI7AhQegAdwIhOJJJGY3oBwAMnxyup0gRoStjnIyQ0EnNKQERbJaZ5xaEAEr0AQbU5QfIAT20AigCJZYCy0QCtPZhxKjB1UChS2F8NlAGoCEgQVNZlAFIwdPoZbTjcAjvaLtcCFdXWiGWoCvRIV69PRfRldij8dM3RLqsASQcDnGffayQd3lpNbROCwqcAYgcLVaK9L7YZw2iwZm/ZmSSw8wn7QQnS6c2LU9RK76HbtuzKFS3J73dv7Aydg1dQ83Qk00QB6Td0zgEIScMAx7uMYwEVUbAhH5iphfF0vlyvV2s6a2jxuu28eghcYCjHlQHyGi7O+Jx9j4mbcry6jTtQmJ0oydiePWVYBDWlqviwngPL+/7QWoTbTk0TRAA===
Admin
You are correct for System.String but every other type will end up using an Enumerator in an exception scope. Hence you should not aim for exceptions, but use the same pattern implementation that offers the best result for all cases. You can try it yourself, just use a List<char> instead ;-)
I didn't use spans on purpose; first of for .net framework you would need an additional package and it doesn't work as smoothly as it does with .net 5+. Plus honestly I haven't looked up/benchmarked CopyTo in comparison to StringBuilder; I doubt it's worth the loss of readablity, but maybe it's actually ending up more efficient. Worst case obviously would be CopyTo generating additional heap objects, negating the benefits. Do you have benchmark.net results?
When it comes to CultureInfo, using CultureInfo.InvariantCulture for ASCII characters ('=', ' ') is good enough; that's the default for IndexOf, no need to make the code more verbose in this case. To my knowledge there is also no performance advantage, but again, haven't made a benchmark yet ;-)
Admin
Interesting cases where there is no parameter value.
Input: 'MyArg'; Output: 'MyArg' Input: ' MyArg'; Output: 'MyArg' Input: 'My Arg'; Output: 'MyArg' Input: 'MyArg '; Output: 'MyArg' Input: 'MyArg='; Output: '' Input: 'MyArg=MyValue'; Output: 'MyValue' Input: ' MyArg=MyValue'; Output: 'MyValue' Input: 'MyArg =MyValue'; Output: 'MyValue' Input: 'MyArg= MyValue'; Output: 'MyValue' Input: 'MyArg=MyValue '; Output: 'MyValue' Input: 'My Arg =MyValue'; Output: 'MyValue' Input: 'MyArg=My Value'; Output: 'MyValue' Input: 'MyArg= My Value '; Output: 'MyValue'
Admin
Simple version: int pos = value.LastIndexOf('='); return (pos < 0 ? value : value[(pos+1)..]).Replace(" ", "");
Admin
The repetitive appending of a character to the
tmpValue
variable is unlikely to be optimized away and does create N instances for GC to collect, where N is the length of the result string.Admin
Admin
I think this is a rare case where a regex is more readable and expresses more clearly (cf. comments above ;) what the code does (whether that's what it's meant to do is another question, of course):
Admin
I follow up from my pervious post (which is currently locked by moderation), and as expected your code suffers from a scaling issue do to heal allocations made by CopyTo:
The arguments for the benchmark were:
So I think it's pretty clear, that in this case spans make not much sense.
Source: http://www.maxisoft.org/thedailywtf/maxisoft.the-daily-wtf.zip
Admin
Surely you know we have to exercise the GC otherwise it will become slow and unresponsive..... :-)
Admin
In addition to spaces, this will also strip out "="s.
The real wtf is the function name that doesn't seem to come close to describing its behavior, but how do you name a function that does so many things?
GetValueFromKeyValuePairOrJustValueIfNotPairWithSpacesAndEqualsSignsRemoved
Admin
OK OK, fastest version:
Addendum 2024-02-27 15:16: @CrashCodes just noticed your comment, my latest version neglects to check for equal sign, but I hope @MaxiTB runs it as is to compare against the other versions, including the regex one, to finalize the "competition" :)
Admin
GetValueFromKeyValuePairWithSpacesAndEqualsSignsRemovedOrJustEmptyStringIfNotPair
Admin
Someone is getting desperate and is using goto; here ya go:
´´´ | Method | value | Mean | Error | StdDev | Median | |-------------- |--------------------- |----------:|---------:|---------:|----------:| | MaxiTb | Key=Value | 31.94 ns | 0.196 ns | 0.164 ns | 31.88 ns | | Colejohnson66 | Key=Value | 25.50 ns | 0.590 ns | 1.447 ns | 24.65 ns | | MrTa | Key=Value | 34.94 ns | 0.127 ns | 0.106 ns | 34.94 ns | | MrTa2 | Key=Value | 30.87 ns | 0.099 ns | 0.077 ns | 30.87 ns | | MaxiTb | Scali(...)one [51] | 89.16 ns | 0.485 ns | 0.454 ns | 89.05 ns | | Colejohnson66 | Scali(...)one [51] | 199.82 ns | 1.105 ns | 0.922 ns | 199.61 ns | | MrTa | Scali(...)one [51] | 102.90 ns | 0.232 ns | 0.181 ns | 102.89 ns | | MrTa2 | Scali(...)one [51] | 86.67 ns | 0.231 ns | 0.216 ns | 86.64 ns | | MaxiTb | token(...)paces [31] | 64.62 ns | 0.148 ns | 0.123 ns | 64.59 ns | | Colejohnson66 | token(...)paces [31] | 56.63 ns | 0.361 ns | 0.320 ns | 56.53 ns | | MrTa | token(...)paces [31] | 95.36 ns | 0.429 ns | 0.380 ns | 95.32 ns | | MrTa2 | token(...)paces [31] | 66.87 ns | 0.505 ns | 0.472 ns | 66.71 ns | ´´´
Considering that I eye-balled the first version, should I give it another go too? :-)
Admin
Sorry, mess up the formatting and sadly here's no way to get rid of previous posts even when logged on, so here it is again:
Admin
Before I head to bed, I cooked up a personal V2 (balancing for bigger sub string values):
Now I would prefer to post the results here, but the character limits doesn't allow it (even without everything else).
Source: http://www.maxisoft.org/thedailywtf/maxisoft.the-daily-wtf.zip Results: http://www.maxisoft.org/thedailywtf/maxisoft.the-daily-wtf.txt
I also included benchmarks for the fixed original version which performs obviously horrible. Additionally I tried memory profiling but beside the original version all other version where pretty much in the same ball yard.
@colejohnson66's code as he mentioned has a bug, sadly his last link doesn't work for me, otherwise I would have included it as his V2. It should do better for longer strings, similar to my V2 but they way he implemented it will really tank in performance with a lot spaces present (as can be seen in the new test results).
Admin
For those interested how all the algorithms did, here the TOP3 for all benchmark cases:
"NoValue":
"Empty=":
"Key=Value":
"tokenname=Some Text With Spaces":
"Scaling = So Many Spaces Here In This One ":
"Super Long = This Is SuperDupaExtraLongVariant Of ThePreviousMoreSpaceFocusedScalingVersion Just To TestForWayLongerStrings !":
This shows pretty nicely that at a certain points there is no optimal algorithm, it ends up more a case of readability and which use-cases are the most common. And yeah, the original code is so far off the other implementations (which are usually very close together), it's always on the dead-last spot by far.
Admin
This isn't a split.
It returns everything after the first equals--it doesn't split again if there is another equals.
It returns the value if there's no equals.
And I'm pretty sure I understand the use case here: this is part of a configuration file processor. Some parameters are name=value and some are just name with an implicit =true.
simplemagle's answer with a trim added is the right answer.
And don't ask me why they are processing the things without a value as values rather than as names!
Admin
Yes, to me, this looks much like part of a lexer for some basic parsing.
Admin
My interpretation of this code is that it is part of a URL parser… Given a URL like
http://www.example.com/page?param1=value1¶m2=value2¶m3¶m4
, the calling code clearly found out how to get the Query Params part of the string (so now hasparam1=value1¶m2=value2¶m3¶m4
), successfully split the string on the&
character, but needed a way to get the “value” part of these four different parameters, where forparam3
andparam4
, there is no value, so they just want the name of the parameter…They don’t expect spaces (because those tend to get URL encoded as %20 anyway), and they don’t expect additional equals signs (although technically, they might one day need to parse something like
param5=SomePaddedMD5==
, at which point they’ll want to fix this again!)I’m pretty sure in my early days of web dev I came across services that expected parameters in this format - I’m not entirely confident that it’s correct according to standards though.
Admin
Jop, the code itself is properly an anti-pattern to begin with if it is used for configuration parsing, because the proper way to do this in .net is using the options pattern:
https://learn.microsoft.com/en-us/dotnet/core/extensions/options
In other words, you use the options extension to aggregate configurations across all sources (files, DB, env, cmdargs,...) and then use the options binder to bind it to POCOs. This also allows you have have observable configuration properties (for example for on-the-fly feature toggles), but personally I prefer my configuration to stay static.
Admin
The funny thing is, if you have a look at classic books on parsers and compilers, this is pretty much what you get. So it may depend on the direction, you're coming from.
Having said that, using a string index method to start with and a classic for loop instead of an iterator would have (a) sped the thing up, and (b) may be closer to any C source that may have served as an inspiration. It's really this peculiar mix that is particularly smelly, like the worse of all worlds.
Admin
PS: Doing the same with split() would afford
performing the split()
inspecting the length of the resulting array or vector
if it's 1, return the first element
if it's 2, return the second element
if the language doesn't support a limit to split() and it's more than 2, re-join()-ing the parts from 2 and up (e.g.,, unshift() and join() by "=") and return the resulting string.
Admin
Final version:
This achieves either first place or ties for first place for all arguments in @MaxiTB's test suite.
Addendum 2024-02-29 20:55: PS. It's also first in memory allocation in all tests.
Admin
Ironically we ended up with Spans in the end; the only issue of cole's implementation ended up being spaces :-)
Admin
I updated my project with your final solution, for that's it; was a fun little tinkering around but now I focus more on new stuff :-)
Source: http://www.maxisoft.org/thedailywtf/maxisoft.the-daily-wtf.zip Results: http://www.maxisoft.org/thedailywtf/maxisoft.the-daily-wtf.txt
Admin
I made it even a little bit faster by using
stackalloc
. Strangely though, the benchmarks where that code path is not hit - like no equal sign or no intermediate space after equal sign - ran a tiny bit slower with stackalloc. I think it's probably a fluke of benchmarking - the difference is so small, like 1-2 ns, it's within the margin of error. Still, when the code path is hit, it's significantly faster, and allocated memory is significantly down, as well (for obvious reasons).Admin
The only problem with
stackalloc
is what happens if you blow the stack up. A lot of programs (and programmers!) don't cope well with that at all. The key advantage of the heap is that it is comparatively huge; I've run programs with heaps larger than 1TB (yes, that was a nice server machine and a very nasty gigantic matrix).Admin
You have to be careful with stackalloc; it's not something I personally use for arrays beyond 64 bytes. Don't forget, items on the stack have also a minimum size of 4 bytes (32 bit integer), so per char you are using 4 bytes instead of 2. So if you have strings with a length of 16 characters or more, you will start noticing unwanted side effects.
Admin
Wow! This quickly deteriorated into code golf...
Though, I'd probably do indexOf("=") (too tired to check up the .net c# equivalent) and then substring(index+1)... think it works like in Java where you get the rest of the string. Performance wise? It'll be fine, I PROMISE!
Though, I'd also have to handle the fact that (if I'm not wrong) this one will strip the value of all it's spaces... (if the value even have spaces)... shurg sgih