• Rob (unregistered)

    This article needs some reformatting, because the second line only contains 2-space strings as first argument to the Replace method.

  • (nodebb)

    Here we go, another "write the fastest method challenge" 🤣🤣

  • (nodebb)

    Bah. Who cares about performance?

    while (input.Contains("  "))
        input = input.Replace("  ", " ");
    
  • (nodebb) in reply to colejohnson66

    @colejohnson66 It's O(n log(n)) since the number of spaces is halved per iteration. This might even be faster than a regex solution for most inputs.

  • Andreas (unregistered)

    Easy. Just render it as html, which will collaps all whitespace.

  • Sauron (unregistered)

    You have some text, and need to replace every sequence of spaces with a single space. E.g., My text becomes My text. Now, if you're most of us, you ignore the famous quote and reach straight for regexes.

    Wait, wait, wait, wait there. Why do they need to do that in the first place?

    It's difficult to know if there's a legitimate objective without context, but it sounds suspicious. Are they trying to display HTML documents on the server with their own insane renderer or something like that?

    Don't just rashly reach for regexes before it's clear what the actual purpose even is.

  • (nodebb)

    I think the single space in the first line was intentional. The programmer knew that

    this is C#, where Replace replaces /all/ occurences [sic]

    and said, "well, that's what I want. Replace all the spaces with a single space."

    Then when that didn't work, and under a deadline, they put together something that kind of worked, although they didn't really know why, but it met the unit tests and that means they can close this difficult ticket.

  • ZZartin (unregistered) in reply to Sauron

    Wait, wait, wait, wait there. Why do they need to do that in the first place?

    There's a lot of cases where single spaces are desired for example in a name field.

  • (nodebb)
        [Benchmark]
        [Arguments(ArgumentEmpty)]
        [Arguments(ArgumentSpaces)]
        [Arguments(ArgumentNoSpaces)]
        [Arguments(ArgumentText)]
        public string MaxiTB(string value)
        {
            var length = value.Length;
            if (length == 0) return "";
    
            var index = 0;
            if (value[0] == ' ')
            {
                do 
                { 
                    if (++index == length) return " "; 
                } 
                while (value[index] == ' ');
                --index;
            }
    
            var builder = new StringBuilder(length);
    
            while(true)
            {
                var next = value.IndexOf(' ', index + 1);
                if(next < 0)
                {
                    if(index < length - 1)
                        builder.Append(value[index..]);
                    break;
                }
    
                builder.Append(value[index..next]);
    
                index = next;
                do
                {
                    if (++index == length)
                    {
                        builder.Append(' ');
                        return builder.ToString();
                    }
                }
                while (value[index] == ' ');
                --index;
            }
    
            return builder.ToString();
        }
    
  • (nodebb)

    Considering that regarding the article this method is called a lot, proper benchmarking is in order:

    | Method        | value                | Mean          | Error     | StdDev    | Gen0   | Allocated |
    |-------------- |--------------------- |--------------:|----------:|----------:|-------:|----------:|
    | Original      |                      |    54.1995 ns | 0.2349 ns | 0.2082 ns |      - |         - |
    | Colejohnson66 |                      |     4.9047 ns | 0.0452 ns | 0.0423 ns |      - |         - |
    | MaxiTB        |                      |     0.9368 ns | 0.0180 ns | 0.0169 ns |      - |         - |
    | Original      |      (...)      [60] |   896.0425 ns | 2.6450 ns | 2.3447 ns | 0.0687 |     216 B |
    | Colejohnson66 |      (...)      [60] | 1,050.1930 ns | 2.5702 ns | 2.2785 ns | 0.0858 |     272 B |
    | MaxiTB        |      (...)      [60] |    39.7127 ns | 0.1097 ns | 0.0916 ns |      - |         - |
    | Original      |     S(...)d     [81] |   418.2939 ns | 0.6459 ns | 0.5394 ns | 0.0992 |     312 B |
    | Colejohnson66 |     S(...)d     [81] |   403.5888 ns | 1.8307 ns | 1.7125 ns | 0.0992 |     312 B |
    | MaxiTB        |     S(...)d     [81] |   205.2808 ns | 1.4423 ns | 1.3491 ns | 0.2346 |     736 B |
    | Original      | Space(...)ement [40] |    67.5371 ns | 0.3246 ns | 0.2878 ns |      - |         - |
    | Colejohnson66 | Space(...)ement [40] |     8.5752 ns | 0.0673 ns | 0.0629 ns |      - |         - |
    | MaxiTB        | Space(...)ement [40] |    39.1458 ns | 0.0941 ns | 0.0786 ns | 0.0816 |     256 B |
    

    Colejohnson66's version obviously scales very badly with spaces. Can't wait for Mr. TAs version :-)

  • (nodebb)

    Oh yeah, the arguments, because the post length is limited:

        public const string ArgumentEmpty = "";
        public const string ArgumentSpaces = "                                                            ";
        public const string ArgumentText = "    Spaced  Out   Replacement   Out  Spaced   Out   Replacement   Out  Spaced    ";
        public const string ArgumentNoSpaces = "SpacedOutReplacementSpacedOutReplacement";
    
  • (nodebb)

    Oh, I added a source generated regex plus Regex.Replace as an additional reference, which is obviously the most easiest way to do it:

    | Method        | value                | Mean          | Error     | StdDev    | Gen0   | Allocated |
    |-------------- |--------------------- |--------------:|----------:|----------:|-------:|----------:|
    | Original      |                      |    55.1746 ns | 0.1940 ns | 0.1814 ns |      - |         - |
    | Colejohnson66 |                      |     4.5678 ns | 0.0143 ns | 0.0127 ns |      - |         - |
    | RegexCompiled |                      |    57.7880 ns | 0.1129 ns | 0.0943 ns |      - |         - |
    | MaxiTB        |                      |     0.8303 ns | 0.0133 ns | 0.0118 ns |      - |         - |
    | Original      |      (...)      [60] |   894.2432 ns | 2.2619 ns | 2.0052 ns | 0.0687 |     216 B |
    | Colejohnson66 |      (...)      [60] | 1,045.7003 ns | 2.3467 ns | 2.0803 ns | 0.0858 |     272 B |
    | RegexCompiled |      (...)      [60] |   160.7820 ns | 0.3043 ns | 0.2847 ns | 0.0076 |      24 B |
    | MaxiTB        |      (...)      [60] |    41.9431 ns | 0.1955 ns | 0.1733 ns |      - |         - |
    | Original      |     S(...)d     [81] |   416.5683 ns | 0.9257 ns | 0.7730 ns | 0.0992 |     312 B |
    | Colejohnson66 |     S(...)d     [81] |   413.3419 ns | 0.9591 ns | 0.8972 ns | 0.0992 |     312 B |
    | RegexCompiled |     S(...)d     [81] |   621.1182 ns | 5.9584 ns | 5.2820 ns | 0.0477 |     152 B |
    | MaxiTB        |     S(...)d     [81] |   206.1408 ns | 1.3865 ns | 1.2970 ns | 0.2346 |     736 B |
    | Original      | Space(...)ement [40] |    67.4080 ns | 0.1713 ns | 0.1519 ns |      - |         - |
    | Colejohnson66 | Space(...)ement [40] |     7.7962 ns | 0.0827 ns | 0.0773 ns |      - |         - |
    | RegexCompiled | Space(...)ement [40] |    62.9161 ns | 0.0890 ns | 0.0743 ns |      - |         - |
    | MaxiTB        | Space(...)ement [40] |    39.0214 ns | 0.1437 ns | 0.1274 ns | 0.0816 |     256 B |
    
  • Sauron (unregistered) in reply to ZZartin

    Names are hard: https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/

    I'd say don't make too many assumptions. Someone like Elon Musk is ready to name a child with some edge cases to break your form fields to troll you. And I don't know of any international law or treaty mandating that a name can't have 2 consecutive spaces.

  • Jason Stringify (unregistered) in reply to Andreas

    .. which is precisely what has happened to the frist "My text" in the article, leading to the confusing ' "My text" becomes "My text" .'

  • (nodebb)

    Just imagine doing something like this to ... umm.... format python.

  • Standards are for losers (unregistered) in reply to cellocgw

    Won't break my python code because my code uses Tabs!

  • (nodebb) in reply to MaxiTB

    There's no fun, you already did everything I thought about doing 🤣🤣

  • (nodebb) in reply to Mr. TA

    Yeah, I admit, I was bored but there's still a lot room for improvement actually :D

  • Duke of New York (unregistered) in reply to Sauron

    Some people use their kids' names as passwords. Elon Musk used his password as his kid's name.

  • mihi (unregistered)

    Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have /[2-9]|[1-9][0-9]+/ problems.

  • (nodebb) in reply to MaxiTB

    What about starting with a search for the first instance of a double space? That would speed up the case where it doesn't occur (you just return the string or a copy thereof), and otherwise you copy over all the text up to that point into your StringBuilder (which would technically only need to be length - 1 in size).

  • Erwin (unregistered) in reply to Andreas

    ... which also happened to the code-without-markdown-backticks in the last line of the article. The double space in Replace(" ", " ") is collapsed to a single space by the browser. So to the reader it appears like the author suggests to solve the problem of longer sequences of spaces by adding code that does not change the number of spaces.

  • (nodebb)
    # Sane person
    output = re.sub(r"\s+", " ", input)
    
    # Possibly still okay
    output = " ".join(input.split())
    
    # When all you have is .replace, e.g. when using a GUI find/replace tool
    # or some niche scripting language. Possibly actually fast.
    previous = None
    output = input
    while previous != output:
        previous = output
        output = output.replace("  ", " ")
    
    # Works by chance in many cases
    output = input
    output = output.replace("    ", " ")
    output = output.replace("   ", " ")
    output = output.replace("  ", " ")
    
    # This post: Fails more commonly, also unnecessartily first .replace
    output =  input.replace(" ", " ")
    output = output.replace("  ", " ")
    output = output.replace("   ", " ")
    output = output.replace("    ", " ")
    
    # A bit more systematic
    output = input
    for numspaces in range(100, 1, -1):
        output = output.replace(" "*numspaces, " ")
    
    
    # Chaotic evil
    fscript = io.StringIO()
    for numspaces in range(100, 1, -1):
        print(f"output = output.replace('{' '*numspaces}', ' ')", file=fscript)
    data = {"output": input}
    eval(compile(fscript.getvalue(), "*", "exec"), None, data)
    output = data["output"]
    
  • A guy I knew (unregistered)

    Oh the epic fail of putting multiple whitespace in HTML. Remy, was that the WTF?

  • Sauron (unregistered) in reply to Duke of New York

    xD

  • (nodebb)

    I don't know of any international law or treaty mandating that a name can't have 2 consecutive spaces

    You couldn't even write that name out by hand so I really don't think it's a case worth worrying about.

  • (nodebb) in reply to MaxiTB

    MaxiTB, can you please explain why this isn't an infinite loop?

    while (value[index] == ' ');

  • (nodebb)

    [code] | Method | value | Mean | Error | StdDev | Median | Gen0 | Allocated | |--------- |--------------------- |------------:|----------:|----------:|------------:|-------:|----------:| | MaxiTbV1 | | 0.5587 ns | 0.0210 ns | 0.0175 ns | 0.5513 ns | - | - | | MrTaV1 | | 0.4497 ns | 0.0163 ns | 0.0145 ns | 0.4520 ns | - | - | | MaxiTbV1 | (...) [60] | 28.3136 ns | 0.6050 ns | 0.9239 ns | 27.7948 ns | - | - | | MrTaV1 | (...) [60] | 19.9140 ns | 0.1200 ns | 0.1123 ns | 19.8919 ns | - | - | | MaxiTbV1 | S(...)d [81] | 142.0714 ns | 1.6329 ns | 1.5274 ns | 141.8260 ns | 0.0880 | 736 B | | MrTaV1 | S(...)d [81] | 113.0683 ns | 1.0635 ns | 1.0445 ns | 113.2615 ns | 0.0181 | 152 B | | MaxiTbV1 | Space(...)ement [40] | 26.1228 ns | 0.3691 ns | 0.3082 ns | 26.2077 ns | 0.0306 | 256 B | | MrTaV1 | Space(...)ement [40] | 13.5999 ns | 0.1451 ns | 0.1286 ns | 13.5850 ns | - | - | [/code/

  • (nodebb)
    | Method   | value                | Mean        | Error     | StdDev    | Median      | Gen0   | Allocated |
    |--------- |--------------------- |------------:|----------:|----------:|------------:|-------:|----------:|
    | MaxiTbV1 |                      |   0.5587 ns | 0.0210 ns | 0.0175 ns |   0.5513 ns |      - |         - |
    | MrTaV1   |                      |   0.4497 ns | 0.0163 ns | 0.0145 ns |   0.4520 ns |      - |         - |
    | MaxiTbV1 |      (...)      [60] |  28.3136 ns | 0.6050 ns | 0.9239 ns |  27.7948 ns |      - |         - |
    | MrTaV1   |      (...)      [60] |  19.9140 ns | 0.1200 ns | 0.1123 ns |  19.8919 ns |      - |         - |
    | MaxiTbV1 |     S(...)d     [81] | 142.0714 ns | 1.6329 ns | 1.5274 ns | 141.8260 ns | 0.0880 |     736 B |
    | MrTaV1   |     S(...)d     [81] | 113.0683 ns | 1.0635 ns | 1.0445 ns | 113.2615 ns | 0.0181 |     152 B |
    | MaxiTbV1 | Space(...)ement [40] |  26.1228 ns | 0.3691 ns | 0.3082 ns |  26.2077 ns | 0.0306 |     256 B |
    | MrTaV1   | Space(...)ement [40] |  13.5999 ns | 0.1451 ns | 0.1286 ns |  13.5850 ns |      - |         - |
    
  • (nodebb)
    	public string MrTaV1(string value)
    	{
    		var length = value.Length;
    		if (length == 0) return "";
    
    		int firstNonSpace;
    		for (int i = 0; i < length; ++i)
    		{
    			if (value[i] != ' ')
    			{
    				firstNonSpace = i;
    				goto foundNonSpace;
    			}
    		}
    		return " ";
    
    	foundNonSpace:
    		int firstSpace;
    		if (firstNonSpace > 1)
    		{
    			firstSpace = 1;
    			goto foundSpaces;
    		}
    		bool space = false;
    		for (int i = firstNonSpace; i < length; ++i)
    		{
    			if (value[i] == ' ')
    			{
    				if (space)
    				{
    					firstSpace = i - 1;
    					goto foundSpaces;
    				}
    				else
    					space = true;
    			}
    		}
    		return value;
    
    	foundSpaces:
    		char[] buf = ArrayPool<char>.Shared.Rent(length);
    		for (int i = 0; i < firstSpace; ++i)
    		{
    			buf[i] = value[i];
    		}
    		int iDest = firstSpace;
    		space = true;
    		for (int i = firstSpace; i < length; ++i)
    		{
    			char c = value[i];
    			if (c == ' ')
    			{
    				if (!space)
    				{
    					buf[iDest++] = c;
    					space = true;
    				}
    			}
    			else
    			{
    				buf[iDest++] = c;
    				space = false;
    			}
    		}
    		string res= new string(buf, 0, iDest);
    		ArrayPool<char>.Shared.Return(buf);
    		return res;
    	}
    
  • Naomi (unregistered) in reply to Sauron

    Ever since I first read this article, I've thought that the "right" way to deal with names (if simply treating them as arbitrary identifiers, akin to usernames, isn't an option) is to have a separate full/legal name and preferred/friendly name. If these are names of your own users, you can have them enter their full name and try to guess their preferred name; if you get it wrong, they can change it right there in the form, or post-registration.

    ...and then you have systems like my day job, which integrate with a ton of other systems that express most of those falsehoods, so -sigh-.

  • ZZartin (unregistered) in reply to Sauron

    And I don't know of any international law or treaty mandating that a name can't have 2 consecutive spaces.

    It's a presentation issue, a lot of people consider title cased with single spaces "cleaner"

  • (nodebb) in reply to Mr. TA

    Wouldn't stackalloc be even faster than ArrayPool? I'm not familiar with C#, but I looked it up since I figured a language in the C family had to have something for allocating on the stack. Since the array doesn't have to persist beyond this method call, it should be perfectly safe.

  • (nodebb) in reply to Balor64

    Potentially so, but there may be a risk with a large stack array running out the stack size, this can be potentially solved by checking the size and going one of two ways, but with ArrayPool, the performance is already so good, that the complication of conditionally using stackalloc may be slower than just going to ArrayPool.

  • (nodebb) in reply to dpm

    It's not a while by itself, it's a misleadingly formatted do-while. It's not infinite because in between checking the condition in the while, it runs the do block which increments the index.

Leave a comment on “Spaced Out Replacement”

Log In or post as a guest

Replying to comment #:

« Return to Article