• (nodebb)

    And in retrospect, it makes sense

    No. Just no. It does not make sense, or at least it's a case of top-tier first-stripe POLA violation. It is surprising that null + "some string" doesn't explode with a null pointer exception, especially if it's a naked null.

    Or maybe that's too many years doing C++, where calling a member function (like string::operator +(const string &other_string)) via a null pointer is automatically UB.

    Either way, it might be useful that it works like that, but it doesn't make sense.

  • LZ79LRU (unregistered) in reply to Steve_The_Cynic

    Why not? Null means nothing. As in, not missing or unknown value or anything like that but literally nothing. An empty void without a type or a size or a value or anything.

    And nothing + anything else = that anything else.

  • (nodebb) in reply to Steve_The_Cynic

    It works because in this case the compiler converts the "+" into a call to String::Concat, not into a call to an overloaded operator. And if you write string x = null + "something"; the compiler converts it to just string x = null + "something". Of course, don't ask me to judge if it makes sense or not :D

    Addendum 2023-08-01 07:18: even better (or worse, depending on the point of view): there is no overloaded operator + or += AT ALL in the .NET "String" class, so I suppose all compilers for all supported languages always convert concatenation operators to calls to the static method String::Concat.

    Addendum 2023-08-01 07:19: p.s. when I wrote:

    the compiler converts it to just string x = null + "something""

    I meant

    the compiler converts it to just string x = "something"
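Here is a minimal sketch of the behaviour described above. It assumes C# 8 or later, because it uses nullable annotations. `+` on two strings compiles to `string.Concat`, and `string.Concat` substitutes `String.Empty` for a null argument. Member access such as `.Length` on a null string still throws. The class and method names are hypothetical.

```csharp
using System;

public static class NullConcatDemo
{
    // The compiler lowers string + string to a call to string.Concat,
    // and Concat treats any null argument as String.Empty.
    public static string ViaOperator(string? left, string right) => left + right;

    // The explicit call corresponding to the expression above.
    public static string ViaConcat(string? left, string right) => string.Concat(left, right);

    // Member access is different: reading .Length through a null
    // reference throws NullReferenceException at run time.
    public static bool LengthThrowsOnNull(string? s)
    {
        try
        {
            _ = s!.Length;
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }
}
```

Both `ViaOperator(null, "something")` and `ViaConcat(null, "something")` should yield `"something"`. `LengthThrowsOnNull(null)` should report that the exception was thrown.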

  • (nodebb)

    And if you prefer Java:

    String x = null + "y"; // becomes "nully"

    p.s. TRWTF is of course using Hungarian notation in C#, such as "string strName" (someone may say TRWTF is using Hungarian notation at all, but YMMV... it looked like a good idea 20+ years ago, and in some... places... was almost enforced...)

  • (nodebb)

    Another side effect: if strName is empty, the method returns null, and if strName is null, it outright crashes.

  • Robin (unregistered)

    I don't really speak C#, or Java - but I'm actually pretty surprised taking .length on a null value is apparently OK. Hell, even in Javascript this would blow up with a runtime error, and in Typescript it wouldn't compile (assuming the argument is typed correctly to allow null, and isn't any).

  • (nodebb) in reply to LZ79LRU

    No I'm with Steve. It's horrible. Firstly, the declaration should have been

    string newStrname = "";
    

    i.e. an empty string, which is more self-documenting. Secondly (and I'm not an expert in C#), isn't null untyped? Does this make sense?

    MyClassThatIsNotAString x = null;
    string someString = "";
    someString += x;
    

    It's all just horrible and null is definitely not a string so string concatenation should not work.

    Addendum 2023-08-01 08:03: sorry s/x/Twitter/g

  • Tim (unregistered)

    Urgh. This is the kind of code you would write in C in the '90s, before we had libraries to do that sort of thing.

    I'm with Remy on this - it's one of the rare (in this forum) cases where a regex would be the most appropriate solution, but if you're insistent on not using regex, something like this:

    while (s.Contains("  ")) { s = s.Replace("  ", " "); }

    would be simpler than the minefield of iterating through a string

  • (nodebb)

    There is so much wrong with this code, from not using StringBuilder over indexing the string multiple times to using char.Parse(" ") over a simple ' '. Whoever wrote this code has zero experience writing C# code obviously.

  • (nodebb)

    replaced it with the simpler return Regex.Replace

    And now you have two problems...

  • (nodebb) in reply to Robin

    Directly accessing a member that is null won't work in C#. The operator works because it's basically

    public static string operator +(string? value1, string? value2); // conceptual: predefined by the language, not declared on System.String
    

    so the code turns into:

    var something = default(string?);
    
    something = string.Concat(something, "string"); // what the compiler actually emits
    
  • Sauron (unregistered)

    The fact null + "some string" evaluates to "some string" is clearly saner than in JS (where it would evaluate to "nullsome string"), but it is still not great.

    null is not a string value. You can see it as a reference to the absence of a string if you want, but it is still not a string value.

    It is not the empty string. It is not the so-called "null character" (\0 in C). In fact, null is not a character at all.

    null is neither the abstract idea of an empty string (a string with zero characters) nor the C idea of an empty string (a string made of a '\0' character followed by garbage in the rest of the allocated memory).

    So don't concatenate a string with null. Assign a string value to a string-typed variable that has the value null.

    But don't concatenate a string with null.

    The next time you suggest that it is okay to concatenate a string with null, you won't need a code review, you'll outright need an exorcist.

  • Sauron (unregistered) in reply to LZ79LRU

    And nothing + anything else = that anything else.

    And what would be that + operation that we can apply to anything?

    And what would be the truth table of that = comparison operator?

    Can you geniusly add or concatenate any abstract data type?

    Are you really 100% sure the comparison would always be true without a surprise? (even with tricky values like NaN?)

    Think carefully before you answer, or we'll happily replace you with ChatGPT.

  • (nodebb) in reply to StefanoZ

    someone may say TRWTF is using Hungarian notation at all, but YMMV... it looked like a good idea 20+ years ago, and in some... places... was almost enforced...

    Well, that plus a whole bunch of bunches of people who failed to grasp what Simonyi had in mind with the original Hungarian notation, where the wart on the front of the variable name indicated "flavour", what kind of data was in the int or string or char * or whatever, rather than the implementation type. It was a way to work around the inadequacies of the language's type system, since all you have in C for strings is char *, and you can't tell from the language type whether that particular char * contains HTML-encoded, base64-encoded or raw characters, for example. At least with "applications" Hungarian, you have a chance.

    But lots of folks thought he meant "describe the language type" ("systems Hungarian"), leading to WORD wParam (in Windows message procedures, where "wParam" was literally the "word parameter") and similar sins. Curiously, though, there are still lots of AHN parameters in Windows, e.g. all those integers that are named cbSomething - a Count of Bytes.

  • (nodebb) in reply to Robin

    I don't really speak C#, or Java - but I'm actually pretty surprised taking .length on a null value is apparently OK

    It's not OK, in either C# or Java. It throws a NullReferenceException in C#, or a NullPointerException in Java.

  • Robin (unregistered) in reply to tom103

    Oh yeah, I only just realised I was getting mixed up between strName which is the input and presumably can't be null (not sure if a string in C# is nullable or not but in context I assume not, which makes me happy), and newStrName which is initialised as null and what the discussion in the article is actually about.

    Clearly I am TRWTF here - it's quite a common occurrence...

  • Jaloopa (unregistered) in reply to Robin

    string is a reference type and so nullable by default. This is a private method, so if you're feeling charitable you can assume that everywhere that calls it has already vetted strName for nullness. If you're not, then all bets are off.

    Newer versions of C# let you turn on warnings that get close to making reference types non-nullable: the compiler tracks whether an object can be null at any given point and warns you if you do anything that assumes it isn't. It's a bolted-on, late addition to the language and it shows, but it can be quite useful.
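Here is a rough sketch of that opt-in analysis, assuming C# 8 or later. `NullableDemo` and its methods are hypothetical. The comments name the warnings the compiler is expected to raise. They are warnings, not errors, unless the project treats warnings as errors.

```csharp
#nullable enable

public static class NullableDemo
{
    // string? says "may be null"; writing s.Length directly here would
    // trigger warning CS8602 (possible null dereference).
    public static int SafeLength(string? s)
    {
        return s?.Length ?? 0;
    }

    // A plain string parameter is declared non-nullable: a caller passing
    // the null literal gets warning CS8625, but nothing stops it at run time.
    public static int Length(string s) => s.Length;
}
```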

  • Dyspeptic Curmudgeon (unregistered)

    cd 'somewhere'
    for file in *; do sed -i 's/  */ /g' "$file"; done

  • Jasmine (unregistered) in reply to Sauron

    It's not an "operation that is applied to an object". Operators in C# are ALWAYS static methods, e.g. on the string class in this case, and thus CANNOT (and should not) apply a special meaning to the first parameter. You may argue that it feels weird to concatenate something with null, but that is not really what happens from a language perspective - if you rewrite the operator as a static function call it feels less weird. If my memory serves me right, language projections onto the CLR like VB.NET only ever see the methods anyway (I guess strings might be handled differently).

  • (nodebb) in reply to Jaloopa

    string is a reference type and so nullable by default.

    Yes.... but string concatenation works differently from every other operator on every other type. For performance reasons, the string concatenation operator is compiled down to calls to various overloads of Concat, and the documentation for Concat says: "String.Empty is used in place of any null argument."

    Addendum 2023-08-01 13:08: ... in addition, Concat happens to be a static method, so it will not throw a null reference exception.

  • Abigail (unregistered)

    Replacing any \s+ with a space does mean that if a string ends with a space and a newline, you keep the space and toss the newline. That may not be what is wanted, even if it's "according to the letter of the spec".
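This is easy to reproduce with the `\s+` replacement from the article. `WhitespaceRegexDemo` is a hypothetical wrapper around it. A trailing space plus newline collapses to a lone space, and a blank line between paragraphs disappears entirely.

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceRegexDemo
{
    // Collapses every run of whitespace (spaces, tabs, CR, LF, ...) to one space.
    public static string CollapseAll(string s) => Regex.Replace(s, @"\s+", " ");
}
```

`CollapseAll("end. \n")` gives `"end. "`, and `CollapseAll("a  \n\nb")` gives `"a b"`.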

  • Alex (unregistered) in reply to Joe_D

    I always love the reflexive "two problems" response to any use of regex.

    Sure, regexes can absolutely be mis-used, or be difficult to interpret, but if I've learned anything from reading the DailyWTF, it's that everything (string concatenation, in this case) gets misused and can be difficult to interpret.

  • (nodebb)

    While I agree the regex is the right solution for the job, if you're wedded to the parsing approach there's no need to go to the hassle of starting at 1. Instead, stop the loop one character early and compare each character to the next one rather than the previous (or, more simply, just check whether both characters are spaces). At the end, copy the last character outside the loop.

    Result: It performs as it should, all duplicate whitespace is removed, no extra conditional in the loop.
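Here is a sketch of that look-ahead loop in C#, assuming the task is to collapse runs of the space character only. `SpaceCollapser.CollapseSpaces` is a hypothetical name. The sketch builds the result with a `StringBuilder` rather than repeated concatenation. The null/empty guard is an addition for this sketch, not part of the original method.

```csharp
using System.Text;

public static class SpaceCollapser
{
    // Hypothetical helper: collapses runs of ' ' into a single space in one pass.
    public static string CollapseSpaces(string? input)
    {
        // Guard added for this sketch: null or empty input yields "".
        if (string.IsNullOrEmpty(input))
            return string.Empty;

        var sb = new StringBuilder(input.Length);

        // Stop one short of the end and look ahead instead of behind,
        // so index 0 needs no special case.
        for (int i = 0; i < input.Length - 1; i++)
        {
            // Skip a space only when the next character is also a space;
            // the last space of each run survives.
            if (input[i] == ' ' && input[i + 1] == ' ')
                continue;
            sb.Append(input[i]);
        }

        // The final character has no successor, so copy it unconditionally.
        sb.Append(input[input.Length - 1]);
        return sb.ToString();
    }
}
```

Each character is visited once, so this runs in linear time. The only conditional in the loop is the space check itself.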

  • Lőrinczy, Zsigmond (github)

    My legilimency suggests the original task was this: 'remove leading and trailing whitespace, change inner whitespace sequences to a single space', so String.Trim would have been a good start.
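If that guess about the spec is right, a sketch could trim first and then collapse the inner runs. `TrimAndCollapse.Normalize` is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

public static class TrimAndCollapse
{
    // Assumed spec: strip leading/trailing whitespace, then turn each inner
    // whitespace run into a single space.
    public static string Normalize(string s) => Regex.Replace(s.Trim(), @"\s+", " ");
}
```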

  • FTB (unregistered) in reply to Tim

    I'm fairly sure that "neat" solution runs in O(n^2) time where the original runs in linear time.

  • Tinkle (unregistered)

    Coercing null to an empty string is a bit of a code smell - it is hiding a bug (setting newStrname to null, instead of String.Empty.)

    I am a bit anti-regex, but in this case it is the best solution. Although I would be questioning the specification - probably \h is better than \s, as they probably do not want to replace newlines/paragraphs with spaces.
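Here is a sketch of a horizontal-only variant. As far as we can tell, .NET's regex engine has no `\h` (PCRE and Java do). The sketch therefore uses `[^\S\r\n]`, i.e. "any whitespace except CR and LF", as an approximation. It still matches form feed and vertical tab. `HorizontalCollapse` is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

public static class HorizontalCollapse
{
    // [^\S\r\n] = everything \s matches except CR and LF: spaces and tabs,
    // but also form feed and vertical tab. Line breaks are left untouched.
    public static string Collapse(string s) => Regex.Replace(s, @"[^\S\r\n]+", " ");
}
```

With this pattern, `"a \t b\n\nc  d"` becomes `"a b\n\nc d"`, so paragraph breaks survive.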

  • Anon55 (unregistered)

    Someone would argue that throwing a regular expression in doesn't make things simpler. I think \s+ is simple enough that it won't cause issues, but it's also simple enough that you could've done without it.

  • (nodebb) in reply to Steve_The_Cynic

    Not to mention masterpieces such as the "lpsz" prefix: Long (that is, you know, 32-bit) Pointer to Zero(null)-terminated String, a bombastic name for humble char*. Conventions that mysteriously changed when moving from a Hungarian prefix to its related typedef, e.g. "sz" became STR, such as in LPCTSTR (itself another masterpiece). The apex (no, not that APEX!) was MFC, where Hungarian notation infected even the class names: they had to start with C, such as CString, CFile, CPtrList (*) ... because God forbid someone mistakes them for filthy structs! :D

    (*) & (Argh! the memories! CPtrList and its sibling CObList, those dreadful WTF bringers! the horror! the horror! )

  • LZ79LRU (unregistered) in reply to Sauron

    I would like to thank you.

    I started writing a reply to your post and it got ever longer and more complicated as I had to think more and more about it. And eventually I came to the conclusion that my original opinion was wrong.

    For reference, the opinion I held was that since null = nothing the compiler should just safely optimize it away. As in 1 + 2 + null + 3 = 1 + 2 + 3. But I now see how this could lead to interesting behavior and bugs.

    It's not often that sort of thing happens to me any more these days.
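C# itself shows both behaviours side by side. In this small sketch (the class name is hypothetical), the lifted `+` on nullable ints makes the whole expression `null` instead of skipping the `null`. String concatenation, by contrast, treats `null` as an empty string.

```csharp
public static class NullPropagationDemo
{
    // Arithmetic on int? uses lifted operators: if any operand is null,
    // the whole result is null rather than the null being "optimized away".
    public static int? SumWithNull()
    {
        int? nothing = null;
        return 1 + 2 + nothing + 3;
    }

    // String concatenation goes through string.Concat, which treats null as "".
    public static string ConcatWithNull()
    {
        string? nothing = null;
        return "1" + "2" + nothing + "3";
    }
}
```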

  • Tim (unregistered) in reply to FTB

    I'm pretty sure that premature optimization is the root of all evil https://stackify.com/premature-optimization-evil/

  • Officer Johnny Holzkopf (unregistered) in reply to Hand_E_Food

    The code would benefit from at least a null check and an empty check for strName, so that null.Length and "".Length (as in "for (int i = 0; i < 0; i++)") would not show unwanted behaviour. On the other hand, the code itself is unwanted, so it should probably replace itself with one space.

  • Klimax (unregistered) in reply to StefanoZ

    And you overstated your case so much that you became WTF yourself. You want to distinguish Interfaces (I series of names) from implementation Classes (C series of names). There were pretty good reasons for this naming rule.

  • RLB (unregistered)

    Meh. null + whatever is a choice. One can argue for and against any choice. In standard SQL, NULL || 'string' evaluates to NULL, and so does CONCAT(NULL, 'string') in MySQL (though SQL Server's CONCAT treats NULL as an empty string). Is that more or less surprising than it evaluating to 'string'? I'd say less in SQL, apparently more in C#. Neither is inherently wrong.

  • ajo (unregistered) in reply to jeremypnet

    null is the absence of a value, but not really "untyped". So if you try to call anything with your "x" that expects a parameter that isn't of type "MyClassThatIsNotAString", the compiler will yell at you to fix that.

    I originally wanted to say that for the code you've suggested the compiler would yell at you, but it actually doesn't. The plus operator has a predefined overload for the parameter types string and object, which concatenates the string with object.ToString(), treating null the same as the empty string. So in your case the compiler is happy, and at run time, after the snippet, your "someString" will just be the empty string (since it's just two empty strings concatenated).

    Unless you overload the plus operator to do something different when the compiler encounters string + MyClassThatIsNotAString, e.g. by defining the following inside MyClassThatIsNotAString (a user-defined operator must be declared in one of its operand types):

    public static string operator +(string x, MyClassThatIsNotAString b) => "hi";

    in which case your someString would end up holding the value "hi". (or gloriously crash in case b is null and you try to work with it)
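Here is a self-contained sketch of both cases, assuming C# 8 or later. `MyClassThatIsNotAString` is the example type from the thread. `MyClassWithPlus` and `OperatorDemo` are hypothetical. The user-defined operator has to be declared inside one of its operand types. When it applies, it is chosen over the predefined `string + object` operator.

```csharp
public class MyClassThatIsNotAString { }

public class MyClassWithPlus
{
    // A user-defined operator must be declared inside one of its operand types.
    // The body ignores both operands, so a null right-hand side is harmless.
    public static string operator +(string x, MyClassWithPlus? b) => "hi";
}

public static class OperatorDemo
{
    public static string AppendPlain()
    {
        MyClassThatIsNotAString? x = null;
        string someString = "";
        someString += x; // predefined string + object: null acts like "", result ""
        return someString;
    }

    public static string AppendWithPlus()
    {
        MyClassWithPlus? x = null;
        string someString = "";
        someString += x; // the user-defined operator is chosen, result "hi"
        return someString;
    }
}
```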

  • (nodebb) in reply to Klimax

    You're right, it may be a bias on my part: since the "I" prefix made its way into .NET (and is still ubiquitous there today) while the "C" prefix did not, I may have been tricked into considering the former "normal" and the latter "evil".

    Anyway, I'm not sure about the other thing you say: the two prefixes may have different origins and so different rationales. At least in the Microsoft world, and as far as I can remember (please correct me if I'm wrong!), the "C" prefix originates from MFC, which never used the "I" prefix for interfaces (hmmm... maybe MFC had few or NO interfaces at all? I mean, of course, "C++ classes with all pure virtual member functions and no data", since there is no "interface" keyword in C++). Instead, the "I" prefix originates from the COM Specification and its IUnknown interface ... did COM advocate the "C" prefix for concrete objects as well, to distinguish them from interfaces? It does now (look for "Coding Style Conventions win32"... yes, unfortunately, COM is still alive), but I'm not sure it did THEN...

    On the other hand, Java was born in the Nineties, like COM and MFC, but it never felt the need for a prefix to distinguish interfaces and classes... so ... well, I don't know ... there may have been "pretty good reasons for this naming rule" as you say ... or not ?

    Sometimes I wonder if the entire programming universe is just a big WTF :D

  • Craig (unregistered)

    As an alternative to regex, instead of c == ' ', one could use char.IsWhiteSpace(c). If you want to exclude line separators from removal, you can do an additional check for char.IsControl(c).
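Here is a sketch of that character-class approach (the names are hypothetical). One caveat: `'\t'` is also a control character, so `!char.IsControl(c)` leaves tabs alone as well as CR/LF. Unicode separators that are not control characters, such as U+2028 LINE SEPARATOR, still count as collapsible.

```csharp
using System.Text;

public static class CharClassCollapse
{
    // Whitespace that is not a control character: ' ' and Unicode spaces
    // such as U+00A0, but NOT '\t', '\r' or '\n' (all three are control chars).
    static bool IsCollapsible(char c) => char.IsWhiteSpace(c) && !char.IsControl(c);

    // Replaces each run of collapsible characters with a single ' '.
    public static string Collapse(string s)
    {
        var sb = new StringBuilder(s.Length);
        bool inRun = false;
        foreach (char c in s)
        {
            if (IsCollapsible(c))
            {
                if (!inRun) sb.Append(' ');
                inRun = true;
            }
            else
            {
                sb.Append(c);
                inRun = false;
            }
        }
        return sb.ToString();
    }
}
```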

  • tbo (unregistered) in reply to Steve_The_Cynic

    My favorite part of this comment is all the way at the end, where it implies that a language that is useful doesn't make sense.

Leave a comment on “Taking up Spaces”
