The Daily WTF: Curious Perversions in Information Technology

2008-02-16 Reply Admin

fanguad:
You'll also need...:
What about non latin-based alphabets? You must define a character-supplier interface, and implement it for all character sets. Then you will need a factory to produce the right type of "supplier". Of course, it must all be configurable via xml in case they come up with a new character set, or in case we meet and start working with space aliens who don't happen to use any of the ones we use!

Don't be ridiculous. All space aliens speak English and have American accents.

Obviously you have not seen that old Star Trek episode where they encounter a parallel evolution of Nazis on an M-class planet...

JoeDelekto · 2008-02-16 Reply Admin

On a momentary tangent, it still surprises me how many people ignore (usually under the guise of "pre-mature optimization" being the root of all evil) the true cost of things.

In engineering school, the students are usually taught about not just the cost in terms of economic dollars, but in some cases, the amount of energy expended, the number of CPU cycles clicked, etc. In cases of abstraction where a lot of work for developers is now being handled by: a) the framework; and b) the compiler, the developer should still be cognizant of what is going on under the hood, since the abstraction of modern development tools merely shields people from the complexities of various algorithms as well as assists in the generation of boring and boilerplate code.

For anyone that is at all remotely interested in using .NET, Lutz Roeder's Reflector should be the #1 tool in any .NET developer's toolchest.

I created four simple C# static methods on the 'Main' object of a console application, using the various string concatenation methods that I've seen discussed here, and weighed their cost using various criteria such as instructions used, function calls made, size of code, etc. I invite anyone using Reflector to see my findings as well; make sure you disassemble to "IL" code so you can see what the compiler truly generates.

        static void Test1()
        {
            string test1 = "part1" + Environment.NewLine + "part2" + Environment.NewLine + "part3" + Environment.NewLine;
        }

        static void Test2()
        {
            string newLine = Environment.NewLine;

            string test2 = "part1" + newLine + "part2" + newLine + "part3" + newLine;
        }

        static void Test3()
        {
            string newLine = Environment.NewLine;

            string test3 = String.Concat("part1", newLine, "part2", newLine, "part3", newLine);
        }

        static void Test4()
        {
            string test4 = String.Format("part1{0}part2{0}part3{0}", Environment.NewLine);
        }

I found, after examining the generated IL of a release build, that Test2 and Test3 generated identical code. That's because using the '+' operator becomes a String.Concat() call. In that regard, using either method in Test2() or Test3() is a matter of coding standards or what might be considered most readable/understandable to the developer.

Test1(), while also very readable, is highly inefficient, since every instance of Environment.NewLine results in a function call to get the newline string for that environment.

Test4(), by the way, is very readable, looks nice, sweet and compact. However, look at it under the hood with Reflector. See that objects are created and constructed, functions called, processing done that's not visible unless one actually looks under the hood. String.Format() does indeed use a StringBuilder under the hood, because it is most efficient when dealing with a homogenous set of values that it might expect to find in a format specifier.

StringBuilder, btw, is not a stigma, it does have some very good advantages when used to build large strings or strings that are non-deterministic due to logic within the flow of a program as well as being a conduit to marshal string data back and forth between non .NET code.
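
As a minimal sketch, assuming the four methods above are changed to return their strings (the originals throw them away), one can at least confirm that all four approaches build the same result, so the choice between them is about cost and readability and not about output. The class name ConcatComparison and the method names are made up for illustration.

```csharp
using System;

public static class ConcatComparison
{
    // Test1 style: reads the Environment.NewLine property at every use.
    public static string ViaPropertyEachTime()
    {
        return "part1" + Environment.NewLine + "part2" + Environment.NewLine + "part3" + Environment.NewLine;
    }

    // Test2 style: reads the property once, then uses the '+' operator.
    public static string ViaPlus()
    {
        string newLine = Environment.NewLine;
        return "part1" + newLine + "part2" + newLine + "part3" + newLine;
    }

    // Test3 style: calls String.Concat explicitly.
    public static string ViaConcat()
    {
        string newLine = Environment.NewLine;
        return String.Concat("part1", newLine, "part2", newLine, "part3", newLine);
    }

    // Test4 style: one format string with a repeated placeholder.
    public static string ViaFormat()
    {
        return String.Format("part1{0}part2{0}part3{0}", Environment.NewLine);
    }

    public static void Main()
    {
        string expected = ViaPlus();
        // Each comparison is by value, so equal contents print True.
        Console.WriteLine(ViaPropertyEachTime() == expected);
        Console.WriteLine(ViaConcat() == expected);
        Console.WriteLine(ViaFormat() == expected);
    }
}
```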

<soapbox>

Anyhow, my point is this: Never jump to conclusions based on readability of the code alone. Readable code is not necessarily the most optimal and premature optimization (which can make less-readable code) is also not necessarily the best path to choose. However, unless a developer is lazy and does not look at all the options to weigh them, they will not find a "happy medium" to choose between the two. Also, one has to understand the platform on which they are writing code. This includes not just your code, but also the operating system and in some cases, the CPU itself.

Anyhow, in the grand scheme of things, most people who write code as professionals are not necessarily writing code for themselves, they are developing an application for a user. The end user doesn't care about how readable your code is (you should document confusing code), they just care about getting results and getting them as soon as humanly possible. <grin>

</soapbox>

Anyhow, I tend to like the code presented in Test3() myself.

2008-02-16 Reply Admin

Martin:
Anonymous:
The real WTF is .NET. Seriously, how can using an eighteen-letter character combination instead of a two letter one be "the Right Way?".

It looks like that is the way .net is going. Anyone seen this: http://www.charlespetzold.com/etc/CSAML.html

Perhaps you didn't notice the date of the Petzold article.

2008-02-16 Reply Admin

Programmers should not need to worry about how to write a newline. All students of programming go through these same newline and string builder issues, and that is a waste of humankind's time. I bet in no other field of expertise do they spend 3 pages of posts, with over a hundred comments, thinking about a newline...

Why not create a character for newline? The empty character is "space". Then the newline character would be "new line", and you would get it by pressing a certain key on the keyboard, maybe with Ctrl held down or something. I mean, we have the at-character, the dollar-character, and even the exclamation-character! Why not a newline-character? How about ASCII 10 (LF) bound to some key on the keyboard, with the system looking up the system-specific sequence in a lookup table?

The same goes for adding strings. If there is a need for StringBuilder, why doesn't the programming language implement the "+" operation with StringBuilder in the first place?
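
One possible answer, sketched under my own assumptions: a single "+" expression already compiles to one String.Concat call, so a StringBuilder buys nothing there. Where it helps is when the appends are spread across loop iterations, which the compiler cannot merge into one call. The names JoinDemo and JoinLines below are hypothetical.

```csharp
using System;
using System.Text;

public static class JoinDemo
{
    // Appends each line plus the platform newline to one growing buffer,
    // instead of allocating a new string on every pass with "+=".
    public static string JoinLines(string[] lines)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string line in lines)
        {
            sb.Append(line).Append(Environment.NewLine);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.Write(JoinLines(new string[] { "Line1", "Line2", "Line3" }));
    }
}
```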

2008-02-16 Reply Admin

shouldn't this constant have been named cpCRLF?

(cp = C Pound).

2008-02-16 Reply Admin

My company does extensive java development on VMS and let me tell you, it is a true joy to write the C code that the JNI calls to convert files to stream_lf...so java can read it.

2008-02-16 Reply Admin

there's yet still another obvious way:

myString = "Line1
Line2
Line3";

2008-02-16 Reply Admin

sobani:
Windows has: "\r\n"
Linux has: "\n"

MacOS has: "\r"

That's why you should use the Right Way instead of the Easy Way.

Nah, that's just the reason OSX and Windows are wrong.

real_aardvark · 2008-02-16 Reply Admin

JoeDelekto:
On a momentary tangent, it still surprises me how many people ignore (usually under the guise of "pre-mature optimization" being the root of all evil) the true cost of things.
In engineering school, the students are usually taught about not just the cost in terms of economic dollars, but in some cases, the amount of energy expended, the number of CPU cycles clicked, etc. In cases of abstraction where a lot of work for developers is now being handled by: a) the framework; and b) the compiler, the developer should still be cognizant of what is going on under the hood, since the abstraction of modern development tools merely shields people from the complexities of various algorithms as well as assists in the generation of boring and boilerplate code.

For anyone that is at all remotely interested in using .NET, Lutz Roeder's Reflector should be the #1 tool in any .NET developer's toolchest.

I created four simple C# static methods on the 'Main' object of a console application, using the various string concatenation methods that I've seen discussed here and weighed their cost using various criteria such as instructions used, function calls made, size of code, etc. I invite anyone using Reflector to see my findings as well, make sure you disassemble to "IL" code so you can see what the compiler truly generates.
        static void Test1()
        {
            string test1 = "part1" + Environment.NewLine + "part2" + Environment.NewLine + "part3" + Environment.NewLine;
        }

        static void Test2()
        {
            string newLine = Environment.NewLine;

            string test2 = "part1" + newLine + "part2" + newLine + "part3" + newLine;
        }

        static void Test3()
        {
            string newLine = Environment.NewLine;

            string test3 = String.Concat("part1", newLine, "part2", newLine, "part3", newLine);
        }

        static void Test4()
        {
            string test4 = String.Format("part1{0}part2{0}part3{0}", Environment.NewLine);
        }
I found, after examining the generated IL of a release build, that Test2 and Test3 generated identical code. That's because using the '+' operator becomes a String.Concat() call. In that regard, using either method in Test2() or Test3() is a matter of coding standards or what might be considered most readable/understandable to the developer.
Test1(), while also very readable, is highly inefficient, since every instance of Environment.NewLine results in a function call to get the newline string for that environment.

Test4(), by the way, is very readable, looks nice, sweet and compact. However, look at it under the hood with reflector. See that objects are created and constructed, functions called, processing done that's not visible unless one actually looks under the hood. String.Format() does indeed use a StringBuilder under the hood, because it is most efficient when dealing with a homogenous set of values that it might expect to find in a format specifier.

StringBuilder, btw, is not a stigma, it does have some very good advantages when used to build large strings or strings that are non-deterministic due to logic within the flow of a program as well as being a conduit to marshal string data back and forth between non .NET code.
<soapbox>
Anyhow, my point is this: Never jump to conclusions based on readability of the code alone. Readable code is not necessarily the most optimal and premature optimization (which can make less-readable code) is also not necessarily the best path to choose. However, unless a developer is lazy and does not look at all the options to weigh them, they will not find a "happy medium" to choose between the two. Also, one has to understand the platform on which they are writing code. This includes not just your code, but also the operating system and in some cases, the CPU itself.

Anyhow, in the grand scheme of things, most people who write code as professionals are not necessarily writing code for themselves, they are developing an application for a user. The end user doesn't care about how readable your code is (you should document confusing code), they just care about getting results and getting them as soon as humanly possible. <grin>
</soapbox>
Anyhow, I tend to like the code presented in Test3() myself.

Exquisite!

I tend to like the code presented in Test4(), myself.

Got any other pointless tests we can laugh at?

Like the soapbox thing, btw. Come down to Hyde Park Corner, some time. I'll be the gent with the carnation in my lapel and a ball peen hammer to take the rest of the world out of your misery.

2008-02-16 Reply Admin

Why make it harder than it is? If "\n" works, go with it. If it works, don't fix it. Especially not by making it more difficult.

2008-02-16 Reply Admin

JoeDelekto:
For anyone that is at all remotely interested in using .NET, Lutz Roeder's Reflector should be the #1 tool in any .NET developer's toolchest.

Not until it stops it with that "If you don't download the newest version I will refuse to run" idiocy.

2008-02-17 Reply Admin

JoeDelekto:
On a momentary tangent, it still surprises me how many people ignore (usually under the guise of "pre-mature optimization" being the root of all evil) the true cost of things.
In engineering school, the students are usually taught about not just the cost in terms of economic dollars, but in some cases, the amount of energy expended, the number of CPU cycles clicked, etc. In cases of abstraction where a lot of work for developers is now being handled by: a) the framework; and b) the compiler, the developer should still be cognizant of what is going on under the hood, since the abstraction of modern development tools merely shields people from the complexities of various algorithms as well as assists in the generation of boring and boilerplate code.

For anyone that is at all remotely interested in using .NET, Lutz Roeder's Reflector should be the #1 tool in any .NET developer's toolchest.

I created four simple C# static methods on the 'Main' object of a console application, using the various string concatenation methods that I've seen discussed here and weighed their cost using various criteria such as instructions used, function calls made, size of code, etc. I invite anyone using Reflector to see my findings as well, make sure you disassemble to "IL" code so you can see what the compiler truly generates.
        static void Test1()
        {
            string test1 = "part1" + Environment.NewLine + "part2" + Environment.NewLine + "part3" + Environment.NewLine;
        }

        static void Test2()
        {
            string newLine = Environment.NewLine;

            string test2 = "part1" + newLine + "part2" + newLine + "part3" + newLine;
        }

        static void Test3()
        {
            string newLine = Environment.NewLine;

            string test3 = String.Concat("part1", newLine, "part2", newLine, "part3", newLine);
        }

        static void Test4()
        {
            string test4 = String.Format("part1{0}part2{0}part3{0}", Environment.NewLine);
        }
I found, after examining the generated IL of a release build, that Test2 and Test3 generated identical code. That's because using the '+' operator becomes a String.Concat() call. In that regard, using either method in Test2() or Test3() is a matter of coding standards or what might be considered most readable/understandable to the developer.
Test1(), while also very readable, is highly inefficient, since every instance of Environment.NewLine results in a function call to get the newline string for that environment.

Test4(), by the way, is very readable, looks nice, sweet and compact. However, look at it under the hood with reflector. See that objects are created and constructed, functions called, processing done that's not visible unless one actually looks under the hood. String.Format() does indeed use a StringBuilder under the hood, because it is most efficient when dealing with a homogenous set of values that it might expect to find in a format specifier.

StringBuilder, btw, is not a stigma, it does have some very good advantages when used to build large strings or strings that are non-deterministic due to logic within the flow of a program as well as being a conduit to marshal string data back and forth between non .NET code.
<soapbox>
Anyhow, my point is this: Never jump to conclusions based on readability of the code alone. Readable code is not necessarily the most optimal and premature optimization (which can make less-readable code) is also not necessarily the best path to choose. However, unless a developer is lazy and does not look at all the options to weigh them, they will not find a "happy medium" to choose between the two. Also, one has to understand the platform on which they are writing code. This includes not just your code, but also the operating system and in some cases, the CPU itself.

Anyhow, in the grand scheme of things, most people who write code as professionals are not necessarily writing code for themselves, they are developing an application for a user. The end user doesn't care about how readable your code is (you should document confusing code), they just care about getting results and getting them as soon as humanly possible. <grin>
</soapbox>
Anyhow, I tend to like the code presented in Test3() myself.

Someone posted this (Grovesy?) on page1, complete with the IL output.

By the way, you can use ildasm.exe to view the IL output; it comes with the .NET Framework SDK.

2008-02-17 Reply Admin

You mean, there's a way other than "\n"? Yeah, I know that Windows uses \r\n but PHPGTK works as expected with just \n. Since PHPGTK is the only way I bother writing code that will never run on Windows, this is very much a NON ISSUE.

Seriously, why don't we programmers have a constant LF like:

$a="This is some test".LF."This is the next line";

And let LF be a constant that changes with whatever O/S? So stupid to be worrying about this stuff that was all but resolved in 1976!

JiP · 2008-02-17 Reply Admin

Why still bother with newlines? Just get a video card that allows you to put enough monitors next to each other so everything fits on a single line...

I will lead the way: To start with, I replaced my old 1280x1024 CRT with a 1440x900 LCD screen. Roughly an equal number of pixels, but longer and fewer lines. I have been looking for a 65536x16 pixel screen, but I have been unable to find one yet.

And as for the line separators used in this comment: Although soon I will no longer need them myself, I have strong hopes that everyone's viewer (even Notepad) will be able to grasp them. Otherwise you'll end up right-scrolling a very long way...

2008-02-17 Reply Admin

Why is "\n" non-portable? I'm not a .NET guy, but I've used "\n" on DOS/Windows for ages, and whatever's outputting the string takes care of it ... if you're outputting to the console or a file in text mode, it'll nicely change it to Newline ("\r\n") for you. If you're outputting to a file in binary mode, it'll output "\n" literally.

real_aardvark · 2008-02-17 Reply Admin

Sean:
sobani:
Windows has: "\r\n"
Linux has: "\n"

MacOS has: "\r"

That's why you should use the Right Way instead of the Easy Way.

Nah, that's just the reason OSX and Windows are wrong.

In the beginning, God created Ken Ritchie.

The world went predictably downhill after that.

JoeDelekto · 2008-02-17 Reply Admin

real_aardvark:
Exquisite!

I tend to like the code presented in Test4(), myself.

Got any other pointless tests we can laugh at?

Like the soapbox thing, btw. Come down to Hyde Park Corner, some time. I'll be the gent with the carnation in my lapel and a ball peen hammer to take the rest of the world out of your misery.

Actually, they weren't tests, they were examinations of what the compiler generated from the different ways to write code. If you aren't one of those people who care how much things cost "under the hood" then, of course, you would find it pointless.

And as an aside, I believe nobody should take a hammer to their balls, no matter how peen they are. If that's what one expects in Hyde Park Corner, there's a good reason why that corner has been hidden.

real_aardvark · 2008-02-17 Reply Admin

Dan Neely:
shouldn't this constant have been named cpCRLF?
(cp = C Pound).

Actually, it should have been named "Ermintrude."

There are only so many meaningful cases of Hungarian notation to go round.

JoeDelekto · 2008-02-17 Reply Admin

Watson:
JoeDelekto:
For anyone that is at all remotely interested in using .NET, Lutz Roeder's Reflector should be the #1 tool in any .NET developer's toolchest.
Not until it stops it with that "If you don't download the newest version I will refuse to run" idiocy.

I know, it is quite an annoyance, but it is still the best tool I could find that shows what is going on under the hood when something doesn't work as expected.

To give an example, a person once complained that when they changed the opacity property of their form, dragging it around the screen was slow, but when they set the opacity back to full (after it had already been full), the dragging was still slow.

After digging into the framework using reflector, I found that there was a 'bug' in which an API call wasn't made to make the window unlayered.

Over time, it still amazes me how people don't think that either the third party libraries or even the documentation are flawless, because they don't have the skills to investigate it.

real_aardvark · 2008-02-17 Reply Admin

JoeDelekto:
Actually, they weren't tests, they were examinations of what the compiler generated from the different ways to write code. If you aren't one of those people who care how much things cost "under the hood" then, of course, you would find it pointless.
And as an aside, I believe nobody should take a hammer to their balls, no matter how peen they are. If that's what one expects in Hyde Park Corner, there's a good reason why that corner has been hidden.

Yes, Hyde Park Corner is well-nigh invisible, and has been ever since the fourth Harry Potter book. Or was it the fifth? I wasn't paying attention; I was too busy looking under the hood.

I program in C++. I have no problem with people programming in C#. I don't even have a problem with people programming in Java, although I occasionally allow myself the luxury of wondering, "Why?"

Every now and again, I wonder quite how much CPU might be taken up by a particular loop, or invocation of the OS kernel, or an inadvertent call to the assembly instruction "Halt and catch fire."

I do not, as a matter of course, trouble my few remaining grey cells by exercising them with the concept of converting EOL into CIL. The poor little buggers are still trying to work out whether Hyde Park Corner disappeared between pages 600 and 650, or perhaps even later.

On the whole, I prefer to reserve them for more important computational tasks. Like writing code that actually makes some goddamn sense.

Which, I believe, was my original point.

real_aardvark · 2008-02-17 Reply Admin

JoeDelekto:
Over time, it still amazes me how people don't think that either the third party libraries or even the documentation are flawless, because they don't have the skills to investigate it.

That would be "think," rather than "don't think."

An inspiration for us all, I don't think.

JoeDelekto · 2008-02-17 Reply Admin

real_aardvark:
Sean:
sobani:
Windows has: "\r\n"
Linux has: "\n"

MacOS has: "\r"

That's why you should use the Right Way instead of the Easy Way.

Nah, that's just the reason OSX and Windows are wrong.
In the beginning, God created Ken Ritchie.

The world went predictably downhill after that.

I suppose you weren't referring to Brian Kernighan and Dennis Ritchie, those who birthed the 'C' Programming language.

JoeDelekto · 2008-02-17 Reply Admin

real_aardvark:
JoeDelekto:
Over time, it still amazes me how people don't think that either the third party libraries or even the documentation are flawless, because they don't have the skills to investigate it.
That would be "think," rather than "don't think."
An inspiration for us all, I don't think.

Touché... I have expanded it out. "People do not think that either the third party libraries or even the documentation are flawless, because they do not have the skills to investigate it."

Yep, I made a mistake. The "do not think" and "are flawless" contradict my point. I should have said: "People think, at times, that third party libraries or the documentation are flawless, yet they do not have the skills to investigate it."

Thank you for the correction.

JoeDelekto · 2008-02-17 Reply Admin

real_aardvark:
JoeDelekto:
Actually, they weren't tests, they were examinations of what the compiler generated from the different ways to write code. If you aren't one of those people who care how much things cost "under the hood" then, of course, you would find it pointless.
And as an aside, I believe nobody should take a hammer to their balls, no matter how peen they are. If that's what one expects in Hyde Park Corner, there's a good reason why that corner has been hidden.
Yes, Hyde Park Corner is well-nigh invisible, and has been ever since the fourth Harry Potter book. Or was it the fifth? I wasn't paying attention; I was too busy looking under the hood.

I program in C++. I have no problem with people programming in C#. I don't even have a problem with people programming in Java, although I occasionally allow myself the luxury of wondering, "Why?"

Every now and again, I wonder quite how much CPU might be taken up by a particular loop, or invocation of the OS kernel, or an inadvertent call to the assembly instruction "Halt and catch fire."

I do not, as a matter of course, trouble my few remaining grey cells by exercising them with the concept of converting EOL into CIL. The poor little buggers are still trying to work out whether Hyde Park Corner disappeared between pages 600 and 650, or perhaps even later.

On the whole, I prefer to reserve them for more important computational tasks. Like writing code that actually makes some goddamn sense.

Which, I believe, was my original point.

Well, I think code that makes 'sense' changes over time. A person who is used to writing console applications gets a bit confused about the whole "event-driven" paradigm when they use a different operating system.

What used to make 'sense' to me from one realm has changed when I entered the next.

However, I had, at one point, stopped to do things the way I was used to doing them as a developer and consider the user who was using what I had written. (Writing code for myself was never a problem, writing code for other persons introduced new things.)

I like to write code that makes sense, however, there are times I have to write code which doesn't. If that is ever the case, I document it thoroughly within comments in the code.

real_aardvark · 2008-02-17 Reply Admin

JoeDelekto:
Thank you for the correction.

Not at all. Thank you for the Ritchie one.

2008-02-18 Reply Admin

Actually, in Java, "\n" actually IS a Unicode newline character, which gets translated to the right sequence of bytes (#13, #10, or #13#10) when outputting it through an OutputStream. Strings are always Unicode (2 bytes/char) in Java.

I'm kinda surprised it doesn't work the same way in .NET...
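
For comparison, a short sketch of the .NET side as far as I understand it: a TextWriter passes a "\n" embedded in a string through untouched, and only WriteLine appends the writer's NewLine property (which defaults to Environment.NewLine). The class name WriterNewLineDemo is made up for illustration.

```csharp
using System;
using System.IO;

public static class WriterNewLineDemo
{
    public static string Render()
    {
        StringWriter writer = new StringWriter();
        writer.NewLine = "\r\n";        // force a Windows-style terminator for the demo
        writer.Write("first\nsecond");  // the embedded "\n" is written as-is
        writer.WriteLine();             // appends writer.NewLine, i.e. "\r\n"
        return writer.ToString();
    }

    public static void Main()
    {
        // Show control characters as visible escapes.
        Console.WriteLine(Render().Replace("\r", "\\r").Replace("\n", "\\n"));
    }
}
```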

JiP · 2008-02-18 Reply Admin

Never mind the earlier funny remarks about not using newlines altogether:

I seem to remember MS-DOS (versions 1.xx through 6.xx) was the only OS around at the time that somehow needed the redundant CHR(10) (in style, this is GW-BASIC ;-) ) extension to get printers (you know, the old needles-and-pins-things-that-drove-you-crazy-with-their-incredible-noise ones) to output documents correctly. Like using a backslash instead of a forward slash as directory separator, I suspect this was done by Microsoft to prevent compatibility between the then-existing OSes and their new kiddy.

Can anyone confirm this? It's been a long time...

2008-02-18 Reply Admin

For very old printers, CRLF makes sense because you sometimes want to do only CR to print two lines on top of each other, for underlining, bold print and various composed characters. (One could also use BACKSPACE for this)

You might want to use a naked LF too, though that would be a rarer case. (When writing a narrow column of text down the middle of the paper, without having the print head moving all the way to the margin for each line)

Since, in the very early days, you didn't want to waste CPU on processing files when printing them, you stored the files with CRLF in them. This meant a file could contain underlining etc. in a way the printer handled directly.

When Unix was created, they decided that you would always use a device driver of some sort to talk to printers, and that driver could convert things as needed. Given that, it made more sense to use a single character for newline.

They decided that a lone LF should be converted to CRLF for output, while a lone CR would just be sent through. This way you could still do overprinting like you used to, without having to waste that extra character on lines that didn't need it. However, you lost the possibility of sending naked LFs to the printer, so this was a trade-off.

MS-DOS came later than UNIX, but UNIX was still a small player in a field with very many players. MS-DOS decided to go with CRLF, which was not uncommon in those days. (I must admit I have no idea how common, I was still a wee lad at the time)

As for which slash to use as a path separator, there were a lot of different practices for that too. The whole concept of directories and paths was quite new and was being reinvented all over the place with different syntaxes. Again, there was no reason for MS to copy Unix in particular, so they chose backslash as the character least likely to appear in file names.

Microsoft could have changed this later on, but breaking backwards compatibility is not done lightly.

2008-02-18 Reply Admin

Sorry - I don't find %n on that page. And it doesn't work either.
I always use \n. Sooner or later MS will find out that unix did it right in the first place. I don't support notepad - other editors handle \n gracefully on windows.

The format specifiers which do not correspond to arguments have the following syntax: %[flags][width]conversion

And, later:

'n' line separator The result is the platform-specific line separator

Which is to say, since neither flags nor width apply to newline, %n is a platform-specific line separator.

Not sure why it doesn't work for you.

-fred

2008-02-18 Reply Admin

"make sure you disassemble to "IL" code so you can see what the compiler truly generates." Except, as I've already pointed out, what the compiler generates isn't what gets executed because of JITing. It's perfectly possible for the JITer to chose to inline the property, meaning that Test1 would be better than Test2. In my preliminary investigations, it seems that it's not inlined, while all literature suggests it should be :S

"The end user doesn't care about how readable your code is (you should document confusing code), they just care about getting results and getting them as soon as humanly possible." Not true. Depends on the type of consumer. If it's a home user, they care not only that it's fast, but also that it is to a degree stable. If it's a business or power user then they also tend to care that bugs get fixed quickly. Both of these (stability and maintainability) require clear, clean and easy to understand code.

2008-02-18 Reply Admin

Wow! For a brief moment I thought that this was directed at me... Just convert the '\n' legacy code (that I wrote) to a system compatible equivalent string programmatically and come on, that's not so bad, is... WHAT?!?!

Oh yeah, this is thedailywtf.com. Don't prognosticate because the very reason to read this is because these stories defy all reason and justifiability. Just remember: CD tray as cup holder, CD tray as cup holder...

2008-02-19 Reply Admin

I just hope that the people here who keep using '\n' know what they're talking about. The '\n' is a C construct that is LF while in code/memory, and expands to CR, LF or CRLF when written to disk (depending on the platform).

This code

fprintf(file, "\r\n"); // must always be Windows newlines!

will fail spectacularly on Windows, producing CRCRLF once it's written :) (I wonder, would it produce CRCR on Apple?)

2008-02-19 Reply Admin

No, for a small number of strings String.Concat is the correct way to do it; it doesn't parse the string for arguments.

savar · 2008-02-19 Reply Admin

[email protected]:
On Windows platforms, "\r\n" == Environment.NewLine. If you know the only platform your software will run on is Windows (e.g. server-side code), then there's no real distinction.

what other platforms are people running .net on?

2008-02-19 Reply Admin

JoeDelekto:
On a momentary tangent, it still surprises me how many people ignore (usually under the guise of "pre-mature optimization" being the root of all evil) the true cost of things.
I created four simple C# static methods on the 'Main' object of a console application, using the various string concatenation methods that I've seen discussed here and weighed their cost using various criteria such as instructions used, function calls made, size of code, etc. I invite anyone using Reflector to see my findings as well, make sure you disassemble to "IL" code so you can see what the compiler truly generates.

Except that IL (being the Intermediate Language), of course, isn't executed. It's basically just a preparsed C# that's quicker to compile by the JITter. If you're not looking at x86 (or x64, or IA-64, etc.) assembly, then you've also missed the "true" cost.

Oh - and I'd be curious as to what kind of braindead program would have string concats of Environment.NewLine as its bottleneck. Profile, and then optimize. Anything else is just bound to be wrong.

real_aardvark · 2008-02-19

Jb:
Wow! For a brief moment I thought that this was directed at me... Just convert the '\n' legacy code (that I wrote) to a system-compatible equivalent string programmatically and, come on, that's not so bad, is... WHAT?!?!
Oh yeah, this is thedailywtf.com. Don't prognosticate, because the very reason to read this is that these stories defy all reason and justifiability. Just remember: CD tray as cup holder, CD tray as cup holder...

OK, guys, let's do our best not to prognosticate. Remember, it's:

"To predict or forecast, especially through the application of skill."

Entirely relevant to the OP, I would say.

Cloak · 2008-02-21

Aaron:
Grovesy:
String.Format uses StringBuilder.
Which is why you should never ever do

StringBuilder sb = new ...

... sb.Append(String.Format("hello {0}", someString));

A second string builder is created.

Why should you "never ever" do that? Is there a shortage of StringBuilders that nobody has told me about?
What if you're creating an error report that's 500 lines long and 20 of those lines have a few tokens to replace, like system info and such? Should you replace the entire thing with one string.Format(...) call with 50 parameters? Or are you instead suggesting that we replace the original 20 Format()ed lines with 80 repetitive Append calls?

The purpose of a StringBuilder is to avoid the quadratic performance associated with a pantload of immutable concatenations. So something like:
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 5000; i++)
{
  sb.Append("Hello, " + "there.");
}
is still far, far better than the old
string s = string.Empty;
for (int i = 0; i < 5000; i++)
{
  s = s + "Hello, " + "there.";
}
Yes, both ways are stupid in this contrived example, and if you are writing a really tight loop (which most people reading this have probably never had to write), then you might want to start worrying about things like this. But for the vast majority of programmers and programs, the important concept is just to use the mutable StringBuilder rather than creating the same immutable string over and over again through concats. It doesn't really matter if you use a regular old concat or Format() to feed one individual line to the StringBuilder.

There's a balance to maintain between performance and code readability. Maybe you really need to optimize the hell out of some chunk of string-twiddling code, but maybe you don't, so never say never.

I have the impression that things are going this way:

In fact, CSAML is able to rid itself of every symbol used in old-syntax C#. For example, consider the following old-syntax C# assignment statement:

  A = 5 * (B + 27 * C);

This statement translates without much fuss into the following chunk of CSAML:

  <ExpressionStatement>
      <Assignment LValue="A">
          <Assignment.Expression>
              <MultiplicationExpression>
                  <Multiplication.Multiplier>
                      <Literal Type="{x:Type Int32}"
                               Value="5" />
                  </Multiplication.Multiplier>
                  <Multiplication.Multiplicand>
                      <AdditionExpression Augend="B">
                          <AdditionExpression.Addend>
                              <Multiplication.Multiplier>
                                  <Literal Type="{x:Type Int32}"
                                           Value="27" />
                              </Multiplication.Multiplier>
                              <MultiplicationExpression Multiplicand="C"/>
                        </AdditionExpression.Addend>
                     </AdditionExpression>
                </Multiplication.Multiplicand>
              </MultiplicationExpression>
          </Assignment.Expression>
      </Assignment>
  </ExpressionStatement>

I first thought that this guy was making a joke, but it seems that he is bloody serious about it. Look here: http://www.charlespetzold.com/etc/CSAML.html

Anybody out there who wants to fight against such insaneness? I will stop programming when I have to write code like that.

Somebody has to stop M$'s stupidity. What's the future of programming? In the next phase we will also have to put XML in another format like eXtended XML (XXML)? Where is the end? Writing code like

A + B = C

should be 20 lines of code (or more) with several hundred characters? The guys who promote such code should be shot before they create greater damage!!! Unbelievable!

Cloak · 2008-02-21

Karellen:
Well, in C and C-derived languages, the correct way is the same as the easy way. It's just:
myString = "Line1\nLine2\nLine3";

If this string is written to a file, stream, socket, etc... that has been opened in text mode, "\n" in the string should be converted into the platform's newline sequence. Similarly, when reading, the platform's newline sequence should be converted to a "\n" for the programmer.

This is why seek()/fseek() has quirks when working with text-mode files on platforms whose newline sequence is not a single byte.

Of course, UNIX makes things easy by requiring that '\n' is a single byte.

Well, this is not to defend M$: before computers there were typewriters and they had the possibility to Return the Carriage and to Feed a Line. Both were separate "processes" and hence the CRLF instead of a \n. Doesn't make much sense in our days but explains the Why.
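
Karellen's point about fseek() in text mode can be sketched in C as follows. Because a newline may occupy more than one byte on disk, the portable approach is to seek only to offsets that ftell() returned earlier on the same stream, rather than computing them from string lengths. The helper names write_sample and reread_second_line, and the file name in the usage note, are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes three lines in text mode, so each "\n" is
 * stored as whatever newline sequence the platform uses.
 * Returns 0 on success, -1 on failure. */
static int write_sample(const char *path)
{
    assert(path != NULL);

    FILE *file = fopen(path, "w");
    if (file == NULL)
        return -1;

    int ok = fputs("Line1\nLine2\nLine3\n", file) != EOF;
    if (fclose(file) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: reads the second line, seeks back to its start using
 * an offset obtained from ftell(), and reads it again. The offset is never
 * computed by hand, since in text mode it need not equal a character count.
 * Copies the line into `out` and returns 0 if both reads agree, else -1. */
static int reread_second_line(const char *path, char *out, size_t cap)
{
    char first[128], again[128];
    int result = -1;

    assert(path != NULL && out != NULL && cap > 0);

    FILE *file = fopen(path, "r");
    if (file == NULL)
        return -1;

    if (fgets(first, sizeof first, file) != NULL) {      /* skip line 1 */
        long mark = ftell(file);                         /* start of line 2 */
        if (mark != -1L
            && fgets(first, sizeof first, file) != NULL  /* first read */
            && fseek(file, mark, SEEK_SET) == 0          /* rewind to mark */
            && fgets(again, sizeof again, file) != NULL  /* second read */
            && strcmp(first, again) == 0
            && strlen(again) < cap) {
            strcpy(out, again);
            result = 0;
        }
    }

    fclose(file);
    return result;
}
```

After write_sample("seek_demo.txt"), a call to reread_second_line("seek_demo.txt", line, sizeof line) should leave "Line2\n" in line, with the platform's newline sequence already converted back to a single '\n'.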

BrownHornet · 2008-02-21

Boris:
Oh please no. How exactly would you program in something like this? I mean with XAML you can create it all in a visual designer, but actual programming functions? To quote one example, A = 5 * (B + 27 * C); I mean anyone who knows simple algebra would understand this (minus the semicolon in the end maybe). And no one will ever forget what = means (at least in the context of a single programming language). How the heck are you supposed to remember <Assignment.Expression>? And this is a "simple" example.

Cloak:
I have the impression that things are going this way:
In fact, CSAML is able to rid itself of every symbol used in old-syntax C#. For example, consider the following old-syntax C# assignment statement:

A = 5 * (B + 27 * C);

This statement translates without much fuss into the following chunk of CSAML:

[code snippet removed]

I first thought that this guy was making a joke, but it seems that he is bloody serious about it. Look here: http://www.charlespetzold.com/etc/CSAML.html

Anybody out there who wants to fight against such insaneness? I will stop programming when I have to write code like that.

Somebody has to stop M$'s stupidity. What's the future of programming? In the next phase we will also have to put XML in another format like eXtended XML (XXML)? Where is the end? Writing code like

A + B = C

should be 20 lines of code (or more) with several hundred characters? The guys who promote such code should be shot before they create greater damage!!! Unbelievable!

Did you guys miss the date at the end of the article?

Cloak · 2008-02-22

What about:

Environment.Characters.L + Operation.Concatenate + Environment.Characters.i + Operation.Concatenate + Environment.Characters.n + Operation.Concatenate + Environment.Characters.e + Operation.Concatenate + Environment.NewLine

This must be THE ONLY RIGHT WAY!!!

Cloak · 2008-02-22

RAW:
It's actually:
System.getProperty("line.separator")

but your point is still valid.

Pleeze sent me teh codez to create a new line

2008-03-02

Paolo G:
"codeier"? That would be the comparative of "codey", right? Adjectives ending in -y form the comparative and superlative using -ier and -iest, and so do adjectives ending in -ey. So "codey" -> "codier", "codiest".

2008-03-12

Hehe, wonder how the source for Environment.NewLine looks... >:o)

2008-06-30

Buddy:
Actually, I've done similar - maybe not so verbose.
If you don't know which platform your code will run on, but you absolutely need to be certain that you write a CR/LF pair (or any kind of line termination) because you know what platform will read the file, specifying each character explicitly is the only way.

In C, I'd use something like this:

#define CRLF "\x0d\x0a"

...

fprintf(file, "Ghastly weather%s" CRLF, (emphasize)? "!": ".");

Or you can just do this

#define CRLF "\r\n"

So much typing...ahhh, I think I am going to break my fingers!

2008-07-18

And therein is proof positive of Windows' inherent bloat and redundancy.

2008-07-18

Sean:
sobani:
Windows has: "\r\n"
Linux has: "\n"

MacOS has: "\r"

That's why you should use the Right Way instead of the Easy Way.

Nah, that's just the reason OSX and Windows are wrong.

Nah, only one of the three is wrong. The bloated and redundant one. OSX and Linux are just different flavours.

2010-12-07

fanguad:
Don't be ridiculous. All space aliens speak English and have American accents.

I would like to respond to another old thread with a post that nobody will ever see, and mention that Doctor Who^H^HThe Doctor would disagree with you.

His Own Way to Newline
