The Daily WTF: Curious Perversions in Information Technology

2023-03-07

They also ignore the fact that there could be a sign character in the first position of the string and it would still be an integer. The function should be renamed IsPositiveInteger().

2023-03-07

Negative numbers need not apply.

Mr. TA · 2023-03-07

Maybe the code's author is a member of a sect who like C# but hate the .NET BCL?

colejohnson66 · 2023-03-07

One optimization this code makes is to avoid actually parsing the string into an integer. Those res = res * 10 + cur add up, you know.

2023-03-07

I'm not sure whether the WTF is in the implementation or the naming, but (charitably) assuming the latter, this should really be called IsPositiveInteger.

Steve_The_Cynic · 2023-03-07

In this case, we exploit the fact that almost all character sets put the characters 0-9 adjacent to each other in sequence.

I know we're working in C# in the article, but in C and C++ this characteristic is required.

https://en.cppreference.com/w/c/language/charset sayeth

In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.

https://en.cppreference.com/w/cpp/language/charset sayeth

The code unit value of each decimal digit character after the digit 0 (U+0030) shall be one greater than the value of the previous.

Rick · 2023-03-07

Comparing against zero rather than a value may be a machine cycle faster. With modern CPUs that is on the order of 1/2,000,000,000 of a second faster. But back in the day, I frequently wrote my loops to count down to zero.

2023-03-07

Or as some 'bad' programmers like to say: Don't be so negative.

dkf · 2023-03-07

Comparing against zero rather than a value may be a machine cycle faster.

Often one cycle per iteration faster as no explicit compare will be required; the check against zero being naturally yielded as part of the decrement, and that'll be a case that any optimizing compiler will handle as it is such an obvious common win. That sort of thing is potentially significant in a critical inner loop, even now.

Elsewhere, counting downward is useful when dealing with unsigned arithmetic.

2023-03-07

I have a question: is 1.0 or 1,000 or 1.05e+003 or -1 an integer? Let's ask that method.
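
For anyone who wants to actually ask, here is a minimal sketch. The article's method is not reproduced in this thread, so IsAllDigits below is a hypothetical stand-in that only assumes what the comments describe: a reverse loop that checks each character against the '0'..'9' range.

```csharp
using System;

public static class DigitCheck
{
    // Hypothetical stand-in for the article's method, NOT the original code:
    // returns true when every character is an ASCII digit.
    public static bool IsAllDigits(string s)
    {
        for (int i = s.Length - 1; i >= 0; i--)
        {
            if (s[i] < '0' || s[i] > '9')
            {
                return false;
            }
        }
        // Also reached when the loop body never runs, i.e. for "".
        return true;
    }

    public static void Main()
    {
        string[] inputs = { "0", "42", "-1", "1.0", "1,000", "1.05e+003", "" };
        foreach (string input in inputs)
        {
            Console.WriteLine($"\"{input}\" -> {IsAllDigits(input)}");
        }
    }
}
```

Under those assumptions, only "0", "42" and the empty string come back True; the sign, the decimal point, the thousands separator and the exponent are all rejected.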

2023-03-07

Actually IsNonNegativeInteger

Dragnslcr · 2023-03-07

Technically, IsNonNegativeInteger, since "0" would be valid.

nerd4sale · 2023-03-07

I don't think the original algorithm was in PHP, since this type of pattern was heavily used in early 1980s BASIC. It's probably even much older, but I don't go back that far. ;-)

2023-03-07

D'oh. You (and the poster below you) are of course correct.

2023-03-07

It's not the good kind of cleverness that makes a complicated problem more clear and easier to understand, but the bad kind that exploits assumptions about low-level technical details.

This was a very clear expression of what distinguishes the Mel-style Real Programmers amongst us from those who can over-achieve even when (le gasp!) working with other people. I'm making a note of it.

The_MAZZTer · 2023-03-07

I had a senior developer focus on micro optimizations for a web app targeting IE6 back in the day. It included reverse loops. All in all we cut down our loading times by like 50% in IE6 I think. But there were definitely some loops that could not be reversed that we found out the hard way.

We were also loading XML documents and manipulating the DOM directly to change data, rather than simply loading the data into lighter-weight objects and dropping the XML DOM objects. That probably would have gotten us some good gains...

We also had issues with stack overflows in IE6 from legitimate code. We had to use our own stack array and looping instead of recursion in one spot in our code to combat this, and of course it made debugging anything a nightmare because the actual stack would get clobbered.

2023-03-07

There's a fair amount of noise about how it declares negative integers invalid, along with those expressed in scientific notation, those with thousands separators, and so on. But unless we have some knowledge of the problem domain, we don't know for sure whether this is intentional and only positive integers with no formatting etc. are valid.

It's a bit messy and overdesigned, but on the scale of 1 to 10 it barely cracks a 2.

2023-03-07

Well, TryParse won't catch leading/trailing whitespace (IIRC, been out of the game for a few years now). This is way more reliable.

Sigh.

2023-03-07

None of those are integers. All are representations of integers in text.

MaxiTB · 2023-03-07

Just as a clarification to the article:

Not that C# supports EBCDIC, but even EBCDIC obeys that common-sense rule.

var encoding = Encoding.GetEncoding("IBM037");

So yes, of course the .NET Framework has supported ancient charsets. That doesn't mean anyone still uses them, and you have to register legacy encodings on .NET Core.

dkf · 2023-03-07

But that's just about how you move bytes in the outside world into strings in the language, and vice versa. At the time you're considering the characters, the digit range is contiguous.
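
A small sketch of both points, assuming .NET Core 3.0 or later (where legacy code pages have to be registered first, as noted above): the EBCDIC bytes for the digits are themselves contiguous, and once decoded, the characters in the string are the ordinary contiguous '0'..'9'. The class name is made up for illustration.

```csharp
using System;
using System.Text;

public static class EbcdicDigits
{
    // Encodes the ten decimal digits with the IBM037 (EBCDIC) code page.
    public static byte[] EncodeDigits()
    {
        // Legacy code pages are not available on .NET Core / .NET 5+
        // until the code pages provider has been registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding ebcdic = Encoding.GetEncoding("IBM037");
        return ebcdic.GetBytes("0123456789");
    }

    public static void Main()
    {
        byte[] bytes = EncodeDigits();
        // Each byte is one greater than the previous one.
        Console.WriteLine(BitConverter.ToString(bytes));

        // Decoding brings back ordinary UTF-16 characters.
        string decoded = Encoding.GetEncoding("IBM037").GetString(bytes);
        Console.WriteLine(decoded);
    }
}
```

On that code page the digits should come out as the bytes F0 through F9, so the range check would hold even at the byte level.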

MaxiTB · 2023-03-07

I also appreciate the reverse string traversal, as a pointless micro-optimization.

for (int i = pString.Length - 1; i >= 0; i--)

I can understand when a lot of people are confused, because there are two optimization attempts there and one just simply is wrong :-)

The first important piece of information is that in every C-style for loop the condition expression is evaluated on every iteration. So having a property call there could in some situations result in overhead (property inlining is a difficult topic, and I just realized that most non-C# developers don't even know what properties are). Using the assignment in this way saves you the local variable if you want to be extra fancy.

The second optimization attempt is outright wrong, and it's why the reverse order is used in the first place. It is a C++ thing: a post-increment or post-decrement operation needs to cache the old value, so --x is more efficient than x-- because the former doesn't need to cache anything. In other words, the correct statement would look like this: for (var i = pString.Length; i > 0; --i). Does it matter for .NET? Not really, simply because this is one of the rare IL optimizations applied when the code is compiled to machine language.
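
To make the comparison concrete, here is a hedged sketch of the two loop headers side by side. The method names are invented for illustration, and the i - 1 in the second loop is an assumption about how the body would have to index the string, since that loop runs i from Length down to 1.

```csharp
using System;
using System.Collections.Generic;

public static class LoopForms
{
    // The loop as quoted from the article: post-decrement, i runs Length-1 .. 0.
    public static List<int> ArticleLoop(string s)
    {
        var visited = new List<int>();
        for (int i = s.Length - 1; i >= 0; i--)
        {
            visited.Add(i);
        }
        return visited;
    }

    // The pre-decrement variant: i runs Length .. 1,
    // so the character being inspected is s[i - 1].
    public static List<int> PreDecrementLoop(string s)
    {
        var visited = new List<int>();
        for (var i = s.Length; i > 0; --i)
        {
            visited.Add(i - 1);
        }
        return visited;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", ArticleLoop("abc")));      // 2,1,0
        Console.WriteLine(string.Join(",", PreDecrementLoop("abc"))); // 2,1,0
    }
}
```

Both forms visit the same indices in the same order; the only practical difference is the off-by-one adjustment in the body.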

MaxiTB · 2023-03-07

Every string in .NET is a unique immutable UTF-16 string, and the way you read/write strings is via text encoders. IBM037 is one of those encoders, which means there has been direct support for EBCDIC in the .NET Framework since .NET 1.0, because that's how strings work: they always get translated, and even UTF-16 strings pass through Encoding.Unicode (the UTF-16 encoder).

2023-03-07

You may or may not be correct as of a far-distant version, but as of C#7 you are not. (I just tried it.)

In any case, it's not possible to make a sensible argument in favour of the code presented (not even "efficiency", unless you're talking about some weird requirement to check about a billion strings and then throw the resultant integer away).

TryParse is efficient enough to be acceptable, and it returns the integer, and it copes with different environments (ok, not so much exponential notation), and you can specify either an unsigned int or a signed int. For that matter, you can even specify that the receiver is a byte (and you may include leading and trailing whitespace). I've tried that, too.
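
For reference, a minimal sketch of those behaviours. The helper names are made up for illustration; the calls themselves are the standard TryParse overloads, pinned to the invariant culture so the result doesn't depend on the machine's regional settings.

```csharp
using System;
using System.Globalization;

public static class TryParseDemo
{
    // Hypothetical helpers wrapping the standard TryParse overloads.
    public static bool ParsesAsInt(string s, NumberStyles style = NumberStyles.Integer) =>
        int.TryParse(s, style, CultureInfo.InvariantCulture, out _);

    public static bool ParsesAsUInt(string s) =>
        uint.TryParse(s, NumberStyles.Integer, CultureInfo.InvariantCulture, out _);

    public static bool ParsesAsByte(string s) =>
        byte.TryParse(s, NumberStyles.Integer, CultureInfo.InvariantCulture, out _);

    public static void Main()
    {
        // NumberStyles.Integer allows leading/trailing whitespace and a leading sign.
        Console.WriteLine(ParsesAsInt(" -42 "));   // True
        Console.WriteLine(ParsesAsUInt("-1"));     // False: out of range for uint
        Console.WriteLine(ParsesAsByte(" 200 "));  // True
        Console.WriteLine(ParsesAsByte("300"));    // False: out of range for byte
        // Thousands separators are rejected unless explicitly allowed.
        Console.WriteLine(ParsesAsInt("1,000"));   // False
        Console.WriteLine(ParsesAsInt("1,000",
            NumberStyles.Integer | NumberStyles.AllowThousands)); // True
    }
}
```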

More importantly, the caller doesn't have to waste brain cells trying to figure out what is going on and whether or not it works.

2023-03-07

An empty string is an integer, after all...

2023-03-07

This function says that "123456789012345678901234567890" is an integer, even though it won't fit in even a 64-bit int.

2023-03-07

| This function says that "123456789012345678901234567890" is an integer, even though it won't fit in even a 64-bit int.

This explains it! The function is actually part of HugeNumberLibrary.IsInteger(), in which ulong.TryParse() was not an option.

2023-03-07

Unless the value of -- or ++ is being used, the compiler can implement them any way it wants; it doesn't need to distinguish between pre- and post-Xrement (pun intended). In the case of the for loop, it can easily generate the same code to decrement and test i regardless of whether you write i-- or --i.

MaxiTB · 2023-03-07

https://learn.microsoft.com/en-us/dotnet/api/system.numerics.biginteger.tryparse?view=net-8.0
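
A short sketch of what that link offers, using the 30-digit string from the earlier comment. The class and method names are only for illustration.

```csharp
using System;
using System.Numerics;

public static class BigDemo
{
    public const string Digits = "123456789012345678901234567890";

    // True when the text fits in an unsigned 64-bit integer.
    public static bool FitsInULong(string s) => ulong.TryParse(s, out _);

    // True when the text parses as an arbitrary-precision integer.
    public static bool FitsInBigInteger(string s) => BigInteger.TryParse(s, out _);

    public static void Main()
    {
        Console.WriteLine(FitsInULong(Digits));       // False: too large for 64 bits
        Console.WriteLine(FitsInBigInteger(Digits));  // True

        if (BigInteger.TryParse(Digits, out BigInteger big))
        {
            Console.WriteLine(big + 1); // arithmetic works on the parsed value
        }
    }
}
```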

MaxiTB · 2023-03-07

Yes, as I wrote before, it doesn't matter in .net; in C++ on the other hand this optimization is usually not done and the compiler will use the post-crement operator simply because it can be overloaded which is not the case for .net.

2023-03-08

Empty strings passed to this function will return true

2023-03-08

Careful now, your programmer envy is showing

2023-03-08

It might be worth mentioning that C actually requires this property of any encoding. To quote the standard:

"In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous." (C11, 5.2.1 Character sets, paragraph 3)

Given that, it's probably fair to assume no encoding would ever violate that. Of course, still no excuse to reinvent your language ;)

2023-03-09

Minor point: when I taught C/C++ years ago, I'd do something like discussing the string library. My advice to my students was: read the headings to a library. You don't have to memorize them or know the parameters... just know that they exist. It saves a programmer a lot of unnecessary work.

Trying Parses
