  • Brian Boorman (unregistered)

    They also ignore the fact that there could be a sign character in the first position of the string and it could still be an integer. The function should be renamed IsPositiveInteger().
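    A minimal sketch, written in C rather than the article's C#, of a digit check that also accepts one optional leading sign. The name is_signed_integer_text is hypothetical. Like the original, it only validates the text: it ignores whitespace and does not check whether the value fits in any integer type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if s is an optional '+' or '-' followed by
   at least one ASCII decimal digit, and nothing else. Range is NOT checked. */
bool is_signed_integer_text(const char *s)
{
    assert(s != NULL);
    size_t i = 0;
    if (s[i] == '+' || s[i] == '-')
        i++;                      /* skip a single leading sign */
    if (s[i] == '\0')
        return false;             /* empty string or a lone sign */
    for (; s[i] != '\0'; i++) {
        if (s[i] < '0' || s[i] > '9')
            return false;         /* any non-digit rejects the text */
    }
    return true;
}
```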

  • some guy (unregistered)

    Negative numbers need not apply.

  • (nodebb)

    Maybe the code's author is a member of a sect who like C# but hate the .NET BCL?

  • (nodebb)

    One optimization this code makes is to avoid actually parsing the string into an integer. Those res = res * 10 + cur operations add up, you know.

  • Robin (unregistered)

    I'm not sure whether the WTF is in the implementation or the naming, but (charitably) assuming the latter, this should really be called IsPositiveInteger.

  • (nodebb)

    In this case, we exploit the fact that almost all character sets put the characters 0-9 adjacent to each other in sequence.

    I know we're working in C# in the article, but in C and C++ this characteristic is required.

    https://en.cppreference.com/w/c/language/charset sayeth

    In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous.

    https://en.cppreference.com/w/cpp/language/charset sayeth

    The code unit value of each decimal digit character after the digit 0 (U+0030) shall be one greater than the value of the previous.
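    Because of the guarantee quoted above, a digit's numeric value can be computed portably in C as c - '0', and a range check against '0' and '9' is a valid digit test. A small sketch; the helper names are illustrative. Note that letters carry no such guarantee: in EBCDIC, 'a' through 'z' are not contiguous.

```c
#include <assert.h>
#include <stdbool.h>

/* Relies only on the standard guarantee that '0'..'9' are consecutive. */
bool is_decimal_digit(char c)
{
    return c >= '0' && c <= '9';
}

/* Numeric value of a decimal digit character; the caller must pass a digit. */
int digit_value(char c)
{
    assert(is_decimal_digit(c));
    return c - '0';
}
```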

  • (nodebb)

    Comparing against zero rather than a value may be a machine cycle faster. With modern CPUs that is on the order of 1/2,000,000,000 of a second faster. But back in the day, I frequently wrote my loops to count down to zero.

  • Brian (unregistered) in reply to Robin

    Or as some 'bad' programmers like to say: Don't be so negative.

  • (nodebb)

    Comparing against zero rather than a value may be a machine cycle faster.

    It's often one cycle per iteration faster, because no explicit compare is required: the check against zero falls out of the decrement itself, and any optimizing compiler will handle that case, since it is such an obvious and common win. That sort of thing is potentially significant in a critical inner loop, even now.

    Elsewhere, counting downward is useful when dealing with unsigned arithmetic.
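    With an unsigned index, the naive for (size_t i = n - 1; i >= 0; i--) never terminates, because i >= 0 is always true and i wraps around after reaching zero (and n - 1 itself wraps when n is 0). A common C idiom tests before decrementing instead. A minimal sketch; the function reverse_copy is illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Copies src into dst in reverse order, walking src from the last element
   down to index 0 with an unsigned (size_t) counter. */
void reverse_copy(const int *src, int *dst, size_t n)
{
    assert(n == 0 || (src != NULL && dst != NULL));
    size_t out = 0;
    /* "i-- > 0" compares the old value with 0, then decrements, so the
       body sees n-1, n-2, ..., 0 and the loop stops without misbehaving. */
    for (size_t i = n; i-- > 0; )
        dst[out++] = src[i];
}
```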

  • giammin (unregistered)

    I have a question: is 1.0, or 1,000, or 1.05e+003, or -1 an integer? Let's ask that method.

  • Jason Stringify (unregistered) in reply to Robin

    Actually, IsNonNegativeInteger.

  • (nodebb) in reply to Robin

    Technically, IsNonNegativeInteger, since "0" would be valid.

  • (nodebb)

    I don't think the original algorithm was in PHP, since this type of pattern was heavily used in early 1980s BASIC. It's probably even much older, but I don't go back that far. ;-)

  • Robin (unregistered) in reply to Jason Stringify

    d'oh. You (and the poster below you) are of course correct.

  • Alex (unregistered)

    It's not the good kind of cleverness that makes a complicated problem more clear and easier to understand, but the bad kind that exploits assumptions about low-level technical details.

    This was a very clear expression of what distinguishes the Mel-style Real Programmers amongst us from those who can over-achieve even when (le gasp!) working with other people. I'm making a note of it.

  • (nodebb)

    I had a senior developer focus on micro-optimizations for a web app targeting IE6 back in the day. It included reverse loops. All in all, we cut our loading times by something like 50% in IE6, I think. But there were definitely some loops that could not be reversed, which we found out the hard way.

    We were also loading XML documents and manipulating the DOM directly to change data, rather than simply loading the data into lighter-weight objects and dropping the XML DOM objects. That probably would have gotten us some good gains...

    We also had issues with stack overflows in IE6 from legitimate code. We had to use our own stack array and looping instead of recursion in one spot in our code to combat this, and of course it made debugging anything a nightmare, because the actual stack would get clobbered.

  • Prime Mover (unregistered)

    There's a fair amount of noise about how it rejects negative integers, integers expressed in scientific notation, integers with thousands separators, and so on. But unless we have some knowledge of the problem domain, we don't know for sure whether this is intentional, i.e. whether only positive integers with no formatting are meant to be valid.

    It's a bit messy and overdesigned, but on a scale of 1 to 10 it barely cracks a 2.

  • Scott (unregistered)

    Well, TryParse won't catch leading/trailing whitespace (IIRC, been out of the game for a few years now). This is way more reliable.

    Sigh.

  • Dave (unregistered) in reply to giammin

    None of those are integers. All are representations of integers in text.

  • (nodebb)

    Just as a clarification to the article:

    Not that C# supports EBCDIC, but even EBCDIC obeys that common-sense rule.

    var encoding = Encoding.GetEncoding("IBM037");

    So yes, of course the .NET Framework has supported ancient charsets, but that doesn't mean anyone still uses them, and on .NET Core you have to register the legacy encodings first.

  • (nodebb) in reply to MaxiTB

    But that's just about how you move bytes in the outside world into strings in the language, and vice versa. At the time you're considering the characters, the digit range is contiguous.

  • (nodebb) in reply to nerd4sale

    I also appreciate the reverse string traversal, as a pointless micro-optimization.

    for (int i = pString.Length - 1; i >= 0; i--)

    I can understand why a lot of people are confused: there are two optimization attempts there, and one is simply wrong :-)

    The first important point is that in every C-style for loop, the condition is evaluated on every iteration. So having a property call there could, in some situations, add overhead (property inlining is a difficult topic, and I just realized that most non-C# developers don't even know what properties are). Starting the counter from pString.Length in the initializer evaluates the property only once, which saves you a separate local variable if you want to be extra fancy.

    The second optimization attempt is outright wrong, and it's the reason the reverse order is used in the first place. It is a C++ thing: a postfix increment or decrement has to produce the old value, so it needs to keep a copy of it, which is why --x is considered more efficient than x--; the prefix form doesn't need that copy. In other words, the "correct" statement would look like this: for (var i = pString.Length; i > 0; --i), with the body then indexing pString[i - 1]. Does it matter for .NET? Not really, because this is one of the rare cases that gets optimized when the IL is compiled to machine code.
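    For a built-in integer whose decrement result is discarded, as in a for loop's third clause, --i and i-- behave identically. A C sketch of the count-from-length loop shape suggested above (which must therefore index with i - 1), written once with each operator; both functions are illustrative and should visit exactly the same characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts ASCII digits, walking backwards with a prefix decrement.
   i runs from len down to 1, so the current character is s[i - 1]. */
size_t count_digits_prefix(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (size_t i = strlen(s); i > 0; --i)
        if (s[i - 1] >= '0' && s[i - 1] <= '9')
            count++;
    return count;
}

/* Same loop with a postfix decrement; the value of i-- is unused,
   so the iterations are identical. */
size_t count_digits_postfix(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (size_t i = strlen(s); i > 0; i--)
        if (s[i - 1] >= '0' && s[i - 1] <= '9')
            count++;
    return count;
}
```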

  • (nodebb) in reply to dkf

    Every string in .NET is a unique, immutable UTF-16 string, and the way you read and write strings is via text encoders. IBM037 is one of those encoders, which means there has been direct support for EBCDIC in the .NET Framework since .NET 1.0, because that's how strings work: they always get translated, and even UTF-16 text passes through an encoder (Encoding.Unicode).

  • Sole Purpose Of Visit (unregistered) in reply to Scott

    You may or may not be correct as of a far-distant version, but as of C# 7 you are not. (I just tried it.)

    In any case, it's not possible to make a sensible argument in favour of the code presented (not even "efficiency", unless you're talking about some weird requirement to check about a billion strings and then throw the resultant integer away).

    TryParse is efficient enough to be acceptable, and it returns the integer, and it copes with different environments (OK, not so much exponential notation), and you can specify either an unsigned int or a signed int. For that matter, you can even specify that the receiver is a byte (and you may include leading and trailing whitespace). I've tried that, too.
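    C has no TryParse, but a comparable check can be sketched with the standard strtol: it skips leading whitespace, accepts an optional sign, reports where parsing stopped through its end pointer, and sets errno to ERANGE on overflow. The wrapper name try_parse_long is hypothetical, and allowing trailing whitespace is an assumption made here to resemble .NET's default behaviour.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical TryParse-style wrapper around strtol: returns true and
   stores the value only if the whole string (apart from surrounding
   whitespace) is a base-10 integer that fits in a long. */
bool try_parse_long(const char *s, long *out)
{
    assert(s != NULL && out != NULL);
    char *end;
    errno = 0;
    long value = strtol(s, &end, 10);   /* skips leading whitespace, takes a sign */
    if (end == s)
        return false;                   /* no digits were consumed */
    if (errno == ERANGE)
        return false;                   /* out of range for long */
    while (isspace((unsigned char)*end))
        end++;                          /* allow trailing whitespace */
    if (*end != '\0')
        return false;                   /* leftovers such as ".0" or ",000" */
    *out = value;
    return true;
}
```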

    More importantly, the caller doesn't have to waste brain cells trying to figure out what is going on and whether or not it works.

  • Andrew Klossner (unregistered)

    This function says that "123456789012345678901234567890" is an integer, even though it won't fit in even a 64-bit int.

  • LCrawford (unregistered) in reply to Andrew Klossner

    | This function says that "123456789012345678901234567890" is an integer, even though it won't fit in even a 64-bit int.

    This explains it! The function is actually part of HugeNumberLibrary.IsInteger(), in which ulong.TryParse() was not an option.

  • Barry Margolin (github) in reply to MaxiTB

    Unless the value of -- or ++ is being used, the compiler can implement them any way it wants; it doesn't need to distinguish between pre- and post-Xrement (pun intended). In the case of the for loop, it can easily generate the same code to decrement and test i regardless of whether you write i-- or --i.

  • (nodebb) in reply to LCrawford

    https://learn.microsoft.com/en-us/dotnet/api/system.numerics.biginteger.tryparse?view=net-8.0

  • (nodebb) in reply to Barry Margolin

    Yes, as I wrote before, it doesn't matter in .NET. In C++, on the other hand, this optimization is usually not done, and the compiler will call the postfix operator, simply because it can be overloaded, which is not the case in .NET.

  • Felix Palmen (unregistered)

    It might be worth mentioning that C actually requires this property of any encoding; a quote from the standard:

    "In both the source and execution basic character sets, the value of each character after 0 in the above list of decimal digits shall be one greater than the value of the previous." (C11, 5.2.1 Character sets, paragraph 3)

    Given that, it's probably fair to assume no encoding would ever violate that. Of course, still no excuse to reinvent your language ;)

  • Argle (unregistered)

    Minor point: when I taught C/C++ years ago, I'd do something like discussing the string library. My advice to my students was: read the headings of a library. You don't have to memorize them or know the parameters... just know that they exist. It saves a programmer a lot of unnecessary work.

Leave a comment on “Trying Parses”
