• (nodebb)

    Dont' be too hard on them. At least they contributed to an Open Source project to improve its quality!

  • Hanzito (unregistered)

    We can assume the string is '\0' terminated. It's the default, after all. And a strcmp when the string isn't '\0' terminated isn't unsafe, I think. It may crash the application, but not alter memory.

  • (nodebb)

    That looks like AI-generated code.

  • Brian Boorman (unregistered)

    Is there a limit on the length of a URL by the specification? Because I see a 2800 byte buffer of 0x00 that they copy into. I assume that's to "ensure" null termination to make the later string functions safe, but only if it's guaranteed that the input string is 2799 bytes or less in length.

  • (author) in reply to Hanzito

    That's not automatically a safe assumption, but also, we have no guarantee that the null terminator appears in the first 2800 bytes.

  • (author) in reply to AGlezB

    This code is old enough that I don't think anything artificial or anything intelligent was involved in its production.

  • Argle (unregistered) in reply to Brian Boorman

    Yeah, there's a limit. A couple years ago, I was working with a non-programmer who was tweaking a web page that used a library to fill in columns in a table. He tried adding a new column and it failed catastrophically. He was certain he had done it exactly like the others and had wasted two days trying to figure out what he did wrong. I took one look and realized that the whole request was a GET with all parameters in the URL. His new column was the one that broke the camel's back. Fortunately, the library had a version that took a POST and the column requests in the body. A few minutes later I had his fix for him.

  • (nodebb)

    [...] isn't unsafe [...] it may crash the application

    You and I have very different definitions of the word "unsafe".

  • (nodebb)

    Even if you don't know about strstr you can at least use

    if (proto_str[j] == '=' && proto_str[j+1] == 'V' && proto_str[j+2] == 'V')
    

    instead of all those nested if statements.

  • Sauron (unregistered)

    Looks like the code was written by an intern randomly thrown at the project. Here's probably how it played out.

    Manager: "We need to tell apart the DRM protocol in URLs."

    Lead dev: "Nobody is available, the VP is ordered that all manpower be used to make the buttons rounder, and change add an animated opacity transition to the dropdown menus."

    Manager: "What about the Kerbleckistani contractors?"

    Lead dev: "They have no Internet today, their government shut it down because of local protests for workers' rights."

    Manager: "What about the consultants?"

    Lead dev: "Our legal team is currently suing them because of their responsibility in the latest data breach we underwent, so they won't work for us anymore."

    Manager: "Shit! What about the intern?"

    Lead dev: "That won't do. This is a C++ project. Our intern has studied Python for 2 months in university, and been ordered to serve coffee and to do other menial work by the VP's underlings for the entire 5 months he's been interning here."

    Manager: "Assign the task to the intern this instant! This software is more valuable than your job! This shit must be shipped! Understood?"

    Lead dev: "Aye, Sir!"

    Lead dev, to himself: "I should quit this madhouse job before somebody runs amok."

  • (nodebb) in reply to Sauron

    Lead dev, to himself: "I should quit this madhouse job before somebody runs amok."

    More likely:

    Lead dev, to himself: "I should quit this madhouse job before I run amok."

  • Alistair (unregistered)

    Does not check the name of the parameter, so it will pick up other parameters whose value coincidentally starts with 'VW','VUDU' or 'NFLX'.

  • OldCoder (unregistered) in reply to Barry Margolin

    You haven't noticed that they do a string copy which should probably have been a toupper(). Almost all http URLs that don't come from the Far East are lower case, yet every subsequent test is for an upper case character!

  • Charles (unregistered)

    While the code is undoubtably bad, the criticism is also bad.

    It is unreasonable to claim that this code is relying on strings being properly nul terminated because by definition a string is nul terminated. If you're using char pointers for things which are not strings, then that is an exotic use which has different ways to fail. It is true that there should be no copying into a local buffer, and that that looks unsafe (it may merely be dangerous as we don't know if the caller guarantees the length is short enough).

    The core problem is not that we're passing pure pointers to char, as that is perfectly standard, idiomatic C. It's that the code was written by someone who is bad at programming and doesn't know what they are doing. There is no magical remedy for this. If you somehow manage to get them to write 'safe' code,. it'll just do the wrong thing. And if that's ok, then don't bother to employ them, just generate your code at random.

    strnstr() is not a good solution. The code is looking for a protocol, which always starts just after an '=', so it should use strchr() to find the '=' and then check for each of the protocols. There is no need to re-scan the string every time. Note that that changes the behaviour - but in a way which I suspect is correct. Should "thing=hello=WV" be classed as Widevine? I am suspicious because finding one protocol stops the search for another, so "thing=VUDU&another&finally=WV" gets classified as Widevine when it probably should be VUDU. There are lots of other cases which might be wrong. What about "thing=WVXY"? Why should that be Widevine? And since there's a mention of HTTP, has the input been properly extracted from a URL, or is there some equally bad parsing code. Since the string is called a 'filename', maybe that means that part of the string is user-supplied and might contain arbitrary text.

  • xtal256 (unregistered) in reply to AGlezB

    "That looks like AI-generated code."

    Even GPT 3 (pre-ChatGPT) wasn't dumb enough to generate this code.

  • (nodebb) in reply to xtal256

    Since it's open source, ChatGPT was probably trained with it.

  • (nodebb)

    we can safely assume the C standard library exists for their target, which means strnstr was right there.

    Actually, it's probably not there if this is supposed to be able to run on an embedded platform, or be reasonable portable, which I'd expect of such code. For example neither the Arduino nor the ESP32 IDF define strnstr or even strstr for that matter. And of course the legacy MS C library has always been a mixed bag of stuff either not implemented, or with their own idiosyncrasies.

    A lot of people are just spoiled by the high expectations set by the GNU C library.

  • (nodebb) in reply to Ralf

    But if that's the case, you write your own (inlineable) strstr() rather than this awful mess.

  • seebs (unregistered)
    Comment held for moderation.
  • TF (unregistered)
    Comment held for moderation.
  • TF (unregistered) in reply to Ralf

    "if this is supposed to be able to run on an embedded platform"

    Then why would you put a 2.8 kilobyte variable on the stack?? (for no good reason either)

  • Katrina K (unregistered) in reply to Domin Abbus
    Comment held for moderation.
  • nijave (unregistered) in reply to Charles

    I think this also fails for overlapping substrings. Given strings like ==NFLX since the loop increment will stack on top of the increments inside the loop (it skips over non matching characters that could be the start of a different match)

Leave a comment on “Contains a Substring”

Log In or post as a guest

Replying to comment #:

« Return to Article