The Daily WTF: Curious Perversions in Information Technology

Domin Abbus · 2024-03-25 Reply Admin

Dont' be too hard on them. At least they contributed to an Open Source project to improve its quality!

2024-03-25 Reply Admin

We can assume the string is '\0' terminated. It's the default, after all. And a strcmp when the string isn't '\0' terminated isn't unsafe, I think. It may crash the application, but not alter memory.

AGlezB · 2024-03-25 Reply Admin

That looks like AI-generated code.

2024-03-25 Reply Admin

Is there a limit on the length of a URL by the specification? Because I see a 2800 byte buffer of 0x00 that they copy into. I assume that's to "ensure" null termination to make the later string functions safe, but only if it's guaranteed that the input string is 2799 bytes or less in length.

Remy Porter · 2024-03-25 Reply Admin

That's not automatically a safe assumption, but also, we have no guarantee that the null terminator appears in the first 2800 bytes.

Remy Porter · 2024-03-25 Reply Admin

This code is old enough that I don't think anything artificial or anything intelligent was involved in its production.

2024-03-25 Reply Admin

Yeah, there's a limit. A couple years ago, I was working with a non-programmer who was tweaking a web page that used a library to fill in columns in a table. He tried adding a new column and it failed catastrophically. He was certain he had done it exactly like the others and had wasted two days trying to figure out what he did wrong. I took one look and realized that the whole request was a GET with all parameters in the URL. His new column was the one that broke the camel's back. Fortunately, the library had a version that took a POST and the column requests in the body. A few minutes later I had his fix for him.

dpm · 2024-03-25 Reply Admin

[...] isn't unsafe [...] it may crash the application

You and I have very different definitions of the word "unsafe".

Barry Margolin · 2024-03-25 Reply Admin

Even if you don't know about strstr you can at least use

if (proto_str[j] == '=' && proto_str[j+1] == 'V' && proto_str[j+2] == 'V')

instead of all those nested if statements.

2024-03-25 Reply Admin

Looks like the code was written by an intern randomly thrown at the project. Here's probably how it played out.

Manager: "We need to tell apart the DRM protocol in URLs."

Lead dev: "Nobody is available, the VP is ordered that all manpower be used to make the buttons rounder, and change add an animated opacity transition to the dropdown menus."

Manager: "What about the Kerbleckistani contractors?"

Lead dev: "They have no Internet today, their government shut it down because of local protests for workers' rights."

Manager: "What about the consultants?"

Lead dev: "Our legal team is currently suing them because of their responsibility in the latest data breach we underwent, so they won't work for us anymore."

Manager: "Shit! What about the intern?"

Lead dev: "That won't do. This is a C++ project. Our intern has studied Python for 2 months in university, and been ordered to serve coffee and to do other menial work by the VP's underlings for the entire 5 months he's been interning here."

Manager: "Assign the task to the intern this instant! This software is more valuable than your job! This shit must be shipped! Understood?"

Lead dev: "Aye, Sir!"

Lead dev, to himself: "I should quit this madhouse job before somebody runs amok."

Steve_The_Cynic · 2024-03-25 Reply Admin

Lead dev, to himself: "I should quit this madhouse job before somebody runs amok."

More likely:

Lead dev, to himself: "I should quit this madhouse job before I run amok."

2024-03-25 Reply Admin

Does not check the name of the parameter, so it will pick up other parameters whose value coincidentally starts with 'VW','VUDU' or 'NFLX'.

2024-03-25 Reply Admin

You haven't noticed that they do a string copy which should probably have been a toupper(). Almost all http URLs that don't come from the Far East are lower case, yet every subsequent test is for an upper case character!

2024-03-25 Reply Admin

While the code is undoubtably bad, the criticism is also bad.

It is unreasonable to claim that this code is relying on strings being properly nul terminated because by definition a string is nul terminated. If you're using char pointers for things which are not strings, then that is an exotic use which has different ways to fail. It is true that there should be no copying into a local buffer, and that that looks unsafe (it may merely be dangerous as we don't know if the caller guarantees the length is short enough).

The core problem is not that we're passing pure pointers to char, as that is perfectly standard, idiomatic C. It's that the code was written by someone who is bad at programming and doesn't know what they are doing. There is no magical remedy for this. If you somehow manage to get them to write 'safe' code,. it'll just do the wrong thing. And if that's ok, then don't bother to employ them, just generate your code at random.

strnstr() is not a good solution. The code is looking for a protocol, which always starts just after an '=', so it should use strchr() to find the '=' and then check for each of the protocols. There is no need to re-scan the string every time. Note that that changes the behaviour - but in a way which I suspect is correct. Should "thing=hello=WV" be classed as Widevine? I am suspicious because finding one protocol stops the search for another, so "thing=VUDU&another&finally=WV" gets classified as Widevine when it probably should be VUDU. There are lots of other cases which might be wrong. What about "thing=WVXY"? Why should that be Widevine? And since there's a mention of HTTP, has the input been properly extracted from a URL, or is there some equally bad parsing code. Since the string is called a 'filename', maybe that means that part of the string is user-supplied and might contain arbitrary text.

2024-03-25 Reply Admin

"That looks like AI-generated code."

Even GPT 3 (pre-ChatGPT) wasn't dumb enough to generate this code.

Ralf · 2024-03-26 Reply Admin

Since it's open source, ChatGPT was probably trained with it.

Ralf · 2024-03-26 Reply Admin

we can safely assume the C standard library exists for their target, which means strnstr was right there.

Actually, it's probably not there if this is supposed to be able to run on an embedded platform, or be reasonable portable, which I'd expect of such code. For example neither the Arduino nor the ESP32 IDF define strnstr or even strstr for that matter. And of course the legacy MS C library has always been a mixed bag of stuff either not implemented, or with their own idiosyncrasies.

A lot of people are just spoiled by the high expectations set by the GNU C library.

dkf · 2024-03-28 Reply Admin

But if that's the case, you write your own (inlineable) strstr() rather than this awful mess.

2024-03-29 Reply Admin

i don't think strnstr is even present in glibc, it's BSD-only, and I think maybe even just FreeBSD. it's not just "not standard", it's a fairly unusual extension.

it's notable that the nested ifs continue advancing the loop variable, so for instance, this won't spot "==WV" as containing "=WV", because the check for the W pushes the loop counter past the = anyway.

the assumption that a string is actually a string, while possibly a risk, isn't as big a problem as you might think. it's not nearly as large a problem as the assumption that the name fits in 2800 characters.

also worth noting that "name" appears to be the URL's scheme, and that it won't work with unusual protocols no one's ever seen, such as "https".

2024-03-29 Reply Admin

I've encountered so much low level/embedded C (or semi-C++) code (usually examples) that seem like they were written by someone who read a few pages of Kernighan and Ritchie 40 years ago and never managed to discover that there was a C standard library... Never mind modern C++.

2024-03-29 Reply Admin

"if this is supposed to be able to run on an embedded platform"

Then why would you put a 2.8 kilobyte variable on the stack?? (for no good reason either)

2024-03-30 Reply Admin

Reading the story again, I am unsure they were contributing to OS, just using it with changes for themselves

2024-04-01 Reply Admin

I think this also fails for overlapping substrings. Given strings like ==NFLX since the loop increment will stack on top of the increments inside the loop (it skips over non matching characters that could be the start of a different match)

Contains a Substring

Leave a comment on “Contains a Substring”