Nancy was recently handed a pile of "modern" PHP that weighs in at tens of thousands of lines of code.
This is how every query is executed:
function getFoo($bar) {
    $bar = my_escape($bar);
    $sql = "
        select *
        from foo
        where bar = '" . $bar . "'
    ";
    return do_query($sql);
}
Yes, this is a SQL injection vulnerability. No, there is no part of the application which uses parameterized queries. But wait, they call my_escape. That must be escaping the input so it can be used safely as a query param, right?
function my_escape($data)
{
    if ( !isset($data)) {
        return '';
    }
    if ( is_numeric($data) ) {
        return $data;
    }
    if(empty(trim($data))) {
        return '';
    }
    $non_displayables = array(
        '/%0[0-8bcef]/', // url encoded 00-08, 11, 12, 14, 15
        '/%1[0-9a-f]/', // url encoded 16-31
        '/[\x00-\x08]/', // 00-08
        '/\x0b/', // 11
        '/\x0c/', // 12
        '/[\x0e-\x1f]/' // 14-31
    );
    foreach ( $non_displayables as $regex ) {
        $data = preg_replace( $regex, '', $data );
    }
    $data = str_replace("'", "''", $data );
    return $data;
}
So, this strips off URL-encoded control characters, as well as the raw control characters themselves. It, ah, doesn't strip off any of the characters which you might want to deal with if you're trying to protect against SQL injection; the only nod in that direction is doubling single quotes at the very end. These regexes are case sensitive, too, so while %0b will get stripped, %0B won't. Which is also irrelevant, because by the time you're passing things into the database, they've typically already been percent-decoded.
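To see the case-sensitivity problem in isolation, here's a minimal sketch that runs the first pattern from my_escape against a lowercase and an uppercase sequence. The sample strings and variable names are made up for illustration, and rawurldecode is only a stand-in for whatever decoding the real application's request handling does, which is an assumption on my part.

```php
<?php
// First pattern from my_escape, copied verbatim.
$pattern = '/%0[0-8bcef]/';

// Lowercase hex digit: the sequence matches and is removed.
$lower = preg_replace($pattern, '', 'x%0by');

// Uppercase hex digit: no match, the sequence passes straight through.
$upper = preg_replace($pattern, '', 'x%0By');

// Assumption: input is percent-decoded before it gets anywhere near a query.
// After decoding there is no "%0B" left to match, just a raw 0x0b byte.
$decoded = rawurldecode('x%0By');

var_dump($lower);            // string(2) "xy"
var_dump($upper);            // string(5) "x%0By"
var_dump(bin2hex($decoded)); // string(6) "780b79"
```

Once the input is decoded, only the raw-byte patterns have anything to match, so the two percent-sequence patterns never get a chance to do anything.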
For extra fun, because the regexes are applied one after another in a loop, the function can remove more than intended: an earlier regex can strip characters in a way that leaves behind something a later regex picks up.
For example, "A%%001%0cfB" turns into "A%1fB" after the first pass, then "AB" after the second one. But a slightly different string, like "A%%100%1cfB", turns into "A%0fB" and stays that way, because the regex that would have caught %0f has already run.
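If you want to check those two strings yourself, here's a small sketch that lifts just the regex loop out of my_escape. The wrapper name strip_non_displayables is hypothetical; the patterns and their order are copied from the function above.

```php
<?php
// Hypothetical helper: only the regex loop from my_escape, nothing else.
function strip_non_displayables(string $data): string
{
    $non_displayables = array(
        '/%0[0-8bcef]/',  // url encoded 00-08, 11, 12, 14, 15
        '/%1[0-9a-f]/',   // url encoded 16-31
        '/[\x00-\x08]/',  // 00-08
        '/\x0b/',         // 11
        '/\x0c/',         // 12
        '/[\x0e-\x1f]/',  // 14-31
    );
    // Each pattern sees the output of the previous one, so order matters.
    foreach ($non_displayables as $regex) {
        $data = preg_replace($regex, '', $data);
    }
    return $data;
}

// Removing "%00" and "%0c" leaves "%1f", which the second pattern then eats.
echo strip_non_displayables('A%%001%0cfB'), "\n"; // AB

// Removing "%10" and "%1c" leaves "%0f", but the first pattern has already run.
echo strip_non_displayables('A%%100%1cfB'), "\n"; // A%0fB
```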
Now, those are contrived examples, but they're an interesting illustration of poorly thought-out code. Looking at this, my guess is that someone knew they needed to sanitize their database inputs, but didn't know exactly how. They saw a my_escape function in the codebase, and said to themselves, "Ah, I can escape my inputs!" without checking what my_escape actually did.
This code has been running in production for a few years. Nancy would like to change how it works, but it's not currently "an organizational priority" and constitutes a "high risk change to a stable system".
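For contrast, one shape such a change could take is a parameterized query. This is only a sketch under assumptions: nothing in the submission says which database or driver the application uses, so PDO and the in-memory SQLite database here are stand-ins, and the extra $db parameter and the sample rows are invented to make the example runnable.

```php
<?php
// Hypothetical rewrite of getFoo: the value travels as a bound parameter,
// so it is never spliced into the SQL text at all.
function getFoo(PDO $db, string $bar): array
{
    $stmt = $db->prepare('select * from foo where bar = :bar');
    $stmt->execute([':bar' => $bar]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Assumption: an in-memory SQLite database, used only for demonstration.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('create table foo (id integer primary key, bar text)');
$db->exec("insert into foo (bar) values ('plain')");
$db->exec("insert into foo (bar) values ('it''s quoted')");

// A value containing a quote is matched as data.
$rows = getFoo($db, "it's quoted");

// A classic injection attempt is just an odd string that matches no row.
$none = getFoo($db, "x' or '1'='1");

echo count($rows), "\n";
echo count($none), "\n";
```

With this approach there's no escaping function to get wrong, because the driver keeps the query text and the data separate.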