Several months ago, Rebecca's predecessor wrote some Perl code to traverse a JSON-esque data-structure. It was an hash map that could contain values, arrays, or more hashes.
Said predecessor wrote the code, had it reviewed, and tested it. It rolled out in production, and for three months, everything was fine. Three months was also long enough for predecessor to take a new job elsewhere, which left Rebecca as the sole support when it broke.
sub spider_reference {
my ( $reference ) = @_;
if ( $reference =~ m/HASH/ ) {
// spider into the hash %{$reference}
}
else if ( $reference =~ m/ARRAY/ ) {
// iterate over the array @{$reference}
}
}
When Rebecca first saw this code, she thought "I didn't know that you could do that with regexes." After about five seconds more, she realized, no you can't do that with regexes.
This function is accepting an argument that is a field in our JSON-esque structure. If it's an array or a hash map, we need to "spider" into it to do more processing. If it's a scalar value, we just move on.
So how does this work? We take that argument and do a regular expression check on it. For a reference variable- an array or a map- that operation doesn't make sense, so Perl makes it make sense by doing a conversion to string. That string is in in the form HASH (0xDEADBEEF)
, or ARRAY (0x0DEFACED)
- the type and address.
So this code simply does that conversion and checks to see if the resulting string contains the expected type tag, and if it does, it does the required processing. There's just one problem that is likely becoming obvious: this function is also running on scalar values, which may be strings, and thus may contain HASH
or ARRAY
- which is exactly what happened. Some base64 encoded data tricked this code into processing a string as a hash, which crashed the program.
This code is a case of "once you have a hammer, every problem looks like a nail". Using regular expressions to sniff out types is a mistake, but a mistake that our developer got away with for a surprisingly long time. They could have gotten away with it longer if they made the regex check the start of the string only, /^HASH/
, but that wouldn't have truly fixed the problem.
Rebecca fixed it by reading the docs and finding the correct solution- the ref
function, which returns a string describing a reference's type.