In an office cubicle, there worked a programmer. Not a nasty, dirty cubicle in the basement, nor a dry, bare cubicle in a vast faceless farm: it was a web startup cubicle, and that meant comfort. It was decorated brightly, with plenty of monitors and cables, a pantry full of all manner of crisps known to programmer-kind, and posters of films that were adaptations of a single book, broken needlessly into multiple parts.

The programmer was a well-to-do programmer, and his name was Anders. He was part of a coding team, and their team had worked on The System since time out of mind. It was a very respectable codebase, because it never did anything unexpected: you could tell what any function's output would be without the bother of calling it. Unit tests were a thing to be done by other folks!

One good morning, Anders the Programmer was in his cubicle, minding his own interface to one of the company's mobile webservices, when he was suddenly beset upon by uninvited Replacement Characters! They were all over his output, devouring all of his accented characters. None of the other programmers had ever heard of such a thing. Some of them even didn't even know a Replacement Character was the name of the weird square-or-diamond character that results from incorrect or ambiguous character encoding, even though that was something every programmer should know.

The webservice's sole purpose was to return data encoded in UTF8, its raison d'être (or raison d'�tre as the case may be). The service had been written by another programmer, long ago during the second major release of the system. It was now near the end of the third major release, and the programmer had long since passed beyond the LinkedIn veil. No other programmer in the history of The System had encountered this bug before, for Anders' data was the first that had dared to adventure beyond ASCII. All his coworkers felt they should just let sleeping bugs lie, smugly declaring that it had always worked just fine for them.

Anders cracked open the codebase, and tracked down the encoding/decoding function, hoping it would give some word of explanation as to what had gone through the previous programmer's mind. What he found was a complete lack of understanding of UTF8 made manifest:

function _utf8Encode(&$arr){ 
  for($i=0;$i<(count($arr['parameters']));$i++){ 
    $arr['parameters'][$i]= $arr['parameters'][$i]; 
  } 
} 
function _utf8Decode(&$arr){ 
  for($i=0;$i<count($arr['parameters']);$i++){ 
    $arr['parameters'][$i]= $arr['parameters'][$i]; 
  } 
}

Anders commented out the functions, keeping them for posterity, then rewrote them from scratch. He took to writing extensive comments around these functions to help the other programmers understand-- he thought of titling the comment block "There and Back Again, A Datum's Encoding" . Though many shook their heads and touched their foreheads and said "Poor Anders!" and though few believed any of his tales of codepages and document types, he remained very happy to the days he updated his resume.

[Advertisement] BuildMaster allows you to create a self-service release management platform that allows different teams to manage their applications. Explore how!