Matt H. and company were adding support for Cyrillic script to their PDF invoice generator when they discovered that none of the characters would print. The script used DOMPDF to convert the HTML invoices to PDF, and font handling across scripts can get a bit hairy, so it was not really a surprise. However, as Matt was digging through the code that generated the invoices, he found this little gem:

<?php
...
$body = mb_convert_encoding($body, 'HTML-ENTITIES', 'UTF-8');
$body = preg_replace_callback('/(&#)([0-9]{4,})(;)/', function($matches) {
    $code = $matches[2];
    if ($code<=19968 && $code>=40895) return $matches[0]; // not CJK
    return '<span style="font-family:kochi-gothic">'.$matches[0].'</spank>';
}, $body);
$body = mb_convert_encoding($body, 'UTF-8', 'HTML-ENTITIES');
...
?>

This snippet attempts to specify a Japanese font for all Chinese, Japanese or Korean characters. But the condition is messed up: no number is both less than or equal to 19968 and greater than or equal to 40895, so the "not CJK" early return never fires. The font is therefore applied to every numeric HTML entity the regex matches, which means anything with four or more digits, or everything above &#999 (ϧ). This is kind of a good thing, because large portions of the Japanese character set are not in the range 19968-40895. And Matt was quite impressed with DOMPDF's spanking-good handling of unclosed tags.
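For the curious, here is a minimal sketch of what the range check was presumably meant to do. It is not the fix Matt's team shipped (the story does not say what that was), and the helper names `isInCjkRange` and `wrapCjkEntities` are made up for illustration. It assumes the text has already been converted to numeric HTML entities, as in the original, and it keeps the original's 19968-40895 range, so kana and other Japanese characters outside that range would still be left alone.

```php
<?php
// Hypothetical helper (not from the original code base): true only when the
// code point lies inside the 19968-40895 range the original comment calls CJK.
function isInCjkRange(int $code): bool
{
    return $code >= 19968 && $code <= 40895;
}

// Hypothetical helper: wraps numeric entities inside the range in a span that
// names the font from the original snippet, and leaves all others untouched.
function wrapCjkEntities(string $html): string
{
    return preg_replace_callback(
        '/&#([0-9]+);/',
        function (array $m): string {
            if (!isInCjkRange((int) $m[1])) {
                return $m[0]; // not in range: return the entity unchanged
            }
            return '<span style="font-family:kochi-gothic">' . $m[0] . '</span>';
        },
        $html
    );
}

// &#1046; is a Cyrillic letter, &#26085; is a CJK ideograph.
echo wrapCjkEntities('&#1046; &#26085;'), "\n";
// prints: &#1046; <span style="font-family:kochi-gothic">&#26085;</span>
```

With the comparison operators the right way around, the Cyrillic entity passes through untouched and only the ideograph gets the Japanese font.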