Chinese, Korean, Japanese - It's All the Same

snoofle

After surviving 35 years, dozens of languages, hundreds of projects, thousands of meetings and millions of LOC, I now teach the basics to the computer-phobic

Matt H. and company were recently adding support for Cyrillic script to their PDF invoice generator when they discovered that none of the characters would print. The generator used DOMPDF to convert the HTML invoices to PDF, and font handling across scripts can get a bit hairy, so it was not really a surprise. However, as he was digging through the code that generated the invoices, he found this little gem:

<?php
...
$body = mb_convert_encoding($body, 'HTML-ENTITIES', 'UTF-8');
$body = preg_replace_callback('/(&#)([0-9]{4,})(;)/', function($matches) {
    $code = $matches[2];
    if ($code<=19968 && $code>=40895) return $matches[0]; // not CJK
    return '<span style="font-family:kochi-gothic">'.$matches[0].'</spank>';
}, $body);
$body = mb_convert_encoding($body, 'UTF-8', 'HTML-ENTITIES');
...
?>

This snippet tries to force a Japanese font (kochi-gothic) onto every Chinese, Korean, or Japanese character. But the condition is botched: no number is both at most 19968 and at least 40895, so the "not CJK" early return never fires, and the font actually gets applied to every numeric HTML entity above &#999 (ϧ). This is kind of a good thing, because large portions of the Japanese character set, such as the hiragana and katakana at 12352-12543, are not in the range 19968-40895. On the other hand, Matt was quite impressed with DOMPDF's spanking-good handling of unclosed tags.
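For comparison, here is a minimal sketch of what the check was presumably meant to do. It is a guess at the intent, not Matt's actual fix: the function names originalSkipsSpan, intendedSkipsSpan and wrapCjkEntities are made up for illustration, and the range is simply the one from the snippet.

```php
<?php
// Range copied from the original snippet: 19968 (U+4E00) to 40895 (U+9FBF).
const CJK_FIRST = 19968;
const CJK_LAST  = 40895;

// The test as originally written. No integer is both <= 19968 and >= 40895,
// so this returns false for every input and the span is always added.
function originalSkipsSpan(int $code): bool
{
    return $code <= CJK_FIRST && $code >= CJK_LAST;
}

// Assumed intent: skip the span when the code point lies outside the range.
function intendedSkipsSpan(int $code): bool
{
    return $code < CJK_FIRST || $code > CJK_LAST;
}

// Hypothetical helper: wrap only in-range numeric entities, and close the
// tag with </span> this time.
function wrapCjkEntities(string $html): string
{
    $result = preg_replace_callback('/&#([0-9]{4,});/', function (array $m): string {
        $code = (int) $m[1];
        if (intendedSkipsSpan($code)) {
            return $m[0]; // outside the targeted range, leave untouched
        }
        return '<span style="font-family:kochi-gothic">' . $m[0] . '</span>';
    }, $html);

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $html;
}
```

With this version, &#26085; (日) gets the span, while &#1078; (Cyrillic ж) and &#12354; (hiragana あ) are left alone. That last case is exactly why the broken condition was doing the Japanese text a favor.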
