Eliza had a co-worker who had a problem. There were users with names like “René”. Other users, using standard keyboards, wanted to search for “René”, but couldn’t be bothered to figure out how to type that accent, so they just searched for “Rene”.

The co-worker came up with this solution:

SELECT * FROM table WHERE UPPER(trim(translate(first_name, 'ÉÊËÈÆ+ÂÀÄÁÇÙÛÜÔÖÏΟéêëèâàäáçùûüûôöîïÿ', 'EEEEEEAAAACUUUOOIIYeeeeaaaacuuuuooiiy'))) LIKE (translate('FIRSTNAMESEARCHSTRING%', 'ÉÊËÈÆ+ÂÀÄÁÇÙÛÜÔÖÏΟéêëèâàäáçùûüûôöîïÿ', 'EEEEEEAAAACUUUOOIIYeeeeaaaacuuuuooiiy')) AND UPPER(trim(translate(last_name, 'ÉÊËÈÆ+ÂÀÄÁÇÙÛÜÔÖÏΟéêëèâàäáçùûüûôöîïÿ', 'EEEEEEAAAACUUUOOIIYeeeeaaaacuuuuooiiy'))) LIKE (translate('LASTNAMESEARCHSTRING%', 'ÉÊËÈÆ+ÂÀÄÁÇÙÛÜÔÖÏΟéêëèâàäáçùûüûôöîïÿ', 'EEEEEEAAAACUUUOOIIYeeeeaaaacuuuuooiiy'))

Eliza noticed it because this query was extremely slow- and no surprise. translate is unavoidably a character by character operation. It was taking 30 seconds to search their database with a nine character last name that didn’t even contain any of the replaced characters. And of course, in most collations and character sets, É and upper('é') are going to be the same character, making half the replacements completely unnecessary.

Speaking of collations, they’re a tool that’s been standardized to make this really complicated problem of determining which characters are the same, or what order is “alphabetical” easy for programmers to solve. Every RDBMS supports them, and by specifying a collation. You can set the collation either on the table, the column, the individual query you’re running against, or for the session of the query, using the standard SQL command COLLATE.

Eliza added some code to issue a COLLATE utf8_general_ci; command before running the query, removed the translates, and watched the query execute in 1/30th of a second, instead of 30 seconds.

[Advertisement] BuildMaster allows you to create a self-service release management platform that allows different teams to manage their applications. Explore how!