Irregular Regular Expressions
by in Feature Articles on 2013-04-30
Marcus A. worked for a man who believed that regular expressions were the be-all and end-all, and that they could trivially solve every data problem that might possibly arise. Their code base was riddled with regular expression transformations that would reduce most developers to tears. This manager also believed that anything that could be explained could be implemented more cheaply offshore.
Their main application contained a raw text field that held comments from customers. Someone got the idea that these comments could be mined and used for business purposes. However, to do this with a free-format text field, the non-standardized words and phrases used by humans would need to be cajoled into something more easily processed. To this end, Marcus was tasked with managing an outsourced effort to standardize this data. The input would be a Customer Data table with a comments column. The output would be the same column, but with abbreviations, acronyms, etc. converted to standardized text.
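For a sense of what the task amounts to, here is a minimal sketch of one reasonable way to do it in Python. It is not the code from the story. The abbreviation list and the function name `standardize_comment` are hypothetical, since the article never shows the real mapping, and the sketch assumes whole-word, case-insensitive matching is what the business wanted.

```python
import re

# Hypothetical abbreviation map; the real list is not given in the article.
ABBREVIATIONS = {
    "asap": "as soon as possible",
    "acct": "account",
    "cust": "customer",
}

# Build a single alternation, longest keys first, wrapped in word boundaries
# so that "acct" is replaced but "accts" or "custom" are left alone.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(key) for key in sorted(ABBREVIATIONS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def standardize_comment(comment: str) -> str:
    """Replace each known abbreviation in one comment with its standard text."""
    return _PATTERN.sub(lambda m: ABBREVIATIONS[m.group(1).lower()], comment)
```

Applied to every row of the comments column, this makes one pass per comment with one compiled pattern. Keeping the mapping in a dictionary means that adding an abbreviation is a data change, not another regular expression.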