One of the major goals of many software development teams is to take tedious, boring, simplistic manual tasks and automate them. An entire data entry team can be replaced by a single well-written application, saving the company money, greatly improving processing time, and potentially reducing errors.

That is, if it’s done correctly.

Peter G. worked for a state government. One of his department’s tasks involved processing carbon copies of forms for most of the state’s residents. To save costs, improve processing time, and reduce the amount of manual data entry they had to perform, the department decided to automate the process and use optical character recognition (OCR) to scan in the carbon copies and convert the handwritten data into text which was eventually entered into a database.

A pile of paperwork on a desk, with an old style phone and a stream of light artistically highlighting the paper. By By Aaron Logan
[CC BY 2.5], via Wikimedia Commons

The software was written and the department received boxes and boxes and boxes worth of the carbon copy paper forms. The printer had a very long lead time, so they ordered their entire supply of forms for the state for the next year. There were so many boxes that Peter joked about building a castle with them.

Then the system went live. And it didn’t work, at all. Something was wrong with the OCR software and Peter was pulled into the project to help find a fix.

While researching the project history, he found that much of the data on the paper forms wasn’t required, and the decision was made to print those boxes in a different, very specific color. During processing, their custom OCR software would ignore that color, blanking out the box and removing the extraneous information before it was unnecessarily entered into the system. Since it still needed to be visible, but wasn’t important, they chose, with the help of their printer, Pantone 5507.

So he filled out a sample form for one “Homer J. Simpson” and scanned it to see what was meant by “The system doesn’t work.” The system briefly churned and created a record in the test database for his form, but when he inspected the record, it was missing the mandatory unique ID. This ID came from the paper form and was comparable to a license number or Social Security Number, and was absolutely required for the data to be usable.

He filled out a couple more forms in case the system was having trouble understanding his handwriting, but they came out the same way. No unique ID.

He scratched his head and examined the paper forms some more. Eventually, he realized the issue. The box for the unique ID was considered “important” but not “something for users to interact with”, and thus was de-emphasized, and prrinted printed in that different, very specific color that the OCR software ignored: Pantone 5507. So the ID was blanked out and ignored during scanning.

Being a competent developer, Peter quickly came up with a plan to add a step to the task. After scanning, but before handing off to the OCR task, a new task would do a simple color-based find-and-replace within a region of the scan to correct the color of the ID field so it wouldn’t be blanked out.

“No, we don’t have time or money for that,” his manager explained to him. “I’ll have the offshore guys fix it for next year. For now, just cobble something together so the original scan stays with the record.”

The department hired a team of interns to perform manual data entry for the year, whose sole task was to sift through the database records, pull up the corresponding scan, and read and type in the single unique ID field that the OCR software ignored. Meanwhile, the department promised that something bigger, better, and fancier was on the way for next year…

[Advertisement] BuildMaster allows you to create a self-service release management platform that allows different teams to manage their applications. Explore how!