Optical Character Recognition (OCR) is not a new technology, but as it has become more accurate and more affordable, it has changed the course of document capture and helped accelerate the digital age. In BLI's October 1989 issue of Update, we covered OCR and noted its limited font-style diversity and lack of automation as drawbacks. Current OCR programs can recognize a wide variety of font styles and sizes at a significantly lower price. In the 1989 article, Caere's OmniPage (now owned by Nuance Communications, Inc.) cost $2,495 for an MS-DOS-compatible version, and that was considered affordable at the time. Today, dedicated PC-based OCR software ranges in price from $100 to about $600 for a single-user license. Some desktop scanners even include basic searchable-PDF creation as a standard feature of their bundled scanning utilities.
OCR works by matching the characters in a scanned document against the character patterns stored in an OCR library. Some OCR programs can also recognize document structures, such as those found in forms and tables. Once the OCR engine has read and translated it, the text in an image file becomes searchable and, depending on the output file format, editable. Some document capture programs also use OCR to index scanned documents. In those cases, the scanned documents themselves may not be searchable or editable, but the text recognized in them can be used for searching and indexing. This approach reduces end-user intervention by automating what used to be a manual data entry process. And today's faster processors let users convert virtually any scanned image file into a searchable digital document in very little time.
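To illustrate the matching idea described above, the following Python sketch compares each scanned character bitmap with every template in a tiny glyph library, picks the closest match, and then indexes the recognized text for keyword search. It is a toy under stated assumptions: characters are already segmented into 5×3 binary bitmaps, and the library contents and the names `LIBRARY`, `hamming`, `recognize_glyph`, `recognize` and `build_index` are hypothetical. Commercial engines such as those from ABBYY, Nuance and IRIS use considerably more sophisticated recognition methods than raw pixel comparison.

```python
from collections import defaultdict

# Hypothetical glyph library: each character maps to a 5-row x 3-column
# binary bitmap ("1" = ink, "0" = background). Real engines store far richer models.
LIBRARY = {
    "I": ("111", "010", "010", "010", "111"),
    "L": ("100", "100", "100", "100", "111"),
    "T": ("111", "010", "010", "010", "010"),
    "O": ("111", "101", "101", "101", "111"),
}


def hamming(a, b):
    """Count the pixels that differ between two bitmaps of the same size."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def recognize_glyph(bitmap):
    """Return the library character whose template differs from `bitmap` in the fewest pixels."""
    return min(LIBRARY, key=lambda ch: hamming(LIBRARY[ch], bitmap))


def recognize(glyphs):
    """Translate a sequence of already-segmented glyph bitmaps into a text string."""
    return "".join(recognize_glyph(g) for g in glyphs)


def build_index(docs):
    """Map each lowercase word to the set of document IDs whose recognized text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


if __name__ == "__main__":
    # A scanned "T" with one stray pixel still matches the "T" template best.
    noisy_t = ("111", "011", "010", "010", "010")
    text = recognize([LIBRARY["L"], LIBRARY["O"], noisy_t])
    print(text)  # prints LOT
    index = build_index({"scan-001": text})
    print(index["lot"])  # prints {'scan-001'}
```

Comparing a glyph against every stored template is simple but brittle: a different font, size or scan resolution can leave a character closer to the wrong template, which helps explain why early OCR struggled with font variety.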
Over the last few years, BLI has tested the leading dedicated OCR engines from ABBYY, Nuance and IRIS; document capture solutions with OCR functionality, such as KODAK Capture Pro; and OCR accuracy as part of its scanner hardware evaluations. Although overall character accuracy has not changed significantly over the past few years, some dedicated OCR solutions can now recreate the formatting of full-page documents, recognize images captured with cameras and smartphones, and convert those images into searchable digital files that can be output for e-readers or even MP3 players. There is no doubt that the technology has come a long way in the past 20 years, and BLI has been there to track its progress. For a blast from the past, read the original 1989 article. Navigate to BLI's Solutions Center for an in-depth look at the speed, accuracy and general usability of current dedicated OCR solutions. Or take a gander at any BLI scanner lab test report to see how scanner hardware affects overall accuracy.