If the resolution is too low, a "B" will look like an "8".
Do you know what OCR really means?
04.12.2018 - Successful OCR (optical character recognition) with high recognition rates is an important prerequisite for automated document processing. OCR software recognises characters from the pixels of scanned paper documents, PDFs and images and converts their content into editable digital text.
OCR as a basis
Document processing software uses OCR to digitise the text content of business documents. The recognised data is then automatically matched against existing data in the SAP ERP system. This allows the system to check, for example, whether the quantity stated on an invoice matches the quantity in the corresponding purchase order stored in SAP.
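The matching step can be illustrated with a minimal Python sketch. It assumes the OCR text has already been parsed into line items and compares invoiced quantities with ordered quantities per material number. The `LineItem` type, the `compare_quantities` function and its `tolerance` parameter are hypothetical; this is not the SAP data model or any SAP API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LineItem:
    material: str      # hypothetical material number, e.g. read from the invoice by OCR
    quantity: float


def compare_quantities(invoice_items, po_items, tolerance=0.0):
    """Return (material, invoiced, ordered) for every invoice line that does not match the PO.

    `ordered` is None if the material does not appear in the purchase order at all.
    """
    ordered = {item.material: item.quantity for item in po_items}
    mismatches = []
    for item in invoice_items:
        expected: Optional[float] = ordered.get(item.material)
        if expected is None or abs(item.quantity - expected) > tolerance:
            mismatches.append((item.material, item.quantity, expected))
    return mismatches


invoice = [LineItem("M-100", 10), LineItem("M-200", 5)]
purchase_order = [LineItem("M-100", 10), LineItem("M-200", 8)]
print(compare_quantities(invoice, purchase_order))  # [('M-200', 5, 8)]
```

A mismatch here may also be an OCR error rather than a real discrepancy, which is why recognition quality matters so much for this step.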
Optimisation of OCR results
In addition to the configuration of the OCR engine, the quality of the source material is decisive for the recognition result. OCR places several specific demands on the image to be processed:
- Resolution
Resolution is the number of dots per unit of length and is measured in dots per inch (DPI). The higher the resolution, the more accurately shapes and edges in the document are captured, but the longer the scan takes and the larger the resulting file. A resolution of 300 DPI has become established as the "golden" value for scanning. For documents that cannot reach this value, such as faxes, the resolution can be increased by interpolation. The optimal resolution may also vary depending on the language of the document.
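As a rough illustration of the resolution trade-off, the sketch below computes the pixel dimensions of an A4 page (210 x 297 mm) at a given DPI and upscales a low-resolution greyscale image by bilinear interpolation. The function names are hypothetical, images are assumed to be lists of rows of 8-bit grey values (0 to 255), and bilinear interpolation is just one common choice; OCR products may use other methods.

```python
MM_PER_INCH = 25.4


def pixel_size(width_mm, height_mm, dpi):
    """Pixel dimensions of a page of the given size scanned at `dpi`."""
    return round(width_mm / MM_PER_INCH * dpi), round(height_mm / MM_PER_INCH * dpi)


def upscale_bilinear(img, factor):
    """Upscale a greyscale image (list of rows) by `factor` using bilinear interpolation."""
    h, w = len(img), len(img[0])
    new_h, new_w = round(h * factor), round(w * factor)
    out = []
    for y in range(new_h):
        # map the output row back to a fractional source row
        sy = min(y / factor, h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, h - 1)
        fy = sy - y0
        row = []
        for x in range(new_w):
            sx = min(x / factor, w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, w - 1)
            fx = sx - x0
            # interpolate horizontally on both source rows, then vertically
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(round(top * (1 - fy) + bottom * fy))
        out.append(row)
    return out


print(pixel_size(210, 297, 300))  # (2480, 3508)
```

Raising a 200 DPI image to 300 DPI would correspond to `upscale_bilinear(img, 1.5)`. Interpolation increases the pixel count and smooths jagged edges, but it cannot recover detail that the original scan never captured.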
- Clean background
To extract the text, the OCR engine must be able to distinguish the actual content from the image background. This requires a clean, uniform background colour, which can be achieved by using clean white paper and digital optimisation techniques. Dots, spots, punch holes and other "foreign matter" on the image make OCR recognition more difficult. They are often caused by dirt on the scanner or creases in the paper. This is particularly noticeable on faxes, where larger areas are often affected. Digital optimisation can remove these disturbances.
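The text does not say which digital optimisation techniques are meant. Two common ones are sketched below in plain Python as an assumption: a 3x3 median filter that removes isolated specks, and Otsu's method, which picks a global threshold separating dark text from a light background before the image is binarised. The function names are illustrative, and images are assumed to be lists of rows of 8-bit grey values.

```python
def median_filter_3x3(img):
    """Replace each interior pixel with the median of its 3x3 neighbourhood.

    Isolated dark specks surrounded by background disappear; border pixels
    are copied unchanged for simplicity.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[4]  # middle of the 9 values
    return out


def otsu_threshold(img):
    """Return the grey level that best separates dark and light pixels (Otsu's method)."""
    hist = [0] * 256
    for row in img:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * hist[i] for i in range(256))
    sum_dark = 0
    weight_dark = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_dark += hist[t]
        if weight_dark == 0:
            continue
        weight_light = total - weight_dark
        if weight_light == 0:
            break
        sum_dark += t * hist[t]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Otsu chooses the threshold that maximises the between-class variance
        var_between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(img, threshold):
    """Pixels brighter than the threshold become white background (255), the rest black text (0)."""
    return [[255 if p > threshold else 0 for p in row] for row in img]
```

A median filter removes small specks but also erodes very thin strokes, and a single global threshold copes poorly with uneven backgrounds such as large fax smudges; adaptive (local) thresholding is the usual alternative in that case.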
- Sharpness and contrast
Sharpness describes how distinctly details are rendered in an image and is an essential property for the OCR engine. Digital post-processing can significantly increase the level of sharpness. Edge sharpness is especially important: sharp edges make it easier to recognise text during OCR processing. The differences in brightness within an image are called contrast. A wide contrast range makes an image appear sharper. Particular attention should be paid to the background colour, which should contrast strongly with the text colour.
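Contrast enhancement and edge sharpening can be sketched in the same way. The hypothetical `stretch_contrast` maps the darkest grey level to 0 and the brightest to 255, and `sharpen` applies a standard 3x3 sharpening kernel that leaves flat areas unchanged while steepening brightness transitions at edges. Both assume 8-bit greyscale images stored as lists of rows; real OCR pipelines typically rely on an image library for this.

```python
def stretch_contrast(img):
    """Linearly rescale grey levels so the darkest pixel becomes 0 and the brightest 255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in img]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in img]


def sharpen(img):
    """Sharpen edges with the kernel [[0, -1, 0], [-1, 5, -1], [0, -1, 0]].

    This subtracts the discrete Laplacian from each pixel: uniform areas keep
    their value, while brightness jumps at edges are exaggerated.
    Border pixels are copied unchanged.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = (5 * img[y][x]
                 - img[y - 1][x] - img[y + 1][x]
                 - img[y][x - 1] - img[y][x + 1])
            out[y][x] = max(0, min(255, v))  # clip to the valid 8-bit range
    return out
```

On a faded scan, stretching first and then sharpening darkens the text side of each edge and brightens the background side, which increases the contrast between text and background described above.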