This was a presentation I created on some simple, but extremely useful, techniques that can be used when scanning documents to drastically improve your automatic data capture accuracy.
These techniques include Multistream, which simply means that the scanner can output two versions of the scanned document. Typically one is in color and one is in black and white. Why? You would want to save the color version of the image for retrieval purposes. In other words, the user would see an identical electronic version of the hard copy document. The black and white version is used strictly for automatic data extraction because the color is often unnecessary for OCR.
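To make the Multistream idea concrete, here is a minimal sketch in plain Python. It is not how any particular scanner or driver does it; real devices usually produce both streams in hardware or firmware. The page representation (rows of RGB tuples), the function names and the fixed threshold of 128 are all my own assumptions for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]      # (red, green, blue), each 0-255
ColorImage = List[List[Pixel]]
BitonalImage = List[List[int]]    # 1 = black ink, 0 = white paper


def luminance(pixel: Pixel) -> float:
    """Approximate perceived brightness using the ITU-R BT.601 weights."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def multistream(page: ColorImage, threshold: int = 128) -> Tuple[ColorImage, BitonalImage]:
    """Return (color stream, bitonal stream) from a single scanned page.

    The threshold is a hypothetical fixed value; production software
    typically picks it adaptively.
    """
    # Stream 1: an untouched color copy, kept for retrieval and viewing.
    color_stream = [list(row) for row in page]
    # Stream 2: a black and white version intended only for OCR.
    bitonal_stream = [
        [1 if luminance(px) < threshold else 0 for px in row]
        for row in page
    ]
    return color_stream, bitonal_stream


# A tiny 2x2 "page": white paper, dark ink, pale red form color, dark blue ink.
page = [
    [(255, 255, 255), (20, 20, 20)],
    [(250, 200, 200), (0, 0, 120)],
]
color, bitonal = multistream(page)
```

The color stream is what you would archive for users, while the bitonal stream is what you would hand to the recognition engine.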
The second technique is Background Color Removal. Forms designed specifically for automatic data capture, such as the Health Care Financing Administration (HCFA) CMS-1500, UB-92 or UB-04, will have a single, consistent shade of background color. Why? The form color is designed to make it obvious to the person completing the form exactly where characters and specific information are to be placed. For example, a Social Security Number field has an exact box for each of the nine digits in your SSN. This way the software knows exactly where to look for the SSN field and can accurately populate each of the nine digits. In forms processing, you don't care about the background color; you care about the information on the form. So you "drop out" the color and expose the data.
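As a rough illustration of color dropout, the sketch below turns every "form red" pixel white and leaves everything else alone. The rule for deciding what counts as form color (a strong red channel that clearly dominates green and blue) and the threshold values are assumptions of mine, not the method used by any specific scanner or capture product.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (red, green, blue), each 0-255
WHITE: Pixel = (255, 255, 255)


def is_form_red(pixel: Pixel, min_red: int = 150, min_gap: int = 60) -> bool:
    """Hypothetical test for the form's red background color.

    A pixel counts as form color when red is strong and clearly
    dominates both the green and blue channels.
    """
    r, g, b = pixel
    return r >= min_red and r - max(g, b) >= min_gap


def drop_out_red(page: List[List[Pixel]]) -> List[List[Pixel]]:
    """Replace form-colored pixels with white and keep all other pixels."""
    return [
        [WHITE if is_form_red(px) else px for px in row]
        for row in page
    ]


# One row of a form: red box line, black handwriting, blue pen, blank paper.
form_row = [[(255, 180, 180), (10, 10, 10), (20, 30, 160), (255, 255, 255)]]
cleaned = drop_out_red(form_row)
```

After the dropout, only the black and blue ink remain on a white page, which is exactly what the recognition software needs to see.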
I've written about additional data capture tips, tricks and techniques here:
http://www.aiim.org/community/blogs/expert/Demystifying-Forms-Processing-and-Data-Capture
Here I present a real example of a form before and after Color Dropout. As you can see, this form has been nicely deskewed, or aligned correctly, by Image Enhancement technology. The scanner and software combination has intelligently eliminated extraneous pixels with a despeckle function. This particular image has also been automatically rotated to the proper orientation and scanned at the preferred resolution of 300 dots per inch to achieve a much higher level of accuracy and automation for Optical Character Recognition. Next, we let more Image Processing work its magic and drop out the red form background color. We have now exposed only the information that we want. This reduces file sizes, increases automation and decreases human intervention. The software can now recognize whether check boxes are filled in via Optical Mark Recognition. In addition, our Intelligent Character Recognition, or ICR, on the handwritten text is greatly enhanced. Overall, you can see that a well-defined, well-structured form can really benefit your total efficiency.
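For readers curious what a despeckle step might look like, here is a deliberately simple sketch. It treats the page as a bitonal grid and removes any black pixel that has no black neighbors. Commercial image enhancement is more sophisticated than this (it usually works on connected groups of pixels with a size limit), so take the rule and the function name as my own assumptions.

```python
from typing import List

BitonalImage = List[List[int]]   # 1 = black pixel, 0 = white pixel


def despeckle(image: BitonalImage) -> BitonalImage:
    """Remove isolated black pixels (no black pixel among the 8 neighbors)."""
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]   # work on a copy, read from the original
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            black_neighbors = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue          # skip the pixel itself
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        black_neighbors += image[ny][nx]
            if black_neighbors == 0:
                cleaned[y][x] = 0         # a lone speck: erase it
    return cleaned


# A speck in the top-left corner and a two-pixel stroke of real ink.
scan = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
clean_scan = despeckle(scan)
```

The lone pixel disappears while the connected stroke survives, which is the behavior you want before handing the image to OCR, OMR or ICR.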
Just as there is a trend towards Higher Resolution scanning, there also seems to be a trend towards Color Scanning. While it is true that a good majority of Forms Processing applications still scan in Bitonal or Grayscale modes, advanced technologies such as Automatic Color Detection, Small Color Detection, Color Smoothing and Color Reduction are worth considering to enhance your overall business productivity. In the top set of images we are demonstrating “Color Reduction”, also known as “Snap to White”. This is a technique where the software is intelligent enough to recognize a solid background color and automatically eliminate it to enhance your forms processing accuracy. The second set of images on this slide is an example of “Custom Color Dropout”, also called “Dynamic Color Dropout”. The image on the left is an Insurance Form with red boxes as the background form color to constrain the data to certain regions of the form. Once the form is scanned, the scanning software applies a red filter to completely drop out the unnecessary red background and, again, dramatically increase your processing accuracy. Nearly every document scanner on the market today has color scanning capability. Not everyone wants or takes advantage of this capability, but unlike years ago, when the price delta between a monochrome and a color scanner was tremendous, these days the price difference is negligible.
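The “Snap to White” idea can be sketched in a few lines as well. In this simplified version, the most frequent pixel color on the page is assumed to be the solid background, and any pixel within a small tolerance of it is turned white. Actual scanner software will estimate the background in its own way; the most-common-color heuristic, the function name and the tolerance value here are my assumptions.

```python
from collections import Counter
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (red, green, blue), each 0-255
WHITE: Pixel = (255, 255, 255)


def snap_to_white(page: List[List[Pixel]], tolerance: int = 30) -> List[List[Pixel]]:
    """Turn the dominant background color (and near matches) white."""
    counts = Counter(px for row in page for px in row)
    if not counts:
        return []                              # empty page, nothing to do
    # Assumption: the most frequent color is the solid background.
    background = counts.most_common(1)[0][0]

    def near_background(px: Pixel) -> bool:
        # Every channel must be within the tolerance of the background.
        return all(abs(a - b) <= tolerance for a, b in zip(px, background))

    return [
        [WHITE if near_background(px) else px for px in row]
        for row in page
    ]


# Pale yellow paper with slight shading variation and two black ink pixels.
yellow_page = [
    [(240, 230, 180), (240, 230, 180), (0, 0, 0)],
    [(235, 228, 185), (240, 230, 180), (0, 0, 0)],
]
snapped = snap_to_white(yellow_page)
```

Unlike the red dropout, this approach does not need to know the form color in advance, which is why it suits documents printed on tinted paper.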