Some file processing may be performed during the scanning process
or upon entry into the DMS (Document Management System), if a DMS
is part of your project. Some files may have been scanned years
ago, before processing was taken into consideration. Processing
may be performed on legacy documents or incorporated into your
daily image capture activities.
OCR (Optical Character Recognition) - Capture
of text from a document for use with content text searching.
Once a document has undergone OCR, a DMS or file retrieval application
can rapidly search your database of
documents and return all files that contain a search term. Digital
documents that have not been scanned (Word, Excel, etc.) do not
need to be OCR'd. Scanned documents in the form of PDF may provide
more functionality than documents output in the form of TIF.
OCR accuracy is dependent on the quality of the document, and OCR is
usually used in conjunction with indexed fields, not as a replacement for them.
Binding and Separating - Pages that have been
scanned as single-page files may be bound into one multi-page file,
which is much easier to manage than many separate page files. Files
may also be scanned into bulk files using separators such as barcodes
or blank pages, and later separated during the processing tasks.
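The separating step can be sketched in a few lines. This is a minimal illustration only: it assumes each scanned page is already available as its OCR text and that a blank separator page shows up as empty text. The function name and the sample pages are hypothetical, not part of any particular scanning product.

```python
from typing import List

SEPARATOR = ""  # assumption: a blank separator page yields empty OCR text


def split_on_separators(pages: List[str]) -> List[List[str]]:
    """Split a bulk scan into documents, dropping the separator pages."""
    documents: List[List[str]] = []
    current: List[str] = []
    for page in pages:
        if page.strip() == SEPARATOR:
            # A separator closes the current document, if it has any pages.
            if current:
                documents.append(current)
                current = []
        else:
            current.append(page)
    if current:
        # Keep the final document, which has no trailing separator.
        documents.append(current)
    return documents


# Hypothetical bulk scan: three documents divided by blank pages.
bulk_scan = ["invoice p1", "invoice p2", "", "contract p1", "", "", "memo p1"]
documents = split_on_separators(bulk_scan)
```

A barcode separator would work the same way, with the blank-page check replaced by a check for the barcode value.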
Data Capture and Indexing - Data can be automatically
captured from documents and used to populate a DMS. Once the
data has been captured, the DMS can index this information to
return search results almost immediately. Some examples of data
capture methods:
- Forms Recognition - When repetitive forms
are used, data may be captured and stored in a DMS without
human interaction. Multiple form types can be processed with
very little human interaction.
- Form Memorization - The system can be set
up to recognize forms which are input into the system. The
system is trained to perform certain tasks when the memorized
forms are recognized.
Conversion - Electronic files may be converted
from their current file type to a preferred file type.
Routing - Documents may be routed and reorganized
in an intelligent manner using indexed data or folder structure
information. Routing may be used to file scanned documents into
a folder structure instead of having a scanner operator
manually place every file into a particular series of file
folders.
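As a rough sketch of routing, the destination folder can be derived from the indexed fields of each document. The field names (doc_type, date, customer), the folder layout, and the "archive" root below are assumptions chosen for illustration; a real DMS would define its own.

```python
from pathlib import PurePosixPath
from typing import Dict


def route(index_fields: Dict[str, str], root: str = "archive") -> PurePosixPath:
    """Build a destination folder path from a document's indexed fields."""
    # Missing fields fall back to catch-all folders so nothing is lost.
    doc_type = index_fields.get("doc_type", "unsorted")
    year = index_fields.get("date", "")[:4] or "undated"
    customer = index_fields.get("customer", "unknown")
    return PurePosixPath(root) / doc_type / year / customer


# Hypothetical index data for one scanned invoice.
destination = route(
    {"doc_type": "invoices", "date": "2021-03-15", "customer": "acme"}
)
```

Documents with incomplete index data land in the fallback folders, where an operator can review them by hand.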
More on File Capturing
File processing can help make files more manageable. Examples
of processing tasks include: separating and merging, OCR, zonal
OCR, forms recognition, conversion, routing, and database (DMS)
population. Some of the processing tasks can be completed with
scanning software and/or your DMS. Files can be processed years
after they are scanned or during the scanning process.
OCR allows scanned documents to
undergo content text searching once the document is added into
your system. Word, Excel, and other digital files do not have to
undergo the OCR process to be content-searchable. Indexing the
documents makes the content search very fast, even if you are searching
through thousands of files.
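One common way to make content search fast is an inverted index, which maps each word to the files that contain it, so a search becomes a lookup instead of a scan of every file. The sketch below assumes the OCR text of each file is already available as a string; the file names and contents are made up for illustration, and a real DMS will use its own indexing engine.

```python
import re
from collections import defaultdict
from typing import Dict, Set


def build_index(documents: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercase word to the set of files that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index: Dict[str, Set[str]], term: str) -> Set[str]:
    """Return the files containing the term (case-insensitive)."""
    return index.get(term.lower(), set())


# Hypothetical files: two scanned and OCR'd, one born-digital.
documents = {
    "scan_001.pdf": "Invoice 4471 for Acme Corp",
    "scan_002.pdf": "Contract between Acme Corp and Widget Inc",
    "memo.docx": "Quarterly memo",
}
index = build_index(documents)
acme_files = search(index, "acme")
```

The index is built once, when documents are added, which is why later searches stay quick as the number of files grows.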
Recent improvements in OCR make the process very accurate (up
to 99%); however, the accuracy of the OCR is dependent on the quality
of the document and, to some extent, the hardware used to scan the
documents. Most companies are happy to enjoy the benefits of OCR
and content text search even with its imperfections.
Zonal OCR (OCR of a specific zone on a page) - Scanned documents
can also be processed to find certain information on the document
and input it into fields in your document management system. For
example, an invoice number may be required to organize and store
the document, so the location of the invoice number is predetermined
in a template, and that number is then read and input into the document
management system. This process is called “forms recognition,” and
may include many fields of information from a single document.
Depending on the type of documents and the quantity of fields to
be populated, this process can be both complex and expensive, so
it is important to weigh the cost against the benefits.
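The template idea can be illustrated with a small sketch. It assumes an OCR engine has already returned each word with its position on the page; the Word type, the zone coordinates, and the sample invoice values are all hypothetical and do not reflect any specific OCR product.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Word:
    text: str
    x: float  # horizontal position of the word on the page
    y: float  # vertical position of the word on the page


Zone = Tuple[float, float, float, float]  # left, top, right, bottom


def read_zone(words: List[Word], zone: Zone) -> str:
    """Join the words that fall inside a zone, in reading order."""
    left, top, right, bottom = zone
    inside = [w for w in words if left <= w.x <= right and top <= w.y <= bottom]
    inside.sort(key=lambda w: (w.y, w.x))
    return " ".join(w.text for w in inside)


def extract_fields(words: List[Word], template: Dict[str, Zone]) -> Dict[str, str]:
    """Read every field that the template defines for this form type."""
    return {field: read_zone(words, zone) for field, zone in template.items()}


# Hypothetical template: where two fields sit on one invoice layout.
invoice_template = {
    "invoice_number": (400, 40, 600, 70),
    "total": (400, 700, 600, 730),
}
# Hypothetical OCR output for one scanned page.
page_words = [
    Word("ACME", 50, 50), Word("Corp", 110, 50), Word("INV-1042", 450, 55),
    Word("Total", 300, 710), Word("$250.00", 450, 712),
]
fields = extract_fields(page_words, invoice_template)
```

Each additional form layout needs its own template, which is one reason the cost grows with the number of document types and fields.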