How to OCR Documents

Note: Running OCR can impact crawl speeds. If you are crawling a large data set or multiple data sets, consider waiting to run OCR until the crawls complete.

Locate and select the documents you want to OCR. In CORE Administration, use the Explore tab. In Axcelerate 5, use the Analysis page.
Tip: To locate all image files, use the MIME Type Smart Filter. To locate all files that do not contain text, use the Document Characteristics Smart Filter.
In the Actions menu, select OCR in bulk.
The Bulk OCR wizard opens:
If you are running OCR in Axcelerate 5, the wizard shows a Summary panel with the following information about your document selection:
- Selected documents – the number of documents you are submitting for OCR. If this count does not match what you intended to select, click Abort to cancel the OCR job and verify your selection. For example, make sure you did not unintentionally select multiple pages of the Results list.
- Documents with existing text – alerts you if documents in your selection already contain text. If you did not intend to submit documents with text for OCR, click Abort to cancel the OCR job and verify your document selection.
- Preprocessing required – shows how many of the selected documents require backend system processing to prepare them for OCR. A high number may cause the OCR job to take longer to complete.
- Cannot OCR – alerts you if any of the selected documents cannot be OCR'd because they are missing natives or images.
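The four Summary panel counts can be pictured as a simple tally over the selected documents. The Python sketch below is illustrative only: the `Doc` record, its field names, and `summarize_selection` are hypothetical and are not part of CORE Administration or Axcelerate 5.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    # Hypothetical record; these fields are illustrative, not the product's schema.
    doc_id: str
    has_text: bool = False
    has_native_or_image: bool = True
    needs_preprocessing: bool = False


def summarize_selection(docs):
    """Return counts that mirror the four Summary panel entries."""
    return {
        "selected": len(docs),
        "with_existing_text": sum(d.has_text for d in docs),
        "preprocessing_required": sum(d.needs_preprocessing for d in docs),
        "cannot_ocr": sum(not d.has_native_or_image for d in docs),
    }


# Example selection with one document in each special category.
selection = [
    Doc("A-001"),
    Doc("A-002", has_text=True),
    Doc("A-003", needs_preprocessing=True),
    Doc("A-004", has_native_or_image=False),
]
summary = summarize_selection(selection)
print(summary)
```

A nonzero `with_existing_text` or `cannot_ocr` count corresponds to the situations in which the panel prompts you to abort and recheck the selection.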
Complete the wizard options:
Job Display Name: Optionally rename the OCR job.
Description: Optionally enter a description.
I have read the warning and want to continue: Select this check box to confirm you understand that any existing text for the selected documents will be replaced and cannot be recovered.
Click Finish in CORE Administration or Run OCR in Axcelerate 5 to start the OCR job.

The text extracted by OCR replaces the indexed document’s text. The document’s metadata is not changed. The file from which the text was extracted is not changed.
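The replacement behavior described above can be modeled in a few lines. This Python sketch is a hypothetical model, not the product's data model: `IndexedDoc`, its fields, and `apply_ocr_result` are assumed names chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IndexedDoc:
    # Hypothetical model of an indexed document; not the product's data model.
    text: str
    metadata: dict
    source_file: str


def apply_ocr_result(doc, ocr_text):
    """Return a copy whose text is replaced; metadata and source file are carried over."""
    return replace(doc, text=ocr_text)


before = IndexedDoc(
    text="old text",
    metadata={"custodian": "Smith"},
    source_file="scan_001.tif",
)
after = apply_ocr_result(before, "text from OCR")
```

Only the `text` field differs between `before` and `after`. The earlier text is not stored anywhere on the new record, which is why the wizard asks you to confirm before the job starts.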

Monitor the status of the OCR job. In CORE Administration, use the Jobs tab for the application on the Workspace page. In Axcelerate 5, use the Administration > Jobs page. You can view the Details of the job or, if the job contains errors, download the Error Log. In Axcelerate 5, access these items from the actions menu for the specific job.