OCR (Image Text Extraction)

If the HTML or Text view of a document contains no text, the reason is often simple: the native document is an image or a PDF file that contains only images, or there is no native document at all, only an image ingested with CSV load. In such cases, you can try to extract searchable text from the native document or image using OCR.
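
The conditions above can be sketched as a small check that flags documents worth sending to OCR. This is only an illustration: the `DocumentRecord` type, its field names, and the list of image extensions are assumptions made for the example and are not part of the product. A PDF with an empty text view is treated as a likely image-only PDF, which is a heuristic, not a guarantee.

```python
import os
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of image file extensions (assumption for this sketch).
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp", ".gif"}


@dataclass
class DocumentRecord:
    """Hypothetical record describing one document (not a product type)."""
    doc_id: str
    extracted_text: str                    # content of the HTML or Text view
    native_filename: Optional[str] = None  # None if no native document exists
    image_filename: Optional[str] = None   # image ingested without a native


def is_ocr_candidate(doc: DocumentRecord) -> bool:
    """Return True if the document has no text and OCR might recover some."""
    # Documents that already have text do not need OCR.
    if doc.extracted_text.strip():
        return False

    # No native document at all: OCR is only possible if an image exists.
    if doc.native_filename is None:
        return doc.image_filename is not None

    # The native document is an image, or a PDF that yielded no text
    # (most likely because it contains only images).
    extension = os.path.splitext(doc.native_filename)[1].lower()
    return extension in IMAGE_EXTENSIONS or extension == ".pdf"
```

A check like this could be used to build the list of documents to process before OCR is started, for example `[d.doc_id for d in documents if is_ocr_candidate(d)]`, where `documents` is a list of such records.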

The OCR function is available in CORE Administration.

Important: If you want the system to use OCR text for learning and phrase detection, run OCR before publishing documents.

Caution: Text that has been replaced with OCR text cannot be restored.