Search-based Enrichment

Search-based enrichment lets you enrich documents with metadata. It is called search-based because you start the enrichment process for a set of documents that is defined by a search.

There are different types of enrichment, and search-based enrichment will provide more options in coming releases.

Enrich with search patterns matching regular expressions
You can search your documents for specific values or patterns, such as social security numbers, credit card numbers, or personal dates, and extract them into custom metadata fields.
Search-based enrichment helps you to find:
  • Personally identifiable information (PII) that requires special treatment or protection.
  • Numeric and alphanumeric patterns that you cannot find with keyword searches or the search.bat script.

    This is possible because the search for the values to be extracted is applied not to the indexed document text but to the "raw" text as you see it in HTML or Text view. In CORE Administration, for example, you see this text in the p tag in the XML-View-Original view. Indexing features such as stemming, stop word removal, or tokenization have no effect on value extraction from this text.
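The difference between matching on raw text and matching on indexed tokens can be illustrated with a minimal Python sketch. The field names, the two patterns, and the simple word tokenizer are assumptions made for this illustration; they are not the product's built-in patterns or its indexing pipeline.

```python
import re

# Hypothetical field names and patterns, for illustration only.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def extract_values(raw_text):
    """Return {field name: sorted unique matches} for every pattern that matches."""
    found = {}
    for field, pattern in PATTERNS.items():
        matches = sorted(set(pattern.findall(raw_text)))
        if matches:
            found[field] = matches
    return found

raw = "Customer SSN 123-45-6789, card 4111 1111 1111 1111."

# Matching on the raw text keeps hyphens and spaces, so both patterns are found.
fields = extract_values(raw)

# A stand-in for index tokenization (an assumption): split into word tokens.
# The SSN is broken into "123", "45", "6789", so no single token matches.
tokens = re.findall(r"\w+", raw)
ssn_in_tokens = any(PATTERNS["ssn"].fullmatch(t) for t in tokens)
```

In this sketch, fields holds one value list per hypothetical metadata field, while ssn_in_tokens is False, which is why a pattern of this kind cannot be found through a keyword search on tokenized text.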

Enrich with MIME type
You can apply MIME type detection after data load and thus fill the MIME type field that is required for some other features. This may be useful after a CSV load, when you see that documents have no MIME type information.
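As a rough illustration of what MIME type detection after load can look like, the following Python sketch fills a missing MIME type from the leading bytes of the document content. The signature table, the mime_type and content keys, and the function names are hypothetical; the detection method the product actually uses is not described here.

```python
# Hypothetical signature table; real detectors recognize many more formats.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"PK\x03\x04", "application/zip"),
]

def detect_mime(content, default="application/octet-stream"):
    """Guess a MIME type from the first bytes of the content."""
    for magic, mime in SIGNATURES:
        if content.startswith(magic):
            return mime
    return default

def enrich_mime(documents):
    """Fill the (assumed) mime_type key only where it is missing or empty."""
    for doc in documents:
        if not doc.get("mime_type"):
            doc["mime_type"] = detect_mime(doc["content"])
    return documents

docs = enrich_mime([
    {"id": 1, "content": b"%PDF-1.7 sample"},                       # no MIME type after load
    {"id": 2, "content": b"plain words", "mime_type": "text/plain"},  # already set, left as is
    {"id": 3, "content": b"unknown bytes"},                         # falls back to the default
])
```

Existing values are left untouched, so a step like this only affects documents that arrived without MIME type information.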

Search-based enrichment is done at runtime, that is, after data has already been loaded or even published.

Search-based enrichment results can be made visible as Smart Filters. Search-based enrichment itself is run as a job that you can track on Jobs tabs or pages.