Character Variants and Highlighting

When you search for documents, the application internally adds characters with diacritical marks and other character variants to the search for some views, so that search hits are highlighted correctly.

When documents are loaded and indexed, diacritical marks are normalized. Internally, characters with diacritical marks, such as German umlauts (for example, ä) or French accents (for example, è), are transformed into a normalized form, that is, a or e. As a result, a search returns matches even if a diacritical mark is omitted in the search term or in the document.
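
The effect of this normalization can be illustrated with a minimal Python sketch. It is not the product's implementation, which is not described here. The function name strip_diacritics is hypothetical, and the use of Unicode NFD decomposition is an assumption chosen only because it reproduces the ä to a and è to e behavior described above.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Return text with combining diacritical marks removed.

    Assumption: NFD decomposition followed by dropping combining marks.
    The actual indexing process may normalize differently.
    """
    # NFD splits a character such as "ä" into "a" plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# A term typed without diacritics and a document word with diacritics
# compare equal once both are normalized.
same = strip_diacritics("creme") == strip_diacritics("crème")
```

Note that this sketch leaves ß unchanged, because ß has no Unicode decomposition. Variants such as ß and ss need separate handling.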

The Text, Near Native, and Production views show search results based on the normalized form.

The Redaction view is not based on the normalized form, but on conversion results. To allow correct highlighting, the search terms are dynamically adapted: if they contain characters with frequently occurring variants, these characters are replaced with a regular expression that matches all variants. By default, this is done for German, French, and Spanish texts.

Using regular expressions aligns the highlighting in the view with the actual search results shown in the Results list.

Regular expressions are dynamically created when you search for these character groups:

  • a, á, à, â, ä
  • e, é, è, ê, ë
  • i, í, ì, î, ï
  • o, ó, ò, ô, ö
  • u, ú, ù, û, ü
  • ß, ss
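
How a search term might be expanded into such a regular expression can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's actual logic: the names VARIANT_GROUPS and build_highlight_pattern are hypothetical, the groups are copied from the list above, and case-insensitive matching is an assumption.

```python
import re

# Character groups from the list above (hypothetical constant name).
VARIANT_GROUPS = [
    ["a", "á", "à", "â", "ä"],
    ["e", "é", "è", "ê", "ë"],
    ["i", "í", "ì", "î", "ï"],
    ["o", "ó", "ò", "ô", "ö"],
    ["u", "ú", "ù", "û", "ü"],
    ["ß", "ss"],
]


def _alternation(group):
    # Longest variant first, so "ss" is tried before single characters.
    ordered = sorted(group, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(v) for v in ordered) + ")"


# Map every variant to the alternation that matches its whole group.
_VARIANT_TO_PATTERN = {v: _alternation(g) for g in VARIANT_GROUPS for v in g}

# Tokenizer that finds variants in a term, preferring "ss" over shorter ones.
_TOKEN = re.compile(
    "|".join(
        re.escape(v)
        for v in sorted(_VARIANT_TO_PATTERN, key=len, reverse=True)
    )
)


def build_highlight_pattern(term: str) -> re.Pattern:
    """Replace each variant character in term with its group alternation."""
    term = term.lower()  # assumption: highlighting ignores case
    parts = []
    pos = 0
    for match in _TOKEN.finditer(term):
        # Text between variants is matched literally.
        parts.append(re.escape(term[pos:match.start()]))
        parts.append(_VARIANT_TO_PATTERN[match.group()])
        pos = match.end()
    parts.append(re.escape(term[pos:]))
    return re.compile("".join(parts), re.IGNORECASE)


pattern = build_highlight_pattern("strasse")
found = pattern.search("Die Straße ist gesperrt.")
```

In this sketch, a term such as strasse yields a pattern that also matches Straße, so text that was not normalized can still be highlighted for a hit found through the normalized index.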

Copyright © 2019 Open Text. All Rights Reserved. Trademarks owned by Open Text.