Supported Languages Limitations

 

Japanese syllabic characters: Half-width kana characters are not correctly indexed
For example, anata written in hiragana (あなた) is correctly tokenized and indexed, that is, each character is indexed as one word.
The same word written in half-width kana (ｱﾅﾀ) is indexed as a single word, although it consists of three characters.
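One possible workaround, which this documentation does not confirm, is to fold half-width kana into their full-width forms before the text is submitted for indexing. The following is a minimal sketch using Python's standard `unicodedata` module. The helper name `normalize_halfwidth_kana` is hypothetical, and it is an assumption that the indexer tokenizes the resulting full-width katakana character by character.

```python
import unicodedata


def normalize_halfwidth_kana(text: str) -> str:
    """Hypothetical helper: fold half-width kana into full-width forms."""
    # NFKC applies Unicode compatibility mappings, which include mapping
    # half-width katakana (U+FF65-U+FF9F) to full-width katakana.
    # Caution: NFKC also folds other compatibility characters, for
    # example full-width Latin letters to their ASCII equivalents.
    return unicodedata.normalize("NFKC", text)


half_width = "\uff71\uff85\uff80"  # anata in half-width katakana
full_width = normalize_halfwidth_kana(half_width)
print(full_width, len(full_width))
```

Note that the result is full-width katakana, not hiragana. Converting between the two kana scripts is a separate step that this sketch does not attempt.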
Japanese logographic characters: IDEOGRAPHIC HALF FILL SPACE characters are not treated as whitespace
If kanji text contains IDEOGRAPHIC HALF FILL SPACE (U+303F) characters, these characters are indexed as well.
For example, anata written in kanji (貴方) is correctly tokenized and indexed.
The same word with an IDEOGRAPHIC HALF FILL SPACE inserted between the two kanji characters (貴〿方) is tokenized and indexed as three characters.
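If the text can be pre-processed before indexing, the IDEOGRAPHIC HALF FILL SPACE characters could be removed or replaced with an ordinary space. The following is a minimal sketch in Python. The helper name `strip_half_fill_space` is hypothetical, and whether removal or replacement is appropriate for a given data set is an assumption to check against the content.

```python
IDEOGRAPHIC_HALF_FILL_SPACE = "\u303f"


def strip_half_fill_space(text: str, replacement: str = "") -> str:
    """Hypothetical helper: drop or replace U+303F before indexing."""
    # Pass replacement=" " to turn the fill character into a regular
    # space instead of removing it.
    return text.replace(IDEOGRAPHIC_HALF_FILL_SPACE, replacement)


padded = "\u8cb4\u303f\u65b9"  # anata in kanji with U+303F between the characters
cleaned = strip_half_fill_space(padded)
print(len(padded), len(cleaned))
```

With the default empty replacement, the three-character input is reduced to the two kanji characters.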

Copyright © 2018 Open Text. All Rights Reserved. Trademarks owned by Open Text.