Feature Description:
The purpose of lemmatization/stemming is to let a query containing one form of a word match documents that contain a different form of the same word.
In many languages, related words share a common morphological root. Autonomy provides stemming algorithms that reduce words to this root form (the stem). This lets you match concepts regardless of the grammatical form in which words are used. In English, for example, the words help, helpful, helping and helped can all be reduced to the stem help without significant loss of meaning.
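To make the idea concrete, here is a toy suffix-stripping stemmer in Python. It is only an illustration under stated assumptions: the suffix list and the minimum stem length are hypothetical, chosen just to reproduce the help example, and this is not the algorithm Autonomy ships.

```python
# A toy suffix-stripping stemmer, for illustration only.
# NOT Autonomy's algorithm: the suffix list and minimum stem length are
# hypothetical values chosen to reproduce the "help" example.
SUFFIXES = ("ful", "ing", "ed", "s")  # checked in this order
MIN_STEM_LENGTH = 3  # never strip a word down to fewer than 3 characters


def toy_stem(word: str) -> str:
    """Strip the first matching suffix if the remaining stem is long enough."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ["help", "helpful", "helping", "helped"]:
        print(w, "->", toy_stem(w))
```

Real stemmers use much richer, language-specific rules; the minimum length check is there so that short words are not stripped into meaningless fragments.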
As standard, Autonomy provides a set of stemming algorithms for the most commonly used languages. IDOL applies stemming after it discards stop words, both at index time (when content is stored in the IDOL server) and at query time (IDOL removes stop words and stems the query text before matching).
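The order of operations described above (discard stop words, then stem, and do both to documents and queries alike) can be sketched as follows. The stop list, the suffix rules and the search function are hypothetical stand-ins, not IDOL internals; the point is only that a query for "helping" matches a document containing "helpful" because both reduce to the same stem.

```python
from __future__ import annotations

import re

# Hypothetical stop list and suffix rules; IDOL's real ones differ.
STOP_WORDS = {"the", "a", "an", "of", "to", "was", "is", "for", "with"}
SUFFIXES = ("ful", "ing", "ed", "s")


def stem(word: str) -> str:
    """Toy stemmer: strip the first matching suffix, keeping >= 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text: str) -> set[str]:
    """Tokenize, drop stop words first, then stem the remaining terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {stem(t) for t in tokens if t not in STOP_WORDS}


# Index time: store the normalized terms for each document.
documents = {
    "doc1": "The staff were helpful with the migration",
    "doc2": "Quarterly sales figures",
}
index = {doc_id: normalize(text) for doc_id, text in documents.items()}


def search(query: str) -> list[str]:
    """Query time: normalize the query the same way, then match shared stems."""
    query_terms = normalize(query)
    return [doc_id for doc_id, terms in index.items() if query_terms & terms]


print(search("helping"))
```

Because the same normalize function runs on both sides, stop words never reach the index and never need to be matched at query time.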
Solution approach:
There are two approaches to implementing stemming with Autonomy:
- Use the default stemming rules provided by Autonomy.
- Create a custom stemming file for a language: you can override the default stemming rules for certain words in a given language by creating a language-specific stemming file.
a) Create the file. This file is a list of words and their stems, for example:
[UTF8]
mice mouse
mouse mouse
children child
b) Open the IDOL server configuration file. In the [MyLanguage] section for the language of the stemming file, set the StemmingFile configuration parameter to the name of your stemming file. For example:
[english]
Encodings=ASCII:englishASCII,UTF8:englishUTF8
Stoplist=english.dat
Stemming=true
StemmingFile=english_stem.dat
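Taken together, steps a) and b) amount to: read the language section of the configuration, check that stemming is on, load the file named by StemmingFile, and consult its entries before falling back to the default rules. The Python sketch below assumes the configuration parses as an INI file, that the stemming file's first line is an encoding header such as [UTF8], and that file entries take precedence over the built-in algorithm. default_stem is a hypothetical stand-in for IDOL's own stemmer, and both files are held in memory rather than read from disk.

```python
from __future__ import annotations

import configparser

# Hypothetical in-memory copies of the two files from steps a) and b).
CONFIG_TEXT = """
[english]
Encodings=ASCII:englishASCII,UTF8:englishUTF8
Stoplist=english.dat
Stemming=true
StemmingFile=english_stem.dat
"""
STEM_FILES = {
    "english_stem.dat": "[UTF8]\nmice mouse\nmouse mouse\nchildren child\n",
}


def parse_stem_file(text: str) -> dict[str, str]:
    """Parse 'word stem' lines, skipping blanks and the [ENCODING] header."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or (line.startswith("[") and line.endswith("]")):
            continue
        parts = line.split()
        if len(parts) == 2:  # ignore malformed lines in this sketch
            entries[parts[0].lower()] = parts[1].lower()
    return entries


def load_overrides(config_text: str, language: str) -> dict[str, str]:
    """Read the language section and parse its StemmingFile if stemming is on."""
    cfg = configparser.ConfigParser(delimiters=("=",))  # values contain ':'
    cfg.read_string(config_text)
    section = cfg[language]
    if not section.getboolean("Stemming", fallback=False):
        return {}
    stem_file = section.get("StemmingFile")
    return parse_stem_file(STEM_FILES[stem_file]) if stem_file else {}


def default_stem(word: str) -> str:
    """Hypothetical stand-in for the language's built-in stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem(word: str, overrides: dict[str, str]) -> str:
    """Custom file entries win; otherwise fall back to the default rules."""
    word = word.lower()
    return overrides.get(word, default_stem(word))


overrides = load_overrides(CONFIG_TEXT, "english")
for w in ["mice", "children", "helped"]:
    print(w, "->", stem(w, overrides))
```

Irregular forms such as mice and children are exactly the cases a suffix-stripping default cannot handle, which is why a lookup table checked first is a natural design for the override file.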
How do you tell IDOL not to include stemmed words when doing hit highlighting with a View action?