Wednesday, 27 July 2011

Spell check with Autonomy IDOL.

Autonomy IDOL uses a term distancing algorithm to find correctly spelled terms and suggest them. In term distancing, the IDOL server counts the number of edits (each edit being an insertion, deletion, or replacement of a single character) needed to turn one term into another, and uses that count to find the nearest matching terms.
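The edit counting described above can be sketched with the classic Levenshtein distance. This is only an illustration of the idea, not IDOL's actual implementation; the function names, the sample vocabulary, and the max_edits cutoff are assumptions made for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and replacements needed to turn a into b (Levenshtein distance)."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # replace ca with cb (or keep a match)
            ))
        previous = current
    return previous[-1]


def nearest_terms(term, vocabulary, max_edits=2):
    """Hypothetical helper: vocabulary terms within max_edits, closest first."""
    scored = sorted((edit_distance(term, word), word) for word in vocabulary)
    return [word for distance, word in scored if distance <= max_edits]


if __name__ == "__main__":
    print(nearest_terms("serch", ["search", "server", "index"]))
```

Here "serch" is one insertion away from "search", so "search" is the only term returned within two edits.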

The following is the minimum configuration required to activate spell check in Autonomy IDOL.

Index side
The following configuration parameters have to be included in the [server] section of the IDOL server configuration file:
  • SpellCheckMaxCheckTerms: The maximum size of a query (in number of terms) up to which the query is considered eligible for spell check. E.g. SpellCheckMaxCheckTerms = 200
  • SpellCheckIncorrectMaxDocOccs: The maximum number of documents a term can appear in and still be considered a misspelling. E.g. SpellCheckIncorrectMaxDocOccs = 1
  • SpellCheckCorrectMinDocOccs: The minimum number of documents a term must appear in to be offered as a spell check suggestion (or to be matched by a wildcard term).
We can also use the config parameter UnstemmedMinDocOccs for this purpose. It likewise sets the minimum number of documents a term must appear in to be a spell check suggestion or to be matched by a wildcard term.
E.g.  SpellCheckCorrectMinDocOccs = 1
        UnstemmedMinDocOccs = 1
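To make the two document-occurrence thresholds concrete, here is a minimal sketch of how they could partition terms. It follows only the descriptions given above and is not IDOL's internal logic; the doc_occurrences mapping, the function names, and the counts are all hypothetical.

```python
def is_possible_misspelling(term, doc_occurrences, incorrect_max_doc_occs):
    """A term is a misspelling candidate only if it appears in at most
    incorrect_max_doc_occs documents (cf. SpellCheckIncorrectMaxDocOccs)."""
    return doc_occurrences.get(term, 0) <= incorrect_max_doc_occs


def suggestion_candidates(doc_occurrences, correct_min_doc_occs):
    """Terms that appear in at least correct_min_doc_occs documents
    (cf. SpellCheckCorrectMinDocOccs) may be offered as suggestions."""
    return sorted(
        term for term, count in doc_occurrences.items()
        if count >= correct_min_doc_occs
    )


if __name__ == "__main__":
    # Hypothetical index statistics: term -> number of documents containing it.
    doc_occurrences = {"search": 120, "serch": 1, "server": 45}
    print(is_possible_misspelling("serch", doc_occurrences, 1))
    print(suggestion_candidates(doc_occurrences, 5))
```

With these made-up counts, "serch" qualifies as a possible misspelling, while "search" and "server" are frequent enough to be suggested.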
There are a few other config parameters related to spell check:
SpellCheckAlphaNumeric: Determines whether input terms containing numbers are spell checked. It is either true or false.
E.g. SpellCheckAlphaNumeric = true

SpellCheckCacheMaxSize: The maximum number of spelling corrections that the IDOL server can store. The spelling corrections are stored in the IDOL>content>main>prx.db file.
E.g. SpellCheckCacheMaxSize = 6666
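As a quick sanity check, the parameters above can be gathered into one [server] fragment and read back with Python's standard configparser. This is a sketch under assumptions: the fragment simply reuses the example values from this post, the required-parameter list mirrors the minimum set named above, and load_spellcheck_settings is a hypothetical helper, not an IDOL tool.

```python
import configparser

# Sample fragment assembled from the example values in this post.
SAMPLE_CONFIG = """
[server]
SpellCheckMaxCheckTerms = 200
SpellCheckIncorrectMaxDocOccs = 1
SpellCheckCorrectMinDocOccs = 1
SpellCheckAlphaNumeric = true
SpellCheckCacheMaxSize = 6666
"""

# The minimum set of parameters described in the "Index side" section.
REQUIRED = (
    "SpellCheckMaxCheckTerms",
    "SpellCheckIncorrectMaxDocOccs",
    "SpellCheckCorrectMinDocOccs",
)


def load_spellcheck_settings(text):
    """Return the SpellCheck* settings from the [server] section,
    raising ValueError if a required parameter is missing."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # preserve the case of parameter names
    parser.read_string(text)
    server = parser["server"]
    missing = [name for name in REQUIRED if name not in server]
    if missing:
        raise ValueError("missing spell check parameters: " + ", ".join(missing))
    return {k: v for k, v in server.items() if k.startswith("SpellCheck")}


if __name__ == "__main__":
    for name, value in load_spellcheck_settings(SAMPLE_CONFIG).items():
        print(name, "=", value)
```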

Query side

Include spellcheck=true in a query to instruct the IDOL server to check the spelling of the query terms and provide suggestions for any misspelled terms.
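A query URL with this flag might be built as follows. This is a minimal sketch: the host, the port 9000, and the helper name are assumptions for illustration, so substitute the address of your own IDOL server.

```python
from urllib.parse import quote, urlencode


def build_spellcheck_query(text, host="localhost", port=9000):
    """Build a Query action URL with spell checking switched on.
    host and port are placeholders, not values from this post."""
    params = urlencode(
        {"action": "Query", "text": text, "spellcheck": "true"},
        quote_via=quote,  # encode spaces as %20 rather than "+"
    )
    return f"http://{host}:{port}/{params}"


if __name__ == "__main__":
    print(build_spellcheck_query("serch engine"))
```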
