Friday 5 August 2011

Synonym with Autonomy IDOL

Synonym
A synonym based search returns results which are conceptually similar to the query terms.

Solution Approaches:

  1. Enable synonym search in Autonomy IDOL
  2. Create a synonym database
Enable synonym search in Autonomy IDOL: Autonomy recommends this method if synonym matching is required for approximately a few hundred terms.

It is a three-step process: 1. Set up a synonym file. 2. Configure the IDOL server to use the synonym file. 3. Execute the synonym query.

1. Set up a synonym file:
1. Create a text file and save it in the IDOL server's IDOL/content directory, using the custom file name (chosen by the user) that is specified in the IDOL server configuration file [SynonymType] section.
2. Create a section for each language type defined in the IDOL server configuration file.
For example:
[EnglishASCII]
[GermanUTF8]
3. In each section, create a line for each word for which you want to list synonyms (using the encoding of the associated language type).
Example:
[EnglishASCII]
cat
dog

[GermanUTF8]
Katze
Hund

4. List synonym strings next to each word and save the file. Separate the word and each string with commas (there must be no space before or after a comma). The individual terms can contain spaces but must not contain any punctuation.
For example:

[EnglishASCII]
cat,feline,grimalkin,moggy,mouser,puss,pussy,tabby
dog,bitch,cur,hound,mans best friend,mongrel,mutt,pooch,puppy

[GermanUTF8]
Katze,Mietze,Mietzekatze,Mietzekater,Kater,Mulle,Kätzchen
Hund,Wau Wau,Hündin,Töle,Kläffer,Hündchen,Welpe
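
The formatting rules above (one line per word, commas with no surrounding spaces, no punctuation inside terms) are easy to get wrong by hand. Below is a minimal Python sketch that builds the file text from a dictionary and rejects terms that break those rules. The function names and the sample data are illustrative assumptions, not part of IDOL, and the check only covers ASCII punctuation.

```python
import string


def format_synonym_line(word, synonyms):
    """Join a word and its synonyms into one comma-separated line."""
    terms = [word, *synonyms]
    for term in terms:
        # A term may contain inner spaces, but a leading or trailing space
        # would end up next to a comma, which the file format forbids.
        if not term or term != term.strip():
            raise ValueError(f"bad whitespace in term: {term!r}")
        # Only ASCII punctuation is checked here (assumption of this sketch).
        if any(ch in string.punctuation for ch in term):
            raise ValueError(f"punctuation in term: {term!r}")
    return ",".join(terms)


def build_synonym_file(sections):
    """sections maps a language type name to {word: [synonyms]}."""
    blocks = []
    for language_type, entries in sections.items():
        lines = [f"[{language_type}]"]
        lines += [format_synonym_line(w, s) for w, s in entries.items()]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


text = build_synonym_file({
    "EnglishASCII": {
        "cat": ["feline", "moggy", "mouser"],
        "dog": ["hound", "mans best friend", "pooch"],
    },
    "GermanUTF8": {"Katze": ["Kater", "Kätzchen"]},
})
print(text)
```

The sketch only produces the text; saving it with the encoding that each language type expects is left to the caller.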

To configure IDOL server to use a synonym file

1. Open the IDOL server configuration file in a text editor.
2. In the IDOL server configuration file's [FieldProcessing] section, set up a synonym process. This process allows IDOL server to determine when it must apply synonym settings.
For example:

[FieldProcessing]
0=SynonymMatch

3. Create a section for the listed synonym field process to create a property for the process (synonym properties always point to a defined synonym job). Identify the required fields to associate with the process.

For example:
[SynonymMatch]
Property=ApplySynonymMatch
PropertyFieldCSVs=*/DRETITLE,*/DRECONTENT

In this example, IDOL server returns documents for synonym queries only if their DRETITLE or DRECONTENT field values match the query.
(When identifying the fields, use the format /FieldName to match root-level fields, */FieldName to match all fields except root-level, or /Path/FieldName to match fields that the specified path points to.)

Note: This should be implemented in the [FieldProcessing] section of the IDOL config.

4. Create a section for the property to set the SynonymType parameter to the name of the synonym job that specifies which settings IDOL server must apply to synonym queries.

[ApplySynonymMatch]
SynonymType=Synonym_job

Note: This should be implemented in the [Properties] section of the IDOL config.

5. In the [Synonym] section of the IDOL server configuration file, list the synonym job whose settings must apply when a synonym query is sent to IDOL server. Multiple jobs can be set up in the [Synonym] section, but normally only one is required.
For example:

[Synonym]
0=Synonym_job

6. Define a section for the synonym job to specify the settings that must be applied to synonym queries. The section must have the same name as the synonym job.
For example:

[Synonym_job]
File=animals.txt
MaxExpandLevel=1

Note: Information on MaxExpandLevel:

Description
The number of levels (0-3) of synonyms to display. This lets you specify how many levels of the synonym tree to show in the links field for query results. Enter 0 to display only direct synonyms, 1 to display direct synonyms and synonyms of the direct synonyms, and so on.

Example
The synonym file contains:
girl, young woman, lass, gal, schoolgirl, young lady, maiden, damsel
maiden, budding, fresh, pristine, new, raw, undeveloped, virgin
pristine, disinfected, germ-free, immaculate, pasteurized, purified, spotless, sterilized
Depending on the MaxExpandLevel setting, a synonym query for the word "girl" is processed as follows:
MaxExpandLevel=0
Only directly related synonyms are added to a synonym query. If a synonym query, for example, contains the word "girl", the words "young woman", "lass", "gal", "schoolgirl", "young lady", "damsel" and "maiden" are added to it.
MaxExpandLevel=1
If a synonym query contains the word girl, direct synonyms for "girl" are added to the query ("young woman", "lass", "gal", "schoolgirl", "young lady", "damsel", "maiden") as well as synonyms of these direct synonyms ("budding", "fresh", "new", "raw", "undeveloped", "virgin", "pristine").
MaxExpandLevel=2
If a synonym query contains the word girl, direct synonyms for "girl" ("young woman", "lass", "gal", "schoolgirl", "young lady", "damsel", "maiden"), synonyms of the direct synonyms ("budding", "fresh", "new", "raw", "undeveloped", "virgin", "pristine") and synonyms of these synonyms are added to the query ("disinfected", "germ-free", "immaculate", "pasteurized", "purified", "spotless", "sterilized").
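
The level-by-level behaviour described above can be modelled in a few lines of Python. This is only a sketch of the description, not IDOL's actual implementation: the `expand` function is hypothetical, and it assumes synonyms are looked up one way only, from the first word on a line to the rest of that line.

```python
def expand(term, synonym_file, max_expand_level):
    """Return the synonyms added to a query for term, in discovery order."""
    found = []
    seen = {term}
    frontier = [term]
    # Level 0 performs one lookup round (direct synonyms only);
    # each additional level performs one more round on the new terms.
    for _ in range(max_expand_level + 1):
        next_frontier = []
        for word in frontier:
            for synonym in synonym_file.get(word, []):
                if synonym not in seen:
                    seen.add(synonym)
                    found.append(synonym)
                    next_frontier.append(synonym)
        frontier = next_frontier
    return found


# The three lines of the example synonym file, keyed by their first word.
synonym_file = {
    "girl": ["young woman", "lass", "gal", "schoolgirl",
             "young lady", "maiden", "damsel"],
    "maiden": ["budding", "fresh", "pristine", "new",
               "raw", "undeveloped", "virgin"],
    "pristine": ["disinfected", "germ-free", "immaculate", "pasteurized",
                 "purified", "spotless", "sterilized"],
}

for level in (0, 1, 2):
    print(level, expand("girl", synonym_file, level))
```

With the example file, each extra level adds the seven terms of one more line: level 0 yields the synonyms of "girl", level 1 adds those of "maiden", and level 2 adds those of "pristine".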

7. Save the configuration file and restart IDOL server.
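
Because the configuration chains several sections together by name (field process, property, synonym job, file), a typo in any one name breaks the setup. The following Python sketch follows that chain in the example fragment from this post. It uses the standard configparser module, which is an assumption of convenience: a full IDOL configuration file may contain syntax that configparser does not accept, and the function name is hypothetical.

```python
import configparser

# The configuration fragment from the steps above, collected in one place.
SAMPLE_CONFIG = """
[FieldProcessing]
0=SynonymMatch

[SynonymMatch]
Property=ApplySynonymMatch
PropertyFieldCSVs=*/DRETITLE,*/DRECONTENT

[ApplySynonymMatch]
SynonymType=Synonym_job

[Synonym]
0=Synonym_job

[Synonym_job]
File=animals.txt
MaxExpandLevel=1
"""


def resolve_synonym_files(config_text):
    """Map each listed field process to the synonym file it ends up using."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep parameter names exactly as written
    parser.read_string(config_text)
    files = {}
    # This sketch assumes every listed process is a synonym process.
    for process in parser["FieldProcessing"].values():
        prop = parser[process]["Property"]
        job = parser[prop]["SynonymType"]
        if job not in parser["Synonym"].values():
            raise ValueError(f"{job} is not listed in [Synonym]")
        files[process] = parser[job]["File"]
    return files


print(resolve_synonym_files(SAMPLE_CONFIG))
```

A missing section or parameter surfaces as a KeyError naming the broken link, which is usually enough to find the typo.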

Execute Synonym Searches
After creating a synonym file and configuring IDOL server to use it, turn any Query action that is sent to IDOL server into a synonym query by adding &Synonym=true to it.

For example:
http://localhost:5552/action=Query&Text=Felix is a great mouser&Synonym=true

This query returns documents that conceptually match the term mouser, as well as documents that conceptually match any of the terms listed as synonyms for the term mouser in the synonym file.   
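
When the query is built by a program rather than typed into a browser, the text should be percent-encoded. A minimal Python sketch, assuming the same host, port and action syntax as the example URL above; the function name is hypothetical.

```python
from urllib.parse import quote


def synonym_query_url(host, port, text):
    """Build a Query action URL with synonym expansion switched on."""
    # Percent-encode the text so that spaces and reserved characters
    # such as & do not break the parameter list.
    encoded = quote(text, safe="")
    return f"http://{host}:{port}/action=Query&Text={encoded}&Synonym=true"


print(synonym_query_url("localhost", 5552, "Felix is a great mouser"))
# http://localhost:5552/action=Query&Text=Felix%20is%20a%20great%20mouser&Synonym=true
```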

Implementation of Approach 2: Set up an Additional Synonym IDOL Server

Key process to set up an additional IDOL server:

  1. Install the Synonym IDOL server.
  2. Create a synonym file and index it.
  3. Execute a synonym query.

Process to Install the Synonym IDOL Server
Install the IDOL server component following the installation instructions. If the Synonym IDOL server is installed on the same machine as your existing IDOL server, ensure that the servers use different ports.

Create and Index a Synonym File
You can obtain the synonym file you are going to store in your Synonym IDOL server by spidering a thesaurus site (using HTTP Connector) or by creating the file manually. A synonym file must be a text file in which each entry contains the fields #DREREFERENCE and #DRECONTENT and ends with #DREENDDOC.

For example:

#DREREFERENCE Syn1.txt
#DRECONTENT cat feline grimalkin moggy mouser tabby siamese kitten
#DREENDDOC

#DREREFERENCE Syn2.txt
#DRECONTENT dog cur hound mongrel mutt pooch puppy
#DREENDDOC

Note: If HTTP Connector is used to create the synonym file, the connector can also be used to index the file. A manually created file can be indexed using a DREADD index action.
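
For a manually maintained list, the file can be generated rather than typed. The Python sketch below follows the record layout shown in the example above, using the standard IDX field spelling #DREREFERENCE. The function name and the shortened sample data are assumptions made for illustration.

```python
def build_idx(documents):
    """documents maps a reference name to the list of synonyms it groups."""
    records = []
    for reference, terms in documents.items():
        records.append(
            f"#DREREFERENCE {reference}\n"
            f"#DRECONTENT {' '.join(terms)}\n"
            "#DREENDDOC\n"
        )
    # A blank line between records, as in the example layout.
    return "\n".join(records)


idx_text = build_idx({
    "Syn1.txt": ["cat", "feline", "mouser"],
    "Syn2.txt": ["dog", "cur", "hound"],
})
print(idx_text)
```

The resulting text can be saved to a file and indexed as described in the note above.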

Execute Synonym Searches
To execute synonym searches:
1. Send a query to the Synonym IDOL server.
For example: http://synonymServerHost:synonymServerPort/action=Query&Text=mouser
2. When the Synonym IDOL server returns the synonym results, add the results to the query string and send the newly formed query to the Content IDOL server (normally a front end is set up to do this).
For example: http://IDOLhost:port/action=Query&Text=mouser+(cat feline grimalkin moggy mouser tabby siamese kitten)
This query returns documents that conceptually match the term mouser, as well as documents that conceptually match any of the terms that the Synonym IDOL server lists as synonyms for the term mouser.
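
The second step, which a front end would normally perform, amounts to string assembly. Below is a minimal Python sketch, assuming the synonyms have already been extracted from the Synonym IDOL server's response (the response format is not covered here). The function name, host and port are hypothetical.

```python
from urllib.parse import quote_plus


def expanded_query_url(host, port, term, synonyms):
    """Append the returned synonyms, in parentheses, to the original term."""
    text = f"{term} ({' '.join(synonyms)})"
    # quote_plus encodes spaces as + and escapes the parentheses.
    return f"http://{host}:{port}/action=Query&Text={quote_plus(text)}"


print(expanded_query_url("IDOLhost", 9000, "mouser", ["cat", "feline"]))
# http://IDOLhost:9000/action=Query&Text=mouser+%28cat+feline%29
```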


1 comment:

  1. Hello,

    I want to know if Autonomy has its own inbuilt synonym database? If yes where and how to get it?
