Friday, 12 August 2011



The purpose of lemmatization is to enable a query with one word form to match documents that contain a
different form of the word.

In English, lemmatization can cover:
  1. singular and plural forms of nouns.
  2. positive, comparative, and superlative forms of adjectives.
  3. tense and person forms of verbs.

In other languages, lemmatization also allows search across case and gender forms and other inflectional
paradigms, depending on the grammatical features of the word forms.

Lemmatization allows a user to search for a term like "car" and get both documents that
contain the word "car" and documents that contain the word "cars".
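
The car/cars example can be sketched in a few lines of Python. This is only an illustration: the LEMMAS table is a tiny hand-written stand-in for a real language dictionary, and the names lemmatize and matches are hypothetical, not part of any search product.

```python
# Hypothetical, hand-written lemma dictionary: maps each word form to its lemma.
# A real system would use a full language-specific dictionary instead.
LEMMAS = {
    "car": "car", "cars": "car",
    "good": "good", "better": "good", "best": "good",
    "go": "go", "goes": "go", "went": "go", "gone": "go", "going": "go",
}


def lemmatize(token):
    """Return the lemma for a token, or the lowercased token if it is unknown."""
    token = token.lower()
    return LEMMAS.get(token, token)


def matches(query, document):
    """True if any word in the document shares a lemma with the query term."""
    query_lemma = lemmatize(query)
    return any(lemmatize(word) == query_lemma for word in document.split())


docs = ["My car is red", "Two cars were parked outside", "The bus was late"]

# Both the "car" document and the "cars" document match the query "car".
hits = [d for d in docs if matches("car", d)]
print(hits)
```

Because the query and the document words are both reduced to their lemma before comparison, a query for "cars" would find the same two documents.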

Lemmatization, stemming and wildcard search
Lemmatization differs from stemming and wildcard search in being more precise. Different word forms are
mapped to each other by using a language-specific dictionary, not by applying simple suffix-chopping rules
(stemming) or partial string matches (wildcard search).
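
A small Python comparison makes the difference in precision concrete. Everything here is an assumption made for illustration: LEMMAS is a toy dictionary, and naive_stem is a deliberately crude suffix chopper, not a real stemming algorithm.

```python
from fnmatch import fnmatchcase

# Toy dictionary (hypothetical); real lemmatizers use a full language dictionary.
LEMMAS = {"car": "car", "cars": "car", "caring": "care",
          "carpet": "carpet", "go": "go", "went": "go"}


def lemma(word):
    """Dictionary lookup: unknown words are returned unchanged."""
    return LEMMAS.get(word, word)


def naive_stem(word):
    """Crude suffix chopping, used only to illustrate the idea of stemming."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


words = ["car", "cars", "caring", "carpet", "went"]

# Lemmatization: only true forms of "car" match.
lemma_hits = [w for w in words if lemma(w) == lemma("car")]
# Stemming: "caring" is chopped to "car" and matches by accident.
stem_hits = [w for w in words if naive_stem(w) == naive_stem("car")]
# Wildcard: anything starting with "car" matches, including "carpet".
wildcard_hits = [w for w in words if fnmatchcase(w, "car*")]

print(lemma_hits)     # ['car', 'cars']
print(stem_hits)      # ['car', 'cars', 'caring']
print(wildcard_hits)  # ['car', 'cars', 'caring', 'carpet']
```

The dictionary also handles irregular forms that neither of the other approaches can reach: in this sketch, lemma("went") equals lemma("go"), while the stem of "went" stays "went" and the pattern "go*" does not match it.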
