Goal: decide whether two words are morphologically related.
Stemming
Stemming reduces a word to its basic form, called the stem, by removing suffixes and endings and sometimes performing additional transformations.
Remark: In practice, prefixes are sometimes preserved (e.g., rescan).
Porter’s methods for stemming
- rule-based methods
- paper: "An algorithm for suffix stripping" (Porter, 1980)
- the method is not always accurate
Measure
The measure of the word is an indication of the number of syllables in it
- Each sequence of consonants is denoted by C
- Each sequence of vowels is denoted by V
- The initial C and the final V are optional
Every word therefore has the form [C](VC)^m[V], and the measure m is the number of times VC is repeated
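The measure can be computed by collapsing a word into alternating runs of C and V and counting the VC pairs. The sketch below is one possible way to do this. The function names are illustrative, and it assumes Porter's convention that a, e, i, o, u are vowels and that y counts as a vowel only when it follows a consonant.

```python
def is_consonant(word, i):
    """Return True if word[i] is a consonant under the assumed convention."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        # Assumption: 'y' is a vowel only when it follows a consonant.
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(word):
    """Count the VC pairs in the collapsed C/V pattern of the word."""
    word = word.lower()
    pattern = []
    for i in range(len(word)):
        label = "C" if is_consonant(word, i) else "V"
        # Collapse each run of consonants or vowels into a single symbol.
        if not pattern or pattern[-1] != label:
            pattern.append(label)
    return "".join(pattern).count("VC")


print(measure("tree"))      # pattern CV    -> 0
print(measure("trouble"))   # pattern CVCV  -> 1
print(measure("troubles"))  # pattern CVCVC -> 2
```

For example, "trouble" collapses to CVCV, which contains one VC pair, so its measure is 1.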
Porter’s algorithm
The initial word is checked against a sequence of transformation patterns, in order.
Example pattern: if the word ends with -ation, the suffix is removed (meditation -> medit).
- whenever the pattern matches, the word is transformed and the algorithm restarts from the beginning of the list of patterns with the transformed word
- if no pattern matches, the algorithm stops and outputs the most recently transformed version of the word
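The match-and-restart loop described above can be sketched as follows. This is only an illustration of the control flow, not Porter's actual rule set: the RULES list is a small hypothetical sample, and the real algorithm also attaches conditions (such as a minimum measure) to many of its rules.

```python
# Hypothetical sample of (suffix, replacement) patterns, checked in order.
RULES = [
    ("sses", "ss"),
    ("ies", "i"),
    ("ation", ""),
    ("ing", ""),
]


def stem(word):
    """Apply the first matching rule, then restart from the top of the list."""
    word = word.lower()
    changed = True
    while changed:
        changed = False
        for suffix, replacement in RULES:
            # Require a non-empty remainder so a word is never fully erased.
            if word.endswith(suffix) and len(word) > len(suffix):
                word = word[: len(word) - len(suffix)] + replacement
                changed = True
                break  # restart with the transformed word
    # No rule matched: return the most recently transformed version.
    return word


print(stem("meditation"))  # medit
print(stem("caresses"))    # caress
```

Every rule in this sample strictly shortens the word, so the loop is guaranteed to terminate.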