LemmInflect
A Python module for English lemmatization and inflection.
About
LemmInflect uses a dictionary approach to lemmatize English words and to inflect them into the form specified by a user-supplied Universal Dependencies or Penn Treebank tag. The library also handles out-of-vocabulary (OOV) words by applying neural network techniques to classify word forms and choose the appropriate morphing rules.
The system acts as a standalone module or as an extension to the spaCy NLP system.
The dictionary and morphology rules are derived from the NIH's SPECIALIST Lexicon, which contains an extensive set of information on English word forms.
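The dictionary-first, rules-as-fallback idea described above can be illustrated with a small toy sketch. This is not LemmInflect's code or data: `TOY_LEXICON`, `regular_rule`, and `toy_inflect` are hypothetical names invented for the illustration, the lexicon holds only three entries, and fixed suffix rules stand in for the neural network that LemmInflect uses to choose morphing rules for OOV words.

```python
# Toy illustration only: NOT LemmInflect's actual data, API, or algorithm.
from typing import Dict, Tuple

# Hypothetical miniature lexicon: (lemma, Penn Treebank tag) -> inflected form.
TOY_LEXICON: Dict[Tuple[str, str], str] = {
    ("be", "VBD"): "was",
    ("go", "VBD"): "went",
    ("mouse", "NNS"): "mice",
}


def regular_rule(lemma: str, tag: str) -> str:
    """Apply a simplified regular English rule for a few Penn Treebank tags."""
    if tag == "VBD":  # past tense
        return lemma + "d" if lemma.endswith("e") else lemma + "ed"
    if tag == "VBG":  # gerund / present participle
        if lemma.endswith("e") and not lemma.endswith("ee"):
            return lemma[:-1] + "ing"
        return lemma + "ing"
    if tag in ("NNS", "VBZ"):  # plural noun / third-person singular verb
        if lemma.endswith(("s", "x", "z", "ch", "sh")):
            return lemma + "es"
        return lemma + "s"
    return lemma  # unknown tag: return the lemma unchanged


def toy_inflect(lemma: str, tag: str) -> str:
    """Look the word up in the lexicon first, then fall back to a rule."""
    key = (lemma.lower(), tag)
    if key in TOY_LEXICON:
        return TOY_LEXICON[key]
    # Out-of-vocabulary path: a fixed rule replaces the learned classifier.
    return regular_rule(lemma.lower(), tag)
```

In this sketch, irregular forms such as `toy_inflect("go", "VBD")` come from the lexicon, while a word missing from it, such as `toy_inflect("watch", "VBD")`, falls through to the suffix rule.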
A simpler, inflection-only system is available as pyInflect. LemmInflect was created to address some of the shortcomings of that project and to add features, such as:
Independence from the spaCy lemmatizer
Neural nets to disambiguate out-of-vocabulary morphology
Unigrams to disambiguate spellings and multiple word forms
Documentation
For the latest documentation, see
Accuracy of the Lemmatizer
The accuracy of LemmInflect and several other popular NLP utilities was tested using