Installing third-party natural language processing tools for NLTK:
https://github.com/nltk/nltk/wiki/Installing-Third-Party-Software
How NLTK Discovers Third Party Software
NLTK finds third-party software either through environment variables or through path arguments passed to API calls. This page lists installation instructions and the associated environment variables.
Java
Java is not required by NLTK; however, some third-party software may depend on it. NLTK finds the java binary via the system PATH environment variable, or through JAVAHOME or JAVA_HOME.
To search for Java binaries (jar files), NLTK checks the Java CLASSPATH variable. In addition, each dependency usually has its own environment variable, which is searched as well.
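The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name find_java_binary is hypothetical, and the order in which the variables are tried (JAVAHOME, then JAVA_HOME, then PATH) is an assumption, not NLTK's actual implementation.

```python
import os
import shutil


def find_java_binary(env=None):
    """Return a path to a java executable, or None if nothing is found.

    Hypothetical helper: the lookup order (JAVAHOME, JAVA_HOME, PATH)
    is an assumption made for illustration, not NLTK's own code.
    """
    env = os.environ if env is None else env
    exe = "java.exe" if os.name == "nt" else "java"

    for var in ("JAVAHOME", "JAVA_HOME"):
        home = env.get(var)
        if not home:
            continue
        # A JDK/JRE home normally keeps its executables under bin/,
        # but also accept a variable that points at the bin directory.
        for candidate in (os.path.join(home, "bin", exe),
                          os.path.join(home, exe)):
            if os.path.isfile(candidate):
                return candidate

    # Fall back to the directories listed in PATH.
    return shutil.which("java", path=env.get("PATH", ""))


if __name__ == "__main__":
    print(find_java_binary())
```

Passing a plain dict as env makes it easy to try out different variable combinations without touching the real environment.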
Windows
- Download and install the JDK from Java's official website: http://www.oracle.com/technetwork/java/javase/downloads/index.html?ssSourceSiteId=otnjp
Linux
It is best to use the package manager to install java.
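After installing on either platform, it can help to confirm that a java binary is reachable and recent enough, since the Stanford tools on this page ask for version 1.8+. The sketch below is one possible check; the helper names are hypothetical, and the parsing assumes the usual banner formats (for example a quoted "1.8.0_x" on older releases and "11.0.2" on newer ones).

```python
import re
import subprocess


def parse_java_version(banner):
    """Extract (major, minor) from the first quoted version in a banner."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2) or 0)


def java_is_at_least_1_8(banner):
    """True if the banner reports Java 1.8 or anything newer."""
    version = parse_java_version(banner)
    # Legacy releases report "1.8.0_x"; newer ones report "9", "11.0.2", ...
    # Tuple comparison handles both schemes: (11, 0) >= (1, 8).
    return version is not None and version >= (1, 8)


def installed_java_banner():
    """Run `java -version` and return its banner, or None if java is missing."""
    try:
        result = subprocess.run(["java", "-version"],
                                capture_output=True, text=True, check=False)
    except FileNotFoundError:
        return None
    # The JVM prints the version banner on stderr.
    return result.stderr or result.stdout


if __name__ == "__main__":
    banner = installed_java_banner()
    if banner is None:
        print("java was not found on PATH")
    else:
        print("version OK" if java_is_at_least_1_8(banner) else "java is too old")
```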
Stanford Tagger, NER, Tokenizer and Parser.
To install:
- Make sure Java is installed (version 1.8+)
- Download & extract the Stanford tokenizer package (contains the Stanford tagger): http://nlp.stanford.edu/software/lex-parser.shtml
- Download & extract the Stanford NER package: http://nlp.stanford.edu/software/CRF-NER.shtml
- Download & extract the Stanford POS tagger package: http://nlp.stanford.edu/software/tagger.shtml
- Download & extract the Stanford Parser package: http://nlp.stanford.edu/software/lex-parser.shtml
- Add the directories containing stanford-postagger.jar, stanford-ner.jar and stanford-parser.jar to the CLASSPATH environment variable
- Point the STANFORD_MODELS environment variable to the directory containing the Stanford tokenizer models, Stanford POS models, Stanford NER models and Stanford parser models (e.g. arabic.tagger, arabic-train.tagger, chinese-distsim.tagger, stanford-parser-x.x.x-models.jar ...)
- e.g. export STANFORD_MODELS=/usr/share/stanford-postagger-full-2015-01-30/models:/usr/share/stanford-ner-2015-04-20/classifier
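The same variables can also be set from inside a Python session instead of the shell. The following is a minimal sketch, assuming the variables are set before any code that reads them runs. The helper names prepend_to_env_path and find_on_env_path are hypothetical and are not part of NLTK. Changes to os.environ affect only the current process and its children.

```python
import os


def prepend_to_env_path(name, directories, env=None):
    """Put directories at the front of a path-list variable, without repeats."""
    env = os.environ if env is None else env
    existing = [p for p in env.get(name, "").split(os.pathsep) if p]
    merged = []
    for entry in list(directories) + existing:
        if entry not in merged:
            merged.append(entry)
    # os.pathsep is ":" on Linux/macOS and ";" on Windows.
    env[name] = os.pathsep.join(merged)
    return env[name]


def find_on_env_path(filename, name, env=None):
    """Return the first location in a path-list variable that provides filename."""
    env = os.environ if env is None else env
    for entry in env.get(name, "").split(os.pathsep):
        if not entry:
            continue
        # An entry may be the file itself or a directory that contains it.
        if os.path.isfile(entry) and os.path.basename(entry) == filename:
            return entry
        candidate = os.path.join(entry, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Example paths copied from the export line above; adjust to your install.
    prepend_to_env_path("STANFORD_MODELS", [
        "/usr/share/stanford-postagger-full-2015-01-30/models",
        "/usr/share/stanford-ner-2015-04-20/classifier",
    ])
    print(find_on_env_path("arabic.tagger", "STANFORD_MODELS"))
```

A None result from find_on_env_path is a quick hint that a jar or model file is not where the variable says it is.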