Python中使用Stanford CoreNLP


Stanford CoreNLP的源代码是使用Java写的,提供了Server方式进行交互。stanfordcorenlp是一个对Stanford CoreNLP进行了封装的Python工具包,GitHub地址,使用非常方便。
Stanford 官方发布了python版,直接可以安装。具体参见。
pip install stanfordnlp


1:下载安装JDK 1.8及以上版本。
2:下载Stanford CoreNLP文件,解压。



Python packages using the Stanford CoreNLP server

These packages use the Stanford CoreNLP server that we’ve developed over the last couple of years. You should probably use one of them.

  • stanfordcorenlp by Lynten Guo. A Python wrapper to Stanford CoreNLP server, version 3.8.0. Also: PyPI page.
  • pycorenlp, A Python wrapper for Stanford CoreNLP by Smitha Milli that uses the new CoreNLP v3.6+ server. Available on PyPI.
  • corenlp-pywrap by Sherin Thomas also uses the new CoreNLP v3.6+ server. Python 3.x (only). Also: PyPI page.
  • Stanford CoreNLP Python Interface: A reference implementation of a Python interface to the Stanford CoreNLP server. By Arun Chaganty. PyPI page.
  • pynlp ,A (Pythonic) Python wrapper for Stanford CoreNLP by Sina. PyPI page.
  • NLTK since version 3.2.3 has a new interface to Stanford CoreNLP using the StanfordCoreNLPServer. Among other places, see instructions on using the dependency parser and the code for this module, and if you poke around the documentation, you can find equivalent interfaces to other CoreNLP components; for example here is Stanford CoreNLP NER. Much of the work for this was done by Dmitrijs Milajevs.




StanfordNLP supports Python 3.6 or later. We strongly recommend that you install StanfordNLP from PyPI. If you already have pip installed, simply run

pip install stanfordnlp

this should also help resolve all of the dependencies of StanfordNLP, for instance PyTorch 1.0.0 or above.
Alternatively, you can also install from source of this git repository, which will give you more flexibility in developing on top of StanfordNLP and training your own models. For this option, run

git clone
cd stanfordnlp
pip install -e .
Running StanfordNLP

Getting Started with the neural pipeline
To run your first StanfordNLP pipeline, simply following these steps in your Python interactive interpreter:

>>> import stanfordnlp
>>>'en')   # This downloads the English models for the neural pipeline
>>> nlp = stanfordnlp.Pipeline() # This sets up a default neural pipeline in English
>>> doc = nlp("Barack Obama was born in Hawaii.  He was elected president in 2008.")
>>> doc.sentences[0].print_dependencies()

The last command will print out the words in the first sentence in the input string (or Document, as it is represented in StanfordNLP), as well as the indices for the word that governs it in the Universal Dependencies parse of that sentence (its “head”), along with the dependency relation between the words. The output should look like:

('Barack', '4', 'nsubj:pass')
('Obama', '1', 'flat')
('was', '4', 'aux:pass')
('born', '0', 'root')
('in', '6', 'case')
('Hawaii', '4', 'obl')
('.', '4', 'punct')

Note: If you are running into issues like OSError: [Errno 22] Invalid argument, it’s very likely that you are affected by a known Python issue, and we would recommend Python 3.6.8 or later and Python 3.7.2 or later.

We also provide a multilingual demo script that demonstrates how one uses StanfordNLP in other languages than English, for example Chinese (traditional)

python demo/ -l zh

See our getting started guide for more details.


本教程以stanfordcorenlp接口为例(本文所用版本为Stanford CoreNLP 3.9.1),讲解Python调用StanfordCoreNLP的使用方法。


简单使用命令:pip install stanfordcorenlp
选择USTC镜像安装(安装速度很快,毕竟国内镜像):pip install stanfordcorenlp -i --trusted-host


Simple usage
from stanfordcorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP(r'G:\JavaLibraries\stanford-corenlp-full-2018-02-27')

sentence = 'Guangdong University of Foreign Studies is located in Guangzhou.'
print 'Tokenize:', nlp.word_tokenize(sentence)
print 'Part of Speech:', nlp.pos_tag(sentence)
print 'Named Entities:', nlp.ner(sentence)
print 'Constituency Parsing:', nlp.parse(sentence)
print 'Dependency Parsing:', nlp.dependency_parse(sentence)

nlp.close() # Do not forget to close! The backend server will consume a lot memery.

Output format:

# Tokenize
[u'Guangdong', u'University', u'of', u'Foreign', u'Studies', u'is', u'located', u'in', u'Guangzhou', u'.']

#Part of Speech

[(u'Guangdong', u'NNP'), (u'University', u'NNP'), (u'of', u'IN'), (u'Foreign', u'NNP'), (u'Studies', u'NNPS'), (u'is', u'VBZ'), (u'located', u'JJ'), (u'in', u'IN'), (u'Guangzhou', u'NNP'), (u'.', u'.')]

# Named Entities
 [(u'Guangdong', u'ORGANIZATION'), (u'University', u'ORGANIZATION'), (u'of', u'ORGANIZATION'), (u'Foreign', u'ORGANIZATION'), (u'Studies', u'ORGANIZATION'), (u'is', u'O'), (u'located', u'O'), (u'in', u'O'), (u'Guangzhou', u'LOCATION'), (u'.', u'O')]

# Constituency Parsing
      (NP (NNP Guangdong) (NNP University))
      (PP (IN of)
        (NP (NNP Foreign) (NNPS Studies))))
    (VP (VBZ is)
      (ADJP (JJ located)
        (PP (IN in)
          (NP (NNP Guangzhou)))))
    (. .)))

#Dependency Parsing
[(u'ROOT', 0, 7), (u'compound', 2, 1), (u'nsubjpass', 7, 2), (u'case', 5, 3), (u'compound', 5, 4), (u'nmod', 2, 5), (u'auxpass', 7, 6), (u'case', 9, 8), (u'nmod', 7, 9), (u'punct', 7, 10)]

Other Human Languages Support

Note: you must download an additional model file and place it in the …/stanford-corenlp-full-2018-02-27 folder. For example, you should download the stanford-chinese-corenlp-2018-02-27-models.jar file if you want to process Chinese.

# _*_coding:utf-8_*_

# Other human languages support, e.g. Chinese

sentence = '清华大学位于北京。'

with StanfordCoreNLP(r'G:\JavaLibraries\stanford-corenlp-full-2018-02-27', lang='zh') as nlp:

General Stanford CoreNLP API

Since this will load all the models which require more memory, initialize the server with more memory. 8GB is recommended.

#General json output
nlp = StanfordCoreNLP(r'path_to_corenlp', memory='8g')
print nlp.annotate(sentence)

You can specify properties:

annotators: tokenize, ssplit, pos, lemma, ner, parse, depparse, dcoref (See Detail)

pipelineLanguage: en, zh, ar, fr, de, es (English, Chinese, Arabic, French, German, Spanish) (See Annotator Support Detail)

outputFormat: json, xml, text

text = 'Guangdong University of Foreign Studies is located in Guangzhou. ' \
       'GDUFS is active in a full range of international cooperation and exchanges in education. '

props={'annotators': 'tokenize,ssplit,pos','pipelineLanguage':'en','outputFormat':'xml'}
print nlp.annotate(text, properties=props)

Use an Existing Server

Start a CoreNLP Server with command:

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
And then:

# Use an existing server
nlp = StanfordCoreNLP('http://localhost', port=9000)


import logging
from stanfordcorenlp import StanfordCoreNLP

# Debug the wrapper
nlp = StanfordCoreNLP(r'path_or_host', logging_level=logging.DEBUG)

# Check more info from the CoreNLP Server 
nlp = StanfordCoreNLP(r'path_or_host', quiet=False, logging_level=logging.DEBUG)


We use setuptools to package our project. You can build from the latest source code with the following command:

  • $ python bdist_wheel --universal
    You will see the .whl file under dist directory.
发布了158 篇原创文章 · 获赞 144 · 访问量 39万+


©️2019 CSDN 皮肤主题: 精致技术 设计师: CSDN官方博客