Python中使用Stanford CoreNLP

最新推荐文章于 2023-04-26 20:22:45 发布

Mr番茄蛋

最新推荐文章于 2023-04-26 20:22:45 发布

阅读量3.6w

点赞数 16

分类专栏： python NLP 文章标签： NLP

本文链接：https://blog.csdn.net/qq_35203425/article/details/80451243

版权

python 同时被 2 个专栏收录

86 篇文章 6 订阅

订阅专栏

NLP

9 篇文章 0 订阅

订阅专栏

前言

Stanford CoreNLP的源代码是使用Java写的，提供了Server方式进行交互。stanfordcorenlp是一个对Stanford CoreNLP进行了封装的Python工具包，GitHub地址，使用非常方便。
***************更新**************
Stanford 官方发布了python版，直接可以安装。具体参见https://stanfordnlp.github.io/stanfordnlp/。
pip install stanfordnlp

安装依赖

1：下载安装JDK 1.8及以上版本。
2：下载Stanford CoreNLP文件，解压。
3：处理中文还需要下载中文的模型jar文件，然后放到stanford-corenlp-full-2018-02-27根目录下即可（注意一定要下载这个文件，否则它默认是按英文来处理的）。

常用接口

StanfordCoreNLP官网给出了python调用StanfordCoreNLP的接口。

Python packages using the Stanford CoreNLP server

These packages use the Stanford CoreNLP server that we’ve developed over the last couple of years. You should probably use one of them.

stanfordcorenlp by Lynten Guo. A Python wrapper to Stanford CoreNLP server, version 3.8.0. Also: PyPI page.
pycorenlp, A Python wrapper for Stanford CoreNLP by Smitha Milli that uses the new CoreNLP v3.6+ server. Available on PyPI.
corenlp-pywrap by Sherin Thomas also uses the new CoreNLP v3.6+ server. Python 3.x (only). Also: PyPI page.
Stanford CoreNLP Python Interface: A reference implementation of a Python interface to the Stanford CoreNLP server. By Arun Chaganty. PyPI page.
pynlp ，A (Pythonic) Python wrapper for Stanford CoreNLP by Sina. PyPI page.
NLTK since version 3.2.3 has a new interface to Stanford CoreNLP using the StanfordCoreNLPServer. Among other places, see instructions on using the dependency parser and the code for this module, and if you poke around the documentation, you can find equivalent interfaces to other CoreNLP components; for example here is Stanford CoreNLP NER. Much of the work for this was done by Dmitrijs Milajevs.

补充

Stanford官方发布了Python版的nlp处理工具，不在纠结使用java了。

`Setup`

StanfordNLP supports Python 3.6 or later. We strongly recommend that you install StanfordNLP from PyPI. If you already have pip installed, simply run

pip install stanfordnlp

this should also help resolve all of the dependencies of StanfordNLP, for instance PyTorch 1.0.0 or above.
Alternatively, you can also install from source of this git repository, which will give you more flexibility in developing on top of StanfordNLP and training your own models. For this option, run

git clone git@github.com:stanfordnlp/stanfordnlp.git
cd stanfordnlp
pip install -e .

`Running StanfordNLP`

Getting Started with the neural pipeline
To run your first StanfordNLP pipeline, simply following these steps in your Python interactive interpreter:

>>> import stanfordnlp
>>> stanfordnlp.download('en')   # This downloads the English models for the neural pipeline
>>> nlp = stanfordnlp.Pipeline() # This sets up a default neural pipeline in English
>>> doc = nlp("Barack Obama was born in Hawaii.  He was elected president in 2008.")
>>> doc.sentences[0].print_dependencies()

The last command will print out the words in the first sentence in the input string (or Document, as it is represented in StanfordNLP), as well as the indices for the word that governs it in the Universal Dependencies parse of that sentence (its “head”), along with the dependency relation between the words. The output should look like:

('Barack', '4', 'nsubj:pass')
('Obama', '1', 'flat')
('was', '4', 'aux:pass')
('born', '0', 'root')
('in', '6', 'case')
('Hawaii', '4', 'obl')
('.', '4', 'punct')

Note: If you are running into issues like OSError: [Errno 22] Invalid argument, it’s very likely that you are affected by a known Python issue, and we would recommend Python 3.6.8 or later and Python 3.7.2 or later.

We also provide a multilingual demo script that demonstrates how one uses StanfordNLP in other languages than English, for example Chinese (traditional)

python demo/pipeline_demo.py -l zh

See our getting started guide for more details.

另外stanfordcorenlp使用教程

本教程以stanfordcorenlp接口为例（本文所用版本为Stanford CoreNLP 3.9.1），讲解Python调用StanfordCoreNLP的使用方法。

1.使用pip安装`stanfordcorenlp`:

简单使用命令：pip install stanfordcorenlp
选择USTC镜像安装（安装速度很快，毕竟国内镜像）：pip install stanfordcorenlp -i http://pypi.mirrors.ustc.edu.cn/simple/ --trusted-host pypi.mirrors.ustc.edu.cn

2.在Python环境下调用`stanfordcorenlp`:

Simple usage

from stanfordcorenlp import StanfordCoreNLP

nlp = StanfordCoreNLP(r'G:\JavaLibraries\stanford-corenlp-full-2018-02-27')

sentence = 'Guangdong University of Foreign Studies is located in Guangzhou.'
print 'Tokenize:', nlp.word_tokenize(sentence)
print 'Part of Speech:', nlp.pos_tag(sentence)
print 'Named Entities:', nlp.ner(sentence)
print 'Constituency Parsing:', nlp.parse(sentence)
print 'Dependency Parsing:', nlp.dependency_parse(sentence)

nlp.close() # Do not forget to close! The backend server will consume a lot memery.

Output format:

# Tokenize
[u'Guangdong', u'University', u'of', u'Foreign', u'Studies', u'is', u'located', u'in', u'Guangzhou', u'.']

#Part of Speech

[(u'Guangdong', u'NNP'), (u'University', u'NNP'), (u'of', u'IN'), (u'Foreign', u'NNP'), (u'Studies', u'NNPS'), (u'is', u'VBZ'), (u'located', u'JJ'), (u'in', u'IN'), (u'Guangzhou', u'NNP'), (u'.', u'.')]

# Named Entities
 [(u'Guangdong', u'ORGANIZATION'), (u'University', u'ORGANIZATION'), (u'of', u'ORGANIZATION'), (u'Foreign', u'ORGANIZATION'), (u'Studies', u'ORGANIZATION'), (u'is', u'O'), (u'located', u'O'), (u'in', u'O'), (u'Guangzhou', u'LOCATION'), (u'.', u'O')]

# Constituency Parsing
 (ROOT
  (S
    (NP
      (NP (NNP Guangdong) (NNP University))
      (PP (IN of)
        (NP (NNP Foreign) (NNPS Studies))))
    (VP (VBZ is)
      (ADJP (JJ located)
        (PP (IN in)
          (NP (NNP Guangzhou)))))
    (. .)))

#Dependency Parsing
[(u'ROOT', 0, 7), (u'compound', 2, 1), (u'nsubjpass', 7, 2), (u'case', 5, 3), (u'compound', 5, 4), (u'nmod', 2, 5), (u'auxpass', 7, 6), (u'case', 9, 8), (u'nmod', 7, 9), (u'punct', 7, 10)]

Other Human Languages Support

Note: you must download an additional model file and place it in the …/stanford-corenlp-full-2018-02-27 folder. For example, you should download the stanford-chinese-corenlp-2018-02-27-models.jar file if you want to process Chinese.

# _*_coding:utf-8_*_

# Other human languages support, e.g. Chinese

sentence = '清华大学位于北京。'

with StanfordCoreNLP(r'G:\JavaLibraries\stanford-corenlp-full-2018-02-27', lang='zh') as nlp:
    print(nlp.word_tokenize(sentence))
    print(nlp.pos_tag(sentence))
    print(nlp.ner(sentence))
    print(nlp.parse(sentence))
    print(nlp.dependency_parse(sentence))

General Stanford CoreNLP API

Since this will load all the models which require more memory, initialize the server with more memory. 8GB is recommended.

#General json output
nlp = StanfordCoreNLP(r'path_to_corenlp', memory='8g')
print nlp.annotate(sentence)
nlp.close()

You can specify properties:

annotators: tokenize, ssplit, pos, lemma, ner, parse, depparse, dcoref (See Detail)

pipelineLanguage: en, zh, ar, fr, de, es (English, Chinese, Arabic, French, German, Spanish) (See Annotator Support Detail)

outputFormat: json, xml, text

text = 'Guangdong University of Foreign Studies is located in Guangzhou. ' \
       'GDUFS is active in a full range of international cooperation and exchanges in education. '

props={'annotators': 'tokenize,ssplit,pos','pipelineLanguage':'en','outputFormat':'xml'}
print nlp.annotate(text, properties=props)
nlp.close()

Use an Existing Server

Start a CoreNLP Server with command:

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
And then:

# Use an existing server
nlp = StanfordCoreNLP('http://localhost', port=9000)

Debug

import logging
from stanfordcorenlp import StanfordCoreNLP

# Debug the wrapper
nlp = StanfordCoreNLP(r'path_or_host', logging_level=logging.DEBUG)

# Check more info from the CoreNLP Server 
nlp = StanfordCoreNLP(r'path_or_host', quiet=False, logging_level=logging.DEBUG)
nlp.close()

Build

We use setuptools to package our project. You can build from the latest source code with the following command:

$ python setup.py bdist_wheel --universal
You will see the .whl file under dist directory.

Mr番茄蛋

关注

16
点赞
踩
85

收藏

觉得还不错? 一键收藏
58
评论
Python中使用Stanford CoreNLP

前言Stanford CoreNLP的源代码是使用Java写的，提供了Server方式进行交互。stanfordcorenlp是一个对Stanford CoreNLP进行了封装的Python工具包，GitHub地址，使用非常方便。安装依赖1：下载安装JDK 1.8及以上版本。 2：下载Stanford CoreNLP文件，解压。 3：处理中文还需要下载中文的模型jar文件，然后...
复制链接

扫一扫