3.4 Word Segmentation and Named Entity Extraction with StanfordCoreNLP


筱筱思


Article link: https://blog.csdn.net/liusisi_/article/details/108454106


1. Installing StanfordCoreNLP:

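Download and install JDK 1.8 or later.
Download the Stanford CoreNLP package and unzip it.
To process Chinese, also download the Chinese models jar file and place it in the root of the stanford-corenlp-full-2016-10-31 directory.
Install the Python wrapper with pip: pip install stanfordcorenlp (the Douban PyPI mirror can be used instead: pip install -i https://pypi.douban.com/simple stanfordcorenlp)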
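Before loading the wrapper, it can help to confirm that the prerequisites from the steps above are in place. The following is a minimal sketch, not part of the original setup: the helper name `check_corenlp_setup` is hypothetical, and the glob pattern for the Chinese models jar (`stanford-chinese-corenlp-*-models.jar`) is an assumption about how the downloaded file is named.

```python
import glob
import os
import shutil


def check_corenlp_setup(corenlp_dir):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    # The wrapper launches a Java server, so a `java` executable must be on PATH.
    if shutil.which('java') is None:
        problems.append('java executable not found on PATH')
    if not os.path.isdir(corenlp_dir):
        problems.append('CoreNLP directory not found: ' + corenlp_dir)
        return problems
    # Assumed naming pattern for the Chinese models jar (adjust if yours differs).
    pattern = os.path.join(corenlp_dir, 'stanford-chinese-corenlp-*-models.jar')
    if not glob.glob(pattern):
        problems.append('no Chinese models jar in ' + corenlp_dir)
    return problems


if __name__ == '__main__':
    for problem in check_corenlp_setup('D:/software/stanford-corenlp-full-2016-10-31'):
        print(problem)
```

If the function prints nothing, the Java runtime and the Chinese models jar were both found; it does not verify that the jar version matches the CoreNLP release.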

2. Word segmentation and named entity extraction (loading is very slow here, so only one sentence is used for testing)

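from stanfordcorenlp import StanfordCoreNLP

# Path to the unzipped CoreNLP directory; lang='zh' selects the Chinese models
nlp = StanfordCoreNLP('D:/software/stanford-corenlp-full-2016-10-31', lang='zh')

sentence = '清华大学位于北京。'
print("Word segmentation:")
print(nlp.word_tokenize(sentence))  # Chinese word segmentation
print("POS tagging:")
print(nlp.pos_tag(sentence))  # part-of-speech tagging
print("Named entity recognition:")
print(nlp.ner(sentence))  # named entity recognition

nlp.close()  # shut down the background Java server started by the wrapper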
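The NER call returns one (token, tag) pair per token, so a multi-token name arrives in pieces. As a rough sketch of how the pairs might be merged into whole entities: the helper name `group_entities` is hypothetical, and the sample input below is hand-written with an assumed segmentation, not captured output.

```python
def group_entities(tagged_tokens, outside_tag='O'):
    """Merge consecutive tokens that share an entity tag into (text, tag) pairs."""
    entities = []
    current_words, current_tag = [], None
    for word, tag in tagged_tokens:
        if tag == current_tag and tag != outside_tag:
            # Same entity type as the previous token: extend the current entity.
            current_words.append(word)
            continue
        if current_words:
            entities.append((''.join(current_words), current_tag))
        if tag != outside_tag:
            current_words, current_tag = [word], tag
        else:
            current_words, current_tag = [], None
    if current_words:
        entities.append((''.join(current_words), current_tag))
    return entities


# Illustrative input in the shape returned by nlp.ner(); the segmentation is assumed.
sample = [('清华', 'ORGANIZATION'), ('大学', 'ORGANIZATION'), ('位于', 'O'),
          ('北京', 'GPE'), ('。', 'O')]
print(group_entities(sample))
```

Tokens are joined without spaces, which suits Chinese text. Two different entities of the same type that sit directly next to each other would be merged into one, so this is only a first approximation.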

3. Output:
[Image: program output]
4. Problems encountered
When processing Chinese, stanfordcorenlp returns empty strings in place of the tokens:

['', '', '', '', '']
[('', 'NR'), ('', 'NN'), ('', 'VV'), ('', 'NR'), ('', 'PU')]
[('', 'ORGANIZATION'), ('', 'ORGANIZATION'), ('', 'O'), ('', 'GPE'), ('', 'O')]
Solution:
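One possible workaround, offered as a sketch and not necessarily the author's fix: the tags are correct while the token text is empty, which suggests the wrapper reads a token field that comes back empty for Chinese. Assuming the server's JSON response carries the token text under `word` as well as `originalText` (an assumption about the response layout), the tokens can be read from the JSON directly. The helper name `tokens_from_json` is hypothetical, and the sample response is hand-written for illustration.

```python
import json


def tokens_from_json(response_text):
    """Extract (word, pos, ner) triples from a CoreNLP-style JSON response string."""
    parsed = json.loads(response_text)
    triples = []
    for sent in parsed.get('sentences', []):
        for token in sent.get('tokens', []):
            # Prefer `word`; fall back to `originalText` if `word` is absent or empty.
            text = token.get('word') or token.get('originalText', '')
            triples.append((text, token.get('pos'), token.get('ner')))
    return triples


# Hand-written sample mimicking the assumed response layout (not real server output).
sample = json.dumps({'sentences': [{'tokens': [
    {'word': '北京', 'originalText': '', 'pos': 'NR', 'ner': 'GPE'},
]}]})
print(tokens_from_json(sample))
```

The response string could come from the wrapper's `annotate` method with the output format set to JSON; check this against the installed wrapper version before relying on it.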