Python named entity recognition tools: getting Stanford Named Entity Recognizer (NER) functionality with NLTK

大嘴博士

Article link: https://blog.csdn.net/weixin_32564229/article/details/112931235

Is it possible to get functionality similar to the Stanford Named Entity Recognizer using just NLTK?

Is there an example?

In particular, I am interested in extracting the LOCATION part of a text. For example, from the text

The meeting will be held at 22 West Westin st., South Carolina, 12345 on Nov.-18

ideally I would like to get something like

22/LOCATION
(LOCATION West/LOCATION Westin/LOCATION)
st./LOCATION
,/,
(South/LOCATION Carolina/LOCATION)
,/,
12345/LOCATION
.....

or simply

22 West Westin st., South Carolina, 12345

Instead, I am only able to get

(S
  The/DT
  meeting/NN
  will/MD
  be/VB
  held/VBN
  at/IN
  22/CD
  (LOCATION West/NNP Westin/NNP)
  st./NNP
  ,/,
  (GPE South/NNP Carolina/NNP)
  ,/,
  12345/CD
  on/IN
  Nov.-18/-NONE-)

Note that if I enter my text into http://nlp.stanford.edu:8080/ner/process, the results are far from perfect (the street number and zip code are still missing), but at least "st." is part of a LOCATION, and South Carolina is tagged as a LOCATION rather than as "GPE / NNP".

What am I doing wrong? How can I use NLTK to extract the location from a piece of text?

Many thanks in advance!

Solution

NLTK does have an interface for Stanford NER: see nltk.tag.stanford.NERTagger (the class is named StanfordNERTagger in NLTK 3.x releases).

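from nltk.tag.stanford import NERTagger

# First argument: path to the trained classifier model.
# Second argument: path to the Stanford NER jar.
st = NERTagger('/usr/share/stanford-ner/classifiers/all.3class.distsim.crf.ser.gz',
               '/usr/share/stanford-ner/stanford-ner.jar')

# tag() expects a list of tokens, so split the sentence on whitespace first.
st.tag('Rami Eid is studying at Stony Brook University in NY'.split())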

Output:

[('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'),
 ('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'),
 ('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'LOCATION')]
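
Since the question asks for whole location phrases rather than per-token tags, the tagged tokens still need to be merged. The following is a minimal sketch, not part of NLTK: `group_entities` is a hypothetical helper that joins runs of adjacent tokens carrying the same label and drops tokens labeled 'O'.

```python
from itertools import groupby


def group_entities(tagged_tokens, outside_label="O"):
    """Merge runs of adjacent tokens that share a label into (phrase, label) pairs.

    Hypothetical helper for illustration; tokens tagged with outside_label are skipped.
    """
    entities = []
    for label, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if label == outside_label:
            continue
        phrase = " ".join(token for token, _ in run)
        entities.append((phrase, label))
    return entities


# The tagger output shown above, copied verbatim.
tagged = [('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'),
          ('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'),
          ('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'LOCATION')]

entities = group_entities(tagged)
locations = [phrase for phrase, label in entities if label == "LOCATION"]
print(entities)
print(locations)
```

One limitation of this simple approach: two distinct entities of the same type that sit directly next to each other are merged into a single phrase, because the 3-class model emits plain labels without begin/inside markers.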

However, every time you call tag, NLTK writes the target sentence to a file, runs the Stanford NER command-line tool on that file, and parses the tool's output back into Python. The overhead of loading the classifier (around 1 minute each time for me) is therefore unavoidable.
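
To make the "parses the output back" step concrete, here is a rough sketch of turning one line of slash-separated tagger output (the `token/LABEL` form used in the question) into Python tuples. It assumes the command-line tool was asked for that slash-tag format; `parse_slash_tags` is a hypothetical name for illustration, not NLTK's internal function.

```python
def parse_slash_tags(line, separator="/"):
    """Turn 'Rami/PERSON Eid/PERSON is/O' into [('Rami', 'PERSON'), ...].

    Hypothetical helper. The split is on the LAST separator, so a token that
    itself contains a slash (for example '1/2') keeps its text intact.
    """
    pairs = []
    for item in line.split():
        token, sep, label = item.rpartition(separator)
        if not sep:
            # No separator at all: keep the raw text and mark it as untagged.
            pairs.append((item, None))
        else:
            pairs.append((token, label))
    return pairs


print(parse_slash_tags("Rami/PERSON Eid/PERSON is/O studying/O in/O NY/LOCATION"))
```

The parsing itself is cheap; the cost described above comes from starting a new Java process and loading the classifier on every call.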

If that startup overhead is a problem, use Pyner.

First, run Stanford NER as a server:

java -mx1000m -cp stanford-ner.jar edu.stanford.nlp.ie.NERServer \
    -loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz -port 9191

Then, from the pyner folder:

import ner

# Connect to the NER server started above.
tagger = ner.SocketNER(host='localhost', port=9191)

tagger.get_entities("University of California is located in California, United States")
# {'LOCATION': ['California', 'United States'],
#  'ORGANIZATION': ['University of California']}

tagger.json_entities("Alice went to the Museum of Natural History.")
# '{"ORGANIZATION": ["Museum of Natural History"], "PERSON": ["Alice"]}'
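
For readers who want to see roughly what a socket client does under the hood, here is a hedged sketch that talks to the server with only the standard library. It rests on an assumption about the wire protocol: the client sends one newline-terminated line of text, and the server replies with the tagged text and then closes the connection. `query_ner_server` is a hypothetical name, not part of Pyner or Stanford NER, and the exact reply format depends on how the server was configured.

```python
import socket


def query_ner_server(text, host="localhost", port=9191, timeout=10.0):
    """Send one line of text to a running NER server and return its raw reply.

    Sketch under an assumed protocol: one newline-terminated request line,
    then the server writes the tagged text and closes the connection.
    """
    # Collapse all whitespace (including newlines) so the request is one line.
    line = " ".join(text.split()) + "\n"
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.encode("utf-8"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8").strip()


# Example call, assuming the server from the command above is listening:
# print(query_ner_server("The meeting will be held in South Carolina"))
```

Because the classifier stays loaded inside the long-running server process, each request pays only for a network round trip and the tagging itself.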

Hope this helps.
