Python named entity recognition tools: getting Stanford Named Entity Recognizer (NER) functionality with NLTK

Is it possible to get functionality similar to the Stanford Named Entity Recognizer using just NLTK?

Is there an example?

In particular, I am interested in extracting the LOCATION part of a text. For example, from the text

The meeting will be held at 22 West Westin st., South Carolina, 12345 on Nov.-18

ideally, I would like to get something like

(S
  22/LOCATION
  (LOCATION West/LOCATION Westin/LOCATION)
  st./LOCATION
  ,/,
  (South/LOCATION Carolina/LOCATION)
  ,/,
  12345/LOCATION
  .....

or simply

22 West Westin st., South Carolina, 12345

Instead, I am only able to get

(S
  The/DT
  meeting/NN
  will/MD
  be/VB
  held/VBN
  at/IN
  22/CD
  (LOCATION West/NNP Westin/NNP)
  st./NNP
  ,/,
  (GPE South/NNP Carolina/NNP)
  ,/,
  12345/CD
  on/IN
  Nov.-18/-NONE-)

Note that if I enter my text into http://nlp.stanford.edu:8080/ner/process, the results are far from perfect (the street number and zip code are still missing), but at least "st." is part of the LOCATION, and South Carolina is tagged as a LOCATION rather than as "GPE / NNP".

What am I doing wrong? How can I fix this so that I can use NLTK to extract the location from a piece of text?

Many thanks in advance!

Solution

NLTK does have an interface for Stanford NER: see nltk.tag.stanford.NERTagger. (In newer NLTK releases the class is named StanfordNERTagger.)

from nltk.tag.stanford import NERTagger

st = NERTagger('/usr/share/stanford-ner/classifiers/all.3class.distsim.crf.ser.gz',
               '/usr/share/stanford-ner/stanford-ner.jar')

st.tag('Rami Eid is studying at Stony Brook University in NY'.split())

Output:

[('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'),
 ('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'),
 ('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'LOCATION')]
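
Since the question asks for whole location phrases rather than per-token tags, the flat list above still needs to be grouped. The following is a minimal sketch, not part of NLTK: `group_entities` is a hypothetical helper that merges consecutive tokens sharing a label and drops tokens tagged 'O'. It assumes the (token, label) format shown in the output above.

```python
from itertools import groupby


def group_entities(tagged, outside="O"):
    """Merge consecutive tokens that share a label into (text, label) spans.

    `tagged` is a list of (token, label) pairs; tokens labelled `outside`
    are skipped. Hypothetical helper, not an NLTK function.
    """
    spans = []
    for label, run in groupby(tagged, key=lambda pair: pair[1]):
        if label == outside:
            continue
        spans.append((" ".join(token for token, _ in run), label))
    return spans


tagged = [('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'),
          ('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'),
          ('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'LOCATION')]

entities = group_entities(tagged)
print(entities)
# [('Rami Eid', 'PERSON'), ('Stony Brook University', 'ORGANIZATION'), ('NY', 'LOCATION')]

# Keep only the locations, as the question asks.
print([text for text, label in entities if label == 'LOCATION'])
# ['NY']
```

One limitation of this grouping: two different entities of the same type that sit directly next to each other are merged into a single span.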

However, every time you call tag, NLTK simply writes the target sentence to a file, runs the Stanford NER command-line tool on that file, and parses the output back into Python. The overhead of loading the classifiers (around 1 minute for me on every call) is therefore unavoidable.
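
Because the cost is paid per call, one way to soften it is to make fewer calls. The sketch below is only an illustration of that idea: `tag_in_one_call` is a hypothetical helper that concatenates several tokenized sentences, calls a tagging function once, and splits the result back per sentence. `fake_tag` is a stand-in for `st.tag` so the example runs without Java. It assumes the tagger returns exactly one pair per input token, and tagging sentences together may change some labels near sentence boundaries.

```python
def tag_in_one_call(sentences, tag_fn):
    """Tag several tokenized sentences with a single call to `tag_fn`.

    `sentences` is a list of token lists; `tag_fn` maps a token list to a
    list of (token, label) pairs, as st.tag does. Hypothetical helper.
    """
    flat = [token for sentence in sentences for token in sentence]
    tagged = tag_fn(flat)
    if len(tagged) != len(flat):
        raise ValueError("tagger returned a different number of tokens")
    result, start = [], 0
    for sentence in sentences:
        result.append(tagged[start:start + len(sentence)])
        start += len(sentence)
    return result


call_count = 0


def fake_tag(tokens):
    """Stand-in for st.tag: counts calls and labels 'NY' as a LOCATION."""
    global call_count
    call_count += 1
    return [(token, 'LOCATION' if token == 'NY' else 'O') for token in tokens]


sentences = [['He', 'lives', 'in', 'NY'], ['She', 'does', 'not']]
per_sentence = tag_in_one_call(sentences, fake_tag)
print(per_sentence[0])
print(call_count)
```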

If that's a problem, use Pyner.

First, run Stanford NER as a server:

java -mx1000m -cp stanford-ner.jar edu.stanford.nlp.ie.NERServer \
    -loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz -port 9191

Then go to the pyner folder and run the following in Python:

import ner

tagger = ner.SocketNER(host='localhost', port=9191)

tagger.get_entities("University of California is located in California, United States")
# {'LOCATION': ['California', 'United States'],
#  'ORGANIZATION': ['University of California']}

tagger.json_entities("Alice went to the Museum of Natural History.")
# '{"ORGANIZATION": ["Museum of Natural History"], "PERSON": ["Alice"]}'
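
To pull out only the locations, the result can be filtered by label. This is a small sketch that runs without a server: `entities_of_type` is a hypothetical helper, and `sample` is a hand-written JSON string in the same shape as the json_entities output shown above.

```python
import json


def entities_of_type(payload, label="LOCATION"):
    """Return the entities of one label from a JSON string or a dict.

    Accepts either the dict form (as from get_entities) or the JSON string
    form (as from json_entities). Hypothetical helper, not part of pyner.
    """
    data = json.loads(payload) if isinstance(payload, str) else payload
    return list(data.get(label, []))


# Hand-written sample in the same shape as the output shown above.
sample = ('{"LOCATION": ["California", "United States"], '
          '"ORGANIZATION": ["University of California"]}')

print(entities_of_type(sample))
# ['California', 'United States']
print(entities_of_type({"PERSON": ["Alice"]}))
# []
```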

Hope this helps.
