python nltk named entities_NLTK named entity recognition to a Python list

nltk.ne_chunk returns a nested nltk.tree.Tree object, so you have to traverse the Tree object to get to the named entities (NEs).

>>> from nltk import ne_chunk, pos_tag, word_tokenize
>>> from nltk.tree import Tree
>>>
>>> def get_continuous_chunks(text):
...     # Tokenize, POS-tag, then chunk: the result mixes (token, pos) tuples and NE subtrees.
...     chunked = ne_chunk(pos_tag(word_tokenize(text)))
...     continuous_chunk = []
...     current_chunk = []
...     for i in chunked:
...         if isinstance(i, Tree):
...             # An NE subtree: join its tokens into a single string.
...             current_chunk.append(" ".join(token for token, pos in i.leaves()))
...         elif current_chunk:
...             # A plain token ends the current run of adjacent NE subtrees.
...             named_entity = " ".join(current_chunk)
...             if named_entity not in continuous_chunk:
...                 continuous_chunk.append(named_entity)
...             current_chunk = []
...     # Flush a run that reaches the end of the text; otherwise a trailing NE is lost.
...     if current_chunk:
...         named_entity = " ".join(current_chunk)
...         if named_entity not in continuous_chunk:
...             continuous_chunk.append(named_entity)
...     return continuous_chunk
...
>>> my_sent = "WASHINGTON -- In the wake of a string of abuses by New York police officers in the 1990s, Loretta E. Lynch, the top federal prosecutor in Brooklyn, spoke forcefully about the pain of a broken trust that African-Americans felt and said the responsibility for repairing generations of miscommunication and mistrust fell to law enforcement."
>>> get_continuous_chunks(my_sent)
['WASHINGTON', 'New York', 'Loretta E. Lynch', 'Brooklyn']
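
get_continuous_chunks keeps only the entity text and discards the label (PERSON, GPE, and so on) stored on each subtree. If the label is also wanted, the same traversal can return it. The following is a minimal sketch, not part of the original answer: the helper name get_labeled_entities is hypothetical, and the tree is built by hand as a stand-in for ne_chunk output so that it runs without the tagger and chunker models. The labels ne_chunk actually assigns depend on the installed NLTK models.

```python
from nltk.tree import Tree


def get_labeled_entities(chunked):
    """Return (entity_text, label) pairs from a one-level chunk tree.

    Hypothetical helper for illustration; it does not merge adjacent
    subtrees or remove duplicates the way get_continuous_chunks does.
    """
    entities = []
    for node in chunked:
        # NE subtrees are Tree instances; ordinary tokens are (token, pos) tuples.
        if isinstance(node, Tree):
            text = " ".join(token for token, pos in node.leaves())
            entities.append((text, node.label()))
    return entities


# Hand-built stand-in for ne_chunk output (an assumption for illustration).
sample_tree = Tree("S", [
    Tree("GPE", [("New", "NNP"), ("York", "NNP")]),
    ("police", "NN"),
    ("officers", "NNS"),
    Tree("PERSON", [("Loretta", "NNP"), ("E.", "NNP"), ("Lynch", "NNP")]),
])

labeled_entities = get_labeled_entities(sample_tree)
print(labeled_entities)
# [('New York', 'GPE'), ('Loretta E. Lynch', 'PERSON')]
```

With real text, the same helper could be called as get_labeled_entities(ne_chunk(pos_tag(word_tokenize(my_sent)))).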
