Using the Stanford tokenizer and part-of-speech tagger in Colab

Install the Stanford CoreNLP Python wrapper

!pip install stanfordcorenlp

Download CoreNLP

!wget http://nlp.stanford.edu/software/stanford-corenlp-latest.zip

Unzip the archive

!unzip '/content/stanford-corenlp-latest.zip' -d '/content'

Change into the extracted directory

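import os
# The folder name carries the CoreNLP version; adjust it if the archive extracts to a different name
os.chdir('/content/stanford-corenlp-4.0.0')
!pwd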
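
Because the download is the "latest" archive, the extracted folder may not always be named stanford-corenlp-4.0.0. A minimal sketch of looking the folder up instead of hard-coding it follows; find_corenlp_dir is a hypothetical helper written for this post, not part of stanfordcorenlp, and it assumes the archive was extracted under /content.

```python
import glob
import os

def find_corenlp_dir(root='/content'):
    """Return the extracted stanford-corenlp-* directory under root, or None.

    Hypothetical helper. If several versions are present, the one whose
    name sorts last alphabetically is returned.
    """
    pattern = os.path.join(root, 'stanford-corenlp-*')
    # Keep directories only, so the downloaded .zip file is ignored
    candidates = [p for p in glob.glob(pattern) if os.path.isdir(p)]
    return sorted(candidates)[-1] if candidates else None

corenlp_dir = find_corenlp_dir()
print(corenlp_dir)
```

If a directory is found, it can be passed to os.chdir and to StanfordCoreNLP in place of the fixed path.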

Example 1

from stanfordcorenlp import StanfordCoreNLP

# Point the wrapper at the directory that holds the extracted CoreNLP files
nlp = StanfordCoreNLP('/content/stanford-corenlp-4.0.0')

# Input sentence
sentence = 'Video becomes a new way of communication between Internet users with the proliferation of sensor-rich mobile devices.'

print('Part of Speech:', nlp.pos_tag(sentence))
print('Dependency Parse:', nlp.dependency_parse(sentence))
print('Tokens:', nlp.word_tokenize(sentence))
print('Named Entities:', nlp.ner(sentence))
print('Constituency Parse:', nlp.parse(sentence))

# Shut down the background Java server when finished
nlp.close()
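
nlp.pos_tag returns a list of (word, tag) pairs, so it is easy to split or filter. The sketch below shows one way to do that. The tagged list is a hand-written sample in that shape, not real tagger output, so the snippet runs without CoreNLP.

```python
# Hand-written sample in the (word, tag) shape returned by nlp.pos_tag;
# the tags here are illustrative, not actual CoreNLP output
tagged = [('Video', 'NN'), ('becomes', 'VBZ'), ('a', 'DT'), ('new', 'JJ'), ('way', 'NN')]

# Split the pairs into two parallel tuples
words, tags = zip(*tagged)

# Keep only the words whose tag starts with NN (noun tags)
nouns = [w for w, t in tagged if t.startswith('NN')]

print(words)
print(tags)
print(nouns)
```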

Write the results to a JSON file

import json
from stanfordcorenlp import StanfordCoreNLP

# Point the wrapper at the directory that holds the extracted CoreNLP files
nlp = StanfordCoreNLP('/content/stanford-corenlp-4.0.0')

# `annotations` must already be loaded: a dict whose 'annotations' list
# holds items with 'caption' and 'image_id' keys
examples = []
for annot in annotations['annotations'][:2000]:
    # pos_tag returns a list of (word, tag) pairs for the caption
    cap_pos = nlp.pos_tag(annot['caption'])
    examples.append({
        'word': [w for w, _ in cap_pos],
        'pos': [t for _, t in cap_pos],
        'image_id': annot['image_id'],
    })

with open('/content/cap_pos.json', 'w', encoding='utf-8') as f:
    json.dump(examples, f)

# Shut down the background Java server when finished
nlp.close()
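
To check the record layout without starting CoreNLP, the per-caption step can be pulled into a small function and round-tripped through JSON. This is only a sketch: to_record is a hypothetical helper, and the tagged pairs and the image_id value are made up for illustration.

```python
import json

def to_record(tagged, image_id):
    """Turn a list of (word, tag) pairs into one JSON-ready record (hypothetical helper)."""
    return {
        'word': [w for w, _ in tagged],
        'pos': [t for _, t in tagged],
        'image_id': image_id,
    }

# Made-up sample input in the shape returned by nlp.pos_tag
record = to_record([('a', 'DT'), ('dog', 'NN'), ('runs', 'VBZ')], image_id=42)

# Serialize and parse again to confirm the structure survives JSON encoding
restored = json.loads(json.dumps([record]))
print(restored[0]['word'], restored[0]['pos'], restored[0]['image_id'])
```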
