A 10-label NER task implemented with BERT
- GitHub: nlp-code/bert命名实体识别.ipynb at main · cshmzin/nlp-code (github.com)
- BERT introduction blog post: Simple to Bert | Ripshun Blog
- Dataset source: the CLUE website (fine-grained NER task)
Loading the data:
# Load the data: each file stores one JSON object per line
import json
train_data = []
dev_data = []
test_data = []
with open('train.json', 'r', encoding='UTF-8') as f:
    for line in f:
        train_data.append(json.loads(line))
with open('dev.json', 'r', encoding='UTF-8') as f:
    for line in f:
        dev_data.append(json.loads(line))
with open('test.json', 'r', encoding='UTF-8') as f:
    for line in f:
        test_data.append(json.loads(line))
print(f'Counts: train: {len(train_data)}, dev: {len(dev_data)}, test: {len(test_data)}')
print(train_data[0])
print(dev_data[0])
print(test_data[0])
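To make the printed records easier to read, here is a minimal sketch of how one record can be unpacked. It assumes the layout used by the CLUE fine-grained NER files: a `text` string plus a `label` dictionary that maps each category to its entity strings and their `[start, end]` character spans, with an inclusive end index. The sample record and the helper name `list_entities` are made up for illustration and are not taken from the dataset or the notebook.

```python
import json

# Hypothetical record in the assumed layout (not copied from the dataset files)
sample_line = (
    '{"text": "张三在北京大学读书", '
    '"label": {"name": {"张三": [[0, 1]]}, '
    '"organization": {"北京大学": [[3, 6]]}}}'
)

def list_entities(record):
    """Return (label, entity_text, start, end) tuples for one record.

    The end index is treated as inclusive (an assumption about the data).
    Records without a 'label' key yield an empty list.
    """
    entities = []
    for label, mentions in record.get('label', {}).items():
        for entity_text, spans in mentions.items():
            for start, end in spans:
                entities.append((label, entity_text, start, end))
    return entities

record = json.loads(sample_line)
for label, entity_text, start, end in list_entities(record):
    # Slicing up to end + 1 recovers the entity because end is inclusive
    print(label, entity_text, record['text'][start:end + 1])
```

Using `record.get('label', {})` keeps the helper usable on records that carry no annotations, which is likely the case for the test split.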
Processing the label data:
Build the label dictionary, which has the following format:
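# The figure above shows the label categories
# The labels need to be built from them
import re
label_type = {'o': 0, 'address': 1, 'book': 2, 'company': 3, 'game': 4, 'government': 5, 'movie': 6, 'name': 7, 'organization': 8, 'position': 9, 'scene': 10}
def decode_label(d):
    # Parse the labels of one record into a list with one entry per character
    text_len = len(d['text'])
    label = [0] * text_len
    types = d['label'].keys()
    for t in types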