BERT NER in Practice

Super_Whw

Category: Machine Learning

Article link: https://blog.csdn.net/weixin_42939835/article/details/116011577

0. Competition Overview

This project comes from a Kaggle NER competition (competition link).

The pipeline and code are adapted from:

https://www.kaggle.com/tungmphung/coleridge-matching-bert-ner?select=kaggle_run_ner.py
https://www.kaggle.com/tungmphung/pytorch-bert-for-named-entity-recognition

1. BERT NER Fine-tuning

Data Preparation

First, the data needs to be converted into the JSON format used for NER.

Raw Data
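To make the target format concrete, here is a minimal sketch of what one NER record could look like when stored as a JSON line. The field names `tokens` and `tags`, the B/I/O tag set, and the example sentence are all assumptions made for illustration; the exact schema is whatever the training script you use expects.

```python
import json

# Hypothetical record layout (an assumption, not confirmed by the post):
# one sentence per record, with one tag per token.
record = {
    "tokens": ["We", "use", "the", "Example", "Household", "Survey", "."],
    "tags":   ["O",  "O",   "O",   "B",       "I",         "I",      "O"],
}

# Serialize to a single JSON line, then parse it back.
line = json.dumps(record)
restored = json.loads(line)

print(line)
```

Keeping one record per line makes it easy to stream a large file without loading everything into memory.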

train.csv

0007f880-0a9b-492d-9a58-76eb0b0e0bd7.json (one of the articles)

Because Id values in train.csv are repeated, first use a group-by to merge rows that share the same Id into a single row:

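# Collapse duplicate Ids: keep the first publication title and
# join the remaining fields with '|' so that no label is lost.
train = train.groupby('Id').agg({
    'pub_title': 'first',
    'dataset_title': '|'.join,
    'dataset_label': '|'.join,
    'cleaned_label': '|'.join
}).reset_index()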

print(f'No. grouped training rows: {len(train)}')

No. grouped training rows: 14316
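The effect of this grouping is easier to see on a tiny table. The sketch below uses made-up rows (the Ids, titles, and labels are illustrative, not taken from train.csv) and applies the same aggregation, then splits the joined string back into individual labels.

```python
import pandas as pd

# Toy stand-in for train.csv; all values are invented for illustration.
toy = pd.DataFrame({
    'Id': ['a', 'a', 'b'],
    'pub_title': ['Paper A', 'Paper A', 'Paper B'],
    'dataset_title': ['DS1', 'DS2', 'DS3'],
    'dataset_label': ['ds one', 'ds two', 'ds three'],
    'cleaned_label': ['ds one', 'ds two', 'ds three'],
})

# Same aggregation as above: one row per Id, labels joined with '|'.
grouped = toy.groupby('Id').agg({
    'pub_title': 'first',
    'dataset_title': '|'.join,
    'dataset_label': '|'.join,
    'cleaned_label': '|'.join,
}).reset_index()

# Recover the individual labels of the first publication.
labels = grouped.loc[0, 'dataset_label'].split('|')

print(grouped)
print(labels)
```

Joining with `'|'` only works cleanly if no label contains that character, which is worth checking on the real data before relying on `split('|')`.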

Data Conversion

Here is the code:

cnt_pos, cnt_neg = 0, 0  # number of sentences that do / do not contain labels
ner_data = []

pbar = tqdm(total
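Only the first lines of the conversion code are available here, so the following is a minimal sketch, not the original code, of how sentences might be tagged and counted. The helper `tag_sentence`, the whitespace tokenization, the case-insensitive exact match, and the sample sentences are all assumptions made for illustration; only the names `cnt_pos`, `cnt_neg`, and `ner_data` come from the snippet above.

```python
def tag_sentence(tokens, labels):
    """Hypothetical helper: return one B/I/O tag per token,
    marking every exact (case-insensitive) occurrence of any label."""
    tags = ['O'] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for label in labels:
        label_tokens = label.lower().split()
        n = len(label_tokens)
        if n == 0:
            continue
        # Slide a window of the label's length over the sentence.
        for start in range(len(tokens) - n + 1):
            if lowered[start:start + n] == label_tokens:
                tags[start] = 'B'
                for i in range(start + 1, start + n):
                    tags[i] = 'I'
    return tags

# Invented sample data: two whitespace-tokenized sentences and
# a '|'-joined label string like the one produced by the grouping step.
sentences = [
    "we analyse the example household survey in detail".split(),
    "no dataset is mentioned here".split(),
]
labels = "Example Household Survey|Another Dataset".split('|')

cnt_pos, cnt_neg = 0, 0  # sentences that do / do not contain labels
ner_data = []
for tokens in sentences:
    tags = tag_sentence(tokens, labels)
    if any(t != 'O' for t in tags):
        cnt_pos += 1
    else:
        cnt_neg += 1
    ner_data.append({'tokens': tokens, 'tags': tags})

print(cnt_pos, cnt_neg)
```

In practice negative sentences usually far outnumber positive ones, so tracking both counts helps when deciding whether to downsample the negatives before training.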
