PaddleNLP in Practice: A Baseline for the LIC2021 Event Extraction Task (with Code)


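Related notes in this series:

Paper reading: DuEE: A Large-Scale Dataset for Chinese Event Extraction in Real-World Scenarios (with dataset link)
PaddleNLP in Practice: A Baseline for the LIC2021 Event Extraction Task (with Code)
PaddleNLP in Practice: A Baseline for the LIC2021 Relation Extraction Task (with Code)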

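  Information extraction aims to pull structured knowledge, such as entities, relations, and events, out of unstructured natural-language text. Given a sentence and a predefined set of event types and argument roles, event extraction identifies every event of a target type in the sentence and extracts the arguments of each event according to that type's role set. The target event types (event_type) and argument roles (role) bound the scope of extraction, for example (event_type: 胜负 (win/loss), role: 时间, 胜者, 败者, 赛事名称) and (event_type: 夺冠 (championship), role: 夺冠事件, 夺冠赛事, 冠军).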
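As a rough illustration, the schema constraint can be pictured as a mapping from each event type to its allowed roles. The dictionary layout, the helper name filter_by_schema, and the candidate arguments below are assumptions made for this sketch; they are not the format of the competition's schema file.

```python
# Hypothetical in-memory schema: event type -> allowed argument roles.
SCHEMA = {
    "胜负": ["时间", "胜者", "败者", "赛事名称"],
    "夺冠": ["夺冠事件", "夺冠赛事", "冠军"],
}

def filter_by_schema(event_type, arguments, schema=SCHEMA):
    """Keep only (role, text) pairs whose role is allowed for event_type."""
    allowed = set(schema.get(event_type, []))
    return [(role, text) for role, text in arguments if role in allowed]

# Made-up candidates: the second role does not belong to the 胜负 event type.
candidates = [("胜者", "队伍A"), ("冠军", "队伍A")]
kept = filter_by_schema("胜负", candidates)
print(kept)
```

Roles outside the schema for the given event type are dropped, which is the sense in which the schema limits what may be extracted.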


  This example shows how to use PaddleNLP to quickly reproduce the LIC2021 event extraction baseline and then improve on it.

# Install the latest version of paddlenlp
!pip install --upgrade paddlenlp

%cd event_extraction/
Looking in indexes: https://mirror.baidu.com/pypi/simple/
Collecting paddlenlp
  Downloading https://mirror.baidu.com/pypi/packages/e9/89/812c1f3683f8296114ca91d591601515352741d37d9847114836a9dfa188/paddlenlp-2.0.0rc16-py3-none-any.whl (295kB)
     |████████████████████████████████| 296kB 20.7MB/s eta 0:00:01
Installing collected packages: paddlenlp
  Found existing installation: paddlenlp 2.0.0rc7
    Uninstalling paddlenlp-2.0.0rc7:
      Successfully uninstalled paddlenlp-2.0.0rc7
Successfully installed paddlenlp-2.0.0rc16
/home/aistudio/event_extraction

  The competition has two subtasks: document-level event extraction and sentence-level event extraction.

1. Document-level event extraction baseline

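  The document-level event extraction dataset (DuEE-Fin) is a financial-domain dataset. It contains 13 predefined event types and about 11,500 Chinese documents (some of them non-target documents included as negative examples), split into 6,900 for training, 1,150 for validation, and 3,450 for testing. The baseline on this dataset uses ERNIE-based sequence labeling and is a pipeline of three models: a trigger extraction model, an argument extraction model, and an enum attribute classification model. The trigger extraction model uses BIO tagging to identify the position of each trigger and its event type; the argument extraction model uses BIO tagging to identify the arguments of an event and their roles; the enum attribute classification model uses ERNIE for text classification.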
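To make the BIO scheme concrete, here is a minimal sketch of how a tag vocabulary could be built from a list of event types. The function name build_bio_tags and the id ordering are assumptions for illustration; the real trigger_tag.dict produced by the preprocessing script may order its tags differently.

```python
def build_bio_tags(type_names):
    """Return a tag -> id dict with B-/I- tags per type plus a single O tag."""
    tags = []
    for name in type_names:
        tags.append(f"B-{name}")  # first character of a span
        tags.append(f"I-{name}")  # remaining characters of the span
    tags.append("O")              # everything outside any span
    return {tag: idx for idx, tag in enumerate(tags)}

# Three DuEE-Fin event types used here only as sample input.
tag2id = build_bio_tags(["质押", "股份回购", "解除质押"])
print(len(tag2id))  # two tags per type plus O
```

Under this scheme, 13 event types yield 13 * 2 + 1 = 27 tags, which agrees with the "save trigger tag 27" line in the preprocessing log of Step 1.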

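Evaluation method

  This task uses the F1 score over predicted arguments as its metric. For each document, every gold event is matched, without replacement, to its most similar predicted event (event-level matching). The search prefers the predicted event that has the same event type as the gold event and the largest number of correct role-argument pairs.

  f1_score = (2 * P * R) / (P + R), where

  • A predicted argument is correct if its event type and role match and the argument itself is correct
  • P = number of correct predicted arguments / number of all predicted arguments
  • R = number of correct predicted arguments / number of all manually annotated arguments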
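The matching and scoring described above can be sketched as follows. This is a simplified greedy approximation written for illustration, not the official scorer: the event representation (an event type plus a set of (role, argument) pairs), the function name match_and_score, and the example roles are all assumptions.

```python
def match_and_score(gold_events, pred_events):
    """Greedy event-level matching without replacement, then argument-level P/R/F1.

    Each event is a tuple (event_type, set of (role, argument) pairs).
    """
    unused = list(range(len(pred_events)))
    n_correct = 0
    for g_type, g_args in gold_events:
        best_idx, best_hit = None, -1
        for idx in unused:
            p_type, p_args = pred_events[idx]
            if p_type != g_type:
                continue  # only events of the same type can be matched
            hit = len(g_args & p_args)
            if hit > best_hit:
                best_idx, best_hit = idx, hit
        if best_idx is not None:
            unused.remove(best_idx)  # each prediction is used at most once
            n_correct += best_hit
    n_pred = sum(len(args) for _, args in pred_events)
    n_gold = sum(len(args) for _, args in gold_events)
    p = n_correct / n_pred if n_pred else 0.0
    r = n_correct / n_gold if n_gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Toy data with made-up role names and arguments.
gold = [("质押", {("质押方", "A"), ("质押物", "B")})]
pred = [("质押", {("质押方", "A"), ("质押物", "C")})]
p, r, f1 = match_and_score(gold, pred)
print(p, r, f1)
```

In this toy case one of the two predicted arguments is correct and one of the two gold arguments is recovered, so P, R, and F1 are all 0.5.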

1.1 Quick baseline reproduction, Step 1: preprocess and load the data

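  Download the dataset from the competition website and extract it into data/DuEE-Fin, then preprocess the raw data into sequence-labeling format. The processed data is also stored under data/DuEE-Fin: trigger extraction data in data/DuEE-Fin/trigger, argument role data in data/DuEE-Fin/role, and enum classification data in data/DuEE-Fin/enum.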

!bash ./run_duee_fin.sh data_prepare
check and create directory
create dir * ./ckpt *
create dir * ./ckpt/DuEE-Fin *
create dir * ./submit *

start DuEE-Fin data prepare

=================DUEE FINANCE DATASET==============

=================start schema process==============
input path ./conf/DuEE-Fin/event_schema.json
save trigger tag 27 at ./conf/DuEE-Fin/trigger_tag.dict
save trigger tag 121 at ./conf/DuEE-Fin/role_tag.dict
save enum tag 4 at ./conf/DuEE-Fin/enum_tag.dict
=================end schema process===============

=================start data process==============

********** start document process **********
train 32795 dev 5302 test 140867
********** end document process **********

********** start sentence process **********

----trigger------for dir ./data/DuEE-Fin/sentence to ./data/DuEE-Fin/trigger
train 7251 dev 1180

----role------for dir ./data/DuEE-Fin/sentence to ./data/DuEE-Fin/role
train 9441 dev 1524

----enum------for dir ./data/DuEE-Fin/sentence to ./data/DuEE-Fin/enum
train 429 dev 69
********** end sentence process **********
=================end data process==============
end DuEE-Fin data prepare

  We can load a custom dataset by subclassing paddle.io.Dataset and implementing the two methods __getitem__ and __len__.

  For trigger extraction, for example, load the dataset in event_extraction/data/DuEE-Fin/trigger:

import paddle
from utils import load_dict

class DuEventExtraction(paddle.io.Dataset):
    """DuEventExtraction"""
    def __init__(self, data_path, tag_path):

        self.label_vocab = load_dict(tag_path)
        self.word_ids = []
        self.label_ids = []
        with open(data_path, 'r', encoding='utf-8') as fp:
            # Skip the header line; each remaining line is "tokens<TAB>labels",
            # with tokens and labels joined by the \002 separator.
            next(fp)
            for line in fp:
                words, labels = line.strip('\n').split('\t')
                words = words.split('\002')
                labels = labels.split('\002')
                self.word_ids.append(words)
                self.label_ids.append(labels)

        self.label_num = max(self.label_vocab.values()) + 1

    def __len__(self):
        return len(self.word_ids)

    def __getitem__(self, index):
        return self.word_ids[index], self.label_ids[index]

train_ds = DuEventExtraction('./data/DuEE-Fin/trigger/train.tsv', './conf/DuEE-Fin/trigger_tag.dict')
dev_ds = DuEventExtraction('./data/DuEE-Fin/trigger/dev.tsv', './conf/DuEE-Fin/trigger_tag.dict')

count = 0
for text, label in train_ds:
    print(f"text: {text}; label: {label}")
    count += 1
    if count >= 3:
        break
text: ['原', '标', '题', ':', '万', '讯', '自', '控', '(', '7', '.', '4', '9', '0', ',', '-', '0', '.', '1', '0', ',', '-', '1', '.', '3', '2', '%', ')', ':', '傅', '宇', '晨', '解', '除', '部', '分', '股', '份', '质', '押', '、', '累', '计', '质', '押', '比', '例', '为', '3', '9', '.', '5', '5', '%', ',', ',', ',', ',', '来', '源', ':', '每', '日', '经', '济', '新', '闻', ',', '每', '经', 'a', 'i', '快', '讯', ',', '万', '讯', '自', '控', '(', 's', 'z', ',', '3', '0', '0', '1', '1', '2', ',', '收', '盘', '价', ':', '7', '.', '4', '9', '元', ')', '6', '月', '3', '日', '下', '午', '发', '布', '公', '告', '称', ',', '公', '司', '接', '到', '股', '东', '傅', '宇', '晨', '的', '通', '知', ',', '获', '悉', '傅', '宇', '晨', '将', '其', '部', '分', '股', '份', '办', '理', '了', '质', '押', '业', '务', '。', ',', '截', '至', '本', '公', '告', '日', ',', '傅', '宇', '晨', '共', '持', '有', '公', '司', '股', '份', '5', '7', '9', '0', '.', '3', '8', '万', '股', ',', '占', '公', '司', '总', '股', '本', '的', '2', '0', '.', '2', '5', '%', ';', '累', '计', '质', '押', '股', '份', '2', '2', '9', '0', '万', '股', ',', '占', '傅', '宇', '晨', '持', '有', '公', '司', '股', '份', '总', '数', '的', '3', '9', '.', '5', '5', '%', ',', '占', '公', '司', '总', '股', '本', '的', '8', '.', '0', '1', '%', '。']; label: ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-质押', 'I-质押', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 
'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
text: ['客', '户', '端', ',', '新', '浪', '港', '股', '讯', ',', '众', '安', '集', '团', '(', '0', '.', '2', '4', '8', ',', '-', '0', '.', '0', '0', ',', '-', '0', '.', '8', '0', '%', ')', '(', '0', '0', '6', '7', '2', '.', 'h', 'k', ')', '发', '布', '公', '告', ',', '于', '2', '0', '1', '9', '年', '1', '0', '月', '1', '5', '日', ',', '公', '司', '耗', '资', '9', '4', '.', '5', '6', '万', '港', '元', '回', '购', '3', '8', '0', '.', '5', '万', '股', ',', '回', '购', '价', '格', '每', '股', '0', '.', '2', '4', '8', '-', '0', '.', '2', '4', '9', '港', '元', '。']; label: ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-股份回购', 'I-股份回购', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']
text: ['原', '标', '题', ':', '金', '徽', '酒', '(', '6', '0', '3', '9', '1', '9', '.', 's', 'h', ')', ':', '亚', '特', '集', '团', '解', '除', '质', '押', '1', '9', '8', '0', '万', '股', ',', ',', ',', ',', '来', '源', ':', '格', '隆', '汇', ',', '格', '隆', '汇', '8', '月', '5', '日', '丨', '金', '徽', '酒', '(', '6', '0', '3', '9', '1', '9', '.', 's', 'h', ')', '公', '布', ',', '公', '司', '近', '日', '收', '到', '控', '股', '股', '东', '甘', '肃', '亚', '特', '投', '资', '集', '团', '有', '限', '公', '司', '(', '“', '亚', '特', '集', '团', '”', ')', '将', '其', '持', '有', '的', '公', '司', '部', '分', '股', '份', '解', '除', '质', '押', '的', '通', '知', '。', ',', '2', '0', '1', '8', '年', '4', '月', '9', '日', ',', '亚', '特', '集', '团', '将', '其', '持', '有', '的', '公', '司', '5', '9', '8', '0', '万', '股', '有', '限', '售', '条', '件', '股', '份', '质', '押', '给', '兰', '州', '银', '行', '股', '份', '有', '限', '公', '司', '陇', '南', '分', '行', '。']; label: ['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-解除质押', 'I-解除质押', 'I-解除质押', 'I-解除质押', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O']

1.2 Quick baseline reproduction, Step 2: build the model

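  The sequence-labeling trigger extraction model is one part of the overall pipeline. Given the event types, it identifies the position of each event trigger in a sentence and the event type it belongs to. It is a sequence labeling model built on ERNIE, shown in the figure below:

[Figure: architecture of the trigger extraction model]
  Likewise, the sequence-labeling argument extraction model is built on ERNIE. It identifies the arguments of an event and their roles, as shown in the figure below:
[Figure: architecture of the argument extraction model]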

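  In the example from the figure, the model identifies:

  1) the argument "新东方", labeled "B-收购方", "I-收购方", "I-收购方" (收购方: acquirer);
  2) the argument "东方优播", labeled "B-被收购方", "I-被收购方", "I-被收购方", "I-被收购方" (被收购方: acquiree).

  The role-argument pairs finally extracted from the text are <收购方, 新东方> and <被收购方, 东方优播>.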
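Turning per-character BIO labels into role-argument pairs like the ones above can be sketched as below. The function name decode_bio and the sample sentence are assumptions for illustration (the original example sentence is only shown in the figure), and this is not necessarily how the baseline's post-processing script is written.

```python
def decode_bio(tokens, labels):
    """Collect (role, text) spans from parallel token and BIO label lists."""
    spans, role, chars = [], None, []
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            # A new span starts; close any span that is still open.
            if role is not None:
                spans.append((role, "".join(chars)))
            role, chars = label[2:], [token]
        elif label.startswith("I-") and role == label[2:]:
            chars.append(token)
        else:
            # "O" or an I- tag that does not continue the open span.
            if role is not None:
                spans.append((role, "".join(chars)))
            role, chars = None, []
    if role is not None:
        spans.append((role, "".join(chars)))
    return spans

# Hypothetical sentence containing the two arguments from the example.
tokens = list("新东方收购东方优播")
labels = ["B-收购方", "I-收购方", "I-收购方", "O", "O",
          "B-被收购方", "I-被收购方", "I-被收购方", "I-被收购方"]
pairs = decode_bio(tokens, labels)
print(pairs)
```

In this sketch a stray I- tag that does not follow a matching B- tag is simply discarded; a production decoder might choose a more lenient rule.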

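  PaddleNLP ships the commonly used sequence labeling model built on the ERNIE pretrained model, which can be loaded in one step by specifying the model name: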

from paddlenlp.transformers import ErnieForTokenClassification, ErnieForSequenceClassification

label_map = load_dict('./conf/DuEE-Fin/trigger_tag.dict')
id2label = {val: key for key, val in label_map.items()}
model = ErnieForTokenClassification.from_pretrained("ernie-1.0", num_classes=len(label_map))
[2021-04-10 16:11:55,651] [    INFO] - Downloading https://paddlenlp.bj.bcebos.com/models/transformers/ernie/ernie_v1_chn_base.pdparams and saved to /home/aistudio/.paddlenlp/models/ernie-1.0
[2021-04-10 16:11:55,654] [    INFO] - Downloading ernie_v1_chn_base.pdparams from https://paddlenlp.bj.bcebos.com/models/transformers/ernie/ernie_v1_chn_base.pdparams
100%|██████████| 390123/390123 [00:05<00:00, 72718.98it/s]
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py:1303: UserWarning: Skip loading for classifier.weight. classifier.weight is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py:1303: UserWarning: Skip loading for classifier.bias. classifier.bias is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))

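  Enum classification data, meanwhile, is handled by an ERNIE-based text classification model; the enum role type is 环节 (stage). The model is shown in the figure below:

[Figure: architecture of the enum attribute classification model]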

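  Given a text, the model classifies it and outputs a probability for each class: 筹备上市 (preparing to list) 0.8, 暂停上市 (listing suspended) 0.02, 正式上市 (officially listed) 0.15, 终止上市 (listing terminated) 0.03.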
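A minimal sketch of how raw classifier scores become class probabilities and a predicted label is given below. The logits are made-up numbers and the order of the label list is an assumption; in the baseline this step happens inside the ERNIE classification model and its prediction code.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["筹备上市", "暂停上市", "正式上市", "终止上市"]
logits = [2.0, -1.7, 0.3, -1.3]  # made-up scores for one input text
probs = softmax(logits)
best = labels[probs.index(max(probs))]  # class with the highest probability
print(best)
```

The probabilities sum to 1, and the predicted enum value is the class with the largest one.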

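  Likewise, PaddleNLP ships the commonly used text classification model built on the ERNIE pretrained model, which can be loaded in one step by specifying the model name: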

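from paddlenlp.transformers import ErnieForSequenceClassification

# Use the enum tag dictionary and separate variable names so that the trigger
# model and label_map defined above are not overwritten.
enum_label_map = load_dict('./conf/DuEE-Fin/enum_tag.dict')
enum_model = ErnieForSequenceClassification.from_pretrained("ernie-1.0", num_classes=len(enum_label_map))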

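1.3 Quick baseline reproduction, Step 3: data processing

  The raw data must be converted into a form the model can read. For convenience, PaddleNLP has a built-in Tokenizer for each pretrained model that handles tokenization, mapping tokens to token IDs, truncation to a maximum length, and similar operations. Like the models, tokenizers can be loaded in one step.

  Calling the tokenizer on the text directly returns the inputs the model needs.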

from paddlenlp.transformers import ErnieTokenizer, ErnieModel

tokenizer = ErnieTokenizer.from_pretrained("ernie-1.0")
ernie_model = ErnieModel.from_pretrained("ernie-1.0")

# One call tokenizes the text, maps tokens to IDs, and adds the special tokens
encoded_text = tokenizer(text="请输入测试样例", return_length=True, return_position_ids=True)
for key, value in encoded_text.items():
    print("{}:\n\t{}".format(key, value))

# Convert to Paddle tensors
input_ids = paddle.to_tensor([encoded_text['input_ids']])
print("input_ids : \n\t{}".format(input_ids))

segment_ids = paddle.to_tensor([encoded_text['token_type_ids']])
print("token_type_ids : \n\t{}".format(segment_ids))

# The tensors can now be fed to the ERNIE model to get its outputs
sequence_output, pooled_output = ernie_model(input_ids, segment_ids)
print("Token wise output shape: \n\t{}\nPooled output shape: \n\t{}".format(sequence_output.shape, pooled_output.shape))
[2021-04-10 16:12:14,372] [    INFO] - Downloading vocab.txt from https://paddlenlp.bj.bcebos.com/models/transformers/ernie/vocab.txt
100%|██████████| 89/89 [00:00<00:00, 4018.40it/s]
[2021-04-10 16:12:14,586] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams

input_ids:
	[1, 647, 789, 109, 558, 525, 314, 656, 2]
token_type_ids:
	[0, 0, 0, 0, 0, 0, 0, 0, 0]
seq_len:
	9
position_ids:
	[0, 1, 2, 3, 4, 5, 6, 7, 8]
input_ids : 
	Tensor(shape=[1, 9], dtype=int64, place=CUDAPlace(0), stop_gradient=True,
       [[1  , 647, 789, 109, 558, 525, 314, 656, 2  ]])
token_type_ids : 
	Tensor(shape=[1, 9], dtype=int64, place=CUDAPlace(0), stop_gradient=True,
       [[0, 0, 0, 0, 0, 0, 0, 0, 0]])
Token wise output shape: 
	[1, 9, 768]
Pooled output shape: 
	[1, 768]

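  As the code above shows, the tokenizer offers a very convenient way to produce data in the format the model needs.

  In the output above:

  • input_ids: the token IDs of the input text.
  • token_type_ids: whether each token belongs to the first or the second input sentence. (Transformer-style pretrained models accept both single-sentence and sentence-pair input.) See the convert_example_to_feature() function in sequence_labeling.py for details.
  • seq_len: the number of tokens in the input sentence.
  • input_mask: whether each token is a padding token. Because the sentences in a batch differ in length, they are padded to a common fixed length; 1 marks a real input token and 0 marks a padding token.
  • position_ids: the position of each token within the input sequence.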
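To illustrate how padding and input_mask relate, here is a small pure-Python sketch. The helper name pad_batch, the default pad id of 0, and the token IDs in the second sample are assumptions for this example; in the baseline, padding is done by paddlenlp.data.Pad when batches are assembled.

```python
def pad_batch(batch_ids, pad_id=0):
    """Pad every id list to the longest one and build a 1/0 input mask."""
    max_len = max(len(ids) for ids in batch_ids)
    padded, mask = [], []
    for ids in batch_ids:
        n_pad = max_len - len(ids)
        padded.append(ids + [pad_id] * n_pad)
        mask.append([1] * len(ids) + [0] * n_pad)  # 1 = real token, 0 = padding
    return padded, mask

batch = [[1, 647, 789, 2], [1, 109, 2]]
padded, mask = pad_batch(batch)
print(padded)
print(mask)
```

The shorter sequence receives one padding ID at the end, and its mask marks that position with 0.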

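  The ERNIE model, in turn, outputs two tensors:

  • sequence_output is the semantic representation of each input token, with shape (1, num_tokens, hidden_size). It is typically used for tasks such as sequence labeling and question answering.
  • pooled_output is the semantic representation of the whole sentence, with shape (1, hidden_size). It is typically used for tasks such as text classification and information retrieval.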

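  NOTE:

  To use the ernie-tiny pretrained model, the matching tokenizer is paddlenlp.transformers.ErnieTinyTokenizer.from_pretrained('ernie-tiny').

  The code above shows the data processing steps needed to use a Transformer-style pretrained model. For convenience, PaddleNLP also provides higher-level APIs that return data in the format the model needs in one call.

  This baseline processes the data as follows:

  • Convert the raw data into a format the model can read: first tokenize with the tokenizer and map tokens to input ids from the vocabulary, then build token type ids and so on.
  • Load the data asynchronously with multiple processes through the paddle.io.DataLoader interface.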

from functools import partial
from paddlenlp.data import Stack, Tuple, Pad

def convert_example_to_feature(example, tokenizer, label_vocab=None, max_seq_len=512, no_entity_label="O", ignore_label=-1, is_test=False):
    tokens, labels = example
    tokenized_input = tokenizer(
        tokens,
        return_length=True,
        is_split_into_words=True,
        max_seq_len=max_seq_len)

    input_ids = tokenized_input['input_ids']
    token_type_ids = tokenized_input['token_type_ids']
    seq_len = tokenized_input['seq_len']

    if is_test:
        return input_ids, token_type_ids, seq_len
    elif label_vocab is not None:
        labels = labels[:(max_seq_len-2)]
        encoded_label = [no_entity_label] + labels + [no_entity_label]
        encoded_label = [label_vocab[x] for x in encoded_label]
        return input_ids, token_type_ids, seq_len, encoded_label


no_entity_label = "O"
# padding label value
ignore_label = -1
batch_size = 32
max_seq_len = 300

trans_func = partial(
    convert_example_to_feature,
    tokenizer=tokenizer,
    label_vocab=train_ds.label_vocab,
    max_seq_len=max_seq_len,
    no_entity_label=no_entity_label,
    ignore_label=ignore_label,
    is_test=False)
batchify_fn = lambda samples, fn=Tuple(
    Pad(axis=0, pad_val=tokenizer.vocab[tokenizer.pad_token]), # input ids
    Pad(axis=0, pad_val=tokenizer.vocab[tokenizer.pad_token]), # token type ids
    Stack(), # sequence lens
    Pad(axis=0, pad_val=ignore_label) # labels
): fn(list(map(trans_func, samples)))

train_loader = paddle.io.DataLoader(
    dataset=train_ds,
    batch_size=batch_size,
    shuffle=True,
    collate_fn=batchify_fn)
dev_loader = paddle.io.DataLoader(
    dataset=dev_ds,
    batch_size=batch_size,
    collate_fn=batchify_fn)
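The label handling inside convert_example_to_feature can be isolated as the small sketch below: labels are truncated to leave room for the two special tokens the tokenizer adds, wrapped with the no-entity label at both ends, and mapped to IDs. The helper name align_labels and the toy vocabulary are assumptions for illustration.

```python
def align_labels(labels, label_vocab, max_seq_len, no_entity_label="O"):
    """Truncate labels to leave room for [CLS]/[SEP], then map them to ids."""
    kept = labels[:max_seq_len - 2]
    wrapped = [no_entity_label] + kept + [no_entity_label]
    return [label_vocab[x] for x in wrapped]

vocab = {"B-质押": 0, "I-质押": 1, "O": 2}  # toy vocabulary, not the real tag dict
ids = align_labels(["O", "B-质押", "I-质押", "O"], vocab, max_seq_len=5)
print(ids)
```

With max_seq_len=5 only the first three labels survive truncation, so the encoded label sequence has exactly five entries and lines up with the five input IDs of the truncated sentence.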

1.4 Quick baseline reproduction, Step 4: define the loss function and optimizer, then train

  For this baseline we use cross-entropy as the loss function and paddle.optimizer.AdamW as the optimizer.
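As background for the loss, the sketch below computes token-level cross-entropy while skipping padded positions, which is the role the ignore_label value of -1 plays in this baseline. It is a pure-Python illustration that averages over the non-ignored positions; the function name masked_cross_entropy and the numbers are assumptions, and it is not the Paddle implementation.

```python
import math

def masked_cross_entropy(logits, labels, ignore_index=-1):
    """Mean negative log-likelihood over positions whose label is not ignored."""
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == ignore_index:
            continue  # padded position: contributes nothing to the loss
        peak = max(row)
        log_sum = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum - row[label]  # -log softmax(row)[label]
        count += 1
    return total / count if count else 0.0

logits = [[2.0, 0.0], [0.0, 0.0], [5.0, -5.0]]  # made-up scores for 3 tokens
labels = [0, 1, -1]                             # the last token is padding
loss = masked_cross_entropy(logits, labels)
print(round(loss, 4))
```

Changing the logits of the padded position leaves the loss unchanged, which is the point of the ignore index.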

import numpy as np

@paddle.no_grad()
def evaluate(model, criterion, metric, num_label, data_loader):
    """evaluate"""
    model.eval()
    metric.reset()
    losses = []
    for input_ids, seg_ids, seq_lens, labels in data_loader:
        logits = model(input_ids, seg_ids)
        loss = paddle.mean(criterion(logits.reshape([-1, num_label]), labels.reshape([-1])))
        losses.append(loss.numpy())
        preds = paddle.argmax(logits, axis=-1)
        n_infer, n_label, n_correct = metric.compute(None, seq_lens, preds, labels)
        metric.update(n_infer.numpy(), n_label.numpy(), n_correct.numpy())
    # Accumulate once after all batches instead of on every iteration.
    precision, recall, f1_score = metric.accumulate()
    avg_loss = np.mean(losses)
    model.train()

    return precision, recall, f1_score, avg_loss
# Directory for saving model parameters
!mkdir -p ckpt/DuEE-Fin/trigger/
import warnings
from paddlenlp.metrics import ChunkEvaluator

warnings.filterwarnings('ignore')

learning_rate=5e-5
weight_decay=0.01
num_epoch = 1

checkpoints = 'ckpt/DuEE-Fin/trigger/'

num_training_steps = len(train_loader) * num_epoch
# Generate parameter names needed to perform weight decay.
# All bias and LayerNorm parameters are excluded.
decay_params = [
    p.name for n, p in model.named_parameters()
    if not any(nd in n for nd in ["bias", "norm"])
]
optimizer = paddle.optimizer.AdamW(
    learning_rate=learning_rate,
    parameters=model.parameters(),
    weight_decay=weight_decay,
    apply_decay_param_fun=lambda x: x in decay_params)

metric = ChunkEvaluator(label_list=train_ds.label_vocab.keys(), suffix=False)
criterion = paddle.nn.loss.CrossEntropyLoss(ignore_index=ignore_label)

step, best_f1 = 0, 0.0
model.train()
rank = paddle.distributed.get_rank()
for epoch in range(num_epoch):
    for idx, (input_ids, token_type_ids, seq_lens, labels) in enumerate(train_loader):
        logits = model(input_ids, token_type_ids).reshape(
            [-1, train_ds.label_num])
        loss = paddle.mean(criterion(logits, labels.reshape([-1])))
        loss.backward()
        optimizer.step()
        optimizer.clear_grad()
        loss_item = loss.numpy().item()
        if step > 0 and step % 10 == 0 and rank == 0:
            print(f'train epoch: {epoch} - step: {step} (total: {num_training_steps}) - loss: {loss_item:.6f}')
        if step > 0 and step % 50 == 0 and rank == 0:
            p, r, f1, avg_loss = evaluate(model, criterion, metric, len(label_map), dev_loader)
            print(f'dev step: {step} - loss: {avg_loss:.5f}, precision: {p:.5f}, recall: {r:.5f}, ' \
                    f'f1: {f1:.5f} current best {best_f1:.5f}')
            if f1 > best_f1:
                best_f1 = f1
                print(f'==============================================save best model ' \
                        f'best performerence {best_f1:5f}')
                paddle.save(model.state_dict(), '{}/best.pdparams'.format(checkpoints))
        step += 1

# save the final model
if rank == 0:
    paddle.save(model.state_dict(), '{}/final.pdparams'.format(checkpoints))
train epoch: 0 - step: 10 (total: 227) - loss: 0.136036
train epoch: 0 - step: 20 (total: 227) - loss: 0.130759
train epoch: 0 - step: 30 (total: 227) - loss: 0.117360
train epoch: 0 - step: 40 (total: 227) - loss: 0.126342
train epoch: 0 - step: 50 (total: 227) - loss: 0.117132
dev step: 50 - loss: 0.11086, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
train epoch: 0 - step: 60 (total: 227) - loss: 0.127355
train epoch: 0 - step: 70 (total: 227) - loss: 0.120025
train epoch: 0 - step: 80 (total: 227) - loss: 0.112086
train epoch: 0 - step: 90 (total: 227) - loss: 0.106585
train epoch: 0 - step: 100 (total: 227) - loss: 0.109516
dev step: 100 - loss: 0.09834, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
train epoch: 0 - step: 110 (total: 227) - loss: 0.082624
train epoch: 0 - step: 120 (total: 227) - loss: 0.056104
train epoch: 0 - step: 130 (total: 227) - loss: 0.064101
train epoch: 0 - step: 140 (total: 227) - loss: 0.059635
train epoch: 0 - step: 150 (total: 227) - loss: 0.057752
dev step: 150 - loss: 0.04139, precision: 0.35824, recall: 0.38144, f1: 0.36947 current best 0.00000
==============================================save best model best performerence 0.369475
train epoch: 0 - step: 160 (total: 227) - loss: 0.045838
train epoch: 0 - step: 170 (total: 227) - loss: 0.030626
train epoch: 0 - step: 180 (total: 227) - loss: 0.029898
train epoch: 0 - step: 190 (total: 227) - loss: 0.020956
train epoch: 0 - step: 200 (total: 227) - loss: 0.032151
dev step: 200 - loss: 0.01862, precision: 0.66860, recall: 0.71763, f1: 0.69225 current best 0.36947
==============================================save best model best performerence 0.692250
train epoch: 0 - step: 210 (total: 227) - loss: 0.017710
train epoch: 0 - step: 220 (total: 227) - loss: 0.012850

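  The argument extraction model is trained the same way as the trigger model; only the data needs to be swapped for the preprocessed argument extraction dataset. Training can be launched as follows.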

# Train the trigger extraction model
!bash run_duee_fin.sh trigger_train

check and create directory
dir ./ckpt exist
dir ./ckpt/DuEE-Fin exist
dir ./submit exist

start DuEE-Fin trigger train
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
-----------  Configuration Arguments -----------
gpus: 0
heter_worker_num: None
heter_workers: 
http_port: None
ips: 127.0.0.1
log_dir: log
nproc_per_node: None
server_num: None
servers: 
training_script: sequence_labeling.py
training_script_args: ['--num_epoch', '20', '--learning_rate', '5e-5', '--tag_path', './conf/DuEE-Fin/trigger_tag.dict', '--train_data', './data/DuEE-Fin/trigger/train.tsv', '--dev_data', './data/DuEE-Fin/trigger/dev.tsv', '--test_data', './data/DuEE-Fin/trigger/test.tsv', '--predict_data', './data/DuEE-Fin/sentence/test.json', '--do_train', 'True', '--do_predict', 'False', '--max_seq_len', '300', '--batch_size', '16', '--skip_step', '10', '--valid_step', '50', '--checkpoints', './ckpt/DuEE-Fin/trigger', '--init_ckpt', './ckpt/DuEE-Fin/trigger/best.pdparams', '--predict_save_path', './ckpt/DuEE-Fin/trigger/test_pred.json', '--device', 'gpu']
worker_num: None
workers: 
------------------------------------------------
WARNING 2021-04-10 16:29:19,740 launch.py:316] Not found distinct arguments and compiled with cuda. Default use collective mode
launch train in GPU mode
INFO 2021-04-10 16:29:19,742 launch_utils.py:471] Local start 1 processes. First process distributed environment info (Only For Debug): 
    +=======================================================================================+
    |                        Distributed Envs                      Value                    |
    +---------------------------------------------------------------------------------------+
    |                       PADDLE_TRAINER_ID                        0                      |
    |                 PADDLE_CURRENT_ENDPOINT                 127.0.0.1:54382               |
    |                     PADDLE_TRAINERS_NUM                        1                      |
    |                PADDLE_TRAINER_ENDPOINTS                 127.0.0.1:54382               |
    |                     FLAGS_selected_gpus                        0                      |
    +=======================================================================================+

INFO 2021-04-10 16:29:19,742 launch_utils.py:475] details abouts PADDLE_TRAINER_ENDPOINTS can be found in log/endpoints.log, and detail running logs maybe found in log/workerlog.0
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
[2021-04-10 16:29:20,983] [    INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
[2021-04-10 16:29:20,997] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
W0410 16:29:20.998939   762 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0410 16:29:21.003577   762 device_context.cc:372] device: 0, cuDNN Version: 7.6.
============start train==========
train epoch: 0 - step: 10 (total: 9080) - loss: 0.109321
train epoch: 0 - step: 20 (total: 9080) - loss: 0.129953
train epoch: 0 - step: 30 (total: 9080) - loss: 0.116185
train epoch: 0 - step: 40 (total: 9080) - loss: 0.126599
train epoch: 0 - step: 50 (total: 9080) - loss: 0.109494
dev step: 50 - loss: 0.11120, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
train epoch: 0 - step: 60 (total: 9080) - loss: 0.111870
train epoch: 0 - step: 70 (total: 9080) - loss: 0.156219
train epoch: 0 - step: 80 (total: 9080) - loss: 0.104292
train epoch: 0 - step: 90 (total: 9080) - loss: 0.129062
train epoch: 0 - step: 100 (total: 9080) - loss: 0.116484
dev step: 100 - loss: 0.10372, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
train epoch: 0 - step: 110 (total: 9080) - loss: 0.107833
train epoch: 0 - step: 120 (total: 9080) - loss: 0.097913
train epoch: 0 - step: 130 (total: 9080) - loss: 0.102398
train epoch: 0 - step: 140 (total: 9080) - loss: 0.061798
train epoch: 0 - step: 150 (total: 9080) - loss: 0.070677
dev step: 150 - loss: 0.05695, precision: 0.25240, recall: 0.12324, f1: 0.16562 current best 0.00000
==============================================save best model best performerence 0.165618
……
train epoch: 19 - step: 8660 (total: 9080) - loss: 0.000040
train epoch: 19 - step: 8670 (total: 9080) - loss: 0.000292
train epoch: 19 - step: 8680 (total: 9080) - loss: 0.000617
train epoch: 19 - step: 8690 (total: 9080) - loss: 0.000061
train epoch: 19 - step: 8700 (total: 9080) - loss: 0.000340
dev step: 8700 - loss: 0.01594, precision: 0.86531, recall: 0.89704, f1: 0.88089 current best 0.89685
train epoch: 19 - step: 8710 (total: 9080) - loss: 0.002070
train epoch: 19 - step: 8720 (total: 9080) - loss: 0.000533
train epoch: 19 - step: 8730 (total: 9080) - loss: 0.001161
train epoch: 19 - step: 8740 (total: 9080) - loss: 0.007269
train epoch: 19 - step: 8750 (total: 9080) - loss: 0.000043
dev step: 8750 - loss: 0.01295, precision: 0.86478, recall: 0.90796, f1: 0.88584 current best 0.89685
train epoch: 19 - step: 8760 (total: 9080) - loss: 0.002034
train epoch: 19 - step: 8770 (total: 9080) - loss: 0.000233
train epoch: 19 - step: 8780 (total: 9080) - loss: 0.000176
train epoch: 19 - step: 8790 (total: 9080) - loss: 0.000349
train epoch: 19 - step: 8800 (total: 9080) - loss: 0.001374
dev step: 8800 - loss: 0.01408, precision: 0.86432, recall: 0.89938, f1: 0.88150 current best 0.89685
train epoch: 19 - step: 8810 (total: 9080) - loss: 0.000389
train epoch: 19 - step: 8820 (total: 9080) - loss: 0.003733
train epoch: 19 - step: 8830 (total: 9080) - loss: 0.000166
train epoch: 19 - step: 8840 (total: 9080) - loss: 0.000097
train epoch: 19 - step: 8850 (total: 9080) - loss: 0.000143
dev step: 8850 - loss: 0.01380, precision: 0.86353, recall: 0.90328, f1: 0.88296 current best 0.89685
train epoch: 19 - step: 8860 (total: 9080) - loss: 0.000026
train epoch: 19 - step: 8870 (total: 9080) - loss: 0.000193
train epoch: 19 - step: 8880 (total: 9080) - loss: 0.001100
train epoch: 19 - step: 8890 (total: 9080) - loss: 0.000031
train epoch: 19 - step: 8900 (total: 9080) - loss: 0.000353
dev step: 8900 - loss: 0.01387, precision: 0.88104, recall: 0.89548, f1: 0.88820 current best 0.89685
train epoch: 19 - step: 8910 (total: 9080) - loss: 0.000200
train epoch: 19 - step: 8920 (total: 9080) - loss: 0.000586
train epoch: 19 - step: 8930 (total: 9080) - loss: 0.000042
train epoch: 19 - step: 8940 (total: 9080) - loss: 0.000408
train epoch: 19 - step: 8950 (total: 9080) - loss: 0.000845
dev step: 8950 - loss: 0.01537, precision: 0.86103, recall: 0.91342, f1: 0.88645 current best 0.89685
train epoch: 19 - step: 8960 (total: 9080) - loss: 0.000170
train epoch: 19 - step: 8970 (total: 9080) - loss: 0.002247
train epoch: 19 - step: 8980 (total: 9080) - loss: 0.000848
train epoch: 19 - step: 8990 (total: 9080) - loss: 0.002282
train epoch: 19 - step: 9000 (total: 9080) - loss: 0.000029
dev step: 9000 - loss: 0.01638, precision: 0.88240, recall: 0.87207, f1: 0.87721 current best 0.89685
train epoch: 19 - step: 9010 (total: 9080) - loss: 0.000446
train epoch: 19 - step: 9020 (total: 9080) - loss: 0.000021
train epoch: 19 - step: 9030 (total: 9080) - loss: 0.000486
train epoch: 19 - step: 9040 (total: 9080) - loss: 0.003263
train epoch: 19 - step: 9050 (total: 9080) - loss: 0.000346
dev step: 9050 - loss: 0.01396, precision: 0.88304, recall: 0.88924, f1: 0.88613 current best 0.89685
train epoch: 19 - step: 9060 (total: 9080) - loss: 0.000052
train epoch: 19 - step: 9070 (total: 9080) - loss: 0.000063
INFO 2021-04-10 17:34:32,659 launch.py:240] Local processes completed.
end DuEE-Fin trigger train
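The `dev step` lines in the log above report precision, recall and F1 after every 50 training steps. The following is a minimal sketch, not part of the baseline scripts, that parses such lines and checks that the logged F1 matches the harmonic mean of the logged precision and recall. The regular expression is inferred from the log format shown here, and the names `parse_dev_lines` and `f1_score` are hypothetical helpers introduced only for illustration.

```python
import re

# Pattern inferred from the "dev step" lines printed in the log above (an assumption,
# not taken from the baseline source code).
DEV_LINE = re.compile(
    r"dev step: (\d+) - loss: ([\d.]+), precision: ([\d.]+), "
    r"recall: ([\d.]+), f1: ([\d.]+) current best ([\d.]+)"
)


def parse_dev_lines(log_text):
    """Return one dict per dev-evaluation line found in log_text."""
    rows = []
    for match in DEV_LINE.finditer(log_text):
        step, loss, precision, recall, f1, best = match.groups()
        rows.append({
            "step": int(step),
            "loss": float(loss),
            "precision": float(precision),
            "recall": float(recall),
            "f1": float(f1),
            "best_before": float(best),  # best F1 seen before this evaluation
        })
    return rows


def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Three lines copied verbatim from the trigger training log above.
sample = """
dev step: 50 - loss: 0.11120, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
dev step: 150 - loss: 0.05695, precision: 0.25240, recall: 0.12324, f1: 0.16562 current best 0.00000
dev step: 8700 - loss: 0.01594, precision: 0.86531, recall: 0.89704, f1: 0.88089 current best 0.89685
"""

rows = parse_dev_lines(sample)
for row in rows:
    recomputed = f1_score(row["precision"], row["recall"])
    print(row["step"], row["f1"], round(recomputed, 5))
```

Applied to the full `log/workerlog.0` file mentioned in the launch output, the same parser could be used to plot dev F1 against the step count, assuming that file contains the same line format.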
# Trigger identification: prediction
!bash run_duee_fin.sh trigger_predict
check and create directory
dir ./ckpt exist
dir ./ckpt/DuEE-Fin exist
dir ./submit exist

start DuEE-Fin trigger predict
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
[2021-04-10 17:34:34,610] [    INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
[2021-04-10 17:34:34,624] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
W0410 17:34:34.625129  3383 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0410 17:34:34.629817  3383 device_context.cc:372] device: 0, cuDNN Version: 7.6.
============start predict==========
Loaded parameters from ./ckpt/DuEE-Fin/trigger/best.pdparams
save data 140867 to ./ckpt/DuEE-Fin/trigger/test_pred.json
end DuEE-Fin trigger predict
# Argument role identification: model training
!bash run_duee_fin.sh role_train
check and create directory
dir ./ckpt exist
dir ./ckpt/DuEE-Fin exist
dir ./submit exist

start DuEE-Fin role train
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
-----------  Configuration Arguments -----------
gpus: 0
heter_worker_num: None
heter_workers: 
http_port: None
ips: 127.0.0.1
log_dir: log
nproc_per_node: None
server_num: None
servers: 
training_script: sequence_labeling.py
training_script_args: ['--num_epoch', '20', '--learning_rate', '5e-5', '--tag_path', './conf/DuEE-Fin/role_tag.dict', '--train_data', './data/DuEE-Fin/role/train.tsv', '--dev_data', './data/DuEE-Fin/role/dev.tsv', '--test_data', './data/DuEE-Fin/role/test.tsv', '--predict_data', './data/DuEE-Fin/sentence/test.json', '--do_train', 'True', '--do_predict', 'False', '--max_seq_len', '300', '--batch_size', '16', '--skip_step', '10', '--valid_step', '50', '--checkpoints', './ckpt/DuEE-Fin/role', '--init_ckpt', './ckpt/DuEE-Fin/role/best.pdparams', '--predict_save_path', './ckpt/DuEE-Fin/role/test_pred.json', '--device', 'gpu']
worker_num: None
workers: 
------------------------------------------------
WARNING 2021-04-10 17:57:54,959 launch.py:316] Not found distinct arguments and compiled with cuda. Default use collective mode
launch train in GPU mode
INFO 2021-04-10 17:57:54,961 launch_utils.py:471] Local start 1 processes. First process distributed environment info (Only For Debug): 
    +=======================================================================================+
    |                        Distributed Envs                      Value                    |
    +---------------------------------------------------------------------------------------+
    |                       PADDLE_TRAINER_ID                        0                      |
    |                 PADDLE_CURRENT_ENDPOINT                 127.0.0.1:44116               |
    |                     PADDLE_TRAINERS_NUM                        1                      |
    |                PADDLE_TRAINER_ENDPOINTS                 127.0.0.1:44116               |
    |                     FLAGS_selected_gpus                        0                      |
    +=======================================================================================+

INFO 2021-04-10 17:57:54,961 launch_utils.py:475] details abouts PADDLE_TRAINER_ENDPOINTS can be found in log/endpoints.log, and detail running logs maybe found in log/workerlog.0
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
[2021-04-10 17:57:56,200] [    INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
[2021-04-10 17:57:56,213] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
W0410 17:57:56.215006  4136 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0410 17:57:56.219677  4136 device_context.cc:372] device: 0, cuDNN Version: 7.6.
============start train==========
train epoch: 0 - step: 10 (total: 11800) - loss: 1.228878
train epoch: 0 - step: 20 (total: 11800) - loss: 1.163631
train epoch: 0 - step: 30 (total: 11800) - loss: 1.130505
train epoch: 0 - step: 40 (total: 11800) - loss: 1.303947
train epoch: 0 - step: 50 (total: 11800) - loss: 1.111251
dev step: 50 - loss: 1.14692, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
train epoch: 0 - step: 60 (total: 11800) - loss: 1.335606
train epoch: 0 - step: 70 (total: 11800) - loss: 0.886442
train epoch: 0 - step: 80 (total: 11800) - loss: 1.020030
train epoch: 0 - step: 90 (total: 11800) - loss: 0.871939
train epoch: 0 - step: 100 (total: 11800) - loss: 0.928532
dev step: 100 - loss: 0.98844, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
train epoch: 0 - step: 110 (total: 11800) - loss: 1.005332
train epoch: 0 - step: 120 (total: 11800) - loss: 0.769859
train epoch: 0 - step: 130 (total: 11800) - loss: 0.761578
train epoch: 0 - step: 140 (total: 11800) - loss: 0.653325
train epoch: 0 - step: 150 (total: 11800) - loss: 0.899768
dev step: 150 - loss: 0.71772, precision: 0.06080, recall: 0.00835, f1: 0.01468 current best 0.00000
==============================================save best model best performerence 0.014678
train epoch: 0 - step: 160 (total: 11800) - loss: 0.690438
train epoch: 0 - step: 170 (total: 11800) - loss: 0.774387
train epoch: 0 - step: 180 (total: 11800) - loss: 0.615638
train epoch: 0 - step: 190 (total: 11800) - loss: 0.483597
train epoch: 0 - step: 200 (total: 11800) - loss: 0.571479
dev step: 200 - loss: 0.52474, precision: 0.18197, recall: 0.12865, f1: 0.15073 current best 0.01468
==============================================save best model best performerence 0.150733
train epoch: 0 - step: 210 (total: 11800) - loss: 0.540742
train epoch: 0 - step: 220 (total: 11800) - loss: 0.524742
train epoch: 0 - step: 230 (total: 11800) - loss: 0.464600
train epoch: 0 - step: 240 (total: 11800) - loss: 0.478460
train epoch: 0 - step: 250 (total: 11800) - loss: 0.523782
dev step: 250 - loss: 0.42025, precision: 0.25433, recall: 0.23644, f1: 0.24506 current best 0.15073
==============================================save best model best performerence 0.245059
train epoch: 0 - step: 260 (total: 11800) - loss: 0.374678
train epoch: 0 - step: 270 (total: 11800) - loss: 0.530323
train epoch: 0 - step: 280 (total: 11800) - loss: 0.325683
train epoch: 0 - step: 290 (total: 11800) - loss: 0.375011
train epoch: 0 - step: 300 (total: 11800) - loss: 0.385494
dev step: 300 - loss: 0.34790, precision: 0.27753, recall: 0.26766, f1: 0.27251 current best 0.24506
==============================================save best model best performerence 0.272508
train epoch: 0 - step: 310 (total: 11800) - loss: 0.353424
train epoch: 0 - step: 320 (total: 11800) - loss: 0.410307
train epoch: 0 - step: 330 (total: 11800) - loss: 0.322043
train epoch: 0 - step: 340 (total: 11800) - loss: 0.384293
train epoch: 0 - step: 350 (total: 11800) - loss: 0.271734
dev step: 350 - loss: 0.30927, precision: 0.33494, recall: 0.44913, f1: 0.38372 current best 0.27251
==============================================save best model best performerence 0.383722
train epoch: 0 - step: 360 (total: 11800) - loss: 0.424462
train epoch: 0 - step: 370 (total: 11800) - loss: 0.398466
train epoch: 0 - step: 380 (total: 11800) - loss: 0.220276
train epoch: 0 - step: 390 (total: 11800) - loss: 0.329981
train epoch: 0 - step: 400 (total: 11800) - loss: 0.291278
dev step: 400 - loss: 0.28080, precision: 0.37307, recall: 0.44899, f1: 0.40752 current best 0.38372
==============================================save best model best performerence 0.407524
train epoch: 0 - step: 410 (total: 11800) - loss: 0.315920
train epoch: 0 - step: 420 (total: 11800) - loss: 0.335757
train epoch: 0 - step: 430 (total: 11800) - loss: 0.331377
train epoch: 0 - step: 440 (total: 11800) - loss: 0.339501
train epoch: 0 - step: 450 (total: 11800) - loss: 0.216479
dev step: 450 - loss: 0.27126, precision: 0.42649, recall: 0.48424, f1: 0.45353 current best 0.40752
==============================================save best model best performerence 0.453535
train epoch: 0 - step: 460 (total: 11800) - loss: 0.334343
train epoch: 0 - step: 470 (total: 11800) - loss: 0.246070
train epoch: 0 - step: 480 (total: 11800) - loss: 0.266857
train epoch: 0 - step: 490 (total: 11800) - loss: 0.262747
train epoch: 0 - step: 500 (total: 11800) - loss: 0.250897
dev step: 500 - loss: 0.25047, precision: 0.47231, recall: 0.60383, f1: 0.53003 current best 0.45353
==============================================save best model best performerence 0.530032
train epoch: 0 - step: 510 (total: 11800) - loss: 0.223253
train epoch: 0 - step: 520 (total: 11800) - loss: 0.228720
train epoch: 0 - step: 530 (total: 11800) - loss: 0.246290
train epoch: 0 - step: 540 (total: 11800) - loss: 0.287393
train epoch: 0 - step: 550 (total: 11800) - loss: 0.297358
dev step: 550 - loss: 0.24383, precision: 0.49097, recall: 0.55548, f1: 0.52123 current best 0.53003
train epoch: 0 - step: 560 (total: 11800) - loss: 0.266396
train epoch: 0 - step: 570 (total: 11800) - loss: 0.296538
train epoch: 0 - step: 580 (total: 11800) - loss: 0.210442
train epoch: 1 - step: 590 (total: 11800) - loss: 0.282502
train epoch: 1 - step: 600 (total: 11800) - loss: 0.239531
dev step: 600 - loss: 0.22736, precision: 0.49346, recall: 0.61347, f1: 0.54696 current best 0.53003
==============================================save best model best performerence 0.546959
train epoch: 1 - step: 610 (total: 11800) - loss: 0.281700
train epoch: 1 - step: 620 (total: 11800) - loss: 0.291554
train epoch: 1 - step: 630 (total: 11800) - loss: 0.284449
train epoch: 1 - step: 640 (total: 11800) - loss: 0.175821
train epoch: 1 - step: 650 (total: 11800) - loss: 0.234460
dev step: 650 - loss: 0.22660, precision: 0.50054, recall: 0.66628, f1: 0.57164 current best 0.54696
==============================================save best model best performerence 0.571640
train epoch: 1 - step: 660 (total: 11800) - loss: 0.253709
train epoch: 1 - step: 670 (total: 11800) - loss: 0.206524
train epoch: 1 - step: 680 (total: 11800) - loss: 0.273749
train epoch: 1 - step: 690 (total: 11800) - loss: 0.267098
train epoch: 1 - step: 700 (total: 11800) - loss: 0.221125
dev step: 700 - loss: 0.22382, precision: 0.50251, recall: 0.62052, f1: 0.55531 current best 0.57164
train epoch: 1 - step: 710 (total: 11800) - loss: 0.194055
train epoch: 1 - step: 720 (total: 11800) - loss: 0.213713
train epoch: 1 - step: 730 (total: 11800) - loss: 0.266367
train epoch: 1 - step: 740 (total: 11800) - loss: 0.265232
train epoch: 1 - step: 750 (total: 11800) - loss: 0.222215
dev step: 750 - loss: 0.23990, precision: 0.49661, recall: 0.71780, f1: 0.58707 current best 0.57164
==============================================save best model best performerence 0.587065
……
train epoch: 19 - step: 11210 (total: 11800) - loss: 0.071786
train epoch: 19 - step: 11220 (total: 11800) - loss: 0.126563
train epoch: 19 - step: 11230 (total: 11800) - loss: 0.079284
train epoch: 19 - step: 11240 (total: 11800) - loss: 0.097921
train epoch: 19 - step: 11250 (total: 11800) - loss: 0.082845
dev step: 11250 - loss: 0.26768, precision: 0.60864, recall: 0.73406, f1: 0.66549 current best 0.68086
train epoch: 19 - step: 11260 (total: 11800) - loss: 0.040633
train epoch: 19 - step: 11270 (total: 11800) - loss: 0.036113
train epoch: 19 - step: 11280 (total: 11800) - loss: 0.090494
train epoch: 19 - step: 11290 (total: 11800) - loss: 0.058005
train epoch: 19 - step: 11300 (total: 11800) - loss: 0.086870
dev step: 11300 - loss: 0.27434, precision: 0.65781, recall: 0.68772, f1: 0.67244 current best 0.68086
train epoch: 19 - step: 11310 (total: 11800) - loss: 0.092861
train epoch: 19 - step: 11320 (total: 11800) - loss: 0.081821
train epoch: 19 - step: 11330 (total: 11800) - loss: 0.093358
train epoch: 19 - step: 11340 (total: 11800) - loss: 0.041281
train epoch: 19 - step: 11350 (total: 11800) - loss: 0.072158
dev step: 11350 - loss: 0.26591, precision: 0.63945, recall: 0.72125, f1: 0.67789 current best 0.68086
train epoch: 19 - step: 11360 (total: 11800) - loss: 0.056884
train epoch: 19 - step: 11370 (total: 11800) - loss: 0.103474
train epoch: 19 - step: 11380 (total: 11800) - loss: 0.053013
train epoch: 19 - step: 11390 (total: 11800) - loss: 0.120952
train epoch: 19 - step: 11400 (total: 11800) - loss: 0.096058
dev step: 11400 - loss: 0.28324, precision: 0.59984, recall: 0.73752, f1: 0.66159 current best 0.68086
train epoch: 19 - step: 11410 (total: 11800) - loss: 0.053519
train epoch: 19 - step: 11420 (total: 11800) - loss: 0.084413
train epoch: 19 - step: 11430 (total: 11800) - loss: 0.082539
train epoch: 19 - step: 11440 (total: 11800) - loss: 0.025818
train epoch: 19 - step: 11450 (total: 11800) - loss: 0.104579
dev step: 11450 - loss: 0.27601, precision: 0.62382, recall: 0.71161, f1: 0.66483 current best 0.68086
train epoch: 19 - step: 11460 (total: 11800) - loss: 0.023326
train epoch: 19 - step: 11470 (total: 11800) - loss: 0.074468
train epoch: 19 - step: 11480 (total: 11800) - loss: 0.131153
train epoch: 19 - step: 11490 (total: 11800) - loss: 0.144081
train epoch: 19 - step: 11500 (total: 11800) - loss: 0.059301
dev step: 11500 - loss: 0.24404, precision: 0.63090, recall: 0.69881, f1: 0.66312 current best 0.68086
train epoch: 19 - step: 11510 (total: 11800) - loss: 0.087042
train epoch: 19 - step: 11520 (total: 11800) - loss: 0.103437
train epoch: 19 - step: 11530 (total: 11800) - loss: 0.141086
train epoch: 19 - step: 11540 (total: 11800) - loss: 0.073799
train epoch: 19 - step: 11550 (total: 11800) - loss: 0.080609
dev step: 11550 - loss: 0.26010, precision: 0.63815, recall: 0.71392, f1: 0.67391 current best 0.68086
train epoch: 19 - step: 11560 (total: 11800) - loss: 0.070097
train epoch: 19 - step: 11570 (total: 11800) - loss: 0.080336
train epoch: 19 - step: 11580 (total: 11800) - loss: 0.083600
train epoch: 19 - step: 11590 (total: 11800) - loss: 0.094290
train epoch: 19 - step: 11600 (total: 11800) - loss: 0.070526
dev step: 11600 - loss: 0.26730, precision: 0.63843, recall: 0.73536, f1: 0.68347 current best 0.68086
==============================================save best model best performerence 0.683475
train epoch: 19 - step: 11610 (total: 11800) - loss: 0.081728
train epoch: 19 - step: 11620 (total: 11800) - loss: 0.063919
train epoch: 19 - step: 11630 (total: 11800) - loss: 0.126019
train epoch: 19 - step: 11640 (total: 11800) - loss: 0.104756
train epoch: 19 - step: 11650 (total: 11800) - loss: 0.077707
dev step: 11650 - loss: 0.25038, precision: 0.63025, recall: 0.72140, f1: 0.67275 current best 0.68347
train epoch: 19 - step: 11660 (total: 11800) - loss: 0.092881
train epoch: 19 - step: 11670 (total: 11800) - loss: 0.068379
train epoch: 19 - step: 11680 (total: 11800) - loss: 0.046535
train epoch: 19 - step: 11690 (total: 11800) - loss: 0.078183
train epoch: 19 - step: 11700 (total: 11800) - loss: 0.104983
dev step: 11700 - loss: 0.26015, precision: 0.64215, recall: 0.70471, f1: 0.67197 current best 0.68347
train epoch: 19 - step: 11710 (total: 11800) - loss: 0.086539
train epoch: 19 - step: 11720 (total: 11800) - loss: 0.118713
train epoch: 19 - step: 11730 (total: 11800) - loss: 0.081435
train epoch: 19 - step: 11740 (total: 11800) - loss: 0.073214
train epoch: 19 - step: 11750 (total: 11800) - loss: 0.129037
dev step: 11750 - loss: 0.25711, precision: 0.62550, recall: 0.68067, f1: 0.65192 current best 0.68347
train epoch: 19 - step: 11760 (total: 11800) - loss: 0.117920
train epoch: 19 - step: 11770 (total: 11800) - loss: 0.048488
train epoch: 19 - step: 11780 (total: 11800) - loss: 0.095776
train epoch: 19 - step: 11790 (total: 11800) - loss: 0.122794
INFO 2021-04-10 19:32:21,529 launch.py:240] Local processes completed.
end DuEE-Fin role train
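The `total` value in each training line can be related to the run configuration (`--num_epoch 20`, `--batch_size 16`). The sketch below works through that arithmetic for the trigger run (9080 total steps) and the role run (11800 total steps). It assumes that the total is simply the number of epochs times the number of batches per epoch, that training runs on a single card, and that the last partial batch of each epoch is kept; none of these are confirmed by the log itself, and both function names are hypothetical.

```python
def steps_per_epoch(total_steps, num_epochs):
    """Batches per epoch, assuming total_steps = num_epochs * batches per epoch."""
    steps, remainder = divmod(total_steps, num_epochs)
    if remainder:
        raise ValueError("total_steps is not a multiple of num_epochs")
    return steps


def example_count_range(steps, batch_size):
    """Smallest and largest dataset size that yields `steps` batches per epoch.

    Assumes the last, possibly partial, batch is kept, so that
    ceil(n / batch_size) == steps.
    """
    return (steps - 1) * batch_size + 1, steps * batch_size


for name, total in [("trigger", 9080), ("role", 11800)]:
    per_epoch = steps_per_epoch(total, 20)
    low, high = example_count_range(per_epoch, 16)
    print(f"{name}: {per_epoch} steps/epoch, between {low} and {high} training rows")
```

Under these assumptions the role run has 590 steps per epoch, which agrees with the log above, where the epoch counter changes from 0 to 1 between step 580 and step 590.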
# Argument role identification: prediction
!bash run_duee_fin.sh role_predict
check and create directory
dir ./ckpt exist
dir ./ckpt/DuEE-Fin exist
dir ./submit exist

start DuEE-Fin role predict
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
[2021-04-10 19:32:29,053] [    INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
[2021-04-10 19:32:29,067] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
W0410 19:32:29.068078  7827 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0410 19:32:29.072537  7827 device_context.cc:372] device: 0, cuDNN Version: 7.6.
============start predict==========
Loaded parameters from ./ckpt/DuEE-Fin/role/best.pdparams
save data 140867 to ./ckpt/DuEE-Fin/role/test_pred.json
end DuEE-Fin role predict
# Enumeration classification: model training
!bash run_duee_fin.sh enum_train
check and create directory
dir ./ckpt exist
dir ./ckpt/DuEE-Fin exist
dir ./submit exist

start DuEE-Fin enum train
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
-----------  Configuration Arguments -----------
gpus: 0
heter_worker_num: None
heter_workers: 
http_port: None
ips: 127.0.0.1
log_dir: log
nproc_per_node: None
server_num: None
servers: 
training_script: classifier.py
training_script_args: ['--num_epoch', '20', '--learning_rate', '5e-5', '--tag_path', './conf/DuEE-Fin/enum_tag.dict', '--train_data', './data/DuEE-Fin/enum/train.tsv', '--dev_data', './data/DuEE-Fin/enum/dev.tsv', '--test_data', './data/DuEE-Fin/enum/test.tsv', '--predict_data', './data/DuEE-Fin/sentence/test.json', '--do_train', 'True', '--do_predict', 'False', '--max_seq_len', '300', '--batch_size', '16', '--skip_step', '1', '--valid_step', '5', '--checkpoints', './ckpt/DuEE-Fin/enum', '--init_ckpt', './ckpt/DuEE-Fin/enum/best.pdparams', '--predict_save_path', './ckpt/DuEE-Fin/enum/test_pred.json', '--device', 'gpu']
worker_num: None
workers: 
------------------------------------------------
WARNING 2021-04-10 19:52:37,709 launch.py:316] Not found distinct arguments and compiled with cuda. Default use collective mode
launch train in GPU mode
INFO 2021-04-10 19:52:37,711 launch_utils.py:471] Local start 1 processes. First process distributed environment info (Only For Debug): 
    +=======================================================================================+
    |                        Distributed Envs                      Value                    |
    +---------------------------------------------------------------------------------------+
    |                       PADDLE_TRAINER_ID                        0                      |
    |                 PADDLE_CURRENT_ENDPOINT                 127.0.0.1:53319               |
    |                     PADDLE_TRAINERS_NUM                        1                      |
    |                PADDLE_TRAINER_ENDPOINTS                 127.0.0.1:53319               |
    |                     FLAGS_selected_gpus                        0                      |
    +=======================================================================================+

INFO 2021-04-10 19:52:37,711 launch_utils.py:475] details abouts PADDLE_TRAINER_ENDPOINTS can be found in log/endpoints.log, and detail running logs maybe found in log/workerlog.0
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
[2021-04-10 19:52:38,983] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
W0410 19:52:38.984846  8459 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0410 19:52:38.990355  8459 device_context.cc:372] device: 0, cuDNN Version: 7.6.
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py:1303: UserWarning: Skip loading for classifier.weight. classifier.weight is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py:1303: UserWarning: Skip loading for classifier.bias. classifier.bias is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/parallel.py:423: UserWarning: The program will return to single-card operation. Please check 1, whether you use spawn or fleetrun to start the program. 2, Whether it is a multi-card program. 3, Is the current environment multi-card.
  warnings.warn("The program will return to single-card operation. "
[2021-04-10 19:52:45,669] [    INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
============start train==========
train epoch: 0 - step: 1 (total: 540) - loss: 1.816590 acc 0.00000
train epoch: 0 - step: 2 (total: 540) - loss: 1.258928 acc 0.16667
train epoch: 0 - step: 3 (total: 540) - loss: 1.420988 acc 0.21875
train epoch: 0 - step: 4 (total: 540) - loss: 1.131907 acc 0.27500
train epoch: 0 - step: 5 (total: 540) - loss: 1.223589 acc 0.29167
dev step: 5 - loss: 1.056646 accuracy: 0.57353, current best 0.00000
==============================================save best model best performerence 0.573529
train epoch: 0 - step: 6 (total: 540) - loss: 0.891011 acc 0.62500
train epoch: 0 - step: 7 (total: 540) - loss: 1.019258 acc 0.53125
train epoch: 0 - step: 8 (total: 540) - loss: 0.944579 acc 0.54167
train epoch: 0 - step: 9 (total: 540) - loss: 0.998457 acc 0.54688
train epoch: 0 - step: 10 (total: 540) - loss: 1.451570 acc 0.52500
dev step: 10 - loss: 0.973503 accuracy: 0.58824, current best 0.57353
==============================================save best model best performerence 0.588235
train epoch: 0 - step: 11 (total: 540) - loss: 1.007745 acc 0.50000
train epoch: 0 - step: 12 (total: 540) - loss: 0.987179 acc 0.56250
train epoch: 0 - step: 13 (total: 540) - loss: 1.315943 acc 0.54167
train epoch: 0 - step: 14 (total: 540) - loss: 0.999895 acc 0.53125
train epoch: 0 - step: 15 (total: 540) - loss: 1.151808 acc 0.51250
dev step: 15 - loss: 0.960856 accuracy: 0.57353, current best 0.58824
train epoch: 0 - step: 16 (total: 540) - loss: 0.993396 acc 0.50000
train epoch: 0 - step: 17 (total: 540) - loss: 0.963157 acc 0.56250
train epoch: 0 - step: 18 (total: 540) - loss: 1.068855 acc 0.58333
train epoch: 0 - step: 19 (total: 540) - loss: 0.926241 acc 0.53125
train epoch: 0 - step: 20 (total: 540) - loss: 1.040999 acc 0.55000
dev step: 20 - loss: 0.976091 accuracy: 0.57353, current best 0.58824
train epoch: 0 - step: 21 (total: 540) - loss: 0.889343 acc 0.56250
train epoch: 0 - step: 22 (total: 540) - loss: 1.093462 acc 0.53125
train epoch: 0 - step: 23 (total: 540) - loss: 0.737294 acc 0.60417
train epoch: 0 - step: 24 (total: 540) - loss: 0.808597 acc 0.64062
train epoch: 0 - step: 25 (total: 540) - loss: 1.001462 acc 0.62500
dev step: 25 - loss: 0.890632 accuracy: 0.58824, current best 0.58824
train epoch: 0 - step: 26 (total: 540) - loss: 1.133129 acc 0.58333
train epoch: 1 - step: 27 (total: 540) - loss: 0.722086 acc 0.60714
train epoch: 1 - step: 28 (total: 540) - loss: 1.116035 acc 0.59091
train epoch: 1 - step: 29 (total: 540) - loss: 0.887589 acc 0.61667
train epoch: 1 - step: 30 (total: 540) - loss: 0.892591 acc 0.63158
dev step: 30 - loss: 0.789007 accuracy: 0.66176, current best 0.58824
==============================================save best model best performerence 0.661765
train epoch: 1 - step: 31 (total: 540) - loss: 0.553415 acc 0.93750
train epoch: 1 - step: 32 (total: 540) - loss: 0.908041 acc 0.81250
train epoch: 1 - step: 33 (total: 540) - loss: 0.635944 acc 0.81250
train epoch: 1 - step: 34 (total: 540) - loss: 0.589399 acc 0.79688
train epoch: 1 - step: 35 (total: 540) - loss: 0.848807 acc 0.75000
dev step: 35 - loss: 0.724788 accuracy: 0.73529, current best 0.66176
==============================================save best model best performerence 0.735294
train epoch: 1 - step: 36 (total: 540) - loss: 0.357636 acc 0.87500
train epoch: 1 - step: 37 (total: 540) - loss: 0.589867 acc 0.87500
train epoch: 1 - step: 38 (total: 540) - loss: 0.742335 acc 0.81250
train epoch: 1 - step: 39 (total: 540) - loss: 0.882202 acc 0.76562
train epoch: 1 - step: 40 (total: 540) - loss: 0.428002 acc 0.78750
dev step: 40 - loss: 0.696543 accuracy: 0.76471, current best 0.73529
==============================================save best model best performerence 0.764706
train epoch: 1 - step: 41 (total: 540) - loss: 1.359658 acc 0.50000
train epoch: 1 - step: 42 (total: 540) - loss: 1.061078 acc 0.59375
train epoch: 1 - step: 43 (total: 540) - loss: 0.830923 acc 0.60417
train epoch: 1 - step: 44 (total: 540) - loss: 1.215348 acc 0.59375
train epoch: 1 - step: 45 (total: 540) - loss: 0.437100 acc 0.65000
dev step: 45 - loss: 0.735505 accuracy: 0.76471, current best 0.76471
train epoch: 1 - step: 46 (total: 540) - loss: 0.742862 acc 0.68750
train epoch: 1 - step: 47 (total: 540) - loss: 0.711089 acc 0.68750
train epoch: 1 - step: 48 (total: 540) - loss: 0.544343 acc 0.72917
train epoch: 1 - step: 49 (total: 540) - loss: 0.928760 acc 0.67188
train epoch: 1 - step: 50 (total: 540) - loss: 0.650753 acc 0.70000
dev step: 50 - loss: 0.666267 accuracy: 0.80882, current best 0.76471
==============================================save best model best performerence 0.808824
train epoch: 1 - step: 51 (total: 540) - loss: 0.561961 acc 0.81250
train epoch: 1 - step: 52 (total: 540) - loss: 0.444493 acc 0.84375
train epoch: 1 - step: 53 (total: 540) - loss: 0.727330 acc 0.81818
train epoch: 2 - step: 54 (total: 540) - loss: 0.535819 acc 0.85000
train epoch: 2 - step: 55 (total: 540) - loss: 0.804540 acc 0.80263
dev step: 55 - loss: 0.748626 accuracy: 0.75000, current best 0.80882
……
train epoch: 19 - step: 521 (total: 540) - loss: 0.001116 acc 1.00000
train epoch: 19 - step: 522 (total: 540) - loss: 0.001323 acc 1.00000
train epoch: 19 - step: 523 (total: 540) - loss: 0.000761 acc 1.00000
train epoch: 19 - step: 524 (total: 540) - loss: 0.000776 acc 1.00000
train epoch: 19 - step: 525 (total: 540) - loss: 0.000688 acc 1.00000
dev step: 525 - loss: 0.963112 accuracy: 0.83824, current best 0.86765
train epoch: 19 - step: 526 (total: 540) - loss: 0.001005 acc 1.00000
train epoch: 19 - step: 527 (total: 540) - loss: 0.000491 acc 1.00000
train epoch: 19 - step: 528 (total: 540) - loss: 0.000759 acc 1.00000
train epoch: 19 - step: 529 (total: 540) - loss: 0.000579 acc 1.00000
train epoch: 19 - step: 530 (total: 540) - loss: 0.000592 acc 1.00000
dev step: 530 - loss: 0.965140 accuracy: 0.83824, current best 0.86765
train epoch: 19 - step: 531 (total: 540) - loss: 0.000727 acc 1.00000
train epoch: 19 - step: 532 (total: 540) - loss: 0.000827 acc 1.00000
train epoch: 19 - step: 533 (total: 540) - loss: 0.002026 acc 1.00000
train epoch: 19 - step: 534 (total: 540) - loss: 0.001417 acc 1.00000
train epoch: 19 - step: 535 (total: 540) - loss: 0.000947 acc 1.00000
dev step: 535 - loss: 0.967908 accuracy: 0.83824, current best 0.86765
train epoch: 19 - step: 536 (total: 540) - loss: 0.000558 acc 1.00000
train epoch: 19 - step: 537 (total: 540) - loss: 0.000692 acc 1.00000
train epoch: 19 - step: 538 (total: 540) - loss: 0.001994 acc 1.00000
train epoch: 19 - step: 539 (total: 540) - loss: 0.000524 acc 1.00000
INFO 2021-04-10 19:56:40,966 launch.py:240] Local processes completed.
end DuEE-Fin enum train
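
The "current best" and "save best model" lines in the log above suggest that a checkpoint is saved only when the dev accuracy strictly exceeds the best value so far: at step 45 the accuracy equals the current best (0.76471) and nothing is saved. A minimal sketch of that bookkeeping, assuming a strict comparison (the function name `track_best` is hypothetical and is not taken from the training script):

```python
from typing import Iterable, List, Tuple


def track_best(dev_scores: Iterable[Tuple[int, float]],
               initial_best: float = 0.0) -> Tuple[float, List[int]]:
    """Return the best score and the steps at which a checkpoint would be saved."""
    best = initial_best
    saved_steps = []
    for step, score in dev_scores:
        # Strict improvement only: a tie with the current best does not save.
        if score > best:
            best = score
            saved_steps.append(step)
    return best, saved_steps


# (step, dev accuracy) pairs copied from the log above.
dev_log = [(35, 0.73529), (40, 0.76471), (45, 0.76471), (50, 0.80882), (55, 0.75000)]
best, saved = track_best(dev_log, initial_best=0.66176)
print(best, saved)
```

With these values the sketch saves at steps 35, 40 and 50, matching the positions of the "save best model" lines in the log.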
# Predict with the enum (enumerated-role) classification model
!bash run_duee_fin.sh enum_predict
check and create directory
dir ./ckpt exist
dir ./ckpt/DuEE-Fin exist
dir ./submit exist

start DuEE-Fin enum predict
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
[2021-04-10 19:56:50,581] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
W0410 19:56:50.583134  9015 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0410 19:56:50.588418  9015 device_context.cc:372] device: 0, cuDNN Version: 7.6.
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py:1303: UserWarning: Skip loading for classifier.weight. classifier.weight is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py:1303: UserWarning: Skip loading for classifier.bias. classifier.bias is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/parallel.py:423: UserWarning: The program will return to single-card operation. Please check 1, whether you use spawn or fleetrun to start the program. 2, Whether it is a multi-card program. 3, Is the current environment multi-card.
  warnings.warn("The program will return to single-card operation. "
[2021-04-10 19:56:57,202] [    INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
============start predict==========
Loaded parameters from ./ckpt/DuEE-Fin/enum/best.pdparams
save data 140867 to ./ckpt/DuEE-Fin/enum/test_pred.json
end DuEE-Fin enum predict

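1.5 Quick baseline reproduction, Step 5: post-process the data and submit the results

Submit the results to the evaluation website in the prediction format specified by the competition. The results are stored in submit/test_duee_fin.json.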

!bash run_duee_fin.sh pred_2_submit
check and create directory
dir ./ckpt exist
dir ./ckpt/DuEE-Fin exist
dir ./submit exist

start DuEE-Fin predict data merge to submit fotmat
trigger predict 140867 load from ./ckpt/DuEE-Fin/trigger/test_pred.json
role predict 140867 load from ./ckpt/DuEE-Fin/role/test_pred.json
enum predict 140867 load from ./ckpt/DuEE-Fin/enum/test_pred.json
schema 13 load from ./conf/DuEE-Fin/event_schema.json
submit data 30000 save to ./submit/test_duee_fin.json
end DuEE-Fin role predict data merge
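
The log above shows 140867 sentence-level predictions being reduced to 30000 submission records, which suggests that predictions are grouped back into documents. The following is only a rough sketch of such a grouping step, not the actual logic of the post-processing script. The field names `id` and `event_list`, and the function `merge_by_doc`, are assumptions made for illustration.

```python
import json
from typing import Dict, List


def merge_by_doc(sent_preds: List[Dict]) -> List[Dict]:
    """Group sentence-level predictions by document id, dropping duplicate events."""
    docs: Dict[str, Dict] = {}
    for pred in sent_preds:
        # Hypothetical schema: each prediction carries its document id and a list of events.
        doc = docs.setdefault(pred["id"], {"id": pred["id"], "event_list": []})
        for event in pred["event_list"]:
            if event not in doc["event_list"]:
                doc["event_list"].append(event)
    return list(docs.values())


def to_json_lines(records: List[Dict]) -> str:
    """Serialize one JSON object per line, keeping Chinese characters readable."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


sent_preds = [
    {"id": "doc1", "event_list": [{"event_type": "A", "arguments": []}]},
    {"id": "doc1", "event_list": [{"event_type": "A", "arguments": []},
                                  {"event_type": "B", "arguments": []}]},
    {"id": "doc2", "event_list": []},
]
merged = merge_by_doc(sent_preds)
print(to_json_lines(merged))
```

The real script also reads the event schema and the trigger, role and enum predictions separately, so treat this only as an illustration of the many-sentences-to-one-document reduction.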

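2. Sentence-level event extraction baseline

This is the baseline model for event extraction on the sentence-level, general-domain event extraction dataset (DuEE 1.0). It uses an ERNIE-based sequence labeling scheme and consists of two models, a sequence-labeling trigger extraction model and a sequence-labeling argument extraction model, which together form a pipeline. The trigger extraction model uses BIO tagging to identify the position of each trigger and its event type. The argument extraction model uses BIO tagging to identify the arguments of an event and their argument roles. The models and data processing are the same as for document-level event extraction, so they are not repeated here. Sentence-level general-domain event extraction has no enumerated-role classification step.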
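
To make the BIO scheme concrete, here is a minimal sketch that encodes labeled spans as BIO tags and decodes them back. It assumes one tag per character, and the sentence, the label name and both function names are hypothetical examples rather than code from the baseline.

```python
from typing import List, Tuple


def spans_to_bio(text: str, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Encode (start, end_exclusive, label) spans as one BIO tag per character."""
    tags = ["O"] * len(text)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(text: str, tags: List[str]) -> List[Tuple[str, str]]:
    """Decode BIO tags back into (surface text, label) pairs."""
    spans = []
    start, label = None, None
    # A trailing "O" sentinel closes any span that runs to the end of the text.
    for i, tag in enumerate(tags + ["O"]):
        if start is not None and tag != f"I-{label}":
            spans.append((text[start:i], label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


text = "中国队夺得冠军"          # hypothetical sentence
tags = spans_to_bio(text, [(3, 5, "夺冠")])  # trigger "夺得" with a hypothetical label
print(tags)
print(bio_to_spans(text, tags))
```

Under this scheme, N labels give 2N + 1 tags (a B- and an I- tag per label, plus O). The 131 trigger tags reported in the data-prepare log below are consistent with 65 event types.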

# Data preprocessing
!bash run_duee_1.sh data_prepare

# Train the trigger recognition model
!bash run_duee_1.sh trigger_train

check and create directory
dir ./ckpt exist
create dir * ./ckpt/DuEE1.0 *
dir ./submit exist

start DuEE1.0 data prepare

===============DUEE 1.0 DATASET==============

=================start schema process==============
input path ./conf/DuEE1.0/event_schema.json
save trigger tag 131 at ./conf/DuEE1.0/trigger_tag.dict
save trigger tag 243 at ./conf/DuEE1.0/role_tag.dict
=================end schema process===============

=================start schema process==============

----trigger------for dir ./data/DuEE1.0 to ./data/DuEE1.0/trigger
train 11959 dev 1499

----role------for dir ./data/DuEE1.0 to ./data/DuEE1.0/role
train 13916 dev 1791 test 1
=================end schema process==============
end DuEE1.0 data prepare
check and create directory
dir ./ckpt exist
dir ./ckpt/DuEE1.0 exist
dir ./submit exist

start DuEE1.0 trigger train
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
-----------  Configuration Arguments -----------
gpus: 0
heter_worker_num: None
heter_workers: 
http_port: None
ips: 127.0.0.1
log_dir: log
nproc_per_node: None
server_num: None
servers: 
training_script: sequence_labeling.py
training_script_args: ['--num_epoch', '20', '--learning_rate', '5e-5', '--tag_path', './conf/DuEE1.0/trigger_tag.dict', '--train_data', './data/DuEE1.0/trigger/train.tsv', '--dev_data', './data/DuEE1.0/trigger/dev.tsv', '--test_data', './data/DuEE1.0/trigger/test.tsv', '--predict_data', './data/DuEE1.0/test.json', '--do_train', 'True', '--do_predict', 'False', '--max_seq_len', '300', '--batch_size', '16', '--skip_step', '10', '--valid_step', '50', '--checkpoints', './ckpt/DuEE1.0/trigger', '--init_ckpt', './ckpt/DuEE1.0/trigger/best.pdparams', '--predict_save_path', './ckpt/DuEE1.0/trigger/test_pred.json', '--device', 'gpu']
worker_num: None
workers: 
------------------------------------------------
WARNING 2021-04-10 20:12:04,884 launch.py:316] Not found distinct arguments and compiled with cuda. Default use collective mode
launch train in GPU mode
INFO 2021-04-10 20:12:04,886 launch_utils.py:471] Local start 1 processes. First process distributed environment info (Only For Debug): 
    +=======================================================================================+
    |                        Distributed Envs                      Value                    |
    +---------------------------------------------------------------------------------------+
    |                       PADDLE_TRAINER_ID                        0                      |
    |                 PADDLE_CURRENT_ENDPOINT                 127.0.0.1:44437               |
    |                     PADDLE_TRAINERS_NUM                        1                      |
    |                PADDLE_TRAINER_ENDPOINTS                 127.0.0.1:44437               |
    |                     FLAGS_selected_gpus                        0                      |
    +=======================================================================================+

INFO 2021-04-10 20:12:04,886 launch_utils.py:475] details abouts PADDLE_TRAINER_ENDPOINTS can be found in log/endpoints.log, and detail running logs maybe found in log/workerlog.0
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
[2021-04-10 20:12:06,137] [    INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
[2021-04-10 20:12:06,151] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
W0410 20:12:06.152766  9531 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0410 20:12:06.157284  9531 device_context.cc:372] device: 0, cuDNN Version: 7.6.
============start train==========
train epoch: 0 - step: 10 (total: 14960) - loss: 0.399632
train epoch: 0 - step: 20 (total: 14960) - loss: 0.439437
train epoch: 0 - step: 30 (total: 14960) - loss: 0.408838
train epoch: 0 - step: 40 (total: 14960) - loss: 0.298826
train epoch: 0 - step: 50 (total: 14960) - loss: 0.394555
dev step: 50 - loss: 0.36327, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
train epoch: 0 - step: 60 (total: 14960) - loss: 0.485982
train epoch: 0 - step: 70 (total: 14960) - loss: 0.250205
train epoch: 0 - step: 80 (total: 14960) - loss: 0.382578
train epoch: 0 - step: 90 (total: 14960) - loss: 0.202613
train epoch: 0 - step: 100 (total: 14960) - loss: 0.309972
dev step: 100 - loss: 0.35608, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
train epoch: 0 - step: 110 (total: 14960) - loss: 0.310728
train epoch: 0 - step: 120 (total: 14960) - loss: 0.324738
train epoch: 0 - step: 130 (total: 14960) - loss: 0.262632
train epoch: 0 - step: 140 (total: 14960) - loss: 0.432903
train epoch: 0 - step: 150 (total: 14960) - loss: 0.436539
dev step: 150 - loss: 0.35624, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
train epoch: 0 - step: 160 (total: 14960) - loss: 0.485794
train epoch: 0 - step: 170 (total: 14960) - loss: 0.315029
train epoch: 0 - step: 180 (total: 14960) - loss: 0.284743
train epoch: 0 - step: 190 (total: 14960) - loss: 0.259944
train epoch: 0 - step: 200 (total: 14960) - loss: 0.311902
dev step: 200 - loss: 0.33042, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
train epoch: 0 - step: 210 (total: 14960) - loss: 0.330571
train epoch: 0 - step: 220 (total: 14960) - loss: 0.273139
train epoch: 0 - step: 230 (total: 14960) - loss: 0.378063
train epoch: 0 - step: 240 (total: 14960) - loss: 0.250299
train epoch: 0 - step: 250 (total: 14960) - loss: 0.290701
dev step: 250 - loss: 0.29563, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
train epoch: 0 - step: 260 (total: 14960) - loss: 0.202284
train epoch: 0 - step: 270 (total: 14960) - loss: 0.180812
train epoch: 0 - step: 280 (total: 14960) - loss: 0.238939
train epoch: 0 - step: 290 (total: 14960) - loss: 0.256409
train epoch: 0 - step: 300 (total: 14960) - loss: 0.192298
dev step: 300 - loss: 0.19781, precision: 0.28393, recall: 0.17765, f1: 0.21856 current best 0.00000
==============================================save best model best performerence 0.218557
train epoch: 0 - step: 310 (total: 14960) - loss: 0.236116
train epoch: 0 - step: 320 (total: 14960) - loss: 0.185691
train epoch: 0 - step: 330 (total: 14960) - loss: 0.150023
train epoch: 0 - step: 340 (total: 14960) - loss: 0.160092
train epoch: 0 - step: 350 (total: 14960) - loss: 0.251915
dev step: 350 - loss: 0.16887, precision: 0.41444, recall: 0.23408, f1: 0.29918 current best 0.21856
==============================================save best model best performerence 0.299179
train epoch: 0 - step: 360 (total: 14960) - loss: 0.226977
train epoch: 0 - step: 370 (total: 14960) - loss: 0.157772
train epoch: 0 - step: 380 (total: 14960) - loss: 0.204087
train epoch: 0 - step: 390 (total: 14960) - loss: 0.193559
train epoch: 0 - step: 400 (total: 14960) - loss: 0.076721
dev step: 400 - loss: 0.14077, precision: 0.40486, recall: 0.33520, f1: 0.36675 current best 0.29918
==============================================save best model best performerence 0.366748
train epoch: 0 - step: 410 (total: 14960) - loss: 0.132487
train epoch: 0 - step: 420 (total: 14960) - loss: 0.234711
train epoch: 0 - step: 430 (total: 14960) - loss: 0.146011
train epoch: 0 - step: 440 (total: 14960) - loss: 0.182145
train epoch: 0 - step: 450 (total: 14960) - loss: 0.124297
dev step: 450 - loss: 0.11585, precision: 0.47749, recall: 0.47989, f1: 0.47868 current best 0.36675
==============================================save best model best performerence 0.478685
train epoch: 0 - step: 460 (total: 14960) - loss: 0.128533
train epoch: 0 - step: 470 (total: 14960) - loss: 0.232507
train epoch: 0 - step: 480 (total: 14960) - loss: 0.138922
train epoch: 0 - step: 490 (total: 14960) - loss: 0.063667
train epoch: 0 - step: 500 (total: 14960) - loss: 0.067490
dev step: 500 - loss: 0.09702, precision: 0.55856, recall: 0.51955, f1: 0.53835 current best 0.47868
==============================================save best model best performerence 0.538350
train epoch: 0 - step: 510 (total: 14960) - loss: 0.076103
train epoch: 0 - step: 520 (total: 14960) - loss: 0.057995
train epoch: 0 - step: 530 (total: 14960) - loss: 0.066106
train epoch: 0 - step: 540 (total: 14960) - loss: 0.122683
train epoch: 0 - step: 550 (total: 14960) - loss: 0.106140
dev step: 550 - loss: 0.08321, precision: 0.62119, recall: 0.59274, f1: 0.60663 current best 0.53835
==============================================save best model best performerence 0.606632
train epoch: 0 - step: 560 (total: 14960) - loss: 0.039723
train epoch: 0 - step: 570 (total: 14960) - loss: 0.093354
train epoch: 0 - step: 580 (total: 14960) - loss: 0.125624
train epoch: 0 - step: 590 (total: 14960) - loss: 0.056028
train epoch: 0 - step: 600 (total: 14960) - loss: 0.050333
dev step: 600 - loss: 0.07346, precision: 0.67859, recall: 0.63575, f1: 0.65648 current best 0.60663
==============================================save best model best performerence 0.656475
train epoch: 0 - step: 610 (total: 14960) - loss: 0.106334
train epoch: 0 - step: 620 (total: 14960) - loss: 0.106583
train epoch: 0 - step: 630 (total: 14960) - loss: 0.060192
train epoch: 0 - step: 640 (total: 14960) - loss: 0.032199
train epoch: 0 - step: 650 (total: 14960) - loss: 0.104459
dev step: 650 - loss: 0.06579, precision: 0.69209, recall: 0.68939, f1: 0.69074 current best 0.65648
==============================================save best model best performerence 0.690736
train epoch: 0 - step: 660 (total: 14960) - loss: 0.068539
train epoch: 0 - step: 670 (total: 14960) - loss: 0.059690
train epoch: 0 - step: 680 (total: 14960) - loss: 0.064414
train epoch: 0 - step: 690 (total: 14960) - loss: 0.085624
train epoch: 0 - step: 700 (total: 14960) - loss: 0.064715
dev step: 700 - loss: 0.06439, precision: 0.68861, recall: 0.69553, f1: 0.69205 current best 0.69074
==============================================save best model best performerence 0.692051
train epoch: 0 - step: 710 (total: 14960) - loss: 0.071924
train epoch: 0 - step: 720 (total: 14960) - loss: 0.064167
train epoch: 0 - step: 730 (total: 14960) - loss: 0.053353
train epoch: 0 - step: 740 (total: 14960) - loss: 0.084605
train epoch: 1 - step: 750 (total: 14960) - loss: 0.071954
dev step: 750 - loss: 0.05509, precision: 0.71468, recall: 0.73184, f1: 0.72316 current best 0.69205
==============================================save best model best performerence 0.723158
train epoch: 1 - step: 760 (total: 14960) - loss: 0.063369
train epoch: 1 - step: 770 (total: 14960) - loss: 0.010517
train epoch: 1 - step: 780 (total: 14960) - loss: 0.053650
train epoch: 1 - step: 790 (total: 14960) - loss: 0.042259
train epoch: 1 - step: 800 (total: 14960) - loss: 0.032458
dev step: 800 - loss: 0.05442, precision: 0.70917, recall: 0.77374, f1: 0.74005 current best 0.72316
==============================================save best model best performerence 0.740048
train epoch: 1 - step: 810 (total: 14960) - loss: 0.056759
train epoch: 1 - step: 820 (total: 14960) - loss: 0.027823
train epoch: 1 - step: 830 (total: 14960) - loss: 0.047783
train epoch: 1 - step: 840 (total: 14960) - loss: 0.038662
train epoch: 1 - step: 850 (total: 14960) - loss: 0.085002
dev step: 850 - loss: 0.05003, precision: 0.72125, recall: 0.80223, f1: 0.75959 current best 0.74005
==============================================save best model best performerence 0.759587
train epoch: 1 - step: 860 (total: 14960) - loss: 0.022502
train epoch: 1 - step: 870 (total: 14960) - loss: 0.039028
train epoch: 1 - step: 880 (total: 14960) - loss: 0.042963
train epoch: 1 - step: 890 (total: 14960) - loss: 0.045788
train epoch: 1 - step: 900 (total: 14960) - loss: 0.026486
dev step: 900 - loss: 0.04721, precision: 0.74372, recall: 0.84302, f1: 0.79026 current best 0.75959
==============================================save best model best performerence 0.790259
train epoch: 1 - step: 910 (total: 14960) - loss: 0.032655
train epoch: 1 - step: 920 (total: 14960) - loss: 0.021889
train epoch: 1 - step: 930 (total: 14960) - loss: 0.033798
train epoch: 1 - step: 940 (total: 14960) - loss: 0.060657
train epoch: 1 - step: 950 (total: 14960) - loss: 0.019720
dev step: 950 - loss: 0.04749, precision: 0.73062, recall: 0.84246, f1: 0.78256 current best 0.79026
train epoch: 1 - step: 960 (total: 14960) - loss: 0.037086
train epoch: 1 - step: 970 (total: 14960) - loss: 0.027883
train epoch: 1 - step: 980 (total: 14960) - loss: 0.044426
train epoch: 1 - step: 990 (total: 14960) - loss: 0.021761
train epoch: 1 - step: 1000 (total: 14960) - loss: 0.044189
dev step: 1000 - loss: 0.04534, precision: 0.79933, recall: 0.80335, f1: 0.80134 current best 0.79026
==============================================save best model best performerence 0.801337
train epoch: 1 - step: 1010 (total: 14960) - loss: 0.050067
train epoch: 1 - step: 1020 (total: 14960) - loss: 0.033646
train epoch: 1 - step: 1030 (total: 14960) - loss: 0.030856
train epoch: 1 - step: 1040 (total: 14960) - loss: 0.045213
train epoch: 1 - step: 1050 (total: 14960) - loss: 0.068307
dev step: 1050 - loss: 0.04333, precision: 0.79307, recall: 0.81788, f1: 0.80528 current best 0.80134
==============================================save best model best performerence 0.805281
train epoch: 1 - step: 1060 (total: 14960) - loss: 0.031629
train epoch: 1 - step: 1070 (total: 14960) - loss: 0.034574
train epoch: 1 - step: 1080 (total: 14960) - loss: 0.009664
train epoch: 1 - step: 1090 (total: 14960) - loss: 0.022344
train epoch: 1 - step: 1100 (total: 14960) - loss: 0.030906
dev step: 1100 - loss: 0.04319, precision: 0.77368, recall: 0.84413, f1: 0.80737 current best 0.80528
==============================================save best model best performerence 0.807374
train epoch: 1 - step: 1110 (total: 14960) - loss: 0.021814
train epoch: 1 - step: 1120 (total: 14960) - loss: 0.015393
train epoch: 1 - step: 1130 (total: 14960) - loss: 0.018273
train epoch: 1 - step: 1140 (total: 14960) - loss: 0.012760
train epoch: 1 - step: 1150 (total: 14960) - loss: 0.047260
dev step: 1150 - loss: 0.04239, precision: 0.79338, recall: 0.83017, f1: 0.81136 current best 0.80737
==============================================save best model best performerence 0.811357
train epoch: 1 - step: 1160 (total: 14960) - loss: 0.055832
train epoch: 1 - step: 1170 (total: 14960) - loss: 0.023067
train epoch: 1 - step: 1180 (total: 14960) - loss: 0.029046
train epoch: 1 - step: 1190 (total: 14960) - loss: 0.022165
train epoch: 1 - step: 1200 (total: 14960) - loss: 0.021577
dev step: 1200 - loss: 0.04173, precision: 0.79144, recall: 0.82682, f1: 0.80874 current best 0.81136
train epoch: 1 - step: 1210 (total: 14960) - loss: 0.040631
train epoch: 1 - step: 1220 (total: 14960) - loss: 0.028234
train epoch: 1 - step: 1230 (total: 14960) - loss: 0.033360
train epoch: 1 - step: 1240 (total: 14960) - loss: 0.023661
train epoch: 1 - step: 1250 (total: 14960) - loss: 0.051824
dev step: 1250 - loss: 0.04070, precision: 0.77673, recall: 0.83184, f1: 0.80335 current best 0.81136
train epoch: 1 - step: 1260 (total: 14960) - loss: 0.027152
train epoch: 1 - step: 1270 (total: 14960) - loss: 0.027165
train epoch: 1 - step: 1280 (total: 14960) - loss: 0.035664
train epoch: 1 - step: 1290 (total: 14960) - loss: 0.038181
train epoch: 1 - step: 1300 (total: 14960) - loss: 0.034335
dev step: 1300 - loss: 0.03963, precision: 0.77882, recall: 0.83799, f1: 0.80732 current best 0.81136
train epoch: 1 - step: 1310 (total: 14960) - loss: 0.045533
train epoch: 1 - step: 1320 (total: 14960) - loss: 0.076441
train epoch: 1 - step: 1330 (total: 14960) - loss: 0.035492
train epoch: 1 - step: 1340 (total: 14960) - loss: 0.020915
train epoch: 1 - step: 1350 (total: 14960) - loss: 0.009881
dev step: 1350 - loss: 0.04082, precision: 0.78723, recall: 0.84749, f1: 0.81625 current best 0.81136
==============================================save best model best performerence 0.816250
train epoch: 1 - step: 1360 (total: 14960) - loss: 0.037463
train epoch: 1 - step: 1370 (total: 14960) - loss: 0.044000
train epoch: 1 - step: 1380 (total: 14960) - loss: 0.033455
train epoch: 1 - step: 1390 (total: 14960) - loss: 0.011349
train epoch: 1 - step: 1400 (total: 14960) - loss: 0.027764
dev step: 1400 - loss: 0.04117, precision: 0.79249, recall: 0.82570, f1: 0.80876 current best 0.81625
train epoch: 1 - step: 1410 (total: 14960) - loss: 0.032213
train epoch: 1 - step: 1420 (total: 14960) - loss: 0.024112
train epoch: 1 - step: 1430 (total: 14960) - loss: 0.025826
train epoch: 1 - step: 1440 (total: 14960) - loss: 0.039797
train epoch: 1 - step: 1450 (total: 14960) - loss: 0.073417
dev step: 1450 - loss: 0.03987, precision: 0.78395, recall: 0.85140, f1: 0.81628 current best 0.81625
==============================================save best model best performerence 0.816283
train epoch: 1 - step: 1460 (total: 14960) - loss: 0.021326
train epoch: 1 - step: 1470 (total: 14960) - loss: 0.018628
train epoch: 1 - step: 1480 (total: 14960) - loss: 0.029017
train epoch: 1 - step: 1490 (total: 14960) - loss: 0.048521
……
train epoch: 19 - step: 14220 (total: 14960) - loss: 0.001144
train epoch: 19 - step: 14230 (total: 14960) - loss: 0.000301
train epoch: 19 - step: 14240 (total: 14960) - loss: 0.001033
train epoch: 19 - step: 14250 (total: 14960) - loss: 0.003649
dev step: 14250 - loss: 0.07217, precision: 0.83424, recall: 0.85754, f1: 0.84573 current best 0.85026
train epoch: 19 - step: 14260 (total: 14960) - loss: 0.000222
train epoch: 19 - step: 14270 (total: 14960) - loss: 0.001345
train epoch: 19 - step: 14280 (total: 14960) - loss: 0.000353
train epoch: 19 - step: 14290 (total: 14960) - loss: 0.004071
train epoch: 19 - step: 14300 (total: 14960) - loss: 0.004355
dev step: 14300 - loss: 0.07171, precision: 0.83568, recall: 0.86089, f1: 0.84810 current best 0.85026
train epoch: 19 - step: 14310 (total: 14960) - loss: 0.001791
train epoch: 19 - step: 14320 (total: 14960) - loss: 0.001619
train epoch: 19 - step: 14330 (total: 14960) - loss: 0.003730
train epoch: 19 - step: 14340 (total: 14960) - loss: 0.000157
train epoch: 19 - step: 14350 (total: 14960) - loss: 0.000462
dev step: 14350 - loss: 0.07241, precision: 0.83370, recall: 0.85698, f1: 0.84518 current best 0.85026
train epoch: 19 - step: 14360 (total: 14960) - loss: 0.000490
train epoch: 19 - step: 14370 (total: 14960) - loss: 0.000182
train epoch: 19 - step: 14380 (total: 14960) - loss: 0.002310
train epoch: 19 - step: 14390 (total: 14960) - loss: 0.000973
train epoch: 19 - step: 14400 (total: 14960) - loss: 0.000543
dev step: 14400 - loss: 0.07378, precision: 0.83623, recall: 0.86145, f1: 0.84865 current best 0.85026
train epoch: 19 - step: 14410 (total: 14960) - loss: 0.000710
train epoch: 19 - step: 14420 (total: 14960) - loss: 0.000122
train epoch: 19 - step: 14430 (total: 14960) - loss: 0.003291
train epoch: 19 - step: 14440 (total: 14960) - loss: 0.001306
train epoch: 19 - step: 14450 (total: 14960) - loss: 0.002820
dev step: 14450 - loss: 0.07792, precision: 0.83982, recall: 0.85531, f1: 0.84750 current best 0.85026
train epoch: 19 - step: 14460 (total: 14960) - loss: 0.000153
train epoch: 19 - step: 14470 (total: 14960) - loss: 0.009174
train epoch: 19 - step: 14480 (total: 14960) - loss: 0.002065
train epoch: 19 - step: 14490 (total: 14960) - loss: 0.001641
train epoch: 19 - step: 14500 (total: 14960) - loss: 0.013356
dev step: 14500 - loss: 0.07694, precision: 0.81122, recall: 0.86425, f1: 0.83689 current best 0.85026
train epoch: 19 - step: 14510 (total: 14960) - loss: 0.000902
train epoch: 19 - step: 14520 (total: 14960) - loss: 0.009084
train epoch: 19 - step: 14530 (total: 14960) - loss: 0.000777
train epoch: 19 - step: 14540 (total: 14960) - loss: 0.000141
train epoch: 19 - step: 14550 (total: 14960) - loss: 0.001748
dev step: 14550 - loss: 0.07448, precision: 0.81751, recall: 0.86592, f1: 0.84102 current best 0.85026
train epoch: 19 - step: 14560 (total: 14960) - loss: 0.000747
train epoch: 19 - step: 14570 (total: 14960) - loss: 0.012806
train epoch: 19 - step: 14580 (total: 14960) - loss: 0.004823
train epoch: 19 - step: 14590 (total: 14960) - loss: 0.001402
train epoch: 19 - step: 14600 (total: 14960) - loss: 0.012385
dev step: 14600 - loss: 0.07167, precision: 0.82297, recall: 0.87263, f1: 0.84707 current best 0.85026
train epoch: 19 - step: 14610 (total: 14960) - loss: 0.003738
train epoch: 19 - step: 14620 (total: 14960) - loss: 0.000189
train epoch: 19 - step: 14630 (total: 14960) - loss: 0.004993
train epoch: 19 - step: 14640 (total: 14960) - loss: 0.000982
train epoch: 19 - step: 14650 (total: 14960) - loss: 0.000245
dev step: 14650 - loss: 0.07632, precision: 0.83324, recall: 0.85978, f1: 0.84630 current best 0.85026
train epoch: 19 - step: 14660 (total: 14960) - loss: 0.002108
train epoch: 19 - step: 14670 (total: 14960) - loss: 0.002859
train epoch: 19 - step: 14680 (total: 14960) - loss: 0.000802
train epoch: 19 - step: 14690 (total: 14960) - loss: 0.001411
train epoch: 19 - step: 14700 (total: 14960) - loss: 0.000175
dev step: 14700 - loss: 0.07886, precision: 0.81485, recall: 0.87039, f1: 0.84171 current best 0.85026
train epoch: 19 - step: 14710 (total: 14960) - loss: 0.000079
train epoch: 19 - step: 14720 (total: 14960) - loss: 0.000239
train epoch: 19 - step: 14730 (total: 14960) - loss: 0.002459
train epoch: 19 - step: 14740 (total: 14960) - loss: 0.000840
train epoch: 19 - step: 14750 (total: 14960) - loss: 0.000168
dev step: 14750 - loss: 0.07765, precision: 0.82555, recall: 0.85922, f1: 0.84205 current best 0.85026
train epoch: 19 - step: 14760 (total: 14960) - loss: 0.000097
train epoch: 19 - step: 14770 (total: 14960) - loss: 0.000967
train epoch: 19 - step: 14780 (total: 14960) - loss: 0.000198
train epoch: 19 - step: 14790 (total: 14960) - loss: 0.000484
train epoch: 19 - step: 14800 (total: 14960) - loss: 0.002144
dev step: 14800 - loss: 0.07190, precision: 0.82507, recall: 0.86425, f1: 0.84420 current best 0.85026
train epoch: 19 - step: 14810 (total: 14960) - loss: 0.000452
train epoch: 19 - step: 14820 (total: 14960) - loss: 0.000663
train epoch: 19 - step: 14830 (total: 14960) - loss: 0.022780
train epoch: 19 - step: 14840 (total: 14960) - loss: 0.007530
train epoch: 19 - step: 14850 (total: 14960) - loss: 0.000360
dev step: 14850 - loss: 0.07089, precision: 0.83607, recall: 0.85475, f1: 0.84530 current best 0.85026
train epoch: 19 - step: 14860 (total: 14960) - loss: 0.002914
train epoch: 19 - step: 14870 (total: 14960) - loss: 0.000343
train epoch: 19 - step: 14880 (total: 14960) - loss: 0.001293
train epoch: 19 - step: 14890 (total: 14960) - loss: 0.000621
train epoch: 19 - step: 14900 (total: 14960) - loss: 0.001378
dev step: 14900 - loss: 0.06631, precision: 0.82437, recall: 0.87318, f1: 0.84807 current best 0.85026
train epoch: 19 - step: 14910 (total: 14960) - loss: 0.000467
train epoch: 19 - step: 14920 (total: 14960) - loss: 0.001079
train epoch: 19 - step: 14930 (total: 14960) - loss: 0.002540
train epoch: 19 - step: 14940 (total: 14960) - loss: 0.006217
train epoch: 19 - step: 14950 (total: 14960) - loss: 0.000213
dev step: 14950 - loss: 0.07010, precision: 0.83477, recall: 0.86369, f1: 0.84898 current best 0.85026
INFO 2021-04-10 21:18:14,794 launch.py:240] Local processes completed.
end DuEE1.0 trigger train
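
Two quick sanity checks can be run against the numbers in this log. The sketch below assumes that the total step count is the number of epochs times the number of batches per epoch (rounded up), and that the reported f1 is the harmonic mean of precision and recall. Both function names are illustrative and do not come from the training script.

```python
import math


def total_steps(num_examples: int, batch_size: int, num_epochs: int) -> int:
    """Total optimizer steps if every epoch runs ceil(num_examples / batch_size) batches."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * num_epochs


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Trigger model: 11959 training rows, batch size 16, 20 epochs.
print(total_steps(11959, 16, 20))   # 14960
# Role model: 13916 training rows, same batch size and epoch count.
print(total_steps(13916, 16, 20))   # 17400
# Dev metrics at step 14950: precision 0.83477, recall 0.86369.
print(f1_score(0.83477, 0.86369))
```

The two step counts match the "total: 14960" shown in the trigger log above and the "total: 17400" shown in the role training log further down. The computed F1 agrees with the logged 0.84898 to within rounding.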
# Predict with the trigger recognition model
!bash run_duee_1.sh trigger_predict
check and create directory
dir ./ckpt exist
dir ./ckpt/DuEE1.0 exist
dir ./submit exist

start DuEE1.0 trigger predict
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
[2021-04-10 21:19:00,925] [    INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
[2021-04-10 21:19:00,939] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
W0410 21:19:00.940081 12545 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0410 21:19:00.944607 12545 device_context.cc:372] device: 0, cuDNN Version: 7.6.
============start predict==========
Loaded parameters from ./ckpt/DuEE1.0/trigger/best.pdparams
save data 499 to ./ckpt/DuEE1.0/trigger/test_pred.json
end DuEE1.0 trigger predict
# Train the argument recognition model
!bash run_duee_1.sh role_train

check and create directory
dir ./ckpt exist
dir ./ckpt/DuEE1.0 exist
dir ./submit exist

start DuEE1.0 role train
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
-----------  Configuration Arguments -----------
gpus: 0
heter_worker_num: None
heter_workers: 
http_port: None
ips: 127.0.0.1
log_dir: log
nproc_per_node: None
server_num: None
servers: 
training_script: sequence_labeling.py
training_script_args: ['--num_epoch', '20', '--learning_rate', '5e-5', '--tag_path', './conf/DuEE1.0/role_tag.dict', '--train_data', './data/DuEE1.0/role/train.tsv', '--dev_data', './data/DuEE1.0/role/dev.tsv', '--test_data', './data/DuEE1.0/role/test.tsv', '--predict_data', './data/DuEE1.0/test.json', '--do_train', 'True', '--do_predict', 'False', '--max_seq_len', '300', '--batch_size', '16', '--skip_step', '10', '--valid_step', '50', '--checkpoints', './ckpt/DuEE1.0/role', '--init_ckpt', './ckpt/DuEE1.0/role/best.pdparams', '--predict_save_path', './ckpt/DuEE1.0/role/test_pred.json', '--device', 'gpu']
worker_num: None
workers: 
------------------------------------------------
WARNING 2021-04-10 21:19:31,729 launch.py:316] Not found distinct arguments and compiled with cuda. Default use collective mode
launch train in GPU mode
INFO 2021-04-10 21:19:31,731 launch_utils.py:471] Local start 1 processes. First process distributed environment info (Only For Debug): 
    +=======================================================================================+
    |                        Distributed Envs                      Value                    |
    +---------------------------------------------------------------------------------------+
    |                       PADDLE_TRAINER_ID                        0                      |
    |                 PADDLE_CURRENT_ENDPOINT                 127.0.0.1:39979               |
    |                     PADDLE_TRAINERS_NUM                        1                      |
    |                PADDLE_TRAINER_ENDPOINTS                 127.0.0.1:39979               |
    |                     FLAGS_selected_gpus                        0                      |
    +=======================================================================================+

INFO 2021-04-10 21:19:31,731 launch_utils.py:475] details abouts PADDLE_TRAINER_ENDPOINTS can be found in log/endpoints.log, and detail running logs maybe found in log/workerlog.0
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
[2021-04-10 21:19:33,027] [    INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
[2021-04-10 21:19:33,041] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
W0410 21:19:33.042527 12581 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0410 21:19:33.047051 12581 device_context.cc:372] device: 0, cuDNN Version: 7.6.
============start train==========
train epoch: 0 - step: 10 (total: 17400) - loss: 1.631316
train epoch: 0 - step: 20 (total: 17400) - loss: 1.261623
train epoch: 0 - step: 30 (total: 17400) - loss: 1.499143
train epoch: 0 - step: 40 (total: 17400) - loss: 1.374749
train epoch: 0 - step: 50 (total: 17400) - loss: 2.372678
dev step: 50 - loss: 1.41820, precision: 0.00000, recall: 0.00000, f1: 0.00000 current best 0.00000
train epoch: 0 - step: 60 (total: 17400) - loss: 1.314709
train epoch: 0 - step: 70 (total: 17400) - loss: 1.248975
train epoch: 0 - step: 80 (total: 17400) - loss: 1.268549
train epoch: 0 - step: 90 (total: 17400) - loss: 1.528821
train epoch: 0 - step: 100 (total: 17400) - loss: 1.331270
dev step: 100 - loss: 1.19630, precision: 0.00465, recall: 0.00027, f1: 0.00051 current best 0.00000
==============================================save best model best performerence 0.000513
train epoch: 0 - step: 110 (total: 17400) - loss: 1.180573
train epoch: 0 - step: 120 (total: 17400) - loss: 1.466052
train epoch: 0 - step: 130 (total: 17400) - loss: 1.448678
train epoch: 0 - step: 140 (total: 17400) - loss: 1.050897
train epoch: 0 - step: 150 (total: 17400) - loss: 1.163228
dev step: 150 - loss: 1.00658, precision: 0.15019, recall: 0.07632, f1: 0.10121 current best 0.00051
==============================================save best model best performerence 0.101207
train epoch: 0 - step: 160 (total: 17400) - loss: 1.152660
train epoch: 0 - step: 170 (total: 17400) - loss: 0.800648
train epoch: 0 - step: 180 (total: 17400) - loss: 0.863754
train epoch: 0 - step: 190 (total: 17400) - loss: 1.399399
train epoch: 0 - step: 200 (total: 17400) - loss: 0.933540
dev step: 200 - loss: 0.87702, precision: 0.29212, recall: 0.16812, f1: 0.21341 current best 0.10121
==============================================save best model best performerence 0.213411
train epoch: 0 - step: 210 (total: 17400) - loss: 0.622572
train epoch: 0 - step: 220 (total: 17400) - loss: 0.693645
train epoch: 0 - step: 230 (total: 17400) - loss: 0.456951
train epoch: 0 - step: 240 (total: 17400) - loss: 0.852158
train epoch: 0 - step: 250 (total: 17400) - loss: 0.692744
dev step: 250 - loss: 0.74932, precision: 0.19567, recall: 0.19880, f1: 0.19722 current best 0.21341
train epoch: 0 - step: 260 (total: 17400) - loss: 0.705850
train epoch: 0 - step: 270 (total: 17400) - loss: 0.601921
train epoch: 0 - step: 280 (total: 17400) - loss: 0.790073
train epoch: 0 - step: 290 (total: 17400) - loss: 0.576146
train epoch: 0 - step: 300 (total: 17400) - loss: 0.896055
dev step: 300 - loss: 0.68762, precision: 0.26187, recall: 0.26806, f1: 0.26493 current best 0.21341
==============================================save best model best performerence 0.264931
train epoch: 0 - step: 310 (total: 17400) - loss: 0.550488
train epoch: 0 - step: 320 (total: 17400) - loss: 0.755333
train epoch: 0 - step: 330 (total: 17400) - loss: 0.608667
train epoch: 0 - step: 340 (total: 17400) - loss: 0.735348
train epoch: 0 - step: 350 (total: 17400) - loss: 0.608221
dev step: 350 - loss: 0.58279, precision: 0.30612, recall: 0.30554, f1: 0.30583 current best 0.26493
==============================================save best model best performerence 0.305831
train epoch: 0 - step: 360 (total: 17400) - loss: 0.547571
train epoch: 0 - step: 370 (total: 17400) - loss: 0.651604
train epoch: 0 - step: 380 (total: 17400) - loss: 0.356159
train epoch: 0 - step: 390 (total: 17400) - loss: 0.471009
train epoch: 0 - step: 400 (total: 17400) - loss: 0.464584
dev step: 400 - loss: 0.54754, precision: 0.28086, recall: 0.28055, f1: 0.28071 current best 0.30583
train epoch: 0 - step: 410 (total: 17400) - loss: 0.626027
train epoch: 0 - step: 420 (total: 17400) - loss: 0.362687
train epoch: 0 - step: 430 (total: 17400) - loss: 0.477045
train epoch: 0 - step: 440 (total: 17400) - loss: 0.504392
train epoch: 0 - step: 450 (total: 17400) - loss: 0.452660
dev step: 450 - loss: 0.51604, precision: 0.31064, recall: 0.29495, f1: 0.30259 current best 0.30583
train epoch: 0 - step: 460 (total: 17400) - loss: 0.315736
train epoch: 0 - step: 470 (total: 17400) - loss: 0.695824
train epoch: 0 - step: 480 (total: 17400) - loss: 0.668844
train epoch: 0 - step: 490 (total: 17400) - loss: 0.485630
train epoch: 0 - step: 500 (total: 17400) - loss: 0.553830
dev step: 500 - loss: 0.48320, precision: 0.35442, recall: 0.33623, f1: 0.34509 current best 0.30583
==============================================save best model best performerence 0.345087
train epoch: 0 - step: 510 (total: 17400) - loss: 0.630377
train epoch: 0 - step: 520 (total: 17400) - loss: 0.870098
train epoch: 0 - step: 530 (total: 17400) - loss: 0.525724
train epoch: 0 - step: 540 (total: 17400) - loss: 0.456801
train epoch: 0 - step: 550 (total: 17400) - loss: 0.336868
dev step: 550 - loss: 0.46332, precision: 0.34337, recall: 0.40576, f1: 0.37197 current best 0.34509
==============================================save best model best performerence 0.371966
train epoch: 0 - step: 560 (total: 17400) - loss: 0.549533
train epoch: 0 - step: 570 (total: 17400) - loss: 0.589998
train epoch: 0 - step: 580 (total: 17400) - loss: 0.466343
train epoch: 0 - step: 590 (total: 17400) - loss: 0.757425
train epoch: 0 - step: 600 (total: 17400) - loss: 0.476054
dev step: 600 - loss: 0.46148, precision: 0.32498, recall: 0.40494, f1: 0.36058 current best 0.37197
train epoch: 0 - step: 610 (total: 17400) - loss: 0.867860
train epoch: 0 - step: 620 (total: 17400) - loss: 0.423540
train epoch: 0 - step: 630 (total: 17400) - loss: 0.584098
train epoch: 0 - step: 640 (total: 17400) - loss: 0.333824
train epoch: 0 - step: 650 (total: 17400) - loss: 0.506903
dev step: 650 - loss: 0.41693, precision: 0.35424, recall: 0.40657, f1: 0.37860 current best 0.37197
==============================================save best model best performerence 0.378604
train epoch: 0 - step: 660 (total: 17400) - loss: 0.349384
train epoch: 0 - step: 670 (total: 17400) - loss: 0.551703
train epoch: 0 - step: 680 (total: 17400) - loss: 0.407071
train epoch: 0 - step: 690 (total: 17400) - loss: 0.340015
train epoch: 0 - step: 700 (total: 17400) - loss: 0.514608
dev step: 700 - loss: 0.39935, precision: 0.36408, recall: 0.44704, f1: 0.40132 current best 0.37860
==============================================save best model best performerence 0.401317
train epoch: 0 - step: 710 (total: 17400) - loss: 0.391622
train epoch: 0 - step: 720 (total: 17400) - loss: 0.411886
train epoch: 0 - step: 730 (total: 17400) - loss: 0.396601
train epoch: 0 - step: 740 (total: 17400) - loss: 0.408536
train epoch: 0 - step: 750 (total: 17400) - loss: 0.490862
dev step: 750 - loss: 0.38335, precision: 0.38105, recall: 0.40196, f1: 0.39122 current best 0.40132
train epoch: 0 - step: 760 (total: 17400) - loss: 0.589839
train epoch: 0 - step: 770 (total: 17400) - loss: 0.495729
train epoch: 0 - step: 780 (total: 17400) - loss: 0.292985
train epoch: 0 - step: 790 (total: 17400) - loss: 0.288670
train epoch: 0 - step: 800 (total: 17400) - loss: 0.591148
dev step: 800 - loss: 0.37288, precision: 0.37370, recall: 0.43998, f1: 0.40414 current best 0.40132
==============================================save best model best performerence 0.404141
train epoch: 0 - step: 810 (total: 17400) - loss: 0.323106
train epoch: 0 - step: 820 (total: 17400) - loss: 0.374065
train epoch: 0 - step: 830 (total: 17400) - loss: 0.303335
train epoch: 0 - step: 840 (total: 17400) - loss: 0.362465
train epoch: 0 - step: 850 (total: 17400) - loss: 0.270363
dev step: 850 - loss: 0.34926, precision: 0.40548, recall: 0.39788, f1: 0.40164 current best 0.40414
train epoch: 0 - step: 860 (total: 17400) - loss: 0.640344
train epoch: 1 - step: 870 (total: 17400) - loss: 0.317456
train epoch: 1 - step: 880 (total: 17400) - loss: 0.338208
train epoch: 1 - step: 890 (total: 17400) - loss: 0.270992
train epoch: 1 - step: 900 (total: 17400) - loss: 0.262994
dev step: 900 - loss: 0.35676, precision: 0.36447, recall: 0.50978, f1: 0.42505 current best 0.40414
==============================================save best model best performerence 0.425045
train epoch: 1 - step: 910 (total: 17400) - loss: 0.187394
train epoch: 1 - step: 920 (total: 17400) - loss: 0.319919
train epoch: 1 - step: 930 (total: 17400) - loss: 0.364867
train epoch: 1 - step: 940 (total: 17400) - loss: 0.167465
train epoch: 1 - step: 950 (total: 17400) - loss: 0.378459
dev step: 950 - loss: 0.33845, precision: 0.40183, recall: 0.50027, f1: 0.44568 current best 0.42505
==============================================save best model best performerence 0.445681
train epoch: 1 - step: 960 (total: 17400) - loss: 0.505818
train epoch: 1 - step: 970 (total: 17400) - loss: 0.318232
train epoch: 1 - step: 980 (total: 17400) - loss: 0.354184
train epoch: 1 - step: 990 (total: 17400) - loss: 0.473859
train epoch: 1 - step: 1000 (total: 17400) - loss: 0.268665
dev step: 1000 - loss: 0.34670, precision: 0.41990, recall: 0.50543, f1: 0.45871 current best 0.44568
==============================================save best model best performerence 0.458713
train epoch: 1 - step: 1010 (total: 17400) - loss: 0.457268
train epoch: 1 - step: 1020 (total: 17400) - loss: 0.279792
train epoch: 1 - step: 1030 (total: 17400) - loss: 0.311157
train epoch: 1 - step: 1040 (total: 17400) - loss: 0.266172
train epoch: 1 - step: 1050 (total: 17400) - loss: 0.348649
dev step: 1050 - loss: 0.33027, precision: 0.43967, recall: 0.50570, f1: 0.47038 current best 0.45871
==============================================save best model best performerence 0.470380
train epoch: 1 - step: 1060 (total: 17400) - loss: 0.250878
train epoch: 1 - step: 1070 (total: 17400) - loss: 0.255359
train epoch: 1 - step: 1080 (total: 17400) - loss: 0.244313
train epoch: 1 - step: 1090 (total: 17400) - loss: 0.394027
train epoch: 1 - step: 1100 (total: 17400) - loss: 0.345162
dev step: 1100 - loss: 0.31890, precision: 0.40973, recall: 0.53992, f1: 0.46590 current best 0.47038
train epoch: 1 - step: 1110 (total: 17400) - loss: 0.351362
train epoch: 1 - step: 1120 (total: 17400) - loss: 0.505625
train epoch: 1 - step: 1130 (total: 17400) - loss: 0.254914
train epoch: 1 - step: 1140 (total: 17400) - loss: 0.299322
train epoch: 1 - step: 1150 (total: 17400) - loss: 0.230382
dev step: 1150 - loss: 0.33202, precision: 0.39473, recall: 0.57387, f1: 0.46774 current best 0.47038
train epoch: 1 - step: 1160 (total: 17400) - loss: 0.531530
train epoch: 1 - step: 1170 (total: 17400) - loss: 0.327992
train epoch: 1 - step: 1180 (total: 17400) - loss: 0.261732
train epoch: 1 - step: 1190 (total: 17400) - loss: 0.416111
train epoch: 1 - step: 1200 (total: 17400) - loss: 0.587504
dev step: 1200 - loss: 0.31763, precision: 0.42678, recall: 0.57224, f1: 0.48892 current best 0.47038
==============================================save best model best performerence 0.488920
train epoch: 1 - step: 1210 (total: 17400) - loss: 0.318957
train epoch: 1 - step: 1220 (total: 17400) - loss: 0.240229
train epoch: 1 - step: 1230 (total: 17400) - loss: 0.268677
train epoch: 1 - step: 1240 (total: 17400) - loss: 0.306026
train epoch: 1 - step: 1250 (total: 17400) - loss: 0.207791
dev step: 1250 - loss: 0.32002, precision: 0.44161, recall: 0.53205, f1: 0.48263 current best 0.48892
train epoch: 1 - step: 1260 (total: 17400) - loss: 0.328496
train epoch: 1 - step: 1270 (total: 17400) - loss: 0.169225
train epoch: 1 - step: 1280 (total: 17400) - loss: 0.154055
train epoch: 1 - step: 1290 (total: 17400) - loss: 0.245896
train epoch: 1 - step: 1300 (total: 17400) - loss: 0.307641
dev step: 1300 - loss: 0.31898, precision: 0.43654, recall: 0.56328, f1: 0.49188 current best 0.48892
==============================================save best model best performerence 0.491877
train epoch: 1 - step: 1310 (total: 17400) - loss: 0.333137
train epoch: 1 - step: 1320 (total: 17400) - loss: 0.245721
train epoch: 1 - step: 1330 (total: 17400) - loss: 0.284762
train epoch: 1 - step: 1340 (total: 17400) - loss: 0.454689
train epoch: 1 - step: 1350 (total: 17400) - loss: 0.181988
dev step: 1350 - loss: 0.31523, precision: 0.43998, recall: 0.58039, f1: 0.50053 current best 0.49188
==============================================save best model best performerence 0.500527
train epoch: 1 - step: 1360 (total: 17400) - loss: 0.207600
train epoch: 1 - step: 1370 (total: 17400) - loss: 0.521199
train epoch: 1 - step: 1380 (total: 17400) - loss: 0.212064
train epoch: 1 - step: 1390 (total: 17400) - loss: 0.304855
train epoch: 1 - step: 1400 (total: 17400) - loss: 0.364982
dev step: 1400 - loss: 0.32255, precision: 0.45131, recall: 0.57523, f1: 0.50579 current best 0.50053
==============================================save best model best performerence 0.505791
train epoch: 1 - step: 1410 (total: 17400) - loss: 0.282940
train epoch: 1 - step: 1420 (total: 17400) - loss: 0.247372
train epoch: 1 - step: 1430 (total: 17400) - loss: 0.204306
train epoch: 1 - step: 1440 (total: 17400) - loss: 0.197937
train epoch: 1 - step: 1450 (total: 17400) - loss: 0.248342
dev step: 1450 - loss: 0.31655, precision: 0.43383, recall: 0.57605, f1: 0.49492 current best 0.50579
train epoch: 1 - step: 1460 (total: 17400) - loss: 0.303543
train epoch: 1 - step: 1470 (total: 17400) - loss: 0.228280
train epoch: 1 - step: 1480 (total: 17400) - loss: 0.272400
train epoch: 1 - step: 1490 (total: 17400) - loss: 0.295671
train epoch: 1 - step: 1500 (total: 17400) - loss: 0.238553
dev step: 1500 - loss: 0.29889, precision: 0.45878, recall: 0.50027, f1: 0.47863 current best 0.50579
train epoch: 1 - step: 1510 (total: 17400) - loss: 0.340570
train epoch: 1 - step: 1520 (total: 17400) - loss: 0.178270
train epoch: 1 - step: 1530 (total: 17400) - loss: 0.304790
train epoch: 1 - step: 1540 (total: 17400) - loss: 0.289224
train epoch: 1 - step: 1550 (total: 17400) - loss: 0.371867
dev step: 1550 - loss: 0.30130, precision: 0.45212, recall: 0.61162, f1: 0.51991 current best 0.50579
==============================================save best model best performerence 0.519912
train epoch: 1 - step: 1560 (total: 17400) - loss: 0.240305
train epoch: 1 - step: 1570 (total: 17400) - loss: 0.316205
train epoch: 1 - step: 1580 (total: 17400) - loss: 0.311467
train epoch: 1 - step: 1590 (total: 17400) - loss: 0.270995
train epoch: 1 - step: 1600 (total: 17400) - loss: 0.184202
dev step: 1600 - loss: 0.29522, precision: 0.43972, recall: 0.59234, f1: 0.50474 current best 0.51991
train epoch: 1 - step: 1610 (total: 17400) - loss: 0.431742
train epoch: 1 - step: 1620 (total: 17400) - loss: 0.234169
train epoch: 1 - step: 1630 (total: 17400) - loss: 0.247429
train epoch: 1 - step: 1640 (total: 17400) - loss: 0.355582
train epoch: 1 - step: 1650 (total: 17400) - loss: 0.281345
dev step: 1650 - loss: 0.29843, precision: 0.46141, recall: 0.58446, f1: 0.51570 current best 0.51991
train epoch: 1 - step: 1660 (total: 17400) - loss: 0.201275
train epoch: 1 - step: 1670 (total: 17400) - loss: 0.304434
train epoch: 1 - step: 1680 (total: 17400) - loss: 0.330689
train epoch: 1 - step: 1690 (total: 17400) - loss: 0.277704
train epoch: 1 - step: 1700 (total: 17400) - loss: 0.196703
dev step: 1700 - loss: 0.28736, precision: 0.46048, recall: 0.59017, f1: 0.51732 current best 0.51991
train epoch: 1 - step: 1710 (total: 17400) - loss: 0.253590
train epoch: 1 - step: 1720 (total: 17400) - loss: 0.238998
train epoch: 1 - step: 1730 (total: 17400) - loss: 0.267489
……
train epoch: 19 - step: 16530 (total: 17400) - loss: 0.090804
train epoch: 19 - step: 16540 (total: 17400) - loss: 0.172505
train epoch: 19 - step: 16550 (total: 17400) - loss: 0.041797
dev step: 16550 - loss: 0.42366, precision: 0.53121, recall: 0.61706, f1: 0.57093 current best 0.58724
train epoch: 19 - step: 16560 (total: 17400) - loss: 0.083284
train epoch: 19 - step: 16570 (total: 17400) - loss: 0.027010
train epoch: 19 - step: 16580 (total: 17400) - loss: 0.075735
train epoch: 19 - step: 16590 (total: 17400) - loss: 0.055073
train epoch: 19 - step: 16600 (total: 17400) - loss: 0.089312
dev step: 16600 - loss: 0.40673, precision: 0.53275, recall: 0.62955, f1: 0.57712 current best 0.58724
train epoch: 19 - step: 16610 (total: 17400) - loss: 0.140136
train epoch: 19 - step: 16620 (total: 17400) - loss: 0.056313
train epoch: 19 - step: 16630 (total: 17400) - loss: 0.080976
train epoch: 19 - step: 16640 (total: 17400) - loss: 0.049731
train epoch: 19 - step: 16650 (total: 17400) - loss: 0.029350
dev step: 16650 - loss: 0.41901, precision: 0.53045, recall: 0.63878, f1: 0.57960 current best 0.58724
train epoch: 19 - step: 16660 (total: 17400) - loss: 0.039192
train epoch: 19 - step: 16670 (total: 17400) - loss: 0.114814
train epoch: 19 - step: 16680 (total: 17400) - loss: 0.128558
train epoch: 19 - step: 16690 (total: 17400) - loss: 0.090364
train epoch: 19 - step: 16700 (total: 17400) - loss: 0.015403
dev step: 16700 - loss: 0.40519, precision: 0.52265, recall: 0.61108, f1: 0.56342 current best 0.58724
train epoch: 19 - step: 16710 (total: 17400) - loss: 0.110993
train epoch: 19 - step: 16720 (total: 17400) - loss: 0.070296
train epoch: 19 - step: 16730 (total: 17400) - loss: 0.062231
train epoch: 19 - step: 16740 (total: 17400) - loss: 0.067118
train epoch: 19 - step: 16750 (total: 17400) - loss: 0.041820
dev step: 16750 - loss: 0.40756, precision: 0.51713, recall: 0.62710, f1: 0.56683 current best 0.58724
train epoch: 19 - step: 16760 (total: 17400) - loss: 0.061612
train epoch: 19 - step: 16770 (total: 17400) - loss: 0.121729
train epoch: 19 - step: 16780 (total: 17400) - loss: 0.143003
train epoch: 19 - step: 16790 (total: 17400) - loss: 0.092972
train epoch: 19 - step: 16800 (total: 17400) - loss: 0.085720
dev step: 16800 - loss: 0.39751, precision: 0.52164, recall: 0.61543, f1: 0.56466 current best 0.58724
train epoch: 19 - step: 16810 (total: 17400) - loss: 0.121482
train epoch: 19 - step: 16820 (total: 17400) - loss: 0.056438
train epoch: 19 - step: 16830 (total: 17400) - loss: 0.142359
train epoch: 19 - step: 16840 (total: 17400) - loss: 0.037087
train epoch: 19 - step: 16850 (total: 17400) - loss: 0.090542
dev step: 16850 - loss: 0.43593, precision: 0.54292, recall: 0.62520, f1: 0.58117 current best 0.58724
train epoch: 19 - step: 16860 (total: 17400) - loss: 0.180082
train epoch: 19 - step: 16870 (total: 17400) - loss: 0.053868
train epoch: 19 - step: 16880 (total: 17400) - loss: 0.099053
train epoch: 19 - step: 16890 (total: 17400) - loss: 0.041414
train epoch: 19 - step: 16900 (total: 17400) - loss: 0.059607
dev step: 16900 - loss: 0.40950, precision: 0.53281, recall: 0.64177, f1: 0.58223 current best 0.58724
train epoch: 19 - step: 16910 (total: 17400) - loss: 0.081703
train epoch: 19 - step: 16920 (total: 17400) - loss: 0.058062
train epoch: 19 - step: 16930 (total: 17400) - loss: 0.029519
train epoch: 19 - step: 16940 (total: 17400) - loss: 0.045415
train epoch: 19 - step: 16950 (total: 17400) - loss: 0.078151
dev step: 16950 - loss: 0.39955, precision: 0.52993, recall: 0.62520, f1: 0.57364 current best 0.58724
train epoch: 19 - step: 16960 (total: 17400) - loss: 0.112182
train epoch: 19 - step: 16970 (total: 17400) - loss: 0.072816
train epoch: 19 - step: 16980 (total: 17400) - loss: 0.171157
train epoch: 19 - step: 16990 (total: 17400) - loss: 0.017713
train epoch: 19 - step: 17000 (total: 17400) - loss: 0.090382
dev step: 17000 - loss: 0.41824, precision: 0.54227, recall: 0.61841, f1: 0.57785 current best 0.58724
train epoch: 19 - step: 17010 (total: 17400) - loss: 0.126030
train epoch: 19 - step: 17020 (total: 17400) - loss: 0.072342
train epoch: 19 - step: 17030 (total: 17400) - loss: 0.060565
train epoch: 19 - step: 17040 (total: 17400) - loss: 0.073558
train epoch: 19 - step: 17050 (total: 17400) - loss: 0.033999
dev step: 17050 - loss: 0.42881, precision: 0.52828, recall: 0.61896, f1: 0.57004 current best 0.58724
train epoch: 19 - step: 17060 (total: 17400) - loss: 0.036299
train epoch: 19 - step: 17070 (total: 17400) - loss: 0.052640
train epoch: 19 - step: 17080 (total: 17400) - loss: 0.054092
train epoch: 19 - step: 17090 (total: 17400) - loss: 0.042668
train epoch: 19 - step: 17100 (total: 17400) - loss: 0.058963
dev step: 17100 - loss: 0.42499, precision: 0.52823, recall: 0.62765, f1: 0.57366 current best 0.58724
train epoch: 19 - step: 17110 (total: 17400) - loss: 0.030797
train epoch: 19 - step: 17120 (total: 17400) - loss: 0.096806
train epoch: 19 - step: 17130 (total: 17400) - loss: 0.078804
train epoch: 19 - step: 17140 (total: 17400) - loss: 0.047607
train epoch: 19 - step: 17150 (total: 17400) - loss: 0.056086
dev step: 17150 - loss: 0.39892, precision: 0.53097, recall: 0.58908, f1: 0.55852 current best 0.58724
train epoch: 19 - step: 17160 (total: 17400) - loss: 0.148140
train epoch: 19 - step: 17170 (total: 17400) - loss: 0.096577
train epoch: 19 - step: 17180 (total: 17400) - loss: 0.146454
train epoch: 19 - step: 17190 (total: 17400) - loss: 0.045576
train epoch: 19 - step: 17200 (total: 17400) - loss: 0.084547
dev step: 17200 - loss: 0.39334, precision: 0.51481, recall: 0.59017, f1: 0.54992 current best 0.58724
train epoch: 19 - step: 17210 (total: 17400) - loss: 0.081501
train epoch: 19 - step: 17220 (total: 17400) - loss: 0.079089
train epoch: 19 - step: 17230 (total: 17400) - loss: 0.063774
train epoch: 19 - step: 17240 (total: 17400) - loss: 0.017078
train epoch: 19 - step: 17250 (total: 17400) - loss: 0.086831
dev step: 17250 - loss: 0.38374, precision: 0.52425, recall: 0.62819, f1: 0.57153 current best 0.58724
train epoch: 19 - step: 17260 (total: 17400) - loss: 0.076878
train epoch: 19 - step: 17270 (total: 17400) - loss: 0.036476
train epoch: 19 - step: 17280 (total: 17400) - loss: 0.146443
train epoch: 19 - step: 17290 (total: 17400) - loss: 0.182334
train epoch: 19 - step: 17300 (total: 17400) - loss: 0.040053
dev step: 17300 - loss: 0.40251, precision: 0.52484, recall: 0.63987, f1: 0.57667 current best 0.58724
train epoch: 19 - step: 17310 (total: 17400) - loss: 0.107188
train epoch: 19 - step: 17320 (total: 17400) - loss: 0.143759
train epoch: 19 - step: 17330 (total: 17400) - loss: 0.113866
train epoch: 19 - step: 17340 (total: 17400) - loss: 0.115857
train epoch: 19 - step: 17350 (total: 17400) - loss: 0.035648
dev step: 17350 - loss: 0.41305, precision: 0.52708, recall: 0.61325, f1: 0.56691 current best 0.58724
train epoch: 19 - step: 17360 (total: 17400) - loss: 0.047787
train epoch: 19 - step: 17370 (total: 17400) - loss: 0.057836
train epoch: 19 - step: 17380 (total: 17400) - loss: 0.094507
train epoch: 19 - step: 17390 (total: 17400) - loss: 0.066693
INFO 2021-04-10 22:43:36,736 launch.py:240] Local processes completed.
end DuEE1.0 role train
# Argument (role) identification: prediction
!bash run_duee_1.sh role_predict
check and create directory
dir ./ckpt exist
dir ./ckpt/DuEE1.0 exist
dir ./submit exist

start DuEE1.0 role predict
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
[2021-04-10 22:44:10,178] [    INFO] - Found /home/aistudio/.paddlenlp/models/ernie-1.0/vocab.txt
[2021-04-10 22:44:10,192] [    INFO] - Already cached /home/aistudio/.paddlenlp/models/ernie-1.0/ernie_v1_chn_base.pdparams
W0410 22:44:10.193476 16283 device_context.cc:362] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0410 22:44:10.198055 16283 device_context.cc:372] device: 0, cuDNN Version: 7.6.
============start predict==========
Loaded parameters from ./ckpt/DuEE1.0/role/best.pdparams
save data 499 to ./ckpt/DuEE1.0/role/test_pred.json
end DuEE1.0 role predict
# Post-process the predictions and build the submission file
# The result is saved to submit/test_duee_1.json
!bash run_duee_1.sh pred_2_submit
check and create directory
dir ./ckpt exist
dir ./ckpt/DuEE1.0 exist
dir ./submit exist

start DuEE1.0 predict data merge to submit fotmat
trigger predict 499 load from ./ckpt/DuEE1.0/trigger/test_pred.json
role predict 499 load from ./ckpt/DuEE1.0/role/test_pred.json
schema 65 load from ./conf/DuEE1.0/event_schema.json
submit data 499 save to ./submit/test_duee_1.json
end DuEE1.0 role predict data merge

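2.1 Evaluation method

  Predicted event arguments are matched against the manually annotated event arguments and scored by character-level F1, ignoring case. If an argument has several mentions, the highest of the matching F1 values is used.

  f1_score = (2 * P * R) / (P + R), where

  • P = sum of predicted-argument scores / number of predicted arguments
  • R = sum of predicted-argument scores / number of manually annotated arguments
  • Predicted-argument score = (event type correct?) * (argument role correct?) * character-level F1 (* denotes multiplication)
  • Character-level F1 = 2 * character-level P * character-level R / (character-level P + character-level R)
  • Character-level P = number of characters shared by the predicted and annotated argument / number of characters in the predicted argument
  • Character-level R = number of characters shared by the predicted and annotated argument / number of characters in the annotated argument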
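
  The scoring rules above can be sketched in Python as follows. This is a minimal illustration and not the official scorer: the function names `char_f1` and `evaluate`, the tuple layout of the inputs, and the choice to count shared characters as a multiset intersection are all assumptions made for this example.

```python
from collections import Counter


def char_f1(pred, gold):
    """Character-level F1 between one predicted and one annotated argument."""
    pred, gold = pred.lower(), gold.lower()  # scoring ignores case
    # Assumption: "shared characters" are counted as a multiset intersection.
    common = sum((Counter(pred) & Counter(gold)).values())
    if common == 0:
        return 0.0
    p = common / len(pred)
    r = common / len(gold)
    return 2 * p * r / (p + r)


def evaluate(preds, golds):
    """Return (P, R, F1) for one sentence.

    preds: list of (event_type, role, text) tuples (hypothetical layout).
    golds: list of (event_type, role, [mention, ...]) tuples (hypothetical layout).
    """
    total = 0.0
    for event_type, role, text in preds:
        # Event type and role must both match, otherwise the score is 0.
        mentions = [m for g_type, g_role, g_mentions in golds
                    if g_type == event_type and g_role == role
                    for m in g_mentions]
        # With several mentions, keep the highest character-level F1.
        total += max((char_f1(text, m) for m in mentions), default=0.0)
    p = total / len(preds) if preds else 0.0
    r = total / len(golds) if golds else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1
```

  For instance, predicting "湖人队" for an annotated 胜者 "湖人" shares 2 characters, so character-level P = 2/3, R = 1, and the argument scores 0.8 rather than 0 or 1.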

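3. Tricks

3.1 Try more pretrained models

  The baseline uses ERNIE as its pretrained model. PaddleNLP also provides many other pretrained models, such as BERT, RoBERTa, ELECTRA, and XLNet.

  See the PaddleNLP introduction to pretrained models.

  For example, you can switch to the Chinese RoBERTa large model to improve results. Only the model and the tokenizer need to be replaced; the rest of the pipeline stays the same.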

from paddlenlp.transformers import RobertaForTokenClassification, RobertaTokenizer

model = RobertaForTokenClassification.from_pretrained("roberta-wwm-ext-large", num_classes=len(label_map))
tokenizer = RobertaTokenizer.from_pretrained("roberta-wwm-ext-large")
[2021-04-10 22:48:18,899] [    INFO] - Downloading https://paddlenlp.bj.bcebos.com/models/transformers/roberta_large/roberta_chn_large.pdparams and saved to /home/aistudio/.paddlenlp/models/roberta-wwm-ext-large
[2021-04-10 22:48:18,902] [    INFO] - Downloading roberta_chn_large.pdparams from https://paddlenlp.bj.bcebos.com/models/transformers/roberta_large/roberta_chn_large.pdparams
100%|██████████| 1271615/1271615 [00:18<00:00, 69327.15it/s]
[2021-04-10 22:48:42,145] [    INFO] - Downloading vocab.txt from https://paddlenlp.bj.bcebos.com/models/transformers/roberta_large/vocab.txt
100%|██████████| 107/107 [00:00<00:00, 2073.95it/s]

3.2 Modify the network structure

  For sequence labeling, GRU+CRF is a common choice. How can these layers be added on top of a pretrained model?

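import paddle.nn as nn
from paddlenlp.transformers import ErnieModel
from paddlenlp.layers import LinearChainCrf, LinearChainCrfLoss, ViterbiDecoder


class Model(nn.Layer):
    """ERNIE encoder followed by a bidirectional GRU and a CRF layer.

    Build the backbone first, e.g. ernie = ErnieModel.from_pretrained("ernie-1.0"),
    and pass it in as `ernie`.
    """

    def __init__(self, ernie, num_classes=2, dropout=None, gru_hidden_size=128):
        super(Model, self).__init__()
        self.num_classes = num_classes
        # the pretrained encoder is passed in so that it can be configured outside
        self.ernie = ernie
        self.dropout = nn.Dropout(dropout if dropout is not None else
                                  self.ernie.config["hidden_dropout_prob"])
        # add bi-gru
        self.gru = nn.GRU(
            input_size=self.ernie.config["hidden_size"],
            hidden_size=gru_hidden_size,
            direction='bidirect')
        # the bidirectional GRU concatenates both directions, hence * 2
        self.fc = nn.Linear(
            in_features=gru_hidden_size * 2,
            out_features=num_classes)
        # add crf
        self.crf = LinearChainCrf(
            num_classes,
            with_start_stop_tag=False)
        self.crf_loss = LinearChainCrfLoss(self.crf)
        self.viterbi_decoder = ViterbiDecoder(
            self.crf.transitions,
            with_start_stop_tag=False)

    # `lengths` and `labels` were used but never defined in the original code;
    # taking them as arguments is one possible fix.
    def forward(self,
                input_ids,
                lengths,
                token_type_ids=None,
                position_ids=None,
                attention_mask=None,
                labels=None):
        # the encoder is stored as self.ernie (the original called self.bert)
        sequence_output, _ = self.ernie(
            input_ids,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            attention_mask=attention_mask)
        sequence_output = self.dropout(sequence_output)
        bigru_output, _ = self.gru(sequence_output)
        # per-token emission scores for the CRF
        emission = self.fc(bigru_output)
        # decode the best tag path with Viterbi
        _, prediction = self.viterbi_decoder(emission, lengths)
        if labels is not None:
            # training: return the CRF loss together with the decoded path
            loss = self.crf_loss(emission, lengths, prediction, labels)
            return loss, lengths, prediction, labels
        else:
            # inference: `inputs` was undefined in the original, so return input_ids
            return input_ids, lengths, prediction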

3.3 Model ensembling

  Train and predict with several models, then merge the predictions of the individual models.
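
  One simple way to merge predictions is per-token majority voting over the label sequences produced by each model. The sketch below is only an illustration under that assumption: the helper name `vote_labels` is hypothetical, and the original post does not say which fusion strategy it has in mind.

```python
from collections import Counter


def vote_labels(model_predictions):
    """Merge label sequences from several models by per-token majority vote.

    model_predictions: one label sequence per model, all for the same sentence.
    """
    lengths = {len(seq) for seq in model_predictions}
    if len(lengths) != 1:
        raise ValueError("all models must label the same number of tokens")
    voted = []
    for labels_at_pos in zip(*model_predictions):
        # On a tie, most_common keeps the label seen first, so earlier models win.
        voted.append(Counter(labels_at_pos).most_common(1)[0][0])
    return voted


# Hypothetical outputs of three models for a three-character sentence.
predictions = [
    ["B-时间", "I-时间", "O"],
    ["B-时间", "O", "O"],
    ["B-时间", "I-时间", "O"],
]
merged = vote_labels(predictions)
```

  Voting per token can produce label sequences that are not valid BIO spans (for example an I- tag without a preceding B- tag), so a post-processing pass over the merged labels may still be needed.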

References

  https://aistudio.baidu.com/aistudio/competition/detail/65
