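2022 Kaggle NLP Competition: Feedback Prize - English Language Learning
0. Competition Overview
Competition page: Feedback Prize - English Language Learning | Kaggle
0.1 Competition goal
Writing is a foundational skill. Sadly, few students get to hone it, because schools rarely assign writing tasks. Students learning English as a second language, known as English Language Learners (ELLs), are especially affected by this lack of practice. Existing tools cannot give feedback tailored to a student's language proficiency, so the resulting assessment may be biased against learners. Data science can improve automated feedback tools to better support the unique needs of these learners.
The goal of this competition is to assess the language proficiency of English Language Learners (ELLs) in grades 8-12. Using essays written by ELLs as the dataset, participants develop models that better support the writing skills of all students.
The evaluation metric is MCRMSE (mean columnwise root mean squared error): the RMSE is computed separately for each of the six score columns, and the six values are then averaged.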
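For reference, the usual definition of MCRMSE can be written out as follows. The symbols are my own labels: $N_t$ is the number of scored target columns (six here), $n$ is the number of essays, and $y_{ij}$ and $\hat{y}_{ij}$ are the true and predicted scores of essay $i$ on target $j$.

```latex
\mathrm{MCRMSE} = \frac{1}{N_t} \sum_{j=1}^{N_t} \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_{ij} - \hat{y}_{ij} \right)^2}
```

Lower is better, and each column contributes equally regardless of how hard it is to predict.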
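0.2 Dataset
The competition dataset (the ELLIPSE corpus) consists of argumentative essays written by English Language Learners (ELLs) in grades 8-12. Each essay is scored on six analytic measures: cohesion, syntax, vocabulary, phraseology, grammar, and conventions. Scores range from 1.0 to 5.0 in increments of 0.5, and a higher score indicates greater proficiency in that measure. The task is to predict the six scores for each essay in the test set. Some of these essays also appear in the datasets of Feedback Prize - Evaluating Student Writing and Feedback Prize - Predicting Effective Arguments, and you are welcome to use those earlier datasets in this competition.
Files and fields:
train.csv: each essay is identified by a unique text_id; the full_text field holds the full essay text, and six more columns hold the six writing scores.
test.csv: contains only the text_id and full_text fields, with just three test samples.
sample_submission.csv: an example of the submission file format.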
1. Setup
1.1 Import libraries
import os,gc,re,ast,sys,copy,json,time,datetime,math,string,pickle,random,joblib,itertools
from distutils.util import strtobool
'''
The warnings module controls how warning messages are displayed.
warnings.filterwarnings('ignore') suppresses all warnings, which is handy when known,
harmless warnings would otherwise clutter the program output.
'''
import warnings
warnings.filterwarnings('ignore')
import scipy as sp
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm.auto import tqdm
from sklearn.metrics import mean_squared_error # mean squared error (MSE)
from sklearn.model_selection import StratifiedKFold, GroupKFold, KFold,train_test_split
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset
import torch.nn.functional as F
from torch.nn import Parameter
from torch.optim import Adam, SGD, AdamW
from torch.utils.checkpoint import checkpoint
import transformers, tokenizers
print(f'transformers.__version__: {transformers.__version__}')
print(f'tokenizers.__version__: {tokenizers.__version__}')
'''
AutoTokenizer loads a pretrained tokenizer,
AutoModel loads a pretrained model,
AutoConfig loads the model's configuration.
'''
from transformers import AutoTokenizer, AutoModel, AutoConfig
'''
Both functions create a learning-rate scheduler with a linear warmup phase.
After warmup, the learning rate decays linearly or along a cosine curve, respectively.
'''
from transformers import get_linear_schedule_with_warmup, get_cosine_schedule_with_warmup
'''
The tokenizers library reads the TOKENIZERS_PARALLELISM environment variable at run time.
It controls whether tokenization is parallelized; 'true' enables parallel processing.
'''
os.environ['TOKENIZERS_PARALLELISM']='true'
transformers.__version__: 4.30.2
tokenizers.__version__: 0.13.3
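1.2 Hyperparameters and random seed
class CFG:
    str_now = datetime.datetime.now().strftime('%Y%m%d-%H%M')
    model = 'deberta-v3-base'
    model_path = '/hy-tmp/model' # path to the pretrained model files
    batch_size, n_target, num_workers = 8, 6, 4
    target_cols = ['cohesion', 'syntax', 'vocabulary', 'phraseology', 'grammar', 'conventions']
    epoch, print_freq = 5, 20 # number of epochs; print progress every 20 steps
    loss_func = 'RMSE' # 'SmoothL1', 'RMSE'
    pooling = 'attention' # mean, max, min, attention, weightedlayer
    gradient_checkpointing = True # gradient checkpointing trades extra compute for lower memory use; this flag is not read by the code shown here
    gradient_accumulation_steps = 1 # number of batches whose gradients are accumulated before each optimizer step
    max_grad_norm = 1000 # maximum norm for gradient clipping
    apex = True # whether to use automatic mixed precision (AMP) training
    scheduler = 'cosine'
    # num_cycles: number of cosine cycles, 0.5 by default, i.e. half a cycle over the whole training run
    # num_warmup_steps: number of warmup steps at the start of training, during which the learning rate rises gradually to its maximum. Warmup avoids overly large updates early on, which can make training unstable.
    num_cycles, num_warmup_steps = 0.5, 0
    encoder_lr, decoder_lr, min_lr = 2e-5, 2e-5, 1e-6
    max_len = 512
    weight_decay = 0.01 # weight decay for the parameter groups that use it
    fgm = True # whether to use FGM adversarial training
    wandb = True # whether to enable wandb
    adv_lr, adv_eps, eps, betas = 1, 0.2, 1e-6, (0.9, 0.999) # eps and betas are passed to AdamW; adv_lr and adv_eps are adversarial-training settings that the code shown here does not read
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') # use the GPU if one is available, otherwise the CPU
    save_all_models = False # whether to save the model after every epoch
    OUTPUT_DIR = f"/hy-tmp/{model}/"
    train_file = '/hy-tmp/data/train.csv'
    test_file = '/hy-tmp/data/test.csv'
    submission_file = '/hy-tmp/data/sample_submission.csv'
os.makedirs(CFG.OUTPUT_DIR, exist_ok=True) # create the output directory if it does not exist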
CFG.OUTPUT_DIR
'/hy-tmp/deberta-v3-base/'
Set the random seeds so that every run produces the same results.
def set_seeds(seed):
    random.seed(seed) # seed Python's built-in random module
    np.random.seed(seed) # seed NumPy's random number generator
    torch.manual_seed(seed) # seed PyTorch's random number generator
    if torch.cuda.is_available():
        # if a GPU is available, also seed the current GPU and all GPUs
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True # make cuDNN use deterministic algorithms
set_seeds(1111)
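A separate note on torch.backends.cudnn.deterministic = True. cuDNN is a library that accelerates GPU computation. PyTorch automatically detects whether a supported cuDNN is installed and uses it to speed up GPU work. Internally, cuDNN may use nondeterministic algorithms. When a convolution is computed, the kernel is multiplied element-wise with the data and the products are accumulated.
The order of that accumulation matters. GPU threads run in parallel, and each thread may add to the same output location at a different moment, so which thread finishes first can differ from run to run. In other words, the accumulation order is not necessarily (0,0)+(0,1)+...+(2,2): in one run position (0,0) may finish first and be added to the output, and in the next run it may be position (1,1).
With integers, a change in accumulation order does not change the result. Floating-point arithmetic, however, has rounding error, so the order of accumulation can affect the final result. Since we want results that are reproducible and identical on every run, we set this flag to True so that cuDNN picks deterministic algorithms. The benefit is reproducibility; the cost is that computation may be somewhat slower.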
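The effect of accumulation order on floating-point sums can be seen without a GPU. The snippet below is only an illustration in plain Python (the helper name `sum_in_order` is made up for this example); it adds the same three numbers in two different orders.

```python
def sum_in_order(values, order):
    """Accumulate values one by one in the given index order."""
    total = 0.0
    for index in order:
        total += values[index]
    return total

values = [0.1, 0.2, 0.3]
forward = sum_in_order(values, (0, 1, 2))   # (0.1 + 0.2) + 0.3
backward = sum_in_order(values, (2, 1, 0))  # (0.3 + 0.2) + 0.1

print(forward)              # 0.6000000000000001
print(backward)             # 0.6
print(forward == backward)  # False
```

The two results differ only in the last bit, but over many training steps such differences can grow, which is why a fixed accumulation order matters for exact reproducibility.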
2. Data Preprocessing
2.1 Preprocessing function: tokenizing the text
To process the training and test sets in the same way, the test set is given a dummy label of [0, 0, 0, 0, 0, 0].
def preprocess(df, tokenizer, types=True):
    # types indicates whether df is a labelled set (training or validation)
    if types:
        labels = np.array(df[["cohesion", "syntax", "vocabulary", "phraseology", "grammar", "conventions"]]) # a NumPy array
    else:
        labels = df['labels'] # a pandas Series; a NumPy array would also do, since anything indexable works and these dummy labels are never used later
    text = list(df['full_text'].iloc[:])
    encoding = tokenizer(text, truncation=True, padding='max_length', max_length=CFG.max_len, return_tensors='np')
    return encoding, labels
df = pd.read_csv(CFG.train_file)
# the original sliced df[:100]; the full set is used here, consistent with the 391-step training log shown later
train_df, val_df = train_test_split(df, test_size=0.2, random_state=1111, shuffle=True)
test_df = pd.read_csv(CFG.test_file)
test_df['labels'] = None # add a column so the test set can go through the same Dataset class
test_df['labels'] = test_df['labels'].apply(lambda x: [0, 0, 0, 0, 0, 0]) # give every test row a dummy label of length 6
tokenizer = AutoTokenizer.from_pretrained(CFG.model_path) # load the tokenizer; this directory holds the deberta-v3-base files
train_encoding, train_label = preprocess(train_df, tokenizer, True)
val_encoding, val_label = preprocess(val_df, tokenizer, True)
test_encoding, test_label = preprocess(test_df, tokenizer, False)
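My CFG.model_path directory contains the deberta-v3-base model files, which can be fetched from the deberta-v3-base download page. The tf_model.h5 file listed there does not need to be downloaded, because it is the TensorFlow version of the model.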
2.2 Defining the Dataset and loading the data into DataLoaders
from torch.utils.data import Dataset, DataLoader
class MyDataset(Dataset):
    def __init__(self, encoding, label):
        super().__init__()
        self.inputs = encoding
        self.label = label
    # number of samples
    def __len__(self):
        return len(self.label)
    # fetch a single sample
    def __getitem__(self, index):
        '''
        inputs has three fields: input_ids, token_type_ids and attention_mask. Each is a 2-D array with one row per sample.
        To fetch the sample at position index, we take row index from each of the three fields
        and convert it to a tensor, because PyTorch computations require tensors.
        '''
        item = {key: torch.tensor(val[index], dtype=torch.long) for key, val in self.inputs.items()}
        label = torch.tensor(self.label[index], dtype=torch.float)
        return item, label
train_dataset = MyDataset(train_encoding, train_label)
val_dataset = MyDataset(val_encoding, val_label)
test_dataset = MyDataset(test_encoding, test_label)
train_loader = DataLoader(train_dataset, batch_size=CFG.batch_size, shuffle=True, num_workers=CFG.num_workers)
val_loader = DataLoader(val_dataset, batch_size=CFG.batch_size, shuffle=False, num_workers=CFG.num_workers) # validation data does not need shuffling
test_loader = DataLoader(test_dataset, batch_size=CFG.batch_size, shuffle=False, num_workers=CFG.num_workers) # the test set must never be shuffled
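Now let's print the first batch of test_loader. Note that the test set has only 3 rows, so the first batch contains just those three rows.
for i in test_loader:
    print(i)
    break
The output below is a dictionary with three fields, each holding three rows. At first sight this does not match what our __getitem__ function returns: it returns one dictionary per sample, so fetching a batch should apparently give a list of three dictionaries, yet we get a single dictionary whose fields each hold three rows. The reason is explained right after the output.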
[{'input_ids': tensor([[ 1, 335, 266, ..., 265, 262, 2],
[ 1, 771, 274, ..., 0, 0, 0],
[ 1, 2651, 9805, ..., 0, 0, 0]]), 'token_type_ids': tensor([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0]])}, tensor([[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]])]
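When a DataLoader fetches a batch from a custom Dataset, it calls its collate_fn to combine several samples into one batch. By default, DataLoader uses default_collate, which handles common data types such as tensors, lists, and dictionaries.
If the Dataset's __getitem__ returns a dictionary, the batch returned by the DataLoader is also a dictionary, in which each key maps to a tensor with a leading batch dimension. For example, if __getitem__ returns the following dictionary: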
{
"input_ids": torch.tensor([1, 2, 3]),
"attention_mask": torch.tensor([1, 1, 0])
}
then fetching a batch of size 2 from the DataLoader may return something like this:
{
"input_ids": torch.tensor([[1, 2, 3], [4, 5, 6]]),
"attention_mask": torch.tensor([[1, 1, 0], [1, 0, 0]])
}
Here each key maps to a tensor of shape (batch_size, ...).
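The regrouping described above can be sketched with plain lists. `collate_dicts` is a hypothetical helper written for this illustration only; it is not PyTorch's default_collate, which additionally stacks the rows into tensors and recurses into nested containers.

```python
def collate_dicts(samples):
    """Turn a list of per-sample dicts into one dict of per-key lists (one row per sample)."""
    keys = samples[0].keys()
    return {key: [sample[key] for sample in samples] for key in keys}

# two samples, each shaped like what __getitem__ returns (plain lists instead of tensors)
samples = [
    {"input_ids": [1, 2, 3], "attention_mask": [1, 1, 0]},
    {"input_ids": [4, 5, 6], "attention_mask": [1, 0, 0]},
]
batch = collate_dicts(samples)
print(batch["input_ids"])       # [[1, 2, 3], [4, 5, 6]]
print(batch["attention_mask"])  # [[1, 1, 0], [1, 0, 0]]
```

A list of dictionaries goes in and a dictionary of batched rows comes out, which is the shape seen in the printed test batch.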
With the data loaded, we next write some helper functions, such as the loss function and the evaluation metric.
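3. Helper Functions
This section defines RMSELoss, the MCRMSE evaluation metric, a logger, FGM, and so on.
RMSELoss below is the loss function used during training, while MCRMSE is the metric used to score predictions on the validation set. Keep the two apart: RMSELoss drives gradient descent and backpropagation, whereas MCRMSE is only a metric that we compute to monitor progress.
# loss function used for backpropagation
class RMSELoss(nn.Module):
    def __init__(self, reduction='mean', eps=1e-9):
        super().__init__()
        self.mse = nn.MSELoss(reduction='none')
        self.reduction = reduction
        self.eps = eps
    def forward(self, y_pred, y_true):
        # the square root is taken per element (the inner MSELoss uses reduction='none'),
        # so with reduction='mean' the result is close to the mean absolute error
        loss = torch.sqrt(self.mse(y_pred, y_true) + self.eps)
        if self.reduction == 'sum':
            loss = loss.sum()
        elif self.reduction == 'mean':
            loss = loss.mean()
        return loss # with reduction='none' the per-element loss is returned unchanged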
# evaluation metric used for scoring
def MCRMSE(y_trues, y_preds):
    scores = []
    idxes = y_trues.shape[1]
    for i in range(idxes):
        y_true = y_trues[:, i] # a 1-D array: one column of the 2-D array
        y_pred = y_preds[:, i]
        score = np.sqrt(mean_squared_error(y_true, y_pred)) # RMSE (root mean squared error); np.sqrt avoids relying on the squared=False argument
        scores.append(score)
    mcrmse_score = np.mean(scores) # MCRMSE: mean of the per-column RMSEs
    return mcrmse_score, scores
# helper for tracking the running loss
class AverageMeter(object):
    def __init__(self):
        self.reset()
    def reset(self):
        self.val = 0 # most recent value (average loss of the current batch)
        self.avg = 0 # running average over the epoch so far
        self.sum = 0 # running sum of the loss over the epoch
        self.count = 0 # number of samples seen so far, used to turn the sum into an average
    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count
# convert seconds to minutes and seconds
def asMinutes(s):
    m = math.floor(s / 60) # round down
    s -= m * 60
    return f'{int(m)}m {int(s)}s'
# estimate the remaining time
def timeSince(since, percent):
    now = time.time()
    s = now - since # seconds elapsed since the start
    es = s / (percent) # estimated total time for the epoch; percent is the current step divided by the DataLoader length, i.e. a fraction
    rs = es - s # remaining time
    return f'{str(asMinutes(s))} (remain {str(asMinutes(rs))})'
def get_logger(filename=CFG.OUTPUT_DIR+'train'):
    from logging import getLogger, INFO, StreamHandler, FileHandler, Formatter
    logger = getLogger(__name__) # create a logger named __name__
    logger.setLevel(INFO) # record messages at INFO level and above
    handler1 = StreamHandler() # stream handler
    handler1.setFormatter(Formatter("%(message)s")) # output format
    handler2 = FileHandler(filename=f"{filename}.log") # file handler
    handler2.setFormatter(Formatter("%(message)s")) # output format
    logger.addHandler(handler1) # print to the screen
    logger.addHandler(handler2) # write to the log file
    return logger
logger = get_logger()
logger
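To make the difference between the training loss and the metric concrete, here is a small dependency-free sketch on made-up numbers. `elementwise_sqrt_loss` mimics what the RMSELoss forward pass computes with reduction='mean' (a square root per element, then the mean), and `mcrmse` follows the metric's definition; both function names and the toy data are assumptions for illustration.

```python
import math

def elementwise_sqrt_loss(y_true, y_pred, eps=1e-9):
    """Mean over all elements of sqrt(squared error + eps)."""
    terms = [math.sqrt((t - p) ** 2 + eps)
             for row_t, row_p in zip(y_true, y_pred)
             for t, p in zip(row_t, row_p)]
    return sum(terms) / len(terms)

def mcrmse(y_true, y_pred):
    """RMSE of each column, then the mean over columns."""
    n_rows, n_cols = len(y_true), len(y_true[0])
    per_column = []
    for j in range(n_cols):
        mse = sum((y_true[i][j] - y_pred[i][j]) ** 2 for i in range(n_rows)) / n_rows
        per_column.append(math.sqrt(mse))
    return sum(per_column) / n_cols, per_column

# two essays, two score columns (toy values)
y_true = [[3.0, 4.0], [3.0, 2.0]]
y_pred = [[3.0, 3.0], [1.0, 2.0]]

loss = elementwise_sqrt_loss(y_true, y_pred)
score, per_column = mcrmse(y_true, y_pred)
print(round(loss, 4))   # 0.75, the mean absolute error of the four elements
print(round(score, 4))  # 1.0607
```

The training loss here is about 0.75 while MCRMSE is about 1.06 on the same predictions, so the two numbers in the training log are not expected to agree.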
Fast Gradient Method (FGM)
For a tutorial on implementing adversarial training for NLP in PyTorch, see the video "Adversarial Training in NLP".
class FGM():
    def __init__(self, model):
        self.model = model
        self.backup = {}
    def attack(self, epsilon = 1., emb_name = 'word_embeddings'):
        # add a perturbation of norm epsilon, along the gradient direction, to the embedding weights
        for name, param in self.model.named_parameters():
            if param.requires_grad and emb_name in name:
                self.backup[name] = param.data.clone()
                norm = torch.norm(param.grad)
                if norm != 0 and torch.isfinite(norm): # skip zero or non-finite gradient norms
                    r_at = epsilon * param.grad / norm
                    param.data.add_(r_at)
    def restore(self, emb_name = 'word_embeddings'):
        # put the original embedding weights back
        for name, param in self.model.named_parameters():
            if param.requires_grad and emb_name in name:
                assert name in self.backup
                param.data = self.backup[name]
        self.backup = {}
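The core of the attack method is a single formula: the perturbation is the gradient rescaled to have L2 norm epsilon. The sketch below works through it on a made-up three-element embedding in plain Python; `fgm_perturbation` and the numbers are illustrative assumptions, not part of the training code.

```python
import math

def fgm_perturbation(grad, epsilon=1.0):
    """Return r = epsilon * g / ||g||_2, or zeros when the gradient norm is zero."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0 for _ in grad]
    return [epsilon * g / norm for g in grad]

embedding = [0.5, -1.0, 2.0]
grad = [3.0, 0.0, -4.0]  # made-up gradient of the loss with respect to the embedding

r_adv = fgm_perturbation(grad, epsilon=1.0)
perturbed = [e + r for e, r in zip(embedding, r_adv)]
r_norm = math.sqrt(sum(r * r for r in r_adv))

print(r_adv)   # [0.6, 0.0, -0.8]
print(r_norm)  # the perturbation has (approximately) norm epsilon
```

Because only the direction of the gradient is used, the size of the perturbation is controlled entirely by epsilon.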
4. Pooling
# Attention pooling
class AttentionPooling(nn.Module):
    # Ignoring the batch dimension, the input is (sequence_length, hidden_size): one vector per token, sequence_length tokens in total.
    # Attention pooling learns a weight for each token, masks out padding with attention_mask, and takes the weighted sum of the token vectors to get a sentence vector that takes every token into account.
    def __init__(self, in_dim):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(in_dim, in_dim), # (batch_size, sequence_length, hidden_size): a linear transform of the model's last hidden layer
            nn.LayerNorm(in_dim), # normalization; it stabilizes the distribution of intermediate activations, which tends to smooth gradients and speed up training
            nn.GELU(), # activation function
            nn.Linear(in_dim, 1), # (batch_size, sequence_length, 1): one scalar weight per token
        )
    def forward(self, last_hidden_state, attention_mask):
        w = self.attention(last_hidden_state).float() # score each token's contribution to the sentence
        # w has shape (batch_size, sequence_length, 1)
        w[attention_mask==0]=float('-inf') # set the scores of padding positions to -inf so they become 0 after softmax
        w = torch.softmax(w,1) # softmax over the sequence dimension, so the token weights sum to 1
        # sum over dimension 1 (sequence_length), i.e. the weighted sum of token vectors
        attention_embeddings = torch.sum(w * last_hidden_state, dim=1) # (batch_size, hidden_size): the sentence vector
        # return the sentence vectors for the batch
        return attention_embeddings
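The masking and weighting steps of the forward pass can be traced by hand on a single sequence. The following is a plain-Python sketch with made-up scores standing in for the output of the small attention network; `masked_attention_pool` is a hypothetical name, and the sketch assumes at least one non-padding token.

```python
import math

def masked_attention_pool(token_vectors, scores, attention_mask):
    """Softmax the scores over real tokens only, then take the weighted sum of token vectors."""
    masked = [s if m == 1 else float("-inf") for s, m in zip(scores, attention_mask)]
    peak = max(masked)                           # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in masked]  # exp(-inf) is 0.0 for padded positions
    total = sum(exps)
    weights = [e / total for e in exps]
    hidden_size = len(token_vectors[0])
    pooled = [sum(w * vec[d] for w, vec in zip(weights, token_vectors))
              for d in range(hidden_size)]
    return pooled, weights

token_vectors = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]  # the last token is padding
scores = [0.0, 0.0, 5.0]                              # the padding score is large on purpose
attention_mask = [1, 1, 0]

pooled, weights = masked_attention_pool(token_vectors, scores, attention_mask)
print(weights)  # [0.5, 0.5, 0.0]
print(pooled)   # [0.5, 0.5]
```

Even though the padded position has the highest raw score, the mask forces its weight to zero, so it contributes nothing to the sentence vector.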
5. Building the Model
class FB3Model(nn.Module):
    def __init__(self, CFG, config_path=None, pretrained=False):
        super().__init__()
        self.CFG = CFG
        # load the configuration
        if config_path is None:
            self.config = AutoConfig.from_pretrained(CFG.model_path, output_hidden_states = True)
            self.config.save_pretrained(CFG.OUTPUT_DIR + 'config')
            self.config.hidden_dropout = 0. # dropout probability of the hidden layers, i.e. the fraction of hidden units randomly switched off to limit overfitting; the default is usually 0.1, and 0 disables dropout
            self.config.hidden_dropout_prob = 0. # the same setting under a different attribute name
            self.config.attention_dropout = 0. # dropout probability of the attention mechanism, i.e. the fraction of attention-matrix entries randomly set to 0; the default is usually 0.1 as well, and 0 disables it
            self.config.attention_probs_dropout_prob = 0. # the same setting under a different attribute name
            logger.info(self.config)
        else:
            self.config = torch.load(config_path)
        # load the pretrained model
        if pretrained:
            self.model = AutoModel.from_pretrained(CFG.model_path, config=self.config)
        else:
            self.model = AutoModel.from_config(self.config)
        # choose the pooling method
        if CFG.pooling == 'attention':
            self.pool = AttentionPooling(self.config.hidden_size)
        self.fc = nn.Linear(self.config.hidden_size, self.CFG.n_target)
    def forward(self, inputs):
        outputs = self.model(**inputs) # inputs holds input_ids, token_type_ids and attention_mask
        last_hidden_state = outputs.last_hidden_state # last hidden layer, shape (batch_size, sequence_length, hidden_size)
        sentence_vec = self.pool(last_hidden_state, inputs['attention_mask']) # pool into a sentence vector; the pooling layer also needs the attention mask
        output = self.fc(sentence_vec) # regress the six scores from the sentence vector
        return output
model = FB3Model(CFG, config_path=None, pretrained=True)
torch.save(model.config, './config.pth')
model.to(CFG.device)
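6. Training and Validation Functions
6.1 Optimizer and learning-rate scheduler
For how the linear and cosine learning-rate schedulers work and how they differ, see, for example, the article "Custom Dynamic Learning-Rate Adjustment in Transformers".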
def get_optimizer_params(model,encoder_lr,decoder_lr,weight_decay=0.0):
    param_optimizer = list(model.named_parameters())
    no_decay = ["bias", "LayerNorm.bias", "LayerNorm.weight"]
    optimizer_parameters = [
        {'params': [p for n, p in model.model.named_parameters() if not any(nd in n for nd in no_decay)],
         'lr': encoder_lr,
         'weight_decay': weight_decay},
        {'params': [p for n, p in model.model.named_parameters() if any(nd in n for nd in no_decay)],
         'lr': encoder_lr,
         'weight_decay': 0.0},
        {'params': [p for n, p in model.named_parameters() if "model" not in n],
         'lr': decoder_lr,
         'weight_decay': 0.0}
    ]
    return optimizer_parameters
# choose linear or cosine learning-rate decay
def get_scheduler(cfg, optimizer, num_train_steps):
    if cfg.scheduler == 'linear':
        scheduler = get_linear_schedule_with_warmup(
            optimizer,
            num_warmup_steps = cfg.num_warmup_steps,
            num_training_steps = num_train_steps
        )
    elif cfg.scheduler == 'cosine':
        scheduler = get_cosine_schedule_with_warmup(
            optimizer,
            num_warmup_steps = cfg.num_warmup_steps,
            num_training_steps = num_train_steps,
            num_cycles = cfg.num_cycles
        )
    return scheduler
from torch.optim import AdamW
# optimizer for the model parameters
optimizer_parameters = get_optimizer_params(model,CFG.encoder_lr, CFG.decoder_lr,CFG.weight_decay) # our own parameter groups
optimizer = AdamW(optimizer_parameters, lr=CFG.encoder_lr, eps=CFG.eps,betas=CFG.betas) # the per-group learning rates set above take precedence; lr here is the default for any group without its own
# learning-rate scheduler
num_train_steps = len(train_loader) * CFG.epoch # total number of training steps: batches per epoch times the number of epochs
scheduler = get_scheduler(CFG, optimizer, num_train_steps) # the scheduler needs the total number of batches
if CFG.loss_func == 'SmoothL1':
    criterion = nn.SmoothL1Loss(reduction='mean')
elif CFG.loss_func == 'RMSE':
    criterion = RMSELoss(reduction='mean')
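To see what the cosine schedule does to the learning rate, the multiplier can be computed by hand. The function below is my own plain-Python rendering of the warmup-then-cosine rule as I understand it from the transformers documentation for get_cosine_schedule_with_warmup; the name `cosine_warmup_factor` and the step counts are illustrative assumptions.

```python
import math

def cosine_warmup_factor(step, num_warmup_steps, num_training_steps, num_cycles=0.5):
    """Multiplier applied to the base learning rate at a given step."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)  # linear warmup from 0 to 1
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * 2.0 * num_cycles * progress)))

base_lr = 2e-5
total_steps = 1000  # illustrative value, not the step count of the run in this post

for step in (0, 250, 500, 750, 1000):
    lr = base_lr * cosine_warmup_factor(step, 0, total_steps)
    print(f"step {step:4d}  lr {lr:.8f}")

start_factor = cosine_warmup_factor(0, 0, total_steps)
mid_factor = cosine_warmup_factor(500, 0, total_steps)
end_factor = cosine_warmup_factor(1000, 0, total_steps)
warmup_factor = cosine_warmup_factor(5, 10, 100)
```

With num_cycles = 0.5 and no warmup, the learning rate starts at its maximum, reaches half of it midway through training, and ends at zero, which is the overall shape of the LR column in the training log.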
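6.2 Training and evaluation functions
The scaler used in these functions is the gradient scaler for automatic mixed precision training. The article "Automatic Mixed Precision (AMP) in PyTorch" is a good introduction.
Having rarely used a scheduler before, I was puzzled when I first saw scheduler.step(), so I read the article "optimizer and scheduler in PyTorch", which explains how optimizer.step() and scheduler.step() relate to each other.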
def train_fn(train_loader, model, criterion, optimizer, epoch, scheduler, device):
    losses = AverageMeter()
    model.train() # training mode
    scaler = torch.cuda.amp.GradScaler(enabled = CFG.apex) # gradient scaler for automatic mixed precision
    start = end = time.time()
    global_step = 0
    if CFG.fgm:
        fgm = FGM(model) # adversarial training
    for step, (inputs, labels) in enumerate(train_loader):
        # a plain dict has no .to(), so move each tensor to the device one by one
        for k, v in inputs.items():
            inputs[k] = v.to(device)
        labels = labels.to(device)
        batch_size = labels.size(0) # labels has shape [N, 6]; size(0) gives N, read dynamically because the last batch may be smaller
        with torch.cuda.amp.autocast(enabled = CFG.apex):
            y_preds = model(inputs) # predictions
            loss = criterion(y_preds, labels) # loss
        if CFG.gradient_accumulation_steps > 1:
            loss = loss / CFG.gradient_accumulation_steps
        losses.update(loss.item(), batch_size) # update the running average loss
        scaler.scale(loss).backward() # backward pass on the scaled loss (AMP)
        # Gradient clipping, performed in place; grad_norm is kept only for logging.
        # The gradients are still multiplied by the AMP loss scale here, which is why the logged norms are so large.
        grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), CFG.max_grad_norm)
        # Fast Gradient Method (FGM)
        if CFG.fgm:
            fgm.attack() # perturb the embeddings
            with torch.cuda.amp.autocast(enabled = CFG.apex):
                y_preds = model(inputs) # forward pass with the perturbed embeddings
                loss_adv = criterion(y_preds, labels)
            scaler.scale(loss_adv).backward() # adversarial gradients are added to the clean ones; the loss is scaled like the clean loss so both are unscaled consistently
            fgm.restore() # restore the original embeddings
        if (step + 1) % CFG.gradient_accumulation_steps == 0:
            # the next two lines update the parameters, in place of a plain optimizer.step()
            scaler.step(optimizer) # calls optimizer.step() unless the gradients contain infs or NaNs, in which case the step is skipped so the weights are not corrupted
            scaler.update() # update the scale factor
            optimizer.zero_grad() # clear the gradients
            global_step += 1
            scheduler.step() # update the learning rate; this does not depend on the gradients, so doing it after zero_grad is fine
        end = time.time()
        if step % CFG.print_freq == 0 or step == (len(train_loader) - 1):
            print('Epoch: [{0}][{1}/{2}] '
                  'Elapsed {remain:s} '
                  'Loss: {loss.val:.4f}({loss.avg:.4f}) '
                  'Grad: {grad_norm:.4f} '
                  'LR: {lr:.8f} '
                  .format(epoch + 1, step, len(train_loader), remain = timeSince(start, float(step + 1)/len(train_loader)),
                          loss = losses,
                          grad_norm = grad_norm,
                          lr = scheduler.get_last_lr()[0]))
    return losses.avg
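The reason train_fn divides the loss by CFG.gradient_accumulation_steps is that summing the gradients of k equally sized micro-batches, each scaled by 1/k, reproduces the gradient of the mean loss over the combined batch. The following toy check uses a one-parameter model y = w * x with a hand-written MSE gradient; everything in it is an illustrative assumption rather than part of the training code.

```python
def grad_mse(w, xs, ys):
    """Gradient with respect to w of the mean squared error of the model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# gradient computed on the full batch of four samples
full_batch_grad = grad_mse(w, xs, ys)

# the same four samples processed as two micro-batches of two
accumulation_steps = 2
micro_batches = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]
accumulated_grad = 0.0
for mb_x, mb_y in micro_batches:
    # dividing a micro-batch loss by accumulation_steps divides its gradient by the same factor
    accumulated_grad += grad_mse(w, mb_x, mb_y) / accumulation_steps

print(full_batch_grad)   # -22.5
print(accumulated_grad)  # -22.5
```

The equality holds exactly only when the micro-batches have the same size, which is the usual situation except for the last batch of an epoch.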
Validation function
def valid_fn(valid_loader, model, criterion, device):
    losses = AverageMeter()
    model.eval() # evaluation mode
    preds, targets = [], []
    start = end = time.time()
    for step, (inputs, labels) in enumerate(valid_loader):
        for k, v in inputs.items():
            inputs[k] = v.to(device)
        labels = labels.to(device)
        batch_size = labels.size(0)
        with torch.no_grad(): # no gradients are needed during evaluation
            y_preds = model(inputs)
            loss = criterion(y_preds, labels)
        losses.update(loss.item(), batch_size)
        preds.append(y_preds.to('cpu').numpy())
        targets.append(labels.to('cpu').numpy())
        end = time.time()
        if step % CFG.print_freq == 0 or step == (len(valid_loader)-1):
            print('EVAL: [{0}/{1}] '
                  'Elapsed {remain:s} '
                  'Loss: {loss.val:.4f}({loss.avg:.4f}) '
                  .format(step, len(valid_loader),
                          loss=losses,
                          remain=timeSince(start, float(step+1)/len(valid_loader))))
    predictions = np.concatenate(preds)
    targets = np.concatenate(targets)
    return losses.avg, predictions, targets
Overall training loop
def train_loop():
    best_score = np.inf
    for epoch in range(CFG.epoch):
        start_time = time.time()
        logger.info(f"========== epoch: {epoch} training ==========")
        avg_loss = train_fn(train_loader, model, criterion, optimizer, epoch, scheduler, CFG.device)
        avg_val_loss, predictions, valid_labels = valid_fn(val_loader, model, criterion, CFG.device)
        score, scores = MCRMSE(valid_labels, predictions) # mean score over all measures, plus the score of each measure
        elapsed = time.time() - start_time
        logger.info(f'Epoch {epoch+1} - avg_train_loss: {avg_loss:.4f} avg_val_loss: {avg_val_loss:.4f} time: {elapsed:.0f}s')
        logger.info(f'Epoch {epoch+1} - Score: {score:.4f} Scores: {scores}')
        # a lower MCRMSE is better: if the new score beats the best so far, save this model
        if best_score > score:
            best_score = score
            logger.info(f'Epoch {epoch+1} - Save Best Score: {best_score:.4f} Model')
            torch.save({'model': model.state_dict(),
                        'predictions': predictions},
                       CFG.OUTPUT_DIR + "_best.pth")
        # if save_all_models is set, save the model after every epoch
        if CFG.save_all_models:
            torch.save({'model': model.state_dict(),
                        'predictions': predictions},
                       CFG.OUTPUT_DIR + f"_epoch{epoch + 1}.pth")
Run it:
train_loop()
========== epoch: 0 training ==========
Epoch: [1][0/391] Elapsed 0m 0s (remain 5m 3s) Loss: 1.9498(1.9498) Grad: inf LR: 0.00002000
Epoch: [1][20/391] Elapsed 0m 9s (remain 2m 40s) Loss: 0.5507(0.9485) Grad: 88425.0312 LR: 0.00001999
Epoch: [1][40/391] Elapsed 0m 17s (remain 2m 29s) Loss: 0.4651(0.7112) Grad: 125639.5469 LR: 0.00001997
Epoch: [1][60/391] Elapsed 0m 25s (remain 2m 20s) Loss: 0.3574(0.6195) Grad: 48965.1836 LR: 0.00001993
Epoch: [1][80/391] Elapsed 0m 34s (remain 2m 11s) Loss: 0.4714(0.5734) Grad: 74479.8281 LR: 0.00001989
Epoch: [1][100/391] Elapsed 0m 42s (remain 2m 2s) Loss: 0.4562(0.5377) Grad: 114286.3984 LR: 0.00001984
Epoch: [1][120/391] Elapsed 0m 51s (remain 1m 54s) Loss: 0.4210(0.5139) Grad: 73615.7266 LR: 0.00001978
Epoch: [1][140/391] Elapsed 0m 59s (remain 1m 45s) Loss: 0.4222(0.5011) Grad: 124342.8672 LR: 0.00001971
Epoch: [1][160/391] Elapsed 1m 8s (remain 1m 37s) Loss: 0.3030(0.4896) Grad: 110364.0078 LR: 0.00001962
Epoch: [1][180/391] Elapsed 1m 16s (remain 1m 28s) Loss: 0.5538(0.4809) Grad: 106854.4375 LR: 0.00001953
Epoch: [1][200/391] Elapsed 1m 24s (remain 1m 20s) Loss: 0.4477(0.4722) Grad: 269420.5938 LR: 0.00001943
Epoch: [1][220/391] Elapsed 1m 33s (remain 1m 11s) Loss: 0.3516(0.4660) Grad: 80327.8438 LR: 0.00001932
Epoch: [1][240/391] Elapsed 1m 41s (remain 1m 3s) Loss: 0.3896(0.4609) Grad: 63723.8398 LR: 0.00001920
Epoch: [1][260/391] Elapsed 1m 50s (remain 0m 54s) Loss: 0.3637(0.4565) Grad: 57669.6133 LR: 0.00001907
Epoch: [1][280/391] Elapsed 1m 58s (remain 0m 46s) Loss: 0.3613(0.4510) Grad: 113929.4141 LR: 0.00001893
Epoch: [1][300/391] Elapsed 2m 6s (remain 0m 37s) Loss: 0.4049(0.4482) Grad: 126043.1719 LR: 0.00001878
Epoch: [1][320/391] Elapsed 2m 15s (remain 0m 29s) Loss: 0.3543(0.4435) Grad: 114538.3203 LR: 0.00001862
Epoch: [1][340/391] Elapsed 2m 23s (remain 0m 21s) Loss: 0.3910(0.4410) Grad: 56835.2969 LR: 0.00001845
Epoch: [1][360/391] Elapsed 2m 32s (remain 0m 12s) Loss: 0.3834(0.4390) Grad: 104753.5312 LR: 0.00001827
Epoch: [1][380/391] Elapsed 2m 40s (remain 0m 4s) Loss: 0.2973(0.4361) Grad: 112279.6641 LR: 0.00001809
Epoch: [1][390/391] Elapsed 2m 44s (remain 0m 0s) Loss: 0.3189(0.4346) Grad: 102051.1328 LR: 0.00001799
EVAL: [0/98] Elapsed 0m 0s (remain 0m 59s) Loss: 0.3986(0.3986)
EVAL: [20/98] Elapsed 0m 5s (remain 0m 20s) Loss: 0.3622(0.3786)
EVAL: [40/98] Elapsed 0m 10s (remain 0m 14s) Loss: 0.3669(0.3765)
EVAL: [60/98] Elapsed 0m 15s (remain 0m 9s) Loss: 0.3587(0.3800)
EVAL: [80/98] Elapsed 0m 20s (remain 0m 4s) Loss: 0.2795(0.3794)
Epoch 1 - avg_train_loss: 0.4346 avg_val_loss: 0.3815 time: 190s
Epoch 1 - Score: 0.4793 Scores: [0.50504076, 0.4751229, 0.43030012, 0.47523162, 0.530547, 0.4594006]
Epoch 1 - Save Best Score: 0.4793 Model
EVAL: [97/98] Elapsed 0m 24s (remain 0m 0s) Loss: 0.5805(0.3815)
========== epoch: 1 training ==========
Epoch: [2][0/391] Elapsed 0m 0s (remain 5m 4s) Loss: 0.2979(0.2979) Grad: inf LR: 0.00001799
Epoch: [2][20/391] Elapsed 0m 9s (remain 2m 41s) Loss: 0.4277(0.3779) Grad: 126436.4922 LR: 0.00001779
Epoch: [2][40/391] Elapsed 0m 17s (remain 2m 29s) Loss: 0.4425(0.3725) Grad: 100469.0000 LR: 0.00001758
Epoch: [2][60/391] Elapsed 0m 25s (remain 2m 20s) Loss: 0.3419(0.3731) Grad: 86438.8672 LR: 0.00001737
Epoch: [2][80/391] Elapsed 0m 34s (remain 2m 11s) Loss: 0.2846(0.3727) Grad: 113561.5391 LR: 0.00001715
Epoch: [2][100/391] Elapsed 0m 42s (remain 2m 2s) Loss: 0.4614(0.3782) Grad: 71337.5234 LR: 0.00001692
Epoch: [2][120/391] Elapsed 0m 51s (remain 1m 54s) Loss: 0.4372(0.3745) Grad: 128858.8438 LR: 0.00001668
Epoch: [2][140/391] Elapsed 0m 59s (remain 1m 45s) Loss: 0.3464(0.3741) Grad: 92793.8203 LR: 0.00001644
Epoch: [2][160/391] Elapsed 1m 8s (remain 1m 37s) Loss: 0.3312(0.3708) Grad: 82513.1406 LR: 0.00001619
Epoch: [2][180/391] Elapsed 1m 16s (remain 1m 28s) Loss: 0.3312(0.3705) Grad: 201764.6094 LR: 0.00001594
Epoch: [2][200/391] Elapsed 1m 24s (remain 1m 20s) Loss: 0.4366(0.3740) Grad: 135784.6406 LR: 0.00001567
Epoch: [2][220/391] Elapsed 1m 33s (remain 1m 11s) Loss: 0.2683(0.3761) Grad: 140406.2031 LR: 0.00001541
Epoch: [2][240/391] Elapsed 1m 41s (remain 1m 3s) Loss: 0.4170(0.3754) Grad: 136473.2344 LR: 0.00001513
Epoch: [2][260/391] Elapsed 1m 50s (remain 0m 54s) Loss: 0.3923(0.3753) Grad: 73960.7891 LR: 0.00001486
Epoch: [2][280/391] Elapsed 1m 58s (remain 0m 46s) Loss: 0.3683(0.3752) Grad: 64848.6758 LR: 0.00001457
Epoch: [2][300/391] Elapsed 2m 7s (remain 0m 38s) Loss: 0.3474(0.3744) Grad: 131543.4062 LR: 0.00001428
Epoch: [2][320/391] Elapsed 2m 15s (remain 0m 29s) Loss: 0.3071(0.3739) Grad: 104604.7188 LR: 0.00001399
Epoch: [2][340/391] Elapsed 2m 24s (remain 0m 21s) Loss: 0.2994(0.3751) Grad: 76737.6875 LR: 0.00001369
Epoch: [2][360/391] Elapsed 2m 32s (remain 0m 12s) Loss: 0.4693(0.3759) Grad: 121103.0078 LR: 0.00001339
Epoch: [2][380/391] Elapsed 2m 41s (remain 0m 4s) Loss: 0.3609(0.3750) Grad: 103191.3594 LR: 0.00001309
Epoch: [2][390/391] Elapsed 2m 45s (remain 0m 0s) Loss: 0.3236(0.3746) Grad: 115441.6875 LR: 0.00001294
EVAL: [0/98] Elapsed 0m 0s (remain 1m 1s) Loss: 0.3850(0.3850)
EVAL: [20/98] Elapsed 0m 5s (remain 0m 20s) Loss: 0.4035(0.3799)
EVAL: [40/98] Elapsed 0m 10s (remain 0m 14s) Loss: 0.4204(0.3744)
EVAL: [60/98] Elapsed 0m 15s (remain 0m 9s) Loss: 0.3059(0.3784)
EVAL: [80/98] Elapsed 0m 20s (remain 0m 4s) Loss: 0.3372(0.3768)
Epoch 2 - avg_train_loss: 0.3746 avg_val_loss: 0.3800 time: 190s
Epoch 2 - Score: 0.4767 Scores: [0.49304673, 0.46059012, 0.44865215, 0.4811058, 0.48887262, 0.4878879]
Epoch 2 - Save Best Score: 0.4767 Model
EVAL: [97/98] Elapsed 0m 24s (remain 0m 0s) Loss: 0.3070(0.3800)
========== epoch: 2 training ==========
Epoch: [3][0/391] Elapsed 0m 0s (remain 5m 1s) Loss: 0.2899(0.2899) Grad: inf LR: 0.00001292
Epoch: [3][20/391] Elapsed 0m 9s (remain 2m 41s) Loss: 0.3787(0.3540) Grad: 175920.7812 LR: 0.00001261
Epoch: [3][40/391] Elapsed 0m 17s (remain 2m 30s) Loss: 0.2569(0.3505) Grad: 96343.5703 LR: 0.00001230
Epoch: [3][60/391] Elapsed 0m 26s (remain 2m 20s) Loss: 0.4611(0.3546) Grad: 181476.2188 LR: 0.00001199
Epoch: [3][80/391] Elapsed 0m 34s (remain 2m 11s) Loss: 0.3593(0.3523) Grad: 71936.0391 LR: 0.00001167
Epoch: [3][100/391] Elapsed 0m 42s (remain 2m 3s) Loss: 0.3393(0.3559) Grad: 109394.1875 LR: 0.00001135
Epoch: [3][120/391] Elapsed 0m 51s (remain 1m 54s) Loss: 0.3660(0.3575) Grad: 140890.3906 LR: 0.00001103
Epoch: [3][140/391] Elapsed 0m 59s (remain 1m 45s) Loss: 0.3748(0.3594) Grad: 123294.2266 LR: 0.00001071
Epoch: [3][160/391] Elapsed 1m 8s (remain 1m 37s) Loss: 0.3356(0.3578) Grad: 129286.8594 LR: 0.00001039
Epoch: [3][180/391] Elapsed 1m 16s (remain 1m 28s) Loss: 0.3279(0.3543) Grad: 134369.8438 LR: 0.00001007
Epoch: [3][200/391] Elapsed 1m 24s (remain 1m 20s) Loss: 0.3662(0.3531) Grad: 109224.3125 LR: 0.00000975
Epoch: [3][220/391] Elapsed 1m 33s (remain 1m 11s) Loss: 0.3806(0.3512) Grad: 119248.4375 LR: 0.00000943
Epoch: [3][240/391] Elapsed 1m 41s (remain 1m 3s) Loss: 0.3819(0.3523) Grad: 78124.8984 LR: 0.00000911
Epoch: [3][260/391] Elapsed 1m 50s (remain 0m 54s) Loss: 0.3427(0.3530) Grad: 73294.2891 LR: 0.00000879
Epoch: [3][280/391] Elapsed 1m 58s (remain 0m 46s) Loss: 0.3162(0.3519) Grad: 94840.5938 LR: 0.00000847
Epoch: [3][300/391] Elapsed 2m 6s (remain 0m 37s) Loss: 0.3040(0.3529) Grad: 88447.9453 LR: 0.00000815
Epoch: [3][320/391] Elapsed 2m 15s (remain 0m 29s) Loss: 0.3175(0.3526) Grad: 145305.6250 LR: 0.00000784
Epoch: [3][340/391] Elapsed 2m 23s (remain 0m 21s) Loss: 0.3468(0.3517) Grad: 110218.4531 LR: 0.00000753
Epoch: [3][360/391] Elapsed 2m 32s (remain 0m 12s) Loss: 0.3926(0.3522) Grad: 65352.9297 LR: 0.00000722
Epoch: [3][380/391] Elapsed 2m 40s (remain 0m 4s) Loss: 0.2828(0.3515) Grad: 105423.3984 LR: 0.00000691
Epoch: [3][390/391] Elapsed 2m 44s (remain 0m 0s) Loss: 0.3242(0.3506) Grad: 59486.8477 LR: 0.00000676
EVAL: [0/98] Elapsed 0m 0s (remain 1m 0s) Loss: 0.3660(0.3660)
EVAL: [20/98] Elapsed 0m 5s (remain 0m 20s) Loss: 0.3246(0.3523)
EVAL: [40/98] Elapsed 0m 10s (remain 0m 14s) Loss: 0.3867(0.3606)
EVAL: [60/98] Elapsed 0m 15s (remain 0m 9s) Loss: 0.2762(0.3658)
EVAL: [80/98] Elapsed 0m 20s (remain 0m 4s) Loss: 0.4370(0.3674)
Epoch 3 - avg_train_loss: 0.3506 avg_val_loss: 0.3708 time: 190s
Epoch 3 - Score: 0.4654 Scores: [0.48894468, 0.47190648, 0.41822833, 0.47279745, 0.478483, 0.46188158]
Epoch 3 - Save Best Score: 0.4654 Model
EVAL: [97/98] Elapsed 0m 24s (remain 0m 0s) Loss: 0.3958(0.3708)
========== epoch: 3 training ==========
Epoch: [4][0/391] Elapsed 0m 0s (remain 5m 27s) Loss: 0.2887(0.2887) Grad: 248518.7812 LR: 0.00000674
Epoch: [4][20/391] Elapsed 0m 9s (remain 2m 41s) Loss: 0.4382(0.3248) Grad: 121720.1172 LR: 0.00000644
Epoch: [4][40/391] Elapsed 0m 17s (remain 2m 30s) Loss: 0.3732(0.3206) Grad: 125786.2812 LR: 0.00000614
Epoch: [4][60/391] Elapsed 0m 26s (remain 2m 20s) Loss: 0.3820(0.3343) Grad: 91117.5938 LR: 0.00000585
Epoch: [4][80/391] Elapsed 0m 34s (remain 2m 12s) Loss: 0.2734(0.3306) Grad: 57463.1992 LR: 0.00000556
Epoch: [4][100/391] Elapsed 0m 42s (remain 2m 3s) Loss: 0.2757(0.3275) Grad: 70494.5469 LR: 0.00000527
Epoch: [4][120/391] Elapsed 0m 51s (remain 1m 54s) Loss: 0.3366(0.3275) Grad: 87479.7891 LR: 0.00000499
Epoch: [4][140/391] Elapsed 0m 59s (remain 1m 46s) Loss: 0.3645(0.3282) Grad: 154315.6562 LR: 0.00000472
Epoch: [4][160/391] Elapsed 1m 8s (remain 1m 37s) Loss: 0.2984(0.3261) Grad: 93636.7109 LR: 0.00000445
Epoch: [4][180/391] Elapsed 1m 16s (remain 1m 29s) Loss: 0.2320(0.3264) Grad: 37266.0586 LR: 0.00000418
Epoch: [4][200/391] Elapsed 1m 25s (remain 1m 20s) Loss: 0.2876(0.3264) Grad: 71387.4922 LR: 0.00000392
Epoch: [4][220/391] Elapsed 1m 33s (remain 1m 12s) Loss: 0.3872(0.3260) Grad: 159705.8438 LR: 0.00000367
Epoch: [4][240/391] Elapsed 1m 42s (remain 1m 3s) Loss: 0.3811(0.3270) Grad: 47979.0312 LR: 0.00000342
Epoch: [4][260/391] Elapsed 1m 50s (remain 0m 55s) Loss: 0.2687(0.3269) Grad: 97840.6406 LR: 0.00000319
Epoch: [4][280/391] Elapsed 1m 59s (remain 0m 46s) Loss: 0.3409(0.3280) Grad: 173353.0156 LR: 0.00000295
Epoch: [4][300/391] Elapsed 2m 7s (remain 0m 38s) Loss: 0.3019(0.3289) Grad: 135909.9062 LR: 0.00000273
Epoch: [4][320/391] Elapsed 2m 15s (remain 0m 29s) Loss: 0.2928(0.3283) Grad: 69503.2734 LR: 0.00000251
Epoch: [4][340/391] Elapsed 2m 24s (remain 0m 21s) Loss: 0.3893(0.3289) Grad: 85478.1719 LR: 0.00000230
Epoch: [4][360/391] Elapsed 2m 32s (remain 0m 12s) Loss: 0.2778(0.3293) Grad: 120143.7656 LR: 0.00000210
Epoch: [4][380/391] Elapsed 2m 41s (remain 0m 4s) Loss: 0.3321(0.3280) Grad: 112291.6406 LR: 0.00000191
Epoch: [4][390/391] Elapsed 2m 45s (remain 0m 0s) Loss: 0.2963(0.3275) Grad: 108092.1172 LR: 0.00000182
EVAL: [0/98] Elapsed 0m 0s (remain 1m 3s) Loss: 0.2932(0.2932)
EVAL: [20/98] Elapsed 0m 5s (remain 0m 20s) Loss: 0.3838(0.3549)
EVAL: [40/98] Elapsed 0m 10s (remain 0m 14s) Loss: 0.3059(0.3605)
EVAL: [60/98] Elapsed 0m 15s (remain 0m 9s) Loss: 0.4385(0.3685)
EVAL: [80/98] Elapsed 0m 20s (remain 0m 4s) Loss: 0.3533(0.3697)
Epoch 4 - avg_train_loss: 0.3275 avg_val_loss: 0.3693 time: 191s
Epoch 4 - Score: 0.4611 Scores: [0.48614553, 0.4448405, 0.42011937, 0.47291633, 0.4917833, 0.45061582]
Epoch 4 - Save Best Score: 0.4611 Model
EVAL: [97/98] Elapsed 0m 24s (remain 0m 0s) Loss: 0.3778(0.3693)
========== epoch: 4 training ==========
Epoch: [5][0/391] Elapsed 0m 0s (remain 5m 11s) Loss: 0.3832(0.3832) Grad: inf LR: 0.00000181
Epoch: [5][20/391] Elapsed 0m 9s (remain 2m 41s) Loss: 0.2926(0.3217) Grad: 94323.1406 LR: 0.00000163
Epoch: [5][40/391] Elapsed 0m 17s (remain 2m 30s) Loss: 0.3275(0.3139) Grad: 108386.5547 LR: 0.00000146
Epoch: [5][60/391] Elapsed 0m 25s (remain 2m 20s) Loss: 0.3118(0.3121) Grad: 89941.1016 LR: 0.00000129
Epoch: [5][80/391] Elapsed 0m 34s (remain 2m 11s) Loss: 0.3067(0.3132) Grad: 165197.1406 LR: 0.00000114
Epoch: [5][100/391] Elapsed 0m 42s (remain 2m 2s) Loss: 0.3166(0.3114) Grad: 56191.2539 LR: 0.00000100
Epoch: [5][120/391] Elapsed 0m 51s (remain 1m 54s) Loss: 0.3278(0.3124) Grad: 94895.2734 LR: 0.00000086
Epoch: [5][140/391] Elapsed 0m 59s (remain 1m 45s) Loss: 0.3607(0.3128) Grad: 77948.6484 LR: 0.00000073
Epoch: [5][160/391] Elapsed 1m 8s (remain 1m 37s) Loss: 0.3402(0.3146) Grad: 113676.4844 LR: 0.00000062
Epoch: [5][180/391] Elapsed 1m 16s (remain 1m 28s) Loss: 0.3783(0.3133) Grad: 56143.0781 LR: 0.00000051
Epoch: [5][200/391] Elapsed 1m 24s (remain 1m 20s) Loss: 0.3733(0.3138) Grad: 80444.6562 LR: 0.00000042
Epoch: [5][220/391] Elapsed 1m 33s (remain 1m 11s) Loss: 0.3398(0.3139) Grad: 107842.5391 LR: 0.00000033
Epoch: [5][240/391] Elapsed 1m 41s (remain 1m 3s) Loss: 0.3067(0.3155) Grad: 119173.0391 LR: 0.00000025
Epoch: [5][260/391] Elapsed 1m 50s (remain 0m 54s) Loss: 0.2778(0.3151) Grad: 51814.4180 LR: 0.00000019
Epoch: [5][280/391] Elapsed 1m 58s (remain 0m 46s) Loss: 0.2593(0.3147) Grad: 122930.2344 LR: 0.00000013
Epoch: [5][300/391] Elapsed 2m 7s (remain 0m 38s) Loss: 0.3550(0.3154) Grad: 145206.0312 LR: 0.00000008
Epoch: [5][320/391] Elapsed 2m 15s (remain 0m 29s) Loss: 0.2629(0.3145) Grad: 94886.2891 LR: 0.00000005
Epoch: [5][340/391] Elapsed 2m 24s (remain 0m 21s) Loss: 0.3118(0.3141) Grad: 71085.5234 LR: 0.00000002
Epoch: [5][360/391] Elapsed 2m 32s (remain 0m 12s) Loss: 0.2707(0.3133) Grad: 113675.3047 LR: 0.00000001
Epoch: [5][380/391] Elapsed 2m 40s (remain 0m 4s) Loss: 0.3170(0.3126) Grad: 110467.9141 LR: 0.00000000
Epoch: [5][390/391] Elapsed 2m 45s (remain 0m 0s) Loss: 0.3775(0.3130) Grad: 110086.3047 LR: 0.00000000
EVAL: [0/98] Elapsed 0m 0s (remain 0m 59s) Loss: 0.3563(0.3563)
EVAL: [20/98] Elapsed 0m 5s (remain 0m 20s) Loss: 0.4007(0.3810)
EVAL: [40/98] Elapsed 0m 10s (remain 0m 14s) Loss: 0.3444(0.3758)
EVAL: [60/98] Elapsed 0m 15s (remain 0m 9s) Loss: 0.3400(0.3755)
EVAL: [80/98] Elapsed 0m 20s (remain 0m 4s) Loss: 0.2994(0.3713)
Epoch 5 - avg_train_loss: 0.3130 avg_val_loss: 0.3677 time: 190s
Epoch 5 - Score: 0.4603 Scores: [0.485256, 0.44779205, 0.4199208, 0.47600335, 0.4788845, 0.45419946]
Epoch 5 - Save Best Score: 0.4603 Model
EVAL: [97/98] Elapsed 0m 24s (remain 0m 0s) Loss: 0.3803(0.3677)
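7. Inference
The competition does not provide ground-truth labels for the test set, so if you are only experimenting offline, this part has little value.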
def inference_fn(test_loader, model, device):
    preds = []
    model.eval()
    model.to(device)
    tk0 = tqdm(test_loader, total=len(test_loader))
    for inputs,label in tk0:
        for k, v in inputs.items():
            inputs[k] = v.to(device)
        with torch.no_grad():
            y_preds = model(inputs)
        preds.append(y_preds.to('cpu').numpy())
    predictions = np.concatenate(preds)
    return predictions
Calling inference_fn(test_loader, model, CFG.device) returns an array like the following:
array([[ 0.06905353, 0.16151421, -0.7520439 , -0.05804106, 0.86029375,
0.68256676],
[-0.10824093, -0.08520262, -0.72831357, -0.0021437 , 0.7458864 ,
0.6492575 ],
[ 0.07650095, 0.3073048 , -0.8738065 , 0.03434162, 0.63522017,
0.57341987]], dtype=float32)
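Writing the submission file
submission = pd.read_csv(CFG.submission_file) # assumed: start from the sample submission, which the original never loads explicitly
prediction = inference_fn(test_loader, model, CFG.device) # assumed call, not shown in the original
test_df[CFG.target_cols] = prediction
submission = submission.drop(columns=CFG.target_cols).merge(test_df[['text_id'] + CFG.target_cols], on='text_id', how='left')
display(submission.head())
submission[['text_id'] + CFG.target_cols].to_csv('submission.csv', index=False)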
I am still learning myself, so anyone with questions about the code is welcome to get in touch and discuss.