Near the end of training, the Jupyter notebook progress bar got stuck.
The key point is that a final gradient update is still needed after exiting the loop.
for batch_ids, batch_token, batch_text, batch_offset, batch_attention_mask, batch_label in tqdm(train_loader):
    batch_token = batch_token.to(device)
    batch_attention_mask = batch_attention_mask.to(device)
    batch_label = batch_label.to(device)
    loss = compute_multilabel_loss(model, batch_token, batch_attention_mask, batch_label)
    # loss, logits = model(input_ids=batch_token, attention_mask=batch_attention_mask, labels=batch_label,
    #                      return_dict=False)
    train_loss = train_loss + loss
    loss.backward()
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()       # apply the accumulated gradients
        optimizer.zero_grad()  # clear the gradients
    i = i + 1
optimizer.step()       # don't forget the gradient update after exiting the loop
optimizer.zero_grad()
scheduler.step()
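The accumulation pattern above can be sketched in isolation. The toy linear model and synthetic batches below are stand-ins (not the original model or data loader); the point is that when the number of batches is not divisible by accumulation_steps, some gradients are left unapplied, hence the extra optimizer.step() after the loop:

```python
import torch
import torch.nn as nn

# Toy stand-ins for the real model/data (assumptions, not the original setup).
torch.manual_seed(0)
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
accumulation_steps = 4
batches = [(torch.randn(8, 4), torch.randn(8, 2)) for _ in range(10)]

i = 0
for x, y in batches:
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()                # gradients accumulate across batches
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()           # update once every accumulation_steps batches
        optimizer.zero_grad()
    i = i + 1

# 10 batches with accumulation_steps=4 leaves 2 batches' worth of
# gradients unapplied, hence the extra step after the loop.
optimizer.step()
optimizer.zero_grad()
```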
The problem was not solved.
Try averaging the loss?
for batch_ids, batch_token, batch_text, batch_offset, batch_attention_mask, batch_label in tqdm(train_loader):
    batch_token = batch_token.to(device)
    batch_attention_mask = batch_attention_mask.to(device)
    batch_label = batch_label.to(device)
    loss = compute_multilabel_loss(model, batch_token, batch_attention_mask, batch_label)
    # loss, logits = model(input_ids=batch_token, attention_mask=batch_attention_mask, labels=batch_label,
    #                      return_dict=False)
    loss = loss / accumulation_steps  # added this line
    train_loss = train_loss + loss
    loss.backward()
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()       # apply the accumulated gradients
        optimizer.zero_grad()  # clear the gradients
    i = i + 1
optimizer.step()       # don't forget the gradient update after exiting the loop
optimizer.zero_grad()
scheduler.step()
At this point the program ran normally. Presumably, without dividing by accumulation_steps the loss was too large, which strained GPU memory.
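Beyond the memory question, dividing by accumulation_steps also keeps the gradient scale correct: the summed per-batch gradients then match a single pass over the concatenated batch. A small self-contained check (toy linear model and random data are assumptions):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 1)
accumulation_steps = 2
xs = [torch.randn(4, 3) for _ in range(accumulation_steps)]
ys = [torch.randn(4, 1) for _ in range(accumulation_steps)]

# Accumulated, scaled gradients: sum of (per-batch mean / steps)
model.zero_grad()
for x, y in zip(xs, ys):
    loss = nn.functional.mse_loss(model(x), y) / accumulation_steps
    loss.backward()
acc_grad = model.weight.grad.clone()

# Gradient of one pass over the concatenated big batch
model.zero_grad()
big_loss = nn.functional.mse_loss(model(torch.cat(xs)), torch.cat(ys))
big_loss.backward()
big_grad = model.weight.grad.clone()

print(torch.allclose(acc_grad, big_grad, atol=1e-5))  # → True
```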
1. If problems appear when the data is too large, try shrinking the data first.
2. Note that the earlier optimizer.zero_grad() should be removed; with it removed, the run time is about the same as the base model.
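One related GPU-memory pitfall worth knowing (an aside, not necessarily the cause here): summing the raw loss tensor into a running total, as in train_loss = train_loss + loss, keeps every batch's computation graph referenced until the total is freed. Accumulating a plain Python float via .item() releases each graph right away:

```python
import torch
import torch.nn as nn

# Toy model and data for illustration only.
model = nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)

loss = nn.functional.mse_loss(model(x), y)
bad_total = loss           # tensor with grad_fn: the graph stays referenced
good_total = loss.item()   # plain float: the graph can be freed after backward

print(bad_total.requires_grad, isinstance(good_total, float))  # → True True
```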