[Seeking help] GPT2-chitchat-directml training notes


Problem

I'm trying to train the project
 https://github.com/yangjianxin1/GPT2-chitchat-directml
 with my own data. On the first training run the loss comes down fine, but after resuming training with pretrained_model the loss no longer decreases, and the displayed loss differs from where the first run left off.

Environment

1. python 3.10.6
2. torch-directml 0.1.13.1.dev230301

Training preparation

  1. Data

F:\python\python310\projects\GPT2-chitchat-directml>…\python preprocess.py --train_path data/my/train.txt --save_path data/my/train.pkl
2023-06-15 10:22:46,416 - INFO - preprocessing data,data path:data/my/train.txt, save path:data/my/train.pkl
2023-06-15 10:22:46,426 - INFO - there are 24 dialogue in dataset
100%|█████████████████████████████████████████████████████████████████████████████████| 24/24 [00:00<00:00, 800.79it/s]
2023-06-15 10:22:46,556 - INFO - finish preprocessing data,the result is stored in data/my/train.pkl
2023-06-15 10:22:46,556 - INFO - mean of dialogue len:42.583333333333336,median of dialogue len:39.5,max len:80
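If it helps frame the question: as I understand it, preprocess.py packs each multi-turn dialogue into one token sequence of the form [CLS] utterance1 [SEP] utterance2 [SEP] ..., which matches the cls_id=101 / sep_id=102 values in the training args below. A minimal sketch of that step, with a hypothetical toy per-character tokenizer standing in for the project's BERT vocab:

```python
from statistics import mean, median

CLS_ID, SEP_ID = 101, 102  # match cls_id/sep_id in the training args

def encode_utterance(text, vocab):
    # Hypothetical per-character tokenizer standing in for BertTokenizerFast.
    return [vocab.setdefault(ch, len(vocab) + 1000) for ch in text]

def encode_dialogue(utterances, vocab):
    # [CLS] utt1 [SEP] utt2 [SEP] ... -- one training sequence per dialogue.
    ids = [CLS_ID]
    for utt in utterances:
        ids.extend(encode_utterance(utt, vocab))
        ids.append(SEP_ID)
    return ids

vocab = {}
dialogues = [["你好", "你好呀"], ["今天天气不错", "是的"]]  # toy data
encoded = [encode_dialogue(d, vocab) for d in dialogues]
lengths = [len(seq) for seq in encoded]
# preprocess.py logs these same statistics after encoding:
print(f"mean of dialogue len:{mean(lengths)},"
      f"median of dialogue len:{median(lengths)},max len:{max(lengths)}")
```

The "mean/median/max len" line in my log above is these statistics over the 24 encoded dialogues.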

train.json


Training log

Loss at the end of the first training run

2023-06-12 15:41:35,439 - INFO - saving current best model for epoch 301
2023-06-12 15:41:35,953 - INFO - saving current best model for epoch 301
2023-06-12 15:41:37,036 - INFO - batch 2 of epoch 302, loss 1.7703745365142822, batch_acc 0.9117647058823529, lr [1.817572524174725e-05]
2023-06-12 15:41:37,462 - INFO - batch 4 of epoch 302, loss 2.0290677547454834, batch_acc 0.8860759493670886, lr [1.817139046348783e-05]
2023-06-12 15:41:37,869 - INFO - batch 6 of epoch 302, loss 0.7335497736930847, batch_acc 0.9285714285714286, lr [1.816705568522841e-05]
Resuming training from the pretrained model
At the start:
2023-06-12 16:22:05,439 - INFO - use GPU privateuseone:0 to train
2023-06-12 16:22:05,443 - INFO - number of model parameters: 4975920
2023-06-12 16:22:05,444 - INFO - args:Namespace(device=device(type='privateuseone', index=0), no_cuda=False, vocab_path='vocab/vocab.txt', model_config='data/my/config.json', train_path='data/my/train.pkl', max_len=150, log_path='data/train.log', log=True, ignore_index=-100, epochs=1000, batch_size=1, gpu0_bsz=10, lr=2.6e-05, eps=1e-09, log_step=2, gradient_accumulation_steps=1, max_grad_norm=2.0, save_model_path='data/my/', pretrained_model='data/my/', seed=None, num_workers=0, patience=0, warmup_steps=4, val_num=12, cuda=True, sep_id=102, pad_id=0, cls_id=101)
2023-06-12 16:22:05,445 - INFO - loading training dataset and validating dataset
2023-06-12 16:22:05,494 - INFO - starting training
2023-06-12 16:22:06,015 - INFO - batch 2 of epoch 1, loss 7.603002071380615, batch_acc 1.0, lr [1.3e-05]
2023-06-12 16:22:06,345 - INFO - batch 4 of epoch 1, loss 7.671083450317383, batch_acc 0.96875, lr [2.6e-05]
2023-06-12 16:22:06,535 - INFO - batch 6 of epoch 1, loss 7.687896728515625, batch_acc 0.9555555555555556, lr [2.599566522174058e-05]
2023-06-12 16:22:06,737 - INFO - batch 8 of epoch 1, loss 7.656625747680664, batch_acc 0.9512195121951219, lr [2.5991330443481157e-05]
2023-06-12 16:22:06,863 - INFO - batch 10 of epoch 1, loss 7.5464067459106445, batch_acc 0.9230769230769231, lr [2.598699566522174e-05]
2023-06-12 16:22:07,021 - INFO - batch 12 of epoch 1, loss 7.670943260192871, batch_acc 0.9367088607594937, lr [2.598266088696232e-05]
2023-06-12 16:22:07,022 - INFO - epoch 1: loss 7.630055824915568, predict_acc 0.955193482688391
2023-06-12 16:22:07,553 - INFO - validate epoch 1: loss 7.547492941220601
2023-06-12 16:22:07,553 - INFO - saving current best model for epoch 1
2023-06-12 16:22:08,042 - INFO - saving current best model for epoch 1
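One thing the lr values do confirm: the schedule itself is behaving. They match a linear warmup over warmup_steps=4 followed by linear decay over about 12,000 total steps (1000 epochs × 12 training dialogues, with val_num=12 of the 24 dialogues held out for validation; the 12,000 is my inference from the numbers, not something the log states). A sketch of that schedule:

```python
def linear_warmup_decay_lr(step, base_lr=2.6e-05, warmup_steps=4, total_steps=12000):
    # Linear warmup to base_lr over warmup_steps, then linear decay to 0.
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)

# These reproduce the lr values printed in the resumed run above:
print(linear_warmup_decay_lr(2))   # 1.3e-05
print(linear_warmup_decay_lr(4))   # 2.6e-05
print(linear_warmup_decay_lr(6))   # ~2.5995665e-05
# ...and the value near the end of epoch 302 of the first run:
print(linear_warmup_decay_lr(301 * 12 + 2))  # ~1.8175725e-05
```

Note that the resumed run restarts this schedule from step 0 rather than continuing from the first run's step count.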
Then the loss just hovers around 7.xxx.
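For scale: a cross-entropy loss of about 7.6 corresponds to a perplexity of e^7.6 ≈ 2000, i.e. nearly the predictions of a freshly initialized model, while the first run had already reached losses below 2. That gap makes me suspect the weights in data/my/ aren't actually being loaded when pretrained_model is set, and a fresh model is being built from config.json instead — a guess from the numbers, not a confirmed diagnosis.

```python
import math

# Perplexity = exp(cross-entropy loss); compare the two runs' final losses.
resumed_loss = 7.6     # loss after resuming from pretrained_model
first_run_loss = 1.77  # typical loss near the end of the first run

print(math.exp(resumed_loss))    # ≈ 1998 -- near-uniform predictions
print(math.exp(first_run_loss))  # ≈ 5.9  -- far better than chance
```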
