Error when visualizing fairseq-train training with TensorBoard

I ran into this problem while working on machine translation. The command I ran was:

nohup fairseq-train /data4/zxzhou/wnt22/ende/preprocessionresult \
--arch transformer_iwslt_de_en --share-decoder-input-output-embed \
--optimizer adam --adam-betas '(0.9, 0.98)' \
--clip-norm 0.0 --lr 5e-4 --lr-scheduler inverse_sqrt \
--warmup-updates 512 --dropout 0.3 --weight-decay 0.0001 \
--criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
--max-tokens 9216 \
--max-epoch 50 \
--save-interval 5 \
--keep-last-epochs 11 \
--eval-bleu \
--eval-bleu-args '{"beam": 5, "max_len_a": 1.2, "max_len_b": 10}' \
--eval-bleu-detok moses \
--eval-bleu-remove-bpe \
--eval-bleu-print-samples \
--best-checkpoint-metric bleu \
--maximize-best-checkpoint-metric \
--save-dir /data4/zxzhou/wnt22/ende/checkpoints/test_transformerbaseline/ \
> baselinetrain.log 2>&1 &

After training finished, the following errors were reported:

2024-04-27 13:45:21 | INFO | fairseq_cli.train | done training in 131604.6 seconds
Exception in thread Thread-3:
Exception in thread Thread-6:
Traceback (most recent call last):
Traceback (most recent call last):
  File "/home/zxzhou/miniconda3/envs/fseq/lib/python3.10/threading.py", line 1009, in _bootstrap_inner
  File "/home/zxzhou/miniconda3/envs/fseq/lib/python3.10/threading.py", line 1009, in _bootstrap_inner
Exception in thread Thread-7:
Traceback (most recent call last):
  File "/home/zxzhou/miniconda3/envs/fseq/lib/python3.10/threading.py", line 1009, in _bootstrap_inner
    self.run()
    self.run()
    self.run()
  File "/home/zxzhou/miniconda3/envs/fseq/lib/python3.10/site-packages/tensorboardX/event_file_writer.py", line 202, in run
  File "/home/zxzhou/miniconda3/envs/fseq/lib/python3.10/site-packages/tensorboardX/event_file_writer.py", line 202, in run
  File "/home/zxzhou/miniconda3/envs/fseq/lib/python3.10/site-packages/tensorboardX/event_file_writer.py", line 202, in run
    data = self._queue.get(True, queue_wait_duration)
    data = self._queue.get(True, queue_wait_duration)
    data = self._queue.get(True, queue_wait_duration)
  File "/home/zxzhou/miniconda3/envs/fseq/lib/python3.10/multiprocessing/queues.py", line 117, in get
  File "/home/zxzhou/miniconda3/envs/fseq/lib/python3.10/multiprocessing/queues.py", line 117, in get
  File "/home/zxzhou/miniconda3/envs/fseq/lib/python3.10/multiprocessing/queues.py", line 117, in get
    res = self._recv_bytes()
    res = self._recv_bytes()
  File "/home/zxzhou/miniconda3/envs/fseq/lib/python3.10/multiprocessing/connection.py", line 221, in recv_bytes
  File "/home/zxzhou/miniconda3/envs/fseq/lib/python3.10/multiprocessing/connection.py", line 221, in recv_bytes
    res = self._recv_bytes()
  File "/home/zxzhou/miniconda3/envs/fseq/lib/python3.10/multiprocessing/connection.py", line 221, in recv_bytes
    buf = self._recv_bytes(maxlength)
    buf = self._recv_bytes(maxlength)
  File "/home/zxzhou/miniconda3/envs/fseq/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
  File "/home/zxzhou/miniconda3/envs/fseq/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/zxzhou/miniconda3/envs/fseq/lib/python3.10/multiprocessing/connection.py", line 419, in _recv_bytes
    buf = self._recv(4)
    buf = self._recv(4)
  File "/home/zxzhou/miniconda3/envs/fseq/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    buf = self._recv(4)
  File "/home/zxzhou/miniconda3/envs/fseq/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
    raise EOFError
  File "/home/zxzhou/miniconda3/envs/fseq/lib/python3.10/multiprocessing/connection.py", line 388, in _recv
EOFError
    raise EOFError
    raise EOFError
EOFError
EOFError
/home/zxzhou/miniconda3/envs/fseq/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 600 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '