How to debug DeepSpeed in PyCharm
Please refer to this article: How to debug DeepSpeed in PyCharm
Common pitfalls
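1. ValueError: No slot '' specified on host 'localhost'
When running DeepSpeed on a single machine with multiple GPUs, do not set export CUDA_VISIBLE_DEVICES, because it easily leads to mismatched device indices. For example, after export CUDA_VISIBLE_DEVICES=4,5, GPUs 4 and 5 appear as 0 and 1 in the slot list for localhost, so passing --include="localhost:4,5" raises this error. Leave CUDA_VISIBLE_DEVICES unset and select the GPUs directly with --include="localhost:4,5".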
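The renumbering described above can be illustrated with a small toy model. This is only a sketch of the index mismatch, not DeepSpeed's actual launcher code; the helper names `logical_indices` and `missing_slots` are hypothetical and exist only for this example.

```python
from typing import Dict, List


def logical_indices(cuda_visible_devices: str) -> Dict[int, int]:
    """Map each physical GPU id to the logical index seen after masking."""
    physical = [int(x) for x in cuda_visible_devices.split(",") if x.strip()]
    return {phys: logical for logical, phys in enumerate(physical)}


def missing_slots(include_slots: List[int], available_slots: List[int]) -> List[int]:
    """Return the requested slots that are not in the available slot list."""
    return [s for s in include_slots if s not in available_slots]


# With CUDA_VISIBLE_DEVICES=4,5 the two GPUs are renumbered 0 and 1.
mapping = logical_indices("4,5")
available = sorted(mapping.values())

print(mapping)                           # {4: 0, 5: 1}
print(missing_slots([4, 5], available))  # [4, 5] -> neither slot exists
print(missing_slots([0, 1], available))  # []     -> both slots exist
```

In this model, asking for slots 4 and 5 fails because only 0 and 1 exist after masking, which matches the situation where the error is raised.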
2. RuntimeError: The server socket has failed to listen on any local network address. The server socket has failed to bind to [::]:29500
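This usually means the default port 29500 is already in use by another process. Pass a different --master_port, and make sure these launcher options come before the .py file: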
--include localhost:0 --master_port 5600 --hostfile hostfile_single
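If port 5600 might also be taken, one option is to ask the operating system for a free port and build the argument list from it. The following is a minimal sketch using only the Python standard library; `find_free_port`, `build_launch_args`, and the script name `train.py` are hypothetical names chosen for illustration, while the options themselves are the ones shown above.

```python
import socket
from typing import List


def find_free_port() -> int:
    """Ask the OS for a currently unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        return s.getsockname()[1]


def build_launch_args(port: int, script: str = "train.py") -> List[str]:
    """Build launcher arguments, keeping the options before the script name."""
    return [
        "--include", "localhost:0",
        "--master_port", str(port),
        "--hostfile", "hostfile_single",
        script,  # hypothetical training script; options must precede it
    ]


if __name__ == "__main__":
    port = find_free_port()
    # Prints the full command line to paste into a terminal or run configuration.
    print("deepspeed " + " ".join(build_launch_args(port)))
```

The port is released as soon as the socket closes, so another process could in principle grab it before the launcher starts; for a local debugging session this is normally an acceptable risk.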