Error: TypeError: DistributedOptimizer() got an unexpected keyword argument '--------'

Latest recommended article published on 2024-09-16 13:52:40

Author: 乐乐灬小Y




Tags: deep learning

Article link: https://blog.csdn.net/qq_40701060/article/details/119411781


A note on how I solved this problem.

Environment:

TensorFlow: tensorflow-gpu 1.14.0

Horovod: 0.19.5

Python: 3.7.9

CMake: 3.21.1

NCCL installed: yes

NCCL version: nccl_2.6.4-1+cuda10.0_x86_64

CUDA: 10.0

CUDNN_VERSION=7.6.5.32
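The Python-level entries in this list can be read back programmatically, which helps when comparing your environment against a bug report. A minimal sketch, assuming only that `tensorflow` and `horovod` expose a `__version__` attribute; `package_version` and `environment_report` are hypothetical helper names, and CMake, NCCL, CUDA and cuDNN versions are not covered and still have to be checked separately:

```python
import importlib
import platform


def package_version(module_name):
    """Return module.__version__, "unknown" if absent, or None if not importable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


def environment_report():
    """Collect the Python-level versions from the environment list."""
    report = {"python": platform.python_version()}
    for name in ("tensorflow", "horovod"):
        report[name] = package_version(name)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value if value is not None else 'not installed'}")
```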

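Running the official example gave me an error of this kind; mine was "TypeError: DistributedOptimizer() got an unexpected keyword argument 'gradient_predivide_factor'". I found a likely fix on GitHub, answered by a Horovod maintainer himself, and I think it is quite likely to work.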

Here is the link to the issue: https://github.com/horovod/horovod/issues/774

Hey @lakshmiumenon, the parameter should be backward_passes_per_step with an additional underscore. That's one possible cause of this error (assuming that wasn't just a transcription error).

That parameter was added very recently, and those changes haven't yet been packaged into a release yet, so it's possible that your installed version of Horovod is behind the version of the examples you're using.

I'd suggest checking out the version of the examples that is the same as your version of Horovod. For example, if you installed Horovod v0.15.2, then you could checkout that tag in your examples repo: git checkout v0.15.2.

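That was the maintainer's reply. In short, the installed Horovod version is too old: the example passes a parameter that was only added in a later release, so the installed DistributedOptimizer() does not recognize it.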
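The maintainer names two possible causes: a misspelled keyword, or a keyword newer than the installed release. Both can be checked before launching a training job by inspecting the signature of the function that is actually installed. This is an illustrative sketch: `check_kwargs` and `old_distributed_optimizer` are hypothetical names (the latter only stands in for an older signature and is not Horovod's real one), and the guarded block assumes the example imports `horovod.tensorflow`; swap in `horovod.keras` or `horovod.tensorflow.keras` if that is what your example uses.

```python
import difflib
import inspect


def check_kwargs(func, kwargs):
    """Return {keyword: [close matches]} for each keyword `func` would reject."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword, so nothing can be flagged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return {}
    # Only these parameter kinds can be passed by keyword.
    accepted = [n for n, p in params.items()
                if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                              inspect.Parameter.KEYWORD_ONLY)]
    problems = {}
    for name in kwargs:
        if name not in accepted:
            # Suggest similarly spelled parameter names (typo hints).
            problems[name] = difflib.get_close_matches(name, accepted, n=3)
    return problems


def old_distributed_optimizer(optimizer, name=None, use_locking=False,
                              backward_passes_per_step=1):
    """Hypothetical stand-in for an older release's signature (illustration only)."""
    return optimizer


if __name__ == "__main__":
    # A keyword the "installed" signature does not have: flagged.
    print(check_kwargs(old_distributed_optimizer, {"gradient_predivide_factor": 1.0}))
    # A typo: flagged, with the correctly spelled parameter suggested.
    print(check_kwargs(old_distributed_optimizer, {"backward_passes_perstep": 2}))

    try:
        import horovod.tensorflow as hvd  # assumption: the example uses this module
    except ImportError:
        print("horovod.tensorflow is not importable in this environment")
    else:
        print(check_kwargs(hvd.DistributedOptimizer,
                           {"gradient_predivide_factor": 1.0}))
```

An empty dict means every keyword would be accepted; a non-empty one predicts the "unexpected keyword argument" error ahead of time, and the suggestion list points at likely typos such as a missing underscore.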

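Solution: upgrade Horovod to the latest release. Alternatively, as the maintainer suggests, check out the version of the examples whose tag matches your installed Horovod version.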
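Before or after upgrading, the installed release can be compared against the one that introduced the keyword, falling back to the matching examples tag when it is older. A sketch under stated assumptions: `REQUIRED_VERSION` is a placeholder, not a verified release number (look up the release that actually added the keyword in the Horovod changelog), the helper names are hypothetical, and the tag format simply follows the `v0.15.2` example from the maintainer's reply.

```python
import re

# Assumption: placeholder for the first release that accepts the keyword.
# Check the Horovod changelog; this value is NOT a verified release number.
REQUIRED_VERSION = "0.20.0"


def parse_version(text):
    """Parse the leading numeric part of a version string: '0.19.5' -> (0, 19, 5)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_at_least(installed, minimum):
    """True if `installed` >= `minimum`, comparing numeric parts only."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "0.20" and "0.20.0" compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def matching_examples_tag(installed):
    """Git tag in the style of the maintainer's example, e.g. 'v0.15.2'."""
    return "v" + ".".join(str(part) for part in parse_version(installed))


if __name__ == "__main__":
    try:
        import horovod
    except ImportError:
        print("horovod is not installed")
    else:
        installed = horovod.__version__
        if is_at_least(installed, REQUIRED_VERSION):
            print(f"horovod {installed} is at least {REQUIRED_VERSION}")
        else:
            print(f"horovod {installed} is older than {REQUIRED_VERSION}: upgrade it, "
                  f"or run `git checkout {matching_examples_tag(installed)}` "
                  "in the examples repo")
```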