While trying to run TianhongDai's PyTorch implementation of DPPO, I ran into the following errors:
AttributeError: 'Beta' object has no attribute 'data'
AttributeError: 'Beta' object has no attribute 'batch_log_pdf'
File "...\distributed-ppo-master\dppo_agent.py", line 85, in update_network
return critic_loss.data.cpu().numpy()[0], actor_loss.data.cpu().numpy()[0]
IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
My environment:
Windows 10
Python 3.7.6
torch 1.8.1+cu111
mujoco-py 1.50.1.68
gym 0.18.3
pyro-ppl 1.6.0
The original author's environment is considerably older:
python 3.5.2
pytorch-0.3.1
So some of his code uses outdated APIs, and a few small changes are enough to get it running.
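(1) AttributeError: 'Beta' object has no attribute 'data'
As the error message shows, actions in the select_actions method of dppo_agent.py is a Beta distribution object rather than a tensor, so a sample has to be drawn from it first. Change
actions_cpu = actions.data.cpu().numpy()[0]
to
actions_cpu = actions.sample().data.cpu().numpy()[0]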
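If the same code has to run against both the old and the new library versions, one option is a small duck-typed helper that draws a sample only when it is handed a distribution. The following is a minimal sketch under that assumption; sample_if_distribution and the FakeBeta stand-in are hypothetical names used for illustration and are not part of the repository, Pyro, or PyTorch.

```python
def sample_if_distribution(obj):
    """Return obj.sample() if obj exposes a callable `sample`, else obj itself.

    Hypothetical helper: it relies only on duck typing, not on any
    Pyro or PyTorch class.
    """
    sample = getattr(obj, "sample", None)
    return sample() if callable(sample) else obj


class FakeBeta:
    """Stand-in for a distribution object (assumption, for demonstration only)."""

    def __init__(self, value):
        self._value = value

    def sample(self):
        # A real distribution would draw a random value here.
        return self._value


drawn = sample_if_distribution(FakeBeta([0.25, 0.75]))  # distribution -> sample
passthrough = sample_if_distribution([0.1, 0.9])         # plain value -> unchanged
```

In select_actions, the result of such a helper would then take the place of actions before the .data.cpu().numpy()[0] chain.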
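(2) AttributeError: 'Beta' object has no attribute 'batch_log_pdf'
The second error also comes from an API change between versions: in the latest pyro.distributions.distribution, the log-probability of a single sample and of a whole batch are both computed with log_prob.
So just replace every batch_log_pdf with log_prob.
For example, change
new_action_prob = new_beta_dist.batch_log_pdf(actions_batch)
to
new_action_prob = new_beta_dist.log_prob(actions_batch)
and that is all.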
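Instead of editing every call site, the rename could also be absorbed in one place. The sketch below shows that idea with a hypothetical wrapper, log_prob_compat, which prefers log_prob and falls back to the legacy batch_log_pdf name. NewStyleDist and OldStyleDist are made-up stand-ins so the example runs without Pyro installed; they do not compute real log-probabilities.

```python
def log_prob_compat(dist, value):
    """Call dist.log_prob(value), falling back to the legacy batch_log_pdf.

    Hypothetical helper, not part of Pyro or of the DPPO repository.
    """
    if hasattr(dist, "log_prob"):
        return dist.log_prob(value)
    return dist.batch_log_pdf(value)


class NewStyleDist:
    """Stand-in exposing only the newer method name (assumption)."""

    def log_prob(self, value):
        return ("log_prob", value)


class OldStyleDist:
    """Stand-in exposing only the older method name (assumption)."""

    def batch_log_pdf(self, value):
        return ("batch_log_pdf", value)


new_result = log_prob_compat(NewStyleDist(), 0.5)
old_result = log_prob_compat(OldStyleDist(), 0.5)
```

With this in place, a call such as new_beta_dist.batch_log_pdf(actions_batch) would become log_prob_compat(new_beta_dist, actions_batch).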
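(3) IndexError: too many indices for array: array is 0-dimensional, but 1 were indexed
This happens because the loss computed in calculate_the_gradients is a single scalar value:
critic_loss = (returns - predicted_value).pow(2).mean()
Calling .numpy() on it therefore yields a 0-dimensional array, which cannot be indexed with [0]. So just change the return statement in update_network from
return critic_loss.data.cpu().numpy()[0], actor_loss.data.cpu().numpy()[0]
to
return critic_loss.data.cpu().numpy(), actor_loss.data.cpu().numpy()
and it works.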
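An alternative that returns plain Python floats rather than 0-dimensional arrays is to call .item() on the scalar loss, which both PyTorch tensors (0.4 and later) and NumPy arrays provide. The sketch below wraps that in a hypothetical helper, loss_to_float. FakeScalarTensor is a made-up stand-in so the example runs without PyTorch installed.

```python
def loss_to_float(loss):
    """Convert a scalar loss to a Python float.

    Uses .item() when the object provides it (as scalar tensors and
    arrays do) and falls back to float() otherwise. Hypothetical helper.
    """
    item = getattr(loss, "item", None)
    return float(item()) if callable(item) else float(loss)


class FakeScalarTensor:
    """Stand-in for a 0-dimensional tensor (assumption, for demonstration only)."""

    def __init__(self, value):
        self._value = value

    def item(self):
        return self._value


critic_value = loss_to_float(FakeScalarTensor(0.5))  # object with .item()
plain_value = loss_to_float(1.25)                     # already a Python number
```

Under this approach, the return statement in update_network would read return loss_to_float(critic_loss), loss_to_float(actor_loss).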