Problem using Stable-Baselines3 — asking for help

Why does the same environment work fine with PPO and A2C, but switching to SAC raises a data-type mismatch error?

model_sac = SAC("MlpPolicy", env, verbose=0, tensorboard_log=os.path.join(logs_dir, "SAC_tensorboard"), learning_rate=0.003)
model_sac.learn(total_timesteps=time_steps, tb_log_name='SAC', reset_num_timesteps=False, callback=callback)

Here is the error message:

Traceback (most recent call last):
  File "D:\ps\anaconda\envs\metro-env1\lib\code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "D:\ps\pycharm\PyCharm 2021.3.1\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 198, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "D:\ps\pycharm\PyCharm 2021.3.1\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "D:/桌面/study/code-study/rl4metro-main1/rl4metro-main 4.20/train.py", line 141, in <module>
    model_sac.learn(total_timesteps=time_steps, tb_log_name='SAC', reset_num_timesteps=False,callback=callback)
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\stable_baselines3\sac\sac.py", line 299, in learn
    return super().learn(
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\stable_baselines3\common\off_policy_algorithm.py", line 353, in learn
    self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\stable_baselines3\sac\sac.py", line 250, in train
    current_q_values = self.critic(replay_data.observations, replay_data.actions)
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\stable_baselines3\common\policies.py", line 945, in forward
    return tuple(q_net(qvalue_input) for q_net in self.q_networks)
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\stable_baselines3\common\policies.py", line 945, in <genexpr>
    return tuple(q_net(qvalue_input) for q_net in self.q_networks)
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\torch\nn\modules\container.py", line 141, in forward
    input = module(input)
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\ps\anaconda\envs\metro-env1\lib\site-packages\torch\nn\modules\linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: expected scalar type Float but found Double
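For context, this error typically means the replay buffer is handing the critic float64 (Double) tensors while the network's weights are float32 (Float) — most often because the environment's observations (or its `observation_space` dtype) are float64, NumPy's default. A minimal sketch of the dtype mismatch and the usual workaround, casting observations to float32 before they reach the network (the `cast_obs` helper is illustrative; in practice the cast would go in the env's `reset()`/`step()` or a gym `ObservationWrapper`):

```python
import numpy as np

def cast_obs(obs):
    """Cast an observation to float32 so it matches the critic's
    default float32 weights. (Hypothetical helper for illustration;
    in a real env this cast belongs in reset()/step() or a wrapper.)"""
    return np.asarray(obs, dtype=np.float32)

raw = np.zeros(4)                       # NumPy defaults to float64
assert raw.dtype == np.float64          # this is what triggers "Double"
assert cast_obs(raw).dtype == np.float32
```

Declaring the space as `spaces.Box(..., dtype=np.float32)` and returning float32 arrays from the env should make SAC's critic accept the sampled observations.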
 
