Because gym and related libraries have been updated, I ran into environment-configuration problems while running the DQN code (2023-12-23), and it took quite a few online tutorials to sort them out. First, a note: most of the problems you hit while running this code can be solved by changing the environment configuration (package versions). Here are the key package versions from my final working setup:
gym 0.20.0
numpy 1.24.4
pillow 8.2.0
pip 23.3.2
python 3.8.18
tqdm 4.66.1
wheel 0.42.0
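The pins above can be installed in one command (a sketch; the environment-creation step is an assumption and depends on your tooling — python and pip themselves come from the environment, not from pip install):

```shell
# First create a Python 3.8 environment, e.g. with conda:
#   conda create -n dqn python=3.8
# Then pin the remaining packages:
pip install gym==0.20.0 numpy==1.24.4 pillow==8.2.0 tqdm==4.66.1 wheel==0.42.0
```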
After configuring the versions above, I ran this code:
lr = 2e-3
num_episodes = 500
hidden_dim = 128
gamma = 0.98
epsilon = 0.01
target_update = 10
buffer_size = 10000
minimal_size = 500
batch_size = 64
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

env_name = 'CartPole-v0'
env = gym.make(env_name)
random.seed(0)
np.random.seed(0)
# env.seed(0)
env.reset(seed=0)
torch.manual_seed(0)
replay_buffer = ReplayBuffer(buffer_size)
state_dim = env.observation_space.shape[0]
action_dim = env.action_space.n
agent = DQN(state_dim, hidden_dim, action_dim, lr, gamma, epsilon,
            target_update, device)

return_list = []
for i in range(10):
    with tqdm(total=int(num_episodes / 10), desc='Iteration %d' % i) as pbar:
        for i_episode in range(int(num_episodes / 10)):
            episode_return = 0
            state = env.reset()
            done = False
            while not done:
                action = agent.take_action(state)
                next_state, reward, done, _ = env.step(action)
                replay_buffer.add(state, action, reward, next_state, done)
                state = next_state
                episode_return += reward
                # Only start training the Q-network once the buffer
                # holds more than minimal_size transitions
                if replay_buffer.size() > minimal_size:
                    b_s, b_a, b_r, b_ns, b_d = replay_buffer.sample(batch_size)
                    transition_dict = {
                        'states': b_s,
                        'actions': b_a,
                        'next_states': b_ns,
                        'rewards': b_r,
                        'dones': b_d
                    }
                    agent.update(transition_dict)
            return_list.append(episode_return)
            if (i_episode + 1) % 10 == 0:
                pbar.set_postfix({
                    'episode': '%d' % (num_episodes / 10 * i + i_episode + 1),
                    'return': '%.3f' % np.mean(return_list[-10:])
                })
            pbar.update(1)
which raised: ValueError: expected sequence of length 4 at dim 2 (got 0)
After some searching, I found the fix: in the code block above, change
state = env.reset()
to
state = env.reset()[0]
and change
next_state, reward, done, _ = env.step(action)
to
next_state, reward, done, _, __ = env.step(action)
With these two edits the problem was solved, and the code finally ran successfully:
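The underlying cause is a Gym API change: in newer Gym/Gymnasium releases, reset() returns a (observation, info) tuple instead of just the observation, and step() returns five values (observation, reward, terminated, truncated, info) instead of four. Rather than editing every call site, the two fixes above can be wrapped in small version-agnostic helpers. This is a sketch; the helper names reset_env and step_env are my own, not part of any library:

```python
def reset_env(env, **kwargs):
    """Return only the observation, whether reset() gives obs or (obs, info).

    Old Gym returns the observation directly (for CartPole, an ndarray);
    new Gym/Gymnasium returns an (obs, info) tuple.
    """
    out = env.reset(**kwargs)
    if isinstance(out, tuple) and len(out) == 2:
        return out[0]  # new API: (obs, info)
    return out         # old API: obs

def step_env(env, action):
    """Normalize step() to the old 4-tuple (obs, reward, done, info)."""
    out = env.step(action)
    if len(out) == 5:  # new API: obs, reward, terminated, truncated, info
        obs, reward, terminated, truncated, info = out
        return obs, reward, terminated or truncated, info
    return out         # old API: obs, reward, done, info
```

In the training loop, state = env.reset() would then become state = reset_env(env), and the step call would become next_state, reward, done, _ = step_env(env, action), which keeps the rest of the DQN code unchanged under either API. Note that merging terminated and truncated into a single done flag reproduces the old behavior, which is fine for this tutorial but conflates the two termination causes.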