Q-learning on a Simple Gym Game
Gym is a collection of environments and tasks designed for testing and developing RL algorithms, so users do not have to build complex environments themselves. Gym is written in Python and ships with many environments, from robot simulations to Atari games. This article uses the basic Taxi game to demonstrate how to use Gym and how to implement basic Q-learning.
1. Create the environment
import gym
import numpy as np

env = gym.make("Taxi-v3")            # create the taxi game environment
state = env.reset()                  # initialize the environment and get the start state
envspace = env.observation_space.n   # size of the state space (500 for Taxi-v3)
actspace = env.action_space.n        # size of the action space (6 for Taxi-v3)
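Each Taxi-v3 observation is a single integer out of the 500 possible states. To see what that integer encodes, the Taxi environment exposes a decode helper; a quick sketch, assuming the classic pre-0.26 gym API used throughout this article (where wrappers forward the underlying environment's decode method):

# Peek inside a state: the integer encodes
# (taxi_row, taxi_col, passenger_location, destination)
taxi_row, taxi_col, passenger_loc, destination = env.decode(state)
print(taxi_row, taxi_col, passenger_loc, destination)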
2. Even without Q-learning, purely random actions will eventually complete the task, but it usually takes a very large number of steps to succeed once. In Taxi-v3 a reward of 20 is given only for a successful passenger drop-off, so the loop below runs until that happens.
# Random actions: keep stepping until a successful drop-off (reward == 20)
counter = 0
reward = None
while reward != 20:
    state, reward, done, info = env.step(env.action_space.sample())
    counter = counter + 1
    if done and reward != 20:   # episode ended (200-step time limit) without success; start over
        state = env.reset()
print(reward)
print(done)
print(counter)
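To put a number on "many steps": gym registers Taxi-v3 with a default 200-step time limit per episode, and random play rarely manages a drop-off within that budget. A small experiment makes this concrete (a sketch; the 100-episode count is an arbitrary choice):

# How often does random play succeed within the default 200-step limit?
episodes = 100
successes = 0
for _ in range(episodes):
    env.reset()
    done = False
    reward = 0
    while not done:
        _, reward, done, info = env.step(env.action_space.sample())
    if reward == 20:            # episode ended with a successful drop-off
        successes += 1
print('random policy successes: {}/{}'.format(successes, episodes))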
3. Build a Q-table and store the experience in it. The table is updated with the standard Q-learning rule Q(s,a) ← Q(s,a) + α·[r + γ·max_a′ Q(s′,a′) − Q(s,a)]; the code below uses γ = 1, which is fine for this short episodic task.
# Q-learning
Q = np.zeros([envspace, actspace])   # create the Q-table
alpha = 0.5                          # learning rate
for episode in range(1, 2000):
    done = False
    reward = 0                       # instantaneous reward
    R_cum = 0                        # cumulative reward for this episode
    state = env.reset()              # reset to a fresh start state
    while not done:
        action = np.argmax(Q[state])             # greedy action from the Q-table
        state2, reward, done, info = env.step(action)
        # Q-learning update (the discount factor is implicitly 1 here)
        Q[state, action] += alpha * (reward + np.max(Q[state2]) - Q[state, action])
        R_cum += reward
        state = state2
        # env.render()
    if episode % 50 == 0:
        print('episode: {}; total reward: {}'.format(episode, R_cum))
print('The Q table is:\n{}'.format(Q))
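The training loop above picks actions purely greedily; it still explores because np.argmax breaks ties toward lower-numbered actions and the negative step rewards quickly push the agent to try the alternatives. A more standard choice is ε-greedy action selection. A sketch of how the action-selection line inside the loop could be replaced (the epsilon value is an arbitrary choice):

import random

epsilon = 0.1                            # exploration probability (arbitrary)
if random.random() < epsilon:
    action = env.action_space.sample()   # explore: try a random action
else:
    action = np.argmax(Q[state])         # exploit: best known action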
# Test phase: follow the greedy policy from the learned Q-table
counter = 0
reward = None
state = env.reset()                  # start a fresh episode for testing
done = False
while not done and counter < 200:
    action = np.argmax(Q[state])
    state, reward, done, info = env.step(action)
    counter = counter + 1
    # env.render()
print(reward)
print(counter)
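A single test episode can be lucky or unlucky depending on the random start state; averaging over many episodes gives a steadier picture of the learned policy. A sketch (the 100-episode count is arbitrary):

# Average return of the greedy policy over many test episodes
eval_episodes = 100
total_return = 0
for _ in range(eval_episodes):
    state = env.reset()
    done = False
    while not done:
        state, reward, done, info = env.step(np.argmax(Q[state]))
        total_return += reward
print('average return per episode:', total_return / eval_episodes)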
Reference: https://www.sohu.com/a/197847451_633700