Highway Decision Control Based on DQN Reinforcement Learning
Dependencies
gym == 0.21.0
stable-baselines3 == 1.6.2
highway-env == 1.5
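The pins above can be checked against the active environment before running anything. The following is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+). The helper name `find_mismatches` is hypothetical, and the exact string comparison is a simplification that assumes the installed version strings match the pins character for character.

```python
from importlib import metadata

# Version pins copied from the dependency list above.
REQUIRED = {
    "gym": "0.21.0",
    "stable-baselines3": "1.6.2",
    "highway-env": "1.5",
}


def find_mismatches(required):
    """Return {package: (wanted, installed)} for every pin that is not met.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(REQUIRED).items():
        print(f"{name}: expected {wanted}, found {installed}")
```

An empty result means all three pins are satisfied; anything printed points at a package to reinstall.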
Environment test
Introduction to the highway-env environment: highway-env
import gym
import highway_env  # importing registers the highway environments with gym

# Create environment
env = gym.make('highway-fast-v0')

episodes = 10
rewards = 0
for ep in range(episodes):
    obs = env.reset()
    done = False
    while not done:
        # Sample a random action from the action space
        action = env.action_space.sample()
        obs, reward, done, info = env.step(action)
        env.render()
        rewards += reward
# Average total reward per episode
print(rewards / episodes)
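Here the controlled vehicle picks its actions at random. The test video is highway_fast_test. With random actions, the average reward per episode over the 10 episodes is 9.800666098251863.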
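The random baseline above is an average of per-episode total rewards. The sketch below factors that loop into a reusable helper so any policy can be scored the same way. It assumes the old Gym API used by gym 0.21 (`reset()` returns the observation, `step()` returns a 4-tuple). `rollout_returns` and the stand-in `CountdownEnv` are hypothetical names introduced only for illustration, so the snippet runs without highway-env installed.

```python
import random


def rollout_returns(env, policy, n_episodes):
    """Run `n_episodes` and return the total reward of each one.

    Assumes the old Gym API (gym 0.21): reset() returns the observation and
    step() returns (obs, reward, done, info).
    """
    returns = []
    for _ in range(n_episodes):
        obs = env.reset()
        done = False
        total = 0.0
        while not done:
            obs, reward, done, _info = env.step(policy(obs))
            total += reward
        returns.append(total)
    return returns


class CountdownEnv:
    """Hypothetical stand-in environment: pays 1.0 per step for `horizon` steps."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.horizon, {}


if __name__ == "__main__":
    # Random policy over five discrete actions, averaged over 10 episodes.
    returns = rollout_returns(CountdownEnv(), lambda obs: random.randrange(5), 10)
    print(sum(returns) / len(returns))
```

With highway-env, the same helper would take `gym.make('highway-fast-v0')` as `env` and `lambda obs: env.action_space.sample()` as the random policy.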
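DQN decision control study
Model training
The DQN algorithm is used for decision control of the controlled vehicle. The training code is as follows: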
import gym
import highway_env
from stable_baselines3 import DQN

# Create environment
env = gym.make("highway-fast-v0")

model = DQN('MlpPolicy',
            env,
            policy_kwargs=dict(net_arch=[256, 256]),
            learning_rate=5e-4,
            buffer_size=15000,
            learning_starts=200,
            batch_size=32,
            gamma=0.8,
            train_freq=1,
            gradient_steps=1,
            target_update_interval=50,
            verbose=1,
            tensorboard_log="./logs")
model.learn(int(2e4))
model.save("highway_dqn_model")
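To make the role of `gamma=0.8` more concrete, here is a sketch of the one-step target that DQN regresses its Q-values toward, y = r + gamma * max over a' of Q_target(s', a'). This is plain Python for illustration, not stable-baselines3 internals; `td_target` and `effective_horizon` are hypothetical helper names.

```python
def td_target(reward, next_q_values, done, gamma=0.8):
    """One-step DQN target: r + gamma * max_a' Q_target(s', a').

    `next_q_values` are the target network's Q-values for the next state.
    No bootstrapping happens when the episode ended at this step.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def effective_horizon(gamma):
    """Rough number of future steps that matter under discount `gamma`."""
    return 1.0 / (1.0 - gamma)


if __name__ == "__main__":
    print(td_target(1.0, [0.5, 2.0], done=False))
    print(effective_horizon(0.8))
```

With a discount of 0.8 the effective horizon is about 5 steps, so the agent mostly weighs near-term consequences. The `target_update_interval=50` setting controls how often the target network that supplies `next_q_values` is synced with the online network.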
Model testing
import gym
import highway_env
from stable_baselines3 import DQN
from stable_baselines3.common.evaluation import evaluate_policy

# Create environment
env = gym.make("highway-fast-v0")

# Load model
model = DQN.load("highway_dqn_model", env=env)

mean_reward, std_reward = evaluate_policy(
    model,
    model.get_env(),
    deterministic=True,
    render=True,
    n_eval_episodes=10)
print(mean_reward)
The model test video is highway_fast_valid. After training, the average reward per episode over the 10 evaluation episodes is 18.4022157.
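The two averages reported in this README can be compared directly, since both are mean total rewards per episode over 10 episodes. The sketch below shows one way to summarize per-episode returns and to express the gain over the random baseline. The helper names are hypothetical, and using the population standard deviation is an assumption about how the spread is reported.

```python
from statistics import fmean, pstdev


def summarize(episode_returns):
    """Mean and population standard deviation of per-episode returns."""
    return fmean(episode_returns), pstdev(episode_returns)


def relative_gain(baseline_mean, trained_mean):
    """Fractional improvement of the trained policy over the baseline."""
    return (trained_mean - baseline_mean) / baseline_mean


if __name__ == "__main__":
    # Mean rewards reported in this README.
    random_mean = 9.800666098251863
    dqn_mean = 18.4022157
    print(f"relative gain: {relative_gain(random_mean, dqn_mean):.1%}")
```

On these figures the trained policy scores roughly 88% higher than random action selection. Both averages come from only 10 episodes each, so printing `std_reward` alongside `mean_reward` helps judge how stable the comparison is.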