Reinforcement Learning -- Policy Network -- TensorFlow

BigSnakeLin

Article link: https://blog.csdn.net/weixin_43822600/article/details/100161620

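This article shows how to implement a policy network with TensorFlow, focusing on how TensorFlow is used to build a policy network for reinforcement learning.

(Summary generated automatically by CSDN.)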

Implementing a Policy Network with TensorFlow

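# Baseline: play 10 episodes of CartPole-v0 with uniformly random actions.
# Written for TensorFlow 1.x and the classic gym API (gym < 0.26), where
# env.reset() returns only the observation and env.step() returns 4 values.
import tensorflow as tf
import numpy as np
import gym

env = gym.make('CartPole-v0')
env.reset()
random_episodes = 0
reward_sum = 0
while random_episodes < 10:
    # env.render()  # uncomment to watch the cart
    # Action 0 pushes the cart left, action 1 pushes it right.
    observation, reward, done, _ = env.step(np.random.randint(0, 2))
    reward_sum += reward
    if done:
        random_episodes += 1
        print('Reward for the episode was :', reward_sum)
        reward_sum = 0
        env.reset()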

Reward for the episode was : 11.0
Reward for the episode was : 31.0
Reward for the episode was : 46.0
Reward for the episode was : 18.0
Reward for the episode was : 10.0
Reward for the episode was : 25.0
Reward for the episode was : 13.0
Reward for the episode was : 25.0
Reward for the episode was : 16.0
Reward for the episode was : 14.0
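As a reference point for the trained policy, the ten random-policy returns printed above can be summarized in a few lines. This is a minimal sketch that only reuses the numbers from that sample run; `random_returns` and `MAX_RETURN` are names introduced here for illustration, and the 200-step cap is the episode limit of `CartPole-v0`.

```python
# Episode returns copied from the random-baseline run above.
random_returns = [11.0, 31.0, 46.0, 18.0, 10.0, 25.0, 13.0, 25.0, 16.0, 14.0]

# CartPole-v0 gives +1 per step and ends every episode after at most 200 steps.
MAX_RETURN = 200.0

mean_return = sum(random_returns) / len(random_returns)
best_return = max(random_returns)

print(f"mean random return: {mean_return:.1f}")  # → 20.9
print(f"best random return: {best_return:.1f}")  # → 46.0
print(f"mean as a fraction of the cap: {mean_return / MAX_RETURN:.2f}")  # → 0.10
```

A random policy keeps the pole up for only about 21 steps on average, far from gym's "solved" threshold for CartPole-v0 (an average return of 195 over 100 consecutive episodes).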

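# Build the reinforcement-learning policy network
# Hyperparameters
H = 50              # number of hidden-layer units
batch_size = 25
learning_rate = 0.1
D = 4               # observation dimension (CartPole has 4 state variables)
gamma = 0.99        # discount factor for rewards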
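`gamma` turns the per-step rewards of an episode into discounted returns, G_t = r_t + gamma * G_{t+1}, computed backwards from the last step. The visible part of the article does not show this step, so the following is a hypothetical sketch: `discount_rewards` and `normalize` are names chosen here, and normalizing the returns to zero mean and unit variance is a common variance-reduction trick in REINFORCE-style policy gradients, not something the source confirms.

```python
def discount_rewards(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step t of one episode."""
    discounted = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        discounted[t] = running
    return discounted


def normalize(values):
    """Shift to zero mean and scale to unit standard deviation (hypothetical helper)."""
    n = len(values)
    if n == 0:
        return []
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    if std == 0.0:
        return [0.0] * n
    return [(v - mean) / std for v in values]


# CartPole gives +1 for every step the pole stays up.
returns = discount_rewards([1.0, 1.0, 1.0], gamma=0.99)
print(returns)  # approximately [2.9701, 1.99, 1.0]
advantages = normalize(returns)
```

Earlier steps get larger returns because more future reward follows them; after normalization, actions taken early in a long episode are pushed up and actions just before failure are pushed down.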

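# Placeholders: build an MLP
# (TensorFlow 1.x API: tf.placeholder and tf.contrib were removed in TensorFlow 2.x)
observations = tf.placeholder(tf.float32, [None, D], name='input_x')  # batch of observations
w1 = tf.get_variable('w1', shape=[D, H], initializer=tf.contrib.layers.xavier_initializer())  # input-to-hidden weights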
layer1 = tf
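The listing stops partway through the first layer (`layer1 = tf`). For orientation only, here is a NumPy sketch of a forward pass through a policy MLP of this shape: `D` inputs, `H` hidden units, and one sigmoid output read as the probability of choosing action 1. The ReLU hidden activation, the second weight matrix `w2`, the absence of biases, the sigmoid output, and the sampling convention are all assumptions, not recovered from the article. `glorot_uniform` is a hypothetical helper that mimics the uniform Xavier initialization that `tf.contrib.layers.xavier_initializer()` uses by default.

```python
import numpy as np

D, H = 4, 50  # observation size and hidden units, as in the hyperparameters above
rng = np.random.default_rng(0)


def glorot_uniform(fan_in, fan_out):
    """Uniform Xavier/Glorot init: U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))


# Assumed architecture: D -> H (ReLU) -> 1 (sigmoid), no bias terms.
w1 = glorot_uniform(D, H)
w2 = glorot_uniform(H, 1)


def action_probability(observations):
    """Map a batch of observations [N, D] to probabilities [N, 1] of action 1."""
    layer1 = np.maximum(observations @ w1, 0.0)  # ReLU hidden layer
    score = layer1 @ w2
    return 1.0 / (1.0 + np.exp(-score))  # sigmoid


def sample_action(observation):
    """Pick action 1 with the network's probability, else action 0 (assumed convention)."""
    prob = action_probability(observation.reshape(1, D))[0, 0]
    return 1 if rng.uniform() < prob else 0


obs = np.array([0.01, -0.02, 0.03, 0.04])  # hypothetical CartPole-like state
p = action_probability(obs.reshape(1, D))
a = sample_action(obs)
print(p, a)
```

In TensorFlow 1.x the same computation would be written with `tf.matmul`, `tf.nn.relu`, and `tf.nn.sigmoid` applied to the `observations` placeholder.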
