Reinforcement Learning: DQN

Latest recommended article published on 2024-06-24 14:58:00

q19930928

Article link: https://blog.csdn.net/q19930928/article/details/87895525

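This article introduces DQN (Deep Q-Network), a reinforcement learning method, and focuses on how its two core components are built: the evaluation network (evaluate_net) and the target network (target_net).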

DQN uses two neural networks: an evaluation network (eval_net) and a target network (target_net).
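To make the role of the two networks concrete, here is a minimal pure-Python sketch. It is not the article's TensorFlow code. It assumes a single linear layer instead of the hidden layer built in the listing, and the helper names (`init_params`, `q_values`, `td_target`, `sync_target`) are hypothetical. In the usual DQN setup, the evaluation network is the one being trained, while the target network is a periodically refreshed copy used only to compute the learning target.

```python
import copy
import random


def init_params(n_features, n_actions, seed=0):
    """Tiny linear Q-function (an assumption; the article uses a hidden layer).

    Weights ~ N(0, 0.3) and biases = 0.1, mirroring the initializers
    used in the TensorFlow listing.
    """
    rng = random.Random(seed)
    w = [[rng.gauss(0.0, 0.3) for _ in range(n_actions)]
         for _ in range(n_features)]
    b = [0.1] * n_actions
    return {"w": w, "b": b}


def q_values(params, state):
    """Return one Q-value per action for the given state."""
    n_actions = len(params["b"])
    return [
        sum(s * params["w"][i][a] for i, s in enumerate(state)) + params["b"][a]
        for a in range(n_actions)
    ]


def td_target(target_params, reward, next_state, gamma=0.9, done=False):
    """Learning target r + gamma * max_a Q_target(s', a), from the target net."""
    if done:
        return reward
    return reward + gamma * max(q_values(target_params, next_state))


def sync_target(eval_params):
    """Hard update: the target net becomes an independent copy of the eval net."""
    return copy.deepcopy(eval_params)


eval_params = init_params(n_features=2, n_actions=3)
target_params = sync_target(eval_params)
```

Because `sync_target` makes a deep copy, later updates to `eval_params` do not change `target_params` until the next sync, which keeps the learning target fixed between syncs.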

------------------ build evaluate_net ------------------

    self.s = tf.placeholder(tf.float32, [None, self.n_features], name='s')  # input
    self.q_target = tf.placeholder(tf.float32, [None, self.n_actions], name='Q_target')  # for calculating loss
    with tf.variable_scope('eval_net'):
        # c_names(collections_names) are the collections to store variables
        c_names, n_l1, w_initializer, b_initializer = \
            ['eval_net_params', tf.GraphKeys.GLOBAL_VARIABLES], 10, \
            tf.random_normal_initializer(0., 0.3), tf.constant_initializer(0.1)  # config of layers

        # first layer; the collections are used later when assigning parameters to the target net
        with tf.variable_scope('l1'):
            w1 = tf.get_variable('w1', [self.n_features, n_l1], initializer=w_initializer, collections=c_names)
            b1 =