1. What is Reinforcement Learning?
Overview:
An example:
Another example:
2. Markov Decision Process
- Mathematical formulation of the RL problem
- Markov property: Current state completely characterises the state of the world
**The process:**
The optimal policy π*
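The agent-environment loop behind the MDP formulation can be sketched as follows; the `Environment` class and its `reset`/`step` methods here are hypothetical stand-ins for illustration, not a real library API:

```python
import random

class Environment:
    """A toy environment with made-up dynamics (illustrative only)."""

    def reset(self):
        return 0  # initial state s_0

    def step(self, action):
        # hypothetical dynamics: reward 1 for action 1, episode ends randomly
        next_state = random.randint(0, 3)
        reward = 1.0 if action == 1 else 0.0
        done = random.random() < 0.1
        return next_state, reward, done

def run_episode(env, policy):
    """One episode of interaction: agent sees state, picks action,
    environment returns reward and next state, until the episode ends."""
    state, total_reward, done = env.reset(), 0.0, False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
    return total_reward

ret = run_episode(Environment(), policy=lambda s: 1)
print(ret)  # total reward collected over one episode
```

The policy here is a plain function mapping state to action; in RL the goal is to find the policy π* that maximises this cumulative (discounted) reward.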
3. Q-learning
Definitions: Value function and Q-value function:
Bellman equation:
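For the optimal Q-value function, the Bellman equation takes the form:

```latex
Q^*(s, a) = \mathbb{E}_{s' \sim \mathcal{E}} \left[\, r + \gamma \max_{a'} Q^*(s', a') \;\middle|\; s, a \,\right]
```

Intuition: if the optimal Q-values for the next time step are known, then the best strategy is to take the action that maximises $r + \gamma \max_{a'} Q^*(s', a')$.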
Optimizing the policy:
**Solving for the optimal policy: Q-learning**
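As a concrete illustration, here is tabular Q-learning on a toy chain task; the environment, hyperparameters, and reward scheme below are illustrative assumptions, not from the lecture:

```python
import random
from collections import defaultdict

random.seed(0)                         # for reproducibility of this sketch

GOAL, ACTIONS = 4, [-1, +1]            # states 0..4; actions: move left/right
alpha, gamma, eps = 0.5, 0.9, 0.3      # learning rate, discount, exploration

Q = defaultdict(float)                 # Q[(state, action)], initialised to 0

def step(s, a):
    """Toy dynamics: reward 1.0 only for reaching the goal state."""
    s2 = max(0, min(GOAL, s + a))
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

for episode in range(500):
    s = 0
    for t in range(1000):              # cap episode length
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a_: Q[(s, a_)])
        s2, r, done = step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        target = r + (0.0 if done else gamma * max(Q[(s2, a_)] for a_ in ACTIONS))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2
        if done:
            break

print(Q[(3, +1)])  # approaches 1.0, the reward for stepping into the goal
```

The update rule is exactly the Bellman equation applied as an iterative target; deep Q-learning replaces the table `Q` with a neural network approximator.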
An example: Playing Atari Games
**Q-network Architecture**
**Training the Q-network: Experience Replay**
Deep Q-Learning with Experience Replay
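The core data structure behind experience replay can be sketched as follows; the capacity and batch size are illustrative choices:

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores transitions (s, a, r, s', done) and serves random minibatches,
    breaking the correlation between consecutive samples."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions fall off

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

buf = ReplayBuffer()
for t in range(100):                    # fill with dummy transitions
    buf.push(t, 0, 1.0, t + 1, False)
batch = buf.sample(32)                  # random minibatch for a Q-network update
print(len(batch))
```

Training on random minibatches drawn from this buffer, rather than on consecutive frames, is what makes the Q-network updates closer to i.i.d. and stabilises learning.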
4. Policy Gradients
Intuition:
Variance reduction:
Variance reduction: Baseline
How to choose the baseline?
A better baseline: push up the probability of an action taken in a state if that action was better than the **expected value of what we should get from that state**
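A small numerical sketch of why a baseline helps (the numbers are made up for illustration): each sample of the gradient estimator is weighted by score × (G − b), where the score term has mean zero, so subtracting a constant b leaves the expected gradient unchanged while shrinking the magnitude, and hence the variance, of the weights:

```python
import random
import statistics

random.seed(0)
N = 100_000
# stand-in for grad log pi(a|s): a mean-zero score term
scores = [random.choice([-1.0, 1.0]) for _ in range(N)]
# sampled returns G, centred around 10 with some noise
returns = [10.0 + random.gauss(0.0, 2.0) for _ in range(N)]

def grad_samples(b):
    """Per-sample policy-gradient terms: score * (G - b)."""
    return [s * (g - b) for s, g in zip(scores, returns)]

no_baseline = grad_samples(0.0)
with_baseline = grad_samples(statistics.mean(returns))  # b = average return

# Both estimators have (approximately) the same mean, but the baselined
# version has far lower variance:
print(statistics.pvariance(no_baseline))    # large
print(statistics.pvariance(with_baseline))  # small
```

Choosing b close to the expected return from the state is exactly the idea the Actor-Critic algorithm builds on, with a learned critic estimating that expected value.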
**Actor-Critic Algorithm**
5. Applications of REINFORCE
5.1 Recurrent Attention Model (RAM)
Example results:
**5.2 AlphaGo**
6. Summary
- Policy gradients: very general, but suffer from high variance, so they require a lot of samples. Challenge: sample efficiency
- Q-learning: does not always work, but when it works it is usually more sample-efficient. Challenge: exploration
- Guarantees:
Policy Gradients: Converges to a local optimum of J(θ), often good enough!
Q-learning: Zero guarantees, since you are approximating the Bellman equation with a complicated function approximator