Deep Reinforcement Learning Column
Notes from my study of Udacity's Deep Reinforcement Learning course.
小朱 智能驾驶
DRL — Policy Based Methods — Chapter 3-4 Proximal Policy Optimization
3.4.3 Beyond REINFORCE. REINFORCE works as follows: First, we initialize a random policy $\pi_\theta(a;s)$, and using the policy we collect a trajectory, or a list of (stat…
Original · 2020-06-22 16:25:53 · 198 views · 0 comments
DRL — Policy Based Methods — Chapter 3-3 Policy Gradient Methods
3.3.1 What are Policy Gradient Methods? Policy-based methods are a class of algorithms that search directly for the optimal policy without simultaneously maintaining value function estimates.
Original · 2020-06-21 11:10:58 · 206 views · 0 comments
DRL --- Policy Based Methods --- Chapter 3-2 Introduction to Policy-Based Method
Deep Reinforcement Learning --- Value Based Methods --- Chapter 2-4 Optimize Your GitHub Profile
Original · 2020-06-20 19:55:18 · 204 views · 0 comments
Deep Reinforcement Learning --- Value Based Methods --- Chapter 2-3 Navigation
1. Unity ML-Agents. 2. The Environment: Introduction. A reward of +1 is provided for collecting a yellow banana, and a reward of -1 is provided for collecting a blue banana. Thus, th…
Original · 2020-06-05 09:41:07 · 128 views · 0 comments
Deep Reinforcement Learning --- Value Based Methods --- Chapter 2-2 Deep Q-Networks
2.2.1 From RL to Deep RL. So far, you've solved many of your own reinforcement learning problems, using solution method...
Original · 2020-06-02 21:40:02 · 446 views · 0 comments
Chapter 1 - 10: RL in Continuous Spaces
1.10.1 Introducing Arpan. 1.10.2 Lesson Overview. Reinforcement learning problems are typically framed as Markov Decision Processes, or MDPs. An MDP consists of ...
Original · 2020-04-24 23:03:17 · 944 views · 0 comments
Chapter 1 - 9: Solve OpenAI Gym's Taxi-v2 Task
1.9.1 Introduction
Original · 2020-04-15 17:43:47 · 662 views · 0 comments
Chapter 1 - 8: Temporal-Difference Methods
1.8.1 Introduction. Monte Carlo learning needs breaks: it needs the episode to end so that the return can be calculated and then used as an estimate for the ...
Original · 2020-04-13 00:59:24 · 462 views · 0 comments
Chapter 1 - 7: Monte Carlo Methods
1.7.1 Review. In order to rigorously define a reinforcement learning task, we generally use a Markov Decision Process (MDP) to model the environment. The MDP specifi...
Original · 2020-04-06 21:27:00 · 629 views · 0 comments
Chapter 1 - 6: The RL Framework: The solution
1.6.2 Policies. We've seen that we use a Markov decision process, or MDP, as a formal definition of the problem that we'd like to solve with reinforcement l...
Original · 2020-04-01 00:14:32 · 399 views · 0 comments
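Chapter 2: PyTorch Basics, Chapter 2-1/2: Driver Installation Process
Chapter 2: PyTorch Basics. 2.1 Why choose PyTorch? PyTorch consists of four main packages: torch: a general-purpose array library similar to NumPy, which can convert tensors to the torch.cuda.FloatTensor type and run computations on the GPU. torch.autograd: a package for building computation graphs and obtaining gradients automatically. torch.nn: a neural network library with common layers and loss functions. torch.optim: a package with general-purpose optimization algorithms (...
Original · 2020-03-30 16:53:28 · 307 views · 0 comments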
Chapter 1 - 5 The RL Framework: The Problem
$r=\min(v_{x},v_{max})-0.005(v_{y}^{2}+v^{2}_{z})-0.05y^{2}-0.02$ $\Gamma(n) = (n-1)!\quad\forall n\in\mathbb{N}$ ...
Original · 2020-03-29 19:36:49 · 684 views · 0 comments
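As a rough illustration of the reward expression quoted in the preview above, the sketch below simply evaluates it term by term. The function name `locomotion_reward` and the sample values are hypothetical. The preview does not say what the symbols stand for, so reading `v_x`, `v_y`, `v_z` as velocity components, `y` as an offset, and `v_max` as a velocity cap is an assumption.

```python
def locomotion_reward(v_x, v_y, v_z, y, v_max):
    """Evaluate r = min(v_x, v_max) - 0.005*(v_y^2 + v_z^2) - 0.05*y^2 - 0.02.

    The meaning of each argument is an assumption; the preview gives only the formula.
    """
    capped_term = min(v_x, v_max)                    # min(v_x, v_max)
    velocity_penalty = 0.005 * (v_y ** 2 + v_z ** 2)  # 0.005 (v_y^2 + v_z^2)
    offset_penalty = 0.05 * y ** 2                    # 0.05 y^2
    constant_penalty = 0.02                           # fixed per-step cost
    return capped_term - velocity_penalty - offset_penalty - constant_penalty


# Hypothetical sample values, chosen only to exercise each term.
r_capped = locomotion_reward(v_x=2.0, v_y=0.0, v_z=0.0, y=0.0, v_max=1.0)
r_mixed = locomotion_reward(v_x=0.5, v_y=1.0, v_z=1.0, y=1.0, v_max=1.0)
print(r_capped, r_mixed)
```

With the first set of values, the first term is capped at `v_max`, so only the constant 0.02 is subtracted. With the second set, every penalty term contributes.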
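Chapter 1 - 2 Welcome to Deep Reinforcement Learning
Real-world problems must first be abstracted as Markov Decision Processes (MDPs) before reinforcement learning methods can be used to solve them. Course outline: 1. Reinforcement learning fundamentals 2. Value-based methods 3. Policy-based methods 4. Multi-agent reinforcement learning...
Original · 2020-03-08 10:37:05 · 161 views · 0 comments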