吴恩达Coursera, 机器学习专项课程, Machine Learning:Unsupervised Learning, Recommenders, Reinforcement Learning第...

Practice quiz: Reinforcement learning introduction

第 1 个问题:You are using reinforcement learning to control a four legged robot. The position of the robot would be its _____.

image
【正确】state

第 2 个问题:You are controlling a Mars rover. You will be very very happy if it gets to state 1 (significant scientific discovery), slightly happy if it gets to state 2 (small scientific discovery), and unhappy if it gets to state 3 (rover is permanently damaged). To reflect this, choose a reward function so that:

image
【正确】R(1) > R(2) > R(3), where R(1) and R(2) are positive and R(3) is negative.
【解释】Good job!

第 3 个问题:You are using reinforcement learning to fly a helicopter. Using a discount factor of 0.75, your helicopter starts in some state and receives rewards -100 on the first step, -100 on the second step, and 1000 on the third and final step (where it has reached a terminal state). What is the return?

image
【正确】-100 - 0.75100 + 0.75^21000

第 4 个问题:Given the rewards and actions below, compute the return from state 3 with a discount factor of \gamma = 0.25.

image
【正确】6.25 Correct
【解释】If starting from state 3, the rewards are in states 3, 2, and 1. The return is 0+(0.25)×0+(0.25) ^2×100=6.25.

Practice quiz: State-action value function

第 1 个问题:Which of the following accurately describes the state-action value function Q(s,a)?

image
【正确】It is the return if you start from state s, take action a (once), then behave optimally after that.

第 2 个问题:You are controlling a robot that has 3 actions: ← (left), → (right) and STOP. From a given state s, you have computed Q(s, ←) = -10, Q(s, →) = -20, Q(s, STOP) = 0.What is the optimal action to take in state s?

image
【正确】STOP

第 3 个问题:For this problem, \gamma = 0.25. The diagram below shows the return and the optimal action from each state. Please compute Q(5, ←).

image
【正确】0.625

Practice quiz: Continuous state spaces

第 1 个问题:The Lunar Lander is a continuous state MDP because:

image
【正确】The state contains numbers such as position and velocity that are continuous valued

第 2 个问题:In the learning algorithm described in the videos, we repeatedly create an artificial training set to which we apply supervised learning where the input x = (s,a) and the target, constructed using Bellman’s equations, is y = _____?

image
【正确】见上图

第 3 个问题:You have reached the final practice quiz of this class! What does that mean? (Please check all the answers, because all of them are correct!)

image
【正确】The DeepLearning.AI and Stanford Online teams would like to give you a round of applause!
【正确】You deserve to celebrate!
【正确】Andrew sends his heartfelt congratulations to you!
【正确】What an accomplishment -- you made it!

  • 1
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

ZhemgLee

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值