Andrew Ng Coursera, Machine Learning Specialization, Machine Learning: Unsupervised Learning, Recommenders, Reinforcement Learning ...

ZhemgLee

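Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.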

Article link: https://blog.csdn.net/qq_39444290/article/details/128161693

Practice quiz: Reinforcement learning introduction

Question 1: You are using reinforcement learning to control a four-legged robot. The position of the robot would be its _____.

[Correct] state

Question 2: You are controlling a Mars rover. You will be very very happy if it gets to state 1 (significant scientific discovery), slightly happy if it gets to state 2 (small scientific discovery), and unhappy if it gets to state 3 (rover is permanently damaged). To reflect this, choose a reward function so that:

[Correct] R(1) > R(2) > R(3), where R(1) and R(2) are positive and R(3) is negative.
[Explanation] Good job!

Question 3: You are using reinforcement learning to fly a helicopter. Using a discount factor of 0.75, your helicopter starts in some state and receives rewards -100 on the first step, -100 on the second step, and 1000 on the third and final step (where it has reached a terminal state). What is the return?

[Correct] -100 - 0.75×100 + 0.75^2×1000

Question 4: Given the rewards and actions below, compute the return from state 3 with a discount factor of γ = 0.25.

[Correct] 6.25
[Explanation] If starting from state 3, the rewards are in states 3, 2, and 1. The return is 0 + (0.25)×0 + (0.25)^2×100 = 6.25.
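
The two return calculations above (Questions 3 and 4) can be checked with a few lines of Python. This is a minimal sketch; the helper name `discounted_return` is illustrative and not taken from the course materials. It assumes the first reward is undiscounted and each later reward is multiplied by one more factor of the discount.

```python
def discounted_return(rewards, gamma):
    """Sum rewards[t] * gamma**t over the steps of one episode."""
    return sum(r * gamma**t for t, r in enumerate(rewards))


# Question 3: helicopter, discount factor 0.75
helicopter_return = discounted_return([-100, -100, 1000], gamma=0.75)

# Question 4: rewards collected in states 3, 2, 1 with discount factor 0.25
rover_return = discounted_return([0, 0, 100], gamma=0.25)

print(helicopter_return)  # 387.5
print(rover_return)       # 6.25
```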

Practice quiz: State-action value function

Question 1: Which of the following accurately describes the state-action value function Q(s,a)?

[Correct] It is the return if you start from state s, take action a (once), then behave optimally after that.

Question 2: You are controlling a robot that has 3 actions: ← (left), → (right) and STOP. From a given state s, you have computed Q(s, ←) = -10, Q(s, →) = -20, Q(s, STOP) = 0. What is the optimal action to take in state s?

[Correct] STOP
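
Picking the optimal action is just an argmax over the Q values for the current state. A small sketch, with the action names written as plain strings for illustration:

```python
# Q(s, a) values from the question, keyed by action name.
q_values = {"left": -10, "right": -20, "stop": 0}

# The optimal action is the one with the largest Q(s, a).
best_action = max(q_values, key=q_values.get)

print(best_action)  # stop
```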

Question 3: For this problem, γ = 0.25. The diagram below shows the return and the optimal action from each state. Please compute Q(5, ←).

[Correct] 0.625
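
The diagram is not preserved here, so the following sketch rests on an assumption: a six-state rover layout in which states 1 and 6 are terminal with rewards 100 and 40, and every other state has reward 0. That layout is consistent with the 6.25 and 0.625 answers in this post, but it is not confirmed by the text. Under it, Q(5, ←) = R(5) + γ × (optimal return from state 4) = 0 + 0.25 × 2.5 = 0.625.

```python
GAMMA = 0.25
# ASSUMED layout (the original diagram is missing): terminal rewards at both ends.
REWARDS = {1: 100.0, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0, 6: 40.0}
TERMINAL = {1, 6}
ACTIONS = {"left": -1, "right": +1}


def optimal_returns(n_sweeps=50):
    """Value iteration: v[s] is the return from s when acting optimally."""
    v = {s: 0.0 for s in REWARDS}
    for _ in range(n_sweeps):
        for s in REWARDS:
            if s in TERMINAL:
                v[s] = REWARDS[s]
            else:
                best_next = max(v[s + step] for step in ACTIONS.values())
                v[s] = REWARDS[s] + GAMMA * best_next
    return v


def q_value(v, state, action):
    """Q(s, a): take `action` once from `state`, then behave optimally."""
    return REWARDS[state] + GAMMA * v[state + ACTIONS[action]]


v = optimal_returns()
q_5_left = q_value(v, 5, "left")
print(q_5_left)  # 0.625
```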

Practice quiz: Continuous state spaces

Question 1: The Lunar Lander is a continuous state MDP because:

[Correct] The state contains numbers such as position and velocity that are continuous valued.

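Question 2: In the learning algorithm described in the videos, we repeatedly create an artificial training set to which we apply supervised learning where the input x = (s,a) and the target, constructed using Bellman's equations, is y = _____?

[Correct] See the figure above (image not preserved). The Bellman-equation target is y = R(s) + γ max_{a'} Q(s', a'), where s' is the state reached after taking action a in state s.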
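
As a rough illustration of how such targets could be built for a batch of experience tuples, here is a plain-Python sketch. The function name `bellman_targets` and the `dones` flag are assumptions made for this example; using y = R(s) alone for terminal transitions is a common convention, not something stated in the quiz.

```python
def bellman_targets(rewards, next_q_values, dones, gamma):
    """Compute y = R(s) + gamma * max_a' Q(s', a') for each transition.

    rewards:        list of R(s) values
    next_q_values:  list of lists, Q(s', a') for every action a'
    dones:          list of bools, True if s' is terminal (assumed convention:
                    the target is then just R(s))
    """
    targets = []
    for reward, q_next, done in zip(rewards, next_q_values, dones):
        if done:
            targets.append(reward)
        else:
            targets.append(reward + gamma * max(q_next))
    return targets


# Two made-up transitions: one ordinary step, one that ends the episode.
ys = bellman_targets(
    rewards=[0.0, 100.0],
    next_q_values=[[1.0, 2.5], [0.0, 0.0]],
    dones=[False, True],
    gamma=0.25,
)
print(ys)  # [0.625, 100.0]
```

In a full training loop these targets would be paired with the inputs x = (s, a) and fed to a supervised regression step; that part is omitted here.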

Question 3: You have reached the final practice quiz of this class! What does that mean? (Please check all the answers, because all of them are correct!)

[Correct] The DeepLearning.AI and Stanford Online teams would like to give you a round of applause!
[Correct] You deserve to celebrate!
[Correct] Andrew sends his heartfelt congratulations to you!
[Correct] What an accomplishment -- you made it!
