Reinforcement Learning Exercise 5.1 & 5.2

YeXiang\^-^/

Article link: https://blog.csdn.net/ballade2012/article/details/98336406

Exercise 5.1 Consider the diagrams on the right in Figure 5.1. Why does the estimated value function jump up for the last two rows in the rear? Why does it drop off for the whole last row on the left? Why are the frontmost values higher in the upper diagrams than in the lower?
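The estimated value function jumps up for the last two rows in the rear because those rows correspond to player sums of 20 and 21, the only sums on which the player sticks under the policy being evaluated. Holding such a strong hand, the player wins with much higher probability; on any lower sum the player keeps hitting and frequently goes bust. The value drops off along the whole last row on the left because there the dealer is showing an ace, which can count as either 1 or 11 and so makes a strong dealer hand more likely, lowering the player's chance of winning. The frontmost values are higher in the upper diagrams than in the lower ones because in the upper diagrams the player has a usable ace: the frontmost row is a player sum of 12, and with a usable ace the player can hit without going bust on the next card, since the ace can fall back to counting as 1.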
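The three effects can be checked numerically with a rough Monte Carlo sketch. It assumes the usual textbook setup (infinite deck, dealer sticks on 17 or more, player sticks only on 20 or 21) and ignores the special handling of naturals. The helper names `draw_card`, `hand_total`, `play_from` and `estimate_value` are hypothetical and chosen only for this illustration.

```python
import random

def draw_card(rng):
    # Infinite deck: ace is 1, face cards count as 10.
    return min(rng.randint(1, 13), 10)

def hand_total(cards):
    # Return (best total, usable_ace): one ace counts as 11 if that does not bust.
    total = sum(cards)
    if 1 in cards and total + 10 <= 21:
        return total + 10, True
    return total, False

def play_from(player_cards, dealer_showing, rng):
    # One episode from a fixed start: player sticks only on 20 or 21,
    # dealer sticks on 17 or more. Returns +1 (win), 0 (draw) or -1 (loss).
    player = list(player_cards)
    while hand_total(player)[0] < 20:
        player.append(draw_card(rng))
        if hand_total(player)[0] > 21:
            return -1  # player goes bust
    dealer = [dealer_showing, draw_card(rng)]
    while hand_total(dealer)[0] < 17:
        dealer.append(draw_card(rng))
    p, d = hand_total(player)[0], hand_total(dealer)[0]
    if d > 21 or p > d:
        return 1
    return 0 if p == d else -1

def estimate_value(player_cards, dealer_showing, episodes=20000, seed=0):
    # Average return over many simulated episodes from the same start state.
    rng = random.Random(seed)
    total = sum(play_from(player_cards, dealer_showing, rng) for _ in range(episodes))
    return total / episodes

v19 = estimate_value([10, 9], dealer_showing=5)          # sum 19: player still hits
v20 = estimate_value([10, 10], dealer_showing=5)         # sum 20: player sticks
v20_vs_ace = estimate_value([10, 10], dealer_showing=1)  # dealer shows an ace
v12_usable = estimate_value([1, 1], dealer_showing=5)    # sum 12 with a usable ace
v12_hard = estimate_value([10, 2], dealer_showing=5)     # sum 12 without one

print(v19, v20, v20_vs_ace, v12_usable, v12_hard)
```

Under these assumptions the estimate for 20 should come out well above the one for 19, the estimate against a dealer ace below the one against a 5, and the usable-ace 12 above the hard 12. The exact numbers depend on the seed and the number of episodes.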

Exercise 5.2 Suppose every-visit MC was used instead of first-visit MC on the blackjack task. Would you expect the results to be very different? Why or why not?

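The results would be the same. In blackjack a state (player's sum, dealer's showing card, usable ace) can never occur twice within one episode: every hit either increases the player's sum or turns a usable ace into a non-usable one. Each state is therefore visited at most once per episode, so first-visit and every-visit MC average exactly the same returns. It is also true that all rewards are zero except on the last step, so with no discounting every step of an episode has the same return, but that alone would not make the two methods agree if states could repeat.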
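A small sketch makes the argument concrete. The function `mc_prediction` and the hand-written episodes below are hypothetical and made up for illustration. Each episode is a list of `(state, reward)` pairs, where the reward is the one received after leaving that state. The looping task at the end is not blackjack; it is only there to show what happens when a state can repeat.

```python
from collections import defaultdict

def mc_prediction(episodes, first_visit, gamma=1.0):
    # Average the observed returns per state, using either first-visit
    # or every-visit Monte Carlo.
    returns = defaultdict(list)
    for episode in episodes:
        # Walk backwards to compute the return that follows each step.
        g = 0.0
        step_returns = []
        for state, reward in reversed(episode):
            g = gamma * g + reward
            step_returns.append((state, g))
        step_returns.reverse()
        seen = set()
        for state, g in step_returns:
            if first_visit and state in seen:
                continue  # first-visit: ignore later visits in this episode
            seen.add(state)
            returns[state].append(g)
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}

# Blackjack-like episodes: state = (player sum, dealer showing, usable ace).
# No state repeats within an episode.
blackjack_episodes = [
    [((13, 5, True), 0), ((18, 5, True), 0), ((14, 5, False), 0), ((20, 5, False), 1)],
    [((13, 5, True), 0), ((16, 5, True), 0), ((13, 5, False), -1)],
]
bj_first = mc_prediction(blackjack_episodes, first_visit=True)
bj_every = mc_prediction(blackjack_episodes, first_visit=False)
print(bj_first == bj_every)  # True: identical estimates

# Hypothetical looping task where state "A" repeats within an episode.
looping_episodes = [
    [("A", 0), ("A", 0), ("A", 1)],
    [("A", -1)],
]
loop_first = mc_prediction(looping_episodes, first_visit=True)
loop_every = mc_prediction(looping_episodes, first_visit=False)
print(loop_first["A"], loop_every["A"])  # 0.0 0.5
```

In the looping task the rewards are also zero until the last step, yet the two methods disagree, because every-visit MC counts the return of the long episode three times. The absence of repeated states is what makes the methods coincide on blackjack.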
