Machine Learning Week 10 (Andrew Ng)

Reinforcement learning

1. Reinforcement learning introduction

1.1. What is Reinforcement Learning?

The key idea: rather than telling the algorithm the right output y for every single input, you only have to specify a reward function that tells it when it is doing well and when it is doing poorly.

1.2. Mars rover example
1.3. The return in Reinforcement learning

The first reward is discounted by $\gamma^0 = 1$: the return is $R_1 + \gamma R_2 + \gamma^2 R_3 + \cdots$.
Select the direction to move by comparing the returns in the first two tables (always-left vs. always-right).
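The return described above can be sketched in a few lines. This is a minimal example using the lecture's Mars rover setup (reward 100 at the far left, γ = 0.5, starting in state 4 and always going left):

```python
def discounted_return(rewards, gamma):
    """Return = R1 + gamma*R2 + gamma^2*R3 + ... (first reward undiscounted)."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Mars rover: start in state 4, always go left, gamma = 0.5.
# Rewards collected along the way: 0, 0, 0, then 100 at the terminal state.
print(discounted_return([0, 0, 0, 100], gamma=0.5))  # 12.5
```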

1.4. Making decisions: Policies in reinforcement learning

For example, $\pi(2)$ is left while $\pi(5)$ is right. The number in parentheses is the state.
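A policy is just a mapping from state to action. A sketch for the six-state Mars rover (the dict below is one particular policy, matching the example with $\pi(2)$ = left and $\pi(5)$ = right; states 1 and 6 are terminal, so they need no action):

```python
# pi maps each non-terminal state to the action taken there.
pi = {2: "left", 3: "left", 4: "left", 5: "right"}

def act(state):
    """Follow the policy: return the action pi(state)."""
    return pi[state]

print(act(2), act(5))  # left right
```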

1.5. Review of key concepts


2. State-action value function

2.1. State-action value function definition

These values are computed iteratively.

2.2. State-action value function example
2.3. Bellman Equation

$Q(s,a) = R(s) + \gamma \max_{a'} Q(s', a')$
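Applying the Bellman equation repeatedly (value iteration) recovers the Q-values from the lecture. A minimal sketch for the six-state Mars rover, 0-indexed here (state 2 in the lecture is index 1), with rewards 100 and 40 at the two ends and γ = 0.5:

```python
# Value-iteration sketch for the six-state Mars rover.
R = [100, 0, 0, 0, 0, 40]   # rewards; only the two end states pay off
terminal = {0, 5}
gamma = 0.5

Q = {(s, a): 0.0 for s in range(6) for a in ("left", "right")}
for _ in range(100):  # repeat the Bellman backup until values settle
    for s in range(6):
        for a in ("left", "right"):
            if s in terminal:
                Q[(s, a)] = R[s]
            else:
                s2 = s - 1 if a == "left" else s + 1
                Q[(s, a)] = R[s] + gamma * max(Q[(s2, "left")],
                                               Q[(s2, "right")])

print(Q[(1, "left")], Q[(4, "right")])  # 50.0 20.0
```

These match the lecture's table: going left from state 2 is worth 0.5 × 100 = 50, and going right from state 5 is worth 0.5 × 40 = 20.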

2.4. Random (stochastic) environment

Sometimes the rover accidentally slips and ends up moving in the opposite direction from the one commanded.
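A stochastic step function can model this slipping. The misstep probability below (commanded action succeeds with probability 0.9) follows the lecture's example; the function name is illustrative:

```python
import random

def step(state, action, p=0.9, rng=random):
    """Move from state: the commanded action succeeds with probability p,
    otherwise the rover slips and moves the opposite way."""
    if rng.random() >= p:  # misstep: flip the action
        action = "right" if action == "left" else "left"
    return state - 1 if action == "left" else state + 1
```

With p = 1.0 the environment is deterministic again, which is a handy sanity check.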

3. Continuous state spaces

3.1. Example of continuous state space applications

Every state variable is continuous: the state is a vector of real numbers (e.g., position, velocity, orientation) rather than one of a small set of discrete states.

3.2. Lunar lander


3.3. Learning the state-value function

Q is initialized randomly at first; training the neural network on Bellman-equation targets gradually produces a better estimate of Q.
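The supervised targets for that training step can be sketched as follows. This is a minimal illustration, assuming γ = 0.995 as in the lunar-lander lab; `build_targets` and its inputs are illustrative names, and `q_values_next` stands in for the current network's output on the next states:

```python
import numpy as np

GAMMA = 0.995  # assumed discount factor (lunar-lander value)

def build_targets(experiences, q_values_next):
    """y = R(s) + gamma * max_a' Q(s', a'); for a terminal s', y = R(s).

    experiences: list of (reward, done) pairs from stored tuples (s, a, R(s), s')
    q_values_next: array of shape (batch, num_actions) with Q(s', .)
    """
    rewards = np.array([r for r, _ in experiences], dtype=float)
    done = np.array([d for _, d in experiences], dtype=bool)
    return rewards + GAMMA * q_values_next.max(axis=1) * (~done)

# Two transitions: one mid-episode, one terminal.
y = build_targets([(1.0, False), (5.0, True)],
                  np.array([[2.0, 3.0], [9.0, 9.0]]))
print(y)  # [1 + 0.995*3, 5.0]
```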

3.4. Algorithm refinement: Improved neural network architecture


3.5. Algorithm refinement: ε-greedy policy

With ε = 0.05, we pick an action at random 5% of the time (exploration) and the action that maximizes Q the other 95% of the time (exploitation). If we choose a bad ε, learning may take 100 times as long.
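The ε-greedy rule itself is short. A minimal sketch (the function name is illustrative; `q_values` holds the current Q-estimates for each action in the current state):

```python
import random

def epsilon_greedy(q_values, epsilon=0.05, rng=random):
    """With probability epsilon explore (random action);
    otherwise exploit (pick the action with the largest Q-value)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Setting epsilon=0 makes the rule purely greedy, which is useful for evaluating a trained agent.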

3.6. Algorithm refinement: Mini-batch and soft update

The idea of mini-batch gradient descent is to not use all 100 million training examples on every single iteration through this loop. Instead, we pick a smaller number, call it m′ — say, 1,000 — and on every step use a subset of m′ examples rather than all 100 million.
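Sampling such a mini-batch from the replay buffer can be sketched as (the function and buffer names are illustrative):

```python
import random

def sample_minibatch(replay_buffer, m_prime=1000, rng=random):
    """Pick m' examples (without replacement) instead of the whole buffer
    for a single gradient step."""
    k = min(m_prime, len(replay_buffer))
    return rng.sample(replay_buffer, k)

buffer = list(range(10_000))
batch = sample_minibatch(buffer, m_prime=1000)
print(len(batch))  # 1000
```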

  • Soft update
    Setting Q equal to $Q_{new}$ directly can make a very abrupt change to Q, so we blend the parameters instead:
    $W = 0.01\,W_{new} + 0.99\,W$
    $B = 0.01\,B_{new} + 0.99\,B$
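The blend above can be written as one line per parameter. A minimal sketch, treating the parameters as flat lists of numbers (a real network would apply this per weight tensor):

```python
TAU = 0.01  # soft-update rate from the lecture: W = 0.01*W_new + 0.99*W

def soft_update(old, new, tau=TAU):
    """Blend the new parameters into the old ones instead of replacing them,
    so Q changes gradually rather than abruptly."""
    return [tau * n + (1 - tau) * o for o, n in zip(old, new)]

print(soft_update([1.0, 0.0], [0.0, 1.0]))  # ≈ [0.99, 0.01]
```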
3.7. The state of reinforcement learning


Summary

