RL-Camp-Recap

Reinforcement Learning Camp

Author: Yijia Shaw

Camp held by: Baidu Inc.

Brief intro

Reinforcement learning is a fast-developing branch of AI. Because it learns from reward signals rather than labeled data, its training and performance are not limited by the amount of labeled data available, which is a great advantage over supervised or semi-supervised learning.

Course review

Through this course, led by Instructors Ke and Xiao, I learned some fundamental algorithms such as Q-learning, DQN, and PG (policy gradient). What impressed me most was the Q-learning part taught by Ke. I had studied this basic algorithm before, but I didn’t really understand the ideas underlying the code. Through Instructor Ke’s explanations and live videos, I think I have gained a better understanding of the Q-table, and of the relationship between the state transition probability and the probability in epsilon-greedy.
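To make the Q-table idea concrete, here is a minimal sketch of tabular Q-learning with epsilon-greedy action selection. It is not the camp's code: the five-state chain environment, the function names (`step`, `choose_action`, `train`) and the hyperparameter values are all assumptions chosen for illustration. Note that epsilon controls how the agent picks actions, while the state transition belongs to the environment (here it is deterministic).

```python
import random

N_STATES = 5          # states 0..4; state 4 is the goal (hypothetical toy environment)
ACTIONS = [0, 1]      # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic chain: reward 1.0 only when the goal state is reached."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def choose_action(q_table, state, epsilon, rng):
    """Epsilon-greedy: random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q_table[state])
    # Break ties randomly so the agent does not always favor the first action.
    return rng.choice([a for a in ACTIONS if q_table[state][a] == best])


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Assumed hyperparameters; returns the learned Q-table as a list of rows."""
    rng = random.Random(seed)
    q_table = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q_table, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # The target uses the best next action, whatever the agent does next.
            target = reward + (0.0 if done else gamma * max(q_table[next_state]))
            q_table[state][action] += alpha * (target - q_table[state][action])
            state = next_state
    return q_table


q_table = train()
# Greedy policy read off the table for every non-goal state.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

Each row of `q_table` holds the estimated value of moving left or right from one state. In this toy chain the greedy policy read from the table is "move right" in every non-goal state, and the learned values shrink by a factor of about `gamma` for each step away from the goal.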

Reflection

Besides, I have to reflect on my performance in the camp. I do need to improve my self-discipline. I was working on another project, so I didn’t put much effort into the lessons and assignments, but to be honest, there was enough time for both. So I think I have to improve my time management. I plan to review the course material later.

Some findings

Another interesting finding is that the reward signal (training curve) in reinforcement learning looks different from the training curve in supervised learning: the latter is smoother than the former. I think this is because the agent needs to trade off exploration against exploitation. When it explores a new environment, the reward can drop suddenly, but once it has fully exploited the environment, the performance improves and the score rises.
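One toy way to see the exploration effect is a two-armed bandit with epsilon-greedy selection, comparing the reward earned on exploratory steps with the reward earned on greedy steps. This is only a sketch under assumptions: the payoffs in `ARM_REWARDS`, the `run` function and the value of epsilon are made up for illustration and do not come from the camp material.

```python
import random
from statistics import mean

ARM_REWARDS = [0.2, 1.0]   # hypothetical fixed payoff of each arm


def run(steps=1000, epsilon=0.2, seed=0):
    """Return the rewards collected on exploratory steps and on greedy steps."""
    rng = random.Random(seed)
    estimates = [0.0, 0.0]   # running average reward per arm
    counts = [0, 0]
    explore_rewards, exploit_rewards = [], []
    for _ in range(steps):
        explore = rng.random() < epsilon
        if explore:
            arm = rng.randrange(len(ARM_REWARDS))          # try a random arm
        else:
            arm = max(range(len(ARM_REWARDS)), key=lambda a: estimates[a])
        reward = ARM_REWARDS[arm]
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        (explore_rewards if explore else exploit_rewards).append(reward)
    return explore_rewards, exploit_rewards


explore_rewards, exploit_rewards = run()
print("mean reward while exploring: ", round(mean(explore_rewards), 3))
print("mean reward while exploiting:", round(mean(exploit_rewards), 3))
```

Exploratory steps pick the weaker arm about half of the time, so their average reward sits below that of the greedy steps. Scattered through training, those lower-reward steps are one source of the dips in the curve.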

Future

Reinforcement learning is an interesting and promising field in machine learning. I’m interested in it and will devote more time to it 😃
