CS285 Lecture 1 Notes

Latest recommended article published on 2022-04-15 12:24:35

子让0830

Tags: reinforcement learning

Article link: https://blog.csdn.net/zhaohongjue_0830/article/details/115498114

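This article introduces the basic concepts of reinforcement learning, explores how deep learning solves decision-making problems through end-to-end learning, highlights the advantages of deep reinforcement learning in complex environments, and discusses current research topics such as going beyond learning from rewards, learning from demonstrations, and learning to predict.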

Lecture 1 Introduction and Course Overview

Deep learning helps us handle unstructured environments
- learns from large amounts of data
- usually applied to recognition problems, which are passive, rather than to decision-making problems
Reinforcement learning provides a formalism for behavior
- it is essentially a mathematical formalization of a decision-making problem.
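
To make the idea of formalizing a decision-making problem more concrete, here is a minimal sketch of the agent-environment loop. The `CorridorEnv` class, its reward scheme, and the two policies are hypothetical toy choices made for illustration; they are not taken from the lecture.

```python
class CorridorEnv:
    """Hypothetical toy environment: walk right along a corridor to reach the goal."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action: 0 = move left, 1 = move right (an assumed encoding)
        move = 1 if action == 1 else -1
        self.state = max(0, min(self.length, self.state + move))
        done = self.state == self.length
        reward = 1.0 if done else 0.0  # reward only when the goal is reached
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Roll out one episode and return the total reward collected."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # the agent decides
        state, reward, done = env.step(action)  # the environment responds
        total_reward += reward
        if done:
            break
    return total_reward


def always_right(state):
    return 1


def always_left(state):
    return 0


good_return = run_episode(CorridorEnv(), always_right)
bad_return = run_episode(CorridorEnv(), always_left)
```

In this sketch the policy that walks toward the goal collects more reward than the one that walks away from it, which is the kind of comparison an RL algorithm would exploit when searching for better behavior.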

Deep RL

One of the main advantages of using deep learning is that the computer can learn features through end-to-end learning, so humans do not need to design or discover them.
For standard RL:
- tasks are unlikely to have features in common, so features probably need to be designed for each task
- this is the main limiting factor for the application of RL
For Deep RL:
- features don't have to be designed by hand; it is an automated process

Traditional decision-making system:
- consists of many different parts
- perception first, then action
- designing the parts is sometimes very difficult: some features may be very helpful and relevant to the task but may not be apparent
End-to-end:
- maps perception to action directly
- allows the model to discover for itself what is valuable to pay attention to in the image.
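
As a rough illustration of mapping perception directly to action, the sketch below feeds raw pixel values into a single function that outputs action probabilities, with no hand-designed features in between. A single untrained linear layer stands in for a deep network here, and the function names, the tiny 2x2 image, and the number of actions are all assumptions made for the example.

```python
import math
import random


def softmax(logits):
    """Turn arbitrary scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_policy(num_pixels, num_actions, seed=0):
    """Random small weights: one row of pixel weights per action (untrained)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.01) for _ in range(num_pixels)]
            for _ in range(num_actions)]


def policy_probs(weights, pixels):
    """Map raw pixels straight to action probabilities."""
    logits = [sum(w * p for w, p in zip(row, pixels)) for row in weights]
    return softmax(logits)


# Hypothetical flattened 2x2 grayscale image with values in [0, 1].
image = [0.0, 0.5, 1.0, 0.25]
weights = init_policy(num_pixels=len(image), num_actions=3)
probs = policy_probs(weights, image)

# Pick the most probable action; a learning agent might sample instead.
action = max(range(len(probs)), key=lambda a: probs[a])
```

In a real deep RL system the weights would be trained so that whatever in the image matters for the task is picked up automatically; this sketch only shows the shape of the input-to-action mapping.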

Reinforcement Learning: algorithmic foundation

Deep models: allow reinforcement learning algorithms to solve complex problems end to end, so RL can be applied to general problems

Deep = can process complex sensory input

Reinforcement learning = can choose complex actions

• Basic reinforcement learning deals with maximizing rewards
Advanced topics:
- Learning reward functions from example (inverse reinforcement learning)
- Transferring knowledge between domains (transfer learning, meta-learning)
- Learning to predict and using prediction to act

Learning from demonstrations
- Directly copying observed behavior
- Inferring rewards from observed behavior (inverse reinforcement learning)
Learning from observing the world
- Learning to predict
- Unsupervised learning
Learning from other tasks
- Transfer learning
- Meta-learning: learning to learn
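
The first item under learning from demonstrations, directly copying observed behavior, can be sketched as ordinary supervised learning on (state, action) pairs. The version below is a deliberately simple tabular one that copies the most frequent expert action per state; the demonstration data and the `behavior_cloning` helper are hypothetical, and a practical system would fit a neural network rather than a lookup table.

```python
from collections import Counter, defaultdict


def behavior_cloning(demonstrations):
    """Return a policy that repeats the expert's most common action in each state."""
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


# Hypothetical expert demonstrations: (state, action) pairs.
demos = [(0, "right"), (1, "right"), (2, "right"), (1, "left"), (1, "right")]
cloned_policy = behavior_cloning(demos)
```

Note that this policy has nothing to say about states the expert never visited, which hints at why inferring rewards from observed behavior is listed as a separate approach.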