Lecture 1 Introduction and Course Overview
What is RL? And why should we care?
- Deep learning helps us handle unstructured environments
- It learns from large amounts of data
- It is usually applied to recognition problems, which are passive, rather than to decision-making problems
- Reinforcement learning provides a formalism for behavior
- It is essentially a mathematical formalization of a decision-making problem.
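As a minimal sketch of this formalization, the loop below has an agent repeatedly observe a state, choose an action, and receive a reward from an environment. The `LineWorld` environment, its reward of 1 at the goal, and the random policy are hypothetical choices made for illustration; they are not taken from the lecture.

```python
import random


class LineWorld:
    """Hypothetical toy environment: positions 0..4 on a line, goal at 4."""

    def __init__(self, size=5):
        self.size = size
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (move left) or +1 (move right); position is clipped to the line.
        self.state = max(0, min(self.size - 1, self.state + action))
        done = self.state == self.size - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def random_policy(state, rng):
    # Placeholder policy that ignores the state; an RL algorithm would improve on it.
    return rng.choice([-1, 1])


def run_episode(env, policy, rng, max_steps=100):
    """Roll out one episode and return the total reward collected."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state, rng)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


if __name__ == "__main__":
    print(run_episode(LineWorld(), random_policy, random.Random(0)))
```

The decision-making problem is then to find a policy whose episodes collect as much reward as possible, instead of acting at random.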
What is Deep RL? And why should we care?
- One of the main advantages of using deep learning is that the computer can learn features (which are not designed by humans) through end-to-end learning, so humans do not need to discover them.
- For standard RL:
- Different tasks are unlikely to have features in common, so features probably need to be designed for each task.
- This is the main limiting factor for the application of RL
- For Deep RL:
- Features don't have to be designed by hand; it is an automated process
What does end-to-end learning mean for sequential decision making?
- Traditional decision-making system:
- consists of many different parts
- perception first, then action
- designing the parts by hand is sometimes very difficult: some features may be very helpful and relevant to the task but may not be apparent
- End-to-end:
- maps perception to action directly
- allows the model to discover for itself which things in the image are valuable to pay attention to.
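The contrast between the two designs can be sketched as follows. This is only an illustration under stated assumptions: the brightness features, the action names, and the `weights` grids are hypothetical, and the weights are set by hand here, whereas in deep RL they would be the learned parameters of a neural network.

```python
def handcrafted_features(image):
    """Hypothetical perception stage: a human decided that only the mean
    brightness of the left and right halves of the image matters."""
    rows, cols = len(image), len(image[0])
    half = cols // 2
    left = sum(sum(row[:half]) for row in image) / (rows * half)
    right = sum(sum(row[half:]) for row in image) / (rows * (cols - half))
    return left, right


def pipeline_policy(image):
    # Traditional system: perception first, then a separate action rule.
    left, right = handcrafted_features(image)
    return "left" if left > right else "right"


def end_to_end_policy(image, weights):
    """End-to-end: one parameterized map from raw pixels to action scores.

    `weights` maps each action to a per-pixel weight grid (an assumption
    made for this sketch, standing in for learned network parameters).
    """
    scores = {
        action: sum(w * p
                    for w_row, p_row in zip(grid, image)
                    for w, p in zip(w_row, p_row))
        for action, grid in weights.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    image = [[0.9, 0.1], [0.8, 0.2]]
    weights = {"left": [[1, 0], [1, 0]], "right": [[0, 1], [0, 1]]}
    print(pipeline_policy(image), end_to_end_policy(image, weights))
```

In the pipeline version, anything the hand-designed features discard is invisible to the action rule; in the end-to-end version, every pixel can influence the action, and training decides which ones matter.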
Reinforcement Learning: algorithmic foundation
Deep models: allow reinforcement learning algorithms to solve complex problems end to end, so that RL can be applied to general problems
Deep = can process complex sensory input
Reinforcement learning = can choose complex actions
Why should we study this now?
- Advances in deep learning
- Advances in reinforcement learning
- Advances in computational capability
Beyond learning from reward
- Basic reinforcement learning deals with maximizing rewards
- Advanced topics:
- Learning reward functions from example (inverse reinforcement learning)
- Transferring knowledge between domains (transfer learning, meta-learning)
- Learning to predict and using prediction to act
Are there other forms of supervision?
- Learning from demonstrations
- Directly copying observed behavior
- Inferring rewards from observed behavior (inverse reinforcement learning)
- Learning from observing the world
- Learning to predict
- Unsupervised learning
- Learning from other tasks
- Transfer learning
- Meta-learning: learning to learn
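To make "directly copying observed behavior" from the list above concrete, here is a deliberately simple sketch: a tabular policy that, for each state, repeats the action the demonstrator chose most often. The function name `clone_behavior` and the `(state, action)` demonstration format are assumptions for this example; practical imitation learning would typically fit a neural network instead of a lookup table.

```python
from collections import Counter, defaultdict


def clone_behavior(demonstrations):
    """Fit a tabular policy by copying the most common action per state.

    `demonstrations` is an iterable of (state, action) pairs recorded
    from a demonstrator (hypothetical format for this sketch).
    """
    counts = defaultdict(Counter)
    for state, action in demonstrations:
        counts[state][action] += 1
    # For each state seen in the data, keep the most frequent action.
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


if __name__ == "__main__":
    demos = [(0, "right"), (0, "right"), (0, "left"), (1, "right")]
    print(clone_behavior(demos))
```

Note that no reward signal is used here: the supervision comes entirely from the demonstrations, and states that never appear in them have no entry in the resulting policy.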