Introduction
- Agent and Environment, History and state, Agent state, Environment state, Information state, Fully observable environments, Partially observable environments
  - Fully observable environment: the agent directly observes the environment state.
  - Partially observable environment: the agent observes the environment only indirectly.
- Policy: the agent's behaviour function, a map from state to action
- Value Function: how good each state/action is; a prediction of future reward
- Model: the agent's representation of the environment; predicts what the environment will do next, i.e. the next state and reward
- Categories: Value-based, Policy-based, Actor-Critic; Model-free, Model-based
- E&E: Exploration finds more information about the environment; Exploitation exploits known information to maximise reward.
- Prediction: evaluate the future given a policy. Control: optimise the future by finding the best policy.
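The policy, value function, and model bullets above can be sketched together in a few lines. This is a minimal illustration on a hypothetical 3-state chain; the states, actions, and rewards are made up for the example, not from any particular library.

```python
# Model: the agent's representation of the environment --
# for each (state, action) it predicts the next state and reward.
model = {
    (0, "right"): (1, 0.0),
    (1, "right"): (2, 1.0),  # entering terminal state 2 pays reward 1
}

# Policy: a map from state to action (state 2 is terminal, no action).
policy = {0: "right", 1: "right"}

def rollout_return(state, gamma=0.9):
    """Value of `state` under `policy`: the discounted sum of the
    rewards the model predicts along the policy's trajectory."""
    g, discount = 0.0, 1.0
    while state in policy:
        state, reward = model[(state, policy[state])]
        g += discount * reward
        discount *= gamma
    return g

print(rollout_return(0))  # reward 1 arrives one step later: 0.9 * 1 = 0.9
```

Note how the value function is a *prediction*: it summarises all future reward in a single number per state, which is what makes comparing states (and hence acting) tractable.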
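The exploration/exploitation trade-off is most often illustrated with an epsilon-greedy rule: with probability epsilon take a random action (explore), otherwise take the action with the highest estimated value (exploit). A small sketch, with illustrative Q-value estimates:

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index: explore with prob. epsilon, else be greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))          # explore: random action
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))  # always greedy -> 1
```

With epsilon = 0 the agent never gathers new information; with epsilon = 1 it never uses what it knows. Practical agents often decay epsilon over time.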
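Prediction versus control can be made concrete with iterative policy evaluation, which solves the prediction problem for a fixed policy. The sketch below evaluates the always-move-right policy on the same kind of hypothetical 3-state chain (reward 1 on entering the terminal state, 0 otherwise); control would instead maximise over actions in each backup.

```python
def evaluate_policy(num_states=3, gamma=0.5, sweeps=50):
    """Prediction: compute V for the fixed 'move right' policy by
    repeatedly applying the Bellman expectation backup
    V(s) = r(s, right) + gamma * V(s + 1)."""
    V = [0.0] * num_states  # V of the terminal state stays 0
    for _ in range(sweeps):
        for s in range(num_states - 1):
            reward = 1.0 if s + 1 == num_states - 1 else 0.0
            V[s] = reward + gamma * V[s + 1]
    return V

print(evaluate_policy())  # [0.5, 1.0, 0.0]
```

Here state 1 is worth 1 (immediate terminal reward) and state 0 is worth gamma * 1 = 0.5. Swapping the expectation backup for a max over actions turns this prediction loop into value iteration, i.e. control.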