Categorizing RL agents
- Value Based
  - No Policy (Implicit)
  - Value Function
- Policy Based
  - Policy
  - No Value Function
- Actor Critic
  - Policy
  - Value Function
- Model Free
  - Policy and/or Value Function
  - No Model
- Model Based
  - Policy and/or Value Function
  - Model
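
The taxonomy above can be read as two independent questions: which of policy and value function the agent represents explicitly, and whether it has a model. The following is a minimal sketch of that reading, not part of the original notes; `AgentComponents` and `categorize` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AgentComponents:
    """Which components an agent represents explicitly (hypothetical helper)."""
    policy: bool
    value_function: bool
    model: bool


def categorize(c: AgentComponents) -> Tuple[str, str]:
    """Return (policy/value category, model category) per the list above."""
    if not (c.policy or c.value_function):
        # Every category in the list has a policy and/or a value function.
        raise ValueError("an agent needs a policy and/or a value function")
    if c.policy and c.value_function:
        kind = "Actor Critic"
    elif c.value_function:
        kind = "Value Based"   # the policy is implicit in the value function
    else:
        kind = "Policy Based"
    return kind, ("Model Based" if c.model else "Model Free")


# A value function only, with no explicit policy and no model.
print(categorize(AgentComponents(policy=False, value_function=True, model=False)))
# -> ('Value Based', 'Model Free')
```

The two axes are kept separate in the return value because any of the first three categories can, in principle, be combined with or without a model.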
Two fundamental problems in sequential decision making
Reinforcement Learning:
- The environment is initially unknown
- The agent interacts with the environment
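
The two properties above can be illustrated with a toy interaction loop. This is only a sketch under stated assumptions: `HiddenEnvironment`, its reward probabilities, and `interact` are hypothetical, and uniform random action selection is used purely to keep the example short. The agent never reads the environment's internals; it learns about them only by acting and observing rewards.

```python
import random


class HiddenEnvironment:
    """Toy environment (hypothetical): the agent must not read _reward_prob."""

    def __init__(self, seed: int = 0):
        self._rng = random.Random(seed)
        self._reward_prob = {0: 0.2, 1: 0.8}  # unknown to the agent

    def step(self, action: int) -> float:
        """Return a reward of 1.0 or 0.0 for the chosen action."""
        return 1.0 if self._rng.random() < self._reward_prob[action] else 0.0


def interact(env: HiddenEnvironment, n_steps: int = 2000, seed: int = 1):
    """Estimate each action's mean reward purely from interaction."""
    rng = random.Random(seed)
    estimates = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for _ in range(n_steps):
        action = rng.choice([0, 1])   # act (uniformly at random here)
        reward = env.step(action)     # observe the environment's response
        counts[action] += 1
        # Incremental running mean of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = interact(HiddenEnvironment())
print(estimates, counts)
```

After enough steps the estimate for action 1 should be clearly higher than for action 0, even though the agent was never told the underlying probabilities.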