Dynamic Programming Overview
Planning by Dynamic Programming
Definition
Dynamic: the problem has a sequential or temporal component, so it can be worked through step by step.
Programming: optimising a “program”, i.e. a policy. The word is used in the mathematical sense of optimisation (c.f. linear programming), not in the sense of writing code.
- A method for solving complex problems
- By breaking them down into subproblems
- Solving the subproblems
- Combining the solutions to the subproblems into a solution to the whole problem
Dynamic programming applies to problems that have two properties:
1) Optimal substructure
- The principle of optimality applies
- The optimal solution can be decomposed into optimal solutions of subproblems
2) Overlapping subproblems
- Subproblems recur many times
- Solutions can be cached and reused
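Both properties can be seen in a small example. The sketch below is not an MDP: it uses a hypothetical 3x3 grid of step costs (`GRID`, made up for illustration) and finds the cheapest path from the top-left to the bottom-right cell, moving only right or down. The `min` over neighbouring cells shows optimal substructure, and the `lru_cache` decorator caches each subproblem so it is computed only once.

```python
from functools import lru_cache

# Hypothetical 3x3 grid of step costs, chosen only for illustration.
GRID = [
    [1, 3, 1],
    [1, 5, 1],
    [4, 2, 1],
]

calls = 0  # counts how many subproblems are actually computed


@lru_cache(maxsize=None)
def min_cost(row: int, col: int) -> int:
    """Cheapest cost from (row, col) to the bottom-right cell, moving right or down."""
    global calls
    calls += 1
    last_row, last_col = len(GRID) - 1, len(GRID[0]) - 1
    if row == last_row and col == last_col:
        return GRID[row][col]
    options = []
    if row < last_row:
        options.append(min_cost(row + 1, col))
    if col < last_col:
        options.append(min_cost(row, col + 1))
    # Optimal substructure: the best path from here extends
    # the best path from one of the neighbouring cells.
    return GRID[row][col] + min(options)


best = min_cost(0, 0)
print(best, calls)
```

Because of the cache, each of the nine cells is solved once even though interior cells are reachable from two different predecessors. Without the decorator the same cells would be recomputed on every path that reaches them.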
Markov decision processes satisfy both properties:
- The Bellman equation gives a recursive decomposition of the problem into subproblems
- The value function stores and reuses solutions, playing the role of the cache for the overlapping subproblems
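The recursive decomposition can be written out explicitly. The following is the standard Bellman expectation equation for a policy $\pi$. The symbols $\mathcal{R}_s^a$ (expected immediate reward), $\mathcal{P}_{ss'}^a$ (transition probability) and $\gamma$ (discount factor) are notation assumed here, since these notes have not defined them at this point.

```latex
v_\pi(s) = \sum_{a \in \mathcal{A}} \pi(a \mid s)
  \left( \mathcal{R}_s^a + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}_{ss'}^a \, v_\pi(s') \right)
```

The value of state $s$ on the left is expressed through the values of successor states $s'$ on the right. Those successor values are the subproblems, and a stored table of $v_\pi$ is what lets them be reused.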
Dynamic programming in MDPs
- Dynamic programming assumes full knowledge of the MDP
- It is used for planning in an MDP
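"Full knowledge" can be read as: the transition probabilities and rewards are available as explicit tables, so expectations can be computed directly instead of being estimated from samples. The sketch below assumes a hypothetical two-state MDP; the names `P`, `R`, `GAMMA` and `lookahead` are made up for illustration and are not from the lecture.

```python
# Hypothetical two-state MDP, used only for illustration.
# P[s][a] is a list of (probability, next_state) pairs.
# R[s][a] is the expected immediate reward.
P = {
    "A": {"stay": [(1.0, "A")], "go": [(0.5, "A"), (0.5, "B")]},
    "B": {"stay": [(1.0, "B")], "go": [(1.0, "A")]},
}
R = {
    "A": {"stay": 0.0, "go": 1.0},
    "B": {"stay": 2.0, "go": 0.0},
}
GAMMA = 0.5  # discount factor (assumed value)


def lookahead(v, state, action):
    """One-step lookahead: expected reward plus discounted value of successors."""
    expected_next = sum(prob * v[nxt] for prob, nxt in P[state][action])
    return R[state][action] + GAMMA * expected_next


v = {"A": 0.0, "B": 4.0}  # some current value estimates
q_go = lookahead(v, "A", "go")
print(q_go)
```

Planning only needs these tables and arithmetic on them. No interaction with an environment is involved, which is what separates planning from learning.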
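The two most important problems in dynamic programming are prediction and control.
For prediction:
Given an MDP and a policy $\pi$, or an MRP, find the value function of that policy.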
- Input: MDP < S , A ,