由于组里新同学进来,需要带着他入门RL,选择从silver的课程开始。
对于我自己,增加一个仔细阅读《reinforcement learning:an introduction》的要求。
因为之前读的不太认真,这一次希望可以认真一点,将对应的知识点也做一个简单总结。
9.1 Value-function Approximation . . . . . . . . . . . . . . . . . . . . . . . 191
9.2 The Prediction Objective (MSVE) . . . . . . . . . . . . . . . . . . . . 192
9.3 Stochastic-gradient and Semi-gradient Methods . . . . . . . . . . . . . 194
9.4 Linear Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
9.5 Feature Construction for Linear Methods . . . . . . . . . . . . . . . . 203
9.5.1 Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
9.5.2 Fourier Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
9.5.3 Coarse Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
9.5.4 Tile Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
9.5.5 Radial Basis Functions . . . . . . . . . . . . . . . . . . . . . . . 215
9.6 Nonlinear Function Approximation: Artificial Neural Networks . . . . 216
9.7 Least-Squares TD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
9.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
not all function approximation methods are equally well suited for use in reinforcement learning
==》learn efficiently from incrementally acquired data
==》handle nonstationary target functions
with genuine approximation, an update at one state affects many others, and it is not possible to get all states exactly correct.
By assumption we have far more states than weights, so making one state’s estimate more accurate invariably means making others’ less accurate.
we obtain a natural objective function, theMean Squared Value Error, or MSVE: