1. Policy-based Reinforcement Learning
Suppose we have a good policy (a|s).
Upon observing the stats ,
random sampling: ~
(.|
).
2. Policy-based Reinforcement Learning
Suppose we know the optimal action-value function .
Upon observe the state ,
choose the action that maximizes the value: =
.
3. Summary
The objective of Reinforcement Learning is to learn good policy (a|s) and/or optimal action-value function
.