1. Policy Function π(a|s)
- The policy function π(a|s) is a probability density function (a probability mass function when the set of actions is discrete).
- It takes state s as input.
- It outputs the probabilities of all the actions, e.g.,
π(left|s) = 0.2;
π(right|s) = 0.1;
π(up|s) = 0.7.
- The agent performs an action a drawn at random from this distribution.
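The random draw can be sketched in a few lines of Python. This is only an illustration using the example probabilities above; the helper name `sample_action`, the seed, and the number of repeated draws are assumptions made for the sketch.

```python
import random
from collections import Counter

# Example policy output for one state s, taken from the numbers above.
actions = ["left", "right", "up"]
probs = [0.2, 0.1, 0.7]

rng = random.Random(0)  # arbitrary seed, only for reproducibility


def sample_action(rng, actions, probs):
    """Draw one action at random, weighted by the policy's probabilities."""
    return rng.choices(actions, weights=probs, k=1)[0]


# A single draw: this is the action the agent would perform.
action = sample_action(rng, actions, probs)

# Over many draws, the empirical frequencies approach the probabilities.
n_draws = 10_000
counts = Counter(sample_action(rng, actions, probs) for _ in range(n_draws))
freqs = {a: counts[a] / n_draws for a in actions}
print(action, freqs)
```

The agent does not always pick the most probable action: "up" is chosen about 70% of the time, while "left" and "right" are still chosen occasionally.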
2. Policy Network π(a|s, θ)
Policy network: Use a neural net to approximate π(a|s).
- Use the policy network π(a|s, θ) to approximate π(a|s).
- θ: trainable parameters of the neural net.
About the figure above:
- Here, 𝒜 = {"left", "right", "up"} is the set of all actions.
- The probabilities of all the actions must be non-negative and sum to 1. That is why the output layer uses a softmax activation.
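A minimal sketch of such a policy network in NumPy follows. The architecture is an assumption made for illustration (one hidden tanh layer on a 4-dimensional state vector); the notes do not specify the layers, and the parameters here are random and untrained. The names `init_params`, `policy`, and the layer sizes are hypothetical.

```python
import numpy as np

ACTIONS = ["left", "right", "up"]  # the action set from the notes


def softmax(z):
    """Map arbitrary scores to positive values that sum to 1."""
    z = z - np.max(z)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def init_params(state_dim, hidden_dim, n_actions, rng):
    """theta: the trainable parameters (sizes are illustrative assumptions)."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(hidden_dim, state_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, size=(n_actions, hidden_dim)),
        "b2": np.zeros(n_actions),
    }


def policy(state, theta):
    """Return pi(a|s, theta) as one probability per action."""
    hidden = np.tanh(theta["W1"] @ state + theta["b1"])
    scores = theta["W2"] @ hidden + theta["b2"]
    return softmax(scores)


rng = np.random.default_rng(0)
theta = init_params(state_dim=4, hidden_dim=8, n_actions=len(ACTIONS), rng=rng)

state = rng.normal(size=4)  # placeholder state vector, not real game data
probs = policy(state, theta)
action = ACTIONS[rng.choice(len(ACTIONS), p=probs)]
print(dict(zip(ACTIONS, probs.round(3))), action)
```

Whatever values θ takes, the softmax guarantees that the output is a valid probability distribution over the actions, so the agent can always sample from it.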