- Policy Gradients methods are generally believed to apply to a wider range of problems. For instance, when the Q function (the action-value function) is too complex to be learned, DQN will fail, while Policy Gradients can still learn a good policy because they operate directly in policy space.
- Policy Gradients usually converge faster than DQN, but tend to converge to a local optimum.
- Since Policy Gradients model probabilities over actions, they are capable of learning stochastic policies.
- Policy Gradients apply naturally to continuous action spaces, since the policy network models a probability distribution; DQN would require an expensive action-discretization step.
- One of the biggest drawbacks of Policy Gradients is the high variance of the gradient estimate.
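Because the policy is a probability distribution, a continuous action can be sampled directly, e.g. from a Gaussian whose mean is learned. A minimal REINFORCE sketch on a hypothetical 1-D continuous bandit (the reward function, fixed sigma, and learning rate are all illustrative assumptions, not from the original text):

```python
import numpy as np

# Hypothetical 1-D continuous-action bandit: reward peaks at a = 2.0.
def reward(a):
    return -(a - 2.0) ** 2

rng = np.random.default_rng(0)
mu, sigma, lr = 0.0, 1.0, 0.01  # Gaussian policy N(mu, sigma^2), fixed sigma

for step in range(5000):
    a = rng.normal(mu, sigma)            # sample an action from the policy
    grad_log_pi = (a - mu) / sigma ** 2  # d/d(mu) of log N(a | mu, sigma^2)
    mu += lr * reward(a) * grad_log_pi   # REINFORCE update on the mean

print(f"learned mean: {mu:.2f}")  # ends up near the optimum at 2.0
```

The same score-function trick works for any differentiable density, which is why no discretization of the action space is needed.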
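The high variance of the score-function estimator is commonly reduced by subtracting a baseline from the reward, which leaves the expected gradient unchanged. A small numpy sketch on a hypothetical 2-armed bandit (the reward means and sample counts are illustrative assumptions), comparing the empirical variance of the REINFORCE gradient with and without an average-reward baseline:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 2-armed bandit with noisy rewards (means 1.0 and 2.0).
means = np.array([1.0, 2.0])

theta = np.zeros(2)  # softmax policy parameters (held fixed here)
probs = np.exp(theta) / np.exp(theta).sum()

def grad_sample(baseline):
    """One REINFORCE gradient sample: (r - b) * grad log pi(a)."""
    a = rng.choice(2, p=probs)
    r = means[a] + rng.normal()  # noisy reward
    grad_log_pi = -probs.copy()  # gradient of log softmax ...
    grad_log_pi[a] += 1.0        # ... is e_a - probs
    return (r - baseline) * grad_log_pi

# Estimate the variance of the gradient with no baseline
# vs. with an average-reward baseline.
no_base = np.array([grad_sample(0.0) for _ in range(20000)])
with_base = np.array([grad_sample(means.mean()) for _ in range(20000)])

print(no_base.var(axis=0).sum() > with_base.var(axis=0).sum())  # True
```

Both estimators have the same expectation, so the baseline trades no bias for a substantially smaller variance.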
Q Learning vs Policy Gradients