Thanks to Richard S. Sutton and Andrew G. Barto for their great work, Reinforcement Learning: An Introduction.
Here we talk about some popular action exploration strategies in tabular reinforcement learning systems.
Softmax Exploration Strategy
One method that is often used in combination with RL algorithms is the Boltzmann, or softmax, exploration strategy.
The action selection strategy is still random, but selection probabilities are weighted by their relative Q-values. This makes it more likely for the agent to choose good actions, whereas two actions that have similar Q-values are almost equally likely to be selected:

P(a) = exp(Q(a)/τ) / Σ_b exp(Q(b)/τ)

in which P(a) is the probability of selecting action a and τ is the temperature parameter: a high temperature makes the selection nearly uniform, while a low temperature makes it nearly greedy.
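As a minimal sketch of this idea, softmax selection over a tabular Q-row might look like the following (the Q-values and temperature used here are illustrative, not from the text):

```python
import numpy as np

def softmax_action(q_values, tau=1.0):
    """Sample an action with probability proportional to exp(Q(a)/tau)."""
    prefs = np.asarray(q_values, dtype=float) / tau
    prefs -= prefs.max()                      # shift for numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    action = np.random.choice(len(probs), p=probs)
    return action, probs

# Hypothetical Q-values for three actions; a lower tau sharpens the preference.
action, probs = softmax_action([1.0, 1.2, 0.1], tau=0.5)
```

Note that as τ → 0 this approaches greedy selection, and as τ → ∞ it approaches uniform random selection.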
Upper-Confidence-Bound Action Selection
It would be better to select among the non-greedy actions according to their potential for actually being optimal, taking into account both how close their estimates are to being maximal and the uncertainties in those estimates. One effective way of doing this is to select actions according to

A_t = argmax_a [ Q_t(a) + c * sqrt(ln t / N_t(a)) ]

where N_t(a) denotes the number of times that action a has been selected prior to time t, and the constant c > 0 controls the degree of exploration.
The idea of this upper confidence bound (UCB) action selection is that the square-root term is a measure of the uncertainty or variance in the estimate of a's value. The quantity being maximized over is thus a sort of upper bound on the possible true value of action a.
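A short sketch of UCB selection for a tabular agent, assuming we track per-action visit counts (the variable names and the value of c here are illustrative):

```python
import math

def ucb_action(q_values, counts, t, c=2.0):
    """Pick argmax_a [ Q(a) + c * sqrt(ln t / N(a)) ].

    An action with N(a) == 0 is treated as maximizing, so every
    action is tried at least once before the bound applies.
    """
    best_action, best_score = None, float("-inf")
    for a, (q, n) in enumerate(zip(q_values, counts)):
        if n == 0:
            return a  # untried action: considered maximizing
        score = q + c * math.sqrt(math.log(t) / n)
        if score > best_score:
            best_action, best_score = a, score
    return best_action

# The rarely tried action can win despite a similar Q estimate,
# because its uncertainty term is larger.
choice = ucb_action(q_values=[0.5, 0.6], counts=[10, 1], t=11)
```

Each time a is selected its count N(a) grows, shrinking its uncertainty term, while every selection of another action grows ln t, slowly raising the bound on the untried actions.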