SARSA
SARSA(State-Action-Reward-State-Action)是一个学习马尔可夫决策过程策略的算法,通常使用在机器学习领域的增强学习上。一篇技术文章介绍了这个算法并且在注脚处提到了SARSA这个别名。
State-Action-Reward-State-Action这个名称清楚地反应了其学习更新函数依赖的5个值,分别是当前状态S1,当前状态选中的动作A1,获得的奖励Reward,S1状态下执行A1后取得的状态S2及S2状态下将会执行的动作A2。我们取这5个值的首字母串起来可以得出一个词SARSA。
以下是维基百科的原文,翻译得不好请轻拍,对的,那谁,请把板砖放下:
State-Action-Reward-State-Action (SARSA) is an algorithm for learning a Markov decision process policy, used in the reinforcement learning area of machine learning. It was introduced in a technical note[1] where the alternative name SARSA was only mentioned as a footnote.
This name simply reflects the fact that the main function for updating the Q-value depends on the curren