On-policy and off-policy reinforcement learning
Reinforcement learning: on-policy and off-policy, i.e. SARSA (on-policy) or Q-learning (off-policy)

Flowchart:
1. Start learning.
2. In state S, choose action A using a policy such as ε-greedy.
3. Get reward R and next state S' from the environment.
4. On-policy or off-policy?
5. Q_target = Reward + gamma * …
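The branch "on or off policy?" in the flowchart comes down to how the target is bootstrapped from the next state. The following is a minimal sketch, not the author's code: it assumes a tabular Q stored as a list of lists indexed as q[state][action], and the function names (epsilon_greedy, q_learning_target, sarsa_target, td_update) and the step size alpha are hypothetical names chosen for illustration. It uses the standard textbook targets: Q-learning (off-policy) takes the maximum Q-value in S', while SARSA (on-policy) uses the Q-value of the action A' that the behaviour policy actually picks in S'.

```python
import random


def epsilon_greedy(q, state, epsilon, rng):
    """Behaviour policy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    values = q[state]
    return max(range(len(values)), key=lambda a: values[a])


def q_learning_target(q, reward, next_state, gamma):
    """Off-policy target: bootstrap from the best action in S'."""
    return reward + gamma * max(q[next_state])


def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy target: bootstrap from the action A' actually chosen in S'."""
    return reward + gamma * q[next_state][next_action]


def td_update(q, state, action, target, alpha):
    """Move Q(S, A) a fraction alpha toward the target (in place)."""
    q[state][action] += alpha * (target - q[state][action])


if __name__ == "__main__":
    # Toy table with 2 states and 2 actions (assumed values, for illustration).
    q = [[0.0, 0.0], [1.0, 3.0]]
    rng = random.Random(0)
    gamma, alpha, epsilon = 0.9, 0.5, 0.1

    state, action, reward, next_state = 0, 1, 1.0, 1
    next_action = epsilon_greedy(q, next_state, epsilon, rng)

    print("Q-learning target:", q_learning_target(q, reward, next_state, gamma))
    print("SARSA target:", sarsa_target(q, reward, next_state, next_action, gamma))

    td_update(q, state, action, q_learning_target(q, reward, next_state, gamma), alpha)
    print("Updated Q(S, A):", q[state][action])
```

The two targets coincide whenever the behaviour policy happens to pick the greedy action in S'; they differ only on exploratory steps. The sketch does not handle terminal states, where the bootstrap term would normally be dropped.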