$$
\delta_{\theta}\left(s, a, s^{\prime}\right)=R\left(s, a, s^{\prime}\right)+\gamma v_{\theta}\left(s^{\prime}\right)-v_{\theta}(s)
$$

$R$ is the immediate reward. γ