Lesson 18. 1. state-action rewards 2. finite horizon MDP DP algorithm: LQR: linearized quadratic regulation