【Optimal Control (CMU 16-745)】Lecture 7 The Linear Quadratic Regulator Three Ways

啵啵啵啵哲

Last modified: 2023-09-03 08:49:19



First published: 2023-09-02 12:08:09

Article link: https://blog.csdn.net/xuzhengzhe/article/details/132636542


Review:

Control history
Deterministic optimal control
Pontryagin’s principle

Lecture 7 The Linear Quadratic Regulator Three Ways

Overview

LQR intro
LQR via shooting
LQR as a QP
Riccati recursion

1. LQR problem

$\begin{aligned} \min_{\mathbf{x}_{1:N}, \mathbf{u}_{1:N-1}} &J = \sum_{k=1}^{N-1} \left(\frac{1}{2}\mathbf{x}_k^\top \mathbf{Q}_k \mathbf{x}_k + \frac{1}{2}\mathbf{u}_k^\top \mathbf{R}_k \mathbf{u}_k \right)+ \frac{1}{2}\mathbf{x}_N^\top \mathbf{Q}_N \mathbf{x}_N \\ \text{s.t. } &\mathbf{x}_{k+1} = \mathbf{A}_k \mathbf{x}_k + \mathbf{B}_k \mathbf{u}_k, \quad k = 1, \cdots, N-1 \\ &\mathbf{Q}_k \succeq 0, \quad \mathbf{R}_k \succ 0 \end{aligned}$

$\mathbf{R}_k$ must be positive definite: if some control direction carried no cost, the minimizing input along it could be unbounded. $\mathbf{Q}_k$ only needs to be positive semidefinite.

Can (locally) approximate many nonlinear problems.
Computationally tractable.
Many extensions (e.g. infinite horizon, stochastic, etc.)
Time invariant: $\mathbf{A}_k = \mathbf{A}$ , $\mathbf{B}_k = \mathbf{B}$ , $\mathbf{Q}_k = \mathbf{Q}$ , $\mathbf{R}_k = \mathbf{R}$ , $\forall k$ .

2. LQR with indirect shooting

$\begin{aligned} &\mathbf{x}_{k+1} = \nabla_\lambda \mathcal{H}(\mathbf{x}_k, \mathbf{u}_k, \lambda_{k+1})= \mathbf{A} \mathbf{x}_k + \mathbf{B} \mathbf{u}_k\\ &\lambda_k = \nabla_{\mathbf{x}} \mathcal{H}(\mathbf{x}_k, \mathbf{u}_k, \lambda_{k+1})= \mathbf{Q} \mathbf{x}_k + \mathbf{A}^\top \lambda_{k+1}\\ &\lambda_N = \frac{\partial \ell_F}{\partial \mathbf{x}_N} = \mathbf{Q}_N \mathbf{x}_N\\ &\mathbf{u}_k = \argmin_{\tilde{\mathbf{u}}} \mathcal{H}(\mathbf{x}_k, \tilde{\mathbf{u}}, \lambda_{k+1}) = -\mathbf{R}^{-1} \mathbf{B}^\top \lambda_{k+1} \end{aligned}$

Note:
$\mathcal{H}\left(\mathbf{x}_k, \tilde{\mathbf{u}}, \lambda_{k+1}\right) = \frac{1}{2}\mathbf{x}_k^\top \mathbf{Q} \mathbf{x}_k + \frac{1}{2}\tilde{\mathbf{u}}^\top \mathbf{R} \tilde{\mathbf{u}} + \lambda_{k+1}^\top \left(\mathbf{A} \mathbf{x}_k + \mathbf{B} \tilde{\mathbf{u}} \right)$
To minimize $\mathcal{H}$ , take the derivative with respect to $\tilde{\mathbf{u}}$ and set it to zero:
$\frac{\partial \mathcal{H}}{\partial \tilde{\mathbf{u}}} = \mathbf{R} \tilde{\mathbf{u}} + \mathbf{B}^\top \lambda_{k+1} = 0 \Rightarrow \tilde{\mathbf{u}} = -\mathbf{R}^{-1} \mathbf{B}^\top \lambda_{k+1}$

(1) Procedure

i. Start with an initial guess for the control trajectory $\mathbf{u}_{1:N-1}$
ii. Simulate (“rollout”) to get $\mathbf{x}_{1:N}$
iii. Backward pass to get $\lambda$ and $\Delta \mathbf{u}$
iv. Rollout with line search on $\Delta \mathbf{u}$
v. Go to iii until convergence
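
The backward pass in step iii works because the costates give the gradient of the cost with respect to the controls: $\nabla_{\mathbf{u}_k} J = \mathbf{R}\mathbf{u}_k + \mathbf{B}^\top \lambda_{k+1}$, so the step $\Delta \mathbf{u}_k = -(\mathbf{u}_k + \mathbf{R}^{-1}\mathbf{B}^\top \lambda_{k+1})$ is a descent direction. The sketch below checks this against central finite differences. It borrows the double-integrator matrices from this section's example; the horizon `N = 11`, the test controls `u_test`, and the helper names are illustrative assumptions.

```julia
using LinearAlgebra

h = 0.1
A = [1 h; 0 1]
B = [0.5*h^2, h]
Q = Matrix(1.0I, 2, 2)
R = 0.1
Qn = Matrix(1.0I, 2, 2)
N = 11                      # short horizon, chosen only for this check
x1 = [1.0, 0.0]

# Simulate the dynamics for a given control sequence
function rollout(u)
    x = zeros(2, N)
    x[:, 1] = x1
    for k in 1:N-1
        x[:, k+1] = A * x[:, k] + B * u[k]
    end
    return x
end

# Total cost as a function of the controls only
function cost(u)
    x = rollout(u)
    J = 0.5 * x[:, N]' * Qn * x[:, N]
    for k in 1:N-1
        J += 0.5 * x[:, k]' * Q * x[:, k] + 0.5 * R * u[k]^2
    end
    return J
end

# Backward pass: costates, then gradient R*u_k + B'*λ_{k+1}
function adjoint_gradient(u)
    x = rollout(u)
    λ = Qn * x[:, N]                 # λ_N
    g = zeros(N - 1)
    for k in N-1:-1:1
        g[k] = R * u[k] + B' * λ     # λ currently holds λ_{k+1}
        λ = Q * x[:, k] + A' * λ     # now λ holds λ_k
    end
    return g
end

# Central finite differences for comparison
function fd_gradient(u; ϵ = 1e-4)
    g = zeros(length(u))
    for k in eachindex(u)
        e = zeros(length(u)); e[k] = ϵ
        g[k] = (cost(u + e) - cost(u - e)) / (2ϵ)
    end
    return g
end

u_test = [sin(0.3k) for k in 1:N-1]  # arbitrary test controls
g_adj = adjoint_gradient(u_test)
g_fd = fd_gradient(u_test)
println("max gradient mismatch: ", maximum(abs.(g_adj - g_fd)))
```

Because the cost is quadratic in the controls, the central difference is exact up to rounding, so the two gradients should agree to many digits.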

(2) Example: double integrator

Dynamics:
$\dot{\mathbf{x}} = \begin{bmatrix} \dot{\mathbf{q}}\\ \ddot{\mathbf{q}} \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} \mathbf{q}\\ \dot{\mathbf{q}} \end{bmatrix} + \begin{bmatrix} 0\\ 1 \end{bmatrix} \mathbf{u}$

Think of this as a brick sliding on ice (no friction).

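Discretize with time step $h$ . The matrices below are exact for a control held constant over each step (zero-order hold); plain forward Euler would drop the $\frac{h^2}{2}$ term in $\mathbf{B}$ :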
$\mathbf{x}_{k+1} = \begin{bmatrix} 1 & h\\ 0 & 1 \end{bmatrix} \mathbf{x}_k + \begin{bmatrix} \frac{h^2}{2}\\ h \end{bmatrix} \mathbf{u}_k = \mathbf{A} \mathbf{x}_k + \mathbf{B} \mathbf{u}_k$
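
As a quick check of the discrete matrices above, the sketch below computes the zero-order-hold discretization with a matrix exponential and compares it with forward Euler. The names `Ac`, `Bc`, `A_zoh`, `B_zoh`, `A_euler`, and `B_euler` are illustrative and do not appear in the lecture code.

```julia
using LinearAlgebra

h = 0.1
Ac = [0.0 1.0; 0.0 0.0]   # continuous-time A
Bc = [0.0, 1.0]           # continuous-time B

# Zero-order hold: exp([Ac Bc; 0 0]*h) = [A B; 0 I]
M = [Ac Bc; zeros(1, 3)]
Md = exp(M * h)
A_zoh = Md[1:2, 1:2]
B_zoh = Md[1:2, 3]

# Forward Euler for comparison: A = I + h*Ac, B = h*Bc
A_euler = I + h * Ac
B_euler = h * Bc

println("A (zero-order hold): ", A_zoh)
println("B (zero-order hold): ", B_zoh)
println("B (forward Euler):   ", B_euler)
```

For the double integrator the two methods give the same $\mathbf{A}$ (because the continuous-time matrix squares to zero), and they differ only in the first entry of $\mathbf{B}$.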

Code:

using LinearAlgebra
using PyPlot

# Discrete dynamics
h = 0.1   # time step
A = [1 h; 0 1]
B = [0.5*h*h; h]

n = 2     # number of states
m = 1     # number of controls
Tfinal = 5.0 # final time #try larger values
N = Int(Tfinal/h)+1    # number of time steps
thist = Array(range(0,h*(N-1), step=h));

# Initial conditions
x0 = [1.0; 0] 

# Cost weights
Q = 1.0*I(2)
R = 0.1
Qn = 1.0*I(2)

# Cost function
function J(xhist,uhist)
    cost = 0.5*xhist[:,end]'*Qn*xhist[:,end]
    for k = 1:(N-1)
        cost = cost + 0.5*xhist[:,k]'*Q*xhist[:,k] + 0.5*uhist[k]'*R*uhist[k]
    end
    return cost
end

# Dynamics
function rollout(xhist, uhist)
    xnew = zeros(size(xhist))
    xnew[:,1] = xhist[:,1]
    for k = 1:(N-1)
        xnew[:,k+1] .= A*xnew[:,k] + B*uhist[k]
    end
    return xnew
end

# Initial guess
xhist = repeat(x0, 1, N)
uhist = zeros(N-1)
Δu = ones(N-1)
λhist = zeros(n,N)
xhist = rollout(xhist, uhist) #initial rollout to get state trajectory

# J(xhist,uhist) #Initial cost

b = 1e-2 #line search tolerance
α = 1.0
iter = 0
while maximum(abs.(Δu[:])) > 1e-2 #terminate when the gradient is small
    
    #Backward pass to compute λ and Δu
    λhist[:,N] .= Qn*xhist[:,N]
    for k = N-1:-1:1
        Δu[k] = -(uhist[k]+R\B'*λhist[:,k+1])
        λhist[:,k] .= Q*xhist[:,k] + A'*λhist[:,k+1]
    end
    
    #Forward pass with line search to compute x
    α = 1.0
    unew = uhist + α.*Δu
    xnew = rollout(xhist, unew)
    while J(xnew, unew) > J(xhist, uhist) - b*α*Δu[:]'*Δu[:]
        α = 0.5*α
        unew = uhist + α.*Δu
        xnew = rollout(xhist, unew)
    end
    uhist .= unew;
    xhist .= xnew;
    iter += 1
end

# Plot x1 vs. x2, u vs. t, x vs. t, etc.
clf()
figure(figsize=(12,3.5))
subplot(1,2,1)
plot(thist, xhist[1,:], label="Position")
plot(thist, xhist[2,:], label="Velocity")
xlabel("Time")
title("State Trajectory")
legend()

subplot(1,2,2)
plot(thist[1:end-1], uhist, label="Control Input")
xlabel("Time")
title("Control Input")
legend()

display(gcf())
savefig("lqr-shooting.png", dpi=300, bbox_inches="tight")

Result:
[Figure: state trajectory (position and velocity) and control input versus time, Tfinal = 5.0]

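Setting the final time to 10.0 s gives:
[Figure: state trajectory (position and velocity) and control input versus time, Tfinal = 10.0]

The solutions are similar, but the longer horizon needs more iterations to converge (2415 vs. 664). Each iteration costs $\mathcal{O}(N)$ (one backward pass plus rollouts), so in this example the total work grows faster than linearly with the horizon.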

3. LQR as a QP

(1) Redefine some variables

Assume initial state $\mathbf{x}_1$ is given (not a decision variable).
Define $\mathbf{z} = \begin{bmatrix} \mathbf{u}_1\\ \mathbf{x}_2\\ \mathbf{u}_2\\ \mathbf{x}_3\\ \vdots\\ \mathbf{x}_N \end{bmatrix}$ .
Define cost matrix:
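$\mathbf{H} = \mathrm{diag}(\mathbf{R}_1, \mathbf{Q}_2, \mathbf{R}_2, \mathbf{Q}_3, \cdots, \mathbf{Q}_N),$ such that
$J = \frac{1}{2}\mathbf{z}^\top \mathbf{H} \mathbf{z}$ (up to the constant term $\frac{1}{2}\mathbf{x}_1^\top \mathbf{Q}_1 \mathbf{x}_1$ , which does not affect the minimizer).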
Do the same for dynamics:
Define $\mathbf{C}$ and $\mathbf{d}$ :
$\mathbf{C} = \begin{bmatrix} \mathbf{B} & -\mathbf{I} & \mathbf{0} & \mathbf{0} & \cdots & \mathbf{0}& \mathbf{0}& \mathbf{0}\\ \mathbf{0} & \mathbf{A} & \mathbf{B} & -\mathbf{I} & \cdots & \mathbf{0}& \mathbf{0}& \mathbf{0}\\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots\\ \mathbf{0} & \mathbf{0} & \mathbf{0} & \mathbf{0} & \cdots & \mathbf{A}& \mathbf{B}& -\mathbf{I} \end{bmatrix}, \mathbf{d} = \begin{bmatrix} -\mathbf{A}\mathbf{x}_1\\ \mathbf{0}\\ \vdots\\ \mathbf{0} \end{bmatrix}$

So the dynamics constraint can be written as:
$\mathbf{C}\mathbf{z} = \mathbf{d}$

(2) QP formulation and its closed-form solution

Now we can write the LQR problem as a QP:
$\begin{aligned} \min_{\mathbf{z}} \; &J = \frac{1}{2}\mathbf{z}^\top \mathbf{H} \mathbf{z}\\ \text{s.t. } &\mathbf{C}\mathbf{z} = \mathbf{d} \end{aligned}$

The Lagrangian is:
$\mathcal{L} = \frac{1}{2}\mathbf{z}^\top \mathbf{H} \mathbf{z} + \lambda^\top (\mathbf{C}\mathbf{z} - \mathbf{d})$

KKT conditions:
$\begin{aligned} \nabla_\mathbf{z} \mathcal{L} &= \mathbf{H}\mathbf{z} + \mathbf{C}^\top \lambda = \mathbf{0}\\ \nabla_\lambda \mathcal{L} &= \mathbf{C}\mathbf{z} - \mathbf{d} = \mathbf{0} \end{aligned} \Rightarrow \begin{bmatrix} \mathbf{H} & \mathbf{C}^\top\\ \mathbf{C} & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{z}\\ \lambda \end{bmatrix} = \begin{bmatrix} \mathbf{0}\\ \mathbf{d} \end{bmatrix}$

We get the exact solution by solving one linear system.
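
To make the block layout concrete, here is a minimal dense sketch that assembles $\mathbf{H}$, $\mathbf{C}$, and $\mathbf{d}$ with explicit index ranges for a very short horizon and solves the KKT system once. The horizon `N = 4`, the index helpers `iu`, `ix`, `ir`, `ixk`, and the use of dense matrices are assumptions made for readability, not the lecture's implementation.

```julia
using LinearAlgebra

h = 0.1
A = [1 h; 0 1]
B = reshape([0.5*h^2, h], 2, 1)   # n×m matrix
n, m = size(B)
N = 4                             # tiny horizon, for illustration only
x1 = [1.0, 0.0]
Q = Matrix(1.0I, n, n)
R = Matrix(0.1I, m, m)
Qn = Matrix(1.0I, n, n)

nz = (N - 1) * (n + m)   # z = [u1; x2; u2; x3; ...; xN]
nc = (N - 1) * n         # one dynamics constraint per step

H = zeros(nz, nz)
C = zeros(nc, nz)
d = zeros(nc)
for k in 1:N-1
    iu = (k - 1) * (n + m) .+ (1:m)        # columns of u_k
    ix = (k - 1) * (n + m) + m .+ (1:n)    # columns of x_{k+1}
    ir = (k - 1) * n .+ (1:n)              # rows of constraint k
    H[iu, iu] = R
    H[ix, ix] = k == N - 1 ? Qn : Q
    C[ir, iu] = B
    C[ir, ix] = -Matrix(1.0I, n, n)
    if k == 1
        d[ir] = -A * x1                    # known initial state goes to the right-hand side
    else
        ixk = (k - 2) * (n + m) + m .+ (1:n)   # columns of x_k
        C[ir, ixk] = A
    end
end

# Solve the KKT system for the primal variables z and multipliers λ
KKT = [H C'; C zeros(nc, nc)]
sol = KKT \ [zeros(nz); d]
z = sol[1:nz]
λ = sol[nz+1:end]

println("constraint residual:   ", norm(C * z - d))
println("stationarity residual: ", norm(H * z + C' * λ))
```

With this layout, `z[1:m]` is $\mathbf{u}_1$ and `z[m+1:m+n]` is $\mathbf{x}_2$, so `z[m+1:m+n]` should match `A*x1 + B*z[1:m]`.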

(3) Example: double integrator

Code:

using LinearAlgebra
using PyPlot
using SparseArrays

# Discrete dynamics
h = 0.1   # time step
A = [1 h; 0 1]
B = [0.5*h*h; h]

n = 2     # number of states
m = 1     # number of controls
Tfinal = 100.0 # final time
N = Int(Tfinal/h)+1    # number of time steps
thist = Array(range(0,h*(N-1), step=h));

# Initial conditions
x0 = [1.0; 0]

# Cost weights
Q = sparse(1.0*I(2))
R = sparse(0.1*I(1))
Qn = sparse(1.0*I(2))

#Cost function
function J(xhist,uhist)
    cost = 0.5*xhist[:,end]'*Qn*xhist[:,end]
    for k = 1:(N-1)
        cost = cost + 0.5*xhist[:,k]'*Q*xhist[:,k] + 0.5*(uhist[k]'*R*uhist[k])[1]
    end
    return cost
end

# Cost
H = blockdiag(R, kron(I(N-2), blockdiag(Q,R)), Qn);

# Constraints
C = kron(I(N-1), [B -I(2)])
for k = 1:N-2
    C[(k*n).+(1:n), (k*(n+m)-n).+(1:n)] .= A
end
d = [-A*x0; zeros(size(C,1)-n)];

# Solve the linear system
y = [H C'; C zeros(size(C,1),size(C,1))]\[zeros(size(H,1)); d]

# Get state history
z = y[1:size(H,1)]   # states and controls [u0,x1,u1,...,xN]
Z = reshape(z,n+m,N-1)
xhist = Z[m+1:end,:]
xhist = [x0 xhist]

# Get control history
uhist = Z[1,:];

# Plot x1 vs. x2, u vs. t, x vs. t, etc.
times = range(0,h*(N-1), step=h)

clf()
figure(figsize=(12,3.5))
subplot(1,2,1)
plot(times, xhist[1,:], label="Position")
plot(times, xhist[2,:], label="Velocity")
xlabel("Time")
title("State Trajectory")
legend()

subplot(1,2,2)
plot(times[1:end-1], uhist, label="Control Input")
xlabel("Time")
title("Control Input")
legend()

display(gcf())
savefig("lqr-qp.png", dpi=300, bbox_inches="tight")

Result:
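[Figure: state trajectory (position and velocity) and control input versus time, Tfinal = 100.0]

The direct solve is much faster than the shooting method, which makes a much longer horizon practical (here Tfinal = 100.0).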

4. A closer look at the LQR QP

(1) KKT system

The KKT system for LQR is very sparse (lots of zeros) and has a lot of structure.
$\left[ \begin{array}{c} \begin{matrix} \mathbf{R}& & & & & & \mathbf{B}^{\top}& & \\ & \mathbf{Q}& & & & & -\mathbf{I}& \mathbf{A}^{\top}& \\ & & \mathbf{R}& & & & & \mathbf{B}^{\top}& \\ & & & \mathbf{Q}& & & & -\mathbf{I}& \mathbf{A}^{\top}\\ & & & & \mathbf{R}& & & & \mathbf{B}^{\top}\\ & & & & & \mathbf{Q}_N& & & -\mathbf{I}\\ \mathbf{B}& -\mathbf{I}& & & & & & & \\ & \mathbf{A}& \mathbf{B}& -\mathbf{I}& & & & \mathbf{0}& \\ & & & \mathbf{A}& \mathbf{B}& -\mathbf{I}& & & \\ \end{matrix}\\ \end{array} \right] \left[ \begin{array}{c} \begin{matrix} \mathbf{u}_1\\ \mathbf{x}_2\\ \mathbf{u}_2\\ \mathbf{x}_3\\ \mathbf{u}_3\\ \mathbf{x}_4\\ \lambda_2\\ \lambda_3\\ \lambda_4 \end{matrix}\\ \end{array} \right] = \left[ \begin{array}{c} \begin{matrix} \mathbf{0}\\ \mathbf{0}\\ \mathbf{0}\\ \mathbf{0}\\ \mathbf{0}\\ \mathbf{0}\\ -\mathbf{A}\mathbf{x}_1\\ \mathbf{0}\\ \mathbf{0}\\ \end{matrix}\\ \end{array} \right]$

(2) Solving procedure (Riccati recursion)

i. From the sixth equation (the $\mathbf{x}_4$ row): $\mathbf{Q}_N \mathbf{x}_4 - \lambda_4 = \mathbf{0} \Rightarrow \lambda_4 = \mathbf{Q}_N \mathbf{x}_4$
ii. Substitute $\lambda_4$ into the fifth equation (the $\mathbf{u}_3$ row):
$\mathbf{R}\mathbf{u}_3 + \mathbf{B}^\top \lambda_4 = \mathbf{R}\mathbf{u}_3 + \mathbf{B}^\top \mathbf{Q}_N \mathbf{x}_4 = \mathbf{0}$
Plug in dynamics for $\mathbf{x}_4$ :
$\mathbf{R}\mathbf{u}_3 + \mathbf{B}^\top \mathbf{Q}_N \left(\mathbf{A}\mathbf{x}_3 + \mathbf{B}\mathbf{u}_3\right) = \mathbf{0}$
$\Rightarrow \mathbf{u}_3 = -\left(\mathbf{R} + \mathbf{B}^\top \mathbf{Q}_N \mathbf{B}\right)^{-1} \mathbf{B}^\top \mathbf{Q}_N \mathbf{A}\mathbf{x}_3 \triangleq -\mathbf{K}_3 \mathbf{x}_3$
iii. Substitute $\mathbf{u}_3$ into the fourth equation:
$\mathbf{Q}\mathbf{x}_3 -\lambda_3 + \mathbf{A}^\top \lambda_4 = \mathbf{0}$
Plug in $\lambda_4$ : $\mathbf{Q}\mathbf{x}_3 -\lambda_3 + \mathbf{A}^\top \mathbf{Q}_N \mathbf{x}_4 = \mathbf{0}$
Plug in dynamics: $\mathbf{Q}\mathbf{x}_3 -\lambda_3 + \mathbf{A}^\top \mathbf{Q}_N \left(\mathbf{A}\mathbf{x}_3 + \mathbf{B}\mathbf{u}_3\right) = \mathbf{0}$
Plug in $\mathbf{u}_3$ : $\mathbf{Q}\mathbf{x}_3 -\lambda_3 + \mathbf{A}^\top \mathbf{Q}_N \left(\mathbf{A}\mathbf{x}_3 - \mathbf{B}\mathbf{K}_3 \mathbf{x}_3\right) = \mathbf{0}$
$\Rightarrow \lambda_3 = \left(\mathbf{Q} + \mathbf{A}^\top \mathbf{Q}_N \left(\mathbf{A} - \mathbf{B}\mathbf{K}_3\right)\right) \mathbf{x}_3 \triangleq \mathbf{P}_3 \mathbf{x}_3$

Now we have a recursion for $\mathbf{K}_k$ and $\mathbf{P}_k$ :
$\begin{aligned} &\mathbf{P}_N = \mathbf{Q}_N, \\ &\mathbf{K}_k = \left(\mathbf{R} + \mathbf{B}^\top \mathbf{P}_{k+1} \mathbf{B}\right)^{-1} \mathbf{B}^\top \mathbf{P}_{k+1} \mathbf{A}, \\ &\mathbf{P}_k = \mathbf{Q} + \mathbf{A}^\top \mathbf{P}_{k+1} \left(\mathbf{A} - \mathbf{B}\mathbf{K}_k\right) \end{aligned}$
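
Substituting $\mathbf{K}_k$ into the update for $\mathbf{P}_k$ gives an equivalent form that involves only $\mathbf{P}_{k+1}$. This is a standard rewriting, added here as a consistency check and not taken from the original notes. It makes it easy to see that $\mathbf{P}_k$ stays symmetric whenever $\mathbf{Q}$, $\mathbf{R}$ and $\mathbf{Q}_N$ are symmetric:
$\begin{aligned} \mathbf{P}_k &= \mathbf{Q} + \mathbf{A}^\top \mathbf{P}_{k+1} \mathbf{A} - \mathbf{A}^\top \mathbf{P}_{k+1} \mathbf{B} \mathbf{K}_k \\ &= \mathbf{Q} + \mathbf{A}^\top \mathbf{P}_{k+1} \mathbf{A} - \mathbf{A}^\top \mathbf{P}_{k+1} \mathbf{B} \left(\mathbf{R} + \mathbf{B}^\top \mathbf{P}_{k+1} \mathbf{B}\right)^{-1} \mathbf{B}^\top \mathbf{P}_{k+1} \mathbf{A} \end{aligned}$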

This is called the Riccati recursion (or Riccati equation). We can solve the QP by doing a backward Riccati recursion followed by a forward rollout to compute $\mathbf{x}_{1:N}$ and $\mathbf{u}_{1:N-1}$ from the initial conditions.
A general (dense) QP solve has complexity $\mathcal{O}(N^3(n+m)^3)$ ( $N$ is the number of time steps, $n$ is the state dimension, $m$ is the control dimension).
Riccati recursion has complexity $\mathcal{O}(N(n+m)^3)$ .
Even more important: we now have a feedback policy $\mathbf{u}_k = -\mathbf{K}_k \mathbf{x}_k$ instead of an open-loop trajectory.
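
The value of a feedback policy shows up as soon as the system is disturbed. The sketch below computes the gains with the Riccati recursion, then compares replaying the nominal controls open loop against applying $\mathbf{u}_k = -\mathbf{K}_k \mathbf{x}_k$ when a disturbance hits. The disturbance `w` (a velocity kick of 0.5 at step 20), the horizon `N = 51`, and the helpers `simulate`, `cost`, `feedback`, and `open_loop` are hypothetical choices for illustration.

```julia
using LinearAlgebra

h = 0.1
A = [1 h; 0 1]
B = reshape([0.5*h^2, h], 2, 1)
n, m = size(B)
N = 51
x1 = [1.0, 0.0]
Q = Matrix(1.0I, n, n)
R = Matrix(0.1I, m, m)
Qn = Matrix(1.0I, n, n)

# Backward Riccati recursion
P = zeros(n, n, N)
K = zeros(m, n, N - 1)
P[:, :, N] = Qn
for k in N-1:-1:1
    K[:, :, k] = (R + B' * P[:, :, k+1] * B) \ (B' * P[:, :, k+1] * A)
    P[:, :, k] = Q + A' * P[:, :, k+1] * (A - B * K[:, :, k])
end

# Simulate with a policy u = policy(k, x) and an additive disturbance w[:, k]
function simulate(policy, w)
    x = zeros(n, N)
    u = zeros(m, N - 1)
    x[:, 1] = x1
    for k in 1:N-1
        u[:, k] = policy(k, x[:, k])
        x[:, k+1] = A * x[:, k] + B * u[:, k] + w[:, k]
    end
    return x, u
end

function cost(x, u)
    J = 0.5 * x[:, N]' * Qn * x[:, N]
    for k in 1:N-1
        J += 0.5 * x[:, k]' * Q * x[:, k] + 0.5 * u[:, k]' * R * u[:, k]
    end
    return J
end

feedback(k, x) = -K[:, :, k] * x

# Nominal (undisturbed) plan, then a policy that replays it open loop
w0 = zeros(n, N - 1)
x_nom, u_nom = simulate(feedback, w0)
open_loop(k, x) = u_nom[:, k]

# Hypothetical disturbance: a velocity kick at step 20
w = zeros(n, N - 1)
w[2, 20] = 0.5

x_ol, u_ol = simulate(open_loop, w)
x_cl, u_cl = simulate(feedback, w)
println("open-loop cost:   ", cost(x_ol, u_ol))
println("closed-loop cost: ", cost(x_cl, u_cl))
```

The feedback policy is optimal from whatever state the kick produces, so its total cost cannot exceed that of the replayed open-loop plan.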

(3) Example: double integrator

Code:

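# Assumes h, A, B, n, m, N, x0, Q, R, Qn are defined as in the shooting example
# (in particular a scalar R = 0.1, so that R + B'*P*B is a scalar)
P = zeros(n,n,N)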
K = zeros(m,n,N-1)

P[:,:,N] .= Qn

#Backward Riccati recursion
for k = (N-1):-1:1
    K[:,:,k] .= (R + B'*P[:,:,k+1]*B)\(B'*P[:,:,k+1]*A)
    P[:,:,k] .= Q + A'*P[:,:,k+1]*(A-B*K[:,:,k])
end

#Forward rollout starting at x0
xhist = zeros(n,N)
xhist[:,1] = x0
uhist = zeros(m,N-1)
for k = 1:(N-1)
    uhist[:,k] .= -K[:,:,k]*xhist[:,k]
    xhist[:,k+1] .= A*xhist[:,k] + B*uhist[k]
end

Plotting the feedback gains shows that $\mathbf{K}_k$ converges to a constant value away from the final time.
The dlqr function (from the ControlSystems.jl package) returns a feedback matrix that matches this converged value.
Therefore, to stabilize the system we can use a constant feedback matrix, which amounts to treating the problem as an infinite-horizon LQR.
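
The converged gain can also be obtained without an extra package by iterating the Riccati recursion until $\mathbf{P}$ stops changing. This is a minimal sketch under that assumption; `riccati_fixed_point`, its tolerance, and its iteration cap are hypothetical and are not the dlqr implementation.

```julia
using LinearAlgebra

h = 0.1
A = [1 h; 0 1]
B = reshape([0.5*h^2, h], 2, 1)
Q = Matrix(1.0I, 2, 2)
R = Matrix(0.1I, 1, 1)

# Iterate the Riccati recursion until P stops changing
function riccati_fixed_point(A, B, Q, R; tol = 1e-10, maxiter = 10_000)
    P = copy(Q)
    for iter in 1:maxiter
        K = (R + B' * P * B) \ (B' * P * A)
        Pnew = Q + A' * P * (A - B * K)
        if norm(Pnew - P) < tol
            # Recompute the gain from the converged P before returning
            Kinf = (R + B' * Pnew * B) \ (B' * Pnew * A)
            return Pnew, Kinf, iter
        end
        P = Pnew
    end
    error("Riccati iteration did not converge")
end

Pinf, Kinf, iters = riccati_fixed_point(A, B, Q, R)
println("Kinf = ", Kinf, " after ", iters, " iterations")

# Closed-loop eigenvalues should lie inside the unit circle
ρ = maximum(abs.(eigvals(A - B * Kinf)))
println("spectral radius of A - B*Kinf: ", ρ)
```

A spectral radius below one confirms that the constant policy $\mathbf{u}_k = -\mathbf{K}_\infty \mathbf{x}_k$ stabilizes the discrete-time double integrator.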