【Optimal Control (CMU 16-745)】Lecture 9 Convex Model Predictive Control

Review:

  • Infinite-horizon LQR
  • Controllability
  • Dynamic programming


Overview

  • Convexity background
  • Convex MPC

1. Finally: what are the Lagrange multipliers?

Recall the Riccati derivation from the QP:
$$\lambda_k = \mathbf{P}_k \mathbf{x}_k$$

From DP:
$$V_k(\mathbf{x}) = \frac{1}{2}\mathbf{x}^\top \mathbf{P}_k \mathbf{x}$$

It follows that $\lambda_k = \nabla V_k(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_k} = \mathbf{P}_k \mathbf{x}_k$.

  • Dynamics multipliers are cost-to-go gradients.
  • Carries over to nonlinear setting (not just LQR).

In the nonlinear case, even if we cannot get the value function, we can still get its gradient.
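For the LQR case, this can be sketched directly from the KKT stationarity conditions of the QP (with $\lambda_{k+1}$ the multiplier on the constraint $\mathbf{x}_{k+1} = \mathbf{A}_k\mathbf{x}_k + \mathbf{B}_k\mathbf{u}_k$):
$$\frac{\partial L}{\partial \mathbf{x}_k} = \mathbf{Q}\mathbf{x}_k + \mathbf{A}_k^\top\lambda_{k+1} - \lambda_k = 0 \quad\Rightarrow\quad \lambda_k = \mathbf{Q}\mathbf{x}_k + \mathbf{A}_k^\top\lambda_{k+1}, \qquad \lambda_N = \mathbf{P}_N\mathbf{x}_N$$
Substituting $\lambda_{k+1} = \mathbf{P}_{k+1}\mathbf{x}_{k+1}$ and $\mathbf{x}_{k+1} = (\mathbf{A}_k - \mathbf{B}_k\mathbf{K}_k)\mathbf{x}_k$ gives $\lambda_k = \left(\mathbf{Q} + \mathbf{A}_k^\top\mathbf{P}_{k+1}(\mathbf{A}_k - \mathbf{B}_k\mathbf{K}_k)\right)\mathbf{x}_k$, and the bracketed matrix is one form of the Riccati recursion for $\mathbf{P}_k$, so $\lambda_k = \mathbf{P}_k\mathbf{x}_k$ follows by backward induction.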

DP vs. QP:
[Figure: closed-loop state and control trajectories from the DP and QP solutions]

The resulting state and control trajectories are identical.

Compare the gain matrix $\mathbf{K}$ from `dlqr` with the one from DP:

#Compute infinite-horizon K matrix using ControlSystems.jl
Kinf = dlqr(A,B,Q,R[1]) 
#Compare to ours
K[:,:,1]-Kinf

Result:

1×2 Matrix{Float64}:
 -6.72929e-9  -2.28764e-9

The cost-to-go matrix $\mathbf{P}$ agrees to similar precision.
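For reference, a minimal sketch of the backward Riccati recursion from the DP lecture that produces the `K` and `P` arrays compared above; the horizon `N` and the shapes of `A`, `B`, `Q`, `R` are assumptions carried over from the earlier example:

#Backward Riccati recursion (finite-horizon LQR via dynamic programming)
N = 100                          #assumed horizon length
n, m = size(B)                   #state and control dimensions
P = zeros(n,n,N)
K = zeros(m,n,N-1)
P[:,:,N] .= Q                    #terminal cost-to-go
for k = (N-1):-1:1
    K[:,:,k] .= (R + B'*P[:,:,k+1]*B)\(B'*P[:,:,k+1]*A)
    P[:,:,k] .= Q + K[:,:,k]'*R*K[:,:,k] + (A-B*K[:,:,k])'*P[:,:,k+1]*(A-B*K[:,:,k])
end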

2. Convex Model Predictive Control

(1) Motivation
  • LQR is very powerful but we often need to reason about constraints.
  • Often these constraints are simple (e.g. torque limits).
  • Constraints break the Riccati solution, but we can still solve the QP online.
  • Convex MPC has gotten popular as computers have gotten faster.
(2) Background: Convexity
(a) Convex set

$\mathcal{X} \subseteq \mathbb{R}^n$ is convex if for any $\mathbf{x}_1, \mathbf{x}_2 \in \mathcal{X}$, the line segment between them is also contained in $\mathcal{X}$.
[Figure: convex vs. non-convex sets]

Standard examples:
  • Linear subspace: $\mathcal{X} = \{\mathbf{x} \in \mathbb{R}^n \mid \mathbf{A}\mathbf{x} = \mathbf{b}\}$
  • Half space/box/polytope: $\mathcal{X} = \{\mathbf{x} \in \mathbb{R}^n \mid \mathbf{A}\mathbf{x} \leq \mathbf{b}\}$
  • Ellipsoid: $\mathcal{X} = \{\mathbf{x} \in \mathbb{R}^n \mid \mathbf{x}^\top \mathbf{P}^{-1}\mathbf{x} \leq 1\}$ with $\mathbf{P} \succ 0$
  • Cone: $\mathcal{X} = \{\mathbf{x} \in \mathbb{R}^n \mid \|\mathbf{x}_{2:n}\|_2 \leq x_1\}$ (the $L_2$ norm gives the second-order cone, i.e. the standard "ice cream" cone)
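Each of these sets admits a one-line membership test; a quick sketch (the helper names are my own):

using LinearAlgebra

in_polytope(x, A, b) = all(A*x .<= b)            #{x | Ax ≤ b}
in_ellipsoid(x, P)   = dot(x, P\x) <= 1.0        #{x | xᵀP⁻¹x ≤ 1}, P ≻ 0
in_soc(x)            = norm(x[2:end]) <= x[1]    #second-order cone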
(b) Convex function

A function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ whose epigraph is a convex set.
(Standard definition: $f(\lambda\mathbf{x}_1 + (1-\lambda)\mathbf{x}_2) \leq \lambda f(\mathbf{x}_1) + (1-\lambda)f(\mathbf{x}_2)$ for all $\mathbf{x}_1, \mathbf{x}_2 \in \mathbb{R}^n$ and $\lambda \in [0,1]$.)
[Figure: a convex function and its epigraph]

The epigraph of a function is the set of points lying on or above its graph:
$$\text{epi}(f) = \{(\mathbf{x}, y) \in \mathbb{R}^{n+1} \mid f(\mathbf{x}) \leq y\}$$

Standard examples:
  • Linear function: $f(\mathbf{x}) = \mathbf{c}^\top \mathbf{x}$
  • Quadratic function: $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^\top \mathbf{Q}\mathbf{x} + \mathbf{c}^\top \mathbf{x}$ with $\mathbf{Q} \succ 0$
  • Norms: $f(\mathbf{x}) = \|\mathbf{x}\|_p = \left(\sum_{i=1}^n |x_i|^p\right)^{1/p}$, $p \geq 1$
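A quick numerical spot-check of the chord definition for the norm example (a sketch; the small tolerance only guards against floating-point noise):

using LinearAlgebra

f(x) = norm(x)   #‖x‖₂ is convex by the triangle inequality
x1, x2 = randn(3), randn(3)
all(f(λ*x1 + (1-λ)*x2) <= λ*f(x1) + (1-λ)*f(x2) + 1e-12 for λ in 0:0.1:1)   #true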
(c) Convex optimization problem

Minimize a convex function over a convex set:
$$\begin{array}{ll} \text{minimize} & f(\mathbf{x}) \\ \text{subject to} & \mathbf{x} \in \mathcal{X} \end{array}$$

Standard examples:
  • Linear program (LP): $f(\mathbf{x}) = \mathbf{c}^\top \mathbf{x}$, $\mathcal{X} = \{\mathbf{x} \in \mathbb{R}^n \mid \mathbf{A}\mathbf{x} \leq \mathbf{b}\}$
  • Quadratic program (QP): $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^\top \mathbf{Q}\mathbf{x} + \mathbf{c}^\top \mathbf{x}$, $\mathcal{X} = \{\mathbf{x} \in \mathbb{R}^n \mid \mathbf{A}\mathbf{x} \leq \mathbf{b}\}$
  • Quadratically-constrained quadratic program (QCQP): $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^\top \mathbf{Q}_0\mathbf{x} + \mathbf{c}^\top \mathbf{x}$, $\mathcal{X} = \{\mathbf{x} \in \mathbb{R}^n \mid \mathbf{x}^\top \mathbf{Q}_i\mathbf{x} + \mathbf{a}_i^\top \mathbf{x} \leq b_i,\ i = 1, \dots, m\}$ (ellipsoid constraints; convex when $\mathbf{Q}_i \succeq 0$)
  • Second-order cone program (SOCP): $f(\mathbf{x}) = \mathbf{c}^\top \mathbf{x}$, $\mathcal{X} = \{\mathbf{x} \in \mathbb{R}^n \mid \|\mathbf{A}_i\mathbf{x} + \mathbf{b}_i\|_2 \leq \mathbf{c}_i^\top \mathbf{x} + d_i,\ i = 1, \dots, m\}$ (cone constraints)

Each class contains the previous one: an LP is a QP with $\mathbf{Q} = 0$, a QP is a QCQP with no quadratic constraints, and a quadratic objective or constraint can be rewritten as a second-order cone constraint via the epigraph trick.
$$\text{LP} \subset \text{QP} \subset \text{QCQP} \subset \text{SOCP}$$

(3) Properties of convex optimization problems
  • Convex problems don't have any spurious local optima that satisfy the KKT conditions.
    $\Rightarrow$ Any local optimum is a global optimum.
  • Practically, Newton's method converges fast and reliably (5-10 iterations max).
    $\Rightarrow$ We can bound solution time for real-time control.

3. Convex MPC

Think about this as "constrained LQR".

Remember from the QP approach: if we have a cost-to-go function, we can get $\mathbf{u}$ by solving a one-step problem:
$$\begin{aligned} \mathbf{u}_k &= \arg\min_{\mathbf{u}}\ \ell(\mathbf{x}_k, \mathbf{u}) + V_{k+1}(f(\mathbf{x}_k, \mathbf{u}))\\ &= \arg\min_{\mathbf{u}}\ \frac{1}{2}\mathbf{u}^\top \mathbf{R}\mathbf{u} + \frac{1}{2}(\mathbf{A}_k\mathbf{x}_k + \mathbf{B}_k\mathbf{u})^\top \mathbf{P}_{k+1}(\mathbf{A}_k\mathbf{x}_k + \mathbf{B}_k\mathbf{u}) \end{aligned}$$
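Setting the gradient of this (unconstrained) one-step objective to zero recovers the LQR gain, which confirms the equivalence:
$$\left(\mathbf{R} + \mathbf{B}_k^\top \mathbf{P}_{k+1}\mathbf{B}_k\right)\mathbf{u}_k = -\mathbf{B}_k^\top \mathbf{P}_{k+1}\mathbf{A}_k\mathbf{x}_k \quad\Rightarrow\quad \mathbf{u}_k = -\mathbf{K}_k\mathbf{x}_k$$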

We can add constraints on $\mathbf{u}$ to this one-step problem, but it will perform poorly because $V(\mathbf{x})$ was computed without constraints.

So there is no reason we can't add more steps to the one-step problem:
$$\begin{aligned} \min_{\mathbf{x}_{1:H},\,\mathbf{u}_{1:H-1}} \quad & \sum_{k=1}^{H-1}\left(\frac{1}{2}\mathbf{x}_k^\top \mathbf{Q}\mathbf{x}_k + \frac{1}{2}\mathbf{u}_k^\top \mathbf{R}\mathbf{u}_k\right) + \frac{1}{2}\mathbf{x}_H^\top \mathbf{P}\mathbf{x}_H\\ \text{s.t.} \quad & \mathbf{x}_{k+1} = \mathbf{A}_k\mathbf{x}_k + \mathbf{B}_k\mathbf{u}_k\\ & \mathbf{x}_k \in \mathcal{X},\ \mathbf{u}_k \in \mathcal{U} \end{aligned}$$

  • The last term $\frac{1}{2}\mathbf{x}_H^\top \mathbf{P}\mathbf{x}_H$ is the LQR cost-to-go.
  • $H \leq N$ is called the horizon.
  • With no additional constraints, MPC ("receding horizon control") exactly matches LQR for any $H$.
  • Intuition: explicit constrained optimization over the first $H$ steps gets the state close enough to the reference that the constraints are no longer active and the LQR solution/cost-to-go is valid further into the future.
  • In general:
    • A good approximation of $V(\mathbf{x})$ is important for good performance.
    • Better $V(\mathbf{x})$ $\Rightarrow$ shorter horizon $H$.
    • Longer horizon $H$ $\Rightarrow$ less reliance on $V(\mathbf{x})$.
(1) Example: Planar quadrotor

[Figure: planar quadrotor model with thrusts $u_1$, $u_2$]

Dynamics:
$$\begin{aligned} m\ddot{x} &= -(u_1+u_2)\sin\theta\\ m\ddot{y} &= (u_1+u_2)\cos\theta - mg\\ J\ddot{\theta} &= \frac{1}{2}l(u_1-u_2) \end{aligned}$$

Linearize about hover:
$$u_1 = u_2 = \frac{1}{2}mg$$

$$\Rightarrow \begin{cases} \Delta\ddot{x} = -g\,\Delta\theta\\ \Delta\ddot{y} = \frac{1}{m}(\Delta u_1 + \Delta u_2)\\ \Delta\ddot{\theta} = \frac{l}{2J}(\Delta u_1 - \Delta u_2) \end{cases}$$

State equation:
$$\begin{bmatrix} \Delta\dot{x}\\ \Delta\dot{y}\\ \Delta\dot{\theta}\\ \Delta\ddot{x}\\ \Delta\ddot{y}\\ \Delta\ddot{\theta} \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 0 & 1\\ 0 & 0 & -g & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} \Delta x\\ \Delta y\\ \Delta\theta\\ \Delta\dot{x}\\ \Delta\dot{y}\\ \Delta\dot{\theta} \end{bmatrix} + \begin{bmatrix} 0 & 0\\ 0 & 0\\ 0 & 0\\ 0 & 0\\ \frac{1}{m} & \frac{1}{m}\\ \frac{l}{2J} & -\frac{l}{2J} \end{bmatrix} \begin{bmatrix} \Delta u_1\\ \Delta u_2 \end{bmatrix}$$

Namely, $\dot{\mathbf{x}} = \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{u}$. (The code below discretizes the nonlinear dynamics with RK4 at $h = 0.05\,$s and obtains the discrete-time $\mathbf{A}$, $\mathbf{B}$ by automatic differentiation of the discrete step.)

MPC cost function:
$$J = \sum_{k=1}^{H-1}\left(\frac{1}{2}(\mathbf{x}_k-\mathbf{x}_{ref})^\top\mathbf{Q}(\mathbf{x}_k-\mathbf{x}_{ref}) + \frac{1}{2}\Delta\mathbf{u}_k^\top\mathbf{R}\,\Delta\mathbf{u}_k\right) + \frac{1}{2}(\mathbf{x}_H-\mathbf{x}_{ref})^\top\mathbf{P}_H(\mathbf{x}_H-\mathbf{x}_{ref})$$

Code:

using LinearAlgebra
using PyPlot
using SparseArrays
using ForwardDiff
using ControlSystems
using OSQP

#Model parameters
g = 9.81 #m/s^2
m = 1.0 #kg 
ℓ = 0.3 #meters
J = 0.2*m*ℓ*ℓ

#Thrust limits
umin = [0.2*m*g; 0.2*m*g]
umax = [0.6*m*g; 0.6*m*g]

h = 0.05 #time step (20 Hz)

#Planar Quadrotor Dynamics
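#Note: this function uses the mirrored sign convention for θ relative to the
#equations above (+sin(θ) and u2-u1 rather than -sin(θ) and u1-u2). Since A and B
#are obtained below by differentiating this function, the controllers remain
#self-consistent either way.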
function quad_dynamics(x,u)
    θ = x[3]
    
    ẍ = (1/m)*(u[1] + u[2])*sin(θ)
    ÿ = (1/m)*(u[1] + u[2])*cos(θ) - g
    θ̈ = (1/J)*(ℓ/2)*(u[2] - u[1])
    
    return [x[4:6]; ẍ; ÿ; θ̈]
end

function quad_dynamics_rk4(x,u)
    #RK4 integration with zero-order hold on u
    f1 = quad_dynamics(x, u)
    f2 = quad_dynamics(x + 0.5*h*f1, u)
    f3 = quad_dynamics(x + 0.5*h*f2, u)
    f4 = quad_dynamics(x + h*f3, u)
    return x + (h/6.0)*(f1 + 2*f2 + 2*f3 + f4)
end

#Linearized dynamics for hovering
x_hover = zeros(6)
u_hover = [0.5*m*g; 0.5*m*g]
A = ForwardDiff.jacobian(x->quad_dynamics_rk4(x,u_hover),x_hover);
B = ForwardDiff.jacobian(u->quad_dynamics_rk4(x_hover,u),u_hover);
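#Sanity check: hover is an equilibrium, so this should return (approximately) zeros(6)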
quad_dynamics_rk4(x_hover, u_hover)

Nx = 6     # number of states
Nu = 2     # number of controls
Tfinal = 10.0 # final time
Nt = Int(Tfinal/h)+1    # number of time steps
thist = Array(range(0,h*(Nt-1), step=h));

# Cost weights
Q = Array(1.0*I(Nx));
R = Array(.01*I(Nu));
Qn = Array(1.0*I(Nx));

#Cost function
function cost(xhist,uhist)
    cost = 0.5*xhist[:,end]'*Qn*xhist[:,end]
    for k = 1:(size(xhist,2)-1)
        cost = cost + 0.5*xhist[:,k]'*Q*xhist[:,k] + 0.5*uhist[:,k]'*R*uhist[:,k]
    end
    return cost
end

#LQR Hover Controller
P = dare(A,B,Q,R) #infinite-horizon cost-to-go (also used as the MPC terminal cost below)
K = dlqr(A,B,Q,R)

function lqr_controller(t,x,K,xref)
    
    return u_hover - K*(x-xref)
end

#Build QP matrices for OSQP
Nh = 20 #one second horizon at 20Hz
Nx = 6
Nu = 2
U = kron(Diagonal(I,Nh), [I zeros(Nu,Nx)]) #Matrix that picks out all u
Θ = kron(Diagonal(I,Nh), [0 0 0 0 1 0 0 0]) #Matrix that picks out all x3 (θ)
#Cost Hessian over z = [u_1; x_2; u_2; x_3; ...; u_Nh; x_(Nh+1)], with the DARE solution P as the terminal cost
H = sparse([kron(Diagonal(I,Nh-1),[R zeros(Nu,Nx); zeros(Nx,Nu) Q]) zeros((Nx+Nu)*(Nh-1), Nx+Nu); zeros(Nx+Nu,(Nx+Nu)*(Nh-1)) [R zeros(Nu,Nx); zeros(Nx,Nu) P]])
b = zeros(Nh*(Nx+Nu)) #linear cost term, filled in by the controller
#Dynamics constraints: first block row is B*u_1 - x_2 = -A*x_1; remaining rows encode x_(k+1) = A*x_k + B*u_k
C = sparse([[B -I zeros(Nx,(Nh-1)*(Nu+Nx))]; zeros(Nx*(Nh-1),Nu) [kron(Diagonal(I,Nh-1), [A B]) zeros((Nh-1)*Nx,Nx)] + [zeros((Nh-1)*Nx,Nx) kron(Diagonal(I,Nh-1),[zeros(Nx,Nu) Diagonal(-I,Nx)])]])

#Dynamics + Thrust limit constraints
D = [C; U]
lb = [zeros(Nx*Nh); kron(ones(Nh),umin-u_hover)]
ub = [zeros(Nx*Nh); kron(ones(Nh),umax-u_hover)]

#Dynamics + thrust limit + bound constraint on θ to keep the system within small-angle approximation
#D = [C; U; Θ]
#lb = [zeros(Nx*Nh); kron(ones(Nh),umin-u_hover); -0.2*ones(Nh)]
#ub = [zeros(Nx*Nh); kron(ones(Nh),umax-u_hover); 0.2*ones(Nh)]

prob = OSQP.Model()
OSQP.setup!(prob; P=H, q=b, A=D, l=lb, u=ub, verbose=false, eps_abs=1e-8, eps_rel=1e-8, polish=1);
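OSQP solves this QP with a first-order operator-splitting (ADMM) method. Setting the problem up once and only calling `OSQP.update!` and `OSQP.solve!` inside the controller avoids rebuilding the sparse matrices at every control step; only `q`, `l`, and `u` change with the current state and reference.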

#MPC Controller
function mpc_controller(t,x,xref)
    
    #Update QP problem with the current state and reference
    #Initial-state rows of the dynamics constraint: B*u_1 - x_2 = -A*x
    lb[1:6] .= -A*x
    ub[1:6] .= -A*x
    
    #Linear cost terms for tracking xref (the terminal step uses P)
    for j = 1:(Nh-1)
        b[(Nu+(j-1)*(Nx+Nu)).+(1:Nx)] .= -Q*xref
    end
    b[(Nu+(Nh-1)*(Nx+Nu)).+(1:Nx)] .= -P*xref
    
    OSQP.update!(prob, q=b, l=lb, u=ub)

    #Solve QP
    results = OSQP.solve!(prob)
    Δu = results.x[1:Nu]

    return u_hover + Δu
end

function closed_loop(x0,controller,N)
    xhist = zeros(length(x0),N)
    u0 = controller(1,x0)
    uhist = zeros(length(u0),N-1)
    uhist[:,1] .= u0
    xhist[:,1] .= x0
    for k = 1:(N-1)
        uk = controller(k,xhist[:,k])
        uhist[:,k] = max.(min.(umax, uk), umin) #enforce control limits
        xhist[:,k+1] .= quad_dynamics_rk4(xhist[:,k],uhist[:,k])
    end
    return xhist, uhist
end

x_ref = [0.0; 1.0; 0; 0; 0; 0]
x0 = [10.0; 2.0; 0.0; 0; 0; 0]
xhist1, uhist1 = closed_loop(x0, (t,x)->lqr_controller(t,x,K,x_ref), Nt);
xhist2, uhist2 = closed_loop(x0, (t,x)->mpc_controller(t,x,x_ref), Nt);
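A minimal plotting sketch to reproduce the figures below, using the PyPlot package imported above (the label and axis choices are my own):

#Compare LQR and MPC closed-loop trajectories
figure()
plot(thist, xhist1[1,:], label="LQR")
plot(thist, xhist2[1,:], label="MPC")
xlabel("time (s)"); ylabel("x (m)"); legend()

figure()
plot(thist[1:end-1], uhist1[1,:], label="LQR")
plot(thist[1:end-1], uhist2[1,:], label="MPC")
xlabel("time (s)"); ylabel("u1 (N)"); legend()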

Starting from $\mathbf{x}_0 = \begin{bmatrix}1.0 & 2.0 & 0 & 0 & 0 & 0\end{bmatrix}^\top$:
State trajectory:
[Figure: LQR vs. MPC state trajectories]
Control input:
[Figure: LQR vs. MPC control inputs]

Starting from $\mathbf{x}_0 = \begin{bmatrix}10.0 & 2.0 & 0 & 0 & 0 & 0\end{bmatrix}^\top$ (the case in the code above):
State trajectory:
[Figure: LQR vs. MPC state trajectories]
Control input:
[Figure: LQR vs. MPC control inputs]

Adding the bound constraint on $\theta$ (the commented-out D, lb, ub above) to keep the system within the small-angle approximation:
State trajectory:
[Figure: state trajectories with the θ bound]

Control input:
[Figure: control inputs with the θ bound]
