【Optimal Control (CMU 16-745)】Lecture 9 Convex Model Predictive Control

Review:

  • Infinite-horizon LQR
  • Controllability
  • Dynamic programming


Overview

  • Convexity background
  • Convex MPC

1. Finally: what are the Lagrange multipliers?

Recall the Riccati derivation from the QP:
$$\lambda_k = \mathbf{P}_k \mathbf{x}_k$$

From DP:
$$V_k(\mathbf{x}) = \frac{1}{2}\mathbf{x}^\top \mathbf{P}_k \mathbf{x}$$

It follows that $\lambda_k = \nabla V_k(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_k} = \mathbf{P}_k \mathbf{x}_k$.

  • Dynamics multipliers are cost-to-go gradients.
  • Carries over to nonlinear setting (not just LQR).

In the nonlinear case, even if we cannot get the value function, we can still get its gradient.
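For the LQR case, this can be sketched directly from the KKT stationarity conditions of the QP (with $\lambda_{k+1}$ the multiplier on the constraint $\mathbf{x}_{k+1} = \mathbf{A}_k\mathbf{x}_k + \mathbf{B}_k\mathbf{u}_k$):
$$\frac{\partial L}{\partial \mathbf{x}_k} = \mathbf{Q}\mathbf{x}_k + \mathbf{A}_k^\top\lambda_{k+1} - \lambda_k = 0 \quad\Rightarrow\quad \lambda_k = \mathbf{Q}\mathbf{x}_k + \mathbf{A}_k^\top\lambda_{k+1}, \qquad \lambda_N = \mathbf{P}_N\mathbf{x}_N$$
Substituting $\lambda_{k+1} = \mathbf{P}_{k+1}\mathbf{x}_{k+1}$ and $\mathbf{x}_{k+1} = (\mathbf{A}_k - \mathbf{B}_k\mathbf{K}_k)\mathbf{x}_k$ gives $\lambda_k = \left(\mathbf{Q} + \mathbf{A}_k^\top\mathbf{P}_{k+1}(\mathbf{A}_k - \mathbf{B}_k\mathbf{K}_k)\right)\mathbf{x}_k$, and the bracketed matrix is one form of the Riccati recursion for $\mathbf{P}_k$, so $\lambda_k = \mathbf{P}_k\mathbf{x}_k$ follows by backward induction.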

DP vs. QP:
[Figure: closed-loop state and control trajectories from the DP and QP solutions]

The resulting state and control trajectories are identical.

Compare the gain matrix $\mathbf{K}$ from `dlqr` with the one from DP:

#Compute infinite-horizon K matrix using ControlSystems.jl
Kinf = dlqr(A,B,Q,R[1]) 
#Compare to ours
K[:,:,1]-Kinf

Result:

1×2 Matrix{Float64}:
 -6.72929e-9  -2.28764e-9

The cost-to-go matrix $\mathbf{P}$ agrees to similar precision.
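For reference, a minimal sketch of the backward Riccati recursion from the DP lecture that produces the `K` and `P` arrays compared above; the horizon `N` and the shapes of `A`, `B`, `Q`, `R` are assumptions carried over from the earlier example:

#Backward Riccati recursion (finite-horizon LQR via dynamic programming)
N = 100                          #assumed horizon length
n, m = size(B)                   #state and control dimensions
P = zeros(n,n,N)
K = zeros(m,n,N-1)
P[:,:,N] .= Q                    #terminal cost-to-go
for k = (N-1):-1:1
    K[:,:,k] .= (R + B'*P[:,:,k+1]*B)\(B'*P[:,:,k+1]*A)
    P[:,:,k] .= Q + K[:,:,k]'*R*K[:,:,k] + (A-B*K[:,:,k])'*P[:,:,k+1]*(A-B*K[:,:,k])
end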

2. Convex Model Predictive Control

(1) Motivation
  • LQR is very powerful but we often need to reason about constraints.
  • Often these constraints are simple (e.g. torque limits).
  • Constraints break the Riccati solution, but we can still solve the QP online.
  • Convex MPC has gotten popular as computers have gotten faster.
(2) Background: Convexity
(a) Convex set

$\mathcal{X} \subseteq \mathbb{R}^n$ is convex if for any $\mathbf{x}_1, \mathbf{x}_2 \in \mathcal{X}$, the line segment between them is also contained in $\mathcal{X}$.
[Figure: convex vs. non-convex sets]

Standard examples:
  • Linear subspace: $\mathcal{X} = \{\mathbf{x} \in \mathbb{R}^n \mid \mathbf{A}\mathbf{x} = \mathbf{b}\}$
  • Half space/box/polytope: $\mathcal{X} = \{\mathbf{x} \in \mathbb{R}^n \mid \mathbf{A}\mathbf{x} \leq \mathbf{b}\}$
  • Ellipsoid: $\mathcal{X} = \{\mathbf{x} \in \mathbb{R}^n \mid \mathbf{x}^\top \mathbf{P}^{-1}\mathbf{x} \leq 1\}$ with $\mathbf{P} \succ 0$
  • Cone: $\mathcal{X} = \{\mathbf{x} \in \mathbb{R}^n \mid \|\mathbf{x}_{2:n}\|_2 \leq x_1\}$ (the $L_2$ norm gives the second-order cone, i.e. the standard "ice cream" cone)
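Each of these sets admits a one-line membership test; a quick sketch (the helper names are my own):

using LinearAlgebra

in_polytope(x, A, b) = all(A*x .<= b)            #{x | Ax ≤ b}
in_ellipsoid(x, P)   = dot(x, P\x) <= 1.0        #{x | xᵀP⁻¹x ≤ 1}, P ≻ 0
in_soc(x)            = norm(x[2:end]) <= x[1]    #second-order cone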
(b) Convex function

A function $f: \mathbb{R}^n \rightarrow \mathbb{R}$ whose epigraph is a convex set.
(Standard definition: $f(\lambda\mathbf{x}_1 + (1-\lambda)\mathbf{x}_2) \leq \lambda f(\mathbf{x}_1) + (1-\lambda)f(\mathbf{x}_2)$ for all $\mathbf{x}_1, \mathbf{x}_2 \in \mathbb{R}^n$ and $\lambda \in [0,1]$.)
[Figure: a convex function and its epigraph]

The epigraph of a function is the set of points lying on or above its graph:
$$\text{epi}(f) = \{(\mathbf{x}, y) \in \mathbb{R}^{n+1} \mid f(\mathbf{x}) \leq y\}$$

Standard examples:
  • Linear function: $f(\mathbf{x}) = \mathbf{c}^\top \mathbf{x}$
  • Quadratic function: $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^\top \mathbf{Q}\mathbf{x} + \mathbf{c}^\top \mathbf{x}$ with $\mathbf{Q} \succ 0$
  • Norms: $f(\mathbf{x}) = \|\mathbf{x}\|_p = \left(\sum_{i=1}^n |x_i|^p\right)^{1/p}$, $p \geq 1$
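A quick numerical spot-check of the chord definition for the norm example (a sketch; the small tolerance only guards against floating-point noise):

using LinearAlgebra

f(x) = norm(x)   #‖x‖₂ is convex by the triangle inequality
x1, x2 = randn(3), randn(3)
all(f(λ*x1 + (1-λ)*x2) <= λ*f(x1) + (1-λ)*f(x2) + 1e-12 for λ in 0:0.1:1)   #true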
(c) Convex optimization problem

Minimize a convex function over a convex set:
$$\begin{array}{ll} \text{minimize} & f(\mathbf{x}) \\ \text{subject to} & \mathbf{x} \in \mathcal{X} \end{array}$$

Standard examples:
  • Linear program (LP): $f(\mathbf{x}) = \mathbf{c}^\top \mathbf{x}$, $\mathcal{X} = \{\mathbf{x} \in \mathbb{R}^n \mid \mathbf{A}\mathbf{x} \leq \mathbf{b}\}$
  • Quadratic program (QP): $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^\top \mathbf{Q}\mathbf{x} + \mathbf{c}^\top \mathbf{x}$, $\mathcal{X} = \{\mathbf{x} \in \mathbb{R}^n \mid \mathbf{A}\mathbf{x} \leq \mathbf{b}\}$
  • Quadratically-constrained quadratic program (QCQP): $f(\mathbf{x}) = \frac{1}{2}\mathbf{x}^\top \mathbf{Q}_0\mathbf{x} + \mathbf{c}^\top \mathbf{x}$, $\mathcal{X} = \{\mathbf{x} \in \mathbb{R}^n \mid \mathbf{x}^\top \mathbf{Q}_i\mathbf{x} + \mathbf{a}_i^\top \mathbf{x} \leq b_i,\ i = 1, \dots, m\}$ (ellipsoid constraints; convex when $\mathbf{Q}_i \succeq 0$)
  • Second-order cone program (SOCP): $f(\mathbf{x}) = \mathbf{c}^\top \mathbf{x}$, $\mathcal{X} = \{\mathbf{x} \in \mathbb{R}^n \mid \|\mathbf{A}_i\mathbf{x} + \mathbf{b}_i\|_2 \leq \mathbf{c}_i^\top \mathbf{x} + d_i,\ i = 1, \dots, m\}$ (cone constraints)

Each class contains the previous one: an LP is a QP with $\mathbf{Q} = 0$, a QP is a QCQP with no quadratic constraints, and a quadratic objective or constraint can be rewritten as a second-order cone constraint via the epigraph trick.
$$\text{LP} \subset \text{QP} \subset \text{QCQP} \subset \text{SOCP}$$

(3) Properties of convex optimization problems
  • Convex problems don't have any spurious local optima that satisfy the KKT conditions.
    $\Rightarrow$ Any local optimum is a global optimum.
  • Practically, Newton's method converges fast and reliably (5-10 iterations max).
    $\Rightarrow$ We can bound solution time for real-time control.

3. Convex MPC

Think about this as "constrained LQR".

Remember from the QP approach: if we have a cost-to-go function, we can get $\mathbf{u}$ by solving a one-step problem:
$$\begin{aligned} \mathbf{u}_k &= \arg\min_{\mathbf{u}}\ \ell(\mathbf{x}_k, \mathbf{u}) + V_{k+1}(f(\mathbf{x}_k, \mathbf{u}))\\ &= \arg\min_{\mathbf{u}}\ \frac{1}{2}\mathbf{u}^\top \mathbf{R}\mathbf{u} + \frac{1}{2}(\mathbf{A}_k\mathbf{x}_k + \mathbf{B}_k\mathbf{u})^\top \mathbf{P}_{k+1}(\mathbf{A}_k\mathbf{x}_k + \mathbf{B}_k\mathbf{u}) \end{aligned}$$
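Setting the gradient of this (unconstrained) one-step objective to zero recovers the LQR gain, which confirms the equivalence:
$$\left(\mathbf{R} + \mathbf{B}_k^\top \mathbf{P}_{k+1}\mathbf{B}_k\right)\mathbf{u}_k = -\mathbf{B}_k^\top \mathbf{P}_{k+1}\mathbf{A}_k\mathbf{x}_k \quad\Rightarrow\quad \mathbf{u}_k = -\mathbf{K}_k\mathbf{x}_k$$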

We can add constraints on $\mathbf{u}$ to this one-step problem, but it will perform poorly because $V(\mathbf{x})$ was computed without constraints.

So there is no reason we can't add more steps to the one-step problem:
$$\begin{aligned} \min_{\mathbf{x}_{1:H},\,\mathbf{u}_{1:H-1}} \quad & \sum_{k=1}^{H-1}\left(\frac{1}{2}\mathbf{x}_k^\top \mathbf{Q}\mathbf{x}_k + \frac{1}{2}\mathbf{u}_k^\top \mathbf{R}\mathbf{u}_k\right) + \frac{1}{2}\mathbf{x}_H^\top \mathbf{P}\mathbf{x}_H\\ \text{s.t.} \quad & \mathbf{x}_{k+1} = \mathbf{A}_k\mathbf{x}_k + \mathbf{B}_k\mathbf{u}_k\\ & \mathbf{x}_k \in \mathcal{X},\ \mathbf{u}_k \in \mathcal{U} \end{aligned}$$

  • The last term $\frac{1}{2}\mathbf{x}_H^\top \mathbf{P}\mathbf{x}_H$ is the LQR cost-to-go.
  • $H \leq N$ is called the horizon.
  • With no additional constraints, MPC ("receding horizon control") exactly matches LQR for any $H$.
  • Intuition: explicit constrained optimization over the first $H$ steps gets the state close enough to the reference that the constraints are no longer active and the LQR solution/cost-to-go is valid further into the future.
  • In general:
    • A good approximation of $V(\mathbf{x})$ is important for good performance.
    • Better $V(\mathbf{x})$ $\Rightarrow$ shorter horizon $H$.
    • Longer horizon $H$ $\Rightarrow$ less reliance on $V(\mathbf{x})$.
(1) Example: Planar quadrotor

[Figure: planar quadrotor model with thrusts $u_1$, $u_2$]

Dynamics:
$$\begin{aligned} m\ddot{x} &= -(u_1+u_2)\sin\theta\\ m\ddot{y} &= (u_1+u_2)\cos\theta - mg\\ J\ddot{\theta} &= \frac{1}{2}l(u_1-u_2) \end{aligned}$$

Linearize about hover:
$$u_1 = u_2 = \frac{1}{2}mg$$

$$\Rightarrow \begin{cases} \Delta\ddot{x} = -g\,\Delta\theta\\ \Delta\ddot{y} = \frac{1}{m}(\Delta u_1 + \Delta u_2)\\ \Delta\ddot{\theta} = \frac{l}{2J}(\Delta u_1 - \Delta u_2) \end{cases}$$

State equation:
$$\begin{bmatrix} \Delta\dot{x}\\ \Delta\dot{y}\\ \Delta\dot{\theta}\\ \Delta\ddot{x}\\ \Delta\ddot{y}\\ \Delta\ddot{\theta} \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 0 & 1\\ 0 & 0 & -g & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} \Delta x\\ \Delta y\\ \Delta\theta\\ \Delta\dot{x}\\ \Delta\dot{y}\\ \Delta\dot{\theta} \end{bmatrix} + \begin{bmatrix} 0 & 0\\ 0 & 0\\ 0 & 0\\ 0 & 0\\ \frac{1}{m} & \frac{1}{m}\\ \frac{l}{2J} & -\frac{l}{2J} \end{bmatrix} \begin{bmatrix} \Delta u_1\\ \Delta u_2 \end{bmatrix}$$

Namely, $\dot{\mathbf{x}} = \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{u}$. (The code below discretizes the nonlinear dynamics with RK4 at $h = 0.05\,$s and obtains the discrete-time $\mathbf{A}$, $\mathbf{B}$ by automatic differentiation of the discrete step.)

MPC cost function:
$$J = \sum_{k=1}^{H-1}\left(\frac{1}{2}(\mathbf{x}_k-\mathbf{x}_{ref})^\top\mathbf{Q}(\mathbf{x}_k-\mathbf{x}_{ref}) + \frac{1}{2}\Delta\mathbf{u}_k^\top\mathbf{R}\,\Delta\mathbf{u}_k\right) + \frac{1}{2}(\mathbf{x}_H-\mathbf{x}_{ref})^\top\mathbf{P}_H(\mathbf{x}_H-\mathbf{x}_{ref})$$

Code:

using LinearAlgebra
using PyPlot
using SparseArrays
using ForwardDiff
using ControlSystems
using OSQP

#Model parameters
g = 9.81 #m/s^2
m = 1.0 #kg 
ℓ = 0.3 #meters
J = 0.2*m*ℓ*ℓ

#Thrust limits
umin = [0.2*m*g; 0.2*m*g]
umax = [0.6*m*g; 0.6*m*g]

h = 0.05 #time step (20 Hz)

#Planar Quadrotor Dynamics
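#Note: this function uses the mirrored sign convention for θ relative to the
#equations above (+sin(θ) and u2-u1 rather than -sin(θ) and u1-u2). Since A and B
#are obtained below by differentiating this function, the controllers remain
#self-consistent either way.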
function quad_dynamics(x,u)
    θ = x[3]
    
    ẍ = (1/m)*(u[1] + u[2])*sin(θ)
    ÿ = (1/m)*(u[1] + u[2])*cos(θ) - g
    θ̈ = (1/J)*(ℓ/2)*(u[2] - u[1])
    
    return [x[4:6]; ẍ; ÿ; θ̈]
end

function quad_dynamics_rk4(x,u)
    #RK4 integration with zero-order hold on u
    f1 = quad_dynamics(x, u)
    f2 = quad_dynamics(x + 0.5*h*f1, u)
    f3 = quad_dynamics(x + 0.5*h*f2, u)
    f4 = quad_dynamics(x + h*f3, u)
    return x + (h/6.0)*(f1 + 2*f2 + 2*f3 + f4)
end

#Linearized dynamics for hovering
x_hover = zeros(6)
u_hover = [0.5*m*g; 0.5*m*g]
A = ForwardDiff.jacobian(x->quad_dynamics_rk4(x,u_hover),x_hover);
B = ForwardDiff.jacobian(u->quad_dynamics_rk4(x_hover,u),u_hover);
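#Sanity check: hover is an equilibrium, so this should return (approximately) zeros(6)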
quad_dynamics_rk4(x_hover, u_hover)

Nx = 6     # number of states
Nu = 2     # number of controls
Tfinal = 10.0 # final time
Nt = Int(Tfinal/h)+1    # number of time steps
thist = Array(range(0,h*(Nt-1), step=h));

# Cost weights
Q = Array(1.0*I(Nx));
R = Array(.01*I(Nu));
Qn = Array(1.0*I(Nx));

#Cost function
function cost(xhist,uhist)
    cost = 0.5*xhist[:,end]'*Qn*xhist[:,end]
    for k = 1:(size(xhist,2)-1)
        cost = cost + 0.5*xhist[:,k]'*Q*xhist[:,k] + 0.5*uhist[:,k]'*R*uhist[:,k]
    end
    return cost
end

#LQR Hover Controller
P = dare(A,B,Q,R) #infinite-horizon cost-to-go (also used as the MPC terminal cost below)
K = dlqr(A,B,Q,R)

function lqr_controller(t,x,K,xref)
    
    return u_hover - K*(x-xref)
end

#Build QP matrices for OSQP
Nh = 20 #one second horizon at 20Hz
Nx = 6
Nu = 2
U = kron(Diagonal(I,Nh), [I zeros(Nu,Nx)]) #Matrix that picks out all u
Θ = kron(Diagonal(I,Nh), [0 0 0 0 1 0 0 0]) #Matrix that picks out all x3 (θ)
#Cost Hessian over z = [u_1; x_2; u_2; x_3; ...; u_Nh; x_(Nh+1)], with the DARE solution P as the terminal cost
H = sparse([kron(Diagonal(I,Nh-1),[R zeros(Nu,Nx); zeros(Nx,Nu) Q]) zeros((Nx+Nu)*(Nh-1), Nx+Nu); zeros(Nx+Nu,(Nx+Nu)*(Nh-1)) [R zeros(Nu,Nx); zeros(Nx,Nu) P]])
b = zeros(Nh*(Nx+Nu)) #linear cost term, filled in by the controller
#Dynamics constraints: first block row is B*u_1 - x_2 = -A*x_1; remaining rows encode x_(k+1) = A*x_k + B*u_k
C = sparse([[B -I zeros(Nx,(Nh-1)*(Nu+Nx))]; zeros(Nx*(Nh-1),Nu) [kron(Diagonal(I,Nh-1), [A B]) zeros((Nh-1)*Nx,Nx)] + [zeros((Nh-1)*Nx,Nx) kron(Diagonal(I,Nh-1),[zeros(Nx,Nu) Diagonal(-I,Nx)])]])

#Dynamics + Thrust limit constraints
D = [C; U]
lb = [zeros(Nx*Nh); kron(ones(Nh),umin-u_hover)]
ub = [zeros(Nx*Nh); kron(ones(Nh),umax-u_hover)]

#Dynamics + thrust limit + bound constraint on θ to keep the system within small-angle approximation
#D = [C; U; Θ]
#lb = [zeros(Nx*Nh); kron(ones(Nh),umin-u_hover); -0.2*ones(Nh)]
#ub = [zeros(Nx*Nh); kron(ones(Nh),umax-u_hover); 0.2*ones(Nh)]

prob = OSQP.Model()
OSQP.setup!(prob; P=H, q=b, A=D, l=lb, u=ub, verbose=false, eps_abs=1e-8, eps_rel=1e-8, polish=1);
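OSQP solves this QP with a first-order operator-splitting (ADMM) method. Setting the problem up once and only calling `OSQP.update!` and `OSQP.solve!` inside the controller avoids rebuilding the sparse matrices at every control step; only `q`, `l`, and `u` change with the current state and reference.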

#MPC Controller
function mpc_controller(t,x,xref)
    
    #Update QP problem with the current state and reference
    #Initial-state rows of the dynamics constraint: B*u_1 - x_2 = -A*x
    lb[1:6] .= -A*x
    ub[1:6] .= -A*x
    
    #Linear cost terms for tracking xref (the terminal step uses P)
    for j = 1:(Nh-1)
        b[(Nu+(j-1)*(Nx+Nu)).+(1:Nx)] .= -Q*xref
    end
    b[(Nu+(Nh-1)*(Nx+Nu)).+(1:Nx)] .= -P*xref
    
    OSQP.update!(prob, q=b, l=lb, u=ub)

    #Solve QP
    results = OSQP.solve!(prob)
    Δu = results.x[1:Nu]

    return u_hover + Δu
end

function closed_loop(x0,controller,N)
    xhist = zeros(length(x0),N)
    u0 = controller(1,x0)
    uhist = zeros(length(u0),N-1)
    uhist[:,1] .= u0
    xhist[:,1] .= x0
    for k = 1:(N-1)
        uk = controller(k,xhist[:,k])
        uhist[:,k] = max.(min.(umax, uk), umin) #enforce control limits
        xhist[:,k+1] .= quad_dynamics_rk4(xhist[:,k],uhist[:,k])
    end
    return xhist, uhist
end

x_ref = [0.0; 1.0; 0; 0; 0; 0]
x0 = [10.0; 2.0; 0.0; 0; 0; 0]
xhist1, uhist1 = closed_loop(x0, (t,x)->lqr_controller(t,x,K,x_ref), Nt);
xhist2, uhist2 = closed_loop(x0, (t,x)->mpc_controller(t,x,x_ref), Nt);
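A minimal plotting sketch to reproduce the figures below, using the PyPlot package imported above (the label and axis choices are my own):

#Compare LQR and MPC closed-loop trajectories
figure()
plot(thist, xhist1[1,:], label="LQR")
plot(thist, xhist2[1,:], label="MPC")
xlabel("time (s)"); ylabel("x (m)"); legend()

figure()
plot(thist[1:end-1], uhist1[1,:], label="LQR")
plot(thist[1:end-1], uhist2[1,:], label="MPC")
xlabel("time (s)"); ylabel("u1 (N)"); legend()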

Starting from $\mathbf{x}_0 = \begin{bmatrix}1.0 & 2.0 & 0 & 0 & 0 & 0\end{bmatrix}^\top$:
State trajectory:
[Figure: LQR vs. MPC state trajectories]
Control input:
[Figure: LQR vs. MPC control inputs]

Starting from $\mathbf{x}_0 = \begin{bmatrix}10.0 & 2.0 & 0 & 0 & 0 & 0\end{bmatrix}^\top$ (the case in the code above):
State trajectory:
[Figure: LQR vs. MPC state trajectories]
Control input:
[Figure: LQR vs. MPC control inputs]

Adding the bound constraint on $\theta$ (the commented-out D, lb, ub above) to keep the system within the small-angle approximation:
State trajectory:
[Figure: state trajectories with the θ bound]

Control input:
[Figure: control inputs with the θ bound]
