数值求解最优控制的各种方法分类如下:
间接法求解最优控制问题的思路是:先通过解析推导得到必要条件,然后将OCP问题参数化或离散化再求解。
前面最优控制理论 三、两点边值问题求解给了一个例子,这个例子只包含终端约束。本小节在它的基础上考虑了一个内点约束(interior point constraint),将问题参数化后求解,需要注意间接法的特点就是不需要离散化
x
(
t
)
,
u
(
t
)
x(t),u(t)
x(t),u(t)。本小节给出了这样一个简单的例子。
问题描述
最短时间拦截且飞行中途经过一个点的问题,如下
min
θ
(
t
)
J
=
t
f
s.t.
{
x
˙
=
u
,
y
˙
=
v
u
˙
=
a
cos
θ
,
v
˙
=
a
sin
θ
\min_{\theta(t)}J=t_f\\ \text{s.t.}\left\{\begin{matrix}\dot x=u,\ \dot y=v\\ \dot u=a\cos\theta,\ \dot v=a\sin\theta \end{matrix}\right.
θ(t)minJ=tfs.t.{x˙=u, y˙=vu˙=acosθ, v˙=asinθ
已知初始点状态、落点位置、经过点的位置,即
x
(
0
)
=
[
x
(
0
)
,
y
(
0
)
,
u
(
0
)
,
v
(
0
)
]
T
=
0
,
N
(
x
(
t
1
)
,
t
1
)
=
[
x
(
t
1
)
−
x
1
y
(
t
1
)
−
y
1
]
=
0
,
t
1
∈
(
0
,
t
f
)
ψ
(
x
(
t
f
)
,
t
f
)
=
[
x
(
t
f
)
−
x
f
y
(
t
f
)
]
=
0
\mathbf x(0)=[x(0),y(0),u(0),v(0)]^\mathrm T=0,\\ N(\mathbf x(t_1),t_1)=\begin{bmatrix}x(t_1)-x_1\\ y(t_1)-y_1\end{bmatrix}=0,\ t_1\in(0,t_f)\\ \psi(\mathbf x(t_f),t_f)=\begin{bmatrix}x(t_f)-x_f\\y(t_f)\end{bmatrix}=0
x(0)=[x(0),y(0),u(0),v(0)]T=0,N(x(t1),t1)=[x(t1)−x1y(t1)−y1]=0, t1∈(0,tf)ψ(x(tf),tf)=[x(tf)−xfy(tf)]=0
且
t
1
t_1
t1未知。求最优控制
θ
∗
(
t
)
\theta^*(t)
θ∗(t)
间接法一阶必要条件推导
首先写出哈密尔顿函数
H
=
1
+
λ
1
u
+
λ
2
v
+
λ
3
a
cos
θ
+
λ
4
a
sin
θ
H=1+\lambda_1u+\lambda_2v+\lambda_3a\cos\theta+\lambda_4a\sin\theta
H=1+λ1u+λ2v+λ3acosθ+λ4asinθ,协态方程
λ
˙
(
t
)
=
−
∂
H
∂
x
=
[
0
0
−
λ
1
−
λ
2
]
\dot\lambda(t)=-\frac {\partial H}{\partial \mathbf x}=\begin{bmatrix}0\\0\\ -\lambda_1\\-\lambda_2\end{bmatrix}
λ˙(t)=−∂x∂H=
00−λ1−λ2
控制方程 ∂ H ∂ θ = − λ 3 sin θ + λ 4 a cos θ = 0 ⇒ tan θ ∗ = λ 4 λ 3 \frac{\partial H}{\partial \theta}=-\lambda_3\sin\theta+\lambda_4a\cos\theta=0\\\rArr\tan\theta^*=\frac{\lambda_4}{\lambda_3} ∂θ∂H=−λ3sinθ+λ4acosθ=0⇒tanθ∗=λ3λ4
可见只要求出协态变量就可以得到最优控制。对终端约束引入拉格朗日乘子
μ
∈
R
2
,
Φ
=
ϕ
+
μ
T
ψ
\mu\in\R^2,\Phi=\phi+\mu^\mathrm T\psi
μ∈R2,Φ=ϕ+μTψ,终端约束与终端时间满足:
λ
(
t
f
)
=
∂
Φ
∂
x
=
[
μ
1
,
μ
2
,
0
,
0
]
T
H
(
t
f
)
+
∂
Φ
∂
t
f
=
1
+
μ
1
u
(
t
f
)
+
μ
2
v
(
t
f
)
=
0
\lambda(t_f)=\frac{\partial\Phi}{\partial \mathbf x}=[\mu_1,\mu_2,0,0]^\mathrm T\\ H(t_f)+\frac{\partial \Phi}{\partial t_f}=1+\mu_1u(t_f)+\mu_2v(t_f)=0
λ(tf)=∂x∂Φ=[μ1,μ2,0,0]TH(tf)+∂tf∂Φ=1+μ1u(tf)+μ2v(tf)=0
对内点约束引入拉格朗日乘子
π
∈
R
2
\pi\in\R^2
π∈R2,内点约束满足:
λ
(
t
1
−
)
=
λ
(
t
1
+
)
+
π
T
∂
N
∂
x
\lambda(t_{1-})=\lambda(t_{1+})+\pi^\mathrm T\frac{\partial N}{\partial \mathbf x}
λ(t1−)=λ(t1+)+πT∂x∂N
推导可得
λ
1
−
=
λ
1
+
+
π
1
=
μ
1
+
π
1
λ
2
−
=
λ
2
+
+
π
2
=
μ
2
+
π
2
\lambda_{1-}=\lambda_{1+}+\pi_1=\mu_1+\pi_1\\ \lambda_{2-}=\lambda_{2+}+\pi_2=\mu_2+\pi_2\\
λ1−=λ1++π1=μ1+π1λ2−=λ2++π2=μ2+π2
λ
3
,
4
\lambda_{3,4}
λ3,4在
t
1
t_1
t1处连续,且可得
θ
(
t
)
\theta(t)
θ(t)连续。由于协态变量是分段的,在两个阶段都不变而
λ
1
,
2
\lambda_{1,2}
λ1,2只在
t
1
t_1
t1处跳变,从
t
f
t_f
tf时刻往前推导可以得到
λ
1
(
t
)
=
{
μ
1
+
π
1
t
∈
[
0
,
t
1
)
μ
1
t
∈
(
t
1
,
t
f
]
λ
2
(
t
)
=
{
μ
2
+
π
2
t
∈
[
0
,
t
1
)
μ
2
t
∈
(
t
1
,
t
f
]
λ
3
(
t
)
=
{
μ
1
t
f
+
t
1
(
μ
1
+
π
1
)
−
(
2
μ
1
+
π
1
)
t
t
∈
[
0
,
t
1
)
μ
1
(
t
f
−
t
)
t
∈
(
t
1
,
t
f
]
λ
4
(
t
)
=
{
μ
2
t
f
+
t
1
(
μ
2
+
π
2
)
−
(
2
μ
2
+
π
2
)
t
t
∈
[
0
,
t
1
)
μ
2
(
t
f
−
t
)
t
∈
(
t
1
,
t
f
]
\lambda_1(t)=\begin{cases}\mu_1+\pi_1&t\in[0,t_1)\\ \mu_1&t\in(t_1,t_f]\end{cases}\\ \lambda_2(t)=\begin{cases}\mu_2+\pi_2&t\in[0,t_1)\\ \mu_2&t\in(t_1,t_f]\end{cases}\\ \lambda_3(t)= \begin{cases}\mu_1t_f+t_1(\mu_1+\pi_1)-(2\mu_1+\pi_1)t&t\in[0,t_1)\\ \mu_1(t_f-t)&t\in(t_1,t_f]\end{cases}\\ \lambda_4(t)= \begin{cases}\mu_2t_f+t_1(\mu_2+\pi_2)-(2\mu_2+\pi_2)t&t\in[0,t_1)\\ \mu_2(t_f-t)&t\in(t_1,t_f]\end{cases}\\
λ1(t)={μ1+π1μ1t∈[0,t1)t∈(t1,tf]λ2(t)={μ2+π2μ2t∈[0,t1)t∈(t1,tf]λ3(t)={μ1tf+t1(μ1+π1)−(2μ1+π1)tμ1(tf−t)t∈[0,t1)t∈(t1,tf]λ4(t)={μ2tf+t1(μ2+π2)−(2μ2+π2)tμ2(tf−t)t∈[0,t1)t∈(t1,tf]
另外,内点时间 H ( ⋅ ∗ , t ) H(\cdot^*,t) H(⋅∗,t)连续,即 H ( ⋅ ∗ , t 1 − ) = H ( ⋅ ∗ , t 1 + ) + μ T ∂ ψ ∂ t 1 ⇒ λ 1 − u ( t 1 ) + λ 2 − v ( t 1 ) = λ 1 + u ( t 1 ) + λ 2 + v ( t 1 ) ⇒ u ( t 1 ) π 1 = − v ( t 1 ) π 2 H(\cdot^*,t_{1-})=H(\cdot^*,t_{1+})+\mu^\mathrm T\frac{\partial \psi}{\partial t_{1}}\\ \rArr \lambda_{1-}u(t_1)+\lambda_{2-}v(t_1)=\lambda_{1+}u(t_1)+\lambda_{2+}v(t_1)\\ \rArr u(t_1)\pi_1=-v(t_1)\pi_2 H(⋅∗,t1−)=H(⋅∗,t1+)+μT∂t1∂ψ⇒λ1−u(t1)+λ2−v(t1)=λ1+u(t1)+λ2+v(t1)⇒u(t1)π1=−v(t1)π2
把以上这些条件代入解析表达式进行数值求解,问题转化为参数 [ t 1 , t f , μ 1 , μ 2 , π 1 , π 2 ] [t_1,t_f,\mu_1,\mu_2,\pi_1,\pi_2] [t1,tf,μ1,μ2,π1,π2]最优化问题,需要满足微分方程约束 x ˙ = f ( x ( t ) , θ ( t ) , t ) \dot{\mathbf {x}}=\mathbf f(\mathbf x(t),\theta(t),t) x˙=f(x(t),θ(t),t)以及代数方程约束:
1 + μ 1 u ( t f ) + μ 2 v ( t f ) = 0 ( μ 1 + π 1 ) u ( t 1 ) + ( μ 2 + π 2 ) v ( t 1 ) = μ 1 u ( t 1 ) + μ 2 v ( t 1 ) \begin{aligned} 1+\mu_1u(t_f)+\mu_2v(t_f)&=0\\ (\mu_1+\pi_1)u(t_1)+(\mu_2+\pi_2)v(t_1)&=\mu_1u(t_1)+\mu_2v(t_1) \end{aligned} 1+μ1u(tf)+μ2v(tf)(μ1+π1)u(t1)+(μ2+π2)v(t1)=0=μ1u(t1)+μ2v(t1)
数值求解
事实上,经过如上间接法的推导过程,已经将原来的两点边值问题转化成参数寻优问题,只需要求解 [ t 1 , t f , μ 1 , μ 2 , π 1 , π 2 ] [t_1,t_f,\mu_1,\mu_2,\pi_1,\pi_2] [t1,tf,μ1,μ2,π1,π2]就可以得到最优控制。这是间接法最突出的优点,也就是有可能(不一定所有问题都行)将函数空间中的无穷维优化问题转化为有限个参数的最优化问题,问题的规模大大降低。计算最难的地方在于如何猜测协态变量相关的参数,由于本文这个例子参数较少因此容易收敛。直接给出MATLAB程序,欢迎交流学习、批评指正。
function [p,X]=Indirect_DSS_InteriorPoint
% 最短时间拦截,带有一个内点约束
% 间接法推导必要条件,把协态变量函数、以及OCP问题本身
% 用(mu1,mu2,pi1,pi2,t1,tf)几个参数刻画,求解参数优化问题即可
% 采用直接打靶法
% 计算量小于直接法
% edited by: Siyang Meng in Northwestern Polytechnical University
%% problem parameters
p.ns = 4; p.nu = 1; % number of states and controls
p.t0 = 0;
p.tf = 12; % tf为性能指标
p.acc=60;
p.X10 = 0; p.X20 = 0; p.X30=0; p.X40=0;% initial conditions
p.X11 = 1500; p.X21=300; % interior point the trajectory passes
p.X1f = 2000; p.X2f = 0; % boundary conditions
%% shooting parameters
p.nt1 = 30; % number of node points in Phase 1
p.nt2 = 10; % number of node points in Phase 2( t1 not included )
%% initial guess of decision variables, based on emprical evidance
decVar0 = [6; % t1 initial Guess
2; % tf-t1 initial Guess
-0.3;% mu1
0.2;% mu2
1.1;% pi1
1];% pi2
% % u 不是决策变量
%% solve the problem with NLP method
options = optimoptions(@fmincon,'Display','Iter','MaxIter',1e3); % option
[decVar_NLP,minJ] = fmincon(@(decVar) objective(decVar,p),decVar0,...
[],[],[],[],[],[],@(x) constraints(x,p),options);
minJ
%% obtain the optimal solution
[~,ceq,p] = constraints(decVar_NLP,p)
[~,X] = ode45(@(t,y) odeCon(t,y,p),p.t,[p.X10,p.X20,p.X30,p.X40]);
y1 = X(:,1); y2 = X(:,2); % extract states
%% plots
figure(1)
plot(y1,y2);hold on,plot(p.X11,p.X21,'bo',p.X1f,p.X2f,'rx')
xlabel('x/m');ylabel('y/m');
title('trajectory');
figure(2);
plot(p.t,p.u);
xlabel('t/s');ylabel('θ(t)/rad');title('Control');
end
%% objective function
function f = objective(decVar,p)
f=decVar(1)+decVar(2);
% u = x(p.u_IDX); % extract
% L = u.^2/2; % integrand
% f = trapz(p.t,L); % calculate objective
end
%% constraint function, c<=0, ceq=0
function [c,ceq,p] = constraints(decVar,p)
p.t1 = decVar(1);
p.dtf = decVar(2);
p.mu1= decVar(3);
p.mu2= decVar(4);
p.pi1= decVar(5);
p.pi2= decVar(6);
% p.u = decVar(p.u_IDX); % extract
p.t = [linspace(p.t0,p.t1,p.nt1)';
linspace(p.t1,p.t1+p.dtf,p.nt2)']; % time horizon
p.t(p.nt1)=p.t1-0.001;
p.t(p.nt1+1)=p.t1+0.001;
% 由解析表达式得到最优控制
[p.u,p.lambda]=necessaryOptimal(p.t,p);
[~,Y] = ode45(@(t,y) odeCon(t,y,p),p.t,[p.X10,p.X20,p.X30,p.X40]); % simulation
% path and terminal constraints
ceq1 = Y(p.nt1,1) - p.X11; % passes position
ceq2 = Y(p.nt1,2) - p.X21;
ceq3 = Y(end,1) - p.X1f; % final state conditions
ceq4 = Y(end,2) - p.X2f;
% interoir time and terminal time constraints
ceq5 = p.pi1*( Y(p.nt1,3)-Y(p.nt1+1,3) )+ p.pi1*Y(p.nt1,3) + ...
p.pi2*( Y(p.nt1,4)-Y(p.nt1+1,4) )+ p.pi2*Y(p.nt1,4);
ceq6 = 1+p.pi1*Y(end,3)+p.pi2*Y(end,4);
ceq = [ ceq1;
ceq2;
ceq3;
ceq4;
ceq5;
ceq6];
% inequality constraints
c1= 0.1-p.t1;
c2= -p.dtf;
c3= p.u-pi;% Active Constraints
c4=-p.u-pi;
c=[ c1;
c2;
c3;
c4];
end
%% indirect method result
function [theta,lambda]=necessaryOptimal(t,p)
t1=p.t1; tf=t1+p.dtf; mu1=p.mu1; mu2=p.mu2; pi1=p.pi1; pi2=p.pi2;
t_phase1=t(1:p.nt1);
t_phase2=t(p.nt1+1:p.nt1+p.nt2);
lambda1=[repmat(mu1+pi1,p.nt1,1);
repmat(mu1 ,p.nt2,1)];
lambda2=[repmat(mu2+pi2,p.nt1,1);
repmat(mu2 ,p.nt2,1)];
lambda3=[mu1*tf+t1*(mu1+pi1)-(2*mu1+pi1)*t_phase1;
mu1*(tf-t_phase2)];
lambda4=[mu2*tf+t1*(mu2+pi2)-(2*mu2+pi2)*t_phase1;
mu2*(tf-t_phase2)];
theta=atan2(lambda4,lambda3);
theta(end)=theta(end-1);% for case that lambda3=lambda4=0
lambda=[lambda1,lambda2,lambda3,lambda4];
end
%% derivative function
function dydt = odeCon(t,X,p)
theta = interp1(p.t,p.u,t); % u
dydt(1,1) = X(3);
dydt(2,1) = X(4);
dydt(3,1) = p.acc*cos(theta);
dydt(4,1) = p.acc*sin(theta);
end
求解结果如下
参考文献
[1] Bryson A E , Ho Y C ,Applied optimal control : optimization, estimation, and control. Sec 3.5[J]. IEEE Transactions on Systems Man & Cybernetics, 1975
[2] Moritz Diehl, Numerical Optimal Control (draft), Sec 8.2. 2011