本博客是最优控制理论 五、极大值原理→控制不等式约束的后续,计算部分。
极大值原理求解最优控制问题属于间接法,它的数值求解基本思路仍然是求解Hamilton系统,也就是包含状态 x ( t ) x(t) x(t),协态 λ ( t ) \lambda(t) λ(t)的变化。它与Hamilton函数法协态方程是一样的,求解的主要区别是,状态方程是在开关开3种系统方程之间切换。由于切换点的时刻和状态先验未知,因此这样构成的问题属于多点边值问题。类似于前面最优控制理论 三、两点边值问题求解这一节的思路,仍然是把最优控制问题转化为求根问题,主要求解步骤是:
- 猜测协态变量的初始值 λ ( t 0 ) \lambda(t_0) λ(t0),
- 然后代入求解Bang-Bang控制的哈密尔顿系统: 通过非线性方程求根方法进行打靶(Shooting)得到准确的协态初值,
- 积分这个ODE系统自然能得到最优控制的解析解 u ( x , λ ) u(x,\lambda) u(x,λ)。
本篇博客的重点不是多点边值问题求解,而是如何用MATLAB程序实现极大值原理所推导的必要条件。虽然这个编程实现很简单,但也有一些小技巧,希望看到这篇博客的人能有所收获。
案例
一维运动,
t
f
t_f
tf未知,综合性能指标如下
min
a
(
t
)
J
=
t
f
+
∫
0
t
f
1
2
R
a
2
d
t
s.t.
{
x
˙
=
v
v
˙
=
a
a
∈
[
−
2
,
1
]
\min_{a(t)}J=t_f+\int_0^{t_f}\frac{1}{2}Ra^2\mathrm dt\\ \text{s.t.}\left\{\begin{matrix} \dot x=v\\ \dot v=a\\ a\in[-2,1] \end{matrix}\right.
a(t)minJ=tf+∫0tf21Ra2dts.t.⎩
⎨
⎧x˙=vv˙=aa∈[−2,1]
x ( 0 ) = [ x ( 0 ) , v ( 0 ) ] T = 0 , ψ ( x ( t f ) , t f ) = [ x ( t f ) − 10 v ( t f ) ] = 0 \mathbf x(0)=[x(0),v(0)]^\mathrm T=0,\\ \psi(\mathbf x(t_f),t_f)=\begin{bmatrix}x(t_f)-10\\ v(t_f)\end{bmatrix}=0 x(0)=[x(0),v(0)]T=0,ψ(x(tf),tf)=[x(tf)−10v(tf)]=0
首先写出Hamilton函数,按照PMP推导必要条件:
λ
1
=
−
∂
H
∂
x
=
0
λ
2
=
−
∂
H
∂
v
=
−
λ
1
H
=
1
+
λ
1
v
+
λ
2
a
+
1
2
R
a
2
H
(
a
∗
,
⋅
)
⩽
H
(
a
,
⋅
)
\begin{aligned} &\lambda_{1}=-\frac{\partial H}{\partial x}=0 \\ &\lambda_{2}=-\frac{\partial H}{\partial v}=-\lambda_{1}\\ &H=1+\lambda_{1} v+\lambda_{2} a+\frac{1}{2} R a^{2} \\ &H\left(a^{*},\cdot\right) \leqslant H(a,\cdot) \end{aligned}
λ1=−∂x∂H=0λ2=−∂v∂H=−λ1H=1+λ1v+λ2a+21Ra2H(a∗,⋅)⩽H(a,⋅)
这个二次函数的极小值,在可行域
a
∈
[
−
2
,
1
]
a\in[-2,1]
a∈[−2,1]的约束下,使
λ
2
a
+
1
/
2
R
a
2
\lambda_{2} a+1/2 R a^{2}
λ2a+1/2Ra2取得极小值的
a
∗
a^*
a∗如下图所示:
最优控制
a
∗
a^*
a∗解析表达式为:
a
∗
=
{
−
2
,
λ
2
>
2
R
−
λ
2
R
,
λ
2
∈
[
−
R
,
2
R
]
1
,
λ
2
<
−
R
a^{*}= \begin{cases}-2, & \lambda_{2}>2 R \\ -\frac{\lambda_{2}}{R}, & \lambda_{2} \in[-R, 2 R] \\ 1, & \lambda_{2}<-R\end{cases}
a∗=⎩
⎨
⎧−2,−Rλ2,1,λ2>2Rλ2∈[−R,2R]λ2<−R另外还有必要条件,由于终端时刻未知,所以
t
f
t_f
tf时刻的必要条件为:
H
+
∂
φ
∂
t
=
1
+
λ
1
v
+
λ
2
a
∗
+
1
2
R
a
∗
2
=
1
+
λ
2
a
+
1
2
R
a
2
=
0
\begin{aligned} H+\frac{\partial \varphi}{\partial t} &=1+\lambda_{1}v+\lambda_{2} a^*+\frac{1}{2} R a^{*2} \\ &=1+\lambda_{2} a+\frac{1}{2} R a^{2}=0 \end{aligned}
H+∂t∂φ=1+λ1v+λ2a∗+21Ra∗2=1+λ2a+21Ra2=0根据这一点大概可以猜测一组
λ
1
,
λ
2
(
t
0
)
,
t
f
\lambda_1,\lambda_2(t_0),t_f
λ1,λ2(t0),tf.接下来就可以正式求解这个问题了。
求解程序说明
如博客一开始的讲述,这里需要构造一个打靶函数,积分Hamilton系统得到它的最终状态,使其满足约束条件。也就是说,要求打靶函数 Φ ( λ 0 ) − z ( t f ) = 0 \Phi(\lambda_0)-\mathbf z(t_f)=0 Φ(λ0)−z(tf)=0的根 λ 0 \lambda_0 λ0,其中 z ≡ [ x ( t f ) , v ( t f ) , H ( t f , ⋅ ) ] \mathbf z\equiv [x(t_f),v(t_f),H(t_f,\cdot)] z≡[x(tf),v(tf),H(tf,⋅)],由于方程数目大于自变量个数,因此需要一些转化技巧。
这里给出一个求解包含Bang-Bang控制形式的
H
a
m
i
l
t
o
n
Hamilton
Hamilton系统的建议,如果直接去用odeSolver
去求解这个
o
d
e
ode
ode系统,就会发现它的求解速度很慢。原因是开关函数的不连续性导致
o
d
e
ode
ode系统难以预测,如果步长不放得很小的话,计算精度很差。我这里是把Switching function单独提取出来,作为
o
d
e
ode
ode系统的触发条件,这样的话odeSolver
就可以自动监测Bang-Bang控制的出现时间。另外,计算终止的触发条件为
v
(
t
f
)
=
0
v(t_f)=0
v(tf)=0。
完整的程序如下:
function [Lam0_opt,sol] = BangBangIdPMP
%indirect Pontryagin Maximium Principle, to get Bang-Bang optimal control
% Edited by Siyang Meng in Northwestern Polytechnical University
% 2021-10-9 17:28:11
% callfunction
% editor: Siyang Meng, from Northwestern Polytechnical University
amin = -2;
amax = 1;
var.R =0.3;
t0 = 0;
tf = 30;% 这是一个很大的值,肯定用不了这么长时间,因为tf是触发得到的,因此需要给足
yinit = [0;0];
yfinal= [100;0];
%% nonlinear root finding
events = @(t,x)switchingOff(t,x,var);
refine = 1;
% Hamilton ODE dynamical system integration options
options = odeset('Events',events,'OutputFcn',[],'refine',refine);
options.AbsTol=1e-6;
options.RelTol=1e-4;
options.MaxStep=0.01;
options.step=0.001;
% solve shooting problem
% levenberg-marquardt trust-region-reflective trust-region-dogleg
soloptions = optimoptions('fsolve','Display','iter','Algorithm','levenberg-marquardt',...
'TolX',1e-8,'TolFun',1e-6,'MaxFunEvals',1000);
Lam00 = [-0.5 ; -3];
[rt,dXf,tout,yout,uout,teout,yeout,ieout] = ShootingTmlErr(Lam00,1);
plotUXL(tout,yout,uout);
[Lam0_opt,FVAL,EXITFLAG,OUTPUT,JACOB] = fsolve(@ShootingTmlErr,Lam00,soloptions);
%% result
[rt,dXf,tout,yout,uout,teout,yeout,ieout] = ShootingTmlErr(Lam0_opt,1);
sol = struct;
sol.var = var;
sol.xopt = Lam0_opt;
sol.rt = rt;
sol.t = tout;
sol.x = yout;
sol.u = uout;
sol.teout = teout;
sol.yeout = yeout;
% plot result
plotUXL(tout,yout,uout);
%%
function plotUXL(t,yout,uout)
t=tout; y=yout;
figure(1)
subplot(2,1,1)
plot(t,y(:,1),'k-'),title('state'),ylabel('x/m');
subplot(2,1,2)
plot(t,y(:,2),'k-'),xlabel('t/s'),ylabel('v/m/s');
hold off
figure(2)
plot(t,uout);xlabel('t/s'),ylabel('accleration'),title('control');
a=diff(y(:,2))./diff(t);
hold on
plot(t(1:end-1),a)
hold off
figure(3)
plot(t,y(:,3:4));xlabel('t/s'),ylabel('costate');
legend('\lambda_x','\lambda_v')
hold off
end
function dx = idHamiltonSystem(t,x,var,uflag)
% a template for Bang-Bang control, Bang-Bang control solved in segments
state = x(1:2);
costate = x(3:4);
switch uflag
case 'min'
u= amin;
case 'max'
u= amax;
case 'sf'
u = - x(4)/var.R;
end
dstate = [state(2);
u];
dcostate = [0 ;
-costate(1)];
dx = [dstate;
dcostate];
end
function [value,isterminal,direction,uflag] =switchingOff(t,x,var)
% a template for Bang-Bang control, to judge when to switch
Sf = - x(4)/var.R;
if Sf<amin
uflag = 'min';
else if Sf>amax
uflag = 'max';
else
uflag = 'sf';
end
end
value = [Sf-amin;
Sf-amax;
x(2)]; % Detect v = 0
isterminal = [1;1;1]; % Stop the integration
direction = [0;0;-1]; % direction
end
function [rt,dXf,tout,yout,uout,teout,yeout,ieout] = ShootingTmlErr(Lam0,outputSave)
% nonlinear root finding problem, OCP shooting formulate
% input: costate initial guess
% output: terminal constraint error,
% switching Hamiltonian system: swtiching points, state and
% costate
Lam0 = reshape(Lam0,2,1);
x0 = [yinit;
Lam0];
% 数值积分求解系统
tstart = t0;
y0 = x0; % 每一段迭代更新
[~,~,~,onOff] =switchingOff(t0,y0,var);
if nargin==2&&outputSave
tout = tstart;
yout = x0';
teout = [];
yeout = [];
ieout = [];
end
for i = 1:100
% Solve until the first terminal event.
system = @(t,x)idHamiltonSystem(t,x,var,onOff);
[t,y,te,ye,ie] = ode45(system,[tstart tf],y0,options);
% Accumulate output. This could be passed out as output arguments.
nt = length(t);
if nargin==2&&outputSave
tout = [tout; t(2:nt)];
yout = [yout; y(2:nt,:)];
teout = [teout; te]; % Events at tstart are never reported.
yeout = [yeout; ye];
ieout = [ieout; ie];
end
tstart = t(nt);
if sum(ie==3)||tstart==tf
if nargin==2&&outputSave
teout = [teout; tf]; % Events at tstart are never reported.
yeout = [yeout; yout(end,:)];
ieout = [ieout; 0];
end
break;
else
y0 = reshape(ye(end,:),4,1);
% Set the new control throttle
[~,~,~,onOff] =switchingOff(te,ye,var);
end
end
if nargin==1
yout=y;
end
uout = [- yout(:,4)/var.R];
uout(uout>amax)=amax;
uout(uout<amin)=amin;
% terminal state triggered
y_end = y(end,:)';
a = uout(end);
Hf= 1+y_end(4)*a+0.5*var.R*a^2;
dXf = [yfinal - y_end(1:2)];
% rt = dXf;
rt = [dXf(1);
Hf-0];
end
end
所得到的结果如下:
打靶所得的解为
λ
(
t
0
)
=
[
−
0.2790
,
−
3.0933
]
\lambda(t_0)=[-0.2790,-3.0933]
λ(t0)=[−0.2790,−3.0933],开关时间序列为
[
10.013
,
13.240
]
[10.013, 13.240]
[10.013,13.240],在这个区间是过渡区,最短时间为
t
f
=
17.440
t_f=17.440
tf=17.440。 这个程序很容易可以改成最小能量、或最短时间Bang-Bang控制的两种问题,可以作为复杂问题的程序框架。
需要说明,间接打靶法的求解需要初始猜测非常接近最优解,而且计算的不稳定性很强。另外由于间接法局限于解析推导,所能解决的实际问题不多,这也是限制它的应用的一个原因。