最优控制理论 三+、间接法参数化方法(带内点约束)

数值求解最优控制的各种方法分类如下:
连续时间OCP求解方法分类

间接法求解最优控制问题的思路是:先通过解析推导得到必要条件,然后将OCP问题参数化或离散化再求解。
前面最优控制理论 三、两点边值问题求解给了一个例子,这个例子只包含终端约束。本小节在它的基础上考虑了一个内点约束(interior point constraint),将问题参数化后求解,需要注意间接法的特点就是不需要离散化 x ( t ) , u ( t ) x(t),u(t) x(t),u(t)。本小节给出了这样一个简单的例子。

问题描述

最短时间拦截且飞行中途经过一个点的问题,如下
min ⁡ θ ( t ) J = t f s.t. { x ˙ = u ,   y ˙ = v u ˙ = a cos ⁡ θ ,   v ˙ = a sin ⁡ θ \min_{\theta(t)}J=t_f\\ \text{s.t.}\left\{\begin{matrix}\dot x=u,\ \dot y=v\\ \dot u=a\cos\theta,\ \dot v=a\sin\theta \end{matrix}\right. θ(t)minJ=tfs.t.{x˙=u, y˙=vu˙=acosθ, v˙=asinθ

已知初始点状态、落点位置、经过点的位置,即
x ( 0 ) = [ x ( 0 ) , y ( 0 ) , u ( 0 ) , v ( 0 ) ] T = 0 , N ( x ( t 1 ) , t 1 ) = [ x ( t 1 ) − x 1 y ( t 1 ) − y 1 ] = 0 ,   t 1 ∈ ( 0 , t f ) ψ ( x ( t f ) , t f ) = [ x ( t f ) − x f y ( t f ) ] = 0 \mathbf x(0)=[x(0),y(0),u(0),v(0)]^\mathrm T=0,\\ N(\mathbf x(t_1),t_1)=\begin{bmatrix}x(t_1)-x_1\\ y(t_1)-y_1\end{bmatrix}=0,\ t_1\in(0,t_f)\\ \psi(\mathbf x(t_f),t_f)=\begin{bmatrix}x(t_f)-x_f\\y(t_f)\end{bmatrix}=0 x(0)=[x(0),y(0),u(0),v(0)]T=0,N(x(t1),t1)=[x(t1)x1y(t1)y1]=0, t1(0,tf)ψ(x(tf),tf)=[x(tf)xfy(tf)]=0

t 1 t_1 t1未知。求最优控制 θ ∗ ( t ) \theta^*(t) θ(t)
在这里插入图片描述

间接法一阶必要条件推导

首先写出哈密尔顿函数 H = 1 + λ 1 u + λ 2 v + λ 3 a cos ⁡ θ + λ 4 a sin ⁡ θ H=1+\lambda_1u+\lambda_2v+\lambda_3a\cos\theta+\lambda_4a\sin\theta H=1+λ1u+λ2v+λ3acosθ+λ4asinθ,协态方程
λ ˙ ( t ) = − ∂ H ∂ x = [ 0 0 − λ 1 − λ 2 ] \dot\lambda(t)=-\frac {\partial H}{\partial \mathbf x}=\begin{bmatrix}0\\0\\ -\lambda_1\\-\lambda_2\end{bmatrix} λ˙(t)=xH= 00λ1λ2

控制方程 ∂ H ∂ θ = − λ 3 sin ⁡ θ + λ 4 a cos ⁡ θ = 0 ⇒ tan ⁡ θ ∗ = λ 4 λ 3 \frac{\partial H}{\partial \theta}=-\lambda_3\sin\theta+\lambda_4a\cos\theta=0\\\rArr\tan\theta^*=\frac{\lambda_4}{\lambda_3} θH=λ3sinθ+λ4acosθ=0tanθ=λ3λ4

可见只要求出协态变量就可以得到最优控制。对终端约束引入拉格朗日乘子 μ ∈ R 2 , Φ = ϕ + μ T ψ \mu\in\R^2,\Phi=\phi+\mu^\mathrm T\psi μR2,Φ=ϕ+μTψ,终端约束与终端时间满足:
λ ( t f ) = ∂ Φ ∂ x = [ μ 1 , μ 2 , 0 , 0 ] T H ( t f ) + ∂ Φ ∂ t f = 1 + μ 1 u ( t f ) + μ 2 v ( t f ) = 0 \lambda(t_f)=\frac{\partial\Phi}{\partial \mathbf x}=[\mu_1,\mu_2,0,0]^\mathrm T\\ H(t_f)+\frac{\partial \Phi}{\partial t_f}=1+\mu_1u(t_f)+\mu_2v(t_f)=0 λ(tf)=xΦ=[μ1,μ2,0,0]TH(tf)+tfΦ=1+μ1u(tf)+μ2v(tf)=0

对内点约束引入拉格朗日乘子 π ∈ R 2 \pi\in\R^2 πR2,内点约束满足:
λ ( t 1 − ) = λ ( t 1 + ) + π T ∂ N ∂ x \lambda(t_{1-})=\lambda(t_{1+})+\pi^\mathrm T\frac{\partial N}{\partial \mathbf x} λ(t1)=λ(t1+)+πTxN

推导可得
λ 1 − = λ 1 + + π 1 = μ 1 + π 1 λ 2 − = λ 2 + + π 2 = μ 2 + π 2 \lambda_{1-}=\lambda_{1+}+\pi_1=\mu_1+\pi_1\\ \lambda_{2-}=\lambda_{2+}+\pi_2=\mu_2+\pi_2\\ λ1=λ1++π1=μ1+π1λ2=λ2++π2=μ2+π2

λ 3 , 4 \lambda_{3,4} λ3,4 t 1 t_1 t1处连续,且可得 θ ( t ) \theta(t) θ(t)连续。由于协态变量是分段的,在两个阶段都不变而 λ 1 , 2 \lambda_{1,2} λ1,2只在 t 1 t_1 t1处跳变,从 t f t_f tf时刻往前推导可以得到
λ 1 ( t ) = { μ 1 + π 1 t ∈ [ 0 , t 1 ) μ 1 t ∈ ( t 1 , t f ] λ 2 ( t ) = { μ 2 + π 2 t ∈ [ 0 , t 1 ) μ 2 t ∈ ( t 1 , t f ] λ 3 ( t ) = { μ 1 t f + t 1 ( μ 1 + π 1 ) − ( 2 μ 1 + π 1 ) t t ∈ [ 0 , t 1 ) μ 1 ( t f − t ) t ∈ ( t 1 , t f ] λ 4 ( t ) = { μ 2 t f + t 1 ( μ 2 + π 2 ) − ( 2 μ 2 + π 2 ) t t ∈ [ 0 , t 1 ) μ 2 ( t f − t ) t ∈ ( t 1 , t f ] \lambda_1(t)=\begin{cases}\mu_1+\pi_1&t\in[0,t_1)\\ \mu_1&t\in(t_1,t_f]\end{cases}\\ \lambda_2(t)=\begin{cases}\mu_2+\pi_2&t\in[0,t_1)\\ \mu_2&t\in(t_1,t_f]\end{cases}\\ \lambda_3(t)= \begin{cases}\mu_1t_f+t_1(\mu_1+\pi_1)-(2\mu_1+\pi_1)t&t\in[0,t_1)\\ \mu_1(t_f-t)&t\in(t_1,t_f]\end{cases}\\ \lambda_4(t)= \begin{cases}\mu_2t_f+t_1(\mu_2+\pi_2)-(2\mu_2+\pi_2)t&t\in[0,t_1)\\ \mu_2(t_f-t)&t\in(t_1,t_f]\end{cases}\\ λ1(t)={μ1+π1μ1t[0,t1)t(t1,tf]λ2(t)={μ2+π2μ2t[0,t1)t(t1,tf]λ3(t)={μ1tf+t1(μ1+π1)(2μ1+π1)tμ1(tft)t[0,t1)t(t1,tf]λ4(t)={μ2tf+t1(μ2+π2)(2μ2+π2)tμ2(tft)t[0,t1)t(t1,tf]

另外,内点时间 H ( ⋅ ∗ , t ) H(\cdot^*,t) H(,t)连续,即 H ( ⋅ ∗ , t 1 − ) = H ( ⋅ ∗ , t 1 + ) + μ T ∂ ψ ∂ t 1 ⇒ λ 1 − u ( t 1 ) + λ 2 − v ( t 1 ) = λ 1 + u ( t 1 ) + λ 2 + v ( t 1 ) ⇒ u ( t 1 ) π 1 = − v ( t 1 ) π 2 H(\cdot^*,t_{1-})=H(\cdot^*,t_{1+})+\mu^\mathrm T\frac{\partial \psi}{\partial t_{1}}\\ \rArr \lambda_{1-}u(t_1)+\lambda_{2-}v(t_1)=\lambda_{1+}u(t_1)+\lambda_{2+}v(t_1)\\ \rArr u(t_1)\pi_1=-v(t_1)\pi_2 H(,t1)=H(,t1+)+μTt1ψλ1u(t1)+λ2v(t1)=λ1+u(t1)+λ2+v(t1)u(t1)π1=v(t1)π2

把以上这些条件代入解析表达式进行数值求解,问题转化为参数 [ t 1 , t f , μ 1 , μ 2 , π 1 , π 2 ] [t_1,t_f,\mu_1,\mu_2,\pi_1,\pi_2] [t1,tf,μ1,μ2,π1,π2]最优化问题,需要满足微分方程约束KaTeX parse error: Unexpected end of input in a macro argument, expected '}' at position 5: \dot\̲m̲a̲t̲h̲b̲f̲ ̲x=\mathbf f[\ma…以及代数方程约束:
1 + μ 1 u ( t f ) + μ 2 v ( t f ) = 0 ( μ 1 + π 1 ) u ( t 1 ) + ( μ 2 + π 2 ) v ( t 1 ) = μ 1 u ( t 1 ) + μ 2 v ( t 1 ) \begin{aligned} 1+\mu_1u(t_f)+\mu_2v(t_f)&=0\\ (\mu_1+\pi_1)u(t_1)+(\mu_2+\pi_2)v(t_1)&=\mu_1u(t_1)+\mu_2v(t_1) \end{aligned} 1+μ1u(tf)+μ2v(tf)(μ1+π1)u(t1)+(μ2+π2)v(t1)=0=μ1u(t1)+μ2v(t1)

数值求解

事实上,经过如上间接法的推导过程,已经将原来的两点边值问题转化成参数寻优问题,只需要求解 [ t 1 , t f , μ 1 , μ 2 , π 1 , π 2 ] [t_1,t_f,\mu_1,\mu_2,\pi_1,\pi_2] [t1,tf,μ1,μ2,π1,π2]就可以得到最优控制。这是间接法最突出的优点,也就是有可能(不一定所有问题都行)将函数空间中的无穷维优化问题转化为有限个参数的最优化问题,问题的规模大大降低。计算最难的地方在于如何猜测协态变量相关的参数,由于本文这个例子参数较少因此容易收敛。直接给出MATLAB程序,欢迎交流学习、批评指正。

function [p,X]=Indirect_DSS_InteriorPoint
% 最短时间拦截,带有一个内点约束
% 间接法推导必要条件,把协态变量函数、以及OCP问题本身
% 用(mu1,mu2,pi1,pi2,t1,tf)几个参数刻画,求解参数优化问题即可
% 采用直接打靶法
% 计算量小于直接法
% edited by: Siyang Meng in Northwestern Polytechnical University

%% problem parameters
    p.ns = 4; p.nu = 1; % number of states and controls
    p.t0 = 0; 
    p.tf = 12; % tf为性能指标
    p.acc=60;
    p.X10 = 0; p.X20 = 0; p.X30=0; p.X40=0;% initial conditions
    p.X11 = 1500; p.X21=300; % interior point the trajectory passes
    p.X1f = 2000; p.X2f = 0; % boundary conditions
    
%% shooting parameters
    p.nt1 = 30;     % number of node points in Phase 1
    p.nt2 = 10;      % number of node points in Phase 2( t1 not included )

    
%% initial guess of decision variables, based on emprical evidance
    decVar0 = [6;   % t1 initial Guess 
               2;  % tf-t1 initial Guess 
               -0.3;% mu1
               0.2;% mu2
               1.1;% pi1
               1];% pi2
%           % u 不是决策变量

    
%% solve the problem with NLP method
    options = optimoptions(@fmincon,'Display','Iter','MaxIter',1e3); % option
    [decVar_NLP,minJ] = fmincon(@(decVar) objective(decVar,p),decVar0,...
        [],[],[],[],[],[],@(x) constraints(x,p),options);
    minJ
%% obtain the optimal solution
    [~,ceq,p] = constraints(decVar_NLP,p)
    [~,X] = ode45(@(t,y) odeCon(t,y,p),p.t,[p.X10,p.X20,p.X30,p.X40]); 
    y1 = X(:,1); y2 = X(:,2); % extract states
    
%% plots
    figure(1)
    plot(y1,y2);hold on,plot(p.X11,p.X21,'bo',p.X1f,p.X2f,'rx')
    xlabel('x/m');ylabel('y/m');
    title('trajectory');
    figure(2);
    plot(p.t,p.u);
    xlabel('t/s');ylabel('θ(t)/rad');title('Control');
end

%% objective function
function f = objective(decVar,p)
    f=decVar(1)+decVar(2);
%     u = x(p.u_IDX);      % extract
%     L = u.^2/2;       % integrand
%     f = trapz(p.t,L); % calculate objective
end

%% constraint function, c<=0, ceq=0
function [c,ceq,p] = constraints(decVar,p)
    p.t1 = decVar(1);
    p.dtf = decVar(2);
    p.mu1= decVar(3);
    p.mu2= decVar(4);
    p.pi1= decVar(5);
    p.pi2= decVar(6);
%     p.u = decVar(p.u_IDX); % extract
    p.t = [linspace(p.t0,p.t1,p.nt1)';
           linspace(p.t1,p.t1+p.dtf,p.nt2)']; % time horizon
    p.t(p.nt1)=p.t1-0.001;
    p.t(p.nt1+1)=p.t1+0.001;
    % 由解析表达式得到最优控制
    [p.u,p.lambda]=necessaryOptimal(p.t,p);
    [~,Y] = ode45(@(t,y) odeCon(t,y,p),p.t,[p.X10,p.X20,p.X30,p.X40]); % simulation
    % path and terminal constraints
    ceq1 = Y(p.nt1,1) - p.X11; % passes position
    ceq2 = Y(p.nt1,2) - p.X21;
    ceq3 = Y(end,1) - p.X1f; % final state conditions
    ceq4 = Y(end,2) - p.X2f;
    % interoir time and terminal time constraints
    ceq5 = p.pi1*( Y(p.nt1,3)-Y(p.nt1+1,3) )+ p.pi1*Y(p.nt1,3) + ...
            p.pi2*( Y(p.nt1,4)-Y(p.nt1+1,4) )+ p.pi2*Y(p.nt1,4);
    ceq6 = 1+p.pi1*Y(end,3)+p.pi2*Y(end,4);
    ceq = [ ceq1;
            ceq2;
            ceq3;
            ceq4;
            ceq5;
            ceq6];
    % inequality constraints
    c1= 0.1-p.t1;
    c2= -p.dtf;
    c3= p.u-pi;% Active Constraints
    c4=-p.u-pi;
    c=[ c1;
        c2;
        c3;
        c4];
end

%% indirect method result
function [theta,lambda]=necessaryOptimal(t,p)
    t1=p.t1; tf=t1+p.dtf; mu1=p.mu1; mu2=p.mu2; pi1=p.pi1; pi2=p.pi2;
    t_phase1=t(1:p.nt1);
    t_phase2=t(p.nt1+1:p.nt1+p.nt2);
    lambda1=[repmat(mu1+pi1,p.nt1,1);
             repmat(mu1    ,p.nt2,1)];
    lambda2=[repmat(mu2+pi2,p.nt1,1);
             repmat(mu2    ,p.nt2,1)];
    lambda3=[mu1*tf+t1*(mu1+pi1)-(2*mu1+pi1)*t_phase1;
             mu1*(tf-t_phase2)];
    lambda4=[mu2*tf+t1*(mu2+pi2)-(2*mu2+pi2)*t_phase1;
             mu2*(tf-t_phase2)];
    theta=atan2(lambda4,lambda3);
    theta(end)=theta(end-1);% for case that lambda3=lambda4=0
    lambda=[lambda1,lambda2,lambda3,lambda4];
end

%% derivative function
function dydt = odeCon(t,X,p)
    theta = interp1(p.t,p.u,t); % u
    dydt(1,1) = X(3);
    dydt(2,1) = X(4);
    dydt(3,1) = p.acc*cos(theta);
    dydt(4,1) = p.acc*sin(theta);
end

求解结果如下
控制角度
轨迹
协态变量

参考文献

[1] Bryson A E , Ho Y C ,Applied optimal control : optimization, estimation, and control. Sec 3.5[J]. IEEE Transactions on Systems Man & Cybernetics, 1975
[2] Moritz Diehl, Numerical Optimal Control (draft), Sec 8.2. 2011

  • 1
    点赞
  • 23
    收藏
    觉得还不错? 一键收藏
  • 2
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 2
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值