Derivation of the KKT Conditions for Constrained Optimization Problems

xiaofei473

Article link: https://blog.csdn.net/xiaofei473/article/details/119356929

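The KKT conditions are first-order necessary conditions for a solution of a constrained optimization problem. They can be proved in several ways; one of the more accessible proofs starts from linear independence of the constraint gradients, and that is the route followed here.
A constrained optimization problem can be written in the general form
$\min_{x\in \mathbb{R} ^n} f\left( x \right) \\ \mathrm{s.t.}\; \left\{ \begin{aligned} c_i(x)&=0, i\in \mathcal{E}\\ c_i(x)&\le 0, i\in \mathcal{I}\\\end{aligned} \right. \tag{1}$
where $f$ and $c_i$ are smooth functions, and $\mathcal{E}$ and $\mathcal{I}$ are the index sets of the equality and inequality constraints, respectively. The feasible set is the set of points $x$ that satisfy all constraints, i.e. $\Omega =\left\{ \left. x \right|c_i(x)=0, i\in \mathcal{E} ; c_i(x)\le 0, i\in \mathcal{I} \right\}$ .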

Local solution: $x^{\ast}\in \Omega$ and there exists a neighborhood $\mathcal{N}$ of $x^{\ast}$ such that $f\left( x^{\ast} \right) \le f\left( x \right)$ for all $x\in \mathcal{N} \cap \Omega$ ;
Strict local solution: $x^{\ast}\in \Omega$ and there exists a neighborhood $\mathcal{N}$ of $x^{\ast}$ such that $f\left( x^{\ast} \right) < f\left( x \right)$ for all $x\in \mathcal{N} \cap \Omega$ with $x\ne x^{\ast}$ ;
Isolated local solution: $x^{\ast}\in \Omega$ and there exists a neighborhood $\mathcal{N}$ of $x^{\ast}$ such that $x^{\ast}$ is the only local solution in $\mathcal{N} \cap \Omega$ ;
Active set: $\mathcal{A} \left( x \right) =\mathcal{E} \cup \left\{ \left. i\in \mathcal{I} \right|c_i\left( x \right) =0 \right\}$ ;
Feasible sequence approaching a feasible point $x$ : a sequence $\left\{ z_k \right\}$ with $z_k\in \Omega$ for all sufficiently large $k$ and $z_k\rightarrow x$ ;
Tangent vector $d$ (at $x$ ): there exist a feasible sequence $\left\{ z_k \right\}$ approaching $x$ and a sequence of positive numbers $\left\{ t_k \right\}$ with $t_k\rightarrow 0$ such that $\lim\limits_{k\rightarrow \infty} \frac{z_k-x}{t_k}=d$ ;
Tangent cone: the set $T_{\Omega}\left( x^{\ast} \right)$ of all tangent vectors at $x^{\ast}$ ;
Linearized feasible directions: $\mathcal{F} \left( x \right) =\left\{ d\left| \begin{array}{l} d^{\mathrm{T}}\nabla c_i(x)=0,i\in \mathcal{E}\\ d^{\mathrm{T}}\nabla c_i(x)\le 0,i\in \mathcal{A} (x)\cap \mathcal{I}\\\end{array} \right. \right\}$ ;
LICQ (linear independence constraint qualification): the set of active constraint gradients $\left\{ \nabla c_i\left( x \right) ,i\in \mathcal{A} \left( x \right) \right\}$ is linearly independent.
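
The active set and the LICQ test above translate directly into a few lines of NumPy. The sketch below uses a hypothetical toy problem (the three constraints, the names `cons`, `grads`, `active_set`, `licq_holds`, and the tolerance are all assumptions made for illustration, not part of the original derivation):

```python
import numpy as np

# Hypothetical toy problem (not from the article):
#   equality   c_0(x) = x1 + x2 - 1 = 0
#   inequality c_1(x) = x1^2 + x2^2 - 1 <= 0
#   inequality c_2(x) = -x1 - 2 <= 0
cons = [lambda x: x[0] + x[1] - 1.0,
        lambda x: x[0]**2 + x[1]**2 - 1.0,
        lambda x: -x[0] - 2.0]
grads = [lambda x: np.array([1.0, 1.0]),
         lambda x: np.array([2.0 * x[0], 2.0 * x[1]]),
         lambda x: np.array([-1.0, 0.0])]
E, I = [0], [1, 2]   # index sets of equality / inequality constraints

def active_set(x, tol=1e-9):
    """A(x) = E united with {i in I : c_i(x) = 0}, using a numerical tolerance."""
    return E + [i for i in I if abs(cons[i](x)) <= tol]

def licq_holds(x, tol=1e-9):
    """LICQ: the active constraint gradients are linearly independent."""
    A = np.array([grads[i](x) for i in active_set(x, tol)])
    return np.linalg.matrix_rank(A) == A.shape[0]

x = np.array([1.0, 0.0])            # feasible: c_0 = 0, c_1 = 0, c_2 = -3
print(active_set(x), licq_holds(x))
```

At this point the equality constraint and the first inequality are active, and their gradients $(1,1)^\mathrm{T}$ and $(2,0)^\mathrm{T}$ are linearly independent, so LICQ holds there.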

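KKT conditions Let $x^{\ast}$ be a local solution of problem (1), let $f$ and $c_i$ be continuously differentiable, and suppose that LICQ holds at $x^{\ast}$ . Then there exists a Lagrange multiplier vector $\lambda ^{\ast}$ (with components $\lambda _{i}^{\ast},i\in \mathcal{E} \cup \mathcal{I}$ ) such that the following conditions hold at $\left( x^{\ast},\lambda ^{\ast} \right)$ , where $\mathcal{L} \left( x,\lambda \right) =f\left( x \right) +\sum_{i\in \mathcal{E} \cup \mathcal{I}}{\lambda _i c_i\left( x \right)}$ is the Lagrangian: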
$\begin{aligned} \nabla _x\mathcal{L} \left( x^{\ast},\lambda ^{\ast} \right) &=\nabla f\left( x^{\ast} \right) +\sum_{i\in \mathcal{E} \cup \mathcal{I}}{\lambda _{i}^{\ast}\nabla c_i\left( x^{\ast} \right)}=0\\ c_i\left( x^{\ast} \right) &=0,i\in \mathcal{E} \\ c_i\left( x^{\ast} \right) &\le 0,i\in \mathcal{I}\\ \lambda _{i}^{\ast}&\ge 0,i\in \mathcal{I}\\ \lambda _{i}^{\ast}c_i\left( x^{\ast} \right) &=0,i\in \mathcal{E} \cup \mathcal{I} \end{aligned}\tag{2}$

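The last condition in (2) is called the complementary slackness condition.

Strict complementarity: for each $i\in \mathcal{I}$ , exactly one of $\lambda _{i}^{\ast}$ and $c_i\left( x^{\ast} \right)$ is zero; equivalently, $\lambda _{i}^{\ast}>0$ for every $i\in \mathcal{A} \left( x^{\ast} \right) \cap \mathcal{I}$ .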
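
As a small worked illustration (an added example, not part of the original derivation), consider $\min\; x_1+x_2$ subject to the single inequality constraint $c_1(x)=x_1^2+x_2^2-2\le 0$ . The minimizer is $x^{\ast}=(-1,-1)^\mathrm{T}$ , where the constraint is active and $\nabla c_1(x^{\ast})=(-2,-2)^\mathrm{T}\ne 0$ , so LICQ holds. The stationarity condition in (2) then reads
$\nabla f(x^{\ast})+\lambda_1^{\ast}\nabla c_1(x^{\ast})=\begin{bmatrix} 1\\ 1 \end{bmatrix}+\lambda_1^{\ast}\begin{bmatrix} -2\\ -2 \end{bmatrix}=0\;\Rightarrow\;\lambda_1^{\ast}=\frac{1}{2}$
Since $\lambda_1^{\ast}\ge 0$ and $c_1(x^{\ast})=0$ , the remaining conditions in (2) hold as well, and because $\lambda_1^{\ast}>0$ on the active constraint, strict complementarity is satisfied in this example.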

Lemma 1 Let $x^{\ast}$ be a feasible point. Then

(1) $T_{\Omega}( x^{\ast}) \subset \mathcal{F}( x^{\ast})$ ;

(2) if LICQ holds at $x^{\ast}$ , then $\mathcal{F}(x^{\ast}) =T_{\Omega}(x^{\ast})$ .
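
A standard illustration of why LICQ is needed in part (2) (an added example, not part of the original derivation): take the single equality constraint $c_1(x)=\left(x_1^2+x_2^2-2\right)^2=0$ in $\mathbb{R}^2$ and the feasible point $x=(-\sqrt{2},0)^\mathrm{T}$ . The feasible set is the circle of radius $\sqrt{2}$ , so the tangent cone at $x$ is the vertical line through the origin, while the constraint gradient vanishes everywhere on the circle and the linearized condition becomes vacuous:
$\nabla c_1(x)=4\left(x_1^2+x_2^2-2\right)\begin{bmatrix} x_1\\ x_2 \end{bmatrix}=0,\qquad T_{\Omega}(x)=\left\{(0,d_2)^\mathrm{T}: d_2\in\mathbb{R}\right\}\subsetneq\mathcal{F}(x)=\mathbb{R}^2$
LICQ fails here because the set $\{\nabla c_1(x)\}=\{0\}$ is linearly dependent, and the inclusion in part (1) is strict.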

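Proof:

(1) Let $\{ z_k\}$ and $\{ t_k\}$ be the sequences defining the tangent vector $d$ , i.e. $\displaystyle\lim_{k\rightarrow \infty} \frac{z_k-x^{\ast}}{t_k}=d$ with $t_k>0$ and $t_k\rightarrow 0$ . Then, for sufficiently large $k$ ,
$z_k=x^\ast+t_kd+o(t_k)\tag{3}$
Since $z_k\in \Omega$ , for an equality constraint $i\in\mathcal{E}$ , equation (3) and Taylor's formula give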
$\begin{aligned} 0&=\frac{1}{t_k}c_i(z_k)\\ &=\frac{1}{t_k}[c_i(x^\ast)+t_k\nabla c_i^\mathrm{T}(x^\ast)d+o(t_k)]\\ &=\nabla c_i^\mathrm{T}(x^\ast)d+\frac{o(t_k)}{t_k} \end{aligned}\tag{4}$

Taking the limit $k\rightarrow\infty$ gives $\nabla c_i^\mathrm{T}(x^\ast)d=0$ .

Similarly, for an active inequality constraint $i\in\mathcal{A}(x^\ast)\cap\mathcal{I}$ , where $c_i(x^\ast)=0$ ,
$\begin{aligned} 0&\geq\frac{1}{t_k}c_i(z_k)\\ &=\frac{1}{t_k}[c_i(x^\ast)+t_k\nabla c_i^\mathrm{T}(x^\ast)d+o(t_k)]\\ &=\nabla c_i^\mathrm{T}(x^\ast)d+\frac{o(t_k)}{t_k} \end{aligned}\tag{5}$

Taking the limit $k\rightarrow\infty$ gives $\nabla c_i^\mathrm{T}(x^\ast)d\leq0$ , which proves part (1).

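(2) Suppose there are $m$ active constraints at $x^\ast$ . LICQ implies $m\le n$ ; assume $m < n$ , i.e. there are fewer active constraints than optimization variables (if $m=n$ the null space used below is trivial and the $Z$ block is simply absent). Collect the active constraints in the vector $c(x^\ast)=[c_1(x^\ast)\cdots c_i(x^\ast)\cdots c_m(x^\ast)]^\mathrm{T}_{i\in\mathcal{A}(x^\ast)}$ . The gradients of the components of $c$ at $x^\ast$ form the rows of the $m\times n$ matrix $A(x^\ast)$ , i.e. $A^\mathrm{T}(x^\ast)=[\nabla c_i(x^\ast)]_{i\in\mathcal{A}(x^\ast)}$ . Since LICQ holds, $A(x^\ast)$ has full row rank. Let the columns of the $n\times (n-m)$ matrix $Z$ be a basis of the null space of $A(x^\ast)$ , so that $A(x^\ast)Z=0$ and $Z$ has full column rank.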

Let $d\in\mathcal{F}( x^{\ast})$ , and let $\{t_k\}_{k=0}^\infty$ be any sequence of positive numbers with $\displaystyle\lim_{k\rightarrow\infty}t_k=0$ . Define the parametrized system of equations
$R(z,t)=\begin{bmatrix} c(z)-tA(x^\ast)d\\ Z^\mathrm{T}(z-x^\ast-td) \end{bmatrix}=0\tag{6}$

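It remains to show that, for sufficiently small $t=t_k>0$ , the solutions $z=z_k$ of (6) form a feasible sequence approaching $x^\ast$ with $\lim\limits_{k\rightarrow \infty} \frac{z_k-x^\ast}{t_k}=d$ .

At $t = 0$ , $z=x^\ast$ is a solution of (6), and the Jacobian of $R$ with respect to $z$ at this point is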
$\nabla_zR(x^\ast,0)=\begin{bmatrix} A(x^\ast)\\ Z^\mathrm{T} \end{bmatrix}\tag{7}$

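The matrix $\nabla_zR(x^\ast,0)$ is nonsingular: $A(x^\ast)$ has full row rank and the columns of $Z$ span its null space, so the rows of $A(x^\ast)$ and of $Z^\mathrm{T}$ together span $\mathbb{R}^n$ . By the implicit function theorem, for sufficiently small $t_k>0$ equation (6) has a unique solution $z_k$ near $x^\ast$ , and since $d\in\mathcal{F}( x^{\ast})$ ,
$\begin{aligned} i\in\mathcal{E}&\Rightarrow c_i(z_k)=t_k\nabla c_i^\mathrm{T}(x^\ast)d=0\\ i\in\mathcal{A}(x^\ast) \cap \mathcal{I}&\Rightarrow c_i(z_k)=t_k\nabla c_i^\mathrm{T}(x^\ast)d\leq0 \end{aligned}\tag{8}$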

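Hence $z_k$ is feasible: the active constraints are satisfied by (8), and for sufficiently small $t_k>0$ the point $z_k$ is close enough to $x^\ast$ that the inactive inequality constraints, which hold strictly at $x^\ast$ , remain satisfied by continuity.

In fact, near $(x^\ast,0)$ the solution $z$ of (6) can be regarded as an implicit function of $t$ , i.e. $z = z (t)$ with $z_k=z(t_k)$ . By the implicit function theorem, $z$ is continuously differentiable in $t$ and satisfies
$z'(0)=-\nabla_zR(x^\ast,0)^{-1}\nabla_tR(x^\ast,0)\tag{9}$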

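Differentiating (6) with respect to $t$ gives $\nabla_tR(x^\ast,0)=-\begin{bmatrix} A(x^\ast)\\ Z^\mathrm{T} \end{bmatrix}d$ , so (7) and (9) yield $z^{'} (0) = d$ . Since $z(0)=x^\ast$ ,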
$\frac{z_k-x^\ast}{t_k}=\frac{z(0)+t_kz'(0)+o(t_k)-x^\ast}{t_k}=d+\frac{o(t_k)}{t_k}\tag{10}$

Letting $k\rightarrow\infty$ , so that $t_k\rightarrow 0$ , gives $\lim\limits_{k\rightarrow \infty} \frac{z_k-x^\ast}{t_k}=d$ . Therefore $d\in T_{\Omega}( x^{\ast})$ , which proves part (2).
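
The construction in part (2) can be checked numerically. The sketch below is only an illustration under stated assumptions: the constraint $c(x)=x_1^2+x_2^2-2=0$ , the point $x^\ast=(-1,-1)^\mathrm{T}$ , the direction $d=(1,-1)^\mathrm{T}$ , and the helper name `solve_R` are hypothetical choices, and Newton's method stands in for the abstract existence argument of the implicit function theorem.

```python
import numpy as np

# Hypothetical example: one equality constraint c(x) = x1^2 + x2^2 - 2 = 0,
# feasible point x* = (-1, -1), linearized feasible direction d = (1, -1).
c = lambda z: np.array([z[0]**2 + z[1]**2 - 2.0])
jac_c = lambda z: np.array([[2.0 * z[0], 2.0 * z[1]]])

x_star = np.array([-1.0, -1.0])
A = jac_c(x_star)                               # 1 x 2, full row rank (LICQ)
Z = np.array([[1.0], [-1.0]]) / np.sqrt(2.0)    # basis of null(A): A @ Z = 0
d = np.array([1.0, -1.0])                       # A @ d = 0, so d is in F(x*)

def solve_R(t, iters=20):
    """Solve R(z, t) = 0 from equation (6) by Newton's method, starting at x*."""
    z = x_star.copy()
    for _ in range(iters):
        R = np.concatenate([c(z) - t * (A @ d), Z.T @ (z - x_star - t * d)])
        J = np.vstack([jac_c(z), Z.T])          # Jacobian of R with respect to z
        z = z - np.linalg.solve(J, R)
    return z

ts = [1e-1, 1e-2, 1e-3]
errors = [np.linalg.norm((solve_R(t) - x_star) / t - d) for t in ts]
print(errors)   # the gap between (z_k - x*) / t_k and d shrinks as t_k -> 0
```

Each $z_k$ stays on the constraint, and the difference quotient approaches $d$ at a rate roughly proportional to $t_k$ , which is the $o(t_k)/t_k$ term in (10).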

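Theorem 1 Let $x^{\ast}$ be a local solution. Then $\nabla f^\mathrm{T}(x^\ast)d\geq 0$ for all $d\in T_{\Omega}( x^{\ast})$ .

Proof: By contradiction. Suppose there exists $d\in T_{\Omega}( x^{\ast})$ such that $\nabla f^\mathrm{T}(x^\ast)d< 0$ , and let $\{z_k\}$ and $\{t_k\}$ be the sequences associated with $d$ . Then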
$\begin{aligned} f(z_k)&=f(x^\ast)+(z_k-x^\ast)^\mathrm{T}\nabla f(x^\ast)+o(\|z_k-x^\ast\|)\\ &=f(x^\ast)+t_kd^\mathrm{T}\nabla f(x^\ast)+o(t_k) \end{aligned}\tag{11}$

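Since $\nabla f^\mathrm{T}(x^\ast)d< 0$ , the term $t_kd^\mathrm{T}\nabla f(x^\ast)$ dominates $o(t_k)$ , so $f(z_k)<f(x^\ast)$ for all sufficiently large $k$ . Hence, given any open neighborhood of $x^\ast$ , choosing $k$ large enough places $z_k$ inside that neighborhood with $f(z_k)<f(x^\ast)$ , which contradicts the assumption that $x^{\ast}$ is a local solution. This completes the proof.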
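
Theorem 1 can be sanity-checked by sampling directions. The following sketch assumes a hypothetical toy problem, $\min\; x_1+x_2$ subject to $x_1^2+x_2^2-2\le 0$ , and uses the linearized cone $\mathcal{F}(x)$ in place of the tangent cone, which is legitimate here only because the single constraint is active with a nonzero gradient (LICQ). The function name `min_slope_over_F` is made up for this illustration.

```python
import numpy as np

# Hypothetical toy problem: min f(x) = x1 + x2  s.t.  c(x) = x1^2 + x2^2 - 2 <= 0.
grad_f = lambda x: np.array([1.0, 1.0])
grad_c = lambda x: np.array([2.0 * x[0], 2.0 * x[1]])

def min_slope_over_F(x, n_dirs=360):
    """Smallest grad_f(x)^T d over sampled unit directions d with grad_c(x)^T d <= 0.

    Assumes the single constraint is active at x and LICQ holds, so F(x) = T(x).
    """
    angles = np.linspace(0.0, 2.0 * np.pi, n_dirs, endpoint=False)
    dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    feasible = dirs[dirs @ grad_c(x) <= 1e-12]
    return float(np.min(feasible @ grad_f(x)))

slope_at_solution = min_slope_over_F(np.array([-1.0, -1.0]))   # the minimizer
slope_elsewhere = min_slope_over_F(np.array([1.0, -1.0]))      # feasible, not optimal
print(slope_at_solution, slope_elsewhere)
```

At the minimizer no sampled feasible direction has a negative slope (up to rounding), whereas at the non-optimal boundary point $(1,-1)^\mathrm{T}$ some feasible directions are descent directions, so that point cannot be a local solution.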

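Lemma 2 (Farkas' lemma) Consider the cone $K=\{By+Cw\vert y\geq 0\}$ , where $B$ and $C$ are matrices of size $n\times m$ and $n\times p$ , and $y$ and $w$ are vectors of appropriate dimensions. For any vector $g\in\mathbb{R}^n$ , exactly one of the following holds: either $g\in K$ , or there exists $d\in\mathbb{R}^n$ such that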
$g^\mathrm{T}d<0,\;B^\mathrm{T}d\geq 0,\;C^\mathrm{T}d=0\tag{12}$

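Proof: First we show that the two alternatives cannot hold simultaneously. If $g\in K$ , there exist vectors $y\geq 0$ and $w$ such that $g = B y + C w$ . If, in addition, some $d$ satisfies (12), then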
$0>d^\mathrm{T}g=d^\mathrm{T}By+d^\mathrm{T}Cw=(B^\mathrm{T}d)^\mathrm{T}y+(C^\mathrm{T}d)^\mathrm{T}w\ge 0$

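which is a contradiction, so the two alternatives cannot both hold.

Next we show that (12) has a solution whenever $g\notin K$ . Since $K$ is a closed set, let $\hat{s}$ be the vector in $K$ closest to $g$ , i.e. the solution of the optimization problem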
$\min\Vert s-g\Vert_2^2,\; {\rm{subject\;to}}\;s\in K\tag{13}$

Since $\hat{s}\in K$ and $K$ is a cone, $\alpha\hat{s}\in K$ for all $\alpha\geq 0$ , and $\Vert\alpha\hat{s}-g\Vert_2^2$ attains its minimum over $\alpha\geq 0$ at $\alpha=1$ . Therefore
$\left.\frac{\mathrm{d}}{\mathrm{d}\alpha}\Vert\alpha\hat{s}-g\Vert_2^2\right\vert_{\alpha=1}=0\Rightarrow \left.(-2\hat{s}^\mathrm{T}g+2\alpha\hat{s}^\mathrm{T}\hat{s})\right\vert_{\alpha=1}=0\Rightarrow\hat{s}^\mathrm{T}(\hat{s}-g)=0\tag{14}$

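For any other vector $s\in K$ , convexity of $K$ gives $\hat{s}+\theta(s-\hat{s})\in K$ for $\theta\in [0,1]$ , and hence, by the minimality of $\hat{s}$ ,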
$\Vert\hat{s}+\theta(s-\hat{s})-g\Vert_2^2\geq\Vert\hat{s}-g\Vert_2^2$

that is,
$2\theta(s-\hat{s})^\mathrm{T}(\hat{s}-g)+\theta^2\Vert s-\hat{s}\Vert_2^2\geq 0$

Dividing both sides by $\theta>0$ and letting $\theta\rightarrow 0^+$ gives $(s-\hat{s})^\mathrm{T}(\hat{s}-g)\geq 0$ ; combined with (14), this shows that for every $s\in K$ ,
$s^\mathrm{T}(\hat{s}-g)\geq 0\tag{15}$

We now show that the vector $d=\hat{s}-g$ satisfies (12). Since $g\notin K$ and $\hat{s}\in K$ , we have $d\neq 0$ , and using (14),
$d^\mathrm{T}g=d^\mathrm{T}(\hat{s}-d)=(\hat{s}-g)^\mathrm{T}\hat{s}-d^\mathrm{T}d=-\Vert d\Vert_2^2<0\tag{16}$

By (15), $d^\mathrm{T}s\geq 0$ for all $s\in K$ , so for all $y\geq 0$ and all $w$ ,
$d^\mathrm{T}(By+Cw)\geq 0$

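Taking $y = 0$ gives $(C^\mathrm{T}d)^\mathrm{T}w\geq 0$ for all $w$ ; applying this to both $w$ and $-w$ yields $C^\mathrm{T}d=0$ . Taking $w = 0$ gives $(B^\mathrm{T}d)^\mathrm{T}y\geq 0$ for all $y\geq 0$ , hence $B^\mathrm{T}d\geq 0$ . Together with (16), $d$ satisfies (12), which completes the proof.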
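
The proof is constructive: project $g$ onto $K$ and read off $d=\hat{s}-g$ . The sketch below follows that recipe numerically. It is an illustration under assumptions, not a robust solver: the projection (13) is computed with a plain projected-gradient iteration (a technique swapped in here, not taken from the original text), membership in $K$ is decided with a tolerance, and the function name `farkas_alternative` and the small matrices are hypothetical.

```python
import numpy as np

def farkas_alternative(g, B, C, iters=5000, tol=1e-8):
    """Return ("in_cone", (y, w)) or ("separator", d), following the projection proof.

    The projection of g onto K = {B y + C w : y >= 0} is approximated by
    projected gradient descent on 0.5 * ||B y + C w - g||^2 (an assumed method).
    """
    M = np.hstack([B, C])                     # K = {M u : first m entries of u >= 0}
    m = B.shape[1]
    u = np.zeros(M.shape[1])
    step = 1.0 / np.linalg.norm(M, 2) ** 2    # 1 / Lipschitz constant of the gradient
    for _ in range(iters):
        u = u - step * (M.T @ (M @ u - g))    # gradient step
        u[:m] = np.maximum(u[:m], 0.0)        # enforce y >= 0 (w is unconstrained)
    d = M @ u - g                             # d = s_hat - g as in the proof
    if np.linalg.norm(d) <= tol:
        return "in_cone", (u[:m], u[m:])
    return "separator", d

B = np.array([[1.0], [0.0], [0.0]])   # n = 3, m = 1
C = np.array([[0.0], [1.0], [0.0]])   # n = 3, p = 1
g_out = np.array([-1.0, 2.0, 3.0])    # not in K = {(y, w, 0) : y >= 0}
g_in = np.array([1.0, -2.0, 0.0])     # in K with y = 1, w = -2
print(farkas_alternative(g_out, B, C))
print(farkas_alternative(g_in, B, C))
```

For `g_out` the returned $d$ satisfies the three relations in (12); for `g_in` the function returns coefficients $y\geq 0$ and $w$ that reproduce $g$ .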

To apply Lemma 2, set $B=[-\nabla c_i(x^\ast)]_{i\in\mathcal{A}(x^\ast)\cap\mathcal{I}}$ , $y=[\lambda_i]_{i\in\mathcal{A}(x^\ast)\cap\mathcal{I}}$ , $C=[-\nabla c_i(x^\ast)]_{i\in\mathcal{E}}$ and $w=[\lambda_i]_{i\in\mathcal{E}}$ , and define the cone
$\begin{aligned} N&=\{By+Cw\vert y\geq 0\}\\ &=\left\{-\sum_{i\in\mathcal{A}(x^\ast)}\lambda_i\nabla c_i(x^\ast),\;\lambda_i\geq 0\;{\rm for}\;i\in\mathcal{A}(x^\ast)\cap\mathcal{I}\right\} \end{aligned}$

and set $g=\nabla f(x^\ast)$ . Then either (recalling that $A^\mathrm{T}(x^\ast)=[\nabla c_i(x^\ast)]_{i\in\mathcal{A}(x^\ast)}$ and writing $\lambda=[\lambda_i]_{i\in\mathcal{A}(x^\ast)}$ )
$\nabla f(x^\ast)=-\sum_{i\in\mathcal{A}(x^\ast)}\lambda_i\nabla c_i(x^\ast)=-A^\mathrm{T}(x^\ast)\lambda,\;\lambda_i\geq 0\;{\rm for}\;i\in\mathcal{A}(x^\ast)\cap\mathcal{I}\tag{17}$

or there exists $d$ such that $d^\mathrm{T}\nabla f(x^\ast)<0$ , $B^\mathrm{T}d=[-\nabla c_i(x^\ast)]^\mathrm{T}_{i\in\mathcal{A}(x^\ast)\cap\mathcal{I}}d\geq 0$ and $C^\mathrm{T}d=[-\nabla c_i(x^\ast)]^\mathrm{T}_{i\in\mathcal{E}}d=0$ , that is, a direction $d\in\mathcal{F}(x^\ast)$ along which $d^\mathrm{T}\nabla f(x^\ast)<0$ .

Proof of the KKT conditions By Theorem 1, $d^\mathrm{T}\nabla f(x^\ast)\geq 0$ for all $d\in T_{\Omega}( x^{\ast})$ . Since LICQ holds, Lemma 1 gives $\mathcal{F}(x^{\ast}) =T_{\Omega}(x^{\ast})$ , so $d^\mathrm{T}\nabla f(x^\ast)\geq 0$ for all $d\in \mathcal{F}(x^{\ast})$ . The second alternative of Lemma 2 is therefore impossible, and hence there exists $\lambda$ such that (17) holds. Define $\lambda^\ast$ by
$\lambda_i^\ast=\left\{ \begin{aligned} &\lambda_i,\;i\in\mathcal{A}(x^\ast)\\ &0,\;i\in\mathcal{I}\backslash\mathcal{A}(x^\ast) \end{aligned} \right.\tag{18}$

The KKT conditions can now be checked one by one:

By the definition of $\lambda^\ast$ and (17), the first condition in (2) holds;
Since $x^\ast$ is feasible, the second and third conditions in (2) hold;
Since $\lambda_i^\ast\geq 0\;{\rm for}\;i\in\mathcal{A}(x^\ast)\cap\mathcal{I}$ and $\lambda_i^\ast=0\;{\rm for}\;i\in\mathcal{I}\backslash\mathcal{A}(x^\ast)$ , we have $\lambda_i^\ast\geq 0\;{\rm for}\;i\in\mathcal{I}$ , so the fourth condition in (2) holds;
Since $c_i(x^\ast)=0 \;{\rm for}\;i\in\mathcal{A}(x^\ast)$ and $\lambda_i^\ast=0\;{\rm for}\;i\in\mathcal{I}\backslash\mathcal{A}(x^\ast)$ , we have $\lambda _{i}^{\ast}c_i\left( x^{\ast} \right) =0,i\in \mathcal{E} \cup \mathcal{I}$ , so the fifth condition in (2) holds. This completes the proof.
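
The last steps of the proof, solving (17) on the active set and padding with zeros as in (18), can be mirrored in code. The sketch below is a minimal illustration on a hypothetical toy problem with two inequality constraints and no equality constraints; the names `kkt_multipliers` and `kkt_residuals`, the tolerance, and the use of least squares to solve (17) are assumptions made for this example.

```python
import numpy as np

# Hypothetical toy problem (not from the article):
#   min  f(x) = x1 + x2
#   s.t. c_0(x) = x1^2 + x2^2 - 2 <= 0,   c_1(x) = x1 - 3 <= 0
grad_f = lambda x: np.array([1.0, 1.0])
cons = [lambda x: x[0]**2 + x[1]**2 - 2.0, lambda x: x[0] - 3.0]
grads = [lambda x: np.array([2.0 * x[0], 2.0 * x[1]]),
         lambda x: np.array([1.0, 0.0])]

def kkt_multipliers(x, tol=1e-9):
    """Solve (17) on the active set by least squares, then pad with zeros as in (18)."""
    active = [i for i in range(len(cons)) if abs(cons[i](x)) <= tol]
    lam = np.zeros(len(cons))
    if active:
        A_T = np.column_stack([grads[i](x) for i in active])  # columns = active gradients
        lam[active] = np.linalg.lstsq(A_T, -grad_f(x), rcond=None)[0]
    return lam

def kkt_residuals(x, lam):
    """Stationarity norm, worst constraint violation, min multiplier, worst |lam_i * c_i|."""
    stat = grad_f(x) + sum(l * g(x) for l, g in zip(lam, grads))
    cvals = np.array([c(x) for c in cons])
    return (np.linalg.norm(stat), max(cvals.max(), 0.0),
            lam.min(), np.abs(lam * cvals).max())

x_star = np.array([-1.0, -1.0])
lam_star = kkt_multipliers(x_star)
print(lam_star, kkt_residuals(x_star, lam_star))
```

Only the first constraint is active at $x^\ast=(-1,-1)^\mathrm{T}$ , so its multiplier comes from (17) and the inactive constraint receives a zero multiplier, exactly as in (18). A nonnegative multiplier vector together with vanishing residuals confirms the five conditions in (2) for this example.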

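The key to the whole proof is to first establish the angle relation between $\nabla f(x^\ast)$ and $T_{\Omega}( x^{\ast})$ (Theorem 1), then replace $T_{\Omega}( x^{\ast})$ by $\mathcal{F}(x^{\ast})$ using LICQ (Lemma 1), and finally use Farkas' lemma to show that $\nabla f(x^\ast)$ lies in the constructed cone $N$ .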
