Notes on the KKT Conditions

Nightmare004

Last modified: 2022-04-06 02:26:11

First published: 2022-01-09 21:52:26

Article link: https://blog.csdn.net/qq_39942341/article/details/122353089

Inequality-constrained problems

$\begin{array}{lll} \text { (P) } & \min & f(\mathbf{x}) \\ & \text { s.t. } & g_{i}(\mathbf{x}) \leq 0, \quad i=1,2, \ldots, m \end{array}$
where $f,g_1,\cdots,g_m$ are continuously differentiable functions on $\mathbb{R}^n$.

Feasible descent directions

Consider the problem
$\begin{array}{ll} \min & h(\mathbf{x}) \\ \text { s.t. } & \mathbf{x} \in \mathbf{C} \end{array}$
where $h$ is continuously differentiable on the closed convex set $\mathbf{C}\subseteq \mathbb{R}^n$.
Let $\mathbf{d}\neq 0$ be a vector and let $\mathbf{x}\in\mathbf{C}$.
If $\nabla h(\mathbf{x})^T\mathbf{d}<0$ and there exists $\epsilon>0$ such that $\mathbf{x}+t\mathbf{d}\in \mathbf{C}$ for all $t\in\left[0,\epsilon\right]$,
then $\mathbf{d}$ is called a feasible descent direction at $\mathbf{x}$.
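
As a small illustration of this definition, the sketch below tests candidate directions on a box-shaped set. The objective $h(\mathbf{x})=x_1^2+x_2^2$, the box $\mathbf{C}=[1,2]^2$ and the helper names `grad_h`, `in_C` and `is_feasible_descent` are assumptions made for this example only. The step size `eps` is fixed, so this is a numerical check rather than a proof.

```python
import numpy as np

LOWER, UPPER = 1.0, 2.0  # assumed box C = [1, 2]^2 (closed and convex)

def grad_h(x):
    # Gradient of the assumed objective h(x) = x1^2 + x2^2.
    return 2.0 * x

def in_C(x):
    return bool(np.all(x >= LOWER) and np.all(x <= UPPER))

def is_feasible_descent(x, d, eps=1e-3):
    # Descent: grad h(x)^T d < 0.
    # Feasibility: C is convex, so x in C and x + eps*d in C imply that
    # the whole segment {x + t d : t in [0, eps]} lies in C.
    descent = float(grad_h(x) @ d) < 0.0
    feasible = in_C(x) and in_C(x + eps * d)
    return descent and feasible

x_corner = np.array([1.0, 1.0])   # minimizer of h over C
x_inside = np.array([1.5, 1.5])

print(is_feasible_descent(x_inside, np.array([-1.0, -1.0])))  # descent and feasible
print(is_feasible_descent(x_corner, np.array([-1.0, -1.0])))  # descent, but leaves C
print(is_feasible_descent(x_corner, np.array([1.0, 0.0])))    # feasible, but not descent
```

At the corner $(1,1)$, which minimizes $h$ over the box, neither tested direction is a feasible descent direction, while the interior point $(1.5,1.5)$ admits one.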

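Lemma 1

Consider the problem
$\begin{array}{lll} \text { (G) } & \min & h(\mathbf{x}) \\ & \text { s.t. } & \mathbf{x} \in \mathbf{C} . \end{array}$
If $\mathbf{x}^*$ is a local optimal solution of (G), then there is no feasible descent direction at $\mathbf{x}^*$.

Proof:
Suppose, for contradiction, that there is a feasible descent direction at $\mathbf{x}^*$. Then there exist $\mathbf{d}\neq 0$ and $\epsilon_1>0$ such that
$\forall t \in \left[0,\epsilon_1\right],\mathbf{x}^*+t\mathbf{d}\in\mathbf{C},\quad \nabla h(\mathbf{x}^*)^T\mathbf{d}<0$
Because the directional derivative of $h$ at $\mathbf{x}^*$ along $\mathbf{d}$ is negative,
$\exists \epsilon_2\in\left(0,\epsilon_1\right),\forall t\in\left(0,\epsilon_2\right],h(\mathbf{x}^*+t\mathbf{d})<h(\mathbf{x}^*)$
So every neighbourhood of $\mathbf{x}^*$ contains feasible points with a strictly smaller objective value, which contradicts local optimality.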

Lemma 2

Let $\mathbf{x}^*$ be a local optimal solution of
$\begin{array}{ll} \min & f(\mathbf{x}) \\ \text { s.t. } & g_{i}(\mathbf{x}) \leq 0, \quad i=1,2, \ldots, m \end{array}$
where $f,g_1,\cdots,g_m$ are continuously differentiable functions on $\mathbb{R}^n$.
Let
$I(\mathbf{x}^*)=\left\{i: g_i(\mathbf{x}^*)=0\right\}$
be the set of constraints that are active at $\mathbf{x}^*$.
Then there is no $\mathbf{d}\in\mathbb{R}^n$ such that
$\begin{aligned} &\nabla f\left(\mathbf{x}^{*}\right)^{T} \mathbf{d}<0, \\ &\nabla g_{i}\left(\mathbf{x}^{*}\right)^{T} \mathbf{d}<0, \quad i \in I\left(\mathbf{x}^{*}\right) \end{aligned}$
Proof:
Suppose there exists $\mathbf{d}$ such that
$\begin{aligned} &\nabla f\left(\mathbf{x}^{*}\right)^{T} \mathbf{d}<0, \\ &\nabla g_{i}\left(\mathbf{x}^{*}\right)^{T} \mathbf{d}<0, \quad i \in I\left(\mathbf{x}^{*}\right) \end{aligned}$
Then there exists $\epsilon_1>0$ such that, for all $t \in (0,\epsilon_1)$ and all $i\in I(\mathbf{x}^*)$,
$f(\mathbf{x}^*+t\mathbf{d})<f(\mathbf{x}^*)$ and $g_i(\mathbf{x}^*+t\mathbf{d})<g_i(\mathbf{x}^*)=0$

For $i\notin I(\mathbf{x}^*)$ we have $g_i(\mathbf{x}^*)<0$.
Hence, by continuity of $g_i$, there exists $\epsilon_2>0$ such that $g_i(\mathbf{x}^*+t\mathbf{d})<0$ for all $t\in (0,\epsilon_2)$ and all $i\notin I(\mathbf{x}^*)$.

Therefore, for all $t\in \left(0,\min \left\{\epsilon_1,\epsilon_2\right\}\right)$,
$\begin{aligned} &f\left(\mathbf{x}^{*}+t \mathbf{d}\right)<f\left(\mathbf{x}^{*}\right) \\ &g_{i}\left(\mathbf{x}^{*}+t \mathbf{d}\right)<0, \quad i=1,2, \ldots, m, \end{aligned}$
which contradicts the local optimality of $\mathbf{x}^*$.

Fritz-John conditions for inequality-constrained problems

Let $\mathbf{x}^*$ be a local optimal solution of
$\begin{array}{ll} \min & f(\mathbf{x}) \\ \text { s.t. } & g_{i}(\mathbf{x}) \leq 0, \quad i=1,2, \ldots, m \end{array}$
where $f,g_1,\cdots,g_m$ are continuously differentiable functions on $\mathbb{R}^n$.
Then there exist multipliers $\lambda_0,\lambda_1,\cdots,\lambda_m\ge 0$, not all zero, such that
$\begin{aligned} \lambda_{0} \nabla f\left(\mathbf{x}^{*}\right)+\sum_{i=1}^{m} \lambda_{i} \nabla g_{i}\left(\mathbf{x}^{*}\right) &=0, \\ \lambda_{i} g_{i}\left(\mathbf{x}^{*}\right) &=0, \quad i=1,2, \ldots, m \end{aligned}$
Proof:
By Lemma 2,
there is no $\mathbf{d}$ such that
$\begin{aligned} &\nabla f\left(\mathbf{x}^{*}\right)^{T} \mathbf{d}<0, \\ &\nabla g_{i}\left(\mathbf{x}^{*}\right)^{T} \mathbf{d}<0, \quad i \in I\left(\mathbf{x}^{*}\right) \end{aligned}$
where $I(\mathbf{x}^*)=\left\{i:g_i(\mathbf{x}^*)=0\right\}=\left\{i_1,i_2,\cdots,i_k\right\}$.
Equivalently, the system
$\mathbf{Ad}<0$
has no solution,
where
$\mathbf{A}=\left(\begin{array}{c} \nabla f\left(\mathbf{x}^{*}\right)^{T} \\ \nabla g_{i_{1}}\left(\mathbf{x}^{*}\right)^{T} \\ \vdots \\ \nabla g_{i_{k}}\left(\mathbf{x}^{*}\right)^{T} \end{array}\right)$
By Gordan's theorem,
this system has no solution if and only if there exists $\eta=\left(\lambda_0,\lambda_{i_1},\cdots,\lambda_{i_k}\right)^T\neq 0$ such that
$\mathbf{A}^{T} \eta=0, \quad \eta \geq 0$
that is, $\lambda_{0} \nabla f\left(\mathbf{x}^{*}\right)+\sum_{i \in I\left(\mathbf{x}^{*}\right)} \lambda_{i} \nabla g_{i}\left(\mathbf{x}^{*}\right)=0$. Setting $\lambda_i=0$ for $i\notin I(\mathbf{x}^*)$, we obtain
$\begin{aligned} \lambda_{0} \nabla f\left(\mathbf{x}^{*}\right)+\sum_{i=1}^{m} \lambda_{i} \nabla g_{i}\left(\mathbf{x}^{*}\right) &=0, \\ \lambda_{i} g_{i}\left(\mathbf{x}^{*}\right) &=0, \quad i=1,2, \ldots, m \end{aligned}$
where the second line holds because $g_i(\mathbf{x}^*)=0$ for $i\in I(\mathbf{x}^*)$ and $\lambda_i=0$ for $i\notin I(\mathbf{x}^*)$.
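
The Fritz-John conditions allow $\lambda_0=0$, in which case they say nothing about $f$. The sketch below checks this numerically on a standard textbook-style example that is not part of the text above and is assumed here for illustration: $\min\ -x_1$ subject to $x_2-(1-x_1)^3\le 0$ and $-x_2\le 0$, with optimal solution $\mathbf{x}^*=(1,0)$. The variable names are hypothetical.

```python
import numpy as np

# Assumed example (not from the text above):
#   min -x1  s.t.  g1(x) = x2 - (1 - x1)^3 <= 0,  g2(x) = -x2 <= 0,
# whose optimal solution is x* = (1, 0), where both constraints are active.
x_star = np.array([1.0, 0.0])

grad_f = np.array([-1.0, 0.0])
grad_g1 = np.array([3.0 * (1.0 - x_star[0]) ** 2, 1.0])  # equals (0, 1) at x*
grad_g2 = np.array([0.0, -1.0])

# Fritz-John multipliers (lambda0, lambda1, lambda2) = (0, 1, 1).
fj = np.array([0.0, 1.0, 1.0])
fj_residual = fj[0] * grad_f + fj[1] * grad_g1 + fj[2] * grad_g2

# With lambda0 = 1 we would need G @ lam = -grad_f for some lam.
# Least squares gives the best possible fit; a nonzero residual
# means that no such multipliers exist.
G = np.column_stack([grad_g1, grad_g2])
lam, *_ = np.linalg.lstsq(G, -grad_f, rcond=None)
kkt_residual = np.linalg.norm(G @ lam + grad_f)

print(fj_residual)   # zero vector: the Fritz-John system is satisfied
print(kkt_residual)  # strictly positive: lambda0 cannot be normalized to 1
```

In this example the gradients of the two active constraints are $(0,1)$ and $(0,-1)$, which are linearly dependent, and that is what lets $\lambda_0$ vanish.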

KKT conditions for inequality-constrained problems

Let $\mathbf{x}^*$ be a local optimal solution of
$\begin{array}{ll} \min & f(\mathbf{x}) \\ \text { s.t. } & g_{i}(\mathbf{x}) \leq 0, \quad i=1,2, \ldots, m \end{array}$
where $f,g_1,\cdots,g_m$ are continuously differentiable functions on $\mathbb{R}^n$.
Let
$I(\mathbf{x}^*)=\left\{i:g_i(\mathbf{x}^*)=0\right\}$
and assume that the gradients $\left\{\nabla g_i(\mathbf{x}^*)\right\}_{i\in I(\mathbf{x}^*)}$ are linearly independent.
Then there exist $\lambda_1,\cdots,\lambda_m\ge 0$ such that
$\begin{aligned} \nabla f\left(\mathbf{x}^{*}\right)+\sum_{i=1}^{m} \lambda_{i} \nabla g_{i}\left(\mathbf{x}^{*}\right) &=0, \\ \lambda_{i} g_{i}\left(\mathbf{x}^{*}\right) &=0, \quad i=1,2, \ldots, m \end{aligned}$
Proof:
By the Fritz-John conditions, there exist $\tilde{\lambda}_0,\cdots,\tilde{\lambda}_m\ge 0$, not all zero,
such that $\begin{aligned} \tilde{\lambda}_0 \nabla f\left(\mathbf{x}^{*}\right)+\sum_{i=1}^{m} \tilde{\lambda}_i \nabla g_{i}\left(\mathbf{x}^{*}\right) &=0, \\ \tilde{\lambda}_i g_{i}\left(\mathbf{x}^{*}\right) &=0, \quad i=1,2, \ldots, m \end{aligned}$
Suppose $\tilde{\lambda}_0=0$. Complementary slackness gives $\tilde{\lambda}_i=0$ for $i\notin I(\mathbf{x}^*)$, so $\sum_{i\in I(\mathbf{x}^*)}\tilde{\lambda}_i\nabla g_i(\mathbf{x}^*)=0$ with the $\tilde{\lambda}_i$, $i\in I(\mathbf{x}^*)$, not all zero. This contradicts the linear independence of $\left\{\nabla g_i(\mathbf{x}^*)\right\}_{i\in I(\mathbf{x}^*)}$.
Hence $\tilde{\lambda}_0>0$, and setting $\lambda_i=\frac{\tilde{\lambda}_i}{\tilde{\lambda}_0}$
completes the proof.
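
To make the statement concrete, the following sketch recovers the multiplier for a small problem and checks both KKT equations. The example $\min\ x_1^2+x_2^2$ subject to $1-x_1-x_2\le 0$, the candidate point $(0.5,0.5)$ and all function names are assumptions chosen for illustration.

```python
import numpy as np

# Assumed example: min x1^2 + x2^2  s.t.  g(x) = 1 - x1 - x2 <= 0,
# with candidate solution x* = (0.5, 0.5).
x_star = np.array([0.5, 0.5])

def grad_f(x):
    return 2.0 * x

def g(x):
    return 1.0 - x[0] - x[1]

def grad_g(x):
    return np.array([-1.0, -1.0])

# The constraint is active at x*, so solve grad_f + lam * grad_g = 0
# for lam in the least-squares sense.
A = grad_g(x_star).reshape(-1, 1)
lam = np.linalg.lstsq(A, -grad_f(x_star), rcond=None)[0][0]

stationarity = grad_f(x_star) + lam * grad_g(x_star)
print(lam)              # multiplier; the theorem requires lam >= 0
print(stationarity)     # should be numerically zero
print(lam * g(x_star))  # complementary slackness; should be zero
```

Since there is a single active constraint with a nonzero gradient, the linear independence assumption holds and the multiplier is unique.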

Problems with inequality and equality constraints

KKT conditions for problems with inequality and equality constraints

Let $\mathbf{x}^*$ be a local optimal solution of the problem
$\begin{array}{ll} \min & f(\mathbf{x}) \\ \text { s.t. } & g_{i}(\mathbf{x}) \leq 0, \quad i=1,2, \ldots, m \\ & h_{j}(\mathbf{x})=0, \quad j=1,2, \ldots, p \end{array}$
where $f,g_1,\cdots,g_m,h_1,\cdots,h_p$ are continuously differentiable functions on $\mathbb{R}^n$.
Assume that the gradients
$\left\{\nabla g_{i}\left(\mathbf{x}^{*}\right): i \in I\left(\mathbf{x}^{*}\right)\right\} \cup\left\{\nabla h_{j}\left(\mathbf{x}^{*}\right): j=1,2, \ldots, p\right\}$
are linearly independent,
where $I(\mathbf{x}^*)=\left\{i:g_i(\mathbf{x}^*)=0\right\}$.
Then there exist $\lambda_1,\cdots,\lambda_m\ge 0$ and $\mu_1,\cdots,\mu_p\in \mathbb{R}$ such that
$\begin{aligned} \nabla f\left(\mathbf{x}^{*}\right)+\sum_{i=1}^{m} \lambda_{i} \nabla g_{i}\left(\mathbf{x}^{*}\right)+\sum_{j=1}^{p} \mu_{j} \nabla h_{j}\left(\mathbf{x}^{*}\right) &=0, \\ \lambda_{i} g_{i}\left(\mathbf{x}^{*}\right) &=0, \quad i=1,2, \ldots, m \end{aligned}$
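
The multipliers $\mu_j$ of the equality constraints carry no sign restriction. The sketch below shows this on an assumed example, $\min\ x_1^2+x_2^2$ subject to $-x_1\le 0$ and $x_1+x_2-1=0$, where the multiplier of the equality constraint is negative. The helper `kkt_violation` and the constants `GRAD_G`, `GRAD_H` are hypothetical names introduced for this illustration.

```python
import numpy as np

# Assumed example:
#   min x1^2 + x2^2  s.t.  g(x) = -x1 <= 0,  h(x) = x1 + x2 - 1 = 0.
def grad_f(x):
    return 2.0 * x

def g(x):
    return -x[0]

def h(x):
    return x[0] + x[1] - 1.0

GRAD_G = np.array([-1.0, 0.0])  # gradient of g (constant)
GRAD_H = np.array([1.0, 1.0])   # gradient of h (constant)

def kkt_violation(x, lam, mu):
    # Largest violation among stationarity, complementary slackness,
    # primal feasibility and the sign condition lam >= 0.
    stationarity = grad_f(x) + lam * GRAD_G + mu * GRAD_H
    return max(
        np.linalg.norm(stationarity),
        abs(lam * g(x)),
        max(g(x), 0.0),
        abs(h(x)),
        max(-lam, 0.0),
    )

x_star = np.array([0.5, 0.5])
print(kkt_violation(x_star, lam=0.0, mu=-1.0))  # all conditions hold
print(kkt_violation(x_star, lam=0.0, mu=1.0))   # wrong sign of mu breaks stationarity
```

Here the inequality constraint is inactive at $(0.5,0.5)$, so complementary slackness forces $\lambda=0$.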

KKT points

A feasible solution $\mathbf{x}^*$ is called a KKT point
if there exist $\lambda_1,\cdots,\lambda_m\ge 0$ and $\mu_1,\cdots,\mu_p\in \mathbb{R}$ such that
$\begin{aligned} \nabla f\left(\mathbf{x}^{*}\right)+\sum_{i=1}^{m} \lambda_{i} \nabla g_{i}\left(\mathbf{x}^{*}\right)+\sum_{j=1}^{p} \mu_{j} \nabla h_{j}\left(\mathbf{x}^{*}\right) &=0, \\ \lambda_{i} g_{i}\left(\mathbf{x}^{*}\right) &=0, \quad i=1,2, \ldots, m \end{aligned}$

Regularity

A feasible solution $\mathbf{x}^*$ is called regular
if the gradients $\left\{\nabla g_{i}\left(\mathbf{x}^{*}\right): i \in I\left(\mathbf{x}^{*}\right)\right\} \cup\left\{\nabla h_{j}\left(\mathbf{x}^{*}\right): j=1,2, \ldots, p\right\}$
are linearly independent.
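
Regularity can be tested numerically by stacking the relevant gradients and comparing the matrix rank with the number of gradients. The helper `is_regular` below is a hypothetical name, and the gradient vectors are assumed inputs. `np.linalg.matrix_rank` uses a numerical tolerance, so nearly dependent gradients may be classified either way.

```python
import numpy as np

def is_regular(active_grads, eq_grads):
    # Stack the gradients as rows; they are linearly independent
    # exactly when the rank equals the number of rows.
    rows = list(active_grads) + list(eq_grads)
    if not rows:
        return True  # an empty family is linearly independent
    M = np.vstack(rows)
    return bool(np.linalg.matrix_rank(M) == M.shape[0])

# One active inequality gradient and one equality gradient, independent.
print(is_regular([np.array([1.0, 0.0])], [np.array([1.0, 1.0])]))
# Two active inequality gradients pointing in opposite directions.
print(is_regular([np.array([0.0, 1.0]), np.array([0.0, -1.0])], []))
```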

The convex case

Sufficiency of the KKT conditions for convex optimization problems

Let $\mathbf{x}^*$ be a feasible solution of the problem
$\begin{array}{ll} \min & f(\mathbf{x}) \\ \text { s.t. } & g_{i}(\mathbf{x}) \leq 0, \quad i=1,2, \ldots, m \\ & h_{j}(\mathbf{x})=0, \quad j=1,2, \ldots, p \end{array}$
where $f,g_1,\cdots,g_m$ are continuously differentiable convex functions on $\mathbb{R}^n$
and $h_1,\cdots,h_p$ are affine functions.
If there exist $\lambda_1,\cdots,\lambda_m\ge 0$ and $\mu_1,\cdots,\mu_p\in \mathbb{R}$ such that
$\begin{aligned} \nabla f\left(\mathbf{x}^{*}\right)+\sum_{i=1}^{m} \lambda_{i} \nabla g_{i}\left(\mathbf{x}^{*}\right)+\sum_{j=1}^{p} \mu_{j} \nabla h_{j}\left(\mathbf{x}^{*}\right) &=0, \\ \lambda_{i} g_{i}\left(\mathbf{x}^{*}\right) &=0, \quad i=1,2, \ldots, m \end{aligned}$
then $\mathbf{x}^*$ is an optimal solution.
Proof:
Define
$s(\mathbf{x})=f(\mathbf{x})+\sum_{i=1}^{m} \lambda_{i} g_{i}(\mathbf{x})+\sum_{j=1}^{p} \mu_{j} h_{j}(\mathbf{x})$
Since $\lambda_i\ge 0$, the $g_i$ are convex and the $h_j$ are affine, $s$ is convex. The first KKT equation states that $\nabla s(\mathbf{x}^*)=0$, so $\mathbf{x}^*$ minimizes $s$ over $\mathbb{R}^n$, i.e. $s(\mathbf{x}^*)\le s(\mathbf{x})$ for all $\mathbf{x}$.
Hence, for every feasible $\mathbf{x}$,
$\begin{aligned} f\left(\mathbf{x}^{*}\right) &=f\left(\mathbf{x}^{*}\right)+\sum_{i=1}^{m} \lambda_{i} g_{i}\left(\mathbf{x}^{*}\right)+\sum_{j=1}^{p} \mu_{j} h_{j}\left(\mathbf{x}^{*}\right) \\ &=s\left(\mathbf{x}^{*}\right) \\ & \leq s(\mathbf{x}) \\ &=f(\mathbf{x})+\sum_{i=1}^{m} \lambda_{i} g_{i}(\mathbf{x})+\sum_{j=1}^{p} \mu_{j} h_{j}(\mathbf{x}) \\ & \leq f(\mathbf{x}) \end{aligned}$
The first equality uses $\lambda_i g_i(\mathbf{x}^*)=0$ and $h_j(\mathbf{x}^*)=0$; the last inequality uses $\lambda_i\ge 0$, $g_i(\mathbf{x})\le 0$ and $h_j(\mathbf{x})=0$.
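
The sufficiency result can be sanity-checked by sampling. The sketch below uses an assumed convex example, $\min\ (x_1-2)^2+(x_2-1)^2$ subject to $x_1+x_2-2\le 0$, verifies the KKT equations at $\mathbf{x}^*=(1.5,0.5)$ with $\lambda=1$, and then compares $f(\mathbf{x}^*)$ with the objective at randomly drawn feasible points. The sampling box, seed and sample count are arbitrary choices, and a sampling check of this kind illustrates the theorem without proving anything.

```python
import numpy as np

# Assumed convex example:
#   min (x1 - 2)^2 + (x2 - 1)^2  s.t.  g(x) = x1 + x2 - 2 <= 0.
def f(x):
    return (x[..., 0] - 2.0) ** 2 + (x[..., 1] - 1.0) ** 2

def g(x):
    return x[..., 0] + x[..., 1] - 2.0

x_star = np.array([1.5, 0.5])
lam = 1.0

# KKT check at x*: stationarity and complementary slackness.
grad_f = 2.0 * (x_star - np.array([2.0, 1.0]))
grad_g = np.array([1.0, 1.0])
assert np.allclose(grad_f + lam * grad_g, 0.0)
assert abs(lam * g(x_star)) < 1e-12

# Sample points, keep the feasible ones, and compare objective values.
rng = np.random.default_rng(0)
samples = rng.uniform(-5.0, 5.0, size=(10000, 2))
feasible = samples[g(samples) <= 0.0]
gap = f(feasible).min() - f(x_star)
print(gap >= -1e-12)  # expected: no sampled feasible point beats x*
```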

Necessity of the KKT conditions under Slater's condition for inequality-constrained problems

Let $\mathbf{x}^*$ be a local optimal solution of
$\begin{array}{ll} \min & f(\mathbf{x}) \\ \text { s.t. } & g_{i}(\mathbf{x}) \leq 0, \quad i=1,2, \ldots, m \end{array}$
where $f,g_1,\cdots,g_m$ are continuously differentiable functions on $\mathbb{R}^n$
and $g_1,\cdots,g_m$ are convex.
Assume that there exists $\hat{\mathbf{x}}\in \mathbb{R}^n$ such that
$g_{i}(\hat{\mathbf{x}})<0, \quad i=1,2, \ldots, m$
Then there exist $\lambda_1,\cdots,\lambda_m\ge 0$ such that
$\begin{aligned} \nabla f\left(\mathbf{x}^{*}\right)+\sum_{i=1}^{m} \lambda_{i} \nabla g_{i}\left(\mathbf{x}^{*}\right) &=0, \\ \lambda_{i} g_{i}\left(\mathbf{x}^{*}\right) &=0, \quad i=1,2, \ldots, m \end{aligned}$
Proof:
Since $\mathbf{x}^*$ is a local optimal solution, it satisfies the Fritz-John conditions,
i.e. there exist $\tilde{\lambda}_0,\tilde{\lambda}_1,\cdots,\tilde{\lambda}_m\ge 0$, not all zero, such that
$\begin{aligned} \tilde{\lambda}_{0} \nabla f\left(\mathbf{x}^{*}\right)+\sum_{i=1}^{m} \tilde{\lambda}_{i} \nabla g_{i}\left(\mathbf{x}^{*}\right) &=0, \\ \tilde{\lambda}_{i} g_{i}\left(\mathbf{x}^{*}\right) &=0, \quad i=1,2, \ldots, m \end{aligned}$

Suppose $\tilde{\lambda}_0=0$. Then $\tilde{\lambda}_1,\cdots,\tilde{\lambda}_m$ are not all zero and
$\sum_{i=1}^{m} \tilde{\lambda}_{i} \nabla g_{i}\left(\mathbf{x}^{*}\right)=0$
By the first-order characterization of convex functions,
$0>g_{i}(\hat{\mathbf{x}}) \geq g_{i}\left(\mathbf{x}^{*}\right)+\nabla g_{i}\left(\mathbf{x}^{*}\right)^{T}\left(\hat{\mathbf{x}}-\mathbf{x}^{*}\right)$
Multiplying by $\tilde{\lambda}_i$ and summing over $i$ gives
$0>\sum_{i=1}^{m} \tilde{\lambda}_{i} g_{i}(\hat{\mathbf{x}}) \geq \sum_{i=1}^{m} \tilde{\lambda}_{i} g_{i}\left(\mathbf{x}^{*}\right)+\left[\sum_{i=1}^{m} \tilde{\lambda}_{i} \nabla g_{i}\left(\mathbf{x}^{*}\right)\right]^{T}\left(\hat{\mathbf{x}}-\mathbf{x}^{*}\right)=0$
which is a contradiction.
Hence $\tilde{\lambda}_0>0$, and taking $\lambda_i=\frac{\tilde{\lambda}_i}{\tilde{\lambda}_0}$
completes the proof.
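
Without a strictly feasible point the conclusion can fail even for a convex problem. The sketch below uses an assumed one-dimensional example, $\min\ x$ subject to $x^2\le 0$, whose only feasible point is $x^*=0$. It evaluates the stationarity residual over a grid of candidate multipliers; the grid bounds are arbitrary.

```python
import numpy as np

# Assumed one-dimensional example: min x  s.t.  g(x) = x^2 <= 0.
# The feasible set is {0}, so x* = 0 is optimal, but no point has g < 0.
x_star = 0.0
grad_f = 1.0           # derivative of f(x) = x
grad_g = 2.0 * x_star  # derivative of g(x) = x^2 at x*, equal to 0

# Stationarity residual |grad_f + lam * grad_g| over a grid of multipliers.
lams = np.linspace(0.0, 1e6, 1001)
residuals = np.abs(grad_f + lams * grad_g)
print(residuals.min())  # stays at 1: no lam >= 0 gives stationarity
```

Because the constraint gradient vanishes at $x^*$, the residual is independent of $\lambda$, so the grid only makes that visible.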

Generalized Slater condition

Consider the constraint system
$\begin{array}{ll} g_{i}(\mathbf{x}) \leq 0, & i=1,2, \ldots, m, \\ h_{j}(\mathbf{x}) \leq 0, & j=1,2, \ldots, p, \\ s_{k}(\mathbf{x})=0, & k=1,2, \ldots, q, \end{array}$
where the $g_i$ are convex functions and the $h_j,s_k$ are affine functions.
If there exists $\hat{\mathbf{x}}\in\mathbb{R}^n$ satisfying
$\begin{array}{ll} g_{i}(\hat{\mathbf{x}}) < 0, & i=1,2, \ldots, m, \\ h_{j}(\hat{\mathbf{x}}) \leq 0, & j=1,2, \ldots, p, \\ s_{k}(\hat{\mathbf{x}})=0, & k=1,2, \ldots, q, \end{array}$
then the system is said to satisfy the generalized Slater condition.

Necessity of the KKT conditions under the generalized Slater condition for problems with inequality and equality constraints

Let $\mathbf{x}^*$ be an optimal solution of the problem
$\begin{array}{ll} \min & f(\mathbf{x}) \\ \text { s.t. } & g_{i}(\mathbf{x}) \leq 0, \quad i=1,2, \ldots, m, \\ & h_{j}(\mathbf{x}) \leq 0, \quad j=1,2, \ldots, p, \\ & s_{k}(\mathbf{x})=0, \quad k=1,2, \ldots, q, \end{array}$
where $f,g_1,\cdots,g_m$ are continuously differentiable convex functions on $\mathbb{R}^n$ and the $h_j,s_k$ are affine functions.
If there exists $\hat{\mathbf{x}}\in\mathbb{R}^n$ satisfying
$\begin{array}{ll} g_{i}(\hat{\mathbf{x}}) < 0, & i=1,2, \ldots, m, \\ h_{j}(\hat{\mathbf{x}}) \leq 0, & j=1,2, \ldots, p, \\ s_{k}(\hat{\mathbf{x}})=0, & k=1,2, \ldots, q, \end{array}$
then there exist $\lambda_1,\cdots,\lambda_m,\eta_1,\cdots,\eta_p\ge 0$ and $\mu_1,\cdots,\mu_q\in\mathbb{R}$ such that
$\begin{aligned} \nabla f\left(\mathbf{x}^{*}\right)+\sum_{i=1}^{m} \lambda_{i} \nabla g_{i}\left(\mathbf{x}^{*}\right)+\sum_{j=1}^{p} \eta_{j} \nabla h_{j}\left(\mathbf{x}^{*}\right)+\sum_{k=1}^{q} \mu_{k} \nabla s_{k}\left(\mathbf{x}^{*}\right) &=0, \\ \lambda_{i} g_{i}\left(\mathbf{x}^{*}\right) &=0, \quad i=1,2, \ldots, m, \\ \eta_{j} h_{j}\left(\mathbf{x}^{*}\right) &=0, \quad j=1,2, \ldots, p . \end{aligned}$