Lagrange Multipliers, KKT Conditions, and Lagrange Duality

MeJnCode

Category: MachineLearning. Tags: statistics, optimization, machine learning, Lagrange, KKT

Article link: https://blog.csdn.net/sinat_17496535/article/details/52103852

@20160718

These notes are drawn mainly from Wikipedia and the book 《统计学习方法》 (Statistical Learning Methods).

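Lagrange Multiplier Method

The method of Lagrange multipliers is an optimization technique for finding the extrema (maxima or minima) of a function subject to equality constraints. Constraints usually make an optimum harder to locate, and the method of Lagrange multipliers is a powerful tool for exactly this kind of problem.

1. Single-constraint problem

Consider the following two-dimensional optimization problem with a single constraint: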

$\text{maximize}\ f(x,y)$

$\text{subject to}\ g(x,y)=0$

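Draw $f(x,y)$ as a contour plot. When we search for the maximum along the curve $g(x,y)=0$, the maximizer should lie where the contour of $f$ at the maximal attainable level is tangent to $g(x,y)=0$. It can also happen that $f(x,y)$ stays constant along some stretch of $g(x,y)=0$, in which case that whole set of points may be the optimal solution we are looking for. There are two ways $f$ can stay unchanged as we move along the constraint: 1) $f$ and $g$ are "parallel", that is, moving along the constraint curve is at the same time moving along a contour $f(x,y)=d$; 2) we have reached a level part of $f$, meaning that $f$ does not change in any direction, i.e. $\nabla f=0$.
In both cases there exists a $\lambda$ satisfying: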

$\nabla_{x,y}f=-\lambda\nabla_{x,y}g,$

Combining this condition with the constraint $g(x,y)=0$ into a single function, we define:

$L(x,y,\lambda)=f(x,y)+\lambda g(x,y)$

and solve:

$\nabla_{x,y,\lambda}L(x,y,\lambda)=0$
This is the method of Lagrange multipliers, and $\lambda$ is called the Lagrange multiplier. In the second case above, $\lambda=0$.
Similarly, for a problem in several variables, the optimum can be found by solving:

$\nabla_{x_1,...,x_n,\lambda}L(x_1,...,x_n,\lambda)=0$
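As a hedged illustration of the procedure, the sketch below uses a toy problem that is not from the original notes: maximize $f(x,y)=x+y$ subject to $g(x,y)=x^2+y^2-2=0$. The stationary points of $L=f+\lambda g$ were worked out by hand ($x=y=-1/(2\lambda)$, $\lambda=\pm 1/2$). The code only checks that $\nabla_{x,y,\lambda}L$ vanishes there and compares the result with a brute-force scan of the constraint circle. All function names are illustrative.

```python
import math

def f(x, y):
    """Objective of the toy problem (an assumption, not from the notes)."""
    return x + y

def g(x, y):
    """Equality constraint g(x, y) = 0: a circle of radius sqrt(2)."""
    return x * x + y * y - 2.0

def grad_L(x, y, lam):
    """Gradient of L(x, y, lam) = f(x, y) + lam * g(x, y)."""
    dL_dx = 1.0 + 2.0 * lam * x
    dL_dy = 1.0 + 2.0 * lam * y
    dL_dlam = g(x, y)  # setting this to zero recovers the constraint
    return (dL_dx, dL_dy, dL_dlam)

# Stationary points (x, y, lam) found by hand from grad_L = 0.
candidates = [(1.0, 1.0, -0.5), (-1.0, -1.0, 0.5)]

def best_candidate():
    """Pick the stationary point with the largest objective value."""
    return max(candidates, key=lambda p: f(p[0], p[1]))

def scan_constraint(n=100_000):
    """Brute-force maximum of f over n points on the circle g = 0."""
    r = math.sqrt(2.0)
    return max(
        f(r * math.cos(2.0 * math.pi * k / n), r * math.sin(2.0 * math.pi * k / n))
        for k in range(n)
    )

if __name__ == "__main__":
    for x, y, lam in candidates:
        print((x, y, lam), "grad L =", grad_L(x, y, lam), "f =", f(x, y))
    print("best stationary point:", best_candidate())
    print("brute-force maximum of f on g = 0:", scan_constraint())
```

Both candidates satisfy $\nabla L=0$. Comparing $f$ at the two points identifies $(1,1)$ as the maximizer and $(-1,-1)$ as the minimizer, so the stationarity condition by itself does not tell a maximum from a minimum.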

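2. Multiple-constraint problem

Start with a simple case: if two constraint curves intersect at a single point, that point is the only feasible point and is therefore trivially the optimum. Now consider the more general situation in which the level set of $f$ is not parallel to every constraint. What then? Linear combinations! At the point the method of Lagrange multipliers looks for, the gradient of $f$ need not be a multiple of the gradient of any single constraint; it is a linear combination of all the constraint gradients.
Let $A$ denote the space of directions in which we are allowed to move, and let $S$ denote the span of the constraint gradients. Then $A=S^\perp$: every allowed direction is perpendicular to every element of $S$.
As in the single-constraint case, we still look for points where $f$ does not change as we move along the allowed directions, because these points are the candidates for an optimum.
In other words, we look for points $x$ at which every allowed direction is perpendicular to $\nabla f(x)$, because only then does $f$ stay unchanged (to first order). This gives $\nabla f(x)\in A^\perp=S$, so there exist real numbers $\lambda_1,\lambda_2,...,\lambda_M$ satisfying: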

$\nabla f(x)=-\sum_{k=1}^{M}\lambda_k\nabla g_k(x)$

These real numbers are the Lagrange multipliers. With this sign convention (the same one as in $L=f+\lambda g$ for a single constraint), the corresponding Lagrangian is:

$L(x_1,...,x_n,\lambda_1,...,\lambda_M)=f(x_1,...,x_n)+\sum_{k=1}^M\lambda_kg_k(x_1,...,x_n),$

and we solve:

$\nabla_{x_1,...,x_n,\lambda_1,...,\lambda_M}L(x_1,...,x_n,\lambda_1,...,\lambda_M)=0$

This is how the method of Lagrange multipliers handles problems with multiple constraints.
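To make the linear-combination statement concrete, here is a small sketch on an assumed toy problem (not from the original notes): maximize $f(x,y,z)=x+y+z$ subject to $g_1=x^2+y^2+z^2-1=0$ and $g_2=z=0$. The feasible set is the unit circle in the plane $z=0$, and the maximizer $(1/\sqrt2,1/\sqrt2,0)$ with multipliers $\lambda_1=-1/\sqrt2$, $\lambda_2=-1$ was derived by hand. The code checks the condition $\nabla f=-\sum_k\lambda_k\nabla g_k$ and that $\nabla f$ is not parallel to either constraint gradient alone. Helper names are illustrative.

```python
import math

def grad_f(p):
    """Gradient of the toy objective f(x, y, z) = x + y + z."""
    return (1.0, 1.0, 1.0)

def grad_g1(p):
    """Gradient of g1(x, y, z) = x^2 + y^2 + z^2 - 1."""
    x, y, z = p
    return (2.0 * x, 2.0 * y, 2.0 * z)

def grad_g2(p):
    """Gradient of g2(x, y, z) = z."""
    return (0.0, 0.0, 1.0)

def stationarity_residual(p, lambdas):
    """Return grad f(p) + sum_k lambda_k * grad g_k(p); zero at a solution."""
    grads = (grad_g1(p), grad_g2(p))
    return tuple(
        grad_f(p)[i] + sum(lam * gk[i] for lam, gk in zip(lambdas, grads))
        for i in range(3)
    )

def is_multiple(u, v, tol=1e-12):
    """True if u and v are parallel (their cross product is numerically zero)."""
    cross = (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )
    return all(abs(c) < tol for c in cross)

s = 1.0 / math.sqrt(2.0)
x_star = (s, s, 0.0)   # hand-derived maximizer
lambdas = (-s, -1.0)   # hand-derived multipliers (lambda_1, lambda_2)

if __name__ == "__main__":
    print("residual:", stationarity_residual(x_star, lambdas))
    print("parallel to grad g1 alone:", is_multiple(grad_f(x_star), grad_g1(x_star)))
    print("parallel to grad g2 alone:", is_multiple(grad_f(x_star), grad_g2(x_star)))
```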

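KKT Conditions (Karush–Kuhn–Tucker conditions)

The KKT conditions extend the method of Lagrange multipliers to optimization problems that also have inequality constraints.
Consider the following nonlinear optimization problem: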

$\text{maximize}\ f(x)$

$\text{subject to}\ g_i(x)\leq0,\ h_j(x)=0$

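Here $x$ is the optimization variable, $f$ is the objective function, $g_i\ (i=1,2,...,m)$ are the inequality constraint functions, and $h_j\ (j=1,2,...,l)$ are the equality constraint functions.
For this problem, the KKT conditions state that an optimal point $x^*$ (under suitable regularity conditions on the constraints) satisfies the following for some multipliers $\mu_i$ and $\lambda_j$: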

Stationarity:

$\nabla f(x^*)=\sum_{i=1}^m\mu_i\nabla g_i(x^*)+\sum_{j=1}^l\lambda_j\nabla h_j(x^*)$

Primal feasibility:

$g_i(x^*)\leq0,\ \text{for all}\ i=1,2,...,m$

$h_j(x^*)=0,\ \text{for all}\ j=1,2,...,l$

Dual feasibility:

$\mu_i\geq 0,\ \text{for all}\ i=1,2,...,m$

Complementary slackness:

$\mu_ig_i(x^*)=0,\ \text{for all}\ i=1,2,...,m$
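The conditions can be checked mechanically. The sketch below does so for an assumed one-dimensional toy problem (not from the original notes): maximize $f(x)=-(x-2)^2$ subject to $g(x)=x-c\leq0$, with no equality constraints. The points and multipliers passed in were derived by hand, and `kkt_report` is a hypothetical helper name.

```python
def grad_f(x):
    """Gradient of the toy objective f(x) = -(x - 2)**2 (to be maximized)."""
    return -2.0 * (x - 2.0)

def g(x, c):
    """Inequality constraint g(x) = x - c <= 0."""
    return x - c

def grad_g(x, c):
    """Gradient of g with respect to x."""
    return 1.0

def kkt_report(x, mu, c, tol=1e-9):
    """Evaluate each KKT condition at the point x with multiplier mu."""
    return {
        "stationarity": abs(grad_f(x) - mu * grad_g(x, c)) < tol,
        "primal_feasibility": g(x, c) <= tol,
        "dual_feasibility": mu >= -tol,
        "complementary_slackness": abs(mu * g(x, c)) < tol,
    }

# Constraint active at the optimum: x <= 1 cuts off the unconstrained peak at 2.
active = kkt_report(x=1.0, mu=2.0, c=1.0)
# Constraint inactive at the optimum: x <= 3 does not bind, so mu = 0.
inactive = kkt_report(x=2.0, mu=0.0, c=3.0)
# Feasible and stationary with mu > 0, but the constraint is slack there.
not_optimal = kkt_report(x=0.0, mu=4.0, c=1.0)

if __name__ == "__main__":
    for name, report in [("active", active), ("inactive", inactive),
                         ("not_optimal", not_optimal)]:
        print(name, report)
```

In the third case only complementary slackness fails: a positive multiplier is paired with a constraint that is not tight, so the point $x=0$ is rejected.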

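Lagrange Duality

In constrained optimization, Lagrange duality is often used to convert the primal problem into a dual problem, so that the solution of the primal problem is obtained by solving the dual problem.

1. Primal problem

Suppose $f(x),c_i(x),h_j(x)$ are continuously differentiable functions defined on $R^n$. Consider the following optimization problem: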

$\min_{x\in R^n}f(x) \qquad (1)$

$\text{s.t.}\ c_i(x)\leq0,\ i=1,2,...,k \qquad (2)$

$h_j(x)=0,\ j=1,2,...,l \qquad (3)$

This constrained optimization problem is called the primal optimization problem, or simply the primal problem.
Introduce the generalized Lagrangian:

$L(x,\alpha,\beta)=f(x)+\sum_{i=1}^k\alpha_ic_i(x)+\sum_{j=1}^l\beta_jh_j(x) \qquad (4)$

Here $\alpha_i,\beta_j$ are the Lagrange multipliers, with $\alpha_i\geq0$. Consider the following function of $x$:

$\theta_P(x)=\max_{\alpha,\beta;\alpha_i\geq0}L(x,\alpha,\beta) \qquad (5)$

The subscript $P$ stands for the primal problem.
If $x$ violates a primal constraint, that is, $c_i(x)>0$ for some $i$ or $h_j(x)\neq0$ for some $j$, the corresponding multiplier can be chosen so that $L\to+\infty$, and hence $\theta_P(x)=+\infty$. If $x$ satisfies the primal constraints, then $\theta_P(x)=f(x)$. This gives a minimization problem equivalent to the primal optimization problem:

$\min_{x}\theta_P(x)=\min_{x}\max_{\alpha,\beta;\alpha_i\geq0}L(x,\alpha,\beta) \qquad (6)$

This problem is called the minimax problem of the generalized Lagrangian. Define the optimal value of the primal problem as

$p^*=\min_{x}\theta_P(x) \qquad (7)$

which is called the value of the primal problem.
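The behaviour of $\theta_P$ can be seen on a one-variable toy problem, assumed here for illustration and not taken from the original notes: $f(x)=x^2$ with the single inequality constraint $c(x)=1-x\leq0$ and no equality constraints. The sketch writes $\theta_P$ in closed form and approximates $p^*$ on a grid whose bounds are arbitrary.

```python
import math

def f(x):
    """Toy objective f(x) = x**2."""
    return x * x

def c(x):
    """Inequality constraint c(x) <= 0, i.e. x >= 1."""
    return 1.0 - x

def lagrangian(x, alpha):
    """Generalized Lagrangian L(x, alpha) = f(x) + alpha * c(x)."""
    return f(x) + alpha * c(x)

def theta_P(x):
    """Supremum of L(x, alpha) over alpha >= 0, in closed form."""
    if c(x) > 0.0:
        return math.inf  # infeasible x: letting alpha grow makes L unbounded
    return f(x)          # feasible x: alpha * c(x) <= 0, so alpha = 0 is best

def primal_value(lo=-3.0, hi=3.0, n=6001):
    """Approximate p* = min_x theta_P(x) on a uniform grid over [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return min(theta_P(lo + k * step) for k in range(n))

if __name__ == "__main__":
    # At the infeasible point x = 0, L grows without bound as alpha increases.
    print([lagrangian(0.0, a) for a in (1.0, 10.0, 1000.0)])
    print("theta_P(0) =", theta_P(0.0), " theta_P(2) =", theta_P(2.0))
    print("approximate p* =", primal_value())
```

Because $\theta_P$ is infinite outside the feasible set, minimizing it over all $x$ never selects an infeasible point, which is why the unconstrained minimax form is equivalent to the constrained problem.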

2. Dual problem

Define

$\theta_D(\alpha,\beta)=\min_{x}L(x,\alpha,\beta) \qquad (8)$

Next consider maximizing this expression, that is,

$\max_{\alpha,\beta;\alpha_i\geq0}\theta_D(\alpha,\beta)=\max_{\alpha,\beta;\alpha_i\geq0}\min_{x}L(x,\alpha,\beta) \qquad (9)$

The problem $\max_{\alpha,\beta;\alpha_i\geq0}\min_{x}L(x,\alpha,\beta)$ is called the maximin problem of the generalized Lagrangian.
The maximin problem of the generalized Lagrangian can be written as a constrained optimization problem:

$\max_{\alpha,\beta}\theta_D(\alpha,\beta)=\max_{\alpha,\beta}\min_{x}L(x,\alpha,\beta) \qquad (10)$

$\text{s.t.}\ \alpha_i\geq0,\ i=1,2,...,k \qquad (11)$

This is called the dual problem of the primal problem. Define the optimal value of the dual problem as

$d^*=\max_{\alpha,\beta;\alpha_i\geq0}\theta_D(\alpha,\beta) \qquad (12)$

which is called the value of the dual problem.
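A minimal sketch of the dual construction, on an assumed toy problem that is not from the original notes: $f(x)=x^2$ with the single constraint $c(x)=1-x\leq0$, so $L(x,\alpha)=x^2+\alpha(1-x)$. Since $L$ is convex in $x$, the inner minimizer $x=\alpha/2$ follows from $\partial L/\partial x=0$. The outer maximization uses a plain grid search whose upper bound is arbitrary.

```python
def lagrangian(x, alpha):
    """L(x, alpha) = f(x) + alpha * c(x) with f(x) = x**2 and c(x) = 1 - x."""
    return x * x + alpha * (1.0 - x)

def theta_D(alpha):
    """Minimum of L(x, alpha) over x, attained at x = alpha / 2."""
    return lagrangian(alpha / 2.0, alpha)

def dual_value(hi=10.0, n=10001):
    """Approximate d* = max of theta_D over alpha in [0, hi] on a uniform grid.

    Returns the pair (best value, alpha at which it is attained).
    """
    step = hi / (n - 1)
    return max((theta_D(k * step), k * step) for k in range(n))

if __name__ == "__main__":
    d_star, alpha_star = dual_value()
    print("approximate d* =", d_star, "at alpha =", alpha_star)
```

For this toy problem the search gives $d^*\approx1$ at $\alpha\approx2$, which happens to equal the minimum of $x^2$ over $x\geq1$. The definitions above do not by themselves guarantee that the dual and primal values coincide for other problems.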