Support Vector Machines (SVM) - 2

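This article introduces the basic concepts of convex optimization problems, including the objective function and the constraints. It combines the original problem with its constraints through the Lagrangian function, discusses the relationship between the primal and dual problems, and focuses on weak and strong duality and on the key role of the Karush-Kuhn-Tucker (KKT) conditions in solving optimization problems.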


Previous article:
SVM-1

2.4 Convex optimization problems, the Lagrangian function, dual problems, and the KKT conditions

2.4.1 Convex optimization problems

\qquad A convex optimization problem is an optimization problem of the following form:

$$
\begin{aligned}
\min_{\vec{x}} \quad & f(\vec{x}) \\
s.t. \quad & g_{i}(\vec{x}) \leq 0, \quad i=1,2,3, \cdots, k \\
& h_{j}(\vec{x}) = 0, \quad j=1,2,3, \cdots, l
\end{aligned}
$$

\qquad Here the objective function $f(\vec{x})$ and the constraint functions $g_{i}(\vec{x})$ are continuously differentiable convex functions on $R^{n}$, and the constraint functions $h_{j}(\vec{x})$ are affine functions on $R^{n}$.


2.4.2 The Lagrangian function

\qquad The Lagrangian function introduces Lagrange multipliers to combine the objective of the optimization problem with its constraints in a single expression. For the convex optimization problem in 2.4.1, the corresponding Lagrangian function is:

$$
L(\vec{x},\vec{\alpha},\vec{\beta}) = f(\vec{x}) + \sum_{i=1}^{k}\alpha_{i}g_{i}(\vec{x})+\sum_{j=1}^{l}\beta_{j}h_{j}(\vec{x})
$$

\qquad Here the variables are $\vec{x} = (x_{1},x_{2},x_{3}, \cdots ,x_{n})^{T} \in R^{n}$, $\vec{\alpha} = (\alpha_{1},\alpha_{2},\alpha_{3}, \cdots ,\alpha_{k})^{T} \in R^{k}$, and $\vec{\beta} = (\beta_{1},\beta_{2},\beta_{3}, \cdots ,\beta_{l})^{T} \in R^{l}$. The $\alpha_{i}$ and $\beta_{j}$ are the Lagrange multipliers, with $\alpha_{i} \geq 0$. The argument below shows why working with the Lagrangian function is equivalent to the original problem together with its constraints.

\qquad Define:

$$
\theta_{p}(\vec{x}) = \max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0}L(\vec{x},\vec{\alpha},\vec{\beta})
$$

\qquad Here $\max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0}$ means: for each value of $\vec{x}$, take the maximum of $L$ over all admissible $\vec{\alpha},\vec{\beta}$ with $\alpha_{i} \geq 0$. Fix some $\vec{x}$. If this $\vec{x}$ violates the constraints of the original problem, then there is some $g_i(\vec{x}) > 0$ or some $h_j(\vec{x}) \neq 0$. In that case one can let $\alpha_{i} \rightarrow +\infty$, or choose $\beta_{j}$ so that $\beta_{j}h_{j}(\vec{x}) \rightarrow +\infty$, while setting all the other $\alpha_{i}, \beta_{j}$ to 0, which gives:

$$
\theta_{p}(\vec{x}) = \max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0}L(\vec{x},\vec{\alpha},\vec{\beta}) = \max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0} \Big [f(\vec{x}) + \sum_{i=1}^{k}\alpha_{i}g_{i}(\vec{x})+\sum_{j=1}^{l}\beta_{j}h_{j}(\vec{x}) \Big] = +\infty
$$

\qquad Conversely, if $\vec{x}$ satisfies all the constraints, then every term $\alpha_{i}g_{i}(\vec{x}) \leq 0$ and every term $\beta_{j}h_{j}(\vec{x}) = 0$, so the maximum is attained with $\alpha_{i} = 0$ and $\beta_{j}$ arbitrary, which gives:

$$
\theta_{p}(\vec{x}) = \max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0}L(\vec{x},\vec{\alpha},\vec{\beta}) = \max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0} \Big [f(\vec{x}) + \sum_{i=1}^{k}\alpha_{i}g_{i}(\vec{x})+\sum_{j=1}^{l}\beta_{j}h_{j}(\vec{x}) \Big] = f(\vec{x})
$$

\qquad The Lagrangian function can also be understood from the viewpoint of gradients, but that is not covered here.

\qquad In summary:

$$
\theta_{p}(\vec{x}) = \max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0}L(\vec{x},\vec{\alpha},\vec{\beta}) = \begin{cases} f(\vec{x}), & \text{for all } \vec{x} \text{ satisfying the constraints of the original problem} \\ +\infty, & \text{otherwise} \end{cases}
$$

$$
\min_{\vec{x}}\theta_{p}(\vec{x}) = \min_{\vec{x}}f(\vec{x}) = \min_{\vec{x}}\max_{\vec{\alpha},\vec{\beta}:\alpha_{i} \geq 0}L(\vec{x},\vec{\alpha},\vec{\beta})
$$

\qquad Here the minimum of $f(\vec{x})$ is taken over the $\vec{x}$ that satisfy the constraints. Therefore, solving the original problem in 2.4.1 amounts to solving the min-max problem of the Lagrangian function. Maximizing the Lagrangian function over the Lagrange multipliers acts as a filter throughout the computation: it rules out every $\vec{x}$ that does not satisfy the constraints.
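\qquad The filtering effect can be observed numerically. The following is a minimal sketch, assuming a toy problem that is not from the article (minimize $x^{2}$ subject to $1 - x \leq 0$) and a finite grid of $\alpha$ values as a stand-in for the true maximum; the function names are illustrative only.

```python
# Toy problem (an assumption for illustration, not from the article):
#   minimize f(x) = x^2  subject to  g(x) = 1 - x <= 0
def f(x):
    return x * x


def g(x):
    return 1.0 - x


def lagrangian(x, alpha):
    """L(x, alpha) = f(x) + alpha * g(x), with alpha >= 0."""
    return f(x) + alpha * g(x)


def theta_p_approx(x, alpha_max, steps=1000):
    """Approximate the max over alpha in [0, alpha_max] of L(x, alpha) on a grid."""
    alphas = [alpha_max * i / steps for i in range(steps + 1)]
    return max(lagrangian(x, a) for a in alphas)


# Feasible point: g(2) = -1 <= 0, so the maximum is reached at alpha = 0
# and the result is f(2), however large alpha_max is.
feasible_value = theta_p_approx(2.0, alpha_max=1e6)

# Infeasible point: g(0.5) = 0.5 > 0, so the value keeps growing with alpha_max.
infeasible_small = theta_p_approx(0.5, alpha_max=10.0)
infeasible_large = theta_p_approx(0.5, alpha_max=1e6)

print(feasible_value, infeasible_small, infeasible_large)
```

\qquad For the feasible point the grid maximum stays at $f(x)$, while for the infeasible point it grows without bound as the range of $\alpha$ widens, which is the numerical counterpart of $\theta_{p}(\vec{x}) = +\infty$.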


2.4.3 Primal and dual problems, weak and strong duality

\qquad When the primal problem is a min-max problem, its dual problem is a max-min problem.

\qquad Suppose the primal problem is $\min_{\vec{x}}\max_{\vec{\alpha},\vec{\beta}}L(\vec{x},\vec{\alpha},\vec{\beta})$ with optimal value $p^{*}$, and the dual problem is $\max_{\vec{\alpha},\vec{\beta}}\min_{\vec{x}}L(\vec{x},\vec{\alpha},\vec{\beta})$ with optimal value $d^{*}$, where the multipliers are restricted to $\alpha_{i} \geq 0$ as before. Then:

$$
\begin{aligned}
p^{*} &= \min_{\vec{x}}\max_{\vec{\alpha},\vec{\beta}}L(\vec{x},\vec{\alpha},\vec{\beta}) \\
d^{*} &= \max_{\vec{\alpha},\vec{\beta}}\min_{\vec{x}}L(\vec{x},\vec{\alpha},\vec{\beta})
\end{aligned}
$$

\qquad For the function $L(\vec{x},\vec{\alpha},\vec{\beta})$, we have:

$$
\begin{aligned}
& \min_{\vec{x}}L(\vec{x},\vec{\alpha},\vec{\beta}) \leq L(\vec{x},\vec{\alpha},\vec{\beta}) \\
& L(\vec{x},\vec{\alpha},\vec{\beta}) \leq \max_{\vec{\alpha},\vec{\beta}}L(\vec{x},\vec{\alpha},\vec{\beta}) \\
& \theta_{d}(\vec{\alpha},\vec{\beta}) = \min_{\vec{x}}L(\vec{x},\vec{\alpha},\vec{\beta}) \leq L(\vec{x},\vec{\alpha},\vec{\beta}) \leq \max_{\vec{\alpha},\vec{\beta}}L(\vec{x},\vec{\alpha},\vec{\beta}) = \theta_{p}(\vec{x})
\end{aligned}
$$

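\qquad Since $\theta_{d}(\vec{\alpha},\vec{\beta}) \leq \theta_{p}(\vec{x})$ holds for every $\vec{x}$ and every $\vec{\alpha},\vec{\beta}$, taking the maximum on the left and the minimum on the right gives:

$$
d^{*} = \max_{\vec{\alpha},\vec{\beta}}{\theta_{d}(\vec{\alpha},\vec{\beta})} \leq \min_{\vec{x}}{\theta_{p}(\vec{x})} = p^{*}
$$

\qquad That is:

$$
d^{*} \leq p^{*}
$$

\qquad This inequality always holds and is called weak duality. When it holds with equality, the primal and dual problems are in strong duality. For a convex optimization problem, a sufficient condition for equality is that there exists a feasible $\vec{x}$ that satisfies every inequality constraint strictly, that is, $g_{i}(\vec{x}) < 0$ for all $i$. This is called Slater's condition. To some extent, Slater's condition describes the geometric form of the convex sets for which a convex optimization problem attains the same optimum as its dual; the details are omitted here.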
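\qquad A small worked example may help here. The toy problem below is an assumption chosen for illustration, not taken from the article: minimize $x^{2}$ subject to $g(x) = 1 - x \leq 0$. The point $x = 2$ gives $g(2) = -1 < 0$, so Slater's condition holds, and the primal and dual optimal values can be computed by hand:

$$
\begin{aligned}
& p^{*} = \min_{x \geq 1} x^{2} = 1 \quad \text{at } x^{*} = 1 \\
& L(x,\alpha) = x^{2} + \alpha(1 - x), \quad \alpha \geq 0 \\
& \theta_{d}(\alpha) = \min_{x} L(x,\alpha) = \alpha - \frac{\alpha^{2}}{4} \quad \text{(the minimizer is } x = \alpha/2 \text{)} \\
& d^{*} = \max_{\alpha \geq 0}\theta_{d}(\alpha) = 1 \quad \text{at } \alpha^{*} = 2
\end{aligned}
$$

\qquad In this example $d^{*} = p^{*} = 1$, so the inequality $d^{*} \leq p^{*}$ holds with equality, as strong duality requires.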


2.4.4 Karush-Kuhn-Tucker (KKT) conditions

\qquad From the above, we have the original convex optimization problem:

$$
\begin{aligned}
\min_{\vec{x}} \quad & f(\vec{x}) \\
s.t. \quad & g_{i}(\vec{x}) \leq 0, \quad i=1,2,3, \cdots, k \\
& h_{j}(\vec{x}) = 0, \quad j=1,2,3, \cdots, l
\end{aligned}
$$

\qquad The Lagrangian function then gives a pair of dual problems:

$$
\begin{aligned}
& L(\vec{x},\vec{\alpha},\vec{\beta}) = f(\vec{x}) + \sum_{i=1}^{k}\alpha_{i}g_{i}(\vec{x})+\sum_{j=1}^{l}\beta_{j}h_{j}(\vec{x}) \\
& \text{Primal problem:} \quad \min_{\vec{x}}{\theta_{p}(\vec{x})} = \min_{\vec{x}}\max_{\vec{\alpha},\vec{\beta}}L(\vec{x},\vec{\alpha},\vec{\beta}) \\
& \text{Dual problem:} \quad \max_{\vec{\alpha},\vec{\beta}}{\theta_{d}(\vec{\alpha},\vec{\beta})} = \max_{\vec{\alpha},\vec{\beta}}\min_{\vec{x}}L(\vec{x},\vec{\alpha},\vec{\beta})
\end{aligned}
$$

\qquad When this pair of problems satisfies Slater's condition, so that strong duality holds, we obtain:

$$
p^{*} = {\theta_{p}(\vec{x}^{*})} = \min_{\vec{x}}{\theta_{p}(\vec{x})} = d^{*} = {\theta_{d}(\vec{\alpha}^{*},\vec{\beta}^{*})} = \max_{\vec{\alpha},\vec{\beta}}{\theta_{d}(\vec{\alpha},\vec{\beta})}
$$

\qquad where $\vec{x}^{*}$ and $\vec{\alpha}^{*}, \vec{\beta}^{*}$ are the solutions at which the primal and dual problems attain their optimal values.

\qquad So, under strong duality, the primal problem can be converted into the dual problem, which is easier to solve. Once the dual optimal value $d^{*}$ and the dual optimal solution $\vec{\alpha}^{*},\vec{\beta}^{*}$ have been found, the Karush-Kuhn-Tucker (KKT) conditions can be used to recover the primal optimal solution $\vec{x}^{*}$. The KKT conditions are necessary and sufficient for $\vec{x}^{*}$ and $\vec{\alpha}^{*},\vec{\beta}^{*}$ to be solutions of the primal and dual problems, respectively.
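\qquad The "solve the dual, then recover the primal" workflow can be sketched in a few lines. This is only an illustration under stated assumptions: the toy problem (minimize $x^{2}$ subject to $1 - x \leq 0$) is not from the article, the dual is maximized on a simple grid rather than with a real solver, and all function names are hypothetical.

```python
# Toy problem (an assumption for illustration, not from the article):
#   minimize x^2  subject to  1 - x <= 0
# Lagrangian: L(x, alpha) = x^2 + alpha * (1 - x), alpha >= 0.

def dual_function(alpha):
    """theta_d(alpha) = min over x of L(x, alpha).

    Setting dL/dx = 2x - alpha = 0 gives the inner minimizer x = alpha / 2.
    """
    x = alpha / 2.0
    return x * x + alpha * (1.0 - x)


def solve_dual(alpha_max=10.0, steps=10000):
    """Maximize theta_d over a grid of alpha values in [0, alpha_max]."""
    alphas = [alpha_max * i / steps for i in range(steps + 1)]
    return max(alphas, key=dual_function)


alpha_star = solve_dual()               # dual optimal solution
d_star = dual_function(alpha_star)      # dual optimal value
x_star = alpha_star / 2.0               # recovered from stationarity: 2x - alpha = 0
p_star = x_star ** 2                    # primal objective at the recovered point

print(alpha_star, d_star, x_star, p_star)
```

\qquad In this sketch the recovered $x^{*}$ is feasible and its objective value matches $d^{*}$, which is what strong duality predicts for this toy problem.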

\qquad The KKT conditions:

$$
\left\{
\begin{array}{lll}
\nabla_{\vec{x}}L(\vec{x}^{*},\vec{\alpha}^{*},\vec{\beta}^{*}) = 0 & & (1) \\
g_{i}(\vec{x}^{*}) \leq 0 , & i = 1,2,3, \cdots ,k & (2) \\
h_{j}(\vec{x}^{*}) = 0 , & j = 1,2,3, \cdots ,l & (3) \\
\alpha_{i}^{*} \geq 0 , & i = 1,2,3, \cdots ,k & (4) \\
\alpha_{i}^{*}g_{i}(\vec{x}^{*}) = 0 , & i = 1,2,3, \cdots ,k & (5)
\end{array}
\right.
$$

$$
p^{*} = \theta_{p}(\vec{x}^{*}) = d^{*} = \theta_{d}(\vec{\alpha}^{*},\vec{\beta}^{*}) = L(\vec{x}^{*},\vec{\alpha}^{*},\vec{\beta}^{*}) \iff \begin{cases} \nabla_{\vec{x}}L(\vec{x}^{*},\vec{\alpha}^{*},\vec{\beta}^{*}) = 0 & \\ g_{i}(\vec{x}^{*}) \leq 0 ,& i = 1,2,3, \cdots ,k \\ h_{j}(\vec{x}^{*}) = 0 ,& j = 1,2,3, \cdots ,l \\ \alpha_{i}^{*} \geq 0 ,& i = 1,2,3, \cdots ,k \\ \alpha_{i}^{*}g_{i}(\vec{x}^{*}) = 0 ,& i = 1,2,3, \cdots ,k \end{cases}
$$
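\qquad The conditions can be checked mechanically for a candidate pair. The sketch below assumes a toy problem that is not from the article (minimize $x^{2}$ subject to $g(x) = 1 - x \leq 0$, with no equality constraints, so condition (3) is vacuous); the function name and tolerance are illustrative choices.

```python
# Toy problem (an assumption for illustration, not from the article):
#   minimize x^2  subject to  g(x) = 1 - x <= 0
# Lagrangian: L(x, alpha) = x^2 + alpha * (1 - x).

def kkt_satisfied(x, alpha, tol=1e-9):
    """Check KKT conditions (1), (2), (4), (5) for the toy problem.

    Condition (3) is skipped because the toy problem has no equality constraints.
    """
    grad_L = 2.0 * x - alpha                   # derivative of L with respect to x
    g = 1.0 - x
    stationarity = abs(grad_L) <= tol          # (1)
    primal_feasible = g <= tol                 # (2)
    dual_feasible = alpha >= -tol              # (4)
    complementary = abs(alpha * g) <= tol      # (5)
    return stationarity and primal_feasible and dual_feasible and complementary


print(kkt_satisfied(1.0, 2.0))   # candidate x = 1, alpha = 2
print(kkt_satisfied(2.0, 4.0))   # stationary and feasible, but alpha * g = -4
print(kkt_satisfied(0.0, 0.0))   # stationary, but g(0) = 1 > 0 is infeasible
```

\qquad Only the first pair passes all checks; the other two each break one condition, which shows that the five conditions have to hold together.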

\qquad Proof of sufficiency:

\qquad Suppose $\vec{x}^{*}$ and $\vec{\alpha}^{*},\vec{\beta}^{*}$ satisfy the KKT conditions. Then:

$$
\begin{aligned}
d^{*} & = \max_{\vec{\alpha},\vec{\beta}}\theta_{d}(\vec{\alpha},\vec{\beta}) &(a) \\
& \geq \theta_{d}(\vec{\alpha}^{*},\vec{\beta}^{*}) &(b) \\
& = \min_{\vec{x}}L(\vec{x}, \vec{\alpha}^{*},\vec{\beta}^{*}) &(c) \\
& = L(\vec{x}^{*},\vec{\alpha}^{*},\vec{\beta}^{*}) &(d) \\
& = f(\vec{x}^{*}) + \sum_{i=1}^{k}\alpha_{i}^{*}g_{i}(\vec{x}^{*}) + \sum_{j=1}^{l}\beta_{j}^{*}h_{j}(\vec{x}^{*}) &(e) \\
& = f(\vec{x}^{*}) \\
& \geq p^{*}
\end{aligned}
$$

\qquad Step (b) uses condition (4), which makes $\vec{\alpha}^{*}$ admissible for the dual problem. Step (d) holds because $L(\vec{x},\vec{\alpha}^{*},\vec{\beta}^{*})$ is convex in $\vec{x}$ (each $\alpha_{i}^{*} \geq 0$, each $g_{i}$ is convex, and each $h_{j}$ is affine), so condition (1) makes $\vec{x}^{*}$ a minimizer. Conditions (3) and (5) make both sums in (e) vanish, and conditions (2) and (3) make $\vec{x}^{*}$ feasible, so $f(\vec{x}^{*}) \geq p^{*}$. Together with weak duality, $d^{*} \leq p^{*}$, every inequality above must hold with equality. Hence $\vec{x}^{*}$ and $\vec{\alpha}^{*},\vec{\beta}^{*}$ are solutions of the primal and dual problems with $d^{*} = p^{*}$, which proves sufficiency.

\qquad Proof of necessity:

$$
\begin{aligned}
d^{*} & = \max_{\vec{\alpha},\vec{\beta}}\theta_{d}(\vec{\alpha},\vec{\beta}) &(a) \\
& = \theta_{d}(\vec{\alpha}^{*},\vec{\beta}^{*}) &(b) \\
& = \min_{\vec{x}}L(\vec{x}, \vec{\alpha}^{*},\vec{\beta}^{*}) &(c) \\
& \leq L(\vec{x}^{*},\vec{\alpha}^{*},\vec{\beta}^{*}) &(d) \\
& = f(\vec{x}^{*}) + \sum_{i=1}^{k}\alpha_{i}^{*}g_{i}(\vec{x}^{*}) + \sum_{j=1}^{l}\beta_{j}^{*}h_{j}(\vec{x}^{*}) &(e) \\
& \leq f(\vec{x}^{*}) &(f) \\
& = p^{*}
\end{aligned}
$$

\qquad Since $\vec{x}^{*}$ and $\vec{\alpha}^{*}, \vec{\beta}^{*}$ are the solutions at which the primal and dual problems attain their optimal values, conditions (2), (3), and (4) of the KKT conditions hold automatically. For strong duality to hold, that is, $d^{*} = p^{*}$, the inequalities at $(d)$ and $(f)$ must hold with equality. Conditions (1) and (5) are derived below.

\qquad Equality between $(c)$ and $(d)$ means that $\vec{x}^{*}$ minimizes the differentiable function $L(\vec{x},\vec{\alpha}^{*},\vec{\beta}^{*})$ over $R^{n}$, so:

$$
\begin{aligned}
& \because \min_{\vec{x}}L(\vec{x},\vec{\alpha}^{*},\vec{\beta}^{*}) = L(\vec{x}^{*},\vec{\alpha}^{*},\vec{\beta}^{*}) \\
& \therefore \nabla_{\vec{x}}L(\vec{x}^{*},\vec{\alpha}^{*},\vec{\beta}^{*}) = 0 &(1)
\end{aligned}
$$

\qquad Equality between $(e)$ and $(f)$ gives the following, where the last step uses the fact that a sum of nonpositive terms can only be zero if every term is zero:

$$
\begin{aligned}
& \because h_{j}(\vec{x}^{*}) = 0 \\
& \therefore \sum_{j=1}^{l}\beta_{j}^{*}h_{j}(\vec{x}^{*}) = 0 \\
& \therefore \sum_{i=1}^{k}\alpha_{i}^{*}g_{i}(\vec{x}^{*}) = 0 \\
& \because \alpha_{i}^{*} \geq 0 , \quad g_{i}(\vec{x}^{*}) \leq 0 , \quad i = 1,2,3, \cdots ,k \\
& \therefore \alpha_{i}^{*}g_{i}(\vec{x}^{*}) = 0 , \quad i = 1,2,3, \cdots ,k &(5)
\end{aligned}
$$
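\qquad To make condition (5) concrete, consider a toy problem in which the constraint is inactive at the optimum. The problem is an assumption chosen for illustration and does not appear in the article: minimize $x^{2}$ subject to $g(x) = -1 - x \leq 0$.

$$
\begin{aligned}
& L(x,\alpha) = x^{2} + \alpha(-1 - x), \quad \alpha \geq 0 \\
& (1): \quad 2x^{*} - \alpha^{*} = 0 \\
& (5): \quad \alpha^{*}(-1 - x^{*}) = 0
\end{aligned}
$$

\qquad Condition (5) leaves two cases. If $x^{*} = -1$, then (1) gives $\alpha^{*} = -2$, which violates condition (4). So $\alpha^{*} = 0$, and (1) gives $x^{*} = 0$ with $g(x^{*}) = -1 < 0$: the constraint is not tight, and its multiplier is zero, which is the usual reading of condition (5).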


