Convex optimization 4.1 --- Lagrange Dual Problem


1 Introduction

1.1 The principle of duality

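Duality is central to convex optimization. The primal problem may be hard to solve because its constraints are complicated, or because it is not convex at all; the Lagrange dual function, by contrast, is always concave. So what is the principle behind the dual construction? [1]
Consider a problem with a single inequality constraint:
$$\left\{ \begin{aligned} & \min \quad f_0(x), \quad x \in \mathbf{R}^n \\ & \text{s.t.} \quad f_1(x) \leq 0 \end{aligned} \right.$$
Describe it geometrically in the two-dimensional space of values $(f_1(x), f_0(x))$: let $u$ denote $f_1(x)$ and $t$ denote $f_0(x)$, and collect all attainable pairs in the set $G=\{ (u, t) \mid u=f_1(x),\ t=f_0(x),\ x \in \mathbf{R}^n \}$. The feasible points are those with $u \leq 0$.
The optimal value $p^*$ is then
$$p^*=\inf \{ t \mid (u,t)\in G,\ u \leq 0 \}$$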

[Figure: geometric picture of the set $G$ in the $(u, t)$ plane]

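The Lagrange dual works much like finding a supporting hyperplane of this set:
$$g(\lambda) = \inf_{(u,t)\in G}(\lambda u+t), \quad \lambda \geq 0$$
For a fixed $\lambda$, consider the lines $\lambda u+t=b$ that pass through some point $(u,t)\in G$. In the figure, three cases can occur.
[Figure: the three cases]
Among them, $b_{max}$ is the closest to $p^*$. Finding $b_{max}$ is, in principle, very similar to computing the conjugate function studied earlier.
When does $b_{max}$ equal $p^*$?
When the primal problem is convex and a constraint qualification such as Slater's condition (Section 2.4) holds, strong duality holds.
[Figure]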
With the geometric picture in mind, the extension to weak duality for a general problem is easy to follow. For the general optimization problem
$$\left\{ \begin{aligned} & \min \quad & f_0(x), \quad x \in \mathbf{R}^n \\ & \text{s.t.} \quad & f_i(x) \leq 0, \quad i=1,\dots,m \\ & \quad & h_i(x) =0, \quad i=1,\dots,p \end{aligned} \right.$$
define the dual function as follows. Let $t=f_0(x)$, $\mu = [f_1(x),\dots,f_m(x)]^T$, $\upsilon=[h_1(x),\dots,h_p(x)]^T$, and let $G$ be the set of all attainable triples $(\mu, \upsilon, t)$. Then
$$\begin{aligned} g(\lambda, \nu) &=\inf_{(\mu,\upsilon,t)\in G} \left( t+\lambda^T \mu+\nu^T \upsilon \right) \\ & = \inf_{(\mu,\upsilon,t)\in G} (\lambda, \nu, 1)^T(\mu, \upsilon, t) \end{aligned}$$

The vector $(\lambda, \nu, 1)$ is the normal of a hyperplane supporting the set $G$, so for every $(\mu, \upsilon, t) \in G$
$$(\lambda, \nu, 1)^T(\mu, \upsilon, t) \geq g(\lambda, \nu)$$
Take $\lambda \geq 0$ and consider a feasible point, for which $\mu \leq 0$ and $\upsilon=0$. Then $\lambda^T\mu \leq 0$ and $\nu^T\upsilon = 0$, so at an optimal point with $t = p^*$
$$p^*=t\geq(\lambda, \nu, 1)^T(\mu, \upsilon, t)\geq g(\lambda, \nu)$$
which is exactly weak duality.
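As a quick numerical illustration of weak duality, the sketch below uses a hypothetical one-dimensional problem that does not appear in the original text: minimize $x^2+1$ subject to $(x-2)(x-4)\leq 0$. The function names and the grid of multipliers are assumptions made for the example. It evaluates $g(\lambda)$ in closed form and checks that every value is a lower bound on $p^*$.

```python
def f0(x):
    """Objective of the hypothetical example: f0(x) = x^2 + 1."""
    return x * x + 1.0


def f1(x):
    """Constraint function: f1(x) = (x - 2)(x - 4) <= 0, i.e. 2 <= x <= 4."""
    return (x - 2.0) * (x - 4.0)


def dual_function(lam):
    """g(lam) = inf_x [f0(x) + lam * f1(x)] for lam >= 0.

    The Lagrangian (1 + lam) x^2 - 6 lam x + (1 + 8 lam) is a convex
    quadratic in x, so the infimum sits at its stationary point.
    """
    x_min = 3.0 * lam / (1.0 + lam)
    return f0(x_min) + lam * f1(x_min)


# f0 is increasing on the feasible interval [2, 4], so p* = f0(2).
p_star = f0(2.0)

# Evaluate g on a grid of multipliers lam in [0, 10].
lams = [0.01 * k for k in range(1001)]
g_values = [dual_function(lam) for lam in lams]
d_star = max(g_values)

print("p* =", p_star)
print("best lower bound on the grid =", d_star)
```

In this particular example the best lower bound on the grid coincides with $p^*$ up to rounding, which is consistent with the problem being convex and strictly feasible (for instance at $x=3$).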

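1.2 Properties of the dual function

The dual function has two important properties [2]:
1) $g(\lambda, \nu)$ is concave, whether or not the primal problem is convex;
2) $g(\lambda,\nu) \leq p^*$ holds for any $\lambda \geq 0$ and any $\nu$.
For property 1, $t+\lambda^T \mu+\nu^T \upsilon$ is an affine function of $(\lambda, \nu)$ for each fixed point of $G$, and $g(\lambda,\nu)$ is the pointwise infimum of these functions, as the two-dimensional figure below illustrates, so it is concave. Property 2 is weak duality.
For a complicated problem whose optimum is hard to find directly, one can instead solve
$$\left\{ \begin{aligned} & \max \quad & g(\lambda, \nu) \\ & \text{s.t.} \quad & \lambda \geq 0 \end{aligned} \right.$$
[Figure: $g$ as the pointwise infimum of affine functions]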
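Both properties can be checked numerically on a toy set. The sketch below is only an illustration under stated assumptions: the set `G` is a hypothetical random scatter of $(u, t)$ points (so it is not convex), and the midpoint test is a simple necessary check for concavity, not a proof.

```python
import random

random.seed(0)

# Hypothetical finite set G of (u, t) pairs; a random scatter like this is
# not convex, which is the point of the experiment.
G = [(random.uniform(-2.0, 2.0), random.uniform(0.0, 3.0)) for _ in range(200)]


def g(lam):
    """Pointwise infimum over G of the affine functions lam -> lam * u + t."""
    return min(lam * u + t for u, t in G)


def midpoint_concave(a, b, tol=1e-12):
    """Check g((a + b) / 2) >= (g(a) + g(b)) / 2, up to rounding."""
    return g(0.5 * (a + b)) >= 0.5 * (g(a) + g(b)) - tol


# Optimal value of the toy problem: smallest t among points with u <= 0.
p_star = min(t for u, t in G if u <= 0.0)

grid = [0.25 * k for k in range(41)]  # lam in [0, 10]
concave_ok = all(midpoint_concave(a, b) for a in grid for b in grid)
bound_ok = all(g(lam) <= p_star for lam in grid)

print("midpoint concavity holds on the grid:", concave_ok)
print("g(lam) <= p* on the grid:", bound_ok)
```

Because `g` is a minimum of finitely many affine functions of `lam`, it is piecewise linear and concave regardless of the shape of `G`.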

2 The dual problem

2.1 Converting to the dual problem

  • LS problem
    $$\left\{ \begin{aligned} & \min \quad & \|x\|_2^2 \\ & \text{s.t.} \quad & Ax=b \end{aligned} \right.$$
    Write the Lagrangian:
    $$\begin{aligned} L(x,\nu) &=\|x\|_2^2+\nu^T(Ax-b) \\ & = x^Tx+ \nu^T(Ax-b) \end{aligned}$$
    Minimize $L(x,\nu)$ over $x$ by setting the gradient to zero:
    $$\nabla_x L=2x+A^T\nu=0$$
    Substituting $x=-\frac{1}{2}A^T\nu$ determines $g(\nu)$:
    $$\begin{aligned} g(\nu) &=\frac{1}{4}\nu^TAA^T\nu-\frac{1}{2}\nu^TAA^T\nu-\nu^Tb \\ &=-\frac{1}{4}\nu^TAA^T\nu-\nu^Tb \end{aligned}$$
    What remains is to maximize $g(\nu)$.
  • LP (standard form)
    $$\left\{ \begin{aligned} & \min \quad & c^Tx \\ & \text{s.t.} \quad & Ax=b \\ & \quad & x\geq 0 \end{aligned} \right.$$
    Write the Lagrangian:
    $$\begin{aligned} L(x,\lambda,\nu) &=c^Tx+\lambda^T(-x)+\nu^T(Ax-b)\\ &=(c-\lambda+A^T\nu)^Tx-\nu^Tb \end{aligned}$$
    $L$ is affine in $x$, so its infimum over $x$ is $-\infty$ unless the coefficient of $x$ vanishes. This determines $g(\lambda, \nu)$:
    $$g(\lambda, \nu) = \left\{ \begin{aligned} & -\nu^T b, \quad & A^T\nu-\lambda+c=0 \\ & -\infty, \quad & \text{otherwise} \end{aligned} \right.$$
  • two-way partitioning
    Binary variables are common in computing and electrical systems.
    $$\left\{ \begin{aligned} & \min \quad & x^TWx \\ & \text{s.t.} \quad & x_i^2=1, \quad i=1,\dots,n \end{aligned} \right.$$
    The constraint $x_i^2=1$ is equivalent to $x_i\in \{-1,1 \}$.
    By definition, the Lagrangian is
    $$\begin{aligned} L(x,\nu) &=x^TWx+\sum_{i=1}^{n} \nu_i(x_i^2-1) \\ &=x^TWx+x^T \begin{bmatrix} \nu_1 & 0 & \cdots & 0 \\ 0 & \nu_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \nu_n \end{bmatrix} x-\mathbf{1}^T\nu \\ & = x^T(W+V)x-\mathbf{1}^T\nu \end{aligned}$$
    where $V=\mathrm{diag}(\nu)$ and $\mathbf{1}$ is the vector of ones.
    By the properties of quadratic forms, the infimum is finite only when $W+V$ is positive semidefinite:
    $$g(\nu)=\inf_{x}L(x,\nu)= \left\{ \begin{aligned} & -\mathbf{1}^T\nu, \quad & W+V \succeq 0 \\ & -\infty, \quad & \text{otherwise} \end{aligned} \right.$$
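The two-way partitioning bound can be tried on a small instance. The sketch below is illustrative only: the matrix `W` is hypothetical, and the choice of `nu` is an assumption of this example (a diagonally dominant shift that is dual feasible but generally not optimal), not a method taken from the text. It compares the dual value $-\mathbf{1}^T\nu$ with the brute-force primal optimum.

```python
from itertools import product

# Hypothetical symmetric weight matrix for n = 4 items.
W = [
    [0.0, 2.0, -1.0, 0.5],
    [2.0, 0.0, 1.5, -0.5],
    [-1.0, 1.5, 0.0, -1.0],
    [0.5, -0.5, -1.0, 0.0],
]
n = len(W)


def quad(x):
    """Evaluate x^T W x."""
    return sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


# Primal optimum by brute force over all 2^n sign vectors.
p_star = min(quad(x) for x in product((-1.0, 1.0), repeat=n))

# One dual-feasible nu (an assumption of this sketch, not the optimal one):
# nu_i = sum_{j != i} |W_ij| - W_ii makes W + diag(nu) diagonally dominant
# with a nonnegative diagonal, hence positive semidefinite.
nu = [sum(abs(W[i][j]) for j in range(n) if j != i) - W[i][i] for i in range(n)]

# On the domain of g, the dual function value is g(nu) = -1^T nu.
lower_bound = -sum(nu)

print("p* =", p_star)
print("dual lower bound =", lower_bound)
```

Brute force is only viable for tiny `n`; the appeal of the dual bound is that it stays cheap to evaluate when enumerating $2^n$ sign vectors is not.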

2.2 Relation between the dual and the conjugate function

2.2.1 Theory

For a problem of the form
$$\left\{ \begin{aligned} & \min \quad & f_0(x) \\ & \text{s.t.} \quad & x=0 \end{aligned} \right.$$
the Lagrangian and the dual function are
$$\begin{aligned} L(x,\nu) &=f_0(x)+\nu^T x \\ g(\nu) &=\inf_{x}\{ f_0(x)+ \nu^T x \} \\ &=-\sup_{x} \{ (-\nu)^T x-f_0(x) \} \\ &=-f_0^*(-\nu) \end{aligned}$$
Recall the definition of the conjugate function:
$$f^*(y)=\sup_{x \in \mathrm{dom}\, f}(y^Tx-f(x))$$

For the more general form
$$\left\{ \begin{aligned} & \min \quad & f_0(x) \\ & \text{s.t.} \quad & Ax\leq b \\ & \quad & Cx=d \end{aligned} \right.$$
the dual function is
$$\begin{aligned} g(\lambda, \nu) &=\inf_{x} \{f_0(x)+\lambda^T(Ax-b)+\nu^T(Cx-d) \}\\ &=\inf_{x} \{(A^T\lambda+C^T\nu)^Tx+f_0(x) \}-\lambda^Tb-\nu^Td \\ &=-\sup_{x} \{ (-A^T\lambda-C^T\nu)^Tx-f_0(x) \}-\lambda^Tb-\nu^Td \\ &=-f_0^*(-A^T\lambda-C^T\nu) -\lambda^Tb-\nu^Td \end{aligned}$$
The domain of $g$ follows from the domain of the conjugate: $\mathrm{dom}\, g=\{ (\lambda, \nu) \mid -A^T\lambda-C^T\nu \in \mathrm{dom}\, f_0^* \}$
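The identity $g(\lambda) = -f_0^*(-A^T\lambda) - \lambda^T b$ can be sanity-checked in one dimension. The sketch below assumes a hypothetical instance, $f_0(x)=\tfrac{1}{2}x^2$ (whose conjugate is $\tfrac{1}{2}y^2$) with the single constraint $ax \leq b$, $a=-1$, $b=-1$, and compares the closed form with a brute-force infimum of the Lagrangian over a grid of $x$ values.

```python
def f0(x):
    """Hypothetical objective f0(x) = x^2 / 2."""
    return 0.5 * x * x


def f0_conj(y):
    """Conjugate of f0: sup_x (y x - x^2 / 2) = y^2 / 2."""
    return 0.5 * y * y


# Single inequality constraint a * x <= b (here: x >= 1).
a, b = -1.0, -1.0


def g_closed_form(lam):
    """Dual function via the conjugate: g(lam) = -f0*(-a lam) - lam b."""
    return -f0_conj(-a * lam) - lam * b


# Grid of x values used to approximate the infimum of the Lagrangian.
xs = [-10.0 + 0.001 * k for k in range(20001)]


def g_numeric(lam):
    """Brute-force inf_x [f0(x) + lam (a x - b)] over the grid."""
    return min(f0(x) + lam * (a * x - b) for x in xs)


lams = [0.0, 0.5, 1.0, 2.0, 3.0]
max_gap = max(abs(g_closed_form(lam) - g_numeric(lam)) for lam in lams)
print("largest difference between the two evaluations:", max_gap)
```

The grid only approximates the infimum, so the two evaluations agree up to a small discretization error, not exactly.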

2.2.2 applications

  • Equality constrained norm minimization
    The link between the dual function and the conjugate function makes it quick to find the domain of the dual function.
    $$\left\{ \begin{aligned} & \min \quad & \|x\|\\ & \text{s.t.} \quad & Ax= b \end{aligned} \right.$$
    For $f_0(x)=\|x\|$, the conjugate is
    $$\begin{aligned} f_0^*(y) &=\sup_{x}\{ y^Tx-\|x\| \} \\ & = \left\{ \begin{aligned} & 0, \quad & \|y\|_*\leq 1 \\ & \infty, \quad & \text{otherwise} \end{aligned} \right. \end{aligned}$$
    where $\|\cdot\|_*$ denotes the dual norm.
    Using the relation between the dual and the conjugate,
    $$\begin{aligned} g(\nu) &=-f_0^*(-A^T\nu)-b^T\nu \\ &= \left\{ \begin{aligned} &-b^T\nu, \quad & \|A^T\nu\|_* \leq1 \\ &- \infty, \quad & \text{otherwise} \end{aligned} \right. \end{aligned}$$
    This gives the dual form.
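For a concrete feel of this dual, the sketch below works through a hypothetical instance with a single constraint $a^Tx=b$ and the Euclidean norm, which is its own dual norm. The data `a` and `b` are made up for illustration. In this special case the primal optimum is the distance from the origin to the hyperplane, and the dual is a linear function maximized over an interval.

```python
import math

# Hypothetical data: one equality constraint a^T x = b.
a = [3.0, 4.0]
b = 10.0

norm_a = math.sqrt(sum(ai * ai for ai in a))

# Primal: the minimum-norm point on the hyperplane is x = a * b / ||a||^2.
x_star = [ai * b / norm_a ** 2 for ai in a]
p_star = math.sqrt(sum(xi * xi for xi in x_star))


def g(nu):
    """Dual function: -b * nu when ||a * nu||_2 <= 1, otherwise -inf."""
    if abs(nu) * norm_a <= 1.0 + 1e-12:
        return -b * nu
    return -math.inf


# g is linear on its domain |nu| <= 1 / ||a||, so its maximum is
# attained at one of the two endpoints.
d_star = max(g(-1.0 / norm_a), g(1.0 / norm_a))

print("p* =", p_star)
print("d* =", d_star)
```

The two values agree here, as expected for a convex problem with only affine equality constraints that is feasible.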

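2.3 Dual form

2.3.1 Definition

The optimization problem is converted to its dual form as follows:
$$\left\{ \begin{aligned} & \max \quad & g(\lambda, \nu) \\ & \text{s.t.} \quad & \lambda \geq 0 \end{aligned} \right.$$
Now consider the domain and the dimension of the dual problem:
$$\begin{aligned} & \mathrm{dom}\, g=\{ (\lambda, \nu) \mid g(\lambda, \nu) > - \infty \} \\ & \dim (\mathrm{dom}\, g) \leq m+p \end{aligned}$$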

2.3.2 applications

  • standard LP
    $$(P) \quad \left\{ \begin{aligned} & \min \quad & c^Tx\\ & \text{s.t.} \quad & Ax= b \\ & \quad & x \geq 0 \end{aligned} \right.$$
    The Lagrangian is
    $$\begin{aligned} L(x,\lambda, \nu) &=c^Tx+\lambda^T(-x)+\nu^T(Ax-b) \\ & = (c-\lambda+A^T\nu)^Tx-\nu^Tb \end{aligned}$$
    so the dual function is
    $$g(\lambda, \nu) = \left\{ \begin{aligned} & -\nu^T b, \quad & A^T\nu-\lambda+c=0 \\ & -\infty, \quad & \text{otherwise} \end{aligned} \right.$$
    The dual problem is
    $$(D) \quad \left\{ \begin{aligned} & \max \quad &-\nu^Tb\\ & \text{s.t.} \quad & A^T\nu-\lambda+c=0 \\ & \quad & \lambda \geq0 \end{aligned} \right.$$
    Eliminating $\lambda$ simplifies it further to
    $$(D) \quad \left\{ \begin{aligned} & \max \quad &-\nu^Tb\\ & \text{s.t.} \quad & A^T\nu+c\geq 0 \end{aligned} \right.$$
  • inequality form of LP
    $$(P) \quad \left\{ \begin{aligned} & \min \quad & c^Tx\\ & \text{s.t.} \quad & Ax\leq b \end{aligned} \right.$$
    The Lagrangian is
    $$\begin{aligned} L(x,\lambda) &=c^Tx+\lambda^T(Ax-b) \\ & = (c+A^T\lambda)^Tx-\lambda^Tb \end{aligned}$$
    so the dual function is
    $$g(\lambda) = \left\{ \begin{aligned} & -\lambda^T b, \quad & A^T\lambda+c=0 \\ & -\infty, \quad & \text{otherwise} \end{aligned} \right.$$
    The dual problem is
    $$(D) \quad \left\{ \begin{aligned} & \max \quad &-\lambda^Tb\\ & \text{s.t.} \quad & A^T\lambda+c=0 \\ & \quad & \lambda \geq0 \end{aligned} \right.$$
    An interesting pattern shows up here: the dual of the standard-form LP is an LP in inequality form, while the dual of the inequality-form LP is an LP in standard form.
    [Figure]
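The inequality-form LP dual can be checked by hand on a tiny instance. The sketch below uses hypothetical data (the unit box with objective $-x_1-x_2$) and two hand-picked multiplier vectors; none of these values come from the original text. It verifies dual feasibility, $A^T\lambda + c = 0$ and $\lambda \geq 0$, and compares $-\lambda^Tb$ with the primal optimum.

```python
# Hypothetical inequality-form LP: minimize c^T x subject to A x <= b.
# The constraints describe the unit box 0 <= x1, x2 <= 1.
A = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
b = [1.0, 1.0, 0.0, 0.0]
c = [-1.0, -1.0]


def primal_value(x):
    """Objective c^T x."""
    return sum(ci * xi for ci, xi in zip(c, x))


def is_dual_feasible(lam, tol=1e-12):
    """Check lam >= 0 and A^T lam + c = 0."""
    if any(l < 0.0 for l in lam):
        return False
    residual = [sum(A[i][j] * lam[i] for i in range(len(A))) + c[j]
                for j in range(len(c))]
    return all(abs(r) <= tol for r in residual)


def dual_value(lam):
    """Dual objective -lam^T b."""
    return -sum(bi * li for bi, li in zip(b, lam))


# An LP over a bounded polytope attains its optimum at a vertex;
# here the vertices are the four corners of the box.
vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
p_star = min(primal_value(v) for v in vertices)

lam_loose = [2.0, 1.5, 1.0, 0.5]  # dual feasible, gives a loose bound
lam_tight = [1.0, 1.0, 0.0, 0.0]  # dual feasible, matches p*

for lam in (lam_loose, lam_tight):
    print(is_dual_feasible(lam), dual_value(lam), "<=", p_star)
```

Any dual-feasible `lam` yields a valid lower bound; only some of them close the gap.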

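2.4 Slater's condition

2.4.1 Definition

Slater's condition is easy to understand geometrically. For a convex problem it asks for a strictly feasible point, i.e. an $x$ with $f_i(x) < 0$ for $i=1,\dots,m$ and $h_i(x)=0$ for $i=1,\dots,p$; when it holds, $p^*=d^*$. According to [1], Slater's condition is only a sufficient condition.
[Figure: a case that satisfies Slater's condition]
A problem can violate Slater's condition and still have strong duality:
[Figure: a case that violates Slater's condition but has strong duality]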
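One small example of strong duality without Slater's condition, chosen here for illustration and not taken from the original figures, is: minimize $x$ subject to $x^2 \leq 0$. Only $x=0$ is feasible, so $p^*=0$ and no strictly feasible point exists. The dual function is $g(\lambda)=\inf_x (x+\lambda x^2) = -1/(4\lambda)$ for $\lambda>0$, whose supremum is $0$ but is never attained. The sketch below evaluates it for growing $\lambda$.

```python
import math


def dual_function(lam):
    """g(lam) = inf_x (x + lam * x^2) for the problem min x s.t. x^2 <= 0."""
    if lam <= 0.0:
        # At lam = 0 the Lagrangian is just x, which is unbounded below.
        return -math.inf
    # For lam > 0 the infimum is attained at x = -1 / (2 lam).
    return -1.0 / (4.0 * lam)


p_star = 0.0  # only x = 0 is feasible

# Multipliers spanning many orders of magnitude.
lams = [10.0 ** k for k in range(-2, 9)]
g_values = [dual_function(lam) for lam in lams]
gap = p_star - g_values[-1]

for lam, val in zip(lams, g_values):
    print(lam, val)
```

Every printed value is strictly below $p^*$, yet the gap shrinks toward zero as $\lambda$ grows, so $d^*=p^*$ even though no finite multiplier achieves it.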

2.4.2 applications

  • QCQP
    $$(P) \quad \left\{ \begin{aligned} & \min \quad & \frac{1}{2}x^TP_0x+q_0^Tx+r_0\\ & \text{s.t.} \quad & \frac{1}{2}x^TP_ix+q_i^Tx+r_i \leq 0, \quad i=1,\dots,m \end{aligned} \right.$$
    To simplify the computation, define
    $$\begin{aligned} P(\lambda) &=P_0+\sum_{i=1}^{m} \lambda_iP_i \\ q(\lambda) &=q_0+\sum_{i=1}^{m} \lambda_iq_i \\ r(\lambda) & = r_0+\sum_{i=1}^{m} \lambda_i r_i \end{aligned}$$
    The Lagrangian is then
    $$\begin{aligned} L(x,\lambda)&=\frac{1}{2}x^TP_0x+q_0^Tx+r_0 + \sum_{i=1}^{m} \lambda_i\left( \frac{1}{2}x^TP_ix+q_i^Tx+r_i \right) \\ &=\frac{1}{2}x^TP(\lambda)x+q(\lambda)^Tx+r(\lambda) \end{aligned}$$
    Minimize $L(x,\lambda)$ over $x$ by setting the gradient to zero: $P(\lambda)x+q(\lambda)=0$, which gives $x_0=-P(\lambda)^{-1}q(\lambda)$, assuming $P(\lambda) \succ 0$.
    Substituting,
    $$g(\lambda) =-\frac{1}{2}q(\lambda)^TP(\lambda)^{-1}q(\lambda)+r(\lambda)$$
    The dual problem is
    $$(D) \quad \left\{ \begin{aligned} & \max \quad & -\frac{1}{2}q(\lambda)^TP(\lambda)^{-1}q(\lambda)+r(\lambda)\\ & \text{s.t.} \quad & \lambda \geq0 \end{aligned} \right.$$
  • entropy maximization
    $$(P) \quad \left\{ \begin{aligned} & \min \quad & \sum_i x_i\log x_i\\ & \text{s.t.} \quad & Ax\leq b \\ & \quad & \mathbf{1}^Tx=1 \end{aligned} \right.$$
    With $f_0(x)=\sum_i x_i\log x_i$, the conjugate function is
    $$\begin{aligned} f_0^*(y) &=\sup_{x\in \mathrm{dom}\, f_0}\left(y^Tx-\sum_i x_i\log x_i\right) \\ & = \sup_{x\in \mathrm{dom}\, f_0}\sum_i(y_ix_i-x_i\log x_i) \end{aligned}$$
    Differentiating $h_i(x_i)=y_ix_i-x_i\log x_i$ with respect to $x_i$ shows that the maximum is attained at $x_i=\exp(y_i-1)$, so the conjugate simplifies to
    $$\begin{aligned} f_0^*(y) &=\sum_i \left(y_i\exp(y_i-1)-(y_i-1)\exp(y_i-1)\right) \\ & = \sum_i \exp(y_i-1) \end{aligned}$$
    Using the relation between the dual function and the conjugate, with $a_i$ the $i$-th column of $A$ and $\nu$ a scalar,
    $$\begin{aligned} g(\lambda, \nu) & = -f_0^*(-A^T\lambda -\nu\mathbf{1})-\lambda^Tb-\nu\\ & = -\lambda^Tb-\nu-\exp (-\nu-1) \sum_i\exp(-a_i^T\lambda) \end{aligned}$$
    This is a function of two variables and is not easy to maximize directly, so it is simplified further.
    Differentiating with respect to $\nu$,
    $$\frac{\partial g}{\partial\nu}=-1+\exp (-\nu-1) \sum_i\exp(-a_i^T\lambda)=0$$
    gives
    $$\nu_0 =\log\sum_i\exp(-a_i^T\lambda)-1$$
    Substituting $\nu_0$ back into $g(\lambda, \nu)$,
    $$\begin{aligned} g(\lambda, \nu_0)&= -\lambda^Tb-\log\sum_i\exp(-a_i^T\lambda) \\ &=-\log \frac{\sum_i\exp(-a_i^T\lambda)}{\exp(-b^T\lambda)} \\ &=-\log \sum_i\exp(-a_i^T\lambda+b^T\lambda) \end{aligned}$$

    The dual problem is
    $$(D) \quad \left\{ \begin{aligned} & \min \quad & \log \sum_i\exp(-a_i^T\lambda+b^T\lambda)\\ & \text{s.t.} \quad & \lambda \geq0 \end{aligned} \right.$$

  • trust region problem
    Some problems are not convex yet still satisfy strong duality.
    $$(P) \quad \left\{ \begin{aligned} & \min \quad & x^TAx+2b^Tx\\ & \text{s.t.} \quad & x^Tx \leq 1 \end{aligned} \right.$$
    Here $A$ is symmetric but not necessarily positive semidefinite. The Lagrangian is
    $$\begin{aligned} L(x,\lambda) & = x^TAx+2b^Tx+\lambda (x^Tx-1) \\ & = x^T(A+\lambda I)x+2b^Tx-\lambda \end{aligned}$$
    Setting the gradient to zero,
    $$\nabla_x L=2(A+\lambda I)x+2b=0$$
    gives $x_0=-(A+\lambda I)^{-1}b$ when $A+\lambda I$ is positive definite.
    Let $A=Q \Lambda Q^T$, where the columns $q_i$ of $Q$ are the eigenvectors of $A$ and $\Lambda$ is the diagonal matrix of the eigenvalues $\lambda_i$. Then
    $$\begin{aligned} g(\lambda) &= -b^T(A+\lambda I)^{-1}b-\lambda \\ & =-b^TQ(\Lambda +\lambda I)^{-1}Q^Tb -\lambda \\ &=-\sum_{i=1}^{n}\frac{(q_i^Tb)^2}{\lambda_i+\lambda}-\lambda \end{aligned}$$
    The dual problem becomes
    $$(D) \quad \left\{ \begin{aligned} & \min \quad & \sum_{i=1}^{n}\frac{(q_i^Tb)^2}{\lambda_i+\lambda}+\lambda \\ & \text{s.t.} \quad & \lambda\geq -\lambda_{min}(A) \\ & \quad & \lambda \geq 0 \end{aligned} \right.$$
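The trust region claim can be probed numerically. The sketch below assumes a hypothetical two-dimensional instance with a diagonal, indefinite $A$, so the eigenvalues are the diagonal entries and $q_i^Tb$ is simply $b_i$. Both the primal and the dual are solved by plain grid search, which is an illustration device of this example and not an efficient method.

```python
import math

# Hypothetical data: A = diag(-1, 2) is indefinite, so the problem is not convex.
# For a diagonal A the eigenvectors are the coordinate axes, so q_i^T b = b[i].
eigvals = [-1.0, 2.0]
b = [0.5, 1.0]


def objective(x):
    """x^T A x + 2 b^T x for the diagonal matrix A."""
    return sum(l * xi * xi + 2.0 * bi * xi for l, bi, xi in zip(eigvals, b, x))


def dual_function(lam):
    """g(lam) = -sum_i b_i^2 / (lambda_i + lam) - lam, for lam > -lambda_min(A)."""
    return -sum(bi * bi / (l + lam) for l, bi in zip(eigvals, b)) - lam


# Primal: brute-force search over a polar grid covering the unit disk.
p_grid = min(
    objective((r / 50.0 * math.cos(2.0 * math.pi * j / 2000.0),
               r / 50.0 * math.sin(2.0 * math.pi * j / 2000.0)))
    for r in range(51) for j in range(2000)
)

# Dual: grid search over lam just above max(0, -lambda_min(A)).
lam_low = max(0.0, -min(eigvals))
d_grid = max(dual_function(lam_low + 0.001 * k) for k in range(1, 5001))

print("primal value on the grid:", p_grid)
print("dual value on the grid:  ", d_grid)
```

The grid values bracket the true optimum from both sides (the primal grid from above, the dual grid from below), and the two agree to within the grid resolution even though the primal problem is not convex.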

References

[1] https://www.zhihu.com/question/58584814/answer/1119054535
[2] https://zhuanlan.zhihu.com/p/133457394
[3] https://www.youtube.com/watch?v=Qneah_lyQ0o&list=PL-DDW8QIRjNOVxrU2efygBw0xADVOgpmw&index=15
[4] https://www.youtube.com/watch?v=0WpYucMfaHM&list=PL-DDW8QIRjNOVxrU2efygBw0xADVOgpmw&index=16
