Optimization Methods

Any matrix $A\in\mathbb{C}^{m\times n}$ admits a singular value decomposition
$$A=U\begin{bmatrix} \Sigma_r & 0\\ 0 & 0\end{bmatrix} V^T$$
where $U$ is an $m\times m$ orthogonal matrix, $V$ is an $n\times n$ orthogonal matrix, and $\Sigma_r$ is a square matrix whose diagonal holds the $r$ singular values arranged in decreasing order. Here $r$ is the rank of the matrix $A$.
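A minimal numerical sketch of this decomposition using NumPy (the test matrix below is an arbitrary example, not taken from the text):

```python
import numpy as np

# Arbitrary example matrix; its second row is twice the first, so the rank is 2.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])

# Full SVD: A = U @ Sigma @ Vt, singular values returned in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

r = int(np.sum(s > 1e-10))       # numerical rank = number of nonzero singular values
Sigma = np.zeros_like(A)
np.fill_diagonal(Sigma, s)

print("singular values:", s)
print("rank:", r)
print("reconstruction error:", np.linalg.norm(A - U @ Sigma @ Vt))
```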
3. The Linear Least Squares Problem
Consider the linear system $\mathbf{A}\mathbf{x}=\mathbf{b}$ and seek its least-squares solution.
If $\mathbf{A}$ has rank $n$, the least-squares solution is unique and equals $\mathbf{A}^{+}\mathbf{b}$; if the rank is less than $n$, there are infinitely many least-squares solutions, among which the minimum-norm solution is still $\mathbf{A}^{+}\mathbf{b}$. This minimum-norm solution is usually the one of interest.
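A small sketch, assuming NumPy, of computing the minimum-norm least-squares solution $\mathbf{A}^{+}\mathbf{b}$ via the pseudoinverse; the data is made up so that $\mathbf{A}$ is rank-deficient and the solution set is infinite:

```python
import numpy as np

# Overdetermined system with a rank-deficient A (rank 1 < n = 2),
# so there are infinitely many least-squares solutions.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
b = np.array([1.0, 2.0, 2.0])

x_pinv = np.linalg.pinv(A) @ b                    # minimum-norm solution A^+ b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # lstsq returns the same solution
print(x_pinv, x_lstsq)
```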

A well-posed problem is one that satisfies the following three requirements: (1) a solution exists (existence); (2) the solution is unique (uniqueness); (3) the solution depends continuously on the initial conditions (stability). If any one of these requirements fails, the problem is called ill-posed.
A convex optimization problem is an optimization problem whose objective function is convex and whose feasible set, defined by the constraints, is a convex set.
A convex function is a real-valued function $f$ defined on a convex subset $C$ (e.g., an interval) of some vector space such that for any two vectors $x_1, x_2\in C$, $f((x_1+x_2)/2)\le (f(x_1)+f(x_2))/2$.
Convex set: in Euclidean space, a convex set is a region such that for any two points in the region, the line segment connecting them also lies entirely within the region.

The spectral norm (2-norm) $\|A\|_2$ is the nonnegative square root of the largest eigenvalue of $A^HA$ (where $A^H$ denotes the conjugate transpose of $A$), i.e., $\|A\|_2=(\text{maximum eigenvalue of }A^HA)^{1/2}$.
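A quick numerical check of this definition (NumPy; the test matrix is arbitrary):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

# Square root of the largest eigenvalue of A^H A ...
eigvals = np.linalg.eigvalsh(A.conj().T @ A)   # A^H A is Hermitian
spec_from_eig = np.sqrt(eigvals.max())

# ... coincides with the matrix 2-norm (largest singular value).
spec_builtin = np.linalg.norm(A, 2)
print(spec_from_eig, spec_builtin)
```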

The proximal operator of the $L_1$-norm function:

Reference: The Proximal Operator of the L1 Norm Function
The optimization problem defined by the proximal operator is
$$\operatorname{Prox}_{\lambda\|\cdot\|_{1}}(x)=\arg \min _{u}\left\{\frac{1}{2}\|u-x\|^{2}+\lambda\|u\|_{1}\right\}$$
Because the objective is separable in the components of $u$ and $x$, the problem splits into the componentwise problems
$$\arg \min _{u_{i}}\left\{\frac{1}{2}\left(u_{i}-x_{i}\right)^{2}+\lambda\left|u_{i}\right|\right\}$$
One can proceed using the first-order optimality condition together with the subgradient of the $\mathrm{abs}(\cdot)$ function, or use the following simple argument.
Note that $u_i$ may be negative, zero, or positive. Define $F=\frac{1}{2}\left(u_{i}-x_{i}\right)^{2}+\lambda\left|u_{i}\right|$.
(1) If $u_i>0$, the partial derivative of $F$ with respect to $u_i$ is $\partial F/\partial u_i=u_i-x_i+\lambda$. Setting $\partial F/\partial u_i=0$ gives $u_i=x_i-\lambda$, which is consistent with $u_i>0$ only when $x_i>\lambda$.
(2) The same procedure for the case $u_i<0$ yields $u_i=x_i+\lambda$, valid for $x_i<-\lambda$.
(3) For values of $x_i$ in between, $u_i=0$: since the subgradient of $|u_i|$ at $u_i=0$ can be chosen freely in $[-1,1]$, the optimality condition holds with $u_i=0$.
Combining the three cases,
$$\operatorname{Prox}_{\lambda\|\cdot\|_{1}}(x)_{i}=\operatorname{sign}\left(x_{i}\right) \max \left(\left|x_{i}\right|-\lambda, 0\right)$$
This operation is known as the soft-thresholding function.
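A minimal NumPy implementation of the soft-thresholding operator derived above:

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of lam * ||.||_1, applied elementwise."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

x = np.array([-2.0, -0.3, 0.0, 0.5, 3.0])
print(soft_threshold(x, 1.0))   # components with |x_i| <= 1 are set to zero
```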

The Incremental Proximal Method

Incremental proximal methods are a class of algorithms for minimizing a sum of component functions that use only a single component $f_k$ at each iteration. Given the previous iterate $\theta_{k-1}$, the incremental proximal method computes the next iterate by solving the subproblem
$$\theta_{k}=\operatorname{prox}_{\lambda, f_{k}}\left(\theta_{k-1}\right)=\underset{\theta \in \mathbb{R}^{d}}{\operatorname{argmin}}\; f_{k}(\theta)+\lambda\left\|\theta-\theta_{k-1}\right\|_{2, V^{-1}}^{2}$$
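A rough sketch of one sweep of this scheme, assuming for simplicity that the weighting matrix $V$ is the identity (so the regularizer is a plain squared Euclidean distance) and that each component $f_k$ is a single-sample squared loss; all names and data here are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def incremental_prox_pass(theta0, data, lam=1.0):
    """One sweep of incremental proximal steps, one component f_k per sample.

    Assumes f_k(theta) = 0.5 * (a_k @ theta - b_k)^2 and V = I, so each step solves
    argmin_theta f_k(theta) + lam * ||theta - theta_prev||^2 numerically.
    """
    theta = theta0.copy()
    for a_k, b_k in data:
        obj = lambda th: 0.5 * (a_k @ th - b_k) ** 2 + lam * np.sum((th - theta) ** 2)
        theta = minimize(obj, theta).x
    return theta

# Toy data generated so that the target parameter is [1, -2].
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 2))
b = A @ np.array([1.0, -2.0])
print(incremental_prox_pass(np.zeros(2), list(zip(A, b)), lam=0.5))
```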

The Basis Pursuit Algorithm

Basis pursuit is an algorithm for solving equality-constrained problems that minimize the $L_1$ norm of the unknown parameters.
Basis pursuit is commonly used in signal processing as a means of sparsifying the coefficients of a known representation. Its basic idea is to replace the $L_0$ norm in the optimization problem with the $L_1$ norm and solve the resulting problem.
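Basis pursuit, $\min \|x\|_1$ subject to $Ax=b$, can be recast as a linear program by splitting $x=u-v$ with $u,v\ge 0$. A sketch using SciPy's `linprog` (the problem data is synthetic):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    """min ||x||_1  s.t.  A x = b, via the LP split x = u - v with u, v >= 0."""
    m, n = A.shape
    c = np.ones(2 * n)                       # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([A, -A])                # A u - A v = b
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * n))
    u, v = res.x[:n], res.x[n:]
    return u - v

# Synthetic test: a sparse x0 observed through a random wide matrix A.
rng = np.random.default_rng(1)
A = rng.normal(size=(10, 30))
x0 = np.zeros(30)
x0[[3, 17]] = [2.0, -1.5]
x_hat = basis_pursuit(A, A @ x0)
print(np.round(x_hat, 3))                    # typically recovers the two nonzero entries
```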

Blind Source Separation

Blind source separation (BSS), also called blind signal separation, is the process of separating the individual source signals from mixed (observed) signals when neither the theoretical signal model nor the source signals themselves are precisely known. Blind source separation and blind identification are the two main branches of blind signal processing: the goal of blind source separation is to obtain the best estimate of the source signals, while the goal of blind identification is to estimate the mixing matrix of the transmission channel.

Total Variation Minimization

Rudin et al. (Rudin 1990) observed that the total variation of an image contaminated by noise is markedly larger than that of the noise-free image. The total variation is defined as the integral of the gradient magnitude:
$$J_{TV}(u)=\int_{D_{u}}\left|\nabla u\right| dx\, dy=\int_{D_{u}} \sqrt{u_{x}^{2}+u_{y}^{2}}\, dx\, dy$$
where $u_{x}=\frac{\partial u}{\partial x}$, $u_{y}=\frac{\partial u}{\partial y}$, and $D_u$ is the support domain of the image. Constraining the total variation therefore suppresses the noise.
For a real-valued function $f$ defined on an interval $[a, b] \subset \mathbb{R}$, its total variation over that interval is a measure of the one-dimensional arc length of the curve with parametric equation $x \mapsto f(x)$ for $x\in [a, b]$.

For a 2D signal $y$, such as an image, Rudin et al. proposed the following total variation norm [1]:
$$V(y)=\sum_{i, j} \sqrt{\left|y_{i+1, j}-y_{i, j}\right|^{2}+\left|y_{i, j+1}-y_{i, j}\right|^{2}}$$
This total variation norm is isotropic but not differentiable. A variant, which is sometimes easier to minimize, is the anisotropic version:
$$V_{\mathrm{aniso}}(y)=\sum_{i, j} \sqrt{\left|y_{i+1, j}-y_{i, j}\right|^{2}}+\sqrt{\left|y_{i, j+1}-y_{i, j}\right|^{2}}=\sum_{i, j}\left|y_{i+1, j}-y_{i, j}\right|+\left|y_{i, j+1}-y_{i, j}\right|$$
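A short NumPy sketch computing both the isotropic and anisotropic discrete TV of an image array (boundary terms are handled by simply dropping the last row/column of differences):

```python
import numpy as np

def tv_isotropic(y):
    """Isotropic total variation V(y) over the interior pixel differences."""
    dx = y[1:, :-1] - y[:-1, :-1]   # y[i+1, j] - y[i, j]
    dy = y[:-1, 1:] - y[:-1, :-1]   # y[i, j+1] - y[i, j]
    return np.sum(np.sqrt(dx ** 2 + dy ** 2))

def tv_anisotropic(y):
    """Anisotropic variant: sum of absolute horizontal and vertical differences."""
    return np.sum(np.abs(np.diff(y, axis=0))) + np.sum(np.abs(np.diff(y, axis=1)))

img = np.random.default_rng(2).normal(size=(64, 64))
print(tv_isotropic(img), tv_anisotropic(img))
```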
The standard total variation denoising problem then takes the form
$$\min _{y}\left[\mathrm{E}(x, y)+\lambda V(y)\right]$$
where $\mathrm{E}(x,y)$ is the 2D $L_2$-norm data-fidelity term between the observed signal $x$ and the estimate $y$.
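One readily available solver for this problem is Chambolle's TV-denoising algorithm in scikit-image; a usage sketch (the noise level and regularization weight are arbitrary choices, not values from the text):

```python
import numpy as np
from skimage import data
from skimage.restoration import denoise_tv_chambolle

# Noisy test image: the built-in camera image plus Gaussian noise.
rng = np.random.default_rng(3)
clean = data.camera() / 255.0
noisy = clean + 0.1 * rng.normal(size=clean.shape)

# A larger weight (playing the role of lambda above) gives stronger smoothing.
denoised = denoise_tv_chambolle(noisy, weight=0.1)
print(noisy.std(), denoised.std())
```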

Augmented Lagrange Multiplier Method

The ALM method is also called the method of multipliers (MOM) or the primal-dual method. Consider first the Lagrangian functional for equality constraints $h(x)=0$ only:
$$L(x,\lambda)=f(x)+\lambda^Th(x)$$
Now, for a Lagrange multiplier vector $\lambda^*$, suppose that there is an optimum $x^*$ of the following unconstrained optimization problem:
$$\min_x L(x,\lambda^*)$$
If $x^*$ satisfies all the equality constraints $h(x^*)=0$ of the original design problem, then $x^*$ is an optimum of the original optimization problem and $\lambda^*$ is the corresponding optimal Lagrange multiplier. Consequently, the original optimization problem can be transformed into the following problem, which has the same optimum $x^*$ and $\lambda^*$:
$$\min_x L(x,\lambda)\quad \text{s.t.}\;\; h_i(x)=0,\;\; i=1,2,\dots,l$$
To avoid unboundedness of the Lagrangian, a penalty term is introduced; the result is called the augmented Lagrangian:
$$A(x,\lambda,r)=L(x,\lambda)+\frac{1}{2}\sum_{i=1}^{l}r_i\,h_i(x)^2$$
where $r_i$ is the penalty parameter for the $i$-th equality constraint. In the ALM method, an unconstrained optimization routine sequentially minimizes the augmented Lagrangian for the given values of $r_i$ and $\lambda_i$; these two sets of parameters are then updated so as to satisfy the optimality conditions.

The update rule for Lagrange multipliers can be determined from the following relation.
$$\lim_{x\rightarrow x^*}\nabla A(x,\lambda,r)=\nabla L(x^*,\lambda^*)$$
This implies
$$\lim_{x\rightarrow x^*} \left(\lambda_i+r_ih_i(x)\right)=\lambda_i^*,\;\; i=1,2,\dots,l$$
Hence, the update rule for Lagrange multipliers is
$$\lambda_i^{k+1}=\lambda_i^k+r_i^k h_i(x^k)$$
where the superscript $k$ denotes the iteration index of the ALM algorithm.
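A minimal sketch of this loop on a toy equality-constrained problem, with SciPy used for the inner unconstrained minimization (the problem, penalty value, and iteration count are all illustrative):

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem: min f(x) = x1^2 + 2*x2^2  s.t.  h(x) = x1 + x2 - 1 = 0.
f = lambda x: x[0] ** 2 + 2 * x[1] ** 2
h = lambda x: x[0] + x[1] - 1.0

lam, r = 0.0, 10.0
x = np.zeros(2)
for k in range(20):
    # Inner step: minimize the augmented Lagrangian A(x, lam, r) over x.
    aug = lambda x: f(x) + lam * h(x) + 0.5 * r * h(x) ** 2
    x = minimize(aug, x).x
    # Multiplier update: lam^{k+1} = lam^k + r * h(x^k).
    lam = lam + r * h(x)

print(x, lam)   # approaches x = [2/3, 1/3] with multiplier lam = -4/3
```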

Inequality constraints $g_j(x)\le 0$ are transformed into equality constraints by adding slack variables ($\theta_j\ge 0$). Thus, the augmented Lagrangian becomes
$$A(x,\theta,\mu,r)=f(x)+\sum_{j=1}^m\left[\mu_j\left(g_j(x)+\theta_j\right)+\frac{1}{2}r_j\left(g_j(x)+\theta_j\right)^2\right]$$
The new primal variables are then $\bar{x}=\{x,\theta\}$. The augmented Lagrangian must satisfy the optimality condition with respect to each slack variable $\theta_j$:
$$\nabla_{\theta_j}A(x,\theta,\mu,r)=\mu_j+r_j\left(g_j(x)+\theta_j\right)=0$$
Hence, the optimal value of the slack variable $\theta_j$ is
$$\theta_j^*=\max\left\{0,-\frac{\mu_j}{r_j}-g_j(x)\right\}$$
Substituting this optimal value back into the transformed constraint gives
$$g_j(x)+\theta_j^*=g_j(x)+\max\left\{0,-\frac{\mu_j}{r_j}-g_j(x)\right\}=\max\left\{g_j(x),-\frac{\mu_j}{r_j}\right\}$$
Hence, the augmented Lagrangian for inequality constraints reduces to the following simple functional:
$$A(x,\mu,r)=f(x)+\sum_{j=1}^m\left[\mu_j\Psi_j(x,\mu_j,r_j)+\frac{1}{2}r_j\Psi_j(x,\mu_j,r_j)^2 \right]$$
where $\Psi_j(x,\mu_j,r_j)=\max\left\{g_j(x),-\frac{\mu_j}{r_j}\right\}$. The Lagrange multiplier update rule is then
$$\mu_j^{k+1}=\mu_j^k+r_j^k\,\Psi_j(x^k,\mu_j^k,r_j^k),\;\; j=1,2,\dots,m$$
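A corresponding sketch for an inequality-constrained toy problem using the $\Psi_j$ form above (again purely illustrative, reusing SciPy for the inner minimization):

```python
import numpy as np
from scipy.optimize import minimize

# Toy problem: min f(x) = (x1 - 2)^2 + (x2 - 2)^2  s.t.  g(x) = x1 + x2 - 2 <= 0.
f = lambda x: (x[0] - 2) ** 2 + (x[1] - 2) ** 2
g = lambda x: x[0] + x[1] - 2.0

mu, r = 0.0, 10.0
x = np.zeros(2)
for k in range(20):
    psi = lambda x: max(g(x), -mu / r)
    aug = lambda x: f(x) + mu * psi(x) + 0.5 * r * psi(x) ** 2
    x = minimize(aug, x).x
    mu = mu + r * psi(x)

print(x, mu)   # approaches x = [1, 1] with multiplier mu = 2 (constraint active)
```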

Background on Related Functions

Consider a linear map between Hilbert spaces, $A:H_{1}\to H_{2}$. Glossing over the details, the adjoint operator is the linear operator $A^{*}:H_{2}\to H_{1}$ (in most cases uniquely defined) satisfying
$$\left\langle Ah_{1},h_{2}\right\rangle _{H_{2}}=\left\langle h_{1},A^{*}h_{2}\right\rangle _{H_{1}}$$
where $\langle \cdot ,\cdot \rangle _{H_{i}}$ denotes the inner product on the Hilbert space $H_{i}$.
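For finite-dimensional Hilbert spaces with the standard inner product, the adjoint of a matrix operator is its conjugate transpose. A quick numerical check of the defining identity on random complex data:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(3, 2)) + 1j * rng.normal(size=(3, 2))   # A : C^2 -> C^3
h1 = rng.normal(size=2) + 1j * rng.normal(size=2)
h2 = rng.normal(size=3) + 1j * rng.normal(size=3)

A_star = A.conj().T                       # adjoint = conjugate transpose

lhs = np.vdot(h2, A @ h1)                 # <A h1, h2>_{H2} (vdot conjugates its first argument)
rhs = np.vdot(A_star @ h2, h1)            # <h1, A* h2>_{H1}
print(np.allclose(lhs, rhs))              # True
```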

References:
[1] Rudin, Leonid I., Stanley Osher, and Emad Fatemi. “Nonlinear total variation based noise removal algorithms.” Physica D: nonlinear phenomena 60.1-4 (1992): 259-268.
