Understanding the ADMM algorithm

Copyright notice: this is the blogger's original article; reproduction without the blogger's permission is prohibited. https://blog.csdn.net/weixin_42444045/article/details/83512560

ADMM

This post explains the procedure of the Alternating Direction Method of Multipliers (ADMM), starting from fundamental knowledge about optimization problems.

1. What is an optimization problem?

The following represents the most common and simplest form of an optimization problem:
$$\min_x f(x) \qquad (1)$$

Here $x$ is the variable to be optimized: by changing the value of $x$, we make the objective function $f(x)$ achieve its minimum value.

If we have only this function, without any constraints on the variable $x$, the problem above is the simplest kind of optimization problem. In many circumstances, however, there are constraints on $x$: it may need to satisfy equality or inequality constraints, or lie in a given set, e.g. $Ax = b$ or $Ax > c$. An optimization problem with an equality constraint therefore looks like this:
$$\min_x f(x) \qquad (2)$$
$$\text{s.t. } Ax = b$$
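To make (2) concrete, here is a minimal NumPy sketch (the random data and the choice $f(x) = \frac{1}{2}\|x - x_0\|^2$ are illustrative assumptions, not from the original problem) that solves a small equality-constrained quadratic directly via its KKT system:

```python
import numpy as np

# Illustrative instance of problem (2): min_x (1/2)||x - x0||^2  s.t.  Ax = b
rng = np.random.default_rng(0)
m, n = 3, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x0 = rng.standard_normal(n)

# For this quadratic f, the KKT conditions x - x0 + A^T y = 0 and Ax = b
# form one linear system in (x, y):
#   [ I   A^T ] [x]   [x0]
#   [ A   0   ] [y] = [b ]
K = np.block([[np.eye(n), A.T], [A, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([x0, b]))
x_star, y_star = sol[:n], sol[n:]

print("constraint residual ||Ax*-b|| =", np.linalg.norm(A @ x_star - b))
```

The dual variable $y^*$ that falls out of this solve is exactly the Lagrange multiplier that the methods below iterate toward.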

2. Dual Gradient Ascent

The alternating direction method of multipliers (ADMM) is an algorithm that solves convex optimization problems by breaking them into smaller pieces, each of which is then easier to handle.

Consider problem (2), with $x \in \mathbb{R}^n$, $A \in \mathbb{R}^{m \times n}$, and $f: \mathbb{R}^n \to \mathbb{R}$ a convex function. The Lagrangian is:
$$L(x,y) = f(x) + y^T(Ax - b) \qquad (3)$$
The corresponding dual function is:
$$g(y) = \inf_x L(x,y) = -f^*(-A^T y) - b^T y \qquad (4)$$
where $y$ is the Lagrange multiplier (the dual variable) and $f^*$ is the conjugate function of $f$.
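For completeness, (4) follows from (3) by the definition of the conjugate, $f^*(v) = \sup_x \left( v^T x - f(x) \right)$:

$$g(y) = \inf_x \left\{ f(x) + y^T(Ax - b) \right\} = -\sup_x \left\{ (-A^T y)^T x - f(x) \right\} - b^T y = -f^*(-A^T y) - b^T y$$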

Assume the problem satisfies strong duality, so the optimal value of the original problem equals that of the dual problem. If $x^*$ is the optimal solution of the original problem and $y^*$ that of the dual problem, then:
$$x^* = \arg\min_x L(x, y^*) \qquad (5)$$
We solve the dual problem with gradient ascent; the dual gradient ascent update is:
$$x^{k+1} = \arg\min_x L(x, y^k)$$
$$y^{k+1} = y^k + a_k (A x^{k+1} - b) \qquad (6)$$
where $a_k > 0$ is the gradient ascent step size.
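A minimal sketch of update (6), again assuming the illustrative quadratic $f(x) = \frac{1}{2}\|x - x_0\|^2$ from above so that the $x$-minimization has a closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x0 = rng.standard_normal(n)

y = np.zeros(m)
# constant step size a_k = 1/||A||^2, safe for this quadratic dual
alpha = 1.0 / np.linalg.norm(A, 2) ** 2
for k in range(500):
    # x-update: argmin_x L(x, y) = x0 - A^T y for this choice of f
    x = x0 - A.T @ y
    # dual gradient ascent step on y
    y = y + alpha * (A @ x - b)

print("||Ax - b|| =", np.linalg.norm(A @ x - b))
```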

Now assume the objective function is separable:
$$f(x) = \sum_{i=1}^N f_i(x_i) \qquad (7)$$
where $x = (x_1, x_2, \dots, x_N)$ and $x_i \in \mathbb{R}^{n_i}$ are subvectors of $x$. Splitting the matrix $A$ conformably,
$$A = [A_1, A_2, \dots, A_N] \qquad (8)$$
then,
$$Ax = \sum_{i=1}^{N} A_i x_i \qquad (9)$$
The Lagrangian then decomposes as:
$$L(x,y) = \sum_{i=1}^N L_i(x_i, y) = \sum_{i=1}^N \left( f_i(x_i) + y^T A_i x_i - (1/N)\, y^T b \right) \qquad (10)$$
The dual gradient ascent update becomes:
$$x_i^{k+1} = \arg\min_{x_i} L_i(x_i, y^k)$$
$$y^{k+1} = y^k + a_k (A x^{k+1} - b) \qquad (11)$$
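The payoff of separability is that the $x_i$-updates in (11) are independent of one another and can run in parallel. A minimal sketch, assuming scalar blocks and the illustrative choice $f_i(x_i) = \frac{1}{2}(x_i - c_i)^2$ (neither assumption is from the original text):

```python
import numpy as np

rng = np.random.default_rng(0)
m, N = 3, 5
A = rng.standard_normal((m, N))   # column i plays the role of block A_i
b = rng.standard_normal(m)
c = rng.standard_normal(N)

y = np.zeros(m)
alpha = 1.0 / np.linalg.norm(A, 2) ** 2   # safe step size for this quadratic
for k in range(500):
    # x_i-update: argmin_{x_i} (1/2)(x_i - c_i)^2 + y^T A_i x_i = c_i - A_i^T y;
    # this loop is embarrassingly parallel across the blocks
    x = np.array([c[i] - A[:, i] @ y for i in range(N)])
    # gathered dual update, as in (11)
    y = y + alpha * (A @ x - b)

print("||Ax - b|| =", np.linalg.norm(A @ x - b))
```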

3. Augmented Lagrangian Method

To increase the robustness of the dual ascent method and relax its strict convexity requirements, we introduce the augmented Lagrangian:
$$L_\rho(x,y) = f(x) + y^T(Ax - b) + (\rho/2)\|Ax - b\|_2^2 \qquad (12)$$
where $\rho > 0$ is the penalty parameter. The augmented Lagrangian method in effect adds a penalty term, reconstructing the original problem as:
$$\min_x f(x) + (\rho/2)\|Ax - b\|_2^2 \qquad (13)$$
$$\text{s.t. } Ax = b$$
The dual ascent update is:
$$x^{k+1} = \arg\min_x L_\rho(x, y^k)$$
$$y^{k+1} = y^k + \rho (A x^{k+1} - b) \qquad (14)$$
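A minimal sketch of iteration (14) for the same illustrative quadratic $f(x) = \frac{1}{2}\|x - x_0\|^2$ as before. Note that the $x$-update now requires solving a system coupled through $\rho A^T A$, which is exactly what breaks separability, as discussed in the next section:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x0 = rng.standard_normal(n)

rho = 1.0
y = np.zeros(m)
for k in range(100):
    # x-update: minimizing L_rho(x, y) for this f solves the linear system
    # (I + rho A^T A) x = x0 - A^T y + rho A^T b
    x = np.linalg.solve(np.eye(n) + rho * A.T @ A,
                        x0 - A.T @ y + rho * A.T @ b)
    # multiplier update, with the penalty parameter rho as the step size
    y = y + rho * (A @ x - b)

print("||Ax - b|| =", np.linalg.norm(A @ x - b))
```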

4. ADMM

The method in section 3 updates with the augmented Lagrangian, and the step size is now $\rho$, which replaces $a_k$; this is called the method of multipliers. It converges under much weaker conditions than ordinary dual ascent, but the added quadratic penalty term destroys the separability of the original problem. To address this, the alternating direction method of multipliers (ADMM) is introduced.

For the optimization problem:
$$\min_{x,z} f(x) + g(z) \qquad (15)$$
$$\text{s.t. } Ax + Bz = c$$
with $x \in \mathbb{R}^n$, $z \in \mathbb{R}^m$, $A \in \mathbb{R}^{p \times n}$, $B \in \mathbb{R}^{p \times m}$, $c \in \mathbb{R}^p$, the augmented Lagrangian is:
$$L_\rho(x,z,y) = f(x) + g(z) + y^T(Ax + Bz - c) + (\rho/2)\|Ax + Bz - c\|_2^2 \qquad (16)$$
The ADMM variable updates are:
$$x^{k+1} = \arg\min_x L_\rho(x, z^k, y^k)$$
$$z^{k+1} = \arg\min_z L_\rho(x^{k+1}, z, y^k)$$
$$y^{k+1} = y^k + \rho (A x^{k+1} + B z^{k+1} - c)$$
With $\rho > 0$, this is the basic iteration structure of ADMM.
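As a worked instance of these updates (a sketch, not from the original post), take the lasso: $f(x) = \frac{1}{2}\|Dx - d\|^2$, $g(z) = \lambda\|z\|_1$, with $A = I$, $B = -I$, $c = 0$ in (15). The $x$-update is then a ridge-like linear solve and the $z$-update is elementwise soft-thresholding:

```python
import numpy as np

def soft_threshold(v, kappa):
    """Elementwise soft-thresholding: the proximal operator of kappa*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

# Lasso via ADMM: min (1/2)||Dx - d||^2 + lam*||z||_1  s.t.  x - z = 0.
# All data below is synthetic and purely illustrative.
rng = np.random.default_rng(0)
p, n = 20, 10
D = rng.standard_normal((p, n))
d = rng.standard_normal(p)
lam, rho = 0.5, 1.0

x = np.zeros(n); z = np.zeros(n); y = np.zeros(n)
DtD_rhoI = D.T @ D + rho * np.eye(n)   # formed once, reused every iteration
Dtd = D.T @ d
for k in range(200):
    # x-update: minimize L_rho over x (a ridge-regression-like solve)
    x = np.linalg.solve(DtD_rhoI, Dtd - y + rho * z)
    # z-update: minimize L_rho over z (soft-thresholding)
    z = soft_threshold(x + y / rho, lam / rho)
    # dual update with step size rho
    y = y + rho * (x - z)

print("||x - z|| =", np.linalg.norm(x - z))
```

The alternation is the whole point: $x$ and $z$ are updated one after the other rather than jointly, so each subproblem stays simple even though the penalty term couples them.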
