[NOTE in progress] Distributed Optimization and Statistical Learning via ADMM - Boyd

Reading notes on the paper "Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers" by Boyd, Parikh, Chu, Peleato, and Eckstein.

Introduction

  • ADMM: developed in the mid-1970s with roots in the 1950s. It has been shown to be closely related to many other methods, such as Douglas-Rachford splitting, Spingarn's method of partial inverses, proximal methods, etc.
  • Why ADMM today: with the arrival of the big-data era and the need for large-scale machine learning algorithms, ADMM has proved to be well suited to solving large-scale optimization problems in a distributed fashion.
  • What big data brings us: with big data, simple methods can turn out to be very effective at solving complex problems.
  • ADMM can be seen as a blend of dual decomposition and the augmented Lagrangian method (method of multipliers). The latter is more robust and converges under weaker assumptions, but it does not decompose directly the way dual decomposition does. (The generic iterations are sketched after this list.)
  • ADMM can split the problem across examples or across features. [To be explored in later chapters]
  • Note that even when used serially, ADMM is comparable to other methods and often converges to modest accuracy within a few tens of iterations.
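
For reference, the generic problem form and ADMM iterations from the paper (a transcription sketch of the standard unscaled form) are:

```latex
\begin{align*}
&\text{minimize } f(x) + g(z) \quad \text{subject to } Ax + Bz = c, \\
x^{k+1} &= \operatorname*{argmin}_x \; L_\rho(x, z^k, y^k), \\
z^{k+1} &= \operatorname*{argmin}_z \; L_\rho(x^{k+1}, z, y^k), \\
y^{k+1} &= y^k + \rho \,(A x^{k+1} + B z^{k+1} - c), \\
\text{with } L_\rho(x, z, y) &= f(x) + g(z) + y^T (Ax + Bz - c) + \tfrac{\rho}{2}\, \| Ax + Bz - c \|_2^2 .
\end{align*}
```

To make the updates concrete, here is a minimal serial numpy sketch of ADMM applied to the lasso (the function name `lasso_admm` and all defaults are my own choices for illustration, not from the paper):

```python
import numpy as np

def lasso_admm(A, b, lam, rho=1.0, n_iter=50):
    """ADMM sketch for the lasso: minimize 0.5*||Ax - b||^2 + lam*||z||_1  s.t.  x - z = 0."""
    n = A.shape[1]
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)   # u is the scaled dual variable y/rho
    # Factor (A^T A + rho I) once; it is reused in every x-update.
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))
    Atb = A.T @ b
    for _ in range(n_iter):
        # x-update: solve (A^T A + rho I) x = A^T b + rho (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
        # z-update: elementwise soft thresholding with threshold lam/rho
        v = x + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
        # scaled dual update
        u = u + x - z
    return z
```

Even this plain serial loop typically reaches modest accuracy in a few tens of iterations, which the paper argues is usually sufficient for statistical learning tasks.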

Precursors

  • What is the conjugate function exactly? (A definition is sketched after this list.)
  • Dual ascent and dual subgradient methods: if the step size is chosen appropriately and some other assumptions hold, they converge.
  • Why the augmented Lagrangian (method of multipliers; updates sketched after this list):
    • More robust, fewer assumptions (no need for strict convexity or finiteness of f): in practice the convergence assumptions of dual ascent may fail. For example, with an affine objective (minimize x subject to x = 10, say), the Lagrangian is unbounded below in x for all but one value of y, so the x-update of dual ascent fails.
    • For equality constraints, the augmented version converges faster; this can be understood from the penalty method's point of view.
  • Dual decomposition: relax the coupling constraints so that the problem decomposes; this naturally lends itself to parallel computation.
  • The ρ in the augmented Lagrangian is actually used as the step size of the dual update; with this choice, each iterate (x^{k+1}, y^{k+1}) is dual feasible.
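
For the conjugate-function question above: for f : R^n → R ∪ {+∞}, the convex conjugate is

```latex
f^*(y) = \sup_{x} \left( y^T x - f(x) \right),
```

and the dual function of minimize f(x) subject to Ax = b can be written in terms of it as g(y) = −f*(−Aᵀy) − bᵀy. The dual ascent / dual subgradient iteration is

```latex
\begin{align*}
x^{k+1} &= \operatorname*{argmin}_x \; \big( f(x) + (y^k)^T (A x - b) \big), \\
y^{k+1} &= y^k + \alpha^k \, (A x^{k+1} - b),
\end{align*}
```

where α^k is the step size and A x^{k+1} − b is a gradient (or, if f is not differentiable, a subgradient) of the dual function at y^k.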
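For the augmented-Lagrangian bullets and the note on ρ: the method of multipliers replaces the Lagrangian with

```latex
L_\rho(x, y) = f(x) + y^T (Ax - b) + \tfrac{\rho}{2}\, \| Ax - b \|_2^2 ,
```

and iterates

```latex
\begin{align*}
x^{k+1} &= \operatorname*{argmin}_x \; L_\rho(x, y^k), \\
y^{k+1} &= y^k + \rho \, (A x^{k+1} - b),
\end{align*}
```

so ρ doubles as the dual step size, which is what makes each (x^{k+1}, y^{k+1}) dual feasible. The price is that the quadratic penalty couples the components of x even when f is separable, which is exactly why the plain method of multipliers does not decompose. Dual decomposition, in contrast, drops the quadratic term: with f(x) = Σ_i f_i(x_i) and A partitioned as [A_1 ... A_N], the x-minimization splits into independent subproblems x_i^{k+1} = argmin_{x_i} ( f_i(x_i) + (y^k)^T A_i x_i ) that can be solved in parallel, followed by a gathering step that updates y.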
