Optimization Week 12: Proximal gradient method and Newton method

1 Constrained descent

1.1 Projected gradient descent (PGD)

Problem

$$\begin{aligned} \min_x \quad & f(x)\\ \text{s.t.} \quad & x\in C \end{aligned}$$

Algorithm

Take a gradient step, then project back onto the feasible set $C$:
$$x_{t+1}=P_C(x_t-\eta \nabla f(x_t))$$

Convergence

  • Smooth: $O(1/t)$
  • Smooth and strongly convex: $O\!\left(\left(1-\frac{m}{M}\right)^t\right)$
  • Step size $\eta=\frac{1}{M}$
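
A minimal NumPy sketch of the PGD update, assuming for illustration that $C$ is the Euclidean ball of radius $R$ (so the projection has a simple closed form); `grad_f`, `R`, and the step size `eta` are placeholders, not part of the notes.

```python
import numpy as np

def project_ball(x, R):
    """Euclidean projection onto C = {x : ||x||_2 <= R}."""
    norm = np.linalg.norm(x)
    return x if norm <= R else (R / norm) * x

def projected_gradient_descent(grad_f, x0, eta, R, n_iters=100):
    """PGD: x_{t+1} = P_C(x_t - eta * grad_f(x_t))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        x = project_ball(x - eta * grad_f(x), R)
    return x

# Usage: minimize f(x) = 0.5*||x - b||^2 over the unit ball.
b = np.array([2.0, -1.0])
x_hat = projected_gradient_descent(lambda x: x - b, x0=np.zeros(2), eta=0.5, R=1.0)
```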

1.2 Frank Wolfe method

Algorithm

$$s_{k-1}=\argmin_{s\in C}\nabla f(x_{k-1})^T(s-x_{k-1})=\argmin_{s\in C}\nabla f(x_{k-1})^T s$$
$$x_k=(1-r_k)x_{k-1}+r_k s_{k-1}$$

  • No projection is required
  • $r_k=\frac{2}{k+1}$
  • $x_k$ is always feasible (a convex combination of points in $C$)
  • The subproblem defining $s_{k-1}$ has a linear objective
  • Affine invariant

Convergence

$$f(x_k)-f(x^*)\leq \frac{2m}{k+1}$$
where $m$ is the curvature parameter measuring the non-linearity of $f$: the more non-linear $f$ is, the larger $m$.

Examples

  • 1-norm constraints
  • Polytope constraints

See the notes for details; a sketch of Frank-Wolfe on the 1-norm ball follows below.
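
A minimal sketch of Frank-Wolfe for the 1-norm constraint $C=\{x:\|x\|_1\le\tau\}$, where the linear subproblem has a closed-form solution (a signed, scaled coordinate vector at the largest-magnitude gradient entry); `grad_f` and `tau` are illustrative placeholders, not part of the notes.

```python
import numpy as np

def lmo_l1_ball(grad, tau):
    """Linear minimization oracle over {s : ||s||_1 <= tau}:
    argmin_s grad^T s is attained at a signed, scaled coordinate vector."""
    i = np.argmax(np.abs(grad))
    s = np.zeros_like(grad)
    s[i] = -tau * np.sign(grad[i])
    return s

def frank_wolfe(grad_f, x0, tau, n_iters=100):
    """Frank-Wolfe: x_k = (1 - r_k) x_{k-1} + r_k s_{k-1}, with r_k = 2/(k+1)."""
    x = np.asarray(x0, dtype=float)
    for k in range(1, n_iters + 1):
        s = lmo_l1_ball(grad_f(x), tau)
        r = 2.0 / (k + 1)
        x = (1 - r) * x + r * s  # convex combination, so x stays feasible
    return x

# Usage: least squares restricted to the 1-norm ball of radius tau.
A = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([1.0, 2.0])
x_fw = frank_wolfe(lambda x: A.T @ (A @ x - b), x0=np.zeros(2), tau=1.0)
```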

2 Proximal gradient method

2.1 Motivation

Accelerate the slow convergence of subgradient methods on nonsmooth objectives, for functions that decompose as
$$f(x)=g(x)+h(x)$$
where $g(x)$ is convex and smooth (with $M$-Lipschitz gradient) and $h(x)$ is convex, not smooth, but separable.

2.2 Idea of proximal gradient

$$x_+=\argmin_y\; g(x)+\nabla g(x)^T(y-x)+\frac{1}{2\eta}\|y-x\|^2+h(y)$$
Here $g(x)$ is replaced by its quadratic approximation around $x$, while $h$ is used directly. Completing the square and dropping terms that do not depend on $y$ gives
$$x_+=\argmin_y\; \frac{1}{2\eta}\|y-(x-\eta \nabla g(x))\|^2+h(y)$$
so the minimizer depends on $x$ only through the gradient step $u=x-\eta\nabla g(x)$:
$$x_+(u)=\argmin_y\; \frac{1}{2\eta}\|y-u\|^2+h(y)$$

2.3 Proximal gradient

$$\mathrm{Prox}_{\eta h}(u)=\argmin_y\; \frac{1}{2\eta}\|y-u\|^2+h(y)$$
$$x_+=\mathrm{Prox}_{\eta h}(x-\eta \nabla g(x))$$
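
A minimal sketch of the resulting iteration on a lasso-style objective $g(x)=\frac{1}{2}\|Ax-b\|^2$, $h(x)=\lambda\|x\|_1$, using the soft-thresholding prox from the L1 example in 2.5 below; `A`, `b`, `lam`, and the step-size choice are illustrative assumptions, not part of the notes.

```python
import numpy as np

def prox_l1(u, t):
    """Prox of t*||.||_1: componentwise soft thresholding."""
    return np.sign(u) * np.maximum(np.abs(u) - t, 0.0)

def proximal_gradient(A, b, lam, n_iters=200):
    """Minimize 0.5*||Ax - b||^2 + lam*||x||_1 via
    x_+ = Prox_{eta*h}(x - eta * grad g(x))."""
    x = np.zeros(A.shape[1])
    eta = 1.0 / np.linalg.norm(A, 2) ** 2   # eta <= 1/M, with M = ||A||_2^2
    for _ in range(n_iters):
        grad_g = A.T @ (A @ x - b)          # gradient of the smooth part g
        x = prox_l1(x - eta * grad_g, eta * lam)
    return x

# Usage on a small synthetic problem with a sparse ground truth.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = A @ np.array([1.0, 0.0, -2.0, 0.0, 0.0]) + 0.01 * rng.standard_normal(20)
x_hat = proximal_gradient(A, b, lam=0.1)
```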

2.4 Convergence

Smooth and convex:
With step size $\eta \le \frac{1}{M}$: $f(x_T)-f^*\leq O\!\left(\frac{1}{T}\right)$, i.e. $O\!\left(\frac{1}{\varepsilon}\right)$ iterations to reach accuracy $\varepsilon$.

Acceleration (e.g. Nesterov/FISTA):
$O\!\left(\frac{1}{T^2}\right)$, i.e. $O\!\left(\frac{1}{\sqrt{\varepsilon}}\right)$ iterations.

Smooth and strongly convex:
$O(c^T)$ with $c<1$ (linear/geometric rate), i.e. $O\!\left(\log\frac{1}{\varepsilon}\right)$ iterations.

2.5 Examples

Proximal gradient as a generalization of projected gradient descent (PGD): see the indicator example below.
12.4 Example of prox operator: L1 norm, $h(y)=\|y\|_1$. Componentwise,

$$[\mathrm{Prox}_{\eta h}(u)]_i=\begin{cases} u_i-\eta, & u_i>\eta\\ 0, & -\eta\leq u_i\leq \eta\\ u_i+\eta, & u_i<-\eta \end{cases}$$
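
A short derivation of the cases above from the coordinate-wise prox subproblem, using subgradient optimality (a sketch under the definitions already given):
$$[\mathrm{Prox}_{\eta h}(u)]_i=\argmin_{y_i}\;\frac{1}{2\eta}(y_i-u_i)^2+|y_i|,\qquad 0\in \frac{1}{\eta}(y_i-u_i)+\partial|y_i|\;\Leftrightarrow\; u_i\in y_i+\eta\,\partial|y_i|.$$
If $y_i>0$ then $y_i=u_i-\eta$ (requires $u_i>\eta$); if $y_i<0$ then $y_i=u_i+\eta$ (requires $u_i<-\eta$); otherwise $y_i=0$ with $|u_i|\le\eta$.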

Example of prox operator: quadratic $h$.
Example of prox operator: indicator function of $C$; its prox is the projection onto $C$.
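
Worked forms for these two examples (a sketch, assuming $Q\succeq 0$ for the quadratic case). For the quadratic $h(y)=\frac{1}{2}y^TQy+b^Ty$:
$$\mathrm{Prox}_{\eta h}(u)=\argmin_y\;\frac{1}{2\eta}\|y-u\|^2+\frac{1}{2}y^TQy+b^Ty=(I+\eta Q)^{-1}(u-\eta b).$$
For the indicator $h(y)=I_C(y)$ ($0$ on $C$, $+\infty$ otherwise):
$$\mathrm{Prox}_{\eta h}(u)=\argmin_{y\in C}\;\frac{1}{2\eta}\|y-u\|^2=P_C(u),$$
so $x_+=\mathrm{Prox}_{\eta h}(x-\eta\nabla g(x))=P_C(x-\eta\nabla g(x))$, which is exactly the PGD update; this is the sense in which proximal gradient generalizes PGD.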
