Coordinate Descent Method
Conditions Required: The objective function is differentiable and smooth.
The coordinate descent method is not a gradient-based method. At each iteration it finds a local minimum along one coordinate direction: with the other variables $x_1,\dots,x_{i-1},x_{i+1},\dots,x_n$ fixed, it minimizes the objective function with respect to $x_i$, e.g., by solving $\frac{\partial f}{\partial x_i} = 0$, and in this way optimizes all the variables one by one.
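For a quadratic objective $f(x) = \frac{1}{2}x^\top A x - b^\top x$, the stationarity condition $\frac{\partial f}{\partial x_i} = 0$ has a closed-form solution per coordinate, so a sweep of exact coordinate minimizations can be sketched as follows (the matrix `A` and vector `b` below are illustrative assumptions, not from the original text):

```python
import numpy as np

# Sketch: exact coordinate minimization for the quadratic
# f(x) = 0.5 * x^T A x - b^T x with A symmetric positive definite.
# Setting df/dx_i = 0 while fixing the other coordinates gives
#   x_i = (b_i - sum_{j != i} A[i, j] * x_j) / A[i, i].

A = np.array([[4.0, 1.0], [1.0, 3.0]])  # assumed example SPD matrix
b = np.array([1.0, 2.0])
x = np.zeros(2)

for _ in range(50):                      # full sweeps over the coordinates
    for i in range(len(x)):
        # b[i] minus the off-diagonal contributions of the fixed coordinates
        residual = b[i] - A[i] @ x + A[i, i] * x[i]
        x[i] = residual / A[i, i]

# x now approximates the minimizer A^{-1} b
print(x)
```

For quadratics this sweep coincides with the Gauss-Seidel iteration for solving $Ax = b$.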
The iterative formula of CDM is
$$x_i^{(k)} = \argmin_{x_i} f \left(x_1^{(k)},\dots,x_{i-1}^{(k)},x_i,x_{i+1}^{(k)},\dots,x_n^{(k)} \right),$$

which can be rewritten as
$$\begin{aligned} x_1^{(k)} &= \argmin_{x_1} f \left( x_1, x_2^{(k-1)}, x_3^{(k-1)}, \dots, x_n^{(k-1)} \right), \\ x_2^{(k)} &= \argmin_{x_2} f \left( x_1^{(k)}, x_2, x_3^{(k-1)}, \dots, x_n^{(k-1)} \right), \\ &\;\;\vdots \\ x_n^{(k)} &= \argmin_{x_n} f \left( x_1^{(k)}, x_2^{(k)}, x_3^{(k)}, \dots, x_n \right). \end{aligned}$$
Unlike the gradient descent method, the coordinate descent method performs a line search along one dimension at a time; the former, by contrast, must calculate the gradient of the objective function.
The pseudo-code of the coordinate descent method is shown in the following figure:
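The pseudo-code can be sketched in Python as a cyclic sweep with a one-dimensional line search per coordinate. The golden-section search, search bounds, and test objective below are illustrative assumptions; any 1-D minimizer would do:

```python
import numpy as np

def coordinate_descent(f, x0, n_sweeps=100, lo=-10.0, hi=10.0, tol=1e-8):
    """Cyclic coordinate descent: minimize f along one coordinate at a
    time via golden-section search on an assumed bracket [lo, hi]."""
    phi = (np.sqrt(5) - 1) / 2
    x = np.asarray(x0, dtype=float).copy()

    def line_search(i):
        # 1-D restriction of f along coordinate i, other coordinates fixed
        g = lambda t: f(np.concatenate([x[:i], [t], x[i + 1:]]))
        a, b = lo, hi
        while b - a > tol:                     # golden-section shrinkage
            c, d = b - phi * (b - a), a + phi * (b - a)
            if g(c) < g(d):
                b = d
            else:
                a = c
        return (a + b) / 2

    for _ in range(n_sweeps):
        for i in range(x.size):
            x[i] = line_search(i)              # update coordinate i
    return x

# Assumed example: f(x, y) = (x - 1)^2 + (y + 2)^2, minimized at (1, -2)
x_star = coordinate_descent(lambda v: (v[0] - 1) ** 2 + (v[1] + 2) ** 2,
                            x0=[0.0, 0.0], n_sweeps=10)
print(x_star)
```

Because this example is separable, a single sweep already reaches the minimizer up to the line-search tolerance.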
The following points should be noted.
- If the coordinate descent method is applied to a non-smooth objective function, it may get stuck at points that are not stationary (critical) points.
- The method handles high-dimensional problems poorly.
Block Coordinate Descent Method
To better handle high-dimensional problems, we can introduce the Block Coordinate Descent Method (BCDM).
The idea is to split the variables into several blocks, e.g., $f(\mathbf{x},\mathbf{y})$. The coordinate descent method alternately optimizes $x_1,\dots,x_N,y_1,\dots,y_N$ one by one, whereas the block coordinate descent method alternately optimizes one block with the other block fixed, e.g., optimizing $\mathbf{x}^{(k)}$ with $\mathbf{y}^{(k-1)}$ fixed and then $\mathbf{y}^{(k)}$ with $\mathbf{x}^{(k)}$ fixed.
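The alternation between the two blocks can be sketched on a small convex objective. The objective $f(\mathbf{x},\mathbf{y}) = \|\mathbf{x}-\mathbf{y}\|^2 + \|\mathbf{x}-\mathbf{a}\|^2 + \|\mathbf{y}-\mathbf{b}\|^2$ and the vectors `a`, `b` below are illustrative assumptions chosen so that each block update has a closed form:

```python
import numpy as np

# Block coordinate descent on the assumed objective
#   f(x, y) = ||x - y||^2 + ||x - a||^2 + ||y - b||^2.
# With one block fixed, the other has a closed-form minimizer:
#   df/dx = 0 with y fixed  =>  x = (y + a) / 2
#   df/dy = 0 with x fixed  =>  y = (x + b) / 2

a = np.array([1.0, 0.0])
b = np.array([0.0, 3.0])
x = np.zeros(2)
y = np.zeros(2)

for _ in range(100):          # alternate between the two blocks
    x = (y + a) / 2           # optimize block x with y fixed
    y = (x + b) / 2           # optimize block y with x fixed

# analytic minimizer: x* = (2a + b)/3, y* = (a + 2b)/3
print(x, y)
```

Each alternation contracts the error, so the iterates converge to the joint minimizer even though each step only solves one block sub-problem.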
If we split the problem into two sub-problems, then we