Coordinate Descent Method
Conditions Required: The objective function is differentiable and smooth.
The coordinate descent method is not a gradient-based method. At each iteration it finds a local minimum along one coordinate direction: with the other variables $x_1,\dots,x_{i-1},x_{i+1},\dots,x_n$ fixed, it minimizes the objective function with respect to $x_i$, e.g., by solving $\frac{\partial f}{\partial x_i} = 0$, and in this way optimizes all the variables one by one.
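For a quadratic objective $f(x) = \frac{1}{2}x^\top A x - b^\top x$, the stationarity condition $\frac{\partial f}{\partial x_i} = 0$ has a closed-form solution per coordinate, so a sweep of exact coordinate minimizations can be sketched as follows (the matrix `A` and vector `b` below are illustrative assumptions, not from the original text):

```python
import numpy as np

# Sketch: exact coordinate minimization for the quadratic
# f(x) = 0.5 * x^T A x - b^T x with A symmetric positive definite.
# Setting df/dx_i = 0 while fixing the other coordinates gives
#   x_i = (b_i - sum_{j != i} A[i, j] * x_j) / A[i, i].

A = np.array([[4.0, 1.0], [1.0, 3.0]])  # assumed example SPD matrix
b = np.array([1.0, 2.0])
x = np.zeros(2)

for _ in range(50):                      # full sweeps over the coordinates
    for i in range(len(x)):
        # b[i] minus the off-diagonal contributions of the fixed coordinates
        residual = b[i] - A[i] @ x + A[i, i] * x[i]
        x[i] = residual / A[i, i]

# x now approximates the minimizer A^{-1} b
print(x)
```

For quadratics this sweep coincides with the Gauss-Seidel iteration for solving $Ax = b$.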
The iterative formula of CDM is
$$x_i^{(k)} = \argmin_{x_i} f \left(x_1^{(k)},\dots,x_{i-1}^{(k)},x_i,x_{i+1}^{(k)},\dots,x_n^{(k)} \right),$$

which can be rewritten as
$$\begin{aligned} x_1^{(k)} &= \argmin_{x_1} f \left( x_1, x_2^{(k-1)}, x_3^{(k-1)}, \dots, x_n^{(k-1)} \right), \\ x_2^{(k)} &= \argmin_{x_2} f \left( x_1^{(k)}, x_2, x_3^{(k-1)}, \dots, x_n^{(k-1)} \right), \\ &\;\;\vdots \\ x_n^{(k)} &= \argmin_{x_n} f \left( x_1^{(k)}, x_2^{(k)}, x_3^{(k)}, \dots, x_n \right). \end{aligned}$$
Unlike the gradient descent method, the coordinate descent method performs a line search along one dimension at a time; the former, by contrast, must calculate the gradient of the objective function.
The pseudo-code of the coordinate descent method is shown in the following figure:
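The pseudo-code can be sketched in Python as a cyclic sweep with a one-dimensional line search per coordinate. The golden-section search, search bounds, and test objective below are illustrative assumptions; any 1-D minimizer would do:

```python
import numpy as np

def coordinate_descent(f, x0, n_sweeps=100, lo=-10.0, hi=10.0, tol=1e-8):
    """Cyclic coordinate descent: minimize f along one coordinate at a
    time via golden-section search on an assumed bracket [lo, hi]."""
    phi = (np.sqrt(5) - 1) / 2
    x = np.asarray(x0, dtype=float).copy()

    def line_search(i):
        # 1-D restriction of f along coordinate i, other coordinates fixed
        g = lambda t: f(np.concatenate([x[:i], [t], x[i + 1:]]))
        a, b = lo, hi
        while b - a > tol:                     # golden-section shrinkage
            c, d = b - phi * (b - a), a + phi * (b - a)
            if g(c) < g(d):
                b = d
            else:
                a = c
        return (a + b) / 2

    for _ in range(n_sweeps):
        for i in range(x.size):
            x[i] = line_search(i)              # update coordinate i
    return x

# Assumed example: f(x, y) = (x - 1)^2 + (y + 2)^2, minimized at (1, -2)
x_star = coordinate_descent(lambda v: (v[0] - 1) ** 2 + (v[1] + 2) ** 2,
                            x0=[0.0, 0.0], n_sweeps=10)
print(x_star)
```

Because this example is separable, a single sweep already reaches the minimizer up to the line-search tolerance.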
The following points should be noted.
- If the coordinate descent method is applied to a non-smooth objective function, it may get stuck at points that are not stationary (critical) points.
- The method handles high-dimensional problems poorly.
Block Coordinate Descent Method
To better handle high-dimensional problems, we can introduce the Block Coordinate Descent Method (BCDM).
The idea is to split the variables into several blocks, e.g., $f(\mathbf{x},\mathbf{y})$. The coordinate descent method alternately optimizes $x_1,\dots,x_N,y_1,\dots,y_N$ one by one, whereas the block coordinate descent method alternately optimizes one block with the other block fixed, e.g., optimizing $\mathbf{x}^{(k)}$ with $\mathbf{y}^{(k-1)}$ fixed and then $\mathbf{y}^{(k)}$ with $\mathbf{x}^{(k)}$ fixed.
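The alternation between the two blocks can be sketched on a small convex objective. The objective $f(\mathbf{x},\mathbf{y}) = \|\mathbf{x}-\mathbf{y}\|^2 + \|\mathbf{x}-\mathbf{a}\|^2 + \|\mathbf{y}-\mathbf{b}\|^2$ and the vectors `a`, `b` below are illustrative assumptions chosen so that each block update has a closed form:

```python
import numpy as np

# Block coordinate descent on the assumed objective
#   f(x, y) = ||x - y||^2 + ||x - a||^2 + ||y - b||^2.
# With one block fixed, the other has a closed-form minimizer:
#   df/dx = 0 with y fixed  =>  x = (y + a) / 2
#   df/dy = 0 with x fixed  =>  y = (x + b) / 2

a = np.array([1.0, 0.0])
b = np.array([0.0, 3.0])
x = np.zeros(2)
y = np.zeros(2)

for _ in range(100):          # alternate between the two blocks
    x = (y + a) / 2           # optimize block x with y fixed
    y = (x + b) / 2           # optimize block y with x fixed

# analytic minimizer: x* = (2a + b)/3, y* = (a + 2b)/3
print(x, y)
```

Each alternation contracts the error, so the iterates converge to the joint minimizer even though each step only solves one block sub-problem.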
If we split the problem into two sub-problems, then we