Convex Optimization Algorithms: Coordinate Descent Method & Block Coordinate Descent Method


Coordinate Descent Method

Conditions required: the objective function should be smooth (continuously differentiable).

The coordinate descent method is not a gradient-based method. At each iteration it minimizes the objective along a single coordinate direction. With all other coordinates $x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n$ held fixed, the objective is minimized with respect to $x_i$ alone (e.g., by solving $\frac{\partial f}{\partial x_i} = 0$), and the method sweeps over all coordinates one by one.

The iterative formula of CDM is
$$
x_i^{(k)} = \argmin_{x_i} f\left(x_1^{(k)}, \ldots, x_{i-1}^{(k)},\, x_i,\, x_{i+1}^{(k-1)}, \ldots, x_n^{(k-1)}\right),
$$

which can be written out component by component as

$$
\begin{aligned}
x_1^{(k)} &= \argmin_{x_1} f\left(x_1, x_2^{(k-1)}, x_3^{(k-1)}, \ldots, x_n^{(k-1)}\right), \\
x_2^{(k)} &= \argmin_{x_2} f\left(x_1^{(k)}, x_2, x_3^{(k-1)}, \ldots, x_n^{(k-1)}\right), \\
&\;\;\vdots \\
x_n^{(k)} &= \argmin_{x_n} f\left(x_1^{(k)}, x_2^{(k)}, \ldots, x_{n-1}^{(k)}, x_n\right).
\end{aligned}
$$
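As a quick illustration (this two-variable quadratic is a hypothetical example, not from the original post), take $f(x_1, x_2) = x_1^2 + 2x_2^2 + x_1 x_2 - 4x_1 - 6x_2$. Each coordinate sub-problem is then solved in closed form from $\frac{\partial f}{\partial x_i} = 0$:

$$
x_1^{(k)} = \argmin_{x_1} f\left(x_1, x_2^{(k-1)}\right) = \frac{4 - x_2^{(k-1)}}{2},
\qquad
x_2^{(k)} = \argmin_{x_2} f\left(x_1^{(k)}, x_2\right) = \frac{6 - x_1^{(k)}}{4}.
$$

Since $f$ is strictly convex, alternating these two updates converges to the unique minimizer $(x_1, x_2) = (10/7,\ 8/7)$.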

Unlike gradient descent, coordinate descent performs a line search along one coordinate at a time; gradient descent, in contrast, needs to compute the full gradient of the objective function.

The coordinate descent procedure can be sketched in code as follows.
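Below is a minimal R sketch of the cyclic scheme, assuming each one-dimensional sub-problem is solved numerically with R's `optimize()` over a bounded interval; the function name `coordinate_descent` and the interval bound are illustrative choices, not from the original post:

```
# Cyclic coordinate descent: repeatedly minimize f along one coordinate at a time.
coordinate_descent <- function(f, x0, max_iter = 100, tol = 1e-8, bound = 1e3) {
  x <- x0
  for (k in 1:max_iter) {
    x_prev <- x
    for (i in seq_along(x)) {
      # One-dimensional sub-problem in coordinate i, all other coordinates fixed
      g <- function(t) {
        z <- x
        z[i] <- t
        f(z)
      }
      x[i] <- optimize(g, interval = c(-bound, bound))$minimum
    }
    # Stop when a full sweep no longer changes x noticeably
    if (max(abs(x - x_prev)) < tol) break
  }
  x
}

# Example: the two-variable quadratic from the worked example above
f <- function(x) x[1]^2 + 2 * x[2]^2 + x[1] * x[2] - 4 * x[1] - 6 * x[2]
coordinate_descent(f, c(0, 0))  # approaches the minimizer (10/7, 8/7)
```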

The following points should be noted.

  • If coordinate descent is applied to a non-smooth objective function, it may stop at points that are not stationary (critical) points.
  • The method does not handle high-dimensional problems well when coordinates are updated one at a time.

Block Coordinate Descent Method

To better handle high-dimensional problems, we can introduce the Block Coordinate Descent Method (BCDM).

The idea is to split the variables into blocks, e.g., $f(\mathbf{x}, \mathbf{y})$ with $\mathbf{x} = (x_1, \ldots, x_N)$ and $\mathbf{y} = (y_1, \ldots, y_N)$. Plain coordinate descent optimizes $x_1, \ldots, x_N, y_1, \ldots, y_N$ one at a time, whereas block coordinate descent alternately optimizes one whole block with the other block fixed, e.g., updating $\mathbf{x}^{(k)}$ with $\mathbf{y}^{(k-1)}$ fixed and then $\mathbf{y}^{(k)}$ with $\mathbf{x}^{(k)}$ fixed.

If we split the problem into two sub-problems, then we alternately solve

$$
\mathbf{x}^{(k)} = \argmin_{\mathbf{x}} f\left(\mathbf{x}, \mathbf{y}^{(k-1)}\right),
\qquad
\mathbf{y}^{(k)} = \argmin_{\mathbf{y}} f\left(\mathbf{x}^{(k)}, \mathbf{y}\right).
$$
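A minimal R sketch of this two-block scheme, assuming a least-squares objective $f(\mathbf{x}, \mathbf{y}) = \tfrac{1}{2}\lVert A\mathbf{x} + B\mathbf{y} - \mathbf{c}\rVert^2$ (the matrices `A`, `B`, the vector `c_vec`, and the function name `bcd_two_blocks` are illustrative assumptions, not from the original post); each block update is then a small linear least-squares problem solved in closed form:

```
# Two-block coordinate descent (alternating minimization) for
# f(x, y) = 0.5 * ||A %*% x + B %*% y - c_vec||^2
bcd_two_blocks <- function(A, B, c_vec, max_iter = 200, tol = 1e-8) {
  x <- rep(0, ncol(A))
  y <- rep(0, ncol(B))
  for (k in 1:max_iter) {
    x_prev <- x
    y_prev <- y
    # x-update: least squares in x with the current y held fixed (normal equations)
    x <- as.vector(solve(crossprod(A), crossprod(A, c_vec - B %*% y)))
    # y-update: least squares in y with the freshly updated x held fixed
    y <- as.vector(solve(crossprod(B), crossprod(B, c_vec - A %*% x)))
    # Stop when neither block changes noticeably
    if (max(abs(c(x - x_prev, y - y_prev))) < tol) break
  }
  list(x = x, y = y)
}

# Example usage on random data
set.seed(1)
A <- matrix(rnorm(50 * 3), 50, 3)
B <- matrix(rnorm(50 * 2), 50, 2)
c_vec <- rnorm(50)
fit <- bcd_two_blocks(A, B, c_vec)
```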


As a concrete application of cyclic coordinate descent, the following R code solves the Lasso problem:

```
# Lasso via cyclic coordinate descent
# Objective: (1 / (2 * n)) * sum((y - X %*% beta)^2) + lambda * sum(abs(beta))

# L1 penalty function (kept for reference; the penalty enters the coordinate
# updates through the soft-thresholding operator below)
L1 <- function(beta, lambda) {
  lambda * sum(abs(beta))
}

# Soft-thresholding operator: S(z, t) = sign(z) * max(|z| - t, 0)
soft_thresh <- function(z, t) {
  sign(z) * pmax(abs(z) - t, 0)
}

# Coordinate descent for the Lasso
lasso_cd <- function(X, y, lambda, max_iter = 1000, tol = 1e-4) {
  # Standardize X and y
  X <- scale(X)
  y <- as.vector(scale(y))

  # Initialize beta
  n <- nrow(X)
  p <- ncol(X)
  beta <- rep(0, p)
  beta_old <- beta

  # Coordinate descent loop
  for (iter in 1:max_iter) {
    for (j in 1:p) {
      # Partial residual: response minus the fit of all coordinates except j
      r <- y - X[, -j, drop = FALSE] %*% beta[-j]
      c_j <- sum(X[, j]^2) / n        # scaling of coordinate j
      z_j <- sum(X[, j] * r) / n      # univariate least-squares quantity
      # Closed-form one-dimensional Lasso update
      beta[j] <- soft_thresh(z_j, lambda) / c_j
    }
    # Check convergence: stop when no coefficient changes by more than tol
    if (max(abs(beta - beta_old)) < tol) break
    beta_old <- beta
  }

  # Return the final beta values
  beta
}

# Example usage
set.seed(123)
n <- 100
p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] - 2 * X[, 2] + rnorm(n)
lambda <- 0.5
beta <- lasso_cd(X, y, lambda)
```

The code first defines the L1 penalty function and the soft-thresholding operator, then the coordinate descent function, which standardizes the input data and initializes beta. Inside the coordinate descent loop, for each coefficient j we compute the partial residual r and the column quantities c_j and z_j, and then update beta[j] by soft-thresholding. Finally, we check whether beta has converged and return the final beta values. The function accepts X and y of any size and any value of lambda, and returns the vector of Lasso coefficient estimates.
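For reference, the coordinate-wise update implemented in the loop above (a standard derivation for the objective $\frac{1}{2n}\lVert \mathbf{y} - X\boldsymbol{\beta}\rVert^2 + \lambda\lVert\boldsymbol{\beta}\rVert_1$, with $z_j$ and $c_j$ matching the variables in the code) is

$$
\beta_j \leftarrow \frac{S(z_j, \lambda)}{c_j},
\qquad
z_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}\, r_i,
\quad
c_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}^2,
\quad
S(z, t) = \operatorname{sign}(z)\max\left(|z| - t,\ 0\right),
$$

where $r = \mathbf{y} - \sum_{\ell \neq j} \mathbf{x}_\ell \beta_\ell$ is the partial residual that excludes coordinate $j$.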
