[NOTE in progress] Distributed Optimization and Statistical Learning via ADMM - Boyd

Reading notes on the paper "Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers" by Boyd, Parikh, Chu, Peleato, and Eckstein.

Introduction

  • ADMM: developed in the mid-1970s with roots in the 1950s. It has been shown to be closely related to many other methods, such as Douglas-Rachford splitting, Spingarn's method of partial inverses, proximal methods, etc.
  • Why ADMM today: with the arrival of the big-data era and the need for large-scale machine learning algorithms, ADMM has proved to be well suited to solving large-scale optimization problems in a distributed fashion.
  • What big data brings us: with big data, simple methods can turn out to be very effective at solving complex problems.
  • ADMM can be seen as a blend of dual decomposition and the augmented Lagrangian method (method of multipliers). The latter is more robust and converges under weaker assumptions, but it does not decompose directly the way dual decomposition does. (The generic iterations are sketched after this list.)
  • ADMM can split the problem across examples or across features. [To be explored in later chapters]
  • Note that even when used serially, ADMM is comparable to other methods and often converges to modest accuracy within a few tens of iterations.
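
For reference, the generic problem form and ADMM iterations from the paper (a transcription sketch of the standard unscaled form) are:

```latex
\begin{align*}
&\text{minimize } f(x) + g(z) \quad \text{subject to } Ax + Bz = c, \\
x^{k+1} &= \operatorname*{argmin}_x \; L_\rho(x, z^k, y^k), \\
z^{k+1} &= \operatorname*{argmin}_z \; L_\rho(x^{k+1}, z, y^k), \\
y^{k+1} &= y^k + \rho \,(A x^{k+1} + B z^{k+1} - c), \\
\text{with } L_\rho(x, z, y) &= f(x) + g(z) + y^T (Ax + Bz - c) + \tfrac{\rho}{2}\, \| Ax + Bz - c \|_2^2 .
\end{align*}
```

To make the updates concrete, here is a minimal serial numpy sketch of ADMM applied to the lasso (the function name `lasso_admm` and all defaults are my own choices for illustration, not from the paper):

```python
import numpy as np

def lasso_admm(A, b, lam, rho=1.0, n_iter=50):
    """ADMM sketch for the lasso: minimize 0.5*||Ax - b||^2 + lam*||z||_1  s.t.  x - z = 0."""
    n = A.shape[1]
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)   # u is the scaled dual variable y/rho
    # Factor (A^T A + rho I) once; it is reused in every x-update.
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))
    Atb = A.T @ b
    for _ in range(n_iter):
        # x-update: solve (A^T A + rho I) x = A^T b + rho (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
        # z-update: elementwise soft thresholding with threshold lam/rho
        v = x + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
        # scaled dual update
        u = u + x - z
    return z
```

Even this plain serial loop typically reaches modest accuracy in a few tens of iterations, which the paper argues is usually sufficient for statistical learning tasks.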

Precursors

  • What is the conjugate function exactly? (A definition is sketched after this list.)
  • Dual ascent and dual subgradient methods: if the step size is chosen appropriately and some other assumptions hold, they converge.
  • Why the augmented Lagrangian (method of multipliers; updates sketched after this list):
    • More robust, fewer assumptions (no need for strict convexity or finiteness of f): in practice the convergence assumptions of dual ascent may fail. For example, with an affine objective (minimize x subject to x = 10, say), the Lagrangian is unbounded below in x for all but one value of y, so the x-update of dual ascent fails.
    • For equality constraints, the augmented version converges faster; this can be understood from the penalty method's point of view.
  • Dual decomposition: relax the coupling constraints so that the problem decomposes; this naturally lends itself to parallel computation.
  • The ρ in the augmented Lagrangian is actually used as the step size of the dual update; with this choice, each iterate (x^{k+1}, y^{k+1}) is dual feasible.
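
For the conjugate-function question above: for f : R^n → R ∪ {+∞}, the convex conjugate is

```latex
f^*(y) = \sup_{x} \left( y^T x - f(x) \right),
```

and the dual function of minimize f(x) subject to Ax = b can be written in terms of it as g(y) = −f*(−Aᵀy) − bᵀy. The dual ascent / dual subgradient iteration is

```latex
\begin{align*}
x^{k+1} &= \operatorname*{argmin}_x \; \big( f(x) + (y^k)^T (A x - b) \big), \\
y^{k+1} &= y^k + \alpha^k \, (A x^{k+1} - b),
\end{align*}
```

where α^k is the step size and A x^{k+1} − b is a gradient (or, if f is not differentiable, a subgradient) of the dual function at y^k.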
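For the augmented-Lagrangian bullets and the note on ρ: the method of multipliers replaces the Lagrangian with

```latex
L_\rho(x, y) = f(x) + y^T (Ax - b) + \tfrac{\rho}{2}\, \| Ax - b \|_2^2 ,
```

and iterates

```latex
\begin{align*}
x^{k+1} &= \operatorname*{argmin}_x \; L_\rho(x, y^k), \\
y^{k+1} &= y^k + \rho \, (A x^{k+1} - b),
\end{align*}
```

so ρ doubles as the dual step size, which is what makes each (x^{k+1}, y^{k+1}) dual feasible. The price is that the quadratic penalty couples the components of x even when f is separable, which is exactly why the plain method of multipliers does not decompose. Dual decomposition, in contrast, drops the quadratic term: with f(x) = Σ_i f_i(x_i) and A partitioned as [A_1 ... A_N], the x-minimization splits into independent subproblems x_i^{k+1} = argmin_{x_i} ( f_i(x_i) + (y^k)^T A_i x_i ) that can be solved in parallel, followed by a gathering step that updates y.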
