Examples of Gradient Descent Applications

1. Solving the LASSO Problem with the Gradient Method
The original form of the LASSO problem is:
$$\min f(x)=\frac{1}{2}\|Ax-b\|^2+\mu\|x\|_1 \tag{1}$$
The objective function $f(x)$ of the LASSO problem is nonsmooth, and at some points its gradient does not exist, so the gradient method cannot be applied directly to the original problem. Note that the nonsmooth term is $\|x\|_1$, which is simply the sum of the absolute values of the components. If we can find a smooth function that approximates the absolute value function, the gradient method becomes applicable to the LASSO problem. In practice we may consider the following one-dimensional smooth function:
$$l_\delta(x)=\begin{cases}\frac{1}{2\delta}x^2, & |x|<\delta \\ |x|-\frac{\delta}{2}, & \text{otherwise}\end{cases}$$
This function is a variant of the Huber loss function; as $\delta\to 0$, the smooth function $l_\delta(x)$ approaches the absolute value function. The figure below shows the graph of $l_\delta(x)$ for several values of $\delta$:
[Figure: $l_\delta(x)$ for $\delta = 0, 0.02, 0.2, 1$, compared with $|x|$]
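The approximation claim can be made quantitative; a quick calculation (not in the original text, but immediate from the definition of $l_\delta$) shows that

$$\sup_{x\in\mathbb{R}}\big|\,l_\delta(x)-|x|\,\big|=\frac{\delta}{2},$$

since the two functions differ by exactly $\delta/2$ when $|x|\ge\delta$, and by at most that much when $|x|<\delta$.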

The experiment code is as follows.
Huber function:

function fx = huber_func(x, delta)
% Smoothed absolute value (Huber-type): x^2/(2*delta) for |x| < delta,
% |x| - delta/2 otherwise. Logical indexing avoids the 0/0 that a masked
% division would produce when delta = 0, so delta = 0 cleanly gives |x|.

fx = abs(x) - delta / 2;
idx = abs(x) < delta;
fx(idx) = x(idx).^2 / (2 * delta);

end

Plotting script:

x = -4:1e-3:4;
fx = abs(x);
delta0 = 0;
delta1 = 0.02;
delta2 = 0.2;
delta3 = 1;

fx0 = huber_func(x, delta0);
fx1 = huber_func(x, delta1);
fx2 = huber_func(x, delta2);
fx3 = huber_func(x, delta3);
figure, plot(x, fx);
hold on
plot(x, fx0);
plot(x, fx1);
plot(x, fx2);
plot(x, fx3);
legend('|x|', '\delta=0', '\delta=0.02', '\delta=0.2', '\delta=1')

We therefore construct the smoothed LASSO problem:
$$\min f_\delta(x)=\frac{1}{2}\|Ax-b\|^2+\mu L_\delta(x) \tag{2}$$
where $\delta$ is a given smoothing parameter and $L_\delta(x)=\sum_{i=1}^{n}l_\delta(x_i)$, i.e., the one-dimensional smooth function $l_\delta$ is applied to each component of $x$ and the results are summed. The gradient of $f_\delta(x)$ is easily computed:
$$\nabla f_\delta(x)=A^T(Ax-b)+\mu\nabla L_\delta(x) \tag{3}$$
where $\nabla L_\delta$ is defined componentwise:
$$(\nabla L_\delta(x))_i=\begin{cases}\operatorname{sign}(x_i), & |x_i|>\delta \\ \frac{x_i}{\delta}, & |x_i|\le\delta\end{cases} \tag{4}$$
Clearly the gradient of $f_\delta(x)$ is Lipschitz continuous with constant $L=\|A^TA\|_2+\frac{\mu}{\delta}$. By the convergence theorem for the gradient method, a fixed step size must not exceed $\frac{1}{L}$ to guarantee convergence; hence if $\delta$ is very small, we must choose a sufficiently small step size $\alpha_k$ for the gradient method to converge.
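As a minimal sketch (the sizes here are small illustrative values, not those of the experiment below), the constant $L$ and the corresponding safe fixed step can be computed directly:

A = randn(20, 40); mu = 1e-3; delta = 1e-3;   % small illustrative problem
L = norm(A' * A, 2) + mu / delta;             % L = ||A'A||_2 + mu/delta
t = 1 / L;                                    % largest fixed step with guaranteed convergence
fprintf('L = %.3e, safe fixed step t = %.3e\n', L, t);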

Numerical experiment for the LASSO problem
Objective function:

function fx = func_lasso(x, A, b, mu, delta)
% Smoothed LASSO objective (2): 0.5*||Ax - b||^2 + mu * L_delta(x),
% where the quadratic piece of L_delta is x^2/(2*delta) for |x| < delta.

L_delta = x.^2 / (2 * delta) .* (abs(x) < delta) + ...
    (abs(x) - 0.5 * delta) .* (abs(x) >= delta);
L_delta = sum(L_delta);

fx = 0.5 * norm(A * x - b)^2 + mu * L_delta;

end

Gradient function:

function gx = gfunc_lasso(x, A, b, mu, delta)
% Gradient (3) of the smoothed LASSO objective;
% grad_L_delta is the componentwise formula (4).

grad_L_delta = sign(x) .* (abs(x) > delta) + ...
    x / delta .* (abs(x) <= delta);

gx = A' * (A * x - b) + mu * grad_L_delta;

end

BB gradient descent:
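The routine below pairs Barzilai–Borwein (BB) step sizes with the Zhang–Hager nonmonotone line search. For reference (these are the standard BB formulas, stated here because the original text does not give them), with $s^k = x^{k+1}-x^k$ and $y^k = \nabla f(x^{k+1})-\nabla f(x^k)$, the two step sizes the code alternates between are

$$\alpha_{BB1}=\frac{(s^k)^T s^k}{(s^k)^T y^k},\qquad \alpha_{BB2}=\frac{(s^k)^T y^k}{(y^k)^T y^k}.$$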

function [fmin, xmin] = gradient_descent(func, gfunc, x0, epsilon)

iIter = 1;
iterMax = 33000;
xOld = x0;

alphaMin = 1e-5;
alphaMax = 1e5;
M = 10;
alpha = 0.2;                  % initial step size for the first iteration

QOld = 1;
COld = feval(func, xOld);
c1 = 0.5;
eta = 0.5;

while iIter < iterMax
    grad = feval(gfunc, xOld);
    dk = -grad;
    
    % Zhang–Hager nonmonotone line search
    for i = 1:M
        xNew = xOld + alpha * dk;
        fNew = feval(func, xNew); 
        if fNew <= COld - alpha * c1 * dot(dk, dk)   % grad'*dk = -dk'*dk since dk = -grad
            break;
        end
        alpha = alpha * eta;
    end
    
    iIter = iIter + 1;
    
    if norm(grad, 2) < epsilon
        break;
    end
    
    % BB step-size calculation
    sk = xNew - xOld; yk = feval(gfunc, xNew) - grad;   % grad still holds gfunc(xOld)
    if mod(iIter, 2) == 1
        alpha = dot(sk, yk) / dot(yk, yk);
    else
        alpha = dot(sk, sk) / dot(sk, yk);
    end
    
    alpha = max(min(alpha, alphaMax), alphaMin);
    
    QNew = eta * QOld + 1;
    CNew = (eta * QOld * COld + fNew) / QNew;
    COld = CNew;
    
    xOld = xNew;
    fprintf('iIter = %d, fx = %f\n', iIter, fNew);
end

if iIter == iterMax
    fprintf('exceeded max iterations; minimal point not found.\n');
end

xmin = xNew;
fmin = feval(func, xmin);
fprintf('iIter = %d, fmin = %f\n', iIter, fmin);

end

Armijo-rule line search:

function [fnew, xnew, alpha] = armijo_rule(func, gfunc, x0, alpha0, dk)

c1 = 1e-3;
alpha = alpha0;
gamma = 0.8;

iIter = 0;
iterMax = 200;
alphaMin = 1e-5;

f0 = feval(func, x0);            % objective value at x0, computed once
g0 = feval(gfunc, x0);           % gradient at x0, computed once
slope = dot(g0(:), dk(:));       % directional derivative; works for vectors and matrices

while iIter < iterMax 
    
    xnew = x0 + alpha * dk;
    fnew = feval(func, xnew);
    if fnew <= f0 + c1 * alpha * slope   % Armijo sufficient-decrease condition
        break;
    end
    
    if alpha < alphaMin
        break;
    end
    
    alpha = gamma * alpha;
    iIter = iIter + 1;   
end

if iIter == iterMax
    alpha = alphaMin;
    fprintf('reached max iterations; no suitable alpha found.\n');
end

xnew = x0 + alpha * dk;
fnew = feval(func, xnew);

end

Main script:

m = 512;                       % number of measurements
n = 1024;                      % signal dimension
r = 0.1;                       % density of the sparse ground truth
mu = 1e-3;                     % l1 penalty weight
delta = 1e-3;                  % smoothing parameter
epsilon = 1e-3;                % stopping tolerance
A = randn(m, n);
u = full(sprandn(n, 1, r));    % sparse ground-truth signal, densified for plotting
b = A * u;

u0 = zeros(n, 1);

func = @(x)(func_lasso(x, A, b, mu, delta));
gfunc = @(x)(gfunc_lasso(x, A, b, mu, delta));
[fmin, xmin] = gradient_descent(func, gfunc, u0, epsilon);
figure
subplot(211)
plot(u, 'ro');
hold on
plot(xmin, 'bx')
legend('original signal', 'lasso solution')
subplot(212)
plot(abs(xmin - u), '-')
title('absolute difference')
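To put a number on recovery quality, one could append a line like the following (a sketch; `u` and `xmin` are the variables defined above):

fprintf('relative error = %.3e\n', norm(xmin - u) / norm(u));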

The computed results are shown below:
[Figure: original sparse signal vs. the recovered LASSO solution, with their absolute difference]

2. Solving the Tikhonov Regularization Model
Tikhonov regularization was proposed by Tikhonov et al. in 1977 as a method for solving ill-posed problems. Here we discuss its application to image processing. Let $x$ denote the true image, an $m\times n$ array of pixels, and let $y$ denote the observed noisy image, $y = x + e$, where $e$ is Gaussian white noise. To recover the original image $x$ from the noisy image $y$, we can use the idea of Tikhonov regularization and build the following model:
$$\min_{x} f(x)=\frac{1}{2}\|x-y\|_F^2+\lambda(\|D_1 x\|_F^2+\|D_2x\|_F^2) \tag{5}$$
where $D_1x$ and $D_2x$ denote the forward differences of $x$ in the horizontal and vertical directions, i.e.
$$(D_1x)_{ij}=\frac{1}{h}(x_{i+1,j}-x_{ij}),\qquad (D_2x)_{ij}=\frac{1}{h}(x_{i,j+1}-x_{ij}) \tag{6}$$
where $h$ is a given discretization step. Model (5) consists of two terms. The first is a fidelity term, requiring that the recovered image $x$ not deviate too much from the noisy image $y$; the Frobenius norm is used here because the noise is assumed to be Gaussian white noise. The second is the Tikhonov regularization term, which constrains the behavior of $x$ itself: it asks that $x$ not vary too violently (i.e., that the squared horizontal and vertical differences stay small), so the recovered $x$ tends to have good smoothness. The objective of model (5) is smooth; using the identity $(D_1^TD_1+D_2^TD_2)x=-\Delta x$ (the difference operators and their adjoints compose into the discrete Laplacian), the gradient of $f(x)$ is
$$\nabla f(x)=x-y-2\lambda\Delta x \tag{7}$$
where $\Delta$ is the discrete Laplacian acting on the image $x$, i.e.
$$(\Delta x)_{ij}=\frac{x_{i+1,j}+x_{i-1,j}+x_{i,j+1}+x_{i,j-1}-4x_{ij}}{h^2} \tag{8}$$
The gradient-method iteration therefore reads:
$$\begin{aligned} x^{k+1} &= x^{k}-t\nabla f(x^{k}) \\ &= x^{k}-t(x^{k}-y-2\lambda\Delta x^{k}) \\ &= x^{k}-t(I-2\lambda\Delta)x^{k}+ty \end{aligned}$$
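Before the full BB implementation below, here is a minimal self-contained sketch of this fixed-step iteration (assumptions: $h=1$, replicate/Neumann boundary handling, and a synthetic test image rather than the one used in the experiment; the step uses the standard spectral bound $\|\Delta\|_2\le 8/h^2$ for the 5-point Laplacian, so the gradient of (5) is Lipschitz with $L\le 1+16\lambda$):

% Fixed-step gradient iteration for model (5), h = 1, synthetic test image.
y = kron([0 1; 1 0], ones(32)) * 200 + 20 * randn(64);   % blocky image + noise
lambda = 2;
t = 1 / (1 + 16 * lambda);       % step below 1/L since L <= 1 + 16*lambda
x = y;                           % start from the noisy image
for k = 1:200
    xp = x([1 1:end end], [1 1:end end]);                 % replicate padding
    lap = xp(1:end-2, 2:end-1) + xp(3:end, 2:end-1) + ...
          xp(2:end-1, 1:end-2) + xp(2:end-1, 3:end) - 4 * x;
    x = x - t * (x - y - 2 * lambda * lap);               % x <- x - t*grad f(x)
end
figure, imshow(uint8(x)), title('fixed-step Tikhonov denoising');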

Numerical experiment
Objective function:

function fx = func(x, y, lambda)
% Tikhonov objective (5) with h = 1:
% 0.5*||x - y||_F^2 + lambda*(||D1 x||_F^2 + ||D2 x||_F^2).

[height, width] = size(x);
dx = zeros(height, width);
dy = zeros(height, width);
dx(1:end-1, :) = x(2:end, :) - x(1:end-1, :);   % forward difference along rows
dy(:, 1:end-1) = x(:, 2:end) - x(:, 1:end-1);   % forward difference along columns

fx = 0.5 * norm(x - y, 'fro')^2 + lambda * (norm(dx, 'fro')^2 + norm(dy, 'fro')^2);

end

Gradient of the objective function:

function g = gfunc(x, y, lambda)
% Gradient (7): g = x - y - 2*lambda*Laplacian(x), using the 5-point
% discrete Laplacian (8) with replicate (Neumann) boundary padding.

[height, width] = size(x);
x_pad = zeros(height + 2, width + 2);
x_pad(2:end-1, 2:end-1) = x;
x_pad(1, 2:end-1) = x(1, :);          % replicate top row
x_pad(end, 2:end-1) = x(end, :);      % replicate bottom row
x_pad(:, 1) = x_pad(:, 2);            % replicate left column
x_pad(:, end) = x_pad(:, end-1);      % replicate right column

lap_x = x_pad(3:end, 2:end-1) + x_pad(1:end-2, 2:end-1) + ...
    x_pad(2:end-1, 3:end) + x_pad(2:end-1, 1:end-2) - 4 * x;

g = x - y - 2 * lambda * lap_x;

end
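A quick way to gain confidence in the gradient code is a finite-difference check (a sketch; the pixel index, image size, and step are arbitrary choices):

% Compare the analytic gradient with a central finite difference at one pixel.
x = rand(8, 8); y = rand(8, 8); lambda = 0.5;
g = gfunc(x, y, lambda);
e = zeros(8, 8); e(3, 5) = 1;    % unit perturbation of pixel (3, 5)
h = 1e-6;
g_fd = (func(x + h*e, y, lambda) - func(x - h*e, y, lambda)) / (2 * h);
fprintf('analytic %.6f vs finite difference %.6f\n', g(3, 5), g_fd);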

BB gradient descent (image version):

function [fmin, xmin] = gradient_descent(func, gfunc, x0, epsilon)

iIter = 1;
iterMax = 2000;
xOld = x0;

alphaMin = 1e-5;
alphaMax = 1e5;
M = 10;
[~, ~, alpha] = armijo_rule(func, gfunc, x0, 0.5, -feval(gfunc, x0));

QOld = 1;
COld = feval(func, xOld);
c1 = 0.5;
eta = 0.5;

while iIter < iterMax
    grad = feval(gfunc, xOld);
    dk = -grad;
    
    % Zhang–Hager nonmonotone line search
    for i = 1:M
        xNew = xOld + alpha * dk;
        fNew = feval(func, xNew); 
        if fNew <= COld - alpha * c1 * dot(dk(:), dk(:))   % grad'*dk = -||dk||^2 since dk = -grad
            break;
        end
        alpha = alpha * eta;
    end
    
    iIter = iIter + 1;
    xDiff = xNew - xOld;
    fDiff = fNew - feval(func, xOld);
    xDiffNorm = norm(xDiff, 'fro');
    fDiffNorm = abs(fDiff);
    if norm(grad, 'fro')^2 < epsilon || (xDiffNorm < epsilon && fDiffNorm < epsilon)
        break;
    end
    
    % BB step-size calculation
    sk = xNew - xOld; yk = feval(gfunc, xNew) - grad;   % grad still holds gfunc(xOld)
    if mod(iIter, 2) == 1
        alpha = dot(sk(:), yk(:)) / dot(yk(:), yk(:));
    else
        alpha = dot(sk(:), sk(:)) / dot(sk(:), yk(:));
    end
    
    alpha = max(min(alpha, alphaMax), alphaMin);
    
    QNew = eta * QOld + 1;
    CNew = (eta * QOld * COld + fNew) / QNew;
    COld = CNew;
    
    xOld = xNew;
    fprintf('iIter = %d, fx = %f\n', iIter, fNew);
end

if iIter == iterMax
    fprintf('exceeded max iterations; minimal point not found.\n');
end

xmin = xNew;
fmin = feval(func, xmin);
fprintf('iIter = %d, fmin = %f\n', iIter, fmin);

end

Armijo-rule line search: identical to the `armijo_rule` function listed in the LASSO experiment above (its `dot(g0(:), dk(:))` directional derivative already handles matrix-valued iterates), so the listing is not repeated here.

Main script for Tikhonov regularization:

u = load('tower.mat');               % tower.mat stores the test image in variable B1
u = u.B1;
x = double(u);                       % clean reference image

[height, width] = size(u);

y = x + 20 * randn(height, width);   % noisy observation, Gaussian noise with sigma = 20
lambda0 = 0.5;
lambda1 = 2;

epsilon = 1e-4;

func0 = @(x)(func(x, y, lambda0));
gfunc0 = @(x)(gfunc(x, y, lambda0));
[fmin, xmin] = gradient_descent(func0, gfunc0, y, epsilon);

figure, imshow(uint8(y)), title('noisy image');
figure, imshow(uint8(xmin)), title('tikhonov regularization lambda = 0.5');

func1 = @(x)(func(x, y, lambda1));
gfunc1 = @(x)(gfunc(x, y, lambda1));
[fmin, xmin] = gradient_descent(func1, gfunc1, y, epsilon);
figure, imshow(uint8(xmin)), title('tikhonov regularization lambda = 2');

lambda2 = 20;
func2 = @(x)(func(x, y, lambda2));
gfunc2 = @(x)(gfunc(x, y, lambda2));
[fmin, xmin] = gradient_descent(func2, gfunc2, y, epsilon);
figure, imshow(uint8(xmin)), title('tikhonov regularization lambda = 20');
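To quantify the denoising beyond visual inspection, a PSNR computation against the clean image could be appended (a sketch, assuming the usual 8-bit peak value of 255; `x` and `xmin` are the variables above):

mse = mean((xmin(:) - x(:)).^2);                  % mean squared error vs. clean image
fprintf('PSNR = %.2f dB\n', 10 * log10(255^2 / mse));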

The simulation results are shown below:
[Figures: noisy image; Tikhonov reconstructions for $\lambda = 0.5$, $\lambda = 2$, and $\lambda = 20$]

The numerical experiments show that the Tikhonov regularization model removes the noise in the image effectively, but its drawback is obvious: the processed image is over-smoothed, and the boundaries between objects become blurred, especially when $\lambda$ is large. The cause of this behavior is the choice of regularization term in model (5); in later experiments we will consider modifying the regularizer and re-solving the model.
