Partial derivatives of vector norms
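1. L1 norm:
The L1 norm is not differentiable everywhere (it fails at points where some component is zero), but it has a subgradient, i.e. it is subdifferentiable.
The subgradient of the L1 norm is:
$$\frac{\partial}{\partial x}\|x\|_1 = \operatorname{sign}(x)$$
where $\operatorname{sign}(x)$ is defined componentwise as:
$$\operatorname{sign}(x)_i = \begin{cases} +1, & x_i > 0 \\ -1, & x_i < 0 \\ [-1, 1], & x_i = 0 \end{cases}$$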
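The defining property of a subgradient $g$ at $x$ is the inequality $\|y\|_1 \ge \|x\|_1 + g^\top (y - x)$ for every $y$. The following is a minimal sketch that checks this inequality on a small grid of test points; the vector `x`, the value `0.3` chosen for the zero component, and the grid itself are illustrative assumptions, not part of the original text.

```matlab
% Point with one zero component, where the L1 norm is not differentiable.
x = [2; 0; -1];
g = sign(x);      % sign(0) returns 0, which is one valid choice in [-1, 1]
g(2) = 0.3;       % hypothetical choice: any value in [-1, 1] should work

% Deterministic grid of test points y (columns of Y).
[Y1, Y2, Y3] = ndgrid(-3:1.5:3);
Y = [Y1(:) Y2(:) Y3(:)]';

% Subgradient inequality: ||y||_1 >= ||x||_1 + g' * (y - x) for all y.
lhs = sum(abs(Y), 1);
rhs = norm(x, 1) + g' * (Y - repmat(x, 1, size(Y, 2)));
ok  = all(lhs >= rhs - 1e-12);
disp(ok);
```

Replacing `g(2)` with a value outside $[-1, 1]$ makes the inequality fail for some grid points, which is why the zero component is restricted to that interval.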
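In practice we often meet a function whose expression contains several absolute-value terms. To remove the absolute values and simplify, we usually assume that the expression inside each absolute value is either > 0 or < 0. When there are many variables, however, it becomes hard to partition the space into such regions. The L1 norm above is one example:
$$\|x\|_1 = |x_1| + |x_2| + \cdots + |x_n|$$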
In the one-dimensional case:
$$|x| = \begin{cases} x, & x \ge 0 \\ -x, & x \le 0 \end{cases}$$
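In higher dimensions, however, it is hard to write out an explicit piecewise expansion like the one above. In numerical computation, apart from so-called symbolic computation, everything is evaluated at known, concrete values.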
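So once the concrete value is known, the gradient at that value is easy to determine. Take the 3-dimensional case as an example. If the value is $x_0 = [3, -2, 5]^\top$, the function expands around that point as:
$$\|x\|_1 = x_1 - x_2 + x_3$$
so its gradient is $[1, -1, 1]^\top$, i.e. $\operatorname{sign}(x_0)$.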
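The 3-dimensional example can be checked numerically. This is a small sketch that compares `sign(x0)` with a central finite-difference estimate of the gradient; the step size `h` is an assumed value, and the check only makes sense because no component of `x0` is zero.

```matlab
x0 = [3; -2; 5];
g  = sign(x0);                 % claimed gradient of ||x||_1 at x0

h    = 1e-6;                   % assumed finite-difference step
g_fd = zeros(3, 1);
for k = 1:3
    e = zeros(3, 1);
    e(k) = h;
    % Central difference along the k-th coordinate.
    g_fd(k) = (norm(x0 + e, 1) - norm(x0 - e, 1)) / (2 * h);
end

disp([g g_fd]);                % the two columns should agree
```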
2. L2 norm:
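$$\frac{\partial}{\partial x}\|x - a\|_2 = \frac{x - a}{\|x - a\|_2}$$
$$\frac{\partial}{\partial x}\,\frac{x - a}{\|x - a\|_2} = \frac{I}{\|x - a\|_2} - \frac{(x - a)(x - a)^\top}{\|x - a\|_2^3}$$
$$\frac{\partial \|x\|_2^2}{\partial x} = \frac{\partial\, (x^\top x)}{\partial x} = 2x$$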
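The first two L2-norm identities can be sanity-checked with finite differences. The sketch below assumes arbitrary values for `x` and `a` (with `x` different from `a`, since the formulas are undefined at `x = a`) and an assumed step size `h`.

```matlab
x = [1; 2; 2];                 % hypothetical test point
a = [0; 0; 1];                 % hypothetical offset, x ~= a
r = x - a;

g = r / norm(r);                               % gradient of ||x - a||_2
J = eye(3) / norm(r) - (r * r') / norm(r)^3;   % Jacobian of (x - a)/||x - a||_2

h    = 1e-6;                   % assumed finite-difference step
g_fd = zeros(3, 1);
J_fd = zeros(3, 3);
for k = 1:3
    e = zeros(3, 1);
    e(k) = h;
    rp = x + e - a;            % perturbed residuals
    rm = x - e - a;
    g_fd(k)    = (norm(rp) - norm(rm)) / (2 * h);
    J_fd(:, k) = (rp / norm(rp) - rm / norm(rm)) / (2 * h);
end

disp(max(abs(g - g_fd)));      % both errors should be close to zero
disp(max(abs(J(:) - J_fd(:))));
```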
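Example: find the partial derivatives of the following function:
$$f(W) = \frac{1}{2}\sum_{i,j \in S} \frac{1}{\gamma_{i,j}} \|w_i^\top X - w_j^\top X\|_2^2$$
where $W$ is a matrix of size $D \times L$ whose columns are the vectors $w_i$, and $X$ is a matrix of size $D \times N$, with $D$ the dimension of the feature vectors and $N$ the number of samples.
$$\begin{aligned}
\frac{\partial f(W)}{\partial w_i} &= \sum_{i,j \in S} \frac{1}{\gamma_{i,j}} (w_i^\top X - w_j^\top X) \cdot \frac{\partial (w_i^\top X - w_j^\top X)}{\partial w_i} \\
&= \sum_{i,j \in S} \frac{1}{\gamma_{i,j}} (w_i^\top X - w_j^\top X)\, X^\top \\
&= \sum_{i,j \in S} \frac{1}{\gamma_{i,j}} (w_i^\top - w_j^\top)(X X^\top)
\end{aligned}$$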
Note that this result is a row vector, so it still has to be transposed to obtain the gradient as a column vector.
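For reference, the transposed form can be written out explicitly. This is only a sketch of that last step, using $(AB)^\top = B^\top A^\top$ and the symmetry of $X X^\top$; the index set and weights are the same as in the definition of $f(W)$.

$$\left(\frac{\partial f(W)}{\partial w_i}\right)^{\top} = \sum_{i,j \in S} \frac{1}{\gamma_{i,j}} (X X^\top)^\top (w_i - w_j) = \sum_{i,j \in S} \frac{1}{\gamma_{i,j}}\, X X^\top (w_i - w_j)$$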
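Taking the partial derivative with respect to $w_j$:
$$\begin{aligned}
\frac{\partial f(W)}{\partial w_j} &= \sum_{i,j \in S} \frac{1}{\gamma_{i,j}} (w_j^\top X - w_i^\top X) \cdot \frac{\partial (w_j^\top X - w_i^\top X)}{\partial w_j} \\
&= \sum_{i,j \in S} \frac{1}{\gamma_{i,j}} (w_j^\top X - w_i^\top X)\, X^\top \\
&= \sum_{i,j \in S} \frac{1}{\gamma_{i,j}} (w_j^\top - w_i^\top)(X X^\top)
\end{aligned}$$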
Verification in Matlab on low-dimensional data:
% verify
syms x11 x12 x21 x22 x31 x32 wi1 wi2 wi3 wj1 wj2 wj3 real;% define symbols as real value
%method 1
X = [x11 x12;x21 x22;x31 x32]; % X is 3*2: two samples, feature dimension 3
wi = [wi1 wi2 wi3]';
wj = [wj1 wj2 wj3]';
fw = 1/2*norm(wi'*X-wj'*X,2).^2;
grad_wi1 = diff(fw,wi1); %\partial wi1
grad_wi2 = diff(fw,wi2); %\partial wi2
grad_wi3 = diff(fw,wi3); %\partial wi3
grad_wi0 =[grad_wi1;grad_wi2;grad_wi3];
% method 2
grad_wi = (wi'*X-wj'*X)*X';
grad_wi = grad_wi';
clc;
disp('method 1:');
disp(grad_wi0);
disp('method 2:');
disp(grad_wi);
disp('%%%%%%%%%%%%%%%%%%%%%% \partial wj %%%%%%%%%%%%%%%%%%');
% method 1
grad_wj1 = diff(fw,wj1);
grad_wj2 = diff(fw,wj2);
grad_wj3 = diff(fw,wj3);
grad_wj0 = [grad_wj1;grad_wj2;grad_wj3];
% method2
grad_wj = (wj'*X- wi'*X)*X';
grad_wj = grad_wj';
disp('method 1:');
disp(grad_wj0);
disp('method 2:');
disp(grad_wj);
Output:
method 1:
x11*abs(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31)*sign(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31) + x12*abs(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)*sign(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)
x21*abs(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31)*sign(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31) + x22*abs(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)*sign(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)
x31*abs(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31)*sign(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31) + x32*abs(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)*sign(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)
method 2:
x11*(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31) + x12*(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)
x21*(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31) + x22*(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)
x31*(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31) + x32*(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)
%%%%%%%%%%%%%%%%%%%%%% \partial wj %%%%%%%%%%%%%%%%%%
method 1:
- x11*abs(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31)*sign(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31) - x12*abs(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)*sign(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)
- x21*abs(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31)*sign(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31) - x22*abs(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)*sign(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)
- x31*abs(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31)*sign(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31) - x32*abs(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)*sign(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)
method 2:
- x11*(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31) - x12*(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)
- x21*(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31) - x22*(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)
- x31*(wi1*x11 + wi2*x21 + wi3*x31 - wj1*x11 - wj2*x21 - wj3*x31) - x32*(wi1*x12 + wi2*x22 + wi3*x32 - wj1*x12 - wj2*x22 - wj3*x32)
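The two methods print different-looking expressions because differentiating `norm(...)^2` symbolically leaves factors of the form `abs(u)*sign(u)`, which equals `u` for real `u`, so the results agree. As a cross-check that does not need the Symbolic Math Toolbox, the sketch below compares the analytic gradients with central finite differences. The numeric values of `X`, `wi`, `wj` and the step `h` are assumptions chosen for illustration, and, as in the listing above, a single pair is used with the weight $1/\gamma_{i,j}$ omitted.

```matlab
X  = [1 2; -1 0.5; 3 -2];      % hypothetical data: 3 features, 2 samples
wi = [0.5; -1; 2];             % hypothetical weight vectors
wj = [1; 0.3; -0.7];

f = @(wi, wj) 0.5 * norm(wi' * X - wj' * X, 2)^2;

% Analytic gradients, transposed into column vectors.
grad_wi = ((wi' - wj') * (X * X'))';
grad_wj = ((wj' - wi') * (X * X'))';

h     = 1e-6;                  % assumed finite-difference step
fd_wi = zeros(3, 1);
fd_wj = zeros(3, 1);
for k = 1:3
    e = zeros(3, 1);
    e(k) = h;
    fd_wi(k) = (f(wi + e, wj) - f(wi - e, wj)) / (2 * h);
    fd_wj(k) = (f(wi, wj + e) - f(wi, wj - e)) / (2 * h);
end

disp(max(abs(grad_wi - fd_wi)));   % both errors should be close to zero
disp(max(abs(grad_wj - fd_wj)));
```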
References:
- The Matrix Cookbook