Multivariable Calculus 10-14

10. Second Derivative Test

A function's global extrema can occur at critical points, on the boundary of the domain, or at infinity.
A critical point can be a local maximum, a local minimum, or a saddle point; its type is determined by the second derivatives.
Writing the second partial derivatives of f as

$$A = f_{xx}, \quad B = f_{xy}, \quad C = f_{yy}$$

the test reads:

$$\begin{aligned}
AC - B^2 > 0 \text{ and } A > 0 &\Rightarrow \text{local minimum} \\
AC - B^2 > 0 \text{ and } A < 0 &\Rightarrow \text{local maximum} \\
AC - B^2 < 0 &\Rightarrow \text{saddle point} \\
AC - B^2 = 0 &\Rightarrow \text{test is inconclusive}
\end{aligned}$$

Rough derivation:
Suppose f is the quadratic form

$$f = ax^2 + bxy + cy^2$$

Its first partial derivatives are

$$f_x = \frac{\partial f}{\partial x} = 2ax + by, \qquad f_y = \frac{\partial f}{\partial y} = bx + 2cy$$

and its second partial derivatives are

$$A = f_{xx} = 2a, \qquad B = f_{xy} = f_{yx} = b, \qquad C = f_{yy} = 2c$$

Completing the square (for $a \neq 0$) gives $f = a\left(x + \frac{b}{2a}y\right)^2 + \frac{4ac - b^2}{4a}y^2$, so the signs of $4ac - b^2$ and $a$ decide whether the origin is a minimum, maximum, or saddle:

$$\begin{aligned}
4ac - b^2 > 0 \text{ and } a > 0 &\Rightarrow \text{local minimum} \\
4ac - b^2 > 0 \text{ and } a < 0 &\Rightarrow \text{local maximum} \\
4ac - b^2 < 0 &\Rightarrow \text{saddle point} \\
4ac - b^2 = 0 &\Rightarrow \text{test is inconclusive}
\end{aligned}$$

Substituting $A = 2a$, $B = b$, $C = 2c$ turns $4ac - b^2$ into $AC - B^2$, recovering the general test above.
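To see the test in action, here is a minimal MATLAB sketch (assuming the Symbolic Math Toolbox; the function $f = x^2 + xy + y^2$ and the critical point $(0,0)$ are just illustrations):

    syms x y
    f = x^2 + x*y + y^2;                 % example function; any smooth f works
    A = diff(f, x, 2);                   % f_xx
    B = diff(diff(f, x), y);             % f_xy
    C = diff(f, y, 2);                   % f_yy
    % Evaluate AC - B^2 and A at the critical point (0, 0)
    D = double(subs(A*C - B^2, {x, y}, {0, 0}));
    a = double(subs(A, {x, y}, {0, 0}));
    if D > 0 && a > 0
        disp('local minimum');
    elseif D > 0 && a < 0
        disp('local maximum');
    elseif D < 0
        disp('saddle point');
    else
        disp('test is inconclusive');
    end

Here $D = 3 > 0$ and $A = 2 > 0$, so the script reports a local minimum.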

11. Differentials and the Chain Rule

Total differential: the proper name for the differential of a multivariable function; it accounts for every variable that can change the function's value.

$$\begin{aligned}
f &= f(x,y,z) \\
\mathrm{d}f &= f_x\,\mathrm{d}x + f_y\,\mathrm{d}y + f_z\,\mathrm{d}z
= \frac{\partial f}{\partial x}\mathrm{d}x + \frac{\partial f}{\partial y}\mathrm{d}y + \frac{\partial f}{\partial z}\mathrm{d}z
\end{aligned}$$

$$\Delta f \approx f_x\,\Delta x + f_y\,\Delta y + f_z\,\Delta z$$

Important: $\mathrm{d}f \neq \Delta f$. $\mathrm{d}f$ is a limit object, while $\Delta f$ is an actual quantity: when x, y, z change, $\Delta f$ is the resulting change in the function's value. As the changes tend to 0, the $\approx$ becomes $=$ and $\Delta f$ becomes $\mathrm{d}f$.
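A quick numeric sanity check of $\Delta f \approx f_x\,\Delta x + f_y\,\Delta y + f_z\,\Delta z$ in MATLAB (the function $f = xy + z^2$, the base point, and the increments are arbitrary illustrations):

    % Compare the exact change Delta f with the linear approximation df
    f = @(x, y, z) x.*y + z.^2;
    x0 = 1; y0 = 2; z0 = 3;
    fx = y0; fy = x0; fz = 2*z0;                        % partials of f at (x0, y0, z0)
    dx = 0.01; dy = -0.02; dz = 0.005;
    delta_f = f(x0+dx, y0+dy, z0+dz) - f(x0, y0, z0);   % exact change
    df = fx*dx + fy*dy + fz*dz;                          % total differential estimate
    fprintf('Delta f = %.6f, df = %.6f\n', delta_f, df);

The two values differ only in the fourth decimal place, and the gap shrinks as the increments shrink.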


Chain rule:

$$\begin{aligned}
& f = f(x,y), \quad x = x(u,v), \quad y = y(u,v) \\
\mathrm{d}f &= f_x\,\mathrm{d}x + f_y\,\mathrm{d}y \\
&= f_x (x_u\,\mathrm{d}u + x_v\,\mathrm{d}v) + f_y (y_u\,\mathrm{d}u + y_v\,\mathrm{d}v) \\
&= \left(\frac{\partial f}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial u}\right)\mathrm{d}u
 + \left(\frac{\partial f}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial v}\right)\mathrm{d}v \\
\Rightarrow \frac{\partial f}{\partial u} &= \frac{\partial f}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial u}
\end{aligned}$$

Note: the $\partial$ symbols cannot be cancelled like fractions, because these are partial derivatives, not ordinary derivatives (with ordinary derivatives such cancellation does work).
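As a sanity check, the chain rule can be verified symbolically in MATLAB (assuming the Symbolic Math Toolbox; the choice $f = x^2 y$ with polar-style substitutions is only an illustration):

    % Verify df/du = f_x * x_u + f_y * y_u for f = x^2 * y,
    % with x = u*cos(v), y = u*sin(v)
    syms u v xs ys
    x = u*cos(v);  y = u*sin(v);
    direct = diff(subs(xs^2*ys, {xs, ys}, {x, y}), u);   % substitute first, then differentiate
    fx = subs(diff(xs^2*ys, xs), {xs, ys}, {x, y});      % f_x evaluated at (x(u,v), y(u,v))
    fy = subs(diff(xs^2*ys, ys), {xs, ys}, {x, y});      % f_y evaluated there
    chain = fx*diff(x, u) + fy*diff(y, u);
    simplify(direct - chain)                              % prints 0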

Partial differential: the differential taken with respect to a single variable only.

12. Gradient, Directional Derivative, Tangent Plane

Partial derivatives tell us how sensitive f is to a change in each variable.
Rewriting the derivative along a curve $\vec{r}(t)$ in a new, vector form:

$$\frac{\mathrm{d}w}{\mathrm{d}t} = w_x\frac{\mathrm{d}x}{\mathrm{d}t} + w_y\frac{\mathrm{d}y}{\mathrm{d}t} = \nabla w \cdot \frac{\mathrm{d}\vec{r}}{\mathrm{d}t}$$
Gradient:

$$\nabla w = \langle w_x, w_y \rangle, \qquad \frac{\mathrm{d}\vec{r}}{\mathrm{d}t} = \left\langle \frac{\mathrm{d}x}{\mathrm{d}t}, \frac{\mathrm{d}y}{\mathrm{d}t} \right\rangle$$
Directional derivative (for a unit vector $\vec{u}$, with $\theta$ the angle between $\nabla w$ and $\vec{u}$):

$$\left.\frac{\mathrm{d}w}{\mathrm{d}s}\right|_{\vec{u}} = \nabla w \cdot \vec{u} = |\nabla w| \cos\theta$$
Meaning of the gradient vector: it packages all the partial derivatives of a function into a single vector.
Property: 1. the gradient vector is perpendicular to the tangent plane of the function's level surface, i.e. it is the normal vector of that tangent plane (think of moving up or down one dimension: a level surface of w(x, y, z) sits in 3-D while the graph of w sits in 4-D).
Applications: 1. finding tangent plane equations; 2. directional derivatives: the function value changes fastest when the direction vector makes angle 0 with the gradient, i.e. f increases fastest along the gradient direction.
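For example, take the level surface $w = x^2 + y^2 + z^2 = 3$ at the point $(1, 1, 1)$: there $\nabla w = \langle 2x, 2y, 2z \rangle = \langle 2, 2, 2 \rangle$, so the tangent plane is $2(x-1) + 2(y-1) + 2(z-1) = 0$, i.e. $x + y + z = 3$.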

The directional derivative is a number (a slope), while the gradient is a vector; the directional derivative at a point is the slope of the curve cut out of the graph by a vertical plane through that point (see Appendix 3).
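A small numeric illustration in MATLAB, using the same surface $w = x^2 + y^2$ and point $(1, -1)$ as Appendix 3 (the angle $\theta$ is an arbitrary choice):

    % Directional derivative of w = x^2 + y^2 at (1, -1)
    grad_w = [2*1, 2*(-1)];              % gradient (2x, 2y) at (1, -1)
    theta = pi/3;                        % direction angle with the x-axis
    u = [cos(theta), sin(theta)];        % unit direction vector
    dw_ds = dot(grad_w, u);              % nabla w . u
    fprintf('dw/ds = %.4f, max rate = %.4f\n', dw_ds, norm(grad_w));

The maximum rate of change equals $|\nabla w| = 2\sqrt{2}$ and is attained when $\vec{u}$ points along the gradient.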

Partial derivatives can solve many physics problems; many physical laws are described by partial differential equations (equations built from the partial derivatives of an unknown function).

13. Lagrange Multipliers

Lagrange multipliers (Lagrange Multipliers): a method for minimizing or maximizing a multivariable function subject to a constraint. For example, minimize or maximize f(x, y) when x and y are not independent; the dependence may take the form g(x, y) = C. The method is most useful when the constraint is too complicated to solve for one variable and substitute.

The extrema here cannot be found simply from critical points of f, because critical points usually fail to satisfy the constraint; for the same reason, plain least squares or gradient descent cannot be applied directly.

At a constrained extremum the two gradients are parallel:

$$\nabla f = \lambda \nabla g \quad (\text{i.e. } \nabla f \parallel \nabla g)$$
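The resulting system (the gradient equations plus the constraint) can be solved symbolically. A minimal MATLAB sketch, assuming the Symbolic Math Toolbox and using the same example as Appendix 4 (minimize $f = x^2 + y^2$ subject to $xy = 3$):

    % Solve nabla f = lambda * nabla g together with g = 3
    syms x y lambda
    f = x^2 + y^2;
    g = x*y;
    eqs = [gradient(f, [x, y]) == lambda*gradient(g, [x, y]); g == 3];
    sol = solve(eqs, [x, y, lambda]);
    [sol.x, sol.y]        % candidates: (sqrt(3), sqrt(3)) and (-sqrt(3), -sqrt(3))

Both solutions give the constrained minimum value $f = 6$; the Lagrange condition only finds candidates, so the type of each point still has to be checked.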

Question: how do you find the extrema of a multivariable linear function?

14. Non-independent Variables

  1. Rate of change between constrained variables (moving along a level surface):
    $$g = g(x,y,z) = C$$
    $$\mathrm{d}g = g_x\,\mathrm{d}x + g_y\,\mathrm{d}y + g_z\,\mathrm{d}z = 0$$
    $$\mathrm{d}z = -\frac{g_x}{g_z}\,\mathrm{d}x - \frac{g_y}{g_z}\,\mathrm{d}y
    \;\Rightarrow\; \frac{\partial z}{\partial x} = -\frac{g_x}{g_z}, \quad \frac{\partial z}{\partial y} = -\frac{g_y}{g_z}$$
  2. Constrained partial derivatives, e.g. for f(x,y,z) where g(x,y,z)=C, compute $\left(\frac{\partial f}{\partial z}\right)_y$ (y held fixed) by differentials and the chain rule, as in the sketch after this list:
    $$\left(\frac{\partial f}{\partial z}\right)_y = \frac{\partial f}{\partial x}\left(\frac{\partial x}{\partial z}\right)_y + \frac{\partial f}{\partial y}\left(\frac{\partial y}{\partial z}\right)_y + \frac{\partial f}{\partial z}\left(\frac{\partial z}{\partial z}\right)_y$$
    Since y is held fixed, $\left(\frac{\partial y}{\partial z}\right)_y = 0$ and $\left(\frac{\partial z}{\partial z}\right)_y = 1$, so this reduces to $f_x \left(\frac{\partial x}{\partial z}\right)_y + f_z$, with $\left(\frac{\partial x}{\partial z}\right)_y = -g_z/g_x$ by the same argument as in item 1.
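A concrete symbolic computation of $\left(\frac{\partial f}{\partial z}\right)_y$ in MATLAB, assuming the Symbolic Math Toolbox; the choices $f = xyz$ and $g = x^2 + y^2 + z^2$ are only illustrations:

    % Constrained partial (df/dz)_y with f = x*y*z on the level surface
    % g = x^2 + y^2 + z^2 = C (y held fixed, x treated as dependent)
    syms x y z
    f = x*y*z;
    g = x^2 + y^2 + z^2;
    dxdz = -diff(g, z)/diff(g, x);               % (dx/dz)_y = -g_z/g_x = -z/x
    df_dz_y = diff(f, x)*dxdz + diff(f, z);      % f_x * (dx/dz)_y + f_z
    simplify(df_dz_y)                             % equals x*y - (y*z^2)/x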

Appendices

Appendix 1. The gradient vector is perpendicular to the tangent plane of the level surface

Gradient vectors on a level surface:

    clear; clc; clf;
    % Level surface f(x, y, z) = 0 with f = x^3 + 3y^2 + 2xy + z^2 - 1
    f = @(x, y, z) x.^3 + 3*y.^2 + 2.*x.*y + z.^2 - 1;
    fimplicit3(f);
    hold on;
    % grad f = (3x^2 + 2y, 6y + 2x, 2z); at (0, 0, 1) it is (0, 0, 2)
    quiver3(0, 0, 1, 0, 0, 2);
    xlabel('x'); ylabel('y'); zlabel('z');
    % at (1, 0, 0) the gradient is (3, 2, 0)
    quiver3(1, 0, 0, 3, 2, 0);
    axis vis3d;

Appendix 2. Gradient Descent

Suppose the model is a line and the loss is the mean squared error:

$$y = \beta_0 + \beta_1 x$$

$$L(\beta) = \frac{1}{N} \sum_{j=1}^N (\beta_0 + \beta_1 x_j - y_j)^2$$

$$\nabla L = \left(\frac{\partial L}{\partial \beta_0}, \frac{\partial L}{\partial \beta_1}\right)
= \left(\frac{2}{N} \sum_{j=1}^N (\beta_0 + \beta_1 x_j - y_j),\; \frac{2}{N} \sum_{j=1}^N (\beta_0 + \beta_1 x_j - y_j)\, x_j\right)$$

Steps of gradient descent:

  1. At $i = 0$, choose an initial point $\beta^0 = (\beta_0^0, \beta_1^0)$, a step size (also called the learning rate) $\alpha$, and a tolerance tol for terminating the iteration.
  2. Compute the gradient $\nabla L_{\beta^i}$ of the objective $L(\beta)$ at the point $(\beta_0^i, \beta_1^i)$.
  3. Compute $\beta^{i+1}$ by the update:
    $$\beta^{i+1} = \beta^i - \alpha \nabla L_{\beta^i}$$
  4. Compute the gradient $\nabla L_{\beta^{i+1}}$; if its 2-norm satisfies $\|\nabla L_{\beta^{i+1}}\|_2 \leq \text{tol}$, stop and take $\beta^{i+1}$ as the optimum; otherwise set $i = i + 1$ and return to step 3.


% Gradient descent
function gd()
clear
clc
clf

% Training data
X = 1:9;
Y = [1 2 6 7 9 12 13 15 20];
% X = 1:9;
% Y = [742 400 388 762 821 876 854 793 327];

% Initial settings
beta = [1, 1];
alpha = 0.2;
tol_L = 0.01;
batch_size = 4;

% Normalize X into (0, 1]
max_x = max(X);
X = X / max_x;

subplot(1, 2, 1);
% syms beta_0 beta_1;
% L = 1/length(x)*sum((beta_0 + beta_1*x - y)^2);
% L = mean((beta_0 + beta_1.*X - Y).^2);
[bb_0, bb_1] = meshgrid(-15:.5:15);
LL = bb_0;                  % preallocate the loss grid
[m, n] = size(bb_0);
for i = 1:m
    for j = 1:n
        % LL(i,j) = subs(L, {beta_0, beta_1}, {bb_0(i,j), bb_1(i,j)});
        LL(i,j) = rmse([bb_0(i,j), bb_1(i,j)], X, Y);
    end
end
mesh(bb_0, bb_1, LL);

% First computation before the loop
% grad = compute_grad(beta, X, Y);
% grad = compute_grad_SGD(beta, X, Y);
grad = compute_grad_batch(beta, batch_size, X, Y);
loss = rmse(beta, X, Y);
% Plotting begin
hold on
plot3(beta(1), beta(2), loss, 'ro');
quiver3(beta(1), beta(2), loss, -grad(1), -grad(2), 0, 'Color', 'r');
subplot(1, 2, 2);
plot(X, Y, 'o');
hold on
XA = 0:0.01:1.2;
YA = beta(1) + beta(2) .* XA;
plot(XA, YA);
% Plotting end
beta = update_beta(beta, alpha, grad);
% grad = compute_grad(beta, X, Y);
% grad = compute_grad_SGD(beta, X, Y);
grad = compute_grad_batch(beta, batch_size, X, Y);
loss_new = rmse(beta, X, Y);

% Start iterating: stop when the change in RMSE drops below tol_L
% (a difference-based criterion, rather than the gradient-norm test in step 4 above)
i = 1;
while abs(loss_new - loss) > tol_L
    % Plotting
    subplot(1, 2, 1);
    plot3(beta(1), beta(2), loss_new, 'bo');
    quiver3(beta(1), beta(2), loss_new, -grad(1), -grad(2), 0, 'Color', 'r');
    subplot(1, 2, 2);
    plot(X, Y, 'o');
    hold on
    XA = 0:0.01:1.2;
    YA = beta(1) + beta(2) .* XA;
    plot(XA, YA);
    axis([0 1.2 0 25]);
    hold off
    % subplot(2, 2, [3 4]);
    % hold on
    % plot(i, abs(loss_new - loss), 'or');
    
    % M(i) = getframe;
    getframe;
    
    beta = update_beta(beta, alpha, grad);
    % grad = compute_grad(beta, X, Y);
    % loss = loss_new;
    % loss_new = rmse(beta, X, Y);
    % fprintf('Round %d Diff RMSE %f\n', i, abs(loss_new - loss));
    % grad = compute_grad_SGD(beta, X, Y);
    grad = compute_grad_batch(beta, batch_size, X, Y);
    if mod(i, 2) == 0
        loss = loss_new;
        loss_new = rmse(beta, X, Y);
        fprintf('Round %d Diff RMSE %f\n', i, abs(loss_new - loss));
    end
    i = i + 1;
end
fprintf('Coef: %f, Intercept: %f\n', beta(2), beta(1));              % on the normalized X scale
fprintf('Our Coef: %f, Intercept: %f\n', beta(2) / max_x, beta(1));  % rescaled to the original X
res = rmse(beta, X, Y);
fprintf('Our RMSE: %f\n', res);

end

% Gradient over the full data set
% Trade-off: stable, but slow
function grad = compute_grad(beta, x, y)
grad = [0, 0];
grad(1) = 2 .* mean(beta(1) + beta(2) .* x - y);
grad(2) = 2 .* mean((beta(1) + beta(2) .* x - y) .* x);
end

% Stochastic gradient: a single randomly chosen sample
% Trade-off: fast, but less stable
function grad = compute_grad_SGD(beta, x, y)
grad = [0, 0];
r = randperm(length(x), 1);
grad(1) = 2 .* mean(beta(1) + beta(2) .* x(r) - y(r));
grad(2) = 2 .* mean((beta(1) + beta(2) .* x(r) - y(r)) .* x(r));
end

% Mini-batch stochastic gradient
% A compromise between speed and stability
function grad = compute_grad_batch(beta, batch_size, x, y)
grad = [0, 0];
r = randperm(length(x), batch_size);
grad(1) = 2 .* mean(beta(1) + beta(2) .* x(r) - y(r));
grad(2) = 2 .* mean((beta(1) + beta(2) .* x(r) - y(r)) .* x(r));
end

% Update beta by stepping against the gradient
function new_beta = update_beta(beta, alpha, grad)
new_beta = beta - alpha .* grad;
end

% RMSE (Root Mean Squared Error)
function res = rmse(beta, x, y)
squared_err = (beta(1) + beta(2) .* x - y).^2;
res = sqrt(mean(squared_err));
end

Appendix 3. Directional Derivative and Gradient Vector


    clear; clc; clf;
    % Plot the paraboloid z = x^2 + y^2 as a 3-D implicit surface
    f = @(x, y, z) x.^2 + y.^2 - z;
    fimplicit3(f);
    % Gradient at the point (1, -1)
    hold on;
    syms x y;
    z = x^2 + y^2;
    gradz = gradient(z);
    gradzv = double(subs(gradz, {x, y}, {1, -1}));  % quiver needs numeric input
    quiver(1, -1, gradzv(1), gradzv(2));
    % Vertical plane through (1, -1) whose horizontal normal makes the angle
    % theta with the x-axis; its intersection with the surface is the curve
    % whose slope is a directional derivative at (1, -1)
    theta = pi*3/4;
    f_2 = @(x, y, z) x*cos(theta) + y*sin(theta) - (cos(theta) - sin(theta));
    fimplicit3(f_2);
    % Axis configuration
    axis vis3d;
    xlabel('x'); ylabel('y'); zlabel('z');

Appendix 4. Lagrange Multipliers


% With a constraint, the problem amounts to minimizing f along the
% intersection curve of two surfaces in 3-D
clear
clc
clf

subplot(1, 2, 1);

% Constraint xy = 3
f = @(x, y, z) x .* y - 3;
fimplicit3(f, [-10 10 -10 10 0 60]);

hold on
f = @(x, y, z) x.^2 + y.^2 - z;
fimplicit3(f);

% The gradient vectors of f and g are parallel at the constrained minimum
quiver(-sqrt(3), -sqrt(3), -2*sqrt(3), -2*sqrt(3));   % grad f = (2x, 2y)
quiver(-sqrt(3), -sqrt(3), -sqrt(3), -sqrt(3));       % grad g = (y, x)

xlabel('x');
ylabel('y');
zlabel('z');

subplot(1, 2, 2);

% Intersection of the vertical surface xy = 3 with the paraboloid z = x^2 + y^2
x = 0.01:0.01:10;
y = 3./x;
z = x.^2 + y.^2;
plot3(x, y, z);
hold on
plot(x, y);

x = -10:0.01:-0.01;
y = 3./x;
z = x.^2 + y.^2;
plot3(x, y, z);
plot(x, y);


axis([-10 10 -10 10 0 60]);

xlabel('x');
ylabel('y');
zlabel('z');