[Self-Study] Deep Learning from Scratch: Theory and Implementation with Python, LESSON 5 <Learning in Neural Networks, Part 2>

Table of Contents

Preface

I. Derivatives and Partial Derivatives

II. Gradients

1. Implementing the gradient in code

2. Finding the minimum with gradient descent

Summary


Preface

This lesson first works out how to compute derivatives and partial derivatives, and then moves on to numerical differentiation, gradients, and the implementation of the learning algorithm.


I. Derivatives and Partial Derivatives

Reference: https://zhuanlan.zhihu.com/p/465958129. In Python, symbolic differentiation can be done with the diff() function from the sympy library:

Code example: partial derivatives

import numpy as np
import sympy as sp

# define symbolic variables and the target function z(x, y)
x, y = sp.symbols('x y')
z = sp.sin(2 * sp.pi * x + 2 * y / 5)
# partial derivatives of z with respect to x and y
zx = sp.diff(z, x)
zy = sp.diff(z, y)
print(zx)
print(zy)

Result:

2*pi*cos(2*pi*x + 2*y/5)
2*cos(2*pi*x + 2*y/5)/5

In other words, the derivatives obtained here are just symbolic expressions and cannot be evaluated on numerical data directly. How do we fix this?

Substitute values for x and y using evalf(), then convert the results with float(); only then can they be used in numerical computations with numpy.

Code example:

import numpy as np
import sympy as sp

x, y = sp.symbols('x y')
z = sp.sin(2 * sp.pi * x + 2 * y / 5)
zx = sp.diff(z, x)
zy = sp.diff(z, y)
x1 = 10
y1 = 5
# substitute the numerical values with evalf() and convert the results to Python floats
z_x1 = float(zx.evalf(subs = {x:x1, y:y1}))
z_y1 = float(zy.evalf(subs = {x:x1, y:y1}))
print(z_x1)
print(z_y1)

Result:

-2.61472768902227
-0.16645873461885696

What should we do if x and y are not single values but arrays?

Code example:

import numpy as np
import sympy as sp

x, y = sp.symbols('x y')
z = sp.sin(2 * sp.pi * x + 2 * y / 5)
zx = sp.diff(z, x)
zy = sp.diff(z, y)

# build the input arrays
x_array = np.linspace(-10, 10, 20)
y_array = np.linspace(-10, 10, 20)

# empty lists to store the partial-derivative values
temp_x = []
temp_y = []
for i in range(20):
    z_x1 = float(zx.evalf(subs = {x:x_array[i], y:y_array[i]}))
    temp_x.append(z_x1)
    z_y1 = float(zy.evalf(subs = {x:x_array[i], y:y_array[i]}))
    temp_y.append(z_y1)

# convert the lists to arrays
zx_array = np.array(temp_x)
zy_array = np.array(temp_y)

print(zx_array)
print(zy_array)

Result:

[-4.10696399 -6.2474788  -5.02056763 -1.08754245  3.43167418  6.10119926
  5.48214385  1.90818227 -2.6943002  -5.84453982 -5.84453982 -2.6943002
  1.90818227  5.48214385  6.10119926  3.43167418 -1.08754245 -5.02056763
 -6.2474788  -4.10696399]
[-0.26145745 -0.39772685 -0.31961926 -0.0692351   0.21846716  0.38841441
  0.34900412  0.12147866 -0.17152448 -0.37207496 -0.37207496 -0.17152448
  0.12147866  0.34900412  0.38841441  0.21846716 -0.0692351  -0.31961926
 -0.39772685 -0.26145745]

With that, derivatives and partial derivatives are sorted out!
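Incidentally, the evalf() loop above can be avoided: sympy's lambdify can turn a symbolic expression into a numpy-aware function that accepts whole arrays at once. A minimal sketch (same z as above, using the 'numpy' backend of lambdify):

import numpy as np
import sympy as sp

x, y = sp.symbols('x y')
z = sp.sin(2 * sp.pi * x + 2 * y / 5)

# lambdify converts the symbolic derivatives into functions that operate on numpy arrays,
# so the whole array can be evaluated in one call instead of looping with evalf()
fzx = sp.lambdify((x, y), sp.diff(z, x), 'numpy')
fzy = sp.lambdify((x, y), sp.diff(z, y), 'numpy')

x_array = np.linspace(-10, 10, 20)
y_array = np.linspace(-10, 10, 20)
print(fzx(x_array, y_array))   # should match zx_array above
print(fzy(x_array, y_array))   # should match zy_array above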

II. Gradients

When a neural network learns, it must find the optimal parameters, i.e. the parameters at which the loss function takes its minimum value. The usual approach is to make clever use of the gradient to search for the minimum of the function.

Note, however, that the gradient only indicates, at each point, the direction in which the function value decreases the most. There is therefore no guarantee that the direction of the gradient points at the function's minimum, or that it is really the direction we should move in. In fact, for complex functions the gradient direction usually does not point at the place where the function value is smallest.

Even so, although the gradient does not necessarily point at the minimum, moving along it reduces the function value as much as possible locally. So when searching for the location of the function's minimum (or of a value as small as possible), we use the gradient as a clue to decide which direction to move in.

In neural networks (deep learning), "gradient method" generally means gradient descent.
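As a concrete example, take f(x0, x1) = x0² + x1² (this is the function_2 that appears in the code below). Its gradient is (∂f/∂x0, ∂f/∂x1) = (2·x0, 2·x1), so at the point (3, 4) the gradient is (6, 8). Moving in the opposite direction, −(6, 8), decreases f fastest at that point and heads toward the minimum at the origin (0, 0).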

1. Implementing the gradient in code

Code implementation:

import numpy as np

def numerical_gradient(f, x):
    h = 1e-4    # small step for the central difference
    grad = np.zeros_like(x)
    
    for idx in range(x.size):
        tmp_val = x[idx]
        # compute f(x+h)
        x[idx] = tmp_val + h
        fxh1 = f(x)
        
        # compute f(x-h)
        x[idx] = tmp_val - h
        fxh2 = f(x)
        
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp_val    # restore the original value
    return grad


def function_2(x):
    return x[0]**2 + x[1]**2

output = numerical_gradient(function_2, np.array([3.0, 4.0]))
print(output)

Result:

The partial-derivative code from Part I can also be used:

import numpy as np
import sympy as sp

x, y = sp.symbols('x y')
z = x ** 2 + y ** 2
zx = sp.diff(z, x)
zy = sp.diff(z, y)
x1 = 3
y1 = 4
z_x1 = float(zx.evalf(subs = {x:x1, y:y1}))
z_y1 = float(zy.evalf(subs = {x:x1, y:y1}))
grad = [z_x1, z_y1]
print(grad)

Result:

2. Finding the minimum with gradient descent

Expressed as a formula, gradient descent updates the parameters as follows:

x0 ← x0 − η·∂f/∂x0
x1 ← x1 − η·∂f/∂x1

Here η (eta) is the learning rate. The learning rate determines how much is learned in one step, i.e. to what extent the parameters are updated.

Code implementation:

import numpy as np

def numerical_gradient(f, x):
    h = 1e-4
    grad = np.zeros_like(x)
    
    for idx in range(x.size):
        tmp_val = x[idx]
        # compute f(x+h)
        x[idx] = tmp_val + h
        fxh1 = f(x)
        
        # compute f(x-h)
        x[idx] = tmp_val - h
        fxh2 = f(x)
        
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp_val
    return grad

def gradient_descent(f, init_x, lr, step_number):
    x = init_x
    
    for i in range(step_number):
        grad = numerical_gradient(f, x)
        x -= lr * grad    # step against the gradient
        
    return x

def function_2(x):
    return x[0]**2 + x[1]**2

data = np.array([-3.0, 4.0])
output = gradient_descent(f = function_2, init_x = data, lr = 0.1, step_number = 100)
print(output)
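A side note on this implementation (my own observation, not something emphasised in the book excerpt): since x = init_x and x -= lr * grad modify the array in place, the array passed in ends up holding the final value as well:

# gradient_descent works on init_x in place, so after the call `data` has been overwritten too
print(data)             # same values as output
print(data is output)   # True: output is the very same array object as data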

Result:

The partial-derivative code from Part I, slightly adapted, also works:

import numpy as np
import sympy as sp

def numerical_gradient(x1, x2):
    x, y = sp.symbols('x y')
    z = x**2 + y**2
    zx = sp.diff(z, x)
    zy = sp.diff(z, y)
    z_x1 = float(zx.evalf(subs = {x:x1, y:x2}))
    z_y1 = float(zy.evalf(subs = {x:x1, y:x2}))
    grad = [z_x1, z_y1]
    return grad

def gradient_descent(x1, x2, lr, step_number):
    data_array = np.array([x1, x2])

    for i in range(step_number):
        grad = numerical_gradient(x1, x2)
        grad_array = np.array(grad)
        data_array -= lr * grad_array
        x1 = data_array[0]
        x2 = data_array[1]
    return data_array

output = gradient_descent(-3.0, 4.0, 0.1, 100)
print(output)

Result:

As you can see, both versions give the same result!
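For this particular function the behaviour can also be checked by hand: the update is x ← x − lr·2x = (1 − 2·lr)·x, so with lr = 0.1 every step simply multiplies x by 0.8. A minimal check of where 100 such steps end up (assuming the same lr = 0.1, 100 steps and starting point as above):

import numpy as np

# closed form: after 100 steps x = init * (1 - 2*lr)**100 = init * 0.8**100
init = np.array([-3.0, 4.0])
print(init * (1 - 2 * 0.1) ** 100)   # ≈ [-6.1e-10  8.1e-10], essentially the minimum (0, 0)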

Note:

A learning rate that is too large or too small will not produce good results. Let's do an experiment:

import numpy as np

def numerical_gradient(f, x):
    h = 1e-4
    grad = np.zeros_like(x)
    
    for idx in range(x.size):
        tmp_val = x[idx]
        # compute f(x+h)
        x[idx] = tmp_val + h
        fxh1 = f(x)
        
        # compute f(x-h)
        x[idx] = tmp_val - h
        fxh2 = f(x)
        
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp_val
    return grad

def gradient_descent(f, init_x, lr, step_number):
    x = init_x
    
    for i in range(step_number):
        grad = numerical_gradient(f, x)
        x -= lr * grad
        
    return x

def function_2(x):
    return x[0]**2 + x[1]**2

data = np.array([-3.0, 4.0])
output1 = gradient_descent(f = function_2, init_x = data, lr = 10, step_number = 100)
print(f"output_original is {output1}")


import numpy as np
import sympy as sp

def numerical_gradient(x1, x2):
    x, y = sp.symbols('x y')
    z = x**2 + y**2
    zx = sp.diff(z, x)
    zy = sp.diff(z, y)
    z_x1 = float(zx.evalf(subs = {x:x1, y:x2}))
    z_y1 = float(zy.evalf(subs = {x:x1, y:x2}))
    grad = [z_x1, z_y1]
    return grad

def gradient_descent(x1, x2, lr, step_number):
    data_array = np.array([x1, x2])

    for i in range(step_number):
        grad = numerical_gradient(x1, x2)
        grad_array = np.array(grad)
        data_array -= lr * grad_array
        x1 = data_array[0]
        x2 = data_array[1]
    return data_array

output2 = gradient_descent(-3.0, 4.0, 10, 100)
print(f"output_myself is {output2}")

Output:

output_original is [-2.58983747e+13 -1.29524862e+12]
output_myself is [-2.25154873e+128  3.00206497e+128]

The two versions give different results here, and I don't quite understand why. A plausible explanation: once |x| has blown up to around 1e13, the step h = 1e-4 is smaller than the gap between adjacent float64 values at that magnitude, so x + h and x - h round back to the same number, the numerical gradient becomes 0 and output_original stops moving; the symbolic gradient 2x, on the other hand, keeps growing, so output_myself keeps exploding.
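A quick way to check this guess (a minimal sketch; np.spacing gives the gap to the next representable float, and 2.6e13 is just chosen to match the magnitude of output_original):

import numpy as np

x = 2.6e13      # roughly the magnitude the parameters reach with lr = 10
h = 1e-4
print(np.spacing(x))        # ≈ 0.004, i.e. far larger than h
print((x + h) == (x - h))   # True: the central difference f(x+h) - f(x-h) becomes exactly 0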

Now let's try a new learning rate:

Code:

import numpy as np

def numerical_gradient(f, x):
    h = 1e-4
    grad = np.zeros_like(x)
    
    for idx in range(x.size):
        tmp_val = x[idx]
        # compute f(x+h)
        x[idx] = tmp_val + h
        fxh1 = f(x)
        
        # compute f(x-h)
        x[idx] = tmp_val - h
        fxh2 = f(x)
        
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp_val
    return grad

def gradient_descent(f, init_x, lr, step_number):
    x = init_x
    
    for i in range(step_number):
        grad = numerical_gradient(f, x)
        x -= lr * grad
        
    return x

def function_2(x):
    return x[0]**2 + x[1]**2

data = np.array([-3.0, 4.0])
output1 = gradient_descent(f = function_2, init_x = data, lr = 1e-10, step_number = 100)
print(f"output_original is {output1}")


import numpy as np
import sympy as sp

def numerical_gradient(x1, x2):
    x, y = sp.symbols('x y')
    z = x**2 + y**2
    zx = sp.diff(z, x)
    zy = sp.diff(z, y)
    z_x1 = float(zx.evalf(subs = {x:x1, y:x2}))
    z_y1 = float(zy.evalf(subs = {x:x1, y:x2}))
    grad = [z_x1, z_y1]
    return grad

def gradient_descent(x1, x2, lr, step_number):
    data_array = np.array([x1, x2])

    for i in range(step_number):
        grad = numerical_gradient(x1, x2)
        grad_array = np.array(grad)
        data_array -= lr * grad_array
        x1 = data_array[0]
        x2 = data_array[1]
    return data_array

output2 = gradient_descent(-3.0, 4.0, 1e-10, 100)
print(f"output_myself is {output2}")

Result:

output_original is [-2.99999994  3.99999992]
output_myself is [-2.99999994  3.99999992]

This time the two outputs are identical again. That makes sense: with lr = 1e-10 each step multiplies x by (1 − 2·1e-10), so after 100 steps x ≈ x_init·(1 − 2e-8), which is exactly the printed [-2.99999994  3.99999992]; the learning rate is so small that the parameters barely move away from the starting point.

Let's compare with the book again:

I really don't get it; it's quite baffling!!!


Summary

There was quite a lot of material today; implementing the learning algorithm will be left to the next lesson.
