Preface
In this section we first work out how to compute derivatives and partial derivatives, and then move on to numerical differentiation, gradients, and the implementation of the learning algorithm.
I. Derivatives and Partial Derivatives
Reference: https://zhuanlan.zhihu.com/p/465958129
To differentiate a function in Python, use the diff() function from the sympy library:
Code example: computing partial derivatives
import numpy as np
import sympy as sp
x, y = sp.symbols('x y')
z = sp.sin(2 * sp.pi * x + 2 * y / 5)
zx = sp.diff(z, x)
zy = sp.diff(z, y)
print(zx)
print(zy)
Result:
2*pi*cos(2*pi*x + 2*y/5)
2*cos(2*pi*x + 2*y/5)/5
The derivative obtained this way is just a symbolic expression, so we cannot plug numbers into it directly. The fix:
Substitute values for x and y with the evalf() function, convert the result with float(), and then the values can be used in numerical computations with numpy.
Code example:
import numpy as np
import sympy as sp
x, y = sp.symbols('x y')
z = sp.sin(2 * sp.pi * x + 2 * y / 5)
zx = sp.diff(z, x)
zy = sp.diff(z, y)
x1 = 10
y1 = 5
z_x1 = float(zx.evalf(subs = {x:x1, y:y1}))
z_y1 = float(zy.evalf(subs = {x:x1, y:y1}))
print(z_x1)
print(z_y1)
Result:
-2.61472768902227
-0.16645873461885696
What if x and y are not single values but arrays?
Code example:
import numpy as np
import sympy as sp
x, y = sp.symbols('x y')
z = sp.sin(2 * sp.pi * x + 2 * y / 5)
zx = sp.diff(z, x)
zy = sp.diff(z, y)
# build the input arrays
x_array = np.linspace(-10, 10, 20)
y_array = np.linspace(-10, 10, 20)
# empty lists to hold the partial-derivative values
temp_x = []
temp_y = []
for i in range(20):
    z_x1 = float(zx.evalf(subs = {x:x_array[i], y:y_array[i]}))
    temp_x.append(z_x1)
    z_y1 = float(zy.evalf(subs = {x:x_array[i], y:y_array[i]}))
    temp_y.append(z_y1)
# convert the lists to arrays
zx_array = np.array(temp_x)
zy_array = np.array(temp_y)
print(zx_array)
print(zy_array)
Result:
[-4.10696399 -6.2474788 -5.02056763 -1.08754245 3.43167418 6.10119926
5.48214385 1.90818227 -2.6943002 -5.84453982 -5.84453982 -2.6943002
1.90818227 5.48214385 6.10119926 3.43167418 -1.08754245 -5.02056763
-6.2474788 -4.10696399]
[-0.26145745 -0.39772685 -0.31961926 -0.0692351 0.21846716 0.38841441
0.34900412 0.12147866 -0.17152448 -0.37207496 -0.37207496 -0.17152448
0.12147866 0.34900412 0.38841441 0.21846716 -0.0692351 -0.31961926
-0.39772685 -0.26145745]
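As a side note (my own suggestion, not part of the original workflow), sympy's lambdify can compile the symbolic derivatives into numpy-aware functions, which avoids calling evalf element by element and evaluates the whole array in one call:
import numpy as np
import sympy as sp
x, y = sp.symbols('x y')
z = sp.sin(2 * sp.pi * x + 2 * y / 5)
zx = sp.diff(z, x)
zy = sp.diff(z, y)
# compile the symbolic expressions into functions that accept numpy arrays
fx = sp.lambdify((x, y), zx, 'numpy')
fy = sp.lambdify((x, y), zy, 'numpy')
x_array = np.linspace(-10, 10, 20)
y_array = np.linspace(-10, 10, 20)
print(fx(x_array, y_array))  # should match zx_array above
print(fy(x_array, y_array))  # should match zy_array above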
With that, computing derivatives and partial derivatives is sorted out!
II. Gradients
A neural network must find the optimal parameters during training, i.e. the parameters for which the loss function takes its minimum value. We generally search for the minimum of a function by making clever use of gradients.
Note that the gradient indicates, at each point, the direction in which the function value decreases the most. There is therefore no guarantee that the direction of the gradient points to the function's minimum, or that it is really the direction we should move in; in fact, for complicated functions the gradient direction usually does not point at the location of the minimum.
Still, although the gradient direction does not necessarily point to the minimum, moving along it reduces the function value as much as possible locally. So when searching for the location of the minimum (or of a value as small as possible), we use the gradient as a clue to decide which direction to move in.
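As a reminder (this is the standard definition, not something specific to this post), the gradient of a two-variable function simply collects its partial derivatives into a vector:

\nabla f(x_0, x_1) = \left( \frac{\partial f}{\partial x_0},\ \frac{\partial f}{\partial x_1} \right), \qquad \text{e.g. } f(x_0, x_1) = x_0^2 + x_1^2 \ \Rightarrow\ \nabla f = (2x_0,\ 2x_1).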
Generally speaking, in neural networks (deep learning) the gradient method mainly refers to gradient descent.
1. Implementing the gradient in code
Code implementation:
import numpy as np

def numerical_gradient(f, x):
    h = 1e-4
    grad = np.zeros_like(x)
    for idx in range(x.size):
        tmp_val = x[idx]
        # compute f(x+h)
        x[idx] = tmp_val + h
        fxh1 = f(x)
        # compute f(x-h)
        x[idx] = tmp_val - h
        fxh2 = f(x)
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        # restore the original value
        x[idx] = tmp_val
    return grad

def function_2(x):
    return x[0]**2 + x[1]**2

output = numerical_gradient(function_2, np.array([3.0, 4.0]))
print(output)
Result: approximately [6. 8.], since the exact gradient of x0² + x1² at (3.0, 4.0) is (6, 8).
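One pitfall worth noting (my own observation, not from the source): if the array passed to numerical_gradient has an integer dtype, np.zeros_like(x) is also an integer array and both the +h/-h perturbations and the stored gradient values get truncated, which is why the example above deliberately passes a float array:
import numpy as np
x_int = np.array([3, 4])              # integer dtype
print(np.zeros_like(x_int).dtype)     # an integer dtype: gradient entries would be truncated
x_float = np.array([3.0, 4.0])        # float dtype, as used in the example above
print(np.zeros_like(x_float).dtype)   # float64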
We can also use the partial-derivative code from Part I:
import numpy as np
import sympy as sp
x, y = sp.symbols('x y')
z = x ** 2 + y ** 2
zx = sp.diff(z, x)
zy = sp.diff(z, y)
x1 = 3
y1 = 4
z_x1 = float(zx.evalf(subs = {x:x1, y:y1}))
z_y1 = float(zy.evalf(subs = {x:x1, y:y1}))
grad = [z_x1, z_y1]
print(grad)
Result: [6.0, 8.0], matching the numerical result above (the symbolic derivatives 2x and 2y evaluated at (3, 4)).
2. Finding the minimum with gradient descent
Expressed as a mathematical formula, gradient descent repeatedly updates the parameters as

x_0 = x_0 - \eta \frac{\partial f}{\partial x_0}, \qquad x_1 = x_1 - \eta \frac{\partial f}{\partial x_1}

where \eta (eta) denotes the learning rate. The learning rate determines how much is learned in one step, i.e. to what extent the parameters are updated.
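As a quick sanity check (my own arithmetic, not taken from the source), one update step for f(x_0, x_1) = x_0^2 + x_1^2 starting from (-3.0, 4.0) with \eta = 0.1 gives

x_0 \leftarrow -3.0 - 0.1 \times (-6.0) = -2.4, \qquad x_1 \leftarrow 4.0 - 0.1 \times 8.0 = 3.2.

For this particular function every step multiplies both coordinates by 1 - 2\eta, so with \eta = 0.1 a hundred steps shrink the starting point by a factor of 0.8^{100} \approx 2 \times 10^{-10}, i.e. essentially to (0, 0).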
Code implementation:
import numpy as np

def numerical_gradient(f, x):
    h = 1e-4
    grad = np.zeros_like(x)
    for idx in range(x.size):
        tmp_val = x[idx]
        # compute f(x+h)
        x[idx] = tmp_val + h
        fxh1 = f(x)
        # compute f(x-h)
        x[idx] = tmp_val - h
        fxh2 = f(x)
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp_val
    return grad

def gradient_descent(f, init_x, lr, step_number):
    x = init_x  # note: init_x is updated in place
    for i in range(step_number):
        grad = numerical_gradient(f, x)
        x -= lr * grad
    return x

def function_2(x):
    return x[0]**2 + x[1]**2

data = np.array([-3.0, 4.0])
output = gradient_descent(f = function_2, init_x = data, lr = 0.1, step_number = 100)
print(output)
Result: a point extremely close to (0, 0), roughly [-6.1e-10, 8.1e-10] (each step multiplies both coordinates by 1 - 2·lr = 0.8, and 0.8^100 ≈ 2e-10).
We can also adapt the partial-derivative code from Part I slightly:
import numpy as np
import sympy as sp

def numerical_gradient(x1, x2):
    x, y = sp.symbols('x y')
    z = x**2 + y**2
    zx = sp.diff(z, x)
    zy = sp.diff(z, y)
    z_x1 = float(zx.evalf(subs = {x:x1, y:x2}))
    z_y1 = float(zy.evalf(subs = {x:x1, y:x2}))
    grad = [z_x1, z_y1]
    return grad

def gradient_descent(x1, x2, lr, step_number):
    data_array = np.array([x1, x2])
    for i in range(step_number):
        grad = numerical_gradient(x1, x2)
        grad_array = np.array(grad)
        data_array -= lr * grad_array
        x1 = data_array[0]
        x2 = data_array[1]
    return data_array

output = gradient_descent(-3.0, 4.0, 0.1, 100)
print(output)
Result: essentially the same values as above, again on the order of 1e-10.
As you can see, the two pieces of code produce the same result!
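As a side note (a sketch of my own variation, not from the source), recreating the symbols and calling diff on every iteration is wasteful; sympy's lambdify can turn the symbolic partial derivatives into plain numerical functions once, outside the loop:
import numpy as np
import sympy as sp

# build the symbolic gradient once, then compile it into fast numerical functions
x, y = sp.symbols('x y')
z = x**2 + y**2
grad_funcs = [sp.lambdify((x, y), sp.diff(z, var), 'numpy') for var in (x, y)]

def gradient_descent_lambdify(x1, x2, lr, step_number):
    point = np.array([x1, x2], dtype=float)
    for _ in range(step_number):
        grad = np.array([g(point[0], point[1]) for g in grad_funcs], dtype=float)
        point -= lr * grad
    return point

print(gradient_descent_lambdify(-3.0, 4.0, 0.1, 100))  # expected: values very close to (0, 0)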
Note:
A learning rate that is too large or too small will not give good results. Let's run an experiment:
import numpy as np

def numerical_gradient(f, x):
    h = 1e-4
    grad = np.zeros_like(x)
    for idx in range(x.size):
        tmp_val = x[idx]
        # compute f(x+h)
        x[idx] = tmp_val + h
        fxh1 = f(x)
        # compute f(x-h)
        x[idx] = tmp_val - h
        fxh2 = f(x)
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp_val
    return grad

def gradient_descent(f, init_x, lr, step_number):
    x = init_x
    for i in range(step_number):
        grad = numerical_gradient(f, x)
        x -= lr * grad
    return x

def function_2(x):
    return x[0]**2 + x[1]**2

data = np.array([-3.0, 4.0])
output1 = gradient_descent(f = function_2, init_x = data, lr = 10, step_number = 100)
print(f"output_original is {output1}")
import numpy as np
import sympy as sp

def numerical_gradient(x1, x2):
    x, y = sp.symbols('x y')
    z = x**2 + y**2
    zx = sp.diff(z, x)
    zy = sp.diff(z, y)
    z_x1 = float(zx.evalf(subs = {x:x1, y:x2}))
    z_y1 = float(zy.evalf(subs = {x:x1, y:x2}))
    grad = [z_x1, z_y1]
    return grad

def gradient_descent(x1, x2, lr, step_number):
    data_array = np.array([x1, x2])
    for i in range(step_number):
        grad = numerical_gradient(x1, x2)
        grad_array = np.array(grad)
        data_array -= lr * grad_array
        x1 = data_array[0]
        x2 = data_array[1]
    return data_array

output2 = gradient_descent(-3.0, 4.0, 10, 100)
print(f"output_myself is {output2}")
Output:
output_original is [-2.58983747e+13 -1.29524862e+12]
output_myself is [-2.25154873e+128 3.00206497e+128]
The two scripts give different results here, which at first is puzzling. My best explanation (my own analysis, not from the book): with lr = 10 the exact update for this function is x ← x − 10·2x = −19x, so the symbolic version multiplies both coordinates by −19 at every step and diverges to about 3·19^100 ≈ 10^128, which is the order of magnitude of output_myself. The numerical version estimates the gradient from f(x+h) − f(x−h) with h = 1e-4; once |x| grows past roughly 1e12, float64 can no longer resolve a perturbation of 1e-4 (and the difference of the two enormous squares loses all precision), so the estimated gradient collapses and the divergence stalls around 1e13, the order of magnitude of output_original.
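A quick way to check the floating-point claim above (a minimal sketch, independent of the code in this post):
big = 1e13
print(big + 1e-4 == big)                    # True: a 1e-4 perturbation is below float64 resolution at this scale
print((big + 1e-4)**2 - (big - 1e-4)**2)    # prints 0.0, although the true difference is 4 * big * 1e-4 = 4e9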
Let's also try a different learning rate:
Code:
import numpy as np

def numerical_gradient(f, x):
    h = 1e-4
    grad = np.zeros_like(x)
    for idx in range(x.size):
        tmp_val = x[idx]
        # compute f(x+h)
        x[idx] = tmp_val + h
        fxh1 = f(x)
        # compute f(x-h)
        x[idx] = tmp_val - h
        fxh2 = f(x)
        grad[idx] = (fxh1 - fxh2) / (2 * h)
        x[idx] = tmp_val
    return grad

def gradient_descent(f, init_x, lr, step_number):
    x = init_x
    for i in range(step_number):
        grad = numerical_gradient(f, x)
        x -= lr * grad
    return x

def function_2(x):
    return x[0]**2 + x[1]**2

data = np.array([-3.0, 4.0])
output1 = gradient_descent(f = function_2, init_x = data, lr = 1e-10, step_number = 100)
print(f"output_original is {output1}")
import numpy as np
import sympy as sp

def numerical_gradient(x1, x2):
    x, y = sp.symbols('x y')
    z = x**2 + y**2
    zx = sp.diff(z, x)
    zy = sp.diff(z, y)
    z_x1 = float(zx.evalf(subs = {x:x1, y:x2}))
    z_y1 = float(zy.evalf(subs = {x:x1, y:x2}))
    grad = [z_x1, z_y1]
    return grad

def gradient_descent(x1, x2, lr, step_number):
    data_array = np.array([x1, x2])
    for i in range(step_number):
        grad = numerical_gradient(x1, x2)
        grad_array = np.array(grad)
        data_array -= lr * grad_array
        x1 = data_array[0]
        x2 = data_array[1]
    return data_array

output2 = gradient_descent(-3.0, 4.0, 1e-10, 100)
print(f"output_myself is {output2}")
Result:
output_original is [-2.99999994 3.99999992]
output_myself is [-2.99999994 3.99999992]
This time the two outputs are identical again: with lr = 1e-10 each step multiplies the coordinates by only 1 − 2e-10, so after 100 steps the point has barely moved from the starting values (-3, 4), and at this scale both ways of computing the gradient are still accurate, so the two implementations agree.
I also checked against the book one more time; this behavior really had me puzzled at first!
Summary
That was quite a lot for one day; the implementation of the learning algorithm will be done in the next section.