Gradient Descent for Minimizing the Loss Function (Hung-yi Lee Machine Learning)
Let the model function be $f = b + w \cdot x$.
By the method of least squares, $\min \lVert Xw - y \rVert_2^2$ (matrix form, with the bias absorbed into the weight vector), the loss function is $L(b, w) = \sum\limits_{n=1}^{k} (y_n - (b + w \cdot x_n))^2$, where $(x_n, y_n)$ is the $n$-th of the $k$ training samples.
$b^*, w^* = \arg\min_{b, w} L(b, w)$
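As a minimal sketch of the loss defined above, the sum of squared errors can be evaluated directly with NumPy. The names `loss`, `xs`, and `ys` are assumptions introduced here for illustration; the data values are the ones used in the code later in this post.

```python
import numpy as np

# Sample data (same values as x_data / y_data in this post's code)
xs = np.array([338., 333., 328., 207., 226., 25., 179., 60., 208., 606.])
ys = np.array([640., 633., 619., 393., 428., 27., 193., 66., 226., 1591.])


def loss(b, w):
    """Sum of squared errors L(b, w) for the model f = b + w * x."""
    residual = ys - (b + w * xs)
    return float(np.sum(residual ** 2))


# Compare a poor guess with a point close to the least-squares solution
print(loss(-120.0, -4.0))
print(loss(-188.4, 2.67))
```

The second value should be much smaller than the first, since (-188.4, 2.67) is the approximate optimum marked in the plots of this post.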
For sample $n$, take the partial derivatives of a single squared-error term $(y_n - (b + w \cdot x_n))^2$ with respect to $w$ and $b$:
$\frac{\partial}{\partial w}(y_n - (b + w \cdot x_n))^2 = 2(y_n - (b + w \cdot x_n)) \cdot (-x_n)$
$\frac{\partial}{\partial b}(y_n - (b + w \cdot x_n))^2 = 2(y_n - (b + w \cdot x_n)) \cdot (-1)$
Summing over all samples, the partial derivatives of $L(b, w)$ with respect to $w$ and $b$ are:
$\frac{\partial L}{\partial w} = \sum\limits_{n=1}^{k} 2(y_n - (b + w \cdot x_n)) \cdot (-x_n)$
$\frac{\partial L}{\partial b} = \sum\limits_{n=1}^{k} 2(y_n - (b + w \cdot x_n)) \cdot (-1)$
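One way to sanity-check these derivative formulas is to compare them with central finite differences of the loss. The sketch below assumes the helper names `loss`, `grad`, and `numeric_grad` and the step size `eps`, none of which come from the original post.

```python
import numpy as np

# Sample data (same values as x_data / y_data in this post's code)
xs = np.array([338., 333., 328., 207., 226., 25., 179., 60., 208., 606.])
ys = np.array([640., 633., 619., 393., 428., 27., 193., 66., 226., 1591.])


def loss(b, w):
    """Sum of squared errors L(b, w)."""
    return float(np.sum((ys - (b + w * xs)) ** 2))


def grad(b, w):
    """Analytic partial derivatives (dL/db, dL/dw) from the summed formulas."""
    residual = ys - (b + w * xs)
    dL_db = float(np.sum(2.0 * residual * (-1.0)))
    dL_dw = float(np.sum(2.0 * residual * (-xs)))
    return dL_db, dL_dw


def numeric_grad(b, w, eps=1e-4):
    """Central finite differences, used only as an independent check."""
    dL_db = (loss(b + eps, w) - loss(b - eps, w)) / (2 * eps)
    dL_dw = (loss(b, w + eps) - loss(b, w - eps)) / (2 * eps)
    return dL_db, dL_dw


print(grad(-120.0, -4.0))
print(numeric_grad(-120.0, -4.0))
```

Because the loss is quadratic in $b$ and $w$, the two results should agree up to floating-point rounding.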
Derivation of the partial derivatives of $L(b, w)$:
![](https://i-blog.csdnimg.cn/blog_migrate/55f0b345a5ad77c99ce99e73772c1335.jpeg)
Following the gradient descent algorithm, at each point $(b, w)$ we compute the gradient and then update $(b, w)$, repeating until the gradient at $(b, w)$ is close to 0:
$w_1 = w_0 - \eta \left.\frac{\partial L}{\partial w}\right|_{w=w_0,\, b=b_0}$
$b_1 = b_0 - \eta \left.\frac{\partial L}{\partial b}\right|_{w=w_0,\, b=b_0}$
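To make the update rule concrete, here is a sketch of a single gradient descent step. The starting point (-120, -4) and the learning rate 1e-7 match the values used in this post's code; the names `loss`, `grad`, and `eta` are assumptions chosen for this example.

```python
import numpy as np

# Sample data (same values as x_data / y_data in this post's code)
xs = np.array([338., 333., 328., 207., 226., 25., 179., 60., 208., 606.])
ys = np.array([640., 633., 619., 393., 428., 27., 193., 66., 226., 1591.])


def loss(b, w):
    """Sum of squared errors L(b, w)."""
    return float(np.sum((ys - (b + w * xs)) ** 2))


def grad(b, w):
    """Partial derivatives (dL/db, dL/dw) of the loss."""
    residual = ys - (b + w * xs)
    return float(-2.0 * np.sum(residual)), float(-2.0 * np.sum(residual * xs))


eta = 1e-7               # learning rate
b0, w0 = -120.0, -4.0    # starting point

# Both gradients are evaluated at (b0, w0) before either parameter changes
g_b, g_w = grad(b0, w0)
b1 = b0 - eta * g_b
w1 = w0 - eta * g_w

print(b1, w1)
print(loss(b0, w0), loss(b1, w1))
```

With this learning rate the step moves $w$ noticeably while $b$ barely changes, because the gradient with respect to $w$ is scaled by the large $x_n$ values.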
Code implementation:
import numpy as np
import matplotlib.pyplot as plt

# Training data: 10 samples with one feature each; the model is y_data = b + w * x_data
x_data = [[338.], [333.], [328.], [207.], [226.], [25.], [179.], [60.], [208.], [606.]]
y_data = [640., 633., 619., 393., 428., 27., 193., 66., 226., 1591.]

x = np.arange(-200, -100, 1)  # candidate values of b (horizontal axis)
y = np.arange(-5, 5, 0.1)  # candidate values of w (vertical axis)
Z = np.zeros((len(y), len(x)))  # Z[j][i] holds the loss at (b=x[i], w=y[j])
X, Y = np.meshgrid(x, y)  # grid point coordinates
for i in range(len(x)):
    for j in range(len(y)):
        b = x[i]
        w = y[j]
        Z[j][i] = 0
        for n in range(len(x_data)):
            Z[j][i] = Z[j][i] + (y_data[n] - b - w * x_data[n][0])**2
        Z[j][i] = Z[j][i] / len(x_data)  # average the squared error over the samples
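
# Initial point, learning rate, and number of iterations
b = -120
w = -4
lr = 0.0000001
iteration = 100000

# Record every (b, w) visited so the path can be plotted
b_his = [b]
w_his = [w]

# Hand-written gradient descent
for i in range(iteration):
    b_grad = 0.0
    w_grad = 0.0
    # Accumulate dL/db and dL/dw over all samples
    for n in range(len(x_data)):
        b_grad = b_grad - 2.0 * (y_data[n] - b - w * x_data[n][0]) * 1.0
        w_grad = w_grad - 2.0 * (y_data[n] - b - w * x_data[n][0]) * x_data[n][0]
    # Step against the gradient
    b = b - lr * b_grad
    w = w - lr * w_grad
    b_his.append(b)
    w_his.append(w)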

# The final (b, w) is the closest point reached, but it still differs from the optimum
plt.contourf(x, y, Z, 50, alpha=0.5, cmap=plt.get_cmap('jet'))
# Optimal solution; it can be computed with sklearn later
plt.plot([-188.4], [2.67], 'x', ms=12, markeredgewidth=3, color='orange')
# Path taken by gradient descent
plt.plot(b_his, w_his, 'o-', ms=3, lw=1.5, color='black')
plt.xlim(-200, -100)
plt.ylim(-5, 5)
plt.xlabel(r'$b$', fontsize=16)
plt.ylabel(r'$w$', fontsize=16)
plt.show()
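The per-sample inner loop can also be written with NumPy array operations. The sketch below is an assumed vectorized equivalent of the hand-written loop (same starting point, learning rate, and iteration count), without the plotting; `xs`, `ys`, and `residual` are names chosen here, not taken from the original code.

```python
import numpy as np

# Sample data (same values as x_data / y_data in this post's code)
xs = np.array([338., 333., 328., 207., 226., 25., 179., 60., 208., 606.])
ys = np.array([640., 633., 619., 393., 428., 27., 193., 66., 226., 1591.])

b, w = -120.0, -4.0
lr = 0.0000001
iteration = 100000

for _ in range(iteration):
    # Residuals for all samples at once
    residual = ys - (b + w * xs)
    # Summed gradients of the squared-error loss
    b_grad = -2.0 * np.sum(residual)
    w_grad = -2.0 * np.sum(residual * xs)
    # Update both parameters using gradients taken at the same point
    b = b - lr * b_grad
    w = w - lr * w_grad

print(b, w)
```

With this small learning rate, $w$ settles quickly while $b$ moves only slowly toward -188.4, which is consistent with the remark that the final point still differs from the optimum.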
A sklearn linear regression model can also fit the optimal (b, w):
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

# Training data: 10 samples with one feature each; the model is y_data = b + w * x_data
x_data = [[338.], [333.], [328.], [207.], [226.], [25.], [179.], [60.], [208.], [606.]]
y_data = [640., 633., 619., 393., 428., 27., 193., 66., 226., 1591.]

x = np.arange(-200, -100, 1)  # candidate values of b (horizontal axis)
y = np.arange(-5, 5, 0.1)  # candidate values of w (vertical axis)
Z = np.zeros((len(y), len(x)))  # Z[j][i] holds the loss at (b=x[i], w=y[j])
X, Y = np.meshgrid(x, y)  # grid point coordinates
for i in range(len(x)):
    for j in range(len(y)):
        b = x[i]
        w = y[j]
        Z[j][i] = 0
        for n in range(len(x_data)):
            Z[j][i] = Z[j][i] + (y_data[n] - b - w * x_data[n][0])**2
        Z[j][i] = Z[j][i] / len(x_data)  # average the squared error over the samples

# Fit ordinary least squares; coef_[0] is w and intercept_ is b
reg = linear_model.LinearRegression()
reg.fit(x_data, y_data)
print(reg.coef_[0])
print(reg.intercept_)

plt.contourf(x, y, Z, 50, alpha=0.5, cmap=plt.get_cmap('jet'))
# Optimal solution
plt.plot([reg.intercept_], [reg.coef_[0]], 'x', ms=12, markeredgewidth=3, color='orange')
plt.xlim(-200, -100)
plt.ylim(-5, 5)
plt.xlabel(r'$b$', fontsize=16)
plt.ylabel(r'$w$', fontsize=16)
plt.show()
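As a cross-check that does not depend on sklearn, the same least-squares problem can be solved with `np.linalg.lstsq`. This is a sketch under the assumption that the design matrix gets a leading column of ones for the intercept; the names `A`, `b_opt`, and `w_opt` are introduced here for illustration.

```python
import numpy as np

# Sample data (same values as x_data / y_data in this post's code)
xs = np.array([338., 333., 328., 207., 226., 25., 179., 60., 208., 606.])
ys = np.array([640., 633., 619., 393., 428., 27., 193., 66., 226., 1591.])

# Design matrix: a column of ones (for b) next to the feature column (for w)
A = np.column_stack([np.ones_like(xs), xs])

# lstsq returns (solution, residuals, rank, singular values); only the solution is needed
coef, *_ = np.linalg.lstsq(A, ys, rcond=None)
b_opt, w_opt = coef

print(b_opt, w_opt)
```

The result should agree with `reg.intercept_` and `reg.coef_[0]`, and with the point (-188.4, 2.67) marked in the first plot.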