Gradient Descent
Task: as before, fit a linear model to three data points, but this time determine the model parameter with gradient descent. Here the loss function is MSE, so the gradient is easy to derive and the code is straightforward. Note that the gradient update requires an initial w and a learning rate alpha. Deep learning has reasonable remedies for getting stuck in local minima, but saddle points (points where the gradient is zero) still need care.
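For the linear model $\hat{y} = w x$, the cost and gradient implemented below are:

$$\mathrm{cost}(w)=\frac{1}{N}\sum_{n=1}^{N}\left(w x_n-y_n\right)^2,\qquad \frac{\partial\,\mathrm{cost}}{\partial w}=\frac{1}{N}\sum_{n=1}^{N}2x_n\left(w x_n-y_n\right),\qquad w\leftarrow w-\alpha\,\frac{\partial\,\mathrm{cost}}{\partial w}$$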
import matplotlib.pyplot as plt

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

def forward(x):
    return w * x

def cost(xs, ys):
    # mean squared error over the whole dataset
    cost = 0
    for x_val, y_val in zip(xs, ys):
        y_pred = forward(x_val)
        cost += (y_val - y_pred) ** 2
    return cost / len(xs)

def gradient(xs, ys):
    # d(cost)/dw, averaged over all samples
    grad = 0
    for x_val, y_val in zip(xs, ys):
        grad += 2 * (w * x_val - y_val) * x_val
    return grad / len(xs)

w = 1.0       # initial weight
alpha = 0.04  # learning rate

list_cost = []
list_epoch = []
for epoch in range(100):
    cost_val = cost(x_data, y_data)
    grad_val = gradient(x_data, y_data)
    w -= alpha * grad_val  # one update per epoch, using the full batch
    print("epoch: {0}, w = {1}, cost = {2}".format(epoch, w, cost_val))
    list_epoch.append(epoch)
    list_cost.append(cost_val)

print("Predict : x = 4, y = ", forward(4))
plt.xlabel("epoch")
plt.ylabel("cost")
plt.plot(list_epoch, list_cost)
plt.show()
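Since every sample's contribution to the gradient is independent, the per-sample loops above can be replaced by vectorized NumPy operations. This is a hypothetical variant, not part of the original code, but it illustrates why batch gradient descent parallelizes well:

```python
import numpy as np

# Vectorized batch gradient descent for the same toy problem.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

w = 1.0       # initial weight
alpha = 0.04  # learning rate

for epoch in range(100):
    y_pred = w * x                        # forward pass over all samples at once
    grad = np.mean(2 * (y_pred - y) * x)  # MSE gradient averaged over the batch
    w -= alpha * grad

print(w)  # converges toward 2.0
```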
Batch gradient descent computes the gradient over all samples as a whole. Since there is no dependency between samples, the computation can be parallelized, which makes the algorithm fast, but the quality of the result is harder to guarantee. Stochastic gradient descent (SGD) instead takes one sample at a time to compute the gradient and update w; it can converge to a better solution, but the updates of w depend on one another (each update uses a single sample and the w it produced), so SGD is slower. As a compromise, deep learning uses mini-batches: a group of samples is used together for each update of w.
import matplotlib.pyplot as plt

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]

def forward(x):
    return w * x

def loss(x, y):
    # squared error on a single sample
    y_pred = forward(x)
    return (y - y_pred) ** 2

def gradient(x, y):
    # d/dw of (w*x - y)^2 for a single sample
    return 2 * (w * x - y) * x

w = 1.0       # initial weight
alpha = 0.04  # learning rate

list_cost = []
list_epoch = []
for epoch in range(100):
    for x_val, y_val in zip(x_data, y_data):
        loss_val = loss(x_val, y_val)
        grad = gradient(x_val, y_val)
        w -= alpha * grad  # update w after every single sample
        print("sample x = {0}, y = {1}, grad = {2}".format(x_val, y_val, grad))
    print("Epoch = {0}, w = {1}, loss = {2}".format(epoch, w, loss_val))
    list_epoch.append(epoch)
    list_cost.append(loss_val)

print("Predict : x = 4, y = ", forward(4))
plt.xlabel("epoch")
plt.ylabel("loss")
plt.plot(list_epoch, list_cost)
plt.show()
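The mini-batch compromise mentioned above can be sketched as follows. This is a minimal illustration, not code from the original notes; the batch size of 2 and the fixed random seed are arbitrary choices for this 3-sample toy dataset:

```python
import numpy as np

# Mini-batch gradient descent for the same linear model y = w * x.
x_data = np.array([1.0, 2.0, 3.0])
y_data = np.array([2.0, 4.0, 6.0])

w = 1.0          # initial weight
alpha = 0.04     # learning rate
batch_size = 2   # arbitrary choice for this toy dataset
rng = np.random.default_rng(0)

for epoch in range(100):
    idx = rng.permutation(len(x_data))  # shuffle sample order each epoch
    for start in range(0, len(x_data), batch_size):
        batch = idx[start:start + batch_size]
        xb, yb = x_data[batch], y_data[batch]
        grad = np.mean(2 * (w * xb - yb) * xb)  # gradient over the mini-batch
        w -= alpha * grad

print(w)  # converges toward 2.0
```

Each update uses a small group of samples, so the work within a batch is still vectorizable while w is refreshed far more often than in full-batch gradient descent.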