(1) Gradient descent is similar to a greedy strategy: it can find a local optimum, but it is not guaranteed to find the global optimum.
(2) Gradient descent is used to minimize the cost; the descent direction is the negative direction of the gradient.
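For the linear model y_pred = x * w with MSE cost used in the script below, the quantities the code computes are:

cost(w) = (1/N) * Σ (x_n * w - y_n)^2
∂cost/∂w = (1/N) * Σ 2 * x_n * (x_n * w - y_n)
w := w - α * ∂cost/∂w        (α is the learning rate, 0.01 in the code)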
import numpy as np

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
w = 1.0  # initial guess for the weight

def forward(x):
    return x * w

# Mean squared error (MSE) over the whole training set
def cost(xs, ys):
    cost = 0
    for x, y in zip(xs, ys):
        y_pred = forward(x)
        cost += (y_pred - y) ** 2
    return cost / len(xs)

# Gradient of the cost with respect to w, averaged over all samples
def gradient(xs, ys):
    grad = 0
    for x, y in zip(xs, ys):
        grad += 2 * x * (x * w - y)
    return grad / len(xs)

print('Predict (before training)', 4, forward(4))
for epoch in range(100):
    cost_val = cost(x_data, y_data)
    grad_val = gradient(x_data, y_data)
    w -= 0.01 * grad_val  # w := w - alpha * gradient, alpha = 0.01
    print('Epoch:', epoch, 'w', w, 'loss', cost_val)
print('Predict (after training)', 4, forward(4))
After updating the weight with gradient descent for 100 epochs, w converges to roughly 2.0, so the final prediction for x = 4 approaches 8.0.
(3) Stochastic gradient descent: instead of using the cost over the whole training set, randomly pick a single training sample and use the gradient of that sample's loss to update the weight.
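For a single sample (x_n, y_n), the corresponding per-sample loss and update (as implemented in the second script below) are:

loss(w) = (x_n * w - y_n)^2
∂loss/∂w = 2 * x_n * (x_n * w - y_n)
w := w - α * ∂loss/∂w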
(4) Gradient descent gives lower optimization performance but has low time cost, because the gradient over all samples can be computed in parallel; stochastic gradient descent performs better but has high time cost, because each update depends on the previous one and the samples must be processed one after another. As a compromise we take a batch (mini-batch stochastic gradient descent): split the training data into small groups and compute the gradient over each group, which gives high performance at low time cost (a minimal sketch follows this point).
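A minimal sketch of the mini-batch idea, reusing the same toy data and linear model as the two scripts in this note; the batch size of 2, the per-epoch shuffling, and the helper name batch_gradient are illustrative assumptions rather than code from the original:

import random

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
w = 1.0
batch_size = 2  # illustrative choice

def forward(x):
    return x * w

# Average gradient of the MSE over one mini-batch
def batch_gradient(xs, ys):
    grad = 0
    for x, y in zip(xs, ys):
        grad += 2 * x * (x * w - y)
    return grad / len(xs)

for epoch in range(100):
    # Shuffle the sample indices, then update w once per mini-batch
    indices = list(range(len(x_data)))
    random.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        xs = [x_data[i] for i in batch]
        ys = [y_data[i] for i in batch]
        w -= 0.01 * batch_gradient(xs, ys)
print('w after mini-batch training:', w)

With three samples and batch_size = 2, each epoch performs two updates, sitting between the one update per epoch of gradient descent and the three of stochastic gradient descent.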
(5) The main differences between gradient descent and stochastic gradient descent:
1> The cost function cost() is replaced by loss(). cost computes the loss over all training data, while loss computes the loss of a single randomly chosen training sample.
2> The gradient function gradient() computes the gradient for a single training sample instead of the gradient over all training data.
import numpy as np

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
w = 1.0  # initial guess for the weight

def forward(x):
    return x * w

# Loss of a single training sample
def loss(x, y):
    y_pred = forward(x)
    return (y_pred - y) ** 2

# Gradient of a single sample's loss with respect to w
def gradient(x, y):
    return 2 * x * (x * w - y)

print('Predict (before training)', 4, forward(4))
for epoch in range(100):
    for x, y in zip(x_data, y_data):  # one update per training sample
        grad = gradient(x, y)
        w -= 0.01 * grad
        print('\tgrad:', x, y, grad)
        l = loss(x, y)
    print('progress:', epoch, 'w', w, 'loss', l)
print('Predict (after training)', 4, forward(4))
After updating the weight with stochastic gradient descent, w converges to 2.0 and the final prediction for x = 4 is essentially 8.0.
It can be seen that the result of stochastic gradient descent is better than that of gradient descent: with one update per sample (three updates per epoch instead of one), w gets much closer to 2.0 after the same 100 epochs.