- First, understand the MSE algorithm used in the code. MSE stands for mean squared error; it serves as the loss function, measuring the gap between the actual values and the predicted values.
- Next, understand how the predictions are driven toward the targets: SGD (stochastic gradient descent), a form of gradient descent in which the gradients are computed by backpropagation, i.e. the chain rule.
- numpy is used first for this machine-learning example, mainly for array and matrix operations. Building every function by hand with numpy makes the details of each step visible.
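Before the full example, a minimal sketch of these two ideas together, with toy values chosen only for illustration (not part of the example below): computing the MSE loss and its gradient with respect to the weight via the chain rule, then taking one gradient-descent step.

```python
import numpy as np

# toy data: predictions y_hat = w * x with an initial guess w = 1.0
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])        # true relation: y = 2 * x
w = 1.0
y_hat = w * x

# MSE loss: mean of the squared residuals
mse = ((y_hat - y) ** 2).mean()

# chain rule: d(MSE)/dw = mean(2 * (y_hat - y) * d(y_hat)/dw)
#                       = mean(2 * (y_hat - y) * x)
grad = (2 * (y_hat - y) * x).mean()

# one gradient-descent step moves w toward the true value 2
w_new = w - 0.1 * grad
print(mse, grad, w_new)
```

The negative gradient points toward lower loss, so subtracting `0.1 * grad` nudges `w` from 1.0 toward 2.0.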
import numpy as np

# f = w * x
# f = 2 * x
x = np.array([1, 2, 3, 4], dtype=np.float32)
y = np.array([2, 4, 6, 8], dtype=np.float32)

w = 0.0

# model prediction
def forward(x):
    return w * x

# loss = MSE
def loss(y, y_predicted):
    return ((y_predicted - y) ** 2).mean()

# gradient
# MSE = 1/n * (w * x - y) ** 2
# dJ/dw = 1/n * 2 * x * (w * x - y)
def gradient(x, y, y_predicted):
    # note: np.dot already sums over the samples, so this returns the
    # SUMMED gradient, n times the averaged gradient in the formula above
    return np.dot(2 * x, y_predicted - y)
print(f'prediction before training: f(5) = {forward(5):.3f}')
# training
learning_rate = 0.01
n_iters = 20

for epoch in range(n_iters):
    # prediction = forward pass
    y_pred = forward(x)
    # loss
    l = loss(y, y_pred)
    # gradient
    dw = gradient(x, y, y_pred)
    # update weights
    w -= learning_rate * dw
    if epoch % 2 == 0:
        print(f'epoch {epoch+1}: w = {w:.3f}, loss = {l:.8f}')

print(f'prediction after training: f(5) = {forward(5):.3f}')
prediction before training: f(5) = 0.000
epoch 1: w = 1.200, loss = 30.00000000
epoch 3: w = 1.872, loss = 0.76800019
epoch 5: w = 1.980, loss = 0.01966083
epoch 7: w = 1.997, loss = 0.00050332
epoch 9: w = 1.999, loss = 0.00001288
epoch 11: w = 2.000, loss = 0.00000033
epoch 13: w = 2.000, loss = 0.00000001
epoch 15: w = 2.000, loss = 0.00000000
epoch 17: w = 2.000, loss = 0.00000000
epoch 19: w = 2.000, loss = 0.00000000
prediction after training: f(5) = 10.000
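A useful habit when deriving gradients by hand like this is to sanity-check the analytic formula against a numerical (finite-difference) estimate. A small sketch, using the same data but a hypothetical probe weight `w = 0.5`:

```python
import numpy as np

x = np.array([1, 2, 3, 4], dtype=np.float64)
y = np.array([2, 4, 6, 8], dtype=np.float64)

def loss(w):
    # MSE as a function of the single weight w
    return ((w * x - y) ** 2).mean()

def analytic_grad(w):
    # averaged MSE gradient: dJ/dw = mean(2 * x * (w * x - y))
    return (2 * x * (w * x - y)).mean()

# central finite difference as an independent estimate of dJ/dw
w, eps = 0.5, 1e-6
numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)
print(analytic_grad(w), numeric)  # both close to -22.5
```

If the two numbers disagree by more than rounding error, the hand-derived gradient is wrong.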
- Once you understand building linear regression with numpy, you can move on to deep-learning models, letting PyTorch's autograd compute the backward pass directly.
import torch

# f = w * x
# f = 2 * x
x = torch.tensor([1, 2, 3, 4], dtype=torch.float32)
y = torch.tensor([2, 4, 6, 8], dtype=torch.float32)

w = torch.tensor(0.0, dtype=torch.float32, requires_grad=True)

# model prediction
def forward(x):
    return w * x

# loss = MSE
def loss(y, y_predicted):
    return ((y_predicted - y) ** 2).mean()
print(f'prediction before training: f(5) = {forward(5):.3f}')
# training
learning_rate = 0.01
n_iters = 100

for epoch in range(n_iters):
    # prediction = forward pass
    y_pred = forward(x)
    # loss
    l = loss(y, y_pred)
    # gradient = backward pass
    l.backward()  # dl/dw
    # update weights (outside the autograd graph)
    with torch.no_grad():
        w -= learning_rate * w.grad
    # zero gradients (they accumulate otherwise)
    w.grad.zero_()
    if epoch % 10 == 0:
        print(f'epoch {epoch+1}: w = {w:.3f}, loss = {l:.8f}')

print(f'prediction after training: f(5) = {forward(5):.3f}')
prediction before training: f(5) = 0.000
epoch 1: w = 0.300, loss = 30.00000000
epoch 11: w = 1.665, loss = 1.16278565
epoch 21: w = 1.934, loss = 0.04506890
epoch 31: w = 1.987, loss = 0.00174685
epoch 41: w = 1.997, loss = 0.00006770
epoch 51: w = 1.999, loss = 0.00000262
epoch 61: w = 2.000, loss = 0.00000010
epoch 71: w = 2.000, loss = 0.00000000
epoch 81: w = 2.000, loss = 0.00000000
epoch 91: w = 2.000, loss = 0.00000000
prediction after training: f(5) = 10.000
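The `w.grad.zero_()` call in the loop above matters because PyTorch accumulates gradients across `backward()` calls rather than overwriting them. A small sketch of that behavior:

```python
import torch

# a scalar parameter tracked by autograd
w = torch.tensor(1.0, requires_grad=True)

# first backward pass: d(3*w)/dw = 3
(3 * w).backward()
print(w.grad)  # tensor(3.)

# second backward pass WITHOUT zeroing: gradients accumulate, 3 + 3 = 6
(3 * w).backward()
print(w.grad)  # tensor(6.)

# zero_() resets the accumulated gradient before the next step
w.grad.zero_()
print(w.grad)  # tensor(0.)
```

Without the reset, every weight update would use the sum of all past gradients instead of the current one.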
- The numpy version reaches the target f(5) = 10 in 20 iterations while the PyTorch version needs about 100, but this is not because numpy finds better gradients: the hand-written numpy gradient sums over the 4 samples instead of averaging them (the missing 1/n factor), so its effective learning rate is 4 times larger. With the properly averaged gradient, both versions take identical steps. PyTorch's real advantages are automatic differentiation via backward() and the ability to run on a GPU, where parallel computation can make it far faster than numpy on large problems.
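The effect described above can be checked directly: with the initial weight w = 0.0, the `np.dot` form returns the summed gradient, exactly `len(x)` times the averaged MSE gradient. A small sketch:

```python
import numpy as np

x = np.array([1, 2, 3, 4], dtype=np.float32)
y = np.array([2, 4, 6, 8], dtype=np.float32)
w = 0.0
y_pred = w * x

summed = np.dot(2 * x, y_pred - y)        # gradient summed over samples
averaged = (2 * x * (y_pred - y)).mean()  # true MSE gradient (with 1/n)

print(summed, averaged)  # -120.0 -30.0: summed is len(x) times averaged
```

Stepping with the summed gradient at learning rate 0.01 is therefore the same as stepping with the averaged gradient at 0.04, which is why the numpy run converged in fewer epochs.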
- Example: putting the model on the GPU
import torch
import torch.nn as nn
import numpy as np

# training data: y = 2 * x
x_values = [i for i in range(11)]
x_train = np.array(x_values, dtype=np.float32).reshape(-1, 1)

y_values = [2 * i for i in x_values]
y_train = np.array(y_values, dtype=np.float32).reshape(-1, 1)
class LinearRegressionModel(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(LinearRegressionModel, self).__init__()
        self.linear = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        out = self.linear(x)
        return out

input_dim = 1
output_dim = 1
model = LinearRegressionModel(input_dim, output_dim)
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
model.to(device)
epochs = 1000
learning_rate = 0.01
optimizer = torch.optim.SGD(model.parameters(), lr = learning_rate)
criterion = nn.MSELoss()
for epoch in range(epochs):
    epoch += 1
    # convert the numpy arrays to tensors on the target device
    inputs = torch.from_numpy(x_train).to(device)
    labels = torch.from_numpy(y_train).to(device)
    # zero the gradients before each iteration
    optimizer.zero_grad()
    # forward pass
    outputs = model(inputs)
    # compute the loss
    loss = criterion(outputs, labels)
    # backward pass
    loss.backward()
    # update the weights
    optimizer.step()
    if epoch % 50 == 0:
        print('epoch {}, loss {}'.format(epoch, loss.item()))
epoch 50, loss 0.03241850063204765
epoch 100, loss 0.01849028840661049
epoch 150, loss 0.01054617203772068
epoch 200, loss 0.0060151442885398865
epoch 250, loss 0.003430827520787716
epoch 300, loss 0.0019568265415728092
epoch 350, loss 0.0011160820722579956
epoch 400, loss 0.0006365829613059759
epoch 450, loss 0.0003630803257692605
epoch 500, loss 0.00020708395459223539
epoch 550, loss 0.00011811463627964258
epoch 600, loss 6.736700743203983e-05
epoch 650, loss 3.8423640944529325e-05
epoch 700, loss 2.1917059711995535e-05
epoch 750, loss 1.2499446711444762e-05
epoch 800, loss 7.129288860596716e-06
epoch 850, loss 4.066908331878949e-06
epoch 900, loss 2.319003215234261e-06
epoch 950, loss 1.3228863053882378e-06
epoch 1000, loss 7.547288873865909e-07
# detach from the graph and move back to the CPU before converting to numpy
predicted = model(torch.from_numpy(x_train).to(device)).detach().cpu().numpy()
predicted:
array([[4.1975118e-03],
       [2.0035930e+00],
       [4.0029883e+00],
       [6.0023842e+00],
       [8.0017796e+00],
       [1.0001175e+01],
       [1.2000570e+01],
       [1.3999966e+01],
       [1.5999361e+01],
       [1.7998758e+01],
       [1.9998154e+01]], dtype=float32)

labels:
tensor([[ 0.],
        [ 2.],
        [ 4.],
        [ 6.],
        [ 8.],
        [10.],
        [12.],
        [14.],
        [16.],
        [18.],
        [20.]])
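As a closing note, the weight and bias the optimizer fits live inside the `nn.Linear(input_dim, output_dim)` layer and can be read back from any `nn.Module`. A minimal sketch on a fresh (untrained) layer of the same 1-in, 1-out shape:

```python
import torch
import torch.nn as nn

# a 1-in, 1-out linear layer, the same shape as the one in the model above
linear = nn.Linear(1, 1)

# every parameter is a tensor with requires_grad=True; the optimizer
# updates exactly these tensors on each step()
for name, p in linear.named_parameters():
    print(name, tuple(p.shape), p.requires_grad)
# weight (1, 1) True
# bias (1,) True
```

After training, `linear.weight` should be close to 2 and `linear.bias` close to 0 for the data above.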