Introduction
This blog series aims to provide the mathematical foundations of machine learning (deep learning). The content is kept deliberately concise, suited to readers revisiting the material who want to study or look things up quickly.
Derivation
Let $A'_{m,n-1}$ be the independent-variable data matrix and $b_{m}$ the dependent-variable vector, and set $A_{m,n}=[A',\mathbf{1}]$, appending a column of ones so the model carries an intercept term. The goal is to find a coefficient vector $x$ whose linear model $Ax$ minimizes the error against $b$, so we use gradient descent to find the $x$ that minimizes

$$f(x)=\frac{1}{2}\lVert Ax-b\rVert_{2}^{2}.$$

This post walks through the differentiation in full once; afterwards the result will simply be quoted. First, expand the objective:

$$f(x)=\frac{1}{2}\Big[\big((a_{11}x_{1}+\dots+a_{1n}x_{n})-b_{1}\big)^{2}+\dots+\big((a_{m1}x_{1}+\dots+a_{mn}x_{n})-b_{m}\big)^{2}\Big].$$

By vector calculus, the gradient stacks the partial derivatives with respect to each component of $x$:

$$\nabla_{x}f(x)=\frac{1}{2}\begin{bmatrix}\frac{\partial\big[((a_{11}x_{1}+\dots+a_{1n}x_{n})-b_{1})^{2}+\dots+((a_{m1}x_{1}+\dots+a_{mn}x_{n})-b_{m})^{2}\big]}{\partial x_{1}}\\ \vdots \\ \frac{\partial\big[((a_{11}x_{1}+\dots+a_{1n}x_{n})-b_{1})^{2}+\dots+((a_{m1}x_{1}+\dots+a_{mn}x_{n})-b_{m})^{2}\big]}{\partial x_{n}}\end{bmatrix}=\frac{1}{2}\begin{bmatrix}2a_{11}\big((a_{11}x_{1}+\dots+a_{1n}x_{n})-b_{1}\big)+\dots+2a_{m1}\big((a_{m1}x_{1}+\dots+a_{mn}x_{n})-b_{m}\big)\\ \vdots \\ 2a_{1n}\big((a_{11}x_{1}+\dots+a_{1n}x_{n})-b_{1}\big)+\dots+2a_{mn}\big((a_{m1}x_{1}+\dots+a_{mn}x_{n})-b_{m}\big)\end{bmatrix}$$

$$=\frac{1}{2}\cdot 2\begin{bmatrix}a_{11} & \dots & a_{m1}\\ \vdots & & \vdots \\ a_{1n} & \dots & a_{mn}\end{bmatrix}\begin{bmatrix}(a_{11}x_{1}+\dots+a_{1n}x_{n})-b_{1}\\ \vdots \\ (a_{m1}x_{1}+\dots+a_{mn}x_{n})-b_{m}\end{bmatrix}.$$

Collecting terms gives

$$\nabla_{x}f(x)=A^{T}(Ax-b)=A^{T}Ax-A^{T}b.$$

Gradient descent then iterates $x \leftarrow x-\eta\,\nabla_{x}f(x)$ with learning rate $\eta$, which is exactly the update the code below implements.
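As a sanity check on the derivation, the analytic gradient can be compared against a finite-difference approximation of $f$. The sketch below is illustrative and not part of the original post; the helper names analytic_grad and numeric_grad are made up for this check, and only NumPy is assumed.

import numpy as np

def analytic_grad(A, x, b):
    # Gradient derived above: ∇f(x) = Aᵀ(Ax − b)
    return A.T @ (A @ x - b)

def numeric_grad(A, x, b, h=1e-6):
    # Central finite differences of f(x) = ½‖Ax − b‖²₂
    f = lambda v: 0.5 * np.sum((A @ v - b) ** 2)
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = h
        g[i] = (f(x + d) - f(x - d)) / (2 * h)
    return g

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))
x = rng.normal(size=3)
b = rng.normal(size=5)
print(np.allclose(analytic_grad(A, x, b), numeric_grad(A, x, b)))  # expected: True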
Code Implementation
import numpy as np
from random import randint
from functools import reduce
from sklearn.metrics import mean_squared_error


class GLSModel:
    def __init__(self):
        self.x = None

    def fit(self, x, y, e, epochs):
        """
        Fit on training data via gradient-descent updates.
        :param x: independent-variable data
        :param y: dependent-variable data
        :param e: learning rate
        :param epochs: number of iterations
        """
        if x.shape[0] != y.shape[0]:
            raise ValueError("x and y must contain the same number of samples")
        # Append a column of ones so the model learns an intercept term
        A = np.concatenate((x, np.ones((x.shape[0], 1))), axis=1)
        if self.x is None:
            self.x = np.random.random((A.shape[1], 1))
        for _ in range(epochs):
            # Gradient step: x ← x − e · (AᵀAx − Aᵀb)
            self.x -= e * (reduce(np.matmul, (A.T, A, self.x)) - np.matmul(A.T, y))

    def predict(self, x):
        """
        Predict on new data.
        :param x: data to predict on
        :return: predictions
        """
        return np.matmul(np.concatenate((x, np.ones((x.shape[0], 1))), axis=1), self.x)

    def evaluate(self, x, y):
        y_ = self.predict(x)
        return mean_squared_error(y, y_)
train_x, train_y = [], []
for _ in range(1000):
    # 2 independent variables
    train_xi = [randint(-100, 100) for _ in range(2)]
    # dependent variable
    train_yi = 3 * train_xi[0] + 5 * train_xi[1] + 4
    train_x.append(train_xi)
    train_y.append([train_yi])
train_x, train_y = np.array(train_x), np.array(train_y)

model = GLSModel()
model.fit(train_x, train_y, 2e-7, 100)
print(f'mse: {model.evaluate(train_x, train_y)}')
# mse: 9.080036995109756

test_x = [randint(-100, 100) for _ in range(2)]
test_y = 3 * test_x[0] + 5 * test_x[1] + 4
print(f'real: {test_y}, pred: {model.predict(np.array([test_x]))[0][0]}')
# real: 13, pred: 10.01059089304455
Readers are encouraged to tune the hyperparameters (learning rate and epoch count) themselves to obtain a better fit.
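For reference, setting the gradient derived above to zero yields the normal equations $A^{T}Ax=A^{T}b$, so this problem also has a closed-form solution. A minimal sketch for cross-checking the gradient-descent coefficients, assuming the train_x and train_y arrays from the script above are still in scope:

# Closed-form least-squares baseline via NumPy
A = np.concatenate((train_x, np.ones((train_x.shape[0], 1))), axis=1)
x_star, *_ = np.linalg.lstsq(A, train_y, rcond=None)
print(x_star.ravel())  # the data was generated with coefficients [3, 5, 4]

Since the training data here is noiseless, the closed-form coefficients should recover the generating values almost exactly, which makes them a useful target when tuning the learning rate and epoch count.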