Mathematical Derivation + Pure Python Implementation of Machine Learning Algorithms 1: Linear Regression


Mathematical Derivation

The derivation of parameter estimation in the linear regression model starts from the model itself:

$$f(x_i)=wx_i+b, \quad \text{such that } f(x_i)\approx y_i$$

Mean squared error is the most commonly used performance measure in regression tasks; it quantifies the difference between $f(x)$ and $y$:

$$\begin{aligned}
(w^*,b^*) &= \arg\min_{(w,b)}\sum_{i=1}^{m}(y_i-f(x_i))^2 \\
&= \arg\min_{(w,b)}\sum_{i=1}^{m}(y_i-wx_i-b)^2
\end{aligned}$$
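
For concreteness, this objective is a single line of NumPy. A small illustrative sketch (the helper name is ours), with x and y as 1-D arrays and w, b as scalars:

import numpy as np

def sse(x, y, w, b):
    # least-squares objective: sum of squared residuals
    return np.sum((y - (w * x + b)) ** 2)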

Let $E(w,b)=\sum_{i=1}^{m}(y_i-wx_i-b)^2$ denote this objective. Taking partial derivatives with respect to $w$ and $b$:

$$\frac{\partial E(w,b)}{\partial w}=2\left(w\sum_{i=1}^{m}x_i^2-\sum_{i=1}^{m}(y_i-b)x_i\right)$$

$$\frac{\partial E(w,b)}{\partial b}=2\left(mb-\sum_{i=1}^{m}(y_i-wx_i)\right)$$

Setting both derivatives to zero yields the closed-form solutions:

$$w=\frac{\sum_{i=1}^{m}y_i(x_i-\overline{x})}{\sum_{i=1}^{m}x_i^2-\frac{1}{m}\left(\sum_{i=1}^{m}x_i\right)^2}, \quad \overline{x}=\frac{1}{m}\sum_{i=1}^{m}x_i$$

$$b=\frac{1}{m}\sum_{i=1}^{m}(y_i-wx_i)$$

The detailed derivation of $w$ is as follows:

$$\begin{aligned}
w\sum_{i=1}^{m}x_i^2 &= \sum_{i=1}^{m}(y_i-b)x_i \\
&= \sum_{i=1}^{m}\Big(y_i-\frac{1}{m}\sum_{i=1}^{m}(y_i-wx_i)\Big)x_i \\
&= \sum_{i=1}^{m}y_ix_i-\frac{1}{m}\sum_{i=1}^{m}(y_i-wx_i)\sum_{i=1}^{m}x_i \\
&= \sum_{i=1}^{m}y_ix_i-\frac{1}{m}\sum_{i=1}^{m}y_i\sum_{i=1}^{m}x_i+\frac{1}{m}w\sum_{i=1}^{m}x_i\sum_{i=1}^{m}x_i \\
&= \sum_{i=1}^{m}y_i\Big(x_i-\frac{1}{m}\sum_{i=1}^{m}x_i\Big)+\frac{1}{m}w\Big(\sum_{i=1}^{m}x_i\Big)^2
\end{aligned}$$

$$w=\frac{\sum_{i=1}^{m}y_i\left(x_i-\frac{1}{m}\sum_{i=1}^{m}x_i\right)}{\sum_{i=1}^{m}x_i^2-\frac{1}{m}\left(\sum_{i=1}^{m}x_i\right)^2}$$
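
The closed-form solution maps directly onto NumPy. A minimal sketch (the function name is illustrative, not part of the code developed below):

import numpy as np

def univariate_closed_form(x, y):
    m = len(x)
    x_bar = np.mean(x)
    # w = sum(y_i * (x_i - x_bar)) / (sum(x_i^2) - (sum(x_i))^2 / m)
    w = np.sum(y * (x - x_bar)) / (np.sum(x ** 2) - np.sum(x) ** 2 / m)
    # b = mean(y_i - w * x_i)
    b = np.mean(y - w * x)
    return w, b

For example, univariate_closed_form(np.array([0., 1., 2.]), np.array([1., 3., 5.])) recovers w = 2, b = 1.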

The vector-form derivation proceeds as follows:

With $d$ attributes, the model becomes $f(x)=w_1x_1+w_2x_2+\cdots+w_dx_d+b$, or in vector form:

$$f(\boldsymbol x)=\boldsymbol w^T\boldsymbol x+b$$

where $\boldsymbol w=(w_1;w_2;\cdots;w_d)$.

Given a dataset $D=\{(\boldsymbol x_1,y_1),(\boldsymbol x_2,y_2),\cdots,(\boldsymbol x_m,y_m)\}$, where $\boldsymbol x_i=(x_{i1};x_{i2};\cdots;x_{id})$ and $y_i\in\mathbb{R}$.

w ^ = ( w ; b ) \hat \boldsymbol w=(\boldsymbol w;b) w^=(w;b),将数据集 D D D表示为一个 m × ( d + 1 ) 大 小 的 矩 阵 X m\times (d+1)大小的矩阵\boldsymbol X m×(d+1)X,其中每行对应一个示例,该行前 d d d个元素对应于示例的 d d d个属性值,最后一个元素恒置为1,即:

$$\boldsymbol X=
\begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1d} & 1 \\
x_{21} & x_{22} & \cdots & x_{2d} & 1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
x_{m1} & x_{m2} & \cdots & x_{md} & 1
\end{pmatrix}
=
\begin{pmatrix}
\boldsymbol x_1^T & 1 \\
\boldsymbol x_2^T & 1 \\
\vdots & \vdots \\
\boldsymbol x_m^T & 1
\end{pmatrix}$$

Writing the labels in vector form as well, $\boldsymbol y=(y_1;y_2;\cdots;y_m)$, we have:

$$\hat{\boldsymbol w}^*=\arg\min_{\hat{\boldsymbol w}}\,(\boldsymbol y-\boldsymbol X\hat{\boldsymbol w})^T(\boldsymbol y-\boldsymbol X\hat{\boldsymbol w})$$

E w ^ = ( y − X w ^ ) T ( y − X w ^ ) E_{\hat w}=(\boldsymbol y-\boldsymbol X\hat w)^T(\boldsymbol y-\boldsymbol X\hat w) Ew^=(yXw^)T(yXw^),对 w ^ \hat w w^求导可得:

$$\frac{\partial E_{\hat{\boldsymbol w}}}{\partial \hat{\boldsymbol w}}=2\boldsymbol X^T(\boldsymbol X\hat{\boldsymbol w}-\boldsymbol y)$$

Setting this to zero, and assuming $\boldsymbol X^T\boldsymbol X$ is full-rank (hence invertible), gives:

$$\hat{\boldsymbol w}^*=(\boldsymbol X^T\boldsymbol X)^{-1}\boldsymbol X^T\boldsymbol y$$
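
In NumPy the normal equation takes only a few lines. A sketch assuming X is the raw m×d feature matrix, with the constant-1 column appended by hand; np.linalg.lstsq is the numerically safer choice when $\boldsymbol X^T\boldsymbol X$ is near-singular:

import numpy as np

def normal_equation(X, y):
    # append a column of ones so the last entry of w_hat is the bias b
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    # w_hat = (X^T X)^{-1} X^T y, valid when X^T X is invertible
    return np.linalg.inv(X_aug.T @ X_aug) @ X_aug.T @ y

# numerically safer alternative, solving the same least-squares problem:
# w_hat, *_ = np.linalg.lstsq(X_aug, y, rcond=None)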


NumPy Implementation

  • Core regression model:
import numpy as np

def linear_loss(X, y, w, b):
    num_train = X.shape[0]
    # model: y_hat = Xw + b
    y_hat = np.dot(X, w) + b
    # mean-squared-error loss
    loss = np.sum((y_hat - y) ** 2) / num_train
    # parameter gradients (the constant factor 2 is dropped here;
    # it is simply absorbed into the learning rate)
    dw = np.dot(X.T, (y_hat - y)) / num_train
    db = np.sum((y_hat - y)) / num_train
    return y_hat, loss, dw, db
  • Parameter initialization:
def initialize_params(dims):
    w = np.zeros((dims, 1))
    b = 0
    return w, b
  • Model training based on gradient descent:
def linear_train(X, y, learning_rate, epochs):
    w, b = initialize_params(X.shape[1])
    loss_list = []
    for i in range(1, epochs):
        # current predictions, loss, and parameter gradients
        y_hat, loss, dw, db = linear_loss(X, y, w, b)
        loss_list.append(loss)
        # gradient-descent parameter update
        w += -learning_rate * dw
        b += -learning_rate * db
        # print the iteration count and loss
        if i % 10000 == 0:
            print('epoch %d loss %f' % (i, loss))

        # save the parameters
        params = {
            'w': w,
            'b': b
        }

        # save the gradients
        grads = {
            'dw': dw,
            'db': db
        }

    return loss_list, loss, params, grads
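
  • A quick smoke test on synthetic data before touching a real dataset. A minimal sketch (the coefficients 3 and 4 and the demo names are made up purely for illustration); linear_train should approximately recover them:

# synthetic check: y = 3x + 4 plus small Gaussian noise
rng = np.random.RandomState(0)
X_demo = rng.rand(200, 1)
y_demo = 3 * X_demo + 4 + 0.01 * rng.randn(200, 1)
_, _, demo_params, _ = linear_train(X_demo, y_demo, 0.5, 50000)
print(demo_params['w'], demo_params['b'])  # should be close to 3 and 4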
  • A simple training run on the diabetes dataset from sklearn. Data preparation:
from sklearn.datasets import load_diabetes
from sklearn.utils import shuffle

diabetes = load_diabetes()
data = diabetes.data
target = diabetes.target 

# shuffle the data
X, y = shuffle(data, target, random_state=13)
X = X.astype(np.float32)

# simple train/test split
offset = int(X.shape[0] * 0.9)
X_train, y_train = X[:offset], y[:offset]
X_test, y_test = X[offset:], y[offset:]
y_train = y_train.reshape((-1,1))
y_test = y_test.reshape((-1,1))

print('X_train=', X_train.shape)
print('X_test=', X_test.shape)
print('y_train=', y_train.shape)
print('y_test=', y_test.shape)

X_train= (397, 10)
X_test= (45, 10)
y_train= (397, 1)
y_test= (45, 1)

  • Run the training:
loss_list, loss, params, grads = linear_train(X_train, y_train, 0.001, 100000)

epoch 10000 loss 5611.704502
epoch 20000 loss 5258.726277
epoch 30000 loss 4960.271811
epoch 40000 loss 4707.234957
epoch 50000 loss 4492.067734
epoch 60000 loss 4308.511724
epoch 70000 loss 4151.375892
epoch 80000 loss 4016.352800
epoch 90000 loss 3899.866551
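
  • As a sanity check, sklearn's closed-form LinearRegression can be fit on the same split and its test MSE compared with the loss above (a comparison sketch; its output is not reproduced here):

from sklearn.linear_model import LinearRegression

reg = LinearRegression().fit(X_train, y_train)
# test-set MSE of the closed-form fit, directly comparable to our loss
print('sklearn test MSE:', np.mean((reg.predict(X_test) - y_test) ** 2))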

  • Inspect the learned regression parameters:
print(params)

{'w': array([[ 34.67484945],
[ -5.41125944],
[ 173.52846987],
[ 125.02618256],
[ 30.88182121],
[ 23.07784921],
[ -110.84261285],
[ 107.50915451],
[ 145.73740332],
[ 108.18736122]]), 'b': 155.69121346255798}

  • Define a prediction function and predict on the test set:
def predict(X, params):
    w = params['w']
    b = params['b']

    y_pred = np.dot(X, w) + b    
    return y_pred

y_pred = predict(X_test, params)
y_pred[:5]

array([[ 132.02545017],
[ 141.77423134],
[ 150.48352021],
[ 128.48666753],
[ 147.29312454]])
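
  • Beyond eyeballing predictions, the coefficient of determination R² summarizes fit quality in one number. A NumPy-only sketch (r2_score here is our own helper, not sklearn's):

def r2_score(y_true, y_pred):
    # R^2 = 1 - SS_res / SS_tot
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot

print('test R2:', r2_score(y_test, y_pred))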

  • Plot the predictions against the true values with matplotlib:
import matplotlib.pyplot as plt
f = X_test.dot(params['w']) + params['b']

plt.scatter(range(X_test.shape[0]), y_test)
plt.plot(f, color = 'darkorange')
plt.xlabel('X')
plt.ylabel('y')
plt.show()

  • The decrease of the loss during training:
import matplotlib.pyplot as plt
plt.plot(loss_list, color = 'blue')
plt.xlabel('epochs')
plt.ylabel('loss')
plt.show()

(Figure: the training loss decreasing over epochs)

  • Wrapping everything into a linear regression class:
import numpy as np
from sklearn.utils import shuffle
from sklearn.datasets import load_diabetes
import matplotlib.pyplot as plt

class lr_model():
    def __init__(self):
        pass

    # data preparation
    def prepare_data(self):
        # the diabetes dataset
        data = load_diabetes().data
        target = load_diabetes().target

        # shuffle the data; random_state is the random seed
        X, y = shuffle(data, target, random_state=42)
        X = X.astype(np.float32)
        y = y.reshape((-1, 1))
        data = np.concatenate((X, y), axis=1)
        return data

    # parameter initialization
    def initialize_params(self, dims):
        w = np.zeros((dims, 1))
        b = 0
        return w, b

    # core regression model
    def linear_loss(self, X, y, w, b):
        num_train = X.shape[0]

        # model: y_hat = Xw + b
        y_hat = np.dot(X, w) + b
        # mean-squared-error loss
        loss = np.sum((y_hat - y) ** 2) / num_train
        # parameter gradients (the constant factor 2 is absorbed into the learning rate)
        dw = np.dot(X.T, (y_hat - y)) / num_train
        db = np.sum((y_hat - y)) / num_train
        return y_hat, loss, dw, db

    # model training based on gradient descent
    def linear_train(self, X, y, learning_rate, epochs):
        w, b = self.initialize_params(X.shape[1])

        for i in range(1, epochs):
            # current predictions, loss, and parameter gradients
            y_hat, loss, dw, db = self.linear_loss(X, y, w, b)

            # gradient-descent parameter update
            w += -learning_rate * dw
            b += -learning_rate * db

            # print the iteration count and loss
            if i % 10000 == 0:
                print('epoch %d loss %f' % (i, loss))

            # save the parameters
            params = {
                'w': w,
                'b': b
            }

            # save the gradients
            grads = {
                'dw': dw,
                'db': db
            }
        return loss, params, grads

    # prediction function for the validation/test set
    def predict(self, X, params):
        w = params['w']
        b = params['b']
        y_pred = np.dot(X, w) + b
        return y_pred

    # k-fold cross-validation
    def linear_cross_validation(self, data, k, randomize=True):
        if randomize:
            data = list(data)
            # sklearn's shuffle returns a shuffled copy rather than
            # shuffling in place, so the result must be reassigned
            data = shuffle(data)

        slices = [data[i::k] for i in range(k)]
        for i in range(k):
            validation = slices[i]
            train = [row
                     for s in slices if s is not validation for row in s]
            train = np.array(train)
            validation = np.array(validation)
            yield train, validation


if __name__ == '__main__':
    lr = lr_model()
    data = lr.prepare_data()

    # 5-fold cross-validation; loss5 collects each fold's training loss
    loss5 = []
    for train, validation in lr.linear_cross_validation(data, 5):
        # columns 0-9 hold the ten features (the stop index 10 is exclusive)
        X_train = train[:, :10]
        # -1 selects the last column, i.e. the target
        y_train = train[:, -1].reshape((-1, 1))
        X_valid = validation[:, :10]
        y_valid = validation[:, -1].reshape((-1, 1))

        # print('X_train=', X_train.shape)
        # X_train= (353, 10)
        
        # print('X_valid=', X_valid.shape)
        # X_valid= (89, 10)
        
        # print('y_train=', y_train.shape)
        # y_train= (353, 1)
        
        # print('y_valid=', y_valid.shape)
        # y_valid= (89, 1)
        

        # run the training
        loss, params, grads = lr.linear_train(X_train, y_train, 0.001, 100000)
        
        # print('loss=', loss)
        # loss= 3798.9563843996484
        
        # print('params=', params)
        # params= {'w': array([[  34.67484945],
		#        [  -5.41125944],
		#        [ 173.52846987],
		#        [ 125.02618256],
		#        [  30.88182121],
		#        [  23.07784921],
		#        [-110.84261285],
		#        [ 107.50915451],
		#        [ 145.73740332],
		#        [ 108.18736122]]), 'b': 155.69121346255798}
		
        # print('grads=', grads)
		# grads= {'dw': array([[-0.13523112],
		#       [ 0.21054297],
		#       [-1.25840588],
		#       [-0.86467876],
		#       [-0.03181374],
		#       [ 0.04389929],
		#       [ 0.7288374 ],
		#       [-0.60464886],
		#       [-0.98535644],
		#       [-0.65978939]]), 'db': -0.0004228883425417108}

        loss5.append(loss)
        score = np.mean(loss5)
        print('five fold cross validation score is', score)

        y_pred = lr.predict(X_valid, params)
        valid_score = np.sum(((y_pred - y_valid) ** 2)) / len(X_valid)
        print('valid score is', valid_score)

        f = X_valid.dot(params['w']) + params['b']

        # scatter(x, y) draws circles at the positions given by the vectors x and y
        plt.scatter(range(X_valid.shape[0]), y_valid)
        plt.plot(f, color='darkorange')
        plt.xlabel('X')
        plt.ylabel('y')
        plt.show()

epoch 10000 loss 5611.704502
epoch 20000 loss 5258.726277
epoch 30000 loss 4960.271811
epoch 40000 loss 4707.234957
epoch 50000 loss 4492.067734
epoch 60000 loss 4308.511724
epoch 70000 loss 4151.375892
epoch 80000 loss 4016.352800
epoch 90000 loss 3899.866551
five fold cross validation score is 3798.9563843996484
valid score is 4214.092765494475

(Figure: validation-set predictions plotted against the true values)


