Implementing Linear Regression in Python (sklearn toolkit, Jupyter platform)

ChiCheng83

Published 2024-08-05 22:24:28


Column: Machine Learning; Tags: jupyter, linear regression, sklearn, python

Article link: https://blog.csdn.net/chencxiaobai/article/details/140718071


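The derivation of the algorithm was given in an earlier article, where the solution was obtained directly with least squares. That, however, is not really the machine-learning way of thinking, which is why gradient descent is introduced here.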

1. Solving the linear regression equation directly

import numpy as np  # import the base packages and some plot settings
import os
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
import warnings
warnings.filterwarnings('ignore')
np.random.seed(42)

import numpy as np  # build arbitrary data from y = 4 + 3x + Gaussian noise
X = 2*np.random.rand(100,1)  # 100 values uniformly distributed in [0, 2)
y = 4 + 3*X + np.random.randn(100,1)

plt.plot(X,y,'b.')  # plot the data points
plt.xlabel('X_1')
plt.ylabel('y')
plt.axis([0,2,0,15])
plt.show()

Compute the optimal theta with the least-squares formula (the normal equation): θ = (X_bᵀX_b)⁻¹X_bᵀy

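X_b = np.c_[np.ones((100,1)),X]  # prepend a column of ones for the bias (intercept) term
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)  # normal equation
X_new = np.array([[0],[2]])  # two x values, enough to draw the fitted line (also used by the plots below)
X_new_b = np.c_[np.ones((2,1)),X_new]  # add the bias column to them as well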

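theta_best holds exactly the b and a of y = ax + b: theta_best[0] is the intercept b and theta_best[1] is the slope a.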

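Connecting any two points on the line defined by these a and b values gives the regression line, so as long as we have the data X and y we can obtain the regression equation. However, this involves no learning process, and X_bᵀX_b is not always invertible, so other methods are needed.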
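The invertibility problem can be sidestepped without forming the explicit inverse. A minimal sketch, assuming NumPy's np.linalg.lstsq (np.linalg.pinv(X_b) @ y computes the same minimum-norm solution); the duplicated feature column is a made-up example of a singular X_bᵀX_b, and the data is regenerated so the block runs on its own:

```python
import numpy as np

rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))                    # same kind of data as above
y = 4 + 3 * X + rng.standard_normal((100, 1))

# Well-posed case: bias column + one feature.
X_b1 = np.c_[np.ones((100, 1)), X]
theta_single, *_ = np.linalg.lstsq(X_b1, y, rcond=None)

# Hypothetical ill-posed case: the feature column is duplicated, so
# X_b^T X_b is singular and the closed-form inverse is not usable.
X_b2 = np.c_[np.ones((100, 1)), X, X]
theta_dup, _, rank, _ = np.linalg.lstsq(X_b2, y, rcond=None)

print("rank of X_b2:", rank)                    # 2, not 3
print("single feature:", theta_single.ravel())
print("duplicated feature:", theta_dup.ravel())
# Both parameterizations give the same fitted values.
print(np.allclose(X_b1 @ theta_single, X_b2 @ theta_dup))
```

Because the duplicated column carries no new information, lstsq returns the minimum-norm solution, which splits the slope evenly between the two copies while the predictions stay identical to the well-posed fit.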

sklearn can perform the method above directly (sklearn.linear_model.LinearRegression).
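A minimal sketch with sklearn.linear_model.LinearRegression, regenerating the same data so the block stands alone. The estimator fits the intercept itself, so no column of ones is needed; intercept_ and coef_ hold b and a:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

lin_reg = LinearRegression()          # fit_intercept=True by default
lin_reg.fit(X, y)                     # no need to add the bias column ourselves
print(lin_reg.intercept_, lin_reg.coef_)   # intercept b and slope a

# Same parameters as the normal equation used above.
X_b = np.c_[np.ones((100, 1)), X]
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
print(theta_best.ravel())

X_new = np.array([[0], [2]])
print(lin_reg.predict(X_new))         # predictions at x = 0 and x = 2
```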

2. Batch gradient descent

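After getting the data, the first step, almost by reflex, is to standardize it: subtract the mean μ so that the data fluctuates around 0, then divide by each feature's standard deviation so that all features end up on comparable scales. Take age and salary as an example: age spans a small range (0 to 100) while salary spans a huge one (0 to 1,000,000). A gap this large makes gradient descent on a linear regression converge very slowly, and dividing by the standard deviation shrinks the difference between the two ranges. (The toy data in this article has a single feature that already lies in [0, 2], so the code below skips this step.)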
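A minimal NumPy sketch of that standardization; the age/salary values are randomly generated placeholders, not real data. sklearn.preprocessing.StandardScaler performs the same transformation.

```python
import numpy as np

# Hypothetical age/salary data, only to illustrate very different scales.
rng = np.random.default_rng(0)
age = rng.uniform(18, 65, size=(200, 1))
salary = rng.uniform(30_000, 1_000_000, size=(200, 1))
X = np.hstack([age, salary])

mu = X.mean(axis=0)          # per-feature mean
sigma = X.std(axis=0)        # per-feature standard deviation
X_std = (X - mu) / sigma     # each column now has mean 0 and std 1

print(X.std(axis=0))                           # wildly different scales
print(X_std.mean(axis=0), X_std.std(axis=0))   # roughly [0, 0] and [1, 1]
```

Keep mu and sigma from the training data and reuse them to transform any new data; otherwise new points end up on a different scale from the one the model was trained on.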

Batch gradient descent uses all samples in every update; the theta it converges to matches the least-squares result above.
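The loop below minimizes the mean squared error (MSE). Written out, the cost, its gradient, and the update rule it applies are as follows, with η = eta and m the number of samples. Setting the gradient to zero gives X_bᵀX_bθ = X_bᵀy, which is the least-squares solution from section 1; that is why both approaches end at the same θ.

```latex
\begin{aligned}
J(\theta) &= \frac{1}{m}\sum_{i=1}^{m}\bigl(\theta^{T}x^{(i)} - y^{(i)}\bigr)^{2} = \frac{1}{m}\lVert X_b\theta - y\rVert^{2} \\
\nabla_{\theta} J(\theta) &= \frac{2}{m}\, X_b^{T}\bigl(X_b\theta - y\bigr) \\
\theta &\leftarrow \theta - \eta\, \nabla_{\theta} J(\theta)
\end{aligned}
```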

eta = 0.1  # learning rate
n_iterations = 1000  # number of iterations
m = 100  # number of samples
theta = np.random.randn(2,1)  # random initial theta
for iteration in range(n_iterations):
    gradients = 2/m * X_b.T.dot(X_b.dot(theta)-y)  # gradient of the MSE cost over all samples
    theta = theta - eta*gradients  # update theta
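To check the claim that batch gradient descent lands on the least-squares solution, here is the same loop wrapped in a hypothetical helper batch_gradient_descent and compared with np.linalg.lstsq on regenerated data:

```python
import numpy as np

def batch_gradient_descent(X_b, y, eta=0.1, n_iterations=1000, seed=42):
    """Plain batch gradient descent on the MSE, same update as the loop above."""
    m = len(X_b)
    theta = np.random.default_rng(seed).standard_normal((X_b.shape[1], 1))
    for _ in range(n_iterations):
        gradients = 2 / m * X_b.T @ (X_b @ theta - y)   # uses all m samples
        theta = theta - eta * gradients
    return theta

np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_b = np.c_[np.ones((100, 1)), X]

theta_gd = batch_gradient_descent(X_b, y)
theta_ls, *_ = np.linalg.lstsq(X_b, y, rcond=None)   # least-squares reference
print(theta_gd.ravel(), theta_ls.ravel())
print(np.allclose(theta_gd, theta_ls, atol=1e-6))
```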

Effect of the learning rate on the result:

theta_path_bgd = []  # stores how theta changes during batch gradient descent
def plot_gradient_descent(theta,eta,theta_path = None):
    m = len(X_b)
    plt.plot(X,y,'b.')
    n_iterations = 1000
    for iteration in range(n_iterations):
        y_predict = X_new_b.dot(theta)
        plt.plot(X_new,y_predict,'b-')
        gradients = 2/m* X_b.T.dot(X_b.dot(theta)-y)
        theta = theta - eta*gradients
        if theta_path is not None:
            theta_path.append(theta)
    plt.xlabel('X_1')
    plt.axis([0,2,0,15])
    plt.title('eta = {}'.format(eta))

theta = np.random.randn(2,1)
# learning rates of 0.02, 0.1 and 0.5, all starting from the same random theta
plt.figure(figsize=(10,4))
plt.subplot(131)
plot_gradient_descent(theta,eta = 0.02)
plt.subplot(132)
plot_gradient_descent(theta,eta = 0.1,theta_path=theta_path_bgd)
plt.subplot(133)
plot_gradient_descent(theta,eta = 0.5)
plt.show()

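With the small learning rate it takes many steps to reach the ideal position, with 0.1 it gets there faster, and with 0.5 it overshoots and moves away from the correct solution. So when in doubt, choose a learning rate that is too small rather than too large.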
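The blow-up at 0.5 has a simple explanation. For the MSE cost the Hessian is the constant matrix H = (2/m)X_bᵀX_b, and batch gradient descent with a fixed step converges only if η < 2/λ_max(H). A minimal sketch that computes this bound for data generated like the example above and checks both sides of it; the 0.9 and 1.1 factors and the helper name run are arbitrary choices for illustration. For this data the bound should come out a little under 0.5, consistent with the right-hand plot.

```python
import numpy as np

np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_b = np.c_[np.ones((100, 1)), X]
m = len(X_b)

H = 2 / m * X_b.T @ X_b                    # Hessian of the MSE cost (constant)
eta_max = 2 / np.linalg.eigvalsh(H).max()  # largest stable fixed learning rate
print("largest stable learning rate ~", eta_max)

def run(eta, n_iterations=300):
    """Batch gradient descent from theta = 0 with a fixed learning rate."""
    theta = np.zeros((2, 1))
    for _ in range(n_iterations):
        theta = theta - eta * 2 / m * X_b.T @ (X_b @ theta - y)
    return theta

theta_ls, *_ = np.linalg.lstsq(X_b, y, rcond=None)
for factor in (0.9, 1.1):
    err = np.linalg.norm(run(factor * eta_max) - theta_ls)
    print(f"eta = {factor} * eta_max -> distance to optimum {err:.3g}")
```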

3. Stochastic gradient descent

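Each gradient step uses one randomly chosen sample. We also specify a decay schedule: the learning rate is largest at the start and shrinks gradually as we approach the optimum. In other words, we first take large steps to approach the minimum loss quickly, and the closer we get, the smaller the steps need to be so that we do not overshoot it.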

theta_path_sgd=[]
m = len(X_b)
np.random.seed(42)
n_epochs = 50

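t0 = 5     # numerator of the schedule
t1 = 50    # offset in the denominator of the schedule

def learning_schedule(t):
    return t0/(t1+t)    # the more steps taken, the smaller the learning rate

theta = np.random.randn(2,1)

for epoch in range(n_epochs):    # 50 epochs
    for i in range(m):            # m steps per epoch, each on a randomly drawn sample (with replacement)
        if epoch < 10 and i<10:    # draw the current line only for the first 10 steps of the first 10 epochs
            y_predict = X_new_b.dot(theta)
            plt.plot(X_new,y_predict,'r-')
        random_index = np.random.randint(m)    # pick one of the m samples at random
        xi = X_b[random_index:random_index+1]    # the chosen sample (slicing keeps it 2-D)
        yi = y[random_index:random_index+1]
        gradients = 2* xi.T.dot(xi.dot(theta)-yi)    # gradient on this single sample
        eta = learning_schedule(epoch*m+i)
        theta = theta-eta*gradients    # update the parameters
        theta_path_sgd.append(theta)    # record the parameters

plt.plot(X,y,'b.')
plt.axis([0,2,0,15])
plt.show()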

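Because each step depends on a randomly drawn sample and individual samples differ from one another, the fitted lines jitter from step to step; this is normal for stochastic gradient descent. Without the fixed seed (np.random.seed(42)), every rerun would also produce different lines.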
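A plot-free version of the SGD loop above, handy for checking where it ends up. The constants t0 = 5 and t1 = 50 are the ones from the article; the data is regenerated with the same seed, so the random draws differ slightly from the plotted run.

```python
import numpy as np

np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)
X_b = np.c_[np.ones((100, 1)), X]
m = len(X_b)

t0, t1 = 5, 50                      # same schedule constants as above
def learning_schedule(t):
    return t0 / (t1 + t)            # decays roughly like 1/t

theta = np.random.randn(2, 1)
for epoch in range(50):
    for i in range(m):
        idx = np.random.randint(m)             # one random sample per step
        xi, yi = X_b[idx:idx + 1], y[idx:idx + 1]
        gradients = 2 * xi.T @ (xi @ theta - yi)
        theta = theta - learning_schedule(epoch * m + i) * gradients

theta_ls, *_ = np.linalg.lstsq(X_b, y, rcond=None)
print("first/last learning rate:", learning_schedule(0), learning_schedule(50 * m - 1))
print("SGD:", theta.ravel(), "least squares:", theta_ls.ravel())
```

The schedule starts at 5/50 = 0.1 and ends near 0.001 after 50 × 100 steps, which is what lets the parameters settle instead of bouncing around indefinitely.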

4. Mini-batch gradient descent

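Picking a single random sample is unreliable, and using every sample is too slow, so the compromise is mini-batch gradient descent: each update uses a small random batch of samples (16 here).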

theta_path_mgd=[]
n_epochs = 50
minibatch = 16
theta = np.random.randn(2,1)
t0, t1 = 200, 1000
def learning_schedule(t):
    return t0 / (t + t1)
np.random.seed(42)
t = 0
for epoch in range(n_epochs):
    shuffled_indices = np.random.permutation(m)    # shuffle so that each epoch forms different batches
    X_b_shuffled = X_b[shuffled_indices]
    y_shuffled = y[shuffled_indices]
    for i in range(0,m,minibatch):
        t+=1
        xi = X_b_shuffled[i:i+minibatch]    # take the next mini-batch
        yi = y_shuffled[i:i+minibatch]
        gradients = 2/len(xi)* xi.T.dot(xi.dot(theta)-yi)    # average over the actual batch size (the last batch holds only 100 % 16 = 4 samples)
        eta = learning_schedule(t)
        theta = theta-eta*gradients
        theta_path_mgd.append(theta)

The resulting theta (evaluate theta in a cell to display it):
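With m = 100 and minibatch = 16, each epoch produces six batches of 16 and one final batch of only 4 samples, so the gradient is best averaged over the actual batch size rather than a fixed 16. A minimal sketch of a reusable batch generator; iterate_minibatches and minibatch_gd are hypothetical helpers (not part of NumPy or sklearn), and the constant learning rate of 0.05 is a simplification of the article's decaying schedule. Setting batch_size=1 turns it into stochastic gradient descent, and batch_size=m into batch gradient descent.

```python
import numpy as np

def iterate_minibatches(X_b, y, batch_size, rng):
    """Yield shuffled (X, y) mini-batches covering every sample once per epoch."""
    indices = rng.permutation(len(X_b))          # reshuffle every epoch
    for start in range(0, len(X_b), batch_size):
        batch = indices[start:start + batch_size]
        yield X_b[batch], y[batch]

def minibatch_gd(X_b, y, batch_size=16, n_epochs=50, eta=0.05, seed=42):
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal((X_b.shape[1], 1))
    for _ in range(n_epochs):
        for xi, yi in iterate_minibatches(X_b, y, batch_size, rng):
            gradients = 2 / len(xi) * xi.T @ (xi @ theta - yi)  # average over the real batch size
            theta = theta - eta * gradients
    return theta

rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X + rng.standard_normal((100, 1))
X_b = np.c_[np.ones((100, 1)), X]

sizes = [len(xi) for xi, _ in iterate_minibatches(X_b, y, 16, np.random.default_rng(0))]
print(sizes)                                     # [16, 16, 16, 16, 16, 16, 4]
theta_ls, *_ = np.linalg.lstsq(X_b, y, rcond=None)
print(minibatch_gd(X_b, y).ravel(), theta_ls.ravel())
```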

5. Comparing the three strategies

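theta_path_bgd = np.array(theta_path_bgd)  # convert the recorded lists to arrays so they can be sliced
theta_path_sgd = np.array(theta_path_sgd)
theta_path_mgd = np.array(theta_path_mgd)

plt.figure(figsize=(12,6))
plt.plot(theta_path_sgd[:,0],theta_path_sgd[:,1],'r-s',linewidth=1,label='SGD')
plt.plot(theta_path_mgd[:,0],theta_path_mgd[:,1],'g-+',linewidth=2,label='MINIGD')
plt.plot(theta_path_bgd[:,0],theta_path_bgd[:,1],'b-o',linewidth=3,label='BGD')
plt.legend(loc='upper left')
plt.axis([3.5,4.5,2.0,4.0])
plt.show()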

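Blue is batch gradient descent, which heads straight for the optimum. Green is mini-batch gradient descent, which fluctuates around the batch path. Red is stochastic gradient descent, which oscillates noticeably because it is strongly affected by individual samples.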