8-5 Learning Curves

Bonjour_Yvonne

Category: Machine Learning. Tag: machine learning

Article link: https://blog.csdn.net/Bonjour_h/article/details/117230774


Learning curve: how the performance of the trained model changes as the number of training samples gradually increases.
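The code in this post measures performance as root mean squared error (RMSE), taking the square root of the mean squared error before plotting. As a sketch of what the two plotted curves represent (the symbols m, f_m and n_test are notation introduced here for illustration, not the author's), with f_m the model fitted on the first m training samples:

```latex
\mathrm{RMSE}_{\text{train}}(m) = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\bigl(y_i - f_m(x_i)\bigr)^2},
\qquad
\mathrm{RMSE}_{\text{test}}(m) = \sqrt{\frac{1}{n_{\text{test}}}\sum_{j=1}^{n_{\text{test}}}\bigl(y^{\text{test}}_j - f_m(x^{\text{test}}_j)\bigr)^2}
```

The training error is evaluated only on the m samples the model has seen, while the test error is always evaluated on the full test set.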

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(666)
x = np.random.uniform(-3.0,3.0,size=100)
X = x.reshape(-1,1)
y = 0.5 * x ** 2 + x + 2 +np.random.normal(0,1,size=100)

plt.scatter(x,y)
plt.show()

Output figure: (scatter plot of x against y)

Learning curve

from sklearn.model_selection import train_test_split

X_train,X_test,y_train,y_test = train_test_split(X,y,random_state=666)
X_train.shape
Output: (75, 1)

from sklearn.linear_model import LinearRegression

from sklearn.metrics import mean_squared_error

# These two lists record how the linear model's error on the training set and on the
# test set changes as more and more training data is fed in
train_score = []
test_score = []

for i in range(1,76):
    lin_reg = LinearRegression()
    lin_reg.fit(X_train[:i],y_train[:i])

    y_train_predict = lin_reg.predict(X_train[:i])
    train_score.append(mean_squared_error(y_train[:i],y_train_predict))

    y_test_predict = lin_reg.predict(X_test)
    test_score.append(mean_squared_error(y_test,y_test_predict))

plt.plot([i for i in range(1,76)],np.sqrt(train_score),label="train")
plt.plot([i for i in range(1,76)],np.sqrt(test_score),label="test")

plt.axis([0,76,0,4])
plt.legend()  # add the legend
plt.show()

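Output: (figure: learning curve for linear regression, training and test RMSE against the number of training samples)

We can factor the steps above into a function: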

def plot_learning_curve(algo,X_train,X_test,y_train,y_test):
    train_score = []
    test_score = []

    for i in range(1,len(X_train)+1):
        algo.fit(X_train[:i],y_train[:i])

        y_train_predict = algo.predict(X_train[:i])
        train_score.append(mean_squared_error(y_train[:i],y_train_predict))

        y_test_predict = algo.predict(X_test)
        test_score.append(mean_squared_error(y_test,y_test_predict))
    
    plt.plot([i for i in range(1,len(X_train)+1)],np.sqrt(train_score),label="train")
    plt.plot([i for i in range(1,len(X_train)+1)],np.sqrt(test_score),label="test")
    plt.axis([0,len(X_train)+1,0,4])
    plt.legend()  # add the legend
    plt.show()
    
plot_learning_curve(LinearRegression(),X_train,X_test,y_train,y_test)  # error settles around 1.6: underfitting

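(Figure: learning curve for linear regression)
With linear regression, the error settles at around 1.6. The data was generated with noise of standard deviation 1, so a model that matches the data should reach an error close to 1. A plateau well above that level means the model is too simple for the data: this is underfitting.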
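To read the plateau as a number instead of estimating it from the plot, the same loop can return the RMSE values rather than draw them. The following is a minimal sketch, not the author's code: the helper name learning_curve_rmse is made up for this illustration, and the data and split simply repeat the setup from the top of the post.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Same synthetic data and split as at the top of the post
np.random.seed(666)
x = np.random.uniform(-3.0, 3.0, size=100)
X = x.reshape(-1, 1)
y = 0.5 * x ** 2 + x + 2 + np.random.normal(0, 1, size=100)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)


def learning_curve_rmse(algo, X_train, X_test, y_train, y_test):
    """Hypothetical helper: return (train_rmse, test_rmse), one entry per training-set size."""
    train_rmse, test_rmse = [], []
    for m in range(1, len(X_train) + 1):
        # Fit on the first m training samples only
        algo.fit(X_train[:m], y_train[:m])
        train_rmse.append(np.sqrt(mean_squared_error(y_train[:m], algo.predict(X_train[:m]))))
        test_rmse.append(np.sqrt(mean_squared_error(y_test, algo.predict(X_test))))
    return np.array(train_rmse), np.array(test_rmse)


train_rmse, test_rmse = learning_curve_rmse(
    LinearRegression(), X_train, X_test, y_train, y_test
)

# Compare the plateau with the noise level (standard deviation 1) used to generate y
print("final train RMSE:", train_rmse[-1])
print("final test RMSE:", test_rmse[-1])
```

Returning arrays keeps the computation separate from the plotting, so the same values can be plotted, compared across models, or checked in a test.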

Using polynomial regression

from sklearn.preprocessing import PolynomialFeatures
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

def PolynomialRegression(degree):
    return Pipeline([
        ("poly",PolynomialFeatures(degree=degree)),
        ("std_scaler",StandardScaler()),
        ("lin_reg",LinearRegression())
    ])
    
poly2_reg = PolynomialRegression(degree=2)
plot_learning_curve(poly2_reg,X_train,X_test,y_train,y_test)  # error settles around 1.0: the best case

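(Figure: learning curve for polynomial regression, degree = 2)
With polynomial regression at degree = 2, the error settles at around 1.0, and the training and test errors end up close to each other. Relatively speaking, this is the best case.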
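scikit-learn also ships a ready-made utility, sklearn.model_selection.learning_curve, which computes the same kind of curve with cross-validation instead of a single train/test split. This is a different technique from the hand-written loop in this post, so the numbers will not match the plots exactly. The sketch below assumes 5 folds and 5 training-set sizes, both chosen here only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Same synthetic data as at the top of the post
np.random.seed(666)
x = np.random.uniform(-3.0, 3.0, size=100)
X = x.reshape(-1, 1)
y = 0.5 * x ** 2 + x + 2 + np.random.normal(0, 1, size=100)

# Same pipeline as PolynomialRegression(degree=2) in the post
poly2_reg = Pipeline([
    ("poly", PolynomialFeatures(degree=2)),
    ("std_scaler", StandardScaler()),
    ("lin_reg", LinearRegression()),
])

# 5-fold cross-validation: each fold trains on at most 80 of the 100 samples
sizes, train_scores, test_scores = learning_curve(
    poly2_reg, X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),  # assumed grid of training-set fractions
    cv=5,
    scoring="neg_mean_squared_error",
)

# Scores are negated MSE: flip the sign, average over folds, then take the root
train_rmse = np.sqrt(-train_scores.mean(axis=1))
test_rmse = np.sqrt(-test_scores.mean(axis=1))

for m, tr, te in zip(sizes, train_rmse, test_rmse):
    print(m, round(float(tr), 3), round(float(te), 3))
```

Averaging over folds makes the curve less sensitive to which samples happen to land in the test set, at the cost of fitting the model several times per training-set size.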

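poly20_reg = PolynomialRegression(degree=20)
plot_learning_curve(poly20_reg,X_train,X_test,y_train,y_test)
# Overfitting: the error level is similar to the best case (both settle around 1.0),
# but where the model does well on the training set it does poorly on the test set,
# so the gap between training and test error is comparatively large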

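(Figure: learning curve for polynomial regression, degree = 20)

With polynomial regression at degree = 20, the error still settles at around 1.0, but where the model performs well on the training set it performs poorly on the test set (for example, with roughly 30 to 40 training samples). The gap between the training and test errors is comparatively large, which is overfitting.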
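The gap between the two curves can also be checked numerically at the full training-set size. The sketch below is an illustration rather than part of the original post: it refits the degree 2 and degree 20 pipelines on all 75 training samples and reports the training RMSE, the test RMSE and their difference. The function name polynomial_regression and the gaps dictionary are names introduced here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Same synthetic data and split as at the top of the post
np.random.seed(666)
x = np.random.uniform(-3.0, 3.0, size=100)
X = x.reshape(-1, 1)
y = 0.5 * x ** 2 + x + 2 + np.random.normal(0, 1, size=100)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)


def polynomial_regression(degree):
    """Same pipeline as PolynomialRegression in the post, under a hypothetical name."""
    return Pipeline([
        ("poly", PolynomialFeatures(degree=degree)),
        ("std_scaler", StandardScaler()),
        ("lin_reg", LinearRegression()),
    ])


gaps = {}
for degree in (2, 20):
    model = polynomial_regression(degree).fit(X_train, y_train)
    train_rmse = np.sqrt(mean_squared_error(y_train, model.predict(X_train)))
    test_rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    # Store (train RMSE, test RMSE, test minus train) for each degree
    gaps[degree] = (train_rmse, test_rmse, test_rmse - train_rmse)

for degree, (tr, te, gap) in gaps.items():
    print(f"degree={degree}: train={tr:.3f} test={te:.3f} gap={gap:.3f}")
```

A single number at the largest training-set size is only a summary. The learning curve remains more informative because it shows how the gap evolves as samples are added.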
