[Python Machine Learning] Linear Models: Linear Regression

zhangbin_237

Article link: https://blog.csdn.net/weixin_39407597/article/details/135423821

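This article introduces the basic concepts of linear regression, including how it finds the parameters that minimize the mean squared error. Using examples on a one-dimensional dataset and a multidimensional dataset (the Boston housing dataset), it compares model performance on the training set and the test set and illustrates the problems of underfitting and overfitting.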

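Linear regression, also known as ordinary least squares (OLS), is the simplest and most classic linear method for regression problems. It finds the parameters w and b that minimize the mean squared error between the predictions on the training set and the true regression targets y.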

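The mean squared error is the sum of the squared differences between the predictions and the true values, divided by the number of samples. Linear regression has no hyperparameters to tune, which is an advantage, but it also means there is no way to control the model's complexity.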
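To make the definition concrete, here is a minimal sketch that computes the mean squared error by hand and finds w and b with NumPy's least-squares solver. The synthetic data, the helper name `mse`, and the true values 0.5 and 1.0 are assumptions chosen for illustration; they are not from the datasets used in this article.

```python
import numpy as np

# Synthetic 1-D data (hypothetical): y = 0.5 * x + 1 plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=60)
y = 0.5 * x + 1.0 + rng.normal(scale=0.1, size=60)


def mse(y_true, y_pred):
    """Mean squared error: sum of squared differences divided by n."""
    return np.sum((y_true - y_pred) ** 2) / len(y_true)


# Append a column of ones so the intercept b is estimated along with w.
A = np.column_stack([x, np.ones_like(x)])
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)

print(f"w = {w:.3f}, b = {b:.3f}")
print(f"MSE at the least-squares fit: {mse(y, w * x + b):.4f}")
# Any other line has an MSE at least as large on the same data.
print(f"MSE with a shifted slope:     {mse(y, (w + 0.1) * x + b):.4f}")
```

The least-squares solution is the (w, b) pair with the smallest training MSE, so nudging the slope away from it can only increase the error on the same data.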

One-dimensional dataset:

import mglearn.datasets
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Load the one-dimensional "wave" dataset with 60 samples
X, y = mglearn.datasets.make_wave(n_samples=60)
# Hold out part of the data for testing (fixed seed for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# Fit ordinary least squares on the training set
lr = LinearRegression().fit(X_train, y_train)

print('Slope: {}'.format(lr.coef_))
print('Intercept: {}'.format(lr.intercept_))
print('Training set score: {:.2f}'.format(lr.score(X_train, y_train)))
print('Test set score: {:.2f}'.format(lr.score(X_test, y_test)))

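The score is about 0.66, which is not very good. However, the training and test scores are very close, which suggests that the model is underfitting rather than overfitting.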
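For regressors in scikit-learn, `score` returns the coefficient of determination R². The sketch below computes R² by hand to show what the number means; the function name `r2_score_manual` and the toy values are assumptions made for illustration.

```python
import numpy as np


def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot


y_true = np.array([1.0, 2.0, 3.0, 4.0])

# Perfect predictions give R^2 = 1.
perfect = r2_score_manual(y_true, y_true)
# Always predicting the mean of y gives R^2 = 0.
baseline = r2_score_manual(y_true, np.full(4, y_true.mean()))
print(perfect, baseline)
```

Read this way, a score of 1 means a perfect fit and a score of 0 means the model does no better than predicting the mean of the targets.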

Multidimensional dataset:

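The mglearn package ships a ready-made version of the Boston housing dataset with 506 samples and 104 derived features (the 13 original features plus 91 degree-2 product terms).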
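As a rough sketch of how derived features of this kind can be built, the block below rescales 13 input columns and adds all degree-2 products with scikit-learn. It uses random placeholder data rather than the real housing values, and whether mglearn performs exactly these steps internally is an assumption.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, PolynomialFeatures

# Placeholder data with the same shape as the original Boston features
# (506 samples, 13 features); the values are random, not real housing data.
rng = np.random.default_rng(0)
X_raw = rng.random((506, 13))

# Rescale each feature to [0, 1], then add all degree-2 products.
X_scaled = MinMaxScaler().fit_transform(X_raw)
X_ext = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X_scaled)

print(X_ext.shape)  # (506, 104): 13 original columns + 91 products
```

The 91 extra columns are the 13 squared terms plus the 78 products of two distinct features.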

import mglearn.datasets
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Load the extended Boston housing dataset (original plus derived features)
X, y = mglearn.datasets.load_extended_boston()
# Hold out part of the data for testing (fixed seed for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# Fit ordinary least squares on the training set
lr = LinearRegression().fit(X_train, y_train)

print('Training set score: {:.2f}'.format(lr.score(X_train, y_train)))
print('Test set score: {:.2f}'.format(lr.score(X_test, y_test)))
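
A training score that is much higher than the test score is the usual sign of overfitting. The sketch below shows one way to compare that gap for a model with few features and a model with many derived features. It uses synthetic stand-in data from `make_regression`, so the sample size, noise level, and feature counts are assumptions and the printed numbers do not correspond to the Boston results.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data (hypothetical): few samples, noisy target.
X, y = make_regression(n_samples=120, n_features=10, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Expand to all degree-2 terms: 10 + 55 = 65 features for 90 training samples.
poly = PolynomialFeatures(degree=2, include_bias=False).fit(X_train)

base = LinearRegression().fit(X_train, y_train)
extended = LinearRegression().fit(poly.transform(X_train), y_train)

for name, model, tr, te in [
    ("original features", base, X_train, X_test),
    ("degree-2 features", extended, poly.transform(X_train), poly.transform(X_test)),
]:
    train_r2 = model.score(tr, y_train)
    test_r2 = model.score(te, y_test)
    print(f"{name}: train {train_r2:.2f}, test {test_r2:.2f}, gap {train_r2 - test_r2:.2f}")
```

Adding features can never lower the training score of an ordinary least squares fit, so the training score alone says nothing about generalization; the gap to the test score is what to watch.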