Linear Regression Algorithm

cz? 帅哥:null

Tags: linear regression, algorithm, regression

Article link: https://blog.csdn.net/cghhbvcgjb/article/details/141857976

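Definition

Linear regression is a method of analysis that uses a regression equation (function) to model the relationship between one or more independent variables (features) and a dependent variable (target).

With a single independent variable it is called univariate regression; with more than one it is called multiple regression.

Basic usage of the linear regression API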

from sklearn.linear_model import LinearRegression

# Load the dataset
x = [[80, 86],
     [82, 80],
     [85, 78],
     [90, 90],
     [86, 82],
     [82, 90],
     [78, 80],
     [92, 94]]
y = [84.2, 80.6, 80.1, 90, 83.2, 87.6, 79.4, 93.4]

# Basic data processing (omitted in this example)
# Feature engineering (omitted in this example)

# Train the model
estimator = LinearRegression()
estimator.fit(x, y)


# Model evaluation (omitted in this example)
# Print the learned coefficients
print("Linear regression coefficients:\n", estimator.coef_)
print("Prediction:\n", estimator.predict([[100, 80]]))
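
To make the printed coefficients concrete: a fitted LinearRegression predicts by taking the dot product of the features with coef_ and adding intercept_. The sketch below refits the same toy data and checks this by hand. The names sample and manual are only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = [[80, 86], [82, 80], [85, 78], [90, 90],
     [86, 82], [82, 90], [78, 80], [92, 94]]
y = [84.2, 80.6, 80.1, 90, 83.2, 87.6, 79.4, 93.4]

estimator = LinearRegression()
estimator.fit(x, y)

sample = np.array([[100, 80]])
# Prediction computed by hand: w1*x1 + w2*x2 + b
manual = sample @ estimator.coef_ + estimator.intercept_
print("manual: ", manual)
print("predict:", estimator.predict(sample))
```

Both lines should print the same value, which shows that the model is nothing more than a weighted sum plus a bias.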

Math: derivatives

Covered previously, so the rules are not repeated here.

Loss and optimization in linear regression

Loss function
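
The formula for this section is not shown in the post. As a reference point, and assuming the usual least-squares setup with m samples and a hypothesis h_w(x) = w^T x + b, the total loss is commonly written as:

```latex
J(w) = \sum_{i=1}^{m} \left( h_w(x_i) - y_i \right)^2
```

Here y_i is the true value of sample i and h_w(x_i) is the model's prediction for it. Minimizing J is what "least squares" refers to.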

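Optimization algorithms

1. Normal equation

Principle: to optimize, we need the w that minimizes the loss function. Take the derivative of the loss with respect to w and set it to zero; the w that solves this equation is the one at which the loss is smallest.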
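
For the squared loss, setting the derivative to zero gives the closed form w = (X^T X)^-1 X^T y. This is the standard result, stated here because the formula itself is not shown in the post. Below is a minimal NumPy sketch on the toy scores from the API example, with a column of ones appended so that the bias is learned as the last weight. The names Xb and w are illustrative.

```python
import numpy as np

X = np.array([[80, 86], [82, 80], [85, 78], [90, 90],
              [86, 82], [82, 90], [78, 80], [92, 94]], dtype=float)
y = np.array([84.2, 80.6, 80.1, 90, 83.2, 87.6, 79.4, 93.4])

# Append a column of ones so the intercept becomes part of w
Xb = np.hstack([X, np.ones((X.shape[0], 1))])

# Solve (X^T X) w = X^T y; solve() avoids forming the inverse explicitly
w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
print("weights:", w[:-1])
print("bias:", w[-1])
```

Solving the linear system directly is usually preferred over calling an explicit matrix inverse, since it is cheaper and numerically more stable.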

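2. Gradient descent

For a single-variable function, the gradient is the derivative of the function: the slope of the tangent line at a given point.

For a multivariable function, the gradient is a vector whose direction is the direction of steepest ascent at the given point.

Formula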
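
The formula is not shown in the post at this point. The standard gradient descent update, given here as an assumption about what was intended, is:

```latex
\theta_{i+1} = \theta_i - \alpha \, \nabla J(\theta_i)
```

Here i counts iterations, and the minus sign means each step moves against the gradient, that is, downhill.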

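α: step size / learning rate. α controls how far each step moves. If it is too large, a step can overshoot and miss the lowest point; if it is too small, progress is needlessly slow. Choosing a suitable α matters a lot.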
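
The effect of α is easy to see on a toy problem. The sketch below runs gradient descent on f(x) = x**2, chosen only for illustration (its gradient is 2*x and its minimum is at 0), with three different step sizes. The function name descend and the specific α values are assumptions.

```python
def descend(alpha, steps=20, x0=1.0):
    """Run gradient descent on f(x) = x**2, whose gradient is 2*x."""
    x = x0
    for _ in range(steps):
        x = x - alpha * 2 * x
    return x

small = descend(alpha=0.01)   # moves toward 0, but slowly
good = descend(alpha=0.3)     # very close to the minimum after 20 steps
large = descend(alpha=1.1)    # overshoots further on every step and diverges

print(abs(small), abs(good), abs(large))
```

With the small step the iterate is still far from 0 after 20 steps, with the moderate step it is essentially at 0, and with the large step it ends up further away than where it started.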

Gradient descent is not guaranteed to find the global minimum; it may stop at a local optimum.

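Summary

Gradient descent (in detail)

1. Prerequisite: fix the hypothesis function and the loss function of the model being optimized

2. Initialize the relevant parameters

3. Algorithm steps

(1) Compute the gradient of the loss function at the current position

(2) Multiply the gradient by the step size to get the distance to descend from the current position

(3) Check whether the descent distance is below the termination threshold for every θ. If it is, stop; otherwise, update all θ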

……
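
The three steps above can be sketched as batch gradient descent for a linear model. This is a minimal illustration, not the implementation used by any library: the mean squared error loss, the parameter names alpha and epsilon, and the synthetic data are all assumptions.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, epsilon=1e-8, max_iter=10000):
    """Batch gradient descent on the mean squared error of a linear model."""
    m, n = X.shape
    Xb = np.hstack([X, np.ones((m, 1))])   # last column carries the bias
    theta = np.zeros(n + 1)                # step 2: initialize parameters
    for _ in range(max_iter):
        # (1) gradient of J(theta) = mean((Xb @ theta - y)**2) at the current position
        grad = 2.0 / m * Xb.T @ (Xb @ theta - y)
        # (2) step size times gradient = distance to move
        step = alpha * grad
        # (3) stop once every component would move less than epsilon
        if np.all(np.abs(step) < epsilon):
            break
        theta = theta - step
    return theta

# Synthetic, noise-free data generated from y = 3*x1 - 2*x2 + 5
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 5

theta = gradient_descent(X, y)
print(theta)  # close to [3, -2, 5]
```

Because the data has no noise, the recovered parameters should match the generating coefficients up to the termination threshold.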

Case study

Normal equation implementation

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def linear_model():
    # Load the data
    housing = fetch_california_housing()
    # Basic data processing: split into training and test sets
    x_train, x_test, y_train, y_test = train_test_split(housing.data, housing.target, test_size=0.2)

    # Feature engineering: standardization
    # Fit the scaler on the training set only, then reuse its statistics on the test set
    transfer = StandardScaler()
    x_train = transfer.fit_transform(x_train)
    x_test = transfer.transform(x_test)

    # Machine learning: linear regression (normal equation)
    estimator = LinearRegression()
    estimator.fit(x_train, y_train)

    print("Model intercept:\n", estimator.intercept_)
    print("Model coefficients:\n", estimator.coef_)
    # Model evaluation
    y_pre = estimator.predict(x_test)
    print("Predictions:\n", y_pre)
    ret = mean_squared_error(y_test, y_pre)  # mean squared error
    print("Mean squared error:\n", ret)

linear_model()

Gradient descent implementation

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error

def linear_model():
    # Load the data
    housing = fetch_california_housing()
    # Basic data processing: split into training and test sets
    x_train, x_test, y_train, y_test = train_test_split(housing.data, housing.target, test_size=0.2)

    # Feature engineering: standardization
    # Fit the scaler on the training set only, then reuse its statistics on the test set
    transfer = StandardScaler()
    x_train = transfer.fit_transform(x_train)
    x_test = transfer.transform(x_test)

    # Machine learning: linear regression (gradient descent)
    estimator = SGDRegressor(max_iter=10000)
    estimator.fit(x_train, y_train)

    print("Model intercept:\n", estimator.intercept_)
    print("Model coefficients:\n", estimator.coef_)
    # Model evaluation
    y_pre = estimator.predict(x_test)
    print("Predictions:\n", y_pre)
    ret = mean_squared_error(y_test, y_pre)  # mean squared error
    print("Mean squared error:\n", ret)

linear_model()
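
SGDRegressor also exposes the step size discussed earlier through its learning_rate and eta0 parameters. The sketch below is a variation on the example above, not part of the original case study: it uses synthetic data from make_regression so that it runs without downloading anything, and it fixes the step size with learning_rate="constant". The values of eta0, noise, and random_state are arbitrary choices for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (illustrative only)
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Standardize using statistics from the training set only
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)

# learning_rate="constant" keeps the step size fixed at eta0
estimator = SGDRegressor(learning_rate="constant", eta0=0.01, max_iter=1000, random_state=0)
estimator.fit(x_train, y_train)

sgd_r2 = estimator.score(x_test, y_test)
print("MSE:", mean_squared_error(y_test, estimator.predict(x_test)))
print("R^2:", sgd_r2)
```

Trying a much larger eta0 on the same data is a quick way to see the overshooting behaviour described for α.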

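Overfitting and underfitting

Underfitting: the model fits the training data poorly and also fits the test data poorly (the model is too simple)

Cause: too few features of the data have been learned

Fix: add more features, add polynomial features

Overfitting: the model fits the training data very well but fits the test data poorly (the model is too complex)

Cause: there are too many raw features, some of them noisy, and the model becomes overly complex because it tries to accommodate every individual training data point

Fix: clean the data again, increase the amount of training data, apply regularization, reduce the feature dimensionality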
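
Both failure modes can be reproduced on a small synthetic problem. The sketch below fits polynomials of degree 1, 3, and 12 to a few noisy samples of a sine curve and reports training and test error for each. The data, the degrees, and the helper make_data are assumptions made for illustration, and the exact numbers depend on the random samples.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

def make_data(n):
    # Noisy samples of a sine curve (synthetic, for illustration only)
    x = rng.uniform(0, 1, size=(n, 1))
    y = np.sin(2 * np.pi * x[:, 0]) + rng.normal(scale=0.2, size=n)
    return x, y

x_train, y_train = make_data(15)
x_test, y_test = make_data(200)

train_mse = {}
test_mse = {}
for degree in (1, 3, 12):
    # Polynomial features followed by ordinary linear regression
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_train, y_train)
    train_mse[degree] = mean_squared_error(y_train, model.predict(x_train))
    test_mse[degree] = mean_squared_error(y_test, model.predict(x_test))
    print(degree, train_mse[degree], test_mse[degree])
```

Typically the degree-1 model shows a high error on both sets (underfitting), while the degree-12 model drives the training error close to zero yet does worse on the test set than the degree-3 model (overfitting).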

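Regularization: prevents overfitting by limiting the influence of high-order terms, that is, by penalizing large weights

Ridge regression is linear regression with L2 regularization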
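
To see what the L2 penalty does, here is a sketch of the closed-form ridge solution w = (X^T X + alpha*I)^-1 X^T y, which minimizes the squared error plus alpha * ||w||^2. It is a simplified version without an intercept, written for illustration on synthetic data; ridge_weights is a hypothetical helper, not a scikit-learn function.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([4.0, -3.0, 2.0, 0.0, 0.0]) + rng.normal(scale=0.5, size=50)

def ridge_weights(X, y, alpha):
    """Closed-form ridge solution without an intercept."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

norms = []
for alpha in (0.0, 1.0, 10.0, 100.0):
    w = ridge_weights(X, y, alpha)
    norms.append(np.linalg.norm(w))
    print(alpha, np.round(w, 3))
```

As alpha grows, the weights shrink toward zero. With alpha = 0 the result is the ordinary least-squares solution.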

Regularized linear models

Ridge regression

from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge, RidgeCV
from sklearn.metrics import mean_squared_error

def linear_model():
    # Load the data
    housing = fetch_california_housing()
    # Basic data processing: split into training and test sets
    x_train, x_test, y_train, y_test = train_test_split(housing.data, housing.target, test_size=0.2)

    # Feature engineering: standardization
    # Fit the scaler on the training set only, then reuse its statistics on the test set
    transfer = StandardScaler()
    x_train = transfer.fit_transform(x_train)
    x_test = transfer.transform(x_test)

    # Machine learning: ridge regression
    # estimator = Ridge(alpha=1.0)
    # RidgeCV picks the best alpha from the candidates by cross-validation
    estimator = RidgeCV(alphas=(0.001, 0.01, 0.1, 1, 10, 100))
    estimator.fit(x_train, y_train)

    print("Model intercept:\n", estimator.intercept_)
    print("Model coefficients:\n", estimator.coef_)
    # Model evaluation
    y_pre = estimator.predict(x_test)
    print("Predictions:\n", y_pre)
    ret = mean_squared_error(y_test, y_pre)  # mean squared error
    print("Mean squared error:\n", ret)

linear_model()

Saving and loading models

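# sklearn.externals.joblib has been removed from recent scikit-learn; import joblib directly
import joblib
from sklearn.linear_model import LinearRegression

# Any fitted estimator works; a tiny model is trained here so the snippet runs on its own
estimator = LinearRegression().fit([[0], [1], [2]], [0, 1, 2])
# Save
joblib.dump(estimator, 'test.pkl')
# Load
estimator = joblib.load('test.pkl')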
