Linear Regression 2

zjkman163com

Tags: machine learning

Article link: https://blog.csdn.net/zjkman163com/article/details/107550491

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Load the house-price dataset
df = pd.read_excel('./datasets/house.xlsx')
# Features: every column except the target and the row-number column 'No'
feature = df.drop(columns=['Y house price of unit area', 'No'])
target = df['Y house price of unit area']
x_train, x_test, y_train, y_test = train_test_split(feature, target, test_size=0.2, random_state=2020)

# Fit an ordinary least squares model
regression = LinearRegression()
regression.fit(x_train, y_train)
regression.score(x_test, y_test)  # R^2 on the test set
regression.coef_                  # one weight per feature
regression.intercept_             # bias term
list(zip(feature.columns, regression.coef_))  # pair each feature name with its weight

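from sklearn.metrics import mean_squared_error as mse
y_pred = regression.predict(x_test)

# Compare the MSE with the range of the target to judge how large the error is
mse(y_test, y_pred)
y_test.min(), y_test.max()

from sklearn.metrics import r2_score
r2_score(y_test, y_pred)  # same value as regression.score(x_test, y_test)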

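import matplotlib.pyplot as plt
# Each series is sorted independently, so the plot compares the overall
# distributions of true and predicted values, not individual samples
plt.plot(range(len(y_test)), sorted(y_test), c='black', label='y_true')
plt.plot(range(len(y_pred)), sorted(y_pred), c='red', label='y_pred')
plt.legend()
plt.show()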

from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error, r2_score

# Expand the features with all polynomial terms up to degree 2
p2 = PolynomialFeatures(degree=2)
feature_p_2 = p2.fit_transform(feature)
x_train, x_test, y_train, y_test = train_test_split(feature_p_2, target, test_size=0.2, random_state=2020)
regression = LinearRegression()
regression.fit(x_train, y_train)
y_pred = regression.predict(x_test)

# Evaluate the degree-2 model on the test set
mean_squared_error(y_test, y_pred)
r2_score(y_test, y_pred)

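# Repeat with terms up to degree 3; the much larger feature set makes overfitting more likely
p3 = PolynomialFeatures(degree=3)
feature_p_3 = p3.fit_transform(feature)
x_train, x_test, y_train, y_test = train_test_split(feature_p_3, target, test_size=0.2, random_state=2020)
regression = LinearRegression()
regression.fit(x_train, y_train)
y_pred = regression.predict(x_test)
mean_squared_error(y_test, y_pred)
r2_score(y_test, y_pred)
regression.coef_  # inspect the weights of the higher-order terms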

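Underfitting: the model cannot fit the data well, not even the training data. (The model is too simple.)
Overfitting: a hypothesis fits the training data better than other hypotheses but fails to fit data outside the training set; such a hypothesis is said to overfit. (The model is too complex.)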
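
The two definitions can be checked by comparing training and test scores across model complexity. The following is a minimal sketch on synthetic data, not the house-price dataset: the noisy quadratic curve, the sample size, and the degrees 1, 2 and 10 are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumed for illustration): a noisy quadratic curve
rng = np.random.RandomState(0)
x = rng.uniform(-1, 1, size=(60, 1))
y = 3 * x[:, 0] ** 2 + x[:, 0] + rng.normal(scale=0.3, size=60)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=0
)

scores = {}
for degree in (1, 2, 10):
    poly = PolynomialFeatures(degree=degree)
    model = LinearRegression()
    # Fit on the expanded training features only
    model.fit(poly.fit_transform(x_train), y_train)
    train_r2 = model.score(poly.transform(x_train), y_train)
    test_r2 = model.score(poly.transform(x_test), y_test)
    scores[degree] = (train_r2, test_r2)
    print(f"degree={degree:2d}  train R^2={train_r2:.3f}  test R^2={test_r2:.3f}")
```

A low score on both sets points to underfitting. A training score that keeps rising while the test score stalls or drops points to overfitting. The training score can only go up as the degree grows, because each larger model contains the smaller one.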

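Underfitting:
Cause: the model has learned too few features of the samples.
Solution: increase the number of features per sample (polynomial regression).
How can higher-order features be added to the samples?
Use sklearn.preprocessing.PolynomialFeatures to construct higher-order features.
It generates polynomial combinations: given two features a and b, the degree-2 expansion is (1, a, b, a^2, ab, b^2).
PolynomialFeatures has three main parameters:
degree: the maximum degree of the polynomial terms.
interaction_only: defaults to False. If True, only interaction terms are generated, so the degree-2 expansion above has no a^2 or b^2.
include_bias: defaults to True. If False, the constant term 1 above is omitted.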
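
To make the three parameters concrete, here is a small sketch that expands a single sample with a = 2 and b = 3. The sample values are arbitrary and chosen only so that each generated column is easy to recognize.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# One sample with two features: a = 2, b = 3 (values chosen for illustration)
X = np.array([[2, 3]])

# Default settings: columns are 1, a, b, a^2, ab, b^2
full = PolynomialFeatures(degree=2).fit_transform(X)
print(full.tolist())      # values 1, 2, 3, 4, 6, 9

# interaction_only=True drops the pure powers a^2 and b^2: 1, a, b, ab
inter = PolynomialFeatures(degree=2, interaction_only=True).fit_transform(X)
print(inter.tolist())     # values 1, 2, 3, 6

# include_bias=False drops the constant column: a, b, a^2, ab, b^2
no_bias = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
print(no_bias.tolist())   # values 2, 3, 4, 6, 9
```

Since LinearRegression fits its own intercept by default, include_bias=False is a reasonable choice when the expanded features feed straight into it.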

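Overfitting:
Cause: there are too many original features, and some of them are noisy.
Solutions:
Perform feature selection and eliminate highly correlated features (hard to do well).
Regularization with ridge regression (the one to master).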
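
One simple way to eliminate highly correlated features is to drop one column from every pair whose absolute correlation exceeds a cutoff. The sketch below shows one possible approach, not a method taken from this article: the synthetic columns a, a_copy and c and the 0.95 threshold are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic features (assumed): 'a_copy' is almost a rescaled duplicate of 'a'
rng = np.random.RandomState(0)
a = rng.normal(size=200)
df = pd.DataFrame({
    "a": a,
    "a_copy": 2 * a + rng.normal(scale=0.01, size=200),
    "c": rng.normal(size=200),
})

corr = df.corr().abs()
# Keep only the upper triangle so each pair of columns is examined once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

threshold = 0.95  # assumed cutoff; tune it for real data
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
reduced = df.drop(columns=to_drop)
print(to_drop)
print(list(reduced.columns))
```

This only catches pairwise linear correlation, which is part of why feature selection is hard to do well in practice.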

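Handling overfitting: regularization

Regularization shrinks the weights w, including those of the higher-order terms, toward 0.
LinearRegression has no regularization option, so the model overfits easily and cannot correct this on its own.
L2 regularization:
Use a regression model with built-in regularization (Ridge regression) to deal with overfitting.
API: from sklearn.linear_model import Ridge
Ridge(alpha=1.0):
alpha: the regularization strength. The larger it is, the closer the weights w of the higher-order terms get to 0, and the smoother the overfitted curve becomes.
Values: alpha can be any non-negative number; decimals between 0 and 1 or integers from 1 to 10 are common starting points.
coef_: the regression coefficients.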
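
The effect of alpha on the weights can be observed directly. This is a minimal sketch on synthetic data rather than the house-price dataset: the noisy quadratic curve, the degree-10 expansion, and the list of alpha values are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumed): a noisy quadratic, expanded to degree 10
rng = np.random.RandomState(0)
x = rng.uniform(-1, 1, size=(40, 1))
y = 3 * x[:, 0] ** 2 + x[:, 0] + rng.normal(scale=0.3, size=40)
X_poly = PolynomialFeatures(degree=10, include_bias=False).fit_transform(x)

# Unregularized baseline: size of the weight vector
ols_norm = np.linalg.norm(LinearRegression().fit(X_poly, y).coef_)
print(f"LinearRegression   ||w|| = {ols_norm:.3f}")

# Ridge with increasing regularization strength
ridge_norms = []
for alpha in (0.01, 0.1, 1.0, 10.0):
    ridge = Ridge(alpha=alpha).fit(X_poly, y)
    norm = np.linalg.norm(ridge.coef_)
    ridge_norms.append(norm)
    print(f"Ridge(alpha={alpha:<5}) ||w|| = {norm:.3f}")
```

The norm of the ridge weight vector shrinks as alpha grows. In practice alpha is usually chosen by comparing validation scores rather than by inspecting the weights.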

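from sklearn.linear_model import Ridge
# Fit ridge regression on the polynomial features from the last split
ridge = Ridge(alpha=0.4)
ridge.fit(x_train, y_train)
ridge.coef_  # compare with regression.coef_ above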

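Saving and loading a model
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23; import joblib directly
joblib.dump(model, 'xxx.m'): save
joblib.load('xxx.m'): load
import pickle
# Save the fitted model with pickle
with open('./123.pkl', 'wb') as fp:
    pickle.dump(regression, fp)
# Load it back
with open('./123.pkl', 'rb') as fp:
    regression = pickle.load(fp)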
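
A round trip through both mechanisms can be sketched as follows. The tiny training set, the file names model.joblib and model.pkl, and the use of a temporary directory are assumptions for illustration.

```python
import os
import pickle
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny exact-fit dataset (assumed): y = 2x
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])
model = LinearRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    # joblib: dump to a path, load from a path
    joblib_path = os.path.join(tmp_dir, "model.joblib")
    joblib.dump(model, joblib_path)
    model_from_joblib = joblib.load(joblib_path)

    # pickle: dump to and load from a binary file object
    pickle_path = os.path.join(tmp_dir, "model.pkl")
    with open(pickle_path, "wb") as fp:
        pickle.dump(model, fp)
    with open(pickle_path, "rb") as fp:
        model_from_pickle = pickle.load(fp)

# The restored models should predict the same values as the original
X_new = np.array([[5.0]])
print(model.predict(X_new))
print(model_from_joblib.predict(X_new))
print(model_from_pickle.predict(X_new))
```

Both formats rely on Python pickling, so a saved model should only be loaded from a trusted source and preferably with the same scikit-learn version that created it.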
