[sklearn] Using LinearRegression

Novelin

Article link: https://blog.csdn.net/qq_40860934/article/details/114288682

1 Parameters

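sklearn's LinearRegression has a parameter that rescales the features before training. Note that this parameter was deprecated in scikit-learn 1.0 and removed in 1.2, so the call below only works on older versions.

from sklearn.linear_model import LinearRegression
# normalize=True is only accepted by scikit-learn versions before 1.2
model = LinearRegression(normalize=True)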

Documentation excerpt:
normalize : bool, default=False

This parameter is ignored when fit_intercept is set to False. If True, the regressors X will be normalized before regression by subtracting the mean and dividing by the l2-norm. If you wish to standardize, please use StandardScaler before calling fit on an estimator with normalize=False.

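Interestingly, "normalize" and "standardize" are often translated with the same word, but the documentation treats them as different operations. Both subtract the column mean. normalize then divides by the l2-norm of the centered column, while StandardScaler divides by the column's standard deviation. The two results differ only by a constant factor of sqrt(n_samples).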
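The documentation excerpt above recommends StandardScaler for standardization. The following is a minimal sketch of that approach, and of how the two scalings relate. The data is synthetic and the names `X`, `y`, `X_l2`, and `pipeline` are illustrative assumptions, not part of the original article.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, purely for illustration (not the article's dataset).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=[1.0, 10.0, 100.0], size=(200, 3))
y = X @ np.array([3.0, 0.2, 0.01]) + rng.normal(scale=0.1, size=200)

# Standardize: subtract the column mean, divide by the column standard deviation.
X_std = StandardScaler().fit_transform(X)

# "Normalize" as described in the excerpt: subtract the column mean,
# then divide by the l2-norm of the centered column.
X_centered = X - X.mean(axis=0)
X_l2 = X_centered / np.linalg.norm(X_centered, axis=0)

# The two scalings differ only by a constant factor sqrt(n_samples).
scaling_matches = np.allclose(X_std, X_l2 * np.sqrt(X.shape[0]))

# Putting the scaler in a pipeline applies the same preprocessing
# at fit time and at predict time.
pipeline = make_pipeline(StandardScaler(), LinearRegression())
pipeline.fit(X, y)

# For ordinary least squares with an intercept, rescaling the features
# changes the coefficients but not the predictions.
plain = LinearRegression().fit(X, y)
same_predictions = np.allclose(pipeline.predict(X), plain.predict(X))
print(scaling_matches, same_predictions)
```

Because plain least squares gives the same predictions either way, scaling here mainly affects how the coefficients are read, not how well the model fits.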

2 Coefficients

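Once a linear regression model has been trained, the size of each coefficient can serve as a rough indicator of that feature's importance, provided the features are on comparable scales.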

sorted(dict(zip(continuous_feature_names, model.coef_)).items(), key=lambda x:x[1], reverse=True)

The coefficients can also be plotted:

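import numpy as np
import seaborn as sns

model = LinearRegression().fit(train_X, train_y_ln)
print('intercept:' + str(model.intercept_))
# Recent seaborn versions require x and y to be passed as keyword arguments
sns.barplot(x=np.abs(model.coef_), y=list(continuous_feature_names))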
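Coefficient magnitudes depend on the units of each feature, so a ranking is easier to interpret when the features share a scale. The sketch below shows this on synthetic data. The feature names `f_small`, `f_medium`, and `f_large` are hypothetical and stand in for the article's `continuous_feature_names`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data with hypothetical feature names, for illustration only.
rng = np.random.default_rng(0)
feature_names = ['f_small', 'f_medium', 'f_large']
X = pd.DataFrame(
    rng.normal(scale=[1.0, 10.0, 100.0], size=(500, 3)),
    columns=feature_names,
)
# f_small has the largest per-unit coefficient, but f_large moves the
# target the most once its much wider spread is taken into account.
y = (2.0 * X['f_small'] + 0.5 * X['f_medium'] + 0.1 * X['f_large']
     + rng.normal(scale=0.5, size=500))

raw = LinearRegression().fit(X, y)
scaled = LinearRegression().fit(StandardScaler().fit_transform(X), y)

# Rank features by absolute coefficient, largest first.
raw_rank = pd.Series(raw.coef_, index=feature_names).abs().sort_values(ascending=False)
scaled_rank = pd.Series(scaled.coef_, index=feature_names).abs().sort_values(ascending=False)

print(raw_rank.index.tolist())
print(scaled_rank.index.tolist())
```

On this data the raw coefficients rank `f_small` first, while the coefficients fitted on standardized features rank `f_large` first. The ranking reverses purely because of the feature scales.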

3 Checking the model

After training, compare the true values with the predictions to judge whether the model is usable.

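import numpy as np
import matplotlib.pyplot as plt

# Draw 50 random row labels (with replacement); assumes train_X and train_y use a default 0..n-1 index
subsample_index = np.random.randint(low=0, high=len(train_y), size=50)
# If the model was fit on a transformed target (e.g. train_y_ln), its predictions are on that scale, not the price scale
plt.scatter(train_X['v_9'][subsample_index], train_y[subsample_index], color='black')
plt.scatter(train_X['v_9'][subsample_index], model.predict(train_X.loc[subsample_index]), color='blue')
plt.xlabel('v_9')
plt.ylabel('price')
plt.legend(['True Price', 'Predicted Price'], loc='upper right')
print('The predicted price is obviously different from the true price')
plt.show()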
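A scatter plot gives a visual impression. A numeric check on held-out data can complement it. The following is a minimal sketch under stated assumptions: the data is synthetic, and the split ratio and the choice of metrics (mean absolute error and R^2) are illustrative rather than taken from the article.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = X @ np.array([1.5, -2.0, 0.5, 0.0]) + 3.0 + rng.normal(scale=0.3, size=300)

# Hold out 20% of the rows so the check is not made on the training data.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_valid)

# Summary metrics plus the raw residuals (true minus predicted).
mae = mean_absolute_error(y_valid, pred)
r2 = r2_score(y_valid, pred)
residuals = y_valid - pred

print(f'MAE: {mae:.3f}, R^2: {r2:.3f}')
```

If the model is trained on a transformed target, the predictions would need to be mapped back to the original scale before computing these metrics against the true values.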