Machine Learning (1): Linear Regression

I. Linear Regression Model Formulas

  1. Linear regression model without regularization
     The prediction is $\hat{y}_i = w^{\top}x_i + b$, and the parameters minimize the sum of squared errors:
     $$J(w, b) = \sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$
  2. Lasso (L1-regularized) model
     $$J(w, b) = \sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2 + \alpha\,\lVert w \rVert_1$$
  3. Ridge Regression (L2-regularized) model
     $$J(w, b) = \sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2 + \alpha\,\lVert w \rVert_2^2$$
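
As a quick illustration, these objectives can be written directly in NumPy. This is a minimal sketch; X, y, w, b, and alpha are hypothetical names for the design matrix, targets, weights, intercept, and regularization strength (not from the original post):
import numpy as np

def predict(X, w, b):
    # Linear model: y_hat = X @ w + b
    return X @ w + b

def ols_loss(X, y, w, b):
    # Sum of squared errors, no regularization
    return np.sum((y - predict(X, w, b)) ** 2)

def lasso_loss(X, y, w, b, alpha):
    # Squared error plus an L1 penalty on the weights
    return ols_loss(X, y, w, b) + alpha * np.sum(np.abs(w))

def ridge_loss(X, y, w, b, alpha):
    # Squared error plus a squared L2 penalty on the weights
    return ols_loss(X, y, w, b) + alpha * np.sum(w ** 2)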

II. Evaluation Criteria

  1. Root mean squared error (RMSE)
     $$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$
  2. Mean absolute error (MAE)
     $$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|$$
  3. R2 score
     $$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}$$
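
These metrics can also be computed directly from their definitions. A minimal NumPy sketch, assuming two equal-length arrays y_true and y_pred (hypothetical names); sklearn.metrics provides the same quantities, as used in Section III:
import numpy as np

def rmse(y_true, y_pred):
    # Root mean squared error
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mae(y_true, y_pred):
    # Mean absolute error
    return np.mean(np.abs(y_true - y_pred))

def r2(y_true, y_pred):
    # R2 score: 1 minus residual sum of squares over total sum of squares
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot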

III. Implementation with sklearn

1. Linear regression models
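
The snippets in this section assume that X_train, X_test, y_train, and y_test already exist. One possible setup, sketched here with a synthetic dataset (make_regression is only a stand-in; any regression data split with train_test_split works):
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic regression data as a stand-in for a real dataset
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# Hold out part of the data as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)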

  • Linear regression without regularization
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Linear Regression
# 1. Create an estimator instance
lr = LinearRegression()

# 2. Fit the estimator on the training set
lr.fit(X_train, y_train)

# 3. Use the fitted estimator to predict on the training/test sets
y_train_pred = lr.predict(X_train)
y_test_pred = lr.predict(X_test)

# RMSE (root mean squared error)
rmse_train = np.sqrt(mean_squared_error(y_train, y_train_pred))
rmse_test = np.sqrt(mean_squared_error(y_test, y_test_pred))
print("RMSE on Training set :", rmse_train)
print("RMSE on Test set :", rmse_test)

# R2 score
r2_score_train = r2_score(y_train, y_train_pred)
r2_score_test = r2_score(y_test, y_test_pred)
print("r2_score on Training set :", r2_score_train)
print("r2_score on Test set :", r2_score_test)
  • Lasso (L1-regularized) model
from sklearn.linear_model import LassoCV
from sklearn.metrics import r2_score  # evaluate the performance of the regression model

# Set the hyperparameter search range
#alphas = [0.01, 0.1, 1, 10, 100]

# Create a LassoCV instance
#lasso = LassoCV(alphas=alphas)
lasso = LassoCV()

# Train (cross-validation is performed internally)
lasso.fit(X_train, y_train)

# Predict
y_test_pred_lasso = lasso.predict(X_test)
y_train_pred_lasso = lasso.predict(X_train)

# Evaluate: use r2_score to measure performance on the test and training sets
print('The r2 score of LassoCV on test is', r2_score(y_test, y_test_pred_lasso))
print('The r2 score of LassoCV on train is', r2_score(y_train, y_train_pred_lasso))
  • Ridge Regression (L2-regularized) model (a comparison of the three models' coefficients follows this list)
from sklearn.linear_model import RidgeCV
from sklearn.metrics import r2_score  # evaluate the performance of the regression model

# Set the hyperparameter (regularization strength) range
alphas = [0.01, 0.1, 1, 10, 100]
#n_alphas = 20
#alphas = np.logspace(-5, 2, n_alphas)

# Create a RidgeCV instance
ridge = RidgeCV(alphas=alphas, store_cv_values=True)

# Train the model
ridge.fit(X_train, y_train)

# Predict
y_test_pred_ridge = ridge.predict(X_test)
y_train_pred_ridge = ridge.predict(X_train)

# Evaluate: use r2_score to measure performance on the test and training sets
print('The r2 score of RidgeCV on test is', r2_score(y_test, y_test_pred_ridge))
print('The r2 score of RidgeCV on train is', r2_score(y_train, y_train_pred_ridge))
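
Once the three models above are fitted, their learned coefficients can be compared. A small sketch (it assumes the lr, lasso, and ridge instances from the snippets above) illustrating that the L1 penalty tends to drive some coefficients exactly to zero, while the L2 penalty only shrinks them:
import numpy as np

# Learned coefficients of the three fitted models
print("LinearRegression coefficients:", lr.coef_)
print("LassoCV coefficients         :", lasso.coef_)
print("RidgeCV coefficients         :", ridge.coef_)

# L1 regularization typically yields sparse weights
print("Number of zero coefficients in Lasso:", np.sum(lasso.coef_ == 0))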

2. Model evaluation

  • RMSE (root mean squared error)
import numpy as np
from sklearn.metrics import mean_squared_error

rmse_train = np.sqrt(mean_squared_error(y_train, y_train_pred))
rmse_test = np.sqrt(mean_squared_error(y_test, y_test_pred))
print("RMSE on Training set :", rmse_train)
print("RMSE on Test set :", rmse_test)
  • R2 score
from sklearn.metrics import r2_score

r2_score_train = r2_score(y_train, y_train_pred)
r2_score_test = r2_score(y_test, y_test_pred)
print("r2_score on Training set :", r2_score_train)
print("r2_score on Test set :", r2_score_test)

3. Hyperparameter tuning

  • LassoCV
from sklearn.linear_model import LassoCV

# Set the hyperparameter search range
#alphas = [0.01, 0.1, 1, 10, 100]

# 1. Create a LassoCV instance
#lasso = LassoCV(alphas=alphas)
lasso = LassoCV()

# 2. Train the model (the best alpha is selected by cross-validation)
lasso.fit(X_train, y_train)
alpha = lasso.alpha_
print("Best alpha :", alpha)
  • RidgeCV
By default (scoring=None), RidgeCV scores each candidate alpha with the mean squared error.
from sklearn.linear_model import RidgeCV

# 1. Set the hyperparameter search range and create the estimator
#RidgeCV(alphas=(0.1, 1.0, 10.0), fit_intercept=True, normalize=False, scoring=None, cv=None, gcv_mode=None, store_cv_values=False)
alphas = [0.01, 0.1, 1, 10, 100, 1000]
ridge = RidgeCV(alphas=alphas, store_cv_values=True)

# 2. Train the model on the training data
# With cv=None, RidgeCV uses Generalized Cross-Validation (GCV), an efficient
# implementation of leave-one-out cross-validation (i.e. N-fold CV with N = number of samples)
ridge.fit(X_train, y_train)

# The best alpha found by cross-validation
alpha = ridge.alpha_
print("Best alpha :", alpha)