Kaggle beginner competition on house price prediction: House Prices: Advanced Regression Techniques
For Part 1, see: [Kaggle Competition] Part 1: Feature Engineering + House Price Prediction with XGBoost
Part 1 covers the preliminary data analysis, the feature engineering, and a simple XGBoost approach to the prediction. The model stacking in this post builds on the analysis and preprocessing done in Part 1, so if you are interested in the whole competition, please read Part 1 before this one.
# Separate the target from the engineered features produced in Part 1
ydata_train = df_train.SalePrice.values
xdata_train = df1_train.drop("SalePrice", axis=1)
df1_test.drop("SalePrice", axis=1, inplace=True)
# Keep copies of the datasets (plain assignment would only create references)
xtrain = xdata_train.copy()
xtest = df1_test.copy()
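Before building any models it is worth confirming that the train and test feature matrices coming out of Part 1 line up. A quick check, assuming both are pandas DataFrames that received the same processing in Part 1:
# The two feature matrices must expose the same columns in the same order
assert list(xtrain.columns) == list(xtest.columns)
print(xtrain.shape, xtest.shape, ydata_train.shape)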
Stacking
Packages that will be used in the following steps:
import numpy as np  # needed for np.sqrt in the CV helper below
from sklearn.linear_model import ElasticNet, Lasso, BayesianRidge, LassoLarsIC
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.base import BaseEstimator, TransformerMixin, RegressorMixin, clone
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.metrics import mean_squared_error
import xgboost as xgb
import lightgbm as lgb
# Helper function for 5-fold cross-validation
n_folds = 5

def rmsle_cv(model):
    # Build the fold split separately for the training set; note that the commonly
    # copied .get_n_splits(...) call returns an int and would silently drop the shuffling
    kf = KFold(n_folds, shuffle=True, random_state=42)
    # Root-mean-squared error, the conventional metric for house price prediction
    rmse = np.sqrt(-cross_val_score(model, xdata_train.values, ydata_train,
                                    scoring="neg_mean_squared_error", cv=kf))
    return rmse
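As a quick sanity check, the helper can be called on any sklearn-compatible regressor. An illustrative call (the actual numbers depend on the Part 1 features):
score = rmsle_cv(make_pipeline(RobustScaler(), Lasso(alpha=0.0005, random_state=1)))
print("Lasso score: {:.4f} ({:.4f})".format(score.mean(), score.std()))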
Next we build the base learners. Here we use six of them: LASSO Regression, Elastic Net Regression, Kernel Ridge Regression, Gradient Boosting Regression, XGBoost, and LightGBM. Their hyperparameters can be set to empirical initial values and then tuned with GridSearch (which is quite slow), or tuned with Bayesian optimization instead; this post uses Bayesian optimization, sketched roughly below.
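The tuning code itself is not shown in this post, but a minimal sketch of the Bayesian search could look like the following. It assumes the bayes_opt package (pip install bayesian-optimization); the enet_cv objective, the search ranges, and the iteration counts are illustrative, and the same pattern applies to the other five learners.
from bayes_opt import BayesianOptimization

def enet_cv(alpha, l1_ratio):
    # Return the negative mean RMSE, because BayesianOptimization maximizes its objective
    model = make_pipeline(RobustScaler(), ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=3))
    return -rmsle_cv(model).mean()

optimizer = BayesianOptimization(
    f=enet_cv,
    pbounds={"alpha": (1e-4, 1e-2), "l1_ratio": (0.1, 1.0)},  # illustrative search ranges
    random_state=42,
)
optimizer.maximize(init_points=5, n_iter=25)
print(optimizer.max)  # best CV score and the corresponding hyperparameters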
# Base learners
lasso = make_pipeline(RobustScaler(), Lasso(alpha=0.0005, random_state=1))
ENet = make_pipeline(RobustScaler(), ElasticNet(alpha=0.0005, l1_ratio=.9, random_state=3))
KRR = KernelRidge(alpha=0.6, kernel='polynomial', degree=2, coef0=2.5)