Dragon Ball Training Camp, Machine Learning Task 04

This post records the process of building multiple individual learners with LightGBM, XGBoost, GBDT, and CatBoost, then fusing them with ridge regression to improve prediction accuracy. K-fold cross-validation is used to evaluate each model's performance, and possible improvements such as feature engineering and hyperparameter tuning are discussed.

These study notes are from the Alibaba Cloud Tianchi Dragon Ball Plan machine learning training camp. Course link: https://tianchi.aliyun.com/competition/entrance/231702/introduction?spm=5176.20222472.J_3678908510.8.8f5e67c2RKrT98
Overall approach: build multiple individual learners with LightGBM, XGBoost, GBDT, and CatBoost (adding a bagging strategy that randomly samples the data), then feed the base learners' outputs into a ridge regression to further improve accuracy. The code follows.

Possible improvements:
1. Analyze the fields in more detail; some fields may warrant special handling.
2. The hyperparameters can still be tuned further. I did not run a grid search, only simple manual tuning (see the sketch after this list).
3. If the goal is purely to raise accuracy, try different random seeds over several runs and keep the best result.
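
For point 2, a minimal grid-search sketch using sklearn's GridSearchCV. The parameter-grid values are illustrative assumptions, not tuned results, and X and Y are the feature matrix and label prepared in the code below:

from sklearn.model_selection import GridSearchCV
from lightgbm.sklearn import LGBMRegressor

param_grid = {  # illustrative value ranges, not tuned results
    "num_leaves": [7, 11, 15],
    "learning_rate": [0.03, 0.05, 0.1],
    "n_estimators": [200, 400, 600],
}
search = GridSearchCV(LGBMRegressor(n_jobs=-1), param_grid,
                      scoring="neg_mean_squared_error", cv=5)
search.fit(X, Y)
print(search.best_params_, -search.best_score_)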
 

import pandas as pd
import numpy as np

# Load the training set (the Tianchi happiness dataset is GB2312-encoded)
df = pd.read_csv("happiness_train_complete.csv", encoding="GB2312")
# Shuffle the rows once, without replacement, with a fixed seed for reproducibility
df = df.sample(frac=1, replace=False, random_state=11)
df.reset_index(inplace=True)
# Drop rows with invalid labels (negative happiness values encode non-answers)
df = df[df["happiness"] > 0]
Y = df["happiness"]
# Expand survey_time ("YYYY/MM/DD hh:mm") into month, day, and hour features
df["survey_month"] = df["survey_time"].map(lambda line: line.split(" ")[0].split("/")[1]).astype("int64")
df["survey_day"] = df["survey_time"].map(lambda line: line.split(" ")[0].split("/")[2]).astype("int64")
df["survey_hour"] = df["survey_time"].map(lambda line: line.split(" ")[1].split(":")[0]).astype("int64")
# Drop the label, identifiers, the raw timestamp, and the free-text columns
X = df.drop(columns=["id", "index", "happiness", "survey_time", "edu_other", "property_other", "invest_other"])
 
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
from lightgbm.sklearn import LGBMRegressor
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23; import the package directly

kfold = KFold(n_splits=15, shuffle=True, random_state=12)
model = LGBMRegressor(n_jobs=-1,
                      learning_rate=0.051,
                      n_estimators=400,
                      num_leaves=11,
                      reg_alpha=2.0,
                      reg_lambda=2.1,
                      min_child_samples=6,
                      min_split_gain=0.5,
                      colsample_bytree=0.2)
mse = []
i = 0
# 15-fold cross-validation: fit on 14 folds, report MSE on the held-out fold
for train, test in kfold.split(X):
    X_train = X.iloc[train]
    y_train = Y.iloc[train]
    X_test = X.iloc[test]
    y_test = Y.iloc[test]
    model.fit(X_train, y_train)
    # (Commented-out experiment: feed LightGBM leaf indices into a second-stage model)
    # model2.fit(model.predict(X_train, pred_leaf=True), y_train)
    # y_pred = model2.predict(model.predict(X=X_test, pred_leaf=True))
    y_pred = model.predict(X=X_test)
    e = mean_squared_error(y_true=y_test, y_pred=y_pred)
    mse.append(e)
    print(e)
    # Save the per-fold model for the later ridge-regression fusion step
    joblib.dump(filename="light" + str(i), value=model)
    i += 1
print("lightgbm", np.mean(mse), mse)
# CatBoostRegressor (reuses the same X and Y prepared above)
 
 
from catboost import Pool, CatBoostRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold
import joblib

kfold = KFold(n_splits=15, shuffle=True, random_state=12)
model = CatBoostRegressor(colsample_bylevel=0.1, thread_count=6, silent=True)
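
The original post breaks off here. A minimal sketch of the rest of the CatBoost block, assuming it mirrors the LightGBM cross-validation loop above; the "cat" + str(i) file names are an assumed naming scheme, not the author's:

mse = []
i = 0
for train, test in kfold.split(X):
    X_train, y_train = X.iloc[train], Y.iloc[train]
    X_test, y_test = X.iloc[test], Y.iloc[test]
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    e = mean_squared_error(y_true=y_test, y_pred=y_pred)
    mse.append(e)
    print(e)
    joblib.dump(filename="cat" + str(i), value=model)  # assumed file-name scheme
    i += 1
print("catboost", np.mean(mse), mse)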

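The ridge-regression fusion step described in the overview is also missing from the truncated post. A hedged sketch, assuming the saved fold-0 models are reloaded and their predictions stacked as meta-features for sklearn's Ridge; the file names and the single-fold choice are assumptions:

from sklearn.linear_model import Ridge

# Assumed file names: the fold-0 LightGBM and CatBoost models saved above
base_models = [joblib.load("light0"), joblib.load("cat0")]

# Stack each base learner's predictions column-wise as meta-features.
# In practice, out-of-fold predictions should be used here to avoid leakage.
meta_features = np.column_stack([m.predict(X) for m in base_models])

fusion = Ridge(alpha=1.0)  # the L2 penalty keeps the blend weights stable
fusion.fit(meta_features, Y)
print("fused train MSE:", mean_squared_error(Y, fusion.predict(meta_features)))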