Competition page: https://tianchi.aliyun.com/competition/entrance/231593/introduction
Forum: https://tianchi.aliyun.com/competition/entrance/231593/forum
Reference tutorial: https://tianchi.aliyun.com/notebook-ai/detail?spm=5176.12586969.1002.6.29281b48A4SkSI&postId=58107
GitHub: https://github.com/wepe/O2O-Coupon-Usage-Forecast
Data fields that need special explanation: received_data is the click data; used_data is the consumption data.
XGB code
import pandas as pd
import xgboost as xgb
from sklearn.preprocessing import MinMaxScaler
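# dataset*_x / dataset*_y are the feature matrices and labels, assumed to come from the feature-processing step (not shown here)
dataset1 = xgb.DMatrix(dataset1_x,label=dataset1_y)
dataset2 = xgb.DMatrix(dataset2_x,label=dataset2_y)
dataset12 = xgb.DMatrix(dataset12_x,label=dataset12_y)
dataset3 = xgb.DMatrix(dataset3_x)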
params={'booster':'gbtree',
'objective': 'rank:pairwise',
'eval_metric':'auc',
'gamma':0.1,
'min_child_weight':1.1,
'max_depth':5,
'lambda':10,
'subsample':0.7,
'colsample_bytree':0.7,
'colsample_bylevel':0.7,
'eta': 0.01,
'tree_method':'exact',
'seed':0,
'nthread':12
}
watchlist = [(dataset12,'train')]
model = xgb.train(params,dataset12,num_boost_round=3500,evals=watchlist)
# Predict on the test set
dataset3_preds['label'] = model.predict(dataset3)
# Rescale the ranking scores to [0, 1]; go through .values because a pandas Series has no reshape()
dataset3_preds['label'] = MinMaxScaler().fit_transform(dataset3_preds['label'].values.reshape(-1, 1)).ravel()
dataset3_preds.sort_values(by=['coupon_id','label'],inplace=True)
dataset3_preds.to_csv("xgb_preds.csv",index=False,header=False)
print(dataset3_preds.describe())
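The min-max step is there because `rank:pairwise` produces unbounded ranking scores rather than probabilities. Rescaling maps them to [0, 1] without changing their order, so AUC, which depends only on the ordering, stays the same. A minimal pure-Python sketch of that idea follows; the scores and labels are made up, and the helper names `min_max_scale` and `auc` are illustrative and not part of the original code.

```python
def min_max_scale(scores):
    """Rescale scores linearly to [0, 1], as MinMaxScaler does for one column."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # A constant column maps to all zeros.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up raw ranking scores and labels, for illustration only.
raw_scores = [-1.8, 0.4, 2.6, -0.3, 1.1]
labels = [0, 0, 1, 1, 1]

scaled = min_max_scale(raw_scores)
print(auc(labels, raw_scores), auc(labels, scaled))
```

Both printed values are equal: the scaling only changes the range of the scores, not how the samples are ranked.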
# Save feature scores (number of splits per feature), highest first
feature_score = model.get_fscore()
feature_score = sorted(feature_score.items(), key=lambda x:x[1],reverse=True)
fs = []
for (key,value) in feature_score:
    fs.append("{0},{1}\n".format(key,value))
with open('xgb_feature_score.csv','w') as f:
    f.write("feature,score\n")
    f.writelines(fs)
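One caveat when reading the score file: `get_fscore()` only reports features that were used in at least one split, so unused features are simply missing from it. The sketch below fills in zeros for them and sorts the same way as the code above. The feature list, the score dict, and the helper name `full_feature_table` are all hypothetical and do not come from the original data.

```python
def full_feature_table(all_features, fscore):
    """Return (feature, score) pairs for every feature, highest score first.

    Features missing from `fscore` (never used in a split) get a score of 0.
    """
    table = [(name, fscore.get(name, 0)) for name in all_features]
    # sorted() is stable, so tied features keep their original column order.
    return sorted(table, key=lambda item: item[1], reverse=True)


# Hypothetical feature names and split counts, for illustration only.
all_features = ["discount_rate", "distance", "day_of_week", "is_weekend"]
fscore = {"distance": 120, "discount_rate": 340, "day_of_week": 57}

rows = full_feature_table(all_features, fscore)
lines = ["feature,score\n"] + ["{0},{1}\n".format(name, score) for name, score in rows]
print("".join(lines))
```

Listing the zero-score features explicitly makes it easier to spot columns that the model never used.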
Project background: https://tianchi.aliyun.com/competition/entrance/231593/information
Feature-processing code for this post: https://tianchi.aliyun.com/notebook-ai/detail?spm=5176.12586969.1002.6.29281b48A4SkSI&postId=58107