Machine Learning --- Comparing xgboost and lightgbm (2)

- Background

Using the data generated in "Kaggle case study -- Instacart Market Basket Analysis (1)", this post compares the performance of xgboost and lightGBM.

- Results

Data shape: (847466, 20)
xgboost: training time 41 s, F1 score 0.27
lightgbm: training time 9 s, F1 score 0.28
As the numbers show, lightgbm really does train much faster than xgboost, and the accuracy is essentially unchanged (the F1 score here is in fact slightly higher).
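
For reference when reading the two parameter blocks in the test code below, a commonly cited rough correspondence between xgboost and lightgbm parameter names is the following (a reference sketch only, not part of the original post; the two runs below do not use numerically identical settings):

# Rough name correspondence between the two libraries (reference only).
xgb_to_lgb = {
    'eta':              'learning_rate',
    'subsample':        'bagging_fraction',        # lightgbm also needs bagging_freq > 0
    'colsample_bytree': 'feature_fraction',
    'min_child_weight': 'min_sum_hessian_in_leaf',
    'lambda':           'lambda_l2',
    'alpha':            'lambda_l1',
    'max_depth':        'max_depth',               # lightgbm grows leaf-wise, so num_leaves matters most
}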

- Test code

  • (1) Generate training and test data
import os
from datetime import datetime
import pandas as pd
from sklearn.model_selection import train_test_split  # the original used sklearn.cross_validation, which only exists in old (Python 2.7 era) scikit-learn

os.chdir(r'd:\pywork\Instacart')
data = pd.read_csv('data.txt')

# Keep only the training rows and drop ID columns that are not features
train = data.loc[data.eval_set == "train", :].copy()
train.drop(['eval_set', 'user_id', 'product_id', 'order_id'], axis=1, inplace=True)
train.loc[:, 'reordered'] = train.reordered.fillna(0)

# Hold out 90% of the rows; only ~10% are used for training
X_train, X_val, y_train, y_val = train_test_split(train.drop('reordered', axis=1),
                                                  train.reordered,
                                                  test_size=0.9, random_state=42)
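
With test_size=0.9, only about 10% of the rows end up in the training split, presumably to keep both training runs fast. A quick sanity check (a sketch, not in the original script):

# Confirm the 10%/90% partition and the positive-class rate in each split.
print(X_train.shape, X_val.shape)
print('reordered rate (train / val):', y_train.mean(), y_val.mean())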
  • (2) xgboost training and prediction results
import xgboost

# train
d_train = xgboost.DMatrix(X_train, y_train)
xgb_params = {
    "objective"         : "reg:logistic"
    ,"eval_metric"      : "logloss"
    ,"eta"              : 0.1
    ,"max_depth"        : 6
    ,"min_child_weight" : 10
    ,"gamma"            : 0.70
    ,"subsample"        : 0.76
    ,"colsample_bytree" : 0.95
    ,"alpha"            : 2e-05
    ,"lambda"           : 10
}

watchlist = [(d_train, "train")]
xgb_start = datetime.now()
bst = xgboost.train(params=xgb_params, dtrain=d_train, num_boost_round=80,
                    evals=watchlist, verbose_eval=10)
xgb_end = datetime.now()
print('spent time: ' + str((xgb_end - xgb_start).seconds) + '(s)')
xgboost.plot_importance(bst)  # requires matplotlib; call plt.show() when running as a script

''' train result:
[0]     train-logloss:0.625642
[10]    train-logloss:0.335753
[20]    train-logloss:0.269213
[30]    train-logloss:0.252115
[40]    train-logloss:0.247442
[50]    train-logloss:0.245712
[60]    train-logloss:0.244735
[70]    train-logloss:0.243973
[79]    train-logloss:0.243472
spent time: 41(s)
'''

# predict
pre_data = xgboost.DMatrix(X_val, y_val)
predict = bst.predict(pre_data)
X_val['reorder'] = y_val
X_val['pre'] = predict

# precision: rows with score > 0.5 that are truly label 1, divided by all rows with score > 0.5
precision = float(len(X_val[(X_val['pre'] > 0.5) & (X_val['reorder'] == 1)])) / \
            float(len(X_val[X_val['pre'] > 0.5]))
# recall: rows with score > 0.5 that are truly label 1, divided by all rows with label 1
recall = float(len(X_val[(X_val['pre'] > 0.5) & (X_val['reorder'] == 1)])) / \
         float(len(X_val[X_val['reorder'] == 1]))
f1_score = 2 * (precision * recall) / (precision + recall)
print('f1_score: ' + str(f1_score))

# f1_score: 0.27986198335189855
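
The same threshold-0.5 metrics can be cross-checked with sklearn.metrics (a minimal sketch, not part of the original script; assumes predict and y_val from above are still in scope):

from sklearn import metrics

y_true = y_val.astype(int)            # labels as 0/1 integers
y_pred = (predict > 0.5).astype(int)  # same 0.5 threshold as above
print('precision:', metrics.precision_score(y_true, y_pred))
print('recall   :', metrics.recall_score(y_true, y_pred))
print('f1       :', metrics.f1_score(y_true, y_pred))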
  • (3) lightgbm training and prediction results
# train
import numpy as np
import lightgbm as lgb

labels = np.array(y_train, dtype=np.int8)
d_train = lgb.Dataset(X_train, label=labels)
params = {
    'task': 'train',
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'metric': {'binary_logloss'},
    'num_leaves': 96,
    'max_depth': 10,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.95,
    'bagging_freq': 5
}
ROUNDS = 100
watchlist = [d_train]

lgb_start = datetime.now()
bst = lgb.train(params=params, train_set=d_train, num_boost_round=ROUNDS,
                valid_sets=watchlist, verbose_eval=10)
lgb_end = datetime.now()
print('spent time: ' + str((lgb_end - lgb_start).seconds) + '(s)')

''' lgb train score:
[10]    training's binary_logloss: 0.348039
[20]    training's binary_logloss: 0.271007
[30]    training's binary_logloss: 0.250972
[40]    training's binary_logloss: 0.245258
[50]    training's binary_logloss: 0.242898
[60]    training's binary_logloss: 0.241338
[70]    training's binary_logloss: 0.240099
[80]    training's binary_logloss: 0.239047
[90]    training's binary_logloss: 0.238009
[100]   training's binary_logloss: 0.236996
spent time: 9(s)
'''
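
The run above only monitors the training set, so its logloss improves by construction. As a sketch (not part of the original script), the held-out split can also be passed as a validation set with early stopping; early_stopping_rounds is the argument in older lightgbm releases, while newer releases use callbacks=[lgb.early_stopping(10)] instead:

# Sketch: monitor the held-out split and stop when it stops improving.
# Only the original feature columns are used, since the xgboost step above
# added 'reorder'/'pre' columns to X_val.
d_valid = lgb.Dataset(X_val[X_train.columns], label=np.array(y_val, dtype=np.int8))
bst_es = lgb.train(params=params,
                   train_set=d_train,
                   num_boost_round=ROUNDS,
                   valid_sets=[d_train, d_valid],
                   valid_names=['train', 'valid'],
                   early_stopping_rounds=10,
                   verbose_eval=10)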

# predict
X_val = X_val.drop(['reorder', 'pre'], axis=1)  # drop the columns added during the xgboost step
predict = bst.predict(X_val)  # X_val may be a file path / numpy array / scipy.sparse / DataFrame

X_val['reorder'] = y_val
X_val['pre'] = predict

# precision: rows with score > 0.5 that are truly label 1, divided by all rows with score > 0.5
precision = float(len(X_val[(X_val['pre'] > 0.5) & (X_val['reorder'] == 1)])) / \
            float(len(X_val[X_val['pre'] > 0.5]))
# recall: rows with score > 0.5 that are truly label 1, divided by all rows with label 1
recall = float(len(X_val[(X_val['pre'] > 0.5) & (X_val['reorder'] == 1)])) / \
         float(len(X_val[X_val['reorder'] == 1]))
f1_score = 2 * (precision * recall) / (precision + recall)
print('F1 score is: ' + str(f1_score))

# F1 score is: 0.28706938762
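
For parity with the xgboost.plot_importance call in the xgboost section, lightgbm provides an equivalent helper (a minimal sketch; assumes the trained bst from above and matplotlib installed):

import matplotlib.pyplot as plt

lgb.plot_importance(bst, max_num_features=20)  # top features ranked by split count
plt.show()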