LightGBM: A Self-Rescue Guide for the Casual Library User

Background

LightGBM has two main interfaces: the native API and the scikit-learn API.

Beyond differences in how parameters are passed and how the calls look, saving and loading models with the latter has to be done through sklearn tooling.

API reference: https://lightgbm.readthedocs.io/en/latest/Python-API.html

Training

Native API: train with lgb.train(); parameters are supplied separately as a dict.

import lightgbm as lgb

lgb_train = lgb.Dataset(data=train_x, label=train_y)
lgb_valid = lgb.Dataset(data=valid_x, label=valid_y)

params = {
    'task': 'train',
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'metric': {'l2', 'auc', 'binary_logloss'},
    'num_leaves': 31,
    'num_trees': 100,
    'learning_rate': 0.05,
    'feature_fraction': 0.9,
    'bagging_fraction': 0.8,
    'bagging_freq': 5,
    'verbose': 0
}

gbm = lgb.train(params=params,
                train_set=lgb_train,
                num_boost_round=10,
                valid_sets=lgb_valid,
                early_stopping_rounds=50)
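With early stopping enabled, the returned booster records the best iteration it found; a minimal usage sketch (reusing valid_x from above):

# predict at the best iteration found by early stopping
pred_valid = gbm.predict(valid_x, num_iteration=gbm.best_iteration)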

scikit-learn API: construct an estimator object first, then train it with fit().

gbm = lgb.LGBMRegressor(
    boosting_type='gbdt', objective='regression', metric='rmse',
    learning_rate=0.05, num_leaves=31, max_depth=-1, n_estimators=1000,
    subsample=0.7, subsample_freq=1, colsample_bytree=0.7)

gbm.fit(train_x, train_y,
        early_stopping_rounds=None)

pred_test = gbm.predict(test_x)
# for classification, use lgb.LGBMClassifier and take the positive-class column:
# proba_test = gbm.predict_proba(test_x)[:, 1]

Cross-validation

Use lgb.cv(); it is the upgraded version of train().

# plain training:
num_round = 10
bst = lgb.train(param, train_data, num_round, valid_sets=[test_data])

# upgraded to 5-fold cross-validation:
num_round = 10
lgb.cv(param, train_data, num_round, nfold=5)
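lgb.cv() returns a dict of per-iteration mean/std eval results, which is handy for choosing num_boost_round. A minimal sketch, assuming the binary params dict and lgb_train from the Training section (note that the result key names, e.g. 'auc-mean', depend on the metric and the LightGBM version):

cv_results = lgb.cv(params, lgb_train, num_boost_round=1000,
                    nfold=5, early_stopping_rounds=50)
# with early stopping, the length of the history equals the best round count
best_rounds = len(cv_results['auc-mean'])  # key name depends on metric/version
print('best num_boost_round:', best_rounds)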

Custom loss and eval functions

Loss function: define a function whose inputs are the true values and the predicted values (both array-like). It should return two arrays: the gradient and the hessian for each observation. As noted, you derive the gradient and hessian with calculus, then implement them in Python. For the asymmetric squared loss below, with residual r = y_true - y_pred, the loss is 10*r^2 when r < 0 and r^2 otherwise, so the gradient with respect to the prediction is -20*r or -2*r, and the hessian is 20 or 2, which is exactly what the code returns.

Eval function: customizing the validation loss in LightGBM likewise requires defining a function that takes the same two arrays, but returns three values: a string naming the metric (used when printing), the loss value itself, and a boolean indicating whether higher is better.

import numpy as np

def custom_asymmetric_train(y_true, y_pred):
    # negative residuals (over-prediction) are penalized 10x
    residual = (y_true - y_pred).astype("float")
    grad = np.where(residual < 0, -2 * 10.0 * residual, -2 * residual)
    hess = np.where(residual < 0, 2 * 10.0, 2.0)
    return grad, hess

def custom_asymmetric_valid(y_true, y_pred):
    residual = (y_true - y_pred).astype("float")
    loss = np.where(residual < 0, (residual ** 2) * 10.0, residual ** 2)
    # (metric name, value, is_higher_better)
    return "custom_asymmetric_eval", np.mean(loss), False

# source: https://cloud.tencent.com/developer/article/1357671
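A quick sanity check of the two functions on toy arrays (the values are purely illustrative):

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 1.5, 1.5])

grad, hess = custom_asymmetric_train(y_true, y_pred)  # one entry per observation
name, loss, higher_better = custom_asymmetric_valid(y_true, y_pred)
print(name, loss, higher_better)  # custom_asymmetric_eval <mean loss> False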

With the sklearn API:

# ********* Sklearn API **********
# default lightgbm model with sklearn api
gbm = lgb.LGBMRegressor()

# set our custom loss as the objective; this can also be done when instantiating gbm
gbm.set_params(**{'objective': custom_asymmetric_train}, metrics=["mse", 'mae'])

# pass the custom eval function in fit()
gbm.fit(
    X_train,
    y_train,
    eval_set=[(X_valid, y_valid)],
    eval_metric=custom_asymmetric_valid,
    verbose=False,
)

y_pred = gbm.predict(X_valid)

With the native API:

# ********* Python API **********
# create dataset for lightgbm
# if you want to re-use data, remember to set free_raw_data=False
lgb_train = lgb.Dataset(X_train, y_train, free_raw_data=False)
lgb_eval = lgb.Dataset(X_valid, y_valid, reference=lgb_train, free_raw_data=False)

# specify your configurations as a dict
params = {
    'objective': 'regression',
    'verbose': 0
}

gbm = lgb.train(params,
                lgb_train,
                num_boost_round=10,
                init_model=gbm,  # continue training from the sklearn model above
                fobj=custom_asymmetric_train,
                feval=custom_asymmetric_valid,
                valid_sets=lgb_eval)

y_pred = gbm.predict(X_valid)
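One caveat: lgb.train() calls fobj and feval as (preds, dataset) rather than with two plain arrays, so depending on your LightGBM version you may need thin adapters like the following sketch (the wrapper names are mine):

def asymmetric_train_native(preds, train_data):
    # adapt the (y_true, y_pred) loss to the native (preds, Dataset) signature
    return custom_asymmetric_train(train_data.get_label(), preds)

def asymmetric_valid_native(preds, eval_data):
    return custom_asymmetric_valid(eval_data.get_label(), preds)

# then pass fobj=asymmetric_train_native, feval=asymmetric_valid_native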

Predicting with the best iteration

ypred = bst.predict(data, num_iteration=bst.best_iteration)

Saving the model

# sklearn API: save via the underlying booster
gbm.booster_.save_model('model.txt', num_iteration=gbm.best_iteration_)

# native API
bst.save_model('model.txt')
bst = lgb.Booster(model_file='model.txt')  # init model from file

From https://github.com/Microsoft/LightGBM/issues/1217#issuecomment-360352312:

"I see, for the sklearn model save/load, you can use joblib."

Example:

import joblib  # `from sklearn.externals import joblib` is deprecated

# save model
joblib.dump(lgbmodel, 'lgb.pkl')

# load model
gbm_pickle = joblib.load('lgb.pkl')
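The loaded object is a full estimator again; a quick check (assuming lgbmodel is a fitted LGBMRegressor and X_valid from above):

import numpy as np
# predictions from the reloaded model should match the original
assert np.allclose(lgbmodel.predict(X_valid), gbm_pickle.predict(X_valid))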

Inspecting attributes

For the sklearn API:

model.best_iteration_
# the sklearn-style fitted attribute; it wraps model.booster_.best_iteration, which can also be accessed directly.
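A short sketch of the fitted attributes most often inspected on a trained sklearn-API model (assuming model is a fitted LGBMRegressor):

print(model.best_iteration_)       # best iteration when early stopping was used
print(model.best_score_)           # dict of eval results at the best iteration
print(model.feature_importances_)  # per-feature importances
print(model.booster_)              # the underlying native Booster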
