How ChatGPT can help resolve unknown errors during model training: the XGBoost devices_.IsEmpty case

mtj66

Last modified 2023-02-24 17:19:24

First published 2023-02-24 16:53:11

Article link: https://blog.csdn.net/mtj66/article/details/129203471

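This post tries out ChatGPT, the hottest tool at the moment, as a way to raise productivity at work.

The background: while training a model, I assigned it to a GPU other than the default one to avoid contention for resources and speed up training. Training ran normally, but prediction failed every time. Setting gpu=1, 2, or 3 all produced the error, while gpu=0, or not setting it at all, worked fine.

The error raised at prediction time was: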

XGBoostError: b'[16:17:19] /workspace/src/common/host_device_vector.cu:249: Check failed: devices_.IsEmpty() \n\nStack trace returned 10 entries:\n[bt] (0) /root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(dmlc::StackTrace()+0x47) [0x7f87a4a842f7]\n[bt] (1) /root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x18) [0x7f87a4a849a8]\n[bt] (2) /root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::HostDeviceVectorImpl<float>::Reshard(xgboost::GPUSet)+0x73) [0x7f87a4cc6e93]\n[bt] (3) /root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::predictor::GPUPredictor::InitOutPredictions(xgboost::MetaInfo const&, xgboost::HostDeviceVector<float>*, xgboost::gbm::GBTreeModel const&) const+0x2f) [0x7f87a4c851df]\n[bt] (4) /root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::predictor::GPUPredictor::PredictBatch(xgboost::DMatrix*, xgboost::HostDeviceVector<float>*, xgboost::gbm::GBTreeModel const&, int, unsigned int)+0xab) [0x7f87a4c8bd4b]\n[bt] (5) /root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::LearnerImpl::PredictRaw(xgboost::DMatrix*, xgboost::HostDeviceVector<float>*, unsigned int) const+0x33) [0x7f87a4b29723]\n[bt] (6) /root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(xgboost::LearnerImpl::Predict(xgboost::DMatrix*, bool, xgboost::HostDeviceVector<float>*, unsigned int, bool, bool, bool, bool) const+0x40) [0x7f87a4b29800]\n[bt] (7) /root/.pyenv/versions/3.6.5/lib/python3.6/site-packages/xgboost/./lib/libxgboost.so(XGBoosterPredict+0xb1) [0x7f87a4ac52b1]\n[bt] (8) /root/.pyenv/versions/3.6.5/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(ffi_call_unix64+0x4c) [0x7f89e5592b8a]\n[bt] (9) 
/root/.pyenv/versions/3.6.5/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(ffi_call+0x165) [0x7f89e5591cd5]\n\n'

The trace points to predict as the source of the failure. After searching around, I found a similar XGBoost issue: https://github.com/dmlc/xgboost/issues/3913
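To make "the trace points to predict" concrete, here is a minimal sketch that pulls the failed check and the symbol names out of a trace like the one above. The helper name `summarize_trace` and the regular expressions are my own assumptions about the `[bt] (N) path(symbol+0xOFFSET) [0xADDRESS]` layout seen in this message, not part of XGBoost. The sample keeps three frames from the real trace, with the library path shortened to `/lib/libxgboost.so`.

```python
import re

# One "[bt] (N) /path/lib.so(symbol+0xOFFSET) [0xADDRESS]" frame per line.
FRAME_RE = re.compile(r"\[bt\] \((\d+)\) \S+?\((.+)\+0x[0-9a-f]+\) \[0x[0-9a-f]+\]")
CHECK_RE = re.compile(r"Check failed: (.+?)\s*$", re.MULTILINE)


def summarize_trace(message):
    """Return (failed_check, [(frame_number, symbol), ...]) for a trace string."""
    # A bytes repr shows line breaks as a literal backslash + "n"; normalise them.
    text = message.replace("\\n", "\n")
    check = CHECK_RE.search(text)
    frames = [(int(m.group(1)), m.group(2)) for m in FRAME_RE.finditer(text)]
    return (check.group(1) if check else None), frames


# Three frames copied from the trace above; only the path is shortened.
sample = (
    "[16:17:19] /workspace/src/common/host_device_vector.cu:249: "
    "Check failed: devices_.IsEmpty() \n\n"
    "Stack trace returned 10 entries:\n"
    "[bt] (2) /lib/libxgboost.so(xgboost::HostDeviceVectorImpl<float>::"
    "Reshard(xgboost::GPUSet)+0x73) [0x7f87a4cc6e93]\n"
    "[bt] (4) /lib/libxgboost.so(xgboost::predictor::GPUPredictor::"
    "PredictBatch(xgboost::DMatrix*, xgboost::HostDeviceVector<float>*, "
    "xgboost::gbm::GBTreeModel const&, int, unsigned int)+0xab) [0x7f87a4c8bd4b]\n"
    "[bt] (7) /lib/libxgboost.so(XGBoosterPredict+0xb1) [0x7f87a4ac52b1]\n"
)

check, frames = summarize_trace(sample)
predict_frames = [symbol for _, symbol in frames if "Predict" in symbol]

print("failed check:", check)
for symbol in predict_frames:
    print("prediction frame:", symbol)
```

Reading the frames from the bottom up, the call enters through `XGBoosterPredict`, reaches `GPUPredictor::PredictBatch`, and fails in `Reshard`, which is why the problem looks specific to GPU prediction rather than to training.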

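The issue gives no concrete fix, and other remedies such as upgrading the version are outside the scope of this post.

Now it is time to bring in GPT. (I warmed up with some clumsy English, which was really just setting the scene, also known as a prompt.)

You will then find that your conversation takes on that setting, which means the dialogue now unfolds in the modeling domain.

Judging from the reply, my guess was that a device setting is missing during prediction. OK, on to a follow-up question.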

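Printing the predict function shows that it has no predictor parameter at all. ChatGPT was clearly talking nonsense with a straight face, with some true content mixed in.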
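One quick way to check a suggestion like this is to inspect the function signature instead of trusting the answer. The sketch below uses the standard library's `inspect.signature`. `predict_stub` is a hypothetical stand-in so the example runs without xgboost installed, and its parameter list is illustrative rather than the library's real signature.

```python
import inspect


def accepts_parameter(func, name):
    """Return True if `func` declares a parameter called `name`."""
    return name in inspect.signature(func).parameters


# Hypothetical stand-in for a predict method; the parameters are illustrative only.
def predict_stub(data, output_margin=False, ntree_limit=0):
    return None


print(accepts_parameter(predict_stub, "predictor"))      # False
print(accepts_parameter(predict_stub, "output_margin"))  # True
```

With xgboost installed, the same check would be `accepts_parameter(xgb.Booster.predict, "predictor")`. In the version I was using it came back without a predictor parameter, which is what exposed the made-up part of the answer.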

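My guess was that the predictor has to be specified at training time instead, because I use xgb.train, which returns a Booster, rather than constructing a Booster directly.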

params = {
    'objective': 'binary:logistic',
    'eta': 0.02,
    'max_depth': 3,
    'eval_metric': 'auc',
    'silent': 0,
    'min_child_weight': 100,
    'subsample': 0.7,
    'colsample_bytree': 0.7,
    'gamma': 0.1,
    'lambda': 0.1,
    'alpha': 10,
    'nthread': 28,
    'n_jobs': 28,
    'tree_method': 'gpu_hist',
    'booster': 'gbtree',
    'gpu_id': 1,
    'seed': random_state,
    'predictor': 'cpu_predictor',
}

Checking the official XGBoost documentation:

https://xgboost.readthedocs.io/en/release_0.72/parameter.html

predictor, [default=``cpu_predictor``]

The type of predictor algorithm to use. Provides the same results but allows the use of GPU or CPU.

cpu_predictor: Multicore CPU prediction algorithm.

gpu_predictor: Prediction using GPU. Default when tree_method is gpu_exact or gpu_hist.

This is puzzling: why did predictor, [default=``cpu_predictor``] not take effect?

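It turns out there is a pitfall here: "Default when tree_method is gpu_exact or gpu_hist". I had set tree_method=gpu_hist, so the predictor silently defaulted to gpu_predictor rather than cpu_predictor. That suggests that somewhere after this point the XGBoost source is missing a device assignment, which causes the error at prediction time. If you have time, the source is worth a look.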
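The default rule quoted from the documentation can be written out as a small sketch. This only models the documented behaviour for the release referenced above and is not XGBoost's actual code. The function name `effective_predictor` and the constant `GPU_TREE_METHODS` are my own.

```python
# Tree methods for which the quoted docs say gpu_predictor becomes the default.
GPU_TREE_METHODS = {"gpu_exact", "gpu_hist"}


def effective_predictor(params):
    """Model of the documented default: which predictor a params dict ends up with."""
    if "predictor" in params:
        return params["predictor"]      # an explicit setting wins
    if params.get("tree_method") in GPU_TREE_METHODS:
        return "gpu_predictor"          # implicit default with a GPU tree method
    return "cpu_predictor"              # the documented general default


before = {"tree_method": "gpu_hist", "gpu_id": 1}
after = dict(before, predictor="cpu_predictor")

print(effective_predictor(before))  # gpu_predictor
print(effective_predictor(after))   # cpu_predictor
```

Under this reading, the failing configuration never used cpu_predictor at all, so setting it explicitly is what moves prediction off the GPU code path seen in the stack trace.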

# Training
xgb_model = xgb.train(
    params=params,
    dtrain=d_train,
    evals=evallist,
    num_boost_round=n_estimators,
    early_stopping_rounds=early_stopping_rounds,
    verbose_eval=0,
    callbacks=[log_evaluation_with_logger(period=log_period, show_stdv=True, logger=logger)],
)

Adding 'predictor': 'cpu_predictor' to the params shown above solved the problem.

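So is it XGBoost that is useful? Is it ChatGPT? Or is it that the person using them needs the ability to analyze and judge before the problem gets solved and productivity actually goes up?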
