XGBoost custom objective and evaluation functions
XGBoost supports user-defined objective functions and evaluation functions. The official demo is as follows:
import numpy as np
import xgboost as xgb

# user-defined objective function: given predictions, return the gradient
# and second-order gradient (hessian). This is the logistic log-loss.
def logregobj(preds, dtrain):
    labels = dtrain.get_label()
    preds = 1.0 / (1.0 + np.exp(-preds))
    grad = preds - labels
    hess = preds * (1.0 - preds)
    return grad, hess

# user-defined evaluation function, returning a pair (metric_name, result)
# NOTE: when you use a customized loss function, the default prediction value is the margin,
# which may make built-in evaluation metrics malfunction.
# For example, with logistic loss the predictions are scores before the logistic
# transformation, while the built-in evaluation error assumes input after it.
# Keep this in mind when customizing; you may also need a customized evaluation function.
def evalerror(preds, dtrain):
    labels = dtrain.get_label()
    # return a pair (metric_name, result); the metric name must not contain a colon (:) or a space
    # since preds are margins (before the logistic transformation), the cutoff is at 0
    return 'my-error', float(sum(labels != (preds > 0.0))) / len(labels)

# training with the customized objective; we can also do step-by-step training
# (see the implementation of train in xgboost's training module)
bst = xgb.train(param, dtrain, num_round, watchlist, obj=logregobj, feval=evalerror)
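To make the demo runnable end to end, here is a minimal sketch; the synthetic data, the parameter values, and the train/test split are illustrative assumptions, not part of the official demo:

import numpy as np
import xgboost as xgb

# illustrative synthetic binary-classification data (an assumption, not from the demo)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

dtrain = xgb.DMatrix(X[:800], label=y[:800])
dtest = xgb.DMatrix(X[800:], label=y[800:])
watchlist = [(dtrain, 'train'), (dtest, 'eval')]
param = {'max_depth': 3, 'eta': 0.1}  # illustrative values; no 'objective' key needed
num_round = 20

bst = xgb.train(param, dtrain, num_round, watchlist, obj=logregobj, feval=evalerror)
# as noted above, predictions are raw margins; apply the sigmoid for probabilities
probs = 1.0 / (1.0 + np.exp(-bst.predict(dtest)))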
Approximating MAE with a differentiable custom objective
As the official comments note, a user-defined objective must supply first- and second-order derivatives. MAE is not differentiable at zero, and its second derivative is zero everywhere else, so it cannot be used directly as a custom objective; instead, we need a smooth objective function that approximates MAE. Below are several MAE-approximating functions collected from Stack Overflow.
[Figure: comparison of candidate loss functions against MAE, from the Stack Overflow answer]
As the comparison shows, XGBoost's built-in MSE objective approximates MAE poorly, so we can use one of the other functions instead. Note that when using these custom objective functions, we cannot use XGBoost's GPU acceleration. Python implementations of the custom objective functions follow (source: Stack Overflow).
def huber_approx_obj(preds, dtrain):
    """Pseudo-Huber loss, a smooth approximation of MAE."""
    d = preds - dtrain.get_label()  # residual
    h = 1  # h is delta in the plot
    scale = 1 + (d / h) ** 2
    scale_sqrt = np.sqrt(scale)
    grad = d / scale_sqrt
    hess = 1 / scale / scale_sqrt
    return grad, hess

def fair_obj(preds, dtrain):
    """Fair loss: y = c * abs(x) - c**2 * np.log(abs(x)/c + 1)"""
    x = preds - dtrain.get_label()  # residual
    c = 1
    den = abs(x) + c
    grad = c * x / den
    hess = c * c / den ** 2
    return grad, hess

def log_cosh_obj(preds, dtrain):
    """Log-cosh loss: y = np.log(np.cosh(x))"""
    x = preds - dtrain.get_label()  # residual
    grad = np.tanh(x)
    hess = 1 / np.cosh(x) ** 2
    return grad, hess
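As a hedged usage sketch (the synthetic data, parameter values, and the mae_error helper below are my illustrative assumptions, not from the Stack Overflow answer), any of these functions can be passed to xgb.train via obj, typically together with a custom MAE evaluation function, since the objective itself is only an approximation:

import numpy as np
import xgboost as xgb

def mae_error(preds, dtrain):
    # report the true MAE, even though the objective only approximates it
    labels = dtrain.get_label()
    return 'mae', float(np.mean(np.abs(preds - labels)))

# illustrative synthetic regression data with heavy-tailed noise, where MAE shines
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = X[:, 0] + rng.laplace(size=1000)

dtrain = xgb.DMatrix(X[:800], label=y[:800])
dvalid = xgb.DMatrix(X[800:], label=y[800:])

param = {'max_depth': 4, 'eta': 0.1, 'base_score': 0.0}  # illustrative values
bst = xgb.train(param, dtrain, 100, [(dtrain, 'train'), (dvalid, 'valid')],
                obj=huber_approx_obj, feval=mae_error)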
The following custom approximate-MAE gradient comes from a Kaggle discussion:
from numba import jit

@jit
def grad(preds, dtrain):
    # gradient of an MAE-like objective: a scaled sign of the residual, with a
    # constant hessian so that XGBoost takes bounded leaf steps
    # (note: xgboost's DMatrix is not a numba-supported type, so @jit may need
    # to fall back to object mode here)
    labels = dtrain.get_label()
    n = preds.shape[0]
    grad = np.empty(n)
    hess = 500 * np.ones(n)
    for i in range(n):
        diff = preds[i] - labels[i]
        if diff > 0:
            grad[i] = 200
        elif diff < 0:
            grad[i] = -200
        else:
            grad[i] = 0
    return grad, hess
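For reference, the loop above just assigns a scaled sign of the residual, which is exactly the subgradient of MAE. A vectorized restatement without numba (my own sketch, not from the Kaggle thread) would be:

def mae_sign_obj(preds, dtrain):
    # scaled sign(residual) = MAE subgradient; the constant "hessian" caps the step size
    diff = preds - dtrain.get_label()
    grad = 200 * np.sign(diff)
    hess = 500 * np.ones_like(diff)
    return grad, hess

Since each leaf weight is roughly -sum(grad) / (sum(hess) + lambda), the ratio 200/500 = 0.4 here acts as a fixed step size for every boosting round.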