XGBoost custom objective and evaluation functions (approximating MAE with a custom objective)

Custom objective and evaluation functions in XGBoost

XGBoost supports user-defined objective and evaluation functions. The official demo is as follows:

import numpy as np
import xgboost as xgb

# user-defined objective function: given predictions, return the gradient
# and the second-order gradient (Hessian)
# this is the log-likelihood (logistic) loss
def logregobj(preds, dtrain):
    labels = dtrain.get_label()
    preds = 1.0 / (1.0 + np.exp(-preds))  # sigmoid: margins -> probabilities
    grad = preds - labels
    hess = preds * (1.0 - preds)
    return grad, hess

# user-defined evaluation function: returns a pair (metric_name, result)
# NOTE: when you use a customized objective, the predictions passed in are raw
# margins, which may make built-in evaluation metrics misbehave
# for example, with logistic loss the predictions are scores before the logistic
# transformation, while the built-in error metric assumes transformed values
# keep this in mind, and write a customized evaluation function when needed
def evalerror(preds, dtrain):
    labels = dtrain.get_label()
    # return a pair (metric_name, result); the metric name must not contain
    # a colon (:) or a space
    # since preds are margins (before the logistic transformation), cut off at 0
    return 'my-error', float(sum(labels != (preds > 0.0))) / len(labels)
    
# training with a customized objective; we can also train step by step,
# see the implementation of train() in xgboost's Python package
bst = xgb.train(param, dtrain, num_round, watchlist, obj=logregobj, feval=evalerror)
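The demo above assumes that param, dtrain, num_round, and watchlist already exist. Here is a minimal end-to-end sketch, using synthetic data and hypothetical parameters (not from the original post), just to show how the pieces fit together; note that on recent XGBoost versions feval is deprecated in favor of the custom_metric argument, which takes the same kind of callable.

import numpy as np
import xgboost as xgb

rng = np.random.RandomState(42)
X = rng.randn(1000, 10)
y = (X[:, 0] + 0.5 * rng.randn(1000) > 0).astype(float)  # synthetic binary labels

dtrain = xgb.DMatrix(X[:800], label=y[:800])
dvalid = xgb.DMatrix(X[800:], label=y[800:])
watchlist = [(dtrain, 'train'), (dvalid, 'valid')]

param = {'max_depth': 3, 'eta': 0.1}  # hypothetical parameters
num_round = 20

bst = xgb.train(param, dtrain, num_round, watchlist,
                obj=logregobj, feval=evalerror)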

Approximating MAE with a differentiable custom objective

As the comments in the official demo note, a user-defined objective must return first- and second-order gradients. MAE is not differentiable at zero, and its second derivative is zero everywhere else, so it cannot be used directly as a custom objective; we need a differentiable function that approximates it. The sketch below shows why the naive attempt fails, and the figure after it compares several MAE approximations collected from Stack Overflow.
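To make the problem concrete, here is what a literal MAE objective would have to return (mae_obj_naive is a hypothetical name, shown for illustration only):

import numpy as np

def mae_obj_naive(preds, dtrain):
    """Literal MAE derivatives -- for illustration only, do not train with this."""
    d = preds - dtrain.get_label()
    grad = np.sign(d)        # derivative of |d| is its sign; undefined at d == 0
    hess = np.zeros_like(d)  # second derivative is 0 wherever it exists
    # XGBoost's leaf weight is -sum(grad) / (sum(hess) + lambda); with hess == 0
    # the Newton step carries no curvature information and optimization degenerates
    return grad, hess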

[Figure: several smooth loss functions (MSE, Huber, Fair, Log-Cosh) plotted against MAE]

As the figure shows, XGBoost's built-in MSE objective approximates MAE poorly, so we can use one of the other functions instead. Note that with these custom objectives we cannot use XGBoost's GPU acceleration. Python implementations of several custom objectives follow, adapted from Stack Overflow.

def huber_approx_obj(preds, dtrain):
    """Pseudo-Huber loss, a smooth approximation of the Huber/MAE loss."""
    d = preds - dtrain.get_label()
    h = 1  # h is the delta parameter in the figure above
    scale = 1 + (d / h) ** 2
    scale_sqrt = np.sqrt(scale)
    grad = d / scale_sqrt
    hess = 1 / scale / scale_sqrt
    return grad, hess
    
def fair_obj(preds, dtrain):
    """y = c * abs(x) - c**2 * np.log(abs(x)/c + 1)"""
    x = preds - dtrain.get_label()
    c = 1
    den = abs(x) + c
    grad = c*x / den
    hess = c*c / den ** 2
    return grad, hess
    
def log_cosh_obj(preds, dtrain):
    x = preds - dtrain.get_label()
    grad = np.tanh(x)
    hess = 1 / np.cosh(x)**2  # note: np.cosh overflows for large |x|, pushing hess to 0
    return grad, hess
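Whichever surrogate you train with, you usually still want to monitor the true MAE. A minimal sketch (synthetic data, hypothetical parameters, and a hypothetical mae_eval helper) that trains with huber_approx_obj and reports the actual MAE on a validation set:

import numpy as np
import xgboost as xgb

def mae_eval(preds, dtrain):
    # report the true MAE even though training uses a smooth surrogate
    return 'mae', float(np.mean(np.abs(preds - dtrain.get_label())))

rng = np.random.RandomState(0)
X = rng.randn(1000, 5)
y = X[:, 0] + rng.standard_t(3, size=1000)  # heavy-tailed noise, where MAE helps

dtrain = xgb.DMatrix(X[:800], label=y[:800])
dvalid = xgb.DMatrix(X[800:], label=y[800:])

bst = xgb.train({'max_depth': 4, 'eta': 0.1}, dtrain, 100,
                [(dvalid, 'valid')], obj=huber_approx_obj, feval=mae_eval)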

The following approximate-MAE gradient is taken from a Kaggle discussion:

import numpy as np
from numba import jit

@jit  # from the original Kaggle post; recent numba versions may reject the
      # untyped DMatrix argument, in which case simply drop the decorator
def grad(preds, dtrain):
    labels = dtrain.get_label()
    n = preds.shape[0]
    grad = np.empty(n)
    hess = 500 * np.ones(n)  # constant, strictly positive Hessian
    for i in range(n):
        diff = preds[i] - labels[i]
        if diff > 0:
            grad[i] = 200
        elif diff < 0:
            grad[i] = -200
        else:
            grad[i] = 0
    return grad, hess
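A note on the constants: with grad = ±200 and hess = 500, every sample contributes a fixed Newton step of -g/h = ∓0.4 regardless of the size of its residual, which in effect mimics MAE's constant-magnitude gradient while still giving XGBoost the strictly positive Hessian it needs. Only the ratio of the two constants matters (together with eta and the regularization terms), not their absolute values.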