Confidence Intervals for Confidence Intervals
Gradient boosting methods are a very powerful tool for making accurate predictions quickly, on large datasets, for complex variables that depend nonlinearly on many features.
Moreover, gradient boosting has been implemented in several ways: XGBoost, CatBoost, GradientBoostingRegressor, each having its own advantages, discussed here or here. Something these implementations all share is the ability to choose the objective that training minimizes. Even more interesting, XGBoost and CatBoost offer easy support for custom objective functions.
Why do I need a custom objective?
Most implementations provide standard objective functions, such as least squares, least absolute deviation, Huber, RMSE, … But sometimes the problem you’re working on requires a more specific solution to achieve the expected level of precision. Using a custom objective is usually my favourite option for tuning models.
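As a minimal sketch of what a custom objective can look like, assume the `objective(y_true, y_pred) -> (grad, hess)` callable convention used by XGBoost's scikit-learn wrapper: the library asks for the first and second derivatives of the loss with respect to each prediction. The log-cosh loss and the function name below are illustrative choices, not taken from this article.

```python
import numpy as np

def log_cosh_objective(y_true, y_pred):
    """Hypothetical custom objective: loss = log(cosh(y_pred - y_true))."""
    residual = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    grad = np.tanh(residual)   # first derivative of the loss w.r.t. y_pred
    hess = 1.0 - grad ** 2     # second derivative of the loss w.r.t. y_pred
    return grad, hess

# Toy values, only to show the shape of the inputs and outputs.
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 2.0])
grad, hess = log_cosh_objective(y_true, y_pred)
```

A function of this shape could then be passed as `objective=log_cosh_objective` when constructing an `xgboost.XGBRegressor`.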
Can you provide us with an example?
Sure! Recently, I’ve been looking for a way to attach confidence intervals to the predictions of one of our models. As a short reminder, confidence intervals are characterised by two elements:
- An interval [x_l, x_u]
- The confidence level, i.e. the probability that the predicted values lie in this interval.
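To make the two elements concrete, here is a small sketch on synthetic data (the normal samples and the helper name `empirical_coverage` are assumptions made for illustration). It builds an interval [x_l, x_u] from the 5% and 95% empirical quantiles and measures the fraction of values that fall inside it, which should be close to the 90% level.

```python
import numpy as np

def empirical_coverage(values, lower, upper):
    """Fraction of values that lie inside [lower, upper]."""
    values = np.asarray(values, dtype=float)
    inside = (values >= lower) & (values <= upper)
    return inside.mean()

# Synthetic data, used only to illustrate the definition.
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=10_000)

# Interval bounds taken as the 5% and 95% empirical quantiles.
x_l, x_u = np.quantile(samples, [0.05, 0.95])
coverage = empirical_coverage(samples, x_l, x_u)
print(f"[{x_l:.2f}, {x_u:.2f}] covers {coverage:.1%} of the samples")
```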
For instance, we can say that the 99% confidence interval of the average temperature on Earth is [-80, 60].
Associating confidence intervals with predictions allows us to quantify the level of trust in a prediction.
How do you compute confidence intervals?
You’ll need to train two models:
- One for the upper bound of your interval
- One for the lower bound of your interval
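The two-model setup can be sketched with scikit-learn's `GradientBoostingRegressor`, one of the implementations mentioned earlier, using its built-in `loss="quantile"` option. This is only an illustration under assumptions: the synthetic data, the 5% and 95% levels and the hyperparameters are arbitrary choices, not the setup used in this article.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic noisy data, assumed here purely for demonstration.
rng = np.random.default_rng(42)
X = rng.uniform(0.0, 10.0, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=500)

# One model per bound: alpha is the quantile level each model targets.
lower_model = GradientBoostingRegressor(
    loss="quantile", alpha=0.05, n_estimators=100, random_state=0
)
upper_model = GradientBoostingRegressor(
    loss="quantile", alpha=0.95, n_estimators=100, random_state=0
)
lower_model.fit(X, y)
upper_model.fit(X, y)

lower = lower_model.predict(X)
upper = upper_model.predict(X)

# Share of training targets that fall between the two predicted bounds.
coverage = np.mean((y >= lower) & (y <= upper))
print(f"training coverage: {coverage:.1%}")
```

Because the two models are fitted independently, nothing forces the lower prediction to stay below the upper one at every point, so it is worth checking that on your own data.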
And guess what? You need specific metrics to achieve that: Quant