For multi-class problems, LightGBM and XGBoost weight classes the same way as sklearn decision trees with class_weight='balanced': both use n_samples / (n_classes * np.bincount(y))

Summary of solutions to the sparse-label / class-imbalance problem:
https://blog.csdn.net/yilulvxing/article/details/111396700

class_weight when boosting meets imbalanced samples

For multi-class problems, LightGBM uses the same algorithm as sklearn decision trees with class_weight='balanced', namely n_samples / (n_classes * np.bincount(y)).
In sklearn, however, the same formula n_samples / (n_classes * np.bincount(y)) is also used to compute class_weight='balanced' for binary classification.
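
A minimal sketch of what this 'balanced' mode computes, using hypothetical labels y:

import numpy as np

# Hypothetical labels for an imbalanced 3-class problem.
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2])

n_samples = len(y)               # 9
n_classes = len(np.unique(y))    # 3
counts = np.bincount(y)          # [6 2 1]

# 'balanced' weights: n_samples / (n_classes * np.bincount(y))
weights = n_samples / (n_classes * counts)
print(weights)                   # [0.5 1.5 3. ] -- rarer classes get larger weights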

For binary classification, LightGBM instead offers is_unbalance or scale_pos_weight. Both reweight the classes rather than resample the data (scale_pos_weight multiplies the weight of the positive class), and, as the LightGBM documentation quoted below warns, they yield poor estimates of the individual class probabilities,
so probability calibration is applied afterwards.
The sklearn call is self.classifier = CalibratedClassifierCV(clf, cv=2, method='isotonic'); calibration quality can be assessed with the Brier score.
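
A runnable sketch of this calibration step, assuming lightgbm is installed (the dataset and the scale_pos_weight value are purely illustrative):

from lightgbm import LGBMClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Synthetic 90/10 imbalanced binary data.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LGBMClassifier(scale_pos_weight=9.0)  # ~ n_negative / n_positive
calibrated = CalibratedClassifierCV(clf, cv=2, method='isotonic')
calibrated.fit(X_tr, y_tr)

# Brier score: mean squared error of the predicted probabilities (lower is better).
proba = calibrated.predict_proba(X_te)[:, 1]
print('Brier score:', brier_score_loss(y_te, proba))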

LightGBM's scale_pos_weight differs from XGBoost's.
Source-code analysis of how LightGBM and XGBoost use scale_pos_weight to handle imbalanced data:
https://codeleading.com/article/56332668349/
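
A sketch of the usual way scale_pos_weight is set in both libraries. The n_negative / n_positive heuristic is common to both, but per the source analysis above the two implementations apply the weight at different points in the gradient computation, so results can differ:

import numpy as np
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier

rng = np.random.RandomState(0)
X = rng.randn(1000, 5)
y = rng.binomial(1, 0.1, size=1000)      # ~10% positives

ratio = (y == 0).sum() / (y == 1).sum()  # common heuristic: n_neg / n_pos

lgb_clf = LGBMClassifier(scale_pos_weight=ratio).fit(X, y)
xgb_clf = XGBClassifier(scale_pos_weight=ratio).fit(X, y)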

If the decision-tree base classifier used by AdaBoost is given class_weight='balanced', the ensemble stops improving after a single iteration: the weighted training error reaches 0 (errorRate=0). A sketch reproducing this is below.
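
A minimal sketch that tries to reproduce this observation on synthetic data. Note that an unrestricted tree reaches zero training error on its own, so boosting terminates after one round; the `estimator` parameter was named `base_estimator` before scikit-learn 1.2:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, weights=[0.95, 0.05], random_state=0)

# Unrestricted depth: the balanced tree fits the training set perfectly,
# so the weighted error hits 0 and boosting stops after the first round.
base = DecisionTreeClassifier(class_weight='balanced')
ada = AdaBoostClassifier(estimator=base, n_estimators=50, random_state=0)
ada.fit(X, y)
print(len(ada.estimators_))  # typically 1: no further iterations were run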

lightgbm.LGBMClassifier
https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html

class_weight (dict, ‘balanced’ or None, optional (default=None)) –
Weights associated with classes in the form {class_label: weight}. Use
this parameter only for multi-class classification task; for binary
classification task you may use is_unbalance or scale_pos_weight
parameters. Note, that the usage of all these parameters will result
in poor estimates of the individual class probabilities. You may want
to consider performing probability calibration
(https://scikit-learn.org/stable/modules/calibration.html) of your
model. The ‘balanced’ mode uses the values of y to automatically
adjust weights inversely proportional to class frequencies in the
input data as n_samples / (n_classes * np.bincount(y)). If None, all
classes are supposed to have weight one. Note, that these weights will
be multiplied with sample_weight (passed through the fit method) if
sample_weight is specified.
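
A brief sketch of using this parameter on a multi-class task, per the docstring above (synthetic data; the explicit dict values are illustrative):

from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1500, n_classes=3, n_informative=6,
                           weights=[0.8, 0.15, 0.05], random_state=0)

# 'balanced' recomputes n_samples / (n_classes * np.bincount(y)) from y;
# alternatively pass an explicit mapping, e.g. {0: 1.0, 1: 3.0, 2: 10.0}.
clf = LGBMClassifier(class_weight='balanced').fit(X, y)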

sklearn.tree.DecisionTreeClassifier
https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html?highlight=decisiontreeclassifier#sklearn.tree.DecisionTreeClassifier

class_weight : dict, list of dict or "balanced", default=None
Weights associated with classes in the form {class_label: weight}. If None,
all classes are supposed to have weight one. For multi-output
problems, a list of dicts can be provided in the same order as the
columns of y.

Note that for multioutput (including multilabel) weights should be
defined for each class of every column in its own dict. For example,
for four-class multilabel classification weights should be [{0: 1, 1:
1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1},
{2:5}, {3:1}, {4:1}].

The “balanced” mode uses the values of y to automatically adjust
weights inversely proportional to class frequencies in the input data
as n_samples / (n_classes * np.bincount(y))

For multi-output, the weights of each column of y will be multiplied.

Note that these weights will be multiplied with sample_weight (passed
through the fit method) if sample_weight is specified.
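
A small sketch of the multi-output form described above, on hypothetical data with four binary label columns:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.randn(200, 4)
Y = rng.binomial(1, [0.5, 0.1, 0.5, 0.5], size=(200, 4))  # 4 binary outputs

# One dict per output column; the second column's positives are upweighted.
weights = [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}]
clf = DecisionTreeClassifier(class_weight=weights).fit(X, Y)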

numpy.bincount explained in detail:
https://blog.csdn.net/xlinsist/article/details/51346523
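
The core behavior in one snippet: index i of the result holds how many times i appears in the input:

import numpy as np

y = np.array([0, 1, 1, 3, 2, 1, 7])
print(np.bincount(y))                # [1 3 1 1 0 0 0 1]
print(np.bincount(y, minlength=10))  # pad the result out to length 10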

XGBoost Parameters
https://xgboost.readthedocs.io/en/latest/parameter.html#parameters-for-tree-booster

Isotonic regression:
https://blog.csdn.net/weixin_42468475/article/details/115319437
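
A minimal isotonic-regression sketch on synthetic data: fit a non-decreasing step function to noisy, roughly monotone observations:

import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.RandomState(0)
x = np.arange(50, dtype=float)
y = np.log1p(x) + rng.normal(scale=0.3, size=50)  # noisy increasing signal

iso = IsotonicRegression(out_of_bounds='clip')
y_fit = iso.fit_transform(x, y)  # monotone, piecewise-constant fit of y on x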

Probability calibration:
https://cloud.tencent.com/developer/article/1642298

Sklearn (1): Probability calibration
https://blog.csdn.net/u014765410/article/details/82772154

1.16. Probability calibration
https://scikit-learn.org/stable/modules/calibration.html#calibration
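
A sketch of the reliability-curve check described in that guide, on hypothetical predictions:

import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.RandomState(0)
y_true = rng.binomial(1, 0.2, size=1000)                # held-out labels
y_prob = np.clip(0.7 * y_true + 0.3 * rng.uniform(size=1000), 0, 1)

frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=10)
# A well-calibrated model keeps frac_pos close to mean_pred in every bin.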

Uncertainty (confidence scores) and calibration in deep learning
