For multi-class problems, LightGBM and XGBoost weight classes the same way as sklearn decision trees with class_weight='balanced': both use n_samples / (n_classes * np.bincount(y))

Summary of solutions to the sparse-label / class-imbalance problem:
https://blog.csdn.net/yilulvxing/article/details/111396700

class_weight when boosting meets imbalanced samples

For multi-class problems, LightGBM uses the same algorithm as sklearn decision trees with class_weight='balanced', namely n_samples / (n_classes * np.bincount(y)).
In sklearn, however, the same formula n_samples / (n_classes * np.bincount(y)) is also used to compute class_weight='balanced' for binary classification.
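
A minimal sketch of what this 'balanced' mode computes, using hypothetical labels y:

import numpy as np

# Hypothetical labels for an imbalanced 3-class problem.
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2])

n_samples = len(y)               # 9
n_classes = len(np.unique(y))    # 3
counts = np.bincount(y)          # [6 2 1]

# 'balanced' weights: n_samples / (n_classes * np.bincount(y))
weights = n_samples / (n_classes * counts)
print(weights)                   # [0.5 1.5 3. ] -- rarer classes get larger weights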

For binary classification, LightGBM instead offers is_unbalance or scale_pos_weight. Both reweight the classes rather than resample the data (scale_pos_weight multiplies the weight of the positive class), and, as the LightGBM documentation quoted below warns, they yield poor estimates of the individual class probabilities,
so probability calibration is applied afterwards.
The sklearn call is self.classifier = CalibratedClassifierCV(clf, cv=2, method='isotonic'); calibration quality can be assessed with the Brier score.
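
A runnable sketch of this calibration step, assuming lightgbm is installed (the dataset and the scale_pos_weight value are purely illustrative):

from lightgbm import LGBMClassifier
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

# Synthetic 90/10 imbalanced binary data.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LGBMClassifier(scale_pos_weight=9.0)  # ~ n_negative / n_positive
calibrated = CalibratedClassifierCV(clf, cv=2, method='isotonic')
calibrated.fit(X_tr, y_tr)

# Brier score: mean squared error of the predicted probabilities (lower is better).
proba = calibrated.predict_proba(X_te)[:, 1]
print('Brier score:', brier_score_loss(y_te, proba))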

LightGBM's scale_pos_weight differs from XGBoost's.
Source-code analysis of how LightGBM and XGBoost use scale_pos_weight to handle imbalanced data:
https://codeleading.com/article/56332668349/
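
A sketch of the usual way scale_pos_weight is set in both libraries. The n_negative / n_positive heuristic is common to both, but per the source analysis above the two implementations apply the weight at different points in the gradient computation, so results can differ:

import numpy as np
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier

rng = np.random.RandomState(0)
X = rng.randn(1000, 5)
y = rng.binomial(1, 0.1, size=1000)      # ~10% positives

ratio = (y == 0).sum() / (y == 1).sum()  # common heuristic: n_neg / n_pos

lgb_clf = LGBMClassifier(scale_pos_weight=ratio).fit(X, y)
xgb_clf = XGBClassifier(scale_pos_weight=ratio).fit(X, y)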

If the decision-tree base classifier used by AdaBoost is given class_weight='balanced', the ensemble stops improving after a single iteration: the weighted training error reaches 0 (errorRate=0). A sketch reproducing this is below.
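
A minimal sketch that tries to reproduce this observation on synthetic data. Note that an unrestricted tree reaches zero training error on its own, so boosting terminates after one round; the `estimator` parameter was named `base_estimator` before scikit-learn 1.2:

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, weights=[0.95, 0.05], random_state=0)

# Unrestricted depth: the balanced tree fits the training set perfectly,
# so the weighted error hits 0 and boosting stops after the first round.
base = DecisionTreeClassifier(class_weight='balanced')
ada = AdaBoostClassifier(estimator=base, n_estimators=50, random_state=0)
ada.fit(X, y)
print(len(ada.estimators_))  # typically 1: no further iterations were run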

lightgbm.LGBMClassifier
https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html

class_weight (dict, ‘balanced’ or None, optional (default=None)) –
Weights associated with classes in the form {class_label: weight}. Use
this parameter only for multi-class classification task; for binary
classification task you may use is_unbalance or scale_pos_weight
parameters. Note, that the usage of all these parameters will result
in poor estimates of the individual class probabilities. You may want
to consider performing probability calibration
(https://scikit-learn.org/stable/modules/calibration.html) of your
model. The ‘balanced’ mode uses the values of y to automatically
adjust weights inversely proportional to class frequencies in the
input data as n_samples / (n_classes * np.bincount(y)). If None, all
classes are supposed to have weight one. Note, that these weights will
be multiplied with sample_weight (passed through the fit method) if
sample_weight is specified.
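
A brief sketch of using this parameter on a multi-class task, per the docstring above (synthetic data; the explicit dict values are illustrative):

from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1500, n_classes=3, n_informative=6,
                           weights=[0.8, 0.15, 0.05], random_state=0)

# 'balanced' recomputes n_samples / (n_classes * np.bincount(y)) from y;
# alternatively pass an explicit mapping, e.g. {0: 1.0, 1: 3.0, 2: 10.0}.
clf = LGBMClassifier(class_weight='balanced').fit(X, y)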

sklearn.tree.DecisionTreeClassifier
https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html?highlight=decisiontreeclassifier#sklearn.tree.DecisionTreeClassifier

class_weight : dict, list of dict or "balanced", default=None
Weights associated with classes in the form {class_label: weight}. If None,
all classes are supposed to have weight one. For multi-output
problems, a list of dicts can be provided in the same order as the
columns of y.

Note that for multioutput (including multilabel) weights should be
defined for each class of every column in its own dict. For example,
for four-class multilabel classification weights should be [{0: 1, 1:
1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}] instead of [{1:1},
{2:5}, {3:1}, {4:1}].

The “balanced” mode uses the values of y to automatically adjust
weights inversely proportional to class frequencies in the input data
as n_samples / (n_classes * np.bincount(y))

For multi-output, the weights of each column of y will be multiplied.

Note that these weights will be multiplied with sample_weight (passed
through the fit method) if sample_weight is specified.
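
A small sketch of the multi-output form described above, on hypothetical data with four binary label columns:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.randn(200, 4)
Y = rng.binomial(1, [0.5, 0.1, 0.5, 0.5], size=(200, 4))  # 4 binary outputs

# One dict per output column; the second column's positives are upweighted.
weights = [{0: 1, 1: 1}, {0: 1, 1: 5}, {0: 1, 1: 1}, {0: 1, 1: 1}]
clf = DecisionTreeClassifier(class_weight=weights).fit(X, Y)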

numpy.bincount explained in detail:
https://blog.csdn.net/xlinsist/article/details/51346523
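
The core behavior in one snippet: index i of the result holds how many times i appears in the input:

import numpy as np

y = np.array([0, 1, 1, 3, 2, 1, 7])
print(np.bincount(y))                # [1 3 1 1 0 0 0 1]
print(np.bincount(y, minlength=10))  # pad the result out to length 10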

XGBoost Parameters
https://xgboost.readthedocs.io/en/latest/parameter.html#parameters-for-tree-booster

Isotonic regression:
https://blog.csdn.net/weixin_42468475/article/details/115319437
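
A minimal isotonic-regression sketch on synthetic data: fit a non-decreasing step function to noisy, roughly monotone observations:

import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.RandomState(0)
x = np.arange(50, dtype=float)
y = np.log1p(x) + rng.normal(scale=0.3, size=50)  # noisy increasing signal

iso = IsotonicRegression(out_of_bounds='clip')
y_fit = iso.fit_transform(x, y)  # monotone, piecewise-constant fit of y on x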

Probability calibration:
https://cloud.tencent.com/developer/article/1642298

Sklearn (1): Probability calibration
https://blog.csdn.net/u014765410/article/details/82772154

1.16. Probability calibration
https://scikit-learn.org/stable/modules/calibration.html#calibration
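
A sketch of the reliability-curve check described in that guide, on hypothetical predictions:

import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.RandomState(0)
y_true = rng.binomial(1, 0.2, size=1000)                # held-out labels
y_prob = np.clip(0.7 * y_true + 0.3 * rng.uniform(size=1000), 0, 1)

frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=10)
# A well-calibrated model keeps frac_pos close to mean_pred in every bin.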

Uncertainty (confidence scores) and calibration in deep learning
