Data Binning 4: Chi-Square Optimal Binning with the ChiMerge Algorithm (Supervised)

呆萌的代Ma

This is an original article by CSDN blogger "呆萌的代Ma". When reposting, please cite the blog link: https://blog.csdn.net/weixin_35757704/

Article link: https://blog.csdn.net/weixin_35757704/article/details/121901954

Paper: https://sci2s.ugr.es/keel/pdf/algorithm/congreso/1992-Kerber-ChimErge-AAAI92.pdf

Kerber, Randy. “ChiMerge: Discretization of Numeric Attributes.” Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI-92). 1992.

A detailed walkthrough of the ChiMerge algorithm (in English): https://medium.com/@nithin_rajan/data-discretization-using-chimerge-55c8ade3cfda

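ChiMerge is a bottom-up, supervised discretization method built on the chi-square test. The procedure is:

First, sort the values and treat each distinct value as its own interval.
Then repeatedly compute the chi-square statistic for every pair of adjacent intervals and merge the pair with the lowest value whenever it is below the threshold. The threshold 4.6 used here is the chi-square critical value at the 90% confidence level (10% significance level) with 2 degrees of freedom, which corresponds to a three-class target; for k classes the degrees of freedom are k - 1, so the threshold changes with the number of classes.
Stop once the required number of intervals is reached or every adjacent pair has a chi-square value above the threshold.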
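The steps above can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not the implementation from the paper or from any library: the names `chi2_critical_df2`, `chi2_adjacent` and `chimerge` are hypothetical, cells with an expected count of zero are simply skipped, and `max_intervals` / `min_intervals` are treated as a hard ceiling and floor on the number of intervals. The first helper reproduces the 4.6 threshold, using the fact that a chi-square variable with 2 degrees of freedom has survival function exp(-x/2).

```python
import math
from collections import Counter


def chi2_critical_df2(alpha):
    """Critical value for 2 degrees of freedom only: solve exp(-x/2) = alpha."""
    return -2.0 * math.log(alpha)


def chi2_adjacent(a, b, classes):
    """Pearson chi-square statistic for two adjacent intervals.

    a and b are Counters mapping class label -> sample count.
    """
    total = sum(a.values()) + sum(b.values())
    chi2 = 0.0
    for row in (a, b):
        row_total = sum(row.values())
        for c in classes:
            expected = row_total * (a[c] + b[c]) / total
            if expected > 0:  # assumption: skip classes absent from both intervals
                chi2 += (row[c] - expected) ** 2 / expected
    return chi2


def chimerge(values, labels, threshold, max_intervals=None, min_intervals=1):
    """Return the lower bound of each interval left after merging."""
    min_intervals = max(min_intervals, 1)
    classes = sorted(set(labels))

    # Step 1: one interval per distinct value, in ascending order.
    counts = {}
    for v, y in zip(values, labels):
        counts.setdefault(v, Counter())[y] += 1
    lows = sorted(counts)
    freqs = [counts[v] for v in lows]

    # Step 2: merge the adjacent pair with the smallest chi-square.
    while len(lows) > min_intervals:
        chis = [chi2_adjacent(freqs[i], freqs[i + 1], classes)
                for i in range(len(lows) - 1)]
        i = min(range(len(chis)), key=chis.__getitem__)
        too_many = max_intervals is not None and len(lows) > max_intervals
        # Step 3: stop when every pair differs significantly
        # and the interval count is within the ceiling.
        if chis[i] >= threshold and not too_many:
            break
        freqs[i] = freqs[i] + freqs[i + 1]  # Counter addition sums the class counts
        del freqs[i + 1], lows[i + 1]
    return lows


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 10, 11, 12, 13, 20, 21, 22, 23]
    ys = ["a"] * 4 + ["b"] * 4 + ["c"] * 4
    print(round(chi2_critical_df2(0.10), 3))
    print(chimerge(xs, ys, chi2_critical_df2(0.10)))
```

With a target that has a different number of classes, the threshold would have to come from a chi-square table (or a statistics library) for k - 1 degrees of freedom; the closed form above holds only for 2 degrees of freedom.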

Using the ChiMerge algorithm

For convenience and efficiency, we use the third-party package scorecardbundle.

Install it first:

pip install -i https://pypi.org/simple --upgrade scorecardbundle

Scorecard-Bundle GitHub page: https://github.com/Lantianzz/Scorecard-Bundle

Example code

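from scorecardbundle.feature_discretization.ChiMerge import ChiMerge
from sklearn.datasets import make_classification
import pandas as pd

if __name__ == '__main__':
    # Synthetic 4-class dataset: 100 samples, 10 features (8 informative)
    data_x, data_y = make_classification(n_samples=100, n_classes=4, n_features=10, n_informative=8, random_state=0)
    x_value = data_x[:, 0]  # bin the first feature only
    y_value = data_y
    # Keep between 2 and 10 intervals; boundaries use 3 decimals
    trans_cm = ChiMerge(max_intervals=10, min_intervals=2, decimal=3, output_dataframe=True)
    result_cm = trans_cm.fit_transform(pd.DataFrame(x_value), y_value)
    # boundaries_[0] holds the cut points found for column 0, the only feature
    print("Boundaries:", trans_cm.boundaries_[0])
    # pd.cut assigns code -1 to any value outside the outermost boundaries
    print("Bin codes:", pd.cut(x_value, trans_cm.boundaries_[0]).codes)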
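To make the last line of the example easier to reason about, here is a rough pure-Python sketch of how a list of ascending cut points turns into bin codes with right-closed intervals, which is what `pd.cut(...).codes` produces by default. The helpers `bin_codes` and `with_open_ends` are hypothetical names introduced for illustration, and the sample edges are made up; the exact layout of `boundaries_` should be checked against the printed output, since padding with infinite edges may or may not be needed.

```python
import math
from bisect import bisect_left


def bin_codes(values, edges):
    """Code i means edges[i] < v <= edges[i + 1]; -1 means v is outside all bins."""
    n_bins = len(edges) - 1
    codes = []
    for v in values:
        i = bisect_left(edges, v) - 1
        codes.append(i if 0 <= i < n_bins else -1)
    return codes


def with_open_ends(boundaries):
    """Pad the cut points with -inf / +inf so that every value gets a bin."""
    edges = list(boundaries)
    if edges[0] != -math.inf:
        edges.insert(0, -math.inf)
    if edges[-1] != math.inf:
        edges.append(math.inf)
    return edges


if __name__ == "__main__":
    cut_points = [0.0, 1.0, math.inf]        # made-up boundaries
    sample = [-0.5, 0.0, 0.5, 1.0, 2.0]
    print(bin_codes(sample, cut_points))                  # values at or below 0.0 get -1
    print(bin_codes(sample, with_open_ends(cut_points)))  # every value gets a bin
```

If the bin codes printed by the example contain -1, passing padded edges to `pd.cut` in the same way is one option for covering the whole value range.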

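Python implementations of the algorithm (see the following references)

ChiMerge (Ker92): https://gist.github.com/alanzchen/17d0c4a45d59b79052b1cd07f531689e

The ChiMerge algorithm: chi-square test + ChiMerge + Python: https://www.yanxishe.com/blogDetail/25070

Note: I have tried four reimplementations so far and none of them ran correctly, which is odd. They also differ from one another in parts of their approach. If you get one working, please share it in the comments so we can learn from each other.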
