Imbalanced Classification: A Complete Road Map


Imbalanced classification is a well-known problem in articles and academic papers. Most of that work focuses on one part of the big picture: it addresses a specific data set and discusses possible solutions. As a result, you end up opening more than 10 browser tabs to learn about the problem and its possible solutions. Here I have put together a complete road map, so you can see the whole picture of the steps you have to go through, from preparing your data to reaching an informative conclusion about your question of interest.


Before we start this journey together, let's first talk about why imbalanced classification matters to people in industry, not just to academics. Which applications suffer from this problem by nature, and which ones happen to have imbalanced classes because of customers' behavior?


Imbalanced classification refers to a classification problem in which the classes are unequally distributed. In business terms, imagine you have released two products on the market and found that 90% of your customers prefer one product over the other. At some point, you will go back to your data team and ask them to explain this behavior based on customer characteristics, so that you can understand it and identify the changes that would push customers toward the less popular product, or adjust that product to match their preferences. In many well-known applications, imbalance is expected because of the nature of the application: fraud detection, large claim losses in insurance, spam email, hardware failure, etc. In other applications, it arises from unexpected customer behavior that you cannot anticipate but have to deal with when it happens.
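
To make the 90% example concrete, here is a minimal sketch of how you might measure how unequal the class distribution is. The purchase data and the helper names `class_distribution` and `imbalance_ratio` are made up for illustration; they are assumptions, not part of any library.

```python
from collections import Counter


def class_distribution(labels):
    """Return each class's share of the labels as a fraction between 0 and 1."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def imbalance_ratio(labels):
    """Return the majority class count divided by the minority class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical purchase data: 90 customers chose product A, 10 chose product B.
purchases = ["A"] * 90 + ["B"] * 10

shares = class_distribution(purchases)
ratio = imbalance_ratio(purchases)

print(shares)  # {'A': 0.9, 'B': 0.1}
print(ratio)   # 9.0
```

A ratio of 1.0 would mean perfectly balanced classes; the further it climbs above 1.0, the more the majority class dominates the data.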


In this article, I will go through the three general steps of imbalanced classification analysis, as previewed in the image below.


[Image: the three general steps of imbalanced classification analysis]
Image by the author

I will explain the available options in each step in detail, and highlight some pitfalls and tricks you need to be aware of when dealing with imbalanced data.


Data cleaning and preparation
