Python Classification Model Evaluation
Introduction
Picking the right machine learning algorithm is decisive, because it determines the performance of the model. The dominant factor in choosing a model is its performance, which is usually estimated with k-fold cross-validation so that the estimate does not depend on a single train/test split.
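As a minimal sketch of the k-fold step described above, the following assumes scikit-learn, its bundled breast-cancer dataset and two illustrative candidate models (none of these choices come from the original text). Every model is scored on the same ten folds, and the one with the highest mean accuracy is picked:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Illustrative dataset (bundled with scikit-learn, no download needed)
X, y = load_breast_cancer(return_X_y=True)

# Illustrative candidate models (assumed, not the article's exact list)
models = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "tree": DecisionTreeClassifier(random_state=1),
}

# 10-fold CV with a fixed seed, so every model is evaluated on identical folds
cv = KFold(n_splits=10, shuffle=True, random_state=1)

mean_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    mean_scores[name] = scores.mean()
    print(f"{name}: mean={scores.mean():.3f} std={scores.std():.3f}")

# Selection by mean score alone, which is exactly what the next paragraphs question
best = max(mean_scores, key=mean_scores.get)
print("highest mean accuracy:", best)
```

Fixing `random_state` on the `KFold` object matters: it keeps the folds identical across models, which is what later allows the per-fold scores to be compared pairwise.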
The chosen model is usually the one with the highest mean performance. However, that higher mean can sometimes be a statistical fluke. To address this concern, there are many statistical hypothesis-testing approaches for evaluating the difference in mean performance obtained from cross-validation. If the resulting p-value is below the chosen significance level (commonly 0.05), we can reject the null hypothesis that the two algorithms perform the same, and conclude that the difference is statistically significant.
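The decision rule above can be sketched with a paired t-test on per-fold scores. This assumes scikit-learn and SciPy (`scipy.stats.ttest_rel`), an illustrative dataset and models, and a significance level of 0.05. Note that a plain paired t-test on k-fold scores tends to be optimistic, because the training sets of different folds overlap and the per-fold scores are therefore not independent; this is the motivation for corrected procedures such as the 5x2cv test.

```python
from scipy.stats import ttest_rel
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # illustrative dataset
cv = KFold(n_splits=10, shuffle=True, random_state=1)

# Illustrative pair of models (assumed)
model_a = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model_b = DecisionTreeClassifier(random_state=1)

# Same seeded cv object -> identical folds, so scores_a[i] and scores_b[i] are a pair
scores_a = cross_val_score(model_a, X, y, cv=cv)
scores_b = cross_val_score(model_b, X, y, cv=cv)

# Paired (related-samples) t-test on the per-fold differences
t_stat, p_value = ttest_rel(scores_a, scores_b)

alpha = 0.05  # significance level (assumed)
if p_value < alpha:
    decision = "reject H0: the difference is statistically significant"
else:
    decision = "fail to reject H0: the difference may be due to chance"
print(f"t={t_stat:.3f}, p={p_value:.4f} -> {decision}")
```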
I usually include such a step in my pipeline either when developing a new classification model or competing in one of Kaggle’s competitions.
Tutorial Objectives
- Understanding the differences between statistical hypothesis tests.
- Understanding why model selection based on the mean performance score alone can be misleading.
- Understanding why the paired Student's t-test is preferred over the standard (unpaired) Student's t-test.
- Applying the advanced 5x2-fold technique with the MLxtend library to compare algorithms based on the p-value.
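The 5x2-fold technique in the last objective refers to Dietterich's 5x2cv paired t-test, which MLxtend exposes as `paired_ttest_5x2cv` in `mlxtend.evaluate`. To show what such a test computes, here is a hand-written sketch using only scikit-learn and SciPy: five replications of a random 50/50 split, each half used once for training and once for testing, then t = (first score difference) / sqrt(mean of the five per-replication variances) with 5 degrees of freedom. The function name `paired_ttest_5x2cv_sketch`, the stratified splitting, the dataset and the models are illustrative assumptions and may differ in detail from MLxtend's implementation, which should be preferred in practice.

```python
import numpy as np
from scipy import stats
from sklearn.base import clone
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def paired_ttest_5x2cv_sketch(est1, est2, X, y, seed=0):
    """Hypothetical helper: manual sketch of the 5x2cv paired t-test."""
    variances = []
    first_diff = None
    for i in range(5):
        # One replication: a random 50/50 split, used in both directions
        cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=seed + i)
        diffs = []
        for train_idx, test_idx in cv.split(X, y):
            m1 = clone(est1).fit(X[train_idx], y[train_idx])
            m2 = clone(est2).fit(X[train_idx], y[train_idx])
            diffs.append(m1.score(X[test_idx], y[test_idx])
                         - m2.score(X[test_idx], y[test_idx]))
        if first_diff is None:
            first_diff = diffs[0]  # numerator uses the very first difference
        mean_diff = (diffs[0] + diffs[1]) / 2.0
        variances.append((diffs[0] - mean_diff) ** 2 + (diffs[1] - mean_diff) ** 2)
    t_stat = first_diff / np.sqrt(np.mean(variances))
    p_value = 2.0 * stats.t.sf(abs(t_stat), df=5)  # two-sided, 5 degrees of freedom
    return t_stat, p_value


# Usage with an illustrative dataset and pair of models (assumed)
X, y = load_breast_cancer(return_X_y=True)
logreg = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
tree = DecisionTreeClassifier(random_state=1)
t, p = paired_ttest_5x2cv_sketch(logreg, tree, X, y, seed=1)
print(f"t={t:.3f}, p={p:.4f}")
```

Each replication trains on only half of the data, and the replications are kept separate when estimating the variance, which limits the overlap problem that inflates the plain k-fold paired t-test.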
Table of Contents
- What does statistical hypothesis testing mean?
- Types of commonly used statistical hypothesis tests
- Extracting the best two models based on performance
- Steps to conduct hypothesis testing on the best two models
- Steps to apply the 5x2 fold test
- Comparing classifier algorithms
- Summary
What does statistical hypothesis testing mean?
A statistical hypothesis test quantifies how likely it is to observe two data samples, assuming that they come from the same distribution. That assumption is the null hypothesis. We can test the null hypothesis by applying a statistical calculation, such as a t-test, that produces a p-value.
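To make the null hypothesis concrete, the following sketch (assuming NumPy and SciPy, with synthetic normal samples chosen purely for illustration) runs Student's t-test for two independent samples, `scipy.stats.ttest_ind`, once on two samples drawn from the same distribution and once on two samples whose means differ by one standard deviation:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)  # fixed seed for reproducibility

# Two samples from the SAME distribution: the null hypothesis is true here
same_a = rng.normal(loc=0.0, scale=1.0, size=100)
same_b = rng.normal(loc=0.0, scale=1.0, size=100)

# Two samples whose means differ by one standard deviation: the null hypothesis is false
diff_a = rng.normal(loc=0.0, scale=1.0, size=100)
diff_b = rng.normal(loc=1.0, scale=1.0, size=100)

# ttest_ind returns (statistic, p-value); equal variances assumed by default (Student's t-test)
_, p_same = ttest_ind(same_a, same_b)
_, p_diff = ttest_ind(diff_a, diff_b)
print(f"same distribution: p={p_same:.3f}")
print(f"shifted mean:      p={p_diff:.3g}")
```

With the shifted means the p-value is tiny, so the null hypothesis is rejected. For the same-distribution pair the p-value will usually, but not always, exceed 0.05: at a 5% significance level, about one in twenty such comparisons is a false positive.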
If the test result provides insufficient evidence to reject the null hypothesis, then any observed difference in the model scores can be attributed to chance.
If the test result infers sufficient evidence to reject the null hypothesis, then any observed di