Statistical Tests for Paper Data: Wilcoxon signed-rank test / Scott-Knott ESD test / cliffs-delta test

Andre_young

Tags: python, R

Article link: https://blog.csdn.net/Andre_young/article/details/134959179

Wilcoxon signed-rank test

First, the reference: follow the link below for an introduction to the test.

https://www.cnblogs.com/djx571/p/10216940.html

Two questions need to be answered first:

Why use this test, and what does it do?
How is the test used?

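1. Answer: The test is used to check whether previous methods differ from our proposed method. When the differences between the paired values are not normally distributed, the Wilcoxon signed-rank test should be used.
2. Answer:
Null hypothesis (H0): there is no significant difference between the compared method and our method.
Suppose we already have the following (taking the Precision metric over 10 datasets as an example):
C (compared method), Precision on the 10 datasets: [0.43, 0.55, …], length 10
O (our method), Precision on the 10 datasets: [0.32, 0.76, …], length 10
The two lists are paired: the i-th entries of C and O must come from the same dataset.
See the link above for the detailed calculation. The Python code is as follows: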

from scipy import stats

# C and O are the two paired, equal-length lists of Precision values described above
w, p_value = stats.wilcoxon(C, O, correction=False)

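We only need p_value to judge whether the difference is significant: if p < 0.05, we reject the null hypothesis and conclude that the two methods differ significantly.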
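To make the calculation behind the call above more concrete, here is a minimal pure-Python sketch of the signed-rank statistic and an exact two-sided p-value. The data are hypothetical (only the first values of C and O appear in the post), the helper names `signed_rank_sums` and `exact_two_sided_p` are made up for illustration, and the exact p-value assumes there are no ties and no zero differences. It is not meant to replace `stats.wilcoxon`.

```python
def signed_rank_sums(x, y, ndigits=6):
    """Return (W+, W-, n) for the paired samples x and y."""
    # Paired differences; rounding guards against floating-point noise.
    diffs = [round(b - a, ndigits) for a, b in zip(x, y)]
    diffs = [d for d in diffs if d != 0]  # zero differences are discarded
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of tied absolute differences and give it the average rank.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, len(diffs)


def exact_two_sided_p(n, w):
    """Exact two-sided p-value for W = min(W+, W-); assumes no ties."""
    max_sum = n * (n + 1) // 2
    # counts[s] = number of subsets of the ranks {1..n} whose sum is s
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]
    tail = sum(counts[: int(w) + 1])
    return min(1.0, 2 * tail / 2 ** n)


# Hypothetical Precision values on 10 datasets (not the author's data).
C = [0.43, 0.55, 0.61, 0.48, 0.70, 0.52, 0.66, 0.58, 0.49, 0.63]
O = [0.48, 0.67, 0.58, 0.56, 0.80, 0.50, 0.81, 0.65, 0.50, 0.72]

w_plus, w_minus, n = signed_rank_sums(C, O)
w = min(w_plus, w_minus)
p = exact_two_sided_p(n, w)
print(w_plus, w_minus, p)  # 50.0 5.0 0.01953125
```

With these made-up numbers p is below 0.05, so H0 would be rejected. For real data with ties or zero differences, the library call is the safer choice.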
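When several such tests are run, the p-values can additionally be corrected with the Benjamini–Hochberg procedure; see the following paper: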

Ferreira, J.A., Zwinderman, A.H.: On the Benjamini–Hochberg method. Ann. Stat. 34(4), 1827–1849 (2006). https://doi.org/10.1214/009053606000000425
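As a rough illustration of the Benjamini–Hochberg correction, the sketch below computes adjusted p-values in pure Python. The function name `benjamini_hochberg` and the four raw p-values are assumptions made for this example, for instance one Wilcoxon test per metric; they do not come from the post.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four separate tests.
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adj_p]
print(adj_p)
print(significant)  # [True, False, False, False]
```

In this made-up case three raw p-values are below 0.05, but only one remains below 0.05 after the adjustment.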

Scott‐Knott ESD test

As usual, I first cite the blog post I referred to (original link).

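The Scott-Knott ESD test is a mean comparison approach that uses hierarchical clustering to partition a set of treatment means (e.g., means of variable importance scores, means of model performance) into statistically distinct groups whose differences are non-negligible. It is a variant of the Scott-Knott test that takes into account the magnitude of the difference (i.e., the effect size) of treatment means within a group and between groups.
The Scott-Knott ESD test therefore produces a ranking of treatment means while ensuring that:
(1) the differences within each group are negligible;
(2) the differences between groups are non-negligible.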

For an example, see the linked blog post.

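The R code is given below.
Its input is the value of each method under a given metric, for example
a file named precision.csv
with the following CSV data: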
datasets | A | B | C |
1 | 0.1 | 0.2 | 0.3 |
2 | 0.3 | 0.4 | 0.1 |

library(ScottKnottESD)

finalpath <- ""     # change to your own input directory (must end with "/")
skresultpath <- ""  # output directory (must end with "/")

file_names <- list.files(finalpath)
for (file_name in file_names) {
    path <- paste0(finalpath, file_name)
    print(path)
    csv <- read.csv(file = path, header = TRUE, sep = ",")
    csv <- csv[-1]     # drop the first column (the dataset index)
    sk <- sk_esd(csv)  # each remaining column is one method
    resultpath <- paste0(skresultpath, file_name, ".txt")
    write.table(sk[["groups"]], resultpath)  # rank group of each method
}
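If the metric values are produced in Python, the wide CSV layout described above (one row per dataset, one column per method) could be written with a sketch like the following. The helper name `to_wide_csv` is hypothetical, and the values simply repeat the small example table.

```python
import csv
import io


def to_wide_csv(results):
    """Build CSV text with a 'datasets' index column and one column per method."""
    methods = list(results)
    n_datasets = len(results[methods[0]])
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["datasets"] + methods)
    for i in range(n_datasets):
        # Row i holds the value of every method on dataset i + 1.
        writer.writerow([i + 1] + [results[m][i] for m in methods])
    return buffer.getvalue()


# Precision of methods A, B, C on two datasets (values from the example table).
precision = {"A": [0.1, 0.3], "B": [0.2, 0.4], "C": [0.3, 0.1]}
csv_text = to_wide_csv(precision)
print(csv_text)
```

Saving `csv_text` as precision.csv in the input directory gives the R loop a file in the expected shape; the first column is the one removed by `csv[-1]`.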

cliffs-delta test

Cliff's Delta is a non-parametric effect size measure that quantifies the amount of difference between two groups of observations beyond what p-values convey. It can be understood as a useful complement to the corresponding hypothesis test.

The library function can be used directly.
Example from the official code:

from cliffs_delta import cliffs_delta

x1 = [10, 20, 20, 20, 30, 30, 30, 40, 50, 100]
x2 = [10, 20, 30, 40, 40, 50]
d, res = cliffs_delta(x1, x2)

print(d,res)

-0.06666666666666667 negligible

As for how to interpret the result, you will need to consult some papers.
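As a starting point, the statistic itself is simple enough to compute by hand: it is the share of pairs in which x1 is larger minus the share of pairs in which x2 is larger. The sketch below is a pure-Python illustration, not the library's implementation. The names `cliffs_delta_manual` and `magnitude` are hypothetical, and the cut-offs 0.147, 0.33 and 0.474 are the ones commonly quoted in the literature; treat them as an assumption and check them against the papers you cite.

```python
def cliffs_delta_manual(x, y):
    """Cliff's delta: (#pairs with x > y - #pairs with x < y) / (len(x) * len(y))."""
    greater = sum(1 for a in x for b in y if a > b)
    less = sum(1 for a in x for b in y if a < b)
    return (greater - less) / (len(x) * len(y))


def magnitude(d):
    """Label |d| using commonly quoted thresholds (an assumption, see above)."""
    size = abs(d)
    if size < 0.147:
        return "negligible"
    if size < 0.33:
        return "small"
    if size < 0.474:
        return "medium"
    return "large"


x1 = [10, 20, 20, 20, 30, 30, 30, 40, 50, 100]
x2 = [10, 20, 30, 40, 40, 50]
d = cliffs_delta_manual(x1, x2)
label = magnitude(d)
print(d, label)
```

For the example data, 23 of the 60 pairs have x1 larger and 27 have x2 larger, so d = (23 - 27) / 60, which matches the value printed by the library above.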