R语言医学数据分析实战（第十三章文中源码解析）

卡卡卡卡卡夫卡

已于 2023-12-24 16:51:02 修改

阅读量134

点赞数

分类专栏： R 文章标签： r语言数据分析开发语言

于 2022-05-22 15:19:30 首次发布

本文链接：https://blog.csdn.net/karol_peng/article/details/124911115

版权

R 专栏收录该内容

6 篇文章 2 订阅

订阅专栏

Chapter 13

聚类分析和判别分析都是样本分类的统计方法。
其中主要的区别是，聚类分析是未知分类，无监督学习；判别分析是已知分类，有监督学习。

13.1 二分类结果的评价标准

# cbind是按列合并，rbind是按行合并。参考：https://blog.csdn.net/ARPOSPF/article/details/85450238
# dimnames给各列命名
table1 <- as.table(cbind(c(80, 20), c(10, 90)))
dimnames(table1) <- list("检测结果" = c("阳性", "阴性"),
                         "金标准" = c("有病", "无病"))
table1
# mosaicplot函数是绘制马赛克图
mosaicplot(t(table1), col = c("red", "white"), main = "")
table2 <- as.table(cbind(c(80, 20), c(190, 1710)))
dimnames(table2) <- list("检测结果" = c("阳性", "阴性"),
                         "金标准" = c("有病", "无病"))
table2
mosaicplot(t(table2), col = c("red", "white"), main = "")

13.2 ROC及曲线下面积

# ROC曲线：横坐标是假阳性率（误诊率），纵坐标是真阳性率（灵敏度）。
install.packages("pROC")
library(pROC)
data(aSAH)
str(aSAH)

# attributes函数访问对象的属性，也可以使用str(roc)查看
roc1 <- roc(outcome ~ s100b, data = aSAH)
attributes(roc1)
roc1$auc
roc.result <- data.frame(threshold = roc1$thresholds, 
                         sensitivity = roc1$sensitivities, 
                         specificity = roc1$specificities)
# 约登指数等于灵敏度+特异度-1
roc.result$youden <- roc.result$sensitivity + roc.result$specificity - 1
head(roc.result)
# 找出约登指数最大的值的位置
which.max(roc.result$youden)
# 找出约登指数最大的值的位置的灵敏度等信息
roc.result[18, ]
# coords函数用于获取截断点，不设置"best"表示显示所有截断点
coords(roc1, "best", best.method = "closest.topleft", transpose = FALSE)
plot(1-roc1$specificities, roc1$sensitivities, type = "l", lwd = 2)
plot.roc(roc1, print.auc = TRUE, auc.polygon = TRUE, 
         grid = c(0.1, 0.2), grid.col = c("green", "red"), 
         auc.polygon.col = "lightblue", print.thres = TRUE)
# ci.auc函数用于获取AUC的置信区间
ci.auc(roc1)
# roc.test函数是对两个诊断试验进行比较，参数是roc对象
roc1 <- roc(aSAH$outcome, aSAH$s100b)
roc2 <- roc(aSAH$outcome, aSAH$ndka)
roc.test(roc1, roc2)

plot(roc1)
# 这里的lines表示在已有曲线上添加line，参考：https://blog.csdn.net/liukaiyue99/article/details/106789905
# text表示在0.5,0.5的位置添加后面的labels
lines(roc2, col = "red")
test <- roc.test(roc1, roc2)
text(0.5, 0.5, labels = paste("p-value =", round(test$p.value, 3)))
# 在右下角添加图例，添加文字、线型及线宽
legend("bottomright", 
       legend = c("S100b", "NDKA"), 
       col = c(1, 2), lwd = 2)

# glm函数用于拟合广义线性模型，通过给出线性预测器的符号描述和误差分布的描述来指定
# 具体使用参考第七章二分类Logistic回归模型的创建
# logistic.display对汇总结果进行输出，参考第七章
# lroc是epiDisplay的一个函数
fit <- glm(case ~ induced + spontaneous, family = binomial, data = infert)
library(epiDisplay)
logistic.display(fit)
lroc(fit, line.col = "red", lwd = 3)

13.3 联合实验

联合实验包括两种：平行试验（或操作），系列实验（与操作）

cut.point1 <- coords(roc1, "best", transpose = FALSE)$threshold
cut.point1
cut.point2 <- coords(roc2, "best", transpose = FALSE)$threshold
cut.point2

# 平行试验
parallel <- ifelse(aSAH$s100b > cut.point1 | aSAH$ndka > cut.point2, 
                   "Positive", "Negative")
table(parallel, aSAH$outcome)
# 系列实验
serial <- ifelse(aSAH$s100b > cut.point1 & aSAH$ndka > cut.point2,
                 "Positive", "Negative")
table(serial, aSAH$outcome)

卡卡卡卡卡夫卡

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
R语言医学数据分析实战（第十三章文中源码解析）

Chapter 13聚类分析和判别分析都是样本分类的统计方法。其中主要的区别是，聚类分析是未知分类，无监督学习；判别分析是已知分类，有监督学习。13.1 二分类结果的评价标准# cbind是按列合并，rbind是按行合并。参考：https://blog.csdn.net/ARPOSPF/article/details/85450238# dimnames给各列命名table1 <- as.table(cbind(c(80, 20), c(10, 90)))dimnames(table1)
复制链接

扫一扫