### Original article:

Performance metrics for selecting regression and classification models in R (tecdat.cn)

## Performance measures for regression

### Correlation: covariance and standard deviation

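```r
library(ggplot2)

# Plot the deviations of y and y.hat from their means; each vertical segment
# connects the two deviations of one measurement.
plot.mean.deviation <- function(y, y.hat, label) {
  y.deviation <- y - mean(y)
  y.hat.deviation <- y.hat - mean(y.hat)
  df <- data.frame("N" = c(seq_along(y), seq_along(y.hat)),
                   "Deviation" = c(y.deviation, y.hat.deviation),
                   "Variable" = c(rep("Y", length(y)), rep("Y_Hat", length(y.hat))))
  # assumed: segments are colored by the sign of each covariance term
  contribution <- ifelse(y.deviation * y.hat.deviation >= 0, "Positive", "Negative")
  segment.df <- data.frame(N = seq_along(y), Y = y.deviation,
                           Yend = y.hat.deviation, Contribution = contribution)
  ggplot() +
    # linewidth needs ggplot2 >= 3.4; older versions use size
    geom_segment(linewidth = 2, data = segment.df,
                 aes(x = N, xend = N, y = Y, yend = Yend, color = Contribution)) +
    geom_point(data = df, alpha = 0.8, size = 2,
               aes(x = N, y = Deviation, shape = Variable)) +
    xlab("Measurement i of N") + ylab("Deviation from mean value") +
    ggtitle(label)
}
```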

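```r
library(gridExtra)  # provides grid.arrange()

set.seed(1501)
N <- 20                      # assumed sample size (not given in the original)
y <- rnorm(N)
y.mean <- mean(y)
noise <- rnorm(N, sd = 0.2)  # assumed noise level
# positive covariance: predictions spread around the mean like y
y.hat <- y + noise
df.low <- data.frame(Y = y, Y_Hat = y.hat)
p1 <- plot.mean.deviation(y, y.hat, label = "Positive Covariance")
# negative covariance: contrasting spread around mean
y.hat <- y - 2 * (y - y.mean) + noise
p2 <- plot.mean.deviation(y, y.hat, "Negative Covariance")
# no covariance
y.hat <- runif(N, -0.1, 0.1)
p3 <- plot.mean.deviation(y, y.hat, "No Covariance")
grid.arrange(p1, p2, p3, nrow = 3)
```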

```r
plot.mean.deviation(y, y.hat, label = "Positive Covariance")

df.high <- data.frame(Y = y, Y_Hat = y.hat)
```
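The plotted deviations translate directly into numbers. A minimal sketch on simulated data (the sample size and noise level are hypothetical): the sample covariance is the sum of the products of the deviations divided by N − 1, and the Pearson correlation rescales it by both standard deviations, which should match R's built-in `cov()` and `cor()`.

```r
set.seed(1501)
n <- 50                          # hypothetical sample size
y <- rnorm(n)
y.hat <- y + rnorm(n, sd = 0.5)  # hypothetical noisy predictions

# sample covariance: sum of products of deviations from the means, N - 1 denominator
cov.manual <- sum((y - mean(y)) * (y.hat - mean(y.hat))) / (n - 1)

# Pearson correlation: covariance normalized by both standard deviations
cor.manual <- cov.manual / (sd(y) * sd(y.hat))

print(c(manual = cov.manual, builtin = cov(y, y.hat)))
print(c(manual = cor.manual, builtin = cor(y, y.hat)))
```

Dividing by the standard deviations removes the units of Y, so the correlation always lies in [−1, 1] while the covariance does not.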

### Interpretation via the correlation coefficient

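$R^2$ is usually positive: a least-squares model with an intercept produces predictions $\hat{Y}$ with $SS_{\text{res}} = \sum_i (y_i - \hat{y}_i)^2 < SS_{\text{tot}} = \sum_i (y_i - \bar{y})^2$, because its predictions are closer to the outcomes than the mean outcome is. For such a model (fitted with an intercept and evaluated on its training data), the coefficient of determination equals the square of the correlation coefficient between predictions and outcomes:

$$R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}} = \rho_{Y\hat{Y}}^2 = \left(\frac{\operatorname{Cov}(Y, \hat{Y})}{\sigma_Y \sigma_{\hat{Y}}}\right)^2$$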
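The identity can be checked numerically. A minimal sketch with hypothetical linear data: fit `lm()` with an intercept, then compare the definition $1 - SS_{\text{res}}/SS_{\text{tot}}$, the squared correlation between fitted values and outcomes, and the value reported by `summary()`.

```r
set.seed(1)
x <- runif(100, 0, 10)
y <- 2 + 0.5 * x + rnorm(100)   # hypothetical linear relation with noise
fit <- lm(y ~ x)                # least-squares fit with an intercept

ss.res <- sum(residuals(fit)^2)
ss.tot <- sum((y - mean(y))^2)
r2.def <- 1 - ss.res / ss.tot        # R^2 from its definition
r2.cor <- cor(fitted(fit), y)^2      # squared correlation of predictions and outcomes

print(c(definition = r2.def, squared.cor = r2.cor, summary = summary(fit)$r.squared))
```

All three values agree up to floating-point error; dropping the intercept (`lm(y ~ x - 1)`) breaks the first equality, because `summary()` then uses an uncentered total sum of squares.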

### Coefficient of determination

```r
library(ggplot2)

# Squared Pearson correlation between predictions and labels, rounded to 3 digits
rsquared <- function(test.preds, test.labels) {
  return(round(cor(test.preds, test.labels)^2, 3))
}

# Scatter plot of predictions against outcomes for an lm/glm model,
# optionally adding predictions on a test set
plot.linear.model <- function(model, test.preds = NULL, test.labels = NULL,
                              test.only = FALSE) {
  pred <- model$fitted.values  # predictions on the training data
  obs <- model$model[, 1]      # observed training outcomes (response column)
  if (test.only) {
    plot.df <- NULL            # show only the test samples
  } else {
    plot.df <- data.frame("Prediction" = pred, "Outcome" = obs,
                          "DataSet" = "training")
  }
  if (!is.null(test.preds) && !is.null(test.labels)) {
    # store predicted points for the test data
    test.df <- data.frame("Prediction" = test.preds, "Outcome" = test.labels,
                          "DataSet" = "test")
    plot.df <- rbind(plot.df, test.df)
  }
  p <- ggplot() +
    # plot training and/or test samples
    geom_point(data = plot.df, aes(x = Outcome, y = Prediction, color = DataSet))
  return(p)
}
```
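`rsquared()` computes the squared correlation, and the identity with $1 - SS_{\text{res}}/SS_{\text{tot}}$ only holds for the least-squares fit on its own training data. A sketch with hypothetical data, where a constant offset of 3 is added to the test predictions to mimic a biased model: the correlation ignores the shift, while the definition-based $R^2$ penalizes it.

```r
set.seed(42)
x.train <- runif(100, 0, 10)
y.train <- 1 + x.train + rnorm(100)
x.test <- runif(50, 0, 10)
y.test <- 1 + x.test + rnorm(50)

fit <- lm(y ~ x, data = data.frame(x = x.train, y = y.train))
# hypothetical biased predictions: correct slope, constant offset of 3
test.preds <- predict(fit, newdata = data.frame(x = x.test)) + 3

r2.cor <- cor(test.preds, y.test)^2   # what rsquared() computes, before rounding
r2.def <- 1 - sum((y.test - test.preds)^2) / sum((y.test - mean(y.test))^2)
print(c(squared.cor = r2.cor, definition = r2.def))
```

On held-out data the definition-based value can even be negative, which the squared correlation can never be.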

### Limitations of R squared

```r
plot(x, y)
## [1] 0.9
```

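Another property of $R^2$ is that it depends on the range of the values. $R^2$ is usually larger when $X$ spans a wide range of values, because widening the range of $X$ increases the covariance $\operatorname{Cov}(X, Y) = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})$ faster than the product of standard deviations $\sigma_X \sigma_Y$ that normalizes it: the noise in $Y$ inflates $\sigma_Y$ but does not contribute to the covariance. The MSE, by contrast, depends only on the noise and barely changes, as the following two outputs show: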

```
## [1] "R squared: 0.924115453794893, MSE:0.806898017781999"
## [1] "R squared: 0.0657969487417489, MSE:0.776376454723889"
```
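The range effect can be reproduced with a minimal sketch on hypothetical data: the same linear relation and the same noise level are sampled once over a wide range of X and once over a narrow range. This is not the author's original simulation, only an illustration of the same property.

```r
set.seed(7)
n <- 200
noise.sd <- 1  # hypothetical noise level, identical for both fits

# fit y ~ x on data whose X values are spread over [0, x.max]
fit.r2.mse <- function(x.max) {
  x <- runif(n, 0, x.max)
  y <- x + rnorm(n, sd = noise.sd)  # same slope and noise, different X range
  fit <- lm(y ~ x)
  c(r2 = summary(fit)$r.squared, mse = mean(residuals(fit)^2))
}

wide <- fit.r2.mse(10)   # X spread over [0, 10]
narrow <- fit.r2.mse(1)  # X spread over [0, 1]
print(rbind(wide, narrow))
```

Both MSE values stay close to the noise variance, while $R^2$ collapses for the narrow range, so comparing $R^2$ across data sets with different ranges of X can be misleading.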

## Performance metrics for classification models

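### Accuracy vs. sensitivity and specificity

- Sensitivity: if an event occurs, how likely is the model to detect it?
- Specificity: if no event occurs, how likely is the model to identify that no event occurred?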
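A minimal sketch of these rates, assuming 0/1 labels where 1 marks the event and hard 0/1 predictions (the vectors are hypothetical toy data): accuracy is the share of correct predictions, sensitivity is TP / (TP + FN), and specificity is TN / (TN + FP).

```r
# hypothetical labels (1 = event) and hard predictions
y <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
y.hat <- c(1, 1, 1, 0, 0, 0, 0, 0, 1, 1)

tp <- sum(y == 1 & y.hat == 1)  # events detected
fn <- sum(y == 1 & y.hat == 0)  # events missed
tn <- sum(y == 0 & y.hat == 0)  # non-events correctly identified
fp <- sum(y == 0 & y.hat == 1)  # false alarms

accuracy <- (tp + tn) / length(y)
sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
print(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity))
```

Here accuracy is 0.7, sensitivity 0.75 and specificity 2/3. With imbalanced classes the split matters: a model that never predicts the event can still reach a high accuracy while its sensitivity is 0.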

### Area under the ROC curve (AUC)

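```r
# Overlaid histograms of the predicted scores for each true class
plot.scores.AUC <- function(y, y.hat) {
  par(mfrow = c(1, 2))  # two side-by-side panels; this function fills only the left one
  breaks <- seq(min(y.hat), max(y.hat) + 1, 1)  # bins of width 1 covering all scores
  hist(y.hat[y == 0], col = rgb(1, 0, 0, 0.5), main = "Score Distribution",
       breaks = breaks, xlab = "Prediction")
  hist(y.hat[y == 1], col = rgb(0, 0, 1, 0.5), add = TRUE, breaks = breaks)
  legend("topleft", legend = c("Class 0", "Class 1"), col = c("red", "blue"),
         lty = 1, cex = 1)
}
```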
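The histograms show how much the two score distributions overlap; the AUC condenses that overlap into one number. A minimal sketch, assuming numeric scores where higher means class 1: the AUC equals the probability that a randomly chosen class-1 case scores higher than a randomly chosen class-0 case (ties count one half), which can be computed from ranks via the Mann-Whitney U statistic. The helper names and the simulated scores are hypothetical.

```r
# AUC via the rank-sum (Mann-Whitney U) formulation
auc.rank <- function(y, y.hat) {
  r <- rank(y.hat)  # average ranks, so ties count one half
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# brute-force check: compare every class-1 score with every class-0 score
auc.pairs <- function(y, y.hat) {
  diffs <- outer(y.hat[y == 1], y.hat[y == 0], "-")
  mean((diffs > 0) + 0.5 * (diffs == 0))
}

set.seed(3)
y <- rep(c(0, 1), each = 50)
y.hat <- c(rnorm(50, mean = 0), rnorm(50, mean = 1))  # hypothetical scores
print(c(rank = auc.rank(y, y.hat), pairs = auc.pairs(y, y.hat)))
```

An AUC of 0.5 means the scores do not separate the classes at all, and 1 means every class-1 case outranks every class-0 case. In practice a package such as pROC (`roc()`, `auc()`) computes the same quantity and can also draw the ROC curve.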
