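Classification: confusion matrix / contingency table
- Binary classification: precision, recall, AUC (area under the ROC curve), log loss (a likelihood-based measure of the predicted probabilities), and accuracy and precision (both depend on the probability threshold)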
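- $TP$: a positive predicted as positive (correct prediction)
- $TN$: a negative predicted as negative (correct prediction)
- $FP$: a negative predicted as positive
- $FN$: a positive predicted as negative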
- Fraction of positive predictions that are correct (precision): $\text{Precision/PPV} = \frac{TP}{TP + FP}$
- Fraction of positive samples that are predicted correctly (recall): $\text{Recall/TPR} = \frac{TP}{TP + FN}$
- Fraction of negative samples that are predicted correctly: $\text{Specificity} = \frac{TN}{TN + FP}$
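- Fraction of negative predictions that are correct: $\text{Negative predictive value (NPV)} = \frac{TN}{TN + FN}$ (sensitivity is another name for recall/TPR, not for this ratio)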
- Accuracy: $\frac{TP + TN}{TP + TN + FP + FN}$
- F-score (harmonic mean of precision and recall)
  - $F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
  - $F_{\beta} = \frac{(1 + \beta^2) \times \text{Precision} \times \text{Recall}}{\beta^2 \times \text{Precision} + \text{Recall}}$
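As a minimal sketch of how the confusion-matrix quantities and the F-score fit together, the snippet below counts $TP$, $TN$, $FP$, $FN$ from 0/1 labels and derives the ratios from them. The helper names `confusion_counts` and `f_beta` and the sample labels are illustrative assumptions, not part of these notes.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def f_beta(precision, recall, beta=1.0):
    """F-beta score; returns 0.0 when the denominator is zero."""
    denom = beta ** 2 * precision + recall
    return (1 + beta ** 2) * precision * recall / denom if denom else 0.0


# Made-up labels: 4 positives and 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)                  # PPV
recall = tp / (tp + fn)                     # TPR
specificity = tn / (tn + fp)
npv = tn / (tn + fn)                        # negative predictive value
accuracy = (tp + tn) / (tp + tn + fp + fn)
f1 = f_beta(precision, recall)              # beta = 1
f2 = f_beta(precision, recall, beta=2.0)    # weights recall more heavily
```

The ratios are computed without guarding against empty denominators, so this sketch assumes every class and every predicted label occurs at least once.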
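- ROC curve: FPR (false positive rate) on the x-axis, TPR (true positive rate) on the y-axis (a smooth curve generally suggests there is no severe overfitting)
- PR curve: recall $R$ on the x-axis, precision $P$ on the y-axis
- AUC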
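The ROC curve and its AUC can be sketched in plain Python: sort samples by score, lower the threshold one distinct score at a time, record (FPR, TPR) after each step, and integrate with the trapezoidal rule. The function names `roc_points` and `auc` and the four-sample data set are assumptions made for illustration; the sketch also assumes both classes are present.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold step by step."""
    pos = sum(y_true)            # number of positive samples (labels are 0/1)
    neg = len(y_true) - pos      # number of negative samples
    # Highest score first, so each step admits more predicted positives.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
roc = roc_points(y_true, scores)
roc_auc = auc(roc)
```

The same `auc` helper would also integrate a PR curve if it were given (recall, precision) points sorted by recall.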
- Other metrics
- Cross-entropy: $H(p, q)$
- Hinge loss
- Squared hinge loss
- Categorical hinge loss
- Hamming loss
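The loss-style metrics listed above can be illustrated with a short sketch. It uses the common textbook definitions: binary cross-entropy averaged over samples, hinge loss $\max(0, 1 - y \cdot s)$ with labels in $\{-1, +1\}$, and Hamming loss as the fraction of mismatched labels. The function names, the `eps` clipping value, and the sample inputs are assumptions for illustration only.

```python
import math


def log_loss(y_true, probs, eps=1e-15):
    """Mean binary cross-entropy; y_true in {0, 1}, probs = P(y = 1)."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def hinge_loss(y_true, margins, squared=False):
    """Mean hinge loss; y_true in {-1, +1}, margins are raw decision scores."""
    losses = [max(0.0, 1 - y * m) for y, m in zip(y_true, margins)]
    if squared:
        losses = [value ** 2 for value in losses]
    return sum(losses) / len(losses)


def hamming_loss(y_true, y_pred):
    """Fraction of labels that are predicted incorrectly."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


ll = log_loss([1, 0], [0.5, 0.5])                       # equals ln 2
hl = hinge_loss([1, -1, 1], [2.0, 0.5, 0.5])            # losses 0, 1.5, 0.5
hl_sq = hinge_loss([1, -1, 1], [2.0, 0.5, 0.5], squared=True)
ham = hamming_loss([1, 0, 1, 1], [1, 1, 1, 0])          # 2 of 4 labels differ
```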
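Regression (in the formulas below, $\hat{y_i}$ is the predicted value and $f(x_i)$ is the observed value it is compared against)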
- MAE (mean absolute error): $\text{MAE} = \frac{\sum_{i=1}^n \left| \hat{y_i} - f(x_i) \right|}{n}$
- MAPE (mean absolute percentage error): $\text{MAPE} = \frac{\sum_{i=1}^n \left| \frac{\hat{y_i} - f(x_i)}{f(x_i)} \right|}{n}$
- MPE (mean percentage error): $\text{MPE} = \frac{\sum_{i=1}^n \frac{\hat{y_i} - f(x_i)}{f(x_i)}}{n}$
- MAAPE (mean arctangent absolute percentage error): $\text{MAAPE} = \frac{\sum_{i=1}^n \arctan\left( \left| \frac{f(x_i) - \hat{y_i}}{f(x_i)} \right| \right)}{n}$
- MSLE (mean squared logarithmic error): $\text{MSLE} = \frac{\sum_{i=1}^n \left( \log(f(x_i) + 1) - \log(\hat{y_i} + 1) \right)^2}{n}$
- SSE (sum of squared errors): $\text{SSE} = \sum_{i=1}^n (\hat{y_i} - f(x_i))^2$
- MSE (mean squared error): $\text{MSE} = \frac{\text{SSE}}{n}$
- RMSE (root mean squared error): $\text{RMSE} = \sqrt{\text{MSE}}$
- $R^2$ (coefficient of determination): $R^2 = 1 - \frac{\text{MSE}(\hat{y_i}, f(x_i))}{\sum_{i=1}^n (f(x_i) - \overline{f(x_i)})^2 / n} = 1 - \frac{\text{SSE}}{\sum_{i=1}^n (f(x_i) - \overline{f(x_i)})^2}$
- $R^2_\text{adjusted}$ (adjusted coefficient of determination): $R^2_\text{adj} = 1 - \frac{(1 - R^2) \times (n - 1)}{n - p - 1}$
  - $n$: number of samples
  - $p$: number of features
- Explained variance score: $1 - \frac{\text{var}(y - \hat{y})}{\text{var}(y)}$
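To make the relationships between these regression metrics concrete (SSE feeds MSE, RMSE and $R^2$, and $R^2$ feeds the adjusted version), here is a minimal sketch in plain Python. The function name `regression_report`, the `n_features` parameter, and the sample numbers are hypothetical. Observed values sit where $f(x_i)$ appears in the formulas and predictions where $\hat{y_i}$ appears.

```python
import math


def regression_report(y_true, y_pred, n_features):
    """Compute several regression metrics from observed and predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]   # prediction minus observation
    mae = sum(abs(e) for e in errors) / n
    # MAPE divides by the observed value, so it is undefined when any observation is 0.
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    sse = sum(e ** 2 for e in errors)
    mse = sse / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    sst = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    r2 = 1 - sse / sst
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - n_features - 1)
    return {"MAE": mae, "MAPE": mape, "SSE": sse, "MSE": mse,
            "RMSE": rmse, "R2": r2, "R2_adj": r2_adj}


# Made-up observations and predictions from a model with one feature.
report = regression_report([3.0, 5.0, 2.0, 7.0], [2.5, 5.0, 4.0, 8.0], n_features=1)
```

Because the adjusted $R^2$ divides by $n - p - 1$, the sketch only makes sense when the sample count exceeds the feature count by at least two.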
Time series analysis
- MAE
- RMSE
- $R^2$
- $R^2_\text{adj}$
- $t$-statistics and $p$-values
- $CV$ (cross-validation): $\text{MSE}(e_i)$
  - $e_i$: the $i$-th error
- $AIC$ (Akaike information criterion): $T \times \log\frac{\text{SSE}}{T} + 2 \times (k + 2) = T \times \log(\text{MSE}) + 2 \times (k + 2)$
  - $T$: number of observations/samples
  - $k$: number of predictors/features
- $AIC_c$ (corrected AIC): $\text{AIC} + \frac{2 \times (k + 2) \times (k + 3)}{T - k - 3}$
- $BIC$ (Schwarz's Bayesian information criterion): $T \times \log(\text{MSE}) + (k + 2) \times \log(T)$
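The three information criteria share the term $T \times \log(\text{MSE})$ and differ only in the complexity penalty, which a short sketch makes easy to see. The function name `information_criteria` and the input numbers are assumptions for illustration. The formulas are transcribed from the list above, with $\log$ taken as the natural logarithm.

```python
import math


def information_criteria(sse, T, k):
    """Return (AIC, AICc, BIC) for a model with k predictors fit on T observations."""
    mse = sse / T
    fit_term = T * math.log(mse)                          # shared goodness-of-fit term
    aic = fit_term + 2 * (k + 2)
    aicc = aic + 2 * (k + 2) * (k + 3) / (T - k - 3)      # small-sample correction
    bic = fit_term + (k + 2) * math.log(T)
    return aic, aicc, bic


# Hypothetical fit: SSE of 50 on 100 observations with 2 predictors.
aic, aicc, bic = information_criteria(sse=50.0, T=100, k=2)
```

Lower values indicate a better trade-off between fit and complexity. With these formulas BIC penalizes each extra predictor more heavily than AIC once $\log(T) > 2$, that is, for $T \geq 8$.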