Cross Validation
- random subsampling
- k-fold
- leave one out
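The three splitting strategies above can be sketched in plain Python. This is a minimal illustration (the function name `k_fold_splits` is my own, not from a library); in practice a library such as scikit-learn provides equivalents.

```python
import random

def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, val_indices) pairs for k-fold cross validation.

    Indices are shuffled once, then partitioned into k folds; each fold
    serves exactly once as the validation set.
    Leave-one-out is the special case k = n_samples.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train_idx, val_idx

for train_idx, val_idx in k_fold_splits(10, k=5):
    print(len(train_idx), len(val_idx))  # prints "8 2" on each of 5 iterations
```

Random subsampling differs only in that the train/validation split is redrawn independently each round, so validation sets may overlap across rounds.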
Cost Function
The goal is to reduce prediction error, i.e. the difference between the predicted and true values; this difference is usually quantified with an error metric.
Error Metric:
- cost function
- loss function (machine learning)
- objective function (optimization)
- utility function (equal to negative cost, used in decision theory)
Cost Function
MSE, RMSE, MAE, FP & FN, F1 scores
$$\mathrm{MSE}(\text{approximation}) = \frac{1}{n}\sum_{i=1}^{n}(\text{true} - \text{approximation})^2 = \frac{1}{n_d}\sum_{i=1}^{n_d}\left(y_i - x_i^\mathsf{T}\hat{\theta}\right)^2$$
$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$$
$$\mathrm{MAE}(\hat{y}) = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
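The regression metrics above are straightforward to compute directly from their definitions; a minimal sketch (function names are my own):

```python
import math

def mse(y_true, y_pred):
    # Mean squared error: average of squared residuals.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    # Root mean squared error: same units as the target variable.
    return math.sqrt(mse(y_true, y_pred))

def mae(y_true, y_pred):
    # Mean absolute error: less sensitive to outliers than MSE.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
print(mse(y_true, y_pred))   # 0.875
print(mae(y_true, y_pred))   # 0.75
```

Because MSE squares the residuals, a single large error dominates it, which is why MAE is sometimes preferred for noisy data.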
TP: true positive; FP: false positive; TN: true negative; FN: false negative
| $P(\hat{y}\mid y)$ | $y=1$ | $y=0$ |
|---|---|---|
| $\hat{y}=1$ | TP | FP |
| $\hat{y}=0$ | FN | TN |
$$\mathrm{Recall} = \mathrm{Sensitivity} = \frac{TP}{TP+FN}$$

$$\mathrm{Specificity} = \frac{TN}{FP+TN}$$

$$\mathrm{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}$$

$$\mathrm{True~positive~rate~(TPR)} = \mathrm{Sensitivity}$$

$$\mathrm{False~positive~rate~(FPR)} = 1 - \mathrm{Specificity}$$
If the data is skewed (class-imbalanced):
$$\mathrm{Precision} = \frac{TP}{TP+FP}$$

$$\mathrm{F_1~score} = 2\left(\frac{\mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}\right)$$
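All of the classification metrics above derive from the four confusion-matrix counts; a minimal sketch for binary labels (1 = positive, 0 = negative; function names are my own):

```python
def confusion_counts(y_true, y_pred):
    # Count TP, FP, TN, FN by comparing each prediction with the true label.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def classification_metrics(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    recall = tp / (tp + fn)            # sensitivity / TPR
    specificity = tn / (fp + tn)       # FPR = 1 - specificity
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "specificity": specificity,
            "accuracy": accuracy, "precision": precision, "f1": f1}

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(classification_metrics(y_true, y_pred))
```

Note that accuracy alone is misleading on skewed data (predicting the majority class everywhere can still score high), which is exactly why precision, recall, and F1 are used instead.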