Overfitting and Underfitting
Model capacity
As a model's complexity grows, so does its expressive power. Take polynomials as an example:

$$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_n x^n + \epsilon$$
As the degree increases, the family of representable functions grows, and with it the model's capacity.
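A quick sketch of this (assuming NumPy, with a hypothetical noisy sine target): higher-degree polynomials always achieve lower training error, which is exactly the capacity effect described above.

```python
import numpy as np

# Noisy ground truth: y = sin(3x) + noise (an illustrative choice)
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 20)
y = np.sin(3 * x) + rng.normal(scale=0.1, size=x.shape)

for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, deg=degree)                    # fit beta_0..beta_n
    train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)    # training error
    print(f"degree={degree}  train MSE={train_mse:.4f}")
```

Training MSE drops monotonically with degree; whether the high-degree fit generalizes is a separate question, which is where overfitting enters.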
Underfitting
- model capacity < ground truth
- train loss/acc is bad
- test loss/acc is bad as well
Overfitting
- model capacity > ground truth
- train loss/acc is good
- test loss/acc is bad
- generalization performance is bad
Detect and Reduce
Splitting
- Train/Val/Test Set
import tensorflow as tf
from tensorflow.keras import datasets

(x, y), (x_test, y_test) = datasets.mnist.load_data()
x_train, x_val = tf.split(x, num_or_size_splits=[50000, 10000])
y_train, y_val = tf.split(y, num_or_size_splits=[50000, 10000])
K-fold cross-validation
- randomly sample 1/k as val dataset
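The fold bookkeeping can be sketched by hand (a minimal NumPy version, indices only; the helper name `k_fold_indices` is made up here, no model training shown):

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and yield (train_idx, val_idx) for each of k folds."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        val_idx = folds[i]                                     # 1/k held out
        train_idx = np.concatenate(folds[:i] + folds[i + 1:])  # the rest trains
        yield train_idx, val_idx

for train_idx, val_idx in k_fold_indices(10, k=5):
    print(len(train_idx), len(val_idx))
```

Each sample lands in the validation set exactly once across the k folds.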
network.fit(tf.cast(x, dtype=tf.float32)/255., tf.one_hot(tf.cast(y, dtype=tf.int32), depth=10), batch_size=64, epochs=10, validation_split=0.1, validation_freq=2)
Note that the validation_split argument cannot be used when the input is a Dataset object.
As the official docs put it:
The argument validation_split (generating a holdout set from the training data) is not supported when training from Dataset objects, since this feature requires the ability to index the samples of the datasets, which is not possible in general with the Dataset API.
For more, see the official guide on training and evaluation.
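With Dataset inputs, the workaround is to build an explicit validation Dataset and pass it via validation_data. A sketch (the helper `make_ds` is an assumption; `x_train`/`y_train`, `x_val`/`y_val` come from the split shown earlier):

```python
import tensorflow as tf

def make_ds(x, y, batch_size=64):
    """Normalize images, one-hot the labels, and batch into a tf.data.Dataset."""
    x = tf.cast(x, tf.float32) / 255.
    y = tf.one_hot(tf.cast(y, tf.int32), depth=10)
    return tf.data.Dataset.from_tensor_slices((x, y)).batch(batch_size)

# train_ds = make_ds(x_train, y_train)
# val_ds = make_ds(x_val, y_val)
# network.fit(train_ds, epochs=10, validation_data=val_ds, validation_freq=2)
```

validation_data accepts a Dataset directly, so no indexing into the training data is needed.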
Reduce
- More Data
- Constrain model complexity
- Shallow
- Regularization
- Dropout
- Data augmentation
- Early Stopping
Regularization
$$L_1: J(\theta) = \frac{1}{m}\sum{loss} + \lambda\sum_{i=1}^{n}{|\theta_i|}$$

$$L_2: J(\theta) = \frac{1}{m}\sum{loss} + \lambda\sum_{i=1}^{n}{\theta_i^2}$$
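Concretely, the two penalty terms are simple functions of the weights. A NumPy sketch with made-up weights `theta` and the same lambda used in the Keras line below:

```python
import numpy as np

theta = np.array([0.5, -1.0, 2.0])   # illustrative weight vector
lam = 0.001

l1_penalty = lam * np.sum(np.abs(theta))   # lambda * sum |theta_i|
l2_penalty = lam * np.sum(theta ** 2)      # lambda * sum theta_i^2
print(l1_penalty, l2_penalty)              # ~0.0035 and ~0.00525
```

L1 pushes weights toward exact zeros (sparsity); L2 shrinks them smoothly toward zero.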
layers.Dense(256, kernel_regularizer=keras.regularizers.l2(0.001), activation='relu'),
Early Stopping
Stop training when performance on the validation set peaks.
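In Keras this is the EarlyStopping callback: watch a validation metric, stop after it stops improving, and roll back to the best weights. A sketch (the patience value is an arbitrary choice):

```python
from tensorflow import keras

early_stop = keras.callbacks.EarlyStopping(
    monitor='val_accuracy',       # the validation metric to watch
    patience=3,                   # tolerate 3 epochs without improvement
    restore_best_weights=True,    # roll back to the peak checkpoint
)
# network.fit(x_train, y_train, validation_data=(x_val, y_val),
#             epochs=100, callbacks=[early_stop])
```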
Dropout
Randomly drop connections during training.
layers.Dropout(0.5)
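Note that dropout is only active during training; at inference the layer is an identity map (the kept units are scaled by 1/(1-rate) during training so the expected activation is unchanged). A small sketch showing the training flag:

```python
import tensorflow as tf

layer = tf.keras.layers.Dropout(0.5)
x = tf.ones((1, 10))

train_out = layer(x, training=True)    # roughly half the units zeroed, the rest scaled to 2
infer_out = layer(x, training=False)   # passes through unchanged
print(train_out.numpy())
print(infer_out.numpy())
```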