Understanding tf.keras.callbacks.EarlyStopping

import tensorflow as tf

# Stop once val_loss has failed to improve by more than 0.0001 for 10
# consecutive epochs, then roll the weights back to the best epoch seen.
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0.0001, patience=10, restore_best_weights=True)

history = model.fit(X_train_lst, df_y_train.values, epochs=500, batch_size=200, validation_split=0.3, callbacks=[callback])

Train on 782135 samples, validate on 335201 samples
Epoch 1/500
782135/782135 [==============================] - 8s 11us/sample - loss: 0.0663 - mean_absolute_error: 0.0663 - val_loss: 0.0284 - val_mean_absolute_error: 0.0284
Epoch 2/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0237 - mean_absolute_error: 0.0237 - val_loss: 0.0214 - val_mean_absolute_error: 0.0214
Epoch 3/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0199 - mean_absolute_error: 0.0199 - val_loss: 0.0195 - val_mean_absolute_error: 0.0195
Epoch 4/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0182 - mean_absolute_error: 0.0182 - val_loss: 0.0177 - val_mean_absolute_error: 0.0177
Epoch 5/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0171 - mean_absolute_error: 0.0171 - val_loss: 0.0170 - val_mean_absolute_error: 0.0170
Epoch 6/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0162 - mean_absolute_error: 0.0162 - val_loss: 0.0167 - val_mean_absolute_error: 0.0167
Epoch 7/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0155 - mean_absolute_error: 0.0155 - val_loss: 0.0154 - val_mean_absolute_error: 0.0154
Epoch 8/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0151 - mean_absolute_error: 0.0151 - val_loss: 0.0147 - val_mean_absolute_error: 0.0147
Epoch 9/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0145 - mean_absolute_error: 0.0145 - val_loss: 0.0148 - val_mean_absolute_error: 0.0148
Epoch 10/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0141 - mean_absolute_error: 0.0141 - val_loss: 0.0137 - val_mean_absolute_error: 0.0137
Epoch 11/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0139 - mean_absolute_error: 0.0139 - val_loss: 0.0139 - val_mean_absolute_error: 0.0139
Epoch 12/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0135 - mean_absolute_error: 0.0135 - val_loss: 0.0150 - val_mean_absolute_error: 0.0150
Epoch 13/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0132 - mean_absolute_error: 0.0132 - val_loss: 0.0134 - val_mean_absolute_error: 0.0134
Epoch 14/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0130 - mean_absolute_error: 0.0130 - val_loss: 0.0141 - val_mean_absolute_error: 0.0141
Epoch 15/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0128 - mean_absolute_error: 0.0128 - val_loss: 0.0142 - val_mean_absolute_error: 0.0142
Epoch 16/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0126 - mean_absolute_error: 0.0126 - val_loss: 0.0127 - val_mean_absolute_error: 0.0127
Epoch 17/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0124 - mean_absolute_error: 0.0124 - val_loss: 0.0126 - val_mean_absolute_error: 0.0126
Epoch 18/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0123 - mean_absolute_error: 0.0123 - val_loss: 0.0124 - val_mean_absolute_error: 0.0124
Epoch 19/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0121 - mean_absolute_error: 0.0121 - val_loss: 0.0126 - val_mean_absolute_error: 0.0126
Epoch 20/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0120 - mean_absolute_error: 0.0120 - val_loss: 0.0124 - val_mean_absolute_error: 0.0124
Epoch 21/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0119 - mean_absolute_error: 0.0119 - val_loss: 0.0124 - val_mean_absolute_error: 0.0124
Epoch 22/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0118 - mean_absolute_error: 0.0118 - val_loss: 0.0127 - val_mean_absolute_error: 0.0127
Epoch 23/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0116 - mean_absolute_error: 0.0116 - val_loss: 0.0135 - val_mean_absolute_error: 0.0135
Epoch 24/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0115 - mean_absolute_error: 0.0115 - val_loss: 0.0115 - val_mean_absolute_error: 0.0115
Epoch 25/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0114 - mean_absolute_error: 0.0114 - val_loss: 0.0123 - val_mean_absolute_error: 0.0123
Epoch 26/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0114 - mean_absolute_error: 0.0114 - val_loss: 0.0121 - val_mean_absolute_error: 0.0121
Epoch 27/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0113 - mean_absolute_error: 0.0113 - val_loss: 0.0113 - val_mean_absolute_error: 0.0113
Epoch 28/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0112 - mean_absolute_error: 0.0112 - val_loss: 0.0117 - val_mean_absolute_error: 0.0117
Epoch 29/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0111 - mean_absolute_error: 0.0111 - val_loss: 0.0128 - val_mean_absolute_error: 0.0128
Epoch 30/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0111 - mean_absolute_error: 0.0111 - val_loss: 0.0111 - val_mean_absolute_error: 0.0111
Epoch 31/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0110 - mean_absolute_error: 0.0110 - val_loss: 0.0122 - val_mean_absolute_error: 0.0122
Epoch 32/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0109 - mean_absolute_error: 0.0109 - val_loss: 0.0109 - val_mean_absolute_error: 0.0109
Epoch 33/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0109 - mean_absolute_error: 0.0109 - val_loss: 0.0117 - val_mean_absolute_error: 0.0117
Epoch 34/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0108 - mean_absolute_error: 0.0108 - val_loss: 0.0115 - val_mean_absolute_error: 0.0115
Epoch 35/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0107 - mean_absolute_error: 0.0107 - val_loss: 0.0109 - val_mean_absolute_error: 0.0109
Epoch 36/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0107 - mean_absolute_error: 0.0107 - val_loss: 0.0116 - val_mean_absolute_error: 0.0116
Epoch 37/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0106 - mean_absolute_error: 0.0106 - val_loss: 0.0110 - val_mean_absolute_error: 0.0110
Epoch 38/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0105 - mean_absolute_error: 0.0105 - val_loss: 0.0112 - val_mean_absolute_error: 0.0112
Epoch 39/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0105 - mean_absolute_error: 0.0105 - val_loss: 0.0109 - val_mean_absolute_error: 0.0109
Epoch 40/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0105 - mean_absolute_error: 0.0105 - val_loss: 0.0108 - val_mean_absolute_error: 0.0108
Epoch 41/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0104 - mean_absolute_error: 0.0104 - val_loss: 0.0109 - val_mean_absolute_error: 0.0109
Epoch 42/500
782135/782135 [==============================] - 8s 10us/sample - loss: 0.0103 - mean_absolute_error: 0.0103 - val_loss: 0.0109 - val_mean_absolute_error: 0.0109

min_delta: the threshold a change (up or down) must exceed to count as an improvement. A sensible value depends on the monitored quantity and reflects how much noise you are willing to tolerate. Here it is set to 0.0001: val_loss must drop by more than 0.0001 to count as an improvement, while a val_loss that stays flat or rises counts as no improvement.

patience: how many consecutive epochs without an improvement to tolerate before stopping. Here it is set to 10.
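
To make the interaction of monitor, min_delta and patience concrete, here is a minimal sketch of the decision rule EarlyStopping applies at the end of each epoch (simplified from the tf.keras behaviour for mode='min', which is what monitoring val_loss infers; replay_early_stopping is a name introduced here for illustration):

def replay_early_stopping(val_losses, min_delta=0.0001, patience=10):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float('inf')   # best monitored value seen so far
    wait = 0              # epochs since the last qualifying improvement
    for epoch, current in enumerate(val_losses, start=1):
        if current + min_delta < best:   # improved by MORE than min_delta
            best, wait = current, 0
        else:                            # flat, worse, or improved too little
            wait += 1
            if wait >= patience:
                return epoch             # stop: patience exhausted
    return None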


The last 10 epochs (33–42) brought no improvement of more than min_delta over the best val_loss, reached at epoch 32, so training stopped early at epoch 42 instead of running all 500 epochs. Note that the drop from 0.0109 to epoch 40's 0.0108 is, at the printed precision, exactly 0.0001 and therefore not greater than min_delta, so it did not reset the patience counter.
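
Replaying the val_loss history that fit recorded through the sketch above confirms this (history.history stores the full-precision per-epoch values, not the rounded numbers printed in the log):

val_losses = history.history['val_loss']   # one entry per completed epoch
print(len(val_losses))                     # 42: only 42 of 500 epochs ran
print(replay_early_stopping(val_losses))   # 42: the rule fires at epoch 42

And since training stopped at epoch 42, none of epochs 33–42 can have counted as an improvement, so with restore_best_weights=True the model is rolled back to the epoch-32 weights even though epoch 40 prints a marginally lower val_loss.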

The following is reproduced (translated) from https://blog.csdn.net/silent56_th/article/details/72845912

  • min_delta: the threshold a change (up or down) must exceed to count as an improvement. Its size depends on the monitored quantity and reflects your tolerance. For example, that author monitored 'acc', which varied between 70% and 90%, so changes smaller than 0.01% were of no interest; having also observed oscillation during training (a dip followed by a recovery), he raised the tolerance somewhat and finally set it to 0.003%.
  • patience: how many epochs without improvement to tolerate. This setting is really a trade-off between riding out oscillations and catching a genuine decline in accuracy. If patience is set large, the final accuracy will come out slightly below the best the model can reach; if it is set small, the model may stop while it is still oscillating early on, in the global-search phase, and the accuracy will usually be poor. The right patience is directly tied to the learning rate: with a fixed learning rate, first train a few times, observe how many epochs an oscillation lasts, and set patience slightly above that; with a varying learning rate, it is advisable to set it slightly below the longest oscillation. That author already had acceptable results before introducing EarlyStopping, so it was merely icing on the cake, and patience was set relatively high: the maximum observed oscillation length.
  • min_delta and patience are both about keeping the model from stopping in the middle of an oscillation, so they have to be tuned in concert. As a rule of thumb, if min_delta is lowered, patience can be shortened accordingly; if min_delta is raised, patience needs to be lengthened, and vice versa. One way to estimate the oscillation length from a trial run is sketched below.
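
The tuning heuristic in the list above can be made mechanical: train a short trial run without EarlyStopping, measure the longest stretch of epochs with no qualifying improvement, and set patience a little above that. A minimal sketch, where trial_history is a hypothetical History object from such a trial run:

def longest_stall(val_losses, min_delta=0.0001):
    # Longest run of epochs without a qualifying improvement,
    # using the same comparison as the replay sketch earlier.
    best, wait, longest = float('inf'), 0, 0
    for current in val_losses:
        if current + min_delta < best:
            best, wait = current, 0
        else:
            wait += 1
            longest = max(longest, wait)
    return longest

# e.g.: set patience a few epochs above the observed stall length
# patience = longest_stall(trial_history.history['val_loss']) + 2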