XGBoost Early Stopping to Avoid Overfitting (early_stopping_rounds)

From: http://blog.csdn.net/lujiandong1/article/details/52777168

Like any other model, XGBoost overfits when the number of boosting rounds is too large. At first the test error decreases as the rounds go on; once overfitting (over-training) sets in, the test error starts to rise, or simply fluctuates. The following experiments illustrate this behaviour:

The dataset used in the experiments below: https://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes

```python
# monitor training performance
from numpy import loadtxt
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")
# split data into X and y
X = dataset[:,0:8]
Y = dataset[:,8]
# split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=7)
# fit the model on the training data, evaluating on the test set at every boosting round
model = XGBClassifier()
eval_set = [(X_test, y_test)]
model.fit(X_train, y_train, eval_metric="error", eval_set=eval_set, verbose=True)
# make predictions for test data
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]
# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
```
Next, look at how the test error evolves at each boosting round:


Analysis: once the number of rounds is large enough, the test error essentially stops decreasing; it fluctuates around a fixed level and can even creep back up. This indicates that the model has entered the over-training phase.

==============================================================================================================================

Next, we plot the training loss and the test loss to visualize the overfitting:

```python
# plot learning curves
from numpy import loadtxt
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from matplotlib import pyplot
# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")
# split data into X and y
X = dataset[:,0:8]
Y = dataset[:,8]
# split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=7)
# fit the model, evaluating on both the training set and the test set at every round
model = XGBClassifier()
eval_set = [(X_train, y_train), (X_test, y_test)]
model.fit(X_train, y_train, eval_metric=["error", "logloss"], eval_set=eval_set, verbose=True)
# make predictions for test data
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]
# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
# retrieve performance metrics
results = model.evals_result()
epochs = len(results['validation_0']['error'])
x_axis = range(0, epochs)
# plot log loss
fig, ax = pyplot.subplots()
ax.plot(x_axis, results['validation_0']['logloss'], label='Train')
ax.plot(x_axis, results['validation_1']['logloss'], label='Test')
ax.legend()
pyplot.ylabel('Log Loss')
pyplot.title('XGBoost Log Loss')
pyplot.show()
# plot classification error
fig, ax = pyplot.subplots()
ax.plot(x_axis, results['validation_0']['error'], label='Train')
ax.plot(x_axis, results['validation_1']['error'], label='Test')
ax.legend()
pyplot.ylabel('Classification Error')
pyplot.title('XGBoost Classification Error')
pyplot.show()
```
A few notes on this code:

```python
eval_set = [(X_train, y_train), (X_test, y_test)]
model.fit(X_train, y_train, eval_metric=["error", "logloss"], eval_set=eval_set, verbose=True)
```
Note: at every boosting round, both the training set and the test set are evaluated, and the evaluation metrics are "error" and "logloss".

```python
# retrieve performance metrics
results = model.evals_result()
epochs = len(results['validation_0']['error'])
x_axis = range(0, epochs)
# plot log loss
fig, ax = pyplot.subplots()
ax.plot(x_axis, results['validation_0']['logloss'], label='Train')
ax.plot(x_axis, results['validation_1']['logloss'], label='Test')
ax.legend()
pyplot.ylabel('Log Loss')
pyplot.title('XGBoost Log Loss')
pyplot.show()
```
Note: the per-round evaluation results can be retrieved with evals_result(); results['validation_0'] holds the results on the training set, and results['validation_1'] holds the results on the test set.
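For reference, a minimal sketch of inspecting the structure returned by evals_result() directly; the key names simply follow the order of the entries in eval_set above:

```python
results = model.evals_result()
# results has one key per entry in eval_set, e.g.
# {'validation_0': {'error': [...], 'logloss': [...]},   # (X_train, y_train)
#  'validation_1': {'error': [...], 'logloss': [...]}}   # (X_test, y_test)
print(results.keys())
print("final test logloss: %.4f" % results['validation_1']['logloss'][-1])
```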

The training-set and test-set error curves are plotted below:

The logloss plot makes it clear that once nround exceeds about 40, the test-set loss starts to rise again: the model has entered the overfitting regime.

XGBoost can counter overfitting caused by too many boosting rounds through the early_stopping_rounds parameter.

```python
eval_set = [(X_test, y_test)]
model.fit(X_train, y_train, early_stopping_rounds=10, eval_metric="logloss", eval_set=eval_set, verbose=True)
```
Note: with early_stopping_rounds=10, training stops as soon as the logloss on the evaluation set has not improved for 10 consecutive rounds. If eval_metric lists several metrics, the last one in the list is the one used for early stopping.
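For example, combining early stopping with both metrics; a minimal sketch reusing the variables defined above, where stopping is judged on "logloss" because it is listed last:

```python
model.fit(X_train, y_train,
          early_stopping_rounds=10,
          eval_metric=["error", "logloss"],  # the last metric, "logloss", drives early stopping
          eval_set=[(X_test, y_test)],
          verbose=True)
```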

```python
# early stopping
from numpy import loadtxt
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# load data
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=",")
# split data into X and y
X = dataset[:,0:8]
Y = dataset[:,8]
# split data into train and test sets
seed = 7
test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)
# fit the model with early stopping on the test-set logloss
model = XGBClassifier()
eval_set = [(X_test, y_test)]
model.fit(X_train, y_train, early_stopping_rounds=10, eval_metric="logloss", eval_set=eval_set, verbose=True)
# make predictions for test data
y_pred = model.predict(X_test)
predictions = [round(value) for value in y_pred]
# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
```


Note: in this run training stops at nround = 42, which means the best logloss was reached at nround = 32. As a rule of thumb, set early_stopping_rounds to roughly 10% of the total number of boosting rounds.
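As a rough illustration of that rule of thumb (the n_estimators value below is an assumption for the sketch, not from the original experiment):

```python
n_estimators = 100                                   # planned total number of boosting rounds (assumed)
early_stopping_rounds = max(1, n_estimators // 10)   # ~10% of the total, per the rule of thumb

model = XGBClassifier(n_estimators=n_estimators)
model.fit(X_train, y_train,
          early_stopping_rounds=early_stopping_rounds,
          eval_metric="logloss",
          eval_set=[(X_test, y_test)],
          verbose=True)
```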

When early_stopping_rounds is used, the best number of rounds can be read from the best_iteration attribute, and prediction can then be limited to the trees built up to that point.

```python
print(model.best_iteration)
# best_iteration is the 0-based index of the best round, so add 1 to include that round's tree
limit = model.best_iteration + 1
y_pred = model.predict(X_test, ntree_limit=limit)
predictions = [round(value) for value in y_pred]
# evaluate predictions
accuracy = accuracy_score(y_test, predictions)
print("Accuracy: %.2f%%" % (accuracy * 100.0))
```
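In newer xgboost releases (roughly 1.4 and later) ntree_limit is deprecated in favour of iteration_range; a minimal sketch of the equivalent call, assuming such a version:

```python
# use trees from round 0 up to and including the best round
y_pred = model.predict(X_test, iteration_range=(0, model.best_iteration + 1))
```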
