XGBoost

XGBoost is an open-source machine learning project developed by Tianqi Chen and others. It implements the GBDT algorithm efficiently, adds many algorithmic and engineering improvements, and is widely used in Kaggle and other machine learning competitions, where it has achieved strong results.

Any discussion of XGBoost has to start with GBDT (Gradient Boosting Decision Tree), because XGBoost is essentially still a GBDT, one that pushes speed and efficiency to the extreme, hence the X for eXtreme in the name. As noted earlier, both are boosting methods.

GBDT itself is not covered again here; see my separate post introducing it.

Differences between XGBoost and GBDT

  • GBDT is a machine learning algorithm; XGBoost is an engineering implementation of that algorithm.
  • When CART is used as the base learner, XGBoost explicitly adds a regularization term to control model complexity, which helps prevent overfitting and improves generalization.
  • GBDT uses only first-order derivative information of the loss function during training; XGBoost applies a second-order Taylor expansion to the loss and can use both first- and second-order derivatives.
  • Traditional GBDT uses CART as the base learner; XGBoost supports multiple kinds of base learners, such as linear models.
  • Traditional GBDT uses all of the data in every iteration; XGBoost adopts a strategy similar to random forests and supports subsampling the data.
  • Traditional GBDT has no built-in handling of missing values; XGBoost learns how to route missing values automatically.

Parts of the above are adapted from the blog post 终于有人说清楚了–XGBoost算法.

XGBoost formula derivation

For the full derivation, see 机器学习 集成算法XGBoost原理及推导.
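A minimal sketch of the core result, in standard XGBoost notation (my summary, not quoted from the linked post): at boosting round $t$ the model minimizes a regularized objective

$$\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\big(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t), \qquad \Omega(f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^2,$$

which is approximated by a second-order Taylor expansion with $g_i = \partial_{\hat{y}^{(t-1)}} l$ and $h_i = \partial^2_{\hat{y}^{(t-1)}} l$:

$$\mathcal{L}^{(t)} \simeq \sum_{i=1}^{n} \Big[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \Big] + \Omega(f_t).$$

For a fixed tree structure, the optimal weight of leaf $j$ (with $I_j$ the samples routed to that leaf) and the gain of a candidate split are

$$w_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}, \qquad \text{Gain} = \frac{1}{2}\Big[ \frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda} \Big] - \gamma.$$

Here $\gamma$ and $\lambda$ are exactly the gamma and reg_lambda parameters discussed below.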

import pandas as pd
import numpy as np
import xgboost as xgb
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split
from sklearn import metrics

# pandas display settings for rows and columns
pd.set_option('display.max_columns', 200)
pd.set_option('display.max_rows', 200)
pd.set_option('display.width', 200)

# silence version-compatibility and deprecation warnings
import warnings
warnings.filterwarnings("ignore")
# model with explicitly listed parameters
model = xgb.XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=0.1, max_delta_step=0, max_depth=4,
              min_child_weight=1, monotone_constraints='()',
              n_estimators=10, n_jobs=0, num_parallel_tree=1, random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)

XGBClassifier parameters

  • base_score [default=0.5]

    The initial prediction score of all instances (global bias).

  • n_estimators

    Number of boosting rounds.

  • booster [default=gbtree]

    Two boosters are available: gbtree and gblinear. gbtree boosts with tree-based models, gblinear with linear models. The default is gbtree.

  • colsample_bytree [default=1]

    Fraction of features sampled when building each tree. Range: (0,1]. subsample, colsample_bytree = 0.8 is the most common starting point; typical values lie between 0.5 and 0.9.

  • subsample [default=1]

    Fraction of the training samples used to grow each tree. Setting it to 0.5 means XGBoost randomly draws 50% of the training set for each tree, which helps prevent overfitting. Range: (0,1].

  • colsample_bylevel

    Fraction of features sampled for each split level of the tree. Rarely used in practice, since subsample and colsample_bytree already play the same role, but it may be worth exploring.

  • colsample_bynode

    Fraction of features sampled at each split node.

  • gamma [default=0]

    Minimum loss reduction required to make a further partition on a leaf node of the tree; it is the coefficient of the leaf-count term in the regularization penalty. The larger it is, the more conservative the algorithm. Range: [0,∞].

  • learning_rates: a list of learning rates, one per boosting round.

  • max_delta_step [default=0]

    Maximum delta step allowed for each tree's weight estimate. 0 means no constraint; a positive value makes the update step more conservative. Usually unnecessary, but it can help logistic regression when classes are extremely imbalanced; values of 1-10 may help control the update. Range: [0,∞].

  • max_depth [default=6]

    Maximum tree depth. Range: [1,∞].

  • min_child_weight [default=1]

    Minimum sum of instance weights in a child node; if a split would create a leaf whose weight sum falls below this, the split is abandoned. For the linear booster this is the minimum number of samples per model. Range: [0,∞].

  • importance_type [str, default "weight"]

    The criterion behind feature importances:

    weight: the number of times a feature appears in the trees;

    gain: the average gain of the splits that use the feature;

    cover: the average coverage of the splits that use the feature, where coverage is the number of samples affected by the split.
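Once the model has been fitted (see the training section below), the three criteria can be compared directly on the underlying booster; a minimal sketch using the standard Booster.get_score API:

booster = model.get_booster()
# keys are feature names, values the importance under each criterion
for imp_type in ('weight', 'gain', 'cover'):
    print(imp_type, booster.get_score(importance_type=imp_type))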

Parameter descriptions

Parameter                 Description
n_jobs                    number of parallel threads
missing                   value to be treated as missing
num_parallel_tree         number of trees built in parallel per round
random_state              random seed
reg_alpha                 L1 regularization term
reg_lambda                L2 regularization term
scale_pos_weight          weighting to handle class imbalance
tree_method               tree construction algorithm
validate_parameters       validate input parameters
verbosity                 verbosity of printed log messages
interaction_constraints   feature interaction constraints
monotone_constraints      monotonicity constraints
gpu_id                    GPU ID
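For the class-imbalance knob specifically, a common heuristic (my addition, not from the original post) is to set scale_pos_weight to the negative/positive ratio of the training labels:

# heuristic: weight the positive class by the negative/positive ratio;
# y_train is assumed to be a 0/1 label vector such as the one built below
neg, pos = np.bincount(np.asarray(y_train).astype(int))
model_weighted = xgb.XGBClassifier(scale_pos_weight=neg / pos)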


Titanic passenger survival analysis

# load the Kaggle Titanic data and merge train/test into one frame
traindata_path = u'D:/01_Project/99_test/ML/titanic/train.csv'
testdata_path = u'D:/01_Project/99_test/ML/titanic/test.csv'
testresult_path = u'D:/01_Project/99_test/ML/titanic/gender_submission.csv'
df_train = pd.read_csv(traindata_path)
df_test = pd.read_csv(testdata_path)
df_test['Survived'] = pd.read_csv(testresult_path)['Survived']
data_original = pd.concat([df_train, df_test], sort=False)
display(data_original.head(5))
   PassengerId  Survived  Pclass  Name                                                Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0  1            0         3       Braund, Mr. Owen Harris                             male    22.0  1      0      A/5 21171         7.2500   NaN    S
1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.2833  C85    C
2  3            1         3       Heikkinen, Miss. Laina                              female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S
3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)        female  35.0  1      0      113803            53.1000  C123   S
4  5            0         3       Allen, Mr. William Henry                            male    35.0  0      0      373450            8.0500   NaN    S

Field descriptions

  • PassengerId => passenger ID
  • Pclass => ticket class (1st/2nd/3rd)
  • Name => passenger name
  • Sex => sex
  • Age => age
  • SibSp => number of siblings/spouses aboard
  • Parch => number of parents/children aboard
  • Ticket => ticket number
  • Fare => fare
  • Cabin => cabin
  • Embarked => port of embarkation
# inspect the data
data_original.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1309 entries, 0 to 417
Data columns (total 12 columns):
PassengerId    1309 non-null int64
Survived       1309 non-null int64
Pclass         1309 non-null int64
Name           1309 non-null object
Sex            1309 non-null object
Age            1046 non-null float64
SibSp          1309 non-null int64
Parch          1309 non-null int64
Ticket         1309 non-null object
Fare           1308 non-null float64
Cabin          295 non-null object
Embarked       1307 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 132.9+ KB
features = list(data_original.columns[data_original.dtypes != 'object'])
print (features)
['PassengerId', 'Survived', 'Pclass', 'Age', 'SibSp', 'Parch', 'Fare']
# distribution of the numeric features
data_original[features].describe()
       PassengerId  Survived     Pclass       Age          SibSp        Parch        Fare
count  1309.000000  1309.000000  1309.000000  1046.000000  1309.000000  1309.000000  1308.000000
mean   655.000000   0.377387     2.294882     29.881138    0.498854     0.385027     33.295479
std    378.020061   0.484918     0.837836     14.413493    1.041658     0.865560     51.758668
min    1.000000     0.000000     1.000000     0.170000     0.000000     0.000000     0.000000
25%    328.000000   0.000000     2.000000     21.000000    0.000000     0.000000     7.895800
50%    655.000000   0.000000     3.000000     28.000000    0.000000     0.000000     14.454200
75%    982.000000   1.000000     3.000000     39.000000    1.000000     0.000000     31.275000
max    1309.000000  1.000000     3.000000     80.000000    8.000000     9.000000     512.329200
# 查看类别特征值分布
print (data_original['Sex'].value_counts())
print (data_original['Embarked'].value_counts())
male      843
female    466
Name: Sex, dtype: int64
S    914
C    270
Q    123
Name: Embarked, dtype: int64

XGBoost can handle missing values natively, but it does not accept categorical (object-typed) features; they must be encoded first, otherwise training fails with an error like:

DataFrame.dtypes for data must be int, float or bool.

Did not expect the data types in fields Sex, Embarked

# one-hot encode the categorical features
data_onehot = pd.get_dummies(data_original, columns=['Sex','Embarked'])
# data_original['Sex'].replace('male', 0, inplace=True)   # inplace=True replaces in place
data_onehot.head()
   PassengerId  Survived  Pclass  Name                                                Age   SibSp  Parch  Ticket            Fare     Cabin  Sex_female  Sex_male  Embarked_C  Embarked_Q  Embarked_S
0  1            0         3       Braund, Mr. Owen Harris                             22.0  1      0      A/5 21171         7.2500   NaN    0           1         0           0           1
1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  38.0  1      0      PC 17599          71.2833  C85    1           0         1           0           0
2  3            1         3       Heikkinen, Miss. Laina                              26.0  0      0      STON/O2. 3101282  7.9250   NaN    1           0         0           0           1
3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)        35.0  1      0      113803            53.1000  C123   1           0         0           0           1
4  5            0         3       Allen, Mr. William Henry                            35.0  0      0      373450            8.0500   NaN    0           1         0           0           1
# drop features that will not be used for training
drop_features = ['PassengerId', 'Survived', 'Name', 'Ticket', 'Cabin']
features_filted = list(data_onehot.columns.values)
for feature in drop_features:
    features_filted.remove(feature)
# features_filted = list(set(features_filted) - set(drop_features))
print(features_filted)

# split into training and validation sets
x_train, x_test, y_train, y_test = train_test_split(data_onehot[features_filted], data_onehot['Survived'], random_state=1, train_size=0.7)
display(x_train.shape)
display(x_test.shape)
display(y_train.shape)
display(y_test.shape)
['Pclass', 'Age', 'SibSp', 'Parch', 'Fare', 'Sex_female', 'Sex_male', 'Embarked_C', 'Embarked_Q', 'Embarked_S']

(916, 10)

(393, 10)

(916,)

(393,)

Model training

model.fit(x_train, y_train,eval_set=[(x_train,y_train),(x_test,y_test)],
       eval_metric=['error'],early_stopping_rounds=5,verbose=True)
[0]	validation_0-error:0.11572	validation_1-error:0.15013
Multiple eval metrics have been passed: 'validation_1-error' will be used for early stopping.

Will train until validation_1-error hasn't improved in 5 rounds.
[1]	validation_0-error:0.11572	validation_1-error:0.15013
[2]	validation_0-error:0.11572	validation_1-error:0.14758
[3]	validation_0-error:0.11572	validation_1-error:0.14758
[4]	validation_0-error:0.11572	validation_1-error:0.14758
[5]	validation_0-error:0.11572	validation_1-error:0.14758
[6]	validation_0-error:0.10808	validation_1-error:0.14758
[7]	validation_0-error:0.10808	validation_1-error:0.14758
Stopping. Best iteration:
[2]	validation_0-error:0.11572	validation_1-error:0.14758

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=0.1, max_delta_step=0, max_depth=4,
              min_child_weight=1, missing=nan, monotone_constraints='()',
              n_estimators=10, n_jobs=0, num_parallel_tree=1, random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)

Parameter notes

  • early_stopping_rounds: stop adding trees once the evaluation loss has failed to improve for 5 consecutive rounds; this acts as a training monitor.

  • eval_set: the dataset(s) evaluated during training.

  • verbose=False suppresses the per-round training log.

  • objective: the training objective (a usage sketch follows this list)

    Regression:

      reg:linear (default)

      reg:logistic

    Binary classification:

      binary:logistic (outputs probabilities)

      binary:logitraw (outputs the raw score before the logistic transform)

    Multi-class classification:

      multi:softmax, with num_class=n (returns the predicted class)

      multi:softprob, with num_class=n (returns per-class probabilities)

    Ranking:

      rank:pairwise

  • eval_metric

    Regression (default rmse):

      rmse: root mean squared error

      mae: mean absolute error

    Classification (default error):

      auc: area under the ROC curve

      error: error rate (binary)

      merror: error rate (multi-class)

      logloss: negative log-likelihood (binary)

      mlogloss: negative log-likelihood (multi-class)
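A short sketch of how these are passed in practice, consistent with the xgboost 1.x sklearn API used in this post (parameter values are illustrative, not tuned):

# binary classification with log-loss monitored on a validation split
clf_bin = xgb.XGBClassifier(objective='binary:logistic', n_estimators=50,
                            learning_rate=0.1)
clf_bin.fit(x_train, y_train,
            eval_set=[(x_test, y_test)],
            eval_metric='logloss',
            early_stopping_rounds=5,
            verbose=False)
proba = clf_bin.predict_proba(x_test)[:, 1]  # class-1 probabilities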

Feature importance

importance_df = pd.DataFrame({
    'features':x_train.columns.values,
    'importance':model.feature_importances_.tolist()
})
importance_df = importance_df.sort_values('importance',ascending=False)
importance_df
   features    importance
5  Sex_female  0.893800
0  Pclass      0.052338
4  Fare        0.015934
2  SibSp       0.015133
1  Age         0.011970
8  Embarked_Q  0.010825
3  Parch       0.000000
6  Sex_male    0.000000
7  Embarked_C  0.000000
9  Embarked_S  0.000000
plt.figure(figsize=(10, 6))
sns.barplot(importance_df['importance'][:20], importance_df['features'][:20])
plt.show()

[Figure: feature importance bar plot (output_19_0.png)]

Confusion matrix

pred_y_test = model.predict(x_test)
# m = metrics.confusion_matrix(y_test, pred_y_test)
# display (m)
tn, fp, fn, tp = metrics.confusion_matrix(y_test, pred_y_test).ravel()
print ('matrix    label1   label0')
print ('predict1  {:<6d}   {:<6d}'.format(int(tp), int(fp)))
print ('predict0  {:<6d}   {:<6d}'.format(int(fn), int(tn)))
matrix    label1   label0
predict1  123      17    
predict0  41       212   
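The usual metrics follow directly from these four counts; a quick sketch (sklearn's metrics functions give the same numbers):

# derived metrics from the confusion-matrix counts above
accuracy = (tp + tn) / (tp + tn + fp + fn)    # (123 + 212) / 393, about 0.852
precision = tp / (tp + fp)                    # 123 / 140, about 0.879
recall = tp / (tp + fn)                       # 123 / 164, about 0.750
f1 = 2 * precision * recall / (precision + recall)
print('accuracy={:.3f} precision={:.3f} recall={:.3f} f1={:.3f}'.format(
    accuracy, precision, recall, f1))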

Cross-validation

Validate the model's scores with 5-fold cross-validation.

score_x = x_train
score_y = y_train
# accuracy
scores = cross_val_score(model, score_x, score_y, cv=5, scoring='accuracy')
print('cross-validated accuracy: ' + str(scores.mean()))
cross-validated accuracy: 0.8700700879068662
# precision
scores = cross_val_score(model, score_x, score_y, cv=5, scoring='precision')
print('cross-validated precision: ' + str(scores.mean()))
cross-validated precision: 0.834282815104733
# recall
scores = cross_val_score(model, score_x, score_y, cv=5, scoring='recall')
print('cross-validated recall: ' + str(scores.mean()))
cross-validated recall: 0.8030303030303031
# f1_score
scores = cross_val_score(model, score_x, score_y, cv=5, scoring='f1')
print('cross-validated f1_score: ' + str(scores.mean()))
cross-validated f1_score: 0.81649043555853

TopN

TopN evaluation is useful when the classes are imbalanced and recall is the main concern. Titanic survival prediction is not a great fit for judging a model by TopN, but it illustrates the mechanics.

ratio_list = [0.01,0.02,0.05,0.1,0.2]
test_label = pd.DataFrame(y_test)
index_of_label1 = model.classes_.tolist().index(1)
pred_y_test = model.predict(x_test)
proba_y_test = model.predict_proba(x_test)
test_label['predict'] = pred_y_test
test_label['label_1'] = proba_y_test[:,index_of_label1]
display (test_label.head())

label_1_nbr = len(test_label[test_label['Survived']==1])
print ('label_1_nbr:',label_1_nbr)
print ('sample number:',len(test_label))

for ratio in ratio_list:
    num = test_label.sort_values('label_1',ascending=False)[:int(ratio*test_label.shape[0])]['Survived'].sum()
    count = test_label.sort_values('label_1',ascending=False)[:int(ratio*test_label.shape[0])]['Survived'].count()
    print ('Top %.2f label_1_nbr:%d,sample_nbr:%d,recall:%f'%(ratio,num,count,1.0*num/label_1_nbr))
     Survived  predict  label_1
201  0         0        0.406292
115  0         0        0.381720
255  1         1        0.520915
212  0         0        0.406292
195  1         1        0.622862
label_1_nbr: 164
sample number: 393
Top 0.01 label_1_nbr:3,sample_nbr:3,recall:0.018293
Top 0.02 label_1_nbr:7,sample_nbr:7,recall:0.042683
Top 0.05 label_1_nbr:19,sample_nbr:19,recall:0.115854
Top 0.10 label_1_nbr:38,sample_nbr:39,recall:0.231707
Top 0.20 label_1_nbr:75,sample_nbr:78,recall:0.457317

Grid search for the best parameters

param_grid = [
{'n_estimators': [3, 10, 30], 'max_depth': [2, 4, 6, 8],'learning_rate': [0.01,0.05,0.1]}
]

clf = xgb.XGBClassifier()
grid_search = GridSearchCV(clf, param_grid, cv=5,scoring='neg_mean_squared_error')
grid_search.fit(x_train, y_train)
print (grid_search.best_params_)
print (grid_search.best_estimator_)
{'learning_rate': 0.1, 'max_depth': 4, 'n_estimators': 10}
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
              importance_type='gain', interaction_constraints='',
              learning_rate=0.1, max_delta_step=0, max_depth=4,
              min_child_weight=1, missing=nan, monotone_constraints='()',
              n_estimators=10, n_jobs=0, num_parallel_tree=1, random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
              tree_method='exact', validate_parameters=1, verbosity=None)
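The winning estimator can be reused directly rather than refit by hand; a minimal follow-up sketch (my addition):

# reuse the best estimator found by the grid search
best_model = grid_search.best_estimator_
print('best CV score:', grid_search.best_score_)
print('hold-out accuracy:', best_model.score(x_test, y_test))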

Feature distributions for positive and negative samples

def KdePlot(df, label, factor, flag=None, positive=1):
    # kernel-density plot of one feature, split by label value
    plt.figure(figsize=(20, 10))
    sns.set(style='white')
    if positive == 0:
        df[factor] = np.abs(df[factor])
    if flag == 'log':
        x0 = np.log(df[df[label] == 0][factor] + 1)
        x1 = np.log(df[df[label] == 1][factor] + 1)
    else:
        x0 = df[df[label] == 0][factor]
        x1 = df[df[label] == 1][factor]

    sns.distplot(x0,
                 color='blue',
                 kde=True,    # draw the density curve
                 hist=True,   # draw the histogram
                 # rug=True,  # rug plot
                 kde_kws={'shade': True, 'color': 'green', 'facecolor': 'green', 'label': 'label_0'},
                 rug_kws={'color': 'green', 'height': 0.1, 'alpha': 0.1})
    plt.xlabel('%s' % factor, fontsize=40)
    plt.ylabel('label_0', fontsize=30)
    plt.xticks(fontsize=30)
    plt.yticks(fontsize=30)
    plt.legend(loc='upper left', fontsize=30)

    # second y-axis for the positive class
    plt.twinx()

    sns.distplot(x1,
                 color='orange',
                 kde=True,
                 hist=True,
                 kde_kws={'shade': True, 'color': 'red', 'facecolor': 'red', 'label': 'label_1'},
                 rug_kws={'color': 'red', 'height': 0.1, 'alpha': 0.2})
    plt.ylabel('label_1', fontsize=30)
    plt.xticks(fontsize=30)
    plt.yticks(fontsize=30)
    plt.legend(loc='upper right', fontsize=30)
    plt.show()

for factor in importance_df['features'].values:
    KdePlot(data_onehot, 'Survived', factor)

[Figures: per-feature KDE distributions split by Survived (output_34_0.png, output_34_1.png)]

A complete XGBoost example

The code below generates a batch of random data; even on such data, the trained model reaches high accuracy.

The example is adapted from an online post.

import numpy as np
from sklearn.model_selection import train_test_split
import xgboost as xgb
import matplotlib.pyplot as plt
import matplotlib as mpl


def markData():
    # four partially overlapping point clouds, 30 points each
    x1 = 5 + np.random.rand(30) * 5
    y1 = 8 + np.random.rand(30) * 5

    x2 = 9 + np.random.rand(30) * 3
    y2 = 1 + np.random.rand(30) * 5

    x3 = 4 + np.random.rand(30) * 5
    y3 = 3 + np.random.rand(30) * 5

    x4 = 8 + np.random.rand(30) * 7
    y4 = 5 + np.random.rand(30) * 5

    # stack the coordinates into a (120, 2) feature matrix
    x = np.hstack((x1, x2, x3, x4))
    y = np.hstack((y1, y2, y3, y4))
    x = np.stack((x, y), axis=0).transpose()

    # class labels 0-3, one per cloud
    y = np.zeros(120)
    y[0:30] = 0
    y[30:60] = 1
    y[60:90] = 2
    y[90:120] = 3
    return x, y


if __name__ == '__main__':
    x, y = markData()
    x_train, x_test, y_train, y_test = train_test_split(x, y, train_size=0.75, random_state=1)
    data_train = xgb.DMatrix(x_train, label=y_train)
    data_test = xgb.DMatrix(x_test, label=y_test)

    # define the xgb model parameters
    parms = {'max_depth': 3, 'eta': 0.5, 'objective': 'multi:softmax', 'num_class': 4}
    watchlist = [(data_train, 'train'), (data_test, 'eval')]
    bst = xgb.train(parms, data_train, num_boost_round=6, evals=watchlist)
    y_hat = bst.predict(data_test)

    # compute accuracy on the held-out set
    print(np.mean(y_hat == y_test))

    # plot the decision regions over a 200x200 grid
    N, M = 200, 200
    x_min, x_max = np.min(x[:, 0]), np.max(x[:, 0])
    y_min, y_max = np.min(x[:, 1]), np.max(x[:, 1])
    x1 = np.linspace(x_min, x_max, N)
    x2 = np.linspace(y_min, y_max, M)
    tx, ty = np.meshgrid(x1, x2)
    xx = np.stack((tx.flat, ty.flat), axis=1)
    data_xx = xgb.DMatrix(xx)
    yy = bst.predict(data_xx)
    yy = yy.reshape(tx.shape)

    cmp_light = mpl.colors.ListedColormap(['#33FF33', '#FFCC66', '#FFF500', '#22CFCC'])
    cmp_dark = mpl.colors.ListedColormap(['r', 'g', 'b', 'k'])

    plt.figure()
    plt.pcolormesh(tx, ty, yy, cmap=cmp_light)
    plt.scatter(x[:, 0], x[:, 1], c=y, edgecolors='k', cmap=cmp_dark)
    plt.xlabel("x1")
    plt.ylabel("x2")
    plt.xlim(x_min, x_max)
    plt.ylim(y_min, y_max)
    plt.grid(True)
    plt.show()
[0]	train-merror:0.03333	eval-merror:0.20000
[1]	train-merror:0.02222	eval-merror:0.20000
[2]	train-merror:0.02222	eval-merror:0.20000
[3]	train-merror:0.01111	eval-merror:0.16667
[4]	train-merror:0.01111	eval-merror:0.20000
[5]	train-merror:0.01111	eval-merror:0.20000
0.8

[Figure: decision regions of the trained model with the generated points overlaid (output_36_1.png)]
