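This approach has significant limitations. First, the trained model must be saved with save_model('xgb.model').
If the model was originally saved as a pkl file and, after a version change, the earlier XGBoost version can no longer be used to load and convert it, this procedure cannot be carried out. It is therefore recommended to save models with save_model from the start.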
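import xgboost as xgb
xgb_model.save_model('xgb.model')  # xgb_model is the trained model
# Note: a pkl file written by one XGBoost version may not load under another version, whereas a model saved with save_model can be read by different versions.
model = xgb.Booster(model_file='xgb.model')  # Load the model. The loaded Booster cannot predict directly on a DataFrame, so the data must first be converted to a DMatrix.
dtrain = xgb.DMatrix(data)  # Convert the data to DMatrix format
probas_train = model.predict(dtrain)  # Predict with the loaded model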
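The values in probas_train match those obtained from the pkl model with:
test_data1 = test_data1.apply(lambda x: xgb.predict_proba(test_data)[:, 1])
In that line, xgb is presumably the classifier loaded from the pkl file rather than the xgboost module. For a binary classifier, Booster.predict returns the probability of the positive class, which is why it agrees with column 1 of predict_proba.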
Full signature: predict(data, output_margin=False, ntree_limit=0, pred_leaf=False, pred_contribs=False, approx_contribs=False, pred_interactions=False, validate_features=True, training=False, iteration_range=(0, 0), strict_shape=False)
Explanation:
Parameters
- data (xgboost.core.DMatrix) – The dmatrix storing the input.
- output_margin (bool) – Whether to output the raw untransformed margin value.
- ntree_limit (int) – Deprecated, use iteration_range instead.
- pred_leaf (bool) – When this option is on, the output will be a matrix of (nsample, ntrees) with each record indicating the predicted leaf index of each sample in each tree. Note that the leaf index of a tree is unique per tree, so you may find leaf 1 in both tree 1 and tree 0.
- pred_contribs (bool) – When this is True the output will be a matrix of size (nsample, nfeats + 1) with each record indicating the feature contributions (SHAP values) for that prediction. The sum of all feature contributions is equal to the raw untransformed margin value of the prediction. Note the final column is the bias term.
- approx_contribs (bool) – Approximate the contributions of each feature. Used when pred_contribs or pred_interactions is set to True. Changing the default of this parameter (False) is not recommended.
- pred_interactions (bool) – When this is True the output will be a matrix of size (nsample, nfeats + 1, nfeats + 1) indicating the SHAP interaction values for each pair of features. The sum of each row (or column) of the interaction values equals the corresponding SHAP value (from pred_contribs), and the sum of the entire matrix equals the raw untransformed margin value of the prediction. Note the last row and column correspond to the bias term.
- validate_features (bool) – When this is True, validate that the Booster’s and data’s feature_names are identical. Otherwise, it is assumed that the feature_names are the same.
- training (bool) – Whether the prediction value is used for training. This can affect the dart booster, which performs dropouts during training iterations but uses all trees for inference. If you want to obtain results with dropouts, set this parameter to True. The parameter is also set to True when obtaining predictions for a custom objective function. New in version 1.0.0.
- iteration_range (Tuple[int, int]) – Specifies which layers of trees are used in prediction. For example, if a random forest is trained with 100 rounds and iteration_range=(10, 20) is specified, then only the forests built during rounds [10, 20) (half-open interval) are used in this prediction. New in version 1.4.0.
- strict_shape (bool) – When set to True, the output shape is invariant to whether classification is used. For both value and margin prediction, the output shape is (n_samples, n_groups), with n_groups == 1 when multi-class is not used. Defaults to False, in which case the output shape can be (n_samples,) if multi-class is not used.