CatBoost Parameters Explained

AiirrrrYee

Article link: https://blog.csdn.net/AiirrrrYee/article/details/78224232

Official documentation:
https://tech.yandex.com/catboost/doc/dg/concepts/python-reference_parameters-list-docpage/

Common parameters

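nan_mode (string): How missing values in the input data are handled: Forbidden (missing values are not allowed), Min (missing values are treated as the minimum value of the feature), or Max (missing values are treated as the maximum value). Default: Min.
calc_feature_importance (bool): Whether to calculate feature importance. Default: True.
fold_permutation_block_size (int): Objects are grouped into blocks of this size before the random permutation; the smaller the value, the slower the training. Default: 1.
ignored_features (list): Features in the dataset to ignore. Default: None.
use_best_model (bool): When set, test data must be provided; the number of trees in the final model is then determined by the training parameters together with the iteration that gives the best metric value on the test data. Default: False.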
loss_function (string/object): Supported values: RMSE, Logloss, MAE, CrossEntropy, Quantile, LogLinQuantile, MultiClass, MultiClassOneVsAll, MAPE, Poisson. Default: Logloss.
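custom_loss (object): Loss function (metric) values to output during training; they are reported for information only and are not optimized. Default: None.
eval_metric (string): The metric used for overfitting detection (if enabled) and for best model selection (if enabled). Default: the loss function being optimized.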
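iterations (int): Maximum number of trees. Default: 500.
border (float): For binary classification with the Logloss function: values greater than border are treated as the positive class. Default: 0.5.
gradient_iterations (int): Number of gradient steps used when calculating the leaf values. Default: 1.
depth (int): Tree depth. Maximum 16; values between 1 and 10 are recommended. Default: 6.
learning_rate (float): Learning rate. Default: 0.03.
rsm (float [0; 1]): Random subspace method: the fraction of features considered at each split selection. Default: 1.
partition_random_seed (int): Random seed. Default: None (a seed is chosen at random for each training run).
leaf_estimation_method (string): Method used to calculate leaf values: Newton or Gradient. Default: Gradient.
l2_leaf_reg (int): L2 regularization coefficient. Default: 3.
has_time (bool): Use the objects in their input order, instead of a random permutation, when converting categorical features to numerical features and when choosing the tree structure. Default: False (random order).
priors (string): Priors to use during training. Default: None.
feature_priors (list): Priors to use when converting categorical features to numerical features.
name (string): Experiment name shown in visualization tools. Default: experiment.
fold_len_multiplier (float): Coefficient for changing the length of folds. Must be greater than 1; the best results are obtained with small values. Default: 2.
approx_on_full_history (bool): How approximated values are calculated. False: only a 1/fold_len_multiplier fraction of the fold is used. True: all preceding rows in the fold are used. Default: False.
class_weights (list): Class weights. Default: None.
classes_count (int): Upper limit for the class label. Default: maximum class label + 1.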
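one_hot_max_size (int): Threshold on the number of distinct values of a categorical feature. Features with at most this many distinct values are one-hot encoded and no CTRs are calculated for them. Default: False (not used).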
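random_strength (float): Multiplier for the standard deviation of the score. Default: 1.
bagging_temperature (float): Controls the intensity of Bayesian bagging; higher values give more aggressive bagging. Typical values lie in [0, 1]. Default: 1.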
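
As a rough illustration of how several of the common parameters above fit together, the sketch below collects them in a dictionary and passes them to `CatBoostClassifier`. It assumes a reasonably recent `catboost` package is installed; the helper name `train_classifier` and the chosen values are illustrative only. Some parameters in the list (for example `calc_feature_importance` or `gradient_iterations`) may not be accepted by newer releases, so they are left out here.

```python
# Illustrative values only; the keys are CatBoost training parameter names.
common_params = {
    "loss_function": "Logloss",
    "eval_metric": "Logloss",
    "iterations": 500,
    "depth": 6,
    "learning_rate": 0.03,
    "l2_leaf_reg": 3,
    "rsm": 1.0,
    "nan_mode": "Min",
    "random_strength": 1.0,
    "bagging_temperature": 1.0,
    "use_best_model": True,  # requires an evaluation set at fit time
}


def train_classifier(X_train, y_train, X_val, y_val, cat_features=None):
    """Fit a CatBoostClassifier with the parameters above (hypothetical helper)."""
    # Imported lazily so the dictionary can be inspected without catboost installed.
    from catboost import CatBoostClassifier

    model = CatBoostClassifier(**common_params)
    model.fit(X_train, y_train, cat_features=cat_features, eval_set=(X_val, y_val))
    return model
```

Keeping the parameters in a plain dictionary makes it easy to log them or to override single values between experiments.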

Overfitting detection settings
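- od_type (string): Overfitting detector type: IncToDec or Iter. Default: IncToDec.
- od_pval (float): Threshold used with IncToDec; the larger the value, the earlier overfitting is detected. Default: 0 (the overfitting detector is off).
- od_wait (int): Number of iterations to continue after the iteration with the minimal loss value. With IncToDec: once the threshold is reached, the overfitting detector is ignored and training continues for this many iterations. With Iter: training stops once this many iterations have passed. Default: 20.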
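
The `Iter` behaviour described above can be sketched in a few lines of plain Python. This is a simplified model of the idea, not CatBoost's implementation: the function name `stopping_iteration` is hypothetical, and lower metric values are assumed to be better.

```python
def stopping_iteration(metric_values, od_wait=20):
    """Return the 0-based iteration at which training would stop, or None.

    Training stops once od_wait iterations have passed since the best
    (lowest) metric value was seen.
    """
    best_value = float("inf")
    best_iter = -1
    for i, value in enumerate(metric_values):
        if value < best_value:
            # New best value: remember where it occurred.
            best_value, best_iter = value, i
        elif i - best_iter >= od_wait:
            return i
    return None


losses = [0.9, 0.7, 0.6, 0.65, 0.66, 0.67, 0.68]
print(stopping_iteration(losses, od_wait=3))  # 5
```

Here the best value occurs at iteration 2, so with `od_wait=3` the sketch stops at iteration 5.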

CTR settings
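- ctr_description (string): Binarization settings for categorical features. Consists of the CTR type (Borders, Buckets, BinarizedTargetMeanValue, Counter), the number of borders (regression only, range 1-255, default 1), and the binarization type (regression only: Median, Uniform, UniformAndQuantiles, MaxSumLog, MinEntropy, GreedyLogSum; default MinEntropy). Default: None.
- counter_calc_method (string): How the Counter CTR type is calculated. PrefixTest considers the current object in the test set, FullTest considers all objects in the test set, SkipTest ignores the objects in the test set, and Full considers all objects in both the training and test sets. Default: None (PrefixTest).
- ctr_border_count (int): Number of splits for categorical features, range 1-255. Default: 16.
- max_ctr_complexity (int): Maximum number of categorical features that can be combined. Default: 4.
- ctr_leaf_count_limit (int): Maximum number of leaves for categorical features. If the limit is exceeded, some leaves are discarded: leaves are sorted by the frequency of their values, the top n (n being this value) are kept, and all the rest are dropped. Default: None.
- store_all_simple_ctr (bool): Ignore categorical features that are not used in feature combinations when choosing which leaves to discard. Used together with ctr_leaf_count_limit. Default: False.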
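
The frequency-based pruning described for `ctr_leaf_count_limit` can be illustrated on plain category values. This is only a sketch of the "sort by frequency, keep the top n" idea; CatBoost applies it to leaves internally, and the helper name `keep_most_frequent` is made up for this example.

```python
from collections import Counter


def keep_most_frequent(values, limit):
    """Return the set of the `limit` most frequent values; the rest are dropped."""
    counts = Counter(values)
    return {value for value, _ in counts.most_common(limit)}


cities = ["a", "b", "a", "c", "a", "b"]
print(keep_most_frequent(cities, 2) == {"a", "b"})  # True
```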

Binarization settings
- border_count (int): Number of splits for numerical features, range 1-255. Default: 128.
- feature_border_type (string): Binarization mode for numerical features: Median, Uniform, UniformAndQuantiles, MaxSumLog, MinEntropy, GreedyLogSum. Default: MinEntropy.
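
To make the difference between two of the binarization modes concrete, here is a minimal pure-Python sketch. It assumes that Uniform places borders evenly between the minimum and maximum value and that Median places them so each bucket holds roughly the same number of values; CatBoost's own algorithms may differ in detail, and all function names below are hypothetical.

```python
def uniform_borders(values, border_count):
    """Borders evenly spaced between the minimum and maximum value."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (border_count + 1)
    return [lo + step * (i + 1) for i in range(border_count)]


def median_borders(values, border_count):
    """Borders chosen so each bucket holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * (i + 1)) // (border_count + 1)] for i in range(border_count)]


def bucket_index(x, borders):
    """Bucket of x, computed as the number of borders that x exceeds."""
    return sum(x > b for b in borders)


values = [0, 1, 2, 3, 100]
print(uniform_borders(values, 3))  # [25.0, 50.0, 75.0]
print(median_borders(values, 3))   # [1, 2, 3]
```

With the outlier 100 in the data, the uniform borders put four of the five values into the same bucket, while the quantile-style borders still separate them.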

Performance settings
- thread_count (int): Number of threads used to train the model; it does not affect the results. Default: None.

Output settings
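- verbose (bool): Print detailed information. Default: False.
- train_dir (string): Directory in which files generated during training are stored. Default: the current directory.
- allow_writing_files (bool): Allow analytical and snapshot files to be written during training. If set to False, snapshots and the visualization tools cannot be used. Default: True.
- save_snapshot (bool): Enable snapshots so that training progress is saved and can be restored after an interruption. Default: None.
- snapshot_file (string): Name of the snapshot file. Default: experiment.cbsnapshot.
- plot (bool): Output the following during training: loss function value, custom loss value, elapsed time, and remaining time until training ends. Can be used in Jupyter Notebook. Default: False.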
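
The dependency between `save_snapshot` and `allow_writing_files` noted above can be checked before training starts. The sketch below is a hypothetical pre-flight check, not part of CatBoost; the directory name and the helper `check_output_params` are assumptions made for illustration.

```python
output_params = {
    "verbose": False,
    "train_dir": "catboost_info",  # illustrative directory name
    "allow_writing_files": True,
    "save_snapshot": True,
    "snapshot_file": "experiment.cbsnapshot",
}


def check_output_params(params):
    """Raise ValueError if a snapshot is requested while file writing is disabled."""
    if params.get("save_snapshot") and not params.get("allow_writing_files", True):
        raise ValueError("save_snapshot requires allow_writing_files=True")
    return params


check_output_params(output_params)  # passes: snapshots and file writing are both on
```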