CatBoost feature importance: implementing CatBoost in Python and R


weixin_39585761


Author: Xu Jing (徐静), AI image algorithm R&D engineer

Blog: https://dataxujing.github.io/

GitHub: https://github.com/DataXujing

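CatBoost (Categorical Boosting) is a gradient boosting algorithm similar to XGBoost and LightGBM. It has two main innovations: first, it handles categorical features with ordered TS (target statistics); second, it offers two boosting modes, Ordered and Plain, whose pseudocode is shown in the figure below: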
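To make the ordered TS idea concrete, here is a minimal pure-Python sketch for a single categorical column. It assumes the smoothed formula from the CatBoost paper: example k is encoded as (sum of y_j over earlier examples j with the same category + a·p) / (number of such earlier examples + a), where "earlier" means earlier in a random permutation σ, p is a prior (here the global label mean) and a > 0 is a smoothing weight. The function name and its parameters are hypothetical, not part of CatBoost's API, and the library's internal implementation differs in detail.

```python
import random
from collections import defaultdict


def ordered_target_statistics(categories, targets, prior, a=1.0, seed=0):
    """Sketch: encode one categorical column with ordered target statistics.

    Each example is encoded using only the targets of examples that precede it
    in a random permutation, so its own label never leaks into its encoding.
    """
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)  # the random permutation sigma

    sums = defaultdict(float)  # running sum of targets seen so far, per category
    counts = defaultdict(int)  # running number of examples seen so far, per category
    encoded = [0.0] * len(categories)
    for idx in order:
        c = categories[idx]
        # Smoothed mean of the *earlier* targets in this category.
        encoded[idx] = (sums[c] + a * prior) / (counts[c] + a)
        # Only afterwards add this example's own target to the history.
        sums[c] += targets[idx]
        counts[c] += 1
    return encoded


# Usage on made-up data: a Sex-like column and a binary survival-like label.
cats = ["male", "female", "male", "female", "male"]
ys = [0, 1, 0, 1, 1]
p = sum(ys) / len(ys)  # prior: global mean of the label
print(ordered_target_statistics(cats, ys, prior=p))
```

The first example of each category in the permutation always receives the prior p, and later ones drift toward the category's observed mean; the row's own label is never used in its encoding, which is what prevents target leakage.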

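Ordered boosting addresses prediction shift, a bias common in gradient boosting that arises because the gradient for each training example is estimated with a model that was itself fitted on that example.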
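The core of ordered boosting is that example i is scored only by a model built from the examples that precede it in a random permutation. The toy sketch below illustrates just that idea, assuming a constant (mean) predictor in place of CatBoost's tree ensemble; the function names are hypothetical and this is not how the library implements ordered boosting.

```python
import random


def plain_residuals(y):
    """Residuals from a constant model fitted on all examples, including y_i itself."""
    mean = sum(y) / len(y)
    return [yi - mean for yi in y]


def ordered_residuals(y, prior=0.0, seed=0):
    """Residuals where each example is scored by a constant model fitted only on
    the examples that precede it in a random permutation (ordered-boosting idea)."""
    order = list(range(len(y)))
    random.Random(seed).shuffle(order)
    residuals = [0.0] * len(y)
    total, count = 0.0, 0
    for idx in order:
        # The "model" used for this example has never seen y[idx].
        prediction = total / count if count else prior
        residuals[idx] = y[idx] - prediction
        total += y[idx]
        count += 1
    return residuals


def mean_square(values):
    return sum(v * v for v in values) / len(values)


rng = random.Random(42)
y = [rng.gauss(0.0, 1.0) for _ in range(1000)]
print("plain:", mean_square(plain_residuals(y)), "ordered:", mean_square(ordered_residuals(y)))
```

In the plain version every residual comes from a model that already saw the corresponding label; in the ordered version none does, which is the property CatBoost relies on to get unbiased gradient estimates.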

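CatBoost can currently be used and trained from Python, R, and the command line, and it supports GPU training. It offers strong visualization of the training process: you can monitor training in a Jupyter notebook, in CatBoost Viewer, or in TensorBoard. The documentation is extensive and the library is easy to pick up.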

In this post we train CatBoost models in Python and R on the public Kaggle Titanic dataset.

Implementing CatBoost in Python

1. Load the data:

```python
from catboost.datasets import titanic
import numpy as np
from sklearn.model_selection import train_test_split
from catboost import CatBoostClassifier, Pool, cv
from sklearn.metrics import accuracy_score

# Load the Titanic train/test sets from catboost.datasets (downloaded and cached on first call)
train_df, test_df = titanic()

# Separate the features from the target column
X = train_df.drop('Survived', axis=1)
y = train_df.Survived

# Split into training (75%) and validation (25%) sets
X_train, X_validation, y_train, y_validation = train_test_split(X, y, train_size=0.75, random_state=42)
X_test = test_df
```
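Before CatBoost can apply its categorical handling, it has to be told which columns are categorical, and those columns must not contain NaN (CatBoost accepts only integer or string values there). One common preparation step, assumed here rather than taken from this post, is to fill missing values with a sentinel and treat every non-float column as categorical. The small frame below is made-up Titanic-like data so the sketch runs offline; the sentinel -999 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Made-up rows with Titanic-like columns; NaN/None mimic missing Age and Cabin.
X = pd.DataFrame({
    "Pclass": [3, 1, 3],
    "Name": ["A", "B", "C"],
    "Sex": ["male", "female", "female"],
    "Age": [22.0, np.nan, 26.0],
    "Cabin": [None, "C85", None],
})

# Fill missing values with a sentinel so no categorical column contains NaN.
X = X.fillna(-999)

# Treat every column whose dtype is not float as categorical.
categorical_features_indices = np.where(X.dtypes != float)[0]
print(categorical_features_indices)
```

The resulting index array can then be passed as `cat_features` to `CatBoostClassifier.fit` or to a `Pool`. Note that this rule also marks integer columns such as `Pclass` as categorical, which may or may not be what you want.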

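Here we use the DataFrame directly. CatBoost accepts both NumPy arrays and pandas DataFrames, and it also provides its own Pool data structure; if you need to optimize speed or memory usage, the official documentation recommends Pool. In this post we use DataFrames in the examples.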

2. Tune hyperparameters with hyperopt:

```python
import hyperopt
from numpy.random import RandomState

# The goal is to minimize the objective function
def h
```
