CatBoost: A machine learning library to handle categorical (CAT) data automatically

Introduction

How many of you have seen this error while building your machine learning models using “sklearn”?

I bet most of us! At least in the initial days.

This error occurs when dealing with categorical (string) variables. In sklearn, you are required to convert these categories into a numerical format.

In order to do this conversion, we use several pre-processing methods like “label encoding”, “one hot encoding” and others.
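To make these two pre-processing methods concrete, here is a minimal plain-Python sketch of what label encoding and one-hot encoding do to a column of strings. The `colours` data and the helper names `label_encode` and `one_hot_encode` are made up for illustration; in practice you would normally use sklearn's `LabelEncoder` and `OneHotEncoder` classes rather than hand-written helpers like these.

```python
# Illustrative sketch only: plain-Python versions of two common encodings.
# The helper names and the sample data are hypothetical, not from any library.

def label_encode(values):
    """Map each distinct category to an integer (categories sorted alphabetically)."""
    categories = sorted(set(values))
    mapping = {category: index for index, category in enumerate(categories)}
    return [mapping[value] for value in values], mapping


def one_hot_encode(values):
    """Turn each value into a 0/1 vector with one position per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return rows, categories


colours = ["red", "green", "blue", "green"]

labels, mapping = label_encode(colours)
print(labels)    # [2, 1, 0, 1]
print(mapping)   # {'blue': 0, 'green': 1, 'red': 2}

one_hot, columns = one_hot_encode(colours)
print(columns)   # ['blue', 'green', 'red']
print(one_hot)   # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Note that label encoding imposes an arbitrary order on the categories (here blue < green < red), while one-hot encoding avoids that at the cost of one extra column per category.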

In this article, I will discuss a recently open-sourced library, “CatBoost”, developed and contributed by Yandex. CatBoost can use categorical features directly and is scalable in nature.

“This is the first Russian machine learning technology that’s an open source,” said Mikhail Bilenko, Yandex’s head of machine intelligence and research.

P.S. You can also read my earlier article, “How to deal with categorical variables?”.

Table of Contents

  1. What is CatBoost?
  2. Advantages of CatBoost library
  3. CatBoost in comparison to other boosting algorithms
  4. Installing CatBoost
  5. Solving ML challenge using CatBoost
  6. End Notes

1. What is CatBoost?

CatBoost is a recently open-sourced machine learning algorithm from Yandex. It integrates easily with deep learning frameworks like Google’s TensorFlow and Apple’s Core ML, and it can work with diverse data types to help solve a wide range of problems that businesses face today. To top it off, it provides best-in-class accuracy.

It is especially powerful in two ways:

  • It yields state-of-the-art results without the extensive data training
    typically required by other machine learning methods, and
  • It provides powerful out-of-the-box support for the more descriptive
    data formats that accompany many business problems.

The name “CatBoost” comes from two words: “**Cat**egory” and “**Boost**ing”.

As discussed, the library works well with multiple categories of data, such as audio, text, and images, as well as historical data.

“Boost” comes from the gradient boosting machine learning algorithm, as this library is based on gradient boosting. Gradient boosting is a powerful machine learning algorithm that is widely and successfully applied to many types of business challenges, such as fraud detection, item recommendation, and forecasting. It can also return very good results with relatively little data, unlike deep learning (DL) models, which need to learn from a massive amount of data.
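For readers new to the idea, the core of gradient boosting can be sketched in a few lines: start from a constant prediction, then repeatedly fit a small model to the current errors (residuals) and add a shrunken version of it to the ensemble. The toy example below is only a sketch under several assumptions: squared-error loss, one numeric feature, one-split "stumps" as weak learners, and invented data and hyperparameters. It is not CatBoost's actual algorithm, and the function names are hypothetical.

```python
# Toy gradient boosting for regression with squared-error loss.
# Hypothetical helper names; data and hyperparameters are invented.

def fit_stump(xs, residuals):
    """Find the single threshold split on x that best fits the residuals.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Return a prediction function built from n_rounds boosted stumps."""
    base = sum(ys) / len(ys)          # initial constant prediction
    stumps = []
    predictions = [base] * len(ys)
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x)
                       for p, x in zip(predictions, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]

model = gradient_boost(xs, ys)
baseline_mse = mse(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mse(ys, [model(x) for x in xs])
print(baseline_mse, boosted_mse)  # the boosted error should be the smaller one
```

Each round only nudges the predictions by `learning_rate` times the stump's output, which is why many weak learners are combined rather than relying on a single strong one.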

Here is a video message from Mikhail Bilenko, Yandex’s head of machine intelligence and research, and Anna Veronika Dorogush, head of Yandex machine learning systems.
