A Good Machine Learning Classifier's Accuracy Metric for the Poker Hand Dataset


weixin_26726011


Original article: https://towardsdatascience.com/a-good-machine-learning-classifiers-accuracy-metric-for-the-poker-hand-dataset-44cc3456b66d


What is the dataset?

The Poker Hand dataset [Cattral et al., 2007] is publicly available and well documented at the UCI Machine Learning Repository [Dua et al., 2019]. [Cattral et al., 2007] described it as:

Found to be a challenging dataset for classification algorithms

It is an 11-dimensional dataset with 25K samples for training and 1M samples for testing. Each instance is a five-card poker hand described by two features per card (suit and rank), ten features in total, plus the poker-hand class label.
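To make the record layout concrete, here is a minimal sketch of how one instance could be parsed. It assumes the comma-separated layout and integer encoding given in the UCI attribute description (suit 1 to 4, rank 1 to 13 with 1 as the ace, class 0 to 9). The names `Card`, `parse_row`, `SUITS`, `RANKS` and `HANDS` are hypothetical helpers introduced for illustration, and the sample row was written by hand for this example.

```python
from dataclasses import dataclass

# Assumed encoding, taken from the UCI attribute description of the dataset.
SUITS = {1: "Hearts", 2: "Spades", 3: "Diamonds", 4: "Clubs"}
RANKS = {1: "Ace", 11: "Jack", 12: "Queen", 13: "King"}  # ranks 2-10 print as numbers
HANDS = [
    "Nothing in hand", "One pair", "Two pairs", "Three of a kind", "Straight",
    "Flush", "Full house", "Four of a kind", "Straight flush", "Royal flush",
]


@dataclass(frozen=True)
class Card:
    suit: int  # categorical, 1-4
    rank: int  # categorical, 1-13

    def __str__(self) -> str:
        return f"{RANKS.get(self.rank, str(self.rank))} of {SUITS[self.suit]}"


def parse_row(line: str) -> tuple[list[Card], int]:
    """Split one CSV row into five (suit, rank) cards and the class label."""
    values = [int(v) for v in line.strip().split(",")]
    if len(values) != 11:
        raise ValueError(f"expected 11 values, got {len(values)}")
    # Features come in (suit, rank) pairs; the 11th value is the label.
    cards = [Card(suit=values[i], rank=values[i + 1]) for i in range(0, 10, 2)]
    return cards, values[10]


# Illustrative row: 10, Jack, King, Queen and Ace of Hearts, labelled class 9.
cards, label = parse_row("1,10,1,11,1,13,1,12,1,1,9")
print(", ".join(str(c) for c in cards), "->", HANDS[label])
```

Ten of the eleven values are features and the last one is the target, which is where the "11-dimensional" description comes from.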

Why is it hard?

Two properties make it particularly challenging for classification algorithms: all of its features are categorical, and its classes are extremely imbalanced. Categorical features are hard because the usual distance (a.k.a. similarity) metrics cannot be applied to them naturally. For example, this dataset has two kinds of features, rank and suit, and calculating the Euclidean distance between “spades” and “hearts” simply doesn't make sense. Imbalanced datasets are hard because most machine learning algorithms implicitly assume that the classes are reasonably balanced. Jason Brownlee from Machine Learning Mastery has described this problem as well.
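Both difficulties can be illustrated with a short, self-contained sketch. The first part shows that distances between integer suit codes depend on an arbitrary numbering, while a one-hot encoding at least keeps all suits equally far apart. The second part estimates the imbalance from the standard combinatorial counts of five-card hands, assuming hands are drawn uniformly from a 52-card deck; the sampled class frequencies in the actual dataset may differ slightly. The names `SUIT_CODES`, `code_distance`, `one_hot_distance` and `HAND_COUNTS` are hypothetical, and this is not the article's own code.

```python
import math

# Arbitrary integer codes for a categorical feature (assumed numbering).
SUIT_CODES = {"hearts": 1, "spades": 2, "diamonds": 3, "clubs": 4}


def code_distance(a: str, b: str) -> float:
    """Euclidean distance between integer codes: depends on the numbering."""
    return abs(SUIT_CODES[a] - SUIT_CODES[b])


def one_hot(suit: str) -> list[float]:
    """One-hot vector with one position per suit."""
    return [1.0 if s == suit else 0.0 for s in SUIT_CODES]


def one_hot_distance(a: str, b: str) -> float:
    """Euclidean distance between one-hot vectors: equal for any two suits."""
    return math.dist(one_hot(a), one_hot(b))


# With integer codes, hearts looks three times closer to spades than to clubs.
print(code_distance("hearts", "spades"), code_distance("hearts", "clubs"))
# With one-hot vectors, every pair of distinct suits is equally distant.
print(one_hot_distance("hearts", "spades"), one_hot_distance("hearts", "clubs"))

# Number of distinct five-card hands per class in a standard 52-card deck.
HAND_COUNTS = {
    "Nothing in hand": 1_302_540,
    "One pair": 1_098_240,
    "Two pairs": 123_552,
    "Three of a kind": 54_912,
    "Straight": 10_200,
    "Flush": 5_108,
    "Full house": 3_744,
    "Four of a kind": 624,
    "Straight flush": 36,
    "Royal flush": 4,
}

total_hands = sum(HAND_COUNTS.values())  # equals C(52, 5)
# Accuracy of a classifier that always predicts the most common class.
majority_baseline = max(HAND_COUNTS.values()) / total_hands

for name, count in HAND_COUNTS.items():
    print(f"{name:16s} {count / total_hands:.6%}")
print(f"majority-class baseline accuracy: {majority_baseline:.4f}")
```

Under that uniform-draw assumption, always predicting "Nothing in hand" is already right about half of the time, and the two most common classes together cover over 92% of hands, so plain accuracy says very little about how well the rare hands are recognised.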
