Machine learning classification datasets: accuracy metrics of a very good machine learning classifier on the Poker Hand dataset

What is the dataset?

The Poker Hand dataset [Cattral et al., 2007] is publicly available and well documented in the UCI Machine Learning Repository [Dua et al., 2019]. [Cattral et al., 2007] described it as:

Found to be a challenging dataset for classification algorithms

It is an 11-dimensional dataset with 25K samples for training and 1M samples for testing. Each instance is a five-card poker hand described by two features per card (suit and rank), plus the poker-hand label.
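To make the 11-dimensional layout concrete, here is a minimal sketch that splits one record into five (suit, rank) pairs and a label. It assumes the comma-separated column order documented at UCI (suit and rank for each of the five cards, then the class) and the UCI suit coding 1 to 4; the names `Hand` and `parse_hand` are hypothetical helpers introduced only for this illustration, and the sample row is illustrative.

```python
from typing import List, NamedTuple, Tuple

# Assumed suit coding, following the UCI documentation for this dataset.
SUITS = {1: "Hearts", 2: "Spades", 3: "Diamonds", 4: "Clubs"}


class Hand(NamedTuple):
    cards: List[Tuple[int, int]]  # five (suit, rank) pairs
    label: int                    # poker-hand class


def parse_hand(line: str) -> Hand:
    """Split one comma-separated record into 5 cards and a label."""
    values = [int(v) for v in line.strip().split(",")]
    if len(values) != 11:
        raise ValueError(f"expected 11 values, got {len(values)}")
    features, label = values[:10], values[10]
    # Even positions hold suits, odd positions hold ranks.
    cards = list(zip(features[0::2], features[1::2]))
    return Hand(cards, label)


# Illustrative record: five hearts (10, J, K, Q, A) with class 9.
hand = parse_hand("1,10,1,11,1,13,1,12,1,1,9")
for suit, rank in hand.cards:
    print(SUITS[suit], rank)
print("label:", hand.label)
```

Ten of the eleven values are inputs (two per card); the eleventh is the target that a classifier has to predict.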

Why is it hard?

It has two properties that make it particularly challenging for classification algorithms: all of its features are categorical, and it is extremely imbalanced. Categorical features are hard because typical distance (or similarity) metrics can't be naturally applied to them. For example, this dataset has two kinds of feature, rank and suit, and calculating the Euclidean distance between “spades” and “hearts” simply doesn’t make sense. Imbalanced datasets are hard because most machine learning algorithms implicitly assume roughly balanced classes. Jason Brownlee from Machine Learning Mastery describes the problem as:

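Both difficulties can be illustrated with a small sketch. The first part compares Euclidean distances between suits when they are coded as integers and when they are one-hot encoded; one-hot encoding is a common workaround shown here as an assumption, not necessarily the approach used in the original article. The second part computes the accuracy of always predicting the most frequent class on a toy label list whose counts are made up for illustration and are not the real class counts of the dataset.

```python
import math
from collections import Counter
from typing import List, Sequence


def one_hot(value: int, n_categories: int) -> List[float]:
    """Encode a 1-based category index as a one-hot vector."""
    vec = [0.0] * n_categories
    vec[value - 1] = 1.0
    return vec


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Integer-coded suits impose an ordering that has no meaning:
# suit 1 looks three times farther from suit 4 than from suit 2.
d_int_12 = euclidean([1], [2])
d_int_14 = euclidean([1], [4])

# One-hot coded suits are all equally far apart.
d_oh_12 = euclidean(one_hot(1, 4), one_hot(2, 4))
d_oh_14 = euclidean(one_hot(1, 4), one_hot(4, 4))


def majority_baseline_accuracy(labels: Sequence[int]) -> float:
    """Accuracy obtained by always predicting the most common label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


# Hypothetical, made-up imbalanced labels (NOT the real dataset counts).
toy_labels = [0] * 50 + [1] * 42 + [2] * 5 + [3] * 2 + [4] * 1
baseline = majority_baseline_accuracy(toy_labels)

print(d_int_12, d_int_14, d_oh_12, d_oh_14, baseline)
```

On the toy labels, a model that never predicts any of the rare classes still gets half of the samples right, which is why plain accuracy can look better than the classifier really is on imbalanced data.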
