Make kNN 300 times faster than Scikit-learn's in 20 lines


Introduction

k Nearest Neighbors (kNN) is a simple ML algorithm for classification and regression. Scikit-learn provides both variants behind a very simple API, which makes it popular in machine learning courses. There is one issue with it: it's quite slow! But don't worry, we can make it work for bigger datasets with the Facebook faiss library.


The kNN algorithm has to find the nearest neighbors in the training set for the sample being classified. As the dimensionality (number of features) of the data increases, the time needed to find the nearest neighbors rises very quickly. To speed up prediction, kNN classifiers build data structures during the training phase (the .fit() method) that keep the training dataset organized in a way that helps with nearest neighbor searches.
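
To make the cost concrete, here is a minimal brute-force sketch in plain Python. It is not how Scikit-learn or faiss implement the search; the function name `knn_predict` and the toy data are assumptions made up for illustration. Every query computes one distance per training point, which is the work that the data structures built in .fit() try to avoid.

```python
import heapq
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Hypothetical helper for illustration: a plain brute-force search.
    """
    # One Euclidean distance per training point: O(n * d) work per query.
    distances = (
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Keep only the k smallest distances.
    nearest = heapq.nsmallest(k, distances, key=lambda pair: pair[0])
    # Majority vote over the labels of those k neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy training set (made up): two well-separated clusters.
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_X, train_y, (0.3, 0.3)))
print(knn_predict(train_X, train_y, (4.5, 5.0)))
```

With n training points and d features, each prediction costs roughly n * d operations, so the total time grows with both the dataset size and the dimensionality.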


Scikit-learn vs faiss

In Scikit-learn, the default “auto” mode chooses the algorithm automatically, based on the size and structure of the training data. It is either a brute-force search (for very small datasets) or one of two popular data structures for nearest neighbor lookups: the k-d tree or the ball tree. Both are simple and often taught in computational geometry courses, but the efficiency of their implementation in Scikit-learn is questionable at best. For example, you may have noticed that kNN tutorials often use only a small part of the MNIST dataset, about 10k images. The reason is that running kNN on the entire dataset of 60k images would be far too slow. And today that doesn’t even come close to “big data”!
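
The search strategy can also be set explicitly through the `algorithm` parameter of `KNeighborsClassifier`. The sketch below fits the same classifier with each option on a small synthetic dataset. The dataset, its size, and the labeling rule are assumptions chosen only to keep the example fast; they are not taken from the article's benchmarks.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data (an assumption for illustration): 2000 points, 8 features.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(2000, 8))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
X_test = rng.normal(size=(200, 8))

predictions = {}
for algorithm in ("brute", "kd_tree", "ball_tree", "auto"):
    # .fit() builds the chosen structure; "brute" essentially just stores the data.
    clf = KNeighborsClassifier(n_neighbors=5, algorithm=algorithm)
    clf.fit(X_train, y_train)
    predictions[algorithm] = clf.predict(X_test)

# All options perform an exact search, so they should agree on the labels.
for algorithm, labels in predictions.items():
    agrees = np.array_equal(labels, predictions["brute"])
    print(f"{algorithm}: agrees with brute force = {agrees}")
```

Because every option returns exact neighbors, the choice affects only the fit and query time, not the predictions, which is why swapping in a faster search backend is a safe change.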

